# <center>Week 4 Assignment</center>

This week you will be using the MovieLens 1 million ratings dataset. By the time you are finished with this assignment, you will have another SQLite database and a NoSQL database to use in other classes or for projects.

Broadly, this assignment will follow the FTE's progression:

* Load MovieLens tables into SQLite (good time to find that "multiple insert")
* Create a query to retrieve reviews into a cursor
* Create a dataclass that represents a movie review
* Translate rows of the cursor into a list of MovieReview objects
* Translate the list of MovieReviews into a list of dictionaries
* Load the list of dictionaries into TinyDB (using `insert_multiple()`)
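A minimal sketch of the dataclass steps in that list (a `MovieReview` dataclass, rows to objects, objects to dictionaries), using the standard-library `dataclasses` module. The field names and type conversions are my assumptions, based on the columns the join query later in this notebook returns, and `MovieReview.from_row()` is a hypothetical helper name. The sample row is copied from that query's output.

```python
from dataclasses import dataclass, asdict


@dataclass
class MovieReview:
    """One complete review: movie, reviewer, and rating joined together."""
    title: str
    genres: str
    gender: str
    age: int
    occupation: int
    zip_code: str
    rating: int
    timestamp: int

    @classmethod
    def from_row(cls, row):
        """Build a review from one joined row (any mapping keyed by column name).

        The raw values arrive as strings, some with a trailing newline, so
        text fields are stripped and numeric fields are converted with int().
        """
        return cls(
            title=row["Title"],
            genres=row["Genres"].strip(),
            gender=row["Gender"],
            age=int(row["Age"]),
            occupation=int(row["Occupation"]),
            zip_code=row["Zip_code"].strip(),
            rating=int(row["Rating"]),
            timestamp=int(row["Timestamp"]),
        )


# Sample row copied from the join query's output further down
row = {
    "Title": "$1,000,000 Duck (1971)", "Genres": "Children's|Comedy\n",
    "Gender": "F", "Age": "35", "Occupation": "1",
    "Zip_code": "82601\n", "Rating": "3", "Timestamp": "975093319\n",
}

reviews = [MovieReview.from_row(r) for r in [row]]   # rows -> MovieReview objects
documents = [asdict(review) for review in reviews]   # objects -> plain dicts
print(documents[0])
```

The resulting `documents` list is the kind of value you would pass to TinyDB's `insert_multiple()`.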

There is one important point that will need to be addressed:

* MovieLens consists of 3 tables:
    * Users
    * Movies
    * Reviews

One complete review consists of data from all three tables joined together. We will work through that part together. 

<hr>

## Part 1 - Storing in SQLite

In this part, you are expected to read the MovieLens README file to find the information you need to proceed.

<div class="alert alert-block alert-info">
<b>Hint:</b> Jupyter Notebook and JupyterLab can open it.
</div>

In [56]:
import dataset

In [57]:
# Fill in between the quotes for your own system
sql_db_path = "C:/Users/eltac/Desktop/Week_4"

In [58]:
# Fill in the connection string between the parentheses
db = dataset.connect("sqlite:///"+sql_db_path+"/Movie_Lens.db")

In [59]:
# Use a for loop just in case someone snuck in a new table on us
if (len(db.tables) > 0):
    for table in db.tables:
        db[table].drop()

In [60]:
# Are these files comma-separated?
separator = '::'

In [61]:
# Get column names from the README
# Replace *'s with column names
users_head = "UserID::Gender::Age::Occupation::Zip_code".split(separator)
movies_head = "MovieID::Title::Genres".split(separator)
ratings_head = "UserID::MovieID::Rating::Timestamp".split(separator)

Before executing the next line, stop and think about what the output should be. Does the actual output match your expectation?

In [62]:
users_head

['UserID', 'Gender', 'Age', 'Occupation', 'Zip_code']

Now it is time to create the database tables. As mentioned, there will be three of them. Interestingly, the `USERS` table and the `MOVIES` table both already have unique ID fields, so we will have to take that into account. The `RATINGS` table, on the other hand, does not have a unique ID column, so we don't have to worry about it.

The general, simple format to create a table is:

`table_variable = db.create_table("table_name")` (this is the form to use for ratings)

But, in the case where the data already has an ID, we have to tell DataSet about it. The general form is:

`table_variable = db.create_table("table_name", primary_id="ID_column_name", primary_type=db.types.integer)`

So, in the case of the `MOVIES` table, the `MovieID` column is the primary key, and in the `USERS` table it is `UserID`.

In [63]:
ratings_table = db.create_table("ratings")

In [64]:
users_table = db.create_table(
    "users", primary_id="UserID", primary_type=db.types.integer
)

In [65]:
movies_table = db.create_table(
    "movies", primary_id="MovieID", primary_type=db.types.integer
)

You can, and probably should, put those `create_table()` function calls in `try / except` blocks.
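A minimal sketch of that advice, written against the standard-library `sqlite3` module rather than Dataset, so it makes no assumptions about which exceptions Dataset raises. In plain SQL, declaring `UserID INTEGER PRIMARY KEY` plays roughly the role that `primary_id="UserID", primary_type=db.types.integer` plays above, and creating a table that already exists raises `sqlite3.OperationalError`. `create_table_safely()` is a hypothetical helper.

```python
import sqlite3


def create_table_safely(con, ddl):
    """Run a CREATE TABLE statement; report the problem instead of crashing."""
    try:
        con.execute(ddl)
        return True
    except sqlite3.OperationalError as err:  # e.g. "table users already exists"
        print(f"Could not create table: {err}")
        return False


con = sqlite3.connect(":memory:")
users_ddl = ("CREATE TABLE users (UserID INTEGER PRIMARY KEY, Gender TEXT, "
             "Age INTEGER, Occupation INTEGER, Zip_code TEXT)")
print(create_table_safely(con, users_ddl))  # True
print(create_table_safely(con, users_ddl))  # prints the error, then False

# The ID from the data file becomes the key, instead of an auto-generated one
con.execute("INSERT INTO users VALUES (4, 'M', 45, 7, '02460')")
print(con.execute("SELECT UserID, Zip_code FROM users").fetchall())  # [(4, '02460')]
```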

Let's set up variables for the data file names:

In [66]:
users_file = "C:/Users/eltac/Desktop/Week_4/Data/users.dat"
movies_file = "C:/Users/eltac/Desktop/Week_4/Data/movies.dat"
ratings_file = "C:/Users/eltac/Desktop/Week_4/Data/ratings.dat"

OK. Here it is, the moment you've all been waiting for -- we can start stuffing data in the tables we created. 

But before we load the first table, consider these questions and write your answers below:

**Having the ID column in the data caused one difference in our table creation (vs. Week_2).**

1) Do you notice any other differences, and if so, what are they?

2) If there is a difference, why is it different?

3) If there is a difference, how does it affect the data retrieved with a SELECT statement?

<hr>

OK... I kind of lied a little bit. There is one more thing to show you about the insert. You might remember that this data set is called **"ml-1m"**, which stands for _MovieLens - 1 million ratings_. In the grand scheme of modern data storage, 1 million rows isn't a huge number, but it **is** enough to make even a fast laptop like mine choke a bit, so we are going to use a technique that many relational database systems call **bulk insert.**

Bulk insert is optimized for inserting large amounts of similarly structured data. SQLite is relatively fast, so let's do a quick comparison using the users table. After that, it will be **up to you to populate the other 2 tables.** We will also use that progress bar from the FTE, just for fun.

In [67]:
%%time
with open (users_file) as ufile:
    for line in ufile:
        u_dict = dict(zip(users_head, line.split("::")))
        users_table.insert(u_dict)

Wall time: 36.1 s


In [68]:
# Drop the table before trying to insert again
# You might remember how to do this from Week 2
db['users'].drop()
users_table = db.create_table(
    "users", primary_id="UserID", primary_type=db.types.integer
)
# HINT: you need the table name, and the drop command...

In [69]:
%%time
users_list = []
with open(users_file) as ufile:
    for line in ufile:
        users_list.append(dict(zip(users_head, line.split("::"))))
users_table.insert_many(users_list)

Wall time: 154 ms
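For comparison, here is a minimal sketch of the same two strategies in the standard-library `sqlite3` module, using hypothetical rows shaped like `users.dat`. My assumption is that most of the gap in the timings above comes from transaction overhead: committing after every row makes SQLite finish one transaction per row, while a single `executemany()` inside one transaction pays that cost once. Your timings will differ.

```python
import os
import sqlite3
import tempfile
import time

SCHEMA = ("CREATE TABLE users (UserID INTEGER PRIMARY KEY, Gender TEXT, "
          "Age TEXT, Occupation TEXT, Zip_code TEXT)")
INSERT = "INSERT INTO users VALUES (?, ?, ?, ?, ?)"


def load_row_by_row(con, rows):
    """One INSERT and one COMMIT per row, so each row is its own transaction."""
    for row in rows:
        con.execute(INSERT, row)
        con.commit()


def load_bulk(con, rows):
    """Every row in one executemany() call, committed once."""
    with con:  # the connection context manager commits on success, rolls back on error
        con.executemany(INSERT, rows)


def time_load(loader, rows):
    """Load rows into a fresh on-disk database; return (seconds, row count)."""
    with tempfile.TemporaryDirectory() as tmp:
        con = sqlite3.connect(os.path.join(tmp, "bench.db"))
        con.execute(SCHEMA)
        start = time.perf_counter()
        loader(con, rows)
        elapsed = time.perf_counter() - start
        count = con.execute("SELECT COUNT(*) FROM users").fetchone()[0]
        con.close()  # close before the temporary folder is deleted
    return elapsed, count


# Hypothetical rows shaped like users.dat (UserID, Gender, Age, Occupation, Zip_code)
rows = [(i, "F", "25", "0", "00000") for i in range(1, 501)]
for loader in (load_row_by_row, load_bulk):
    elapsed, count = time_load(loader, rows)
    print(f"{loader.__name__}: {count} rows in {elapsed:.3f} s")
```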


<hr>

Now **YOU** can decide how you want to do the other two tables, using `insert()` or `insert_many()`.

Since there are only 2 of them, I will let you do them one by one. _Don't get used to it!_

In [71]:
%%time
movies_list = []
with open(movies_file) as ufile:
    for line in ufile:
        movies_list.append(dict(zip(movies_head, line.split("::"))))
movies_table.insert_many(movies_list)

Wall time: 83.8 ms


In [72]:
%%time
ratings_list = []
with open(ratings_file) as ufile:
    for line in ufile:
        ratings_list.append(dict(zip(ratings_head, line.split("::"))))
ratings_table.insert_many(ratings_list)

Wall time: 22.9 s


<div class="alert alert-success">
  <strong>Success!</strong> At this point you should have a working relational database containing the MovieLens data!
</div>

<hr>

### SQL Joins

Records are divided into multiple tables due to the process of **data normalization**. We have to **join tables** in our `SELECT` queries to get one full movie rating.

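There are several types of join. The query below uses an **inner join**, which keeps only the rows whose keys match in both tables; a rating whose `MovieID` has no row in `movies` is dropped. The most common alternative is the **left join** (also called a **left outer join**). The *left* part refers to the actual layout if you were putting the printed tables side by side on your desk: the table named first in the query is on the left. A left join keeps every row from the left table, even when there is no matching key on the right, and fills the missing columns with `NULL`. In both cases you typically have a table with foreign keys (here, `ratings`) and you are matching those keys to their primary keys in another table. Let's look at an example: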

<center>Movie</center>

| MovieID | Title | Genres |
|---------|-------|--------|
| 1 | Toy Story (1995) | Animation\|Children's\|Comedy |
| 2 | Jumanji (1995) | Adventure\|Children's\|Fantasy |
| 3 | Grumpier Old Men (1995) | Comedy\|Romance |
| 4 | Waiting to Exhale (1995) | Comedy\|Drama |
| 5 | Father of the Bride Part II (1995) | Comedy |

<center>Users</center>

| UserID | Gender | Age | Occupation | Zip_code |
|--------|--------|-----|------------|---------|
| 1 | F | 1 | 10 | 48067 |
| 2 | M | 56 | 16 | 70072 |
| 3 | M | 25 | 15 | 55117 |
| 4 | M | 45 | 7 | 02460 |
| 5 | M | 25 | 20 | 55455 |


<center>Ratings</center>

| UserID | MovieID | Rating | Timestamp|
|--------|---------|--------|----------|
| 1 | 1193 | 5 | 978300760|
| 1 | 661 | 3 | 978302109|
| 1 | 914 | 3 | 978301968|
| 1 | 3408 | 4 | 978300275|
| 1 | 2355 | 5 | 978824291|

It should be obvious in this small example that ratings are linked to both movies and users through their IDs. So, to get a complete rating record, we need the movie record where the MovieIDs match and the user record where the UserIDs match. In SQL that looks like this:

SQL keywords are in caps.

```
SELECT m.title, m.genres, u.Gender, u.Age, u.Occupation, u.Zip_code, r.Rating, r.Timestamp 
FROM movies m 
INNER JOIN ratings r ON m.MovieID = r.MovieID 
INNER JOIN users u ON r.UserID = u.UserID 
ORDER BY m.Title ASC;
```

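When a column name appears in more than one table (like `MovieID` and `UserID` here), you have to prefix it with the table name so the database knows which one you mean. Writing out full table names gets long, so I used a shortcut: in the `FROM` and `JOIN` clauses, I gave each table a one-letter alias (`m`, `r`, `u`).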

Also notice the last line. `ORDER BY m.Title ASC` sorts the results alphabetically by title, which puts all the ratings for the same movie next to each other.
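To see the difference between an inner join and a left join, here is a minimal sketch using the standard-library `sqlite3` module and an in-memory database loaded with exactly the sample rows from the three tables above. None of user 1's sample ratings point at movies 1 to 5, so the inner join returns nothing, while a left join with `ratings` on the left keeps every rating and fills the missing movie columns with `NULL` (`None` in Python).

```python
import sqlite3

# The sample rows from the three tables above
movies = [
    (1, "Toy Story (1995)", "Animation|Children's|Comedy"),
    (2, "Jumanji (1995)", "Adventure|Children's|Fantasy"),
    (3, "Grumpier Old Men (1995)", "Comedy|Romance"),
    (4, "Waiting to Exhale (1995)", "Comedy|Drama"),
    (5, "Father of the Bride Part II (1995)", "Comedy"),
]
users = [
    (1, "F", 1, 10, "48067"),
    (2, "M", 56, 16, "70072"),
    (3, "M", 25, 15, "55117"),
    (4, "M", 45, 7, "02460"),
    (5, "M", 25, 20, "55455"),
]
ratings = [
    (1, 1193, 5, 978300760),
    (1, 661, 3, 978302109),
    (1, 914, 3, 978301968),
    (1, 3408, 4, 978300275),
    (1, 2355, 5, 978824291),
]

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE movies (MovieID INTEGER PRIMARY KEY, Title TEXT, Genres TEXT);
    CREATE TABLE users (UserID INTEGER PRIMARY KEY, Gender TEXT, Age INTEGER,
                        Occupation INTEGER, Zip_code TEXT);
    CREATE TABLE ratings (UserID INTEGER, MovieID INTEGER, Rating INTEGER, Timestamp INTEGER);
""")
con.executemany("INSERT INTO movies VALUES (?, ?, ?)", movies)
con.executemany("INSERT INTO users VALUES (?, ?, ?, ?, ?)", users)
con.executemany("INSERT INTO ratings VALUES (?, ?, ?, ?)", ratings)

# INNER JOIN: keep a rating only if its movie AND its user both exist
inner = con.execute("""
    SELECT m.Title, u.Gender, r.Rating
    FROM ratings r
    INNER JOIN movies m ON m.MovieID = r.MovieID
    INNER JOIN users u ON u.UserID = r.UserID
""").fetchall()

# LEFT JOIN from ratings: keep every rating, with NULL (None) where no movie matches
left = con.execute("""
    SELECT r.MovieID, m.Title, u.Gender, r.Rating
    FROM ratings r
    LEFT JOIN movies m ON m.MovieID = r.MovieID
    INNER JOIN users u ON u.UserID = r.UserID
    ORDER BY r.Timestamp
""").fetchall()

print(inner)    # []
print(left[0])  # (3408, None, 'F', 4)
```

On the full dataset every rating has a matching movie and user, so both joins return the same rows there.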

Let's try it and see what comes out.

In [73]:
# Put the query in here. NOTE: If you break up the lines, you need 
# a "continuation character" at the end of the line. 

movie_query = "select m.title, m.genres, u.Gender, u.Age, u.Occupation, u.Zip_Code, r.Rating, r.Timestamp \
from movies m \
inner join ratings r ON m.MovieID = r.MovieID \
inner join users u ON r.UserID = u.UserID \
order by m.Title ASC;"
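A backslash at the end of a line inside a string literal joins the two lines with no separator at all, so the space before each backslash is what keeps the words apart. SQLite happens to accept `avg(r.Rating)from`, but a missing space between two words (for example `m.titlefrom`) would break the query. A triple-quoted string is an alternative that needs no continuation characters. A small sketch:

```python
# Backslash-newline inside a string joins the lines with NO separator
with_space = "select avg(r.Rating) \
from ratings r;"
without_space = "select avg(r.Rating)\
from ratings r;"

print(repr(with_space))     # 'select avg(r.Rating) from ratings r;'
print(repr(without_space))  # 'select avg(r.Rating)from ratings r;'

# A triple-quoted string keeps the newlines, which SQL treats as whitespace
triple = """
select avg(r.Rating)
from ratings r;
"""
```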

In [121]:
# Add the command to execute a query. 
# Reference: https://dataset.readthedocs.io/en/latest/api.html#dataset.Database.query
query_result = db.query(movie_query)

In [122]:
# Convert that result into a list for ease of use.
movie_results = []

for row in query_result:
    movie_results.append(row)

# Print out the first few results to see what is stored in the list
movie_results[:5]



[OrderedDict([('Title', '$1,000,000 Duck (1971)'),
              ('Genres', "Children's|Comedy\n"),
              ('Gender', 'F'),
              ('Age', '35'),
              ('Occupation', '1'),
              ('Zip_code', '82601\n'),
              ('Rating', '3'),
              ('Timestamp', '975093319\n')]),
 OrderedDict([('Title', '$1,000,000 Duck (1971)'),
              ('Genres', "Children's|Comedy\n"),
              ('Gender', 'F'),
              ('Age', '50'),
              ('Occupation', '16'),
              ('Zip_code', '44319\n'),
              ('Rating', '5'),
              ('Timestamp', '974919045\n')]),
 OrderedDict([('Title', '$1,000,000 Duck (1971)'),
              ('Genres', "Children's|Comedy\n"),
              ('Gender', 'F'),
              ('Age', '25'),
              ('Occupation', '3'),
              ('Zip_code', '84770\n'),
              ('Rating', '4'),
              ('Timestamp', '984794096\n')]),
 OrderedDict([('Title', '$1,000,000 Duck (1971)'),
              (

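Notice the trailing `\n` on `Genres`, `Zip_code`, and `Timestamp` in the output above. `line.split("::")` keeps the newline at the end of each line, so it ends up in the last column of every table. A minimal sketch of a parsing helper that strips it first (`parse_line` is a hypothetical name; the sample line is user 1 from `users.dat`):

```python
SEPARATOR = "::"
USERS_HEAD = ["UserID", "Gender", "Age", "Occupation", "Zip_code"]


def parse_line(line, header, separator=SEPARATOR):
    """Split one line of a MovieLens .dat file into a dict keyed by header.

    rstrip("\\n") removes the line ending before splitting, so the last
    column no longer carries a trailing newline.
    """
    return dict(zip(header, line.rstrip("\n").split(separator)))


raw = "1::F::1::10::48067\n"
print(repr(dict(zip(USERS_HEAD, raw.split(SEPARATOR)))["Zip_code"]))  # '48067\n'
print(repr(parse_line(raw, USERS_HEAD)["Zip_code"]))                  # '48067'
```

In the loading loops above, this would replace `dict(zip(users_head, line.split("::")))` with `parse_line(line, users_head)`.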

## Part 2 - Storing in TinyDB

Hopefully you remember that TinyDB inserts dictionaries as documents. This means that the data in the `movie_results` variable, a list of dictionaries, is already in the correct form to insert.

In [124]:
from tinydb import TinyDB, Query, where

tiny_db = TinyDB("ml_nosql.json")

In [126]:
tiny_db.insert_multiple(movie_results)

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ..., 185, ...]

<div class="alert alert-success">
  <strong>Success!</strong> At this point you should have a working NoSQL database containing the MovieLens data!
</div>

Now we can actually start using this data. 

SQL has some aggregation functions that can be interesting. For example, to find an average of a numeric column:

`select avg(column) from table where condition;`

<div class="alert alert-info">
  <strong>Note:</strong> At this point, I'm not sure that the Dataset library gains us anything, since we are just passing straight SQL through it. You can continue to use Dataset or switch to the SQLite3 library. I'll stay with Dataset, since it is already loaded. 
</div>
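If you do switch to the standard-library `sqlite3` module, a minimal sketch of the same kind of aggregate query might look like this. It groups by gender so both averages come back at once, and passes the title through a `?` placeholder instead of pasting it into the SQL string. The data here is a hypothetical toy set in an in-memory database, and `avg_rating_by_gender()` is a hypothetical helper.

```python
import sqlite3


def avg_rating_by_gender(con, title):
    """Average rating and rating count per gender for one exact movie title."""
    query = """
        SELECT u.Gender, ROUND(AVG(r.Rating), 2) AS avg_rating, COUNT(*) AS n
        FROM movies m
        INNER JOIN ratings r ON m.MovieID = r.MovieID
        INNER JOIN users u ON r.UserID = u.UserID
        WHERE m.Title = ?
        GROUP BY u.Gender
        ORDER BY u.Gender
    """
    # The ? placeholder lets sqlite3 quote the title safely
    return con.execute(query, (title,)).fetchall()


# Hypothetical toy data. Against the real file you would open
# sqlite3.connect(sql_db_path + "/Movie_Lens.db") instead.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE movies (MovieID INTEGER PRIMARY KEY, Title TEXT);
    CREATE TABLE users (UserID INTEGER PRIMARY KEY, Gender TEXT);
    CREATE TABLE ratings (UserID INTEGER, MovieID INTEGER, Rating INTEGER);
""")
con.executemany("INSERT INTO movies VALUES (?, ?)",
                [(1, "Die Hard (1988)"), (2, "Die Hard 2 (1990)")])
con.executemany("INSERT INTO users VALUES (?, ?)",
                [(1, "F"), (2, "F"), (3, "M"), (4, "M"), (5, "M")])
con.executemany("INSERT INTO ratings VALUES (?, ?, ?)",
                [(1, 1, 4), (2, 1, 3), (3, 1, 5), (4, 1, 4), (5, 1, 3), (1, 2, 1)])

print(avg_rating_by_gender(con, "Die Hard (1988)"))  # [('F', 3.5, 2), ('M', 4.0, 3)]
```

`sqlite3` returns plain tuples by default; setting `con.row_factory = sqlite3.Row` gives rows you can index by column name, closer to what Dataset returns.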

We can modify our join from above to get an average rating from women for the movie "Die Hard" like this:

In [127]:
movie_query = "select m.title, u.Gender, avg(r.Rating) \
from movies m \
inner join ratings r on m.MovieID = r.MovieID \
inner join users u on r.UserID = u.UserID \
where u.Gender = 'F' and m.title = 'Die Hard (1988)';"

In [128]:
query_result = db.query(movie_query)

In [129]:
# A quick little list comprehension to extract the results
f_avg = [row for row in query_result]

f_avg

[OrderedDict([('Title', 'Die Hard (1988)'),
              ('Gender', 'F'),
              ('avg(r.Rating)', 3.9185667752442996)])]

In [130]:
# So, to print it nicely:
print(f"Average female rating for {f_avg[0]['Title']} is {f_avg[0]['avg(r.Rating)']}")

Average female rating for Die Hard (1988) is 3.9185667752442996


That process is slightly more manual in TinyDB. Here, we can use TinyDB's `where()` function along with `matches()` to find movies whose title starts with the given text, then use a logical and (`&`) to limit the results to women. We can also take advantage of Python's built-in `sum()` and `len()` functions to help us out.

It sounds more complicated than it is. Like this:


In [131]:
female_dh_set = tiny_db.search( (where('Title').matches('Die Hard')) & (where('Gender').matches('F')) )

That gives us a list of dictionaries; prove that to yourself if you need to.

The rest is simple (remember, all numbers are stored as strings!):

In [134]:
dh_avg_f = sum(int(r['Rating']) for r in female_dh_set) / len(female_dh_set)

In [136]:
print(f'Average: {dh_avg_f}')

Average: 3.7107438016528924


## Questions:

1. Using the relational database you built, compare M and F average ratings for "Die Hard."
2. Do the same comparison with the NoSQL database.
3. Do the averages match?
4. What is the age range of female reviewers of "Gone With The Wind?" (Hint: in SQL, you can use a column more than once. Hint 2: There may be built in functions that help.)

Question 1: Using the relational database you built, compare M and F average ratings for "Die Hard."

In [192]:
avg_query = "select u.Gender, m.title, round(avg(r.Rating),2) AS 'avg' \
from movies m \
inner join ratings r on m.MovieID = r.MovieID \
inner join users u on r.UserID = u.UserID \
where m.title = 'Die Hard (1988)' \
GROUP BY u.Gender, m.title \
ORDER BY u.Gender;"

In [193]:
query_result = db.query(avg_query)

In [194]:
# Convert that result into a list for ease of use.
avg_results = []

for row in query_result:
    avg_results.append(row)

# Display both rows (one per gender) to see what is stored in the list
avg_results[:]

[OrderedDict([('Gender', 'F'), ('Title', 'Die Hard (1988)'), ('avg', 3.92)]),
 OrderedDict([('Gender', 'M'), ('Title', 'Die Hard (1988)'), ('avg', 4.17)])]

In [195]:
print(f"Average female rating for {avg_results[0]['Title']} is {avg_results[0]['avg']}")
print(f"Average male rating for {avg_results[1]['Title']} is {avg_results[1]['avg']}")

Average female rating for Die Hard (1988) is 3.92
Average male rating for Die Hard (1988) is 4.17


Question 2: Do the same comparison with the NoSQL database.

In [172]:
female_dh_set = tiny_db.search( (where('Title').matches('Die Hard')) & (where('Gender').matches('F')) )

In [200]:
dh_avg_f = round((sum(int(r['Rating']) for r in female_dh_set) / len(female_dh_set)),2)

In [201]:
male_dh_set = tiny_db.search( (where('Title').matches('Die Hard')) & (where('Gender').matches('M')) )

In [204]:
dh_avg_m = round((sum(int(r['Rating']) for r in male_dh_set) / len(male_dh_set)),2)

In [205]:
print(f'Average female rating: {dh_avg_f}')
print(f'Average male rating: {dh_avg_m}')

Average female rating: 3.71
Average male rating: 3.83


Question 3: Do the averages match?

In [207]:
print(f"SQLite average female rating for {avg_results[0]['Title']} is {avg_results[0]['avg']}")
print(f'TinyDB average female rating: {dh_avg_f}')
print('')
print(f"SQLite average male rating for {avg_results[1]['Title']} is {avg_results[1]['avg']}")
print(f'TinyDB average male rating: {dh_avg_m}')

SQLite average female rating for Die Hard (1988) is 3.92
TinyDB average female rating: 3.71

SQLite average male rating for Die Hard (1988) is 4.17
TinyDB average male rating: 3.83


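No, the averages do not match between the two databases. The difference most likely comes from which rows each query selects, not from how the average is calculated. The SQL query compares the title exactly (`m.title = 'Die Hard (1988)'`), while the TinyDB query uses `where('Title').matches('Die Hard')`. `matches()` runs a regular-expression match that only has to succeed at the start of the string, so it also picks up ratings for any other title that begins with "Die Hard", such as the sequels. SQLite's `avg()` and the `sum() / len()` calculation are both plain arithmetic means, so once both queries select the same rows, the averages should agree.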
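A minimal sketch of why the two filters differ, using Python's `re` module directly (TinyDB's `matches()` is built on `re.match`) and a few hypothetical documents shaped like the TinyDB records. In TinyDB itself, the exact-title version of the query would be `tiny_db.search((where('Title') == 'Die Hard (1988)') & (where('Gender') == 'F'))`.

```python
import re

# Hypothetical documents shaped like the TinyDB records (ratings stored as strings)
docs = [
    {"Title": "Die Hard (1988)", "Gender": "F", "Rating": "4"},
    {"Title": "Die Hard (1988)", "Gender": "F", "Rating": "5"},
    {"Title": "Die Hard 2 (1990)", "Gender": "F", "Rating": "2"},
]


def mean_rating(rows):
    """Plain arithmetic mean of the string ratings, as in the cells above."""
    return sum(int(r["Rating"]) for r in rows) / len(rows)


# re.match only anchors at the start, so the sequel's title matches too
prefix_hits = [d for d in docs if re.match("Die Hard", d["Title"]) and d["Gender"] == "F"]
# An exact comparison selects only the 1988 film, like the SQL WHERE clause
exact_hits = [d for d in docs if d["Title"] == "Die Hard (1988)" and d["Gender"] == "F"]

print(len(prefix_hits), round(mean_rating(prefix_hits), 2))  # 3 3.67
print(len(exact_hits), round(mean_rating(exact_hits), 2))    # 2 4.5
```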

Question 4: What is the age range of female reviewers of "Gone With The Wind?" 

In [209]:
avg_age_query = "select m.title, round(avg(u.Age),2) AS 'avg' \
from movies m \
inner join ratings r on m.MovieID = r.MovieID \
inner join users u on r.UserID = u.UserID \
where m.title = 'Gone with the Wind (1939)' AND u.Gender = 'F' \
GROUP BY u.Gender,  m.title;"

In [210]:
query_result = db.query(avg_age_query)

In [211]:
# Convert that result into a list for ease of use.
avg_age_results = []

for row in query_result:
    avg_age_results.append(row)

# Display the result to see what is stored in the list
avg_age_results[:]

[OrderedDict([('Title', 'Gone with the Wind (1939)'), ('avg', 32.9)])]

In [213]:
print(f"Average female age for {avg_age_results[0]['Title']} is {avg_age_results[0]['avg']}")


Average female age for Gone with the Wind (1939) is 32.9
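The question asks for the age *range*, which is what the hint points at: use `u.Age` twice, once in `MIN()` and once in `MAX()`. A minimal sketch with hypothetical toy rows in an in-memory `sqlite3` database. Two assumptions to check against your data: the Dataset-built table may store `Age` as text, so the query casts it to an integer; and the ml-1m README describes `Age` as a bucket code rather than an exact age, so the sketch translates the codes using the mapping as I recall it from that README.

```python
import sqlite3

# Age codes as described in the ml-1m README (verify against your copy)
AGE_BUCKETS = {1: "Under 18", 18: "18-24", 25: "25-34", 35: "35-44",
               45: "45-49", 50: "50-55", 56: "56+"}

RANGE_QUERY = """
    SELECT m.Title,
           MIN(CAST(u.Age AS INTEGER)) AS youngest,
           MAX(CAST(u.Age AS INTEGER)) AS oldest
    FROM movies m
    INNER JOIN ratings r ON m.MovieID = r.MovieID
    INNER JOIN users u ON r.UserID = u.UserID
    WHERE m.Title = ? AND u.Gender = 'F'
    GROUP BY m.Title
"""

# Hypothetical toy data; Age stored as text to mimic the Dataset-built table
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE movies (MovieID INTEGER PRIMARY KEY, Title TEXT);
    CREATE TABLE users (UserID INTEGER PRIMARY KEY, Gender TEXT, Age TEXT);
    CREATE TABLE ratings (UserID INTEGER, MovieID INTEGER, Rating INTEGER);
""")
con.execute("INSERT INTO movies VALUES (1, 'Gone with the Wind (1939)')")
con.executemany("INSERT INTO users VALUES (?, ?, ?)",
                [(1, "F", "18"), (2, "F", "45"), (3, "M", "56"), (4, "F", "25")])
con.executemany("INSERT INTO ratings VALUES (?, ?, ?)",
                [(1, 1, 5), (2, 1, 4), (3, 1, 3), (4, 1, 5)])

title, youngest, oldest = con.execute(RANGE_QUERY, ("Gone with the Wind (1939)",)).fetchone()
print(f"Female reviewers of {title}: age codes {youngest} to {oldest} "
      f"({AGE_BUCKETS[youngest]} through {AGE_BUCKETS[oldest]})")
```

The male reviewer (code 56) is excluded by the `Gender = 'F'` filter, so the toy range is 18 to 45.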


## Gilbert Version - PostgreSQL

Below is my version of this assignment. I didn't redo the entire thing, since it is already done above, but I wanted to get it to a point where I could complete the assignment with my own tools, for fun. The tools used are CSV files and PostgreSQL. I use CSV because PostgreSQL's `COPY` command can load CSV files directly. The workflow, which is the same one I used in my practicum classes, is:

* Convert each of the `.dat` files into a CSV file.
* Run a function that finds the CSV files located in a folder.
* For each CSV file, run another function that generates a `CREATE TABLE` statement from the file's header and values, creates the table, and uploads the data.
* Move all the CSV files to an archive folder in case they are needed again.

Practicum 1: https://github.com/GBernal720/Establishment-of-Regis-Healthcare-Informatics-Database

Practicum 2: https://github.com/GBernal720/Establishment-of-Regis-Healthcare-Informatics-Database-Continued
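A minimal sketch of the file-handling steps in that workflow (convert a `.dat` file to CSV, find the CSV files in a folder, archive them after loading), using only the standard library. The function names, the `latin-1` encoding default, and the folder layout are my assumptions, not the code from the practicum repositories; the PostgreSQL part is the `run_test()` function below.

```python
import csv
import shutil
import tempfile
from pathlib import Path


def dat_to_csv(dat_path, csv_path, header, separator="::", encoding="latin-1"):
    """Rewrite one MovieLens .dat file as a CSV file with a header row."""
    with open(dat_path, encoding=encoding) as src:
        with open(csv_path, "w", newline="", encoding=encoding) as dst:
            writer = csv.writer(dst)
            writer.writerow(header)
            for line in src:
                # Drop the line ending before splitting so the last column is clean
                writer.writerow(line.rstrip("\n").split(separator))


def find_csv_files(folder):
    """Return the CSV files waiting in a folder, sorted by name."""
    return sorted(Path(folder).glob("*.csv"))


def archive(csv_path, archive_dir):
    """Move a loaded CSV file into an archive folder, creating the folder if needed."""
    archive_dir = Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.move(str(csv_path), str(archive_dir / Path(csv_path).name)))


# Usage on a throwaway folder with one sample line from movies.dat
with tempfile.TemporaryDirectory() as tmp:
    dat = Path(tmp) / "movies.dat"
    dat.write_text("1::Toy Story (1995)::Animation|Children's|Comedy\n", encoding="latin-1")
    dat_to_csv(dat, Path(tmp) / "movie.csv", ["MovieID", "Title", "Genres"])
    for csv_file in find_csv_files(tmp):
        print(csv_file.read_text(encoding="latin-1"))
        print(archive(csv_file, Path(tmp) / "archive"))
```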

In [214]:
#log on information

#Test log on information
User="postgres"
Password="google"
Host="127.0.0.1"
Port="5432"
Database="postgres"

In [234]:
import csv, ast, psycopg2

def run_test(fullpath,name):
    f=open(fullpath, 'r')
    reader = csv.reader(f)
    longest, headers, type_list=[],[],[]

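    def dataType(val, current_type):
        """Return the narrowest PostgreSQL type that fits val without narrowing current_type."""
        try:
            t = ast.literal_eval(val)
        except (ValueError, SyntaxError, TypeError):
            # Not a Python literal (e.g. text, or a zip code with a leading zero)
            return 'varchar'
        # bool is a subclass of int, so rule it out explicitly
        if isinstance(t, bool) or not isinstance(t, (int, float)):
            return 'varchar'
        # Once a column has held a decimal, a later int value must not narrow it again
        if isinstance(t, float) or current_type == 'decimal':
            return 'decimal'
        # t is an int and every earlier value in the column was an int
        if -32768 <= t <= 32767 and current_type not in ['int', 'bigint']:
            return 'smallint'
        if -2147483648 <= t <= 2147483647 and current_type != 'bigint':
            return 'int'
        return 'bigint'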

    for row in reader:
        if len(headers) ==0:
            headers = row
            for col in row:
                longest.append(0)
                type_list.append(' ')
        else:
            for i in range(len(row)):
                if type_list[i] == 'varchar' or row[i] == 'NA':
                    pass
                else:
                    var_type = dataType(row[i],type_list[i])
                    type_list[i] = var_type
                if len(row[i]) > longest[i]:
                    longest[i] = len(row[i])
    f.close()

    # Build one "name type" line per column, then join them with commas
    columns = []
    for header, col_type, width in zip(headers, type_list, longest):
        if col_type == 'varchar':
            columns.append('\n{} varchar({})'.format(header.lower(), width))
        else:
            columns.append('\n{} {}'.format(header.lower(), col_type))
    statement = 'create table ' + name + '(' + ','.join(columns) + ');'


    # The commented-out code below was used for testing: it drops the table so it can be recreated
    
    #connection = psycopg2.connect(user = User, password = Password, host = Host, port = Port, database = Database)
    #cursor = connection.cursor()
    #cursor.execute('Drop table '+name)
    #connection.commit()
    #print("Table dropped PostgreSQL ")
    #cursor.close()
    #connection.close()

    connection = psycopg2.connect(user = User, password = Password, host = Host, port = Port, database = Database)
    cursor = connection.cursor()
    cursor.execute(statement)
    connection.commit()
    print (statement)
    print("Table created successfully in PostgreSQL ")
    cursor.close()
    connection.close()
    
    connection = psycopg2.connect(user = User, password = Password, host = Host, port = Port, database = Database)
    cursor = connection.cursor()
    cursor.execute("SET client_encoding = 'latin1';")
    cursor.execute('copy PUBLIC.'+name+ ' from '+"'"+fullpath+"'"+" with delimiter ',' csv header;")
    connection.commit()
    print("Data uploaded to PostgreSQL ")
    cursor.close()
    connection.close()

In [235]:
data=[]
users_head = "UserID::Gender::Age::Occupation::Zip_code".split(separator)
movies_head = "MovieID::Title::Genres".split(separator)
ratings_head = "UserID::MovieID::Rating::Timestamp".split(separator)
print(users_head)
print(movies_head)
print(ratings_head)

['UserID', 'Gender', 'Age', 'Occupation', 'Zip_code']
['MovieID', 'Title', 'Genres']
['UserID', 'MovieID', 'Rating', 'Timestamp']


In [236]:
with open('C:/Users/eltac/Downloads/ml-1m/ml-1m/movies.dat') as movie:
    for line in movie:
        data.append(dict(zip(movies_head, line.strip("\n").split("::"))))


In [237]:
import pprint
pp = pprint.PrettyPrinter(indent=4)
pp.pprint(data[:10])

[   {   'Genres': "Animation|Children's|Comedy",
        'MovieID': '1',
        'Title': 'Toy Story (1995)'},
    {   'Genres': "Adventure|Children's|Fantasy",
        'MovieID': '2',
        'Title': 'Jumanji (1995)'},
    {   'Genres': 'Comedy|Romance',
        'MovieID': '3',
        'Title': 'Grumpier Old Men (1995)'},
    {   'Genres': 'Comedy|Drama',
        'MovieID': '4',
        'Title': 'Waiting to Exhale (1995)'},
    {   'Genres': 'Comedy',
        'MovieID': '5',
        'Title': 'Father of the Bride Part II (1995)'},
    {'Genres': 'Action|Crime|Thriller', 'MovieID': '6', 'Title': 'Heat (1995)'},
    {'Genres': 'Comedy|Romance', 'MovieID': '7', 'Title': 'Sabrina (1995)'},
    {   'Genres': "Adventure|Children's",
        'MovieID': '8',
        'Title': 'Tom and Huck (1995)'},
    {'Genres': 'Action', 'MovieID': '9', 'Title': 'Sudden Death (1995)'},
    {   'Genres': 'Action|Adventure|Thriller',
        'MovieID': '10',
        'Title': 'GoldenEye (1995)'}]

In [238]:
import csv
# Write the parsed movie dictionaries to CSV; the `with` block closes the file automatically
with open("C:/Users/eltac/Desktop/Week_4/gbernal_data/movie.csv", 'w', newline='') as csvfile:
    writer = csv.DictWriter(csvfile, fieldnames=movies_head)
    writer.writeheader()
    writer.writerows(data)  # writes every row in one call
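Several titles contain commas (for example `Haunting, The (1963)`), which is why `csv.DictWriter` is safer than joining fields by hand: it quotes any field that contains the delimiter. The sketch below writes two sample rows copied from the output above into an in-memory buffer (an assumption made so the example runs without touching your disk) and reads them back to confirm nothing is lost.

```python
import csv
import io

# Column order matches the movies_head list built earlier in the notebook.
movies_head = ["MovieID", "Title", "Genres"]

# Two sample rows copied from the printed output; the first title contains a comma.
sample_movies = [
    {"MovieID": "2550", "Title": "Haunting, The (1963)", "Genres": "Horror|Thriller"},
    {"MovieID": "2547", "Title": "Harvest (1998)", "Genres": "Drama"},
]

# Write to an in-memory buffer instead of a file so the example is self-contained.
buffer = io.StringIO(newline="")
writer = csv.DictWriter(buffer, fieldnames=movies_head)
writer.writeheader()
writer.writerows(sample_movies)

csv_text = buffer.getvalue()
print(csv_text)  # the comma-containing title is wrapped in double quotes

# Reading the text back recovers the original dictionaries unchanged.
round_trip = list(csv.DictReader(io.StringIO(csv_text, newline="")))
```

Because the title is quoted, any CSV reader (or a later import into SQLite) still sees exactly three columns per row.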

In [239]:
data2 = []
# Each line of ratings.dat has the form UserID::MovieID::Rating::Timestamp
with open('C:/Users/eltac/Downloads/ml-1m/ml-1m/ratings.dat') as ratings_file:
    for line in ratings_file:
        data2.append(dict(zip(ratings_head, line.strip("\n").split(separator))))
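Every value read this way is a string (`'Rating': '5'`), so sorting or averaging ratings later would compare text rather than numbers. A minimal sketch of converting the fields while parsing, assuming all four rating columns are integers and that `Timestamp` is Unix epoch seconds (confirm both in the README). The helper name `parse_rating` is hypothetical, and the sample line is reconstructed from the first record printed in the next cell.

```python
from datetime import datetime, timezone

ratings_head = ["UserID", "MovieID", "Rating", "Timestamp"]

def parse_rating(line: str) -> dict:
    """Hypothetical helper: split one ratings.dat line on '::' and convert every field to int."""
    fields = line.rstrip("\n").split("::")
    return {name: int(value) for name, value in zip(ratings_head, fields)}

row = parse_rating("1::1193::5::978300760\n")
print(row)

# Interpret the timestamp as seconds since the Unix epoch, in UTC.
rated_at = datetime.fromtimestamp(row["Timestamp"], tz=timezone.utc)
print(rated_at.isoformat())
```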

In [240]:
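import pprint
pp = pprint.PrettyPrinter(indent=4)
# data2 holds roughly a million ratings; printing all of them floods the notebook
# (this run had to be stopped with KeyboardInterrupt). pp.pprint(data2[:10]) is enough to inspect it.
pp.pprint(data2)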

[   {'MovieID': '1193', 'Rating': '5', 'Timestamp': '978300760', 'UserID': '1'},
    {'MovieID': '661', 'Rating': '3', 'Timestamp': '978302109', 'UserID': '1'},
    {'MovieID': '914', 'Rating': '3', 'Timestamp': '978301968', 'UserID': '1'},
    {'MovieID': '3408', 'Rating': '4', 'Timestamp': '978300275', 'UserID': '1'},
    {'MovieID': '2355', 'Rating': '5', 'Timestamp': '978824291', 'UserID': '1'},
    {'MovieID': '1197', 'Rating': '3', 'Timestamp': '978302268', 'UserID': '1'},
    {'MovieID': '1287', 'Rating': '5', 'Timestamp': '978302039', 'UserID': '1'},
    {'MovieID': '2804', 'Rating': '5', 'Timestamp': '978300719', 'UserID': '1'},
    {'MovieID': '594', 'Rating': '4', 'Timestamp': '978302268', 'UserID': '1'},
    {'MovieID': '919', 'Rating': '4', 'Timestamp': '978301368', 'UserID': '1'},
    {'MovieID': '595', 'Rating': '5', 'Timestamp': '978824268', 'UserID': '1'},
    {'MovieID': '938', 'Rating': '4', 'Timestamp': '978301752', 'UserID': '1'},
    {'MovieID': '2398', 'Rating': 

        'UserID': '10'},
    {   'MovieID': '3481',
        'Rating': '4',
        'Timestamp': '978225050',
        'UserID': '10'},
    {   'MovieID': '2826',
        'Rating': '4',
        'Timestamp': '978230540',
        'UserID': '10'},
    {   'MovieID': '3629',
        'Rating': '3',
        'Timestamp': '978225428',
        'UserID': '10'},
    {   'MovieID': '1015',
        'Rating': '3',
        'Timestamp': '978230923',
        'UserID': '10'},
    {   'MovieID': '1016',
        'Rating': '5',
        'Timestamp': '978229171',
        'UserID': '10'},
    {   'MovieID': '1954',
        'Rating': '3',
        'Timestamp': '978225735',
        'UserID': '10'},
    {   'MovieID': '1019',
        'Rating': '4',
        'Timestamp': '978227763',
        'UserID': '10'},
    {   'MovieID': '1884',
        'Rating': '2',
        'Timestamp': '978229772',
        'UserID': '10'},
    {   'MovieID': '3489',
        'Rating': '4',
        'Timestamp': '979168295',
        'UserID': '

        'Timestamp': '978902349',
        'UserID': '11'},
    {   'MovieID': '2174',
        'Rating': '4',
        'Timestamp': '978903278',
        'UserID': '11'},
    {   'MovieID': '3552',
        'Rating': '1',
        'Timestamp': '978903278',
        'UserID': '11'},
    {'MovieID': '778', 'Rating': '4', 'Timestamp': '978219895', 'UserID': '11'},
    {   'MovieID': '2683',
        'Rating': '4',
        'Timestamp': '978904242',
        'UserID': '11'},
    {'MovieID': '342', 'Rating': '3', 'Timestamp': '978221632', 'UserID': '11'},
    {   'MovieID': '1887',
        'Rating': '2',
        'Timestamp': '978902286',
        'UserID': '11'},
    {'MovieID': '345', 'Rating': '4', 'Timestamp': '978220669', 'UserID': '11'},
    {'MovieID': '272', 'Rating': '2', 'Timestamp': '978902477', 'UserID': '11'},
    {   'MovieID': '2321',
        'Rating': '3',
        'Timestamp': '978903107',
        'UserID': '11'},
    {   'MovieID': '2325',
        'Rating': '3',
        'Timestamp': '

        'Timestamp': '978212463',
        'UserID': '15'},
    {   'MovieID': '3534',
        'Rating': '3',
        'Timestamp': '978196348',
        'UserID': '15'},
    {   'MovieID': '3461',
        'Rating': '4',
        'Timestamp': '978212698',
        'UserID': '15'},
    {   'MovieID': '3535',
        'Rating': '2',
        'Timestamp': '978197348',
        'UserID': '15'},
    {   'MovieID': '2598',
        'Rating': '3',
        'Timestamp': '978196775',
        'UserID': '15'},
    {   'MovieID': '2302',
        'Rating': '3',
        'Timestamp': '978198379',
        'UserID': '15'},
    {   'MovieID': '1500',
        'Rating': '3',
        'Timestamp': '978212166',
        'UserID': '15'},
    {   'MovieID': '3105',
        'Rating': '3',
        'Timestamp': '978198125',
        'UserID': '15'},
    {'MovieID': '257', 'Rating': '3', 'Timestamp': '978212463', 'UserID': '15'},
    {   'MovieID': '3108',
        'Rating': '4',
        'Timestamp': '978198616',
        'User

        'UserID': '17'},
    {'MovieID': '356', 'Rating': '5', 'Timestamp': '978159896', 'UserID': '17'},
    {'MovieID': '34', 'Rating': '5', 'Timestamp': '978159683', 'UserID': '17'},
    {   'MovieID': '2407',
        'Rating': '4',
        'Timestamp': '978160848',
        'UserID': '17'},
    {   'MovieID': '2268',
        'Rating': '4',
        'Timestamp': '978159683',
        'UserID': '17'},
    {   'MovieID': '1396',
        'Rating': '3',
        'Timestamp': '978160792',
        'UserID': '17'},
    {   'MovieID': '3713',
        'Rating': '4',
        'Timestamp': '978159210',
        'UserID': '17'},
    {   'MovieID': '2912',
        'Rating': '4',
        'Timestamp': '978160228',
        'UserID': '17'},
    {'MovieID': '866', 'Rating': '5', 'Timestamp': '978159315', 'UserID': '17'},
    {   'MovieID': '3717',
        'Rating': '4',
        'Timestamp': '978158779',
        'UserID': '17'},
    {   'MovieID': '2916',
        'Rating': '5',
        'Timestamp': '9781605

KeyboardInterrupt: 
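Printing the whole ratings list is what forced the interrupt above. A lighter check is to print the record count and only a slice of the list. In this sketch `data2` is a three-row stand-in (an assumption so the example runs on its own); in the notebook it is the full list built from `ratings.dat`.

```python
import pprint

# Small stand-in for data2, using the first records shown in the output above.
data2 = [
    {"UserID": "1", "MovieID": "1193", "Rating": "5", "Timestamp": "978300760"},
    {"UserID": "1", "MovieID": "661", "Rating": "3", "Timestamp": "978302109"},
    {"UserID": "1", "MovieID": "914", "Rating": "3", "Timestamp": "978301968"},
]

pp = pprint.PrettyPrinter(indent=4)
print(f"{len(data2)} ratings loaded")

# Slicing a list is cheap and keeps the notebook output short.
preview = data2[:2]
pp.pprint(preview)
```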

In [241]:
import csv
# Write the parsed rating dictionaries to CSV; the `with` block closes the file automatically
with open("C:/Users/eltac/Desktop/Week_4/gbernal_data/ratings.csv", 'w', newline='') as csvfile:
    writer = csv.DictWriter(csvfile, fieldnames=ratings_head)
    writer.writeheader()
    writer.writerows(data2)  # writes every row in one call
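The assignment asks for a "multiple insert" into SQLite. The notebook uses the `dataset` library, whose `Table.insert_many()` accepts a list of dictionaries like `data2`. For comparison, the sketch below does the same with the standard-library `sqlite3` module and `executemany()`, using an in-memory database and three sample rows (both assumptions for illustration). Declaring the columns `INTEGER` lets SQLite's type affinity store the string values from `ratings.dat` as numbers.

```python
import sqlite3

# Rows shaped like data2: every value is a string, as read from ratings.dat.
ratings = [
    {"UserID": "1", "MovieID": "1193", "Rating": "5", "Timestamp": "978300760"},
    {"UserID": "1", "MovieID": "661", "Rating": "3", "Timestamp": "978302109"},
    {"UserID": "10", "MovieID": "3481", "Rating": "4", "Timestamp": "978225050"},
]

conn = sqlite3.connect(":memory:")  # throwaway database for the sketch
conn.execute(
    "CREATE TABLE ratings (UserID INTEGER, MovieID INTEGER, Rating INTEGER, Timestamp INTEGER)"
)

# Named placeholders are filled from each dict's keys; one call inserts every row.
with conn:  # commits the transaction if no exception is raised
    conn.executemany(
        "INSERT INTO ratings VALUES (:UserID, :MovieID, :Rating, :Timestamp)",
        ratings,
    )

count = conn.execute("SELECT COUNT(*) FROM ratings").fetchone()[0]
avg_rating = conn.execute("SELECT AVG(Rating) FROM ratings").fetchone()[0]
stored_type = conn.execute("SELECT typeof(Rating) FROM ratings").fetchone()[0]
print(count, avg_rating, stored_type)
```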

In [242]:
data3 = []
# Each line of users.dat has the form UserID::Gender::Age::Occupation::Zip_code
with open('C:/Users/eltac/Downloads/ml-1m/ml-1m/users.dat') as users_file:
    for line in users_file:
        data3.append(dict(zip(users_head, line.strip("\n").split(separator))))
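The user output below shows zip codes such as `'02460'` and `'72227-5733'`, so `Zip_code` has to stay a string even if the numeric codes are converted. A minimal sketch, assuming `UserID`, `Age` and `Occupation` are integer codes and that the age brackets below match the MovieLens 1M README (verify against your copy). The helper `parse_user` and the extra `AgeRange` key are hypothetical additions for illustration.

```python
users_head = ["UserID", "Gender", "Age", "Occupation", "Zip_code"]

# Age codes as listed in the MovieLens 1M README (assumption: check your copy).
AGE_BRACKETS = {
    1: "Under 18",
    18: "18-24",
    25: "25-34",
    35: "35-44",
    45: "45-49",
    50: "50-55",
    56: "56+",
}

def parse_user(line: str) -> dict:
    """Hypothetical helper: split one users.dat line and convert the numeric codes."""
    user = dict(zip(users_head, line.rstrip("\n").split("::")))
    # Convert only the code columns; int() would drop the leading zero in '02460'
    # and fail outright on a ZIP+4 value like '72227-5733'.
    for key in ("UserID", "Age", "Occupation"):
        user[key] = int(user[key])
    user["AgeRange"] = AGE_BRACKETS[user["Age"]]  # hypothetical readable label
    return user

print(parse_user("4::M::45::7::02460\n"))
```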

In [231]:
pp = pprint.PrettyPrinter(indent=4)
pp.pprint(data3)

[   {   'Age': '1',
        'Gender': 'F',
        'Occupation': '10',
        'UserID': '1',
        'Zip_code': '48067'},
    {   'Age': '56',
        'Gender': 'M',
        'Occupation': '16',
        'UserID': '2',
        'Zip_code': '70072'},
    {   'Age': '25',
        'Gender': 'M',
        'Occupation': '15',
        'UserID': '3',
        'Zip_code': '55117'},
    {   'Age': '45',
        'Gender': 'M',
        'Occupation': '7',
        'UserID': '4',
        'Zip_code': '02460'},
    {   'Age': '25',
        'Gender': 'M',
        'Occupation': '20',
        'UserID': '5',
        'Zip_code': '55455'},
    {   'Age': '50',
        'Gender': 'F',
        'Occupation': '9',
        'UserID': '6',
        'Zip_code': '55117'},
    {   'Age': '35',
        'Gender': 'M',
        'Occupation': '1',
        'UserID': '7',
        'Zip_code': '06810'},
    {   'Age': '25',
        'Gender': 'M',
        'Occupation': '12',
        'UserID': '8',
        'Zip_code': '11413'},
    

        'UserID': '196',
        'Zip_code': '94587'},
    {   'Age': '18',
        'Gender': 'M',
        'Occupation': '14',
        'UserID': '197',
        'Zip_code': '10023'},
    {   'Age': '25',
        'Gender': 'M',
        'Occupation': '12',
        'UserID': '198',
        'Zip_code': '55108'},
    {   'Age': '18',
        'Gender': 'M',
        'Occupation': '4',
        'UserID': '199',
        'Zip_code': '83706'},
    {   'Age': '18',
        'Gender': 'F',
        'Occupation': '4',
        'UserID': '200',
        'Zip_code': '84321'},
    {   'Age': '35',
        'Gender': 'F',
        'Occupation': '2',
        'UserID': '201',
        'Zip_code': '55117'},
    {   'Age': '18',
        'Gender': 'M',
        'Occupation': '4',
        'UserID': '202',
        'Zip_code': '53706'},
    {   'Age': '18',
        'Gender': 'F',
        'Occupation': '4',
        'UserID': '203',
        'Zip_code': '53715'},
    {   'Age': '25',
        'Gender': 'M',
        'Occupati

        'Occupation': '0',
        'UserID': '388',
        'Zip_code': '10021'},
    {   'Age': '25',
        'Gender': 'M',
        'Occupation': '6',
        'UserID': '389',
        'Zip_code': '68128'},
    {   'Age': '25',
        'Gender': 'M',
        'Occupation': '4',
        'UserID': '390',
        'Zip_code': '55405'},
    {   'Age': '45',
        'Gender': 'M',
        'Occupation': '11',
        'UserID': '391',
        'Zip_code': '22122'},
    {   'Age': '18',
        'Gender': 'M',
        'Occupation': '7',
        'UserID': '392',
        'Zip_code': '20037'},
    {   'Age': '35',
        'Gender': 'M',
        'Occupation': '17',
        'UserID': '393',
        'Zip_code': '55402'},
    {   'Age': '18',
        'Gender': 'M',
        'Occupation': '0',
        'UserID': '394',
        'Zip_code': '55013'},
    {   'Age': '18',
        'Gender': 'M',
        'Occupation': '5',
        'UserID': '395',
        'Zip_code': '55104'},
    {   'Age': '25',
        'Gend

    {   'Age': '35',
        'Gender': 'M',
        'Occupation': '7',
        'UserID': '598',
        'Zip_code': '95476'},
    {   'Age': '50',
        'Gender': 'M',
        'Occupation': '6',
        'UserID': '599',
        'Zip_code': '53711'},
    {   'Age': '35',
        'Gender': 'M',
        'Occupation': '17',
        'UserID': '600',
        'Zip_code': '66209'},
    {   'Age': '18',
        'Gender': 'F',
        'Occupation': '20',
        'UserID': '601',
        'Zip_code': '06320'},
    {   'Age': '56',
        'Gender': 'F',
        'Occupation': '6',
        'UserID': '602',
        'Zip_code': '14612'},
    {   'Age': '25',
        'Gender': 'F',
        'Occupation': '6',
        'UserID': '603',
        'Zip_code': '32256'},
    {   'Age': '45',
        'Gender': 'M',
        'Occupation': '17',
        'UserID': '604',
        'Zip_code': '32256'},
    {   'Age': '18',
        'Gender': 'F',
        'Occupation': '4',
        'UserID': '605',
        'Zip_code':

        'Zip_code': '19147'},
    {   'Age': '50',
        'Gender': 'M',
        'Occupation': '6',
        'UserID': '796',
        'Zip_code': '98237'},
    {   'Age': '50',
        'Gender': 'F',
        'Occupation': '7',
        'UserID': '797',
        'Zip_code': '20175'},
    {   'Age': '25',
        'Gender': 'F',
        'Occupation': '20',
        'UserID': '798',
        'Zip_code': '48464'},
    {   'Age': '25',
        'Gender': 'F',
        'Occupation': '5',
        'UserID': '799',
        'Zip_code': '98498'},
    {   'Age': '35',
        'Gender': 'F',
        'Occupation': '12',
        'UserID': '800',
        'Zip_code': '72032'},
    {   'Age': '25',
        'Gender': 'F',
        'Occupation': '20',
        'UserID': '801',
        'Zip_code': '95776'},
    {   'Age': '25',
        'Gender': 'M',
        'Occupation': '16',
        'UserID': '802',
        'Zip_code': '22801'},
    {   'Age': '18',
        'Gender': 'M',
        'Occupation': '4',
        'User

        'Occupation': '17',
        'UserID': '987',
        'Zip_code': '48098'},
    {   'Age': '50',
        'Gender': 'M',
        'Occupation': '11',
        'UserID': '988',
        'Zip_code': '48823'},
    {   'Age': '50',
        'Gender': 'M',
        'Occupation': '0',
        'UserID': '989',
        'Zip_code': '20706'},
    {   'Age': '18',
        'Gender': 'M',
        'Occupation': '6',
        'UserID': '990',
        'Zip_code': '10004'},
    {   'Age': '25',
        'Gender': 'F',
        'Occupation': '9',
        'UserID': '991',
        'Zip_code': '48103'},
    {   'Age': '35',
        'Gender': 'F',
        'Occupation': '3',
        'UserID': '992',
        'Zip_code': '02780'},
    {   'Age': '25',
        'Gender': 'M',
        'Occupation': '0',
        'UserID': '993',
        'Zip_code': '45678'},
    {   'Age': '18',
        'Gender': 'M',
        'Occupation': '2',
        'UserID': '994',
        'Zip_code': '92109'},
    {   'Age': '18',
        'Gend

        'Gender': 'F',
        'Occupation': '16',
        'UserID': '1174',
        'Zip_code': '91780'},
    {   'Age': '25',
        'Gender': 'F',
        'Occupation': '2',
        'UserID': '1175',
        'Zip_code': '90020'},
    {   'Age': '56',
        'Gender': 'M',
        'Occupation': '1',
        'UserID': '1176',
        'Zip_code': '44256'},
    {   'Age': '25',
        'Gender': 'F',
        'Occupation': '17',
        'UserID': '1177',
        'Zip_code': '01453'},
    {   'Age': '56',
        'Gender': 'M',
        'Occupation': '0',
        'UserID': '1178',
        'Zip_code': '48186'},
    {   'Age': '50',
        'Gender': 'M',
        'Occupation': '1',
        'UserID': '1179',
        'Zip_code': '91030'},
    {   'Age': '50',
        'Gender': 'F',
        'Occupation': '20',
        'UserID': '1180',
        'Zip_code': '95503'},
    {   'Age': '35',
        'Gender': 'M',
        'Occupation': '7',
        'UserID': '1181',
        'Zip_code': '20716'},
  

        'Zip_code': '61665'},
    {   'Age': '1',
        'Gender': 'M',
        'Occupation': '10',
        'UserID': '1366',
        'Zip_code': '89509'},
    {   'Age': '25',
        'Gender': 'M',
        'Occupation': '12',
        'UserID': '1367',
        'Zip_code': '53707'},
    {   'Age': '25',
        'Gender': 'F',
        'Occupation': '3',
        'UserID': '1368',
        'Zip_code': '50266'},
    {   'Age': '25',
        'Gender': 'M',
        'Occupation': '5',
        'UserID': '1369',
        'Zip_code': '12308'},
    {   'Age': '35',
        'Gender': 'M',
        'Occupation': '2',
        'UserID': '1370',
        'Zip_code': '92821'},
    {   'Age': '50',
        'Gender': 'F',
        'Occupation': '2',
        'UserID': '1371',
        'Zip_code': '60657'},
    {   'Age': '1',
        'Gender': 'M',
        'Occupation': '10',
        'UserID': '1372',
        'Zip_code': '95123'},
    {   'Age': '25',
        'Gender': 'F',
        'Occupation': '1',
        '

    {   'Age': '1',
        'Gender': 'M',
        'Occupation': '0',
        'UserID': '1573',
        'Zip_code': '77479'},
    {   'Age': '25',
        'Gender': 'M',
        'Occupation': '6',
        'UserID': '1574',
        'Zip_code': '81007'},
    {   'Age': '50',
        'Gender': 'F',
        'Occupation': '7',
        'UserID': '1575',
        'Zip_code': '94949'},
    {   'Age': '45',
        'Gender': 'F',
        'Occupation': '9',
        'UserID': '1576',
        'Zip_code': '85202'},
    {   'Age': '56',
        'Gender': 'M',
        'Occupation': '0',
        'UserID': '1577',
        'Zip_code': '72227-5733'},
    {   'Age': '18',
        'Gender': 'M',
        'Occupation': '4',
        'UserID': '1578',
        'Zip_code': '22201'},
    {   'Age': '25',
        'Gender': 'M',
        'Occupation': '0',
        'UserID': '1579',
        'Zip_code': '60201'},
    {   'Age': '18',
        'Gender': 'F',
        'Occupation': '4',
        'UserID': '1580',
        'Z

    {   'Age': '35',
        'Gender': 'M',
        'Occupation': '1',
        'UserID': '1780',
        'Zip_code': '22181'},
    {   'Age': '35',
        'Gender': 'F',
        'Occupation': '0',
        'UserID': '1781',
        'Zip_code': '08009'},
    {   'Age': '45',
        'Gender': 'M',
        'Occupation': '0',
        'UserID': '1782',
        'Zip_code': '23454'},
    {   'Age': '18',
        'Gender': 'M',
        'Occupation': '4',
        'UserID': '1783',
        'Zip_code': '10027'},
    {   'Age': '18',
        'Gender': 'M',
        'Occupation': '20',
        'UserID': '1784',
        'Zip_code': '18104'},
    {   'Age': '18',
        'Gender': 'M',
        'Occupation': '4',
        'UserID': '1785',
        'Zip_code': '04901'},
    {   'Age': '45',
        'Gender': 'M',
        'Occupation': '1',
        'UserID': '1786',
        'Zip_code': '85721'},
    {   'Age': '18',
        'Gender': 'M',
        'Occupation': '8',
        'UserID': '1787',
        'Zip_

        'Gender': 'M',
        'Occupation': '12',
        'UserID': '1985',
        'Zip_code': '92122'},
    {   'Age': '50',
        'Gender': 'M',
        'Occupation': '13',
        'UserID': '1986',
        'Zip_code': '91977'},
    {   'Age': '25',
        'Gender': 'M',
        'Occupation': '17',
        'UserID': '1987',
        'Zip_code': '62629'},
    {   'Age': '25',
        'Gender': 'F',
        'Occupation': '1',
        'UserID': '1988',
        'Zip_code': '85224'},
    {   'Age': '50',
        'Gender': 'M',
        'Occupation': '6',
        'UserID': '1989',
        'Zip_code': '02090'},
    {   'Age': '25',
        'Gender': 'M',
        'Occupation': '7',
        'UserID': '1990',
        'Zip_code': '85257'},
    {   'Age': '25',
        'Gender': 'M',
        'Occupation': '20',
        'UserID': '1991',
        'Zip_code': '48309'},
    {   'Age': '18',
        'Gender': 'M',
        'Occupation': '4',
        'UserID': '1992',
        'Zip_code': '85259'},
 

    {   'Age': '35',
        'Gender': 'M',
        'Occupation': '0',
        'UserID': '2171',
        'Zip_code': '48103'},
    {   'Age': '18',
        'Gender': 'M',
        'Occupation': '20',
        'UserID': '2172',
        'Zip_code': '60641'},
    {   'Age': '50',
        'Gender': 'M',
        'Occupation': '13',
        'UserID': '2173',
        'Zip_code': '87502'},
    {   'Age': '50',
        'Gender': 'M',
        'Occupation': '12',
        'UserID': '2174',
        'Zip_code': '87505'},
    {   'Age': '25',
        'Gender': 'M',
        'Occupation': '4',
        'UserID': '2175',
        'Zip_code': '99217'},
    {   'Age': '1',
        'Gender': 'M',
        'Occupation': '19',
        'UserID': '2176',
        'Zip_code': '73505'},
    {   'Age': '35',
        'Gender': 'F',
        'Occupation': '20',
        'UserID': '2177',
        'Zip_code': '95451'},
    {   'Age': '45',
        'Gender': 'M',
        'Occupation': '14',
        'UserID': '2178',
        '

        'Occupation': '11',
        'UserID': '2354',
        'Zip_code': '94306'},
    {   'Age': '45',
        'Gender': 'F',
        'Occupation': '9',
        'UserID': '2355',
        'Zip_code': '13203'},
    {   'Age': '50',
        'Gender': 'F',
        'Occupation': '16',
        'UserID': '2356',
        'Zip_code': '13207'},
    {   'Age': '35',
        'Gender': 'F',
        'Occupation': '14',
        'UserID': '2357',
        'Zip_code': '13316'},
    {   'Age': '56',
        'Gender': 'F',
        'Occupation': '6',
        'UserID': '2358',
        'Zip_code': '06074'},
    {   'Age': '56',
        'Gender': 'M',
        'Occupation': '13',
        'UserID': '2359',
        'Zip_code': '13685'},
    {   'Age': '50',
        'Gender': 'M',
        'Occupation': '6',
        'UserID': '2360',
        'Zip_code': '13210'},
    {   'Age': '56',
        'Gender': 'M',
        'Occupation': '7',
        'UserID': '2361',
        'Zip_code': '49423'},
    {   'Age': '25',
   

        'UserID': '2542',
        'Zip_code': '37922'},
    {   'Age': '35',
        'Gender': 'F',
        'Occupation': '0',
        'UserID': '2543',
        'Zip_code': '42420'},
    {   'Age': '45',
        'Gender': 'M',
        'Occupation': '1',
        'UserID': '2544',
        'Zip_code': '52001'},
    {   'Age': '56',
        'Gender': 'M',
        'Occupation': '13',
        'UserID': '2545',
        'Zip_code': '37830'},
    {   'Age': '56',
        'Gender': 'F',
        'Occupation': '13',
        'UserID': '2546',
        'Zip_code': '37931'},
    {   'Age': '35',
        'Gender': 'F',
        'Occupation': '0',
        'UserID': '2547',
        'Zip_code': '37920'},
    {   'Age': '35',
        'Gender': 'F',
        'Occupation': '6',
        'UserID': '2548',
        'Zip_code': '37919'},
    {   'Age': '56',
        'Gender': 'M',
        'Occupation': '13',
        'UserID': '2549',
        'Zip_code': '37938'},
    {   'Age': '50',
        'Gender': 'M',
        

    {   'Age': '45',
        'Gender': 'M',
        'Occupation': '17',
        'UserID': '2732',
        'Zip_code': '92805'},
    {   'Age': '45',
        'Gender': 'M',
        'Occupation': '17',
        'UserID': '2733',
        'Zip_code': '94002'},
    {   'Age': '18',
        'Gender': 'F',
        'Occupation': '4',
        'UserID': '2734',
        'Zip_code': '02912'},
    {   'Age': '25',
        'Gender': 'M',
        'Occupation': '7',
        'UserID': '2735',
        'Zip_code': '22903'},
    {   'Age': '45',
        'Gender': 'M',
        'Occupation': '12',
        'UserID': '2736',
        'Zip_code': '80303'},
    {   'Age': '25',
        'Gender': 'M',
        'Occupation': '3',
        'UserID': '2737',
        'Zip_code': '50311'},
    {   'Age': '50',
        'Gender': 'F',
        'Occupation': '2',
        'UserID': '2738',
        'Zip_code': '22181'},
    {   'Age': '35',
        'Gender': 'F',
        'Occupation': '7',
        'UserID': '2739',
        'Zi

        'Zip_code': '44256'},
    {   'Age': '18',
        'Gender': 'F',
        'Occupation': '4',
        'UserID': '2924',
        'Zip_code': '94121'},
    {   'Age': '25',
        'Gender': 'F',
        'Occupation': '4',
        'UserID': '2925',
        'Zip_code': '23454'},
    {   'Age': '18',
        'Gender': 'F',
        'Occupation': '2',
        'UserID': '2926',
        'Zip_code': '55118'},
    {   'Age': '25',
        'Gender': 'M',
        'Occupation': '12',
        'UserID': '2927',
        'Zip_code': '24060'},
    {   'Age': '25',
        'Gender': 'M',
        'Occupation': '2',
        'UserID': '2928',
        'Zip_code': '90068'},
    {   'Age': '18',
        'Gender': 'M',
        'Occupation': '2',
        'UserID': '2929',
        'Zip_code': '60614'},
    {   'Age': '50',
        'Gender': 'M',
        'Occupation': '17',
        'UserID': '2930',
        'Zip_code': '45420'},
    {   'Age': '18',
        'Gender': 'M',
        'Occupation': '4',
        

    {   'Age': '18',
        'Gender': 'M',
        'Occupation': '4',
        'UserID': '3111',
        'Zip_code': '56520'},
    {   'Age': '18',
        'Gender': 'M',
        'Occupation': '12',
        'UserID': '3112',
        'Zip_code': '98133'},
    {   'Age': '25',
        'Gender': 'M',
        'Occupation': '10',
        'UserID': '3113',
        'Zip_code': '55414'},
    {   'Age': '18',
        'Gender': 'M',
        'Occupation': '5',
        'UserID': '3114',
        'Zip_code': '65211'},
    {   'Age': '35',
        'Gender': 'M',
        'Occupation': '17',
        'UserID': '3115',
        'Zip_code': '48323'},
    {   'Age': '25',
        'Gender': 'M',
        'Occupation': '12',
        'UserID': '3116',
        'Zip_code': '98034'},
    {   'Age': '35',
        'Gender': 'M',
        'Occupation': '0',
        'UserID': '3117',
        'Zip_code': '83704'},
    {   'Age': '25',
        'Gender': 'M',
        'Occupation': '0',
        'UserID': '3118',
        'Z

        'Occupation': '20',
        'UserID': '3308',
        'Zip_code': '15701-1348'},
    {   'Age': '25',
        'Gender': 'M',
        'Occupation': '17',
        'UserID': '3309',
        'Zip_code': '20707'},
    {   'Age': '45',
        'Gender': 'M',
        'Occupation': '14',
        'UserID': '3310',
        'Zip_code': '12561'},
    {   'Age': '25',
        'Gender': 'M',
        'Occupation': '4',
        'UserID': '3311',
        'Zip_code': '90039'},
    {   'Age': '18',
        'Gender': 'F',
        'Occupation': '4',
        'UserID': '3312',
        'Zip_code': '90039'},
    {   'Age': '35',
        'Gender': 'M',
        'Occupation': '20',
        'UserID': '3313',
        'Zip_code': '90292'},
    {   'Age': '25',
        'Gender': 'M',
        'Occupation': '7',
        'UserID': '3314',
        'Zip_code': '06516'},
    {   'Age': '25',
        'Gender': 'M',
        'Occupation': '12',
        'UserID': '3315',
        'Zip_code': '78731'},
    {   'Age': '1'

    {   'Age': '18',
        'Gender': 'M',
        'Occupation': '7',
        'UserID': '3504',
        'Zip_code': '02215'},
    {   'Age': '25',
        'Gender': 'F',
        'Occupation': '15',
        'UserID': '3505',
        'Zip_code': '55455'},
    {   'Age': '18',
        'Gender': 'M',
        'Occupation': '12',
        'UserID': '3506',
        'Zip_code': '80503'},
    {   'Age': '25',
        'Gender': 'M',
        'Occupation': '0',
        'UserID': '3507',
        'Zip_code': '02472'},
    {   'Age': '18',
        'Gender': 'M',
        'Occupation': '4',
        'UserID': '3508',
        'Zip_code': '02151'},
    {   'Age': '18',
        'Gender': 'M',
        'Occupation': '4',
        'UserID': '3509',
        'Zip_code': '02115'},
    {   'Age': '18',
        'Gender': 'F',
        'Occupation': '4',
        'UserID': '3510',
        'Zip_code': '02142'},
    {   'Age': '50',
        'Gender': 'M',
        'Occupation': '0',
        'UserID': '3511',
        'Zip

    {   'Age': '25',
        'Gender': 'M',
        'Occupation': '2',
        'UserID': '3695',
        'Zip_code': '98502'},
    {   'Age': '35',
        'Gender': 'M',
        'Occupation': '7',
        'UserID': '3696',
        'Zip_code': '19149'},
    {   'Age': '25',
        'Gender': 'M',
        'Occupation': '15',
        'UserID': '3697',
        'Zip_code': '68516'},
    {   'Age': '25',
        'Gender': 'M',
        'Occupation': '7',
        'UserID': '3698',
        'Zip_code': '53202'},
    {   'Age': '56',
        'Gender': 'F',
        'Occupation': '3',
        'UserID': '3699',
        'Zip_code': '30127'},
    {   'Age': '25',
        'Gender': 'M',
        'Occupation': '14',
        'UserID': '3700',
        'Zip_code': '10021'},
    {   'Age': '35',
        'Gender': 'M',
        'Occupation': '1',
        'UserID': '3701',
        'Zip_code': '92614'},
    {   'Age': '35',
        'Gender': 'M',
        'Occupation': '15',
        'UserID': '3702',
        'Zi

    {   'Age': '18',
        'Gender': 'M',
        'Occupation': '4',
        'UserID': '3890',
        'Zip_code': '33143'},
    {   'Age': '56',
        'Gender': 'F',
        'Occupation': '16',
        'UserID': '3891',
        'Zip_code': '90039'},
    {   'Age': '25',
        'Gender': 'M',
        'Occupation': '20',
        'UserID': '3892',
        'Zip_code': '91505'},
    {   'Age': '25',
        'Gender': 'M',
        'Occupation': '6',
        'UserID': '3893',
        'Zip_code': '79401'},
    {   'Age': '35',
        'Gender': 'M',
        'Occupation': '16',
        'UserID': '3894',
        'Zip_code': '02139'},
    {   'Age': '35',
        'Gender': 'F',
        'Occupation': '0',
        'UserID': '3895',
        'Zip_code': '20723'},
    {   'Age': '35',
        'Gender': 'M',
        'Occupation': '0',
        'UserID': '3896',
        'Zip_code': '60914'},
    {   'Age': '56',
        'Gender': 'M',
        'Occupation': '17',
        'UserID': '3897',
        'Z

        'Gender': 'M',
        'Occupation': '1',
        'UserID': '4080',
        'Zip_code': '48912'},
    {   'Age': '18',
        'Gender': 'M',
        'Occupation': '12',
        'UserID': '4081',
        'Zip_code': '19403'},
    {   'Age': '25',
        'Gender': 'M',
        'Occupation': '20',
        'UserID': '4082',
        'Zip_code': '79912'},
    {   'Age': '56',
        'Gender': 'M',
        'Occupation': '14',
        'UserID': '4083',
        'Zip_code': '92630'},
    {   'Age': '56',
        'Gender': 'M',
        'Occupation': '13',
        'UserID': '4084',
        'Zip_code': '14215'},
    {   'Age': '25',
        'Gender': 'F',
        'Occupation': '6',
        'UserID': '4085',
        'Zip_code': '79416'},
    {   'Age': '1',
        'Gender': 'F',
        'Occupation': '10',
        'UserID': '4086',
        'Zip_code': '55391'},
    {   'Age': '1',
        'Gender': 'M',
        'Occupation': '4',
        'UserID': '4087',
        'Zip_code': '63376'},
  

        'Occupation': '20',
        'UserID': '4268',
        'Zip_code': '04046'},
    {   'Age': '35',
        'Gender': 'M',
        'Occupation': '18',
        'UserID': '4269',
        'Zip_code': '27603'},
    {   'Age': '45',
        'Gender': 'M',
        'Occupation': '7',
        'UserID': '4270',
        'Zip_code': '13211'},
    {   'Age': '35',
        'Gender': 'M',
        'Occupation': '7',
        'UserID': '4271',
        'Zip_code': '83405'},
    {   'Age': '45',
        'Gender': 'M',
        'Occupation': '17',
        'UserID': '4272',
        'Zip_code': '37923'},
    {   'Age': '45',
        'Gender': 'M',
        'Occupation': '13',
        'UserID': '4273',
        'Zip_code': '30030'},
    {   'Age': '45',
        'Gender': 'F',
        'Occupation': '1',
        'UserID': '4274',
        'Zip_code': '04258'},
    {   'Age': '45',
        'Gender': 'F',
        'Occupation': '16',
        'UserID': '4275',
        'Zip_code': '14468'},
    {   'Age': '45',
  

        'Zip_code': '07405'},
    {   'Age': '50',
        'Gender': 'F',
        'Occupation': '1',
        'UserID': '4464',
        'Zip_code': '62052'},
    {   'Age': '18',
        'Gender': 'M',
        'Occupation': '4',
        'UserID': '4465',
        'Zip_code': '02148'},
    {   'Age': '35',
        'Gender': 'M',
        'Occupation': '11',
        'UserID': '4466',
        'Zip_code': '10022'},
    {   'Age': '50',
        'Gender': 'F',
        'Occupation': '6',
        'UserID': '4467',
        'Zip_code': '15333'},
    {   'Age': '35',
        'Gender': 'F',
        'Occupation': '7',
        'UserID': '4468',
        'Zip_code': '75601'},
    {   'Age': '45',
        'Gender': 'F',
        'Occupation': '12',
        'UserID': '4469',
        'Zip_code': '92037'},
    {   'Age': '35',
        'Gender': 'M',
        'Occupation': '12',
        'UserID': '4470',
        'Zip_code': '15089'},
    {   'Age': '25',
        'Gender': 'M',
        'Occupation': '6',
       

        'Zip_code': '94041'},
    {   'Age': '45',
        'Gender': 'M',
        'Occupation': '14',
        'UserID': '4657',
        'Zip_code': '55416'},
    {   'Age': '25',
        'Gender': 'M',
        'Occupation': '4',
        'UserID': '4658',
        'Zip_code': '99163'},
    {   'Age': '25',
        'Gender': 'F',
        'Occupation': '14',
        'UserID': '4659',
        'Zip_code': '37076'},
    {   'Age': '18',
        'Gender': 'F',
        'Occupation': '4',
        'UserID': '4660',
        'Zip_code': '06074'},
    {   'Age': '45',
        'Gender': 'M',
        'Occupation': '15',
        'UserID': '4661',
        'Zip_code': '85255'},
    {   'Age': '18',
        'Gender': 'F',
        'Occupation': '4',
        'UserID': '4662',
        'Zip_code': '84109'},
    {   'Age': '25',
        'Gender': 'M',
        'Occupation': '4',
        'UserID': '4663',
        'Zip_code': '92037'},
    {   'Age': '45',
        'Gender': 'F',
        'Occupation': '7',
        ...
        'Zip_code': '44515'},
    {   'Age': '45',
        'Gender': 'F',
        'Occupation': '3',
        'UserID': '4850',
        'Zip_code': '44555'},
    {   'Age': '18',
        'Gender': 'M',
        'Occupation': '17',
        'UserID': '4851',
        'Zip_code': '44406'},
    {   'Age': '35',
        'Gender': 'M',
        'Occupation': '0',
        'UserID': '4852',
        'Zip_code': '42025'},
    {   'Age': '25',
        'Gender': 'M',
        'Occupation': '7',
        'UserID': '4853',
        'Zip_code': '55346'},
    {   'Age': '50',
        'Gender': 'F',
        'Occupation': '13',
        'UserID': '4854',
        'Zip_code': '03851'},
    {   'Age': '25',
        'Gender': 'M',
        'Occupation': '12',
        'UserID': '4855',
        'Zip_code': '90034'},
    {   'Age': '25',
        'Gender': 'M',
        'Occupation': '7',
        'UserID': '4856',
        'Zip_code': '94110'},
    {   'Age': '25',
        'Gender': 'M',
        'Occupation': '14',
        ...
        'Occupation': '2',
        'UserID': '5042',
        'Zip_code': '55408'},
    {   'Age': '18',
        'Gender': 'F',
        'Occupation': '6',
        'UserID': '5043',
        'Zip_code': '92145'},
    {   'Age': '35',
        'Gender': 'F',
        'Occupation': '6',
        'UserID': '5044',
        'Zip_code': '23507'},
    {   'Age': '35',
        'Gender': 'M',
        'Occupation': '0',
        'UserID': '5045',
        'Zip_code': '43081'},
    {   'Age': '25',
        'Gender': 'M',
        'Occupation': '16',
        'UserID': '5046',
        'Zip_code': '60614'},
    {   'Age': '45',
        'Gender': 'M',
        'Occupation': '6',
        'UserID': '5047',
        'Zip_code': '23452'},
    {   'Age': '35',
        'Gender': 'F',
        'Occupation': '7',
        'UserID': '5048',
        'Zip_code': '30350'},
    {   'Age': '25',
        'Gender': 'M',
        'Occupation': '12',
        'UserID': '5049',
        'Zip_code': '60613'},
    {   'Age': '18',
        ...
        'UserID': '5240',
        'Zip_code': '94104'},
    {   'Age': '25',
        'Gender': 'F',
        'Occupation': '4',
        'UserID': '5241',
        'Zip_code': '02138'},
    {   'Age': '25',
        'Gender': 'M',
        'Occupation': '1',
        'UserID': '5242',
        'Zip_code': '14608'},
    {   'Age': '1',
        'Gender': 'M',
        'Occupation': '10',
        'UserID': '5243',
        'Zip_code': '54220'},
    {   'Age': '18',
        'Gender': 'M',
        'Occupation': '4',
        'UserID': '5244',
        'Zip_code': '01095'},
    {   'Age': '45',
        'Gender': 'F',
        'Occupation': '1',
        'UserID': '5245',
        'Zip_code': '27615'},
    {   'Age': '18',
        'Gender': 'F',
        'Occupation': '0',
        'UserID': '5246',
        'Zip_code': '64030'},
    {   'Age': '1',
        'Gender': 'F',
        'Occupation': '10',
        'UserID': '5247',
        'Zip_code': '01915'},
    {   'Age': '45',
        'Gender': 'F',
        ...
        'Occupation': '1',
        'UserID': '5431',
        'Zip_code': '19026'},
    {   'Age': '56',
        'Gender': 'M',
        'Occupation': '13',
        'UserID': '5432',
        'Zip_code': '01742'},
    {   'Age': '35',
        'Gender': 'F',
        'Occupation': '17',
        'UserID': '5433',
        'Zip_code': '45014'},
    {   'Age': '25',
        'Gender': 'F',
        'Occupation': '7',
        'UserID': '5434',
        'Zip_code': '60618'},
    {   'Age': '45',
        'Gender': 'M',
        'Occupation': '18',
        'UserID': '5435',
        'Zip_code': '02557'},
    {   'Age': '18',
        'Gender': 'M',
        'Occupation': '4',
        'UserID': '5436',
        'Zip_code': '90024'},
    {   'Age': '25',
        'Gender': 'M',
        'Occupation': '11',
        'UserID': '5437',
        'Zip_code': '55426'},
    {   'Age': '25',
        'Gender': 'M',
        'Occupation': '14',
        'UserID': '5438',
        'Zip_code': '60048'},
    {   'Age': '56',
        ...
        'Gender': 'M',
        'Occupation': '16',
        'UserID': '5626',
        'Zip_code': '32043'},
    {   'Age': '25',
        'Gender': 'M',
        'Occupation': '0',
        'UserID': '5627',
        'Zip_code': '07040'},
    {   'Age': '18',
        'Gender': 'M',
        'Occupation': '4',
        'UserID': '5628',
        'Zip_code': '90024'},
    {   'Age': '25',
        'Gender': 'M',
        'Occupation': '14',
        'UserID': '5629',
        'Zip_code': '02465'},
    {   'Age': '35',
        'Gender': 'M',
        'Occupation': '17',
        'UserID': '5630',
        'Zip_code': '06854'},
    {   'Age': '35',
        'Gender': 'M',
        'Occupation': '12',
        'UserID': '5631',
        'Zip_code': '01944'},
    {   'Age': '18',
        'Gender': 'F',
        'Occupation': '4',
        'UserID': '5632',
        'Zip_code': '78628'},
    {   'Age': '50',
        'Gender': 'M',
        'Occupation': '1',
        'UserID': '5633',
        'Zip_code': '98262'},
        ...
        'UserID': '5817',
        'Zip_code': '33028'},
    {   'Age': '25',
        'Gender': 'M',
        'Occupation': '4',
        'UserID': '5818',
        'Zip_code': '92821'},
    {   'Age': '50',
        'Gender': 'M',
        'Occupation': '6',
        'UserID': '5819',
        'Zip_code': '70808'},
    {   'Age': '45',
        'Gender': 'M',
        'Occupation': '0',
        'UserID': '5820',
        'Zip_code': '43615'},
    {   'Age': '25',
        'Gender': 'M',
        'Occupation': '15',
        'UserID': '5821',
        'Zip_code': '02139'},
    {   'Age': '35',
        'Gender': 'F',
        'Occupation': '4',
        'UserID': '5822',
        'Zip_code': '78212'},
    {   'Age': '25',
        'Gender': 'M',
        'Occupation': '12',
        'UserID': '5823',
        'Zip_code': '02144'},
    {   'Age': '18',
        'Gender': 'M',
        'Occupation': '12',
        'UserID': '5824',
        'Zip_code': '18052'},
    {   'Age': '25',
        'Gender': 'F',
        ...
        'UserID': '6007',
        'Zip_code': '80537'},
    {   'Age': '18',
        'Gender': 'M',
        'Occupation': '4',
        'UserID': '6008',
        'Zip_code': '78705'},
    {   'Age': '25',
        'Gender': 'F',
        'Occupation': '12',
        'UserID': '6009',
        'Zip_code': '60540'},
    {   'Age': '35',
        'Gender': 'M',
        'Occupation': '0',
        'UserID': '6010',
        'Zip_code': '79606'},
    {   'Age': '35',
        'Gender': 'M',
        'Occupation': '15',
        'UserID': '6011',
        'Zip_code': '80538'},
    {   'Age': '35',
        'Gender': 'M',
        'Occupation': '15',
        'UserID': '6012',
        'Zip_code': '02871'},
    {   'Age': '25',
        'Gender': 'F',
        'Occupation': '20',
        'UserID': '6013',
        'Zip_code': '32301'},
    {   'Age': '45',
        'Gender': 'M',
        'Occupation': '1',
        'UserID': '6014',
        'Zip_code': '80634'},
    {   'Age': '25',
        'Gender': 'F',
        ...

In [243]:
import csv

# Write the user dicts in data3 (from an earlier cell) to users.csv
users_csv = "C:/Users/eltac/Desktop/Week_4/gbernal_data/users.csv"
with open(users_csv, 'w', newline='') as csvfile:
    writer = csv.DictWriter(csvfile, fieldnames=users_head)
    writer.writeheader()
    writer.writerows(data3)
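The cell above assumes `data3` already holds one dict per user. Below is a minimal sketch of how `::`-separated lines in the MovieLens `users.dat` layout can become such dicts and then CSV text. It uses a hypothetical helper, `dat_lines_to_csv`, which splits each line and writes all records with `csv.DictWriter.writerows`. The two sample lines are taken from the printed user records above rather than read from the file.

```python
import csv
import io

# Column names as given in the MovieLens README (same as users_head above)
USERS_HEAD = ["UserID", "Gender", "Age", "Occupation", "Zip_code"]
SEPARATOR = "::"


def dat_lines_to_csv(lines, fieldnames, separator=SEPARATOR):
    """Hypothetical helper: turn '::'-separated lines into CSV text with a header row."""
    # One dict per non-blank line, keyed by the column names
    records = [
        dict(zip(fieldnames, line.rstrip("\n").split(separator)))
        for line in lines
        if line.strip()
    ]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(records)  # writes every record in one call
    return buffer.getvalue()


# Sample lines in the users.dat layout (values from the output above)
sample = ["4271::M::35::7::83405\n", "4274::F::45::1::04258\n"]
out = dat_lines_to_csv(sample, USERS_HEAD)
print(out)
```

Keeping every field as a string preserves leading zeros in zip codes such as `04258`.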

In [244]:
%%time
import os
import shutil

path = "C:/Users/eltac/Desktop/Week_4/gbernal_data"
archive = "C:/Users/eltac/Desktop/Week_4/gbernal_archive"

def files(directory):
    """Yield the names of regular files (not subdirectories) in directory."""
    for entry in os.listdir(directory):
        if os.path.isfile(os.path.join(directory, entry)):
            yield entry

# Load each CSV into PostgreSQL with run_test() (defined earlier), then archive it
for csvfile in files(path):
    name = os.path.splitext(csvfile)[0]  # "movie.csv" -> "movie"
    fullpath = path + '/' + csvfile
    archive_path = archive + '/' + csvfile
    print(csvfile)
    print(name)
    print(fullpath)
    run_test(fullpath, name)
    shutil.move(fullpath, archive_path)

movie.csv
movie
C:/Users/eltac/Desktop/Week_4/gbernal_data/movie.csv
create table movie(
movieid smallint,
title varchar(82),
genres varchar(47));
Table created successfully in PostgreSQL 
Data uploaded to PostgreSQL 
ratings.csv
ratings
C:/Users/eltac/Desktop/Week_4/gbernal_data/ratings.csv
create table ratings(
userid smallint,
movieid smallint,
rating smallint,
timestamp int);
Table created successfully in PostgreSQL 
Data uploaded to PostgreSQL 
users.csv
users
C:/Users/eltac/Desktop/Week_4/gbernal_data/users.csv
create table users(
userid smallint,
gender varchar(1),
age smallint,
occupation smallint,
zip_code varchar(10));
Table created successfully in PostgreSQL 
Data uploaded to PostgreSQL 
Wall time: 23.9 s
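The loop above follows a load-then-archive pattern: each file is processed and then moved, so a rerun does not load it twice. Below is a minimal sketch of the same pattern using `pathlib`. It rests on two assumptions:

* A hypothetical `load(path, table_name)` callable stands in for `run_test`, whose definition is not shown here.
* Unlike the original loop, the sketch only picks up `*.csv` files.

```python
import shutil
import tempfile
from pathlib import Path


def load_and_archive(data_dir, archive_dir, load):
    """Call load(path, table_name) for each .csv file, then move the file to archive_dir.

    `load` is a hypothetical stand-in for run_test(); returns the table names processed.
    """
    data_dir, archive_dir = Path(data_dir), Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)
    processed = []
    # sorted() materialises the listing before any file is moved and fixes the order
    for csv_path in sorted(data_dir.glob("*.csv")):
        table_name = csv_path.stem  # "movie.csv" -> "movie"
        load(str(csv_path), table_name)
        shutil.move(str(csv_path), str(archive_dir / csv_path.name))
        processed.append(table_name)
    return processed


# Usage: throwaway directories and a loader that only records the table names
calls = []
with tempfile.TemporaryDirectory() as tmp:
    data, archive = Path(tmp, "data"), Path(tmp, "archive")
    data.mkdir()
    for table in ("movie", "ratings", "users"):
        (data / f"{table}.csv").write_text("header\n")
    done = load_and_archive(data, archive, lambda path, name: calls.append(name))
    archived = sorted(p.name for p in archive.iterdir())
    leftover = list(data.iterdir())
print(done, archived, leftover)
```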


In [276]:
query ="""SELECT 
u.Gender,
m.title, 
TO_CHAR(round(AVG(r.rating),2),'FM999999999.00') AS "Average"
FROM movie m 
INNER JOIN ratings r ON m.MovieID = r.MovieID 
INNER JOIN users u ON r.UserID = u.UserID 
WHERE m.title = 'Die Hard (1988)'
AND u.Gender = 'F'
GROUP BY 
u.Gender,
m.title"""

In [277]:
connection = psycopg2.connect(user = User, password = Password, host = Host, port = Port, database = Database)
cursor = connection.cursor()
cursor.execute(query)
connection.commit()
query_results = cursor.fetchall()
cursor.close()
connection.close()
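Each query cell repeats the same connect, execute, fetch and close sequence. Below is a sketch of a hypothetical `run_query` helper that wraps it. It is written against the generic DB-API interface, so it should accept a `psycopg2.connect` partial as well as the standard-library `sqlite3`. The `commit()` is dropped because a `SELECT` changes nothing. One caveat if parameters are added: placeholders are `%s` for psycopg2 but `?` for sqlite3.

```python
import sqlite3
from contextlib import closing
from functools import partial


def run_query(connect, query, params=None):
    """Hypothetical helper: open a connection, run one query, return (column names, rows).

    `connect` is any zero-argument callable returning a DB-API connection, e.g.
    partial(psycopg2.connect, user=User, password=Password, host=Host,
            port=Port, database=Database).
    """
    with closing(connect()) as connection:
        with closing(connection.cursor()) as cursor:
            if params is None:
                cursor.execute(query)
            else:
                cursor.execute(query, params)
            colnames = [desc[0] for desc in cursor.description]
            return colnames, cursor.fetchall()


# Usage with an in-memory SQLite database (each call opens a fresh, empty one)
colnames, rows = run_query(partial(sqlite3.connect, ":memory:"),
                           "SELECT 'Die Hard (1988)' AS title, 5 AS rating")
print(colnames, rows)
```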

In [278]:
for row in query_results:
    print (row)

('F', 'Die Hard (1988)', '3.92')


In [279]:
query ="""SELECT 
u.Gender,
m.title, 
TO_CHAR(round(AVG(r.rating),2),'FM999999999.00') AS "Average"
FROM movie m 
INNER JOIN ratings r ON m.MovieID = r.MovieID 
INNER JOIN users u ON r.UserID = u.UserID 
WHERE m.title = 'Die Hard (1988)'
GROUP BY 
u.Gender,
m.title"""

In [284]:
connection = psycopg2.connect(user = User, password = Password, host = Host, port = Port, database = Database)
cursor = connection.cursor()
cursor.execute(query)
connection.commit()
query_results = cursor.fetchall()
cursor.close()
connection.close()

In [285]:
for row in query_results:
    print (row)

('F', 'Die Hard (1988)', '3.92')
('M', 'Die Hard (1988)', '4.17')
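The assignment targets SQLite, while these cells run against PostgreSQL, where `TO_CHAR` formats the average as text. SQLite has no `TO_CHAR`, so this sketch uses `ROUND(AVG(...), 2)` instead, which returns a number rather than a string. It builds a tiny in-memory database with made-up rows (not MovieLens data) to show the same three-table join and per-gender grouping. `executemany` is one way to do the "multiple insert" with `sqlite3`.

```python
import sqlite3

# Tiny in-memory copy of the three-table layout, filled with made-up rows
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users   (UserID INTEGER, Gender TEXT, Age INTEGER, Occupation INTEGER, Zip_code TEXT);
CREATE TABLE movie   (MovieID INTEGER, Title TEXT, Genres TEXT);
CREATE TABLE ratings (UserID INTEGER, MovieID INTEGER, Rating INTEGER, Timestamp INTEGER);
""")
# executemany() sends many rows through one parameterised INSERT
conn.executemany("INSERT INTO users VALUES (?, ?, ?, ?, ?)",
                 [(1, "F", 25, 4, "00000"), (2, "F", 35, 7, "00000"), (3, "M", 18, 4, "00000")])
conn.executemany("INSERT INTO movie VALUES (?, ?, ?)", [(10, "Example Movie (1999)", "Action")])
conn.executemany("INSERT INTO ratings VALUES (?, ?, ?, ?)",
                 [(1, 10, 4, 0), (2, 10, 3, 0), (3, 10, 5, 0)])

# Same join and grouping as above; ROUND stands in for PostgreSQL's TO_CHAR formatting
query = """
SELECT u.Gender, m.Title, ROUND(AVG(r.Rating), 2) AS Average
FROM movie m
INNER JOIN ratings r ON m.MovieID = r.MovieID
INNER JOIN users u ON r.UserID = u.UserID
WHERE m.Title = ?
GROUP BY u.Gender, m.Title
ORDER BY u.Gender
"""
rows = conn.execute(query, ("Example Movie (1999)",)).fetchall()
conn.close()
for row in rows:
    print(row)
```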


In [290]:
query ="""SELECT 
u.Gender,
m.title, 
TO_CHAR(round(AVG(u.age),2),'FM999999999.00') AS "Average"
FROM movie m 
INNER JOIN ratings r ON m.MovieID = r.MovieID 
INNER JOIN users u ON r.UserID = u.UserID 
WHERE m.title = 'Gone with the Wind (1939)'
AND u.Gender = 'F'
GROUP BY 
u.Gender,
m.title"""

In [291]:
connection = psycopg2.connect(user = User, password = Password, host = Host, port = Port, database = Database)
cursor = connection.cursor()
cursor.execute(query)
connection.commit()
query_results = cursor.fetchall()
cursor.close()
connection.close()

In [292]:
for row in query_results:
    print (row)

('F', 'Gone with the Wind (1939)', '32.90')
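Per the MovieLens 1M README, `Age` is a bracket code rather than an exact age. The `32.90` above is therefore an average of bracket codes. The sketch below shows an alternative: it reports how raters are spread across the labelled brackets. The labels come from the README; the sample codes are illustrative.

```python
from collections import Counter

# Age bracket codes as listed in the MovieLens 1M README
AGE_BRACKETS = {
    1: "Under 18",
    18: "18-24",
    25: "25-34",
    35: "35-44",
    45: "45-49",
    50: "50-55",
    56: "56+",
}


def bracket_counts(age_codes):
    """Count how many raters fall into each labelled age bracket."""
    counts = Counter(AGE_BRACKETS[int(code)] for code in age_codes)
    # Return in bracket order, skipping empty brackets
    return {label: counts[label] for label in AGE_BRACKETS.values() if counts[label]}


# Illustrative codes (strings, as they appear in the parsed users data)
print(bracket_counts(["25", "35", "25", "45", "1"]))
```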


In [298]:
query ="""SELECT 
m.title, 
m.genres, 
u.Gender, 
u.Age, 
u.Occupation, 
u.Zip_Code, 
r.Rating, 
r.Timestamp 
FROM movie m 
INNER JOIN ratings r ON m.MovieID = r.MovieID 
INNER JOIN users u ON r.UserID = u.UserID 
ORDER BY m.Title ASC

"""

In [299]:
connection = psycopg2.connect(user = User, password = Password, host = Host, port = Port, database = Database)
cursor = connection.cursor()
cursor.execute(query)
connection.commit()
query_results = cursor.fetchall()
colnames = [desc[0] for desc in cursor.description]
cursor.close()
connection.close()

In [300]:
# One dict per row, keyed by the column names from cursor.description
results = [dict(zip(colnames, row)) for row in query_results]
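The assignment outline calls for a `MovieReview` dataclass between the cursor rows and the dictionaries, but the cell above goes straight from rows to dicts with `zip`. Below is a minimal sketch of that intermediate step. It assumes the field order of the `SELECT` in `In [298]` and uses `dataclasses.asdict` for the final conversion. The helper names `rows_to_reviews` and `reviews_to_dicts` are hypothetical.

```python
from dataclasses import asdict, dataclass


@dataclass
class MovieReview:
    """One joined review; field order matches the SELECT column order."""
    title: str
    genres: str
    gender: str
    age: int
    occupation: int
    zip_code: str
    rating: int
    timestamp: int


def rows_to_reviews(rows):
    """Translate cursor rows (tuples) into MovieReview objects."""
    return [MovieReview(*row) for row in rows]


def reviews_to_dicts(reviews):
    """Translate MovieReview objects into plain dicts, ready for TinyDB."""
    return [asdict(review) for review in reviews]


# First row of the query output shown below
row = ("'burbs, The (1989)", "Comedy", "F", 35, 1, "95370", 4, 978230923)
reviews = rows_to_reviews([row])
print(reviews_to_dicts(reviews)[0])
```

Because `asdict` keeps field order, the resulting dicts have the same keys as the `zip`-based version and can go straight into `insert_multiple()`.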

In [301]:
results[:]

[{'title': "'burbs, The (1989)",
  'genres': 'Comedy',
  'gender': 'F',
  'age': 35,
  'occupation': 1,
  'zip_code': '95370',
  'rating': 4,
  'timestamp': 978230923},
 {'title': "'burbs, The (1989)",
  'genres': 'Comedy',
  'gender': 'M',
  'age': 18,
  'occupation': 15,
  'zip_code': '53706',
  'rating': 4,
  'timestamp': 978153321},
 {'title': "'burbs, The (1989)",
  'genres': 'Comedy',
  'gender': 'F',
  'age': 18,
  'occupation': 0,
  'zip_code': '02135',
  'rating': 2,
  'timestamp': 978104190},
 {'title': "'burbs, The (1989)",
  'genres': 'Comedy',
  'gender': 'M',
  'age': 50,
  'occupation': 17,
  'zip_code': '57747',
  'rating': 5,
  'timestamp': 979576562},
 {'title': "'burbs, The (1989)",
  'genres': 'Comedy',
  'gender': 'F',
  'age': 18,
  'occupation': 4,
  'zip_code': '44243',
  'rating': 4,
  'timestamp': 986189513},
 {'title': "'burbs, The (1989)",
  'genres': 'Comedy',
  'gender': 'M',
  'age': 18,
  'occupation': 4,
  'zip_code': '92802',
  'rating': 3,
  ...

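Per the MovieLens README, the `timestamp` values are seconds since the Unix epoch. This short sketch converts one from the output above into a readable UTC datetime using only the standard library.

```python
from datetime import datetime, timezone


def to_utc(ts):
    """Convert a MovieLens epoch-seconds timestamp to an aware UTC datetime."""
    return datetime.fromtimestamp(ts, tz=timezone.utc)


# Timestamp of the first 'burbs review shown above
print(to_utc(978230923).isoformat())  # 2000-12-31T02:48:43+00:00
```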
In [303]:
from tinydb import TinyDB, Query, where

tiny_db = TinyDB("gbernal_movie.json")

In [304]:
tiny_db.insert_multiple(results)

[1,
 2,
 3,
 ...
 184,
 185

In [315]:
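# matches() is a regex test anchored only at the start, so 'Die Hard' also selects
# any title that begins with "Die Hard" (e.g. sequels), not just 'Die Hard (1988)'
female_dh_set = tiny_db.search( (where('title').matches('Die Hard')) & (where('gender').matches('F')) )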

In [317]:
dh_avg_f = sum(int(r['rating']) for r in female_dh_set) / len(female_dh_set)

In [319]:
print(f'Average: {dh_avg_f}')

Average: 3.7107438016528924
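The TinyDB average (3.71) differs from the SQL result for female raters of `Die Hard (1988)` (3.92). A likely reason is that TinyDB's `matches()` checks values with Python's `re.match`, which anchors only at the start of the string; this is worth confirming in the TinyDB docs for the installed version. As a result, `'Die Hard'` also matches longer titles that start with those words.

The stdlib-only sketch below compares a prefix regex, `re.fullmatch`, and plain equality on hypothetical sample documents. It also guards the average against an empty result. In TinyDB itself, the exact-match form would be `where('title') == 'Die Hard (1988)'`.

```python
import re
from statistics import mean

# Hypothetical sample documents shaped like the TinyDB records
docs = [
    {"title": "Die Hard (1988)", "gender": "F", "rating": 4},
    {"title": "Die Hard 2 (1990)", "gender": "F", "rating": 3},
    {"title": "Die Hard: With a Vengeance (1995)", "gender": "F", "rating": 3},
    {"title": "Die Hard (1988)", "gender": "M", "rating": 5},
]


def average_rating(matching):
    """Mean rating of the matching documents, or None when nothing matched."""
    ratings = [doc["rating"] for doc in matching]
    return mean(ratings) if ratings else None


# re.match anchors only at the start, so the longer titles match too
prefix = [d for d in docs if re.match("Die Hard", d["title"]) and d["gender"] == "F"]
# re.fullmatch must consume the whole string (parentheses escaped)
full = [d for d in docs if re.fullmatch(r"Die Hard \(1988\)", d["title"]) and d["gender"] == "F"]
# Plain equality is the simplest exact match
exact = [d for d in docs if d["title"] == "Die Hard (1988)" and d["gender"] == "F"]

print(len(prefix), average_rating(prefix))  # 3 matches
print(len(exact), average_rating(exact))    # 1 match
```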
