# Date Night Movie

#### Grading:


- Code: 90 pts
- Markdown Documentation: 10 pts


In this assignment we will use pandas to answer one question: what is the best **date-night movie**?

The assignment exercises three pandas operations:
- Joining
- Groupby
- Sorting


In [6]:
import os
import pandas as pd

##### Read in the movie data: `pd.read_table`

In [162]:
def get_movie_data():
    """Read the users, ratings, and movies tables from ../data."""
    # '::' is a multi-character separator, so request the Python parser
    # explicitly; otherwise pandas falls back to it with a ParserWarning.
    read_opts = dict(sep='::', header=None, encoding='ISO-8859-1', engine='python')

    unames = ['user_id', 'gender', 'age', 'occupation', 'zip']
    users = pd.read_table(os.path.join('../data', 'users.dat'), names=unames, **read_opts)

    rnames = ['user_id', 'movie_id', 'rating', 'timestamp']
    ratings = pd.read_table(os.path.join('../data', 'ratings.dat'), names=rnames, **read_opts)

    mnames = ['movie_id', 'title', 'genres']
    movies = pd.read_table(os.path.join('../data', 'movies.dat'), names=mnames, **read_opts)

    return users, ratings, movies

In [163]:
users, ratings, movies = get_movie_data()



In [165]:
print(users.head())

   user_id gender  age  occupation    zip
0        1      F    1          10  48067
1        2      M   56          16  70072
2        3      M   25          15  55117
3        4      M   45           7  02460
4        5      M   25          20  55455


In [147]:
print(ratings.head())

   user_id  movie_id  rating  timestamp
0        1      1193       5  978300760
1        1       661       3  978302109
2        1       914       3  978301968
3        1      3408       4  978300275
4        1      2355       5  978824291


In [148]:
print(movies.head())

   movie_id                               title                        genres
0         1                    Toy Story (1995)   Animation|Children's|Comedy
1         2                      Jumanji (1995)  Adventure|Children's|Fantasy
2         3             Grumpier Old Men (1995)                Comedy|Romance
3         4            Waiting to Exhale (1995)                  Comedy|Drama
4         5  Father of the Bride Part II (1995)                        Comedy


##### Clean up the `movies`

- Get the `year`
- Shorten the `title`


In [149]:
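# Split "Title (Year)" into two columns: 0 = title, 1 = year.
# A raw string keeps the regex backslashes from being read as string escapes.
tmp = movies.title.str.extract(r'(.*) \(([0-9]+)\)')
# Preview the first entry of each column (truncated to 40 characters)
tmp.apply(lambda x: x[0][:40] if len(x) > 0 else None)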

0    Toy Story
1         1995
dtype: object

In [150]:
movies['year'] = tmp[1]
# Cap the title at 40 characters so it stays readable in wide tables
movies['short_title'] = tmp[0].str[:40]

In [151]:
movies.head()

| | movie_id | title | genres | year | short_title |
|---|---|---|---|---|---|
| 0 | 1 | Toy Story (1995) | Animation\|Children's\|Comedy | 1995 | Toy Story |
| 1 | 2 | Jumanji (1995) | Adventure\|Children's\|Fantasy | 1995 | Jumanji |
| 2 | 3 | Grumpier Old Men (1995) | Comedy\|Romance | 1995 | Grumpier Old Men |
| 3 | 4 | Waiting to Exhale (1995) | Comedy\|Drama | 1995 | Waiting to Exhale |
| 4 | 5 | Father of the Bride Part II (1995) | Comedy | 1995 | Father of the Bride Part II |
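
The cleanup step above can also be sketched with named regex groups, which label the extracted columns directly. This is a minimal illustration on a hypothetical three-row sample rather than the real `movies` table, and it assumes every title ends in a four-digit year in parentheses (a title that does not match would yield `NaN` and make the integer conversion fail).

```python
import pandas as pd

# Hypothetical sample in the same "Title (Year)" format as movies.dat
sample = pd.DataFrame({'title': ['Toy Story (1995)',
                                 'Jumanji (1995)',
                                 'Getaway, The (1994)']})

# Named groups become the column names of the extracted frame
parts = sample['title'].str.extract(r'^(?P<short_title>.*) \((?P<year>[0-9]{4})\)$')

sample['short_title'] = parts['short_title'].str[:40]  # cap at 40 characters
sample['year'] = parts['year'].astype(int)             # numeric year sorts and filters as a number

print(sample)
```

Storing `year` as an integer is a design choice, not something the assignment requires; it only matters if you later sort or filter by year.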


##### Join the tables with `pd.merge` (20 pts)

### Joining

Merge `users` and `ratings` on `user_id` to get `users_ratings`:

In [153]:
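# Inner join on user_id. Passing the string 'true' as indicator adds a column
# literally named "true" that records where each row came from;
# indicator=True would name that column "_merge" instead.
users_ratings = pd.merge(users, ratings, on=['user_id'], how='inner', indicator='true')
users_ratings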

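| | user_id | gender | age | occupation | zip | movie_id | rating | timestamp | true |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | F | 1 | 10 | 48067 | 1193 | 5 | 978300760 | both |
| 1 | 1 | F | 1 | 10 | 48067 | 661 | 3 | 978302109 | both |
| 2 | 1 | F | 1 | 10 | 48067 | 914 | 3 | 978301968 | both |
| 3 | 1 | F | 1 | 10 | 48067 | 3408 | 4 | 978300275 | both |
| 4 | 1 | F | 1 | 10 | 48067 | 2355 | 5 | 978824291 | both |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1000204 | 6040 | M | 25 | 6 | 11106 | 1091 | 1 | 956716541 | both |
| 1000205 | 6040 | M | 25 | 6 | 11106 | 1094 | 5 | 956704887 | both |
| 1000206 | 6040 | M | 25 | 6 | 11106 | 562 | 5 | 956704746 | both |
| 1000207 | 6040 | M | 25 | 6 | 11106 | 1096 | 4 | 956715648 | both |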
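
With an inner join the indicator column can only ever read `both`, as the output above shows. The indicator is more informative on an outer join, where it flags rows that failed to match. The sketch below uses two small hypothetical frames (`toy_users`, `toy_ratings`), not the assignment data, to show that check along with the `validate` argument of `pd.merge`.

```python
import pandas as pd

# Hypothetical toy frames: user 3 has no ratings, user 4 has no profile
toy_users = pd.DataFrame({'user_id': [1, 2, 3], 'gender': ['F', 'M', 'M']})
toy_ratings = pd.DataFrame({'user_id': [1, 1, 2, 4],
                            'movie_id': [10, 20, 10, 30],
                            'rating': [5, 3, 4, 2]})

# Outer join keeps unmatched rows; indicator=True adds a "_merge" column
checked = pd.merge(toy_users, toy_ratings, on='user_id', how='outer', indicator=True)
unmatched = checked[checked['_merge'] != 'both']
print(unmatched)

# validate raises MergeError if user_id is not unique on the users side
joined = pd.merge(toy_users, toy_ratings, on='user_id', how='inner',
                  validate='one_to_many')
print(len(joined))
```

If `unmatched` is empty on the real data, the inner join dropped nothing.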


Merge `users_ratings` and `movies` on `movie_id` to get `users_ratings_movies`:

In [154]:
users_ratings_movies = pd.merge(users_ratings, movies, on=['movie_id'], how='inner')
users_ratings_movies 

| | user_id | gender | age | occupation | zip | movie_id | rating | timestamp | true | title | genres | year | short_title |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | F | 1 | 10 | 48067 | 1193 | 5 | 978300760 | both | One Flew Over the Cuckoo's Nest (1975) | Drama | 1975 | One Flew Over the Cuckoo's Nest |
| 1 | 2 | M | 56 | 16 | 70072 | 1193 | 5 | 978298413 | both | One Flew Over the Cuckoo's Nest (1975) | Drama | 1975 | One Flew Over the Cuckoo's Nest |
| 2 | 12 | M | 25 | 12 | 32793 | 1193 | 4 | 978220179 | both | One Flew Over the Cuckoo's Nest (1975) | Drama | 1975 | One Flew Over the Cuckoo's Nest |
| 3 | 15 | M | 25 | 7 | 22903 | 1193 | 4 | 978199279 | both | One Flew Over the Cuckoo's Nest (1975) | Drama | 1975 | One Flew Over the Cuckoo's Nest |
| 4 | 17 | M | 50 | 1 | 95350 | 1193 | 5 | 978158471 | both | One Flew Over the Cuckoo's Nest (1975) | Drama | 1975 | One Flew Over the Cuckoo's Nest |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1000204 | 5949 | M | 18 | 17 | 47901 | 2198 | 5 | 958846401 | both | Modulations (1998) | Documentary | 1998 | Modulations |
| 1000205 | 5675 | M | 35 | 14 | 30030 | 2703 | 3 | 976029116 | both | Broken Vessels (1998) | Drama | 1998 | Broken Vessels |
| 1000206 | 5780 | M | 18 | 17 | 92886 | 2845 | 1 | 958153068 | both | White Boys (1999) | Drama | 1999 | White Boys |
| 1000207 | 5851 | F | 18 | 20 | 55410 | 3607 | 5 | 957756608 | both | One Little Indian (1973) | Comedy\|Drama\|Western | 1973 | One Little Indian |


### Getting number of users and movies from the dataset.

In [155]:
user_ids = ratings.user_id.unique().tolist()
movie_ids = ratings.movie_id.unique().tolist()
print('Number of Users: {}'.format(len(user_ids)))
print('Number of Movies: {}'.format(len(movie_ids)))

Number of Users: 6040
Number of Movies: 3706


In [87]:
users_ratings_movies[users_ratings_movies.genres=='Action']

| | user_id | gender | age | occupation | zip | movie_id | rating | timestamp | title | genres | year | short_title | true |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 163411 | 2 | M | 56 | 16 | 70072 | 459 | 3 | 978300002 | Getaway, The (1994) | Action | 1994 | Getaway, The | both |
| 163412 | 13 | M | 45 | 1 | 93304 | 459 | 3 | 978202039 | Getaway, The (1994) | Action | 1994 | Getaway, The | both |
| 163413 | 147 | M | 18 | 4 | 91360 | 459 | 3 | 977337008 | Getaway, The (1994) | Action | 1994 | Getaway, The | both |
| 163414 | 148 | M | 50 | 17 | 57747 | 459 | 3 | 977334058 | Getaway, The (1994) | Action | 1994 | Getaway, The | both |
| 163415 | 163 | M | 18 | 4 | 85013 | 459 | 2 | 977220007 | Getaway, The (1994) | Action | 1994 | Getaway, The | both |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 999790 | 5990 | F | 25 | 20 | 90046 | 3283 | 4 | 956870190 | Minnie and Moskowitz (1971) | Action | 1971 | Minnie and Moskowitz | both |
| 1000116 | 3765 | F | 50 | 3 | 34744 | 2258 | 2 | 966086510 | Master Ninja I (1984) | Action | 1984 | Master Ninja I | both |
| 1000117 | 5717 | M | 25 | 0 | 03766 | 2258 | 4 | 958509389 | Master Ninja I (1984) | Action | 1984 | Master Ninja I | both |
| 1000183 | 5059 | M | 45 | 16 | 22652 | 1434 | 4 | 962484364 | Stranger, The (1994) | Action | 1994 | Stranger, The | both |


##### What's the highest rated movie? (20 pts)
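
One possible approach, sketched here on hypothetical toy ratings rather than the real data: group by title, compute the mean rating and the number of ratings, drop titles with too few ratings, then sort. The `MIN_RATINGS` cutoff is an assumption for illustration; without some cutoff, a movie with a single 5-star rating would rank first.

```python
import pandas as pd

# Hypothetical toy ratings; on the real data this would be users_ratings_movies
toy = pd.DataFrame({
    'title':  ['A', 'A', 'A', 'B', 'B', 'C'],
    'rating': [5, 4, 5, 3, 4, 5],
})

MIN_RATINGS = 2  # assumed cutoff; pick a larger value for the full dataset

# Groupby: one row per title with its mean rating and rating count
stats = toy.groupby('title')['rating'].agg(mean_rating='mean', n_ratings='count')

# Filter out thinly rated titles, then sort best first
top = (stats[stats['n_ratings'] >= MIN_RATINGS]
       .sort_values('mean_rating', ascending=False))

print(top.head())
```

In this toy example, title `C` has a perfect mean but only one rating, so it is excluded and `A` ranks first.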

###### What is a well-rated movie for date night? (60 pts)

- Hint: look for a movie that is highly rated
    - by both partners (who may or may not be the same gender),
    - within the genres both partners prefer,
    - optionally within a shared age group as well.
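
The hint above leaves the scoring rule open. One way to read it, sketched on hypothetical toy data: compute each movie's mean rating per group, keep movies with enough ratings from every group, and rank by the lower of the group means so that a movie only scores well if both sides like it. The `MIN_PER_GROUP` cutoff and the "worst case" score are assumptions, not the required solution; the same pattern works with `age` in place of `gender`, or after filtering `genres`.

```python
import pandas as pd

# Hypothetical toy ratings; on the real data this would be users_ratings_movies
toy = pd.DataFrame({
    'title':  ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'C', 'C'],
    'gender': ['F', 'F', 'M', 'M', 'F', 'F', 'M', 'M', 'F', 'M'],
    'rating': [5, 4, 4, 4, 5, 5, 2, 3, 5, 5],
})

MIN_PER_GROUP = 2  # assumed cutoff per gender; raise it for the full dataset

# Mean and count per (title, gender), with one column per gender
stats = (toy.groupby(['title', 'gender'])['rating']
            .agg(['mean', 'count'])
            .unstack('gender'))

# Keep titles that have enough ratings from every group
enough = (stats['count'] >= MIN_PER_GROUP).all(axis=1)
means = stats['mean'][enough]

# Score each title by its lower group mean, then sort best first
date_night = (means.assign(worst_case=means.min(axis=1))
                   .sort_values('worst_case', ascending=False))

print(date_night)
```

Here `B` is loved by one group but not the other, so `A` ranks above it, and `C` is dropped for having too few ratings.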