# Date Night Movie

#### Grading:


- Code: 90 pts
- Markdown Documentation: 10 pts


In this assignment we use pandas to answer one question: what's the best **date-night movie**?

This assignment uses:
- Joining
- Groupby
- Sorting


In [1]:
import os
import pandas as pd

##### Read in the movie data: `pd.read_table`

In [2]:
def get_movie_data():
    # '::' is a multi-character separator, so ask for the Python parser
    # explicitly; otherwise pandas falls back to it with a ParserWarning
    unames = ['user_id','gender','age','occupation','zip']
    users = pd.read_table(os.path.join('../../Assignment2/data/users.dat'),
                          sep='::', header=None, names=unames, engine='python')

    rnames = ['user_id', 'movie_id', 'rating', 'timestamp']
    ratings = pd.read_table(os.path.join('../../Assignment2/data/ratings.dat'),
                            sep='::', header=None, names=rnames, engine='python')

    mnames = ['movie_id', 'title','genres']
    movies = pd.read_table(os.path.join('../../Assignment2/data/movies.dat'),
                           sep='::', header=None, names=mnames, engine='python')

    return users, ratings, movies
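To see how a `::`-delimited file is parsed without needing the data files on disk, the sketch below reads two rows from an in-memory buffer. The rows are copied from the `users.head()` output in this notebook. `pd.read_csv` is used here in place of `pd.read_table`, which accepts the same `sep`, `names` and `engine` arguments. A separator longer than one character is treated as a regular expression, which is why the Python engine is requested.

```python
import io
import pandas as pd

# Two rows in the same '::'-delimited layout as users.dat
sample = io.StringIO("1::F::1::10::48067\n2::M::56::16::70072\n")

unames = ["user_id", "gender", "age", "occupation", "zip"]

# Multi-character separators need engine="python"
sample_users = pd.read_csv(sample, sep="::", header=None,
                           names=unames, engine="python")
print(sample_users)
```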

In [3]:
users, ratings, movies = get_movie_data()



In [4]:
users.head()

Unnamed: 0,user_id,gender,age,occupation,zip
0,1,F,1,10,48067
1,2,M,56,16,70072
2,3,M,25,15,55117
3,4,M,45,7,2460
4,5,M,25,20,55455


In [5]:
ratings.head()

Unnamed: 0,user_id,movie_id,rating,timestamp
0,1,1193,5,978300760
1,1,661,3,978302109
2,1,914,3,978301968
3,1,3408,4,978300275
4,1,2355,5,978824291


In [6]:
movies.head()

Unnamed: 0,movie_id,title,genres
0,1,Toy Story (1995),"Animation|Children's|Comedy,,"
1,2,Jumanji (1995),"Adventure|Children's|Fantasy,,"
2,3,Grumpier Old Men (1995),"Comedy|Romance,,"
3,4,Waiting to Exhale (1995),"Comedy|Drama,,"
4,5,Father of the Bride Part II (1995),"Comedy,,"


##### Clean up the `movies`

- Get the `year`
- Shorten the `title`


In [7]:
# raw string so the escaped parentheses are passed to the regex unchanged
tmp = movies.title.str.extract(r'(.*) \(([0-9]+)\)')
tmp.apply(lambda x:x[0] if len(x) > 0 else None)
tmp.apply(lambda x: x[0][:40] if len(x) > 0 else None)

0    Toy Story
1         1995
dtype: object

In [8]:
movies['year'] = tmp[1]
movies['short_title'] = tmp[0]

In [9]:
movies.head()

Unnamed: 0,movie_id,title,genres,year,short_title
0,1,Toy Story (1995),"Animation|Children's|Comedy,,",1995,Toy Story
1,2,Jumanji (1995),"Adventure|Children's|Fantasy,,",1995,Jumanji
2,3,Grumpier Old Men (1995),"Comedy|Romance,,",1995,Grumpier Old Men
3,4,Waiting to Exhale (1995),"Comedy|Drama,,",1995,Waiting to Exhale
4,5,Father of the Bride Part II (1995),"Comedy,,",1995,Father of the Bride Part II


##### Join the tables with `pd.merge` (20 pts)

In [10]:
#merging all the data together
rating_and_users = pd.merge(ratings,users,on='user_id')
merged_data = pd.merge(rating_and_users,movies,on='movie_id')
merged_data.tail()

Unnamed: 0,user_id,movie_id,rating,timestamp,gender,age,occupation,zip,title,genres,year,short_title
1000204,5949,2198,5,958846401,M,18,17,47901,Modulations (1998),"Documentary,,",1998,Modulations
1000205,5675,2703,3,976029116,M,35,14,30030,Broken Vessels (1998),"Drama,,",1998,Broken Vessels
1000206,5780,2845,1,958153068,M,18,17,92886,White Boys (1999),"Drama,,",1999,White Boys
1000207,5851,3607,5,957756608,F,18,20,55410,One Little Indian (1973),"Comedy|Drama|Western,,",1973,One Little Indian
1000208,5938,2909,4,957273353,M,25,1,35401,"Five Wives, Three Secretaries and Me (1998)","Documentary,",1998,"Five Wives, Three Secretaries and Me"
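One way to guard a join like this against silently duplicated rows is the `validate` argument of `pd.merge`. The sketch below uses hypothetical toy tables and assumes each `user_id` should appear only once on the users side.

```python
import pandas as pd

toy_ratings = pd.DataFrame({
    "user_id": [1, 1, 2],
    "movie_id": [10, 20, 10],
    "rating": [5, 3, 4],
})
toy_users = pd.DataFrame({"user_id": [1, 2], "gender": ["F", "M"]})

# validate="many_to_one" raises a MergeError if user_id is not unique
# in the right-hand table, instead of quietly multiplying rows
checked = pd.merge(toy_ratings, toy_users, on="user_id",
                   how="inner", validate="many_to_one")

# With a unique key on the right, an inner join cannot add rows
print(len(toy_ratings), len(checked))
```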


##### What's the highest rated movie? (20 pts)

In [11]:
# best mean rating seen so far
temp_rating = 0.0
# movie_id of the movie with the best mean rating
highest_rating_movie = 0

# traverse every possible movie_id
for x in range(0, merged_data.movie_id.max()+1):
    movie_list = merged_data.loc[merged_data['movie_id'] == x]

    # if this movie's mean rating beats the best so far, remember it
    # (ids with no ratings give a NaN mean, which never compares greater)
    if movie_list.rating.mean() > temp_rating:
        temp_rating = movie_list.rating.mean()
        highest_rating_movie = x

print(highest_rating_movie)
print(merged_data.loc[merged_data['movie_id'] == highest_rating_movie].head())

787
        user_id  movie_id  rating  timestamp gender  age  occupation    zip  \
965717      149       787       5  977325719      M   25           1  29205   
965718     2825       787       5  972610193      F   25          20  94014   
965719     2872       787       5  972423586      M   25          20  94014   

                                     title        genres  year  \
965717  Gate of Heavenly Peace, The (1995)  Documentary,  1995   
965718  Gate of Heavenly Peace, The (1995)  Documentary,  1995   
965719  Gate of Heavenly Peace, The (1995)  Documentary,  1995   

                        short_title  
965717  Gate of Heavenly Peace, The  
965718  Gate of Heavenly Peace, The  
965719  Gate of Heavenly Peace, The  
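The winner above shows only three rows from `head()`, which suggests its perfect mean rests on very few ratings. A possible refinement, sketched here on made-up data, is to compute the mean and the count together with `groupby` and to require a minimum number of ratings. The threshold `min_ratings` is an assumption, not part of the assignment.

```python
import pandas as pd

# Hypothetical ratings: movie 2 has a perfect mean but only two ratings
toy = pd.DataFrame({
    "movie_id": [1, 1, 1, 1, 2, 2],
    "rating":   [4, 5, 4, 5, 5, 5],
})

# Mean and number of ratings per movie in one pass
stats = toy.groupby("movie_id")["rating"].agg(["mean", "count"])

min_ratings = 3  # assumed threshold for a trustworthy average
popular = stats[stats["count"] >= min_ratings]

# Highest mean among movies with enough ratings
best_id = popular["mean"].idxmax()
print(stats)
print(best_id)
```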


##### What is a well-rated movie for date night? (60 pts)

- Hint: a good pick is a movie that is highly rated
    - by both partners (who may or may not be the same gender),
    - within the genres they prefer,
    - and, optionally, within their age group.
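One possible reading of this hint, sketched on made-up data: restrict the ratings to one age value, average per title and gender with `pivot_table`, and score each title by the lower of the two averages so that a movie only ranks well if both partners like it. Taking the minimum is an assumption made for illustration. Other combinations, such as a sum, are equally defensible.

```python
import pandas as pd

# Hypothetical ratings with the same column names as the merged table
toy = pd.DataFrame({
    "title":  ["A", "A", "A", "A", "B", "B", "B", "B"],
    "gender": ["F", "M", "F", "M", "F", "M", "F", "M"],
    "age":    [25, 25, 25, 35, 25, 25, 25, 25],
    "rating": [5, 3, 5, 5, 4, 4, 4, 5],
})

# Keep one age value only
same_age = toy[toy["age"] == 25]

# Rows are titles, columns are genders, cells are mean ratings
by_gender = same_age.pivot_table(values="rating", index="title",
                                 columns="gender", aggfunc="mean")

# A title is only as good as its less enthusiastic audience
by_gender["both"] = by_gender[["F", "M"]].min(axis=1)
date_pick = by_gender["both"].idxmax()
print(by_gender)
print(date_pick)
```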

In [12]:
# list the distinct genre strings to see which genres are available
merged_data.genres.unique()

array(['Drama,,', "Animation|Children's|Musical,,", 'Musical|Romance,,',
       "Animation|Children's|Comedy,", 'Action|Adventure|Comedy|Romance,',
       'Action|Adventure|Drama,,', 'Comedy|Drama,',
       "Adventure|Children's|Drama|Musical,", 'Musical,,', 'Comedy,,',
       'Musical,', "Animation|Children's,,", 'Comedy|Fantasy,,',
       'Animation,,', 'Comedy|Sci-Fi,,', 'Drama|War,,', 'Romance,,',
       "Animation|Children's|Musical|Romance,,",
       "Children's|Drama|Fantasy|Sci-Fi,,", 'Drama|Romance,,',
       'Animation|Comedy|Thriller,', 'Drama,',
       "Adventure|Animation|Children's|Comedy|Musical,,",
       "Animation|Children's|Comedy|Musical,,",
       "Animation|Children's|Musical,", 'Thriller,',
       "Animation|Children's|Comedy,,", 'Action|Crime|Romance,,',
       'Action|Adventure|Fantasy|Sci-Fi,,', "Children's|Comedy|Musical,,",
       'Action|Drama|War,,', "Children's|Drama,",
       'Crime|Drama|Thriller,,', 'Action|Crime|Drama,',
       'Action|Adventure|Myste

In [13]:
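# keep the rows for the genre being searched for
# note: writing 'Drama' and 'Sci-Fi' as the pattern evaluates to just 'Sci-Fi'
# in Python, so the pattern is spelled out; Drama is not checked by this filter
genre = merged_data.loc[merged_data['genres'].str.contains('Sci-Fi')]

# split the rows by the rater's gender
male = genre.loc[genre['gender'] == 'M']
female = genre.loc[genre['gender'] == 'F']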
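A common pitfall when filtering on two genres is to write `'Drama' and 'Sci-Fi'` as the pattern. Python evaluates that expression first and hands only `'Sci-Fi'` to `str.contains`. The sketch below, on a made-up genre column, shows this and two explicit alternatives: one for movies tagged with both genres, one for movies tagged with either.

```python
import pandas as pd

genres = pd.Series(["Comedy|Sci-Fi", "Drama", "Drama|Sci-Fi", "Comedy"])

# `and` between two non-empty strings returns the second string
pattern = 'Drama' and 'Sci-Fi'
print(pattern)

# One boolean mask per genre; regex=False matches the text literally
is_drama = genres.str.contains("Drama", regex=False)
is_scifi = genres.str.contains("Sci-Fi", regex=False)

both = genres[is_drama & is_scifi]     # tagged Drama AND Sci-Fi
either = genres[is_drama | is_scifi]   # tagged Drama OR Sci-Fi
print(list(both))
print(list(either))
```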

In [14]:
genre.head()

Unnamed: 0,user_id,movie_id,rating,timestamp,gender,age,occupation,zip,title,genres,year,short_title
23270,1,1270,5,978300055,F,1,10,48067,Back to the Future (1985),"Comedy|Sci-Fi,,",1985,Back to the Future
23271,3,1270,3,978298231,M,25,15,55117,Back to the Future (1985),"Comedy|Sci-Fi,,",1985,Back to the Future
23272,7,1270,4,978234581,M,35,1,6810,Back to the Future (1985),"Comedy|Sci-Fi,,",1985,Back to the Future
23273,10,1270,4,978225735,F,35,1,95370,Back to the Future (1985),"Comedy|Sci-Fi,,",1985,Back to the Future
23274,17,1270,5,978158536,M,50,1,95350,Back to the Future (1985),"Comedy|Sci-Fi,,",1985,Back to the Future


In [15]:
male.head()

Unnamed: 0,user_id,movie_id,rating,timestamp,gender,age,occupation,zip,title,genres,year,short_title
23271,3,1270,3,978298231,M,25,15,55117,Back to the Future (1985),"Comedy|Sci-Fi,,",1985,Back to the Future
23272,7,1270,4,978234581,M,35,1,6810,Back to the Future (1985),"Comedy|Sci-Fi,,",1985,Back to the Future
23274,17,1270,5,978158536,M,50,1,95350,Back to the Future (1985),"Comedy|Sci-Fi,,",1985,Back to the Future
23275,19,1270,5,978557301,M,1,10,48073,Back to the Future (1985),"Comedy|Sci-Fi,,",1985,Back to the Future
23276,22,1270,4,978152904,M,18,15,53706,Back to the Future (1985),"Comedy|Sci-Fi,,",1985,Back to the Future


In [16]:
female.head()

Unnamed: 0,user_id,movie_id,rating,timestamp,gender,age,occupation,zip,title,genres,year,short_title
23270,1,1270,5,978300055,F,1,10,48067,Back to the Future (1985),"Comedy|Sci-Fi,,",1985,Back to the Future
23273,10,1270,4,978225735,F,35,1,95370,Back to the Future (1985),"Comedy|Sci-Fi,,",1985,Back to the Future
23277,24,1270,4,978131939,F,25,7,10023,Back to the Future (1985),"Comedy|Sci-Fi,,",1985,Back to the Future
23282,34,1270,5,978102954,F,18,0,2135,Back to the Future (1985),"Comedy|Sci-Fi,,",1985,Back to the Future
23286,45,1270,5,977990190,F,45,16,94110,Back to the Future (1985),"Comedy|Sci-Fi,,",1985,Back to the Future


In [17]:
# best combined (male + female) mean rating seen so far
temp = 0.0
# movie_id of the movie with the best combined rating
highest_rating_both = 0

# go through every possible movie_id
for x in range(0, genre.movie_id.max()+1):
    males = male.loc[male['movie_id'] == x]
    females = female.loc[female['movie_id'] == x]

    # only consider movies with at least 10 ratings
    if len(genre[genre['movie_id'] == x].rating) >= 10:
        # compute the male and female mean ratings separately, then add them
        if males.rating.mean() + females.rating.mean() > temp:
            temp = males.rating.mean() + females.rating.mean()
            highest_rating_both = x

print(highest_rating_both)
print(merged_data.loc[merged_data['movie_id'] == highest_rating_both].head())

750
        user_id  movie_id  rating  timestamp gender  age  occupation    zip  \
454160       10       750       4  979775386      F   35           1  95370   
454161       17       750       5  978160490      M   50           1  95350   
454162       23       750       4  978463892      M   35           0  90049   
454163       40       750       4  978041037      M   45           0  10543   
454164       42       750       4  978039170      M   25           8  24502   

                                                    title        genres  year  \
454160  Dr. Strangelove or: How I Learned to Stop Worr...  Sci-Fi|War,,  1963   
454161  Dr. Strangelove or: How I Learned to Stop Worr...  Sci-Fi|War,,  1963   
454162  Dr. Strangelove or: How I Learned to Stop Worr...  Sci-Fi|War,,  1963   
454163  Dr. Strangelove or: How I Learned to Stop Worr...  Sci-Fi|War,,  1963   
454164  Dr. Strangelove or: How I Learned to Stop Worr...  Sci-Fi|War,,  1963   

                                  

This program has three parts. The first reads the three data files and merges them into a single table. The second finds the highest-rated movie in the data set. The third recommends a date-night movie based on both male and female preferences. I chose the genres Drama and Sci-Fi, although the filter as run selects on Sci-Fi only, which is why the result, Dr. Strangelove, is tagged Sci-Fi|War. For every movie in that selection with at least 10 ratings, the program computes the average rating given by men and the average given by women, adds the two, and reports the movie with the highest combined score as the recommendation.
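The per-movie loop described above can also be expressed with `groupby`, which avoids scanning the table once per `movie_id`. The following is a minimal sketch on made-up data, not the notebook's actual code. The threshold is lowered from 10 to 4 so that the toy table has eligible movies.

```python
import pandas as pd

# Hypothetical ratings: movie 3 has the best scores but too few ratings
toy = pd.DataFrame({
    "movie_id": [1, 1, 1, 1, 2, 2, 2, 2, 3, 3],
    "gender":   ["F", "M", "F", "M", "F", "M", "F", "M", "F", "M"],
    "rating":   [4, 4, 5, 3, 5, 4, 4, 5, 5, 5],
})

min_ratings = 4  # the notebook uses 10; lowered for the toy data

# Number of ratings per movie, regardless of gender
counts = toy.groupby("movie_id")["rating"].count()

# Mean rating per movie and gender, with one column per gender
means = toy.groupby(["movie_id", "gender"])["rating"].mean().unstack("gender")

# Keep movies with enough ratings, then add the two gender means
eligible = means[counts >= min_ratings]
combined = eligible["F"] + eligible["M"]
best = combined.idxmax()
print(combined)
print(best)
```

A movie rated by only one gender gets a missing value in the other column, so its combined score is missing and `idxmax` skips it, which mirrors how the loop's comparison ignores such movies.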