# Date Night Movie

#### Grading:


- Code: 6.5 pts
- Comments: 0.5 pts
- Markdown Documentation: 0.5 pts


In this assignment we use pandas to answer one question: what's the best **date-night movie**?

This assignment uses:
- Joining
- Groupby
- Sorting


In [1]:
import os
import pandas as pd

##### Read in the movie data: `pd.read_table`

In [2]:
def get_movie_data():
    
    '''
    Function to read in movie related data
    
    Parameters
    ----------
    None
    
    Returns
    -------
    users: pd.DataFrame containing user data
    ratings: pd.DataFrame containing rating data
    movies: pd.DataFrame containing movie data
    '''
    
    # The .dat files use a multi-character '::' separator, which only the
    # Python parsing engine supports; naming it explicitly avoids a ParserWarning.
    unames = ['user_id', 'gender', 'age', 'occupation', 'zip']
    users = pd.read_csv(os.path.join('../data', 'users.dat'),
                        sep='::', header=None, names=unames, engine='python')

    rnames = ['user_id', 'movie_id', 'rating', 'timestamp']
    ratings = pd.read_csv(os.path.join('../data', 'ratings.dat'),
                          sep='::', header=None, names=rnames, engine='python')

    mnames = ['movie_id', 'title', 'genres']
    movies = pd.read_csv(os.path.join('../data', 'movies.dat'),
                         sep='::', header=None, names=mnames, engine='python')

    return users, ratings, movies

In [3]:
users, ratings , movies = get_movie_data()



In [4]:
users.head()

Unnamed: 0,user_id,gender,age,occupation,zip
0,1,F,1,10,48067
1,2,M,56,16,70072
2,3,M,25,15,55117
3,4,M,45,7,2460
4,5,M,25,20,55455


In [5]:
ratings.head()

Unnamed: 0,user_id,movie_id,rating,timestamp
0,1,1193,5,978300760
1,1,661,3,978302109
2,1,914,3,978301968
3,1,3408,4,978300275
4,1,2355,5,978824291


In [6]:
movies.head()

Unnamed: 0,movie_id,title,genres
0,1,Toy Story (1995),Animation|Children's|Comedy
1,2,Jumanji (1995),Adventure|Children's|Fantasy
2,3,Grumpier Old Men (1995),Comedy|Romance
3,4,Waiting to Exhale (1995),Comedy|Drama
4,5,Father of the Bride Part II (1995),Comedy


##### Clean up the `movies`

- Get the `year`
- Shorten the `title`


In [7]:
# Split "Title (Year)" into a title column (0) and a year column (1).
tmp = movies.title.str.extract(r'(.*) \(([0-9]+)\)')
# Shorten each title to at most 40 characters.
tmp[0] = tmp[0].str[:40]

In [8]:
movies['year'] = tmp[1]
movies['short_title'] = tmp[0]

In [9]:
movies.head()

Unnamed: 0,movie_id,title,genres,year,short_title
0,1,Toy Story (1995),Animation|Children's|Comedy,1995,Toy Story
1,2,Jumanji (1995),Adventure|Children's|Fantasy,1995,Jumanji
2,3,Grumpier Old Men (1995),Comedy|Romance,1995,Grumpier Old Men
3,4,Waiting to Exhale (1995),Comedy|Drama,1995,Waiting to Exhale
4,5,Father of the Bride Part II (1995),Comedy,1995,Father of the Bride Part II


##### Join the tables with `pd.merge` (1 pt)

In [10]:
# Merge users with ratings (shared column: user_id), then merge the result
# with movies (shared column: movie_id) into one combined DataFrame.
new = pd.merge(pd.merge(users, ratings), movies)
new

Unnamed: 0,user_id,gender,age,occupation,zip,movie_id,rating,timestamp,title,genres,year,short_title
0,1,F,1,10,48067,1193,5,978300760,One Flew Over the Cuckoo's Nest (1975),Drama,1975,One Flew Over the Cuckoo's Nest
1,2,M,56,16,70072,1193,5,978298413,One Flew Over the Cuckoo's Nest (1975),Drama,1975,One Flew Over the Cuckoo's Nest
2,12,M,25,12,32793,1193,4,978220179,One Flew Over the Cuckoo's Nest (1975),Drama,1975,One Flew Over the Cuckoo's Nest
3,15,M,25,7,22903,1193,4,978199279,One Flew Over the Cuckoo's Nest (1975),Drama,1975,One Flew Over the Cuckoo's Nest
4,17,M,50,1,95350,1193,5,978158471,One Flew Over the Cuckoo's Nest (1975),Drama,1975,One Flew Over the Cuckoo's Nest
...,...,...,...,...,...,...,...,...,...,...,...,...
1000204,5949,M,18,17,47901,2198,5,958846401,Modulations (1998),Documentary,1998,Modulations
1000205,5675,M,35,14,30030,2703,3,976029116,Broken Vessels (1998),Drama,1998,Broken Vessels
1000206,5780,M,18,17,92886,2845,1,958153068,White Boys (1999),Drama,1999,White Boys
1000207,5851,F,18,20,55410,3607,5,957756608,One Little Indian (1973),Comedy|Drama|Western,1973,One Little Indian
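
When no key is given, `pd.merge` joins on the columns the two frames have in common, which here means `user_id` for users and ratings, and `movie_id` for that result and movies. A minimal sketch that spells the keys out, using three tiny hypothetical frames (the `*_demo` names and their rows are illustrative assumptions, not the assignment data):

```python
import pandas as pd

# Tiny hypothetical stand-ins for the real users, ratings, and movies tables.
users_demo = pd.DataFrame({"user_id": [1, 2], "gender": ["F", "M"]})
ratings_demo = pd.DataFrame(
    {"user_id": [1, 2, 2], "movie_id": [10, 10, 20], "rating": [5, 4, 3]}
)
movies_demo = pd.DataFrame({"movie_id": [10, 20], "title": ["Movie A", "Movie B"]})

# Naming the key and the join type makes the join explicit
# instead of relying on the defaults.
merged = (
    users_demo
    .merge(ratings_demo, on="user_id", how="inner")
    .merge(movies_demo, on="movie_id", how="inner")
)
print(merged)
```

Each rating row ends up with its user's attributes and its movie's title, so the merged frame has one row per rating.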


##### What's the highest rated movie? (1 pt)

In [11]:
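rating_value = new["rating"]  # every individual rating in the merged table
# Highest rating value given by any user. Note: this name shadows the
# built-in max() and is reused in a later cell.
max = rating_value.max()
# Rows whose individual rating equals that maximum (many movies have at least one).
highest_rated_movie = new[new["rating"] == max]
highest_rated_movie[["title", "rating"]]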

Unnamed: 0,title,rating
0,One Flew Over the Cuckoo's Nest (1975),5
1,One Flew Over the Cuckoo's Nest (1975),5
4,One Flew Over the Cuckoo's Nest (1975),5
6,One Flew Over the Cuckoo's Nest (1975),5
7,One Flew Over the Cuckoo's Nest (1975),5
...,...,...
1000189,Brother Minister: The Assassination of Malcolm...,5
1000195,Lured (1947),5
1000199,Song of Freedom (1936),5
1000204,Modulations (1998),5
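
Filtering on the maximum rating returns every individual top rating rather than a single movie. Another possible reading of "highest rated" is the best average rating per title, which uses the groupby and sorting steps listed at the top. A rough sketch on hypothetical sample rows; the `sample` frame and the `min_ratings` cutoff are assumptions added for illustration, the cutoff so that a movie with one lone 5 does not win by default:

```python
import pandas as pd

# Hypothetical sample shaped like the merged table (title and rating only).
sample = pd.DataFrame({
    "title": ["Movie A", "Movie A", "Movie A", "Movie B", "Movie B", "Movie C"],
    "rating": [5, 4, 5, 3, 4, 5],
})

min_ratings = 2  # assumed cutoff; tune it for the real data

# Mean rating and number of ratings per title.
stats = sample.groupby("title")["rating"].agg(["mean", "count"])

# Drop rarely rated titles, then sort best-first.
ranked = stats[stats["count"] >= min_ratings].sort_values("mean", ascending=False)
print(ranked.head())
```

On the real data the same pattern would apply to `new`, with a larger cutoff chosen by looking at how many ratings typical movies receive.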


##### What are some good rated movies for date night? (4.5 pts)

- Hint: look for movies that are highly rated
    - by both partners (who may or may not be the same gender),
    - within the couple's preferred genres,
    - within their age group, which can be combined with the other factors,
    - perhaps within their occupation,
    - etc.
    
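There is no single correct answer. Be sure to explain the reasoning behind your suggestion.

I picked good movies using two factors, age and genre: the selection keeps top-rated titles in the chosen genre that were rated by users no older than the younger partner.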


In [12]:
# Age of the younger partner; only ratings from users at or below this age are kept.
age_1 = int(input("Please enter the age of the youngest person: "))
# Genre to watch. The comparison below is an exact match on the full genres
# string, so "Drama" matches "Drama" but not "Comedy|Drama".
genre = input("Please enter the movie genre that you would like to watch: ")
# Keep top-rated rows (max is the highest rating computed earlier) for that age range and genre.
good_rated_movies = new[(new["age"] <= age_1) & (new["rating"] == max) & (new["genres"] == genre)]
good_rated_movies[["title"]]  # display the matching movie titles



Unnamed: 0,title
0,One Flew Over the Cuckoo's Nest (1975)
6,One Flew Over the Cuckoo's Nest (1975)
208,One Flew Over the Cuckoo's Nest (1975)
302,One Flew Over the Cuckoo's Nest (1975)
303,One Flew Over the Cuckoo's Nest (1975)
...,...
997914,Judy Berlin (1999)
998204,"Legend of 1900, The (Leggenda del pianista sul..."
998521,Faces (1968)
999470,Bandits (1997)
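
The filter above lists individual top ratings, so the same title appears many times. One way to turn the hints into a ranked shortlist is to average ratings per title within each group and keep titles that both groups rate well. A sketch under stated assumptions: the `sample` rows are hypothetical, the genre test is a substring match rather than the exact match used above, and scoring by the lower of the two means is just one possible choice:

```python
import pandas as pd

# Hypothetical sample shaped like the merged table.
sample = pd.DataFrame({
    "title":  ["Movie A"] * 4 + ["Movie B"] * 4 + ["Movie C"] * 2 + ["Movie D"] * 2,
    "genres": ["Comedy|Romance"] * 4 + ["Romance"] * 4
              + ["Drama|Romance"] * 2 + ["Action"] * 2,
    "gender": ["F", "F", "M", "M", "F", "F", "M", "M", "M", "M", "F", "M"],
    "rating": [5, 4, 4, 5, 5, 5, 2, 3, 5, 5, 5, 5],
})

wanted_genre = "Romance"  # assumed preference for this example

# Substring match, so "Comedy|Romance" also counts as Romance.
in_genre = sample[sample["genres"].str.contains(wanted_genre, regex=False)]

# Mean rating per title for each gender: titles as rows, genders as columns.
by_gender = in_genre.pivot_table(
    index="title", columns="gender", values="rating", aggfunc="mean"
)

# Keep titles rated by both groups, then score each by the lower of the two means.
both = by_gender.dropna().copy()
both["score"] = both[["F", "M"]].min(axis=1)
date_night = both.sort_values("score", ascending=False)
print(date_night)
```

Scoring by the lower mean favors movies that neither group dislikes. For partners of the same gender, sorting on that single column would be the analogous step, and age group or occupation could replace `gender` in the pivot.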
