# Date Night Movie

#### Grading:


- Code: 90 pts
- Markdown Documentation: 10 pts


In this assignment we use pandas to answer one question: what's the best **date-night movie**?

The assignment uses:
- Joining
- Groupby
- Sorting


In [143]:
import os
import pandas as pd

##### Read in the movie data: `pd.read_table`

In [144]:
def get_movie_data():
    """Read the users, ratings, and movies tables from ../data."""
    # '::' is a multi-character separator, so name the Python parsing engine explicitly
    unames = ['user_id','gender','age','occupation','zip']
    users = pd.read_table(os.path.join('../data','users.dat'), 
                          sep='::', header=None, names=unames, engine='python')
    
    rnames = ['user_id', 'movie_id', 'rating', 'timestamp']
    ratings = pd.read_table(os.path.join('../data', 'ratings.dat'), 
                            sep='::', header=None, names=rnames, engine='python')
    
    mnames = ['movie_id', 'title','genres']
    movies = pd.read_table(os.path.join('../data', 'movies.dat'), 
                           sep='::', header=None, names=mnames, engine='python')

    return users, ratings, movies
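
As a quick check of the `::` parsing without touching the data files, the sketch below reads a two-row in-memory sample laid out like `users.dat`. The sample rows and the name `users_demo` are made up for illustration; `engine='python'` is passed because a multi-character separator is handled by pandas' Python parsing engine.

```python
import io
import pandas as pd

# Hypothetical two-row sample in the same '::'-delimited layout as users.dat
sample = "1::F::1::10::48067\n2::M::56::16::70072\n"

unames = ['user_id', 'gender', 'age', 'occupation', 'zip']

# A separator longer than one character is parsed by the Python engine,
# so it is named explicitly here.
users_demo = pd.read_csv(io.StringIO(sample), sep='::', header=None,
                         names=unames, engine='python')

print(users_demo)
print(users_demo.dtypes)
```

Printing `dtypes` is a cheap way to confirm that the numeric columns were not read as strings.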

In [145]:
users, ratings, movies = get_movie_data()



In [146]:
users.head()

Unnamed: 0,user_id,gender,age,occupation,zip
0,1,F,1,10,48067
1,2,M,56,16,70072
2,3,M,25,15,55117
3,4,M,45,7,2460
4,5,M,25,20,55455


In [147]:
ratings.head()

Unnamed: 0,user_id,movie_id,rating,timestamp
0,1,1193,5,978300760
1,1,661,3,978302109
2,1,914,3,978301968
3,1,3408,4,978300275
4,1,2355,5,978824291


In [148]:
movies.head()

Unnamed: 0,movie_id,title,genres
0,1,Toy Story (1995),Animation|Children's|Comedy
1,2,Jumanji (1995),Adventure|Children's|Fantasy
2,3,Grumpier Old Men (1995),Comedy|Romance
3,4,Waiting to Exhale (1995),Comedy|Drama
4,5,Father of the Bride Part II (1995),Comedy


##### Clean up the `movies`

- Get the `year`
- Shorten the `title`


In [149]:
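# split "Title (YYYY)" into the title (column 0) and the year (column 1)
tmp = movies.title.str.extract(r'(.*) \(([0-9]+)\)')
# preview the first row of each extracted column (the results are not stored)
tmp.apply(lambda x:x[0] if len(x) > 0 else None)
tmp.apply(lambda x: x[0][:40] if len(x) > 0 else None)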

0    Toy Story
1         1995
dtype: object

In [150]:
movies['year'] = tmp[1]
movies['short_title'] = tmp[0]
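
The same clean-up can be sketched with named groups, which makes `str.extract` return labelled columns. This is only an illustration on two sample titles; the 40-character cut and the integer conversion of `year` are assumptions about what "shorten" and "get the year" should mean here.

```python
import pandas as pd

titles = pd.Series(['Toy Story (1995)',
                    'Schlafes Bruder (Brother of Sleep) (1995)'])

# Named groups become the column names. Anchoring on the trailing "(YYYY)"
# keeps parentheses that belong to the title itself.
parts = titles.str.extract(r'^(?P<short_title>.*) \((?P<year>\d{4})\)\s*$')

# Assumption: years are wanted as integers and titles capped at 40 characters
parts['year'] = parts['year'].astype(int)
parts['short_title'] = parts['short_title'].str[:40]

print(parts)
```

With an integer `year`, later filters can compare against a number such as `1998` instead of a string.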

In [151]:
movies.head()

Unnamed: 0,movie_id,title,genres,year,short_title
0,1,Toy Story (1995),Animation|Children's|Comedy,1995,Toy Story
1,2,Jumanji (1995),Adventure|Children's|Fantasy,1995,Jumanji
2,3,Grumpier Old Men (1995),Comedy|Romance,1995,Grumpier Old Men
3,4,Waiting to Exhale (1995),Comedy|Drama,1995,Waiting to Exhale
4,5,Father of the Bride Part II (1995),Comedy,1995,Father of the Bride Part II


##### Join the tables with `pd.merge` (20 pts)

In [152]:
#merge users dataset onto ratings dataset
main = pd.merge(ratings, users)

#display new main dataset
main.head()

Unnamed: 0,user_id,movie_id,rating,timestamp,gender,age,occupation,zip
0,1,1193,5,978300760,F,1,10,48067
1,1,661,3,978302109,F,1,10,48067
2,1,914,3,978301968,F,1,10,48067
3,1,3408,4,978300275,F,1,10,48067
4,1,2355,5,978824291,F,1,10,48067
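
`pd.merge` without arguments joins on every column name the two tables share. A more explicit variant is sketched below on made-up data: `on` names the key, `how` names the join type, and `validate` raises an error if a user appears more than once in the users table. The frames `ratings_demo` and `users_demo` are hypothetical.

```python
import pandas as pd

# Hypothetical miniature versions of the ratings and users tables
ratings_demo = pd.DataFrame({'user_id': [1, 1, 2],
                             'movie_id': [10, 20, 10],
                             'rating': [5, 3, 4]})
users_demo = pd.DataFrame({'user_id': [1, 2],
                           'gender': ['F', 'M']})

# Many ratings per user, one row per user: check that while joining
merged = pd.merge(ratings_demo, users_demo,
                  on='user_id', how='inner', validate='many_to_one')

print(merged)
```

An inner join keeps only ratings whose user is present in the users table, so comparing `len(merged)` with `len(ratings_demo)` shows whether any ratings were dropped.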


In [153]:
#merge the movie dataset to the main dataset
main = pd.merge(main, movies)

#removed unneeded columns
main = main[['user_id', 'movie_id', 'rating', 'timestamp', 'gender', 'age', 'occupation', 'zip', 'title', 'genres', 'year', 'short_title']]

#export main dataset to csv
main.to_csv('../data/main.csv',index=False)

#display new dataset
main.head()

Unnamed: 0,user_id,movie_id,rating,timestamp,gender,age,occupation,zip,title,genres,year,short_title
0,1,1193,5,978300760,F,1,10,48067,One Flew Over the Cuckoo's Nest (1975),Drama,1975,One Flew Over the Cuckoo's Nest
1,2,1193,5,978298413,M,56,16,70072,One Flew Over the Cuckoo's Nest (1975),Drama,1975,One Flew Over the Cuckoo's Nest
2,12,1193,4,978220179,M,25,12,32793,One Flew Over the Cuckoo's Nest (1975),Drama,1975,One Flew Over the Cuckoo's Nest
3,15,1193,4,978199279,M,25,7,22903,One Flew Over the Cuckoo's Nest (1975),Drama,1975,One Flew Over the Cuckoo's Nest
4,17,1193,5,978158471,M,50,1,95350,One Flew Over the Cuckoo's Nest (1975),Drama,1975,One Flew Over the Cuckoo's Nest


##### What's the highest rated movie? (20 pts)

In [154]:
#group movies and get the mean of the ratings and median of the age
main_by_movie = main.groupby(['movie_id', 'title', 'genres', 'year', 'short_title'], as_index=False).agg({'rating':'mean', 'age':'median'})

#display new dataset
main_by_movie.head()

Unnamed: 0,movie_id,title,genres,year,short_title,rating,age
0,1,Toy Story (1995),Animation|Children's|Comedy,1995,Toy Story,4.146846,25.0
1,2,Jumanji (1995),Adventure|Children's|Fantasy,1995,Jumanji,3.201141,25.0
2,3,Grumpier Old Men (1995),Comedy|Romance,1995,Grumpier Old Men,3.016736,25.0
3,4,Waiting to Exhale (1995),Comedy|Drama,1995,Waiting to Exhale,2.729412,25.0
4,5,Father of the Bride Part II (1995),Comedy,1995,Father of the Bride Part II,3.006757,25.0


In [155]:
#sort by ratings in descending order
main_by_movie.sort_values('rating', ascending=False).head(10)

Unnamed: 0,movie_id,title,genres,year,short_title,rating,age
926,989,Schlafes Bruder (Brother of Sleep) (1995),Drama,1995,Schlafes Bruder (Brother of Sleep),5.0,50.0
3635,3881,Bittersweet Motel (2000),Documentary,2000,Bittersweet Motel,5.0,18.0
1652,1830,Follow the Bitch (1998),Comedy,1998,Follow the Bitch,5.0,50.0
3152,3382,Song of Freedom (1936),Drama,1936,Song of Freedom,5.0,56.0
744,787,"Gate of Heavenly Peace, The (1995)",Documentary,1995,"Gate of Heavenly Peace, The",5.0,25.0
3054,3280,"Baby, The (1973)",Horror,1973,"Baby, The",5.0,18.0
3367,3607,One Little Indian (1973),Comedy|Drama|Western,1973,One Little Indian,5.0,18.0
3010,3233,Smashing Time (1967),Comedy,1967,Smashing Time,5.0,47.5
2955,3172,Ulysses (Ulisse) (1954),Adventure,1954,Ulysses (Ulisse),5.0,25.0
3414,3656,Lured (1947),Crime,1947,Lured,5.0,56.0


#### Conclusion

To find the highest rated movie, the main dataset first had to be reduced to one row per movie. To do this, `groupby` was used with `movie_id`, `title`, `genres`, `year`, and `short_title` as the grouping columns. The `rating` and `age` columns were not grouping keys; they were aggregated, so `rating` holds the mean rating of each movie and `age` holds the median age of the people who viewed it. The dataset was then sorted by `rating` in descending order, which puts the highest rated movies at the top. The first row is 'Schlafes Bruder (Brother of Sleep)', which came out in 1995 and has a rating of 5.0 with a median viewer age of 50. Since many movies have a rating of 5.0, the top 10 movies are displayed.
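
A mean of exactly 5.0 can come from a movie with only one or two ratings. One possible refinement, sketched here on made-up data, is to count the ratings alongside the mean and rank only movies above a threshold. The threshold `min_ratings` and the frame `demo` are hypothetical and not part of the assignment.

```python
import pandas as pd

# Hypothetical per-rating rows: movie B has a perfect mean from a single rating
demo = pd.DataFrame({
    'title':  ['A', 'A', 'A', 'B', 'C', 'C'],
    'rating': [4, 5, 4, 5, 3, 4],
})

# Named aggregation: the mean rating and the number of ratings per movie
summary = (demo.groupby('title', as_index=False)
               .agg(mean_rating=('rating', 'mean'),
                    n_ratings=('rating', 'count')))

min_ratings = 2  # hypothetical threshold; tune it to the size of the data

# Drop thinly rated movies, then sort the rest from best to worst
ranked = (summary[summary['n_ratings'] >= min_ratings]
          .sort_values('mean_rating', ascending=False))

print(ranked)
```

Here movie B is dropped despite its 5.0 mean, because a single rating says little about how a movie is received overall.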

##### What is a good rated movie for date night? (60 pts)

- Hint - highly rated movie by 
    - both partners (might be the same gender or not),
    - based on genre preferences,
    - age group can also be combined

In [156]:
#age of people
age1 = 21
age2 = 17

#movie genre chosen
genre = 'Drama'

#minimum rating chosen
min_rating = 4.0

#minimum release year for movie
min_year = '1998'

In [157]:
#filter movies by the min year requirement
main_by_movie = main_by_movie[main_by_movie['year'] >= min_year]

#filter movies by the genre chosen
main_by_movie = main_by_movie[main_by_movie['genres'].str.contains(genre)]

#filter movies by the min rating chosen
main_by_movie = main_by_movie[main_by_movie['rating'] >= min_rating]

#keep movies whose median viewer age is less than 5 years above the older age and less than 5 years below the younger age
main_by_movie = main_by_movie[main_by_movie['age'] < (age1 + 5)]
main_by_movie = main_by_movie[main_by_movie['age'] > (age2 - 5)]

#sort movies by highest rating first
main_by_movie = main_by_movie.sort_values('rating', ascending=False)

#display new movie dataset
main_by_movie

Unnamed: 0,movie_id,title,genres,year,short_title,rating,age
2309,2503,"Apple, The (Sib) (1998)",Drama,1998,"Apple, The (Sib)",4.666667,25.0
1848,2028,Saving Private Ryan (1998),Action|Drama|War,1998,Saving Private Ryan,4.337354,25.0
2651,2858,American Beauty (1999),Comedy|Drama,1999,American Beauty,4.317386,25.0
2167,2360,"Celebration, The (Festen) (1998)",Drama,1998,"Celebration, The (Festen)",4.307692,25.0
2136,2329,American History X (1998),Drama,1998,American History X,4.226562,25.0
3651,3897,Almost Famous (2000),Comedy|Drama,2000,Almost Famous,4.226358,25.0
2931,3147,"Green Mile, The (1999)",Drama|Thriller,1999,"Green Mile, The",4.154664,25.0
2307,2501,October Sky (1999),Drama,1999,October Sky,4.137755,25.0
3702,3949,Requiem for a Dream (2000),Drama,2000,Requiem for a Dream,4.115132,25.0
3341,3578,Gladiator (2000),Action|Drama,2000,Gladiator,4.106029,25.0


#### Conclusion

I do not have a girlfriend and am not dating at the moment, so I treated this part of the assignment as a movie night rather than a date night; the person I would be watching with is my younger brother. The values used to filter the movies therefore correspond to my information and my younger brother's. I decided to filter by age, release year, genre, and minimum rating. For age I used my brother's age (17) and my age (21), and kept the movies whose median viewer age was less than 5 years above the older age (mine) and less than 5 years below the younger age (my brother's). That way the number of movies to choose from is reduced, but not by too much. Next we chose Drama as the genre and kept movies from 1998 and newer with at least a 4.0 rating. After the data was filtered, I sorted it by rating in descending order, which gives the list shown above. Although the top movie is "Apple, The (Sib)", the movies that caught our eye the most were "Saving Private Ryan" and "Fight Club". We have never seen these movies, but we are really interested and might watch them in the near future.
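
An alternative to filtering on the median age is to look at how each viewer's own age group rated a movie and keep the titles that both groups liked. The sketch below uses made-up data and assumes the `age` column holds MovieLens-style bucket codes (1 for under 18, 18 for 18 to 24); the names `demo`, `groups`, and `worst_case` are hypothetical.

```python
import pandas as pd

# Hypothetical per-rating rows; `age` is assumed to be a bucket code
demo = pd.DataFrame({
    'title':  ['X', 'X', 'X', 'Y', 'Y', 'Z', 'Z'],
    'age':    [1, 18, 18, 1, 18, 1, 18],
    'rating': [5, 4, 5, 2, 5, 4, 4],
})

groups = [1, 18]   # the two viewers' age buckets
min_rating = 4.0

# One row per movie, one column per age bucket, holding that bucket's mean rating
by_group = (demo[demo['age'].isin(groups)]
            .pivot_table(index='title', columns='age',
                         values='rating', aggfunc='mean')
            .dropna())  # keep only movies rated by both buckets

# A movie is only as good a pick as the less enthusiastic group thinks it is
by_group['worst_case'] = by_group[groups].min(axis=1)

picks = (by_group[by_group['worst_case'] >= min_rating]
         .sort_values('worst_case', ascending=False))

print(picks)
```

Ranking by the lower of the two group means favours movies that both viewers are likely to enjoy over movies that only one group rates highly. The same idea works with `gender` in place of `age`.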