# Date Night Movie

#### Grading:


- Code: 90 pts
- Markdown Documentation: 10 pts


In this assignment we use pandas to answer one question: what is the best **date-night movie**?

This assignment uses:
- Joining
- Groupby
- Sorting


In [25]:
import os
import pandas as pd

##### Read in the movie data: `pd.read_table`

In [26]:
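def get_movie_data():
    """Load the users, ratings and movies tables from ../data."""
    # '::' is a multi-character separator, which the default C parser does not
    # support; requesting the Python engine explicitly avoids a ParserWarning.
    unames = ['user_id','gender','age','occupation','zip']
    users = pd.read_table(os.path.join('../data','users.dat'), 
                          sep='::', header=None, names=unames, engine='python')
    
    rnames = ['user_id', 'movie_id', 'rating', 'timestamp']
    ratings = pd.read_table(os.path.join('../data', 'ratings.dat'), 
                            sep='::', header=None, names=rnames, engine='python')
    
    mnames = ['movie_id', 'title','genres']
    movies = pd.read_table(os.path.join('../data', 'movies.dat'), 
                           sep='::', header=None, names=mnames, engine='python')

    return users, ratings, movies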
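
To see the `::` parsing in isolation, here is a minimal sketch that reads three user rows from an in-memory string instead of `../data/users.dat`. The name `users_sample` and the `dtype={'zip': str}` argument are additions for this example; the latter keeps leading zeros in zip codes such as `02460` when the sample contains only numeric zips.

```python
import io
import pandas as pd

# Three rows in the same '::'-delimited layout as users.dat.
raw = io.StringIO("1::F::1::10::48067\n2::M::56::16::70072\n4::M::45::7::02460\n")

unames = ['user_id', 'gender', 'age', 'occupation', 'zip']
users_sample = pd.read_csv(raw, sep='::', header=None, names=unames,
                           engine='python',      # needed for multi-character separators
                           dtype={'zip': str})   # keep zip codes as text
print(users_sample)
```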

In [27]:
users, ratings, movies = get_movie_data()



In [28]:
print(users.head())

   user_id gender  age  occupation    zip
0        1      F    1          10  48067
1        2      M   56          16  70072
2        3      M   25          15  55117
3        4      M   45           7  02460
4        5      M   25          20  55455
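
Assuming this is the MovieLens 1M data set, the `age` column holds bracket codes, not ages in years: its README lists 1 (under 18), 18 (18-24), 25 (25-34), 35 (35-44), 45 (45-49), 50 (50-55) and 56 (56+). A minimal sketch of attaching readable labels to the five rows shown above; `AGE_LABELS`, `users_sample` and `age_group` are illustrative names that do not appear in the notebook.

```python
import pandas as pd

# Age bracket codes as documented for MovieLens 1M (assumed data source).
AGE_LABELS = {1: "Under 18", 18: "18-24", 25: "25-34", 35: "35-44",
              45: "45-49", 50: "50-55", 56: "56+"}

# The first five users from the output above.
users_sample = pd.DataFrame({"user_id": [1, 2, 3, 4, 5],
                             "gender": ["F", "M", "M", "M", "M"],
                             "age": [1, 56, 25, 45, 25]})

users_sample["age_group"] = users_sample["age"].map(AGE_LABELS)
print(users_sample)
```

Under this coding, a filter such as `age >= 20` combined with `age < 30` selects only code 25, that is, the 25-34 bracket.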


In [29]:
print(ratings.head())

   user_id  movie_id  rating  timestamp
0        1      1193       5  978300760
1        1       661       3  978302109
2        1       914       3  978301968
3        1      3408       4  978300275
4        1      2355       5  978824291


In [30]:
print(movies.head())

   movie_id                               title                        genres
0         1                    Toy Story (1995)   Animation|Children's|Comedy
1         2                      Jumanji (1995)  Adventure|Children's|Fantasy
2         3             Grumpier Old Men (1995)                Comedy|Romance
3         4            Waiting to Exhale (1995)                  Comedy|Drama
4         5  Father of the Bride Part II (1995)                        Comedy


##### Clean up the `movies`

- Get the `year`
- Shorten the `title`


In [31]:
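# Split "Title (YYYY)" into the title (column 0) and the year (column 1).
# A raw string keeps the regex backslashes from being read as string escapes.
tmp = movies.title.str.extract(r'(.*) \(([0-9]+)\)')
# Exploratory checks: `apply` runs once per column, so each call returns
# the first row's title and year.
tmp.apply(lambda x:x[0] if len(x) > 0 else None)
tmp.apply(lambda x: x[0][:40] if len(x) > 0 else None)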

0    Toy Story
1         1995
dtype: object
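
As a small illustration of what `str.extract` returns, the sketch below runs on three titles from the `movies` output above. It uses named groups so the result columns are labeled, anchors the pattern with `^` and `$`, and truncates the title to 40 characters; these are choices made for this example, not the notebook's exact cleaning step.

```python
import pandas as pd

titles = pd.Series(["Toy Story (1995)",
                    "Jumanji (1995)",
                    "Father of the Bride Part II (1995)"])

# Named groups become the column names of the returned DataFrame.
parts = titles.str.extract(r'^(?P<short_title>.*) \((?P<year>[0-9]{4})\)$')
parts["year"] = parts["year"].astype(int)
parts["short_title"] = parts["short_title"].str[:40]  # cap the title length
print(parts)
```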

In [34]:
movies['year'] = tmp[1]
movies['short_title'] = tmp[0]

In [35]:
print(movies.head())

   movie_id                               title                        genres  \
0         1                    Toy Story (1995)   Animation|Children's|Comedy   
1         2                      Jumanji (1995)  Adventure|Children's|Fantasy   
2         3             Grumpier Old Men (1995)                Comedy|Romance   
3         4            Waiting to Exhale (1995)                  Comedy|Drama   
4         5  Father of the Bride Part II (1995)                        Comedy   

   year                  short_title  
0  1995                    Toy Story  
1  1995                      Jumanji  
2  1995             Grumpier Old Men  
3  1995            Waiting to Exhale  
4  1995  Father of the Bride Part II  


##### Join the tables with `pd.merge` (10 pts)

In [377]:
# Set some Pandas options
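# Fully qualified option names avoid ambiguous matches in newer pandas versions.
pd.set_option('display.notebook_repr_html', True)
pd.set_option('display.max_columns', 8)
pd.set_option('display.max_rows', 7)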

In [378]:
import numpy as np

# users -> ratings on user_id, then ratings -> movies on movie_id.
df_user_rating = pd.merge(users, ratings, how = "inner", on = "user_id")
df_user_rating_movie = pd.merge(df_user_rating, movies, how = "inner",
                                on = "movie_id")
df_user_rating_movie

,user_id,gender,age,occupation,...,title,genres,year,short_title
0,1,F,1,10,...,One Flew Over the Cuckoo's Nest (1975),Drama,1975,One Flew Over the Cuckoo's Nest
1,2,M,56,16,...,One Flew Over the Cuckoo's Nest (1975),Drama,1975,One Flew Over the Cuckoo's Nest
...,...,...,...,...,...,...,...,...,...
1000207,5851,F,18,20,...,One Little Indian (1973),Comedy|Drama|Western,1973,One Little Indian
1000208,5938,M,25,1,...,"Five Wives, Three Secretaries and Me (1998)",Documentary,1998,"Five Wives, Three Secretaries and Me"
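
Both joins attach one user row and one movie row to each rating, so the merged frame should have exactly as many rows as `ratings`. The sketch below shows the same two-step join on toy tables and uses the `validate` argument of `merge` to check that the keys on the right-hand side are unique. The `_s` table names and the rating values are illustrative only.

```python
import pandas as pd

# Toy tables; the ratings here are illustrative, not taken from the data set.
users_s = pd.DataFrame({"user_id": [1, 2], "gender": ["F", "M"]})
ratings_s = pd.DataFrame({"user_id": [1, 1, 2],
                          "movie_id": [1193, 1, 1193],
                          "rating": [5, 4, 5]})
movies_s = pd.DataFrame({"movie_id": [1, 1193],
                         "title": ["Toy Story (1995)",
                                   "One Flew Over the Cuckoo's Nest (1975)"]})

# validate="many_to_one" raises MergeError if a key repeats in the right table.
merged = (ratings_s
          .merge(users_s, how="inner", on="user_id", validate="many_to_one")
          .merge(movies_s, how="inner", on="movie_id", validate="many_to_one"))
print(merged)
```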


##### What's the highest rated movie? (10 pts)

In [438]:
def merged_df_to_highest_rated(_df_user_subset, _sensitivity):
    """Score each movie in the subset and return the top scorers.

    Returns the movies whose weighted score lies within 1 + _sensitivity of
    the best score, together with the movie_id of the best-scoring movie.
    """
    df_mean_rating = pd.DataFrame(columns = ["movie_id", "mean_rating", "rating_count",\
                                             "count_weight", "weighted"])
    df_mean_rating.set_index("movie_id", inplace = True)
    for movie_id, df_movie in _df_user_subset.groupby(by = "movie_id"):
        mean_rating = df_movie.rating.mean()
        rating_count = df_movie.rating.count()
        # Weight the mean by log2 of the rating count so that widely rated
        # movies outrank movies with only a handful of high ratings.
        count_weight = np.log2(rating_count)
        df_mean_rating.loc[movie_id] = [mean_rating, rating_count, count_weight,
                                        mean_rating * count_weight]
    
    # Keep every movie whose score is within (1 + _sensitivity) of the maximum.
    df_top_rated = df_mean_rating[df_mean_rating.weighted >= df_mean_rating.weighted.max() - 1 -_sensitivity]
    top_rated_movie_index = df_top_rated[df_top_rated.weighted == df_top_rated.weighted.max()].index.values[0]
    return df_top_rated, top_rated_movie_index
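
The score used above is `mean_rating * log2(rating_count)`. The same table can also be built without a Python loop by aggregating per movie. The following is a minimal sketch on toy ratings; `weighted_scores` is a hypothetical helper, not the function used in the rest of this notebook.

```python
import numpy as np
import pandas as pd

def weighted_scores(df):
    """Return per-movie mean, count, log2(count) and their product."""
    stats = df.groupby("movie_id")["rating"].agg(mean_rating="mean",
                                                 rating_count="count")
    stats["count_weight"] = np.log2(stats["rating_count"])
    stats["weighted"] = stats["mean_rating"] * stats["count_weight"]
    return stats

# Toy ratings: movie 1 has four ratings, movie 2 has two, movie 3 has one.
toy = pd.DataFrame({"movie_id": [1, 1, 1, 1, 2, 2, 3],
                    "rating":   [5, 4, 4, 3, 5, 5, 5]})
scores = weighted_scores(toy)
print(scores)
print("best movie_id:", scores["weighted"].idxmax())
```

Because `log2(1)` is 0, a movie with a single rating always scores 0, however high that rating is.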
    

In [425]:
df_top_rated, top_rated_movie_index = merged_df_to_highest_rated(df_user_rating_movie, 0)
df_top_rated

movie_id,mean_rating,rating_count,count_weight,weighted
260,4.453694,2991.0,11.546412,51.424192
318,4.554558,2227.0,11.120886,50.650716
1198,4.477725,2514.0,11.295769,50.579344
2858,4.317386,3428.0,11.743151,50.69972


In [426]:
movies[movies.movie_id.isin(list(df_top_rated.index.values))]

,movie_id,title,genres,year,short_title
257,260,Star Wars: Episode IV - A New Hope (1977),Action|Adventure|Fantasy|Sci-Fi,1977,Star Wars: Episode IV - A New Hope


In [427]:
movies[movies.movie_id == top_rated_movie_index]

,movie_id,title,genres,year,short_title
257,260,Star Wars: Episode IV - A New Hope (1977),Action|Adventure|Fantasy|Sci-Fi,1977,Star Wars: Episode IV - A New Hope


##### What is a highly rated movie for date night? (30 pts)

- Hint - highly rated movie by 
    - both partners (might be the same gender or not),
    - based on genre preferences,
    - age group can also be combined

### Calculations for Highly Rated Movies by Females

In [439]:
df_female_user = df_user_rating_movie.groupby("gender").get_group("F")
df_female_top_rated, top_rated_female_movie_index = merged_df_to_highest_rated(df_female_user, 1)
df_female_top_rated

movie_id,mean_rating,rating_count,count_weight,weighted
318,4.539075,627.0,9.292322,42.178544
527,4.562602,615.0,9.264443,42.269961
...,...,...,...,...
2762,4.477410,664.0,9.375039,41.975892
2858,4.238901,946.0,9.885696,41.904485


### Movies Highly Rated by Females

In [503]:
movies[movies.movie_id.isin(list(df_female_top_rated.index.values))]

,movie_id,title,genres,year,short_title
315,318,"Shawshank Redemption, The (1994)",Drama,1994,"Shawshank Redemption, The"
523,527,Schindler's List (1993),Drama|War,1993,Schindler's List
589,593,"Silence of the Lambs, The (1991)",Drama|Thriller,1991,"Silence of the Lambs, The"
1179,1197,"Princess Bride, The (1987)",Action|Adventure|Comedy|Romance,1987,"Princess Bride, The"
2327,2396,Shakespeare in Love (1998),Comedy|Romance,1998,Shakespeare in Love
2693,2762,"Sixth Sense, The (1999)",Thriller,1999,"Sixth Sense, The"
2789,2858,American Beauty (1999),Comedy|Drama,1999,American Beauty


# Date Night Recommendation for F/F Date Night

In [445]:
movies[movies.movie_id == top_rated_female_movie_index]

,movie_id,title,genres,year,short_title
523,527,Schindler's List (1993),Drama|War,1993,Schindler's List


### Calculations for Highly Rated Movies by Males

In [446]:
df_male_user = df_user_rating_movie.groupby("gender").get_group("M")
df_male_top_rated, top_rated_male_movie_index = merged_df_to_highest_rated(df_male_user, 1)
df_male_top_rated

movie_id,mean_rating,rating_count,count_weight,weighted
260,4.495307,2344.0,11.194757,50.323871
318,4.560625,1600.0,10.643856,48.542637
858,4.583333,1740.0,10.764872,49.338995
1196,4.344577,2342.0,11.193525,48.631136
1198,4.520597,1942.0,10.923327,49.379965
2028,4.398941,2078.0,11.02098,48.480644
2858,4.347301,2482.0,11.277287,49.025758


### Movies Highly Rated by Males

In [447]:
movies[movies.movie_id.isin(list(df_male_top_rated.index.values))]

,movie_id,title,genres,year,short_title
257,260,Star Wars: Episode IV - A New Hope (1977),Action|Adventure|Fantasy|Sci-Fi,1977,Star Wars: Episode IV - A New Hope
315,318,"Shawshank Redemption, The (1994)",Drama,1994,"Shawshank Redemption, The"
847,858,"Godfather, The (1972)",Action|Crime|Drama,1972,"Godfather, The"
1178,1196,Star Wars: Episode V - The Empire Strikes Back...,Action|Adventure|Drama|Sci-Fi|War,1980,Star Wars: Episode V - The Empire Strikes Back
1180,1198,Raiders of the Lost Ark (1981),Action|Adventure,1981,Raiders of the Lost Ark
1959,2028,Saving Private Ryan (1998),Action|Drama|War,1998,Saving Private Ryan
2789,2858,American Beauty (1999),Comedy|Drama,1999,American Beauty


# Date Night Recommendation for M/M Date Night

In [564]:
movies[movies.movie_id == top_rated_male_movie_index]

,movie_id,title,genres,year,short_title
257,260,Star Wars: Episode IV - A New Hope (1977),Action|Adventure|Fantasy|Sci-Fi,1977,Star Wars: Episode IV - A New Hope


### Merging Highly Rated Data from Female and Male Dataframes

In [565]:
df_movie_date_suggestions = df_female_top_rated.merge(df_male_top_rated, how = "inner", on ="movie_id")
movies[movies.movie_id.isin(list(df_movie_date_suggestions.index.values))]

,movie_id,title,genres,year,short_title
315,318,"Shawshank Redemption, The (1994)",Drama,1994,"Shawshank Redemption, The"
2789,2858,American Beauty (1999),Comedy|Drama,1999,American Beauty
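
When two frames with identical column names are merged, pandas appends `_x` and `_y` by default, which is where names such as `weighted_x` come from. The sketch below repeats the inner join on rows copied from the female and male tables above, but passes explicit `suffixes` and joins on the index; the `_f`/`_m` suffixes and the `combined` column are choices made for this example.

```python
import pandas as pd

# Rows copied from the female and male top-rated tables shown above.
female = pd.DataFrame({"mean_rating": [4.539075, 4.238901],
                       "weighted": [42.178544, 41.904485]},
                      index=pd.Index([318, 2858], name="movie_id"))
male = pd.DataFrame({"mean_rating": [4.495307, 4.560625, 4.347301],
                     "weighted": [50.323871, 48.542637, 49.025758]},
                    index=pd.Index([260, 318, 2858], name="movie_id"))

# Inner join keeps only movies present in both tables.
both = female.merge(male, how="inner", left_index=True, right_index=True,
                    suffixes=("_f", "_m"))
both["combined"] = both["weighted_f"] + both["weighted_m"]

best = both["combined"].idxmax()
avg_rating = (both.loc[best, "mean_rating_f"] + both.loc[best, "mean_rating_m"]) / 2
print(best, round(avg_rating, 2))
```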


# Date Night Recommendations for F/M Date Night

In [586]:
suggested_movie_index = (df_movie_date_suggestions.weighted_x \
         + df_movie_date_suggestions.weighted_y).sort_values(ascending = False).index[0]
suggested_movie_title = "".join(movies[movies.movie_id == suggested_movie_index].title)
suggested_movie_rating = (df_movie_date_suggestions.mean_rating_x\
                         + df_movie_date_suggestions.mean_rating_y)[suggested_movie_index] / 2

print("If age does not matter we recommend going to see \"{}\" for date night.".format(suggested_movie_title))
print("It has an average rating of {:.2f}".format (suggested_movie_rating))

If age does not matter we recommend going to see "American Beauty (1999)" for date night.
It has an average rating of 4.29


### Turning off Warnings

In [None]:
import warnings
warnings.filterwarnings("ignore")

# Date Night Recommendations For Age Group and F/F Date Night

In [608]:
for x in range(20,(users.age.max() + 10 - (users.age.max() % 10)),10):
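    # Build one boolean mask on the merged frame; indexing the gender group
    # with a mask from the full frame triggers a reindexing warning.
    df_female_age = df_user_rating_movie[(df_user_rating_movie.gender == "F")\
                        & (df_user_rating_movie.age >= x)\
                        & (df_user_rating_movie.age < x + 10)]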
    
    df_f_top_rated, top_rated_f_index = merged_df_to_highest_rated(df_female_age, 1)
    
    suggested_movie_index = (df_f_top_rated.weighted).sort_values(ascending = False).index[0]
    suggested_movie_title = "".join(movies[movies.movie_id == suggested_movie_index].title)
    suggested_movie_rating = (df_f_top_rated.mean_rating)[suggested_movie_index]
    
    print("For age {} to {} we recommend going to see \"{}\" for date night."\
          .format(x, x + 10, suggested_movie_title))
    print("It has an average rating of {:.2f} among this age group.".format (suggested_movie_rating))
    print()

For age 20 to 30 we recommend going to see "Sixth Sense, The (1999)" for date night.
It has an average rating of 4.61 among this age group.

For age 30 to 40 we recommend going to see "Sixth Sense, The (1999)" for date night.
It has an average rating of 4.49 among this age group.

For age 40 to 50 we recommend going to see "Schindler's List (1993)" for date night.
It has an average rating of 4.70 among this age group.

For age 50 to 60 we recommend going to see "Schindler's List (1993)" for date night.
It has an average rating of 4.74 among this age group.
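
The per-age loop can also be expressed as a single grouped computation: score every (age, movie) pair, then pick the best movie within each age value. This is a minimal sketch on toy data that groups by the raw `age` value instead of 10-year ranges; the frame `toy` and the names `stats` and `best_per_age` are illustrative.

```python
import numpy as np
import pandas as pd

# Toy ratings for two age values and two movies.
toy = pd.DataFrame({"age":      [25, 25, 25, 25, 35, 35, 35],
                    "movie_id": [1, 1, 2, 2, 1, 2, 2],
                    "rating":   [5, 5, 3, 4, 2, 5, 4]})

# Same score as in this notebook: mean rating times log2 of the rating count.
stats = toy.groupby(["age", "movie_id"])["rating"].agg(mean_rating="mean",
                                                       rating_count="count")
stats["weighted"] = stats["mean_rating"] * np.log2(stats["rating_count"])

# For each age, the (age, movie_id) label of the highest-scoring movie.
best_per_age = stats.groupby(level="age")["weighted"].idxmax()
for age, (_, movie_id) in best_per_age.items():
    print("age", age, "-> movie_id", movie_id)
```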



# Date Night Recommendations For Age Group and M/M Date Night

In [611]:
for x in range(20,(users.age.max() + 10 - (users.age.max() % 10)),10):
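    # Build one boolean mask on the merged frame; indexing the gender group
    # with a mask from the full frame triggers a reindexing warning.
    df_male_age = df_user_rating_movie[(df_user_rating_movie.gender == "M")\
                        & (df_user_rating_movie.age >= x)\
                        & (df_user_rating_movie.age < x + 10)]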
    
    df_m_top_rated, top_rated_m_index = merged_df_to_highest_rated(df_male_age, 1)
    
    suggested_movie_index = (df_m_top_rated.weighted).sort_values(ascending = False).index[0]
    suggested_movie_title = "".join(movies[movies.movie_id == suggested_movie_index].title)
    suggested_movie_rating = (df_m_top_rated.mean_rating)[suggested_movie_index]
    
    print("For age {} to {} we recommend going to see \"{}\" for date night."\
          .format(x, x + 10, suggested_movie_title))
    print("It has an average rating of {:.2f} among this age group.".format (suggested_movie_rating))
    print()

For age 20 to 30 we recommend going to see "Star Wars: Episode IV - A New Hope (1977)" for date night.
It has an average rating of 4.61 among this age group.

For age 30 to 40 we recommend going to see "Star Wars: Episode IV - A New Hope (1977)" for date night.
It has an average rating of 4.40 among this age group.

For age 40 to 50 we recommend going to see "Star Wars: Episode IV - A New Hope (1977)" for date night.
It has an average rating of 4.38 among this age group.

For age 50 to 60 we recommend going to see "Godfather, The (1972)" for date night.
It has an average rating of 4.55 among this age group.



# Date Night Recommendations For Age Group and F/M Date Night

In [612]:
for x in range(20,(users.age.max() + 10 - (users.age.max() % 10)),10):
    # Build the gender and age masks on the merged frame itself; indexing a
    # gender group with a mask from the full frame triggers a reindexing warning.
    in_age_range = (df_user_rating_movie.age >= x) & (df_user_rating_movie.age < x + 10)
    df_female_age = df_user_rating_movie[(df_user_rating_movie.gender == "F") & in_age_range]
    df_male_age = df_user_rating_movie[(df_user_rating_movie.gender == "M") & in_age_range]
    
    df_f_top_rated, top_rated_f_index = merged_df_to_highest_rated(df_female_age, 1)
    df_m_top_rated, top_rated_m_index = merged_df_to_highest_rated(df_male_age, 1)
    
    df_movie_date_suggestions = df_f_top_rated.merge(df_m_top_rated, how = "inner", on = "movie_id")
    
    suggested_movie_index = (df_movie_date_suggestions.weighted_x \
             + df_movie_date_suggestions.weighted_y).sort_values(ascending = False).index[0]
    suggested_movie_title = "".join(movies[movies.movie_id == suggested_movie_index].title)
    suggested_movie_rating = (df_movie_date_suggestions.mean_rating_x\
                         + df_movie_date_suggestions.mean_rating_y)[suggested_movie_index] / 2
    
    print("For age {} to {} we recommend going to see \"{}\" for date night."\
          .format(x, x + 10, suggested_movie_title))
    print("It has an average rating of {:.2f} among this age group.".format (suggested_movie_rating))
    print()

For age 20 to 30 we recommend going to see "Star Wars: Episode IV - A New Hope (1977)" for date night.
It has an average rating of 4.51 among this age group.

For age 30 to 40 we recommend going to see "Raiders of the Lost Ark (1981)" for date night.
It has an average rating of 4.45 among this age group.

For age 40 to 50 we recommend going to see "Schindler's List (1993)" for date night.
It has an average rating of 4.62 among this age group.

For age 50 to 60 we recommend going to see "Schindler's List (1993)" for date night.
It has an average rating of 4.63 among this age group.



# Subjective Date Night

This section looks for a subjective F/M date-night pick: Horror movies rated by users of both genders in the 20 to 30 age range.

In [653]:
df_horror = df_user_rating_movie[df_user_rating_movie.genres.str.contains("Horror")]
df_horror_age = df_horror[(df_horror.age >= 20) & (df_horror.age < 30)]
df_horror_age_f = df_horror_age.groupby("gender").get_group("F")
df_horror_age_m = df_horror_age.groupby("gender").get_group("M")

df_f_top_rated, top_rated_f_index = merged_df_to_highest_rated(df_horror_age_f, 2)
df_m_top_rated, top_rated_m_index = merged_df_to_highest_rated(df_horror_age_m, 2)

df_movie_date_suggestions = df_f_top_rated.merge(df_m_top_rated, how = "inner", on = "movie_id")

suggested_movie_index = (df_movie_date_suggestions.weighted_x \
         + df_movie_date_suggestions.weighted_y).sort_values(ascending = False).index[0]
suggested_movie_title = "".join(movies[movies.movie_id == suggested_movie_index].title)
suggested_movie_rating = (df_movie_date_suggestions.mean_rating_x\
                     + df_movie_date_suggestions.mean_rating_y)[suggested_movie_index] / 2
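
The `genres` column packs several genres into one `|`-separated string, so `str.contains("Horror")` is a substring test (and treats its argument as a regular expression unless `regex=False` is passed). A minimal sketch of the alternative, splitting the string into a list and testing membership, is shown below. The first two rows come from the `movies` output earlier; the third row is made up for illustration.

```python
import pandas as pd

movies_s = pd.DataFrame({
    "movie_id": [1, 3, 9999],  # 9999 is a made-up id for this example
    "genres": ["Animation|Children's|Comedy",
               "Comedy|Romance",
               "Horror|Sci-Fi"],
})

# One list of genres per movie.
genre_lists = movies_s["genres"].str.split("|")
is_horror = genre_lists.apply(lambda genres: "Horror" in genres)

# One row per (movie, genre) pair, convenient for per-genre counts.
exploded = movies_s.assign(genre=genre_lists).explode("genre")
genre_counts = exploded["genre"].value_counts()

print(movies_s[is_horror])
print(genre_counts)
```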

# Subjective Date Night Recommendation

In [654]:
print("For age {} to {} we recommend going to see \"{}\" for date night."\
      .format(20, 30, suggested_movie_title))
print("It has an average rating of {:.2f}.".format (suggested_movie_rating))
print()

For age 20 to 30 we recommend going to see "Alien (1979)" for date night.
It has an average rating of 4.17.



# Runner-up Movies for Date Night

In [655]:
# Show the merged female (_x) and male (_y) statistics for the candidate movies.
df_movie_date_suggestions

movie_id,mean_rating_x,rating_count_x,count_weight_x,weighted_x,mean_rating_y,rating_count_y,count_weight_y,weighted_y
1214,4.054264,129.0,7.011227,28.425363,4.290855,667.0,9.381543,40.254836
1387,3.882883,111.0,6.794416,26.381921,4.200382,524.0,9.033423,37.943824
