Do men and women rate movies differently? To find out, we'll use movie ratings from the MovieLens.org website (https://grouplens.org/datasets/movielens/1m/).

In [1]:
import pandas as pd

In [2]:
pd.options.display.max_rows = 10

Let's load 3 datasets: the first contains information about the website's users (including their gender), the second contains the movie ratings (the key fields are user_id and movie_id), and the third contains information about the movies themselves.

In [3]:
unames = ['user_id', 'gender', 'age', 'occupation', 'zip']
users = pd.read_table('users.dat', sep='::', header=None, names=unames, engine='python')
users.head()

,user_id,gender,age,occupation,zip
0,1,F,1,10,48067
1,2,M,56,16,70072
2,3,M,25,15,55117
3,4,M,45,7,2460
4,5,M,25,20,55455
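
A separator longer than one character, such as `'::'`, is interpreted by pandas as a regular expression and handled by the Python parsing engine, which is why `engine='python'` is passed above. The sketch below reads two made-up lines in the same format from an in-memory buffer. It also passes `dtype={'zip': str}`, an optional addition that is not part of the original cell: without it the ZIP codes are parsed as integers, which is presumably why the fourth user's code shows up as 2460 above.

```python
import io

import pandas as pd

# Two hypothetical lines that mimic the layout of users.dat.
sample = io.StringIO("1::F::1::10::48067\n4::M::45::7::02460\n")

unames = ['user_id', 'gender', 'age', 'occupation', 'zip']
users_sample = pd.read_table(
    sample,
    sep='::',              # multi-character separator, treated as a regex
    header=None,
    names=unames,
    engine='python',       # the engine that supports regex separators
    dtype={'zip': str},    # assumption: keep ZIP codes as text to preserve leading zeros
)
print(users_sample)
```

Whether to keep the ZIP codes as text depends on the analysis; the gender comparison below does not use that column.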


In [4]:
rnames = ['user_id', 'movie_id', 'rating', 'timestamp']
ratings = pd.read_table('ratings.dat', sep='::', header=None, names=rnames, engine='python')
ratings.head()

,user_id,movie_id,rating,timestamp
0,1,1193,5,978300760
1,1,661,3,978302109
2,1,914,3,978301968
3,1,3408,4,978300275
4,1,2355,5,978824291


In [5]:
mnames = ['movie_id', 'title', 'genres']
movies = pd.read_table('movies.dat', sep='::', header=None, names=mnames, engine='python')
movies.head()

,movie_id,title,genres
0,1,Toy Story (1995),Animation|Children's|Comedy
1,2,Jumanji (1995),Adventure|Children's|Fantasy
2,3,Grumpier Old Men (1995),Comedy|Romance
3,4,Waiting to Exhale (1995),Comedy|Drama
4,5,Father of the Bride Part II (1995),Comedy


Next, we merge all 3 datasets into one to make the analysis easier. 

In [6]:
data = pd.merge(pd.merge(ratings, users), movies)
data.head()

,user_id,movie_id,rating,timestamp,gender,age,occupation,zip,title,genres
0,1,1193,5,978300760,F,1,10,48067,One Flew Over the Cuckoo's Nest (1975),Drama
1,2,1193,5,978298413,M,56,16,70072,One Flew Over the Cuckoo's Nest (1975),Drama
2,12,1193,4,978220179,M,25,12,32793,One Flew Over the Cuckoo's Nest (1975),Drama
3,15,1193,4,978199279,M,25,7,22903,One Flew Over the Cuckoo's Nest (1975),Drama
4,17,1193,5,978158471,M,50,1,95350,One Flew Over the Cuckoo's Nest (1975),Drama
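
When no key is given, `pd.merge` joins on the columns the two frames have in common: `user_id` for ratings and users, then `movie_id` for the result and movies. The sketch below uses small made-up frames (the `toy_*` names and values are purely illustrative) to show that spelling the keys out gives the same result as the implicit version.

```python
import pandas as pd

# Toy stand-ins for the three MovieLens tables (values are made up).
toy_ratings = pd.DataFrame({
    'user_id': [1, 1, 2],
    'movie_id': [10, 20, 10],
    'rating': [5, 3, 4],
})
toy_users = pd.DataFrame({'user_id': [1, 2], 'gender': ['F', 'M']})
toy_movies = pd.DataFrame({'movie_id': [10, 20], 'title': ['Movie A', 'Movie B']})

# Implicit keys: pandas joins on the overlapping column names.
implicit = pd.merge(pd.merge(toy_ratings, toy_users), toy_movies)

# Explicit keys: the join columns and join type are written out.
explicit = (
    toy_ratings
    .merge(toy_users, on='user_id', how='inner')
    .merge(toy_movies, on='movie_id', how='inner')
)

print(implicit.equals(explicit))
print(explicit)
```

Naming the keys explicitly is a little more verbose, but it guards against an accidental join on some other column that happens to share a name.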


## Mean ratings for each movie grouped by gender

Using the pivot_table method, we can calculate the mean rating of each movie for each gender.

In [31]:
mean_ratings = data.pivot_table('rating', index='title', columns='gender', aggfunc='mean')
mean_ratings.head()

title,F,M
"$1,000,000 Duck (1971)",3.375,2.761905
'Night Mother (1986),3.388889,3.352941
'Til There Was You (1997),2.675676,2.733333
"'burbs, The (1989)",2.793478,2.962085
...And Justice for All (1979),3.828571,3.689024
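
To see what the pivot table computes, it can help to reproduce it on a tiny example. The sketch below uses made-up ratings and compares `pivot_table` with an equivalent `groupby` followed by `unstack`; both give one row per title and one column per gender.

```python
import pandas as pd

# Made-up ratings for two hypothetical movies.
toy = pd.DataFrame({
    'title': ['Movie A', 'Movie A', 'Movie A', 'Movie B', 'Movie B'],
    'gender': ['F', 'M', 'M', 'F', 'M'],
    'rating': [4, 2, 4, 5, 3],
})

# Same call shape as in the cell above.
via_pivot = toy.pivot_table('rating', index='title', columns='gender', aggfunc='mean')

# Equivalent: average per (title, gender) pair, then move gender into the columns.
via_groupby = toy.groupby(['title', 'gender'])['rating'].mean().unstack('gender')

print(via_pivot)
print(via_pivot.to_dict() == via_groupby.to_dict())
```

In this toy data, the two ratings of Movie A by men (2 and 4) average to 3.0, which is the value that appears in the M column.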


In [32]:
ratings_by_title = data.groupby('title').size()

In [33]:
active_titles = ratings_by_title.index[ratings_by_title >= 100]  # only movies with at least 100 ratings

In [34]:
mean_ratings = mean_ratings.loc[active_titles]  # keep only the titles selected in the previous step
mean_ratings

title,F,M
"'burbs, The (1989)",2.793478,2.962085
...And Justice for All (1979),3.828571,3.689024
10 Things I Hate About You (1999),3.646552,3.311966
101 Dalmatians (1961),3.791444,3.500000
101 Dalmatians (1996),3.240000,2.911215
...,...,...
Young Guns II (1990),2.934783,2.904025
Young Sherlock Holmes (1985),3.514706,3.363344
Your Friends and Neighbors (1998),2.888889,3.536585
Zero Effect (1998),3.864407,3.723140
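
The filtering step above can be sketched on a small example. The data and the `min_ratings` threshold below are made up for illustration (the analysis itself uses 100). Note that `>=` keeps titles with exactly the threshold number of ratings.

```python
import pandas as pd

# Made-up ratings: Movie A has 3 ratings, Movie B has 2, Movie C has 1.
toy = pd.DataFrame({
    'title': ['Movie A'] * 3 + ['Movie B'] * 2 + ['Movie C'],
    'rating': [4, 5, 3, 2, 4, 1],
})
min_ratings = 2  # hypothetical threshold for this toy example

counts = toy.groupby('title').size()          # number of ratings per title
kept = counts.index[counts >= min_ratings]    # boolean mask applied to the index

toy_means = toy.groupby('title')['rating'].mean().loc[kept]
print(list(kept))
print(toy_means)
```

Movie C is dropped because a mean based on a single rating says little about how a movie is generally received.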


Which movies are rated highest by women?

In [36]:
top_female_ratings = mean_ratings.sort_values(by='F', ascending=False) 
top_female_ratings

title,F,M
"Close Shave, A (1995)",4.644444,4.473795
"Wrong Trousers, The (1993)",4.588235,4.478261
"General, The (1927)",4.575758,4.329480
Sunset Blvd. (a.k.a. Sunset Boulevard) (1950),4.572650,4.464589
Wallace & Gromit: The Best of Aardman Animation (1996),4.563107,4.385075
...,...,...
Battlefield Earth (2000),1.574468,1.616949
Friday the 13th Part VI: Jason Lives (1986),1.500000,2.291667
Kazaam (1996),1.444444,1.470588
Friday the 13th Part V: A New Beginning (1985),1.272727,2.165049


We can see that the mean ratings from men and women track each other closely. But are there movies on which the two groups seriously disagree?

## Disagreement between men and women regarding movie ratings

In [39]:
mean_ratings['diff'] = mean_ratings['M'] - mean_ratings['F']
sorted_by_diff = mean_ratings.sort_values(by='diff')
sorted_by_diff.head(10)

title,F,M,diff
Pet Sematary II (1992),2.833333,1.858696,-0.974638
Cutthroat Island (1995),3.2,2.34127,-0.85873
Dirty Dancing (1987),3.790378,2.959596,-0.830782
Air Bud (1997),3.057143,2.233766,-0.823377
Home Alone 3 (1997),2.486486,1.683761,-0.802726
"To Wong Foo, Thanks for Everything! Julie Newmar (1995)",3.486842,2.795276,-0.691567
Jumpin' Jack Flash (1986),3.254717,2.578358,-0.676359
Orlando (1993),3.862745,3.190476,-0.672269
Spy Hard (1996),3.125,2.472527,-0.652473
Dracula: Dead and Loving It (1995),2.892857,2.25,-0.642857


These are the 10 movies that women rated most favorably relative to men, i.e. the ones with the most negative diff (M minus F).

In [40]:
sorted_by_diff[::-1][:10]  # reversing the order gives the top 10 movies rated more favorably by men than by women

title,F,M,diff
Friday the 13th Part V: A New Beginning (1985),1.272727,2.165049,0.892321
Friday the 13th Part VI: Jason Lives (1986),1.5,2.291667,0.791667
Lifeforce (1985),2.25,2.994152,0.744152
Marked for Death (1990),2.1,2.837607,0.737607
Quest for Fire (1981),2.578947,3.309677,0.73073
"Good, The Bad and The Ugly, The (1966)",3.494949,4.2213,0.726351
No Escape (1994),2.3,2.994048,0.694048
"Kentucky Fried Movie, The (1977)",2.878788,3.555147,0.676359
Your Friends and Neighbors (1998),2.888889,3.536585,0.647696
Tora! Tora! Tora! (1970),3.090909,3.737705,0.646796


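This list is led by two Friday the 13th horror sequels and also includes classics such as The Good, the Bad and the Ugly (1966) and Tora! Tora! Tora! (1970).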
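
The two tables above look at each direction separately. One possible variation, not part of the original analysis, is to rank movies by the absolute value of the difference, so that the largest disagreements come first regardless of which group liked the movie more. A minimal sketch on made-up means:

```python
import pandas as pd

# Made-up mean ratings for three hypothetical movies.
toy_means = pd.DataFrame(
    {'F': [4.0, 3.8, 2.5], 'M': [4.1, 3.0, 3.0]},
    index=pd.Index(['Movie C', 'Movie A', 'Movie B'], name='title'),
)
toy_means['diff'] = toy_means['M'] - toy_means['F']

# Sort by the size of the gap, ignoring its sign.
order = toy_means['diff'].abs().sort_values(ascending=False).index
most_divisive = toy_means.loc[order]
print(most_divisive)
```

The sign of diff still shows the direction: negative means women rated the movie higher, positive means men did.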

## Movies with the highest deviation in ratings across all users

Ignoring the gender split, let's see which movies have the highest standard deviation of ratings.

In [52]:
rating_std_by_title = data.groupby('title')['rating'].std() # standard deviation for ratings

In [53]:
rating_std_by_title = rating_std_by_title.loc[active_titles]

In [54]:
rating_std_by_title.sort_values(ascending=False)[:10] 

title
Plan 9 from Outer Space (1958)                    1.455998
Beloved (1998)                                    1.372813
Godzilla 2000 (Gojira ni-sen mireniamu) (1999)    1.364700
Texas Chainsaw Massacre, The (1974)               1.332448
Dumb & Dumber (1994)                              1.321333
Crash (1996)                                      1.319636
Blair Witch Project, The (1999)                   1.316368
Natural Born Killers (1994)                       1.307198
Down to You (2000)                                1.305310
Cemetery Man (Dellamorte Dellamore) (1994)        1.300647
Name: rating, dtype: float64

These are the 10 films whose ratings vary the most, among movies with at least 100 ratings.
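
A high standard deviation means viewers disagree: many gave the film a low rating while many others gave it a high one. The sketch below illustrates this with two made-up movies. It also shows that the pandas `std` method computes the sample standard deviation (`ddof=1`) by default; passing `ddof=0` gives the population version. With at least 100 ratings per title the two differ only slightly.

```python
import pandas as pd

# Made-up ratings: one movie splits viewers, the other gets similar scores.
toy = pd.DataFrame({
    'title': ['Polarizing'] * 4 + ['Consensus'] * 4,
    'rating': [1, 5, 1, 5, 3, 3, 4, 3],
})

by_title = toy.groupby('title')['rating']
sample_std = by_title.std()             # pandas default, ddof=1
population_std = by_title.std(ddof=0)   # divide by n instead of n - 1

print(sample_std.sort_values(ascending=False))
print(population_std.sort_values(ascending=False))
```

Both versions rank the hypothetical "Polarizing" movie first, since its ratings sit at the two extremes of the scale.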