# Movie Rankings: Computing a Mean Rating

1) First, we calculate the average rating for each movie in the dataset, both overall and broken down by user gender.

2) Next, we display the top three movies by average rating: for all users, for male users only, and for female users only.

3) Finally, we find the three movies with the greatest difference in average rating between male and female users.

In [146]:
# Settings 
import os
import numpy as np
import pandas as pd
import sqlite3
from sqlite3 import Error as SQLiteError

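# Pandas (use the full option name; the short form 'precision' is ambiguous in newer pandas versions)
pd.set_option('display.precision', 4)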

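# SQLite
dbfile = "sqlitedb/movielens.db"
if not os.path.isfile(dbfile):
    # sqlite3.connect() would silently create an empty database here, so stop early
    raise FileNotFoundError(f"Failed to detect the database file: {dbfile}")

# Establish DB Connection (sqlite3.connect raises on failure; it never returns a falsy value)
try:
    conn = sqlite3.connect(dbfile)
except SQLiteError as exc:
    print(f"Failed to establish DB connection: {exc}")
    raise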

In [147]:
# Data Query
query = """
    SELECT r.movie_id AS MovieID,
        m.movie_name AS MovieName,
        r.user_id AS UserID,
        CASE u.user_gender
            WHEN 0 THEN 'M'
            ELSE 'F'
        END AS Gender,
        r.rating AS Rating
    FROM ratings AS r
        LEFT JOIN movies AS m ON m.movie_id = r.movie_id
        LEFT JOIN users AS u ON u.user_id = r.user_id
    ORDER BY r.movie_id, r.user_id
"""

summary = pd.read_sql_query(query, conn)

print("\nSummary DataFrame:\n")
print(summary.head(n=5))


Summary DataFrame:

   MovieID  MovieName  UserID Gender  Rating
0        1  Toy Story     139      M       2
1        1  Toy Story     755      M       2
2        1  Toy Story    1577      F       4
3        1  Toy Story    1940      M       4
4        1  Toy Story    2765      M       4
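
One detail of the query is worth checking. Because `users` is attached with a `LEFT JOIN` and the `CASE` expression falls through to `ELSE 'F'`, a rating whose user has no row in `users` (so `user_gender` is NULL) would be labelled 'F'. The sketch below reproduces that behaviour on a tiny in-memory database. The table layouts are assumptions inferred from the column names in the query, the sample rows are made up, and the gender code `1` is simply an assumed non-zero value.

```python
import sqlite3
import pandas as pd

# Hypothetical miniature schema, inferred from the columns used in the query above.
demo = sqlite3.connect(":memory:")
demo.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, user_gender INTEGER);
    CREATE TABLE ratings (user_id INTEGER, movie_id INTEGER, rating INTEGER);
    INSERT INTO users VALUES (1, 0), (2, 1);
    -- user 3 has a rating but no row in users
    INSERT INTO ratings VALUES (1, 10, 4), (2, 10, 5), (3, 10, 2);
""")

gender_check = pd.read_sql_query("""
    SELECT r.user_id AS UserID,
        u.user_gender AS RawGender,
        CASE u.user_gender WHEN 0 THEN 'M' ELSE 'F' END AS Gender
    FROM ratings AS r
        LEFT JOIN users AS u ON u.user_id = r.user_id
    ORDER BY r.user_id
""", demo)
demo.close()

# Rows where the raw gender is missing but a label was still assigned
unmatched = gender_check[gender_check["RawGender"].isna()]
print(gender_check)
print("Ratings without a matching user:", len(unmatched))
```

If such rows turn up in the real data, dropping them or giving NULL its own label before averaging keeps them from being counted as female ratings.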


## 1. Rating by Average Value

### 1.1 Movie Rating by Average

In [148]:
# Group all ratings by movie title
movie_ratings = summary.groupby(['MovieName'])
# Mean rating per movie, highest first
movie_mean_ratings = movie_ratings[['Rating']].mean().sort_values('Rating', ascending=False)
print(movie_mean_ratings)

                                            Rating
MovieName                                         
Shawshank Redemption, The                   3.6000
Star Wars: Episode IV - A New Hope          3.2667
Blade Runner                                3.2222
Groundhog Day                               3.1667
Silence of the Lambs, The                   3.0625
Babe                                        3.0000
Saving Private Ryan                         3.0000
Star Wars: Episode VI - Return of the Jedi  3.0000
Schindler's List                            3.0000
Pulp Fiction                                3.0000
Gladiator                                   2.9167
Raiders of the Lost Ark                     2.9091
Shakespeare in Love                         2.9091
Matrix, The                                 2.8333
Sixth Sense, The                            2.8333
Toy Story                                   2.8235
Independence Day (ID4)                      2.7692
Forrest Gump                   
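
A mean on its own does not show how many ratings stand behind it, which matters when ranking movies that have only a handful of ratings each. As a minimal sketch on made-up data (the `toy` frame only mimics the shape of `summary`), `agg` can report the count next to the mean:

```python
import pandas as pd

# Made-up ratings in the same shape as the `summary` DataFrame
toy = pd.DataFrame({
    "MovieName": ["A", "A", "A", "B", "B", "C"],
    "Rating":    [4,   3,   5,   2,   4,   5],
})

# Mean rating and number of ratings per movie, best first
stats = (
    toy.groupby("MovieName")["Rating"]
       .agg(["mean", "count"])
       .sort_values("mean", ascending=False)
)
print(stats)
```

In this toy data movie "C" ranks first on the strength of a single rating, which is the kind of case the `count` column makes visible.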

### 1.2 Average Rating by User Gender

In this section we compute the **average movie rating by user gender**, using the pandas **groupby** method with both the movie name and the gender as grouping keys.

In [149]:
movie_mean_rating_by_gender = summary.groupby(['MovieName', 'Gender'])[['Rating']].mean()
print(movie_mean_rating_by_gender)

                                                   Rating
MovieName                                  Gender        
Babe                                       F       3.4286
                                           M       2.0000
Blade Runner                               F       3.5000
                                           M       3.0000
Forrest Gump                               F       3.0000
                                           M       2.2500
Gladiator                                  F       3.0000
                                           M       2.8333
Groundhog Day                              F       2.8333
                                           M       3.5000
Independence Day (ID4)                     F       2.6667
                                           M       2.8571
Matrix, The                                F       2.4000
                                           M       3.1429
Pulp Fiction                               F       4.0000
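
The grouped result holds one row per (movie, gender) pair. For side-by-side reading it can be pivoted so that each gender becomes a column. The following is a small sketch on made-up data, not the notebook's dataset; `toy` and `wide` are illustrative names.

```python
import pandas as pd

# Made-up ratings with the same columns as the `summary` DataFrame
toy = pd.DataFrame({
    "MovieName": ["A", "A", "A", "B", "B", "B"],
    "Gender":    ["F", "M", "M", "F", "F", "M"],
    "Rating":    [5,   2,   4,   3,   4,   1],
})

# Mean rating per (movie, gender) pair, as a Series with a two-level index
by_gender = toy.groupby(["MovieName", "Gender"])["Rating"].mean()

# Move the Gender level into columns: one row per movie, one column per gender
wide = by_gender.unstack("Gender")
print(wide)
```

A movie rated by only one gender would show NaN in the other column after `unstack`.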
              

## 2. Top 3 Movies by Average Rating

### 2.1. Top 3 Movies by All Users

In [150]:
# Top 3 movies
print( movie_mean_ratings.head(n=3) )

                                    Rating
MovieName                                 
Shawshank Redemption, The           3.6000
Star Wars: Episode IV - A New Hope  3.2667
Blade Runner                        3.2222


### 2.2. Top 3 Movies by Female Users

In [151]:
female_summary = summary[summary['Gender']=='F']
female_mean_rating = female_summary.groupby(['MovieName'])[['Rating']].mean()
print(female_mean_rating.sort_values('Rating', ascending=False).head(n=3))

                           Rating
MovieName                        
Shakespeare in Love          4.25
Pulp Fiction                 4.00
Shawshank Redemption, The    3.80


### 2.3. Top 3 Movies by Male Users

In [152]:
male_summary = summary[summary['Gender']=='M']
male_mean_rating = male_summary.groupby(['MovieName'])[['Rating']].mean()
print(male_mean_rating.sort_values('Rating', ascending=False).head(n=3))

                         Rating
MovieName                      
Raiders of the Lost Ark  3.6667
Schindler's List         3.5000
Groundhog Day            3.5000


The results show that the average tastes of the male and female audiences differ considerably. The top movie among female users, **"Shakespeare in Love"**, averaged only **2.1429** among male users, while the top movie among male users, **"Raiders of the Lost Ark"**, averaged only **2.0000** among female users. Next, we find the three movies with the largest difference between male and female average ratings.
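
Cross-gender averages like the ones quoted above can be looked up by placing the two per-gender mean tables next to each other. The sketch below uses made-up numbers; `female_mean` and `male_mean` are stand-ins shaped like the notebook's `female_mean_rating` and `male_mean_rating`.

```python
import pandas as pd

# Made-up per-gender mean ratings, indexed by movie name
female_mean = pd.DataFrame({"Rating": [4.5, 2.0, 3.0]},
                           index=pd.Index(["A", "B", "C"], name="MovieName"))
male_mean = pd.DataFrame({"Rating": [2.5, 4.0]},
                         index=pd.Index(["A", "B"], name="MovieName"))

# Outer join keeps movies rated by only one gender (the other side becomes NaN)
side_by_side = female_mean.join(male_mean, how="outer", lsuffix="_F", rsuffix="_M")

# Highest-rated title for each gender
top_female = female_mean["Rating"].idxmax()
top_male = male_mean["Rating"].idxmax()

# Show how each gender's favourite fares with the other gender
print(side_by_side.loc[[top_female, top_male]])
```

The outer join is a deliberate choice here: an inner join would silently drop any movie that one gender never rated.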