# Movie Ratings: Positivity Rating

We will compute the **Positivity Rating** of a movie as the share of its ratings that are 4 stars or more. The values below are reported as fractions between 0 and 1.
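Stated as a formula (a restatement of the definition above; the symbols are introduced here only for illustration), for a movie $m$ with $N_m$ ratings $r_{m,1}, \dots, r_{m,N_m}$:

$$
\text{Positivity}(m) = \frac{\#\{\, i : r_{m,i} \ge 4 \,\}}{N_m}
$$

For instance, a movie with 10 ratings, 7 of which are 4 stars or higher, has a Positivity Rating of $7/10 = 0.7$.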

1) First we will calculate the positivity rating for each movie. We will print the list of movies ordered by **Positivity Rating**, and display the Top 3 most positively rated movies.

2) Then we will compute the **Positivity Rating** for males and females separately. We will print the ratings out, and display the Top 3 most positively rated movies for the female and male audiences.

3) Finally we will show how positive our users were in general, by computing the overall positivity score, the positivity score for the male audience, and the positivity score for the female audience.

In [1]:
# Settings 
import os
import numpy as np
import pandas as pd
import sqlite3
from sqlite3 import Error as SQLiteError

# Pandas
pd.set_option('display.precision', 4)  # full option name; the bare 'precision' is ambiguous in newer pandas

# SQLite
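dbfile = "sqlitedb/movielens.db"
if not os.path.isfile(dbfile):
    # sqlite3.connect() would silently create an empty database at this path
    raise FileNotFoundError(f"Failed to detect the database file: {dbfile}")

# Establish DB Connection
# sqlite3.connect() raises on failure; it never returns a falsy value
try:
    conn = sqlite3.connect(dbfile)
except SQLiteError as err:
    raise RuntimeError("Failed to establish DB connection.") from err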

In [2]:
# Data Query
query = """
    SELECT r.movie_id AS MovieID, 
        m.movie_name AS MovieName,
        r.user_id AS UserID,
        CASE u.user_gender
            WHEN 0 THEN 'M'
            ELSE 'F'
        END Gender,
        r.rating as Rating       
        
    FROM ratings AS r
        LEFT JOIN movies as m ON m.movie_id = r.movie_id
        LEFT JOIN users as u ON u.user_id = r.user_id
    ORDER BY r.movie_id, r.user_id
"""

summary = pd.read_sql_query(query, conn)

print("\nSummary DataFrame:\n")
summary.head(n=5)


Summary DataFrame:



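|   | MovieID | MovieName | UserID | Gender | Rating |
|---|---------|-----------|--------|--------|--------|
| 0 | 1 | Toy Story | 139 | M | 2 |
| 1 | 1 | Toy Story | 755 | M | 2 |
| 2 | 1 | Toy Story | 1577 | F | 4 |
| 3 | 1 | Toy Story | 1940 | M | 4 |
| 4 | 1 | Toy Story | 2765 | M | 4 |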
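The notebook pulls one row per rating and aggregates in pandas. As an alternative, the same share could be computed inside SQLite. The sketch below is only an illustration: it uses an in-memory database with made-up rows, and it assumes table and column names that mirror the query above.

```python
import sqlite3
import pandas as pd

# Toy in-memory database. Table and column names mirror the query above,
# but the rows are invented for illustration.
conn_demo = sqlite3.connect(":memory:")
conn_demo.executescript("""
    CREATE TABLE movies  (movie_id INTEGER PRIMARY KEY, movie_name TEXT);
    CREATE TABLE ratings (movie_id INTEGER, user_id INTEGER, rating INTEGER);
    INSERT INTO movies  VALUES (1, 'Movie A'), (2, 'Movie B');
    INSERT INTO ratings VALUES (1, 10, 5), (1, 11, 4), (1, 12, 4), (1, 13, 2),
                               (2, 10, 4), (2, 11, 1);
""")

# AVG over a 1.0/0.0 indicator is the share of ratings above 3 stars.
sql = """
    SELECT m.movie_name AS MovieName,
           AVG(CASE WHEN r.rating > 3 THEN 1.0 ELSE 0.0 END) AS Positivity,
           COUNT(*) AS TotalVotes
    FROM ratings AS r
        LEFT JOIN movies AS m ON m.movie_id = r.movie_id
    GROUP BY m.movie_name
    ORDER BY Positivity DESC, TotalVotes DESC
"""
positivity_sql = pd.read_sql_query(sql, conn_demo)
conn_demo.close()
print(positivity_sql)
```

Aggregating in SQL returns one row per movie instead of one row per rating, which matters more as the ratings table grows. The per-rating rows are still needed for the gender split later in this notebook.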


## 1. Movie Positivity Rating

Before computing the **Positivity Rating**, we add an extra column to our summary dataset. The column is named **'IsPositive'** and is set to 1 if 'Rating' > 3 (that is, 4 or 5 stars) and to 0 otherwise.


In [3]:
# Positivity Summary DataFrame
summary['IsPositive'] = np.where(summary['Rating']>3, 1, 0)
positivity_summary = summary[['MovieName', 
                'Gender',
                'Rating',
                'IsPositive'
               ]]
positivity_summary.head(n=5)

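|   | MovieName | Gender | Rating | IsPositive |
|---|-----------|--------|--------|------------|
| 0 | Toy Story | M | 2 | 0 |
| 1 | Toy Story | M | 2 | 0 |
| 2 | Toy Story | F | 4 | 1 |
| 3 | Toy Story | M | 4 | 1 |
| 4 | Toy Story | M | 4 | 1 |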


### 1.1. Positivity Rating Computation

In [4]:
# Positivity Rating Computation

positivity_rating = pd.pivot_table(positivity_summary, 
                                        index = ['MovieName'], 
                                        aggfunc ={
                                            'IsPositive': lambda x: x.sum() / x.count(),
                                            'Rating': lambda x: x.count()
                                        } 
                                   )

positivity_rating.columns = ['Positivity', 'TotalVotes']
positivity_rating = positivity_rating.sort_values(['Positivity','TotalVotes'], ascending=False)

positivity_rating

| MovieName | Positivity | TotalVotes |
|-----------|------------|------------|
| Shawshank Redemption, The | 0.7 | 10 |
| Star Wars: Episode IV - A New Hope | 0.5333 | 15 |
| Gladiator | 0.5 | 12 |
| Blade Runner | 0.4444 | 9 |
| Silence of the Lambs, The | 0.4375 | 16 |
| Groundhog Day | 0.4167 | 12 |
| Matrix, The | 0.4167 | 12 |
| Babe | 0.4 | 10 |
| Pulp Fiction | 0.3636 | 11 |
| Saving Private Ryan | 0.3636 | 11 |
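Because 'IsPositive' is a 0/1 indicator, its mean is the same as `x.sum() / x.count()`. A minimal sketch of the same computation with `groupby` and named aggregation follows. The `toy` frame and its rows are made up for illustration and only mimic the columns of `positivity_summary`.

```python
import pandas as pd

# Made-up rows with the same column names as positivity_summary.
toy = pd.DataFrame({
    "MovieName": ["A", "A", "A", "A", "B", "B"],
    "Rating":    [5, 4, 4, 2, 4, 1],
})
toy["IsPositive"] = (toy["Rating"] > 3).astype(int)

# Named aggregation sets the output column names directly,
# so no positional renaming of columns is needed afterwards.
toy_rating = (
    toy.groupby("MovieName")
       .agg(Positivity=("IsPositive", "mean"), TotalVotes=("Rating", "count"))
       .sort_values(["Positivity", "TotalVotes"], ascending=False)
)
print(toy_rating)
```

Naming the output columns in the aggregation avoids relying on the alphabetical column order that `pivot_table` produces before the `columns = [...]` assignment.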


### 1.2. Top 3 Movies by the Positivity Rating

In [5]:
# Top 3 Movies by Positivity Rating
positivity_rating.head(n=3)

| MovieName | Positivity | TotalVotes |
|-----------|------------|------------|
| Shawshank Redemption, The | 0.7 | 10 |
| Star Wars: Episode IV - A New Hope | 0.5333 | 15 |
| Gladiator | 0.5 | 12 |


## 2. Positivity Rating by Female Audience

We start by filtering the female ratings out of the **positivity_summary** data frame.

In [6]:
female_positivity_summary = positivity_summary[positivity_summary['Gender']=='F']
female_positivity_summary.head(n=5)

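|   | MovieName | Gender | Rating | IsPositive |
|---|-----------|--------|--------|------------|
| 2 | Toy Story | F | 4 | 1 |
| 5 | Toy Story | F | 4 | 1 |
| 6 | Toy Story | F | 3 | 0 |
| 7 | Toy Story | F | 3 | 0 |
| 8 | Toy Story | F | 4 | 1 |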


### 2.1. Female Audience Positivity Rating Computation

In [7]:
# Female Positivity Rating Computation

female_positivity_rating = pd.pivot_table(female_positivity_summary, 
                                        index = ['MovieName'], 
                                        aggfunc ={
                                            'IsPositive': lambda x: x.sum() / x.count(),
                                            'Rating': lambda x: x.count()
                                        } 
                                   )
female_positivity_rating.columns = ['Positivity', 'TotalVotes']
female_positivity_rating = female_positivity_rating.sort_values(['Positivity','TotalVotes'], ascending=False)

female_positivity_rating

| MovieName | Positivity | TotalVotes |
|-----------|------------|------------|
| Shawshank Redemption, The | 0.8 | 5 |
| Blade Runner | 0.75 | 4 |
| Shakespeare in Love | 0.75 | 4 |
| Pulp Fiction | 0.6667 | 3 |
| Babe | 0.5714 | 7 |
| Star Wars: Episode IV - A New Hope | 0.5714 | 7 |
| Toy Story | 0.5714 | 7 |
| Forrest Gump | 0.5 | 6 |
| Gladiator | 0.5 | 6 |
| Star Wars: Episode VI - Return of the Jedi | 0.5 | 6 |


### 2.2. Top 3 Movies by Female Positivity Rating

In [8]:
female_positivity_rating.head(n=3)

| MovieName | Positivity | TotalVotes |
|-----------|------------|------------|
| Shawshank Redemption, The | 0.8 | 5 |
| Blade Runner | 0.75 | 4 |
| Shakespeare in Love | 0.75 | 4 |


## 3. Positivity Rating by Male Audience

As with the female audience, we create a dataset containing only the male ratings.

In [9]:
male_positivity_summary = positivity_summary[positivity_summary['Gender']=='M']
male_positivity_summary.head(n=5)

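|   | MovieName | Gender | Rating | IsPositive |
|---|-----------|--------|--------|------------|
| 0 | Toy Story | M | 2 | 0 |
| 1 | Toy Story | M | 2 | 0 |
| 3 | Toy Story | M | 4 | 1 |
| 4 | Toy Story | M | 4 | 1 |
| 9 | Toy Story | M | 2 | 0 |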


### 3.1. Male Audience Positivity Rating Computation

In [10]:
# Male Positivity Rating Computation

male_positivity_rating = pd.pivot_table(male_positivity_summary, 
                                        index = ['MovieName'], 
                                        aggfunc ={
                                            'IsPositive': lambda x: x.sum() / x.count(),
                                            'Rating': lambda x: x.count()
                                        } 
                                   )
male_positivity_rating.columns = ['Positivity', 'TotalVotes']
male_positivity_rating = male_positivity_rating.sort_values(['Positivity','TotalVotes'], ascending=False)

male_positivity_rating

| MovieName | Positivity | TotalVotes |
|-----------|------------|------------|
| Shawshank Redemption, The | 0.6 | 5 |
| Matrix, The | 0.5714 | 7 |
| Silence of the Lambs, The | 0.5556 | 9 |
| Star Wars: Episode IV - A New Hope | 0.5 | 8 |
| Gladiator | 0.5 | 6 |
| Groundhog Day | 0.5 | 6 |
| Raiders of the Lost Ark | 0.5 | 6 |
| Schindler's List | 0.5 | 6 |
| Sixth Sense, The | 0.3333 | 6 |
| Independence Day (ID4) | 0.2857 | 7 |


### 3.2. Top 3 Movies by Male Positivity Rating

In [11]:
male_positivity_rating.head(n=3)

| MovieName | Positivity | TotalVotes |
|-----------|------------|------------|
| Shawshank Redemption, The | 0.6 | 5 |
| Matrix, The | 0.5714 | 7 |
| Silence of the Lambs, The | 0.5556 | 9 |
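The female and male tables are built with identical code on two filtered frames. One possible way to get both in a single pass, and to compare the scores side by side, is to pivot with `columns='Gender'`. The sketch below uses a made-up `toy` frame, and the `Gap_F_minus_M` column name is hypothetical.

```python
import pandas as pd

# Made-up rows with the same column names as positivity_summary.
toy = pd.DataFrame({
    "MovieName":  ["A", "A", "A", "A", "B", "B", "B"],
    "Gender":     ["F", "F", "M", "M", "F", "M", "M"],
    "IsPositive": [1, 1, 0, 1, 0, 1, 1],
})

# One row per movie, one column per gender, each cell a positivity share.
by_gender = pd.pivot_table(
    toy, index="MovieName", columns="Gender",
    values="IsPositive", aggfunc="mean",
)
# Positive values mean the female audience was more positive about the movie.
by_gender["Gap_F_minus_M"] = by_gender["F"] - by_gender["M"]
print(by_gender)
```

Sorting such a frame by the `F` or `M` column reproduces the per-gender rankings, and the gap column makes disagreements between the two audiences easy to spot.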


## 4. User Positivity Summary

In [12]:
audience_positivity_rating = pd.pivot_table(positivity_summary, index=['Gender'], values='IsPositive', 
               aggfunc = 'mean',  # string alias; newer pandas warns when np.mean is passed
               margins=True)
audience_positivity_rating.columns = ['Positivity']
audience_positivity_rating

| Gender | Positivity |
|--------|------------|
| F | 0.4211 |
| M | 0.3386 |
| All | 0.3776 |
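The "All" row produced by `margins=True` is the mean of 'IsPositive' over every rating, so it is a vote-weighted average of the two gender rates rather than their simple average. That is why 0.3776 lies between 0.3386 and 0.4211 without being their midpoint. A small sketch with made-up data, chosen so that the two averages differ clearly:

```python
import pandas as pd

# Made-up data: 2 female ratings (both positive), 6 male ratings (1 positive).
toy = pd.DataFrame({
    "Gender":     ["F", "F", "M", "M", "M", "M", "M", "M"],
    "IsPositive": [1, 1, 1, 0, 0, 0, 0, 0],
})

rates = toy.groupby("Gender")["IsPositive"].agg(["mean", "count"])

overall = toy["IsPositive"].mean()  # what the 'All' margin reports
weighted = (rates["mean"] * rates["count"]).sum() / rates["count"].sum()
unweighted = rates["mean"].mean()   # simple average of the two gender rates

print(f"overall={overall:.4f} weighted={weighted:.4f} unweighted={unweighted:.4f}")
```

The overall score therefore leans toward whichever group cast more votes.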


## 5. Summary

Let's summarize the results!

In this dataset women gave positive ratings more often than men did, which matches what we saw when analyzing the movie **Mean Ratings**. Female users were positive in approximately **42%** of their ratings, and male users in approximately **34%**.

The top movie on our list is **"The Shawshank Redemption"**.
It ranks first for both male and female users, and it is the only movie that appears in all three **Top 3** lists. Male users gave it a positivity score of **0.6000** and female users **0.8000**, for an overall score of **0.7000**.

The second best movie based on the votes of all users is **"Star Wars: Episode IV - A New Hope"**, with a **Positivity Rating** of **0.5333**. In the male ranking it placed 4th with a positivity score of **0.5000**, and in the female ranking it is listed 6th with a score of **0.5714**, tied with two other movies.

Second place in the female ranking is shared by two movies, **"Shakespeare in Love"** and **"Blade Runner"**, both with a positivity score of **0.7500**. Interestingly, among male users **"Shakespeare in Love"** scored 0.0000 and **"Blade Runner"** scored 0.2000.

The second most positively rated movie in the male ranking is **"The Matrix"**, with a positivity rating of **0.5714**. Among female users the movie scored only 0.2000.

Looking at the positivity rating results, we can conclude that the tastes of men and women differ considerably, although there are movies on which both audiences agree.