# IS 362 – Week 7 Assignment

### Tasks:
Choose six recent popular movies. Ask at least five people that you know (friends, family, classmates, imaginary friends) to rate each of these movies that they have seen on a scale of 1 to 5. There should be at least one movie that not everyone has seen!
Take the results (observations) and store them somewhere (like a SQL database, or a .CSV file). Load the information into a pandas dataframe. Your solution should include Python and pandas code that accomplishes the following:
   
1. Load the ratings by user information that you collected into a pandas dataframe.
2. Show the average ratings for each user and each movie.
3. Create a new pandas dataframe, with normalized ratings for each user. Again, show the average ratings for each user and each movie.
4. Provide a text-based conclusion: explain what might be advantages and disadvantages of using normalized ratings instead of the actual ratings.
5. [Extra credit] Create another new pandas dataframe, with standardized ratings for each user. Once again, show the average ratings for each user and each movie.

You may find this short article on normalization and standardization to be useful: 
http://bi-analytics.org/topic/9-standardization-vs-normalization/

Your deliverables should include your source data and a Jupyter Notebook, posted to GitHub.
This is by design a very open-ended assignment. A variety of reasonable approaches are acceptable.
You may work in a small group on this assignment. If you work in a group, each group member should indicate who they worked with, and all group members should individually submit their assignment.
Please start early, and do work that you would want to include in a “presentations portfolio” that you might share in a job interview with a potential employer! You are encouraged to share thoughts, ask, and answer clarifying questions in the “Week 7: Data Aggregation” forum.

##### 1. Import pandas and read review_popular_movies data from the CSV file.

In [31]:
import pandas as pd
import numpy as np

In [35]:
review_movies = pd.read_csv('review_popular_movies.csv')
review_movies

,Reviewer_Name,Moana,Star Wars,Logan,Wonder Woman,Spider Man,John Wick
0,Max,,5.0,5.0,3.0,4.0,5
1,Heather,5.0,,4.0,5.0,3.0,3
2,John,3.0,4.0,,5.0,4.0,2
3,Lawrence,4.0,3.0,4.0,,3.0,5
4,Alfred,1.0,2.0,3.0,4.0,,5
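
For readers without the CSV file, the following is a minimal sketch that rebuilds the same ratings from an in-memory string. The name `csv_text` and the column layout are assumptions reconstructed from the table above, not the actual contents of `review_popular_movies.csv`. It also uses `index_col` so that the reviewer name becomes the row label at load time.

```python
import io

import pandas as pd

# Hypothetical inline copy of the ratings, reconstructed from the table above.
# An empty field means the reviewer has not seen that movie.
csv_text = """Reviewer_Name,Moana,Star Wars,Logan,Wonder Woman,Spider Man,John Wick
Max,,5,5,3,4,5
Heather,5,,4,5,3,3
John,3,4,,5,4,2
Lawrence,4,3,4,,3,5
Alfred,1,2,3,4,,5
"""

# index_col makes the reviewer name the row index while reading
ratings = pd.read_csv(io.StringIO(csv_text), index_col="Reviewer_Name")

# Each unseen movie is stored as NaN; count them per movie
missing_per_movie = ratings.isna().sum()

print(ratings)
print(missing_per_movie)
```

Columns that contain a missing rating are read as floats, while a column with no gaps (here John Wick) stays an integer column, which matches the mixed `5.0` and `5` formatting in the output above.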


##### 2. Show the average ratings for each user and each movie.

In [36]:
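# Use the reviewer name as the row index so only rating columns are averaged
review_movies = review_movies.set_index('Reviewer_Name')
# Per-user average (row-wise); unseen movies (NaN) are skipped by default
review_avg = review_movies.mean(axis=1).round(decimals=0)
# Per-movie average (column-wise), also rounded to whole numbers
movie_avg = review_movies.mean().round(decimals=0)
review_avg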

Reviewer_Name
Max         4.0
Heather     4.0
John        4.0
Lawrence    4.0
Alfred      3.0
dtype: float64

In [37]:
movie_avg

Moana           3.0
Star Wars       4.0
Logan           4.0
Wonder Woman    4.0
Spider Man      4.0
John Wick       4.0
dtype: float64
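
Rounding to whole numbers makes most of the averages above look identical. As a hedged alternative, the sketch below keeps two decimals so that differences between users and movies stay visible. The `ratings` frame is rebuilt inline from the table in step 1, and the names `user_avg` and `movie_avg` are chosen for illustration only.

```python
import numpy as np
import pandas as pd

# Ratings rebuilt from the table in step 1; np.nan marks an unseen movie
ratings = pd.DataFrame(
    {
        "Moana": [np.nan, 5, 3, 4, 1],
        "Star Wars": [5, np.nan, 4, 3, 2],
        "Logan": [5, 4, np.nan, 4, 3],
        "Wonder Woman": [3, 5, 5, np.nan, 4],
        "Spider Man": [4, 3, 4, 3, np.nan],
        "John Wick": [5, 3, 2, 5, 5],
    },
    index=pd.Index(
        ["Max", "Heather", "John", "Lawrence", "Alfred"], name="Reviewer_Name"
    ),
)

# mean() skips NaN by default, so each average uses only the movies seen
user_avg = ratings.mean(axis=1).round(2)   # one value per reviewer
movie_avg = ratings.mean(axis=0).round(2)  # one value per movie

print(user_avg)
print(movie_avg)
```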

##### 3. Create a new pandas dataframe, with normalized ratings for each user.

In [45]:
x = review_movies
# Min-max scaling to [0, 1]. x.min() and x.max() are column-wise, so each
# movie's ratings are scaled by that movie's own lowest and highest rating.
normalization = (x - x.min()) / (x.max() - x.min())
normalization

Reviewer_Name,Moana,Star Wars,Logan,Wonder Woman,Spider Man,John Wick
Max,,1.0,1.0,0.0,1.0,1.0
Heather,1.0,,0.5,1.0,0.0,0.333333
John,0.5,0.666667,,1.0,1.0,0.0
Lawrence,0.75,0.333333,0.5,,0.0,1.0
Alfred,0.0,0.0,0.0,0.5,,1.0
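
The table above is scaled column by column, that is, per movie. Since the task asks for ratings normalized for each user, the following is a minimal sketch of the row-wise variant, offered as one possible reading of the task rather than as the notebook's method. The `ratings` frame is rebuilt inline from step 1, and `user_normalized` is a name introduced here.

```python
import numpy as np
import pandas as pd

# Ratings rebuilt from the table in step 1; np.nan marks an unseen movie
ratings = pd.DataFrame(
    {
        "Moana": [np.nan, 5, 3, 4, 1],
        "Star Wars": [5, np.nan, 4, 3, 2],
        "Logan": [5, 4, np.nan, 4, 3],
        "Wonder Woman": [3, 5, 5, np.nan, 4],
        "Spider Man": [4, 3, 4, 3, np.nan],
        "John Wick": [5, 3, 2, 5, 5],
    },
    index=pd.Index(
        ["Max", "Heather", "John", "Lawrence", "Alfred"], name="Reviewer_Name"
    ),
)

# Lowest rating and rating range of each reviewer (row-wise)
row_min = ratings.min(axis=1)
row_range = ratings.max(axis=1) - row_min

# axis=0 aligns the per-reviewer Series with the row index
user_normalized = ratings.sub(row_min, axis=0).div(row_range, axis=0)

print(user_normalized.round(3))
```

With this variant each reviewer's lowest rating maps to 0 and highest to 1. A reviewer who gave every movie the same rating would have a range of 0 and end up with NaN values, which does not happen in this data set.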


##### 3.1 Show the average ratings for each user and each movie.

In [46]:
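# Row-wise mean: one average normalized rating per reviewer
norm_user_avg = normalization.mean(axis=1).round(decimals=5)
# Column-wise mean: one average normalized rating per movie
norm_movie_avg = normalization.mean().round(decimals=5)
norm_movie_avg

Moana           0.56250
Star Wars       0.50000
Logan           0.50000
Wonder Woman    0.62500
Spider Man      0.50000
John Wick       0.66667
dtype: float64

In [47]:
norm_user_avg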

Reviewer_Name
Max         0.80000
Heather     0.56667
John        0.63333
Lawrence    0.51667
Alfred      0.30000
dtype: float64

##### 4. Provide a text-based conclusion: explain what might be advantages and disadvantages of using normalized ratings instead of the actual ratings.

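Min-max normalization rescales ratings to a common range from 0 to 1, so ratings become easier to compare when the raw 1 to 5 scale is used differently. Because the code above scales column by column, each movie's lowest rating becomes 0 and its highest becomes 1. The main advantage is comparability: a normalized value shows where a rating falls between the lowest and highest rating in its group, regardless of how generous or harsh the raw numbers were. The main disadvantages are that the original meaning of the scale is lost and that the result depends entirely on the observed minimum and maximum. For example, Max's 3 for Wonder Woman and Alfred's 1 for Moana both become 0.0, even though the raw ratings are quite different. With only four or five ratings per movie, a single extreme rating also changes every normalized value in that column.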
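
To make the trade-off concrete, the sketch below compares raw and normalized per-movie averages side by side, using the same column-wise scaling as step 3. The `ratings` frame is rebuilt inline from step 1, and the `comparison` frame is introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Ratings rebuilt from the table in step 1; np.nan marks an unseen movie
ratings = pd.DataFrame(
    {
        "Moana": [np.nan, 5, 3, 4, 1],
        "Star Wars": [5, np.nan, 4, 3, 2],
        "Logan": [5, 4, np.nan, 4, 3],
        "Wonder Woman": [3, 5, 5, np.nan, 4],
        "Spider Man": [4, 3, 4, 3, np.nan],
        "John Wick": [5, 3, 2, 5, 5],
    },
    index=pd.Index(
        ["Max", "Heather", "John", "Lawrence", "Alfred"], name="Reviewer_Name"
    ),
)

# Column-wise min-max scaling, as in step 3
movie_normalized = (ratings - ratings.min()) / (ratings.max() - ratings.min())

# Raw and normalized per-movie means, ordered by the raw mean
comparison = pd.DataFrame(
    {
        "raw_mean": ratings.mean(),
        "normalized_mean": movie_normalized.mean(),
    }
).sort_values("raw_mean")

print(comparison.round(4))
```

Moana has the lowest raw average (3.25) but a normalized average (0.5625) above that of Star Wars (0.5), so the two measures can rank the same movies differently.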

##### 5. [Extra credit] Create another new pandas dataframe, with standardized ratings for each user.

In [52]:
# Z-score standardization. x.mean() and x.std() are column-wise, so each
# movie's ratings are centered on that movie's mean and scaled by its
# sample standard deviation.
standardized_review_movies = (x - x.mean()) / (x.std())
standardized_review_movies

Reviewer_Name,Moana,Star Wars,Logan,Wonder Woman,Spider Man,John Wick
Max,,1.161895,1.224745,-1.305582,0.866025,0.707107
Heather,1.024695,,0.0,0.783349,-0.866025,-0.707107
John,-0.146385,0.387298,,0.783349,0.866025,-1.414214
Lawrence,0.439155,-0.387298,0.0,,-0.866025,0.707107
Alfred,-1.317465,-1.161895,-1.224745,-0.261116,,0.707107


##### 5.1 Once again, show the average ratings for each user and each movie.

In [53]:
# Row-wise mean: one average standardized rating per reviewer
standard_user_avg = standardized_review_movies.mean(axis=1).round(decimals=5)
# Column-wise mean: one per movie, 0 by construction of the z-score
standard_movie_avg = standardized_review_movies.mean().round(decimals=5)
standard_movie_avg

Moana           0.0
Star Wars       0.0
Logan           0.0
Wonder Woman    0.0
Spider Man      0.0
John Wick       0.0
dtype: float64

In [54]:
standard_user_avg

Reviewer_Name
Max         0.53084
Heather     0.04698
John        0.09521
Lawrence   -0.02141
Alfred     -0.65162
dtype: float64
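
The standardization above works column by column, which is why every per-movie average is 0. For ratings standardized for each user, a row-wise z-score is one possible approach. The sketch below is an illustration under that assumption, not the notebook's method. The `ratings` frame is rebuilt inline from step 1, and `user_standardized` is a name introduced here.

```python
import numpy as np
import pandas as pd

# Ratings rebuilt from the table in step 1; np.nan marks an unseen movie
ratings = pd.DataFrame(
    {
        "Moana": [np.nan, 5, 3, 4, 1],
        "Star Wars": [5, np.nan, 4, 3, 2],
        "Logan": [5, 4, np.nan, 4, 3],
        "Wonder Woman": [3, 5, 5, np.nan, 4],
        "Spider Man": [4, 3, 4, 3, np.nan],
        "John Wick": [5, 3, 2, 5, 5],
    },
    index=pd.Index(
        ["Max", "Heather", "John", "Lawrence", "Alfred"], name="Reviewer_Name"
    ),
)

# Mean and sample standard deviation (ddof=1) of each reviewer's ratings
row_mean = ratings.mean(axis=1)
row_std = ratings.std(axis=1)

# axis=0 aligns the per-reviewer Series with the row index
user_standardized = ratings.sub(row_mean, axis=0).div(row_std, axis=0)

print(user_standardized.round(3))
print(user_standardized.mean(axis=0).round(3))  # per-movie averages
```

Here each reviewer's standardized ratings average to 0, so the per-movie averages become the informative ones: they show which movies reviewers rated above or below their own typical rating.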