# Week 7 Assignment - Aggregating Data
Choose six recent popular movies. 

Ask at least five people that you know (friends, family, classmates, imaginary friends) to rate each of these movies that they have seen on a scale of 1 to 5. 

There should be at least one movie that not everyone has seen!
Take the results (observations) and store them somewhere (like a SQL database, or a .CSV file). 

Load the information into a pandas dataframe. Your solution should include Python and pandas code that accomplishes the following:
1. Load the ratings by user information that you collected into a pandas dataframe.

2. Show the average ratings for each user and each movie.

3. Create a new pandas dataframe, with normalized ratings for each user. Again, show the average ratings for each user and each movie.

4. Provide a text-based conclusion: explain what might be advantages and disadvantages of using normalized ratings instead of the actual ratings.

5. [Extra credit] Create another new pandas dataframe, with standardized ratings for each user. Once again, show the average ratings for each user and each movie.



   

# 1. Load the ratings by user information that you collected into a pandas dataframe.

First we must import numpy and pandas so we can use them to do our calculations.

Then we will read our CSV file into the variable `movie_ratings`, using the first column (the user names) as the index.

In [1]:
import pandas as pd
import numpy as np

movie_ratings = pd.read_csv('MovieRatings.csv', index_col=0, header=0)
movie_ratings

Unnamed: 0,Parasite,Avengers Endgame,Yesterday,Ford Vs. Ferrari,John Wick 3,Frozen 2
Dennis,,,3,5.0,5.0,
Nicole,3.0,1.0,4,,,
Alex,,4.0,2,3.0,1.0,
Sandra,1.0,,4,,,5.0
Mathew,,5.0,2,5.0,,1.0
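Since `MovieRatings.csv` itself is not part of this write-up, here is a minimal sketch that rebuilds the same table from an inline string so the later steps can be reproduced. The names `CSV_TEXT`, `ratings`, and `seen_counts` are made up for this illustration; the values are copied from the output above, and a blank field means the user has not seen the movie.

```python
import io

import pandas as pd

# Inline copy of the ratings shown above; a blank field becomes NaN.
CSV_TEXT = """,Parasite,Avengers Endgame,Yesterday,Ford Vs. Ferrari,John Wick 3,Frozen 2
Dennis,,,3,5,5,
Nicole,3,1,4,,,
Alex,,4,2,3,1,
Sandra,1,,4,,,5
Mathew,,5,2,5,,1
"""

# Same call as in the notebook, but reading from memory instead of a file.
ratings = pd.read_csv(io.StringIO(CSV_TEXT), index_col=0, header=0)

# How many users rated each movie (non-missing entries per column).
seen_counts = ratings.notna().sum()
print(seen_counts)
```

Counting the non-missing entries is a quick way to confirm the assignment's requirement that at least one movie has not been seen by everyone.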


# 2. Show the average ratings for each user and each movie.

Now we will take our `movie_ratings` dataset and add the column `'Average Rating by User'`, which holds the average rating given by each user. Movies a user has not seen (`NaN`) are skipped.

In [2]:
movie_ratings['Average Rating by User'] = movie_ratings.mean(axis=1, skipna=True)
movie_ratings

Unnamed: 0,Parasite,Avengers Endgame,Yesterday,Ford Vs. Ferrari,John Wick 3,Frozen 2,Average Rating by User
Dennis,,,3,5.0,5.0,,4.333333
Nicole,3.0,1.0,4,,,,2.666667
Alex,,4.0,2,3.0,1.0,,2.5
Sandra,1.0,,4,,,5.0,3.333333
Mathew,,5.0,2,5.0,,1.0,3.25


***
Now we will take our `movie_ratings` dataset and add the row `'Average Rating by Movie'`, which holds the average rating given to each movie by all the users who rated it.

In [3]:
movie_ratings.loc['Average Rating by Movie'] = movie_ratings.mean(axis=0, skipna=True)
movie_ratings

Unnamed: 0,Parasite,Avengers Endgame,Yesterday,Ford Vs. Ferrari,John Wick 3,Frozen 2,Average Rating by User
Dennis,,,3.0,5.0,5.0,,4.333333
Nicole,3.0,1.0,4.0,,,,2.666667
Alex,,4.0,2.0,3.0,1.0,,2.5
Sandra,1.0,,4.0,,,5.0,3.333333
Mathew,,5.0,2.0,5.0,,1.0,3.25
Average Rating by Movie,2.0,3.333333,3.0,4.333333,3.0,3.0,3.216667
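Adding the averages to `movie_ratings` itself means they travel along into every later calculation. As an alternative sketch (not the approach used in this notebook), the two sets of averages can be kept as separate Series. The names `ratings`, `user_avg`, and `movie_avg` are assumptions for this example; the data is retyped from the table above.

```python
import numpy as np
import pandas as pd

# Ratings retyped from the table above; NaN means "has not seen it".
ratings = pd.DataFrame(
    {
        "Parasite": [np.nan, 3, np.nan, 1, np.nan],
        "Avengers Endgame": [np.nan, 1, 4, np.nan, 5],
        "Yesterday": [3, 4, 2, 4, 2],
        "Ford Vs. Ferrari": [5, np.nan, 3, np.nan, 5],
        "John Wick 3": [5, np.nan, 1, np.nan, np.nan],
        "Frozen 2": [np.nan, np.nan, np.nan, 5, 1],
    },
    index=["Dennis", "Nicole", "Alex", "Sandra", "Mathew"],
)

# mean() skips NaN by default, so only movies that were rated count.
user_avg = ratings.mean(axis=1)   # one value per user (across columns)
movie_avg = ratings.mean(axis=0)  # one value per movie (down rows)

print(user_avg)
print(movie_avg)
```

The raw `ratings` frame stays at five users by six movies, so later steps can work on the ratings alone.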


# 3. Create a new pandas dataframe, with normalized ratings for each user. Again, show the average ratings for each user and each movie.

Now we will create a copy of the `movie_ratings` dataframe and normalize it. We will call this copy `normalized_movies_rating`.

In [4]:
normalized_movies_rating = movie_ratings.copy()
normalized_movies_rating

Unnamed: 0,Parasite,Avengers Endgame,Yesterday,Ford Vs. Ferrari,John Wick 3,Frozen 2,Average Rating by User
Dennis,,,3.0,5.0,5.0,,4.333333
Nicole,3.0,1.0,4.0,,,,2.666667
Alex,,4.0,2.0,3.0,1.0,,2.5
Sandra,1.0,,4.0,,,5.0,3.333333
Mathew,,5.0,2.0,5.0,,1.0,3.25
Average Rating by Movie,2.0,3.333333,3.0,4.333333,3.0,3.0,3.216667


***
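We will now replace all the `NaN` values with `0` so we can do our math. Note that this treats a movie a user has not seen as a rating of `0`, which then becomes the minimum of that column in the next step.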

In [5]:
normalized_movies_rating = normalized_movies_rating.replace(np.nan, 0)
normalized_movies_rating

Unnamed: 0,Parasite,Avengers Endgame,Yesterday,Ford Vs. Ferrari,John Wick 3,Frozen 2,Average Rating by User
Dennis,0.0,0.0,3.0,5.0,5.0,0.0,4.333333
Nicole,3.0,1.0,4.0,0.0,0.0,0.0,2.666667
Alex,0.0,4.0,2.0,3.0,1.0,0.0,2.5
Sandra,1.0,0.0,4.0,0.0,0.0,5.0,3.333333
Mathew,0.0,5.0,2.0,5.0,0.0,1.0,3.25
Average Rating by Movie,2.0,3.333333,3.0,4.333333,3.0,3.0,3.216667


***
Now we will perform min-max normalization of the data with the code below, which rescales each column to the range 0 to 1.

In [6]:
# Min-max scaling: .min() and .max() work column by column, so each column is scaled separately.
normalized_movies_rating = (normalized_movies_rating - normalized_movies_rating.min()) / (normalized_movies_rating.max() - normalized_movies_rating.min())
normalized_movies_rating

Unnamed: 0,Parasite,Avengers Endgame,Yesterday,Ford Vs. Ferrari,John Wick 3,Frozen 2,Average Rating by User
Dennis,0.0,0.0,0.5,1.0,1.0,0.0,1.0
Nicole,1.0,0.2,1.0,0.0,0.0,0.0,0.090909
Alex,0.0,0.8,0.0,0.6,0.2,0.0,0.0
Sandra,0.333333,0.0,1.0,0.0,0.0,1.0,0.454545
Mathew,0.0,1.0,0.0,1.0,0.0,0.2,0.409091
Average Rating by Movie,0.666667,0.666667,0.5,0.866667,0.6,0.6,0.390909
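The assignment asks for ratings normalized for each user, while the cell above scales each column after zero-filling. As a hedged alternative sketch, the same min-max formula can be applied row by row on the raw ratings, leaving unseen movies as `NaN`. The names `ratings` and `normalized` are assumptions for this example, and this is one possible reading of "normalized for each user", not the notebook's own method.

```python
import numpy as np
import pandas as pd

# Ratings retyped from the original table; NaN means "has not seen it".
ratings = pd.DataFrame(
    {
        "Parasite": [np.nan, 3, np.nan, 1, np.nan],
        "Avengers Endgame": [np.nan, 1, 4, np.nan, 5],
        "Yesterday": [3, 4, 2, 4, 2],
        "Ford Vs. Ferrari": [5, np.nan, 3, np.nan, 5],
        "John Wick 3": [5, np.nan, 1, np.nan, np.nan],
        "Frozen 2": [np.nan, np.nan, np.nan, 5, 1],
    },
    index=["Dennis", "Nicole", "Alex", "Sandra", "Mathew"],
)

# Each user's lowest and highest rating (NaN is skipped).
row_min = ratings.min(axis=1)
row_max = ratings.max(axis=1)

# (rating - user's min) / (user's max - user's min), aligned on the user index.
# A user who gave every movie the same rating would get NaN here (0 / 0).
normalized = ratings.sub(row_min, axis=0).div(row_max - row_min, axis=0)

print(normalized)
print(normalized.mean(axis=1))  # average normalized rating per user
print(normalized.mean(axis=0))  # average normalized rating per movie
```

With this version each user's lowest rating maps to 0 and highest to 1, and a movie a user has not seen stays missing instead of counting as the worst score.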


# 4. Provide a text-based conclusion: explain what might be advantages and disadvantages of using normalized ratings instead of the actual ratings.

When it comes to normalized ratings, I see that they have some use cases. Within a column of the dataframe, you can tell who gave the highest and lowest rating relative to the other users, which shows who liked or disliked that movie the most. Across a row, you can see how many relatively high or low ratings an individual gave to the movies in the dataframe. This can be useful in big datasets and for the film and theater industry. However, for you the movie viewer, the regular, non-normalized and non-standardized ratings would help you better decide which movies to see, because they show which movie is actually more popular among the ratings in the dataset. The regular data might influence you to see one movie over another.
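One disadvantage can be checked directly on the numbers from step 3. The sketch below takes two columns of the zero-filled table and applies the same column-wise formula; the names `filled`, `scaled`, and `comparison` are made up for this illustration.

```python
import pandas as pd

# Two columns from the zero-filled table in step 3 (0 stands in for "not rated").
filled = pd.DataFrame(
    {"Parasite": [0, 3, 0, 1, 0], "Ford Vs. Ferrari": [5, 0, 3, 0, 5]},
    index=["Dennis", "Nicole", "Alex", "Sandra", "Mathew"],
)

# Same column-wise min-max formula as in step 3.
scaled = (filled - filled.min()) / (filled.max() - filled.min())

# Put the top rating of each movie side by side, raw versus scaled.
comparison = pd.DataFrame(
    {
        "raw": [filled.loc["Nicole", "Parasite"],
                filled.loc["Dennis", "Ford Vs. Ferrari"]],
        "scaled": [scaled.loc["Nicole", "Parasite"],
                   scaled.loc["Dennis", "Ford Vs. Ferrari"]],
    },
    index=["Nicole / Parasite", "Dennis / Ford Vs. Ferrari"],
)
print(comparison)
```

Nicole's raw 3 for Parasite and Dennis's raw 5 for Ford Vs. Ferrari both become 1.0, so the scaled table alone no longer shows that one movie's best rating was two points higher than the other's.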


# 5. [Extra credit] Create another new pandas dataframe, with standardized ratings for each user. Once again, show the average ratings for each user and each movie.

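We can easily do this by copying our original `movie_ratings` dataframe into the `standardized_ratings` dataframe and changing our equation to the standardization formula: subtract the mean and divide by the standard deviation. Because `movie_ratings` already contains the `'Average Rating by User'` column and the `'Average Rating by Movie'` row, those averages are standardized along with the ratings.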

In [7]:
standardized_ratings = movie_ratings.copy()
standardized_ratings = (standardized_ratings - standardized_ratings.mean())/standardized_ratings.std()
standardized_ratings

Unnamed: 0,Parasite,Avengers Endgame,Yesterday,Ford Vs. Ferrari,John Wick 3,Frozen 2,Average Rating by User
Dennis,,,0.0,0.707107,1.0,,1.732244
Nicole,1.0,-1.372813,1.118034,,,,-0.853195
Alex,,0.392232,-1.118034,-1.414214,-1.0,,-1.111739
Sandra,-1.0,,1.118034,,,1.0,0.180981
Mathew,,0.980581,-1.118034,0.707107,,-1.0,0.051709
Average Rating by Movie,0.0,0.0,0.0,0.0,0.0,0.0,0.0
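The cell above standardizes each column. For comparison, here is a hedged sketch of standardizing per user instead, computed on the raw ratings only, without the added average row and column. The names `ratings` and `standardized` are assumptions for this example, and `std()` uses pandas' default sample standard deviation (`ddof=1`).

```python
import numpy as np
import pandas as pd

# Ratings retyped from the original table; NaN means "has not seen it".
ratings = pd.DataFrame(
    {
        "Parasite": [np.nan, 3, np.nan, 1, np.nan],
        "Avengers Endgame": [np.nan, 1, 4, np.nan, 5],
        "Yesterday": [3, 4, 2, 4, 2],
        "Ford Vs. Ferrari": [5, np.nan, 3, np.nan, 5],
        "John Wick 3": [5, np.nan, 1, np.nan, np.nan],
        "Frozen 2": [np.nan, np.nan, np.nan, 5, 1],
    },
    index=["Dennis", "Nicole", "Alex", "Sandra", "Mathew"],
)

# Each user's own mean and sample standard deviation (NaN is skipped).
row_mean = ratings.mean(axis=1)
row_std = ratings.std(axis=1)

# z-score per user: (rating - user's mean) / user's standard deviation.
standardized = ratings.sub(row_mean, axis=0).div(row_std, axis=0)

print(standardized)
print(standardized.mean(axis=0))  # average standardized rating per movie
```

By construction every user's standardized ratings average to 0, so only the per-movie averages carry information in this version.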
