In [1]:
import numpy as np
import pandas as pd

### Use pandas to import the required file and set the viewer name as index

In [22]:
ratings = pd.read_csv('MovieRatings.csv')
ratings = ratings.set_index('viewer')
ratings

viewer,Black Panther,Infinity War,Incredibles 2,Antman and the Wasp,Crazy Rich Asians,Deadpool 2
Duke,4.0,3.0,5.0,4.0,,3.0
Katrina,5.0,,5.0,3.0,4.0,2.0
Ava,,4.0,5.0,,4.0,
Michael,5.0,5.0,,4.0,,5.0
David,4.0,,3.0,4.0,,4.0
Jessica,,3.0,4.0,,5.0,3.0
Jane,3.0,4.0,,4.0,5.0,


### To obtain the average, the .mean() method is used. Without the axis parameter, it defaults to axis=0 and averages down each column, giving the average rating for each movie. Missing ratings (NaN) are skipped by default.

In [20]:
average_movie_rating = ratings.mean()
average_movie_rating

Black Panther          4.2
Infinity War           3.8
Incredibles 2          4.4
Antman and the Wasp    3.8
Crazy Rich Asians      4.5
Deadpool 2             3.4
dtype: float64

### To obtain the average rating given by each viewer, the same .mean() method is used with the axis parameter set to 1, which aggregates across each row.

In [21]:
average_viewer_rating = ratings.mean(axis=1)
average_viewer_rating

viewer
Duke       3.800000
Katrina    3.800000
Ava        4.333333
Michael    4.750000
David      3.750000
Jessica    3.750000
Jane       4.000000
dtype: float64
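The averages above depend on how `.mean()` treats missing ratings. The sketch below rebuilds the table by hand from the values displayed earlier (so it does not need `MovieRatings.csv`) and shows that NaN entries are excluded from both the sum and the count. The variable names `ava_mean`, `ava_count`, and `strict_movie_mean` are illustrative and not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Rebuild the ratings table from the values displayed above.
viewers = ["Duke", "Katrina", "Ava", "Michael", "David", "Jessica", "Jane"]
ratings = pd.DataFrame(
    {
        "Black Panther": [4, 5, np.nan, 5, 4, np.nan, 3],
        "Infinity War": [3, np.nan, 4, 5, np.nan, 3, 4],
        "Incredibles 2": [5, 5, 5, np.nan, 3, 4, np.nan],
        "Antman and the Wasp": [4, 3, np.nan, 4, 4, np.nan, 4],
        "Crazy Rich Asians": [np.nan, 4, 4, np.nan, np.nan, 5, 5],
        "Deadpool 2": [3, 2, np.nan, 5, 4, 3, np.nan],
    },
    index=pd.Index(viewers, name="viewer"),
)

# Ava rated only three movies, so her mean is 13 / 3 rather than 13 / 6.
ava_count = ratings.loc["Ava"].count()  # number of non-missing ratings
ava_mean = ratings.loc["Ava"].mean()    # NaN values are skipped by default

# With skipna=False, a single missing rating turns the whole average into NaN.
strict_movie_mean = ratings.mean(skipna=False)

print(ava_count, ava_mean)
print(strict_movie_mean)
```

Because every movie has at least one missing rating, `strict_movie_mean` is NaN for all six columns, which is why the default `skipna=True` behaviour matters for this dataset.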

### Creating new DataFrames with normalized (min-max) ratings and standardized (z-score) ratings. Both are computed per movie, i.e. column by column.

In [43]:
normalized_rating = (ratings - ratings.min())/(ratings.max() - ratings.min())
normalized_rating

viewer,Black Panther,Infinity War,Incredibles 2,Antman and the Wasp,Crazy Rich Asians,Deadpool 2
Duke,0.5,0.0,1.0,1.0,,0.333333
Katrina,1.0,,1.0,0.0,0.0,0.0
Ava,,0.5,1.0,,0.0,
Michael,1.0,1.0,,1.0,,1.0
David,0.5,,0.0,1.0,,0.666667
Jessica,,0.0,0.5,,1.0,0.333333
Jane,0.0,0.5,,1.0,1.0,


In [44]:
standardized_rating = (ratings - ratings.mean())/ratings.std()
standardized_rating

viewer,Black Panther,Infinity War,Incredibles 2,Antman and the Wasp,Crazy Rich Asians,Deadpool 2
Duke,-0.239046,-0.956183,0.67082,0.447214,,-0.350823
Katrina,0.956183,,0.67082,-1.788854,-0.866025,-1.227881
Ava,,0.239046,0.67082,,-0.866025,
Michael,0.956183,1.434274,,0.447214,,1.403293
David,-0.239046,,-1.565248,0.447214,,0.526235
Jessica,,-0.956183,-0.447214,,0.866025,-0.350823
Jane,-1.434274,0.239046,,0.447214,0.866025,
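To make the two formulas concrete, the sketch below redoes the calculation for a single column, using the five Black Panther ratings shown in the first table. It also highlights that pandas' `.std()` uses the sample standard deviation (`ddof=1`) by default, which is what the standardized table above reflects. The names `black_panther`, `min_max`, `z_sample`, and `z_population` are illustrative only.

```python
import pandas as pd

# The five non-missing Black Panther ratings from the table above.
black_panther = pd.Series(
    [4.0, 5.0, 5.0, 4.0, 3.0],
    index=["Duke", "Katrina", "Michael", "David", "Jane"],
)

# Min-max normalization: (x - min) / (max - min), giving values in [0, 1].
min_max = (black_panther - black_panther.min()) / (
    black_panther.max() - black_panther.min()
)

# Standardization (z-score): (x - mean) / std, with pandas' default ddof=1.
z_sample = (black_panther - black_panther.mean()) / black_panther.std()

# For comparison only: the population standard deviation (ddof=0)
# gives slightly larger z-scores in absolute value.
z_population = (black_panther - black_panther.mean()) / black_panther.std(ddof=0)

print(min_max["Duke"])             # 0.5
print(round(z_sample["Duke"], 6))  # -0.239046
```

Duke's values match the first row of the two tables above: (4 - 3) / (5 - 3) = 0.5 for the normalized rating, and (4 - 4.2) / 0.83666 ≈ -0.239046 for the standardized rating.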


### The code below yields the normalized and standardized averages for each movie, followed by the same averages for each viewer.

In [45]:
normalized_rating.mean()

Black Panther          0.600000
Infinity War           0.400000
Incredibles 2          0.700000
Antman and the Wasp    0.800000
Crazy Rich Asians      0.500000
Deadpool 2             0.466667
dtype: float64

In [46]:
standardized_rating.mean()

Black Panther         -1.776357e-16
Infinity War           1.887379e-16
Incredibles 2         -4.551914e-16
Antman and the Wasp    4.218847e-16
Crazy Rich Asians      0.000000e+00
Deadpool 2             9.992007e-17
dtype: float64
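The per-movie averages of the standardized ratings are on the order of 1e-16 rather than exactly zero. This is floating-point round-off: standardizing a column centers it on zero by construction. A minimal check, using the Incredibles 2 ratings from the first table and NumPy's `np.isclose` (the variable names are illustrative):

```python
import numpy as np
import pandas as pd

# The five non-missing Incredibles 2 ratings from the table above.
incredibles_2 = pd.Series([5.0, 5.0, 5.0, 3.0, 4.0])

# Standardize the column, then average the z-scores.
z = (incredibles_2 - incredibles_2.mean()) / incredibles_2.std()
z_mean = z.mean()

# The mean may not be exactly 0.0, but it is zero within floating-point tolerance.
is_effectively_zero = bool(np.isclose(z_mean, 0.0))

print(z_mean, is_effectively_zero)
```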

In [47]:
normalized_rating.mean(axis=1)

viewer
Duke       0.566667
Katrina    0.400000
Ava        0.500000
Michael    1.000000
David      0.541667
Jessica    0.458333
Jane       0.625000
dtype: float64

In [48]:
standardized_rating.mean(axis=1)

viewer
Duke      -0.085604
Katrina   -0.451152
Ava        0.014614
Michael    1.060241
David     -0.207711
Jessica   -0.222049
Jane       0.029503
dtype: float64

## Conclusion
##### Min-max normalization, as applied above, rescales each column so that its values fall between 0 and 1. When a column contains outliers, they stretch the range and the remaining values are squeezed into a very small interval. Normalization therefore generally works best when no outliers are present.
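The effect of an outlier on min-max scaling can be illustrated with a small made-up example. The ratings below, including the value 100.0, are hypothetical and chosen only to exaggerate the effect. They are not taken from `MovieRatings.csv`, and the helper `min_max` is an illustrative name.

```python
import pandas as pd


def min_max(series: pd.Series) -> pd.Series:
    """Rescale a series to the range [0, 1] using (x - min) / (max - min)."""
    return (series - series.min()) / (series.max() - series.min())


# Hypothetical ratings without and with an extreme value.
typical = pd.Series([3.0, 4.0, 4.0, 5.0])
with_outlier = pd.Series([3.0, 4.0, 4.0, 5.0, 100.0])  # 100.0 is a made-up outlier

scaled_typical = min_max(typical)
scaled_outlier = min_max(with_outlier)

# How much of the [0, 1] range do the four ordinary ratings occupy?
spread_typical = scaled_typical.max() - scaled_typical.min()
spread_outlier = scaled_outlier.iloc[:4].max() - scaled_outlier.iloc[:4].min()

print(spread_typical)  # 1.0
print(spread_outlier)  # 2 / 97, roughly 0.02
```

Without the outlier, the four ratings span the full interval from 0 to 1. With it, the same four ratings are compressed into about 2% of the interval, which makes them hard to distinguish.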