In [1]:
import pandas as pd
import numpy as np

# Load ratings from a CSV file
df = pd.read_csv("movie_ratings.csv", index_col=0)

# Define users
users = df.columns.tolist()

# Replace 0s (indicating not seen) with NaN for proper calculations
df.replace(0, np.nan, inplace=True)

# Calculate average ratings per user and per movie
average_per_user = df.mean()
average_per_movie = df.mean(axis=1)

# Normalize ratings (min-max scaling per user)
normalized_df = df.copy()
for user in users:
    min_rating = df[user].min(skipna=True)
    max_rating = df[user].max(skipna=True)
    if min_rating != max_rating:
        normalized_df[user] = (df[user] - min_rating) / (max_rating - min_rating)
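    else:
        # All of this user's ratings are identical, so min-max scaling is
        # undefined (division by zero); keep the raw values unchanged
        normalized_df[user] = df[user]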

# Standardize ratings (z-score per user; pandas std() is the sample
# standard deviation, ddof=1, and ignores NaN entries)
standardized_df = df.copy()
for user in users:
    mean_rating = df[user].mean(skipna=True)
    std_rating = df[user].std(skipna=True)
    if std_rating != 0:
        standardized_df[user] = (df[user] - mean_rating) / std_rating
    else:
        standardized_df[user] = df[user]

# Display results
print("Original Movie Ratings:")
print(df)
print("\nAverage Ratings Per User:")
print(average_per_user)
print("\nAverage Ratings Per Movie:")
print(average_per_movie)
print("\nNormalized Ratings:")
print(normalized_df)
print("\nStandardized Ratings:")
print(standardized_df)


Original Movie Ratings:
                                 John  Michael  Scott  Carina  Michelle
Captain America Brave New World   4.0      5.0    4.0     2.0       4.0
Nosferatu                         NaN      4.0    NaN     NaN       NaN
Love Hurts                        3.0      NaN    1.0     4.0       3.0
You're Cordially Invited          4.0      3.0    4.0     3.0       3.0
Flight Risk                       4.0      4.0    NaN     3.0       NaN
Back in Action                    NaN      2.0    2.0     NaN       NaN

Average Ratings Per User:
John        3.750000
Michael     3.600000
Scott       2.750000
Carina      3.000000
Michelle    3.333333
dtype: float64

Average Ratings Per Movie:
Captain America Brave New World    3.800000
Nosferatu                          4.000000
Love Hurts                         2.750000
You're Cordially Invited           3.400000
Flight Risk                        3.666667
Back in Action                     2.000000
dtype: float64

Normalized Rating

In [3]:
Using **normalized ratings** instead of actual ratings has both advantages and disadvantages.

#### **Advantages of Normalization:**
1. **Eliminates Bias from Individual Rating Scales:** Some users consistently give high or low ratings. Normalization rescales each user's ratings to the same range, so a habitually generous or harsh rater does not skew recommendations.
2. **Enables Fair Comparisons:** Since different users may use different portions of the rating scale, normalization brings all ratings to a common scale, making it easier to compare preferences across users.
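
As a rough illustration of these two points, the sketch below applies min-max scaling to two made-up users who occupy different ends of a 1-to-5 scale. The helper `min_max` and both rating lists are hypothetical and not taken from the notebook's data; `None` stands in for an unseen movie. Unlike the notebook code, this helper returns the input unchanged only when nothing was rated and returns all `None` when every rating is identical, which is one possible convention among several.

```python
def min_max(ratings):
    """Scale ratings to [0, 1]; None marks an unseen movie (hypothetical helper)."""
    seen = [r for r in ratings if r is not None]
    if not seen:
        return list(ratings)  # nothing rated, nothing to scale
    lo, hi = min(seen), max(seen)
    if lo == hi:
        # No spread in the ratings: scaling is undefined, so mark every entry as missing
        return [None for _ in ratings]
    return [None if r is None else (r - lo) / (hi - lo) for r in ratings]

harsh = [1, 2, 3, None]      # hypothetical user who stays at the low end
generous = [3, 4, 5, None]   # hypothetical user who stays at the high end

harsh_scaled = min_max(harsh)
generous_scaled = min_max(generous)

print(harsh_scaled)     # [0.0, 0.5, 1.0, None]
print(generous_scaled)  # [0.0, 0.5, 1.0, None]
```

After scaling, both users have the same profile: each one's least favorite movie sits at 0 and each one's favorite at 1, even though their raw ratings never overlap below 3.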

#### **Disadvantages of Normalization:**
1. **Loses Absolute Preference Information:** If a user only rates movies they love (e.g., giving only 4s and 5s), normalization stretches these ratings across the full 0-to-1 range. A 4 becomes 0, which misrepresents a movie the user actually liked as their least preferred.
2. **Can Introduce Artificial Differences:** Users with little variance in their ratings may see artificial exaggeration in their scaled ratings, making small differences appear more significant than they are.
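
Both drawbacks can be seen in a small sketch. The two users and the `min_max` helper below are hypothetical, chosen only to make the effect visible, and the helper assumes each user has at least two distinct ratings.

```python
def min_max(ratings):
    """Scale ratings to [0, 1]; assumes at least two distinct values (hypothetical helper)."""
    lo, hi = min(ratings), max(ratings)
    return [(r - lo) / (hi - lo) for r in ratings]

fan = [4, 5, 5, 4, 5]      # hypothetical user who only rates movies they like
critic = [1, 3, 5, 2, 4]   # hypothetical user who uses the whole scale

fan_scaled = min_max(fan)
critic_scaled = min_max(critic)

print(fan_scaled)     # [0.0, 1.0, 1.0, 0.0, 1.0]
print(critic_scaled)  # [0.0, 0.5, 1.0, 0.25, 0.75]
```

The fan's 4 and the critic's 1 both map to 0.0, so the scaled values no longer show that the fan liked the movie. The fan's one-point gap between 4 and 5 also becomes the full distance from 0 to 1, the same span that covers the critic's four-point gap between 1 and 5.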

While normalization is useful for certain analytical and predictive tasks, it should be applied carefully, considering the data context and the goals of the recommendation system.
