In [12]:

import pandas as pd
from sklearn.preprocessing import MinMaxScaler, StandardScaler


file_path = "movie_ratings_updated.csv"  
df = pd.read_csv(file_path)


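# Convert every rating column (all columns after "User") to float64;
# entries that cannot be parsed as numbers become NaN.
df_float = df.copy()
for col in df_float.columns[1:]:
    df_float[col] = pd.to_numeric(df_float[col], errors='coerce').astype("float64")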


print("Original Dataset:")
print(df_float.head())


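# Per-user mean across the movie columns (NaN ratings are skipped).
df_float["User Average"] = df_float.iloc[:, 1:].mean(axis=1)
# Per-column mean; the slice now also includes "User Average", so its mean is reported too.
movie_averages = df_float.iloc[:, 1:].mean()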


print("\nAverage rating per user (Original):")
print(df_float[["User", "User Average"]])

print("\nAverage rating per movie (Original):")
print(movie_averages)


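# Min-max normalization: rescale each movie column to the range [0, 1].
scaler = MinMaxScaler()
normalized_df = df_float.copy()


# Fit on the movie columns only (skip "User" and "User Average"); NaN values stay NaN.
normalized_values = scaler.fit_transform(df_float.iloc[:, 1:-1])
normalized_df.iloc[:, 1:-1] = normalized_values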


normalized_df["User Average"] = normalized_df.iloc[:, 1:-1].mean(axis=1)
normalized_movie_averages = normalized_df.iloc[:, 1:-1].mean()


print("\nNormalized Ratings (first 5 rows):")
print(normalized_df.head()) 

print("\nAverage rating per user (Normalized):")
print(normalized_df[["User", "User Average"]])

print("\nAverage rating per movie (Normalized):")
print(normalized_movie_averages)


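# Standardization: rescale each movie column to zero mean and unit variance.
scaler = StandardScaler()
standardized_df = df_float.copy()


# Fit on the movie columns only (skip "User" and "User Average"); NaN values stay NaN.
standardized_values = scaler.fit_transform(df_float.iloc[:, 1:-1])
standardized_df.iloc[:, 1:-1] = standardized_values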


standardized_df["User Average"] = standardized_df.iloc[:, 1:-1].mean(axis=1)
standardized_movie_averages = standardized_df.iloc[:, 1:-1].mean()


print("\nStandardized Ratings (first 5 rows):")
print(standardized_df.head())  

print("\nAverage rating per user (Standardized):")
print(standardized_df[["User", "User Average"]])

print("\nAverage rating per movie (Standardized):")
print(standardized_movie_averages)


df_float.to_csv("movie_ratings_with_averages.csv", index=False)
normalized_df.to_csv("movie_ratings_normalized.csv", index=False)
standardized_df.to_csv("movie_ratings_standardized.csv", index=False)

print("\nData processing complete! Results saved to CSV files.")



Original Dataset:
     User  John Wick  Wild Robot  American Gangster  Zero Dark Thirty  Wicked
0   Mills        5.0         4.0                3.0               4.0     3.0
1   Chris        4.0         NaN                4.0               NaN     4.0
2   Angel        5.0         5.0                4.0               4.0     3.0
3  Isaiah        4.0         3.0                2.0               2.5     3.5
4   Macho        5.0         4.0                4.0               3.0     3.7

Average rating per user (Original):
     User  User Average
0   Mills          3.80
1   Chris          4.00
2   Angel          4.20
3  Isaiah          3.00
4   Macho          3.94

Average rating per movie (Original):
John Wick            4.600
Wild Robot           4.000
American Gangster    3.400
Zero Dark Thirty     3.375
Wicked               3.440
User Average         3.788
dtype: float64

Normalized Ratings (first 5 rows):
     User  John Wick  Wild Robot  American Gangster  Zero Dark Thirty  Wicked  \
0

Conclusion: Advantages and Disadvantages of Normalized Ratings


Advantages of Normalized Ratings:

Consistent Scale: Normalizing ratings brings all values to a consistent scale, usually between 0 and 1. This makes it easier to compare ratings across users and movies, especially when users might have different rating tendencies (e.g., some users might give higher ratings overall, while others are more conservative).
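As a minimal sketch of what scaling to the 0 to 1 range means, the snippet below applies the min-max formula, (x - min) / (max - min), to each movie column by hand. It uses three columns copied from the table printed above; the variable names are illustrative, and it relies on pandas skipping NaN by default when computing each column's minimum and maximum.

```python
import numpy as np
import pandas as pd

# Three rating columns copied from the table printed above.
ratings = pd.DataFrame(
    {
        "John Wick": [5.0, 4.0, 5.0, 4.0, 5.0],
        "Wild Robot": [4.0, np.nan, 5.0, 3.0, 4.0],
        "Wicked": [3.0, 4.0, 3.0, 3.5, 3.7],
    },
    index=["Mills", "Chris", "Angel", "Isaiah", "Macho"],
)

# Min-max scaling per column: (x - min) / (max - min).
# min() and max() skip NaN by default, and a missing rating stays NaN.
col_min = ratings.min()
col_max = ratings.max()
scaled = (ratings - col_min) / (col_max - col_min)

print(scaled)
```

Each column's lowest rating maps to 0 and its highest to 1, so a value of 0.5 always means "halfway through that movie's observed range", whatever that range was.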

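Reduces Biases: Normalizing ratings can reduce the impact of individual user biases, provided the scaling is applied per user. For instance, a user who consistently rates movies higher than others will have their ratings adjusted to a comparable scale, enabling fairer comparisons. Note that the code above scales each movie column instead, which puts movies on a common scale but does not adjust for a user who rates everything high.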
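One common way to adjust for a user's rating tendency is to subtract that user's own mean from each of their ratings. The sketch below does this for the five users in the table above; it is an illustration of per-user mean-centering, not the column-wise scaling used in the code cell, and the names `user_mean` and `centered` are made up for the example.

```python
import numpy as np
import pandas as pd

# Ratings copied from the table printed above.
ratings = pd.DataFrame(
    {
        "John Wick": [5.0, 4.0, 5.0, 4.0, 5.0],
        "Wild Robot": [4.0, np.nan, 5.0, 3.0, 4.0],
        "American Gangster": [3.0, 4.0, 4.0, 2.0, 4.0],
        "Zero Dark Thirty": [4.0, np.nan, 4.0, 2.5, 3.0],
        "Wicked": [3.0, 4.0, 3.0, 3.5, 3.7],
    },
    index=["Mills", "Chris", "Angel", "Isaiah", "Macho"],
)

# Each user's mean rating; NaN entries are skipped.
user_mean = ratings.mean(axis=1)

# Subtract each user's mean from that user's row (axis=0 aligns on the index).
centered = ratings.sub(user_mean, axis=0)

print(centered.round(2))
```

After centering, a positive value means "above this user's usual rating" and a negative value means "below it", so a generous rater and a strict rater become directly comparable.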

Better Suited to Algorithmic Use: If you're applying machine learning or recommendation algorithms (e.g., collaborative filtering), normalized ratings are often preferable. Algorithms that rely on similarity measures such as cosine similarity benefit from normalization because it puts ratings on the same scale and prevents skew from different rating ranges across users. Pearson correlation already centers each user's ratings as part of its definition, so it is less affected.
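To make the effect on similarity measures concrete, the sketch below compares two users from the table above, Mills and Angel, using cosine similarity on raw ratings and on mean-centered ratings. The helper `cosine` is written for this example only. Centering here is per user, which is an assumption of the sketch and differs from the per-movie scaling in the code cell.

```python
import numpy as np

# Ratings by Mills and Angel for the five movies in the table above.
mills = np.array([5.0, 4.0, 3.0, 4.0, 3.0])
angel = np.array([5.0, 5.0, 4.0, 4.0, 3.0])


def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


# Raw ratings are all positive, so the vectors point in nearly the same direction.
raw_sim = cosine(mills, angel)

# Centering each user's ratings on their own mean removes that shared offset.
centered_sim = cosine(mills - mills.mean(), angel - angel.mean())

# Cosine similarity of mean-centered vectors equals the Pearson correlation.
pearson = float(np.corrcoef(mills, angel)[0, 1])

print(raw_sim, centered_sim, pearson)
```

The raw similarity is close to 1 simply because every rating is positive; the centered value is lower and reflects how the two users' preferences actually move together.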

Bounds the Range of Ratings: Compressing ratings to a defined scale (e.g., 0-1) keeps every value within known limits, so no column dominates an analysis merely because it spans a wider range. Min-max scaling is itself sensitive to outliers, however: the most extreme ratings define the endpoints of the scale, so it bounds extreme values but does not reduce their influence.
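One caveat is worth checking by hand: with min-max scaling, the lowest and highest ratings set the endpoints of the scale, so a single extreme rating shifts every other scaled value. The sketch below uses hypothetical ratings for one movie (not taken from the dataset above) and scales them with and without an outlier.

```python
import numpy as np


def min_max(values):
    """Scale a 1-D array to the range [0, 1] using its own min and max."""
    return (values - values.min()) / (values.max() - values.min())


# Hypothetical ratings for one movie: four similar scores.
typical = np.array([4.0, 4.2, 4.5, 4.8])

# The same scores plus one very low rating.
with_outlier = np.append(typical, 1.0)

scaled_typical = min_max(typical)
scaled_with_outlier = min_max(with_outlier)

print(scaled_typical.round(3))
print(scaled_with_outlier.round(3))
```

Without the outlier the four scores spread across the whole 0 to 1 range; with it, the outlier takes the value 0 and the four typical scores are pushed into the top part of the range.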


Disadvantages of Normalized Ratings:

Loss of Original Context: Normalizing ratings removes the original magnitude of ratings. A user’s genuine rating preferences might get lost. For example, if a user generally rates movies very highly, the normalization process will reduce those ratings, potentially stripping away useful information about their rating behavior.

Data Transformation Complexity: Normalizing ratings requires decisions that are easy to get wrong, such as whether to scale per user or per movie and how to treat missing ratings. If the normalization is applied along the wrong axis, or if users have vastly different rating scales, the result can misrepresent the ratings or lead to misinterpretation of the data.

Potential for Misleading Interpretations: If users or stakeholders are unaware that ratings have been normalized, they might misinterpret the data. For example, a normalized rating of 0.9 might seem almost perfect, but it only means the rating sits near the top of that movie's observed range. For Wicked, whose ratings above run from 3.0 to 4.0, a normalized 0.9 corresponds to 3.9 on the original scale.

Impact on Interpretability: When discussing ratings with stakeholders or making decisions based on the data, it might be harder for non-technical people to understand or trust the normalized scores, as they no longer correspond to real-world scores (like a 1 to 5-star rating system).
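When normalized scores need to be reported in original units, the fitted scaler can map them back. The sketch below assumes scikit-learn's `MinMaxScaler` and three NaN-free columns copied from the table above; `restored` is an illustrative name.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Three NaN-free rating columns copied from the table printed above.
ratings = pd.DataFrame(
    {
        "John Wick": [5.0, 4.0, 5.0, 4.0, 5.0],
        "American Gangster": [3.0, 4.0, 4.0, 2.0, 4.0],
        "Wicked": [3.0, 4.0, 3.0, 3.5, 3.7],
    },
    index=["Mills", "Chris", "Angel", "Isaiah", "Macho"],
)

# Fit the scaler and normalize each column to [0, 1].
scaler = MinMaxScaler()
scaled = scaler.fit_transform(ratings)

# inverse_transform undoes the scaling using the min and max learned in fit.
restored = pd.DataFrame(
    scaler.inverse_transform(scaled),
    index=ratings.index,
    columns=ratings.columns,
)

print(restored)
```

Keeping the fitted scaler alongside the normalized data means results can be computed on the 0 to 1 scale and still presented to stakeholders as familiar star ratings.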


Summary:
Normalized ratings are especially beneficial for comparative analysis, algorithmic fairness, and when working with large datasets where scale inconsistencies might otherwise cause issues.
However, they come at the cost of losing contextual meaning of the original ratings, which might be critical in some cases. Understanding the trade-offs is important when deciding whether to use normalized ratings or stick with the original values.



