
In [1]:
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics.pairwise import cosine_similarity

# Step 1: Load Datasets

In [2]:
user_info = pd.read_csv("user_info.txt", header = None, names = ["Movie_ID","Customer_ID","Rating","Date"])
print(user_info)

      Movie_ID  Customer_ID  Rating        Date
0            1      1488844       3  2005-09-06
1            2       822109       5  2005-05-13
2            3       885013       4  2005-10-19
3            4        30878       4  2005-12-26
4            5       823519       3  2004-05-03
...        ...          ...     ...         ...
2574      2575      1123472       3  2004-10-24
2575      2576      1651475       5  2004-10-25
2576      2577      2630797       5  2004-10-26
2577      2578      2096773       3  2004-10-27
2578      2579      2428512       5  2004-10-28

[2579 rows x 4 columns]


In [3]:
movie_info= pd.read_csv("movie_info.csv",encoding= "latin1",on_bad_lines = "skip")
print(movie_info)

           1    2003                                    Dinosaur Planet
0          2  2004.0                         Isle of Man TT 2004 Review
1          3  1997.0                                          Character
2          4  1994.0                       Paula Abdul's Get Up & Dance
3          5  2004.0                           The Rise and Fall of ECW
4          6  1997.0                                               Sick
...      ...     ...                                                ...
17428  17766  2002.0  Where the Wild Things Are and Other Maurice Se...
17429  17767  2004.0                  Fidel Castro: American Experience
17430  17768  2000.0                                              Epoch
17431  17769  2003.0                                        The Company
17432  17770  2003.0                                       Alien Hunter

[17433 rows x 3 columns]


In [4]:
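# Clean movie_info columns.
# read_csv was called without header=None, so the first record
# (1, 2003, Dinosaur Planet) became the column labels and is no longer
# present as a data row. Rename the labels and keep the three columns.

movie_info = movie_info.rename(columns={
    movie_info.columns[0]: "Movie_ID",
    movie_info.columns[1]: "Year",
    movie_info.columns[2]: "Name"})[["Movie_ID", "Year", "Name"]]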

In [5]:
# Convert Movie_ID to integer for merging


movie_info["Movie_ID"] = movie_info["Movie_ID"].astype(int)
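The printed `movie_info` above shows the first record (1, 2003, Dinosaur Planet) being used as column labels. A minimal sketch of how that record could be kept, assuming the file has no header line: pass `header=None` together with `names=`. The three-line sample below is a hypothetical stand-in for `movie_info.csv`, not the real file.

```python
import io
import pandas as pd

# Hypothetical three-line sample in the same layout as movie_info.csv
sample_csv = io.StringIO(
    "1,2003,Dinosaur Planet\n"
    "2,2004,Isle of Man TT 2004 Review\n"
    "3,1997,Character\n"
)

# header=None keeps the first record as data; names= supplies the labels
movies = pd.read_csv(
    sample_csv,
    header=None,
    names=["Movie_ID", "Year", "Name"],
)

print(movies)
```

With this approach the separate renaming step is not needed, because the column labels are set at load time.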

# Step 2: Merge User ratings with Movie Info

In [6]:
merged_data = pd.merge(user_info, movie_info, on="Movie_ID")
print(merged_data)

      Movie_ID  Customer_ID  Rating        Date    Year  \
0            2       822109       5  2005-05-13  2004.0   
1            3       885013       4  2005-10-19  1997.0   
2            4        30878       4  2005-12-26  1994.0   
3            5       823519       3  2004-05-03  2004.0   
4            6       893988       3  2005-11-17  1997.0   
...        ...          ...     ...         ...     ...   
2531      2575      1123472       3  2004-10-24  2004.0   
2532      2576      1651475       5  2004-10-25  2000.0   
2533      2577      2630797       5  2004-10-26  1995.0   
2534      2578      2096773       3  2004-10-27  2001.0   
2535      2579      2428512       5  2004-10-28  1969.0   

                              Name  
0       Isle of Man TT 2004 Review  
1                        Character  
2     Paula Abdul's Get Up & Dance  
3         The Rise and Fall of ECW  
4                             Sick  
...                            ...  
2531          R.O.D. the TV Seri
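`pd.merge` performs an inner join by default, so ratings whose `Movie_ID` has no match in `movie_info` are dropped silently. Here `user_info` has 2579 rows while `merged_data` is indexed 0 to 2535, so 43 ratings did not survive the merge. One way to see which ones, sketched on toy frames (the values are illustrative, not taken from the dataset), is a left join with `indicator=True`:

```python
import pandas as pd

# Toy stand-ins for user_info and movie_info (values are illustrative)
ratings = pd.DataFrame({"Movie_ID": [1, 2, 3], "Rating": [3, 5, 4]})
movies = pd.DataFrame({"Movie_ID": [2, 3], "Name": ["B", "C"]})

# A left join keeps every rating; indicator=True adds a _merge column
checked = ratings.merge(movies, on="Movie_ID", how="left", indicator=True)

# Ratings whose Movie_ID has no match in the movie table
unmatched = checked[checked["_merge"] == "left_only"]
print(unmatched[["Movie_ID", "Rating"]])
```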

# Step 3: Create User-Movie Matrix

In [7]:
user_movie_matrix = merged_data.pivot_table(
    index= "Customer_ID",
    columns = "Movie_ID",
    values = "Rating")

Fill NaN with 0 (not rated yet)

In [8]:
user_movie_matrix_filled = user_movie_matrix.fillna(0)
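Filling with 0 makes an unrated movie look like a rating below the lowest real score, which can distort cosine similarity. A common alternative, shown here only as a sketch on a toy matrix (it is not what this notebook does), is to subtract each user's mean rating before filling, so that 0 means "no deviation from this user's average".

```python
import numpy as np
import pandas as pd

# Toy user-by-movie matrix; NaN means "not rated" (values are illustrative)
ratings = pd.DataFrame(
    {10: [5.0, 4.0, np.nan], 20: [3.0, np.nan, 2.0], 30: [np.nan, 2.0, 4.0]},
    index=pd.Index([101, 102, 103], name="Customer_ID"),
)

# Subtract each user's own mean rating, computed over rated movies only
user_means = ratings.mean(axis=1)
centered = ratings.sub(user_means, axis=0)

# After centering, 0 means "no deviation from the user's average"
centered_filled = centered.fillna(0)
print(centered_filled)
```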

# Step 4: Compute Similarities (Item-based CF)

In [9]:
movie_similarity = cosine_similarity(user_movie_matrix_filled.T)
movie_similarity_df = pd.DataFrame(
    movie_similarity,
    index=user_movie_matrix.columns,
    columns=user_movie_matrix.columns
)
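The item-item similarity matrix is built here but not queried in the later steps. A minimal sketch of how such a matrix could be used to look up the movies closest to a given one follows. The toy matrix and the helper name `most_similar` are assumptions for illustration, and the cosine is computed with NumPy so the example has no scikit-learn dependency.

```python
import numpy as np
import pandas as pd

# Toy zero-filled user-by-movie matrix (values are illustrative)
filled = pd.DataFrame(
    {1: [5, 4, 0], 2: [5, 4, 0], 3: [0, 0, 3]},
    index=[101, 102, 103],
)

# Cosine similarity between movie columns: normalise rows, then dot product
item_vectors = filled.T.to_numpy(dtype=float)
norms = np.linalg.norm(item_vectors, axis=1, keepdims=True)
unit = item_vectors / np.where(norms == 0, 1.0, norms)
similarity = pd.DataFrame(unit @ unit.T, index=filled.columns, columns=filled.columns)


def most_similar(movie_id, k=5):
    """Return the k movies most similar to movie_id, excluding itself."""
    return similarity[movie_id].drop(movie_id).nlargest(k)


print(most_similar(1, k=2))
```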

# Step 5: Find Top 15 Recommendations

In [10]:
# Compute average rating per movie
movie_ratings_mean = merged_data.groupby("Movie_ID")["Rating"].mean()

In [11]:
# Sort by highest rating
top_15_movies = movie_ratings_mean.sort_values(ascending=False).head(15)
top_15_movies

Movie_ID  Rating
    2572     5.0
    2573     5.0
    2576     5.0
    1865     5.0
    1866     5.0
    1868     5.0
    1871     5.0
    1878     5.0
    1879     5.0
      49     5.0
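Every movie listed above has a mean of exactly 5.0, which is what a plain average produces when a movie has very few ratings. One possible refinement, sketched on toy data, is to require a minimum number of ratings before a movie can be ranked. The threshold `MIN_RATINGS` is a hypothetical parameter; if most movies in the real data have a single rating, it would have to stay at 1.

```python
import pandas as pd

# Toy ratings (values are illustrative)
ratings = pd.DataFrame({
    "Movie_ID": [1, 1, 1, 2, 3, 3],
    "Rating":   [4, 5, 4, 5, 3, 4],
})

MIN_RATINGS = 2  # hypothetical threshold, tune to the dataset

# Mean and number of ratings per movie
stats = ratings.groupby("Movie_ID")["Rating"].agg(["mean", "count"])

# Keep only movies with enough ratings, then rank by mean and count
eligible = stats[stats["count"] >= MIN_RATINGS]
ranked = eligible.sort_values(["mean", "count"], ascending=False)
print(ranked)
```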


In [12]:
# Map to movie names
top_15_recommendations = movie_info[movie_info["Movie_ID"].isin(top_15_movies.index)][
    ["Movie_ID", "Name", "Year"]
]

In [13]:
top_15_recommendations = top_15_recommendations.merge(
    top_15_movies, on="Movie_ID"
).rename(columns={"Rating": "AvgRating"})

# Step 6: Show Results

In [14]:
print("Top 15 Recommended Movies/TV Shows:\n")
print(top_15_recommendations.to_string(index=False))

Top 15 Recommended Movies/TV Shows:

 Movie_ID                                                                Name   Year  AvgRating
       28                                                     Lilo and Stitch 2002.0        5.0
       30                                              Something's Gotta Give 2003.0        5.0
       31                          Classic Albums: Meat Loaf: Bat Out of Hell 1999.0        5.0
       35                                     Ferngully 2: The Magical Rescue 2000.0        5.0
       49                         Devo: The Complete Truth About De-evolution 2003.0        5.0
     1865                               Eternal Sunshine of the Spotless Mind 2004.0        5.0
     1866                                                              Sirens 1994.0        5.0
     1868                                                   Mad Monster Party 1967.0        5.0
     1871                                           Fela: Music Is the Weapon 1982.0        5.0
   

# File content in JSON

In [15]:
import json

# Convert to dictionary format
output_data = {
    "top_15_recommendations": top_15_recommendations.to_dict('records'),
    "total_movies": len(movie_info),
    "total_users": len(user_info['Customer_ID'].unique()),
    "recommendation_method": "Collaborative Filtering (Item-based)"
}

# Save to JSON file
with open('RA2512051010001.json', 'w') as f:
    json.dump(output_data, f, indent=2)
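The `Year` column is stored as float, which may mean some years are missing. Python's `json` module writes a float NaN as the bare token `NaN`, which strict JSON parsers reject. A small sketch of one way to guard against that, using hypothetical records (the second entry is invented for illustration) and a hypothetical helper `nan_to_none`:

```python
import json
import math

# Records in the shape produced by to_dict("records");
# the second entry is a made-up example with a missing year
records = [
    {"Movie_ID": 28, "Name": "Lilo and Stitch", "Year": 2002.0},
    {"Movie_ID": 99, "Name": "Example Title", "Year": float("nan")},
]


def nan_to_none(record):
    """Replace float NaN values with None so they serialise as JSON null."""
    return {
        key: None if isinstance(value, float) and math.isnan(value) else value
        for key, value in record.items()
    }


clean_records = [nan_to_none(r) for r in records]

# allow_nan=False makes json raise instead of writing a bare NaN token
payload = json.dumps(
    {"top_15_recommendations": clean_records}, indent=2, allow_nan=False
)
print(payload)
```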


In [16]:
# View the JSON file content
with open('RA2512051010001.json', 'r') as f:
    json_content = json.load(f)

print(json.dumps(json_content, indent=2))


{
  "top_15_recommendations": [
    {
      "Movie_ID": 28,
      "Name": "Lilo and Stitch",
      "Year": 2002.0,
      "AvgRating": 5.0
    },
    {
      "Movie_ID": 30,
      "Name": "Something's Gotta Give",
      "Year": 2003.0,
      "AvgRating": 5.0
    },
    {
      "Movie_ID": 31,
      "Name": "Classic Albums: Meat Loaf: Bat Out of Hell",
      "Year": 1999.0,
      "AvgRating": 5.0
    },
    {
      "Movie_ID": 35,
      "Name": "Ferngully 2: The Magical Rescue",
      "Year": 2000.0,
      "AvgRating": 5.0
    },
    {
      "Movie_ID": 49,
      "Name": "Devo: The Complete Truth About De-evolution",
      "Year": 2003.0,
      "AvgRating": 5.0
    },
    {
      "Movie_ID": 1865,
      "Name": "Eternal Sunshine of the Spotless Mind",
      "Year": 2004.0,
      "AvgRating": 5.0
    },
    {
      "Movie_ID": 1866,
      "Name": "Sirens",
      "Year": 1994.0,
      "AvgRating": 5.0
    },
    {
      "Movie_ID": 1868,
      "Name": "Mad Monster Party",
      "Year": 196

## Identify target user's ratings

Get the rating data for the target user (ID 372233).
- Filter the merged_data DataFrame down to the target user's ratings and store the result in a new variable.


In [17]:
target_user_id = 372233
target_user_ratings = merged_data[merged_data['Customer_ID'] == target_user_id]

## Calculate user similarity
- Calculate the cosine similarity between the target user and all other users based on their movie ratings.



In [18]:
target_user_row = user_movie_matrix_filled.loc[target_user_id].values.reshape(1, -1)
user_similarity = cosine_similarity(target_user_row, user_movie_matrix_filled)
user_similarity_df = pd.DataFrame(user_similarity.T, index=user_movie_matrix_filled.index, columns=['similarity'])
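For reference, the cosine similarity between two rating vectors u and v is (u · v) / (|u| |v|). The sketch below computes the same quantity with NumPy on a toy matrix (the IDs and ratings are illustrative) and guards against all-zero rows, which would otherwise divide by zero. Note also that `.loc[target_user_id]` raises a `KeyError` if the target user has no row in the matrix.

```python
import numpy as np
import pandas as pd

# Toy zero-filled user-by-movie matrix (values are illustrative)
filled = pd.DataFrame(
    [[5, 3, 0], [4, 0, 0], [0, 0, 2]],
    index=pd.Index([101, 102, 103], name="Customer_ID"),
    columns=[10, 20, 30],
)
target_id = 101

matrix = filled.to_numpy(dtype=float)
target = filled.loc[target_id].to_numpy(dtype=float)

# cos(u, v) = (u . v) / (|u| * |v|), computed for every user row at once
dots = matrix @ target
norms = np.linalg.norm(matrix, axis=1) * np.linalg.norm(target)
scores = np.divide(dots, norms, out=np.zeros_like(dots), where=norms > 0)

similarity = pd.Series(scores, index=filled.index, name="similarity")
print(similarity)
```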

## Find top similar users
- Sort the user similarity scores, exclude the target user, and select the top 15 most similar users.



In [19]:
user_similarity_sorted = user_similarity_df.sort_values(by='similarity', ascending=False)
top_similar_users = user_similarity_sorted[user_similarity_sorted.index != target_user_id].head(15)
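When the rating matrix is sparse, many users share no movies with the target and score exactly 0, so taking the first 15 rows can pull in users with no overlap at all. A hedged sketch of one way to keep only neighbours with positive similarity, using made-up scores and a hypothetical neighbourhood size `K`:

```python
import pandas as pd

# Made-up similarity scores keyed by Customer_ID
similarity = pd.Series(
    {101: 1.0, 102: 0.86, 103: 0.0, 104: 0.41},
    name="similarity",
)
target_id = 101
K = 2  # hypothetical neighbourhood size

neighbours = (
    similarity.drop(index=target_id)   # exclude the target user
    .loc[lambda s: s > 0]              # drop users with no overlap
    .nlargest(K)                       # keep the K most similar
)
print(neighbours)
```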

## Get movies watched by similar users


- Filter the merged_data DataFrame to include only the ratings from the users in the top_similar_users DataFrame.



In [20]:
movies_by_similar_users = merged_data[merged_data['Customer_ID'].isin(top_similar_users.index)]

## Filter already watched movies

- Create a list of Movie_IDs watched by the target user and filter the movies watched by similar users to exclude those already watched by the target user.




In [21]:
watched_movie_ids = target_user_ratings['Movie_ID'].tolist()
recommended_movies_raw = movies_by_similar_users[~movies_by_similar_users['Movie_ID'].isin(watched_movie_ids)]

## Rank and select recommendations

Rank the movies watched by similar users based on their ratings and select the top recommendations.


In [22]:
movie_recommendations = recommended_movies_raw.groupby('Movie_ID')['Rating'].mean().sort_values(ascending=False)
top_recommendation_ids = movie_recommendations.head(15).index.tolist()
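The plain mean above gives every similar user the same say, regardless of how similar they are to the target. A common variant, sketched here on toy data and not part of the original notebook, weights each rating by the neighbour's similarity score. All names and values below are illustrative.

```python
import pandas as pd

# Made-up similarity scores for two neighbours
neighbour_sim = pd.Series({102: 0.9, 104: 0.3}, name="similarity")

# Made-up ratings by those neighbours
neighbour_ratings = pd.DataFrame({
    "Customer_ID": [102, 104, 104],
    "Movie_ID":    [7, 7, 8],
    "Rating":      [5, 1, 3],
})

# Attach each rater's similarity as a weight
scored = neighbour_ratings.assign(
    weight=neighbour_ratings["Customer_ID"].map(neighbour_sim)
)
scored["weighted"] = scored["Rating"] * scored["weight"]

# Weighted mean per movie: sum(rating * weight) / sum(weight)
totals = scored.groupby("Movie_ID")[["weighted", "weight"]].sum()
weighted_mean = (totals["weighted"] / totals["weight"]).sort_values(ascending=False)
print(weighted_mean)
```

In this toy case movie 7 has a plain mean of 3.0, but its weighted score is about 4.0 because the more similar neighbour rated it 5.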


Merge the recommended movie IDs with the movie information to get the movie titles.


In [23]:
top_recommended_movies_details = movie_info[movie_info['Movie_ID'].isin(top_recommendation_ids)]
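Filtering with `isin` returns rows in the order they appear in `movie_info`, so the ranking computed in the previous step is lost. A minimal sketch of one way to keep the ranked order, assuming a best-first list of IDs: build a small ranked frame and left-merge the movie details onto it. The frames below are toy data.

```python
import pandas as pd

# Toy movie table and a ranked list of IDs (best first)
movies = pd.DataFrame({
    "Movie_ID": [1, 2, 3, 4],
    "Name": ["A", "B", "C", "D"],
})
ranked_ids = [3, 1, 4]

# A left merge keeps the row order of the left (ranked) frame
ranked = pd.DataFrame({
    "Movie_ID": ranked_ids,
    "Rank": range(1, len(ranked_ids) + 1),
})
details = ranked.merge(movies, on="Movie_ID", how="left")
print(details)
```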


Display the recommended movies by printing the DataFrame that holds their details.



In [24]:
print(f"Recommended Movies for User {target_user_id}:\n")
print(top_recommended_movies_details.to_string(index=False))

Recommended Movies for User 372233:

 Movie_ID   Year                                Name
       92 2002.0                  ECW: Cyberslam '99
      258 1999.0                                Mann
      306 2005.0 Sesame Street: Sing Yourself Silly!
      340 1998.0           Midnight: 2000 Seen By...
      343 2000.0           French and Saunders: Live
      404 2001.0                           The Shaft
      598 1973.0           Bobby Darin: Mack is Back
      629 1984.0                         Firestarter
      659 1972.0          The Last House on the Left
      660 2000.0                        Saving Grace
     1234 1994.0                            Crooklyn
     1270 2000.0                    The Great Gatsby
     1271 1939.0              Drums Along the Mohawk
     1961 1939.0                     Port of Shadows
     1963 2000.0                 Beautiful Creatures
