### Collaborative Filtering

Collaborative filtering (CF) [1] is a popular recommendation technique that makes automatic predictions about a user’s interests by collecting preferences or ratings from many users. The core idea is that if two users have similar tastes, the preferences of one can help predict the preferences of the other.

#### Step 1: Import libraries
We’ll use:
- `pandas` for data manipulation
- `numpy` for calculations
- `scikit-learn`’s `cosine_similarity` to measure student similarity

In [15]:
import pandas as pd
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

#### Step 2: Load dataset
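We load a simulated student-course rating dataset from `simulated_data.csv`. Each row holds one student's rating for one course; as the preview shows, unrated courses already appear with a rating of 0.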

In [13]:
df = pd.read_csv("simulated_data.csv")
df.head(10)

Unnamed: 0,Student,Course,Rating
0,Student1,Math,3
1,Student1,Physics,0
2,Student1,Chemistry,2
3,Student1,History,0
4,Student1,Art,0
5,Student1,Biology,4
6,Student1,ComputerScience,1
7,Student1,Economics,0
8,Student2,Math,5
9,Student2,Physics,1
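
The `Unnamed: 0` column in the preview is what pandas usually shows when a CSV was written together with its row index and then read back without `index_col`. A minimal sketch, assuming that is what happened here; the inline CSV text is a hypothetical stand-in for `simulated_data.csv`, not the real file:

```python
import io
import pandas as pd

# Hypothetical stand-in for simulated_data.csv: the unnamed leading column is a saved row index.
csv_text = """,Student,Course,Rating
0,Student1,Math,3
1,Student1,Physics,0
2,Student1,Chemistry,2
"""

# Default read: the unnamed first column surfaces as "Unnamed: 0".
raw = pd.read_csv(io.StringIO(csv_text))
print(list(raw.columns))

# Treating the first column as the index keeps only the real fields.
clean = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(list(clean.columns))
```

The extra column is harmless for the steps that follow, since the pivot only uses `Student`, `Course`, and `Rating`.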


#### Step 3: Student-course matrix
We’ll turn the raw data into a matrix where:
- Rows = students
- Columns = courses
- Values = ratings

Missing values are filled with 0 (meaning no rating).
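
To see the long-to-wide reshaping in isolation, here is a small sketch on hypothetical toy data (the `toy` frame and its values are made up for illustration). Student2 has no Physics row at all, so the pivot produces a missing value there, which `fillna(0)` turns into 0:

```python
import pandas as pd

# Hypothetical toy data: Student2 has no row for Physics.
toy = pd.DataFrame({
    "Student": ["Student1", "Student1", "Student2"],
    "Course": ["Math", "Physics", "Math"],
    "Rating": [3, 1, 5],
})

# One row per student, one column per course; absent pairs become 0.
toy_matrix = toy.pivot_table(index="Student", columns="Course", values="Rating").fillna(0)
print(toy_matrix)
```

Note that `pivot_table` aggregates with the mean by default, so if a student-course pair appeared more than once, its ratings would be averaged.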

In [9]:
student_course_matrix = df.pivot_table(index='Student', columns='Course', values='Rating').fillna(0)
student_course_matrix

Student,Art,Biology,Chemistry,ComputerScience,Economics,History,Math,Physics
Student1,0.0,4.0,2.0,1.0,0.0,0.0,3.0,0.0
Student10,0.0,2.0,3.0,0.0,1.0,5.0,0.0,3.0
Student100,4.0,0.0,3.0,1.0,0.0,3.0,5.0,2.0
Student11,1.0,3.0,1.0,4.0,3.0,3.0,5.0,2.0
Student12,1.0,2.0,2.0,5.0,3.0,3.0,2.0,4.0
...,...,...,...,...,...,...,...,...
Student95,3.0,0.0,0.0,1.0,3.0,0.0,1.0,3.0
Student96,4.0,2.0,2.0,0.0,4.0,4.0,0.0,3.0
Student97,5.0,1.0,1.0,2.0,0.0,3.0,0.0,0.0
Student98,4.0,4.0,0.0,2.0,4.0,4.0,1.0,3.0


#### Step 4: Compute student similarity
We compute the cosine similarity between every pair of students, treating each row of the student-course matrix as that student's rating vector.
Students who rate courses similarly will have higher similarity values (closer to 1).
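
As a worked example, cosine similarity is the dot product of two rating vectors divided by the product of their lengths. The sketch below computes it by hand with NumPy for the Student1 and Student10 rows shown in the Step 3 output; the helper name `cosine` is introduced here for illustration only:

```python
import numpy as np

# Rows for Student1 and Student10, copied from the student-course matrix in Step 3
# (column order: Art, Biology, Chemistry, ComputerScience, Economics, History, Math, Physics).
student1 = np.array([0, 4, 2, 1, 0, 0, 3, 0], dtype=float)
student10 = np.array([0, 2, 3, 0, 1, 5, 0, 3], dtype=float)

def cosine(u, v):
    """Dot product divided by the product of the Euclidean norms."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

sim = cosine(student1, student10)
print(round(sim, 6))  # 14 / sqrt(30 * 48) -> 0.368932
```

This agrees with the Student1/Student10 entry that `cosine_similarity` reports in this step's output.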

In [16]:
student_similarity = cosine_similarity(student_course_matrix)
similarity_df = pd.DataFrame(student_similarity, index=student_course_matrix.index, columns=student_course_matrix.index)

print("Student Similarity Matrix:")
similarity_df.head()

Student Similarity Matrix:


Student,Student1,Student10,Student100,Student11,Student12,Student13,Student14,Student15,Student16,Student17,...,Student90,Student91,Student92,Student93,Student94,Student95,Student96,Student97,Student98,Student99
Student1,1.0,0.368932,0.502079,0.700386,0.494881,0.497613,0.532624,0.151673,0.540062,0.299028,...,0.499152,0.690268,0.215758,0.487088,0.68313,0.135613,0.271746,0.23094,0.434122,0.621059
Student10,0.368932,1.0,0.541266,0.553704,0.680414,0.635489,0.778991,0.531021,0.707528,0.472805,...,0.413405,0.687184,0.587526,0.840168,0.604355,0.321634,0.769823,0.456435,0.670063,0.52736
Student100,0.502079,0.541266,1.0,0.712017,0.618718,0.877939,0.364662,0.712069,0.908541,0.488204,...,0.94387,0.420084,0.722185,0.439596,0.579066,0.557086,0.620174,0.671984,0.580292,0.708683
Student11,0.700386,0.553704,0.712017,1.0,0.890495,0.80428,0.661302,0.620823,0.854751,0.615125,...,0.771842,0.78134,0.625827,0.549787,0.952767,0.582839,0.605587,0.47789,0.776585,0.834812
Student12,0.494881,0.680414,0.618718,0.890495,1.0,0.815374,0.653233,0.699318,0.876501,0.668153,...,0.613716,0.792118,0.634459,0.64312,0.923913,0.678417,0.701646,0.521749,0.800641,0.579066


#### Step 5: Recommend courses for a student

Let’s recommend courses to Student15 based on what similar students have rated highly.
Steps:
1. Find the courses Student15 hasn't rated (rating of 0).
2. Look at how the other students rated those courses.
3. Predict a rating for each course as the average of those ratings, weighted by each student's similarity to Student15.
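
Written out, the weighted average in step 3 corresponds to the formula below. The symbols are notation introduced here for illustration: $\hat{r}_{u,c}$ is the predicted rating of target student $u$ for course $c$, $r_{v,c}$ is student $v$'s rating of that course, and $N_c$ is the set of other students with $r_{v,c} > 0$.

$$
\hat{r}_{u,c} = \frac{\sum_{v \in N_c} \operatorname{sim}(u, v)\, r_{v,c}}{\sum_{v \in N_c} \operatorname{sim}(u, v)}
$$

When the denominator is zero (no student with positive similarity has rated the course), the code in this step assigns a score of 0 instead.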

In [17]:
target_student = 'Student15'
# Courses the target student has not rated (a rating of 0 means "no rating")
unrated_courses = student_course_matrix.columns[student_course_matrix.loc[target_student] == 0]
# Similarity of every other student to the target, most similar first
similar_students = similarity_df[target_student].drop(target_student).sort_values(ascending=False)

recommendations = {}

for course in unrated_courses:
    weighted_sum = 0
    sim_sum = 0
    for other_student, sim in similar_students.items():
        rating = student_course_matrix.loc[other_student, course]
        # Only students who actually rated the course contribute
        if rating > 0:
            weighted_sum += sim * rating
            sim_sum += sim
    # Similarity-weighted average; fall back to 0 when there is nothing to average
    if sim_sum > 0:
        recommendations[course] = weighted_sum / sim_sum
    else:
        recommendations[course] = 0

# Rank the unrated courses by predicted rating, highest first
sorted_recommendations = sorted(recommendations.items(), key=lambda x: x[1], reverse=True)
print(f"\nRecommended courses for {target_student}:")
for course, score in sorted_recommendations:
    print(f"{course}: predicted rating {score:.2f}")


Recommended courses for Student15:
Economics: predicted rating 2.89
Chemistry: predicted rating 2.76
Biology: predicted rating 2.61
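
The nested loops in this step can also be expressed as two matrix products. The following is a sketch of an equivalent vectorized computation, not the notebook's own code. The function `predict_unrated` and the toy arrays are hypothetical names and data introduced for illustration, and the sketch assumes, as above, that a rating of 0 means "not rated":

```python
import numpy as np

def predict_unrated(ratings, sims, target):
    """Similarity-weighted average rating for each course the target has not rated.

    ratings: (n_students, n_courses) array, 0 = no rating
    sims:    (n_students, n_students) similarity matrix
    target:  row index of the target student
    Returns {course_index: predicted_rating}.
    """
    ratings = np.asarray(ratings, dtype=float)
    weights = np.asarray(sims, dtype=float)[target].copy()
    weights[target] = 0.0  # the target student is not their own neighbour

    # Unrated entries are 0, so they add nothing to the numerator.
    weighted_sum = weights @ ratings
    # The denominator only counts students who actually rated each course.
    sim_sum = weights @ (ratings > 0).astype(float)

    # Divide where possible; courses with a zero denominator keep a score of 0.
    preds = np.divide(weighted_sum, sim_sum,
                      out=np.zeros_like(weighted_sum), where=sim_sum > 0)
    unrated = np.flatnonzero(ratings[target] == 0)
    return {int(c): float(preds[c]) for c in unrated}

# Hypothetical toy matrix: 3 students x 3 courses.
toy_ratings = np.array([[5, 0, 3],
                        [4, 2, 0],
                        [0, 4, 5]], dtype=float)

# Cosine similarity between rows, computed with NumPy only.
unit = toy_ratings / np.linalg.norm(toy_ratings, axis=1, keepdims=True)
toy_sims = unit @ unit.T

preds = predict_unrated(toy_ratings, toy_sims, target=0)
print(preds)
```

With the notebook's objects, the same function could be called on `student_course_matrix.to_numpy()` and `student_similarity`, using the target student's row position as `target`.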


#### Reference
[1] Sarwar, Badrul, et al. "Item-based collaborative filtering recommendation algorithms." Proceedings of the 10th international conference on World Wide Web. 2001.