# For this recommender system assignment, I implemented a content-based recommender. The dataset is a mobile phone rating dataset from Kaggle. It contains 1000 mobile phones and their ratings for camera, selfie, audio, display and battery. The system recommends mobile phones based on the user's preferences for these 5 features.

In [21]:
import pandas as pd
import numpy as np

dataset = pd.read_csv('mobile phone rating by dxo.csv', low_memory=False)

dataset.isnull().sum()

model        0
price        0
launch       0
camera      39
selfie     151
audio      130
display    159
battery    169
dtype: int64

# Before building the recommender, the dataset must contain no null values, and the rating features must be numeric and on a comparable scale. I fill the missing values with the column mean for numeric features and with the mode for categorical features, and then scale the five rating features with the StandardScaler from sklearn.

In [22]:
from sklearn.preprocessing import StandardScaler


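def remove_missing_values(df):
    """Return a copy of df with NaNs filled: mode for text columns, mean for numeric ones."""
    df = df.copy()  # work on a copy so the caller's DataFrame is not modified
    for column in df.columns:
        if df[column].dtype == object:
            # Text/categorical column: fill with the most frequent value
            df[column] = df[column].fillna(df[column].mode().iloc[0])
        else:
            # Numeric column: fill with the column mean
            df[column] = df[column].fillna(df[column].mean())
    return df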


cleaned_dataset = remove_missing_values(dataset)

print(cleaned_dataset.isnull().sum())

scaler = StandardScaler()

feature_columns = ['camera', 'selfie', 'audio', 'display', 'battery']

cleaned_dataset[feature_columns] = scaler.fit_transform(cleaned_dataset[feature_columns])


model      0
price      0
launch     0
camera     0
selfie     0
audio      0
display    0
battery    0
dtype: int64
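
To make the preprocessing step more concrete, here is a minimal sketch of what mean imputation followed by standardization does to a single column. It uses plain NumPy rather than the notebook's pandas and sklearn code. The helper name `impute_and_standardize` and the toy ratings are assumptions made for illustration only and do not come from the dataset.

```python
import numpy as np


def impute_and_standardize(values):
    """Fill NaNs with the column mean, then scale to zero mean and unit variance.

    Hypothetical helper for illustration; the notebook itself uses
    pandas fillna and sklearn's StandardScaler.
    """
    column = np.asarray(values, dtype=float)
    # Replace each NaN with the mean of the observed (non-NaN) values
    filled = np.where(np.isnan(column), np.nanmean(column), column)
    # Population standard deviation (ddof=0), which is what StandardScaler uses
    return (filled - filled.mean()) / filled.std()


toy_scores = [120.0, np.nan, 140.0, 100.0]  # made-up ratings, one missing
scaled = impute_and_standardize(toy_scores)
print(scaled)
```

One consequence worth noting: mean imputation leaves the column mean unchanged, so every imputed entry ends up at exactly 0 after standardization. In other words, a phone with a missing rating is treated as average for that feature.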


In [23]:
from sklearn.metrics.pairwise import cosine_similarity

# Select the numeric rating columns for item profiles
smartphone_profiles = cleaned_dataset[feature_columns]


# Now that the dataset is ready, I can build the recommender system. First, I define the user's preferences as an array. Then I calculate the similarity between the user's preferences and the feature vector of each mobile phone using cosine_similarity. Finally, I create a DataFrame with the mobile phones and their similarity scores and sort it in descending order. The top 5 mobile phones are the recommendations.
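
The steps above can also be sketched as one small reusable function. This is only an illustration under stated assumptions: `top_n_similar`, its parameters, and the toy phones are hypothetical names and data, and the similarity is computed directly with NumPy instead of sklearn's `cosine_similarity`.

```python
import numpy as np


def top_n_similar(item_profiles, item_names, user_profile, top_n=5):
    """Rank items by cosine similarity to a user profile (hypothetical helper).

    Assumes no item vector and no user vector is all zeros, since that
    would lead to a division by zero.
    """
    items = np.asarray(item_profiles, dtype=float)
    user = np.asarray(user_profile, dtype=float)
    # Cosine similarity: dot product divided by the product of the vector norms
    scores = items @ user / (np.linalg.norm(items, axis=1) * np.linalg.norm(user))
    # Indices of the highest scores first, truncated to top_n
    order = np.argsort(scores)[::-1][:top_n]
    return [(item_names[i], float(scores[i])) for i in order]


# Toy data, not taken from the dataset
names = ["phone_a", "phone_b", "phone_c"]
profiles = [[1.0, 0.0], [-1.0, -1.0], [2.0, 2.0]]
ranking = top_n_similar(profiles, names, [1.0, 1.0], top_n=2)
print(ranking)
```

Returning a list of `(name, score)` pairs keeps the sketch free of pandas. In the notebook, the same information lives in a DataFrame so it can be sorted and displayed directly.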

In [24]:
# Define user_profile with the user's preferences, in the same order as feature_columns (camera, selfie, audio, display, battery)
user_profile = np.array([0.1, 0.3, 0.7, 0.6, 0.7]).reshape(1, -1)

# Calculate the similarity scores
sim_scores = cosine_similarity(smartphone_profiles, user_profile).flatten()

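# The user profile is an array of preference scores, one per feature. Cosine similarity is the cosine of the angle between the user profile and each smartphone's feature vector: it ranges from -1 (opposite direction) to 1 (same direction) and depends only on the direction of the vectors, not on their length. Because the smartphone features are standardized, a positive value means an above-average rating for that feature, so phones that are above average on the features the user weights most heavily score highest.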

# Create a DataFrame with Smartphone and Similarity
sim_df = pd.DataFrame({'Smartphone': cleaned_dataset['model'], 'Similarity': sim_scores})

# Sort by similarity, highest first
recommendations = sim_df.sort_values(by='Similarity', ascending=False)

print(recommendations.head(5))


                 Smartphone  Similarity
57           Xiaomi 11T Pro    0.963123
3   Apple iPhone 13 Pro Max    0.920825
94               Xiaomi 11T    0.878413
17  Apple iPhone 12 Pro Max    0.811485
4       Apple iPhone 13 Pro    0.761816
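
To show why the scores above reflect direction rather than magnitude, here is a small hand-written version of cosine similarity. It is a sketch for illustration, not the sklearn implementation used above. The function name `cosine_sim` and the two comparison vectors are made up. Only the user vector matches the `user_profile` defined earlier.

```python
import numpy as np


def cosine_sim(a, b):
    """Cosine of the angle between vectors a and b (illustrative helper)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


user = [0.1, 0.3, 0.7, 0.6, 0.7]
# Hypothetical phone pointing in the same direction with twice the magnitude
same_direction = [2 * x for x in user]
# Hypothetical phone pointing in exactly the opposite direction
opposite = [-x for x in user]

print(cosine_sim(user, same_direction))
print(cosine_sim(user, opposite))
```

Up to floating-point rounding, the first comparison gives 1 and the second gives -1. Doubling every value does not change the score, so the recommender ranks phones by how their strengths are distributed across the five features rather than by the size of the numbers in the profile.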
