# Collaborative Filtering

**Collaborative filtering** is a recommendation technique used in market analysis: it suggests products to a user based on the ratings that other users have given. The underlying assumption is that users who rated items similarly in the past will tend to like the same items in the future.

In this case, I use a dataset that my friends and I collected by rating (1-5) the movies we have watched. A rating of 0 marks a movie the user has not watched.

Dataset : "recomendations_data.ipynb"

### Import Dataset

Import the ratings from the `recomendations_data` module.

In [16]:
#Import Libraries
from math import sqrt
import pandas as pd

In [17]:
#From 'Name of File' import 'Name of Data'
from recomendations_data import dataset

In [18]:
pd.DataFrame(dataset)

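| | ANI | AhokTemanFirli | Damar Teman Firli | Dpv | Febi ganteng gak ada obat | Genjeh | Hania | Indra 1991 SM | Indra Junior | Jawaharal | ... | OM INDRA | Putrisqiana | Rima | Romantika | Star | Topik Zulkarnain | bunga | faizah | franadek | luck |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Ada Apa dengan Cinta 2 | 4 | 0 | 5 | 5 | 4 | 5 | 3 | 0 | 4 | 2 | ... | 3 | 4 | 5 | 5 | 4 | 0 | 0 | 3 | 4 | 3 |
| Aladdin | 4 | 0 | 0 | 0 | 5 | 5 | 0 | 0 | 5 | 5 | ... | 1 | 0 | 5 | 0 | 5 | 0 | 5 | 0 | 5 | 0 |
| Avengers: End Game | 0 | 3 | 5 | 5 | 5 | 5 | 0 | 0 | 5 | 5 | ... | 5 | 5 | 5 | 0 | 5 | 5 | 5 | 5 | 5 | 4 |
| Bumi Manusia | 5 | 0 | 0 | 0 | 0 | 0 | 4 | 0 | 0 | 0 | ... | 0 | 4 | 4 | 0 | 0 | 0 | 0 | 5 | 5 | 0 |
| Captain Marvel | 4 | 4 | 0 | 5 | 4 | 4 | 0 | 0 | 5 | 4 | ... | 5 | 3 | 5 | 0 | 5 | 2 | 5 | 0 | 4 | 2 |
| Dilan 1991 | 4 | 0 | 0 | 4 | 4 | 3 | 4 | 0 | 0 | 3 | ... | 2 | 2 | 5 | 5 | 0 | 0 | 4 | 5 | 4 | 0 |
| Dua Garis Biru | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0 | 4 | 5 | ... | 5 | 3 | 3 | 0 | 0 | 0 | 0 | 4 | 3 | 0 |
| Gundala | 0 | 0 | 0 | 4 | 3 | 4 | 5 | 5 | 0 | 4 | ... | 0 | 3 | 5 | 0 | 4 | 0 | 4 | 0 | 4 | 0 |
| Spiderman: Far From Home | 3 | 0 | 5 | 5 | 5 | 4 | 0 | 0 | 5 | 5 | ... | 5 | 4 | 5 | 0 | 0 | 4 | 5 | 0 | 4 | 0 |
| The Lion King | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5 | 4 | ... | 0 | 0 | 4 | 0 | 5 | 5 | 5 | 0 | 4 | 0 |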


### Analysis

The code below implements the collaborative filtering steps:
- Step 1: Get the **Similarity Score** between two users from their Euclidean distance. Users whose ratings are a small Euclidean distance apart get a high similarity score, and users whose ratings are far apart get a low one.
- Step 2: Get the **Pearson Correlation** score between two users' ratings. A high score means the two users rate the films they have watched in a similar way; a low score means their ratings have little in common.
- Step 3: Get the **Most Similar Users**. This lists the other users who are most similar to a given user, ranked by their Pearson correlation score.
- Step 4: Get **Film Recommendations**. This returns the films the user has not yet watched, ranked by a similarity-weighted average of the other users' ratings.
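
As a compact reference, the scores used in steps 1, 2 and 4 can be written as formulas that mirror the code in the following cells. The notation is an assumption introduced here for illustration: $x_i$ and $y_i$ are the ratings two users gave to film $i$, $n$ is the number of films both users have an entry for, $s_u$ is the Pearson score of another user $u$, and $v_{u,i}$ is that user's rating of film $i$.

```latex
% Step 1: similarity from Euclidean distance (1 when identical, approaching 0 when far apart)
\mathrm{similarity}(x, y) = \frac{1}{1 + \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}}

% Step 2: Pearson correlation, in the sum form used by person_correlation
r(x, y) = \frac{\sum_i x_i y_i - \frac{1}{n} \sum_i x_i \sum_i y_i}
               {\sqrt{\left(\sum_i x_i^2 - \frac{1}{n}\left(\sum_i x_i\right)^2\right)
                      \left(\sum_i y_i^2 - \frac{1}{n}\left(\sum_i y_i\right)^2\right)}}

% Step 4: similarity-weighted average rating of film i, over users with positive similarity
\mathrm{score}(i) = \frac{\sum_{u :\, s_u > 0} s_u \, v_{u,i}}{\sum_{u :\, s_u > 0} s_u}
```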

**Similarity Score**

In [19]:
#Get Similarity Score between persons with Euclidean distance
def similarity_score(person1, person2):

    # Returns a similarity score in (0, 1] derived from the Euclidean distance between person 1 and 2

    # Collect the items rated by both person 1 and 2
    both_viewed = {}

    for item in dataset[person1]:
        if item in dataset[person2]:
            both_viewed[item] = 1

    # No items in common: no similarity
    # (this check and the code below must run after the loop, not inside it)
    if len(both_viewed) == 0:
        return 0

    # Sum of squared rating differences over the common items
    sum_of_squared_differences = sum(pow(dataset[person1][item] - dataset[person2][item], 2) for item in both_viewed)

    # Map the distance to a score: distance 0 gives 1, larger distances approach 0
    return 1/(1+sqrt(sum_of_squared_differences))

**Pearson Correlation**

In [20]:
#Get the Pearson correlation score between two people
def person_correlation(person1, person2):

    # Collect the items rated by both people
    both_rated = {}
    for item in dataset[person1]:
        if item in dataset[person2]:
            both_rated[item] = 1

    number_of_ratings = len(both_rated)

    # Checking for ratings in common
    if number_of_ratings == 0:
        return 0

    # Add up all the preferences of each user
    person1_preferences_sum = sum([dataset[person1][item] for item in both_rated])
    person2_preferences_sum = sum([dataset[person2][item] for item in both_rated])

    # Sum up the squares of preferences of each user
    person1_square_preferences_sum = sum([pow(dataset[person1][item],2) for item in both_rated])
    person2_square_preferences_sum = sum([pow(dataset[person2][item],2) for item in both_rated])

    # Sum up the product value of both preferences for each item
    product_sum_of_both_users = sum([dataset[person1][item] * dataset[person2][item] for item in both_rated])

    # Calculate the pearson score
    numerator_value = product_sum_of_both_users - (person1_preferences_sum*person2_preferences_sum/number_of_ratings)
    denominator_value = sqrt((person1_square_preferences_sum - pow(person1_preferences_sum,2)/number_of_ratings) * (person2_square_preferences_sum -pow(person2_preferences_sum,2)/number_of_ratings))

    if denominator_value == 0:
        return 0
    else:
        r = numerator_value / denominator_value
        return r

**Most Similar Users**

In [21]:
#Get the people most similar to a given person, ranked by Pearson correlation
def most_similar_users(person, number_of_users):

    # Returns the top number_of_users (score, name) pairs for the given person
    scores = [(person_correlation(person, other_person), other_person) for other_person in dataset if other_person != person]

    # Sort so that the person with the highest score appears first
    scores.sort(reverse=True)
    return scores[0:number_of_users]

**Film Recommendations**

In [22]:
#Get the recommended films and their scores for a person
def film_recommendations(person):

    # Gets recommendations for a person by using a weighted average of every other user's ratings
    totals = {}
    simSums = {}
    for other in dataset:
        # don't compare me to myself
        if other == person:
            continue
        sim = person_correlation(person,other)

        # ignore scores of zero or lower
        if sim <= 0:
            continue
        for item in dataset[other]:

            # only score movies I haven't seen yet (missing or rated 0)
            if item not in dataset[person] or dataset[person][item] == 0:

                # Similarity * score
                totals.setdefault(item,0)
                totals[item] += dataset[other][item] * sim
                # sum of similarities
                simSums.setdefault(item,0)
                simSums[item] += sim

    # Create the normalized list, highest score first
    rankings = [(total/simSums[item],item) for item,total in totals.items()]
    rankings.sort()
    rankings.reverse()
    return rankings

### Output

In [23]:
#Print, in order of score, the people most similar to us
pd.DataFrame(most_similar_users('faizah', 23))

| | 0 | 1 |
|---|---|---|
| 0 | 0.478745 | Putrisqiana |
| 1 | 0.451097 | Hania |
| 2 | 0.396203 | Romantika |
| 3 | 0.280149 | luck |
| 4 | 0.138971 | OM INDRA |
| 5 | 0.134491 | Damar Teman Firli |
| 6 | 0.117393 | franadek |
| 7 | 0.091989 | ANI |
| 8 | -0.01242 | AhokTemanFirli |
| 9 | -0.030424 | Dpv |


**Interpretation:**
- The table above lists the users most similar to user 'Faizah', together with their Pearson correlation scores, based on the movie ratings each user has given.
- Scores can be positive or negative. A positive score means the user's ratings tend to move in the same direction as Faizah's. A negative score means they tend to move in the opposite direction, that is, the two users disagree about which movies they like.
- Faizah is most similar to user 'Putrisqiana', with a score of 0.478745, and least similar to user 'Jul', whose score of -0.614816 indicates opposite preferences ('Jul' falls outside the rows shown above).
- Judging by the ratings, Faizah's ratings agree most closely with Putrisqiana's, whereas Faizah and 'Jul' rated the films they watched differently, which produces the negative score.
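
The ranking step itself relies on how Python orders tuples: `(score, name)` pairs sort by score first. The short sketch below illustrates this with a few pairs copied from the output above, deliberately shuffled. The variable names are illustrative and not part of the notebook.

```python
# (score, user) pairs copied from the output above, in shuffled order
scores = [
    (-0.030424, "Dpv"),
    (0.451097, "Hania"),
    (0.478745, "Putrisqiana"),
    (-0.01242, "AhokTemanFirli"),
    (0.396203, "Romantika"),
    (0.280149, "luck"),
]

# Tuples compare element by element, so sorting orders by score first
top_three = sorted(scores, reverse=True)[:3]

# Users with a score of zero or lower are the ones film_recommendations skips
skipped = [user for score, user in scores if score <= 0]

print(top_three)
print(skipped)
```

Only users with a positive score contribute to the recommendations, so in this sample 'Dpv' and 'AhokTemanFirli' would be ignored.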

In [24]:
#Print the similarity score with another person
print(similarity_score('faizah','Hania'))

0.12178632452799958


**Interpretation:**
The code above prints the similarity score between users 'Faizah' and 'Hania', which is 0.12178. This low score reflects how differently the two rated the films in the dataset.
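
As a check, this score can be recomputed by hand from the ten films visible in the ratings table. The sketch below is self-contained and copies the 'faizah' and 'Hania' columns from that table. The variable names are illustrative, and it assumes the ten rows shown are the complete film list.

```python
from math import sqrt

# Film order follows the rows of the ratings table; 0 means "not watched"
films = [
    "Ada Apa dengan Cinta 2", "Aladdin", "Avengers: End Game", "Bumi Manusia",
    "Captain Marvel", "Dilan 1991", "Dua Garis Biru", "Gundala",
    "Spiderman: Far From Home", "The Lion King",
]
faizah = [3, 0, 5, 5, 0, 5, 4, 0, 0, 0]
hania = [3, 0, 0, 4, 0, 4, 4, 5, 0, 0]

# Squared rating difference per film
squared_diffs = [(a - b) ** 2 for a, b in zip(faizah, hania)]

# Show which films contribute to the distance
for film, diff in zip(films, squared_diffs):
    if diff:
        print(f"{film}: {diff}")

# Same mapping as similarity_score: 1 / (1 + Euclidean distance)
score = 1 / (1 + sqrt(sum(squared_diffs)))
print(score)
```

The squared differences sum to 52, and 25 each comes from Avengers: End Game (5 vs 0) and Gundala (0 vs 5). Since unwatched films enter the distance as 0, most of the gap between the two users comes from films that only one of them has rated.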

In [25]:
#Print the Pearson correlation score with another person
print(person_correlation('faizah', 'Hania'))

0.4510968544481586


**Interpretation:**
The result above is the Pearson correlation between Faizah and Hania: 0.45109685. It is a moderate positive correlation, so their ratings tend to move in the same direction.
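
The same value can be reproduced from the two rating columns in the table with the mean-centred form of Pearson's formula, which is algebraically equivalent to the sums used in `person_correlation`. This is a minimal sketch with illustrative variable names, assuming the ten rows shown in the table are the complete film list.

```python
from math import sqrt

# Ratings copied from the 'faizah' and 'Hania' columns (0 = not watched)
faizah = [3, 0, 5, 5, 0, 5, 4, 0, 0, 0]
hania = [3, 0, 0, 4, 0, 4, 4, 5, 0, 0]

n = len(faizah)
mean_f = sum(faizah) / n
mean_h = sum(hania) / n

# Numerator: how the two users' ratings vary together around their means
co_variation = sum((a - mean_f) * (b - mean_h) for a, b in zip(faizah, hania))

# Denominator: how much each user's ratings vary on their own
spread_f = sum((a - mean_f) ** 2 for a in faizah)
spread_h = sum((b - mean_h) ** 2 for b in hania)

r = co_variation / sqrt(spread_f * spread_h)
print(round(r, 6))  # 0.451097
```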

In [26]:
#Print the recommended films and the score of each
pd.DataFrame(film_recommendations('faizah'))

| | 0 | 1 |
|---|---|---|
| 0 | 1.991966 | Gundala |
| 1 | 1.928079 | Spiderman: Far From Home |
| 2 | 1.689255 | Captain Marvel |
| 3 | 0.523634 | Aladdin |
| 4 | 0.22478 | The Lion King |


**Interpretation:**
The result above is the list of films recommended for Faizah. All of them are films that Faizah has not yet watched, and they are ranked by the ratings that other users gave them, weighted by each user's similarity to Faizah.
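
To make the weighting concrete, here is a minimal sketch of the score for a single film. The helper `weighted_score`, the user names and all the numbers are hypothetical and chosen only for illustration. They are not taken from the dataset.

```python
def weighted_score(similarities, ratings):
    """Similarity-weighted average rating of one film.

    similarities: {user: Pearson score with the target user}
    ratings:      {user: that user's rating of the film}
    """
    total = 0.0
    sim_sum = 0.0
    for user, sim in similarities.items():
        # Same rule as film_recommendations: skip scores of zero or lower
        if sim <= 0:
            continue
        total += ratings[user] * sim
        sim_sum += sim
    return total / sim_sum if sim_sum else 0.0


# Hypothetical values, not taken from the dataset
similarities = {"user_a": 0.5, "user_b": 0.25, "user_c": -0.1}

# Everyone has watched the film: (0.5*4 + 0.25*2) / 0.75, about 3.33
watched_by_all = weighted_score(similarities, {"user_a": 4, "user_b": 2, "user_c": 5})

# user_b has not watched it (rating 0): (0.5*4 + 0.25*0) / 0.75, about 2.67
with_unwatched = weighted_score(similarities, {"user_a": 4, "user_b": 0, "user_c": 5})

print(watched_by_all, with_unwatched)
```

Because `film_recommendations` also counts a rating of 0 (not watched) from similar users, a film's score can fall below the 1-5 scale. This is the likely reason for low values such as 0.22478 for The Lion King.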