
# Tinder Recommendation Project

## Company's Description 📇

<a href="https://tinder.com/?lang=en" target="_blank">Tinder</a> is one of the most famous online dating applications in the world right now. The idea is that you anonymously swipe right or left on a person to show whether you like them or not. If both people swipe right: *It's a match*! And then you can start chatting!

This whole concept revolutionized the way young people date. Founder <a href="https://www.crunchbase.com/person/sean-rad" target="_blank">Sean Rad</a> believed that *"no matter who you are, you feel more comfortable approaching somebody if you know they want you to approach them."*

With over 50 million users (more than 80% of them aged 16 to 34), Tinder's valuation is around $1.4 billion, which makes this start-up one of the most famous unicorns in California as of today. 😮

## Project 🚧

The main way for Tinder to bring value to its users is to recommend the right profile to the right person! However, as the user base grows, Tinder cannot simply show a random profile or the "most liked" person to a given user.

We will **create a recommendation engine that shows a user the people they are likely to match with.**

Tinder's internal data team created a score based on several variables that the company keeps secret. This score ranges from 0 to 10 and describes how much a user "liked" a person. We will use these scores for our recommendation system.

## Goals 🎯

Our goal is to

* Recommend the 10 best profiles for a given user

## Scope of this project 🖼️

The dataset is available here:


👉👉 <a href="https://full-stack-bigdata-datasets.s3.eu-west-3.amazonaws.com/Unsupervised_Learning/Tinder_data.zip" target="_blank">Tinder Data</a> 👈👈

The files contain 17,359,346 anonymous ratings of 168,791 profiles made by 135,359 LibimSeTi users. The zip file contains the gender of each person as well as the ratings.

## Helpers 🦮

Here are a few tips to help us.

### Introduction to Recommendation engines 

It is now time to discuss recommendation engines.
There are two main types of recommendation engines:

1. Collaborative Filtering 
2. Content Based 

![](https://miro.medium.com/max/690/1*G4h4fOX6bCJhdmbsXDL0PA.png)


### Collaborative filtering principles 

We start this project with Collaborative Filtering. The idea is to recommend a product based on other users' reviews. Let us look at the idea behind [this explanatory gif](https://www.kdnuggets.com/2019/09/machine-learning-recommender-systems.html#:~:text=Recommender%20systems%20are%20an%20important,to%20follow%20from%20example%20code.) from KDnuggets:

![](https://miro.medium.com/max/623/1*hQAQ8s0-mHefYH83uDanGA.gif)

Instead of having "products" to recommend, this time, we will recommend people!
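
A minimal sketch of that idea, on a made-up rating table (the `ratings_toy` values and the `cosine` helper are purely illustrative, and cosine similarity is just one possible similarity measure): to guess how a user would rate a profile they have not seen yet, average the ratings given by the other users, weighted by how similar their past ratings are.

```python
import numpy as np

# Hypothetical ratings (rows = users, columns = profiles); np.nan = not rated yet.
ratings_toy = np.array([
    [9.0, 2.0, 8.0, np.nan],   # user A: has not rated the last profile
    [8.0, 1.0, 9.0, 9.0],      # user B: tastes close to A
    [2.0, 9.0, 1.0, 3.0],      # user C: tastes far from A
])

def cosine(u, v):
    """Cosine similarity computed on the profiles both users have rated."""
    both = ~np.isnan(u) & ~np.isnan(v)
    return float(u[both] @ v[both] / (np.linalg.norm(u[both]) * np.linalg.norm(v[both])))

target = ratings_toy[0]
others = ratings_toy[1:]
sims = np.array([cosine(target, other) for other in others])

# Similarity-weighted average of the other users' ratings for the unseen profile.
predicted = float(sims @ others[:, 3] / sims.sum())
print(sims.round(2), round(predicted, 2))
```

User B weighs more than user C in the prediction because B's past ratings are closer to A's.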

### Build a utility matrix 

Our goal is to create a recommendation engine built on a <a href="https://towardsdatascience.com/math-for-data-science-collaborative-filtering-on-utility-matrices-e62fa9badaab" target="_blank">utility matrix</a>. It should look something like this:

<img src="https://full-stack-assets.s3.eu-west-3.amazonaws.com/images/utility_matrix.png"/>
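
As a rough illustration of the shape such a matrix takes, here is a sketch on a made-up table (the `toy` frame and its values are hypothetical; only the column names match the project data). Note that filling the gaps with 0 makes "not rated" look like a very low score, which is a simplification.

```python
import pandas as pd

# Hypothetical long-format ratings: one row per (user, profile) rating.
toy = pd.DataFrame({
    "UserID":    [1, 1, 2, 3, 3],
    "ProfileID": [10, 20, 10, 20, 30],
    "Rating":    [9, 4, 7, 10, 2],
})

# One row per profile, one column per user; 0 stands for "no rating".
utility = toy.pivot_table(values="Rating", index="ProfileID", columns="UserID", fill_value=0)
print(utility)
```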

### Machine Learning

TruncatedSVD is well suited here because it works directly on sparse data such as our utility matrix! 👏 We will apply this algorithm to reduce the dimensionality and then create a <a href="https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.corr.html" target="_blank">correlation matrix</a> to see which profiles are correlated and would therefore be a match!
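
The two steps can be sketched on synthetic data as follows. The matrix size, the 2% fill rate and the variable names are assumptions made for the illustration, not properties of the real dataset.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.decomposition import TruncatedSVD

rng = np.random.RandomState(0)

# Hypothetical 200 profiles x 500 users matrix: ratings from 1 to 10 in about 2% of the cells.
scores = rng.randint(1, 11, size=(200, 500)).astype(float)
mask = rng.rand(200, 500) < 0.02
toy_matrix = csr_matrix(scores * mask)

# TruncatedSVD takes the sparse matrix as-is and returns a dense (200, 12) array.
svd = TruncatedSVD(n_components=12, random_state=0)
reduced = svd.fit_transform(toy_matrix)

# Row-wise Pearson correlation between the reduced profiles.
corr = np.corrcoef(reduced)
density = toy_matrix.nnz / (200 * 500)
print(round(density, 3), reduced.shape, corr.shape)
```

A row that contains no rating at all is projected onto a constant vector, so its correlations come out as NaN; such rows would need to be filtered out in a real pipeline.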

## Deliverable 📬

Goals: 

* Have built a utility matrix 
* Have created a correlation matrix 
* Recommend a list of 10 profiles for a random user 



# 1. Reading the data

In [1]:
import pandas as pd

# The file has no header row, hence header=None
genders = pd.read_csv("http://www.occamslab.com/petricek/data/gender.dat", header=None)
genders.head()

|  | 0 | 1 |
|---|---|---|
| 0 | 1 | F |
| 1 | 2 | F |
| 2 | 3 | U |
| 3 | 4 | F |
| 4 | 5 | F |


In [2]:
ratings = pd.read_csv("http://www.occamslab.com/petricek/data/ratings.dat", header=None)
# Work on a random sample of 30,000 ratings rather than the full 17 million rows
ratings = ratings.sample(30000)
ratings.head()

|  | 0 | 1 | 2 |
|---|---|---|---|
| 6201361 | 49264 | 191514 | 9 |
| 634052 | 4917 | 198249 | 5 |
| 16949074 | 131999 | 96789 | 10 |
| 13636946 | 105868 | 164427 | 2 |
| 17309833 | 134989 | 194553 | 6 |


In [3]:
# Save local copies of both tables
genders.to_csv('genders.csv')
ratings.to_csv('ratings.csv')

## Exploration 


* We rename the columns the following way: `["UserID", "ProfileID", "Rating"]`

* Then we find the most rated profiles

In [4]:
ratings = ratings.rename(columns={0: "UserID", 1: "ProfileID", 2: "Rating"})
ratings.head()

|  | UserID | ProfileID | Rating |
|---|---|---|---|
| 6201361 | 49264 | 191514 | 9 |
| 634052 | 4917 | 198249 | 5 |
| 16949074 | 131999 | 96789 | 10 |
| 13636946 | 105868 | 164427 | 2 |
| 17309833 | 134989 | 194553 | 6 |


In [5]:
ratings.groupby("ProfileID")["Rating"].count().sort_values(ascending=False).head()

ProfileID
156148    49
121859    47
22319     44
193687    44
31116     42
Name: Rating, dtype: int64

### Build a Utility Matrix 


In [6]:
# Utility matrix: one row per ProfileID, one column per UserID, 0 where no rating exists
ratings = ratings.pivot_table(values="Rating", index="ProfileID", columns="UserID", fill_value=0)
ratings

UserID,2,9,19,38,43,60,62,69,73,74,...,135316,135323,135331,135337,135339,135340,135343,135350,135357,135359
ProfileID,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
50,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
55,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
90,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
103,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
220892,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
220928,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
220947,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
220948,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
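
Almost every cell of this matrix is 0. One possible alternative, sketched here on a made-up table (the `toy` frame and all variable names are hypothetical), is to build a SciPy sparse matrix straight from the long-format ratings and measure how empty it is. Unlike `pivot_table`, which averages duplicate ratings, `csr_matrix` would sum them, so this sketch assumes each user rates a profile at most once.

```python
import pandas as pd
from scipy.sparse import csr_matrix

# Hypothetical long-format ratings, same column names as the project data.
toy = pd.DataFrame({
    "UserID":    [1, 1, 2, 3, 3],
    "ProfileID": [10, 20, 10, 20, 30],
    "Rating":    [9, 4, 7, 10, 2],
})

# Map each ID to a consecutive row / column position.
profile_codes, profile_ids = pd.factorize(toy["ProfileID"], sort=True)
user_codes, user_ids = pd.factorize(toy["UserID"], sort=True)

sparse_utility = csr_matrix(
    (toy["Rating"].to_numpy(), (profile_codes, user_codes)),
    shape=(len(profile_ids), len(user_ids)),
)

# Share of cells that hold no rating.
n_cells = sparse_utility.shape[0] * sparse_utility.shape[1]
sparsity = 1 - sparse_utility.nnz / n_cells
print(sparse_utility.shape, round(sparsity, 3))
```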


## Using SVD to create a Recommendation Engine

* Using TruncatedSVD from scikit-learn
* Calculating correlations between profiles based on the resulting matrix
* Getting recommendations for one user

In [7]:
import numpy as np
from sklearn.decomposition import TruncatedSVD

# Initialize the SVD with 12 components
SVD = TruncatedSVD(n_components=12)

# Fit the SVD on the utility matrix and project each row (profile) onto the 12 components
resultant_matrix = SVD.fit_transform(ratings)

# Pearson correlation between every pair of rows, i.e. between profiles
corr_mat = np.corrcoef(resultant_matrix)

# Show the correlation coefficients of the first 5 profiles
corr_mat[:5, :]

array([[ 1.        ,  0.62235021,  0.39730877, ...,  0.18602012,
         0.01300132, -0.52934619],
       [ 0.62235021,  1.        , -0.14316782, ...,  0.13819985,
        -0.05438472, -0.1240246 ],
       [ 0.39730877, -0.14316782,  1.        , ...,  0.05336358,
        -0.40588686, -0.68922699],
       [-0.01596799, -0.38752368,  0.33228579, ...,  0.44382775,
         0.62942183, -0.53683792],
       [-0.0662929 , -0.49130695,  0.27301689, ...,  0.30294781,
         0.58472229, -0.50517138]])

In [8]:
corr_mat.shape

(18208, 18208)
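
An 18208 x 18208 matrix of 64-bit floats takes about 2.65 GB of memory. When only one profile is of interest, the same correlations can be obtained one row at a time. The helper below is a sketch of that approach (`correlations_with_row` and `toy_reduced` are hypothetical names, and random numbers stand in for the SVD output).

```python
import numpy as np

def correlations_with_row(matrix, row):
    """Pearson correlation between matrix[row] and every row of matrix."""
    centered = matrix - matrix.mean(axis=1, keepdims=True)
    norms = np.linalg.norm(centered, axis=1)
    return centered @ centered[row] / (norms * norms[row])

rng = np.random.default_rng(0)
toy_reduced = rng.normal(size=(50, 12))   # stand-in for the SVD output

one_row = correlations_with_row(toy_reduced, 3)
full = np.corrcoef(toy_reduced)

# The single row matches the corresponding row of the full correlation matrix.
print(np.allclose(one_row, full[3]))
```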

In [14]:
# Generate a list of all user IDs (the columns of the utility matrix)
all_users = ratings.columns
all_users_list = list(all_users)



In [15]:
all_users_list

[2,
 9,
 19,
 38,
 43,
 60,
 62,
 69,
 73,
 74,
 75,
 87,
 88,
 90,
 93,
 104,
 112,
 119,
 134,
 135,
 141,
 147,
 153,
 155,
 156,
 160,
 182,
 184,
 196,
 198,
 202,
 203,
 217,
 220,
 223,
 251,
 256,
 280,
 286,
 308,
 309,
 312,
 315,
 316,
 325,
 328,
 332,
 335,
 341,
 347,
 352,
 361,
 364,
 368,
 371,
 374,
 375,
 380,
 384,
 398,
 405,
 406,
 408,
 416,
 421,
 444,
 445,
 452,
 455,
 459,
 461,
 467,
 468,
 477,
 484,
 485,
 495,
 496,
 498,
 504,
 505,
 506,
 510,
 513,
 520,
 535,
 536,
 538,
 544,
 545,
 552,
 559,
 573,
 587,
 594,
 596,
 607,
 612,
 613,
 617,
 621,
 632,
 648,
 649,
 659,
 662,
 667,
 670,
 675,
 677,
 679,
 684,
 696,
 702,
 703,
 706,
 708,
 720,
 729,
 730,
 737,
 739,
 740,
 742,
 744,
 745,
 748,
 749,
 757,
 768,
 776,
 782,
 794,
 804,
 822,
 825,
 838,
 852,
 853,
 854,
 859,
 872,
 876,
 877,
 879,
 882,
 890,
 894,
 898,
 900,
 908,
 910,
 919,
 929,
 931,
 937,
 940,
 942,
 946,
 952,
 956,
 961,
 979,
 991,
 992,
 1006,
 1009,
 1017,
 1023,
 ...]

In [19]:
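user = 2
# Recommendation for one particular ID.
# The rows of corr_mat follow the rows of the utility matrix (ProfileID),
# so the ID is looked up in ratings.index rather than in the list of columns.
corr_pers = corr_mat[ratings.index.get_loc(user)]

print("User {} is going to get along with:".format(user))
# Keep the profiles with a correlation above 0.9. The "< 1.0" test is meant to drop
# the profile itself, but rounding can leave its self-correlation just under 1.0.
print(list(ratings[(corr_pers < 1.0) & (corr_pers > 0.9)].index[:10]))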

User 2 is going to get along with:
[2, 67115, 91839, 96970, 102309, 109339, 115095, 122442, 138341, 138367]
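
The list above starts with ID 2 itself and follows the order of the matrix rows rather than the strength of the correlation. A possible refinement, sketched on a small random correlation matrix (`top_matches`, `toy_ids` and `toy_corr` are hypothetical names, not part of the notebook), drops the profile itself explicitly and returns the best matches first.

```python
import numpy as np
import pandas as pd

def top_matches(corr_mat, profile_ids, profile_id, n=10):
    """Return the n profiles most correlated with profile_id, best first."""
    ids = pd.Index(profile_ids)
    row = ids.get_loc(profile_id)
    # Label the correlations with the profile IDs, then remove the profile itself.
    scores = pd.Series(corr_mat[row], index=ids).drop(profile_id)
    return scores.nlargest(n)

rng = np.random.default_rng(42)
toy_ids = [2, 50, 55, 90, 103, 220]                 # illustrative profile IDs
toy_corr = np.corrcoef(rng.normal(size=(6, 12)))    # stand-in for corr_mat

best = top_matches(toy_corr, toy_ids, profile_id=2, n=3)
print(best)
```

With the notebook's objects, the call would presumably look like `top_matches(corr_mat, ratings.index, 2)`.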
