## 4. Recommender-System
* This file deals with obtaining recommendations for each region based on the songs already present in its rankings.
* In real-world applications of recommender systems (Spotify, Amazon, etc.) there are two kinds of recommendations made:
    * Item-based collaborative filtering - Items similar to the ones bought by the user are recommended.
    * User-based collaborative filtering - Items bought by users who have a similar purchase pattern to the target user are recommended.
* The system below attempts to make recommendations based on both the item as well as the user.
* In this case, each region is considered to be a user and the songs are the items. 
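
To make the two views concrete, here is a minimal sketch on a made-up score matrix. The region codes, song names, and scores are hypothetical and not taken from the dataset. Correlating the columns compares users (regions), while correlating the rows compares items (songs).

```python
import pandas as pd

# Hypothetical popularity scores: rows are songs (items), columns are regions (users)
scores = pd.DataFrame(
    {"aa": [9, 7, 0, 1], "bb": [8, 6, 1, 0], "cc": [0, 1, 9, 8]},
    index=["song_1", "song_2", "song_3", "song_4"],
)

# User-based view: how similarly do two regions score the same songs?
region_similarity = scores.corr()

# Item-based view: how similarly are two songs scored across regions?
song_similarity = scores.T.corr()

print(region_similarity.round(2))
print(song_similarity.round(2))
```

In this toy matrix, regions `aa` and `bb` come out strongly correlated, while `cc` is negatively correlated with both.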

### 4.1 Importing libraries


In [1]:
import pandas as pd
import numpy as np
import random

### 4.2 Importing data
* The 'borda_counts.csv' file contains a list of songs with their Borda counts (popularity scores), along with the region and the date.
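
The scoring rule used to build the file is not shown in this section. As an illustration only, the sketch below assumes the classic Borda scheme, in which a chart of length N awards N - position + 1 points. `CHART_LENGTH` and the sample rows are assumptions, not values from the dataset.

```python
import pandas as pd

# Assumed chart length; the real value depends on how the file was built
CHART_LENGTH = 200

# Hypothetical daily chart for one region
chart = pd.DataFrame({
    "Track_Name": ["track_a", "track_b", "track_c"],
    "Position": [1, 2, 200],
})

# Classic Borda scoring: the top position earns the most points, the last earns 1
chart["borda_count"] = CHART_LENGTH - chart["Position"] + 1
print(chart)
```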

In [2]:
data = pd.read_csv('borda_counts.csv')  # Load dataset

print(data.shape)
data.columns

(1644778, 5)


Index(['Track_Name', 'Position', 'Region', 'Date', 'borda_count'], dtype='object')

* Get the list of unique songs for each region.

In [3]:
regions=data.Region.unique()
region_songs=(data.groupby(['Region'])['Track_Name'].unique())

### 4.3 Aggregating the values
* The popularity scores for each track are summed over all the dates to get the total score for each track in a region.
* The higher the score, the more popular the song in the region.
* The aggregated values are sorted to get the most popular songs per region.
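
The aggregation can be checked on a small example. This is a sketch with made-up rows that reuses the column names from the file; it also shows one way to pull out the single most popular track per region.

```python
import pandas as pd

# Hypothetical daily scores for two regions (values are made up)
daily = pd.DataFrame({
    "Region": ["aa", "aa", "aa", "bb", "bb"],
    "Track_Name": ["song_1", "song_1", "song_2", "song_1", "song_2"],
    "Date": ["2018-01-01", "2018-01-02", "2018-01-01", "2018-01-01", "2018-01-01"],
    "borda_count": [200, 198, 150, 50, 120],
})

# Sum each track's daily scores within its region
totals = daily.groupby(["Region", "Track_Name"])["borda_count"].sum().reset_index()

# Order regions alphabetically and tracks from most to least popular
totals = totals.sort_values(["Region", "borda_count"], ascending=[True, False])

# Keep the most popular track of each region
top_track = totals.groupby("Region").head(1)
print(top_track)
```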

In [4]:
borda_counts_sum=pd.DataFrame(data.groupby(['Region','Track_Name'])['borda_count'].sum())
borda_counts_sum.sort_values(by=['Region','borda_count'],ascending=False,inplace=True)
borda_counts_sum.head(10)

Region,Track_Name,borda_count
us,XO TOUR Llif3,14596
us,Congratulations,13560
us,HUMBLE.,12093
us,SAD!,11613
us,Lucid Dreams,9942
us,rockstar,9822
us,God's Plan,9762
us,1-800-273-8255,9733
us,I Fall Apart,9554
us,Bank Account,9054


### 4.4 Correlation between regions
* The aggregated scores are used to find the correlation between regions, based on how they score the same tracks. A track that a region never ranked counts as a score of 0.
* A correlation matrix (Pearson, the pandas default) is obtained, which is later used to identify similar regions.
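
The reshaping step can be sketched on a toy table. The data below is hypothetical, and `pivot` is used here instead of the notebook's `unstack` only so that the column labels stay single-level.

```python
import pandas as pd

# Hypothetical aggregated scores (made-up regions, tracks, and values)
totals = pd.DataFrame({
    "Region": ["aa", "aa", "bb", "bb", "cc", "cc"],
    "Track_Name": ["song_1", "song_2", "song_1", "song_2", "song_3", "song_2"],
    "borda_count": [400, 300, 380, 310, 390, 20],
})

# One row per track, one column per region; unranked tracks score 0
wide = totals.pivot(index="Track_Name", columns="Region", values="borda_count").fillna(0)

# Pearson correlation between every pair of region columns
corr = wide.corr()
print(corr.round(2))
```

Regions that rank the same tracks highly (`aa` and `bb` here) end up with a correlation close to 1, while a region with a disjoint chart (`cc`) correlates weakly or negatively.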

In [5]:
borda_counts_agg=borda_counts_sum.unstack(0).fillna(0)
(borda_counts_agg.head(5))
# print(borda_counts_agg.shape)

Track_Name,ar,at,au,be,bg,bo,br,ca,ch,cl,...,nl,no,nz,pa,pe,ph,pl,pt,py,us
#DansLeTierquar (Lyon),0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
#DansLeTierquar (Marseille),0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
#DansLeTierquar (Nantes),0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
#FleKKsinonem,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
#JM,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,1710.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [6]:
correlation_df = borda_counts_agg.corr()
(correlation_df.head(10))

,Region,ar,at,au,be,bg,bo,br,ca,ch,cl,...,nl,no,nz,pa,pe,ph,pl,pt,py,us
borda_count,ar,1.0,0.095831,0.062599,0.121061,0.166404,0.892117,0.078602,0.072878,0.179035,0.893863,...,0.101912,0.078232,0.07001,0.832563,0.902296,0.054657,0.106816,0.216821,0.873692,0.047197
borda_count,at,0.095831,1.0,0.675116,0.762622,0.503718,0.124531,0.138145,0.592766,0.876781,0.0749,...,0.600935,0.689751,0.654965,0.164182,0.086164,0.454402,0.68605,0.618263,0.092228,0.394716
borda_count,au,0.062599,0.675116,1.0,0.794188,0.496251,0.094975,0.129008,0.831562,0.756086,0.048237,...,0.598602,0.696008,0.94677,0.140736,0.059822,0.562324,0.620396,0.729835,0.065705,0.688095
borda_count,be,0.121061,0.762622,0.794188,1.0,0.541927,0.152705,0.170617,0.724063,0.854019,0.102267,...,0.753263,0.724012,0.783277,0.200044,0.113924,0.52503,0.714664,0.736669,0.116149,0.527188
borda_count,bg,0.166404,0.503718,0.496251,0.541927,1.0,0.195295,0.179886,0.530299,0.603136,0.135811,...,0.437422,0.424242,0.503097,0.210954,0.148993,0.370037,0.566093,0.570828,0.141358,0.399004
borda_count,bo,0.892117,0.124531,0.094975,0.152705,0.195295,1.0,0.08072,0.106024,0.20989,0.908773,...,0.125629,0.103387,0.104409,0.867072,0.952428,0.084596,0.139274,0.247895,0.897894,0.071385
borda_count,br,0.078602,0.138145,0.129008,0.170617,0.179886,0.08072,1.0,0.131013,0.169036,0.065909,...,0.126763,0.120255,0.13636,0.093546,0.071595,0.110323,0.163049,0.231878,0.100009,0.089276
borda_count,ca,0.072878,0.592766,0.831562,0.724063,0.530299,0.106024,0.131013,1.0,0.722091,0.058293,...,0.534528,0.613792,0.85607,0.161847,0.071663,0.461632,0.573498,0.759162,0.076813,0.9003
borda_count,ch,0.179035,0.876781,0.756086,0.854019,0.603136,0.20989,0.169036,0.722091,1.0,0.156883,...,0.669149,0.734759,0.747439,0.255093,0.169324,0.489731,0.739038,0.761265,0.165049,0.523359
borda_count,cl,0.893863,0.0749,0.048237,0.102267,0.135811,0.908773,0.065909,0.058293,0.156883,1.0,...,0.08465,0.060284,0.055528,0.885371,0.924411,0.038788,0.088632,0.194301,0.918058,0.037653


### 4.5 Similar regions
The correlation matrix is used to identify the top 5 similar regions for each region in the list. 
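
On a correlation matrix with flat labels, the same lookup can be written as a small helper. This is a sketch: `top_k_similar` and the sample matrix are hypothetical, and the notebook's own matrix carries an extra 'borda_count' level on both axes.

```python
import pandas as pd

def top_k_similar(corr: pd.DataFrame, region: str, k: int = 5) -> list:
    """Return the k regions most correlated with `region`, excluding itself."""
    return corr[region].drop(region).nlargest(k).index.tolist()

# Hypothetical correlation matrix with single-level labels
corr = pd.DataFrame(
    [[1.0, 0.9, 0.1], [0.9, 1.0, 0.3], [0.1, 0.3, 1.0]],
    index=["aa", "bb", "cc"], columns=["aa", "bb", "cc"],
)
print(top_k_similar(corr, "aa", k=2))  # ['bb', 'cc']
```

Dropping the target region by label, rather than assuming it sorts first, keeps the result correct even when another region has a correlation of exactly 1.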

In [7]:
rows = []
for region in regions:
    # Correlation of every region with the target region, highest first
    ranked = correlation_df['borda_count'][region].sort_values(ascending=False)
    # Each index entry is a ('borda_count', region) pair; keep the region code
    # and skip the target region, which is always perfectly similar to itself
    candidates = [r for r in ranked.index.get_level_values(1) if r != region]
    rows.append([region, candidates[:5]])

# Build the frame in one step from the collected rows
similar_regions = pd.DataFrame(rows, columns=['Region', 'Top 5'])
similar_regions.set_index('Region', inplace=True)
similar_regions.head(5)


Region,Top 5
ee,"[lt, lv, global, hu, cz]"
cr,"[hn, gt, pa, bo, py]"
pt,"[global, ch, ca, lv, nz]"
gr,"[mt, bg, lu, ee, lt]"
br,"[pt, hk, global, bg, lu]"


### 4.6 Recommendations
The recommendations for a particular region are made as follows:
#### Item-Based Collaborative Filtering (Track Based)
* For every target region:
    * Get the list of songs already ranked by that region.
    * Find its 5 most popular songs.
    * For every song in the top 5:
        * Find other songs that belong to the same cluster and add them to the recommended list.
    * From the recommended list, remove songs that already appear in the target region's rankings (analogy: items the user has already bought).
    * Choose 5 songs at random from the cleaned recommended list for the final recommendations.
#### User-Based Collaborative Filtering (Region Based)
* For every target region:
    * Find the 5 regions most similar to the target region, based on correlation values.
    * For every region in the top 5:
        * Find the most popular songs of that region and add them to the recommended list.
    * From the recommended list, remove songs that already appear in the target region's rankings (analogy: items the user has already bought).
    * Choose 5 songs at random from the cleaned recommended list for the final recommendations.

* The recommendations from the item-based and user-based approaches are combined, and the final list is recommended to the user (here, the region).

* The recommendations are written to a CSV file.
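
The steps above can be sketched as a single function. This is an illustration under assumptions, not the notebook's implementation: `recommend`, its parameters, and the sample data are hypothetical, and only the column names (`Region`, `Track_Name`, `borda_count`, `Cluster`) come from the files used here.

```python
import random
import pandas as pd

def recommend(region, totals, clusters, similar, n_top=5, n_pick=5, seed=None):
    """Pick up to n_pick item-based and n_pick user-based songs for `region`.

    totals:   DataFrame with Region, Track_Name, borda_count
    clusters: DataFrame with Track_Name, Cluster
    similar:  dict mapping a region to its list of similar regions
    """
    rng = random.Random(seed)
    ranked = totals.sort_values("borda_count", ascending=False)

    def top_tracks(r):
        # Most popular tracks of region r, best first
        return ranked.loc[ranked["Region"] == r, "Track_Name"].head(n_top).tolist()

    already_ranked = set(totals.loc[totals["Region"] == region, "Track_Name"])

    # Item-based: songs sharing a cluster with the region's top tracks
    top_clusters = clusters.loc[clusters["Track_Name"].isin(top_tracks(region)), "Cluster"]
    item_pool = set(clusters.loc[clusters["Cluster"].isin(top_clusters), "Track_Name"])

    # User-based: top tracks of the most similar regions
    user_pool = {t for r in similar[region] for t in top_tracks(r)}

    picks = []
    for pool in (item_pool, user_pool):
        # Remove songs the region already ranks or that were already picked
        candidates = sorted(pool - already_ranked - set(picks))
        picks += rng.sample(candidates, k=min(n_pick, len(candidates)))
    return picks

# Hypothetical inputs
totals = pd.DataFrame({
    "Region": ["aa", "aa", "bb", "bb"],
    "Track_Name": ["song_1", "song_2", "song_2", "song_3"],
    "borda_count": [400, 300, 350, 200],
})
clusters = pd.DataFrame({
    "Track_Name": ["song_1", "song_2", "song_3", "song_4"],
    "Cluster": [0, 1, 1, 0],
})
result = recommend("aa", totals, clusters, {"aa": ["bb"]}, seed=0)
print(result)
```

Sampling without replacement (`random.sample`) avoids recommending the same song twice, and capping `k` at the pool size keeps the call from failing when fewer than `n_pick` candidates remain.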



In [8]:
track_list = pd.read_csv('track_list.csv')  # Track names with their cluster labels
rows = []

for region in regions:
    candidates = set()
    songs = region_songs[region]  # Songs the region has already ranked

    # Item-based: add every song that shares a cluster with one of the region's top 5 tracks
    top_5 = borda_counts_sum.loc[region].head(5)
    for track in top_5.index:
        cluster = track_list[track_list['Track_Name'] == track].Cluster
        if cluster.empty:  # Track is missing from track_list.csv
            continue
        sim_track_list = track_list.loc[track_list['Cluster'] == cluster.values[0]]['Track_Name']
        candidates |= set(sim_track_list)

    # User-based: add the top 5 tracks of each similar region
    for sim_region in similar_regions['Top 5'][region]:
        sim_region_songs = borda_counts_sum.loc[sim_region].head(5)
        candidates |= set(sim_region_songs.index)

    # Drop songs the region already ranks, then pick up to 10 distinct songs at random
    candidates = list(candidates - set(songs))
    recommend_songs = random.sample(candidates, k=min(10, len(candidates)))
    rows.append([region, recommend_songs])

# Build the frame in one step from the collected rows
recommendations = pd.DataFrame(rows, columns=['Region', 'Songs'])
recommendations.to_csv('Recommendations.csv')

recommendations.head(10)

,Region,Songs
0,ee,"[Devuelveme, Nei, nei, ekki um jólin, Sieben I..."
1,cr,"[Rica, Bien Duro, Silhouettes - Original Radio..."
2,pt,"[Ritmo Mexicano, Limonada Coco - Remix, Slecht..."
3,gr,"[Fear for Nobody, Yhen elämän juttu, Ai Ai Ai,..."
4,br,"[Geen Seconde Rust, Qui dit mieux (feat. Orels..."
5,es,"[Kontrollieren, What You Want, Migraine, Llama..."
6,lu,"[Juna, LA CRIMINAL, Che ne sai, Livet Er For K..."
7,ca,"[Kan Niet Kiezen, Kruunu tikittää (feat.TIPPA)..."
8,id,"[Never Let Me Go, Range, Destiny, Oh La La - T..."
9,lt,"[Mi No Lob, Brr Brr, One Life, Le Encanta, Vuo..."
