# PROJECT

**Armut is a service platform** in Turkey which meet service providers and those who want to receive service. It provides easy access to services such as cleaning, renovation and transportation. While using the data that has users and the services that these users have taken the **Recommendation System** wanted to be created with **Association Rule Learning** method.

In [53]:
import pandas as pd
import numpy as np
from mlxtend.frequent_patterns import apriori, association_rules

pd.set_option("display.max_columns",500)

## DATA

* UserId => Customer number
* ServiceId => The services for each category.
* CategoryId => The categories that represents the services in general such as cleaning, transportation.
* CreateDate => The date that service has taken.

* **NOTE!** <br>
One ServiceId can be under different categories and for different catefories it represents different services. For example: CategoryId=7 ServiceId=4 cleaning of heather also CategoryId=2 and ServiceId=4 represents setup of the furnitures.

In [2]:
armut = pd.read_csv("armut_data.csv")
armut.head()

Unnamed: 0,UserId,ServiceId,CategoryId,CreateDate
0,25446,4,5,2017-08-06 16:11:00
1,22948,48,5,2017-08-06 16:12:00
2,10618,0,8,2017-08-06 16:13:00
3,7256,9,4,2017-08-06 16:14:00
4,25446,48,5,2017-08-06 16:16:00


In [3]:
armut.shape

(162523, 4)

In [4]:
armut["UserId"].nunique()

24826

In [5]:
armut.isnull().sum()

UserId        0
ServiceId     0
CategoryId    0
CreateDate    0
dtype: int64

In [6]:
df = armut.copy()

## TASK 1: Prepare Data

Create Hizmet(Service) column to add up ServiceId and CategoryId such as 4_5 or 2_7

In [7]:
df.head()

Unnamed: 0,UserId,ServiceId,CategoryId,CreateDate
0,25446,4,5,2017-08-06 16:11:00
1,22948,48,5,2017-08-06 16:12:00
2,10618,0,8,2017-08-06 16:13:00
3,7256,9,4,2017-08-06 16:14:00
4,25446,48,5,2017-08-06 16:16:00


In [8]:
df["Hizmet"] = df["ServiceId"].apply(str)+"_"+df["CategoryId"].apply(str)
df.head()

Unnamed: 0,UserId,ServiceId,CategoryId,CreateDate,Hizmet
0,25446,4,5,2017-08-06 16:11:00,4_5
1,22948,48,5,2017-08-06 16:12:00,48_5
2,10618,0,8,2017-08-06 16:13:00,0_8
3,7256,9,4,2017-08-06 16:14:00,9_4
4,25446,48,5,2017-08-06 16:16:00,48_5


* Create New_Date which has just year and month. 
* Create SepetID (bagID) that has UserId and the New_Date

In [9]:
df["New_Date"] = pd.to_datetime(df['CreateDate']).dt.to_period('M')
df.head()

Unnamed: 0,UserId,ServiceId,CategoryId,CreateDate,Hizmet,New_Date
0,25446,4,5,2017-08-06 16:11:00,4_5,2017-08
1,22948,48,5,2017-08-06 16:12:00,48_5,2017-08
2,10618,0,8,2017-08-06 16:13:00,0_8,2017-08
3,7256,9,4,2017-08-06 16:14:00,9_4,2017-08
4,25446,48,5,2017-08-06 16:16:00,48_5,2017-08


In [10]:
df["SepetID"] = df["UserId"].apply(str)+"_"+df["New_Date"].apply(str)
df.head()

Unnamed: 0,UserId,ServiceId,CategoryId,CreateDate,Hizmet,New_Date,SepetID
0,25446,4,5,2017-08-06 16:11:00,4_5,2017-08,25446_2017-08
1,22948,48,5,2017-08-06 16:12:00,48_5,2017-08,22948_2017-08
2,10618,0,8,2017-08-06 16:13:00,0_8,2017-08,10618_2017-08
3,7256,9,4,2017-08-06 16:14:00,9_4,2017-08,7256_2017-08
4,25446,48,5,2017-08-06 16:16:00,48_5,2017-08,25446_2017-08


## TASK 2: Create Association Rules and Make A Suggestion

Create pivot table with SepetID and Hizmet

In [11]:
df_pivot = df.groupby(["SepetID","Hizmet"])["Hizmet"].count().unstack().fillna(0).applymap(lambda x: 1 if x > 0 else 0)
df_pivot.head()


Hizmet,0_8,10_9,11_11,12_7,13_11,14_7,15_1,16_8,17_5,18_4,19_6,1_4,20_5,21_5,22_0,23_10,24_10,25_0,26_7,27_7,28_4,29_0,2_0,30_2,31_6,32_4,33_4,34_6,35_11,36_1,37_0,38_4,39_10,3_5,40_8,41_3,42_1,43_2,44_0,45_6,46_4,47_7,48_5,49_1,4_5,5_11,6_7,7_3,8_5,9_4
SepetID,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1,Unnamed: 23_level_1,Unnamed: 24_level_1,Unnamed: 25_level_1,Unnamed: 26_level_1,Unnamed: 27_level_1,Unnamed: 28_level_1,Unnamed: 29_level_1,Unnamed: 30_level_1,Unnamed: 31_level_1,Unnamed: 32_level_1,Unnamed: 33_level_1,Unnamed: 34_level_1,Unnamed: 35_level_1,Unnamed: 36_level_1,Unnamed: 37_level_1,Unnamed: 38_level_1,Unnamed: 39_level_1,Unnamed: 40_level_1,Unnamed: 41_level_1,Unnamed: 42_level_1,Unnamed: 43_level_1,Unnamed: 44_level_1,Unnamed: 45_level_1,Unnamed: 46_level_1,Unnamed: 47_level_1,Unnamed: 48_level_1,Unnamed: 49_level_1,Unnamed: 50_level_1
0_2017-08,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0
0_2017-09,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0
0_2018-01,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0
0_2018-04,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0
10000_2017-08,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0


Create Association Rules

We will use **apriori** algorithm to analyze basket with products. Apriori algortihm will produce Support value for items.

In [37]:
frequent_itemsets = apriori(df_pivot,min_support=0.01,use_colnames=True)
frequent_itemsets.tail()



Unnamed: 0,support,itemsets
51,0.01112,"(25_0, 22_0)"
52,0.016568,"(2_0, 22_0)"
53,0.013437,"(25_0, 2_0)"
54,0.011191,"(2_0, 38_4)"
55,0.010067,"(38_4, 9_4)"


In [13]:
frequent_itemsets.sort_values("support",ascending=False)

Unnamed: 0,support,itemsets
8,0.238121,(18_4)
19,0.130286,(2_0)
5,0.120963,(15_1)
39,0.067762,(49_1)
28,0.066568,(38_4)
3,0.056627,(13_11)
12,0.047515,(22_0)
9,0.045563,(19_6)
15,0.042895,(25_0)
7,0.041533,(17_5)


In [16]:
rules = association_rules(frequent_itemsets,metric="support",min_threshold=0.01)
rules.head()

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
0,(2_0),(13_11),0.130286,0.056627,0.012819,0.098394,1.737574,0.005442,1.046325
1,(13_11),(2_0),0.056627,0.130286,0.012819,0.226382,1.737574,0.005442,1.124216
2,(15_1),(2_0),0.120963,0.130286,0.033951,0.280673,2.154278,0.018191,1.209066
3,(2_0),(15_1),0.130286,0.120963,0.033951,0.260588,2.154278,0.018191,1.188833
4,(15_1),(33_4),0.120963,0.02731,0.011233,0.092861,3.400299,0.007929,1.072262


* ANTECEDENTS: FIRST PRODUCT (SERVICE)
* CONSEQUENTS: SECOND PRODUCT (SERVICE)
* ANTECEDENT SUPPORT: THE PROB. THAT FIRST SERVICE IS TAKEN
* CONSEQUENTS SUPPORT: THE PROB. THAT SECOND SERVICE IS TAKEN
* SUPPORT: THE PROB. THAT FIRST AND SECOND SERVICE IS TAKEN TOGETHER.

In [54]:
def recommender_system(association_rules, selected_product, recommendation_count=2):
    sorted_rules = association_rules.sort_values("lift",ascending=False)
    recommendation_list = []
    for i, product in sorted_rules["antecedents"].items():
        for j in list(product):
            if j == selected_product:
                recommendation_list.append(list(sorted_rules.iloc[i]["consequents"]))
    # to prevent non unique values we will use dictionary
    recommendation_list = list({item for item_list in recommendation_list for item in item_list if item != selected_product})
    return recommendation_list[:recommendation_count]


In [55]:
recommender_system(rules,"2_0")

['15_1', '38_4']

## Result
We can say if a user is using number 2 Service in 0 category we can recommend thar user to take number 15 category under number 1 category and number 38 Service under number 4 category. 