<a href="https://www.kaggle.com/code/dilekdd/association-rule-based-recommender-system?scriptVersionId=197990372" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

<div style="text-align: center; font-size: 24px; font-weight: bold; color: HotPink;">
    Association Rule Based Recommender System
</div>

Armut, Turkey's largest online service platform, connects service providers with customers who need their services.

It offers easy access to services such as cleaning, renovation, and transportation with a few taps on a computer or smartphone.

The goal is to build a service recommendation system with Association Rule Learning, using a dataset of users and the services and categories they received.

The dataset contains the services received by customers and the categories of those services, together with the date and time each service was received.

| **Field**      | **Description**                                                                                                                                             |
|----------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------|
| **UserId**     | Customer number                                                                                                                                             |
| **ServiceId**  | Anonymized services for each category. (Example: Upholstery cleaning service under the cleaning category) A ServiceId can appear under different categories and represent different services in those categories. (Example: The service with CategoryId 7 and ServiceId 4 represents radiator cleaning, whereas the service with CategoryId 2 and ServiceId 4 represents furniture assembly) |
| **CategoryId** | Anonymized categories. (Example: Cleaning, transportation, renovation categories)                                                                           |
| **CreateDate** | The date the service was purchased                                                                                                                          |
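Because a ServiceId only has meaning inside its category, ServiceId alone cannot serve as the item in a basket. The minimal sketch below uses two hypothetical records that mirror the table's example (ServiceId 4 under CategoryId 7 and under CategoryId 2); the `service_key` helper is illustrative and not part of the original notebook.

```python
# Hypothetical records mirroring the example in the table above:
# the same ServiceId (4) means different services in categories 7 and 2.
records = [
    {"UserId": 1, "ServiceId": 4, "CategoryId": 7},  # radiator cleaning
    {"UserId": 2, "ServiceId": 4, "CategoryId": 2},  # furniture assembly
]


def service_key(record):
    """Build a "<ServiceId>_<CategoryId>" key (illustrative helper)."""
    return f"{record['ServiceId']}_{record['CategoryId']}"


# ServiceId alone collapses two different services into one item...
distinct_by_service_id = {r["ServiceId"] for r in records}
# ...while the combined key keeps them apart.
distinct_by_key = {service_key(r) for r in records}

print(len(distinct_by_service_id), sorted(distinct_by_key))  # 1 ['4_2', '4_7']
```

The same `ServiceId_CategoryId` convention is what the notebook stores in its `Services` column.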

```python
!pip install mlxtend

import numpy as np
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

# Show all columns on one wide line when printing DataFrames
pd.set_option('display.max_columns', None)
pd.set_option('display.width', 500)
pd.set_option('display.expand_frame_repr', False)
```

```python
df = pd.read_csv('/kaggle/input/armut-data-csv/armut_data.csv')
```

```python
df.head()
```

| | UserId | ServiceId | CategoryId | CreateDate |
|---|---|---|---|---|
| 0 | 25446 | 4 | 5 | 2017-08-06 16:11:00 |
| 1 | 22948 | 48 | 5 | 2017-08-06 16:12:00 |
| 2 | 10618 | 0 | 8 | 2017-08-06 16:13:00 |
| 3 | 7256 | 9 | 4 | 2017-08-06 16:14:00 |
| 4 | 25446 | 48 | 5 | 2017-08-06 16:16:00 |

```python
df.shape
```

```
(162523, 4)
```

```python
df.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 162523 entries, 0 to 162522
Data columns (total 4 columns):
 #   Column      Non-Null Count   Dtype 
---  ------      --------------   ----- 
 0   UserId      162523 non-null  int64 
 1   ServiceId   162523 non-null  int64 
 2   CategoryId  162523 non-null  int64 
 3   CreateDate  162523 non-null  object
dtypes: int64(3), object(1)
memory usage: 5.0+ MB
```


```python
df.duplicated().sum()
```

```
0
```

```python
df.isnull().sum()
```

```
UserId        0
ServiceId     0
CategoryId    0
CreateDate    0
dtype: int64
```

Because the same ServiceId represents a different service in each CategoryId, create a new variable that identifies each service by joining ServiceId and CategoryId with "_" (for example, `4_5` is ServiceId 4 in CategoryId 5). The expected output is:

```python
# Unique service key: "<ServiceId>_<CategoryId>"
df["Services"] = df["ServiceId"].astype(str) + "_" + df["CategoryId"].astype(str)

df.head()
```

| | UserId | ServiceId | CategoryId | CreateDate | Services |
|---|---|---|---|---|---|
| 0 | 25446 | 4 | 5 | 2017-08-06 16:11:00 | 4_5 |
| 1 | 22948 | 48 | 5 | 2017-08-06 16:12:00 | 48_5 |
| 2 | 10618 | 0 | 8 | 2017-08-06 16:13:00 | 0_8 |
| 3 | 7256 | 9 | 4 | 2017-08-06 16:14:00 | 9_4 |
| 4 | 25446 | 48 | 5 | 2017-08-06 16:16:00 | 48_5 |


The dataset records the date and time each service was received, but it has no basket definition (such as an invoice). Association Rule Learning needs baskets, so one has to be defined first. Here, a basket is the set of services a customer received in a given month.

For example, customer 7256 has one basket containing services 9_4 and 46_4, received in August 2017, and another containing 9_4 and 38_4, received in October 2017. Each basket must have a unique ID.

To build it, first create a new date variable that keeps only the year and month. Then join UserId and this new date variable with "_" and store the result in a new variable called Basket_Id. The expected output is:

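```python
# Keep only the year and month of the purchase date (e.g. 2017-08)
df["New_Date"] = pd.to_datetime(df["CreateDate"]).dt.to_period("M")

# Basket ID "<UserId>_<year-month>": one basket per customer per month
df["Basket_Id"] = df["UserId"].astype(str) + "_" + df["New_Date"].astype(str)

df.head()
```

| | UserId | ServiceId | CategoryId | CreateDate | Services | New_Date | Basket_Id |
|---|---|---|---|---|---|---|---|
| 0 | 25446 | 4 | 5 | 2017-08-06 16:11:00 | 4_5 | 2017-08 | 25446_2017-08 |
| 1 | 22948 | 48 | 5 | 2017-08-06 16:12:00 | 48_5 | 2017-08 | 22948_2017-08 |
| 2 | 10618 | 0 | 8 | 2017-08-06 16:13:00 | 0_8 | 2017-08 | 10618_2017-08 |
| 3 | 7256 | 9 | 4 | 2017-08-06 16:14:00 | 9_4 | 2017-08 | 7256_2017-08 |
| 4 | 25446 | 48 | 5 | 2017-08-06 16:16:00 | 48_5 | 2017-08 | 25446_2017-08 |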
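The monthly basket rule can be checked on a tiny example. This is a minimal stdlib sketch, assuming hypothetical purchase timestamps shaped like the customer 7256 example above; truncating each timestamp to `%Y-%m` plays the role of `to_period("M")`.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical purchases for customer 7256 (timestamps are made up),
# shaped like the "For example" paragraph above.
purchases = [
    (7256, "9_4", "2017-08-06 16:14:00"),
    (7256, "46_4", "2017-08-20 10:00:00"),
    (7256, "9_4", "2017-10-02 09:30:00"),
    (7256, "38_4", "2017-10-15 18:45:00"),
]

baskets = defaultdict(set)
for user_id, service, created in purchases:
    # Truncate the timestamp to year-month, like to_period("M")
    month = datetime.strptime(created, "%Y-%m-%d %H:%M:%S").strftime("%Y-%m")
    baskets[f"{user_id}_{month}"].add(service)

for basket_id, services in sorted(baskets.items()):
    print(basket_id, sorted(services))
# 7256_2017-08 ['46_4', '9_4']
# 7256_2017-10 ['38_4', '9_4']
```

Using a set per basket means a service bought twice in the same month counts once, which matches the 0/1 encoding used for the pivot table.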


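Create the service pivot table: one row per basket, one column per service, with 1 if the service was received in that basket and 0 otherwise.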

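```python
def create_basket_services_df(dataframe):
    """Build a basket x service matrix: 1 if the service is in the basket, else 0."""
    # Count rows per (basket, service) pair and spread services into columns;
    # services absent from a basket become NaN, then 0.
    counts = (dataframe.groupby(["Basket_Id", "Services"])["CategoryId"]
              .count()
              .unstack()
              .fillna(0))
    # Convert counts to presence flags. mlxtend's apriori recommends a boolean
    # matrix, so `counts > 0` without astype(int) would also work.
    return (counts > 0).astype(int)


basket_services_df = create_basket_services_df(df)
```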

```python
basket_services_df.head(15)
```

|Basket_Id|0_8|10_9|11_11|12_7|13_11|14_7|15_1|16_8|17_5|18_4|19_6|1_4|20_5|21_5|22_0|23_10|24_10|25_0|26_7|27_7|28_4|29_0|2_0|30_2|31_6|32_4|33_4|34_6|35_11|36_1|37_0|38_4|39_10|3_5|40_8|41_3|42_1|43_2|44_0|45_6|46_4|47_7|48_5|49_1|4_5|5_11|6_7|7_3|8_5|9_4|
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|0_2017-08|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|1|0|0|0|0|0|0|0|
|0_2017-09|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|1|0|0|0|0|0|
|0_2018-01|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|1|0|0|
|0_2018-04|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0|0|0|
|10000_2017-08|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|
|10000_2017-12|1|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
|10000_2018-03|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
|10001_2017-09|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
|10001_2018-05|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0|0|0|0|
|10001_2018-06|0|0|0|0|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
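Conceptually, the pivot table is a one-hot encoding of each basket. The stdlib sketch below rebuilds three rows of the output above from their baskets (`0_2017-08`, `0_2017-09`, `10000_2017-08`), restricted to the services that occur in them; the dict-of-sets input format is an assumption made for illustration.

```python
# Baskets read off three rows of the pivot table output above
baskets = {
    "0_2017-08": {"46_4", "48_5"},
    "0_2017-09": {"48_5", "4_5"},
    "10000_2017-08": {"46_4"},
}

# One column per service; plain string sorting puts "46_4" before "4_5",
# the same order pandas used for the column labels above.
columns = sorted(set().union(*baskets.values()))

# One row per basket: 1 if the service is in the basket, else 0
matrix = {
    basket_id: [1 if service in items else 0 for service in columns]
    for basket_id, items in sorted(baskets.items())
}

print(columns)
for basket_id, row in matrix.items():
    print(basket_id, row)
```

String sorting is also why labels such as `1_4` appear after `19_6` in the full table: `"9"` sorts before `"_"`.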


Create association rules

```python
# Apriori: find itemsets that appear in at least 1% of baskets (min_support=0.01)
frequent_itemsets = apriori(basket_services_df,
                            min_support=0.01,
                            use_colnames=True)

# Ten most frequent itemsets
frequent_itemsets.sort_values("support", ascending=False).head(10)
```

| | support | itemsets |
|---|---|---|
| 8 | 0.238121 | (18_4) |
| 19 | 0.130286 | (2_0) |
| 5 | 0.120963 | (15_1) |
| 39 | 0.067762 | (49_1) |
| 28 | 0.066568 | (38_4) |
| 3 | 0.056627 | (13_11) |
| 12 | 0.047515 | (22_0) |
| 9 | 0.045563 | (19_6) |
| 15 | 0.042895 | (25_0) |
| 7 | 0.041533 | (17_5) |
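mlxtend's `apriori` handles the search internally. To illustrate the idea behind it (count support level by level, and only extend itemsets whose subsets are all frequent), here is a simplified, hypothetical `apriori_sketch` in plain Python on made-up baskets; it is not mlxtend's implementation and is only meant for tiny inputs.

```python
from itertools import combinations


def apriori_sketch(baskets, min_support):
    """Return {itemset: support} for all itemsets with support >= min_support."""
    n = len(baskets)

    def support(itemset):
        # Fraction of baskets that contain every item of the itemset
        return sum(itemset <= basket for basket in baskets) / n

    # Level 1: frequent single items
    level = [frozenset([item]) for item in sorted(set().union(*baskets))
             if support(frozenset([item])) >= min_support]
    frequent = {itemset: support(itemset) for itemset in level}

    k = 2
    while level:
        # Join: combine frequent (k-1)-itemsets into k-item candidates
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune: drop candidates that have an infrequent (k-1)-subset
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent for s in combinations(c, k - 1))}
        # Count: keep candidates that meet min_support
        level = [c for c in candidates if support(c) >= min_support]
        frequent.update({c: support(c) for c in level})
        k += 1
    return frequent


# Hypothetical monthly baskets
toy_baskets = [
    {"2_0", "22_0"},
    {"2_0", "22_0", "25_0"},
    {"2_0"},
    {"18_4"},
]
for itemset, sup in apriori_sketch(toy_baskets, 0.5).items():
    print(sorted(itemset), sup)
```

The pruning step is what keeps Apriori tractable: with 50 services there are 1,225 possible pairs, but only pairs built from frequent single services are ever counted.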


```python
# Derive association rules from the frequent itemsets, keeping rules whose
# itemset support is at least 1%.
# Note: some newer mlxtend releases also expect a num_itemsets argument here;
# check the signature of the installed version if this call fails.
rules = association_rules(frequent_itemsets,
                          metric="support",
                          min_threshold=0.01)

# Rules with lift above 2, strongest first
rules[rules["lift"] > 2].sort_values("lift", ascending=False)
```

| | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | leverage | conviction | zhangs_metric |
|---|---|---|---|---|---|---|---|---|---|---|
| 10 | (22_0) | (25_0) | 0.047515 | 0.042895 | 0.01112 | 0.234043 | 5.456141 | 0.009082 | 1.249553 | 0.857462 |
| 11 | (25_0) | (22_0) | 0.042895 | 0.047515 | 0.01112 | 0.259247 | 5.456141 | 0.009082 | 1.285834 | 0.853324 |
| 18 | (38_4) | (9_4) | 0.066568 | 0.041393 | 0.010067 | 0.151234 | 3.653623 | 0.007312 | 1.129413 | 0.778096 |
| 19 | (9_4) | (38_4) | 0.041393 | 0.066568 | 0.010067 | 0.243216 | 3.653623 | 0.007312 | 1.233418 | 0.757661 |
| 4 | (15_1) | (33_4) | 0.120963 | 0.02731 | 0.011233 | 0.092861 | 3.400299 | 0.007929 | 1.072262 | 0.803047 |
| 5 | (33_4) | (15_1) | 0.02731 | 0.120963 | 0.011233 | 0.411311 | 3.400299 | 0.007929 | 1.493211 | 0.725728 |
| 12 | (22_0) | (2_0) | 0.047515 | 0.130286 | 0.016568 | 0.3487 | 2.676409 | 0.010378 | 1.33535 | 0.657611 |
| 13 | (2_0) | (22_0) | 0.130286 | 0.047515 | 0.016568 | 0.127169 | 2.676409 | 0.010378 | 1.09126 | 0.720197 |
| 14 | (2_0) | (25_0) | 0.130286 | 0.042895 | 0.013437 | 0.103136 | 2.404371 | 0.007849 | 1.067168 | 0.67159 |
| 15 | (25_0) | (2_0) | 0.042895 | 0.130286 | 0.013437 | 0.313257 | 2.404371 | 0.007849 | 1.266432 | 0.610268 |
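The metric columns follow from the support columns. For a rule A → C: confidence = support(A ∪ C) / support(A), lift = confidence / support(C), leverage = support(A ∪ C) − support(A) · support(C), and conviction = (1 − support(C)) / (1 − confidence). The sketch below recomputes them for the rule (22_0) → (25_0) from the rounded supports printed in the table, so the results agree with the table only to about three or four decimals.

```python
# Supports copied from the rule (22_0) -> (25_0) in the table above
antecedent_support = 0.047515   # share of baskets containing 22_0
consequent_support = 0.042895   # share of baskets containing 25_0
support = 0.01112               # share of baskets containing both

# Probability of 25_0 in a basket, given that the basket contains 22_0
confidence = support / antecedent_support
# How many times more likely 25_0 is when 22_0 is present than on average
lift = confidence / consequent_support
# Observed co-occurrence minus co-occurrence expected under independence
leverage = support - antecedent_support * consequent_support
# Ratio of expected to observed "22_0 without 25_0" outcomes
conviction = (1 - consequent_support) / (1 - confidence)

print(f"confidence={confidence:.4f} lift={lift:.3f} "
      f"leverage={leverage:.5f} conviction={conviction:.3f}")
```

Lift is symmetric in A and C, which is why each pair of mirrored rules in the table shares the same lift but has different confidence.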


Recommend a service to the user who most recently received service 2_0

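```python
# A user who received service 2_0 in the most recent month. New_Date has
# month granularity, so if several users received 2_0 that month, any one
# of them may be returned.
latest_user_with_2_0 = (df[df["Services"] == "2_0"]
                        .sort_values(by="New_Date", ascending=False)
                        .iloc[0]["UserId"])


def arl_recommender(rules_df, service, rec_count=1):
    """Return up to rec_count services recommended for `service`, by descending lift."""
    sorted_rules = rules_df.sort_values("lift", ascending=False)
    recommendation_list = []
    for antecedents, consequents in zip(sorted_rules["antecedents"],
                                        sorted_rules["consequents"]):
        # Antecedents and consequents are frozensets of service keys
        if service in antecedents:
            recommended = list(consequents)[0]
            if recommended not in recommendation_list:  # skip duplicates
                recommendation_list.append(recommended)
    return recommendation_list[:rec_count]


recommendations = arl_recommender(rules, "2_0", rec_count=1)

print(f"Recommendations for the user {latest_user_with_2_0}: {recommendations}")
```

```
Recommendations for the user 10591: ['22_0']
```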
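The same lookup can be tried without pandas or mlxtend. In this hypothetical stdlib sketch the rules are plain `(antecedents, consequents, lift)` tuples with lifts copied from the rules table above, and an optional `exclude` parameter (an assumption, not part of the original function) skips services the user has already received.

```python
# Hypothetical rules as (antecedents, consequents, lift) tuples; lifts are
# copied from the rules table above.
RULES = [
    (frozenset({"2_0"}), frozenset({"25_0"}), 2.404371),
    (frozenset({"22_0"}), frozenset({"25_0"}), 5.456141),
    (frozenset({"2_0"}), frozenset({"22_0"}), 2.676409),
]


def recommend(rules, service, rec_count=1, exclude=()):
    """Consequents of rules whose antecedents contain `service`, highest lift first."""
    recommendations = []
    for antecedents, consequents, _lift in sorted(rules, key=lambda r: r[2], reverse=True):
        if service not in antecedents:
            continue
        for item in sorted(consequents):
            # Skip duplicates and services the user already has
            if item not in recommendations and item not in exclude:
                recommendations.append(item)
    return recommendations[:rec_count]


print(recommend(RULES, "2_0", rec_count=2))                    # ['22_0', '25_0']
print(recommend(RULES, "2_0", rec_count=1, exclude={"22_0"}))  # ['25_0']
```

Excluding already-received services is a design choice: without it, a user who already bought 22_0 in the same month would simply be offered 22_0 again.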
