# Business Problem

### Armut, Turkey's largest online service platform, connects service providers with customers who need a service. It offers easy access to services such as cleaning, renovation, and transportation with a few clicks or taps on a computer or smartphone. Using a data set of the platform's users and the services and categories they have purchased, the goal is to build a product recommendation system with Association Rule Learning.

# Data set

### The data set consists of the services customers receive and the categories of these services. It contains the date and time information of each service received.

# Variables

### UserId: Customer number
### ServiceId: Anonymized services belonging to each category. (Example: Upholstery washing service under the cleaning category)
### The same ServiceId can appear under different categories, where it refers to a different service. (Example: The service with CategoryId 7 and ServiceId 4 is radiator cleaning, while the service with CategoryId 2 and ServiceId 4 is furniture assembly)
### CategoryId: Anonymized categories. (Example: Cleaning, transportation, renovation category)
### CreateDate: The date the service was purchased

# Import the libraries

In [1]:
import pandas as pd
import warnings
warnings.filterwarnings('ignore')
pd.set_option('display.max_columns', None)
from mlxtend.frequent_patterns import apriori, association_rules

# Read the dataset

In [2]:
df_ = pd.read_csv("/kaggle/input/armut-datacsv/armut_data.csv")
df = df_.copy()
df.columns = [col.lower() for col in df.columns]
df.head()

Unnamed: 0,userid,serviceid,categoryid,createdate
0,25446,4,5,2017-08-06 16:11:00
1,22948,48,5,2017-08-06 16:12:00
2,10618,0,8,2017-08-06 16:13:00
3,7256,9,4,2017-08-06 16:14:00
4,25446,48,5,2017-08-06 16:16:00


# Data Preparation

### The same ServiceId refers to a different service in each CategoryId. Combine ServiceId and CategoryId with "_" to create a new variable that uniquely identifies each service.

In [3]:
# Join serviceid and categoryid by name so the result does not depend on column order.
df['service'] = df['serviceid'].astype(str) + '_' + df['categoryid'].astype(str)
df.head()

Unnamed: 0,userid,serviceid,categoryid,createdate,service
0,25446,4,5,2017-08-06 16:11:00,4_5
1,22948,48,5,2017-08-06 16:12:00,48_5
2,10618,0,8,2017-08-06 16:13:00,0_8
3,7256,9,4,2017-08-06 16:14:00,9_4
4,25446,48,5,2017-08-06 16:16:00,48_5


### The data set records the date and time of each service but has no basket definition (such as an invoice). Association Rule Learning needs one, so here a basket is defined as all the services a customer receives in a given month. For example, the services 9_4 and 46_4 that the customer with ID 7256 received in August 2017 form one basket, while the services 9_4 and 38_4 that the same customer received in October 2017 form another.

### Each basket must have a unique ID. To build it, first create a new date variable containing only the year and month, then combine UserId and this date variable with "_" and assign the result to a new variable called box_id.

In [4]:
df['date'] = pd.to_datetime(df['createdate']).dt.to_period('M').astype('str')
# Build the basket key from named columns instead of positional access.
df['box_id'] = df['userid'].astype(str) + '_' + df['date']
df.head()

Unnamed: 0,userid,serviceid,categoryid,createdate,service,date,box_id
0,25446,4,5,2017-08-06 16:11:00,4_5,2017-08,25446_2017-08
1,22948,48,5,2017-08-06 16:12:00,48_5,2017-08,22948_2017-08
2,10618,0,8,2017-08-06 16:13:00,0_8,2017-08,10618_2017-08
3,7256,9,4,2017-08-06 16:14:00,9_4,2017-08,7256_2017-08
4,25446,48,5,2017-08-06 16:16:00,48_5,2017-08,25446_2017-08
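
### The monthly basket definition can be checked on a small example. The sketch below rebuilds the two baskets of customer 7256 described above; the `toy` frame is hypothetical and its exact timestamps are made up, since only the months are known from the text.

```python
import pandas as pd

# Toy rows mirroring the example in the text; the timestamps are invented.
toy = pd.DataFrame({
    "userid": [7256, 7256, 7256, 7256],
    "service": ["9_4", "46_4", "9_4", "38_4"],
    "createdate": ["2017-08-06 16:14:00", "2017-08-20 10:00:00",
                   "2017-10-03 09:30:00", "2017-10-15 18:45:00"],
})

# Year-month string, then user + month as the basket key.
toy["date"] = pd.to_datetime(toy["createdate"]).dt.to_period("M").astype(str)
toy["box_id"] = toy["userid"].astype(str) + "_" + toy["date"]

# Collect the services that fall into each basket.
baskets = toy.groupby("box_id")["service"].apply(list).to_dict()
print(baskets)
# {'7256_2017-08': ['9_4', '46_4'], '7256_2017-10': ['9_4', '38_4']}
```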


# Association Rules

### Create Pivot Table

In [5]:
# One row per basket, one column per service; 1 if the basket contains the service, else 0.
df_service = df.groupby(['box_id', 'service'])['service'].count().unstack().fillna(0).gt(0).astype(int)
df_service.head()

box_id,0_8,10_9,11_11,12_7,13_11,14_7,15_1,16_8,17_5,18_4,19_6,1_4,20_5,21_5,22_0,23_10,24_10,25_0,26_7,27_7,28_4,29_0,2_0,30_2,31_6,32_4,33_4,34_6,35_11,36_1,37_0,38_4,39_10,3_5,40_8,41_3,42_1,43_2,44_0,45_6,46_4,47_7,48_5,49_1,4_5,5_11,6_7,7_3,8_5,9_4
0_2017-08,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0
0_2017-09,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0
0_2018-01,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0
0_2018-04,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0
10000_2017-08,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0
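
### The same basket-by-service matrix could also be built with `pd.crosstab`. This is only a sketch on a hypothetical `toy` frame, not the approach used in this notebook; it returns booleans rather than 0/1 and ignores repeated purchases inside a basket.

```python
import pandas as pd

# Hypothetical long-format data: one row per purchased service.
toy = pd.DataFrame({
    "box_id": ["7256_2017-08", "7256_2017-08",
               "7256_2017-10", "7256_2017-10", "7256_2017-10"],
    "service": ["9_4", "46_4", "9_4", "38_4", "9_4"],
})

# Count purchases per (basket, service), then keep only presence/absence.
basket_matrix = pd.crosstab(toy["box_id"], toy["service"]) > 0
print(basket_matrix)
```

### Even though 9_4 appears twice in the October basket, it is flagged once, which is what the frequent-itemset step needs.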


### Create association rules

In [6]:
frequent_itemsets = apriori(df_service, min_support=0.01, use_colnames=True)
frequent_itemsets.sort_values('support', ascending=False)



Unnamed: 0,support,itemsets
8,0.238121,(18_4)
19,0.130286,(2_0)
5,0.120963,(15_1)
39,0.067762,(49_1)
28,0.066568,(38_4)
3,0.056627,(13_11)
12,0.047515,(22_0)
9,0.045563,(19_6)
15,0.042895,(25_0)
7,0.041533,(17_5)


In [7]:
rules = association_rules(frequent_itemsets, metric='support', min_threshold=0.01)
rules.sort_values('lift', ascending=False)

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
10,(22_0),(25_0),0.047515,0.042895,0.01112,0.234043,5.456141,0.009082,1.249553
11,(25_0),(22_0),0.042895,0.047515,0.01112,0.259247,5.456141,0.009082,1.285834
19,(38_4),(9_4),0.066568,0.041393,0.010067,0.151234,3.653623,0.007312,1.129413
18,(9_4),(38_4),0.041393,0.066568,0.010067,0.243216,3.653623,0.007312,1.233418
5,(15_1),(33_4),0.120963,0.02731,0.011233,0.092861,3.400299,0.007929,1.072262
4,(33_4),(15_1),0.02731,0.120963,0.011233,0.411311,3.400299,0.007929,1.493211
13,(2_0),(22_0),0.130286,0.047515,0.016568,0.127169,2.676409,0.010378,1.09126
12,(22_0),(2_0),0.047515,0.130286,0.016568,0.3487,2.676409,0.010378,1.33535
15,(25_0),(2_0),0.042895,0.130286,0.013437,0.313257,2.404371,0.007849,1.266432
14,(2_0),(25_0),0.130286,0.042895,0.013437,0.103136,2.404371,0.007849,1.067168
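
### The metric columns can be reproduced from the three support values of a rule. The sketch below recomputes the first row (22_0 -> 25_0) using the standard definitions of confidence, lift, leverage, and conviction; because the inputs are the rounded values displayed above, the results match the table only to about three decimal places.

```python
# Values copied from the first row of the rules table (rule 22_0 -> 25_0).
antecedent_support = 0.047515   # support(22_0)
consequent_support = 0.042895   # support(25_0)
support = 0.01112               # support(22_0 and 25_0), as displayed (rounded)

# confidence: how often 25_0 appears in baskets that contain 22_0
confidence = support / antecedent_support
# lift: confidence relative to the baseline frequency of 25_0
lift = confidence / consequent_support
# leverage: observed co-occurrence minus what independence would predict
leverage = support - antecedent_support * consequent_support
# conviction: how much more often the rule would fail if the items were independent
conviction = (1 - consequent_support) / (1 - confidence)

print(round(confidence, 4), round(lift, 3), round(leverage, 6), round(conviction, 4))
```

### A lift of roughly 5.46 means that 25_0 shows up in baskets containing 22_0 about five times more often than it does across all baskets.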


### Using the arl_recommender function, recommend a service to a user who last received the 2_0 service.

In [8]:
def arl_recommender(rules_df, product_id, rec_count=1):
    # Rank rules by lift so the strongest associations come first.
    sorted_rules = rules_df.sort_values('lift', ascending=False)
    recommendation_list = []
    for i, product in enumerate(sorted_rules['antecedents']):
        # Keep the first consequent of every rule whose antecedent contains product_id.
        if product_id in product:
            recommendation_list.append(list(sorted_rules.iloc[i]['consequents'])[0])

    return recommendation_list[:rec_count]

In [9]:
arl_recommender(rules, '2_0', 1)

['22_0']

In [10]:
arl_recommender(rules, '2_0', 2)

['22_0', '25_0']

In [11]:
arl_recommender(rules, '2_0', 3)

['22_0', '25_0', '15_1']
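
### The recommender above can return the same service more than once when several rules share a consequent. One possible variant, sketched here under the assumption that repeats are unwanted, skips duplicates and the queried service itself. The name `recommend_unique` and the `toy_rules` frame are hypothetical; the frame only mimics the `antecedents`, `consequents`, and `lift` columns of the rules table, with invented lift values.

```python
import pandas as pd

def recommend_unique(rules_df, product_id, rec_count=1):
    """Hypothetical variant: top consequents by lift, without repeats."""
    # Keep only the rules whose antecedent contains the queried service.
    mask = rules_df["antecedents"].apply(lambda items: product_id in items)
    matches = rules_df[mask].sort_values("lift", ascending=False)

    recommendations = []
    for consequents in matches["consequents"]:
        for item in sorted(consequents):
            # Skip the queried service and anything already recommended.
            if item != product_id and item not in recommendations:
                recommendations.append(item)
    return recommendations[:rec_count]

# Toy rules with made-up lift values, shaped like the rules table.
toy_rules = pd.DataFrame({
    "antecedents": [frozenset({"2_0"}), frozenset({"2_0"}),
                    frozenset({"2_0", "15_1"}), frozenset({"22_0"})],
    "consequents": [frozenset({"22_0"}), frozenset({"25_0"}),
                    frozenset({"22_0"}), frozenset({"2_0"})],
    "lift": [2.68, 2.40, 3.10, 2.68],
})

print(recommend_unique(toy_rules, "2_0", 3))
# ['22_0', '25_0']
```

### Only two services are returned even though three were requested, because 22_0 is the consequent of two matching rules and is counted once.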

# Thank you very much for checking my notebook!