<a href="https://www.kaggle.com/code/handandegerli/armut-arl-recommender-system?scriptVersionId=182765011" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

### Building a Product Recommendation System with Association Rule Learning

---

**Business Problem:** Armut, Turkey's largest online service platform, connects service providers with those seeking services. With just a few taps on a computer or smartphone, users can easily access services like cleaning, renovation, and transportation. 

Using a dataset of users, the services they received, and the categories of those services, the goal is to build a product recommendation system based on Association Rule Learning.

**Variables:**

**UserId:** Customer number\
**ServiceId:** Anonymized services belonging to each category (example: sofa washing under the cleaning category). The same ServiceId can appear under different categories, where it refers to a different service (example: CategoryId 7 with ServiceId 4 is radiator cleaning, while CategoryId 2 with ServiceId 4 is furniture assembly).\
**CategoryId:** Anonymized categories (example: cleaning, transportation, renovation)\
**CreateDate:** Date the service was purchased

**Install Packages**

In [1]:
!pip install openpyxl



**Import Libraries**

In [2]:
import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.width', 500)
pd.set_option('display.expand_frame_repr', False)
from mlxtend.frequent_patterns import apriori, association_rules

### Task 1: Data Preparation
---
**Step 1:** Read the armut_data.csv file.\
**Step 2:** ServiceId represents a different service for each CategoryId. Create a new variable that identifies each service by combining ServiceId and CategoryId with "_".\
**Step 3:** The dataset records only the date and time each service was received; there is no basket definition (invoice, etc.). To apply Association Rule Learning, a basket must be defined. Here, a basket is the set of services a customer receives in a given month. For example, the services 9_4 and 46_4 received by the customer with ID 7256 in the 8th month of 2017 form one basket, and the services 9_4 and 38_4 received by the same customer in the 10th month of 2017 form another. Each basket must be identified by a unique ID. To do this, first create a new date variable that contains only the year and month. Then combine UserId and the new date variable with "_" and assign the result to a new variable named ID (called BasketId in the code below).

In [3]:
df_ = pd.read_csv("/kaggle/input/armut-dataset-for-arl-recommender-system/armut_data.csv")
df = df_.copy()
df.head()

   UserId  ServiceId  CategoryId           CreateDate
0   25446          4           5  2017-08-06 16:11:00
1   22948         48           5  2017-08-06 16:12:00
2   10618          0           8  2017-08-06 16:13:00
3    7256          9           4  2017-08-06 16:14:00
4   25446         48           5  2017-08-06 16:16:00


In [4]:
def check_df(dataframe, head=5, tail=5, quan=False):
    print('########## Shape ##########')
    print(dataframe.shape)
    print('########## Types ##########')
    print(dataframe.dtypes)
    print('########## Head ##########')
    print(dataframe.head(head))
    print('########## Tail ##########')
    print(dataframe.tail(tail))
    print('########## NA ##########')
    print(dataframe.isnull().sum())
    if quan:
        print('########## Quantiles ##########')
        print(dataframe.describe([0, 0.01, 0.25, 0.50, 0.75, 0.99, 1]).T)

check_df(df, quan=True)

########## Shape ##########
(162523, 4)
########## Types ##########
UserId         int64
ServiceId      int64
CategoryId     int64
CreateDate    object
dtype: object
########## Head ##########
   UserId  ServiceId  CategoryId           CreateDate
0   25446          4           5  2017-08-06 16:11:00
1   22948         48           5  2017-08-06 16:12:00
2   10618          0           8  2017-08-06 16:13:00
3    7256          9           4  2017-08-06 16:14:00
4   25446         48           5  2017-08-06 16:16:00
########## Tail ##########
        UserId  ServiceId  CategoryId           CreateDate
162518   10591         25           0  2018-08-06 14:40:00
162519   10591          2           0  2018-08-06 14:43:00
162520   10591         31           6  2018-08-06 14:47:00
162521   12666         38           4  2018-08-06 16:01:00
162522   17497         47           7  2018-08-06 16:04:00
########## NA ##########
UserId        0
ServiceId     0
CategoryId    0
CreateDate    0
dtype: int64


In [5]:
df['Service'] = df['ServiceId'].astype(str) + '_' + df['CategoryId'].astype(str)
df.head()

   UserId  ServiceId  CategoryId           CreateDate  Service
0   25446          4           5  2017-08-06 16:11:00      4_5
1   22948         48           5  2017-08-06 16:12:00     48_5
2   10618          0           8  2017-08-06 16:13:00      0_8
3    7256          9           4  2017-08-06 16:14:00      9_4
4   25446         48           5  2017-08-06 16:16:00     48_5


In [6]:
df['CreateDate'] = pd.to_datetime(df['CreateDate'])

In [7]:
# CreateDate is already datetime, so only the year-month period needs to be extracted.
df['New_Date'] = df['CreateDate'].dt.to_period('M')
df.head()

   UserId  ServiceId  CategoryId           CreateDate  Service  New_Date
0   25446          4           5  2017-08-06 16:11:00      4_5   2017-08
1   22948         48           5  2017-08-06 16:12:00     48_5   2017-08
2   10618          0           8  2017-08-06 16:13:00      0_8   2017-08
3    7256          9           4  2017-08-06 16:14:00      9_4   2017-08
4   25446         48           5  2017-08-06 16:16:00     48_5   2017-08


In [8]:
df['BasketId'] = df['UserId'].astype(str) + '_' + df['New_Date'].astype(str)
df.head()

   UserId  ServiceId  CategoryId           CreateDate  Service  New_Date       BasketId
0   25446          4           5  2017-08-06 16:11:00      4_5   2017-08  25446_2017-08
1   22948         48           5  2017-08-06 16:12:00     48_5   2017-08  22948_2017-08
2   10618          0           8  2017-08-06 16:13:00      0_8   2017-08  10618_2017-08
3    7256          9           4  2017-08-06 16:14:00      9_4   2017-08   7256_2017-08
4   25446         48           5  2017-08-06 16:16:00     48_5   2017-08  25446_2017-08
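
The basket definition from Step 3 can be checked on a toy frame. The sketch below rebuilds the customer 7256 example from the task description; only the first timestamp comes from the data shown above, the other three are made up for illustration, and the name `toy` is hypothetical.

```python
import pandas as pd

# Toy rows modeled on the customer 7256 example.
# Only the first timestamp appears in the real data; the rest are illustrative.
toy = pd.DataFrame({
    "UserId": [7256, 7256, 7256, 7256],
    "ServiceId": [9, 46, 9, 38],
    "CategoryId": [4, 4, 4, 4],
    "CreateDate": pd.to_datetime([
        "2017-08-06 16:14:00", "2017-08-20 10:00:00",
        "2017-10-03 09:30:00", "2017-10-15 18:45:00",
    ]),
})

# Same three derived columns as in the cells above.
toy["Service"] = toy["ServiceId"].astype(str) + "_" + toy["CategoryId"].astype(str)
toy["New_Date"] = toy["CreateDate"].dt.to_period("M")
toy["BasketId"] = toy["UserId"].astype(str) + "_" + toy["New_Date"].astype(str)

# Collect the services that fall into each monthly basket.
baskets = toy.groupby("BasketId")["Service"].apply(list).to_dict()
print(baskets)
```

The same customer ends up with two separate baskets, one per month, which is the unit the association rules are later mined over.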


### Task 2: Create Association Rules and Make Suggestions
---

**Step 1:** Create the service pivot table.\
**Step 2:** Create association rules.\
**Step 3:** Use the arl_recommender function to recommend services to a user who most recently purchased the 2_0 service.

In [9]:
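def create_service_product_df(dataframe):
    # Count purchases per (basket, service), pivot services into columns, and binarize (1 = present).
    # DataFrame.map requires pandas >= 2.1; on older versions use applymap.
    counts = dataframe.groupby(['BasketId', 'Service'])['CategoryId'].count().unstack().fillna(0)
    return counts.map(lambda x: 1 if x > 0 else 0)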

service_product_df = create_service_product_df(df)
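
To see what the basket-by-service matrix looks like, here is a minimal sketch on two toy baskets taken from the customer 7256 example. It uses `pd.crosstab` rather than the groupby/unstack chain above, which is a swapped-in but equivalent way to get the counts; the names `toy` and `matrix` are hypothetical.

```python
import pandas as pd

# Two toy baskets (one row per purchased service), based on the 7256 example.
toy = pd.DataFrame({
    "BasketId": ["7256_2017-08", "7256_2017-08", "7256_2017-10", "7256_2017-10"],
    "Service": ["9_4", "46_4", "9_4", "38_4"],
})

# Count occurrences per (basket, service), then binarize to 0/1.
matrix = pd.crosstab(toy["BasketId"], toy["Service"]).gt(0).astype(int)
print(matrix)
```

Each row is a basket and each column a service, with 1 marking that the service occurs in the basket. This is the input shape that `apriori` expects.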

In [10]:
def create_rules(dataframe):
    # Keep itemsets that appear in at least 1% of baskets.
    frequent_itemsets = apriori(dataframe.astype('bool'), min_support=0.01, use_colnames=True)
    # Derive rules from those itemsets, keeping rules with support >= 0.01.
    rules = association_rules(frequent_itemsets, metric="support", min_threshold=0.01)
    return rules

rules = create_rules(service_product_df)
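
The rules table carries support, confidence, and lift columns, and the recommender ranks by lift. As a rough illustration of what these numbers mean, the sketch below computes them by hand in plain Python. The five baskets are invented for this example and are not taken from the Armut data; the helper `support` is hypothetical.

```python
# Invented toy baskets, for illustration only.
baskets = [
    {"2_0", "22_0"},
    {"2_0", "22_0", "15_1"},
    {"2_0"},
    {"22_0"},
    {"15_1"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    return sum(itemset <= basket for basket in baskets) / len(baskets)

antecedent, consequent = {"2_0"}, {"22_0"}

sup_a = support(antecedent, baskets)                # P(A)
sup_c = support(consequent, baskets)                # P(C)
sup_ac = support(antecedent | consequent, baskets)  # P(A and C)

confidence = sup_ac / sup_a   # P(C | A)
lift = confidence / sup_c     # how much more likely C is given A than on its own

print(f"support={sup_ac:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

A lift above 1 suggests the two services co-occur more often than independence would predict, which is why sorting by lift is a reasonable way to rank candidate recommendations.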

In [11]:
def arl_recommender(rules_df, service, rec_count=1):
    # Rank rules by lift so the strongest associations come first.
    sorted_rules = rules_df.sort_values("lift", ascending=False)
    recommendation_list = []
    for antecedents, consequents in zip(sorted_rules["antecedents"], sorted_rules["consequents"]):
        # Keep the rule if the given service is among its antecedents.
        if service in antecedents:
            recommendation_list.append(list(consequents)[0])

    recommendations = recommendation_list[:rec_count]
    print("You might also need this service -->", recommendations)
    return recommendations

recommendations = arl_recommender(rules, '2_0', 3)

You might also need this service --> ['22_0', '25_0', '15_1']
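
Because the same consequent can appear in several rules, the list built by arl_recommender may contain repeated services. The following is a minimal sketch of a deduplicating variant, assuming a rules table with the same `antecedents`, `consequents`, and `lift` columns. The function `recommend_unique` and the `toy_rules` values are hypothetical and not part of the original notebook.

```python
import pandas as pd

# Hypothetical toy rules; real values would come from association_rules().
toy_rules = pd.DataFrame({
    "antecedents": [frozenset({"2_0"}), frozenset({"2_0", "15_1"}),
                    frozenset({"2_0"}), frozenset({"9_4"})],
    "consequents": [frozenset({"22_0"}), frozenset({"22_0"}),
                    frozenset({"25_0"}), frozenset({"38_4"})],
    "lift": [3.0, 2.5, 2.0, 4.0],
})

def recommend_unique(rules_df, service, rec_count=1):
    """Return up to rec_count distinct consequent services, highest lift first."""
    sorted_rules = rules_df.sort_values("lift", ascending=False)
    seen = []
    for antecedents, consequents in zip(sorted_rules["antecedents"],
                                        sorted_rules["consequents"]):
        if service not in antecedents:
            continue
        # Sort so multi-item consequents are handled in a stable order.
        for item in sorted(consequents):
            if item not in seen:
                seen.append(item)
    return seen[:rec_count]

print(recommend_unique(toy_rules, "2_0", 3))
```

Here only two distinct services are returned even though three were requested, since two of the matching rules point to the same consequent.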
