# Association Rules

Association rule mining looks for rules that describe which items tend to occur together. In our case, the items are points of interest (or their categories), and we look for rules about which of them are usually visited together.

In [1]:
import pandas as pd
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules
import numpy as np

### Data

In [2]:
# Forward slashes avoid accidental escape sequences (e.g. '\w', '\p') and also work on Windows
walkins = pd.read_csv('Aueb Dmst Assignment Data/walkins.csv', dtype = str)
users = pd.read_csv('Aueb Dmst Assignment Data/users.csv', dtype = str)
pois = pd.read_csv('Aueb Dmst Assignment Data/pois.csv', dtype = str)
categories = pd.read_csv('Aueb Dmst Assignment Data/categories.csv', dtype = str)

In [3]:
categories.rename(columns= {'id': 'category_id'}, inplace = True)
pois_categories = pd.merge(pois[['id', 'category_id']],
                           categories,
                           how = 'left',
                           on = 'category_id')

In [4]:
pois_categories.rename(columns= {'id': 'poi_id'}, inplace = True)
visits = pd.merge(walkins,
                  pois_categories[['poi_id', 'title']],
                  how = 'left',
                  on = 'poi_id')

In [5]:
user_cons = pd.read_excel('user_cons.xlsx', dtype = str)

### Analysis

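* **Confidence**: for a rule X → Y, the share of transactions containing X that also contain Y, i.e. an estimate of P(Y | X).
* **Lift**: the confidence of the rule divided by the support of Y (the share of all transactions that contain Y). A lift above 1 means X and Y occur together more often than they would if independent, a lift close to 1 means no association, and a lift below 1 means they occur together less often. It indicates whether a rule is representative enough of the data to be used in the decision-making process.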
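
To make the two definitions concrete, here is a minimal sketch that computes support, confidence and lift by hand. The baskets and the helper names `support` and `rule_metrics` are hypothetical and only serve the illustration; they are not part of the dataset or of `mlxtend`.

```python
# Toy baskets: each set is one transaction (hypothetical data, not from the dataset).
transactions = [
    {"Cafe", "Park"},
    {"Cafe", "Park", "Museum"},
    {"Cafe"},
    {"Park"},
    {"Museum"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def rule_metrics(antecedent, consequent, baskets):
    """Support, confidence and lift of the rule antecedent -> consequent."""
    s_a = support(antecedent, baskets)
    s_c = support(consequent, baskets)
    s_ac = support(set(antecedent) | set(consequent), baskets)
    confidence = s_ac / s_a      # estimate of P(consequent | antecedent)
    lift = confidence / s_c      # above 1: seen together more often than under independence
    return {"support": s_ac, "confidence": confidence, "lift": lift}


metrics = rule_metrics({"Cafe"}, {"Park"}, transactions)
print(metrics)
```

In this toy data "Cafe" and "Park" each appear in 3 of 5 baskets and together in 2, so the rule Cafe → Park has confidence 2/3 and a lift slightly above 1.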

#### Analysis 1: Per Day & Per User between Categories of Points of Interest

We try to find if there is an association between the categories of points of interest a user visits during the same day.

In [6]:
visits['code'] = visits['user_id'] + visits['created']
# rename() returns a new frame, which avoids pandas' SettingWithCopyWarning
data = visits[['code', 'title', 'id']].rename(columns= {'id': 'sum'})

In [7]:
data = data.groupby(by=['code', 'title']).count().reset_index()
data['sum'] = 1

In [8]:
perday = data.groupby(['code', 'title'])['sum'].sum().unstack().reset_index().fillna(0).set_index('code')
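
The cells above turn the visit log into a one-hot table with one row per user-day and one column per category. As a sketch of the same idea on toy data, `pd.crosstab` can build that table in one step and return boolean values, which is the one-hot format (0/1 or True/False) that `mlxtend`'s `apriori` documents as input. The frame `toy_visits` and its values are hypothetical.

```python
import pandas as pd

# Hypothetical visit log with the same column names as `visits` (toy values).
toy_visits = pd.DataFrame({
    "code":  ["u1d1", "u1d1", "u1d1", "u2d1", "u2d1", "u3d1"],
    "title": ["Cafe", "Cafe", "Park", "Cafe", "Museum", "Park"],
})

# Count visits per (basket, category), then keep only presence/absence.
basket = pd.crosstab(toy_visits["code"], toy_visits["title"]) > 0
print(basket)
```

Note that the two "Cafe" visits of `u1d1` collapse into a single `True`, exactly as `data['sum'] = 1` does in the notebook.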

In [9]:
frequent_itemsets = apriori(perday, min_support=0.01, use_colnames=True)
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=0.8)
rules.sort_values(by = 'confidence', ascending = False, inplace = True)

In [10]:
rules.head(15)

| | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | leverage | conviction |
|---|---|---|---|---|---|---|---|---|---|
| 31 | (Pet Store) | (Supermarket) | 0.03603 | 0.397681 | 0.014168 | 0.393214 | 0.988767 | -0.000161 | 0.992638 |
| 3 | (Bakery) | (Supermarket) | 0.052697 | 0.397681 | 0.019597 | 0.371887 | 0.935139 | -0.001359 | 0.958934 |
| 17 | (Electronics Store) | (Supermarket) | 0.04662 | 0.397681 | 0.017296 | 0.370999 | 0.932906 | -0.001244 | 0.957581 |
| 43 | (Women's Store) | (Supermarket) | 0.092467 | 0.397681 | 0.032291 | 0.349213 | 0.878123 | -0.004482 | 0.925524 |
| 33 | (Shoe Store) | (Supermarket) | 0.043042 | 0.397681 | 0.014851 | 0.345029 | 0.867604 | -0.002266 | 0.919613 |
| 19 | (Gym) | (Supermarket) | 0.044121 | 0.397681 | 0.014941 | 0.338631 | 0.851514 | -0.002605 | 0.910716 |
| 21 | (Home Store) | (Supermarket) | 0.066613 | 0.397681 | 0.02242 | 0.336572 | 0.846338 | -0.004071 | 0.90789 |
| 11 | (Clothing Store) | (Supermarket) | 0.072186 | 0.397681 | 0.024218 | 0.335492 | 0.843621 | -0.004489 | 0.906414 |
| 29 | (Museum) | (Supermarket) | 0.032327 | 0.397681 | 0.010662 | 0.329811 | 0.829336 | -0.002194 | 0.89873 |
| 27 | (Metro Station) | (Supermarket) | 0.104585 | 0.397681 | 0.034268 | 0.32766 | 0.823928 | -0.007323 | 0.895856 |


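**Not a good result. The associations mostly involve the categories 'Supermarket', 'Metro Station' and 'Square', and every rule shown has a lift below 1, so none of these pairs occurs more often than independence would predict.**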
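
As a sanity check on how to read the table, the sketch below recomputes the derived metrics of the (Pet Store) → (Supermarket) row from its three support values, using the standard definitions of confidence, lift, leverage and conviction. The inputs are the rounded values printed above, so the results agree with the table only up to rounding.

```python
# Values copied from the (Pet Store) -> (Supermarket) row of the table above.
antecedent_support = 0.03603
consequent_support = 0.397681
support = 0.014168

confidence = support / antecedent_support                      # P(consequent | antecedent)
lift = confidence / consequent_support                         # 1.0 would mean independence
leverage = support - antecedent_support * consequent_support   # 0.0 would mean independence
conviction = (1 - consequent_support) / (1 - confidence)       # 1.0 would mean independence

print(round(confidence, 4), round(lift, 4), round(leverage, 6), round(conviction, 4))
```

Lift below 1, negative leverage and conviction below 1 all say the same thing: visiting a pet store does not make a supermarket visit on the same day more likely than it already is.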

#### Analysis 2: Per User between Categories of Points of Interest

We try to find if there is an association between the categories of points of interest a user visits.

In [11]:
# rename() returns a new frame, which avoids pandas' SettingWithCopyWarning
data = visits[['user_id', 'title', 'id']].rename(columns= {'id': 'sum'})
data = data.groupby(by=['user_id', 'title']).count().reset_index()
data['sum'] = 1
perday = data.groupby(['user_id', 'title'])['sum'].sum().unstack().reset_index().fillna(0).set_index('user_id')


In [12]:
frequent_itemsets = apriori(perday, min_support=0.7, use_colnames=True)
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1)
rules.sort_values(by = 'confidence', ascending = False, inplace = True)

In [13]:
rules.head(15)

| | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | leverage | conviction |
|---|---|---|---|---|---|---|---|---|---|
| 1769 | (Metro Station, Square, Cafe, Women's Store) | (Supermarket) | 0.704478 | 0.858706 | 0.704478 | 1.0 | 1.164542 | 0.099538 | inf |
| 362 | (Cultural Center, Clothing Store) | (Supermarket) | 0.701493 | 0.858706 | 0.701493 | 1.0 | 1.164542 | 0.099116 | inf |
| 1353 | (Metro Station, Square, Women's Store) | (Supermarket) | 0.705473 | 0.858706 | 0.705473 | 1.0 | 1.164542 | 0.099679 | inf |
| 1101 | (Park, Women's Store, Cafe) | (Supermarket) | 0.700498 | 0.858706 | 0.700498 | 1.0 | 1.164542 | 0.098976 | inf |
| 1889 | (Metro Station, Square, Beach, Shopping Mall, ... | (Park) | 0.700498 | 0.812935 | 0.700498 | 1.0 | 1.23011 | 0.131038 | inf |
| 611 | (Metro Station, Cafe, Bakery) | (Supermarket) | 0.703483 | 0.858706 | 0.703483 | 1.0 | 1.164542 | 0.099398 | inf |
| 158 | (Cultural Center, Beach) | (Park) | 0.769154 | 0.812935 | 0.769154 | 1.0 | 1.23011 | 0.143882 | inf |
| 1893 | (Metro Station, Beach, Shopping Mall, Park, Cu... | (Square) | 0.700498 | 0.800995 | 0.700498 | 1.0 | 1.248447 | 0.139402 | inf |
| 1800 | (Park, Square, Cafe, Women's Store) | (Supermarket) | 0.700498 | 0.858706 | 0.700498 | 1.0 | 1.164542 | 0.098976 | inf |
| 1905 | (Metro Station, Cultural Center, Beach, Shoppi... | (Park, Square) | 0.700498 | 0.768159 | 0.700498 | 1.0 | 1.301813 | 0.162404 | inf |


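**Too many results to interpret in a useful way. With `min_support=0.7` only categories visited by most users survive, so the consequents shown are visited by roughly 77% to 86% of users: confidence reaches 1.0 while lift stays between about 1.16 and 1.30.**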
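
The small lifts and the `inf` convictions in this table follow directly from the definitions. The sketch below uses the (Supermarket) consequent support from the table; treating a zero denominator as infinity is an assumption made here to mirror the `inf` shown in the output.

```python
import math

consequent_support = 0.858706   # (Supermarket), copied from the table above
confidence = 1.0                # every rule shown above has confidence 1.0

# Lift is confidence / consequent support, so it can never exceed 1 / consequent support.
lift = confidence / consequent_support
max_lift = 1 / consequent_support

# Conviction divides by (1 - confidence); with confidence 1.0 the denominator is zero.
denominator = 1 - confidence
conviction = math.inf if denominator == 0 else (1 - consequent_support) / denominator

print(round(lift, 6), round(max_lift, 6), conviction)
```

In other words, the rules ending in (Supermarket) already sit at the highest lift that consequent allows, and that ceiling is low because almost every user visits a supermarket at some point.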

#### Analysis 3: Per Day & Per User between Points of Interest

We try to find if there is an association between the points of interest a user visits during the same day.

In [14]:
visits['code'] = visits['user_id'] + visits['created']
# rename() returns a new frame, which avoids pandas' SettingWithCopyWarning
data = visits[['code', 'poi_id', 'id']].rename(columns= {'id': 'sum'})
data = data.groupby(by=['code', 'poi_id']).count().reset_index()
data['sum'] = 1
perday = data.groupby(['code', 'poi_id'])['sum'].sum().unstack().reset_index().fillna(0).set_index('code')


In [15]:
frequent_itemsets = apriori(perday, min_support=0.01, use_colnames=True)
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=0.01)
rules.sort_values(by = 'confidence', ascending = False, inplace = True)

In [16]:
rules

| | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | leverage | conviction |
|---|---|---|---|---|---|---|---|---|---|


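**No results: no pair of points of interest is visited together in at least 1% of user-days (`min_support=0.01`), so no rule can be formed.**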
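
An empty rule table means that no itemset with two or more items reached the minimum support. A minimal sketch of that check, counting pair supports directly on hypothetical user-day baskets (the POI ids and the threshold are toy values chosen for the illustration):

```python
from collections import Counter
from itertools import combinations

# Hypothetical user-day baskets of POI ids (toy values).
baskets = [
    {"poi_a", "poi_b"},
    {"poi_a"},
    {"poi_c"},
    {"poi_b", "poi_c"},
    {"poi_a"},
]

# Count in how many baskets each unordered pair of POIs appears together.
pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

pair_support = {pair: n / len(baskets) for pair, n in pair_counts.items()}

min_support = 0.3  # hypothetical threshold for the toy data
frequent_pairs = {p: s for p, s in pair_support.items() if s >= min_support}

print(pair_support)
print(frequent_pairs)  # empty: no pair is frequent, so no rule can be generated
```

With many distinct points of interest and short daily baskets, individual pairs are spread thin, which is a plausible reason the real data yields nothing at this level.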

#### Analysis 4: Per Day & Per User between Categories of Points of Interest, taking into account visits to multiple points of interest in the same category

We try to find if there is an association between the categories of points of interest a user visits during the same day. In this analysis we specifically want to see whether users often visit several points of interest of the same category on the same day.

In [17]:
visits['code'] = visits['user_id'] + visits['created']
# rename() returns a new frame, which avoids pandas' SettingWithCopyWarning
data = visits[['code', 'title', 'id']].rename(columns= {'id': 'sum'})
data = data.groupby(by=['code', 'title']).count().reset_index()

In [18]:
visits['code'] = visits['user_id'] + visits['created']
data = visits[['code', 'title', 'id']]

In [19]:
data = data.groupby(by=['code', 'title']).count().reset_index()
data['sum'] = 1

In [20]:
perday = data.groupby(['code', 'title'])['sum'].sum().unstack().reset_index().fillna(0).set_index('code')
frequent_itemsets = apriori(perday, min_support=0.01, use_colnames=True)
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=0.6)
rules.sort_values(by = 'confidence', ascending = False, inplace = True)

In [21]:
rules.head(15)

| | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | leverage | conviction |
|---|---|---|---|---|---|---|---|---|---|
| 43 | (Pet Store) | (Supermarket) | 0.03603 | 0.397681 | 0.014168 | 0.393214 | 0.988767 | -0.000161 | 0.992638 |
| 3 | (Bakery) | (Supermarket) | 0.052697 | 0.397681 | 0.019597 | 0.371887 | 0.935139 | -0.001359 | 0.958934 |
| 23 | (Electronics Store) | (Supermarket) | 0.04662 | 0.397681 | 0.017296 | 0.370999 | 0.932906 | -0.001244 | 0.957581 |
| 61 | (Women's Store) | (Supermarket) | 0.092467 | 0.397681 | 0.032291 | 0.349213 | 0.878123 | -0.004482 | 0.925524 |
| 45 | (Shoe Store) | (Supermarket) | 0.043042 | 0.397681 | 0.014851 | 0.345029 | 0.867604 | -0.002266 | 0.919613 |
| 25 | (Gym) | (Supermarket) | 0.044121 | 0.397681 | 0.014941 | 0.338631 | 0.851514 | -0.002605 | 0.910716 |
| 29 | (Home Store) | (Supermarket) | 0.066613 | 0.397681 | 0.02242 | 0.336572 | 0.846338 | -0.004071 | 0.90789 |
| 17 | (Clothing Store) | (Supermarket) | 0.072186 | 0.397681 | 0.024218 | 0.335492 | 0.843621 | -0.004489 | 0.906414 |
| 37 | (Museum) | (Supermarket) | 0.032327 | 0.397681 | 0.010662 | 0.329811 | 0.829336 | -0.002194 | 0.89873 |
| 35 | (Metro Station) | (Supermarket) | 0.104585 | 0.397681 | 0.034268 | 0.32766 | 0.823928 | -0.007323 | 0.895856 |


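**The top rules are the same as in the 1st analysis. The input is again reduced to a 0/1 indicator per category (`data['sum'] = 1`), so repeated visits within a category are not captured; only the row indices differ, because the lower lift threshold (0.6) retains more rules.**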
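
Because each basket only records whether a category was visited, repeated visits to the same category are invisible to the rule miner. One possible workaround, shown here purely as an assumption and not as something the original analysis does, is to add a separate item whenever a category appears two or more times in a basket. The records and the `"(2+ visits)"` label are hypothetical.

```python
from collections import Counter

# Hypothetical (basket code, category) visit records (toy values).
records = [
    ("u1d1", "Cafe"), ("u1d1", "Cafe"), ("u1d1", "Park"),
    ("u2d1", "Cafe"), ("u2d1", "Museum"),
]

counts = Counter(records)  # number of visits per (basket, category)

baskets = {}
for (code, title), n in counts.items():
    items = baskets.setdefault(code, set())
    items.add(title)
    if n >= 2:
        # Hypothetical extra item marking a repeated category within the same basket.
        items.add(f"{title} (2+ visits)")

for code, items in sorted(baskets.items()):
    print(code, sorted(items))
```

One-hot encoding these baskets would give the miner a column such as `Cafe (2+ visits)` whose support measures how often a category is visited more than once per day. Counting distinct `poi_id` values instead of raw visits would be needed to separate "two different cafes" from "the same cafe twice".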

#### Analysis 5: Per Day & Per User between Categories of Points of Interest, only for consistent clients

We try to find if there is an association between the categories of points of interest a consistent user visits during the same day.

In [22]:
visits['code'] = visits['user_id'] + visits['created']
visits = pd.merge(visits,
                  user_cons,
                  how = 'left',
                  on = 'user_id')

In [23]:
visits_cons = visits[visits['consistency'] == 'consistent']
# rename() returns a new frame, which avoids pandas' SettingWithCopyWarning
data = visits_cons[['code', 'title', 'id']].rename(columns= {'id': 'sum'})


In [24]:
data = data.groupby(by=['code', 'title']).count().reset_index()
data['sum'] = 1

In [25]:
perday = data.groupby(['code', 'title'])['sum'].sum().unstack().reset_index().fillna(0).set_index('code')

In [26]:
frequent_itemsets = apriori(perday, min_support=0.01, use_colnames=True)
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=0.8)
rules.sort_values(by = 'confidence', ascending = False, inplace = True)

In [27]:
rules.head(15)

| | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | leverage | conviction |
|---|---|---|---|---|---|---|---|---|---|
| 31 | (Pet Store) | (Supermarket) | 0.036448 | 0.398054 | 0.014283 | 0.391872 | 0.984468 | -0.000225 | 0.989833 |
| 17 | (Electronics Store) | (Supermarket) | 0.046616 | 0.398054 | 0.017245 | 0.369949 | 0.929394 | -0.00131 | 0.955392 |
| 3 | (Bakery) | (Supermarket) | 0.052907 | 0.398054 | 0.019513 | 0.368821 | 0.926561 | -0.001547 | 0.953685 |
| 43 | (Women's Store) | (Supermarket) | 0.093671 | 0.398054 | 0.032717 | 0.349278 | 0.877463 | -0.004569 | 0.925042 |
| 33 | (Shoe Store) | (Supermarket) | 0.043196 | 0.398054 | 0.014959 | 0.346317 | 0.870024 | -0.002235 | 0.920852 |
| 19 | (Gym) | (Supermarket) | 0.044659 | 0.398054 | 0.015124 | 0.338657 | 0.850781 | -0.002653 | 0.910187 |
| 21 | (Home Store) | (Supermarket) | 0.067501 | 0.398054 | 0.02275 | 0.337036 | 0.846709 | -0.004119 | 0.907962 |
| 11 | (Clothing Store) | (Supermarket) | 0.072859 | 0.398054 | 0.024524 | 0.336596 | 0.845604 | -0.004478 | 0.90736 |
| 15 | (Cultural Center) | (Supermarket) | 0.069183 | 0.398054 | 0.023134 | 0.334391 | 0.840063 | -0.004404 | 0.904353 |
| 27 | (Metro Station) | (Supermarket) | 0.103491 | 0.398054 | 0.034546 | 0.333805 | 0.838591 | -0.006649 | 0.903557 |


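**Results are close to those of the 1st analysis: restricting the data to consistent users barely changes the supports, and every rule shown still has a lift below 1.**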

#### Analysis 6: Per User between Points of Interest

We try to find if there is an association between the points of interest a user visits. Visits are grouped per user rather than per user-day, and the consistency filter is not applied to the input, so the rules cover all users.

In [31]:
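visits['code'] = visits['user_id'] + visits['created']
# Merge `user_cons` only once; a second merge would produce suffixed duplicate columns
if 'consistency' not in visits.columns:
    visits = pd.merge(visits,
                      user_cons,
                      how = 'left',
                      on = 'user_id')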
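
A note on repeating this merge: `user_cons` was already merged into `visits` in cell 22, and merging it again does not overwrite the existing column. The sketch below shows the effect on toy frames, assuming `user_cons` holds a `user_id` and a `consistency` column; `toy_visits` and `toy_cons` are hypothetical stand-ins.

```python
import pandas as pd

toy_visits = pd.DataFrame({"user_id": ["u1", "u2"], "poi_id": ["p1", "p2"]})
toy_cons = pd.DataFrame({"user_id": ["u1", "u2"],
                         "consistency": ["consistent", "inconsistent"]})

once = pd.merge(toy_visits, toy_cons, how="left", on="user_id")
twice = pd.merge(once, toy_cons, how="left", on="user_id")

print(list(once.columns))   # the plain 'consistency' column is present
print(list(twice.columns))  # pandas renames the clashing columns with _x / _y suffixes
```

After the second merge a lookup such as `visits['consistency']` would raise a `KeyError`, so the notebook only runs top to bottom if the merge happens once.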

In [32]:
# `visits_cons` is computed but not used below: `data` is built from all of
# `visits` and grouped per user, so the rules cover every user.
visits_cons = visits[visits['consistency'] == 'inconsistent']
# rename() returns a new frame, which avoids pandas' SettingWithCopyWarning
data = visits[['user_id', 'poi_id', 'id']].rename(columns= {'id': 'sum'})
data = data.groupby(by=['user_id', 'poi_id']).count().reset_index()
data['sum'] = 1
perday = data.groupby(['user_id', 'poi_id'])['sum'].sum().unstack().reset_index().fillna(0).set_index('user_id')


In [33]:
frequent_itemsets = apriori(perday, min_support=0.7, use_colnames=True)
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1)
rules.sort_values(by = 'confidence', ascending = False, inplace = True)

In [34]:
rules

| | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | leverage | conviction |
|---|---|---|---|---|---|---|---|---|---|
| 3 | (581ce08d-b424-45ef-a2c2-f328989a52b1) | (0f65c381-51dc-40a6-b306-1fc109e1f62c) | 0.757214 | 0.768159 | 0.747264 | 0.986859 | 1.284707 | 0.165603 | 17.643085 |
| 0 | (373636b6-6361-48e8-a02d-2de41d80edb6) | (0f65c381-51dc-40a6-b306-1fc109e1f62c) | 0.722388 | 0.768159 | 0.712438 | 0.986226 | 1.283882 | 0.157529 | 16.831642 |
| 4 | (373636b6-6361-48e8-a02d-2de41d80edb6) | (581ce08d-b424-45ef-a2c2-f328989a52b1) | 0.722388 | 0.757214 | 0.706468 | 0.977961 | 1.291526 | 0.159465 | 11.016418 |
| 2 | (0f65c381-51dc-40a6-b306-1fc109e1f62c) | (581ce08d-b424-45ef-a2c2-f328989a52b1) | 0.768159 | 0.757214 | 0.747264 | 0.972798 | 1.284707 | 0.165603 | 8.925278 |
| 5 | (581ce08d-b424-45ef-a2c2-f328989a52b1) | (373636b6-6361-48e8-a02d-2de41d80edb6) | 0.757214 | 0.722388 | 0.706468 | 0.932983 | 1.291526 | 0.159465 | 4.142406 |
| 1 | (0f65c381-51dc-40a6-b306-1fc109e1f62c) | (373636b6-6361-48e8-a02d-2de41d80edb6) | 0.768159 | 0.722388 | 0.712438 | 0.927461 | 1.283882 | 0.157529 | 3.827079 |


**Associations appear only among three points of interest: "Flisvos Park", "Palaio Faliro Beach" and "Stavros Niarchos Foundation".**