# Market Basket Analysis

<img src = "img/basket.jpg" height="300" width="300">


Affinity analysis is a data analysis and data mining technique that discovers co-occurrence relationships among activities performed by (or recorded about) specific individuals or groups. In general, it can be applied to any process where agents can be uniquely identified and information about their activities can be recorded. In retail, affinity analysis is used to perform market basket analysis, in which retailers seek to understand the purchase behavior of customers. This information can then be used for cross-selling and up-selling, as well as for shaping sales promotions, loyalty programs, store design, and discount plans.

**Examples**

1. Market basket analysis might tell a retailer that customers often purchase shampoo and conditioner together, so putting both items on promotion at the same time would not create a significant increase in revenue, while a promotion involving just one of the items would likely drive sales of the other.

2. One supermarket chain discovered in its analysis that male customers who bought diapers often bought beer as well. It placed the diapers close to the beer coolers, and sales increased dramatically.

*src*: https://en.wikipedia.org/wiki/Affinity_analysis
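To make "co-occurrence" concrete, here is a minimal sketch of the three measures usually used to score such relationships: support, confidence, and lift. The baskets and the function names (`support`, `confidence`, `lift`) are made up for illustration and are not part of the dataset or of any library used later.

```python
# Toy baskets (made-up data, for illustration only)
baskets = [
    {"shampoo", "conditioner"},
    {"shampoo", "conditioner", "soap"},
    {"shampoo"},
    {"soap"},
    {"conditioner", "soap"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= b for b in baskets) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Estimate of P(consequent in basket | antecedent in basket)."""
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)

def lift(antecedent, consequent, baskets):
    """Confidence divided by the consequent's baseline support.

    A value above 1 suggests the items appear together more often
    than they would if they were independent.
    """
    return confidence(antecedent, consequent, baskets) / support(consequent, baskets)

print(support({"shampoo", "conditioner"}, baskets))
print(confidence({"shampoo"}, {"conditioner"}, baskets))
print(lift({"shampoo"}, {"conditioner"}, baskets))
```

Support answers "how common is this combination?", confidence answers "how often does the second item follow the first?", and lift corrects confidence for items that are popular regardless.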

In [1]:
import pandas as pd

Let's load the user-comment data. 

In [2]:
user_stories = pd.read_csv("data/story_user_comment.csv")

In [3]:
user_stories.head()

Unnamed: 0,user,story,comment
0,21,14356377,1
1,21,15131370,1
2,21,15196309,1
3,47,15601729,1
4,47,14023198,1


In [4]:
user_stories.shape

(52460, 3)

In [5]:
user_stories_sorted = user_stories.sort_values("user")

In [6]:
user_stories_sorted.head()

Unnamed: 0,user,story,comment
1298,3,14384187,1
21446,1000001,13816627,1
43320,10001001010,14238005,1
39685,1096232042,15357584,1
39684,1096232042,15357584,1


In [7]:
unique_users = pd.unique(user_stories_sorted.user)

For market basket analysis, we need the baskets. Here each user is one basket, and the items in it are the stories that user commented on. Let's build the list of baskets.

In [8]:
basket = []

In [15]:
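# Note: this cell appends to `basket`, so re-running it adds the same baskets again.
for user in unique_users[:100]:
    # All rows for this user
    user_data = user_stories_sorted[user_stories_sorted.user == user]
    # Distinct stories as plain Python ints; a separate name avoids
    # overwriting the `user_stories` DataFrame loaded earlier
    story_ids = pd.unique(user_data.story).tolist()
    basket.append(story_ids)
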

In [16]:
basket

[[14384187],
 [13816627],
 [14238005],
 [15357584],
 [13696004, 13507993, 13338592],
 [15744359],
 [14998429],
 [14446261],
 [14087381],
 [15345483, 14702676],
 [14384187],
 [13816627],
 [14238005],
 [15357584],
 [13696004, 13507993, 13338592],
 [15744359],
 [14998429],
 [14446261],
 [14087381],
 [15345483, 14702676],
 [14044517],
 [15752022, 15254952, 14039135],
 [13559662],
 [15742287],
 [15601729],
 [13361019],
 [13601451],
 [13423629],
 [14552615],
 [14308754,
  15298833,
  13713480,
  15253781,
  15316175,
  14592745,
  14529376,
  14643467],
 [15200221, 15066729],
 [15675582, 14923362],
 [14997799],
 [13642662, 13559581],
 [14068280],
 [15878197, 14136081, 15234207, 13397145, 13338592],
 [15694926],
 [14417758],
 [15820161],
 [14170041, 13823979, 13597949],
 [15131370],
 [15037960],
 [14621347],
 [15756684],
 [14952787],
 [15423202, 15309989],
 [13354329],
 [15234207, 13500346, 15127633, 13755673],
 [13489100,
  14361425,
  14101233,
  14744068,
  13881535,
  14043631,
  13867316

In [17]:
len(basket)

110
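The same kind of basket list can also be built in one pass with a pandas `groupby`. The sketch below uses a small made-up table with the same column names (`user`, `story`, `comment`); the names `toy` and `toy_basket` are hypothetical and the rows are not taken from the real data.

```python
import pandas as pd

# Made-up rows shaped like the user/story/comment table
toy = pd.DataFrame({
    "user":    [21, 21, 21, 47, 47, 47],
    "story":   [14356377, 15131370, 14356377, 15601729, 14023198, 15601729],
    "comment": [1, 1, 1, 1, 1, 1],
})

# One basket per user: the distinct stories that user commented on.
# groupby iterates over users in sorted order; unique() keeps first-seen order.
toy_basket = [group["story"].unique().tolist() for _, group in toy.groupby("user")]

print(toy_basket)
```

Because the list is rebuilt from scratch each time, running this more than once does not duplicate baskets.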

# FP Growth

In [18]:
import pyfpgrowth

Let's create the FP tree.

In [38]:
?pyfpgrowth.pyfpgrowth.FPTree

In [39]:
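# Arguments are (transactions, threshold, root_value, root_count).
# The threshold is compared against raw item counts, so 0.1 keeps every item.
hn_fptree = pyfpgrowth.pyfpgrowth.FPTree(basket, 0.1, 0.1, 0.1)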

In [41]:
# List frequency for various items

In [42]:
hn_fptree.frequent
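As a rough illustration of what an item-frequency table contains, the sketch below counts how many baskets each item appears in and drops items below a minimum count. It assumes the table maps each item to an absolute count; `toy_basket`, `min_count`, and `frequent_items` are hypothetical names, and the data is made up.

```python
from collections import Counter

# Made-up baskets with small integer item ids
toy_basket = [[1, 2, 3], [1, 2], [2, 4], [1, 2, 4], [5]]
min_count = 2  # hypothetical absolute support threshold

# Count each item once per basket it appears in
counts = Counter(item for transaction in toy_basket for item in set(transaction))

# Keep only the items that reach the threshold
frequent_items = {item: n for item, n in counts.items() if n >= min_count}

print(frequent_items)
```

Items that fall below the threshold (here 3 and 5) cannot be part of any frequent pattern, which is why they are discarded before the tree is built.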

Find the frequent patterns

In [19]:
# Keep itemsets that occur in at least 2 baskets (absolute support count)
patterns = pyfpgrowth.find_frequent_patterns(basket, 2)

In [44]:
patterns
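To show what "frequent patterns" means, here is a brute-force sketch that enumerates every itemset in every basket and keeps those reaching a minimum count. This is not FP-growth, which reaches the same answer without enumerating all subsets; it is only practical for tiny inputs. The function name `frequent_itemsets_bruteforce` and the toy data are hypothetical, and returning a dictionary keyed by item tuples is an assumption chosen to resemble the shape of `patterns`.

```python
from collections import Counter
from itertools import combinations

def frequent_itemsets_bruteforce(transactions, min_count):
    """Count every non-empty itemset of every transaction; keep the frequent ones."""
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))  # canonical order, no duplicates
        for size in range(1, len(items) + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1
    return {itemset: n for itemset, n in counts.items() if n >= min_count}

# Made-up baskets with small integer item ids
toy_basket = [[1, 2, 3], [1, 2], [2, 4], [1, 2, 4], [5]]
toy_patterns = frequent_itemsets_bruteforce(toy_basket, 2)

print(toy_patterns)
```

The cost grows exponentially with basket size, which is the motivation for tree-based methods such as FP-growth.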

Generate the association rules

In [20]:
# Keep rules whose confidence is at least 0.7
rules = pyfpgrowth.generate_association_rules(patterns, 0.7)

In [45]:
rules
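The idea behind rule generation can be sketched directly from a table of pattern counts: for each frequent itemset, split it into an antecedent and a consequent, and keep the split if count(itemset) / count(antecedent) reaches the confidence threshold. The function `rules_from_patterns`, its return format, and the toy counts below are hypothetical and may differ from what `pyfpgrowth` actually returns.

```python
from itertools import combinations

def rules_from_patterns(patterns, min_confidence):
    """Return (antecedent, consequent, confidence) triples from itemset counts."""
    rules = []
    for itemset, count in patterns.items():
        # Try every proper, non-empty subset as the antecedent
        for size in range(1, len(itemset)):
            for antecedent in combinations(itemset, size):
                if antecedent not in patterns:
                    continue  # antecedent count unknown, skip
                conf = count / patterns[antecedent]
                if conf >= min_confidence:
                    consequent = tuple(i for i in itemset if i not in antecedent)
                    rules.append((antecedent, consequent, conf))
    return rules

# Made-up itemset counts, keyed by sorted item tuples
toy_patterns = {(1,): 3, (2,): 4, (4,): 2, (1, 2): 3, (2, 4): 2}
toy_rules = rules_from_patterns(toy_patterns, 0.7)

for antecedent, consequent, conf in toy_rules:
    print(antecedent, "->", consequent, conf)
```

In this toy data, item 2 appears in four baskets but alongside item 4 in only two, so the rule from 2 to 4 has confidence 0.5 and is dropped, while the rule from 4 to 2 has confidence 1.0 and is kept.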