# Market Basket Analysis



1. Market basket analysis is the process of discovering frequent itemsets in a large transactional database.
2. Frequent itemset mining leads to the discovery of associations and correlations among items.

## Import Library

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
from apyori import apriori

## Load Dataset

The dataset is item-survey data from the Bootcamp class. Each row lists up to three items that one person in the class would buy.

List of items:
1. HP
2. Watch
3. Racket
4. Camera
5. Music Pad
6. Mouse
7. Router
8. Soap
9. Bag
10. Guitar

In [165]:
data=pd.read_csv('data/transaksi.csv')
data.head()

| Unnamed: 0 | Item 1 | Item 2 | Item 3 |
|---|---|---|---|
| 0 | HP | Racket | Watch |
| 1 | HP | Camera | Watch |
| 2 | Watch | Camera | Music Pad |
| 3 | Camera | Watch | Mouse |
| 4 | HP | Watch | Music Pad |


## Data Preprocessing

In [137]:
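# Build one transaction (a list of item names) per row.
# Selecting the item columns by name skips the leftover "Unnamed: 0" index column.
# Missing values become the string 'nan'.
item_columns = ['Item 1', 'Item 2', 'Item 3']
records = data[item_columns].astype(str).values.tolist()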

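The `apriori` function from apyori takes a list of transactions rather than a DataFrame, so we convert the dataset into a list of lists, one inner list per transaction. Rows with fewer than three items contain missing values, which the string conversion turns into the literal text 'nan'; Apriori then treats 'nan' as an ordinary item.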

In [169]:
records

[['HP', 'Racket', 'Watch'],
 ['HP', 'Camera', 'Watch'],
 ['Watch', 'Camera', 'Music Pad'],
 ['Camera', 'Watch', 'Mouse'],
 ['HP', 'Watch', 'Music Pad'],
 ['Watch', 'Racket', 'Camera'],
 ['HP', 'Camera', 'Watch'],
 ['Watch', 'Camera', 'Music Pad'],
 ['Racket', 'Soap', 'Guitar'],
 ['Racket', 'Camera', 'Guitar'],
 ['Camera', 'Bag', 'nan'],
 ['Music Pad', 'Guitar', 'Camera'],
 ['Camera', 'Watch', 'nan'],
 ['Guitar', 'Camera', 'Music Pad'],
 ['Camera', 'Watch', 'Music Pad'],
 ['Camera', 'Racket', 'Guitar'],
 ['Guitar', 'Camera', 'Watch'],
 ['Guitar', 'Watch', 'nan'],
 ['Camera', 'Watch', 'nan'],
 ['Soap', 'Bag', 'Guitar'],
 ['Router', 'Bag', 'nan'],
 ['Watch', 'Racket', 'Soap'],
 ['Music Pad', 'Soap', 'Watch']]
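
Several transactions above end in the string 'nan', which is what the string conversion produces for a missing value. Below is a minimal sketch of one way to drop those placeholders before mining, assuming missing items should simply be ignored. The names `raw_records` and `clean_records` are illustrative, and the three rows are copied from the output above.

```python
# Three transactions copied from the output above; two contain the 'nan' placeholder.
raw_records = [
    ['HP', 'Racket', 'Watch'],
    ['Camera', 'Bag', 'nan'],
    ['Router', 'Bag', 'nan'],
]

# Keep only real item names; 'nan' is the string form of a missing value.
clean_records = [[item for item in row if item != 'nan'] for row in raw_records]

print(clean_records)
```

With the placeholder removed, 'nan' can no longer appear as an item in a rule. The results shown in this notebook were produced without this step.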

## Fit the Apriori-Algorithm

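Apriori takes several threshold parameters, such as min_support, min_confidence, min_lift, and min_length. As far as we can tell, apyori itself only reads min_support, min_confidence, min_lift, and max_length, so min_length may be silently ignored.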

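1. Support is the fraction of transactions in the dataset that contain the itemset.
2. Confidence of a rule X -> Y is the fraction of transactions containing X that also contain Y.
3. Lift is the ratio of the observed support of X and Y together to the support expected if X and Y were independent. A lift above 1 means the items occur together more often than chance would suggest.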
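
Written out for a rule $X \rightarrow Y$ over $N$ transactions, the three measures take the standard textbook form below. The worked numbers use counts taken from the `records` list in this notebook: HP appears in 4 of the 23 transactions, always together with Watch, and Watch appears in 15.

```latex
\begin{aligned}
\mathrm{support}(X \cup Y) &= \frac{|\{\, t : X \cup Y \subseteq t \,\}|}{N} \\[4pt]
\mathrm{confidence}(X \rightarrow Y) &= \frac{\mathrm{support}(X \cup Y)}{\mathrm{support}(X)} \\[4pt]
\mathrm{lift}(X \rightarrow Y) &= \frac{\mathrm{confidence}(X \rightarrow Y)}{\mathrm{support}(Y)} \\[8pt]
\mathrm{support}(\{\mathrm{HP}, \mathrm{Watch}\}) &= \frac{4}{23} \approx 0.174 \\[4pt]
\mathrm{confidence}(\mathrm{HP} \rightarrow \mathrm{Watch}) &= \frac{4/23}{4/23} = 1 \\[4pt]
\mathrm{lift}(\mathrm{HP} \rightarrow \mathrm{Watch}) &= \frac{1}{15/23} = \frac{23}{15} \approx 1.533
\end{aligned}
```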

In [202]:
rules = apriori(records, min_support=0.085, min_confidence=0.5, min_lift=1.5, min_length=2)
results = list(rules)
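
To make the thresholds concrete, here is a brute-force sketch that checks every single-item rule A -> B against the same three cut-offs. It is not how apyori works internally (Apriori prunes candidates level by level and also handles larger itemsets); it only illustrates what the filters mean. The `records` list is copied from the output above, and the helper name `pair_rules` is made up for this example.

```python
from itertools import permutations

# Transactions copied from the `records` output earlier in this notebook.
records = [
    ['HP', 'Racket', 'Watch'], ['HP', 'Camera', 'Watch'],
    ['Watch', 'Camera', 'Music Pad'], ['Camera', 'Watch', 'Mouse'],
    ['HP', 'Watch', 'Music Pad'], ['Watch', 'Racket', 'Camera'],
    ['HP', 'Camera', 'Watch'], ['Watch', 'Camera', 'Music Pad'],
    ['Racket', 'Soap', 'Guitar'], ['Racket', 'Camera', 'Guitar'],
    ['Camera', 'Bag', 'nan'], ['Music Pad', 'Guitar', 'Camera'],
    ['Camera', 'Watch', 'nan'], ['Guitar', 'Camera', 'Music Pad'],
    ['Camera', 'Watch', 'Music Pad'], ['Camera', 'Racket', 'Guitar'],
    ['Guitar', 'Camera', 'Watch'], ['Guitar', 'Watch', 'nan'],
    ['Camera', 'Watch', 'nan'], ['Soap', 'Bag', 'Guitar'],
    ['Router', 'Bag', 'nan'], ['Watch', 'Racket', 'Soap'],
    ['Music Pad', 'Soap', 'Watch'],
]


def pair_rules(transactions, min_support, min_confidence, min_lift):
    """Return (base, add, support, confidence, lift) for each rule
    base -> add between single items that passes all three thresholds."""
    n = len(transactions)
    baskets = [set(t) for t in transactions]
    items = sorted(set().union(*baskets))
    # Support of each single item: share of transactions containing it.
    item_support = {i: sum(i in b for b in baskets) / n for i in items}

    rules = []
    for base, add in permutations(items, 2):
        # Support of the pair: share of transactions containing both items.
        both = sum(base in b and add in b for b in baskets) / n
        if both < min_support:
            continue
        confidence = both / item_support[base]
        lift = confidence / item_support[add]
        if confidence >= min_confidence and lift >= min_lift:
            rules.append((base, add, both, confidence, lift))
    return rules


for base, add, sup, conf, lift in pair_rules(records, 0.085, 0.5, 1.5):
    print(f"{base} -> {add}: support={sup:.3f}, "
          f"confidence={conf:.3f}, lift={lift:.3f}")
```

Because the direction of each rule is explicit here, the output can be used as a cross-check on the two-item rules reported by apyori.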

## Result of Apriori

In [203]:
for item in results:

    # item[0] is the frozenset of items that make up the itemset
    pair = item[0] 
    items = [x for x in pair]
    # Only the first two items are printed, and a frozenset has no fixed order,
    # so the arrow is not guaranteed to match the rule's actual direction
    print("Rule: " + items[0] + " -> " + items[1])

    # item[1] is the support of the itemset
    print("Support: " + str(item[1]))

    # item[2] is the list of ordered statistics; its first entry holds
    # (items_base, items_add, confidence, lift)
    print("Confidence: " + str(item[2][0][2]))
    print("Lift: " + str(item[2][0][3]))
    print("=====================================")

Rule: Bag -> nan
Support: 0.08695652173913043
Confidence: 0.6666666666666666
Lift: 3.0666666666666664
Rule: HP -> Watch
Support: 0.17391304347826086
Confidence: 1.0
Lift: 1.5333333333333332
Rule: Soap -> Racket
Support: 0.08695652173913043
Confidence: 0.5
Lift: 1.9166666666666667
Rule: Camera -> Guitar
Support: 0.08695652173913043
Confidence: 1.0
Lift: 1.5333333333333332
Rule: Camera -> Guitar
Support: 0.08695652173913043
Confidence: 0.6666666666666666
Lift: 1.9166666666666665
Rule: Camera -> HP
Support: 0.08695652173913043
Confidence: 1.0
Lift: 1.5333333333333332
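
The loop above prints only the first two members of each itemset, in whatever order the frozenset yields them, so the printed arrow may not match the rule's real direction and three-item rules look like pairs. Below is a sketch of a more explicit printout. It assumes apyori's result records expose `items`, `support`, and `ordered_statistics`, with each statistic carrying `items_base`, `items_add`, `confidence`, and `lift`, which is how we understand its namedtuples to be defined. The stand-in namedtuples, the helper `format_rules`, and the sample record are hypothetical and exist only so the snippet runs without apyori installed; with real output, pass `results` instead of `sample`.

```python
from collections import namedtuple

# Stand-ins mirroring the field names apyori is assumed to use (hypothetical).
OrderedStatistic = namedtuple(
    'OrderedStatistic', ['items_base', 'items_add', 'confidence', 'lift'])
RelationRecord = namedtuple(
    'RelationRecord', ['items', 'support', 'ordered_statistics'])


def format_rules(results):
    """Return one text line per rule, naming the base and added items."""
    lines = []
    for record in results:
        for stat in record.ordered_statistics:
            base = ', '.join(sorted(stat.items_base))
            add = ', '.join(sorted(stat.items_add))
            lines.append(
                f"{base} -> {add} | support={record.support:.3f} "
                f"confidence={stat.confidence:.3f} lift={stat.lift:.3f}"
            )
    return lines


# Hypothetical sample record using the HP -> Watch figures (4/23, 1.0, 23/15).
sample = [RelationRecord(
    items=frozenset({'HP', 'Watch'}),
    support=4 / 23,
    ordered_statistics=[
        OrderedStatistic(frozenset({'HP'}), frozenset({'Watch'}), 1.0, 23 / 15)
    ],
)]

for line in format_rules(sample):
    print(line)
```

Iterating over every entry in `ordered_statistics`, rather than only the first, also shows each direction of a rule when more than one passes the thresholds.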


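From the Apriori results, we can see that the rule 'HP -> Watch' has a confidence of 1.0, the maximum possible value (two other printed rules also reach 1.0). In this dataset, every transaction that contains HP also contains Watch. With only 23 transactions, this does not guarantee that future customers will behave the same way.

This rule also has the highest support (about 0.174, or 4 of 23 transactions), which means it occurs more often than any other rule found.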


# Thanks

Created by Fransdana Nadeak