<a href="https://colab.research.google.com/github/ashishpatel26/Ganpat-University-Data-Science/blob/main/Lecture_7_Music_Recommendation_System_Association_Rule_Mining_LASTFM.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Association Rule Mining of Last FM

* Dataset : https://www.biz.uiowa.edu/faculty/jledolter/DataMining/lastfm.csv
* Total rows (user-artist records) : **289955**
* We use the `apyori` library to mine association rules with the Apriori algorithm.

> Requirement : `pip install apyori`

---
Notebook outline
---
1. <b><a href="#1">Import Required Packages</a></b>
1. <b><a href="#2">Load Dataset</a></b>
1. <b><a href="#3">Generate a Copy</a></b>
1. <b><a href="#4">Differentiate Data and Extract useful columns</a></b>
1. <b><a href="#5">Drop Duplicate Data</a></b>
1. <b><a href="#6">Transform Dataset into a List of Transactions</a></b>
1. <b><a href="#7">Generate Rule using Apriori Algorithm</a></b>
1. <b><a href="#8">Display All Rules</a></b>
1. <b><a href="#9">Final Result with Support, Confidence and Lift</a></b>

---
## Algorithm Explanation:
---

* We provide `min_support`, `min_confidence`, `min_lift`, and a minimum itemset length (`min_length`) to control which rules are reported.

#### Measure 1: Support.
This says how popular an itemset is, as measured by the proportion of transactions in which an itemset appears. In Table 1 below, the support of {apple} is 4 out of 8, or 50%. Itemsets can also contain multiple items. For instance, the support of {apple, beer, rice} is 2 out of 8, or 25%.

![](https://annalyzin.files.wordpress.com/2016/04/association-rule-support-table.png?w=503&h=447)

If you discover that sales of items beyond a certain proportion tend to have a significant impact on your profits, you might consider using that proportion as your support threshold. You may then identify itemsets with support values above this threshold as significant itemsets.
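The support calculation can be sketched in a few lines of plain Python. The eight transactions below are a hypothetical toy dataset built to reproduce the figures quoted above (4 of 8 for {apple}, 2 of 8 for {apple, beer, rice}); they are not a copy of Table 1, and the `support` helper is an illustrative name rather than part of any library.

```python
# Hypothetical toy data: 8 transactions chosen to match the numbers quoted
# in the text. This is not a transcription of Table 1.
transactions = [
    {"apple", "beer", "rice"},
    {"apple", "beer", "rice"},
    {"apple", "beer"},
    {"apple", "pear"},
    {"beer", "milk"},
    {"beer", "rice"},
    {"beer", "milk"},
    {"milk", "pear"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


print(support({"apple"}, transactions))                  # 0.5
print(support({"apple", "beer", "rice"}, transactions))  # 0.25
```

A support threshold then amounts to keeping only the itemsets for which this value is at or above the chosen cutoff.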

#### Measure 2: Confidence. 
This says how likely item Y is to be purchased when item X is purchased, expressed as {X -> Y}. It is measured by the proportion of transactions containing item X in which item Y also appears. In Table 1, the confidence of {apple -> beer} is 3 out of 4, or 75%.

![](https://annalyzin.files.wordpress.com/2016/03/association-rule-confidence-eqn.png?w=527&h=77)

One drawback of the confidence measure is that it might misrepresent the importance of an association. This is because it only accounts for how popular apples are, but not beers. If beers are also very popular in general, there will be a higher chance that a transaction containing apples will also contain beers, thus inflating the confidence measure. To account for the base popularity of both constituent items, we use a third measure called lift.
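Written out with the numbers quoted above (apples appear in 4 of 8 transactions, and 3 of those also contain beer), the confidence works out as follows. The notation $\mathrm{supp}(\cdot)$ for support is an assumption of this sketch.

$$
\mathrm{confidence}(\{\text{apple}\} \rightarrow \{\text{beer}\})
= \frac{\mathrm{supp}(\{\text{apple}, \text{beer}\})}{\mathrm{supp}(\{\text{apple}\})}
= \frac{3/8}{4/8}
= \frac{3}{4}
= 75\%
$$

Note that the support of {beer} on its own never enters this ratio, which is why a generally popular item can inflate the confidence.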

#### Measure 3: Lift. 
This says how likely item Y is to be purchased when item X is purchased, while controlling for how popular item Y is. In Table 1, the lift of {apple -> beer} is 1, which implies no association between the items. A lift value greater than 1 means that item Y is more likely to be bought when item X is bought, while a value less than 1 means that item Y is less likely to be bought when item X is bought.
![](https://annalyzin.files.wordpress.com/2016/03/association-rule-lift-eqn.png?w=566&h=80)
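As a minimal sketch of how confidence and lift relate, the snippet below computes both on a hypothetical toy dataset. The eight transactions are constructed so that {apple -> beer} has confidence 0.75 and lift 1, matching the figures in the text; they are not taken from Table 1, and the helper names are illustrative.

```python
# Hypothetical toy data built to match the quoted figures; not Table 1 itself.
transactions = [
    {"apple", "beer", "rice"},
    {"apple", "beer", "rice"},
    {"apple", "beer"},
    {"apple", "pear"},
    {"beer", "milk"},
    {"beer", "rice"},
    {"beer", "milk"},
    {"milk", "pear"},
]


def support(itemset):
    """Fraction of transactions containing every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(x, y):
    """Share of transactions containing x that also contain y."""
    return support(x | y) / support(x)


def lift(x, y):
    """Confidence of x -> y divided by the baseline support of y."""
    return support(x | y) / (support(x) * support(y))


print(confidence({"apple"}, {"beer"}))  # 0.75
print(lift({"apple"}, {"beer"}))        # 1.0
print(lift({"apple"}, {"rice"}))        # above 1: a positive association
```

The confidence of 0.75 looks strong, yet beer appears in 6 of the 8 toy transactions anyway, so the lift of 1 shows that buying apples tells us nothing extra about beer.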

---

### 1.Import Required Packages <h2 id="1"> </h2>

In [2]:
!pip install apyori
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt
from apyori import apriori
%matplotlib inline
import os



### 2.Load Dataset <span id="2"> </span>

In [3]:
lastfm1 = pd.read_csv("https://www.biz.uiowa.edu/faculty/jledolter/DataMining/lastfm.csv")

### 3.Generate a Copy<span id="3"></span>

In [4]:
lastfm = lastfm1.copy()
lastfm.shape

(289955, 4)

### 4.Differentiate Data and Extract useful columns <span id="4"></span>

In [6]:
lastfm = lastfm[['user','artist']]
lastfm.head()

|   | user | artist |
|---|------|--------|
| 0 | 1 | red hot chili peppers |
| 1 | 1 | the black dahlia murder |
| 2 | 1 | goldfrapp |
| 3 | 1 | dropkick murphys |
| 4 | 1 | le tigre |


### 5.Drop Duplicate Data <span id="5"></span>

In [7]:
lastfm = lastfm.drop_duplicates()
lastfm.shape

(289953, 2)

### 6.Transform Dataset into a List of Transactions <span id="6"></span>

In [8]:
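# One transaction per user: the list of artists that user listened to.
# sort=False keeps users in order of first appearance, as unique() would.
records = lastfm.groupby('user', sort=False)['artist'].apply(list).tolist()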
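To make the shape of the transformation explicit, here is a small library-free sketch of the same grouping step. The `rows` list and the `to_transactions` helper are hypothetical and only illustrate the idea; the notebook itself works on the `lastfm` DataFrame.

```python
from collections import defaultdict

# Hypothetical (user, artist) rows standing in for the two DataFrame columns.
rows = [
    (1, "red hot chili peppers"),
    (1, "goldfrapp"),
    (1, "goldfrapp"),        # duplicate row, dropped during grouping
    (2, "tool"),
    (2, "a perfect circle"),
]


def to_transactions(rows):
    """Group (user, artist) pairs into one de-duplicated artist list per user."""
    by_user = defaultdict(list)  # dicts keep insertion order (Python 3.7+)
    for user, artist in rows:
        if artist not in by_user[user]:
            by_user[user].append(artist)
    return list(by_user.values())


records_demo = to_transactions(rows)
print(records_demo)
# [['red hot chili peppers', 'goldfrapp'], ['tool', 'a perfect circle']]
```

Each inner list is one transaction, which is the input shape that `apriori` is given in the next section.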

In [9]:
print(type(records))

<class 'list'>


In [15]:
for i in range(5):
  print(records[i][:5])

['red hot chili peppers', 'the black dahlia murder', 'goldfrapp', 'dropkick murphys', 'le tigre']
['devendra banhart', 'boards of canada', 'cocorosie', 'aphex twin', 'animal collective']
['tv on the radio', 'tool', 'kyuss', 'dj shadow', 'air']
['dream theater', 'ac/dc', 'metallica', 'iron maiden', 'bob marley & the wailers']
['lily allen', 'kanye west', 'sigur rós', 'pink floyd', 'stevie wonder']


### 7.Generate Rule using Apriori Algorithm <span id="7"></span>

In [16]:
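# Keep itemsets in >= 1% of transactions, with rule confidence >= 0.4 and lift >= 3.
# Note: apyori documents `max_length`; `min_length` appears to be ignored.
association_rules = apriori(records, min_support=0.01, min_confidence=0.4, min_lift=3, min_length=2)
association_results = list(association_rules)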
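For intuition about what the `min_support` threshold does, the following is a minimal sketch of the level-wise Apriori search for frequent itemsets. It illustrates the general technique and is not apyori's implementation; the function name `frequent_itemsets` and the four toy listening histories are assumptions made for this example.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise Apriori search; returns {frozenset: support}."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s

    result = {}
    k = 1
    while current:
        result.update(current)
        k += 1
        # Join step: merge pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a, b in combinations(current, 2) if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
    return result


# Hypothetical listening histories, one list per user.
demo = [
    ["tool", "a perfect circle", "metallica"],
    ["tool", "a perfect circle"],
    ["tool", "metallica"],
    ["coldplay", "keane"],
]
found = frequent_itemsets(demo, min_support=0.5)
for itemset, s in sorted(found.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
# ['a perfect circle'] 0.5
# ['metallica'] 0.5
# ['tool'] 0.75
# ['a perfect circle', 'tool'] 0.5
# ['metallica', 'tool'] 0.5
```

The pruning step is what keeps the search tractable: an itemset can only be frequent if all of its subsets are, so most candidate combinations are never counted. Rules are then derived from the surviving itemsets and filtered by confidence and lift.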

In [17]:
print("There are {} relations derived.".format(len(association_results)))

There are 91 relations derived.


### 8.Display All Rules<span id="8"></span>

In [18]:
for record in association_results:
    print(record[0])  # first field of each record: the itemset

frozenset({'tool', 'a perfect circle'})
frozenset({'arctic monkeys', 'kaiser chiefs'})
frozenset({'beyoncé', 'rihanna'})
frozenset({'black sabbath', 'metallica'})
frozenset({'sum 41', 'blink-182'})
frozenset({'linkin park', 'breaking benjamin'})
frozenset({'bright eyes', 'death cab for cutie'})
frozenset({'broken social scene', 'death cab for cutie'})
frozenset({'broken social scene', 'radiohead'})
frozenset({'in flames', 'children of bodom'})
frozenset({'coldplay', 'keane'})
frozenset({'snow patrol', 'coldplay'})
frozenset({'coldplay', 'the fray'})
frozenset({'coldplay', 'travis'})
frozenset({'daft punk', 'justice'})
frozenset({'the decemberists', 'death cab for cutie'})
frozenset({'the postal service', 'death cab for cutie'})
frozenset({'the shins', 'death cab for cutie'})
frozenset({'led zeppelin', 'deep purple'})
frozenset({'dream theater', 'metallica'})
frozenset({'panic at the disco', 'fall out boy'})
frozenset({'franz ferdinand', 'kaiser chiefs'})
frozenset({'good charlotte', 'l

### 9.Final Result with Support, Confidence and Lift <span id="9"></span>
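Each result record pairs an itemset with its support and a list of directed rule statistics. As a hedged sketch, the fields can be read by name and each rule printed in its stored direction, which avoids relying on the iteration order of a frozenset. The two namedtuples below are stand-ins that mirror the field names of apyori's `RelationRecord` and `OrderedStatistic` so the example runs without the library; the `describe` helper, the artist labels, and the numbers are made up for illustration.

```python
from collections import namedtuple

# Stand-ins mirroring the field names of apyori's result tuples.
RelationRecord = namedtuple(
    "RelationRecord", ["items", "support", "ordered_statistics"]
)
OrderedStatistic = namedtuple(
    "OrderedStatistic", ["items_base", "items_add", "confidence", "lift"]
)


def describe(record):
    """Yield one line per directed rule stored in a result record."""
    for stat in record.ordered_statistics:
        base = ", ".join(sorted(stat.items_base))  # antecedent (X)
        add = ", ".join(sorted(stat.items_add))    # consequent (Y)
        yield (
            f"{base} -> {add} | support={record.support:.4f} "
            f"confidence={stat.confidence:.4f} lift={stat.lift:.2f}"
        )


# Made-up record, for illustration only.
demo = RelationRecord(
    items=frozenset({"artist a", "artist b"}),
    support=0.02,
    ordered_statistics=[
        OrderedStatistic(frozenset({"artist a"}), frozenset({"artist b"}), 0.5, 5.0),
    ],
)
for line in describe(demo):
    print(line)
# artist a -> artist b | support=0.0200 confidence=0.5000 lift=5.00
```

Because confidence is directional, the antecedent and consequent should come from the same statistic as the confidence and lift values being printed.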

In [21]:
for item in association_results:
    # item[0] is the itemset (a frozenset); unpack it into a list.
    # Note: a frozenset has no guaranteed order, so items[0] and items[1]
    # need not match the rule direction of the statistics printed below.
    pair = item[0]
    items = [x for x in pair]
    print("Rule: With " + items[0] + " you can also listen " + items[1])

    # item[1] is the support of the itemset.
    print("Support: " + str(item[1]))

    # item[2] is the list of ordered statistics; take the first one and
    # read its confidence (index 2) and lift (index 3).
    print("Confidence: " + str(item[2][0][2]))
    print("Lift: " + str(item[2][0][3]))
    print("=====================================")

Rule: With tool you can also listen a perfect circle
Support: 0.016266666666666665
Confidence: 0.44283121597096187
Lift: 8.717149920688225
Rule: With arctic monkeys you can also listen kaiser chiefs
Support: 0.012533333333333334
Confidence: 0.4008528784648188
Lift: 5.3116547499755145
Rule: With beyoncé you can also listen rihanna
Support: 0.013933333333333334
Confidence: 0.46860986547085204
Lift: 10.88103402796096
Rule: With black sabbath you can also listen metallica
Support: 0.0172
Confidence: 0.45263157894736844
Lift: 4.06555310431768
Rule: With sum 41 you can also listen blink-182
Support: 0.014133333333333333
Confidence: 0.42741935483870963
Lift: 7.420474910394264
Rule: With linkin park you can also listen breaking benjamin
Support: 0.0108
Confidence: 0.4426229508196721
Lift: 4.507362024640246
Rule: With bright eyes you can also listen death cab for cutie
Support: 0.0152
Confidence: 0.4021164021164021
Lift: 4.944054124381993
Rule: With broken social scene you can also listen death