<img src="https://images.agoramedia.com/wte3.0/gcms/coterie-diapers.jpg" width=500>
<img src="https://images.newindianexpress.com/uploads/user/imagelibrary/2020/7/19/w1200X800/Post_tests-.jpg" width=500>


One of the most famous stories about association rule mining is that people who buy diapers also tend to buy beer.

Association rules help identify the underlying relationships between different items.

There are many methods for association rule mining, and one of them is the Apriori algorithm. In this notebook, we will use a movie dataset to find out which movies go hand in hand. For example, people who watch Ironman 3 might also watch Avengers.

Figuring out associations between items can be beneficial in a number of ways. For example, if Item A and Item B are frequently bought together, then:


1.   A and B could be placed side by side so that buyers of one item are prompted to buy the other
2.   There is no need to run promotional offers on both A and B, since a promotion on one is likely to lift sales of the other
3.   Item A's buyers can be the target audience for item B's advertisements
4.   A and B could be sold as a combo





Before we get into Apriori, let's go over some of the terminology:



<img src="https://annalyzin.files.wordpress.com/2016/04/association-rule-support-table.png?w=503&h=447" width=300>

**1.   Support:** It measures how popular an itemset is, i.e. the fraction of transactions that contain it. In the table above, Support(Apple) = 4/8 = 0.5, Support([Apple, Beer]) = 3/8 and Support([Apple, Beer, Rice]) = 2/8. In general, the support of an itemset is the ratio between the number of transactions containing it and the total number of transactions.

**2.   Confidence:**  The confidence of the rule B -> A is how likely Item A is to be purchased given that Item B is purchased: Confidence(B -> A) = Support(A,B) / Support(B).

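**3.   Lift:**  Lift(B -> A) = Confidence(B -> A) / Support(A) = Support(A,B) / (Support(A) × Support(B)). It measures how much more often A and B occur together than would be expected if they were independent. A lift above 1 means that buying B makes buying A more likely, a lift of 1 means no association, and a lift below 1 means that buying B makes buying A less likely.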
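To make the three definitions concrete, here is a minimal sketch in plain Python. The eight transactions are hypothetical: they are chosen only so that the support values match the ones quoted above (4/8, 3/8 and 2/8), and they are not a copy of the table in the image. The helper names `support`, `confidence` and `lift` are illustrative and do not come from any library.

```python
# Hypothetical transactions, chosen so the supports match the text above.
transactions = [
    {"Apple", "Beer", "Rice", "Chicken"},
    {"Apple", "Beer", "Rice"},
    {"Apple", "Beer"},
    {"Apple", "Pear"},
    {"Milk", "Beer", "Rice", "Chicken"},
    {"Milk", "Beer", "Rice"},
    {"Milk", "Beer"},
    {"Milk", "Pear"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(base, add, transactions):
    """Confidence(base -> add) = Support(base and add) / Support(base)."""
    both = set(base) | set(add)
    return support(both, transactions) / support(base, transactions)


def lift(base, add, transactions):
    """Lift(base -> add) = Confidence(base -> add) / Support(add)."""
    return confidence(base, add, transactions) / support(add, transactions)


print(support({"Apple"}, transactions))                  # 0.5
print(support({"Apple", "Beer"}, transactions))          # 0.375
print(support({"Apple", "Beer", "Rice"}, transactions))  # 0.25
print(confidence({"Beer"}, {"Apple"}, transactions))     # 0.5
print(lift({"Beer"}, {"Apple"}, transactions))           # 1.0
```

In this toy data the lift of Beer -> Apple is exactly 1, so knowing that a basket contains Beer tells us nothing about whether it contains Apple.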



In [None]:
!pip install apyori

Collecting apyori
  Downloading https://files.pythonhosted.org/packages/5e/62/5ffde5c473ea4b033490617ec5caa80d59804875ad3c3c57c0976533a21a/apyori-1.1.2.tar.gz
Building wheels for collected packages: apyori
  Building wheel for apyori (setup.py) ... done
  Created wheel for apyori: filename=apyori-1.1.2-cp37-none-any.whl size=5975 sha256=68d03e2e97014b6faa400aff1df6be2d58ec692d7840495eca1672b8b51aab1b
  Stored in directory: /root/.cache/pip/wheels/5d/92/bb/474bbadbc8c0062b9eb168f69982a0443263f8ab1711a8cad0
Successfully built apyori
Installing collected packages: apyori
Successfully installed apyori-1.1.2


In [None]:
import numpy as np  
import matplotlib.pyplot as plt  
import pandas as pd  
from apyori import apriori

In [None]:
movie_data = pd.read_csv('https://raw.githubusercontent.com/tripathiaakash/ML_Course/main/movie_dataset.csv', header = None)

In [None]:
movie_data.shape

(7501, 20)

In [None]:
movie_data.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19
0,The Revenant,13 Hours,Allied,Zootopia,Jigsaw,Achorman,Grinch,Fast and Furious,Ghostbusters,Wolverine,Mad Max,John Wick,La La Land,The Good Dunosaur,Ninja Turtles,The Good Dunosaur Bad Moms,2 Guns,Inside Out,Valerian,Spiderman 3
1,Beirut,Martian,Get Out,,,,,,,,,,,,,,,,,
2,Deadpool,,,,,,,,,,,,,,,,,,,
3,X-Men,Allied,,,,,,,,,,,,,,,,,,
4,Ninja Turtles,Moana,Ghost in the Shell,Ralph Breaks the Internet,John Wick,,,,,,,,,,,,,,,


In [None]:
# Turn each row into a list of movie titles, dropping the NaN padding
# that pandas adds to rows with fewer titles than columns.
records = [[title for title in row if pd.notna(title)] for row in movie_data.values.tolist()]

In [None]:
records

[['The Revenant',
  '13 Hours',
  'Allied',
  'Zootopia',
  'Jigsaw',
  'Achorman',
  'Grinch',
  'Fast and Furious',
  'Ghostbusters',
  'Wolverine',
  'Mad Max',
  'John Wick',
  'La La Land',
  'The Good Dunosaur',
  'Ninja Turtles',
  'The Good Dunosaur Bad Moms',
  '2 Guns',
  'Inside Out',
  'Valerian',
  'Spiderman 3'],
 ['Beirut', 'Martian', 'Get Out'],
 ['Deadpool'],
 ['X-Men', 'Allied'],
 ['Ninja Turtles',
  'Moana',
  'Ghost in the Shell',
  'Ralph Breaks the Internet',
  'John Wick'],
 ['Mad Max'],
 ['The Spy Who Dumped Me', 'Hotel Transylvania'],
 ['Thor', 'London Has Fallen', 'The Lego Movie'],
 ['Intern', 'Tomb Rider', 'John Wick'],
 ['Hotel Transylvania'],
 ['Get Out', 'Suicide Squad'],
 ['Doctor Strange'],
 ['X-Men', 'Beirut', 'Ninja Turtles', 'Get Out', 'Fantastic Beast'],
 ['Tomb Rider', 'Cafe Society', 'Doctor Strange'],
 ['Ninja Turtles', 'The Good Dunosaur Bad Moms'],
 ['Ninja Turtles'],
 ['The Revenant',
  'Coco',
  'Captain America',
  'La La Land',
  'Spiderman

Let's suppose we only want itemsets that occur at least 40 times across all transactions. The minimum support for any itemset is then 40/7501 ≈ 0.0053. Similarly, we set the minimum confidence to 0.2 (20%), which means that Item A must have been bought along with Item B in at least 20% of the transactions in which Item B was bought. Finally, we set the minimum lift to 3.
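The threshold arithmetic can be checked in a couple of lines. This is only a sketch: the variable names are illustrative, and 7501 is the row count reported by `movie_data.shape` earlier.

```python
import math

n_transactions = 7501  # rows in movie_data, as reported by movie_data.shape
min_count = 40         # an itemset must appear in at least this many rows

min_support = min_count / n_transactions
print(round(min_support, 4))  # 0.0053

# Going the other way: the smallest count that satisfies the rounded threshold
print(math.ceil(0.0053 * n_transactions))  # 40
```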

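***The Apriori principle: all subsets of a frequent itemset must also be frequent.*** Equivalently, if an itemset is infrequent, every superset of it is infrequent too, so the algorithm can skip those supersets without counting them.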

<img src="https://miro.medium.com/max/1000/1*3C8TKEtyZHpYbesLwLeCxQ.gif">
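The pruning idea can be sketched as a small level-wise search in plain Python. This is a simplified illustration under stated assumptions, not the implementation used by `apyori`: the function name `frequent_itemsets` and the five toy baskets are hypothetical, and the code favours readability over speed.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items.
    singles = {frozenset([item]) for t in transactions for item in t}
    current = {s: support(s) for s in singles}
    current = {s: v for s, v in current.items() if v >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Join step: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a, b in combinations(list(current), 2)
                      if len(a | b) == k}
        # Prune step: a candidate with any infrequent (k-1)-subset cannot
        # be frequent, so it is dropped before its support is counted.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in current
                             for sub in combinations(c, k - 1))}
        current = {c: support(c) for c in candidates}
        current = {c: v for c, v in current.items() if v >= min_support}
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical toy baskets, not taken from the movie dataset.
toy = [
    {"Jumanji", "Kung Fu Panda", "Tomb Rider"},
    {"Jumanji", "Kung Fu Panda"},
    {"Jumanji", "Tomb Rider"},
    {"Kung Fu Panda", "Tomb Rider"},
    {"Jumanji", "Kung Fu Panda", "Tomb Rider"},
]
found = frequent_itemsets(toy, min_support=0.5)
for itemset, value in sorted(found.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), value)
```

With a minimum support of 0.5, the three single titles and the three pairs are kept, while the full triple (support 0.4) is rejected.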

In [None]:
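# min_support, min_confidence and min_lift are the thresholds chosen above.
# Note: apyori 1.1.2 appears to read only max_length as a size limit, so min_length=2 may have no effect here.
association_rules = apriori(records, min_support=0.0053, min_confidence=0.20, min_lift=3, min_length=2)
association_results = list(association_rules)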

In [None]:
len(association_results)

16

In [None]:
association_results[1]

RelationRecord(items=frozenset({'Star Wars', 'Green Lantern'}), support=0.005865884548726837, ordered_statistics=[OrderedStatistic(items_base=frozenset({'Star Wars'}), items_add=frozenset({'Green Lantern'}), confidence=0.3728813559322034, lift=4.700811850163794)])

In [None]:
association_results

[RelationRecord(items=frozenset({'Green Lantern', 'Red Sparrow'}), support=0.005732568990801226, ordered_statistics=[OrderedStatistic(items_base=frozenset({'Red Sparrow'}), items_add=frozenset({'Green Lantern'}), confidence=0.3006993006993007, lift=3.790832696715049)]),
 RelationRecord(items=frozenset({'Star Wars', 'Green Lantern'}), support=0.005865884548726837, ordered_statistics=[OrderedStatistic(items_base=frozenset({'Star Wars'}), items_add=frozenset({'Green Lantern'}), confidence=0.3728813559322034, lift=4.700811850163794)]),
 RelationRecord(items=frozenset({'Kung Fu Panda', 'Jumanji'}), support=0.015997866951073192, ordered_statistics=[OrderedStatistic(items_base=frozenset({'Kung Fu Panda'}), items_add=frozenset({'Jumanji'}), confidence=0.3234501347708895, lift=3.2919938411349285)]),
 RelationRecord(items=frozenset({'Wonder Woman', 'Jumanji'}), support=0.005332622317024397, ordered_statistics=[OrderedStatistic(items_base=frozenset({'Wonder Woman'}), items_add=frozenset({'Jumanji

In [None]:
# First OrderedStatistic of the first RelationRecord (index 2 is ordered_statistics)
association_results[0][2][0]

OrderedStatistic(items_base=frozenset({'Red Sparrow'}), items_add=frozenset({'Green Lantern'}), confidence=0.3006993006993007, lift=3.790832696715049)

In [None]:
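results = []
for item in association_results:
    # item is a RelationRecord: (items, support, ordered_statistics).
    # Note: a frozenset has no defined order, so 'Title 1'/'Title 2' do not
    # tell you which movie is the rule's base and which is the suggestion.
    items = list(item.items)

    value0 = str(items[0])
    value1 = str(items[1])

    # Keep the first 7 characters of each number for display
    value2 = str(item.support)[:7]

    # Confidence and lift of the first rule derived from this itemset
    value3 = str(item.ordered_statistics[0].confidence)[:7]
    value4 = str(item.ordered_statistics[0].lift)[:7]

    rows = (value0, value1, value2, value3, value4)
    results.append(rows)

labels = ['Title 1','Title 2','Support','Confidence','Lift']
movie_suggestion = pd.DataFrame(results, columns = labels)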

In [None]:
movie_suggestion

Unnamed: 0,Title 1,Title 2,Support,Confidence,Lift
0,Green Lantern,Red Sparrow,0.00573,0.30069,3.79083
1,Star Wars,Green Lantern,0.00586,0.37288,4.70081
2,Kung Fu Panda,Jumanji,0.01599,0.32345,3.29199
3,Wonder Woman,Jumanji,0.00533,0.37735,3.84065
4,The Spy Who Dumped Me,Spiderman 3,0.00799,0.27149,4.12241
5,The Revenant,Coco,0.00533,0.23255,3.25451
6,Jumanji,Tomb Rider,0.00866,0.311,3.16532
7,Ninja Turtles,The Revenant,0.00719,0.30508,3.20061
8,Tomb Rider,Spiderman 3,0.00573,0.20574,3.12402
9,Tomb Rider,The Revenant,0.00599,0.21531,3.01314
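In the table above, Title 1 and Title 2 come from iterating over a frozenset, so their order does not show the direction of the rule. Row 0, for example, lists Green Lantern first although the rule printed earlier is Red Sparrow -> Green Lantern. The sketch below is one possible way to keep the direction by reading `items_base` and `items_add` instead. To stay runnable without `apyori`, it uses namedtuple stand-ins that only mimic the field names visible in the output above; the two sample records are copied from that output, and the function name `rules_to_rows` and the column names are hypothetical.

```python
from collections import namedtuple

# Stand-ins that mimic the field names seen in the apyori output above,
# so this sketch runs without apyori installed.
RelationRecord = namedtuple("RelationRecord", ["items", "support", "ordered_statistics"])
OrderedStatistic = namedtuple("OrderedStatistic", ["items_base", "items_add", "confidence", "lift"])

# Two records copied from the association_results output shown earlier.
sample_results = [
    RelationRecord(
        items=frozenset({"Green Lantern", "Red Sparrow"}),
        support=0.005732568990801226,
        ordered_statistics=[
            OrderedStatistic(frozenset({"Red Sparrow"}), frozenset({"Green Lantern"}),
                             0.3006993006993007, 3.790832696715049),
        ],
    ),
    RelationRecord(
        items=frozenset({"Star Wars", "Green Lantern"}),
        support=0.005865884548726837,
        ordered_statistics=[
            OrderedStatistic(frozenset({"Star Wars"}), frozenset({"Green Lantern"}),
                             0.3728813559322034, 4.700811850163794),
        ],
    ),
]


def rules_to_rows(results):
    """Flatten records into one dict per rule, keeping base -> add direction."""
    rows = []
    for record in results:
        for stat in record.ordered_statistics:
            rows.append({
                "If watched": ", ".join(sorted(stat.items_base)),
                "Then suggest": ", ".join(sorted(stat.items_add)),
                "Support": round(record.support, 5),
                "Confidence": round(stat.confidence, 5),
                "Lift": round(stat.lift, 5),
            })
    return rows


for row in rules_to_rows(sample_results):
    print(row)
```

In the notebook itself, something like `pd.DataFrame(rules_to_rows(association_results))` should give a table with the same information as `movie_suggestion`, with numeric columns instead of truncated strings.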

