<a href="https://colab.research.google.com/github/a-nagar/vistra-intermediate/blob/main/Frequent_Pattern_Association_Rules.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
!pip install mlxtend --upgrade

Collecting mlxtend
  Downloading mlxtend-0.23.0-py3-none-any.whl (1.4 MB)
Installing collected packages: mlxtend
  Attempting uninstall: mlxtend
    Found existing installation: mlxtend 0.22.0
    Uninstalling mlxtend-0.22.0:
      Successfully uninstalled mlxtend-0.22.0
Successfully installed mlxtend-0.23.0


# Transactions Dataset
Let's look at a set of transactions stored as a list of lists, where each inner list holds the items of one transaction.

In [None]:
dataset = [['Milk', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],
           ['Dill', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],
           ['Milk', 'Apple', 'Kidney Beans', 'Eggs'],
           ['Milk', 'Unicorn', 'Corn', 'Kidney Beans', 'Yogurt'],
           ['Corn', 'Onion', 'Onion', 'Kidney Beans', 'Ice cream', 'Eggs']]

In [None]:
type(dataset)

list

## Converting to Transactions Dataframe
Before we can proceed, we need to convert the transaction list into a one-hot encoded dataframe using a `TransactionEncoder` object. Notice the format of the output dataframe: one row per transaction, one boolean column per item.

In [None]:
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder

te = TransactionEncoder()
te_ary = te.fit(dataset).transform(dataset)
df = pd.DataFrame(te_ary, columns=te.columns_)
df

Unnamed: 0,Apple,Corn,Dill,Eggs,Ice cream,Kidney Beans,Milk,Nutmeg,Onion,Unicorn,Yogurt
0,False,False,False,True,False,True,True,True,True,False,True
1,False,False,True,True,False,True,False,True,True,False,True
2,True,False,False,True,False,True,True,False,False,False,False
3,False,True,False,False,False,True,True,False,False,True,True
4,False,True,False,True,True,True,False,False,True,False,False
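To make the encoding step concrete, here is a minimal pure-Python sketch of the same idea. It is not how `TransactionEncoder` is implemented; the helper name `one_hot_encode` is made up for illustration.

```python
# The same five transactions as in the cell above.
dataset = [['Milk', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],
           ['Dill', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],
           ['Milk', 'Apple', 'Kidney Beans', 'Eggs'],
           ['Milk', 'Unicorn', 'Corn', 'Kidney Beans', 'Yogurt'],
           ['Corn', 'Onion', 'Onion', 'Kidney Beans', 'Ice cream', 'Eggs']]

def one_hot_encode(transactions):
    """Hypothetical helper: return (columns, rows) for a list of transactions."""
    # Columns are the sorted unique items across all transactions.
    columns = sorted({item for transaction in transactions for item in transaction})
    # Each row marks which items are present; repeated items count once.
    rows = [[item in set(transaction) for item in columns]
            for transaction in transactions]
    return columns, rows

columns, rows = one_hot_encode(dataset)
print(columns)
print(rows[0])
```

The repeated 'Onion' in the last transaction makes no difference, because the encoding only records presence or absence.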


# Apriori Algorithm
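Let's run the Apriori algorithm with a minimum support threshold. An itemset's support is the fraction of transactions that contain it, so `min_support=0.6` keeps itemsets that appear in at least 3 of the 5 transactions.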

In [None]:
from mlxtend.frequent_patterns import apriori

apriori(df, min_support=0.6, use_colnames=True)

Unnamed: 0,support,itemsets
0,0.8,(Eggs)
1,1.0,(Kidney Beans)
2,0.6,(Milk)
3,0.6,(Onion)
4,0.6,(Yogurt)
5,0.8,"(Kidney Beans, Eggs)"
6,0.6,"(Onion, Eggs)"
7,0.6,"(Kidney Beans, Milk)"
8,0.6,"(Onion, Kidney Beans)"
9,0.6,"(Yogurt, Kidney Beans)"
10,0.6,"(Onion, Kidney Beans, Eggs)"
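To see where these support values come from, here is a small pure-Python sketch that counts support directly and grows itemsets level by level. It is a simplified illustration, not mlxtend's implementation; the function names `support` and `frequent_itemsets_levelwise` are made up for this example.

```python
from itertools import combinations

dataset = [['Milk', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],
           ['Dill', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],
           ['Milk', 'Apple', 'Kidney Beans', 'Eggs'],
           ['Milk', 'Unicorn', 'Corn', 'Kidney Beans', 'Yogurt'],
           ['Corn', 'Onion', 'Onion', 'Kidney Beans', 'Ice cream', 'Eggs']]
transactions = [set(t) for t in dataset]

def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def frequent_itemsets_levelwise(min_support):
    """Simplified level-wise search: build size-k candidates only from
    items that still appear in some frequent itemset of size k-1."""
    items = sorted({item for t in transactions for item in t})
    frequent = {}
    size = 1
    while items:
        level = {}
        for candidate in combinations(items, size):
            value = support(candidate)
            if value >= min_support:
                level[frozenset(candidate)] = value
        if not level:
            break
        frequent.update(level)
        items = sorted({item for itemset in level for item in itemset})
        size += 1
    return frequent

frequent = frequent_itemsets_levelwise(0.6)
for itemset, value in frequent.items():
    print(sorted(itemset), value)
```

The pruning works because an itemset can only be frequent if all of its subsets are frequent, so items that drop out at one level never need to be considered again.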


# Frequent Itemsets
Let's store the frequent itemsets and add a column with the number of items in each one.

In [None]:
frequent_itemsets = apriori(df, min_support=0.6, use_colnames=True)
frequent_itemsets['length'] = frequent_itemsets['itemsets'].apply(lambda x: len(x))
frequent_itemsets

Unnamed: 0,support,itemsets,length
0,0.8,(Eggs),1
1,1.0,(Kidney Beans),1
2,0.6,(Milk),1
3,0.6,(Onion),1
4,0.6,(Yogurt),1
5,0.8,"(Eggs, Kidney Beans)",2
6,0.6,"(Eggs, Onion)",2
7,0.6,"(Milk, Kidney Beans)",2
8,0.6,"(Onion, Kidney Beans)",2
9,0.6,"(Yogurt, Kidney Beans)",2
10,0.6,"(Eggs, Onion, Kidney Beans)",3


In [None]:
frequent_itemsets[ (frequent_itemsets['length'] >= 2) &
                   (frequent_itemsets['support'] >= 0.6) ]

Unnamed: 0,support,itemsets,length
5,0.8,"(Eggs, Kidney Beans)",2
6,0.6,"(Eggs, Onion)",2
7,0.6,"(Milk, Kidney Beans)",2
8,0.6,"(Onion, Kidney Beans)",2
9,0.6,"(Yogurt, Kidney Beans)",2
10,0.6,"(Eggs, Onion, Kidney Beans)",3


# Association Rules
Next, let's find association rules with significant confidence values from the transaction dataset. Here we keep rules whose confidence is at least 0.7.

In [None]:
from mlxtend.frequent_patterns import association_rules

association_rules(frequent_itemsets, metric="confidence", min_threshold=0.7)

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
0,(Eggs),(Kidney Beans),0.8,1.0,0.8,1.0,1.0,0.0,inf
1,(Kidney Beans),(Eggs),1.0,0.8,0.8,0.8,1.0,0.0,1.0
2,(Eggs),(Onion),0.8,0.6,0.6,0.75,1.25,0.12,1.6
3,(Onion),(Eggs),0.6,0.8,0.6,1.0,1.25,0.12,inf
4,(Milk),(Kidney Beans),0.6,1.0,0.6,1.0,1.0,0.0,inf
5,(Onion),(Kidney Beans),0.6,1.0,0.6,1.0,1.0,0.0,inf
6,(Yogurt),(Kidney Beans),0.6,1.0,0.6,1.0,1.0,0.0,inf
7,"(Eggs, Onion)",(Kidney Beans),0.6,1.0,0.6,1.0,1.0,0.0,inf
8,"(Eggs, Kidney Beans)",(Onion),0.8,0.6,0.6,0.75,1.25,0.12,1.6
9,"(Onion, Kidney Beans)",(Eggs),0.6,0.8,0.6,1.0,1.25,0.12,inf
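The metric columns can be reproduced by hand from three support values. The sketch below uses the standard definitions of confidence, lift, leverage and conviction; the helper name `rule_metrics` is made up for illustration and is not part of mlxtend.

```python
import math

def rule_metrics(support_a, support_c, support_ac):
    """Metrics for the rule A -> C given support(A), support(C), support(A and C)."""
    confidence = support_ac / support_a            # P(C | A)
    lift = confidence / support_c                  # 1.0 means A and C look independent
    leverage = support_ac - support_a * support_c  # observed minus expected co-occurrence
    # Conviction divides by (1 - confidence), so it is infinite when confidence is 1.
    if math.isclose(confidence, 1.0):
        conviction = math.inf
    else:
        conviction = (1 - support_c) / (1 - confidence)
    return {"confidence": confidence, "lift": lift,
            "leverage": leverage, "conviction": conviction}

# (Eggs) -> (Onion): support(Eggs) = 0.8, support(Onion) = 0.6, support(both) = 0.6
eggs_to_onion = rule_metrics(0.8, 0.6, 0.6)
print({name: round(value, 2) for name, value in eggs_to_onion.items()})

# (Onion) -> (Eggs): the same itemset, read in the other direction
onion_to_eggs = rule_metrics(0.6, 0.8, 0.6)
print({name: round(value, 2) for name, value in onion_to_eggs.items()})
```

Lift and leverage are symmetric, so both directions of a rule share them, while confidence and conviction depend on which side is the antecedent.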


Let's filter rules by lift this time and add a column with the number of items in each antecedent.

In [None]:
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1.2)
rules["antecedent_len"] = rules["antecedents"].apply(lambda x: len(x))
rules

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction,antecedent_len
0,(Eggs),(Onion),0.8,0.6,0.6,0.75,1.25,0.12,1.6,1
1,(Onion),(Eggs),0.6,0.8,0.6,1.0,1.25,0.12,inf,1
2,"(Eggs, Kidney Beans)",(Onion),0.8,0.6,0.6,0.75,1.25,0.12,1.6,2
3,"(Onion, Kidney Beans)",(Eggs),0.6,0.8,0.6,1.0,1.25,0.12,inf,2
4,(Eggs),"(Onion, Kidney Beans)",0.8,0.6,0.6,0.75,1.25,0.12,1.6,1
5,(Onion),"(Eggs, Kidney Beans)",0.6,0.8,0.6,1.0,1.25,0.12,inf,1


The new column can be used to keep only rules with a sufficient number of items in the antecedent, combined with thresholds on other metrics.

In [None]:
rules[ (rules['antecedent_len'] >= 2) &
       (rules['confidence'] > 0.75) &
       (rules['lift'] > 1.2) ]

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction,antecedent_len
3,"(Onion, Kidney Beans)",(Eggs),0.6,0.8,0.6,1.0,1.25,0.12,inf,2


# FP Growth Algorithm

FP-Growth is a typically faster alternative to the Apriori algorithm that doesn't involve explicit candidate generation.

As per the documentation, "*In particular, and what makes it different from the Apriori frequent pattern mining algorithm, FP-Growth is a frequent pattern mining algorithm that does not require candidate generation. Internally, it uses a so-called FP-tree (frequent pattern tree) data structure without generating the candidate sets explicitly, which makes it particularly attractive for large datasets.*"

In [None]:
from mlxtend.frequent_patterns import fpgrowth

fpgrowth(df, min_support=0.6, use_colnames=True)

Unnamed: 0,support,itemsets
0,1.0,(Kidney Beans)
1,0.8,(Eggs)
2,0.6,(Yogurt)
3,0.6,(Onion)
4,0.6,(Milk)
5,0.8,"(Eggs, Kidney Beans)"
6,0.6,"(Yogurt, Kidney Beans)"
7,0.6,"(Eggs, Onion)"
8,0.6,"(Onion, Kidney Beans)"
9,0.6,"(Eggs, Onion, Kidney Beans)"
10,0.6,"(Milk, Kidney Beans)"


If you just want the maximal patterns, you can use the *fpmax* algorithm.

As per the documentation, "*FP-Max is a variant of FP-Growth, which focuses on obtaining maximal itemsets. An itemset X is said to be maximal if X is frequent and there exists no frequent super-pattern containing X. In other words, a frequent pattern X cannot be a sub-pattern of a larger frequent pattern to qualify for the definition of a maximal itemset.*"

In [None]:
from mlxtend.frequent_patterns import fpmax
fpmax(df, min_support=0.6, use_colnames=True)


Unnamed: 0,support,itemsets
0,0.6,"(Milk, Kidney Beans)"
1,0.6,"(Eggs, Onion, Kidney Beans)"
2,0.6,"(Yogurt, Kidney Beans)"
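The definition of a maximal itemset can be checked with a few lines of plain Python. This sketch simply filters an already computed list of frequent itemsets; it is not how FP-Max works internally, and the helper name `maximal_itemsets` is made up for illustration.

```python
# The frequent itemsets of the toy dataset at min_support=0.6.
frequent = [frozenset(s) for s in [
    {'Eggs'}, {'Kidney Beans'}, {'Milk'}, {'Onion'}, {'Yogurt'},
    {'Eggs', 'Kidney Beans'}, {'Eggs', 'Onion'}, {'Milk', 'Kidney Beans'},
    {'Onion', 'Kidney Beans'}, {'Yogurt', 'Kidney Beans'},
    {'Eggs', 'Onion', 'Kidney Beans'},
]]

def maximal_itemsets(itemsets):
    """Keep only itemsets that are not a proper subset of another frequent itemset."""
    return [s for s in itemsets if not any(s < other for other in itemsets)]

maximal = maximal_itemsets(frequent)
for itemset in maximal:
    print(sorted(itemset))
```

Every single item is dropped because each one is contained in a frequent pair, and three of the pairs are dropped because they are contained in the frequent triple.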


# Working With A Real Dataset
Let's work with a dataset from the UCI repository: https://archive.ics.uci.edu/ml/datasets/online+retail

We will download the file and read it into a pandas DataFrame.
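The cells that load the file and build `basket_sets` are not shown here, so the following is only a sketch of one way the preparation could look. The column names `InvoiceNo`, `Description`, `Quantity` and `Country` come from the UCI dataset; the helper name `build_basket_sets`, the choice of a single country, and the tiny demo frame are assumptions made for illustration.

```python
import pandas as pd

def build_basket_sets(df, country="France"):
    """Hypothetical helper: one row per invoice, one boolean column per item."""
    df = df.dropna(subset=["InvoiceNo", "Description"])
    # Assumption: restrict to one country to keep the item matrix small.
    df = df[df["Country"] == country]
    # Invoice numbers starting with "C" mark cancellations in this dataset.
    df = df[~df["InvoiceNo"].astype(str).str.startswith("C")]
    basket = (df.groupby(["InvoiceNo", "Description"])["Quantity"]
                .sum()
                .unstack(fill_value=0))
    # An item is part of the transaction if its total quantity is positive.
    return basket > 0

# Tiny made-up frame with the same columns, standing in for the real file.
demo = pd.DataFrame({
    "InvoiceNo": ["1", "1", "2", "C3", "4"],
    "Description": ["CUP", "PLATE", "CUP", "CUP", "CUP"],
    "Quantity": [2, 1, 3, -2, 5],
    "Country": ["France", "France", "France", "France", "Germany"],
})
demo_sets = build_basket_sets(demo)
print(demo_sets)
```

With the real data, the frame would come from reading the downloaded spreadsheet (for example with `pd.read_excel` on a local copy), and the result would play the role of `basket_sets` in the cell below.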

In [None]:
frequent_itemsets = apriori(basket_sets, min_support=0.07, use_colnames=True)

In [None]:
frequent_itemsets

Unnamed: 0,support,itemsets
0,0.071429,(4 TRADITIONAL SPINNING TOPS)
1,0.096939,(ALARM CLOCK BAKELIKE GREEN)
2,0.102041,(ALARM CLOCK BAKELIKE PINK)
3,0.094388,(ALARM CLOCK BAKELIKE RED )
4,0.081633,(BAKING SET 9 PIECE RETROSPOT )
...,...,...
85,0.084184,"(SET/6 RED SPOTTY PAPER CUPS, SET/20 RED RETRO..."
86,0.084184,"(SET/6 RED SPOTTY PAPER PLATES, SET/20 RED RET..."
87,0.102041,"(SET/6 RED SPOTTY PAPER CUPS, SET/6 RED SPOTTY..."
88,0.099490,"(SET/6 RED SPOTTY PAPER CUPS, SET/6 RED SPOTTY..."


In [None]:
frequent_itemsets['length'] = frequent_itemsets['itemsets'].apply(lambda x: len(x))
frequent_itemsets.head()

Unnamed: 0,support,itemsets,length
0,0.071429,(4 TRADITIONAL SPINNING TOPS),1
1,0.096939,(ALARM CLOCK BAKELIKE GREEN),1
2,0.102041,(ALARM CLOCK BAKELIKE PINK),1
3,0.094388,(ALARM CLOCK BAKELIKE RED ),1
4,0.081633,(BAKING SET 9 PIECE RETROSPOT ),1


In [None]:
pd.set_option('max_colwidth', 600)
frequent_itemsets[ (frequent_itemsets['length'] >= 2) &
                   (frequent_itemsets['support'] >= 0.07) ].sort_values(by="length", ascending=False)

Unnamed: 0,support,itemsets,length
89,0.081633,"(SET/6 RED SPOTTY PAPER CUPS, SET/6 RED SPOTTY PAPER PLATES, SET/20 RED RETROSPOT PAPER NAPKINS , POSTAGE)",4
88,0.09949,"(SET/6 RED SPOTTY PAPER CUPS, SET/6 RED SPOTTY PAPER PLATES, SET/20 RED RETROSPOT PAPER NAPKINS )",3
87,0.102041,"(SET/6 RED SPOTTY PAPER CUPS, SET/6 RED SPOTTY PAPER PLATES, POSTAGE)",3
86,0.084184,"(SET/6 RED SPOTTY PAPER PLATES, SET/20 RED RETROSPOT PAPER NAPKINS , POSTAGE)",3
85,0.084184,"(SET/6 RED SPOTTY PAPER CUPS, SET/20 RED RETROSPOT PAPER NAPKINS , POSTAGE)",3
84,0.084184,"(PLASTERS IN TIN SPACEBOY, PLASTERS IN TIN WOODLAND ANIMALS, POSTAGE)",3
83,0.084184,"(PLASTERS IN TIN CIRCUS PARADE , PLASTERS IN TIN WOODLAND ANIMALS, POSTAGE)",3
82,0.07398,"(PLASTERS IN TIN CIRCUS PARADE , PLASTERS IN TIN SPACEBOY, POSTAGE)",3
81,0.071429,"(ALARM CLOCK BAKELIKE RED , ALARM CLOCK BAKELIKE GREEN, POSTAGE)",3
74,0.107143,"(SET/6 RED SPOTTY PAPER PLATES, POSTAGE)",2


In [None]:
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1)
rules.head()

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
0,(ALARM CLOCK BAKELIKE GREEN),(ALARM CLOCK BAKELIKE PINK),0.096939,0.102041,0.07398,0.763158,7.478947,0.064088,3.791383
1,(ALARM CLOCK BAKELIKE PINK),(ALARM CLOCK BAKELIKE GREEN),0.102041,0.096939,0.07398,0.725,7.478947,0.064088,3.283859
2,(ALARM CLOCK BAKELIKE RED ),(ALARM CLOCK BAKELIKE GREEN),0.094388,0.096939,0.079082,0.837838,8.642959,0.069932,5.568878
3,(ALARM CLOCK BAKELIKE GREEN),(ALARM CLOCK BAKELIKE RED ),0.096939,0.094388,0.079082,0.815789,8.642959,0.069932,4.916181
4,(ALARM CLOCK BAKELIKE GREEN),(POSTAGE),0.096939,0.765306,0.084184,0.868421,1.134737,0.009996,1.783673


# Lab Assignment
You will use the MovieLens 100K dataset, available from
https://grouplens.org/datasets/movielens/

We will use the version recommended for education and research. The relevant files have already been uploaded to a server, and the cells below read them directly.


In [None]:
import pandas as pd
movies = pd.read_csv("https://an-ml.s3.us-west-1.amazonaws.com/ml-latest-small/movies.csv")


In [None]:
movies.head()


Unnamed: 0,movieId,title,genres
0,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
1,2,Jumanji (1995),Adventure|Children|Fantasy
2,3,Grumpier Old Men (1995),Comedy|Romance
3,4,Waiting to Exhale (1995),Comedy|Drama|Romance
4,5,Father of the Bride Part II (1995),Comedy


In [None]:
ratings = pd.read_csv("https://an-ml.s3.us-west-1.amazonaws.com/ml-latest-small/ratings.csv")


In [None]:
ratings.head()


Unnamed: 0,userId,movieId,rating,timestamp
0,1,1,4.0,964982703
1,1,3,4.0,964981247
2,1,6,4.0,964982224
3,1,47,5.0,964983815
4,1,50,5.0,964982931


Let's join the above two tables on the common key movieId.

In [None]:
df = pd.merge(movies, ratings, on="movieId")


In [None]:
df.head()


Unnamed: 0,movieId,title,genres,userId,rating,timestamp
0,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,1,4.0,964982703
1,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,5,4.0,847434962
2,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,7,4.5,1106635946
3,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,15,2.5,1510577970
4,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,17,4.5,1305696483


The columns of interest to us are `title` and `userId`: each user is a transaction and each movie they rated is an item. Now, repeat the steps from earlier to find significant frequent patterns and association rules. You are free to set the selection parameters.

*Optional* - Do the results make sense? Use your knowledge of movies 😀

In [None]:
basket = (df.loc[:10000,:]
          .groupby(['userId', 'title'])['rating']
          .sum().unstack().reset_index().fillna(0)
          .set_index('userId'))


In [None]:
basket.head()


userId,Ace Ventura: Pet Detective (1994),Ace Ventura: When Nature Calls (1995),"Addiction, The (1995)","Adventures of Priscilla, Queen of the Desert, The (1994)",Amateur (1994),"Amazing Panda Adventure, The (1995)","American President, The (1995)",Angels and Insects (1995),Anne Frank Remembered (1995),Antonia's Line (Antonia) (1995),...,"War, The (1994)",Waterworld (1995),What's Eating Gilbert Grape (1993),When Night Is Falling (1995),While You Were Sleeping (1995),"White Balloon, The (Badkonake sefid) (1995)",White Man's Burden (1995),White Squall (1996),Wild Bill (1995),"Young Poisoner's Handbook, The (1995)"
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,4.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,3.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [None]:
def encode_units(x):
    # Any positive rating means the user rated the movie;
    # half-star ratings such as 0.5 count too.
    return 1 if x > 0 else 0

basket_sets = basket.applymap(encode_units)
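A vectorised comparison is one alternative to an element-wise function here. The sketch below uses a small made-up ratings frame (the movie names and values are placeholders) and treats any rating above zero, including 0.5, as "rated". The result is a boolean frame, which is one of the input formats mlxtend's frequent-pattern functions accept.

```python
import pandas as pd

# Made-up ratings matrix: rows are users, columns are movies, 0 means "not rated".
demo = pd.DataFrame(
    {"Movie A": [0.0, 0.5, 4.0], "Movie B": [3.0, 0.0, 0.0]},
    index=pd.Index([1, 2, 3], name="userId"),
)

# Compare the whole frame at once instead of calling a function per cell.
demo_sets = demo > 0
print(demo_sets)
print(demo_sets.dtypes)
```

Applied to the real data, the same expression would be `basket > 0`.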


In [None]:
basket_sets.fillna(0, inplace=True)


In [None]:
basket_sets


userId,Ace Ventura: Pet Detective (1994),Ace Ventura: When Nature Calls (1995),"Addiction, The (1995)","Adventures of Priscilla, Queen of the Desert, The (1994)",Amateur (1994),"Amazing Panda Adventure, The (1995)","American President, The (1995)",Angels and Insects (1995),Anne Frank Remembered (1995),Antonia's Line (Antonia) (1995),...,"War, The (1994)",Waterworld (1995),What's Eating Gilbert Grape (1993),When Night Is Falling (1995),While You Were Sleeping (1995),"White Balloon, The (Badkonake sefid) (1995)",White Man's Burden (1995),White Squall (1996),Wild Bill (1995),"Young Poisoner's Handbook, The (1995)"
1,0.0,0,0,0,0,0,0,0,0,0,...,0,0.0,0.0,0,0,0,0,0,0,0
2,0.0,0,0,0,0,0,0,0,0,0,...,0,0.0,0.0,0,0,0,0,0,0,0
3,0.0,0,0,0,0,0,0,0,0,0,...,0,0.0,0.0,0,0,0,0,0,0,0
4,0.0,0,0,1,0,0,0,0,0,0,...,0,0.0,0.0,0,0,0,0,0,0,0
5,1.0,0,0,0,0,0,0,0,0,0,...,0,0.0,0.0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
606,0.0,1,0,1,0,0,1,0,0,1,...,0,0.0,1.0,0,0,1,0,0,0,0
607,0.0,0,0,0,0,0,1,0,0,0,...,0,1.0,1.0,0,0,0,0,1,0,0
608,1.0,1,0,0,0,0,0,0,0,0,...,0,1.0,0.0,0,1,0,0,0,0,0
609,0.0,0,0,0,0,0,0,0,1,0,...,0,1.0,0.0,0,1,0,0,0,0,0


In [None]:
from mlxtend.frequent_patterns import apriori
frequent_itemsets = apriori(basket_sets, min_support=0.07, use_colnames=True)


In [None]:
frequent_itemsets


Unnamed: 0,support,itemsets
0,0.271331,(Ace Ventura: Pet Detective (1994))
1,0.150171,(Ace Ventura: When Nature Calls (1995))
2,0.119454,"(American President, The (1995))"
3,0.343003,(Apollo 13 (1995))
4,0.218430,(Babe (1995))
...,...,...
4621,0.076792,"(Twelve Monkeys (a.k.a. 12 Monkeys) (1995), Se..."
4622,0.071672,"(Twelve Monkeys (a.k.a. 12 Monkeys) (1995), St..."
4623,0.075085,"(Star Wars: Episode IV - A New Hope (1977), Se..."
4624,0.083618,"(Twelve Monkeys (a.k.a. 12 Monkeys) (1995), St..."


In [None]:
frequent_itemsets['length'] = frequent_itemsets['itemsets'].apply(lambda x: len(x))
frequent_itemsets.head()


Unnamed: 0,support,itemsets,length
0,0.271331,(Ace Ventura: Pet Detective (1994)),1
1,0.150171,(Ace Ventura: When Nature Calls (1995)),1
2,0.119454,"(American President, The (1995))",1
3,0.343003,(Apollo 13 (1995)),1
4,0.21843,(Babe (1995)),1


In [None]:
pd.set_option('max_colwidth', 600)
frequent_itemsets[ (frequent_itemsets['length'] >= 2) &
                   (frequent_itemsets['support'] >= 0.07) ].sort_values(by="length", ascending=False)


Unnamed: 0,support,itemsets,length
4625,0.073379,"(Twelve Monkeys (a.k.a. 12 Monkeys) (1995), Star Wars: Episode IV - A New Hope (1977), Shawshank Redemption, The (1994), Pulp Fiction (1994), Usual Suspects, The (1995), Toy Story (1995))",6
4599,0.075085,"(Braveheart (1995), Batman Forever (1995), Seven (a.k.a. Se7en) (1995), Shawshank Redemption, The (1994), Pulp Fiction (1994), Dumb & Dumber (Dumb and Dumber) (1994))",6
4597,0.078498,"(Twelve Monkeys (a.k.a. 12 Monkeys) (1995), Braveheart (1995), Apollo 13 (1995), Seven (a.k.a. Se7en) (1995), Shawshank Redemption, The (1994), Pulp Fiction (1994))",6
4596,0.071672,"(Braveheart (1995), Apollo 13 (1995), Seven (a.k.a. Se7en) (1995), Shawshank Redemption, The (1994), Pulp Fiction (1994), Toy Story (1995))",6
4595,0.071672,"(Braveheart (1995), Batman Forever (1995), Waterworld (1995), Apollo 13 (1995), Shawshank Redemption, The (1994), Pulp Fiction (1994))",6
...,...,...,...
464,0.100683,"(Star Trek: Generations (1994), GoldenEye (1995))",2
463,0.151877,"(GoldenEye (1995), Shawshank Redemption, The (1994))",2
462,0.134812,"(GoldenEye (1995), Seven (a.k.a. Se7en) (1995))",2
461,0.172355,"(GoldenEye (1995), Pulp Fiction (1994))",2


In [None]:
from mlxtend.frequent_patterns import association_rules

rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1)
rules.sort_values("confidence", ascending=False)[:100]


Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction,zhangs_metric
53444,"(Twelve Monkeys (a.k.a. 12 Monkeys) (1995), Braveheart (1995), Seven (a.k.a. Se7en) (1995), Léon: The Professional (a.k.a. The Professional) (Léon) (1994), Usual Suspects, The (1995))",(Pulp Fiction (1994)),0.071672,0.520478,0.071672,1.000000,1.921311,0.034368,inf,0.516544
50829,"(Usual Suspects, The (1995), Taxi Driver (1976), Seven (a.k.a. Se7en) (1995), Shawshank Redemption, The (1994))",(Pulp Fiction (1994)),0.078498,0.520478,0.078498,1.000000,1.921311,0.037642,inf,0.520370
11032,"(Batman Forever (1995), Die Hard: With a Vengeance (1995), Ace Ventura: When Nature Calls (1995))",(Ace Ventura: Pet Detective (1994)),0.075085,0.271331,0.075085,1.000000,3.685535,0.054712,inf,0.787823
53382,"(Braveheart (1995), Star Wars: Episode IV - A New Hope (1977), Seven (a.k.a. Se7en) (1995), Léon: The Professional (a.k.a. The Professional) (Léon) (1994), Usual Suspects, The (1995))",(Pulp Fiction (1994)),0.075085,0.520478,0.075085,1.000000,1.921311,0.036005,inf,0.518450
54374,"(Twelve Monkeys (a.k.a. 12 Monkeys) (1995), Seven (a.k.a. Se7en) (1995), Shawshank Redemption, The (1994), Léon: The Professional (a.k.a. The Professional) (Léon) (1994), Usual Suspects, The (1995))",(Pulp Fiction (1994)),0.076792,0.520478,0.076792,1.000000,1.921311,0.036823,inf,0.519409
...,...,...,...,...,...,...,...,...,...,...
40270,"(Ace Ventura: Pet Detective (1994), Dumb & Dumber (Dumb and Dumber) (1994), Shawshank Redemption, The (1994), Braveheart (1995))",(Pulp Fiction (1994)),0.100683,0.520478,0.097270,0.966102,1.856182,0.044867,14.145904,0.512900
28727,"(Léon: The Professional (a.k.a. The Professional) (Léon) (1994), Usual Suspects, The (1995), Braveheart (1995))",(Pulp Fiction (1994)),0.098976,0.520478,0.095563,0.965517,1.855059,0.044048,13.906143,0.511567
41350,"(Ace Ventura: Pet Detective (1994), Dumb & Dumber (Dumb and Dumber) (1994), Seven (a.k.a. Se7en) (1995), Shawshank Redemption, The (1994))",(Pulp Fiction (1994)),0.098976,0.520478,0.095563,0.965517,1.855059,0.044048,13.906143,0.511567
47529,"(Shawshank Redemption, The (1994), Léon: The Professional (a.k.a. The Professional) (Léon) (1994), Seven (a.k.a. Se7en) (1995), Braveheart (1995))",(Pulp Fiction (1994)),0.098976,0.520478,0.095563,0.965517,1.855059,0.044048,13.906143,0.511567
