# Hands-On: Discovering Association Rules with the Apriori Algorithm 🛍

In this hands-on, you will learn how to discover association rules from a dataset of store purchases using the Apriori algorithm. We will use the mlxtend library to run the algorithm and find interesting patterns in the purchases.

## Environment Setup 🛠️

In [None]:
import pandas as pd
import numpy as np
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules
from mlxtend.preprocessing import TransactionEncoder

## Importing and Exploring the Data 📥🔍

The dataset contains store purchases, where each row is one shopping list (one transaction). The file has no header row, so it is read with `header=None`.

In [None]:
data = pd.read_csv('https://raw.githubusercontent.com/devjaynemorais/modelos_descritivos_curso/main/Aula%2001/dados/store_data.csv', header=None)


The Apriori algorithm takes "transactions" as input. To turn our DataFrame into transactions, we need to convert it into a list of lists.

In [None]:
lists = data.values.tolist()


In [None]:
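# Show the unique values of every column that has items with a leading space
for column in data.columns:
  arr = data[column].tolist()

  # NaN cells are floats, so only string cells are checked
  if any(text.startswith(' ') for text in arr if isinstance(text, str)):
    display(data[column].unique())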


array(['cottage cheese', nan, 'eggs', 'toothpaste', 'frozen smoothie',
       'escalope', 'green tea', 'chocolate', 'olive oil', 'magazines',
       'zucchini', 'hot dogs', 'green grapes', 'low fat yogurt',
       'fresh bread', 'chili', 'tomato juice', 'gums', 'chicken',
       'carrots', 'light cream', 'asparagus', 'white wine', 'cooking oil',
       'pancakes', 'salt', 'strawberries', 'whole wheat rice', 'cereals',
       'brownies', 'blueberries', 'rice', 'champagne', 'spinach',
       'mushroom cream sauce', 'almonds', 'cake', 'melons', 'protein bar',
       'salmon', 'tea', 'pasta', 'french fries', 'milk', 'vegetables mix',
       'barbecue sauce', 'french wine', ' asparagus', 'chutney', 'oil',
       'mint', 'butter', 'gluten free bar', 'honey', 'shampoo', 'cookies',
       'mashed potato', 'chocolate bread', 'pet food', 'shallot',
       'green beans', 'extra dark chocolate', 'black tea',
       'whole weat flour', 'energy drink', 'yogurt cake', 'salad',
       'light mayo', 'h

The conversion worked, but the lists contain many NaN values (and a few items with stray leading spaces, such as ' asparagus'), so we need to clean them.


In [None]:
cleaned_lists = []

for buy_list in lists:
  # Keep only real item names; the NaN padding cells are floats, not strings
  result = [text for text in buy_list if isinstance(text, str)]

  # Strip extra whitespace
  result = [text.strip() for text in result]

  # Skip rows that end up empty
  if result:
    cleaned_lists.append(result)


### Creating the "transactions" 🔍🛒

The `apriori` function expects the data as a one-hot encoded pandas DataFrame: one row per transaction, one boolean column per item. To build it, we can use the `TransactionEncoder` class.
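To make the target layout concrete, here is a minimal pure-Python sketch of one-hot encoding on a made-up three-basket example. It only illustrates the shape of the result and is not how `TransactionEncoder` is implemented; the basket contents and the helper name `one_hot` are hypothetical.

```python
def one_hot(baskets):
    """Return (columns, rows): sorted item names and one boolean row per basket."""
    # Every distinct item becomes one column (sorted for a stable order).
    columns = sorted({item for basket in baskets for item in basket})
    # Each basket becomes a row of True/False flags, one flag per column.
    rows = [[item in basket for item in columns] for basket in baskets]
    return columns, rows


# Hypothetical toy data, not taken from store_data.csv.
toy_baskets = [["milk", "eggs"], ["eggs"], ["bread", "milk"]]
columns, rows = one_hot(toy_baskets)

print(columns)
for row in rows:
    print(row)
```

Each row has the same length regardless of how many items the basket held, which is what lets the result be loaded into a DataFrame.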



In [None]:
te = TransactionEncoder()


In [None]:
transactions = te.fit(cleaned_lists).transform(cleaned_lists)


In [None]:
transactions


array([[ True,  True, False, ...,  True, False, False],
       [False, False, False, ..., False, False, False],
       [False, False, False, ..., False, False, False],
       ...,
       [False, False, False, ..., False, False, False],
       [False, False, False, ..., False, False, False],
       [False, False, False, ..., False,  True, False]])

In [None]:
df = pd.DataFrame(transactions, columns=te.columns_)
df.head(3)


| | almonds | antioxydant juice | asparagus | avocado | babies food | bacon | barbecue sauce | black tea | blueberries | body spray | ... | turkey | vegetables mix | water spray | white wine | whole weat flour | whole wheat pasta | whole wheat rice | yams | yogurt cake | zucchini |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | True | True | False | True | False | False | False | False | False | False | ... | False | True | False | False | True | False | False | True | False | False |
| 1 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| 2 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |


### Applying Apriori

An itemset is considered "frequent" if it meets a user-specified support threshold. For example, if the support threshold is set to 0.5 (50%), a frequent itemset is a set of items that occur together in at least 50% of all transactions in the database.
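The 50% example can be sketched on a tiny, made-up set of baskets. The code below simply enumerates every possible itemset and keeps those that reach the threshold. It is a brute-force illustration of the definition, not the level-wise pruning that Apriori performs, and the function name and toy data are hypothetical.

```python
from itertools import combinations


def frequent_itemsets_bruteforce(baskets, min_support):
    """Map each itemset (frozenset) with support >= min_support to its support."""
    baskets = [set(b) for b in baskets]
    items = sorted(set().union(*baskets))
    n = len(baskets)
    frequent = {}
    # Try every combination of items, from single items up to all items.
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            # Count the baskets that contain every item of the candidate.
            count = sum(1 for b in baskets if set(candidate) <= b)
            if count / n >= min_support:
                frequent[frozenset(candidate)] = count / n
    return frequent


# Hypothetical toy data, not taken from store_data.csv.
toy_baskets = [
    ["milk", "eggs"],
    ["milk", "eggs", "bread"],
    ["milk"],
    ["bread"],
]
result = frequent_itemsets_bruteforce(toy_baskets, min_support=0.5)
for itemset, support in result.items():
    print(sorted(itemset), support)
```

Here milk, eggs, bread and the pair (eggs, milk) pass the 50% threshold, while every itemset combining bread with another item appears in only one of the four baskets and is dropped.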

In this case, an itemset must appear in at least 0.45% of the transactions (a support of 0.0045) to be considered frequent.
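As a rough check of what 0.45% means in absolute terms, the threshold can be converted into a minimum number of transactions. The sketch below uses the 7501 rows reported by `df.shape` in this notebook and assumes the comparison against the threshold is inclusive (support greater than or equal to `min_support`).

```python
import math

n_transactions = 7501   # row count reported by df.shape in this notebook
min_support = 0.0045    # threshold passed to apriori()

# Smallest whole number of transactions whose share reaches the threshold
# (assuming an inclusive ">=" comparison).
min_count = math.ceil(min_support * n_transactions)

print(min_count)
print(round(min_count / n_transactions, 6))
```

Under that assumption, an itemset needs to show up in a few dozen baskets out of several thousand, which explains why items as rare as babies food (support 0.004533) still make the cut.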

One way to choose this value is to look at the support of each individual item:

support(item) = (number of transactions containing item) / (total number of transactions)

In [None]:
# Example: support of every individual item

result = df.sum() / len(df.index)

result


almonds              0.020397
antioxydant juice    0.008932
asparagus            0.004799
avocado              0.033329
babies food          0.004533
                       ...   
whole wheat pasta    0.029463
whole wheat rice     0.058526
yams                 0.011465
yogurt cake          0.027330
zucchini             0.009465
Length: 119, dtype: float64

In [None]:
result.mean()


0.03288973234941224

In [None]:
df.shape


(7501, 119)

In [None]:
frequent_itemsets = apriori(df, min_support=0.0045, use_colnames=True)


In [None]:
frequent_itemsets


| | support | itemsets |
|---|---|---|
| 0 | 0.020397 | (almonds) |
| 1 | 0.008932 | (antioxydant juice) |
| 2 | 0.004799 | (asparagus) |
| 3 | 0.033329 | (avocado) |
| 4 | 0.004533 | (babies food) |
| ... | ... | ... |
| 837 | 0.006266 | (spaghetti, mineral water, whole wheat rice) |
| 838 | 0.005066 | (spaghetti, pancakes, olive oil) |
| 839 | 0.004533 | (spaghetti, mineral water, chocolate, eggs) |
| 840 | 0.004933 | (spaghetti, chocolate, mineral water, milk) |


In [None]:
rules = association_rules(frequent_itemsets, metric='lift', min_threshold=3.0)


[Reference - Association Rules](https://www.saedsayad.com/association_rules.htm)

In [None]:
rules


| | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | leverage | conviction | zhangs_metric |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | (chicken) | (light cream) | 0.059992 | 0.015598 | 0.004533 | 0.075556 | 4.843951 | 0.003597 | 1.064858 | 0.844202 |
| 1 | (light cream) | (chicken) | 0.015598 | 0.059992 | 0.004533 | 0.290598 | 4.843951 | 0.003597 | 1.325072 | 0.806131 |
| 2 | (mushroom cream sauce) | (escalope) | 0.019064 | 0.079323 | 0.005733 | 0.300699 | 3.790833 | 0.004220 | 1.316568 | 0.750514 |
| 3 | (escalope) | (mushroom cream sauce) | 0.079323 | 0.019064 | 0.005733 | 0.072269 | 3.790833 | 0.004220 | 1.057349 | 0.799635 |
| 4 | (pasta) | (escalope) | 0.015731 | 0.079323 | 0.005866 | 0.372881 | 4.700812 | 0.004618 | 1.468107 | 0.799853 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 65 | (spaghetti, milk) | (frozen vegetables, mineral water) | 0.035462 | 0.035729 | 0.004533 | 0.127820 | 3.577517 | 0.003266 | 1.105587 | 0.746965 |
| 66 | (frozen vegetables, mineral water) | (spaghetti, milk) | 0.035729 | 0.035462 | 0.004533 | 0.126866 | 3.577517 | 0.003266 | 1.104685 | 0.747172 |
| 67 | (frozen vegetables, milk) | (spaghetti, mineral water) | 0.023597 | 0.059725 | 0.004533 | 0.192090 | 3.216228 | 0.003123 | 1.163836 | 0.705730 |
| 68 | (mineral water, milk) | (spaghetti, frozen vegetables) | 0.047994 | 0.027863 | 0.004533 | 0.094444 | 3.389607 | 0.003195 | 1.073526 | 0.740521 |
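Several metric columns can be recomputed by hand from the three support values of a rule, which is a useful sanity check when reading the output. The sketch below uses the standard definitions of confidence, lift, leverage and conviction with the numbers printed for the first rule, (chicken) → (light cream). Because the inputs are the rounded values shown above, the results agree with the table only up to rounding.

```python
# Values copied from the first rule, (chicken) -> (light cream).
antecedent_support = 0.059992   # share of baskets containing chicken
consequent_support = 0.015598   # share of baskets containing light cream
support = 0.004533              # share of baskets containing both

# confidence: how often the consequent appears when the antecedent does
confidence = support / antecedent_support
# lift: confidence relative to the consequent's baseline frequency
lift = confidence / consequent_support
# leverage: observed joint support minus the support expected under independence
leverage = support - antecedent_support * consequent_support
# conviction: ratio of expected to observed frequency of the rule failing
conviction = (1 - consequent_support) / (1 - confidence)

print(round(confidence, 4), round(lift, 2), round(leverage, 6), round(conviction, 4))
```

A lift well above 1 means the two items appear together far more often than they would if they were bought independently, which is why the rules were filtered with `min_threshold=3.0` on lift.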
