<h1>Association Rule Mining with the Apriori Algorithm</h1>

A very practical and widely used technique for analyzing customer behavior, <br>
shopping baskets, product recommendation and pattern discovery.<br><br>

A method for discovering frequent patterns and relationships between items in large databases.<br>
Commonly applied to:<br>
<ul>
    <li>Shopping-basket analysis → market basket analysis</li>
    <li>Product recommendation → "customers who buy X tend to buy Y"</li>
    <li>Discovering correlated behaviors in logs, systems, etc.</li>
</ul>
How to measure a rule X → Y: the 3 main metrics:

<table border="1">
  <thead>
    <tr>
      <th>Métrica</th>
      <th>Definição</th>
    </tr>
  </thead>
  <tbody>
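    <tr>
      <td>Support</td>
      <td>Fraction of transactions that contain all the items of the rule (X and Y together): support(X ∪ Y)</td>
    </tr>
    <tr>
      <td>Confidence</td>
      <td>Probability of buying Y given that X was bought: support(X ∪ Y) / support(X)</td>
    </tr>
    <tr>
      <td>Lift</td>
      <td>How many times more often X and Y occur together than expected if they were independent: confidence(X → Y) / support(Y)<br>
          Lift > 1 means X and Y occur together more often than expected; lift = 1 is consistent with independence; lift < 1 means they occur together less often than expected.
      </td>
    </tr>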
  </tbody>
</table>
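The sketch below makes the three formulas concrete. It is plain Python applied to five hypothetical baskets; the items and numbers are invented for illustration and do not come from the groceries dataset. Each transaction is a set, and support is the fraction of transactions that contain the whole itemset.

```python
# Five hypothetical baskets (invented for illustration)
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(x, y, transactions):
    """Estimate of P(Y | X): support(X ∪ Y) / support(X)."""
    return support(set(x) | set(y), transactions) / support(x, transactions)


def lift(x, y, transactions):
    """confidence(X -> Y) / support(Y); 1.0 is what independence would give."""
    return confidence(x, y, transactions) / support(y, transactions)


print(round(support({"bread", "milk"}, transactions), 4))       # 0.6
print(round(confidence({"bread"}, {"milk"}, transactions), 4))  # 0.75
print(round(lift({"bread"}, {"milk"}, transactions), 4))        # 0.9375
```

The rule bread → milk has a confidence of 0.75, but its lift is below 1. Milk is already in 80% of the baskets, so buying bread does not make milk more likely. For this reason confidence is usually read together with lift.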
PCA (Principal Component Analysis) is a dimensionality reduction method.<br>
It transforms a set of correlated variables into <br>
a new set of uncorrelated variables, <br>
called principal components.<br>
These components capture most of the variance in the data <br>
using as few dimensions as possible.<br>

PCA helps you simplify complex data while keeping as much information as possible.<br><br>

<h3>Apriori Algorithm</h3>
✅ Apriori is the most classic and popular algorithm for generating frequent itemsets<br> 
→ groups of items that appear together with support above a minimum threshold.<br>
✅ Once you have the frequent itemsets, you generate the association rules from them.<br>
✅ Apriori works iteratively:<br><br>
1️⃣ First it finds the frequent individual items.<br>
2️⃣ Then it extends them to frequent pairs.<br>
3️⃣ Then to frequent triples.<br>
4️⃣ And so on, always discarding combinations that do not reach the minimum support <br>
(this is the "Apriori" property behind the name: if a subset is not frequent, <br>
no larger set containing it will be frequent either).<br><br>
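The level-wise search and pruning described above can be sketched in plain Python. This version is simplified and unoptimized: it rescans every transaction for each candidate. It is not how mlxtend's `apriori` is implemented internally. The function name `apriori_sketch` and the toy baskets are hypothetical.

```python
from itertools import combinations


def apriori_sketch(transactions, min_support):
    """Simplified level-wise Apriori: returns {frozenset itemset: support}."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent individual items
    items = {item for t in transactions for item in t}
    level = {}
    for item in items:
        candidate = frozenset([item])
        s = support(candidate)
        if s >= min_support:
            level[candidate] = s
    frequent = dict(level)

    k = 2
    while level:
        previous = set(level)
        # Join: combine frequent (k-1)-itemsets into k-item candidates
        candidates = {a | b for a in previous for b in previous if len(a | b) == k}
        # Prune (Apriori property): drop candidates with an infrequent (k-1)-subset
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in previous for sub in combinations(c, k - 1))
        }
        # Count support only for the candidates that survived pruning
        level = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                level[c] = s
        frequent.update(level)
        k += 1
    return frequent


# Hypothetical baskets, for illustration only
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]
result = apriori_sketch(transactions, min_support=0.4)
for itemset, s in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

With `min_support=0.4`, the pair {milk, butter} (support 0.2) is not frequent. The only 3-item candidate, {bread, butter, milk}, is therefore pruned before its support is ever counted.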

<h3>🚀 Typical workflow:</h3>

1️⃣ Preprocess the data → transactional table (1 = item present, 0 = item absent)<br>
2️⃣ Run Apriori → get the frequent itemsets<br>
3️⃣ Generate the association rules → with support, confidence and lift<br>
4️⃣ Analyze the rules → filter the most interesting ones<br>
5️⃣ Apply them to the business → recommendation, cross-selling, etc.

In [12]:
# ! pip install mlxtend

<h2>Setting up the environment</h2>

In [22]:
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules


<h2>Loading the grocery purchases dataset</h2>

In [24]:
# Each row is one transaction; each numbered column holds one item of the basket
data = pd.read_csv('groceries.csv')
data.head()

Unnamed: 0,1,2,3,4,5,6,7,8,9
0,citrus fruit,semi-finished bread,margarine,ready soups,,,,,
1,tropical fruit,yogurt,coffee,,,,,,
2,whole milk,,,,,,,,
3,pip fruit,yogurt,cream cheese,meat spreads,,,,,
4,other vegetables,whole milk,condensed milk,long life bakery product,,,,,


<h2>Data conversion: indicator columns flagging whether each item was bought or not</h2>

In [26]:

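# One boolean column per (column, item) pair: the same product in different
# positions becomes different columns, e.g. '1_whole milk' and '2_whole milk'
basket_sets = pd.get_dummies(data)
basket_sets.head()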

Unnamed: 0,1_Instant food products,1_UHT-milk,1_artif. sweetener,1_baby cosmetics,1_bags,1_baking powder,1_bathroom cleaner,1_beef,1_berries,1_beverages,...,9_sweet spreads,9_tea,9_vinegar,9_waffles,9_whipped/sour cream,9_white bread,9_white wine,9_whole milk,9_yogurt,9_zwieback
0,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
1,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
2,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
3,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
4,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
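On this layout, `pd.get_dummies` encodes the position of an item as well as its name. As a result, the supports computed below are per (position, item) pair rather than per product.

The sketch below shows item-level encoding in plain Python, using the first five rows printed by `data.head()`:
- Empty cells are written as `None` here. In the DataFrame they are `NaN` and would need to be dropped first, for example with `[row.dropna().tolist() for _, row in data.iterrows()]`.
- mlxtend also provides `TransactionEncoder` (in `mlxtend.preprocessing`) for this step.
- The helper name `one_hot_by_item` is hypothetical.

```python
# First five transactions from data.head(); empty cells shown as None
rows = [
    ["citrus fruit", "semi-finished bread", "margarine", "ready soups", None],
    ["tropical fruit", "yogurt", "coffee", None, None],
    ["whole milk", None, None, None, None],
    ["pip fruit", "yogurt", "cream cheese", "meat spreads", None],
    ["other vegetables", "whole milk", "condensed milk", "long life bakery product", None],
]


def one_hot_by_item(rows):
    """Return (items, table): one boolean column per distinct item, ignoring position."""
    transactions = [{item for item in row if item} for row in rows]
    items = sorted(set().union(*transactions))
    table = [{item: item in t for item in items} for t in transactions]
    return items, table


items, table = one_hot_by_item(rows)

# What positional dummies (as produced by pd.get_dummies above) look like instead
positional = sorted(
    {f"{pos}_{item}" for row in rows for pos, item in enumerate(row, start=1) if item}
)

print(len(items), "item columns vs", len(positional), "positional columns")
print([c for c in positional if c.endswith("whole milk")])
```

With item-level encoding, "whole milk" is one column bought in 2 of the 5 baskets. The positional encoding splits it into `1_whole milk` and `2_whole milk`.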


<h2>Computing support</h2>

In [28]:
# Without use_colnames=True, itemsets are reported as column indices
apriori(basket_sets, min_support=0.02)

Unnamed: 0,support,itemsets
0,0.030421,(7)
1,0.034951,(17)
2,0.029126,(23)
3,0.049191,(26)
4,0.064401,(47)
5,0.04466,(83)
6,0.024272,(90)
7,0.040453,(92)
8,0.038835,(99)
9,0.033981,(100)


In [30]:
apriori(basket_sets, min_support=0.02, use_colnames=True)

Unnamed: 0,support,itemsets
0,0.030421,(1_beef)
1,0.034951,(1_canned beer)
2,0.029126,(1_chicken)
3,0.049191,(1_citrus fruit)
4,0.064401,(1_frankfurter)
5,0.04466,(1_other vegetables)
6,0.024272,(1_pip fruit)
7,0.040453,(1_pork)
8,0.038835,(1_rolls/buns)
9,0.033981,(1_root vegetables)


In [52]:
# Lower the support threshold to find longer itemsets, then keep those with 3 or more items
frequent_itemsets = apriori(basket_sets, min_support=0.002, use_colnames=True)
frequent_itemsets['Length'] = frequent_itemsets['itemsets'].apply(len)
frequent_itemsets[frequent_itemsets['Length'] >= 3]


Unnamed: 0,support,itemsets,Length
820,0.002589,"(3_other vegetables, 2_root vegetables, 1_beef)",3
821,0.002589,"(1_chicken, 2_other vegetables, 3_whole milk)",3
822,0.002589,"(2_other vegetables, 1_citrus fruit, 3_whole m...",3
823,0.003236,"(1_citrus fruit, 2_tropical fruit, 3_pip fruit)",3
824,0.002589,"(1_citrus fruit, 3_other vegetables, 4_whole m...",3
825,0.002265,"(1_frankfurter, 6_whole milk, 5_other vegetables)",3
826,0.002265,"(3_other vegetables, 4_whole milk, 1_pork)",3
827,0.00356,"(1_root vegetables, 2_other vegetables, 3_whol...",3
828,0.002589,"(1_sausage, 2_rolls/buns, 3_soda)",3
829,0.002265,"(1_sausage, 4_whole milk, 3_other vegetables)",3


<h2>Association Rules</h2>

<h3>Confidence</h3>

In [58]:
# Keep rules whose confidence is at least 0.5
rules = association_rules(frequent_itemsets, metric='confidence', min_threshold=0.5)
rules.head()

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,representativity,leverage,conviction,zhangs_metric,jaccard,certainty,kulczynski
0,(2_sausage),(1_frankfurter),0.011327,0.064401,0.011327,1.0,15.527638,1.0,0.010597,inf,0.946318,0.175879,1.0,0.58794
1,(7_pastry),(1_frankfurter),0.005178,0.064401,0.002589,0.5,7.763819,1.0,0.002256,1.871197,0.875732,0.038647,0.465583,0.270101
2,(2_ham),(1_sausage),0.00712,0.076052,0.004531,0.636364,8.367505,1.0,0.003989,2.540858,0.886804,0.057613,0.606432,0.347969
3,(2_meat),(1_sausage),0.006796,0.076052,0.004854,0.714286,9.392097,1.0,0.004338,3.233819,0.899642,0.062241,0.690768,0.389058
4,(3_beef),(1_sausage),0.004854,0.076052,0.002589,0.533333,7.012766,1.0,0.00222,1.979889,0.861585,0.033058,0.494921,0.283688
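`association_rules` derives rules from the frequent itemsets. Each frequent itemset is split into an antecedent and a consequent, and confidence is computed as support(itemset) / support(antecedent).

Below is a simplified plain-Python sketch of that idea:
- It assumes `frequent` maps every frequent itemset (as a `frozenset`) to its support.
- The function name and the toy supports are hypothetical.
- mlxtend's own implementation differs in its details.

```python
from itertools import combinations


def generate_rules(frequent, min_confidence):
    """Return (antecedent, consequent, support, confidence, lift) tuples.

    Assumes `frequent` contains every frequent itemset, so by the Apriori
    property all subsets of a frequent itemset are also keys of `frequent`.
    """
    rules = []
    for itemset, sup in frequent.items():
        # Try every non-empty proper subset as the antecedent
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                consequent = itemset - antecedent
                conf = sup / frequent[antecedent]
                if conf >= min_confidence:
                    lift = conf / frequent[consequent]
                    rules.append((antecedent, consequent, sup, conf, lift))
    return rules


# Hypothetical frequent itemsets with their supports
frequent = {
    frozenset({"bread"}): 0.8,
    frozenset({"milk"}): 0.8,
    frozenset({"butter"}): 0.4,
    frozenset({"bread", "milk"}): 0.6,
    frozenset({"bread", "butter"}): 0.4,
}

for a, c, sup, conf, lift in generate_rules(frequent, min_confidence=0.5):
    print(sorted(a), "->", sorted(c), f"support={sup} confidence={conf:.2f} lift={lift:.2f}")
```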


<h3>Lift</h3>

In [62]:
# Keep rules with lift >= 1 (X and Y co-occur at least as often as expected under independence)
rules = association_rules(frequent_itemsets, metric='lift', min_threshold=1)
rules.head()

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,representativity,leverage,conviction,zhangs_metric,jaccard,certainty,kulczynski
0,(1_beef),(2_citrus fruit),0.030421,0.028803,0.005502,0.180851,6.278986,1.0,0.004625,1.185618,0.867117,0.10241,0.156558,0.185931
1,(2_citrus fruit),(1_beef),0.028803,0.030421,0.005502,0.191011,6.278986,1.0,0.004625,1.198508,0.865672,0.10241,0.165629,0.185931
2,(1_beef),(2_other vegetables),0.030421,0.0589,0.003236,0.106383,1.806173,1.0,0.001444,1.053136,0.460347,0.037594,0.050455,0.080664
3,(2_other vegetables),(1_beef),0.0589,0.030421,0.003236,0.054945,1.806173,1.0,0.001444,1.02595,0.474278,0.037594,0.025294,0.080664
4,(1_beef),(2_root vegetables),0.030421,0.036893,0.005502,0.180851,4.902016,1.0,0.004379,1.175741,0.820977,0.089005,0.149472,0.164987
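Rows 0 and 1 above are the same pair of items in opposite directions. The confidence changes (0.180851 vs 0.191011), but the lift is identical, because lift = support(X ∪ Y) / (support(X) · support(Y)) is symmetric in X and Y.

Here is a quick check in plain Python using the rounded supports printed in the table. Because of the rounding, the results match the output only to about three decimal places.

```python
# Supports copied from rows 0 and 1 of the output above (rounded to 6 decimals)
sup_beef = 0.030421    # support of (1_beef)
sup_citrus = 0.028803  # support of (2_citrus fruit)
sup_both = 0.005502    # support of {1_beef, 2_citrus fruit}

conf_beef_to_citrus = sup_both / sup_beef
conf_citrus_to_beef = sup_both / sup_citrus

lift_beef_to_citrus = conf_beef_to_citrus / sup_citrus
lift_citrus_to_beef = conf_citrus_to_beef / sup_beef

print(f"confidence: {conf_beef_to_citrus:.4f} vs {conf_citrus_to_beef:.4f}")
print(f"lift:       {lift_beef_to_citrus:.3f} vs {lift_citrus_to_beef:.3f}")
```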


<h3>Lift & Confidence</h3>

In [66]:
# Strong rules: lift of at least 5 and confidence of at least 0.5
rules[(rules['lift'] >= 5) & (rules['confidence'] >= 0.5)]

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,representativity,leverage,conviction,zhangs_metric,jaccard,certainty,kulczynski
93,(2_sausage),(1_frankfurter),0.011327,0.064401,0.011327,1.000000,15.527638,1.0,0.010597,inf,0.946318,0.175879,1.000000,0.587940
136,(7_pastry),(1_frankfurter),0.005178,0.064401,0.002589,0.500000,7.763819,1.0,0.002256,1.871197,0.875732,0.038647,0.465583,0.270101
239,(2_ham),(1_sausage),0.007120,0.076052,0.004531,0.636364,8.367505,1.0,0.003989,2.540858,0.886804,0.057613,0.606432,0.347969
243,(2_meat),(1_sausage),0.006796,0.076052,0.004854,0.714286,9.392097,1.0,0.004338,3.233819,0.899642,0.062241,0.690768,0.389058
259,(3_beef),(1_sausage),0.004854,0.076052,0.002589,0.533333,7.012766,1.0,0.002220,1.979889,0.861585,0.033058,0.494921,0.283688
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
958,"(4_root vegetables, 6_whole milk)",(5_other vegetables),0.003883,0.012621,0.003236,0.833333,66.025641,1.0,0.003187,5.924272,0.988694,0.243902,0.831203,0.544872
959,"(4_root vegetables, 5_other vegetables)",(6_whole milk),0.005178,0.009385,0.003236,0.625000,66.594828,1.0,0.003188,2.641640,0.990111,0.285714,0.621447,0.484914
964,"(7_butter, 6_whole milk)",(5_other vegetables),0.002913,0.012621,0.002265,0.777778,61.623932,1.0,0.002229,4.443204,0.986646,0.170732,0.774937,0.478632
965,"(7_butter, 5_other vegetables)",(6_whole milk),0.002589,0.009385,0.002265,0.875000,93.232759,1.0,0.002241,7.924919,0.991842,0.233333,0.873816,0.558190


In [70]:
# Inspect a single rule (the row at position 778) with all of its metrics
rules.iloc[778]

antecedents                 (7_whole milk)
consequents           (6_other vegetables)
antecedent support                0.004207
consequent support                 0.00712
support                           0.003883
confidence                        0.923077
lift                             129.65035
representativity                       1.0
leverage                          0.003854
conviction                       12.907443
zhangs_metric                     0.996479
jaccard                           0.521739
certainty                         0.922525
kulczynski                        0.734266
Name: 778, dtype: object
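Several metrics in this row can be recomputed from the three supports, using the definitions given in mlxtend's `association_rules` documentation. The sketch below uses the rounded values printed above, so it agrees with mlxtend's output only approximately.

It also shows why the lift is so high. Both itemsets appear in less than 1% of transactions, and the rule covers only about 0.39% of them. A small number of co-occurrences is therefore enough to produce a lift near 130.

```python
import math

# Values copied from rules.iloc[778] above (rounded in the display)
sup_a = 0.004207   # antecedent support: (7_whole milk)
sup_c = 0.00712    # consequent support: (6_other vegetables)
sup_ac = 0.003883  # support of the whole rule

confidence = sup_ac / sup_a
lift = confidence / sup_c
leverage = sup_ac - sup_a * sup_c
conviction = (1 - sup_c) / (1 - confidence)
jaccard = sup_ac / (sup_a + sup_c - sup_ac)

reported = {
    "confidence": 0.923077,
    "lift": 129.65035,
    "leverage": 0.003854,
    "conviction": 12.907443,
    "jaccard": 0.521739,
}
recomputed = {
    "confidence": confidence,
    "lift": lift,
    "leverage": leverage,
    "conviction": conviction,
    "jaccard": jaccard,
}
for name, value in recomputed.items():
    # Rounding of the displayed supports limits agreement to roughly 3 significant digits
    ok = math.isclose(value, reported[name], rel_tol=5e-3)
    print(f"{name:>10}: recomputed {value:.6f}, reported {reported[name]}, close={ok}")
```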