## Association rules

Association rules describe relationship patterns within a set of items, in which the presence of X implies the presence of Y (X --> Y). They have many uses, such as analyzing customer behavior, planning marketing campaigns, managing inventory, and defining a product catalog.

Several measures can be computed when working with association rules, such as support, confidence, and lift. They help define criteria for generating rules, for example keeping only the rules that meet a minimum support. This matters because the number of possible rules grows exponentially with the number of items in the dataset.
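To get a feel for that growth, the sketch below counts how many rules X --> Y can be formed from d items when X and Y are non-empty and disjoint. The closed form 3^d - 2^(d+1) + 1 is the standard textbook count; the function names are illustrative only, and the brute-force version is there just to check the formula on small inputs.

```python
from itertools import combinations


def count_rules_closed_form(d: int) -> int:
    """Number of rules X -> Y with X, Y non-empty and disjoint, over d items."""
    return 3**d - 2 ** (d + 1) + 1


def count_rules_brute_force(d: int) -> int:
    """Enumerate every itemset with 2+ items and count its possible splits."""
    total = 0
    for k in range(2, d + 1):
        for _itemset in combinations(range(d), k):
            # Any non-empty proper subset of the itemset can be the antecedent;
            # the remaining items form the consequent.
            total += 2**k - 2
    return total


for d in (3, 6, 10, 20):
    print(f"{d} items -> {count_rules_closed_form(d)} possible rules")
```

With only 20 items the count is already in the billions, which is why a minimum support is applied before any rule is generated.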

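**Support** measures the probability that a given rule occurs in the dataset, that is, the fraction of transactions that contain every item of X and Y.<br>

**Support** = Sup(X -> Y) = P(X ∪ Y) = σ(X ∪ Y)/n, where σ(X ∪ Y) is the number of transactions that contain all items of X and Y, and n is the total number of transactions.<br>
It is used to discard the least interesting rules.<br>

**Confidence** measures how often the consequent occurs in the transactions that contain the antecedent.<br>
Conf(X -> Y) = P(Y|X) = σ(X ∪ Y)/σ(X)<br>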

**Lift** measures the strength of the association between the antecedent and the consequent, taking into account how frequent both are in the dataset. It is computed by dividing the confidence of the rule by the support of the consequent.

Lift compares the observed strength of the association with the strength expected if X and Y were independent. A lift greater than 1 indicates a positive association, meaning the occurrence of the antecedent increases the probability of the consequent. A lift close to 1 indicates independence, and a lift below 1 indicates a negative association.

Lift(X -> Y) = Conf(X -> Y)/Sup(Y) = n · σ(X ∪ Y)/(σ(X) · σ(Y)), where Sup(Y) = σ(Y)/n.<br>
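As a worked example of the three measures, the sketch below computes them by hand on five made-up transactions. The data and the helper names (`sigma`, `support`, `confidence`, `lift`) are assumptions for illustration and are not part of the dataset used later.

```python
# Toy transactions, invented for this example only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "eggs"},
]


def sigma(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def support(itemset, transactions):
    return sigma(itemset, transactions) / len(transactions)


def confidence(X, Y, transactions):
    return sigma(X | Y, transactions) / sigma(X, transactions)


def lift(X, Y, transactions):
    return confidence(X, Y, transactions) / support(Y, transactions)


X, Y = {"bread"}, {"milk"}
print(f"support    = {support(X | Y, transactions):.2f}")
print(f"confidence = {confidence(X, Y, transactions):.2f}")
print(f"lift       = {lift(X, Y, transactions):.2f}")
```

Here bread and milk appear together in 3 of 5 transactions (support 0.60), and 3 of the 4 bread baskets also contain milk (confidence 0.75). Since milk alone is in 4 of 5 baskets, the lift is 0.75/0.80, slightly below 1, so in this toy data buying bread does not make milk more likely.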





### Steps


1. Load the libraries
2. Import and inspect the data
3. Preprocessing
   * Data cleaning
   * Data transformation
4. Preparing the model
5. Creating the association rules

### Load the libraries

In [None]:
!pip install mlxtend



In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
from mlxtend.frequent_patterns import apriori, association_rules
from mlxtend.preprocessing import TransactionEncoder


### Import the data

In [None]:
url = 'https://raw.githubusercontent.com/higoramario/univesp-com360-mineracao-dados/main/market-basket-optimisation.csv'



In [None]:
df = pd.read_csv(url, header=None)



### Data preprocessing

In [None]:
df.head()



Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19
0,shrimp,almonds,avocado,vegetables mix,green grapes,whole weat flour,yams,cottage cheese,energy drink,tomato juice,low fat yogurt,green tea,honey,salad,mineral water,salmon,antioxydant juice,frozen smoothie,spinach,olive oil
1,burgers,meatballs,eggs,,,,,,,,,,,,,,,,,
2,chutney,,,,,,,,,,,,,,,,,,,
3,turkey,avocado,,,,,,,,,,,,,,,,,,
4,mineral water,milk,energy bar,whole wheat rice,green tea,,,,,,,,,,,,,,,


In [None]:
len(df)



7501

Data cleaning. First, remove leading and trailing whitespace from each item name.

In [None]:
for index in df.columns:
  df[index] = df[index].str.strip()



Combine all the values in the table to find how many distinct items it contains.

In [None]:
itens = df.melt()['value'].dropna().unique()
print(f"Total number of distinct items in the DataFrame = {len(itens)}")

Total number of distinct items in the DataFrame = 119



#### Identify the records with more than one item

Association rules require records that contain more than one item. Count the non-null items in each row.

In [None]:
cesta_item = df.notna().sum(axis=1)



In [None]:
cesta_item.head()



Unnamed: 0,0
0,20
1,3
2,1
3,2
4,5


Filter the records with more than one item to use in the association rules.

In [None]:
# Boolean mask: True for baskets with more than one item
filtro = cesta_item > 1


In [None]:
df1 = df[filtro]


Build one list per transaction from the rows of the filtered DataFrame, dropping the null values.

In [None]:
lista_itens = [list(item.dropna()) for _, item in df1.iterrows()]
print(lista_itens)



[['shrimp', 'almonds', 'avocado', 'vegetables mix', 'green grapes', 'whole weat flour', 'yams', 'cottage cheese', 'energy drink', 'tomato juice', 'low fat yogurt', 'green tea', 'honey', 'salad', 'mineral water', 'salmon', 'antioxydant juice', 'frozen smoothie', 'spinach', 'olive oil'], ['burgers', 'meatballs', 'eggs'], ['turkey', 'avocado'], ['mineral water', 'milk', 'energy bar', 'whole wheat rice', 'green tea'], ['whole wheat pasta', 'french fries'], ['soup', 'light cream', 'shallot'], ['frozen vegetables', 'spaghetti', 'green tea'], ['eggs', 'pet food'], ['turkey', 'burgers', 'mineral water', 'eggs', 'cooking oil'], ['spaghetti', 'champagne', 'cookies'], ['mineral water', 'salmon'], ['shrimp', 'chocolate', 'chicken', 'honey', 'oil', 'cooking oil', 'low fat yogurt'], ['turkey', 'eggs'], ['turkey', 'fresh tuna', 'tomatoes', 'spaghetti', 'mineral water', 'black tea', 'salmon', 'eggs', 'chicken', 'extra dark chocolate'], ['meatballs', 'milk', 'honey', 'french fries', 'protein bar'], ['r

### Preparing the model

Create a one-hot encoded DataFrame with `TransactionEncoder`.

In [None]:
te = TransactionEncoder()
te_ary = te.fit(lista_itens).transform(lista_itens)
df2 = pd.DataFrame(te_ary, columns=te.columns_)



In [None]:
df2.head()



Unnamed: 0,almonds,antioxydant juice,asparagus,avocado,babies food,bacon,barbecue sauce,black tea,blueberries,body spray,...,turkey,vegetables mix,water spray,white wine,whole weat flour,whole wheat pasta,whole wheat rice,yams,yogurt cake,zucchini
0,True,True,False,True,False,False,False,False,False,False,...,False,True,False,False,True,False,False,True,False,False
1,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
2,False,False,False,True,False,False,False,False,False,False,...,True,False,False,False,False,False,False,False,False,False
3,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,True,False,False,False
4,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,True,False,False,False,False
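The encoding shown in the output above can be pictured with a few lines of plain Python. This is only a sketch of the idea, not how `TransactionEncoder` is implemented; the three short baskets reuse item names from the dataset but are chosen for illustration.

```python
# Three small baskets, for illustration only.
transactions = [
    ["burgers", "meatballs", "eggs"],
    ["turkey", "avocado"],
    ["eggs", "turkey"],
]

# One column per distinct item, sorted alphabetically as in the output above.
columns = sorted({item for t in transactions for item in t})

# One row per transaction: True where the item is in the basket.
one_hot = [[item in t for item in columns] for t in transactions]

print(columns)
for row in one_hot:
    print(row)
```

Each row has as many entries as there are distinct items, which is why the real DataFrame is wide and mostly `False`.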


Create the list of frequent itemsets that meet a minimum support.

In [None]:
min_support = 0.02
frequent_itemsets = apriori(df2, min_support= min_support, use_colnames=True)
frequent_itemsets.sort_values(by=['support'], ascending=False)



Unnamed: 0,support,itemsets
35,0.294936,(mineral water)
13,0.218897,(eggs)
50,0.218201,(spaghetti)
9,0.201148,(chocolate)
17,0.200104,(french fries)
...,...,...
145,0.020706,"(spaghetti, mineral water, chocolate)"
147,0.020532,"(spaghetti, mineral water, milk)"
56,0.020358,(white wine)
106,0.020358,"(spaghetti, frozen smoothie)"
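To make clear what a table of frequent itemsets contains, the sketch below finds them by brute force on toy data: it tests every combination of up to `max_len` items and keeps those whose support reaches the threshold. This is deliberately not the Apriori algorithm, which prunes candidates instead of testing all of them. The function name, the toy baskets, and the 0.4 threshold are assumptions for illustration.

```python
from itertools import combinations

# Toy transactions, invented for this example only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "eggs"},
]


def frequent_itemsets_brute_force(transactions, min_support, max_len=3):
    """Return {itemset: support} for every itemset meeting min_support."""
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    result = {}
    for k in range(1, max_len + 1):
        for candidate in combinations(items, k):
            itemset = frozenset(candidate)
            # Support = fraction of transactions containing the whole itemset.
            sup = sum(1 for t in transactions if itemset <= t) / n
            if sup >= min_support:
                result[itemset] = sup
    return result


found = frequent_itemsets_brute_force(transactions, min_support=0.4)
for itemset, sup in sorted(found.items(), key=lambda kv: -kv[1]):
    print(f"{sup:.2f}  {sorted(itemset)}")
```

With 119 items, testing every combination this way would be impractical, which is where an algorithm such as Apriori comes in.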


### Creating the association rules

Generate the association rules from the frequent itemsets that met the minimum support above. <br>
**metric:** the metric used to evaluate the rules, chosen from: <br>
*   'support'
*   'confidence'
*   'lift'
*   'leverage'
*   'conviction'
*   'zhangs_metric'

**min_threshold:** the minimum value of the chosen metric; it works as a filter.

In [None]:
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.2)




In [None]:
rules



Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction,zhangs_metric
0,(burgers),(chocolate),0.110318,0.201148,0.022272,0.201893,1.003700,0.000082,1.000933,0.004144
1,(burgers),(eggs),0.110318,0.218897,0.037585,0.340694,1.556414,0.013436,1.184735,0.401826
2,(burgers),(french fries),0.110318,0.200104,0.028711,0.260252,1.300583,0.006635,1.081309,0.259772
3,(burgers),(green tea),0.110318,0.159910,0.022795,0.206625,1.292135,0.005154,1.058881,0.254121
4,(burgers),(milk),0.110318,0.163390,0.023317,0.211356,1.293574,0.005292,1.060822,0.255089
...,...,...,...,...,...,...,...,...,...,...
93,"(spaghetti, mineral water)",(ground beef),0.077954,0.124587,0.022272,0.285714,2.293296,0.012560,1.225579,0.611625
94,"(ground beef, mineral water)",(spaghetti),0.053419,0.218201,0.022272,0.416938,1.910800,0.010616,1.340851,0.503559
95,"(spaghetti, mineral water)",(milk),0.077954,0.163390,0.020532,0.263393,1.612054,0.007796,1.135762,0.411773
96,"(spaghetti, milk)",(mineral water),0.046285,0.294936,0.020532,0.443609,1.504083,0.006881,1.267209,0.351408
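The metric columns in the table above can be recomputed from just three numbers per rule: the antecedent support, the consequent support, and the rule support. The sketch below uses the commonly cited definitions of each metric, not the library's source code, and `rule_metrics` is a hypothetical helper. The inputs are the values printed for the rule (burgers) --> (eggs), so the results only match the table up to rounding.

```python
def rule_metrics(antecedent_support, consequent_support, support):
    """Recompute the usual rule metrics from three support values."""
    confidence = support / antecedent_support
    lift = confidence / consequent_support
    # Leverage: observed joint support minus the support expected under independence.
    leverage = support - antecedent_support * consequent_support
    # Conviction is unbounded when the rule never fails (confidence == 1).
    if confidence == 1:
        conviction = float("inf")
    else:
        conviction = (1 - consequent_support) / (1 - confidence)
    # Zhang's metric: leverage scaled into the range [-1, 1].
    zhang = leverage / max(
        support * (1 - antecedent_support),
        antecedent_support * (consequent_support - support),
    )
    return {
        "confidence": confidence,
        "lift": lift,
        "leverage": leverage,
        "conviction": conviction,
        "zhangs_metric": zhang,
    }


# Values taken from the (burgers) --> (eggs) row above.
metrics = rule_metrics(0.110318, 0.218897, 0.037585)
for name, value in metrics.items():
    print(f"{name:>14}: {value:.6f}")
```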


In [None]:
for _, rule in rules.iterrows():
  antecedente = list(rule['antecedents'])
  consequente = list(rule['consequents'])
  print(f"{antecedente} --> {consequente}, support: {rule['support']:.3f}, confidence: {rule['confidence']:.6f}")

['chocolate'] --> ['burgers'], support: 0.022, confidence: 0.110727
['burgers'] --> ['chocolate'], support: 0.022, confidence: 0.201893
['eggs'] --> ['burgers'], support: 0.038, confidence: 0.171701
['burgers'] --> ['eggs'], support: 0.038, confidence: 0.340694
['burgers'] --> ['french fries'], support: 0.029, confidence: 0.260252
['french fries'] --> ['burgers'], support: 0.029, confidence: 0.143478
['green tea'] --> ['burgers'], support: 0.023, confidence: 0.142546
['burgers'] --> ['green tea'], support: 0.023, confidence: 0.206625
['burgers'] --> ['milk'], support: 0.023, confidence: 0.211356
['milk'] --> ['burgers'], support: 0.023, confidence: 0.142705
['mineral water'] --> ['burgers'], support: 0.032, confidence: 0.107965
['burgers'] --> ['mineral water'], support: 0.032, confidence: 0.288644
['spaghetti'] --> ['burgers'], support: 0.028, confidence: 0.128389
['burgers'] --> ['spaghetti'], support: 0.028, confidence: 0.253943
['eggs'] --> ['cake'], support: 0.025, confidence: 0.113672
['cake'] 



In [None]:
rul = rules[rules['antecedents'] == frozenset({'spaghetti'})]
rul



Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction,zhangs_metric
12,(spaghetti),(burgers),0.218201,0.110318,0.028015,0.128389,1.163805,0.003943,1.020733,0.180033
20,(spaghetti),(cake),0.218201,0.100748,0.023665,0.108453,1.076475,0.001681,1.008642,0.09087
24,(spaghetti),(chicken),0.218201,0.076562,0.022446,0.102871,1.343633,0.005741,1.029326,0.327129
48,(spaghetti),(chocolate),0.218201,0.201148,0.051157,0.23445,1.165556,0.007266,1.0435,0.181684
52,(spaghetti),(cooking oil),0.218201,0.065425,0.020706,0.094896,1.45045,0.006431,1.032561,0.397236
70,(spaghetti),(eggs),0.218201,0.218897,0.047677,0.218501,0.998191,-8.6e-05,0.999493,-0.002313
88,(spaghetti),(french fries),0.218201,0.200104,0.036019,0.165072,0.824928,-0.007644,0.958041,-0.213502
92,(spaghetti),(frozen smoothie),0.218201,0.079172,0.020358,0.093301,1.178469,0.003083,1.015584,0.193709
102,(spaghetti),(frozen vegetables),0.218201,0.121107,0.036367,0.166667,1.376197,0.009941,1.054672,0.349655
108,(spaghetti),(grated cheese),0.218201,0.064729,0.021576,0.098884,1.527645,0.007452,1.037902,0.441798
