## Trabalho Prático - Bruno Silva - Grupo 7
### Unsupervised Learning: Apriori

O dataset escolhido é conjunto de dados com informações de compras de mercearia.
O objetivo é encontrar sugestões de produtos com base num produto escolhido, como identificar conjuntos comuns de produtos.

Link para o Dataset: [Groceries Dataset](https://www.kaggle.com/datasets/heeraldedhia/groceries-dataset/)

### Importar livrarias necessárias

In [3]:
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

### Carregar o dataset

In [4]:
groceries = pd.read_csv('Groceries_dataset.csv')
groceries.head()

Unnamed: 0,Member_number,Date,itemDescription
0,1808,21-07-2015,tropical fruit
1,2552,05-01-2015,whole milk
2,2300,19-09-2015,pip fruit
3,1187,12-12-2015,other vegetables
4,3037,01-02-2015,whole milk


### Tabela Pivot
Tabela pivot para utilizar o algoritmo apriori.  
Se o produto estiver na fatura, a célula de interseção será "Verdadeira". Se não estiver, será "Falsa".

In [5]:

transactions = groceries.groupby('Member_number')['itemDescription'].apply(list)
te = TransactionEncoder()
te_ary = te.fit(transactions).transform(transactions)
df_encoded = pd.DataFrame(te_ary, columns=te.columns_)
df_encoded.head()

Unnamed: 0,Instant food products,UHT-milk,abrasive cleaner,artif. sweetener,baby cosmetics,bags,baking powder,bathroom cleaner,beef,berries,...,turkey,vinegar,waffles,whipped/sour cream,whisky,white bread,white wine,whole milk,yogurt,zwieback
0,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,True,True,False
1,False,False,False,False,False,False,False,False,True,False,...,False,False,False,True,False,True,False,True,False,False
2,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,True,False,False
3,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
4,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,True,False,False


### Aplicar Apriori
Abaixo aplicamos o algoritmo Apriori, procurando por itemsets frequentes e, em seguida, extrair regras de associação com base nesses itemsets.  
As regras resultantes são exibidas na primeira linha do DataFrame resultante.

In [6]:
min_support=0.01
freq_itemsets = apriori(df_encoded, min_support=min_support, use_colnames=True)    
rules = association_rules(freq_itemsets, metric="support", min_threshold=min_support)    
rules.head()

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction,zhangs_metric
0,(beef),(UHT-milk),0.119548,0.078502,0.010518,0.087983,1.120775,0.001133,1.010396,0.122392
1,(UHT-milk),(beef),0.078502,0.119548,0.010518,0.133987,1.120775,0.001133,1.016672,0.11694
2,(bottled beer),(UHT-milk),0.158799,0.078502,0.014879,0.0937,1.193597,0.002413,1.016769,0.192815
3,(UHT-milk),(bottled beer),0.078502,0.158799,0.014879,0.189542,1.193597,0.002413,1.037933,0.176014
4,(bottled water),(UHT-milk),0.213699,0.078502,0.021293,0.09964,1.269268,0.004517,1.023477,0.269801


### Obter sugestões com base num produto indicado:
A função "recommend_items", retirada do exemplo fornecido, irá procurar que items deverão ser sugeridos em função de um item.

In [7]:
# Function for simulating the recommendation process
def recommend_items(rules_df, itemDesc, rec_count):
    sorted_rules = rules_df.sort_values("lift", ascending=False)
    # we are sorting the rules dataframe by using "lift" metric
    recommended_exams = []

    for i, exam in sorted_rules["antecedents"].items():
        for j in list(exam):
            if j == itemDesc:
                recommended_exams.append(list(sorted_rules.iloc[i]["consequents"]))

    recommended_exams = list(
        {item for item_list in recommended_exams for item in item_list}
    )
    recommended_exams.remove(itemDesc)
    return recommended_exams[:rec_count]

In [8]:
print("Sugestões para frankfurter:", recommend_items(rules, "frankfurter", 5))
print("Sugestões para pastry:",recommend_items(rules, "pastry", 5))
print("Sugestões para whole milk:",recommend_items(rules, "whole milk", 5))



Sugestões para frankfurter: ['domestic eggs', 'canned beer', 'beef', 'whole milk', 'meat']
Sugestões para pastry: ['domestic eggs', 'canned beer', 'soft cheese', 'cream cheese ', 'beef']
Sugestões para whole milk: ['yogurt', 'dishes', 'hamburger meat', 'specialty bar', 'hard cheese']


### Verificar quais os items mais vendidos

In [9]:

# Ordenar pelo suporte
freq_itemsets_ord = freq_itemsets.sort_values(by='support', ascending=False)

# Exibir items
print(freq_itemsets_ord)




       support                                   itemsets
113   0.458184                               (whole milk)
69    0.376603                         (other vegetables)
84    0.349666                               (rolls/buns)
94    0.313494                                     (soda)
114   0.282966                                   (yogurt)
...        ...                                        ...
2269  0.010005           (sausage, ice cream, whole milk)
2266  0.010005  (rolls/buns, other vegetables, ice cream)
2263  0.010005      (other vegetables, herbs, whole milk)
956   0.010005                    (misc. beverages, pork)
2293  0.010005    (margarine, rolls/buns, tropical fruit)

[3016 rows x 2 columns]


### Verificar quais os conjuntos de 2 items mais comuns

In [10]:
# Criar uma copia do dataset
frequent_itemsets_min2 = freq_itemsets.copy()

# Filtrar conjuntos de 2 itens no mínimo
frequent_itemsets_min2['length'] = frequent_itemsets_min2['itemsets'].apply(lambda x: len(x))
frequent_itemsets_min2 = frequent_itemsets_min2[frequent_itemsets_min2['length'] >= 2]

# Ordenar pelo suporte
frequent_itemsets_min2 = frequent_itemsets_min2.sort_values(by='support', ascending=False)

# Exibir resultados
print(frequent_itemsets_min2)


       support                                   itemsets  length
1050  0.191380             (other vegetables, whole milk)       2
1140  0.178553                   (rolls/buns, whole milk)       2
1209  0.151103                         (soda, whole milk)       2
1241  0.150590                       (yogurt, whole milk)       2
1031  0.146742             (rolls/buns, other vegetables)       2
...        ...                                        ...     ...
728   0.010005             (specialty bar, domestic eggs)       2
1456  0.010005       (rolls/buns, bottled water, chicken)       3
1884  0.010005    (frozen vegetables, soda, citrus fruit)       3
2312  0.010005  (meat, other vegetables, root vegetables)       3
1780  0.010005            (rolls/buns, canned beer, pork)       3

[2900 rows x 3 columns]
