<a href="https://colab.research.google.com/github/MaTheusSlv/PosGraduacaoMackenzie_CienciaDeDados/blob/main/TAC.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# ***STUDY FOR THE MACKENZIE TAC***

**Prerequisites**

In [None]:
# Libraries used
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

In [None]:
# Load the file (barcodes are read as strings so leading zeros are preserved)
csv = pd.read_csv('https://drive.google.com/uc?id=11ILv9wMFj0XPNv35EbfHctH21B2Oq7BC', dtype={'codigobarras': str})

# **Exploratory analysis**

In [None]:
# Inspect the dataset
print('Dataset data types\n')
print(csv.dtypes)
print('\nDataset dimensions\n')
print(csv.shape)
print('\nDataset columns\n')
print(csv.columns)
print('\nNumber of NaNs:\n')
print(csv.isna().sum().sort_values(ascending=False))
print('\nTop 5 most frequent values\n')
print(csv.produto.value_counts().head(5))
print()
print(csv.classificacao.value_counts().head(5))
print()
print(csv.data.value_counts().head(5))
print()
print(csv.cliente.value_counts().head(5))
print()
print(csv.formapagamento.value_counts().head(5))

Dataset data types

vendaid             int64
produto            object
codigobarras       object
preco             float64
classificacao      object
qtd                 int64
data               object
cliente            object
formapagamento     object
dtype: object

Dataset dimensions

(172587, 9)

Dataset columns

Index(['vendaid', 'produto', 'codigobarras', 'preco', 'classificacao', 'qtd',
       'data', 'cliente', 'formapagamento'],
      dtype='object')

Number of NaNs:

cliente           116793
codigobarras        9544
vendaid                0
preco                  0
produto                0
classificacao          0
qtd                    0
data                   0
formapagamento         0
dtype: int64

Top 5 most frequent values

produto
TAXA DE ENTREGA                      4706
APLICACAO DE INJETAVEIS FARMELHOR    1664
LOSARTANA 50MG 30'S NEO QUIM         1368
DORFLEX 10'S                         1338
DIPIRONA SOD.500MG 10´S PRATI        1169
Name: count, dtype
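
The missing-value counts are easier to judge as shares of the 172,587 rows. A minimal sketch using only the figures printed above (the variable names are illustrative, not part of the original notebook):

```python
# Figures copied from the printed output above
total_rows = 172587
missing = {"cliente": 116793, "codigobarras": 9544}

# Express each count as a share of all rows
for column, count in missing.items():
    share = count / total_rows
    print(f"{column}: {share:.1%} missing")
```

Neither of these columns is needed for the rule mining, which groups items by `vendaid`, a column with no missing values.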

# **Preparing the data for the algorithm**

In [None]:
# Keep only sales from 2024-04-01 onward
csv['data'] = pd.to_datetime(csv['data'])
transacoes = csv[csv.data >= '2024-04-01']

In [None]:
# Remove the unwanted "products" (delivery fee and injection service)
transacoes = transacoes.query("produto != 'TAXA DE ENTREGA' and produto != 'APLICACAO DE INJETAVEIS FARMELHOR'")

In [None]:
# Reshape the table into the structure the algorithm needs:
# one row per sale, one column per product, summed quantities as values
transacoes = pd.pivot_table(data=transacoes, index='vendaid', columns='produto', values='qtd',
                            aggfunc='sum', fill_value=0)
transacoes.columns.name = ''

In [None]:
# Binarize: any non-zero quantity becomes True, zero stays False
# (vectorized; replaces the deprecated applymap and gives mlxtend a boolean frame)
transacoes = transacoes != 0

## THIS STEP USES A LOT OF RAM
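
To make the reshaping concrete, here is a minimal sketch on a hypothetical three-sale toy table (the `vendas` and `cesta` names and the data are invented for illustration). It builds the sale-by-product matrix with the same kind of `pivot_table` call and then binarizes it with a vectorized comparison, which yields a boolean frame that takes less memory per cell than 64-bit integers.

```python
import pandas as pd

# Hypothetical toy sales lines (not taken from the real dataset)
vendas = pd.DataFrame({
    "vendaid": [1, 1, 2, 3, 3, 3],
    "produto": ["A", "B", "A", "A", "B", "C"],
    "qtd": [2, 1, 1, 1, 3, 1],
})

# One row per sale, one column per product, summed quantities as values
cesta = pd.pivot_table(vendas, index="vendaid", columns="produto",
                       values="qtd", aggfunc="sum", fill_value=0)

# True where the product appears in the sale, False otherwise
cesta = cesta != 0
print(cesta)
```

Each row of `cesta` is one basket, which is the input shape that `apriori` works on.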

# **Applying mlxtend**

In [None]:
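# Model: mine itemsets present in at least 0.1% of the sales
itens_freq = apriori(transacoes, min_support=0.001, use_colnames=True)

# Derive the rules, keeping those with lift >= 1
# (num_itemsets is documented as the number of transactions in the input data)
regras = association_rules(itens_freq, metric='lift', min_threshold=1, num_itemsets=len(transacoes))
# Sort by confidence, then by lift, both descending
regras = regras.sort_values(['confidence', 'lift'], ascending=[False, False])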
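
For reference, the metrics that drive the filtering and sorting here (`lift` as the threshold, then `confidence` and `lift` as sort keys) can be computed by hand. The following is a minimal pure-Python sketch on five hypothetical baskets, using the standard textbook definitions rather than mlxtend's internals; the `transactions` data and the `support` helper are invented for illustration.

```python
# Five hypothetical baskets (not taken from the real dataset)
transactions = [
    {"A", "B"},
    {"A"},
    {"A", "B", "C"},
    {"B", "C"},
    {"C"},
]

def support(itemset):
    """Share of baskets that contain every item of `itemset`."""
    return sum(itemset <= basket for basket in transactions) / len(transactions)

antecedent, consequent = {"A"}, {"B"}

s_a = support(antecedent)                # baskets with A
s_c = support(consequent)                # baskets with B
s_ac = support(antecedent | consequent)  # baskets with both A and B

confidence = s_ac / s_a    # how often B appears given that A appears
lift = confidence / s_c    # confidence relative to B's baseline frequency

print(f"support={s_ac:.3f} confidence={confidence:.3f} lift={lift:.3f}")
```

A lift above 1 means the two items occur together more often than they would if they were bought independently, which is why 1 is a natural minimum threshold.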

In [None]:
regras

,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,representativity,leverage,conviction,zhangs_metric,jaccard,certainty,kulczynski
13,"(SAB SIENE 85g, PROTEINAS DO LEITE)","(SAB SIENE 85g, ROSAS VERMELHAS)",0.002569,0.002788,0.001421,0.553191,198.422194,1.0,0.001414,2.231856,0.997523,0.361111,0.551942,0.531498
8,"(SAB SIENE 85g, PROTEINAS DO LEITE)","(SAB SIENE 85g, LAVANDA)",0.002569,0.002843,0.001421,0.553191,194.606383,1.0,0.001414,2.231733,0.997424,0.356164,0.551918,0.526596
2,"(SAB SIENE 85g, ERVA DOCE)","(SAB SIENE 85g, LAVANDA)",0.002788,0.002843,0.001531,0.54902,193.138763,1.0,0.001523,2.211088,0.997604,0.373333,0.547734,0.543741
3,"(SAB SIENE 85g, LAVANDA)","(SAB SIENE 85g, ERVA DOCE)",0.002843,0.002788,0.001531,0.538462,193.138763,1.0,0.001523,2.160626,0.997658,0.373333,0.537171,0.543741
5,"(SAB SIENE 85g, PROTEINAS DO LEITE)","(SAB SIENE 85g, ERVA DOCE)",0.002569,0.002788,0.001312,0.510638,183.158949,1.0,0.001305,2.037781,0.997102,0.324324,0.50927,0.490613
12,"(SAB SIENE 85g, ROSAS VERMELHAS)","(SAB SIENE 85g, PROTEINAS DO LEITE)",0.002788,0.002569,0.001421,0.509804,198.422194,1.0,0.001414,2.034759,0.997742,0.361111,0.508541,0.531498
9,"(SAB SIENE 85g, LAVANDA)","(SAB SIENE 85g, PROTEINAS DO LEITE)",0.002843,0.002569,0.001421,0.5,194.606383,1.0,0.001414,1.994861,0.997697,0.356164,0.498712,0.526596
4,"(SAB SIENE 85g, ERVA DOCE)","(SAB SIENE 85g, PROTEINAS DO LEITE)",0.002788,0.002569,0.001312,0.470588,183.158949,1.0,0.001305,1.884036,0.997321,0.324324,0.469225,0.490613
6,"(SAB SIENE 85g, ROSAS VERMELHAS)","(SAB SIENE 85g, ERVA DOCE)",0.002788,0.002788,0.001257,0.45098,161.760477,1.0,0.00125,1.816351,0.996596,0.291139,0.449445,0.45098
7,"(SAB SIENE 85g, ERVA DOCE)","(SAB SIENE 85g, ROSAS VERMELHAS)",0.002788,0.002788,0.001257,0.45098,161.760477,1.0,0.00125,1.816351,0.996596,0.291139,0.449445,0.45098
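
As a sanity check, the derived columns of the first row can be recomputed from its three support values. This sketch assumes the standard definitions of confidence, lift, leverage and conviction; small differences from the table are expected because the printed supports are rounded to six decimals.

```python
import math

# Values copied from the first row of the table above
antecedent_support = 0.002569
consequent_support = 0.002788
support = 0.001421

confidence = support / antecedent_support
computed = {
    "confidence": confidence,
    "lift": confidence / consequent_support,
    "leverage": support - antecedent_support * consequent_support,
    "conviction": (1 - consequent_support) / (1 - confidence),
}

# Values reported in the same row
reported = {
    "confidence": 0.553191,
    "lift": 198.422194,
    "leverage": 0.001414,
    "conviction": 2.231856,
}

for name, value in computed.items():
    print(f"{name}: computed {value:.6f}, reported {reported[name]:.6f}")
```

A lift near 198 means this pair appears together roughly 198 times more often than it would if the two products were bought independently.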
