<h1><b>What Is MBA (Market Basket Analysis)?</b></h1>
<p>Market Basket Analysis is a data analysis technique used in marketing and business analytics to identify consumer purchasing patterns. It lets companies discover which products are frequently bought together, so they can develop more effective, personalized sales strategies. The analysis relies on algorithms that mine large transactional datasets for relationships between purchased items, helping companies better understand customer behavior.</p>
<p>The data analyzed here is available at this <a href="https://www.kaggle.com/competitions/instacart-market-basket-analysis/data">link</a>.</p>
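<p>To make the idea concrete, here is a minimal sketch in plain Python that computes support, confidence, and lift for item pairs. The four toy baskets and the helper name <code>pair_metrics</code> are illustrative assumptions, not part of the Instacart data or of any library.</p>

```python
from collections import Counter
from itertools import combinations

# Toy transactions (hypothetical baskets, not taken from the Instacart data)
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
]

def pair_metrics(baskets):
    """Return {(a, b): (support, confidence of a -> b, lift)} for co-occurring pairs."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    metrics = {}
    for (a, b), count in pair_counts.items():
        support = count / n                       # share of baskets containing both items
        confidence = count / item_counts[a]       # estimate of P(b | a)
        lift = confidence / (item_counts[b] / n)  # above 1: together more often than chance
        metrics[(a, b)] = (support, confidence, lift)
    return metrics

metrics = pair_metrics(baskets)
for pair, (support, confidence, lift) in sorted(metrics.items()):
    print(pair, round(support, 2), round(confidence, 2), round(lift, 2))
```

<p>On real data the same quantities are what an Apriori implementation reports for each rule; the brute-force pair count above is only practical for small examples.</p>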

In [25]:
# Checking the Python version
from platform import python_version
print(f"Python version for this project: {python_version()}")

Python version for this project: 3.9.16


In [26]:
# Installing the efficient_apriori and watermark packages
!pip install -q efficient_apriori watermark

In [4]:
# imports 
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from datetime import datetime
from itertools import combinations
import efficient_apriori
from efficient_apriori import apriori
import warnings
warnings.filterwarnings("ignore")

In [28]:
%reload_ext watermark
%watermark -a "Airton Fabre" --iversion

Author: Airton Fabre

matplotlib       : 3.7.1
pandas           : 1.5.3
numpy            : 1.22.4
efficient_apriori: 2.0.3



In [5]:
# Loading the datasets
aisles = pd.read_csv("aisles.csv")
departments = pd.read_csv("departments.csv")
orders = pd.read_csv("orders.csv")
products = pd.read_csv("products.csv")
order_prior = pd.read_csv("order_products__prior.csv")
order_train = pd.read_csv("order_products__train.csv")

<h1><b>Exploratory Analysis</b></h1>
<p>Does it make sense to analyze ID columns?</p>
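<p>For identifier columns, summary statistics such as the mean carry no meaning; cardinality and uniqueness checks are usually more informative. A small sketch on a hypothetical three-row table (the values are made up for illustration):</p>

```python
import pandas as pd

# Hypothetical miniature of a lookup table such as aisles
toy = pd.DataFrame({"aisle_id": [1, 2, 3],
                    "aisle": ["soups", "cheeses", "energy bars"]})

# The mean of an ID column is computable but says nothing about the data
print(toy["aisle_id"].mean())

# Cardinality and uniqueness are the useful questions to ask of an ID
n_unique = toy["aisle_id"].nunique()
is_key = toy["aisle_id"].is_unique
print(n_unique, is_key)
```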

In [30]:
aisles.shape

(134, 2)

In [31]:
aisles.head(2)

,aisle_id,aisle
0,1,prepared soups salads
1,2,specialty cheeses


In [32]:
departments.shape

(21, 2)

In [33]:
departments.head(2)

,department_id,department
0,1,frozen
1,2,other


In [34]:
orders.shape

(3421083, 7)

In [35]:
orders.head(2)

,order_id,user_id,eval_set,order_number,order_dow,order_hour_of_day,days_since_prior_order
0,2539329,1,prior,1,2,8,
1,2398795,1,prior,2,3,7,15.0


In [36]:
products.shape

(49688, 4)

In [37]:
products.head(2)

,product_id,product_name,aisle_id,department_id
0,1,Chocolate Sandwich Cookies,61,19
1,2,All-Seasons Salt,104,13


In [38]:
order_prior.shape

(23886959, 4)

In [39]:
order_prior.head(2)

,order_id,product_id,add_to_cart_order,reordered
0,2,33120.0,1.0,1.0
1,2,28985.0,2.0,1.0


In [40]:
order_train.shape

(1384617, 4)

In [41]:
order_train.head(2)

,order_id,product_id,add_to_cart_order,reordered
0,1,49302,1,1
1,1,11109,2,1


In [42]:
# Collecting the number of records in each file
count = [["Aisles", len(aisles)],
         ["Departments", len(departments)],
         ["Products", len(products)],
         ["Order_Train", len(order_train)],
         ["Order_Prior", len(order_prior)],
         ["Orders", len(orders)]]

# Creating a DataFrame from the list (a plain list keeps the counts as integers,
# whereas np.array would coerce them to strings)
df = pd.DataFrame(count, columns = ["File", "Total Records"])

In [43]:
df

,File,Total Records
0,Aisles,134
1,Departments,21
2,Products,49688
3,Order_Train,1384617
4,Order_Prior,23886959
5,Orders,3421083


In [44]:
# Number of orders in each eval_set category
orders["eval_set"].value_counts()

prior    3214874
train     131209
test       75000
Name: eval_set, dtype: int64

<h1><b>Data Cleaning</b></h1>
<h3>Checking for Missing Values</h3>

In [45]:
aisles.isnull().sum()

aisle_id    0
aisle       0
dtype: int64

In [46]:
departments.isna().sum()

department_id    0
department       0
dtype: int64

In [47]:
products.isnull().sum()

product_id       0
product_name     0
aisle_id         0
department_id    0
dtype: int64

In [48]:
orders.isnull().sum()

order_id                       0
user_id                        0
eval_set                       0
order_number                   0
order_dow                      0
order_hour_of_day              0
days_since_prior_order    206209
dtype: int64

In [49]:
order_prior.isna().sum()

order_id             0
product_id           1
add_to_cart_order    1
reordered            1
dtype: int64

In [50]:
order_train.isna().sum()

order_id             0
product_id           0
add_to_cart_order    0
reordered            0
dtype: int64
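<p>The counts above show one incomplete row in <code>order_prior</code> and many missing values in <code>days_since_prior_order</code>. One possible way to handle them is sketched below on hypothetical miniature frames (all values are made up): drop the incomplete order item and restore integer dtypes, and check whether the missing <code>days_since_prior_order</code> values fall on a user's first order, as the <code>orders.head(2)</code> output suggests, in which case they can be kept rather than imputed.</p>

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of order_prior with one incomplete row
toy_prior = pd.DataFrame({
    "order_id": [2, 2, 3],
    "product_id": [33120.0, 28985.0, np.nan],
    "add_to_cart_order": [1.0, 2.0, np.nan],
    "reordered": [1.0, 1.0, np.nan],
})

# Drop rows with missing item details and cast the float columns back to integers
cols = ["product_id", "add_to_cart_order", "reordered"]
clean_prior = toy_prior.dropna(subset=cols).astype({c: "int64" for c in cols})

# Hypothetical miniature of orders: NaN appears on order_number 1
toy_orders = pd.DataFrame({
    "order_number": [1, 2],
    "days_since_prior_order": [np.nan, 15.0],
})

# Check that every missing value belongs to a first order
first_order_nan = toy_orders.loc[
    toy_orders["days_since_prior_order"].isna(), "order_number"
].eq(1).all()

print(clean_prior.dtypes)
print(first_order_nan)
```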

<h2><b>Feature Extraction</b></h2>
<h3>Merging Tables</h3>

In [6]:
# Product catalog enriched with aisle and department names
df1 = products.merge(aisles, on = "aisle_id", how = "left")
df2 = df1.merge(departments, on = "department_id", how = "left")
# Order items enriched with order-level attributes
df3 = order_prior.merge(orders, on = "order_id", how = "left")

In [None]:
# Join product details onto each order item; a column-wise concat of unrelated tables would misalign rows
df = df3.merge(df2, on = "product_id", how = "left")
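<p>A left join can silently duplicate rows when the key on the right-hand side is not unique. As a hedged sketch, the <code>validate</code> and <code>indicator</code> parameters of pandas <code>merge</code> can guard joins like the ones above; the toy frames below are hypothetical.</p>

```python
import pandas as pd

# Hypothetical order items; product 99 has no entry in the catalog
toy_items = pd.DataFrame({"order_id": [2, 2, 3], "product_id": [10, 20, 99]})
toy_products = pd.DataFrame({"product_id": [10, 20],
                             "product_name": ["Milk", "Bread"]})

merged = toy_items.merge(
    toy_products,
    on="product_id",
    how="left",
    validate="many_to_one",  # raises MergeError if product_id repeats on the right
    indicator=True,          # adds a _merge column: "both" or "left_only"
)

# Rows that found no matching product
unmatched = int((merged["_merge"] == "left_only").sum())
print(len(merged), unmatched)
```

<p>With a left join validated as many-to-one, the result should have exactly as many rows as the left table, which is a cheap sanity check on tables with millions of rows.</p>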

In [3]:
df.head()

NameError: ignored

In [2]:
df.shape

NameError: ignored