# 1. Final model for back order prediction

---

The previous [notebook](https://nbviewer.jupyter.org/github/barbosarafael/Projetos/blob/master/iNeuron_Back_order_prediction_Notebook/hackaton_ineuron_back_order_prediction.ipynb) covered the following topics:

- An explanation of the iNeuron competition: goal, metrics, prizes (remember, these projects focus on **learning**, not on winning the competition);
- Which metrics to use for our data;
- Building a baseline with a logistic regression model, without any preprocessing of the data;
- Checking the state of the data;
- Running a complete Exploratory Data Analysis.

I strongly recommend reading the previous notebook, since much of what it covers is treated only briefly here. With that said, let's improve on our previous model (the baseline).

# 2. Importing the libraries

---

In [0]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# 3. Additional settings

---



In [33]:
# On matplotlib >= 3.6 this style is named "seaborn-v0_8-muted" (the old name was removed in 3.8).
plt.style.use("seaborn-muted")
%matplotlib inline
pd.set_option('display.max_columns', None)


from google.colab import drive
drive.mount("/content/drive", force_remount = True)

Mounted at /content/drive


# 4. Data used

---

To give some context, a few details about the dataset are listed below.

## 4.1. Loading the dataset

In [0]:
banco = pd.read_csv("/content/drive/My Drive/Training_Dataset_v2.csv", low_memory = False)

## 4.2. Data dictionary

1. sku – Random ID for the product
2. national_inv – Current inventory level for the part
3. lead_time – Transit time for product (if available)
4. in_transit_qty – Amount of product in transit from source
5. forecast_3_month – Forecast sales for the next 3 months
6. forecast_6_month – Forecast sales for the next 6 months
7. forecast_9_month – Forecast sales for the next 9 months
8. sales_1_month – Sales quantity for the prior 1 month time period
9. sales_3_month – Sales quantity for the prior 3 month time period
10. sales_6_month – Sales quantity for the prior 6 month time period
11. sales_9_month – Sales quantity for the prior 9 month time period
12. min_bank – Minimum recommended amount to stock
13. potential_issue – Source issue for part identified
14. pieces_past_due – Parts overdue from source
15. perf_6_month_avg – Source performance for prior 6 month period
16. perf_12_month_avg – Source performance for prior 12 month period
17. local_bo_qty – Amount of stock orders overdue
18. deck_risk – Part risk flag
19. oe_constraint – Part risk flag
20. ppap_risk – Part risk flag
21. stop_auto_buy – Part risk flag
22. rev_stop – Part risk flag
23. went_on_backorder – Product actually went on backorder. This is the target value.

## 4.3. Structure

---



In [35]:
banco.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1687861 entries, 0 to 1687860
Data columns (total 23 columns):
 #   Column             Non-Null Count    Dtype  
---  ------             --------------    -----  
 0   sku                1687861 non-null  object 
 1   national_inv       1687860 non-null  float64
 2   lead_time          1586967 non-null  float64
 3   in_transit_qty     1687860 non-null  float64
 4   forecast_3_month   1687860 non-null  float64
 5   forecast_6_month   1687860 non-null  float64
 6   forecast_9_month   1687860 non-null  float64
 7   sales_1_month      1687860 non-null  float64
 8   sales_3_month      1687860 non-null  float64
 9   sales_6_month      1687860 non-null  float64
 10  sales_9_month      1687860 non-null  float64
 11  min_bank           1687860 non-null  float64
 12  potential_issue    1687860 non-null  object 
 13  pieces_past_due    1687860 non-null  float64
 14  perf_6_month_avg   1687860 non-null  float64
 15  perf_12_month_avg  1687860 non-n
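
The `info()` output above shows that `lead_time` has 1,586,967 non-null entries out of 1,687,861 rows, so 100,894 values (roughly 6%) are missing. As a minimal sketch of how missing values could be counted per column, the snippet below uses a small made-up DataFrame (`toy` is a hypothetical stand-in, not the real data); on the real data the same calls would be applied to `banco`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for `banco`; values are illustrative only.
toy = pd.DataFrame({
    "national_inv": [0.0, 2.0, 2.0, 7.0],
    "lead_time": [np.nan, 9.0, np.nan, 8.0],
})

# Count and percentage of missing values per column.
missing = pd.DataFrame({
    "n_missing": toy.isna().sum(),
    "pct_missing": (toy.isna().mean() * 100).round(2),
})
print(missing)
```

Sorting this table by `pct_missing` would make it easy to see which columns need imputation or another treatment.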

## 4.4. Summary statistics of the variables

In [36]:
banco.describe().T.round(2)

| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| national_inv | 1687860.0 | 496.11 | 29615.23 | -27256.0 | 4.0 | 15.0 | 80.0 | 12334404.0 |
| lead_time | 1586967.0 | 7.87 | 7.06 | 0.0 | 4.0 | 8.0 | 9.0 | 52.0 |
| in_transit_qty | 1687860.0 | 44.05 | 1342.74 | 0.0 | 0.0 | 0.0 | 0.0 | 489408.0 |
| forecast_3_month | 1687860.0 | 178.12 | 5026.55 | 0.0 | 0.0 | 0.0 | 4.0 | 1427612.0 |
| forecast_6_month | 1687860.0 | 344.99 | 9795.15 | 0.0 | 0.0 | 0.0 | 12.0 | 2461360.0 |
| forecast_9_month | 1687860.0 | 506.36 | 14378.92 | 0.0 | 0.0 | 0.0 | 20.0 | 3777304.0 |
| sales_1_month | 1687860.0 | 55.93 | 1928.2 | 0.0 | 0.0 | 0.0 | 4.0 | 741774.0 |
| sales_3_month | 1687860.0 | 175.03 | 5192.38 | 0.0 | 0.0 | 1.0 | 15.0 | 1105478.0 |
| sales_6_month | 1687860.0 | 341.73 | 9613.17 | 0.0 | 0.0 | 2.0 | 31.0 | 2146625.0 |
| sales_9_month | 1687860.0 | 525.27 | 14838.61 | 0.0 | 0.0 | 4.0 | 47.0 | 3205172.0 |


# 5. Data preprocessing

---

Now is when ~things get rough~ the real work begins. Based on the exploratory analysis and some insights from the previous notebook, we will apply a series of transformations to the variables.

## 5.1. Dropping variables

Only one variable is dropped: `sku`, which is just a random ID for each product. It carries no predictive information, so keeping it in the model would not make sense.
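
One quick way to back this decision is to check how many distinct values the column has: an identifier has (close to) one distinct value per row. A minimal sketch on a hypothetical toy frame (the `sku` values here are made up, and `toy` is not the real `banco`):

```python
import pandas as pd

# Hypothetical toy data; the sku values are invented for illustration.
toy = pd.DataFrame({
    "sku": ["id_1", "id_2", "id_3", "id_4"],
    "national_inv": [0.0, 2.0, 2.0, 7.0],
})

# Ratio of distinct values to rows: 1.0 means every row has its own value,
# which is what we expect from an identifier column.
unique_ratio = toy["sku"].nunique() / len(toy)
print(unique_ratio)
```

A ratio at or near 1.0 suggests the column behaves like a row label rather than a feature.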

In [37]:
banco = banco.drop("sku", axis = 1)

banco.head()

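| | national_inv | lead_time | in_transit_qty | forecast_3_month | forecast_6_month | forecast_9_month | sales_1_month | sales_3_month | sales_6_month | sales_9_month | min_bank | potential_issue | pieces_past_due | perf_6_month_avg | perf_12_month_avg | local_bo_qty | deck_risk | oe_constraint | ppap_risk | stop_auto_buy | rev_stop | went_on_backorder |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0 | NaN | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | No | 0.0 | -99.0 | -99.0 | 0.0 | No | No | No | Yes | No | No |
| 1 | 2.0 | 9.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | No | 0.0 | 0.99 | 0.99 | 0.0 | No | No | No | Yes | No | No |
| 2 | 2.0 | NaN | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | No | 0.0 | -99.0 | -99.0 | 0.0 | Yes | No | No | Yes | No | No |
| 3 | 7.0 | 8.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | No | 0.0 | 0.1 | 0.13 | 0.0 | No | No | No | Yes | No | No |
| 4 | 8.0 | NaN | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 4.0 | 2.0 | No | 0.0 | -99.0 | -99.0 | 0.0 | Yes | No | No | Yes | No | No |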
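
The preview shows that the flag columns and the target hold the strings `Yes` and `No`. Most estimators expect numeric input, so one possible encoding is to map these strings to 1 and 0. This is an assumption for illustration, not necessarily the transformation used later in this notebook, and `toy` is a hypothetical stand-in for `banco`.

```python
import pandas as pd

# Hypothetical toy data with a few of the Yes/No columns from the preview.
toy = pd.DataFrame({
    "deck_risk": ["No", "No", "Yes"],
    "stop_auto_buy": ["Yes", "Yes", "Yes"],
    "went_on_backorder": ["No", "No", "No"],
})

# Columns to encode; on the real data this list would include every Yes/No column.
flag_cols = ["deck_risk", "stop_auto_buy", "went_on_backorder"]

# Map "No" -> 0 and "Yes" -> 1, leaving the original frame untouched.
encoded = toy.copy()
for col in flag_cols:
    encoded[col] = encoded[col].map({"No": 0, "Yes": 1})
print(encoded)
```

Any value other than `Yes` or `No` would become `NaN` under this mapping, which makes unexpected categories easy to spot.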
