# Exploratory data analysis of the Olist dataset
In this project, I perform an exploratory data analysis of the Olist dataset from Kaggle. I formulate a few business questions and hypotheses and try to answer them.

About the dataset:

This is a public Brazilian e-commerce dataset of orders made at the Olist Store. It contains information on about 100k orders placed from 2016 to 2018 at multiple marketplaces in Brazil. Its features allow an order to be viewed from several dimensions: order status, price, payment and freight performance, customer location, product attributes and, finally, reviews written by customers. A geolocation dataset that relates Brazilian zip codes to latitude/longitude coordinates was also released.

Context:

This dataset was generously provided by Olist, the largest department store in Brazilian marketplaces. Olist connects small businesses from all over Brazil to sales channels without hassle and with a single contract. These merchants can sell their products through the Olist Store and ship them directly to customers using Olist's logistics partners. Learn more on their website: www.olist.com

After a customer buys a product from the Olist Store, a seller is notified to fulfill the order. Once the customer receives the product, or the estimated delivery date passes, the customer receives a satisfaction survey by email, where they can rate the purchase experience and write comments.

### 0.0 Importing the libraries

In [1]:
# Data manipulation and visualization.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

### 1.0 Collecting the data

In [2]:
customers_path = '../Olist/input/olist_customers_dataset.csv'
geolocation_path = '../Olist/input/olist_geolocation_dataset.csv'
items_path = '../Olist/input/olist_order_items_dataset.csv'
payments_path = '../Olist/input/olist_order_payments_dataset.csv'
reviews_path = '../Olist/input/olist_order_reviews_dataset.csv'
orders_path = '../Olist/input/olist_orders_dataset.csv'
products_path = '../Olist/input/olist_products_dataset.csv'
sellers_path = '../Olist/input/olist_sellers_dataset.csv'
category_path = '../Olist/input/product_category_name_translation.csv'

customers = pd.read_csv(customers_path)
geolocation = pd.read_csv(geolocation_path)
items = pd.read_csv(items_path)
payments = pd.read_csv(payments_path)
reviews = pd.read_csv(reviews_path)
orders = pd.read_csv(orders_path)
products = pd.read_csv(products_path)
sellers = pd.read_csv(sellers_path)
category = pd.read_csv(category_path)

### 1.1 Database schema
<img src="reports/schemaOlist.png" height=500>

### 1.2 Joining the data into several tables
- Below, I join the data into tables that each hold complete information about one element, such as orders.

### 1.3 Customers data
- customer_id is an id generated every time the customer places an order.
- customer_unique_id is an id that identifies each customer individually, assigned at registration.
- Customers and geolocation are joined on the zip_code_prefix key.
- Customers has roughly 100,000 rows, while geolocation has roughly 1,000,000 rows and duplicated zip code prefixes. This indicates that a single zip code prefix maps to several latitude/longitude points. Strategy: group by zip code prefix and take the mean (centroid) of latitude and longitude.
- I select only the latitude and longitude columns, because the customer's city and state are already present in the customers dataset.

In [19]:
customers.head()

Unnamed: 0,customer_id,customer_unique_id,customer_zip_code_prefix,customer_city,customer_state
0,06b8999e2fba1a1fbc88172c00ba8bc7,861eff4711a542e4b93843c6dd7febb0,14409,franca,SP
1,18955e83d337fd6b2def6b18a428ac77,290c77bc529b7ac935b93aa66c333dc3,9790,sao bernardo do campo,SP
2,4e7b3e00288586ebd08712fdd0374a03,060e732b5b29e8181a18229c7b0b2b5e,1151,sao paulo,SP
3,b2b6027bc5c5109e529d4dc6358b12c3,259dac757896d24d7702b9acbbff3f3c,8775,mogi das cruzes,SP
4,4f2d8ab171c80ec8364f7c12e35b23ad,345ecd01c38d18a9036ed96c73b8d066,13056,campinas,SP


In [11]:
geolocation.head()

Unnamed: 0,geolocation_zip_code_prefix,geolocation_lat,geolocation_lng,geolocation_city,geolocation_state
0,1037,-23.545621,-46.639292,sao paulo,SP
1,1046,-23.546081,-46.64482,sao paulo,SP
2,1046,-23.546129,-46.642951,sao paulo,SP
3,1041,-23.544392,-46.639499,sao paulo,SP
4,1035,-23.541578,-46.641607,sao paulo,SP


In [89]:
geolocation_grouped = geolocation.groupby(['geolocation_zip_code_prefix'])[['geolocation_lat', 'geolocation_lng']].mean().reset_index()

customer_data = pd.merge(left=customers, right=geolocation_grouped, left_on='customer_zip_code_prefix', right_on='geolocation_zip_code_prefix', how='left').drop(columns=['geolocation_zip_code_prefix'])

customer_data.head()

Unnamed: 0,customer_id,customer_unique_id,customer_zip_code_prefix,customer_city,customer_state,geolocation_lat,geolocation_lng
0,06b8999e2fba1a1fbc88172c00ba8bc7,861eff4711a542e4b93843c6dd7febb0,14409,franca,SP,-20.498489,-47.396929
1,18955e83d337fd6b2def6b18a428ac77,290c77bc529b7ac935b93aa66c333dc3,9790,sao bernardo do campo,SP,-23.727992,-46.542848
2,4e7b3e00288586ebd08712fdd0374a03,060e732b5b29e8181a18229c7b0b2b5e,1151,sao paulo,SP,-23.531642,-46.656289
3,b2b6027bc5c5109e529d4dc6358b12c3,259dac757896d24d7702b9acbbff3f3c,8775,mogi das cruzes,SP,-23.499702,-46.185233
4,4f2d8ab171c80ec8364f7c12e35b23ad,345ecd01c38d18a9036ed96c73b8d066,13056,campinas,SP,-22.9751,-47.142925
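
Because the merge above is a left join, customers whose zip code prefix does not appear in geolocation keep their row but get NaN coordinates. The sketch below shows one way to check this on toy data; the `*_toy` frames and their values are made up for illustration, and only the column names come from the real tables.

```python
import pandas as pd

# Toy stand-ins for customers and geolocation (values are hypothetical).
customers_toy = pd.DataFrame({
    "customer_id": ["c1", "c2", "c3"],
    "customer_zip_code_prefix": [1037, 1046, 99999],
})
geolocation_toy = pd.DataFrame({
    "geolocation_zip_code_prefix": [1037, 1046, 1046],
    "geolocation_lat": [-23.5, -23.6, -23.4],
    "geolocation_lng": [-46.6, -46.7, -46.5],
})

# One row per zip code prefix: the mean of its coordinates.
geo_grouped_toy = (
    geolocation_toy
    .groupby("geolocation_zip_code_prefix", as_index=False)[["geolocation_lat", "geolocation_lng"]]
    .mean()
)

# validate="many_to_one" raises if the right side still has duplicated keys,
# which would silently multiply customer rows.
merged_toy = customers_toy.merge(
    geo_grouped_toy,
    left_on="customer_zip_code_prefix",
    right_on="geolocation_zip_code_prefix",
    how="left",
    validate="many_to_one",
).drop(columns=["geolocation_zip_code_prefix"])

# Customers without a matching prefix end up with NaN coordinates.
n_missing = int(merged_toy["geolocation_lat"].isna().sum())
print(len(merged_toy), n_missing)
```

The same two checks (row count unchanged, number of NaN coordinates) could be run on customer_data before using the coordinates in any map or distance calculation.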


### 1.4 Order data
- I now join the orders and items datasets, and then join the result with customers to obtain a complete dataset with order information.
- An order can contain multiple items. Therefore, in the items dataset, order_id contains duplicates, which indicate multiple items in the same order.
- Of the columns holding temporal information, I keep only the order purchase date and time, since it is the one of interest for the analysis. It lets us investigate, for example, on which dates sales were highest.
- The goal here is a view with the order id, the customer who placed it (customer_unique_id), the order date, the price and the freight value.
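
The purchase timestamp is read from the CSV as text, so date-based questions need it parsed first. A minimal sketch, assuming a toy frame with made-up rows (only the column names match the real data), of parsing the column and totalling price per month:

```python
import pandas as pd

# Toy order rows (hypothetical values, real column names).
orders_toy = pd.DataFrame({
    "order_id": ["o1", "o2", "o3"],
    "order_purchase_timestamp": [
        "2017-10-02 10:56:33",
        "2017-10-15 08:00:00",
        "2018-07-24 20:41:37",
    ],
    "price": [29.99, 10.00, 118.70],
})

# Parse the text column into datetimes so the .dt accessor is available.
orders_toy["order_purchase_timestamp"] = pd.to_datetime(
    orders_toy["order_purchase_timestamp"]
)

# Total price per calendar month of purchase.
monthly_sales = (
    orders_toy
    .groupby(orders_toy["order_purchase_timestamp"].dt.to_period("M"))["price"]
    .sum()
)
print(monthly_sales)
```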

In [25]:
orders.head()

Unnamed: 0,order_id,customer_id,order_status,order_purchase_timestamp,order_approved_at,order_delivered_carrier_date,order_delivered_customer_date,order_estimated_delivery_date
0,e481f51cbdc54678b7cc49136f2d6af7,9ef432eb6251297304e76186b10a928d,delivered,2017-10-02 10:56:33,2017-10-02 11:07:15,2017-10-04 19:55:00,2017-10-10 21:25:13,2017-10-18 00:00:00
1,53cdb2fc8bc7dce0b6741e2150273451,b0830fb4747a6c6d20dea0b8c802d7ef,delivered,2018-07-24 20:41:37,2018-07-26 03:24:27,2018-07-26 14:31:00,2018-08-07 15:27:45,2018-08-13 00:00:00
2,47770eb9100c2d0c44946d9cf07ec65d,41ce2a54c0b03bf3443c3d931a367089,delivered,2018-08-08 08:38:49,2018-08-08 08:55:23,2018-08-08 13:50:00,2018-08-17 18:06:29,2018-09-04 00:00:00
3,949d5b44dbf5de918fe9c16f97b45f8a,f88197465ea7920adcdbec7375364d82,delivered,2017-11-18 19:28:06,2017-11-18 19:45:59,2017-11-22 13:39:59,2017-12-02 00:28:42,2017-12-15 00:00:00
4,ad21c59c0840e6cb83a9ceb5573f8159,8ab97904e6daea8866dbdbc4fb7aad2c,delivered,2018-02-13 21:18:39,2018-02-13 22:20:29,2018-02-14 19:46:34,2018-02-16 18:17:02,2018-02-26 00:00:00


In [37]:
items.head()

Unnamed: 0,order_id,order_item_id,product_id,seller_id,shipping_limit_date,price,freight_value
0,00010242fe8c5a6d1ba2dd792cb16214,1,4244733e06e7ecb4970a6e2683c13e61,48436dade18ac8b2bce089ec2a041202,2017-09-19 09:45:35,58.9,13.29
1,00018f77f2f0320c557190d7a144bdd3,1,e5f2d52b802189ee658865ca93d83a8f,dd7ddc04e1b6c2c614352b383efe2d36,2017-05-03 11:05:13,239.9,19.93
2,000229ec398224ef6ca0657da4fc703e,1,c777355d18b72b67abbeef9df44fd0fd,5b51032eddd242adc84c38acab88f23d,2018-01-18 14:48:30,199.0,17.87
3,00024acbcdf0a6daa1e931b038114c75,1,7634da152a4610f1595efa32f14722fc,9d7a1d34a5052409006425275ba1c2b4,2018-08-15 10:10:18,12.99,12.79
4,00042b26cf59d7ce69dfabb4e55b4fd9,1,ac6c3623068f30de03045865e4e10089,df560393f3a51e74553ab94004ba5c87,2017-02-13 13:57:51,199.9,18.14


In [8]:
items['order_id'].duplicated().sum()

13984

In [40]:
orders.shape, items.shape

((99441, 8), (112650, 7))
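
These numbers are worth a second look: items has 112,650 rows and 13,984 repeated order_id values, which leaves 98,666 distinct orders in items, fewer than the 99,441 rows in orders. If every order_id in items also exists in orders, about 775 orders have no items, and an inner join would drop them. The sketch below, on made-up toy data, shows how `indicator=True` can count such orders.

```python
import pandas as pd

# Toy data: order "o3" has no items (hypothetical values).
orders_toy = pd.DataFrame({"order_id": ["o1", "o2", "o3"]})
items_toy = pd.DataFrame({
    "order_id": ["o1", "o1", "o2"],
    "order_item_id": [1, 2, 1],
})

# A left join with indicator=True adds a "_merge" column that says
# whether each row found a match on the right side.
check = orders_toy.merge(items_toy, on="order_id", how="left", indicator=True)
orders_without_items = check.loc[check["_merge"] == "left_only", "order_id"].nunique()

# Number of items per order, to see how many orders are multi-item.
items_per_order = items_toy.groupby("order_id")["order_item_id"].count()

print(orders_without_items)
print(items_per_order)
```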

In [90]:
order_data = pd.merge(left=orders, right=items, on='order_id', how='inner').drop(columns=['order_approved_at', 'order_delivered_carrier_date', 'order_delivered_customer_date', 'order_estimated_delivery_date', 'shipping_limit_date', 'seller_id', 'order_status'])
order_data

Unnamed: 0,order_id,customer_id,order_purchase_timestamp,order_item_id,product_id,price,freight_value
0,e481f51cbdc54678b7cc49136f2d6af7,9ef432eb6251297304e76186b10a928d,2017-10-02 10:56:33,1,87285b34884572647811a353c7ac498a,29.99,8.72
1,53cdb2fc8bc7dce0b6741e2150273451,b0830fb4747a6c6d20dea0b8c802d7ef,2018-07-24 20:41:37,1,595fac2a385ac33a80bd5114aec74eb8,118.70,22.76
2,47770eb9100c2d0c44946d9cf07ec65d,41ce2a54c0b03bf3443c3d931a367089,2018-08-08 08:38:49,1,aa4383b373c6aca5d8797843e5594415,159.90,19.22
3,949d5b44dbf5de918fe9c16f97b45f8a,f88197465ea7920adcdbec7375364d82,2017-11-18 19:28:06,1,d0b61bfb1de832b15ba9d266ca96e5b0,45.00,27.20
4,ad21c59c0840e6cb83a9ceb5573f8159,8ab97904e6daea8866dbdbc4fb7aad2c,2018-02-13 21:18:39,1,65266b2da20d04dbe00c5c2d3bb7859e,19.90,8.72
...,...,...,...,...,...,...,...
112645,63943bddc261676b46f01ca7ac2f7bd8,1fca14ff2861355f6e5f14306ff977a7,2018-02-06 12:58:58,1,f1d4ce8c6dd66c47bbaa8c6781c2a923,174.90,20.10
112646,83c1379a015df1e13d02aae0204711ab,1aa71eb042121263aafbe80c1b562c9c,2017-08-27 14:46:43,1,b80910977a37536adeddd63663f916ad,205.99,65.02
112647,11c177c8e97725db2631073c19f07b62,b331b74b18dc79bcdf6532d51e1637c1,2018-01-08 21:28:27,1,d1c427060a0f73f6b889a5c7c61f2ac4,179.99,40.59
112648,11c177c8e97725db2631073c19f07b62,b331b74b18dc79bcdf6532d51e1637c1,2018-01-08 21:28:27,2,d1c427060a0f73f6b889a5c7c61f2ac4,179.99,40.59


In [91]:
order_data = pd.merge(left=order_data, right=customers, how='inner', on='customer_id').drop(columns=['customer_id', 'customer_zip_code_prefix'])
order_data

Unnamed: 0,order_id,order_purchase_timestamp,order_item_id,product_id,price,freight_value,customer_unique_id,customer_city,customer_state
0,e481f51cbdc54678b7cc49136f2d6af7,2017-10-02 10:56:33,1,87285b34884572647811a353c7ac498a,29.99,8.72,7c396fd4830fd04220f754e42b4e5bff,sao paulo,SP
1,53cdb2fc8bc7dce0b6741e2150273451,2018-07-24 20:41:37,1,595fac2a385ac33a80bd5114aec74eb8,118.70,22.76,af07308b275d755c9edb36a90c618231,barreiras,BA
2,47770eb9100c2d0c44946d9cf07ec65d,2018-08-08 08:38:49,1,aa4383b373c6aca5d8797843e5594415,159.90,19.22,3a653a41f6f9fc3d2a113cf8398680e8,vianopolis,GO
3,949d5b44dbf5de918fe9c16f97b45f8a,2017-11-18 19:28:06,1,d0b61bfb1de832b15ba9d266ca96e5b0,45.00,27.20,7c142cf63193a1473d2e66489a9ae977,sao goncalo do amarante,RN
4,ad21c59c0840e6cb83a9ceb5573f8159,2018-02-13 21:18:39,1,65266b2da20d04dbe00c5c2d3bb7859e,19.90,8.72,72632f0f9dd73dfee390c9b22eb56dd6,santo andre,SP
...,...,...,...,...,...,...,...,...,...
112645,63943bddc261676b46f01ca7ac2f7bd8,2018-02-06 12:58:58,1,f1d4ce8c6dd66c47bbaa8c6781c2a923,174.90,20.10,da62f9e57a76d978d02ab5362c509660,praia grande,SP
112646,83c1379a015df1e13d02aae0204711ab,2017-08-27 14:46:43,1,b80910977a37536adeddd63663f916ad,205.99,65.02,737520a9aad80b3fbbdad19b66b37b30,nova vicosa,BA
112647,11c177c8e97725db2631073c19f07b62,2018-01-08 21:28:27,1,d1c427060a0f73f6b889a5c7c61f2ac4,179.99,40.59,5097a5312c8b157bb7be58ae360ef43c,japuiba,RJ
112648,11c177c8e97725db2631073c19f07b62,2018-01-08 21:28:27,2,d1c427060a0f73f6b889a5c7c61f2ac4,179.99,40.59,5097a5312c8b157bb7be58ae360ef43c,japuiba,RJ


### 1.5 Product data
- I already have a dataset with data about the order (such as date and costs), the customer who placed it and the items ordered, so I now fetch the product category using the "product_id" key.

In [11]:
products.head()

Unnamed: 0,product_id,product_category_name,product_name_lenght,product_description_lenght,product_photos_qty,product_weight_g,product_length_cm,product_height_cm,product_width_cm
0,1e9e8ef04dbcff4541ed26657ea517e5,perfumaria,40.0,287.0,1.0,225.0,16.0,10.0,14.0
1,3aa071139cb16b67ca9e5dea641aaa2f,artes,44.0,276.0,1.0,1000.0,30.0,18.0,20.0
2,96bd76ec8810374ed1b65e291975717f,esporte_lazer,46.0,250.0,1.0,154.0,18.0,9.0,15.0
3,cef67bcfe19066a932b7673e239eb23d,bebes,27.0,261.0,1.0,371.0,26.0,4.0,26.0
4,9dc1a7de274444849c219cff195d0b71,utilidades_domesticas,37.0,402.0,4.0,625.0,20.0,17.0,13.0


In [92]:
# Keep only the product category; the other product attributes are not needed here.
ord_cust_prod = pd.merge(left=order_data, right=products[['product_id', 'product_category_name']], on='product_id', how='inner')

In [93]:
ord_cust_prod.head()

Unnamed: 0,order_id,order_purchase_timestamp,order_item_id,product_id,price,freight_value,customer_unique_id,customer_city,customer_state,product_category_name
0,e481f51cbdc54678b7cc49136f2d6af7,2017-10-02 10:56:33,1,87285b34884572647811a353c7ac498a,29.99,8.72,7c396fd4830fd04220f754e42b4e5bff,sao paulo,SP,utilidades_domesticas
1,128e10d95713541c87cd1a2e48201934,2017-08-15 18:29:31,1,87285b34884572647811a353c7ac498a,29.99,7.78,3a51803cc0d012c3b5dc8b7528cb05f7,sao paulo,SP,utilidades_domesticas
2,0e7e841ddf8f8f2de2bad69267ecfbcf,2017-08-02 18:24:47,1,87285b34884572647811a353c7ac498a,29.99,7.78,ef0996a1a279c26e7ecbd737be23d235,sao paulo,SP,utilidades_domesticas
3,bfc39df4f36c3693ff3b63fcbea9e90a,2017-10-23 23:26:46,1,87285b34884572647811a353c7ac498a,29.99,14.1,e781fdcc107d13d865fc7698711cc572,florianopolis,SC,utilidades_domesticas
4,53cdb2fc8bc7dce0b6741e2150273451,2018-07-24 20:41:37,1,595fac2a385ac33a80bd5114aec74eb8,118.7,22.76,af07308b275d755c9edb36a90c618231,barreiras,BA,perfumaria
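
The category names are in Portuguese, and the category table loaded in section 1.0 is not used yet. A possible sketch of attaching English names follows; it assumes the translation file has the columns product_category_name and product_category_name_english, and the toy rows and the "unknown" fallback are illustrative choices, not part of the original analysis.

```python
import pandas as pd

# Toy rows; the last one has no category, as can happen in real data.
ord_toy = pd.DataFrame({
    "order_id": ["o1", "o2", "o3"],
    "product_category_name": ["perfumaria", "utilidades_domesticas", None],
})

# Assumed layout of the translation table (hypothetical sample rows).
category_toy = pd.DataFrame({
    "product_category_name": ["perfumaria", "utilidades_domesticas"],
    "product_category_name_english": ["perfumery", "housewares"],
})

# A left join keeps every order row, translated or not.
translated = ord_toy.merge(category_toy, on="product_category_name", how="left")

# Mark categories that are missing or have no translation.
translated["product_category_name_english"] = (
    translated["product_category_name_english"].fillna("unknown")
)
print(translated)
```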


### 1.6 Payment data
- A single order can contain several items. Payments provides the payment data for each order: an order can be paid with more than one payment method, forming a sequence (payment_sequential), and each payment can be split into a different number of installments.
- Since credit card payments account for about 75% of orders, with the remaining 25% spread across the other categories, I ignore the payment_type and payment_sequential columns.
- I group by order_id and sum the installments and the value, obtaining the total number of installments and the total amount paid for each order.
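
The credit card share quoted above can be checked with a normalized value count, and the per-order aggregation is a plain groupby sum. A minimal sketch on toy payments (values are made up; on the real table the share should come out near the 75% mentioned above):

```python
import pandas as pd

# Toy payments: order "o1" is paid with two methods (hypothetical values).
payments_toy = pd.DataFrame({
    "order_id": ["o1", "o1", "o2", "o3"],
    "payment_sequential": [1, 2, 1, 1],
    "payment_type": ["credit_card", "voucher", "credit_card", "boleto"],
    "payment_installments": [3, 1, 1, 1],
    "payment_value": [30.0, 8.71, 24.39, 65.71],
})

# Share of payment rows per payment type.
type_share = payments_toy["payment_type"].value_counts(normalize=True)

# One row per order: total installments and total amount paid.
per_order = (
    payments_toy
    .groupby("order_id", as_index=False)[["payment_installments", "payment_value"]]
    .sum()
)
print(type_share)
print(per_order)
```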

In [70]:
payments.head()

Unnamed: 0,order_id,payment_sequential,payment_type,payment_installments,payment_value
0,b81ef226f3fe1789b1e8b2acac839d17,1,credit_card,8,99.33
1,a9810da82917af2d9aefd1278f1dcfa0,1,credit_card,1,24.39
2,25e8ea4e93396b6fa0d3dd708e76c1bd,1,credit_card,1,65.71
3,ba78997921bbcdc1373bb41e913ab953,1,credit_card,8,107.78
4,42fdf880ba16b47b59251dd489d4441a,1,credit_card,2,128.45


In [7]:
total_installments_value = payments.groupby(['order_id'])[['payment_installments', 'payment_value']].sum().reset_index()

In [12]:
ord_cust_prod_pay = pd.merge(left=ord_cust_prod, right=total_installments_value, on='order_id', how='left')

In [13]:
ord_cust_prod_pay

Unnamed: 0,order_id,order_purchase_timestamp,order_item_id,product_id,price,freight_value,customer_unique_id,product_category_name,payment_installments,payment_value
0,e481f51cbdc54678b7cc49136f2d6af7,2017-10-02 10:56:33,1,87285b34884572647811a353c7ac498a,29.99,8.72,7c396fd4830fd04220f754e42b4e5bff,utilidades_domesticas,3.0,38.71
1,128e10d95713541c87cd1a2e48201934,2017-08-15 18:29:31,1,87285b34884572647811a353c7ac498a,29.99,7.78,3a51803cc0d012c3b5dc8b7528cb05f7,utilidades_domesticas,3.0,37.77
2,0e7e841ddf8f8f2de2bad69267ecfbcf,2017-08-02 18:24:47,1,87285b34884572647811a353c7ac498a,29.99,7.78,ef0996a1a279c26e7ecbd737be23d235,utilidades_domesticas,1.0,37.77
3,bfc39df4f36c3693ff3b63fcbea9e90a,2017-10-23 23:26:46,1,87285b34884572647811a353c7ac498a,29.99,14.10,e781fdcc107d13d865fc7698711cc572,utilidades_domesticas,1.0,44.09
4,53cdb2fc8bc7dce0b6741e2150273451,2018-07-24 20:41:37,1,595fac2a385ac33a80bd5114aec74eb8,118.70,22.76,af07308b275d755c9edb36a90c618231,perfumaria,1.0,141.46
...,...,...,...,...,...,...,...,...,...,...
112645,e8fd20068b9f7e6ec07068bb7537f781,2017-08-10 21:21:07,1,0df37da38a30a713453b03053d60d3f7,356.00,18.12,fb9310710003399b031add3e55f34719,esporte_lazer,10.0,748.24
112646,e8fd20068b9f7e6ec07068bb7537f781,2017-08-10 21:21:07,2,0df37da38a30a713453b03053d60d3f7,356.00,18.12,fb9310710003399b031add3e55f34719,esporte_lazer,10.0,748.24
112647,cfa78b997e329a5295b4ee6972c02979,2017-12-20 09:52:41,1,3d2c44374ee42b3003a470f3e937a2ea,55.90,15.14,a49e8e11e850592fe685ae3c64b40eca,instrumentos_musicais,1.0,71.04
112648,9c5dedf39a927c1b2549525ed64a053c,2017-03-09 09:54:05,1,ac35486adb7b02598c182c2ff2e05254,72.00,13.08,6359f309b166b0196dbf7ad2ac62bb5a,beleza_saude,3.0,85.08
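
One caveat is visible in the last rows of this output: payment_value is an order-level total, so it is repeated on every item row of the same order. Rows 112645 and 112646 are two items of one order at 356.00 + 18.12 each, and both carry 748.24, which is the total for the two items together. Summing payment_value over item rows would therefore double count. A sketch of the difference, using those values with shortened ids:

```python
import pandas as pd

# Rows copied from the output above; order ids are shortened for readability.
rows = pd.DataFrame({
    "order_id": ["e8fd", "e8fd", "cfa7"],
    "order_item_id": [1, 2, 1],
    "price": [356.00, 356.00, 55.90],
    "freight_value": [18.12, 18.12, 15.14],
    "payment_value": [748.24, 748.24, 71.04],
})

# Naive sum over item rows counts the two-item order twice.
naive_total = rows["payment_value"].sum()

# Keeping one row per order before summing avoids the double count.
order_total = rows.drop_duplicates("order_id")["payment_value"].sum()

# Cross-check: item prices plus freight should match the order-level total.
item_total = (rows["price"] + rows["freight_value"]).sum()

print(naive_total, order_total, item_total)
```

For item-level questions, price and freight_value are the safer columns; payment_value fits order-level questions after deduplicating on order_id.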


### 1.7 Seller data
- For the analysis, it may be useful to have, for each order, the seller and its location, the shipping and delivery dates, and the customer with its location as well.

In [18]:
sellers.head()

Unnamed: 0,seller_id,seller_zip_code_prefix,seller_city,seller_state
0,3442f8959a84dea7ee197c632cb2df15,13023,campinas,SP
1,d1b65fc7debc3361ea86b5f14c68d2e2,13844,mogi guacu,SP
2,ce3ad9de960102d0677a81f5d0bb7b2d,20031,rio de janeiro,RJ
3,c0f3eea2e14555b6faeea3dd58c1b1c3,4195,sao paulo,SP
4,51a04a8a6bdcb23deccc82b0b80742cf,12914,braganca paulista,SP


In [19]:
sellers.shape

(3095, 4)

In [63]:
sellers_geoloc = pd.merge(left=sellers, right=geolocation_grouped, left_on='seller_zip_code_prefix', right_on='geolocation_zip_code_prefix', how='left').drop(columns=['seller_zip_code_prefix', 'geolocation_zip_code_prefix'])

In [64]:
sellers_orders_geoloc = pd.merge(left=items, right=sellers_geoloc, on='seller_id', how='left').drop(columns=['order_item_id', 'product_id', 'price', 'freight_value'])

In [65]:
sellers_orders_geoloc = pd.merge(left=sellers_orders_geoloc, right=orders, on='order_id', how='inner').drop(columns=['order_status', 'order_approved_at', 'order_delivered_carrier_date', 'shipping_limit_date'])

In [66]:
seller_data = pd.merge(left=sellers_orders_geoloc, right=customer_data, on='customer_id', how='inner').drop(columns=['customer_id', 'customer_zip_code_prefix']).rename(columns={'geolocation_lat_x': 'seller_lat', 'geolocation_lng_x': 'seller_lng', 'geolocation_lat_y': 'customer_lat', 'geolocation_lng_y': 'customer_lng'})

In [67]:
seller_data.head()

Unnamed: 0,order_id,seller_id,seller_city,seller_state,seller_lat,seller_lng,order_purchase_timestamp,order_delivered_customer_date,order_estimated_delivery_date,customer_unique_id,customer_city,customer_state,customer_lat,customer_lng
0,00010242fe8c5a6d1ba2dd792cb16214,48436dade18ac8b2bce089ec2a041202,volta redonda,SP,-22.496953,-44.127492,2017-09-13 08:59:02,2017-09-20 23:43:48,2017-09-29 00:00:00,871766c5855e863f6eccc05f988b23cb,campos dos goytacazes,RJ,-21.762775,-41.309633
1,00018f77f2f0320c557190d7a144bdd3,dd7ddc04e1b6c2c614352b383efe2d36,sao paulo,SP,-23.565096,-46.518565,2017-04-26 10:53:06,2017-05-12 16:04:24,2017-05-15 00:00:00,eb28e67c4c0b83846050ddfb8a35d051,santa fe do sul,SP,-20.220527,-50.903424
2,000229ec398224ef6ca0657da4fc703e,5b51032eddd242adc84c38acab88f23d,borda da mata,MG,-22.262584,-46.171124,2018-01-14 14:33:31,2018-01-22 13:19:16,2018-02-05 00:00:00,3818d81c6709e39d06b2738a8d3a2474,para de minas,MG,-19.870305,-44.593326
3,00024acbcdf0a6daa1e931b038114c75,9d7a1d34a5052409006425275ba1c2b4,franca,SP,-20.553624,-47.387359,2018-08-08 10:00:35,2018-08-14 13:32:39,2018-08-20 00:00:00,af861d436cfc08b2c2ddefd0ba074622,atibaia,SP,-23.089925,-46.611654
4,00042b26cf59d7ce69dfabb4e55b4fd9,df560393f3a51e74553ab94004ba5c87,loanda,PR,-22.929384,-53.135873,2017-02-04 13:57:51,2017-03-01 16:42:31,2017-03-17 00:00:00,64b576fb70d441e8f1b2d7d446e483c5,varzea paulista,SP,-23.243402,-46.827614
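
With both coordinate pairs on the same row, one natural derived feature is the straight-line distance between seller and customer. The sketch below uses the haversine formula; the function name haversine_km, the distance_km column and the Earth radius of 6371 km are my own choices for illustration, and the coordinates are the first two rows of the output above.

```python
import numpy as np
import pandas as pd

def haversine_km(lat1, lng1, lat2, lng2, radius_km=6371.0):
    """Great-circle distance in km between points given in decimal degrees."""
    lat1, lng1, lat2, lng2 = map(np.radians, (lat1, lng1, lat2, lng2))
    dlat = lat2 - lat1
    dlng = lng2 - lng1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlng / 2) ** 2
    return 2 * radius_km * np.arcsin(np.sqrt(a))

# First two rows of seller_data (coordinates copied from the output above).
pairs = pd.DataFrame({
    "seller_lat": [-22.496953, -23.565096],
    "seller_lng": [-44.127492, -46.518565],
    "customer_lat": [-21.762775, -20.220527],
    "customer_lng": [-41.309633, -50.903424],
})

# The function is vectorized, so whole columns can be passed at once.
pairs["distance_km"] = haversine_km(
    pairs["seller_lat"], pairs["seller_lng"],
    pairs["customer_lat"], pairs["customer_lng"],
)
print(pairs["distance_km"])
```

Because the coordinates are zip-prefix centroids, the result is an approximation of the real shipping distance, not a route length.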


### 1.9 Initial understanding of the data
- With the datasets in hand, I now carry out an initial exploration: descriptive statistics, null and duplicated values, and the necessary cleaning.
- The analysis covers the two datasets built above. The first holds information about orders, customers and payments. The second holds information about sellers and customers.

In [88]:
df = ord_cust_prod_pay.copy()
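
The first checks described above (nulls, duplicates, descriptive statistics) could look like the sketch below. It runs on a small made-up frame, df_toy, so that it is self-contained; the same calls apply to df.

```python
import pandas as pd

# Toy frame with one missing category and one fully duplicated row.
df_toy = pd.DataFrame({
    "order_id": ["o1", "o1", "o2", "o2"],
    "order_item_id": [1, 2, 1, 1],
    "price": [10.0, 20.0, 30.0, 30.0],
    "product_category_name": ["artes", None, "bebes", "bebes"],
})

# Missing values per column.
null_counts = df_toy.isna().sum()

# Rows that are exact copies of an earlier row.
n_duplicates = int(df_toy.duplicated().sum())

# Descriptive statistics for the numeric columns.
summary = df_toy.describe()

# Drop exact duplicates and rebuild a clean 0..n-1 index.
df_clean = df_toy.drop_duplicates().reset_index(drop=True)

print(null_counts)
print(n_duplicates)
print(summary)
```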