# Data Analysis Project: E-Commerce
- **Name:** Danny Suggi Saputra
- **Email:** dannysaputra3003@gmail.com
- **Dicoding ID:** dannyysaputra

## Defining Business Questions

- How has the number of orders changed over time?
- Which product categories sell the most items and generate the highest revenue?
- Which payment method do customers use most often?
- How satisfied are customers with the service they receive?
- Which cities or regions account for the most purchases?

## Import All Packages/Libraries Used

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

## Data Wrangling

### Gathering Data

In [2]:
df_customers = pd.read_csv('data/customers_dataset.csv')
df_customers.head()

Unnamed: 0,customer_id,customer_unique_id,customer_zip_code_prefix,customer_city,customer_state
0,06b8999e2fba1a1fbc88172c00ba8bc7,861eff4711a542e4b93843c6dd7febb0,14409,franca,SP
1,18955e83d337fd6b2def6b18a428ac77,290c77bc529b7ac935b93aa66c333dc3,9790,sao bernardo do campo,SP
2,4e7b3e00288586ebd08712fdd0374a03,060e732b5b29e8181a18229c7b0b2b5e,1151,sao paulo,SP
3,b2b6027bc5c5109e529d4dc6358b12c3,259dac757896d24d7702b9acbbff3f3c,8775,mogi das cruzes,SP
4,4f2d8ab171c80ec8364f7c12e35b23ad,345ecd01c38d18a9036ed96c73b8d066,13056,campinas,SP


In [3]:
df_geolocation = pd.read_csv('data/geolocation_dataset.csv')
df_geolocation.head()

Unnamed: 0,geolocation_zip_code_prefix,geolocation_lat,geolocation_lng,geolocation_city,geolocation_state
0,1037,-23.545621,-46.639292,sao paulo,SP
1,1046,-23.546081,-46.64482,sao paulo,SP
2,1046,-23.546129,-46.642951,sao paulo,SP
3,1041,-23.544392,-46.639499,sao paulo,SP
4,1035,-23.541578,-46.641607,sao paulo,SP


In [4]:
df_order_items = pd.read_csv('data/order_items_dataset.csv')
df_order_items.head()

Unnamed: 0,order_id,order_item_id,product_id,seller_id,shipping_limit_date,price,freight_value
0,00010242fe8c5a6d1ba2dd792cb16214,1,4244733e06e7ecb4970a6e2683c13e61,48436dade18ac8b2bce089ec2a041202,2017-09-19 09:45:35,58.9,13.29
1,00018f77f2f0320c557190d7a144bdd3,1,e5f2d52b802189ee658865ca93d83a8f,dd7ddc04e1b6c2c614352b383efe2d36,2017-05-03 11:05:13,239.9,19.93
2,000229ec398224ef6ca0657da4fc703e,1,c777355d18b72b67abbeef9df44fd0fd,5b51032eddd242adc84c38acab88f23d,2018-01-18 14:48:30,199.0,17.87
3,00024acbcdf0a6daa1e931b038114c75,1,7634da152a4610f1595efa32f14722fc,9d7a1d34a5052409006425275ba1c2b4,2018-08-15 10:10:18,12.99,12.79
4,00042b26cf59d7ce69dfabb4e55b4fd9,1,ac6c3623068f30de03045865e4e10089,df560393f3a51e74553ab94004ba5c87,2017-02-13 13:57:51,199.9,18.14


In [5]:
df_order_payments = pd.read_csv('data/order_payments_dataset.csv')
df_order_payments.head()

Unnamed: 0,order_id,payment_sequential,payment_type,payment_installments,payment_value
0,b81ef226f3fe1789b1e8b2acac839d17,1,credit_card,8,99.33
1,a9810da82917af2d9aefd1278f1dcfa0,1,credit_card,1,24.39
2,25e8ea4e93396b6fa0d3dd708e76c1bd,1,credit_card,1,65.71
3,ba78997921bbcdc1373bb41e913ab953,1,credit_card,8,107.78
4,42fdf880ba16b47b59251dd489d4441a,1,credit_card,2,128.45


In [6]:
df_order_reviews = pd.read_csv('data/order_reviews_dataset.csv')
df_order_reviews.head()

Unnamed: 0,review_id,order_id,review_score,review_comment_title,review_comment_message,review_creation_date,review_answer_timestamp
0,7bc2406110b926393aa56f80a40eba40,73fc7af87114b39712e6da79b0a377eb,4,,,2018-01-18 00:00:00,2018-01-18 21:46:59
1,80e641a11e56f04c1ad469d5645fdfde,a548910a1c6147796b98fdf73dbeba33,5,,,2018-03-10 00:00:00,2018-03-11 03:05:13
2,228ce5500dc1d8e020d8d1322874b6f0,f9e4b658b201a9f2ecdecbb34bed034b,5,,,2018-02-17 00:00:00,2018-02-18 14:36:24
3,e64fb393e7b32834bb789ff8bb30750e,658677c97b385a9be170737859d3511b,5,,Recebi bem antes do prazo estipulado.,2017-04-21 00:00:00,2017-04-21 22:02:06
4,f7c4243c7fe1938f181bec41a392bdeb,8e6bfb81e283fa7e4f11123a3fb894f1,5,,Parabéns lojas lannister adorei comprar pela I...,2018-03-01 00:00:00,2018-03-02 10:26:53


In [7]:
df_orders = pd.read_csv('data/orders_dataset.csv')
df_orders.head()

Unnamed: 0,order_id,customer_id,order_status,order_purchase_timestamp,order_approved_at,order_delivered_carrier_date,order_delivered_customer_date,order_estimated_delivery_date
0,e481f51cbdc54678b7cc49136f2d6af7,9ef432eb6251297304e76186b10a928d,delivered,2017-10-02 10:56:33,2017-10-02 11:07:15,2017-10-04 19:55:00,2017-10-10 21:25:13,2017-10-18 00:00:00
1,53cdb2fc8bc7dce0b6741e2150273451,b0830fb4747a6c6d20dea0b8c802d7ef,delivered,2018-07-24 20:41:37,2018-07-26 03:24:27,2018-07-26 14:31:00,2018-08-07 15:27:45,2018-08-13 00:00:00
2,47770eb9100c2d0c44946d9cf07ec65d,41ce2a54c0b03bf3443c3d931a367089,delivered,2018-08-08 08:38:49,2018-08-08 08:55:23,2018-08-08 13:50:00,2018-08-17 18:06:29,2018-09-04 00:00:00
3,949d5b44dbf5de918fe9c16f97b45f8a,f88197465ea7920adcdbec7375364d82,delivered,2017-11-18 19:28:06,2017-11-18 19:45:59,2017-11-22 13:39:59,2017-12-02 00:28:42,2017-12-15 00:00:00
4,ad21c59c0840e6cb83a9ceb5573f8159,8ab97904e6daea8866dbdbc4fb7aad2c,delivered,2018-02-13 21:18:39,2018-02-13 22:20:29,2018-02-14 19:46:34,2018-02-16 18:17:02,2018-02-26 00:00:00


In [8]:
df_product_category = pd.read_csv('data/product_category_name_translation.csv')
df_product_category.head()

Unnamed: 0,product_category_name,product_category_name_english
0,beleza_saude,health_beauty
1,informatica_acessorios,computers_accessories
2,automotivo,auto
3,cama_mesa_banho,bed_bath_table
4,moveis_decoracao,furniture_decor


In [9]:
df_products = pd.read_csv('data/products_dataset.csv')
df_products.head()

Unnamed: 0,product_id,product_category_name,product_name_lenght,product_description_lenght,product_photos_qty,product_weight_g,product_length_cm,product_height_cm,product_width_cm
0,1e9e8ef04dbcff4541ed26657ea517e5,perfumaria,40.0,287.0,1.0,225.0,16.0,10.0,14.0
1,3aa071139cb16b67ca9e5dea641aaa2f,artes,44.0,276.0,1.0,1000.0,30.0,18.0,20.0
2,96bd76ec8810374ed1b65e291975717f,esporte_lazer,46.0,250.0,1.0,154.0,18.0,9.0,15.0
3,cef67bcfe19066a932b7673e239eb23d,bebes,27.0,261.0,1.0,371.0,26.0,4.0,26.0
4,9dc1a7de274444849c219cff195d0b71,utilidades_domesticas,37.0,402.0,4.0,625.0,20.0,17.0,13.0


In [10]:
df_sellers = pd.read_csv('data/sellers_dataset.csv')
df_sellers.head()

Unnamed: 0,seller_id,seller_zip_code_prefix,seller_city,seller_state
0,3442f8959a84dea7ee197c632cb2df15,13023,campinas,SP
1,d1b65fc7debc3361ea86b5f14c68d2e2,13844,mogi guacu,SP
2,ce3ad9de960102d0677a81f5d0bb7b2d,20031,rio de janeiro,RJ
3,c0f3eea2e14555b6faeea3dd58c1b1c3,4195,sao paulo,SP
4,51a04a8a6bdcb23deccc82b0b80742cf,12914,braganca paulista,SP
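The nine `pd.read_csv` calls above follow one pattern, so they can also be written as a single loop. This is a minimal sketch, assuming the same `data/` folder and file names as the cells above; `DATASET_FILES` and `load_datasets` are hypothetical helpers, not part of the original notebook.

```python
from pathlib import Path

import pandas as pd

# Hypothetical name -> file mapping mirroring the read_csv calls above
DATASET_FILES = {
    "customers": "customers_dataset.csv",
    "geolocation": "geolocation_dataset.csv",
    "order_items": "order_items_dataset.csv",
    "order_payments": "order_payments_dataset.csv",
    "order_reviews": "order_reviews_dataset.csv",
    "orders": "orders_dataset.csv",
    "product_category": "product_category_name_translation.csv",
    "products": "products_dataset.csv",
    "sellers": "sellers_dataset.csv",
}


def load_datasets(data_dir="data", files=DATASET_FILES):
    """Read every CSV listed in `files` from `data_dir` into a dict of DataFrames."""
    data_dir = Path(data_dir)
    return {name: pd.read_csv(data_dir / filename) for name, filename in files.items()}


# Usage (assumes the CSVs sit in ./data, as in the cells above):
# datasets = load_datasets()
# datasets["orders"].head()
```

Keeping the tables in one dict makes it easy to run the same quality checks over all of them later.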


In [11]:
# Merge data orders + customers

df_orders_customers = df_orders.merge(df_customers, on="customer_id", how="left")
df_orders_customers.head()

Unnamed: 0,order_id,customer_id,order_status,order_purchase_timestamp,order_approved_at,order_delivered_carrier_date,order_delivered_customer_date,order_estimated_delivery_date,customer_unique_id,customer_zip_code_prefix,customer_city,customer_state
0,e481f51cbdc54678b7cc49136f2d6af7,9ef432eb6251297304e76186b10a928d,delivered,2017-10-02 10:56:33,2017-10-02 11:07:15,2017-10-04 19:55:00,2017-10-10 21:25:13,2017-10-18 00:00:00,7c396fd4830fd04220f754e42b4e5bff,3149,sao paulo,SP
1,53cdb2fc8bc7dce0b6741e2150273451,b0830fb4747a6c6d20dea0b8c802d7ef,delivered,2018-07-24 20:41:37,2018-07-26 03:24:27,2018-07-26 14:31:00,2018-08-07 15:27:45,2018-08-13 00:00:00,af07308b275d755c9edb36a90c618231,47813,barreiras,BA
2,47770eb9100c2d0c44946d9cf07ec65d,41ce2a54c0b03bf3443c3d931a367089,delivered,2018-08-08 08:38:49,2018-08-08 08:55:23,2018-08-08 13:50:00,2018-08-17 18:06:29,2018-09-04 00:00:00,3a653a41f6f9fc3d2a113cf8398680e8,75265,vianopolis,GO
3,949d5b44dbf5de918fe9c16f97b45f8a,f88197465ea7920adcdbec7375364d82,delivered,2017-11-18 19:28:06,2017-11-18 19:45:59,2017-11-22 13:39:59,2017-12-02 00:28:42,2017-12-15 00:00:00,7c142cf63193a1473d2e66489a9ae977,59296,sao goncalo do amarante,RN
4,ad21c59c0840e6cb83a9ceb5573f8159,8ab97904e6daea8866dbdbc4fb7aad2c,delivered,2018-02-13 21:18:39,2018-02-13 22:20:29,2018-02-14 19:46:34,2018-02-16 18:17:02,2018-02-26 00:00:00,72632f0f9dd73dfee390c9b22eb56dd6,9195,santo andre,SP
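This merged table already contains `customer_city` and `customer_state`, which is what the question about cities and regions needs. Note that the cleaning step later in the notebook keeps only `order_id` and `order_purchase_timestamp`, so a location count like the sketch below would have to run before that column selection. `top_locations` is a hypothetical helper, and the toy rows are copied from the output above.

```python
import pandas as pd


def top_locations(df, column="customer_state", n=10):
    """Number of distinct orders per value of `column`, largest first."""
    return (
        df.groupby(column)["order_id"]
        .nunique()  # distinct orders, robust to rows duplicated by later joins
        .sort_values(ascending=False)
        .head(n)
    )


# Toy rows shaped like df_orders_customers (values taken from the output above)
toy = pd.DataFrame({
    "order_id": ["e481", "53cd", "4777", "949d", "ad21"],
    "customer_city": ["sao paulo", "barreiras", "vianopolis",
                      "sao goncalo do amarante", "santo andre"],
    "customer_state": ["SP", "BA", "GO", "RN", "SP"],
})
by_state = top_locations(toy, "customer_state")
by_city = top_locations(toy, "customer_city")
```

Passing `"customer_city"` instead of `"customer_state"` gives the city-level ranking with the same helper.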


In [12]:
# Merge data orders + order items + products + product category

df_orders_items = df_order_items.merge(df_orders, on="order_id", how="left")
df_orders_items = df_orders_items.merge(df_products, on="product_id", how="left")
df_orders_items = df_orders_items.merge(df_product_category, on="product_category_name", how="left")
df_orders_items.head()

Unnamed: 0,order_id,order_item_id,product_id,seller_id,shipping_limit_date,price,freight_value,customer_id,order_status,order_purchase_timestamp,...,order_estimated_delivery_date,product_category_name,product_name_lenght,product_description_lenght,product_photos_qty,product_weight_g,product_length_cm,product_height_cm,product_width_cm,product_category_name_english
0,00010242fe8c5a6d1ba2dd792cb16214,1,4244733e06e7ecb4970a6e2683c13e61,48436dade18ac8b2bce089ec2a041202,2017-09-19 09:45:35,58.9,13.29,3ce436f183e68e07877b285a838db11a,delivered,2017-09-13 08:59:02,...,2017-09-29 00:00:00,cool_stuff,58.0,598.0,4.0,650.0,28.0,9.0,14.0,cool_stuff
1,00018f77f2f0320c557190d7a144bdd3,1,e5f2d52b802189ee658865ca93d83a8f,dd7ddc04e1b6c2c614352b383efe2d36,2017-05-03 11:05:13,239.9,19.93,f6dd3ec061db4e3987629fe6b26e5cce,delivered,2017-04-26 10:53:06,...,2017-05-15 00:00:00,pet_shop,56.0,239.0,2.0,30000.0,50.0,30.0,40.0,pet_shop
2,000229ec398224ef6ca0657da4fc703e,1,c777355d18b72b67abbeef9df44fd0fd,5b51032eddd242adc84c38acab88f23d,2018-01-18 14:48:30,199.0,17.87,6489ae5e4333f3693df5ad4372dab6d3,delivered,2018-01-14 14:33:31,...,2018-02-05 00:00:00,moveis_decoracao,59.0,695.0,2.0,3050.0,33.0,13.0,33.0,furniture_decor
3,00024acbcdf0a6daa1e931b038114c75,1,7634da152a4610f1595efa32f14722fc,9d7a1d34a5052409006425275ba1c2b4,2018-08-15 10:10:18,12.99,12.79,d4eb9395c8c0431ee92fce09860c5a06,delivered,2018-08-08 10:00:35,...,2018-08-20 00:00:00,perfumaria,42.0,480.0,1.0,200.0,16.0,10.0,15.0,perfumery
4,00042b26cf59d7ce69dfabb4e55b4fd9,1,ac6c3623068f30de03045865e4e10089,df560393f3a51e74553ab94004ba5c87,2017-02-13 13:57:51,199.9,18.14,58dbd0b2d70206bf40e62cd34e84d795,delivered,2017-02-04 13:57:51,...,2017-03-17 00:00:00,ferramentas_jardim,59.0,409.0,1.0,3750.0,35.0,40.0,30.0,garden_tools
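A left merge silently multiplies rows when the right-hand key is not unique, which is easy to miss in a chain of merges like the one above. `DataFrame.merge` has a `validate` parameter that raises `pandas.errors.MergeError` when the declared relationship does not hold, and `indicator=True` adds a `_merge` column showing which rows found a match. The sketch below uses made-up toy tables to show both.

```python
import pandas as pd

# Toy tables: two items in order "a" and one in order "b"
items = pd.DataFrame({"order_id": ["a", "a", "b"], "product_id": ["p1", "p2", "p1"]})
products = pd.DataFrame({
    "product_id": ["p1", "p2"],
    "product_category_name": ["perfumaria", "artes"],
})

# validate="many_to_one" raises MergeError if product_id repeats in `products`;
# indicator=True adds `_merge` ("both" or "left_only") for each item row
merged = items.merge(products, on="product_id", how="left",
                     validate="many_to_one", indicator=True)
unmatched = (merged["_merge"] == "left_only").sum()

# Duplicating one product row breaks the many-to-one assumption
dup_products = pd.concat([products, products.iloc[[0]]], ignore_index=True)
try:
    items.merge(dup_products, on="product_id", how="left", validate="many_to_one")
    rejected = False
except pd.errors.MergeError:
    rejected = True
```

Applied to the real tables, `validate="many_to_one"` would fit the `df_products` and `df_product_category` merges, where each product and each category name is expected to appear once on the right-hand side.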


In [13]:
# Merge data orders + order payments

df_orders_payments = df_orders.merge(df_order_payments, on="order_id", how="left")
df_orders_payments.head()

Unnamed: 0,order_id,customer_id,order_status,order_purchase_timestamp,order_approved_at,order_delivered_carrier_date,order_delivered_customer_date,order_estimated_delivery_date,payment_sequential,payment_type,payment_installments,payment_value
0,e481f51cbdc54678b7cc49136f2d6af7,9ef432eb6251297304e76186b10a928d,delivered,2017-10-02 10:56:33,2017-10-02 11:07:15,2017-10-04 19:55:00,2017-10-10 21:25:13,2017-10-18 00:00:00,1.0,credit_card,1.0,18.12
1,e481f51cbdc54678b7cc49136f2d6af7,9ef432eb6251297304e76186b10a928d,delivered,2017-10-02 10:56:33,2017-10-02 11:07:15,2017-10-04 19:55:00,2017-10-10 21:25:13,2017-10-18 00:00:00,3.0,voucher,1.0,2.0
2,e481f51cbdc54678b7cc49136f2d6af7,9ef432eb6251297304e76186b10a928d,delivered,2017-10-02 10:56:33,2017-10-02 11:07:15,2017-10-04 19:55:00,2017-10-10 21:25:13,2017-10-18 00:00:00,2.0,voucher,1.0,18.59
3,53cdb2fc8bc7dce0b6741e2150273451,b0830fb4747a6c6d20dea0b8c802d7ef,delivered,2018-07-24 20:41:37,2018-07-26 03:24:27,2018-07-26 14:31:00,2018-08-07 15:27:45,2018-08-13 00:00:00,1.0,boleto,1.0,141.46
4,47770eb9100c2d0c44946d9cf07ec65d,41ce2a54c0b03bf3443c3d931a367089,delivered,2018-08-08 08:38:49,2018-08-08 08:55:23,2018-08-08 13:50:00,2018-08-17 18:06:29,2018-09-04 00:00:00,1.0,credit_card,3.0,179.12
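The output above shows that one order (`e481f51c...`) appears three times: it was paid with a credit card plus two vouchers, one row per `payment_sequential`. When an analysis needs one row per order, the payments can be collapsed first. This is a sketch using a hypothetical helper; the toy values are copied from the output above.

```python
import pandas as pd


def payments_per_order(df_payments):
    """Collapse multiple payment rows (payment_sequential) into one row per order."""
    return (
        df_payments.groupby("order_id")
        .agg(
            total_payment=("payment_value", "sum"),
            n_payments=("payment_sequential", "count"),
            # Distinct payment types used for the order, sorted and comma-joined
            payment_types=("payment_type", lambda s: ",".join(sorted(s.dropna().unique()))),
        )
        .reset_index()
    )


toy = pd.DataFrame({
    "order_id": ["e481", "e481", "e481", "53cd"],
    "payment_sequential": [1, 3, 2, 1],
    "payment_type": ["credit_card", "voucher", "voucher", "boleto"],
    "payment_value": [18.12, 2.0, 18.59, 141.46],
})
summary = payments_per_order(toy)
```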


In [14]:
# Merge data orders + order reviews

df_orders_reviews = df_orders.merge(df_order_reviews, on="order_id", how="left")
df_orders_reviews.head()

Unnamed: 0,order_id,customer_id,order_status,order_purchase_timestamp,order_approved_at,order_delivered_carrier_date,order_delivered_customer_date,order_estimated_delivery_date,review_id,review_score,review_comment_title,review_comment_message,review_creation_date,review_answer_timestamp
0,e481f51cbdc54678b7cc49136f2d6af7,9ef432eb6251297304e76186b10a928d,delivered,2017-10-02 10:56:33,2017-10-02 11:07:15,2017-10-04 19:55:00,2017-10-10 21:25:13,2017-10-18 00:00:00,a54f0611adc9ed256b57ede6b6eb5114,4.0,,"Não testei o produto ainda, mas ele veio corre...",2017-10-11 00:00:00,2017-10-12 03:43:48
1,53cdb2fc8bc7dce0b6741e2150273451,b0830fb4747a6c6d20dea0b8c802d7ef,delivered,2018-07-24 20:41:37,2018-07-26 03:24:27,2018-07-26 14:31:00,2018-08-07 15:27:45,2018-08-13 00:00:00,8d5266042046a06655c8db133d120ba5,4.0,Muito boa a loja,Muito bom o produto.,2018-08-08 00:00:00,2018-08-08 18:37:50
2,47770eb9100c2d0c44946d9cf07ec65d,41ce2a54c0b03bf3443c3d931a367089,delivered,2018-08-08 08:38:49,2018-08-08 08:55:23,2018-08-08 13:50:00,2018-08-17 18:06:29,2018-09-04 00:00:00,e73b67b67587f7644d5bd1a52deb1b01,5.0,,,2018-08-18 00:00:00,2018-08-22 19:07:58
3,949d5b44dbf5de918fe9c16f97b45f8a,f88197465ea7920adcdbec7375364d82,delivered,2017-11-18 19:28:06,2017-11-18 19:45:59,2017-11-22 13:39:59,2017-12-02 00:28:42,2017-12-15 00:00:00,359d03e676b3c069f62cadba8dd3f6e8,5.0,,O produto foi exatamente o que eu esperava e e...,2017-12-03 00:00:00,2017-12-05 19:21:58
4,ad21c59c0840e6cb83a9ceb5573f8159,8ab97904e6daea8866dbdbc4fb7aad2c,delivered,2018-02-13 21:18:39,2018-02-13 22:20:29,2018-02-14 19:46:34,2018-02-16 18:17:02,2018-02-26 00:00:00,e50934924e227544ba8246aeb3770dd4,5.0,,,2018-02-17 00:00:00,2018-02-18 13:02:51
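For the customer-satisfaction question, `review_score` is the most direct signal. Two details from the assessment further down matter here. First, the left join leaves `review_score` empty for orders without a review (99,224 non-null values out of 99,992 rows). Second, the table has more rows than there are orders (99,992 versus 99,441), so some orders carry more than one review. The sketch below, a hypothetical helper on toy data, reports coverage alongside the score distribution and treats each review row as one observation, which is an assumption.

```python
import pandas as pd


def review_summary(df):
    """Share of rows with a review score, mean score, and score distribution."""
    scores = df["review_score"].dropna()
    return {
        "coverage": df["review_score"].notna().mean(),
        "mean_score": scores.mean(),
        "distribution": scores.value_counts(normalize=True).sort_index(),
    }


toy = pd.DataFrame({
    "order_id": ["e481", "53cd", "4777", "xxxx"],  # "xxxx" is a made-up ID
    "review_score": [4.0, 4.0, 5.0, None],  # last order has no review after the left join
})
summary = review_summary(toy)
```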


In [15]:
# Merge order items (already joined with orders and products) + sellers + geolocation

df_orders_sellers = df_orders_items.merge(df_sellers, on="seller_id", how="left")
df_orders_sellers = df_orders_sellers.merge(df_geolocation, left_on="seller_zip_code_prefix", right_on="geolocation_zip_code_prefix", how="left")
df_orders_sellers.head()

Unnamed: 0,order_id,order_item_id,product_id,seller_id,shipping_limit_date,price,freight_value,customer_id,order_status,order_purchase_timestamp,...,product_width_cm,product_category_name_english,seller_zip_code_prefix,seller_city,seller_state,geolocation_zip_code_prefix,geolocation_lat,geolocation_lng,geolocation_city,geolocation_state
0,00010242fe8c5a6d1ba2dd792cb16214,1,4244733e06e7ecb4970a6e2683c13e61,48436dade18ac8b2bce089ec2a041202,2017-09-19 09:45:35,58.9,13.29,3ce436f183e68e07877b285a838db11a,delivered,2017-09-13 08:59:02,...,14.0,cool_stuff,27277,volta redonda,SP,27277.0,-22.498183,-44.123614,volta redonda,RJ
1,00010242fe8c5a6d1ba2dd792cb16214,1,4244733e06e7ecb4970a6e2683c13e61,48436dade18ac8b2bce089ec2a041202,2017-09-19 09:45:35,58.9,13.29,3ce436f183e68e07877b285a838db11a,delivered,2017-09-13 08:59:02,...,14.0,cool_stuff,27277,volta redonda,SP,27277.0,-22.487885,-44.131566,volta redonda,RJ
2,00010242fe8c5a6d1ba2dd792cb16214,1,4244733e06e7ecb4970a6e2683c13e61,48436dade18ac8b2bce089ec2a041202,2017-09-19 09:45:35,58.9,13.29,3ce436f183e68e07877b285a838db11a,delivered,2017-09-13 08:59:02,...,14.0,cool_stuff,27277,volta redonda,SP,27277.0,-22.501227,-44.132443,volta redonda,RJ
3,00010242fe8c5a6d1ba2dd792cb16214,1,4244733e06e7ecb4970a6e2683c13e61,48436dade18ac8b2bce089ec2a041202,2017-09-19 09:45:35,58.9,13.29,3ce436f183e68e07877b285a838db11a,delivered,2017-09-13 08:59:02,...,14.0,cool_stuff,27277,volta redonda,SP,27277.0,-22.500389,-44.124773,volta redonda,RJ
4,00010242fe8c5a6d1ba2dd792cb16214,1,4244733e06e7ecb4970a6e2683c13e61,48436dade18ac8b2bce089ec2a041202,2017-09-19 09:45:35,58.9,13.29,3ce436f183e68e07877b285a838db11a,delivered,2017-09-13 08:59:02,...,14.0,cool_stuff,27277,volta redonda,SP,27277.0,-22.499963,-44.127571,volta redonda,RJ
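The five rows above are the same order item repeated with slightly different coordinates. The geolocation table holds many coordinates for each zip-code prefix, so merging on the prefix repeats every order item once per matching geolocation row. That is why the merged table grows to 16,252,672 rows against 112,650 order items. One way to avoid this is to reduce geolocation to one row per prefix before merging. The sketch below averages latitude and longitude and keeps the first city/state. That aggregation is an assumption, not the only reasonable choice, and `dedupe_geolocation` is a hypothetical helper.

```python
import pandas as pd


def dedupe_geolocation(df_geo):
    """Collapse geolocation to one row per zip-code prefix (mean lat/lng, first city/state)."""
    return (
        df_geo.groupby("geolocation_zip_code_prefix")
        .agg(
            geolocation_lat=("geolocation_lat", "mean"),
            geolocation_lng=("geolocation_lng", "mean"),
            geolocation_city=("geolocation_city", "first"),
            geolocation_state=("geolocation_state", "first"),
        )
        .reset_index()
    )


# Toy data: three coordinates for one prefix (values from the output above)
geo = pd.DataFrame({
    "geolocation_zip_code_prefix": [27277, 27277, 27277, 1037],
    "geolocation_lat": [-22.498183, -22.487885, -22.501227, -23.545621],
    "geolocation_lng": [-44.123614, -44.131566, -44.132443, -46.639292],
    "geolocation_city": ["volta redonda"] * 3 + ["sao paulo"],
    "geolocation_state": ["RJ", "RJ", "RJ", "SP"],
})
items_sellers = pd.DataFrame({"order_id": ["o1", "o2"], "seller_zip_code_prefix": [27277, 27277]})

# Naive merge: every item is repeated once per geolocation row with the same prefix
naive = items_sellers.merge(geo, left_on="seller_zip_code_prefix",
                            right_on="geolocation_zip_code_prefix", how="left")
# Deduplicated merge: one geolocation row per prefix, so item rows are preserved
deduped = items_sellers.merge(dedupe_geolocation(geo), left_on="seller_zip_code_prefix",
                              right_on="geolocation_zip_code_prefix", how="left",
                              validate="many_to_one")
```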


**Insight:**
- Several datasets were merged so that each business question can be analyzed from a single combined table.

### Assessing Data

In [19]:
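from IPython.display import display  # available by default in Jupyter; imported explicitly here

def check_dataset_info(df, name):
    """Print structure, missing values, duplicate rows and summary statistics of a DataFrame."""
    print(f'\nDataset: {name}')
    print('-' * 30)
    df.info()  # info() prints by itself and returns None, so it is not wrapped in print()
    print('\nMissing values:\n', df.isnull().sum())
    print('\nDuplicates:', df.duplicated().sum())  # number of fully duplicated rows
    print('\nDescriptive statistics:\n', df.describe(include="all"))
    print('-' * 30)
    display(df.head())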
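`df.isnull().sum()` lists every column, including the ones with no missing values, which makes wide tables such as the 31-column Orders Sellers frame hard to scan. A compact alternative shows only the affected columns along with their percentage. This is a sketch, and `missing_summary` is a hypothetical helper.

```python
import pandas as pd


def missing_summary(df):
    """Columns with at least one missing value, sorted by count, with percentage of rows."""
    counts = df.isna().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(df) * 100).round(2),
    })
    return summary[summary["missing"] > 0].sort_values("missing", ascending=False)


toy = pd.DataFrame({
    "a": [1, None, 3, None],
    "b": [1, 2, 3, 4],
    "c": [None, "x", "y", "z"],
})
report = missing_summary(toy)
```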

In [20]:
# Assess the quality of the Orders Customers data

check_dataset_info(df_orders_customers, "Orders_Customers")


Dataset: Orders_Customers
------------------------------
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 99441 entries, 0 to 99440
Data columns (total 12 columns):
 #   Column                         Non-Null Count  Dtype 
---  ------                         --------------  ----- 
 0   order_id                       99441 non-null  object
 1   customer_id                    99441 non-null  object
 2   order_status                   99441 non-null  object
 3   order_purchase_timestamp       99441 non-null  object
 4   order_approved_at              99281 non-null  object
 5   order_delivered_carrier_date   97658 non-null  object
 6   order_delivered_customer_date  96476 non-null  object
 7   order_estimated_delivery_date  99441 non-null  object
 8   customer_unique_id             99441 non-null  object
 9   customer_zip_code_prefix       99441 non-null  int64 
 10  customer_city                  99441 non-null  object
 11  customer_state                 99441 non-null  object
dtypes:


In [21]:
# Assess the quality of the Orders Items data

check_dataset_info(df_orders_items, "Orders_Items")


Dataset: Orders_Items
------------------------------
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 112650 entries, 0 to 112649
Data columns (total 23 columns):
 #   Column                         Non-Null Count   Dtype  
---  ------                         --------------   -----  
 0   order_id                       112650 non-null  object 
 1   order_item_id                  112650 non-null  int64  
 2   product_id                     112650 non-null  object 
 3   seller_id                      112650 non-null  object 
 4   shipping_limit_date            112650 non-null  object 
 5   price                          112650 non-null  float64
 6   freight_value                  112650 non-null  float64
 7   customer_id                    112650 non-null  object 
 8   order_status                   112650 non-null  object 
 9   order_purchase_timestamp       112650 non-null  object 
 10  order_approved_at              112635 non-null  object 
 11  order_delivered_carrier_date   111456


In [22]:
# Assess the quality of the Orders Payments data

check_dataset_info(df_orders_payments, "Orders_Payments")


Dataset: Orders_Payments
------------------------------
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 103887 entries, 0 to 103886
Data columns (total 12 columns):
 #   Column                         Non-Null Count   Dtype  
---  ------                         --------------   -----  
 0   order_id                       103887 non-null  object 
 1   customer_id                    103887 non-null  object 
 2   order_status                   103887 non-null  object 
 3   order_purchase_timestamp       103887 non-null  object 
 4   order_approved_at              103712 non-null  object 
 5   order_delivered_carrier_date   101999 non-null  object 
 6   order_delivered_customer_date  100755 non-null  object 
 7   order_estimated_delivery_date  103887 non-null  object 
 8   payment_sequential             103886 non-null  float64
 9   payment_type                   103886 non-null  object 
 10  payment_installments           103886 non-null  float64
 11  payment_value                  103


In [23]:
# Assess the quality of the Orders Reviews data

check_dataset_info(df_orders_reviews, "Orders_Reviews")


Dataset: Orders_Reviews
------------------------------
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 99992 entries, 0 to 99991
Data columns (total 14 columns):
 #   Column                         Non-Null Count  Dtype  
---  ------                         --------------  -----  
 0   order_id                       99992 non-null  object 
 1   customer_id                    99992 non-null  object 
 2   order_status                   99992 non-null  object 
 3   order_purchase_timestamp       99992 non-null  object 
 4   order_approved_at              99831 non-null  object 
 5   order_delivered_carrier_date   98199 non-null  object 
 6   order_delivered_customer_date  97005 non-null  object 
 7   order_estimated_delivery_date  99992 non-null  object 
 8   review_id                      99224 non-null  object 
 9   review_score                   99224 non-null  float64
 10  review_comment_title           11568 non-null  object 
 11  review_comment_message         40977 non-null  obj


In [24]:
# Assess the quality of the Orders Sellers data

check_dataset_info(df_orders_sellers, "Orders_Sellers")


Dataset: Orders_Sellers
------------------------------
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 16252672 entries, 0 to 16252671
Data columns (total 31 columns):
 #   Column                         Dtype  
---  ------                         -----  
 0   order_id                       object 
 1   order_item_id                  int64  
 2   product_id                     object 
 3   seller_id                      object 
 4   shipping_limit_date            object 
 5   price                          float64
 6   freight_value                  float64
 7   customer_id                    object 
 8   order_status                   object 
 9   order_purchase_timestamp       object 
 10  order_approved_at              object 
 11  order_delivered_carrier_date   object 
 12  order_delivered_customer_date  object 
 13  order_estimated_delivery_date  object 
 14  product_category_name          object 
 15  product_name_lenght            float64
 16  product_description_lenght     f


**Insight:**
1. Orders Customers
   - Several date/time columns are still stored as `object` and should be converted to datetime
   - `customer_zip_code_prefix` is stored as an integer but should be a string: it is an identifier, and reading it as a number drops leading zeros (e.g. the São Paulo prefix `1151`)
   - 3 columns contain missing values (`order_approved_at`, `order_delivered_carrier_date`, `order_delivered_customer_date`)
   - There are no duplicate rows
2. Orders Items
   - Several date/time columns are still stored as `object` and should be converted to datetime
   - 12 columns contain missing values
   - There are no duplicate rows
3. Orders Payments
   - Several date/time columns are still stored as `object` and should be converted to datetime
   - 7 columns contain missing values
   - There are no duplicate rows
4. Orders Reviews
   - Several date/time columns are still stored as `object` and should be converted to datetime
   - 9 columns contain missing values: the 3 order date columns plus the 6 review columns, which are empty for orders without a review after the left join
   - There are no duplicate rows
5. Orders Sellers
   - Several date/time columns are still stored as `object` and should be converted to datetime
   - 17 columns contain missing values
   - The table contains duplicate rows and has 16,252,672 rows versus 112,650 in Orders Items, because the geolocation table holds many coordinates per zip-code prefix and each order item is repeated once per matching geolocation row
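The dtype problems listed above can be fixed in one pass per table. This is a minimal sketch, assuming the order date columns shown in the `info()` output and 5-digit zip-code prefixes whose leading zeros were lost when pandas read them as integers. `fix_dtypes` and `DATE_COLUMNS` are hypothetical names.

```python
import pandas as pd

# Date columns of the orders table, as listed in the info() output above
DATE_COLUMNS = [
    "order_purchase_timestamp", "order_approved_at", "order_delivered_carrier_date",
    "order_delivered_customer_date", "order_estimated_delivery_date",
]


def fix_dtypes(df, date_columns=DATE_COLUMNS, zip_columns=("customer_zip_code_prefix",)):
    """Return a copy with date columns as datetime64 and zip prefixes as 5-character strings."""
    df = df.copy()
    for col in date_columns:
        if col in df.columns:
            # Missing and unparseable values become NaT (errors="coerce")
            df[col] = pd.to_datetime(df[col], errors="coerce")
    for col in zip_columns:
        if col in df.columns:
            # Assumes 5-digit prefixes; re-pads leading zeros, keeps missing values as NA
            df[col] = df[col].map(lambda v: f"{int(v):05d}" if pd.notna(v) else pd.NA)
    return df


toy = pd.DataFrame({
    "order_purchase_timestamp": ["2017-10-02 10:56:33", "2018-07-24 20:41:37"],
    "order_approved_at": ["2017-10-02 11:07:15", None],
    "customer_zip_code_prefix": [1151, 14409],
})
fixed = fix_dtypes(toy)
```

Alternatively, passing `dtype={"customer_zip_code_prefix": str}` to `pd.read_csv` avoids the integer conversion in the first place.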

### Cleaning Data

#### Cleaning Data Orders Customers

In [25]:
# Convert the purchase timestamp column to datetime
df_orders_customers['order_purchase_timestamp'] = pd.to_datetime(df_orders_customers['order_purchase_timestamp'])

In [None]:
# Keep only the required columns; .copy() avoids a SettingWithCopyWarning when new columns are added below
df_orders_customers = df_orders_customers[["order_id", "order_purchase_timestamp"]].copy()

In [29]:
# Add time columns for aggregation (daily date and monthly period)
df_orders_customers['order_date'] = df_orders_customers['order_purchase_timestamp'].dt.date
df_orders_customers['order_month'] = df_orders_customers['order_purchase_timestamp'].dt.to_period('M')

In [30]:
# Display the result after cleaning
df_orders_customers.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 99441 entries, 0 to 99440
Data columns (total 4 columns):
 #   Column                    Non-Null Count  Dtype         
---  ------                    --------------  -----         
 0   order_id                  99441 non-null  object        
 1   order_purchase_timestamp  99441 non-null  datetime64[ns]
 2   order_date                99441 non-null  object        
 3   order_month               99441 non-null  period[M]     
dtypes: datetime64[ns](1), object(2), period[M](1)
memory usage: 3.0+ MB
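With `order_month` in place, the order trend from the first business question reduces to a group-by count. This is a minimal sketch on toy data. `monthly_orders` is a hypothetical helper, and counting distinct `order_id` values is a choice that guards against duplicated rows.

```python
import pandas as pd


def monthly_orders(df):
    """Distinct orders per purchase month; expects `order_id` and `order_month` columns."""
    return df.groupby("order_month")["order_id"].nunique().sort_index()


toy = pd.DataFrame({
    "order_id": ["a", "b", "c"],
    "order_purchase_timestamp": pd.to_datetime(
        ["2017-09-13 08:59:02", "2017-09-29 10:00:00", "2017-10-02 10:56:33"]
    ),
})
toy["order_month"] = toy["order_purchase_timestamp"].dt.to_period("M")
trend = monthly_orders(toy)
# With matplotlib available, the trend can be drawn with: trend.plot(marker="o")
```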


#### Cleaning Data Orders Items

In [37]:
# Drop columns that are not needed

columns_to_drop = [
    'order_delivered_carrier_date', 'order_delivered_customer_date', 'order_estimated_delivery_date',
    'order_approved_at', 'shipping_limit_date', 'order_status',
    'product_name_lenght', 'product_description_lenght', 'product_photos_qty',
    'product_weight_g', 'product_length_cm', 'product_height_cm', 'product_width_cm'
]

df_orders_items.drop(columns=columns_to_drop, inplace=True)

In [38]:
# Drop rows with a missing product category

df_orders_items.dropna(subset=['product_category_name', 'product_category_name_english'], inplace=True)

In [39]:
df_orders_items.info()

<class 'pandas.core.frame.DataFrame'>
Index: 111023 entries, 0 to 112649
Data columns (total 10 columns):
 #   Column                         Non-Null Count   Dtype  
---  ------                         --------------   -----  
 0   order_id                       111023 non-null  object 
 1   order_item_id                  111023 non-null  int64  
 2   product_id                     111023 non-null  object 
 3   seller_id                      111023 non-null  object 
 4   price                          111023 non-null  float64
 5   freight_value                  111023 non-null  float64
 6   customer_id                    111023 non-null  object 
 7   order_purchase_timestamp       111023 non-null  object 
 8   product_category_name          111023 non-null  object 
 9   product_category_name_english  111023 non-null  object 
dtypes: float64(2), int64(1), object(7)
memory usage: 9.3+ MB
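With the category columns complete, the second business question (best-selling categories and highest revenue) reduces to a grouped aggregation. A minimal sketch on a hypothetical sample shaped like `df_orders_items`, assuming each row is one sold item and that revenue means the sum of `price` (freight excluded):

```python
import pandas as pd

# Hypothetical sample shaped like df_orders_items after cleaning
items = pd.DataFrame({
    "order_id": ["o1", "o1", "o2", "o3"],
    "product_category_name_english": ["cool_stuff", "toys", "cool_stuff", "toys"],
    "price": [58.9, 20.0, 41.1, 10.0],
})

# items_sold = number of item rows; revenue = sum of item prices
category_summary = (
    items.groupby("product_category_name_english")
    .agg(items_sold=("order_id", "count"), revenue=("price", "sum"))
    .sort_values("revenue", ascending=False)
)
print(category_summary)
```

Sorting by `items_sold` instead of `revenue` answers the "most sold" half of the question.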


#### Cleaning Data Orders Payments

In [41]:
# Drop irrelevant columns
columns_to_drop = [
    "order_status", "order_purchase_timestamp", "order_approved_at", 
    "order_delivered_carrier_date", "order_delivered_customer_date", "order_estimated_delivery_date"
]

df_orders_payments.drop(columns=columns_to_drop, inplace=True)

In [42]:
# Drop rows with missing payment data

df_orders_payments.dropna(subset=["payment_type", "payment_sequential", "payment_installments", "payment_value"], inplace=True)

In [43]:
# Convert count columns from float to int (only safe once rows with NaN are dropped)
df_orders_payments["payment_sequential"] = df_orders_payments["payment_sequential"].astype(int)
df_orders_payments["payment_installments"] = df_orders_payments["payment_installments"].astype(int)

In [44]:
df_orders_payments.info()

<class 'pandas.core.frame.DataFrame'>
Index: 103886 entries, 0 to 103886
Data columns (total 6 columns):
 #   Column                Non-Null Count   Dtype  
---  ------                --------------   -----  
 0   order_id              103886 non-null  object 
 1   customer_id           103886 non-null  object 
 2   payment_sequential    103886 non-null  int64  
 3   payment_type          103886 non-null  object 
 4   payment_installments  103886 non-null  int64  
 5   payment_value         103886 non-null  float64
dtypes: float64(1), int64(2), object(3)
memory usage: 5.5+ MB
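The order of the cells above matters: `astype(int)` fails while a float column still contains `NaN`. A minimal sketch on a hypothetical three-row frame showing the failure, the drop-then-cast approach used in this notebook, and pandas' nullable `"Int64"` dtype as an alternative that keeps the rows:

```python
import numpy as np
import pandas as pd

payments = pd.DataFrame({"payment_installments": [1.0, 3.0, np.nan]})

# Casting a float column that still holds NaN to a NumPy int dtype raises
try:
    payments["payment_installments"].astype(int)
    cast_failed = False
except ValueError:  # pandas raises IntCastingNaNError, a ValueError subclass
    cast_failed = True

# Option 1 (as in the notebook): drop missing rows first, then cast
dropped = payments.dropna(subset=["payment_installments"])
as_int = dropped["payment_installments"].astype(int)

# Option 2: keep every row and use the nullable integer dtype (NaN becomes <NA>)
as_nullable = payments["payment_installments"].astype("Int64")

print(cast_failed, as_int.tolist(), as_nullable.isna().sum())
```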


#### Cleaning Data Orders Reviews

In [45]:
# Drop irrelevant columns
columns_to_drop = [
    "order_status", "order_purchase_timestamp", "order_approved_at", 
    "order_delivered_carrier_date", "order_delivered_customer_date", "order_estimated_delivery_date", "review_id", "review_comment_title",
    "review_comment_message", "review_creation_date", "review_answer_timestamp"
]

df_orders_reviews.drop(columns=columns_to_drop, inplace=True)

In [46]:
# Drop rows without a review score
df_orders_reviews.dropna(subset=["review_score"], inplace=True)

In [48]:
df_orders_reviews.info()

<class 'pandas.core.frame.DataFrame'>
Index: 99224 entries, 0 to 99991
Data columns (total 3 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   order_id      99224 non-null  object 
 1   customer_id   99224 non-null  object 
 2   review_score  99224 non-null  float64
dtypes: float64(1), object(2)
memory usage: 3.0+ MB
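The cleaned `review_score` column is the input for the fourth business question (customer satisfaction). A minimal sketch on a hypothetical five-review sample; the cut-off of 4 or more for "satisfied" is an assumption for illustration, not something defined in this notebook:

```python
import pandas as pd

# Hypothetical sample shaped like df_orders_reviews after cleaning
reviews = pd.DataFrame({
    "order_id": ["o1", "o2", "o3", "o4", "o5"],
    "review_score": [5.0, 4.0, 5.0, 1.0, 3.0],
})

mean_score = reviews["review_score"].mean()

# Share of each score value, ordered from lowest to highest
score_share = reviews["review_score"].value_counts(normalize=True).sort_index()

# Assumed threshold: scores of 4 or 5 count as "satisfied"
satisfied_share = (reviews["review_score"] >= 4).mean()

print(mean_score, satisfied_share)
print(score_share)
```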


#### Cleaning Data Orders Sellers

In [53]:
# Drop columns not needed for the seller analysis;
# errors="ignore" keeps the cell safe to re-run after the columns are already gone
columns_to_drop = [
    "order_approved_at", "order_delivered_carrier_date", "order_delivered_customer_date",
    "order_estimated_delivery_date", "product_category_name", "product_name_lenght",
    "product_description_lenght", "product_photos_qty", "product_weight_g",
    "product_length_cm", "product_height_cm", "product_width_cm",
    "product_category_name_english"
]
df_orders_sellers.drop(columns=columns_to_drop, inplace=True, errors="ignore")

In [50]:
# Remove duplicate rows
df_orders_sellers.drop_duplicates(inplace=True)

In [51]:
df_orders_sellers.info()

<class 'pandas.core.frame.DataFrame'>
Index: 12241964 entries, 0 to 16252671
Data columns (total 18 columns):
 #   Column                       Dtype  
---  ------                       -----  
 0   order_id                     object 
 1   order_item_id                int64  
 2   product_id                   object 
 3   seller_id                    object 
 4   shipping_limit_date          object 
 5   price                        float64
 6   freight_value                float64
 7   customer_id                  object 
 8   order_status                 object 
 9   order_purchase_timestamp     object 
 10  seller_zip_code_prefix       int64  
 11  seller_city                  object 
 12  seller_state                 object 
 13  geolocation_zip_code_prefix  float64
 14  geolocation_lat              float64
 15  geolocation_lng              float64
 16  geolocation_city             object 
 17  geolocation_state            object 
dtypes: float64(5), int64(2), object(11)
memory us
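Even after removing duplicates, this table has 12,241,964 rows, far more than the number of order items. One likely cause is that the geolocation data holds several rows per zip code prefix (the geolocation preview already shows prefix 1046 twice), so every seller row is multiplied on merge. A minimal sketch, assuming averaging the coordinates per prefix is acceptable, that collapses geolocation to one row per prefix before joining; the sample frames are hypothetical:

```python
import pandas as pd

# Hypothetical sample: geolocation has several rows per zip code prefix
geolocation = pd.DataFrame({
    "geolocation_zip_code_prefix": [1037, 1037, 1046],
    "geolocation_lat": [-23.54, -23.56, -23.546],
    "geolocation_lng": [-46.63, -46.65, -46.644],
})
sellers = pd.DataFrame({
    "seller_id": ["s1", "s2"],
    "seller_zip_code_prefix": [1037, 1046],
})

# Collapse to one row per prefix (mean coordinates) so the join is many-to-one
geo_per_prefix = (
    geolocation.groupby("geolocation_zip_code_prefix", as_index=False)
    .agg(geolocation_lat=("geolocation_lat", "mean"),
         geolocation_lng=("geolocation_lng", "mean"))
)

sellers_geo = sellers.merge(
    geo_per_prefix,
    left_on="seller_zip_code_prefix",
    right_on="geolocation_zip_code_prefix",
    how="left",
    validate="many_to_one",  # raises MergeError if a prefix were still duplicated
)
print(len(sellers_geo))
```

The row count of `sellers_geo` matches `sellers`, which is the property the larger merge is missing.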

#### Merge into One DataFrame

In [54]:
# Combine the cleaned tables on order_id and customer_id; outer joins keep rows missing from either side
all_df = df_orders_items.merge(df_orders_payments, on=["order_id", "customer_id"], how="outer")
all_df = all_df.merge(df_orders_reviews, on=["order_id", "customer_id"], how="outer")
all_df = all_df.merge(df_orders_sellers, on=["order_id", "customer_id"], how="outer")
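When both sides of a merge can repeat the same key (an order with several items and several payments), pandas pairs every left row with every right row, which helps explain how the merged frame grows to 17,997,000 rows. A minimal sketch on hypothetical two-row frames showing the multiplication, how the `validate` argument turns it into an error, and one way to avoid it by aggregating payments per order first:

```python
import pandas as pd

items = pd.DataFrame({"order_id": ["o1", "o1"], "price": [10.0, 20.0]})
payments = pd.DataFrame({"order_id": ["o1", "o1"], "payment_value": [15.0, 15.0]})

# Many-to-many on order_id: each item row is paired with each payment row
merged = items.merge(payments, on="order_id", how="outer")
print(len(merged))  # 2 items x 2 payments = 4 rows

# validate= makes pandas raise instead of silently multiplying rows
try:
    items.merge(payments, on="order_id", how="outer", validate="one_to_one")
    validated = True
except pd.errors.MergeError:
    validated = False

# Aggregate payments to one row per order, then the join is many-to-one
payments_per_order = payments.groupby("order_id", as_index=False)["payment_value"].sum()
safe = items.merge(payments_per_order, on="order_id", how="left", validate="many_to_one")
print(len(safe))
```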

In [57]:
# Select the relevant columns (the _x suffixes come from column names shared by more than one merged table)
all_df = all_df[[
    "order_id", "order_purchase_timestamp_x", "product_category_name_english", "price_x", "payment_type", "review_score", "seller_city", "seller_state"
]]

all_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 17997000 entries, 0 to 17996999
Data columns (total 8 columns):
 #   Column                         Dtype  
---  ------                         -----  
 0   order_id                       object 
 1   order_purchase_timestamp_x     object 
 2   product_category_name_english  object 
 3   price_x                        float64
 4   payment_type                   object 
 5   review_score                   float64
 6   seller_city                    object 
 7   seller_state                   object 
dtypes: float64(2), object(6)
memory usage: 1.1+ GB


In [59]:
check_dataset_info(all_df, "All dataframe")


Dataset: All dataframe
------------------------------
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 17997000 entries, 0 to 17996999
Data columns (total 8 columns):
 #   Column                         Dtype  
---  ------                         -----  
 0   order_id                       object 
 1   order_purchase_timestamp_x     object 
 2   product_category_name_english  object 
 3   price_x                        float64
 4   payment_type                   object 
 5   review_score                   float64
 6   seller_city                    object 
 7   seller_state                   object 
dtypes: float64(2), object(6)
memory usage: 1.1+ GB
None

Missing values:
 order_id                              0
order_purchase_timestamp_x       179211
product_category_name_english    179211
price_x                          179211
payment_type                       1116
review_score                     183529
seller_city                         833
seller_state                        

Unnamed: 0,order_id,order_purchase_timestamp_x,product_category_name_english,price_x,payment_type,review_score,seller_city,seller_state
0,00010242fe8c5a6d1ba2dd792cb16214,2017-09-13 08:59:02,cool_stuff,58.9,credit_card,5.0,volta redonda,SP
1,00010242fe8c5a6d1ba2dd792cb16214,2017-09-13 08:59:02,cool_stuff,58.9,credit_card,5.0,volta redonda,SP
2,00010242fe8c5a6d1ba2dd792cb16214,2017-09-13 08:59:02,cool_stuff,58.9,credit_card,5.0,volta redonda,SP
3,00010242fe8c5a6d1ba2dd792cb16214,2017-09-13 08:59:02,cool_stuff,58.9,credit_card,5.0,volta redonda,SP
4,00010242fe8c5a6d1ba2dd792cb16214,2017-09-13 08:59:02,cool_stuff,58.9,credit_card,5.0,volta redonda,SP
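The preview above shows the same order repeated on identical rows, so a plain `value_counts` on `all_df` would over-count. A minimal sketch for the third business question (most used payment method) that counts each order and payment type pair once; `all_df_sample` is a hypothetical slice, not real output:

```python
import pandas as pd

# Hypothetical slice of all_df where the joins repeated order o1
all_df_sample = pd.DataFrame({
    "order_id": ["o1", "o1", "o1", "o2", "o3"],
    "payment_type": ["credit_card", "credit_card", "credit_card", "boleto", "credit_card"],
})

# Naive count: o1 is counted three times
naive = all_df_sample["payment_type"].value_counts()

# Count each (order_id, payment_type) pair once
per_order = (
    all_df_sample.drop_duplicates(subset=["order_id", "payment_type"])["payment_type"]
    .value_counts()
)
print(naive.to_dict(), per_order.to_dict())
```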


**Insight:**
1. Orders Customers
   - Converted columns that should hold dates to the datetime type
   - Changed the customer_zip_code_prefix column to the string type
   - Filled missing values with a single category
   - Kept only order_id and order_purchase_timestamp, then added order_date and order_month for time-based aggregation
2. Orders Items
   - Dropped columns that are not used
   - Dropped rows with missing values
3. Orders Payments
   - Dropped columns that are not used
   - Dropped rows with missing values
   - Converted payment_sequential and payment_installments to int
4. Orders Reviews
   - Dropped columns that are not used (review text, review timestamps, and order status and delivery dates)
   - Dropped rows with a missing review_score
5. Orders Sellers
   - Dropped columns that are not used
   - Removed duplicate rows

## Exploratory Data Analysis (EDA)

### Explore ...

**Insight:**
- xxx
- xxx

## Visualization & Explanatory Analysis

### Question 1:

### Question 2:

**Insight:**
- xxx
- xxx

## Advanced Analysis (Optional)

## Conclusion

- Conclusion for question 1
- Conclusion for question 2