# Dive Deeper: Supermarket Sales Analysis

> The growth of supermarkets in most populated cities are increasing and market competitions are also high. The dataset is one of the historical sales of supermarket company which has recorded in 3 different branches for 3 months data. [Source: Kaggle](https://www.kaggle.com/aungpyaeap/supermarket-sales)

## Import Library and Dataset

In [167]:
# import library
import pandas as pd

Import the CSV file named `supermarket_sales.csv`. The data comes from Kaggle, and only a subset of its columns has been kept for this analysis.

In [168]:
# import the dataset and store it in the `supermarket` object
supermarket = pd.read_csv('./supermarket_sales.csv')

In [169]:
# inspect the first rows of the data
supermarket.head()

,invoice_id,city,customer,gender,product_line,unit_price,quantity,date,time,payment,rating
0,750-67-8428,Yangon,Member,Female,Health and beauty,74.69,7,1/5/2019,13:08,Ewallet,9.1
1,226-31-3081,Naypyitaw,Normal,Female,Electronic accessories,15.28,5,3/8/2019,10:29,Cash,9.6
2,631-41-3108,Yangon,Normal,Male,Home and lifestyle,46.33,7,3/3/2019,13:23,Credit card,7.4
3,123-19-1176,Yangon,Member,Male,Health and beauty,58.22,8,1/27/2019,20:33,Ewallet,8.4
4,373-73-7910,Yangon,Normal,Male,Sports and travel,86.31,7,2/8/2019,10:37,Ewallet,5.3


Data description:

- `invoice_id`: Computer generated sales slip invoice identification number
- `city`: Location of supercenters
- `customer`: Type of customer, recorded as `Member` for customers using a member card and `Normal` for customers without one
- `gender`: Gender type of customer
- `product_line`: General item categorization groups - Electronic accessories, Fashion accessories, Food and beverages, Health and beauty, Home and lifestyle, Sports and travel
- `unit_price`: Price of each product in dollars
- `quantity`: Number of products purchased by customer
- `date`: Date of purchase (Record available from January 2019 to March 2019)
- `time`: Purchase time (10am to 9pm)
- `payment`: Payment used by customer for purchase (3 methods are available – Cash, Credit card and Ewallet)
- `rating`: Customer satisfaction rating of their overall shopping experience (on a scale of 1 to 10)

## Data Pre-processing

❓ Do all of the columns above already have an appropriate data type? If not, which columns need to be changed?


In [170]:
supermarket.nunique()

invoice_id      1000
city               3
customer           2
gender             2
product_line       6
unit_price       943
quantity          10
date              89
time             506
payment            3
rating            61
dtype: int64

In [171]:
# check the data types
supermarket.dtypes

invoice_id       object
city             object
customer         object
gender           object
product_line     object
unit_price      float64
quantity          int64
date             object
time             object
payment          object
rating          float64
dtype: object

> Columns that need to be converted:
- `city` --> category
- `customer` --> category
- `gender` --> category
- `product_line` --> category
- `payment` --> category
- `date` + `time` --> datetime
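
One way to arrive at this list is to look for text columns that have only a handful of distinct values. The sketch below re-enters the `nunique()` and `dtypes` results shown above by hand. The cutoff `MAX_CATEGORIES = 10` is an assumption chosen for illustration, not a pandas rule.

```python
import pandas as pd

# Unique-value counts and dtypes copied from the notebook output above.
n_unique = pd.Series({
    'invoice_id': 1000, 'city': 3, 'customer': 2, 'gender': 2,
    'product_line': 6, 'unit_price': 943, 'quantity': 10,
    'date': 89, 'time': 506, 'payment': 3, 'rating': 61,
})
dtypes = pd.Series({
    'invoice_id': 'object', 'city': 'object', 'customer': 'object',
    'gender': 'object', 'product_line': 'object', 'unit_price': 'float64',
    'quantity': 'int64', 'date': 'object', 'time': 'object',
    'payment': 'object', 'rating': 'float64',
})

MAX_CATEGORIES = 10  # assumed threshold, tune it for your own data

# Keep text (object) columns whose number of distinct values is small.
is_text = dtypes == 'object'
is_low_cardinality = n_unique <= MAX_CATEGORIES
category_candidates = n_unique[is_text & is_low_cardinality].index.tolist()
print(category_candidates)
```

`date` and `time` are also text columns, but with 89 and 506 distinct values they are better combined and parsed as a datetime.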

In [172]:
# convert the data type to category
category_cols = ['city', 'customer', 'gender', 'product_line', 'payment']
supermarket[category_cols] = \
supermarket[category_cols].astype('category')


In [173]:
supermarket['datetime'] = supermarket['date'] + ' ' + supermarket['time']
supermarket['datetime'] = pd.to_datetime(supermarket['datetime'])
supermarket.dtypes

invoice_id              object
city                  category
customer              category
gender                category
product_line          category
unit_price             float64
quantity                 int64
date                    object
time                    object
payment               category
rating                 float64
datetime        datetime64[ns]
dtype: object
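
The cell above lets `pd.to_datetime` infer the format. As a hedged alternative, the format can be spelled out so that month-first strings such as `1/5/2019` are never read as day-first. The values below are the `date` and `time` entries from the `head()` output. The format string is inferred from those rows and is an assumption about the rest of the file.

```python
import pandas as pd

# `date` and `time` values from the first five rows shown earlier.
date = pd.Series(['1/5/2019', '3/8/2019', '3/3/2019', '1/27/2019', '2/8/2019'])
time = pd.Series(['13:08', '10:29', '13:23', '20:33', '10:37'])

# Assumed layout: month/day/year followed by 24-hour hours:minutes.
parsed = pd.to_datetime(date + ' ' + time, format='%m/%d/%Y %H:%M')
print(parsed)
```

With an explicit `format`, a string that does not match raises an error instead of being silently parsed in an unexpected way.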

In [174]:
supermarket['month'] = supermarket['datetime'].dt.month_name()
supermarket['month'] = supermarket['month'].astype('category')
ordered_month = ['January', "February", "March"]
supermarket['month'] = supermarket['month'].cat.reorder_categories(ordered_month)
supermarket.dtypes

invoice_id              object
city                  category
customer              category
gender                category
product_line          category
unit_price             float64
quantity                 int64
date                    object
time                    object
payment               category
rating                 float64
datetime        datetime64[ns]
month                 category
dtype: object
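
The reason for fixing the category order is that tables and sorts then follow the calendar rather than the alphabet. The sketch below shows the difference on a tiny made-up series of month names. It uses `pd.CategoricalDtype(..., ordered=True)`, a variation on the `reorder_categories` call above that additionally allows comparisons such as `min()`.

```python
import pandas as pd

# Hypothetical month labels, deliberately out of calendar order.
months = pd.Series(['March', 'January', 'February', 'January'])

# Plain strings sort alphabetically.
sorted_as_text = months.sort_values().tolist()

# An ordered categorical sorts by the category order we define.
month_dtype = pd.CategoricalDtype(
    categories=['January', 'February', 'March'], ordered=True
)
months_cat = months.astype(month_dtype)
sorted_as_category = months_cat.sort_values().tolist()

print(sorted_as_text)
print(sorted_as_category)
```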

## Analysis

❓ Help the marketing team find out which product line is the most popular in each city. Here, popularity is measured by the number of transactions.

In [175]:
supermarket.pivot_table(
    index='city',
    columns='product_line',
    values='invoice_id',
    aggfunc='count',
).idxmax(axis=1)

city
Mandalay     Fashion accessories
Naypyitaw     Food and beverages
Yangon        Home and lifestyle
dtype: object
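
To see what `.idxmax(axis=1)` is doing, it can help to look at the count table first. The sketch below uses only the five rows from the `head()` output, so its result describes that small sample and not the full dataset. `pd.crosstab` is used here as an equivalent way to count rows.

```python
import pandas as pd

# `city` and `product_line` from the first five rows shown earlier.
sample = pd.DataFrame({
    'city': ['Yangon', 'Naypyitaw', 'Yangon', 'Yangon', 'Yangon'],
    'product_line': ['Health and beauty', 'Electronic accessories',
                     'Home and lifestyle', 'Health and beauty',
                     'Sports and travel'],
})

# Step 1: count transactions per city (rows) and product line (columns).
counts = pd.crosstab(index=sample['city'], columns=sample['product_line'])
print(counts)

# Step 2: for each row (city), return the column label with the largest count.
favorite = counts.idxmax(axis=1)
print(favorite)
```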

❓ Help the sales team find the **total gross revenue** for each city, given that a 5% tax is applied to every transaction.

Hint: First compute the `total` per transaction, then add the tax amount.

In [176]:
# compute the total per transaction, including the 5% tax
supermarket['total'] = supermarket['unit_price'] * supermarket['quantity'] * 1.05
supermarket.head(3)

,invoice_id,city,customer,gender,product_line,unit_price,quantity,date,time,payment,rating,datetime,month,total
0,750-67-8428,Yangon,Member,Female,Health and beauty,74.69,7,1/5/2019,13:08,Ewallet,9.1,2019-01-05 13:08:00,January,548.9715
1,226-31-3081,Naypyitaw,Normal,Female,Electronic accessories,15.28,5,3/8/2019,10:29,Cash,9.6,2019-03-08 10:29:00,March,80.22
2,631-41-3108,Yangon,Normal,Male,Home and lifestyle,46.33,7,3/3/2019,13:23,Credit card,7.4,2019-03-03 13:23:00,March,340.5255
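
Multiplying by `1.05` folds two steps into one. As a sketch, the same totals can be built up explicitly as subtotal plus tax, using the three rows shown above. The names `subtotal`, `tax`, and `TAX_RATE` are introduced here for illustration only.

```python
import pandas as pd

TAX_RATE = 0.05  # 5% tax, as stated in the question

# `unit_price` and `quantity` from the three rows shown above.
rows = pd.DataFrame({
    'unit_price': [74.69, 15.28, 46.33],
    'quantity': [7, 5, 7],
})

rows['subtotal'] = rows['unit_price'] * rows['quantity']  # price before tax
rows['tax'] = rows['subtotal'] * TAX_RATE                 # 5% of the subtotal
rows['total'] = rows['subtotal'] + rows['tax']            # same as subtotal * 1.05
print(rows)
```

For the first row this gives 74.69 × 7 = 522.83 before tax, 26.1415 in tax, and 548.9715 in total, which matches the `total` column above.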


> Question 2
- index = `city`
- columns = (none)
- values = `total`
- aggfunc = `sum`

In [177]:
# Question 2

supermarket.pivot_table(
    index='city',
    values='total',
    aggfunc='sum'
)

city,total
Mandalay,106197.672
Naypyitaw,110568.7065
Yangon,106200.3705


❓ Help the customer relations team find the average monthly customer satisfaction rating for each city. Did it go up or down?

Columns used: `rating`, `city`, and `month` (derived from `datetime`)

In [178]:
pd.crosstab(
    index=supermarket['month'],
    columns=supermarket['city'],
    values=supermarket['rating'],
    aggfunc='mean',
)

month,Mandalay,Naypyitaw,Yangon
January,6.801802,7.154918,7.078151
February,7.008257,7.2,7.007447
March,6.649107,6.858491,6.993701


> Insight: In every city, the average rating fell from February to March. The drop is largest in Mandalay (7.01 to 6.65) and smallest in Yangon (7.01 to 6.99).
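
The direction of change can also be read off programmatically with `.diff()`, which subtracts each row from the one after it. The sketch below re-enters the averages from the table above by hand. The name `change` is introduced here for illustration.

```python
import pandas as pd

# Monthly average ratings copied from the crosstab output above.
avg_rating = pd.DataFrame(
    {
        'Mandalay': [6.801802, 7.008257, 6.649107],
        'Naypyitaw': [7.154918, 7.200000, 6.858491],
        'Yangon': [7.078151, 7.007447, 6.993701],
    },
    index=['January', 'February', 'March'],
)

# Month-over-month change: negative means the rating went down.
change = avg_rating.diff()
print(change)

# Cities whose rating fell from February to March.
fell_in_march = change.loc['March'] < 0
print(fell_in_march)
```

The January row of `change` is `NaN` because there is no earlier month to compare against.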

___

## Self-Exploration Time ~

This section gives you space to explore the `supermarket` data on your own. Formulate at least **two interesting business questions** about the data, then try to answer them using the techniques we have learned together in class. Write up your insights as a short narrative and present them to the class :)

As a guide, here are the techniques you can use:

- Conditional subsetting (filter): `dataframe[condition]`
- Extract and transform `datetime64` component: `.dt.COMPONENT` and `.dt.to_period()`
- Frequency and aggregation table:
    - `.value_counts()`
    - `pd.crosstab()`
    - `pd.pivot_table()`
- Sorting table: `.sort_values()`
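
As a quick reminder of what these techniques look like in practice, here is a minimal sketch on the five rows from the `head()` output. The thresholds and derived column names (`hour`, `period`) are arbitrary choices for illustration.

```python
import pandas as pd

# A few columns from the first five rows shown earlier.
sample = pd.DataFrame({
    'invoice_id': ['750-67-8428', '226-31-3081', '631-41-3108',
                   '123-19-1176', '373-73-7910'],
    'unit_price': [74.69, 15.28, 46.33, 58.22, 86.31],
    'date': ['1/5/2019', '3/8/2019', '3/3/2019', '1/27/2019', '2/8/2019'],
    'time': ['13:08', '10:29', '13:23', '20:33', '10:37'],
    'payment': ['Ewallet', 'Cash', 'Credit card', 'Ewallet', 'Ewallet'],
    'rating': [9.1, 9.6, 7.4, 8.4, 5.3],
})
sample['datetime'] = pd.to_datetime(
    sample['date'] + ' ' + sample['time'], format='%m/%d/%Y %H:%M'
)

# Conditional subsetting: keep rows with a rating of 9 or higher.
high_rating = sample[sample['rating'] >= 9]

# Datetime components: hour of purchase and year-month period.
sample['hour'] = sample['datetime'].dt.hour
sample['period'] = sample['datetime'].dt.to_period('M')

# Frequency table: number of transactions per payment method.
payment_counts = sample['payment'].value_counts()

# Sorting: most expensive unit price first.
by_price = sample.sort_values('unit_price', ascending=False)
```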

**Question 1:** We want to design a marketing strategy for each city that targets the product line with the lowest quantity sold.

- index = `product_line`
- columns = `city`
- values = `quantity`
- aggfunc = `sum`

In [181]:
# code here
supermarket.pivot_table(
    index = 'product_line',
    columns = 'city',
    values = 'quantity',
    aggfunc='sum'
).idxmin() 

city
Mandalay     Food and beverages
Naypyitaw    Home and lifestyle
Yangon        Health and beauty
dtype: object

> **📈 Insight:** ...

**Question 2:** Find the payment method used most often in each city.

In [185]:
# code here
supermarket.pivot_table( 
    index = 'payment', 
    columns = 'city', 
    values = 'rating', 
    aggfunc = 'count' 
).idxmax() 

city
Mandalay     Ewallet
Naypyitaw       Cash
Yangon       Ewallet
dtype: object

> **📈 Insight:** ...