# **BUSINESS UNDERSTANDING**

This dataset contains product sales transactions, including invoice numbers, product codes, quantities, prices, discounts, and customer details with locations. It supports analysis of sales performance, customer trends, and the effectiveness of sales strategies across different markets.

This dataset can be used to explore:
1. Total orders per month by market and region
2. Total customers by market and region
3. Total revenue per month by market and region
4. Top 10 countries by revenue
5. Top 5 best-selling product categories
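
The questions above map onto a handful of pandas group-by operations. The following is a minimal sketch, not the notebook's actual analysis: it uses only the five rows shown in the dataset preview, assumes revenue is `Quantity * UnitPrice(€)` with discounts ignored, and assumes "best-selling" means units sold. The names `sample`, `monthly`, `top_countries`, and `top_categories` are illustrative.

```python
import pandas as pd

# Five rows copied from the dataset preview; the full file would be used in practice.
sample = pd.DataFrame({
    "InvoiceNo": [536370] * 5,
    "Category": ["Car & Motorbike"] + ["Computers & Accessories"] * 4,
    "InvoiceDate": ["2010-12-01 08:45:00"] * 5,
    "Quantity": [15, 118, 106, 96, 61],
    "UnitPrice(€)": [18.0, 3.75, 0.85, 3.75, 2.95],
    "CustomerID": [12583] * 5,
    "Country": ["United States", "Australia", "Australia", "Germany", "Senegal"],
    "Market_id": ["US", "APAC", "APAC", "EU", "Africa"],
    "Region": ["East", "Oceania", "Oceania", "Central", "Africa"],
})

sample["InvoiceDate"] = pd.to_datetime(sample["InvoiceDate"])
sample["Month"] = sample["InvoiceDate"].dt.to_period("M")
# Assumption: revenue is quantity times unit price, with discounts ignored.
sample["Revenue"] = sample["Quantity"] * sample["UnitPrice(€)"]

# Questions 1-3: orders, customers, and revenue per month by market and region.
monthly = (
    sample.groupby(["Month", "Market_id", "Region"])
    .agg(
        total_orders=("InvoiceNo", "nunique"),
        total_customers=("CustomerID", "nunique"),
        total_revenue=("Revenue", "sum"),
    )
    .reset_index()
)

# Question 4: top 10 countries by revenue.
top_countries = sample.groupby("Country")["Revenue"].sum().nlargest(10)

# Question 5: top 5 categories, assuming "best-selling" means units sold.
top_categories = sample.groupby("Category")["Quantity"].sum().nlargest(5)

print(monthly)
print(top_countries)
print(top_categories)
```

Counting distinct `InvoiceNo` values rather than rows matters here because one invoice can appear on several rows.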

# **DATA UNDERSTANDING**

- E-Commerce Insights, covering December 1, 2010 to September 30, 2011. The dataset consists of 35,185 rows and 12 columns.
- Data source: https://www.kaggle.com/datasets
- Data dictionary:
1. **InvoiceNo**: Invoice number. This identifies each invoice; a single invoice can span several rows, one per product sold.
2. **StockCode**: Stock code. This is a unique identifier for each product in the inventory.
3. **Product_name**: Product name. This refers to the name of the product being sold.
4. **Category**: Category. This groups products into specific categories based on type or characteristics.
5. **InvoiceDate**: Invoice date. This indicates the date and time when the transaction or sale occurred.
6. **Quantity**: Quantity. This shows the number of units of the product sold in a particular transaction.
7. **UnitPrice(€)**: Unit price in Euros. This is the price per unit for each product sold.
8. **Disc_type**: Discount type. This refers to the type of discount applied to the transaction, such as a percentage discount, a fixed-amount discount, or no discount at all. It is stored as an integer code.
9. **CustomerID**: Customer ID. This is a unique identifier for each customer making a transaction.
10. **Country**: Country. This indicates the country of the customer or where the transaction took place.
11. **Market_id**: Market ID. This is a unique identifier for the market or market segment where the product is sold.
12. **Region**: Region. This refers to a specific geographical area or region within the country or market.
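
One way to keep the data dictionary and the loaded file in sync is a small column check. This is only a sketch: `EXPECTED_COLUMNS` and `check_columns` are hypothetical names introduced here, and the `demo` frame is an empty frame built purely to show what a mismatch looks like.

```python
import pandas as pd

# Column names as listed in the data dictionary above.
EXPECTED_COLUMNS = [
    "InvoiceNo", "StockCode", "Product_name", "Category", "InvoiceDate",
    "Quantity", "UnitPrice(€)", "Disc_type", "CustomerID", "Country",
    "Market_id", "Region",
]


def check_columns(frame: pd.DataFrame) -> tuple[list[str], list[str]]:
    """Return (missing, unexpected) column names relative to the data dictionary."""
    actual = list(frame.columns)
    missing = [col for col in EXPECTED_COLUMNS if col not in actual]
    unexpected = [col for col in actual if col not in EXPECTED_COLUMNS]
    return missing, unexpected


# Illustrative frame: drops "Region" and adds a stray index column.
demo = pd.DataFrame(columns=EXPECTED_COLUMNS[:-1] + ["Unnamed: 0"])
missing, unexpected = check_columns(demo)
print("missing:", missing)
print("unexpected:", unexpected)
```

In the notebook, the same function could be called on `df` right after loading; two empty lists would mean the file matches the dictionary.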

# **DATA PREPARATION**

Python Version: 3.11.6

Packages:
- Pandas

### Import Packages

In [1]:
import pandas as pd

### Load Dataset

In [2]:
df = pd.read_csv('Ecommerce_Final.csv')

### Preview Dataset

In [3]:
df.head()

|   | InvoiceNo | StockCode | Product_name | Category | InvoiceDate | Quantity | UnitPrice(€) | Disc_type | CustomerID | Country | Market_id | Region |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 536370 | POST | Reffair AX30 [MAX] Portable Air Purifier for C... | Car & Motorbike | 2010-12-01 08:45:00 | 15 | 18.0 | 1 | 12583 | United States | US | East |
| 1 | 536370 | 22726 | Wayona Nylon Braided USB to Lightning Fast Cha... | Computers & Accessories | 2010-12-01 08:45:00 | 118 | 3.75 | 1 | 12583 | Australia | APAC | Oceania |
| 2 | 536370 | 21724 | Ambrane Unbreakable 60W / 3A Fast Charging 1.5... | Computers & Accessories | 2010-12-01 08:45:00 | 106 | 0.85 | 3 | 12583 | Australia | APAC | Oceania |
| 3 | 536370 | 21913 | Sounce Fast Phone Charging Cable & Data Sync U... | Computers & Accessories | 2010-12-01 08:45:00 | 96 | 3.75 | 1 | 12583 | Germany | EU | Central |
| 4 | 536370 | 21035 | boAt Deuce USB 300 2 in 1 Type-C & Micro USB S... | Computers & Accessories | 2010-12-01 08:45:00 | 61 | 2.95 | 1 | 12583 | Senegal | Africa | Africa |


### Dataset Info

In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 35185 entries, 0 to 35184
Data columns (total 12 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   InvoiceNo     35185 non-null  int64  
 1   StockCode     35185 non-null  object 
 2   Product_name  35185 non-null  object 
 3   Category      35185 non-null  object 
 4   InvoiceDate   35185 non-null  object 
 5   Quantity      35185 non-null  int64  
 6   UnitPrice(€)  35185 non-null  float64
 7   Disc_type     35185 non-null  int64  
 8   CustomerID    35185 non-null  int64  
 9   Country       35185 non-null  object 
 10  Market_id     35185 non-null  object 
 11  Region        35185 non-null  object 
dtypes: float64(1), int64(4), object(7)
memory usage: 3.2+ MB
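
The `info()` output reports `InvoiceDate` as `object`, meaning the timestamps are held as plain strings, so any per-month analysis needs them parsed first. One option, sketched below, is to parse the column while loading. To stay self-contained the example reads two preview rows from an in-memory string instead of `Ecommerce_Final.csv`; `csv_text` and `parsed` are illustrative names.

```python
import io

import pandas as pd

# Two rows from the dataset preview, reduced to a few columns for brevity.
csv_text = (
    "InvoiceNo,StockCode,InvoiceDate,Quantity,UnitPrice(€)\n"
    "536370,POST,2010-12-01 08:45:00,15,18.0\n"
    "536370,22726,2010-12-01 08:45:00,118,3.75\n"
)

# parse_dates asks read_csv to convert the named column to datetime64.
parsed = pd.read_csv(io.StringIO(csv_text), parse_dates=["InvoiceDate"])

print(parsed.dtypes)
# With real dates in place, the covered period can be checked directly.
print(parsed["InvoiceDate"].min(), parsed["InvoiceDate"].max())
```

For a frame that is already loaded, `pd.to_datetime(df['InvoiceDate'])` gives the same conversion.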


In [5]:
df.shape

(35185, 12)

#### Check the number of null values in each column

In [6]:
df.isnull().sum()

InvoiceNo       0
StockCode       0
Product_name    0
Category        0
InvoiceDate     0
Quantity        0
UnitPrice(€)    0
Disc_type       0
CustomerID      0
Country         0
Market_id       0
Region          0
dtype: int64
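
A column with no nulls can still hold problematic rows, so it may be worth pairing the null check with a few other basic checks. The sketch below is an assumption about what would be useful here, not part of the original notebook: `quality_report` is a hypothetical helper, and the `demo` frame reuses three preview rows with the first one repeated on purpose to show the duplicate count.

```python
import pandas as pd


def quality_report(frame: pd.DataFrame) -> dict:
    """Count nulls, fully duplicated rows, and non-positive quantities or prices."""
    return {
        "null_cells": int(frame.isnull().sum().sum()),
        "duplicate_rows": int(frame.duplicated().sum()),
        "non_positive_quantity": int((frame["Quantity"] <= 0).sum()),
        "non_positive_price": int((frame["UnitPrice(€)"] <= 0).sum()),
    }


# Three rows from the dataset preview, reduced to the columns the checks need.
preview = pd.DataFrame({
    "StockCode": ["POST", "22726", "21724"],
    "Quantity": [15, 118, 106],
    "UnitPrice(€)": [18.0, 3.75, 0.85],
})

# Artificially repeat the first row so the duplicate check has something to find.
demo = pd.concat([preview, preview.iloc[[0]]], ignore_index=True)

report = quality_report(demo)
print(report)
```

Whether duplicates or non-positive values should be dropped depends on what they represent in this dataset, which the source does not state.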