# 1. Business Understanding

ABC Company operates an e-commerce platform and processes thousands of orders daily. To deliver these orders, ABC has partnered with several courier companies in India, which charge ABC based on the weight of the products and the distance between the warehouse and the customer’s delivery address.

## 1.1 Main Objective

- Check if the fees charged by the courier companies for each order are correct.

## 1.2 Specific Objectives

- Compare the total weight of each order calculated using the SKU master with the weight stated by the courier company in their invoice.
- Check that the delivery area derived from ABC's warehouse PIN to all-India pincode mappings matches the area reported by the courier company.

# 2. Data Understanding

## 2.1 ABC Data

ABC's data is split across 3 reports:

1. Website Order
2. Master SKU
3. Warehouse PIN

The Website Order report includes:
- Order IDs
- Product SKUs for each order

Master SKU provides the gross weight of each product, which is needed to calculate the total weight of each order.
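
The order-weight calculation described above can be sketched as follows. This is a minimal illustration, not the notebook's own implementation: it assumes the column names that appear in the dataset previews (`ExternOrderNo`, `SKU`, `Order Qty`, `Weight (g)`), and the rows and the helper name `order_weights_kg` are made up for the example.

```python
import pandas as pd

# Toy rows for illustration only (not taken from the real reports)
orders = pd.DataFrame({
    "ExternOrderNo": [1001, 1001, 1002],
    "SKU": ["A", "B", "A"],
    "Order Qty": [1, 2, 3],
})
sku_weights = pd.DataFrame({"SKU": ["A", "B"], "Weight (g)": [200, 150]})


def order_weights_kg(orders, sku_weights):
    """Return the total weight of each order in kilograms."""
    # many_to_one raises if a SKU appears more than once in the master
    merged = orders.merge(sku_weights, on="SKU", how="left", validate="many_to_one")
    merged["line_weight_g"] = merged["Order Qty"] * merged["Weight (g)"]
    total_g = merged.groupby("ExternOrderNo")["line_weight_g"].sum()
    return (total_g / 1000).rename("order_weight_kg")


weights = order_weights_kg(orders, sku_weights)
print(weights)
```

The left join keeps order lines whose SKU is missing from the master. Their weight becomes `NaN` and is silently skipped by `sum`, so it is worth checking for unmatched SKUs before trusting the totals.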

The Warehouse PIN report maps the warehouse PIN to customer PINs across India and gives the delivery zone for each pair.
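
A zone comparison built on this mapping could look like the sketch below. It assumes the column names shown in the dataset previews (`Warehouse Pincode`, `Customer Pincode`, `Zone`); the two rows are modeled on those previews, with one courier zone deliberately altered, and the frame names `abc_zones` and `courier_invoice` are hypothetical.

```python
import pandas as pd

# Toy rows for illustration only; column names follow the previews
abc_zones = pd.DataFrame({
    "Warehouse Pincode": [121003, 121003],
    "Customer Pincode": [507101, 143001],
    "Zone": ["d", "b"],
})
courier_invoice = pd.DataFrame({
    "Order ID": [1, 2],
    "Warehouse Pincode": [121003, 121003],
    "Customer Pincode": [507101, 143001],
    "Zone": ["d", "d"],  # second row deliberately disagrees with ABC's mapping
})

keys = ["Warehouse Pincode", "Customer Pincode"]
# drop_duplicates guards against repeated PIN pairs multiplying invoice rows
compared = courier_invoice.merge(
    abc_zones.drop_duplicates(subset=keys),
    on=keys,
    how="left",
    suffixes=("_courier", "_abc"),
)
compared["zone_matches"] = compared["Zone_courier"] == compared["Zone_abc"]
print(compared[["Order ID", "Zone_courier", "Zone_abc", "zone_matches"]])
```

Rows where `zone_matches` is `False` are the ones to review, since the courier's zone determines which rate applies.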

## 2.2 Courier Data

Courier company invoices contain information such as:
- AWB number
- Order ID
- Shipment weight
- Warehouse pickup PIN
- Customer delivery PIN
- Delivery area
- Charge per shipment and type of shipment.








# 3. Data Wrangling

In [2]:
# importing libraries

import pandas as pd

In [24]:
# reading the data
from pathlib import Path

data_dir = Path(r'C:\Users\w.selen.KEEMBLT0011\Desktop\Mercy\DataScience\B2B Ecommerce Fraud\data\raw\b2b')

invoice = pd.read_csv(data_dir / 'Invoice.csv')
sku_master = pd.read_csv(data_dir / 'SKU Master.csv')
pincodes = pd.read_csv(data_dir / 'pincodes.csv')
order_report = pd.read_csv(data_dir / 'Order Report.csv')
courier_rates = pd.read_csv(data_dir / 'Courier Company - Rates.csv')

## 3.1 Invoice Dataset

In [25]:
# previewing the data

invoice.head()

|   | AWB Code | Order ID | Charged Weight | Warehouse Pincode | Customer Pincode | Zone | Type of Shipment | Billing Amount (Rs.) |
|---|---|---|---|---|---|---|---|---|
| 0 | 1091117222124 | 2001806232 | 1.3 | 121003 | 507101 | d | Forward charges | 135.0 |
| 1 | 1091117222194 | 2001806273 | 1.0 | 121003 | 486886 | d | Forward charges | 90.2 |
| 2 | 1091117222931 | 2001806408 | 2.5 | 121003 | 532484 | d | Forward charges | 224.6 |
| 3 | 1091117223244 | 2001806458 | 1.0 | 121003 | 143001 | b | Forward charges | 61.3 |
| 4 | 1091117229345 | 2001807012 | 0.15 | 121003 | 515591 | d | Forward charges | 45.4 |


In [42]:
# checking the shape of the data

print(f"The data has {invoice.shape[0]} rows and {invoice.shape[1]} columns")

The data has 124 rows and 8 columns


In [43]:
# checking the data types of the data

invoice.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 124 entries, 0 to 123
Data columns (total 8 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   AWB Code              124 non-null    int64  
 1   Order ID              124 non-null    int64  
 2   Charged Weight        124 non-null    float64
 3   Warehouse Pincode     124 non-null    int64  
 4   Customer Pincode      124 non-null    int64  
 5   Zone                  124 non-null    object 
 6   Type of Shipment      124 non-null    object 
 7   Billing Amount (Rs.)  124 non-null    float64
dtypes: float64(2), int64(4), object(2)
memory usage: 7.9+ KB


- The data has 6 numeric columns (2 of type `float64` and 4 of type `int64`) and 2 `object` columns.

In [44]:
# looking at the statistics of the different columns

invoice.describe()

|   | AWB Code | Order ID | Charged Weight | Warehouse Pincode | Customer Pincode | Billing Amount (Rs.) |
|---|---|---|---|---|---|---|
| count | 124.0 | 124.0 | 124.0 | 124.0 | 124.0 | 124.0 |
| mean | 1091118000000.0 | 2001811000.0 | 0.956048 | 121003.0 | 365488.072581 | 110.066129 |
| std | 1473661.0 | 5167.329 | 0.662815 | 0.0 | 152156.32213 | 64.060832 |
| min | 1091117000000.0 | 2001806000.0 | 0.15 | 121003.0 | 140301.0 | 33.0 |
| 25% | 1091117000000.0 | 2001807000.0 | 0.6675 | 121003.0 | 302017.0 | 86.7 |
| 50% | 1091117000000.0 | 2001809000.0 | 0.725 | 121003.0 | 321304.5 | 90.2 |
| 75% | 1091119000000.0 | 2001812000.0 | 1.1 | 121003.0 | 405102.25 | 135.0 |
| max | 1091122000000.0 | 2001827000.0 | 4.13 | 121003.0 | 845438.0 | 403.8 |


### 3.1.1 Data Cleaning

#### 3.1.1.1 Data Completeness

In [45]:
# checking if the data has any missing values

invoice.isna().sum()

AWB Code                0
Order ID                0
Charged Weight          0
Warehouse Pincode       0
Customer Pincode        0
Zone                    0
Type of Shipment        0
Billing Amount (Rs.)    0
dtype: int64

- The data has no missing values

#### 3.1.1.2 Data Consistency

In [48]:
# Checking if the data has any duplicates

print(f"The data has {invoice.duplicated().sum()} duplicate rows")

The data has 0 duplicate rows
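
A full-row check only catches rows that are identical in every column. As a hedged sketch with made-up values, an order billed twice under different AWB codes passes the full-row check but shows up when duplicates are tested on the key column alone. The frame `toy_invoice` is hypothetical; only the column names come from the invoice.

```python
import pandas as pd

# Toy invoice rows (hypothetical values): order 2 is billed twice
toy_invoice = pd.DataFrame({
    "AWB Code": [11, 12, 13],
    "Order ID": [1, 2, 2],
    "Billing Amount (Rs.)": [90.2, 61.3, 45.4],
})

full_row_dupes = toy_invoice.duplicated().sum()
# keep=False marks every row of a repeated group, not only the later ones
repeated_orders = toy_invoice[toy_invoice.duplicated(subset="Order ID", keep=False)]
print(full_row_dupes, len(repeated_orders))
```

In the real invoice, the unique-value counts in the next subsection show 124 distinct `Order ID` values across 124 rows, so no order is repeated there.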


#### 3.1.1.3 Data Uniformity

In [56]:
# checking unique values per column

invoice.nunique()

AWB Code                124
Order ID                124
Charged Weight           54
Warehouse Pincode         1
Customer Pincode        108
Zone                      3
Type of Shipment          2
Billing Amount (Rs.)     20
dtype: int64

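- `Warehouse Pincode` has a single unique value (121003), so every shipment is picked up from the same warehouse. Because a pincode is an identifier rather than a quantity, the column can be converted to the object data type.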

In [59]:
# converting column to object data type

invoice['Warehouse Pincode'] = invoice['Warehouse Pincode'].astype(object)

In [62]:
invoice.head()

|   | AWB Code | Order ID | Charged Weight | Warehouse Pincode | Customer Pincode | Zone | Type of Shipment | Billing Amount (Rs.) |
|---|---|---|---|---|---|---|---|---|
| 0 | 1091117222124 | 2001806232 | 1.3 | 121003 | 507101 | d | Forward charges | 135.0 |
| 1 | 1091117222194 | 2001806273 | 1.0 | 121003 | 486886 | d | Forward charges | 90.2 |
| 2 | 1091117222931 | 2001806408 | 2.5 | 121003 | 532484 | d | Forward charges | 224.6 |
| 3 | 1091117223244 | 2001806458 | 1.0 | 121003 | 143001 | b | Forward charges | 61.3 |
| 4 | 1091117229345 | 2001807012 | 0.15 | 121003 | 515591 | d | Forward charges | 45.4 |


## 3.2 Master SKU Dataset

In [26]:
sku_master.head()

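|   | SKU | Weight (g) | Unnamed: 2 | Unnamed: 3 | Unnamed: 4 |
|---|---|---|---|---|---|
| 0 | 8904223815682 | 210 | NaN | NaN | NaN |
| 1 | 8904223815859 | 165 | NaN | NaN | NaN |
| 2 | 8904223815866 | 113 | NaN | NaN | NaN |
| 3 | 8904223815873 | 65 | NaN | NaN | NaN |
| 4 | 8904223816214 | 120 | NaN | NaN | NaN |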
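
The `Unnamed: 2` to `Unnamed: 4` columns are empty in the previewed rows. Assuming they are empty throughout (the preview alone does not confirm this), they can be dropped with `dropna(axis=1, how="all")`, which removes only columns in which every value is missing. The sketch below uses a small hypothetical frame, `toy_sku`, shaped like the preview.

```python
import numpy as np
import pandas as pd

# Toy frame shaped like the preview; SKUs and weights are the first two preview rows
toy_sku = pd.DataFrame({
    "SKU": [8904223815682, 8904223815859],
    "Weight (g)": [210, 165],
    "Unnamed: 2": [np.nan, np.nan],
    "Unnamed: 3": [np.nan, np.nan],
})

# Drop columns where every value is missing; columns holding any data survive
cleaned = toy_sku.dropna(axis=1, how="all")
print(cleaned.columns.tolist())
```

Because `how="all"` leaves any column with at least one value in place, the same call is safe to try on the pincodes and order report frames, which show similar empty columns.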


## 3.3 Pincodes Dataset

In [27]:
pincodes.head()

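|   | Warehouse Pincode | Customer Pincode | Zone | Unnamed: 3 | Unnamed: 4 |
|---|---|---|---|---|---|
| 0 | 121003 | 507101 | d | NaN | NaN |
| 1 | 121003 | 486886 | d | NaN | NaN |
| 2 | 121003 | 532484 | d | NaN | NaN |
| 3 | 121003 | 143001 | b | NaN | NaN |
| 4 | 121003 | 515591 | d | NaN | NaN |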


## 3.4 Order Report Dataset

In [28]:
order_report.head()

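|   | ExternOrderNo | SKU | Order Qty | Unnamed: 3 | Unnamed: 4 |
|---|---|---|---|---|---|
| 0 | 2001827036 | 8904223818706 | 1.0 | NaN | NaN |
| 1 | 2001827036 | 8904223819093 | 1.0 | NaN | NaN |
| 2 | 2001827036 | 8904223819109 | 1.0 | NaN | NaN |
| 3 | 2001827036 | 8904223818430 | 1.0 | NaN | NaN |
| 4 | 2001827036 | 8904223819277 | 1.0 | NaN | NaN |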


## 3.5 Courier Dataset

In [29]:
courier_rates.head()

|   | fwd_a_fixed | fwd_a_additional | fwd_b_fixed | fwd_b_additional | fwd_c_fixed | fwd_c_additional | fwd_d_fixed | fwd_d_additional | fwd_e_fixed | fwd_e_additional | rto_a_fixed | rto_a_additional | rto_b_fixed | rto_b_additional | rto_c_fixed | rto_c_additional | rto_d_fixed | rto_d_additional | rto_e_fixed | rto_e_additional |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 29.5 | 23.6 | 33 | 28.3 | 40.1 | 38.9 | 45.4 | 44.8 | 56.6 | 55.5 | 13.6 | 23.6 | 20.5 | 28.3 | 31.9 | 38.9 | 41.3 | 44.8 | 50.7 | 55.5 |
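
The rate card can be turned into an expected charge for each shipment. The sketch below rests on two assumptions that the source does not state: that weight is billed in 0.5 kg slabs, and that the `fixed` rate covers the first slab while the `additional` rate applies to each further slab. The function name `expected_forward_charge` and the `RATES` dictionary are hypothetical, and only the forward rates for zones b and d are copied from the preview.

```python
import math

# Forward rates copied from the courier_rates preview (zones b and d only)
RATES = {
    "b": {"fixed": 33.0, "additional": 28.3},
    "d": {"fixed": 45.4, "additional": 44.8},
}
SLAB_KG = 0.5  # ASSUMPTION: the slab size is not stated in the source


def expected_forward_charge(weight_kg, zone, slab_kg=SLAB_KG):
    """Fixed rate for the first slab plus the additional rate per further slab."""
    slabs = max(1, math.ceil(weight_kg / slab_kg))
    rate = RATES[zone]
    return round(rate["fixed"] + (slabs - 1) * rate["additional"], 2)


print(expected_forward_charge(1.3, "d"))
```

Under these assumptions the function reproduces the billing amounts of the five previewed invoice rows; for example, 1.3 kg in zone d spans three slabs, giving 45.4 + 2 × 44.8 = 135.0. That agreement supports the assumptions but does not prove them for the remaining rows or for RTO shipments.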


In [30]:
data = [invoice, sku_master, pincodes, order_report, courier_rates]

In [41]:
# looking at the shape of the data

for i in data:
    #print(i)
    print(f'The data has {i.shape[0]} rows and {i.shape[1]} columns')
    print('---------------------------------------------------')

          AWB Code    Order ID  Charged Weight  Warehouse Pincode  \
0    1091117222124  2001806232            1.30             121003   
1    1091117222194  2001806273            1.00             121003   
2    1091117222931  2001806408            2.50             121003   
3    1091117223244  2001806458            1.00             121003   
4    1091117229345  2001807012            0.15             121003   
..             ...         ...             ...                ...   
119  1091118551656  2001812941            0.73             121003   
120  1091117614452  2001809383            0.50             121003   
121  1091120922803  2001820978            0.50             121003   
122  1091121844806  2001811475            0.50             121003   
123  1091121846136  2001811305            0.50             121003   

     Customer Pincode Zone         Type of Shipment  Billing Amount (Rs.)  
0              507101    d          Forward charges                 135.0  
1              4868

In [32]:
for i in data:
    print(i.shape[0])

124
66
124
400
1
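
Bare row counts are hard to match back to their datasets. One possible alternative, sketched here with small stand-in frames so that it runs on its own, keys each DataFrame by name and collects rows and columns into a single summary table. The names `frames` and `summary` are illustrative; in the notebook the dictionary would hold the five DataFrames loaded earlier.

```python
import pandas as pd

# Stand-in frames so the snippet is self-contained (not the real datasets)
frames = {
    "invoice": pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]}),
    "courier_rates": pd.DataFrame({"x": [1.0]}),
}

# One summary row per dataset: shape[0] is the row count, shape[1] the column count
summary = pd.DataFrame.from_dict(
    {name: {"rows": df.shape[0], "columns": df.shape[1]} for name, df in frames.items()},
    orient="index",
)
print(summary)
```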
