B2B Courier Charges Accuracy Analysis using Python

Let's start this task by importing the necessary Python libraries and the datasets:

In [1]:
import pandas as pd

In [17]:
order_report = pd.read_csv('Datasets/Order_Report.csv')
sku_master = pd.read_csv('Datasets/SKU_Master.csv')
pincode_mapping = pd.read_csv('Datasets/pincodes.csv')
courier_invoice = pd.read_csv('Datasets/Invoice.csv')
courier_company_rates = pd.read_csv('Datasets/Courier_Company_Rates.csv')

In [27]:
print("Order Report:")
print(order_report.head())

Order Report:
   ExternOrderNo            SKU  Order Qty  Unnamed: 3  Unnamed: 4
0     2001827036  8904223818706        1.0         NaN         NaN
1     2001827036  8904223819093        1.0         NaN         NaN
2     2001827036  8904223819109        1.0         NaN         NaN
3     2001827036  8904223818430        1.0         NaN         NaN
4     2001827036  8904223819277        1.0         NaN         NaN


In [30]:
print("SKU Master:")
print(sku_master.head())

SKU Master:
             SKU  Weight (g)  Unnamed: 2  Unnamed: 3  Unnamed: 4
0  8904223815682         210         NaN         NaN         NaN
1  8904223815859         165         NaN         NaN         NaN
2  8904223815866         113         NaN         NaN         NaN
3  8904223815873          65         NaN         NaN         NaN
4  8904223816214         120         NaN         NaN         NaN


In [31]:
print("Pincode Mapping:")
pincode_mapping.head()

Pincode Mapping:


   Warehouse Pincode  Customer Pincode Zone  Unnamed: 3  Unnamed: 4
0             121003            507101    d         NaN         NaN
1             121003            486886    d         NaN         NaN
2             121003            532484    d         NaN         NaN
3             121003            143001    b         NaN         NaN
4             121003            515591    d         NaN         NaN


In [32]:
print("Courier Invoice:")
print(courier_invoice.head())

Courier Invoice:
        AWB Code    Order ID  Charged Weight  Warehouse Pincode  \
0  1091117222124  2001806232            1.30             121003   
1  1091117222194  2001806273            1.00             121003   
2  1091117222931  2001806408            2.50             121003   
3  1091117223244  2001806458            1.00             121003   
4  1091117229345  2001807012            0.15             121003   

   Customer Pincode Zone Type of Shipment  Billing Amount (Rs.)  
0            507101    d  Forward charges                 135.0  
1            486886    d  Forward charges                  90.2  
2            532484    d  Forward charges                 224.6  
3            143001    b  Forward charges                  61.3  
4            515591    d  Forward charges                  45.4  


In [33]:
print("Courier Company rates:")
courier_company_rates.head()

Courier Company rates:


   fwd_a_fixed  fwd_a_additional  fwd_b_fixed  fwd_b_additional  fwd_c_fixed  \
0         29.5              23.6           33              28.3         40.1

   fwd_c_additional  fwd_d_fixed  fwd_d_additional  fwd_e_fixed  \
0              38.9         45.4              44.8         56.6

   fwd_e_additional  rto_a_fixed  rto_a_additional  rto_b_fixed  \
0              55.5         13.6              23.6         20.5

   rto_b_additional  rto_c_fixed  rto_c_additional  rto_d_fixed  \
0              28.3         31.9              38.9         41.3

   rto_d_additional  rto_e_fixed  rto_e_additional
0              44.8         50.7              55.5
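The rate card is stored as a single wide row, with one column per shipment type (`fwd` or `rto`), zone (`a` to `e`) and charge (`fixed` or `additional`). Reshaping it into a long table makes it easier to look up the rates for a given shipment type and zone. Here is a minimal sketch on a toy copy of some of the columns above; it assumes every column name follows the `<shipment>_<zone>_<charge>` pattern shown in the output, and the names `rates` and `rate_table` are only illustrative.

```python
import pandas as pd

# Toy one-row rate card: values copied from the zone a and b columns shown above
rates = pd.DataFrame([{
    "fwd_a_fixed": 29.5, "fwd_a_additional": 23.6,
    "fwd_b_fixed": 33.0, "fwd_b_additional": 28.3,
    "rto_a_fixed": 13.6, "rto_a_additional": 23.6,
    "rto_b_fixed": 20.5, "rto_b_additional": 28.3,
}])

# Turn the single wide row into one row per original column
long_rates = rates.melt(var_name="column", value_name="rate")

# Split names like "fwd_a_fixed" into shipment type, zone and charge kind
# (assumes exactly two underscores in every column name)
parts = long_rates["column"].str.split("_", expand=True)
long_rates["shipment"] = parts[0]
long_rates["zone"] = parts[1]
long_rates["charge"] = parts[2]

# One row per (shipment, zone) with the fixed and additional rates side by side
rate_table = (
    long_rates.pivot_table(index=["shipment", "zone"], columns="charge", values="rate")
    .reset_index()
)
print(rate_table)

# Example lookup: fixed forward rate for zone b
lookup = rate_table.set_index(["shipment", "zone"])
print(lookup.loc[("fwd", "b"), "fixed"])
```

On the notebook's data, `courier_company_rates` could be passed in place of `rates`, provided it contains only columns that follow the naming pattern above.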


Now let's check whether any of the datasets contain missing values:

In [None]:
# Check for missing values
print("Missing values in Website Order Report:")
order_report.isnull().sum()

Missing values in Website Order Report:


ExternOrderNo      0
SKU                0
Order Qty          0
Unnamed: 3       400
Unnamed: 4       400
dtype: int64

In [35]:
print("Missing values in SKU Master:")
sku_master.isnull().sum()

Missing values in SKU Master:


SKU            0
Weight (g)     0
Unnamed: 2    66
Unnamed: 3    66
Unnamed: 4    66
dtype: int64

In [37]:
print("Missing values in Pincode Mapping:")
pincode_mapping.isnull().sum()

Missing values in Pincode Mapping:


Warehouse Pincode      0
Customer Pincode       0
Zone                   0
Unnamed: 3           124
Unnamed: 4           124
dtype: int64

In [39]:
print("Missing values in Courier Invoice:")
courier_invoice.isnull().sum()

Missing values in Courier Invoice:


AWB Code                0
Order ID                0
Charged Weight          0
Warehouse Pincode       0
Customer Pincode        0
Zone                    0
Type of Shipment        0
Billing Amount (Rs.)    0
dtype: int64

In [40]:
print("Missing values in courier rates:")
courier_company_rates.isnull().sum()

Missing values in courier rates:


fwd_a_fixed         0
fwd_a_additional    0
fwd_b_fixed         0
fwd_b_additional    0
fwd_c_fixed         0
fwd_c_additional    0
fwd_d_fixed         0
fwd_d_additional    0
fwd_e_fixed         0
fwd_e_additional    0
rto_a_fixed         0
rto_a_additional    0
rto_b_fixed         0
rto_b_additional    0
rto_c_fixed         0
rto_c_additional    0
rto_d_fixed         0
rto_d_additional    0
rto_e_fixed         0
rto_e_additional    0
dtype: int64
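The five separate checks above can also be run in a single loop. The sketch below uses two small hypothetical DataFrames in place of the real ones and, besides counting missing values, reports which columns are completely empty (a missing share of 1.0). That is a quick way to confirm which `Unnamed` columns can be dropped without losing data. In the notebook, the dictionary would hold `order_report`, `sku_master`, `pincode_mapping`, `courier_invoice` and `courier_company_rates` instead.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the real DataFrames
datasets = {
    "Order Report": pd.DataFrame({
        "ExternOrderNo": [1001, 1001, 1002],
        "SKU": [111, 222, 111],
        "Order Qty": [1.0, 1.0, np.nan],  # one genuinely missing quantity
        "Unnamed: 3": [np.nan, np.nan, np.nan],
    }),
    "SKU Master": pd.DataFrame({
        "SKU": [111, 222],
        "Weight (g)": [127, 150],
        "Unnamed: 2": [np.nan, np.nan],
    }),
}

empty_columns = {}
for name, df in datasets.items():
    missing_counts = df.isnull().sum()   # number of NaNs per column
    missing_share = df.isnull().mean()   # fraction of NaNs per column
    # Columns in which every value is missing
    empty_columns[name] = missing_share[missing_share == 1.0].index.tolist()
    print(f"{name}: {int(missing_counts.sum())} missing values in total, "
          f"completely empty columns: {empty_columns[name]}")
```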

Now let's clean the data by dropping the 'Unnamed' columns, which are mostly or entirely empty:

In [41]:
# Remove unnamed columns from the Website Order Report DataFrame
order_report = order_report.drop(columns=['Unnamed: 3', 'Unnamed: 4'])

In [42]:
# Remove unnamed columns from the SKU Master DataFrame
sku_master = sku_master.drop(columns=['Unnamed: 2', 'Unnamed: 3', 'Unnamed: 4'])

In [43]:
# Remove unnamed columns from the Pincode Mapping DataFrame
pincode_mapping = pincode_mapping.drop(columns=['Unnamed: 3', 'Unnamed: 4'])
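Dropping columns by name works, but it relies on knowing the exact `Unnamed: N` labels in each file. A more general sketch drops any column whose header starts with `Unnamed` and that contains only missing values, so an unnamed column that does hold data is kept for inspection. The helper name `drop_empty_unnamed` is hypothetical, and it assumes the column labels are strings, as they are for headers read from a CSV file.

```python
import numpy as np
import pandas as pd

def drop_empty_unnamed(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df without 'Unnamed ...' columns that are entirely NaN."""
    # True for columns whose header starts with "Unnamed" (assumes string labels)
    is_unnamed = np.asarray(df.columns.str.startswith("Unnamed"), dtype=bool)
    # True for columns in which every value is missing
    is_empty = df.isnull().all().to_numpy()
    return df.loc[:, ~(is_unnamed & is_empty)].copy()

# Small hypothetical example: 'Unnamed: 3' is empty, 'Unnamed: 4' is not
demo = pd.DataFrame({
    "Warehouse Pincode": [121003, 121003],
    "Customer Pincode": [507101, 486886],
    "Zone": ["d", "d"],
    "Unnamed: 3": [np.nan, np.nan],
    "Unnamed: 4": [np.nan, "note"],
})
cleaned = drop_empty_unnamed(demo)
print(cleaned.columns.tolist())
# ['Warehouse Pincode', 'Customer Pincode', 'Zone', 'Unnamed: 4']
```

On the notebook's data this would be, for example, `order_report = drop_empty_unnamed(order_report)`. Alternatively, `pd.read_csv` accepts a callable for `usecols`, such as `usecols=lambda c: not c.startswith("Unnamed")`, which skips these columns at load time regardless of their contents.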

Now let's merge the Order Report and SKU Master datasets on their common SKU column:

In [44]:
# Merge the Order Report and SKU Master based on SKU
merged_data = pd.merge(order_report, sku_master, on='SKU')
merged_data.head()

   ExternOrderNo            SKU  Order Qty  Weight (g)
0     2001827036  8904223818706        1.0         127
1     2001827036  8904223819093        1.0         150
2     2001827036  8904223819109        1.0         100
3     2001827036  8904223818430        1.0         165
4     2001827036  8904223819277        1.0         350
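Note that `pd.merge` performs an inner join by default, so any order line whose SKU is missing from the SKU master is silently dropped from `merged_data`. Below is a minimal sketch of a more defensive merge on hypothetical toy data: `how="left"` keeps every order line, `indicator=True` adds a `_merge` column that flags rows without a match, and `validate="many_to_one"` raises an error if an SKU appears more than once in the master (which would duplicate order lines).

```python
import pandas as pd

# Hypothetical toy data; the last order line uses an SKU that is not in the master
orders_demo = pd.DataFrame({
    "ExternOrderNo": [1001, 1001, 1002],
    "SKU": [8904223818706, 8904223819093, 1111111111111],
    "Order Qty": [1.0, 1.0, 2.0],
})
sku_demo = pd.DataFrame({
    "SKU": [8904223818706, 8904223819093],
    "Weight (g)": [127, 150],
})

checked = pd.merge(
    orders_demo, sku_demo, on="SKU",
    how="left",              # keep every order line, even without a weight
    validate="many_to_one",  # each SKU must be unique in the master
    indicator=True,          # adds '_merge': 'both', 'left_only' or 'right_only'
)

# Order lines whose SKU has no entry in the master (their weight is NaN)
unmatched = checked[checked["_merge"] == "left_only"]
print(unmatched[["ExternOrderNo", "SKU"]])
```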


The 'ExternOrderNo' column corresponds to the 'Order ID' column in the other datasets, so let's rename it:

In [47]:
# Rename the "ExternOrderNo" column to "Order ID" in the merged_data DataFrame
merged_data = merged_data.rename(columns={'ExternOrderNo': 'Order ID'})
merged_data.head()

     Order ID            SKU  Order Qty  Weight (g)
0  2001827036  8904223818706        1.0         127
1  2001827036  8904223819093        1.0         150
2  2001827036  8904223819109        1.0         100
3  2001827036  8904223818430        1.0         165
4  2001827036  8904223819277        1.0         350
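Since the rename lines `merged_data` up with the `Order ID` column of the courier invoice, it is worth checking how many order IDs actually appear in both tables. A minimal sketch with hypothetical IDs; on the real data you would use `merged_data` and `courier_invoice` instead of the two demo frames. It assumes both `Order ID` columns share the same dtype: if one is read as text and the other as integers, nothing will match.

```python
import pandas as pd

# Hypothetical toy data standing in for merged_data and courier_invoice
merged_demo = pd.DataFrame({"Order ID": [1001, 1001, 1002, 1003]})
invoice_demo = pd.DataFrame({"Order ID": [1001, 1002, 1004]})

# .tolist() converts to plain Python ints before building the sets
order_ids = set(merged_demo["Order ID"].tolist())
invoice_ids = set(invoice_demo["Order ID"].tolist())

matched = order_ids & invoice_ids          # IDs present in both tables
missing_invoice = order_ids - invoice_ids  # orders with no invoice line
missing_order = invoice_ids - order_ids    # invoice lines with no order details

print(f"matched: {len(matched)}")
print(f"orders without an invoice: {sorted(missing_invoice)}")
print(f"invoices without order details: {sorted(missing_order)}")
```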
