# B2B Courier Charge Accuracy Analysis using Machine Learning

This notebook analyzes B2B courier charges and uses machine learning to predict them.
Predicted charges are then compared with actual charges to flag potential billing inconsistencies.
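The comparison step can be sketched as follows. This is a minimal illustration, not the notebook's own method. The function name `flag_billing_discrepancies`, the relative-error measure, and the 10% tolerance are all assumptions. The predicted values are hypothetical, and only the three actual charges come from the dataset preview.

```python
import numpy as np
import pandas as pd


def flag_billing_discrepancies(actual, predicted, tolerance=0.10):
    """Flag shipments whose actual charge deviates from the predicted charge
    by more than `tolerance`, expressed as a fraction of the predicted charge.

    The 10% default is an illustrative assumption, not a business rule.
    Predicted charges are assumed to be strictly positive.
    """
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    difference = actual - predicted                   # positive means billed above prediction
    relative_error = np.abs(difference) / predicted   # size of the gap relative to prediction
    return pd.DataFrame({
        "actual_charge": actual,
        "predicted_charge": predicted,
        "difference": difference,
        "relative_error": relative_error,
        "flagged": relative_error > tolerance,
    })


# Hypothetical predictions for three shipments, for illustration only
report = flag_billing_discrepancies([135.0, 90.2, 224.6], [130.0, 70.0, 225.0])
print(report[report["flagged"]])  # only the 90.2 vs 70.0 shipment exceeds 10%
```

A relative threshold keeps the rule comparable across cheap and expensive shipments. An absolute threshold in currency units could be equally reasonable, depending on how billing disputes are handled.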


In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns


In [3]:
# Load the shipment-level courier dataset
df = pd.read_csv('/content/courier_data.csv')
df.head()


| | shipment_weight | origin_pincode | destination_pincode | delivery_zone | shipment_type | actual_charge |
|---|---|---|---|---|---|---|
| 0 | 1.3 | 121003 | 507101 | d | Forward charges | 135.0 |
| 1 | 1.0 | 121003 | 486886 | d | Forward charges | 90.2 |
| 2 | 2.5 | 121003 | 532484 | d | Forward charges | 224.6 |
| 3 | 1.0 | 121003 | 143001 | b | Forward charges | 61.3 |
| 4 | 0.15 | 121003 | 515591 | d | Forward charges | 45.4 |


In [4]:
df.shape

(124, 6)

In [5]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 124 entries, 0 to 123
Data columns (total 6 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   shipment_weight      124 non-null    float64
 1   origin_pincode       124 non-null    int64  
 2   destination_pincode  124 non-null    int64  
 3   delivery_zone        124 non-null    object 
 4   shipment_type        124 non-null    object 
 5   actual_charge        124 non-null    float64
dtypes: float64(2), int64(2), object(2)
memory usage: 5.9+ KB
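df.info() shows that delivery_zone and shipment_type are the only object (text) columns, which is why they are the columns one-hot encoded later. The sketch below picks out non-numeric columns programmatically instead of listing them by hand. It uses a hypothetical two-row frame with the same columns; the values are copied from the preview above.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Hypothetical two-row frame with the same columns as df (values from df.head())
sample = pd.DataFrame({
    "shipment_weight": [1.3, 1.0],
    "origin_pincode": [121003, 121003],
    "destination_pincode": [507101, 143001],
    "delivery_zone": ["d", "b"],
    "shipment_type": ["Forward charges", "Forward charges"],
    "actual_charge": [135.0, 61.3],
})

# Any column pandas does not treat as numeric needs encoding before model training
categorical_cols = [col for col in sample.columns if not is_numeric_dtype(sample[col])]
print(categorical_cols)  # ['delivery_zone', 'shipment_type']
```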


In [6]:
df.describe()

| | shipment_weight | origin_pincode | destination_pincode | actual_charge |
|---|---|---|---|---|
| count | 124.0 | 124.0 | 124.0 | 124.0 |
| mean | 0.956048 | 121003.0 | 365488.072581 | 110.066129 |
| std | 0.662815 | 0.0 | 152156.32213 | 64.060832 |
| min | 0.15 | 121003.0 | 140301.0 | 33.0 |
| 25% | 0.6675 | 121003.0 | 302017.0 | 86.7 |
| 50% | 0.725 | 121003.0 | 321304.5 | 90.2 |
| 75% | 1.1 | 121003.0 | 405102.25 | 135.0 |
| max | 4.13 | 121003.0 | 845438.0 | 403.8 |
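The summary shows a standard deviation of 0 for origin_pincode, with min equal to max (121003). Every shipment therefore originates from the same pincode, and the column gives a model nothing to learn from. The notebook keeps this column in X. The sketch below shows one way to detect and drop such constant columns. It uses a hypothetical three-row frame built from the preview rows.

```python
import pandas as pd

# Hypothetical frame built from the first three preview rows
sample = pd.DataFrame({
    "shipment_weight": [1.3, 1.0, 2.5],
    "origin_pincode": [121003, 121003, 121003],
    "destination_pincode": [507101, 486886, 532484],
    "actual_charge": [135.0, 90.2, 224.6],
})

# A column with at most one distinct value (std == 0) carries no signal for a model
constant_cols = [col for col in sample.columns if sample[col].nunique(dropna=False) <= 1]
print(constant_cols)  # ['origin_pincode']

# Remove the constant columns before building the feature matrix
features = sample.drop(columns=constant_cols)
```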


In [7]:
df.isnull().sum()

| Column | Missing values |
|---|---|
| shipment_weight | 0 |
| origin_pincode | 0 |
| destination_pincode | 0 |
| delivery_zone | 0 |
| shipment_type | 0 |
| actual_charge | 0 |


## Dataset Understanding

The dataset contains 124 shipment-level B2B courier records with six columns: shipment weight,
origin and destination pincodes, delivery zone, shipment type, and the actual billed charge.
Initial exploration checked the structure, data types, summary statistics, and missing values.
No column has missing values, and every shipment shares the same origin pincode (121003).
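The exploration checks above can be bundled into one reusable function. The sketch below is a minimal version of that idea. The function name `check_courier_data` is hypothetical. The rule that weights and charges must be strictly positive is an assumption, not something stated in the dataset. The sample rows are copied from df.head(), and the second frame is broken on purpose.

```python
import pandas as pd

# Column names taken from df.info(); the positivity rule below is an assumption
EXPECTED_COLUMNS = [
    "shipment_weight", "origin_pincode", "destination_pincode",
    "delivery_zone", "shipment_type", "actual_charge",
]


def check_courier_data(data):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []

    # Structure: every expected column must be present
    absent = [col for col in EXPECTED_COLUMNS if col not in data.columns]
    if absent:
        problems.append(f"missing columns: {absent}")

    # Missing values: report any column with at least one null
    null_counts = data.isnull().sum()
    for col, count in null_counts[null_counts > 0].items():
        problems.append(f"{col}: {count} missing value(s)")

    # Plausibility (assumed rule): weights and charges should be strictly positive
    for col in ("shipment_weight", "actual_charge"):
        if col in data.columns and (data[col] <= 0).any():
            problems.append(f"{col}: contains non-positive values")

    return problems


# Rows copied from df.head(); `bad` drops a column and blanks one charge on purpose
good = pd.DataFrame({
    "shipment_weight": [1.3, 0.15],
    "origin_pincode": [121003, 121003],
    "destination_pincode": [507101, 515591],
    "delivery_zone": ["d", "d"],
    "shipment_type": ["Forward charges", "Forward charges"],
    "actual_charge": [135.0, 45.4],
})
bad = good.drop(columns="origin_pincode").assign(actual_charge=[135.0, float("nan")])

print(check_courier_data(good))  # []
print(check_courier_data(bad))
```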


In [9]:
# isnull().sum() reports no missing values, so this keeps all 124 rows
df = df.dropna()

In [10]:
# One-hot encode the text columns; drop_first=True drops one level per column
df_encoded = pd.get_dummies(df, columns=['delivery_zone', 'shipment_type'], drop_first=True)


In [11]:
# Features: every column except the target; target: the billed amount
X = df_encoded.drop('actual_charge', axis=1)
y = df_encoded['actual_charge']
X.shape, y.shape


((124, 6), (124,))

## Data Cleaning and Preprocessing

No missing values were present, so dropna() kept all 124 rows. The categorical columns
delivery_zone and shipment_type were one-hot encoded with drop_first=True. The data was then split
into a feature matrix X (124 rows, 6 columns) and the target y (actual_charge) for
machine learning model training.
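To make the effect of drop_first=True concrete, the toy example below encodes a category with k distinct values. The result is k - 1 indicator columns. The dropped level (the alphabetically first one) becomes the baseline, and it is represented by all zeros. The data is hypothetical, and `dtype=int` is added only so the indicators print as 0/1 instead of booleans.

```python
import pandas as pd

# Hypothetical toy data with three zone labels
toy = pd.DataFrame({
    "shipment_weight": [1.3, 1.0, 0.15, 2.5],
    "delivery_zone": ["d", "b", "e", "d"],
})

# Levels are sorted (b, d, e); drop_first=True removes "b", which becomes the baseline
encoded = pd.get_dummies(toy, columns=["delivery_zone"], drop_first=True, dtype=int)
print(encoded.columns.tolist())
# ['shipment_weight', 'delivery_zone_d', 'delivery_zone_e']

# The zone "b" row is all zeros across the indicator columns
print(encoded.loc[1, ["delivery_zone_d", "delivery_zone_e"]].tolist())  # [0, 0]
```

Dropping one level avoids a column that is perfectly predictable from the others. This matters mainly for unregularized linear models.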
