
Problem Statement:

To manage inventory efficiently, businesses need accurate forecasts of future product demand. This project builds a machine learning model that predicts monthly spare-part demand from historical sales patterns and time-based features. The objective is to reduce stockouts, minimize overstocking, and support data-driven inventory planning.

In [None]:
import pandas as pd
import numpy as np

In [None]:
df = pd.read_csv('/content/ml_ready_spare_demand.csv')

In [None]:
df.head()

Unnamed: 0,year,month,business_partner_name,vehicle_model,spare_category,avg_km,demand_count
0,2001,2,adesXXXXXXXXXX,BAJAJ PULSAR AS 200,General Service,27176.0,2
1,2001,2,adesXXXXXXXXXX,BAJAJ PULSAR AS 200,Engine,27176.0,1
2,2001,2,ADHIXXXXXXXXXX,BAJAJ PULSAR 150,General Service,15745.0,3
3,2001,2,ADHIXXXXXXXXXX,BAJAJ PULSAR 150,Engine,15745.0,2
4,2001,2,ADHIXXXXXXXXXX,BAJAJ PULSAR 150,Transmission,15745.0,1


Observation:

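The dataset contains historical spare-part demand records with time-related features (year, month), categorical features (business_partner_name, vehicle_model, spare_category), a numeric usage feature (avg_km), and the target demand_count. The inspection cells that follow check data types, missing values, and overall structure before preprocessing.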

In [None]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 7 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   year                   1000 non-null   int64  
 1   month                  1000 non-null   int64  
 2   business_partner_name  1000 non-null   object 
 3   vehicle_model          1000 non-null   object 
 4   spare_category         1000 non-null   object 
 5   avg_km                 1000 non-null   float64
 6   demand_count           1000 non-null   int64  
dtypes: float64(1), int64(3), object(3)
memory usage: 54.8+ KB


In [None]:
df.shape

(1000, 7)

In [None]:
df.describe()

Unnamed: 0,year,month,avg_km,demand_count
count,1000.0,1000.0,1000.0,1000.0
mean,2002.002,7.256,17104.097514,2.97
std,0.815267,3.459832,19287.648842,2.142823
min,2001.0,1.0,0.0,1.0
25%,2001.0,4.75,3942.5,2.0
50%,2002.0,8.0,10000.0,3.0
75%,2003.0,10.0,25000.0,3.0
max,2003.0,12.0,251447.0,27.0


In [None]:
df.isnull().sum()

Unnamed: 0,0
year,0
month,0
business_partner_name,0
vehicle_model,0
spare_category,0
avg_km,0
demand_count,0


In [None]:
# One-hot encode the categorical columns; drop_first removes one redundant level per column
df_encoded = pd.get_dummies(df, columns=['business_partner_name', 'vehicle_model', 'spare_category'], drop_first=True)

Observation:

The dataset contained no missing values, so no imputation was needed. Categorical variables were one-hot encoded to convert the raw business data into a numeric format suitable for machine learning algorithms. The year and month columns give the model a chance to capture seasonality and trends, while one-hot encoding preserves categorical information without imposing an artificial ordering on the categories.
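As a small illustration of the encoding step, the sketch below applies `pd.get_dummies` with `drop_first=True` to a toy frame. The rows are hypothetical; only the column names and the two category labels are taken from the dataset shown earlier.

```python
import pandas as pd

# Toy frame with hypothetical values; column names mirror the dataset.
toy = pd.DataFrame({
    "month": [1, 2, 3],
    "spare_category": ["Engine", "Electrical", "Engine"],
    "demand_count": [2, 1, 3],
})

# drop_first=True drops the first category in sorted order ("Electrical"),
# which becomes the implicit baseline when every remaining dummy is False.
encoded = pd.get_dummies(toy, columns=["spare_category"], drop_first=True)
print(encoded.columns.tolist())
```

Only `spare_category_Engine` remains as a dummy column here. A row with that dummy set to False is read as the dropped "Electrical" category.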

In [None]:
# Separate the feature matrix (x) from the target column (y)
x = df_encoded.drop('demand_count', axis=1)
y = df_encoded['demand_count']

Observation:

The dataset was split into training (80%) and testing (20%) sets so the model can be evaluated on data it has not seen during training, which helps detect overfitting.

In [None]:
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)
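The split above is random, so test rows can come from earlier months than some training rows. For forecasting work, a chronological hold-out is a common alternative. The sketch below shows one way to do it on a hypothetical miniature frame. It is not what this notebook does, and the 80/20 cutoff is simply carried over as an assumption.

```python
import pandas as pd

# Hypothetical miniature of the dataset: a few (year, month) records.
toy = pd.DataFrame({
    "year": [2002, 2001, 2003, 2001, 2002],
    "month": [4, 11, 3, 2, 9],
    "demand_count": [1, 3, 2, 2, 4],
})

# Sort chronologically, then hold out the most recent 20% of rows.
toy = toy.sort_values(["year", "month"]).reset_index(drop=True)
cutoff = int(len(toy) * 0.8)  # 4 of the 5 rows go to training
train, test = toy.iloc[:cutoff], toy.iloc[cutoff:]

print(train[["year", "month"]].values.tolist())
print(test[["year", "month"]].values.tolist())
```

With this layout, every test row is at least as recent as every training row, which is closer to how a forecast would be used in practice.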

In [None]:
from sklearn.linear_model import LinearRegression

# Fit a linear regression model as a simple baseline
model = LinearRegression()
model.fit(x_train, y_train)

In [None]:
from sklearn.metrics import mean_absolute_error

# Evaluate the linear regression baseline on the held-out test set
y_pred = model.predict(x_test)
mae = mean_absolute_error(y_test, y_pred)
print('Mean Absolute Error:', mae)

Mean Absolute Error: 1.6073294480445373


In [None]:
from sklearn.ensemble import RandomForestRegressor

rf_model = RandomForestRegressor(
    n_estimators=200,
    max_depth=12,
    random_state=42
)

rf_model.fit(x_train, y_train)


Observation:

Random Forest was selected for its ability to handle non-linear relationships and mixed feature types, and because averaging many trees makes it less prone to overfitting than a single decision tree. The model learns patterns in the historical demand data by aggregating the predictions of multiple decision trees, which improves prediction stability and accuracy.
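The aggregation described above can be checked directly: for `RandomForestRegressor`, the forest's prediction is the mean of its individual trees' predictions. The sketch below demonstrates this on synthetic data, which is an assumption for illustration and not the spare-parts dataset, using the fitted `estimators_` attribute.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic data (hypothetical), not the spare-parts dataset.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 3))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)

forest = RandomForestRegressor(n_estimators=20, max_depth=5, random_state=42)
forest.fit(X, y)

# Each fitted tree is available in estimators_; stack their predictions
# for the first five rows and average across trees.
per_tree = np.stack([tree.predict(X[:5]) for tree in forest.estimators_])
manual_average = per_tree.mean(axis=0)
forest_prediction = forest.predict(X[:5])

print(np.allclose(manual_average, forest_prediction))
```

The spread of `per_tree` across trees also gives a rough sense of how much the individual trees disagree on a given row.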

In [None]:
from sklearn.metrics import mean_absolute_error

# Evaluate the Random Forest on the same held-out test set
y_pred = rf_model.predict(x_test)
mae = mean_absolute_error(y_test, y_pred)
print('Mean Absolute Error:', mae)

Mean Absolute Error: 0.9009415002489715


Observation:

The Mean Absolute Error of approximately 0.90 means the model’s predictions deviate from actual demand by less than one unit on average. This improves on the linear regression result (MAE of about 1.61), although the error is still sizeable relative to the mean demand of 2.97 units.
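To make the metric concrete, the sketch below computes MAE by hand, compares it with `mean_absolute_error`, and adds a naive baseline that always predicts the training median. All numbers are hypothetical and chosen so the arithmetic is easy to follow. The median baseline is an assumption added for context, not part of the original analysis.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.metrics import mean_absolute_error

# Hypothetical values for illustration only.
y_train_toy = np.array([1, 2, 3, 3, 6])
y_test_toy = np.array([2, 3, 5])
y_pred_toy = np.array([2.5, 3.0, 4.0])

# MAE is the mean of the absolute differences: (0.5 + 0.0 + 1.0) / 3.
mae_manual = np.mean(np.abs(y_test_toy - y_pred_toy))
mae_sklearn = mean_absolute_error(y_test_toy, y_pred_toy)

# Naive baseline: always predict the training median (3 for this toy data).
# DummyRegressor ignores the features, so placeholder zeros are enough.
baseline = DummyRegressor(strategy="median").fit(np.zeros((5, 1)), y_train_toy)
mae_baseline = mean_absolute_error(y_test_toy, baseline.predict(np.zeros((3, 1))))

print(mae_manual, mae_sklearn, mae_baseline)
```

Reporting a model's MAE next to such a baseline shows how much of the accuracy comes from the model and how much from the narrow range of the target.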

In [None]:
# Create results dataframe
results = x_test.copy()
results['actual_demand'] = y_test.values
results['predicted_demand'] = y_pred

results.head()


Unnamed: 0,year,month,avg_km,business_partner_name_A suXXXXXXXXXX,business_partner_name_ABIRXXXXXXXXXX,business_partner_name_ADHIXXXXXXXXXX,business_partner_name_AMITXXXXXXXXXX,business_partner_name_AMOLXXXXXXXXXX,business_partner_name_ANIMXXXXXXXXXX,business_partner_name_ANISXXXXXXXXXX,...,vehicle_model_BAJAJ PULSAR RS 200,vehicle_model_BAJAJ V,vehicle_model_BAJAJ V150,vehicle_model_BAJAJ XCD 135,spare_category_Electrical,spare_category_Engine,spare_category_General Service,spare_category_Transmission,actual_demand,predicted_demand
521,2002,9,820.0,False,False,False,False,False,False,False,...,False,False,False,False,False,True,False,False,2,2.1306
737,2003,3,14006.0,False,False,False,False,False,False,False,...,False,False,False,False,False,False,True,False,3,3.114249
740,2003,3,25213.0,False,False,False,False,False,False,False,...,False,False,False,False,False,False,True,False,2,3.447273
660,2002,12,7342.0,False,False,False,False,False,False,False,...,False,False,False,False,False,True,False,False,3,2.695602
411,2002,4,47778.0,False,False,False,False,False,False,False,...,False,False,False,False,False,True,False,False,2,2.092773


Observation:

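The results table pairs each held-out test row with its actual and predicted demand, so individual predictions can be inspected. These rows are historical records drawn from the test split. A forecast for the upcoming month is produced separately from the most recent available row.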

In [None]:
results.to_csv("inventory_demand_forecast.csv", index=False)


Observation:

Predicted demand results were stored in a CSV file, enabling easy sharing, reporting, and integration with business tools like Excel or Power BI.

In [None]:
# Use the last available row as a stand-in for next month's input
# (its year and month values are left unchanged)
next_month_input = x.iloc[[-1]]

next_month_prediction = rf_model.predict(next_month_input)

print("Next Month Predicted Demand:", round(next_month_prediction[0], 2))


Next Month Predicted Demand: 3.26


Observation:

The model generated a numerical forecast of about 3.26 units for the selected input row. Because that row is the last record in the dataset with its year and month unchanged, the value is best read as the expected demand for that partner, vehicle model, and spare category rather than as a dated forecast. If demand stays near 3 units per month for this combination, inventory managers can plan stock levels accordingly and reduce holding costs.
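If a dated forecast is wanted, the time features of the input row can be moved forward before prediction. The sketch below is one hedged way to do that. `advance_one_month` is a hypothetical helper, not part of the notebook, and it assumes the one-row frame has integer `year` and `month` columns as in this dataset.

```python
import pandas as pd

def advance_one_month(row: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of a one-row feature frame moved forward by one month."""
    nxt = row.copy()
    month = int(nxt["month"].iloc[0])
    year = int(nxt["year"].iloc[0])
    # December rolls over to January of the following year.
    nxt["month"] = 1 if month == 12 else month + 1
    nxt["year"] = year + 1 if month == 12 else year
    return nxt

# Hypothetical last row; other feature columns are carried over unchanged.
last_row = pd.DataFrame({"year": [2003], "month": [12], "avg_km": [15000.0]})
next_row = advance_one_month(last_row)
print(next_row.to_dict("records"))
```

The resulting row could then be passed to `rf_model.predict` in place of `x.iloc[[-1]]`. Note that a tree-based model cannot extrapolate a trend beyond the years it was trained on, so a later `year` value mainly affects which existing leaf the row falls into.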

In [None]:
import joblib

joblib.dump(rf_model, "inventory_forecasting_model.pkl")
joblib.dump(x.columns.tolist(), "model_features.pkl")


['model_features.pkl']

Observation:

Saving both the trained model and its feature list ensures consistent predictions during deployment and allows the model to be reused without retraining.
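At prediction time, the saved feature list is what allows new input to be lined up with the columns the model was trained on. The sketch below shows this with a tiny stand-in model and temporary file names, all of which are hypothetical. `DataFrame.reindex` restores the training column order and fills any dummy column missing from the new data with 0.

```python
import os
import tempfile

import joblib
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Tiny stand-in model trained on hypothetical data.
train_x = pd.DataFrame({
    "month": [1, 2, 3, 4],
    "spare_category_Engine": [1, 0, 1, 0],
})
model = RandomForestRegressor(n_estimators=5, random_state=42)
model.fit(train_x, [2, 1, 3, 1])

with tempfile.TemporaryDirectory() as tmp:
    model_path = os.path.join(tmp, "model.pkl")
    features_path = os.path.join(tmp, "features.pkl")
    joblib.dump(model, model_path)
    joblib.dump(train_x.columns.tolist(), features_path)

    # Later, in a separate process, load both artifacts back.
    loaded_model = joblib.load(model_path)
    feature_list = joblib.load(features_path)

# New input lacks one of the dummy columns seen during training.
new_x = pd.DataFrame({"month": [5]})
aligned = new_x.reindex(columns=feature_list, fill_value=0)
prediction = loaded_model.predict(aligned)
print(aligned.columns.tolist(), prediction)
```

Without the alignment step, a new row whose one-hot columns differ from the training set would either fail or be matched to the wrong features.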

Conclusion

The inventory demand forecasting model predicts monthly demand with a Mean Absolute Error of about 0.90 units on the held-out test set, compared with about 1.61 for linear regression.
By combining Random Forest regression with one-hot encoding of the categorical features, the model provides forecasts that can support inventory optimization and cost reduction.
The trained model, its feature list, and the exported results are saved to disk, which gives the project a basis for deployment in a real-world business setting.