# Predicting Item Consumption with XGBoost

## Project Introduction

This project predicts the monthly consumption of individual items from three years of historical consumption data. The forecasts are intended to support inventory management and demand planning. We use XGBoost, a gradient-boosted decision tree library, and train a separate regression model for each item.


In [None]:
import pandas as pd
import xgboost as xgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
import matplotlib.pyplot as plt

# Load and prepare data
data = pd.read_csv('data.csv')


## Data Preparation

We start by reshaping the data from wide to long format. `pd.melt` gathers the monthly columns into a single `Consumption` column, and the month and year are then parsed out of each original column name to build a `Date` index.


In [None]:
# Reshape data
MONTHS = ('JANUARY', 'FEBRUARY', 'MARCH', 'APRIL', 'MAY', 'JUNE', 'JULY',
          'AUGUST', 'SEPTEMBER', 'OCTOBER', 'NOVEMBER', 'DECEMBER')

# Keep only the columns whose name contains a month name
month_columns = [col for col in data.columns if any(month in col for month in MONTHS)]

monthly_data = pd.melt(data, id_vars=['ITEM DESCRIPTION'],
                       value_vars=month_columns,
                       var_name='Month_Year', value_name='Consumption')

# Extract month and year (the column names put the month first, then the year)
name_parts = monthly_data['Month_Year'].str.split('_')
monthly_data['Month'] = name_parts.str[0]
monthly_data['Year'] = name_parts.str[1]

# Create a date column
monthly_data['Date'] = pd.to_datetime(monthly_data['Month'] + ' ' + monthly_data['Year'], format='%B %Y')

# Drop unnecessary columns
monthly_data = monthly_data.drop(columns=['Month_Year'])

# Sort by date
monthly_data = monthly_data.sort_values(by='Date')

# Set Date as index
monthly_data.set_index('Date', inplace=True)
monthly_data.head()
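
To make the reshaping concrete, here is a minimal sketch on a hypothetical two-item table. The item names, the values, and the `MONTH_YEAR` column naming (for example `JANUARY_2023`) are assumptions inferred from the parsing code above, not taken from the real `data.csv`.

```python
import pandas as pd

# Hypothetical wide-format input: one row per item, one column per month.
# The MONTH_YEAR column naming is an assumption, not the real file layout.
toy = pd.DataFrame({
    'ITEM DESCRIPTION': ['Gloves', 'Masks'],
    'JANUARY_2023': [10, 40],
    'FEBRUARY_2023': [12, 38],
})

# Wide -> long: one row per (item, month) pair
toy_long = pd.melt(toy, id_vars=['ITEM DESCRIPTION'],
                   var_name='Month_Year', value_name='Consumption')

# Split 'JANUARY_2023' into month name and year, then parse to a timestamp.
# .str.title() turns 'JANUARY' into 'January' so it matches '%B' directly.
parts = toy_long['Month_Year'].str.split('_')
toy_long['Date'] = pd.to_datetime(parts.str[0].str.title() + ' ' + parts.str[1],
                                  format='%B %Y')

toy_long = (toy_long.drop(columns=['Month_Year'])
                    .sort_values('Date')
                    .set_index('Date'))
print(toy_long)
```

Each item now contributes one row per month, and every timestamp falls on the first day of its month.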


## Data Exploration and Visualization

We explore the data to understand the consumption patterns and visualize them to identify any trends or seasonality.


In [None]:
# Plot consumption patterns for a sample item
sample_item = monthly_data['ITEM DESCRIPTION'].unique()[0]
sample_data = monthly_data[monthly_data['ITEM DESCRIPTION'] == sample_item]

plt.figure(figsize=(14, 7))
plt.plot(sample_data.index, sample_data['Consumption'], marker='o', linestyle='-', color='deepskyblue')
plt.title(f'Consumption Pattern for {sample_item}')
plt.xlabel('Date')
plt.ylabel('Consumption')
plt.grid(True)
plt.show()
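
Seasonality can also be checked numerically rather than by eye. The sketch below builds a hypothetical two-year series with a December spike (the numbers are invented for illustration) and averages it by calendar month; a month whose mean stands well apart from the others suggests a seasonal effect.

```python
import pandas as pd

# Hypothetical series: flat at 100 units, with +50 every December
dates = pd.date_range('2022-01-01', periods=24, freq='MS')
values = [100 + (50 if d.month == 12 else 0) for d in dates]
series = pd.Series(values, index=dates, name='Consumption')

# Average consumption per calendar month across all years (seasonal profile)
monthly_profile = series.groupby(series.index.month).mean()

# Total consumption per year (a rough view of the trend)
yearly_totals = series.groupby(series.index.year).sum()

print(monthly_profile)
print(yearly_totals)
```

The same two `groupby` calls should work on `sample_data['Consumption']`, since that series is also indexed by date.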


In [None]:
# Collect the unique items and a list to hold each item's forecast
items = monthly_data['ITEM DESCRIPTION'].unique()

future_predictions_all_items = []

## Model Training and Evaluation

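We train one XGBoost regressor per item. Each item's rows are split chronologically (`shuffle=False`): the first 80% are used for training and the final 20% for testing, so the model is always evaluated on months later than the ones it was trained on. The mean squared error (MSE) on the test set is the evaluation metric.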
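
MSE is expressed in squared units, which makes a single value hard to judge on its own. As a rough illustration, the sketch below holds out the last 20% of a hypothetical series and computes the MSE of a naive forecast that repeats the last training value. The helper names `chronological_split` and `mse` are illustrative and not part of scikit-learn or XGBoost, and the numbers are invented.

```python
import numpy as np

def chronological_split(values, test_fraction=0.2):
    """Hold out the last `test_fraction` of an ordered series, without shuffling."""
    n_test = max(1, int(round(len(values) * test_fraction)))
    return values[:-n_test], values[-n_test:]

def mse(y_true, y_pred):
    """Mean of the squared differences between actual and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

# Hypothetical monthly consumption for one item (10 months, oldest first)
series = np.array([100, 110, 105, 120, 115, 130, 125, 140, 135, 150], dtype=float)
train, test = chronological_split(series)

# Naive baseline: predict every held-out month with the last training value
naive_pred = np.full_like(test, train[-1])
baseline_mse = mse(test, naive_pred)
baseline_rmse = baseline_mse ** 0.5  # back in the original units

print(f'Baseline MSE: {baseline_mse:.2f}, RMSE: {baseline_rmse:.2f}')
```

A model whose test MSE is not clearly below a baseline like this one is probably not learning much from its features.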


In [None]:
for item in items:
    # Filter data for the current item
    item_data = monthly_data[monthly_data['ITEM DESCRIPTION'] == item].copy()

    # Prepare features and target
    item_data['Month'] = item_data.index.month
    item_data['Year'] = item_data.index.year
    item_data['Day_of_Week'] = item_data.index.dayofweek

    # Example target variable: Predict next month's consumption
    item_data['Next_Consumption'] = item_data['Consumption'].shift(-1)
    item_data = item_data.dropna()

    # Features and target
    X = item_data[['Month', 'Year', 'Day_of_Week']]
    y = item_data['Next_Consumption']

    # Split data into training and test sets
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)

    # Initialize XGBoost regressor
    model = xgb.XGBRegressor(objective='reg:squarederror')

    # Train the model
    model.fit(X_train, y_train)

    # Evaluate the model
    y_pred = model.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    print(f'Item: {item} - Mean Squared Error: {mse}')

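    # Forecast 12 months ahead. The model maps a month's features to the NEXT
    # month's consumption, so features are taken from the month before each
    # forecast date. item_data.index.max() is the last month with a known
    # next-month value, so the feature months start one month after it.
    # 'MS' (month start) matches the first-of-month timestamps in the index.
    feature_dates = pd.date_range(start=item_data.index.max() + pd.DateOffset(months=1), periods=12, freq='MS')
    future_dates = feature_dates + pd.DateOffset(months=1)
    future_features = pd.DataFrame({
        'Month': feature_dates.month,
        'Year': feature_dates.year,
        'Day_of_Week': feature_dates.dayofweek
    })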

    # Predict future consumption
    future_predictions = model.predict(future_features)

    # Create DataFrame for future predictions
    future_df = pd.DataFrame({'Date': future_dates, 'Item': item, 'Predicted_Consumption': future_predictions})
    future_predictions_all_items.append(future_df)

# Combine all future predictions into a single DataFrame
final_predictions = pd.concat(future_predictions_all_items)
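
The historical index holds month-start timestamps, because parsing with `'%B %Y'` yields the first day of each month. The small sketch below, on a hypothetical last observed month, shows one way to generate forecast dates that line up with that index by using the month-start frequency `'MS'`.

```python
import pandas as pd

# Hypothetical last observed month, parsed the same way as the Date column
last_observed = pd.to_datetime('December 2023', format='%B %Y')

# Twelve month-start dates following the last observed month
horizon = pd.date_range(start=last_observed + pd.DateOffset(months=1),
                        periods=12, freq='MS')

print(last_observed)
print(horizon[0], horizon[-1])
```

Month-end dates would produce different `Day_of_Week` values from the ones seen in training and would sit slightly off the historical points on a plot.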

## Save Predictions to CSV

In [None]:
# Save predictions to CSV
final_predictions.to_csv('future_predictions.csv', index=False)
print("Predictions saved to 'future_predictions.csv'")

# Display the first few rows of the predictions
final_predictions.head()
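
When the CSV is read back later, the `Date` column comes in as plain strings unless it is parsed explicitly. Here is a minimal sketch of the round trip, using a hypothetical two-row frame and a temporary file in place of `future_predictions.csv`.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for final_predictions
preds = pd.DataFrame({
    'Date': pd.to_datetime(['2024-01-01', '2024-02-01']),
    'Item': ['Gloves', 'Gloves'],
    'Predicted_Consumption': [11.5, 12.25],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'future_predictions.csv')
    preds.to_csv(path, index=False)
    # parse_dates restores the Date column as datetime64 instead of strings
    restored = pd.read_csv(path, parse_dates=['Date'])

print(restored.dtypes)
```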

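## Detailed Comparison and Visualization

In this section, we plot historical consumption alongside the 12-month forecast for a single item (by default, the first item in the dataset). Because the forecast covers months that have no observed values yet, the chart is a visual sanity check on whether the predictions continue the historical level and seasonality. The test-set MSE printed during training remains the measure of accuracy.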

In [None]:
import matplotlib.pyplot as plt

# Choose an item to plot (you can change this to any item in your dataset)
item_to_plot = items[0]  # This selects the first item in the list

# Filter data for the chosen item
historical_data = monthly_data[monthly_data['ITEM DESCRIPTION'] == item_to_plot]
predicted_data = final_predictions[final_predictions['Item'] == item_to_plot]

# Create the plot
plt.figure(figsize=(15, 7))

# Plot historical data
plt.plot(historical_data.index, historical_data['Consumption'], label='Historical', color='springgreen', marker='o')

# Plot predicted data
plt.plot(predicted_data['Date'], predicted_data['Predicted_Consumption'], label='Predicted', color='deepskyblue', marker='o')

# Add labels and title
plt.title(f'Consumption Pattern for {item_to_plot}')
plt.xlabel('Date')
plt.ylabel('Consumption')

# Add legend
plt.legend()

# Show grid
plt.grid(True, linestyle='--', alpha=0.7)


# Rotate x-axis labels for better readability
plt.xticks(rotation=45)

# Adjust layout to prevent cutting off labels
plt.tight_layout()

# Show the plot
plt.show()

# Print some information about the plot
print(f"Plotting consumption for: {item_to_plot}")
print(f"Historical data range: {historical_data.index.min()} to {historical_data.index.max()}")
print(f"Prediction range: {predicted_data['Date'].min()} to {predicted_data['Date'].max()}")