## **Walmart Product Recommendation System Using Ensemble Modeling**

### **Introduction**

In this project, we aim to develop a comprehensive product recommendation system tailored for Walmart's retail environment. The primary goal is to leverage machine learning techniques to optimize inventory management and enhance the customer shopping experience through targeted product recommendations.

The recommendation system is built on the foundation of demand forecasting. By analyzing historical sales data, we first identify high-demand and low-demand products. These insights are crucial not only for inventory optimization but also for driving promotional strategies aimed at boosting the sales of low-demand items.

To generate recommendations, we employ an ensemble of two collaborative filtering models: Singular Value Decomposition (SVD) and Non-negative Matrix Factorization (NMF). Averaging their predictions is meant to combine the strengths of both models and make the recommendation engine more robust than either model on its own.

The system is designed to recommend low-demand products to customers based on their purchase history and preferences, thereby improving product visibility and driving sales. Additionally, the inventory model monitors stock levels and suggests optimal refill strategies using Economic Order Quantity (EOQ) and stock level thresholds.
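The EOQ idea above can be made concrete with the classic formula $Q^* = \sqrt{2DS/H}$. Here $D$ is annual demand, $S$ is the fixed cost per order and $H$ is the annual holding cost per unit. Below is a minimal sketch. The cost figures and the reorder-point rule are illustrative assumptions, because the dataset records neither costs nor lead times. The reorder rule says to refill when stock cannot cover demand over an assumed supplier lead time plus a safety buffer. Both function names are hypothetical.

```python
import math


def economic_order_quantity(annual_demand, ordering_cost, holding_cost):
    """Classic EOQ: the order size that minimises ordering plus holding cost.

    annual_demand: units demanded per year (D)
    ordering_cost: fixed cost per order placed (S), an assumed input
    holding_cost: cost of holding one unit for a year (H), an assumed input
    """
    if holding_cost <= 0:
        raise ValueError("holding_cost must be positive")
    return math.sqrt(2 * annual_demand * ordering_cost / holding_cost)


def needs_refill(current_stock, annual_demand, lead_time_days, safety_stock=0):
    """Hypothetical stock-threshold rule: refill when current stock cannot
    cover the demand expected during the lead time plus a safety buffer."""
    daily_demand = annual_demand / 365
    reorder_point = daily_demand * lead_time_days + safety_stock
    return current_stock <= reorder_point


# Illustrative numbers only (not taken from the Walmart data)
q = economic_order_quantity(annual_demand=3650, ordering_cost=50, holding_cost=2)
print(round(q))  # 427, since sqrt(2 * 3650 * 50 / 2) = sqrt(182500)
print(needs_refill(current_stock=80, annual_demand=3650, lead_time_days=7, safety_stock=20))  # True
```

Applied to the data, `Annual Demand` would play the role of $D$ and `Current Stock Level` the role of `current_stock`. The costs and lead time would have to come from the business.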

This Google Colab notebook walks you through the entire process, from data loading and exploration to model training, evaluation, and deployment. Each step is carefully explained to ensure clarity and facilitate understanding of the underlying concepts and methodologies.

Let's dive into the code and explore how we can make data-driven recommendations that not only satisfy customer needs but also optimize Walmart's inventory management.



# Import Required Libraries
We begin by importing the necessary libraries for data manipulation,
modeling, and evaluation.

```python
!pip install scikit-surprise
```
```python
import pandas as pd
import numpy as np
import warnings
import joblib
from surprise import SVD, NMF, Dataset, Reader, accuracy
from surprise.model_selection import train_test_split, GridSearchCV

# Suppress warnings for clean output
warnings.filterwarnings("ignore")
```

# Data Generation and Preparation

Generate a synthetic dataset of customer-product transactions, load it into a DataFrame and save it to CSV. This data is used to build the recommendation system.

```python
import pandas as pd
import numpy as np

# Seed for reproducibility
np.random.seed(42)

# Parameters for the synthetic dataset (one row per transaction)
num_customers = 3500
num_products = 100

# Product categories
product_categories = ['Electronics', 'Groceries', 'Clothing', 'Home & Kitchen', 'Toys', 'Sports', 'Books', 'Beauty']

# Generate synthetic data
data = {
    'Product ID': np.random.choice(range(1, num_products + 1), num_customers),
    'Product Category': np.random.choice(product_categories, num_customers),
    'Customer ID': np.random.choice(range(1, num_customers + 1), num_customers),
    'Preferred Product Categories': np.random.choice(product_categories, num_customers),
    'Purchase Frequency': np.random.randint(1, 20, num_customers),
    'Recency of Last Purchase': np.random.randint(1, 365, num_customers),
    'Units Sold': np.random.randint(1, 100, num_customers),
    'Annual Demand': np.random.randint(500, 10000, num_customers),
    'Current Stock Level': np.random.randint(1, 500, num_customers)
}

# Create the DataFrame
synthetic_df = pd.DataFrame(data)

# Give every 10th row a comma-separated pair of preferred categories
# (drawn with replacement, so a pair may occasionally repeat a category)
for i in range(0, num_customers, 10):
    synthetic_df.at[i, 'Preferred Product Categories'] = ', '.join(np.random.choice(product_categories, 2))

# Save the DataFrame to a CSV file in the working directory
file_path = 'large_synthetic_walmart_recommendation_data.csv'
synthetic_df.to_csv(file_path, index=False)

# Offer the file for download when running in Google Colab; elsewhere the
# google.colab module does not exist, so just keep the local CSV
try:
    from google.colab import files
    files.download(file_path)
except ImportError:
    print(f"Not running in Colab; data saved to {file_path}")
```

# Define the Recommendation Threshold
Define a threshold to identify low-demand products, which will be used
to generate recommendations.

```python
# Reload the CSV written in the previous step
df = pd.read_csv('large_synthetic_walmart_recommendation_data.csv')

# Rows with annual demand below this value count as low demand;
# adjust it to match business needs
low_demand_threshold = 2500

# Filter the dataset to get low-demand products
low_demand_products = df[df['Annual Demand'] < low_demand_threshold]
```
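In this synthetic data, `Annual Demand` is drawn independently for every row. The same `Product ID` can therefore land on both sides of the threshold. One possible refinement is to aggregate demand per product first and then apply the threshold. The sketch below is an assumption, not the filtering this notebook actually performs. It uses the mean; sum or median would be equally defensible. The helper name is hypothetical.

```python
import pandas as pd


def low_demand_product_ids(df, threshold):
    """Return IDs of products whose mean annual demand is below `threshold`.

    Aggregating per product (an assumption; the notebook filters individual
    rows) gives each product one demand figure, so it is classified once.
    """
    demand_per_product = df.groupby('Product ID')['Annual Demand'].mean()
    return demand_per_product[demand_per_product < threshold].index.tolist()


# Toy data: product 1 appears once with low and once with high demand
toy = pd.DataFrame({
    'Product ID': [1, 1, 2, 3],
    'Annual Demand': [1000, 6000, 800, 9000],
})
print(low_demand_product_ids(toy, threshold=2500))  # [2]
```

With this variant, `df[df['Product ID'].isin(ids)]` recovers all rows belonging to the low-demand products.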

# Create the Recommendation List
Generate recommendations by matching low-demand products with customers
who have a preference for the corresponding product category.

```python
# Initialize an empty list to store the recommendations
recommendations = []

# Iterate through each low-demand product
for _, product_row in low_demand_products.iterrows():
    product_category = product_row['Product Category']

    # Find customers whose preferred categories include the product category
    for _, customer_row in df.iterrows():
        preferred_categories = customer_row['Preferred Product Categories']

        # Substring test: works because preferences are stored as a
        # comma-separated string and no category name contains another
        if product_category in preferred_categories:
            # Append relevant data to the recommendations list
            recommendations.append({
                'Customer ID': customer_row['Customer ID'],
                'Product ID': product_row['Product ID'],
                'Product Category': product_row['Product Category'],
                'Purchase Frequency': customer_row['Purchase Frequency'],
                'Recency of Last Purchase': customer_row['Recency of Last Purchase'],
                'Units Sold': product_row['Units Sold'],
                'Current Stock Level': product_row['Current Stock Level']
            })

# Convert the recommendations list to a DataFrame for further processing
recommendations_df = pd.DataFrame(recommendations)

# Display the first few recommendations
print("Sample recommendations:")
recommendations_df.head(10)
```


Sample recommendations:


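|   | Customer ID | Product ID | Product Category | Purchase Frequency | Recency of Last Purchase | Units Sold | Current Stock Level |
|---|---|---|---|---|---|---|---|
| 0 | 1095 | 61 | Toys | 1 | 53 | 84 | 484 |
| 1 | 195 | 61 | Toys | 6 | 192 | 84 | 484 |
| 2 | 1460 | 61 | Toys | 1 | 147 | 84 | 484 |
| 3 | 2075 | 61 | Toys | 15 | 130 | 84 | 484 |
| 4 | 1188 | 61 | Toys | 17 | 99 | 84 | 484 |
| 5 | 3218 | 61 | Toys | 5 | 271 | 84 | 484 |
| 6 | 3102 | 61 | Toys | 7 | 272 | 84 | 484 |
| 7 | 692 | 61 | Toys | 11 | 73 | 84 | 484 |
| 8 | 3122 | 61 | Toys | 3 | 122 | 84 | 484 |
| 9 | 916 | 61 | Toys | 2 | 182 | 84 | 484 |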
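The nested `iterrows` loop above visits every (low-demand row, customer row) pair, which becomes slow as the data grows. Below is a vectorized sketch of the same matching rule. It assumes `Preferred Product Categories` holds a `', '`-separated string, as generated earlier. It splits each customer row into one row per preferred category and joins on category with `merge`. The function name is hypothetical, and row order may differ from the loop.

```python
import pandas as pd


def match_low_demand_to_customers(df, low_demand_products):
    """Pair each low-demand product row with every customer row whose
    preferred categories include the product's category (vectorized)."""
    customers = df[['Customer ID', 'Preferred Product Categories',
                    'Purchase Frequency', 'Recency of Last Purchase']].copy()
    # One row per (customer row, preferred category): 'Books, Toys' -> 2 rows
    customers['Category'] = customers['Preferred Product Categories'].str.split(', ')
    customers = customers.explode('Category')
    # Count a category repeated within one row only once, like the loop does
    customers = customers.reset_index().drop_duplicates(subset=['index', 'Category'])

    products = low_demand_products[['Product ID', 'Product Category',
                                    'Units Sold', 'Current Stock Level']]
    matched = products.merge(customers, left_on='Product Category', right_on='Category')
    return matched[['Customer ID', 'Product ID', 'Product Category', 'Purchase Frequency',
                    'Recency of Last Purchase', 'Units Sold', 'Current Stock Level']]
```

Calling `match_low_demand_to_customers(df, low_demand_products)` should yield the same (customer, product) pairs as `recommendations_df`.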


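# Evaluate Recommendation Quality
We evaluate the quality of recommendations by calculating precision and recall.
Precision is the proportion of recommended items that are actually purchased.
Recall is the proportion of actual purchases that were recommended.
No real purchase data is available, so the cell below simulates purchases at random. Its precision therefore just recovers the 20% purchase probability used in the simulation. Its "recall" is the share of low-demand rows marked as purchased in a second, independent simulation (10%). Neither figure measures how good the recommendations are.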

```python
# Simulate actual purchases (for demonstration purposes only):
# each recommendation is marked purchased with probability 0.2
np.random.seed(42)
recommendations_df['Purchased'] = np.random.choice([0, 1], size=len(recommendations_df), p=[0.8, 0.2])

# Precision: share of recommendations that were purchased
precision = recommendations_df['Purchased'].mean()

# Placeholder "recall": an independent simulation in which each low-demand row
# is purchased with probability 0.1. It ignores which items were recommended,
# so it is not recall in the sense defined above.
actual_low_demand_purchases = np.random.choice([0, 1], size=len(low_demand_products), p=[0.9, 0.1])
recall = actual_low_demand_purchases.mean()

print(f"Precision: {precision:.2f}")
print(f"Recall: {recall:.2f}")
```

Precision: 0.20
Recall: 0.10
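The definitions above are per customer and depend on which items were recommended. Here is a minimal sketch of precision and recall at k for one customer. It assumes you have a ranked list of recommended product IDs and the set of products that customer actually bought. The IDs below are made up, and the helper name is hypothetical.

```python
def precision_recall_at_k(recommended, purchased, k):
    """Precision@k and recall@k for a single customer.

    recommended: product IDs ranked best-first
    purchased: product IDs the customer actually bought (ground truth)
    Precision divides by the number of items actually recommended (at most k).
    """
    top_k = list(recommended)[:k]
    purchased = set(purchased)
    hits = sum(1 for item in top_k if item in purchased)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(purchased) if purchased else 0.0
    return precision, recall


# Hypothetical example: 2 of the top 4 recommendations were bought,
# out of 3 purchases in total
p, r = precision_recall_at_k([61, 12, 7, 33, 90], {12, 33, 5}, k=4)
print(p, r)  # 0.5 0.6666666666666666
```

Averaging these per-customer values over all customers gives the usual aggregate precision@k and recall@k.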


# Train Collaborative Filtering Models
Use collaborative filtering techniques such as SVD and NMF to generate recommendations.
These techniques factorize the customer-product interaction matrix into latent factors and use them to predict ratings for products a customer has not yet purchased. Here `Units Sold` plays the role of the rating.
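For reference, below are the prediction rules of the two Surprise algorithms, as stated in the Surprise documentation. $\mu$ is the global mean rating, $b_u$ and $b_i$ are user and item biases, and $p_u$ and $q_i$ are latent factor vectors. SVD, which is biased by default, minimizes the regularized squared error over observed pairs; $\lambda$ corresponds to the `reg_all` parameter tuned later. NMF drops the bias terms and keeps all factors non-negative, with `reg_pu` and `reg_qi` regularizing $p_u$ and $q_i$.

$$
\hat{r}_{ui}^{\mathrm{SVD}} = \mu + b_u + b_i + q_i^{\top} p_u,
\qquad
\min_{b,\,p,\,q} \sum_{r_{ui} \in R_{\text{train}}} \left(r_{ui} - \hat{r}_{ui}\right)^2 + \lambda\left(b_u^2 + b_i^2 + \lVert q_i \rVert^2 + \lVert p_u \rVert^2\right)
$$

$$
\hat{r}_{ui}^{\mathrm{NMF}} = q_i^{\top} p_u, \qquad p_u \ge 0,\; q_i \ge 0
$$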

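Surprise clips every prediction to the reader's `rating_scale`. `Units Sold` ranges from 1 to 99, so the scale below is taken from the data. The outputs recorded in the cells below were produced with `rating_scale=(1, 5)`. With that scale every estimate is capped at 5. The RMSE of about 53 and MAE of about 45.6 are close to what a constant prediction of 5 would give, so they reflect that cap rather than genuine model error. Rerunning with the scale below will change those numbers.

```python
# Convert the data to Surprise's required format. The rating scale must cover
# the observed range of 'Units Sold', because predictions are clipped to it.
reader = Reader(rating_scale=(int(df['Units Sold'].min()), int(df['Units Sold'].max())))
data = Dataset.load_from_df(df[['Customer ID', 'Product ID', 'Units Sold']], reader)

# Split the data into training and testing sets (fixed seed for reproducibility)
trainset, testset = train_test_split(data, test_size=0.25, random_state=42)
```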

# Model Training with SVD
Train a Singular Value Decomposition (SVD) model to predict product ratings.
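Surprise's `SVD` is not a classical SVD. It is the biased matrix factorization popularized by Simon Funk during the Netflix Prize, trained with stochastic gradient descent. Below is a minimal NumPy sketch of that model and update scheme for intuition. It is not a replacement for Surprise. The function names are hypothetical, users and items must already be mapped to 0-based indices, and unknown users or items are not handled. The learning rate and regularization defaults follow Surprise's documented defaults (0.005 and 0.02).

```python
import numpy as np


def fit_biased_mf(ratings, n_users, n_items, n_factors=10, lr=0.005,
                  reg=0.02, n_epochs=20, seed=0):
    """Biased matrix factorization trained with SGD.

    ratings: list of (user_index, item_index, rating), indices 0-based
    """
    rng = np.random.default_rng(seed)
    mu = np.mean([r for _, _, r in ratings])        # global mean
    bu = np.zeros(n_users)                           # user biases
    bi = np.zeros(n_items)                           # item biases
    P = rng.normal(0, 0.1, (n_users, n_factors))     # user factors p_u
    Q = rng.normal(0, 0.1, (n_items, n_factors))     # item factors q_i
    for _ in range(n_epochs):
        for idx in rng.permutation(len(ratings)):    # visit ratings in random order
            u, i, r = ratings[idx]
            err = r - (mu + bu[u] + bi[i] + Q[i] @ P[u])
            bu[u] += lr * (err - reg * bu[u])
            bi[i] += lr * (err - reg * bi[i])
            # Both factor updates use the pre-update vectors
            P[u], Q[i] = (P[u] + lr * (err * Q[i] - reg * P[u]),
                          Q[i] + lr * (err * P[u] - reg * Q[i]))
    return mu, bu, bi, P, Q


def predict(model, u, i):
    """Prediction rule: mu + b_u + b_i + q_i . p_u"""
    mu, bu, bi, P, Q = model
    return mu + bu[u] + bi[i] + Q[i] @ P[u]


# Tiny illustrative run on made-up ratings
toy_ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 1, 1.0), (2, 0, 2.0), (2, 2, 5.0)]
toy_model = fit_biased_mf(toy_ratings, n_users=3, n_items=3, n_factors=2, lr=0.01, n_epochs=200)
print(predict(toy_model, 0, 2))
```

In Surprise, the same settings are exposed as `n_factors`, `lr_all`, `reg_all` and `n_epochs` on `SVD`.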

```python
algo_svd = SVD()
algo_svd.fit(trainset)

# Evaluate the model on the test set
predictions_svd = algo_svd.test(testset)
accuracy.rmse(predictions_svd)
```

RMSE: 53.5662


53.5662260330945

# Model Training with NMF
Train a Non-negative Matrix Factorization (NMF) model to predict product ratings.
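NMF constrains both factor matrices to be non-negative. Below is a minimal NumPy sketch of that idea. It uses Lee and Seung style multiplicative updates restricted to observed entries through a binary mask. This is a swapped-in illustration, not the exact regularized update rule that Surprise's `NMF` implements. The function name is hypothetical.

```python
import numpy as np


def masked_nmf(R, mask, n_factors=2, n_iters=500, eps=1e-9, seed=0):
    """Factor R ~ W @ H with W, H >= 0, fitting only entries where mask == 1.

    R: (n_users, n_items) non-negative ratings; unobserved cells are ignored
    mask: same shape, 1.0 for observed entries and 0.0 otherwise
    """
    rng = np.random.default_rng(seed)
    W = rng.uniform(0.1, 1.0, (R.shape[0], n_factors))
    H = rng.uniform(0.1, 1.0, (n_factors, R.shape[1]))
    MR = mask * R
    for _ in range(n_iters):
        # Multiplicative updates: ratios of non-negative terms keep W, H >= 0
        W *= (MR @ H.T) / ((mask * (W @ H)) @ H.T + eps)
        H *= (W.T @ MR) / (W.T @ (mask * (W @ H)) + eps)
    return W, H


# Toy ratings; 0 marks an unobserved cell
R = np.array([[5, 3, 0], [4, 0, 1], [1, 1, 5]], dtype=float)
mask = (R > 0).astype(float)
W, H = masked_nmf(R, mask)
print(np.round(W @ H, 2))  # reconstructed (and filled-in) rating matrix
```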

```python
algo_nmf = NMF()
algo_nmf.fit(trainset)

# Evaluate the model on the test set
predictions_nmf = algo_nmf.test(testset)
accuracy.rmse(predictions_nmf)
```

RMSE: 53.9340


53.93402901509188

# Perform Grid Search for Hyperparameter Tuning
Use GridSearchCV to find the best hyperparameters for both SVD and NMF models.
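Grid search trains one model per parameter combination per fold, so the cost grows multiplicatively. Below is a quick count for the grids defined in the cell below, using `itertools.product`. The dictionaries are copied from that cell.

```python
from itertools import product

param_grid_svd = {'n_factors': [20, 50, 100], 'reg_all': [0.02, 0.05, 0.1, 0.2]}
param_grid_nmf = {'n_factors': [20, 50, 100], 'reg_pu': [0.02, 0.05, 0.1],
                  'reg_qi': [0.02, 0.05, 0.1]}
cv_folds = 5

for name, grid in [('SVD', param_grid_svd), ('NMF', param_grid_nmf)]:
    combos = list(product(*grid.values()))  # every combination of parameter values
    print(f"{name}: {len(combos)} combinations x {cv_folds} folds = {len(combos) * cv_folds} fits")
# SVD: 12 combinations x 5 folds = 60 fits
# NMF: 27 combinations x 5 folds = 135 fits
```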


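```python
# Parameter grid for SVD (reg_all regularizes all parameters)
param_grid_svd = {
    'n_factors': [20, 50, 100],
    'reg_all': [0.02, 0.05, 0.1, 0.2]
}

# Parameter grid for NMF (separate user and item regularization)
param_grid_nmf = {
    'n_factors': [20, 50, 100],
    'reg_pu': [0.02, 0.05, 0.1],
    'reg_qi': [0.02, 0.05, 0.1]
}

# 5-fold grid search. It cross-validates on the full dataset, including the rows
# held out in `testset`, so later test scores of the tuned models are somewhat
# optimistic. best_estimator holds unfitted instances with the best parameters.
gs_svd = GridSearchCV(SVD, param_grid_svd, measures=['rmse'], cv=5)
gs_svd.fit(data)
best_svd = gs_svd.best_estimator['rmse']
print('Best SVD params:', gs_svd.best_params['rmse'])

gs_nmf = GridSearchCV(NMF, param_grid_nmf, measures=['rmse'], cv=5)
gs_nmf.fit(data)
best_nmf = gs_nmf.best_estimator['rmse']
print('Best NMF params:', gs_nmf.best_params['rmse'])
```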


# Ensemble of SVD and NMF
Combine the predictions from both SVD and NMF using an ensemble method
to leverage the strengths of both models.
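The ensemble below takes an unweighted mean of the member models' estimates. Here is a minimal sketch of the more general weighted blend. The weights are hypothetical and would normally be chosen on held-out data, for example by giving more weight to the model with the lower validation RMSE. With equal weights, the blend reduces to the simple mean used below. The function name is an assumption.

```python
import numpy as np


def blend(predictions_per_model, weights=None):
    """Weighted average of per-model predictions.

    predictions_per_model: array-like of shape (n_models, n_predictions)
    weights: one non-negative weight per model; None means equal weights
    """
    preds = np.asarray(predictions_per_model, dtype=float)
    return np.average(preds, axis=0, weights=weights)


svd_est = [3.0, 4.0, 2.0]
nmf_est = [5.0, 2.0, 4.0]
print(blend([svd_est, nmf_est]))                  # [4. 3. 3.]
print(blend([svd_est, nmf_est], weights=[3, 1]))  # [3.5 3.5 2.5]
```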

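```python
class EnsembleRegressor:
    """Average the predictions of several Surprise algorithms."""

    def __init__(self, algorithms):
        self.algorithms = algorithms

    def fit(self, trainset):
        # Train every member model on the same training set
        for algo in self.algorithms:
            algo.fit(trainset)
        return self

    def predict(self, testset):
        # Collect each model's estimates for every (user, item, rating) tuple
        predictions = []
        for algo in self.algorithms:
            algo_predictions = algo.test(testset)
            predictions.append([pred.est for pred in algo_predictions])
        # Unweighted mean across models
        avg_predictions = np.mean(predictions, axis=0)
        # Each tuple is (uid, iid, ensemble estimate, true rating, details).
        # Surprise's accuracy functions read positions 2 and 3 as (true, est);
        # RMSE and MAE are symmetric in those two values, so the scores are
        # unaffected. get_recommendations_for_customer reads the estimate at position 2.
        final_predictions = [(uid, iid, avg_predictions[i], r_ui, None)
                             for i, (uid, iid, r_ui) in enumerate(testset)]
        return final_predictions

ensemble = EnsembleRegressor([best_svd, best_nmf])
ensemble.fit(trainset)
ensemble_predictions = ensemble.predict(testset)

# Evaluate the ensemble model
ensemble_rmse = accuracy.rmse(ensemble_predictions)
ensemble_mae = accuracy.mae(ensemble_predictions)

print(f'Ensemble RMSE: {ensemble_rmse}')
print(f'Ensemble MAE: {ensemble_mae}')
```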

RMSE: 53.5979
MAE:  45.5931
Ensemble RMSE: 53.5979288010649
Ensemble MAE: 45.593142599708564


# Save the Model
Save the trained ensemble model to a file for later use.

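```python
import joblib

# Serialize the trained ensemble, including its fitted member models
joblib.dump(ensemble, 'ensemble_recommendation_model.joblib')

# The model can be loaded using:
# model = joblib.load('ensemble_recommendation_model.joblib')
# joblib builds on pickle, which stores classes by reference, so the
# EnsembleRegressor class must be defined (or importable) in the loading session.
```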

['ensemble_recommendation_model.joblib']

# Function to Get Recommendations

Score every product the customer has not purchased yet with the trained ensemble, and return the highest-scoring ones.

```python
# Function to get recommendations using the trained model
def get_recommendations_for_customer(customer_id, df, model, n_recommendations=15):
    """
    Generate a list of recommended products for a given customer ID
    using the trained ensemble model.

    Parameters:
    - customer_id (int): The ID of the customer for whom to generate recommendations.
    - df (DataFrame): The dataframe containing customer and product information.
    - model (EnsembleRegressor): The trained ensemble model.
    - n_recommendations (int): The number of recommendations to return (default is 15).

    Returns:
    - DataFrame: The recommended products with their predicted ratings.
    """

    # Products the customer has already purchased (a set makes lookups fast)
    purchased_products = set(df.loc[df['Customer ID'] == customer_id, 'Product ID'])
    all_products = df['Product ID'].unique()
    products_to_predict = [prod for prod in all_products if prod not in purchased_products]

    # Score all candidate products in one batch; the true rating is unknown,
    # so 0 is passed as a placeholder
    testset = [(customer_id, product_id, 0) for product_id in products_to_predict]
    predictions = model.predict(testset)

    # Position 2 of each ensemble prediction tuple holds the averaged estimate
    predictions_df = pd.DataFrame({
        'Product ID': products_to_predict,
        'Predicted Rating': [pred[2] for pred in predictions],
    })

    # Keep the highest-rated products
    recommendations_df = (predictions_df
                          .sort_values(by='Predicted Rating', ascending=False)
                          .head(n_recommendations))

    # Attach a product category. In this synthetic data one Product ID can appear
    # with several categories, so keep one (the first) row per product; otherwise
    # the merge would duplicate recommendations.
    product_categories = df[['Product ID', 'Product Category']].drop_duplicates(subset='Product ID')
    recommendations_df = recommendations_df.merge(product_categories, on='Product ID', how='left')

    return recommendations_df
```
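The introduction's goal is to surface low-demand items, while `get_recommendations_for_customer` ranks all unpurchased products. One way to combine the two is to filter the full set of ranked predictions down to the low-demand IDs before taking the top N. The sketch below shows this with toy scores. The helper name `top_low_demand` is hypothetical. It should be applied before truncating to `n_recommendations`, otherwise few low-demand items may survive.

```python
import pandas as pd


def top_low_demand(predictions_df, low_demand_ids, n=5):
    """Keep only low-demand products, then return the n highest-scored.

    predictions_df: columns 'Product ID' and 'Predicted Rating'
    low_demand_ids: iterable of product IDs flagged as low demand
    """
    candidates = predictions_df[predictions_df['Product ID'].isin(set(low_demand_ids))]
    return candidates.nlargest(n, 'Predicted Rating').reset_index(drop=True)


scores = pd.DataFrame({'Product ID': [1, 2, 3, 4],
                       'Predicted Rating': [4.8, 3.9, 4.5, 2.1]})
print(top_low_demand(scores, low_demand_ids=[2, 3, 4], n=2))  # products 3 then 2
```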
