# **Forecasting Sales with Machine Learning**

## Import the necessary libraries

In [2]:
import pandas as pd
from sklearn.metrics import mean_absolute_error
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

## Load Data

In [4]:
# read in the data
df = pd.read_csv('sales_data.csv')

df.head()

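| | week | month | year | product_id | advertising_spend | sales |
|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 2021 | 1001 | 500 | 1000 |
| 1 | 2 | 1 | 2021 | 1001 | 400 | 1200 |
| 2 | 3 | 1 | 2021 | 1001 | 300 | 1500 |
| 3 | 4 | 2 | 2021 | 1001 | 500 | 2000 |
| 4 | 5 | 2 | 2021 | 1001 | 400 | 1800 |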


## Select the features and target variables

In [5]:
# select the features and target variable
X = df[['week', 'month', 'year', 'product_id', 'advertising_spend']]
y = df['sales']

## Split the data into training and testing sets

In [6]:
# split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
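
Because each row is one week of sales, a random split can end up training on later weeks and testing on earlier ones. As a hedged alternative, which is not what this notebook does, `train_test_split` accepts `shuffle=False`, which keeps the row order so that the final 20% of rows form the test set. The sketch below uses only the five rows shown by `df.head()`; the variable name `sample` is illustrative.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# The five rows shown by df.head() above (not the full sales_data.csv).
sample = pd.DataFrame({
    'week': [1, 2, 3, 4, 5],
    'month': [1, 1, 1, 2, 2],
    'year': [2021] * 5,
    'product_id': [1001] * 5,
    'advertising_spend': [500, 400, 300, 500, 400],
    'sales': [1000, 1200, 1500, 2000, 1800],
})

X = sample[['week', 'month', 'year', 'product_id', 'advertising_spend']]
y = sample['sales']

# shuffle=False preserves row order, so the test set holds the latest weeks.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=False
)

print('training weeks:', X_train['week'].tolist())
print('testing weeks:', X_test['week'].tolist())
```

With five rows and `test_size=0.2`, a single row (the last week) is held out, which mimics forecasting a week the model has not yet seen.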

## Create the model

In [8]:
# create the model
# (RandomForestRegressor was also tried; swap the two lines below to use it)
# model = RandomForestRegressor()
model = LinearRegression()
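
The commented-out line suggests that both model types were tried. One way to compare them under identical conditions is to fit each on the same training rows and score each on the same held-out rows. The sketch below is only an illustration: `compare_models` is a hypothetical helper, and the data is limited to the six rows visible in this notebook's outputs (index 0 to 4 and 7), so its numbers will not match the ones reported for the full dataset.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

FEATURES = ['week', 'month', 'year', 'product_id', 'advertising_spend']

# Only the six rows visible in this notebook's outputs, not the full CSV.
rows = pd.DataFrame(
    {
        'week': [1, 2, 3, 4, 5, 8],
        'month': [1, 1, 1, 2, 2, 3],
        'year': [2021] * 6,
        'product_id': [1001] * 6,
        'advertising_spend': [500, 400, 300, 500, 400, 400],
        'sales': [1000, 1200, 1500, 2000, 1800, 2200],
    },
    index=[0, 1, 2, 3, 4, 7],
)

# Hold out the two rows that the split in this notebook selects (index 7 and 1).
test = rows.loc[[7, 1]]
train = rows.drop(index=[7, 1])


def compare_models(models, train, test):
    """Fit each model on `train` and return its MAE on `test` (hypothetical helper)."""
    scores = {}
    for name, model in models.items():
        model.fit(train[FEATURES], train['sales'])
        predictions = model.predict(test[FEATURES])
        scores[name] = mean_absolute_error(test['sales'], predictions)
    return scores


scores = compare_models(
    {
        'linear_regression': LinearRegression(),
        'random_forest': RandomForestRegressor(random_state=42),
    },
    train,
    test,
)
for name, mae in scores.items():
    print(f'{name}: MAE = {mae:.2f}')
```

Fixing `random_state` on the random forest makes its score repeatable between runs, which matters when the two models are compared on so few rows.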

## Train the model

In [9]:
# train the model
model.fit(X_train, y_train)
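
Once fitted, a `LinearRegression` exposes what it learned through its `coef_` and `intercept_` attributes. Pairing the coefficients with the feature names is a quick way to see which inputs drive the predictions. The sketch below is an assumption-laden illustration: it fits on the six rows visible in this notebook's outputs rather than on the real training set, so the values will differ from those of the model trained above.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

FEATURES = ['week', 'month', 'year', 'product_id', 'advertising_spend']

# Only the six rows visible in this notebook's outputs, not the full CSV.
rows = pd.DataFrame({
    'week': [1, 2, 3, 4, 5, 8],
    'month': [1, 1, 1, 2, 2, 3],
    'year': [2021] * 6,
    'product_id': [1001] * 6,
    'advertising_spend': [500, 400, 300, 500, 400, 400],
    'sales': [1000, 1200, 1500, 2000, 1800, 2200],
})

model = LinearRegression()
model.fit(rows[FEATURES], rows['sales'])

# Pair each feature name with its fitted coefficient.
coefficients = pd.Series(model.coef_, index=FEATURES)
print(coefficients)
print(f'intercept: {model.intercept_:.2f}')
```

In these rows `year` and `product_id` never vary, so they give the model nothing to learn from; they would only become informative with data covering several years or products.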

## Make predictions on the testing set

In [10]:
# make predictions on the testing set
y_pred = model.predict(X_test)

y_pred

array([2232.14285714, 1282.14285714])

In [11]:
X_test

| | week | month | year | product_id | advertising_spend |
|---|---|---|---|---|---|
| 7 | 8 | 3 | 2021 | 1001 | 400 |
| 1 | 2 | 1 | 2021 | 1001 | 400 |


In [12]:
y_test

7    2200
1    1200
Name: sales, dtype: int64

## Calculate the model's performance

In [14]:
# calculate the model's performance
mae = mean_absolute_error(y_test, y_pred)
print(f'Mean Absolute Error: {mae:.2f}')

Mean Absolute Error: 57.14


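An MAE of 57.14 means that, on average, the model's predictions were off by 57.14 units of sales. The lower the MAE, the closer the predictions are to the actual values. Here the two test rows have actual sales of 2200 and 1200, so an average error of 57.14 is roughly 3% of their mean (1700), and it is well below the random forest model's MAE of 125.5. Because the test set contains only two rows, both figures are rough indications rather than reliable estimates of performance.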
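
The MAE is simply the mean of the absolute differences between actual and predicted values. As a check, the figure above can be reproduced by hand from the `y_test` and `y_pred` values printed earlier; the names `y_true`, `abs_errors`, and `mae_by_hand` are illustrative.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

# Values copied from the y_test and y_pred outputs above.
y_true = np.array([2200, 1200])
y_pred = np.array([2232.14285714, 1282.14285714])

# Absolute error for each test row, then their mean.
abs_errors = np.abs(y_true - y_pred)
mae_by_hand = abs_errors.mean()

print(abs_errors.round(2))
print(f'MAE by hand: {mae_by_hand:.2f}')  # 57.14
print(f'MAE sklearn: {mean_absolute_error(y_true, y_pred):.2f}')  # 57.14
```

The per-row errors show that the prediction for week 2 is further off than the one for week 8, a detail the single averaged figure hides.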

In summary, the code reads a CSV file containing sales data for a manufacturing company, selects the relevant features and the target variable, and splits the data into training and testing sets. It then creates a Linear Regression model and trains it on the training data. Finally, it makes predictions on the testing data and measures the model's performance with the mean absolute error (MAE).

## Insights

The main insight from this project is that machine learning techniques, such as the Linear Regression model used here or Random Forest Regression, can be used to predict future product demand from past sales data. By training a model on historical data and evaluating it on a separate testing set, you can gauge how well the model predicts unseen data and identify areas for improvement.
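
With so few rows, a single train/test split gives a noisy picture. One possible refinement, offered here only as a sketch and not as part of the original analysis, is k-fold cross-validation, which rotates the held-out rows and reports an MAE per fold. The example assumes three folds and uses only the six rows visible in this notebook's outputs, so its scores are illustrative.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

FEATURES = ['week', 'month', 'year', 'product_id', 'advertising_spend']

# Only the six rows visible in this notebook's outputs, not the full CSV.
rows = pd.DataFrame({
    'week': [1, 2, 3, 4, 5, 8],
    'month': [1, 1, 1, 2, 2, 3],
    'year': [2021] * 6,
    'product_id': [1001] * 6,
    'advertising_spend': [500, 400, 300, 500, 400, 400],
    'sales': [1000, 1200, 1500, 2000, 1800, 2200],
})

# Three folds of two rows each; every row is held out exactly once.
cv = KFold(n_splits=3, shuffle=True, random_state=42)

# scikit-learn reports MAE as a negative score, so flip the sign.
scores = cross_val_score(
    LinearRegression(),
    rows[FEATURES],
    rows['sales'],
    scoring='neg_mean_absolute_error',
    cv=cv,
)
mae_per_fold = -scores

print(mae_per_fold.round(2))
print(f'mean MAE across folds: {mae_per_fold.mean():.2f}')
```

A large spread between folds would indicate that the reported error depends heavily on which rows happen to be held out.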

In general, this type of project could be useful for a manufacturing company that wants to better understand and forecast product demand in order to optimize production and inventory management. By using data-driven techniques to predict demand, the company can make more informed decisions about how much of each product to produce and when to produce it, which can help to reduce waste, lower costs, and increase efficiency.