# Forecasting Product Sales

In this notebook we use the Northwind Traders database to build a model that forecasts product sales. The forecasts can then inform profit projections and inventory control.

In [1]:
# Modules
import pandas as pd
import psycopg2
import matplotlib.pyplot as plt
import os
from sqlalchemy import create_engine
%matplotlib inline

In [2]:
# Connection info for code readability
conn_info = {
    'dbname': 'northwind',
    'user': 'postgres',
    'password': os.getenv('DB_PASSWORD'),
    'host': 'localhost',
    'port': '5432'
}

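# Build the connection URL from conn_info so the settings live in one place
engine = create_engine(
    f"postgresql://{conn_info['user']}:{conn_info['password']}"
    f"@{conn_info['host']}:{conn_info['port']}/{conn_info['dbname']}"
)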
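
A connection URL built by string interpolation can break if the password contains characters such as `@` or `/`, because they have a meaning inside a URL. The sketch below shows one way to guard against that by percent-encoding the password with the standard library. The helper name `build_db_url` and the sample password are assumptions made for illustration, not part of the original notebook.

```python
from urllib.parse import quote_plus


def build_db_url(info):
    """Build a PostgreSQL URL from a conn_info-style dict (hypothetical helper)."""
    # Percent-encode the password so characters like '@' or '/' stay inside it
    password = quote_plus(info["password"] or "")
    return (
        f"postgresql://{info['user']}:{password}"
        f"@{info['host']}:{info['port']}/{info['dbname']}"
    )


# Made-up credentials, used only to show the encoding
example_info = {
    "dbname": "northwind",
    "user": "postgres",
    "password": "p@ss/word",
    "host": "localhost",
    "port": "5432",
}
example_url = build_db_url(example_info)
print(example_url)
```

The resulting string could be passed to `create_engine` in place of the hand-written URL.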

## Transforming the data

First we need to pull the required data from the database. We need the following order details:
- Product Id
- Product Name (not part of the model but will be used for visuals later)
- Order Date
- Units sold

Because unit prices in the database change inconsistently over time, we will build the forecasting model on product movement (units sold) alone, and only then apply monetary values using the most up-to-date pricing. This also lets the model show how future profits would change if we change the prices of certain items.
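
As a rough illustration of this "forecast units first, price later" idea, the sketch below multiplies forecast quantities by a current price table. Every value here is made up, and the names `forecast_units`, `current_prices`, `unit_price` and `revenue` are assumptions for the example rather than objects defined in this notebook.

```python
import pandas as pd

# Made-up weekly unit forecasts for two products
forecast_units = pd.DataFrame({
    "product_id": [1, 1, 2],
    "week": pd.to_datetime(["2024-01-01", "2024-01-08", "2024-01-01"]),
    "forecast_quantity": [10.0, 12.0, 4.0],
})

# Made-up current prices, one row per product
current_prices = pd.DataFrame({
    "product_id": [1, 2],
    "unit_price": [18.0, 19.0],
})

# Attach the current price to each forecast row, then compute revenue
forecast_revenue = forecast_units.merge(current_prices, on="product_id", how="left")
forecast_revenue["revenue"] = (
    forecast_revenue["forecast_quantity"] * forecast_revenue["unit_price"]
)
print(forecast_revenue)
```

Changing a value in `current_prices` and re-running the merge shows how the projected revenue would respond to a price change without refitting anything.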

Now let's pull the needed data from the database.

In [3]:
query = """
SELECT p.product_id, p.product_name, od.quantity, o.order_date
FROM products p
JOIN order_details od ON p.product_id = od.product_id
JOIN orders o ON od.order_id = o.order_id
ORDER BY o.order_date;
"""
# Reuse the engine created in the connection cell above

orders = pd.read_sql_query(query, engine)



Now we can transform the data slightly to work with our model. The only changes we have to make are to convert the `order_date` column to a datetime type and to create a `week_start_date` column that we will use to aggregate the data.

In [6]:
# Convert order_date to datetime
orders['order_date'] = pd.to_datetime(orders['order_date'])

# Create a week_start_date column (pandas 'W' periods run Monday to Sunday)
orders['week_start_date'] = orders['order_date'].dt.to_period('W').dt.start_time
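
To see what the weekly bucketing does, here is a small self-contained sketch on made-up dates. It relies on the pandas default that a `'W'` period ends on Sunday, so each date maps to the Monday that starts its week. The `sample` frame is hypothetical.

```python
import pandas as pd

# Made-up order dates: a Wednesday, the following Sunday, and the Monday after
sample = pd.DataFrame({
    "order_date": pd.to_datetime(["2024-01-03", "2024-01-07", "2024-01-08"])
})

# Map each date to the start (Monday) of its weekly period
sample["week_start_date"] = sample["order_date"].dt.to_period("W").dt.start_time
print(sample)
```

The first two dates fall in the same week and share a start date, while the third begins a new week.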

Now let's aggregate the data by `product_id` and `week_start_date` to get the total quantity of each product sold each week.

In [10]:
orders_grouped = orders.groupby(['week_start_date', 'product_id'])['quantity'].sum().reset_index()

Now we have the total sales for each product each week, but weeks in which a product had no sales are missing rather than recorded as 0. We still want an entry for each product each week, so we will transform the data into a 'wide' format, fill in missing values with 0, then transform the data back into 'long' format.

In [20]:
orders_pivot = orders_grouped.pivot(index='product_id', columns='week_start_date', values='quantity').fillna(0)
orders_long = orders_pivot.stack().reset_index()
# Set column names to the requirements for the training library
orders_long = orders_long.rename(columns={'week_start_date': 'ds', 0: 'y', 'product_id': 'unique_id'})
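
The wide-then-long round trip can be checked on a tiny made-up table. In the sketch below, product 2 has no sales in the second week, and the pivot, `fillna(0)` and `stack` sequence produces an explicit zero row for it. The `toy` data and variable names are assumptions for illustration.

```python
import pandas as pd

# Made-up weekly totals: product 2 has no row for the week of 2024-01-08
toy = pd.DataFrame({
    "week_start_date": pd.to_datetime(["2024-01-01", "2024-01-01", "2024-01-08"]),
    "product_id": [1, 2, 1],
    "quantity": [5, 3, 7],
})

# Wide format: one row per product, one column per week, gaps filled with 0
wide = toy.pivot(index="product_id", columns="week_start_date", values="quantity").fillna(0)

# Back to long format: one row per (product, week) pair
toy_long = wide.stack().reset_index(name="y")
toy_long = toy_long.rename(columns={"week_start_date": "ds", "product_id": "unique_id"})
print(toy_long)
```

Note that this only fills weeks that appear somewhere in the data. A week in which no product at all was sold would still be absent and would need to be added separately.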

## Creating the forecasting model