##   <center> Business Overview </center>

<center><img src="Logo.png"/></center>
Fire Maul Tools is a firefighter-owned company serving firefighters, tactical teams, and military personnel worldwide by providing innovative tools designed for the work they do. The company started by producing handmade firefighting tools such as axes and halligan bars, and has since expanded its product line to include items such as tool lube and grip kits. The grip kits are just what they sound like: fiber tape that comes with a lacing ring system and greatly improves grip on the tools. The item has proved very popular, and the business owner wants to get the most out of the product while keeping his costs down. That is why Fire Maul Tools has tasked me with forecasting future demand for the grip kits. With a forecast, the company can plan its purchasing and inventory levels more efficiently, which will lower the costs associated with excess inventory or stockouts. By following the Data Science methodology and using time-series machine learning techniques, I plan to accomplish this task and help Fire Maul Tools grow as a company so that it can continue to serve First Responders.

**** 

## <center> Data Understanding </center>
 To complete this task, I have been given a dataset containing the information our models will need. Again, the standard Data Science methodology will be followed:
 -  Obtain the Data
 -  Clean the Data
 -  Exploration
 -  Model
 -  Interpret

We will begin by importing the data and the Python libraries needed for the project. Next, we will take an initial look at the dataset to see what it contains and what format it is in. We will also need to address issues such as missing values and unnecessary columns. After these cleaning steps we should have a better understanding of the information in the table, which in turn should lead to better model outputs.


In [73]:
# Importing libraries 

import pandas as pd
import numpy as np


# Importing dataset
df = pd.read_csv('/Users/natashawyatt/Documents/Flatiron_school/capstone/Wraps.csv')




In [139]:
# Taking a look at the first few rows...
df.head()

Unnamed: 0.1,Unnamed: 0,Date,Transaction Type,Num,Customer,Memo/Description,Qty,Sales Price,Amount,Balance
0,FireWrap Grip Kit - Light Blue,,,,,,,,,
1,FireWrap Grip Kit - Light Blue,03/23/2018,Sales Receipt,#1394,,FireWrap Grip Kit - Pre-Order,1.0,24.95,24.95,24.95
2,FireWrap Grip Kit - Light Blue,04/26/2018,Sales Receipt,#1477,,FireWrap Grip Kit - Pre-Order,1.0,24.95,24.95,49.9
3,FireWrap Grip Kit - Light Blue,04/27/2018,Sales Receipt,#1511,,FireWrap Grip Kit - Pre-Order,1.0,24.95,24.95,74.85
4,FireWrap Grip Kit - Light Blue,05/14/2018,Sales Receipt,#1617,,FireWrap Grip Kit,1.0,34.95,34.95,109.8


In [140]:
# Checking the columns, Dtype, and number of rows..
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5801 entries, 0 to 5800
Data columns (total 10 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   Unnamed: 0        5799 non-null   object 
 1   Date              5761 non-null   object 
 2   Transaction Type  5761 non-null   object 
 3   Num               5761 non-null   object 
 4   Customer          0 non-null      float64
 5   Memo/Description  5558 non-null   object 
 6   Qty               5780 non-null   float64
 7   Sales Price       5747 non-null   float64
 8   Amount            5780 non-null   object 
 9   Balance           5761 non-null   object 
dtypes: float64(3), object(7)
memory usage: 453.3+ KB


In [185]:
# Seeing how many missing values are in each column
df.isna().sum()

Product_ID             2
Date                  40
Transaction Type      40
Num                   40
Customer            5801
Memo/Description     243
Qty                   21
Sales Price           54
Amount                21
Balance               40
dtype: int64

In [141]:
# Make sure that the index is a DatetimeIndex named "Date"
# if isinstance(df.index, pd.DatetimeIndex):
#    df.index.name = 'Date'
# else:
#    df.rename(columns={'index':'Date'}, inplace=True)
#    df.set_index('Date', inplace=True)
    
# df

### Data Preparation
After a quick look at the dataset, we see there are 10 columns, 7 of which have the 'object' dtype, meaning they are not stored in numerical form. The Non-Null Count also varies across columns. The 'Unnamed: 0' column appears to hold the product type, and the 'Customer' column was cleared beforehand so that no personal information would be shared. We will continue to look at each column individually in order to understand the data and clean it for modelling. For example, we need to understand the difference between 'Sales Price', 'Amount', and 'Balance'.

First, let's look at the 'Unnamed: 0' column.
****
##### Unnamed: 0

In [175]:
df['Unnamed: 0'].unique()

array(['FireWrap Grip Kit - Light Blue',
       'Total for FireWrap Grip Kit - Light Blue',
       'FireWrap Grip Kit - Pink', 'Total for FireWrap Grip Kit - Pink',
       'FireWrap® Grip Kit Black', 'Total for FireWrap® Grip Kit Black',
       'FireWrap® Grip Kit Blue', 'Total for FireWrap® Grip Kit Blue',
       'FireWrap® Grip Kit GLOW - Aqua',
       'Total for FireWrap® Grip Kit GLOW - Aqua',
       'FireWrap® Grip Kit GLOW - Green ( 927 )',
       'Total for FireWrap® Grip Kit GLOW - Green ( 927 )',
       'FireWrap® Grip Kit Orange', 'Total for FireWrap® Grip Kit Orange',
       'FireWrap® Grip Kit Red', 'Total for FireWrap® Grip Kit Red',
       'FireWrap® Grip Kit Yellow', 'Total for FireWrap® Grip Kit Yellow',
       'FireWrap® Grip Kit Green', 'Total for FireWrap® Grip Kit Green',
       'FireWrap® Grip Kit White', 'Total for FireWrap® Grip Kit White',
       'TOTAL', nan,
       'Wednesday, Jan 11, 2023 10:03:05 AM GMT-8 - Accrual Basis'],
      dtype=object)

#### Product names
We see that there are a few issues here. First, this column holds both the product name for each available color and the 'Total for ...' summary rows for that product. These should be separated to make the pandas DataFrame easier to read. I also know that 'FWGK-SC-BLK' is the same product as 'FireWrap® Grip Kit Black'; the two names exist only because of how the items were logged in the company's system. I will attempt to combine the corresponding colors appropriately, but this raises a question: does the color of the Grip Kit have an impact on sales? We will try to answer that in the EDA phase.
For now, we will begin by renaming the column and then see whether there is a way to simplify the product names.

In [181]:
# Changing column name to Product_ID:
df = df.rename(columns={"Unnamed: 0": "Product_ID"})
df.head(10)

Unnamed: 0,Product_ID,Date,Transaction Type,Num,Customer,Memo/Description,Qty,Sales Price,Amount,Balance
0,FireWrap Grip Kit - Light Blue,,,,,,,,,
1,FireWrap Grip Kit - Light Blue,03/23/2018,Sales Receipt,#1394,,FireWrap Grip Kit - Pre-Order,1.0,24.95,24.95,24.95
2,FireWrap Grip Kit - Light Blue,04/26/2018,Sales Receipt,#1477,,FireWrap Grip Kit - Pre-Order,1.0,24.95,24.95,49.9
3,FireWrap Grip Kit - Light Blue,04/27/2018,Sales Receipt,#1511,,FireWrap Grip Kit - Pre-Order,1.0,24.95,24.95,74.85
4,FireWrap Grip Kit - Light Blue,05/14/2018,Sales Receipt,#1617,,FireWrap Grip Kit,1.0,34.95,34.95,109.8
5,FireWrap Grip Kit - Light Blue,06/22/2018,Invoice,1133,,,4.0,26.0,104.0,213.8
6,FireWrap Grip Kit - Light Blue,10/09/2018,Sales Receipt,#1895,,FireWrap Grip Kit,1.0,34.95,34.95,248.75
7,FireWrap Grip Kit - Light Blue,11/30/2018,Sales Receipt,#2045,,FireWrap Grip Kit,1.0,34.95,34.95,283.7
8,FireWrap Grip Kit - Light Blue,12/30/2018,Sales Receipt,#2149,,FireWrap Grip Kit,1.0,34.95,34.95,318.65
9,FireWrap Grip Kit - Light Blue,01/23/2019,Invoice,1257,,,1.0,30.0,30.0,348.65
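
One way to separate the product rows from the 'Total for ...' summary rows is sketched below. This is a minimal illustration on a toy frame, not the cleaning code used in this notebook: the names `sample`, `is_total`, `is_footer`, `transactions`, and `totals` are hypothetical, and the matching rules are assumptions based only on the labels printed by `unique()` above.

```python
import pandas as pd

# Toy rows mimicking the report layout (Qty values are illustrative, not real data)
sample = pd.DataFrame({
    "Product_ID": [
        "FireWrap Grip Kit - Light Blue",
        "FireWrap Grip Kit - Light Blue",
        "Total for FireWrap Grip Kit - Light Blue",
        "FireWrap® Grip Kit Black",
        "Total for FireWrap® Grip Kit Black",
        "TOTAL",
        None,
        "Wednesday, Jan 11, 2023 10:03:05 AM GMT-8 - Accrual Basis",
    ],
    "Qty": [None, 1.0, 1.0, 2.0, 2.0, 3.0, None, None],
})

product_id = sample["Product_ID"].fillna("")

# Assumption: summary rows are exactly those whose label starts with "total" (any case)
is_total = product_id.str.lower().str.startswith("total")
# Assumption: the report footer is the only label containing "Accrual Basis"
is_footer = product_id.str.contains("Accrual Basis", regex=False)
# Rows with no product label at all
is_blank = sample["Product_ID"].isna()

# Product rows on one side, summary rows on the other
transactions = sample[~(is_total | is_footer | is_blank)].copy()
totals = sample[is_total].copy()

print(transactions["Product_ID"].tolist())
print(len(totals))
```

Note that the group header row (the first row for each product, where every other field is empty) still passes this filter, so it would need to be handled together with the other missing values.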


*** 
The column has been renamed, but I still want to work on it a little more. My assumption is that the colors of the kits will not matter to the models in the long run, but before acting on that assumption I will look into it further during the EDA phase. For now, I am going to continue cleaning the DataFrame by dropping some columns and addressing the NaNs.
The columns that I am going to drop now are:
-  Num: A number assigned to each order; it will not be useful going forward
-  Customer: This column was already emptied so that no personal information would be shared
-  Memo/Description: This column does not contain anything of value for us, so it can be removed
-  Balance: According to the business owner, this was a tally kept for customers that depended on how they paid (for example, in person or through the website), and it will not help our models

I have also clarified with the business owner that 'Amount' is simply the 'Sales Price' times the quantity of products sold. We will include these columns through our EDA phase and then decide the best way to move forward with the models. 
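
The relationship between 'Amount', 'Sales Price', and 'Qty' can also be checked directly. The sketch below uses three rows copied from the `head()` output above; the frame `check` and the variables `amount`, `expected`, and `mismatches` are hypothetical names, and the comma-stripping step is an assumption about why 'Amount' has the `object` dtype.

```python
import numpy as np
import pandas as pd

# Three rows taken from the head() output; 'Amount' is text, as reported by df.info()
check = pd.DataFrame({
    "Qty": [1.0, 4.0, 1.0],
    "Sales Price": [24.95, 26.0, 30.0],
    "Amount": ["24.95", "104.00", "30.00"],
})

# Assumption: 'Amount' may contain thousands separators, so strip commas before converting
amount = pd.to_numeric(
    check["Amount"].astype(str).str.replace(",", "", regex=False),
    errors="coerce",  # values that cannot be parsed become NaN
)

# Compare with a tolerance to avoid floating-point false alarms
expected = check["Qty"] * check["Sales Price"]
matches = np.isclose(amount, expected)

# Rows where Amount differs from Qty * Sales Price (or could not be parsed)
mismatches = check[~matches]
print(len(mismatches))
```

Any rows left in `mismatches` on the real data would be worth raising with the business owner before relying on either column.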



In [153]:
# Dropping the 4 listed columns...
df.drop(columns=['Num', 'Customer', 'Memo/Description', 'Balance'], inplace=True)

In [155]:
df.head(25)

Unnamed: 0,Product ID,Date,Transaction Type,Qty,Sales Price,Amount
0,FireWrap Grip Kit - Light Blue,,,,,
1,FireWrap Grip Kit - Light Blue,03/23/2018,Sales Receipt,1.0,24.95,24.95
2,FireWrap Grip Kit - Light Blue,04/26/2018,Sales Receipt,1.0,24.95,24.95
3,FireWrap Grip Kit - Light Blue,04/27/2018,Sales Receipt,1.0,24.95,24.95
4,FireWrap Grip Kit - Light Blue,05/14/2018,Sales Receipt,1.0,34.95,34.95
5,FireWrap Grip Kit - Light Blue,06/22/2018,Invoice,4.0,26.0,104.00
6,FireWrap Grip Kit - Light Blue,10/09/2018,Sales Receipt,1.0,34.95,34.95
7,FireWrap Grip Kit - Light Blue,11/30/2018,Sales Receipt,1.0,34.95,34.95
8,FireWrap Grip Kit - Light Blue,12/30/2018,Sales Receipt,1.0,34.95,34.95
9,FireWrap Grip Kit - Light Blue,01/23/2019,Invoice,1.0,30.0,30.00


***
This is already looking a lot cleaner. Fewer columns make the table much easier to read and to understand. There is still the issue of NaNs in the dataset, and we will look into those now.
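
A minimal sketch of one possible approach to the missing values, assuming that rows without a 'Date' are group headers or totals rather than real transactions (an assumption that should be confirmed on the full data). The toy frame `raw` and the result `cleaned` are hypothetical names, and the date format is inferred from values such as `03/23/2018` shown above.

```python
import pandas as pd

# Toy frame: one header row with no date, followed by two transactions
raw = pd.DataFrame({
    "Product ID": ["FireWrap Grip Kit - Light Blue"] * 3,
    "Date": [None, "03/23/2018", "04/26/2018"],
    "Transaction Type": [None, "Sales Receipt", "Sales Receipt"],
    "Qty": [None, 1.0, 1.0],
})

# Assumption: a row with no date is not a transaction, so drop it
cleaned = raw.dropna(subset=["Date"]).copy()

# Parse the MM/DD/YYYY strings and use them as a sorted DatetimeIndex
cleaned["Date"] = pd.to_datetime(cleaned["Date"], format="%m/%d/%Y")
cleaned = cleaned.set_index("Date").sort_index()

print(cleaned.index)
```

Setting a `DatetimeIndex` at this stage also prepares the data for the time-series steps later on.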

In [68]:
# WIP

#def merge_rename_products(df, product_map, rename_to):
    # replace product Ids with new names
#    df['Product ID'] = df['Product ID'].replace(product_map)
    # group by the new product names
#    df = df.groupby(['Product ID']).sum()
    # rename the column
#    df.rename(columns={'Product ID':rename_to}, inplace=True)
#    return df


In [69]:
# WIP
# product_map = {}
# rename_to = ''

# df = merge_rename_products(df, product_map, rename_to)
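
The work-in-progress helper above could be completed along the following lines. This is only a sketch under stated assumptions: it sums 'Qty' alone rather than every column, it uses `as_index=False` so that 'Product ID' remains a column that can be renamed, and the toy frame `sales` with its quantities is made up for illustration. The single mapping entry reflects the 'FWGK-SC-BLK' alias mentioned earlier.

```python
import pandas as pd

def merge_rename_products(df, product_map, rename_to):
    """Map alias product names to one canonical name, then total Qty per product."""
    out = df.copy()
    # Replace alias product IDs with their canonical names
    out["Product ID"] = out["Product ID"].replace(product_map)
    # as_index=False keeps 'Product ID' as a regular column after grouping
    out = out.groupby("Product ID", as_index=False)["Qty"].sum()
    # Rename the product column to the requested name
    return out.rename(columns={"Product ID": rename_to})

# Toy data: quantities are illustrative only
sales = pd.DataFrame({
    "Product ID": ["FWGK-SC-BLK", "FireWrap® Grip Kit Black", "FireWrap® Grip Kit Red"],
    "Qty": [2.0, 1.0, 3.0],
})
product_map = {"FWGK-SC-BLK": "FireWrap® Grip Kit Black"}

merged = merge_rename_products(sales, product_map, rename_to="Product")
print(merged)
```

Working on a copy avoids modifying the caller's DataFrame, which makes the helper safer to re-run in a notebook.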


In [112]:
df['Transaction Type'].unique()

array([nan, 'Sales Receipt', 'Invoice', 'Refund', 'Credit Memo'],
      dtype=object)

In [113]:
df['Transaction Type'].value_counts()

Sales Receipt    4589
Invoice          1164
Refund              7
Credit Memo         1
Name: Transaction Type, dtype: int64

In [114]:
df['Num'].unique()

array([nan, '#1394', '#1477', ..., 'M-072022-001', 'M-090722-003',
       'M-092522-001'], dtype=object)

#### Work Flow
Clean the DataFrame down to the essential columns:
-  Date
-  Product ID - do colors matter?
-  Transaction Type - Sales Receipt
-  Qty
-  Sales Price - are all prices the same?
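
As a rough illustration of where this workflow leads, the sketch below keeps only sales receipts and totals the quantity sold per month, which is the kind of series a time-series model would take as input. It assumes the 'Transaction Type - Sales Receipt' note means filtering to that type; the frame `tx` is a hypothetical toy built from rows shown earlier, and monthly aggregation is an assumed choice, not a decision made in this notebook.

```python
import pandas as pd

# Toy transactions based on rows shown earlier in the notebook
tx = pd.DataFrame({
    "Date": pd.to_datetime(["2018-03-23", "2018-04-26", "2018-04-27", "2018-06-22"]),
    "Transaction Type": ["Sales Receipt", "Sales Receipt", "Sales Receipt", "Invoice"],
    "Qty": [1.0, 1.0, 1.0, 4.0],
})

# Assumption: only sales receipts are kept for the demand series
receipts = tx[tx["Transaction Type"] == "Sales Receipt"]

# Total units sold per calendar month, labelled by the first day of the month
monthly_qty = receipts.set_index("Date")["Qty"].resample("MS").sum()
print(monthly_qty)
```

Whether invoices should also count toward demand is a question for the business owner, since excluding them removes the larger wholesale orders.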

In [190]:
# Count how often each sales price occurs (value_counts sorts by frequency)
sales = df['Sales Price']
sales.value_counts()

24.95    1247
27.95    1102
34.95     762
24.99     632
37.95     400
21.75     313
18.75     313
40.95     158
47.95     146
28.95     127
30.00     115
19.99      89
28.50      69
22.95      47
31.50      43
26.00      32
41.95      27
28.00      16
41.00      14
32.99      14
16.50      13
36.00      12
25.00      12
20.00      11
29.00       8
35.99       8
0.00        4
24.00       4
15.00       4
32.00       2
37.90       1
12.95       1
34.00       1
Name: Sales Price, dtype: int64

df.head()

### Recommendations



Forecasts are never perfectly accurate, and unexpected events can happen, so Fire Maul Tools should keep a buffer inventory to absorb unexpected demand or supply chain disruptions. Other factors such as pricing, marketing, and competition can also affect demand for the product and should be considered in the inventory management strategy.