---- OUTLINE ----

The purpose of this notebook is to examine inventory management and identify ways to optimize it in order to increase profitability. The analysis is conducted in the following steps.

1.) Data Preprocessing and Cleaning

2.) Exploratory Data Analysis (EDA)

3.) Feature Engineering

4.) Data Analysis and Modeling

5.) Model Evaluation and Interpretation

6.) Recommendations and Action Plan

7.) Presentation and Visualization

---- 1.) Data Preprocessing and Cleaning ----

In [1]:
# Import the libraries needed for the profitability analysis
import pandas as pd
import matplotlib.pyplot as plt

In [2]:
# Read the CSV file with pandas and preview the first three rows
data = pd.read_csv('Sample - Superstore.csv')
data.head(3)

Unnamed: 0,Row ID,Order ID,Order Date,Ship Date,Ship Mode,Customer ID,Customer Name,Segment,Country,City,...,Postal Code,Region,Product ID,Category,Sub-Category,Product Name,Sales,Quantity,Discount,Profit
0,1,CA-2016-152156,11/8/2016,11/11/2016,Second Class,CG-12520,Claire Gute,Consumer,United States,Henderson,...,42420,South,FUR-BO-10001798,Furniture,Bookcases,Bush Somerset Collection Bookcase,261.96,2,0.0,41.9136
1,2,CA-2016-152156,11/8/2016,11/11/2016,Second Class,CG-12520,Claire Gute,Consumer,United States,Henderson,...,42420,South,FUR-CH-10000454,Furniture,Chairs,"Hon Deluxe Fabric Upholstered Stacking Chairs,...",731.94,3,0.0,219.582
2,3,CA-2016-138688,6/12/2016,6/16/2016,Second Class,DV-13045,Darrin Van Huff,Corporate,United States,Los Angeles,...,90036,West,OFF-LA-10000240,Office Supplies,Labels,Self-Adhesive Address Labels for Typewriters b...,14.62,2,0.0,6.8714


---- Duplicates & Missing Values ----

In [3]:
print(data.isnull().sum())
print("\n")
print("There are",data.isnull().sum().sum(),"Null (missing) values in this data set")
print("There are",data.duplicated().sum(),"Duplicates in this data set")

Row ID           0
Order ID         0
Order Date       0
Ship Date        0
Ship Mode        0
Customer ID      0
Customer Name    0
Segment          0
Country          0
City             0
State            0
Postal Code      0
Region           0
Product ID       0
Category         0
Sub-Category     0
Product Name     0
Sales            0
Quantity         0
Discount         0
Profit           0
dtype: int64


There are 0 Null (missing) values in this data set
There are 0 Duplicates in this data set
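
One caveat about the duplicate check above: it runs while "Row ID" is still present. If "Row ID" is a unique counter for each row (an assumption, based on the values 1, 2, 3 in the preview), no two rows can match in full, so the count will be 0 even if an order line was entered twice. The sketch below uses a small hypothetical frame, not the Superstore data, to show the difference between checking with and without the identifier column.

```python
import pandas as pd

# Hypothetical toy frame: rows 1 and 2 describe the same order line,
# but each row carries its own unique "Row ID".
toy = pd.DataFrame({
    "Row ID": [1, 2, 3],
    "Order ID": ["X-1", "X-1", "X-2"],
    "Product ID": ["P-1", "P-1", "P-2"],
    "Sales": [10.0, 10.0, 5.0],
})

# Full-row comparison: the unique "Row ID" hides the repeated line.
dupes_with_id = int(toy.duplicated().sum())

# Comparison that ignores the identifier column.
dupes_without_id = int(toy.drop(columns=["Row ID"]).duplicated().sum())

print("Duplicates including Row ID:", dupes_with_id)
print("Duplicates ignoring Row ID:", dupes_without_id)
```

Whether a repeated line is a genuine error or a legitimate second purchase depends on the data, so a non-zero count is a prompt to inspect the rows rather than a reason to drop them automatically.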


---- Irrelevant or Redundant Features ----

Drop the columns "Row ID", "Order ID", "Ship Mode", "Ship Date", "Customer Name", "Country", and "Region", as they are not relevant to the inventory management analysis.

In [4]:
data = data.drop(["Row ID", "Order ID", "Ship Mode", "Customer Name", "Country", "Region", "Ship Date"], axis=1)
# Preview the first rows to confirm the columns were dropped
data.head(3)

Unnamed: 0,Order Date,Customer ID,Segment,City,State,Postal Code,Product ID,Category,Sub-Category,Product Name,Sales,Quantity,Discount,Profit
0,11/8/2016,CG-12520,Consumer,Henderson,Kentucky,42420,FUR-BO-10001798,Furniture,Bookcases,Bush Somerset Collection Bookcase,261.96,2,0.0,41.9136
1,11/8/2016,CG-12520,Consumer,Henderson,Kentucky,42420,FUR-CH-10000454,Furniture,Chairs,"Hon Deluxe Fabric Upholstered Stacking Chairs,...",731.94,3,0.0,219.582
2,6/12/2016,DV-13045,Corporate,Los Angeles,California,90036,OFF-LA-10000240,Office Supplies,Labels,Self-Adhesive Address Labels for Typewriters b...,14.62,2,0.0,6.8714


---- Data Formatting and Conversion ----

In [5]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9994 entries, 0 to 9993
Data columns (total 14 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Order Date    9994 non-null   object 
 1   Customer ID   9994 non-null   object 
 2   Segment       9994 non-null   object 
 3   City          9994 non-null   object 
 4   State         9994 non-null   object 
 5   Postal Code   9994 non-null   int64  
 6   Product ID    9994 non-null   object 
 7   Category      9994 non-null   object 
 8   Sub-Category  9994 non-null   object 
 9   Product Name  9994 non-null   object 
 10  Sales         9994 non-null   float64
 11  Quantity      9994 non-null   int64  
 12  Discount      9994 non-null   float64
 13  Profit        9994 non-null   float64
dtypes: float64(3), int64(2), object(9)
memory usage: 1.1+ MB
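
The summary above shows "Order Date" stored as object (text) and "Postal Code" as int64. A possible conversion step is sketched below on a hypothetical three-row sample that mirrors the previewed values. It assumes the dates are month/day/year, which the ship date 6/16/2016 in the original preview suggests, and it assumes postal codes are better treated as five-character labels than as numbers. Neither choice is confirmed by the notebook itself.

```python
import pandas as pd

# Hypothetical sample mirroring a few columns from the preview above.
sample = pd.DataFrame({
    "Order Date": ["11/8/2016", "11/8/2016", "6/12/2016"],
    "Postal Code": [42420, 42420, 90036],
    "Segment": ["Consumer", "Consumer", "Corporate"],
})

# Assumption: dates are month/day/year.
sample["Order Date"] = pd.to_datetime(sample["Order Date"], format="%m/%d/%Y")

# Postal codes are identifiers, not quantities: store them as text and
# left-pad to five characters so any leading zeros are preserved.
sample["Postal Code"] = sample["Postal Code"].astype(str).str.zfill(5)

# Low-cardinality text columns can be stored as categoricals.
sample["Segment"] = sample["Segment"].astype("category")

print(sample.dtypes)
```

With a true datetime column, later steps can group sales by month or year through the `.dt` accessor instead of parsing strings each time.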


----

---- 2.) Exploratory Data Analysis (EDA) ----

Explore each column in the data set and use visualizations to summarize what it contains.