# EDA Muesli Company

### Business case
A Muesli distribution company has approached you to help them understand their delivery process. They want to develop KPIs to help them keep track of the health of their business in order to improve the service they offer their customers.

### Workflow
TODO: add more whitespace around the edges of the diagram and an internal/external heading on the side.
![workflow](images/Muesli_Flow.drawio.png)
![workflow](./images/workflow.png)

In [2]:
# import the necessary libraries you need for your analysis
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [3]:
# Floats (decimal numbers) should be displayed rounded with 2 decimal places
pd.options.display.float_format = "{:,.2f}".format
# Set style for plots
plt.style.use('fivethirtyeight') 

## Orders-Dataset

In [4]:
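# read in csv file and display first 5 rows
# NOTE: this path points to the Campaign_Data file (the same file loaded into df_camp below),
# so the output of this section matches the Campaign-Dataset section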
df_orders = pd.read_csv("data/Group_1_Muesli Project_raw_data-Campaign_Data.csv")
df_orders.head()

|   | Order ID | Arrival Scan Date | Customer Name |
|---|---|---|---|
| 0 | CA-2019-109666 | 03/05/2019 | Kunst Miller |
| 1 | CA-2019-138933 | 03/05/2019 | Jack Lebron |
| 2 | CA-2019-130001 | 03/05/2019 | Heather Kirkland |
| 3 | CA-2019-113061 | 06/05/2019 | Ed Ludwig |
| 4 | CA-2019-162138 | 06/05/2019 | Grace Kelly |


In [5]:
# check which columns are included in our dataframe
df_orders.columns

Index(['Order ID', 'Arrival Scan Date', 'Customer Name'], dtype='object')

In [6]:
# Let's have a look at the shape of our dataset, meaning how long and wide it is.
df_orders.shape

(333, 3)

In [7]:
# We now want to check out our data-types as well as get a feeling for possible missing values
df_orders.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 333 entries, 0 to 332
Data columns (total 3 columns):
 #   Column             Non-Null Count  Dtype 
---  ------             --------------  ----- 
 0   Order ID           333 non-null    object
 1   Arrival Scan Date  333 non-null    object
 2   Customer Name      333 non-null    object
dtypes: object(3)
memory usage: 7.9+ KB
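
The `info()` output shows `Arrival Scan Date` stored as `object`, i.e. as plain strings, so date arithmetic is not possible yet. The following is a minimal sketch of one way to convert the column, using the five rows from the `head()` output as a stand-in for the full file. The name `sample_orders` is hypothetical, and the day/month/year format is an assumption that should be checked against the full data.

```python
import pandas as pd

# Stand-in data: the five rows shown in the head() output above.
sample_orders = pd.DataFrame(
    {
        "Order ID": [
            "CA-2019-109666",
            "CA-2019-138933",
            "CA-2019-130001",
            "CA-2019-113061",
            "CA-2019-162138",
        ],
        "Arrival Scan Date": [
            "03/05/2019",
            "03/05/2019",
            "03/05/2019",
            "06/05/2019",
            "06/05/2019",
        ],
        "Customer Name": [
            "Kunst Miller",
            "Jack Lebron",
            "Heather Kirkland",
            "Ed Ludwig",
            "Grace Kelly",
        ],
    }
)

# ASSUMPTION: the dates are written day/month/year.
# An explicit format avoids pandas guessing month-first.
sample_orders["Arrival Scan Date"] = pd.to_datetime(
    sample_orders["Arrival Scan Date"], format="%d/%m/%Y"
)

# The column should now be a datetime dtype instead of object.
print(sample_orders.dtypes)
```

If the assumption holds, the same call can be applied to the date column of the full dataframe.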


## Campaign-Dataset

In [8]:
# read in csv file and display first 5 rows of #!CAMPAIGN DATA
df_camp = pd.read_csv("data/Group_1_Muesli Project_raw_data-Campaign_Data.csv")
df_camp.head()

|   | Order ID | Arrival Scan Date | Customer Name |
|---|---|---|---|
| 0 | CA-2019-109666 | 03/05/2019 | Kunst Miller |
| 1 | CA-2019-138933 | 03/05/2019 | Jack Lebron |
| 2 | CA-2019-130001 | 03/05/2019 | Heather Kirkland |
| 3 | CA-2019-113061 | 06/05/2019 | Ed Ludwig |
| 4 | CA-2019-162138 | 06/05/2019 | Grace Kelly |


In [9]:
# check which columns are included in our dataframe
df_camp.columns

Index(['Order ID', 'Arrival Scan Date', 'Customer Name'], dtype='object')

In [10]:
# Let's have a look at the shape of our dataset, meaning how long and wide it is.
df_camp.shape

(333, 3)

In [11]:
# We now want to check out our data-types as well as get a feeling for possible missing values
df_camp.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 333 entries, 0 to 332
Data columns (total 3 columns):
 #   Column             Non-Null Count  Dtype 
---  ------             --------------  ----- 
 0   Order ID           333 non-null    object
 1   Arrival Scan Date  333 non-null    object
 2   Customer Name      333 non-null    object
dtypes: object(3)
memory usage: 7.9+ KB


## Process-Dataset

In [12]:
# read in csv file and display first 5 rows of #!PROCESS
df_process = pd.read_csv("data/Group_1_Muesli_Project_raw_data-Order_Process_Data.csv")
df_process.head()

|   | Row ID | Order ID | Order Date | On Truck Scan Date | Ship Mode |
|---|---|---|---|---|---|
| 0 | 3074 | CA-2019-125206 | 3/1/2019 | 07/01/2019 | Express |
| 1 | 4919 | CA-2019-160304 | 2/1/2019 | 09/01/2019 | Standard Processing |
| 2 | 4920 | CA-2019-160304 | 2/1/2019 | 09/01/2019 | Standard Processing |
| 3 | 8604 | US-2019-116365 | 3/1/2019 | 09/01/2019 | Standard Processing |
| 4 | 8605 | US-2019-116365 | 3/1/2019 | 09/01/2019 | Standard Processing |


In [13]:
# check which columns are included in our dataframe
df_process.columns

Index(['Row ID', 'Order ID', 'Order Date', 'On Truck Scan Date', 'Ship Mode'], dtype='object')

In [14]:
# Let's have a look at the shape of our dataset, meaning how long and wide it is.
df_process.shape

(5899, 5)

In [15]:
# We now want to check out our data-types as well as get a feeling for possible missing values
df_process.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5899 entries, 0 to 5898
Data columns (total 5 columns):
 #   Column              Non-Null Count  Dtype 
---  ------              --------------  ----- 
 0   Row ID              5899 non-null   int64 
 1   Order ID            5899 non-null   object
 2   Order Date          5899 non-null   object
 3   On Truck Scan Date  5899 non-null   object
 4   Ship Mode           5899 non-null   object
dtypes: int64(1), object(4)
memory usage: 230.6+ KB
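
The `head()` output for the process data shows the same `Order ID` on more than one row (for example `CA-2019-160304` on rows 1 and 2), differing only in `Row ID`. A rough sketch of how to count rows per order and collapse them to one row per order follows. It uses the five displayed rows as stand-in data; the names `sample_process`, `rows_per_order`, and `orders_unique` are hypothetical, and treating rows that differ only in `Row ID` as the same order is an assumption.

```python
import pandas as pd

# Stand-in data: the five rows shown in the head() output above.
sample_process = pd.DataFrame(
    {
        "Row ID": [3074, 4919, 4920, 8604, 8605],
        "Order ID": [
            "CA-2019-125206",
            "CA-2019-160304",
            "CA-2019-160304",
            "US-2019-116365",
            "US-2019-116365",
        ],
        "Order Date": ["3/1/2019", "2/1/2019", "2/1/2019", "3/1/2019", "3/1/2019"],
        "On Truck Scan Date": [
            "07/01/2019",
            "09/01/2019",
            "09/01/2019",
            "09/01/2019",
            "09/01/2019",
        ],
        "Ship Mode": [
            "Express",
            "Standard Processing",
            "Standard Processing",
            "Standard Processing",
            "Standard Processing",
        ],
    }
)

# How many rows does each order occupy?
rows_per_order = sample_process["Order ID"].value_counts()
print(rows_per_order)

# ASSUMPTION: rows that differ only in Row ID describe the same order,
# so dropping Row ID and de-duplicating leaves one row per order.
orders_unique = (
    sample_process.drop(columns="Row ID")
    .drop_duplicates()
    .reset_index(drop=True)
)
print(orders_unique)
```

On the full dataset it is worth confirming that `orders_unique["Order ID"].is_unique` is true before using it for per-order KPIs, since an order could in principle carry different dates on different rows.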


## Intern Data Study

In [16]:
# read in csv file and display first 5 rows of #!INTERN DATA STUDY
df_intern = pd.read_csv("data/Group_1_Muesli_Project_raw_data-Intern_Data_Study.csv")
df_intern.head()

|   | Order ID | Ready to Ship Date | Pickup Date |
|---|---|---|---|
| 0 | CA-2019-116540 | 02/09/2019 | 03/09/2019 |
| 1 | CA-2019-116540 | 02/09/2019 | 03/09/2019 |
| 2 | CA-2019-129847 | 04/09/2019 | 04/09/2019 |
| 3 | CA-2019-129630 | 04/09/2019 | 04/09/2019 |
| 4 | CA-2019-106278 | 05/09/2019 | 06/09/2019 |


In [17]:
# check which columns are included in our dataframe
df_intern.columns

Index(['Order ID', 'Ready to Ship Date', 'Pickup Date'], dtype='object')

In [18]:
# Let's have a look at the shape of our dataset, meaning how long and wide it is.
df_intern.shape

(290, 3)

In [19]:
# We now want to check out our data-types as well as get a feeling for possible missing values
df_intern.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 290 entries, 0 to 289
Data columns (total 3 columns):
 #   Column              Non-Null Count  Dtype 
---  ------              --------------  ----- 
 0   Order ID            290 non-null    object
 1   Ready to Ship Date  290 non-null    object
 2   Pickup Date         290 non-null    object
dtypes: object(3)
memory usage: 6.9+ KB
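
Since the business case asks for KPIs on the delivery process, the two date columns of the intern study lend themselves to a simple one: the number of days between an order being ready to ship and being picked up. The sketch below illustrates the idea on the five rows from the `head()` output. The names `sample_intern` and `Days Until Pickup` are hypothetical, and the day/month/year date format is an assumption (a month-first reading would give very different gaps).

```python
import pandas as pd

# Stand-in data: the five rows shown in the head() output above.
sample_intern = pd.DataFrame(
    {
        "Order ID": [
            "CA-2019-116540",
            "CA-2019-116540",
            "CA-2019-129847",
            "CA-2019-129630",
            "CA-2019-106278",
        ],
        "Ready to Ship Date": [
            "02/09/2019",
            "02/09/2019",
            "04/09/2019",
            "04/09/2019",
            "05/09/2019",
        ],
        "Pickup Date": [
            "03/09/2019",
            "03/09/2019",
            "04/09/2019",
            "04/09/2019",
            "06/09/2019",
        ],
    }
)

# ASSUMPTION: both columns use day/month/year.
for col in ["Ready to Ship Date", "Pickup Date"]:
    sample_intern[col] = pd.to_datetime(sample_intern[col], format="%d/%m/%Y")

# Calendar days between "ready to ship" and "picked up".
sample_intern["Days Until Pickup"] = (
    sample_intern["Pickup Date"] - sample_intern["Ready to Ship Date"]
).dt.days

print(sample_intern)
print(sample_intern["Days Until Pickup"].describe())
```

This counts calendar days; whether weekends should be excluded is a business decision the sketch does not make.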
