# <a id='0'>Objective:</a>
- Clean up the dataset and explore it
- Start high and drill down: 
    + Qty by Category, Item, Year, Month, Day, time?
    + Gross Sales (Frequency) by Category, Item, Year, Month, Day, time?
- Create visuals that break the data out by these features
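
The "start high and drill down" idea above can be sketched with `groupby`. This is only an illustration on a made-up toy frame (the rows and totals are assumptions, not values from the real dataset); the column names follow the ones used in this notebook.

```python
import pandas as pd

# Toy stand-in for the sales data; every value here is made up for illustration.
toy = pd.DataFrame({
    'Category': ['1 Hot Drinks', '1 Hot Drinks', '3 Bakery', '3 Bakery'],
    'Item': ['Brew Coffee', 'Brew Coffee', 'Muffin Regular', 'Muffin Regular'],
    'Year': [2014, 2014, 2014, 2015],
    'Qty': [1, 2, 1, 3],
    'Gross Sales': [2.39, 4.78, 2.29, 6.87],
})

# Start high: totals per category.
by_category = toy.groupby('Category')[['Qty', 'Gross Sales']].sum()

# Drill down: totals per category, item, and year.
by_item_year = toy.groupby(['Category', 'Item', 'Year'])[['Qty', 'Gross Sales']].sum()

print(by_category)
print(by_item_year)
```

Adding `Month`, `Day`, or an hour-of-day column to the grouping keys extends the same pattern one level further.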

In [1]:
# Note: This code was developed using Anaconda 4.5.3 with Python 3.6.5

# <a id='0'>O.S.E.M.N </a> 
- <a href='#1'>1. Obtain Data: Gather Data & Set up environment</a>
- <a href='#2'>2. Scrub Data: Data Prep</a>
- <a href='#3'> 3. Explore data: EDA / Feature Engineering</a>
- <a href='#4'> 4. Model Data: Model Data for Prediction & Cross Validate</a>
- <a href='#4'> 5. iNterpret Data: Give Insights</a>

## <a id='1'>1. Obtain Data</a>

#### Import libraries & load dataset

In [2]:
import pandas as pd
import numpy as np
import os, sys

In [3]:
# Build the full path to the data file
path = '/users/kevin8523/desktop/github/coffee_consulting/data/'
filename = 'All.txt'
filepath = f'{path}{filename}'
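
As an alternative to string formatting, the same path can be assembled with `pathlib`, which handles the separator between the directory and the filename. This is a sketch of an optional variation, not what the notebook actually runs; `data_dir` is a name introduced here for illustration.

```python
from pathlib import Path

# Same directory and filename as above; data_dir is a hypothetical variable name.
data_dir = Path('/users/kevin8523/desktop/github/coffee_consulting/data')
filename = 'All.txt'

# The / operator joins path components without worrying about trailing slashes.
filepath = data_dir / filename

print(filepath.name)      # filename component only
print(filepath.exists())  # False unless the file is present on this machine
```

`pd.read_csv` accepts a `Path` object directly, so `filepath` can be passed on unchanged.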

In [95]:
# Read in the data (NOTE: data types need to be set explicitly)
df = pd.read_csv(filepath, sep='\t', header=0,
                 dtype={'Gross Sales': np.float64,
                        'Discounts': np.float64,
                        'Net Sales': np.float64,
                        'Tax': np.float64},
                 encoding='latin-1', low_memory=False)
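
To see what the `dtype` argument does without access to `All.txt`, here is a minimal sketch that reads a two-row, tab-separated sample from memory. The sample uses only a subset of the columns, and its `Discounts` and `Tax` values are made up.

```python
import io

import numpy as np
import pandas as pd

# Two sample rows in a tab-separated layout; Discounts and Tax are invented values.
sample = (
    "Date\tTime\tItem\tQty\tGross Sales\tDiscounts\tNet Sales\tTax\n"
    "2014-01-02\t07:34:21\tBrew Coffee\t1\t2.39\t0.00\t2.39\t0.20\n"
    "2014-01-02\t07:49:27\tMuffin Regular\t1\t2.29\t0.00\t2.29\t0.19\n"
)

# Force the monetary columns to float64; other columns are inferred by pandas.
money_cols = ['Gross Sales', 'Discounts', 'Net Sales', 'Tax']
sample_df = pd.read_csv(io.StringIO(sample), sep='\t', header=0,
                        dtype={col: np.float64 for col in money_cols})

print(sample_df.dtypes)
```

Columns left out of the `dtype` mapping, such as `Date` and `Time`, come back as plain strings, which is why they are converted in the Scrub step.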

#### Quick Exploration of the data

In [111]:
# Shape of the data
print('\033[1m'+'DATASET','(ROWS, COLUMNS)'+'\033[0m')
print('df',df.shape)

DATASET (ROWS, COLUMNS)
df (447916, 28)


In [102]:
# Other quick checks: df.tail(), df.info(), df.describe()
df.head(2)

|  | Date | Time | Time Zone | Category | Item | Qty | Price Point Name | SKU | Modifiers Applied | Gross Sales | ... | Event Type | Location | Dining Option | Customer ID | Customer Name | Customer Reference ID | Year | Month | Week | Day |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2014-01-02 | 07:34:21 | Central Time (US & Canada) | 1 Hot Drinks | Brew Coffee | 1 | 16 oz |  | Dark, To Go | 2.39 | ... | Payment | Mazama Coffee Co |  |  |  |  | 2014 | 1 | 1 | Thursday |
| 1 | 2014-01-02 | 07:49:27 | Central Time (US & Canada) | 3 Bakery | Muffin Regular | 1 | Morning Glory |  |  | 2.29 | ... | Payment | Mazama Coffee Co |  |  |  |  | 2014 | 1 | 1 | Thursday |


## <a id='2'>2. Scrub Data</a>

#### Change the dtypes of the date and time columns


In [98]:
# Parse Date into datetime64 values and Time into datetime.time objects
df['Date'] = pd.to_datetime(df['Date'])
df['Time'] = pd.to_datetime(df['Time'], format='%H:%M:%S').dt.time
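
A quick way to confirm what the two conversions produce is to run them on a tiny frame. The sketch below uses the date and time strings from the first two rows shown earlier; `toy` is a throwaway name.

```python
import datetime as dt

import pandas as pd

# Throwaway frame holding the raw strings as they come out of read_csv.
toy = pd.DataFrame({'Date': ['2014-01-02', '2014-01-02'],
                    'Time': ['07:34:21', '07:49:27']})

toy['Date'] = pd.to_datetime(toy['Date'])
toy['Time'] = pd.to_datetime(toy['Time'], format='%H:%M:%S').dt.time

print(toy.dtypes)                                     # Date is a datetime64 column
print(isinstance(toy.loc[0, 'Time'], dt.time))        # Time holds datetime.time objects
```

Because `Time` ends up as Python `datetime.time` objects rather than a datetime column, the `.dt` accessor no longer works on it; components such as the hour have to be read with something like `toy['Time'].map(lambda t: t.hour)`.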

#### Feature Engineering
Pandas doc: https://pandas.pydata.org/pandas-docs/stable/api.html#datetimelike-properties


In [100]:
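# Date Column
df['Year'] = df.Date.dt.year # Extracts year
df['Month'] = df.Date.dt.month # Extracts month
df['Week'] = df.Date.dt.isocalendar().week # Extracts ISO week number (pandas >= 1.1; older versions used .dt.week)
df['Day'] = df.Date.dt.day_name() # Extracts day name, e.g. 'Thursday' (replaces the removed .dt.weekday_name)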
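
The date features can be checked on two hand-picked dates. This sketch assumes pandas 1.1 or later, where `Series.dt.isocalendar()` and `Series.dt.day_name()` are available. The second date is chosen to show one caveat: the ISO week number can belong to the following ISO year.

```python
import pandas as pd

# Two hand-picked dates: the first row of the dataset and a late-December date.
dates = pd.Series(pd.to_datetime(['2014-01-02', '2014-12-29']))

features = pd.DataFrame({
    'Year': dates.dt.year,
    'Month': dates.dt.month,
    'Week': dates.dt.isocalendar().week,  # ISO 8601 week number
    'Day': dates.dt.day_name(),
})

print(features)
```

Here 2014-12-29 gets `Year` 2014 but `Week` 1, because ISO week 1 of 2015 starts on that Monday. Grouping by `Year` and `Week` together would then mix that day in with early January 2014, so it may be worth using `dates.dt.isocalendar().year` alongside the week number when weekly totals matter.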

In [105]:
df.head(2)

|  | Date | Time | Time Zone | Category | Item | Qty | Price Point Name | SKU | Modifiers Applied | Gross Sales | ... | Event Type | Location | Dining Option | Customer ID | Customer Name | Customer Reference ID | Year | Month | Week | Day |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2014-01-02 | 07:34:21 | Central Time (US & Canada) | 1 Hot Drinks | Brew Coffee | 1 | 16 oz |  | Dark, To Go | 2.39 | ... | Payment | Mazama Coffee Co |  |  |  |  | 2014 | 1 | 1 | Thursday |
| 1 | 2014-01-02 | 07:49:27 | Central Time (US & Canada) | 3 Bakery | Muffin Regular | 1 | Morning Glory |  |  | 2.29 | ... | Payment | Mazama Coffee Co |  |  |  |  | 2014 | 1 | 1 | Thursday |


In [118]:
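# Reorder columns so the engineered date fields follow Date.
# Inspect the current order first; run this line on its own to see the output.
list(df.columns.values)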
df = df[['Date',
 'Year',
 'Month',
 'Week',
 'Day',
 'Time',
 'Time Zone',
 'Category',
 'Item',
 'Qty',
 'Price Point Name',
 'SKU',
 'Modifiers Applied',
 'Gross Sales',
 'Discounts',
 'Net Sales',
 'Tax',
 'Transaction ID',
 'Payment ID',
 'Device Name',
 'Notes',
 'Details',
 'Event Type',
 'Location',
 'Dining Option',
 'Customer ID',
 'Customer Name',
 'Customer Reference ID']]
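
Typing out all 28 column names works, but the same reordering can be expressed by naming only the columns that move. The sketch below is a hedged alternative on an empty toy frame with a subset of the columns; `lead_cols` and `rest` are names introduced here, and it assumes the remaining columns should keep their existing order.

```python
import pandas as pd

# Empty toy frame with a subset of the columns, in their order before the reorder.
toy = pd.DataFrame(columns=['Date', 'Time', 'Item', 'Qty',
                            'Year', 'Month', 'Week', 'Day'])

# Columns to move to the front, in the desired order.
lead_cols = ['Date', 'Year', 'Month', 'Week', 'Day']

# Everything else keeps its current relative order.
rest = [col for col in toy.columns if col not in lead_cols]
toy = toy[lead_cols + rest]

print(list(toy.columns))
```

This version keeps working if columns are added to or dropped from the source file, whereas the explicit list raises a `KeyError` when a listed column is missing.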

In [119]:
df.head(2)

|  | Date | Year | Month | Week | Day | Time | Time Zone | Category | Item | Qty | ... | Payment ID | Device Name | Notes | Details | Event Type | Location | Dining Option | Customer ID | Customer Name | Customer Reference ID |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2014-01-02 | 2014 | 1 | 1 | Thursday | 07:34:21 | Central Time (US & Canada) | 1 Hot Drinks | Brew Coffee | 1 | ... | DNYN5K3JNQXS3 | Vicky Ipad |  | https://squareup.com/dashboard/sales/transacti... | Payment | Mazama Coffee Co |  |  |  |  |
| 1 | 2014-01-02 | 2014 | 1 | 1 | Thursday | 07:49:27 | Central Time (US & Canada) | 3 Bakery | Muffin Regular | 1 | ... | 2F35P9MRY4YYJ | Vicky Ipad |  | https://squareup.com/dashboard/sales/transacti... | Payment | Mazama Coffee Co |  |  |  |  |


## <a id='3'>3. Explore Data</a>