### Author: Murat C Koc
### Objective: Calculate Shopify Sneaker Stores AOV
### Challenge: Shopify Data Science Internship - Fall 2021

#### Question 1:
On Shopify, we have exactly 100 sneaker shops, and each of these shops sells only one model of shoe. We want to do some analysis of the average order value (AOV). When we look at orders data over a 30-day window, we naively calculate an AOV of $3145.13. Given that we know these shops are selling sneakers, a relatively affordable item, something seems wrong with our analysis.

- Think about what could be going wrong with our calculation. Think about a better way to evaluate this data. 
- What metric would you report for this dataset?
- What is its value?

#### Solution:
#### Step 1:
- Check data quality, make necessary changes
#### Step 2:
- Check outliers for better analysis

In [16]:
# Libraries
import pandas as pd
from datetime import datetime  # pandas.datetime was deprecated in pandas 1.0 and removed in 2.0
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
from fbprophet import Prophet

In [17]:
# Read the CSV into a DataFrame
shoe_df = pd.read_csv('https://raw.githubusercontent.com/MuratCKoc/Fall_2021_DataScience_Intern_Challenge/main/data/shopify_sneakers.csv')
shoe_df.head()

   order_id  shop_id  user_id  order_amount  total_items  payment_method           created_at
0         1       53      746           224            2            cash  2017-03-13 12:36:56
1         2       92      925            90            1            cash  2017-03-03 17:38:52
2         3       44      861           144            1            cash   2017-03-14 4:23:56
3         4       18      935           156            1     credit_card  2017-03-26 12:43:37
4         5       18      883           156            1     credit_card   2017-03-01 4:35:11


#### Step 1: Check data quality

In [27]:
# Check for null values
# isnull().values.any() returns a boolean, so test it directly
if shoe_df.isnull().values.any():
    print('NULL DETECTED')
else:
    print('Null check passed! No Null values Detected')

Null check passed! No Null values Detected
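
A per-column count can complement a single pass/fail message, since it shows where any missing values sit. The sketch below is only an illustration: `toy_df` is a small made-up frame (not rows from the challenge CSV) that borrows three of the column names, and `null_counts` and `has_nulls` are hypothetical helper names.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, NOT taken from the challenge CSV
toy_df = pd.DataFrame({
    "order_id": [1, 2, 3],
    "order_amount": [224.0, np.nan, 144.0],
    "payment_method": ["cash", "cash", None],
})

# Number of missing values in each column
null_counts = toy_df.isnull().sum()
print(null_counts)

# True if at least one column contains a missing value
has_nulls = bool(null_counts.any())
print(has_nulls)
```

Applied to `shoe_df`, the same `isnull().sum()` call would list a count for every column of the real dataset.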


In [29]:
# Check Data types
shoe_df.dtypes

order_id           int64
shop_id            int64
user_id            int64
order_amount       int64
total_items        int64
payment_method    object
created_at        object
dtype: object

In [30]:
# Convert created_at from string (object) to datetime
shoe_df['created_at'] = pd.to_datetime(shoe_df['created_at'])
shoe_df.dtypes

order_id                   int64
shop_id                    int64
user_id                    int64
order_amount               int64
total_items                int64
payment_method            object
created_at        datetime64[ns]
dtype: object
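
Once `created_at` is a datetime column, one optional sanity check is whether the orders really fall inside the 30-day window mentioned in the question. The following is a minimal sketch on made-up timestamps (the values in `toy_df` are hypothetical, and `span_days` is an illustrative name), not a result computed from the challenge data.

```python
import pandas as pd

# Hypothetical timestamps, NOT taken from the challenge CSV
toy_df = pd.DataFrame({
    "created_at": [
        "2017-03-01 04:35:11",
        "2017-03-13 12:36:56",
        "2017-03-30 23:10:00",
    ],
})
toy_df["created_at"] = pd.to_datetime(toy_df["created_at"])

# Earliest and latest order timestamps
first_order = toy_df["created_at"].min()
last_order = toy_df["created_at"].max()

# Whole days between the first and last order
span_days = (last_order - first_order).days
print(first_order, last_order, span_days)
```

The same `min()` and `max()` calls on `shoe_df['created_at']` would show the actual range covered by the orders.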

#### Step 2: Inspect the data

In [31]:
# Let's take a quick look at order_amount to see why the AOV looks too high
shoe_df.order_amount.describe()

count      5000.000000
mean       3145.128000
std       41282.539349
min          90.000000
25%         163.000000
50%         284.000000
75%         390.000000
max      704000.000000
Name: order_amount, dtype: float64
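
In the summary above, the mean (3145.13) is far larger than the median (284), and the maximum is 704000. One plausible reading is that a small number of very large orders pull the mean upward. The sketch below illustrates that effect on a made-up series (`toy_amounts` is hypothetical and is not the challenge data): a single extreme value moves the mean a long way while the median barely reacts.

```python
import pandas as pd

# Hypothetical order amounts: typical orders plus one very large order
toy_amounts = pd.Series([90, 156, 224, 284, 390, 704000], name="order_amount")

# The mean is sensitive to the single extreme value
mean_amount = toy_amounts.mean()

# The median depends only on the middle of the sorted values
median_amount = toy_amounts.median()

print(f"mean:   {mean_amount:.2f}")
print(f"median: {median_amount:.2f}")
```

This is why a robust statistic such as the median is often considered when a distribution has a long right tail.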