# Statistics Challenge (Optional)

Use the `orders.csv` dataset in the same directory to complete this challenge.

**Background**:

There are exactly 100 sneaker shops on a sneaker retailing website, and each of these shops sells only one model of shoe. We want to do some analysis of the average order value (AOV). When we look at orders data over a 30-day window, we naively calculate an AOV of $3145.13. Given that we know these shops are selling sneakers, a relatively affordable item, something seems wrong with our analysis.

**Questions**:

- What went wrong with this metric and our analysis? 

- Propose some new metrics that better represent the behavior of the stores' customers. Why are these metrics better? You can propose as many new metrics as you wish, but quality heavily outweighs quantity.

- Find the values of your new metrics.

- Report any other interesting findings.

Show all of your work in this notebook.

In [144]:
# library import

import numpy as np
import pandas as pd
import matplotlib.pyplot as mp

In [145]:
# naive AOV of $3145.13: the plain mean of order_value

df = pd.read_csv("orders.csv")
df = df.dropna()  # dropna() returns a new DataFrame, so the result has to be assigned

wrong_AOV = df['order_value'].mean()
print(wrong_AOV, '\n')

print("Taking the mean of order_value does not give the average price of one sneaker, because some orders")
print("contain more than one item")

3145.128 

Taking the mean of order_value does not give the average price of one sneaker, because some orders
contain more than one item
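
To make the point above concrete, here is a minimal sketch on made-up numbers. The toy orders are hypothetical and are not taken from `orders.csv`. It only illustrates how one multi-item order can pull the mean of `order_value` far away from the price of a single pair.

```python
import pandas as pd

# Hypothetical toy orders (not from orders.csv):
# three single-pair orders and one bulk order of 20 pairs
toy = pd.DataFrame({
    "order_value": [150.0, 160.0, 140.0, 3000.0],
    "total_items": [1, 1, 1, 20],
})

# Naive AOV: mean of the order totals, dominated by the bulk order
naive_aov = toy["order_value"].mean()

# Per-item value of each order, then averaged across orders
per_item = toy["order_value"] / toy["total_items"]
mean_per_item = per_item.mean()

print(naive_aov)      # 862.5
print(mean_per_item)  # 150.0
```

In this toy case every pair costs between $140 and $160, yet the naive mean is $862.50, because the bulk order's total is counted as if it were one purchase of one item.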


In [179]:
# new metrics

# 1. average price per item for each order
# This is better since it gives the average value 'per sneaker' for each order
df['avg_item_value'] = df['order_value']/df['total_items']

# 2. clock time at which each order was placed
# Extracting the time from 'created_at' helps analyze customer behavior with respect to time
# 'created_at' holds a date and a time separated by whitespace; keep only the time part
df['order_time'] = df['created_at'].str.split().str[1]

# 3. time-of-day category for each order:
#    morning (5 am - 10 am) / day (10 am - 5 pm) / evening (5 pm - 9 pm) / night (9 pm - 5 am)
# Beneficial to track what time of day customers like to purchase the most
def time_of_day(time):
    # 'order_time' looks like 'H:MM:SS' or 'HH:MM:SS'; only the hour is needed
    hour = int(time.split(':')[0])
    if 5 <= hour < 10:
        return 'morning'
    elif 10 <= hour < 17:
        return 'day'
    elif 17 <= hour < 21:
        return 'evening'
    else:
        return 'night'

df['order_time_of_day'] = df['order_time'].apply(time_of_day)
df['order_time_of_day']

0           day
1       evening
2         night
3           day
4         night
         ...   
4995        day
4996    evening
4997    morning
4998        day
4999        day
Name: order_time_of_day, Length: 5000, dtype: object
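
As a possible alternative to looping over rows, the bucketing can be sketched in vectorized form. This is only an illustration: the function name, the parameter names, and the sample timestamps are hypothetical. The default hour boundaries are an assumption chosen to reproduce how the cell above actually buckets the hours, and they can be changed by passing other values.

```python
import numpy as np
import pandas as pd

def time_of_day(created_at, morning=5, day=10, evening=17, night=21):
    """Label strings such as '2017-03-13 4:35:11' by time of day.

    The hour boundaries are parameters; the defaults are an assumption
    that mirrors the bucketing done in the cell above.
    """
    # Take the clock part after the space, then the hour before the first colon
    hour = created_at.str.split().str[1].str.split(":").str[0].astype(int)
    conditions = [
        ((hour >= morning) & (hour < day)).to_numpy(),
        ((hour >= day) & (hour < evening)).to_numpy(),
        ((hour >= evening) & (hour < night)).to_numpy(),
    ]
    choices = ["morning", "day", "evening"]
    # Any hour outside the three windows above is counted as night
    values = np.select(conditions, choices, default="night")
    return pd.Series(values, index=created_at.index)

# Hypothetical sample timestamps in a 'date time' layout
sample = pd.Series([
    "2017-03-13 4:35:11",
    "2017-03-13 9:59:59",
    "2017-03-13 12:36:56",
    "2017-03-13 17:00:00",
    "2017-03-13 23:10:00",
])
labels = time_of_day(sample)
print(labels.tolist())  # ['night', 'morning', 'day', 'evening', 'night']
```

Parsing the hour as an integer avoids comparing time strings of different lengths, and keeping the boundaries as parameters makes it easy to test other cut-offs, for example `time_of_day(sample, day=12)`.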

In [183]:
# Values of the new metrics

print("Average value per sneaker: $", df['avg_item_value'].mean(), '\n')

# Number of orders placed in each time-of-day category
# (each row of df is one order, so these are order counts)
counts = df['order_time_of_day'].value_counts()

print('Number of orders placed in the morning: ', counts['morning'])
print('Number of orders placed during the day: ', counts['day'])
print('Number of orders placed in the evening: ', counts['evening'])
print('Number of orders placed at night: ', counts['night'])

Average value per sneaker: $ 387.7428 

Number of orders placed in the morning:  1028
Number of orders placed during the day:  1432
Number of orders placed in the evening:  812
Number of orders placed at night:  1728
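
The raw counts above can also be read as shares of all orders. The sketch below simply re-enters the four printed counts by hand. On the real DataFrame, `df['order_time_of_day'].value_counts(normalize=True)` should give the same proportions directly.

```python
import pandas as pd

# Counts copied from the printed output above
counts = pd.Series({"morning": 1028, "day": 1432, "evening": 812, "night": 1728})

# Share of all orders that falls in each category, in percent
shares = (counts / counts.sum() * 100).round(2)

print(counts.sum())  # 5000
print(shares.sort_values(ascending=False))
```

With these counts, night accounts for 34.56% of the orders, day for 28.64%, morning for 20.56%, and evening for 16.24%.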


According to the new metric, the average value per sneaker, averaged across orders, is $387.74.

Also, most orders were placed at night: 1728 orders between 9 pm and 5 am over the 30-day window, followed by day (1432), morning (1028), and evening (812).

