<h1 align="center" style="color:blue">Finding Mean and Standard Deviation</h1>

<p>Read the file located at <i>data_sets/purchases.txt</i> and compute the <i>mean, standard deviation and sum</i> of the sales for each weekday.</p>

----

<h2 style="color:blue">Variables</h2>

<br>

<div style="border: 2px solid black; border-radius: 2px; padding-left: 20px">
    # \ Date<br>
    # \ Time<br>
    # \ Store<br>
    # \ Product<br>
    # \ Value<br>
    # \ Bank<br>
</div>

In [12]:
from datetime import datetime
import pandas as pd

In [13]:
# Dataset file path, weekday-code lookup table, and a placeholder for the results
dataset_dir = 'data_sets/purchases.txt'

weekday_map = {0 : 'Monday', 1 : 'Tuesday', 2 : 'Wednesday', 3 : 'Thursday', 
               4 : 'Friday', 5 : 'Saturday', 6 : 'Sunday'}
results = None

----

In [14]:
# Reading
df = pd.read_csv(dataset_dir, sep='\t', header=None, names=['Date', 'Time', 'Store', 'Product', 'Value', 'Bank'])
df

Unnamed: 0,Date,Time,Store,Product,Value,Bank
0,2012-01-01,09:00,San Jose,Men's Clothing,214.05,Amex
1,2012-01-01,09:00,Fort Worth,Women's Clothing,153.57,Visa
2,2012-01-01,09:00,San Diego,Music,66.08,Cash
3,2012-01-01,09:00,Pittsburgh,Pet Supplies,493.51,Discover
4,2012-01-01,09:00,Omaha,Children's Clothing,235.63,MasterCard
...,...,...,...,...,...,...
4138471,2012-12-31,17:59,Albuquerque,Toys,345.70,MasterCard
4138472,2012-12-31,17:59,Rochester,DVDs,399.57,Amex
4138473,2012-12-31,17:59,Greensboro,Baby,277.27,Discover
4138474,2012-12-31,17:59,Arlington,Women's Clothing,134.95,MasterCard
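
Since only `Date` and `Value` are needed for the weekday statistics, the other columns could also be skipped while the file is parsed. The sketch below is an alternative, not the approach used in this notebook: it assumes the same tab-separated layout and uses an in-memory sample (two rows copied from the output above) in place of `data_sets/purchases.txt`; the names `sample`, `columns` and `slim_df` are illustrative only.

```python
import io

import pandas as pd

# In-memory stand-in for purchases.txt (assumption: same tab-separated layout)
sample = io.StringIO(
    "2012-01-01\t09:00\tSan Jose\tMen's Clothing\t214.05\tAmex\n"
    "2012-01-01\t09:00\tFort Worth\tWomen's Clothing\t153.57\tVisa\n"
)

columns = ['Date', 'Time', 'Store', 'Product', 'Value', 'Bank']

# usecols tells read_csv to keep only the listed columns while parsing
slim_df = pd.read_csv(sample, sep='\t', header=None, names=columns,
                      usecols=['Date', 'Value'])
print(slim_df)
```

With roughly four million rows, leaving out the unused string columns at read time should reduce memory use, though the gain has not been measured here.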


In [15]:
# Keeping only the 'Date' and 'Value' columns
df = df[['Date', 'Value']]
df

Unnamed: 0,Date,Value
0,2012-01-01,214.05
1,2012-01-01,153.57
2,2012-01-01,66.08
3,2012-01-01,493.51
4,2012-01-01,235.63
...,...,...
4138471,2012-12-31,345.70
4138472,2012-12-31,399.57
4138473,2012-12-31,277.27
4138474,2012-12-31,134.95


In [16]:
# Checking whether there are any NaN values
df.isnull().sum()

Date     0
Value    0
dtype: int64

----

In [17]:
# Function to convert a date string to its weekday name
#
# 0 >> Monday
# 1 >> Tuesday
# 2 >> Wednesday
# 3 >> Thursday
# 4 >> Friday
# 5 >> Saturday
# 6 >> Sunday
def convert_date_to_weekday(date):
    weekday_code = datetime.strptime(date, '%Y-%m-%d').weekday()
    return weekday_map[weekday_code]

In [18]:
# Transforming each date into its weekday name
df['Date'] = df['Date'].apply(convert_date_to_weekday)
df

Unnamed: 0,Date,Value
0,Sunday,214.05
1,Sunday,153.57
2,Sunday,66.08
3,Sunday,493.51
4,Sunday,235.63
...,...,...
4138471,Monday,345.70
4138472,Monday,399.57
4138473,Monday,277.27
4138474,Monday,134.95
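
Calling a Python function once per row can be slow on a column of this size. As a hedged alternative to `convert_date_to_weekday`, the sketch below converts the whole column at once with `pd.to_datetime` and the `.dt.day_name()` accessor. It assumes the dates are `YYYY-MM-DD` strings, as in the output above; `dates` and `weekdays` are illustrative names.

```python
import pandas as pd

# First and last dates from the dataset output above
dates = pd.Series(['2012-01-01', '2012-12-31'])

# Parse the strings once, then read the English weekday name of each date
weekdays = pd.to_datetime(dates, format='%Y-%m-%d').dt.day_name()
print(weekdays.tolist())  # ['Sunday', 'Monday']
```

The names match the `Sunday` and `Monday` values shown in the table above, so the explicit `weekday_map` lookup would not be needed with this approach.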


In [19]:
# Renaming the 'Date' column to 'Weekday'
df.rename(columns={'Date':'Weekday'}, inplace=True)
df

Unnamed: 0,Weekday,Value
0,Sunday,214.05
1,Sunday,153.57
2,Sunday,66.08
3,Sunday,493.51
4,Sunday,235.63
...,...,...
4138471,Monday,345.70
4138472,Monday,399.57
4138473,Monday,277.27
4138474,Monday,134.95


In [20]:
# Grouping the sales values by weekday; the statistics
# are computed from this grouped object in the next cells
results = df['Value'].groupby(df['Weekday'])

In [21]:
# Mean of the sales per weekday
print('Mean: ', results.mean())

Mean:  Weekday
Friday       250.223089
Monday       250.009331
Saturday     250.084703
Sunday       249.946443
Thursday     249.872024
Tuesday      249.738228
Wednesday    249.851167
Name: Value, dtype: float64


In [22]:
# Standard Deviation of the sales per weekday
print('Standard Deviation: ', results.std())

Standard Deviation:  Weekday
Friday       144.367224
Monday       144.321291
Saturday     144.401330
Sunday       144.330682
Thursday     144.339287
Tuesday      144.205279
Wednesday    144.255367
Name: Value, dtype: float64
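
Note that pandas computes the sample standard deviation by default (`ddof=1`, dividing by n - 1). If the population standard deviation (dividing by n) is wanted instead, `ddof=0` can be passed. A minimal sketch using the first five `Value` entries from the table above; `values`, `sample_std` and `population_std` are illustrative names:

```python
import pandas as pd

# First five sale values from the dataset output above
values = pd.Series([214.05, 153.57, 66.08, 493.51, 235.63])

sample_std = values.std()            # default ddof=1: divides by n - 1
population_std = values.std(ddof=0)  # divides by n

print('Sample std:    ', sample_std)
print('Population std:', population_std)
```

With several hundred thousand sales per weekday the two values are nearly identical, so the choice mainly matters for small groups.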


In [23]:
# Sum of the sales per weekday
print('Sum: ', results.sum())

Sum:  Weekday
Friday       1.474149e+08
Monday       1.503641e+08
Saturday     1.474102e+08
Sunday       1.502968e+08
Thursday     1.473538e+08
Tuesday      1.472467e+08
Wednesday    1.443715e+08
Name: Value, dtype: float64
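
The three statistics can also be collected into a single table, with the rows in calendar order rather than alphabetical order. The sketch below is one possible way to do that with `agg` and `reindex`, shown on a four-row sample taken from the tables above; `weekday_order`, `sample_df` and `summary` are illustrative names, not part of the notebook.

```python
import pandas as pd

weekday_order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday',
                 'Friday', 'Saturday', 'Sunday']

# Small sample built from rows shown in the tables above
sample_df = pd.DataFrame({
    'Weekday': ['Sunday', 'Sunday', 'Monday', 'Monday'],
    'Value': [214.05, 153.57, 345.70, 399.57],
})

summary = (
    sample_df.groupby('Weekday')['Value']
    .agg(['mean', 'std', 'sum'])   # one column per statistic
    .reindex(weekday_order)        # calendar order instead of alphabetical
    .dropna(how='all')             # only needed here: the sample lacks most weekdays
)
print(summary)
```

On the full dataset every weekday is present, so the `dropna` step would be unnecessary.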
