# Google Data Analytics Capstone Case Study 2: How can a wellness technology company play it smart?

## PHASE 1: ASK
### About [Bellabeat](https://bellabeat.com/)
Bellabeat is the go-to wellness brand for women, with an ecosystem of products and services focused on women’s health.

### Business Task
* Analyze smart device usage data (Fitbit)
* Gain insights on how consumers use smart devices 
* Provide recommendations for the marketing strategy of a Bellabeat product (Bellabeat App)

### Key Stakeholders
* Urška Sršen (Co-founder and Chief Creative Officer)
* Sandro Mur (Co-founder and Chief Executive Officer)
* Marketing analytics team

## PHASE 2 & 3: PREPARE & PROCESS
### Data Source: [Fitbit Fitness Tracker Data](https://www.kaggle.com/datasets/arashnic/fitbit)
* Personal tracker data of thirty Fitbit users
  * Minute-level output for physical activity, heart rate, and sleep monitoring
  * Information on daily activity, hourly activity, steps and heart rate
* Date Range: 12 April 2016 to 12 May 2016 (the range covered by the 'Fitabase Data 4.12.16-5.12.16' folder used here)
* 18 CSV files covering different quantitative data tracked by Fitbit

### Import Libraries & Data

In [None]:
# Import libraries
import glob
import os

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Import data
path = '../input/fitbit/Fitabase Data 4.12.16-5.12.16'
filenames = sorted(glob.glob(os.path.join(path, '*.csv')))

def get_table_name(full_path):
    '''Returns the table name of a csv file, e.g. dailyActivity_merged.csv -> dailyActivity'''
    file_name = os.path.basename(full_path)
    return os.path.splitext(file_name)[0].split('_')[0]

# Read every csv file once into a dictionary keyed by table name
df = {get_table_name(filename): pd.read_csv(filename) for filename in filenames}

In [None]:
# Daily: dailyActivity_merged already contains the data in dailyCalories_merged,
# dailyIntensities_merged and dailySteps_merged, so those three tables are not merged in

df['dailyActivity']['ActivityDate']= pd.to_datetime(df['dailyActivity']['ActivityDate'])
df['sleepDay']['SleepDay']=pd.to_datetime(df['sleepDay']['SleepDay'])
daily_df = pd.merge(df['dailyActivity'], df['sleepDay'], 
                    how='left', left_on=['Id', 'ActivityDate'], right_on=['Id', 'SleepDay'])
daily_df.drop_duplicates(keep='first',inplace=True)

# Remove rows that have TotalDistance = 0, assuming that the Fitbit was not worn on that day
daily_df = daily_df.drop(daily_df[(daily_df['TotalDistance'] == 0)].index)

daily_df.rename(columns={'ActivityDate':'date'},inplace=True)
daily_df.columns= daily_df.columns.str.strip().str.lower()
daily_df.drop(columns='sleepday',inplace=True)

# Hourly
df['hourlyCalories']['ActivityHour'] = pd.to_datetime(df['hourlyCalories']['ActivityHour'])
df['hourlyIntensities']['ActivityHour'] = pd.to_datetime(df['hourlyIntensities']['ActivityHour'])
df['hourlySteps']['ActivityHour'] = pd.to_datetime(df['hourlySteps']['ActivityHour'])

hourly_df = pd.merge(df['hourlyCalories'], df['hourlyIntensities'], 
                    how='left', on=['Id', 'ActivityHour'])
hourly_df = pd.merge(hourly_df, df['hourlySteps'], 
                    how='left', on=['Id', 'ActivityHour'])
hourly_df.drop_duplicates(keep='first',inplace=True)
# The hourly tables have no ActivityDate column, so only the column names are normalized here
hourly_df.columns= hourly_df.columns.str.strip().str.lower()

In [None]:
daily_df.head()

In [None]:
hourly_df.head()

In [None]:
df['heartrate'].nunique()

In [None]:
df['weightLogInfo'].nunique()

**Notes**
* As minute-level data is summarized in the hourly and daily data, it is excluded from the analysis
* Due to small sample sizes, *'weightLogInfo_merged.csv'* (8 of 33 users) and *'heartrate_seconds_merged.csv'* (14 of 33 users) are excluded from the analysis
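
The user counts behind these exclusions can be checked per table. The snippet below is a minimal sketch rather than part of the original analysis: `user_coverage` is a hypothetical helper, the toy tables use made-up Ids in place of the loaded `df` dictionary, and it assumes every table has an `Id` column.

```python
import pandas as pd

def user_coverage(tables, id_col="Id"):
    """Return the number of distinct users in each table, largest first."""
    counts = {name: table[id_col].nunique() for name, table in tables.items()}
    return pd.Series(counts, name="users").sort_values(ascending=False)

# Toy stand-ins for the loaded tables (made-up Ids, not the Fitbit data)
toy_tables = {
    "dailyActivity": pd.DataFrame({"Id": [1, 1, 2, 3, 4]}),
    "weightLogInfo": pd.DataFrame({"Id": [1, 1, 1]}),
}
coverage = user_coverage(toy_tables)
print(coverage)
```

Passing the real dictionary of tables instead of `toy_tables` would list how many users each CSV file covers.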

## PHASE 4 & 5: ANALYZE & SHARE


### Daily Summary Statistics

In [None]:
daily_df.describe()

**Insights**
* Average daily steps is 8,329, which is lower than the recommended 10,000 steps per day
* Average daily calories burned is 2,362, which is between the recommended daily calorie intake for women (2,000) and men (2,500)
* Average daily sedentary time is 955 minutes (about 16 hours), much longer than the average daily time asleep of 419 minutes (about 7 hours)
* Average daily time in bed is 39 minutes longer than average daily time asleep
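
The minute-to-hour conversions quoted above are simple arithmetic. As a small sketch (the helper name `minutes_to_hours` is hypothetical), the two averages taken from the summary statistics convert as follows:

```python
def minutes_to_hours(minutes):
    """Convert minutes to hours, rounded to one decimal place."""
    return round(minutes / 60, 1)

sedentary_hours = minutes_to_hours(955)  # average sedentary minutes from the summary
asleep_hours = minutes_to_hours(419)     # average minutes asleep from the summary
print(sedentary_hours, asleep_hours)
```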

### Correlations
#### Correlation Matrix

In [None]:
cormat = daily_df[['totalsteps','calories', 'sedentaryminutes', 'totaltimeinbed', 'totalminutesasleep']].corr()
round(cormat,2)

**Insights**
* Positive correlation between daily steps and calories burned
* Little to no correlation between calories burned and amount of sedentary time
* Negative correlation between amount of sedentary time and time asleep
* Strong positive correlation between time in bed and time asleep
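
To read a correlation matrix like the one above more easily, each pair of columns can be listed once and sorted by strength. This is an illustrative sketch only: `ranked_pairs` is a hypothetical helper and the toy values are made up, so the numbers are not the Fitbit results.

```python
from itertools import combinations

import pandas as pd

def ranked_pairs(cormat):
    """List each unique column pair once, strongest absolute correlation first."""
    pairs = {(a, b): cormat.loc[a, b] for a, b in combinations(cormat.columns, 2)}
    return pd.Series(pairs).sort_values(key=abs, ascending=False)

# Made-up values (not the Fitbit data): calories is exactly 2 * steps
toy = pd.DataFrame({
    "steps": [1, 2, 3, 4],
    "calories": [2, 4, 6, 8],
    "noise": [1, -1, -1, 1],
})
ranked = ranked_pairs(toy.corr())
print(ranked.round(2))
```

Calling `ranked_pairs(cormat)` on the matrix computed in the cell above would give the same kind of ranking for the daily columns.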

#### Daily Steps v Calories Burned

In [None]:
sns.lmplot(x='totalsteps', y='calories', data=daily_df)
plt.title('Daily Steps v Calories Burned')
plt.xlabel("Daily Steps")
plt.ylabel("Calories Burned")
_ = plt.axvline(x=10000, color='red')
_ = plt.axhline(y=2500, color='red')
plt.show()

**Insights**
* Generally, users' daily steps range from 0 to 20,000
* The fitted line suggests roughly 2,500 calories burned at around 10,000 daily steps
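
The estimate at 10,000 steps is read off the regression line that `sns.lmplot` draws. A minimal sketch of the same idea with `np.polyfit` is shown below; the points are made up so that they lie exactly on a line, and `estimate_calories` is a hypothetical helper, so the result only illustrates the method.

```python
import numpy as np

# Made-up points that lie exactly on calories = 1500 + 0.1 * steps (not the Fitbit data)
toy_steps = np.array([2000, 6000, 10000, 14000], dtype=float)
toy_calories = np.array([1700, 2100, 2500, 2900], dtype=float)

# Fit a degree-1 polynomial (a straight line); polyfit returns the slope first
slope, intercept = np.polyfit(toy_steps, toy_calories, deg=1)

def estimate_calories(steps):
    """Read the fitted line at a given daily step count."""
    return slope * steps + intercept

estimate = estimate_calories(10_000)
print(round(estimate))
```

Fitting `daily_df['totalsteps']` against `daily_df['calories']` in the same way would give the slope and intercept behind the plotted line.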

#### Daily Sedentary Minutes v Time Asleep

In [None]:
sns.lmplot(x='sedentaryminutes', y='totalminutesasleep', data=daily_df)
plt.title('Daily Sedentary Minutes v Time Asleep')
plt.xlabel("Sedentary Minutes")
plt.ylabel("Asleep Minutes")
plt.show()

**Insights**
* Days with more sedentary time tend to have less time asleep

### Trends
#### Activity Level (Steps) By Day Of Week

In [None]:
# Add day column
daily_df['day'] = daily_df['date'].dt.day_name()

# Plot barchart day against totalsteps 
sns.barplot(x="totalsteps", y="day", data=daily_df, 
            order=['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday'], 
            capsize=.2)
plt.title('Average Daily Steps by Day of the Week')
plt.show()

**Insights**
* Users are most active on Saturdays and least active on Sundays
* User activity generally decreases from Tuesday through the rest of the weekdays
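
The bar heights in the chart are average steps per weekday, which can also be computed directly. The sketch below assumes a frame with `date` and `totalsteps` columns like `daily_df`; `mean_steps_by_day` is a hypothetical helper and the toy rows are made up.

```python
import pandas as pd

WEEKDAYS = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']

def mean_steps_by_day(daily, date_col="date", steps_col="totalsteps"):
    """Average the step counts per weekday name, ordered Monday to Sunday."""
    day_names = daily[date_col].dt.day_name()
    return daily.groupby(day_names)[steps_col].mean().reindex(WEEKDAYS)

# Made-up rows (not the Fitbit data): two Tuesdays, one Saturday, one Sunday
toy = pd.DataFrame({
    "date": pd.to_datetime(["2016-04-12", "2016-04-16", "2016-04-17", "2016-04-19"]),
    "totalsteps": [9000, 12000, 6000, 11000],
})
by_day = mean_steps_by_day(toy)
print(by_day)
```

Weekdays with no rows in the toy data show up as missing values, and `by_day.idxmax()` and `by_day.idxmin()` name the most and least active days.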

#### Activity Level (Steps) By Time Of Day

In [None]:
# Add hour and day column from the activityhour
hourly_df['hour'] = hourly_df['activityhour'].dt.hour
hourly_df['day'] = hourly_df['activityhour'].dt.day_name()

# Plot linechart of hour against steptotal
sns.lineplot(x='hour', y='steptotal', data=hourly_df)
plt.title('Steps total by time of day')
_ = plt.xticks(hourly_df['hour'].unique())
plt.show()

**Insights**
* Users tend to be most active in the evenings (6pm - 8pm) and around midday (12pm - 2pm)
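
The peak hours can be read from the averages behind the line chart. This is a sketch under the assumption that the frame has `activityhour` and `steptotal` columns like `hourly_df`; `busiest_hours` is a hypothetical helper and the toy rows are made up.

```python
import pandas as pd

def busiest_hours(hourly, n=3, time_col="activityhour", steps_col="steptotal"):
    """Return the n hours of the day with the highest average step count."""
    by_hour = hourly.groupby(hourly[time_col].dt.hour)[steps_col].mean()
    return by_hour.nlargest(n)

# Made-up rows (not the Fitbit data): three hours recorded on two days
toy = pd.DataFrame({
    "activityhour": pd.to_datetime([
        "2016-04-12 08:00", "2016-04-12 12:00", "2016-04-12 18:00",
        "2016-04-13 08:00", "2016-04-13 12:00", "2016-04-13 18:00",
    ]),
    "steptotal": [100, 500, 700, 300, 600, 900],
})
top_hours = busiest_hours(toy, n=2)
print(top_hours)
```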

### Distribution
#### User Type By Daily Steps

In [None]:
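# Create user_df with each user's average daily values
# numeric_only=True skips text columns such as 'day', which cannot be averaged
user_df = daily_df.groupby(['id']).mean(numeric_only=True)

# Conditions are checked in order, so each user gets the first label that matches
conditions = {
    'Sedentary': user_df['totalsteps']<5000,
    'Lightly Active': user_df['totalsteps']<7500,
    'Fairly Active': user_df['totalsteps']<10000,
}
user_df['usertype'] = np.select(list(conditions.values()), list(conditions.keys()), default='Very Active')

# Alternative using pd.cut
#bins = [0,5000,7500,10000,np.inf]
#bins_labels = ['Sedentary','Lightly Active','Fairly Active','Very Active']
#user_df['usertype'] =   pd.cut(user_df['totalsteps'] ,bins, labels = bins_labels)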

# Create user_type_df
user_type_df = user_df.groupby(['usertype']).size()

# Plot piechart on user
_ = plt.pie(user_type_df, labels=user_type_df.index, autopct='%.0f%%')
_ = plt.title('User Type By Daily Steps')

**Insights**
* Users of all activity levels use Fitbit
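
The percentages in the pie chart are the shares of users in each step band. The sketch below uses the same thresholds as the cell above with `pd.cut`; `user_type_shares` is a hypothetical helper and the average step counts are made up.

```python
import numpy as np
import pandas as pd

# Step thresholds and labels follow the user types used in this notebook
BINS = [0, 5000, 7500, 10000, np.inf]
LABELS = ['Sedentary', 'Lightly Active', 'Fairly Active', 'Very Active']

def user_type_shares(avg_steps):
    """Label each user's average daily steps and return the share of each type."""
    # right=False makes each bin [low, high), matching the '<' comparisons
    types = pd.cut(avg_steps, bins=BINS, labels=LABELS, right=False)
    return types.value_counts(normalize=True).reindex(LABELS)

# Made-up per-user averages (not the Fitbit data)
toy_avg_steps = pd.Series([3000, 5000, 8000, 12000, 9000])
shares = user_type_shares(toy_avg_steps)
print(shares)
```

Passing `user_df['totalsteps']` instead of the toy series would give the shares plotted in the pie chart.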

## PHASE 6: ACT
### Key Insights
1. Users average 8,329 daily steps, below the recommended 10,000 steps per day
    1. The more steps a user takes, the more calories the user burns
    2. At 10,000 steps, a user burns around 2,500 calories
2. Users spend about 7 hours asleep on average
    1. Days with more sedentary time tend to have less time asleep
    2. Users spend on average 39 minutes more in bed than asleep
3. Users are generally more active on Saturdays and in the evenings
    1. Activity level decreases from Tuesday through the rest of the weekdays
4. Users of all activity levels use Fitbit

### Recommendations
1. To encourage engagement, the App can offer daily in-app challenges that encourage users to take 10,000 steps or burn 2,500 calories in a day
    1. This is because users' average daily steps are below the recommended level
    2. As a motivator, users can be rewarded with discounts offered by Bellabeat
2. A value proposition that Bellabeat can offer is to improve users' sleep
    1. During a user's sedentary time, the Bellabeat App can nudge the user to do light activities (e.g. a short walk)
    2. At night, the App can nudge users to go to bed, to increase time spent asleep
3. The Bellabeat App can offer simple 5-minute exercises for users to do on weekdays
    1. This encourages activity throughout the day and week, rather than only in the evenings or on Saturdays
4. Bellabeat can be marketed to all user types, including men
    1. This will provide more data on different user types and allow for better insights into our audience

### Limitations and Assumptions
1. Sample size of data is small (33 users)
2. Time range of the data is short (12 April to 12 May 2016, about one month of data)
3. Bellabeat is designed as a product for women, while Fitbit is designed for more general use
4. Due to the design of Bellabeat's fitness tracker as an accessory, user engagement will need to be driven through the App

## APPENDIX
### Credits & References
* [Capstone: Bellabeat Case Study | R](https://www.kaggle.com/code/irenashen1/capstone-bellabeat-case-study-r)
* [Bellabeat Website](https://bellabeat.com/)
* [Google Data Analytics Capstone](https://www.coursera.org/learn/google-data-analytics-capstone)