In [None]:
knitr::opts_chunk$set(echo = TRUE)

### Scenario: 

A junior data analyst working on the marketing analyst team at Bellabeat, a high-tech manufacturer of health-focused products for women, has been asked to focus on one of
Bellabeat’s products and analyze smart device data to gain insight into how consumers are using their smart devices.

Bellabeat is a successful small company, but it has the potential to become a larger player in the global smart device market. Insights from the analysis will help guide the company's marketing strategy.



## ASK Phase

#### Business Task:

Analyse usage of smart health devices and identify trends and insights. How might Bellabeat customers’ usage compare with these trends? How can the trends inform our marketing strategy to maximise sales?

#### Key Stakeholders:

**Urška Sršen:** Bellabeat’s cofounder and Chief Creative Officer
 
**Sando Mur:** Mathematician and Bellabeat’s cofounder; key member of the Bellabeat executive team

#### Product focus:

Bellabeat App



## PREPARE Phase

Acquire data and open with Excel to get a quick preview.

#### Key Observations about data

•	Data is nominal and cited; it is not comprehensive, and therefore not entirely reliable.
•	Data is outdated, as it was published in 2016 and updated in 2020. A more current data source may be required.
•	With a sample size of just 33, the data is most likely sample-biased, and with no indication of the genders of participants, inferences and insights for our female customers might not be entirely accurate.
•	Data consists of one summary table: “daily_activity_merged”, and several composite tables. Data on sleep and heart rate patterns are in separate, lone tables. 
•	Data on fat and body weight is incomplete and was supplied by only a few participants, so it is omitted from this analysis.
•	Minute-level data is more granular than this analysis needs.


#### Data sources: 

“FitBit Fitness Tracker Data. Pattern recognition with tracker data: Improve Your Overall Health” by MÖBIUS – [Link](https://www.kaggle.com/datasets/arashnic/fitbit/code?datasetId=1041311)



## PROCESS Phase

#### Loading packages

In [None]:
library("tidyverse")
library("lubridate")
library("tidyr")
library("ggplot2")
library("dplyr")
library("janitor")

#### Importing datasets


In [None]:




activity <- read_csv("/kaggle/input/fitbit/Fitabase Data 4.12.16-5.12.16/dailyActivity_merged.csv")
heartrate <- read_csv("/kaggle/input/fitbit/Fitabase Data 4.12.16-5.12.16/heartrate_seconds_merged.csv")
calories <- read_csv("/kaggle/input/fitbit/Fitabase Data 4.12.16-5.12.16/hourlyCalories_merged.csv")
steps <- read_csv("/kaggle/input/fitbit/Fitabase Data 4.12.16-5.12.16/hourlySteps_merged.csv")
intensity <- read_csv("/kaggle/input/fitbit/Fitabase Data 4.12.16-5.12.16/hourlyIntensities_merged.csv")
sleep <- read_csv("/kaggle/input/fitbit/Fitabase Data 4.12.16-5.12.16/sleepDay_merged.csv")

#### Preview the data to verify consistency and check for possible errors

In [None]:
spec(activity)
spec(heartrate)
spec(calories)
spec(intensity)
spec(steps)
spec(sleep)

#### Clean the column names to only include letters, numbers and underscores.


In [None]:
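# clean_names() returns a cleaned copy. The results are not assigned here, so
# these calls only preview the cleaned names; the data frames keep their
# original column names, which the rest of the notebook relies on.
clean_names(activity)
clean_names(heartrate)
clean_names(calories)
clean_names(intensity)
clean_names(steps)
clean_names(sleep)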
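
To keep cleaned names, the result of `clean_names()` has to be assigned. The sketch below illustrates this on a small made-up data frame (`toy` and its values are hypothetical stand-ins, not the Fitbit data); it assumes the `janitor` package is installed.

```r
library(janitor)

# Made-up stand-in for one of the imported tables; only the column names matter here.
toy <- data.frame(
  Id = c(1, 2),
  ActivityDate = c("4/12/2016", "4/13/2016"),
  TotalSteps = c(100, 200)
)

# clean_names() returns a new data frame, so assign it to keep the cleaned names.
toy_clean <- clean_names(toy)

names(toy)        # the original object is unchanged
names(toy_clean)  # snake_case versions of the same names
```

Adopting the cleaned names in this notebook would also mean updating every later column reference (for example `activity$ActivityDate`), so it is a choice to make once, up front.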


#### Change datetime formatting and split into separate columns

In [None]:
### activity

activity$ActivityDate <- as.Date(activity$ActivityDate, format = "%m/%d/%Y")
activity$date <- format(activity$ActivityDate, format = "%m/%d/%y")


### calories

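# %I (12-hour clock) is needed for %p (AM/PM) to be applied; with %H the AM/PM marker is ignored
calories$ActivityHour <- as.POSIXct(calories$ActivityHour, format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")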
calories$time <- format(calories$ActivityHour, format = "%H:%M:%S")
calories$date <- format(calories$ActivityHour, format = "%m/%d/%y")


### heartrate

heartrate$Time <- as.POSIXct(heartrate$Time, format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")
heartrate$time <- format(heartrate$Time, format = "%H:%M:%S")
heartrate$date <- format(heartrate$Time, format = "%m/%d/%y")


### intensity

intensity$ActivityHour=as.POSIXct(intensity$ActivityHour, format="%m/%d/%Y %I:%M:%S %p", tz="UTC")
intensity$time <- format(intensity$ActivityHour, format = "%H:%M:%S")
intensity$date <- format(intensity$ActivityHour, format = "%m/%d/%y")
intensity$date=as.POSIXct(intensity$date,format="%m/%d/%y", tz="UTC")


### sleep

sleep$SleepDay <- as.POSIXct(sleep$SleepDay, format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")
sleep$time <- format(sleep$SleepDay, format = "%H:%M:%S")
sleep$date <- format(sleep$SleepDay, format = "%m/%d/%y")


### steps

steps$ActivityHour <- as.POSIXct(steps$ActivityHour, format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")
steps$time <- format(steps$ActivityHour, format = "%H:%M:%S")
steps$date <- format(steps$ActivityHour, format = "%m/%d/%y")
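
Because the hourly timestamps carry an AM/PM suffix, the choice of hour specifier affects every time-of-day result. The sketch below compares `%I` and `%H` on a single made-up timestamp string (`stamp` is illustrative, not a row from the data). It assumes an English or C locale, since `%p` matching is locale-dependent.

```r
# Made-up timestamp in the "month/day/year hour:minute:second AM/PM" layout.
stamp <- "4/12/2016 1:00:00 PM"

# %I reads the hour on a 12-hour clock, so %p can shift it into the afternoon.
with_I <- as.POSIXct(stamp, format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")

# %H reads the hour on a 24-hour clock; the PM marker is parsed but not applied.
with_H <- as.POSIXct(stamp, format = "%m/%d/%Y %H:%M:%S %p", tz = "UTC")

format(with_I, "%H:%M:%S")  # "13:00:00"
format(with_H, "%H:%M:%S")  # differs from the line above
```

A quick check of this kind on one afternoon value is usually enough to confirm that a format string is doing what was intended.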


## ANALYSE and SHARE Phases

#### Get a working summary of the different recorded parameters to ascertain exactly how much of what data has been supplied, and if we can glean any insights from that.


In [None]:
n_distinct(activity$Id)
n_distinct(sleep$Id)
n_distinct(heartrate$Id)

Only 24 of the 33 participants submitted sleep data, and only 14 submitted heart rate data. Insights drawn from these subsets might be inconclusive and might not hold for a larger sample.
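
Distinct counts say how many participants contributed, but not how many days each one logged. A minimal sketch of that follow-up check, using a made-up data frame (`toy_sleep`, hypothetical values) in place of `sleep` and assuming `dplyr` is available:

```r
library(dplyr)

# Made-up stand-in for `sleep`: one row per participant per logged day.
toy_sleep <- data.frame(
  Id = c(1, 1, 1, 2, 2, 3),
  TotalMinutesAsleep = c(420, 390, 450, 300, 480, 410)
)

# Number of logged days per participant, fewest first.
records_per_id <- toy_sleep %>%
  count(Id, name = "days_logged") %>%
  arrange(days_logged)

records_per_id
```

Participants with very few logged days can then be weighed accordingly when interpreting averages.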

#### Next, get an overview of the actual recordings to find trends.


In [None]:
### Daily activity

activity%>%
  select(TotalSteps,TotalDistance,Calories,VeryActiveMinutes,LightlyActiveMinutes,SedentaryMinutes)%>%
  summary()

### Sleep

sleep%>%
  select(TotalSleepRecords,TotalMinutesAsleep,TotalTimeInBed)%>%
  summary()


* The data shows that a majority of the sample population walks more than 7,000 steps per day, which is within the CDC-recommended healthy range.

* The mean of very-active minutes is 21, which suggests that participants, on average, do some strenuous exercise daily.

* Average sleep time is about 7 hours, which is also optimal.
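
A claim about "a majority" can be checked directly instead of being read off `summary()`. The sketch below uses made-up step counts (`toy_steps` is hypothetical, standing in for `activity$TotalSteps`) and the 7,000-step threshold mentioned above.

```r
# Made-up daily step counts standing in for activity$TotalSteps.
toy_steps <- c(3200, 6900, 7400, 8100, 10500, 12000)
threshold <- 7000

# Share of recorded days above the threshold (the mean of a logical vector).
share_above <- mean(toy_steps > threshold)

# The median is above the threshold exactly when more than half the days are.
median_steps <- median(toy_steps)

share_above
median_steps
```

On the real data, the same two lines applied to `activity$TotalSteps` would give the share of recorded days, which is not quite the same thing as the share of participants.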


#### Next, examine what times during the day are most and least active:

In [None]:
ggplot(intensity) +
  geom_col(mapping = aes(x = time, y = TotalIntensity, fill = TotalIntensity)) +
  labs(x = "Time of Day", y = "Intensity", title = "Daily Activity Distribution") +
  theme(legend.position = "right", axis.text.x = element_text(angle = 90))

This visual seems to show a general rise in activity at around 8am, peaking between 5pm and 7pm. This might indicate that most of the participants have day jobs, take a lunch break around noon, and go out for (or make) dinner between 5 and 7.

It also shows a small percentage of the population getting up at 5am for apparent exercise (the light blue shade at the 5am mark).
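
Since `geom_col()` stacks every row, bar heights reflect totals across all participants and days. Averaging per hour is one way to compare hours on an equal footing. A minimal sketch on made-up values (`toy_intensity` is hypothetical, mimicking the `time` and `TotalIntensity` columns):

```r
# Made-up hourly records: two observations for each of three hours.
toy_intensity <- data.frame(
  time = c("05:00:00", "05:00:00", "12:00:00", "12:00:00", "18:00:00", "18:00:00"),
  TotalIntensity = c(2, 4, 10, 14, 30, 26)
)

# Mean intensity for each hour of the day.
hourly_mean <- aggregate(TotalIntensity ~ time, data = toy_intensity, FUN = mean)

# Hour with the highest average intensity.
peak_hour <- hourly_mean$time[which.max(hourly_mean$TotalIntensity)]

hourly_mean
peak_hour
```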

#### We know that walking burns calories:

In [None]:
ggplot(activity) +
  geom_smooth(method = "lm", mapping = aes(x = TotalSteps, y = Calories)) +
  labs(title = "Effect of walking on Calories burned")

The fitted line crosses 2,000 calories at around 4,000 steps per day, so a participant on a 2,000-calorie diet would roughly break even at that step count.
Earlier observations showed that most of the participants took more than 7,000 steps per day.
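
The step count at which a fitted line reaches a given calorie level can be computed from the model coefficients instead of being read off the plot. The sketch below uses made-up, perfectly linear data (`toy` and its numbers are hypothetical, chosen so the answer is easy to verify by hand), not the Fitbit data.

```r
# Made-up data lying exactly on the line Calories = 1800 + 0.05 * TotalSteps.
toy <- data.frame(TotalSteps = c(0, 2000, 4000, 6000, 8000))
toy$Calories <- 1800 + 0.05 * toy$TotalSteps

# Same model form that geom_smooth(method = "lm") fits for Calories vs TotalSteps.
fit <- lm(Calories ~ TotalSteps, data = toy)
coefs <- coef(fit)

# Solve target = intercept + slope * steps for steps.
target <- 2000
steps_at_target <- unname((target - coefs["(Intercept)"]) / coefs["TotalSteps"])

steps_at_target
```

Running the same calculation with `data = activity` would give the crossing point for the real regression line.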


#### Explore the correlation between activity and sleep

In [None]:
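# Average per calendar day, then join on date (sleep$date is text, intensity$date is POSIXct).
daily_intensity <- aggregate(TotalIntensity ~ date, data = intensity, FUN = mean)
daily_sleep <- aggregate(TotalMinutesAsleep ~ date, data = sleep, FUN = mean)
daily_sleep$date <- as.POSIXct(daily_sleep$date, format = "%m/%d/%y", tz = "UTC")
sleep_activity <- merge(daily_intensity, daily_sleep, by = "date")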

ggplot(sleep_activity) +
  geom_smooth(method = "loess", mapping = aes(x = TotalIntensity, y = TotalMinutesAsleep)) +
  labs(title = "How Daily Activity Affects Sleep")

This produces a rather interesting graph. Because both variables are averaged per calendar day across all participants, any reading in terms of individual behaviour is tentative.
A possible explanation of this graph is that:

1. People who do little to no exercise tend to sleep more.
2. Then, as activity increases, there is a drop in sleep times, probably because these people work longer hours.
3. Finally, and the actual reason for this analysis, people who do intense exercise tend to have longer (and presumably better) sleep.
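
One way to test these readings per participant rather than per calendar day is to join activity and sleep records on both `Id` and `date`. The sketch below does this on made-up data (`toy_activity` and `toy_sleep` are hypothetical). It assumes, as in this notebook, that both tables carry an `Id` column and a `date` column in the same text format.

```r
# Made-up daily activity records (participant 3 has no matching sleep record).
toy_activity <- data.frame(
  Id = c(1, 1, 2, 2, 3),
  date = c("04/12/16", "04/13/16", "04/12/16", "04/13/16", "04/12/16"),
  VeryActiveMinutes = c(10, 30, 0, 45, 20)
)

# Made-up sleep records for participants 1 and 2 only.
toy_sleep <- data.frame(
  Id = c(1, 1, 2, 2),
  date = c("04/12/16", "04/13/16", "04/12/16", "04/13/16"),
  TotalMinutesAsleep = c(400, 430, 380, 450)
)

# Inner join: keeps only participant-days present in both tables.
joined <- merge(toy_activity, toy_sleep, by = c("Id", "date"))

# Correlation between activity and sleep at participant-day level.
activity_sleep_cor <- cor(joined$VeryActiveMinutes, joined$TotalMinutesAsleep)

joined
activity_sleep_cor
```

A scatter plot of the joined rows would show whether the three-part pattern described above appears at participant level, or only in the daily averages.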



## ACT 

After a comprehensive analysis of the data and trends, the following are the conclusions:

1. The sample is rather small and does not indicate the genders of the participants. All assumptions drawn from it must be further tested and validated for our female clients.

2. The shortage of weight and fat data may indicate that participants were reluctant to submit this information, possibly due to personal insecurities.

3. A majority of the participants seem to lead active lives. There might be a correlation between that and their use of a smart fitness device, but a control group would be needed to ascertain that.

4. In contrast, only a small subset of participants have proper workouts in the mornings.


### Suggestions for Bellabeat App UX and Marketing

1. According to [this](https://www.health.harvard.edu/staying-healthy/is-your-daily-nap-doing-more-harm-than-good) Harvard Health article, daytime naps done right can boost overall health and wellness.
The Bellabeat app could introduce a feature where participants can set alarms for a quick nap in the early afternoon.

2. A sleep schedule can improve the quality of sleep and overall moods according to [this](https://healthysleep.med.harvard.edu/need-sleep/what-can-you-do/good-sleep-habits) Harvard article.
The Bellabeat app could also include a sleep scheduling option, if not already present.

3. Daily exercise prompts could also be added to the app, to encourage people to do at least a light workout in the morning.

4. For marketing campaigns, all three above points could be advertised as part of Bellabeat's USP.
There could also be a reward system for number of steps walked weekly or monthly: something like a tiered discount on monthly Bellabeat subscriptions based on reaching a certain number of steps.

5. Since most of the participants appear to be working people with daytime jobs, adverts should be targeted at this demographic of the general population.

6. Also, for the people who already work out daily, ads could be placed in gyms and other workout hotspots, encouraging people to track their progress. The American Psychological Association, in this [article](https://www.apa.org/news/press/releases/2015/10/progress-goals), says that frequently monitoring progress toward goals increases the chances of success.