# Objective
You are a junior data analyst working on the marketing analyst team at Bellabeat, a high-tech manufacturer of health-focused <br />
products for women. Bellabeat is a successful small company, but they have the potential to become a larger player in the global smart device market. <br />
Urška Sršen, cofounder and Chief Creative Officer of Bellabeat, believes that analyzing smart device fitness data could help unlock new growth opportunities for the company. <br />
You have been asked to focus on one of Bellabeat’s products and analyze smart device data to gain insight into how consumers are using their smart devices.<br />


# About the company
Urška Sršen and Sando Mur founded Bellabeat, a high-tech company that manufactures health-focused smart products. <br />
Sršen used her background as an artist to develop beautifully designed technology that informs and inspires women around <br />
the world. Collecting data on activity, sleep, stress, and reproductive health has allowed Bellabeat to empower women with <br />
knowledge about their own health and habits. Since it was founded in 2013, Bellabeat has grown rapidly and quickly <br />
positioned itself as a tech-driven wellness company for women. <br />
By 2016, Bellabeat had opened offices around the world and launched multiple products. Bellabeat products became available <br />
through a growing number of online retailers in addition to their own e-commerce channel on [their website](https://bellabeat.com/). The company <br />
has invested in traditional advertising media, such as radio, out-of-home billboards, print, and television, but focuses on digital <br />
marketing extensively. Bellabeat invests year-round in Google Search, maintains active Facebook and Instagram pages, and <br />
consistently engages consumers on Twitter. Additionally, Bellabeat runs video ads on YouTube and display ads on the Google <br />
Display Network to support campaigns around key marketing dates. <br />
Sršen knows that an analysis of Bellabeat’s available consumer data would reveal more opportunities for growth. She has <br />
asked the marketing analytics team to focus on a Bellabeat product and analyze smart device usage data in order to gain <br />
insight into how people are already using their smart devices. Then, using this information, she would like high-level <br />
recommendations for how these trends can inform Bellabeat marketing strategy. <br />

# Business question

1. What are some trends in smart device usage?

#### Overall smart device trends

"Lastly, despite a relatively high selling price, smartwatch and wristband ownership is climbing, having risen by 58% since 2017. <br />
Having now passed the early adoption phase, they’re proving popular with health-conscious individuals and status-seekers." <br />
The main devices in use are smartphones, laptops/PCs and tablets. In the 55 to 64 age group, tablets are owned more often than in younger age groups.
[1](https://www.gwi.com/reports/device)

#### Smartwear

- Asia Pacific has the highest ownership of smartwatches/smart wristbands, with 1 in 5 people owning either.
- Within that region, ownership is highest in Singapore and Hong Kong, where 29% and 28% respectively own either.
- In the UK and US, Apple, Samsung and Fitbit are the leading brands, with shares of 56%, 16% and 7% respectively.
- Among smartwatches, Apple leads with a market share of 56%.
- Fitbit has a market share of only 4%, but its acquisition by Google could increase that share.
[1](https://www.gwi.com/reports/device) [2](https://www.mordorintelligence.com/industry-reports/smart-wearables-market)

- Besides market share, another trend is a stronger focus on health, such as blood sugar measurement and early detection of diseases like COVID-19. [3](https://www.ispo.com/en/news-sports-experts/six-latest-trends-smartwatches) 
- Smart bracelets that look like a beautiful bracelet rather than a watch are another trend, and Bellabeat's products already fit it. [3](https://www.ispo.com/en/news-sports-experts/six-latest-trends-smartwatches)
- Longer battery life.
- Authentication is another trend, for example unlocking your home or getting into concerts. [4](https://edu.gcfglobal.org/en/wearables/the-future-of-wearable-technology/1/)

2. How could these trends apply to Bellabeat customers?

Increasing ownership of smartwatches and smart wristbands leads to higher adoption across different markets and larger customer groups. <br />
For Bellabeat customers, this means that more products and services will be created to meet their needs. <br />
Besides serving as a smartwatch, the product could also be used for authentication, for example to get access to a festival. <br />
Bellabeat products can also be used as a health monitor.

3. How could these trends help influence Bellabeat marketing strategy?

You will produce a report with the following deliverables:
1. A clear summary of the business task
2. A description of all data sources used
3. Documentation of any cleaning or manipulation of data
4. A summary of your analysis
5. Supporting visualizations and key findings
6. Your top high-level content recommendations based on your analysis

In [None]:
library(lubridate)
library(ggplot2)
library(plotly)
library(skimr)
library(dplyr)
library(patchwork)
library(tidyverse) # metapackage of all tidyverse packages

library(htmlwidgets) # to display plotly
library('IRdisplay')

list.files(path = "../input")

In [None]:
# reading in daily activity data
dailyActivity <- read_csv("../input/fitbit/Fitabase Data 4.12.16-5.12.16/dailyActivity_merged.csv")
dailyCalories <- read_csv("../input/fitbit/Fitabase Data 4.12.16-5.12.16/dailyCalories_merged.csv")
dailyIntensities <- read_csv("../input/fitbit/Fitabase Data 4.12.16-5.12.16/dailyIntensities_merged.csv")
dailySteps <- read_csv("../input/fitbit/Fitabase Data 4.12.16-5.12.16/dailySteps_merged.csv")
dailySleep <- read_csv("../input/fitbit/Fitabase Data 4.12.16-5.12.16/sleepDay_merged.csv")
#Weight <- read_csv("../input/fitbit/Fitabase Data 4.12.16-5.12.16/weightLogInfo_merged.csv")
hourlyIntensities <- read_csv("../input/fitbit/Fitabase Data 4.12.16-5.12.16/hourlyIntensities_merged.csv")
hourlyCalories <- read_csv("../input/fitbit/Fitabase Data 4.12.16-5.12.16/hourlyCalories_merged.csv")
hourlySteps <- read_csv("../input/fitbit/Fitabase Data 4.12.16-5.12.16/hourlySteps_merged.csv")

In [None]:
spec(dailyIntensities)
#spec(dailyActivity)
#spec(dailyCalories)
spec(dailySteps)


In [None]:
spec(hourlyIntensities)
spec(hourlyCalories)
spec(hourlySteps)

In [None]:
head(hourlyCalories)
head(hourlyIntensities)
head(hourlySteps)

In [None]:
merge_hourly_cal <- merge(hourlyIntensities, hourlyCalories, by = c("Id","ActivityHour"))
# merging Intensities and Steps
hourly_final <- merge(merge_hourly_cal, hourlySteps, by = c("Id","ActivityHour"))

In [None]:
hourly_final <- hourly_final %>% 
  rename(date_time = ActivityHour) %>%
  mutate(date_time= as.POSIXct(date_time, format="%m/%d/%Y %I:%M:%S %p", tz= Sys.timezone()))
head(hourly_final)
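
The parsing step above can be checked on a few sample strings. The following is a minimal sketch, assuming timestamps in the `%m/%d/%Y %I:%M:%S %p` layout used above. The sample values and the choice of `tz = "UTC"` are illustrative assumptions and not part of the original notebook. A fixed time zone keeps the result independent of the machine that runs the notebook, and counting `NA` values reveals rows that did not match the format.

```r
# Sample timestamps in the same layout as ActivityHour (made-up values)
raw <- c("4/12/2016 12:00:00 AM", "4/12/2016 1:00:00 PM", "not a timestamp")

# A fixed time zone avoids machine-dependent results (assumption: UTC is acceptable)
parsed <- as.POSIXct(raw, format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")

# Strings that do not match the format become NA instead of raising an error
n_failed <- sum(is.na(parsed))

# format() with an explicit pattern always emits the time part, also for midnight
hours <- format(parsed, "%H:%M:%S")

print(n_failed)
print(hours)
```

If `n_failed` is greater than zero on the real data, the affected rows are worth inspecting before any grouping by hour.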

In [None]:
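hourly_final <- hourly_final %>%  # format() keeps the time part for midnight rows as well
  mutate(Date = format(date_time, "%Y-%m-%d"), Hour = format(date_time, "%H:%M:%S")) %>%
  select(-date_time)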

In [None]:
hourly_final$day <- weekdays(as.Date(hourly_final$Date))

In [None]:
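dailySleep <- dailySleep %>% 
  distinct() %>%  # drop exact duplicate rows, if any
  separate(SleepDay, c('Date','DateTime'), sep = " ", extra = "drop")  # the trailing AM/PM piece is dropped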

In [None]:
# merging daily Activity with daily Calories by Id, date and Calories;
# including the date avoids extra rows when a user has the same Calories value on several days
merge_ac_cal <- merge(dailyActivity, dailyCalories,
                      by.x = c("Id","ActivityDate","Calories"),
                      by.y = c("Id","ActivityDay","Calories"))
# merging Intensities and Steps
merge_int_steps <- merge(dailyIntensities, dailySteps, by = c("Id","ActivityDay"))

shared_cols <- c("SedentaryMinutes", "LightlyActiveMinutes","FairlyActiveMinutes","VeryActiveMinutes", "SedentaryActiveDistance", "LightActiveDistance", "ModeratelyActiveDistance", "VeryActiveDistance")
merge_daily <- merge(merge_ac_cal, merge_int_steps,
                     by.x = c("Id","ActivityDate", shared_cols),
                     by.y = c("Id","ActivityDay", shared_cols)) %>%
  rename(Date = ActivityDate)

#dailySleep has not logged Sleep for every day.
daily_data <- merge(x=merge_daily, y=dailySleep, by = c("Id","Date")) %>% drop_na() %>% select(-DateTime, -TrackerDistance, -StepTotal)
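
When tables are merged, the choice of key columns decides whether the result keeps one row per user and day. The following is a minimal sketch with made-up data (the two small data frames are assumptions for illustration): it compares a merge keyed on a measured value with a merge keyed on the date and counts the resulting rows.

```r
# Made-up daily records for one user; two days share the same Calories value
activity <- data.frame(
  Id = c(1, 1, 1),
  ActivityDate = c("4/12/2016", "4/13/2016", "4/14/2016"),
  TotalSteps = c(0, 0, 5000),
  Calories = c(1800, 1800, 2100)
)
calories <- data.frame(
  Id = c(1, 1, 1),
  ActivityDay = c("4/12/2016", "4/13/2016", "4/14/2016"),
  Calories = c(1800, 1800, 2100)
)

# Keyed on Id and Calories: the two 1800 rows match each other crosswise
by_value <- merge(activity, calories, by = c("Id", "Calories"))

# Keyed on Id, date and Calories: exactly one match per day
by_date <- merge(activity, calories,
                 by.x = c("Id", "ActivityDate", "Calories"),
                 by.y = c("Id", "ActivityDay", "Calories"))

print(nrow(by_value))
print(nrow(by_date))
```

Comparing `nrow()` before and after each merge is a cheap way to notice this kind of row inflation on the real data.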



In [None]:
head(merge_daily)
head(dailySleep)

In [None]:
options(repr.plot.width=10)

In [None]:
colSums(daily_data > 0)

Transforming the date column from character to Date format in order to analyze activity by weekday

In [None]:
merge_daily <- merge_daily %>% 
  mutate(Date = as.Date(Date, format = "%m/%d/%Y"))

In [None]:
daily_data <- daily_data %>% 
  mutate(Date = as.Date(Date, format = "%m/%d/%Y"))

In [None]:
# Generating weekday column
daily_data$day <- weekdays(as.Date(daily_data$Date))
merge_daily$day <- weekdays(as.Date(merge_daily$Date))

In [None]:
summary(dailyActivity)
summary(dailySleep)

# Observations

There is no missing data in any column. The data was collected from 12 April 2016 to 12 May 2016.
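
A statement about missing data can be backed by a short check. The following is a minimal sketch on a made-up data frame (the values are assumptions for illustration) that counts missing values per column and exact duplicate rows.

```r
# Made-up sleep records: one missing value and one exact duplicate row
df <- data.frame(
  Id = c(1, 1, 2, 2),
  Date = c("4/12/2016", "4/12/2016", "4/12/2016", "4/13/2016"),
  TotalMinutesAsleep = c(400, 400, NA, 380)
)

# Number of missing values in each column
na_per_column <- colSums(is.na(df))

# Number of rows that repeat an earlier row exactly
n_duplicates <- sum(duplicated(df))

print(na_per_column)
print(n_duplicates)
```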

__Activity Observations__
- Sedentary: 717 minutes on average ~ 12 hours
- Lightly active: 216 minutes ~ 3.6 hours
- Fairly active: 11 minutes, max -> 143
- Very active: 25 minutes, max -> 210

An interesting observation: LightActiveDistance is higher than ModeratelyActiveDistance in both mean and max. <br />
VeryActiveDistance, with a __max of 21.9 and a mean of 1.5__, is also higher than ModeratelyActiveDistance, which may be a result of 
activities like jogging. <br /> LightActive includes activities like walking, which results in the higher distance traveled. <br />
The average user has a sedentary time of around __12 hours__. <br />
One outlier has a max of __1265 minutes__, which is around __21 hours__; this user probably removed their Fitbit. <br />
On average, users sleep around __419.5 minutes__ ~ 7 hours. <br />     
__TotalSteps__ has a mean of __5085 steps__ and a median of __3775__; a mean above the median indicates a right skew of the data. <br />
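
The unit conversions and the skew argument above can be reproduced in a few lines. This is a minimal sketch: the minute values are the ones quoted in the observations, while the `steps` vector is made-up data that only illustrates how a mean above the median points to a right skew.

```r
# Minute values quoted in the observations, converted to hours
minutes <- c(sedentary_mean = 717, lightly_active_mean = 216,
             sedentary_max = 1265, sleep_mean = 419.5)
hours <- round(minutes / 60, 2)
print(hours)

# Made-up step counts with a few very active days (illustration only)
steps <- c(0, 500, 1500, 3000, 3800, 4200, 6000, 9000, 18000)

# In a right-skewed sample, large values pull the mean above the median
right_skewed <- mean(steps) > median(steps)
print(right_skewed)
```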

Only __5%__ log their activity distance.

__Points for Investigation__:
 
- Calories -> StepDistance 
- Calories -> Activity Status (Lightly, Fairly, VeryActive)
- Calories -> Date: does the day of the week have any influence on calories?
- StepCount -> hypothesis: are users more active during the week than on weekends, or vice versa?
- Distance -> Weekdays 
- SedentaryMinutes -> Weekdays 

*The arrow marks a pair of variables to check for a trend or a correlation.*

## Analysis

In the following, the points for investigation will be analyzed.

In [None]:
#plot(merge_daily$Calories, merge_daily$StepDistance)

In [None]:
head(daily_data)

In [None]:
length(daily_data$Id)

In [None]:
summary(daily_data)

In [None]:
ggplot(data=daily_data, aes(x=TotalDistance, y=Calories)) + 
  geom_point() + geom_smooth() + labs(title="Total Distance vs. Calories")

ggplot(data=daily_data, aes(x=TotalSteps, y=Calories )) + 
  geom_point() + geom_smooth() + labs(title="Total Steps vs. Calories")

As expected, a user's total distance and total steps are positively correlated with calories. <br />
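
The visual impression from the scatter plots can be quantified with a correlation coefficient. The following is a minimal sketch on simulated data: the `steps` and `calories` vectors, the seed and the linear relationship are assumptions for illustration, not values from the Fitbit dataset.

```r
set.seed(42)

# Simulated daily steps and calories with a positive linear relationship plus noise
steps <- runif(200, min = 0, max = 15000)
calories <- 1800 + 0.08 * steps + rnorm(200, sd = 200)

# Pearson correlation coefficient between the two variables
r <- cor(steps, calories, method = "pearson")

# cor.test() adds a p-value and a confidence interval for the coefficient
test <- cor.test(steps, calories)

print(r)
print(test$p.value)
```

On the notebook data, the same calls would take `daily_data$TotalSteps` and `daily_data$Calories` as arguments.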


In [None]:

# 3 figures arranged with patchwork: 2 on top and 1 below
title_theme <- theme(legend.position="none",  plot.title = element_text(size=22, hjust = 0.5))
p1 <- ggplot(data=daily_data, aes(x=LightlyActiveMinutes, y=TotalSteps)) + geom_point() + geom_smooth() + labs(title="Total Steps vs. lightly active Minutes")
p2 <- ggplot(data=daily_data, aes(x=FairlyActiveMinutes, y=TotalSteps)) + geom_point() + geom_smooth() + labs(title="Total Steps vs. fairly active Minutes")
p3 <- ggplot(data=daily_data, aes(x=VeryActiveMinutes, y=TotalSteps)) + geom_point() + geom_smooth() + labs(title="Total Steps vs. very active Minutes")

# theme_minimal() is a complete theme, so the title settings are applied after it
(p1 + p2) / p3 & theme_minimal() & title_theme

Lightly active minutes show a clearer positive relationship with total steps than fairly and very active minutes do. <br />
This may be a result of the type of activity, such as *jogging*. 

In [None]:

# 3 figures arranged with patchwork: 2 on top and 1 below
title_theme <- theme(legend.position="none",  plot.title = element_text(size=22, hjust = 0.5))
p1 <- ggplot(data=daily_data, aes(x=LightlyActiveMinutes, y=Calories)) + geom_point() + geom_smooth() + labs(title="Calories vs. lightly active Minutes")
p2 <- ggplot(data=daily_data, aes(x=FairlyActiveMinutes, y=Calories)) + geom_point() + geom_smooth() + labs(title="Calories vs. fairly active Minutes")
p3 <- ggplot(data=daily_data, aes(x=VeryActiveMinutes, y=Calories)) + geom_point() + geom_smooth() + labs(title="Calories vs. very active Minutes")

# theme_minimal() is a complete theme, so the title settings are applied after it
(p1 + p2) / p3 & theme_minimal() & title_theme

Except for fairly active minutes, calories show a __positive correlation__ with active minutes, both for very active and for lightly active minutes.

In [None]:
level_order <- c("Monday", "Tuesday", "Wednesday","Thursday","Friday","Saturday","Sunday")

p <- merge_daily %>%
  group_by(day) %>%
  # one row per weekday; returning `day` inside summarize() would repeat every input row
  summarize(mean.TotalSteps=mean(TotalSteps)) %>%
  ggplot(aes(x=factor(day,levels= level_order), y=mean.TotalSteps)) + geom_point(aes(color=day),size=5) + 
  geom_line(aes(group=1),linetype='dotted')+
  labs(x = 'Day of the week',
       y = 'Mean Stepcount',
       title = 'Mean Step Amount by weekday')

fig_1 <- ggplotly(p)



htmlwidgets::saveWidget(fig_1, "fig_1.html")
display_html('<iframe src="./fig_1.html" width=100% height=400></iframe>')



In [None]:
merge_daily %>%
  group_by(day) %>%
  summarise_at(vars(TotalSteps), list(name = mean))

Sunday has the __lowest__ step count with __4324 steps__, and Saturday the __highest__ with __5744 steps__. <br />
Depending on the geographic region, the Saturday peak may be a result of running errands or going out. <br />
On Sunday, family gatherings or visiting friends may be reasons for the lower step count.
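
`weekdays()` returns day names in the language of the current locale, so a fixed English level order may not match on every machine. The following is a minimal sketch of a locale-independent alternative that derives the weekday from the ISO weekday number (`%u`, Monday = 1). The dates and step counts are made-up values for illustration.

```r
# Two made-up weeks of daily step counts, starting on Monday 2016-04-11
dates <- seq(as.Date("2016-04-11"), by = "day", length.out = 14)
steps <- c(1000, 2000, 3000, 4000, 5000, 6000, 7000,
           3000, 4000, 5000, 6000, 7000, 8000, 9000)

# ISO weekday number (1 = Monday ... 7 = Sunday) does not depend on the locale
day_num <- as.integer(format(dates, "%u"))

# Map the numbers to labels and keep Monday-to-Sunday order via factor levels
labels <- c("Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday")
day <- factor(labels[day_num], levels = labels)

# Mean step count per weekday, returned in factor-level order
mean_steps <- tapply(steps, day, mean)
print(mean_steps)
```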

In [None]:
merge_daily %>%
  group_by(Id) %>%
  summarise_at(vars(TotalSteps), list(name = mean))

In [None]:
level_order <- c("Monday", "Tuesday", "Wednesday","Thursday","Friday","Saturday","Sunday")

p <- merge_daily %>%
  group_by(day) %>%
  # one row per weekday; returning `day` inside summarize() would repeat every input row
  summarize(mean.SedentaryMinutes=mean(SedentaryMinutes)) %>%
  ggplot(aes(x=factor(day,levels= level_order), y=mean.SedentaryMinutes)) + geom_point(aes(color=day),size=5) + 
  geom_line(aes(group=1),linetype='dotted')+
  labs(x = 'Day of the week',
       y = 'Mean sedentary minutes',
       title = 'Mean sedentary minutes by Day of the week')

fig_2 <- ggplotly(p)


htmlwidgets::saveWidget(fig_2, "fig_2.html")
display_html('<iframe src="./fig_2.html" width=100% height=400></iframe>')

As seen earlier with the mean total step count, mean sedentary minutes are higher on Sunday and Monday. <br />
Saturday and Thursday have the lowest sedentary minutes, around an __hour lower__ than Sunday and Monday.  <br />

#### Creating activity groups

We want to dive deeper into different activity groups.
We divide the users into 4 groups -> sedentary, lightly active, fairly active and very active. <br />
The groups are defined by each user's mean TotalSteps:
- sedentary: fewer than 5000 steps
- lightly active: at least 5000 but fewer than 7500
- fairly active: at least 7500 but fewer than 10000
- very active: 10000 or more

In [None]:
activityCol <- daily_data %>% 
  group_by (Id) %>% 
  summarise(avg_steps= mean(TotalSteps), 
            avg_cal= mean(Calories), 
            avg_sleep= mean(TotalMinutesAsleep, 
                                   na.rm = TRUE)) %>% 
  mutate(activity_type= case_when(
    avg_steps < 5000 ~ "sedentary",
    avg_steps >= 5000 & avg_steps <7500 ~"lightly active",  # 7500 matches the group definition in the text
    avg_steps >= 7500 & avg_steps <10000 ~"fairly active",
    avg_steps >= 10000 ~"very active"
  ))

final_df <- merge(daily_data, activityCol[c("Id","activity_type")], by = "Id")
final_df$activity_type <- factor(final_df$activity_type, levels = c("sedentary", "lightly active", "fairly active", "very active"))
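
The same kind of grouping can also be written with base R's `cut()`, which makes the interval boundaries explicit. The following is a minimal sketch, assuming the thresholds from the group definition in the text (5000, 7500 and 10000). The `avg_steps` values are made-up boundary cases for illustration.

```r
# Made-up per-user averages that sit on or next to each boundary
avg_steps <- c(4999, 5000, 7499, 7500, 9999, 10000)

# right = FALSE makes each interval closed on the left: [5000, 7500), [7500, 10000), ...
activity_type <- cut(
  avg_steps,
  breaks = c(-Inf, 5000, 7500, 10000, Inf),
  labels = c("sedentary", "lightly active", "fairly active", "very active"),
  right = FALSE
)

print(data.frame(avg_steps, activity_type))
```

`cut()` returns a factor whose levels are already in ascending order, so no separate `factor()` call is needed for plotting.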

In [None]:
fig_3 <- plot_ly(final_df, y = ~TotalSteps, color = ~activity_type, type = "box")


htmlwidgets::saveWidget(fig_3, "fig_3.html")
display_html('<iframe src="./fig_3.html" width=100% height=400></iframe>')


In [None]:
fig_4 <- plot_ly(final_df, y = ~Calories, color = ~activity_type, type = "box")


htmlwidgets::saveWidget(fig_4, "fig_4.html")
display_html('<iframe src="./fig_4.html" width=100% height=400></iframe>')

In [None]:
fig_5 <- plot_ly(final_df, y = ~TotalMinutesAsleep, color = ~activity_type, type = "box")


htmlwidgets::saveWidget(fig_5, "fig_5.html")
display_html('<iframe src="./fig_5.html" width=100% height=400></iframe>')

In [None]:
fig_6 <- plot_ly(final_df, x = ~TotalSteps, y = ~Calories, color = ~activity_type, type = "scatter", mode = "markers")


htmlwidgets::saveWidget(fig_6, "fig_6.html")
display_html('<iframe src="./fig_6.html" width=100% height=400></iframe>')


In [None]:
options(repr.plot.width=10, repr.plot.height=10)

In [None]:
# levels is used to sort column
merge_daily$day <- factor(merge_daily$day, levels = c("Monday", "Tuesday", "Wednesday","Thursday","Friday","Saturday","Sunday"))


At the start of the week and on Sundays, sedentary minutes are highest. <br />
As the other charts showed, sedentary minutes also correlate with the activity type.

#### Hourly Analysis

In the following, hourly activities will be analyzed.

In [None]:
hourly_final$day <- factor(hourly_final$day, levels = c("Monday", "Tuesday", "Wednesday","Thursday","Friday","Saturday","Sunday"))


In [None]:
head(hourly_final)

In [None]:
fig_7 <- plot_ly(
    x = hourly_final$Hour, y=hourly_final$day,
    z = hourly_final$StepTotal, type = "heatmap"
) %>% layout(title="Total steps throughout the week")


htmlwidgets::saveWidget(fig_7, "fig_7.html")
display_html('<iframe src="./fig_7.html" width=100% height=400></iframe>')

During the week, the most active time is between __9__ and __17__. On the weekend, no time of day with a high total step count can be identified. <br />
On Friday at __14__ and __16__ there are two spots with a high step count of __9769__ and __3458__.
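
The hourly table holds one row per user, date and hour, so several observations fall into each weekday/hour cell of the heatmap. One way to make each cell well defined is to aggregate before plotting. The following is a minimal sketch on made-up data: the data frame and the choice of the mean as the summary are assumptions for illustration.

```r
# Made-up hourly records: two observations share the Monday 09:00 cell
hourly <- data.frame(
  day = factor(c("Monday", "Monday", "Monday", "Tuesday"),
               levels = c("Monday", "Tuesday")),
  Hour = c("09:00:00", "09:00:00", "10:00:00", "09:00:00"),
  StepTotal = c(100, 300, 50, 20)
)

# Mean steps per weekday (rows) and hour (columns); empty cells become NA
mean_steps <- tapply(hourly$StepTotal, list(hourly$day, hourly$Hour), mean)

print(mean_steps)
```

A matrix like `mean_steps` can be passed as `z` to a heatmap, with `colnames(mean_steps)` as `x` and `rownames(mean_steps)` as `y`.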

In [None]:
fig_8 <- plot_ly(
    x = hourly_final$Hour, y=hourly_final$day,
    z = hourly_final$Calories, type = "heatmap"
) %>% layout(title="Calories throughout the week")


htmlwidgets::saveWidget(fig_8, "fig_8.html")
display_html('<iframe src="./fig_8.html" width=100% height=400></iframe>')

During the week, the most active time is between __9__ and __17__. <br />
On Friday at __14__ there is one spot with a high calorie count of __843__. <br />
Wednesday at __17__ has the second highest calorie amount with __789__. <br />
Saturday has two high spots at __12__ and __13__ with __602__ and __705__. <br />


In [None]:
fig_9 <- plot_ly(
    x = hourly_final$Hour, y=hourly_final$day,
    z = hourly_final$TotalIntensity, type = "heatmap"
) %>% layout(title="Intensity throughout the week")


htmlwidgets::saveWidget(fig_9, "fig_9.html")
display_html('<iframe src="./fig_9.html" width=100% height=400></iframe>')

During the week, the most active time is between __9__ and __18__. <br />
On Friday at __14__ there is one spot with a high intensity of __176__. <br />
Wednesday at __17__ also stands out with an intensity of __168__. <br />
Saturday has two high spots at __12__ and __13__ with __137__ and __180__. <br />

The "intense" spots correlate with the calorie spots during the week.

# __Summary__

__Findings__:
- Users take around 5000 steps per day, which is half of the 10000 recommended by the CDC [5](https://www.cdc.gov/diabetes/prevention/pdf/postcurriculum_session8.pdf).
- Total step count correlates with calories.
- Sleep amount shows no influence on step count or calories.
- Users are mostly either asleep or sedentary.
- The activity log is rarely used.
- Users are most active between 9 and 17.
- Saturday is the day with the highest average step count, whereas Sunday is the day with the lowest.

__Recommendations__:

1. As we saw, people are most active during the week between 9 and 17, and especially after 16, which may be because many people work from 9 to 16.
2. Bellabeat should recommend, or add a feature that prompts, users to get up and walk for a few minutes to increase their daily step count. The goal should be to hit 10000 steps.
3. Another feature could be challenges for users, such as "walk through a park nearby today" or "visit a museum this Sunday".
4. A social component that lets users challenge their friends to beat their step count could also increase step counts and interaction with the smartwatch.

#### Source

- 1 Device trends of tomorrow https://www.gwi.com/reports/device
- 2 Mordor Intelligence smart wearables market https://www.mordorintelligence.com/industry-reports/smart-wearables-market
- 3 ISPO.com  The Six Latest Trends in Smartwatches https://www.ispo.com/en/news-sports-experts/six-latest-trends-smartwatches
- 4 GCFLearnFree.org The future of wearable technology https://edu.gcfglobal.org/en/wearables/the-future-of-wearable-technology/1/
- 5 CDC https://www.cdc.gov/diabetes/prevention/pdf/postcurriculum_session8.pdf