# **Bellabeat Fitbit Tracker**

### This case study was done using R

### Introduction: 
Bellabeat is a health-focused products company aimed at women. Urska Srsen, the co-founder and Chief Creative Officer of Bellabeat, wants to analyze smart device fitness data to help create new growth opportunities for the company.

### Questions:

1. What are some trends in smart device usage?
2. How could these trends apply to Bellabeat customers?
3. How could these trends help influence Bellabeat marketing strategy?

### Business Task: 

Identify possible opportunities for growth and recommendations for Bellabeat to improve its marketing strategy based on current trends in smart device usage. 

### Import Packages: 

In [133]:
library(tidyverse)
library(tidyr)
library(lubridate)
library(dplyr)
library(ggplot2)

### Data:

The dataset is from Kaggle and contains multiple types of data collected about each user, including daily calories, minutes asleep, hourly calories, and more. This data will be the basis for helping Bellabeat make useful business decisions involving the Fitbit.

Direct link to original code:
https://www.kaggle.com/datasets/arashnic/fitbit/code

### Importing Data:

In [134]:
activity <- read.csv("../input/fitbit/Fitabase Data 4.12.16-5.12.16/dailyActivity_merged.csv")
calories <- read.csv("../input/fitbit/Fitabase Data 4.12.16-5.12.16/dailyCalories_merged.csv")
intensities <- read.csv("../input/fitbit/Fitabase Data 4.12.16-5.12.16/dailyIntensities_merged.csv")
steps <- read.csv("../input/fitbit/Fitabase Data 4.12.16-5.12.16/dailySteps_merged.csv")
heartrate <- read.csv("../input/fitbit/Fitabase Data 4.12.16-5.12.16/heartrate_seconds_merged.csv")
sleep <- read.csv("../input/fitbit/Fitabase Data 4.12.16-5.12.16/sleepDay_merged.csv")
weight <- read.csv("../input/fitbit/Fitabase Data 4.12.16-5.12.16/weightLogInfo_merged.csv")
intense <- read.csv("../input/fitbit/Fitabase Data 4.12.16-5.12.16/hourlyIntensities_merged.csv")


In [135]:
#checking data
head(calories)

In [136]:
head(activity)

In [137]:
head(intensities)

In [138]:
head(steps)

In [139]:
head(heartrate)

In [140]:
head(sleep)

In [141]:
head(weight)

In [142]:
head(intense)

After looking at each piece of the data shown here, I can see that the date and time columns need to be formatted correctly, so I will clean the data next.

### Cleaning Data/Formatting

In [143]:
#activity
activity$ActivityDate= as.POSIXct(activity$ActivityDate, format = "%m/%d/%Y", tz=Sys.timezone())
activity$date <- format(activity$ActivityDate, format = "%m/%d/%Y")

#calories
calories$ActivityDay= as.POSIXct(calories$ActivityDay, format = "%m/%d/%Y", tz=Sys.timezone())
calories$date <- format(calories$ActivityDay, format = "%m/%d/%Y")

#intense (hourly)
intense$ActivityHour= as.POSIXct(intense$ActivityHour, format = "%m/%d/%Y %I:%M:%S %p", tz=Sys.timezone())
intense$date <- format(intense$ActivityHour, format = "%m/%d/%Y")
intense$time <- format(intense$ActivityHour, format = "%H:%M:%S")

#steps
steps$ActivityDay= as.POSIXct(steps$ActivityDay, format = "%m/%d/%Y", tz=Sys.timezone())
steps$date <- format(steps$ActivityDay, format = "%m/%d/%Y")

#heartrate
# Time uses a 12-hour clock with an AM/PM marker, so parse it with %I and %p
heartrate$Time= as.POSIXct(heartrate$Time, format = "%m/%d/%Y %I:%M:%S %p", tz=Sys.timezone())
heartrate$date <- format(heartrate$Time, format = "%m/%d/%Y %H:%M:%S")

#sleep
sleep$SleepDay= as.POSIXct(sleep$SleepDay, format = "%m/%d/%Y %I:%M:%S %p", tz=Sys.timezone())
sleep$date <- format(sleep$SleepDay, format = "%m/%d/%Y")

#weight
# Date uses a 12-hour clock with an AM/PM marker, so parse it with %I and %p
weight$Date= as.POSIXct(weight$Date, format = "%m/%d/%Y %I:%M:%S %p", tz=Sys.timezone())
weight$date <- format(weight$Date, format = "%m/%d/%Y %H:%M:%S")

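A quick way to check conversions like the ones above is to count the values that failed to parse, since `as.POSIXct` returns `NA` for strings that do not match the given format. The sketch below uses made-up timestamps (not rows from the dataset) and `tz = "UTC"` purely to keep the result reproducible. It also assumes an English-style locale, because `%p` is locale-dependent.

```r
# Toy timestamps in a 12-hour style (hypothetical values, for illustration only)
raw <- c("4/12/2016 7:21:20 AM", "4/12/2016 7:21:20 PM", "not a date")

# %I is the hour on a 12-hour clock and %p is the AM/PM marker
parsed <- as.POSIXct(raw, format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")

# Strings that do not match the format come back as NA
n_failed <- sum(is.na(parsed))

# Extract the hour on a 24-hour clock to confirm AM and PM were told apart
hours <- format(parsed, "%H")

print(n_failed)
print(hours)
```

If the count of failures is greater than zero on a real column, the format string probably does not match how that file stores its dates.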

In [144]:
#Determine how many participants are in each of the datasets 

n_distinct(activity$Id)
n_distinct(calories$Id)
n_distinct(intense$Id)
n_distinct(steps$Id)
n_distinct(heartrate$Id)
n_distinct(sleep$Id)
n_distinct(weight$Id)

Because so few participants logged heart rate and weight, those two datasets are too small to give an accurate picture and will not be used for recommendations.
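
To compare participation side by side, the distinct-participant counts can be collected into one named vector. This is a minimal base R sketch on toy data frames; the `Id` values and the `datasets` list are hypothetical stand-ins for the real tables, and `length(unique(...))` is used here as a base R equivalent of `n_distinct()`.

```r
# Toy stand-ins for two of the datasets (made-up Ids, not real participants)
datasets <- list(
  activity = data.frame(Id = c(1, 1, 2, 3)),
  weight   = data.frame(Id = c(1, 1, 1))
)

# Count distinct participants in each dataset
participants <- sapply(datasets, function(df) length(unique(df$Id)))

# Share of the largest group that each dataset covers
coverage <- participants / max(participants)

print(participants)
print(round(coverage, 2))
```

A dataset with low coverage relative to the others is a candidate for exclusion, which is the judgment applied to heart rate and weight here.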

In [145]:
#I will exclude heartrate and weight and will continue with the other categories
#I will summarize the data next

#activity per day
activity %>% select(TotalSteps, TotalDistance) %>% 
summary()

#compare different types of active levels per day
activity %>% select(VeryActiveMinutes, LightlyActiveMinutes, SedentaryMinutes) %>% 
summary()

#calories per day
calories %>% select(Calories) %>% 
summary()

#intense 
intense %>% select(TotalIntensity, AverageIntensity) %>%
summary()

#steps
steps %>% select(StepTotal) %>%
summary()

#sleep
sleep %>% select(TotalMinutesAsleep, TotalTimeInBed) %>%
summary()

#intense
intense %>% select(ActivityHour, TotalIntensity) %>%
summary()




There are a few things to note in these summaries:
1. The average sleep time is about 7 hours, which is a healthy amount.
2. The average time in bed is 7.64 hours, so participants spend about 38 minutes in bed awake, either before falling asleep or before getting up.
3. There is an outlier: one participant who was lying in bed (not asleep the whole time) for 16 hours.
4. Most participants cover their greatest distance during "Light Activity".
5. The average calories burned per day is 2304.
6. Looking at the minutes spent at each activity level, the average time at the "sedentary" level is 16.5 hours per day.
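
The gap between time in bed and time asleep can also be computed per record rather than read off the two averages. The sketch below uses a small made-up data frame with the same column names as `sleep`; the values are illustrative only, and `MinutesAwakeInBed` is a hypothetical helper column.

```r
# Toy sleep records (made-up values) using the column names from the sleep dataset
sleep_toy <- data.frame(
  TotalMinutesAsleep = c(400, 420, 440),
  TotalTimeInBed     = c(440, 455, 480)
)

# Minutes spent in bed but not asleep, for each record
sleep_toy$MinutesAwakeInBed <- sleep_toy$TotalTimeInBed - sleep_toy$TotalMinutesAsleep

# Average minutes awake in bed, and average sleep converted to hours
mean_awake        <- mean(sleep_toy$MinutesAwakeInBed)
mean_asleep_hours <- mean(sleep_toy$TotalMinutesAsleep) / 60

print(mean_awake)
print(mean_asleep_hours)
```

Working per record also makes it easy to spot outliers, such as a night with an unusually large gap between the two columns.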


After looking at the prepared data, I realized that to get all of the information into one dataset I need to merge the activity and sleep datasets.

In [146]:
total <- merge(activity,sleep,by=c("Id", "date"))
head(total)
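
One thing worth keeping in mind is that `merge()` performs an inner join by default, so any activity row without a matching sleep record for the same `Id` and `date` is dropped. The sketch below illustrates this on made-up data; the `_toy` data frames are hypothetical, and `all.x = TRUE` is shown only as an alternative, not as what this analysis uses.

```r
# Toy tables (made-up values) keyed by Id and date
activity_toy <- data.frame(
  Id         = c(1, 1, 2),
  date       = c("04/12/2016", "04/13/2016", "04/12/2016"),
  TotalSteps = c(5000, 7000, 3000)
)
sleep_toy <- data.frame(
  Id                 = c(1, 2),
  date               = c("04/12/2016", "04/12/2016"),
  TotalMinutesAsleep = c(420, 390)
)

# Default merge: only Id/date pairs present in both tables are kept
inner <- merge(activity_toy, sleep_toy, by = c("Id", "date"))

# all.x = TRUE keeps every activity row and fills missing sleep values with NA
left <- merge(activity_toy, sleep_toy, by = c("Id", "date"), all.x = TRUE)

# Number of activity rows lost in the inner join
dropped <- nrow(activity_toy) - nrow(inner)

print(nrow(inner))
print(nrow(left))
print(dropped)
```

Comparing `nrow()` before and after a merge is a cheap way to see how much data the join discards.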

I now want to drop ActivityDate and SleepDay, since the two tables have been combined using "date".

In [147]:
total = subset(total, select = -c(ActivityDate,SleepDay) )
head(total)

### Visualizations

Now that I have removed the redundant date columns and cleaned and formatted the data, it is time to visualize it.

In [148]:
#scatterplot of data to find correlation 

gg <- ggplot(total, aes(x=TotalSteps, y=Calories)) + 
  geom_point() + geom_smooth() + theme(plot.title = element_text(hjust = 0.5)) + theme(plot.subtitle = element_text(hjust = 0.5)) +
  labs(subtitle="Total Steps vs Calories", 
       y="Calories", 
       x="Total Steps", 
       title="Scatterplot", 
       caption = "Source: total dataset") 

plot(gg)

There appears to be a positive correlation between total steps and calories burned. From this we can conclude that **the more steps taken, the higher the calorie count**.
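
A scatterplot shows the direction of a relationship, and a correlation coefficient can put a number on it. The sketch below applies base R's `cor()` (Pearson by default) to a made-up data frame with the same column names as `total`; the values are illustrative and not taken from the dataset.

```r
# Toy daily records (made-up values) using column names from the merged data
total_toy <- data.frame(
  TotalSteps = c(2000, 4000, 6000, 8000, 10000),
  Calories   = c(1700, 1900, 2150, 2300, 2600)
)

# Pearson correlation: values near 1 mean a strong positive linear relationship,
# values near -1 a strong negative one, and values near 0 little linear relationship
r <- cor(total_toy$TotalSteps, total_toy$Calories)

print(round(r, 3))
```

The same call can be pointed at any pair of numeric columns, for example `cor(total$TotalSteps, total$Calories)`. A high coefficient still describes association only and does not by itself show that one variable causes the other.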

In [149]:
#scatterplot of data to find correlation 

gg <- ggplot(total, aes(x=TotalMinutesAsleep, y=TotalTimeInBed)) + 
  geom_point() + geom_smooth() + theme(plot.title = element_text(hjust = 0.5)) + theme(plot.subtitle = element_text(hjust = 0.5)) +
  labs(subtitle="Total Minutes Asleep vs Total Time in Bed", 
       y="Total Time in Bed", 
       x="Total Minutes Asleep", 
       title="Scatterplot", 
       caption = "Source: total dataset") 

plot(gg)

This scatterplot shows a positive correlation: as total minutes asleep increase, so does total time in bed. This suggests that to get more rest at night or improve your hours of sleep, you **should set a time to get into bed**.

In [150]:
#scatterplot of data to find correlation 

gg <- ggplot(total, aes(x=SedentaryMinutes, y=Calories)) + 
  geom_point() + geom_smooth() + theme(plot.title = element_text(hjust = 0.5)) + theme(plot.subtitle = element_text(hjust = 0.5)) +
  labs(subtitle="Sedentary Minutes vs Calories", 
       y="Calories", 
       x="Sedentary Minutes (per day)", 
       title="Scatterplot", 
       caption = "Source: total dataset") 

plot(gg)

Calories and sedentary minutes do not show a correlation, so this pair will not be used for analysis. Instead, I will try the same thing with sedentary minutes and total minutes asleep.

In [151]:
#scatterplot of data to find correlation 

gg <- ggplot(total, aes(x=SedentaryMinutes, y=TotalMinutesAsleep)) + 
  geom_point() + geom_smooth() + theme(plot.title = element_text(hjust = 0.5)) + theme(plot.subtitle = element_text(hjust = 0.5)) +
  labs(subtitle="Sedentary Minutes vs Total Minutes Asleep", 
       y="Total Minutes Asleep", 
       x="Sedentary Minutes (per day)", 
       title="Scatterplot", 
       caption = "Source: total dataset") 

plot(gg)

This shows a negative correlation: as sedentary minutes during the day increase, total minutes of sleep decrease. **Having a reminder to get up and be active throughout the day can potentially help improve your sleep**.

In [166]:
bp <- intense %>%
  group_by(time) %>%
  drop_na() %>%
  summarise(mean_bp = mean(TotalIntensity))

ggplot(data=bp, aes(x=time, y=mean_bp)) + geom_col(fill='gray') + theme(plot.title = element_text(hjust = 0.5)) +
  theme(axis.text.x = element_text(angle = 90)) +
  labs(title="Average Total Intensity vs. Time", y="Total Intensity Level", 
       x="Time (military time)")


The bar graph shows that average intensity peaks between 5 pm and 7 pm, with a dip at 3 pm. **This can help participants know when to increase their activity levels**, whether by going for a walk or by moving around to maintain their intensity level throughout the day.
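
The peak hour can also be found programmatically instead of being read off the chart. The sketch below is a base R version on made-up hourly data; `intense_toy` and its values are hypothetical, and `aggregate()` stands in for the `group_by()` and `summarise()` steps used in the cell above.

```r
# Toy hourly records (made-up values) using column names from the hourly data
intense_toy <- data.frame(
  time           = c("15:00:00", "15:00:00", "17:00:00", "17:00:00", "18:00:00", "18:00:00"),
  TotalIntensity = c(8, 10, 20, 24, 21, 22)
)

# Mean intensity for each time of day
hourly <- aggregate(TotalIntensity ~ time, data = intense_toy, FUN = mean)

# Sort so the most intense hour comes first
hourly <- hourly[order(-hourly$TotalIntensity), ]
peak_time <- as.character(hourly$time[1])

print(hourly)
print(peak_time)
```

Sorting the hourly means like this gives a ranked list of active hours, which could feed directly into the timing of reminders.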

### Conclusion: 

Bellabeat has become a global manufacturing company focused on women-centered health products to track stress, sleep, menstrual cycle, and more. After cleaning, formatting, and analyzing the data from the smart device, I have a few recommendations that can help improve Bellabeat's marketing strategy.

As a company, we at Bellabeat want to help women build healthier and happier lifestyles. A smart device can track different components of the client's lifestyle and interpret them to give recommendations, such as setting a bedtime or incrementally increasing the steps taken throughout the day.

As a woman myself, I want the option of creating a lifestyle plan that fits my needs. The Bellabeat smart device could feed into the app, alert the client to potentially harmful habits (e.g., increased sedentary minutes or less sleep per night), and suggest ways to get back on the right path.

### Recommendations (for the app): 

1. Create a bedtime alert to help clients with healthy sleeping habits
2. Alert clients who have been sedentary for a certain time to get up and walk or move 
3. Since the most active hours are between 5 and 7 pm, send a friendly reminder to work out or move around during that timeframe
4. Have the client set a goal of total steps or distance to increase calories burned and decrease sedentary minutes per day
5. Extra: Create daily competitions of steps to achieve (if a client wants/needs motivation)