# **Case Study: How Can a Wellness Company Play It Smart?**

For this case study, I am a junior data analyst working with the marketing team at Bellabeat, a wellness company and high-tech manufacturer of health-focused products for women. I am looking into current smart device usage trends and how these trends can apply to Bellabeat customers.

The data comes from the FitBit Fitness Tracker dataset, a public dataset made available through Mobius on Kaggle (https://www.kaggle.com/arashnic/fitbit). My objective is to give Bellabeat information that supports its marketing decisions, along with high-level recommendations.

# **Goals**

The goals of my analysis are as follows:

1. Evaluate how customers are using their fitness trackers.

2. Look into which trends, if any, can apply to Bellabeat customers.

3. Bring any relevant data to Bellabeat that assists in the marketing of their smart devices. 

# **Preparations**

The dataset covers a small group of 30 Fitbit users who consented to the submission of personal tracker data, including minute-level output for physical activity, heart rate, and sleep monitoring. It includes information about daily activity, steps, and heart rate that can be used to explore users' habits. The data was collected between March and May 2016 and may not be representative of all users, given the small sample size.

After downloading the dataset from Kaggle, I unzipped and saved the files to a folder on my computer. There were 18 CSV files in total, but I chose to work with the following 6 for analysis. All of them are in long format.

* dailyActivity_merged.csv
* dailyCalories_merged.csv
* dailyIntensities_merged.csv
* dailySteps_merged.csv
* sleepDay_merged.csv
* weightLogInfo_merged.csv



## Loading libraries

In [None]:
library(tidyverse)
library(dplyr)
library(readr)
library(readxl)
library(lubridate)

I began by importing the files I needed into RStudio. I also cleaned up the names to make them easier to read.

# **Importing files**

In [None]:
daily_activity <- read_csv("../input/fitbit/Fitabase Data 4.12.16-5.12.16/dailyActivity_merged.csv")
daily_calories <- read_csv("../input/fitbit/Fitabase Data 4.12.16-5.12.16/dailyCalories_merged.csv")
daily_intensities <- read_csv("../input/fitbit/Fitabase Data 4.12.16-5.12.16/dailyIntensities_merged.csv")
daily_steps <- read_csv("../input/fitbit/Fitabase Data 4.12.16-5.12.16/dailySteps_merged.csv")
daily_sleep <- read_csv("../input/fitbit/Fitabase Data 4.12.16-5.12.16/sleepDay_merged.csv")
daily_weight <- read_csv("../input/fitbit/Fitabase Data 4.12.16-5.12.16/weightLogInfo_merged.csv")

# **Data Inspection**

In [None]:
head(daily_activity)


In [None]:
head(daily_intensities)

In [None]:
head(daily_calories)

In [None]:
head(daily_steps)

In [None]:
head(daily_weight)

In [None]:
head(daily_sleep)

## Glimpse through datasets

In [None]:
glimpse(daily_activity)

In [None]:
glimpse(daily_intensities)

In [None]:
glimpse(daily_calories)

In [None]:
glimpse(daily_steps)

In [None]:
glimpse(daily_weight)

In [None]:
glimpse(daily_sleep)

# **Cleaning the data**
- Counting the distinct Ids in each selected dataset
- Checking for duplicates
- Checking for any "NA" values
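
The three checks above are repeated for every data frame, so they could be bundled into one helper. This is only a minimal sketch: `check_data` and the toy data frame `toy` are hypothetical names made up for illustration, not part of the original notebook, and the helper assumes the data frame has an `Id` column.

```r
# Hypothetical helper (not in the original notebook): runs the three checks
# on any data frame that has an Id column.
check_data <- function(df) {
  data.frame(
    distinct_ids   = length(unique(df$Id)),  # number of distinct users
    duplicate_rows = sum(duplicated(df)),    # fully duplicated rows
    has_na         = any(is.na(df))          # TRUE if any value is missing
  )
}

# Toy data for illustration only: 2 users, 1 duplicated row, 1 missing value
toy <- data.frame(
  Id    = c(1, 1, 2, 2),
  Steps = c(100, 100, NA, 250)
)
toy_checks <- check_data(toy)
print(toy_checks)
```

On the real data, the same call would take one of the imported data frames, for example `check_data(daily_activity)`.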

In [None]:
#daily_activity
n_distinct(daily_activity$Id)
sum(duplicated(daily_activity))
any(is.na(daily_activity))

In [None]:
#daily_intensities
n_distinct(daily_intensities$Id)
sum(duplicated(daily_intensities))
any(is.na(daily_intensities))

In [None]:
#daily_calories
n_distinct(daily_calories$Id)
sum(duplicated(daily_calories))
any(is.na(daily_calories))

In [None]:
#daily_steps
n_distinct(daily_steps$Id)
sum(duplicated(daily_steps))
any(is.na(daily_steps))

In [None]:
#daily_weight
n_distinct(daily_weight$Id)
sum(duplicated(daily_weight))
any(is.na(daily_weight))

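#drop_na() returns a new data frame, so assign the result to keep it
#(this removes every row that has an NA in any column)
daily_weight <- drop_na(daily_weight)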
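
Because `drop_na()` removes every row with a missing value in any column, it can help to see first which columns the missing values sit in. The sketch below uses a made-up data frame: `toy_weight` and its values are hypothetical and are not taken from the dataset. It uses base R so that it runs without extra packages.

```r
# Toy data for illustration only (values are made up)
toy_weight <- data.frame(
  Id       = c(1, 2, 3),
  WeightKg = c(52.6, 61.0, 70.2),
  Fat      = c(22, NA, NA)
)

# Count missing values per column
na_per_column <- colSums(is.na(toy_weight))
print(na_per_column)

# Keep only rows with no missing values (base R counterpart of drop_na())
complete_rows <- toy_weight[complete.cases(toy_weight), ]
print(nrow(complete_rows))
```

If most of the missing values fall in a single column, dropping that column instead of the rows may keep more of the data.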

In [None]:
#daily_sleep
n_distinct(daily_sleep$Id)
sum(duplicated(daily_sleep))
any(is.na(daily_sleep))

daily_sleep <- distinct(daily_sleep) #remove duplicate rows
sum(duplicated(daily_sleep)) #check again for duplicates

### Let's drill down into the daily_activity dataset

In [None]:
#adding total minutes people used the FitBit
daily_activity$total_minutes <- (daily_activity$VeryActiveMinutes + daily_activity$FairlyActiveMinutes + daily_activity$LightlyActiveMinutes + daily_activity$SedentaryMinutes)

#changing date format
daily_activity$ActivityDate <- mdy(daily_activity$ActivityDate)

#adding day of week to dataframe
daily_activity$Days <- format(as.Date(daily_activity$ActivityDate), "%A")
daily_activity$Days <- ordered(daily_activity$Days, levels=c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"))

#checking daily_activity
head(daily_activity)

### Basic calculations for the daily_activity dataframe (mean, max, min)

In [None]:
mean(daily_activity$TotalSteps) #average of total steps of all users
mean(daily_activity$Calories) #average calories burned
mean(daily_activity$total_minutes) #average total minutes of all users

In [None]:
max(daily_activity$TotalSteps) #maximum total steps of all users
max(daily_activity$Calories)   #maximum calories 
max(daily_activity$total_minutes) #maximum total minutes of all users

In [None]:
min(daily_activity$TotalSteps) #minimum total steps for all users
min(daily_activity$Calories)   #minimum calories burned
min(daily_activity$total_minutes) #minimum total minutes of all users

### Checking total steps per day

In [None]:
aggregate(daily_activity$TotalSteps ~ daily_activity$Days, FUN = mean)

### You can see from the table above that Tuesday and Saturday are the two most active days. 

Next, I will create a new dataframe with the average steps per day alongside the average calories burned per day.

In [None]:
active_days <- daily_activity %>%
  group_by(Days) %>% 
  summarize(Total_steps=mean(TotalSteps),calories=mean(Calories))
View(active_days)

### Drilldown into usage per day

In [None]:
#usage per day dataframe
active <- daily_activity %>% 
  summarize(LightlyActive=sum(LightlyActiveMinutes),VeryActive=sum(VeryActiveMinutes),FairlyActive=sum(FairlyActiveMinutes),Sedentary=sum(SedentaryMinutes))

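#reshape the summary from wide to long format
active <- pivot_longer(active, cols = LightlyActive:Sedentary, names_to = "active_types", values_to = "value")

#share of all tracked minutes in each activity type
#(percentage is a proportion between 0 and 1; person holds the same share as a percent)
active <- active %>% 
  mutate(percentage=value/sum(daily_activity$total_minutes)) %>% 
  mutate(person = (percentage*100))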

head(active)

#### We can see that 81% of all tracked minutes fall under the Sedentary category
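
As a worked example of the calculation above, each share is one category's minutes divided by all tracked minutes, so it describes time spent rather than a count of users. The totals below are made up for illustration and are not the dataset's values.

```r
# Made-up minute totals for illustration only
minutes <- c(VeryActive = 24, FairlyActive = 12, LightlyActive = 192, Sedentary = 972)

# Share of all tracked minutes in each category, as a percentage
share <- round(minutes / sum(minutes) * 100, 1)
print(share)
```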

# Visualization

In [None]:
#bar graph of activity levels by type
ggplot(data = active) +
geom_col(mapping = aes(x = active_types, y = value, fill = active_types)) +
labs(title = "Activity Levels of Fitbit Users", x = "Activity Types", y = "Total Minutes")

In [None]:
#visualization of total steps against total calories
ggplot(data = daily_activity) +
geom_smooth(mapping = aes(x = Calories, y = TotalSteps)) +
geom_jitter(mapping = aes(x = Calories, y = TotalSteps)) +
labs(title = "Total Steps vs Total Calories")
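
To put a number on the relationship between steps and calories, one option is the Pearson correlation coefficient from base R's `cor()`. This is a sketch only: `toy_activity` and its values are hypothetical and are not taken from the Fitbit dataset.

```r
# Toy data for illustration only (not from the Fitbit dataset)
toy_activity <- data.frame(
  TotalSteps = c(2000, 5000, 8000, 11000, 14000),
  Calories   = c(1600, 1900, 2300, 2600, 3100)
)

# Pearson correlation: values near 1 indicate a strong positive linear relationship
steps_calories_cor <- cor(toy_activity$TotalSteps, toy_activity$Calories)
print(steps_calories_cor)
```

On the real data, the equivalent call would be `cor(daily_activity$TotalSteps, daily_activity$Calories)`.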

In [None]:
#visualization of most active days of week
ggplot(active_days) + 
geom_col(mapping = aes(x = Days, y = Total_steps, fill = Days)) + 
labs(title = "Activity by Days of Week", x = "Days", y = "Average Steps")

# Act Phase

#### Conclusion

From the data above we can see that:

1. Saturday and Tuesday are the most active days of this group.

2. These FitBit users spent 81% of their time being sedentary.

3. As shown in the graph, the more steps taken by users, the more calories burned, with a few outliers.



#### Recommendation

1. Since the majority of the time spent wearing the FitBit was without activity (sedentary), I would recommend that Bellabeat add notifications or alerts that prompt users to get up and move after a period of inactivity.

2. The most active days are Tuesday and Saturday, so Bellabeat could set goals or offer rewards for activity on the days in between, for example at the start of the week to kick things off and before the weekend to keep up momentum.

3. On average, the users in this group did not reach the CDC-recommended 10,000 steps a day on any day of the week (the highest daily average was around 8,000). Bellabeat should make this a priority goal and offer rewards or competitions between users and friends to encourage movement and help users reach their step targets.
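
One way to check how often a step target is actually met is to compute the share of recorded days at or above it, along with each user's average. This is a sketch under stated assumptions: `toy_steps`, its values, and the `step_goal` variable are hypothetical and are not taken from the dataset.

```r
# Toy data for illustration only: daily step counts for two hypothetical users
toy_steps <- data.frame(
  Id         = c(1, 1, 1, 2, 2, 2),
  TotalSteps = c(12000, 8000, 9500, 4000, 10000, 7000)
)

step_goal <- 10000  # assumed daily target, as discussed in the recommendation

# Share of all recorded days that met the goal
share_days_met <- mean(toy_steps$TotalSteps >= step_goal)

# Average daily steps per user
avg_steps_by_user <- aggregate(TotalSteps ~ Id, data = toy_steps, FUN = mean)

print(share_days_met)
print(avg_steps_by_user)
```

Run against `daily_activity`, the same two calculations would show how many user-days and which users fall short of the target.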