# About the company

Bellabeat is a high-tech women's health and fitness company. It is small but successful, and has the potential to gain market share in the global smart device market. Urška Sršen, a cofounder, believes smart device fitness data is the key to new growth opportunities in the industry.

# Analysis questions

1. What are some trends in smart device usage? 
2. How could these trends apply to Bellabeat customers? 
3. How could these trends help influence Bellabeat marketing strategy?

# Business task

Identify potential growth opportunities, and make recommendations for company improvement based on data trends in smart device usage.

# Packages loaded


In [None]:
library(tidyverse)
library(lubridate)
library(dplyr)
library(ggplot2)
library(tidyr)

# Imported datasets
For my smart device data analysis I used the FitBit Fitness Tracker [Data](https://www.kaggle.com/arashnic/fitbit).

In [None]:
activity <- read.csv("../input/fitbit/Fitabase Data 4.12.16-5.12.16/dailyActivity_merged.csv")
calories <- read.csv("../input/fitbit/Fitabase Data 4.12.16-5.12.16/hourlyCalories_merged.csv")
intensities <- read.csv("../input/fitbit/Fitabase Data 4.12.16-5.12.16/hourlyIntensities_merged.csv")
sleep <- read.csv("../input/fitbit/Fitabase Data 4.12.16-5.12.16/sleepDay_merged.csv")
weight <- read.csv("../input/fitbit/Fitabase Data 4.12.16-5.12.16/weightLogInfo_merged.csv")

I first inspected the data in Google Sheets, then used the View() and head() functions in R to confirm it was imported correctly.

In [None]:
head(activity)

Before the analysis, I noticed that the timestamps had been imported as text rather than as date-time values. I converted them to *date time* format, then split them into separate date and time columns.

# Formatting fix

In [None]:
# intensities
intensities$ActivityHour <- as.POSIXct(intensities$ActivityHour, format = "%m/%d/%Y %I:%M:%S %p", tz = Sys.timezone())
intensities$time <- format(intensities$ActivityHour, format = "%H:%M:%S")
intensities$date <- format(intensities$ActivityHour, format = "%m/%d/%y")
# calories
calories$ActivityHour <- as.POSIXct(calories$ActivityHour, format = "%m/%d/%Y %I:%M:%S %p", tz = Sys.timezone())
calories$time <- format(calories$ActivityHour, format = "%H:%M:%S")
calories$date <- format(calories$ActivityHour, format = "%m/%d/%y")
# activity
activity$ActivityDate <- as.POSIXct(activity$ActivityDate, format = "%m/%d/%Y", tz = Sys.timezone())
activity$date <- format(activity$ActivityDate, format = "%m/%d/%y")
# sleep
sleep$SleepDay <- as.POSIXct(sleep$SleepDay, format = "%m/%d/%Y %I:%M:%S %p", tz = Sys.timezone())
sleep$date <- format(sleep$SleepDay, format = "%m/%d/%y")
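To make the format strings above easier to follow, here is a minimal sketch that parses one timestamp and splits it the same way. The sample string is made up for illustration, and `tz = "UTC"` is an assumption used only so the result does not depend on the machine's time zone. Note that `%p` (the AM/PM marker) is locale-dependent, so this assumes an English-style locale.

```r
# A made-up timestamp in the same layout as the ActivityHour column
raw_stamp <- "4/12/2016 1:00:00 PM"

# %I is the hour on a 12-hour clock and %p is the AM/PM marker
parsed <- as.POSIXct(raw_stamp, format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")

# Split the parsed value into a 24-hour time string and a short date string
time_part <- format(parsed, format = "%H:%M:%S")
date_part <- format(parsed, format = "%m/%d/%y")

print(c(time = time_part, date = date_part))
```

If the format string does not match the text, `as.POSIXct()` returns `NA` rather than raising an error, so it is worth checking for unexpected `NA` values after the conversion.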



# Exploration and summary

In [None]:
n_distinct(activity$Id)
n_distinct(calories$Id)
n_distinct(intensities$Id)
n_distinct(sleep$Id)
n_distinct(weight$Id)

This gives us the number of participants in the respective data sets. 

The activity, calories, and intensities data sets each contain 33 participants, the sleep data set contains 24, and the weight data set contains 8. Eight participants is too small a sample to support reliable recommendations, so I did not draw conclusions from the weight data.
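The same counts can be collected in a single named vector, which makes the sample sizes easier to compare side by side. This is a minimal sketch in base R; the `toy_sets` list and its values are made up for illustration and stand in for the real data frames.

```r
# Toy stand-ins for the real data frames; only the Id column matters here
toy_sets <- list(
  activity = data.frame(Id = c(1, 1, 2, 3)),
  sleep    = data.frame(Id = c(1, 2, 2)),
  weight   = data.frame(Id = c(3))
)

# Count distinct participant Ids per data set
# (length(unique(x)) is the base R counterpart of n_distinct(x))
participants <- sapply(toy_sets, function(df) length(unique(df$Id)))

print(participants)
```

With the real data, the list would hold `activity`, `calories`, `intensities`, `sleep`, and `weight` instead of the toy frames.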

Here we have the summary statistics of each data set:

In [None]:
# activity
activity %>%  
  select(TotalSteps,
         TotalDistance,
         SedentaryMinutes, Calories) %>%
  summary()

# explore num of active minutes per category
activity %>%
  select(VeryActiveMinutes, FairlyActiveMinutes, LightlyActiveMinutes) %>%
  summary()

# calories
calories %>%
  select(Calories) %>%
  summary()
# sleep
sleep %>%
  select(TotalSleepRecords, TotalMinutesAsleep, TotalTimeInBed) %>%
  summary()
# weight
weight %>%
  select(WeightKg, BMI) %>%
  summary()

**A few interesting insights uncovered:**

* The average sedentary time is 991 minutes, or about 16.5 hours a day. This figure needs to be reduced. Helping users cut their sedentary time could also lead to more positive customer engagement.

* The majority of users tracked were in the 'lightly active' category.

* On average, the users tracked sleep once a day for about 7 hours.

* The average total steps per day are 7,638. This figure is close to, but ultimately short of, the 8,000 steps the CDC associates with health benefits. The CDC links 8,000 steps a day to a 51% lower risk of all-cause mortality compared with 4,000 steps a day.
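The unit conversion and the step shortfall behind these bullets can be checked with a few lines of arithmetic. This is a small sketch that uses the averages quoted above; the variable names are my own.

```r
mean_sedentary_min <- 991    # average sedentary minutes per day (from the summary)
mean_steps         <- 7638   # average total steps per day (from the summary)
step_target        <- 8000   # daily step figure cited from the CDC

sedentary_hours <- mean_sedentary_min / 60          # minutes to hours
step_gap        <- step_target - mean_steps         # steps short of the target
step_gap_pct    <- 100 * step_gap / step_target     # shortfall as a percentage

print(round(sedentary_hours, 1))
print(step_gap)
print(round(step_gap_pct, 1))
```

So the average user is roughly 360 steps a day short of the target, a gap of under 5%.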

# Merging data

I used an inner join to merge activity and sleep on the columns Id and date. The date column was created earlier, when formatting the timestamps. This prepared the data set for visualization.

In [None]:
merged_data <- merge(sleep, activity, by=c('Id', 'date'))
head(merged_data)
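Because `merge()` performs an inner join by default, any Id and date pair that appears in only one of the two data sets is dropped. The sketch below shows this on two toy data frames; all values are made up for illustration.

```r
# Toy sleep records: participant 1 has two nights, participant 2 has one
sleep_toy <- data.frame(
  Id = c(1, 1, 2),
  date = c("04/12/16", "04/13/16", "04/12/16"),
  TotalMinutesAsleep = c(400, 420, 380)
)

# Toy activity records: participant 3 has no sleep record at all
activity_toy <- data.frame(
  Id = c(1, 2, 3),
  date = c("04/12/16", "04/12/16", "04/12/16"),
  TotalSteps = c(9000, 7000, 5000)
)

# Inner join: only Id/date pairs present in both data frames are kept
merged_toy <- merge(sleep_toy, activity_toy, by = c("Id", "date"))

print(merged_toy)
print(nrow(merged_toy))
```

Comparing `nrow()` before and after the merge is a quick way to see how many records were lost. Passing `all = TRUE` to `merge()` would keep the unmatched rows and fill the missing columns with `NA`.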

# Data Visualization

In [None]:
ggplot(data=activity, aes(x=TotalSteps, y=Calories)) + 
  geom_point() + geom_smooth() + labs(title="Total Steps vs. Calories")

As expected, there is a positive correlation between Total Steps and Calories: the more steps a user takes, the more calories they burn.

In [None]:
ggplot(data=sleep, aes(x=TotalMinutesAsleep, y=TotalTimeInBed)) + 
  geom_point()+ labs(title="Total Minutes Asleep vs. Total Time in Bed")

Total Minutes Asleep and Total Time in Bed appear to be linearly correlated. **So to improve sleep, I'd recommend that Bellabeat consumers use sleep notifications.**


Here's the **intensities data** over time (hourly).

In [None]:
int_new <- intensities %>%
  group_by(time) %>%
  drop_na() %>%
  summarise(mean_total_int = mean(TotalIntensity))

ggplot(data=int_new, aes(x=time, y=mean_total_int)) + geom_col(fill='darkblue') +
  theme(axis.text.x = element_text(angle = 90)) +
  labs(title="Average Total Intensity vs. Time")

* Once I plotted Total Intensity hourly, I noticed people are more active between 5 am and 10 pm.

* The majority of activity happens from 5 pm to 7 pm. I assume this is because people exercise after work. **I recommend using this block of time to send exercise reminders via the Bellabeat app to motivate user activity.**
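The peak hours can also be read from the summarised table rather than from the chart alone. The following is a minimal base R sketch on toy data; the `toy_int` values are made up, and `aggregate()` stands in here for the `group_by()` and `summarise()` steps used above.

```r
# Toy hourly records: each row is one participant-hour (values are made up)
toy_int <- data.frame(
  time = c("06:00:00", "06:00:00", "12:00:00", "12:00:00", "18:00:00", "18:00:00"),
  TotalIntensity = c(4, 6, 14, 16, 20, 24)
)

# Mean intensity for each hour of the day
hourly_mean <- aggregate(TotalIntensity ~ time, data = toy_int, FUN = mean)

# Sort from the most active hour to the least active hour
hourly_mean <- hourly_mean[order(-hourly_mean$TotalIntensity), ]
peak_hour <- hourly_mean$time[1]

print(hourly_mean)
print(peak_hour)
```

Applied to `int_new`, the same ordering step would list the hours with the highest average intensity first.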

Total Minutes Asleep and Sedentary Minutes

In [None]:
ggplot(data=merged_data, aes(x=TotalMinutesAsleep, y=SedentaryMinutes)) + 
  geom_point(color='darkblue') + geom_smooth() +
  labs(title="Minutes Asleep vs. Sedentary Minutes")

* There is a clear negative correlation between Sedentary Minutes and sleep time. **I'd recommend reminding customers via the Bellabeat app that reducing sedentary time can improve their sleep.** This insight **does** need more data behind it, since correlation does not imply causation.
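A scatter plot shows the direction of a relationship, and a correlation coefficient puts a number on its strength. The sketch below uses base R's `cor()` and `cor.test()` on toy data. The values are made up and are not results from the FitBit data set; with the real data, the two columns would come from `merged_data`.

```r
# Toy data with made-up values, standing in for two columns of merged_data
toy <- data.frame(
  TotalMinutesAsleep = c(300, 360, 420, 480, 540),
  SedentaryMinutes   = c(900, 850, 760, 700, 640)
)

# Pearson correlation: -1 is a perfect negative linear relationship, 0 is none
r <- cor(toy$TotalMinutesAsleep, toy$SedentaryMinutes)

# cor.test() reports the same coefficient with a p-value and confidence interval
result <- cor.test(toy$TotalMinutesAsleep, toy$SedentaryMinutes)

print(round(r, 2))
print(result$p.value)
```

Even a strong coefficient only describes association. It does not show that reducing sedentary time causes longer sleep.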


# Stakeholder Recommendations

I analyzed the FitBit Fitness Tracker Data and found **insights that can benefit the strategic approach Bellabeat takes in its market.**


**Target audience**

Users who work full-time jobs (using hourly intensity data), and spend a lot of time sitting (using sedentary time data).

They do light activity to stay healthy (using activity type analysis), but these users need to increase everyday activity in order to have health benefits. They might need knowledge about healthy habits or motivation to get active. 

* As there is no gender information about the participants, I assumed that all genders were represented and balanced in this data set. 

**The key message for the Bellabeat online campaign**

The Bellabeat app is much more than a fitness app. 
It’s a guide to healthy habits, routines, and education for women that motivates through daily app reminders. 

**App Recommendations**

1. The average total steps per day are 7,638. This figure is close to, but ultimately short of, the 8,000 steps the CDC associates with health benefits. The CDC links 8,000 steps a day to a 51% lower risk of all-cause mortality. Bellabeat can advise people to take **at least 8,000 steps** a day to benefit their health.

2. If users want to lose weight, it’s probably a good idea to control daily calorie consumption. Bellabeat can suggest some ideas for low-calorie lunch and dinner.

3. Bellabeat should consider sending sleep notifications, and advising users on better sleep.

4. The majority of activity happens from 5 pm to 7 pm. I assume this is because people exercise after work. **I recommend using this block of time to send exercise reminders via the Bellabeat app to motivate user activity.**

5. As an idea: if users want to improve their sleep, the Bellabeat app can recommend reducing sedentary time.


**Thank you** for your interest in my Bellabeat case study!
