# Google Data Analytics Capstone Project: Bellabeat 

## Case Study: How can a Wellness Technology Company play it smart?

### Introduction

Welcome to the Bellabeat data analysis case study! In this case study, you will perform many real-world tasks of a junior data
analyst. You will imagine you are working for Bellabeat, a high-tech manufacturer of health-focused products for women, and
meet different characters and team members. In order to answer the key business questions, you will follow the steps of the
data analysis process: ask, prepare, process, analyze, share, and act. Along the way, the Case Study Roadmap tables —
including guiding questions and key tasks — will help you stay on the right path.

By the end of this lesson, you will have a portfolio-ready case study. Download the packet and reference the details of this case
study anytime. Then, when you begin your job hunt, your case study will be a tangible way to demonstrate your knowledge
and skills to potential employers.

### Scenario

You are a junior data analyst working on the marketing analyst team at Bellabeat, a high-tech manufacturer of health-focused
products for women. Bellabeat is a successful small company, but they have the potential to become a larger player in the
global smart device market. Urška Sršen, cofounder and Chief Creative Officer of Bellabeat, believes that analyzing smart
device fitness data could help unlock new growth opportunities for the company. You have been asked to focus on one of
Bellabeat’s products and analyze smart device data to gain insight into how consumers are using their smart devices. The
insights you discover will then help guide marketing strategy for the company. You will present your analysis to the Bellabeat
executive team along with your high-level recommendations for Bellabeat’s marketing strategy.

### Characters and products

● Characters

○ Urška Sršen: Bellabeat’s cofounder and Chief Creative Officer

○ Sando Mur: Mathematician and Bellabeat’s cofounder; key member of the Bellabeat executive team

○ Bellabeat marketing analytics team: A team of data analysts responsible for collecting, analyzing, and
reporting data that helps guide Bellabeat’s marketing strategy. You joined this team six months ago and have
been busy learning about Bellabeat’s mission and business goals, as well as how you, as a junior data analyst,
can help Bellabeat achieve them.

● Products

○ Bellabeat app: The Bellabeat app provides users with health data related to their activity, sleep, stress,
menstrual cycle, and mindfulness habits. This data can help users better understand their current habits and
make healthy decisions. The Bellabeat app connects to their line of smart wellness products.

○ Leaf: Bellabeat’s classic wellness tracker can be worn as a bracelet, necklace, or clip. The Leaf tracker connects
to the Bellabeat app to track activity, sleep, and stress.

○ Time: This wellness watch combines the timeless look of a classic timepiece with smart technology to track user
activity, sleep, and stress. The Time watch connects to the Bellabeat app to provide you with insights into your
daily wellness.

○ Spring: This is a water bottle that tracks daily water intake using smart technology to ensure that you are
appropriately hydrated throughout the day. The Spring bottle connects to the Bellabeat app to track your
hydration levels.

○ Bellabeat membership: Bellabeat also offers a subscription-based membership program for users.
Membership gives users 24/7 access to fully personalized guidance on nutrition, activity, sleep, health and
beauty, and mindfulness based on their lifestyle and goals.

### About the company

Urška Sršen and Sando Mur founded Bellabeat, a high-tech company that manufactures health-focused smart products.
Sršen used her background as an artist to develop beautifully designed technology that informs and inspires women around
the world. Collecting data on activity, sleep, stress, and reproductive health has allowed Bellabeat to empower women with
knowledge about their own health and habits. Since it was founded in 2013, Bellabeat has grown rapidly and quickly
positioned itself as a tech-driven wellness company for women.

By 2016, Bellabeat had opened offices around the world and launched multiple products. Bellabeat products became available
through a growing number of online retailers in addition to their own e-commerce channel on their website. The company
has invested in traditional advertising media, such as radio, out-of-home billboards, print, and television, but focuses on digital
marketing extensively. Bellabeat invests year-round in Google Search, maintains active Facebook and Instagram pages, and
consistently engages consumers on Twitter. Additionally, Bellabeat runs video ads on YouTube and display ads on the Google
Display Network to support campaigns around key marketing dates.

Sršen knows that an analysis of Bellabeat’s available consumer data would reveal more opportunities for growth. She has
asked the marketing analytics team to focus on a Bellabeat product and analyze smart device usage data in order to gain
insight into how people are already using their smart devices. Then, using this information, she would like high-level
recommendations for how these trends can inform Bellabeat marketing strategy.

## STEP 1: ASK

### Business task:

Analyze consumers' use of an existing competitor's smart devices to identify potential growth opportunities and inform recommendations for the Bellabeat marketing strategy.

### Questions for the analysis:

1. What are some trends in smart device usage?
2. How could these trends apply to Bellabeat customers?
3. How could these trends help influence Bellabeat’s marketing strategy?

### Key Stakeholders:

· Urška Sršen — Bellabeat’s cofounder and Chief Creative Officer

· Sando Mur — Mathematician and Bellabeat’s cofounder; key member of the Bellabeat executive team

· Bellabeat marketing analytics team — A team of data analysts responsible for collecting, analyzing, and reporting data that helps guide Bellabeat’s marketing strategy.

## STEP 2: PREPARE

The data for this analysis will come from FitBit Fitness Tracker Data on Kaggle. These 18 datasets were generated by respondents to a distributed survey via Amazon Mechanical Turk between 03.12.2016–05.12.2016. Thirty eligible Fitbit users consented to the submission of personal tracker data, including minute-level output for physical activity, heart rate, and sleep monitoring. Individual reports can be parsed by export session ID (column A) or timestamp (column B). Variation between output represents use of different types of Fitbit trackers and individual tracking behaviors / preferences.

This data is limited by its small sample size and by the absence of key participant characteristics such as gender, age, location, and lifestyle.

For this analysis, the datasets for daily activity, daily calories, daily intensities, daily steps, heart rate by seconds, minute METs, daily sleep, and weight log information will be used.

Because the datasets are large, RStudio was used to prepare, process, and analyze the data. Its many packages and data visualization features make it well suited to exploring data of this size.

In [62]:
# Setting up my environment

install.packages("tidyverse")
install.packages("here")
install.packages("janitor")
install.packages("skimr")
install.packages("dplyr")

In [5]:
# Loading the packages

library(tidyverse)
library(here)
library(janitor)
library(skimr)
library(dplyr)

### Import the datasets:

The CSV files were first opened in Excel, where the time and date formatting was changed from “custom” to “time” and/or “short date” where appropriate. The values in the distance columns were also rounded to two decimal places for consistency. The files were then imported into RStudio, and data frames were created with simplified names.

In [6]:
# Importing the datasets

daily_activity <- read_csv("../input/fitbit/Fitabase Data 4.12.16-5.12.16/dailyActivity_merged.csv")
daily_calories <- read_csv("../input/fitbit/Fitabase Data 4.12.16-5.12.16/dailyCalories_merged.csv")
daily_intensities <- read_csv("../input/fitbit/Fitabase Data 4.12.16-5.12.16/dailyIntensities_merged.csv")
daily_steps <- read_csv("../input/fitbit/Fitabase Data 4.12.16-5.12.16/dailySteps_merged.csv")
minute_METs <- read_csv("../input/fitbit/Fitabase Data 4.12.16-5.12.16/minuteMETsNarrow_merged.csv")
heart_rate_sec <- read_csv("../input/fitbit/Fitabase Data 4.12.16-5.12.16/heartrate_seconds_merged.csv")
sleep_day <- read_csv("../input/fitbit/Fitabase Data 4.12.16-5.12.16/sleepDay_merged.csv")
weight_log <- read_csv("../input/fitbit/Fitabase Data 4.12.16-5.12.16/weightLogInfo_merged.csv")
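
The Excel clean-up described above (date formatting and rounding the distance columns) could also be done in R after import, which keeps every step in one reproducible script. The sketch below is only an illustration on a hypothetical two-row sample, not the method used in this analysis. The sample values, and the assumption that the raw dates arrive as month/day/year text, are made up for the example.

```r
# Hypothetical sample standing in for a freshly imported data frame.
# Assumption: dates are month/day/year text and distances are unrounded.
raw <- data.frame(
  Id = c(1, 1),
  ActivityDate = c("4/12/2016", "4/13/2016"),
  TotalDistance = c(8.5000001907, 6.9699997902),
  stringsAsFactors = FALSE
)

cleaned <- raw

# Convert the text dates into proper Date values
cleaned$ActivityDate <- as.Date(raw$ActivityDate, format = "%m/%d/%Y")

# Round distances to two decimal places for consistency
cleaned$TotalDistance <- round(raw$TotalDistance, 2)

print(cleaned)
```

Doing this step in code rather than in a spreadsheet means the original CSV files stay untouched and the transformation can be rerun on new exports.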

## STEP 3: PROCESS

### View the data frames:

To confirm the data frames were imported correctly, the head() function was used. The colnames() and glimpse() functions were then used to explore the data frames and find common features.

In [7]:
# Viewing the dataframes

# daily_activity

head(daily_activity)

In [8]:
colnames(daily_activity)


In [37]:
glimpse(daily_activity)

In [9]:
# daily_calories

head(daily_calories)


In [10]:
colnames(daily_calories)


In [11]:
glimpse(daily_calories)

In [12]:
# daily_intensities

head(daily_intensities)


In [13]:
colnames(daily_intensities)


In [14]:
glimpse(daily_intensities)

In [15]:
# daily_steps

head(daily_steps)


In [16]:
colnames(daily_steps)


In [17]:
glimpse(daily_steps)

In [18]:
# minute_METs

head(minute_METs)



In [19]:
colnames(minute_METs)


In [20]:
glimpse(minute_METs)

In [21]:
# heart_rate_sec

head(heart_rate_sec)

In [22]:
colnames(heart_rate_sec)

In [23]:
glimpse(heart_rate_sec)

In [24]:
# sleep_day

head(sleep_day)

In [25]:
colnames(sleep_day)

In [26]:
glimpse(sleep_day)

In [27]:
# weight_log

head(weight_log)


In [28]:
colnames(weight_log)

In [29]:
glimpse(weight_log)
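
Comparing the colnames() outputs by eye works for eight data frames, but the shared columns can also be found programmatically. The following is a minimal sketch using hypothetical miniature frames; the column names and values are assumptions chosen for illustration, not output from the real data.

```r
# Hypothetical miniature frames; names and values are illustrative assumptions
frames <- list(
  daily_activity = data.frame(Id = c(1, 2), ActivityDate = c("4/12/2016", "4/12/2016"),
                              Calories = c(1985, 1432)),
  sleep_day = data.frame(Id = c(1, 2), SleepDay = c("4/12/2016", "4/12/2016"),
                         TotalMinutesAsleep = c(327, 384)),
  weight_log = data.frame(Id = c(1, 2), Date = c("5/2/2016", "5/3/2016"),
                          WeightPounds = c(115.96, 115.96))
)

# Intersect the column names of every frame to find the columns they all share
shared_columns <- Reduce(intersect, lapply(frames, colnames))

print(shared_columns)
```

With the real data frames, the same list could be built from the objects created at import, and any column returned here would be a candidate key for merging.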

### Removing data frames:

All eight data frames contain the “Id” column, so they can be merged (for example, with SQL) if needed. The daily_activity data frame appears to contain the data for calories, intensities, and steps. To use daily_activity in place of daily_calories, daily_intensities, and daily_steps, the number of observations must be the same and the observations must match for each Id.

We will load the sqldf package and use SQL syntax to determine whether the values of daily_calories, daily_intensities, and daily_steps are contained in daily_activity. Because INTERSECT requires both queries to return the same number of columns, a temporary data frame with only the relevant columns is created first.

In [30]:
# Installing sqldf to run SQL queries against the data frames

install.packages("sqldf")

In [31]:
library(sqldf)

In [32]:
daily_activity_2 <- daily_activity %>%
    select(Id, ActivityDate, Calories)

In [33]:
head(daily_activity_2)

In [34]:
sql_check <- sqldf('SELECT * FROM daily_activity_2 INTERSECT SELECT * FROM daily_calories')

In [35]:
head(sql_check)

In [36]:
nrow(daily_activity_2)

In [37]:
nrow(sql_check)

In [38]:
daily_activity_3 <- daily_activity %>% 
    select(Id, ActivityDate, SedentaryMinutes, LightlyActiveMinutes, FairlyActiveMinutes, VeryActiveMinutes, SedentaryActiveDistance, LightActiveDistance, ModeratelyActiveDistance, VeryActiveDistance)

In [39]:
head(daily_activity_3)

In [40]:
sql_check_2 <- sqldf('SELECT * FROM daily_activity_3 INTERSECT SELECT * FROM daily_intensities')

In [41]:
head(sql_check_2)

In [42]:
nrow(daily_activity_3)

In [43]:
nrow(sql_check_2)

In [44]:
daily_activity_4 <- daily_activity %>%
  select(Id, ActivityDate, TotalSteps)

In [45]:
head(daily_activity_4)

In [46]:
sql_check_3 <- sqldf('SELECT * FROM daily_activity_4 INTERSECT SELECT * FROM daily_steps')

In [47]:
head(sql_check_3)

In [48]:
nrow(daily_activity_4)

In [49]:
nrow(sql_check_3)

The outputs of the head() function for the temporary data frames match the outputs of the head() function for the original data frames.

The outputs of the head() function for the SQL data frames match the outputs of the head() function for the temporary data frames. The number of observations in each SQL data frame is 940, matching the number of rows in the corresponding temporary data frame.

In conclusion, the data in the daily_calories, daily_intensities, and daily_steps data frames is contained in daily_activity. These three data frames will be removed from the analysis for simplicity.
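
As an alternative to the SQL INTERSECT check, the same containment question can be asked with anti_join() from dplyr, which returns the rows of one data frame that have no match in another. This is a minimal sketch on hypothetical toy frames, not the check that was run above; the values are made up, and the `ActivityDay` column name in the smaller frame is an assumption.

```r
library(dplyr)

# Hypothetical toy frames standing in for daily_activity and daily_calories
activity <- tibble(
  Id = c(1, 1, 2),
  ActivityDate = c("4/12/2016", "4/13/2016", "4/12/2016"),
  TotalSteps = c(13162, 10735, 10460),
  Calories = c(1985, 1797, 1432)
)
calories <- tibble(
  Id = c(1, 1, 2),
  ActivityDay = c("4/12/2016", "4/13/2016", "4/12/2016"),  # assumed column name
  Calories = c(1985, 1797, 1432)
)

# Rows of `calories` that have no matching row in `activity`.
# Zero rows means every calories record is already present in activity.
unmatched <- anti_join(
  calories, activity,
  by = c("Id", "ActivityDay" = "ActivityDate", "Calories")
)

nrow(unmatched)
```

Unlike INTERSECT, this approach matches on named columns, so no temporary data frame with a reduced column set is needed.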

## STEP 4: ANALYZE

### Summarize the data:

The n_distinct() and nrow() functions are used to determine the number of unique values and the number of rows in a data frame, respectively.

In [50]:
n_distinct(daily_activity$Id)
n_distinct(minute_METs$Id)
n_distinct(heart_rate_sec$Id)
n_distinct(sleep_day$Id)
n_distinct(weight_log$Id)

In [51]:
nrow(daily_activity)
nrow(minute_METs)
nrow(heart_rate_sec)
nrow(sleep_day)
nrow(weight_log)

The heart rate and weight log data frames contain a very low number of participants based on the n_distinct() outputs. Thus, reliable recommendations and conclusions cannot be made solely from these data frames.

The summary() function is used to pull key statistics about the data frames.


In [52]:
# daily_activity

daily_activity %>%
  select(TotalSteps, TotalDistance, SedentaryMinutes, LightlyActiveMinutes, 
         FairlyActiveMinutes, VeryActiveMinutes, Calories) %>%
  summary()

This summary shows that the average user takes 7,638 steps per day, short of the 10,000 steps per day recommended for health by the Centers for Disease Control and Prevention (CDC).

On average, users get 21.16 minutes of very active or vigorous activity per day, equating to 148.12 minutes per week. The CDC recommends 75 minutes of vigorous activity per week, so the typical Fitbit user is doing well in this area and achieving additional health benefits. 

In contrast, participants are averaging 991.2 minutes, or 16.52 hours, of sedentary time a day. This is a significant amount of time and can lead to other health issues because the body functions best upright. Research suggests that about 40 minutes of moderate to vigorous activity a day can balance out the effects of sitting for up to 10 hours a day.

In addition, this summary shows the average user is burning 2,304 calories per day. Research shows the average person burns 1,800 calories a day, and that burning about 3,500 calories is needed to lose a pound of weight.

The Fitbit users in this case are burning about 500 calories a day more than that norm, or roughly 3,500 per week. They are therefore on track to lose about a pound per week if they so choose, assuming their calorie intake does not rise to match.
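
The unit conversions behind these figures can be reproduced in a few lines. The sketch below simply restates the averages quoted above as constants; the variable names are illustrative, and the pounds-per-week figure assumes calorie intake equal to the cited population norm.

```r
# Averages quoted above from summary(daily_activity)
mean_very_active_min <- 21.16
mean_sedentary_min <- 991.2
mean_calories <- 2304

# Reference figures cited in the text
population_calories <- 1800
calories_per_pound <- 3500

# Minutes of vigorous activity per week
weekly_very_active_min <- mean_very_active_min * 7

# Sedentary time expressed in hours per day
sedentary_hours <- mean_sedentary_min / 60

# Weekly calories burned above the population norm, and the equivalent in pounds
weekly_calorie_gap <- (mean_calories - population_calories) * 7
pounds_per_week <- weekly_calorie_gap / calories_per_pound

print(c(weekly_very_active_min, sedentary_hours, weekly_calorie_gap, pounds_per_week))
```

The weekly gap of 3,528 calories is close to the 3,500 calories cited for one pound, which is where a figure of about one pound per week comes from.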

In [53]:
# heart_rate_sec

heart_rate_sec %>%
    select(Value) %>%
    summary()

Despite the low number of users in the heart rate data frame, the average heart rate of 77 beats per minute (bpm) fits within the “normal” range.

Ranges of 50 to 80 bpm for men and 53 to 82 bpm for women are considered normal.

However, research suggests that it is more important for individuals to determine what a normal and healthy heart rate is for them than to compare themselves to population levels. This is because resting heart rate can vary between people by as much as 70 bpm. Changes in resting heart rate over days can be a sign of infection, menstrual cycle effects, body chemical composition, or other acute triggers.

This makes heart rate a vital health characteristic to monitor.


In [54]:
# minute_METs

minute_METs %>%
  select(METs) %>%
  summary()

The summary of minute METs shows the average user has a MET value of 14.47.

A MET is the ratio of your working metabolic rate to your resting metabolic rate. One MET is the energy your body consumes when at rest. This means an activity with a MET of four would require a person to exert four times the energy they do when sitting.

An average of 14.47 METs throughout the day is therefore implausibly high, which leads to the assumption that this data point is not being calculated correctly. Because of this, the minute MET data frame will no longer be used in this analysis.

In [55]:
# sleep_day

sleep_day %>%
  select(TotalSleepRecords, TotalMinutesAsleep, TotalTimeInBed) %>%
  summary()

The summary of the sleep data frame shows that the average user sleeps once per day for 419.5 minutes, or roughly 7 hours. This falls within the CDC’s recommendations for adults to get the proper amount of rest.

The average participant is spending 458.6 minutes in bed, or 7.64 hours.

This means the typical user is spending 39.1 minutes awake in bed.

According to Health Central, people should not spend more than 1 hour in bed awake. This is to prevent a mental link being formed between being awake and being in bed, which can lead to insomnia.
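
Time awake in bed is the difference between time in bed and time asleep; with the averages quoted above, that is 458.6 minus 419.5, or 39.1 minutes. The same quantity can be computed per record so that its distribution, not just its mean, is available. The sketch below uses a hypothetical three-row sample with made-up values, and the `MinutesAwakeInBed` column is a name introduced here for illustration.

```r
# Hypothetical sample with the same column names as sleep_day
sleep_sample <- data.frame(
  Id = c(1, 1, 1),
  TotalMinutesAsleep = c(327, 384, 412),
  TotalTimeInBed = c(346, 407, 442)
)

# Minutes spent in bed but not asleep, per record (illustrative column name)
sleep_sample$MinutesAwakeInBed <-
  sleep_sample$TotalTimeInBed - sleep_sample$TotalMinutesAsleep

# Average time awake in bed across the sample
mean_awake_in_bed <- mean(sleep_sample$MinutesAwakeInBed)

# Records exceeding the one-hour guideline mentioned in the text
over_one_hour <- sum(sleep_sample$MinutesAwakeInBed > 60)

print(sleep_sample)
```

Counting the records over 60 minutes would show how often users exceed the one-hour guideline, which an average alone cannot reveal.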

In [56]:
# weight_log

weight_log %>% 
  select(WeightPounds, BMI) %>%
  summary()


While this data frame has a low number of participants, the average BMI is 25.19, which is considered overweight. However, BMI is only a screening tool and does not diagnose the body fatness or health of an individual.

## STEP 5: SHARE

The ggplot() function from the ggplot2 package was used to create data visualizations that depict patterns and trends found in the data frames, which can give us further insights for this project.

In [57]:
ggplot(data=daily_activity, aes(x=VeryActiveMinutes, y=Calories)) +
geom_point() +
stat_smooth(method=lm) +
labs(title="The Relationship between Very Active Minutes and Total Daily Calories Burned")

The 1st plot above shows a positive relationship between very active minutes and total daily calories burned. This means that the more vigorous physical activity the participant did, the more calories they burned.

In [58]:
ggplot(data=daily_activity, aes(x=TotalSteps, y=Calories))+
geom_point()+
stat_smooth(method=lm)+
labs(title="The Relationship between Total Daily Steps and Total Daily Calories Burned")

The 2nd plot above depicts a positive relationship between total daily steps taken and total calories burned. This means that the more steps the Fitbit users took, the more calories they burned.

In [60]:
ggplot(data=daily_activity, aes(x=TotalDistance, y=Calories))+
geom_smooth()+
labs(title="The Relationship between Total Distance and Total Daily Calories Burned")

The 3rd plot displays a positive trend between total distance and total daily calories burned. As the participants moved a greater distance, the number of calories they burned also increased.

In [61]:
ggplot(data=sleep_day, aes(x=TotalMinutesAsleep, y=TotalTimeInBed))+
geom_point()+
stat_smooth(method=lm)+
labs(title="The Relationship between Total Minutes Asleep and Total Time in Bed")

The 4th plot shows a positive relationship between total minutes asleep and total time in bed. For the most part, the time participants spent asleep and the time they spent in bed was very similar.
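
The positive relationships described for these plots could also be expressed as numbers, using a correlation coefficient and the slope of the fitted line. The following is a minimal sketch on made-up values; the figures are not taken from the Fitbit data and only illustrate the calls involved.

```r
# Made-up illustrative values, not taken from the Fitbit data
steps <- c(2000, 5000, 8000, 11000, 14000)
calories <- c(1600, 1900, 2300, 2600, 3100)

# Pearson correlation: values near 1 indicate a strong positive linear relationship
r <- cor(steps, calories)

# Linear model equivalent to the line drawn by stat_smooth(method = lm)
fit <- lm(calories ~ steps)

# Estimated additional calories burned per additional step
slope <- coef(fit)[["steps"]]

print(c(correlation = r, slope = slope))
```

With the real data frames, the same calls could be applied to columns such as TotalSteps and Calories in daily_activity to back up what the scatter plots suggest visually.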

## STEP 6: ACT

Bellabeat has been successful since it was founded by empowering women through providing data on their activity, sleep, stress, hydration levels, and reproductive health. Based on analyzing how Fitbit consumers use and respond to features, recommendations can be made to promote further growth for Bellabeat.

The Bellabeat app should be enhanced and revamped. Rather than simply providing data on users’ health, the app should further encourage users to meet fitness goals and become a social media platform.

The CDC recommends working out with a friend in order to feel more motivated, be more adventurous in trying workouts, and to become consistent. 

The CDC even recommends the use of a social media workout app to connect with friends and reach your goals. The Bellabeat app could become that social media workout app that women turn to, by creating an online community of supportive women ready to prioritize their health.

### Recommendations for Bellabeat app:

1. Enable social networking so users can post their favorite workouts, wellness tips, healthy meals, etc.

2. Enable users to add friends and view each other’s activity.

3. Create weekly fitness and wellness challenges to encourage use.

4. Recommend that users get 10,000 steps a day and enable alert notifications to encourage users to meet this goal.

5. Recommend that users get at least 7 hours of sleep a night and enable alert notifications to encourage users to meet this goal.

6. Recommend that users get 75 minutes of vigorous activity a week and enable alert notifications to encourage users to meet this goal.

7. Have health and fitness companies pay for advertising.

8. Encourage users to enter their weight and height to track BMI.

9. If users are interested in losing weight, enable notifications to keep them on track to burn the calories needed to meet their goal.

10. Enable alert notifications if a user’s resting heart rate varies significantly from their normal.

11. Enable notifications to encourage activity if a user has spent an hour in bed awake.

12. Enable notifications to encourage activity if a user has been sedentary for an extended period of time.

### Recommendations for Bellabeat membership:

1. Partner with health & fitness companies and offer discounts for members.

2. Offer reduced subscription fee when a member refers a friend.

3. Offer a 30-day free trial subscription.

4. Offer discounts for Bellabeat smart device products with membership.

### Recommendations for Bellabeat products:

1. Offer a bundle deal for the Spring and Leaf together.

2. Market Spring heavily, since Fitbit does not track hydration levels.

### Works Cited:

“The Dangers of Sitting: Why Sitting Is the New Smoking.” The Dangers of Sitting: Why Sitting Is the New Smoking — Better Health Channel, 22 Aug. 2020, www.betterhealth.vic.gov.au/health/healthyliving/the-dangers-of-sitting

“3 Reasons to Work out with a Friend.” Centers for Disease Control and Prevention, Centers for Disease Control and Prevention, 23 Apr. 2021, www.cdc.gov/diabetes/library/spotlights/workout-buddy.html

“About Adult BMI.” Centers for Disease Control and Prevention, Centers for Disease Control and Prevention, 17 Sept. 2020, www.cdc.gov/healthyweight/assessing/bmi/adult_bmi/index.html

Gornall, Lucy. “How to Lose Weight: How Many Calories Should I Eat to Lose Weight?” GoodtoKnow, 12 Aug. 2020, www.goodto.com/wellbeing/diets-exercise/what-is-calorie-how-many-lose-weigt-425557

“CDC — How Much Sleep Do I Need? — Sleep and Sleep Disorders.” Centers for Disease Control and Prevention, Centers for Disease Control and Prevention, 2 Mar. 2017, www.cdc.gov/sleep/about_sleep/how_much_sleep.html

Reed, Martin. “Spend Less Time In Bed If You Want More Sleep.” Healthcentral.com, 7 May 2017, www.healthcentral.com/article/spend-less-time-in-bed-if-you-want-more-sleep

Roland, James. “What Are METs, and How Are They Calculated?” Healthline, Healthline Media, 21 Oct. 2019, www.healthline.com/health/what-are-mets#calculation

Grey, Heather. “Heart Rates Can Vary by 70 Bpm: What That Means for Your Health.” Healthline, Healthline Media, 9 Feb. 2020, www.healthline.com/health-news/what-your-heart-rate-says-about-your-health

“How Much Physical Activity Do Adults Need?” Centers for Disease Control and Prevention, Centers for Disease Control and Prevention, 7 Oct. 2020, www.cdc.gov/physicalactivity/basics/adults/index.htm

Nield, David. “Scientists Figured out How Much Exercise You Need to ‘Offset’ a Day of Sitting.” ScienceAlert, 26 Nov. 2020, www.sciencealert.com/getting-a-sweat-on-for-30-40-minutes-could-offset-a-day-of-sitting-down


