# **INTRODUCTION**

Welcome to the Bellabeat data analysis case study! In this case study, you will perform many real-world tasks of a junior data analyst. You will imagine you are working for Bellabeat, a high-tech manufacturer of health-focused products for women, and meet different characters and team members. In order to answer the key business questions, you will follow the steps of the data analysis process: **ask, prepare, process, analyze, share, and act**.

**SCENARIO**

You are a junior data analyst working on the marketing analyst team at Bellabeat, a high-tech manufacturer of *health-focused products for women*. Bellabeat is a successful small company, but they have the potential to become a larger player in the global smart device market. Urška Sršen, cofounder and Chief Creative Officer of Bellabeat, believes that analyzing smart device fitness data could help unlock new growth opportunities for the company. You have been asked to focus on one of Bellabeat’s products and **analyze smart device data to gain insight into how consumers are using their smart devices**. The insights you discover will then help guide marketing strategy for the company. You will present your analysis to the Bellabeat executive team **along with your high-level recommendations for Bellabeat’s marketing strategy**.

**CHARACTERS**

○ Urška Sršen: Bellabeat’s cofounder and Chief Creative Officer

○ Sando Mur: Mathematician and Bellabeat’s cofounder; key member of the Bellabeat executive team

○ Bellabeat marketing analytics team: A team of data analysts responsible for collecting, analyzing, and
reporting data that helps guide Bellabeat’s marketing strategy. You joined this team six months ago and have
been busy learning about Bellabeat’s mission and business goals, as well as how you, as a junior data analyst,
can help Bellabeat achieve them.

**PRODUCTS**

○ Bellabeat app: The Bellabeat app provides users with health data related to their activity, sleep, stress,
menstrual cycle, and mindfulness habits. This data can help users better understand their current habits and
make healthy decisions. The Bellabeat app connects to their line of smart wellness products.

○ Leaf: Bellabeat’s classic wellness tracker can be worn as a bracelet, necklace, or clip. The Leaf tracker connects
to the Bellabeat app to track activity, sleep, and stress.

○ Time: This wellness watch combines the timeless look of a classic timepiece with smart technology to track user
activity, sleep, and stress. The Time watch connects to the Bellabeat app to provide you with insights into your
daily wellness.

○ Spring: This is a water bottle that tracks daily water intake using smart technology to ensure that you are
appropriately hydrated throughout the day. The Spring bottle connects to the Bellabeat app to track your
hydration levels.

○ Bellabeat membership: Bellabeat also offers a subscription-based membership program for users.
Membership gives users 24/7 access to fully personalized guidance on nutrition, activity, sleep, health and
beauty, and mindfulness based on their lifestyle and goals.

**ABOUT THE COMPANY**

Urška Sršen and Sando Mur founded Bellabeat, a high-tech company that manufactures health-focused smart products.
Sršen used her background as an artist to develop beautifully designed technology that informs and inspires women around
the world. Collecting data on activity, sleep, stress, and reproductive health has allowed Bellabeat to empower women with knowledge about their own health and habits. Since it was founded in 2013, Bellabeat has grown rapidly and quickly
positioned itself as a tech-driven wellness company for women.

By 2016, Bellabeat had opened offices around the world and launched multiple products. Bellabeat products became available through a growing number of online retailers in addition to their own e-commerce channel on their website. The company has invested in traditional advertising media, such as radio, out-of-home billboards, print, and television, but focuses extensively on digital marketing. Bellabeat invests year-round in Google Search, maintains active Facebook and Instagram pages, and consistently engages consumers on Twitter. Additionally, Bellabeat runs video ads on YouTube and display ads on the Google Display Network to support campaigns around key marketing dates.

Sršen knows that an analysis of Bellabeat’s available consumer data would reveal more opportunities for growth. She has
asked the marketing analytics team to focus on a Bellabeat product and analyze smart device usage data in order to gain
insight into *how people are already using their smart devices*. Then, using this information, she would like high-level
recommendations for how these trends can inform Bellabeat marketing strategy.

# **ASK**

**Guiding questions**

● What is the problem you are trying to solve? I am trying to identify trends in how existing customers use their smart devices.

● How can your insights drive business decisions? The insights will help us understand customer needs, which can in turn guide marketing decisions that support business growth.

**Key tasks**

Business task: Analyze smart device data to gain insight into how consumers are using their smart devices and deliver high-level recommendations for Bellabeat’s marketing strategy.

Stakeholders: Urška Sršen, Sando Mur and Bellabeat marketing analytics team.

# **PREPARE**

**Key tasks**

1. Download data and store it appropriately.
2. Identify how it’s organized.
3. Sort and filter the data.
4. Determine the credibility of the data.

In [1]:
library(tidyverse)  # dplyr, tidyr and ggplot2 for cleaning, summarising and plotting

**Loading the data**

In [2]:
activity <- read.csv("../input/fitbit/Fitabase Data 4.12.16-5.12.16/dailyActivity_merged.csv")
calories <- read.csv("../input/fitbit/Fitabase Data 4.12.16-5.12.16/dailyCalories_merged.csv")
intensities <- read.csv("../input/fitbit/Fitabase Data 4.12.16-5.12.16/dailyIntensities_merged.csv")
steps <- read.csv("../input/fitbit/Fitabase Data 4.12.16-5.12.16/dailySteps_merged.csv")
sleep <- read.csv("../input/fitbit/Fitabase Data 4.12.16-5.12.16/sleepDay_merged.csv")
weight <- read.csv("../input/fitbit/Fitabase Data 4.12.16-5.12.16/weightLogInfo_merged.csv")
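The six read.csv() calls above differ only in the file name. One way to avoid repeating the folder path is a small helper that reads every file ending in _merged.csv into a named list. The sketch below is hypothetical (read_merged_csvs() is not part of the notebook) and is demonstrated on two tiny files written to a temporary folder rather than on the real dataset.

```r
# Hypothetical helper (not part of the original notebook): read every
# "*_merged.csv" file in a folder into a list named after the file stem
read_merged_csvs <- function(dir) {
  files <- list.files(dir, pattern = "_merged\\.csv$", full.names = TRUE)
  tables <- lapply(files, read.csv)
  names(tables) <- sub("_merged\\.csv$", "", basename(files))
  tables
}

# Demo on two tiny made-up files written to a temporary folder
demo_dir <- file.path(tempdir(), "fitabase_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(Id = 1, TotalSteps = 5000),
          file.path(demo_dir, "dailyActivity_merged.csv"), row.names = FALSE)
write.csv(data.frame(Id = 1, TotalMinutesAsleep = 420),
          file.path(demo_dir, "sleepDay_merged.csv"), row.names = FALSE)

tables <- read_merged_csvs(demo_dir)
str(tables)
```

Pointed at the "../input/fitbit/Fitabase Data 4.12.16-5.12.16" folder, it would return tables named after each file, for example tables$dailyActivity and tables$sleepDay. If that folder also holds large minute-level files, a narrower pattern such as "^daily" may be preferable.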

In [3]:
head(activity)

In [4]:
head(sleep)

In [5]:
head(weight)

**Guiding questions**

● Where is your data stored? The data is stored on Kaggle and is read from the ../input/fitbit/ folder.

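● How is the data organized? Is it in long or wide format? The data is split into separate CSV files by health metric, such as heart rate, steps, sleep, and weight. It is in long format: each row is one record for one user (Id) on one day or at one point in time.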

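● Are there issues with bias or credibility in this data? Does your data ROCCC? Partly. The data is cited and comes from a public source, but it covers only one month (12 April to 12 May 2016, per the folder name) and a small number of FitBit users rather than Bellabeat customers, so it is neither current nor comprehensive, and trends found here may not generalize.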

● How are you addressing licensing, privacy, security, and accessibility? The data is licensed, and no personal information is shared: users are identified only by a numeric Id.

● How did you verify the data’s integrity? I inspected the tables with head() and checked that they are consistent, share the same Id key, and contain the required fields.

● How does it help you answer your question? By analyzing data on customers, I will be able to gain insight into how consumers are using their smart devices.

● Are there any problems with the data? The sample is small, and the date columns are stored as text and need to be converted to date types.

# **PROCESS**

**Key tasks**

1. Check the data for errors.
2. Choose your tools.
3. Transform the data so you can work with it effectively.
4. Document the cleaning process.

In [7]:
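#correcting data types: the date columns are read as text, so parse them
#(activity dates are month/day/year; sleep and weight dates also include a time)
activity$ActivityDate <- lubridate::mdy(activity$ActivityDate)
sleep$SleepDay <- lubridate::mdy_hms(sleep$SleepDay)
weight$Date <- lubridate::mdy_hms(weight$Date)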
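The key task "Check the data for errors" can be supported with a few quick counts before any analysis. The sketch below uses a hypothetical helper, check_quality(), on a made-up toy sleep table (not the real data) to count rows, fully duplicated rows, and missing values, and then drops exact duplicates with unique().

```r
# Hypothetical helper: count rows, fully duplicated rows and missing values
check_quality <- function(df) {
  c(rows       = nrow(df),
    duplicates = sum(duplicated(df)),
    missing    = sum(is.na(df)))
}

# Hypothetical toy sleep table with one repeated row and one missing value
toy_sleep <- data.frame(
  Id                 = c(1, 1, 2, 2),
  SleepDay           = c("4/12/2016", "4/12/2016", "4/12/2016", "4/13/2016"),
  TotalMinutesAsleep = c(420, 420, 390, NA)
)

check_quality(toy_sleep)
clean_sleep <- unique(toy_sleep)   # keeps the first copy of each duplicated row
nrow(clean_sleep)
```

On the notebook's tables the same call would be check_quality(sleep), check_quality(activity), and so on; whether any duplicates or missing values turn up depends on the data.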

**Guiding questions**

● What tools are you choosing and why? I am using R (with the tidyverse) to get familiar with the language.

● Have you ensured your data’s integrity? Yes

● What steps have you taken to ensure that your data is clean? I checked the data for consistency and errors and converted the date columns to date types.

● How can you verify that your data is clean and ready to analyze? By confirming that the data is consistent and contains the required fields.

● Have you documented your cleaning process so you can review and share those results? Yes

# **ANALYZE**

**Key tasks**
1. Aggregate your data so it’s useful and accessible.
2. Organize and format your data.
3. Perform calculations.
4. Identify trends and relationships.

In [8]:
#number of distinct users (Id) in each of the 3 datasets
n_distinct(activity$Id)
n_distinct(sleep$Id)
n_distinct(weight$Id)

In [9]:
#total number of rows (entries) in each dataset
nrow(activity)
nrow(sleep)
nrow(weight)
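The distinct-user and row counts above can be collected into one small overview table, which also shows how many records each user contributes on average. This base-R sketch uses hypothetical toy tables containing only an Id column; length(unique(Id)) plays the role of n_distinct(Id).

```r
# Hypothetical toy tables standing in for activity, sleep and weight (Id only)
tables <- list(
  activity = data.frame(Id = c(1, 1, 2, 3)),
  sleep    = data.frame(Id = c(1, 2, 2)),
  weight   = data.frame(Id = 3)
)

# One row per table: distinct users (like n_distinct(Id)) and total rows (like nrow())
overview <- data.frame(
  table = names(tables),
  users = sapply(tables, function(df) length(unique(df$Id))),
  rows  = sapply(tables, nrow),
  row.names = NULL
)
overview$rows_per_user <- overview$rows / overview$users
overview
```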

In [10]:
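#combining sleep and activity into one data frame, matching each user's sleep to the same day's
#activity (joining on Id alone would pair every sleep record with every activity record of that user)
sleep$ActivityDate <- as.Date(sleep$SleepDay)
combined_data <- merge(sleep, activity, by = c("Id", "ActivityDate"))
n_distinct(combined_data$Id)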
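Because merge() pairs every row in one table with every row in the other that shares the key, the choice of key decides whether each night's sleep lines up with the same day's activity. The toy tables below are hypothetical and assume the sleep table carries a date column named ActivityDate (in the notebook this would have to be derived from SleepDay, for example with as.Date()).

```r
# Hypothetical toy data: one user, two days of sleep and two days of activity
toy_sleep <- data.frame(
  Id                 = c(1, 1),
  ActivityDate       = as.Date(c("2016-04-12", "2016-04-13")),
  TotalMinutesAsleep = c(420, 360)
)
toy_activity <- data.frame(
  Id           = c(1, 1),
  ActivityDate = as.Date(c("2016-04-12", "2016-04-13")),
  TotalSteps   = c(9000, 4000)
)

# Joining on Id alone pairs every sleep row with every activity row of that user
by_id <- merge(toy_sleep, toy_activity, by = "Id")
nrow(by_id)        # 4

# Joining on Id and date keeps one row per user-day
by_id_date <- merge(toy_sleep, toy_activity, by = c("Id", "ActivityDate"))
nrow(by_id_date)   # 2
```

With two days for one user, joining on Id alone gives 2 × 2 = 4 rows, so each sleep record is also paired with the other day's steps.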

In [11]:
#summary statistics of the combined data
summary(combined_data)

**VISUALIZING DATA**

**1. Distribution of user types: sedentary, lightly active, fairly active, and very active**

In [12]:
#creating a new dataset with a user type for each record, based on how its minutes in each
#activity level compare with the mean; records that match none of the four patterns get NA
#and are dropped
data_by_usertype <- combined_data %>%
  mutate(user_type = factor(case_when(
    SedentaryMinutes > mean(SedentaryMinutes) & LightlyActiveMinutes < mean(LightlyActiveMinutes) & FairlyActiveMinutes < mean(FairlyActiveMinutes) & VeryActiveMinutes < mean(VeryActiveMinutes) ~ "Sedentary",
    SedentaryMinutes < mean(SedentaryMinutes) & LightlyActiveMinutes > mean(LightlyActiveMinutes) & FairlyActiveMinutes < mean(FairlyActiveMinutes) & VeryActiveMinutes < mean(VeryActiveMinutes) ~ "Lightly Active",
    SedentaryMinutes < mean(SedentaryMinutes) & LightlyActiveMinutes < mean(LightlyActiveMinutes) & FairlyActiveMinutes > mean(FairlyActiveMinutes) & VeryActiveMinutes < mean(VeryActiveMinutes) ~ "Fairly Active",
    SedentaryMinutes < mean(SedentaryMinutes) & LightlyActiveMinutes < mean(LightlyActiveMinutes) & FairlyActiveMinutes < mean(FairlyActiveMinutes) & VeryActiveMinutes > mean(VeryActiveMinutes) ~ "Very Active"
  ), levels = c("Sedentary", "Lightly Active", "Fairly Active", "Very Active"))) %>%
  select(Id, user_type, TotalSteps, Calories, TotalMinutesAsleep, TotalTimeInBed) %>%
  drop_na()
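The case_when() rule assigns a type only when a record matches one of four exact above/below-mean patterns. The base-R restatement below, with a hypothetical classify_user_type() helper and made-up toy minutes, makes visible which records end up as NA and are therefore removed by drop_na().

```r
# Hypothetical base-R restatement of the mean-threshold rule used for data_by_usertype
classify_user_type <- function(df) {
  above <- function(x) x > mean(x)
  below <- function(x) x < mean(x)
  sed   <- df$SedentaryMinutes
  light <- df$LightlyActiveMinutes
  fair  <- df$FairlyActiveMinutes
  very  <- df$VeryActiveMinutes

  # Start with NA; the four patterns are mutually exclusive
  type <- rep(NA_character_, nrow(df))
  type[above(sed) & below(light) & below(fair) & below(very)] <- "Sedentary"
  type[below(sed) & above(light) & below(fair) & below(very)] <- "Lightly Active"
  type[below(sed) & below(light) & above(fair) & below(very)] <- "Fairly Active"
  type[below(sed) & below(light) & below(fair) & above(very)] <- "Very Active"
  factor(type, levels = c("Sedentary", "Lightly Active", "Fairly Active", "Very Active"))
}

# Hypothetical toy minutes for four records
toy <- data.frame(
  SedentaryMinutes     = c(900, 600, 700, 500),
  LightlyActiveMinutes = c(100, 300, 150, 150),
  FairlyActiveMinutes  = c(5, 10, 40, 10),
  VeryActiveMinutes    = c(0, 5, 10, 60)
)
classify_user_type(toy)
```

The third toy record has above-average sedentary minutes and above-average fairly active minutes at the same time, so it matches none of the four patterns. Records like this are left out of every chart built on data_by_usertype.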

In [13]:
#number of records of each user type
ggplot(data_by_usertype, aes(x = user_type, fill = user_type)) + geom_bar() 

**Observation**: the dataset has more lightly active and sedentary records than fairly active and very active ones.
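A bar chart of data_by_usertype counts rows, not people, so one user can appear under several types on different days. The sketch below, on hypothetical toy data, contrasts the two counts: table() counts records per type, and tapply() with length(unique(Id)) counts distinct users per type.

```r
# Hypothetical toy data standing in for data_by_usertype
levels_used <- c("Sedentary", "Lightly Active", "Fairly Active", "Very Active")
toy <- data.frame(
  Id        = c(1, 1, 1, 2, 3, 3),
  user_type = factor(c("Sedentary", "Sedentary", "Lightly Active",
                       "Very Active", "Sedentary", "Lightly Active"),
                     levels = levels_used)
)

# Records per type (what geom_bar() counts)
records_per_type <- table(toy$user_type)

# Distinct users per type; empty types get 0 instead of NA
users_per_type <- tapply(toy$Id, toy$user_type,
                         function(ids) length(unique(ids)), default = 0L)

records_per_type
users_per_type
```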

**2. Total steps for each user type**

In [14]:
ggplot(data_by_usertype, aes(x = user_type, y = TotalSteps, fill = user_type)) + geom_boxplot()

**Observation:** Fairly active and very active users take more steps and sedentary users take the fewest, as expected.

**3. Total steps vs calories burned**

In [15]:
ggplot(activity, aes(x = TotalSteps, y = Calories)) + geom_point() + geom_smooth()

**Observation:** More steps correspond to more calories burned, which is an expected trend.
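The steps-calories trend can be summarised by the slope of a linear model, read as extra calories per additional step. This sketch fits lm() on hypothetical toy values; on the notebook's data the equivalent call would be lm(Calories ~ TotalSteps, data = activity), whose coefficients are not reported here.

```r
# Hypothetical toy data: 1500 kcal base plus 0.05 kcal per step, with small noise
steps    <- c(2000, 5000, 8000, 11000, 14000)
calories <- 1500 + 0.05 * steps + c(10, -10, 5, -5, 0)

fit <- lm(calories ~ steps)
coef(fit)[["steps"]]       # estimated extra calories per additional step
summary(fit)$r.squared     # share of calorie variation explained by steps
```

Because the toy calories were generated from 0.05 kcal per step plus small noise, the fitted slope comes out close to 0.05.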

In [16]:
#calories burned by each user type
ggplot(data_by_usertype, aes(x = user_type, y = Calories, fill = user_type)) + geom_boxplot()
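The centre lines of the boxplots are group medians, which can also be printed as a table. This sketch runs aggregate() on a hypothetical stand-in for data_by_usertype; the numbers are made up for illustration.

```r
# Hypothetical toy data standing in for data_by_usertype
toy <- data.frame(
  user_type  = c("Sedentary", "Sedentary", "Lightly Active", "Very Active", "Very Active"),
  TotalSteps = c(3000, 5000, 8000, 12000, 14000),
  Calories   = c(1800, 2000, 2200, 2700, 2900)
)

# Median steps and calories per user type
medians <- aggregate(cbind(TotalSteps, Calories) ~ user_type, data = toy, FUN = median)
medians
```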

**4. Total sleep vs total steps**

In [17]:
ggplot(combined_data, aes(x = TotalMinutesAsleep, y = TotalSteps)) + geom_point() + geom_smooth()

**Observation**: We expected a positive relationship between minutes of sleep and total steps, but the plot shows no clear relationship between the two.
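The visual impression of "no clear relationship" can be backed with correlation coefficients: Pearson for a linear association and Spearman for any monotonic one. The vectors below are hypothetical; on the notebook's data the call would be cor.test(combined_data$TotalMinutesAsleep, combined_data$TotalSteps).

```r
# Hypothetical toy vectors: minutes asleep and steps for eight records
minutes_asleep <- c(300, 360, 420, 450, 480, 510, 540, 600)
total_steps    <- c(9000, 7000, 11000, 8000, 10000, 6000, 12000, 7500)

pearson  <- cor(minutes_asleep, total_steps)                       # linear association
spearman <- cor(minutes_asleep, total_steps, method = "spearman")  # monotonic association
test     <- cor.test(minutes_asleep, total_steps)                  # Pearson test by default

c(pearson = pearson, spearman = spearman, p_value = test$p.value)
```

A coefficient near 0 with a large p-value would be consistent with the observation; a clearly non-zero value would point to a relationship the scatter plot does not show well.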

**5. Total sleep vs total time in bed**

In [18]:
ggplot(sleep, aes(x = TotalMinutesAsleep, y = TotalTimeInBed)) + geom_point() + geom_smooth()

**Observation**: The results are as expected: more time in bed corresponds to more sleep.
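Since time asleep is part of time in bed, the two are bound to move together; what varies is how much of the time in bed is spent awake. A derived ratio, often called sleep efficiency (not part of the original analysis), makes that explicit. The values below are hypothetical; on the notebook's data the columns would be sleep$TotalMinutesAsleep and sleep$TotalTimeInBed.

```r
# Hypothetical toy nights: minutes asleep and minutes in bed
asleep <- c(420, 300, 480)
in_bed <- c(450, 400, 480)

awake_in_bed <- in_bed - asleep   # minutes in bed without sleeping
efficiency   <- asleep / in_bed   # share of time in bed spent asleep
round(efficiency, 3)
```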

**6. Total sleep for each user type**

In [19]:
ggplot(data_by_usertype, aes(x = user_type,y = TotalMinutesAsleep, fill = user_type)) + geom_boxplot()

**Observation**: Very active users typically get a normal 7 to 8 hours of sleep, while other user types tend to sleep more or less than that.
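The 7 to 8 hour claim can be checked numerically as the share of records per user type whose sleep falls between 420 and 480 minutes. The sketch uses a hypothetical stand-in for data_by_usertype; on the notebook's data the same tapply() call would use data_by_usertype$TotalMinutesAsleep and data_by_usertype$user_type.

```r
# Hypothetical toy data standing in for data_by_usertype
toy <- data.frame(
  user_type          = c("Sedentary", "Sedentary", "Very Active", "Very Active", "Lightly Active"),
  TotalMinutesAsleep = c(300, 600, 430, 470, 500)
)

# TRUE when a night falls in the 7 to 8 hour window (420 to 480 minutes)
in_range <- toy$TotalMinutesAsleep >= 7 * 60 & toy$TotalMinutesAsleep <= 8 * 60

# Share of records per user type that fall in that window
share_in_range <- tapply(in_range, toy$user_type, mean)
share_in_range
```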

**CONCLUSION**

1. Sedentary and lightly active records outnumber fairly active and very active ones in the sample FitBit data.
2. Fairly active and very active users take more steps and hence burn more calories.
3. Total steps and total sleep show no direct correlation; however, very active users, who tend to get a normal 7 to 8 hours of sleep, take more steps.
4. More time in bed corresponds to more sleep.