## **1. About the company**

Bellabeat was founded in 2013 by Urška Sršen (*cofounder and Chief Creative Officer*) and Sando Mur (*mathematician and cofounder*). The company manufactures health-focused smart products for women. By collecting data on activity, sleep, stress, and reproductive health, Bellabeat aims to empower women with knowledge about their own health and habits.

## **2. Guiding Questions**

1. What are some trends in smart device usage?
2. How could these trends apply to Bellabeat customers?
3. How could these trends help influence Bellabeat marketing strategy?

## **3. Business Task**

Focus on a Bellabeat product and analyze smart device usage data in order to gain
insight into how people are already using their smart devices. Then, using this information, give
recommendations for how these trends can inform Bellabeat marketing strategy.

## **4. Key Stakeholders**

Urška Sršen: Bellabeat’s cofounder and Chief Creative Officer<br/>Sando Mur: mathematician and Bellabeat’s cofounder<br/>Bellabeat’s marketing analytics team

## **5. Loading Packages**

In [1]:
library(here)
library(skimr)
library(janitor)
library(lubridate)
library(tidyverse)
library(ggplot2)
library(dplyr)

## **6. Importing Dataset**

For this case study, I used the FitBit Fitness Tracker Data, a public dataset available on Kaggle: [Dataset](https://www.kaggle.com/datasets/arashnic/fitbit).<br/>
*This Kaggle dataset contains personal fitness tracker data from thirty Fitbit users.* Thirty eligible Fitbit users consented to the submission of personal tracker data, including minute-level output for physical activity, heart rate, and sleep monitoring. It includes information about daily activity, steps, and heart rate that can be used to explore users’ habits. Although the description says thirty, the activity files used here contain 33 distinct Ids, as shown in section 8.1.


In [2]:
Daily_Activity <- read.csv("../input/fitbit/Fitabase Data 4.12.16-5.12.16/dailyActivity_merged.csv")
Daily_Calories <- read.csv("../input/fitbit/Fitabase Data 4.12.16-5.12.16/dailyCalories_merged.csv")
Daily_Intensities <- read.csv("../input/fitbit/Fitabase Data 4.12.16-5.12.16/dailyIntensities_merged.csv")
Daily_Steps <- read.csv("../input/fitbit/Fitabase Data 4.12.16-5.12.16/dailySteps_merged.csv")
Hourly_Calories <- read.csv("../input/fitbit/Fitabase Data 4.12.16-5.12.16/hourlyCalories_merged.csv")
Hourly_Steps <- read.csv("../input/fitbit/Fitabase Data 4.12.16-5.12.16/hourlySteps_merged.csv")
Hourly_Intensities <- read.csv("../input/fitbit/Fitabase Data 4.12.16-5.12.16/hourlyIntensities_merged.csv")
Sleep_Day <- read.csv("../input/fitbit/Fitabase Data 4.12.16-5.12.16/sleepDay_merged.csv")

## **7. Checking data types for errors**

In [3]:
str(Daily_Activity)
str(Daily_Calories)
str(Daily_Intensities)
str(Daily_Steps)
str(Hourly_Calories)
str(Hourly_Steps)
str(Hourly_Intensities)
str(Sleep_Day)

Here we can see that Daily_Activity, Daily_Calories, and Daily_Steps store dates only, while Hourly_Calories, Hourly_Steps, Hourly_Intensities, and Sleep_Day store date-time values. In section 8.2, we will parse these columns and derive a separate date column (and, for the hourly data, a time column).

## **8. Formatting and Summarizing data**

#### 8.1. Summarizing

In [4]:
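# activity: summary statistics for the main daily metrics
Daily_Activity %>%
  select(TotalSteps,
         TotalDistance,
         VeryActiveMinutes, FairlyActiveMinutes, LightlyActiveMinutes,
         SedentaryMinutes, Calories) %>%
  summary()

# sleep: an empty select() returns no columns, so name the sleep metrics explicitly
Sleep_Day %>%
  select(TotalSleepRecords, TotalMinutesAsleep, TotalTimeInBed) %>%
  summary()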


#### *How many members have participated in different activities?*

In [5]:
n_distinct(Daily_Activity$Id)
n_distinct(Daily_Calories$Id)
n_distinct(Daily_Steps$Id)
n_distinct(Hourly_Calories$Id)
n_distinct(Hourly_Intensities$Id)
n_distinct(Sleep_Day$Id)

Here we can see that the Sleep_Day dataset has only 24 participating members, compared with 33 in the other datasets.
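
To see which participants have no sleep records, one option is to compare the sets of Ids directly. The sketch below uses two small made-up data frames (`activity_toy` and `sleep_toy`, both hypothetical) in place of Daily_Activity and Sleep_Day, so the Ids shown are not from the real data.

```r
# Toy stand-ins for Daily_Activity and Sleep_Day (made-up Ids, for illustration only)
activity_toy <- data.frame(Id = c(1, 1, 2, 3, 3))
sleep_toy    <- data.frame(Id = c(1, 3, 3))

# Ids that appear in the activity table but have no sleep records
missing_sleep_ids <- setdiff(unique(activity_toy$Id), unique(sleep_toy$Id))
print(missing_sleep_ids)
```

Applied to the real tables, the same `setdiff()` call would list the Ids present in Daily_Activity but absent from Sleep_Day.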

#### *How many days of activity data do we have for each Id?*

In [6]:
table(Daily_Activity$Id)

Here, we can see that out of 33 users,<br/> Id no. 4057192912 has only 4 days of data,<br/> Id no. 2347167796 has 18 days, <br/> Id no. 8256242879 has 19 days, <br/> Id no. 3372868164 has 20 days, <br/> while most of the others have 30 or 31 days of data, i.e. about one month.<br/>  
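
Rather than reading the `table()` output by eye, the short-history Ids can be filtered out programmatically. This is a minimal sketch on a made-up data frame (`activity_toy` is hypothetical, and the threshold `min_days` is an assumption chosen to fit the toy data).

```r
# Toy stand-in for Daily_Activity: one row per Id per day (made-up values)
activity_toy <- data.frame(
  Id = c(101, 101, 101, 202, 202, 303)
)

# Count rows (days) per Id, as table() does, but keep the result as a data frame
days_per_id <- as.data.frame(table(Id = activity_toy$Id), responseName = "days")

# Keep Ids with fewer days than a chosen threshold
# (3 fits this toy data; a value near 30 would be a natural choice for a one-month dataset)
min_days <- 3
short_ids <- days_per_id[days_per_id$days < min_days, ]
print(short_ids)
```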

#### 8.2. Fixing Formats

In [7]:
# daily activity: parse the date, then store it as a "mm/dd/yy" string for merging
Daily_Activity$ActivityDate <- as.POSIXct(Daily_Activity$ActivityDate, format = "%m/%d/%Y", tz = Sys.timezone())
Daily_Activity$date <- format(Daily_Activity$ActivityDate, format = "%m/%d/%y")
# daily calories: rename only, the date keeps its original string form
Daily_Calories <- rename(Daily_Calories, date = ActivityDay)
# daily intensities: rename only
Daily_Intensities <- rename(Daily_Intensities, date = ActivityDay)
# daily steps: rename only
Daily_Steps <- rename(Daily_Steps, date = ActivityDay)
# hourly calories: parse the 12-hour date-time, then split into time and date strings
Hourly_Calories$ActivityHour <- as.POSIXct(Hourly_Calories$ActivityHour, format = "%m/%d/%Y %I:%M:%S %p", tz = Sys.timezone())
Hourly_Calories$time <- format(Hourly_Calories$ActivityHour, format = "%H:%M:%S")
Hourly_Calories$date <- format(Hourly_Calories$ActivityHour, format = "%m/%d/%y")
# hourly steps
Hourly_Steps$ActivityHour <- as.POSIXct(Hourly_Steps$ActivityHour, format = "%m/%d/%Y %I:%M:%S %p", tz = Sys.timezone())
Hourly_Steps$time <- format(Hourly_Steps$ActivityHour, format = "%H:%M:%S")
Hourly_Steps$date <- format(Hourly_Steps$ActivityHour, format = "%m/%d/%y")
# hourly intensities
Hourly_Intensities$ActivityHour <- as.POSIXct(Hourly_Intensities$ActivityHour, format = "%m/%d/%Y %I:%M:%S %p", tz = Sys.timezone())
Hourly_Intensities$time <- format(Hourly_Intensities$ActivityHour, format = "%H:%M:%S")
Hourly_Intensities$date <- format(Hourly_Intensities$ActivityHour, format = "%m/%d/%y")
# sleep: parse the date-time and keep only the date string for merging
Sleep_Day$SleepDay <- as.POSIXct(Sleep_Day$SleepDay, format = "%m/%d/%Y %I:%M:%S %p", tz = Sys.timezone())
Sleep_Day$date <- format(Sleep_Day$SleepDay, format = "%m/%d/%y")







In [8]:
head(Daily_Activity)
head(Daily_Calories)
head(Daily_Steps)
head(Hourly_Calories)
head(Hourly_Steps)
head(Hourly_Intensities)
head(Sleep_Day)

Now that the data types have been correctly formatted, we can merge the different data available.
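
One quick sanity check before merging is to confirm that no value turned into `NA` during parsing, since `as.POSIXct()` returns `NA` for strings that do not match the format. The sketch below uses made-up date-time strings in the same layout as the hourly files. The fixed `"UTC"` time zone is an assumption made for reproducibility (the notebook uses `Sys.timezone()`), and parsing `%p` (AM/PM) can depend on the session locale.

```r
# Toy date-time strings in the same layout as the hourly files (made-up values);
# the last one is deliberately malformed
raw_hours <- c("4/12/2016 12:00:00 AM", "4/12/2016 1:00:00 PM", "not a date")

# Fixed time zone so the result does not depend on the machine (an assumption)
parsed <- as.POSIXct(raw_hours, format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")

# Strings that failed to parse become NA, so count them
n_failed <- sum(is.na(parsed))
print(n_failed)

# 12-hour times are converted to 24-hour clock strings
print(format(parsed, "%H:%M:%S"))
```

On the real tables, the equivalent check would be something like `sum(is.na(Hourly_Steps$ActivityHour))`, which should be zero if every row parsed.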

## **9. Merging Data**

While checking data types for errors, we saw that Daily_Activity already contains the columns of Daily_Calories, Daily_Intensities, and Daily_Steps, so those tables do not need to be merged again. Instead, let us merge Sleep_Day with Daily_Activity by "Id" and "date".<br/>
Next, we will merge Hourly_Calories, Hourly_Intensities and Hourly_Steps by "Id", "date", "ActivityHour" and "time".

In [9]:
merged_daily_data <- merge(Sleep_Day, Daily_Activity, by =c('Id','date'))
head(merged_daily_data)
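
Note that `merge()` performs an inner join by default, so only Id/date pairs present in both tables are kept. The sketch below illustrates this on two made-up data frames (`sleep_toy` and `activity_toy` are hypothetical) and shows how `all.y = TRUE` would keep every activity row instead.

```r
# Toy stand-ins for Sleep_Day and Daily_Activity (made-up values)
sleep_toy <- data.frame(
  Id = c(1, 1, 2),
  date = c("04/12/16", "04/13/16", "04/12/16"),
  TotalMinutesAsleep = c(400, 380, 450)
)
activity_toy <- data.frame(
  Id = c(1, 1, 2, 3),
  date = c("04/12/16", "04/14/16", "04/12/16", "04/12/16"),
  TotalSteps = c(9000, 7000, 12000, 5000)
)

# Default merge(): inner join, only Id/date pairs found in both tables survive
inner <- merge(sleep_toy, activity_toy, by = c("Id", "date"))

# all.y = TRUE keeps every activity row and fills missing sleep values with NA
right <- merge(sleep_toy, activity_toy, by = c("Id", "date"), all.y = TRUE)

print(nrow(inner))
print(nrow(right))
```

Comparing `nrow(merged_daily_data)` with `nrow(Daily_Activity)` in the same way shows how many activity days have no matching sleep record.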


Let us now merge Hourly_Steps with Hourly_Calories, and then add Hourly_Intensities.

In [10]:
merged_data <- merge(Hourly_Steps, Hourly_Calories, by = c('Id','date','ActivityHour','time'))
head(merged_data)


In [11]:
merged_hourly_data <- merge(merged_data, Hourly_Intensities, by = c('Id','date','ActivityHour','time'))
head(merged_hourly_data)


## **10. Visualization**

Before visualising, we’ll create a common theme for our plots.



In [12]:
custom_theme <- function() {
  theme(
    panel.border = element_rect(colour = "black", 
                                fill = NA, 
                                linetype = 1),
    panel.background = element_rect(fill = "white", 
                                    color = 'azure2'),
    panel.grid.minor.y = element_blank(),
    axis.text = element_text(colour = "black", 
                             family = "Tahoma",
                             size= 12),
    axis.title = element_text(colour ="black", 
                             family = "Tahoma",
                             size= 16),
    axis.ticks = element_line(colour = "black"),
    plot.title = element_text(size=23, 
                              hjust = 0.5, 
                              family = "Tahoma"),
    plot.subtitle=element_text(size=16, 
                              hjust = 0.5),
    plot.caption = element_text(colour = "brown4", 
                              face = "italic", 
                              family = "Tahoma")
  )
}


#### 10.1. Calories vs Total Steps

In [13]:
# scatter plot with a smoothed trend line; no grouping is needed for this plot
Daily_Activity %>%
  ggplot(aes(x = TotalSteps, y = Calories, color = Calories)) +
  geom_point() +
  geom_smooth() +
  custom_theme() +
  theme(legend.position = c(.9, .2),
        legend.spacing.y = unit(1, "mm"),
        panel.border = element_rect(colour = "black", fill = NA),
        legend.background = element_blank(),
        legend.box.background = element_rect(colour = "brown4")) +
  labs(title = 'Calories vs Total Steps',
       y = 'Calories',
       x = 'Total Steps')


The graph above shows a positive correlation between total steps and calories burnt.
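
To put a number on the relationship seen in the plot, one could compute the Pearson correlation coefficient with `cor()`. The values below are made up for illustration (`activity_toy` is a hypothetical stand-in for Daily_Activity), so the printed coefficient says nothing about the real data.

```r
# Toy stand-in for Daily_Activity (made-up values, for illustration only)
activity_toy <- data.frame(
  TotalSteps = c(2000, 5000, 8000, 11000, 14000),
  Calories   = c(1600, 1900, 2100, 2500, 2700)
)

# Pearson correlation: values near +1 indicate a strong positive linear relationship
r <- cor(activity_toy$TotalSteps, activity_toy$Calories)
print(round(r, 2))
```

On the real data the call would be `cor(Daily_Activity$TotalSteps, Daily_Activity$Calories)`.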

#### 10.2. Sedentary minutes vs Total Minutes Asleep

In [14]:
# scatter plot with a smoothed trend line; no grouping is needed for this plot
merged_daily_data %>%
  ggplot(aes(x = SedentaryMinutes, y = TotalMinutesAsleep, color = TotalMinutesAsleep)) +
  geom_point() +
  geom_smooth() +
  custom_theme() +
  theme(legend.position = c(.87, .85),
        legend.spacing.y = unit(1, "mm"),
        panel.border = element_rect(colour = "black", fill = NA),
        legend.background = element_blank(),
        legend.box.background = element_rect(colour = "brown4")) +
  labs(title = 'Sedentary minutes vs Total Minutes Asleep',
       y = 'Total Minutes Asleep',
       x = 'Sedentary Minutes')

This graph shows a negative relationship between sedentary minutes and sleep duration. It is often said that the more active we are during the day, the better we sleep. Still, a correlation alone does not establish causation, so the data we have is not sufficient to state whether less activity is causing shorter sleep.
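
A simple linear fit is one way to describe the direction and rough size of such a relationship, without making any claim about cause. The sketch below uses made-up values (`merged_toy` is a hypothetical stand-in for merged_daily_data), and a straight line is an assumption that may not suit the real data.

```r
# Toy stand-in for merged_daily_data (made-up values, for illustration only)
merged_toy <- data.frame(
  SedentaryMinutes   = c(500, 600, 700, 800, 900),
  TotalMinutesAsleep = c(480, 450, 430, 400, 370)
)

# Least-squares line: minutes asleep as a linear function of sedentary minutes
fit <- lm(TotalMinutesAsleep ~ SedentaryMinutes, data = merged_toy)

# A negative slope means more sedentary minutes go with fewer minutes asleep
slope <- unname(coef(fit)["SedentaryMinutes"])
print(slope)
```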

#### 10.3. Duration (in minutes) vs Activity Type

In [15]:
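# total minutes per activity type, reshaped to one row per type
activity_df <- Daily_Activity %>%
  select(VeryActiveMinutes,
         FairlyActiveMinutes,
         LightlyActiveMinutes,
         SedentaryMinutes) %>%
  summarise(across(everything(), sum)) %>%
  pivot_longer(everything(), names_to = "activity_type", values_to = "duration") %>%
  # explicit levels so each label is tied to its column rather than to alphabetical order
  mutate(activity_type = factor(activity_type,
                                levels = c('FairlyActiveMinutes', 'LightlyActiveMinutes',
                                           'SedentaryMinutes', 'VeryActiveMinutes'),
                                labels = c('Moderate Activity', 'Light Activity',
                                           'Sedentary', 'Heavy Activity')))

ggplot(data = activity_df) +
  geom_col(mapping = aes(x = activity_type, y = duration, fill = duration)) +
  custom_theme() +
  labs(title = 'Duration (in minutes) vs Activity Type',
       x = 'activity type',
       y = 'duration (in minutes)')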




Here we can clearly see that users are sedentary for the majority of the day.
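
Expressing each activity type as a share of total tracked minutes makes the comparison easier to state in words. This is a minimal sketch with made-up totals (`activity_toy` is a hypothetical stand-in for activity_df), so the percentages are not the real ones.

```r
# Toy totals per activity type (made-up numbers standing in for activity_df)
activity_toy <- data.frame(
  activity_type = c("Sedentary", "Light Activity", "Moderate Activity", "Heavy Activity"),
  duration = c(800, 150, 30, 20)
)

# Share of total tracked minutes for each activity type, in percent
activity_toy$share_pct <- round(100 * activity_toy$duration / sum(activity_toy$duration), 1)
print(activity_toy)
```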

## **11. Conclusion**

**1. Regarding Calories Burnt vs Total Steps Taken:** Since more steps are associated with more calories burnt, Bellabeat could keep track of users’ daily calorie intake and recommend a number of steps accordingly. <br/>
<br/>
**2. Regarding Sedentary Minutes vs Total Time Asleep:** Since there is a visible negative relationship between the two, Bellabeat could encourage users to reduce sedentary minutes by sending creative notifications at different times throughout the day.<br/>
<br/>
**3. Regarding Activity Type and their duration:** Users spend most of their time inactive, which raises health concerns. The data does not tell us whether the users have desk jobs, which could explain the high sedentary minutes, but Bellabeat could still recommend a physical activity at calculated intervals via notifications.