# Bellabeat Case Study

**Business Task Summary:**
The objective of this case study is to analyze smart device usage data to understand how consumers use their wellness devices and how those patterns might apply to Bellabeat’s customers. The resulting trends and behaviors will inform Bellabeat’s marketing strategy, helping the company increase engagement, product adoption, and, ultimately, growth.


# 1. Ask


**Key Business Questions:**

1. What are some trends in smart device usage?
2. How can these trends apply to Bellabeat customers?
3. How can these trends help influence Bellabeat’s marketing strategy?



# 2. Prepare

**Data Sources:**

**Fitbit Fitness Tracker Data:**
A public dataset downloaded from Kaggle, containing personal fitness data from 30 Fitbit users, including daily activity, steps, heart rate, and sleep monitoring.

The dataset was used to explore **daily activity**, **steps**, and **calories** to uncover user habits that can inform marketing strategy.

**Data Storage and Organization:**
* The R packages dplyr, tidyverse, ggplot2, tidyr, and stringr were loaded to clean, transform, and visualize the CSV data.
* The data was stored in CSV format and loaded into R for cleaning and analysis.
* Key columns of interest included **id**, **activitydate**, **totalsteps**, and **calories**, plus the derived **activity_level** column.

In [None]:
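# tidyverse already attaches dplyr, tidyr, stringr, and ggplot2;
# the explicit calls below just make each dependency visible
library(dplyr)
library(tidyverse)
library(ggplot2)
library(tidyr)
library(stringr)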

In [None]:
#read dataset csv file
daily_activity <- read.csv("/kaggle/input/fitbit/mturkfitbit_export_4.12.16-5.12.16/Fitabase Data 4.12.16-5.12.16/dailyActivity_merged.csv")

#view dataset
head(daily_activity)
str(daily_activity)

# 3. Process

**Data Cleaning and Transformation:**


* **Renaming columns:** Converted column names to lowercase for consistency.
* **Date formatting:** Standardized the activitydate column by replacing hyphens with slashes, then converted it from character to the Date type.
* **Handling missing values:** Checked for missing or inconsistent values; no major issues were found.
* **Creating new variables:** Derived day_type (Weekday vs. Weekend, from the activity date) and activity_level (from daily steps): Sedentary (under 5,000 steps), Lightly Active (5,000 to 9,999), and Very Active (10,000 or more).
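The notebook does not show the missing-value check itself. One way to run it is sketched below on a small hypothetical data frame, `toy_activity`, standing in for `daily_activity` after the column names are lowercased; the values are invented for illustration only. On the real data, the same calls would be run against `daily_activity`.

```r
library(dplyr)

# Hypothetical toy data standing in for daily_activity (lowercased column names)
toy_activity <- data.frame(
  id           = c(1, 1, 2, 2, 2),
  activitydate = c("4/12/2016", "4/13/2016", "4/12/2016", "4/12/2016", "4/13/2016"),
  totalsteps   = c(8000, 12000, 3000, 3000, NA),
  calories     = c(2000, 2500, 1600, 1600, 1700)
)

# Number of missing values in each column
na_counts <- colSums(is.na(toy_activity))

# Number of rows that exactly repeat an earlier row
n_duplicates <- sum(duplicated(toy_activity))

# Number of distinct users
n_users <- n_distinct(toy_activity$id)

print(na_counts)
print(n_duplicates)
print(n_users)
```

On the real data, the distinct-user count is worth comparing with the 30 users stated in the dataset description.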

**Data Transformation:**
* Reshaped the data using the tidyr package to make activity minutes easier to analyze.
* Aggregated the data by activity level to summarize calories and steps.
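The reshaping and aggregation code is not included in the notebook. A minimal sketch follows, assuming the lowercased minute columns of `dailyActivity_merged.csv` (`veryactiveminutes`, `fairlyactiveminutes`, `lightlyactiveminutes`, `sedentaryminutes`) and using invented toy values. `tidyr::pivot_longer()` turns the four minute columns into a single `minute_type`/`minutes` pair, and `group_by()` plus `summarise()` produce per-level averages using the notebook's step thresholds.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy rows using the lowercased minute columns of the daily activity file
toy_activity <- data.frame(
  id                   = c(1, 1, 2),
  totalsteps           = c(12000, 6000, 3000),
  calories             = c(2600, 2100, 1700),
  veryactiveminutes    = c(45, 10, 0),
  fairlyactiveminutes  = c(20, 15, 5),
  lightlyactiveminutes = c(200, 180, 90),
  sedentaryminutes     = c(700, 800, 1100)
)

# Reshape: one row per (day, minute type) instead of four minute columns per day
minutes_long <- toy_activity %>%
  pivot_longer(
    cols = ends_with("minutes"),
    names_to = "minute_type",
    values_to = "minutes"
  )

# Aggregate: number of days and average steps/calories per activity level,
# using the same step thresholds as the notebook's activity_level column
level_summary <- toy_activity %>%
  mutate(activity_level = case_when(
    totalsteps < 5000 ~ "Sedentary",
    totalsteps >= 5000 & totalsteps < 10000 ~ "Lightly Active",
    totalsteps >= 10000 ~ "Very Active"
  )) %>%
  group_by(activity_level) %>%
  summarise(
    days = n(),
    mean_steps = mean(totalsteps),
    mean_calories = mean(calories),
    .groups = "drop"
  )

print(minutes_long)
print(level_summary)
```

The long format makes it straightforward to plot or summarize all minute types with a single `group_by(minute_type)`.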

In [None]:
# convert all column names to lowercase
daily_activity <- rename_with(daily_activity, tolower)

# unify the date format by replacing "-" with "/" in every date value
daily_activity$activitydate <- str_replace_all(daily_activity$activitydate, "-", "/")

# convert the date column from character to Date (month/day/year)
daily_activity$activitydate <- as.Date(daily_activity$activitydate, format = "%m/%d/%Y")

# add new column "day_type"; POSIXlt $wday is 0 for Sunday and 6 for Saturday,
# so unlike weekdays() this check does not depend on the session's language
daily_activity <- daily_activity %>% mutate(day_type = ifelse(
  as.POSIXlt(activitydate)$wday %in% c(0, 6),
  "Weekend", "Weekday"))

# add new column "activity_level" from daily step counts; storing it as a
# factor with explicit levels keeps plots in Sedentary, Lightly, Very order
daily_activity <- daily_activity %>% mutate(activity_level = factor(case_when(
  totalsteps < 5000 ~ "Sedentary",
  totalsteps >= 5000 & totalsteps < 10000 ~ "Lightly Active",
  totalsteps >= 10000 ~ "Very Active"),
  levels = c("Sedentary", "Lightly Active", "Very Active")))
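A quick way to confirm the derived columns behave as intended is to apply the same rules to hand-picked boundary values. The step counts and dates below are hypothetical test inputs, not rows from the dataset; April 16 and 17, 2016 fall on a Saturday and a Sunday.

```r
library(dplyr)

# Hypothetical step counts on either side of each threshold
boundary_steps <- c(0, 4999, 5000, 9999, 10000, 25000)
boundary_levels <- case_when(
  boundary_steps < 5000 ~ "Sedentary",
  boundary_steps >= 5000 & boundary_steps < 10000 ~ "Lightly Active",
  boundary_steps >= 10000 ~ "Very Active"
)
print(data.frame(steps = boundary_steps, level = boundary_levels))

# Hypothetical Friday, Saturday, Sunday, Monday in the dataset's m/d/Y style
boundary_dates <- as.Date(c("4/15/2016", "4/16/2016", "4/17/2016", "4/18/2016"),
                          format = "%m/%d/%Y")
# $wday runs from 0 (Sunday) to 6 (Saturday)
boundary_day_type <- ifelse(as.POSIXlt(boundary_dates)$wday %in% c(0, 6),
                            "Weekend", "Weekday")
print(data.frame(date = boundary_dates, day_type = boundary_day_type))
```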




# 4. Analyze

### **Key Findings from the Visualizations and Analysis:**

### Scatter plot: shows a positive correlation between total steps and calories burned, by activity level.

In [None]:
# scatter plot of calories burned vs total steps, coloured by activity level;
# setting colour in geom_smooth overrides the mapping, so one overall trend line is drawn
ggplot(data = daily_activity, aes(x = totalsteps, y = calories, colour = activity_level)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, colour = "red") +
  labs(title = "Calories Burned vs Total Steps", x = "Total Steps", y = "Calories", colour = "Activity Level")

### Line graph: shows that more steps were taken on weekends.

In [None]:
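# average total steps per date (across users), with points coloured by day type
daily_activity %>%
  group_by(activitydate, day_type) %>%
  summarise(mean_steps = mean(totalsteps), .groups = "drop") %>%
  ggplot(aes(x = activitydate, y = mean_steps)) +
  geom_line(colour = "grey60") +
  geom_point(aes(colour = day_type), size = 2) +
  labs(title = "Average Total Steps by Date: Weekdays vs Weekends", y = "Average Total Steps", x = "Activity Date", colour = "Day Type")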


### Line graph: shows that more calories were burned on weekends.

In [None]:
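# average calories burned per date (across users), with points coloured by day type
daily_activity %>%
  group_by(activitydate, day_type) %>%
  summarise(mean_calories = mean(calories), .groups = "drop") %>%
  ggplot(aes(x = activitydate, y = mean_calories)) +
  geom_line(colour = "grey60") +
  geom_point(aes(colour = day_type), size = 2) +
  scale_colour_brewer(palette = "Set2") +
  labs(title = "Average Calories by Date: Weekdays vs Weekends", y = "Average Calories", x = "Activity Date", colour = "Day Type")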
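The line graphs show the weekday/weekend pattern visually; a summary table puts numbers on it. The sketch below uses invented toy rows (`toy_activity`) and recomputes `day_type` with the same weekend rule; on the real data, `daily_activity` already has `day_type`, so only the `group_by()` and `summarise()` steps would be needed. It compares averages only and does not test whether the difference is statistically significant.

```r
library(dplyr)

# Hypothetical toy rows: a Friday, Saturday, and Monday for two users
toy_activity <- data.frame(
  id = c(1, 1, 1, 2, 2, 2),
  activitydate = as.Date(rep(c("2016-04-15", "2016-04-16", "2016-04-18"), times = 2)),
  totalsteps = c(7000, 11000, 6000, 4000, 9000, 5000),
  calories = c(2100, 2500, 2000, 1800, 2200, 1900)
)

# Average steps and calories per day type, plus how many user-days each covers
day_type_summary <- toy_activity %>%
  mutate(day_type = ifelse(as.POSIXlt(activitydate)$wday %in% c(0, 6),
                           "Weekend", "Weekday")) %>%
  group_by(day_type) %>%
  summarise(
    days = n(),
    mean_steps = mean(totalsteps),
    mean_calories = mean(calories),
    .groups = "drop"
  )

print(day_type_summary)
```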


### Bar graph: shows that each user burned more calories on Very Active days than on Sedentary days.

In [None]:
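# per-user graph: average daily calories for each activity level
# (averaging, rather than summing, keeps bars from growing with the number of days per level)
ggplot(data = daily_activity, aes(x = activity_level, y = calories, fill = activity_level)) +
  stat_summary(fun = mean, geom = "col") +
  labs(title = "Per-User Average Daily Calories by Activity Level", y = "Average Calories per Day", x = "Activity Level", fill = "Activity Level") +
  facet_wrap(~id)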

In [None]:
#user graph : Breakdown of Total Steps by activity level
ggplot(data = daily_activity, aes(y=totalsteps, x=activity_level, fill = activity_level))+
       geom_bar(stat = "identity") +
       labs(title = "User graph : Breakdown of Total Steps by Activity Level",y = "Total Steps", x = "Activity Level")+
       facet_wrap(~id) 

**Trends in Smart Device Usage:**
Users were more active on weekends than on weekdays, with step counts and calories burned appearing higher on weekends in the line graphs.
A positive relationship was found between total steps and calories burned: days with more steps tend to show higher calorie expenditure.

**Activity Level Breakdown:**
Daily step counts were used to classify each user-day as Sedentary, Lightly Active, or Very Active.
On Very Active days, users burned noticeably more calories than on Sedentary days.

**Key Relationships:**
Total steps are positively correlated with calories burned, and higher activity levels are associated with more calories burned.
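The strength of the relationship can be quantified with base R's `cor()`, which computes the Pearson coefficient by default. The toy values below are hypothetical. One caveat when reading per-level coefficients: each level covers only a narrow band of step counts, and restricting a variable's range usually lowers its correlation, so per-level values are not directly comparable with the overall one.

```r
library(dplyr)

# Hypothetical toy data; on the real data, use daily_activity instead
toy_activity <- data.frame(
  totalsteps = c(1000, 3000, 4500, 5500, 7000, 9000, 10500, 13000, 16000),
  calories   = c(1500, 1600, 1700, 1900, 2000, 2250, 2300, 2700, 2900)
)

# Pearson correlation between daily steps and calories burned
overall_r <- cor(toy_activity$totalsteps, toy_activity$calories)

# The same coefficient computed separately within each activity level
by_level_r <- toy_activity %>%
  mutate(activity_level = case_when(
    totalsteps < 5000 ~ "Sedentary",
    totalsteps >= 5000 & totalsteps < 10000 ~ "Lightly Active",
    totalsteps >= 10000 ~ "Very Active"
  )) %>%
  group_by(activity_level) %>%
  summarise(r = cor(totalsteps, calories), days = n(), .groups = "drop")

print(overall_r)
print(by_level_r)
```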



# 5. Share

**Key Visualizations:**

**Scatter Plot (Total Steps vs. Calories):**
Highlights the positive relationship between steps and calories burned, reinforcing the importance of physical activity.

**Line Plot (Steps and Calories by Day Type):**
Shows trends in steps and calories, with peaks on weekends.

**Bar Plot (Steps and Calories by Activity Level):**
Shows how steps and calories differ across activity levels, with "Very Active" days associated with the most calories burned.

**Target Audience:**

Bellabeat Marketing Team: The audience for these insights includes Bellabeat's marketing executives and product managers who can use the findings to shape customer engagement and marketing strategies.


# 6. Act
**Recommendations:**

**Target Weekend Activity:**
Since users were more active on weekends, Bellabeat should capitalize on this trend by creating weekend-specific wellness challenges or campaigns, such as step goals, calorie-burn targets, or hydration reminders using its smart water bottle, Spring.

For example, introduce “Weekend Warrior” badges or rewards for users who exceed their typical weekday activity levels.
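The “Weekend Warrior” idea needs a concrete rule for “exceeding typical weekday activity.” One hypothetical rule, sketched below with invented toy data, compares each weekend day's steps with that user's mean weekday steps. The `weekend_warrior` column and the mean-based baseline are assumptions for illustration, not an existing Bellabeat feature.

```r
library(dplyr)

# Hypothetical toy rows that already carry the notebook's day_type column
toy_activity <- data.frame(
  id = c(1, 1, 1, 2, 2, 2),
  day_type = c("Weekday", "Weekday", "Weekend", "Weekday", "Weekday", "Weekend"),
  totalsteps = c(6000, 8000, 9500, 10000, 12000, 7000)
)

# Each user's typical (mean) weekday step count
weekday_baseline <- toy_activity %>%
  filter(day_type == "Weekday") %>%
  group_by(id) %>%
  summarise(weekday_mean = mean(totalsteps), .groups = "drop")

# Hypothetical badge rule: a weekend day earns the badge if it beats that user's weekday mean
weekend_badges <- toy_activity %>%
  filter(day_type == "Weekend") %>%
  inner_join(weekday_baseline, by = "id") %>%
  mutate(weekend_warrior = totalsteps > weekday_mean)

print(weekend_badges)
```

A median baseline would be less sensitive to a single unusually active weekday; the mean is used here only for simplicity.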

**Personalized Messaging:**
Use personalized messaging to motivate users based on their activity level:

* **Sedentary users:** Send reminders to increase activity through achievable goals, such as walking 2,000 more steps per day. Integrate mindfulness content to encourage healthier routines.
* **Lightly Active users:** Encourage progression with moderate-intensity fitness challenges and highlight the benefits of reaching higher activity levels.
* **Very Active users:** Provide advanced fitness insights (e.g., trends in activity and calorie burn), share recovery tips, or recommend subscription-based features to keep them engaged.

**Promote Consistency:**

Use marketing campaigns to encourage consistent daily activity. Emphasize how small, consistent efforts can lead to significant health benefits over time.

**Segmented Marketing:**
Segment users by activity level and target them with customized content:

* **For Lightly Active users:** Encourage them to take on moderate activity challenges.
* **For Sedentary users:** Offer beginner-level fitness content and guides.
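Because `activity_level` in this notebook is assigned per day, a single user can fall into several levels, while segmented marketing needs one label per user. A minimal sketch, assuming each user is segmented by their average daily steps with the same thresholds (a hypothetical choice; the most frequent daily level would be another option), on invented toy data:

```r
library(dplyr)

# Hypothetical toy rows: three users with two days each
toy_activity <- data.frame(
  id = c(1, 1, 2, 2, 3, 3),
  totalsteps = c(3000, 4000, 7000, 8000, 11000, 13000)
)

# One segment per user, based on average daily steps and the notebook's per-day thresholds
user_segments <- toy_activity %>%
  group_by(id) %>%
  summarise(mean_steps = mean(totalsteps), .groups = "drop") %>%
  mutate(segment = case_when(
    mean_steps < 5000 ~ "Sedentary",
    mean_steps >= 5000 & mean_steps < 10000 ~ "Lightly Active",
    mean_steps >= 10000 ~ "Very Active"
  ))

print(user_segments)
```

The resulting `segment` column could then drive which of the messages above each user receives.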
