## The Business Problem

[Bellabeat](https://bellabeat.com) is looking to analyze existing smart device usage data to understand how potential users track different health-related metrics in their daily lives. These trends could be used to identify features that current Bellabeat customers may not know our devices have. They could also inform marketing strategy by identifying other Bellabeat products to recommend to existing customers and which features to highlight in future marketing campaigns.

In this analysis we are looking to identify the following trends:

  * frequency of fitness wearable use
  * frequency of tracking fitness activity with wearable technology
  * impact of wearable fitness trackers on purchasing decisions
  * feature used least consistently
  * when users are most/least active (to schedule marketing campaigns)
  * attitudes of users towards their wearable technology

## The Data
  
For this analysis, the marketing team started with the [FitBit Fitness Tracker Data](https://www.kaggle.com/datasets/arashnic/fitbit) (CC0: Public Domain, dataset made available through [Mobius](https://www.kaggle.com/arashnic)). This dataset contains personal fitness information from thirty Fitbit users who consented to the submission of their data, including output for physical activity, heart rate, and sleep monitoring. It includes daily activity measures such as steps taken, distance traveled, and heart rate that can be used to explore consumer usage habits. The dataset has limitations: it does not cover all the features available on Bellabeat devices, it does not cover any device type other than Fitbit, and it does not capture general attitudes towards fitness wearables that could be useful for marketing purposes. For these reasons, additional data was needed.

Two additional sets of data were sourced to be evaluated as potential supplements to the existing data. 

The [FitLife: Health & Fitness Tracking](https://www.kaggle.com/datasets/jijagallery/fitlife-health-and-fitness-tracking-dataset) dataset (CC0: Public Domain, dataset made available through [Jija Taheri](https://www.kaggle.com/jijagallery)) was considered and evaluated as a possible data source. However, it is synthetic data: it wasn't captured from actual users, but was created specifically for data exploration and community learning. This dataset didn't contain any trends applicable to the business question we are attempting to answer in this case study, but it was a good exercise in cleaning and analyzing data using various tools, including BigQuery, spreadsheets, and Tableau Public. My changelog for this dataset can be found [here](https://672c99c272ad43a8b8b96a3eec6c855f.app.posit.cloud/file_show?path=%2Fcloud%2Fproject%2Fbellabeat_case_study_fitlife_changelog.nb.html).

The [Fitness Consumer Survey Data](https://www.kaggle.com/datasets/harshitaaswani/fitness-consumer-survey-data) dataset (CC0: Public Domain, dataset made available through [Harshita Aswani](https://www.kaggle.com/harshitaaswani)) was also considered and evaluated as a possible data source. This dataset contains survey responses from a variety of respondents about their attitudes and experiences using a fitness wearable. The data was collected from an online survey, and all respondents consented to their anonymous responses being shared. The data is primary in nature and appears credible and relatively current. This dataset will be useful in providing context to the Bellabeat executives on broader views of fitness wearables. My changelog for this dataset can be found [here](#fitness_consumer_survey).

## The Analysis

To discuss trends that can apply to Bellabeat customers, we must first understand who those customers are. Using the Fitness Consumer Survey Data, we can see that the respondents range in age from under 18 to 64 years old, with most falling in the 18-24 and 25-34 ranges. To better represent our target market segment, we then filter our data to exclude those who self-identified as "Male." After filtering, the 18-24 and 25-34 categories still hold as the top two, with the 45-54 category coming in with the third most respondents.
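The filter-then-count step described above can be sketched as follows. This is only an illustration, written in Python with an in-memory SQLite database standing in for the PostgreSQL `survey_605` table. The rows are made up for demonstration, and the exact answer labels (the age brackets and the gender values) are assumed to match the survey's wording.

```python
import sqlite3

# In-memory SQLite stands in for the PostgreSQL survey_605 table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE survey_605 (age TEXT, gender TEXT)")

# Made-up rows for illustration only; these are not the real survey responses.
sample_rows = [
    ("18-24", "Female"),
    ("18-24", "Male"),
    ("25-34", "Female"),
    ("25-34", "Female"),
    ("45-54", "Female"),
    ("18-24", "Female"),
]
conn.executemany("INSERT INTO survey_605 VALUES (?, ?)", sample_rows)

# Exclude respondents who self-identified as "Male", then tally by age group.
age_counts = conn.execute(
    """
    SELECT age, COUNT(*) AS respondents
    FROM survey_605
    WHERE gender <> 'Male'
    GROUP BY age
    ORDER BY respondents DESC, age
    """
).fetchall()

for age, respondents in age_counts:
    print(f"{age}: {respondents}")
```

The same `SELECT` should carry over to the real table largely unchanged, provided the stored gender label is exactly `Male`.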

Our customers span a wide range of occupations and education levels. These images will be useful to refer back to when tailoring campaigns to specific slices of the market, though they provide more general categorizations rather than specific occupational data.

When examining how engaged users feel with their fitness wearable, we see interesting trends when we filter our graph by age groups. The 18-24 year olds span from negative to very positive responses regarding engagement, while other age groups tend to answer in neighboring pairs like neutral and somewhat engaged, or somewhat and very engaged.

On the next slide we start to examine the different ways users engage with their fitness wearables. Most customers reported a positive impact on their fitness routines, faster progress toward their fitness goals, and an increase in both enjoying exercise and maintaining motivation to exercise.

Continuing our exploration of how users engage with their wearables, we see that users give generally favorable responses about improvement in their sleep patterns, about feeling connected to the fitness community through their wearables, and about the impact on both their overall health and well-being.

Now that we know a little bit more about our users and their opinions, we can take a quick peek at some anonymized fitness tracker data. These four charts look overwhelming at first because they show data for every user, but if you scroll through the IDs in the legend, you will see the data in each category for a single user at a time. Please note that not every user tracked sleep data, so some of those charts will appear blank.

So, now we know a bit about the users and a bit about how they are using products like ours. Let's take a look at their decision-making trends. As you can see, users reported that using a wearable fitness device influenced them to make other health and fitness changes, such as changing their diet, increasing their exercise activity, joining a gym or fitness class, and purchasing other fitness-related products. This is helpful because we can use the information to decide which features to highlight in ad campaigns and to target possible repeat customers for new product launches.

Our last slide just repeats the fourth graph from the previous slide because it is important. The majority of respondents reported that using a fitness wearable influenced their decision to purchase other fitness-related products. This shows that there is a potential repeat purchase audience available, which could lead to possible future products and potential partnerships for limited edition jewelry accessories, etc. 

## The Visualizations

Visualizations to accompany the analysis were created using Tableau Public and can be [found here.](https://public.tableau.com/views/BellabeatCaseStudyFinal/Story1?:language=en-US&:sid=&:redirect=auth&:display_count=n&:origin=viz_share_link)

## Conclusion and Key Takeaways

Since we have evidence that the majority of our customers fall into the 18-34 age range, the existing menstrual cycle and fertility tracking features will be important to highlight in targeted ad campaigns, along with the wellness features. For the 45-54 segment, which is also a popular demographic for our products, we could highlight existing features such as the Coach library.

In the future, we could also consider software updates that use existing hardware features to help predict and manage perimenopause symptoms. This would further tap into that demographic and help women continue their fitness journey across their whole lives, instead of focusing only on younger adults.

Recommended next steps for analysis: gather anonymized usage data from different times of the year and compare it with the snapshot we already have. This would help narrow down which features are used most and how their popularity changes from season to season, which could be useful information for the marketing team.

<a id="fitlife"></a>

# FitLife Dataset Changelog

This dataset is too large to view in a spreadsheet, so this notebook is a changelog of the SQL used to clean the health_fitness_dataset in Google BigQuery.

## 1. Investigating NULL Values

Variations of the following script were run for every column to discover if there were any NULL values.

In [1]:
-- Returns any rows where the column is NULL; repeat with each column name in turn.
SELECT *
FROM `complete-will-468115-p2.fitlife360_synthetic_health_data.fitlife`
WHERE
  participant_id IS NULL;
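As an alternative to running one query per column, the NULL check can be done in a single pass that returns a NULL count for every column. The sketch below illustrates the idea in Python with an in-memory SQLite table. The three columns and all of the rows are hypothetical stand-ins for the real `fitlife` table. In BigQuery, the equivalent aggregate would be `COUNTIF(column IS NULL)`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical three-column slice of the fitlife table, for illustration only.
conn.execute(
    "CREATE TABLE fitlife (participant_id INTEGER, daily_steps INTEGER, hours_sleep REAL)"
)
conn.executemany(
    "INSERT INTO fitlife VALUES (?, ?, ?)",
    [(1, 8000, 7.5), (2, None, 6.0), (3, 12000, None), (4, None, 8.0)],
)

columns = ["participant_id", "daily_steps", "hours_sleep"]

# Build one SELECT that returns a NULL count per column.
# SUM(col IS NULL) works because SQLite evaluates the comparison to 0 or 1.
select_list = ", ".join(f"SUM({col} IS NULL) AS {col}_nulls" for col in columns)
row = conn.execute(f"SELECT {select_list} FROM fitlife").fetchone()

null_counts = dict(zip(columns, row))
print(null_counts)
```

A count of zero for every column would confirm that no further NULL handling is needed.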


<a id="fitness_consumer_survey"></a>

# Fitness Consumer Survey Data Changelog

This dataset is small enough to view as a spreadsheet, as it only contains data from 30 respondents and 21 questions, plus a timestamp. However, it was uploaded into PostgreSQL as practice for creating tables and importing datasets. Most of the analysis was completed using spreadsheet software.

## 1. Uploading the Data

A table titled survey_605 was created in the bellabeat server using the following query:

In [None]:
create table survey_605 (
	response_timestamp timestamp,
	age text,
	gender text, 
	education_level text,
	occupation text,
	weekly_exercise_frequency text,
	length_wearable_history text,
	wearable_use_frequency text,
	fitness_data_tracking_frequency text,
	impact_fitness_routine text,
	impact_fitness_motivation text,
	impact_exercise_enjoyment text,
	wearable_engagement text,
	community_connection text,
	impact_fitness_goal_achievement text,
	impact_overall_health text,
	impact_sleep_patterns text,
	impact_overall_wellbeing text,
	influence_exercise_frequency text,
	influence_fitness_purchases text,
	influence_join_gym text,
	influence_dietary_decision text
);
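The changelog does not record how the CSV was loaded once the table existed. In PostgreSQL this is typically done with the `COPY` command or psql's `\copy`. As a rough, self-contained illustration of the import step, the sketch below loads CSV rows into a table using Python, with in-memory SQLite standing in for PostgreSQL. It uses only three of the columns, and the CSV text and its header names are made up.

```python
import csv
import io
import sqlite3

# Made-up CSV content; the real export's header names may differ.
csv_text = """response_timestamp,age,gender
2023-01-05 10:15:00,18-24,Female
2023-01-05 11:02:00,25-34,Male
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE survey_605 (response_timestamp TEXT, age TEXT, gender TEXT)"
)

# DictReader maps each CSV row to the named columns used in the INSERT below.
reader = csv.DictReader(io.StringIO(csv_text))
conn.executemany(
    "INSERT INTO survey_605 VALUES (:response_timestamp, :age, :gender)",
    reader,
)
conn.commit()

# Confirm that every CSV row arrived in the table.
row_count = conn.execute("SELECT COUNT(*) FROM survey_605").fetchone()[0]
print(row_count)
```

Comparing the row count against the number of survey responses is a quick way to confirm the import was complete.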

## 2. Investigating NULL Values

There were no null values in the dataset.

## 3. Summarizing data

A second sheet was created in the survey 605 file as a summary table. As the responses were tallied, it became evident that every survey question was multiple choice, though this could not be verified from the dataset's description. A list of the answer options would have been helpful, because it is currently unknown whether any options received zero responses, which would change the appearance of the data visualizations. To gauge this, I counted the "Strongly disagree" responses across the whole dataset, since many of the questions share the same rating scale. There was exactly one "Strongly disagree" response in the whole sheet, which means there are likely multiple questions with options that received zero responses.

Since all of the answers are standardized, the entire summary table was created using COUNTIF functions in Google Sheets.
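The same kind of summary could also be produced in SQL, where a `GROUP BY` tally plays the role of one COUNTIF per answer option. The sketch below is a minimal illustration in Python with an in-memory SQLite table. It uses two of the survey's column names, but the rows and the answer labels are made up, and the `tally` helper is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Two of the survey's scale questions; rows are made up for illustration.
conn.execute(
    "CREATE TABLE survey_605 (impact_fitness_routine TEXT, impact_sleep_patterns TEXT)"
)
conn.executemany(
    "INSERT INTO survey_605 VALUES (?, ?)",
    [
        ("Agree", "Neutral"),
        ("Strongly agree", "Agree"),
        ("Agree", "Strongly disagree"),
    ],
)

questions = ["impact_fitness_routine", "impact_sleep_patterns"]


def tally(column):
    """Count responses per answer option for one question."""
    rows = conn.execute(
        f"SELECT {column}, COUNT(*) FROM survey_605 GROUP BY {column}"
    ).fetchall()
    return dict(rows)


summary = {q: tally(q) for q in questions}

# Count one answer across every question, e.g. "Strongly disagree".
strongly_disagree_total = sum(
    counts.get("Strongly disagree", 0) for counts in summary.values()
)
print(summary)
print(strongly_disagree_total)
```

Note that `GROUP BY` only returns options that received at least one response, so options with zero responses stay invisible unless the full answer scale is known.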

## 4. Data Visualization

The data was visualized using various tables in Tableau Public.

## 5. Conclusion

This dataset is useful in that it provides demographic information, offers insight into possible marketing campaign directions, and shows potential repeat-buying trends. The opinions and self-evaluations provided by the respondents, when combined with the separate Fitbit user dataset, could provide powerful insight into Bellabeat's target market.
