
# Introduction

In this case study, I will analyze a public dataset using SQL for Bellabeat, a high-tech manufacturer of health-focused products for women. I will look at smart device data to gain insight into how consumers use their smart devices and provide recommendations for Bellabeat's marketing strategy.

# Background

Bellabeat was founded in 2013 by Urška Sršen and Sandro Mur with the goal of developing beautifully designed technology that would inform and inspire women. The technology would collect data on activity, sleep, stress, and reproductive health to empower women with knowledge about their own health.

Bellabeat products are available through a number of online retailers in addition to the company's own website. The company has invested in traditional advertising media such as radio, billboards, print, and television, but focuses extensively on digital marketing. It runs ads on YouTube and Google and is active on multiple social media platforms, including Facebook, Instagram, and Twitter.

**Bellabeat Products**
- Bellabeat app: Provides users with health data related to their activity, sleep, stress, menstrual cycle, and mindfulness habits. This data can help users better understand their current habits and make healthy decisions. The app connects to Bellabeat's line of smart wellness products.
- Leaf: A wellness tracker that can be worn as a bracelet, necklace, or clip and connects to the Bellabeat app to track activity, sleep, and stress.
- Time: A wellness watch with smart technology that connects to the Bellabeat app to track user activity, sleep, and stress.
- Spring: A water bottle with smart technology that connects to the Bellabeat app to track daily water intake.
- Bellabeat membership: A subscription-based membership program for users to have 24/7 access to fully personalized guidance on nutrition, activity, sleep, health and beauty, and mindfulness based on their lifestyle and goals.



### Business Task

Analyze smart device data to gain insight into how consumers use smart devices, and answer the following questions:

- What are some trends in smart device usage?
- How could these trends apply to Bellabeat customers?
- How could these trends help influence Bellabeat marketing strategy?



### Prepare Data

The dataset was obtained from [Kaggle](https://www.kaggle.com/datasets/arashnic/fitbit) and contains Fitbit tracker data from 30 users, including minute-, hourly-, and daily-level output for activity intensity, steps, calories, sleep, and heart rate. The dataset consists of 18 CSV files, each holding a single table with its own set of columns. Using pandas and a Python loop, each file was read into a DataFrame and then written to a PostgreSQL database as a new table.


In [1]:
# Environment setup
import pandas as pd
import psycopg2  # PostgreSQL driver that SQLAlchemy uses for postgresql:// URLs
import sqlalchemy
%load_ext sql

# Connect to the SQL database; one URL is shared by the %sql magic and SQLAlchemy
DB_URL = 'postgresql://postgres:postgres@mydataplace.cgaor8iekxp8.us-east-1.rds.amazonaws.com:5432/bellabeat'
%sql $DB_URL
engine = sqlalchemy.create_engine(DB_URL)
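The cell above stores the database username and password directly in the notebook. A minimal sketch of one common alternative, assuming hypothetical environment variables named `BELLABEAT_DB_USER` and `BELLABEAT_DB_PASSWORD`, builds the same kind of `postgresql://` URL at run time and percent-escapes the credentials so characters such as `@` or `:` do not break the URL. The host, database, and credential values in the example are placeholders.

```python
import os
from urllib.parse import quote_plus


def build_postgres_url(host: str, database: str, port: int = 5432) -> str:
    """Build a PostgreSQL URL from credentials held in environment variables.

    BELLABEAT_DB_USER and BELLABEAT_DB_PASSWORD are hypothetical names chosen
    for this sketch; set them in the shell rather than in the notebook.
    """
    # quote_plus escapes characters such as '@' or ':' that would break the URL
    user = quote_plus(os.environ["BELLABEAT_DB_USER"])
    password = quote_plus(os.environ["BELLABEAT_DB_PASSWORD"])
    return f"postgresql://{user}:{password}@{host}:{port}/{database}"


# Example with placeholder values (never commit real credentials)
os.environ.setdefault("BELLABEAT_DB_USER", "analyst")
os.environ.setdefault("BELLABEAT_DB_PASSWORD", "p@ss:word")
url = build_postgres_url("db.example.com", "bellabeat")
print(url)  # postgresql://analyst:p%40ss%3Aword@db.example.com:5432/bellabeat
```

The resulting string could be passed to `sqlalchemy.create_engine()` and to the `%sql` magic as `%sql $url`. On SQLAlchemy 1.4 or later, `sqlalchemy.engine.URL.create()` is another option that handles the escaping itself.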

In [None]:
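# Read every CSV file in data/ and load it into the SQL database as its own table
import glob
import os

file_names = sorted(glob.glob('data/*.csv'))

for file_path in file_names:
    # Use the file name without its extension as the table name, lower-cased so
    # PostgreSQL does not store it as a case-sensitive (quoted) identifier
    tablename, _ = os.path.splitext(os.path.basename(file_path))
    tablename = tablename.lower()
    df = pd.read_csv(file_path)
    df.columns = df.columns.str.lower()  # convert column names to lower case
    df.to_sql(tablename, engine, if_exists='replace', index=False)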
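The loop turns each file name into a table name. Because PostgreSQL folds unquoted identifiers to lower case, a mixed-case table name would have to be double-quoted in every later query, so lower-casing names at load time keeps the SQL simple. The sketch below reproduces the same load-and-verify pattern using only the standard library: `csv` stands in for pandas, an in-memory `sqlite3` database stands in for PostgreSQL, and `sqlite_master` plays the role of `pg_catalog.pg_tables`. The `dailySteps.csv` file and its row are made-up sample data, and the helper names are hypothetical.

```python
import csv
import glob
import os
import sqlite3
import tempfile


def table_name_from_path(path: str) -> str:
    """Return the file name without its extension, lower-cased."""
    stem, _ = os.path.splitext(os.path.basename(path))
    return stem.lower()


def load_csvs(folder: str, conn: sqlite3.Connection) -> list:
    """Load each CSV in `folder` into its own table, replacing any existing table."""
    loaded = []
    for path in sorted(glob.glob(os.path.join(folder, "*.csv"))):
        table = table_name_from_path(path)
        with open(path, newline="") as f:
            reader = csv.reader(f)
            columns = [name.lower() for name in next(reader)]  # lower-case headers
            rows = list(reader)
        column_sql = ", ".join(f'"{name}"' for name in columns)
        placeholders = ", ".join("?" for _ in columns)
        # DROP + CREATE mimics to_sql(..., if_exists='replace')
        conn.execute(f'DROP TABLE IF EXISTS "{table}"')
        conn.execute(f'CREATE TABLE "{table}" ({column_sql})')
        conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
        loaded.append(table)
    conn.commit()
    return loaded


# Example run on a throwaway folder with one made-up CSV file
with tempfile.TemporaryDirectory() as folder:
    with open(os.path.join(folder, "dailySteps.csv"), "w", newline="") as f:
        csv.writer(f).writerows([["Id", "ActivityDay", "StepTotal"],
                                 [1, "4/12/2016", 1000]])
    conn = sqlite3.connect(":memory:")
    tables = load_csvs(folder, conn)

# Verify the tables, as the pg_catalog.pg_tables query does in PostgreSQL
names = [row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")]
columns = [d[0] for d in conn.execute("SELECT * FROM dailysteps").description]
print(names, columns)  # ['dailysteps'] ['id', 'activityday', 'steptotal']
```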

In [8]:
%%sql
-- Verify tables in database
SELECT tablename
FROM pg_catalog.pg_tables
WHERE schemaname = 'public';

 * postgresql://postgres:***@mydataplace.cgaor8iekxp8.us-east-1.rds.amazonaws.com:5432/bellabeat
16 rows affected.


| tablename |
| --- |
| dailyactivity |
| dailycalories |
| dailyintensities |
| dailysteps |
| heartrate_seconds |
| minutecaloriesnarrow |
| minutecalorieswide |
| minuteintensitiesnarrow |
| minuteintensitieswide |
| minutemetsnarrow |



#### Consolidate Tables

With so many tables to work with, I wanted to see all of the data in one place to check whether I could simplify the dataset by removing tables with redundant information or consolidating tables with similar information. I imported the database schema into [Azimutt](https://azimutt.app/), an online database designer, and created the following diagram.

<p align="center">
<img src='images/databasediagram.png' width=90%>
</p>

- I found that there were narrow and wide versions of the minute-level calories, intensities, and steps tables. I dropped the wide versions (`minutecalorieswide`, `minuteintensitieswide`, `minutestepswide`) because the same data already exists in the narrow versions.
- Likewise, I dropped the `dailycalories`, `dailyintensities`, and `dailysteps` tables because their data already exists in the `dailyactivity` table.
- For the hourly data, I created a new table by joining the `hourlycalories`, `hourlyintensities`, and `hourlysteps` tables, since the information is easier to view in one table than across three. Once the new table was created, the original three hourly tables were dropped.
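Before dropping a wide table, one way to confirm it is redundant is to unpivot it into the narrow shape and check that every resulting row already exists in the narrow table. The sketch below assumes the wide tables hold one row per user and hour, with per-minute columns named `calories00` through `calories59`; that naming is an assumption worth checking against the actual schema. The `wide_to_narrow` helper is hypothetical and the sample values are made up.

```python
from datetime import datetime, timedelta


def wide_to_narrow(wide_rows, prefix="calories"):
    """Unpivot wide rows into a set of (id, activityminute, value) tuples.

    Each wide row is a dict with "id", "activityhour" (a datetime) and one
    column per minute, e.g. "calories00" ... "calories59" (assumed naming).
    """
    narrow = set()
    for row in wide_rows:
        for minute in range(60):
            column = f"{prefix}{minute:02d}"
            if column in row:  # skip minutes the row does not carry
                timestamp = row["activityhour"] + timedelta(minutes=minute)
                narrow.add((row["id"], timestamp, row[column]))
    return narrow


# Made-up sample: one wide row covering three minutes, and the narrow table
hour = datetime(2016, 4, 12, 0, 0)
wide = [{"id": 1, "activityhour": hour,
         "calories00": 1.0, "calories01": 2.0, "calories02": 3.0}]
narrow = [(1, hour + timedelta(minutes=m), value)
          for m, value in enumerate([1.0, 2.0, 3.0])]

# Rows present in the wide table but absent from the narrow one
missing_from_narrow = wide_to_narrow(wide) - set(narrow)
print(missing_from_narrow)  # set()
```

An empty difference means the wide table adds nothing the narrow table lacks. In PostgreSQL the same comparison could be written as an `EXCEPT` between the unpivoted wide table and the narrow table.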



In [9]:
%%sql
-- Drop unnecessary tables
DROP TABLE IF EXISTS dailycalories, dailyintensities, dailysteps, minutecalorieswide, minuteintensitieswide, minutestepswide;

-- Verify tables have been dropped
SELECT tablename
FROM pg_catalog.pg_tables
WHERE schemaname = 'public';

 * postgresql://postgres:***@mydataplace.cgaor8iekxp8.us-east-1.rds.amazonaws.com:5432/bellabeat
Done.
10 rows affected.


| tablename |
| --- |
| dailyactivity |
| heartrate_seconds |
| minutecaloriesnarrow |
| minuteintensitiesnarrow |
| minutemetsnarrow |
| minutesleep |
| minutestepsnarrow |
| sleepday |
| weightloginfo |
| hourlydata |
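Dropping the daily tables relies on their data duplicating columns of `dailyactivity`. A query of the following kind could confirm that for `dailysteps` before its data is discarded: it lists every `dailysteps` row that has no `dailyactivity` row for the same user and day, or whose step count disagrees, so an empty result supports dropping the table. To keep the sketch self-contained it runs against a small in-memory SQLite database; the column names (`activitydate`, `totalsteps`, `activityday`, `steptotal`) are assumed lower-cased Fitbit headers, and the rows are made up, with one deliberate mismatch to show what the query catches. The same `SELECT` should also work under `%%sql` in PostgreSQL.

```python
import sqlite3

# In-memory stand-in for the PostgreSQL database, with made-up rows
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dailyactivity (id INTEGER, activitydate TEXT, totalsteps INTEGER);
    CREATE TABLE dailysteps (id INTEGER, activityday TEXT, steptotal INTEGER);
    INSERT INTO dailyactivity VALUES (1, '4/12/2016', 100), (1, '4/13/2016', 200);
    INSERT INTO dailysteps VALUES (1, '4/12/2016', 100), (1, '4/13/2016', 250);
""")

# dailysteps rows with no dailyactivity row for the same user and day, or
# whose step count disagrees; an empty result supports dropping dailysteps
MISMATCH_SQL = """
    SELECT s.id, s.activityday, s.steptotal, a.totalsteps
    FROM dailysteps AS s
    LEFT JOIN dailyactivity AS a
        ON a.id = s.id
        AND a.activitydate = s.activityday
    WHERE a.id IS NULL
       OR a.totalsteps <> s.steptotal
"""
mismatches = conn.execute(MISMATCH_SQL).fetchall()
print(mismatches)  # [(1, '4/13/2016', 250, 200)]
```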


In [None]:
%%sql
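-- Join the three hourly tables into a single new table.
-- COALESCE keeps id and activityhour populated for rows missing from one of
-- the tables (their columns come back NULL in a FULL OUTER JOIN).
CREATE TABLE hourlydata AS (
SELECT COALESCE(c.id, i.id, s.id) AS id,
       COALESCE(c.activityhour, i.activityhour, s.activityhour) AS activityhour,
       c.calories,
       i.totalintensity,
       i.averageintensity,
       s.steptotal
FROM hourlycalories AS c
FULL OUTER JOIN hourlyintensities AS i
    ON c.id = i.id
    AND c.activityhour = i.activityhour
FULL OUTER JOIN hourlysteps AS s
    ON COALESCE(c.id, i.id) = s.id
    AND COALESCE(c.activityhour, i.activityhour) = s.activityhour
);

-- Drop the source hourly tables now that they are combined
DROP TABLE IF EXISTS hourlycalories, hourlyintensities, hourlysteps;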

-- Verify tables have been dropped
SELECT tablename
FROM pg_catalog.pg_tables
WHERE schemaname = 'public';
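In a `FULL OUTER JOIN`, a row present in only one table still appears, but every column from the other tables is NULL, including their copies of `id` and `activityhour`. Selecting `c.id` and `c.activityhour` on their own would therefore leave the key blank for any hour missing from `hourlycalories`, which is why key columns in such joins are commonly wrapped in `COALESCE`. The Python sketch below is not how PostgreSQL executes the join; it only illustrates the intended result on made-up rows keyed by `(id, activityhour)`, with `None` standing in for NULL. The `full_outer_join` helper is hypothetical.

```python
from typing import Dict, List, Tuple

Key = Tuple[int, str]  # (id, activityhour)
Table = Dict[Key, Dict[str, float]]


def full_outer_join(tables: List[Tuple[Table, List[str]]]) -> List[dict]:
    """Join (rows, columns) pairs on (id, activityhour), keeping every key.

    Like a FULL OUTER JOIN with COALESCE-d keys: a key found in any table
    appears once, and columns from tables lacking that key are None (NULL).
    """
    all_keys = set()
    for rows, _ in tables:
        all_keys.update(rows)
    joined = []
    for key in sorted(all_keys):
        record = {"id": key[0], "activityhour": key[1]}  # key is never NULL
        for rows, columns in tables:
            source = rows.get(key, {})
            for column in columns:
                record[column] = source.get(column)
        joined.append(record)
    return joined


# Made-up sample: calories lacks the 01:00 hour, steps lacks the 00:00 hour
calories = {(1, "2016-04-12 00:00"): {"calories": 81.0}}
intensities = {(1, "2016-04-12 00:00"): {"totalintensity": 20.0, "averageintensity": 0.5},
               (1, "2016-04-12 01:00"): {"totalintensity": 8.0, "averageintensity": 0.25}}
steps = {(1, "2016-04-12 01:00"): {"steptotal": 160.0}}

joined = full_outer_join([(calories, ["calories"]),
                          (intensities, ["totalintensity", "averageintensity"]),
                          (steps, ["steptotal"])])
for record in joined:
    print(record)
```

Both hours keep `id = 1`: the 00:00 record has `steptotal` of `None` and the 01:00 record has `calories` of `None`, which mirrors the NULLs the SQL join produces while keeping the key intact.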