# Fitness Tracker

The data is available in this workbook as `fitness_data.csv`. It is synthetic and contains the following columns:

### Data Dictionary

| Column Name | Description | Additional Context |
|------------|-------------|-------------------|
| `date` | The specific day of data recording | Allows tracking changes and patterns over the 185-day period |
| `steps` | Total daily step count | The common goal of 10,000 steps serves as a reference point for daily activity level assessment |
| `weight` | Body weight measurement (in kg) | Tracked daily to monitor body mass changes over time |
| `resting_heart_rate` | Heart beats per minute while at complete rest | Lower values typically indicate better cardiovascular fitness |
| `sleep_hours` | Total daily sleep duration (in hours) | Includes all sleep phases; adults typically need 7-9 hours per night for optimal health |
| `active_minutes` | Total time spent in physical activity (in minutes) | Encompasses all activity intensities throughout the day |
| `total_calories_burned` | Total daily energy expenditure (in kcal) | Combines both resting metabolic rate and activity-based calorie burn |
| `fat_burn_minutes` | Time in 50-69% of max heart rate zone | Lower intensity zone optimal for building base endurance and metabolizing fat |
| `cardio_minutes` | Time in 70-84% of max heart rate zone | Moderate to high intensity zone that improves cardiovascular capacity |
| `peak_minutes` | Time in 85%+ of max heart rate zone | Highest intensity zone, typically reached during interval training or sprints |
| `workout_type` | Category of exercise performed | Helps analyze the distribution and effectiveness of different activities |
| `workout_duration` | Length of exercise session in minutes | Used to analyze exercise patterns and time commitment |
| `workout_calories` | Energy expended during workout | Specifically tracks calories burned during structured exercise sessions |
| `workout_avg_hr` | Mean heart rate during exercise | Indicates the overall intensity of the workout session |
| `workout_max_hr` | Highest heart rate during exercise | Shows the point of maximum exertion during the workout |

In [26]:
# Import plotly and pandas
import pandas as pd
import plotly.express as px

## Step 1: Exploratory Data Analysis (EDA)

Let's read in the `fitness_data.csv` file using `pandas` and do some exploratory data analysis: checking for missing values, computing summary statistics, and inspecting data types.

In [27]:
fitness_data = pd.read_csv('fitness_data.csv')
fitness_data.head()

Unnamed: 0,date,weight,steps,resting_heart_rate,sleep_hours,active_minutes,total_calories_burned,workout_type,workout_duration,workout_calories,workout_avg_hr,workout_max_hr,fat_burn_minutes,cardio_minutes,peak_minutes
0,7/1/24,75.248357,6829.0,63.68,7.03,66.0,664.0,Strength Training,68.28,641.12,143.67,171.17,33.0,19.0,13.0
1,7/2/24,74.914564,7741.0,58.63,7.85,66.0,660.0,Rest,0.0,0.0,0.0,0.0,33.0,19.0,13.0
2,7/3/24,75.14,8227.0,58.47,7.53,87.0,873.0,Strength Training,34.34,366.564833,138.63,163.86,43.0,26.0,17.0
3,7/4/24,75.2,13262.0,72.51,4.63,111.0,1113.0,Running,62.74,662.04,152.04,168.87,55.0,33.0,22.0
4,7/5/24,75.27,9377.0,73.32,5.38,79.0,790.0,Strength Training,77.88,677.01,144.06,167.28,39.0,23.0,15.0


In [28]:
# Check for missing values in the dataset
missing_values = fitness_data.isnull().sum()
missing_values

date                     0
weight                   6
steps                    6
resting_heart_rate       6
sleep_hours              6
active_minutes           6
total_calories_burned    6
workout_type             6
workout_duration         6
workout_calories         6
workout_avg_hr           6
workout_max_hr           6
fat_burn_minutes         6
cardio_minutes           6
peak_minutes             6
dtype: int64

In [29]:
# Display rows with missing values
rows_with_missing_values = fitness_data[fitness_data.isnull().any(axis=1)]
rows_with_missing_values

Unnamed: 0,date,weight,steps,resting_heart_rate,sleep_hours,active_minutes,total_calories_burned,workout_type,workout_duration,workout_calories,workout_avg_hr,workout_max_hr,fat_burn_minutes,cardio_minutes,peak_minutes
150,11/28/24,,,,,,,,,,,,,,
151,11/29/24,,,,,,,,,,,,,,
152,11/30/24,,,,,,,,,,,,,,
176,12/24/24,,,,,,,,,,,,,,
177,12/25/24,,,,,,,,,,,,,,
178,12/26/24,,,,,,,,,,,,,,
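
The six rows above have a `date` but no measurements at all. One possible way to handle such rows is to drop only those where every measured column is missing. The sketch below uses a small made-up frame (`toy`, `measured`, and `cleaned` are illustrative names, not part of this notebook) rather than the real file. Whether to drop these rows is a judgment call: keeping them preserves the visible gaps in time-series charts.

```python
import numpy as np
import pandas as pd

# Toy frame mimicking the pattern above: 'date' is always present,
# but some days have no measurements at all (illustrative values only).
toy = pd.DataFrame({
    'date': ['11/27/24', '11/28/24', '11/29/24'],
    'steps': [8000.0, np.nan, np.nan],
    'sleep_hours': [7.2, np.nan, np.nan],
})

# Columns that hold measurements (everything except the date)
measured = [col for col in toy.columns if col != 'date']

# Drop only the rows where *all* measured columns are missing
cleaned = toy.dropna(how='all', subset=measured)
print(len(toy) - len(cleaned), 'fully empty rows dropped')
```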


In [30]:
# Calculate summary statistics of the dataset
summary_statistics = fitness_data.describe()
summary_statistics

Unnamed: 0,weight,steps,resting_heart_rate,sleep_hours,active_minutes,total_calories_burned,workout_duration,workout_calories,workout_avg_hr,workout_max_hr,fat_burn_minutes,cardio_minutes,peak_minutes
count,179.0,179.0,179.0,179.0,179.0,179.0,179.0,179.0,179.0,179.0,179.0,179.0,179.0
mean,74.838842,8476.927374,64.926538,7.511456,85.128492,853.396648,38.227039,384.143224,129.950615,148.408547,42.284916,25.100559,16.636872
std,1.817232,2350.241475,4.76561,1.061339,22.242287,222.456798,19.778319,202.061761,37.68899,42.916258,11.133124,6.703257,4.442768
min,71.11,3207.0,50.38,3.17,30.0,303.0,0.0,0.0,0.0,0.0,15.0,9.0,6.0
25%,73.13,6823.0,61.755,6.995,71.0,711.5,26.81,262.48,130.79,148.85,35.0,21.0,14.0
50%,75.269587,8524.0,64.71,7.59,84.0,841.0,34.34,350.91,140.42,159.56,42.0,25.0,16.0
75%,76.53,9704.5,67.61,8.055,98.5,988.0,46.85,493.37,146.915,168.08,49.0,29.0,19.0
max,78.17,15724.0,80.99,9.78,157.0,1573.0,94.29,1072.2,160.24,187.55,78.0,47.0,31.0


In [31]:
# Show the data types of each column
column_data_types = fitness_data.dtypes
column_data_types

date                      object
weight                   float64
steps                    float64
resting_heart_rate       float64
sleep_hours              float64
active_minutes           float64
total_calories_burned    float64
workout_type              object
workout_duration         float64
workout_calories         float64
workout_avg_hr           float64
workout_max_hr           float64
fat_burn_minutes         float64
cardio_minutes           float64
peak_minutes             float64
dtype: object

In [32]:
# Convert the 'date' column to datetime format
fitness_data['date'] = pd.to_datetime(fitness_data['date'])
fitness_data.dtypes

date                     datetime64[ns]
weight                          float64
steps                           float64
resting_heart_rate              float64
sleep_hours                     float64
active_minutes                  float64
total_calories_burned           float64
workout_type                     object
workout_duration                float64
workout_calories                float64
workout_avg_hr                  float64
workout_max_hr                  float64
fat_burn_minutes                float64
cardio_minutes                  float64
peak_minutes                    float64
dtype: object
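
The conversion above lets `pandas` infer the date format. Since the raw values look like `7/1/24` (month/day/two-digit year), a more explicit variant is to pass a `format` string, which avoids any ambiguity between month-first and day-first dates. This is a minimal sketch on a few hand-typed strings (`raw_dates` and `parsed` are illustrative names), assuming the whole column follows the same pattern.

```python
import pandas as pd

# A few date strings in the same style as the raw 'date' column
raw_dates = pd.Series(['7/1/24', '7/2/24', '11/28/24'])

# Explicit format: month/day/two-digit year
parsed = pd.to_datetime(raw_dates, format='%m/%d/%y')
print(parsed.dt.strftime('%Y-%m-%d').tolist())
```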

## Step 2: Creating useful new columns

Sometimes we want to give columns more descriptive names so they are easier to interpret, or create new columns that measure things we are interested in.

In [33]:
# Rename the 'weight' and 'workout_duration' columns to more descriptive names
fitness_data.rename(
    columns={
        'weight': 'body_weight_kg',
        'workout_duration': 'workout_duration_minutes'
    },
    inplace=True
)
fitness_data.head()

Unnamed: 0,date,body_weight_kg,steps,resting_heart_rate,sleep_hours,active_minutes,total_calories_burned,workout_type,workout_duration_minutes,workout_calories,workout_avg_hr,workout_max_hr,fat_burn_minutes,cardio_minutes,peak_minutes
0,2024-07-01,75.248357,6829.0,63.68,7.03,66.0,664.0,Strength Training,68.28,641.12,143.67,171.17,33.0,19.0,13.0
1,2024-07-02,74.914564,7741.0,58.63,7.85,66.0,660.0,Rest,0.0,0.0,0.0,0.0,33.0,19.0,13.0
2,2024-07-03,75.14,8227.0,58.47,7.53,87.0,873.0,Strength Training,34.34,366.564833,138.63,163.86,43.0,26.0,17.0
3,2024-07-04,75.2,13262.0,72.51,4.63,111.0,1113.0,Running,62.74,662.04,152.04,168.87,55.0,33.0,22.0
4,2024-07-05,75.27,9377.0,73.32,5.38,79.0,790.0,Strength Training,77.88,677.01,144.06,167.28,39.0,23.0,15.0


In [34]:
# Add a new column that converts the weight to pounds
fitness_data['body_weight_lbs'] = fitness_data['body_weight_kg'] * 2.20462
fitness_data.head()

Unnamed: 0,date,body_weight_kg,steps,resting_heart_rate,sleep_hours,active_minutes,total_calories_burned,workout_type,workout_duration_minutes,workout_calories,workout_avg_hr,workout_max_hr,fat_burn_minutes,cardio_minutes,peak_minutes,body_weight_lbs
0,2024-07-01,75.248357,6829.0,63.68,7.03,66.0,664.0,Strength Training,68.28,641.12,143.67,171.17,33.0,19.0,13.0,165.894033
1,2024-07-02,74.914564,7741.0,58.63,7.85,66.0,660.0,Rest,0.0,0.0,0.0,0.0,33.0,19.0,13.0,165.158145
2,2024-07-03,75.14,8227.0,58.47,7.53,87.0,873.0,Strength Training,34.34,366.564833,138.63,163.86,43.0,26.0,17.0,165.655147
3,2024-07-04,75.2,13262.0,72.51,4.63,111.0,1113.0,Running,62.74,662.04,152.04,168.87,55.0,33.0,22.0,165.787424
4,2024-07-05,75.27,9377.0,73.32,5.38,79.0,790.0,Strength Training,77.88,677.01,144.06,167.28,39.0,23.0,15.0,165.941747


In [35]:
# Add a column to indicate the day of the week
fitness_data['day_of_week'] = fitness_data['date'].dt.day_name()

# Add a column to indicate if it's a weekend or not
fitness_data['is_weekend'] = fitness_data['day_of_week'].isin(['Saturday', 'Sunday'])

fitness_data.head()

Unnamed: 0,date,body_weight_kg,steps,resting_heart_rate,sleep_hours,active_minutes,total_calories_burned,workout_type,workout_duration_minutes,workout_calories,workout_avg_hr,workout_max_hr,fat_burn_minutes,cardio_minutes,peak_minutes,body_weight_lbs,day_of_week,is_weekend
0,2024-07-01,75.248357,6829.0,63.68,7.03,66.0,664.0,Strength Training,68.28,641.12,143.67,171.17,33.0,19.0,13.0,165.894033,Monday,False
1,2024-07-02,74.914564,7741.0,58.63,7.85,66.0,660.0,Rest,0.0,0.0,0.0,0.0,33.0,19.0,13.0,165.158145,Tuesday,False
2,2024-07-03,75.14,8227.0,58.47,7.53,87.0,873.0,Strength Training,34.34,366.564833,138.63,163.86,43.0,26.0,17.0,165.655147,Wednesday,False
3,2024-07-04,75.2,13262.0,72.51,4.63,111.0,1113.0,Running,62.74,662.04,152.04,168.87,55.0,33.0,22.0,165.787424,Thursday,False
4,2024-07-05,75.27,9377.0,73.32,5.38,79.0,790.0,Strength Training,77.88,677.01,144.06,167.28,39.0,23.0,15.0,165.941747,Friday,False
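
As an aside, the weekend flag can also be derived without going through day names. The sketch below assumes a datetime column and uses `dt.dayofweek`, where Monday is 0 and Sunday is 6, so values of 5 or more are Saturday and Sunday. The `dates` series is a hand-picked illustration, not data from the file.

```python
import pandas as pd

# Friday through Monday, chosen to straddle one weekend
dates = pd.Series(pd.to_datetime(
    ['2024-07-05', '2024-07-06', '2024-07-07', '2024-07-08']
))

# dayofweek: Monday=0 ... Sunday=6, so >= 5 means Saturday or Sunday
is_weekend = dates.dt.dayofweek >= 5
print(is_weekend.tolist())
```

This gives the same result as the `isin(['Saturday', 'Sunday'])` approach and does not depend on the language of the day names.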


In [36]:
# Add a new column 'sleep_debt': the difference between 'sleep_hours' and a target of 7.5 hours
# (negative values mean less sleep than the target, positive values mean more)
fitness_data['sleep_debt'] = fitness_data['sleep_hours'] - 7.5

# Add a cumulative column for 'sleep_debt'
fitness_data['cumulative_sleep_debt'] = fitness_data['sleep_debt'].cumsum()

fitness_data.head()

Unnamed: 0,date,body_weight_kg,steps,resting_heart_rate,sleep_hours,active_minutes,total_calories_burned,workout_type,workout_duration_minutes,workout_calories,workout_avg_hr,workout_max_hr,fat_burn_minutes,cardio_minutes,peak_minutes,body_weight_lbs,day_of_week,is_weekend,sleep_debt,cumulative_sleep_debt
0,2024-07-01,75.248357,6829.0,63.68,7.03,66.0,664.0,Strength Training,68.28,641.12,143.67,171.17,33.0,19.0,13.0,165.894033,Monday,False,-0.47,-0.47
1,2024-07-02,74.914564,7741.0,58.63,7.85,66.0,660.0,Rest,0.0,0.0,0.0,0.0,33.0,19.0,13.0,165.158145,Tuesday,False,0.35,-0.12
2,2024-07-03,75.14,8227.0,58.47,7.53,87.0,873.0,Strength Training,34.34,366.564833,138.63,163.86,43.0,26.0,17.0,165.655147,Wednesday,False,0.03,-0.09
3,2024-07-04,75.2,13262.0,72.51,4.63,111.0,1113.0,Running,62.74,662.04,152.04,168.87,55.0,33.0,22.0,165.787424,Thursday,False,-2.87,-2.96
4,2024-07-05,75.27,9377.0,73.32,5.38,79.0,790.0,Strength Training,77.88,677.01,144.06,167.28,39.0,23.0,15.0,165.941747,Friday,False,-2.12,-5.08
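
Because the dataset contains days with no measurements, it helps to know how `cumsum()` treats missing values. By default it skips them: the cumulative value is `NaN` on the missing day, and the running total resumes on the next valid day. The sketch below illustrates this on a tiny hand-made series (the values are illustrative, with one gap inserted).

```python
import numpy as np
import pandas as pd

# Three nights of sleep with one missing day in between
sleep_hours = pd.Series([7.03, 7.85, np.nan, 7.53])

# Same definition as above: negative means less sleep than the 7.5-hour target
sleep_debt = sleep_hours - 7.5

# cumsum() skips NaN by default: the gap stays NaN, the total carries on after it
cumulative = sleep_debt.cumsum()
print(cumulative.round(2).tolist())
```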


## Step 3: Visualizing trends across single variables 📈

Let's visualize the distributions and trends of different columns in our data.

In [37]:
# Create a histogram of the number of steps
fig = px.histogram(fitness_data, x='steps', title='Distribution of Steps')
fig.show()

In [38]:
# Create a line chart of the number of steps over time, with markers colored red on weekends and blue on weekdays
fig = px.line(fitness_data, x='date', y='steps', title='Number of Steps Over Time', markers=True)
fig.update_traces(marker=dict(color=fitness_data['is_weekend'].map({True: 'red', False: 'blue'})))
fig.show()

In [39]:
# Create a histogram of the total calories burned
fig = px.histogram(fitness_data, x='total_calories_burned', title='Distribution of Total Calories Burned')
fig.show()

In [40]:
# Create a line chart of the total calories burned over time with a descriptive y-axis label
fig = px.line(fitness_data, x='date', y='total_calories_burned', title='Total Calories Burned Over Time')
fig.update_layout(yaxis_title='Total Calories Burned (kcal)')
fig.show()

In [41]:
# Create a line chart for the weight column with nice axis labels
fig = px.line(fitness_data, x='date', y='body_weight_kg', title='Body Weight Over Time')
fig.update_layout(
    xaxis_title='Date',
    yaxis_title='Body Weight (kg)'
)
fig.show()

## Step 4: Visualizing trends across multiple variables 📊

In [42]:
# Create a line chart visualizing sleep metrics over time
fig = px.line(fitness_data, x='date', y=['sleep_hours', 'sleep_debt', 'cumulative_sleep_debt'], title='Sleep Metrics Over Time')
fig.update_layout(
    xaxis_title='Date',
    yaxis_title='Hours',
    legend_title='Sleep Metrics'
)
fig.show()

In [43]:
# Create a box plot for sleep hours comparing weekdays and weekends
fig = px.box(fitness_data, x='is_weekend', y='sleep_hours', title='Sleep Hours: Weekdays vs Weekends', labels={'is_weekend': 'Is Weekend', 'sleep_hours': 'Sleep Hours'})
fig.update_layout(
    xaxis_title='Day Type',
    yaxis_title='Sleep Hours'
)
fig.show()

In [44]:
# Calculate the average sleep hours by day of the week
avg_sleep_by_day = fitness_data.groupby('day_of_week')['sleep_hours'].mean().reset_index()

# Define the order of days in the week starting from Monday
day_order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']

# Convert 'day_of_week' to a categorical type with the specified order
avg_sleep_by_day['day_of_week'] = pd.Categorical(avg_sleep_by_day['day_of_week'], categories=day_order, ordered=True)

# Sort the dataframe by 'day_of_week'
avg_sleep_by_day = avg_sleep_by_day.sort_values('day_of_week')

# Create a bar chart to visualize the average sleep hours by day of the week
fig = px.bar(avg_sleep_by_day, x='day_of_week', y='sleep_hours', title='Average Sleep Hours by Day of the Week', labels={'day_of_week': 'Day of the Week', 'sleep_hours': 'Average Sleep Hours'})
fig.update_layout(
    xaxis_title='Day of the Week',
    yaxis_title='Average Sleep Hours'
)
fig.show()
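
The categorical conversion above is one way to get Monday-to-Sunday order. A shorter alternative, sketched here on made-up data (`toy` and `avg` are illustrative names), is to keep the day name as the index after `groupby` and call `reindex` with the desired order. Days that never occur in the data then appear as `NaN` instead of being silently left out.

```python
import pandas as pd

# Toy data: only three distinct days are present (illustrative values)
toy = pd.DataFrame({
    'day_of_week': ['Sunday', 'Monday', 'Monday', 'Saturday'],
    'sleep_hours': [8.0, 7.0, 6.0, 9.0],
})

day_order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday',
             'Friday', 'Saturday', 'Sunday']

# Average per day, then put the index into calendar order
avg = toy.groupby('day_of_week')['sleep_hours'].mean().reindex(day_order)
print(avg)
```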


In [45]:
# Create a scatter plot to visualize the relationship between steps and total calories burned
fig = px.scatter(fitness_data, x='steps', y='total_calories_burned', title='Steps vs Total Calories Burned', labels={'steps': 'Number of Steps', 'total_calories_burned': 'Total Calories Burned'})
fig.update_layout(
    xaxis_title='Number of Steps',
    yaxis_title='Total Calories Burned'
)
fig.show()
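
A scatter plot shows the shape of a relationship; a correlation coefficient puts a number on its linear strength. As a rough sketch, the snippet below computes the Pearson correlation with `Series.corr` on just the five rows shown by `head()` earlier. Five points are far too few to draw conclusions, so treat this as a demonstration of the call rather than a result; on the full dataset the same call would be applied to the `fitness_data` columns.

```python
import pandas as pd

# The first five rows of 'steps' and 'total_calories_burned' from head()
sample = pd.DataFrame({
    'steps': [6829.0, 7741.0, 8227.0, 13262.0, 9377.0],
    'total_calories_burned': [664.0, 660.0, 873.0, 1113.0, 790.0],
})

# Pearson correlation (the default method); missing values are excluded pairwise
r = sample['steps'].corr(sample['total_calories_burned'])
print(f'Pearson r on the five sample rows: {r:.2f}')
```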

In [48]:
# Create a scatter plot with avg heart rate on the x axis, the burned calories on the y-axis,
# the workout type as the color and the size being the workout duration in minutes
fig = px.scatter(
    fitness_data,
    x='workout_avg_hr',
    y='workout_calories',
    color='workout_type',
    size='workout_duration_minutes',
    title='Avg Heart Rate vs Burned Calories by Workout Type',
    labels={
        'workout_avg_hr': 'Average Heart Rate',
        'workout_calories': 'Calories Burned',
        'workout_type': 'Workout Type',
        'workout_duration_minutes': 'Workout Duration (minutes)'
    }
)

fig.update_layout(
    xaxis_title='Average Heart Rate',
    yaxis_title='Calories Burned',
    xaxis=dict(range=[100, fitness_data['workout_avg_hr'].max() + 10])
)
fig.show()