# AI Python Zero-to-Hero: Build Your Own Fitness Tracker

In this project, we'll learn how to combine AI, data analysis, and Python to explore and visualize fitness data.

The data is available in this repository as `fitness_data.csv`. It is synthetic data consisting of the following columns:

### Data Dictionary

| Column Name | Description | Additional Context |
|------------|-------------|-------------------|
| `date` | The specific day of data recording | Allows tracking changes and patterns over the 185-day period |
| `steps` | Total daily step count | The common goal of 10,000 steps serves as a reference point for daily activity level assessment |
| `weight` | Body weight measurement (in kg) | Tracked daily to monitor body mass changes over time |
| `resting_heart_rate` | Heart beats per minute while at complete rest | Lower values typically indicate better cardiovascular fitness |
| `sleep_hours` | Total daily sleep duration | Includes all sleep phases; adults typically need 7-9 hours per night for optimal health |
| `active_minutes` | Total time spent in physical activity | Encompasses all activity intensities throughout the day |
| `total_calories_burned` | Total daily energy expenditure | Combines both resting metabolic rate and activity-based calorie burn |
| `fat_burn_minutes` | Time in 50-69% of max heart rate zone | Lower intensity zone optimal for building base endurance and metabolizing fat |
| `cardio_minutes` | Time in 70-84% of max heart rate zone | Moderate to high intensity zone that improves cardiovascular capacity |
| `peak_minutes` | Time in 85%+ of max heart rate zone | Highest intensity zone, typically reached during interval training or sprints |
| `workout_type` | Category of exercise performed | Helps analyze the distribution and effectiveness of different activities |
| `workout_duration` | Length of exercise session in minutes | Used to analyze exercise patterns and time commitment |
| `workout_calories` | Energy expended during workout | Specifically tracks calories burned during structured exercise sessions |
| `workout_avg_hr` | Mean heart rate during exercise | Indicates the overall intensity of the workout session |
| `workout_max_hr` | Highest heart rate during exercise | Shows the point of maximum exertion during the workout |

Most fitness tracking apps and devices record variations of the columns above and let you export the data for your own analysis.

Let's dive in!

## Task 0: Setup

We're going to use `pandas` for data analysis and `plotly.express` for interactive data visualization.

In [1]:
# Import necessary packages
import pandas as pd
import plotly.express as px

## Task 1: Reading in the data 📖

Let's read in and display the `fitness_data.csv` file using `pandas`.

In [4]:
# Read in the dataset "fitness_data.csv"
fitness_data = pd.read_csv('fitness_data.csv')

# Display the data
fitness_data

Unnamed: 0,date,weight,steps,resting_heart_rate,sleep_hours,active_minutes,total_calories_burned,workout_type,workout_duration,workout_calories,workout_avg_hr,workout_max_hr,fat_burn_minutes,cardio_minutes,peak_minutes
0,7/1/24,75.248357,6829.0,63.68,7.03,66.0,664.0,Strength Training,68.28,641.120000,143.67,171.17,33.0,19.0,13.0
1,7/2/24,74.914564,7741.0,58.63,7.85,66.0,660.0,Rest,0.00,0.000000,0.00,0.00,33.0,19.0,13.0
2,7/3/24,75.140000,8227.0,58.47,7.53,87.0,873.0,Strength Training,34.34,366.564833,138.63,163.86,43.0,26.0,17.0
3,7/4/24,75.200000,13262.0,72.51,4.63,111.0,1113.0,Running,62.74,662.040000,152.04,168.87,55.0,33.0,22.0
4,7/5/24,75.270000,9377.0,73.32,5.38,79.0,790.0,Strength Training,77.88,677.010000,144.06,167.28,39.0,23.0,15.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
180,12/28/24,77.690000,5883.0,62.33,9.78,71.0,712.0,Strength Training,94.29,1072.200000,131.24,154.21,35.0,21.0,14.0
181,12/29/24,77.610000,8639.0,66.12,6.55,87.0,874.0,Running,41.47,419.240000,139.88,157.09,43.0,26.0,17.0
182,12/30/24,77.490000,8587.0,61.23,8.02,87.0,874.0,Rest,0.00,0.000000,0.00,0.00,43.0,26.0,17.0
183,12/31/24,77.550000,10408.0,59.33,4.57,90.0,900.0,HIIT,26.73,267.900000,133.59,146.81,45.0,27.0,18.0


## Task 2: Checking for missing values 🔎

Now that we've read in the data, let's check whether it has any missing values.

In [5]:
# Does the data have any missing values?
missing_values = fitness_data.isnull().sum()
missing_values

date                     0
weight                   6
steps                    6
resting_heart_rate       6
sleep_hours              6
active_minutes           6
total_calories_burned    6
workout_type             6
workout_duration         6
workout_calories         6
workout_avg_hr           6
workout_max_hr           6
fat_burn_minutes         6
cardio_minutes           6
peak_minutes             6
dtype: int64

In [6]:
# Which rows have missing values?
rows_with_missing_values = fitness_data[fitness_data.isnull().any(axis=1)]
rows_with_missing_values

Unnamed: 0,date,weight,steps,resting_heart_rate,sleep_hours,active_minutes,total_calories_burned,workout_type,workout_duration,workout_calories,workout_avg_hr,workout_max_hr,fat_burn_minutes,cardio_minutes,peak_minutes
150,11/28/24,,,,,,,,,,,,,,
151,11/29/24,,,,,,,,,,,,,,
152,11/30/24,,,,,,,,,,,,,,
176,12/24/24,,,,,,,,,,,,,,
177,12/25/24,,,,,,,,,,,,,,
178,12/26/24,,,,,,,,,,,,,,
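
The six rows above have a date but no measurements. The notebook keeps them as they are, but one common option is to drop rows where every measurement is missing. Here is a minimal sketch on a small hypothetical frame (the values and the names `sample`, `measurement_cols`, and `cleaned` are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical three-row sample; the middle day has a date but no measurements
sample = pd.DataFrame({
    "date": ["11/27/24", "11/28/24", "12/1/24"],
    "steps": [8000.0, np.nan, 9000.0],
    "sleep_hours": [7.5, np.nan, 8.0],
})

# Treat every column except the date as a measurement
measurement_cols = sample.columns.drop("date")

# Keep only rows where at least one measurement is present
cleaned = sample.dropna(subset=measurement_cols, how="all")
print(cleaned)
```

Using `how="all"` keeps days that are only partially recorded, which may or may not be what you want for your own data.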


## Task 3: Exploring the data 🔎

Let's start exploring our data by calculating summary statistics and confirming that our columns are of the correct data type.

In [7]:
# What are the summary statistics?
summary_statistics = fitness_data.describe()
summary_statistics

Unnamed: 0,weight,steps,resting_heart_rate,sleep_hours,active_minutes,total_calories_burned,workout_duration,workout_calories,workout_avg_hr,workout_max_hr,fat_burn_minutes,cardio_minutes,peak_minutes
count,179.0,179.0,179.0,179.0,179.0,179.0,179.0,179.0,179.0,179.0,179.0,179.0,179.0
mean,74.838842,8476.927374,64.926538,7.511456,85.128492,853.396648,38.227039,384.143224,129.950615,148.408547,42.284916,25.100559,16.636872
std,1.817232,2350.241475,4.76561,1.061339,22.242287,222.456798,19.778319,202.061761,37.68899,42.916258,11.133124,6.703257,4.442768
min,71.11,3207.0,50.38,3.17,30.0,303.0,0.0,0.0,0.0,0.0,15.0,9.0,6.0
25%,73.13,6823.0,61.755,6.995,71.0,711.5,26.81,262.48,130.79,148.85,35.0,21.0,14.0
50%,75.269587,8524.0,64.71,7.59,84.0,841.0,34.34,350.91,140.42,159.56,42.0,25.0,16.0
75%,76.53,9704.5,67.61,8.055,98.5,988.0,46.85,493.37,146.915,168.08,49.0,29.0,19.0
max,78.17,15724.0,80.99,9.78,157.0,1573.0,94.29,1072.2,160.24,187.55,78.0,47.0,31.0
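
Notice that the workout columns have a minimum of 0.0. Rest days are stored with zeros for workout metrics, so they pull averages such as the mean of `workout_avg_hr` downward. A small sketch using the first five rows displayed earlier shows the effect (the names `sample`, `mean_all`, and `mean_workouts` are only for this illustration):

```python
import pandas as pd

# workout_type and workout_avg_hr from the first five rows of the dataset
sample = pd.DataFrame({
    "workout_type": ["Strength Training", "Rest", "Strength Training",
                     "Running", "Strength Training"],
    "workout_avg_hr": [143.67, 0.00, 138.63, 152.04, 144.06],
})

# Mean over all days, including the rest day recorded as 0
mean_all = sample["workout_avg_hr"].mean()

# Mean over days with an actual workout
workouts_only = sample[sample["workout_type"] != "Rest"]
mean_workouts = workouts_only["workout_avg_hr"].mean()

print(round(mean_all, 2), round(mean_workouts, 2))
```

If you want statistics that describe workouts only, filtering out rest days first is probably the safer choice.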


In [8]:
# Check the data types of each column
column_data_types = fitness_data.dtypes
column_data_types

date                      object
weight                   float64
steps                    float64
resting_heart_rate       float64
sleep_hours              float64
active_minutes           float64
total_calories_burned    float64
workout_type              object
workout_duration         float64
workout_calories         float64
workout_avg_hr           float64
workout_max_hr           float64
fat_burn_minutes         float64
cardio_minutes           float64
peak_minutes             float64
dtype: object

In [9]:
# Convert the date column to a datetime format
fitness_data['date'] = pd.to_datetime(fitness_data['date'])

In [11]:
fitness_data.dtypes

date                     datetime64[ns]
weight                          float64
steps                           float64
resting_heart_rate              float64
sleep_hours                     float64
active_minutes                  float64
total_calories_burned           float64
workout_type                     object
workout_duration                float64
workout_calories                float64
workout_avg_hr                  float64
workout_max_hr                  float64
fat_burn_minutes                float64
cardio_minutes                  float64
peak_minutes                    float64
dtype: object
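
Here `pd.to_datetime` inferred the date format for us. When you know the format, passing it explicitly can make the parsing less ambiguous. The sketch below assumes the dates follow the month/day/two-digit-year pattern seen in the file (for example `7/1/24`):

```python
import pandas as pd

# Two date strings in the same style as the 'date' column
dates = pd.Series(["7/1/24", "12/31/24"])

# %m = month, %d = day, %y = two-digit year
parsed = pd.to_datetime(dates, format="%m/%d/%y")
print(parsed)
```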

## Task 4: Creating useful new columns

Sometimes it helps to give columns more descriptive names so they are easier to interpret, or to create new columns that measure things we're interested in.

In [12]:
# Rename 'weight' and 'workout_duration' to have more descriptive names
fitness_data.rename(columns = {'weight': 'weight_kg', 'workout_duration' : 'workout_duration_minutes'}, inplace = True)

In [13]:
# Add a new column 'weight_lbs' converting weight from kilograms to pounds (1 kg = 2.20462 lbs)
fitness_data['weight_lbs'] = fitness_data['weight_kg'] * 2.20462
fitness_data

Unnamed: 0,date,weight_kg,steps,resting_heart_rate,sleep_hours,active_minutes,total_calories_burned,workout_type,workout_duration_minutes,workout_calories,workout_avg_hr,workout_max_hr,fat_burn_minutes,cardio_minutes,peak_minutes,weight_lbs
0,2024-07-01,75.248357,6829.0,63.68,7.03,66.0,664.0,Strength Training,68.28,641.120000,143.67,171.17,33.0,19.0,13.0,165.894033
1,2024-07-02,74.914564,7741.0,58.63,7.85,66.0,660.0,Rest,0.00,0.000000,0.00,0.00,33.0,19.0,13.0,165.158145
2,2024-07-03,75.140000,8227.0,58.47,7.53,87.0,873.0,Strength Training,34.34,366.564833,138.63,163.86,43.0,26.0,17.0,165.655147
3,2024-07-04,75.200000,13262.0,72.51,4.63,111.0,1113.0,Running,62.74,662.040000,152.04,168.87,55.0,33.0,22.0,165.787424
4,2024-07-05,75.270000,9377.0,73.32,5.38,79.0,790.0,Strength Training,77.88,677.010000,144.06,167.28,39.0,23.0,15.0,165.941747
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
180,2024-12-28,77.690000,5883.0,62.33,9.78,71.0,712.0,Strength Training,94.29,1072.200000,131.24,154.21,35.0,21.0,14.0,171.276928
181,2024-12-29,77.610000,8639.0,66.12,6.55,87.0,874.0,Running,41.47,419.240000,139.88,157.09,43.0,26.0,17.0,171.100558
182,2024-12-30,77.490000,8587.0,61.23,8.02,87.0,874.0,Rest,0.00,0.000000,0.00,0.00,43.0,26.0,17.0,170.836004
183,2024-12-31,77.550000,10408.0,59.33,4.57,90.0,900.0,HIIT,26.73,267.900000,133.59,146.81,45.0,27.0,18.0,170.968281


In [14]:
# Add a column to indicate the day of the week
fitness_data['day_of_week'] = fitness_data['date'].dt.day_name()

# Add a column to indicate whether each day falls on a weekend
fitness_data['is_weekend'] = fitness_data['day_of_week'].isin(['Saturday', 'Sunday'])

# Display the data
fitness_data

Unnamed: 0,date,weight_kg,steps,resting_heart_rate,sleep_hours,active_minutes,total_calories_burned,workout_type,workout_duration_minutes,workout_calories,workout_avg_hr,workout_max_hr,fat_burn_minutes,cardio_minutes,peak_minutes,weight_lbs,day_of_week,is_weekend
0,2024-07-01,75.248357,6829.0,63.68,7.03,66.0,664.0,Strength Training,68.28,641.120000,143.67,171.17,33.0,19.0,13.0,165.894033,Monday,False
1,2024-07-02,74.914564,7741.0,58.63,7.85,66.0,660.0,Rest,0.00,0.000000,0.00,0.00,33.0,19.0,13.0,165.158145,Tuesday,False
2,2024-07-03,75.140000,8227.0,58.47,7.53,87.0,873.0,Strength Training,34.34,366.564833,138.63,163.86,43.0,26.0,17.0,165.655147,Wednesday,False
3,2024-07-04,75.200000,13262.0,72.51,4.63,111.0,1113.0,Running,62.74,662.040000,152.04,168.87,55.0,33.0,22.0,165.787424,Thursday,False
4,2024-07-05,75.270000,9377.0,73.32,5.38,79.0,790.0,Strength Training,77.88,677.010000,144.06,167.28,39.0,23.0,15.0,165.941747,Friday,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
180,2024-12-28,77.690000,5883.0,62.33,9.78,71.0,712.0,Strength Training,94.29,1072.200000,131.24,154.21,35.0,21.0,14.0,171.276928,Saturday,True
181,2024-12-29,77.610000,8639.0,66.12,6.55,87.0,874.0,Running,41.47,419.240000,139.88,157.09,43.0,26.0,17.0,171.100558,Sunday,True
182,2024-12-30,77.490000,8587.0,61.23,8.02,87.0,874.0,Rest,0.00,0.000000,0.00,0.00,43.0,26.0,17.0,170.836004,Monday,False
183,2024-12-31,77.550000,10408.0,59.33,4.57,90.0,900.0,HIIT,26.73,267.900000,133.59,146.81,45.0,27.0,18.0,170.968281,Tuesday,False
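
An alternative to matching day names is to use the numeric day of the week, where Monday is 0 and Sunday is 6. This is only a sketch of another way to get the same flag, not a replacement for the cell above:

```python
import pandas as pd

# Three dates from the dataset: a Monday, a Saturday, and a Sunday
dates = pd.Series(pd.to_datetime(["2024-07-01", "2024-12-28", "2024-12-29"]))

# dayofweek is 0 for Monday through 6 for Sunday, so 5 and 6 are the weekend
is_weekend = dates.dt.dayofweek >= 5
print(is_weekend.tolist())
```

The numeric version does not depend on the language of the day names, which can matter if you ever pass a locale to `day_name()`.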


In [15]:
# Add a new column 'sleep_debt': sleep_hours minus a target of 7.5 hours
# (negative values mean less sleep than the target, positive values mean more)
fitness_data['sleep_debt'] = fitness_data['sleep_hours'] - 7.5

In [16]:
# Create a new column that calculates the cumulative sleep debt
fitness_data['cumulative_sleep_debt'] = fitness_data['sleep_debt'].cumsum()
fitness_data

Unnamed: 0,date,weight_kg,steps,resting_heart_rate,sleep_hours,active_minutes,total_calories_burned,workout_type,workout_duration_minutes,workout_calories,workout_avg_hr,workout_max_hr,fat_burn_minutes,cardio_minutes,peak_minutes,weight_lbs,day_of_week,is_weekend,sleep_debt,cumulative_sleep_debt
0,2024-07-01,75.248357,6829.0,63.68,7.03,66.0,664.0,Strength Training,68.28,641.120000,143.67,171.17,33.0,19.0,13.0,165.894033,Monday,False,-0.47,-0.470000
1,2024-07-02,74.914564,7741.0,58.63,7.85,66.0,660.0,Rest,0.00,0.000000,0.00,0.00,33.0,19.0,13.0,165.158145,Tuesday,False,0.35,-0.120000
2,2024-07-03,75.140000,8227.0,58.47,7.53,87.0,873.0,Strength Training,34.34,366.564833,138.63,163.86,43.0,26.0,17.0,165.655147,Wednesday,False,0.03,-0.090000
3,2024-07-04,75.200000,13262.0,72.51,4.63,111.0,1113.0,Running,62.74,662.040000,152.04,168.87,55.0,33.0,22.0,165.787424,Thursday,False,-2.87,-2.960000
4,2024-07-05,75.270000,9377.0,73.32,5.38,79.0,790.0,Strength Training,77.88,677.010000,144.06,167.28,39.0,23.0,15.0,165.941747,Friday,False,-2.12,-5.080000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
180,2024-12-28,77.690000,5883.0,62.33,9.78,71.0,712.0,Strength Training,94.29,1072.200000,131.24,154.21,35.0,21.0,14.0,171.276928,Saturday,True,2.28,9.740577
181,2024-12-29,77.610000,8639.0,66.12,6.55,87.0,874.0,Running,41.47,419.240000,139.88,157.09,43.0,26.0,17.0,171.100558,Sunday,True,-0.95,8.790577
182,2024-12-30,77.490000,8587.0,61.23,8.02,87.0,874.0,Rest,0.00,0.000000,0.00,0.00,43.0,26.0,17.0,170.836004,Monday,False,0.52,9.310577
183,2024-12-31,77.550000,10408.0,59.33,4.57,90.0,900.0,HIIT,26.73,267.900000,133.59,146.81,45.0,27.0,18.0,170.968281,Tuesday,False,-2.93,6.380577
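
Our data has a few days with missing sleep values, so it is worth knowing how `cumsum()` treats them. By default it skips missing values: the running total stays missing on that day and continues from the previous total afterwards. A minimal sketch, with a hypothetical missing day inserted between real values from the first rows:

```python
import numpy as np
import pandas as pd

# Sleep hours with one hypothetical missing day in the third position
sleep_hours = pd.Series([7.03, 7.85, np.nan, 7.53])

# Same convention as above: negative means less sleep than the 7.5-hour target
sleep_debt = sleep_hours - 7.5

# The NaN day stays NaN, and the running total picks up again on the next day
cumulative = sleep_debt.cumsum()
print(cumulative)
```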


## Task 5: Visualizing trends across single variables 📈

Let's visualize the distributions and trends of different columns in our data.

In [43]:
# Create a histogram of the number of steps
steps_hist = px.histogram(fitness_data, x='steps', title='Distribution of Steps',
                          labels={'steps': 'Steps'}, nbins=30)
steps_hist.show()

In [22]:
# Create a line chart for the number of steps over time with markers for weekends
fig = px.line(fitness_data, x='date', y='steps', title='Number of Steps Over Time', markers=True)

# Add markers for weekends
weekend_data = fitness_data[fitness_data['is_weekend']]
fig.add_scatter(x=weekend_data['date'], y=weekend_data['steps'], mode='markers', name='Weekend', marker=dict(color='red'))

fig.update_layout(
    xaxis_title='Date',
    yaxis_title='Number of Steps'
)

# Display the line chart
fig.show()

In [44]:
# Create a histogram of total calories burned
cals_hist = px.histogram(fitness_data, x='total_calories_burned', title='Distribution of Calories Burned',
                         labels={'total_calories_burned': 'Total Calories Burned'}, nbins=30)
cals_hist.show()

In [24]:
# Create a line chart for the number of calories burned over time
fig = px.line(fitness_data, x='date', y='total_calories_burned', title='Total Calories Burned Over Time', markers=True)

fig.update_layout(
    xaxis_title='Date',
    yaxis_title='Total Calories Burned'
)

# Display the line chart
fig.show()

In [26]:
# Visualize weight trend over time
fig = px.line(fitness_data, x='date', y='weight_kg', title='Weight Trend Over Time')

fig.update_layout(
    xaxis_title='Date',
    yaxis_title='Weight (kg)'
)

# Display the line chart
fig.show()

## Task 6: Visualizing trends across multiple variables 📊

In [40]:
# Create a line chart visualizing sleep metrics over time
fig_sleep = px.line(
    fitness_data, 
    x='date', 
    y=['sleep_hours', 'sleep_debt', 'cumulative_sleep_debt'], 
    title='Sleep Metrics Over Time',
    labels={
        'value': 'Hours',
        'date': 'Date',
        'variable': 'Metric'
    },
    markers=True
)

fig_sleep.show()

In [42]:
# Create a box plot for sleep hours comparing weekdays and weekends
fig = px.box(
    fitness_data,
    x='is_weekend',
    y='sleep_hours',
    title='Sleep Hours: Weekdays vs Weekends',
    labels={'is_weekend': 'Day Type (Weekday or Weekend)', 'sleep_hours': 'Sleep Hours'},
    template='plotly_white',
    color='is_weekend'
)

# Display the box plot
fig.show()
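
The box plot labels its groups `True` and `False`, which can be hard to read at a glance. One option is to map the boolean flag to readable labels and plot that column instead. In this sketch, `day_type` is a hypothetical column name:

```python
import pandas as pd

# A small stand-in for the 'is_weekend' column
is_weekend = pd.Series([False, True, True])

# Map the boolean flag to labels that read well on a chart axis
day_type = is_weekend.map({False: "Weekday", True: "Weekend"})
print(day_type.tolist())
```

With the full dataset, you could store the result as `fitness_data['day_type']` and pass `x='day_type'` to `px.box`.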

In [37]:
# Prepare the data for visualization
average_sleep_by_day = (
    fitness_data.groupby('day_of_week')['sleep_hours']
    .mean()
    .reindex(['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday'])
    .reset_index()
)

# Create a bar chart for average sleep by day of the week
fig = px.bar(
    average_sleep_by_day,
    x='day_of_week',
    y='sleep_hours',
    title='Average Sleep Hours by Day of the Week',
    labels={'day_of_week': 'Day of the Week', 'sleep_hours': 'Average Sleep Hours'},
)

# Display the bar chart
fig.show()
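
The `reindex` call above puts the days in calendar order, since `groupby` would otherwise sort them alphabetically. Another way to get the same ordering is an ordered categorical column. This is a sketch on hypothetical sleep values, with `day_order`, `sample`, and `avg` as illustrative names:

```python
import pandas as pd

day_order = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

# Hypothetical sleep values for two days of the week
sample = pd.DataFrame({
    "day_of_week": ["Sunday", "Monday", "Sunday", "Monday"],
    "sleep_hours": [8.0, 7.0, 9.0, 6.0],
})

# An ordered categorical makes groupby sort by calendar order, not alphabetically
sample["day_of_week"] = pd.Categorical(
    sample["day_of_week"], categories=day_order, ordered=True
)

# observed=True keeps only the days that actually appear in the data
avg = (
    sample.groupby("day_of_week", observed=True)["sleep_hours"]
    .mean()
    .reset_index()
)
print(avg)
```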

In [36]:
# Scatter plot for number of steps vs. calories burned
# trendline="ols" adds an ordinary least squares fit; it requires the statsmodels package
steps_vs_calories = px.scatter(fitness_data, x='steps', y='total_calories_burned', title='Steps vs. Calories Burned',
                               labels={'steps': 'Steps', 'total_calories_burned': 'Calories Burned'}, trendline="ols")
steps_vs_calories.show()
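
The `trendline="ols"` option draws a straight line fitted by ordinary least squares. To see roughly what that fit computes, here is a sketch that fits a line with `numpy.polyfit` to the first five rows shown earlier. Five points are far too few for a meaningful estimate, so treat the numbers as an illustration only:

```python
import numpy as np

# steps and total_calories_burned from the first five rows of the dataset
steps = np.array([6829.0, 7741.0, 8227.0, 13262.0, 9377.0])
calories = np.array([664.0, 660.0, 873.0, 1113.0, 790.0])

# A degree-1 polynomial fit is a least squares straight line:
# calories ≈ slope * steps + intercept
slope, intercept = np.polyfit(steps, calories, 1)
print(f"slope: {slope:.4f} calories per step, intercept: {intercept:.1f}")
```

A positive slope means days with more steps tend to have higher calorie totals in this sample.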

In [35]:
# How many days were logged for each workout type?
fitness_data["workout_type"].value_counts()

workout_type
Running              51
Yoga                 42
Strength Training    41
HIIT                 32
Rest                 13
Name: count, dtype: int64
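
Counts are easier to compare as proportions. Passing `normalize=True` to `value_counts()` returns the share of each category instead. The sketch below rebuilds a column from the counts shown above so that it runs on its own (`counts`, `workout_type`, and `shares` are illustrative names):

```python
import pandas as pd

# Counts taken from the output above
counts = {"Running": 51, "Yoga": 42, "Strength Training": 41, "HIIT": 32, "Rest": 13}

# Rebuild a column with one entry per logged day
workout_type = pd.Series([name for name, n in counts.items() for _ in range(n)])

# normalize=True returns fractions that sum to 1 instead of raw counts
shares = workout_type.value_counts(normalize=True)
print(shares.round(3))
```

On the real data, the equivalent call would be `fitness_data["workout_type"].value_counts(normalize=True)`.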

In [34]:
# Visualize trends across workout types
workout_plot = px.scatter(
    fitness_data,
    x='workout_avg_hr',
    y='workout_calories',
    color='workout_type',
    size='workout_duration_minutes',
    title='Calories Burned vs. Average Heart Rate by Workout Type',
    labels={'workout_avg_hr': 'Average Heart Rate (bpm)', 'workout_calories': 'Calories Burned'},
    hover_data=['fat_burn_minutes']
)
workout_plot.show()