## Fitness Data Analysis: Case Study
- Fitness data analysis examines records of your physical activity and health to understand how your body is performing. The data may come from fitness apps, wearables such as smartwatches, or records you keep yourself.

- The dataset consists of fitness-related metrics collected through a fitness watch over multiple days. Each record in the dataset includes the following attributes:

1. Date: The date when the data was recorded.
2. Time: The time at which the data was recorded.
3. Step Count: The number of steps taken during the recorded interval.
4. Distance: The distance covered in meters during the recorded interval.
5. Energy Burned: The amount of energy burned in kilocalories during the recorded interval.
6. Flights Climbed: The number of flights of stairs climbed during the recorded interval.
7. Walking Double Support Percentage: The percentage of time both feet are in contact with the ground while walking.
8. Walking Speed: The walking speed in meters per second during the recorded interval.

Your task is to analyze the fitness watch data and address the following problems:

- Perform exploratory data analysis (EDA) to gain insights into the distribution, trends, and patterns of each fitness metric.
- Create visualizations to depict how different metrics vary over time, across different time intervals, or in relation to one another.
- Analyze how step count, distance, energy burned, and other metrics correlate with each other.
- Identify potential patterns in walking efficiency, energy expenditure, and the relationship between step count and walking speed.
- Segment the data into time intervals (e.g., morning, afternoon, evening) based on the recorded timestamps.
- Investigate variations in fitness metrics (e.g., step count, walking speed) during different time intervals.

In [2]:
import pandas as pd
import plotly.io as pio
import plotly.graph_objects as go
import plotly.express as px
pio.templates.default = 'plotly_white'

In [3]:
data = pd.read_csv('Apple-Fitness-Data.csv')
data.head()

| | Date | Time | Step Count | Distance | Energy Burned | Flights Climbed | Walking Double Support Percentage | Walking Speed |
|---|---|---|---|---|---|---|---|---|
| 0 | 2023-03-21 | 16:01:23 | 46 | 0.02543 | 14.62 | 3 | 0.304 | 3.06 |
| 1 | 2023-03-21 | 16:18:37 | 645 | 0.40041 | 14.722 | 3 | 0.309 | 3.852 |
| 2 | 2023-03-21 | 16:31:38 | 14 | 0.00996 | 14.603 | 4 | 0.278 | 3.996 |
| 3 | 2023-03-21 | 16:45:37 | 13 | 0.00901 | 14.811 | 3 | 0.278 | 5.04 |
| 4 | 2023-03-21 | 17:10:30 | 17 | 0.00904 | 15.153 | 3 | 0.281 | 5.184 |


- First, let's check whether the data contains any null values

In [4]:
data.isnull().sum()

Date                                 0
Time                                 0
Step Count                           0
Distance                             0
Energy Burned                        0
Flights Climbed                      0
Walking Double Support Percentage    0
Walking Speed                        0
dtype: int64

- The output above shows that no column contains null values. Let's move on to analyzing step count over time

In [5]:
# Step count over time

fig = px.line(data, x='Time', y='Step Count', title='Step Count Over Time')

fig.show()

pio.write_image(fig, './images/image1.png')

 ![Plot Description](./images/image1.png)
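
The chart above uses the `Time` column alone on the x-axis, so readings from different days share the same axis positions. One possible alternative, shown here as a minimal sketch, is to combine `Date` and `Time` into a single datetime column. The column name `Timestamp` and the 2023-03-22 row are assumptions made for illustration; they are not part of the original dataset.

```python
import pandas as pd

# Two rows taken from the head() output, plus one made-up row on a later day
# (the 2023-03-22 row is hypothetical and only illustrates the multi-day case).
sample = pd.DataFrame({
    "Date": ["2023-03-22", "2023-03-21", "2023-03-21"],
    "Time": ["09:15:00", "16:01:23", "16:18:37"],
    "Step Count": [120, 46, 645],
})

# Join the two string columns and parse them with an explicit format.
sample["Timestamp"] = pd.to_datetime(
    sample["Date"] + " " + sample["Time"], format="%Y-%m-%d %H:%M:%S"
)

# Sort chronologically so a line chart connects points in the right order.
sample = sample.sort_values("Timestamp").reset_index(drop=True)
print(sample[["Timestamp", "Step Count"]])
```

Passing `x='Timestamp'` instead of `x='Time'` to `px.line` would then place each reading on a continuous date-time axis.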

- Now let's have a look at the distance covered over time

In [6]:
# Distance Covered Over Time

fig = px.line(data, x='Time', y='Distance', title='Distance Covered Over Time')

fig.show()

pio.write_image(fig, './images/image2.png')

 ![Plot Description](./images/image2.png)

- Now, let's look at the energy burned over time

In [7]:
# Energy Burned over time

fig = px.line(data, x='Time', y='Energy Burned', title='Energy Burned Over Time')

fig.show()

pio.write_image(fig, './images/image3.png')

 ![Plot Description](./images/image3.png)

- Now, let's look at the walking speed over time

In [8]:
# Walking Speed Over Time

fig = px.line(data, x='Time', y='Walking Speed', title='Walking Speed Over Time')

fig.show()

pio.write_image(fig, './images/image4.png')

 ![Plot Description](./images/image4.png)
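
The line charts above show each metric on its own. To see how the metrics relate to one another, one option is a Pearson correlation matrix. The sketch below uses only the five rows shown in the `head()` output, so the coefficients are illustrative and say nothing reliable about the full dataset.

```python
import pandas as pd

# The five rows from the head() output (a tiny sample, for illustration only).
sample = pd.DataFrame({
    "Step Count": [46, 645, 14, 13, 17],
    "Distance": [0.02543, 0.40041, 0.00996, 0.00901, 0.00904],
    "Energy Burned": [14.62, 14.722, 14.603, 14.811, 15.153],
    "Walking Speed": [3.06, 3.852, 3.996, 5.04, 5.184],
})

# Pairwise Pearson correlation between all numeric columns.
correlation = sample.corr(method="pearson")
print(correlation.round(2))
```

On the full dataset, the same call on the numeric columns of `data` would give the complete matrix, which could be passed to `px.imshow` to draw a heatmap.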

- Now, let's calculate and plot the average step count per day

In [12]:
# Calculate Average Step Count per Day

average_step_count_per_day = data.groupby('Date')['Step Count'].mean().reset_index()

fig = px.bar(average_step_count_per_day, x='Date', y='Step Count', title='Average Step Count per Day')
fig.update_xaxes(type='category')
fig.show()

pio.write_image(fig, './images/image5.png')

 ![Plot Description](./images/image5.png)
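
The average above is the mean step count per recorded interval on each day, so it depends on how many records a day contains. If the daily total is of interest as well, a sum can be computed alongside the mean. The following is a minimal sketch on made-up records; the values and the output column names `total`, `average`, and `records` are assumptions.

```python
import pandas as pd

# Hypothetical interval records across two days (values are made up).
sample = pd.DataFrame({
    "Date": ["2023-03-21", "2023-03-21", "2023-03-22", "2023-03-22", "2023-03-22"],
    "Step Count": [46, 645, 100, 200, 300],
})

# Named aggregation: one output column per statistic.
daily_steps = (
    sample.groupby("Date")["Step Count"]
    .agg(total="sum", average="mean", records="count")
    .reset_index()
)
print(daily_steps)
```

Comparing `total` with `average` makes it easier to tell whether a high daily average reflects more activity or simply fewer recorded intervals.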

- Now, let's have a look at walking efficiency over time

In [14]:
# Calculate Walking Efficiency

data['Walking Efficiency'] = data['Distance']/data['Step Count']

fig = px.line(data, x='Time', y='Walking Efficiency', title='Walking Efficiency Over Time')
fig.show()

pio.write_image(fig, './images/image6.png')

 ![Plot Description](./images/image6.png)
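
Walking efficiency here is distance divided by step count, that is, the distance covered per step. If any interval has a step count of zero, the division yields `inf` or `NaN`, which can distort the chart. The sketch below shows one way to guard against that case; the zero-step row is hypothetical, since the source does not say whether such rows exist.

```python
import pandas as pd

# Two rows from the head() output plus a hypothetical zero-step interval.
sample = pd.DataFrame({
    "Step Count": [46, 645, 0],
    "Distance": [0.02543, 0.40041, 0.0],
})

# Keep only positive step counts; zero becomes NaN so the division is skipped.
valid_steps = sample["Step Count"].where(sample["Step Count"] > 0)
sample["Walking Efficiency"] = sample["Distance"] / valid_steps
print(sample)
```

Rows with a missing efficiency are left as gaps by Plotly line charts, rather than appearing as spikes.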

- Now, let's have a look at the step count and walking speed variations by time intervals

In [16]:
# Create Time Intervals
# Passing an explicit format avoids the "Could not infer format" warning
# and keeps parsing consistent.
# Bins are left-closed: Morning = hours 0-11, Afternoon = 12-17, Evening = 18-23.
time_intervals = pd.cut(pd.to_datetime(data["Time"], format="%H:%M:%S").dt.hour,
                        bins=[0, 12, 18, 24],
                        labels=["Morning", "Afternoon", "Evening"],
                        right=False)

data["Time Interval"] = time_intervals

# Variations in Step Count and Walking Speed by Time Interval
fig = px.scatter(data, x="Step Count",
                 y="Walking Speed",
                 color="Time Interval",
                 title="Step Count and Walking Speed Variations by Time Interval",
                 trendline='ols')
fig.show()

pio.write_image(fig, './images/image7.png')

 ![Plot Description](./images/image7.png)
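
The scatter plot shows individual readings. To compare the intervals directly, the same binning can be followed by a per-interval average. This is a minimal sketch: the 08:30 and 20:45 rows and their values are made up so that every interval has at least one reading.

```python
import pandas as pd

# Two afternoon rows from the head() output plus two hypothetical rows
# (08:30:00 and 20:45:10) so each interval is represented.
sample = pd.DataFrame({
    "Time": ["08:30:00", "16:01:23", "16:18:37", "20:45:10"],
    "Step Count": [120, 46, 645, 80],
    "Walking Speed": [4.2, 3.06, 3.852, 3.5],
})

# Same left-closed bins as above: [0, 12), [12, 18), [18, 24).
hours = pd.to_datetime(sample["Time"], format="%H:%M:%S").dt.hour
sample["Time Interval"] = pd.cut(
    hours,
    bins=[0, 12, 18, 24],
    labels=["Morning", "Afternoon", "Evening"],
    right=False,
)

# Mean of each metric per interval; observed=False keeps empty categories.
interval_summary = (
    sample.groupby("Time Interval", observed=False)[["Step Count", "Walking Speed"]]
    .mean()
)
print(interval_summary)
```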

- Now, let's compare the daily average of all the health and fitness metrics

In [23]:
# Metrics to average per day
metrics = ['Step Count', 'Distance', 'Energy Burned', 'Flights Climbed',
           'Walking Double Support Percentage', 'Walking Speed']

data['Date'] = pd.to_datetime(data['Date'])

# Average every metric per day
daily_avg_metrics = data.groupby('Date')[metrics].mean().reset_index()

# Reshape data to long format for the treemap
daily_avg_metrics_melted = daily_avg_metrics.melt(id_vars=['Date'], value_vars=metrics)

fig = px.treemap(daily_avg_metrics_melted, path=['variable'], values='value', color='variable', hover_data=['value'], title='Daily Averages for Different Metrics')

fig.show()

pio.write_image(fig, './images/image8.png')

 ![Plot Description](./images/image8.png)

- The treemap above shows each health and fitness metric as a rectangular tile. Tile size is proportional to the metric's value (Plotly Express sums the daily averages that share a tile), and tile colour identifies the metric. Hovering over a tile displays its value.

- Step Count dominates the visualization because its numerical values are much larger than those of the other metrics, which makes variation in the other metrics hard to see. Let's look at the visualization again without step count:

In [29]:
# Select metrics excluding Step Count
metrics_to_visualize = ['Distance', 'Energy Burned', 'Flights Climbed', 'Walking Double Support Percentage', 'Walking Speed']

# Reshape data for treemap
daily_avg_metrics_melted = daily_avg_metrics.melt(id_vars=['Date'], value_vars=metrics_to_visualize)

fig = px.treemap(daily_avg_metrics_melted, path=['variable'], values='value', color='variable', hover_data=['value'], title='Daily Averages for Different Metrics (Excluding Step Count)')

fig.show()

pio.write_image(fig, './images/image9.png')

 ![Plot Description](./images/image9.png)
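
Dropping Step Count is one way to deal with metrics on very different scales. Another option, not used in the analysis above, is to rescale every metric to the range 0 to 1 before comparing them. The sketch below applies min-max scaling to made-up daily averages; the values and the name `scaled` are assumptions.

```python
import pandas as pd

# Hypothetical daily averages for three metrics (values are made up).
daily_avg = pd.DataFrame({
    "Date": pd.to_datetime(["2023-03-21", "2023-03-22", "2023-03-23"]),
    "Step Count": [150.0, 220.0, 90.0],
    "Energy Burned": [14.8, 15.6, 14.1],
    "Walking Speed": [4.2, 3.9, 4.6],
})
metrics = ["Step Count", "Energy Burned", "Walking Speed"]

# Min-max scaling: (x - min) / (max - min), applied column by column.
# A metric that never changes would have a zero range and become NaN.
minimum = daily_avg[metrics].min()
value_range = daily_avg[metrics].max() - minimum
scaled = daily_avg.copy()
scaled[metrics] = (daily_avg[metrics] - minimum) / value_range
print(scaled.round(3))
```

After scaling, each metric's lowest day is 0 and its highest day is 1, so day-to-day variation can be compared across metrics. The original units are lost, however, so the scaled values should not be read as absolute magnitudes.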