In [1]:
import pandas as pd
import plotly.io as pio
import plotly.graph_objects as go
pio.templates.default = "plotly_white"
import plotly.express as px

In [2]:
Data = pd.read_csv("Apple-Fitness-Data.csv")

In [4]:
print(Data.head())

         Date       Time  Step Count  Distance  Energy Burned  \
0  2023-03-21  16:01:23           46   0.02543         14.620   
1  2023-03-21  16:18:37          645   0.40041         14.722   
2  2023-03-21  16:31:38           14   0.00996         14.603   
3  2023-03-21  16:45:37           13   0.00901         14.811   
4  2023-03-21  17:10:30           17   0.00904         15.153   

   Flights Climbed  Walking Double Support Percentage  Walking Speed  
0                3                              0.304          3.060  
1                3                              0.309          3.852  
2                4                              0.278          3.996  
3                3                              0.278          5.040  
4                3                              0.281          5.184  


Let's check if this data contains any null values.

In [5]:
print(Data.isnull().sum())

Date                                 0
Time                                 0
Step Count                           0
Distance                             0
Energy Burned                        0
Flights Climbed                      0
Walking Double Support Percentage    0
Walking Speed                        0
dtype: int64


Since the data doesn't have any null values, let's proceed by analyzing my step count over time.

In [18]:
import plotly.express as px

fig1 = px.line(Data, x="Time", y="Step Count", title="Step Count Over Time")
fig1.update_traces(line=dict(color='green'))
fig1.update_layout(title_text="Step Count Over Time", title_x=0.5)
fig1.show()
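
The chart above plots against the Time column only, so if the data covers more than one date, readings from different days may land on the same axis positions. One possible workaround is to combine Date and Time into a single timestamp first. The snippet below is a minimal sketch on hypothetical sample rows; the `sample` frame and the `Timestamp` column name are assumptions for illustration, not part of the original dataset.

```python
import pandas as pd

# Hypothetical sample rows mirroring the Date and Time columns shown above
sample = pd.DataFrame({
    "Date": ["2023-03-22", "2023-03-21", "2023-03-21"],
    "Time": ["09:05:00", "16:01:23", "16:18:37"],
    "Step Count": [120, 46, 645],
})

# Join the two text columns and parse them with an explicit format
sample["Timestamp"] = pd.to_datetime(
    sample["Date"] + " " + sample["Time"], format="%Y-%m-%d %H:%M:%S"
)

# Sort chronologically so a line chart connects readings in order
sample = sample.sort_values("Timestamp").reset_index(drop=True)
print(sample[["Timestamp", "Step Count"]])
```

With a column like this in place, passing `x="Timestamp"` to `px.line` instead of `x="Time"` should give a continuous time axis across days.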



Let's now visualize the distance covered over time.

In [21]:
import plotly.express as px

fig2 = px.line(Data, x="Time", y="Distance", title="Distance Covered Over Time")
fig2.update_traces(line=dict(color='green'))
fig2.update_layout(title_text="Distance Covered Over Time", title_x=0.5)
fig2.show()


Let's visualize the energy burned over time.

In [23]:
import plotly.express as px
fig3 = px.line(Data, x="Time", y="Energy Burned", title="Energy Burned Over Time")
fig3.update_traces(line=dict(color='green'))
fig3.update_layout(title_text="Energy Burned Over Time", title_x=0.5)
fig3.show()



Let's visualize my walking speed over time.

In [25]:
import plotly.express as px
fig4 = px.line(Data, x="Time", y="Walking Speed", title="Walking Speed Over Time")
fig4.update_traces(line=dict(color='green'))
fig4.update_layout(title_text="Walking Speed Over Time", title_x=0.5)
fig4.show()


Let's calculate and visualize the average step count per reading for each day.

In [26]:
import plotly.express as px

Average_step_count_per_day = Data.groupby("Date")["Step Count"].mean().reset_index()
fig5 = px.bar(Average_step_count_per_day, x="Date", y="Step Count", title="Average Step Count per Day")
fig5.update_traces(marker_color='green')  
fig5.update_layout(title_text="Average Step Count per Day", title_x=0.5)
fig5.update_xaxes(type='category') 
fig5.show()
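
Note that `.mean()` averages the individual readings within each day, which is different from the total number of steps taken that day. The sketch below, on hypothetical sample rows, computes both side by side; the column names `Mean per Reading` and `Total Steps` are made up for illustration.

```python
import pandas as pd

# Hypothetical readings: two per day
sample = pd.DataFrame({
    "Date": ["2023-03-21", "2023-03-21", "2023-03-22", "2023-03-22"],
    "Step Count": [46, 645, 14, 13],
})

# Aggregate each day two ways: mean of the readings and sum of the readings
daily = sample.groupby("Date")["Step Count"].agg(["mean", "sum"]).reset_index()
daily = daily.rename(columns={"mean": "Mean per Reading", "sum": "Total Steps"})
print(daily)
```

Which aggregate is more useful depends on the question: the mean describes a typical reading, while the sum describes the day's overall activity.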


Let's analyze my walking efficiency over time, defined here as the distance covered per step.

In [28]:
import plotly.express as px
Data["Walking Efficiency"] = Data["Distance"] / Data["Step Count"]
fig6 = px.line(Data, x="Time", y="Walking Efficiency", title="Walking Efficiency Over Time")
fig6.update_traces(line=dict(color='green')) 
fig6.update_layout(title_text="Walking Efficiency Over Time", title_x=0.5)  
fig6.show()
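
If any reading has a Step Count of zero, the division above produces `inf` or `NaN`, which can distort the chart. The following is a minimal sketch of one way to guard against that, using hypothetical sample rows; whether the real dataset contains zero-step readings is not something the output above tells us.

```python
import pandas as pd

# Hypothetical readings, including one with zero steps
sample = pd.DataFrame({
    "Step Count": [46, 645, 0],
    "Distance": [0.02543, 0.40041, 0.0],
})

# Keep only positive step counts; other rows become NaN in the divisor
steps = sample["Step Count"].where(sample["Step Count"] > 0)

# Distance per step; rows without steps stay NaN instead of becoming inf
sample["Walking Efficiency"] = sample["Distance"] / steps
print(sample)
```

Plotly leaves gaps at `NaN` values by default, so the line simply breaks at readings with no steps.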


Let's explore how step count and walking speed vary across times of day: morning (before 12:00), afternoon (12:00 to 17:59), and evening (18:00 onward).

In [48]:
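# An explicit format (assuming every Time value looks like HH:MM:SS)
# avoids pandas falling back to slower, per-element dateutil parsing
time_intervals = pd.cut(pd.to_datetime(Data["Time"], format="%H:%M:%S").dt.hour,
                        bins=[0, 12, 18, 24],
                        labels=["Morning", "Afternoon", "Evening"],
                        right=False)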

Data["Time Interval"] = time_intervals

# Variations in Step Count and Walking Speed by Time Interval
fig7 = px.scatter(Data, x="Step Count",
                  y="Walking Speed",
                  color="Time Interval",
                  title="Step Count and Walking Speed Variations by Time Interval",
                  trendline='ols')
fig7.update_layout(title_text="Step Count and Walking Speed Variations by Time Interval", title_x=0.5)
fig7.show()
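
To see how `pd.cut` assigns hours to intervals with `right=False`, here is a small sketch on a handful of hand-picked hours. With `right=False` each bin includes its left edge and excludes its right edge, so hour 12 falls in the afternoon and hour 18 in the evening.

```python
import pandas as pd

# Hand-picked hours that sit on and around the bin edges
hours = pd.Series([0, 11, 12, 17, 18, 23])

# Same bins and labels as in the cell above: [0, 12), [12, 18), [18, 24)
labels = pd.cut(hours, bins=[0, 12, 18, 24],
                labels=["Morning", "Afternoon", "Evening"], right=False)

print(pd.DataFrame({"hour": hours, "interval": labels}))
```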





Let's compare the daily averages across all health and fitness metrics.

In [None]:
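# numeric_only=True skips text and categorical columns such as Time and
# Time Interval, which would otherwise raise an error in recent pandas versions
daily_avg_metrics = Data.groupby("Date").mean(numeric_only=True).reset_index()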

daily_avg_metrics_melted = daily_avg_metrics.melt(id_vars=["Date"], 
                                                  value_vars=["Step Count", "Distance", 
                                                              "Energy Burned", "Flights Climbed", 
                                                              "Walking Double Support Percentage", 
                                                              "Walking Speed"])

# Treemap of Daily Averages for Different Metrics Over Several Weeks
fig8 = px.treemap(daily_avg_metrics_melted,
                 path=["variable"],
                 values="value",
                 color="variable",
                 hover_data=["value"],
                 title="Daily Averages for Different Metrics")
fig8.show()

The treemap shows each health and fitness metric as a rectangular tile. Because the data is grouped by metric only, the size of each tile reflects that metric's daily averages combined across all days, and the color identifies the metric. Hovering over a tile displays the underlying value.
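
To get a feel for the numbers behind the tiles, the sketch below reproduces the aggregation with plain pandas on hypothetical daily averages. It assumes that Plotly Express sums the `values` column within each level of `path`, which is my understanding of its behavior; the `tile_sizes` and `share` names are made up for illustration.

```python
import pandas as pd

# Hypothetical daily averages for two metrics over two days
daily_avg = pd.DataFrame({
    "Date": ["2023-03-21", "2023-03-22"],
    "Step Count": [345.5, 13.5],
    "Walking Speed": [3.456, 4.518],
})

# Same long format that melt() produces in the cell above
melted = daily_avg.melt(id_vars=["Date"],
                        value_vars=["Step Count", "Walking Speed"])

# Assumed tile size: the sum of the values for each metric
tile_sizes = melted.groupby("variable")["value"].sum()

# Fraction of the total area each metric would occupy
share = tile_sizes / tile_sizes.sum()
print(share)
```

A metric measured on a much larger scale takes up nearly the whole area, regardless of how the other metrics vary.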

Because Step Count has consistently larger numerical values than the other metrics, its tile dominates the treemap and makes differences among the other metrics hard to see. To address this, let's redraw the treemap without Step Count.

In [None]:
metrics_to_visualize = ["Distance", "Energy Burned", "Flights Climbed", 
                        "Walking Double Support Percentage", "Walking Speed"]
daily_avg_metrics_melted = daily_avg_metrics.melt(id_vars=["Date"], value_vars=metrics_to_visualize)
fig = px.treemap(daily_avg_metrics_melted,
                 path=["variable"],
                 values="value",
                 color="variable",
                 hover_data=["value"],
                 title="Daily Averages for Different Metrics (Excluding Step Count)")
fig.show()

In summary, this is how you can conduct Fitness Data Analysis using Python. Analyzing data from fitness wearables is essential for businesses in the health and wellness industry. By examining user data, companies can gain insights into user behavior, provide personalized solutions, and help improve overall health and well-being. I hope you found this article on Fitness Watch Data Analysis using Python informative. Please feel free to share your questions and comments in the section below.