### Introduction

Fitness Watch Data Analysis involves analyzing the data collected by fitness wearables or smartwatches to gain insights into users’ health and activity patterns. These devices track metrics like steps taken, energy burned, walking speed, and more.

Fitness Watch Data Analysis is a crucial tool for businesses in the health and wellness domain. By analyzing user data from fitness wearables, companies can understand user behaviour, offer personalized solutions, and contribute to improving users’ overall health and well-being.

A typical process for a Fitness Watch Data Analysis problem is:

* Collect data from fitness watches, ensuring it’s accurate and reliable.
* Perform exploratory data analysis (EDA) to gain initial insights into the data.
* Engineer new features from the raw data that might provide more meaningful insights.
* Create visual representations of the data to communicate insights effectively.
* Segment users’ activity by time interval or by the level of fitness metrics, and analyze their performance.

In [1]:
## Importing all the necessary libraries
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
import plotly.io as pio

# Use a white-background theme for every figure
pio.templates.default = "plotly_white"

In [2]:
## Reading the data
df = pd.read_csv('D:\\Data Science and Machine Learning Projects\\DataSets\\Apple-Fitness-Data.csv')

In [3]:
print(df.head())

         Date       Time  Step Count  Distance  Energy Burned  \
0  2023-03-21  16:01:23           46   0.02543         14.620   
1  2023-03-21  16:18:37          645   0.40041         14.722   
2  2023-03-21  16:31:38           14   0.00996         14.603   
3  2023-03-21  16:45:37           13   0.00901         14.811   
4  2023-03-21  17:10:30           17   0.00904         15.153   

   Flights Climbed  Walking Double Support Percentage  Walking Speed  
0                3                              0.304          3.060  
1                3                              0.309          3.852  
2                4                              0.278          3.996  
3                3                              0.278          5.040  
4                3                              0.281          5.184  


In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 149 entries, 0 to 148
Data columns (total 8 columns):
 #   Column                             Non-Null Count  Dtype  
---  ------                             --------------  -----  
 0   Date                               149 non-null    object 
 1   Time                               149 non-null    object 
 2   Step Count                         149 non-null    int64  
 3   Distance                           149 non-null    float64
 4   Energy Burned                      149 non-null    float64
 5   Flights Climbed                    149 non-null    int64  
 6   Walking Double Support Percentage  149 non-null    float64
 7   Walking Speed                      149 non-null    float64
dtypes: float64(4), int64(2), object(2)
memory usage: 9.4+ KB


In [5]:
## Checking if the data contains any nulls
print(df.isnull().sum())

Date                                 0
Time                                 0
Step Count                           0
Distance                             0
Energy Burned                        0
Flights Climbed                      0
Walking Double Support Percentage    0
Walking Speed                        0
dtype: int64


In [6]:
## Step Count Over Time
fig1 = px.line(df, x = "Time",
               y= "Step Count",
               title= "Step Count Over Time")

fig1.show()

In [7]:
## Distance covered over time
fig2 = px.line(df, x = 'Time',
               y = 'Distance',
               title = 'Distance Covered Over Time')

fig2.show()

In [8]:
## Energy Burned over Time
fig3 = px.line(df, x = 'Time',
               y = 'Energy Burned',
               title = 'Energy Burned over Time')

fig3.show()

In [9]:
## Walking Speed over Time
fig4 = px.line(df,x = 'Time',
               y = 'Walking Speed',
               title = 'Walking Speed over Time')

fig4.show()
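
The four line charts above use `Time` on the x-axis, which holds only the time of day, while the dataset also has a `Date` column. If a single continuous time axis is wanted, one option is to combine the two columns into one datetime column first. The sketch below uses a small hypothetical sample rather than the real file, and the `Timestamp` column name is an assumption introduced here for illustration.

```python
import pandas as pd

# Hypothetical three-row sample mimicking the dataset's Date and Time columns
sample = pd.DataFrame({
    "Date": ["2023-03-21", "2023-03-22", "2023-03-21"],
    "Time": ["16:01:23", "09:15:00", "16:18:37"],
    "Step Count": [46, 120, 645],
})

# Combine the two string columns into one datetime column
# ("Timestamp" is a name introduced here, not part of the original data)
sample["Timestamp"] = pd.to_datetime(
    sample["Date"] + " " + sample["Time"].str.strip(),
    format="%Y-%m-%d %H:%M:%S",
)

# Sort chronologically so a line plot connects the points in time order
sample = sample.sort_values("Timestamp").reset_index(drop=True)
print(sample[["Timestamp", "Step Count"]])
```

The same column could then be passed as `x="Timestamp"` to `px.line` in place of `"Time"`.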

#### Calculate Average Step Counts per day

In [10]:
average_step_counts_per_day = df.groupby('Date')['Step Count'].mean().reset_index()

In [11]:
average_step_counts_per_day

         Date  Step Count
0  2023-03-21  137.636364
1  2023-03-22  354.233333
2  2023-03-23  109.125000
3  2023-03-24   64.666667
4  2023-03-25  117.000000
5  2023-03-26  101.000000
6  2023-03-27   48.875000
7  2023-03-28  163.750000
8  2023-03-29  169.578947
9  2023-03-30  384.181818


In [12]:
## Plotting Average Step Count per Day
fig5 = px.bar(average_step_counts_per_day, x = 'Date',
               y = 'Step Count',
               title = 'Average Step Count Per Day')

fig5.update_xaxes(type='category')
fig5.show()
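
Note that `groupby('Date')['Step Count'].mean()` averages over the individual readings recorded on each date, so it is a mean per reading rather than a daily step total. The following sketch, on hypothetical data, computes both figures side by side; the output column names (`mean_per_reading`, `total_steps`, `readings`) are assumptions chosen for this example.

```python
import pandas as pd

# Hypothetical records: two days with a different number of readings each
sample = pd.DataFrame({
    "Date": ["2023-03-21", "2023-03-21", "2023-03-21", "2023-03-22"],
    "Step Count": [46, 645, 14, 300],
})

# Mean per reading, total steps, and number of readings in one groupby
per_day = (
    sample.groupby("Date")["Step Count"]
    .agg(mean_per_reading="mean", total_steps="sum", readings="count")
    .reset_index()
)
print(per_day)
```

Which of the two is more useful depends on the question: the total reflects overall daily activity, while the mean per reading is sensitive to how many readings the watch logged that day.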

#### Calculate Walking Efficiency

In [13]:
df['Walking Efficiency'] = df['Distance']/df['Step Count']

In [14]:
df['Walking Efficiency']

0      0.000553
1      0.000621
2      0.000711
3      0.000693
4      0.000532
         ...   
144    0.000675
145    0.000551
146    0.000675
147    0.000662
148    0.000628
Name: Walking Efficiency, Length: 149, dtype: float64

In [15]:
## Plotting Walking Efficiency
fig6 = px.line(df,x = 'Time',
               y = 'Walking Efficiency',
               title = 'Walking Efficiency Over Time')

fig6.show()
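
Walking efficiency as defined here is distance covered per step. The division works cleanly as long as every reading has a non-zero step count; if a reading with zero steps were present, the result would be `inf` or `NaN`. A minimal defensive sketch, assuming such rows could occur (the third row below is hypothetical; the first two are taken from the `df.head()` output):

```python
import numpy as np
import pandas as pd

sample = pd.DataFrame({
    "Step Count": [46, 645, 0],          # last row is a hypothetical zero-step reading
    "Distance": [0.02543, 0.40041, 0.0],
})

# Replace zero step counts with NaN so the ratio becomes NaN instead of inf
steps = sample["Step Count"].replace(0, np.nan)
sample["Walking Efficiency"] = sample["Distance"] / steps
print(sample)
```

Rows with `NaN` efficiency are skipped by pandas aggregations such as `mean()` by default, so they do not distort later summaries.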

#### Step Count and Walking Speed Variations by Time Intervals

In [16]:
df["Time"] = df["Time"].str.strip()  # Remove leading and trailing spaces

In [17]:
time_intervals = pd.cut(
    pd.to_datetime(df["Time"], format="%H:%M:%S").dt.hour,
    bins=[0, 12, 18, 24],
    labels=["Morning", "Afternoon", "Evening"],
    right=False
)

In [18]:
df['Time Intervals'] = time_intervals

In [19]:
df['Time Intervals']

0      Afternoon
1      Afternoon
2      Afternoon
3      Afternoon
4      Afternoon
         ...    
144    Afternoon
145    Afternoon
146    Afternoon
147    Afternoon
148    Afternoon
Name: Time Intervals, Length: 149, dtype: category
Categories (3, object): ['Morning' < 'Afternoon' < 'Evening']
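
To make the bin boundaries concrete: with `right=False`, each bin includes its left edge and excludes its right edge, so the intervals are [0, 12), [12, 18) and [18, 24). The sketch below applies the same `pd.cut` call to a few hand-picked hours (hypothetical values, chosen to sit on the edges).

```python
import pandas as pd

# Hours chosen to sit on or next to each bin edge
hours = pd.Series([0, 11, 12, 17, 18, 23])

labels = pd.cut(
    hours,
    bins=[0, 12, 18, 24],
    labels=["Morning", "Afternoon", "Evening"],
    right=False,  # left-closed bins: 12 falls in "Afternoon", 18 in "Evening"
)
print(pd.DataFrame({"hour": hours, "interval": labels}))
```

Hour 0 is assigned to "Morning" only because the bins are left-closed; with the default `right=True` it would fall outside every bin and become `NaN`.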

In [20]:
## Variations in Step Count and Walking Speed by Time Interval
## Note: trendline='ols' requires the statsmodels package to be installed
fig7 = px.scatter(df, x = 'Step Count',
                  y = 'Walking Speed',
                  color = 'Time Intervals',
                  title = 'Step Count and Walking Speed Variations by Time Interval',
                  trendline='ols')

fig7.show()
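
Because the scatter plot is colored by `Time Intervals`, Plotly Express fits a separate OLS line for each interval. To illustrate what such a per-group straight-line fit computes, here is a rough equivalent using `numpy.polyfit` on hypothetical data. This is a sketch of the idea, not the code Plotly runs internally.

```python
import numpy as np
import pandas as pd

# Hypothetical readings for two time intervals
sample = pd.DataFrame({
    "Step Count": [10, 20, 30, 10, 20, 30],
    "Walking Speed": [3.0, 3.5, 4.0, 5.0, 4.5, 4.0],
    "Time Intervals": ["Morning"] * 3 + ["Afternoon"] * 3,
})

# Fit a degree-1 polynomial (least-squares line) within each interval
fits = {}
for interval, group in sample.groupby("Time Intervals"):
    slope, intercept = np.polyfit(group["Step Count"], group["Walking Speed"], deg=1)
    fits[interval] = (slope, intercept)
    print(f"{interval}: speed = {slope:.3f} * steps + {intercept:.3f}")
```

A positive slope means walking speed tends to rise with step count in that interval; a negative slope means the opposite.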

#### Daily Average of all the Health and Fitness Metrics

In [21]:
df['Date'] = pd.to_datetime(df['Date'], errors='coerce')

In [22]:
# Select only numeric columns (for inspection only; the aggregation
# further down lists its columns explicitly and does not use this variable)
numeric_data = df.select_dtypes(include='number')

In [23]:
# Print column names to check for any inconsistencies
print(df.columns)

Index(['Date', 'Time', 'Step Count', 'Distance', 'Energy Burned',
       'Flights Climbed', 'Walking Double Support Percentage', 'Walking Speed',
       'Walking Efficiency', 'Time Intervals'],
      dtype='object')


In [24]:
print(df.dtypes)

Date                                 datetime64[ns]
Time                                         object
Step Count                                    int64
Distance                                    float64
Energy Burned                               float64
Flights Climbed                               int64
Walking Double Support Percentage           float64
Walking Speed                               float64
Walking Efficiency                          float64
Time Intervals                             category
dtype: object


In [25]:
# Check for any missing values in the relevant columns
print(df.isnull().sum())

# If necessary, fill missing values or drop rows with NaN
data = df.dropna(subset=["Step Count", "Distance", "Energy Burned",
                         "Flights Climbed", "Walking Double Support Percentage",
                         "Walking Speed"])


Date                                 0
Time                                 0
Step Count                           0
Distance                             0
Energy Burned                        0
Flights Climbed                      0
Walking Double Support Percentage    0
Walking Speed                        0
Walking Efficiency                   0
Time Intervals                       0
dtype: int64


In [26]:
# Group by Date and calculate the mean for the numeric columns
daily_avg_metrics = data.groupby("Date").agg({
    "Step Count": "mean",
    "Distance": "mean",
    "Energy Burned": "mean",
    "Flights Climbed": "mean",
    "Walking Double Support Percentage": "mean",
    "Walking Speed": "mean"
}).reset_index()

# Check the aggregated data
print(daily_avg_metrics.head())


        Date  Step Count  Distance  Energy Burned  Flights Climbed  \
0 2023-03-21  137.636364  0.086225      14.721273         2.909091   
1 2023-03-22  354.233333  0.230261      15.158233         2.466667   
2 2023-03-23  109.125000  0.075796      14.303000         2.375000   
3 2023-03-24   64.666667  0.042067      15.268667         2.666667   
4 2023-03-25  117.000000  0.080747      15.060222         2.555556   

   Walking Double Support Percentage  Walking Speed  
0                           0.294273       4.352727  
1                           0.310467       3.502800  
2                           0.312375       3.762000  
3                           0.307333       3.936000  
4                           0.297778       3.520000  


In [27]:
daily_avg_metrics_melted = daily_avg_metrics.melt(id_vars=["Date"], 
                                                  value_vars=["Step Count", "Distance", 
                                                              "Energy Burned", "Flights Climbed", 
                                                              "Walking Double Support Percentage", 
                                                              "Walking Speed"])

In [28]:
# Treemap of daily averages per metric; each tile is sized by the
# sum of that metric's daily averages across all dates
fig8 = px.treemap(daily_avg_metrics_melted,
                 path=["variable"],
                 values="value",
                 color="variable",
                 hover_data=["value"],
                 title="Daily Averages for Different Metrics")
fig8.show()
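
The reshaping and sizing behind this treemap can be checked without plotting. `melt` turns the wide table into one row per (date, metric) pair, and with `path=["variable"]` each tile aggregates all rows of that metric by summing `value`. The sketch below reproduces that on a hypothetical two-day, two-metric table with rounded values.

```python
import pandas as pd

# Hypothetical, rounded daily averages for two dates and two metrics
daily = pd.DataFrame({
    "Date": pd.to_datetime(["2023-03-21", "2023-03-22"]),
    "Step Count": [137.6, 354.2],
    "Walking Speed": [4.35, 3.50],
})

# Wide to long: one row per (Date, metric) pair
melted = daily.melt(id_vars=["Date"], value_vars=["Step Count", "Walking Speed"])

# Sum per metric, which corresponds to the tile sizes in the treemap
tile_sizes = melted.groupby("variable")["value"].sum()
print(melted)
print(tile_sizes)
```

Since the metrics are on very different scales, the metric with the largest raw values dominates the chart area, which is one reason to plot a version without `Step Count`.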

In [29]:
## Select metrics excluding Step Count
metrics_to_visualize = ["Distance", "Energy Burned", "Flights Climbed", 
                        "Walking Double Support Percentage", "Walking Speed"]

In [30]:
## Reshape data for treemap
daily_avg_metrics_melted = daily_avg_metrics.melt(id_vars=["Date"], value_vars=metrics_to_visualize)

In [31]:
fig9 = px.treemap(daily_avg_metrics_melted,
                 path=["variable"],
                 values="value",
                 color="variable",
                 hover_data=["value"],
                 title="Daily Averages for Different Metrics (Excluding Step Count)")
fig9.show()