### Preliminary Fitness Clustering Analysis

In our initial fitness clustering analysis, we used the features 'TotalSteps', 'VeryActiveDistance', and 'Calories' to evaluate Fitbit users' general fitness and activity levels. These features capture physical activity and calorie expenditure, but the 'Calories' feature may introduce bias into the clustering results.

Calorie expenditure is influenced by factors such as metabolic rate, body size, and gender. As a result, users with similar activity patterns but different physical characteristics may be assigned to separate clusters. To mitigate this bias, we plan to incorporate user-specific information such as age, weight, and gender. These variables will allow us to normalize calorie data by individual characteristics and give a fairer assessment of fitness levels.
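
One simple version of the normalization described above expresses calories per kilogram of body weight. The sketch below assumes a per-user weight table with hypothetical columns `Id` and `WeightKg`, and the values are made up for illustration. It averages each user's weight logs and joins them onto the daily records. The helper name `calories_per_kg` is also hypothetical, and a fuller approach might model calories against age, weight, and gender together rather than dividing by weight alone.

```python
import pandas as pd


def calories_per_kg(daily: pd.DataFrame, weights: pd.DataFrame) -> pd.DataFrame:
    """Attach each user's mean weight and a weight-normalized calorie column.

    Assumes `daily` has 'Id' and 'Calories' columns and `weights` has
    hypothetical 'Id' and 'WeightKg' columns (one or more rows per user).
    """
    # Average repeated weight logs so each user has a single weight value
    mean_weight = weights.groupby('Id', as_index=False)['WeightKg'].mean()
    # Left join keeps every daily record; users without a weight log get NaN
    merged = daily.merge(mean_weight, on='Id', how='left')
    merged['CaloriesPerKg'] = merged['Calories'] / merged['WeightKg']
    return merged


# Illustrative stand-in data (not real Fitbit values)
daily = pd.DataFrame({'Id': [1, 1, 2], 'Calories': [2400, 2100, 1800]})
weights = pd.DataFrame({'Id': [1, 1, 2], 'WeightKg': [80.0, 80.0, 60.0]})
result = calories_per_kg(daily, weights)
print(result)
```

Users with no weight log end up with `NaN` in `CaloriesPerKg`. They would need to be excluded or imputed before the column could replace 'Calories' in the clustering.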

Our analysis is ongoing, and we will continue to refine the methodology. For now, we use 'Calories' as a starting point, recognizing its limitations but valuing it as a measure of overall physical activity.

```python
import pandas as pd
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import silhouette_score
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt
import plotly.express as px
```

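```python
# Load the daily activity data and inspect its structure
df_daily_activity_merged = pd.read_csv('data/dailyActivity_merged.csv')
# print() is needed: a bare head() is only displayed when it is the cell's last expression
print(df_daily_activity_merged.head())
df_daily_activity_merged.info()
```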

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 940 entries, 0 to 939
Data columns (total 15 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   Id                        940 non-null    int64  
 1   ActivityDate              940 non-null    object 
 2   TotalSteps                940 non-null    int64  
 3   TotalDistance             940 non-null    float64
 4   TrackerDistance           940 non-null    float64
 5   LoggedActivitiesDistance  940 non-null    float64
 6   VeryActiveDistance        940 non-null    float64
 7   ModeratelyActiveDistance  940 non-null    float64
 8   LightActiveDistance       940 non-null    float64
 9   SedentaryActiveDistance   940 non-null    float64
 10  VeryActiveMinutes         940 non-null    int64  
 11  FairlyActiveMinutes       940 non-null    int64  
 12  LightlyActiveMinutes      940 non-null    int64  
 13  SedentaryMinutes          940 non-null    int64  
 14  Calories  
```

```python
# Select the subset of features used for clustering
subset = df_daily_activity_merged[['TotalSteps', 'VeryActiveDistance', 'Calories']]

# Standardize the features to zero mean and unit variance
scaler = StandardScaler()
scaled_data = scaler.fit_transform(subset)
```
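
Standardization matters here because K-means relies on Euclidean distance. `TotalSteps` typically runs in the thousands, while `VeryActiveDistance` is usually a few units, so without rescaling the step count would dominate the distances. `StandardScaler` computes z = (x - mean) / std per column, using the population standard deviation (ddof=0). The check below confirms this on a small synthetic frame. The name `demo_subset` and its values are illustrative stand-ins, not the real data.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Small synthetic stand-in for the three clustering features
demo_subset = pd.DataFrame({
    'TotalSteps': [2000, 8000, 14000],
    'VeryActiveDistance': [0.0, 1.5, 4.5],
    'Calories': [1600, 2200, 3100],
})

scaled = StandardScaler().fit_transform(demo_subset)

# Manual z-score with the population std (ddof=0), which is what StandardScaler uses
manual = (demo_subset - demo_subset.mean()) / demo_subset.std(ddof=0)

print(np.allclose(scaled, manual.to_numpy()))  # True
```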

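```python
# Apply the K-means clustering algorithm to the standardized features
num_clusters = 3
# n_init is set explicitly because its default changed across scikit-learn versions
kmeans = KMeans(n_clusters=num_clusters, n_init=10, random_state=42)
df_daily_activity_merged['ActivityLevel'] = kmeans.fit_predict(scaled_data)
```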
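
The code sets `num_clusters = 3` by hand. The notebook already imports `silhouette_score`, which offers one way to check that choice. You fit K-means for several values of k and compare mean silhouette coefficients, where higher values suggest better-separated clusters. The sketch below uses synthetic blobs from `make_blobs` as a stand-in for `scaled_data`. On the real data you would pass `scaled_data` instead, and treat the score as one heuristic alongside domain judgment rather than a definitive answer.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in for scaled_data: 300 points in 3 dimensions
X, _ = make_blobs(n_samples=300, n_features=3, centers=3, cluster_std=0.8, random_state=0)

scores = {}
for k in range(2, 7):
    # Explicit n_init avoids the version-dependent default
    model = KMeans(n_clusters=k, n_init=10, random_state=42)
    cluster_ids = model.fit_predict(X)
    # Mean silhouette coefficient over all samples, in [-1, 1]
    scores[k] = silhouette_score(X, cluster_ids)

for k, s in scores.items():
    print(f"k={k}: silhouette={s:.3f}")

best_k = max(scores, key=scores.get)
print("best k:", best_k)
```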

```python
# Visualize the clustering results in 3-D
fig = px.scatter_3d(
    # Cast cluster ids to str so Plotly uses discrete colors instead of a continuous scale
    df_daily_activity_merged.astype({'ActivityLevel': str}),
    x='TotalSteps',
    y='VeryActiveDistance',
    z='Calories',
    color='ActivityLevel',
    title='Clustering Visualization',
    template='plotly_dark'
)

fig.update_traces(marker=dict(size=3, opacity=0.6, line=dict(width=2, color='DarkSlateGrey')))
fig.update_layout(
    scene=dict(xaxis_title='TotalSteps', yaxis_title='VeryActiveDistance', zaxis_title='Calories'),
    scene_aspectmode='cube'
)
fig.show()
```
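
A 3-D scatter can be hard to read from a single angle. `PCA` and `matplotlib` are imported at the top of the notebook but not yet used. One option is to project the standardized features onto their first two principal components and color each point by cluster. The sketch below runs on synthetic blobs standing in for `scaled_data` and the fitted labels, so the explained-variance figures it prints describe the stand-in data only.

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA

# Synthetic stand-in for scaled_data (3 features) and its cluster ids
X, _ = make_blobs(n_samples=200, n_features=3, centers=3, random_state=1)
cluster_ids = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)

# Project onto the two directions of greatest variance
pca = PCA(n_components=2)
coords = pca.fit_transform(X)

fig, ax = plt.subplots(figsize=(6, 5))
scatter = ax.scatter(coords[:, 0], coords[:, 1], c=cluster_ids, cmap='viridis', s=10, alpha=0.7)
ax.set_xlabel(f"PC1 ({pca.explained_variance_ratio_[0]:.0%} of variance)")
ax.set_ylabel(f"PC2 ({pca.explained_variance_ratio_[1]:.0%} of variance)")
ax.set_title('Clusters in PCA space')
# legend_elements() builds one legend entry per distinct cluster id
ax.legend(*scatter.legend_elements(), title='ActivityLevel')
plt.show()
```

With only three input features, two components usually retain most of the variance. The axis labels report the share, so you can judge how faithful the 2-D view is.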

```python
# Analyze cluster activity levels
cluster_activity_means = df_daily_activity_merged.groupby('ActivityLevel').mean(numeric_only=True)
print(cluster_activity_means[['TotalSteps', 'TotalDistance', 'VeryActiveMinutes']])
```
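
Group means describe the clusters after the fact. The fitted centroids can also be read directly in the original units by undoing the standardization, which in the notebook is `scaler.inverse_transform(kmeans.cluster_centers_)`. `value_counts` then shows how many daily records fall into each cluster. The sketch below reproduces these steps on synthetic stand-in rows. The `demo_` names are illustrative and keep the notebook's own `scaler` and `kmeans` untouched.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

features = ['TotalSteps', 'VeryActiveDistance', 'Calories']

# Synthetic stand-in rows (not real Fitbit values): three loose activity groups of 50
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    'TotalSteps': np.concatenate([rng.normal(m, 800, 50) for m in (3000, 8000, 14000)]),
    'VeryActiveDistance': np.concatenate([rng.normal(m, 0.3, 50) for m in (0.2, 1.0, 4.0)]),
    'Calories': np.concatenate([rng.normal(m, 150, 50) for m in (1700, 2300, 3000)]),
})

demo_scaler = StandardScaler()
demo_scaled = demo_scaler.fit_transform(demo[features])
demo_kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
demo['ActivityLevel'] = demo_kmeans.fit_predict(demo_scaled)

# Centroids mapped back from z-scores to the original feature units
centroids = pd.DataFrame(demo_scaler.inverse_transform(demo_kmeans.cluster_centers_), columns=features)
print(centroids.round(1))

# Number of daily records assigned to each cluster
print(demo['ActivityLevel'].value_counts().sort_index())
```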

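```python
# Map cluster labels to activity levels.
# K-means cluster ids are arbitrary, so check this mapping against the cluster
# means printed above whenever the data or clustering parameters change.
labels = {0: 'Moderately Active', 1: 'Not Very Active', 2: 'Highly Active'}
df_daily_activity_merged['ActivityDescription'] = df_daily_activity_merged['ActivityLevel'].map(labels)
df_daily_activity_merged.head()
```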
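
K-means cluster numbers carry no inherent order because they depend on initialization. The hand-written mapping above therefore holds only for this particular fit, which is pinned by `random_state=42`. A more robust option is to rank clusters by a summary statistic and assign names in that order. The sketch below ranks by mean `TotalSteps`. Both the ranking column and the helper name `label_by_rank` are assumptions for illustration, not part of the original analysis.

```python
import pandas as pd


def label_by_rank(df: pd.DataFrame, cluster_col: str, rank_col: str, names: list) -> pd.Series:
    """Map cluster ids to names ordered by each cluster's mean of rank_col.

    `names` must be ordered from the lowest to the highest mean.
    """
    # Cluster ids sorted by ascending mean of the ranking column
    order = df.groupby(cluster_col)[rank_col].mean().sort_values().index
    mapping = dict(zip(order, names))
    return df[cluster_col].map(mapping)


# Illustrative data: cluster ids deliberately not in activity order
df = pd.DataFrame({
    'ActivityLevel': [0, 0, 1, 1, 2, 2],
    'TotalSteps': [7000, 9000, 2000, 3000, 15000, 13000],
})
df['ActivityDescription'] = label_by_rank(
    df, 'ActivityLevel', 'TotalSteps',
    ['Not Very Active', 'Moderately Active', 'Highly Active'],
)
print(df)
```

Here cluster 1 has the lowest mean step count and cluster 2 the highest, so the names follow the data rather than the arbitrary ids. Re-running K-means with a different seed would then leave the descriptions consistent.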