### 📊 **Fitness Data Analysis**
As a Data Analyst for a **Fitness technology** company, I conducted an exploratory analysis of **smartwatch** data to uncover patterns in *user activity, mood, calorie expenditure, and sleep.*

**Dataset Overview**

* **Date -** Calendar date of the record (DD-MM-YYYY)

* **Step Count -** Total daily steps recorded

* **Mood -** Self-reported emotional state (Happy, Sad, Neutral)

* **Calories Burned -** Daily energy expenditure

* **Hours of Sleep -** Total nightly sleep duration

* **Active Status -** Whether the user was active or inactive on a given day

#### **Objective**
The goal is to identify lifestyle patterns and potential recommendations to improve user wellness through data-driven insights, which could inform smartwatch features like personalized activity goals, sleep reminders, and mood-based alerts.

In [1]:
# download the data file (fit.txt) into the Colab runtime
!gdown 1q93B2JaTMIs25wkAv9LkjdRTK_4ZZlCy

Downloading...
From: https://drive.google.com/uc?id=1q93B2JaTMIs25wkAv9LkjdRTK_4ZZlCy
To: /content/fit.txt
100% 3.43k/3.43k [00:00<00:00, 19.3MB/s]


In [2]:
# import the necessary libraries
import numpy as np

In [3]:
# load data
data = np.loadtxt('fit.txt', dtype='str')



In [None]:
# number of dimensions of the array

data.ndim

2

In [None]:
data.shape

(96, 6)

#### **Let's Analyse the 2-Dimensional Array (Matrix)**

In [None]:
# first five records, to understand the structure of the data
data[:5]

array([['06-10-2017', '5464', 'Neutral', '181', '5', 'Inactive'],
       ['07-10-2017', '6041', 'Sad', '197', '8', 'Inactive'],
       ['08-10-2017', '25', 'Sad', '0', '5', 'Inactive'],
       ['09-10-2017', '5461', 'Sad', '174', '4', 'Inactive'],
       ['10-10-2017', '6915', 'Neutral', '223', '5', 'Active']],
      dtype='<U10')

All the features are stored as strings because a NumPy array holds homogeneous data (a single dtype for every element). So we split the array into one variable per column and convert the numeric columns to integers.

In [4]:
# segregate the columns into separate variables, converting the numeric ones to int

date = data[:, 0]

step_count = data[:, 1].astype('int')

mood = data[:, 2]

calories = data[:, 3].astype('int')

sleep = data[:, 4].astype('int')

activity = data[:, 5]
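As an alternative to loading everything as strings and casting column by column, `np.loadtxt` can parse each column with its own type if given a structured dtype. This is a minimal sketch, not the approach used in this notebook: the sample rows are copied from `data[:5]` above, `io.StringIO` stands in for `fit.txt`, and the field names (`date`, `steps`, `mood`, `calories`, `sleep`, `activity`) are illustrative labels chosen here.

```python
import io
import numpy as np

# Sample rows copied from data[:5]; io.StringIO stands in for fit.txt
# so that the sketch runs on its own.
sample = io.StringIO(
    "06-10-2017 5464 Neutral 181 5 Inactive\n"
    "07-10-2017 6041 Sad 197 8 Inactive\n"
    "08-10-2017 25 Sad 0 5 Inactive\n"
    "09-10-2017 5461 Sad 174 4 Inactive\n"
    "10-10-2017 6915 Neutral 223 5 Active\n"
)

# One (name, type) pair per column; the field names are illustrative labels.
fit_dtype = [('date', 'U10'), ('steps', 'i8'), ('mood', 'U10'),
             ('calories', 'i8'), ('sleep', 'i8'), ('activity', 'U10')]

records = np.loadtxt(sample, dtype=fit_dtype)

# Each field is already typed: numeric columns are integers, text stays text.
print(records['steps'])                                  # [5464 6041   25 5461 6915]
print(records['mood'][records['activity'] == 'Active'])  # ['Neutral']
```

The trade-off is that fields are accessed by name (`records['steps']`) instead of by column position, which avoids repeated `.astype('int')` calls later on.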

#### Ques : What is the average step count on "Active" days?

In [None]:
np.mean(step_count[activity == 'Active'])

np.float64(3226.5714285714284)

In [None]:
np.mean(step_count[data[:, 5] == 'Active'])

np.float64(3226.5714285714284)

#### Ques : How many days had more than 5000 steps and burned more than 150 calories?

In [None]:
len((step_count > 5000) & (calories > 150)) # wrong: len() counts every element of the boolean mask, True or False, so this returns the total number of days

96

In [None]:
np.sum((step_count > 5000) & (calories > 150))

np.int64(17)

On 17 days the step count exceeded 5,000 and more than 150 calories were burned.
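The difference between `len` and `np.sum` on a boolean mask can be seen on a small example. The arrays below are toy values for illustration, not the notebook's data. Note that in Python `&` binds more tightly than comparison operators such as `>`, so each condition needs its own parentheses.

```python
import numpy as np

# Toy values for illustration only.
step_count = np.array([6200, 4800, 7100, 5300, 3000])
calories = np.array([180, 140, 210, 120, 90])

mask = (step_count > 5000) & (calories > 150)  # element-wise AND -> boolean array
print(mask)                    # [ True False  True False False]
print(len(mask))               # 5: counts every element, True or False
print(np.sum(mask))            # 2: True counts as 1, False as 0
print(np.count_nonzero(mask))  # 2: the same count, stated explicitly
```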

#### Ques : What percentage of days had a "Sad" mood?

In [None]:
len(date[mood == 'Sad']) / len(date) * 100

30.208333333333332

In [None]:
len(data[data[:, 2] == 'Sad']) / len(data) * 100

30.208333333333332
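Because `True` counts as 1 and `False` as 0, the mean of a boolean mask is the fraction of `True` values, which gives a more compact form of the same percentage. A minimal sketch on toy moods (illustrative values, not the notebook's data):

```python
import numpy as np

# Toy moods for illustration only.
mood = np.array(['Happy', 'Sad', 'Neutral', 'Sad', 'Happy', 'Happy', 'Sad', 'Neutral'])

# Mean of a boolean mask = fraction of True values (3 of 8 here).
sad_pct = np.mean(mood == 'Sad') * 100
print(sad_pct)  # 37.5
```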

#### Ques : What is the maximum number of steps taken on days with less than 6 hours of sleep?

In [None]:
np.max(step_count[sleep < 6])

np.int64(7422)

In [None]:
np.max(data[data[:, 4].astype('int') < 6][:, 1].astype('int'))

np.int64(7422)

#### Ques : What are the average calories burned per hour of sleep on "Inactive" days?

In [None]:
np.mean(data[:, -3].astype('int')[data[:, -1] == 'Inactive']/data[:, -2].astype('int')[data[:, -1] == 'Inactive'])

np.float64(18.62282480893592)

In [None]:
inactive = activity == 'Inactive'

np.mean(calories[inactive]/sleep[inactive])

np.float64(18.62282480893592)

#### Ques : Which mood had the highest average step count?

In [None]:
np.unique(mood)

array(['Happy', 'Neutral', 'Sad'], dtype='<U10')

In [None]:
print(np.mean(step_count[mood == 'Happy']))
print(np.mean(step_count[mood == 'Neutral']))
print(np.mean(step_count[mood == 'Sad']))

3392.725
3153.777777777778
2103.0689655172414


In [None]:
# list of mean step counts, one per mood (in np.unique order)

[np.mean(step_count[mood == m]) for m in np.unique(mood)]

[np.float64(3392.725),
 np.float64(3153.777777777778),
 np.float64(2103.0689655172414)]

In [None]:
# index of the highest mean step count

np.argmax([np.mean(step_count[mood == m]) for m in np.unique(mood)])

np.int64(0)

Index 0 of np.unique(mood), i.e. 'Happy', has the highest mean step count.

In [None]:
# use that index to look up the mood label
np.unique(mood)[np.argmax([np.mean(step_count[mood == m]) for m in np.unique(mood)])]

np.str_('Happy')

In [None]:
np.unique(data[:, 2])[np.argmax([np.mean(data[:, 1].astype('int')[data[:, 2] == m ]) for m in np.unique(data[:, 2])])]

np.str_('Happy')

On 'Happy' days the person takes the most steps on average (about 3,393, versus about 3,154 on 'Neutral' days and 2,103 on 'Sad' days).
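The list comprehension above scans the whole array once per mood. A vectorized alternative, sketched here as an option rather than the notebook's method, uses `np.unique(..., return_inverse=True)` to map each row to a group index and `np.bincount` to sum steps per group. The arrays are toy values for illustration.

```python
import numpy as np

# Toy data for illustration only.
mood = np.array(['Sad', 'Happy', 'Neutral', 'Happy', 'Sad', 'Neutral'])
step_count = np.array([1000, 4000, 3000, 5000, 2000, 2500])

# labels: sorted unique moods; inverse: position of each row's mood within labels
labels, inverse = np.unique(mood, return_inverse=True)

# Sum of steps per group divided by the number of rows per group
group_sums = np.bincount(inverse, weights=step_count)
group_counts = np.bincount(inverse)
group_means = group_sums / group_counts

print(dict(zip(labels.tolist(), group_means.tolist())))
# {'Happy': 4500.0, 'Neutral': 2750.0, 'Sad': 1500.0}
print(labels[np.argmax(group_means)])  # Happy
```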

#### Ques : Calculate the correlation between step count and calories burned.

In [None]:
np.corrcoef(step_count, calories)

array([[1.       , 0.9892597],
       [0.9892597, 1.       ]])

In [None]:
np.corrcoef(data[:, 1].astype('int'), data[:, -3].astype('int'))

array([[1.       , 0.9892597],
       [0.9892597, 1.       ]])

* **0.989 (calculated):** A very strong positive linear relationship; calories burned rise closely in line with step count.
* **1.0:** A correlation of 1.0 would indicate a perfect positive linear relationship, which is highly unlikely for real-world data like steps and calories.
* **0.66 / 0.75:** These values would indicate a considerably weaker positive correlation and do not match the calculated value.

In [None]:
np.corrcoef(data[:, 1].astype('int'), data[:, -3].astype('int'))[0, 1]

np.float64(0.9892596952985985)
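The value returned by `np.corrcoef` is Pearson's r, r = Σ(x − x̄)(y − ȳ) / sqrt(Σ(x − x̄)² · Σ(y − ȳ)²). A minimal sketch that computes it by hand and checks it against `np.corrcoef`, using toy values for illustration:

```python
import numpy as np

# Toy data for illustration only.
steps = np.array([1000, 3000, 5000, 7000, 9000], dtype=float)
cals = np.array([40, 110, 175, 250, 300], dtype=float)

# Centre both variables, then divide the sum of cross-products by the
# product of the root sums of squares (the 1/n factors cancel out).
dx = steps - steps.mean()
dy = cals - cals.mean()
r_manual = np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2))

r_numpy = np.corrcoef(steps, cals)[0, 1]  # off-diagonal entry of the 2x2 matrix
print(np.isclose(r_manual, r_numpy))      # True
```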

#### Ques : On how many days was the mood "Sad" and sleep less than the median sleep?

In [None]:
np.sum((sleep < np.median(sleep)) & (mood== 'Sad'))

np.int64(10)

In [None]:
np.sum((data[:, -2].astype('int') < np.median(data[:,-2].astype('int'))) & (data[:, 2]== 'Sad'))

np.int64(10)

There are 10 days on which the mood was 'Sad' and sleep was below the median sleep duration, i.e. shorter than the person's typical night's sleep.

#### Ques : What is the standard deviation of step count only for "Happy" mood days?

In [None]:
np.std(step_count[mood == 'Happy'])

np.float64(2088.4016254961593)

In [None]:
np.std(data[:, 1].astype('int')[data[:, 2] == 'Happy'])

np.float64(2088.4016254961593)
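`np.std` defaults to `ddof=0`, the population standard deviation (dividing by n). If the 'Happy' days are treated as a sample, `ddof=1` (dividing by n − 1) is the usual choice and gives a slightly larger value. A minimal sketch on toy values for illustration:

```python
import numpy as np

# Toy step counts for illustration only (mean 5000, squared deviations sum to 20,000,000).
happy_steps = np.array([2000, 4000, 6000, 8000])

pop_std = np.std(happy_steps)             # ddof=0 (default): sqrt(20e6 / 4)
sample_std = np.std(happy_steps, ddof=1)  # ddof=1: sqrt(20e6 / 3)
print(pop_std, sample_std)
```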

#### Ques : What is the most frequent number of hours slept?

In [None]:
np.unique(sleep)

array([2, 3, 4, 5, 6, 7, 8, 9])

In [None]:
np.argmax([np.sum(sleep == s) for s in np.unique(sleep)])

np.int64(3)

In [None]:
np.unique(sleep)[np.argmax([np.sum(sleep == s) for s in np.unique(sleep)])]

np.int64(5)

The most frequent sleep duration is 5 hours.
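The mode can also be found without the list comprehension: `np.unique(..., return_counts=True)` returns each value with its frequency, and for small non-negative integers such as hours of sleep `np.bincount` works too. A minimal sketch on toy values for illustration:

```python
import numpy as np

# Toy sleep hours for illustration only.
sleep = np.array([5, 7, 5, 6, 8, 5, 7, 4])

values, counts = np.unique(sleep, return_counts=True)
# np.unique sorts values and argmax returns the first maximum,
# so a tie resolves to the smallest value.
mode = values[np.argmax(counts)]
print(mode)                       # 5

print(np.bincount(sleep).argmax())  # 5: index = hours, value = count
```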

#### Ques : Which mood has the lowest average calories burned on days with more than the median hours of sleep and a step count above 4000?

In [6]:
np.unique(mood)

array(['Happy', 'Neutral', 'Sad'], dtype='<U10')

In [7]:
# calories for each mood, kept only on days with more sleep than that mood's median sleep and more than 4000 steps
[calories[mood == m][(sleep[mood == m] > np.median(sleep[mood == m])) & (step_count[mood == m] > 4000)] for m in np.unique(mood)]

[array([150, 150, 154, 193, 139, 192, 234, 131, 194]),
 array([129]),
 array([197, 149, 140, 220])]

In [8]:
# average calories for each mood under the same condition
[np.mean(calories[mood == m][(sleep[mood == m] > np.median(sleep[mood == m])) & (step_count[mood == m] > 4000)]) for m in np.unique(mood)]

[np.float64(170.77777777777777), np.float64(129.0), np.float64(176.5)]

In [9]:
# index of the mood with the lowest average calories
np.argmin([np.mean(calories[mood == m][(sleep[mood == m] > np.median(sleep[mood == m])) & (step_count[mood == m] > 4000)]) for m in np.unique(mood)])

np.int64(1)

In [10]:
# use that index on np.unique(mood) to find which mood has the lowest average calories under this condition
np.unique(mood)[np.argmin([np.mean(calories[mood == m][(sleep[mood == m] > np.median(sleep[mood == m])) & (step_count[mood == m] > 4000)]) for m in np.unique(mood)])]

np.str_('Neutral')

In [12]:
# same computation by indexing data directly: it works, but it is much harder to read
np.unique(data[:, 2])[np.argmin([np.mean(data[:, 3].astype('int')[data[:, 2]== m][(data[:, 4].astype('int')[data[:, 2]== m] > np.median(data[:, 4].astype('int')[data[:, 2]==m])) & (data[:, 1].astype('int')[data[:, 2]==m]> 4000)]) for m in np.unique(data[:, 2])])]

np.str_('Neutral')
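One way to make this condition easier to read is to compute each mood mask once inside a small function. The helper `mean_calories_for` and its `min_steps` parameter are hypothetical names introduced for this sketch, which keeps the notebook's per-mood median interpretation and runs on toy arrays rather than the real data.

```python
import numpy as np

# Toy arrays standing in for the notebook's columns (illustrative values only).
mood = np.array(['Happy', 'Sad', 'Happy', 'Neutral', 'Sad', 'Happy', 'Neutral'])
calories = np.array([150, 197, 210, 129, 140, 90, 160])
sleep = np.array([8, 7, 6, 9, 8, 5, 6])
step_count = np.array([4500, 5200, 6100, 4100, 4300, 3900, 4200])

def mean_calories_for(m, min_steps=4000):
    """Hypothetical helper: mean calories for mood m on days with more sleep than
    that mood's median sleep and more than min_steps steps."""
    in_mood = mood == m                  # build the mood mask once
    sleep_m = sleep[in_mood]
    keep = (sleep_m > np.median(sleep_m)) & (step_count[in_mood] > min_steps)
    return np.mean(calories[in_mood][keep])

labels = np.unique(mood)
means = [mean_calories_for(m) for m in labels]
print(labels[np.argmin(means)])  # Neutral
```

If no day in a mood group meets the condition, `np.mean` of the empty selection returns `nan` with a RuntimeWarning, so a real analysis may want to check for empty groups first.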

#### Ques : What is the longest streak (in days) where 'Inactive' days had calories burned less than the daily mean across the full dataset?

In [15]:
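mean_calories = np.mean(calories)  # daily mean over the full dataset, computed once
count = 0        # length of the current qualifying streak
long_streak = 0  # longest streak found so far

for i in range(len(activity)):
  if activity[i] == 'Inactive' and calories[i] < mean_calories:
    count += 1
    long_streak = max(count, long_streak)
  else:
    count = 0  # streak broken: reset

print(long_streak)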

8
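The same streak can be found without an explicit loop by locating where runs of `True` start and end in the boolean mask. This is a minimal sketch of that run-length technique; the function name `longest_true_run` is hypothetical and the arrays are toy values for illustration.

```python
import numpy as np

def longest_true_run(mask):
    """Length of the longest run of consecutive True values in a 1-D boolean array."""
    # Pad with 0 on both sides so every run has a start (+1) and an end (-1) edge.
    padded = np.concatenate(([0], mask.astype(int), [0]))
    edges = np.diff(padded)
    starts = np.flatnonzero(edges == 1)
    ends = np.flatnonzero(edges == -1)
    return int((ends - starts).max()) if starts.size else 0

# Toy data for illustration only (mean calories is about 152).
activity = np.array(['Inactive', 'Inactive', 'Active', 'Inactive',
                     'Inactive', 'Inactive', 'Active'])
calories = np.array([100, 120, 300, 90, 110, 95, 250])

qualifies = (activity == 'Inactive') & (calories < np.mean(calories))
print(longest_true_run(qualifies))  # 3
```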


#### Ques : On days when the person was 'Sad' and burned fewer than average calories, what is the median step count?

In [16]:
np.median(step_count[(mood == 'Sad') & (calories < np.mean(calories))])

np.float64(651.0)

#### Ques : How many days had below-average step count, above-average sleep, and a mood other than 'Sad'?

In [17]:
np.sum((step_count < np.mean(step_count)) & (sleep > np.mean(sleep)) & (mood != 'Sad'))

np.int64(13)

* On 13 days the person slept more than average but walked less than average, and the mood was not 'Sad'.
* On days like these, the smartwatch could encourage more daily activity to match the good sleep pattern, supporting better overall health.

#### Ques : What is the correlation between hours of sleep and calories burned, only for days marked as 'Active'?

In [18]:
np.corrcoef(sleep[activity == 'Active'], calories[activity == 'Active'])

array([[1.        , 0.03816229],
       [0.03816229, 1.        ]])

A correlation of about 0.04 means there is essentially no linear relationship between hours of sleep and calories burned on "Active" days. This is neither good nor bad by itself; it simply indicates that, within this group, sleep duration is not linearly associated with calorie burn.

#### Ques : Which date corresponds to the day with the highest ratio of calories burned to hours of sleep?

In [22]:
ratio = calories/sleep

In [24]:
np.argmax(ratio) # returns the index value

np.int64(32)

In [25]:
data[np.argmax(ratio), 0]

np.str_('07-11-2017')

#### Ques : Find the 3-day period (consecutive days) with the maximum total calories burned. What is the starting date of that period?

In [26]:
lst_of_total_calories = []

for i in range(len(calories)- 2):
  total_calories_for_3_days = calories[i] + calories[i+1] + calories[i+2]
  lst_of_total_calories.append(total_calories_for_3_days)

data[np.argmax(lst_of_total_calories), 0]

np.str_('12-12-2017')

In [28]:
# Alternative approach:

# Compute rolling 3-day calorie sums
window_sum = np.array([np.sum(calories[i:i+3]) for i in range(len(calories)-2)])

data[np.argmax(window_sum), 0]

np.str_('12-12-2017')

The highest-calorie 3-day stretch started on 12 Dec 2017, indicating peak activity in that period.
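The rolling 3-day sums can also be computed without a Python loop. Two common NumPy options are sketched below on toy values for illustration: convolving with a window of ones, and differencing a cumulative sum. With the notebook's arrays, `data[start, 0]` would then give the starting date.

```python
import numpy as np

# Toy daily calories for illustration only.
calories = np.array([100, 200, 150, 300, 250, 50, 120])

# 'valid' mode keeps only windows that fit entirely: len(calories) - 2 sums.
window_sum = np.convolve(calories, np.ones(3, dtype=int), mode='valid')
print(window_sum)               # [450 650 700 600 420]

# Same sums from a cumulative sum with a leading 0: csum[i+3] - csum[i].
csum = np.concatenate(([0], np.cumsum(calories)))
window_sum_cs = csum[3:] - csum[:-3]

start = np.argmax(window_sum)   # index of the first day of the best 3-day window
print(start)                    # 2
```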