In [None]:
# !pip install pandas_profiling
!pip install calplot
# import the libraries 
import numpy as np
import pandas as pd
import os
from pandas_profiling import ProfileReport

### 1 Data introduction
This data set was downloaded from the Kaggle repository at the following URL: https://www.kaggle.com/datasets/arashnic/fitbit/

The data comes from the personal trackers of 30 eligible Fitbit users, who consented to its collection. Variation between outputs reflects the use of different types of Fitbit trackers and individual tracking behaviors and preferences.


In total, there are 18 CSV files with a combined size of 338 MB, including:

* dailyActivity_merged: A comprehensive file containing information about users' activity on different days.
* dailyCalories_merged: The calories burnt on each day.
* dailyIntensities_merged: A file detailing each user's activity states during the day.
* dailySteps_merged: The number of steps taken per day.
* heartrate_seconds_merged: The heart rate monitored during different periods of the day.
* hourlyCalories_merged: The calories burnt per hour.
* hourlyIntensities_merged: A file with hourly intensity information.
* hourlySteps_merged: The number of steps taken per hour.
* minuteCaloriesNarrow_merged: The overall calories burnt per minute.
* minuteCaloriesWide_merged: The per-minute calorie data in wide format, with one column for each minute of an hour.
* minuteMETsNarrow_merged: The metabolic equivalents (METs) of each user per minute.
* minuteSleep_merged: Unclear.
* minuteStepsNarrow_merged: Overall steps per minute.
* minuteStepsWide_merged: The per-minute step data in wide format, with one column for each minute of an hour.
* sleepDay_merged: Sleeping details for each day.
* weightLogInfo_merged: Daily weight log.

Notable/Potential CSV files: dailyActivity_merged, dailyCalories_merged, dailyIntensities_merged, dailySteps_merged, sleepDay_merged, weightLogInfo_merged. These are the ones that stood out as having the most relevant information on how users behave with smart health trackers.
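
The file count and total size quoted above can be checked with a short inventory of the download folder. The sketch below is only an illustration: `inventory_csvs` is a hypothetical helper, and it runs on a temporary folder with two tiny made-up files rather than the real Kaggle download.

```python
from pathlib import Path
import tempfile

import pandas as pd


def inventory_csvs(folder):
    """Return one row per CSV file in `folder` with its size in megabytes."""
    rows = [
        {"file": path.name, "size_mb": path.stat().st_size / 1024**2}
        for path in sorted(Path(folder).glob("*.csv"))
    ]
    return pd.DataFrame(rows, columns=["file", "size_mb"])


# Demo on a throwaway folder holding two tiny made-up files.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "dailySteps_merged.csv").write_text("Id,StepTotal\n1,100\n")
    Path(tmp, "sleepDay_merged.csv").write_text("Id,TotalMinutesAsleep\n1,400\n")
    inventory = inventory_csvs(tmp)

print(inventory)
print("files:", len(inventory), "total MB:", inventory["size_mb"].sum())
```

Pointing `inventory_csvs` at the actual data folder would list every CSV with its size, which makes it easy to compare against the figures stated in this section.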

#### Reading the data

In [None]:
os.chdir(r"D:\Download\archive (1)\Fitabase Data 4.12.16-5.12.16")

daily_activities = pd.read_csv("dailyActivity_merged.csv")
daily_calories = pd.read_csv("dailyCalories_merged.csv")
daily_intensity = pd.read_csv("dailyIntensities_merged.csv")
daily_steps = pd.read_csv("dailySteps_merged.csv")
seconds_heartrate = pd.read_csv("heartrate_seconds_merged.csv")
daily_sleep = pd.read_csv("sleepDay_merged.csv")
daily_weight = pd.read_csv("weightLogInfo_merged.csv")
minute_METs = pd.read_csv("minuteMETsNarrow_merged.csv")


In [None]:
daily_activities.head()

In [None]:
daily_activities.info()

In [None]:
profile = ProfileReport(daily_activities, title="Pandas Profiling Report")
profile.to_notebook_iframe()

In [None]:
profile = ProfileReport(daily_sleep, title="Pandas Profiling Report")
profile.to_notebook_iframe()

In [None]:
profile = ProfileReport(daily_weight, title="Pandas Profiling Report")
profile.to_notebook_iframe()

### Analysis directions

Possible questions to be answered:

* What do the data sets involve?
* Is there any trend in device usage?
* How can the analysis be helpful for Redback Operations?


#### The unique ids of each data frame

(1.) daily_activity: data available from 2016-04-12 to 2016-05-12, with 33 unique Ids.

(2.) hourly_activity: data available from 2016-04-12 to 2016-05-12, with 33 unique Ids.

(3.) sleep: data available from 2016-04-12 to 2016-05-12, with 24 unique Ids.

(4.) heartrate: data available from 2016-04-12 to 2016-05-12, with 14 unique Ids.

(5.) weight_log: data available from 2016-04-12 to 2016-05-12, with 8 unique Ids.
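
One way to produce a coverage summary like the one above is to compute the date range and the number of unique Ids per data frame. This is a minimal sketch on made-up toy frames, not the notebook's actual code: `coverage` is a hypothetical helper, and the date column names (`ActivityDate`, `SleepDay`) are assumptions about the CSV headers.

```python
import pandas as pd


def coverage(frames, date_cols):
    """Summarise first date, last date and unique Ids for each frame."""
    rows = []
    for name, df in frames.items():
        dates = pd.to_datetime(df[date_cols[name]])
        rows.append({
            "table": name,
            "first_date": dates.min().date(),
            "last_date": dates.max().date(),
            "unique_ids": df["Id"].nunique(),
        })
    return pd.DataFrame(rows)


# Toy frames standing in for the real CSVs (values are made up).
toy_activity = pd.DataFrame({
    "Id": [1, 1, 2, 3],
    "ActivityDate": ["4/12/2016", "4/13/2016", "4/12/2016", "5/12/2016"],
})
toy_sleep = pd.DataFrame({
    "Id": [1, 2],
    "SleepDay": ["4/12/2016", "4/15/2016"],
})

summary = coverage(
    {"daily_activity": toy_activity, "sleep": toy_sleep},
    {"daily_activity": "ActivityDate", "sleep": "SleepDay"},
)
print(summary)
```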

#### Is there any trend in daily usage?

Let's first get a sense of how users used their devices at the day level.

In [None]:
# Count how many distinct users logged activity on each day.
import calplot

# Parse the date strings so the grouped result has a DatetimeIndex for calplot.
daily_activities["ActivityDate"] = pd.to_datetime(daily_activities["ActivityDate"])
obs_users = daily_activities.groupby("ActivityDate")["Id"].nunique()

# Calendar heatmap: one cell per day, shaded by the number of active users.
pl1 = calplot.calplot(data=obs_users, how="sum", cmap="Blues", suptitle="Number of Users Who Used Their Devices, by Day")

In [None]:
obs_users.value_counts()

In [None]:
obs_users.value_counts(normalize=True)*100

Daily usage at first look:

Within the 31 days of recorded data, a few things stand out:

Of the 33 Ids, between 21 and 32 used their devices on any given day, or roughly 64% to 97% of participants. The percentages from the normalized value counts describe days rather than users: about 3% of days had 21 active users, and about 45% of days had 32. The busiest day therefore had roughly 1.5 times as many active users as the quietest.

Participants used their devices more frequently in the first half of the period than towards the end.
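
Two different percentages can be derived from the per-day user counts, and they are easy to confuse: the share of users active on a day, and the share of days on which a given user count occurred. The sketch below shows both on a made-up toy frame (4 users, 3 days); the numbers are illustrative only and are not the Fitbit results.

```python
import pandas as pd

# Toy data: 4 users over 3 days (made up, not the Fitbit values).
toy = pd.DataFrame({
    "Id": [1, 2, 3, 4, 1, 2, 3, 1, 2, 3],
    "ActivityDate": pd.to_datetime(
        ["2016-04-12"] * 4 + ["2016-04-13"] * 3 + ["2016-04-14"] * 3
    ),
})

total_users = toy["Id"].nunique()
users_per_day = toy.groupby("ActivityDate")["Id"].nunique()

# Share of all users who were active on each day.
share_of_users = users_per_day / total_users * 100

# Share of days on which a given user count was observed.
share_of_days = users_per_day.value_counts(normalize=True) * 100

print(share_of_users)
print(share_of_days)
```

Dividing by the total number of Ids answers "how many participants were active that day", while `value_counts(normalize=True)` answers "how often did a day with that many participants occur".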

#### Summary of analysis
Based on the profiler results, the data appears clean, with no obvious missing values and no duplicated rows.
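
The same two checks can also be run directly in pandas, without the profiler. This is a small sketch on a made-up frame; `quality_report` is a hypothetical helper name, and the toy frame deliberately contains one missing value and one duplicated row so that both counts are non-zero.

```python
import numpy as np
import pandas as pd


def quality_report(df):
    """Count missing cells per column and fully duplicated rows."""
    return {
        "missing_per_column": df.isna().sum().to_dict(),
        "duplicate_rows": int(df.duplicated().sum()),
    }


# Toy frame with one missing value and one duplicated row (made up).
toy = pd.DataFrame({
    "Id": [1, 1, 2, 2],
    "TotalSteps": [1000, 1000, np.nan, 500],
})
report = quality_report(toy)
print(report)
```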

Unfortunately, the data does not include any information about users who use the device for strength training, which would have been a useful addition to the analysis. The data does, however, provide information on steps, calories, and total distance, which were used in the analysis stage.

Several expected outcomes and trends emerged from the analysis. It is well established that daily activities such as walking, running, and exercising help burn calories. The total distance vs total steps plot illustrated this: the more steps taken and distance covered, the more calories burned.
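
A correlation matrix is one simple way to put a number on the relationship between steps, distance, and calories. The sketch below uses made-up rows, and the column names `TotalSteps`, `TotalDistance`, and `Calories` are assumed to match dailyActivity_merged; it illustrates the technique and does not reproduce the notebook's results.

```python
import pandas as pd

# Toy rows (made up) using the column names assumed from dailyActivity_merged.
toy = pd.DataFrame({
    "TotalSteps": [2000, 5000, 8000, 12000],
    "TotalDistance": [1.4, 3.5, 5.6, 8.4],
    "Calories": [1600, 1900, 2300, 2800],
})

# Pearson correlation between every pair of columns.
corr = toy[["TotalSteps", "TotalDistance", "Calories"]].corr()
print(corr.round(2))
```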

Because the data covers a short collection period, there was little information on individual weights. A statistical summary for each Id revealed minimal changes in weight.
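
A per-Id weight summary of this kind can be sketched with a grouped aggregation. The example below uses a made-up weight log, and `WeightKg` is an assumed column name; the `range` column (max minus min) gives a quick view of how much each user's weight changed over the period.

```python
import pandas as pd

# Toy weight log (made up); "WeightKg" is the assumed column name.
toy_weight = pd.DataFrame({
    "Id": [1, 1, 1, 2, 2],
    "WeightKg": [72.5, 72.3, 72.4, 85.0, 84.6],
})

# One row per user: number of entries plus min, max and mean weight.
per_user = toy_weight.groupby("Id")["WeightKg"].agg(["count", "min", "max", "mean"])
per_user["range"] = per_user["max"] - per_user["min"]
print(per_user.round(2))
```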



Fewer users track sleep than daily activity: 9 fewer users use their device for sleep. Of the users who do track sleep, 50 percent use their device for more than 20 days, 38 percent for 11-20 days, and 38 percent for 10 or fewer days.

According to the data, more people use their devices to track daily activity, such as activity levels, steps, distance, and calories. Of these users, 87.9 percent have a high usage rate, 9.1 percent a moderate usage rate, and 3 percent a low usage rate.
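
Usage shares like these can be derived by counting the distinct days on which each Id has a record and binning the counts. The sketch below is an assumption-laden illustration: `usage_levels` is a hypothetical helper, the mapping of "low", "moderate", and "high" to 10 or fewer, 11-20, and more than 20 days is assumed from the day ranges quoted for sleep tracking, and the data is made up.

```python
import pandas as pd


def usage_levels(df, id_col="Id", date_col="ActivityDate"):
    """Label each user by the number of distinct days with a record."""
    days_used = df.groupby(id_col)[date_col].nunique()
    # Bins: 1-10 days low, 11-20 moderate, 21 or more high (assumed cut-offs).
    return pd.cut(
        days_used,
        bins=[0, 10, 20, float("inf")],
        labels=["low", "moderate", "high"],
    )


# Toy log (made up): user 1 has 25 days, user 2 has 15, user 3 has 5.
toy = pd.DataFrame({
    "Id": [1] * 25 + [2] * 15 + [3] * 5,
    "ActivityDate": (
        list(pd.date_range("2016-04-12", periods=25))
        + list(pd.date_range("2016-04-12", periods=15))
        + list(pd.date_range("2016-04-12", periods=5))
    ),
})

levels = usage_levels(toy)
share = levels.value_counts(normalize=True).mul(100).round(1)
print(share)
```

Because every user falls into exactly one bin, the resulting shares should add up to 100 percent, which is a useful sanity check on reported figures.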
