## Smartwatch data analysis

In [23]:
import numpy as np

In [24]:
import pandas as pd

In [3]:
import matplotlib.pyplot as plt

In [4]:
import plotly.express as px

In [5]:
import plotly.graph_objects as go

In [6]:
data = pd.read_csv('dailyActivity_merged.csv')

In [7]:
print(data.head())

           Id ActivityDate  TotalSteps  TotalDistance  TrackerDistance  \
0  1503960366    4/12/2016       13162           8.50             8.50   
1  1503960366    4/13/2016       10735           6.97             6.97   
2  1503960366    4/14/2016       10460           6.74             6.74   
3  1503960366    4/15/2016        9762           6.28             6.28   
4  1503960366    4/16/2016       12669           8.16             8.16   

   LoggedActivitiesDistance  VeryActiveDistance  ModeratelyActiveDistance  \
0                       0.0                1.88                      0.55   
1                       0.0                1.57                      0.69   
2                       0.0                2.44                      0.40   
3                       0.0                2.14                      1.26   
4                       0.0                2.71                      0.41   

   LightActiveDistance  SedentaryActiveDistance  VeryActiveMinutes  \
0                 6.06

In [8]:
print(data.isnull().sum())

Id                          0
ActivityDate                0
TotalSteps                  0
TotalDistance               0
TrackerDistance             0
LoggedActivitiesDistance    0
VeryActiveDistance          0
ModeratelyActiveDistance    0
LightActiveDistance         0
SedentaryActiveDistance     0
VeryActiveMinutes           0
FairlyActiveMinutes         0
LightlyActiveMinutes        0
SedentaryMinutes            0
Calories                    0
dtype: int64


In [9]:
print(data.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 940 entries, 0 to 939
Data columns (total 15 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   Id                        940 non-null    int64  
 1   ActivityDate              940 non-null    object 
 2   TotalSteps                940 non-null    int64  
 3   TotalDistance             940 non-null    float64
 4   TrackerDistance           940 non-null    float64
 5   LoggedActivitiesDistance  940 non-null    float64
 6   VeryActiveDistance        940 non-null    float64
 7   ModeratelyActiveDistance  940 non-null    float64
 8   LightActiveDistance       940 non-null    float64
 9   SedentaryActiveDistance   940 non-null    float64
 10  VeryActiveMinutes         940 non-null    int64  
 11  FairlyActiveMinutes       940 non-null    int64  
 12  LightlyActiveMinutes      940 non-null    int64  
 13  SedentaryMinutes          940 non-null    int64  
 14  Calories  

The ActivityDate column has the object dtype. Let's convert it to datetime, because we will use dates in our analysis.

In [10]:

data["ActivityDate"] = pd.to_datetime(data["ActivityDate"], 
                                      format="%m/%d/%Y")
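
As a minimal sketch of the same conversion, the snippet below runs on a small made-up frame (the `toy` frame and its malformed third value are assumptions for illustration, not rows from the CSV). Passing `errors="coerce"` is one possible way to surface unparseable dates as `NaT` instead of raising an error.

```python
import pandas as pd

# Hypothetical toy frame standing in for the real CSV;
# the third value is deliberately malformed.
toy = pd.DataFrame({"ActivityDate": ["4/12/2016", "4/13/2016", "not a date"]})

# Same format string as in the notebook; errors="coerce" turns
# strings that do not match it into NaT instead of raising.
parsed = pd.to_datetime(toy["ActivityDate"], format="%m/%d/%Y", errors="coerce")

# Count the rows that failed to parse so they can be inspected.
n_bad = int(parsed.isna().sum())
print(parsed.tolist(), n_bad)
```

On the real dataset the null check above suggests every date is present, so the coercion would mainly act as a safety net.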

In [11]:
print(data.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 940 entries, 0 to 939
Data columns (total 15 columns):
 #   Column                    Non-Null Count  Dtype         
---  ------                    --------------  -----         
 0   Id                        940 non-null    int64         
 1   ActivityDate              940 non-null    datetime64[ns]
 2   TotalSteps                940 non-null    int64         
 3   TotalDistance             940 non-null    float64       
 4   TrackerDistance           940 non-null    float64       
 5   LoggedActivitiesDistance  940 non-null    float64       
 6   VeryActiveDistance        940 non-null    float64       
 7   ModeratelyActiveDistance  940 non-null    float64       
 8   LightActiveDistance       940 non-null    float64       
 9   SedentaryActiveDistance   940 non-null    float64       
 10  VeryActiveMinutes         940 non-null    int64         
 11  FairlyActiveMinutes       940 non-null    int64         
 12  LightlyActiveMinutes  

The dataset has columns for very active, fairly active, lightly active, and sedentary minutes. We can add them together to get the total minutes tracked per day before moving forward.

In [12]:
data['Totalminutes'] = data['VeryActiveMinutes'] + data['FairlyActiveMinutes'] +  data['LightlyActiveMinutes'] + data['SedentaryMinutes']

In [13]:
print(data['Totalminutes'].sample(5))

301    1440
778    1440
617    1440
102    1440
248     922
Name: Totalminutes, dtype: int64
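
A full day has 24 × 60 = 1440 minutes, so a value of 1440 means the four categories cover the whole day, while a smaller value such as 922 presumably means part of the day was not recorded. The sketch below flags such partial days. It uses a toy frame that mirrors the five sampled values above, and the `FullDay` column name is an assumption introduced only for this example.

```python
import pandas as pd

MINUTES_PER_DAY = 24 * 60  # 1440

# Toy values mirroring the sample printed above, not the full dataset.
toy = pd.DataFrame({"Totalminutes": [1440, 1440, 1440, 1440, 922]})

# Flag days whose tracked minutes add up to a complete day.
toy["FullDay"] = toy["Totalminutes"] == MINUTES_PER_DAY

# Rows with incomplete coverage, and the share of complete days.
partial_days = toy.loc[~toy["FullDay"]]
share_full = toy["FullDay"].mean()
print(partial_days)
print(share_full)
```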


Now, let's have a look at the descriptive statistics of the dataset.

In [14]:
print(data.describe())

                 Id    TotalSteps  TotalDistance  TrackerDistance  \
count  9.400000e+02    940.000000     940.000000       940.000000   
mean   4.855407e+09   7637.910638       5.489702         5.475351   
std    2.424805e+09   5087.150742       3.924606         3.907276   
min    1.503960e+09      0.000000       0.000000         0.000000   
25%    2.320127e+09   3789.750000       2.620000         2.620000   
50%    4.445115e+09   7405.500000       5.245000         5.245000   
75%    6.962181e+09  10727.000000       7.712500         7.710000   
max    8.877689e+09  36019.000000      28.030001        28.030001   

       LoggedActivitiesDistance  VeryActiveDistance  ModeratelyActiveDistance  \
count                940.000000          940.000000                940.000000   
mean                   0.108171            1.502681                  0.567543   
std                    0.619897            2.658941                  0.883580   
min                    0.000000            0.000000   



## Analyzing The Smartwatch Data 

Now, let's look at what we can analyze with this dataset and what information we can get from it.

First, the Calories column contains the number of calories burned in a day. Let's look at the relationship between calories burned and total steps walked in a day.

In [15]:
cal_ste = px.scatter(data_frame = data, x="Calories",
                    y="TotalSteps", size="VeryActiveMinutes", 
                    trendline="ols", 
                    title="Relationship between Calories & Total Steps")

In [1]:
cal_ste.show()

NameError: name 'cal_ste' is not defined

From the graph we can see that there is a roughly linear relationship between calories burned and total steps.
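
One way to put a number on that visual impression is to fit a least-squares line and compute the Pearson correlation. The following is only a sketch: `linear_fit` is a hypothetical helper, and the `toy` values are made up to be exactly linear, so they say nothing about the real dataset.

```python
import numpy as np
import pandas as pd

def linear_fit(df, x="Calories", y="TotalSteps"):
    """Return (slope, intercept, pearson_r) for the line y = slope * x + intercept."""
    # Degree-1 polynomial fit is an ordinary least-squares line.
    slope, intercept = np.polyfit(df[x], df[y], deg=1)
    # Pearson correlation between the two columns.
    r = df[x].corr(df[y])
    return float(slope), float(intercept), float(r)

# Hypothetical, exactly linear toy data (TotalSteps = 10 * Calories - 12000).
toy = pd.DataFrame({
    "Calories": [1500, 2000, 2500, 3000],
    "TotalSteps": [3000, 8000, 13000, 18000],
})
slope, intercept, r = linear_fit(toy)
print(slope, intercept, r)
```

Calling `linear_fit(data)` on the notebook's DataFrame would give the corresponding values for the real records. A correlation close to 1 would support the linear reading of the scatter plot.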

Now, let's have a look at the average number of minutes spent at each activity level in a day.

In [17]:
label = ["Very Active Minutes", "Fairly Active Minutes", "Lightly Active Minutes", "Inactive Minutes"]
counts = data[["VeryActiveMinutes", "FairlyActiveMinutes", 
               "LightlyActiveMinutes", "SedentaryMinutes"]].mean()
colors = ["blue",'Green',"red","Orange"]
fig = go.Figure(data=[go.Pie(labels=label, values=counts)])
fig.update_layout(title_text='Total Active Minutes')
fig.update_traces(hoverinfo='label+percent', textinfo='value', textfont_size=30,
                  marker=dict(colors=colors, line=dict(color='black', width=3)))
fig.show()

On average, 81.3% of the minutes in a day were inactive and 15.8% were lightly active.
Only 21.16 minutes (1.74%) were very active, and only 13.56 minutes (1.11%) were fairly active.
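
The percentages above are each category's mean divided by the sum of the four means. A minimal sketch of that arithmetic follows. `minute_shares` is a hypothetical helper and the `toy` numbers are made up, so the printed shares will not match the figures quoted for the real data.

```python
import pandas as pd

MINUTE_COLS = ["VeryActiveMinutes", "FairlyActiveMinutes",
               "LightlyActiveMinutes", "SedentaryMinutes"]

def minute_shares(df):
    """Return each category's mean minutes as a percentage of the four means combined."""
    means = df[MINUTE_COLS].mean()
    return (means / means.sum() * 100).round(2)

# Hypothetical toy rows, chosen only to make the arithmetic easy to follow.
toy = pd.DataFrame({
    "VeryActiveMinutes": [20, 30],
    "FairlyActiveMinutes": [10, 20],
    "LightlyActiveMinutes": [200, 180],
    "SedentaryMinutes": [1000, 940],
})
shares = minute_shares(toy)
print(shares)
```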

We converted the ActivityDate column to the datetime type above.
Let’s use it to find the weekday of each record and add it to the dataset as a new column, “Days”.

In [18]:
data['Days'] = data['ActivityDate'].dt.day_name() 

In [19]:
print(data['Days'].head())

0      Tuesday
1    Wednesday
2     Thursday
3       Friday
4     Saturday
Name: Days, dtype: object
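
Day names are plain strings, so they sort alphabetically by default. If calendar order matters for later charts, one option is an ordered categorical. This is a sketch on a toy frame, and `WEEKDAYS` is a name introduced here for illustration.

```python
import pandas as pd

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Toy dates: a Tuesday, a Saturday, and a Monday in April 2016.
toy = pd.DataFrame({
    "ActivityDate": pd.to_datetime(["2016-04-12", "2016-04-16", "2016-04-11"])
})

# An ordered categorical makes sorting and grouping follow calendar order.
toy["Days"] = pd.Categorical(toy["ActivityDate"].dt.day_name(),
                             categories=WEEKDAYS, ordered=True)

sorted_days = toy.sort_values("Days")["Days"].tolist()
print(sorted_days)
```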


Now, let's have a look at the different active minutes on each day of the week:

In [20]:
fig = go.Figure()
fig.add_trace(go.Bar(
    x=data["Days"],
    y=data["VeryActiveMinutes"],
    name='Very Active',
    marker_color='purple'
))
fig.add_trace(go.Bar(
    x=data["Days"],
    y=data["FairlyActiveMinutes"],
    name='Fairly Active',
    marker_color='green'
))
fig.add_trace(go.Bar(
    x=data["Days"],
    y=data["LightlyActiveMinutes"],
    name='Lightly Active',
    marker_color='pink'
))
fig.update_layout(barmode='group', xaxis_tickangle=-45)
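
The chart above passes one bar per record. An alternative is to aggregate to one value per weekday first and plot that table instead. The sketch below shows only the aggregation step on made-up rows. `mean_minutes_by_day` is a hypothetical helper, and using the mean rather than the sum is an assumption of this example.

```python
import pandas as pd

ACTIVE_COLS = ["VeryActiveMinutes", "FairlyActiveMinutes", "LightlyActiveMinutes"]

def mean_minutes_by_day(df):
    """Return the mean of each active-minutes column per weekday."""
    return df.groupby("Days")[ACTIVE_COLS].mean()

# Hypothetical toy rows: two Tuesdays and one Friday.
toy = pd.DataFrame({
    "Days": ["Tuesday", "Tuesday", "Friday"],
    "VeryActiveMinutes": [20, 40, 10],
    "FairlyActiveMinutes": [10, 30, 5],
    "LightlyActiveMinutes": [200, 300, 150],
})
by_day = mean_minutes_by_day(toy)
print(by_day)
```

With such a table, `by_day.index` can serve as `x` and each column as `y` in the `go.Bar` traces, giving exactly one bar per weekday and activity level.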

Now, let's have a look at the inactive minutes on each day of the week:


In [21]:
# go.Pie pairs labels and values by position, so the values must already
# be one number per weekday. Aggregate first (mean per weekday).
counts = data.groupby("Days")["SedentaryMinutes"].mean()
label = counts.index
colors = ['gold','lightgreen', "pink", "blue", "skyblue", "cyan", "orange"]

fig = go.Figure(data=[go.Pie(labels=label, values=counts)])
fig.update_layout(title_text='Inactive Minutes Daily')
fig.update_traces(hoverinfo='label+percent', textinfo='value', textfont_size=30,
                  marker=dict(colors=colors, line=dict(color='black', width=3)))

The largest slice of this pie marks the most inactive day for the individuals in this dataset. The original version of this chart, which used the raw SedentaryMinutes column without aggregating it per weekday, pointed to Thursday.

Now let’s have a look at the number of calories burned on each day of the week:

In [22]:
# As above, aggregate to one value per weekday before plotting (mean per weekday).
counts = data.groupby("Days")["Calories"].mean()
label = counts.index
colors = ['gold','lightgreen', "pink", "blue", "skyblue", "cyan", "orange"]

fig = go.Figure(data=[go.Pie(labels=label, values=counts)])
fig.update_layout(title_text='Calories Burned Daily')
fig.update_traces(hoverinfo='label+percent', textinfo='value', textfont_size=30,
                  marker=dict(colors=colors, line=dict(color='black', width=3)))

The slices show how the calories burned are distributed across the week. In the original version of this chart, which used the raw Calories column without aggregating it per weekday, the shares were nearly equal, with Tuesday showing the most calories burned.

So this is how we can analyze smartwatch data using Python's data analysis packages.

In [25]:
import jovian

In [None]:
jovian.commit()

## Summary

Smartwatch data analysis is useful for people who care about their fitness and want to improve their health.
Here, we analyzed data collected from such users with Python's data analysis packages.