## **AVERAGE HEARTBEAT OF VARIOUS SPORT ACTIVITIES** <a id="1"></a>

<a><img style="float: right;" src="https://www.linkpicture.com/q/download1_7.jpg" width="300" /></a>
 



- Dataset source: https://datasets.simula.no/pmdata/

### 1.2 Notebook Preparation <a id="1.2"></a>

This part of the notebook imports the libraries used for data handling, statistics, plotting, and clustering.

In [3]:
# Import libraries

import pandas as pd
import numpy as np 
from scipy import stats

import matplotlib.pyplot as plt
import seaborn as sns
import plotly
import plotly.graph_objects as go
import plotly.express as px

from sklearn.preprocessing import StandardScaler

from sklearn.cluster import KMeans
from scipy.cluster.hierarchy import linkage
from scipy.cluster.hierarchy import dendrogram
from sklearn.metrics import silhouette_samples, silhouette_score

## **2. Data Preparation** <a id="2"></a>

- The section below provides an initial exploration of the available data.
- We import the Fitbit exercise logs of 16 participants (p01 to p16) to compute the average heart rate for each activity.


### Dataset 1

In [40]:
# Relative path to the exercise log of participant p01
url1 = 'pmdata/p01/fitbit/exercise.json'
# Load data
df1 = pd.read_json(url1)
# View the first five rows
df1.head(5)

Unnamed: 0,logId,activityName,activityTypeId,activityLevel,averageHeartRate,calories,duration,activeDuration,steps,logType,...,elevationGain,hasGps,shouldFetchDetails,distance,distanceUnit,source,tcxLink,speed,pace,vo2Max
0,26451905128,Walk,90013,"[{'minutes': 0, 'name': 'sedentary'}, {'minute...",94,192,1331000,1331000,1878,auto_detected,...,24.384,False,False,,,,,,,
1,26455950499,Walk,90013,"[{'minutes': 0, 'name': 'sedentary'}, {'minute...",94,302,2202000,2202000,2786,auto_detected,...,27.432,False,False,,,,,,,
2,26467488515,Walk,90013,"[{'minutes': 0, 'name': 'sedentary'}, {'minute...",98,354,2458000,2458000,3035,auto_detected,...,21.336,False,False,,,,,,,
3,26520401069,Walk,90013,"[{'minutes': 0, 'name': 'sedentary'}, {'minute...",97,145,1024000,1024000,1284,auto_detected,...,21.336,False,False,,,,,,,
4,26538035127,Walk,90013,"[{'minutes': 3, 'name': 'sedentary'}, {'minute...",93,121,973000,973000,1065,auto_detected,...,3.048,False,False,,,,,,,


In [26]:
# Let us count the number of rows and columns in the dataset.

df1.shape

(190, 26)

In [33]:
# Let us count the types and number of activities in dataframe 1

df1['activityName'].value_counts()

Walk         150
Sport         15
Run           14
Treadmill     11
Name: activityName, dtype: int64

### Dataset 2

In [39]:
# Relative path to the exercise log of participant p02
url2 = 'pmdata/p02/fitbit/exercise.json'
# Load data
df2 = pd.read_json(url2)
# View the first five rows
df2.head(5)

Unnamed: 0,logId,activityName,activityTypeId,activityLevel,averageHeartRate,calories,duration,activeDuration,steps,logType,...,shouldFetchDetails,distance,distanceUnit,source,speed,pace,tcxLink,swimLengths,poolLength,poolLengthUnit
0,26555895608,Walk,90013,"[{'minutes': 0, 'name': 'sedentary'}, {'minute...",108.0,105,922000,922000,1139.0,auto_detected,...,False,,,,,,,,,
1,26516538165,Run,90009,"[{'minutes': 0, 'name': 'sedentary'}, {'minute...",136.0,544,4315000,4114000,10034.0,tracker,...,False,9.356687,Kilometer,"{'type': 'tracker', 'name': 'Versa 2', 'id': '...",8.188609,439.635126,,,,
2,26576453784,Walk,90013,"[{'minutes': 0, 'name': 'sedentary'}, {'minute...",97.0,186,1280000,1280000,1939.0,auto_detected,...,False,,,,,,,,,
3,26580173239,Sport,15000,"[{'minutes': 0, 'name': 'sedentary'}, {'minute...",95.0,258,1792000,1792000,2741.0,auto_detected,...,False,,,,,,,,,
4,26580173240,Walk,90013,"[{'minutes': 0, 'name': 'sedentary'}, {'minute...",88.0,197,1433000,1433000,1861.0,auto_detected,...,False,,,,,,,,,


In [35]:
# Let us count the number and types of activities in dataframe 2

df2['activityName'].value_counts()

Walk                    203
Run                      43
Weights                  32
Sport                    28
Aerobic Workout           8
Bike                      3
Outdoor Bike              3
Treadmill                 2
Swim                      1
Cross Country Skiing      1
Name: activityName, dtype: int64

### Dataset 3

In [38]:
# Relative path to the exercise log of participant p03
url3 = 'pmdata/p03/fitbit/exercise.json'
# Load data
df3 = pd.read_json(url3)
# View the first five rows
df3.head(5)

Unnamed: 0,logId,activityName,activityTypeId,activityLevel,averageHeartRate,calories,duration,activeDuration,steps,logType,...,startTime,originalStartTime,originalDuration,elevationGain,hasGps,shouldFetchDetails,distance,distanceUnit,source,speed
0,26446767579,Walk,90013,"[{'minutes': 3, 'name': 'sedentary'}, {'minute...",107,311,2355000,2355000,3171,auto_detected,...,2019-11-01 11:45:06,11/01/19 11:45:06,2355000,30.48,False,False,,,,
1,26533264129,Walk,90013,"[{'minutes': 4, 'name': 'sedentary'}, {'minute...",103,235,1946000,1946000,2535,auto_detected,...,2019-11-05 14:36:06,11/05/19 14:36:06,1946000,27.432,False,False,,,,
2,26549706578,Walk,90013,"[{'minutes': 0, 'name': 'sedentary'}, {'minute...",118,144,922000,922000,1429,auto_detected,...,2019-11-06 11:51:04,11/06/19 11:51:04,922000,0.0,False,False,,,,
3,26655580888,Walk,90013,"[{'minutes': 0, 'name': 'sedentary'}, {'minute...",106,306,2202000,2202000,3283,auto_detected,...,2019-11-11 14:22:34,11/11/19 14:22:34,2202000,27.432,False,False,,,,
4,26726335047,Walk,90013,"[{'minutes': 0, 'name': 'sedentary'}, {'minute...",99,264,1997000,1997000,3105,auto_detected,...,2019-11-14 16:16:52,11/14/19 16:16:52,1997000,6.096,False,False,,,,


In [37]:
# Let us count the number and types of activities in dataframe 3

df3['activityName'].value_counts()

Walk         54
Treadmill     3
Name: activityName, dtype: int64

### Dataset 4

In [65]:
# Relative path to the exercise log of participant p04
url4 = 'pmdata/p04/fitbit/exercise.json'
# Load data
df4 = pd.read_json(url4)
# View the first five rows
df4.head(5)

Unnamed: 0,logId,activityName,activityTypeId,activityLevel,calories,duration,activeDuration,steps,source,logType,...,shouldFetchDetails,averageHeartRate,distance,distanceUnit,heartRateZones,speed,elevationGain,tcxLink,pace,vo2Max
0,26514906767,Hockey,15360,"[{'minutes': 0, 'name': 'sedentary'}, {'minute...",370,3000000,3000000,0,"{'type': 'app', 'name': 'Fitbit for iPhone', '...",manual,...,False,,,,,,,,,
1,26514392187,Cross Country Skiing,90015,"[{'minutes': 0, 'name': 'sedentary'}, {'minute...",248,2382000,2379000,3236,"{'type': 'tracker', 'name': 'Versa 2', 'id': '...",tracker,...,False,132.0,2.097664,Kilometer,"[{'name': 'Out of Range', 'min': 30, 'max': 99...",3.174265,82.906,,,
2,26562951624,Weights,2131,"[{'minutes': 2, 'name': 'sedentary'}, {'minute...",215,2780000,2778000,2663,"{'type': 'tracker', 'name': 'Versa 2', 'id': '...",tracker,...,False,114.0,,,"[{'name': 'Out of Range', 'min': 30, 'max': 99...",,263.347,,,
3,26587322270,Cross Country Skiing,90015,"[{'minutes': 0, 'name': 'sedentary'}, {'minute...",336,2927000,2925000,4081,"{'type': 'tracker', 'name': 'Versa 2', 'id': '...",tracker,...,False,139.0,2.662726,Kilometer,"[{'name': 'Out of Range', 'min': 30, 'max': 99...",3.262892,407.822,,,
4,26640397772,Cross Country Skiing,90015,"[{'minutes': 0, 'name': 'sedentary'}, {'minute...",321,3221000,3218000,3824,"{'type': 'tracker', 'name': 'Versa 2', 'id': '...",tracker,...,False,131.0,2.549914,Kilometer,"[{'name': 'Out of Range', 'min': 30, 'max': 99...",2.853554,62.179,,,


In [44]:
# Let us count the number and types of activities in dataframe 4

df4['activityName'].value_counts()

Walk                    60
Run                     37
Weights                 19
Workout                 11
Treadmill                8
Sport                    8
Cross Country Skiing     6
Aerobic Workout          5
Hike                     5
Hockey                   1
Tennis                   1
Name: activityName, dtype: int64

### Dataset 5

In [43]:
# Relative path to the exercise log of participant p05
url5 = 'pmdata/p05/fitbit/exercise.json'
# Load data
df5 = pd.read_json(url5)
# View the first five rows
df5.head(5)

Unnamed: 0,logId,activityName,activityTypeId,activityLevel,averageHeartRate,calories,duration,activeDuration,steps,logType,...,originalStartTime,originalDuration,elevationGain,hasGps,shouldFetchDetails,distance,distanceUnit,source,speed,pace
0,26472425317,Walk,90013,"[{'minutes': 0, 'name': 'sedentary'}, {'minute...",99,131,1075000,1075000,1110.0,auto_detected,...,11/01/19 18:56:06,1075000,18.288,False,False,,,,,
1,26480520277,Walk,90013,"[{'minutes': 0, 'name': 'sedentary'}, {'minute...",102,221,1536000,1536000,2311.0,auto_detected,...,11/02/19 23:47:02,1536000,33.528,False,False,,,,,
2,26681014672,Outdoor Bike,1071,"[{'minutes': 0, 'name': 'sedentary'}, {'minute...",108,170,1229000,1229000,,auto_detected,...,11/05/19 14:51:54,1229000,37.744,False,False,,,,,
3,26681014682,Outdoor Bike,1071,"[{'minutes': 0, 'name': 'sedentary'}, {'minute...",110,165,1126000,1126000,,auto_detected,...,11/07/19 07:02:44,1126000,32.004,False,False,,,,,
4,26681014684,Walk,90013,"[{'minutes': 9, 'name': 'sedentary'}, {'minute...",91,254,2560000,2560000,2214.0,auto_detected,...,11/07/19 13:24:13,2560000,21.336,False,False,,,,,


In [45]:
# Let us count the number and types of activities in dataframe 5

df5['activityName'].value_counts()

Walk            80
Outdoor Bike    54
Run              4
Treadmill        3
Bike             2
Elliptical       1
Weights          1
Name: activityName, dtype: int64

### Dataset 6

In [46]:
# Relative path to the exercise log of participant p06
url6 = 'pmdata/p06/fitbit/exercise.json'
# Load data
df6 = pd.read_json(url6)
# View the first five rows
df6.head(5)

Unnamed: 0,logId,activityName,activityTypeId,activityLevel,averageHeartRate,calories,distance,distanceUnit,duration,activeDuration,...,pace,lastModified,startTime,originalStartTime,originalDuration,elevationGain,hasGps,shouldFetchDetails,tcxLink,vo2Max
0,26421235751,Interval Workout,20057,"[{'minutes': 0, 'name': 'sedentary'}, {'minute...",150,658,5.879976,Kilometer,2406000,2400000,...,407.78253,11/02/19 09:38:13,2019-11-02 08:44:54,11/02/19 08:44:54,2406000,266.09,False,False,,
1,26465868505,Walk,90013,"[{'minutes': 0, 'name': 'sedentary'}, {'minute...",102,222,,,1331000,1331000,...,,11/02/19 12:24:21,2019-11-02 11:45:28,11/02/19 11:45:28,1331000,3.048,False,False,,
2,26476567208,Walk,90013,"[{'minutes': 0, 'name': 'sedentary'}, {'minute...",73,145,,,973000,973000,...,,11/03/19 00:09:00,2019-11-02 23:47:34,11/02/19 23:47:34,973000,20.675,False,False,,
3,26484875738,Walk,90013,"[{'minutes': 1, 'name': 'sedentary'}, {'minute...",99,142,,,1024000,1024000,...,,11/03/19 13:04:22,2019-11-03 12:32:25,11/03/19 12:32:25,1024000,9.144,False,False,,
4,26499646497,Outdoor Bike,1071,"[{'minutes': 0, 'name': 'sedentary'}, {'minute...",110,197,,,1280000,1280000,...,,11/04/19 06:33:36,2019-11-04 05:58:54,11/04/19 05:58:54,1280000,27.432,False,False,,


In [47]:
# Let us count the number and types of activities in dataframe 6

df6['activityName'].value_counts()

Outdoor Bike        76
Walk                33
Run                 26
Treadmill           25
Interval Workout     1
Name: activityName, dtype: int64

### Dataset 7

In [48]:
# Relative path to the exercise log of participant p07
url7 = 'pmdata/p07/fitbit/exercise.json'
# Load data
df7 = pd.read_json(url7)
# View the first five rows
df7.head(5)

Unnamed: 0,logId,activityName,activityTypeId,activityLevel,averageHeartRate,calories,duration,activeDuration,steps,source,...,shouldFetchDetails,distance,distanceUnit,speed,tcxLink,pace,vo2Max,swimLengths,poolLength,poolLengthUnit
0,26512393978,Workout,3000,"[{'minutes': 0, 'name': 'sedentary'}, {'minute...",131.0,341,2351000,2342000,3206.0,"{'type': 'tracker', 'name': 'Versa 2', 'id': '...",...,False,,,,,,,,,
1,26555473792,Walk,90013,"[{'minutes': 5, 'name': 'sedentary'}, {'minute...",91.0,132,1383000,1383000,1483.0,,...,False,,,,,,,,,
2,26559313067,Walk,90013,"[{'minutes': 0, 'name': 'sedentary'}, {'minute...",117.0,156,1178000,1178000,1703.0,,...,False,,,,,,,,,
3,26571167066,Walk,90013,"[{'minutes': 2, 'name': 'sedentary'}, {'minute...",104.0,202,1741000,1741000,1938.0,,...,False,,,,,,,,,
4,26588462989,Workout,3000,"[{'minutes': 0, 'name': 'sedentary'}, {'minute...",134.0,746,4604000,4601000,7412.0,"{'type': 'tracker', 'name': 'Versa 2', 'id': '...",...,False,,,,,,,,,


In [49]:
# Let us count the number and types of activities in dataframe 7

df7['activityName'].value_counts()

Treadmill           82
Walk                31
Run                 18
Sport               16
Hike                15
Workout             10
Outdoor Bike         2
Swim                 1
Interval Workout     1
Name: activityName, dtype: int64

### Dataset 8

In [50]:
# Relative path to the exercise log of participant p08
url8 = 'pmdata/p08/fitbit/exercise.json'
# Load data
df8 = pd.read_json(url8)
# View the first five rows
df8.head(5)

Unnamed: 0,logId,activityName,activityTypeId,activityLevel,averageHeartRate,calories,distance,distanceUnit,duration,activeDuration,...,elevationGain,hasGps,shouldFetchDetails,tcxLink,pace,vo2Max,customHeartRateZones,swimLengths,poolLength,poolLengthUnit
0,26643949293,Treadmill,20049,"[{'minutes': 0, 'name': 'sedentary'}, {'minute...",138.0,774,4.935664,Kilometer,3842000,3838000,...,1.219,False,False,,,,,,,
1,26657422715,Walk,90013,"[{'minutes': 0, 'name': 'sedentary'}, {'minute...",101.0,178,,,1280000,1280000,...,9.144,False,False,,,,,,,
2,26662876854,Run,90009,"[{'minutes': 0, 'name': 'sedentary'}, {'minute...",144.0,224,1.781578,Kilometer,913000,904000,...,76.2,True,True,https://www.fitbit.com/activities/exercise/266...,517.035283,{'vo2Max': 47.731840000000005},,,,
3,26668999160,Treadmill,20049,"[{'minutes': 0, 'name': 'sedentary'}, {'minute...",156.0,626,5.07846,Kilometer,2462000,2460000,...,0.305,False,False,,,,,,,
4,26682069506,Walk,90013,"[{'minutes': 0, 'name': 'sedentary'}, {'minute...",150.0,249,,,1177000,1177000,...,5.283,False,False,,,,,,,


In [52]:
# Let us count the number and types of activities in dataframe 8

df8['activityName'].value_counts()

Run                80
Walk               74
Treadmill          65
Sport              12
Aerobic Workout    10
Weights             8
Elliptical          4
Spinning            4
Swim                3
Workout             1
Name: activityName, dtype: int64

### Dataset 9

In [53]:
# Relative path to the exercise log of participant p09
url9 = 'pmdata/p09/fitbit/exercise.json'
# Load data
df9 = pd.read_json(url9)
# View the first five rows
df9.head(5)

Unnamed: 0,logId,activityName,activityTypeId,activityLevel,averageHeartRate,calories,duration,activeDuration,steps,logType,...,elevationGain,hasGps,shouldFetchDetails,distance,distanceUnit,source,tcxLink,speed,pace,vo2Max
0,26477744222,Walk,90013,"[{'minutes': 19, 'name': 'sedentary'}, {'minut...",96,575,5260000,5260000,5555,auto_detected,...,0.0,False,False,,,,,,,
1,26704603834,Walk,90013,"[{'minutes': 0, 'name': 'sedentary'}, {'minute...",101,137,973000,973000,1419,auto_detected,...,15.24,False,False,,,,,,,
2,26753270984,Walk,90013,"[{'minutes': 0, 'name': 'sedentary'}, {'minute...",105,156,1024000,1024000,1577,auto_detected,...,6.096,False,False,,,,,,,
3,26756778162,Walk,90013,"[{'minutes': 0, 'name': 'sedentary'}, {'minute...",113,154,922000,922000,1505,auto_detected,...,12.192,False,False,,,,,,,
4,26936376296,Walk,90013,"[{'minutes': 5, 'name': 'sedentary'}, {'minute...",102,240,1895000,1895000,2446,auto_detected,...,12.192,False,False,,,,,,,


In [54]:
# Let us count the number and types of activities in dataframe 9

df9['activityName'].value_counts()

Walk               48
Aerobic Workout     4
Run                 2
Name: activityName, dtype: int64

### Dataset 10

In [55]:
# Relative path to the exercise log of participant p10
url10 = 'pmdata/p10/fitbit/exercise.json'
# Load data
df10 = pd.read_json(url10)
# View the first five rows
df10.head(5)

Unnamed: 0,logId,activityName,activityTypeId,activityLevel,averageHeartRate,calories,distance,distanceUnit,duration,activeDuration,...,startTime,originalStartTime,originalDuration,elevationGain,hasGps,shouldFetchDetails,pace,swimLengths,poolLength,poolLengthUnit
0,26564448450,Treadmill,20049,"[{'minutes': 0, 'name': 'sedentary'}, {'minute...",167.0,359,4.866132,Kilometer,1838000,1815000,...,2019-11-07 19:01:53,11/07/19 19:01:53,1838000,0.0,False,False,,,,
1,26580801547,Walk,90013,"[{'minutes': 2, 'name': 'sedentary'}, {'minute...",110.0,106,,,1127000,1127000,...,2019-11-07 20:03:49,11/07/19 20:03:49,1127000,21.336,False,False,,,,
2,26630948021,Walk,90013,"[{'minutes': 0, 'name': 'sedentary'}, {'minute...",117.0,120,,,1075000,1075000,...,2019-11-08 07:47:56,11/08/19 07:47:56,1075000,33.528,False,False,,,,
3,26630948024,Walk,90013,"[{'minutes': 2, 'name': 'sedentary'}, {'minute...",103.0,155,,,1741000,1741000,...,2019-11-09 13:17:55,11/09/19 13:17:55,1741000,21.336,False,False,,,,
4,26591433059,Treadmill,20049,"[{'minutes': 0, 'name': 'sedentary'}, {'minute...",150.0,67,1.121648,Kilometer,437000,420000,...,2019-11-09 13:46:46,11/09/19 13:46:46,437000,0.305,False,False,,,,


In [56]:
# Let us count the number and types of activities in dataframe 10

df10['activityName'].value_counts()

Walk               116
Treadmill           16
Run                  4
Aerobic Workout      1
Outdoor Bike         1
Swim                 1
Elliptical           1
Name: activityName, dtype: int64

### Dataset 11

In [57]:
# Relative path to the exercise log of participant p11
url11 = 'pmdata/p11/fitbit/exercise.json'
# Load data
df11 = pd.read_json(url11)
# View the first five rows
df11.head(5)

Unnamed: 0,logId,activityName,activityTypeId,activityLevel,averageHeartRate,calories,duration,activeDuration,steps,logType,...,originalStartTime,originalDuration,elevationGain,hasGps,shouldFetchDetails,distance,distanceUnit,source,speed,pace
0,26746043476,Walk,90013,"[{'minutes': 0, 'name': 'sedentary'}, {'minute...",113,142,1332000,1332000,2109,auto_detected,...,11/15/19 10:55:24,1332000,12.192,False,False,,,,,
1,26752934740,Walk,90013,"[{'minutes': 0, 'name': 'sedentary'}, {'minute...",111,76,717000,717000,1128,auto_detected,...,11/15/19 13:39:16,717000,6.096,False,False,,,,,
2,26778568808,Walk,90013,"[{'minutes': 0, 'name': 'sedentary'}, {'minute...",111,65,717000,717000,923,auto_detected,...,11/16/19 22:11:54,717000,3.048,False,False,,,,,
3,26778568809,Dancing,3031,"[{'minutes': 0, 'name': 'sedentary'}, {'minute...",123,164,1485000,1485000,1798,auto_detected,...,11/16/19 23:15:03,1485000,,False,False,,,,,
4,26784597230,Treadmill,20049,"[{'minutes': 0, 'name': 'sedentary'}, {'minute...",154,442,2638000,2635000,5219,tracker,...,11/18/19 15:26:15,2638000,5.182,False,False,4.168117,Kilometer,"{'type': 'tracker', 'name': 'Versa 2', 'id': '...",5.696238,


In [58]:
# Let us count the number and types of activities in dataframe 11

df11['activityName'].value_counts()

Walk                    61
Treadmill               14
Spinning                 5
Weights                  3
Circuit Training         3
Workout                  2
Aerobic Workout          2
Cross Country Skiing     2
Dancing                  1
Run                      1
Skiing                   1
Yoga                     1
Name: activityName, dtype: int64

### Dataset 12

In [59]:
# Relative path to the exercise log of participant p12
url12 = 'pmdata/p12/fitbit/exercise.json'
# Load data
df12 = pd.read_json(url12)
# View the first five rows
df12.head(5)

Unnamed: 0,logId,activityName,activityTypeId,activityLevel,averageHeartRate,calories,duration,activeDuration,steps,source,...,startTime,originalStartTime,originalDuration,elevationGain,hasGps,shouldFetchDetails,distance,distanceUnit,speed,pace
0,26586466233,Workout,3000,"[{'minutes': 0, 'name': 'sedentary'}, {'minute...",108,45,1166000,480000,151.0,"{'type': 'tracker', 'name': 'Versa 2', 'id': '...",...,2019-11-07 09:26:57,11/07/19 09:26:57,1166000,0.0,False,False,,,,
1,26623103640,Run,90009,"[{'minutes': 0, 'name': 'sedentary'}, {'minute...",177,356,1485000,1485000,3584.0,,...,2019-11-09 08:30:45,11/09/19 08:30:45,1485000,0.0,False,False,,,,
2,26789881019,Walk,90013,"[{'minutes': 2, 'name': 'sedentary'}, {'minute...",103,162,1484000,1484000,1854.0,,...,2019-11-15 13:08:56,11/15/19 13:08:56,1484000,9.144,False,False,,,,
3,26789881025,Walk,90013,"[{'minutes': 0, 'name': 'sedentary'}, {'minute...",101,162,1536000,1536000,1745.0,,...,2019-11-16 13:10:19,11/16/19 13:10:19,1536000,9.144,False,False,,,,
4,26789881034,Walk,90013,"[{'minutes': 0, 'name': 'sedentary'}, {'minute...",93,92,1024000,1024000,785.0,,...,2019-11-16 15:16:37,11/16/19 15:16:37,1024000,6.096,False,False,,,,


In [60]:
# Let us count the number and types of activities in dataframe 12

df12['activityName'].value_counts()

Walk            75
Run             10
Workout          4
Sport            3
Outdoor Bike     1
Name: activityName, dtype: int64

### Dataset 13

In [61]:
# Relative path to the exercise log of participant p13
url13 = 'pmdata/p13/fitbit/exercise.json'
# Load data
df13 = pd.read_json(url13)
# View the first five rows
df13.head(5)

Unnamed: 0,logId,activityName,activityTypeId,activityLevel,averageHeartRate,calories,distance,distanceUnit,duration,activeDuration,...,speed,pace,lastModified,startTime,originalStartTime,originalDuration,elevationGain,hasGps,shouldFetchDetails,vo2Max
0,26641508476,Run,90009,"[{'minutes': 0, 'name': 'sedentary'}, {'minute...",154.0,287,4.160421,Kilometer,1876000,1816000,...,8.247548,436.493343,11/11/19 15:27:35,2019-11-11 14:51:47,11/11/19 14:51:47,1876000,3.353,True,True,
1,26656339313,Walk,90013,"[{'minutes': 0, 'name': 'sedentary'}, {'minute...",101.0,86,,,1177000,1177000,...,,,11/11/19 15:57:53,2019-11-11 15:24:16,11/11/19 15:24:16,1177000,9.144,False,False,
2,26699525211,Walk,90013,"[{'minutes': 0, 'name': 'sedentary'}, {'minute...",106.0,85,,,973000,973000,...,,,11/13/19 04:35:38,2019-11-12 16:02:21,11/12/19 16:02:21,973000,54.864,False,False,
3,26706522385,Walk,90013,"[{'minutes': 15, 'name': 'sedentary'}, {'minut...",,14,,,922000,922000,...,,,11/13/19 15:07:41,2019-11-13 14:48:39,11/13/19 14:48:39,922000,0.0,False,False,
4,26752705984,Walk,90013,"[{'minutes': 0, 'name': 'sedentary'}, {'minute...",106.0,159,,,1382000,1382000,...,,,11/15/19 18:47:10,2019-11-15 16:29:36,11/15/19 16:29:36,1382000,21.336,False,False,


In [62]:
# Let us count the number and types of activities in dataframe 13

df13['activityName'].value_counts()

Walk            36
Outdoor Bike     6
Run              4
Sport            4
Name: activityName, dtype: int64

### Dataset 14

In [63]:
# Relative path to the exercise log of participant p14
url14 = 'pmdata/p14/fitbit/exercise.json'
# Load data
df14 = pd.read_json(url14)
# View the first five rows
df14.head(5)

Unnamed: 0,logId,activityName,activityTypeId,activityLevel,averageHeartRate,calories,duration,activeDuration,steps,logType,...,originalDuration,elevationGain,hasGps,shouldFetchDetails,customHeartRateZones,distance,distanceUnit,source,speed,pace
0,26720615647,Walk,90013,"[{'minutes': 0, 'name': 'sedentary'}, {'minute...",103,224,1741000,1741000,2832.0,auto_detected,...,1741000,39.624,False,False,,,,,,
1,26742666940,Walk,90013,"[{'minutes': 0, 'name': 'sedentary'}, {'minute...",159,298,1485000,1485000,2532.0,auto_detected,...,1485000,21.336,False,False,,,,,,
2,26745114913,Walk,90013,"[{'minutes': 0, 'name': 'sedentary'}, {'minute...",115,174,1229000,1229000,1677.0,auto_detected,...,1229000,21.336,False,False,,,,,,
3,26749990917,Walk,90013,"[{'minutes': 0, 'name': 'sedentary'}, {'minute...",112,286,2202000,2202000,2843.0,auto_detected,...,2202000,27.432,False,False,,,,,,
4,26780944266,Walk,90013,"[{'minutes': 0, 'name': 'sedentary'}, {'minute...",110,194,1452000,1452000,2423.0,auto_detected,...,1452000,12.192,False,False,"[{'name': 'Below', 'min': 30, 'max': 55, 'minu...",,,,,


In [64]:
# Let us count the number and types of activities in dataframe 14

df14['activityName'].value_counts()

Walk            210
Run              36
Treadmill        12
Workout          10
Yoga              1
Outdoor Bike      1
Name: activityName, dtype: int64

### Dataset 15

In [68]:
# Relative path to the exercise log of participant p15
url15 = 'pmdata/p15/fitbit/exercise.json'
# Load data
df15 = pd.read_json(url15)
# View the first five rows
df15.head(5)

Unnamed: 0,logId,activityName,activityTypeId,activityLevel,averageHeartRate,calories,duration,activeDuration,steps,logType,...,elevationGain,hasGps,shouldFetchDetails,distance,distanceUnit,source,speed,tcxLink,pace,vo2Max
0,26632974511,Walk,90013,"[{'minutes': 0, 'name': 'sedentary'}, {'minute...",113.0,512,3526000,3526000,5924.0,auto_detected,...,57.912,False,False,,,,,,,
1,26649938471,Walk,90013,"[{'minutes': 0, 'name': 'sedentary'}, {'minute...",116.0,322,2099000,2099000,3965.0,auto_detected,...,21.336,False,False,,,,,,,
2,26675654271,Walk,90013,"[{'minutes': 0, 'name': 'sedentary'}, {'minute...",94.0,304,2304000,2304000,4165.0,auto_detected,...,18.288,False,False,,,,,,,
3,26680336968,Walk,90013,"[{'minutes': 0, 'name': 'sedentary'}, {'minute...",122.0,178,1126000,1126000,1751.0,auto_detected,...,9.144,False,False,,,,,,,
4,26668695509,Treadmill,20049,"[{'minutes': 0, 'name': 'sedentary'}, {'minute...",146.0,522,2564000,2560000,5493.0,tracker,...,6.096,False,False,5.027921,Kilometer,"{'type': 'tracker', 'name': 'Versa 2', 'id': '...",7.011478,,,


In [67]:
# Let us count the number and types of activities in dataframe 15

df15['activityName'].value_counts()

Walk               128
Run                 60
Weights             20
Treadmill           11
Bike                10
Sport                5
Aerobic Workout      3
Elliptical           3
Outdoor Bike         2
Workout              1
Name: activityName, dtype: int64

### Dataset 16

In [71]:
# Relative path to the exercise log of participant p16
url16 = 'pmdata/p16/fitbit/exercise.json'
# Load data
df16 = pd.read_json(url16)
# View the first five rows
df16.head(5)

Unnamed: 0,logId,activityName,activityTypeId,activityLevel,averageHeartRate,calories,duration,activeDuration,steps,logType,manualValuesSpecified,heartRateZones,lastModified,startTime,originalStartTime,originalDuration,elevationGain,hasGps,shouldFetchDetails
0,26841255647,Walk,90013,"[{'minutes': 1, 'name': 'sedentary'}, {'minute...",106,82,922000,922000,942.0,auto_detected,"{'calories': False, 'distance': False, 'steps'...","[{'name': 'Out of Range', 'min': 30, 'max': 98...",11/19/19 21:35:49,2019-11-19 21:05:35,11/19/19 21:05:35,922000,15.24,False,False
1,27111188955,Walk,90013,"[{'minutes': 0, 'name': 'sedentary'}, {'minute...",102,82,922000,922000,858.0,auto_detected,"{'calories': False, 'distance': False, 'steps'...","[{'name': 'Out of Range', 'min': 30, 'max': 98...",12/02/19 16:28:52,2019-12-02 15:58:42,12/02/19 15:58:42,922000,15.24,False,False
2,27221284627,Walk,90013,"[{'minutes': 0, 'name': 'sedentary'}, {'minute...",112,94,973000,973000,841.0,auto_detected,"{'calories': False, 'distance': False, 'steps'...","[{'name': 'Out of Range', 'min': 30, 'max': 98...",12/06/19 15:28:09,2019-12-06 14:58:01,12/06/19 14:58:01,973000,15.24,False,False
3,27384796360,Walk,90013,"[{'minutes': 2, 'name': 'sedentary'}, {'minute...",104,91,1126000,1126000,905.0,auto_detected,"{'calories': False, 'distance': False, 'steps'...","[{'name': 'Out of Range', 'min': 30, 'max': 98...",12/14/19 17:35:53,2019-12-14 17:04:08,12/14/19 17:04:08,1126000,15.24,False,False
4,27475858034,Walk,90013,"[{'minutes': 0, 'name': 'sedentary'}, {'minute...",104,264,2816000,2816000,2910.0,auto_detected,"{'calories': False, 'distance': False, 'steps'...","[{'name': 'Out of Range', 'min': 30, 'max': 98...",12/18/19 15:38:02,2019-12-18 14:46:55,12/18/19 14:46:55,2816000,9.144,False,False


In [72]:
# Let us count the number and types of activities in dataframe 16

df16['activityName'].value_counts()

Walk               15
Aerobic Workout     2
Outdoor Bike        2
Name: activityName, dtype: int64

- All 16 datasets are now imported. Let us concatenate them into a single dataset.


In [74]:
df = pd.concat([df1, df2, df3, df4, df5, df6, df7, df8, df9, df10, df11, df12, df13, df14, df15, df16], 
               ignore_index=True)

df.head(5)

Unnamed: 0,logId,activityName,activityTypeId,activityLevel,averageHeartRate,calories,duration,activeDuration,steps,logType,...,distanceUnit,source,tcxLink,speed,pace,vo2Max,swimLengths,poolLength,poolLengthUnit,customHeartRateZones
0,26451905128,Walk,90013,"[{'minutes': 0, 'name': 'sedentary'}, {'minute...",94.0,192,1331000,1331000,1878.0,auto_detected,...,,,,,,,,,,
1,26455950499,Walk,90013,"[{'minutes': 0, 'name': 'sedentary'}, {'minute...",94.0,302,2202000,2202000,2786.0,auto_detected,...,,,,,,,,,,
2,26467488515,Walk,90013,"[{'minutes': 0, 'name': 'sedentary'}, {'minute...",98.0,354,2458000,2458000,3035.0,auto_detected,...,,,,,,,,,,
3,26520401069,Walk,90013,"[{'minutes': 0, 'name': 'sedentary'}, {'minute...",97.0,145,1024000,1024000,1284.0,auto_detected,...,,,,,,,,,,
4,26538035127,Walk,90013,"[{'minutes': 3, 'name': 'sedentary'}, {'minute...",93.0,121,973000,973000,1065.0,auto_detected,...,,,,,,,,,,


In [79]:
# Let us find the total number of rows and columns in our concatenated dataset.

df.shape

(2440, 30)
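
The sixteen load cells above all follow the same pattern, so they could also be written as one loop. The following is only a sketch under that assumption: the helper name `load_exercise_logs` and the extra `participant` column are hypothetical additions that do not appear in the original notebook, while the folder layout (`pmdata/pXX/fitbit/exercise.json`) is taken from the paths used above.

```python
from pathlib import Path

import pandas as pd


def load_exercise_logs(root, participant_ids):
    """Read each participant's exercise.json and stack the rows into one frame.

    Hypothetical helper, not part of the original notebook.
    """
    frames = []
    for pid in participant_ids:
        path = Path(root) / pid / "fitbit" / "exercise.json"
        frame = pd.read_json(path)
        # Assumed extra column: remember which participant each row came from.
        frame["participant"] = pid
        frames.append(frame)
    # Same concatenation as above: columns missing for a participant become NaN.
    return pd.concat(frames, ignore_index=True)


# Folder names p01 .. p16, matching the paths used in the cells above
participant_ids = [f"p{i:02d}" for i in range(1, 17)]
```

With the data in place, `df = load_exercise_logs("pmdata", participant_ids)` would give the same rows as the explicit `pd.concat` call, plus one additional column identifying the participant.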

In [81]:
# Let us inspect the missing values in the dataset

df.isnull().sum()

logId                       0
activityName                0
activityTypeId              0
activityLevel               0
averageHeartRate           13
calories                    0
duration                    0
activeDuration              0
steps                     169
logType                     0
manualValuesSpecified       0
heartRateZones             15
lastModified                0
startTime                   0
originalStartTime           0
originalDuration            0
elevationGain             158
hasGps                      0
shouldFetchDetails          0
distance                 1817
distanceUnit             1817
source                   1687
tcxLink                  2209
speed                    1817
pace                     2096
vo2Max                   2311
swimLengths              2434
poolLength               2434
poolLengthUnit           2434
customHeartRateZones     1935
dtype: int64

- The dataset clearly has missing values. However, we only need complete records for 'activityName' and 'averageHeartRate'. Of these two columns, only 'averageHeartRate' has missing values (13 rows).

- Let us drop the rows where 'activityName' or 'averageHeartRate' is missing.

In [82]:
# Drop rows with a missing activityName or averageHeartRate

df = df.dropna(subset=['activityName', 'averageHeartRate'])

In [83]:
# Let us check whether any 'activityName' or 'averageHeartRate' values are still missing

df.isnull().sum()

logId                       0
activityName                0
activityTypeId              0
activityLevel               0
averageHeartRate            0
calories                    0
duration                    0
activeDuration              0
steps                     163
logType                     0
manualValuesSpecified       0
heartRateZones              2
lastModified                0
startTime                   0
originalStartTime           0
originalDuration            0
elevationGain             149
hasGps                      0
shouldFetchDetails          0
distance                 1815
distanceUnit             1815
source                   1680
tcxLink                  2198
speed                    1815
pace                     2093
vo2Max                   2298
swimLengths              2427
poolLength               2427
poolLengthUnit           2427
customHeartRateZones     1922
dtype: int64
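
Since only two columns matter for this analysis, the check can be narrowed to them, and the number of dropped rows can be reported directly. The sketch below uses a small made-up frame (`toy`) in place of the real data; the variable names `key_columns`, `rows_dropped`, and `remaining` are illustrative and not part of the original notebook.

```python
import pandas as pd

# Toy frame standing in for the concatenated data (values are made up).
toy = pd.DataFrame({
    "activityName": ["Walk", "Run", "Walk", "Yoga"],
    "averageHeartRate": [94.0, None, 103.0, 88.0],
    "steps": [1878.0, None, None, 0.0],
})

key_columns = ["activityName", "averageHeartRate"]

rows_before = len(toy)
# Drop only the rows where one of the two key columns is missing.
cleaned = toy.dropna(subset=key_columns)
rows_dropped = rows_before - len(cleaned)

# Missing values left in the two columns the analysis depends on
remaining = cleaned[key_columns].isna().sum()

print("rows dropped:", rows_dropped)
print(remaining)
```

Other columns, such as `steps`, may still contain missing values after this step. That is expected, because `dropna(subset=...)` only looks at the listed columns.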

- Let us group the records by 'activityName' and compute the mean of 'averageHeartRate' for each activity.

## Average Heart Rate (beats per minute)

In [84]:
# Group activityName, calculate mean of averageHeartRate

df.groupby(['activityName'])['averageHeartRate'].mean()

activityName
Aerobic Workout         117.000000
Bike                    130.600000
Circuit Training        160.666667
Cross Country Skiing    121.111111
Dancing                 123.000000
Elliptical              125.888889
Hike                     95.842105
Interval Workout        124.500000
Outdoor Bike            102.222973
Run                     136.136905
Skiing                  102.000000
Spinning                141.444444
Sport                   104.780220
Tennis                  108.000000
Treadmill               140.191235
Walk                    103.204661
Weights                 117.674699
Workout                 125.974359
Yoga                     88.000000
Name: averageHeartRate, dtype: float64
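
Some activities, such as Tennis, Dancing, and Skiing, each appear only once in the per-participant counts above, so their means rest on a single record. One way to make this visible is to report the number of records next to each mean. This is a sketch on made-up values (`toy`), not on the real data; on the notebook's data, the same `agg` call could be applied to `df`.

```python
import pandas as pd

# Toy records standing in for the cleaned dataset (values are made up).
toy = pd.DataFrame({
    "activityName": ["Walk", "Walk", "Run", "Run", "Run", "Yoga"],
    "averageHeartRate": [100.0, 106.0, 130.0, 136.0, 142.0, 88.0],
})

# Mean heart rate and number of records per activity, highest mean first
summary = (
    toy.groupby("activityName")["averageHeartRate"]
    .agg(["mean", "count"])
    .sort_values("mean", ascending=False)
)

print(summary)
```

A mean backed by a few hundred records (for example Walk or Run in this dataset) is far more reliable than one backed by a single session, so the `count` column helps decide which averages to trust.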

# ***5. Conclusion*** <a id="5"></a>

### SOCCER and NFL Players
- We can treat soccer and NFL players as athletes whose main activity is running. Their average heart rate is therefore taken as 136 beats per minute, from 'Run = 136.136905' in the table above.


### PGA Players
- We can treat PGA players (golfers) as athletes whose main activity is walking. Their average heart rate is therefore taken as 103 beats per minute, from 'Walk = 103.204661' in the table above.


### NBA Players
- For NBA players, we follow the method described at https://fit.nba.com/heart-rate/. The average age of NBA players obtained in the previous data analysis is 27 years. The predicted maximum heart rate is 220 minus the age: 220 - 27 = 193 beats per minute. The target heart rate zone runs from 65% of this maximum (193 x 0.65 ≈ 125) to 85% of it (193 x 0.85 ≈ 164). Ideally, once warmed up, a 27-year-old should exercise hard enough to keep their heart rate between 125 and 164 beats per minute. This is the Target Heart Rate Zone, an exercise level that safely and effectively trains and strengthens the heart and lungs. A healthy exercise heart rate for NBA players should therefore lie between 125 and 164 beats per minute. Taking the midpoint gives (125 + 164)/2 = 144.5. In summary, we estimate the average heart rate of NBA players at roughly 144 beats per minute.
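
The arithmetic above can be reproduced in a few lines. This is a minimal sketch of the 220-minus-age rule as described in the text; the function name `target_heart_rate_zone` and its parameter names are illustrative and do not come from the original notebook.

```python
def target_heart_rate_zone(age, lower=0.65, upper=0.85):
    """Return (max_hr, zone_low, zone_high) in beats per minute.

    Uses the 220-minus-age rule of thumb for the predicted maximum heart
    rate and the 65% to 85% target zone described in the text.
    """
    max_hr = 220 - age
    return max_hr, max_hr * lower, max_hr * upper


# Average NBA player age of 27 years, as used in the text
max_hr, zone_low, zone_high = target_heart_rate_zone(27)
midpoint = (zone_low + zone_high) / 2

print(max_hr, round(zone_low, 2), round(zone_high, 2), round(midpoint, 2))
```

Without intermediate rounding, the zone bounds are 125.45 and 164.05 and the midpoint is 144.75. Rounding the bounds to 125 and 164 first gives 144.5, so the figure of about 144 beats per minute is accurate to within one beat either way.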
