In [101]:
import pandas as pd
import seaborn as sn
from matplotlib import pyplot as plt
import numpy as np

In [102]:
df = pd.read_csv('exercise_calories.csv')
df.shape

(248, 6)

In [103]:
df.head()

Unnamed: 0,"Activity, Exercise or Sport (1 hour)",130 lb,155 lb,180 lb,205 lb,Calories per kg
0,"Cycling, mountain bike, bmx",502,598,695,791,1.75073
1,"Cycling, <10 mph, leisure bicycling",236,281,327,372,0.823236
2,"Cycling, >20 mph, racing",944,1126,1308,1489,3.294974
3,"Cycling, 10-11.9 mph, light",354,422,490,558,1.234853
4,"Cycling, 12-13.9 mph, moderate",472,563,654,745,1.647825


First let's change pounds (lbs) to kilograms (kg) in column names.

In [104]:
print(f"130 lbs = {130/2.205} kg\n155 lbs = {155/2.205} kg\n180 lbs = {180/2.205} kg\n205 lbs = {205/2.205} kg")

130 lbs = 58.95691609977324 kg
155 lbs = 70.29478458049887 kg
180 lbs = 81.63265306122449 kg
205 lbs = 92.9705215419501 kg


In [105]:
df.rename(columns={
    '130 lb': '59 kg',
    '155 lb': '70 kg',
    '180 lb': '82 kg',
    '205 lb': '93 kg'
}, inplace=True)

df.head()

Unnamed: 0,"Activity, Exercise or Sport (1 hour)",59 kg,70 kg,82 kg,93 kg,Calories per kg
0,"Cycling, mountain bike, bmx",502,598,695,791,1.75073
1,"Cycling, <10 mph, leisure bicycling",236,281,327,372,0.823236
2,"Cycling, >20 mph, racing",944,1126,1308,1489,3.294974
3,"Cycling, 10-11.9 mph, light",354,422,490,558,1.234853
4,"Cycling, 12-13.9 mph, moderate",472,563,654,745,1.647825
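
The new column names can also be derived from the old ones instead of being typed by hand. This is only a minimal sketch under a few assumptions: `sample` is a tiny stand-in built from two rows of the table above, and `lb_label_to_kg` is a hypothetical helper name, not something from the dataset or from pandas.

```python
import pandas as pd

LB_PER_KG = 2.205  # same conversion factor used in the cell above

# Two rows copied from df.head(), with the original pound-based column names.
sample = pd.DataFrame({
    "Activity, Exercise or Sport (1 hour)": [
        "Cycling, mountain bike, bmx",
        "Cycling, <10 mph, leisure bicycling",
    ],
    "130 lb": [502, 236],
    "155 lb": [598, 281],
    "180 lb": [695, 327],
    "205 lb": [791, 372],
})


def lb_label_to_kg(label):
    """Turn '130 lb' into '59 kg'; leave any other column name untouched."""
    if not label.endswith(" lb"):
        return label
    pounds = float(label[:-3])
    return f"{round(pounds / LB_PER_KG)} kg"


# DataFrame.rename accepts a function and applies it to every column name.
renamed = sample.rename(columns=lb_label_to_kg)
print(list(renamed.columns))
```

Passing a function to `rename` keeps the labels in sync with the conversion factor, at the cost of being a little less explicit than the hand-written mapping.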


The values in the 'Calories per kg' column look wrong: about 1.75 calories per kg for an activity that burns 502 calories at 59 kg is far too low. I believe they were miscalculated, so I will replace them with proper values. My guess at the formula that produced them is:

In [106]:
502/(130*2.205)

1.7512646084074655

As I suspected, the conversion was done incorrectly. When converting pounds (lb) to kilograms (kg), the author of the dataset multiplied the weight by 2.205 instead of dividing by it. The result, 1.7513, is very close to the 1.75073 stored in the first row.
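
One row is thin evidence, so the same guess can be checked against all five rows shown by `df.head()`. The sketch below is illustrative only: the `check` frame and its column names are made up for this example, and the values are copied by hand from the first table.

```python
import pandas as pd

LB_PER_KG = 2.205

# 130 lb calories and the original 'Calories per kg' values from the first df.head().
check = pd.DataFrame({
    "130 lb": [502, 236, 944, 354, 472],
    "original": [1.75073, 0.823236, 3.294974, 1.234853, 1.647825],
})

# Suspected faulty formula: calories / (pounds * 2.205)
check["suspected"] = check["130 lb"] / (130 * LB_PER_KG)

# Relative gap between the guess and what the dataset actually stores.
check["relative_error"] = (check["suspected"] - check["original"]).abs() / check["original"]
print(check)
```

The gaps are small but not zero, so the original author may have used a slightly different constant or combined several weight columns. The guess explains the size of the values, not necessarily the exact recipe.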

In [107]:
# Divide by the exact weight in kg (130 lb / 2.205), not the rounded column label.
df['Calories per kg'] = df['59 kg'] / (130 / 2.205)

df.head()

Unnamed: 0,"Activity, Exercise or Sport (1 hour)",59 kg,70 kg,82 kg,93 kg,Calories per kg
0,"Cycling, mountain bike, bmx",502,598,695,791,8.514692
1,"Cycling, <10 mph, leisure bicycling",236,281,327,372,4.002923
2,"Cycling, >20 mph, racing",944,1126,1308,1489,16.011692
3,"Cycling, 10-11.9 mph, light",354,422,490,558,6.004385
4,"Cycling, 12-13.9 mph, moderate",472,563,654,745,8.005846


In [108]:
502/59

8.508474576271187

The column now holds corrected values (the quick check with the rounded weight, 502/59 ≈ 8.51, agrees with the first row), so we can continue the analysis.
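
The replacement above uses only the lightest weight class. An alternative, which is not what this notebook does, would be to compute calories per kg for each of the four weight columns and average them. A minimal sketch, assuming the renamed columns and using two rows from the table; `POUNDS` and the new column name are hypothetical:

```python
import pandas as pd

LB_PER_KG = 2.205

# Renamed column label -> original weight in pounds.
POUNDS = {"59 kg": 130, "70 kg": 155, "82 kg": 180, "93 kg": 205}

sample = pd.DataFrame({
    "59 kg": [502, 236],
    "70 kg": [598, 281],
    "82 kg": [695, 327],
    "93 kg": [791, 372],
})

# Exact body weight in kg for each column, indexed by column label.
weights_kg = pd.Series(POUNDS) / LB_PER_KG

# Divide every column by its own weight (aligned on column labels), then average per row.
per_kg = sample[list(POUNDS)].div(weights_kg, axis="columns")
sample["Calories per kg (mean)"] = per_kg.mean(axis=1)
print(sample)
```

For these two rows the averaged figure lands very close to the single-column one, which suggests the simpler calculation is good enough for ranking exercises.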

I want to find which exercises in this dataset are best for losing weight. To do that, let's sort by 'Calories per kg' in descending order.

In [109]:
sorted_by_best_exercises = df.sort_values('Calories per kg', ascending=False)
sorted_by_best_exercises.head(10)

Unnamed: 0,"Activity, Exercise or Sport (1 hour)",59 kg,70 kg,82 kg,93 kg,Calories per kg
47,"Running, 10.9 mph (5.5 min mile)",1062,1267,1471,1675,18.013154
217,"Cross country skiing, uphill",974,1161,1348,1536,16.520538
46,"Running, 10 mph (6 min mile)",944,1126,1308,1489,16.011692
2,"Cycling, >20 mph, racing",944,1126,1308,1489,16.011692
188,"Skin diving, fast",944,1126,1308,1489,16.011692
45,"Running, 9 mph (6.5 min mile)",885,1056,1226,1396,15.010962
51,"Running, stairs, up",885,1056,1226,1396,15.010962
212,"Speed skating, ice, competitive",885,1056,1226,1396,15.010962
216,"Cross country skiing, racing",826,985,1144,1303,14.010231
44,"Running, 8.6 mph (7 min mile)",826,985,1144,1303,14.010231


It seems that running for 1 hour at 10.9 mph burns the most calories. Let me convert that to km/h so I can make sense of it.

In [110]:
10.9 * 1.609

17.5381

17.5 km/h is a very high speed, and I doubt it's something many people can keep up for a full hour. Let's see if we can find something that maximises weight loss while being something an average person could do, or at least something they could work towards more easily than maintaining that pace for an hour.

There is no fancy formula for this: I have to read the name of each exercise and apply my own judgement, so from here on I will mostly be scrolling through the sorted dataset until I run into something good.

Cycling at over 30 km/h (the '>20 mph' entry, converted below) for a full hour could be possible, but just in case let's explore some more options.

In [111]:
20 * 1.609

32.18
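
Rather than converting speeds one at a time, the mph figure can be pulled out of the activity names and converted in bulk. This is a rough sketch, not part of the original analysis: the regular expression assumes the speed is the number directly before "mph" (so a range like "10-11.9 mph" yields its upper bound), and the variable names are made up for illustration.

```python
import pandas as pd

KM_PER_MILE = 1.609  # same factor used in the cells above

# A few activity names copied from the tables in this notebook.
activities = pd.Series([
    "Running, 10.9 mph (5.5 min mile)",
    "Cycling, >20 mph, racing",
    "Running, 7.5mph (8 min mile)",
    "Cross country skiing, uphill",
])

# Capture the number right before 'mph'; the space is optional, as in '7.5mph'.
mph = activities.str.extract(r"(\d+(?:\.\d+)?)\s*mph", expand=False).astype(float)

# Names without a speed stay NaN after the conversion.
kmh = (mph * KM_PER_MILE).round(1)
print(pd.DataFrame({"activity": activities, "mph": mph, "km/h": kmh}))
```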

In [114]:
# Renumber the rows 0..n-1 so the index matches the rank in the sorted order.
sorted_by_best_exercises = sorted_by_best_exercises.reset_index(drop=True)

sorted_by_best_exercises.head()

Unnamed: 0,"Activity, Exercise or Sport (1 hour)",59 kg,70 kg,82 kg,93 kg,Calories per kg
0,"Running, 10.9 mph (5.5 min mile)",1062,1267,1471,1675,18.013154
1,"Cross country skiing, uphill",974,1161,1348,1536,16.520538
2,"Running, 10 mph (6 min mile)",944,1126,1308,1489,16.011692
3,"Cycling, >20 mph, racing",944,1126,1308,1489,16.011692
4,"Skin diving, fast",944,1126,1308,1489,16.011692


In [116]:
# Rows ranked 10 to 19 (positional slice; same rows as filtering the 0..n-1 index).
sorted_by_best_exercises.iloc[10:20]

Unnamed: 0,"Activity, Exercise or Sport (1 hour)",59 kg,70 kg,82 kg,93 kg,Calories per kg
10,"Running, 8 mph (7.5 min mile)",797,950,1103,1256,13.518346
11,"Running, 7.5mph (8 min mile)",738,880,1022,1163,12.517615
12,"Skin diving, moderate",738,880,1022,1163,12.517615
13,"Stationary cycling, very vigorous",738,880,1022,1163,12.517615
14,"Cycling, 16-19 mph, very fast, racing",708,844,981,1117,12.008769
15,"Jumping rope, fast",708,844,981,1117,12.008769
16,"Boxing, in ring",708,844,981,1117,12.008769
17,"Roller blading, in-line skating",708,844,981,1117,12.008769
18,Squash,708,844,981,1117,12.008769
19,"Rowing machine, very vigorous",708,844,981,1117,12.008769


Jumping rope is a great exercise as well, so if you are able to do it, it would not be a bad choice.
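
Scrolling through the ranking gets easier if near-duplicate entries (the same activity at several speeds) are collapsed first. The sketch below is one possible way to do that and rests on an assumption of mine: that the text before the first comma in a name identifies the activity "family". The `ranked` frame holds a few rows copied from the table above, and all names in it are illustrative.

```python
import pandas as pd

ranked = pd.DataFrame({
    "activity": [
        "Running, 8 mph (7.5 min mile)",
        "Running, 7.5mph (8 min mile)",
        "Skin diving, moderate",
        "Stationary cycling, very vigorous",
        "Jumping rope, fast",
        "Squash",
    ],
    "calories_per_kg": [13.518346, 12.517615, 12.517615, 12.517615, 12.008769, 12.008769],
})

# Hypothetical 'family' label: everything before the first comma in the name.
ranked["family"] = ranked["activity"].str.split(",").str[0].str.strip()

# Keep only the highest-burning entry of each family, best first.
best_per_family = (
    ranked.sort_values("calories_per_kg", ascending=False)
          .drop_duplicates("family")
          .reset_index(drop=True)
)
print(best_per_family[["family", "activity", "calories_per_kg"]])
```

This only shortens the list. Deciding which of the remaining activities is realistic for an average person is still a manual call.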

Let's explore the worst exercises, according to this dataset, for weight loss.

In [118]:
sorted_by_worst_exercises = df.sort_values('Calories per kg', ascending=True)
# Renumber the rows 0..n-1 so the index matches the rank in the sorted order.
sorted_by_worst_exercises = sorted_by_worst_exercises.reset_index(drop=True)
sorted_by_worst_exercises.head(10)

Unnamed: 0,"Activity, Exercise or Sport (1 hour)",59 kg,70 kg,82 kg,93 kg,Calories per kg
0,Watering lawn or garden,89,106,123,140,1.509577
1,"Walking, under 2.0 mph, very slow",118,141,163,186,2.001462
2,"Walking 2.0 mph, slow",148,176,204,233,2.510308
3,Croquet,148,176,204,233,2.510308
4,Billiards,148,176,204,233,2.510308
5,"Horseback riding, walking",148,176,204,233,2.510308
6,Bird watching,148,176,204,233,2.510308
7,Pushing stroller or walking with children,148,176,204,233,2.510308
8,"Football or baseball, playing catch",148,176,204,233,2.510308
9,Mild stretching,148,176,204,233,2.510308


Yeah, those certainly aren't great for weight loss...

I thought about visualising this data, but the distribution is skewed towards a few very intense exercises, so I don't think there is much point.

In [119]:
df['Calories per kg'].describe()

count    248.000000
mean       6.610828
std        3.297900
min        1.509577
25%        4.002923
50%        6.004385
75%        8.005846
max       18.013154
Name: Calories per kg, dtype: float64

In [120]:
6.610828 + 3*3.297900

16.504528

In [121]:
df[df['Calories per kg'] > 16.504528].shape

(2, 6)

Even if I removed the 'outliers', the data would still be skewed. I am not sure they can even be called outliers: they are perfectly normal exercises, and I can't really remove them because they are exactly the ones that burn the most. There are also only 2 of them, so nothing would really change.
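
The skew and the three-standard-deviation cutoff can be computed directly instead of copying numbers from `describe()`. The sketch below is hedged in two ways: `upper_cutoff` and `summarise_tail` are hypothetical helper names, and `illustrative` is a handful of values picked by hand from the tables above, so its skewness says nothing reliable about the full 248-row column.

```python
import pandas as pd


def upper_cutoff(mean, std, k=3.0):
    """Upper outlier threshold: mean plus k standard deviations."""
    return mean + k * std


def summarise_tail(values, k=3.0):
    """Return (sample skewness, upper cutoff, number of values above the cutoff)."""
    cutoff = upper_cutoff(values.mean(), values.std(), k)
    n_above = int((values > cutoff).sum())
    return values.skew(), cutoff, n_above


# Reproduce the notebook's threshold from the mean and std reported by describe().
notebook_cutoff = upper_cutoff(6.610828, 3.297900)
print(notebook_cutoff)

# Hand-picked values from the tables above, for illustration only.
illustrative = pd.Series([
    1.509577, 2.001462, 2.510308, 4.002923, 6.004385,
    8.005846, 8.514692, 12.008769, 16.011692, 18.013154,
])
skewness, cutoff, n_above = summarise_tail(illustrative)
print(skewness, cutoff, n_above)
```

On the real data, `summarise_tail(df['Calories per kg'])` would give the skewness alongside the same cutoff and outlier count computed in the cells above.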