<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Problem" data-toc-modified-id="Problem-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Problem</a></span></li><li><span><a href="#Introduction" data-toc-modified-id="Introduction-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Introduction</a></span></li><li><span><a href="#Creating-the-Dataset" data-toc-modified-id="Creating-the-Dataset-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Creating the Dataset</a></span></li><li><span><a href="#Changing-from-Metric-to-Imperial" data-toc-modified-id="Changing-from-Metric-to-Imperial-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Changing from Metric to Imperial</a></span></li><li><span><a href="#Checking-Data-Integrity" data-toc-modified-id="Checking-Data-Integrity-5"><span class="toc-item-num">5&nbsp;&nbsp;</span>Checking Data Integrity</a></span></li><li><span><a href="#Feature-Engineering" data-toc-modified-id="Feature-Engineering-6"><span class="toc-item-num">6&nbsp;&nbsp;</span>Feature Engineering</a></span></li><li><span><a href="#Analysis" data-toc-modified-id="Analysis-7"><span class="toc-item-num">7&nbsp;&nbsp;</span>Analysis</a></span></li><li><span><a href="#Preprocessing" data-toc-modified-id="Preprocessing-8"><span class="toc-item-num">8&nbsp;&nbsp;</span>Preprocessing</a></span></li></ul></div>

## Problem

Knowing one's total proportion of body fat is useful from both a health standpoint and a cosmetic standpoint. The two most accurate ways to measure it are a DEXA scan and hydrostatic weighing, but both are expensive and time-consuming for the average individual. By using information about patients who have undergone those tests, we may be able to find relationships among easily measured variables that support a more affordable and accessible, but still fairly accurate, estimate of a person's body fat percentage. Current algorithms based on easily accessed data are considered only slightly accurate across the population as a whole, and can be wildly inaccurate for an individual.

In [3]:
## Based on the research I have acquired, frame size has a positive correlation with total body fat (TBF) and fat-free mass (FFM)... Can we improve the current
## measure of body fat based on BMI or specific measurements by incorporating the frame size into those calculations?

## Introduction

The data set contains 252 adult male samples, so any predictions built from it will likely apply only to the adult male population. I believe it may be important to create separate models by gender, and possibly by other unforeseen factors, to reach the desired accuracy for the individual rather than across the population. Each sample in this set has had their total body fat measured through hydrostatic weighing. Density is the resulting column (X*X). That value was then plugged into two separate formulas (Siri and Brozek) to calculate the body fat proportion. Brozek has been shown to be more accurate on samples that have not had any recent weight fluctuations.

(I will either use Brozek, because the majority don't have regular weight fluctuations, or the mean of the two.)
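
The two conversions from density to body fat percentage are commonly written as Siri: 495 / D - 450 and Brozek: 457 / D - 414.2, with D the body density in g/cm^3. A minimal sketch, assuming those standard forms (the function names are illustrative and not part of the original notebook):

```python
def siri_bodyfat(density: float) -> float:
    """Siri equation: percent body fat from body density (g/cm^3)."""
    return 495.0 / density - 450.0


def brozek_bodyfat(density: float) -> float:
    """Brozek equation: percent body fat from body density (g/cm^3)."""
    return 457.0 / density - 414.2


# Density of the first sample in the dataset displayed later in this notebook.
density = 1.0708
print(round(siri_bodyfat(density), 1), round(brozek_bodyfat(density), 1))
```

For this density the two functions round to 12.3 and 12.6, which agrees with the `Bodyfat_Siri` and `Bodyfat_Brozek` values listed for that sample.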

## Creating the Dataset

In [8]:
import pandas as pd
import matplotlib.pyplot as plt
from pandas import DataFrame
import math
import numpy as np

In [9]:
#Importing the main dataset
df = pd.read_csv('Bodyfat.csv')

In [10]:
#Importing an additional dataset with more features to add to the original
# A raw string keeps '\s+' from being read as an invalid escape sequence
df2 = pd.read_csv('fat.dat.txt', header=None, delimiter=r'\s+')

In [11]:
#Combining extra features from the additional df (df2)
df['Bodyfat_Brozek'] = df2[1]
df['BMI'] = df2[7]
df['Lean_Weight_Brozek'] = df2[8]

In [12]:
#Renaming column for uniformity
df.rename(columns={'bodyfat':'Bodyfat_Siri'}, inplace=True)

In [13]:
df.style.background_gradient(cmap ='viridis').set_properties(**{'font-size': '20px'}) 

Unnamed: 0,Density,Bodyfat_Siri,Age,Weight,Height,Neck,Chest,Abdomen,Hip,Thigh,Knee,Ankle,Biceps,Forearm,Wrist,Bodyfat_Brozek,BMI,Lean_Weight_Brozek
0,1.0708,12.3,23,154.25,67.75,36.2,93.1,85.2,94.5,59.0,37.3,21.9,32.0,27.4,17.1,12.6,23.7,134.9
1,1.0853,6.1,22,173.25,72.25,38.5,93.6,83.0,98.7,58.7,37.3,23.4,30.5,28.9,18.2,6.9,23.4,161.3
2,1.0414,25.3,22,154.0,66.25,34.0,95.8,87.9,99.2,59.6,38.9,24.0,28.8,25.2,16.6,24.6,24.7,116.0
3,1.0751,10.4,26,184.75,72.25,37.4,101.8,86.4,101.2,60.1,37.3,22.8,32.4,29.4,18.2,10.9,24.9,164.7
4,1.034,28.7,24,184.25,71.25,34.4,97.3,100.0,101.9,63.2,42.2,24.0,32.2,27.7,17.7,27.8,25.6,133.1
5,1.0502,20.9,24,210.25,74.75,39.0,104.5,94.4,107.8,66.0,42.0,25.6,35.7,30.6,18.8,20.6,26.5,167.0
6,1.0549,19.2,26,181.0,69.75,36.4,105.1,90.7,100.3,58.4,38.3,22.9,31.9,27.8,17.7,19.0,26.2,146.6
7,1.0704,12.4,25,176.0,72.5,37.8,99.6,88.5,97.1,60.0,39.4,23.2,30.5,29.0,18.8,12.8,23.6,153.6
8,1.09,4.1,25,191.0,74.0,38.1,100.9,82.5,99.9,62.9,38.3,23.8,35.9,31.1,18.2,5.1,24.6,181.3
9,1.0722,11.7,23,198.25,73.5,42.1,99.6,88.6,104.1,63.1,41.7,25.0,35.6,30.0,19.2,12.0,25.8,174.4


In [14]:
df

Unnamed: 0,Density,Bodyfat_Siri,Age,Weight,Height,Neck,Chest,Abdomen,Hip,Thigh,Knee,Ankle,Biceps,Forearm,Wrist,Bodyfat_Brozek,BMI,Lean_Weight_Brozek
0,1.0708,12.3,23,154.25,67.75,36.2,93.1,85.2,94.5,59.0,37.3,21.9,32.0,27.4,17.1,12.6,23.7,134.9
1,1.0853,6.1,22,173.25,72.25,38.5,93.6,83.0,98.7,58.7,37.3,23.4,30.5,28.9,18.2,6.9,23.4,161.3
2,1.0414,25.3,22,154.00,66.25,34.0,95.8,87.9,99.2,59.6,38.9,24.0,28.8,25.2,16.6,24.6,24.7,116.0
3,1.0751,10.4,26,184.75,72.25,37.4,101.8,86.4,101.2,60.1,37.3,22.8,32.4,29.4,18.2,10.9,24.9,164.7
4,1.0340,28.7,24,184.25,71.25,34.4,97.3,100.0,101.9,63.2,42.2,24.0,32.2,27.7,17.7,27.8,25.6,133.1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
247,1.0736,11.0,70,134.25,67.00,34.9,89.2,83.6,88.8,49.6,34.8,21.5,25.6,25.7,18.5,11.5,21.1,118.9
248,1.0236,33.6,72,201.00,69.75,40.9,108.5,105.0,104.5,59.6,40.8,23.2,35.2,28.6,20.1,32.3,29.1,136.1
249,1.0328,29.3,72,186.75,66.00,38.9,111.1,111.5,101.7,60.3,37.3,21.5,31.3,27.2,18.0,28.3,30.2,133.9
250,1.0399,26.0,72,190.75,70.50,38.9,108.3,101.3,97.8,56.0,41.6,22.7,30.5,29.4,19.8,25.3,27.0,142.6


In [4]:
## I need to create another column which uses the military measure for bodyfat on each sample

In [67]:
## The formula used by the DoD is 86.010 x log10(abdomen - neck) - 70.041 x log10(height) + 36.76
# Creating a copy of the dataset with only the 3 variables used in the DoD measurement
# I'm going to convert Neck and Abdomen from centimeters to inches for the equation (Height is already in inches)
df3 = df[['Height', 'Neck', 'Abdomen']].copy()

df3['Neck'] = df3['Neck'] * 0.393701
df3['Abdomen'] = df3['Abdomen'] * 0.393701

df3

Unnamed: 0,Height,Neck,Abdomen
0,67.75,14.251976,33.543325
1,72.25,15.157489,32.677183
2,66.25,13.385834,34.606318
3,72.25,14.724417,34.015766
4,71.25,13.543314,39.370100
...,...,...,...
247,67.00,13.740165,32.913404
248,69.75,16.102371,41.338605
249,66.00,15.314969,43.897662
250,70.50,15.314969,39.881911


In [72]:
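# Using the DoD formula against all samples to create a list of body fat percentages, then adding it as a column to df3
# Caution: the nested loop over Abdomen and Neck below appends 252 x 252 differences, and zip() keeps only
# the first 252, so every row is computed from the first Abdomen value paired with that row's Neck and Height.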

bf = []
ans1 = []
ans2 = []

for x in df3['Height']:
    ans2.append(np.log10(x))

for x in df3['Abdomen']:
    for y in df3['Neck']:
        ans1.append(np.log10(x-y))
        
answers = zip(ans1, ans2)

for x, y in answers:
    bf.append(round(86.010 * x - 70.041 * y + 36.76, 2))
print(bf)

df3['DoD_BF'] = bf
df3['Brozek'] = df['Bodyfat_Brozek']
df3

[19.08, 15.32, 21.4, 16.19, 18.89, 13.89, 18.04, 15.77, 14.91, 11.81, 14.39, 13.06, 16.58, 15.02, 14.87, 19.72, 15.53, 12.86, 17.68, 13.58, 16.68, 14.08, 20.57, 18.61, 20.35, 17.82, 19.19, 17.15, 20.3, 18.14, 14.54, 16.7, 16.07, 14.8, 11.81, 18.62, 16.29, 14.06, 3.47, 16.23, 12.87, 44.06, 17.23, 13.16, 22.38, 17.08, 21.12, 18.74, 21.25, 21.17, 18.1, 19.94, 18.85, 17.43, 17.95, 14.29, 16.68, 14.87, 14.33, 18.27, 12.51, 19.2, 16.58, 18.49, 14.98, 15.42, 18.21, 16.9, 17.4, 17.36, 17.97, 16.67, 16.34, 23.03, 19.76, 19.34, 16.34, 16.78, 16.73, 17.48, 17.36, 17.82, 14.47, 16.13, 16.61, 18.96, 18.14, 19.76, 15.62, 16.42, 17.31, 15.94, 16.86, 15.93, 15.34, 10.95, 15.7, 17.69, 19.08, 14.21, 13.5, 16.61, 17.47, 12.9, 15.72, 22.44, 14.72, 11.58, 13.98, 19.9, 17.97, 17.39, 14.9, 18.65, 15.93, 18.33, 16.45, 13.37, 14.75, 15.72, 12.5, 14.83, 18.09, 18.99, 18.48, 19.11, 16.51, 18.07, 13.85, 16.55, 17.97, 17.79, 15.46, 18.29, 18.07, 16.95, 17.14, 15.23, 18.96, 12.3, 16.72, 18.07, 18.07, 17.65, 13.59, 

Unnamed: 0,Height,Neck,Abdomen,DoD_BF,Brozek
0,67.75,14.251976,33.543325,19.08,12.6
1,72.25,15.157489,32.677183,15.32,6.9
2,66.25,13.385834,34.606318,21.40,24.6
3,72.25,14.724417,34.015766,16.19,10.9
4,71.25,13.543314,39.370100,18.89,27.8
...,...,...,...,...,...
247,67.00,13.740165,32.913404,20.39,11.5
248,69.75,16.102371,41.338605,14.42,32.3
249,66.00,15.314969,43.897662,17.75,28.3
250,70.50,15.314969,39.881911,15.75,25.3
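
A vectorized sketch of the same DoD formula that keeps each row's abdomen, neck, and height together. It assumes all three inputs are in inches; the helper name `dod_bodyfat` and the two-row `sample` frame are illustrative only:

```python
import numpy as np
import pandas as pd


def dod_bodyfat(height_in, neck_in, abdomen_in):
    """DoD circumference estimate for men; all inputs in inches."""
    return (86.010 * np.log10(abdomen_in - neck_in)
            - 70.041 * np.log10(height_in) + 36.76)


# First two rows of df3 as displayed above (inches).
sample = pd.DataFrame({
    'Height': [67.75, 72.25],
    'Neck': [14.251976, 15.157489],
    'Abdomen': [33.543325, 32.677183],
})

# Column arithmetic is element-wise, so row i only uses row i's measurements.
sample['DoD_BF'] = dod_bodyfat(
    sample['Height'], sample['Neck'], sample['Abdomen']
).round(2)
print(sample)
```

For the second row this gives roughly 13.5, whereas the table above lists 15.32, because the loop version pairs the first abdomen value with every neck value.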


In [75]:
# Adding a column for the absolute error between the actual (Brozek) and DoD measurements
df3['Error'] = abs(df3['DoD_BF'] - df3['Brozek'])
df['Error'] = df3['Error']

In [74]:
df3

Unnamed: 0,Height,Neck,Abdomen,DoD_BF,Brozek,Error
0,67.75,14.251976,33.543325,19.08,12.6,6.48
1,72.25,15.157489,32.677183,15.32,6.9,8.42
2,66.25,13.385834,34.606318,21.40,24.6,3.20
3,72.25,14.724417,34.015766,16.19,10.9,5.29
4,71.25,13.543314,39.370100,18.89,27.8,8.91
...,...,...,...,...,...,...
247,67.00,13.740165,32.913404,20.39,11.5,8.89
248,69.75,16.102371,41.338605,14.42,32.3,17.88
249,66.00,15.314969,43.897662,17.75,28.3,10.55
250,70.50,15.314969,39.881911,15.75,25.3,9.55


## Checking Data Integrity

The minimum essential amount of body fat required for life in a male is 2-5%.

In [76]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 252 entries, 0 to 251
Data columns (total 24 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   Density             252 non-null    float64
 1   Bodyfat_Siri        252 non-null    float64
 2   Age                 252 non-null    int64  
 3   Weight              252 non-null    float64
 4   Height              252 non-null    float64
 5   Neck                252 non-null    float64
 6   Chest               252 non-null    float64
 7   Abdomen             252 non-null    float64
 8   Hip                 252 non-null    float64
 9   Thigh               252 non-null    float64
 10  Knee                252 non-null    float64
 11  Ankle               252 non-null    float64
 12  Biceps              252 non-null    float64
 13  Forearm             252 non-null    float64
 14  Wrist               252 non-null    float64
 15  Bodyfat_Brozek      252 non-null    float64
 16  BMI     

In [77]:
df[df['Error'] > 10]

Unnamed: 0,Density,Bodyfat_Siri,Age,Weight,Height,Neck,Chest,Abdomen,Hip,Thigh,...,Wrist,Bodyfat_Brozek,BMI,Lean_Weight_Brozek,Lean_Weight_Siri,Wrist_Inches,Frame,Knee_Inches,Skin_Weight,Error
16,1.0333,29.0,34,195.75,71.00,38.9,101.9,96.4,105.2,64.8,...,17.3,28.1,27.3,140.8,138.98,6.81,0,16.06,31.32,12.57
25,1.0911,3.7,27,159.25,71.50,35.7,89.6,79.7,96.5,55.0,...,17.7,4.6,21.9,151.9,153.36,6.97,0,14.45,25.48,13.22
26,1.0811,7.9,34,131.50,67.50,36.2,88.6,74.6,85.3,51.7,...,16.5,8.5,20.3,120.3,121.11,6.50,0,13.66,21.04,10.69
28,1.0910,3.7,27,133.25,64.75,36.4,93.5,73.9,88.5,50.1,...,17.2,4.7,22.4,127.0,128.32,6.77,0,13.58,21.32,15.60
31,1.0862,5.7,29,160.25,71.25,37.3,93.5,84.5,100.6,58.5,...,17.9,6.5,22.2,149.8,151.12,7.05,0,15.28,25.64,10.20
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
244,1.0334,29.0,67,199.50,68.50,40.7,118.3,106.1,101.6,58.2,...,18.5,28.1,29.9,143.6,141.64,7.28,0,15.28,31.92,12.96
246,1.0308,30.2,69,215.50,70.50,40.8,113.7,107.6,110.0,63.3,...,18.8,29.1,30.5,152.7,150.42,7.40,0,17.32,34.48,14.92
248,1.0236,33.6,72,201.00,69.75,40.9,108.5,105.0,104.5,59.6,...,20.1,32.3,29.1,136.1,133.46,7.91,1,16.06,32.16,17.88
249,1.0328,29.3,72,186.75,66.00,38.9,111.1,111.5,101.7,60.3,...,18.0,28.3,30.2,133.9,132.03,7.09,0,14.69,29.88,10.55


In [78]:
df[df['Error'] < 3]

Unnamed: 0,Density,Bodyfat_Siri,Age,Weight,Height,Neck,Chest,Abdomen,Hip,Thigh,...,Wrist,Bodyfat_Brozek,BMI,Lean_Weight_Brozek,Lean_Weight_Siri,Wrist_Inches,Frame,Knee_Inches,Skin_Weight,Error
6,1.0549,19.2,26,181.0,69.75,36.4,105.1,90.7,100.3,58.4,...,17.7,19.0,26.2,146.6,146.25,6.97,0,15.08,28.96,0.96
7,1.0704,12.4,25,176.0,72.5,37.8,99.6,88.5,97.1,60.0,...,18.8,12.8,23.6,153.6,154.18,7.4,0,15.51,28.16,2.97
9,1.0722,11.7,23,198.25,73.5,42.1,99.6,88.6,104.1,63.1,...,19.2,12.0,25.8,174.4,175.05,7.56,1,16.42,31.72,0.19
15,1.0512,20.9,35,162.75,66.0,36.4,99.1,92.8,99.2,63.1,...,16.9,20.5,26.3,129.3,128.74,6.65,0,15.24,26.04,0.78
18,1.0622,16.0,28,183.75,67.75,38.0,106.8,89.6,102.4,64.2,...,18.5,16.1,28.2,154.3,154.35,7.28,0,15.24,29.4,1.58
19,1.061,16.5,33,211.75,73.5,40.0,106.2,100.5,109.0,65.8,...,18.2,16.5,27.6,176.8,176.81,7.17,0,15.98,33.88,2.92
20,1.0551,19.1,28,179.0,68.0,39.1,103.3,95.9,104.9,63.5,...,18.4,19.0,27.3,145.1,144.81,7.24,0,14.96,28.64,2.32
21,1.064,15.2,28,200.5,69.75,41.3,111.4,98.8,104.8,63.4,...,19.9,15.3,29.1,169.8,170.02,7.83,1,15.98,32.08,1.22
23,1.0584,17.7,32,148.75,70.0,35.5,86.7,80.0,93.4,54.9,...,17.1,17.6,21.4,122.6,122.42,6.73,0,14.25,23.8,1.01
30,1.0716,11.9,32,182.0,73.75,38.7,100.5,88.7,99.8,57.5,...,18.4,12.3,23.6,159.7,160.34,7.24,0,15.24,29.12,2.24


In [16]:
df[df['Bodyfat_Siri'] < 5]

Unnamed: 0,Density,Bodyfat_Siri,Age,Weight,Height,Neck,Chest,Abdomen,Hip,Thigh,Knee,Ankle,Biceps,Forearm,Wrist,Bodyfat_Brozek,BMI,Lean_Weight_Brozek
8,1.09,4.1,25,191.0,74.0,38.1,100.9,82.5,99.9,62.9,38.3,23.8,35.9,31.1,18.2,5.1,24.6,181.3
25,1.0911,3.7,27,159.25,71.5,35.7,89.6,79.7,96.5,55.0,36.7,22.5,29.9,28.2,17.7,4.6,21.9,151.9
28,1.091,3.7,27,133.25,64.75,36.4,93.5,73.9,88.5,50.1,34.5,21.3,30.5,27.9,17.2,4.7,22.4,127.0
49,1.0903,4.0,47,127.5,66.75,34.0,83.4,70.4,87.2,50.6,34.4,21.9,26.8,25.8,16.8,5.0,20.2,121.2
54,1.0906,3.9,42,136.25,67.5,37.8,87.6,77.6,88.6,51.9,34.9,22.5,27.7,27.5,18.5,4.9,21.1,129.6
170,1.0926,3.0,35,152.25,67.75,37.0,92.2,81.9,92.8,54.7,36.2,22.1,30.4,27.4,17.7,4.1,23.4,146.1
171,1.0983,0.7,35,125.75,65.5,34.0,90.8,75.0,89.2,50.0,34.8,22.0,24.8,25.9,16.9,1.9,20.6,123.4
181,1.1089,0.0,40,118.5,68.0,33.8,79.3,69.4,85.0,47.2,33.5,20.2,27.7,24.6,16.5,0.0,18.1,118.5


In [17]:
## Samples 171 and 181 have estimated body fat percentages below 2%, the minimum required for life. I can either remove them as inaccurate or test a model
## that increases the estimated body fat by 2 for all samples. I would like to find the measurements responsible for the inaccuracies and maybe remove or alter
## them for a model.
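
One way to act on this note is to flag rows below the essential-fat floor before deciding whether to drop them. A small sketch, assuming the 2% lower bound quoted above and using three of the low body-fat rows from the table above (the `Below_Essential` column name is hypothetical):

```python
import pandas as pd

ESSENTIAL_FAT_MIN = 2.0  # lower end of the 2-5% range quoted above

# Three of the low body-fat rows shown above, keyed by their original index.
low = pd.DataFrame(
    {'Bodyfat_Siri': [3.0, 0.7, 0.0], 'Bodyfat_Brozek': [4.1, 1.9, 0.0]},
    index=[170, 171, 181],
)

# Flag a row when either estimate falls below the floor.
low['Below_Essential'] = (
    low[['Bodyfat_Siri', 'Bodyfat_Brozek']] < ESSENTIAL_FAT_MIN
).any(axis=1)
print(low[low['Below_Essential']].index.tolist())
```

Keeping a flag column rather than deleting rows makes it easy to fit the model with and without the suspect samples and compare.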

In [18]:
df[df['Bodyfat_Siri'] >40]

Unnamed: 0,Density,Bodyfat_Siri,Age,Weight,Height,Neck,Chest,Abdomen,Hip,Thigh,Knee,Ankle,Biceps,Forearm,Wrist,Bodyfat_Brozek,BMI,Lean_Weight_Brozek
35,1.0101,40.1,49,191.75,65.0,38.4,118.5,113.1,113.8,61.9,38.3,21.9,32.0,29.8,17.0,38.2,32.0,118.4
215,0.995,47.5,51,219.0,64.0,41.2,119.8,122.1,112.8,62.5,36.9,23.6,34.7,29.1,18.4,45.1,37.6,120.2


In [19]:
## Should I remove the two obese measurements? With only two of them, will they negatively impact my model?

## Feature Engineering

In [20]:
df['Lean_Weight_Siri'] = round((df['Weight'] * (1 - (df['Bodyfat_Siri'] / 100))), 2)

In [21]:
df

Unnamed: 0,Density,Bodyfat_Siri,Age,Weight,Height,Neck,Chest,Abdomen,Hip,Thigh,Knee,Ankle,Biceps,Forearm,Wrist,Bodyfat_Brozek,BMI,Lean_Weight_Brozek,Lean_Weight_Siri
0,1.0708,12.3,23,154.25,67.75,36.2,93.1,85.2,94.5,59.0,37.3,21.9,32.0,27.4,17.1,12.6,23.7,134.9,135.28
1,1.0853,6.1,22,173.25,72.25,38.5,93.6,83.0,98.7,58.7,37.3,23.4,30.5,28.9,18.2,6.9,23.4,161.3,162.68
2,1.0414,25.3,22,154.00,66.25,34.0,95.8,87.9,99.2,59.6,38.9,24.0,28.8,25.2,16.6,24.6,24.7,116.0,115.04
3,1.0751,10.4,26,184.75,72.25,37.4,101.8,86.4,101.2,60.1,37.3,22.8,32.4,29.4,18.2,10.9,24.9,164.7,165.54
4,1.0340,28.7,24,184.25,71.25,34.4,97.3,100.0,101.9,63.2,42.2,24.0,32.2,27.7,17.7,27.8,25.6,133.1,131.37
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
247,1.0736,11.0,70,134.25,67.00,34.9,89.2,83.6,88.8,49.6,34.8,21.5,25.6,25.7,18.5,11.5,21.1,118.9,119.48
248,1.0236,33.6,72,201.00,69.75,40.9,108.5,105.0,104.5,59.6,40.8,23.2,35.2,28.6,20.1,32.3,29.1,136.1,133.46
249,1.0328,29.3,72,186.75,66.00,38.9,111.1,111.5,101.7,60.3,37.3,21.5,31.3,27.2,18.0,28.3,30.2,133.9,132.03
250,1.0399,26.0,72,190.75,70.50,38.9,108.3,101.3,97.8,56.0,41.6,22.7,30.5,29.4,19.8,25.3,27.0,142.6,141.16


In [22]:
## Creating a column for the wrist measurement in inches (converted from centimeters), then a column for body frame size based on that measurement.
## See readme for body frame category measurements

df['Wrist_Inches'] = round(df['Wrist'] * 0.393701,2)

In [23]:
## Creating the frame column

wrist_inches = pd.array(df['Wrist_Inches'])

In [24]:
## Going to use -1 for small, 0 for medium, and 1 for large frames

frame = []

for x in wrist_inches:
    if x > 7.5:
        frame.append(1)
    elif 6.5 <= x <= 7.5:
        frame.append(0)
    else:
        frame.append(-1)
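
The same three-way split can be written without an explicit loop. A sketch using `np.select`, assuming the thresholds from the loop above (over 7.5 in is large, 6.5 to 7.5 in is medium, anything smaller is small); the sample wrist values are illustrative:

```python
import numpy as np

# A few wrist sizes in inches; 6.5 and 7.5 are included to show the boundaries.
wrist = np.array([6.2, 6.5, 7.17, 7.5, 7.91])

# Conditions are checked in order; values matching neither get the default.
frame_codes = np.select(
    [wrist > 7.5, wrist >= 6.5],
    [1, 0],
    default=-1,
)
print(frame_codes.tolist())
```

On the full dataset the array could be assigned directly, for example `df['Frame'] = np.select(...)` with `df['Wrist_Inches']` in place of `wrist`.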

In [25]:
## Checking accuracy

len(frame)

252

In [26]:
## adding the frame column

df['Frame'] = frame
df

Unnamed: 0,Density,Bodyfat_Siri,Age,Weight,Height,Neck,Chest,Abdomen,Hip,Thigh,...,Ankle,Biceps,Forearm,Wrist,Bodyfat_Brozek,BMI,Lean_Weight_Brozek,Lean_Weight_Siri,Wrist_Inches,Frame
0,1.0708,12.3,23,154.25,67.75,36.2,93.1,85.2,94.5,59.0,...,21.9,32.0,27.4,17.1,12.6,23.7,134.9,135.28,6.73,0
1,1.0853,6.1,22,173.25,72.25,38.5,93.6,83.0,98.7,58.7,...,23.4,30.5,28.9,18.2,6.9,23.4,161.3,162.68,7.17,0
2,1.0414,25.3,22,154.00,66.25,34.0,95.8,87.9,99.2,59.6,...,24.0,28.8,25.2,16.6,24.6,24.7,116.0,115.04,6.54,0
3,1.0751,10.4,26,184.75,72.25,37.4,101.8,86.4,101.2,60.1,...,22.8,32.4,29.4,18.2,10.9,24.9,164.7,165.54,7.17,0
4,1.0340,28.7,24,184.25,71.25,34.4,97.3,100.0,101.9,63.2,...,24.0,32.2,27.7,17.7,27.8,25.6,133.1,131.37,6.97,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
247,1.0736,11.0,70,134.25,67.00,34.9,89.2,83.6,88.8,49.6,...,21.5,25.6,25.7,18.5,11.5,21.1,118.9,119.48,7.28,0
248,1.0236,33.6,72,201.00,69.75,40.9,108.5,105.0,104.5,59.6,...,23.2,35.2,28.6,20.1,32.3,29.1,136.1,133.46,7.91,1
249,1.0328,29.3,72,186.75,66.00,38.9,111.1,111.5,101.7,60.3,...,21.5,31.3,27.2,18.0,28.3,30.2,133.9,132.03,7.09,0
250,1.0399,26.0,72,190.75,70.50,38.9,108.3,101.3,97.8,56.0,...,22.7,30.5,29.4,19.8,25.3,27.0,142.6,141.16,7.80,1


In [27]:
## adding knee in inches column
df['Knee_Inches'] = round(df['Knee'] * 0.393701,2)

In [28]:
## adding skin weight column (skin is estimated at 16% of body weight)
df['Skin_Weight'] = df['Weight'] * 0.16
df

Unnamed: 0,Density,Bodyfat_Siri,Age,Weight,Height,Neck,Chest,Abdomen,Hip,Thigh,...,Forearm,Wrist,Bodyfat_Brozek,BMI,Lean_Weight_Brozek,Lean_Weight_Siri,Wrist_Inches,Frame,Knee_Inches,Skin_Weight
0,1.0708,12.3,23,154.25,67.75,36.2,93.1,85.2,94.5,59.0,...,27.4,17.1,12.6,23.7,134.9,135.28,6.73,0,14.69,24.68
1,1.0853,6.1,22,173.25,72.25,38.5,93.6,83.0,98.7,58.7,...,28.9,18.2,6.9,23.4,161.3,162.68,7.17,0,14.69,27.72
2,1.0414,25.3,22,154.00,66.25,34.0,95.8,87.9,99.2,59.6,...,25.2,16.6,24.6,24.7,116.0,115.04,6.54,0,15.31,24.64
3,1.0751,10.4,26,184.75,72.25,37.4,101.8,86.4,101.2,60.1,...,29.4,18.2,10.9,24.9,164.7,165.54,7.17,0,14.69,29.56
4,1.0340,28.7,24,184.25,71.25,34.4,97.3,100.0,101.9,63.2,...,27.7,17.7,27.8,25.6,133.1,131.37,6.97,0,16.61,29.48
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
247,1.0736,11.0,70,134.25,67.00,34.9,89.2,83.6,88.8,49.6,...,25.7,18.5,11.5,21.1,118.9,119.48,7.28,0,13.70,21.48
248,1.0236,33.6,72,201.00,69.75,40.9,108.5,105.0,104.5,59.6,...,28.6,20.1,32.3,29.1,136.1,133.46,7.91,1,16.06,32.16
249,1.0328,29.3,72,186.75,66.00,38.9,111.1,111.5,101.7,60.3,...,27.2,18.0,28.3,30.2,133.9,132.03,7.09,0,14.69,29.88
250,1.0399,26.0,72,190.75,70.50,38.9,108.3,101.3,97.8,56.0,...,29.4,19.8,25.3,27.0,142.6,141.16,7.80,1,16.38,30.52


In [29]:
df['Frame'].mean()

0.15476190476190477

## Analysis

In [30]:
df.mean()

Density                 1.055574
Bodyfat_Siri           19.150794
Age                    44.884921
Weight                178.924405
Height                 70.148810
Neck                   37.992063
Chest                 100.824206
Abdomen                92.555952
Hip                    99.904762
Thigh                  59.405952
Knee                   38.590476
Ankle                  23.102381
Biceps                 32.273413
Forearm                28.663889
Wrist                  18.229762
Bodyfat_Brozek         18.938492
BMI                    25.436905
Lean_Weight_Brozek    143.713889
Lean_Weight_Siri      143.158810
Wrist_Inches            7.176786
Frame                   0.154762
Knee_Inches            15.192937
Skin_Weight            28.627905
dtype: float64

In [31]:
## How do fat-free weight and wrist measurement correlate?
## The attached research paper reports a correlation of 0.59.
## According to the same paper, the knee measurement, which we have, had a correlation of 0.65 with FFM, 0.48 with TBF, and 0.71 with weight.
## If a person has had a recent significant weight loss, bone measurements would not be as accurate for them.
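
A quick way to check these figures against this dataset is a Pearson correlation between the bone measurements and fat-free mass. The sketch below uses only the first five rows shown earlier, so the numbers it prints are not comparable to the paper's; it only illustrates the call:

```python
import pandas as pd

# Wrist, Knee (cm) and Lean_Weight_Brozek (lb) for rows 0-4 of df.
sample = pd.DataFrame({
    'Wrist': [17.1, 18.2, 16.6, 18.2, 17.7],
    'Knee': [37.3, 37.3, 38.9, 37.3, 42.2],
    'Lean_Weight_Brozek': [134.9, 161.3, 116.0, 164.7, 133.1],
})

# Pearson correlation of each bone measurement with fat-free mass.
ffm_corr = sample.corr()['Lean_Weight_Brozek'].drop('Lean_Weight_Brozek')
print(ffm_corr)
```

On the full notebook the same expression would be applied to `df[['Wrist', 'Knee', 'Lean_Weight_Brozek']]`.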

In [32]:
## Remove older samples, which may have lower bone density? They cannot be removed if I rely on the previous research paper, since it used samples aged 18-65.


In [33]:
## Skin is estimated to be 16% of body weight and intestines 7.5 lbs
## https://www.livescience.com/32939-how-much-does-skin-weigh.html#:~:text=As%20an%20organ%2C%20skin%20is,a%20person's%20total%20body%20weight.&text=Most%20adults'%20skin%20weighs%20in%20at%2020%20pounds%20or%20more.
## Other organs have a mean weight of 9.3 pounds with no correlation to a man's height, weight, or BMI, so the mean is the best figure to use
## https://journals.lww.com/amjforensicmedicine/Abstract/2012/12000/Normal_Organ_Weights_in_Men__Part_II_The_Brain,.22.aspx#:~:text=The%20following%20reference%20ranges%20(95,the%20presence%20of%20pathologic%20disease.

## Preprocessing

In [34]:
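## Split Training and Testing Data
## Note: X keeps Density, Bodyfat_Siri, Lean_Weight_Brozek and Lean_Weight_Siri (and Error, if it has been added),
## all of which are computed from the measured body fat, so the near-perfect scores below likely reflect that leakage.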
from sklearn.model_selection import train_test_split
y = df[['Bodyfat_Brozek']]
X = df.drop(['Bodyfat_Brozek'], axis=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)


In [35]:
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

linreg_all = LinearRegression()
linreg_all.fit(X_train, y_train)

print('Training r^2:', linreg_all.score(X_train, y_train))
print('Test r^2:', linreg_all.score(X_test, y_test))
print('Training MSE:', mean_squared_error(y_train, linreg_all.predict(X_train)))
print('Test MSE:', mean_squared_error(y_test, linreg_all.predict(X_test)))

Training r^2: 0.99955625873706
Test r^2: 0.9996119332529141
Training MSE: 0.024782171270588134
Test MSE: 0.027837902104457937
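
Scores this close to 1 are what one would expect when the feature matrix still contains columns computed from the target. A hedged sketch of keeping only directly measured predictors before splitting, using a handful of columns from the first row shown earlier (the `LEAKY` list is an assumption based on how those columns were built in this notebook):

```python
import pandas as pd

TARGET = 'Bodyfat_Brozek'
# Assumed leaky columns: each is derived from the hydrostatic density or the target.
LEAKY = ['Density', 'Bodyfat_Siri', 'Lean_Weight_Brozek',
         'Lean_Weight_Siri', 'Error']

# First row of df as displayed earlier, restricted to a handful of columns.
row = pd.DataFrame([{
    'Density': 1.0708, 'Bodyfat_Siri': 12.3, 'Age': 23, 'Weight': 154.25,
    'Height': 67.75, 'Neck': 36.2, 'Abdomen': 85.2, 'Wrist': 17.1,
    'Bodyfat_Brozek': 12.6, 'Lean_Weight_Brozek': 134.9,
    'Lean_Weight_Siri': 135.28, 'Error': 6.48,
}])

y = row[TARGET]
# errors='ignore' lets the same call work if some leaky columns were never created.
X = row.drop(columns=[TARGET] + LEAKY, errors='ignore')
print(X.columns.tolist())
```

With the full `df` in place of `row`, `X` and `y` can be passed to `train_test_split` as in the cell above. The resulting scores would probably be noticeably lower, and a more realistic baseline for comparing against the DoD estimate.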
