# OpenPowerlifting Dataset Preprocessing Script

This project uses data from [OpenPowerlifting.org](https://OpenPowerlifting.org). The goal is to solve several regression problems on the OpenPowerlifting dataset: predicting age, bodyweight, and lift performance from the other features present in the data. This preprocessing script cleans and organizes the data that will later be fed into machine learning models.


In [1]:
# Import Necessary Libraries
import numpy as np
import pandas as pd

### Load The Dataset
Download and extract the OpenPowerlifting dataset from the [bulk CSV page](https://openpowerlifting.gitlab.io/opl-csv/bulk-csv.html), then place the .csv file in the `data` folder. Make sure the path passed to `read_csv` matches the name of the downloaded file (or its new name if you renamed it).

In [2]:
# Change the read_csv line depending on the csv file downloaded
powerlifting_data = pd.read_csv('data/openpowerlifting_data_04152025.csv', low_memory = False)
pd.set_option("display.max_columns", None)
powerlifting_data.head(5)

Unnamed: 0,Name,Sex,Event,Equipment,Age,AgeClass,BirthYearClass,Division,BodyweightKg,WeightClassKg,Squat1Kg,Squat2Kg,Squat3Kg,Squat4Kg,Best3SquatKg,Bench1Kg,Bench2Kg,Bench3Kg,Bench4Kg,Best3BenchKg,Deadlift1Kg,Deadlift2Kg,Deadlift3Kg,Deadlift4Kg,Best3DeadliftKg,TotalKg,Place,Dots,Wilks,Glossbrenner,Goodlift,Tested,Country,State,Federation,ParentFederation,Date,MeetCountry,MeetState,MeetTown,MeetName,Sanctioned
0,E.S. Denisenko,F,B,Raw,28.5,24-34,24-39,Open,67.3,,,,,,,-40.0,-45.0,-45.0,,,,,,,,,DQ,,,,,,,,GSF-Belarus,,2019-06-22,Belarus,,Luninets,Bison Power Cup,Yes
1,I.S. Lebetskaya,F,B,Raw,43.5,40-44,40-49,Open,73.2,,,,,,,80.0,85.0,90.0,,90.0,,,,,,90.0,1,88.8,86.89,76.5,70.06,,,,GSF-Belarus,,2019-06-22,Belarus,,Luninets,Bison Power Cup,Yes
2,K. Yakimovich,F,B,Raw,26.5,24-34,24-39,Open,60.6,,,,,,,40.0,42.5,45.0,,45.0,,,,,,45.0,2,49.57,49.79,43.98,38.42,,,,GSF-Belarus,,2019-06-22,Belarus,,Luninets,Bison Power Cup,Yes
3,A.G. Golneva,F,B,Raw,19.5,20-23,19-23,Juniors 17-21,50.3,,,,,,,32.5,35.0,-37.5,,35.0,,,,,,35.0,2,43.67,44.76,39.73,34.55,,,,GSF-Belarus,,2019-06-22,Belarus,,Luninets,Bison Power Cup,Yes
4,E.V. Marunevskaya,F,B,Raw,19.5,20-23,19-23,Juniors 17-21,63.7,,,,,,,40.0,42.5,-45.0,,42.5,,,,,,42.5,1,45.4,45.26,39.96,35.25,,,,GSF-Belarus,,2019-06-22,Belarus,,Luninets,Bison Power Cup,Yes
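
The bulk CSV is large, so it can help to read only the columns you need. The following is a minimal sketch, not what this notebook does (the notebook loads every column because later steps filter on `Federation`, `Tested`, and `Sanctioned`). The helper `load_opl`, the list `KEEP_COLUMNS`, and the in-memory sample rows are all hypothetical and only illustrate the `usecols` parameter of `pd.read_csv`.

```python
import io
import pandas as pd

# Hypothetical subset of columns; adjust to whatever your analysis needs.
KEEP_COLUMNS = ["Name", "Sex", "Event", "Equipment", "Age", "BodyweightKg", "TotalKg"]

def load_opl(source, columns=KEEP_COLUMNS):
    """Read only the requested columns from an OpenPowerlifting-style CSV."""
    return pd.read_csv(source, usecols=columns, low_memory=False)

# Tiny in-memory stand-in for the real file (made-up rows).
sample_csv = io.StringIO(
    "Name,Sex,Event,Equipment,Age,BodyweightKg,TotalKg,MeetName\n"
    "Lifter A,F,SBD,Raw,21.5,55.5,242.5,Example Meet\n"
    "Lifter B,M,B,Raw,30.0,90.0,,Example Meet\n"
)
sample = load_opl(sample_csv)
print(sample.columns.tolist())
```

With the real file you would pass its path instead of the `StringIO` object.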


### Preliminary Filtering
We first filter the data. We keep only "full-power" competitors, i.e. lifters who perform all three lifts in a meet. Raw powerlifting (minimal supportive gear: only a belt, knee sleeves, and wrist wraps are permitted) is more popular than equipped lifting (more supportive gear, such as a bench shirt, bench slingshot, or deadlift suit), so we filter out equipped lifters. To control the size of our data, we only look at the two most popular federations in the United States, USA Powerlifting (USAPL) and Powerlifting America (AMP). We also filter out untested lifters, as we would like to predict more "realistic" values for an average person. Only sanctioned meets are included, since they are considered official. We filter out lifters who bomb out (i.e. fail to record at least one successful attempt in each of the squat, bench, and deadlift), which the code detects as a missing `TotalKg`. Finally, we drop columns with

In [3]:
SBD_powerlifting_data = powerlifting_data[(powerlifting_data['Event'] == 'SBD') 
                        & (powerlifting_data['Equipment'] == 'Raw') 
                        & ((powerlifting_data['Federation'] == 'USAPL') | (powerlifting_data['Federation'] == 'AMP'))
                        & (powerlifting_data['Sanctioned'] == 'Yes') 
                        & (powerlifting_data['Tested'] == 'Yes')
                        & (~powerlifting_data['TotalKg'].isna())]

print(SBD_powerlifting_data.shape)
print(SBD_powerlifting_data.dtypes)
SBD_powerlifting_data.head(5)

(227527, 42)
Name                 object
Sex                  object
Event                object
Equipment            object
Age                 float64
AgeClass             object
BirthYearClass       object
Division             object
BodyweightKg        float64
WeightClassKg        object
Squat1Kg            float64
Squat2Kg            float64
Squat3Kg            float64
Squat4Kg            float64
Best3SquatKg        float64
Bench1Kg            float64
Bench2Kg            float64
Bench3Kg            float64
Bench4Kg            float64
Best3BenchKg        float64
Deadlift1Kg         float64
Deadlift2Kg         float64
Deadlift3Kg         float64
Deadlift4Kg         float64
Best3DeadliftKg     float64
TotalKg             float64
Place                object
Dots                float64
Wilks               float64
Glossbrenner        float64
Goodlift            float64
Tested               object
Country              object
State                object
Federation           object
ParentF

Unnamed: 0,Name,Sex,Event,Equipment,Age,AgeClass,BirthYearClass,Division,BodyweightKg,WeightClassKg,Squat1Kg,Squat2Kg,Squat3Kg,Squat4Kg,Best3SquatKg,Bench1Kg,Bench2Kg,Bench3Kg,Bench4Kg,Best3BenchKg,Deadlift1Kg,Deadlift2Kg,Deadlift3Kg,Deadlift4Kg,Best3DeadliftKg,TotalKg,Place,Dots,Wilks,Glossbrenner,Goodlift,Tested,Country,State,Federation,ParentFederation,Date,MeetCountry,MeetState,MeetTown,MeetName,Sanctioned
571865,Taylor Aitken,F,SBD,Raw,21.5,20-23,19-23,FR-Jr,55.55,56.0,92.5,97.5,-102.5,,97.5,42.5,45.0,-47.5,,45.0,100.0,-110.0,-110.0,,100.0,242.5,1,282.6,287.14,254.03,58.01,Yes,USA,UT,USAPL,,2022-11-19,USA,UT,,Salt Lake Powerlifting Classic,Yes
571866,Dayana Chavez,F,SBD,Raw,20.5,20-23,19-23,FR-Jr,74.75,75.0,160.0,170.0,177.5,,177.5,77.5,82.5,85.0,,85.0,167.5,177.5,185.0,,185.0,447.5,1,436.63,426.3,375.05,88.93,Yes,USA,UT,USAPL,,2022-11-19,USA,UT,,Salt Lake Powerlifting Classic,Yes
571867,Ashlyn Dorminey,F,SBD,Raw,22.0,20-23,19-23,FR-Jr,80.4,82.5,140.0,150.0,155.0,,155.0,57.5,62.5,67.5,,67.5,157.5,167.5,175.0,,175.0,397.5,1,373.62,362.71,318.0,76.42,Yes,USA,ID,USAPL,,2022-11-19,USA,UT,,Salt Lake Powerlifting Classic,Yes
571868,Terry Pientok,F,SBD,Raw,64.5,60-64,60-69,FR-M3b,55.45,56.0,25.0,,,,25.0,25.0,,,,25.0,55.0,,,,55.0,105.0,1,122.51,124.5,110.15,25.15,Yes,USA,NV,USAPL,,2022-11-19,USA,UT,,Salt Lake Powerlifting Classic,Yes
571869,Rebecca Davis,F,SBD,Raw,38.5,35-39,24-39,FR-O,55.6,56.0,82.5,85.0,87.5,,87.5,45.0,50.0,-52.5,,50.0,97.5,105.0,-107.5,,105.0,242.5,1,282.43,286.93,253.84,57.97,Yes,USA,UT,USAPL,,2022-11-19,USA,UT,,Salt Lake Powerlifting Classic,Yes
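
To see how the combined filter behaves, here is a small sketch on made-up rows. It assumes the same column names and values as the real data, and uses `isin` for the federation test and `notna` for the total as equivalent spellings of the conditions above. The toy frame is purely illustrative.

```python
import pandas as pd

# Made-up rows that mimic the columns used by the filter.
toy = pd.DataFrame({
    "Event":      ["SBD", "SBD", "B", "SBD", "SBD"],
    "Equipment":  ["Raw", "Raw", "Raw", "Single-ply", "Raw"],
    "Federation": ["USAPL", "AMP", "USAPL", "USAPL", "USAPL"],
    "Sanctioned": ["Yes", "Yes", "Yes", "Yes", "Yes"],
    "Tested":     ["Yes", "Yes", "Yes", "Yes", "Yes"],
    "TotalKg":    [242.5, 447.5, 90.0, 600.0, None],
})

mask = (
    (toy["Event"] == "SBD")                      # full-power only
    & (toy["Equipment"] == "Raw")                # raw lifters only
    & toy["Federation"].isin(["USAPL", "AMP"])   # the two federations of interest
    & (toy["Sanctioned"] == "Yes")
    & (toy["Tested"] == "Yes")
    & toy["TotalKg"].notna()                     # no total means the lifter bombed out
)
filtered = toy[mask]
print(filtered)
```

Only the first two rows survive: the third is bench-only, the fourth is equipped, and the fifth has no total.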


Next, we drop columns that carry no information about a lifter's performance or personal demographics, such as where the meet took place, the lifter's country, and their division. The fourth-attempt columns and the age and weight class columns are dropped as well.

In [4]:
SBD_powerlifting_data = SBD_powerlifting_data.drop(columns=['Squat4Kg', 
                                                            'Bench4Kg', 
                                                            'Deadlift4Kg', 
                                                            'AgeClass', 
                                                            'BirthYearClass',
                                                            'MeetCountry',
                                                            'MeetState', 
                                                            'MeetTown',
                                                            'MeetName', 
                                                            'Sanctioned',
                                                            'ParentFederation', 
                                                            'Federation', 
                                                            'Tested',
                                                            'Country',
                                                            'State', 
                                                            'Date', 
                                                            'Place', 
                                                            'WeightClassKg', 
                                                            'Division'])
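
If a particular export of the dataset lacks one of these columns, `drop` raises a `KeyError`. As a hedged sketch on a made-up frame, passing `errors="ignore"` to `DataFrame.drop` skips labels that are not present. Whether silent skipping is desirable depends on how strictly you want to validate the input.

```python
import pandas as pd

# Made-up frame; 'MeetTown' is deliberately absent.
toy = pd.DataFrame({
    "Age": [21.5, 20.5],
    "TotalKg": [242.5, 447.5],
    "MeetName": ["Example Meet", "Example Meet"],
})

# errors="ignore" skips the missing 'MeetTown' label instead of raising KeyError.
slim = toy.drop(columns=["MeetName", "MeetTown"], errors="ignore")
print(slim.columns.tolist())
```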

### Feature Engineering
We then derive additional features from the existing columns to give our models more to work with:
- Lift ratios (each lift's share of the total)
- Strength-to-bodyweight ratios for each lift
- Binary indicator columns for sex (male or female; non-binary lifters are filtered out because there are too few of them to work with)
- Indicators for whether each attempt was successful

Then we fill NaN values for each attempt with 0, which is acceptable because the success indicators already mark those attempts as unsuccessful.


In [5]:
# Feature Engineering

# Filter out Non-Binary Lifters (.copy() avoids SettingWithCopyWarning on the column assignments below)
SBD_powerlifting_data = SBD_powerlifting_data[SBD_powerlifting_data['Sex'].isin(['M', 'F'])].copy()

# Indicators for a given attempt being successful or a miss
SBD_powerlifting_data['Squat1Success'] = np.where(SBD_powerlifting_data['Squat1Kg'] > 0, 1, 0)
SBD_powerlifting_data['Squat2Success'] = np.where(SBD_powerlifting_data['Squat2Kg'] > 0, 1, 0)
SBD_powerlifting_data['Squat3Success'] = np.where(SBD_powerlifting_data['Squat3Kg'] > 0, 1, 0)
SBD_powerlifting_data['Bench1Success'] = np.where(SBD_powerlifting_data['Bench1Kg'] > 0, 1, 0)
SBD_powerlifting_data['Bench2Success'] = np.where(SBD_powerlifting_data['Bench2Kg'] > 0, 1, 0)
SBD_powerlifting_data['Bench3Success'] = np.where(SBD_powerlifting_data['Bench3Kg'] > 0, 1, 0)
SBD_powerlifting_data['Deadlift1Success'] = np.where(SBD_powerlifting_data['Deadlift1Kg'] > 0, 1, 0)
SBD_powerlifting_data['Deadlift2Success'] = np.where(SBD_powerlifting_data['Deadlift2Kg'] > 0, 1, 0)
SBD_powerlifting_data['Deadlift3Success'] = np.where(SBD_powerlifting_data['Deadlift3Kg'] > 0, 1, 0)

# Bodyweight ratios for each lift
SBD_powerlifting_data['Squat_BW_Ratio'] = SBD_powerlifting_data['Best3SquatKg'] / SBD_powerlifting_data['BodyweightKg']
SBD_powerlifting_data['Bench_BW_Ratio'] = SBD_powerlifting_data['Best3BenchKg'] / SBD_powerlifting_data['BodyweightKg']
SBD_powerlifting_data['Deadlift_BW_Ratio'] = SBD_powerlifting_data['Best3DeadliftKg'] / SBD_powerlifting_data['BodyweightKg']

# Lift to total ratios
SBD_powerlifting_data['Squat_Total_Ratio'] = SBD_powerlifting_data['Best3SquatKg'] / SBD_powerlifting_data['TotalKg']
SBD_powerlifting_data['Bench_Total_Ratio'] = SBD_powerlifting_data['Best3BenchKg'] / SBD_powerlifting_data['TotalKg']
SBD_powerlifting_data['Deadlift_Total_Ratio'] = SBD_powerlifting_data['Best3DeadliftKg'] / SBD_powerlifting_data['TotalKg']

# For imputing NaNs with 0
attempt_cols = ['Squat1Kg', 'Squat2Kg', 'Squat3Kg', 
                'Bench1Kg', 'Bench2Kg', 'Bench3Kg', 
                'Deadlift1Kg', 'Deadlift2Kg', 'Deadlift3Kg']
# Impute NaNs with 0, okay to do since we now have successful attempt indicators
SBD_powerlifting_data[attempt_cols] = SBD_powerlifting_data[attempt_cols].fillna(0)

# Indicate Lifter Sex
SBD_powerlifting_data['Sex_M'] = np.where(SBD_powerlifting_data['Sex'] == 'M', 1, 0)
SBD_powerlifting_data['Sex_F'] = np.where(SBD_powerlifting_data['Sex'] == 'F', 1, 0)
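
The same feature logic can be checked on a small frame. The sketch below uses squat values that mirror two rows from the preview output earlier in this notebook (a negative value is a missed attempt, NaN is an attempt that was not taken). The loop and the `toy` frame are illustrative only and are not part of the pipeline.

```python
import numpy as np
import pandas as pd

# Squat values mirroring two rows from the preview output above.
toy = pd.DataFrame({
    "Squat1Kg": [92.5, 25.0],
    "Squat2Kg": [97.5, np.nan],
    "Squat3Kg": [-102.5, np.nan],
    "Best3SquatKg": [97.5, 25.0],
    "BodyweightKg": [55.55, 55.45],
    "TotalKg": [242.5, 105.0],
})

attempts = ["Squat1Kg", "Squat2Kg", "Squat3Kg"]
for col in attempts:
    # NaN > 0 evaluates to False, so untaken attempts are flagged 0 as well.
    toy[col.replace("Kg", "Success")] = (toy[col] > 0).astype(int)

toy["Squat_BW_Ratio"] = toy["Best3SquatKg"] / toy["BodyweightKg"]
toy["Squat_Total_Ratio"] = toy["Best3SquatKg"] / toy["TotalKg"]

# Fill NaNs only after the indicators have been computed.
toy[attempts] = toy[attempts].fillna(0)
print(toy.round(6))
```

For the first lifter, the squat share of the total is 97.5 / 242.5 ≈ 0.402062, which matches the `Squat_Total_Ratio` shown in the notebook output.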

### Dropping More Unnecessary Columns
Now we drop the Name, Sex, Event, and Equipment columns. Event and Equipment are constant after filtering for raw full-power lifters, Sex has already been encoded as indicator columns, and Name is not relevant to the task at hand. We also drop any remaining rows with NaN values so that the dataset is clean and ready for machine learning model training and testing.

In [6]:
SBD_powerlifting_data = SBD_powerlifting_data.drop(columns=['Name', 
                                                            'Sex', 
                                                            'Event', 
                                                            'Equipment'])
# Cut out any remaining NaNs
SBD_powerlifting_data = SBD_powerlifting_data.dropna(axis=0)
SBD_powerlifting_data.head(5)

Unnamed: 0,Age,BodyweightKg,Squat1Kg,Squat2Kg,Squat3Kg,Best3SquatKg,Bench1Kg,Bench2Kg,Bench3Kg,Best3BenchKg,Deadlift1Kg,Deadlift2Kg,Deadlift3Kg,Best3DeadliftKg,TotalKg,Dots,Wilks,Glossbrenner,Goodlift,Squat1Success,Squat2Success,Squat3Success,Bench1Success,Bench2Success,Bench3Success,Deadlift1Success,Deadlift2Success,Deadlift3Success,Squat_BW_Ratio,Bench_BW_Ratio,Deadlift_BW_Ratio,Squat_Total_Ratio,Bench_Total_Ratio,Deadlift_Total_Ratio,Sex_M,Sex_F
571865,21.5,55.55,92.5,97.5,-102.5,97.5,42.5,45.0,-47.5,45.0,100.0,-110.0,-110.0,100.0,242.5,282.6,287.14,254.03,58.01,1,1,0,1,1,0,1,0,0,1.755176,0.810081,1.80018,0.402062,0.185567,0.412371,0,1
571866,20.5,74.75,160.0,170.0,177.5,177.5,77.5,82.5,85.0,85.0,167.5,177.5,185.0,185.0,447.5,436.63,426.3,375.05,88.93,1,1,1,1,1,1,1,1,1,2.374582,1.137124,2.474916,0.396648,0.189944,0.413408,0,1
571867,22.0,80.4,140.0,150.0,155.0,155.0,57.5,62.5,67.5,67.5,157.5,167.5,175.0,175.0,397.5,373.62,362.71,318.0,76.42,1,1,1,1,1,1,1,1,1,1.927861,0.839552,2.176617,0.389937,0.169811,0.440252,0,1
571868,64.5,55.45,25.0,0.0,0.0,25.0,25.0,0.0,0.0,25.0,55.0,0.0,0.0,55.0,105.0,122.51,124.5,110.15,25.15,1,0,0,1,0,0,1,0,0,0.450857,0.450857,0.991885,0.238095,0.238095,0.52381,0,1
571869,38.5,55.6,82.5,85.0,87.5,87.5,45.0,50.0,-52.5,50.0,97.5,105.0,-107.5,105.0,242.5,282.43,286.93,253.84,57.97,1,1,1,1,1,0,1,1,0,1.573741,0.899281,1.888489,0.360825,0.206186,0.43299,0,1
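
Because `dropna` removes a row if any column is missing, it is worth knowing how many rows it costs and which columns are responsible. A minimal sketch on made-up data, assuming nothing beyond standard pandas calls:

```python
import numpy as np
import pandas as pd

# Made-up frame with gaps in two different columns.
toy = pd.DataFrame({
    "Age": [21.5, np.nan, 38.5],
    "BodyweightKg": [55.55, 74.75, np.nan],
    "TotalKg": [242.5, 447.5, 242.5],
})

missing_per_column = toy.isna().sum()   # NaN count per column
clean = toy.dropna(axis=0)              # keep only fully populated rows
rows_dropped = len(toy) - len(clean)

print(missing_per_column[missing_per_column > 0])
print(f"dropped {rows_dropped} of {len(toy)} rows")
```

Running the same accounting on the real frame before the `dropna` call would show which columns account for the rows that are removed.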


In [7]:
# Checking if there are any residual NaNs (there should not be)
SBD_powerlifting_data.isna().sum().sort_values(ascending=False)

Age                     0
BodyweightKg            0
Squat1Kg                0
Squat2Kg                0
Squat3Kg                0
Best3SquatKg            0
Bench1Kg                0
Bench2Kg                0
Bench3Kg                0
Best3BenchKg            0
Deadlift1Kg             0
Deadlift2Kg             0
Deadlift3Kg             0
Best3DeadliftKg         0
TotalKg                 0
Dots                    0
Wilks                   0
Glossbrenner            0
Goodlift                0
Squat1Success           0
Squat2Success           0
Squat3Success           0
Bench1Success           0
Bench2Success           0
Bench3Success           0
Deadlift1Success        0
Deadlift2Success        0
Deadlift3Success        0
Squat_BW_Ratio          0
Bench_BW_Ratio          0
Deadlift_BW_Ratio       0
Squat_Total_Ratio       0
Bench_Total_Ratio       0
Deadlift_Total_Ratio    0
Sex_M                   0
Sex_F                   0
dtype: int64

### Partitioning The Preprocessed Dataset into Male and Female
Male and female lifters have very different strength potential and results. For this reason, we split the dataset into two separate tables, one with male lifters and the other with female lifters. We then drop the 'Sex_M' and 'Sex_F' columns, since each is constant within its table and therefore carries no information for our machine learning tasks.

In [8]:
# Split into Male and Female
United_States_Powerlifters_Male = SBD_powerlifting_data[(SBD_powerlifting_data['Sex_M'] == 1) & (SBD_powerlifting_data['Sex_F'] == 0)]
United_States_Powerlifters_Female = SBD_powerlifting_data[(SBD_powerlifting_data['Sex_M'] == 0) & (SBD_powerlifting_data['Sex_F'] == 1)]

# Inspect Male Dataframe
print(United_States_Powerlifters_Male.shape)
United_States_Powerlifters_Male.head(5)


(145921, 36)


Unnamed: 0,Age,BodyweightKg,Squat1Kg,Squat2Kg,Squat3Kg,Best3SquatKg,Bench1Kg,Bench2Kg,Bench3Kg,Best3BenchKg,Deadlift1Kg,Deadlift2Kg,Deadlift3Kg,Best3DeadliftKg,TotalKg,Dots,Wilks,Glossbrenner,Goodlift,Squat1Success,Squat2Success,Squat3Success,Bench1Success,Bench2Success,Bench3Success,Deadlift1Success,Deadlift2Success,Deadlift3Success,Squat_BW_Ratio,Bench_BW_Ratio,Deadlift_BW_Ratio,Squat_Total_Ratio,Bench_Total_Ratio,Deadlift_Total_Ratio,Sex_M,Sex_F
571879,20.5,74.7,150.0,160.0,170.0,170.0,100.0,110.0,120.0,120.0,190.0,205.0,212.5,212.5,502.5,361.43,359.06,347.0,73.41,1,1,1,1,1,1,1,1,1,2.27577,1.606426,2.844712,0.338308,0.238806,0.422886,1,0
571880,20.5,67.6,117.5,130.0,132.5,132.5,67.5,75.0,77.5,77.5,130.0,140.0,145.0,145.0,355.0,273.32,273.39,265.36,54.64,1,1,1,1,1,1,1,1,1,1.960059,1.14645,2.14497,0.373239,0.21831,0.408451,1,0
571881,21.5,95.85,165.0,182.5,192.5,192.5,125.0,-135.0,-135.0,125.0,142.5,187.5,-205.0,187.5,505.0,316.78,312.87,299.14,65.1,1,1,1,1,0,0,1,1,0,2.008346,1.304121,1.956182,0.381188,0.247525,0.371287,1,0
571882,21.5,80.9,175.0,190.0,-200.0,190.0,90.0,97.5,-102.5,97.5,185.0,195.0,200.0,200.0,487.5,333.96,330.5,318.29,68.37,1,1,0,1,1,0,1,1,1,2.348578,1.205192,2.472188,0.389744,0.2,0.410256,1,0
571883,20.5,80.85,130.0,137.5,145.0,145.0,75.0,77.5,82.5,82.5,175.0,187.5,200.0,200.0,427.5,292.96,289.93,279.23,59.97,1,1,1,1,1,1,1,1,1,1.793445,1.020408,2.473717,0.339181,0.192982,0.467836,1,0


In [9]:
# Inspect Female Dataframe
print(United_States_Powerlifters_Female.shape)
United_States_Powerlifters_Female.head(5)

(77586, 36)


Unnamed: 0,Age,BodyweightKg,Squat1Kg,Squat2Kg,Squat3Kg,Best3SquatKg,Bench1Kg,Bench2Kg,Bench3Kg,Best3BenchKg,Deadlift1Kg,Deadlift2Kg,Deadlift3Kg,Best3DeadliftKg,TotalKg,Dots,Wilks,Glossbrenner,Goodlift,Squat1Success,Squat2Success,Squat3Success,Bench1Success,Bench2Success,Bench3Success,Deadlift1Success,Deadlift2Success,Deadlift3Success,Squat_BW_Ratio,Bench_BW_Ratio,Deadlift_BW_Ratio,Squat_Total_Ratio,Bench_Total_Ratio,Deadlift_Total_Ratio,Sex_M,Sex_F
571865,21.5,55.55,92.5,97.5,-102.5,97.5,42.5,45.0,-47.5,45.0,100.0,-110.0,-110.0,100.0,242.5,282.6,287.14,254.03,58.01,1,1,0,1,1,0,1,0,0,1.755176,0.810081,1.80018,0.402062,0.185567,0.412371,0,1
571866,20.5,74.75,160.0,170.0,177.5,177.5,77.5,82.5,85.0,85.0,167.5,177.5,185.0,185.0,447.5,436.63,426.3,375.05,88.93,1,1,1,1,1,1,1,1,1,2.374582,1.137124,2.474916,0.396648,0.189944,0.413408,0,1
571867,22.0,80.4,140.0,150.0,155.0,155.0,57.5,62.5,67.5,67.5,157.5,167.5,175.0,175.0,397.5,373.62,362.71,318.0,76.42,1,1,1,1,1,1,1,1,1,1.927861,0.839552,2.176617,0.389937,0.169811,0.440252,0,1
571868,64.5,55.45,25.0,0.0,0.0,25.0,25.0,0.0,0.0,25.0,55.0,0.0,0.0,55.0,105.0,122.51,124.5,110.15,25.15,1,0,0,1,0,0,1,0,0,0.450857,0.450857,0.991885,0.238095,0.238095,0.52381,0,1
571869,38.5,55.6,82.5,85.0,87.5,87.5,45.0,50.0,-52.5,50.0,97.5,105.0,-107.5,105.0,242.5,282.43,286.93,253.84,57.97,1,1,1,1,1,0,1,1,0,1.573741,0.899281,1.888489,0.360825,0.206186,0.43299,0,1


In [10]:
# Drop 'Sex_M', 'Sex_F' columns and inspect
United_States_Powerlifters_Female = United_States_Powerlifters_Female.drop(columns=['Sex_M', 'Sex_F'])
United_States_Powerlifters_Female.head(5)

Unnamed: 0,Age,BodyweightKg,Squat1Kg,Squat2Kg,Squat3Kg,Best3SquatKg,Bench1Kg,Bench2Kg,Bench3Kg,Best3BenchKg,Deadlift1Kg,Deadlift2Kg,Deadlift3Kg,Best3DeadliftKg,TotalKg,Dots,Wilks,Glossbrenner,Goodlift,Squat1Success,Squat2Success,Squat3Success,Bench1Success,Bench2Success,Bench3Success,Deadlift1Success,Deadlift2Success,Deadlift3Success,Squat_BW_Ratio,Bench_BW_Ratio,Deadlift_BW_Ratio,Squat_Total_Ratio,Bench_Total_Ratio,Deadlift_Total_Ratio
571865,21.5,55.55,92.5,97.5,-102.5,97.5,42.5,45.0,-47.5,45.0,100.0,-110.0,-110.0,100.0,242.5,282.6,287.14,254.03,58.01,1,1,0,1,1,0,1,0,0,1.755176,0.810081,1.80018,0.402062,0.185567,0.412371
571866,20.5,74.75,160.0,170.0,177.5,177.5,77.5,82.5,85.0,85.0,167.5,177.5,185.0,185.0,447.5,436.63,426.3,375.05,88.93,1,1,1,1,1,1,1,1,1,2.374582,1.137124,2.474916,0.396648,0.189944,0.413408
571867,22.0,80.4,140.0,150.0,155.0,155.0,57.5,62.5,67.5,67.5,157.5,167.5,175.0,175.0,397.5,373.62,362.71,318.0,76.42,1,1,1,1,1,1,1,1,1,1.927861,0.839552,2.176617,0.389937,0.169811,0.440252
571868,64.5,55.45,25.0,0.0,0.0,25.0,25.0,0.0,0.0,25.0,55.0,0.0,0.0,55.0,105.0,122.51,124.5,110.15,25.15,1,0,0,1,0,0,1,0,0,0.450857,0.450857,0.991885,0.238095,0.238095,0.52381
571869,38.5,55.6,82.5,85.0,87.5,87.5,45.0,50.0,-52.5,50.0,97.5,105.0,-107.5,105.0,242.5,282.43,286.93,253.84,57.97,1,1,1,1,1,0,1,1,0,1.573741,0.899281,1.888489,0.360825,0.206186,0.43299


In [11]:
# Drop 'Sex_M', 'Sex_F' columns and inspect
United_States_Powerlifters_Male = United_States_Powerlifters_Male.drop(columns=['Sex_M', 'Sex_F'])
United_States_Powerlifters_Male.head(5)

Unnamed: 0,Age,BodyweightKg,Squat1Kg,Squat2Kg,Squat3Kg,Best3SquatKg,Bench1Kg,Bench2Kg,Bench3Kg,Best3BenchKg,Deadlift1Kg,Deadlift2Kg,Deadlift3Kg,Best3DeadliftKg,TotalKg,Dots,Wilks,Glossbrenner,Goodlift,Squat1Success,Squat2Success,Squat3Success,Bench1Success,Bench2Success,Bench3Success,Deadlift1Success,Deadlift2Success,Deadlift3Success,Squat_BW_Ratio,Bench_BW_Ratio,Deadlift_BW_Ratio,Squat_Total_Ratio,Bench_Total_Ratio,Deadlift_Total_Ratio
571879,20.5,74.7,150.0,160.0,170.0,170.0,100.0,110.0,120.0,120.0,190.0,205.0,212.5,212.5,502.5,361.43,359.06,347.0,73.41,1,1,1,1,1,1,1,1,1,2.27577,1.606426,2.844712,0.338308,0.238806,0.422886
571880,20.5,67.6,117.5,130.0,132.5,132.5,67.5,75.0,77.5,77.5,130.0,140.0,145.0,145.0,355.0,273.32,273.39,265.36,54.64,1,1,1,1,1,1,1,1,1,1.960059,1.14645,2.14497,0.373239,0.21831,0.408451
571881,21.5,95.85,165.0,182.5,192.5,192.5,125.0,-135.0,-135.0,125.0,142.5,187.5,-205.0,187.5,505.0,316.78,312.87,299.14,65.1,1,1,1,1,0,0,1,1,0,2.008346,1.304121,1.956182,0.381188,0.247525,0.371287
571882,21.5,80.9,175.0,190.0,-200.0,190.0,90.0,97.5,-102.5,97.5,185.0,195.0,200.0,200.0,487.5,333.96,330.5,318.29,68.37,1,1,0,1,1,0,1,1,1,2.348578,1.205192,2.472188,0.389744,0.2,0.410256
571883,20.5,80.85,130.0,137.5,145.0,145.0,75.0,77.5,82.5,82.5,175.0,187.5,200.0,200.0,427.5,292.96,289.93,279.23,59.97,1,1,1,1,1,1,1,1,1,1.793445,1.020408,2.473717,0.339181,0.192982,0.467836
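
A quick sanity check for a split like this is that every row lands in exactly one partition. The following sketch shows the idea on a made-up frame. The names `toy`, `male`, and `female` are illustrative, and the check is an addition rather than part of the original pipeline.

```python
import pandas as pd

# Made-up frame with the two indicator columns.
toy = pd.DataFrame({
    "TotalKg": [502.5, 242.5, 355.0],
    "Sex_M":   [1, 0, 1],
    "Sex_F":   [0, 1, 0],
})

male = toy[toy["Sex_M"] == 1].drop(columns=["Sex_M", "Sex_F"])
female = toy[toy["Sex_F"] == 1].drop(columns=["Sex_M", "Sex_F"])

# The partitions should be exhaustive and disjoint.
assert len(male) + len(female) == len(toy)
assert male.index.intersection(female.index).empty
print(len(male), len(female))
```

In this notebook's output the two shapes, 145921 and 77586 rows, sum to 223507.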


### Export to CSV
Finally, we export the preprocessed male and female lifter data to separate CSV files.

In [12]:
United_States_Powerlifters_Male.to_csv('data/USA_Male_Powerlifters.csv', index=False) 
United_States_Powerlifters_Female.to_csv('data/USA_Female_Powerlifters.csv', index=False) 
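
To confirm that an export with `index=False` can be read back unchanged, a round trip through an in-memory buffer is enough. This is a minimal sketch on made-up values. It uses `io.StringIO` in place of a file path so that nothing is written to disk.

```python
import io
import pandas as pd

# Made-up frame standing in for one of the exported tables.
toy = pd.DataFrame({"Age": [20.5, 21.5], "TotalKg": [502.5, 505.0]})

buffer = io.StringIO()
toy.to_csv(buffer, index=False)   # same call as the export, but to an in-memory buffer
buffer.seek(0)
restored = pd.read_csv(buffer)

# With index=False the row labels are not written, so the restored frame
# gets a fresh 0..n-1 index, which matches the toy frame's default index.
pd.testing.assert_frame_equal(toy, restored)
```

Note that the original row labels (such as 571865) are not preserved by the export, which is usually fine for model training.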