# **EDA on Gym Members Workouts**

# Exploratory Data Analysis (EDA) on a gym members exercise dataset

## **Getting to know the customers**

## Demographic Analysis

- Examine the age distribution of gym members
- Analyze gender breakdown and any patterns related to gender
- Look at the relationship between age, gender, and other variables
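
A minimal sketch of the demographic checks listed above, assuming the `Age` and `Gender` columns of the gym dataset loaded later in this notebook. The ten rows are copied from the `df_gym.head()` preview, and the age bands are an illustrative choice, not something defined by the dataset.

```python
import pandas as pd

# Ten rows copied from the df_gym.head() preview; the full frame has 973 rows.
sample = pd.DataFrame({
    "Age": [56, 46, 32, 25, 38, 56, 36, 40, 28, 28],
    "Gender": ["Male", "Female", "Female", "Male", "Male",
               "Female", "Male", "Female", "Male", "Male"],
})

# Hypothetical age bands (assumption); ages outside 18-59 would become NaN.
age_bins = [17, 29, 39, 49, 59]
age_labels = ["18-29", "30-39", "40-49", "50-59"]
sample["Age_Band"] = pd.cut(sample["Age"], bins=age_bins, labels=age_labels)

# Share of each gender, and member counts per age band and gender.
gender_share = sample["Gender"].value_counts(normalize=True)
age_by_gender = pd.crosstab(sample["Age_Band"], sample["Gender"])

print(gender_share)
print(age_by_gender)
```

On the full dataset the same two tables would be computed from `df_gym` directly; a ten-row sample is too small to support any conclusion.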

## Physical Characteristics

- Investigate the distribution of height, weight, and BMI among members (open question: would body composition be a better focus?)
- Explore correlations between physical characteristics and other variables like fitness goals or workout preferences
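
As a sanity check on the physical variables, the stored `BMI` column can be compared against the standard formula, weight in kilograms divided by height in metres squared. The sketch below uses five rows copied from the `df_gym.head()` preview. The category cut-offs are the usual WHO adult thresholds (18.5, 25, 30), and the names `BMI_check` and `BMI_Category` are introduced here for illustration only.

```python
import pandas as pd

# Five rows copied from the df_gym.head() preview.
sample = pd.DataFrame({
    "Weight (kg)": [88.3, 74.9, 68.1, 53.2, 46.1],
    "Height (m)": [1.71, 1.53, 1.66, 1.70, 1.79],
    "BMI": [30.2, 32.0, 24.71, 18.41, 14.39],
})

# BMI = weight (kg) / height (m) squared, rounded like the stored column.
sample["BMI_check"] = (sample["Weight (kg)"] / sample["Height (m)"] ** 2).round(2)
max_gap = (sample["BMI_check"] - sample["BMI"]).abs().max()

# WHO adult categories; right=False makes each band include its lower edge.
sample["BMI_Category"] = pd.cut(
    sample["BMI"],
    bins=[0, 18.5, 25, 30, float("inf")],
    labels=["Underweight", "Normal", "Overweight", "Obese"],
    right=False,
)

print(sample)
print("Largest difference between stored and recomputed BMI:", max_gap)
```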

## Workout Time-based Analysis

- Look for trends or patterns over time in membership, workout habits, and fitness outcomes

## Workout Habits

- Examine the distribution of preferred workout types
- Analyze average workout durations and how they vary across different member segments
- Investigate the frequency of gym visits and any patterns that emerge
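
The workout-habit questions above map onto a frequency count and a grouped summary. This is a sketch on ten rows copied from the `df_gym.head()` preview. It uses `Experience_Level` as the member segment, which is only one possible choice, and the output column names `mean_duration` and `mean_frequency` are made up for this example.

```python
import pandas as pd

# Ten rows copied from the df_gym.head() preview.
habits = pd.DataFrame({
    "Workout_Type": ["Yoga", "HIIT", "Cardio", "Strength", "Strength",
                     "HIIT", "Cardio", "Cardio", "Strength", "Cardio"],
    "Session_Duration (hours)": [1.69, 1.30, 1.11, 0.59, 0.64,
                                 1.59, 1.49, 1.27, 1.03, 1.08],
    "Workout_Frequency (days/week)": [4, 4, 4, 3, 3, 5, 3, 3, 4, 3],
    "Experience_Level": [3, 2, 2, 1, 1, 3, 2, 2, 2, 1],
})

# How often each workout type appears.
type_counts = habits["Workout_Type"].value_counts()

# Average session length and weekly visits per experience level.
by_level = habits.groupby("Experience_Level").agg(
    mean_duration=("Session_Duration (hours)", "mean"),
    mean_frequency=("Workout_Frequency (days/week)", "mean"),
)

print(type_counts)
print(by_level)
```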

## Performance Metrics

- Examine the distribution of average heart rates during workouts
- Analyze patterns in calories burned per session and how they relate to other variables

## Experience and Fitness Levels

- Look at the distribution of years of gym experience among members
- Analyze how experience relates to fitness level, workout habits, and goals
- Investigate the relationship between fitness level and other variables like BMI or workout duration

## **How to Help Customers by Offering Them Goods and Services of Interest**

## Dietary Plans and Supplement Usage

## Personal Trainer Utilization

## Setting Fitness Goals and the Behaviors Needed to Reach Them

By focusing on these areas, we can gain a comprehensive understanding of the gym members' characteristics, behaviors, and outcomes. This analysis can provide valuable insights for gym management, personal trainers, and fitness program developers.

Citations:  
[1] http://arno.uvt.nl/show.cgi?fid=172644  
[2] https://www.healthandfitness.org/improve-your-club/data-based-fitness-assessments-help-gym-members-get-results/  
[3] https://www.nature.com/articles/s41597-022-01784-7  
[4] https://verpex.com/blog/website-tips/eda-in-machine-learning  
[5] https://ugoproto.github.io/ugo_py_doc/eda_machine_learning_feature_engineering_and_kaggle/  
[6] https://www.healthandfitness.org/improve-your-club/how-gyms-are-using-member-data-to-increase-retention/  
[7] https://semasuka.github.io/blog/2019/03/26/introduction-to-eda.html  
[8] https://www.kaggle.com/datasets/valakhorasani/gym-members-exercise-dataset/code

# **Main Hypothesis**

**"There is a significant relationship between demographic factors (such as age and gender) and workout habits (including preferred workout types, frequency of gym visits, and average workout duration) among gym members."**

1. **Demographic Influence**: It is commonly observed that demographic factors can influence fitness behaviors. For instance, younger individuals might prefer different types of workouts compared to older members, and gender may also play a role in the types of exercises preferred.

2. **Workout Habits**: By examining how these demographic factors correlate with workout habits, we can uncover patterns that may inform gym management about customer preferences and help tailor services accordingly.

3. **Potential Outcomes**: This hypothesis can lead to further questions about how these relationships affect overall fitness outcomes, member retention, and satisfaction.

## Testing the Hypothesis

- Conduct **bivariate analyses** to explore relationships between demographic variables and workout habits using scatter plots, box plots, or correlation matrices.

This hypothesis serves as a foundation for deeper analysis and can guide subsequent investigations into how best to serve gym members based on their unique profiles.
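
One possible shape for the bivariate analysis, sketched on ten rows copied from the `df_gym.head()` preview. It pairs each demographic variable with a habit variable of the matching type: a row-normalized cross-tabulation for two categorical variables, grouped summary statistics for a categorical and a numeric variable (the numbers a box plot would display), and a Pearson correlation for two numeric variables. The variable names are illustrative.

```python
import pandas as pd

# Ten rows copied from the df_gym.head() preview.
members = pd.DataFrame({
    "Age": [56, 46, 32, 25, 38, 56, 36, 40, 28, 28],
    "Gender": ["Male", "Female", "Female", "Male", "Male",
               "Female", "Male", "Female", "Male", "Male"],
    "Workout_Type": ["Yoga", "HIIT", "Cardio", "Strength", "Strength",
                     "HIIT", "Cardio", "Cardio", "Strength", "Cardio"],
    "Session_Duration (hours)": [1.69, 1.30, 1.11, 0.59, 0.64,
                                 1.59, 1.49, 1.27, 1.03, 1.08],
    "Workout_Frequency (days/week)": [4, 4, 4, 3, 3, 5, 3, 3, 4, 3],
})

# Categorical vs categorical: share of each workout type within each gender.
type_share = pd.crosstab(members["Gender"], members["Workout_Type"], normalize="index")

# Categorical vs numeric: session-duration summary per gender.
duration_by_gender = members.groupby("Gender")["Session_Duration (hours)"].describe()

# Numeric vs numeric: Pearson correlation between age and weekly visits.
age_freq_corr = members["Age"].corr(members["Workout_Frequency (days/week)"])

print(type_share)
print(duration_by_gender)
print("Age vs weekly frequency (Pearson r):", age_freq_corr)
```

With only ten rows these numbers illustrate the mechanics, not the hypothesis; the same calls apply unchanged to the full `df_gym` frame.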


# **Other Hypotheses**

## Hypotheses Related to Physical Characteristics

1. **BMI and Workout Duration**:
   - **Hypothesis**: Members with a higher BMI tend to have shorter average workout durations compared to those with a normal BMI.

2. **Height and Weight Correlation**:
   - **Hypothesis**: There is a positive correlation between height and weight among gym members.

3. **Is BMI a good health indicator?**:
   - **Hypothesis**: Body composition metrics are much better indicators than BMI for assessing health.

## Hypotheses Related to Experience

4. **Years of Experience and Workout Frequency**:
   - **Hypothesis**: Members with more years of gym experience visit the gym more frequently than newer members.

## Hypotheses Related to Performance Metrics

5. **Calories Burned and Average Heart Rate**:
   - **Hypothesis**: There is a positive correlation between average heart rate during workouts and calories burned per session.

6. **Workout Duration and Calories Burned**:
    - **Hypothesis**: Longer workout durations are associated with higher calories burned per session.

These hypotheses guide the analysis and help uncover meaningful patterns in the dataset. Each one should be tested with a statistical method suited to the types of variables involved.
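
Hypotheses 5 and 6 both concern pairs of numeric variables, so a correlation matrix is a reasonable first look. The sketch below uses ten rows copied from the `df_gym.head()` preview and pandas' default Pearson coefficient. A formal significance test would need an additional library such as SciPy, which this sketch does not use.

```python
import pandas as pd

# Ten rows copied from the df_gym.head() preview.
performance = pd.DataFrame({
    "Avg_BPM": [157, 151, 122, 164, 158, 156, 169, 141, 127, 136],
    "Session_Duration (hours)": [1.69, 1.30, 1.11, 0.59, 0.64,
                                 1.59, 1.49, 1.27, 1.03, 1.08],
    "Calories_Burned": [1313.0, 883.0, 677.0, 532.0, 556.0,
                        1116.0, 1385.0, 895.0, 719.0, 808.0],
})

# Pairwise Pearson correlations between the three performance variables.
corr_matrix = performance.corr()

duration_vs_calories = corr_matrix.loc["Session_Duration (hours)", "Calories_Burned"]
bpm_vs_calories = corr_matrix.loc["Avg_BPM", "Calories_Burned"]

print(corr_matrix.round(2))
```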


# **Data Sources**

In [2]:
import pandas as pd

In [3]:
df_gym = pd.read_csv("data/gym_members_exercise_tracking.csv")

# Citations: https://www.kaggle.com/datasets/valakhorasani/gym-members-exercise-dataset/code


In [4]:
df_gym.head(20)

Unnamed: 0,Age,Gender,Weight (kg),Height (m),Max_BPM,Avg_BPM,Resting_BPM,Session_Duration (hours),Calories_Burned,Workout_Type,Fat_Percentage,Water_Intake (liters),Workout_Frequency (days/week),Experience_Level,BMI
0,56,Male,88.3,1.71,180,157,60,1.69,1313.0,Yoga,12.6,3.5,4,3,30.2
1,46,Female,74.9,1.53,179,151,66,1.3,883.0,HIIT,33.9,2.1,4,2,32.0
2,32,Female,68.1,1.66,167,122,54,1.11,677.0,Cardio,33.4,2.3,4,2,24.71
3,25,Male,53.2,1.7,190,164,56,0.59,532.0,Strength,28.8,2.1,3,1,18.41
4,38,Male,46.1,1.79,188,158,68,0.64,556.0,Strength,29.2,2.8,3,1,14.39
5,56,Female,58.0,1.68,168,156,74,1.59,1116.0,HIIT,15.5,2.7,5,3,20.55
6,36,Male,70.3,1.72,174,169,73,1.49,1385.0,Cardio,21.3,2.3,3,2,23.76
7,40,Female,69.7,1.51,189,141,64,1.27,895.0,Cardio,30.6,1.9,3,2,30.57
8,28,Male,121.7,1.94,185,127,52,1.03,719.0,Strength,28.9,2.6,4,2,32.34
9,28,Male,101.8,1.84,169,136,64,1.08,808.0,Cardio,29.7,2.7,3,1,30.07


In [5]:
df_gym.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 973 entries, 0 to 972
Data columns (total 15 columns):
 #   Column                         Non-Null Count  Dtype  
---  ------                         --------------  -----  
 0   Age                            973 non-null    int64  
 1   Gender                         973 non-null    object 
 2   Weight (kg)                    973 non-null    float64
 3   Height (m)                     973 non-null    float64
 4   Max_BPM                        973 non-null    int64  
 5   Avg_BPM                        973 non-null    int64  
 6   Resting_BPM                    973 non-null    int64  
 7   Session_Duration (hours)       973 non-null    float64
 8   Calories_Burned                973 non-null    float64
 9   Workout_Type                   973 non-null    object 
 10  Fat_Percentage                 973 non-null    float64
 11  Water_Intake (liters)          973 non-null    float64
 12  Workout_Frequency (days/week)  973 non-null    int

In [6]:
df_MTA = pd.read_csv("data/com_corp_mta.csv")

# Citations: https://connect.garmin.com/modern/weight

In [7]:
df_MTA.head(20)

Unnamed: 0,Tiempo,Peso,Cambio,IMC,Grasa corporal,Masa muscular esquelética,Masa ósea,Agua corporal,Unnamed: 8
0,"Oct 29, 2024",,,,,,,,
1,7:06 am,75.6 kg,0.2 kg,25.6,15.4 %,29.8 kg,4.7 kg,61.7 %,
2,"Oct 28, 2024",,,,,,,,
3,6:19 am,75.4 kg,0.7 kg,25.5,14.7 %,29.7 kg,4.7 kg,62.3 %,
4,"Oct 27, 2024",,,,,,,,
5,8:14 am,76.1 kg,0.1 kg,25.7,15.4 %,29.9 kg,4.7 kg,61.8 %,
6,"Oct 26, 2024",,,,,,,,
7,8:35 am,76.2 kg,0.5 kg,25.7,15.5 %,29.9 kg,4.7 kg,61.7 %,
8,"Oct 25, 2024",,,,,,,,
9,6:57 am,75.7 kg,0.2 kg,25.6,15 %,29.8 kg,4.7 kg,62.1 %,


In [8]:
df_MTA.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 686 entries, 0 to 685
Data columns (total 9 columns):
 #   Column                     Non-Null Count  Dtype  
---  ------                     --------------  -----  
 0   Tiempo                     686 non-null    object 
 1   Peso                       412 non-null    object 
 2   Cambio                     412 non-null    object 
 3   IMC                        412 non-null    float64
 4   Grasa corporal             412 non-null    object 
 5   Masa muscular esquelética  412 non-null    object 
 6   Masa ósea                  412 non-null    object 
 7   Agua corporal              412 non-null    object 
 8   Unnamed: 8                 0 non-null      float64
dtypes: float64(2), object(7)
memory usage: 48.4+ KB
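
The preview and `info()` output suggest that this export interleaves date-only rows with measurement rows, and that most measurements are stored as strings with units (`75.6 kg`, `15.4 %`). The following is a sketch of one way to tidy it, on four rows copied from the preview. It assumes that every row with a missing `Peso` is a date row, and the English column names (`weight_kg`, `body_fat_pct`, and so on) are hypothetical names chosen here, not part of the export.

```python
import pandas as pd

# Four rows copied from the df_MTA.head() preview (subset of columns).
raw = pd.DataFrame({
    "Tiempo": ["Oct 29, 2024", "7:06 am", "Oct 28, 2024", "6:19 am"],
    "Peso": [None, "75.6 kg", None, "75.4 kg"],
    "IMC": [None, 25.6, None, 25.5],
    "Grasa corporal": [None, "15.4 %", None, "14.7 %"],
    "Agua corporal": [None, "61.7 %", None, "62.3 %"],
    "Unnamed: 8": [None, None, None, None],
})

# Assumption: a row without a weight holds only the date of the rows below it.
is_date_row = raw["Peso"].isna()
raw["Fecha"] = raw["Tiempo"].where(is_date_row).ffill()

# Keep the measurement rows and drop the empty trailing column.
clean = raw.loc[~is_date_row].drop(columns="Unnamed: 8").copy()
clean["timestamp"] = pd.to_datetime(
    clean["Fecha"] + " " + clean["Tiempo"], format="%b %d, %Y %I:%M %p"
)

# Strip units and convert to floats; the target names are hypothetical.
unit_columns = {
    "Peso": "weight_kg",
    "Grasa corporal": "body_fat_pct",
    "Agua corporal": "body_water_pct",
}
for old_name, new_name in unit_columns.items():
    clean[new_name] = clean[old_name].str.extract(r"([\d.]+)", expand=False).astype(float)

clean = clean.rename(columns={"IMC": "bmi"})
clean = clean[["timestamp", "weight_kg", "bmi", "body_fat_pct", "body_water_pct"]]
clean = clean.reset_index(drop=True)

print(clean)
```

The forward fill also covers days with more than one measurement, since every measurement row inherits the most recent date row above it.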


In [9]:
df_GYM_EX = pd.read_csv("data/megaGymDataset.csv")

In [10]:
df_GYM_EX.head(10)

Unnamed: 0.1,Unnamed: 0,Title,Desc,Type,BodyPart,Equipment,Level,Rating,RatingDesc
0,0,Partner plank band row,The partner plank band row is an abdominal exe...,Strength,Abdominals,Bands,Intermediate,0.0,
1,1,Banded crunch isometric hold,The banded crunch isometric hold is an exercis...,Strength,Abdominals,Bands,Intermediate,,
2,2,FYR Banded Plank Jack,The banded plank jack is a variation on the pl...,Strength,Abdominals,Bands,Intermediate,,
3,3,Banded crunch,The banded crunch is an exercise targeting the...,Strength,Abdominals,Bands,Intermediate,,
4,4,Crunch,The crunch is a popular core exercise targetin...,Strength,Abdominals,Bands,Intermediate,,
5,5,Decline band press sit-up,The decline band press sit-up is a weighted co...,Strength,Abdominals,Bands,Intermediate,,
6,6,FYR2 Banded Frog Pump,,Strength,Abdominals,Bands,Intermediate,,
7,7,Band low-to-high twist,The band low-to-high twist is a core exercise ...,Strength,Abdominals,Bands,Intermediate,,
8,8,Barbell roll-out,The barbell roll-out is an abdominal exercise ...,Strength,Abdominals,Barbell,Intermediate,8.9,Average
9,9,Barbell Ab Rollout - On Knees,The barbell roll-out is an abdominal exercise ...,Strength,Abdominals,Barbell,Intermediate,8.9,Average


In [11]:
df_GYM_EX.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2918 entries, 0 to 2917
Data columns (total 9 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   Unnamed: 0  2918 non-null   int64  
 1   Title       2918 non-null   object 
 2   Desc        1368 non-null   object 
 3   Type        2918 non-null   object 
 4   BodyPart    2918 non-null   object 
 5   Equipment   2886 non-null   object 
 6   Level       2918 non-null   object 
 7   Rating      1031 non-null   float64
 8   RatingDesc  862 non-null    object 
dtypes: float64(1), int64(1), object(7)
memory usage: 205.3+ KB
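
The `Unnamed: 0` column looks like a row index that was written into the CSV, and several columns are only partly filled. A small sketch of how both could be handled, using an in-memory CSV with three rows adapted from the preview (descriptions omitted). Reading the first column as the index via `index_col=0` is one option; dropping it after loading would work equally well.

```python
import io

import pandas as pd

# Three rows adapted from the df_GYM_EX.head() preview, without the Desc column.
csv_text = """,Title,Type,BodyPart,Equipment,Level,Rating,RatingDesc
0,Partner plank band row,Strength,Abdominals,Bands,Intermediate,0.0,
1,Banded crunch,Strength,Abdominals,Bands,Intermediate,,
2,Barbell roll-out,Strength,Abdominals,Barbell,Intermediate,8.9,Average
"""

# index_col=0 turns the unnamed first column into the index instead of data.
catalog = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Fraction of missing values per column, largest first.
missing_share = catalog.isna().mean().sort_values(ascending=False)

print(catalog.columns.tolist())
print(missing_share)
```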


In [12]:
df_GYM_churn = pd.read_csv("data/gym_churn_us.csv")

In [13]:
df_GYM_churn.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4000 entries, 0 to 3999
Data columns (total 14 columns):
 #   Column                             Non-Null Count  Dtype  
---  ------                             --------------  -----  
 0   gender                             4000 non-null   int64  
 1   Near_Location                      4000 non-null   int64  
 2   Partner                            4000 non-null   int64  
 3   Promo_friends                      4000 non-null   int64  
 4   Phone                              4000 non-null   int64  
 5   Contract_period                    4000 non-null   int64  
 6   Group_visits                       4000 non-null   int64  
 7   Age                                4000 non-null   int64  
 8   Avg_additional_charges_total       4000 non-null   float64
 9   Month_to_end_contract              4000 non-null   float64
 10  Lifetime                           4000 non-null   int64  
 11  Avg_class_frequency_total          4000 non-null   float

In [14]:
df_GYM_churn

Unnamed: 0,gender,Near_Location,Partner,Promo_friends,Phone,Contract_period,Group_visits,Age,Avg_additional_charges_total,Month_to_end_contract,Lifetime,Avg_class_frequency_total,Avg_class_frequency_current_month,Churn
0,1,1,1,1,0,6,1,29,14.227470,5.0,3,0.020398,0.000000,0
1,0,1,0,0,1,12,1,31,113.202938,12.0,7,1.922936,1.910244,0
2,0,1,1,0,1,1,0,28,129.448479,1.0,2,1.859098,1.736502,0
3,0,1,1,1,1,12,1,33,62.669863,12.0,2,3.205633,3.357215,0
4,1,1,1,1,1,1,0,26,198.362265,1.0,3,1.113884,1.120078,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3995,1,1,1,0,1,12,0,33,2.406023,12.0,8,2.284497,2.349070,0
3996,0,1,0,0,1,1,1,29,68.883764,1.0,1,1.277168,0.292859,1
3997,1,1,1,1,1,12,0,28,78.250542,11.0,2,2.786146,2.831439,0
3998,0,1,1,1,1,6,0,32,61.912657,5.0,3,1.630108,1.596237,0
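
A first question for this table is how churn varies across member segments. The sketch below computes the churn rate per `Contract_period` on the nine rows visible in the preview above (only two of the fourteen columns are used). The segment choice is illustrative, and nine rows say nothing about the 4000-row dataset.

```python
import pandas as pd

# The nine rows visible in the df_GYM_churn preview (rows 0-4 and 3995-3998).
churn_sample = pd.DataFrame({
    "Contract_period": [6, 12, 1, 12, 1, 12, 1, 12, 6],
    "Churn": [0, 0, 0, 0, 0, 0, 1, 0, 0],
})

# Churn is coded 0/1, so its mean is the share of members who churned.
overall_churn = churn_sample["Churn"].mean()
churn_by_contract = churn_sample.groupby("Contract_period")["Churn"].mean()

print("Overall churn rate:", round(overall_churn, 3))
print(churn_by_contract)
```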
