# Predicting overtraining risk
## Objective: Use a person's point-in-time characteristics to estimate whether they are at risk of overtraining, based on correlations between the variables.
### Key variables to analyze:
- **Resting_BPM**: A high resting heart rate can indicate physiological stress.
- **Fat_Percentage** and **BMI**: An imbalance may suggest a disrupted metabolism.
- **Workout_Frequency** and **Session_Duration**: Excessive frequency or duration can be a risk factor.
- **Calories_Burned**: An abnormally low or high value relative to the group average.
- **Experience_Level**: Beginners and experienced athletes do not face the same risks.
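
One way to make "abnormally low or high relative to the group average" concrete is a z-score, which measures how many standard deviations a value lies from the group mean. The sketch below is only an illustration: the sample values, the `zscore` helper, and the threshold of 2 are assumptions, not taken from the dataset or from the analysis in this notebook.

```python
import pandas as pd

# Illustrative values only (not taken from the dataset).
sample = pd.DataFrame({
    "Calories_Burned": [900.0, 950.0, 1000.0, 1050.0, 1100.0, 2500.0],
})

def zscore(series: pd.Series) -> pd.Series:
    """Distance of each value from the group mean, in standard deviations."""
    return (series - series.mean()) / series.std(ddof=0)

# Hypothetical cut-off: more than 2 standard deviations from the mean.
THRESHOLD = 2.0

sample["calories_z"] = zscore(sample["Calories_Burned"])
sample["calories_atypical"] = sample["calories_z"].abs() > THRESHOLD

print(sample)
```

With these toy values only the last row is flagged. The same helper could be applied to `Resting_BPM` or any other numeric column, and the threshold would need tuning on the real data.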

### Import

In [1]:
import pandas as pd
import numpy as np

### Data consolidation
We have two datasets with the same columns; we will combine them into a single dataset and save it as a CSV file.
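
Before stacking two tables, it can help to confirm that their column names really match, since `pd.concat` silently creates extra columns filled with NaN when they do not. The following is a minimal sketch on toy frames; the `same_columns` helper and the sample values are hypothetical and stand in for the two real datasets.

```python
import pandas as pd

def same_columns(left: pd.DataFrame, right: pd.DataFrame) -> bool:
    """Return True when both frames have identical column names in the same order."""
    return list(left.columns) == list(right.columns)

# Toy frames standing in for the two datasets (illustrative values).
a = pd.DataFrame({"Age": [56, 46], "Resting_BPM": [60, 66]})
b = pd.DataFrame({"Age": [34.0, None], "Resting_BPM": [74.0, 73.0]})

if not same_columns(a, b):
    # Report the columns that appear in only one of the two frames.
    mismatch = set(a.columns) ^ set(b.columns)
    raise ValueError(f"Column mismatch: {sorted(mismatch)}")

combined = pd.concat([a, b], ignore_index=True)
print(combined.shape)
```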

In [3]:
df1 = pd.read_csv('exercise_tracking.csv')
df1.head()

Unnamed: 0,Age,Gender,Weight (kg),Height (m),Max_BPM,Avg_BPM,Resting_BPM,Session_Duration (hours),Calories_Burned,Workout_Type,Fat_Percentage,Water_Intake (liters),Workout_Frequency (days/week),Experience_Level,BMI
0,56,Male,88.3,1.71,180,157,60,1.69,1313.0,Yoga,12.6,3.5,4,3,30.2
1,46,Female,74.9,1.53,179,151,66,1.3,883.0,HIIT,33.9,2.1,4,2,32.0
2,32,Female,68.1,1.66,167,122,54,1.11,677.0,Cardio,33.4,2.3,4,2,24.71
3,25,Male,53.2,1.7,190,164,56,0.59,532.0,Strength,28.8,2.1,3,1,18.41
4,38,Male,46.1,1.79,188,158,68,0.64,556.0,Strength,29.2,2.8,3,1,14.39


In [5]:
df2 = pd.read_csv('exercise_tracking_synthetic_data.csv')
df2.head()

Unnamed: 0,Age,Gender,Weight (kg),Height (m),Max_BPM,Avg_BPM,Resting_BPM,Session_Duration (hours),Calories_Burned,Workout_Type,Fat_Percentage,Water_Intake (liters),Workout_Frequency (days/week),Experience_Level,BMI
0,34.0,Female,86.7,1.86,174,152.0,74.0,1.12,712.0,Strength,12.8,2.4,5.0,2.0,14.31
1,26.0,Female,84.7,1.83,166,156.0,73.0,1.0,833.0,Strength,27.9,2.8,5.0,2.0,33.49
2,22.0,Male,64.8,1.85,187,166.0,64.0,1.24,1678.0,Cardio,28.7,1.9,3.0,2.0,12.73
3,54.0,Female,75.3,1.82,187,169.0,58.0,1.45,628.0,Cardio,31.8,2.4,4.0,1.0,20.37
4,34.0,Female,52.8,1.74,177,169.0,66.0,1.6,1286.0,Strength,26.4,3.2,4.0,2.0,20.83


In [6]:
df1.shape, df2.shape

((973, 15), (1800, 15))

In [8]:
df1.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 973 entries, 0 to 972
Data columns (total 15 columns):
 #   Column                         Non-Null Count  Dtype  
---  ------                         --------------  -----  
 0   Age                            973 non-null    int64  
 1   Gender                         973 non-null    object 
 2   Weight (kg)                    973 non-null    float64
 3   Height (m)                     973 non-null    float64
 4   Max_BPM                        973 non-null    int64  
 5   Avg_BPM                        973 non-null    int64  
 6   Resting_BPM                    973 non-null    int64  
 7   Session_Duration (hours)       973 non-null    float64
 8   Calories_Burned                973 non-null    float64
 9   Workout_Type                   973 non-null    object 
 10  Fat_Percentage                 973 non-null    float64
 11  Water_Intake (liters)          973 non-null    float64
 12  Workout_Frequency (days/week)  973 non-null    int

In [9]:
df2.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1800 entries, 0 to 1799
Data columns (total 15 columns):
 #   Column                         Non-Null Count  Dtype  
---  ------                         --------------  -----  
 0   Age                            1790 non-null   float64
 1   Gender                         1729 non-null   object 
 2   Weight (kg)                    1778 non-null   float64
 3   Height (m)                     1774 non-null   float64
 4   Max_BPM                        1779 non-null   object 
 5   Avg_BPM                        1770 non-null   float64
 6   Resting_BPM                    1781 non-null   float64
 7   Session_Duration (hours)       1777 non-null   float64
 8   Calories_Burned                1777 non-null   float64
 9   Workout_Type                   1739 non-null   object 
 10  Fat_Percentage                 1784 non-null   float64
 11  Water_Intake (liters)          1776 non-null   float64
 12  Workout_Frequency (days/week)  1742 non-null   f
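
The `info()` output shows that the second dataset has missing values and that `Max_BPM` is stored as `object`, which usually means some entries are not valid numbers. The sketch below shows one way to count missing values and to list the entries that fail numeric parsing. The toy column and its stray strings are hypothetical; the real offending values are not shown in this notebook.

```python
import pandas as pd

# Toy column standing in for df2["Max_BPM"]; the stray strings are hypothetical.
raw = pd.DataFrame({
    "Max_BPM": pd.Series(["174", "166", "n/a", None, "187", "18O"], dtype="object"),
})

# Number of missing values per column.
missing_counts = raw.isna().sum()

# Values that are present but cannot be parsed as numbers.
parsed = pd.to_numeric(raw["Max_BPM"], errors="coerce")
unparseable = raw.loc[parsed.isna() & raw["Max_BPM"].notna(), "Max_BPM"]

print(missing_counts)
print(unparseable.tolist())
```

Inspecting the unparseable values first makes it easier to decide whether they should be corrected, dropped, or treated as missing.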

In [None]:
# Convert df2's column types to match df1's, since df1's types are the most consistent
for col in df1.columns:
    dtype = df1[col].dtype
    if dtype == 'int64':
        # Force the conversion: unparseable values become NaN, and NaN is then replaced by 0
        # (0 is a placeholder here, not a measured value)
        df2[col] = pd.to_numeric(df2[col], errors='coerce').fillna(0).astype('int64')
    elif dtype == 'float64':
        # Same idea for floats: missing or unparseable values end up as 0.0
        df2[col] = pd.to_numeric(df2[col], errors='coerce').fillna(0.0)
    else:
        df2[col] = df2[col].astype(dtype)

# Check the resulting types
print(df2.dtypes)

Age                                int64
Gender                            object
Weight (kg)                      float64
Height (m)                       float64
Max_BPM                            int64
Avg_BPM                            int64
Resting_BPM                        int64
Session_Duration (hours)         float64
Calories_Burned                  float64
Workout_Type                      object
Fat_Percentage                   float64
Water_Intake (liters)            float64
Workout_Frequency (days/week)      int64
Experience_Level                   int64
BMI                              float64
dtype: object
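
The conversion above fills missing or unparseable entries with 0, which keeps plain `int64` columns but makes a missing age indistinguishable from an age of 0. One alternative, shown here only as a sketch and not as the approach used in this notebook, is pandas' nullable `Int64` dtype, which keeps integers and missing values side by side. The `to_nullable_int` helper and the toy data are assumptions for illustration.

```python
import pandas as pd

# Toy data standing in for two df2 columns (illustrative values).
toy = pd.DataFrame({
    "Age": [34.0, None, 22.0],
    "Max_BPM": pd.Series(["174", "abc", "187"], dtype="object"),
})

def to_nullable_int(series: pd.Series) -> pd.Series:
    """Parse to numbers, then store as nullable integers (missing stays <NA>)."""
    numeric = pd.to_numeric(series, errors="coerce")
    # Round first so that any fractional values do not make the cast fail.
    return numeric.round().astype("Int64")

converted = pd.DataFrame({col: to_nullable_int(toy[col]) for col in toy.columns})

print(converted.dtypes)
print(converted.isna().sum())
```

With this variant the missing entries remain visible to `isna()`, so the decision on how to impute or drop them can be made later and explicitly.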


In [14]:
df = pd.concat([df1, df2], ignore_index=True)
df.shape

(2773, 15)

In [16]:
df.to_csv('consolidated_exercise_tracking.csv', index=False)
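
A quick round-trip check can confirm that the exported file reloads with the expected shape and columns. The sketch below uses an in-memory buffer and a toy frame instead of the real file, so the names and values are illustrative only. It also shows why `index=False` matters: without it, the index is written as an extra unnamed column that `read_csv` loads as `Unnamed: 0`.

```python
import io
import pandas as pd

# Toy frame standing in for the consolidated dataset (illustrative values).
toy = pd.DataFrame({"Age": [56, 46], "Gender": ["Male", "Female"], "BMI": [30.2, 32.0]})

# Write without the index, then read the text back.
buffer = io.StringIO()
toy.to_csv(buffer, index=False)
buffer.seek(0)
reloaded = pd.read_csv(buffer)

same_shape = reloaded.shape == toy.shape
same_cols = list(reloaded.columns) == list(toy.columns)

# For contrast: writing the index adds an extra column on reload.
with_index = pd.read_csv(io.StringIO(toy.to_csv(index=True)))

print(same_shape, same_cols)
print(list(with_index.columns))
```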