# Predicting overtraining risk
## Objective: Use a person's point-in-time characteristics to estimate whether they are at risk of overtraining, based on correlations between the variables.
### Key variables to analyze:
- **Resting_BPM**: A high Resting_BPM can indicate physiological stress.
- **Fat_Percentage** and **BMI**: An imbalance may suggest a disrupted metabolism.
- **Workout_Frequency** and **Session_Duration**: Excessive frequency or duration can be a risk factor.
- **Calories_Burned**: An abnormally low or high value relative to the group average.
- **Experience_Level**: Beginners and experienced athletes do not face the same risks.
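
As a minimal sketch of what "correlations between the variables" could look like in practice, the snippet below computes a Pearson correlation matrix over the key columns. The five sample rows and the `key_cols` list are illustrative assumptions in the same column layout as the datasets; with so few rows the coefficients mean nothing statistically, and the real analysis would run on the full consolidated frame.

```python
import pandas as pd

# Key columns listed above, using the dataset's full column names.
key_cols = [
    "Resting_BPM", "Fat_Percentage", "BMI",
    "Workout_Frequency (days/week)", "Session_Duration (hours)",
    "Calories_Burned", "Experience_Level",
]

# Five sample rows, for illustration only.
sample = pd.DataFrame({
    "Resting_BPM": [60, 66, 54, 56, 68],
    "Fat_Percentage": [12.6, 33.9, 33.4, 28.8, 29.2],
    "BMI": [30.2, 32.0, 24.71, 18.41, 14.39],
    "Workout_Frequency (days/week)": [4, 4, 4, 3, 3],
    "Session_Duration (hours)": [1.69, 1.3, 1.11, 0.59, 0.64],
    "Calories_Burned": [1313.0, 883.0, 677.0, 532.0, 556.0],
    "Experience_Level": [3, 2, 2, 1, 1],
})

# DataFrame.corr() uses the Pearson coefficient by default.
corr = sample[key_cols].corr()
print(corr.round(2))
```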

### Import

In [6]:
import pandas as pd
import numpy as np

### Data consolidation
We have two datasets with the same columns. We will combine them into a single dataset and save it as a CSV file.

In [8]:
df1 = pd.read_csv('exercise_tracking.csv')
df1.head()

Unnamed: 0,Age,Gender,Weight (kg),Height (m),Max_BPM,Avg_BPM,Resting_BPM,Session_Duration (hours),Calories_Burned,Workout_Type,Fat_Percentage,Water_Intake (liters),Workout_Frequency (days/week),Experience_Level,BMI
0,56,Male,88.3,1.71,180,157,60,1.69,1313.0,Yoga,12.6,3.5,4,3,30.2
1,46,Female,74.9,1.53,179,151,66,1.3,883.0,HIIT,33.9,2.1,4,2,32.0
2,32,Female,68.1,1.66,167,122,54,1.11,677.0,Cardio,33.4,2.3,4,2,24.71
3,25,Male,53.2,1.7,190,164,56,0.59,532.0,Strength,28.8,2.1,3,1,18.41
4,38,Male,46.1,1.79,188,158,68,0.64,556.0,Strength,29.2,2.8,3,1,14.39


In [5]:
df2 = pd.read_csv('exercise_tracking_synthetic_data.csv')
df2.head()

Unnamed: 0,Age,Gender,Weight (kg),Height (m),Max_BPM,Avg_BPM,Resting_BPM,Session_Duration (hours),Calories_Burned,Workout_Type,Fat_Percentage,Water_Intake (liters),Workout_Frequency (days/week),Experience_Level,BMI
0,34.0,Female,86.7,1.86,174,152.0,74.0,1.12,712.0,Strength,12.8,2.4,5.0,2.0,14.31
1,26.0,Female,84.7,1.83,166,156.0,73.0,1.0,833.0,Strength,27.9,2.8,5.0,2.0,33.49
2,22.0,Male,64.8,1.85,187,166.0,64.0,1.24,1678.0,Cardio,28.7,1.9,3.0,2.0,12.73
3,54.0,Female,75.3,1.82,187,169.0,58.0,1.45,628.0,Cardio,31.8,2.4,4.0,1.0,20.37
4,34.0,Female,52.8,1.74,177,169.0,66.0,1.6,1286.0,Strength,26.4,3.2,4.0,2.0,20.83


In [6]:
df1.shape, df2.shape

((973, 15), (1800, 15))
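
Both frames report 15 columns, but equal counts do not by themselves guarantee equal names or order. A small check along these lines could be run before concatenating. The helper name `same_columns` and the toy frames are hypothetical and not part of the original notebook.

```python
import pandas as pd

def same_columns(a: pd.DataFrame, b: pd.DataFrame) -> bool:
    """Return True when both frames have identical column names in the same order."""
    return list(a.columns) == list(b.columns)

# Toy frames for illustration; the notebook would pass df1 and df2 instead.
left = pd.DataFrame({"Age": [56], "Gender": ["Male"]})
right = pd.DataFrame({"Age": [34.0], "Gender": ["Female"]})
other = pd.DataFrame({"Age": [22], "Sex": ["Male"]})

print(same_columns(left, right))
print(same_columns(left, other))
```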

In [8]:
df1.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 973 entries, 0 to 972
Data columns (total 15 columns):
 #   Column                         Non-Null Count  Dtype  
---  ------                         --------------  -----  
 0   Age                            973 non-null    int64  
 1   Gender                         973 non-null    object 
 2   Weight (kg)                    973 non-null    float64
 3   Height (m)                     973 non-null    float64
 4   Max_BPM                        973 non-null    int64  
 5   Avg_BPM                        973 non-null    int64  
 6   Resting_BPM                    973 non-null    int64  
 7   Session_Duration (hours)       973 non-null    float64
 8   Calories_Burned                973 non-null    float64
 9   Workout_Type                   973 non-null    object 
 10  Fat_Percentage                 973 non-null    float64
 11  Water_Intake (liters)          973 non-null    float64
 12  Workout_Frequency (days/week)  973 non-null    int

In [9]:
df2.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1800 entries, 0 to 1799
Data columns (total 15 columns):
 #   Column                         Non-Null Count  Dtype  
---  ------                         --------------  -----  
 0   Age                            1790 non-null   float64
 1   Gender                         1729 non-null   object 
 2   Weight (kg)                    1778 non-null   float64
 3   Height (m)                     1774 non-null   float64
 4   Max_BPM                        1779 non-null   object 
 5   Avg_BPM                        1770 non-null   float64
 6   Resting_BPM                    1781 non-null   float64
 7   Session_Duration (hours)       1777 non-null   float64
 8   Calories_Burned                1777 non-null   float64
 9   Workout_Type                   1739 non-null   object 
 10  Fat_Percentage                 1784 non-null   float64
 11  Water_Intake (liters)          1776 non-null   float64
 12  Workout_Frequency (days/week)  1742 non-null   f

In [None]:
# Convert df2's dtypes to match df1's, since df1's are the more consistent
for col in df1.columns:
    dtype = df1[col].dtype
    if dtype == 'int64':
        # Force the conversion: unparseable or missing values become 0
        df2[col] = pd.to_numeric(df2[col], errors='coerce').fillna(0).astype('int64')
    elif dtype == 'float64':
        # Same idea for floats: missing values are filled with 0.0
        df2[col] = pd.to_numeric(df2[col], errors='coerce').fillna(0.0)
    else:
        df2[col] = df2[col].astype(dtype)

# Check the resulting dtypes
print(df2.dtypes)

Age                                int64
Gender                            object
Weight (kg)                      float64
Height (m)                       float64
Max_BPM                            int64
Avg_BPM                            int64
Resting_BPM                        int64
Session_Duration (hours)         float64
Calories_Burned                  float64
Workout_Type                      object
Fat_Percentage                   float64
Water_Intake (liters)            float64
Workout_Frequency (days/week)      int64
Experience_Level                   int64
BMI                              float64
dtype: object
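
The conversion above fills missing or unparseable numeric values with 0, which turns them into placeholder zeros that must be filtered out later. One possible alternative, shown here only as a sketch, is pandas' nullable `Int64` dtype, which keeps such values as `<NA>`. The `raw` frame is made-up illustrative data, and rounding before the cast is an assumption that integer columns should hold whole numbers.

```python
import pandas as pd

# Illustrative data: a float column with a gap and an object column
# containing a non-numeric entry.
raw = pd.DataFrame({
    "Age": [34.0, None, 22.0],
    "Max_BPM": ["174", "abc", "187"],
})

# Coerce to numeric (bad entries become NaN), round to whole numbers,
# then cast to the nullable integer dtype so NaN is kept as <NA>.
converted = pd.DataFrame({
    col: pd.to_numeric(raw[col], errors="coerce").round().astype("Int64")
    for col in raw.columns
})

print(converted)
print(converted.dtypes)
```

With this approach the missing entries still show up in `isnull()` counts instead of being hidden as zeros.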


In [14]:
df = pd.concat([df1, df2], ignore_index=True)
df.shape

(2773, 15)

In [16]:
df.to_csv('consolidated_exercise_tracking.csv', index=False)

# EDA
| Variable                                   | Meaning                                        | What it can indicate                                                       |
| ------------------------------------------ | ---------------------------------------------- | -------------------------------------------------------------------------- |
| **Resting_BPM** (resting beats per minute) | Average heart rate when the person is at rest. | 🔺 If elevated, it can indicate fatigue or stress → risk of overtraining.  |
| **Fat_Percentage**                         | Body fat percentage.                           | Too high or too low for the person's build → metabolic imbalance.          |
| **BMI** (body mass index)                  | Weight divided by height squared (kg/m²).      | Used to assess body build; associated with physical condition.             |
| **Workout_Frequency**                      | Number of training days per week.              | Too many workouts = risk of overexertion.                                  |
| **Session_Duration**                       | Average length of a session, in hours.         | Overly long sessions can also indicate overtraining.                       |
| **Calories_Burned**                        | Calories expended during training.             | Very high or very low energy expenditure → imbalance.                      |
| **Experience_Level**                       | Beginner, intermediate, advanced, expert.      | A beginner tires faster; an expert manages effort better.                  |


In [10]:
import pandas as pd
import numpy as np

df = pd.read_csv('consolidated_exercise_tracking.csv')

In [12]:
df.head()

Unnamed: 0,Age,Gender,Weight (kg),Height (m),Max_BPM,Avg_BPM,Resting_BPM,Session_Duration (hours),Calories_Burned,Workout_Type,Fat_Percentage,Water_Intake (liters),Workout_Frequency (days/week),Experience_Level,BMI
0,56,Male,88.3,1.71,180,157,60,1.69,1313.0,Yoga,12.6,3.5,4,3,30.2
1,46,Female,74.9,1.53,179,151,66,1.3,883.0,HIIT,33.9,2.1,4,2,32.0
2,32,Female,68.1,1.66,167,122,54,1.11,677.0,Cardio,33.4,2.3,4,2,24.71
3,25,Male,53.2,1.7,190,164,56,0.59,532.0,Strength,28.8,2.1,3,1,18.41
4,38,Male,46.1,1.79,188,158,68,0.64,556.0,Strength,29.2,2.8,3,1,14.39


In [14]:
df.shape

(2773, 15)

In [16]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2773 entries, 0 to 2772
Data columns (total 15 columns):
 #   Column                         Non-Null Count  Dtype  
---  ------                         --------------  -----  
 0   Age                            2773 non-null   int64  
 1   Gender                         2702 non-null   object 
 2   Weight (kg)                    2773 non-null   float64
 3   Height (m)                     2773 non-null   float64
 4   Max_BPM                        2773 non-null   int64  
 5   Avg_BPM                        2773 non-null   int64  
 6   Resting_BPM                    2773 non-null   int64  
 7   Session_Duration (hours)       2773 non-null   float64
 8   Calories_Burned                2773 non-null   float64
 9   Workout_Type                   2712 non-null   object 
 10  Fat_Percentage                 2773 non-null   float64
 11  Water_Intake (liters)          2773 non-null   float64
 12  Workout_Frequency (days/week)  2773 non-null   i

In [18]:
df.describe()

Unnamed: 0,Age,Weight (kg),Height (m),Max_BPM,Avg_BPM,Resting_BPM,Session_Duration (hours),Calories_Burned,Fat_Percentage,Water_Intake (liters),Workout_Frequency (days/week),Experience_Level,BMI
count,2773.0,2773.0,2773.0,2773.0,2773.0,2773.0,2773.0,2773.0,2773.0,2773.0,2773.0,2773.0,2773.0
mean,35.827624,69.262604,1.717083,178.29066,143.80238,62.906599,1.332625,980.115038,23.888604,2.655139,3.263253,1.781464,21.48026
std,12.593311,21.396771,0.208606,21.332486,21.157348,9.341346,0.386121,326.777542,6.298444,0.715767,1.041323,0.78025,7.33063
min,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,25.0,54.4,1.63,170.0,132.0,56.0,1.09,758.0,20.7,2.1,3.0,1.0,16.06
50%,35.0,66.8,1.73,180.0,144.0,64.0,1.32,966.0,24.9,2.7,3.0,2.0,20.75
75%,47.0,82.3,1.82,190.0,158.0,71.0,1.56,1187.0,28.2,3.3,4.0,2.0,25.68
max,59.0,129.9,2.0,199.0,169.0,74.0,2.0,1783.0,35.0,3.7,5.0,3.0,49.84


In [20]:
df.duplicated().sum()

0

Checking for missing values

In [25]:
df.isnull().sum()

Age                               0
Gender                           71
Weight (kg)                       0
Height (m)                        0
Max_BPM                           0
Avg_BPM                           0
Resting_BPM                       0
Session_Duration (hours)          0
Calories_Burned                   0
Workout_Type                     61
Fat_Percentage                    0
Water_Intake (liters)             0
Workout_Frequency (days/week)     0
Experience_Level                  0
BMI                               0
dtype: int64

In [27]:
(df.isnull().sum() / len(df)) * 100   # percentage of NaN per column

Age                              0.000000
Gender                           2.560404
Weight (kg)                      0.000000
Height (m)                       0.000000
Max_BPM                          0.000000
Avg_BPM                          0.000000
Resting_BPM                      0.000000
Session_Duration (hours)         0.000000
Calories_Burned                  0.000000
Workout_Type                     2.199784
Fat_Percentage                   0.000000
Water_Intake (liters)            0.000000
Workout_Frequency (days/week)    0.000000
Experience_Level                 0.000000
BMI                              0.000000
dtype: float64
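
The count and the percentage of missing values can also be read side by side. The sketch below uses a hypothetical helper, `missing_summary`, on a small made-up frame, and keeps only the columns that actually have gaps.

```python
import pandas as pd

def missing_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Count and percentage of missing values, for columns that have any."""
    n_missing = frame.isna().sum()
    summary = pd.DataFrame({
        "n_missing": n_missing,
        "pct_missing": (n_missing / len(frame) * 100).round(2),
    })
    # Keep only affected columns, most affected first.
    return summary[summary["n_missing"] > 0].sort_values("n_missing", ascending=False)

# Made-up data for illustration; the notebook would pass df instead.
demo = pd.DataFrame({
    "Gender": ["Male", None, "Female", None],
    "Workout_Type": ["Yoga", "HIIT", None, "Cardio"],
    "Age": [56, 46, 32, 25],
})

result = missing_summary(demo)
print(result)
```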

In [31]:
df.dropna(subset=['Gender', 'Workout_Type'], inplace=True)

In [33]:
cols = ['Age', 'Height (m)', 'Weight (kg)']
for c in cols:
    print(c, (df[c] == 0).sum())

Age 10
Height (m) 26
Weight (kg) 21


In [37]:
df = df[(df['Age'] != 0) & (df['Height (m)'] != 0) & (df['Weight (kg)'] != 0)]
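
The `df.describe()` output reports a minimum of 0 for every numeric column, so placeholder zeros may exist beyond the three columns checked here. As a sketch only, the same filter can be written as a reusable helper that takes any list of columns. `drop_zero_rows` and the `demo` frame are hypothetical, and which columns may legitimately contain 0 is a judgment call left to the analyst.

```python
import pandas as pd

def drop_zero_rows(frame: pd.DataFrame, cols: list) -> pd.DataFrame:
    """Return a copy of frame without rows where any of cols equals 0."""
    mask = (frame[cols] != 0).all(axis=1)
    return frame.loc[mask].copy()

# Made-up data for illustration; the notebook would pass df instead.
demo = pd.DataFrame({
    "Age": [56, 0, 32, 25],
    "Height (m)": [1.71, 1.53, 0.0, 1.70],
    "Weight (kg)": [88.3, 74.9, 68.1, 53.2],
})

# Count zeros per column in one call instead of looping.
zero_counts = (demo == 0).sum()
print(zero_counts)

cleaned = drop_zero_rows(demo, ["Age", "Height (m)", "Weight (kg)"])
print(cleaned)
```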