<div>
<img src="https://raw.githubusercontent.com/RafaelCaballero/tdm/refs/heads/master/images/mioti.png" width=600 />
</div>
<div style="background-color: #E9967A; color: black; padding: 10px; border-radius: 5px;">
<h1>Study on the variables influencing calorie burning </h1>
</div>

# Table of contents
1. [Data loading, cleaning, and restructuring](#data-loading,-cleaning-and-restructuring)<br>
2. [Visualization of variables individually](#visualization-of-variables-individually)<br>
3. [Outliers](#outliers)<br>
4. [Statistical tests](#statistical-tests)<br>
5. [Correlation between variables](#correlation-between-variables)<br>
6. [Regression model](#regression-model)<br>
7. [Conclusions](#conclusions)<br>

## DATA LOADING, CLEANING AND RESTRUCTURING

First, we read the data from the csv file.

In [None]:
# Load the data
import numpy as np
import pandas as pd

df = pd.read_csv('gym_members_exercise_tracking.csv', sep=',')
df.head(10)

We drop the column 'Water_Intake (liters)' since it is not of interest.

In [None]:
df= df.drop(columns='Water_Intake (liters)')

We obtain some statistics to familiarize ourselves with the data and the information of each column through our own 'info_df()' function.

In [None]:
df.describe()

In [None]:
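def info_df(df):
    """Summarize each column: null counts, dtype, and whether a categorical is ordered."""
    return pd.DataFrame({
        'Column': df.columns,
        'Non-null': df.notnull().sum().values,
        'Null': df.isnull().sum().values,
        'Dtype': df.dtypes.values,
        # Ordered flag for categorical columns, None for every other dtype
        'Ordered': [df[col].cat.ordered if isinstance(df[col].dtype, pd.CategoricalDtype) else None
                    for col in df.columns]
    })

# Print the value counts of every column with fewer than 10 distinct values
for col in df.columns:
    if df[col].nunique() < 10:
        print(df[col].value_counts())
        print("=" * 50)

info_df(df)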

We convert the columns 'Gender', 'Workout_Type', 'Workout_Frequency (days/week)', and 'Experience_Level' into categorical variables:

In [None]:
from pandas.api.types import CategoricalDtype 

cat_type=CategoricalDtype(categories=['Male','Female'], ordered=False)
df['Gender'] = df['Gender'].astype(cat_type)

cat_type = CategoricalDtype(categories=['Strength','Cardio','Yoga','HIIT'], ordered=False) 
df['Workout_Type'] = df['Workout_Type'].astype(cat_type)

df['Workout_Frequency (days/week)']=df['Workout_Frequency (days/week)'].astype(str)
cat_type = CategoricalDtype(categories=['2','3','4','5'], ordered=False) 
df['Workout_Frequency (days/week)'] = df['Workout_Frequency (days/week)'].astype(cat_type)

# Map the 'Experience_Level' codes to labels, then convert the column to a categorical type
mapa={1:'Principiante',2:'Intermedio',3:'Avanzado'}
df['Experience_Level']=df['Experience_Level'].map(mapa)
cat_type=CategoricalDtype(categories=['Principiante','Intermedio','Avanzado'], ordered=False)
df['Experience_Level']=df['Experience_Level'].astype(cat_type)

info_df(df)
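
The categories above are declared with `ordered=False`. For a variable with a natural ranking, such as experience level, an ordered categorical would additionally allow comparisons and `min`/`max`. A minimal sketch on a toy Series follows; the Series `niveles` and the choice `ordered=True` are illustrative assumptions, not part of the analysis above:

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# Hypothetical toy data, reusing the labels defined in the notebook
tipo_ordenado = CategoricalDtype(
    categories=['Principiante', 'Intermedio', 'Avanzado'], ordered=True
)
niveles = pd.Series(['Intermedio', 'Principiante', 'Avanzado']).astype(tipo_ordenado)

# With an ordered dtype, comparisons follow the declared category order
al_menos_intermedio = (niveles >= 'Intermedio').tolist()
nivel_maximo = niveles.max()
print(al_menos_intermedio, nivel_maximo)
```

The statistical tests used later in this notebook do not depend on the ordering flag, so this is only a convenience for sorting and filtering.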

## VISUALIZATION OF VARIABLES INDIVIDUALLY

<div style="background-color: blue; padding: 10px; border-radius: 5px;">
We now proceed to present some graphs of the variables individually to better understand the data. Graphs of all variables are not presented for the sake of brevity.
</div>

First, we plot histograms of all the continuous variables:

In [None]:
# Histograms (with density estimates) of the continuous variables
import matplotlib.pyplot as plt

# Columns to plot
columnas_a_graficar = ['Age', 'Weight (kg)', 'Height (m)','Max_BPM', 'Avg_BPM', 'Session_Duration (hours)','Calories_Burned','Fat_Percentage','BMI']

# Create the grid of subplots
fig, axes = plt.subplots(3, 3, figsize=(12, 8))
axes = axes.flatten()

for i, col in enumerate(columnas_a_graficar):
    ax = axes[i]
    df[col].plot(kind='density', ax=ax, color='blue', xlim=(df[col].min(), df[col].max()))
    df[col].plot(kind='hist', ax=ax, bins=30, density=True, alpha=0.5, color='orange')
    ax.set_title(f'Distribution of {col}')
    ax.set_xlabel(col)
    ax.set_ylabel('Density')

# Adjust the layout
plt.tight_layout()
plt.show()


Comments on each of the graphs:

- **Distribution of Age**: A nearly uniform distribution is observed.
- **Distribution of Weight (kg)**: A unimodal, right-skewed distribution is observed. Most weights are concentrated between 60 and 80 kg.
- **Distribution of Height (m)**: A roughly bimodal distribution is observed, with peaks around 1.63 m and 1.78 m. These peaks likely correspond to the typical heights of women and men, respectively.
- **Distribution of Max_BPM**: A roughly uniform distribution is observed.
- **Distribution of Avg_BPM**: A roughly uniform distribution is observed, with a slight (almost negligible) peak around 132 BPM.
- **Distribution of Session_Duration (hours)**: A roughly symmetric, bell-shaped distribution is observed, with a slightly heavier right tail.
- **Distribution of Calories_Burned**: A bell-shaped, right-skewed distribution is observed. The most common values are between 800 and 900.
- **Distribution of Fat_Percentage**: A left-skewed (negatively skewed) distribution is observed.
- **Distribution of BMI**: A bell-shaped, right-skewed distribution is observed. The most common values are around 25, as expected.

Next, all bar charts are displayed grouped into a single image for the sake of brevity. The variables shown here are categorical and provide the following information:

In [None]:
# All bar charts in a single figure
# (the per-plot figsize=(10, 8) previously overrode the subplots size, so it is set once here)
fig, axes = plt.subplots(2, 2, figsize=(10, 8))

df['Experience_Level'].value_counts().plot(kind='bar', ax=axes[0, 0], rot=45, color='blue', alpha=0.5,
                                           title='Frequency of each experience level', xlabel='Experience level')
df['Workout_Type'].value_counts().plot(kind='bar', ax=axes[0, 1], rot=45, color='orange', alpha=0.5,
                                       title='Frequency of each workout type', xlabel='Workout type')
df['Gender'].value_counts().plot(kind='bar', ax=axes[1, 0], rot=45, color='orange', alpha=0.5,
                                 title='Frequency of each gender', xlabel='Gender')
df['Workout_Frequency (days/week)'].value_counts().sort_index().plot(kind='bar', ax=axes[1, 1], rot=0, color='blue', alpha=0.5,
                                                                     title='Workout frequency', xlabel='Training days per week')
plt.tight_layout()
plt.show()

Comments on each of the bar charts:

- **First chart** (top left): Shows the number of people at each of the three experience levels: beginner, intermediate, and advanced. There are far fewer advanced individuals than beginners and intermediates.
- **Second chart** (top right): Shows the type of training performed by each individual. The four types are fairly balanced; however, strength and cardio are the most prevalent.
- **Third chart** (bottom left): Shows the number of men and women in the sample. There are more men than women.
- **Fourth chart** (bottom right): Shows the number of days per week each person trains. Training 3 or 4 days per week is the most common.

## OUTLIERS

<div style="background-color: blue; padding: 10px; border-radius: 5px;">
    Once we have an idea of how our data behaves, we proceed to detect the outliers in each column. We calculate them numerically for all the variables since it is essential to understand the data we will use to train our model. Additionally, we will represent some variables using box-and-whisker plots.
</div>

The method chosen to detect the outliers is *Hampel X84*, which uses the median and the MAD (Median Absolute Deviation). Points that deviate from the median by more than $1.4826 \times \theta \times \mathrm{MAD}$ are considered outliers. We set $\theta = 3$, a conventional but ultimately arbitrary choice.

In [None]:
from scipy.stats import median_abs_deviation

for col in columnas_a_graficar:
    columna = df[col]
    mediana = columna.median()
    MAD = median_abs_deviation(columna)
    # Hampel X84 bounds with theta = 3
    inferior = mediana - 1.4826 * 3 * MAD
    superior = mediana + 1.4826 * 3 * MAD
    outliers_inf = columna[columna < inferior].tolist()
    outliers_sup = columna[columna > superior].tolist()

    if not outliers_inf and not outliers_sup:
        print(f'Column {col} has no outliers for \u03B8 = 3')
    if outliers_inf:
        print(f'Lower outliers of column {col}:', outliers_inf)
    if outliers_sup:
        print(f'Upper outliers of column {col}:', outliers_sup)



We observe that many columns do not have outliers. On the other hand, the 'BMI' column has a relatively high number of outliers, possibly because the threshold $\theta = 3$ is too strict for this variable.

Next, we create box-and-whisker plots for the variables that show outliers and for the variable 'Avg_BPM'. Since this type of plot uses the interquartile range method to calculate outliers, more outliers may appear compared to the *Hampel X84* method.

In [None]:
# Box-and-whisker plots
columnas_caja = ['Weight (kg)', 'Avg_BPM', 'Calories_Burned', 'BMI']
titulos_caja = ['Weight (kg)', 'Avg BPM', 'Calories Burned', 'BMI']
estilo_azul = dict(color='blue')

# figsize=(11, 9) was previously applied through the first plot call; it is set once here
fig, axes = plt.subplots(2, 2, figsize=(11, 9))

for ax, col, titulo in zip(axes.flatten(), columnas_caja, titulos_caja):
    df[col].plot(kind='box', ax=ax, title=titulo,
                 boxprops=estilo_azul, whiskerprops=estilo_azul, capprops=estilo_azul,
                 medianprops=dict(linewidth=2, color='orange'))

plt.tight_layout()
plt.show()

It can be observed that more outliers have appeared than before. For 'Weight', they are very close to the upper limit, so we will not address them. For 'Calories Burned', although some deviate slightly further, we will also leave them untouched. The same applies to 'BMI', which does have some fairly distant outliers, but we will leave them as it is a secondary measure dependent on two independent variables (weight and height) and is not of much use to us.

The reason for not addressing the outliers is that they do not seem to result from erroneous measurements but rather from extreme cases of overweight in the case of 'Weight' and 'BMI' (as these variables are correlated). The outliers in 'Calories Burned' are plausible.
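
The difference in the number of points flagged by the two rules can be sketched on synthetic data. The sample below is an assumed, roughly normal sample (not the gym data), and the helper names `limites_hampel` and `limites_iqr` are hypothetical:

```python
import numpy as np
from scipy.stats import median_abs_deviation

def limites_hampel(x, theta=3.0):
    """Median +/- 1.4826 * theta * MAD (Hampel X84)."""
    mediana = np.median(x)
    semiancho = 1.4826 * theta * median_abs_deviation(x)
    return mediana - semiancho, mediana + semiancho

def limites_iqr(x, k=1.5):
    """Q1 - k*IQR and Q3 + k*IQR, the default whisker rule of box plots."""
    q1, q3 = np.percentile(x, [25, 75])
    return q1 - k * (q3 - q1), q3 + k * (q3 - q1)

# Synthetic, roughly normal sample (an assumption for illustration)
rng = np.random.default_rng(0)
muestra = rng.normal(loc=70, scale=10, size=10_000)

inf_h, sup_h = limites_hampel(muestra)
inf_q, sup_q = limites_iqr(muestra)
n_hampel = int(np.sum((muestra < inf_h) | (muestra > sup_h)))
n_iqr = int(np.sum((muestra < inf_q) | (muestra > sup_q)))

print(f'Hampel: [{inf_h:.1f}, {sup_h:.1f}] -> {n_hampel} points flagged')
print(f'IQR:    [{inf_q:.1f}, {sup_q:.1f}] -> {n_iqr} points flagged')
```

For normally distributed data, $\theta = 3$ places the Hampel bounds at about three standard deviations from the median, while the $1.5 \times \mathrm{IQR}$ fences sit at roughly 2.7 standard deviations, so the box plot tends to flag somewhat more points.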

## STATISTICAL TESTS


<div style="background-color: blue; padding: 10px; border-radius: 5px;">
After studying the outliers, we proceed to perform some statistical tests on certain variables. In our DataFrame, we have categorical variables such as 'Gender', 'Experience_Level', 'Workout_Frequency (days/week)', and 'Workout_Type'. We will analyze how the quantitative variables behave within each category. Later, we will explore the relationships between the categorical variables themselves.
</div>

First, we perform a Student's *t*-test on the categories of 'Gender' (Male/Female) for the variable 'Calories_Burned'. The goal is to determine whether the mean calories burned differ depending on gender.

In [None]:
import scipy.stats as stats

calorias_hombres = df[df['Gender'] == 'Male']['Calories_Burned']
calorias_mujeres = df[df['Gender'] == 'Female']['Calories_Burned']

# Student's t-test (assumes equal variances in both groups)
t_stat, p_value = stats.ttest_ind(calorias_hombres, calorias_mujeres, equal_var=True)

print(f"t statistic: {t_stat}")
print(f"p-value: {p_value}")
alpha = 0.05
if p_value < alpha:
    print('We reject the null hypothesis: the mean calories burned differ significantly.')
else:
    print('We cannot reject the null hypothesis: no significant difference in mean calories burned.')

We conclude that the mean calories burned differ significantly depending on gender.
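
The test above assumes equal variances in both groups (`equal_var=True`). As a hedged sketch, one way to check that assumption is Levene's test, falling back to Welch's *t*-test when it is doubtful. The two groups below are synthetic, with assumed means and spreads; they are not the gym data:

```python
import numpy as np
import scipy.stats as stats

# Synthetic groups (assumed values): different means and slightly different spreads
rng = np.random.default_rng(1)
grupo_a = rng.normal(loc=1000, scale=280, size=300)
grupo_b = rng.normal(loc=800, scale=250, size=300)

# Levene's test: the null hypothesis is that both groups have equal variances
_, p_levene = stats.levene(grupo_a, grupo_b)

# Student's t-test if equal variances are plausible, otherwise Welch's t-test
varianzas_iguales = bool(p_levene >= 0.05)
t_stat, p_value = stats.ttest_ind(grupo_a, grupo_b, equal_var=varianzas_iguales)

print(f'Levene p = {p_levene:.3f}, equal_var = {varianzas_iguales}')
print(f't = {t_stat:.2f}, p = {p_value:.4g}')
```

With groups of similar size, the two variants usually agree closely; the choice matters most when group sizes and variances both differ.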

Now, we perform an ANOVA test on the variable 'Calories_Burned' based on the type of workout to determine whether all workout types are equal in terms of calorie burning or whether there is a difference. The null hypothesis ($H_0$) is that the mean calories burned is the same for all workout types. Let's see:

In [None]:
calorias_fuerza=df[df['Workout_Type']=='Strength']['Calories_Burned']
calorias_cardio=df[df['Workout_Type']=='Cardio']['Calories_Burned']
calorias_yoga=df[df['Workout_Type']=='Yoga']['Calories_Burned']
calorias_hiit=df[df['Workout_Type']=='HIIT']['Calories_Burned']

f_stat, p_value = stats.f_oneway(calorias_fuerza, calorias_cardio, calorias_yoga, calorias_hiit)

print(f"F statistic: {f_stat}")
print(f"p-value: {p_value}")
alpha = 0.05
if p_value < alpha:
    print('We reject the null hypothesis: the means are significantly different.')
else:
    print('We cannot reject the null hypothesis: the means are not significantly different.')

Since $p > 0.05$, we cannot reject the null hypothesis: there is insufficient evidence to conclude that the mean calories burned differ across workout types.

Now we perform another ANOVA test between 'Calories_Burned' and 'Experience_Level' to determine whether experience influences calorie burning.

In [None]:
calorias_principiante=df[df['Experience_Level']=='Principiante']['Calories_Burned']
calorias_intermedio=df[df['Experience_Level']=='Intermedio']['Calories_Burned']
calorias_avanzado=df[df['Experience_Level']=='Avanzado']['Calories_Burned']

f_stat, p_value = stats.f_oneway(calorias_principiante, calorias_intermedio, calorias_avanzado)

print(f"F statistic: {f_stat}")
print(f"p-value: {p_value}")
alpha = 0.05
if p_value < alpha:
    print('We reject the null hypothesis: the means are significantly different.')
else:
    print('We cannot reject the null hypothesis: the means are not significantly different.')

In this case, we can reject the null hypothesis and affirm that the mean calories burned vary significantly depending on the level of experience. However, we do not yet know if there are two groups where it does not vary or if it varies across all three levels.
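
A post-hoc test is one way to find out which pairs of groups differ. The sketch below uses Tukey's HSD from `statsmodels` on synthetic data; the group labels 'A', 'B', 'C' and their means are assumptions chosen for illustration, not results from this dataset:

```python
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Synthetic data (assumed): groups A and B are close, group C is clearly higher
rng = np.random.default_rng(2)
calorias = np.concatenate([
    rng.normal(800, 100, 60),    # group 'A'
    rng.normal(820, 100, 60),    # group 'B'
    rng.normal(1200, 100, 60),   # group 'C'
])
grupos = np.repeat(['A', 'B', 'C'], 60)

# Pairwise comparisons with family-wise error rate alpha = 0.05
resultado = pairwise_tukeyhsd(endog=calorias, groups=grupos, alpha=0.05)
print(resultado.summary())

# resultado.reject holds one boolean per pair, in the order (A, B), (A, C), (B, C)
```

Applied to the notebook's data, the same call would take `df['Calories_Burned']` as `endog` and `df['Experience_Level']` as `groups`.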

To conclude the ANOVA tests, we examine whether the weekly training frequency influences the calories burned per workout. At first glance, they should not be related, but let's see:

In [None]:
calorias_dos_dias=df[df['Workout_Frequency (days/week)']=='2']['Calories_Burned']
calorias_tres_dias=df[df['Workout_Frequency (days/week)']=='3']['Calories_Burned']
calorias_cuatro_dias=df[df['Workout_Frequency (days/week)']=='4']['Calories_Burned']
calorias_cinco_dias=df[df['Workout_Frequency (days/week)']=='5']['Calories_Burned']

# Compare the four training-frequency groups (not the experience-level groups)
f_stat, p_value = stats.f_oneway(calorias_dos_dias, calorias_tres_dias, calorias_cuatro_dias, calorias_cinco_dias)

print(f"F statistic: {f_stat}")
print(f"p-value: {p_value}")
alpha = 0.05
if p_value < alpha:
    print('We reject the null hypothesis: the means are significantly different.')
else:
    print('We cannot reject the null hypothesis: the means are not significantly different.')

The result shows that calories burned per workout vary significantly depending on the number of days per week someone trains. We will explore a possible explanation for this later.

Now we perform a $\chi^2$ test of independence on every pair of categorical variables to determine whether there is a significant association between them.

In [None]:
columnas_categoricas=['Gender','Workout_Type','Workout_Frequency (days/week)','Experience_Level']
for i, col1 in enumerate(columnas_categoricas):
    for col2 in columnas_categoricas[i+1:]:
        tabla_contingencia = pd.crosstab(df[col1], df[col2])
        chi2_stat, p_value, dof, expected = stats.chi2_contingency(tabla_contingencia)
        
        print(f'\nComparing {col1} and {col2}: chi-square statistic: {chi2_stat}, p-value: {p_value}')
        alpha = 0.05
        if p_value < alpha:
            print(f'We reject the null hypothesis: there is a significant association between {col1} and {col2}.')
        else:
            print(f'We cannot reject the null hypothesis: no significant association found between {col1} and {col2}.')
        print('-' * 118)

As we can see, the only pair showing a significant association is 'Workout_Frequency (days/week)' and 'Experience_Level'.
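
The $\chi^2$ test indicates whether an association exists, but not how strong it is. One common effect-size measure is Cramér's V, $V = \sqrt{\chi^2 / (n \, (\min(r, c) - 1))}$, where $n$ is the sample size and $r$, $c$ are the table dimensions. The helper `cramers_v` below is a hypothetical sketch, and the two tables are toy examples rather than tables from this dataset:

```python
import numpy as np
import scipy.stats as stats

def cramers_v(tabla):
    """Cramér's V of a contingency table: sqrt(chi2 / (n * (min(r, c) - 1)))."""
    tabla = np.asarray(tabla)
    # correction=False disables the Yates correction that SciPy applies to 2x2 tables
    chi2 = stats.chi2_contingency(tabla, correction=False)[0]
    n = tabla.sum()
    return float(np.sqrt(chi2 / (n * (min(tabla.shape) - 1))))

# Toy tables: a perfect association and no association at all
v_perfecta = cramers_v([[10, 0], [0, 10]])
v_nula = cramers_v([[10, 10], [10, 10]])
print(v_perfecta, v_nula)
```

V ranges from 0 (no association) to 1 (perfect association). In the notebook it could be computed from `pd.crosstab(df[col1], df[col2]).to_numpy()`.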

## CORRELATION BETWEEN VARIABLES


<div style="background-color: blue; padding: 10px; border-radius: 5px;">
Next, we proceed to analyze the CORRELATIONS between some variables.
</div>

We begin by analyzing the association between 'Workout_Frequency (days/week)' and 'Experience_Level', which the $\chi^2$ test found to be significant.

In [None]:
# Plot a heatmap of the contingency table
import seaborn as sns

tabla_contingencia_new = pd.crosstab(df['Workout_Frequency (days/week)'],df['Experience_Level'])
plt.figure(figsize=(10, 8))
sns.heatmap(tabla_contingencia_new, annot=True, cmap="YlGnBu", fmt='d')
plt.title("Heatmap of counts: workout frequency vs. experience level")
plt.xlabel("Experience level")
plt.ylabel("Workout frequency (days/week)")
plt.show()

This heatmap illustrates quite well that higher experience levels are associated with higher training frequency (days per week).

In the next graph, we show the relationship between session duration and calories burned, separated by gender. The relationship is approximately linear, as expected.

In [None]:
# Line plot
plt.figure(figsize=(10, 6))
sns.lineplot(data=df, x='Session_Duration (hours)', y="Calories_Burned",hue="Gender")
plt.title('Calories burned as a function of session duration')
plt.xlabel('Session duration (hours)')
plt.ylabel('Calories burned')
plt.grid(True)
plt.show()

Now, we proceed to display several violin and swarm plots for groups of three variables, where two are categorical and one is continuous.

The first plot shows calories burned as a function of workout type, classified by gender.

In [None]:
fig=plt.figure(figsize=(9,7))
sns.violinplot(data=df,x='Workout_Type',y='Calories_Burned',hue='Gender')
plt.xlabel('Workout type')
plt.ylabel('Calories burned')
plt.title('Calories burned by workout type, split by gender')
plt.show()

Calories burned are roughly the same across all workout types. Yoga appears slightly lower, although the ANOVA above did not find the differences between workout types to be significant. In all cases, men burn more calories than women. The fact that the "violins" keep some width all the way to the tips indicates that the data are dispersed rather than concentrated around the mean.

The next plot shows calories burned as a function of experience level, again divided by gender.

In [None]:
fig=plt.figure(figsize=(9,7))
sns.violinplot(data=df,x='Experience_Level',y='Calories_Burned',hue='Gender')
plt.xlabel('Experience level')
plt.ylabel('Calories burned')
plt.title('Calories burned by experience level, split by gender')
plt.show()

It is clear that as the level of experience increases, the calories burned per workout also increase (as we previously observed, there is a significant difference in at least one group). The difference between men and women is only noticeable at the advanced level. In this case, the "violins" for the beginner level show a large dispersion of values, while the intermediate and advanced levels are more concentrated around the mean.

The next swarm plot shows calories burned as a function of weekly training frequency and experience level. This will help us understand why we previously found that the number of training days per week influences calories burned per workout.

In [None]:
# Swarm plot
fig, ax = plt.subplots(figsize=(10,8))
sns.swarmplot(data=df, x="Workout_Frequency (days/week)", y="Calories_Burned",hue='Experience_Level')
plt.xlabel('Training days per week')
plt.ylabel('Calories burned')
plt.title('Calories burned by weekly workout frequency and experience level')
plt.legend()
plt.show()

We can see that beginners typically train 2 or 3 days per week; intermediates, 3 or 4; and advanced individuals, 4 or 5. Additionally, as we have just observed, calories burned increase with experience level. Finally, regarding why calories burned are related to weekly training frequency, it is evident that those with higher experience levels train more days per week and also burn more calories per workout. 

Thus, we conclude that more calories are not burned per workout by training more days per week, but rather due to having a higher experience level.
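
One way to probe this kind of confounding numerically is to repeat the ANOVA on training frequency separately inside each experience level. The sketch below does so on synthetic data built so that calories depend only on the level. The helper `anova_por_estrato`, the column names, and all numbers are assumptions for illustration:

```python
import numpy as np
import pandas as pd
import scipy.stats as stats

def anova_por_estrato(datos, estrato, factor, objetivo):
    """One-way ANOVA of `objetivo` across `factor`, run inside each level of `estrato`."""
    p_valores = {}
    for nivel, bloque in datos.groupby(estrato):
        grupos = [g[objetivo].to_numpy() for _, g in bloque.groupby(factor) if len(g) > 1]
        if len(grupos) >= 2:  # ANOVA needs at least two groups
            p_valores[nivel] = float(stats.f_oneway(*grupos).pvalue)
    return p_valores

# Synthetic data: calories depend on the level only; training days track the level
rng = np.random.default_rng(3)
nivel = np.repeat(['bajo', 'alto'], 200)
dias = np.where(nivel == 'bajo', rng.choice([2, 3], 400), rng.choice([4, 5], 400))
calorias = np.where(nivel == 'bajo', 800, 1100) + rng.normal(0, 100, 400)
sintetico = pd.DataFrame({'nivel': nivel, 'dias': dias, 'calorias': calorias})

# Pooled ANOVA across training days, then the same test within each level
grupos_global = [g['calorias'].to_numpy() for _, g in sintetico.groupby('dias')]
p_global = float(stats.f_oneway(*grupos_global).pvalue)
p_estratos = anova_por_estrato(sintetico, 'nivel', 'dias', 'calorias')
print(f'Pooled p-value: {p_global:.3g}')
print('Within-level p-values:', p_estratos)
```

If the pooled test is significant but the within-level tests are not, the frequency effect is plausibly explained by experience level, which is the reading proposed above.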

Another interesting variable to study is the fat percentage ('Fat_Percentage'). Let’s create two violin plots to observe its behavior.

In [None]:
# Violin plot
fig=plt.figure(figsize=(9,7))
sns.violinplot(data=df,x='Experience_Level',y='Fat_Percentage',hue='Gender')
plt.xlabel('Experience level')
plt.ylabel('Body fat percentage')
plt.title('Body fat percentage by experience level, split by gender')
plt.show()

This plot clearly shows that individuals with an advanced level have a significantly lower fat percentage compared to those at intermediate and beginner levels, with much less dispersion as well. As expected, the fat percentage is higher in women than in men across all groups, reflecting biological differences. It is also noteworthy that there is no perceptible difference in fat percentage between the intermediate and beginner levels, suggesting that achieving low fat levels requires a substantial amount of time.

In [None]:
# Violin plot
fig=plt.figure(figsize=(9,7))
sns.violinplot(data=df,x='Workout_Type',y='Fat_Percentage',hue='Experience_Level')
# The x axis shows the workout type and the hue shows the experience level
plt.xlabel('Workout type')
plt.ylabel('Body fat percentage')
plt.title('Body fat percentage by workout type, split by experience level')
plt.show()

This last plot is also very informative. First, it shows the previously mentioned relationship between experience level and fat percentage (only advanced individuals differ noticeably). Additionally, 'HIIT' training appears to be associated with the lowest fat percentages.

## REGRESSION MODEL

<div style="background-color: blue; padding: 10px; border-radius: 5px;">
Finally, we will train a linear regression model to predict the calories burned per workout based on the parameters that have the greatest influence on it.
</div>

We train the linear regression model using the variables age, height, weight, experience level, gender, and session duration, as these are the factors we expect to influence calorie burning the most.

In [None]:
import statsmodels.api as sm
from sklearn.model_selection import train_test_split


# Train the regression model
# Categorical predictors are one-hot encoded; drop_first avoids collinearity with the constant
X = pd.get_dummies(df[['Age','Height (m)','Weight (kg)','Experience_Level','Gender', 'Session_Duration (hours)']], drop_first=True)
X = X.astype(float)
X = sm.add_constant(X)
y = df['Calories_Burned']

# random_state is an arbitrary fixed seed so that the split is reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=42)

modelo_calorias = sm.OLS(y_train, X_train).fit()

Now that we have trained the model, we test and evaluate it:

In [None]:
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Model predictions on the test set
y_pred = modelo_calorias.predict(X_test)

# Model evaluation
bias = np.mean(y_pred - y_test)
mae = mean_absolute_error(y_test, y_pred)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
# RMSE divided by the range of the observed test values
rmse_normalizado = rmse / (np.max(y_test) - np.min(y_test))

predicciones = pd.DataFrame(y_pred, columns=['Predicted calories burned'])
print(f'BIAS: {bias}, MAE: {mae}, RMSE: {rmse}, Normalized RMSE: {rmse_normalizado}')
print(predicciones.head(5))

We see that the normalized root mean squared error (RMSE divided by the range of the observed test values) is quite low, indicating that the model fits the data well.
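
As a worked example of these four metrics, the sketch below evaluates them on three made-up values (chosen only to keep the arithmetic easy to follow; they are not model output):

```python
import numpy as np

# Hypothetical observed and predicted values
y_real = np.array([100.0, 200.0, 300.0])
y_estimado = np.array([110.0, 190.0, 310.0])

errores = y_estimado - y_real                  # [10, -10, 10]
bias = errores.mean()                          # (10 - 10 + 10) / 3
mae = np.abs(errores).mean()                   # (10 + 10 + 10) / 3
rmse = np.sqrt((errores ** 2).mean())          # sqrt((100 + 100 + 100) / 3)
nrmse = rmse / (y_real.max() - y_real.min())   # rmse / (300 - 100)

print(f'BIAS: {bias:.3f}, MAE: {mae:.1f}, RMSE: {rmse:.1f}, NRMSE: {nrmse:.3f}')
```

The bias keeps the sign of the errors, so over- and under-predictions partly cancel, whereas MAE and RMSE measure the size of the errors regardless of direction.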

Now, we obtain the model equation to identify which parameters have the most influence.

In [None]:
coeficientes = modelo_calorias.params
print(coeficientes)

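We see that the largest coefficients in absolute value correspond to session duration, gender, and height. Because raw coefficients depend on each variable's units, this ranking is only indicative.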
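
The size of a raw coefficient depends on the scale of its predictor, which a small noise-free example can illustrate. In the sketch below, `x1`, `x2`, and the generating equation are invented for illustration. The effect of each predictor is compared after multiplying its coefficient by the predictor's standard deviation:

```python
import numpy as np

# Hypothetical noise-free data: x1 spans tens of units, x2 only a few tenths
x1 = np.array([0.0, 10.0, 20.0, 30.0, 40.0, 50.0])
x2 = np.array([1.5, 1.7, 1.6, 1.8, 1.6, 1.7])
y = 3.0 + 2.0 * x1 + 100.0 * x2

# Ordinary least squares with an intercept column
A = np.column_stack([np.ones_like(x1), x1, x2])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
_, b1, b2 = coef

# Change in y associated with a one-standard-deviation change in each predictor
efecto_x1 = abs(b1) * x1.std()
efecto_x2 = abs(b2) * x2.std()
print(f'Raw coefficients: b1 = {b1:.2f}, b2 = {b2:.2f}')
print(f'Per-standard-deviation effects: x1 = {efecto_x1:.2f}, x2 = {efecto_x2:.2f}')
```

Here `x2` has the larger raw coefficient, yet `x1` moves `y` more over its observed spread. The same caution applies to a predictor measured in meters, such as height.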

To conclude, we plot the results. We choose the variables 'Height (m)' and 'Session_Duration (hours)'. Although 'Gender' has a larger coefficient than 'Height (m)', it is binary (0 or 1) and therefore does not lend itself to a continuous axis.

In [None]:
import plotly.graph_objects as go


# Build the grid of points for the regression plane
rango_altura = np.linspace(df['Height (m)'].min(), df['Height (m)'].max(), 20)
rango_duracion = np.linspace(df['Session_Duration (hours)'].min(), df['Session_Duration (hours)'].max(), 20)
altura_grid, duracion_grid = np.meshgrid(rango_altura, rango_duracion)


# Extract the coefficients of the two plotted predictors
coef_altura = coeficientes['Height (m)']
coef_duracion = coeficientes['Session_Duration (hours)']

# Hold every other predictor (including the constant) at its sample mean,
# so that the plane is not drawn as if age, weight, etc. were zero
otras = coeficientes.index.difference(['Height (m)', 'Session_Duration (hours)'])
nivel_base = (coeficientes[otras] * X[otras].mean()).sum()
calories_pred = nivel_base + coef_altura * altura_grid + coef_duracion * duracion_grid


# Create the figure with plotly
fig = go.Figure()

# Add the original data points
fig.add_trace(go.Scatter3d(
    x=df['Height (m)'],
    y=df['Session_Duration (hours)'],
    z=df['Calories_Burned'],
    mode='markers',
    marker=dict(size=5, color='blue', opacity=0.8),
    name="Original data"
))

# Add the regression plane as a surface
fig.add_trace(go.Surface(
    x=rango_altura,
    y=rango_duracion,
    z=calories_pred,
    colorscale="viridis",
    opacity=0.5,
    name="Regression plane"
))

# Configure the layout
fig.update_layout(
    title="3D linear regression",
    scene=dict(
        xaxis_title="x - Height (m)",
        yaxis_title="y - Session duration (hours)",
        zaxis_title="z - Calories burned"
    ),
    width=900,   # Figure width
    height=700   # Figure height
)

# Show the figure
fig.show()

## CONCLUSIONS

To conclude, the analysis yields the following findings on calorie burning:

- We found no significant evidence that the type of training influences calorie burning.
- The level of experience does influence calorie burning (the more experience, the higher the calorie burn).
- The duration of the training session does influence calorie burning (longer sessions result in more calories burned).
- Biological factors such as gender and height affect calorie burning, especially gender: men burn considerably more calories than women.
- Weight and age appear to have little impact on calorie burning.

