## Table of Contents
1. Data preparation
2. Exploratory data analysis
3. Model training
4. Conclusion

### Analysis of Exercise and Fitness Metrics and Exercise Intensity Prediction
Engaging in regular physical activity is essential for individuals to maintain optimal health and well-being. 
The benefits of physical activity extend far beyond physical fitness. They encompass mental, emotional, and 
social aspects of our lives, making physical activity an integral part of human existence. 
Likewise, the topic of health is one of the most critical subjects for humanity to prioritize due to its profound 
impact on individuals and society as a whole.

### Gathering individual health data when working out at a gym is crucial for several reasons

Firstly, tracking personal health data allows individuals to monitor their progress and make informed decisions about their fitness goals. By collecting data, individuals can tailor their workout routines, adjust intensity levels, and make necessary modifications to achieve desired results effectively.
Secondly, individual health data provides insights into overall health and potential risk factors. By regularly monitoring metrics such as heart rate variability, resting heart rate, and blood pressure, individuals can identify any abnormalities or potential health issues.

Thirdly, collecting health data promotes accountability and motivation. When individuals track their progress, seeing tangible results is a powerful motivator to continue their fitness journey.

Lastly, the aggregation of health data from gym-goers can contribute to research and the development of evidence-based practices. With consent and proper anonymization, aggregated health data can be used to identify trends, patterns, and correlations that can benefit the larger population.

This project is mostly EDA practice: we take a closer look at the data and look for patterns in it. We will also do a small data preparation step, since it is good practice in general, and try to predict Exercise Intensity as our ML task, but there won't be any meaningful tuning and I don't expect good results from it. (Don't mind the sklearn pipeline, I'm just probing its possibilities.)

## Description of columns:

- ID - A unique identifier for each sample in the dataset.
- Exercise - The type of exercise performed during the session.
- Calories Burn - The estimated number of calories burned during the exercise session.
- Dream Weight - The desired weight of the individual.
- Actual Weight - The measured weight of the individual, including natural variation.
- Age - The age of the individual performing the exercise.
- Gender - The gender of the individual (Male or Female).
- Duration - The duration of each exercise session in minutes.
- Heart Rate - The average heart rate during the exercise session.
- BMI - The body mass index of the individual, indicating body composition.
- Weather Conditions
- Exercise Intensity

## Data preparation
Let's start by loading the necessary libraries and the dataframe:

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import phik
from sklearn.model_selection import train_test_split, cross_val_score, RandomizedSearchCV, GridSearchCV
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import OrdinalEncoder, Normalizer
from sklearn.impute import SimpleImputer
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor

# RandomState
state = np.random.RandomState(12345)

In [None]:
df = pd.read_csv('dataset.csv')

Let's peek into our data:

In [None]:
df.head()

In [None]:
df.info()

The data is easy to interpret.
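
A quick way to back up that impression is to count missing values and duplicated rows. The sketch below is only an illustration: the `quality_report` helper and the `toy` frame are assumptions made up for this example, not part of the dataset. In the notebook, the same helper could be called on `df`.

```python
import pandas as pd


def quality_report(frame: pd.DataFrame) -> dict:
    """Summarise basic data-quality signals (hypothetical helper)."""
    return {
        "rows": len(frame),
        "missing_per_column": frame.isna().sum().to_dict(),
        "duplicated_rows": int(frame.duplicated().sum()),
    }


# Toy data standing in for the real dataset (values are made up).
toy = pd.DataFrame({
    "Age": [25, 31, None, 31],
    "Gender": ["Male", "Female", "Female", "Female"],
})

report = quality_report(toy)
print(report)
```

On the toy frame this reports one missing `Age` value and one duplicated row; on clean data, both counts should be zero.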

Let's remove the ID column, since the default pandas index is enough for us, and proceed further:

In [None]:
df.drop('ID', axis=1, inplace=True)

And let's strip the non-digit characters from the 'Exercise' column so that it can be converted to int:

In [None]:
df['Exercise'] = df['Exercise'].map(lambda x: ''.join([i for i in x if i.isdigit()]))
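
For comparison, here is a minimal sketch of the same digit extraction using the vectorized `Series.str.extract` accessor. The label format (`"Exercise 1"` and so on) is an assumption for illustration; the real column may look different.

```python
import pandas as pd

# Toy values; the exact label format in the real column is an assumption.
exercise = pd.Series(["Exercise 1", "Exercise 10", "Exercise 3"])

# Approach used above: keep only the digit characters of each label.
via_map = exercise.map(lambda x: "".join(ch for ch in x if ch.isdigit()))

# Vectorized alternative: capture the first run of digits in each label.
via_extract = exercise.str.extract(r"(\d+)", expand=False)

print(via_map.tolist())
print(via_extract.tolist())
```

Both variants still return strings, so the cast to an integer dtype is a separate step. They only differ if a label contains more than one group of digits: the `map` version concatenates all digits, while `str.extract` keeps the first group.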

And give some columns more suitable dtypes:

In [None]:
ints = [
    'Exercise', 
    'Age', 
    'Duration', 
    'Heart Rate', 
    'Exercise Intensity'
]

categories = [
    'Gender', 
    'Weather Conditions'
]

for col in ints:
    df[col] = df[col].astype('int16')

for col in categories:
    df[col] = df[col].astype('category')
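
Since `int16` only covers values from -32768 to 32767, it can be worth checking that a column fits before narrowing it, because values outside that range would not survive the cast intact. The sketch below uses a hypothetical `fits_int16` helper and made-up toy values, and is only one possible way to do such a check.

```python
import numpy as np
import pandas as pd


def fits_int16(series: pd.Series) -> bool:
    """Return True if every value is whole and within the int16 range."""
    values = pd.to_numeric(series)
    limits = np.iinfo(np.int16)
    in_range = values.between(limits.min, limits.max).all()
    whole = (values % 1 == 0).all()
    return bool(in_range and whole)


# Toy columns (made-up values) to show both outcomes.
heart_rate = pd.Series([120, 155, 98])
too_large = pd.Series([100, 40_000])

print(fits_int16(heart_rate))
print(fits_int16(too_large))
```

In the notebook, a check like `all(fits_int16(df[col]) for col in ints)` could be run before the `astype('int16')` loop.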

As we can see, the data may vary greatly from one individual to another.

P.S. It's always pleasant to work with clean and prepared data, although its synthetic origin is all too obvious.

## Exploratory data analysis

Let's start by checking descriptive statistics: