
## Project overview
Healthcare is vital for individuals and society: it promotes well-being, prevents disease, and advances health equity, among other benefits.

This project is crucial for creating healthier societies and fostering a culture of proactive health management.

## Business understanding
#### Problem statement
Develop a recommendation system for exercise intensity that provides personalized recommendations on appropriate workout intensities based on individual characteristics, including age, gender, BMI, exercise duration, heart rate, calories burned, weather conditions, and desired weight goals. The goal is to guide individuals in selecting exercise intensities that optimize their fitness outcomes, taking into account their specific attributes and preferences.
#### Business Understanding
In today's thriving fitness and wellness industry, the development of a recommender system for exercise intensity presents valuable business opportunities. Fitness centers, gyms, and personal trainers can leverage this system to offer tailored workout programs that align with individual goals, preferences, and fitness levels, ultimately attracting and retaining members. Wellness apps and platforms can integrate the recommender system to deliver personalized exercise recommendations, enhancing the user experience and setting them apart from competitors. Healthcare providers can utilize the system to promote physical activity as a means of disease prevention and management, while corporate wellness programs can leverage it to support employee well-being and productivity. By incorporating an exercise intensity recommender system, businesses can optimize workout effectiveness, increase customer satisfaction, and differentiate their offerings in a competitive market.

#### Objectives
Overall Objective: Develop a Recommender System for Personalized Exercise Intensity

1. Personalize exercise intensity recommendations by building a recommendation system based on individual characteristics such as age, gender, body mass index (BMI), exercise duration, heart rate, calories burned, weather conditions, and desired weight goals.
2. Develop a model that can predict the optimal exercise intensity for a given individual.
3. Identify the factors that contribute to optimal exercise intensity.
4. Develop a recommender system that dynamically adjusts exercise intensity recommendations as weather conditions change. The system should consider the impact of different weather conditions on workout performance and suggest appropriate exercise intensities accordingly.

## Data Understanding

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import phik
from sklearn.model_selection import train_test_split, cross_val_score, RandomizedSearchCV, GridSearchCV
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import OrdinalEncoder, Normalizer
from sklearn.impute import SimpleImputer
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor

In [None]:
df1 = pd.read_csv('exercise_datasett.csv')
df1

In [None]:
df1.drop('ID',axis=1,inplace=True)
df1

In [None]:
#import pandas as pd
#from pandas_profiling import ProfileReport

# Load the dataset
#data = pd.read_csv('exercise_datasett.csv')

# Generate the profile report
#profile = ProfileReport(data)

# Save the report to an HTML file
#profile.to_file('report.html')


In [None]:
# A function to print the shape of our datasets
def print_dataset_shape(*datasets):
    """
    Prints the shape of one or more datasets (number of rows and columns).
    Assumes datasets are in a Pandas DataFrame format.
    """
    for idx, dataset in enumerate(datasets):
        print(f"Dataset {idx + 1} - Number of rows: {dataset.shape[0]}")
        print(f"Dataset {idx + 1} - Number of columns: {dataset.shape[1]}")
# print the shape of our dataset
print_dataset_shape(df1)

In [None]:
# Function to display the head of our datasets
def display_data_head(df1):
    dfs = [df1.head()]
    df_names = ["data"]
    for df, name in zip(dfs, df_names): 
        print(f"\n{name}:\n")
        display(df)
# Display the head of our datasets
display_data_head(df1)

In [None]:
#checking info of data
df1.info()

In [None]:
# A function to check for duplicates in our datasets
def check_duplicates(df):
    """
    This function checks for and returns any duplicates in a given dataframe.
    """
    duplicates = df[df.duplicated()]
    if duplicates.shape[0] == 0:
        print("No duplicates found in the dataset")
    else:
        print("Duplicates found in the dataset:")
        return duplicates
# Calling for the function to check for duplicates
check_duplicates(df1)

In [None]:
df1.columns

### Missing Values

In [None]:
# A function to check for missing values in our dataset
def check_missing_values(data):
    # Count missing values in each column
    missing_values = data.isnull().sum()

    # Convert missing values count to percentage of total rows
    missing_percent = (missing_values / len(data)) * 100

    # Combine the missing values count and percent into a DataFrame
    missing_df = pd.concat([missing_values, missing_percent], axis=1)
    missing_df.columns = ['Missing Values', '% of Total']
    # Return only columns with missing values
    missing_df = missing_df[missing_df['Missing Values'] > 0]

    return missing_df

# Check missing values in each dataset
display(check_missing_values(df1))

The Calories Burn, Dream Weight, Actual Weight, Duration, Heart Rate, Age, and BMI columns contain missing values, which we will fill.
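
The filling used in the next section relies on forward fill and backward fill. As a minimal sketch on made-up values (the `toy` Series is hypothetical and not taken from the dataset), forward fill copies the previous valid value into a gap, while backward fill copies the next valid value:

```python
import numpy as np
import pandas as pd

# Hypothetical toy values, only to illustrate the two fill directions
toy = pd.Series([1.0, np.nan, 3.0, np.nan])

forward = toy.ffill()   # each gap takes the previous valid value
backward = toy.bfill()  # each gap takes the next valid value

print(forward.tolist())
print(backward.tolist())
```

Note that a trailing gap stays missing after a backward fill, and a leading gap stays missing after a forward fill. Both methods also assume that neighbouring rows are related, which may not hold if each row is an independent individual.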

### Data Preparation

In [None]:
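# Assign the filled column back: calling ffill/bfill with inplace=True on a
# selected column may not update df1 in recent pandas versions
df1['Calories Burn'] = df1['Calories Burn'].ffill()
df1['Dream Weight'] = df1['Dream Weight'].bfill()
df1['Actual Weight'] = df1['Actual Weight'].ffill()
df1['Duration'] = df1['Duration'].bfill()
df1['Heart Rate'] = df1['Heart Rate'].ffill()
df1['Age'] = df1['Age'].bfill()
df1['BMI'] = df1['BMI'].ffill()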

Gender, Weather Conditions, and Exercise Intensity are filled using forward fill.

In [None]:
df1['Gender'] = df1['Gender'].ffill()
df1['Weather Conditions'] = df1['Weather Conditions'].ffill()
df1['Exercise Intensity'] = df1['Exercise Intensity'].ffill()


In [None]:
# Check missing values in each dataset
display(check_missing_values(df1))

### Outliers

In [None]:
cols = ['Calories Burn', 'Dream Weight', 'Actual Weight', 'Age', 'Duration', 'Heart Rate', 'BMI', 'Exercise Intensity']

In [None]:
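def remove_outliers(df, columns, threshold=3):
    """Drop rows whose Z-score in any of the given columns exceeds the threshold."""
    df_cleaned = df.copy()  # Work on a copy so the input DataFrame is left unchanged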
    
    for column in columns:
        z_scores = (df[column] - df[column].mean()) / df[column].std()  # Calculate Z-scores
        outliers = df_cleaned.loc[abs(z_scores) > threshold]  # Find outliers based on threshold
        df_cleaned = df_cleaned.drop(outliers.index)  # Remove outliers from the DataFrame
    
    return df_cleaned


# Specify the columns to remove outliers from
columns_to_remove_outliers = cols

# Remove outliers from the DataFrame
df_cleaned = remove_outliers(df1, columns_to_remove_outliers)

# Plot box plots for the cleaned DataFrame
plt.figure(figsize=(10, 8))
for i, column in enumerate(columns_to_remove_outliers):
    plt.subplot(len(columns_to_remove_outliers)//2 + len(columns_to_remove_outliers)%2, 2, i+1)
    sns.boxplot(data=df_cleaned, x=column)
    plt.title(f"{column} Distribution")
plt.tight_layout()
plt.show()
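
To make the Z-score rule concrete, here is a minimal sketch on a made-up column (the `toy` values are hypothetical). It applies the same formula as `remove_outliers` to a single column and keeps the rows within three standard deviations of the mean:

```python
import pandas as pd

# Hypothetical toy column: 19 typical values and one extreme value
toy = pd.DataFrame({"Heart Rate": [50] * 19 + [500]})

col = toy["Heart Rate"]
z_scores = (col - col.mean()) / col.std()  # pandas uses the sample std (ddof=1)

kept = toy[z_scores.abs() <= 3]  # keep rows within 3 standard deviations
print(len(toy), len(kept))
```

One caveat of this rule: with the sample standard deviation, no point in a sample of size n can have a Z-score larger in magnitude than (n - 1) / sqrt(n), so a threshold of 3 cannot flag anything when a column has 10 or fewer rows.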

### Feature Engineering

In [None]:
# check the unique variables for each category.
for col in df1.columns:
    print('\n' + col + '\n')
    print(df1[col].value_counts())

In [None]:
df2 = df1[['Age','Duration','BMI','Calories Burn']].copy()
df2

In [None]:
# Function to map BMI values to weight categories
def BMI(bmi):
    """Return the weight category for a single BMI value."""
    if bmi < 18.5:
        return 'Underweight'
    elif bmi < 24.9:
        return 'Normal weight'
    elif bmi < 29.9:
        return 'Overweight'
    elif bmi < 34.9:
        return 'Obesity class I'
    elif bmi < 39.9:
        return 'Obesity class II'
    else:
        return 'Obesity class III'

# Apply the BMI function to the 'BMI' column to create a new column representing weight categories
df2['Weight Category'] = df2['BMI'].apply(BMI)
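
As an alternative to a row-by-row function, the same binning can be sketched with `pd.cut`. The example below reuses the cut points from the `BMI` function above and runs on hypothetical values (`toy_bmi` is made up for illustration):

```python
import numpy as np
import pandas as pd

# Same cut points as the function above; right=False makes each bin [low, high)
edges = [-np.inf, 18.5, 24.9, 29.9, 34.9, 39.9, np.inf]
labels = ["Underweight", "Normal weight", "Overweight",
          "Obesity class I", "Obesity class II", "Obesity class III"]

toy_bmi = pd.Series([17.0, 18.5, 24.9, 31.2, 45.0])  # hypothetical values
categories = pd.cut(toy_bmi, bins=edges, labels=labels, right=False)
print(categories.tolist())
```

The two approaches differ for missing values: `pd.cut` leaves a missing BMI as missing, whereas the function above falls through to its `else` branch and returns 'Obesity class III'.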


In [None]:
# Function to map Duration values (in minutes) to duration categories
def Duration(duration):
    if 18 <= duration <= 26:
        return '19-26 minutes'  # note: this bin also includes 18
    elif 27 <= duration <= 34:
        return '27-34 minutes'
    elif 35 <= duration <= 42:
        return '35-42 minutes'
    elif 43 <= duration <= 50:
        return '43-50 minutes'
    elif 51 <= duration <= 58:
        return '51-58 minutes'
    elif duration >= 58:
        return '58 minutes and above'
    # Values outside these bins (for example, below 18) return None

# Apply the Duration function to the 'Duration' column to create a new column representing duration categories
df2['minute duration'] = df2['Duration'].apply(Duration)


In [None]:
# Function to map Age values to age groups
def Age(age):
    if 18 <= age <= 25:
        return '18-25'
    elif 26 <= age <= 33:
        return '26-33'
    elif 34 <= age <= 41:
        return '34-41'
    elif 42 <= age <= 49:
        return '43-50'  # note: this bin actually covers ages 42-49
    elif 50 <= age <= 57:
        return '50-57'
    elif age >= 58:
        return 'Age 58 and above'
    # Ages below 18 return None

# Apply the Age function to the 'Age' column to create a new column representing age groups
df2['Age group'] = df2['Age'].apply(Age)


In [None]:
# Function to map Calories Burn values to calorie categories
def Calories_Burn(calories):
    if 100 <= calories < 201:
        return '100-200'
    elif 201 <= calories < 301:
        return '201-300'
    elif 301 <= calories < 401:
        return '301-400'
    elif 401 <= calories <= 500:
        return '401-500'
    # Values outside the 100-500 range return None


# Apply the Calories_Burn function to the 'Calories Burn' column to create a new column representing calorie categories
df2['Calories group'] = df2['Calories Burn'].apply(Calories_Burn)


In [None]:
# Check missing values in each dataset
display(check_missing_values(df2))

In [None]:
df2

In [None]:
df2.describe()

### Exploratory Data Analysis

#### Age Distribution Analysis

In [None]:
plt.figure(figsize=(12,8))
# create a countplot
sns.countplot(y='Age group',order=df2['Age group'].value_counts().index[0:10], data=df2)
plt.title('Age distribution'); 

Younger individuals appear to exercise more, which may reflect higher energy levels, better physical capability, and a focus on health and appearance. Societal trends, peer influence, flexible schedules, and awareness of long-term health benefits may also contribute to their participation in physical activity.

With age, the body changes gradually, particularly the bones and muscles. Bones lose calcium and other minerals, which lowers bone density and makes the skeleton less resilient and more susceptible to fractures and injuries. Muscle mass and overall strength also decline gradually, so physical tasks that were once easy can become difficult.

#### Duration distribution in minutes.

In [None]:
plt.figure(figsize=(12, 8))
# create a countplot
sns.countplot(x='minute duration', order=df2['minute duration'].value_counts().iloc[:10].index, data=df2)
plt.title('Duration Distribution')
plt.xticks(rotation=90)  # Rotate the x-axis labels by 90 degrees
plt.show();

Most sessions last between 19 and 50 minutes. The 19-26 minute bin is the most frequent, and the mean duration falls in the 35-42 minute bin.


#### Distribution of weather conditions

In [None]:
df1['Weather Conditions'].value_counts()

In [None]:
plt.figure(figsize=(10, 6))
sns.countplot(x='Weather Conditions', data=df1)
plt.xlabel('Weather Conditions')
plt.ylabel('Count')
plt.title('Distribution of Weather Conditions')
plt.show()

The weather condition counts are almost equal.
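
One way to quantify how balanced the categories are is to look at their relative shares. The sketch below uses hypothetical labels (`toy_weather` stands in for `df1['Weather Conditions']`) and reports the gap between the largest and smallest share:

```python
import pandas as pd

# Hypothetical labels standing in for df1['Weather Conditions']
toy_weather = pd.Series(["Sunny", "Rainy", "Cloudy", "Sunny", "Rainy", "Cloudy", "Sunny"])

shares = toy_weather.value_counts(normalize=True)  # fraction of rows per category
spread = shares.max() - shares.min()               # 0 means perfectly balanced

print(shares.round(3).to_dict())
print(round(spread, 3))
```

A spread close to zero supports the reading that no weather condition dominates the dataset.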

#### Actual Weight and Dream Weight Analysis: Exploring Weight Goals and Aspirations

In [None]:
# Mean actual and dream weight per age, selected by column name rather than position
age_means = df1.groupby('Age')[['Actual Weight', 'Dream Weight']].mean()
plt.plot(age_means.index, age_means['Actual Weight'], label='Actual Weight')
plt.plot(age_means.index, age_means['Dream Weight'], color='r', label='Dream Weight')
plt.xlabel('Age')
plt.ylabel('Mean Values')
plt.legend()
plt.show()

Trend in Actual Weight: The plot shows an increasing trend in actual weight as age group increases. On average, individuals tend to have higher actual weight as they get older.

Trend in Dream Weight: The plot indicates that dream weight values vary across age groups. There isn't a clear trend of increasing or decreasing dream weight with age.

Comparing Actual and Dream Weight: The plot reveals that, on average, individuals tend to have higher actual weight than their desired or dream weight across age groups.
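
The comparison between actual and dream weight can also be expressed as a number per age. The following is a minimal sketch on hypothetical rows (the `toy` values are made up; only the column names follow the dataset) that computes the mean gap between the two:

```python
import pandas as pd

# Hypothetical rows; column names follow the dataset used above
toy = pd.DataFrame({
    "Age": [20, 20, 30, 30],
    "Actual Weight": [70.0, 80.0, 90.0, 100.0],
    "Dream Weight": [65.0, 75.0, 80.0, 90.0],
})

means = toy.groupby("Age")[["Actual Weight", "Dream Weight"]].mean()
means["Gap"] = means["Actual Weight"] - means["Dream Weight"]  # positive: above goal
print(means)
```

A positive `Gap` for every age would back the observation that actual weight is, on average, above the desired weight.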

In [None]:
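# Mean calories burned and mean actual weight per exercise type, selected by column name
exercise_means = df1.groupby('Exercise')[['Calories Burn', 'Actual Weight']].mean()
plt.scatter(exercise_means['Calories Burn'], exercise_means['Actual Weight'], color='green', marker='o')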
plt.xlabel("Mean Calories Burn by Exercise")
plt.ylabel("Mean Actual Weight by Exercise")
plt.title('Calories Burn and Actual Weight Relation')
plt.show()

Across exercise types, the scatter plot shows a general negative trend between calories burned and actual weight: exercises with a higher mean calorie burn tend to be associated with a lower mean actual weight. This is an association between group means and does not by itself show that burning more calories lowers weight.

Exploring the Relationship between Actual Weight and Age, with gender variation.

In [None]:
# Create a custom color palette for genders
gender_palette = {"Female": "red", "Male": "blue"}

# Create the scatter plot
ssp = sns.scatterplot(data=df1.head(50), x="Age", y="Actual Weight", hue="Gender", palette=gender_palette, style=None)

# Remove the automatically generated legend
ssp.get_legend().remove()

# Add a custom legend for genders
legend_handles = [plt.Line2D([], [], marker='o', color='w', markerfacecolor=color, markersize=8) for color in gender_palette.values()]
plt.legend(legend_handles, gender_palette.keys(), title='Gender', loc='upper left')

# Set the title
plt.title("Actual Weight to Age")

# Show the plot
plt.show()

In [None]:
# Create the scatter plot
ssp = sns.scatterplot(data=df1.head(50), x="Age", y="Actual Weight", hue="Exercise")

# Remove the legend title
ssp.get_legend().set_title('')

# Set the title
plt.title("Actual Weight to Age")

# Move the legend outside the plot
ssp.legend(loc='center left', bbox_to_anchor=(1, 0.5))

# Show the plot
plt.show()

In the first 50 rows plotted here, actual weight tends to increase with age, as shown by the overall upward trend of the data points.
The scatter plot shows no clear association between exercise type and actual weight.

How does weather condition affect Exercise Intensity?

In [None]:
# Create a bar plot of exercise intensity by weather conditions
plt.figure(figsize=(20, 8))
sns.countplot(x='Weather Conditions', hue='Exercise Intensity', data=df1)
plt.xlabel('Weather Conditions')
plt.ylabel('Count')
plt.title('Exercise Intensity by Weather Conditions')
plt.legend(title='Exercise Intensity')
plt.show()

Sunny weather conditions:

* Moderate intensity: The most common exercise intensity level during sunny weather is moderate.
* Low intensity: There are fewer exercise sessions at a low intensity during sunny weather compared to moderate intensity.

Rainy weather conditions:

* Low intensity: The count of low-intensity exercises is higher during rainy weather.
* High intensity: There are fewer high-intensity exercises during rainy weather.

Cloudy weather conditions:

* Moderate intensity: The majority of exercise sessions during cloudy weather are at a moderate intensity level.
* Low and high intensity: The count of low and high-intensity exercises during cloudy weather may vary.

Overall observations:

* Different weather, different intensities: Each weather condition shows a different distribution of exercise intensity levels.
* Weather influence: Weather conditions may influence exercise intensity preferences and choices.
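
Raw counts can be hard to compare when the weather conditions have different numbers of sessions. As a sketch, assuming hypothetical observations and placeholder intensity labels (the `toy` frame is made up), a row-normalized crosstab shows the share of each intensity level within each weather condition:

```python
import pandas as pd

# Hypothetical observations; the intensity labels here are placeholders
toy = pd.DataFrame({
    "Weather Conditions": ["Sunny", "Sunny", "Sunny", "Sunny", "Rainy", "Rainy"],
    "Exercise Intensity": ["Moderate", "Moderate", "Moderate", "Low", "Low", "High"],
})

# normalize='index' turns each row (weather condition) into proportions summing to 1
shares = pd.crosstab(toy["Weather Conditions"], toy["Exercise Intensity"], normalize="index")
print(shares)
```

Comparing rows of this table makes it easier to see whether the intensity mix really differs between weather conditions.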

* Comparison of Mean Dream Weight and Actual Weight Across Age Groups

In [None]:
plt.figure(figsize=(12, 8))  # Set the figure size to 12 inches (width) by 8 inches (height)
age_means = df1.groupby('Age')[['Dream Weight', 'Actual Weight']].mean()  # select by name, not position
plt.bar(age_means.index, age_means['Dream Weight'], label='Dream Weight')
plt.bar(age_means.index, age_means['Actual Weight'], color='darkred', label='Actual Weight', bottom=age_means['Dream Weight'])
plt.xlabel('Age')
plt.ylabel('Mean Values')
plt.legend()
plt.show()

Across all age groups, the graph illustrates that the mean actual weight exceeds the mean dream weight. This suggests that individuals in the dataset, regardless of their age, generally have higher actual weights compared to their desired or ideal weights.

Is there a relationship or association between age group and exercise intensity?

In [None]:
# Create a contingency table for Age Group and Exercise Intensity
contingency_table = pd.crosstab(df2['Age group'], df1['Exercise Intensity'])

# Print the contingency table
contingency_table

In [None]:
# Create a heatmap of the contingency table with colors
sns.heatmap(contingency_table, cmap='YlGnBu')

# Add labels and title
plt.xlabel('Exercise Intensity')
plt.ylabel('Age group')
plt.title('Contingency Table: Age group vs Exercise Intensity')

# Show the plot
plt.show()

* There is a higher concentration of individuals in the age group "18-25" with exercise intensity levels 1.0, 2.0, and 3.0.
* The age group "26-33" shows a relatively even distribution across various exercise intensity levels.
* The age groups "34-41" and "43-50" have a higher count of individuals with exercise intensity levels 1.0 and 2.0.
* The age groups "50-57" and "58 and above" tend to have lower counts across all exercise intensity levels, indicating potentially lower participation in higher intensity exercises.
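
Whether age group and exercise intensity are associated can be checked more formally with a chi-square statistic on the contingency table. The sketch below computes the Pearson statistic by hand with NumPy on two hypothetical tables (`chi_square_statistic`, `proportional`, and `separated` are illustrative names, not part of the notebook), assuming every row and column has at least one observation:

```python
import numpy as np

def chi_square_statistic(observed):
    """Pearson chi-square statistic for a 2-D table of counts."""
    observed = np.asarray(observed, dtype=float)
    row_totals = observed.sum(axis=1, keepdims=True)
    col_totals = observed.sum(axis=0, keepdims=True)
    expected = row_totals * col_totals / observed.sum()  # counts under independence
    return float(((observed - expected) ** 2 / expected).sum())

# Hypothetical tables: rows are age groups, columns are intensity levels
proportional = [[10, 20], [20, 40]]   # identical row profiles
separated = [[10, 0], [0, 10]]        # each group uses a single level

print(chi_square_statistic(proportional))
print(chi_square_statistic(separated))
```

A statistic of zero means the row profiles are identical, and larger values indicate a stronger departure from independence. Turning the statistic into a p-value additionally requires the degrees of freedom, (rows - 1) * (columns - 1); `scipy.stats.chi2_contingency` handles both steps if SciPy is available.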

What is the distribution of heart rate across different age groups?

In [None]:
# plt.figure(figsize=(10, 6))  # Increase the figure size
# plt.bar(df2['Age group'], df1['Heart Rate'], width=0.6)  # Adjust the width of the bars
# plt.xlabel('Age group')
# plt.ylabel('Heart Rate')
# plt.title('Age - Heart Rate Relation')
# plt.show()

In [None]:
# plt.bar(df2['minute duration'],df1['Calories Burn'],width=0.3)
# plt.xlabel('minute duration')
# plt.ylabel("calories burn")
# plt.title('minute duration vs calories burned')
# plt.show()

In [None]:
# plt.bar(df2['Weight Category'],df1['Age'],width=0.3)
# plt.xlabel('Weight Category')
# plt.ylabel("Age")
# plt.title('weight category vs age')
# plt.show()

In [None]:
sns.pairplot(data = df1, height = 2); 

In [None]:
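# Heatmap of the correlation matrix (numeric columns only)
plt.subplots(figsize=(12, 12))
# 'RdBu' maps positive correlations to blue and negative ones to red
sns.heatmap(df1.corr(numeric_only=True), annot=True, cmap='RdBu', vmin=-1, vmax=1)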

The darker the shade of blue, the stronger the positive correlation between the two variables, while the darker the shade of red, the stronger the negative correlation between the two variables. A value of 1 indicates a perfect positive correlation, a value of -1 indicates a perfect negative correlation, and a value of 0 indicates no correlation between the two variables.

* Calories Burn and BMI: The correlation coefficient is -0.096, a very weak negative correlation. Higher BMI values are only marginally associated with lower calorie burn.

* Calories Burn and Actual Weight: The correlation coefficient is -0.11, a weak negative correlation. Calories burned tend to decrease slightly as actual weight increases.

* Calories Burn and Duration: The correlation coefficient is 0.0096, which is effectively zero. There is no meaningful linear relationship between exercise duration and calories burned in this dataset.

* Calories Burn and Heart Rate: The correlation coefficient is -0.088, a very weak negative correlation. Calories burned tend to decrease only marginally as heart rate increases.

* Calories Burn and Exercise Intensity: The correlation coefficient is -0.0022, which is effectively zero. Exercise intensity shows no meaningful linear relationship with calories burned.

* Actual Weight and BMI: The correlation coefficient is 0.15, a weak positive correlation. Higher actual weight is slightly associated with higher BMI values.

* Actual Weight and Dream Weight: The correlation coefficient is 0.24, the strongest in this list but still a weak positive correlation. Individuals with a higher actual weight tend to report a somewhat higher dream weight.

* BMI and Duration: The correlation coefficient is -0.0048, which is effectively zero. There is no meaningful linear relationship between BMI and exercise duration.

* BMI and Heart Rate: The correlation coefficient is 0.13, a weak positive correlation. Higher BMI values are slightly associated with higher heart rates.

* Duration and Heart Rate: The correlation coefficient is -0.0085, which is effectively zero. There is no meaningful linear relationship between exercise duration and heart rate.
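
To put these coefficients in perspective, squaring a Pearson correlation gives the share of variance in one variable that a linear fit on the other would explain. The sketch below applies this to the largest coefficients quoted in the list above (the dictionary is typed in by hand from that list, not computed from the data):

```python
# Coefficients quoted in the list above
quoted = {
    "Actual Weight vs Dream Weight": 0.24,
    "Actual Weight vs BMI": 0.15,
    "BMI vs Heart Rate": 0.13,
    "Calories Burn vs Actual Weight": -0.11,
}

# r squared is the share of variance a linear fit on one variable explains in the other
explained = {pair: round(r ** 2, 4) for pair, r in quoted.items()}

for pair, share in explained.items():
    print(f"{pair}: r^2 = {share:.4f}")
```

Even the strongest pair explains under 6% of the variance, so none of these pairwise linear relationships is strong on its own.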