<a href="https://colab.research.google.com/github/Viren-Kathpal/Personalized-Fitness-Plan-Creator/blob/main/Personalized_Fitness_Plan_Creator_.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Personalized Fitness Plan Creator using Random Forest Regressor and Decision Tree Regressor

Step 1: Generate Synthetic Dataset

In [None]:
import pandas as pd
import random

# Define possible values for each column
ages = range(18, 60)   # Define a range of ages from 18 to 59
weights = [round(random.uniform(50, 100), 1) for _ in range(1000)]   # Generate a list of random weights between 50 and 100 for 1000 individuals
heights = [round(random.uniform(150, 200), 1) for _ in range(1000)]  # Generate a list of random heights between 150 and 200 for 1000 individuals
goals = ["lose weight", "build muscle", "increase flexibility"]   # Define possible fitness goals
days_per_week = range(1, 8)   # Define a range of days per week for workout from 1 to 7
durations = ["short", "medium", "long"]   # Define possible workout durations
exercises = {   # Define exercises for each fitness goal
    "lose weight": ["running", "cycling", "jump rope", "swimming"],
    "build muscle": ["push-ups", "squats", "lunges", "planks", "dumbbell curls"],
    "increase flexibility": ["yoga", "stretching"]
}
duration_ranges = {   # Define duration ranges for each workout duration
    "short": (15, 30),
    "medium": (30, 45),
    "long": (45, 60)
}

# Generate synthetic data
data = []
for _ in range(1000):
    age = random.choice(ages)    # Randomly select an age from the defined range
    weight = random.choice(weights)  # Randomly select a weight from the generated weights list
    height = random.choice(heights)  # Randomly select a height from the generated heights list
    goal = random.choice(goals)  # Randomly select a fitness goal from the defined goals
    days = random.choice(days_per_week)   # Randomly select days per week for workout from the defined range
    duration_pref = random.choice(durations)   # Randomly select a workout duration preference from the defined durations
    exercise = random.choice(exercises[goal])   # Randomly select an exercise based on the selected fitness goal
    duration_minutes = random.randint(*duration_ranges[duration_pref])   # Randomly select a duration within the range for the chosen duration preference

    data.append([age, weight, height, goal, days, duration_pref, exercise, duration_minutes])  # Append the generated data to the list

# Create DataFrame
df = pd.DataFrame(data, columns=["age", "weight", "height", "goal", "days_per_week", "duration", "exercise", "duration_minutes"])  # Create a DataFrame from the generated data with appropriate column names


# Save to CSV
df.to_csv("fitness_data.csv", index=False)

print("fitness_data.csv file has been created.")


fitness_data.csv file has been created.
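The cell above draws from Python's `random` module without a seed, so the dataset, and every score computed from it, changes on each run. Below is a minimal sketch of a reproducible variant. It assumes NumPy's `default_rng` with an arbitrary seed of 42 and keeps the same columns, categories, and value ranges. The helper name `make_fitness_data` is hypothetical. Weights and heights are drawn directly rather than from a pre-generated list, so the sampled values will not match the original CSV.

```python
import numpy as np
import pandas as pd

# Same categories and ranges as the cell above
GOALS = ["lose weight", "build muscle", "increase flexibility"]
EXERCISES = {
    "lose weight": ["running", "cycling", "jump rope", "swimming"],
    "build muscle": ["push-ups", "squats", "lunges", "planks", "dumbbell curls"],
    "increase flexibility": ["yoga", "stretching"],
}
DURATION_RANGES = {"short": (15, 30), "medium": (30, 45), "long": (45, 60)}


def make_fitness_data(n_rows=1000, seed=42):
    """Generate the synthetic fitness table reproducibly (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    goal = rng.choice(GOALS, size=n_rows)
    duration = rng.choice(list(DURATION_RANGES), size=n_rows)
    lows = np.array([DURATION_RANGES[d][0] for d in duration])
    highs = np.array([DURATION_RANGES[d][1] for d in duration])
    return pd.DataFrame({
        "age": rng.integers(18, 60, size=n_rows),          # 18..59, like range(18, 60)
        "weight": rng.uniform(50, 100, size=n_rows).round(1),
        "height": rng.uniform(150, 200, size=n_rows).round(1),
        "goal": goal,
        "days_per_week": rng.integers(1, 8, size=n_rows),  # 1..7, like range(1, 8)
        "duration": duration,
        "exercise": [str(rng.choice(EXERCISES[g])) for g in goal],
        # endpoint=True makes both bounds inclusive, matching random.randint
        "duration_minutes": rng.integers(lows, highs, endpoint=True),
    })


df = make_fitness_data()
print(df.shape)
```

Calling `make_fitness_data()` twice with the same seed returns identical tables. To keep the rest of the notebook unchanged, the result can still be written out with `df.to_csv("fitness_data.csv", index=False)`.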


Step 2: Train RandomForestRegressor and DecisionTreeRegressor Models

In [None]:
from sklearn.model_selection import train_test_split  # Import train_test_split function to split the data into training and testing sets
from sklearn.ensemble import RandomForestRegressor  # Import RandomForestRegressor model
from sklearn.tree import DecisionTreeRegressor  # Import DecisionTreeRegressor model
from sklearn.preprocessing import OneHotEncoder  # Import OneHotEncoder for preprocessing categorical features
from sklearn.compose import ColumnTransformer  # Import ColumnTransformer for preprocessing
from sklearn.pipeline import Pipeline  # Import Pipeline for creating a pipeline of preprocessing steps and model training

# Separate features and target
X = df.drop(columns=["exercise", "duration_minutes"])  # Extract features (excluding exercise and duration_minutes columns) from the DataFrame
y = df["duration_minutes"]  # Extract target variable (duration_minutes) from the DataFrame

# Preprocess categorical features
categorical_features = ["goal", "duration"]  # Define categorical feature columns
numeric_features = ["age", "weight", "height", "days_per_week"]  # Define numeric feature columns

preprocessor = ColumnTransformer(
    transformers=[
        ("num", "passthrough", numeric_features),  # Pass through numeric features without preprocessing
        ("cat", OneHotEncoder(), categorical_features)  # Apply OneHotEncoder to categorical features
    ]
)

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)  # Split the data into training and testing sets with a test size of 20% and a random state of 42

# Train RandomForestRegressor
rf_pipeline = Pipeline(steps=[
    ("preprocessor", preprocessor),  # Preprocess data using the preprocessor
    ("regressor", RandomForestRegressor())  # Train RandomForestRegressor
])
rf_pipeline.fit(X_train, y_train)  # Fit the RandomForestRegressor pipeline on the training data

# Train DecisionTreeRegressor
dt_pipeline = Pipeline(steps=[
    ("preprocessor", preprocessor),  # Preprocess data using the preprocessor
    ("regressor", DecisionTreeRegressor())  # Train DecisionTreeRegressor
])
dt_pipeline.fit(X_train, y_train)  # Fit the DecisionTreeRegressor pipeline on the training data

# Evaluate models: for regressors, score() returns the coefficient of determination (R²)
rf_score = rf_pipeline.score(X_test, y_test)  # R² of RandomForestRegressor on the testing data
dt_score = dt_pipeline.score(X_test, y_test)  # R² of DecisionTreeRegressor on the testing data

print("RandomForestRegressor Score:", rf_score)  # Print the score of RandomForestRegressor
print("DecisionTreeRegressor Score:", dt_score)  # Print the score of DecisionTreeRegressor


RandomForestRegressor Score: 0.8598795119738669
DecisionTreeRegressor Score: 0.7307124145450926
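The scores printed above are R² values. R² is unitless, so it is hard to relate to a workout plan. The sketch below reports R² alongside the mean absolute error in minutes. It makes a few assumptions. The data is a seeded NumPy stand-in with the same columns and ranges as the notebook's `df`. The `exercise` column is omitted because it is dropped before training anyway. The forest gets `random_state=42` so its score is repeatable. The exact numbers will differ from the output above.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Stand-in for the notebook's df (assumption: same columns and ranges, "exercise" omitted)
rng = np.random.default_rng(0)
n = 1000
duration = rng.choice(["short", "medium", "long"], size=n)
lows = pd.Series(duration).map({"short": 15, "medium": 30, "long": 45}).to_numpy()
df = pd.DataFrame({
    "age": rng.integers(18, 60, size=n),
    "weight": rng.uniform(50, 100, size=n).round(1),
    "height": rng.uniform(150, 200, size=n).round(1),
    "goal": rng.choice(["lose weight", "build muscle", "increase flexibility"], size=n),
    "days_per_week": rng.integers(1, 8, size=n),
    "duration": duration,
    "duration_minutes": rng.integers(lows, lows + 15, endpoint=True),  # e.g. short: 15..30
})

X = df.drop(columns=["duration_minutes"])
y = df["duration_minutes"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

preprocessor = ColumnTransformer([
    ("num", "passthrough", ["age", "weight", "height", "days_per_week"]),
    ("cat", OneHotEncoder(), ["goal", "duration"]),
])
rf_pipeline = Pipeline([
    ("preprocessor", preprocessor),
    ("regressor", RandomForestRegressor(random_state=42)),  # fixed seed -> repeatable score
])
rf_pipeline.fit(X_train, y_train)

y_pred = rf_pipeline.predict(X_test)
r2 = rf_pipeline.score(X_test, y_test)     # same value as r2_score(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)  # average prediction error in minutes
print(f"R²: {r2:.3f}  MAE: {mae:.1f} min")
```

Setting `random_state` on both regressors would also make the Step 2 and Step 3 scores identical across runs.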


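Step 3: Retrain the Models (a repeat of Step 2; these are the models used for prediction below, and the scores differ slightly because neither regressor has a fixed random_state)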

In [None]:

from sklearn.model_selection import train_test_split  # Import train_test_split function to split the data into training and testing sets
from sklearn.ensemble import RandomForestRegressor  # Import RandomForestRegressor model
from sklearn.tree import DecisionTreeRegressor  # Import DecisionTreeRegressor model
from sklearn.preprocessing import OneHotEncoder  # Import OneHotEncoder for preprocessing categorical features
from sklearn.compose import ColumnTransformer  # Import ColumnTransformer for preprocessing
from sklearn.pipeline import Pipeline  # Import Pipeline for creating a pipeline of preprocessing steps and model training

# Separate features and target
X = df.drop(columns=["exercise", "duration_minutes"])  # Extract features (excluding exercise and duration_minutes columns) from the DataFrame
y = df["duration_minutes"]  # Extract target variable (duration_minutes) from the DataFrame

# Preprocess categorical features
categorical_features = ["goal", "duration"]  # Define categorical feature columns
numeric_features = ["age", "weight", "height", "days_per_week"]  # Define numeric feature columns

preprocessor = ColumnTransformer(
    transformers=[
        ("num", "passthrough", numeric_features),  # Pass through numeric features without preprocessing
        ("cat", OneHotEncoder(), categorical_features)  # Apply OneHotEncoder to categorical features
    ]
)

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)  # Split the data into training and testing sets with a test size of 20% and a random state of 42

# Train RandomForestRegressor
rf_pipeline = Pipeline(steps=[
    ("preprocessor", preprocessor),  # Preprocess data using the preprocessor
    ("regressor", RandomForestRegressor())  # Train RandomForestRegressor
])
rf_pipeline.fit(X_train, y_train)  # Fit the RandomForestRegressor pipeline on the training data

# Train DecisionTreeRegressor
dt_pipeline = Pipeline(steps=[
    ("preprocessor", preprocessor),  # Preprocess data using the preprocessor
    ("regressor", DecisionTreeRegressor())  # Train DecisionTreeRegressor
])
dt_pipeline.fit(X_train, y_train)  # Fit the DecisionTreeRegressor pipeline on the training data

# Evaluate models: for regressors, score() returns the coefficient of determination (R²)
rf_score = rf_pipeline.score(X_test, y_test)  # R² of RandomForestRegressor on the testing data
dt_score = dt_pipeline.score(X_test, y_test)  # R² of DecisionTreeRegressor on the testing data

print("RandomForestRegressor Score:", rf_score)  # Print the score of RandomForestRegressor
print("DecisionTreeRegressor Score:", dt_score)  # Print the score of DecisionTreeRegressor

RandomForestRegressor Score: 0.8563336394854835
DecisionTreeRegressor Score: 0.7343885025540329


Step 4: Collect User Input

In [None]:
# Collect user input
user_age = int(input("Enter your age: "))  # Prompt the user to enter their age and convert it to an integer
user_weight = float(input("Enter your weight (in kg): "))  # Prompt the user to enter their weight in kilograms and convert it to a float
user_height = float(input("Enter your height (in cm): "))  # Prompt the user to enter their height in centimeters and convert it to a float
user_goal = input("Enter your fitness goal (lose weight/build muscle/increase flexibility): ").strip().lower()  # Prompt for the fitness goal; trim spaces and lowercase it to match the training categories
user_days_per_week = int(input("Enter number of days per week for exercise: "))  # Prompt the user to enter the number of days per week for exercise and convert it to an integer
user_duration = input("Enter preferred workout duration (short/medium/long): ").strip().lower()  # Prompt for the preferred duration; trim spaces and lowercase it to match the training categories

# Create input data for prediction
input_data = {
    "age": user_age,  # Assign the user's age to the "age" key in the input data dictionary
    "weight": user_weight,  # Assign the user's weight to the "weight" key in the input data dictionary
    "height": user_height,  # Assign the user's height to the "height" key in the input data dictionary
    "goal": user_goal,  # Assign the user's fitness goal to the "goal" key in the input data dictionary
    "days_per_week": user_days_per_week,  # Assign the number of days per week for exercise to the "days_per_week" key in the input data dictionary
    "duration": user_duration  # Assign the user's preferred workout duration to the "duration" key in the input data dictionary
}


Enter your age: 34
Enter your weight (in kg): 87
Enter your height (in cm): 178
Enter your fitness goal (lose weight/build muscle/increase flexibility): lose weight
Enter number of days per week for exercise: 6
Enter preferred workout duration (short/medium/long): medium
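The pipelines' `OneHotEncoder` uses its default `handle_unknown="error"`. As a result, a goal or duration that does not exactly match a training category makes `predict` raise a `ValueError`, and an out-of-range number is silently extrapolated. Below is a minimal validation sketch. The helper names `parse_choice` and `parse_int` and the allowed-value tuples are hypothetical. The ranges shown are taken from the Step 1 generator.

```python
VALID_GOALS = ("lose weight", "build muscle", "increase flexibility")
VALID_DURATIONS = ("short", "medium", "long")


def parse_choice(raw, options):
    """Normalize a free-text answer and check it against the allowed categories (hypothetical helper)."""
    value = " ".join(raw.strip().lower().split())  # trim, lowercase, collapse inner spaces
    if value not in options:
        raise ValueError(f"{raw!r} is not one of: {', '.join(options)}")
    return value


def parse_int(raw, low, high):
    """Parse an integer answer and enforce an inclusive range (hypothetical helper)."""
    value = int(raw)
    if not low <= value <= high:
        raise ValueError(f"{value} is outside the range {low}-{high}")
    return value
```

In the input cell these would wrap the prompts, for example `user_goal = parse_choice(input("Enter your fitness goal ..."), VALID_GOALS)` and `user_days_per_week = parse_int(input("Enter number of days per week for exercise: "), 1, 7)`.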


Step 5: Use Both Trained Models for Prediction

In [None]:
# Make prediction using RandomForestRegressor
rf_prediction = rf_pipeline.predict(pd.DataFrame([input_data]))  # Use the trained RandomForestRegressor pipeline to predict the duration using the input data
rf_duration = round(rf_prediction[0])  # Round the predicted duration to the nearest integer

# Make prediction using DecisionTreeRegressor
dt_prediction = dt_pipeline.predict(pd.DataFrame([input_data]))  # Use the trained DecisionTreeRegressor pipeline to predict the duration using the input data
dt_duration = round(dt_prediction[0])  # Round the predicted duration to the nearest integer


Step 6: Generate Fitness Plan

In [None]:
def generate_fitness_plan(duration, days_per_week):
    # Base repetition count for each exercise (used as reps, not as minutes)
    exercises = {
        "running": 7,
        "cycling": 8,
        "jump rope": 10,
        "swimming": 12,
        "push-ups": 5,
        "squats": 6,
        "lunges": 6,
        "planks": 5,
        "dumbbell curls": 8,
        "yoga": 10,
        "stretching": 8
    }

    # Total predicted workout minutes for the week
    total_duration = duration * days_per_week

    # Repetition multiplier: weekly minutes divided by the number of exercises
    # (based on the weekly total, so it grows with days_per_week)
    rep_multiplier = total_duration // len(exercises)

    # Every day lists all 11 exercises; the user's goal is not used when building the plan
    fitness_plan = {
        f"Day {i + 1}": {exercise: base_reps * rep_multiplier for exercise, base_reps in exercises.items()}
        for i in range(days_per_week)
    }

    return fitness_plan

# Generate fitness plans
rf_fitness_plan = generate_fitness_plan(rf_duration, user_days_per_week)    # Generate fitness plan using the predicted duration from RandomForestRegressor
dt_fitness_plan = generate_fitness_plan(dt_duration, user_days_per_week)     # Generate fitness plan using the predicted duration from DecisionTreeRegressor
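The function above ignores the user's goal and scales every exercise by the weekly total. That is why the plan printed in Step 7 lists all 11 exercises every day, including 228 "reps" of swimming. Below is a hedged alternative sketch. It treats the predicted value as minutes per session, keeps only the exercises that the synthetic dataset associates with the goal, and splits the session time evenly among them. The name `generate_goal_plan`, the even time split, and the daily rotation are assumptions, not part of the original notebook.

```python
GOAL_EXERCISES = {
    "lose weight": ["running", "cycling", "jump rope", "swimming"],
    "build muscle": ["push-ups", "squats", "lunges", "planks", "dumbbell curls"],
    "increase flexibility": ["yoga", "stretching"],
}


def generate_goal_plan(duration, days_per_week, goal):
    """Split `duration` minutes per session across the goal's exercises (hypothetical helper)."""
    options = GOAL_EXERCISES[goal]
    plan = {}
    for day in range(days_per_week):
        # Rotate the starting exercise each day so the order varies across the week
        shift = day % len(options)
        rotated = options[shift:] + options[:shift]
        base, extra = divmod(duration, len(rotated))
        # The first `extra` exercises get one extra minute, so each session sums to `duration`
        plan[f"Day {day + 1}"] = {ex: base + (1 if i < extra else 0) for i, ex in enumerate(rotated)}
    return plan


plan = generate_goal_plan(37, 6, "lose weight")
print(plan["Day 1"])
```

In the notebook this would be called as `generate_goal_plan(rf_duration, user_days_per_week, user_goal)`. Because the values are minutes, the `reps` label in `display_fitness_plan` would need to change to `min`.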


Step 7: Display Fitness Plan

In [None]:
# Display fitness plans
def display_fitness_plan(fitness_plan):
    print("Your Personalized Fitness Plan:")
    for day, exercises in fitness_plan.items():
        print(f"\n{day}:")
        for exercise, reps in exercises.items():
            print(f"- {exercise}: {reps} reps")

# Display fitness plans generated using RandomForestRegressor
print("Fitness Plan generated using RandomForestRegressor:")
display_fitness_plan(rf_fitness_plan)

# Display fitness plans generated using DecisionTreeRegressor
print("\nFitness Plan generated using DecisionTreeRegressor:")
display_fitness_plan(dt_fitness_plan)


Fitness Plan generated using RandomForestRegressor:
Your Personalized Fitness Plan:

Day 1:
- running: 133 reps
- cycling: 152 reps
- jump rope: 190 reps
- swimming: 228 reps
- push-ups: 95 reps
- squats: 114 reps
- lunges: 114 reps
- planks: 95 reps
- dumbbell curls: 152 reps
- yoga: 190 reps
- stretching: 152 reps

Day 2:
- running: 133 reps
- cycling: 152 reps
- jump rope: 190 reps
- swimming: 228 reps
- push-ups: 95 reps
- squats: 114 reps
- lunges: 114 reps
- planks: 95 reps
- dumbbell curls: 152 reps
- yoga: 190 reps
- stretching: 152 reps

Day 3:
- running: 133 reps
- cycling: 152 reps
- jump rope: 190 reps
- swimming: 228 reps
- push-ups: 95 reps
- squats: 114 reps
- lunges: 114 reps
- planks: 95 reps
- dumbbell curls: 152 reps
- yoga: 190 reps
- stretching: 152 reps

Day 4:
- running: 133 reps
- cycling: 152 reps
- jump rope: 190 reps
- swimming: 228 reps
- push-ups: 95 reps
- squats: 114 reps
- lunges: 114 reps
- planks: 95 reps
- dumbbell curls: 152 reps
- yoga: 190 reps
- s

Choosing the RandomForestRegressor and DecisionTreeRegressor algorithms for my personalized fitness plan generator involves considering various factors, including the nature of the problem, the available data, and the desired outcomes.

1. Synthetic Dataset Creation:
   - I created a synthetic dataset to simulate diverse user profiles and preferences. It includes age, weight, height, fitness goal, days per week for exercise, preferred workout duration, and the corresponding exercise and workout duration in minutes.
   - RandomForestRegressor and DecisionTreeRegressor suit this task because tree-based models work well with a mix of numerical and (encoded) categorical features and make no linearity assumption, so they can predict workout durations from varied user inputs.

2. Preprocessing and Model Training:
   - After creating the dataset, I preprocess it with a ColumnTransformer that one-hot encodes the categorical features (goal, duration) and passes the numerical features through unchanged.
   - Tree-based models need little further preparation: they are insensitive to feature scaling, so no normalization is required. scikit-learn's implementations do require numeric input, which is why the categorical columns are encoded first.
   - I split the dataset into training (80%) and testing (20%) sets with scikit-learn's train_test_split utility, so the models are evaluated on data they have not seen.

3. Model Evaluation:
   - I evaluated the trained models on the test set with each pipeline's score() method, which for regressors returns the coefficient of determination (R²). Other metrics, such as mean absolute error, are available in sklearn.metrics.
   - Based on these results, I compare the two algorithms and select the one that performs better for this use case.

4. Generating Fitness Plan:
   - Once the models are trained, I use them to predict a workout duration from the user's input.
   - From the predicted duration and the number of training days, I generate a fitness plan listing exercises and repetition counts. In the current implementation the repetition counts scale with the predicted duration and the number of days. The user's fitness goal is passed to the models but does not change which exercises appear in the plan.
   - Tree-based models fit this step because they learn threshold rules from the data (for example, splitting on the preferred duration category) rather than requiring a hand-specified formula.

5. Displaying Fitness Plan:
   - Finally, I display the generated fitness plans, listing the recommended exercises and repetitions for each training day.
   - The display function iterates through the fitness plan dictionary and prints it in a structured format.
   - This step does not involve the machine learning models directly; it only presents the plan built from their predictions.

In summary, I chose RandomForestRegressor and DecisionTreeRegressor for the fitness plan generator because they handle this prediction task well and adapt to diverse user inputs. The single decision tree is easy to interpret, while the random forest trades some interpretability for more accurate and stable predictions. Comparing the two lets me select the more suitable model for the application.

1. RandomForestRegressor Score (0.8563):
   - The score of 0.8563 (from the Step 3 run) is the R² on the held-out test set: approximately 85.63% of the variance in workout duration is explained by the features included in the model.
   - This high score suggests that the RandomForestRegressor model is performing well in predicting workout durations based on the input features.
   - RandomForestRegressor is an ensemble learning method that combines multiple decision trees, leading to improved generalization performance compared to a single decision tree.

2. DecisionTreeRegressor Score (0.7344):
   - The score of 0.7344 (from the Step 3 run) is the R² on the held-out test set: approximately 73.44% of the variance in workout duration is explained by the features included in the model.
   - This score is lower than that of the RandomForestRegressor, suggesting that the DecisionTreeRegressor may be overfitting the training data to some extent.
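   - DecisionTreeRegressor constructs a single decision tree from the training data. With scikit-learn's default max_depth=None, it keeps splitting until every leaf is pure or too small to split further, so it also fits random variation in the target using features such as age, weight, and height, which carry no real signal in this synthetic data.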

Reasons for the Differences in Scores:
- RandomForestRegressor typically performs better than DecisionTreeRegressor because it reduces overfitting by averaging the predictions of multiple trees, resulting in a more robust model.
- DecisionTreeRegressor may have lower performance due to its tendency to overfit the training data, especially if the tree is allowed to grow deep without regularization.
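The sketch below checks the overfitting explanation by comparing training and test R² for the default (unpruned) tree and a depth-limited one. It assumes a seeded NumPy stand-in for the notebook's data. It uses `max_depth=2`, an illustrative choice: two levels of splits are enough to separate the three duration categories. The exact scores will differ from the notebook's output.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeRegressor

# Stand-in for the notebook's df (assumption: same columns and ranges, "exercise" omitted)
rng = np.random.default_rng(0)
n = 1000
duration = rng.choice(["short", "medium", "long"], size=n)
lows = pd.Series(duration).map({"short": 15, "medium": 30, "long": 45}).to_numpy()
df = pd.DataFrame({
    "age": rng.integers(18, 60, size=n),
    "weight": rng.uniform(50, 100, size=n).round(1),
    "height": rng.uniform(150, 200, size=n).round(1),
    "goal": rng.choice(["lose weight", "build muscle", "increase flexibility"], size=n),
    "days_per_week": rng.integers(1, 8, size=n),
    "duration": duration,
    "duration_minutes": rng.integers(lows, lows + 15, endpoint=True),
})
X_train, X_test, y_train, y_test = train_test_split(
    df.drop(columns=["duration_minutes"]), df["duration_minutes"], test_size=0.2, random_state=42
)


def tree_pipeline(**tree_params):
    """Same preprocessing as the notebook, followed by a seeded DecisionTreeRegressor."""
    preprocessor = ColumnTransformer([
        ("num", "passthrough", ["age", "weight", "height", "days_per_week"]),
        ("cat", OneHotEncoder(), ["goal", "duration"]),
    ])
    return Pipeline([
        ("preprocessor", preprocessor),
        ("regressor", DecisionTreeRegressor(random_state=42, **tree_params)),
    ])


results = {}
for name, params in {"unpruned": {}, "max_depth=2": {"max_depth": 2}}.items():
    model = tree_pipeline(**params).fit(X_train, y_train)
    results[name] = (model.score(X_train, y_train), model.score(X_test, y_test))
    print(f"{name:>12}: train R² = {results[name][0]:.3f}, test R² = {results[name][1]:.3f}")
```

The signature of overfitting is a large gap between training and test R². The unpruned tree scores essentially 1.0 on the training data because every leaf is pure, while the shallow tree gives up training fit but generalizes better here.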

In summary, the RandomForestRegressor model achieved a higher score, indicating better performance in explaining the variance in workout durations compared to the DecisionTreeRegressor model. This difference is likely due to the ensemble nature of RandomForestRegressor, which reduces overfitting and improves generalization.
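The Step 1 generator draws `duration_minutes` uniformly from a range that depends only on the `duration` category. Age, weight, height, goal, and days per week therefore carry no information about the target. By the law of total variance, no model can exceed an R² equal to the between-category variance divided by the total variance. The sketch below computes that ceiling exactly from the ranges in Step 1, assuming the three categories are equally likely, as `random.choice` makes them. The helper name `r2_ceiling` is hypothetical.

```python
from fractions import Fraction

DURATION_RANGES = {"short": (15, 30), "medium": (30, 45), "long": (45, 60)}


def r2_ceiling(ranges):
    """Best achievable R² when y = randint(lo, hi) for an equally likely category (hypothetical helper)."""
    means, within = [], []
    for lo, hi in ranges.values():
        n = hi - lo + 1                         # randint includes both endpoints
        means.append(Fraction(lo + hi, 2))
        within.append(Fraction(n * n - 1, 12))  # variance of a discrete uniform on n integers
    k = len(ranges)
    grand_mean = sum(means) / k
    between = sum((m - grand_mean) ** 2 for m in means) / k
    noise = sum(within) / k
    return between / (between + noise)


ceiling = r2_ceiling(DURATION_RANGES)
print(ceiling, float(ceiling))
```

Here the between-category variance is 150 and the within-category variance is 21.25, so the ceiling is 120/137 ≈ 0.876. Against that limit, the forest's test R² of 0.8563 is close to the maximum. The decision tree's lower score is therefore more plausibly variance from fitting noise features than missing signal.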

1. Decision Tree:
Concept: A decision tree is a hierarchical structure in which internal nodes test a feature, branches represent the outcomes of that test, and leaf nodes hold the final prediction. It recursively splits the dataset into subsets based on the most informative feature at each step.
Working: The algorithm partitions the feature space into regions by recursively splitting the data on feature values. At each node it selects the split that most reduces impurity. Classification trees use information gain or Gini impurity. Regression trees use the reduction in squared error (variance), which is the default criterion of scikit-learn's DecisionTreeRegressor. Splitting continues until a stopping criterion is met, such as reaching a maximum depth, a minimum number of samples per leaf, or a pure leaf.
Advantages:
- Easy to understand and interpret.
- Can handle both numerical and categorical data (scikit-learn's implementation expects numeric input, so categorical features are one-hot encoded first).
- Needs little preprocessing; for example, features do not need to be scaled.
Disadvantages:
- Prone to overfitting, especially with deep trees.
- Sensitive to small variations in the data; a slightly different training set can produce a very different tree.
- Produces piecewise-constant predictions from axis-aligned splits, so smooth or additive relationships need a deep, and therefore overfitting-prone, tree to approximate.
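The split selection described above can be shown on a single feature. This is a simplified sketch of the squared-error criterion, not scikit-learn's optimized implementation. The helper `best_split` is hypothetical. It tries every threshold midway between neighboring sorted values and keeps the one with the smallest total squared error when each side predicts its own mean. A depth-1 scikit-learn tree is fitted on the same toy data for comparison.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def best_split(x, y):
    """Return (threshold, total_sse) of the single split on x that minimizes squared error."""
    order = np.argsort(x)
    x_sorted, y_sorted = x[order], y[order]
    best_threshold, best_sse = None, np.inf
    for i in range(1, len(x_sorted)):
        if x_sorted[i] == x_sorted[i - 1]:
            continue  # no threshold can separate identical values
        left, right = y_sorted[:i], y_sorted[i:]
        # Each side predicts its mean; sum the squared deviations on both sides
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best_sse:
            # Place the threshold midway between neighboring values
            best_threshold, best_sse = (x_sorted[i - 1] + x_sorted[i]) / 2, sse
    return best_threshold, best_sse


x = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
y = np.array([5.0, 6.0, 5.0, 20.0, 21.0, 19.0])
threshold, sse = best_split(x, y)

# A depth-1 scikit-learn tree (a "stump") should choose the same threshold
stump = DecisionTreeRegressor(max_depth=1).fit(x.reshape(-1, 1), y)
print(threshold, sse, stump.tree_.threshold[0])
```

Both approaches split at 6.5, separating the low targets from the high ones with a total squared error of 8/3. A full tree simply repeats this search inside each resulting region, over all features.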


2. Random Forest:
Concept: Random Forest is an ensemble learning technique that builds many decision trees during training and combines their outputs: the majority vote (mode) for classification or the mean for regression. It introduces randomness into tree building to reduce overfitting and improve generalization.
Working: Each tree is trained on a bootstrap sample of the training data and can be restricted to a random subset of features at each split. Averaging the trees' predictions gives a more stable and accurate result, because the randomness decorrelates the trees and reduces variance. scikit-learn's RandomForestRegressor considers all features at every split by default (max_features=1.0 in recent versions). With default settings, as in this notebook, the randomness therefore comes from bootstrapping alone.
Advantages:
- Reduces overfitting by averaging the predictions of many trees.
- Handles high-dimensional data and large datasets effectively.
- Provides feature importance scores for interpretation.
Disadvantages:
- Less interpretable than a single decision tree.
- Slower to train and to predict than a single decision tree, since the cost grows with the number of trees.
- May not perform well on imbalanced datasets or datasets dominated by noisy features.
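With all features available at every split, a random forest reduces to bagging: fit trees on bootstrap resamples and average their predictions. The sketch below implements that directly and compares it with a single unpruned tree. It assumes a seeded NumPy stand-in for the notebook's data, one-hot encoded with `pandas.get_dummies`, and uses 100 trees to mirror the forest's default `n_estimators`. The helper `bagged_trees_predict` is hypothetical and omits scikit-learn's optimizations.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Stand-in for the notebook's data (assumption), one-hot encoded with pandas
rng = np.random.default_rng(0)
n = 1000
duration = rng.choice(["short", "medium", "long"], size=n)
lows = pd.Series(duration).map({"short": 15, "medium": 30, "long": 45}).to_numpy()
X = pd.get_dummies(pd.DataFrame({
    "age": rng.integers(18, 60, size=n),
    "weight": rng.uniform(50, 100, size=n).round(1),
    "height": rng.uniform(150, 200, size=n).round(1),
    "goal": rng.choice(["lose weight", "build muscle", "increase flexibility"], size=n),
    "days_per_week": rng.integers(1, 8, size=n),
    "duration": duration,
}), dtype=float)
y = rng.integers(lows, lows + 15, endpoint=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)


def bagged_trees_predict(X_train, y_train, X_new, n_trees=100, seed=0):
    """Average the predictions of trees fit on bootstrap resamples (hypothetical bagging helper)."""
    boot_rng = np.random.default_rng(seed)
    preds = []
    for _ in range(n_trees):
        idx = boot_rng.integers(0, len(X_train), size=len(X_train))  # sample rows with replacement
        tree = DecisionTreeRegressor(random_state=0).fit(X_train.iloc[idx], y_train[idx])
        preds.append(tree.predict(X_new))
    return np.mean(preds, axis=0)


single = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)
single_r2 = r2_score(y_test, single.predict(X_test))
bagged_r2 = r2_score(y_test, bagged_trees_predict(X_train, y_train, X_test))
print(f"single tree R² = {single_r2:.3f}, bagged (100 trees) R² = {bagged_r2:.3f}")
```

Each bootstrapped tree still overfits its own resample, but the trees overfit in different ways, so averaging cancels much of that noise.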


In summary, Decision Trees are intuitive and easy to interpret but prone to overfitting, while Random Forests address this issue by building an ensemble of trees and introducing randomness. RandomForestRegressor typically achieves better performance and generalization compared to DecisionTreeRegressor, making it a popular choice for various regression tasks.

Reasons for Choosing These Two Algorithms

RandomForestRegressor and DecisionTreeRegressor are commonly used for regression tasks in machine learning due to several reasons:

1. Flexibility: Both algorithms work with a mix of numerical and categorical features (categorical ones must be encoded first in scikit-learn), making them versatile choices for regression problems with diverse datasets.

2. Non-linearity: They can capture non-linear relationships between features and the target variable, which matters in many real-world regression tasks where linear models may not suffice.

3. Interpretability: Decision trees, in particular, are highly interpretable. Users can follow how each prediction is made and see which features are most influential.

4. Ensemble Learning: RandomForestRegressor combines multiple decision trees to improve predictive performance and reduce overfitting. This makes it robust and effective, especially on noisy or complex datasets.

5. Scalability: A single decision tree is cheap to train. A random forest costs roughly as much as all of its trees combined, but the trees are independent, so training parallelizes well (the n_jobs parameter in scikit-learn), which lets it handle large datasets.

6. Feature Importance: Both models expose feature importance scores (the feature_importances_ attribute in scikit-learn), helping users identify the features that contribute most to the prediction.

7. Performance: RandomForestRegressor often outperforms simpler methods such as linear regression when the data is high-dimensional or the relationships are complex and non-linear.

Other Algorithms That Could Be Used

1. Gradient Boosting Regressor (GBR):
   - GBR builds an ensemble of weak learners (typically shallow decision trees) sequentially, fitting each new tree to the residual errors (the negative gradient of the loss) of the ensemble built so far. It can achieve higher accuracy than a Random Forest but is more sensitive to hyperparameters such as the learning rate and the number of trees, and it can overfit if they are not tuned.

2. Support Vector Regressor (SVR):
   - SVR fits a function that keeps as many training points as possible within an ε-wide tube around it, penalizing only points that fall outside the tube. With kernels such as RBF it can model non-linear relationships and handles high-dimensional data well, but it requires feature scaling and becomes slow on large datasets.

3. Linear Regression:
   - Linear regression models the relationship between the independent variables and the target variable by fitting a linear equation to the observed data. It is simple, interpretable, and computationally efficient but may not capture non-linear relationships well.

4. K-Nearest Neighbors Regressor (KNN):
   - KNN predicts the value of a data point by averaging the target values of its k nearest neighbors. It is non-parametric and flexible but sensitive to the choice of distance metric, the number of neighbors, and feature scaling.

5. Neural Network Regressor:
   - Neural networks, particularly deep learning models, can learn complex patterns and relationships in data through multiple layers of interconnected nodes (neurons). They are highly flexible and can capture intricate non-linear relationships but require large amounts of data and computational resources.

6. Lasso and Ridge Regression:
   - Lasso and Ridge are linear regression with a penalty on the size of the coefficients: the L1 norm for Lasso and the squared L2 norm for Ridge. The penalty reduces overfitting. Ridge is useful under multicollinearity, and Lasso can shrink some coefficients exactly to zero, effectively selecting features.

7. ElasticNet Regression:
   - ElasticNet combines the penalties of Lasso and Ridge regression, offering a balance between variable selection (L1 penalty) and coefficient shrinkage (L2 penalty). It is effective in situations where both feature selection and regularization are desired.

8. XGBoost and LightGBM:
   - XGBoost and LightGBM are gradient boosting libraries known for their speed and accuracy. They extend traditional gradient boosting with features such as regularized objectives, histogram-based split finding, and built-in handling of missing values, and are widely used in regression tasks.
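Several of these alternatives drop into the same preprocessing pipeline. The sketch below compares three scikit-learn regressors on a seeded NumPy stand-in for the notebook's data. It uses `StandardScaler` on the numeric columns, which distance-based KNN needs and which is harmless for the other two. `n_neighbors=10` and `random_state=42` are illustrative choices. Because the target depends only on the duration category, a linear model on the one-hot features can represent the category means exactly, so it would be expected to score near this data's R² ceiling. KNN also measures distance across the scaled features that carry no signal, so it may do noticeably worse.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Stand-in for the notebook's df (assumption: same columns and ranges, "exercise" omitted)
rng = np.random.default_rng(0)
n = 1000
duration = rng.choice(["short", "medium", "long"], size=n)
lows = pd.Series(duration).map({"short": 15, "medium": 30, "long": 45}).to_numpy()
df = pd.DataFrame({
    "age": rng.integers(18, 60, size=n),
    "weight": rng.uniform(50, 100, size=n).round(1),
    "height": rng.uniform(150, 200, size=n).round(1),
    "goal": rng.choice(["lose weight", "build muscle", "increase flexibility"], size=n),
    "days_per_week": rng.integers(1, 8, size=n),
    "duration": duration,
    "duration_minutes": rng.integers(lows, lows + 15, endpoint=True),
})
X_train, X_test, y_train, y_test = train_test_split(
    df.drop(columns=["duration_minutes"]), df["duration_minutes"], test_size=0.2, random_state=42
)

candidates = {
    "LinearRegression": LinearRegression(),
    "GradientBoostingRegressor": GradientBoostingRegressor(random_state=42),
    "KNeighborsRegressor": KNeighborsRegressor(n_neighbors=10),
}
scores = {}
for name, model in candidates.items():
    preprocessor = ColumnTransformer([
        # Scaling matters for distance-based KNN; it does not change the other models' fit quality
        ("num", StandardScaler(), ["age", "weight", "height", "days_per_week"]),
        ("cat", OneHotEncoder(), ["goal", "duration"]),
    ])
    pipeline = Pipeline([("preprocessor", preprocessor), ("regressor", model)])
    scores[name] = pipeline.fit(X_train, y_train).score(X_test, y_test)
    print(f"{name}: test R² = {scores[name]:.3f}")
```

The same loop accepts `Ridge`, `Lasso`, `ElasticNet`, `SVR`, or `MLPRegressor` from scikit-learn without other changes.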

