# Regression

## Dataset preparation

### Step 01 : importing libraries

In [22]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

### Step 02 : Loading data

In [4]:
path = "Modified_ObesityDataset_.csv"
df = pd.read_csv(path)

In [3]:
# Optional, Google Colab only: uncomment to mount Google Drive before loading the CSV
# from google.colab import drive
# drive.mount('/content/drive')

Mounted at /content/drive


### Step 03 : split the data into X (features) and y (target)

In [7]:
X = df.drop('Weight', axis=1)
y = df['Weight']

### Step 04 : split the data into training and testing sets

In [8]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
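
To make the effect of `test_size=0.2` and `random_state=42` concrete, here is a minimal sketch on toy data. The names `X_toy` and `y_toy` and their values are hypothetical stand-ins, not the obesity dataset. It shows that 20% of the rows go to the test set, that features and targets stay aligned, and that the same seed reproduces the same split.

```python
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 10 rows with one feature each, and a matching target.
X_toy = [[i] for i in range(10)]
y_toy = [10 * i for i in range(10)]

X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=42
)
print("train rows:", len(X_tr), "test rows:", len(X_te))

# Each feature row is still paired with its own target after shuffling.
pairs_aligned = all(y == 10 * x[0] for x, y in zip(X_te, y_te))
print("features and targets aligned:", pairs_aligned)

# Repeating the call with the same random_state gives the same test rows.
_, X_te_again, _, _ = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=42
)
print("split is reproducible:", X_te == X_te_again)
```

Fixing `random_state` matters here because both models are later scored on the same `X_test` and `y_test`, which keeps their metrics comparable.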

## Model 01 : Linear Regression

### Step 01 : import libraries

In [9]:
from sklearn.linear_model import LinearRegression

### Step 02 : fit the regression model

In [10]:
linear_reg = LinearRegression()
linear_reg.fit(X_train, y_train)

### Step 03 : evaluate the model

In [14]:
# Make predictions
y_pred_linear = linear_reg.predict(X_test)


# Mean Squared Error
mse_linear = mean_squared_error(y_test, y_pred_linear)
print("Mean Squared Error (Linear Regression):", mse_linear)

# Root Mean Squared Error: the square root of the MSE
# (the squared=False argument is deprecated in newer scikit-learn releases)
rmse_linear = mse_linear ** 0.5
print("Root Mean Squared Error (Linear Regression):", rmse_linear)

# Mean Absolute Error
mae_linear = mean_absolute_error(y_test, y_pred_linear)
print("Mean Absolute Error (Linear Regression):", mae_linear)

# R-squared
r2_linear = r2_score(y_test, y_pred_linear)
print("R-squared (Linear Regression):", r2_linear)


Mean Squared Error (Linear Regression): 25.32030052337284
Root Mean Squared Error (Linear Regression): 5.031928111904307
Mean Absolute Error (Linear Regression): 3.92497289170019
R-squared (Linear Regression): 0.9481928818723508


## Model 02 : Random Forest Regression

### Step 01 : import libraries

In [15]:
from sklearn.ensemble import RandomForestRegressor

### Step 02 : create and fit the Random Forest regressor

In [16]:
# Fit the Random Forest Regression model
rf_regressor = RandomForestRegressor(n_estimators=100, random_state=42)
rf_regressor.fit(X_train, y_train)


### Step 03 : evaluate the model

In [18]:
# Make predictions
y_pred_rf = rf_regressor.predict(X_test)



# Mean Squared Error
mse_rf = mean_squared_error(y_test, y_pred_rf)
print("Mean Squared Error (Random Forest Regression):", mse_rf)

# Root Mean Squared Error: the square root of the MSE
# (the squared=False argument is deprecated in newer scikit-learn releases)
rmse_rf = mse_rf ** 0.5
print("Root Mean Squared Error (Random Forest Regression):", rmse_rf)

# Mean Absolute Error
mae_rf = mean_absolute_error(y_test, y_pred_rf)
print("Mean Absolute Error (Random Forest Regression):", mae_rf)

# R-squared
r2_rf = r2_score(y_test, y_pred_rf)
print("R-squared (Random Forest Regression):", r2_rf)


Mean Squared Error (Random Forest Regression): 10.865406480904149
Root Mean Squared Error (Random Forest Regression): 3.296271603024264
Mean Absolute Error (Random Forest Regression): 2.0315549711499523
R-squared (Random Forest Regression): 0.977768613111779


## Comparison


1. **Mean Squared Error (MSE)**:
   - Linear Regression: 25.3203
   - Random Forest Regression: 10.8654
   - Random Forest Regression has the lower MSE (less than half that of Linear Regression), so its squared prediction errors are smaller on average.

2. **Root Mean Squared Error (RMSE)**:
   - Linear Regression: 5.0319
   - Random Forest Regression: 3.2963
   - Random Forest Regression has the lower RMSE, so its typical prediction error is smaller. RMSE is expressed in the same units as `Weight` and penalizes large errors more heavily than MAE does.

3. **Mean Absolute Error (MAE)**:
   - Linear Regression: 3.9250
   - Random Forest Regression: 2.0316
   - Random Forest Regression has the lower MAE: its predictions are off by about 2.0 on average, in the units of `Weight`, versus about 3.9 for Linear Regression.

4. **R-squared (R2)**:
   - Linear Regression: 0.9482
   - Random Forest Regression: 0.9778
   - Random Forest Regression has the higher R-squared, meaning it explains more of the variance in `Weight` on the test set (about 97.8% versus 94.8%).

Based on these comparisons:
- Random Forest Regression outperforms Linear Regression on all four evaluation metrics on this 20% test split.
- Its errors (MSE, RMSE, MAE) are lower and its R-squared is higher, although Linear Regression also fits well, with an R-squared of about 0.95.
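
To put the gaps in perspective, the rounded values reported above can be turned into relative error reductions. This is a small sketch using only the numbers from the comparison; the helper name `relative_reduction` is hypothetical.

```python
# Rounded metric values copied from the comparison above.
linear = {"MSE": 25.3203, "RMSE": 5.0319, "MAE": 3.9250}
forest = {"MSE": 10.8654, "RMSE": 3.2963, "MAE": 2.0316}


def relative_reduction(baseline, candidate):
    """Percentage by which candidate is lower than baseline."""
    return 100.0 * (baseline - candidate) / baseline


reductions = {m: relative_reduction(linear[m], forest[m]) for m in linear}
for metric, pct in reductions.items():
    print(f"{metric}: {pct:.1f}% lower with Random Forest")

# R-squared is compared as an absolute difference rather than a ratio.
r2_gain = 0.9778 - 0.9482
print(f"R-squared gain: {r2_gain:.4f}")
```

These figures describe a single train/test split with `random_state=42`, so they are best read as an indication rather than a guaranteed margin.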