# Practical 6: Regression and Its Types

## Objective:
- Implement simple linear regression using a dataset.
- Explore and interpret the regression model coefficients and goodness-of-fit measures.
- Extend the analysis to multiple linear regression and assess the impact of additional predictors.

### 1. Loading and Preparing the Data
We'll start by loading the `cars.csv` dataset. Our regression models will predict highway fuel economy (the `Fuel Information.Highway mpg` column). We select a few relevant features, drop any row that has a missing value in one of them, and rename the columns for easier access.

In [None]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
import matplotlib.pyplot as plt
import seaborn as sns

print("--- Loading and Preparing Data ---")
df = pd.read_csv('cars.csv')
cols_for_regression = [
    'Fuel Information.Highway mpg',
    'Engine Information.Engine Statistics.Horsepower',
    'Engine Information.Engine Statistics.Cylinders',
    'Identification.Year'
]
df_reg = df[cols_for_regression].dropna().copy()
df_reg.rename(columns={
    'Fuel Information.Highway mpg': 'Highway_mpg',
    'Engine Information.Engine Statistics.Horsepower': 'Horsepower',
    'Engine Information.Engine Statistics.Cylinders': 'Cylinders',
    'Identification.Year': 'Year'
}, inplace=True)

print("Data for regression:")
print(df_reg.head())
print("\n")
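Because `dropna()` discards every row that has a missing value in any selected column, it can help to check how many rows are lost before fitting a model. The sketch below illustrates the idea on a small made-up DataFrame; the `toy` frame and its values are hypothetical and are not taken from `cars.csv`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (NOT from cars.csv), with two incomplete rows.
toy = pd.DataFrame({
    "Highway_mpg": [30.0, np.nan, 25.0, 28.0, 22.0],
    "Horsepower": [150.0, 200.0, np.nan, 180.0, 300.0],
    "Cylinders": [4, 6, 6, 4, 8],
})

# Count missing values per column before dropping anything.
missing_per_column = toy.isna().sum()

# dropna() with default arguments removes rows containing at least one NaN.
cleaned = toy.dropna()
rows_dropped = len(toy) - len(cleaned)

print(missing_per_column)
print(f"Dropped {rows_dropped} of {len(toy)} rows.")
```

The same two checks (`isna().sum()` and a before/after row count) could be applied to `df[cols_for_regression]` in the cell above to see how much data the cleaning step removes.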

### 2. Simple Linear Regression
We first build a simple linear regression model that predicts 'Highway_mpg' from a single independent variable, 'Horsepower'. We split the data into training (80%) and testing (20%) sets, train the model on the training set, and evaluate its performance on the held-out test set.

In [None]:
print("--- Simple Linear Regression ---")
X_simple = df_reg[['Horsepower']]
y_simple = df_reg['Highway_mpg']

X_train_s, X_test_s, y_train_s, y_test_s = train_test_split(X_simple, y_simple, test_size=0.2, random_state=42)

lr_simple = LinearRegression()
lr_simple.fit(X_train_s, y_train_s)

y_pred_s = lr_simple.predict(X_test_s)

mse_s = mean_squared_error(y_test_s, y_pred_s)
r2_s = r2_score(y_test_s, y_pred_s)

print("Simple Linear Regression Model Evaluation:")
print(f"Mean Squared Error (MSE): {mse_s:.4f}")
print(f"R-squared (R2): {r2_s:.4f}")

slope = lr_simple.coef_[0]
print(f"\nCoefficient (slope): {slope:.4f}")
print(f"Intercept: {lr_simple.intercept_:.4f}")
# Report the direction from the sign of the fitted slope instead of assuming a decrease.
direction = "decrease" if slope < 0 else "increase"
print(f"Interpretation: For each one-unit increase in Horsepower, the predicted Highway mpg is expected to {direction} by {abs(slope):.4f}.")
print("\n")

# Visualize the regression line against the held-out test points
plt.figure(figsize=(10, 6))
sns.scatterplot(x=X_test_s['Horsepower'], y=y_test_s, color='blue', label='Actual values')
sns.lineplot(x=X_test_s['Horsepower'], y=y_pred_s, color='red', label='Regression line')
plt.title('Simple Linear Regression: Horsepower vs. Highway mpg')
plt.xlabel('Horsepower')
plt.ylabel('Highway mpg')
plt.legend()
plt.show()
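To make the slope, intercept, and R2 reported above less of a black box, here is a minimal sketch of the closed-form least-squares solution for one predictor: the slope is the covariance of x and y divided by the variance of x, and the intercept makes the line pass through the point of means. The data is synthetic and the helper names `fit_simple_ols` and `r_squared` are hypothetical, chosen for illustration only.

```python
import numpy as np

# Synthetic stand-in data (NOT from cars.csv): mpg falls roughly linearly with horsepower.
rng = np.random.default_rng(0)
horsepower = rng.uniform(100, 400, size=200)
highway_mpg = 40.0 - 0.05 * horsepower + rng.normal(0.0, 2.0, size=200)


def fit_simple_ols(x, y):
    """Return (slope, intercept) of the least-squares line y = intercept + slope * x."""
    x_mean, y_mean = x.mean(), y.mean()
    slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    intercept = y_mean - slope * x_mean
    return slope, intercept


def r_squared(y_true, y_pred):
    """R2 = 1 - SS_res / SS_tot, the share of variance explained by the predictions."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


slope, intercept = fit_simple_ols(horsepower, highway_mpg)
r2 = r_squared(highway_mpg, intercept + slope * horsepower)
print(f"slope={slope:.4f}, intercept={intercept:.4f}, R2={r2:.4f}")
```

With a single predictor, `LinearRegression` should arrive at the same line; note that this sketch scores R2 on the data used for fitting, whereas the notebook cell scores it on a held-out test set.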

### 3. Multiple Linear Regression
Next, we extend our analysis to multiple linear regression by including more predictors: 'Cylinders' and 'Year'. We will train a new model and compare its performance to the simple linear regression model.

In [None]:
print("--- Multiple Linear Regression ---")
X_multi = df_reg[['Horsepower', 'Cylinders', 'Year']]
y_multi = df_reg['Highway_mpg']

# Same test_size and random_state as the simple model, so both models share the same test rows.
X_train_m, X_test_m, y_train_m, y_test_m = train_test_split(X_multi, y_multi, test_size=0.2, random_state=42)

lr_multi = LinearRegression()
lr_multi.fit(X_train_m, y_train_m)

y_pred_m = lr_multi.predict(X_test_m)

mse_m = mean_squared_error(y_test_m, y_pred_m)
r2_m = r2_score(y_test_m, y_pred_m)

print("Multiple Linear Regression Model Evaluation:")
print(f"Mean Squared Error (MSE): {mse_m:.4f}")
print(f"R-squared (R2): {r2_m:.4f}")

print("\nCoefficients:")
for feature, coef in zip(X_multi.columns, lr_multi.coef_):
    print(f"- {feature}: {coef:.4f}")
print(f"Intercept: {lr_multi.intercept_:.4f}")

print("\nComparison:")
print(f"R2 score changed from {r2_s:.4f} (simple) to {r2_m:.4f} (multiple).")
# Only claim an improvement if the test-set R2 actually went up.
if r2_m > r2_s:
    print("This suggests that the additional predictors ('Cylinders' and 'Year') have improved the model's ability to explain the variance in 'Highway_mpg' on the test set.")
else:
    print("The additional predictors ('Cylinders' and 'Year') did not improve the test-set R2 for 'Highway_mpg'.")
print("\n")

print("--- Practical 6 execution finished ---")
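When comparing models with different numbers of predictors, plain R2 can be flattering to the larger model, since adding predictors never lowers R2 on the data used for fitting. One common complement is adjusted R2, which penalizes each extra predictor: adjusted R2 = 1 - (1 - R2)(n - 1)/(n - p - 1), where n is the number of samples and p the number of predictors. The sketch below is an illustration only; the helper name `adjusted_r2` and the example numbers are hypothetical and are not results from `cars.csv`.

```python
def adjusted_r2(r2, n_samples, n_features):
    """Adjusted R2 = 1 - (1 - R2) * (n - 1) / (n - p - 1)."""
    dof = n_samples - n_features - 1
    if dof <= 0:
        raise ValueError("Need more samples than predictors plus one.")
    return 1.0 - (1.0 - r2) * (n_samples - 1) / dof


# Hypothetical scores for illustration (NOT results from cars.csv).
simple_adj = adjusted_r2(0.80, n_samples=100, n_features=1)
multi_adj = adjusted_r2(0.81, n_samples=100, n_features=3)
print(f"Adjusted R2 (1 predictor):  {simple_adj:.4f}")
print(f"Adjusted R2 (3 predictors): {multi_adj:.4f}")
```

In the notebook, one way to apply this would be `adjusted_r2(r2_m, len(y_test_m), X_multi.shape[1])` alongside the corresponding call for the simple model. Adjusted R2 is conventionally defined on the sample used for fitting, so applying it to test-set scores is only a rough guide.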