<a href="https://colab.research.google.com/github/cedamusk/AI-N-ML/blob/main/polynomial_regression.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
from google.colab import files
uploaded=files.upload()

## Libraries and Modules
1. `numpy`: Provides support for numerical operations on arrays and matrices. Used for creating and manipulating numerical data in arrays.

2. `pandas`: A data manipulation and analysis library. Used for loading, cleaning, and organizing the dataset into a structured format such as a DataFrame.

3. `matplotlib.pyplot`: A plotting library for creating visualizations. Used for generating scatter plots, regression curves, and other data visualizations.

4. `sklearn.preprocessing.PolynomialFeatures`: Part of `scikit-learn`. Generates polynomial features from the original input data. For example, if the input is X, it generates X, X^2, X^3... based on the specified polynomial degree.

5. `sklearn.linear_model.LinearRegression`: Provides the implementation of linear regression. Used for fitting a linear regression model to the transformed polynomial features.

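6. `sklearn.model_selection.train_test_split`: Splits the dataset into training and testing subsets so that a model can be fitted on one part of the data and evaluated on another. This helps detect overfitting; it does not by itself prevent it. This notebook imports the function but evaluates each model on the full dataset.

7. `sklearn.metrics.r2_score` and `sklearn.metrics.mean_squared_error`: Compute the R^2 score and the Mean Squared Error (MSE) used to evaluate each fitted model.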
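
To make the `PolynomialFeatures` description concrete, here is a minimal sketch on two made-up input values (`X_demo` is a hypothetical array, not the notebook's data). With `include_bias=False`, each input value expands into its powers up to the chosen degree.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical inputs: two samples, one feature each
X_demo = np.array([[2.0], [3.0]])

# Expand each value x into the columns [x, x^2, x^3]
expanded = PolynomialFeatures(degree=3, include_bias=False).fit_transform(X_demo)
print(expanded)  # rows are [2, 4, 8] and [3, 9, 27]
```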

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, mean_squared_error
from sklearn.model_selection import train_test_split


## Read the dataset
1. `df=pd.read_csv('synthetic_polynomial_data.csv')`: Loads the dataset from the file into a pandas DataFrame named `df`.

2. `print("First few rows")`: Displays a title indicating the intention to show the first few rows of the dataset.

3. `print(df.head())`: Prints the first 5 rows of the DataFrame (`df`) to give a quick look at the data structure and its initial entries.

4. `print("\nDataset info:")`: Prints a title indicating the intention to display descriptive statistics about the dataset.

5. `print(df.describe())`: Displays summary statistics of the numerical columns in the DataFrame, such as:


*   **Count**: Number of non-missing entries.
*   **Mean**: Average value
*   **Std**: Standard Deviation
*   **Min/Max**: Minimum and Maximum values.
*   **25%, 50%, 75%**: Percentile values.
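
As a small illustration of what `describe()` returns, the sketch below uses a hypothetical four-row DataFrame (`demo_df`) in place of the CSV file, which is assumed here to have columns named `X` and `Y` as in the later cells.

```python
import pandas as pd

# Hypothetical stand-in for the CSV contents
demo_df = pd.DataFrame({"X": [1.0, 2.0, 3.0, 4.0], "Y": [2.0, 4.0, 6.0, 8.0]})

# describe() returns a DataFrame indexed by statistic name
summary = demo_df.describe()
print(summary.loc[["count", "mean", "min", "50%", "max"]])
```

Individual statistics can be read back by label, for example `summary.loc["mean", "X"]`.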



In [None]:
df=pd.read_csv('synthetic_polynomial_data.csv')
print("First few rows")
print(df.head())
print("\nDataset info:")
print(df.describe())

## Data Preparation
This code prepares the data for use in the polynomial regression model by extracting features (`X`) and target (`y`) values.

1. `X=df['X'].values`: Extracts the `X` column from the DataFrame (`df`) as a NumPy array using the `.values` attribute. This represents the input (independent variable) for the regression model.
2. `.reshape(-1, 1)`: Reshapes the 1D array of `X` into a 2D array with one column and as many rows as needed. This is necessary because `scikit-learn` expects the features to have a 2D shape, i.e. `(n_samples, n_features)`.

3. `y=df['Y'].values`: Extracts the `Y` column as a NumPy array. Represents the output (dependent variable) for the regression model.

## Purpose
*   `X`: A 2D array of the independent variable, suitable for use in `scikit-learn` models.
*   `y`: A 1D array of the dependent variable, ready for regression fitting and evaluation.
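
The effect of the reshape is easiest to see from the array shapes. This sketch uses a hypothetical three-row DataFrame (`demo_df`) rather than the notebook's dataset.

```python
import pandas as pd

# Hypothetical stand-in data with the same column names
demo_df = pd.DataFrame({"X": [1.0, 2.0, 3.0], "Y": [2.0, 5.0, 10.0]})

x_flat = demo_df["X"].values       # 1D array, shape (3,)
x_col = x_flat.reshape(-1, 1)      # 2D array, shape (3, 1): one column, rows inferred
y_demo = demo_df["Y"].values       # target stays 1D, shape (3,)

print(x_flat.shape, x_col.shape, y_demo.shape)
```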

In [None]:
X=df['X'].values.reshape(-1,1)
y=df['Y'].values

## Initialize settings
This code initializes variables and settings to evaluate polynomial regression models with different degrees.

1. `degrees=[2,3,4]`: Defines a list of polynomial degrees to explore: quadratic (2), cubic (3), and quartic (4).

2. `best_r2=-np.inf`: Initializes `best_r2` with a very low value (-infinity) to track the best R^2 score across all evaluated models.
3. `best_degree=2`: Initializes `best_degree` with 2 (quadratic). This variable will store the degree of the polynomial model with the highest R^2 score.

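4. `best_model=None`: Initializes `best_model` as `None`. This will store the fitted linear regression model with the highest R^2 score.

5. `best_poly_features=None`: Initializes `best_poly_features` as `None`. This will store the polynomial feature transformation corresponding to the best performing model.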

## Purpose
The code sets up the foundation for iteratively training polynomial regression models of different degrees, keeping track of the best performing model based on the R^2 score, and identifying the optimal polynomial degree together with its fitted model and feature transformation.

In [None]:
degrees=[2,3,4]
best_r2=-np.inf
best_degree=2
best_model=None
best_poly_features=None



## Plot
This code initializes a new figure for creating a plot with specific dimensions using Matplotlib.
1. `plt.figure()`: Creates a new figure for plotting. Ensures any plots created after this line appear within this figure.
2. `figsize=(15, 10)`: Sets the size of the figure in inches: 15 inches wide and 10 inches tall. Larger dimensions provide more space, making plots easier to read and interpret.

In [None]:
plt.figure(figsize=(15, 10))

## Polynomial regression for different degrees
This code iteratively fits polynomial regression models for different degrees, evaluates them, and visualizes the results.

1. `for i, degree in enumerate(degrees, 1):`: Loops through each polynomial degree in the `degrees` list (`[2,3,4]`). `i` is the subplot index (starts from 1).

2. `poly_features=PolynomialFeatures(degree=degree, include_bias=False)`: Creates the polynomial feature transformer for the current degree. `include_bias=False` omits the constant column of ones, because `LinearRegression` fits the intercept itself.

3. `X_poly=poly_features.fit_transform(X)`: Transforms the input data `X` into polynomial features for the current degree.

4. `model=LinearRegression()`: Initializes a linear regression model.

5. `model.fit(X_poly, y)`: Trains the linear regression model using the polynomial features (`X_poly`) and target values (`y`).

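6. `y_pred=model.predict(X_poly)`: Predicts `y` values for the same data the model was trained on.

7. `r2=r2_score(y, y_pred)` and `mse=mean_squared_error(y, y_pred)`: Calculate the R^2 score and Mean Squared Error (MSE) to evaluate the model's performance. Because both are computed on the training data, they measure goodness of fit rather than predictive accuracy, and the R^2 score cannot decrease as the degree increases.

8. `if r2>best_r2:`: Updates `best_r2`, `best_degree`, `best_model`, and `best_poly_features` whenever the current degree achieves a higher R^2 score than any previous one.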
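
As a sketch of an alternative, the imported `train_test_split` could be used to score each degree on held-out rows instead of the rows used for fitting. The data below is a hypothetical noisy quadratic, not the notebook's CSV, and the names `X_demo`, `y_demo`, `train_scores`, `test_scores`, and `best_test_degree` are introduced only for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical stand-in data: a noisy quadratic (assumption, not the real dataset)
rng = np.random.default_rng(0)
X_demo = np.linspace(0, 10, 60).reshape(-1, 1)
y_demo = 1.5 * X_demo.ravel() ** 2 - 2.0 * X_demo.ravel() + rng.normal(0, 3, 60)

# Hold out 25% of the rows for evaluation
X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=42
)

train_scores, test_scores = {}, {}
for degree in [2, 3, 4]:
    poly = PolynomialFeatures(degree=degree, include_bias=False)
    # Fit the feature expansion and the regression on the training rows only
    model = LinearRegression().fit(poly.fit_transform(X_train), y_train)
    train_scores[degree] = r2_score(y_train, model.predict(poly.transform(X_train)))
    test_scores[degree] = r2_score(y_test, model.predict(poly.transform(X_test)))

# Pick the degree by held-out R^2 rather than training R^2
best_test_degree = max(test_scores, key=test_scores.get)
print(train_scores, test_scores, best_test_degree)
```

Selecting on `test_scores` means a higher degree wins only if it generalizes better to unseen rows, whereas the training score favors the highest degree by construction.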


In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, mean_squared_error
!pip install seaborn
import seaborn as sns

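#Set the font globally so it applies to the figure created below;
#DejaVu Sans (bundled with Matplotlib) is the fallback where Arial is not installed
plt.rcParams['font.family']='sans-serif'
plt.rcParams['font.sans-serif']=['Arial', 'DejaVu Sans']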

#Create a figure with adjusted size and spacing
fig=plt.figure(figsize=(12, 4*len(degrees)))
fig.subplots_adjust(hspace=0.4)

#Color palette for different degrees
colors=['#FF6B6B', '#4ECDC4', '#45B7D1', '#96CEB4', '#FFEEAD', '#D4A5A5']

for i, degree in enumerate(degrees, 1):
  poly_features=PolynomialFeatures(degree=degree, include_bias=False)
  X_poly=poly_features.fit_transform(X)
  model=LinearRegression()
  model.fit(X_poly, y)
  y_pred=model.predict(X_poly)
  r2=r2_score(y, y_pred)
  mse=mean_squared_error(y, y_pred)

  #Update the model if necessary
  if r2>best_r2:
    best_r2=r2
    best_degree=degree
    best_model=model
    best_poly_features=poly_features

  #Create subplot with enhanced styling
  ax=plt.subplot(len(degrees), 1, i)

  #Set background color
  ax.set_facecolor('#f9f9fa')

  #plot scatter points with enhanced appearance
  plt.scatter(X, y, color='#2C3E50', alpha=0.6, label='Actual data', edgecolor='white', s=80)

  #Create smooth curve for polynomial fit
  X_range=np.linspace(X.min(), X.max(), 300).reshape(-1,1)
  X_range_poly=poly_features.transform(X_range)
  y_range_pred=model.predict(X_range_poly)

  #Plot the polynomial curve with custom color
  plt.plot(X_range, y_range_pred, color=colors[i% len(colors)],
           label=f'Polynomial degree {degree}', linewidth=2.5)

  #Enhanced title and labels
  plt.title(f'Polynomial Regression (Degree {degree})\n$R^2={r2:.4f}$, MSE={mse:.4f}',
            pad=20, color='#2C3E50', fontsize=12, fontweight='bold')

  plt.xlabel("X", color='#2C3E50', fontsize=10, fontweight='bold')
  plt.ylabel("Y", color='#2C3E50', fontsize=10, fontweight='bold')

  legend=plt.legend(frameon=True, fancybox=True, shadow=True)
  legend.get_frame().set_facecolor('white')

  plt.grid(True, alpha=0.3, linestyle='--')

  ax.spines['top'].set_visible(False)
  ax.spines['right'].set_visible(False)
  ax.spines['left'].set_color('#2C3E50')
  ax.spines['bottom'].set_color('#2C3E50')

plt.tight_layout()
plt.show()

In [None]:
print(f'\nBest polynomial degree: {best_degree}')
print(f"Best R^2 score: {best_r2:.4f}")

## Best Model
This code generates a polynomial equation string from the fitted parameters of `best_model`, the best-fit polynomial regression model selected in the earlier loop.
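
One way to check an equation like this is to evaluate it by hand and compare the result with `predict`. The sketch below fits a hypothetical exact quadratic (`y = 1 + 2x + 3x^2`, chosen only for illustration) and shows that the prediction is the fitted `intercept_` plus the sum of each coefficient times the matching power of `X`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical noise-free data generated from y = 1 + 2x + 3x^2
X_demo = np.arange(6, dtype=float).reshape(-1, 1)
y_demo = 1.0 + 2.0 * X_demo.ravel() + 3.0 * X_demo.ravel() ** 2

poly = PolynomialFeatures(degree=2, include_bias=False)
model = LinearRegression().fit(poly.fit_transform(X_demo), y_demo)

# Evaluate intercept + coef[0]*x + coef[1]*x^2 by hand at a new point
x_new = 4.5
manual = model.intercept_ + sum(c * x_new ** (k + 1) for k, c in enumerate(model.coef_))
predicted = model.predict(poly.transform([[x_new]]))[0]

print(model.intercept_, model.coef_)  # approximately 1.0 and [2.0, 3.0]
print(manual, predicted)              # the two values agree
```

Because `include_bias=False` leaves the constant term to `LinearRegression`, an equation built from `coef_` alone would be off by `intercept_`.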

In [None]:
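#Build a readable equation string, starting from the fitted intercept
coefficients=best_model.coef_
equation=f"Y = {best_model.intercept_:.4f}"
for i, coef in enumerate(coefficients, 1):
  #coef_ holds the weights of X, X^2, ..., in order of increasing power
  sign="+" if coef>=0 else "-"
  power="X" if i==1 else f"X^{i}"
  equation += f" {sign} {abs(coef):.4f}{power}"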

In [None]:
print("\nBest fitting polynomial equation:")
print(equation)

In [None]:
future_years=np.array([[11], [12], [13]])
future_X_poly=best_poly_features.transform(future_years)
future_predictions=best_model.predict(future_X_poly)


In [None]:
print("\nPredictions for future X values:")
for year, pred in zip(future_years.flatten(), future_predictions):
  print(f"X={year}: Y={pred:.4f}")
