# Polynomial-Regression-model-implementation

Polynomial Regression is a statistical method used to model the relationship between a dependent variable $y$ and an independent variable $x$ as an $n$-th degree polynomial. Unlike linear regression, which fits a straight line, polynomial regression fits a curve to the data points.


## 📈 Polynomial Regression (Non-Linear Data)

While Linear Regression fits a straight line, Polynomial Regression is used when the data shows a curve or non-linear pattern. It models the relationship between the independent variable ($x$) and dependent variable ($y$) as an $n$-th degree polynomial.

### 🔹 Mathematical Model

For a polynomial of degree $n$, the equation becomes:

$$y = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \dots + \beta_n x^n$$

Where:

  • $\beta_0$: The Intercept.
  • $\beta_1, \beta_2, \dots$: The Coefficients for each power of $x$.
  • $n$: The Degree (1 = Line, 2 = Parabola/Curve, 3 = S-shape).
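The model equation above can be evaluated directly once the coefficients are known. A minimal sketch (the coefficient values here are made up purely for illustration):

```python
# Hypothetical degree-2 coefficients (beta_0, beta_1, beta_2), for illustration only
beta = [2.0, 0.5, 1.5]

def poly_predict(x, coeffs):
    """Evaluate y = b0 + b1*x + b2*x^2 + ... for a scalar x."""
    return sum(b * x**i for i, b in enumerate(coeffs))

print(poly_predict(3.0, beta))  # 2.0 + 0.5*3 + 1.5*9 = 17.0
```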

### 🧠 Understanding Degrees: The "Bends" Analogy

In Polynomial Regression, the Degree ($n$) determines the flexibility of the model. A simple way to visualize this is to think of the degree as the number of "bends" the line is allowed to make.

#### 1. The Differences in "Bends"

| Degree | Shape | Bends | Description |
|---|---|---|---|
| 1 (Linear) | Straight line | 0 | A rigid straight line. It can only go up or down at a constant rate. |
| 2 (Quadratic) | Parabola (U-shape) | 1 | It can change direction once (e.g., go up then down, like a ball thrown in the air). |
| 3 (Cubic) | S-shape | 2 | It can change direction twice (e.g., go up, then down, then back up). |
| $n$ (high) | Wavy / complex | $n-1$ | A very wiggly line that can twist and turn as many times as needed. |
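The "bends" intuition can be sketched with NumPy on toy data. The sine-shaped curve and noise level below are illustrative assumptions, not the repo's dataset; the point is that raising the degree lets the fitted curve follow more bends, which shows up as a shrinking residual:

```python
import numpy as np

# Toy curved data: a sine wave has bends a straight line cannot follow
rng = np.random.default_rng(0)
x = np.linspace(0, 4, 20)
y = np.sin(x) + rng.normal(0, 0.1, x.size)

residuals = {}
for degree in (1, 2, 3):
    coeffs = np.polyfit(x, y, degree)        # degree n allows up to n-1 bends
    residuals[degree] = np.sum((np.polyval(coeffs, x) - y) ** 2)
    print(degree, round(residuals[degree], 3))
```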

#### 2. When do we use which?

The goal of Machine Learning is not to connect every single dot. The goal is to find the general pattern so we can predict new data accurately.

#### ✅ Use Degree 2 (Quadratic)

Best when the data has a simple curve or a single peak/valley.

  • Example: Fuel Efficiency vs. Speed. (Efficiency rises as you speed up, peaks around 60 mph, then drops as you go faster.)

#### ✅ Use Degree 3 or 4 (Cubic/Quartic)

Best when the pattern is more complex with multiple fluctuations.

  • Example: Electricity Usage over a Day. (Low at night, high in the morning, dips in the afternoon, high again in the evening).

โŒ Use Degree $n$ (High Complexity)

Almost Never.

โš ๏ธ The Danger of High Degrees (Overfitting) Imagine a model with Degree 20. It has 19 bends available. It is so flexible that it will wiggle frantically to pass through every single data point perfectly.

While it gets 100% accuracy on the training data, it fails miserably on new data because it learned the "noise" instead of the actual pattern. This is called Overfitting.
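Overfitting is easy to demonstrate on toy data. Everything below (the quadratic ground truth, the noise level, the degrees compared) is an illustrative assumption, not the repo's dataset. The degree-9 polynomial threads through all ten noisy training points almost perfectly, yet predicts noise-free test points worse than the honest degree-2 fit:

```python
import numpy as np

rng = np.random.default_rng(42)
x_train = np.linspace(0, 1, 10)
y_train = x_train**2 + rng.normal(0, 0.05, 10)   # true pattern: quadratic + noise
x_test = np.linspace(0.05, 0.95, 10)             # new points the model has not seen
y_test = x_test**2                               # the noise-free true pattern

errors = {}
for degree in (2, 9):
    coeffs = np.polyfit(x_train, y_train, degree)
    errors[degree] = {
        "train": np.mean((np.polyval(coeffs, x_train) - y_train) ** 2),
        "test": np.mean((np.polyval(coeffs, x_test) - y_test) ** 2),
    }
print(errors)  # degree 9: near-zero train error, but it learned the noise
```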


## 🧮 Solved Example: Degree 2 (Quadratic)

Objective: Find the best-fit curve for the following data points which follow a non-linear trend.

Data Points: $(1, 1), (2, 4), (3, 9), (4, 15)$

Since the data curves upwards, we use a Quadratic Equation (Degree 2): $$y = a_0 + a_1 x + a_2 x^2$$

### Step 1: The Calculation Table

To solve for the coefficients ($a_0, a_1, a_2$), we calculate the sums ($\Sigma$) of the powers of $x$ and $y$.

|   | $x$ | $y$ | $xy$ | $x^2$ | $x^2y$ | $x^3$ | $x^4$ |
|---|-----|-----|------|-------|--------|-------|-------|
|   | 1   | 1   | 1    | 1     | 1      | 1     | 1     |
|   | 2   | 4   | 8    | 4     | 16     | 8     | 16    |
|   | 3   | 9   | 27   | 9     | 81     | 27    | 81    |
|   | 4   | 15  | 60   | 16    | 240    | 64    | 256   |
| **Sum ($\Sigma$)** | 10 | 29 | 96 | 30 | 338 | 100 | 354 |
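The sums in the table can be checked with a few lines of Python:

```python
# The four data points from the example
xs = [1, 2, 3, 4]
ys = [1, 4, 9, 15]

sums = {
    "x":   sum(xs),
    "y":   sum(ys),
    "xy":  sum(x * y for x, y in zip(xs, ys)),
    "x2":  sum(x**2 for x in xs),
    "x2y": sum(x**2 * y for x, y in zip(xs, ys)),
    "x3":  sum(x**3 for x in xs),
    "x4":  sum(x**4 for x in xs),
}
print(sums)  # {'x': 10, 'y': 29, 'xy': 96, 'x2': 30, 'x2y': 338, 'x3': 100, 'x4': 354}
```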

### Step 2: The Normal Equations (Matrix Form)

We arrange these sums into the matrix equation $X \cdot A = B$ to solve for the unknown coefficients ($A$).

$$\begin{bmatrix} n & \Sigma x & \Sigma x^2 \\ \Sigma x & \Sigma x^2 & \Sigma x^3 \\ \Sigma x^2 & \Sigma x^3 & \Sigma x^4 \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \\ a_2 \end{bmatrix} = \begin{bmatrix} \Sigma y \\ \Sigma xy \\ \Sigma x^2 y \end{bmatrix}$$

Substituting the values from our table ($n = 4$):

$$\begin{bmatrix} 4 & 10 & 30 \\ 10 & 30 & 100 \\ 30 & 100 & 354 \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \\ a_2 \end{bmatrix} = \begin{bmatrix} 29 \\ 96 \\ 338 \end{bmatrix}$$

### Step 3: Final Model Solution

By solving the matrix equation (calculating $X^{-1} \cdot B$), we get the optimal values:

$$\begin{bmatrix} a_0 \\ a_1 \\ a_2 \end{bmatrix} = \begin{bmatrix} -0.75 \\ 0.95 \\ 0.75 \end{bmatrix}$$

The Final Best-Fit Equation: $$y = -0.75 + 0.95x + 0.75x^2$$
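The matrix solve can be verified with NumPy, using the sums from the calculation table:

```python
import numpy as np

# Normal-equation system X . A = B built from the table sums (n = 4 points)
X = np.array([[4, 10, 30],
              [10, 30, 100],
              [30, 100, 354]], dtype=float)
B = np.array([29, 96, 338], dtype=float)

a0, a1, a2 = np.linalg.solve(X, B)      # equivalent to X^-1 . B, but more stable
print(round(a0, 2), round(a1, 2), round(a2, 2))  # -0.75 0.95 0.75
```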


## How to Run the Model

Follow these steps to set up the environment and run the Polynomial Regression model on your local machine.

### 1. Prerequisites

Ensure you have Python installed. You will need the following libraries:

```bash
pip install pandas matplotlib scikit-learn
```

### 2. Project Structure

Keep both files in the same directory:

  • polynomial_regression.py (The main logic)
  • predict.csv (The training dataset)

### 3. Execution

Open your terminal or command prompt, navigate to the folder, and run:

```bash
python polynomial_regression.py
```

### 4. Interactive Prediction

Once the script runs, it will ask for input:

  1. Enter Hours: The program will prompt you to enter the number of study hours (e.g., 5).
  2. Output: It will display the predicted marks in the console.
  3. Visualization: A graph will pop up showing the relationship between Study Hours and Marks.

๐Ÿ› ๏ธ Development Logic: Step-by-Step

Here is the breakdown of how the model was developed to predict Student Marks based on Study Hours.

### Step 1: Data Loading & Preprocessing

We use Pandas to load the dataset.

  • Action: Read predict.csv into a dataframe.
  • Dataset columns: Hours (independent variable) and Marks (dependent variable).
  • Reshaping: The input Hours is reshaped into a 2D array ([[...]]) because Scikit-Learn expects a matrix format.
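Step 1 can be sketched as follows. The inline CSV below is a hypothetical stand-in so the snippet is self-contained; the real values ship in the repo's predict.csv:

```python
import io
import pandas as pd

# Hypothetical stand-in for predict.csv (the real file ships with the repo)
csv_text = "Hours,Marks\n1,20\n2,38\n3,52\n4,62\n5,68\n"
df = pd.read_csv(io.StringIO(csv_text))   # with the real file: pd.read_csv("predict.csv")

x = df[["Hours"]].values   # double brackets -> 2D matrix, as scikit-learn expects
y = df["Marks"].values     # 1D target vector
print(x.shape, y.shape)    # (5, 1) (5,)
```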

### Step 2: Polynomial Transformation

The relationship between study hours and marks isn't a straight line (marks plateau as hours increase). To fit this curve, we use PolynomialFeatures.

  • Logic: We convert the single feature Hours into a polynomial set: Hours, Hours^2.
  • Code concept:

```python
from sklearn.preprocessing import PolynomialFeatures

poly = PolynomialFeatures(degree=2)
x_poly = poly.fit_transform(x)  # x must already be a 2D array
```

  • Before: [5] (just the hours)
  • After: [5, 25] (hours and hours squared, which lets the model learn the curved pattern; with scikit-learn's default include_bias=True, a constant column is also added, giving [1, 5, 25])
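A quick check of what the transformation produces (note that scikit-learn's PolynomialFeatures also prepends a constant bias column by default):

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

x = np.array([[5.0]])           # 5 study hours, as a 2D matrix
poly = PolynomialFeatures(degree=2)
out = poly.fit_transform(x)
print(out)                      # [[ 1.  5. 25.]] -> bias, x, x^2
```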

### Step 3: Training the Model

We fit a standard Linear Regression model on the transformed data.

  • Math: The model learns how both the raw hours and the squared hours affect the marks ($\text{Marks} = \beta_0 + \beta_1 \cdot \text{Hours} + \beta_2 \cdot \text{Hours}^2$).

### Step 4: Making a Prediction

When the user enters a value (e.g., 8 hours):

  1. Transform: The code converts 8 into [8, 64] (8 and 8², plus a constant bias column under scikit-learn's defaults).
  2. Predict: The model calculates the likely marks using these values.
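Steps 3 and 4 together can be sketched end to end. The hours/marks values here are hypothetical stand-ins for predict.csv, chosen so marks plateau at higher hours:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical hours/marks data (the real values ship in predict.csv)
hours = np.array([[1], [2], [3], [4], [5], [6]], dtype=float)
marks = np.array([20, 38, 52, 62, 68, 71], dtype=float)   # plateaus at high hours

# Transform to [1, x, x^2], then fit a plain linear model on those features
poly = PolynomialFeatures(degree=2)
model = LinearRegression().fit(poly.fit_transform(hours), marks)

# Predict for a new input: transform first, then predict
new_hours = np.array([[4.5]])
features = poly.transform(new_hours)      # -> [[1., 4.5, 20.25]]
prediction = model.predict(features)
print(prediction)                         # predicted marks for 4.5 study hours
```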

### Step 5: Visualization

Finally, we use Matplotlib to visualize the result.

  • Scatter Plot: Shows the actual student data (Hours vs Marks).
  • Curve Line (Red): Shows the polynomial regression curve, demonstrating how the model fits the data better than a straight line.
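A minimal plotting sketch, again using hypothetical stand-in data rather than predict.csv; np.polyfit is used here as a compact way to draw the same degree-2 curve the model learns:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")            # headless backend for scripts/CI; not needed interactively
import matplotlib.pyplot as plt

# Hypothetical hours/marks data standing in for predict.csv
hours = np.array([1, 2, 3, 4, 5, 6], dtype=float)
marks = np.array([20, 38, 52, 62, 68, 71], dtype=float)

coeffs = np.polyfit(hours, marks, 2)             # degree-2 best-fit curve
x_line = np.linspace(hours.min(), hours.max(), 100)

fig, ax = plt.subplots()
ax.scatter(hours, marks, label="Actual data")                      # scatter of students
ax.plot(x_line, np.polyval(coeffs, x_line), "r-", label="Degree-2 fit")  # red curve
ax.set_xlabel("Study Hours")
ax.set_ylabel("Marks")
ax.legend()
plt.show()                       # no-op under Agg; opens a window interactively
```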
