In [42]:
import pandas as pd 
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from IPython.display import Image



import warnings
warnings.filterwarnings('ignore')

**1. What is Simple Linear Regression?**

- Simple Linear Regression is a method to predict a numerical value (output) using one independent variable (input) by drawing a straight line that best fits the data.

- It assumes there's a linear relationship between the two variables.

- A useful way to think about it: "If we know X, can we predict Y using a straight line?"

- Example - Predicting Salary from Years of Experience

We want to predict salary based on years of experience.

The Linear Regression Line - 

Salary = 80,000 * (Years of Experience) + 1,70,000

So if someone has 6 years of experience:

Salary = 80,000 * 6 + 1,70,000 = ₹6,50,000

- One input variable (Years of Experience)

- One output variable (Salary)

- Predicts using a straight line equation: y = mx + c

**2. What are the key assumptions of Simple Linear Regression?**

- Scenario:
    - We're predicting Salary based on Years of Experience.

-  Linear Relationship
    - There should be a straight-line relationship between the input and output.
    - As experience increases, salary increases at a steady rate.
    - But if salary suddenly jumps or flattens after 3 years, the relationship isn't linear.

- Independence of Errors
    - The errors (residuals) should not be related to each other.
    - If we collect salaries of employees from the same company team, their salaries might be connected, which breaks this assumption.
    - Each person's data should be independent.

- Homoscedasticity (Equal Variance of Errors)
    - The spread of errors should be roughly the same across all levels of experience.
    - If prediction errors are small for juniors but huge for seniors, the model is inconsistent.


- Normality of Residuals
    - The errors (residuals) should be normally distributed.
    - After predicting salary, the leftover errors should follow a bell curve.
    - If most errors fall on one side (predictions mostly too high or mostly too low), the residuals are skewed rather than bell-shaped.

- No (or minimal) Outliers
    - No extreme values that can distort the line
    - If one person has 1 year of experience and earns ₹20 lakh, that point will pull the line away from the other points.

**3. What does the coefficient m represent in the equation Y=mX+c ?**

- Equation :
    - Y = mX + c
- Y = predicted output
- X = input variable
- m = slope or coefficient
- c = intercept (starting value when X = 0)

- m tells you how much Y will change when X increases by 1 unit. It's the rate of change.

- Example - Predicting Salary from Years of Experience
    - Let's say:
    > Salary = 80,000 × Experience + 1,70,000
    - Here,
        - m = 80,000
        - c = 1,70,000

- Every time experience increases by 1 year, the salary increases by ₹80,000. So:
- 1 year → ₹2,50,000
- 2 years → ₹3,30,000
- 3 years → ₹4,10,000
...and so on

- This 80,000 is the slope: it shows how steep the line is, that is, how fast salary grows with experience.

**4. What does the intercept c represent in the equation Y=mX+c?**

- Equation
> Y = mX + c

- Where:
    - Y = predicted output
    - X = input variable
    - m = slope (how much Y changes with X)
    - c = intercept

- c is the value of Y when X = 0.

- c shows where the regression line crosses the Y-axis.

- Let's take the same salary prediction example:
> Salary = 80,000 × Experience + 1,70,000

- Here:
    - m = 80,000
    - c = 1,70,000

- When a person has 0 years of experience, their predicted starting salary is ₹1,70,000.
This is the base salary, before any experience is added.

**5. How do we calculate the slope m in Simple Linear Regression?**

- Equation - Y = mX + c

- We want to calculate the slope m, which tells us how much Y changes for every 1-unit increase in X.

- Formula to Calculate the Slope m: 

![image.png](attachment:4eca3d2b-e993-4d53-8f30-1becf1639fa4.png)

![image.png](attachment:c74091fb-73a8-4c7d-baf1-75515e307cf9.png)

![slope.jpg](slope.jpg)

Slope (m) = 80,000, which means that for every 1 extra year of experience, the salary increases by ₹80,000.
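A minimal sketch of the slope calculation, assuming a small hypothetical data set that lies exactly on the salary line used earlier (Salary = 80,000 × Experience + 1,70,000). It applies the standard least-squares formula, m = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)², with the intercept taken from the means.

```python
import numpy as np

# Hypothetical data lying exactly on Salary = 80,000 * Experience + 1,70,000
experience = np.array([1, 2, 3, 4, 5], dtype=float)
salary = 80_000 * experience + 170_000

# Least-squares slope: sum of cross-deviations divided by sum of squared X deviations
x_dev = experience - experience.mean()
y_dev = salary - salary.mean()
m = (x_dev * y_dev).sum() / (x_dev ** 2).sum()

# Intercept follows from the means: c = mean(Y) - m * mean(X)
c = salary.mean() - m * experience.mean()

print(f"slope m = {m:,.0f}, intercept c = {c:,.0f}")
```

Because the data sit exactly on the line, the formula returns the slope and intercept that generated them; with real, noisy data it returns the best-fitting values instead.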

**6.  What is the purpose of the least squares method in Simple Linear Regression?**

Scenario: Predicting a Student's Exam Score Based on Study Hours

- Suppose a teacher wants to predict students' exam scores based on how many hours they study. She collects data from several students:

In [22]:
data = {'Study Hours (X)': [1, 2, 3, 4, 5],
       'Exam Score (Y)': [50, 55, 65, 70, 75]}

df = pd.DataFrame(data)
df

Unnamed: 0,Study Hours (X),Exam Score (Y)
0,1,50
1,2,55
2,3,65
3,4,70
4,5,75


The purpose of the least squares method is to find the best-fitting straight line through the data points so we can predict exam scores based on study hours.

We want to find a line of the form Y = mX + c that minimizes the total squared difference between actual scores and predicted scores.

![Regression Image](reg.jpg)

This line predicts exam scores based on study hours.
E.g., if a student studies for 4 hours:

Y = 6.5(4) + 43.5 = 26 + 43.5 = 69.5 ≈ 70

The least squares method finds this line, Y = 6.5X + 43.5, by minimizing the total squared error between the actual scores and the predicted ones.

We can then use the line to estimate exam scores for other study times, most reliably within the observed range of 1 to 5 hours.
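The fitted line can be reproduced from the study-hours table with a few lines of NumPy. This is only a sketch: `np.polyfit` with `deg=1` is one of several ways to run least squares, and the variable names are illustrative.

```python
import numpy as np

study_hours = np.array([1, 2, 3, 4, 5], dtype=float)
exam_score = np.array([50, 55, 65, 70, 75], dtype=float)

# With deg=1, np.polyfit returns the least-squares [slope, intercept]
m, c = np.polyfit(study_hours, exam_score, deg=1)

predicted = m * study_hours + c
sse = ((exam_score - predicted) ** 2).sum()  # the quantity least squares minimizes

print(f"Y = {m:.1f}X + {c:.1f}, sum of squared errors = {sse:.2f}")
print(f"Predicted score for 4 hours: {m * 4 + c:.1f}")
```

Any other choice of slope and intercept would give a larger sum of squared errors on these five points.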

**7.  How is the coefficient of determination (R²) interpreted in Simple Linear Regression?**

- The coefficient of determination, denoted as R², tells us how well the regression line fits the data.

- R² measures how much of the variation in the dependent variable (Y) is explained by the independent variable (X).

- For a least-squares line with an intercept, evaluated on the data it was fitted to, its value ranges from 0 to 1.

In [80]:
Interpretation = {'R² Value': [0, 0.50, 1], 
                 'Interpretation': ['The model explains none of the variation in Y',
                                   'The model explains 50% of the variation in Y',
                                   'The model explains 100% of the variation in Y (perfect fit)']}
df = pd.DataFrame(Interpretation)
df

Unnamed: 0,R² Value,Interpretation
0,0.0,The model explains none of the variation in Y
1,0.5,The model explains 50% of the variation in Y
2,1.0,The model explains 100% of the variation in Y ...


Example (House Price):
- If we're predicting house price based on square footage, and R² = 0.85
- This means 85% of the variation in house prices is explained by square footage.
- The remaining 15% is due to other factors (e.g., location, age, condition).
- R² tells us how well our model explains the data: a higher R² means a better fit to that data.
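As a worked example, R² can be computed by hand for the study-hours data from question 6, using R² = 1 − SS_res / SS_tot. The sketch below assumes a line fitted with `np.polyfit`; the variable names are illustrative.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)       # study hours
y = np.array([50, 55, 65, 70, 75], dtype=float)  # exam scores

m, c = np.polyfit(x, y, deg=1)
y_hat = m * x + c

ss_res = ((y - y_hat) ** 2).sum()      # variation the line leaves unexplained
ss_tot = ((y - y.mean()) ** 2).sum()   # total variation around the mean score
r2 = 1 - ss_res / ss_tot

print(f"R^2 = {r2:.4f}")
```

Here SS_res is 7.5 and SS_tot is 430, so the line explains roughly 98% of the variation in these five scores.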

**8.  What is Multiple Linear Regression?**

Multiple Linear Regression (MLR) is a statistical technique used to predict the value of a dependent variable (Y) based on two or more independent variables (X₁, X₂, X₃, ...).

- Example - We're trying to predict a house price (Price) based on
    - Size of the house in square feet
    - Number of bedrooms
    - Age of the house
- Then the equation might look like: 

![image.png](attachment:af1f99e4-40ac-4241-a401-37f0d6ce0e79.png)

Here, we're using multiple factors to estimate the house price, not just one as in simple linear regression.

- Multiple Linear Regression models the relationship between one dependent variable and two or more independent variables.
- Equation Format -

![image.png](attachment:95b898f5-32e4-4d07-b12e-d4cf20e11d03.png)

- Purpose - It's used to predict outcomes more accurately when multiple factors influence the result.
- Real Use Cases:
    - Predicting salary based on education, experience, and age
    - Estimating sales using ad budget, season, and number of stores
    - Predicting health risk using BMI, age, and blood pressure
- Key Benefit:
    - It helps to understand the combined effect of several variables on the target, which often gives better prediction and insight than simple linear regression.
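A minimal sketch of fitting a multiple linear regression with NumPy's least-squares solver. The house data and the "true" coefficients below are invented purely for illustration; the point is that several predictors are fitted at once, with a column of ones supplying the intercept.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
size = rng.uniform(500, 3000, n)      # square feet
bedrooms = rng.integers(1, 6, n)      # 1 to 5 bedrooms
age = rng.uniform(0, 40, n)           # years

# Hypothetical noise-free relationship, used only to generate prices
price = 50_000 + 120 * size + 10_000 * bedrooms - 800 * age

# Design matrix with a leading column of ones for the intercept
X = np.column_stack([np.ones(n), size, bedrooms, age])
coef, *_ = np.linalg.lstsq(X, price, rcond=None)

b0, b_size, b_bed, b_age = coef
print(f"Price = {b0:,.0f} + {b_size:.1f}*Size + {b_bed:,.0f}*Bedrooms + {b_age:.1f}*Age")
```

Since the prices were generated without noise, the solver recovers the generating coefficients; each one is read as the change in price per unit of that predictor with the others held constant.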

**9. What is the main difference between Simple and Multiple Linear Regression?**

- Simple Linear Regression: One input - One output
- Multiple Linear Regression: Many inputs - One output
- Simple Linear Regression has exactly one independent variable (X).
- Multiple Linear Regression has two or more independent variables (X₁, X₂, ...).
- Simple Linear Regression example -
    - Predicting score based on study hours
- Multiple Linear Regression example -
    - Predicting score based on study hours, sleep hours, and class attendance 

**10.  What are the key assumptions of Multiple Linear Regression?**

Multiple Linear Regression (MLR) relies on several important assumptions:

- Linearity
    - The relationship between the dependent variable (Y) and each independent variable (X) should be linear.
    - Graphically, the data should form a straight-line trend (when other variables are held constant).
- Independence of Errors
    - The residuals (errors) should be independent of each other.
    - Especially important for time series data (no autocorrelation).
- Homoscedasticity (Constant Variance of Errors)
    - The spread (variance) of residuals should be constant across all levels of the independent variables.
    - If not, the residuals are heteroscedastic, which makes standard errors and significance tests unreliable.
- Normality of Residuals
    - The errors (residuals) should be normally distributed, especially for hypothesis testing (like p-values, confidence intervals).
    - Check using a histogram or Q-Q plot of residuals.
- No Multicollinearity
    - Independent variables should not be highly correlated with each other.
    - High multicollinearity makes it hard to know which variable is truly influencing Y.
    - Detect using correlation matrix or Variance Inflation Factor (VIF). 

**11.  What is heteroscedasticity, and how does it affect the results of a Multiple Linear Regression model?**

- Imagine we're predicting monthly spending (Y) of people based on their income (X).
    - We collect data from:
    - Low-income people
    - Middle-income people
    - High-income people
- As income increases, the variation in spending also increases:

- Low-income people have similar spending, so it is very predictable
- High-income people vary a lot: some save, some splurge
- So, errors (residuals) are small for low income and large for high income
- This is heteroscedasticity: the spread of residuals (errors) increases or decreases with the independent variable.

How does heteroscedasticity affect the model?
- Incorrect standard errors
    - This can lead to wrong p-values and confidence intervals
- Unreliable hypothesis tests
    - We might think a variable is significant when it's not (or vice versa)
- Less efficient estimates
    - Coefficients are still unbiased, but not the "best" (i.e., they don't have the smallest variance)
- Model diagnostics become misleading
    - Harder to trust R², t-tests, etc.

**12. How can you improve a Multiple Linear Regression model with high multicollinearity?**

- Suppose we're building a multiple linear regression model to predict house prices using these features:
    - X₁: House size in square feet
    - X₂: Number of bedrooms
    - X₃: Number of bathrooms
- What might happen?
    - Larger houses often have more bedrooms and more bathrooms. So:
    - X₁ (Size), X₂ (Bedrooms), and X₃ (Bathrooms) are strongly correlated with each other
    - This creates multicollinearity, where independent variables are highly related to each other.
- Why is this a problem?
    - It makes it hard to tell which variable is actually affecting the house price
    - The model becomes unstable: small changes in data can change coefficients a lot
    - Standard errors inflate, so p-values and confidence intervals for the affected coefficients become unreliable
- How to improve a model with high multicollinearity?
    -  Remove one of the correlated variables
        - If two variables are very similar (e.g., bedrooms and house size), remove one
        - Example: Keep house size, drop bedrooms
    -  Combine variables
        - Create a new feature from the correlated ones
        - Example: Combine bedrooms + bathrooms into one feature like total rooms
    -  Use dimensionality reduction (PCA)
        - Use Principal Component Analysis (PCA) to transform correlated variables into uncorrelated components
        - More common in advanced models
    -  Use Regularization (Ridge or Lasso Regression)
        - These techniques reduce the impact of multicollinearity by adding a penalty to large coefficients
        - Ridge: Keeps all variables but shrinks them
        - Lasso: Can eliminate some variables automatically
    -  Check Variance Inflation Factor (VIF)
        - Use VIF to detect which variables are causing multicollinearity
        - As a common rule of thumb, a VIF above 5 (or, more leniently, above 10) flags a variable as a likely problem
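The VIF check can be sketched directly from its definition, VIF_j = 1 / (1 − R_j²), where R_j² comes from regressing feature j on the remaining features. The helper `vif` and the synthetic house data below are illustrative assumptions, not part of any library.

```python
import numpy as np

def vif(X):
    """Variance Inflation Factor for each column of X (n_samples x n_features)."""
    n, k = X.shape
    factors = []
    for j in range(k):
        target = X[:, j]
        # Regress feature j on all the other features (plus an intercept)
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1 - (resid ** 2).sum() / ((target - target.mean()) ** 2).sum()
        factors.append(1 / (1 - r2))
    return np.array(factors)

rng = np.random.default_rng(0)
size = rng.uniform(500, 3000, 200)
bedrooms = size / 600 + rng.normal(0, 0.3, 200)   # closely tied to size
age = rng.uniform(0, 50, 200)                     # unrelated to size

vifs = vif(np.column_stack([size, bedrooms, age]))
for name, value in zip(["size", "bedrooms", "age"], vifs):
    print(f"VIF({name}) = {value:.1f}")
```

In this setup, size and bedrooms should show VIFs well above the rule-of-thumb threshold while age stays near 1, which points to dropping or combining one of the first two.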

**13. What are some common techniques for transforming categorical variables for use in regression models?**

- Suppose we're building a regression model to predict used car prices.
    - One of our columns is:
        - Fuel Type: Petrol, Diesel, CNG
        - This is a categorical variable: it's text, not numeric, but regression models require numerical input.
        - So we need to transform it.

Common Techniques to Transform Categorical Variables:

- One-Hot Encoding
    - Creates a new column for each category with 1 or 0 (present or not)
    - For Fuel Type, this gives three columns: Petrol, Diesel, and CNG
    - Used when categories don't have any natural order
    - Use this in most regression cases
- Label Encoding
    - Assigns a unique number to each category.
    - Not ideal for regression unless the variable has ordinal meaning (because the model might assume "Diesel > Petrol" just because 1 > 0)
- Ordinal Encoding
    - Used when the categories have a natural order
    - Example: Education: High School < Bachelor's < Master's < PhD
    - Works well if higher categories truly represent increasing levels of impact
- Target Encoding (advanced)
    - Replace each category with the mean of the target variable for that category
    - If the average car price is:
        - Petrol cars = ₹4,00,000
        - Diesel cars = ₹6,00,000
        - CNG cars = ₹3,50,000
    - then each Fuel Type value is replaced by the average for its category.
    - Be careful: this can cause data leakage if not used properly (compute the averages inside cross-validation folds)
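One-hot encoding of the Fuel Type column can be sketched with `pandas.get_dummies`. The four rows of car data are made up for illustration; the second call also drops one category as a baseline, which is the usual choice when the regression includes an intercept.

```python
import pandas as pd

# Hypothetical used-car rows
cars = pd.DataFrame({
    "Fuel Type": ["Petrol", "Diesel", "CNG", "Petrol"],
    "Price": [400_000, 600_000, 350_000, 420_000],
})

# One-hot encoding: one 0/1 column per category
one_hot = pd.get_dummies(cars, columns=["Fuel Type"], dtype=int)

# Same encoding with the first category (CNG) dropped as the baseline
one_hot_baseline = pd.get_dummies(cars, columns=["Fuel Type"], drop_first=True, dtype=int)

print(one_hot)
print(one_hot_baseline)
```

With the baseline dropped, the coefficient on each remaining column is read as the price difference relative to CNG cars.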

**14. What is the role of interaction terms in Multiple Linear Regression?**

- Suppose we're building a model to predict sales based on
    - TV advertising budget (X‚ÇÅ)
    - Online advertising budget (X‚ÇÇ)

- We might think both independently affect sales: more ads = more sales.
- But what if sales increase more than expected when both TV and online ads are used together?
- That's where interaction comes in: the effect of one variable depends on another.

- An interaction term in multiple linear regression captures the idea that the combined effect of two variables is different from the sum of their individual effects.

Role of Interaction Terms in Regression:
- Capture relationships that aren't additive
    - Sometimes, one feature amplifies or weakens another's effect.

- Improve model accuracy
    - You catch patterns that would be missed with only individual terms.

- Make interpretation richer
    - Helps in understanding how two variables work together to impact the outcome.

- Useful for decision making
    - e.g., Don't spend only on online ads if they boost sales mainly when paired with TV ads.

- Common in real-world fields
    - Marketing: TV × Online
    - Medicine: Drug dosage × Age
    - Education: Study hours × Sleep
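In practice an interaction term is just one more column in the design matrix: the product of the two predictors. The sketch below assumes invented advertising budgets and a hypothetical synergy coefficient of 0.05, then recovers it with ordinary least squares.

```python
import numpy as np

rng = np.random.default_rng(1)
tv = rng.uniform(0, 100, 60)       # TV budget (hypothetical units)
online = rng.uniform(0, 50, 60)    # online budget (hypothetical units)

# Hypothetical sales with a synergy (interaction) effect and no noise
sales = 10 + 2.0 * tv + 3.0 * online + 0.05 * tv * online

# Columns: intercept, TV, Online, and the TV x Online interaction
X = np.column_stack([np.ones_like(tv), tv, online, tv * online])
coef, *_ = np.linalg.lstsq(X, sales, rcond=None)

print(dict(zip(["intercept", "tv", "online", "tv_x_online"], coef.round(3))))
```

With the interaction column included, the effect of one more unit of TV spend is 2.0 + 0.05 × Online, so it depends on the online budget rather than being a single fixed number.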

**15. How can the interpretation of intercept differ between Simple and Multiple Linear Regression?**

- Suppose we're building a model to predict monthly salary (Y) based on:
    - X₁: Years of experience
    - X₂: Education level (e.g., High School, Bachelor's, Master's, encoded as 1, 2, 3)
- In Simple Linear Regression:
    - We only use 1 predictor, say Years of experience.
    - Salary = b0 + b1 × (Experience)
    - Intercept (b0):
        -  Meaning: Salary when experience = 0
        -  Example: If b0 = ₹15,000, it means someone with 0 years of experience is expected to earn ₹15,000/month.
        -  In simple linear regression, the intercept is easy to interpret because there's only one variable.
- In Multiple Linear Regression:
    - We use multiple predictors, e.g.:
        - Salary = b0 + b1 × (Experience) + b2 × (Education Level)
    -  Intercept (b0):
        - Meaning: Salary when Experience = 0 AND Education Level = 0
        - But in real life, education level = 0 might not make sense (it isn't even one of the encoded values 1, 2, 3)
        - In multiple regression, the intercept is harder to interpret: it represents the outcome when all predictors are zero, which might be unrealistic or meaningless.
- Difference:
    - In simple linear regression, the intercept shows the outcome at X = 0
    - In multiple linear regression, it shows the outcome when all predictors are 0, which might not always make sense in real life.

**16.  What is the significance of the slope in regression analysis, and how does it affect predictions?**

- Suppose we're building a simple linear regression model to predict a student's exam score based on hours studied:
- Score = b0 + b1 × (Study Hours)
- Let's say the model gives - Score = 40 + 6 × (Study Hours)
- The slope (b1) is 6

Significance of the Slope:
- The slope tells us that for every 1-hour increase in study time, the predicted exam score increases by 6 points.
- 2 Hours - 40 + 6 × 2 = 52
- 4 Hours - 40 + 6 × 4 = 64

How it affects predictions:
- The steeper the slope, the greater the impact of X on Y.
- If the slope is 0 → X has no effect on Y.
- If the slope is large → A small change in X leads to a big change in Y.

The slope in regression analysis represents the rate of change of the dependent variable (Y) for each unit increase in the independent variable (X), directly affecting how predictions are made.

**17. How does the intercept in a regression model provide context for the relationship between variables?**

- Suppose we're building a simple linear regression model to predict a person's monthly salary (Y) based on their years of experience (X).

- Our model gives: Salary = 20,000 + 5,000 × (Experience)
- Interpretation of the Intercept:
    - The intercept here is ₹20,000.
    - That means: When experience = 0, the predicted salary is ₹20,000.
    - It represents the starting point: what someone with zero experience is expected to earn.
- The intercept in a regression model provides the baseline value of the outcome when input variables are zero, helping us understand where the model starts and giving context to how the independent variables affect the dependent one.

**18.  What are the limitations of using R² as a sole measure of model performance?**

Suppose we're building a model to predict house prices based on:
- Size of the house
- Number of rooms
- Distance from city center
- Color of the front door

After training the model, we find R² = 0.95, which means 95% of the variation in house prices is explained by our model. But here's the catch:

- That "front door color" is a random feature with no actual impact on price.
- Because training R² never decreases when a variable is added, including it can only push R² upward.
- So our model looks good (high R²), but it may perform poorly on new data or make bad predictions.

Key Limitations of Using R² Alone:

- High R² doesn't mean accurate predictions - the model can fit the training data well but still make poor predictions on new data.
- R² increases with more variables, even irrelevant ones - leading to overfitting.
- R² doesn't show model bias - our model might consistently overestimate or underestimate, but R² won't show that.
- Not suitable for non-linear relationships - R² assumes linear fit and may mislead for complex patterns.
- No information about error size - R² doesn't tell us how far off our predictions are (use RMSE or MAE for that).
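The "R² goes up even for irrelevant variables" point can be checked with a short experiment. The sketch below uses invented house data and a random "door color" code; `r_squared` is an illustrative helper, not a library function.

```python
import numpy as np

def r_squared(X, y):
    """Training R^2 of an ordinary least-squares fit with an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1 - (resid ** 2).sum() / ((y - y.mean()) ** 2).sum()

rng = np.random.default_rng(42)
n = 30
size = rng.uniform(500, 3000, n)
price = 100 * size + rng.normal(0, 20_000, n)   # hypothetical prices with noise
door_color = rng.integers(0, 5, n)              # random code, unrelated to price

r2_base = r_squared(size.reshape(-1, 1), price)
r2_with_noise = r_squared(np.column_stack([size, door_color]), price)

print(f"R^2 with size only:         {r2_base:.4f}")
print(f"R^2 with size + door color: {r2_with_noise:.4f}")
```

The second R² cannot be lower than the first, because least squares can always set the new coefficient to zero; that is why a rising training R² is not evidence that a feature is useful.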

**19. How would you interpret a large standard error for a regression coefficient?**

Imagine we're analyzing how TV ad spending affects product sales using regression.

We build a regression model like:

In [None]:
Sales = β₀ + β₁ × TV_Ad_Spending + error

Suppose the result shows:
- β₁ (coefficient for TV ads) = 5
- Standard Error of β₁ = 4.8
- This means the estimate of the effect of TV ads on sales has high uncertainty. The true effect could be much lower or higher than 5, maybe even zero.
- So, even though we found a coefficient, the large standard error says: "I'm not confident about this number."

Key Interpretations of a Large Standard Error:
- Unreliable Coefficient Estimate - A large standard error means the model isn't sure about the true value of the coefficient.
- Low Precision - Our predictor (e.g., TV ads) might not have a consistent effect on the outcome, leading to variability in estimates.
- Possibly Insignificant Predictor - The variable may not be strongly related to the output; its effect might be due to chance.
- Could Indicate Multicollinearity - If two or more predictors are highly correlated, standard errors can inflate.
- Wider Confidence Interval - A large standard error results in a wide confidence interval, meaning the true value is uncertain and possibly includes zero.
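The numbers from the TV-ad example can be turned into a t-statistic and a rough confidence interval. This sketch assumes the normal critical value 1.96 for a 95% interval; the exact value depends on the residual degrees of freedom, which the example does not give.

```python
beta_1 = 5.0      # estimated coefficient for TV ads
std_error = 4.8   # its standard error

# t-statistic: how many standard errors the estimate is away from zero
t_stat = beta_1 / std_error

# Approximate 95% interval using the normal critical value 1.96 (an assumption)
ci_low = beta_1 - 1.96 * std_error
ci_high = beta_1 + 1.96 * std_error

print(f"t = {t_stat:.2f}, approximate 95% CI = ({ci_low:.2f}, {ci_high:.2f})")
```

The estimate is only about one standard error away from zero, and the interval runs from roughly −4.4 to 14.4, so a zero or even negative effect of TV ads is consistent with this result.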

**20.  How can heteroscedasticity be identified in residual plots, and why is it important to address it?**

Suppose we're analyzing how study hours affect exam scores using a regression model.
- After plotting the residuals vs predicted scores, we notice:
- For students who studied less, the prediction errors (residuals) are small and close together.
- For students who studied a lot, the errors are widely spread: some very positive, some very negative.
- This pattern creates a fan shape in the residual plot: a small spread that widens as the predicted score increases.
- That's heteroscedasticity: the variance of residuals is not constant.

Heteroscedasticity:
- How to Identify It - In a residual vs predicted plot, it appears as a funnel/fan shape (spread of residuals increases or decreases).
- Why It Matters - It violates a key regression assumption: constant variance of errors (homoscedasticity).
- Effect on Inference - It makes standard errors of coefficients unreliable, which can lead to incorrect p-values and confidence intervals.
- Can Mask True Relationships - It may hide or distort the true effect of predictors on the outcome.
- How to Address It - Use techniques like transforming variables (e.g., log scale), or apply robust standard errors or models like weighted least squares.
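A fan shape can also be detected numerically by comparing the residual spread for low and high predictions. The sketch below simulates hypothetical study-hours data whose noise grows with hours; it is a crude check, not a formal test such as Breusch-Pagan.

```python
import numpy as np

rng = np.random.default_rng(7)
hours = rng.uniform(1, 10, 200)
# Hypothetical scores whose noise grows in proportion to study hours
scores = 40 + 5 * hours + rng.normal(0, 1, 200) * hours

m, c = np.polyfit(hours, scores, deg=1)
predicted = m * hours + c
residuals = scores - predicted

# Sort by predicted value and compare the residual spread in each half
order = np.argsort(predicted)
spread_low = residuals[order[:100]].std()
spread_high = residuals[order[100:]].std()

print(f"residual std (low predictions):  {spread_low:.2f}")
print(f"residual std (high predictions): {spread_high:.2f}")
```

A clearly larger spread in the upper half is the numeric counterpart of the funnel seen when plotting `residuals` against `predicted`, for example with `plt.scatter(predicted, residuals)`.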

**21.  What does it mean if a Multiple Linear Regression model has a high R² but low adjusted R²?**

Suppose we're building a regression model to predict car price using these variables:
- Age of the car
- Mileage
- Engine size
- Number of cup holders
- Color of floor mats
- Shape of gear knob

After fitting the model:
- R² = 0.92 (seems excellent!)
- But Adjusted R² = 0.45

- R² never decreases when we add more variables, even useless ones like "shape of gear knob."
- Adjusted R² penalizes the model for adding unnecessary variables that don't improve prediction.

So, our model looks great based on R², but Adjusted R² is warning: "we've added irrelevant junk just to boost R²."


High R² but Low Adjusted R²:
- R² can be misleading - It never decreases with more predictors, even if they add no real value.
- Adjusted R² accounts for model complexity - It adjusts R² down if added variables don't truly improve the model.
- Large gap suggests overfitting - You may be adding variables that fit noise rather than signal.
- Low Adjusted R² = low real explanatory power - It means the model isn't actually as good as R² claims.
- Fix by removing irrelevant predictors - Use feature selection to keep only meaningful variables.
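The penalty can be made concrete with the standard formula, Adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1), where n is the number of observations and p the number of predictors. The sample sizes below are assumptions chosen for illustration, since the example does not state how many cars were used.

```python
def adjusted_r2(r2, n_samples, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_predictors - 1)

# R^2 = 0.92 with 6 predictors; both sample sizes are hypothetical
small_sample = adjusted_r2(0.92, n_samples=8, n_predictors=6)
large_sample = adjusted_r2(0.92, n_samples=200, n_predictors=6)

print(f"n = 8:   adjusted R^2 = {small_sample:.2f}")
print(f"n = 200: adjusted R^2 = {large_sample:.2f}")
```

With only 8 observations the adjusted value falls to about 0.44, close to the 0.45 in the example, while with 200 observations it barely moves. A gap this large therefore signals many predictors relative to the amount of data.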

**22. Why is it important to scale variables in Multiple Linear Regression?**

Suppose we're building a regression model to predict salary using:
- Years of experience (ranges from 1 to 20)
- Age (ranges from 22 to 60)
- Monthly sales revenue (ranges from ₹10,000 to ₹10,00,000)
- These variables have very different scales.
- If we don't scale them:
    - Coefficient sizes reflect units rather than importance: the coefficient on monthly revenue will be tiny simply because its values are large.
    - The coefficients become harder to interpret or compare.
    - It can also confuse optimization algorithms, especially when regularization is used.


Importance of Scaling:
- Avoids One Variable Dominating - With regularization or gradient-based fitting, large-scale variables can overpower smaller-scale ones. (Plain least squares gives the same predictions with or without scaling.)
- Improves Interpretability - Scaling puts all features on a similar scale, making coefficients more comparable.
- Necessary for Regularization (e.g., Ridge/Lasso) - Without scaling, the penalty treats features unevenly, because each coefficient's size depends on its feature's units.
- Stabilizes Gradient Descent - In large datasets, scaling helps optimization converge faster and more reliably.
- Prevents Numerical Errors - Very large or very small values can cause computational issues or inaccurate results.
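A minimal sketch of standardization (z-score scaling) with NumPy, using four made-up rows for the three predictors above. Each column ends up with mean 0 and standard deviation 1.

```python
import numpy as np

# Hypothetical rows: [years of experience, age, monthly sales revenue]
X = np.array([
    [1, 22, 10_000],
    [5, 30, 150_000],
    [10, 41, 400_000],
    [20, 60, 1_000_000],
], dtype=float)

# Standardization: subtract each column's mean, divide by its standard deviation
mean = X.mean(axis=0)
std = X.std(axis=0)
X_scaled = (X - mean) / std

print(X_scaled.round(2))
```

When a train/test split is used, the mean and standard deviation should be computed on the training rows only and then reused for the test rows, so that no information leaks from the test set.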

**23. What is polynomial regression?**

Imagine we're analyzing how a car's speed affects its fuel efficiency (km/l).
- At low speeds, fuel efficiency increases.
- At medium speeds, it's highest.
- At high speeds, it drops again due to air resistance.
- This forms a curve, not a straight line.
- A Linear Regression would try to fit a straight line and miss the actual pattern.

Instead, Polynomial Regression fits a curved line like:

In [128]:
y = β0 + β1x + β2x**2 + ... + βnx**n

That's how we capture the true, non-linear relationship.

- Models Non-Linear Relationships - It fits a curve (instead of a straight line) to handle complex patterns in data.
- Extends Linear Regression - It adds higher-degree terms (like x², x³) to the model but is still linear in its coefficients.
- More Flexible Fit - Can capture U-shapes, curves, and waves that linear models can't.
- Risk of Overfitting - Higher-degree polynomials may fit training data too well but perform poorly on new data.
- Used in Real-Life Scenarios - Ideal for patterns like growth curves, pricing trends, or physical behaviors (e.g., speed vs fuel efficiency).
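The speed vs fuel-efficiency idea can be sketched with `np.polyfit`. The inverted-U data below is hypothetical (a noise-free curve peaking at 60 km/h), chosen so the difference between a straight line and a quadratic is easy to see.

```python
import numpy as np

speed = np.linspace(20, 120, 21)                 # km/h
# Hypothetical inverted-U curve peaking at 60 km/h, no noise
efficiency = 20 - 0.005 * (speed - 60) ** 2      # km/l

line = np.polyfit(speed, efficiency, deg=1)      # straight line: [b1, b0]
curve = np.polyfit(speed, efficiency, deg=2)     # quadratic: [b2, b1, b0]

line_sse = ((efficiency - np.polyval(line, speed)) ** 2).sum()
curve_sse = ((efficiency - np.polyval(curve, speed)) ** 2).sum()

print("quadratic coefficients (b2, b1, b0):", curve.round(4))
print(f"SSE linear = {line_sse:.2f}, SSE quadratic = {curve_sse:.2e}")
```

The quadratic recovers the generating curve (b2 = −0.005, b1 = 0.6, b0 = 2), while the straight line leaves a large squared error because it cannot bend.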

**24. How does polynomial regression differ from linear regression?**

Polynomial vs Linear Regression:
- Model Shape
    - Linear Regression fits a straight line (1st-degree).
    - Polynomial Regression fits a curve (2nd-degree or higher).
- Type of Relationship Captured
    - Linear: Captures only linear (straight-line) relationships.
    - Polynomial: Captures non-linear relationships between variables.
- Features Used
    - Linear: Uses original features (e.g., x) only.
    - Polynomial: Uses transformed features (e.g., x², x³) in addition to x.
- Model Complexity
    - Linear: Simpler, easier to interpret.
    - Polynomial: More complex, may overfit if degree is too high.
- Equation Form -  

![image.png](attachment:2ce787ef-529a-4343-b126-bc9b8b6f35b5.png)

**25. When is polynomial regression used?**

Polynomial Regression is Used When:
- The Relationship is Non-Linear
    - When a straight line doesn't fit the data well, but the curve still follows a smooth, predictable pattern.
- There's a Clear Curve in the Data
    - For example, U-shape or inverted U-shape trends (like fuel efficiency vs speed).
- Residuals Show a Pattern
    - After linear regression, if residuals aren't randomly scattered (e.g., they form a curve), it's a sign to try polynomial regression.
- Need for Better Fit without Complex Models
    - When you want a more flexible model than linear regression but simpler than advanced ML models (like trees or neural networks).
- Modeling Physical or Natural Phenomena
    - Useful in real-world cases like projectile motion, growth rates, or economic trends that naturally follow curved paths.

**26.  What is the general equation for polynomial regression?**

The general equation for Polynomial Regression of degree n is:

![image.png](attachment:9d5a52db-239a-4653-a6f3-e9bc8cb22442.png)

![image.png](attachment:f9f79e77-e113-4d56-98f6-f79ab1913363.png)

**27. Can polynomial regression be applied to multiple variables?**

Ans - Yes

- Polynomial regression can be extended to multiple variables, allowing the model to learn non-linear relationships between two or more predictors and the target variable.

- It includes higher-degree terms (e.g., squares, cubes) and interaction terms (e.g., product of two variables) for capturing complex dependencies.

- The model remains linear in terms of coefficients, even though it fits a non-linear surface in the feature space.

- The number of features increases rapidly with the polynomial degree and number of variables, making the model more complex.

- Scaling and regularization are often necessary to ensure stable training and to reduce the risk of overfitting.
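How quickly the feature count grows can be made concrete. A full polynomial expansion of degree d in n variables has C(n + d, d) terms, counting the constant; the helper name below is illustrative.

```python
from math import comb

def n_polynomial_terms(n_features, degree):
    """Number of terms (including the constant) in a full polynomial expansion."""
    return comb(n_features + degree, degree)

for n_features in (2, 5, 10):
    counts = [n_polynomial_terms(n_features, d) for d in (1, 2, 3, 4)]
    print(f"{n_features} features, degrees 1-4: {counts}")
```

For two variables at degree 2 this gives 6 terms (1, x₁, x₂, x₁², x₁x₂, x₂²), but ten variables at degree 4 already produce 1001, which is why regularization and careful degree selection matter.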

**28. What are the limitations of polynomial regression?**

- Overfitting - High-degree polynomials can model noise in the data, reducing generalization to new/unseen data.

- Increased Model Complexity - As the polynomial degree increases, the number of terms grows rapidly, making the model harder to interpret and computationally heavy.

- Extrapolation is Unreliable- Predictions outside the range of the training data can become extremely inaccurate due to sharp curve behavior.

- Sensitive to Outliers - Outliers can significantly distort the shape of the polynomial curve, affecting the overall model performance.

- Multicollinearity Risk - Polynomial features (like x, x², x³) can be highly correlated, which may lead to unstable coefficient estimates.

**29. What methods can be used to evaluate model fit when selecting the degree of a polynomial?**

- Cross-Validation (e.g., K-Fold Cross-Validation) - Helps assess how well the model generalizes to unseen data and prevents overfitting.

- Mean Squared Error (MSE) or Root Mean Squared Error (RMSE) - MSE measures the average squared difference between predicted and actual values, and RMSE is its square root, in the target's units; lower values indicate better fit.

- Adjusted R² - Adjusts R² for the number of predictors; helps penalize unnecessary complexity from high-degree polynomials.

- Validation Curve Analysis - Plotting model performance (e.g., MSE) across different polynomial degrees to identify the optimal complexity.

- Regularization Techniques (e.g., Ridge, Lasso) - Used in combination with polynomial features to control overfitting while still modeling non-linearity.
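Cross-validation across degrees can be sketched with plain NumPy. The data is a hypothetical noisy quadratic and `cv_mse` is an illustrative helper implementing a simple k-fold loop; in practice scikit-learn's `cross_val_score` does the same job.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(-3, 3, 60)
y = 1 + 2 * x - 1.5 * x ** 2 + rng.normal(0, 0.5, 60)   # hypothetical quadratic data

def cv_mse(x, y, degree, k=5):
    """Average validation MSE of a degree-`degree` polynomial over k folds."""
    folds = np.array_split(np.arange(len(x)), k)
    errors = []
    for val_idx in folds:
        train_mask = np.ones(len(x), dtype=bool)
        train_mask[val_idx] = False          # hold this fold out
        coef = np.polyfit(x[train_mask], y[train_mask], deg=degree)
        pred = np.polyval(coef, x[val_idx])
        errors.append(((y[val_idx] - pred) ** 2).mean())
    return float(np.mean(errors))

scores = {d: cv_mse(x, y, d) for d in range(1, 7)}
best_degree = min(scores, key=scores.get)

for d, mse in scores.items():
    print(f"degree {d}: CV MSE = {mse:.3f}")
print("lowest CV MSE at degree", best_degree)
```

The validation error typically drops sharply from degree 1 to degree 2 and then flattens or rises, and that elbow is the signal used to pick the degree.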

**30. Why is visualization important in polynomial regression?**

- Reveals Non-Linear Relationships - Helps visually confirm whether a curved (non-linear) pattern exists between the features and the target variable.

- Detects Overfitting or Underfitting - Visual plots can show if the model is too wiggly (overfitting) or too flat (underfitting), which numerical metrics alone may not reveal clearly.

- Improves Model Interpretation - Visualization aids in understanding how the model behaves across the input range and how each term contributes to the overall fit.

- Assists in Degree Selection - By plotting polynomial curves of different degrees, one can choose the best balance between bias and variance.

- Validates Residual Patterns - Visualizing residuals helps detect issues like non-constant variance or patterns that the model has failed to capture.

**31. How is polynomial regression implemented in Python?**

- Import Required Libraries - Use libraries like numpy, pandas, matplotlib for data handling and visualization, and scikit-learn for modeling.

- Prepare the Data - Split your dataset into input features (X) and target variable (y), then optionally divide into training and testing sets.

- Generate Polynomial Features - Use PolynomialFeatures from sklearn.preprocessing to create polynomial and interaction terms based on the desired degree.

- Fit the Model - Use LinearRegression from sklearn.linear_model to fit the model on the transformed polynomial features.

- Evaluate the Model - Use metrics like R², MSE, or cross_val_score to assess performance and detect overfitting.
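The steps above can be sketched end to end with scikit-learn. The speed vs fuel-efficiency data is simulated for illustration, and degree 2 is an assumption that suits this particular curve.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical speed vs fuel-efficiency data with a little noise
rng = np.random.default_rng(0)
speed = rng.uniform(20, 120, 100).reshape(-1, 1)
efficiency = 20 - 0.005 * (speed.ravel() - 60) ** 2 + rng.normal(0, 0.5, 100)

X_train, X_test, y_train, y_test = train_test_split(
    speed, efficiency, test_size=0.25, random_state=0
)

# PolynomialFeatures expands x into [1, x, x^2]; LinearRegression fits the coefficients
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
test_r2 = r2_score(y_test, y_pred)
test_mse = mean_squared_error(y_test, y_pred)
print(f"test R^2 = {test_r2:.3f}, test MSE = {test_mse:.3f}")
```

Wrapping both steps in a pipeline keeps the feature expansion and the regression together, so the same transformation is applied automatically at prediction time.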