In [1]:
# Q1: Simple Linear Regression vs Multiple Linear Regression

**Simple Linear Regression:**
Simple linear regression involves one independent variable and one dependent variable. It aims to find the best-fit line that predicts the dependent variable based on the independent variable.
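
As a sketch of what "best-fit" means here: under ordinary least squares, the line is chosen to minimize the sum of squared residuals. The symbols \( \beta_0 \), \( \beta_1 \), \( \epsilon \) and the sample means \( \bar{x} \), \( \bar{y} \) are notation assumed for illustration.

\[ y = \beta_0 + \beta_1 x + \epsilon \]

\[ \hat{\beta}_1 = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \]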

*Example:*
Predicting a person's weight based on their height.

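```python
# Example of Simple Linear Regression
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
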
heights = np.array([150, 160, 170, 180, 190]).reshape(-1, 1)  # Heights in cm
weights = np.array([50, 60, 65, 70, 80])  # Weights in kg

# Create a linear regression model
simple_lr = LinearRegression()
simple_lr.fit(heights, weights)

# Predict weights based on heights
predicted_weights = simple_lr.predict(heights)

# Plot the results
plt.scatter(heights, weights, color='blue', label='Actual weights')
plt.plot(heights, predicted_weights, color='red', label='Predicted weights')
plt.xlabel('Height (cm)')
plt.ylabel('Weight (kg)')
plt.title('Simple Linear Regression')
plt.legend()
plt.show()
```

In [2]:
# Example of Multiple Linear Regression
import pandas as pd
from sklearn.linear_model import LinearRegression

data = pd.DataFrame({
    'Height': [150, 160, 170, 180, 190],
    'Age': [22, 25, 30, 35, 40],
    'Weight': [50, 60, 65, 70, 80]
})

# Independent variables (Height and Age)
X = data[['Height', 'Age']]
# Dependent variable (Weight)
y = data['Weight']

# Create a multiple linear regression model
multiple_lr = LinearRegression()
multiple_lr.fit(X, y)

# Predict weights based on height and age
predicted_weights = multiple_lr.predict(X)

# Display the results
data['Predicted_Weight'] = predicted_weights
print(data)

In [None]:

### Q4: Explain the concept of gradient descent. How is it used in machine learning?

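# Q4: Gradient Descent

**Gradient Descent** is an iterative optimization algorithm that minimizes a cost function by repeatedly stepping in the direction of steepest descent. In machine learning it is used to fit model parameters, for example the coefficients of a linear regression.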

*Concept:*
1. **Initialization**: Start with initial values for the parameters (e.g., zeros or small random values for the coefficients in linear regression).
2. **Compute Gradient**: Calculate the gradient of the cost function with respect to each parameter.
3. **Update Parameters**: Adjust the parameters in the opposite direction of the gradient by a certain step size (learning rate).
4. **Iterate**: Repeat steps 2 and 3 until the cost function converges to a minimum value.
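
For linear regression with a mean squared error cost, steps 2 and 3 can be written out as below. This is a sketch under assumed notation: \( X_b \) is the feature matrix with a leading column of ones, \( \theta \) the parameter vector, \( m \) the number of samples, and \( \alpha \) the learning rate.

\[ J(\theta) = \frac{1}{m} \lVert X_b \theta - y \rVert^2 \]

\[ \nabla_\theta J(\theta) = \frac{2}{m} X_b^\top (X_b \theta - y) \]

\[ \theta \leftarrow \theta - \alpha \, \nabla_\theta J(\theta) \]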

*Usage in Machine Learning:*
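Gradient descent is used to find parameter values that minimize the cost function on the training data, which improves the model's fit. The learning rate controls the step size: if it is too small, convergence is slow; if it is too large, the updates can overshoot the minimum and diverge.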
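
The effect of the step size in step 3 can be seen on a one-dimensional toy problem. The function `minimize_quadratic` and the cost \( f(w) = (w - 3)^2 \) are hypothetical, chosen only because the minimum at \( w = 3 \) is known in advance.

```python
def minimize_quadratic(lr, steps=50, w0=0.0):
    """Run gradient descent on f(w) = (w - 3)**2 and return the final w."""
    w = w0
    for _ in range(steps):
        grad = 2.0 * (w - 3.0)  # derivative of (w - 3)^2
        w = w - lr * grad       # step against the gradient
    return w

w_small = minimize_quadratic(lr=0.1)  # each step shrinks the distance to 3
w_large = minimize_quadratic(lr=1.1)  # each step overshoots further, so it diverges
print(w_small, w_large)
```

With `lr=0.1` the distance to the minimum is multiplied by 0.8 on every step, while with `lr=1.1` it is multiplied by -1.2, so the iterates move away from the minimum.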

```python
# Example of Gradient Descent for a simple linear regression problem
import numpy as np

def gradient_descent(X, y, lr=0.01, iterations=1000):
    m = len(y)
    y = np.asarray(y).ravel()        # Flatten y to shape (m,) so it matches X_b.dot(theta)
    theta = np.zeros(2)              # [intercept, slope], initialized to zero
    X_b = np.c_[np.ones((m, 1)), X]  # Add x0 = 1 to each instance

    for iteration in range(iterations):
        # Gradient of the mean squared error with respect to theta
        gradients = 2/m * X_b.T.dot(X_b.dot(theta) - y)
        theta = theta - lr * gradients

    return theta

# Generate some data
np.random.seed(0)
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)

# Perform gradient descent
theta = gradient_descent(X, y)
print(f"Intercept: {theta[0]}")
print(f"Slope: {theta[1]}")
```

In [None]:

### Q5: Describe the multiple linear regression model. How does it differ from simple linear regression?

# Q5: Multiple Linear Regression Model

**Multiple Linear Regression** extends simple linear regression by using two or more independent variables to predict a single dependent variable.

*Model:*
\[ y = \beta_0 + \beta_1x_1 + \beta_2x_2 + \ldots + \beta_nx_n + \epsilon \]

*Difference from Simple Linear Regression:*
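- **Simple Linear Regression:** Uses one independent variable, so the fitted model is a line.
- **Multiple Linear Regression:** Uses two or more independent variables, so the fitted model is a plane or hyperplane. Each coefficient \( \beta_j \) describes the expected change in \( y \) per unit change in \( x_j \) with the other variables held fixed.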
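
To make the model equation concrete, the sketch below fits it by least squares with plain NumPy. It assumes the same height, age, and weight values used in this notebook's example; the names `A`, `beta`, and `residuals` are illustrative.

```python
import numpy as np

heights = np.array([150, 160, 170, 180, 190], dtype=float)
ages = np.array([22, 25, 30, 35, 40], dtype=float)
weights = np.array([50, 60, 65, 70, 80], dtype=float)

# Design matrix: a column of ones for beta_0, then one column per predictor
A = np.column_stack([np.ones_like(heights), heights, ages])

# Least-squares solution of A @ beta ~ weights
beta, *_ = np.linalg.lstsq(A, weights, rcond=None)
residuals = weights - A @ beta

print("beta_0, beta_1, beta_2:", beta)
```

Dropping the `ages` column from `A` reduces this to simple linear regression, which is the only structural difference between the two models.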

*Example:*
Predicting a person's weight based on their height and age.

```python
# Example of Multiple Linear Regression
import pandas as pd
from sklearn.linear_model import LinearRegression

data = pd.DataFrame({
    'Height': [150, 160, 170, 180, 190],
    'Age': [22, 25, 30, 35, 40],
    'Weight': [50, 60, 65, 70, 80]
})

# Independent variables (Height and Age)
X = data[['Height', 'Age']]
# Dependent variable (Weight)
y = data['Weight']

# Create a multiple linear regression model
multiple_lr = LinearRegression()
multiple_lr.fit(X, y)

# Intercept and Coefficients
intercept = multiple_lr.intercept_
coefficients = multiple_lr.coef_

print(f"Intercept: {intercept}")
print(f"Coefficients: {coefficients}")
```


In [None]:

### Q6: Explain the concept of multicollinearity in multiple linear regression. How can you detect and address this issue?

# Q6: Multicollinearity in Multiple Linear Regression

**Multicollinearity** occurs when two or more independent variables in a multiple regression model are highly correlated. This inflates the variance of the coefficient estimates and makes it difficult to determine the individual effect of each independent variable on the dependent variable.

*Detection:*
1. **Correlation Matrix:** Check the correlation coefficients between pairs of independent variables.
2. **Variance Inflation Factor (VIF):** Calculate the VIF for each independent variable. As a rule of thumb, a VIF above 5 (or, more leniently, above 10) is taken to indicate high multicollinearity.
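
Both detection methods can be sketched by hand. The block below assumes synthetic data in which the second column is nearly a copy of the first, and computes the VIF from its definition, \( \text{VIF}_j = 1 / (1 - R_j^2) \), where \( R_j^2 \) comes from regressing predictor \( j \) on the other predictors. The helper `vif` is a hypothetical name, not a library function.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.random((200, 3))
X[:, 1] = X[:, 0] + rng.normal(scale=0.1, size=200)  # X2 is nearly a copy of X1

# 1. Pairwise correlations between predictors (columns are variables)
corr = np.corrcoef(X, rowvar=False)

# 2. VIF from its definition: regress column j on the remaining columns
def vif(X, j):
    others = np.delete(X, j, axis=1)
    r2 = LinearRegression().fit(others, X[:, j]).score(others, X[:, j])
    return 1.0 / (1.0 - r2)

vifs = [vif(X, j) for j in range(X.shape[1])]
print(np.round(corr, 2))
print(np.round(vifs, 2))
```

The two correlated columns should show a large off-diagonal correlation and a high VIF, while the independent third column should have a VIF close to 1.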

*Addressing Multicollinearity:*
1. **Remove Highly Correlated Variables:** Drop one of the correlated variables.
2. **Combine Variables:** Combine correlated variables into a single predictor.
3. **Principal Component Analysis (PCA):** Use PCA to transform correlated variables into a set of uncorrelated components.
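
As a minimal sketch of the PCA option, the block below transforms two correlated predictors into principal components and checks their correlation before and after. The data is synthetic and the variable names are illustrative.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
x1 = rng.random(200)
x2 = x1 + rng.normal(scale=0.1, size=200)  # strongly correlated with x1
X = np.column_stack([x1, x2])

# Project the predictors onto their principal components
pca = PCA(n_components=2)
components = pca.fit_transform(X)

corr_before = np.corrcoef(X, rowvar=False)[0, 1]
corr_after = np.corrcoef(components, rowvar=False)[0, 1]
print(f"Correlation before PCA: {corr_before:.3f}")
print(f"Correlation after PCA:  {corr_after:.3e}")
print("Explained variance ratio:", pca.explained_variance_ratio_)
```

The trade-off is interpretability: the components are uncorrelated, but each one is a mix of the original variables, so a coefficient on a component no longer maps to a single predictor.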

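```python
# Example: Detecting multicollinearity using VIF
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Create a sample dataset
np.random.seed(0)
X = np.random.rand(100, 3)
X[:, 1] = X[:, 0] + np.random.normal(scale=0.1, size=100)  # Create high correlation
y = 3*X[:, 0] + 5*X[:, 1] + np.random.randn(100)  # Target (not needed to compute VIF)

# variance_inflation_factor expects a design matrix that includes an intercept column
X_const = sm.add_constant(X)

# Calculate VIF for each independent variable (column 0 is the constant, so skip it)
vif_data = pd.DataFrame()
vif_data["feature"] = ['X1', 'X2', 'X3']
vif_data["VIF"] = [variance_inflation_factor(X_const, i) for i in range(1, X_const.shape[1])]
print(vif_data)
```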


In [None]:

### Q7: Describe the polynomial regression model. How is it different from linear regression?

# Q7: Polynomial Regression Model

**Polynomial Regression** is a type of regression analysis where the relationship between the independent variable \( x \) and the dependent variable \( y \) is modeled as an \( n \)-th degree polynomial.

*Model:*
\[ y = \beta_0 + \beta_1x + \beta_2x^2 + \ldots + \beta_nx^n + \epsilon \]

*Difference from Linear Regression:*
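- **Linear Regression:** Models a straight-line relationship between \( x \) and \( y \).
- **Polynomial Regression:** Models a nonlinear relationship between \( x \) and \( y \) using polynomial terms. The model is still linear in the coefficients \( \beta_i \), so it can be fit with ordinary linear regression on the transformed features \( x, x^2, \ldots, x^n \).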
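
The point that polynomial regression is ordinary least squares on transformed features can be checked on a noise-free quadratic. The coefficients 4, 3, and 1 are assumed values chosen for this sketch.

```python
import numpy as np

x = np.linspace(0, 2, 20)
y = 4 + 3 * x + x**2  # noise-free quadratic with known coefficients

# Treat 1, x, and x^2 as three separate features of a linear model
A = np.column_stack([np.ones_like(x), x, x**2])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)

print("Recovered coefficients:", np.round(beta, 6))
```

The curve in \( x \) is nonlinear, but the fit itself is the same linear solve used for multiple linear regression, with powers of \( x \) playing the role of extra predictors.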

*Example:*
Predicting house prices based on the size of the house, where the relationship is not linear.

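```python
# Example of Polynomial Regression
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
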
# Generate some data
np.random.seed(0)
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + X**2 + np.random.randn(100, 1)

# Transform to polynomial features
poly_features = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly_features.fit_transform(X)

# Create a linear regression model
poly_reg = LinearRegression()
poly_reg.fit(X_poly, y)

# Predict values
X_new = np.linspace(0, 2, 100).reshape(100, 1)
X_new_poly = poly_features.transform(X_new)
y_new = poly_reg.predict(X_new_poly)

# Plot the results
plt.scatter(X, y, color='blue', label='Actual data')
plt.plot(X_new, y_new, color='red', label='Polynomial fit')
plt.xlabel('X')
plt.ylabel('y')
plt.title('Polynomial Regression')
plt.legend()
plt.show()
```


In [None]:

### Q8: Advantages and disadvantages of polynomial regression compared to linear regression. In what situations would you prefer to use polynomial regression?

# Q8: Advantages and Disadvantages of Polynomial Regression

**Advantages:**
1. **Flexibility:** Can model nonlinear relationships.
2. **Better Fit:** Can provide a better fit for certain datasets where the relationship is not linear.

**Disadvantages:**
1. **Overfitting:** High-degree polynomials can fit noise in the training data rather than the underlying trend.
2. **Complexity:** Each added degree adds a parameter, which makes the model harder to interpret than a linear fit.
3. **Extrapolation:** Polynomial terms grow quickly, so predictions outside the range of the training data can be far off.
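
One way to check for overfitting is to compare training error with error on held-out data. The sketch below assumes synthetic quadratic data and an arbitrary 50/50 split; the degrees 2 and 10 are illustrative choices.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = 2 * rng.random((60, 1))
y = 4 + 3 * X[:, 0] + X[:, 0] ** 2 + rng.normal(size=60)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0
)

results = {}
for degree in (2, 10):
    model = make_pipeline(
        PolynomialFeatures(degree=degree, include_bias=False),
        LinearRegression(),
    )
    model.fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    test_mse = mean_squared_error(y_test, model.predict(X_test))
    results[degree] = (train_mse, test_mse)
    print(f"degree={degree}: train MSE={train_mse:.3f}, test MSE={test_mse:.3f}")
```

The higher-degree model cannot have a larger training error, because it contains the lower-degree model as a special case. Whether its held-out error is worse depends on the data and the split, which is why the comparison has to be made on data the model has not seen.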

**When to Use Polynomial Regression:**
1. When the relationship between the independent and dependent variables is nonlinear.
2. When a simple linear model does not provide a good fit for the data.
3. When prior knowledge or exploratory data analysis suggests a polynomial relationship.

*Example:*
Use polynomial regression when modeling the growth of a plant, where the growth rate accelerates or decelerates over time.

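```python
# Example: Overfitting with high-degree polynomial
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
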
# Generate some data
np.random.seed(0)
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + X**2 + np.random.randn(100, 1)

# Transform to high-degree polynomial features
poly_features_high = PolynomialFeatures(degree=10, include_bias=False)
X_poly_high = poly_features_high.fit_transform(X)

# Create a linear regression model
poly_reg_high = LinearRegression()
poly_reg_high.fit(X_poly_high, y)

# Predict values
X_new = np.linspace(0, 2, 100).reshape(100, 1)
X_new_poly_high = poly_features_high.transform(X_new)
y_new_high = poly_reg_high.predict(X_new_poly_high)

# Plot the results
plt.scatter(X, y, color='blue', label='Actual data')
plt.plot(X_new, y_new_high, color='red', label='High-degree Polynomial fit')
plt.xlabel('X')
plt.ylabel('y')
plt.title('High-degree Polynomial Regression (Overfitting)')
plt.legend()
plt.show()
```
