### Q1. Explain the concept of R-squared in linear regression models. How is it calculated, and what does it represent?

In [1]:
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

years_experience = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]).reshape(-1, 1)  # Independent variable
salary = np.array([30000, 35000, 40000, 45000, 50000, 55000, 60000, 65000, 70000, 75000])  # Dependent variable

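# Fit an ordinary least-squares line: salary = intercept + slope * years_experience
model = LinearRegression()

model.fit(years_experience, salary)

slope = model.coef_[0]
intercept = model.intercept_

print("Slope (Coefficient):", slope)
print("Intercept:", intercept)

# Predict on the same data the model was fitted on
y_pred = model.predict(years_experience)

# The salaries lie exactly on the line 25000 + 5000 * years, so R-squared is 1.0
r_squared = r2_score(salary, y_pred)
print("R-squared:", r_squared)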


Slope (Coefficient): 4999.999999999999
Intercept: 25000.000000000004
R-squared: 1.0
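
R-squared is the share of the variance in the dependent variable that the fitted model explains: R² = 1 − SS_res / SS_tot, where SS_res is the sum of squared residuals and SS_tot is the sum of squared deviations of y from its mean. Because the salary data above is perfectly linear, the result of 1.0 hides the mechanics. The sketch below computes the formula by hand on a small made-up dataset; the `x` and `y` values are illustrative assumptions, not part of the exercise.

```python
import numpy as np

# Hypothetical toy data (not from the notebook) with some scatter around a line
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

# Ordinary least-squares line; np.polyfit returns [slope, intercept] for deg=1
slope, intercept = np.polyfit(x, y, deg=1)
y_hat = slope * x + intercept

ss_res = np.sum((y - y_hat) ** 2)       # variation the model leaves unexplained
ss_tot = np.sum((y - y.mean()) ** 2)    # total variation around the mean of y
r_squared_manual = 1 - ss_res / ss_tot

print("SS_res:", round(ss_res, 4))
print("SS_tot:", round(ss_tot, 4))
print("R-squared:", round(r_squared_manual, 4))
```

For this toy data SS_res is 2.4 and SS_tot is 6.0, so R-squared is 0.6: the line accounts for 60% of the variance in `y`.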


### Q2. Define adjusted R-squared and explain how it differs from the regular R-squared. 

In [7]:
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

years_experience = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]).reshape(-1, 1)  # Independent variable
salary = np.array([30000, 35000, 40000, 45000, 50000, 55000, 60000, 65000, 70000, 75000])  # Dependent variable

model = LinearRegression()

model.fit(years_experience, salary)

slope = model.coef_[0]
intercept = model.intercept_

print("Slope (Coefficient):", slope)
print("Intercept:", intercept)

y_pred = model.predict(years_experience)

r_squared = r2_score(salary, y_pred)
print("R-squared:", r_squared)

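n = len(years_experience)  # number of observations
k = 1  # number of predictors in the model

# Adjusted R-squared discounts R-squared for every predictor the model uses
adjusted_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)
print("Adjusted R-squared:", adjusted_r_squared)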


Slope (Coefficient): 4999.999999999999
Intercept: 25000.000000000004
R-squared: 1.0
Adjusted R-squared: 1.0


### Q3. When is it more appropriate to use adjusted R-squared?
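
Adjusted R-squared is the more appropriate measure when comparing models that use different numbers of predictors, because plain R-squared can never decrease when a predictor is added, even a useless one. The following is a minimal sketch of that behaviour using NumPy only; the helper `r2_and_adjusted` and the synthetic data (one informative feature plus five noise features) are assumptions made for illustration.

```python
import numpy as np

def r2_and_adjusted(X, y):
    """Fit OLS with an intercept and return (R-squared, adjusted R-squared)."""
    n, k = X.shape
    design = np.column_stack([np.ones(n), X])   # prepend the intercept column
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    r2 = 1 - np.sum(residuals ** 2) / np.sum((y - y.mean()) ** 2)
    adj = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    return r2, adj

# Hypothetical data: one informative feature plus five pure-noise features
rng = np.random.default_rng(0)
n = 30
x_useful = rng.normal(size=(n, 1))
x_noise = rng.normal(size=(n, 5))
y = 2.0 * x_useful[:, 0] + rng.normal(size=n)

r2_small, adj_small = r2_and_adjusted(x_useful, y)
r2_big, adj_big = r2_and_adjusted(np.hstack([x_useful, x_noise]), y)

print(f"1 feature : R2={r2_small:.3f}  adjusted R2={adj_small:.3f}")
print(f"6 features: R2={r2_big:.3f}  adjusted R2={adj_big:.3f}")
```

The six-feature model always reports an R-squared at least as high as the one-feature model. Adjusted R-squared rises only when the added predictors reduce the residual error enough to offset the degrees of freedom they consume.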

### Q4. What are RMSE, MSE, and MAE in the context of regression analysis? How are these metrics calculated, and what do they represent?
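
All three metrics summarize the prediction errors e_i = y_i − ŷ_i. MSE is the mean of the squared errors, RMSE is the square root of MSE (which puts it back in the units of the target), and MAE is the mean of the absolute errors. A short sketch of the calculations, using made-up `y_true` and `y_pred` values chosen only for illustration:

```python
import numpy as np

# Hypothetical true values and predictions, for illustration only
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

errors = y_true - y_pred

mse = np.mean(errors ** 2)       # Mean Squared Error: average squared error
rmse = np.sqrt(mse)              # Root MSE: same units as the target variable
mae = np.mean(np.abs(errors))    # Mean Absolute Error: average error magnitude

print("MSE :", mse)
print("RMSE:", round(rmse, 4))
print("MAE :", mae)
```

Here the errors are 0.5, −0.5, 0 and −1, giving an MSE of 0.375, an RMSE of about 0.6124 and an MAE of 0.5.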

### Q5. Discuss the advantages and disadvantages of using RMSE, MSE, and MAE as evaluation metrics in regression analysis.
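
The main trade-off is sensitivity to large errors. MSE and RMSE square each error, so a single large miss dominates the score; MSE is also expressed in squared units, which makes it harder to interpret than RMSE. MAE weights every error in proportion to its size, which makes it more robust to outliers, but it is not differentiable at zero. The sketch below illustrates the outlier effect with two made-up residual vectors; the helper names `mae` and `rmse` are defined here for illustration.

```python
import numpy as np

def mae(residuals):
    """Mean absolute error of a vector of residuals."""
    return np.mean(np.abs(residuals))

def rmse(residuals):
    """Root mean squared error of a vector of residuals."""
    return np.sqrt(np.mean(np.square(residuals)))

# Hypothetical residuals: identical except for one large miss
clean = np.array([1.0, -1.0, 1.0, -1.0])
with_outlier = np.array([1.0, -1.0, 1.0, -9.0])

for name, res in [("clean", clean), ("with outlier", with_outlier)]:
    print(f"{name:>12}: MAE={mae(res):.3f}  RMSE={rmse(res):.3f}")
```

With the clean residuals both metrics equal 1. The single outlier raises MAE to 3 but RMSE to about 4.583, so RMSE reacts more strongly to the same bad prediction.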

### Q6. Explain the concept of Lasso regularization. How does it differ from Ridge regularization, and when is it more appropriate to use?
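
Lasso adds an L1 penalty (the sum of absolute coefficient values) to the least-squares loss, while Ridge adds an L2 penalty (the sum of squared coefficients). The practical difference is that Lasso can shrink coefficients to exactly zero and so performs feature selection, whereas Ridge only shrinks them toward zero. That makes Lasso a natural candidate when only a few features are expected to matter. The sketch below shows this on synthetic data in which, by assumption, only the first two of ten features influence the target.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Hypothetical data: only the first 2 of 10 features influence the target
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=200)

lasso = Lasso(alpha=0.5).fit(X, y)   # L1 penalty on the coefficients
ridge = Ridge(alpha=0.5).fit(X, y)   # L2 penalty on the coefficients

# Count coefficients that are exactly zero in each fitted model
n_zero_lasso = int(np.sum(lasso.coef_ == 0))
n_zero_ridge = int(np.sum(ridge.coef_ == 0))

print("Lasso coefficients:", np.round(lasso.coef_, 2))
print("Ridge coefficients:", np.round(ridge.coef_, 2))
print("Exactly-zero coefficients -> Lasso:", n_zero_lasso, "Ridge:", n_zero_ridge)
```

Note that scikit-learn scales the two objectives differently (Lasso divides the squared-error term by twice the number of samples, Ridge does not), so the same `alpha` does not mean the same amount of regularization in both models.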

### Q7. How do regularized linear models help to prevent overfitting in machine learning? Provide an example to illustrate.

In [2]:
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

np.random.seed(42)
X = np.random.randn(100, 10)  # Features
# Labels are drawn independently of X, so there is no real signal to learn
y = np.random.randint(0, 2, 100)  # Target variable

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)


ridge = Ridge(alpha=1.0)  # alpha is the regularization strength; larger values shrink coefficients more

ridge.fit(X_train_scaled, y_train)

# Ridge is a regressor, so its continuous outputs are rounded to the nearest class label
y_train_pred = ridge.predict(X_train_scaled)
y_train_pred_rounded = np.round(y_train_pred)

training_accuracy = accuracy_score(y_train, y_train_pred_rounded)
print("Training Accuracy:", training_accuracy)

# Evaluate on held-out data; a large gap to the training score would indicate overfitting
y_test_pred = ridge.predict(X_test_scaled)
y_test_pred_rounded = np.round(y_test_pred)

test_accuracy = accuracy_score(y_test, y_test_pred_rounded)
print("Test Accuracy:", test_accuracy)


Training Accuracy: 0.6
Test Accuracy: 0.5
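
Because the labels in the cell above are random, both accuracies sit near chance and the effect of regularization is hard to see. A regression task with many features and few training samples shows it more directly. The sketch below compares unregularized least squares with Ridge on synthetic data; the data-generating process, the 50/50 split and `alpha=10.0` are all assumptions chosen for illustration, not tuned values.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Hypothetical data: 40 features, only the first 3 carry signal
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 40))
y = X[:, 0] + 0.5 * X[:, 1] - 0.5 * X[:, 2] + rng.normal(size=100)

# A small training set relative to the feature count invites overfitting
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=42
)

results = {}
for name, model in [("OLS", LinearRegression()), ("Ridge", Ridge(alpha=10.0))]:
    model.fit(X_train, y_train)
    results[name] = (
        mean_squared_error(y_train, model.predict(X_train)),  # train MSE
        mean_squared_error(y_test, model.predict(X_test)),    # test MSE
    )
    print(f"{name:>5}: train MSE={results[name][0]:.3f}  test MSE={results[name][1]:.3f}")
```

Unregularized least squares always achieves the lowest training error, since that is exactly what it minimizes. In a setup like this one would typically expect it to show a wider gap between training and test error than Ridge, although the exact numbers depend on the seed and on `alpha`.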


### Q8. Discuss the limitations of regularized linear models and explain why they may not always be the best choice for regression analysis.
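
One core limitation is that regularization controls the size of the coefficients but does not change the model's form: a regularized linear model still assumes a linear relationship between features and target. Other limitations include sensitivity to feature scaling, the need to tune the regularization strength, and the bias that shrinkage introduces into the coefficients. The sketch below illustrates the first point with a purely quadratic target; the data is made up for this purpose.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score

# Hypothetical data: a purely quadratic relationship, symmetric around zero
x = np.linspace(-3, 3, 61)
y = x ** 2

# Regularization cannot add curvature that the linear form lacks
linear_ridge = Ridge(alpha=1.0).fit(x.reshape(-1, 1), y)
r2_linear = r2_score(y, linear_ridge.predict(x.reshape(-1, 1)))

# Supplying x**2 as an extra feature lets the same model capture the curve
X_quad = np.column_stack([x, x ** 2])
quad_ridge = Ridge(alpha=1.0).fit(X_quad, y)
r2_quad = r2_score(y, quad_ridge.predict(X_quad))

print("R-squared, linear feature only:", round(r2_linear, 4))
print("R-squared, with x**2 feature  :", round(r2_quad, 4))
```

With only the linear feature the model explains essentially none of the variance, whatever the value of `alpha`. The fit improves only once the nonlinearity is supplied as a feature, which is a modelling decision that regularization cannot make.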

### Q9. You are comparing the performance of two regression models using different evaluation metrics. Model A has an RMSE of 10, while Model B has an MAE of 8. Which model would you choose as the better performer, and why? Are there any limitations to your choice of metric?
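
The two numbers cannot be compared directly, because RMSE and MAE measure different things and, for any given set of predictions, RMSE is always at least as large as MAE. A model with an MAE of 8 can easily have an RMSE above 10. The sketch below constructs two hypothetical residual vectors that match the figures in the question (Model A with RMSE 10, Model B with MAE 8) to show that the ranking depends on the metric; the residuals are invented for illustration.

```python
import numpy as np

def mae(residuals):
    """Mean absolute error of a vector of residuals."""
    return float(np.mean(np.abs(residuals)))

def rmse(residuals):
    """Root mean squared error of a vector of residuals."""
    return float(np.sqrt(np.mean(np.square(residuals))))

# Hypothetical residuals chosen so that A has RMSE 10 and B has MAE 8
residuals_a = np.array([10.0, -10.0, 10.0, -10.0])   # evenly sized errors
residuals_b = np.array([0.0, 0.0, 0.0, 32.0])        # one large miss

for name, res in [("A", residuals_a), ("B", residuals_b)]:
    print(f"Model {name}: MAE={mae(res):.1f}  RMSE={rmse(res):.1f}")
```

In this construction Model B wins on MAE (8 against 10) but loses on RMSE (16 against 10). A fair comparison evaluates both models with the same metric on the same data, and the metric should reflect how costly large errors are in the application.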

### Q10. You are comparing the performance of two regularized linear models using different types of regularization. Model A uses Ridge regularization with a regularization parameter of 0.1, while Model B uses Lasso regularization with a regularization parameter of 0.5. Which model would you choose as the better performer, and why? Are there any trade-offs or limitations to your choice of regularization method?
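
The regularization parameters alone do not say which model performs better: 0.1 for Ridge and 0.5 for Lasso are on different scales and apply different penalties. The choice should rest on held-out error, with the usual trade-off that Lasso can drop features entirely while Ridge keeps all of them with smaller weights. A minimal sketch of such a comparison using cross-validation follows; the synthetic data stands in for the real dataset, and the 5-fold setup is an assumption.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical data standing in for the real problem
rng = np.random.default_rng(1)
X = rng.normal(size=(150, 8))
y = 4.0 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(size=150)

# Scale inside the pipeline so each fold is standardized on its own training part
candidates = {
    "Ridge(alpha=0.1)": make_pipeline(StandardScaler(), Ridge(alpha=0.1)),
    "Lasso(alpha=0.5)": make_pipeline(StandardScaler(), Lasso(alpha=0.5)),
}

cv_rmse = {}
for name, model in candidates.items():
    # scikit-learn returns negated MSE so that larger scores are always better
    neg_mse = cross_val_score(model, X, y, cv=5, scoring="neg_mean_squared_error")
    cv_rmse[name] = float(np.sqrt(-neg_mse.mean()))
    print(f"{name}: cross-validated RMSE = {cv_rmse[name]:.3f}")
```

Whichever model reports the lower cross-validated error on the actual data would be the better performer under that metric. In practice each `alpha` would also be tuned rather than fixed in advance.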