import matplotlib.pyplot as plt
import numpy as np

### 1. Using a graph to illustrate slope and intercept, define basic linear regression.
x = np.linspace(0, 10, 100)
m = 2
b = 1
y = m * x + b

plt.figure(figsize=(10, 8))

plt.subplot(3, 2, 1)
plt.plot(x, y, label='y = 2x + 1')
plt.xlabel('Independent Variable (x)')
plt.ylabel('Dependent Variable (y)')
plt.title('Basic Linear Regression: Slope and Intercept')
plt.axhline(0, color='black', linewidth=0.5)
plt.axvline(0, color='black', linewidth=0.5)
plt.legend()

### 2. In a graph, explain the terms rise, run, and slope.
x1, y1 = 1, 2
x2, y2 = 4, 8

plt.subplot(3, 2, 2)
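# Draw the run along the bottom and the rise at x2 so the three segments form a right triangle.
plt.plot([x1, x2], [y1, y2], marker='o', label=f'Line (slope = rise / run = {(y2 - y1) / (x2 - x1):g})')
plt.plot([x2, x2], [y1, y2], linestyle='--', color='gray', label='Rise')
plt.plot([x1, x2], [y1, y1], linestyle=':', color='gray', label='Run')
plt.text(x2, (y1 + y2) / 2, f'Rise = {y2 - y1} ', ha='right')
plt.text((x1 + x2) / 2, y1, f'Run = {x2 - x1}', va='top')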
plt.xlabel('Run (Horizontal Change)')
plt.ylabel('Rise (Vertical Change)')
plt.title('Explanation of Rise, Run, and Slope')
plt.legend()

### 3. Use a graph to demonstrate slope, linear positive slope, and linear negative slope, as well as the different conditions that contribute to the slope.
positive_slope = 2 * x + 1
negative_slope = -2 * x + 1

plt.subplot(3, 2, 3)
plt.plot(x, positive_slope, label='Positive Slope (y = 2x + 1)')
plt.plot(x, negative_slope, label='Negative Slope (y = -2x + 1)')
plt.axhline(0, color='black', linewidth=0.5)
plt.axvline(0, color='black', linewidth=0.5)
plt.xlabel('Independent Variable (x)')
plt.ylabel('Dependent Variable (y)')
plt.title('Positive and Negative Slopes')
plt.legend()

### 4. Use a graph to demonstrate curvilinear negative slope and curvilinear positive slope.
positive_curve_slope = x**2
negative_curve_slope = -x**2

plt.subplot(3, 2, 4)
plt.plot(x, positive_curve_slope, label='Curvilinear Positive Slope (y = x^2)')
plt.plot(x, negative_curve_slope, label='Curvilinear Negative Slope (y = -x^2)')
plt.axhline(0, color='black', linewidth=0.5)
plt.axvline(0, color='black', linewidth=0.5)
plt.xlabel('Independent Variable (x)')
plt.ylabel('Dependent Variable (y)')
plt.title('Curvilinear Positive and Negative Slopes')
plt.legend()

### 5. Use a graph to show the maximum and minimum points of curves.
curve = -x**2 + 4

plt.subplot(3, 2, 5)
plt.plot(x, curve, label='Curve (y = -x^2 + 4)')
plt.axhline(0, color='black', linewidth=0.5)
plt.axvline(0, color='black', linewidth=0.5)
plt.xlabel('Independent Variable (x)')
plt.ylabel('Dependent Variable (y)')
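plt.title('Maximum and Minimum Points of a Curve')
# Locate the extremes on the plotted range; (0, 0) is not on y = -x^2 + 4, so it cannot be the low point.
i_max, i_min = np.argmax(curve), np.argmin(curve)
plt.scatter(x[i_max], curve[i_max], color='red', label='Maximum Point')
plt.scatter(x[i_min], curve[i_min], color='blue', label='Lowest Point on Plotted Range')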
plt.legend()

### 6. Use the formulas for a and b to explain ordinary least squares.
Ordinary Least Squares (OLS) Explanation
Formulas:
- Slope (`b`): b = Σ[(x_i - x̄)(y_i - ȳ)] / Σ[(x_i - x̄)^2]
- Intercept (`a`): a = ȳ - b * x̄
- These values of `a` and `b` minimize the sum of squared residuals, Σ[(y_i - (a + b * x_i))^2], i.e. the squared differences between observed and predicted values.
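
As a quick check on the two formulas above, the sketch below computes `b` and `a` with NumPy for five made-up data points (illustrative only) and confirms that moving either coefficient away from the OLS values increases the sum of squared errors. The helper name `sse` is an assumption introduced for this example.

```python
import numpy as np

# Illustrative data only (made up for this sketch).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

x_bar, y_bar = x.mean(), y.mean()

# Slope: b = sum((x_i - x_bar) * (y_i - y_bar)) / sum((x_i - x_bar)^2)
b = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
# Intercept: a = y_bar - b * x_bar
a = y_bar - b * x_bar


def sse(a_val, b_val):
    """Sum of squared residuals for the line y = a_val + b_val * x."""
    return np.sum((y - (a_val + b_val * x)) ** 2)


print(f"a = {a:.2f}, b = {b:.2f}, SSE = {sse(a, b):.2f}")
# -> a = 2.20, b = 0.60, SSE = 2.40

# Any nearby line fits worse in the least-squares sense.
print(sse(a + 0.1, b) > sse(a, b), sse(a, b + 0.1) > sse(a, b))
```

The same slope and intercept come back from `np.polyfit(x, y, 1)`, which returns the coefficients with the slope first.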

### 7. Provide a step-by-step explanation of the OLS algorithm.
Steps of OLS Algorithm:
1. Identify independent (x) and dependent (y) variables.
2. Calculate the means of x and y.
3. Compute the slope (`b`) using the formula for OLS.
4. Compute the intercept (`a`) using the formula for OLS.
5. Form the regression equation y = a + bx.
6. Use the equation to predict values of y for given values of x.
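7. Assess the goodness of fit for the model by calculating R-squared (the share of the variance in y explained by the model) and other metrics such as the standard error of the regression.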
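
The steps above can be sketched as a single function. This is a minimal illustration rather than a reference implementation: the name `ols_fit` and the hours/scores data are hypothetical, chosen only to make the steps concrete.

```python
import numpy as np


def ols_fit(x, y):
    """Return (a, b, r_squared) for the simple regression y = a + b * x."""
    # Step 1: x is the independent variable, y the dependent variable.
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Step 2: means of x and y.
    x_bar, y_bar = x.mean(), y.mean()
    # Step 3: slope.
    b = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    # Step 4: intercept.
    a = y_bar - b * x_bar
    # Steps 5-6: regression equation and fitted values.
    y_hat = a + b * x
    # Step 7: R-squared = 1 - SSE / SST.
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y_bar) ** 2)
    return a, b, 1.0 - ss_res / ss_tot


# Hypothetical data: hours studied vs. exam score.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]

a, b, r2 = ols_fit(hours, scores)
print(f"score = {a:.1f} + {b:.1f} * hours, R^2 = {r2:.3f}")

# Step 6 in use: predict the score for 6 hours of study.
predicted = a + b * 6
print(f"predicted score for 6 hours: {predicted:.1f}")
```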

### 8. What is the regression's standard error? To represent the same, make a graph.
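# The standard error of the regression is the typical size of a residual: sqrt(SSE / (n - 2)).
rng = np.random.default_rng(0)  # fixed seed so the figure is reproducible
y_obs = y + rng.normal(0, 1, size=x.shape)  # noisy observations around the true line
slope_hat, intercept_hat = np.polyfit(x, y_obs, 1)  # OLS fit to the noisy data
y_fit = slope_hat * x + intercept_hat
std_err = np.sqrt(np.sum((y_obs - y_fit) ** 2) / (len(x) - 2))

plt.subplot(3, 2, 6)
plt.scatter(x, y_obs, s=10, label='Observed Data')
plt.plot(x, y_fit, color='red', label='Regression Line')
# Shade a band of one standard error on each side of the line (negative yerr values would raise an error).
plt.fill_between(x, y_fit - std_err, y_fit + std_err, alpha=0.3, label=f'±1 Standard Error ({std_err:.2f})')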
plt.xlabel('Independent Variable (x)')
plt.ylabel('Dependent Variable (y)')
plt.title('Regression with Standard Error')
plt.legend()

plt.tight_layout()
plt.show()

### 9. Provide an example of multiple linear regression.
Example of Multiple Linear Regression:
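Predict house prices from several independent variables at once, such as size, location, and number of rooms.
The model: Price = b0 + b1*Size + b2*Location + b3*Rooms + ...
Each coefficient is the expected change in price for a one-unit change in that variable with the others held fixed. A categorical variable such as location must first be encoded numerically (for example, as 0/1 dummy variables).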
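
A minimal sketch of fitting such a model with NumPy's least-squares solver follows. All of the data are synthetic and the "true" coefficients are hypothetical values picked for the demonstration; location is reduced to a single 0/1 dummy (`in_city`), which is an assumption of this example.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200

# Synthetic predictors (hypothetical units).
size = rng.uniform(50, 250, n)                  # floor area in square metres
in_city = rng.integers(0, 2, n).astype(float)   # location encoded as a 0/1 dummy
rooms = rng.integers(1, 7, n).astype(float)     # number of rooms

# Hypothetical coefficients: b0, b1 (size), b2 (location), b3 (rooms).
true_coefs = np.array([50_000.0, 1_200.0, 30_000.0, 8_000.0])

# Design matrix with a leading column of ones for the intercept b0.
X = np.column_stack([np.ones(n), size, in_city, rooms])
price = X @ true_coefs + rng.normal(0, 5_000, n)  # add noise

# Least-squares estimate of all coefficients at once.
coefs, *_ = np.linalg.lstsq(X, price, rcond=None)

for name, est, true in zip(["b0", "b1 (size)", "b2 (location)", "b3 (rooms)"], coefs, true_coefs):
    print(f"{name:>14}: estimated {est:10.1f}   (true {true:10.1f})")
```

With 200 observations the estimates should land close to the values used to generate the data, though not exactly on them because of the added noise.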

### 10. Describe the regression analysis assumptions and the BLUE principle.
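Regression Analysis Assumptions and BLUE Principle:
1. Linearity: The relationship between the independent and dependent variables is linear.
2. Independence: Observations, and therefore their errors, are independent of each other.
3. Homoscedasticity: The errors have constant variance across all values of the independent variables.
4. Normality: Errors are normally distributed. This is needed for exact t- and F-tests in small samples, not for the BLUE property itself.
BLUE Principle: BLUE stands for Best Linear Unbiased Estimator. By the Gauss-Markov theorem, in a linear model whose errors have zero mean, constant variance, and no correlation with one another, the OLS estimator has the smallest variance among all linear unbiased estimators of the coefficients.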

### 11. Describe two major issues with regression analysis.
Issues with Regression Analysis:
1. Multicollinearity: High correlation among independent variables, which can lead to unreliable estimates.
2. Overfitting: Model fits the training data too closely and performs poorly on unseen data.
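
One common way to detect multicollinearity is the variance inflation factor (VIF): regress each predictor on the others and compute 1 / (1 - R^2). The sketch below is a plain NumPy version under that definition; the function name `vif` and the synthetic predictors are assumptions made for the example. Values above roughly 5 to 10 are often treated as a warning sign, though that cutoff is a rule of thumb rather than a fixed standard.

```python
import numpy as np


def vif(X):
    """Variance inflation factor for each column of X (X has no intercept column)."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        target = X[:, j]
        # Regress column j on all other columns plus an intercept.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out


rng = np.random.default_rng(0)
x1 = rng.normal(size=300)
x2 = x1 + rng.normal(scale=0.1, size=300)  # almost a copy of x1
x3 = rng.normal(size=300)                  # unrelated to the other two

vifs = vif(np.column_stack([x1, x2, x3]))
print(np.round(vifs, 1))  # x1 and x2 should be large, x3 close to 1
```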

### 12. How can the linear regression model's accuracy be improved?
Improving Linear Regression Accuracy:
1. Feature Selection: Choose the most relevant variables.
2. Regularization: Apply techniques like Ridge or Lasso regression to prevent overfitting.
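3. Data Preprocessing: Standardize or normalize the features. This matters most for regularized models such as Ridge and Lasso, whose penalties depend on the scale of each feature.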
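
To make regularization and standardization from the list above concrete, here is a minimal closed-form Ridge sketch in NumPy. It is an illustration under stated assumptions, not a library implementation: `ridge_fit` is a hypothetical helper, the data are synthetic, and the penalty strength `alpha=10.0` is arbitrary.

```python
import numpy as np


def ridge_fit(X, y, alpha):
    """Ridge weights on standardized features: solve (Xs'Xs + alpha * I) w = Xs'yc."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Standardize each feature so the penalty treats them equally.
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    # Center y so no intercept needs to be penalized.
    yc = y - y.mean()
    k = Xs.shape[1]
    return np.linalg.solve(Xs.T @ Xs + alpha * np.eye(k), Xs.T @ yc)


rng = np.random.default_rng(1)
X = rng.normal(size=(50, 5))
# Synthetic target: only three of the five features actually matter.
y = X @ np.array([3.0, 0.0, -2.0, 0.0, 1.0]) + rng.normal(scale=0.5, size=50)

w_ols = ridge_fit(X, y, alpha=0.0)     # alpha = 0 reduces to ordinary least squares
w_ridge = ridge_fit(X, y, alpha=10.0)  # alpha > 0 shrinks the weights toward zero

print("OLS weights:  ", np.round(w_ols, 3))
print("Ridge weights:", np.round(w_ridge, 3))
```

The Ridge weights are always smaller in overall magnitude than the OLS weights; in practice `alpha` would be chosen by cross-validation rather than fixed by hand.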

### 13. Using an example, describe the polynomial regression model in detail.
Polynomial Regression Example:
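- Polynomial regression fits a nonlinear relationship between the independent variable x and the dependent variable y by modelling y as a polynomial in x: y = b0 + b1*x + b2*x^2 + ... + bn*x^n.
- The model is still linear in the coefficients b0, ..., bn, so they can be estimated with ordinary least squares.
- Example: Predicting the size of a population whose growth accelerates over time. A quadratic term in time captures the acceleration that a straight line would miss.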
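
A small sketch of the population example, assuming a noise-free quadratic trend with made-up coefficients (100, 5, and 2) so the result is easy to verify. It fits both a straight line and a degree-2 polynomial with `np.polyfit` and compares their errors.

```python
import numpy as np

# Hypothetical, noise-free data: population grows faster each year.
t = np.arange(10, dtype=float)           # years 0..9
population = 100 + 5 * t + 2 * t ** 2

# np.polyfit returns coefficients from the highest degree down.
quad_coefs = np.polyfit(t, population, deg=2)
line_coefs = np.polyfit(t, population, deg=1)


def sse(coefs):
    """Sum of squared errors of the fitted polynomial on the data."""
    return np.sum((population - np.polyval(coefs, t)) ** 2)


print("quadratic coefficients:", np.round(quad_coefs, 3))
print("SSE linear:   ", round(sse(line_coefs), 3))
print("SSE quadratic:", round(sse(quad_coefs), 3))
```

The quadratic fit recovers the generating coefficients, while the straight line leaves a clearly larger error.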

### 14. Provide a detailed explanation of logistic regression.
Logistic Regression Explanation:
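- Used for binary classification problems, where the dependent variable takes one of two values (for example 0 or 1).
- The model outputs a probability between 0 and 1, which is mapped to one of the two classes using a threshold (commonly 0.5).
- The logistic (sigmoid) function, sigmoid(z) = 1 / (1 + e^(-z)), maps the linear combination z = b0 + b1*x1 + ... + bk*xk to that probability.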
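
The prediction step described above can be sketched as follows. The weight and bias are hypothetical values chosen by hand for the illustration (they are not fitted), and the helper names are assumptions of this example.

```python
import numpy as np


def sigmoid(z):
    """Logistic function: maps any real number to the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def predict_proba(X, weights, bias):
    """Probability of class 1 for each row of X."""
    return sigmoid(X @ weights + bias)


def predict(X, weights, bias, threshold=0.5):
    """Class label (0 or 1) obtained by thresholding the probability."""
    return (predict_proba(X, weights, bias) >= threshold).astype(int)


# Hypothetical model: probability of passing an exam given hours studied.
weights = np.array([1.5])
bias = -4.0
hours = np.array([[1.0], [2.0], [3.0], [4.0]])

probs = predict_proba(hours, weights, bias)
labels = predict(hours, weights, bias)

print(np.round(probs, 3))
print(labels)  # -> [0 0 1 1]
```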

### 15. What are the logistic regression assumptions?
Logistic Regression Assumptions:
1. A linear relationship between the independent variables and the log odds of the outcome.
2. Independence of errors.
3. Absence of multicollinearity.

### 16. Go through the details of maximum likelihood estimation.
Maximum Likelihood Estimation (MLE) Explanation:
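- MLE is a method to estimate the parameters of a statistical model.
- It chooses the parameter values that maximize the likelihood function, i.e. the values under which the observed data are most probable given the assumed model.
- In practice the log-likelihood is maximized instead, because it turns a product of probabilities into a sum.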
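
As one concrete instance, logistic regression coefficients are usually estimated by maximum likelihood. The sketch below fits them by plain gradient ascent on the log-likelihood. It is a minimal illustration on synthetic data; the function names, learning rate, and iteration count are assumptions chosen for the example, and real libraries typically use faster optimizers.

```python
import numpy as np


def log_likelihood(w, X, y):
    """Bernoulli log-likelihood: sum of y * z - log(1 + exp(z)), with z = X @ w."""
    z = X @ w
    return np.sum(y * z - np.logaddexp(0.0, z))


def fit_logistic_mle(X, y, lr=0.1, n_iter=2000):
    """Maximize the log-likelihood by gradient ascent, starting from zeros."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))  # predicted probabilities
        grad = X.T @ (y - p) / len(y)       # gradient of the mean log-likelihood
        w = w + lr * grad
    return w


# Synthetic data generated from known (hypothetical) parameters.
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])        # intercept column plus one feature
true_w = np.array([-0.5, 2.0])
p_true = 1.0 / (1.0 + np.exp(-(X @ true_w)))
y = (rng.uniform(size=n) < p_true).astype(float)

w_hat = fit_logistic_mle(X, y)
print("estimated parameters:", np.round(w_hat, 2))
print("log-likelihood at start:   ", round(log_likelihood(np.zeros(2), X, y), 2))
print("log-likelihood at estimate:", round(log_likelihood(w_hat, X, y), 2))
```

The estimate should sit near the parameters used to generate the data, and any small move away from it lowers the log-likelihood.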
