# Lesson: Linear Regression in Statistical Analysis

## Objectives
By the end of this lesson, participants will:
- Understand the concept of linear regression and its applications.
- Learn how to perform linear regression in R.
- Interpret the results of linear regression analysis.
- Visualize linear regression models with plots.

---

## 1. Introduction to Linear Regression
Linear regression is a statistical method for modeling the relationship between a dependent variable (response) and one or more independent variables (predictors). The goal is to fit the line (or hyperplane, with several predictors) that best predicts the dependent variable, where "best" means minimizing the sum of squared differences between observed and fitted values (ordinary least squares).

### Formula
For simple linear regression:
\[
Y = \beta_0 + \beta_1X + \epsilon
\]
Where:
- \(Y\): Dependent variable.
- \(X\): Independent variable.
- \(\beta_0\): Intercept.
- \(\beta_1\): Slope of the line.
- \(\epsilon\): Random error.
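
For intuition, the ordinary least-squares estimates in the simple case have closed forms: \(\hat{\beta}_1 = \mathrm{cov}(X, Y) / \mathrm{var}(X)\) and \(\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}\). The sketch below computes them by hand on the built-in `mtcars` dataset and compares them with what `lm()` reports. The names `beta0_hat` and `beta1_hat` are illustrative choices only.

```r
# Illustrative sketch: estimate intercept and slope by hand
x <- mtcars$wt   # predictor: weight
y <- mtcars$mpg  # response: miles per gallon

beta1_hat <- cov(x, y) / var(x)             # slope
beta0_hat <- mean(y) - beta1_hat * mean(x)  # intercept

# Hand-computed estimates
print(c(intercept = beta0_hat, slope = beta1_hat))

# Estimates from lm() for comparison
print(coef(lm(mpg ~ wt, data = mtcars)))
```

Both calls should print the same two numbers, which shows that `lm()` is solving this least-squares problem.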

---

## 2. Assumptions of Linear Regression
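1. **Linearity**: The mean of the dependent variable is a linear function of the independent variables.
2. **Independence**: Observations (and therefore their errors) are independent of one another.
3. **Homoscedasticity**: The errors have constant variance across all values of the predictors.
4. **Normality**: The errors are normally distributed; in practice this is checked on the residuals.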

---

## 3. Performing Linear Regression in R
### Example: Simple Linear Regression
#### Dataset
We will use the built-in `mtcars` dataset.

```r
# Load Dataset
data(mtcars)

# Perform Linear Regression
model <- lm(mpg ~ wt, data = mtcars)

# Summary of the Model
summary(model)
```

### Interpretation of Results
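- **Coefficients**: The intercept is the predicted `mpg` when `wt` is zero. The slope (`wt`) is the estimated change in the dependent variable (`mpg`) for a one-unit increase in the predictor (`wt`, measured in 1000 lbs).
- **R-squared**: Proportion of the variance in the dependent variable that the model explains, between 0 and 1.
- **p-value**: Each coefficient's p-value tests the null hypothesis that the coefficient is zero. A small value (conventionally below 0.05) indicates that the relationship is statistically significant.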
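
The quantities above can also be pulled out of the fitted model programmatically instead of being read off the printed summary. The following is a minimal sketch; the variable names (`coef_table`, `slope_estimate`, and so on) are illustrative, and the 95% level is an assumed choice.

```r
# Refit the simple model so this block runs on its own
model <- lm(mpg ~ wt, data = mtcars)
model_summary <- summary(model)

# Coefficient table: Estimate, Std. Error, t value, Pr(>|t|)
coef_table <- coef(model_summary)
slope_estimate <- coef_table["wt", "Estimate"]
slope_p_value  <- coef_table["wt", "Pr(>|t|)"]

# Proportion of variance explained
r_squared <- model_summary$r.squared

# 95% confidence intervals for the intercept and slope
conf_int <- confint(model, level = 0.95)

print(coef_table)
print(r_squared)
print(conf_int)
```

A confidence interval for the slope that excludes zero tells the same story as a small p-value, but it also shows the plausible size of the effect.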

---

### Example: Multiple Linear Regression
#### Dataset
We will add another variable, `hp` (horsepower), to the model.

```r
# Multiple Linear Regression
model_mult <- lm(mpg ~ wt + hp, data = mtcars)

# Summary of the Model
summary(model_mult)
```

### Interpretation
- Each coefficient is the estimated change in `mpg` for a one-unit increase in that predictor while holding the other predictors constant.
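
One way to judge whether the extra predictor is worth keeping is to compare the two nested models. The sketch below uses an F-test via `anova()` and the adjusted R-squared, which penalizes additional predictors. This is one reasonable approach among several, not the only one.

```r
# Refit both models so this block runs on its own
model      <- lm(mpg ~ wt, data = mtcars)
model_mult <- lm(mpg ~ wt + hp, data = mtcars)

# F-test: does adding hp significantly reduce the residual sum of squares?
comparison <- anova(model, model_mult)
print(comparison)

# Adjusted R-squared for each model
adj_r2 <- c(
  simple   = summary(model)$adj.r.squared,
  multiple = summary(model_mult)$adj.r.squared
)
print(adj_r2)
```

A small p-value in the second row of the `anova()` table suggests that `hp` improves the fit beyond what `wt` alone achieves.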

---

## 4. Visualizing Linear Regression
### Simple Linear Regression Plot
```r
# Scatter Plot with Regression Line
plot(mtcars$wt, mtcars$mpg, main = "MPG vs Weight", xlab = "Weight (1000 lbs)", ylab = "Miles per Gallon")
abline(model, col = "blue")
```
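
The fitted line alone does not show how uncertain it is. As a sketch, a 95% confidence band for the mean response can be added using `predict()` with `interval = "confidence"`. The grid size of 100 points and the names `wt_grid` and `band` are arbitrary choices made for illustration.

```r
# Refit the simple model so this block runs on its own
model <- lm(mpg ~ wt, data = mtcars)

# Evenly spaced weights covering the observed range
wt_grid <- data.frame(wt = seq(min(mtcars$wt), max(mtcars$wt), length.out = 100))

# Matrix with columns "fit", "lwr", "upr"
band <- predict(model, newdata = wt_grid, interval = "confidence", level = 0.95)

# Scatter plot, fitted line (solid), and confidence band (dashed)
plot(mpg ~ wt, data = mtcars, main = "MPG vs Weight with 95% Confidence Band",
     xlab = "Weight (1000 lbs)", ylab = "Miles per Gallon")
lines(wt_grid$wt, band[, "fit"], col = "blue")
lines(wt_grid$wt, band[, "lwr"], col = "blue", lty = 2)
lines(wt_grid$wt, band[, "upr"], col = "blue", lty = 2)
```

The band is narrowest near the mean weight and widens toward the edges of the data, where the line is less well determined.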

### Residual Plot
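A plot of residuals against fitted values helps check linearity and homoscedasticity. The points should scatter evenly around zero, with no curve and no funnel shape.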

```r
# Residual Plot
plot(model, which = 1)
```

---

## 5. Diagnostic Measures
### Checking Assumptions
1. **Residual Analysis**:
   ```r
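   # Plot residuals in row order; look for trends or clusters
   plot(residuals(model), ylab = "Residuals")
   abline(h = 0, lty = 2)  # reference line at zero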
   ```

2. **Normality of Residuals**:
   ```r
   # QQ Plot
   qqnorm(residuals(model))
   qqline(residuals(model), col = "red")
   ```

3. **Homoscedasticity**:
   ```r
   # Scale-Location Plot
   plot(model, which = 3)
   ```
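
The visual checks above can be supplemented with a formal test. As a minimal sketch, the Shapiro-Wilk test from base R (`shapiro.test()`) tests the null hypothesis that the residuals come from a normal distribution. Treat it as a complement to the QQ plot rather than a replacement, because such tests have little power on a small dataset like `mtcars` (32 rows).

```r
# Refit the simple model so this block runs on its own
model <- lm(mpg ~ wt, data = mtcars)

# Shapiro-Wilk test on the residuals
sw <- shapiro.test(residuals(model))
print(sw)

# A p-value above the chosen threshold (0.05 is an assumed convention here)
# means the test found no evidence against normality
sw$p.value > 0.05
```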

---

## 6. Exercise
### Task
1. Load a dataset of your choice.
2. Fit a simple linear regression model and a multiple linear regression model.
3. Plot the fitted regression line for the simple model.
4. Check assumptions using diagnostic plots.
5. Write a short report summarizing:
   - The coefficients and their interpretation.
   - Model performance (R-squared and p-values).

### Bonus
Use the `predict()` function to make predictions for new data points:
```r
# Predict New Values
new_data <- data.frame(wt = c(2.5, 3.5), hp = c(100, 150))
predictions <- predict(model_mult, newdata = new_data)
print(predictions)
```
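
Point predictions say nothing about their uncertainty. The sketch below asks `predict()` for two kinds of interval on the same new data. A confidence interval covers the mean `mpg` of cars with those values, while a prediction interval covers a single new car and is therefore wider. The 95% level is an assumed choice.

```r
# Refit the multiple model so this block runs on its own
model_mult <- lm(mpg ~ wt + hp, data = mtcars)
new_data <- data.frame(wt = c(2.5, 3.5), hp = c(100, 150))

# Interval for the mean response at each new point
mean_interval <- predict(model_mult, newdata = new_data,
                         interval = "confidence", level = 0.95)

# Interval for an individual new observation at each new point
pred_interval <- predict(model_mult, newdata = new_data,
                         interval = "prediction", level = 0.95)

print(mean_interval)
print(pred_interval)
```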

---

## Summary
In this lesson, we:
- Explored the theory and assumptions of linear regression.
- Performed simple and multiple linear regression in R.
- Visualized and interpreted regression models.
- Learned how to diagnose potential issues with regression assumptions.

