# Lesson: Testing for Normality

## Objectives
By the end of this lesson, participants will:
- Understand the concept of normality in data.
- Learn methods for assessing normality both visually and statistically.
- Implement normality tests in R.
- Interpret the results of these tests.

---

## 1. What is Normality?
Normality refers to the condition in which data are drawn from a normal distribution, also known as a Gaussian distribution. The normal distribution is symmetric and bell-shaped: its mean, median, and mode coincide, and values become less frequent the farther they lie from the mean.
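For reference, a normal distribution with mean $\mu$ and standard deviation $\sigma$ has the probability density function below. About 68%, 95%, and 99.7% of its probability lies within 1, 2, and 3 standard deviations of the mean, respectively.

$$
f(x \mid \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)
$$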

### Importance of Normality
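- Many statistical tests (e.g., t-tests, ANOVA) assume normally distributed data. Strictly, the assumption applies to the observations within each group, or equivalently to the model residuals, rather than to the pooled data.
- Checking this assumption indicates whether the p-values and confidence intervals from these tests can be trusted. With large samples, these tests are fairly robust to moderate departures from normality because of the central limit theorem.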
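A minimal sketch of checking the assumption on model residuals, using R's built-in `PlantGrowth` dataset purely as a stand-in example: fit a one-way ANOVA, then apply the visual and formal checks from this lesson to the residuals instead of the raw values.

```r
# One-way ANOVA on R's built-in PlantGrowth data (plant weight by treatment group)
fit <- aov(weight ~ group, data = PlantGrowth)

# The normality assumption concerns the residuals, not the pooled raw weights
res <- residuals(fit)
qqnorm(res, main = "Q-Q Plot of ANOVA Residuals")
qqline(res, col = "red")

resid_test <- shapiro.test(res)
resid_test
```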

---

## 2. Visual Methods for Assessing Normality

### a. Histogram
A histogram gives a quick visual impression of the shape of the data's distribution; for approximately normal data, expect a single, roughly symmetric peak.
```r
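# Simulate 100 values from a standard normal distribution (reproducible via the seed)
set.seed(123)
data <- rnorm(100, mean = 0, sd = 1)
# breaks = 10 is only a suggestion; R may adjust it to produce "pretty" bin edges
hist(data, breaks = 10, main = "Histogram of Data", col = "lightblue", border = "black")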
```
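Because the default histogram shows counts, it can help to draw it on the density scale and overlay the normal curve implied by the sample mean and standard deviation. A minimal sketch, regenerating the same simulated sample under the name `obs`:

```r
set.seed(123)
obs <- rnorm(100, mean = 0, sd = 1)
mu <- mean(obs)
sigma <- sd(obs)

# Compute the histogram without drawing it, so the y-axis can fit both the bars
# (on the density scale, total area = 1) and the peak of the normal curve
h <- hist(obs, breaks = 10, plot = FALSE)
peak <- max(h$density, dnorm(mu, mean = mu, sd = sigma))
plot(h, freq = FALSE, ylim = c(0, peak),
     main = "Histogram with Fitted Normal Curve",
     col = "lightblue", border = "black")

# Overlay the normal density; inside curve(), `x` is the grid along the horizontal axis
curve(dnorm(x, mean = mu, sd = sigma), add = TRUE, col = "red", lwd = 2)
```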

### b. Q-Q Plot (Quantile-Quantile Plot)
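A Q-Q plot compares the quantiles of the dataset against the quantiles of a theoretical normal distribution. If the data are approximately normal, the points fall close to the reference line; systematic curvature suggests skewness, and an S-shaped pattern suggests tails that are heavier or lighter than those of a normal distribution.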
```r
# Q-Q Plot
qqnorm(data, main = "Q-Q Plot")
qqline(data, col = "red")
```
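To see what `qqnorm()` does, the sketch below builds the same points by hand: the sorted sample is paired with standard normal quantiles at the plotting positions returned by `ppoints()`. `qqline()` draws a line through the points at the first and third quartiles. The correlation between the two sets of quantiles is shown only as an informal summary of how straight the pattern is.

```r
set.seed(123)
obs <- rnorm(100, mean = 0, sd = 1)
n <- length(obs)

# Theoretical N(0, 1) quantiles at the plotting positions qqnorm() also uses
theoretical <- qnorm(ppoints(n))
sample_q <- sort(obs)

plot(theoretical, sample_q,
     xlab = "Theoretical Quantiles", ylab = "Sample Quantiles",
     main = "Hand-Built Q-Q Plot")
qqline(obs, col = "red")

# Informal summary: close to 1 when the points lie near a straight line
cor(theoretical, sample_q)

# qqnorm() returns the same coordinates when asked not to plot
qq <- qqnorm(obs, plot.it = FALSE)
```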

### c. Density Plot
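A density plot draws a smoothed curve (a kernel density estimate) of the data's distribution; an approximately normal sample gives a single, roughly symmetric, bell-shaped curve.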
```r
# Density Plot
plot(density(data), main = "Density Plot of Data")
```
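`density()` computes a kernel density estimate whose smoothness is controlled by a bandwidth, by default Silverman's rule of thumb (`bw.nrd0()`). Small bumps can be artifacts of the bandwidth, so a minimal sketch like the one below, which compares a few settings of the `adjust` multiplier, can keep you from reading too much into the shape.

```r
set.seed(123)
obs <- rnorm(100, mean = 0, sd = 1)

# adjust multiplies the default bandwidth: < 1 is wigglier, > 1 is smoother
d_default <- density(obs)
d_narrow  <- density(obs, adjust = 0.5)
d_wide    <- density(obs, adjust = 2)

plot(d_default, main = "Density Plot with Different Bandwidths",
     ylim = range(0, d_narrow$y, d_default$y, d_wide$y))
lines(d_narrow, col = "blue", lty = 2)
lines(d_wide, col = "darkgreen", lty = 3)
legend("topright", legend = c("adjust = 1", "adjust = 0.5", "adjust = 2"),
       col = c("black", "blue", "darkgreen"), lty = 1:3)
```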

---

## 3. Statistical Tests for Normality

### a. Shapiro-Wilk Test
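The Shapiro-Wilk test checks whether a sample is consistent with having been drawn from a normal distribution. It has good power against many kinds of departure, and R's `shapiro.test()` accepts between 3 and 5000 non-missing values.
- Null Hypothesis (H0): Data is normally distributed.
- Alternative Hypothesis (H1): Data is not normally distributed.
- Decision: reject H0 when the p-value is below the chosen significance level (commonly 0.05). A large p-value does not prove normality; it only means no departure was detected, and with very large samples even trivial departures become significant.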

```r
# Shapiro-Wilk Test
shapiro.test(data)
```
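`shapiro.test()` returns an object of class `htest`, so the W statistic and p-value can be extracted and compared with a significance level. The sketch below assumes α = 0.05 and contrasts a normal sample with a right-skewed exponential sample; the helper `sw_summary()` is a hypothetical name introduced only for this illustration.

```r
# Hypothetical helper: run Shapiro-Wilk and return one summary row
# (alpha = 0.05 is an assumed default)
sw_summary <- function(x, label, alpha = 0.05) {
  res <- shapiro.test(x)
  data.frame(
    sample   = label,
    W        = unname(res$statistic),
    p_value  = res$p.value,
    decision = if (res$p.value < alpha) "reject H0" else "fail to reject H0"
  )
}

set.seed(123)
normal_sample <- rnorm(100, mean = 0, sd = 1)
skewed_sample <- rexp(100, rate = 1)   # right-skewed, so not normal

sw_results <- rbind(sw_summary(normal_sample, "normal"),
                    sw_summary(skewed_sample, "skewed"))
sw_results
```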

### b. Kolmogorov-Smirnov Test
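Compares the empirical cumulative distribution function (ECDF) of the sample with the CDF of a fully specified normal distribution. The p-value is only valid when the mean and standard deviation are fixed in advance; when they are estimated from the same data, as in the call below, the test is overly conservative. The Lilliefors test corrects this.
```r
# Kolmogorov-Smirnov Test
# Caution: mean and sd are estimated from the data, so this p-value is conservative
ks.test(data, "pnorm", mean(data), sd(data))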
```
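The Kolmogorov-Smirnov statistic D is the largest vertical distance between the sample's ECDF and the hypothesized CDF. A minimal sketch, assuming fully specified parameters (mean 0, sd 1) so that the `ks.test()` p-value is valid, computes D by hand and checks it against `ks.test()`:

```r
set.seed(123)
obs <- rnorm(100, mean = 0, sd = 1)

n <- length(obs)
x_sorted <- sort(obs)
F_hyp <- pnorm(x_sorted, mean = 0, sd = 1)   # hypothesized N(0, 1) CDF at each point

# The ECDF jumps at each sorted point, so check the gap just after the jump (i/n)
# and just before it ((i - 1)/n)
d_plus  <- max((1:n) / n - F_hyp)
d_minus <- max(F_hyp - (0:(n - 1)) / n)
D_manual <- max(d_plus, d_minus)

ks_res <- ks.test(obs, "pnorm", mean = 0, sd = 1)
c(manual = D_manual, ks.test = unname(ks_res$statistic))
```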

### c. Anderson-Darling Test
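A test that gives more weight to the tails of the distribution than the Kolmogorov-Smirnov test does, which often makes it more powerful at detecting departures from normality. It is provided by the `nortest` package.
```r
# Anderson-Darling Test (install once with install.packages("nortest") if needed)
library(nortest)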
ad.test(data)
```
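To show where the extra tail weight comes from, the sketch below computes the Anderson-Darling statistic in base R. For a sample standardized with its own mean and standard deviation and sorted as $z_{(1)} \le \dots \le z_{(n)}$, it uses $A^2 = -n - \frac{1}{n}\sum_{i=1}^{n} (2i-1)\left[\ln \Phi(z_{(i)}) + \ln\left(1 - \Phi(z_{(n+1-i)})\right)\right]$; the log terms grow large in magnitude when extreme observations sit where $\Phi$ is near 0 or 1. The helper name `ad_statistic()` is hypothetical, and the cross-check against `nortest::ad.test()` runs only if that package is installed.

```r
set.seed(123)
obs <- rnorm(100, mean = 0, sd = 1)

# Hypothetical helper: Anderson-Darling A^2 for normality with estimated parameters
ad_statistic <- function(x) {
  x <- sort(x[!is.na(x)])
  n <- length(x)
  z <- (x - mean(x)) / sd(x)                               # standardize with sample estimates
  log_cdf  <- pnorm(z, log.p = TRUE)                       # ln Phi(z_(i))
  log_surv <- pnorm(z, lower.tail = FALSE, log.p = TRUE)   # ln(1 - Phi(z_(i)))
  i <- seq_len(n)
  # rev() pairs z_(i) with z_(n + 1 - i) in the upper-tail term
  -n - mean((2 * i - 1) * (log_cdf + rev(log_surv)))
}

A2 <- ad_statistic(obs)
A2

# Optional cross-check against nortest, if it is installed
if (requireNamespace("nortest", quietly = TRUE)) {
  print(nortest::ad.test(obs))
}
```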

### d. Lilliefors (Kolmogorov-Smirnov) Test
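A modification of the Kolmogorov-Smirnov test that accounts for the mean and standard deviation being estimated from the sample, which makes it appropriate for testing normality when those parameters are unknown.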
```r
# Lilliefors Test
library(nortest)
lillie.test(data)
```
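The need for the Lilliefors correction can be seen by simulation: draw many samples that really are normal, run the plain `ks.test()` with the mean and standard deviation estimated from each sample, and count how often H0 is rejected at α = 0.05. A calibrated test would reject about 5% of the time; with estimated parameters the rate comes out far lower, meaning the plain test is conservative and loses power. A minimal sketch (the number of replications and the sample size are arbitrary choices):

```r
set.seed(1)
n_reps <- 1000
n <- 100
alpha <- 0.05

p_values <- replicate(n_reps, {
  x <- rnorm(n, mean = 10, sd = 2)       # H0 is true: the data really are normal
  ks.test(x, "pnorm", mean(x), sd(x))$p.value
})

# Proportion of false rejections; a calibrated test would give roughly alpha
rejection_rate <- mean(p_values < alpha)
rejection_rate
```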

---

## 4. Dealing with Non-Normality
If data is not normally distributed:
1. **Transform the Data**:
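   - Logarithmic, square root, or Box-Cox transformations. The logarithm and Box-Cox require strictly positive values, and the square root requires non-negative values.
   ```r
   # Log transformation requires strictly positive values, so use right-skewed
   # positive example data (the Section 2 `data` has negatives, giving NaN)
   skewed_data <- rlnorm(100, meanlog = 0, sdlog = 1)
   log_data <- log(skewed_data)
   hist(log_data, main = "Histogram of Log-Transformed Data")
   ```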

2. **Non-Parametric Tests**:
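   Use tests that do not assume normality, e.g., the Mann-Whitney U test (`wilcox.test()`) instead of a two-sample t-test, or the Kruskal-Wallis test (`kruskal.test()`) instead of one-way ANOVA.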
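Both remedies can be sketched in a few lines, assuming right-skewed, strictly positive simulated data. `MASS::boxcox()` (MASS is a recommended package included with most R installations) profiles the log-likelihood over a grid of exponents λ; the best λ is then applied as $(y^\lambda - 1)/\lambda$, or $\ln y$ when $\lambda = 0$. The group labels used for the rank-based tests are hypothetical.

```r
set.seed(42)
y <- rlnorm(80, meanlog = 1, sdlog = 0.6)   # hypothetical right-skewed, positive data

# Box-Cox: profile log-likelihood over a lambda grid for an intercept-only model
bc <- MASS::boxcox(y ~ 1, lambda = seq(-2, 2, by = 0.05), plotit = FALSE)
lambda_hat <- bc$x[which.max(bc$y)]

# Apply the transformation with the selected lambda (log when lambda is ~0)
y_bc <- if (abs(lambda_hat) < 1e-8) log(y) else (y^lambda_hat - 1) / lambda_hat
shapiro.test(y_bc)

# Rank-based alternatives that do not assume normality
group <- factor(rep(c("A", "B"), each = 40))               # hypothetical two groups
wilcox.test(y ~ group)                                     # Mann-Whitney U / Wilcoxon rank-sum

group3 <- factor(rep(c("A", "B", "C"), length.out = 80))   # hypothetical three groups
kruskal.test(y ~ group3)
```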

---

## 5. Practical Example

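### Step 1: Simulate Data
```r
# Simulated example dataset: 50 normal values mixed with 10 uniform values,
# so the sample is deliberately not exactly normal
set.seed(42)
data <- c(rnorm(50, mean = 10, sd = 2), runif(10, min = 5, max = 15))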
```

### Step 2: Visual Inspection
```r
# Visual Methods
hist(data, breaks = 10, main = "Histogram", col = "lightblue")
qqnorm(data)
qqline(data, col = "red")
```

### Step 3: Statistical Testing
```r
# Statistical tests (ad.test() comes from the nortest package)
library(nortest)
shapiro.test(data)
ad.test(data)
```
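When several tests are run on the same data, collecting their results in one table makes them easier to compare. A minimal sketch, assuming α = 0.05 and regenerating the Step 1 data so it runs on its own; the `nortest` rows are added only if that package is installed.

```r
set.seed(42)
data <- c(rnorm(50, mean = 10, sd = 2), runif(10, min = 5, max = 15))
alpha <- 0.05  # assumed significance level

tests <- list("Shapiro-Wilk" = shapiro.test(data))
if (requireNamespace("nortest", quietly = TRUE)) {
  tests[["Anderson-Darling"]] <- nortest::ad.test(data)
  tests[["Lilliefors"]] <- nortest::lillie.test(data)
}

# One row per test: statistic, p-value, and the decision at alpha
test_results <- data.frame(
  test      = names(tests),
  statistic = unname(vapply(tests, function(res) unname(res$statistic), numeric(1))),
  p_value   = unname(vapply(tests, function(res) res$p.value, numeric(1)))
)
test_results$reject_H0 <- test_results$p_value < alpha
test_results
```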

---

## 6. Exercise
1. Load a dataset (e.g., a CSV file read with `read.csv()`) into R.
2. Perform visual inspections for normality using histograms and Q-Q plots.
3. Apply at least two statistical tests (e.g., Shapiro-Wilk, Anderson-Darling) to assess normality.
4. If the data is not normal, try a transformation and reassess.
5. Write a report summarizing your findings.
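A possible starting point for step 1, sketched under assumptions: the file and its numeric column `value` are hypothetical, and the script writes a small temporary CSV first so it runs as-is. Replace `csv_path` and the column name with those of your own dataset before continuing with steps 2 to 5.

```r
# Hypothetical example file, written to a temporary location so the sketch runs
set.seed(7)
csv_path <- tempfile(fileext = ".csv")
write.csv(data.frame(value = rexp(40, rate = 0.2)), csv_path, row.names = FALSE)

# Step 1: load the CSV and pull out the numeric column to analyze
dataset <- read.csv(csv_path)
x <- dataset$value            # "value" is a hypothetical column name
x <- x[!is.na(x)]             # drop missing values before testing

str(dataset)
summary(x)
```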

---

## Summary
In this lesson, we covered:
- The importance of normality in statistical analyses.
- Visual methods (histograms, Q-Q plots, density plots) and statistical tests (Shapiro-Wilk, Kolmogorov-Smirnov, Anderson-Darling, Lilliefors) for assessing normality.
- Techniques to address non-normality: data transformations and non-parametric tests.

