# Introduction to R: Statistical Analysis in Medicine

## **Section 1: Introduction to R and the Environment**

### **1.1 Why Use R?**
R is a powerful open-source tool for statistical analysis, data visualization, and reproducible research. It is widely used in medical and biological sciences for:
- Analyzing clinical trial data.
- Visualizing trends and patterns in patient data.
- Performing advanced statistical modeling.

### **1.2 Setting Up R and RStudio**
#### Installing R and RStudio:
- Download and install R from [CRAN](https://cran.r-project.org/).
- Download and install RStudio from [RStudio's website](https://www.rstudio.com/).

#### The RStudio Interface:
- **Source Pane:** Write and save scripts.
- **Console:** Execute R commands interactively.
- **Environment Pane:** View and manage objects in memory.
- **Plots/Help Pane:** Display plots and access documentation.

### **1.3 Writing Your First Script**
#### Example Code:
```r
# Print a message
print("Hello, R!")

# Basic arithmetic
2 + 2
```
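
The later sections rely on assignment (`<-`), vectors, and the `$` operator, so a short sketch of those basics may help here. The names `ages`, `mean_age`, and `patients` are invented for illustration.

```r
# Store values in a vector with the assignment operator
ages <- c(34, 61, 47)

# Apply a function to the vector and keep the result
mean_age <- mean(ages)

# Combine vectors into a data frame (a table of rows and columns)
patients <- data.frame(id = 1:3, age = ages)

# Access one column with the $ operator
patients$age
```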

---

## **Section 2: Basics of Data Analysis**

### **2.1 Importing Data**
#### Loading a CSV File:
```r
# Read a CSV file
my_data <- read.csv("path/to/your/file.csv")

# View the first few rows
head(my_data)
```
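
To try the import step without a real file, the sketch below writes a small made-up data frame to a temporary CSV and reads it back. The column names (`Age`, `Gender`, `Height`) match those used later in this guide; the values are invented and carry no clinical meaning.

```r
# Hypothetical example data: values are invented for illustration only
example_data <- data.frame(
  Age    = c(34, 61, 47, NA, 72, 55),
  Gender = c("F", "M", "F", "M", "F", "M"),
  Height = c(162, 178, 158, 181, 165, 174)
)

# Write to a temporary CSV file, then read it back
csv_path <- tempfile(fileext = ".csv")
write.csv(example_data, csv_path, row.names = FALSE)
my_data <- read.csv(csv_path)

# Inspect column names, column types, and dimensions
str(my_data)
dim(my_data)
```

`str()` is a quick way to confirm that numeric columns were not read in as text.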

### **2.2 Data Management**
#### Filtering and Selecting Columns:
```r
# Filter rows where Age > 50 (which() drops rows where Age is missing)
subset_data <- my_data[which(my_data$Age > 50), ]

# Select specific columns
selected_data <- my_data[, c("Age", "Gender")]
```
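
Logical indexing with a comparison returns an all-`NA` row for every missing value in the compared column. The sketch below, using an invented `toy` data frame, contrasts that behavior with `subset()`, which drops rows where the condition is `NA` and can select columns in the same call.

```r
# Hypothetical data with one missing Age
toy <- data.frame(
  Age    = c(45, 63, NA, 58),
  Gender = c("F", "M", "F", "F")
)

# Plain logical indexing keeps an all-NA row for the missing Age
with_na_row <- toy[toy$Age > 50, ]

# subset() drops rows where the condition evaluates to NA
without_na_row <- subset(toy, Age > 50)

# Combine conditions and pick columns in one call
older_women <- subset(toy, Age > 50 & Gender == "F", select = c(Age, Gender))
```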

#### Handling Missing Data:
```r
# Check for missing values
sum(is.na(my_data))

# Replace missing values with the mean
my_data$Age[is.na(my_data$Age)] <- mean(my_data$Age, na.rm = TRUE)
```
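
Before imputing, it can help to see where the missing values are. The sketch below counts them per column and shows complete-case filtering as an alternative to mean replacement; the `toy` data frame is invented for illustration. Note that mean imputation shrinks the variability of the imputed column, so whether it is appropriate depends on the analysis.

```r
# Hypothetical data with missing values in two columns
toy <- data.frame(
  Age    = c(45, NA, 52, 61),
  Height = c(170, 165, NA, NA)
)

# Count missing values in each column
missing_per_column <- colSums(is.na(toy))
missing_per_column

# Keep only the rows with no missing values in any column
complete_rows <- toy[complete.cases(toy), ]
nrow(complete_rows)
```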

---

## **Section 3: Descriptive Analysis**

### **3.1 Descriptive Statistics**
#### Calculating Summary Statistics:
```r
# Mean and Standard Deviation
mean(my_data$Age, na.rm = TRUE)
sd(my_data$Age, na.rm = TRUE)

# Summary of the dataset
summary(my_data)
```
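
Summary statistics are often needed per group rather than for the whole dataset. A minimal sketch using base R, with an invented `toy` data frame standing in for real patient data:

```r
# Hypothetical data: two groups of two patients each
toy <- data.frame(
  Age    = c(30, 40, 50, 60),
  Gender = c("F", "F", "M", "M")
)

# Mean Age within each Gender group
mean_by_gender <- tapply(toy$Age, toy$Gender, mean, na.rm = TRUE)
mean_by_gender

# The same idea with the formula interface, here for the median
aggregate(Age ~ Gender, data = toy, FUN = median)

# Frequency table for a categorical variable
table(toy$Gender)
```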

### **3.2 Visualizing Data**
#### Histograms:
```r
# Plot a histogram of Age
hist(my_data$Age, main = "Age Distribution", xlab = "Age")
```

#### Boxplots:
```r
# Boxplot of Age by Gender
boxplot(Age ~ Gender, data = my_data,
        main = "Age by Gender",
        xlab = "Gender",
        ylab = "Age")
```

### **3.3 Normal vs. Non-Normal Distribution**

#### Understanding Normal Distribution:
- A normal distribution is symmetric and bell-shaped, with the mean, median, and mode at the same point.
- Many biological and clinical variables (e.g., height, blood pressure) are approximately normally distributed.

#### Checking Normality:
1. **Visual Methods:**
   - **Histogram:**
     ```r
     hist(my_data$Age, main = "Histogram of Age", xlab = "Age")
     ```
   - **Q-Q Plot:**
     ```r
     qqnorm(my_data$Age)
     qqline(my_data$Age, col = "red")
     ```

2. **Statistical Tests:**
   - **Shapiro-Wilk Test:**
     ```r
     shapiro.test(my_data$Age)
     ```
   - **Kolmogorov-Smirnov Test:**
     ```r
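     # na.rm = TRUE avoids NA parameters; the p-value is approximate because they are estimated from the data
     ks.test(my_data$Age, "pnorm", mean = mean(my_data$Age, na.rm = TRUE), sd = sd(my_data$Age, na.rm = TRUE))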
     ```
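
To see how these checks behave, the sketch below applies the Shapiro-Wilk test to two simulated samples, one drawn from a normal distribution and one from a right-skewed (exponential) distribution. The sample sizes, parameters, and seed are arbitrary choices for illustration.

```r
# Simulated data: fix the seed so the example is reproducible
set.seed(42)
normal_sample <- rnorm(100, mean = 50, sd = 10)
skewed_sample <- rexp(100, rate = 0.1)

# Shapiro-Wilk test on each sample
normal_result <- shapiro.test(normal_sample)
skewed_result <- shapiro.test(skewed_sample)

# A small p-value is evidence against normality
normal_result$p.value
skewed_result$p.value
```

With large samples these tests can flag deviations too small to matter in practice, so it is usually worth reading the p-value alongside the histogram and Q-Q plot.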

#### Handling Non-Normal Data:
- **Transformations:** Apply log, square root, or inverse transformations.
  ```r
  # log() needs strictly positive values
  transformed_data <- log(my_data$Age)
  ```
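- **Use Nonparametric Tests:** If the data remain non-normal after transformation, use a rank-based test such as the Wilcoxon rank-sum test (two groups) or the Kruskal-Wallis test (three or more groups).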
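
Both options can be sketched on simulated data. The example below draws right-skewed values for three hypothetical groups (`A`, `B`, `C`), log-transforms them, and runs a Kruskal-Wallis test. Because the test works on ranks, it gives the same statistic before and after a monotonic transformation such as `log()`.

```r
# Simulated right-skewed measurements in three hypothetical groups
set.seed(1)
values <- c(rlnorm(20, meanlog = 1),
            rlnorm(20, meanlog = 1.5),
            rlnorm(20, meanlog = 2))
group <- factor(rep(c("A", "B", "C"), each = 20))

# Option 1: transform toward symmetry
log_values <- log(values)

# Option 2: rank-based test on the original scale
kw_result <- kruskal.test(values ~ group)

# The same test on the log scale: ranks are unchanged by log()
kw_log <- kruskal.test(log_values ~ group)

kw_result$statistic
kw_log$statistic
```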

---

## **Section 4: Inferential Statistics**

### **4.1 t-Tests**
#### Independent t-Test:
```r
# Compare mean Age between the two Gender groups (Welch's t-test by default; add var.equal = TRUE for Student's)
t.test(Age ~ Gender, data = my_data)
```
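
The object returned by `t.test()` can be stored and queried, which is handy for reporting. A minimal sketch with an invented `toy` data frame, comparing the default Welch test with the equal-variance (Student's) version:

```r
# Hypothetical data: five patients per group
toy <- data.frame(
  Age    = c(52, 48, 61, 57, 45, 66, 70, 59, 63, 72),
  Gender = rep(c("F", "M"), each = 5)
)

# Welch's t-test (default) and Student's t-test (equal variances assumed)
welch   <- t.test(Age ~ Gender, data = toy)
student <- t.test(Age ~ Gender, data = toy, var.equal = TRUE)

# Pull out individual pieces of the result
welch$estimate    # group means
welch$conf.int    # confidence interval for the difference in means
welch$p.value
student$parameter # degrees of freedom
```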

### **4.2 Nonparametric Tests**
#### Wilcoxon Test:
```r
# Wilcoxon rank-sum (Mann-Whitney) test of Age between the two Gender groups
wilcox.test(Age ~ Gender, data = my_data)
```

---

## **Section 5: Regression Analysis**

### **5.1 Linear Regression**
#### Fitting a Linear Model:
```r
# Fit a linear regression model
model <- lm(Age ~ Height, data = my_data)

# Summary of the model
summary(model)
```
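
Beyond `summary()`, a fitted model can be queried for its coefficients, confidence intervals, and predictions. The sketch below uses an invented `toy` data frame with the same column names as above; the value `175` passed to `predict()` is an arbitrary example.

```r
# Hypothetical data with the same column names as above
toy <- data.frame(
  Height = c(150, 160, 170, 180, 190),
  Age    = c(20, 31, 39, 52, 58)
)

model <- lm(Age ~ Height, data = toy)

# Intercept and slope
coef(model)

# 95% confidence intervals for the coefficients
confint(model)

# Predicted Age for a new Height value
predicted_age <- predict(model, newdata = data.frame(Height = 175))
predicted_age

# Proportion of variance explained
summary(model)$r.squared
```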

#### Plotting the Regression Line:
```r
# Scatter plot with regression line
plot(my_data$Height, my_data$Age,
     main = "Age vs. Height",
     xlab = "Height",
     ylab = "Age")
abline(model, col = "red")
```

---

## **Section 6: Data Visualization**

### **6.1 Basic Plots**
#### Bar Plot:
```r
# Bar plot of Gender
barplot(table(my_data$Gender),
        main = "Gender Distribution",
        xlab = "Gender",
        ylab = "Frequency")
```

### **6.2 Advanced Visualization with ggplot2**
#### Creating a Scatter Plot:
```r
# Install ggplot2 once (skip this line if it is already installed)
install.packages("ggplot2")
library(ggplot2)

# Scatter plot with ggplot2
ggplot(my_data, aes(x = Height, y = Age)) +
  geom_point() +
  geom_smooth(method = "lm", col = "red") +
  labs(title = "Age vs. Height",
       x = "Height",
       y = "Age")
```

---

## **Section 7: Final Project**

### **Project Description:**
1. Load a medical dataset (e.g., patient data).
2. Perform the following analyses:
   - Descriptive statistics.
   - Inferential testing (e.g., t-tests, chi-square tests).
   - Regression modeling.
3. Visualize key findings with plots.
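
The chi-square test named in step 2 is not covered in the earlier sections, so here is a minimal sketch. It tests for an association between two categorical variables in a 2 x 2 table. The variable names (`Treatment`, `Outcome`) and all counts are invented for illustration.

```r
# Hypothetical 2 x 2 table of counts (invented numbers)
counts <- matrix(
  c(30, 20,
    15, 35),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    Treatment = c("Drug", "Placebo"),
    Outcome   = c("Improved", "Not improved")
  )
)

# Chi-square test of independence
# (for 2 x 2 tables R applies Yates' continuity correction by default)
chi_result <- chisq.test(counts)

# Expected counts under independence, and the p-value
chi_result$expected
chi_result$p.value
```

With a data frame, the same table can be built with `table()` on two categorical columns and passed to `chisq.test()`.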

#### Example Steps:
```r
# Step 1: Load data
final_data <- read.csv("path/to/your/final_dataset.csv")

# Step 2: Perform descriptive analysis
summary(final_data)

# Step 3: Visualize data
hist(final_data$Age, main = "Age Distribution", xlab = "Age")

# Step 4: Perform regression analysis
final_model <- lm(Age ~ Height, data = final_data)
summary(final_model)

# Step 5: Create a report (use R Markdown for formatting)
```
