# Biostatistics in R: Working with CSV/Excel Files and Data Frames

## **Session Objectives**
By the end of this session, participants will be able to:

1. Load data from CSV and Excel files into R.
2. Manipulate rows and columns in a data frame.
3. Calculate common statistical measures: mean, variance, standard deviation, quantiles, and median.
4. Create a frequency table and chart the distribution.

---

## **1. Loading Data into R**

### Installing Necessary Packages
Before we start, install the required packages. This is needed only once per R installation; afterwards, `library()` loads them in each session:
```R
install.packages("readr")   # For reading CSV files
install.packages("readxl")  # For reading Excel files
```

### Loading a CSV File
Example:
```R
library(readr)
# Load the CSV file into a data frame
data <- read_csv("path/to/your/file.csv")

# View the first few rows of the data
tibble::glimpse(data)  # Compact view; glimpse() lives in tibble (installed with readr), not in readr itself
head(data)             # First 6 rows
```
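
If `readr` is not available, base R's `read.csv()` reads the same kind of file. The sketch below is a minimal, self-contained illustration: it writes a small temporary CSV with made-up patient records (the column names `PatientID`, `Age`, and `Group` are hypothetical) and reads it back.

```R
# Hypothetical patient records, written to a temporary file so the example runs anywhere
csv_path <- tempfile(fileext = ".csv")
writeLines(c("PatientID,Age,Group",
             "1,34,control",
             "2,51,treatment",
             "3,NA,treatment"),
           csv_path)

# Base R reader; returns a plain data.frame rather than a tibble
patients <- read.csv(csv_path, stringsAsFactors = FALSE)

str(patients)               # Column types as R inferred them
sum(is.na(patients$Age))    # Number of missing ages: 1
```

Checking the inferred column types right after loading catches the most common import problem: a numeric column read as text because of a stray character.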

### Loading an Excel File
Example:
```R
library(readxl)
# Load the Excel file into a data frame
data <- read_excel("path/to/your/file.xlsx")

# View the first few rows
head(data)
```
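
Excel workbooks often hold several sheets. As a rough sketch, the example below lists the sheets of a sample workbook that ships with `readxl` and reads one of them by name. The `requireNamespace()` check is an addition for this illustration so that the script does not stop on machines where `readxl` has not been installed yet.

```R
# Only run the example if readxl is installed
if (requireNamespace("readxl", quietly = TRUE)) {
  # readxl_example() returns the path to a sample workbook bundled with the package
  path <- readxl::readxl_example("datasets.xlsx")

  sheets <- readxl::excel_sheets(path)   # Names of all worksheets in the file
  print(sheets)

  # Read a single sheet by name instead of the default first sheet
  chick <- readxl::read_excel(path, sheet = "chickwts")
  print(head(chick))
}
```

For your own file, replace `path` with the location of the workbook and `sheet` with one of the names returned by `excel_sheets()`.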

---

## **2. Working with Rows and Columns**

### Selecting Columns
- Use `$` to extract a single column as a vector, or `[` to select one or more columns as a data frame.
```R
# Access a column by name
data$ColumnName

# Access multiple columns
data[c("Column1", "Column2")]
```
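
The difference in return type matters because functions such as `mean()` expect a vector, not a data frame. A minimal sketch, using a hypothetical data frame `df` with made-up `Age` and `Weight` columns:

```R
# Hypothetical data frame used only for illustration
df <- data.frame(Age = c(34, 51, 47), Weight = c(70.5, 82.0, 64.3))

ages_vec  <- df$Age         # Numeric vector
ages_df   <- df["Age"]      # One-column data frame
ages_vec2 <- df[["Age"]]    # Same vector as df$Age; the name may be stored in a variable

is.numeric(ages_vec)              # TRUE
is.data.frame(ages_df)            # TRUE
identical(ages_vec, ages_vec2)    # TRUE
```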

### Selecting Rows
- Use index ranges or logical conditions to select rows. The trailing comma in `data[rows, ]` means "keep all columns".
```R
# Select the first 10 rows
data[1:10, ]

# Filter rows where ColumnA > 50
filtered_data <- data[data$ColumnA > 50, ]
```
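
One caveat with logical filtering: if the column contains missing values, the comparison yields `NA` for those rows, and each `NA` produces an all-`NA` row in the result. The sketch below shows the effect on a hypothetical data frame (`ID` and `Score` are made-up columns) and two common ways to avoid it.

```R
# Hypothetical scores with one missing value
df <- data.frame(ID = 1:4, Score = c(62, NA, 45, 88))

# Logical indexing keeps an all-NA row wherever the condition is NA
with_na_row <- df[df$Score > 50, ]
nrow(with_na_row)    # 3: rows 1 and 4 plus one all-NA row

# which() keeps only positions where the condition is TRUE
clean <- df[which(df$Score > 50), ]
nrow(clean)          # 2

# subset() also treats NA conditions as FALSE
clean2 <- subset(df, Score > 50)
nrow(clean2)         # 2
```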

### Adding Columns
```R
# Create a new column based on existing columns
data$NewColumn <- data$ColumnA + data$ColumnB
```

### Removing Columns
```R
# Remove a column
data$ColumnToRemove <- NULL
```
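
Adding and removing columns can be combined in a short worked example. The sketch below assumes a hypothetical data frame with made-up `WeightKg`, `HeightM`, and `Notes` columns; it derives a body mass index column and then drops the column that is no longer needed.

```R
# Hypothetical measurements used only for illustration
df <- data.frame(WeightKg = c(70, 82),
                 HeightM  = c(1.75, 1.80),
                 Notes    = c("a", "b"))

# New column computed element-wise from two existing columns
df$BMI <- df$WeightKg / df$HeightM^2

# Assigning NULL removes the column
df$Notes <- NULL

names(df)    # "WeightKg" "HeightM" "BMI"
```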

### Summary Example
```R
# Check structure and summary of the data
str(data)
summary(data)
```

---

## **3. Statistical Measures**

### Common Statistics

1. **Mean:**
   ```R
   mean(data$ColumnA, na.rm = TRUE)  # Ignore missing values
   ```

2. **Variance:**
   ```R
   var(data$ColumnA, na.rm = TRUE)
   ```

3. **Standard Deviation:**
   ```R
   sd(data$ColumnA, na.rm = TRUE)
   ```

4. **Quantiles:**
   ```R
   quantile(data$ColumnA, probs = c(0.25, 0.5, 0.75), na.rm = TRUE)
   ```

5. **Median:**
   ```R
   median(data$ColumnA, na.rm = TRUE)
   ```
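
These measures are related to one another, which can be checked directly. The sketch below uses a made-up numeric vector `x` to show that `var()` computes the sample variance (dividing by n - 1), that `sd()` is its square root, and that the median equals the 50% quantile.

```R
# Hypothetical measurements, used only for illustration
x <- c(4, 8, 15, 16, 23, 42)
n <- length(x)

# Sample variance: squared deviations from the mean, divided by n - 1
manual_var <- sum((x - mean(x))^2) / (n - 1)
all.equal(manual_var, var(x))     # TRUE

# Standard deviation is the square root of the variance
all.equal(sqrt(var(x)), sd(x))    # TRUE

# The median is the 50% quantile (unname() drops the "50%" label)
all.equal(median(x), unname(quantile(x, probs = 0.5)))    # TRUE
```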

### Example: Patient Ages
Assume `data$Age` contains the ages of patients.
```R
mean(data$Age, na.rm = TRUE)   # Calculate the mean age
sd(data$Age, na.rm = TRUE)     # Calculate the standard deviation of ages
```
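
To see why `na.rm = TRUE` appears in every call, consider a small made-up vector of ages with one missing value. This is only an illustrative sketch; the numbers are hypothetical.

```R
# Hypothetical ages; one patient's age was not recorded
age <- c(23, 35, 41, NA, 58, 62)

mean(age)                             # NA, because one value is missing
mean_age <- mean(age, na.rm = TRUE)   # 43.8, computed from the 5 observed ages
mean_age

# Quartiles of the observed ages
age_quartiles <- quantile(age, probs = c(0.25, 0.5, 0.75), na.rm = TRUE)
age_quartiles                         # 25%: 35, 50%: 41, 75%: 58

sum(is.na(age))                       # 1 missing value was dropped
```

Reporting the number of missing values alongside the statistics makes clear how many observations each figure is based on.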

---

## **4. Frequency Distribution and Visualization**

### Frequency Table
```R
# Create a frequency table: counts of each distinct value (most useful for categorical columns)
table(data$ColumnA)
```
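
A frequency table is most informative for categorical data; a numeric column usually needs to be grouped into intervals first. The sketch below illustrates both cases with made-up values (`blood_type` and `age` are hypothetical), using `prop.table()` for relative frequencies and `cut()` for binning.

```R
# Hypothetical categorical variable
blood_type <- c("A", "O", "B", "O", "A", "O", "AB", "O")

counts <- table(blood_type)
counts                # A: 2, AB: 1, B: 1, O: 4
prop.table(counts)    # Relative frequencies (proportions that sum to 1)

# Hypothetical numeric variable: group into intervals before tabulating
age <- c(23, 35, 41, 58, 62, 37, 49)
age_group <- cut(age, breaks = c(20, 40, 60, 80))   # (20,40], (40,60], (60,80]
age_counts <- table(age_group)
age_counts            # 3, 3, 1
```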

### Bar Plot
```R
# Plot a bar chart
barplot(table(data$ColumnA), main = "Frequency Distribution", col = "blue")
```

### Histogram
```R
# Plot a histogram (for numeric columns; values are grouped into bins)
hist(data$ColumnA, main = "Histogram of ColumnA", xlab = "Values", col = "lightblue")
```
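
By default `hist()` chooses the bins itself. As a sketch of how to control and inspect the binning, the example below passes explicit `breaks` and uses `plot = FALSE` to return the bin counts without drawing; the `age` values are made up for illustration.

```R
# Hypothetical ages, used only for illustration
age <- c(23, 35, 41, 58, 62, 37, 49)

# Bins of width 10 from 20 to 70; plot = FALSE returns the histogram object only
h <- hist(age, breaks = seq(20, 70, by = 10), plot = FALSE)

h$breaks    # 20 30 40 50 60 70
h$counts    # 1 2 2 1 1
```

Calling `hist()` with the same `breaks` but without `plot = FALSE` draws the chart with exactly these bins.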

---

## **Assignment for Next Week**

1. **Load Data:** Load a CSV or Excel file into R.
2. **Manipulate Data:**
   - Add a new column that combines or transforms existing data.
   - Filter rows based on a condition.
3. **Calculate Statistics:** Compute the mean, variance, standard deviation, median, and quantiles for at least one column.
4. **Create Visualizations:**
   - Create a frequency table.
   - Generate a bar plot and a histogram.
5. **Submit:** Hand in your R script along with the dataset.

