# Introduction to R and Basic Statistics

## Session Objectives
The aim of this session is to introduce participants to the R programming language and RStudio environment. By the end of the session, participants will:

1. Install and set up R and RStudio.
2. Navigate the RStudio interface.
3. Learn basic R commands and environment setup.
4. Understand data types in R.
5. Perform basic mathematical and logical operations.
6. Work with vectors, matrices, and data frames.
7. Handle missing data and type conversions in R.

---

## **1. Installing R and RStudio**

1. **Download R:**
   - Visit [CRAN R Project](https://cran.r-project.org/) and download the latest version of R for your operating system.
2. **Download RStudio:**
   - Visit [RStudio](https://posit.co/download/rstudio-desktop/) and download the free desktop version.
3. **Installation:**
   - Follow the instructions to install both R and RStudio.

---

## **2. Exploring the RStudio Environment**

### Key Panels in RStudio:
- **Console:** Where you execute R commands.
- **Script Editor:** To write and save scripts.
- **Environment/History:** Displays objects and command history.
- **Plots/Help/Files:** Visualizations, documentation, and file navigation.

### Example:
Open RStudio and type the following in the Console:
```R
print("Welcome to R!")
```

---

## **3. Basic Commands in R**

### Getting Help:
- `help()` or `?`: To get help about a function.
  ```R
  help(mean)
  ?mean
  ```
- `??`: Search for keywords.
  ```R
  ??"linear model"
  ```

### Working Directory:
- `getwd()`: Get the current working directory.
- `setwd("path")`: Set a new working directory.
  ```R
  getwd()
  setwd("C:/Users/YourName/Documents")
  ```

---

## **4. Data Types in R**

### Types of Variables:
1. **Character:** Strings of text.
   ```R
   var_char <- "Hello, R!"
   class(var_char)  # Output: "character"
   ```
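2. **Numeric:** Real numbers, stored in double precision. This is the default for any number typed without a suffix, including whole numbers such as `10`.
   ```R
   var_num <- 10.5
   class(var_num)  # Output: "numeric"
   ```
3. **Integer:** Whole numbers, created with `as.integer()` or the `L` suffix (for example, `10L`).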
   ```R
   var_int <- as.integer(10)
   class(var_int)  # Output: "integer"
   ```
4. **Complex:** Complex numbers.
   ```R
   var_comp <- 3 + 2i
   class(var_comp)  # Output: "complex"
   ```
5. **Logical:** TRUE or FALSE.
   ```R
   var_log <- TRUE
   class(var_log)  # Output: "logical"
   ```
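
As a small sketch of how these types can be checked in practice, the snippet below uses `typeof()` and the `is.*()` functions alongside `class()`. The variable names are illustrative only.

```R
# A number typed without a suffix is stored as a double ("numeric")
x_double <- 10
# The L suffix creates an integer directly, without as.integer()
x_int <- 10L

class(x_double)     # "numeric"
class(x_int)        # "integer"
typeof(x_double)    # "double"

# is.*() functions test for a type and return TRUE or FALSE
is.numeric(x_int)   # TRUE: integers also count as numeric
is.character("10")  # TRUE: the quotes make it text, not a number
```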

---

## **5. Mathematical and Logical Operations**

### Arithmetic Operators:
```R
# Addition, subtraction, multiplication, division
5 + 3
5 - 3
5 * 3
5 / 3
```

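### Comparison and Logical Operators:
```R
# Comparison: greater than, less than, equal to, not equal to
5 > 3
5 < 3
5 == 3
5 != 3
# Logical: AND (&), NOT (!)
(5 > 3) & (5 == 3)
!(5 == 3)
```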

---

## **6. Working with Data Structures**

### Vectors:
A collection of elements of the same type.
```R
num_vec <- c(1, 2, 3, 4)  # avoid the name "vector", which is a base R function
class(num_vec)  # Output: "numeric"
```
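
Because a vector can hold only one type, R converts mixed inputs to the most flexible type present. The following sketch, with illustrative variable names, shows this coercion together with indexing and element-wise arithmetic.

```R
# Mixing numbers, text, and logicals coerces everything to character
mixed <- c(1, "two", TRUE)
class(mixed)        # "character"

scores <- c(10, 20, 30, 40)

# Indexing starts at 1, not 0
scores[1]           # 10
scores[2:3]         # 20 30

# Arithmetic is applied to every element
scores * 2          # 20 40 60 80
length(scores)      # 4
```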

### Matrices:
Two-dimensional arrays with rows and columns.
```R
my_matrix <- matrix(1:9, nrow = 3, byrow = TRUE)  # avoid reusing the function name "matrix"
print(my_matrix)
```

### Data Frames:
Tables where each column can contain different types of data.
```R
people <- data.frame(Name = c("Alice", "Bob"), Age = c(25, 30))
print(people)
```

---

## **7. Type Conversion**

### Examples:
```R
num <- 42
char <- as.character(num)  # Convert numeric to character
num_back <- as.numeric(char)  # Convert character to numeric
```
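
Conversions do not always succeed or behave as one might guess. The sketch below, using illustrative variable names, shows three common cases: text that cannot be parsed as a number, decimals converted to integers, and logical values converted to numbers.

```R
# Text that does not look like a number becomes NA (R also issues a warning)
bad_num <- suppressWarnings(as.numeric("abc"))
is.na(bad_num)              # TRUE

# as.integer() drops the decimal part instead of rounding
as.integer(3.9)             # 3

# Logical values convert to 1 (TRUE) and 0 (FALSE)
as.numeric(c(TRUE, FALSE))  # 1 0

# Recognised strings convert to logical values
as.logical("TRUE")          # TRUE
```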

---

## **8. Handling Missing Data**

### Identifying Missing Data:
```R
measurements <- c(1, 2, NA, 4)  # named to avoid masking the base function data()
is.na(measurements)  # Identify missing values
```

### Removing Missing Data:
```R
clean_data <- na.omit(measurements)
print(clean_data)
```
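
Removing values is not always necessary. Many summary functions accept `na.rm = TRUE` to skip missing values during the calculation. A minimal sketch, with an illustrative vector name:

```R
values <- c(1, 2, NA, 4)

# Any calculation that touches an NA returns NA by default
mean(values)                 # NA

# na.rm = TRUE drops missing values before calculating
mean(values, na.rm = TRUE)   # 2.333333

# Count how many values are missing
sum(is.na(values))           # 1
```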

---

## **Assignment for Next Week**

1. Create a script file in RStudio and:
   - Write R code to create a vector, matrix, and data frame.
   - Perform basic arithmetic and logical operations.
   - Identify and handle missing data in a sample dataset.
2. Create a data frame with at least 10 columns and 10 rows, including some `NA` values. Calculate the mean, maximum, and minimum for each row.



# Lesson: Working with Matrices in R

## Objectives
By the end of this lesson, participants will:
- Understand what matrices are and their applications in data analysis.
- Learn how to create matrices in R.
- Explore operations and functions specific to matrices.
- Practice subsetting, modifying, and performing calculations with matrices.

---

## 1. What is a Matrix?
A **matrix** is a two-dimensional array in which elements are arranged in rows and columns. All elements in a matrix must have the same data type (e.g., numeric, character).

**Applications**:
- Representing datasets.
- Mathematical computations (e.g., linear algebra).
- Image processing.

---

## 2. Creating Matrices
### 2.1 Using `matrix()`
Syntax:
```R
matrix(data, nrow, ncol, byrow = FALSE, dimnames = NULL)
```
- **data**: Elements to fill the matrix.
- **nrow**: Number of rows.
- **ncol**: Number of columns.
- **byrow**: Logical. Fill by rows if `TRUE`, otherwise by columns.
- **dimnames**: Optional names for rows and columns.

### Example:
```R
# Create a 3x3 matrix with numbers 1 to 9
mat <- matrix(1:9, nrow = 3, ncol = 3)
print(mat)

# Create a matrix filling by rows
mat_byrow <- matrix(1:9, nrow = 3, byrow = TRUE)
print(mat_byrow)
```
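
The `dimnames` argument listed above is not used in the example. One way it might be used is sketched below. The row and column labels are hypothetical and chosen only for illustration.

```R
# dimnames takes a list: row names first, then column names
sales <- matrix(
  1:6,
  nrow = 2,
  dimnames = list(c("store_a", "store_b"), c("jan", "feb", "mar"))
)
print(sales)

# Named rows and columns can be used for indexing
sales["store_b", "feb"]   # 4

dim(sales)                # 2 3
```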

### 2.2 Using `cbind()` and `rbind()`
- `cbind()`: Combines vectors as columns.
- `rbind()`: Combines vectors as rows.

### Example:
```R
# Combine vectors into a matrix
col1 <- c(1, 2, 3)
col2 <- c(4, 5, 6)
mat_cbind <- cbind(col1, col2)
print(mat_cbind)

row1 <- c(1, 2, 3)
row2 <- c(4, 5, 6)
mat_rbind <- rbind(row1, row2)
print(mat_rbind)
```

---

## 3. Accessing Matrix Elements
### 3.1 Indexing
Syntax:
```R
matrix[row, column]
```
- Leave row or column blank to select all.

### Example:
```R
# Access element in 2nd row, 3rd column
mat[2, 3]

# Access entire 1st row
mat[1, ]

# Access entire 2nd column
mat[, 2]
```
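
Indexing also accepts vectors of positions and logical conditions. The sketch below uses its own small matrix `m` (an illustrative name) so that it can be run independently of the examples above.

```R
m <- matrix(1:9, nrow = 3)   # filled by column

# Select several rows and columns at once
m[1:2, c(1, 3)]

# Selecting a single row returns a plain vector by default
is.matrix(m[1, ])                 # FALSE

# drop = FALSE keeps the result as a 1 x 3 matrix
is.matrix(m[1, , drop = FALSE])   # TRUE

# A logical condition selects the matching elements
m[m > 6]                          # 7 8 9
```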

### 3.2 Modifying Elements
```R
# Modify specific element
mat[1, 1] <- 99
print(mat)

# Replace an entire row
mat[3, ] <- c(10, 11, 12)
print(mat)
```

---

## 4. Matrix Operations
### 4.1 Arithmetic
- Addition, subtraction, multiplication, and division with `+`, `-`, `*`, and `/` are element-wise, so both matrices must have the same dimensions.

### Example:
```R
mat2 <- matrix(10:18, nrow = 3)

# Element-wise addition
result_add <- mat + mat2
print(result_add)

# Element-wise multiplication
result_mult <- mat * mat2
print(result_mult)
```

### 4.2 Matrix Multiplication
Use `%*%` for matrix multiplication. The number of columns in the first matrix must equal the number of rows in the second.
```R
# Matrix multiplication
result_mat_mult <- mat %*% mat2
print(result_mat_mult)
```
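
The dimension rule is easier to see with non-square matrices. In this sketch, `a` and `b` are illustrative names for a 2 x 3 and a 3 x 2 matrix.

```R
a <- matrix(1:6, nrow = 2)   # 2 x 3
b <- matrix(1:6, nrow = 3)   # 3 x 2

# Columns of a (3) match rows of b (3), so the product is defined
product <- a %*% b
dim(product)                 # 2 2
print(product)

# b %*% a is also defined here, but gives a 3 x 3 result
dim(b %*% a)                 # 3 3
```

Reversing the order changes the shape of the result, which is a reminder that matrix multiplication is not commutative.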

---

## 5. Built-in Functions for Matrices
### 5.1 Mathematical Functions
- `rowSums()`: Sum of each row.
- `colSums()`: Sum of each column.
- `rowMeans()`: Mean of each row.
- `colMeans()`: Mean of each column.

### Example:
```R
# Row and column sums
row_sums <- rowSums(mat)
col_sums <- colSums(mat)

# Row and column means
row_means <- rowMeans(mat)
col_means <- colMeans(mat)
```
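
For row-wise or column-wise statistics other than sums and means, base R's `apply()` is one option. The sketch below assumes a small illustrative matrix `m`. The second argument selects rows (`1`) or columns (`2`).

```R
m <- matrix(c(4, 9, 7, 2, 1, 8), nrow = 2)   # 2 x 3, filled by column

# MARGIN = 1 applies the function to each row
row_max <- apply(m, 1, max)
row_max                      # 7 9

# MARGIN = 2 applies the function to each column
col_min <- apply(m, 2, min)
col_min                      # 4 2 1
```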

### 5.2 Transpose and Determinant
- `t()`: Transpose a matrix.
- `det()`: Calculate determinant (only for square matrices).

### Example:
```R
# Transpose
transposed_mat <- t(mat)
print(transposed_mat)

# Determinant
# mat2 has linearly dependent columns, so the result is 0 (up to rounding error)
det_value <- det(mat2)
print(det_value)
```
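
A hand-checkable case can make these two functions more concrete. The matrices below are illustrative and were chosen so the determinant is easy to verify by hand.

```R
# Transposing swaps rows and columns
rect <- matrix(1:6, nrow = 2)     # 2 x 3
dim(t(rect))                      # 3 2

# Determinant of a 2 x 2 matrix: 2 * 3 - 1 * 1
m <- matrix(c(2, 1, 1, 3), nrow = 2)
det(m)                            # 5

# A determinant of zero means the matrix has no inverse
singular <- matrix(c(1, 2, 2, 4), nrow = 2)
det(singular)                     # 0
```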

---

## 6. Handling Missing Values
### Example:
```R
# Create a matrix with NA
mat_with_na <- matrix(c(1, NA, 3, 4, 5, NA), nrow = 2)

# Check for missing values
is_na <- is.na(mat_with_na)
print(is_na)

# Replace NA with a value (e.g., 0)
mat_with_na[is.na(mat_with_na)] <- 0
print(mat_with_na)
```
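
Replacing `NA` with 0 changes results such as sums and means, so it can be preferable to keep the `NA` values and ask functions to skip them. A minimal sketch, using an illustrative matrix name:

```R
m_na <- matrix(c(1, NA, 3, 4, 5, NA), nrow = 2)

# Count the missing values in each column
colSums(is.na(m_na))           # 1 0 1

# Sums return NA for any row that contains a missing value
rowSums(m_na)                  # 9 NA

# na.rm = TRUE skips the missing values instead
rowSums(m_na, na.rm = TRUE)    # 9 4
```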

---

## 7. Visualizing Matrices
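Use `image()` to create a heatmap-like visualization. Note that `image()` puts matrix rows along the x-axis and starts column 1 at the bottom, so the picture appears rotated 90 degrees counter-clockwise relative to the printed matrix.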

### Example:
```R
# Visualize a matrix
image(mat, main = "Matrix Visualization", col = heat.colors(10))
```

---

## 8. Exercise
1. Create a 4x4 matrix with random numbers between 1 and 100.
2. Calculate the sum, mean, and transpose of the matrix.
3. Replace all values greater than 50 with 50.
4. Visualize the resulting matrix using `image()`.

---

This lesson covered creating, indexing, and modifying matrices, along with matrix arithmetic, built-in functions, missing values, and visualization in R.



# Lesson: Data Frames in R

## Objective
This lesson introduces the concept of data frames in R, their structure, and various operations that can be performed on them. By the end of this lesson, you will be able to create, manipulate, and analyze data frames efficiently.

---

## What is a Data Frame?
A data frame in R is a two-dimensional, tabular data structure similar to a spreadsheet or a database table. It consists of rows and columns, where each column can have a different data type (numeric, character, factor, etc.).

### Key Features:
- Columns represent variables.
- Rows represent observations.
- Columns can have different data types.

---

## Creating a Data Frame
You can create a data frame using the `data.frame()` function.

### Example:
```R
# Creating a data frame
my_data <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(25, 30, 35),
  Score = c(85.5, 92.3, 88.7)
)

# Display the data frame
print(my_data)
```
Output:
```
     Name Age Score
1   Alice  25  85.5
2     Bob  30  92.3
3 Charlie  35  88.7
```
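
After creating a data frame, it is often useful to inspect its structure before working with it. The sketch below rebuilds the same data under a separate, illustrative name (`students`) so it can be run on its own.

```R
students <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(25, 30, 35),
  Score = c(85.5, 92.3, 88.7)
)

str(students)      # column types and a preview of the values
dim(students)      # 3 3 (rows, columns)
names(students)    # "Name" "Age" "Score"
head(students, 2)  # first two rows
```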

---

## Accessing Data in a Data Frame
### 1. Accessing Columns
- Use the `$` operator or `[[ ]]`.

#### Example:
```R
# Access the 'Name' column
names_column <- my_data$Name

# Access the 'Score' column using [[ ]]
score_column <- my_data[["Score"]]
```
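
Single brackets behave differently from `$` and `[[ ]]`: they return a data frame rather than the column itself. A short sketch, using an illustrative data frame named `students`:

```R
students <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Score = c(85.5, 92.3, 88.7)
)

# [[ ]] and $ return the column itself, as a vector
class(students[["Score"]])    # "numeric"

# Single brackets return a one-column data frame
class(students["Score"])      # "data.frame"

# Single brackets can also select several columns by name
students[c("Name", "Score")]
```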

### 2. Accessing Rows
- Use indexing with `[row, ]`.

#### Example:
```R
# Access the first row
first_row <- my_data[1, ]
```

### 3. Accessing Specific Elements
- Use `[row, column]`.

#### Example:
```R
# Access the element in the 2nd row and 3rd column
element <- my_data[2, 3]
```

### 4. Accessing Subsets
#### Example:
```R
# Select rows where Age > 28
subset_data <- my_data[my_data$Age > 28, ]
```
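
Conditions can be combined, and columns can be selected in the same step. The sketch below assumes an illustrative data frame named `students` and also shows base R's `subset()` as an alternative way to write the filter.

```R
students <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(25, 30, 35),
  Score = c(85.5, 92.3, 88.7)
)

# Combine conditions with & (and) or | (or)
older_high <- students[students$Age > 28 & students$Score > 90, ]
older_high$Name                       # "Bob"

# Keep only some columns by naming them after the comma
students[students$Age > 28, c("Name", "Score")]

# subset() expresses the same filter without repeating the data frame name
subset(students, Age > 28, select = c(Name, Score))
```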

---

## Adding or Modifying Data
### 1. Adding Columns
```R
# Add a new column 'Passed'
my_data$Passed <- my_data$Score > 90
```

### 2. Adding Rows
```R
# Add a new row
new_row <- data.frame(Name = "David", Age = 28, Score = 89.0, Passed = FALSE)
my_data <- rbind(my_data, new_row)
```

### 3. Modifying Values
```R
# Update Bob's Score, selecting the row by name rather than by position
my_data$Score[my_data$Name == "Bob"] <- 95.0
```

---

## Summary Statistics
You can calculate summary statistics for numeric columns.

#### Example:
```R
# Calculate the mean score
mean_score <- mean(my_data$Score)

# Calculate the standard deviation
sd_score <- sd(my_data$Score)

# Summary of the entire data frame
summary(my_data)
```
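
To summarise several numeric columns at once, one approach is to identify the numeric columns first and then apply a function to them. The sketch below uses an illustrative data frame named `students`.

```R
students <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(25, 30, 35),
  Score = c(85.5, 92.3, 88.7)
)

# Find which columns are numeric (TRUE for Age and Score)
numeric_cols <- sapply(students, is.numeric)

# Mean of every numeric column
col_means <- colMeans(students[, numeric_cols])
print(col_means)

# Median of a single column
median(students$Score)       # 88.7
```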

---

## Handling Missing Data
### 1. Introducing Missing Data
```R
# Set the Score of Charlie to NA
my_data[3, "Score"] <- NA
```

### 2. Identifying Missing Data
```R
# Check for missing values
is.na(my_data)
```

### 3. Removing Missing Data
```R
# Remove rows with NA values
# (saved under a new name so my_data keeps its NA for the next step)
complete_data <- na.omit(my_data)
```

### 4. Replacing Missing Data
```R
# Replace NA with the mean of the column
my_data$Score[is.na(my_data$Score)] <- mean(my_data$Score, na.rm = TRUE)
```
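
Before removing or replacing anything, it can help to see how much data is missing and where. The sketch below uses a separate, illustrative data frame (`students`) with one `NA` in each of two columns.

```R
students <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(25, NA, 35),
  Score = c(85.5, 92.3, NA)
)

# Number of missing values in each column
colSums(is.na(students))          # Name 0, Age 1, Score 1

# TRUE for rows that contain no NA at all
complete.cases(students)          # TRUE FALSE FALSE

# Keep only the complete rows, leaving the original untouched
complete_rows <- students[complete.cases(students), ]
nrow(complete_rows)               # 1
```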

---

## Visualizing Data Frames
You can visualize the data using built-in plotting functions.

#### Example:
```R
# Plot Age vs Score
plot(my_data$Age, my_data$Score, main = "Age vs Score", xlab = "Age", ylab = "Score", col = "blue", pch = 16)
```

---

## Practice Exercises
1. Create a data frame with at least 5 columns (e.g., Name, Age, Gender, Height, Weight) and 10 rows of data.
2. Compute the mean, median, and standard deviation for a numeric column.
3. Add a new column that categorizes data into groups based on a numeric column.
4. Replace missing values in a column with the column's median.
5. Subset the data frame based on specific criteria and plot a relationship between two columns.

---

## Conclusion
Data frames are one of the most powerful and flexible data structures in R. Understanding how to manipulate and analyze data frames is essential for statistical analysis and data science workflows.

