# R Basics

This notebook covers fundamental concepts and data structures in R programming, essential for data analysis and statistical computing.

## Setting Up Your Working Directory

The working directory is the default location where R looks for files to load and writes files you save. Setting it explicitly is good practice.

In [31]:

# Get current working directory
getwd()
# Expected Output: [1] "/path/to/your/working/directory"

# Set working directory (replace with your desired path)
# setwd("C:/Users/YourUser/Documents/R_Projects")
# setwd("/home/youruser/R_Projects")
# You can also use the 'Session' -> 'Set Working Directory' menu in RStudio.
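A minimal sketch of switching the working directory and restoring it afterwards. The folder name `r_basics_demo` and the file `note.txt` are hypothetical, and the folder is created under R's per-session temporary directory so nothing on your system is changed permanently.

```r
# Hypothetical project folder inside the session's temp directory
project_dir <- file.path(tempdir(), "r_basics_demo")
dir.create(project_dir, showWarnings = FALSE)

# setwd() invisibly returns the previous working directory
old_wd <- setwd(project_dir)

# Relative paths now resolve against project_dir
writeLines("hello", "note.txt")
file.exists(file.path(project_dir, "note.txt"))  # TRUE

# Restore the original working directory
setwd(old_wd)
```

Saving the value returned by `setwd()` is a convenient way to get back to where you started.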


## Vectors

Vectors are the most basic R data objects. They are one-dimensional arrays that can hold numeric, character, or logical data, but all elements in a vector must be of the same data type. The `c()` function is used to combine elements into a vector.
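The same-type rule can be seen directly: when `c()` receives mixed types, R silently coerces every element to the most flexible type present. The variable names below are illustrative only.

```r
# Numbers mixed with a string: everything becomes character
mixed_num_chr <- c(1, "two", 3)
mixed_num_chr         # "1" "two" "3"
class(mixed_num_chr)  # "character"

# Logicals mixed with a number: TRUE/FALSE become 1/0
mixed_lgl_num <- c(TRUE, FALSE, 2)
mixed_lgl_num         # 1 0 2
class(mixed_lgl_num)  # "numeric"
```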

### Numeric Vector

A vector containing only numbers (integers or decimals).

In [32]:

age <- c(19, 17, 21, 20, 37)
age
# Expected Output: [1] 19 17 21 20 37


### Character Vector

A vector containing text strings. Elements are enclosed in single or double quotes.

In [33]:

name <- c("Luffy", "Chopper", "Zoro", "Nami", "Jinbe")
name
# Expected Output: [1] "Luffy"   "Chopper" "Zoro"    "Nami"    "Jinbe"

gender <- c('M', "M", "M", "F", "M") # Both single and double quotes work
gender
# Expected Output: [1] "M" "M" "M" "F" "M"


### Logical Vector

A vector containing boolean values (`TRUE` or `FALSE`).

In [34]:

gender_is_M <- c(TRUE, TRUE, TRUE, FALSE, TRUE)
gender_is_M
# Expected Output: [1]  TRUE  TRUE  TRUE FALSE  TRUE


### Classes and Types (`class()`, `typeof()`)

In R, `class()` returns the object's class (how it's treated by generic functions), while `typeof()` returns the internal storage type of the object. For atomic vectors, they often return the same value, but for more complex objects, they can differ.

In [35]:

class(age) 
# Expected Output: [1] "numeric"
typeof(age)
# Expected Output: [1] "double"

class(name)
# Expected Output: [1] "character"
typeof(name)
# Expected Output: [1] "character"

class(gender_is_M)
# Expected Output: [1] "logical"
typeof(gender_is_M)
# Expected Output: [1] "logical"

# Define a helper function to print class and type for multiple variables
print_class_type <- function(var, var_name) {
  cat("Class of", var_name, "=", class(var), ",", "Type of", var_name, "=", typeof(var), "\n")
}

# Use the function for each variable
print_class_type(age, "age")
print_class_type(name, "name")
print_class_type(gender, "gender")
print_class_type(gender_is_M, "gender_is_M")
# Expected Output:
# Class of age = numeric , Type of age = double 
# Class of name = character , Type of name = character 
# Class of gender = character , Type of gender = character 
# Class of gender_is_M = logical , Type of gender_is_M = logical 

# Note: 'print' is a built-in function in R, and redefining it can break code that relies on it.
# Give custom helpers their own names instead, for example:
# print_custom <- function(x){cat(x)}
# print_custom("Hello World")
# Expected Output: Hello World


Class of age = numeric , Type of age = double 
Class of name = character , Type of name = character 
Class of gender = character , Type of gender = character 
Class of gender_is_M = logical , Type of gender_is_M = logical 
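To illustrate where `class()` and `typeof()` diverge, here is a short sketch using a factor, a date, and a data frame. The object names (`crew_role`, `start_date`, `small_df`) are made up for this example.

```r
# A factor is stored as integer codes with a "factor" class on top
crew_role <- factor(c("captain", "doctor", "captain"))
class(crew_role)   # "factor"
typeof(crew_role)  # "integer"

# A Date is stored as a double (days since 1970-01-01)
start_date <- as.Date("2024-01-15")
class(start_date)   # "Date"
typeof(start_date)  # "double"

# A data frame is stored as a list of columns
small_df <- data.frame(x = 1:2)
class(small_df)   # "data.frame"
typeof(small_df)  # "list"
```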


### Sequence Generation (`seq()`)

The `seq()` function generates regular sequences of numbers. It's highly flexible.

In [36]:

# Sequence from start to end (by default, step is 1)
sequential <- seq(1, 8)
sequential
# Expected Output: [1] 1 2 3 4 5 6 7 8

# Sequence with a specified step
sequential1 <- seq(1, 8, by = 2) # 'by' argument specifies the step size
sequential1
# Expected Output: [1] 1 3 5 7

# Using 'length.out' to specify the desired length of the sequence
seq_length <- seq(1, 10, length.out = 5)
seq_length
# Expected Output: [1]  1.00  3.25  5.50  7.75 10.00
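As a complement to `seq()`, the sketch below shows a descending sequence and the helpers `seq_len()` and `seq_along()`, which are often a safer choice than `1:n` when `n` may be zero. The vector `crew` is a hypothetical example.

```r
# Descending sequence with a negative step
seq(10, 1, by = -3)  # 10 7 4 1

# seq_len(n) yields an empty sequence when n is 0, whereas 1:0 counts down
seq_len(0)           # integer(0)
1:0                  # 1 0

# seq_along() gives the index positions of an existing vector
crew <- c("Luffy", "Zoro", "Nami")
seq_along(crew)      # 1 2 3
```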


### Repeating Elements (`rep()`)

The `rep()` function is used to replicate elements of vectors and lists.

In [37]:

# Repeat a single number 10 times
re <- rep(8, 10)
re
# Expected Output: [1] 8 8 8 8 8 8 8 8 8 8

# Repeat a string 5 times
re_str <- rep("yes", 5)
re_str
# Expected Output: [1] "yes" "yes" "yes" "yes" "yes"

# Repeat elements of a vector
rep_vec <- rep(c(1, 2), times = 3)
rep_vec
# Expected Output: [1] 1 2 1 2 1 2

# Repeat each element individually
rep_each <- rep(c("A", "B"), each = 2)
rep_each
# Expected Output: [1] "A" "A" "B" "B"
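The `times`, `each`, and `length.out` arguments can also be combined or given as vectors. The following sketch shows three variations; the result names are illustrative.

```r
# A vector for 'times' repeats each element a different number of times
rep_uneven <- rep(c("A", "B"), times = c(3, 1))
rep_uneven   # "A" "A" "A" "B"

# 'each' is applied first, then the whole result is repeated 'times' times
rep_both <- rep(c(1, 2), each = 2, times = 2)
rep_both     # 1 1 2 2 1 1 2 2

# 'length.out' recycles the input until the requested length is reached
rep_len5 <- rep(c(1, 2, 3), length.out = 5)
rep_len5     # 1 2 3 1 2
```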


### Random Number Generation (`runif()`, `round()`)

R provides functions to generate random numbers from various distributions. `runif()` generates random numbers from a uniform distribution.

In [38]:

# Generate 5 random numbers between 0 and 1 (default for runif)
rand <- runif(5)
rand
# Expected Output: [1] 0.1234 0.5678 0.9123 0.3456 0.7890 (values will vary)

# Generate 5 random numbers between 20 and 30
rand1 <- runif(5, min = 20, max = 30)
rand1
# Expected Output: [1] 23.45 28.12 20.76 29.01 25.55 (values will vary)

# Rounding random numbers to the nearest integer
round1 <- round(runif(5, min = 20, max = 30))
round1
# Expected Output: [1] 23 28 21 29 26 (values will vary; this is a fresh draw, independent of rand1)
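Because the draws above change on every run, it can help to fix the random seed when you need reproducible results. The sketch below uses `set.seed()` with an arbitrary seed value of 42. It also shows `sample()` as one possible alternative for random integers: with `round(runif(...))` the two endpoint values (20 and 30) are only half as likely as the values in between, whereas `sample()` picks each integer with equal probability.

```r
# Fixing the seed makes the same "random" numbers come out again
set.seed(42)  # 42 is arbitrary; any integer works
first_draw <- runif(3)
set.seed(42)
second_draw <- runif(3)
identical(first_draw, second_draw)  # TRUE

# Five random integers from 20 to 30, each value equally likely
int_draw <- sample(20:30, size = 5, replace = TRUE)
int_draw
```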


## Data Frames

Data frames are R's standard structure for tabular data. A data frame is essentially a list of equal-length vectors: each vector is a column, and each row is an observation. Different columns can hold different types of data (numeric, character, logical).

In [39]:

# Create vectors first
name <- c("Luffy", "Chopper", "Zoro", "Nami", "Jinbe")
age <- c(19, 17, 21, 20, 37)
gender <- c('M', "M", "M", "F", "M")
gender_is_M <- c(TRUE, TRUE, TRUE, FALSE, TRUE)

# Create a data frame using data.frame()
# cbind() combines vectors by column into a matrix. A matrix holds a single type,
# so mixing character, numeric, and logical vectors coerces every value to character.
df <- data.frame(cbind(name, age, gender, gender_is_M))
df
# Expected Output:
#      name age gender gender_is_M
# 1   Luffy  19      M        TRUE
# 2 Chopper  17      M        TRUE
# 3    Zoro  21      M        TRUE
# 4    Nami  20      F       FALSE
# 5   Jinbe  37      M        TRUE

# Note: cbind() with mixed types coerces all columns to character, so age and gender_is_M are stored as text here.
# Passing the vectors directly to data.frame() preserves each column's type:
df_robust <- data.frame(
  Name = c("Luffy", "Chopper", "Zoro", "Nami", "Jinbe"),
  Age = c(19, 17, 21, 20, 37),
  Gender = c('M', "M", "M", "F", "M"),
  Is_Male = c(TRUE, TRUE, TRUE, FALSE, TRUE)
)
print(df_robust)
str(df_robust) # Check structure and data types
# Expected Output (in R 4.0+, strings stay character; earlier versions convert them to factors by default):
#      Name Age Gender Is_Male
# 1   Luffy  19      M    TRUE
# 2 Chopper  17      M    TRUE
# 3    Zoro  21      M    TRUE
# 4    Nami  20      F   FALSE
# 5   Jinbe  37      M    TRUE
# 'data.frame':	5 obs. of  4 variables:
#  $ Name   : chr  "Luffy" "Chopper" "Zoro" "Nami" ...
#  $ Age    : num  19 17 21 20 37
#  $ Gender : chr  "M" "M" "M" "F" ...
#  $ Is_Male: logi  TRUE TRUE TRUE FALSE TRUE

# fix(df) # Opens a data editor for interactive editing (not runnable in notebook output)


name     age    gender  gender_is_M
<chr>    <chr>  <chr>   <chr>
Luffy    19     M       TRUE
Chopper  17     M       TRUE
Zoro     21     M       TRUE
Nami     20     F       FALSE
Jinbe    37     M       TRUE


     Name Age Gender Is_Male
1   Luffy  19      M    TRUE
2 Chopper  17      M    TRUE
3    Zoro  21      M    TRUE
4    Nami  20      F   FALSE
5   Jinbe  37      M    TRUE
'data.frame':	5 obs. of  4 variables:
 $ Name   : chr  "Luffy" "Chopper" "Zoro" "Nami" ...
 $ Age    : num  19 17 21 20 37
 $ Gender : chr  "M" "M" "M" "F" ...
 $ Is_Male: logi  TRUE TRUE TRUE FALSE TRUE
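The difference between the two construction routes can be checked with `sapply(..., class)`. This is a small sketch with hypothetical two-row vectors (`crew_name`, `crew_age`, `crew_is_male`); `stringsAsFactors = FALSE` is passed explicitly so the result does not depend on the R version.

```r
crew_name <- c("Luffy", "Chopper")
crew_age <- c(19, 17)
crew_is_male <- c(TRUE, TRUE)

# Route 1: via cbind(), which builds a character matrix first
via_cbind <- data.frame(cbind(crew_name, crew_age, crew_is_male),
                        stringsAsFactors = FALSE)
cbind_classes <- sapply(via_cbind, class)
cbind_classes   # all three columns are "character"

# Route 2: vectors passed directly, so each column keeps its own type
direct <- data.frame(crew_name, crew_age, crew_is_male,
                     stringsAsFactors = FALSE)
direct_classes <- sapply(direct, class)
direct_classes  # "character", "numeric", "logical"

# If you already have the coerced version, convert columns back explicitly
via_cbind$crew_age <- as.numeric(via_cbind$crew_age)
via_cbind$crew_is_male <- as.logical(via_cbind$crew_is_male)
```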


In [40]:
linesep <- function() {
  cat(strrep("-", 50), "\n")
}


### Accessing Elements in Data Frames

You can access specific columns or rows using various methods, including dollar sign (`$`), square brackets (`[]`), and `subset()`.

In [41]:

# Re-create df_robust for consistent examples
df_robust <- data.frame(
  Name = c("Luffy", "Chopper", "Zoro", "Nami", "Jinbe"),
  Age = c(19, 17, 21, 20, 37),
  Gender = c('M', "M", "M", "F", "M"),
  Is_Male = c(TRUE, TRUE, TRUE, FALSE, TRUE)
)

# Accessing a specific column using $ (most common and readable)
cat('Accessing column by name using $:\n')
print(df_robust$Name)
# Expected Output: [1] "Luffy"   "Chopper" "Zoro"    "Nami"    "Jinbe"

cat('Accessing column by name using $:\n')
print(df_robust$Gender)
# Expected Output: [1] "M" "M" "M" "F" "M"

linesep()

# Accessing columns by index or name using []
cat('Selective columns by index (columns 1 to 3):\n')
print(df_robust[1:3]) # Selects columns from index 1 to 3
# Expected Output:
#      Name Age Gender
# 1   Luffy  19      M
# 2 Chopper  17      M
# 3    Zoro  21      M
# 4    Nami  20      F
# 5   Jinbe  37      M

linesep()

cat('Selective columns by name (Name and Gender):\n')
print(df_robust[c("Name", "Gender")]) # Selects specific columns by their names
# Expected Output:
#      Name Gender
# 1   Luffy      M
# 2 Chopper      M
# 3    Zoro      M
# 4    Nami      F
# 5   Jinbe      M

linesep()

# Accessing specific rows and columns
cat('Accessing row 2, column 1:\n')
print(df_robust[2, 1]) # Row 2, Column 1
# Expected Output: [1] "Chopper"

cat('Accessing all columns for row 3:\n')
print(df_robust[3, ]) # All columns for row 3
# Expected Output:
#   Name Age Gender Is_Male
# 3 Zoro  21      M    TRUE

cat('Accessing specific rows and columns by name:\n')
print(df_robust[c(1, 5), c("Name", "Age")]) # Rows 1 and 5, columns 'Name' and 'Age'
# Expected Output:
#    Name Age
# 1 Luffy  19
# 5 Jinbe  37


Accessing column by name using $:
[1] "Luffy"   "Chopper" "Zoro"    "Nami"    "Jinbe"  
Accessing column by name using $:
[1] "M" "M" "M" "F" "M"
-------------------------------------------------- 
Selective columns by index (columns 1 to 3):
     Name Age Gender
1   Luffy  19      M
2 Chopper  17      M
3    Zoro  21      M
4    Nami  20      F
5   Jinbe  37      M
-------------------------------------------------- 
Selective columns by name (Name and Gender):
     Name Gender
1   Luffy      M
2 Chopper      M
3    Zoro      M
4    Nami      F
5   Jinbe      M
-------------------------------------------------- 
Accessing row 2, column 1:
[1] "Chopper"
Accessing all columns for row 3:
  Name Age Gender Is_Male
3 Zoro  21      M    TRUE
Accessing specific rows and columns by name:
   Name Age
1 Luffy  19
5 Jinbe  37
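The `subset()` function mentioned above filters rows by a condition and optionally selects columns. A minimal sketch follows, using a hypothetical data frame `crew_df` with the same values as the earlier examples, alongside the equivalent bracket form with a logical condition.

```r
crew_df <- data.frame(
  Name = c("Luffy", "Chopper", "Zoro", "Nami", "Jinbe"),
  Age = c(19, 17, 21, 20, 37),
  Gender = c("M", "M", "M", "F", "M")
)

# Rows where Age is at least 20, keeping only Name and Age
adults <- subset(crew_df, Age >= 20, select = c(Name, Age))
adults
#    Name Age
# 3  Zoro  21
# 4  Nami  20
# 5 Jinbe  37

# The same selection written with square brackets and a logical condition
adults_bracket <- crew_df[crew_df$Age >= 20, c("Name", "Age")]
all.equal(adults, adults_bracket)  # TRUE
```

`subset()` reads well in interactive work; the bracket form is usually preferred inside functions because it does not rely on non-standard evaluation of column names.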


## Matrix and Its Operations

Matrices are two-dimensional, homogeneous data structures in R, meaning all elements must be of the same data type. They are fundamental for linear algebra operations and statistical modeling.

In [42]:

# Creating a matrix
# matrix(data, nrow, ncol, byrow, dimnames)
# data: elements to fill the matrix
# nrow: number of rows
# ncol: number of columns
# byrow: if TRUE, fills by row; if FALSE (default), fills by column
# dimnames: a list of two vectors for row and column names
a <- matrix(1:8, nrow = 2, ncol = 4, byrow = TRUE, 
            dimnames = list(c('RowI', "RowII"), c('ColW', 'ColX', 'ColY', 'ColZ')))
a
# Expected Output:
#       ColW ColX ColY ColZ
# RowI     1    2    3    4
# RowII    5    6    7    8

linesep()

# Another matrix example (filled by column by default)
values <- c(1, 2, 3, 4, 5, 6, 7, 8)
matrix1 <- matrix(values, nrow = 4, ncol = 2)
matrix1
# Expected Output:
#      [,1] [,2]
# [1,]    1    5
# [2,]    2    6
# [3,]    3    7
# [4,]    4    8


      ColW ColX ColY ColZ
RowI     1    2    3    4
RowII    5    6    7    8


-------------------------------------------------- 


     [,1] [,2]
[1,]    1    5
[2,]    2    6
[3,]    3    7
[4,]    4    8


### Accessing Matrix Elements

Elements in a matrix can be accessed using `[row_index, col_index]`.

In [43]:

# Access the first row
a[1, ] 
# Expected Output:
# ColW ColX ColY ColZ
#    1    2    3    4

# Access the first column
a[, 1] 
# Expected Output:
#  RowI RowII
#     1     5

# Access a specific element (e.g., element at RowII, ColY)
a["RowII", "ColY"]
# Expected Output: [1] 7

# Access several elements of one row (returns a named vector; add drop = FALSE to keep a matrix)
a[1, c(2, 4)] # First row, columns 2 and 4
# Expected Output:
# ColX ColZ
#    2    4
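When an index selects a single row or column, R drops the result to a plain vector by default. The sketch below rebuilds the same 2x4 matrix under the hypothetical name `m` and shows how `drop = FALSE` keeps the matrix shape, plus a selection that is a genuine sub-matrix.

```r
m <- matrix(1:8, nrow = 2, byrow = TRUE,
            dimnames = list(c("RowI", "RowII"),
                            c("ColW", "ColX", "ColY", "ColZ")))

# Default: a single row collapses to a named vector (no dimensions)
row_vec <- m[1, c(2, 4)]
dim(row_vec)  # NULL

# drop = FALSE keeps the result as a 1x2 matrix
row_mat <- m[1, c(2, 4), drop = FALSE]
dim(row_mat)  # 1 2

# Selecting several rows and columns yields a sub-matrix
sub_m <- m[, c("ColX", "ColZ")]
sub_m
#       ColX ColZ
# RowI     2    4
# RowII    6    8
```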


### Matrix Operations

R provides functions for common matrix operations like determinant, transpose, addition, subtraction, and multiplication.
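Before the larger example, here is a small sketch on a hypothetical 2x2 symmetric matrix `p`, where the results are easy to check by hand. It contrasts element-wise `*` with the matrix product `%*%`, and shows `det()`, `solve()` (matrix inverse), and `eigen()`.

```r
p <- matrix(c(2, 1, 1, 3), nrow = 2)  # rows: (2, 1) and (1, 3)

# Element-wise product: each entry is multiplied by itself
elementwise <- p * p
elementwise   # rows: (4, 1) and (1, 9)

# True matrix product
mat_product <- p %*% p
mat_product   # rows: (5, 5) and (5, 10)

# Determinant: 2*3 - 1*1
det(p)        # 5 (up to floating-point error)

# Inverse: multiplying it by p gives the identity matrix
p_inv <- solve(p)
all.equal(p_inv %*% p, diag(2))  # TRUE

# Because p is symmetric, its eigenvalues are real
p_eigen <- eigen(p)
is.numeric(p_eigen$values)       # TRUE
```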

In [44]:

# Create a square matrix for operations
x <- matrix(2:17, nrow = 4, ncol = 4, byrow = FALSE)
print("Matrix X:")
print(x)
# Expected Output:
#      [,1] [,2] [,3] [,4]
# [1,]    2    6   10   14
# [2,]    3    7   11   15
# [3,]    4    8   12   16
# [4,]    5    9   13   17

linesep()

# Determinant of a square matrix
cat('Determinant of X = ', det(x), '\n')
# Expected Output: Determinant of X =  0 (for this specific matrix)

linesep()

# Transpose of a matrix
y <- t(x) # Transpose: rows become columns and vice-versa
print("Transpose of X (Y):")
print(y)
# Expected Output:
#      [,1] [,2] [,3] [,4]
# [1,]    2    3    4    5
# [2,]    6    7    8    9
# [3,]   10   11   12   13
# [4,]   14   15   16   17

linesep()

# Matrix Subtraction (element-wise)
print("X - Y (element-wise subtraction):")
print(x - y)
# Expected Output (first row: 2-2=0, 6-3=3, etc.):
#      [,1] [,2] [,3] [,4]
# [1,]    0    3    6    9
# [2,]   -3    0    3    6
# [3,]   -6   -3    0    3
# [4,]   -9   -6   -3    0

linesep()

# Matrix Addition (element-wise)
print("X + Y (element-wise addition):")
print(x + y)
# Expected Output (first row: 2+2=4, 6+3=9, etc.):
#      [,1] [,2] [,3] [,4]
# [1,]    4    9   14   19
# [2,]    9   14   19   24
# [3,]   14   19   24   29
# [4,]   19   24   29   34

linesep()

# Matrix Multiplication (true matrix product, not element-wise)
# For A %*% B, the number of columns in A must equal the number of rows in B.
# Let's use matrix 'a' (2x4) and matrix 'x' (4x4)
print("Matrix 'a':")
print(a)
print("Matrix 'x':")
print(x)
z <- a %*% x # Matrix multiplication
print("a %*% x (matrix multiplication):")
print(z)
# Expected Output (a 2x4 matrix; row names carry over from 'a'):
#       [,1] [,2] [,3] [,4]
# RowI    40   80  120  160
# RowII   96  200  304  408

linesep()

# Eigenvalues and Eigenvectors
# eigen() computes the eigenvalues and (right) eigenvectors of any square numeric or complex matrix.
# For a symmetric matrix the results are real; for a non-symmetric matrix they may be complex.
print("Eigen decomposition of X:")
eigen_result <- eigen(x)
print("Eigenvalues:")
print(eigen_result$values)
print("Eigenvectors:")
print(eigen_result$vectors)
# Expected Output: two eigenvalues near 40 and -2; the other two are zero up to
# floating-point error because X is singular (they may print as tiny real or complex values).


[1] "Matrix X:"
     [,1] [,2] [,3] [,4]
[1,]    2    6   10   14
[2,]    3    7   11   15
[3,]    4    8   12   16
[4,]    5    9   13   17
-------------------------------------------------- 
Determinant of X =  0 
-------------------------------------------------- 
[1] "Transpose of X (Y):"
     [,1] [,2] [,3] [,4]
[1,]    2    3    4    5
[2,]    6    7    8    9
[3,]   10   11   12   13
[4,]   14   15   16   17
-------------------------------------------------- 
[1] "X - Y (element-wise subtraction):"
     [,1] [,2] [,3] [,4]
[1,]    0    3    6    9
[2,]   -3    0    3    6
[3,]   -6   -3    0    3
[4,]   -9   -6   -3    0
-------------------------------------------------- 
[1] "X + Y (element-wise addition):"
     [,1] [,2] [,3] [,4]
[1,]    4    9   14   19
[2,]    9   14   19   24
[3,]   14   19   24   29
[4,]   19   24   29   34
-------------------------------------------------- 
[1] "Matrix 'a':"
      ColW ColX ColY ColZ
RowI     1    2    3    4
RowII    5    6    7    8
[1