---
**Author**: *Mugabi Trevor*

**Published**: *2024-09-07*

**Keywords**: *OOP, Advanced Functions, Scoping Rules, Error Handling*

**Description**: *Advanced R*

---

# Advanced Functions
**In R**, advanced functions refer to features and techniques that allow you to write more efficient, flexible, and powerful code.\
These techniques go beyond basic operations and include functional programming, writing custom functions, and handling more complex data structures.


# Object-Oriented Programming (S3 and S4 Classes)

Object-Oriented Programming (OOP) in **R** primarily revolves around **S3 and S4 classes**, two systems that take different approaches to object-oriented design. **R** also has **Reference Classes** (sometimes called **R5**), but the most commonly used OOP systems in **R are S3 and S4**. Let's explore these two approaches with examples.

**1. S3 Classes**:
- **S3** is the simplest and most commonly used OOP system in **R**. It is informal and flexible, making it easy to use but less strict compared to other OOP systems like **S4**. **S3** doesn't require formal definitions of classes or methods.

- **Key Concepts**:
    - **No formal class definition**: You don’t need to explicitly define the class structure.
    - **Generic functions**: Functions like print(), summary(), etc., can be extended to handle different object types based on their class.
    - **Method dispatch**: The method is chosen based on the class of the object.

***Example of S3***

In [1]:
# Creating an S3 object
person <- list(name = "John Doe", age = 30)

# Assigning a class to the object
class(person) <- "Person"

# Defining a print method for the "Person" class
# (print methods accept ... to match the generic and return x invisibly)
print.Person <- function(x, ...) {
  cat("Person's Name:", x$name, "\n")
  cat("Person's Age:", x$age, "\n")
  invisible(x)
}

# Now, when you print the object, it will use the custom print method
print(person)


Person's Name: John Doe 
Person's Age: 30 
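
Generic functions are not limited to built-ins such as print(). As a minimal sketch, the snippet below defines a hypothetical generic called `greet()` to show how `UseMethod()` picks a method from the class of the object. The names `greet`, `greet.Person`, `greet.default`, and `visitor` are illustrative assumptions and are not part of the example above.

```r
# A generic function only needs to hand control to UseMethod()
greet <- function(x, ...) UseMethod("greet")

# Method used when the object has class "Person"
greet.Person <- function(x, ...) paste0("Hello, ", x$name, "!")

# Fallback used when no class-specific method exists
greet.default <- function(x, ...) "Hello, stranger!"

# structure() creates the list and assigns its class in one step
visitor <- structure(list(name = "John Doe", age = 30), class = "Person")

greet(visitor)  # "Hello, John Doe!"
greet(42)       # "Hello, stranger!"
```

The `default` method is optional, but without it calling `greet()` on an object of an unknown class would stop with an error.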


**2. S4 Classes**

- **S4** is a more ***formal and robust OOP system*** in R. Unlike S3, S4 requires explicit definitions of classes and methods, making it stricter and more suitable for large, complex programs.

- **Key Concepts**:
    - **Formal class definitions**: You must define the structure of the class and its attributes explicitly.
    - **Formal method definitions**: Methods must be explicitly defined for each class.
    - **Slot-based system**: S4 objects store data in slots, which are like fields in other OOP languages.
    - **Type checking**: S4 provides better type-checking and error reporting compared to S3.

***Example of S4 Class Definition***

In [2]:
# Define an S4 class "Person"
setClass(
  "Person",
  slots = list(
    name = "character",
    age = "numeric"
  )
)

# Create an instance of the S4 class
john <- new("Person", name = "John Doe", age = 30)

# Define a method to display the Person object
setMethod(
  "show", "Person",
  function(object) {
    cat("Name:", object@name, "\n")
    cat("Age:", object@age, "\n")
  }
)

# Show the details of the object.
# show() is called explicitly because Jupyter displays objects through print(),
# which would dispatch to the S3 print.Person() method defined earlier and
# fail on x$name (the $ operator is not defined for S4 objects).
show(john)


Name: John Doe 
Age: 30 

In [3]:
# Define a new S4 class "Employee" that inherits from "Person"
setClass(
  "Employee",
  slots = list(
    job_title = "character"
  ),
  contains = "Person"  # Inherit from the "Person" class
)

# Create an instance of the Employee class
jane <- new("Employee", name = "Jane Doe", age = 28, job_title = "Data Analyst")

# Define a method to display the Employee object
setMethod(
  "show", "Employee",
  function(object) {
    cat("Employee Name:", object@name, "\n")
    cat("Employee Age:", object@age, "\n")
    cat("Job Title:", object@job_title, "\n")
  }
)

# Show the details of the employee object.
# As with john, show() is called explicitly so that the notebook does not
# route the display through the S3 print.Person() method defined earlier.
show(jane)


Employee Name: Jane Doe 
Employee Age: 28 
Job Title: Data Analyst 

**Method Dispatch in S4**:

In S4, you can create generic functions and define methods for specific classes. For example:

In [4]:
# Defining a generic function
setGeneric("getAge", function(object) standardGeneric("getAge"))

# Defining a method for the "Person" class
setMethod("getAge", "Person", function(object) {
  return(object@age)
})

# Calling the method for an S4 object
getAge(john)  # Outputs 30
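
The type checking that S4 provides can be seen by trying to create invalid objects. The sketch below uses a hypothetical `Account` class (its name, slots, and validity rule are assumptions made for illustration) and wraps each `new()` call in `tryCatch()` so the script keeps running when S4 rejects the object.

```r
# Hypothetical class with typed slots and an extra validity rule
setClass(
  "Account",
  slots = c(owner = "character", balance = "numeric"),
  validity = function(object) {
    if (length(object@balance) != 1 || object@balance < 0) {
      "balance must be a single non-negative number"
    } else {
      TRUE
    }
  }
)

# A valid object is created normally
valid_account <- new("Account", owner = "Ada", balance = 100)

# A slot of the wrong type is rejected when the object is created
wrong_type <- tryCatch(
  new("Account", owner = "Ada", balance = "100"),
  error = function(e) "rejected: wrong slot type"
)

# A value of the right type can still fail the validity function
negative_balance <- tryCatch(
  new("Account", owner = "Ada", balance = -5),
  error = function(e) "rejected: failed validity check"
)

wrong_type        # "rejected: wrong slot type"
negative_balance  # "rejected: failed validity check"
```

An S3 object built from a plain list would accept both of these values without complaint, which is the trade-off between the two systems.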


# Inheritance and Methods

In [5]:
# Defining another S3 class called "Employee" that inherits from "Person"
employee <- list(name = "Jane Doe", age = 28, job_title = "Data Analyst")
class(employee) <- c("Employee", "Person")  # Inherits both classes

# Defining a print method for the Employee class
print.Employee <- function(x, ...) {
  cat("Employee Name:", x$name, "\n")
  cat("Employee Age:", x$age, "\n")
  cat("Job Title:", x$job_title, "\n")
  invisible(x)
}

# Now, printing the employee object will call the "print.Employee" method
print(employee)

Employee Name: Jane Doe 
Employee Age: 28 
Job Title: Data Analyst 


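In the example above, the employee object has the class vector ***c("Employee", "Person")***, so it is an Employee that inherits from Person. R tries the classes in order: it uses ***print.Employee()*** because Employee comes first, and it would fall back to ***print.Person()*** if no ***print.Employee()*** method existed.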
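
The fallback through the class vector can be sketched with a separate, hypothetical generic named `describe()`, so that it does not interfere with the print methods above. The generic, the `Intern` class, and the objects `worker` and `intern` are assumptions made for illustration. `NextMethod()` lets a child method reuse the method of the next class in the vector.

```r
# Hypothetical generic with a method for each class
describe <- function(x, ...) UseMethod("describe")

describe.Person <- function(x, ...) {
  paste(x$name, "is", x$age, "years old")
}

describe.Employee <- function(x, ...) {
  # Reuse the Person description, then add the job title
  person_part <- NextMethod()
  paste0(person_part, " and works as a ", x$job_title)
}

worker <- structure(
  list(name = "Jane Doe", age = 28, job_title = "Data Analyst"),
  class = c("Employee", "Person")
)

# "Intern" has no describe method, so dispatch falls back to describe.Person
intern <- structure(
  list(name = "Sam Lee", age = 21),
  class = c("Intern", "Person")
)

describe(worker)  # "Jane Doe is 28 years old and works as a Data Analyst"
describe(intern)  # "Sam Lee is 21 years old"
```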

**When to Use S3 vs S4**:
- S3 is typically preferred for smaller, simpler programs where flexibility and speed are important.
- S4 is used in larger, more complex systems where formal class definitions, type checking, and robustness are critical, such as in bioinformatics or statistical software packages.
- In conclusion, both S3 and S4 have their strengths and serve different purposes in R. S3 offers simplicity and speed, while S4 provides formal structure and greater safety, especially in more complex applications.

# Scoping Rules
In **R**, scoping rules determine how the language finds and uses the value of variables. Understanding scoping is crucial because it controls the visibility and accessibility of variables within different parts of the code. **R** primarily uses **lexical scoping** (sometimes called ***static scoping***) with elements of dynamic scoping.

Before we get into scoping, first understand the difference between **Global and Local Variables**.\
**Global variables** are defined outside any function and can be accessed from anywhere in the script.\
**Local variables** are defined inside a function and can only be accessed from within that function.

**1. Lexical Scoping**
- Lexical scoping means that the value of a variable is determined by the environment in which the function was created, not necessarily where the function is called. The environment refers to the collection of objects (variables, functions, etc.) that are defined and accessible within a specific context.

***Example of Lexical Scoping***:


In [6]:
x <- 10  # Global variable

my_function <- function() {
  x <- 5  # Local variable inside the function
  return(x)
}

my_function()  # Output: 5
x  # Output: 10 (the global variable remains unchanged)


In the above example, when ***my_function()*** is called, it uses the value of ***x*** defined inside the function (***5***), not the global ***x*** (***10***). This demonstrates local scoping.

In [7]:
y <- 20  # Global variable

outer_function <- function() {
  y <- 15  # Local to outer_function
  
  inner_function <- function() {
    return(y)  # Looks for y in outer_function
  }
  
  return(inner_function())
}

outer_function()  # Output: 15
y  # Output: 20


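In the example above, ***inner_function()*** returns the value of ***y*** from ***outer_function()***, not the global ***y***, because of ***lexical scoping***: ***inner_function()*** was defined inside ***outer_function()***, so R looks there before it looks in the global environment.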
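
Lexical scoping is also what makes closures work: a function returned by another function keeps access to the variables of the environment where it was created. The following is a minimal sketch. The names `make_multiplier`, `double_it`, and `triple_it` are illustrative and not part of the examples above.

```r
# Function factory: each call creates a new environment holding `times`
make_multiplier <- function(times) {
  force(times)  # evaluate the argument now so the closure captures its value
  function(value) value * times
}

double_it <- make_multiplier(2)
triple_it <- make_multiplier(3)

# A global variable with the same name does not affect the closures,
# because each one looks up `times` where it was defined
times <- 100

double_it(5)  # 10
triple_it(5)  # 15
```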

**2. Dynamic Scoping**

In dynamic scoping, a function uses the value of a variable based on the environment from which it is called, not where it was defined. R mainly uses lexical scoping, but it lets a function inspect the environment of its caller, which gives behaviour similar to dynamic scoping. Condition handlers, which are used in error handling, are also looked up through the chain of calls rather than through where the code was defined.

You can use the ***parent.frame()*** function to access the environment from which a function was called.
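
The difference between the two lookups can be sketched by calling a lexical function and a ***parent.frame()***-based function from the same place. The function names below (`lexical_lookup`, `caller_lookup`, `run_both`) are hypothetical and chosen only for this illustration.

```r
x <- "global x"

# Lexical lookup: x is resolved where the function was defined
lexical_lookup <- function() x

# Caller lookup: parent.frame() is the environment of the calling function
caller_lookup <- function() get("x", envir = parent.frame())

run_both <- function() {
  x <- "x inside run_both"
  lexical <- lexical_lookup()
  caller <- caller_lookup()
  c(lexical = lexical, caller = caller)
}

result <- run_both()
result[["lexical"]]  # "global x"
result[["caller"]]   # "x inside run_both"
```

Relying on the caller's variables makes code harder to follow, so this technique is usually kept for special cases.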

**3. Search Path and Environments**

**R** has a hierarchy of environments that it uses to search for variables. When you reference a variable, R looks for it in the following order:
- The local environment (within the current function).
- The enclosing environment (the environment where the function was defined).
- The global environment (the user's workspace).
- The attached packages, in the order shown by ***search()***, ending with the base package.
- The empty environment (indicating no more environments to search).

***Example of the Search Path***

In [8]:
z <- 100  # Global variable

search_variable <- function() {
  return(z)
}

search_variable()  # Output: 100 (R looks in the global environment for z)


# String Manipulation

**String manipulation in R** refers to working with text data (strings) using various functions and techniques. Since R is often used for data analysis, it's essential to clean, format, and extract relevant information from strings. R provides several built-in functions and libraries for string manipulation, primarily through the base package and the stringr package from the tidyverse.

Here are common string manipulation tasks in **R**:



**1. Combining Strings**
- ***paste() and paste0()*** are used to concatenate strings.\
The difference between the two is that ***paste()*** adds a space by default, while ***paste0()*** does not.

In [9]:
str1 <- "Hello"
str2 <- "Programmer"
paste(str1, str2)    # Output: "Hello Programmer"
paste0(str1, str2)   # Output: "HelloProgrammer"


You can also define a custom separator using the ***sep*** argument in ***paste()***.

In [10]:
paste(str1, str2, sep = ", ")  # Output: "Hello, Programmer"

**2. String Length**
- ***nchar()*** returns the number of characters in a string.

In [11]:
nchar("Data Science")  # Output: 12

**3. Substrings**
- ***substr()*** extracts parts of a string.
- ***substring()*** is similar, but its end position is optional (it defaults to the end of the string) and it accepts vectors of start and end positions.

In [12]:
my_string <- "Data Science"
substr(my_string, 1, 4)   # Output: "Data"
substring(my_string, 6)   # Output: "Science" (starts from the 6th character)


**4. Changing Case**
- ***toupper()*** converts a string to uppercase.
- ***tolower()*** converts a string to lowercase.

In [13]:
toupper("Data")   # Output: "DATA"
tolower("Science")  # Output: "science"

**5. Trimming Whitespace**
- ***trimws()*** removes leading and trailing spaces from a string.


In [14]:
trimws("   Data Science   ")  # Output: "Data Science"

**6. Pattern Matching and Substitution**
- **Searching for Patterns**
    - ***grep()*** finds the positions of strings that match a pattern.
    - ***grepl()*** returns a logical vector indicating whether the pattern is found.

- **Replacing Patterns**
    - **gsub()** replaces all occurrences of a pattern.
    - **sub()** replaces the first occurrence of a pattern.



In [15]:
my_strings <- c("apple", "banana", "pear", "pineapple")
grep("apple", my_strings)  # Output: 1 4 (positions where "apple" is found)
grepl("apple", my_strings)  # Output: TRUE FALSE FALSE TRUE


In [16]:
text <- "I like apples and apples are tasty"
gsub("apples", "oranges", text)  # Output: "I like oranges and oranges are tasty"

sub("apples", "oranges", text)   # Output: "I like oranges and apples are tasty"
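
One detail worth knowing is that the pattern in these functions is treated as a regular expression by default, so characters such as `.` have a special meaning. The short sketch below uses an illustrative variable named `version` (not part of the examples above) to show the difference and two ways to match a literal dot.

```r
version <- "3.4.1"

# "." is a regular expression that matches any character
gsub(".", "-", version)                # "-----"

# fixed = TRUE treats the pattern as plain text
gsub(".", "-", version, fixed = TRUE)  # "3-4-1"

# Escaping the dot also matches it literally
gsub("\\.", "-", version)              # "3-4-1"
```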


**7. Advanced String Manipulation with stringr**
- The stringr package provides a consistent and easy-to-use set of functions for string manipulation. It is part of the tidyverse and works well with data frames and other tidy data structures.
- To use ***stringr***, you need to install and load it. More on that when we are learning libraries.

# Error Handling (tryCatch)
**Error handling in R** is crucial when writing robust code that gracefully handles unexpected situations, such as missing data or incorrect input. The ***tryCatch() function*** in **R** provides a structured way to handle errors and warnings and to run cleanup code in a "finally" block, allowing you to write safer and more reliable programs.\
The ***tryCatch()*** function works similarly to try/catch (or try/except) blocks in other languages. It evaluates an expression and, if an error or warning is signalled, calls the corresponding handler; otherwise it returns the value of the expression.

The structure of ***tryCatch()*** looks like this:
```{r}
tryCatch(
  {
    # Code that may cause an error
  },
  error = function(e) {
    # Handle error
  },
  warning = function(w) {
    # Handle warning
  },
  finally = {
    # Code that will always run, regardless of success or failure
  }
)

```

**Key Components of tryCatch()**
- **Expression Block**: The first block of code contains the main logic you want to run. If an error or warning occurs here, it is passed to the respective handler.
- **Error Handler**: The ***error = function(e)*** block catches and handles errors. The error object e contains information about the error.
- **Warning Handler**: The ***warning = function(w)*** block is for catching and handling warnings.
- **Finally Block**: The finally block contains code that will execute regardless of whether an error or warning occurred. It is similar to the finally clause in other languages.
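
The four components can be combined in one small function. The sketch below is only an illustration: `safe_sqrt()` is a hypothetical helper, and returning `NA` from the handlers is one possible design choice among several.

```r
# Hypothetical helper that never stops the script
safe_sqrt <- function(value) {
  tryCatch(
    {
      # Expression block: the code that may fail
      if (!is.numeric(value)) stop("value must be numeric")
      sqrt(value)
    },
    warning = function(w) {
      # sqrt() of a negative number signals a warning ("NaNs produced")
      NA_real_
    },
    error = function(e) {
      # Non-numeric input reaches this handler through stop()
      NA_real_
    },
    finally = message("safe_sqrt() finished")  # runs in every case
  )
}

safe_sqrt(16)   # 4
safe_sqrt(-1)   # NA (warning handled)
safe_sqrt("a")  # NA (error handled)
```

The message from `finally` is printed for all three calls, including the successful one.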


In [17]:
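# Example
result <- tryCatch(
  {
    # Code that throws an error. Note that 10 / 0 would not: R returns Inf
    # for division by zero, so a non-numeric operand is used here instead.
    10 / "a"
  },
  error = function(e) {
    # Handle the error
    print("An error occurred!")
    print(conditionMessage(e))
    return(NA)   # Return a default value
  }
)
print(result)


[1] "An error occurred!"
[1] "non-numeric argument to binary operator"
[1] NA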


In [18]:
# You can also handle warnings similarly. 
# Here's an example that triggers a warning 
# for using log() on a negative number.
result <- tryCatch(
  {
    # Code that may produce a warning
    log(-1)
  },
  warning = function(w) {
    # Handle the warning
    print("A warning occurred!")
    print(w)
    return(NA)  # Return a default value
  }
)

print(result)


[1] NA


**NB**: Using ***try()*** for Simple Error Handling:
- While ***tryCatch()*** is powerful, for simple error handling where you just want to skip code execution on errors, you can use the simpler ***try() function***.

In [19]:
result <- try(log("a"))  # Error: "a" is not a numeric value
if (inherits(result, "try-error")) {
  print("An error occurred!")
}


Error in log("a") : non-numeric argument to mathematical function
[1] "An error occurred!"


# Iterators and Apply Functions


## Iterators
**In R**, **iterators** allow you to loop over a collection of elements, such as vectors, lists, or other data structures. While **R** is vectorized (which means many operations can be done without loops), **for loops** are commonly used for more complex tasks. **R** also provides higher-level functions like the ***apply()*** family, which are often more concise than explicit loops, though not necessarily faster.

For Loop Example:

In [20]:
numbers <- c(1, 2, 3, 4, 5)

for (num in numbers) {
  print(num^2)
}


[1] 1
[1] 4
[1] 9
[1] 16
[1] 25


## Apply Family of Functions in R
The apply family of functions (***apply(), lapply(), sapply(), tapply(), mapply()***, and ***vapply()***) provides an alternative to loops for applying a function to elements of a data structure. Each function is specialized for specific types of data structures or output requirements.

1. **apply()**: Used for arrays and matrices
    - ***apply(X, MARGIN, FUN)***
    - X: An array or matrix
    - MARGIN: 1 for rows, 2 for columns
    - FUN: The function to apply


In [21]:
# Example: Apply sum to rows and columns of a matrix
mat <- matrix(1:9, nrow = 3)
print(mat)

# Sum over rows (MARGIN = 1)
apply(mat, 1, sum)  # Output: 12 15 18

# Sum over columns (MARGIN = 2)
apply(mat, 2, sum)  # Output: 6 15 24


     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9


2. **lapply()**: Used for lists and returns a list
    - ***lapply(X, FUN)***
    - X: A list or vector
    - FUN: The function to apply to each element

In [22]:
# Example: Use lapply to get the square of each number in a list
num_list <- list(1, 2, 3, 4)
lapply(num_list, function(x) x^2)


3. **sapply()**: Simplified version of ***lapply()*** that attempts to return a vector or matrix
    - ***sapply(X, FUN)***
    - X: A list or vector
    - FUN: The function to apply

In [23]:
# Example: Same as lapply, but returns a vector
sapply(num_list, function(x) x^2)


4. **tapply()**: Used to apply a function to subsets of a vector, typically based on some grouping factor.
    - ***tapply(X, INDEX, FUN)***
    - X: A vector
    - INDEX: A factor or list of factors
    - FUN: The function to apply

In [24]:
# Example: Grouped mean of a vector based on a factor
ages <- c(22, 25, 24, 30, 28)
gender <- factor(c("M", "F", "F", "M", "M"))

tapply(ages, gender, mean)  # Get mean age by gender


5. **mapply()**: Multivariate version of ***sapply()***. It can apply a function to multiple lists or vectors.
    - ***mapply(FUN, ..., MoreArgs = NULL)***
    - FUN: The function to apply
    - ...: Vectors or lists to pass as arguments to FUN

In [25]:
# Example: Use mapply to sum corresponding elements from two vectors
mapply(sum, 1:4, 5:8)


6. **vapply()**: Similar to ***sapply()***, but you explicitly specify the return type.
    - ***vapply(X, FUN, FUN.VALUE)***
    - X: A list or vector
    - FUN: The function to apply
    - FUN.VALUE: The type of return value (e.g., numeric(1), character(1))

In [26]:
# Example: Ensure a numeric output for each element
vapply(num_list, function(x) x^2, numeric(1))


**Comparing lapply(), sapply(), and vapply()**
- **lapply()** always returns a list.
- **sapply()** attempts to simplify the output (e.g., returns a vector or matrix if possible).
- **vapply()** is stricter and requires you to define the output type, making it safer in some cases.
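
The comparison above can be sketched on a small input. The variable names (`values`, `as_list`, `empty_sapply`, and so on) are illustrative. The empty-input case and the type mismatch are where the extra strictness of ***vapply()*** shows.

```r
values <- list(1, 2, 3)

as_list   <- lapply(values, function(x) x * 10)              # list of 10, 20, 30
as_vector <- sapply(values, function(x) x * 10)              # numeric vector 10 20 30
as_strict <- vapply(values, function(x) x * 10, numeric(1))  # numeric vector 10 20 30

# With empty input, sapply() cannot guess the type and returns an empty list,
# while vapply() still returns the declared type
empty_sapply <- sapply(list(), function(x) x * 10)              # list()
empty_vapply <- vapply(list(), function(x) x * 10, numeric(1))  # numeric(0)

# vapply() stops if a result does not match the declared type
mismatch <- tryCatch(
  vapply(values, function(x) as.character(x), numeric(1)),
  error = function(e) "vapply rejected a non-numeric result"
)
mismatch  # "vapply rejected a non-numeric result"
```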

# User Input
**In R**, you can take user input using the ***readline() function***. This function allows you to prompt the user for input and stores it as a string. If you need the input to be in another type (e.g., numeric), you can convert it accordingly using functions like ***as.numeric(), as.integer()***, or other type conversion functions.

**Basic Usage of readline()**
- Here's how to take basic input from a user in **R**:

In [27]:
name <- readline(prompt = "Enter your name: ")
cat("Hello,", name, "!\n")


Hello, James !


**NB**: ***readline()*** reads a single line from the console. To read multiple lines of text, for example from a **file**, use ***readLines()***. R is case-sensitive, so both names must be written exactly as shown.
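
Because ***readline()*** always returns a string, numeric input has to be converted and checked. The sketch below uses a hypothetical helper named `parse_number()` and only prompts when R is running interactively. The helper name and the error message are assumptions made for illustration.

```r
# Hypothetical helper: convert text to a number or stop with a clear message
parse_number <- function(raw_input) {
  value <- suppressWarnings(as.numeric(trimws(raw_input)))
  if (is.na(value)) {
    stop("Not a number: ", raw_input)
  }
  value
}

# readline() only waits for input in an interactive session
if (interactive()) {
  age <- parse_number(readline(prompt = "Enter your age: "))
  cat("Next year you will be", age + 1, "\n")
}

parse_number(" 42 ")  # 42
```

Calling `parse_number("abc")` stops with an error, which can be handled with ***tryCatch()*** if the program should ask again instead.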

# Exercise
--- 
***Good Luck***
---