# **IDS Lab Week 1**

## ***Introduction to R:***


R is a powerful programming language and software environment used extensively for **statistical computing, data analysis, and graphical representation**.

***Features of R:***
- **Statistical Analysis:** R provides a wide range of statistical techniques such as linear and nonlinear modeling, classical statistical tests, time-series analysis, classification, and clustering.
- **Graphical Capabilities:** R is renowned for its advanced graphical capabilities, enabling users to produce high-quality plots and visualizations.
- **Open Source:** R is open-source software, meaning it is free to use, and its source code is available for anyone to inspect, modify, and enhance.
- **Extensible:** Users can extend R's capabilities through packages. The Comprehensive R Archive Network (CRAN) hosts thousands of packages developed by the R community.
- **Data Handling:** R provides robust tools for data manipulation, making it easier to clean, preprocess, and transform data.

### ***R Comments:***

A comment starts with `#`. R ignores everything from the `#` to the end of the line.

### ***R Variables:***

Variables are used to store data values. In R, variables can hold different types of data, such as numbers, strings, or logical values.


### ***Assignment Operator:***

The assignment operator in R is `<-`. You can also use `=`, but `<-` is more common in practice.

In [1]:
x <- 10 # Assigning the value 10 to variable x
y = 5 # Assigning the value 5 to variable y
name <- "John" # Assigning a string value
is_active <- TRUE # Assigning a logical value

### ***Naming Conventions***
- Variable names can contain letters, numbers, periods (.), and underscores (_).
- Variable names must start with a letter, or with a period that is not followed by a digit; they cannot start with a number or an underscore.
- R is case-sensitive, so Var and var are different variables.
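A short sketch of these rules in practice. The variable names are illustrative only, and `make.names()` is used here just to show how base R repairs an invalid name.

```r
# Valid names: letters, digits, periods, and underscores
total_score <- 90
total.score2 <- 95

# R is case-sensitive: Var and var are two separate variables
Var <- 1
var <- 2
Var == var   # FALSE

# A name cannot start with a digit; make.names() prefixes it with "X"
fixed_name <- make.names("2nd_value")
fixed_name   # "X2nd_value"
```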

### ***R Data Types***

Basic data types in R can be divided into the following types:
- numeric - (10.5, 55, 787)
- integer - (1L, 55L, 100L, where the letter "L" declares this as an integer)
- complex - (9 + 3i, where "i" marks the imaginary part)
- character (a.k.a. string) - ("k", "R is exciting", "FALSE", "11.5")
- logical (a.k.a. boolean) - (TRUE or FALSE)

We can use the `class()` function to check the data type of a variable.

In [2]:
class(88)

In [3]:
class(88.8)

In [5]:
class(44L)

In [6]:
class(7+5i)

In [8]:
class('p')

In [9]:
class('fun')

In [10]:
class(TRUE)
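As a small complement to `class()`, the sketch below also uses `typeof()` and the `is.*()` family, which are base R functions not covered above. It shows that a plain number is stored as a double and that quoted digits remain a string.

```r
x <- 88      # numbers are stored in double precision by default
y <- 44L     # the L suffix requests an integer

class(x)               # "numeric"
typeof(x)              # "double"
class(y)               # "integer"
is.numeric(y)          # TRUE: integers also count as numeric
is.character("11.5")   # TRUE: quoted digits are still a string
```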

### ***Type Conversion***

In R, there are many `as.` functions that allow you to coerce or convert one object type to another.

Here are some of the most commonly used `as.` functions in R, which are frequently employed in data manipulation and analysis:

1. **`as.numeric()`**
   - Converts an object (e.g., character, factor) to a numeric type.

In [12]:
as.numeric("123")

2. **`as.character()`**
   - Converts an object (e.g., numeric, factor) to a character type.

In [13]:
as.character(123)

3. **`as.factor()`**
   - Converts a vector (usually character or numeric) to a factor, often used for categorical data.

In [14]:
as.factor(c("low", "medium", "high"))

4. **`as.integer()`**
   - Converts an object (e.g., numeric) to an integer type.

In [15]:
as.integer(12.34)

5. **`as.logical()`**
   - Converts an object (e.g., numeric, character) to a logical type (`TRUE` or `FALSE`).

In [16]:
as.logical(1)

6. **`as.Date()`**
   - Converts a character string or numeric value to a Date object.

In [17]:
as.Date("2024-08-18")

7. **`as.data.frame()`**
   - Converts an object (e.g., matrix, list) to a data frame, a common structure for storing tabular data.

In [18]:
as.data.frame(matrix(1:4, nrow = 2))

| V1 `<int>` | V2 `<int>` |
|---|---|
| 1 | 3 |
| 2 | 4 |


8. **`as.matrix()`**
   - Converts an object (e.g., data frame, vector) to a matrix.

In [19]:
as.matrix(data.frame(a = 1:3, b = 4:6))

| a | b |
|---|---|
| 1 | 4 |
| 2 | 5 |
| 3 | 6 |


9. **`as.list()`**
   - Converts an object (e.g., vector, matrix) to a list, a flexible data structure in R.

In [20]:
as.list(c(1, 2, 3))

10. **`as.vector()`**
    - Converts an object (e.g., matrix, array) to a vector.

In [21]:
as.vector(matrix(1:4, nrow = 2))

These functions are fundamental in R programming and are used frequently when working with different data types and structures, enabling seamless conversion between formats.
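Conversions do not always succeed or behave as expected. The sketch below shows three cases worth knowing; the inputs are made up for illustration, and `suppressWarnings()` is used only to keep the output quiet.

```r
# A string that is not a number becomes NA (R also emits a warning)
bad <- suppressWarnings(as.numeric("abc"))
is.na(bad)                    # TRUE

# as.integer() drops the fractional part rather than rounding
truncated <- as.integer(12.99)
truncated                     # 12

# Factors store integer codes; convert through character to get the labels
f <- factor(c("10", "20", "10"))
codes <- as.numeric(f)                  # 1 2 1 (the codes)
labels <- as.numeric(as.character(f))   # 10 20 10 (the values)
```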

## ***Data structures***

### ***1. Vector***

#### **Definition**

Vectors are the simplest and most fundamental data structure in R. They are one-dimensional arrays that store elements of the same type, such as numeric, character, or logical. R has no separate scalar type: even a single number is a numeric vector of length one.

#### **Creating Vectors**
Vectors can be created in several ways:

1. **Using the `c()` function**:

   The most common way to create a vector is by combining elements using the `c()` function.

In [48]:
# Numeric vector
num_vector <- c(1, 2, 3, 4, 5)
num_vector

In [49]:
# Character vector
char_vector <- c("apple", "banana", "cherry")
char_vector

In [50]:
# Logical vector
log_vector <- c(TRUE, FALSE, TRUE)
log_vector

2. **Using the `seq()` function**:

   The `seq()` function generates a sequence of numbers.

In [51]:
# Sequence from 1 to 10 with a step of 2
seq_vector <- seq(1, 10, by = 2)
seq_vector

3. **Using the `rep()` function**:

   The `rep()` function repeats the elements of a vector.

In [52]:
# Repeating the number 1, 5 times
rep_vector <- rep(1, times = 5)
rep_vector

4. **Using the `:` operator**:

   The colon operator creates a sequence of numbers.

In [53]:
# Sequence from 1 to 5
colon_vector <- 1:5
colon_vector
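The functions above take a few more arguments that are useful in practice. A minimal sketch, using the `length.out` argument of `seq()` and the `each` argument of `rep()` (both part of base R, but not shown above):

```r
# Five evenly spaced values from 0 to 1
even_steps <- seq(0, 1, length.out = 5)       # 0.00 0.25 0.50 0.75 1.00

# Repeat the whole vector twice
whole_twice <- rep(c("a", "b"), times = 2)    # "a" "b" "a" "b"

# Repeat each element twice
each_twice <- rep(c("a", "b"), each = 2)      # "a" "a" "b" "b"

# The colon operator also counts down
countdown <- 5:1                              # 5 4 3 2 1
```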

#### **Accessing Elements in a Vector**
You can access elements in a vector using square brackets `[]`.

In [62]:
num_vector <- c(1, 2, 3, 4, 5)
num_vector

1. **Single Element**:

   Access a single element by its index.

In [76]:
# Access the third element of num_vector
num_vector[3]

2. **Multiple Elements**:

   Access multiple elements by passing a vector of indices.

In [77]:
# Access the first and third elements of num_vector
num_vector[c(1, 3)]

3. **Excluding Elements**:

   You can exclude elements by using a negative index.

In [78]:
# Exclude the second element of num_vector
num_vector[-2]

4. **Using Logical Indexing**:

   You can access elements that meet a specific condition.

In [81]:
# Access elements greater than 3 in num_vector
num_vector[num_vector > 3]
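A few indexing details are easy to miss: positions start at 1, asking for a position that does not exist returns `NA`, and `which()` reports the positions that satisfy a condition. The vector `v` below is a made-up example.

```r
v <- c(10, 20, 30, 40, 50)

first <- v[1]            # 10: indexing starts at 1, not 0
middle <- v[2:4]         # 20 30 40: a range of positions
missing_pos <- v[10]     # NA: position 10 does not exist
hits <- which(v > 25)    # 3 4 5: positions of the matching elements
```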

#### **Updating Elements in a Vector**
Updating elements in a vector is straightforward using indexing.

In [80]:
num_vector <- c(1, 2, 3, 4, 5)
num_vector

1. **Update a Single Element**:

In [60]:
# Update the second element of num_vector to 10
num_vector[2] <- 10
num_vector

2. **Update Multiple Elements**:

In [66]:
# Update the first and third elements of num_vector to 20 and 30
num_vector[c(1, 3)] <- c(20, 30)
num_vector

#### **Deleting Elements in a Vector**
In R, you cannot technically delete elements from a vector, but you can create a new vector that excludes certain elements.

In [67]:
num_vector <- c(1, 2, 3, 4, 5)
num_vector

1. **Delete by Excluding Elements**:

In [68]:
# Remove the second element
num_vector <- num_vector[-2]
num_vector

2. **Setting Elements to `NA`**:

   Instead of removing an element, you can mark it as missing by setting it to `NA`. The vector keeps its length, but the value at that position is lost.

In [70]:
# Set the second element to NA (the vector keeps its length)
num_vector[2] <- NA
num_vector
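Once values have been marked as `NA`, they can be dropped later by combining `is.na()` with a negative logical index. A minimal sketch on a made-up vector:

```r
x <- c(4, NA, 7, NA, 9)

is.na(x)                 # FALSE TRUE FALSE TRUE FALSE
clean <- x[!is.na(x)]    # keep only the non-missing values
clean                    # 4 7 9
length(clean)            # 3
```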

#### **Length of a Vector**
The length of a vector refers to the number of elements it contains.

In [71]:
num_vector <- c(1, 2, 3, 4, 5)
num_vector

1. **Getting the Length**:

   Use the `length()` function to determine how many elements are in a vector.

In [75]:
# Length of num_vector
length(num_vector)

2. **Changing the Length**:

   You can change the length of a vector by setting its length explicitly (which truncates or extends it).

In [73]:
# Truncate the vector to 3 elements
length(num_vector) <- 3
num_vector

In [74]:
# Extend the vector to 6 elements (new elements will be NA)
length(num_vector) <- 6
num_vector

#### **Operations on Vectors**
Vectors in R support a wide range of operations, including arithmetic, logical, and relational operations.

In [84]:
num_vector <- c(1, 2, 3, 4, 5)
num_vector

1. **Arithmetic Operations**:

   Arithmetic operations on vectors are element-wise.

In [87]:
# Addition
num_vector + 2

In [89]:
# Subtraction
num_vector - 1

In [90]:
# Multiplication
num_vector * 2

In [91]:
# Division
num_vector / 2

2. **Logical Operations**:

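   Comparing a vector with a value is also element-wise and returns a logical vector. The logical operators `&`, `|`, and `!` combine such vectors element by element.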

In [92]:
# Check if elements are greater than 3
num_vector > 3

3. **Relational Operations**:

   You can compare vectors element by element.

In [93]:
# Element-wise comparison
num_vector == c(1, 10, 3, 4, 5)

4. **Vectorized Functions**:

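   Many functions in R are vectorized, meaning they operate on each element of a vector (for example `sqrt()`). Others, such as `sum()` and `mean()`, summarize the whole vector into a single value.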

In [94]:
# Square root of each element
sqrt(num_vector)

In [95]:
# Sum of all elements
sum(num_vector)

In [96]:
# Mean of all elements
mean(num_vector)
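Conditions can be combined with the element-wise operators `&` (and), `|` (or), and `!` (not). The sketch below uses a made-up vector called `scores` and also shows that summing a logical vector counts its `TRUE` values.

```r
scores <- c(1, 2, 3, 4, 5)

in_range <- scores > 1 & scores < 5    # FALSE TRUE TRUE TRUE FALSE
extremes <- scores < 2 | scores > 4    # TRUE FALSE FALSE FALSE TRUE
not_high <- !(scores > 3)              # TRUE TRUE TRUE FALSE FALSE

# Summing a logical vector counts the TRUE values
n_high <- sum(scores > 3)              # 2
```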

#### **Combining Vectors**
You can combine vectors using the `c()` function.

In [None]:
num_vector <- c(1, 2, 3, 4, 5)
num_vector

1. **Concatenation**:

In [98]:
# Combine two vectors
c(num_vector, c(6, 7, 8))

2. **Appending**:

   You can append elements to a vector using the `c()` function.

In [99]:
# Append elements to num_vector
c(num_vector, 6, 7)
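`c()` always adds at the end (or the beginning). To insert at a specific position, base R also provides `append()` with its `after` argument. A small sketch, with `base_vec` as a made-up example:

```r
base_vec <- c(1, 2, 5)

# Insert 3 and 4 after the second position
inserted <- append(base_vec, c(3, 4), after = 2)   # 1 2 3 4 5

# c() can also prepend
prepended <- c(0, base_vec)                        # 0 1 2 5

# The original is unchanged unless the result is assigned back to it
base_vec                                           # 1 2 5
```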

#### **Vector Recycling**
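When performing operations on vectors of different lengths, R recycles the shorter vector to match the length of the longer one. If the longer length is not a multiple of the shorter one, R still returns a result but issues a warning.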

1. **Example of Recycling**:
   ```r
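   # Vector recycling, assuming num_vector is c(1, 2, 3, 4, 5)
   # R warns here because 5 is not a multiple of 2
   recycled_result <- num_vector + c(1, 2)
   # The second vector is recycled to (1, 2, 1, 2, 1), giving 2 4 4 6 6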
   ```
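To see when recycling is silent and when it warns, here is a short sketch on a made-up six-element vector. `suppressWarnings()` is used only to keep the output quiet in the last case.

```r
a <- c(1, 2, 3, 4, 5, 6)

# Length 6 and length 2: c(10, 20) is reused three times, no warning
even_fit <- a + c(10, 20)     # 11 22 13 24 15 26

# A single number is a length-one vector, recycled across every element
doubled <- a * 2              # 2 4 6 8 10 12

# Length 6 and length 4 do not divide evenly: R computes a result but warns
uneven <- suppressWarnings(a + c(1, 2, 3, 4))   # 2 4 6 8 6 8
```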

#### **NA and NULL in Vectors**
1. **NA (Not Available)**:
   Represents missing values. Operations involving `NA` will return `NA` unless handled explicitly.
   ```r
   # Example with NA
   na_vector <- c(1, 2, NA, 4, 5)
   sum_na <- sum(na_vector, na.rm = TRUE)  # Removes NA before summing
   ```

2. **NULL**:
   Represents the absence of a value or an empty object. Assigning `NULL` removes an element from a list; inside `c()` it is simply dropped, so it never appears as an element of a vector.
   ```r
   # NULL example: NULL is dropped, so the result is c(1, 2, 4, 5)
   null_vector <- c(1, 2, NULL, 4, 5)
   ```
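The difference between the two is easiest to see by checking lengths and results side by side. A minimal sketch with made-up values:

```r
with_na <- c(1, 2, NA, 4, 5)
total <- sum(with_na)                   # NA: the missing value propagates
avg <- mean(with_na, na.rm = TRUE)      # 3
n_na <- length(with_na)                 # 5: NA still occupies a position

with_null <- c(1, 2, NULL, 4, 5)
n_null <- length(with_null)             # 4: NULL is dropped by c()
is.null(NULL)                           # TRUE
```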

#### **Common Vector Functions**
1. **`length()`**: Returns the number of elements in a vector.
2. **`sum()`**: Calculates the sum of all elements in a numeric vector.
3. **`mean()`**: Calculates the mean of the elements.
4. **`sort()`**: Sorts the elements in ascending or descending order.
   ```r
   sorted_vector <- sort(num_vector, decreasing = TRUE)
   ```
5. **`unique()`**: Returns a vector with duplicate elements removed.
   ```r
   unique_vector <- unique(c(1, 2, 2, 3, 4, 4, 5))
   ```
6. **`any()`**: Tests if any element of a logical vector is `TRUE`.
7. **`all()`**: Tests if all elements of a logical vector are `TRUE`.
   ```r
   all_positive <- all(num_vector > 0)
   ```
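The functions listed above can be tried together on one small vector. The vector `values` is a made-up example; note that `unique()` keeps the first occurrence of each value in its original order.

```r
values <- c(3, 1, 2, 3, 5, 1)

length(values)                 # 6
sum(values)                    # 15
mean(values)                   # 2.5
sorted <- sort(values)         # 1 1 2 3 3 5
distinct <- unique(values)     # 3 1 2 5
any(values > 4)                # TRUE
all(values > 1)                # FALSE
```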

#### **Coercion**
When combining different types in a vector, R automatically coerces elements to the most flexible type, in the order logical, integer, numeric, complex, character.

1. **Automatic Coercion**:
   ```r
   mixed_vector <- c(1, "two", 3)  # Coerces to character
   ```
2. **Explicit Coercion**:
   You can explicitly change the type of elements in a vector.
   ```r
   # Convert numeric to character
   as_char <- as.character(num_vector)
   
   # Convert character to numeric (strings that are not numbers,
   # such as "apple", would become NA with a warning)
   as_num <- as.numeric(c("1", "2", "3"))
   ```
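Checking the class of a few mixed vectors makes the coercion order visible. A short sketch with made-up values:

```r
mix1 <- c(TRUE, 2L)      # logical + integer
mix2 <- c(2L, 3.5)       # integer + numeric
mix3 <- c(3.5, "a")      # numeric + character

class(mix1)              # "integer"
class(mix2)              # "numeric"
class(mix3)              # "character"

# Logical values become 1 and 0 in arithmetic
TRUE + TRUE              # 2

# as.logical() treats zero as FALSE and any other number as TRUE
flags <- as.logical(c(0, 1, 7))   # FALSE TRUE TRUE
```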

#### **Summary**
Vectors are the cornerstone of R programming. Understanding how to create, manipulate, and operate on vectors is crucial for data analysis and manipulation in R. By mastering vectors, you gain control over the most fundamental data structure, enabling you to work effectively with more complex structures like matrices, data frames, and lists.