# Data Structures in R

## Introduction 
___

In any programming language, you use variables to store data. Each variable occupies a location in memory, so creating a variable reserves some memory for its value. Data structures are ways of arranging that data so it can be accessed and used efficiently.

Unlike languages such as C and Java, R does not require variables to be declared with a data type. Instead, a variable is assigned an R object, and the data type of that object becomes the data type of the variable. There are many types of R objects. The most commonly used ones are:

- Vector
- Matrix
- Array
- Lists
- Data Frames
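
As a minimal sketch of the point that a variable takes its type from the object assigned to it, the snippet below rebinds one name to objects of different types. The variable names (`x`, `type_1`, and so on) are illustrative only.

```R
# The same name can refer to objects of different types over time;
# no type declaration is needed.
x <- 42L                  # an integer vector of length 1
type_1 <- typeof(x)       # "integer"

x <- "forty-two"          # now a character vector
type_2 <- typeof(x)       # "character"

x <- c(TRUE, FALSE)       # now a logical vector
type_3 <- typeof(x)       # "logical"

print(c(type_1, type_2, type_3))
```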

## R Vector
___

In R, a vector is a fundamental data structure that represents an ordered collection of elements of the same data type. Vectors can contain numeric, character, logical, or complex values. They are essential for storing and manipulating data efficiently in R.

Here are some key characteristics of vectors in R:

1. **Homogeneous Data:** All elements within a vector must be of the same data type. For example, a numeric vector can only contain numeric values, a character vector can only contain character values, and so on.

2. **One-Dimensional:** Vectors in R are one-dimensional. They hold a single sequence of elements and have a length but, unlike matrices and arrays, no separate row and column dimensions.

3. **Length:** A vector's length is set when it is created. It can be changed afterwards, for example by concatenating with `c()`, by subsetting, or by assigning to `length(x)`, but R generally does this by building a new vector rather than resizing the existing one in place.

4. **Indexed Access:** Individual elements within a vector can be accessed using numeric indices. Indices in R are 1-based, meaning the first element of a vector has an index of 1, the second element has an index of 2, and so on.

5. **Operations:** Vectors support various operations, including arithmetic operations (e.g., addition, subtraction, multiplication), logical operations, subsetting, filtering, and more.
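
The characteristics above can be illustrated with a short sketch. The vector names and values here are made up for the example.

```R
# Create a numeric vector with c()
scores <- c(90, 85, 72, 60)

length(scores)        # number of elements: 4
scores[1]             # first element (indices are 1-based): 90
scores[c(2, 4)]       # elements 2 and 4: 85 60
scores[scores > 80]   # filtering with a logical condition: 90 85
scores + 5            # arithmetic is applied element-wise: 95 90 77 65

# Mixing types in c() does not fail; the values are coerced to a common type
mixed <- c(1, "a", TRUE)
typeof(mixed)         # "character"
print(mixed)
```

The last lines show what homogeneity means in practice: R silently converts the number and the logical value to strings so that every element has the same type.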

There are two main types of vectors in R: atomic vectors and lists. Here's a comparison between them:

1. **Atomic Vectors:**
   - Atomic vectors are vectors that contain elements of a single data type, such as numeric, character, logical, or complex.
   - All elements within an atomic vector must be of the same data type.
   - Atomic vectors are more memory-efficient than lists because they store elements contiguously in memory.
   - Examples of atomic vectors include numeric vectors, character vectors, logical vectors, and complex vectors.

2. **Lists:**
   - Lists are vectors that can contain elements of different data types, including other lists.
   - Unlike atomic vectors, lists can hold heterogeneous data, meaning they can contain elements of different data types within the same list.
   - Lists are more flexible than atomic vectors but may consume more memory and be less efficient for certain operations due to their heterogeneous nature.
   - Lists are commonly used for storing complex or structured data, such as data frames, nested data structures, or objects with different attributes.

The main difference between atomic vectors and recursive vectors (lists) is that atomic vectors are homogeneous, whereas lists can be heterogeneous. Vectors have three common properties:

1. Type, `typeof()`, what it is.
2. Length, `length()`, how many elements it contains.
3. Attributes, `attributes()`, additional arbitrary metadata.
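
A brief sketch of these three properties, using one atomic vector and one list. The objects `v` and `l` are hypothetical examples.

```R
# A named double vector: the names are stored as an attribute
v <- c(a = 1.5, b = 2.5, c = 3.5)
typeof(v)        # "double"
length(v)        # 3
attributes(v)    # a list with one entry, $names

# An unnamed list holding three different types
l <- list(1L, "two", TRUE)
typeof(l)        # "list"
length(l)        # 3
attributes(l)    # NULL, because no attributes have been set

is.atomic(v)     # TRUE
is.list(l)       # TRUE
```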


__Atomic vectors__:  
There are four common types of atomic vectors in R (complex and raw vectors are used less often):

1. Numeric (double) Data Type
2. Integer Data Type
3. Character Data Type
4. Logical Data Type
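
As an illustration, the following sketch creates one vector of each of the four common types and checks it with `typeof()`. The variable names are arbitrary.

```R
dbl <- c(1.5, 2, 3.25)       # numbers are stored as doubles by default
int <- c(1L, 2L, 3L)         # the L suffix creates integers
chr <- c("a", "b")           # character strings
lgl <- c(TRUE, FALSE, NA)    # logical values (NA marks a missing value)

types <- sapply(list(dbl, int, chr, lgl), typeof)
print(types)                 # "double" "integer" "character" "logical"

# Both doubles and integers count as numeric
is.numeric(dbl)              # TRUE
is.numeric(int)              # TRUE
```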


  
In summary, vectors are essential data structures in R for storing and manipulating data efficiently. Atomic vectors contain elements of a single data type, while lists can hold heterogeneous data and provide more flexibility in data organization. The choice between using atomic vectors and lists depends on the specific requirements and characteristics of the data being handled.

## R Matrix 
___

In R, a matrix is a two-dimensional data structure that contains elements arranged in rows and columns. Each element in a matrix must be of the same data type, such as numeric, character, or logical. Matrices are useful for storing and manipulating data in a structured way, particularly when dealing with numerical data or when performing matrix algebra operations.


__Here are some key characteristics and features of R matrices:__

- Rectangular Structure: Matrices in R have a rectangular structure, meaning they consist of rows and columns, with each row having the same number of elements and each column having the same number of elements.
  
- Homogeneous Data: All elements within a matrix must be of the same data type, such as numeric, character, or logical. Mixing different data types within a single matrix is not allowed.

- Indexed Access: Individual elements within a matrix can be accessed using row and column indices, allowing for selective retrieval and modification of values.

- Dimension Attributes: Matrices in R have a `dim` attribute that specifies the number of rows and columns in the matrix. It can be retrieved or modified using the `dim()` function.

- Matrix Operations: R provides extensive support for matrix operations, including arithmetic operations (addition, subtraction, multiplication), matrix multiplication, transposition, inversion, determinant calculation, eigenvalue computation, and various other linear algebraic operations.

- Integration with Statistical Analysis: Matrices play a fundamental role in statistical analysis and modeling in R. Many modeling and linear algebra routines take numeric matrices as their inputs, especially in fields like machine learning and data analysis.
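
The features listed above can be sketched with a small matrix. The name `m` and the values are chosen only for illustration. Note the difference between `*`, which multiplies element by element, and `%*%`, which performs matrix multiplication.

```R
# Create a 2 x 3 matrix; values fill column by column by default
m <- matrix(1:6, nrow = 2, ncol = 3)
print(m)

dim(m)            # 2 3
m[2, 3]           # element in row 2, column 3: 6
m[1, ]            # the whole first row: 1 3 5

elementwise <- m * m       # element-wise product, still 2 x 3
transposed  <- t(m)        # transpose, 3 x 2
product     <- m %*% t(m)  # matrix product, 2 x 2
print(product)
```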


__Applications of Matrices__
- Matrices are used in geological surveys, where the recorded information can be represented as matrices for plotting graphs, performing statistical operations, and so on.
- They can represent real-world data such as the traits of a population, and are a convenient way to tabulate and plot survey results.
- In robotics and automation, matrices are used to describe robot movements.
- In economics, matrices are used in calculating gross domestic product, which helps in assessing how efficiently goods are produced.
- In computer graphics, matrices play a vital role in projecting a three-dimensional image onto a two-dimensional screen, creating realistic-seeming motion.
- In physics, matrices can be applied in the study of electrical circuits.

## R Array
___

In R, an array is a multidimensional data structure that extends the concept of a vector to two or more dimensions. It allows you to store data in a grid-like format with multiple rows, columns, and additional dimensions. Arrays are useful for representing data that have multiple attributes or dimensions, such as matrices, three-dimensional data, or higher-dimensional data.

Here are some key features of arrays in R:

1. Multidimensional Structure: Arrays can have multiple dimensions, meaning they can have rows, columns, and additional dimensions beyond the two-dimensional structure of matrices.
   
1. Homogeneous Data: Similar to vectors, arrays must contain elements of the same data type. This means that all elements within an array must be of the same data type, such as numeric, character, logical, or complex.

1. Indexed Access: Elements within an array can be accessed using a combination of indices corresponding to the dimensions of the array. For example, a two-dimensional array can be accessed using row and column indices, while a three-dimensional array requires indices for row, column, and depth.

1. Dimensions: The dimensions of an array are set when it is created. They can be changed afterwards by assigning to `dim()`, provided the total number of elements stays the same. You can also create new arrays by reshaping or combining existing arrays.

1. Operations: Arrays support various operations, including arithmetic operations (e.g., addition, subtraction, multiplication), subsetting, slicing, reshaping, and more. R provides functions and operators for performing these operations efficiently on arrays.

1. Integration with Matrices: In R, matrices are a special case of two-dimensional arrays. This means that you can perform array operations on matrices, and matrices can be treated as arrays with two dimensions.

1. Efficient Storage: Arrays in R are stored in a contiguous block of memory, making them efficient for storing and accessing large volumes of data. However, large arrays can consume a significant amount of memory, so it's important to consider memory usage when working with arrays.

An array is created using the `array()` function. The `data` argument takes a vector of values, and the `dim` argument takes a vector giving the size of each dimension. Here's an example of creating a simple array in R:

```R
# Create a three-dimensional array with numeric values
array_data <- array(data = 1:24,  # Numeric data elements
                    dim = c(2, 3, 4))  # Dimensions: 2 rows, 3 columns, 4 layers

# Print the array
print(array_data)
```

```
, , 1

     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6

, , 2

     [,1] [,2] [,3]
[1,]    7    9   11
[2,]    8   10   12

, , 3

     [,1] [,2] [,3]
[1,]   13   15   17
[2,]   14   16   18

, , 4

     [,1] [,2] [,3]
[1,]   19   21   23
[2,]   20   22   24
```



This will create a 2x3x4 array holding the numbers 1 to 24. The values are filled in column-major order: down each column first, then across the columns, then layer by layer.
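
To show indexed access on the same array, here is a short sketch. It recreates `array_data` so that it runs on its own; `layer_sums` is a name introduced only for this example.

```R
array_data <- array(data = 1:24, dim = c(2, 3, 4))

# One index per dimension: row, column, layer
array_data[1, 2, 1]      # 3
array_data[2, 3, 4]      # 24

# Leaving an index empty selects everything along that dimension
layer_2 <- array_data[, , 2]   # the whole second layer, a 2 x 3 matrix
dim(layer_2)                   # 2 3

# Apply a function over the third dimension: one sum per layer
layer_sums <- apply(array_data, 3, sum)
print(layer_sums)              # 21 57 93 129
```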

In summary, arrays in R are powerful data structures for representing multidimensional data and performing various operations on them efficiently. They provide a flexible and efficient way to store and manipulate data with multiple dimensions, making them essential for many data analysis and modeling tasks.

## List in R
___

In R, a list is a versatile and flexible data structure that can store elements of different data types, such as numeric, character, logical, or even other lists. Unlike vectors and arrays, which are restricted to storing elements of the same data type, lists allow for heterogeneous data, making them suitable for representing complex and structured data. A list is created using the `list()` function.

Here are some key features of lists in R:

1. **Heterogeneous Data:** Lists can contain elements of different data types within the same list. This means that you can store numeric values, character strings, logical values, or even other lists as elements of a single list.

2. **Arbitrary Length:** Lists can have an arbitrary number of elements, and each element can have a different length. This makes lists highly flexible and suitable for storing data structures of varying complexity.

3. **Indexed Access:** Elements within a list can be accessed using numeric indices or names assigned to the elements. This allows for selective retrieval and modification of list elements.

4. **Recursive Structure:** Lists can contain other lists as elements, leading to a recursive data structure. This allows for the creation of nested or hierarchical data structures, such as lists of lists or lists of data frames.

5. **Named Elements:** List elements can have names assigned to them, making it easier to reference and access specific elements within a list.

6. **Versatility:** Lists are highly versatile and can be used to represent a wide range of data structures, including collections of variables, results of statistical analyses, configurations of models, hierarchical data, and more.

7. **Operations:** Lists support various operations, including adding, removing, and modifying elements, sub-setting, merging, and combining lists, as well as applying functions to list elements.


```R
# Create a list with elements of different data types
my_list <- list(name = "John",
                age = 30,
                is_student = TRUE,
                favorite_numbers = c(1, 3, 5),
                address = list(city = "New York", zip_code = 10001))
print(my_list)
```

```
$name
[1] "John"

$age
[1] 30

$is_student
[1] TRUE

$favorite_numbers
[1] 1 3 5

$address
$address$city
[1] "New York"

$address$zip_code
[1] 10001
```




This will create a list with five elements: a character string, a numeric value, a logical value, a numeric vector, and another nested list. Each element is labeled with a name, making it easier to reference and access specific elements within the list.
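
The following sketch shows a few common ways to access and modify the elements of this list. It recreates `my_list` so that it runs on its own; the `email` element and its value are hypothetical additions for the example.

```R
my_list <- list(name = "John",
                age = 30,
                is_student = TRUE,
                favorite_numbers = c(1, 3, 5),
                address = list(city = "New York", zip_code = 10001))

my_list$name                 # access by name with $: "John"
my_list[["age"]]             # access by name with [[ ]]: 30
my_list[[4]][2]              # second value of the fourth element: 3
my_list$address$city         # reach into the nested list: "New York"

# Single brackets return a sub-list rather than the element itself
class(my_list["age"])        # "list"

my_list$email <- "john@example.com"   # add a new element
my_list$is_student <- NULL            # remove an element
names(my_list)
```

The distinction between `[[ ]]` and `[ ]` is worth noting: `[[ ]]` extracts the element, while `[ ]` keeps it wrapped in a list.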

In summary, lists in R are flexible and powerful data structures for storing and manipulating heterogeneous data. They provide a versatile way to represent complex and structured data, making them essential for many data analysis, modeling, and programming tasks in R.

## Data Frame in R
___

In R, a data frame is a two-dimensional data structure that resembles a table or a spreadsheet, consisting of rows and columns. It is a fundamental data structure used for storing and manipulating data in a tabular format. Different columns of a data frame can hold different data types, such as numeric, character, logical, or factor, but all values within a single column share one type.

Here are some key features of data frames in R:

1. **Tabular Structure:** Data frames have a tabular structure, with rows representing observations or cases, and columns representing variables or attributes. Each row corresponds to a single observation, and each column corresponds to a single variable.

2. **Heterogeneous Data:** Data frames can contain different data types within the same data frame. Each column holds values of a single type, but the type can differ from one column to the next, such as numeric, character, logical, or factor.

3. **Indexed Access:** Elements within a data frame can be accessed using row and column indices, or by referring to variable names. This allows for selective retrieval and modification of data frame elements.

4. **Column Names:** Data frame columns can have names assigned to them, making it easier to reference and access specific variables within a data frame.

5. **Row Names:** Data frame rows can also have names assigned to them, although this is optional. Row names can be used to uniquely identify rows within a data frame.

6. **Relationship to Matrices and Lists:** Like a matrix, a data frame is two-dimensional and can be indexed by row and column. Internally, however, it is a list of equal-length column vectors, which is what allows each column to have its own data type.

7. **Operations:** Data frames support various operations, including adding, removing, and modifying columns and rows, subsetting, merging, joining, filtering, sorting, and summarizing data.

Data frames are widely used in R for data manipulation, exploration, analysis, and modeling tasks. They are particularly useful for working with structured datasets, such as those imported from spreadsheets, databases, or other data sources. Data frames are also the primary data structure used in conjunction with statistical modeling functions and packages in R.

Here's an example of creating a simple data frame in R:

```R
# Create a data frame with three variables: ID, Age, and Gender
my_data <- data.frame(ID = c(1, 2, 3, 4),
                      Age = c(25, 30, 35, 40),
                      Gender = c("Male", "Female", "Male", "Female"))

# Print the data frame
print(my_data)
```

This will create a data frame with four rows and three columns, representing ID, Age, and Gender variables for four individuals. The columns have different data types (ID and Age are numeric, Gender contains text), and the rows represent individual observations.

A data frame resembles a two-dimensional array. Unlike an array, however, the data we store in the columns of a data frame can be of various types. That is, one column might be a numeric variable, another might be a factor, and a third might be a character variable. All columns have to be of the same length.

Characteristics of a Data Frame:

- The column names should be non-empty.
- The row names should be unique.
- The data stored in a data frame can be of numeric, factor, character, or logical type, among others.
- Each column should contain the same number of data items.
- Datasets imported with functions such as `read.csv()` are stored as data frames by default.

Evaluating `my_data` from the example above in a notebook displays the table together with each column's type:

| ID | Age | Gender |
|---|---|---|
| `<dbl>` | `<dbl>` | `<chr>` |
| 1 | 25 | Male |
| 2 | 30 | Female |
| 3 | 35 | Male |
| 4 | 40 | Female |
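
As a sketch of the access and modification operations described in this section, the snippet below works with the same data frame. It recreates `my_data` so that it runs on its own; the `Senior` column and the `older` variable are hypothetical additions. The character results assume R 4.0 or later, where `data.frame()` keeps strings as character vectors by default.

```R
my_data <- data.frame(ID = c(1, 2, 3, 4),
                      Age = c(25, 30, 35, 40),
                      Gender = c("Male", "Female", "Male", "Female"))

str(my_data)                      # compact summary of columns and their types

my_data$Age                       # a column by name: 25 30 35 40
my_data[2, "Gender"]              # row 2 of the Gender column
older <- my_data[my_data$Age > 28, ]   # filter rows with a logical condition
nrow(older)                       # 3

# Add a new logical column derived from an existing one
my_data$Senior <- my_data$Age >= 35

# A data frame is a list of equal-length columns
is.list(my_data)                  # TRUE
sapply(my_data, class)            # the class of each column
```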


## Difference between Arrays and Data Frames
___

In R, both arrays and data frames are data structures used for organizing and storing data. However, they have distinct characteristics and are suitable for different types of data and operations.

Here's a comparison between arrays and data frames in R:

1. **Structure:**
   - **Array:** An array is a multidimensional data structure that can store elements of the same data type. It can have two or more dimensions, organized in rows, columns, and additional dimensions.
   - **Data Frame:** A data frame is a two-dimensional data structure similar to a spreadsheet or a database table. It consists of rows and columns, where each column can contain elements of different data types. Unlike arrays, data frames are primarily used for tabular data.

2. **Homogeneity vs. Heterogeneity:**
   - **Array:** Arrays are homogeneous data structures, meaning that all elements within an array must be of the same data type.
   - **Data Frame:** Data frames are heterogeneous data structures, allowing columns to contain elements of different data types. This makes data frames suitable for representing tabular data with mixed data types, such as numeric, character, and factor variables.

3. **Indexing:**
   - **Array:** Elements within an array can be accessed using numeric indices corresponding to the dimensions of the array. For example, a two-dimensional array can be indexed using row and column indices.
   - **Data Frame:** Data frames support both numeric and character-based indexing. Columns can be accessed using column names or numeric indices, while rows can be accessed using numeric row indices, row names, or logical conditions.

4. **Integration with Statistical Analysis:**
   - **Array:** Arrays are commonly used for storing and manipulating multidimensional data in statistical analysis, such as experimental data, image data, or time series data.
   - **Data Frame:** Data frames are widely used in statistical analysis and data manipulation tasks, especially for tabular data analysis, data cleaning, and data visualization. They are commonly used for working with datasets imported from external sources or generated from statistical analyses.

5. **Operations:**
   - **Array:** Arrays support various array operations, including arithmetic operations, reshaping, subsetting, and array-specific functions for multidimensional data manipulation.
   - **Data Frame:** Data frames support operations commonly used in data analysis, such as subsetting, filtering, merging, joining, aggregation, and statistical computations. R provides a wide range of functions and packages tailored for data frame manipulation and analysis.
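
The homogeneity difference can be seen in a small sketch: a data frame keeps each column's type, while converting it to a matrix (a two-dimensional array) forces every value to one common type. The object names below are illustrative only.

```R
# A data frame with one numeric and one text column
df <- data.frame(ID = c(1, 2),
                 Gender = c("Male", "Female"))
col_classes <- sapply(df, class)
print(col_classes)        # each column reports its own class

# Converting to a matrix coerces every value to a single type
m <- as.matrix(df)
typeof(m)                 # "character"

# An array holds one type throughout, here integer
num_array <- array(1:8, dim = c(2, 2, 2))
typeof(num_array)         # "integer"
```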

In summary, arrays are suitable for storing and manipulating multidimensional homogeneous data, while data frames are better suited for working with tabular data with mixed data types. Each data structure has its own strengths and is chosen based on the specific requirements of the data and the analysis tasks at hand.