# Data Structures in R
R provides several fundamental data structures for organizing and manipulating data: vectors, matrices, arrays, lists, data frames, and factors. Knowing how each one stores its elements is the basis for effective data analysis and programming in R.

## Main Types of Data Structures in R
### Vectors
The most basic data structure in R: an ordered sequence of elements that all share the same type.

In [1]:
# Creating vectors
numeric_vector <- c(1, 2, 3, 4, 5)
character_vector <- c("apple", "banana", "cherry")
logical_vector <- c(TRUE, FALSE, TRUE)

# Accessing elements
numeric_vector[3]  # Returns 3
character_vector[1:2]  # Returns "apple" "banana"

# Vector operations
v1 <- c(1, 2, 3)
v2 <- c(4, 5, 6)
v1 + v2  # Element-wise addition: 5 7 9
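
The "same type" rule has a practical consequence worth seeing: when `c()` receives mixed inputs, it coerces them to a common type rather than raising an error. The short sketch below illustrates that, along with recycling of a shorter vector in arithmetic. The variable names (`mixed`, `num_lgl`, `recycled`) are illustrative only and do not appear elsewhere in this notebook.

```r
# Mixed inputs are coerced to a single common type
# (logical -> numeric -> character, in order of flexibility)
mixed <- c(1, "a", TRUE)
class(mixed)      # "character"

num_lgl <- c(1, TRUE, FALSE)
class(num_lgl)    # "numeric" (TRUE becomes 1, FALSE becomes 0)

# In arithmetic, the shorter vector is recycled to the longer length
recycled <- c(1, 2, 3, 4) + c(10, 20)
recycled          # 11 22 13 24
```

If the longer length is not a multiple of the shorter one, R still recycles but issues a warning.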

### Matrices
Two-dimensional, homogeneous data structures (all elements same type).

In [2]:
# Creating a matrix
mat <- matrix(1:9, nrow = 3, ncol = 3)
#      [,1] [,2] [,3]
# [1,]    1    4    7
# [2,]    2    5    8
# [3,]    3    6    9

# Accessing elements
mat[2, 3]  # Returns 8 (row 2, column 3)
mat[1, ]   # Returns first row: 1 4 7

# Matrix operations
t(mat)  # Transpose
mat %*% mat  # Matrix multiplication

     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9

     [,1] [,2] [,3]
[1,]   30   66  102
[2,]   36   81  126
[3,]   42   96  150
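
Two details of matrices are easy to miss: `matrix()` fills column by column unless told otherwise, and `*` is not the same operation as `%*%`. The following is a minimal sketch of both points; the names `mat_by_row`, `m`, `elementwise`, and `product` are introduced here purely for illustration.

```r
# Fill row by row instead of the default column-by-column order
mat_by_row <- matrix(1:9, nrow = 3, byrow = TRUE)
mat_by_row[1, ]   # 1 2 3

# dim() reports the number of rows and columns
dim(mat_by_row)   # 3 3

# `*` multiplies element by element; `%*%` is the matrix product
m <- matrix(1:4, nrow = 2)   # columns are (1, 2) and (3, 4)
elementwise <- m * m         # columns are (1, 4) and (9, 16)
product <- m %*% m           # columns are (7, 10) and (15, 22)
```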


### Arrays
Similar to matrices but can have more than two dimensions.

In [7]:
# Creating a 3D array
arr <- array(1:24, dim = c(2, 3, 4))  # 2 rows, 3 columns, 4 layers
print(arr)

# Accessing elements
arr[1, 2, 3]  # Returns 15 (row 1, column 2, layer 3)

, , 1

     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6

, , 2

     [,1] [,2] [,3]
[1,]    7    9   11
[2,]    8   10   12

, , 3

     [,1] [,2] [,3]
[1,]   13   15   17
[2,]   14   16   18

, , 4

     [,1] [,2] [,3]
[1,]   19   21   23
[2,]   20   22   24
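
Leaving an index empty selects everything along that dimension, so a whole layer can be pulled out as an ordinary matrix. The sketch below rebuilds the same array and shows this, plus one way to summarize each layer with `apply()`. The names `layer3` and `layer_sums` are illustrative assumptions, not part of the original notebook.

```r
arr <- array(1:24, dim = c(2, 3, 4))

# dim() returns the size of each dimension
dim(arr)                 # 2 3 4

# Empty indices select everything along that dimension,
# so this extracts layer 3 as a 2 x 3 matrix
layer3 <- arr[, , 3]
layer3[1, 2]             # 15

# apply() over margin 3 computes one value per layer
layer_sums <- apply(arr, 3, sum)
layer_sums               # 21 57 93 129
```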



### Lists
Ordered collections of objects that can contain elements of different types.

In [9]:
# Creating a list
my_list <- list(
  name = "John",
  age = 30,
  scores = c(85, 92, 88),
  is_student = TRUE
)

# Accessing elements
my_list$name  # Returns "John"
my_list[[3]]  # Returns the scores vector
my_list[["age"]]  # Returns 30

# Nested lists
nested_list <- list(
  person1 = list(name = "Alice", age = 25),
  person2 = list(name = "Bob", age = 30)
)
print(nested_list)

$person1
$person1$name
[1] "Alice"

$person1$age
[1] 25


$person2
$person2$name
[1] "Bob"

$person2$age
[1] 30
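
A common source of confusion with lists is the difference between `[ ]` and `[[ ]]`: single brackets return a smaller list, while double brackets return the element itself. Here is a small sketch of that difference, along with adding an element and reaching into a nested list. The `city` element and its value are hypothetical additions used only for illustration.

```r
my_list <- list(name = "John", age = 30, scores = c(85, 92, 88))

# Single brackets keep the list wrapper; double brackets unwrap it
sub <- my_list["scores"]     # a list of length 1
vec <- my_list[["scores"]]   # the numeric vector itself
class(sub)                   # "list"
class(vec)                   # "numeric"

# Assigning to a new name adds an element (hypothetical field)
my_list$city <- "Paris"
names(my_list)               # "name" "age" "scores" "city"

# Nested lists are accessed by chaining, and sapply() can walk them
nested_list <- list(
  person1 = list(name = "Alice", age = 25),
  person2 = list(name = "Bob", age = 30)
)
nested_list$person2$name     # "Bob"
ages <- sapply(nested_list, function(p) p$age)
ages                         # person1 = 25, person2 = 30
```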




### Data Frames
Two-dimensional structures similar to matrices, except that each column can hold a different type of data (like a spreadsheet or database table).

In [10]:
# Creating a data frame
df <- data.frame(
  name = c("Alice", "Bob", "Charlie"),
  age = c(25, 30, 35),
  score = c(88.5, 92.0, 85.5),
  passed = c(TRUE, TRUE, FALSE)
)

# Accessing elements
df$name  # Returns the name column
df[2, ]  # Returns the second row
df[, "age"]  # Returns the age column

# Common operations
head(df)  # View first few rows
summary(df)  # Summary statistics
nrow(df)  # Number of rows
ncol(df)  # Number of columns

  name age score passed
2  Bob  30    92   TRUE

     name age score passed
1   Alice  25  88.5   TRUE
2     Bob  30  92.0   TRUE
3 Charlie  35  85.5  FALSE


     name                age           score         passed       
 Length:3           Min.   :25.0   Min.   :85.50   Mode :logical  
 Class :character   1st Qu.:27.5   1st Qu.:87.00   FALSE:1        
 Mode  :character   Median :30.0   Median :88.50   TRUE :2        
                    Mean   :30.0   Mean   :88.67                  
                    3rd Qu.:32.5   3rd Qu.:90.25                  
                    Max.   :35.0   Max.   :92.00                  

### Factors
Used to represent categorical data with fixed possible values (levels).

In [12]:
# Creating a factor
gender <- factor(c("male", "female", "male", "female", "female"))
levels(gender)  # Returns "female" "male" (levels are sorted by default)

# Ordered factors
temp_levels <- factor(c("medium", "low", "high"), 
                      levels = c("low", "medium", "high"), 
                      ordered = TRUE)
print(temp_levels)

[1] medium low    high  
Levels: low < medium < high
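
Internally, a factor stores integer codes that point into its levels, and an ordered factor additionally supports comparisons. The following sketch, which reuses the values from the cell above under the shorter illustrative name `temp`, shows how to inspect both behaviors.

```r
gender <- factor(c("male", "female", "male", "female", "female"))

# table() counts observations per level
counts <- table(gender)   # female: 3, male: 2

# The underlying integer codes index into levels(gender)
codes <- as.integer(gender)
codes                     # 2 1 2 1 1

# Ordered factors support comparison operators and max()/min()
temp <- factor(c("medium", "low", "high"),
               levels = c("low", "medium", "high"),
               ordered = TRUE)
below_high <- temp < "high"
below_high                # TRUE TRUE FALSE
hottest <- max(temp)      # high
```

Comparisons such as `<` are only meaningful for ordered factors; on an unordered factor they return `NA` with a warning.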


## Summary

* Use vectors for simple sequences of the same type

* Use matrices/arrays for homogeneous multi-dimensional data

* Use lists for heterogeneous collections or complex objects

* Use data frames for tabular data (most common in data analysis)

* Use factors for categorical variables with fixed levels

Understanding these data structures and when to use each is fundamental to effective R programming and data analysis.
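
When it is unclear which structure an object is, `class()` gives a quick answer. As a rough sketch, the snippet below builds one small object of each kind covered above and reports its class; the `objects` list and its contents are made up for illustration.

```r
# One tiny example of each structure (contents are arbitrary)
objects <- list(
  vector     = c(1, 2, 3),
  matrix     = matrix(1:4, nrow = 2),
  array      = array(1:8, dim = c(2, 2, 2)),
  list       = list(a = 1, b = "x"),
  data_frame = data.frame(x = 1:2),
  factor     = factor(c("a", "b"))
)

# class() can return more than one value, so keep only the first
kinds <- sapply(objects, function(x) class(x)[1])
print(kinds)
```

Note that a plain numeric vector reports `"numeric"` rather than `"vector"`; `is.vector()`, `is.matrix()`, `is.list()`, `is.data.frame()`, and `is.factor()` are available for explicit checks.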

## Practice Exercises

1. Create a numeric vector of 10 random numbers between 1 and 100

In [14]:
vec <- sample(1:100, 10)
print(vec)

 [1] 87 58 88 54 75 34 31 78 43 14
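
Because `sample()` draws at random, the numbers above will differ on every run. One possible variation, sketched below, fixes the seed so the result is reproducible and also shows `runif()` for non-integer values. The seed value 42 and the names `vec_int` and `vec_real` are arbitrary choices for illustration.

```r
# Fixing the seed makes the random draw repeatable (42 is arbitrary)
set.seed(42)

# sample() without replacement returns 10 distinct integers from 1 to 100
vec_int <- sample(1:100, 10)

# runif() draws real numbers uniformly between min and max
vec_real <- runif(10, min = 1, max = 100)

print(vec_int)
print(vec_real)
```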


2. Create a 4×4 matrix of even numbers (2-32)

In [23]:
mat <- matrix(seq(2, 32, by = 2), nrow = 4, ncol = 4)
print(mat)

     [,1] [,2] [,3] [,4]
[1,]    2   10   18   26
[2,]    4   12   20   28
[3,]    6   14   22   30
[4,]    8   16   24   32


3. Create a 3D array (2×3×2) of month abbreviations

In [24]:
arr <- array(month.abb, dim = c(2, 3, 2))
print(arr)

, , 1

     [,1]  [,2]  [,3] 
[1,] "Jan" "Mar" "May"
[2,] "Feb" "Apr" "Jun"

, , 2

     [,1]  [,2]  [,3] 
[1,] "Jul" "Sep" "Nov"
[2,] "Aug" "Oct" "Dec"



4. Create a list containing:

* Your name (character)

* A vector of 3 lucky numbers

* A logical for whether R is fun

Then extract the numbers using both $ and [[ ]] notation.

In [26]:
my_list <- list(Name = "Mehwish",
                LuckyNum = c(3, 4, 5),
                is_fun = TRUE)
my_list$LuckyNum        # Extract the numbers with $
my_list[["LuckyNum"]]   # Extract the numbers with [[ ]]

5. Create a data frame named books with:

* title: 3 book titles (e.g., "R for Beginners", "Data Science 101", "Advanced R")

* author: Author names

* pages: Page counts (numeric)

Add a column long_book (TRUE if pages > 200).

In [39]:

# Step 1: Create the data frame
books <- data.frame(
  title = c("R for Beginners", "Data Science 101", "Advanced R"),
  author = c("Jane Doe", "John Smith", "Alice Brown"),
  pages = c(120, 250, 180)
)

# Step 2: Add a column
books$long_book <- books$pages > 200
print(books)


             title      author pages long_book
1  R for Beginners    Jane Doe   120     FALSE
2 Data Science 101  John Smith   250      TRUE
3       Advanced R Alice Brown   180     FALSE


6. Create a factor weather from:
c("Sunny", "Rainy", "Cloudy", "Sunny", "Rainy").

* Add a new level "Snowy" (even if not present in data).

* Convert it to an ordered factor:
"Rainy" < "Cloudy" < "Sunny" < "Snowy".

In [41]:
# Step 1: Create factor
weather <- factor(c("Sunny", "Rainy", "Cloudy", "Sunny", "Rainy"))

# Step 2: Add level
levels(weather) <- c(levels(weather), "Snowy")

# Step 3: Make ordered
weather <- factor(weather, 
                 levels = c("Rainy", "Cloudy", "Sunny", "Snowy"), 
                 ordered = TRUE)
print(weather)

[1] Sunny  Rainy  Cloudy Sunny  Rainy 
Levels: Rainy < Cloudy < Sunny < Snowy
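
An alternative to the three-step solution, offered here only as a sketch, is to declare all four levels and the ordering in a single `factor()` call. It also shows that an unused level such as "Snowy" appears with a count of zero, and that `droplevels()` removes it again. The names `counts` and `trimmed` are illustrative.

```r
# Declare the data, all levels, and the ordering in one call
weather <- factor(c("Sunny", "Rainy", "Cloudy", "Sunny", "Rainy"),
                  levels = c("Rainy", "Cloudy", "Sunny", "Snowy"),
                  ordered = TRUE)

# Unused levels are still counted, with a frequency of 0
counts <- table(weather)
print(counts)

# droplevels() removes levels that do not occur in the data
trimmed <- droplevels(weather)
levels(trimmed)   # "Rainy" "Cloudy" "Sunny"
```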
