# 03 – Data Frames

Core R concepts: working with tabular data using data frames.

*Part of the [Foundations: Python, R & SQL](../README.md) series.*

## 1. Creating a Data Frame

Use `data.frame()` to create tabular data.

In [1]:
# Create a data frame
df <- data.frame(
  name = c("Alice", "Bob", "Charlie"),
  age = c(25, 30, 22),
  salary = c(50000, 60000, 45000)
)
df

name,age,salary
Alice,25,50000
Bob,30,60000
Charlie,22,45000


## 2. Inspecting a Data Frame

In [2]:
head(df)

name,age,salary
Alice,25,50000
Bob,30,60000
Charlie,22,45000


In [3]:
str(df)

'data.frame':	3 obs. of  3 variables:
 $ name  : Factor w/ 3 levels "Alice","Bob",..: 1 2 3
 $ age   : num  25 30 22
 $ salary: num  50000 60000 45000


In [4]:
summary(df)

      name        age            salary     
 Alice  :1   Min.   :22.00   Min.   :45000  
 Bob    :1   1st Qu.:23.50   1st Qu.:47500  
 Charlie:1   Median :25.00   Median :50000  
             Mean   :25.67   Mean   :51667  
             3rd Qu.:27.50   3rd Qu.:55000  
             Max.   :30.00   Max.   :60000  

In [5]:
nrow(df)

In [6]:
ncol(df)

In [7]:
colnames(df)
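
As a small sketch of what these three calls return for the example data, the block below rebuilds the data frame and stores each result. It also uses `dim()`, which is not shown in the cells above, to get both dimensions at once; the variable names are illustrative.

```r
# Rebuild the example data frame so this block runs on its own
df <- data.frame(
  name = c("Alice", "Bob", "Charlie"),
  age = c(25, 30, 22),
  salary = c(50000, 60000, 45000),
  stringsAsFactors = FALSE
)

n_rows <- nrow(df)      # 3
n_cols <- ncol(df)      # 3
shape  <- dim(df)       # 3 3 (rows first, then columns)
cols   <- colnames(df)  # "name" "age" "salary"
```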

## 3. Accessing Data

In [8]:
# Column by name
df$name              

In [9]:
# Column by name (alternative)
df[["age"]]          

In [10]:
# First row
df[1, ]              

name,age,salary
Alice,25,50000


In [11]:
# Column by name, using [row, column] indexing
df[, "salary"]

In [12]:
# Subset by row and column
df[1:2, c("name", "salary")]  

name,salary
Alice,50000
Bob,60000
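
One detail worth knowing about the forms above: selecting a single column with `df[, "salary"]` returns a plain vector, while `df["salary"]` or `drop = FALSE` keeps a one-column data frame. The following is a minimal sketch of that difference; the variable names are illustrative.

```r
# Rebuild the example data frame so this block runs on its own
df <- data.frame(
  name = c("Alice", "Bob", "Charlie"),
  age = c(25, 30, 22),
  salary = c(50000, 60000, 45000),
  stringsAsFactors = FALSE
)

salary_vec <- df[, "salary"]                # numeric vector
salary_df1 <- df["salary"]                  # one-column data frame
salary_df2 <- df[, "salary", drop = FALSE]  # one-column data frame

is.numeric(salary_vec)     # TRUE
is.data.frame(salary_df1)  # TRUE
is.data.frame(salary_df2)  # TRUE
```

Keeping the data frame form is useful when later code expects column names or calls functions such as `nrow()`.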


## 4. Filtering Rows

In [13]:
# Filter with logical conditions
df[df$age > 24, ]
df[df$salary >= 50000 & df$age < 30, ]

name,age,salary
Alice,25,50000
Bob,30,60000


name,age,salary
Alice,25,50000
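
Bracket filtering behaves differently when the condition contains `NA`: a missing value in the logical index produces a row of `NA`s instead of dropping the row. The sketch below assumes a hypothetical variant of the data in which Bob's age is missing, and compares bracket filtering with `which()` and `subset()`, both of which drop rows where the condition is `NA`.

```r
# Hypothetical variant of the example data: Bob's age is missing
df_na <- data.frame(
  name = c("Alice", "Bob", "Charlie"),
  age = c(25, NA, 22),
  salary = c(50000, 60000, 45000),
  stringsAsFactors = FALSE
)

# The condition evaluates to TRUE, NA, FALSE
by_bracket <- df_na[df_na$age > 24, ]         # 2 rows: Alice and an all-NA row
by_which   <- df_na[which(df_na$age > 24), ]  # 1 row: Alice
by_subset  <- subset(df_na, age > 24)         # 1 row: Alice
```

`subset()` also lets the condition refer to columns without the `df$` prefix, which reads well interactively.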


## 5. Adding and Removing Columns

In [14]:
# Add new column
df$senior <- df$age > 28
df

name,age,salary,senior
Alice,25,50000,FALSE
Bob,30,60000,TRUE
Charlie,22,45000,FALSE


In [15]:
# Remove a column
df$senior <- NULL
df

name,age,salary
Alice,25,50000
Bob,30,60000
Charlie,22,45000
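
When several columns change at once, base R's `transform()` and name-based selection can be an alternative to repeated `$` assignments. This is a sketch rather than the notebook's own approach; `salary_k`, `df2`, `df3`, and `keep` are hypothetical names introduced for illustration.

```r
# Rebuild the example data frame so this block runs on its own
df <- data.frame(
  name = c("Alice", "Bob", "Charlie"),
  age = c(25, 30, 22),
  salary = c(50000, 60000, 45000),
  stringsAsFactors = FALSE
)

# Add two derived columns in one call; df itself is left unchanged
df2 <- transform(df, salary_k = salary / 1000, senior = age > 28)

# Remove several columns by keeping every name not in the drop list
keep <- setdiff(names(df2), c("salary_k", "senior"))
df3 <- df2[keep]

names(df2)  # "name" "age" "salary" "salary_k" "senior"
names(df3)  # "name" "age" "salary"
```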


## 6. Sorting Data

In [16]:
# Order by salary (ascending)
df[order(df$salary), ]

,name,age,salary
3,Charlie,22,45000
1,Alice,25,50000
2,Bob,30,60000


In [17]:
# Order by age (descending)
df[order(-df$age), ]

,name,age,salary
2,Bob,30,60000
1,Alice,25,50000
3,Charlie,22,45000
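
Negating a column with `-` only works for numeric data. For a character column such as `name`, a descending sort can use the `decreasing` argument of `order()` instead. The sketch below also resets the row names, which otherwise keep the original row positions, as the first column of the outputs above shows. The name `by_name_desc` is illustrative.

```r
# Rebuild the example data frame so this block runs on its own
df <- data.frame(
  name = c("Alice", "Bob", "Charlie"),
  age = c(25, 30, 22),
  salary = c(50000, 60000, 45000),
  stringsAsFactors = FALSE
)

# Descending sort on a character column
by_name_desc <- df[order(df$name, decreasing = TRUE), ]
by_name_desc$name       # "Charlie" "Bob" "Alice"
rownames(by_name_desc)  # "3" "2" "1" (original positions)

# Renumber the rows 1..n after sorting
rownames(by_name_desc) <- NULL
rownames(by_name_desc)  # "1" "2" "3"
```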


## Summary

| Task                | Function / Syntax              |
|---------------------|--------------------------------|
| Create data frame   | `data.frame()`                 |
| Inspect structure   | `str()`, `summary()`, `head()` |
| Access columns      | `df$col`, `df[["col"]]`, `df[, "col"]` |
| Filter rows         | `df[condition, ]`              |
| Sort rows           | `order()`                      |
| Modify columns      | Add: `df$new <- ...`<br>Remove: `df$col <- NULL` |
