# 05 – dplyr

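Core R concepts: data manipulation with `dplyr`, covering the verbs `filter()`, `select()`, `arrange()`, `mutate()`, `summarize()`, and `group_by()`, plus the `%>%` pipe.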

*Part of the [Foundations: Python, R & SQL](../README.md) repository.*

In [10]:
# Install rpy2 and load its IPython extension so that %%R cells run R code
!pip install -q rpy2
%load_ext rpy2.ipython



## 1. Load dplyr and prepare data

In [11]:
%%R
library(dplyr)

In [12]:
%%R
# Sample data
df <- data.frame(
  name = c("Alice", "Bob", "Clara", "David", "Eva"),
  age = c(25, 30, 28, 22, 35),
  salary = c(40000, 50000, 45000, 38000, 60000),
  department = c("HR", "IT", "IT", "Finance", "HR")
)

df

   name age salary department
1 Alice  25  40000         HR
2   Bob  30  50000         IT
3 Clara  28  45000         IT
4 David  22  38000    Finance
5   Eva  35  60000         HR


## 2. Filter rows

In [13]:
%%R
# People older than 28
filter(df, age > 28)

  name age salary department
1  Bob  30  50000         IT
2  Eva  35  60000         HR
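
Several conditions can be combined in a single `filter()` call. The following is a minimal sketch that rebuilds the sample data from section 1; the result names `it_over_25` and `young_or_top` are illustrative and not part of the original notebook.

```r
library(dplyr)

# Same sample data as in section 1
df <- data.frame(
  name = c("Alice", "Bob", "Clara", "David", "Eva"),
  age = c(25, 30, 28, 22, 35),
  salary = c(40000, 50000, 45000, 38000, 60000),
  department = c("HR", "IT", "IT", "Finance", "HR")
)

# Comma-separated conditions are combined with AND
it_over_25 <- filter(df, department == "IT", age > 25)

# Use | when either condition may hold (OR)
young_or_top <- filter(df, age < 25 | salary >= 60000)

it_over_25
young_or_top
```

With this data, the first call should keep Bob and Clara, and the second should keep David and Eva.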


## 3. Select columns

In [14]:
%%R
# Select name and salary
select(df, name, salary)

   name salary
1 Alice  40000
2   Bob  50000
3 Clara  45000
4 David  38000
5   Eva  60000
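
`select()` can also drop or rename columns. A small sketch under the same sample data; the names `no_age`, `renamed`, and the new column name `pay` are assumptions made for illustration.

```r
library(dplyr)

# Same sample data as in section 1
df <- data.frame(
  name = c("Alice", "Bob", "Clara", "David", "Eva"),
  age = c(25, 30, 28, 22, 35),
  salary = c(40000, 50000, 45000, 38000, 60000),
  department = c("HR", "IT", "IT", "Finance", "HR")
)

# A minus sign drops a column and keeps the rest
no_age <- select(df, -age)

# new_name = old_name renames a column while selecting it
renamed <- select(df, name, pay = salary)

names(no_age)
names(renamed)
```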


## 4. Arrange rows

In [15]:
%%R
# Sort by salary descending
arrange(df, desc(salary))

   name age salary department
1   Eva  35  60000         HR
2   Bob  30  50000         IT
3 Clara  28  45000         IT
4 Alice  25  40000         HR
5 David  22  38000    Finance
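
`arrange()` accepts more than one sort key, and later keys break ties in earlier ones. The sketch below sorts by department first and then by salary within each department; the result name `sorted` is illustrative.

```r
library(dplyr)

# Same sample data as in section 1
df <- data.frame(
  name = c("Alice", "Bob", "Clara", "David", "Eva"),
  age = c(25, 30, 28, 22, 35),
  salary = c(40000, 50000, 45000, 38000, 60000),
  department = c("HR", "IT", "IT", "Finance", "HR")
)

# Department ascending, then salary descending within each department
sorted <- arrange(df, department, desc(salary))

sorted
```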


## 5. Mutate new columns

In [16]:
%%R
# Add a column with each salary raised by 10%
mutate(df, salary_raised = salary * 1.1)

   name age salary department salary_raised
1 Alice  25  40000         HR         44000
2   Bob  30  50000         IT         55000
3 Clara  28  45000         IT         49500
4 David  22  38000    Finance         41800
5   Eva  35  60000         HR         66000
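
Like the other verbs here, `mutate()` returns a new data frame and leaves `df` unchanged, so the result has to be assigned to be kept. A minimal sketch, where `df_raised` is an illustrative name:

```r
library(dplyr)

# Same sample data as in section 1
df <- data.frame(
  name = c("Alice", "Bob", "Clara", "David", "Eva"),
  age = c(25, 30, 28, 22, 35),
  salary = c(40000, 50000, 45000, 38000, 60000),
  department = c("HR", "IT", "IT", "Finance", "HR")
)

# Assign the result to keep the new column
df_raised <- mutate(df, salary_raised = salary * 1.1)

"salary_raised" %in% names(df)         # FALSE: the original is untouched
"salary_raised" %in% names(df_raised)  # TRUE: the copy has the new column
```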


## 6. Summarize and group

In [17]:
%%R
# Average salary per department
df %>%
  group_by(department) %>%
  summarize(avg_salary = mean(salary))

# A tibble: 3 × 2
  department avg_salary
  <chr>           <dbl>
1 Finance         38000
2 HR              50000
3 IT              47500
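
A single `summarize()` call can compute several aggregates per group. The sketch below adds a row count with `n()` and a maximum age alongside the mean salary; the column names `headcount` and `max_age` and the result name `summary_by_dept` are illustrative choices.

```r
library(dplyr)

# Same sample data as in section 1
df <- data.frame(
  name = c("Alice", "Bob", "Clara", "David", "Eva"),
  age = c(25, 30, 28, 22, 35),
  salary = c(40000, 50000, 45000, 38000, 60000),
  department = c("HR", "IT", "IT", "Finance", "HR")
)

# One output row per department, one column per aggregate
summary_by_dept <- df %>%
  group_by(department) %>%
  summarize(
    headcount = n(),            # number of rows in the group
    avg_salary = mean(salary),
    max_age = max(age)
  )

summary_by_dept
```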


## 7. Pipe chaining

In [18]:
%%R
# Filter, select, and sort in one pipeline
df %>%
  filter(age > 25) %>%
  select(name, salary) %>%
  arrange(salary)

   name salary
1 Clara  45000
2   Bob  50000
3   Eva  60000
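
The pipe passes the left-hand result as the first argument of the next call, so a pipeline is equivalent to nested function calls read from the inside out. A short sketch comparing the two forms; the names `piped` and `nested` are illustrative.

```r
library(dplyr)

# Same sample data as in section 1
df <- data.frame(
  name = c("Alice", "Bob", "Clara", "David", "Eva"),
  age = c(25, 30, 28, 22, 35),
  salary = c(40000, 50000, 45000, 38000, 60000),
  department = c("HR", "IT", "IT", "Finance", "HR")
)

# Pipeline form: reads top to bottom
piped <- df %>%
  filter(age > 25) %>%
  select(name, salary) %>%
  arrange(salary)

# Nested form: the same calls, read from the innermost outward
nested <- arrange(select(filter(df, age > 25), name, salary), salary)

# Compare the two results
all.equal(piped, nested)
```

The piped form tends to be easier to read once there are more than two steps, because the order on the page matches the order of execution.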


## Summary

| Function      | Description              |
|---------------|--------------------------|
| `filter()`    | Filter rows by condition |
| `select()`    | Select specific columns  |
| `arrange()`   | Sort rows                |
| `mutate()`    | Add or transform columns |
| `summarize()` | Aggregate values         |
| `group_by()`  | Group rows for summary   |
| Pipe `%>%`    | Chain operations         |