# Functions of the dplyr Package

dplyr is part of the tidyverse in R. It provides a small set of easy-to-use functions for manipulating data frames, each named after the action it performs.

### select() – Choose Columns

* Purpose: Select specific columns from a data frame.

* Use Case: Keep only the variables you need and drop the rest.

In [2]:
#install.packages("dplyr")
library(dplyr)

# Example data
df <- data.frame(Name = c("Ali", "Sara", "John"),
                 Age = c(25, 30, 28),
                 Salary = c(50000, 60000, 55000))

# Select only Name and Age
select(df, Name, Age)

| Name | Age |
|------|-----|
| `<chr>` | `<dbl>` |
| Ali | 25 |
| Sara | 30 |
| John | 28 |
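Besides naming the columns to keep, `select()` can also drop columns or match them by name pattern. The following is a minimal sketch using the same example data; the object names `no_salary` and `s_cols` are only illustrative.

```r
library(dplyr)

# Same example data as above
df <- data.frame(Name = c("Ali", "Sara", "John"),
                 Age = c(25, 30, 28),
                 Salary = c(50000, 60000, 55000))

# Drop one column instead of listing the ones to keep
no_salary <- select(df, -Salary)

# Keep columns whose names start with "S" (here only Salary)
s_cols <- select(df, starts_with("S"))

names(no_salary)  # "Name" "Age"
names(s_cols)     # "Salary"
```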


### filter() – Filter Rows

In [2]:
# Filter rows where Age > 26
filter(df, Age > 26)


| Name | Age | Salary |
|------|-----|--------|
| `<chr>` | `<dbl>` | `<dbl>` |
| Sara | 30 | 60000 |
| John | 28 | 55000 |
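`filter()` also accepts more than one condition. As a small sketch on the same data: conditions separated by commas must all hold (logical AND), while `|` keeps rows that satisfy either side. The thresholds and the names `both` and `either` are chosen only for illustration.

```r
library(dplyr)

# Same example data as above
df <- data.frame(Name = c("Ali", "Sara", "John"),
                 Age = c(25, 30, 28),
                 Salary = c(50000, 60000, 55000))

# Both conditions must be TRUE: older than 26 AND salary above 55000
both <- filter(df, Age > 26, Salary > 55000)

# Either condition may be TRUE: younger than 26 OR salary above 55000
either <- filter(df, Age < 26 | Salary > 55000)

both$Name    # "Sara"
either$Name  # "Ali" "Sara"
```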


### arrange() – Sort Rows

In [3]:
# Sort by Age ascending
arrange(df, Age)

# Sort by Salary descending
arrange(df, desc(Salary))


| Name | Age | Salary |
|------|-----|--------|
| `<chr>` | `<dbl>` | `<dbl>` |
| Ali | 25 | 50000 |
| John | 28 | 55000 |
| Sara | 30 | 60000 |


| Name | Age | Salary |
|------|-----|--------|
| `<chr>` | `<dbl>` | `<dbl>` |
| Sara | 30 | 60000 |
| John | 28 | 55000 |
| Ali | 25 | 50000 |
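`arrange()` can sort by several columns at once; later columns break ties in earlier ones. The sketch below assumes a hypothetical `staff` data frame with a `Dept` column, which is not part of the example data above.

```r
library(dplyr)

# Hypothetical data with a grouping column (for illustration only)
staff <- data.frame(Name = c("Ali", "Sara", "John", "Emma"),
                    Dept = c("HR", "IT", "HR", "IT"),
                    Salary = c(50000, 60000, 55000, 52000))

# Sort by Dept ascending, then by Salary descending within each Dept
sorted <- arrange(staff, Dept, desc(Salary))

sorted$Name  # "John" "Ali" "Sara" "Emma"
```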


### mutate() – Create or Transform Columns

In [4]:
# Add a new column Bonus = 10% of Salary
mutate(df, Bonus = Salary * 0.10)


| Name | Age | Salary | Bonus |
|------|-----|--------|-------|
| `<chr>` | `<dbl>` | `<dbl>` | `<dbl>` |
| Ali | 25 | 50000 | 5000 |
| Sara | 30 | 60000 | 6000 |
| John | 28 | 55000 | 5500 |
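A single `mutate()` call can create several columns, and a later column may use one created earlier in the same call. A minimal sketch on the same data, where the `Total` column is an illustrative addition:

```r
library(dplyr)

# Same example data as above
df <- data.frame(Name = c("Ali", "Sara", "John"),
                 Age = c(25, 30, 28),
                 Salary = c(50000, 60000, 55000))

# Bonus is created first, then reused to compute Total
with_total <- mutate(df,
                     Bonus = Salary * 0.10,
                     Total = Salary + Bonus)

with_total$Total  # 55000 66000 60500
```

Note that `mutate()` returns a new data frame; `df` itself is unchanged unless the result is assigned back to it.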


### summarize() – Summarize Data

In [3]:
# Average Salary
summarize(df, avg_salary = mean(Salary))

# Sample data
students <- data.frame(
  Name = c("Ali", "Sara", "John", "Emma", "Omar", "Lina"),
  Class = c("A", "A", "B", "B", "A", "B"),
  Marks = c(85, 90, 78, 88, 92, 81)
)

# Group by Class and calculate the average and highest marks
summarize(
  group_by(students, Class),
  avg_marks = mean(Marks),
  max_marks = max(Marks)
)



| avg_salary |
|------------|
| `<dbl>` |
| 55000 |


| Class | avg_marks | max_marks |
|-------|-----------|-----------|
| `<chr>` | `<dbl>` | `<dbl>` |
| A | 89.0 | 92 |
| B | 82.33333 | 88 |
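Grouped summaries are not limited to `mean()` and `max()`. As a sketch on the same `students` data, `n()` counts the rows in each group, and a summary can combine several functions. The column names `n_students`, `min_marks`, and `range_marks` are illustrative.

```r
library(dplyr)

# Same sample data as above
students <- data.frame(
  Name = c("Ali", "Sara", "John", "Emma", "Omar", "Lina"),
  Class = c("A", "A", "B", "B", "A", "B"),
  Marks = c(85, 90, 78, 88, 92, 81)
)

# One row per Class: group size, lowest mark, and spread of marks
class_stats <- summarize(
  group_by(students, Class),
  n_students = n(),
  min_marks = min(Marks),
  range_marks = max(Marks) - min(Marks)
)

class_stats$n_students   # 3 3
class_stats$range_marks  # 7 10
```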


## Summary

| Function    | General Syntax |
|-------------|----------------|
| **select()**   | `select(data, col1, col2, …)` |
| **filter()**   | `filter(data, condition)` |
| **arrange()**  | `arrange(data, col)` <br> `arrange(data, desc(col))` |
| **mutate()**   | `mutate(data, newcol = expression)` |
| **summarize()** | `summarize(data, newcol = summary_fn(col))`, e.g. `mean(col)` |
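The functions in the table can be combined. The examples in this notebook call each function on its own; the sketch below instead chains them with the `%>%` pipe, which dplyr makes available when it is loaded. The pipe passes the result of each step as the first argument of the next. The particular sequence of steps is only an illustration.

```r
library(dplyr)

# Same example data as in the sections above
df <- data.frame(Name = c("Ali", "Sara", "John"),
                 Age = c(25, 30, 28),
                 Salary = c(50000, 60000, 55000))

result <- df %>%
  filter(Age > 26) %>%             # keep rows with Age above 26
  mutate(Bonus = Salary * 0.10) %>% # add a 10% bonus column
  arrange(desc(Salary)) %>%        # highest salary first
  select(Name, Salary, Bonus)      # keep only these columns

result$Name  # "Sara" "John"
```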


## Practice Questions with `mtcars`  

1. **select()** → Select only the columns `mpg`, `hp`, and `gear` from `mtcars`.  

2. **filter()** → Find all cars where `mpg` is greater than 25.  

3. **arrange()** → Sort the dataset by `hp` (horsepower) in descending order.  

4. **mutate()** → Create a new column called `efficiency` that is the ratio of `mpg` to `wt`.  

5. **summarize()** → Find the average horsepower (`hp`) of all cars in the dataset.  

