# *Tibbles in R*

In R, a tibble is a modern and improved version of a data frame, designed to make working with data easier and more intuitive. Tibbles have a similar structure to data frames, but with some differences in their behavior and syntax.

In [2]:
library(tidyverse)

"package 'tidyverse' was built under R version 4.2.3"
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.0     ✔ purrr     1.0.1
✔ forcats   1.0.0     ✔ readr     2.1.4
✔ ggplot2   3.4.1     ✔ stringr   1.5.0
✔ lubridate 1.9.2     ✔ tidyr     1.3.0
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors


In [10]:
# create a tibble with three columns
my_tibble <- tibble(name = c("Alice", "Bob", "Charlie"), age = c(25, 30, 35),
                    gender = c("F", "M", "M"))

# print the tibble
print(my_tibble)

# A tibble: 3 × 3
  name      age gender
  <chr>   <dbl> <chr>
1 Alice      25 F
2 Bob        30 M
3 Charlie    35 M


#### Tibbles have several advantages over data frames, including:

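* Printing shows only the first 10 rows and the columns that fit on screen, along with each column's type, which keeps large datasets readable.
* Character vectors are never converted to factors automatically, and column names are left unchanged, which avoids surprises with data types. (Base `data.frame()` converted strings to factors by default before R 4.0.0.)
* dplyr verbs such as `filter()` and `mutate()` use non-standard evaluation, so you can refer to columns by bare name instead of with the dollar sign operator ($). These verbs also accept ordinary data frames.
* Tibbles are the data frame type that tidyverse packages produce and expect, which makes it easier to work with data in a tidy way.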
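Some of these differences are easy to see side by side. The sketch below is illustrative only: `df` and `tb` are made-up objects, and it assumes the tibble package is installed. It also touches on two stricter subsetting rules that tibbles apply (`[` keeps the result a tibble, and `$` does not partially match names).

```r
library(tibble)

# the same hypothetical data as a base data frame and as a tibble
df <- data.frame(name = c("Alice", "Bob"), age = c(25, 30))
tb <- tibble(name = c("Alice", "Bob"), age = c(25, 30))

# strings stay character vectors in a tibble
class(tb$name)

# `[` with one column: a data frame drops to a vector, a tibble stays a tibble
class(df[, "age"])
class(tb[, "age"])

# `$` partial matching: a data frame matches "na" to "name",
# a tibble returns NULL (normally with a warning)
df$na
suppressWarnings(tb$na)
```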

#### Accessing columns in a tibble:

In [11]:
# select the "name" column from the tibble
my_tibble$name
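The `$` operator is one of several ways to get at a column. As a rough sketch, the block below uses a hypothetical `people` tibble (not part of the original notebook) to contrast the forms that return a plain vector with `select()`, which keeps a tibble.

```r
library(dplyr)

# hypothetical example data, similar to the tibble used above
people <- tibble(name = c("Alice", "Bob", "Charlie"), age = c(25, 30, 35))

# these three return the column as a plain character vector
people$name
people[["name"]]
pull(people, name)

# select() keeps the result as a one-column tibble
select(people, name)
```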


#### Filtering rows in a tibble:

In [12]:
# filter the tibble to only include rows where age is greater than 30
my_tibble_filtered <- filter(my_tibble, age > 30)

# print the filtered tibble
print(my_tibble_filtered)

# A tibble: 1 × 3
  name      age gender
  <chr>   <dbl> <chr>
1 Charlie    35 M

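`filter()` also accepts more than one condition. The following is a minimal sketch on a hypothetical `people` tibble; the object names are illustrative and do not come from the notebook.

```r
library(dplyr)

# hypothetical example data
people <- tibble(
  name = c("Alice", "Bob", "Charlie"),
  age = c(25, 30, 35),
  gender = c("F", "M", "M")
)

# comma-separated conditions are combined with AND
men_30_plus <- filter(people, age >= 30, gender == "M")

# use | for OR
young_or_female <- filter(people, age < 30 | gender == "F")

print(men_30_plus)
print(young_or_female)
```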

#### Adding a new column to a tibble:

In [13]:
# add a new column to the tibble that indicates if the person is over 30
my_tibble <- mutate(my_tibble, over_30 = ifelse(age > 30, "yes", "no"))

# print the tibble with the new column
print(my_tibble)

# A tibble: 3 × 4
  name      age gender over_30
  <chr>   <dbl> <chr>  <chr>
1 Alice      25 F      no
2 Bob        30 M      no
3 Charlie    35 M      yes

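For new columns with more than two possible values, nested `ifelse()` calls get hard to read. One possible alternative, sketched here on a hypothetical `people` tibble with an invented `age_band` column, is dplyr's `if_else()` and `case_when()`.

```r
library(dplyr)

# hypothetical example data
people <- tibble(name = c("Alice", "Bob", "Charlie"), age = c(25, 30, 35))

people <- mutate(
  people,
  # if_else() is dplyr's stricter counterpart to ifelse()
  over_30 = if_else(age > 30, "yes", "no"),
  # case_when() handles several outcomes; conditions are checked in order
  age_band = case_when(
    age < 30 ~ "under 30",
    age < 35 ~ "30 to 34",
    TRUE ~ "35 and over"
  )
)

print(people)
```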

#### Summarizing data in a tibble:

In [14]:
# summarize the tibble to get the average age by gender
age_by_gender <- summarise(group_by(my_tibble, gender), average_age = mean(age))

# print the summarized data
print(age_by_gender)

# A tibble: 2 × 2
  gender average_age
  <chr>        <dbl>
1 F             25
2 M             32.5

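The nested call above can also be written as a pipeline, which tends to read more easily once several summaries are involved. This is only a sketch: `people`, `age_summary`, and the extra `n` and `oldest` columns are illustrative names, not part of the original notebook.

```r
library(dplyr)

# hypothetical example data
people <- tibble(
  name = c("Alice", "Bob", "Charlie"),
  age = c(25, 30, 35),
  gender = c("F", "M", "M")
)

# group by gender, then compute several summaries per group
age_summary <- people %>%
  group_by(gender) %>%
  summarise(
    n = n(),
    average_age = mean(age),
    oldest = max(age)
  )

print(age_summary)
```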

#### Sorting a tibble:

In [15]:
# sort the tibble by age in descending order
my_tibble_sorted <- arrange(my_tibble, desc(age))

# print the sorted tibble
print(my_tibble_sorted)

# A tibble: 3 × 4
  name      age gender over_30
  <chr>   <dbl> <chr>  <chr>
1 Charlie    35 M      yes
2 Bob        30 M      no
3 Alice      25 F      no

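`arrange()` can sort by more than one column; later columns break ties in earlier ones. A small sketch, again using a hypothetical `people` tibble:

```r
library(dplyr)

# hypothetical example data
people <- tibble(
  name = c("Alice", "Bob", "Charlie"),
  age = c(25, 30, 35),
  gender = c("F", "M", "M")
)

# sort by gender first, then by age (oldest first) within each gender
sorted <- arrange(people, gender, desc(age))

print(sorted)
```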

#### Combining tibbles:

In [16]:
# create two tibbles with the same columns
tibble1 <- tibble(x = c(1, 2, 3), y = c(4, 5, 6))
tibble2 <- tibble(x = c(4, 5, 6), y = c(7, 8, 9))

# combine the two tibbles into one
combined_tibble <- bind_rows(tibble1, tibble2)

# print the combined tibble
print(combined_tibble)

# A tibble: 6 × 2
      x     y
  <dbl> <dbl>
1     1     4
2     2     5
3     3     6
4     4     7
5     5     8
6     6     9

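The inputs to `bind_rows()` do not need identical columns. The sketch below, with made-up tibbles `first` and `second`, shows how missing columns are filled with `NA` and how the `.id` argument can record where each row came from.

```r
library(dplyr)

# two hypothetical tibbles that share only the column x
first <- tibble(x = c(1, 2), y = c(3, 4))
second <- tibble(x = c(5, 6), z = c("a", "b"))

# named inputs plus .id label each row with its source;
# columns missing from one input are filled with NA
combined <- bind_rows(first = first, second = second, .id = "source")

print(combined)
```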

#### Adding rows to a tibble:

In [17]:
# create a tibble with two columns
my_tibble <- tibble(a = c(1, 2), b = c(3, 4))

# add a new row to the tibble
my_tibble <- add_row(my_tibble, a = 3, b = 5)

# print the modified tibble
print(my_tibble)

# A tibble: 3 × 2
      a     b
  <dbl> <dbl>
1     1     3
2     2     4
3     3     5

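By default `add_row()` appends to the end, but it can also insert at a given position, and any column left out is filled with `NA`. A brief sketch, using a hypothetical `scores` tibble:

```r
library(tibble)

# hypothetical example data
scores <- tibble(a = c(1, 2), b = c(3, 4))

# insert a row at the top instead of the bottom
scores <- add_row(scores, a = 0, b = 0, .before = 1)

# columns left out of add_row() are filled with NA
scores <- add_row(scores, a = 9)

print(scores)
```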

#### Renaming columns:

In [18]:
# create a tibble with two columns
my_tibble <- tibble(a = c(1, 2, 3), b = c(4, 5, 6))

# rename the "a" column to "A"
my_tibble_renamed <- rename(my_tibble, A = a)

# print the renamed tibble
print(my_tibble_renamed)

# A tibble: 3 × 2
      A     b
  <dbl> <dbl>
1     1     4
2     2     5
3     3     6

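Several columns can be renamed in one call, and `rename_with()` applies a function to the column names instead of listing them one by one. The sketch below assumes dplyr 1.0.0 or later and uses a hypothetical `measurements` tibble.

```r
library(dplyr)

# hypothetical example data
measurements <- tibble(a = c(1, 2, 3), b = c(4, 5, 6))

# rename several columns at once (new_name = old_name)
renamed <- rename(measurements, first = a, second = b)

# apply a function to every column name
upper <- rename_with(measurements, toupper)

print(renamed)
print(upper)
```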