# __07 Tibbles with tibble__

In [1]:
# libraries
library(tidyverse)

# config
repr_html.tbl_df <- function(obj, ..., rows = 6) repr:::repr_html.data.frame(obj, ..., rows = rows)
options(dplyr.summarise.inform = FALSE)

── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──

✔ ggplot2 3.3.2     ✔ purrr   0.3.4
✔ tibble  3.0.3     ✔ dplyr   1.0.1
✔ tidyr   1.1.1     ✔ stringr 1.4.0
✔ readr   1.3.1     ✔ forcats 0.5.0

── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()



In [2]:
?tibble

tibble {tibble}, R Documentation

Arguments:

* `...`: <dynamic-dots> A set of name-value pairs. These arguments are processed with `rlang::quos()` and support unquote via `!!` and unquote-splice via `!!!`. Use `:=` to create columns that start with a dot. Arguments are evaluated sequentially. You can refer to previously created elements directly or using the `.data` pronoun. An existing `.data` pronoun, provided e.g. inside `dplyr::mutate()`, is not available.
* `.rows`: The number of rows, useful to create a 0-column tibble or just as an additional check.
* `.name_repair`: Treatment of problematic column names:
    * `"minimal"`: No name repair or checks, beyond basic existence.
    * `"unique"`: Make sure names are unique and not empty.
    * `"check_unique"`: (default value) No name repair, but check they are unique.
    * `"universal"`: Make the names unique and syntactic.
    * a function: apply custom name repair (e.g., `.name_repair = make.names` for names in the style of base R).
    * a purrr-style anonymous function, see `rlang::as_function()`.

  This argument is passed on as `repair` to `vctrs::vec_as_names()`. See there for more details on these terms and the strategies used to enforce them.
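The `.name_repair` strategies listed in the help page are easier to compare side by side. The following is a minimal sketch, assuming a reasonably recent tibble (2.0 or later); the column names are made up for illustration.

```r
library(tibble)

# Duplicate names fail under the default "check_unique",
# so ask for repair explicitly.
dup <- tibble(x = 1, x = 2, .name_repair = "unique")
names(dup)

# "universal" also makes names syntactic, so backticks are no longer needed.
univ <- tibble(`a b` = 1, .name_repair = "universal")
names(univ)

# "minimal" leaves the names exactly as given, duplicates included.
minimal <- tibble(x = 1, x = 2, .name_repair = "minimal")
names(minimal)
```

The `"unique"` and `"universal"` strategies print a message listing the renamed columns, which is a useful reminder that the names in the result differ from the ones you typed.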


In [3]:
as_tibble(iris)

Sepal.Length,Sepal.Width,Petal.Length,Petal.Width,Species
<dbl>,<dbl>,<dbl>,<dbl>,<fct>
5.1,3.5,1.4,0.2,setosa
4.9,3.0,1.4,0.2,setosa
4.7,3.2,1.3,0.2,setosa
⋮,⋮,⋮,⋮,⋮
6.5,3.0,5.2,2.0,virginica
6.2,3.4,5.4,2.3,virginica
5.9,3.0,5.1,1.8,virginica


You can create a new tibble from individual vectors with `tibble()`.
`tibble()` automatically recycles inputs of length 1 and lets
you refer to variables that you just created, as shown here:

In [4]:
tibble(
    x = 1:5,
    y = 1,
    z = x^2 + y
)

x,y,z
<int>,<dbl>,<dbl>
1,1,2
2,1,5
3,1,10
4,1,17
5,1,26


If you’re already familiar with `data.frame()`, note that `tibble()`
does much less: it never changes the type of the inputs (e.g., it never
converts strings to factors!), it never changes the names of variables,
and it never creates row names.
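A short comparison makes the "does much less" claim concrete. This is a sketch with made-up data; the backticks simply quote a column name that contains a space.

```r
library(tibble)

# Base R: check.names = TRUE (the default) rewrites the nonsyntactic name.
base_df <- data.frame(`my col` = 1:2)
names(base_df)

# tibble() keeps the name exactly as written.
tbl <- tibble(`my col` = 1:2)
names(tbl)

# tibble() keeps a character vector as character.
class(tibble(s = c("a", "b"))$s)

# data.frame() turns the names of a named vector into row names;
# tibble() does not create row names.
rownames(data.frame(x = c(a = 1, b = 2)))
has_rownames(tibble(x = c(a = 1, b = 2)))
```

Whether `data.frame()` converts strings to factors depends on the R version (the default changed in R 4.0.0), which is why the sketch only shows the tibble side of that behavior.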

It’s possible for a tibble to have column names that are not valid R
variable names, aka nonsyntactic names. For example, they might
not start with a letter, or they might contain unusual characters like
a space. To refer to these variables, you need to surround them with
backticks (`` ` ``):

In [5]:
tb <- tibble(
    `:)` = 'smile',
    ` ` = 'space',
    `2000` = 'number'
)

(tb)

:), ,2000
<chr>,<chr>,<chr>
smile,space,number


You’ll also need the backticks when working with these variables in
other packages, like ggplot2, dplyr, and tidyr.

Another way to create a tibble is with `tribble()`, short for transposed
tibble. `tribble()` is customized for data entry in code: column
headings are defined by formulas (i.e., they start with `~`), and entries
are separated by commas. This makes it possible to lay out small
amounts of data in easy-to-read form:

In [7]:
tribble(
    ~x, ~y, ~z,
    #--|---|---
    'a', 2, 3.6,
    'b', 1, 6.5
)

x,y,z
<chr>,<dbl>,<dbl>
a,2,3.6
b,1,6.5


## __Tibbles Versus data.frame__

### __Printing__

Tibbles have a refined print method that shows only the first 10
rows, and all the columns that fit on screen. This makes it much easier
to work with large data. In addition to its name, each column
reports its type, a nice feature borrowed from `str()`:

In [8]:
tibble(a = lubridate::now() + runif(1e3) * 86400,
       b = lubridate::today() + runif(1e3) * 30,
       c = 1:1e3,
       d = runif(1e3),
       e = sample(letters, 1e3, replace = TRUE))

a,b,c,d,e
<dttm>,<date>,<int>,<dbl>,<chr>
2020-09-02 12:01:59,2020-09-14,1,0.6465581,e
2020-09-01 13:24:35,2020-09-16,2,0.5990876,u
2020-09-02 02:56:50,2020-09-17,3,0.5714991,c
⋮,⋮,⋮,⋮,⋮
2020-09-02 09:36:01,2020-09-16,998,0.7877355,i
2020-09-01 17:15:00,2020-09-01,999,0.3174963,n
2020-09-01 21:04:24,2020-09-12,1000,0.1839201,f


You can also control the default print behavior by setting options:

* `options(tibble.print_max = n, tibble.print_min = m)`:
if there are more than `n` rows, print only `m` rows. Use
`options(tibble.print_min = Inf)` to always show all rows.
* Use `options(tibble.width = Inf)` to always print all columns,
regardless of the width of the screen.
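The options above can be tried out as follows. This is a sketch with illustrative thresholds (20 and 5 are arbitrary); it saves the previous settings so they can be restored, and ends with a per-call alternative using the `n` and `width` arguments of `print()`.

```r
library(tibble)

big <- tibble(id = 1:100, value = id / 10)

# Illustrative thresholds: tables longer than 20 rows
# are truncated to their first 5 rows.
old <- options(tibble.print_max = 20, tibble.print_min = 5)
print(big)

# Restore whatever was set before.
options(old)

# A one-off override for a single call, without touching global options.
print(big, n = 3, width = Inf)
```

Setting options changes every subsequent print in the session, so the per-call form is usually the safer choice inside scripts and functions.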

### __Subsetting__

So far all the tools you’ve learned have worked with complete data
frames. If you want to pull out a single variable, you need some new
tools, `$` and `[[`. `[[` can extract by name or position; `$` only extracts
by name but is a little less typing:

In [9]:
df <- tibble(
    x = runif(5),
    y = runif(5)
)

In [11]:
# extract by name
df$x

In [12]:
# extract by name
df[["x"]]

In [13]:
# extract by position
df[[1]]

To use these in a pipe, you’ll need to use the special placeholder `.`:

In [14]:
df %>%
    .$x

In [15]:
df %>%
    .[["x"]]

Some older functions don’t work with tibbles. If you encounter one
of these functions, use `as.data.frame()` to turn a tibble back to a
`data.frame`:

In [16]:
class(as.data.frame(tb))

The main reason that some older functions don’t work with tibbles
is the `[` function. We don’t use `[` much in this book because
`dplyr::filter()` and `dplyr::select()` allow you to solve the same
problems with clearer code.
              
With base R data frames, `[` sometimes
returns a data frame, and sometimes returns a vector. With tibbles,
`[` always returns another tibble.
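The difference in `[` behavior is easy to check directly. Here is a small sketch with made-up data that compares the class of the result for a base data frame and for the equivalent tibble.

```r
library(tibble)

base_df <- data.frame(x = 1:3, y = c("a", "b", "c"))
tbl <- as_tibble(base_df)

# Base R drops to a vector when a single column is selected...
class(base_df[, "x"])
# ...unless drop = FALSE is given.
class(base_df[, "x", drop = FALSE])

# A tibble stays a tibble either way.
class(tbl[, "x"])
class(tbl["x"])
```

Code written for base data frames that relies on the silent drop to a vector is the typical case where an older function fails on a tibble.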

In [18]:
# How can you tell if an object is a tibble? (Hint: try printing
# mtcars, which is a regular data frame.)
(class(mtcars))
(class(diamonds))

In [19]:
# What does tibble::enframe() do? When might you use it?
?enframe

enframe {tibble}, R Documentation

Arguments:

* `x`: An atomic vector (for `enframe()`) or a data frame with one or two columns (for `deframe()`).
* `name`, `value`: Names of the columns that store the names and values. If `name` is `NULL`, a one-column tibble is returned; `value` cannot be `NULL`.
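As a possible answer to the question, the sketch below converts a named vector to a two-column tibble and back. The `scores` vector and the column names are made up for illustration.

```r
library(tibble)

# A named vector, e.g. the kind of result sapply() often returns.
scores <- c(alice = 90, bob = 85)

# enframe() turns names and values into two columns.
long <- enframe(scores, name = "student", value = "score")
long

# deframe() reverses the conversion.
deframe(long)

# With name = NULL, only the values are kept, in a one-column tibble.
enframe(c(10, 20), name = NULL)
```

This is handy when a named vector needs to go into a pipeline of data frame verbs such as `dplyr::filter()` or into a ggplot2 call.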
