# Manipulating a Dataframe

## Download Rmd Version

If you would prefer to work through this content as an R Markdown (Rmd) file, download it using one of the links below.

Exercises incomplete:
[Download Manipulating_a_dataframe.Rmd](rmarkdown/Manipulating_a_dataframe.Rmd)
Exercises complete:
[Download Manipulating_a_dataframe_complete.Rmd](rmarkdown/Manipulating_a_dataframe_complete.Rmd)

## Learning Objectives
- Learn how to subset data by selecting specific columns and filtering rows based on conditions using the `dplyr` package
- Understand how to create new variables and rename existing ones within a dataframe
- Understand sorting data based on one or more variables
- Learn how to summarise and aggregate data, including calculating summary statistics for the entire dataset and for grouped data
- Develop techniques for identifying and filtering out missing values in a dataset


## Introduction

The package `dplyr` from the Tidyverse contains functions for manipulating
dataframes. We will use the `penguins` dataset contained in the `palmerpenguins`
package. Note that this data is already in tidy format.

In [1]:
library(palmerpenguins)  # loads `penguins` data
head(penguins)

species,island,bill_length_mm,bill_depth_mm,flipper_length_mm,body_mass_g,sex,year
<fct>,<fct>,<dbl>,<dbl>,<int>,<int>,<fct>,<int>
Adelie,Torgersen,39.1,18.7,181,3750,male,2007
Adelie,Torgersen,39.5,17.4,186,3800,female,2007
Adelie,Torgersen,40.3,18.0,195,3250,female,2007
Adelie,Torgersen,,,,,,2007
Adelie,Torgersen,36.7,19.3,193,3450,female,2007
Adelie,Torgersen,39.3,20.6,190,3650,male,2007


## Sub-setting data

For taking subsets of data, we can use the functions `select` and
`filter` from `dplyr`:

- `dplyr::select` is used to keep only specified **columns** of the dataframe,
  i.e. to 'select' certain variables from the data.
- `dplyr::filter` is used to keep **rows** that meet some specified condition,
  i.e. to 'filter' the observations in the data.


### Selecting columns

To use `select`, name the columns to keep after supplying the dataframe
(or tibble), like so:

```R
dplyr::select(data, column1, column2, ...)
```

**Note**: We _don't_ use strings to specify the columns! Instead, we write them
as if they were variables e.g. `dplyr::select(data, column1)` instead of
`dplyr::select(data, "column1")`.

Let's first check the column names in our tibble:

In [2]:
colnames(penguins)

Now let's select the 'species', 'year', 'sex', and 'body mass' columns:

In [3]:
# Select species, year, sex, and body mass
penguins_selected <- dplyr::select(penguins, species, year, sex, body_mass_g)

head(penguins_selected)

species,year,sex,body_mass_g
<fct>,<int>,<fct>,<int>
Adelie,2007,male,3750
Adelie,2007,female,3800
Adelie,2007,female,3250
Adelie,2007,,
Adelie,2007,female,3450
Adelie,2007,male,3650


The `select` function can also be used to _remove_ columns by using the `-`
operator. For example, we may want to remove only two columns, 'island' and
'year':

In [4]:
# Remove the island and year columns by using `-`.
penguins_deselected <- dplyr::select(penguins, -island, -year)

# Take a look at the columns
colnames(penguins_deselected)
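
As a self-contained sketch of both uses of `select`, the snippet below builds a small made-up data frame (`toy` and its values are hypothetical, not part of the course data) and keeps or drops columns by name:

```R
# Hypothetical toy data, made up for illustration only
toy <- data.frame(
  species = c("Adelie", "Gentoo", "Chinstrap"),
  island = c("Torgersen", "Biscoe", "Dream"),
  year = c(2007L, 2008L, 2009L)
)

# Keep only the named columns, in the order they are listed
kept <- dplyr::select(toy, year, species)
colnames(kept)

# Drop the named columns and keep everything else
dropped <- dplyr::select(toy, -island, -year)
colnames(dropped)
```

Note that the columns in the result follow the order given to `select`, which makes it a convenient way to reorder columns as well.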

### Filtering rows

The `filter` function from `dplyr` is used to keep rows from a dataframe/tibble
that meet a _predicate condition_ in terms of the column values, i.e. a
statement that is either `TRUE` or `FALSE`.


#### Comparative operators

One way to create predicate conditions is to use the
[comparison operators](https://www.w3schools.com/r/r_operators.asp), which
should be familiar:

```
x == y   : equal
x != y   : not equal
x > y    : greater than
x < y    : less than
x >= y   : greater than or equal
x <= y   : less than or equal
```
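
These operators work element by element on vectors, which is what makes them useful for building predicate conditions on columns. A minimal base R sketch, using made-up values (the variable names are chosen for illustration only):

```R
# Made-up values for illustration
years <- c(2007, 2008, 2009, 2008)

is_2008 <- years == 2008       # equal to 2008?
not_2008 <- years != 2008      # different from 2008?
from_2008_on <- years >= 2008  # 2008 or later?

# Each result is a logical vector with one TRUE/FALSE per element
is_2008
not_2008
from_2008_on
```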

For example, suppose we want to keep all penguin data only from the year
2008. In base R, we could accomplish this by indexing on a boolean vector
created using the `==` operator:

In [5]:
# Create boolean vector indicating where year = 2008
is_year_2008 <- penguins$year == 2008
head(is_year_2008, n = 75)

In [6]:
# Index on this vector
head(penguins[is_year_2008,])

species,island,bill_length_mm,bill_depth_mm,flipper_length_mm,body_mass_g,sex,year
<fct>,<fct>,<dbl>,<dbl>,<int>,<int>,<fct>,<int>
Adelie,Biscoe,39.6,17.7,186,3500,female,2008
Adelie,Biscoe,40.1,18.9,188,4300,male,2008
Adelie,Biscoe,35.0,17.9,190,3450,female,2008
Adelie,Biscoe,42.0,19.5,200,4050,male,2008
Adelie,Biscoe,34.5,18.1,187,2900,female,2008
Adelie,Biscoe,41.4,18.6,191,3700,male,2008


The `filter` function works by specifying the predicate condition for filtering
in terms of the column names, like so:

```R
dplyr::filter(data, condition)  # keep rows satisfying 'condition'
```

So to filter the penguins data to keep only the 2008 observations:

In [7]:
penguins_2008 <- dplyr::filter(penguins, year == 2008)
head(penguins_2008)

species,island,bill_length_mm,bill_depth_mm,flipper_length_mm,body_mass_g,sex,year
<fct>,<fct>,<dbl>,<dbl>,<int>,<int>,<fct>,<int>
Adelie,Biscoe,39.6,17.7,186,3500,female,2008
Adelie,Biscoe,40.1,18.9,188,4300,male,2008
Adelie,Biscoe,35.0,17.9,190,3450,female,2008
Adelie,Biscoe,42.0,19.5,200,4050,male,2008
Adelie,Biscoe,34.5,18.1,187,2900,female,2008
Adelie,Biscoe,41.4,18.6,191,3700,male,2008


#### Logical operators

We can build more complicated predicate conditions by using the
[logical operators](https://www.w3schools.com/r/r_operators.asp) on columns:

```
A & B   : A AND B   e.g. col1 == 2 & col2 == 10
A | B   : A OR B    e.g. col1 > 2 | col2 != 10
!A      : NOT A     e.g. !(col1 < 2)
```

In addition, `filter` allows us to specify multiple AND operations by listing
out multiple predicate conditions:

```
dplyr::filter(data, condition1, condition2, ...)  # keep rows satisfying
                                                  # 'condition1' AND 'condition2' AND ...
```

Some examples on the penguin data:

In [8]:
# Observations of male penguins from the year 2008:
dplyr::filter(penguins, sex == "male" & year == 2008)  # using &
dplyr::filter(penguins, sex == "male", year == 2008)  # listing multiple conditions in filter

# Observations of penguins with bill length > 40mm or bill depth < 20mm 
dplyr::filter(penguins, bill_length_mm > 40 | bill_depth_mm < 20)

# Observations except those of male penguins on the island of Biscoe
dplyr::filter(penguins, !(sex == "male" & island == "Biscoe"))

species,island,bill_length_mm,bill_depth_mm,flipper_length_mm,body_mass_g,sex,year
<fct>,<fct>,<dbl>,<dbl>,<int>,<int>,<fct>,<int>
Adelie,Biscoe,40.1,18.9,188,4300,male,2008
Adelie,Biscoe,42.0,19.5,200,4050,male,2008
Adelie,Biscoe,41.4,18.6,191,3700,male,2008
Adelie,Biscoe,40.6,18.8,193,3800,male,2008
Adelie,Biscoe,37.6,19.1,194,3750,male,2008
Adelie,Biscoe,41.3,21.1,195,4400,male,2008
Adelie,Biscoe,41.1,18.2,192,4050,male,2008
Adelie,Biscoe,41.6,18.0,192,3950,male,2008
Adelie,Biscoe,41.1,19.1,188,4100,male,2008
Adelie,Torgersen,41.8,19.4,198,4450,male,2008


species,island,bill_length_mm,bill_depth_mm,flipper_length_mm,body_mass_g,sex,year
<fct>,<fct>,<dbl>,<dbl>,<int>,<int>,<fct>,<int>
Adelie,Biscoe,40.1,18.9,188,4300,male,2008
Adelie,Biscoe,42.0,19.5,200,4050,male,2008
Adelie,Biscoe,41.4,18.6,191,3700,male,2008
Adelie,Biscoe,40.6,18.8,193,3800,male,2008
Adelie,Biscoe,37.6,19.1,194,3750,male,2008
Adelie,Biscoe,41.3,21.1,195,4400,male,2008
Adelie,Biscoe,41.1,18.2,192,4050,male,2008
Adelie,Biscoe,41.6,18.0,192,3950,male,2008
Adelie,Biscoe,41.1,19.1,188,4100,male,2008
Adelie,Torgersen,41.8,19.4,198,4450,male,2008


species,island,bill_length_mm,bill_depth_mm,flipper_length_mm,body_mass_g,sex,year
<fct>,<fct>,<dbl>,<dbl>,<int>,<int>,<fct>,<int>
Adelie,Torgersen,39.1,18.7,181,3750,male,2007
Adelie,Torgersen,39.5,17.4,186,3800,female,2007
Adelie,Torgersen,40.3,18.0,195,3250,female,2007
Adelie,Torgersen,36.7,19.3,193,3450,female,2007
Adelie,Torgersen,38.9,17.8,181,3625,female,2007
Adelie,Torgersen,39.2,19.6,195,4675,male,2007
Adelie,Torgersen,34.1,18.1,193,3475,,2007
Adelie,Torgersen,42.0,20.2,190,4250,,2007
Adelie,Torgersen,37.8,17.1,186,3300,,2007
Adelie,Torgersen,37.8,17.3,180,3700,,2007


species,island,bill_length_mm,bill_depth_mm,flipper_length_mm,body_mass_g,sex,year
<fct>,<fct>,<dbl>,<dbl>,<int>,<int>,<fct>,<int>
Adelie,Torgersen,39.1,18.7,181,3750,male,2007
Adelie,Torgersen,39.5,17.4,186,3800,female,2007
Adelie,Torgersen,40.3,18.0,195,3250,female,2007
Adelie,Torgersen,,,,,,2007
Adelie,Torgersen,36.7,19.3,193,3450,female,2007
Adelie,Torgersen,39.3,20.6,190,3650,male,2007
Adelie,Torgersen,38.9,17.8,181,3625,female,2007
Adelie,Torgersen,39.2,19.6,195,4675,male,2007
Adelie,Torgersen,34.1,18.1,193,3475,,2007
Adelie,Torgersen,42.0,20.2,190,4250,,2007
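
One detail worth noting in the outputs above: `filter` keeps a row only when the condition evaluates to `TRUE`, so rows where it evaluates to `NA` (for instance because a value is missing) are dropped along with the `FALSE` ones. This is why the fourth penguin, whose measurements are all missing, does not appear in the bill length / bill depth output. A minimal sketch on a made-up data frame (`toy` and its values are hypothetical):

```R
# Hypothetical toy data with one missing bill length
toy <- data.frame(
  id = 1:4,
  bill_length_mm = c(39.1, NA, 45.2, 36.7)
)

# The condition is TRUE, NA, TRUE, FALSE for the four rows
toy$bill_length_mm > 38

# Only rows where the condition is TRUE are kept:
# row 2 (NA) is dropped as well as row 4 (FALSE)
long_bills <- dplyr::filter(toy, bill_length_mm > 38)
long_bills$id
```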


#### Filtering missing values

There are other functions we can use to evaluate columns and get a true/false
output. An important one is `is.na()`. Given a column, it returns `TRUE` for
each element that is `NA` and `FALSE` otherwise. For example, we can use the
`head()` function to look at the first six values of the `sex` column, and see
that there is an `NA` in the fourth row.

In [9]:
head(penguins$sex)

The `is.na` function evaluates the whole column and gives us `TRUE`s whenever it
sees an NA. Not surprisingly, we see a `TRUE` in the fourth observation.  

In [10]:
head(is.na(penguins$sex))

Since `is.na()` gives us a `TRUE`/`FALSE` vector, we can use it with `filter`.
The following gives us all rows where the sex column has missing data (i.e. is
set to `NA`):

In [11]:
dplyr::filter(penguins, is.na(sex))

species,island,bill_length_mm,bill_depth_mm,flipper_length_mm,body_mass_g,sex,year
<fct>,<fct>,<dbl>,<dbl>,<int>,<int>,<fct>,<int>
Adelie,Torgersen,,,,,,2007
Adelie,Torgersen,34.1,18.1,193,3475,,2007
Adelie,Torgersen,42.0,20.2,190,4250,,2007
Adelie,Torgersen,37.8,17.1,186,3300,,2007
Adelie,Torgersen,37.8,17.3,180,3700,,2007
Adelie,Dream,37.5,18.9,179,2975,,2007
Gentoo,Biscoe,44.5,14.3,216,4100,,2007
Gentoo,Biscoe,46.2,14.4,214,4650,,2008
Gentoo,Biscoe,47.3,13.8,216,4725,,2009
Gentoo,Biscoe,44.5,15.7,217,4875,,2009


To do the reverse, i.e. keep only the observations where the sex value is not
missing, combine `is.na` with the NOT operator:

In [12]:
dplyr::filter(penguins, !is.na(sex))

species,island,bill_length_mm,bill_depth_mm,flipper_length_mm,body_mass_g,sex,year
<fct>,<fct>,<dbl>,<dbl>,<int>,<int>,<fct>,<int>
Adelie,Torgersen,39.1,18.7,181,3750,male,2007
Adelie,Torgersen,39.5,17.4,186,3800,female,2007
Adelie,Torgersen,40.3,18.0,195,3250,female,2007
Adelie,Torgersen,36.7,19.3,193,3450,female,2007
Adelie,Torgersen,39.3,20.6,190,3650,male,2007
Adelie,Torgersen,38.9,17.8,181,3625,female,2007
Adelie,Torgersen,39.2,19.6,195,4675,male,2007
Adelie,Torgersen,41.1,17.6,182,3200,female,2007
Adelie,Torgersen,38.6,21.2,191,3800,male,2007
Adelie,Torgersen,34.6,21.1,198,4400,male,2007
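
A common pitfall is to test for missing values with `==`. Comparing anything with `NA` gives `NA` rather than `TRUE`, so such a filter keeps no rows. The sketch below shows the difference on a made-up data frame (`toy` and its values are hypothetical):

```R
# Hypothetical toy data with two missing sex values
toy <- data.frame(
  id = 1:4,
  sex = c("male", NA, "female", NA)
)

# `sex == NA` is NA for every row, so nothing is kept
wrong <- dplyr::filter(toy, sex == NA)
nrow(wrong)

# is.na() returns TRUE/FALSE, so it works as intended
missing_sex <- dplyr::filter(toy, is.na(sex))
known_sex <- dplyr::filter(toy, !is.na(sex))
missing_sex$id
known_sex$id
```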


### Exercise: subsetting data

Write code to extract data for Gentoo penguins with weight between 4.8kg and
5.2kg (inclusive). Only keep the species, sex and body mass columns in the
output. Give the numbers of resulting observations for male penguins, female
penguins and penguins with unknown sex.





````{admonition} Solution
:class: dropdown
```R
penguins_subset <- dplyr::filter(penguins,
                                 species == "Gentoo",
                                 4800 <= body_mass_g & body_mass_g <= 5200)
penguins_subset <- dplyr::select(penguins_subset, species, sex, body_mass_g)
penguins_subset

cat("Number of male penguins:", nrow(dplyr::filter(penguins_subset, sex == "male")), "\n")
cat("Number of female penguins:", nrow(dplyr::filter(penguins_subset, sex == "female")), "\n")
cat("Number of penguins of unknown sex:", nrow(dplyr::filter(penguins_subset, is.na(sex))), "\n")
```
````




## Creating new variables and renaming variables

We often want to make a new column with some updated or transformed value. We
can use the `mutate` function in `dplyr` for this:

```
# value1, value2,... are expressions, possibly involving column names

dplyr::mutate(data, new_column1 = value1, new_column2 = value2, ...)
```

For example, to add a column of IDs for identifying each row in the data:


In [13]:
# Add column of ID numbers
penguins_with_ids <- dplyr::mutate(penguins, id = 1:nrow(penguins))

# Optional: Put the ID column at the beginning of the dataframe
penguins_with_ids <- dplyr::relocate(penguins_with_ids, id, .before = 1)

head(penguins_with_ids)

id,species,island,bill_length_mm,bill_depth_mm,flipper_length_mm,body_mass_g,sex,year
<int>,<fct>,<fct>,<dbl>,<dbl>,<int>,<int>,<fct>,<int>
1,Adelie,Torgersen,39.1,18.7,181,3750,male,2007
2,Adelie,Torgersen,39.5,17.4,186,3800,female,2007
3,Adelie,Torgersen,40.3,18.0,195,3250,female,2007
4,Adelie,Torgersen,,,,,,2007
5,Adelie,Torgersen,36.7,19.3,193,3450,female,2007
6,Adelie,Torgersen,39.3,20.6,190,3650,male,2007


Another example: to calculate a new value, the ratio of bill length to bill
depth, we could do the following:

In [14]:
dplyr::mutate(penguins, bill_ratio = bill_length_mm / bill_depth_mm)

species,island,bill_length_mm,bill_depth_mm,flipper_length_mm,body_mass_g,sex,year,bill_ratio
<fct>,<fct>,<dbl>,<dbl>,<int>,<int>,<fct>,<int>,<dbl>
Adelie,Torgersen,39.1,18.7,181,3750,male,2007,2.090909
Adelie,Torgersen,39.5,17.4,186,3800,female,2007,2.270115
Adelie,Torgersen,40.3,18.0,195,3250,female,2007,2.238889
Adelie,Torgersen,,,,,,2007,
Adelie,Torgersen,36.7,19.3,193,3450,female,2007,1.901554
Adelie,Torgersen,39.3,20.6,190,3650,male,2007,1.907767
Adelie,Torgersen,38.9,17.8,181,3625,female,2007,2.185393
Adelie,Torgersen,39.2,19.6,195,4675,male,2007,2.000000
Adelie,Torgersen,34.1,18.1,193,3475,,2007,1.883978
Adelie,Torgersen,42.0,20.2,190,4250,,2007,2.079208


**Note**: the output of mutate is not just a new column on its own,
but the whole dataframe with the new column appended. To update `penguins` to
have the new column, we need to overwrite it:

In [15]:
penguins <- dplyr::mutate(penguins, bill_ratio = bill_length_mm / bill_depth_mm)
colnames(penguins)
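
To see that `mutate` returns a new dataframe and leaves its input untouched, here is a small sketch on made-up data (`toy`, `is_long_bill` and the values are hypothetical). It also shows that a column created in a `mutate` call can be used by later expressions in the same call:

```R
# Hypothetical toy data, made up for illustration only
toy <- data.frame(
  bill_length_mm = c(40, 45, 50),
  bill_depth_mm = c(20, 15, 20)
)

# mutate() returns a new dataframe; `toy` itself is unchanged
toy_with_ratio <- dplyr::mutate(
  toy,
  bill_ratio = bill_length_mm / bill_depth_mm,
  is_long_bill = bill_ratio > 2.4   # uses the column created just above
)

colnames(toy)             # still only the two original columns
colnames(toy_with_ratio)  # original columns plus the two new ones
toy_with_ratio$bill_ratio
```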

When we just want to change the name of a column, we can use the `rename`
function in `dplyr`:

```R
dplyr::rename(data, New_column_name1 = Old_column_name1, New_column_name2 = Old_column_name2, ...)
```

For example, we could change the name of some variables to remove the units
(probably not recommended in general, but it serves as an example):

In [16]:
penguins_renamed <- dplyr::rename(penguins,
                                  bill_length = bill_length_mm,
                                  bill_depth = bill_depth_mm,
                                  flipper_length = flipper_length_mm,
                                  body_mass = body_mass_g
                                  )
colnames(penguins_renamed)

### Exercise: creating variables

Update the `penguins` dataframe so that it contains an ID column
_with IDs represented as strings_.





````{admonition} Solution
:class: dropdown
```R
# Solution 1
penguins <- dplyr::mutate(penguins, id = as.character(1:nrow(penguins)))

# Solution 2
penguins <- dplyr::mutate(penguins, id = 1:nrow(penguins))  # int IDs
penguins <- dplyr::mutate(penguins, id = as.character(id))  # update column to strings
```
````






## Sorting observations

We can sort the rows of data by the value of a particular column using the
`arrange` function in `dplyr`. By default, the sorting is performed in increasing
order, although we can use the `desc()` function on the column to sort in
descending order:

```R
dplyr::arrange(data, col)  # sort rows by ascending order of values from column
                           # `col`

dplyr::arrange(data, dplyr::desc(col))  # sort rows by descending order of
                                        # values from column `col`
```

For example, to sort the `penguins` data by descending value of flipper length:

In [17]:
penguins_sorted_flipper <- dplyr::arrange(penguins, dplyr::desc(flipper_length_mm))
penguins_sorted_flipper

species,island,bill_length_mm,bill_depth_mm,flipper_length_mm,body_mass_g,sex,year,bill_ratio
<fct>,<fct>,<dbl>,<dbl>,<int>,<int>,<fct>,<int>,<dbl>
Gentoo,Biscoe,54.3,15.7,231,5650,male,2008,3.458599
Gentoo,Biscoe,50.0,16.3,230,5700,male,2007,3.067485
Gentoo,Biscoe,59.6,17.0,230,6050,male,2007,3.505882
Gentoo,Biscoe,49.8,16.8,230,5700,male,2008,2.964286
Gentoo,Biscoe,48.6,16.0,230,5800,male,2008,3.037500
Gentoo,Biscoe,52.1,17.0,230,5550,male,2009,3.064706
Gentoo,Biscoe,51.5,16.3,230,5500,male,2009,3.159509
Gentoo,Biscoe,55.1,16.0,230,5850,male,2009,3.443750
Gentoo,Biscoe,49.5,16.2,229,5800,male,2008,3.055556
Gentoo,Biscoe,49.8,15.9,229,5950,male,2009,3.132075
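
`arrange` also accepts several columns, in which case later columns are used to break ties in earlier ones. A sketch on made-up data (`toy` and its values are hypothetical):

```R
# Hypothetical toy data, made up for illustration only
toy <- data.frame(
  species = c("Gentoo", "Adelie", "Gentoo", "Adelie"),
  body_mass_g = c(5000, 3700, 5500, 3900)
)

# Sort by species (ascending), then by body mass (descending) within species
toy_sorted <- dplyr::arrange(toy, species, dplyr::desc(body_mass_g))
toy_sorted
```

Here the Adelie rows come first, heaviest first, followed by the Gentoo rows, again heaviest first.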


### Exercise: sorting

Sort the `penguins` data by the id column we created in the previous exercise.
Explain the resulting order on the id column.

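````{admonition} Solution
:class: dropdown
```R
penguins_sorted_id <- dplyr::arrange(penguins, id)
head(dplyr::select(penguins_sorted_id, id, species, year, bill_length_mm))

# The IDs are strings, so they are sorted character by character rather than
# numerically: "1", "10", "100", "101", ... all come before "2".
```
````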

## Summarising / aggregating data

Often we want to aggregate data at certain levels to better understand
differences across groups. For instance, does flipper length differ by species?
Does body mass change between years?


### Summarising all the data

First, we can use `summarise` on its own, without any grouping, to get a single
summary from all the rows in the dataframe. Here are some examples using the
year and flipper length variables:

In [18]:
# Count number of observations
dplyr::summarise(penguins, num_rows = dplyr::n())

# Count number of different years in the data
dplyr::summarise(penguins, num_years = dplyr::n_distinct(year))

# Get the min and max year
dplyr::summarise(penguins, min_year = min(year), max_year = max(year))

# calculate the sum of flipper lengths for the whole data frame
dplyr::summarise(penguins, sum_flipper_length_mm = sum(flipper_length_mm))

num_rows
<int>
344


num_years
<int>
3


min_year,max_year
<int>,<int>
2007,2009


sum_flipper_length_mm
<int>
NA


If the result is `NA`, it's likely because the column contained missing values.
We pass the optional argument `na.rm = TRUE` to the summary function (here
`sum`) to tell it to remove `NA`s before calculating:

In [19]:
# With missing values removed
dplyr::summarise(penguins, sum_flipper_length_mm = sum(flipper_length_mm, na.rm = TRUE))

sum_flipper_length_mm
<int>
68713
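
The effect of `na.rm` can be seen on a small made-up data frame (`toy` and its values are hypothetical). The sketch also counts how many values were missing, which is worth checking before discarding them:

```R
# Hypothetical toy data with one missing flipper length
toy <- data.frame(flipper_length_mm = c(180L, NA, 200L))

# Any NA in the column makes the plain sum NA
dplyr::summarise(toy, total = sum(flipper_length_mm))

# With na.rm = TRUE the NA is dropped before summing
toy_summary <- dplyr::summarise(
  toy,
  total = sum(flipper_length_mm, na.rm = TRUE),
  num_missing = sum(is.na(flipper_length_mm))
)
toy_summary
```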


### Summarising grouped data

More generally, we can compute summaries for subgroups of the data by combining
the `group_by` and `summarise` (or `summarize`) functions in `dplyr`. The steps
to do this are:

1. First use `group_by` to declare the column (or columns) that you want to
   group the data by.
2. Then use `summarise` to actually do the summarising on each of the subgroups
   from step 1. The resulting dataframe will have one summary per subgroup.

For example, to get the average flipper length of each sex:

In [20]:
# First group the data by sex
penguins_grouped_by_sex <- dplyr::group_by(penguins, sex)

# Then find the mean flipper length for each sex
dplyr::summarise(penguins_grouped_by_sex, mean_flipper_length_mm = mean(flipper_length_mm, na.rm = TRUE))

sex,mean_flipper_length_mm
<fct>,<dbl>
female,197.3636
male,204.506
,199.0


These functions are powerful. We can group by multiple columns at once to create
pairwise groups. As we saw above, we can also create several summary variables
at the same time:

```
dplyr::group_by(data, col1, col2, ...)  # group by combinations of values in
                                        # col1, col2, ...

# Compute multiple summaries on the same data
dplyr::summarise(grouped_data, summary1 = <...>, summary2 = <...>, ...)
```
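
As a concrete sketch of this pattern, the snippet below groups a made-up data frame (`toy` and its values are hypothetical) by two columns and computes two summaries per group:

```R
# Hypothetical toy data, made up for illustration only
toy <- data.frame(
  island = c("Dream", "Dream", "Dream", "Biscoe", "Biscoe"),
  year = c(2007L, 2007L, 2008L, 2007L, 2007L),
  flipper_length_mm = c(180, 190, 185, 210, 220)
)

# Group by each island / year combination present in the data
toy_grouped <- dplyr::group_by(toy, island, year)

# One row per group, with two summary columns
toy_summary <- dplyr::summarise(
  toy_grouped,
  num_obs = dplyr::n(),
  mean_flipper_length_mm = mean(flipper_length_mm)
)
toy_summary
```

Only combinations that actually occur in the data produce a row, so this toy example yields three groups rather than four.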


### Exercise: grouping and summarising

Create a summary dataframe that gives the mean, range (i.e. `max - min`)
and standard deviation of the body mass for each species / sex combination, as
well as the number of observations in each group.

````{admonition} Solution
:class: dropdown
```R
penguins_grouped <- dplyr::group_by(penguins, species, sex)
dplyr::summarise(penguins_grouped,
                 mean_body_mass_g = mean(body_mass_g, na.rm = TRUE),
                 sd_body_mass_g = sd(body_mass_g, na.rm = TRUE),
                 range_body_mass_g = max(body_mass_g, na.rm = TRUE) - min(body_mass_g, na.rm = TRUE),
                 Group_count = dplyr::n())
```
````





### An unimportant detail: grouped dataframes

`dplyr::group_by` returns a 'grouped dataframe' (`dplyr::grouped_df`). These
behave just like tibbles/dataframes except they have some extra information
about the grouping attached to them.

When you group by multiple columns, the result of `summarise` will typically
also be a grouped dataframe. By default, it is grouped by all of the grouping
columns except the last one used in the grouped dataframe supplied to
`summarise`. You also get a console message telling you which columns the
output is grouped by (see the result of the exercise above).

The upshot of this is that it allows us to do successive summaries without
having to keep regrouping the data. For example, the code below will produce a
'max of averages':

In [21]:
# Group by sex and species
penguins_grouped <- dplyr::group_by(penguins, sex, species)

# Average mass for each species / sex combination
# Result is grouped by sex (see console output)
penguins_grouped_mass <- dplyr::summarise(penguins_grouped,
                                          mean_body_mass_g = mean(body_mass_g, na.rm = TRUE))
print(penguins_grouped_mass)

# Max of the average mass over species, for each sex
penguins_max_species_avg_mass <- dplyr::summarise(
  penguins_grouped_mass,
  max_species_avg_body_mass_g = max(mean_body_mass_g, na.rm = TRUE)
  )
print(penguins_max_species_avg_mass)

[1m[22m`summarise()` has grouped output by 'sex'. You can override using the `.groups`
argument.


[90m# A tibble: 8 × 3[39m
[90m# Groups:   sex [3][39m
  sex    species   mean_body_mass_g
  [3m[90m<fct>[39m[23m  [3m[90m<fct>[39m[23m                [3m[90m<dbl>[39m[23m
[90m1[39m female Adelie               [4m3[24m369.
[90m2[39m female Chinstrap            [4m3[24m527.
[90m3[39m female Gentoo               [4m4[24m680.
[90m4[39m male   Adelie               [4m4[24m043.
[90m5[39m male   Chinstrap            [4m3[24m939.
[90m6[39m male   Gentoo               [4m5[24m485.
[90m7[39m [31mNA[39m     Adelie               [4m3[24m540 
[90m8[39m [31mNA[39m     Gentoo               [4m4[24m588.
[90m# A tibble: 3 × 2[39m
  sex    max_species_avg_body_mass_g
  [3m[90m<fct>[39m[23m                        [3m[90m<dbl>[39m[23m
[90m1[39m female                       [4m4[24m680.
[90m2[39m male                         [4m5[24m485.
[90m3[39m [31mNA[39m                           [4m4[24m588.


In practice, you don't have to worry about this detail. If you want to get back
to a plain old tibble/dataframe, use the `dplyr::ungroup` function on a grouped
dataframe.
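
As a sketch of how to inspect and remove grouping, the snippet below uses a made-up data frame (`toy` and its values are hypothetical) together with `dplyr::group_vars`, which lists the grouping columns, and the `.groups` argument mentioned in the console message above:

```R
# Hypothetical toy data, made up for illustration only
toy <- data.frame(
  sex = c("female", "female", "male", "male"),
  species = c("Adelie", "Gentoo", "Adelie", "Gentoo"),
  body_mass_g = c(3400, 4700, 4000, 5500)
)

toy_grouped <- dplyr::group_by(toy, sex, species)
dplyr::group_vars(toy_grouped)   # grouped by sex and species

# By default, summarise() drops only the last grouping column
by_sex <- dplyr::summarise(toy_grouped, mean_mass = mean(body_mass_g))
dplyr::group_vars(by_sex)        # still grouped by sex

# Two ways to get an ungrouped result
plain1 <- dplyr::ungroup(by_sex)
plain2 <- dplyr::summarise(toy_grouped,
                           mean_mass = mean(body_mass_g),
                           .groups = "drop")
dplyr::group_vars(plain1)        # no grouping columns left
dplyr::group_vars(plain2)        # no grouping columns left
```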


## Acknowledgement

The material in this notebook is adapted from Eliza Wood's [Tidyverse: Data
wrangling & visualization](https://liza-wood.github.io/tidyverse_intro/) course,
which is licensed under [Creative Commons BY-NC-SA
4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/). That course is itself
based on material from [UC Davis's R-DAVIS
course](https://gge-ucd.github.io/R-DAVIS/index.html), which draws heavily on
the [Data Carpentry: R for Data Analysis in Ecology](https://datacarpentry.org/R-ecology-lesson/) lessons.

## Summary Quiz

In [22]:
# Call the function to display quiz interactively:
source("../../R_functions/quiz_renderer.R")
show_quiz_from_json("questions/summary_manipulating_a_dataframe.json")