# Section 01: Data Wrangling
### `01-Loading the gapminder and dplyr packages`

Use the `library()` function to load the `dplyr` package, just like we've loaded the `gapminder` package for you.
Type `gapminder`, on its own line, to look at the `gapminder` dataset.


In [1]:
# Load the gapminder package
library(gapminder)

# Load the dplyr package
library(dplyr)

# Look at the gapminder dataset
gapminder


Attaching package: 'dplyr'


The following objects are masked from 'package:stats':

    filter, lag


The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union




country,continent,year,lifeExp,pop,gdpPercap
<fct>,<fct>,<int>,<dbl>,<int>,<dbl>
Afghanistan,Asia,1952,28.801,8425333,779.4453
Afghanistan,Asia,1957,30.332,9240934,820.8530
Afghanistan,Asia,1962,31.997,10267083,853.1007
Afghanistan,Asia,1967,34.020,11537966,836.1971
Afghanistan,Asia,1972,36.088,13079460,739.9811
Afghanistan,Asia,1977,38.438,14880372,786.1134
Afghanistan,Asia,1982,39.854,12881816,978.0114
Afghanistan,Asia,1987,40.822,13867957,852.3959
Afghanistan,Asia,1992,41.674,16317921,649.3414
Afghanistan,Asia,1997,41.763,22227415,635.3414
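
The startup message above says that loading `dplyr` masks `filter()` and `lag()` from `stats`. The sketch below illustrates what that likely means in practice: a bare `filter()` call now resolves to the dplyr version, while the `stats` version stays reachable through the `package::function` syntax. The data frame `mini` is a hypothetical three-row sample whose values are copied from the output above; it is not part of the exercise.

```r
library(dplyr)

# Hypothetical three-row sample copied from the gapminder output above
mini <- data.frame(
  country = c("Afghanistan", "Afghanistan", "Afghanistan"),
  year    = c(1952L, 1957L, 1962L),
  lifeExp = c(28.801, 30.332, 31.997)
)

# After library(dplyr), a bare filter() is dplyr's filter()
bare <- filter(mini, year == 1957)

# The same call with an explicit namespace prefix
explicit <- dplyr::filter(mini, year == 1957)

# Both calls keep the same single row
identical(bare, explicit)

# The masked time-series filter is still available under its full name
smoothed <- stats::filter(1:5, rep(1 / 3, 3))
```

Writing `dplyr::filter()` explicitly can help in scripts where the load order of packages is not obvious.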


### `02-Filtering for one year`

- Add a `filter()` line after the pipe (`%>%`) to extract only the observations from the year 1957. Remember that you use `==` to compare two values.

In [2]:
library(gapminder)
library(dplyr)

# Filter the gapminder dataset for the year 1957
gapminder %>%
    filter(year == 1957)
  

country,continent,year,lifeExp,pop,gdpPercap
<fct>,<fct>,<int>,<dbl>,<int>,<dbl>
Afghanistan,Asia,1957,30.33200,9240934,820.8530
Albania,Europe,1957,59.28000,1476505,1942.2842
Algeria,Africa,1957,45.68500,10270856,3013.9760
Angola,Africa,1957,31.99900,4561361,3827.9405
Argentina,Americas,1957,64.39900,19610538,6856.8562
Australia,Oceania,1957,70.33000,9712569,10949.6496
Austria,Europe,1957,67.48000,6965860,8842.5980
Bahrain,Asia,1957,53.83200,138655,11635.7995
Bangladesh,Asia,1957,39.34800,51365468,661.6375
Belgium,Europe,1957,69.24000,8989111,9714.9606
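
To make the role of `==` concrete, here is a minimal sketch comparing `filter()` with base R logical indexing. It assumes a hypothetical sample data frame, `mini`, built from a few rows shown in the outputs above; the comparison `year == 1957` produces one `TRUE` or `FALSE` per row, and only the `TRUE` rows are kept.

```r
library(dplyr)

# Hypothetical sample: rows copied from the gapminder outputs shown above
mini <- data.frame(
  country = c("Afghanistan", "Afghanistan", "Albania"),
  year    = c(1952L, 1957L, 1957L),
  pop     = c(8425333L, 9240934L, 1476505L)
)

# The comparison itself is just a logical vector, one value per row
mini$year == 1957

# dplyr: keep the rows where the condition is TRUE
with_dplyr <- mini %>%
    filter(year == 1957)

# Base R equivalent: logical indexing on the rows
with_base <- mini[mini$year == 1957, ]

# Both approaches keep the same countries
with_dplyr$country
with_base$country
```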


### `03-Filtering for one country and one year`
- Filter the `gapminder` data to retrieve only the observation from China in the year 2002.



In [3]:
library(gapminder)
library(dplyr)

# Filter for China in 2002
gapminder %>%
    filter(country == "China", year == 2002)

country,continent,year,lifeExp,pop,gdpPercap
<fct>,<fct>,<int>,<dbl>,<int>,<dbl>
China,Asia,2002,72.028,1280400000,3119.281
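
In `filter()`, conditions separated by commas must all hold for a row to be kept, so the comma behaves like a logical AND. A small sketch of that equivalence, assuming a hypothetical sample data frame `mini` whose values are taken from outputs in this section:

```r
library(dplyr)

# Hypothetical sample: values copied from outputs in this section
mini <- data.frame(
  country = c("China", "China", "Japan"),
  year    = c(1957L, 2002L, 2002L),
  lifeExp = c(50.54896, 72.028, 82.000)
)

# Comma-separated conditions: every condition must be TRUE
comma_form <- mini %>%
    filter(country == "China", year == 2002)

# The same filter written with an explicit logical AND
and_form <- mini %>%
    filter(country == "China" & year == 2002)

# Both forms return the single China 2002 row
identical(comma_form, and_form)
```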


### `04-Arranging observations by life expectancy`
- Sort the `gapminder` dataset in ascending order of life expectancy (`lifeExp`).
- Sort the `gapminder` dataset in descending order of life expectancy.

In [4]:
library(gapminder)
library(dplyr)

# Sort in ascending order of lifeExp
gapminder %>%
    arrange(lifeExp)

  
# Sort in descending order of lifeExp
gapminder %>%
    arrange(desc(lifeExp))

country,continent,year,lifeExp,pop,gdpPercap
<fct>,<fct>,<int>,<dbl>,<int>,<dbl>
Rwanda,Africa,1992,23.599,7290203,737.0686
Afghanistan,Asia,1952,28.801,8425333,779.4453
Gambia,Africa,1952,30.000,284320,485.2307
Angola,Africa,1952,30.015,4232095,3520.6103
Sierra Leone,Africa,1952,30.331,2143249,879.7877
Afghanistan,Asia,1957,30.332,9240934,820.8530
Cambodia,Asia,1977,31.220,6978607,524.9722
Mozambique,Africa,1952,31.286,6446316,468.5260
Sierra Leone,Africa,1957,31.570,2295678,1004.4844
Burkina Faso,Africa,1952,31.975,4469979,543.2552


country,continent,year,lifeExp,pop,gdpPercap
<fct>,<fct>,<int>,<dbl>,<int>,<dbl>
Japan,Asia,2007,82.603,127467972,31656.07
"Hong Kong, China",Asia,2007,82.208,6980412,39724.98
Japan,Asia,2002,82.000,127065841,28604.59
Iceland,Europe,2007,81.757,301931,36180.79
Switzerland,Europe,2007,81.701,7554661,37506.42
"Hong Kong, China",Asia,2002,81.495,6762476,30209.02
Australia,Oceania,2007,81.235,20434176,34435.37
Spain,Europe,2007,80.941,40448191,28821.06
Sweden,Europe,2007,80.884,9031088,33859.75
Israel,Asia,2007,80.745,6426679,25523.28
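
The two sorts above differ only in the `desc()` wrapper. As a hedged illustration on a hypothetical three-row sample (`mini`, with values copied from the outputs above), the sketch below shows both directions and also that, for a numeric column, negating the column gives the same ordering as `desc()`.

```r
library(dplyr)

# Hypothetical sample: values copied from the two outputs above
mini <- data.frame(
  country = c("Japan", "Rwanda", "Iceland"),
  year    = c(2007L, 1992L, 2007L),
  lifeExp = c(82.603, 23.599, 81.757)
)

# Ascending order is the default
ascending <- mini %>%
    arrange(lifeExp)

# desc() reverses the order for that column
descending <- mini %>%
    arrange(desc(lifeExp))

# For numeric columns, negation produces the same ordering as desc()
negated <- mini %>%
    arrange(-lifeExp)

ascending$country
descending$country
```

`desc()` also works on non-numeric columns such as character vectors, which negation does not, so it is usually the safer choice.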


### `05-Filtering and arranging`
- Use `filter()` to extract observations from just the year 1957, then use `arrange()` to sort in descending order of population (`pop`).

In [5]:
library(gapminder)
library(dplyr)

# Filter for the year 1957, then arrange in descending order of population
gapminder %>%
    filter(year == 1957) %>%
    arrange(desc(pop))


country,continent,year,lifeExp,pop,gdpPercap
<fct>,<fct>,<int>,<dbl>,<int>,<dbl>
China,Asia,1957,50.54896,637408000,575.9870
India,Asia,1957,40.24900,409000000,590.0620
United States,Americas,1957,69.49000,171984000,14847.1271
Japan,Asia,1957,65.50000,91563009,4317.6944
Indonesia,Asia,1957,39.91800,90124000,858.9003
Germany,Europe,1957,69.10000,71019069,10187.8267
Brazil,Americas,1957,53.28500,65551171,2487.3660
United Kingdom,Europe,1957,70.42000,51430000,11283.1779
Bangladesh,Asia,1957,39.34800,51365468,661.6375
Italy,Europe,1957,67.81000,49182000,6248.6562
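
The pipe passes the result on its left as the first argument of the function on its right, so a chain of verbs can also be written as nested calls. A minimal sketch of that equivalence, assuming a hypothetical sample data frame `mini` with population values copied from outputs in this section:

```r
library(dplyr)

# Hypothetical sample: values copied from outputs in this section
mini <- data.frame(
  country = c("India", "Afghanistan", "China", "United States"),
  year    = c(1957L, 1952L, 1957L, 1957L),
  pop     = c(409000000L, 8425333L, 637408000L, 171984000L)
)

# Piped form: read top to bottom, one step per line
piped <- mini %>%
    filter(year == 1957) %>%
    arrange(desc(pop))

# Nested form: the same steps, read from the inside out
nested <- arrange(filter(mini, year == 1957), desc(pop))

# Both forms give the same rows in the same order
identical(piped, nested)
piped$country
```

The piped form tends to be easier to read once more than two steps are involved, which is why the exercises use it.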


### `06-Using mutate to change or create a column`
- Use `mutate()` to change the existing `lifeExp` column from years to months by multiplying it by 12: `12 * lifeExp`.
- Use `mutate()` to add a new column, called `lifeExpMonths`, calculated as `12 * lifeExp`.


In [11]:
library(gapminder)
library(dplyr)

# Use mutate to change lifeExp to be in months
gapminder %>%
    mutate(lifeExp = 12 * lifeExp)

# Use mutate to create a new column called lifeExpMonths
gapminder %>%
    mutate(lifeExpMonths = 12 * lifeExp)

country,continent,year,lifeExp,pop,gdpPercap
<fct>,<fct>,<int>,<dbl>,<int>,<dbl>
Afghanistan,Asia,1952,345.612,8425333,779.4453
Afghanistan,Asia,1957,363.984,9240934,820.8530
Afghanistan,Asia,1962,383.964,10267083,853.1007
Afghanistan,Asia,1967,408.240,11537966,836.1971
Afghanistan,Asia,1972,433.056,13079460,739.9811
Afghanistan,Asia,1977,461.256,14880372,786.1134
Afghanistan,Asia,1982,478.248,12881816,978.0114
Afghanistan,Asia,1987,489.864,13867957,852.3959
Afghanistan,Asia,1992,500.088,16317921,649.3414
Afghanistan,Asia,1997,501.156,22227415,635.3414


country,continent,year,lifeExp,pop,gdpPercap,lifeExpMonths
<fct>,<fct>,<int>,<dbl>,<int>,<dbl>,<dbl>
Afghanistan,Asia,1952,28.801,8425333,779.4453,345.612
Afghanistan,Asia,1957,30.332,9240934,820.8530,363.984
Afghanistan,Asia,1962,31.997,10267083,853.1007,383.964
Afghanistan,Asia,1967,34.020,11537966,836.1971,408.240
Afghanistan,Asia,1972,36.088,13079460,739.9811,433.056
Afghanistan,Asia,1977,38.438,14880372,786.1134,461.256
Afghanistan,Asia,1982,39.854,12881816,978.0114,478.248
Afghanistan,Asia,1987,40.822,13867957,852.3959,489.864
Afghanistan,Asia,1992,41.674,16317921,649.3414,500.088
Afghanistan,Asia,1997,41.763,22227415,635.3414,501.156
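
Note that the second output above still shows `lifeExp` in years: `mutate()` returns a modified copy and leaves its input untouched unless the result is assigned back. The sketch below illustrates this on a hypothetical sample data frame, `mini`, built from the first three rows of the output; the names `in_months` and `with_months` are also chosen only for illustration.

```r
library(dplyr)

# Hypothetical sample: first three rows of the output above
mini <- data.frame(
  country = c("Afghanistan", "Afghanistan", "Afghanistan"),
  year    = c(1952L, 1957L, 1962L),
  lifeExp = c(28.801, 30.332, 31.997)
)

# Overwrite lifeExp in the returned copy (years -> months)
in_months <- mini %>%
    mutate(lifeExp = 12 * lifeExp)

# The original data frame is unchanged: lifeExp is still in years
mini$lifeExp

# Add a new column instead, keeping the original lifeExp alongside it
with_months <- mini %>%
    mutate(lifeExpMonths = 12 * lifeExp)

names(with_months)
```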


### `07-Combining filter, mutate, and arrange`
In one sequence of pipes on the `gapminder` dataset:
- `filter()` for observations from the year 2007,
- `mutate()` to create a column `lifeExpMonths`, calculated as `12 * lifeExp`, and
- `arrange()` in descending order of that new column.

In [12]:
library(gapminder)
library(dplyr)

# Filter, mutate, and arrange the gapminder dataset
gapminder %>%
    filter(year == 2007) %>%
    mutate(lifeExpMonths = 12 * lifeExp) %>%
    arrange(desc(lifeExpMonths))

country,continent,year,lifeExp,pop,gdpPercap,lifeExpMonths
<fct>,<fct>,<int>,<dbl>,<int>,<dbl>,<dbl>
Japan,Asia,2007,82.603,127467972,31656.068,991.236
"Hong Kong, China",Asia,2007,82.208,6980412,39724.979,986.496
Iceland,Europe,2007,81.757,301931,36180.789,981.084
Switzerland,Europe,2007,81.701,7554661,37506.419,980.412
Australia,Oceania,2007,81.235,20434176,34435.367,974.820
Spain,Europe,2007,80.941,40448191,28821.064,971.292
Sweden,Europe,2007,80.884,9031088,33859.748,970.608
Israel,Asia,2007,80.745,6426679,25523.277,968.940
France,Europe,2007,80.657,61083916,30470.017,967.884
Canada,Americas,2007,80.653,33390141,36319.235,967.836
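
Two points about this pipeline can be checked on a small example: `mutate()` has to come before `arrange()` so that `lifeExpMonths` exists when it is sorted on, and because multiplying by 12 preserves order, sorting by months ranks countries the same way as sorting by years. The sketch assumes a hypothetical sample data frame `mini` with values copied from outputs in this section.

```r
library(dplyr)

# Hypothetical sample: values copied from outputs in this section
mini <- data.frame(
  country = c("Iceland", "Japan", "Hong Kong, China", "Japan"),
  year    = c(2007L, 2002L, 2007L, 2007L),
  lifeExp = c(81.757, 82.000, 82.208, 82.603)
)

# Full pipeline: the column created by mutate() is used by arrange()
by_months <- mini %>%
    filter(year == 2007) %>%
    mutate(lifeExpMonths = 12 * lifeExp) %>%
    arrange(desc(lifeExpMonths))

# Sorting on the original column gives the same country ranking
by_years <- mini %>%
    filter(year == 2007) %>%
    arrange(desc(lifeExp))

by_months$country
by_years$country
```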


### `The End`