Commit

Permalink
differences for PR #149
actions-user committed Jan 11, 2024
1 parent e684e92 commit 5f4bd24
Showing 4 changed files with 112 additions and 5 deletions.
115 changes: 111 additions & 4 deletions 04-data-structures-part2.md
@@ -36,10 +36,10 @@ So far, you have seen the basics of manipulating data frames with our nordic dat

::::::::::::::::::::::::::::::::::::::::: instructor

Pay attention to and explain the errors and warnings generated from the
Pay attention to and explain the errors and warnings generated from the
examples in this episode.

:::::::::::::::::::::::::::::::::::::::::
:::::::::::::::::::::::::::::::::::::::::


```r
@@ -77,7 +77,7 @@ gapminder <- read.csv("https://datacarpentry.org/r-intro-geospatial/data/gapmind

- You can read directly from excel spreadsheets without
converting them to plain text first by using the [readxl](https://cran.r-project.org/package=readxl) package.


::::::::::::::::::::::::::::::::::::::::::::::::::

@@ -99,7 +99,8 @@ str(gapminder)
$ gdpPercap: num 779 821 853 836 740 ...
```

We can also examine individual columns of the data frame with our `class` function:
We can also examine individual columns of the data frame with the `class` or
`typeof` functions:


```r
@@ -110,6 +111,14 @@ class(gapminder$year)
[1] "integer"
```

```r
typeof(gapminder$year)
```

```{.output}
[1] "integer"
```
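
The two functions agree for `year`, but they answer different questions: `class` reports how R treats an object, while `typeof` reports how its values are stored. A minimal sketch of the difference, using a small made-up data frame (`toy` and its values are assumptions for illustration, not part of the lesson data):

```r
# Hypothetical toy data, not the lesson's gapminder data
toy <- data.frame(
  country = factor(c("Norway", "Sweden", "Norway")),
  year = c(1952L, 1957L, 1962L),
  lifeExp = c(72.67, 72.49, 73.47)
)

class(toy$country)   # "factor": how R treats the column
typeof(toy$country)  # "integer": how the values are stored internally
class(toy$lifeExp)   # "numeric"
typeof(toy$lifeExp)  # "double"
```

For plain integer columns such as `year`, both functions return `"integer"`.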

```r
class(gapminder$country)
```
@@ -424,6 +433,104 @@ tail(gapminder_norway)

To understand why R is giving us a warning when we try to add this row, let's learn a little more about factors.


## Removing columns and rows in data frames

To remove columns from a data frame, we can use the `subset` function.
Passing a negated vector of column names to its `select` argument drops those
columns:


```r
life_expectancy <- subset(gapminder, select = -c(continent, pop, gdpPercap))
head(life_expectancy)
```

```{.output}
      country year lifeExp below_average
1 Afghanistan 1952  28.801          TRUE
2 Afghanistan 1957  30.332          TRUE
3 Afghanistan 1962  31.997          TRUE
4 Afghanistan 1967  34.020          TRUE
5 Afghanistan 1972  36.088          TRUE
6 Afghanistan 1977  38.438          TRUE
```

We can also use a logical vector to achieve the same result. Make sure the
vector's length matches the number of columns in the data frame (to avoid vector
recycling):


```r
life_expectancy <- gapminder[c(TRUE, TRUE, FALSE, FALSE, TRUE, FALSE, TRUE)]
head(life_expectancy)
```

```{.output}
      country year lifeExp below_average
1 Afghanistan 1952  28.801          TRUE
2 Afghanistan 1957  30.332          TRUE
3 Afghanistan 1962  31.997          TRUE
4 Afghanistan 1967  34.020          TRUE
5 Afghanistan 1972  36.088          TRUE
6 Afghanistan 1977  38.438          TRUE
```
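
The caution about vector recycling matters because R does not warn when a logical index is shorter than the number of columns. A minimal sketch, assuming a made-up four-column data frame (`toy` is hypothetical and only serves as illustration):

```r
# Hypothetical four-column data frame for illustration
toy <- data.frame(a = 1:2, b = 3:4, c = 5:6, d = 7:8)

# Full-length logical vector: one value per column
full <- toy[c(TRUE, TRUE, FALSE, FALSE)]
names(full)   # "a" "b"

# Shorter vector: silently recycled to TRUE, TRUE, FALSE, TRUE
short <- toy[c(TRUE, TRUE, FALSE)]
names(short)  # "a" "b" "d"
```

The shorter vector keeps column `d` even though it was never mentioned, which is why matching the vector's length to `ncol(toy)` is the safer habit.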

Alternatively, we can use the columns' positions with a negative index:


```r
life_expectancy <- gapminder[-c(3, 4, 6)]
head(life_expectancy)
```

```{.output}
      country year lifeExp below_average
1 Afghanistan 1952  28.801          TRUE
2 Afghanistan 1957  30.332          TRUE
3 Afghanistan 1962  31.997          TRUE
4 Afghanistan 1967  34.020          TRUE
5 Afghanistan 1972  36.088          TRUE
6 Afghanistan 1977  38.438          TRUE
```

The easiest way to remove rows from a data frame is usually to select the rows
we want to keep instead.
We can also remove rows by position, using a negative index:


```r
# Filter data for Afghanistan during the 21st century:
afghanistan_21c <- gapminder[gapminder$country == "Afghanistan" &
                             gapminder$year > 2000, ]

# Now remove data for 2002, that is, the first row:
afghanistan_21c[-1, ]
```

```{.output}
       country year      pop continent lifeExp gdpPercap below_average
12 Afghanistan 2007 31889923      Asia  43.828  974.5803          TRUE
```
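
The two approaches to removing rows can be compared side by side. This is a minimal sketch on a made-up data frame (`toy` and its values are hypothetical); it also shows one pitfall of combining a negative index with `which`:

```r
# Hypothetical toy data for illustration
toy <- data.frame(year = c(1997L, 2002L, 2007L), pop = c(10, 12, 15))

# Select the rows we want to keep (usually the clearest option)
kept <- toy[toy$year != 2002, ]

# Remove the same row by position with a negative index
dropped <- toy[-2, ]

# Pitfall: when nothing matches, which() returns integer(0),
# and a negative empty index selects no rows at all
nothing_left <- toy[-which(toy$year == 1900), ]
nrow(nothing_left)  # 0

# The logical form leaves the data frame unchanged in that case
all_left <- toy[!(toy$year == 1900), ]
nrow(all_left)      # 3
```

Selecting the rows to keep with a logical condition avoids the empty-index problem, which is one reason it tends to be the safer choice.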


A common special case is removing rows that contain missing values (`NA`), which the `na.omit` function does:


```r
# Turn some values into NAs:
afghanistan_20c <- gapminder[gapminder$country == "Afghanistan", ]
afghanistan_20c[afghanistan_20c$year < 2007, "year"] <- NA

# Remove NAs
na.omit(afghanistan_20c)
```

```{.output}
       country year      pop continent lifeExp gdpPercap below_average
12 Afghanistan 2007 31889923      Asia  43.828  974.5803          TRUE
```
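
`na.omit` drops a row if any of its columns is `NA`. When only some columns matter, a logical condition gives finer control. A minimal sketch, assuming a made-up data frame (`toy` is hypothetical, not the lesson data):

```r
# Hypothetical toy data with missing values in different columns
toy <- data.frame(year = c(1997L, NA, 2007L), pop = c(10, 12, NA))

# Drop rows with an NA in any column
no_na <- na.omit(toy)
nrow(no_na)     # 1

# Same rows, selected with a logical vector
complete <- toy[complete.cases(toy), ]
nrow(complete)  # 1

# Drop rows only when `year` is missing; NAs in `pop` are kept
year_ok <- toy[!is.na(toy$year), ]
nrow(year_ok)   # 2
```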


## Factors

Here is another thing to look out for: in a `factor`, each different value
Empty file modified fig/06-rmd-generate-figures.sh
100755 → 100644
Empty file.
Empty file modified fig/12-plyr-generate-figures.sh
100755 → 100644
Empty file.
2 changes: 1 addition & 1 deletion md5sum.txt
@@ -6,7 +6,7 @@
"episodes/01-rstudio-intro.Rmd" "f4e11815e378019213cd8bc32bd5d292" "site/built/01-rstudio-intro.md" "2023-11-21"
"episodes/02-project-intro.Rmd" "00024461ca6e3ea1ec659cf9434377d4" "site/built/02-project-intro.md" "2023-11-21"
"episodes/03-data-structures-part1.Rmd" "a83070b1d04789704c8173e6813aba66" "site/built/03-data-structures-part1.md" "2023-11-21"
"episodes/04-data-structures-part2.Rmd" "22100d1539c25cba0459d909f346f516" "site/built/04-data-structures-part2.md" "2023-11-21"
"episodes/04-data-structures-part2.Rmd" "df5db7ccfc08dc2a55831652fc07de31" "site/built/04-data-structures-part2.md" "2024-01-11"
"episodes/05-data-subsetting.Rmd" "b673744f991a865b9996504197cc013e" "site/built/05-data-subsetting.md" "2023-11-21"
"episodes/06-dplyr.Rmd" "5d6106566981f73f1e3dc6a5c011fa28" "site/built/06-dplyr.md" "2023-11-21"
"episodes/07-plot-ggplot2.Rmd" "7cbd4da57c055ecbc3ee80bd2694497a" "site/built/07-plot-ggplot2.md" "2023-11-21"