differences for PR #149

datacarpentry · Jan 11, 2024 · 5f4bd24
1 parent e684e92
commit 5f4bd24

Showing 4 changed files with 112 additions and 5 deletions.
diff --git a/04-data-structures-part2.md b/04-data-structures-part2.md
@@ -36,10 +36,10 @@ So far, you have seen the basics of manipulating data frames with our nordic dat
 
 :::::::::::::::::::::::::::::::::::::::::  instructor
 
-Pay attention to and explain the errors and warnings generated from the 
+Pay attention to and explain the errors and warnings generated from the
 examples in this episode.
 
-:::::::::::::::::::::::::::::::::::::::::  
+:::::::::::::::::::::::::::::::::::::::::
 
 
 ```r
@@ -77,7 +77,7 @@ gapminder <- read.csv("https://datacarpentry.org/r-intro-geospatial/data/gapmind
 
 - You can read directly from excel spreadsheets without
   converting them to plain text first by using the [readxl](https://cran.r-project.org/package=readxl) package.
-  
+
 
 ::::::::::::::::::::::::::::::::::::::::::::::::::
 
@@ -99,7 +99,8 @@ str(gapminder)
  $ gdpPercap: num  779 821 853 836 740 ...
 ```
 
-We can also examine individual columns of the data frame with our `class` function:
+We can also examine individual columns of the data frame with the `class` or
+`typeof` functions:
 
 
 ```r
@@ -110,6 +111,14 @@ class(gapminder$year)
 [1] "integer"
 ```
 
+```r
+typeof(gapminder$year)
+```
+
+```{.output}
+[1] "integer"
+```
+
 ```r
 class(gapminder$country)
 ```
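In the `gapminder$year` example, `class()` and `typeof()` print the same value, which can hide how they differ. The sketch below uses a hypothetical toy data frame (the `toy` object and its values are made up, not taken from the lesson) to show where they diverge: `class()` reports an object's R class, while `typeof()` reports its internal storage type.

```r
# Hypothetical toy data frame for illustration only (not the gapminder data)
toy <- data.frame(
  country = factor(c("A", "B")),  # explicit factor column
  year = c(1952L, 1957L),         # integer column
  lifeExp = c(50.1, 60.2)         # double column
)

# For an integer column, class() and typeof() report the same thing
class(toy$year)       # "integer"
typeof(toy$year)      # "integer"

# For a double column, the class is "numeric" but the storage type is "double"
class(toy$lifeExp)    # "numeric"
typeof(toy$lifeExp)   # "double"

# For a factor, class() gives "factor" but typeof() gives the storage type
class(toy$country)    # "factor"
typeof(toy$country)   # "integer" (factors store integer codes plus levels)

# A data frame is stored as a list of columns
class(toy)            # "data.frame"
typeof(toy)           # "list"
```

A quick rule of thumb: use `class()` to ask how R will treat an object, and `typeof()` to ask how it is stored.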
@@ -424,6 +433,104 @@ tail(gapminder_norway)
 
 To understand why R is giving us a warning when we try to add this row, let's learn a little more about factors.
 
+
+## Removing columns and rows in data frames
+
+To remove columns from a data frame, we can use the `subset` function.
+With a minus sign in its `select` argument, it removes columns by name:
+
+
+```r
+life_expectancy <- subset(gapminder, select = -c(continent, pop, gdpPercap))
+head(life_expectancy)
+```
+
+```{.output}
+      country year lifeExp below_average
+1 Afghanistan 1952  28.801          TRUE
+2 Afghanistan 1957  30.332          TRUE
+3 Afghanistan 1962  31.997          TRUE
+4 Afghanistan 1967  34.020          TRUE
+5 Afghanistan 1972  36.088          TRUE
+6 Afghanistan 1977  38.438          TRUE
+```
+
+We can also use a logical vector to achieve the same result. Make sure the
+vector's length matches the number of columns in the data frame; otherwise R
+silently recycles the vector:
+
+
+```r
+life_expectancy <- gapminder[c(TRUE, TRUE, FALSE, FALSE, TRUE, FALSE, TRUE)]
+head(life_expectancy)
+```
+
+```{.output}
+      country year lifeExp below_average
+1 Afghanistan 1952  28.801          TRUE
+2 Afghanistan 1957  30.332          TRUE
+3 Afghanistan 1962  31.997          TRUE
+4 Afghanistan 1967  34.020          TRUE
+5 Afghanistan 1972  36.088          TRUE
+6 Afghanistan 1977  38.438          TRUE
+```
+
+Alternatively, we can use negative column positions:
+
+
+```r
+life_expectancy <- gapminder[-c(3, 4, 6)]
+head(life_expectancy)
+```
+
+```{.output}
+      country year lifeExp below_average
+1 Afghanistan 1952  28.801          TRUE
+2 Afghanistan 1957  30.332          TRUE
+3 Afghanistan 1962  31.997          TRUE
+4 Afghanistan 1967  34.020          TRUE
+5 Afghanistan 1972  36.088          TRUE
+6 Afghanistan 1977  38.438          TRUE
+```
+
+Often the easiest way to remove rows from a data frame is to select the rows
+we want to keep instead.
+To remove rows directly, we can use negative row positions:
+
+
+```r
+# Filter data for Afghanistan during the 21st century:
+afghanistan_21c <- gapminder[gapminder$country == "Afghanistan" &
+                             gapminder$year > 2000, ]
+
+# Now remove data for 2002, that is, the first row:
+afghanistan_21c[-1, ]
+```
+
+```{.output}
+       country year      pop continent lifeExp gdpPercap below_average
+12 Afghanistan 2007 31889923      Asia  43.828  974.5803          TRUE
+```
+
+
+A common case is removing rows that contain missing values (`NA`), which the
+`na.omit` function does:
+
+
+```r
+# Copy the Afghanistan rows and turn every year before 2007 into NA:
+afghanistan <- gapminder[gapminder$country == "Afghanistan", ]
+afghanistan[afghanistan$year < 2007, "year"] <- NA
+
+# Remove rows that contain at least one NA:
+na.omit(afghanistan)
+```
+
+```{.output}
+       country year      pop continent lifeExp gdpPercap below_average
+12 Afghanistan 2007 31889923      Asia  43.828  974.5803          TRUE
+```
+
+
 ## Factors
 
 Here is another thing to look out for: in a `factor`, each different value

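A logical column index whose length differs from the number of columns is silently recycled rather than rejected. The sketch below builds a hypothetical toy data frame that reuses the seven column names from the gapminder output shown in the diff (all values are made up) to show that effect, compares it with the name-based and position-based approaches, and ends with row removal. `complete.cases()` is not used in the lesson; it is included here only as an alternative to `na.omit()`.

```r
# Hypothetical toy data frame with the same seven column names as the
# gapminder output shown in the diff; all values are made up
df <- data.frame(
  country = c("A", "B"), year = c(1952L, 1957L), pop = c(100, 200),
  continent = c("Asia", "Asia"), lifeExp = c(30.1, 31.2),
  gdpPercap = c(700, 800), below_average = c(TRUE, TRUE)
)
ncol(df)  # 7

# A six-element logical index is recycled to seven elements, so its first
# value (TRUE) is silently reused for the 7th column, below_average
names(df[c(TRUE, TRUE, FALSE, FALSE, TRUE, FALSE)])

# A full-length index makes the selection explicit; the check guards
# against accidental recycling
keep <- c(TRUE, TRUE, FALSE, FALSE, TRUE, FALSE, TRUE)
stopifnot(length(keep) == ncol(df))
names(df[keep])

# The same columns, dropped by name and by negative position
names(subset(df, select = -c(continent, pop, gdpPercap)))
names(df[-c(3, 4, 6)])

# Rows: drop by negative position, or drop rows that contain an NA
df$year[1] <- NA
df[-1, ]
na.omit(df)               # also attaches an "na.action" attribute
df[complete.cases(df), ]  # same rows, selected with a logical row index
```

All four column selections keep `country`, `year`, `lifeExp`, and `below_average`; the first one only does so by accident of recycling, which is why checking `length(keep) == ncol(df)` is a cheap safeguard.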
diff --git a/fig/06-rmd-generate-figures.sh b/fig/06-rmd-generate-figures.sh
diff --git a/fig/12-plyr-generate-figures.sh b/fig/12-plyr-generate-figures.sh
diff --git a/md5sum.txt b/md5sum.txt
@@ -6,7 +6,7 @@
 "episodes/01-rstudio-intro.Rmd" "f4e11815e378019213cd8bc32bd5d292" "site/built/01-rstudio-intro.md" "2023-11-21"
 "episodes/02-project-intro.Rmd" "00024461ca6e3ea1ec659cf9434377d4" "site/built/02-project-intro.md" "2023-11-21"
 "episodes/03-data-structures-part1.Rmd" "a83070b1d04789704c8173e6813aba66" "site/built/03-data-structures-part1.md" "2023-11-21"
-"episodes/04-data-structures-part2.Rmd" "22100d1539c25cba0459d909f346f516" "site/built/04-data-structures-part2.md" "2023-11-21"
+"episodes/04-data-structures-part2.Rmd" "df5db7ccfc08dc2a55831652fc07de31" "site/built/04-data-structures-part2.md" "2024-01-11"
 "episodes/05-data-subsetting.Rmd" "b673744f991a865b9996504197cc013e" "site/built/05-data-subsetting.md" "2023-11-21"
 "episodes/06-dplyr.Rmd" "5d6106566981f73f1e3dc6a5c011fa28" "site/built/06-dplyr.md" "2023-11-21"
 "episodes/07-plot-ggplot2.Rmd" "7cbd4da57c055ecbc3ee80bd2694497a" "site/built/07-plot-ggplot2.md" "2023-11-21"