Add new section to lesson 4 #149

Open, wants to merge 2 commits into base: main
63 changes: 59 additions & 4 deletions episodes/04-data-structures-part2.Rmd
@@ -37,10 +37,10 @@ So far, you have seen the basics of manipulating data frames with our nordic dat

::::::::::::::::::::::::::::::::::::::::: instructor

Pay attention to and explain the errors and warnings generated from the
examples in this episode.

:::::::::::::::::::::::::::::::::::::::::

```{r, echo=TRUE}
gapminder <- read.csv("data/gapminder_data.csv")
@@ -75,7 +75,7 @@ gapminder <- read.csv("https://datacarpentry.org/r-intro-geospatial/data/gapmind

- You can read directly from excel spreadsheets without
converting them to plain text first by using the [readxl](https://cran.r-project.org/package=readxl) package.


::::::::::::::::::::::::::::::::::::::::::::::::::

@@ -86,10 +86,12 @@ always do is check out what the data looks like with `str`:
str(gapminder)
```

We can also examine individual columns of the data frame with the `class` or
`typeof` functions:

```{r}
class(gapminder$year)
typeof(gapminder$year)
class(gapminder$country)
str(gapminder$country)
```
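`class` and `typeof` answer different questions: `class` reports how R treats an object (for example, as a factor), while `typeof` reports how its values are stored. The sketch below uses a small made-up data frame (`toy` is illustrative, not gapminder data) to show that a factor column has class `"factor"` but is stored as integer codes. Whether `gapminder$country` is a factor depends on how the file was read; since R 4.0.0, `read.csv` keeps text columns as character by default.

```{r}
# Hypothetical toy data frame: one integer column and one factor column
toy <- data.frame(
  year = c(1952L, 1957L),
  country = factor(c("Afghanistan", "Albania"))
)

class(toy$year)     # "integer"
typeof(toy$year)    # "integer"
class(toy$country)  # "factor": R treats the column as categorical
typeof(toy$country) # "integer": the categories are stored as integer codes
```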
@@ -281,6 +283,59 @@ tail(gapminder_norway)

We will see why R gives us a warning when we try to add this row once we learn a little more about factors, later in this episode.


## Removing columns and rows in data frames

To remove columns from a data frame, we can use the `subset` function.
Its `select` argument lets us remove columns by name, written after a minus sign:

```{r}
life_expectancy <- subset(gapminder, select = -c(continent, pop, gdpPercap))
head(life_expectancy)
```
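Because `gapminder` is read from a file, here is a self-contained sketch of the same `subset` call on a small made-up data frame that only mimics gapminder's column names (the values are placeholders). It also shows that `select` can instead list the columns to keep, which gives the same result:

```{r}
# Made-up rows that only mimic gapminder's column layout
toy <- data.frame(
  country = c("Country A", "Country B"),
  year = c(2002L, 2007L),
  pop = c(1000, 2000),
  continent = c("Asia", "Asia"),
  lifeExp = c(50, 55),
  gdpPercap = c(700, 800)
)

# Drop columns by name (minus sign in front of the vector of names)
dropped <- subset(toy, select = -c(continent, pop, gdpPercap))

# Equivalent: name the columns to keep
kept <- subset(toy, select = c(country, year, lifeExp))

names(dropped)           # "country" "year" "lifeExp"
identical(dropped, kept) # TRUE
```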

We can also use a logical vector to achieve the same result. Make sure the
vector's length matches the number of columns in the data frame; otherwise R
recycles the vector and selects the wrong columns:

```{r}
life_expectancy <- gapminder[c(TRUE, TRUE, FALSE, FALSE, TRUE, FALSE)]
head(life_expectancy)
```
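To see why the length matters, here is a sketch on a made-up one-row data frame with gapminder's six column names (placeholder values). A logical vector that is too short is silently recycled, so it selects a different set of columns than intended. Building the logical vector from the column names with `%in%` avoids counting positions by hand:

```{r}
# Made-up data frame with gapminder's six column names (placeholder values)
toy <- data.frame(
  country = "Country A", year = 2007L, pop = 1000,
  continent = "Asia", lifeExp = 55, gdpPercap = 800
)

# Too short: c(TRUE, FALSE) is recycled to TRUE, FALSE, TRUE, FALSE, TRUE, FALSE
names(toy[c(TRUE, FALSE)])  # "country" "pop" "lifeExp"

# Safer: derive the logical vector from the names we want to keep
keep <- names(toy) %in% c("country", "year", "lifeExp")
keep                        # TRUE TRUE FALSE FALSE TRUE FALSE
names(toy[keep])            # "country" "year" "lifeExp"
```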

Alternatively, we can use the columns' positions, with a minus sign to drop them:

```{r}
life_expectancy <- gapminder[-c(3, 4, 6)]
head(life_expectancy)
```
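Hard-coded positions break if the column order changes, so it can be tempting to compute them from the names with `which()`. The sketch below, on a made-up data frame, shows the catch: when no name matches (here a deliberately misspelled `"gdpPercapita"`), `which()` returns `integer(0)`, and indexing with `-integer(0)` keeps *no* columns instead of all of them. Negating a logical vector with `!` does not have this problem:

```{r}
# Made-up data frame with gapminder's six column names (placeholder values)
toy <- data.frame(
  country = "Country A", year = 2007L, pop = 1000,
  continent = "Asia", lifeExp = 55, gdpPercap = 800
)

# Works when the names exist: drops pop, continent and gdpPercap
names(toy[-which(names(toy) %in% c("pop", "continent", "gdpPercap"))])
# "country" "year" "lifeExp"

# Pitfall: a misspelled name matches nothing, which() gives integer(0),
# and toy[-integer(0)] keeps zero columns
ncol(toy[-which(names(toy) %in% "gdpPercapita")])  # 0

# Logical negation keeps every column when nothing matches
ncol(toy[!names(toy) %in% "gdpPercapita"])          # 6
```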

Often the easiest way to remove rows from a data frame is to select the rows
we want to keep instead.
We can also remove rows by their positions, using a negative index:

```{r}
# Filter data for Afghanistan during the 21st century:
afghanistan_21c <- gapminder[gapminder$country == "Afghanistan" &
                               gapminder$year > 2000, ]

# Now remove data for 2002, that is, the first row:
afghanistan_21c[-1, ]
```
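Removing rows by position depends on the current row order. The sketch below uses a made-up two-row data frame (only `country` and `year`, no real measurements) to show that dropping row 1 and dropping the rows where `year == 2002` agree here, but only the logical condition still removes the right row after the rows are reordered:

```{r}
# Made-up data frame mimicking the two Afghanistan rows after 2000
afg <- data.frame(country = "Afghanistan", year = c(2002L, 2007L))

by_position <- afg[-1, ]                 # drop the first row
by_condition <- afg[afg$year != 2002, ]  # drop every row from 2002
by_position$year   # 2007
by_condition$year  # 2007

# After reordering, position-based removal drops the wrong row
reordered <- afg[c(2, 1), ]
reordered[-1, "year"]                      # 2002
reordered[reordered$year != 2002, "year"]  # 2007
```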


A common case is removing rows that contain missing values (`NA`). The
`na.omit` function drops every row that has an `NA` in any column:

```{r}
# Take all Afghanistan rows and turn the years before 2007 into NAs:
afghanistan <- gapminder[gapminder$country == "Afghanistan", ]
afghanistan[afghanistan$year < 2007, "year"] <- NA

# Remove rows containing NAs:
na.omit(afghanistan)
```
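Because `na.omit` drops a row if *any* of its columns is `NA`, it can remove more than intended. When only missing values in one column matter, testing that column with `is.na` is more targeted, and `complete.cases` shows which rows `na.omit` would keep. A sketch on a made-up three-row data frame (placeholder values, not gapminder data):

```{r}
# Made-up data: one row is missing lifeExp, another is missing year
toy <- data.frame(
  country = c("Country A", "Country B", "Country C"),
  year = c(2002L, NA, 2007L),
  lifeExp = c(NA, 50, 55)
)

complete.cases(toy)            # FALSE FALSE TRUE
nrow(na.omit(toy))             # 1: any NA drops the row
nrow(toy[!is.na(toy$year), ])  # 2: only the row with a missing year is dropped
```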


## Factors

Here is another thing to look out for: in a `factor`, each different value