Typos and other minor corrections #151

Merged
merged 1 commit into from
Nov 22, 2021
17 changes: 7 additions & 10 deletions _episodes/01-r-plotting.md
@@ -275,7 +275,7 @@ You should now have a line of text in your code file that started with `gapminde

What if we want to run this command from our code file?

-In order to run code that you've typed in the editor, you have a few options. We can click <kbd>Run</kbd> again from the right side of the **Editor** tab but the quickest way to run the code is by pressing <kbd>Ctrl</kbd>+<kbd>Enter</kbd> on your keyboard.
+In order to run code that you've typed in the editor, you have a few options. We can click <kbd>Run</kbd> again from the right side of the **Editor** tab but the quickest way to run the code is by pressing <kbd>Ctrl</kbd>+<kbd>Enter</kbd> on your keyboard (<kbd>Cmd</kbd>+<kbd>Enter</kbd> on Mac).

This will run the line of code that currently contains your cursor and will move your cursor to the next line. Note that when Rstudio runs your code, it basically just copies your code from the **Editor** window to the **Console** window, just like what happened when we selected <kbd>Run Selected Line(s)</kbd>.

@@ -709,7 +709,7 @@ There are also lots of other fun options:
> {: .solution}
{: .challenge}

-Since we have the data for the population of each country, we might be curious what effect population might have on life expectancy and GDP per capita. Do you think larger countires will have a longer or shorter life expectancy? Let's find out by mapping the population of each country to the size of our points.
+Since we have the data for the population of each country, we might be curious what effect population might have on life expectancy and GDP per capita. Do you think larger countries will have a longer or shorter life expectancy? Let's find out by mapping the population of each country to the size of our points.


~~~
@@ -803,7 +803,7 @@ Many datasets are much more complex than the example we used for the first plot.
## Importing datasets
_[Back to top](#contents)_

-In the first plot, we looked at a smaller slice of a large dataset. To gain a better understanding of the kinds of patterns we might observe in our own data, we will now use the full dataset, which is stored in a called "gapminder_data.csv".
+In the first plot, we looked at a smaller slice of a large dataset. To gain a better understanding of the kinds of patterns we might observe in our own data, we will now use the full dataset, which is stored in a file called "gapminder_data.csv".

To start, we will read in the data without using the interactive RStudio file navigation.

@@ -923,7 +923,7 @@ Sometimes plots like this are called "spaghetti plots" because all the lines loo
> > {: .language-r}
> >
> > <img src="../fig/rmd-01-gapminderMoreLines-1.png" title="plot of chunk gapminderMoreLines" alt="plot of chunk gapminderMoreLines" width="612" style="display: block; margin: auto;" />
-> > (China and India are the two Asian countries that have experience massive population growth from 1952-2007.)
+> > (China and India are the two Asian countries that have experienced massive population growth from 1952-2007.)
> {: .solution}
{: .challenge}

@@ -952,8 +952,8 @@ We've previously used the discrete values of the `continent` column to color in

This type of visualization makes it easy to compare the range and spread of values across groups. The "middle" 50% of the data is located inside the box and outliers that are far away from the central mass of the data are drawn as points.

-> ## Bonus Exercise:
-> Take a look a the ggplot cheat sheet. Find all the geoms listed under "Discrete X, Continuous Y". Try replacing `geom_boxplot` with one of these other functions.
+> ## Bonus Exercise: Other discrete geoms
+> Take a look a the ggplot [cheat sheet](https://ggplot2.tidyverse.org/). Find all the geoms listed under "Discrete X, Continuous Y". Try replacing `geom_boxplot` with one of these other functions.
>
> > ## Example solution
> >
@@ -1197,7 +1197,7 @@ ggplot(gapminder_1997) +
Try different values like 5 or 50 to see how the plot changes.

> ## Bonus Exercise: One variable plots
-> Rather than a histogram, choose one of the other geometries listed under "One Variable" plots on the ggplot cheat sheet. Note that we used `lifeExp` here which has continuous values. If you want to try the discrete options, try mapping `continent` to x instead.
+> Rather than a histogram, choose one of the other geometries listed under "One Variable" plots on the ggplot [cheat sheet](https://ggplot2.tidyverse.org/). Note that we used `lifeExp` here which has continuous values. If you want to try the discrete options, try mapping `continent` to x instead.
>
> > ## Example solution
> >
@@ -1421,9 +1421,6 @@ library(gifski)
-> **Part 1:**
-> Let's start by creating a static plot using `ggplot()`, as we've been doing so far. This time, lets put `log(gdpPercap)` on the x-axis, to help spread out our data points, and life expectancy on our y-axis. Also map the point size to the population of the country, and the color of the points to the continent.
->
> **Part 1:**
> Let's start by creating a static plot using `ggplot()`, as we've been doing so far. This time let's put `log(gdpPercap)` on the x-axis, to help spread out our data points, and life expectancy on our y-axis. Also map the point size to the population of the country, and the color of the points to the continent.
>
> > ## Solution
> >
> >
4 changes: 2 additions & 2 deletions _episodes/02-unix-shell.md
@@ -92,7 +92,7 @@ When you’re following along in the lesson, don’t type the prompt when typing
To make the prompt the same for all of us, run this command:

```
-PS1=’$ ‘
+PS1='$ '
```
{: .language-bash}

@@ -159,7 +159,7 @@ This error message tells us the command we tried to run, `ks`, is not a command
## Man and Help
_[Back to top](#contents)_

-Now that we know how to list files with `ls`, we can learn how to look up the manual pages for unix shell commands. If you want to learn more about a command we can use `man` to look up its manual page. which will open with `ls`. We can navigate the man page to view the description of a command and its options. For example, if you want to know more about the navigation options of `ls` you can type `man ls` on the command line.
+Now that we know how to list files with `ls`, we can learn how to look up the manual pages for unix shell commands. If you want to learn more about a command, we can use `man` to look up its manual page. We can navigate the man page to view the description of a command and its options. For example, if you want to know more about the navigation options of `ls` you can type `man ls` on the command line.

```
man ls
6 changes: 3 additions & 3 deletions _episodes/03-intro-git-github.md
@@ -195,7 +195,7 @@ $ git config --global core.editor "nano -w"
{: .language-bash}

If you have a different preferred text editor, it is possible to reconfigure the text editor for Git to other editors whenever you want to change it.
-Vim is the default editor. If did not change your editor and stuck in this editor, the following instructions will help you exit.
+Vim is the default editor. If you did not change your editor and are stuck in Vim, the following instructions will help you exit.

> ## Exiting Vim
>
@@ -1648,7 +1648,7 @@ then enter your partner's username.

To accept access to the Owner's repo, the Collaborator
needs to go to [https://github.com/notifications](https://github.com/notifications).
-Once there she can accept access to the Owner's repo.
+Once there they can accept access to the Owner's repo.

Next, the Collaborator needs to download a copy of the Owner's repository to her
machine. This is called "cloning a repo". To clone the Owner's repo into
@@ -1661,7 +1661,7 @@ $ git clone https://github.com/USERNAME/un-report.git ~/Desktop/USERNAME-un-repo

Replace `USERNAME` with the Owner's username.

-The Collaborator can now make a change in her clone of the Owner's repository,
+The Collaborator can now make a change in their clone of the Owner's repository,
exactly the same way as we've been doing before:

```
22 changes: 11 additions & 11 deletions _episodes/04-r-data-analysis.md
@@ -47,7 +47,7 @@ _[Back to top](#contents)_

First, navigate to the un-reports directory however you'd like and open `un-report.Rproj`.
This should open the un-report R project in RStudio.
-You can checkt his by seeing if the Files in the bottom right of RStudio are the ones in your `un-report` directory.
+You can check this by seeing if the Files in the bottom right of RStudio are the ones in your `un-report` directory.

Yesterday we spent a lot of time making plots in R using the ggplot2 package. Visualizing data using plots is a very powerful skill in R, but what if we would like to work with only a subset of our data? Or clean up messy data, calculate summary statistics, create a new variable, or join two datasets together? There are several different methods for doing this in R, and we will touch on a few today using functions the `dplyr` package.

@@ -130,7 +130,7 @@ cols(
~~~
{: .output}

-Notice that the output of the `read_csv()` function is pretty informative. It tells us the name of all of our column headers as how it interpreted the data type. This birds-eye-view can help you take a quick look that everything is how we expect it to be.
+Notice that the output of the `read_csv()` function is pretty informative. It tells us the name of all of our column headers as well as how it interpreted the data type. This birds-eye-view can help you take a quick look that everything is how we expect it to be.

Now we have the tools necessary to work through this lesson.

@@ -200,7 +200,7 @@ Note that you don't have to quotes around this new name as long as it starts wit
>
> > ## Solution:
> >
-> > If you do want to use spaces or other characters, You should wrap the name in quotes.
+> > If you do want to use spaces or other characters, you should wrap the name in quotes.
> >
> >
> > ~~~
@@ -316,7 +316,7 @@ gapminder_data %>%
> {: .solution}
{: .challenge}

-Notice how the pipe operator (`%>%`) allows us to combine these two simple steps into a more complicated data extraction?. We took the data, filtered out the rows, then took the mean value. The argument we pass to `filter()` needs to be some expression that will return TRUE or FALSE. We can use comparisons like `>` (greater than) and `<` (less than) for example. Here we tested for equality using a double equals sign `==`. You use `==` (double equals) when testing if two values are equal, and you use `=` (single equals) when naming arguments that you are passing to functions). Try changing it to use `filter(year = 2007)` and see what happens.
+Notice how the pipe operator (`%>%`) allows us to combine these two simple steps into a more complicated data extraction?. We took the data, filtered out the rows, then took the mean value. The argument we pass to `filter()` needs to be some expression that will return TRUE or FALSE. We can use comparisons like `>` (greater than) and `<` (less than) for example. Here we tested for equality using a double equals sign `==`. You use `==` (double equals) when testing if two values are equal, and you use `=` (single equals) when naming arguments that you are passing to functions. Try changing it to use `filter(year = 2007)` and see what happens.
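
The `==` versus `=` distinction described in this hunk can be tried on a small table. The following is only a minimal sketch: the `toy` tibble and its values are made up for illustration and stand in for `gapminder_data`.

```r
library(dplyr)

# Hypothetical toy data standing in for gapminder_data
toy <- tibble(
  country = c("A", "B", "C"),
  year    = c(2002, 2007, 2007),
  lifeExp = c(60, 70, 80)
)

# `==` tests each row for equality, so filter() keeps the rows where it is TRUE
mean_2007 <- toy %>%
  filter(year == 2007) %>%
  summarize(average = mean(lifeExp))

# `=` names an argument instead of comparing values, so filter() stops with an error;
# tryCatch() records that an error happened instead of halting the script
single_equals <- tryCatch(
  toy %>% filter(year = 2007),
  error = function(e) "error"
)
```

Here `mean_2007` holds the average of the two 2007 rows, while `single_equals` only records that the single-equals call failed.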

## Grouping rows using `group_by()` {#grouping-rows-using-group_by}
_[Back to top](#contents)_
@@ -679,7 +679,7 @@ gapminder_data %>%
~~~
{: .output}

-Notice here that we tell `pivot_wider()` which columns to pull the names we wish our new columns to be named from the year variable, and the values to populate those columns from the lifeExp variable. (Again, neither of which have to be in quotes in the code when there are no special characters or spaces - certainly an incentive not to use special characters or spaces!) We see that the resulting tables have new columns by year, and the values populate it with our remaining variables dictating the rows.
+Notice here that we tell `pivot_wider()` which columns to pull the names we wish our new columns to be named from the year variable, and the values to populate those columns from the lifeExp variable. (Again, neither of which have to be in quotes in the code when there are no special characters or spaces - certainly an incentive not to use special characters or spaces!) We see that the resulting table has new columns by year, and the values populate it with our remaining variables dictating the rows.
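
As a rough illustration of the `pivot_wider()` behaviour this paragraph describes, here is a sketch on a made-up four-row table (the `toy` data is an assumption, not the lesson's dataset). The new column names come from `year` and the cell values from `lifeExp`.

```r
library(dplyr)
library(tidyr)

# Hypothetical long-format data: one row per country and year
toy <- tibble(
  country = c("A", "A", "B", "B"),
  year    = c(2002, 2007, 2002, 2007),
  lifeExp = c(60, 62, 70, 73)
)

# names_from supplies the new column names, values_from fills the cells;
# the remaining column (country) defines the rows
wide <- toy %>%
  pivot_wider(names_from = year, values_from = lifeExp)
```

The result has one row per country and one column per year.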

# Cleaning up data
_[Back to top](#contents)_
@@ -793,7 +793,7 @@ cols(

Now we get a similar Warning message as before, but the outputted table looks better.

-> **Warnings and Errors:**It's important to differentiate between Warnings and Errors in R. A warning tells us, "you might want to know about this issue, but R still did what you asked". An error tells us, "there's something wrong with your code or your data and R didn't do what you asked". You need to fix any errors that arise. Warnings, are probably best to resolve or at least understand why they are coming up.
+> **Warnings and Errors: **It's important to differentiate between Warnings and Errors in R. A warning tells us, "you might want to know about this issue, but R still did what you asked". An error tells us, "there's something wrong with your code or your data and R didn't do what you asked". You need to fix any errors that arise. Warnings, are probably best to resolve or at least understand why they are coming up.
{.callout}

We can resolve this warning by one of two methods. First, we can tell `read_csv()` what the column names should be with the `col_names()` argument where we give it the column names we want within the c() function separated by commas. If we do this, then we need to set skip to 2 to also skip the column headings.
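
The first method can be sketched on a throwaway file. Everything about the file below is hypothetical (its title line, headers, values, and the replacement column names); it only mimics a layout with one descriptive line followed by a header line, so that `skip = 2` drops both before `col_names` supplies the names.

```r
library(readr)

# Hypothetical file: one title line, one header line, then the data
path <- tempfile(fileext = ".csv")
writeLines(c(
  "Some descriptive title line,,",
  "Region,Country,Value",
  "Americas,Canada,1",
  "Americas,Mexico,2"
), path)

# skip = 2 drops the title and the original headers;
# col_names then provides the names we want instead
emissions <- read_csv(
  path,
  skip = 2,
  col_names = c("region", "country", "value")
)
```

Because the original header row is skipped rather than read as data, the resulting table contains only the two data rows.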
@@ -948,7 +948,7 @@ cols(
~~~
{: .output}

-Both of these strategies are useful for helping us to clean up our data. Which you ultimately use for this project is a matter of personal preference. We'll go with the first option where we used col_names so that we don't have to worry about the Warning message.
+Both of these strategies are useful for helping us to clean up our data. What you ultimately use for this project is a matter of personal preference. We'll go with the first option where we used col_names so that we don't have to worry about the Warning message.


~~~
@@ -1278,7 +1278,7 @@ cols(
# Joining data frames
_[Back to top](#contents)_

-Now we're ready to join our CO2 emissions data to the gapminder data. Previously we saw that we could read in and filter the gapminder data like this to get the data from the Americas for 2007 (this will overwrite our previous gapminder_data:
+Now we're ready to join our CO2 emissions data to the gapminder data. Previously we saw that we could read in and filter the gapminder data like this to get the data from the Americas for 2007 (this will overwrite our previous gapminder_data):


~~~
@@ -1306,7 +1306,7 @@ cols(

Look at the data in co2_emissions and gapminder_data. If you had to merge these two data frames together, which column would you use to merge them together? If you said "country" - good job!

-We'll call country our "key". Now, when we join them together, can you think of any problems we might run into when we merge things? We might not have CO2 emissions data for all of the countries in the gapminder dataset and vice versa. Also, a country might be represented in both data frames but not by the same name in both places. As an example, write down the name of the country that the University of Michigan is in - we'll come back to you answer shortly!
+We'll call country our "key". Now, when we join them together, can you think of any problems we might run into when we merge things? We might not have CO2 emissions data for all of the countries in the gapminder dataset and vice versa. Also, a country might be represented in both data frames but not by the same name in both places. As an example, write down the name of the country that the University of Michigan is in - we'll come back to your answer shortly!

The dplyr package has a number of tools for joining data frames together depending on what we want to do with the rows of the data of countries that are not represented in both data frames. Let's look at some cartoon examples and then come back to our own data.
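
To make the "key" idea concrete, here is a small sketch with two made-up tables (`gap` and `co2`, with invented values) that share a `country` column but do not cover exactly the same countries. `inner_join()` keeps only countries present in both, and `anti_join()` lists the rows of the first table that found no match.

```r
library(dplyr)

# Hypothetical stand-ins for gapminder_data and co2_emissions
gap <- tibble(
  country = c("Canada", "Mexico", "Puerto Rico"),
  pop     = c(1, 2, 3)
)
co2 <- tibble(
  country = c("Canada", "Mexico", "Cuba"),
  total   = c(10, 20, 30)
)

# Rows whose key appears in both tables
joined <- inner_join(gap, co2, by = "country")

# Rows of gap with no matching key in co2
unmatched <- anti_join(gap, co2, by = "country")
```

Checking `unmatched` after a join is a quick way to spot countries that are missing or spelled differently in the other table.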

@@ -1466,7 +1466,7 @@ anti_join(gapminder_data, co2_emissions, by="country")
~~~
{: .output}

-Now we see that our recode enabled the join for all countries in the gapminder, and we are left with Puerto Rico. In the next exercise, Let's recode Puerto Rico as United States in the gapminder data and then use `group_by()` and `summarize()` to aggregate the data; we'll use the population data to weight the life expectancy and GDP values.
+Now we see that our recode enabled the join for all countries in the gapminder, and we are left with Puerto Rico. In the next exercise, let's recode Puerto Rico as United States in the gapminder data and then use `group_by()` and `summarize()` to aggregate the data; we'll use the population data to weight the life expectancy and GDP values.

> ## Exercise: Data cleaning to facilitate a join
>
@@ -1675,7 +1675,7 @@ gapminder_co2 %>%
~~~
{: .output}

-The `if_else()` statement reads like, "if country equals "Canada" OR `|` "United states" OR "Mexico", The new variable region should be "north", else "south"". It's worth exploring logical operators for "or" `|`, "and" `&&`, and "not" `!`, which opens up a great deal of possibilities for writing code to do what you want.
+The `if_else()` statement reads like, "if country equals "Canada" OR `|` "United states" OR "Mexico", the new variable region should be "north", else "south"". It's worth exploring logical operators for "or" `|`, "and" `&&`, and "not" `!`, which opens up a great deal of possibilities for writing code to do what you want.
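
The `if_else()` logic described here can be sketched on a made-up list of countries (the `toy` tibble is an assumption for illustration). Note that inside `mutate()` the comparison runs over a whole column, so the element-wise operators `|` and `&` are the ones that apply; `||` and `&&` are meant for single TRUE/FALSE values.

```r
library(dplyr)

# Hypothetical country list standing in for gapminder_co2
toy <- tibble(country = c("Canada", "Brazil", "Mexico", "Chile"))

# Each row is tested against the three names; `|` combines the tests row by row
toy_regions <- toy %>%
  mutate(region = if_else(
    country == "Canada" | country == "United States" | country == "Mexico",
    "north",
    "south"
  ))
```

Every row that matches one of the three names is labelled "north" and all others "south".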

We see that although Canada, the United States, and Mexico account for close to half the population of the Americas, they account for 88% of the CO2 emitted. We just did this math quickly by plugging the numbers from our table into the console to get the percentages. Can we make that a little more reproducible by calculating percentages for population (pop) and total emissions (total) into our data before summarizing?
