Commit

lab 3 execution changes
atheobold committed Apr 24, 2023
1 parent 6fbc8ae commit ab482cc
Showing 1 changed file with 5 additions and 3 deletions.
8 changes: 5 additions & 3 deletions _freeze/labs/lab-3/execute-results/html.json
@@ -1,8 +1,10 @@
{
"hash": "52d2b25fd601cacd734c8055d60652d3",
"hash": "6820a0df0478f44e85ce256fc43de23d",
"result": {
"markdown": "---\ntitle: \"Lab 3: Incorporating Categorical Variables\"\nauthor: \"Your group's names here!\"\ndate: \"April 21, 2023\"\nformat: \n html:\n embed-resources: true\n standalone: true\neditor: visual\nexecute: \n echo: true\n eval: false\n message: false\n---\n\n\n# Getting started\n\n## Load packages\n\nIn this lab, we will explore and visualize the data using packages housed in the **tidyverse** suite of packages.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n## Package for ggplot and dplyr tools\nlibrary(tidyverse)\n\n## Package for ecological data\nlibrary(lterdatasampler)\n\n## Package for density ridge plots\nlibrary(ggridges)\n```\n:::\n\n\n## The data\n\nIn this lab we will work with data from the H.J. Andrews Experimental Forest. The following is a description of the data:\n\n> Populations of West Slope cutthroat trout (Onchorhyncus clarki clarki) in two standard reaches of Mack Creek in the H.J. Andrews Experimental Forest have been monitored since 1987. Monitoring of Pacific Giant Salamanders, Dicamptodon tenebrosus began in 1993. The two standard reaches are in a section of clearcut forest (ca. 1963) and an upstream 500 year old coniferous forest. Sub-reaches are sampled with 2-pass electrofishing, and all captured vertebrates are measured and weighed. Additionally, a set of channel measurements are taken with each sampling. This study constitutes one of the longest continuous records of salmonid populations on record.\n\nFirst, we'll view the `and_vertebrates` dataframe where these data are stored.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nView(and_vertebrates)\n```\n:::\n\n\n## Exploring the Dataset\n\nThe **codebook** (description of the variables) can be accessed by pulling up the help file by typing a `?` before the name of the dataset:\n\n\n::: {.cell}\n\n```{.r .cell-code}\n?and_vertebrates\n```\n:::\n\n\n**Question 1 -- How large is the `and_vertebrates` dataset? (i.e. 
How many rows and columns does the dataset have?)**\n\n\n::: {.cell}\n\n```{.r .cell-code}\n## Your code for question 1 (and 2) goes here!\n```\n:::\n\n\n**Question 2 -- Are there categorical variables in the dataset? If so, what are their names?**\n\n## Accessing the Levels of a Variable\n\nThe `species` variable refers to the species of the animal which was captured. You can use the `distinct()` function to access the distinct values of a categorical variable (e.g., `distinct(nycflights, carrier)`). Notice the first input is the name of the dataset and the second input is the name of the categorical variable!\n\n**Question 3 -- Use the `distinct()` function to discover the levels / values of the `species` variable.**\n\n\n::: {.cell}\n\n```{.r .cell-code}\n## Your code for question 3 goes here!\n```\n:::\n\n\n# Data Wrangling\n\nAlright, you should have found that there is more than one species included in these data. For our analysis, we are only interested in Cutthroat trout.\n\n**Question 4 -- Use the `filter()` function to include *only* observations on Cutthroat trout.**\n\n**The only part you need to remove is the ...! Keep the `trout <-`!**\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntrout <- ...\n```\n:::\n\n\n[Electrofishing](https://en.wikipedia.org/wiki/Electrofishing) fishing technique that uses direct current electricity flowing between a submerged cathode and anode, to insert an electric current into the water. This current stuns fish in a (hopefully) non-lethal manner, in order to capture them for marking and measuring. Technically, smaller fish are less affected by the current, so there presumably is a size of fish that is \"uncatchable\".\n\n**Question 5 -- Use the `filter()` function again to include *only* trout whose `length_1_mm` is greater than 4 inches (or 101 mm).**\n\n*The only part you need to remove is the ...! 
Keep the `trout <-`!*\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntrout <- ...\n```\n:::\n\n\n# Data Visualizations\n\nAlright, now that we've gotten our data ready for analysis, let's start with some visualizations\n\n**Question 6 -- Using `ggplot()` create a visualization of the *distribution* of the lengths of the Cutthroat trout (from the `trout` dataset you `filter`ed above).**\n\n*Keep in mind your plot should only extend to 101mm if you completed #5 correctly*\n\n\n::: {.cell}\n\n```{.r .cell-code}\n## Your code for question 6 goes here!\n```\n:::\n\n\n**Question 7 -- Name three possible sources of variation for the length of a Cutthroat trout.**\n\n## Adding Categorical Variables\n\nWhen we are interested in comparing the distribution of a numerical variable across groups of a categorical variable, we \"typically\" see people use stacked histograms or side-by-side boxplots. I believe an unsung hero of these types of comparisons is the **ridge plot**.\n\nAs introduced in *Introduction to Modern Statistics*, a ridge plot essentially has multiple density plots stacked in the same plotting window. A key feature of ridge plots is a categorical variable is **always** on the y-axis, with a numeric variable on the x-axis.\n\nIn R, we use the `geom_density_ridges()` function from the **ggridges** package to create a ridge plot. Yes, this is new, but don't worry! The function has the same layout as things you've seen before.\n\n**Question 8 -- Fill in the code below to create a ridge plot comparing the lengths of Cutthroat trout between the different types of channels (`unittype`). 
Use the `trout` dataset you `filter`ed above!**\n\n*Be sure to add nice axis labels to your plot, which describe the variables being plotted (and their units)!*\n\n\n::: {.cell}\n\n```{.r .cell-code}\n## Your code for question 8 goes here!\n\nggplot(data = fish, \n mapping = aes(x = <NUMERICAL VARIABLE>, \n y = <CATEGORICAL VARIABLE>)\n ) +\n geom_density_ridges() \n```\n:::\n\n\n**Question 9 -- Modify your plot from #8 to incorporate the `section` of the forest into your plot using either color or facets.**\n\n*Hint: The `fill` aesthetic will **fill** the ridge plots with color.*\n\n**Question 10 -- Based on your plot, how different are the lengths of the Cutthroat trout between the different channel types and forest sections?**\n\n# Data Summaries\n\nPaired with visualizations, summary statistics can provide a clearer picture for the comparisons we are interested in. To obtain summary statistics for different groups of a categorical variable, we need to use our friend the `group_by()` function.\n\n**Question 11 -- Find the average length of Cutthroat trout from the different channel types (`unittype`). Use the `trout` dataset from Question 5!**\n\n\n::: {.cell}\n\n```{.r .cell-code}\n## Your code for question 11 goes here!\n```\n:::\n\n\n**Question 12 -- Find the average length of Cutthroat trout from the different channel types (`unittype`) *and* forest `section`. Use the `trout` dataset from Question 5!**\n\n\n::: {.cell}\n\n```{.r .cell-code}\n## Your code for question 12 goes here!\n```\n:::\n\n\n**Question 13 -- How do the differences in these averages compare with what you saw in your visualization in Question 9?**\n",
"supporting": [],
"markdown": "---\ntitle: \"Lab 3: Incorporating Categorical Variables\"\nauthor: \"Your group's names here!\"\ndate: \"April 21, 2023\"\nformat: \n html:\n embed-resources: true\n standalone: true\neditor: visual\nexecute: \n echo: true\n eval: false\n message: false\n warning: false\n---\n\n\n# Getting started\n\n## Load packages\n\nIn this lab, we will explore and visualize the data using packages housed in the **tidyverse** suite of packages.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n## Package for ggplot and dplyr tools\nlibrary(tidyverse)\n\n## Package for ecological data\nlibrary(lterdatasampler)\n\n## Package for density ridge plots\nlibrary(ggridges)\n```\n:::\n\n\n## The data\n\nIn this lab we will work with data from the H.J. Andrews Experimental Forest. The following is a description of the data:\n\n> Populations of West Slope cutthroat trout (Onchorhyncus clarki clarki) in two standard reaches of Mack Creek in the H.J. Andrews Experimental Forest have been monitored since 1987. Monitoring of Pacific Giant Salamanders, Dicamptodon tenebrosus began in 1993. The two standard reaches are in a section of clearcut forest (ca. 1963) and an upstream 500 year old coniferous forest. Sub-reaches are sampled with 2-pass electrofishing, and all captured vertebrates are measured and weighed. Additionally, a set of channel measurements are taken with each sampling. This study constitutes one of the longest continuous records of salmonid populations on record.\n\nFirst, we'll view the `and_vertebrates` dataframe where these data are stored.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nView(and_vertebrates)\n```\n:::\n\n\n## Exploring the Dataset\n\nThe **codebook** (description of the variables) can be accessed by pulling up the help file by typing a `?` before the name of the dataset:\n\n\n::: {.cell}\n\n```{.r .cell-code}\n?and_vertebrates\n```\n:::\n\n\n**Question 1 -- How large is the `and_vertebrates` dataset? (i.e. 
How many rows and columns does the dataset have?)**\n\n\n::: {.cell}\n\n```{.r .cell-code}\n## Your code for question 1 (and 2) goes here!\n```\n:::\n\n\n**Question 2 -- Are there categorical variables in the dataset? If so, what are their names?**\n\n## Accessing the Levels of a Variable\n\nThe `species` variable refers to the species of the animal which was captured. You can use the `distinct()` function to access the distinct values of a categorical variable (e.g., `distinct(nycflights, carrier)`). Notice the first input is the name of the dataset and the second input is the name of the categorical variable!\n\n**Question 3 -- Use the `distinct()` function to discover the levels / values of the `species` variable.**\n\n\n::: {.cell}\n\n```{.r .cell-code}\n## Your code for question 3 goes here!\n```\n:::\n\n\n# Data Wrangling\n\nAlright, you should have found that there is more than one species included in these data. For our analysis, we are only interested in Cutthroat trout.\n\n**Question 4 -- Use the `filter()` function to include *only* observations on Cutthroat trout.**\n\n**The only part you need to remove is the ...! Keep the `trout <-`!**\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntrout <- ...\n```\n:::\n\n\n[Electrofishing](https://en.wikipedia.org/wiki/Electrofishing) fishing technique that uses direct current electricity flowing between a submerged cathode and anode, to insert an electric current into the water. This current stuns fish in a (hopefully) non-lethal manner, in order to capture them for marking and measuring. Technically, smaller fish are less affected by the current, so there presumably is a size of fish that is \"uncatchable\".\n\n**Question 5 -- Use the `filter()` function again to include *only* trout whose `length_1_mm` is greater than 4 inches (or 101 mm).**\n\n*The only part you need to remove is the ...! 
Keep the `trout <-`!*\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntrout <- ...\n```\n:::\n\n\n# Data Visualizations\n\nAlright, now that we've gotten our data ready for analysis, let's start with some visualizations\n\n**Question 6 -- Using `ggplot()` create a visualization of the *distribution* of the lengths of the Cutthroat trout (from the `trout` dataset you `filter`ed above).**\n\n*Keep in mind your plot should only extend to 101mm if you completed #5 correctly*\n\n\n::: {.cell}\n\n```{.r .cell-code}\n## Your code for question 6 goes here!\n```\n:::\n\n\n**Question 7 -- Name three possible sources of variation for the length of a Cutthroat trout.**\n\n## Adding Categorical Variables\n\nWhen we are interested in comparing the distribution of a numerical variable across groups of a categorical variable, we \"typically\" see people use stacked histograms or side-by-side boxplots. I believe an unsung hero of these types of comparisons is the **ridge plot**.\n\nAs introduced in *Introduction to Modern Statistics*, a ridge plot essentially has multiple density plots stacked in the same plotting window. A key feature of ridge plots is a categorical variable is **always** on the y-axis, with a numeric variable on the x-axis.\n\nIn R, we use the `geom_density_ridges()` function from the **ggridges** package to create a ridge plot. Yes, this is new, but don't worry! The function has the same layout as things you've seen before.\n\n**Question 8 -- Fill in the code below to create a ridge plot comparing the lengths of Cutthroat trout between the different types of channels (`unittype`). 
Use the `trout` dataset you `filter`ed above!**\n\n*Be sure to add nice axis labels to your plot, which describe the variables being plotted (and their units)!*\n\n\n::: {.cell}\n\n```{.r .cell-code}\n## Your code for question 8 goes here!\n\nggplot(data = fish, \n mapping = aes(x = <NUMERICAL VARIABLE>, \n y = <CATEGORICAL VARIABLE>)\n ) +\n geom_density_ridges() \n```\n:::\n\n\n**Question 9 -- Modify your plot from #8 to incorporate the `section` of the forest into your plot using either color or facets.**\n\n*Hint: The `fill` aesthetic will **fill** the ridge plots with color.*\n\n**Question 10 -- Based on your plot, how different are the lengths of the Cutthroat trout between the different channel types and forest sections?**\n\n# Data Summaries\n\nPaired with visualizations, summary statistics can provide a clearer picture for the comparisons we are interested in. To obtain summary statistics for different groups of a categorical variable, we need to use our friend the `group_by()` function.\n\n**Question 11 -- Find the average length of Cutthroat trout from the different channel types (`unittype`). Use the `trout` dataset from Question 5!**\n\n\n::: {.cell}\n\n```{.r .cell-code}\n## Your code for question 11 goes here!\n```\n:::\n\n\n**Question 12 -- Find the average length of Cutthroat trout from the different channel types (`unittype`) *and* forest `section`. Use the `trout` dataset from Question 5!**\n\n\n::: {.cell}\n\n```{.r .cell-code}\n## Your code for question 12 goes here!\n```\n:::\n\n\n**Question 13 -- How do the differences in these averages compare with what you saw in your visualization in Question 9?**\n",
"supporting": [
"lab-3_files"
],
"filters": [
"rmarkdown/pagebreak.lua"
],
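
The change to the document text is hard to spot inside the escaped `markdown` string. Comparing the removed and added strings, the only difference appears to be one extra option, `warning: false`, under `execute` in the YAML front matter; the `hash` and `supporting` entries change as well. Unescaped, the new `execute` block would read roughly as follows (the two-space indentation is an assumption, since whitespace is collapsed in the string above):

```yaml
# Sketch of the updated front-matter block, reconstructed from the added "markdown" string.
# Indentation is assumed; the option names and values are taken from the diff.
execute:
  echo: true
  eval: false
  message: false
  warning: false   # the only option added by this commit
```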
