Commit

modify week 3 content, change week 3 slides for Tuesday asynch class, add R resource files
atheobold committed Apr 14, 2023
1 parent 283dad0 commit 5827527
Showing 40 changed files with 891 additions and 94 deletions.
8 changes: 3 additions & 5 deletions _freeze/labs/lab-3/execute-results/html.json
@@ -1,10 +1,8 @@
{
"hash": "01df39bd88593e1680156b7475643629",
"hash": "a00cc991274599c5ff4e8fa8a82d205f",
"result": {
"markdown": "---\ntitle: \"Lab 3: Incorporating Categorical Variables\"\nauthor: \"Your group's names here!\"\ndate: \"April 21, 2023\"\nformat: \n html:\n embed-resources: true\n standalone: true\neditor: visual\nexecute: \n echo: true\n eval: false\n message: false\n---\n\n\n# Getting started\n\n## Load packages\n\nIn this lab, we will explore and visualize the data using packages housed in the **tidyverse** suite of packages.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n## Package for ggplot and dplyr tools\nlibrary(tidyverse)\n\n## Package for ecological data\nlibrary(lterdatasampler)\n\n## Package for density ridge plots\nlibrary(ggridges)\n```\n:::\n\n\n## The data\n\nIn this lab we will work with data alluded to in Tuesday's lecture, data from the H.J. Andrews Experimental Forest. The following is a description of the data:\n\n> Populations of West Slope cutthroat trout (Onchorhyncus clarki clarki) in two standard reaches of Mack Creek in the H.J. Andrews Experimental Forest have been monitored since 1987. Monitoring of Pacific Giant Salamanders, Dicamptodon tenebrosus began in 1993. The two standard reaches are in a section of clearcut forest (ca. 1963) and an upstream 500 year old coniferous forest. Sub-reaches are sampled with 2-pass electrofishing, and all captured vertebrates are measured and weighed. Additionally, a set of channel measurements are taken with each sampling. This study constitutes one of the longest continuous records of salmonid populations on record.\n\nFirst, we'll view the `and_vertebrates` dataframe where these data are stored.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nView(and_vertebrates)\n```\n:::\n\n\n## Exploring the Dataset\n\nThe **codebook** (description of the variables) can be accessed by pulling up the help file by typing a `?` before the name of the dataset:\n\n\n::: {.cell}\n\n```{.r .cell-code}\n?and_vertebrates\n```\n:::\n\n\n**Question 1** -- How large is the `and_vertebrates` dataset? (i.e. 
How many rows and columns does it have?)\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Your code to answer question 1 goes here! \n```\n:::\n\n\n**Question 2** -- Are there categorical variables in the dataset? If so, what are their names?\n\n## Accessing the Levels of a Variable\n\nThe `species` variable refers to the species of the animal which was captured. You can use the `distinct()` function to access the distinct values of a categorical variable (e.g., `distinct(nycflights, carrier)`).\n\n**Question 3** -- Use the `distinct()` function to discover the levels / values of the `species` variable.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Your code to answer question 3 goes here! \n```\n:::\n\n\n# Data Wrangling\n\nAlright, you should have found that there is more than one species included in these data. For our analysis, we are only interested in Cutthroat trout.\n\n**Question 4** -- Use the `filter()` function to include *only* observations on Cutthroat trout.\n\n**The only part you need to remove is the ...! Keep the `trout <-`!**\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntrout <- ...\n```\n:::\n\n\n[Electrofishing](https://en.wikipedia.org/wiki/Electrofishing) is a fishing technique that uses direct current electricity flowing between a submerged cathode and anode to introduce an electric current into the water. This current stuns fish in a (hopefully) non-lethal manner, in order to capture them for marking and measuring. Technically, smaller fish are less affected by the current, so there presumably is a size of fish that is \"uncatchable\".\n\n**Question 5** -- Use the `filter()` function again to include *only* trout whose `length_1_mm` is greater than 4 inches (or 101 mm).\n\n**The only part you need to remove is the ...! 
Keep the `trout <-`!**\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntrout <- ...\n```\n:::\n\n\n# Data Visualizations\n\nAlright, now that we've gotten our data ready for analysis, let's start with some visualizations.\n\n**Question 6** -- Using `ggplot()` create a visualization of the *distribution* of the lengths of the Cutthroat trout (from the `trout` dataset you `filter`ed above).\n\n\n::: {.cell}\n\n:::\n\n\n**Question 7** -- Name three possible sources of variation for the length of a Cutthroat trout.\n\n## Adding Categorical Variables\n\nWhen we are interested in comparing the distribution of a numerical variable across groups of a categorical variable, we \"typically\" see people use stacked histograms or side-by-side boxplots. I believe an unsung hero of these types of comparisons is the **ridge plot**.\n\nAs introduced in *Introduction to Modern Statistics*, a ridge plot essentially has multiple density plots stacked in the same plotting window. A key feature of ridge plots is that a categorical variable is **always** on the y-axis, with a numeric variable on the x-axis.\n\nIn R, we use the `geom_density_ridges()` function from the **ggridges** package to create a ridge plot. Yes, this is new, but don't worry! The function has the same layout as things you've seen before.\n\n**Question 8** -- Fill in the code below to create a ridge plot comparing the lengths of Cutthroat trout between the different types of channels (`unittype`). 
Use the `trout` dataset you `filter`ed above!\n\n*Be sure to add nice axis labels to your plot, which describe the variables being plotted (and their units)!*\n\n\n::: {.cell}\n\n```{.r .cell-code}\nggplot(data = trout, \n mapping = aes(x = <NUM-VAR>, y = <CAT-VAR>)) +\n geom_density_ridges() \n```\n:::\n\n\n**Question 9** -- Incorporate the `section` of the forest into your plot using either the `fill` aesthetic or facets.\n\n\n::: {.cell}\n\n:::\n\n\n**Question 10** -- Based on your plot, how different are the lengths of the Cutthroat trout between the different channel types and forest sections?\n\n# Data Summaries\n\nPaired with visualizations, summary statistics can provide a clearer picture for the comparisons we are interested in. To obtain summary statistics for different groups of a categorical variable, we need to use our friend the `group_by()` function.\n\n**Question 11** -- Find the average length of Cutthroat trout from the different channel types (`unittype`). Use the `trout` dataset from Question 5!\n\n\n::: {.cell}\n\n:::\n\n\n**Question 12** -- Find the average length of Cutthroat trout from the different channel types (`unittype`) **and** forest `section`. Use the `trout` dataset from Question 5!\n\n\n::: {.cell}\n\n:::\n\n\n**Question 13** -- How do the differences in these averages compare with what you saw in your visualization in Question 9?\n",
"supporting": [
"lab-3_files"
],
"markdown": "---\ntitle: \"Lab 3: Incorporating Categorical Variables\"\nauthor: \"Your group's names here!\"\ndate: \"April 21, 2023\"\nformat: \n html:\n embed-resources: true\n standalone: true\neditor: visual\nexecute: \n echo: true\n eval: false\n message: false\n---\n\n\n# Getting started\n\n## Load packages\n\nIn this lab, we will explore and visualize the data using packages housed in the **tidyverse** suite of packages.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n## Package for ggplot and dplyr tools\nlibrary(tidyverse)\n\n## Package for ecological data\nlibrary(lterdatasampler)\n\n## Package for density ridge plots\nlibrary(ggridges)\n```\n:::\n\n\n## The data\n\nIn this lab we will work with data from the H.J. Andrews Experimental Forest. The following is a description of the data:\n\n> Populations of West Slope cutthroat trout (Onchorhyncus clarki clarki) in two standard reaches of Mack Creek in the H.J. Andrews Experimental Forest have been monitored since 1987. Monitoring of Pacific Giant Salamanders, Dicamptodon tenebrosus began in 1993. The two standard reaches are in a section of clearcut forest (ca. 1963) and an upstream 500 year old coniferous forest. Sub-reaches are sampled with 2-pass electrofishing, and all captured vertebrates are measured and weighed. Additionally, a set of channel measurements are taken with each sampling. This study constitutes one of the longest continuous records of salmonid populations on record.\n\nFirst, we'll view the `and_vertebrates` dataframe where these data are stored.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nView(and_vertebrates)\n```\n:::\n\n\n## Exploring the Dataset\n\nThe **codebook** (description of the variables) can be accessed by pulling up the help file by typing a `?` before the name of the dataset:\n\n\n::: {.cell}\n\n```{.r .cell-code}\n?and_vertebrates\n```\n:::\n\n\n**Question 1 -- How large is the `and_vertebrates` dataset? (i.e. 
How many rows and columns does it have?)**\n\n\n::: {.cell}\n\n```{.r .cell-code}\n## Your code for question 1 (and 2) goes here!\n```\n:::\n\n\n**Question 2 -- Are there categorical variables in the dataset? If so, what are their names?**\n\n## Accessing the Levels of a Variable\n\nThe `species` variable refers to the species of the animal which was captured. You can use the `distinct()` function to access the distinct values of a categorical variable (e.g., `distinct(nycflights, carrier)`). Notice the first input is the name of the dataset and the second input is the name of the categorical variable!\n\n**Question 3 -- Use the `distinct()` function to discover the levels / values of the `species` variable.**\n\n\n::: {.cell}\n\n```{.r .cell-code}\n## Your code for question 3 goes here!\n```\n:::\n\n\n# Data Wrangling\n\nAlright, you should have found that there is more than one species included in these data. For our analysis, we are only interested in Cutthroat trout.\n\n**Question 4 -- Use the `filter()` function to include _only_ observations on Cutthroat trout.**\n\n\n**The only part you need to remove is the ...! Keep the `trout <-`!**\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntrout <- ...\n```\n:::\n\n\n[Electrofishing](https://en.wikipedia.org/wiki/Electrofishing) is a fishing technique that uses direct current electricity flowing between a submerged cathode and anode to introduce an electric current into the water. This current stuns fish in a (hopefully) non-lethal manner, in order to capture them for marking and measuring. Technically, smaller fish are less affected by the current, so there presumably is a size of fish that is \"uncatchable\".\n\n**Question 5 -- Use the `filter()` function again to include _only_ trout whose `length_1_mm` is greater than 4 inches (or 101 mm).**\n\n*The only part you need to remove is the ...! 
Keep the `trout <-`!*\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntrout <- ...\n```\n:::\n\n\n# Data Visualizations\n\nAlright, now that we've gotten our data ready for analysis, let's start with some visualizations.\n\n**Question 6 -- Using `ggplot()` create a visualization of the *distribution* of the lengths of the Cutthroat trout (from the `trout` dataset you `filter`ed above).**\n\n*Keep in mind that your plot should only include lengths greater than 101 mm if you completed #5 correctly.*\n\n\n::: {.cell}\n\n```{.r .cell-code}\n## Your code for question 6 goes here!\n```\n:::\n\n\n**Question 7 -- Name three possible sources of variation for the length of a Cutthroat trout.**\n\n## Adding Categorical Variables\n\nWhen we are interested in comparing the distribution of a numerical variable across groups of a categorical variable, we \"typically\" see people use stacked histograms or side-by-side boxplots. I believe an unsung hero of these types of comparisons is the **ridge plot**.\n\nAs introduced in *Introduction to Modern Statistics*, a ridge plot essentially has multiple density plots stacked in the same plotting window. A key feature of ridge plots is that a categorical variable is **always** on the y-axis, with a numeric variable on the x-axis.\n\nIn R, we use the `geom_density_ridges()` function from the **ggridges** package to create a ridge plot. Yes, this is new, but don't worry! The function has the same layout as things you've seen before.\n\n**Question 8 -- Fill in the code below to create a ridge plot comparing the lengths of Cutthroat trout between the different types of channels (`unittype`). 
Use the `trout` dataset you `filter`ed above!**\n\n*Be sure to add nice axis labels to your plot, which describe the variables being plotted (and their units)!*\n\n\n::: {.cell}\n\n```{.r .cell-code}\n## Your code for question 8 goes here!\n\nggplot(data = trout, \n mapping = aes(x = <NUMERICAL VARIABLE>, \n y = <CATEGORICAL VARIABLE>)\n ) +\n geom_density_ridges() \n```\n:::\n\n\n**Question 9 -- Modify your plot from #8 to incorporate the `section` of the forest into your plot using either color or facets.**\n\n*Hint: The `fill` aesthetic will __fill__ the ridge plots with color.*\n\n**Question 10 -- Based on your plot, how different are the lengths of the Cutthroat trout between the different channel types and forest sections?**\n\n# Data Summaries\n\nPaired with visualizations, summary statistics can provide a clearer picture for the comparisons we are interested in. To obtain summary statistics for different groups of a categorical variable, we need to use our friend the `group_by()` function.\n\n**Question 11 -- Find the average length of Cutthroat trout from the different channel types (`unittype`). Use the `trout` dataset from Question 5!**\n\n\n::: {.cell}\n\n```{.r .cell-code}\n## Your code for question 11 goes here!\n```\n:::\n\n\n**Question 12 -- Find the average length of Cutthroat trout from the different channel types (`unittype`) _and_ forest `section`. Use the `trout` dataset from Question 5!**\n\n\n::: {.cell}\n\n```{.r .cell-code}\n## Your code for question 12 goes here!\n```\n:::\n\n\n**Question 13 -- How do the differences in these averages compare with what you saw in your visualization in Question 9?**\n",
"supporting": [],
"filters": [
"rmarkdown/pagebreak.lua"
],
14 changes: 14 additions & 0 deletions _freeze/resources/week4_regression/execute-results/docx.json
@@ -0,0 +1,14 @@
{
"hash": "8b7269d1df9a94d41b068ecb442db97f",
"result": {
"markdown": "---\ntitle: \"Regression Modeling in R\"\nformat: docx\neditor: visual\nexecute: \n eval: false\n---\n\n\n## Data Modeling\n\n- `lm()` -- fits a linear model to a dataset\n\n - You specify the variables as a formula (`y ~ x`), where `y` is your response variable and `x` is your explanatory variable\n - The second argument is the name of the dataset (`data = penguins`)\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n ## Two quantitative explanatory variables\n model1 <- lm(bill_length_mm ~ bill_depth_mm + body_mass_g, data = penguins)\n \n ## One quantitative and one categorical explanatory variable\n model2 <- lm(bill_length_mm ~ bill_depth_mm + sex, data = penguins)\n ```\n :::\n\n\n\\vspace{0.5cm}\n\n- `get_regression_table()` -- produces a tidy table output of a regression model\n - Output includes coefficients, standard errors, p-values, and confidence intervals\n\n\n::: {.cell}\n\n```{.r .cell-code}\nget_regression_table(model1)\n```\n:::\n\n\n\\vspace{0.5cm}\n\n- `summary()` -- produces a \"raw\" summary of a regression model\n - The \"untidy\" version of a regression summary.\n - Includes same information as `get_regression_table()`, but also includes $R^2$ and adjusted $R^2$.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsummary(model2)\n```\n:::\n\n\n\\newpage\n\n- `tidy()` -- takes untidy output and creates a nice table!\n\n - Similar to `get_regression_table()`, but doesn't output confidence intervals.\n - Lives in the **broom** package\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n tidy(model2)\n ```\n :::\n\n\n\\vspace{0.5cm}\n\n- `get_regression_points()` -- provides information on each observation used in a `lm()` in a tidy table format\n - Produces a table with the variables included in the regression, and the residual associated with each observation\n\n\n::: {.cell}\n\n```{.r .cell-code}\nget_regression_points(model1)\n```\n:::\n\n\n\\vspace{0.5cm}\n\n- `predict()` -- produces an untidy vector of the predicted y-values for each observation in the 
dataset\n - Can make predictions for new observations with the `newdata` argument.\n\n\n::: {.cell}\n\n```{.r .cell-code}\npredict(model1)\n\nnew_penguin <- data.frame(bill_depth_mm = 200, body_mass_g = 500)\npredict(model1, newdata = new_penguin)\n```\n:::\n\n\n\\vspace{0.5cm}\n\n- `augment()` -- produces a tidy table of data values from a regression model\n\n - Lives in the **broom** package\n - Produces a table with the variables included in the regression, and 6 additional columns:\n - including `.fitted` (predicted y-value for that observation), `.resid` (residual for that observation)\n - Can make predictions for new observations with the `newdata` argument.\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n augment(model2)\n \n new_penguin <- data.frame(bill_depth_mm = 15, sex = \"female\")\n augment(model2, newdata = new_penguin)\n ```\n :::\n",
"supporting": [],
"filters": [
"rmarkdown/pagebreak.lua"
],
"includes": {},
"engineDependencies": {},
"preserve": null,
"postProcess": false
}
}
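
The `lm()` and `predict()` workflow described in the week 4 resource file can be tried without installing any extra packages. The following is only a minimal sketch, not code from the commit: it assumes base R's built-in `mtcars` data as a stand-in for `penguins`, and the names `model`, `new_car`, `predicted_mpg`, and `by_hand` are hypothetical.

```r
## Stand-in data (assumption): base R's built-in `mtcars`, not the penguins data
## Two quantitative explanatory variables, written as a formula (y ~ x1 + x2)
model <- lm(mpg ~ wt + hp, data = mtcars)

## With no `newdata`, predict() returns one fitted value per observation used in the fit
fitted_values <- predict(model)

## For a new observation, the column names must match the explanatory variables
new_car <- data.frame(wt = 3, hp = 110)
predicted_mpg <- predict(model, newdata = new_car)

## The same prediction computed by hand: intercept + slope_wt * 3 + slope_hp * 110
by_hand <- sum(coef(model) * c(1, 3, 110))
```

Comparing `predicted_mpg` with `by_hand` is a quick way to confirm that a prediction is just the fitted equation evaluated at the new values.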