# Practice Worksheet: Inferential questions and data visualisation

### Welcome to STAT 201: Statistical Inference for Data Science

Each week, you will complete a lecture assignment like this one. Before we get started, here are some administrative details.

You can only learn technical subjects through hands-on practice. The weekly lecture worksheets and tutorials are central to this course, so please complete them with full dedication. Collaborating with fellow students during these sessions is highly encouraged, as it can deepen your understanding of the concepts. If you get stuck on a question in a lecture or tutorial worksheet, don't hesitate to ask your peers, TAs, or instructors for help. In fact, explaining concepts to others helps solidify your own understanding of them. However, please do not share answers: we expect everyone to submit their own work, and simply copying an answer undermines your learning.

Plagiarism will result in a grade of 0 for the entire assignment, and we will report the student to UBC's Faculty of Science for academic misconduct.

Copying and pasting or paraphrasing an answer produced by a generative AI tool (e.g., ChatGPT) is not an acceptable answer to worksheet questions. It prevents you from learning properly and will be considered academic misconduct.

You can read more about course policies on the course website.

Since DSCI 100 is a prerequisite for this course, we expect students to be comfortable with the style and structure of the tutorials and worksheets. Students should also be familiar with answering questions in a Jupyter notebook and checking whether their answers are correct before submission. However, if you're having trouble, contact the teaching team for support; we're glad to help. This practice worksheet is designed to help you remember how to use Jupyter notebooks.

**Today's worksheet is a practice worksheet. It does not count for points. It is meant to refresh your memory of how worksheets work and to review a few concepts you have learned previously.**

#### Practice Worksheet Learning Goals:
After completing this practice worksheet, you will remember how to:

1. Differentiate an inferential question from other questions that can be answered with data.
2. Write an R script to plot a histogram.


In [None]:
# Run this cell before continuing.
library(tidyverse)
library(palmerpenguins)
source("tests_worksheet_00.R")

## 1. Inferential questions

In DSCI 100, we learned about [six different types of data analysis questions we can ask and answer](https://datasciencebook.ca/intro.html#asking-a-question). Let's start this worksheet by reminding ourselves what an inferential question is.

**Question 1.0**
<br>{points: 1}

Which of the following is an example of an inferential question?

A. Are students who exercise regularly less stressed during exams?     
    
B. What is the number of native bird species found in Vancouver?   
     
C. What is the chance of survival if we treat a patient with this new drug?     
     
D. What is the most common declared major of the students in my data set?  
 
_Assign your answer to an object called `answer1.0`. Your answer should be a single character surrounded by quotes._

In [None]:
# answer1.0 <- ...

# your code here
fail() # No Answer - remove if you provide an answer

In [None]:
test_1.0()

**Question 1.1**
<br>{points: 1}

True or false?

"What process explains the decline of species diversity in Canada?" is an example of an inferential question.

_Assign your answer to an object called `answer1.1`. Your answer should be either "true" or "false", surrounded by quotes._

In [None]:
# answer1.1 <- ...

# your code here
fail() # No Answer - remove if you provide an answer

In [None]:
test_1.1()

## 2. Histograms with `ggplot2`

In DSCI 100, we learned how to create [visualisations with `ggplot2`](https://datasciencebook.ca/viz.html). In STAT 201, we will also use `ggplot2` to visualise our data. One type of visualisation that you have encountered before, and that we will use repeatedly in this course, is the *histogram*. A histogram shows how a variable is distributed in a data set: it divides the range of the variable into bins and draws a vertical bar for each bin, whose height is the number of data points that fall in that bin.
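To see how bins and bar heights relate, here is a minimal sketch that bins a small vector by hand with base R's `cut()` and `table()`, then draws the same bins with `geom_histogram()`. The vector `x` and the bin edges are made-up values chosen only for illustration; they are not from any course data set.

```r
library(ggplot2)

# Ten hypothetical measurements (illustration only, not real data)
x <- c(1.2, 1.8, 2.1, 2.4, 2.9, 3.3, 3.5, 3.7, 4.6, 4.9)

# Split the range 1 to 5 into four equal-width bins: (1,2], (2,3], (3,4], (4,5]
bin_edges <- seq(1, 5, by = 1)
bins <- cut(x, breaks = bin_edges)

# Count how many values fall in each bin; these counts are the bar heights
bin_counts <- table(bins)
bin_counts

# Ask ggplot2 to use the same bin edges, so each bar's height matches a count above
toy_hist <- ggplot(data.frame(x = x), aes(x = x)) +
    geom_histogram(breaks = bin_edges)
toy_hist
```

Changing the bin edges (or the `bins` argument of `geom_histogram()`) changes which values share a bar, so the shape of a histogram can depend on how the data are binned.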

As an example, the code below plots a histogram of the body mass (in grams) of penguins from three species, using the `penguins` data set from the R package `palmerpenguins`, which you loaded at the start of the worksheet.

In [None]:
# Run this cell before continuing.
penguin_hist <- penguins %>% 
    ggplot(aes(x = body_mass_g)) + 
    geom_histogram(bins = 20) +
    xlab("Body mass (g)") +
    ylab("Count") +
    ggtitle("Body mass of penguins")
penguin_hist

*This cell produces a warning because a few rows have missing values (i.e., these rows have `NA` instead of a penguin's body mass); `ggplot2` removes those rows before drawing the histogram.*
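If you would rather not see the warning, one option is to count and remove the rows with a missing body mass using `filter()` before plotting. The sketch below assumes that dropping these incomplete rows is acceptable for your analysis; the object names `n_missing`, `penguins_complete`, and `penguin_hist_complete` are hypothetical.

```r
library(tidyverse)
library(palmerpenguins)

# Count the rows whose body mass is missing (NA)
n_missing <- sum(is.na(penguins$body_mass_g))
n_missing

# Keep only the rows that have a recorded body mass
penguins_complete <- penguins %>%
    filter(!is.na(body_mass_g))

# Same histogram as before, now built from complete rows only
penguin_hist_complete <- penguins_complete %>%
    ggplot(aes(x = body_mass_g)) +
    geom_histogram(bins = 20) +
    xlab("Body mass (g)") +
    ylab("Count") +
    ggtitle("Body mass of penguins")
penguin_hist_complete
```

The bars should be the same as in the original plot, since `ggplot2` was already dropping those rows; removing them explicitly just makes that step visible and silences the warning.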

**Question 2.0**
<br>{points: 1}

Which part of the code above specifies the column of the data set that corresponds to the x-axis?

A. `geom_histogram(bins = 20)`
    
B. `xlab("Body mass (g)")`
     
C. `penguins`    
     
D. `aes(x = body_mass_g)`  
 
_Assign your answer to an object called `answer2.0`. Your answer should be a single character surrounded by quotes._

In [None]:
# answer2.0 <- ...

# your code here
fail() # No Answer - remove if you provide an answer

In [None]:
test_2.0()

**Question 2.1**
<br>{points: 1}

True or false?

"The penguins in this data set range from roughly 2500 g to roughly 6500 g."

_Assign your answer to an object called `answer2.1`. Your answer should be either "true" or "false", surrounded by quotes._

In [None]:
# answer2.1 <- ...

# your code here
fail() # No Answer - remove if you provide an answer

In [None]:
test_2.1()

**Question 2.2**
<br>{points: 1}

True or false?

"According to the histogram, the mean body mass in this data set is likely higher than the median."

_Assign your answer to an object called `answer2.2`. Your answer should be either "true" or "false", surrounded by quotes._

In [None]:
# answer2.2 <- ...

# your code here
fail() # No Answer - remove if you provide an answer

In [None]:
test_2.2()

### Final thoughts
Congratulations! You have completed your practice STAT 201 worksheet. This worksheet doesn't count for points, but it is good practice to go through the steps for saving and checking your worksheet properly. When finishing a worksheet, keep in mind:
- **Don't forget to save your work with `File` -> `Save Notebook`. Save periodically while you work _(TIP: use the keyboard shortcut `Ctrl+S` or `Cmd+S` to speed up the process)._**
- **Before you leave the page, check that the autograder returns the results you are expecting by clicking `Kernel` -> `Restart Kernel and Run All`.**
- Never rename a file or folder.