intro-to-R-tidyverse/exercise_02-intro_to_R.Rmd

---
title: "Introduction to R - Exercises"
author: "CCDL for ALSF"
date: "2020"
output:
  html_notebook:
    toc: true
    toc_float: true
---

The goal of these exercises is to help you get comfortable with using R and R notebooks by continuing to play with the gene results dataset we used in the [01-intro_to_base_R](01-intro_to_base_R-live.Rmd) and [02-intro_to_ggplot2](02-intro_to_ggplot2-live.Rmd) notebooks.
It is a pre-processed [astrocytoma microarray dataset](https://www.refine.bio/experiments/GSE44971/gene-expression-data-from-pilocytic-astrocytoma-tumour-samples-and-normal-cerebellum-controls) 
that we performed a set of [differential expression analyses on](scripts/00-setup-intro-to-R.R).

### Set Up

Use this chunk to load the `tidyverse` package.

```{r tidyverse, solution = TRUE}

```

Create a results directory if it doesn't exist.

```{r results_dir, solution = TRUE}

```

## Read in the gene results file

Use `readr::read_tsv()` to read in the file "gene_results_GSE44971.tsv" and
assign it the variable `stats_df`.
Recall that this notation means the `read_tsv()` function from the `readr` package.
If you have already loaded the `tidyverse` package above with `library()`,
you can use the function `read_tsv()` on its own without the preceding `readr::` as the `readr` package is loaded as part of `tidyverse`.

```{r read-data, solution = TRUE}

```

Use this chunk to explore what your data frame, `stats_df` looks like.

```{r explore-df, solution = TRUE}

```

## Read in the metadata

Use `readr::read_tsv()` to read in the file "cleaned_metadata_GSE44971.tsv" and assign it the name `metadata`.

```{r read-metadata, solution = TRUE}

```

Use this chunk to explore what your data frame, `metadata` looks like.

```{r explore-metadata, solution = TRUE}

```

### Selecting from data frames

Use `$` syntax to look at the `avg_expression` column of the `stats_df`
data frame.

```{r dollar, solution = TRUE}

```

Use the `min()` argument to find what the minimum average expression in this dataset is.
Remember you can use `?min` or the help panel to find out more about a function.

```{r minimum-expr, solution = TRUE}
# Find the minimum average expression value

```

Find the `log()`, using base 2, of the average expression values.

```{r log2-expr, solution = TRUE}
# Find the log of base 2 of the average expression

```

## Using logical arguments

Display the `adj_p_value` column of the `stats_df` data frame.

```{r show-p, solution = TRUE }

```

Find out which of these adjusted p-values are below a `0.05` cutoff using a logical statement.

```{r small-p, solution = TRUE}

```

Name the logical vector you created above as `significant_vector`.

```{r save-bool, solution = TRUE}

```

Use `sum()` with the object `significant_vector` to count how many p values in the total set are below this cutoff. 
To solve this, you might think about `TRUE` and `FALSE` values as an alternative way to represent `1` and `0`.

```{r sum-sig, solution = TRUE}

```

## Filter the dataset

Select the column `contrast` from `stats_df`.

```{r select-contrast, solution = TRUE}

```

Construct a logical vector using `contrast` column you selected above that
indicates which rows of `stats_df` are from the `astrocytoma_normal`
contrast test.  

```{r contrast-logical, solution = TRUE}

```

Use `dplyr::filter()` to keep only the data for the `astrocytoma_normal` contrast
in `stats_df`.

```{r filter-contrast, solution = TRUE}

```

Use the `nrow()` function on `astrocytoma_normal_df` to see if your filter worked.
You should have `2268` rows.

```{r contrast-rows, solution = TRUE}

```

Save your filtered data to a TSV file using `readr::write_tsv()`.
Call it `astrocytoma_normal_contrast_results.tsv` and save it to the `results`
directory.

```{r write-df, solution = TRUE}

```


### Create a density plot

Set up a ggplot object for `astrocytoma_normal_df` and set `x` as the average
expression variable.
Use the `+` to add on a layer called `geom_density()`

```{r density-plot, solution = TRUE}

```

Use the plot you started above and add a `ggplot2::theme` layer to play with its aesthetics (e.g. `theme_classic()`)
See the [ggplot2 themes vignette](https://ggplot2.tidyverse.org/reference/ggtheme.html)
to see a list of theme options.

```{r density-theme, solution = TRUE}

```

Feel free to make other customizations to this plot by adding more layers with `+`.
You can start by adding a `ylab()` and `xlab()` and then by getting inspiration
from this [handy cheatsheet for ggplot2](https://rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf).

```{r customize-plot, solution = TRUE}
# Customize your plot!

```

Save your plot as a `PNG`.

```{r save-plot, solution = TRUE}

```

### Session Info

```{r}
sessionInfo()
```