PR to address bioconductor lesson issues from first beta #167

Merged: 13 commits merged on Jan 24, 2023
42 changes: 30 additions & 12 deletions _episodes_rmd/04-bioconductor-vcfr.Rmd
@@ -7,12 +7,12 @@ questions:
objectives:
- "Describe what the Bioconductor repository is and what it is used for"
- "Describe how Bioconductor differs from CRAN"
- "Search Bioconductor for relevent packages"
- "Search Bioconductor for relevant packages"
- "Install a package from Bioconductor"
keypoints:
- "Bioconductor is an alternative package repository for bioinformatics packages."
- "Installing packages from Bioconductor requires a new method, since it is not compatible with the `install.packages()` function used for CRAN."
- "Check Bioconductor to see if there is a package relevent to your analysis before writing code yourself."
- "Check Bioconductor to see if there is a package relevant to your analysis before writing code yourself."
source: Rmd
---

@@ -21,17 +21,24 @@ source("../bin/chunk-options.R")
knitr_fig_path("04-")
```

<!-- example of rendered lesson https://datacarpentry.org/genomics-r-intro/02-r-basics/index.html -->
## Packages in R -- what are they and why do we use them?
Contributor

I like this explanation! I noticed that we mention packages in the "R Basics continued" episode, but we don't explain it to this extent. I wonder if this would be a better fit in an earlier episode than in the Bioconductor one.

Member Author

Sure, I think that probably makes sense. This is related to #170 -- if @JasonJWilliamsNY is ok with it, I can pull this description out from here and open a PR adding the content to the "R Basics continued" episode.

Member Author

Never mind -- might as well move fast and break things, eh? I pulled out the paragraph in commit c28e053 and will PR it to the earlier lesson separately. @ytakemon let me know if there are any other adjustments you think I should make or if this is ready to merge.

Contributor

Looks good to me! Merging.


Packages are simply collections of functions and/or data that can be used to extend the capabilities of R beyond the core functionality that comes with it by default. There are useful R packages available that span all types of statistical analysis, data visualization, and more. So far we have been using packages that are included in the base installation of R (this is what comes with R 'out of the box'). However, there are many more packages available for R, and we will learn how to install and use them in this lesson. The main place that R packages are installed from is a website called [CRAN](https://cran.r-project.org/) (the Comprehensive R Archive Network). Many thousands of R packages are available there, and when you use the built-in R function `install.packages()`, it will look for a CRAN repository to install from. So, for example, to install [tidyverse](https://www.tidyverse.org) packages such as `dplyr` and `ggplot2` (which you'll do in the next few lessons), you would use the following command:

```{r, eval = FALSE, purl = FALSE}
# install a package from CRAN
install.packages("dplyr")
```
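
To see which repository `install.packages()` will actually contact, you can inspect the `repos` option. This is only a small sketch: in RStudio the option is usually already set to a CRAN mirror, while a plain R session may show the placeholder `@CRAN@` until a mirror has been chosen.

```r
# Show which repository URL(s) install.packages() will use in this session.
repos <- getOption("repos")
print(repos)
```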

## Installing packages from somewhere else besides CRAN?

In some cases, you may want to use a specialized package that is not hosted on [CRAN](https://cran.r-project.org/) (the Comprehensive R Archive Network). This may be because the package is so new that it hasn't yet been submitted to CRAN, or it could be that it is on a focal topic that has an alternative repository. One major example of an alternative repository source is [Bioconductor](https://bioconductor.org/), which has a mission of "promot[ing] the statistical analysis and comprehension of current and emerging high-throughput biological assays." This means that many if not all of the packages available on Bioconductor are focused on the analysis of biological data, and that it can be a great place to look for tools to help you analyze your -omics datasets!
However, not all R packages are available on CRAN. For bioinformatics-related packages in particular, there is another repository with many powerful packages that you can install: [Bioconductor](https://bioconductor.org/), whose mission is "promot[ing] the statistical analysis and comprehension of current and emerging high-throughput biological assays." This means that many if not all of the packages available on Bioconductor are focused on the analysis of biological data, and that it can be a great place to look for tools to help you analyze your -omics datasets!

## So how do I use it?

Since access to the [Bioconductor](https://bioconductor.org/) repository is not built in to base R 'out of the box', a couple of steps are needed to install packages from this alternative source. We will work through the steps (only 2!) to install a package to help with the VCF analysis we are working on, but you can use the same approach to install any of the many thousands of available packages.

![screenshot of bioconductor homepage](fig/bioconductor_website_screenshot.jpg)
![screenshot of bioconductor homepage](../fig/bioconductor_website_screenshot.jpg)

## First, install the `BiocManager` package

@@ -49,13 +56,19 @@ To check if this worked (and also so you can make a note of the version for repr
BiocManager::version()
```
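
One common way to make the first step safe to re-run is to install `BiocManager` only when it is missing. The sketch below assumes an internet connection and a working CRAN mirror; the helper name `ensure_biocmanager()` is hypothetical and used only for illustration.

```r
# Hypothetical helper: install BiocManager from CRAN only if it is missing.
ensure_biocmanager <- function() {
  if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
  }
  # Report whether the package is available after the attempt
  requireNamespace("BiocManager", quietly = TRUE)
}

# Usage (not run here, since it may download from CRAN):
# ensure_biocmanager()
# BiocManager::version()
```

Wrapping the check in a function keeps the script from reinstalling the package every time the lesson is knitted.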

# Second, install the vcfR package from Bioconductor using `BiocManager`
## Second, install the vcfR package from Bioconductor using `BiocManager`

> ## Heads Up: Installing vcfR may take a while due to numerous dependencies
>
> Just be aware that installing packages that have many dependencies can take a while.
>
{: .callout}

```{r install-vcfR, eval = FALSE}
# install the vcfR package from Bioconductor using BiocManager::install()
BiocManager::install("vcfR")
```
You may need to also allow it to install some dependencies or update installed packages in order to successfully complete the process.
Depending on your particular system, you may also need to allow it to install some dependencies or update already-installed packages in order to complete the process successfully.
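
Once the installation finishes, one way to confirm that it worked is to check for the package before loading it. This is a minimal sketch; the helper name `is_installed()` is hypothetical and not part of the lesson.

```r
# Hypothetical helper: check whether a package is installed without attaching it.
is_installed <- function(pkg) {
  requireNamespace(pkg, quietly = TRUE)
}

if (is_installed("vcfR")) {
  # Load the package and record its version for reproducibility
  library(vcfR)
  print(packageVersion("vcfR"))
} else {
  message("vcfR is not installed yet; try BiocManager::install(\"vcfR\")")
}
```

If the prompt about updating packages gets in the way, `BiocManager::install()` also accepts `update` and `ask` arguments, for example `BiocManager::install("vcfR", update = FALSE, ask = FALSE)`.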

> ## Note: Installing packages from Bioconductor vs from CRAN
> Some packages begin by being available only on Bioconductor, and then later
@@ -74,7 +87,7 @@ You may need to also allow it to install some dependencies or update installed p
{: .callout}


# Search for Bioconductor packages based on your analysis needs
## Search for Bioconductor packages based on your analysis needs

While this workshop focuses only on VCF analyses, there are hundreds or thousands of different types of data and analyses that bioinformaticians may want to work with. Sometimes you may get a new dataset and not know exactly where to start with analyzing or visualizing it. The Bioconductor package search view can be a great way to browse through the packages that are available.

@@ -98,13 +111,18 @@ vcf files in R.

> ## Challenge
>
> Add code chunks to
> - Use the `BiocManager::available()` function to see what packages are available matching a search term.
> - Use the [biocViews](https://bioconductor.org/packages/release/BiocViews.html#___Software) interface to search for packages of interest.
>
> - Install the `BiocManager` package
> - Use that package's `install()` function to install `vcfR`
> - Browse the Bioconductor website to find a second package, and install it
> You may or may not want to try installing the package, since not all dependencies always install easily. However, this will at least let you see what is available.
{: .challenge}
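
As a starting point for the first part of the challenge, the sketch below wraps `BiocManager::available()` in a small function. It assumes `BiocManager` is installed and that you have an internet connection; the wrapper name `search_bioc()` is hypothetical. Note that the search term is matched against package names, not descriptions.

```r
# Hypothetical wrapper: list available packages whose names match a term.
# Requires BiocManager and an internet connection.
search_bioc <- function(term) {
  if (!requireNamespace("BiocManager", quietly = TRUE)) {
    stop("Install BiocManager first: install.packages(\"BiocManager\")")
  }
  BiocManager::available(pattern = term)
}

# Usage (not run here, since it queries the online repositories):
# search_bioc("vcf")
```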

> ## Tip: Refreshing the RStudio package view after installing
>
> If you install a package from Bioconductor, you may need to refresh the RStudio package view to see it in your list. You can do this by clicking the "Refresh" button in the Packages pane of RStudio.
>
{: .callout}

## Resources

- [Bioconductor](https://bioconductor.org/)