Skip to content

A set of scripts to build gene set databases for use with the SetRank gene set enrichment analysis package.

Notifications You must be signed in to change notification settings

C3c6e6/GeneSets

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

20 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

What does this package do?

This package is a set of scripts to build gene set annotation packages for different organism for use with the SetRank BioConductor package.

How do I install it?

There is no installation. Just clone this repository to your local machine. Make sure all scripts in the bin folder have their execute permission set. These scripts need the following requirements:

  • R 3.0.1 or higher with the GO.db Bioconductor package installed.
  • Perl 5.12 or higher with the Date::Format module installed.
  • The wget utility
  • GNU Make

How do I use it?

Building the GeneSets packages

The only thing you have to do is tell the package for which organisms you which to build annotation tables. In root of the repository, there is a file called organisms. Open this file with your favorite text editor and add the names of the organism(s) you want to build annotation tables for and remove any names you're not interested in. Make sure there is only one name per line and that the lines do not contain any leading or trailing white space characters. Also, make sure that the names you are using correspond exactly to the official names used by the NCBI taxonomy database, as this is the reference used by the GeneSets package. Using the correct name is especially important when working with bacterial strains and substrains. For instance, if you want to create annotation tables for Escherichia coli K12, substrain MG1655, you should write exactly: Escherichia coli str. K-12 substr. MG1655. When in doubt, you can look up the name of your species on the NCBI website and simply copy-and-paste the name into your editor.

Once you've edited the organisms file, you are ready to build the packages. All you have to do is open your UNIX-shell, go to the directory of the repository:

make

Issuing this command will start the build process, which might take a while to complete, mainly depending on your network connection speed.

Once the process is finished you should see, for each organism listed in the organisms file, a file appear named GeneSets.Genus.species_YY.MM.DD.tar.gz, with Genus.species the name of the organism and YY.MM.DD the date the file, e.g. GeneSets.Homo.sapiens_13.10.18.tar.gz. Each of these files is an R package -- the date string functions as version number -- that is ready to install.

To install several packages at once, you can use the following bash command:

for f in *_YY.MM.DD.tar.gz; do R CMD INSTALL $f; done

Of course, you have to replace YY.MM.DD with the actual date string.

Updating the annotation packages

To keep your collection of annotation table packages up-to-date, all you have to do is to regularly run the make command. The GeneSets package will automatically detect if the gene set annotation data for a given organism has changed compared to the last time you ran make and create an updated version if necessary. Don't forget to install the updated packages afterwards.

About

A set of scripts to build gene set databases for use with the SetRank gene set enrichment analysis package.

Resources

Stars

Watchers

Forks

Packages

No packages published