geoCancerPrognosticDatasetsRetriever

GEO Cancer Prognostic Datasets Retriever is a bioinformatics tool for cancer prognostic dataset retrieval from the GEO website.

Summary

Gene Expression Omnibus (GEO) Cancer Prognostic Datasets Retriever is a bioinformatics tool for retrieving cancer prognostic datasets from the GEO database. It requires a GEO DataSets input file, downloaded from the GEO database, that lists all GSE dataset entries for a specific cancer (for example, bladder cancer). The tool applies two heuristic filters to each GSE dataset entry listed in the input file. The first filter (Prognostic Text filter) flags entries whose title or abstract contains prognostic keywords used by clinical scientists (e.g. “prognosis” or “survival”), and the tool retrieves the flagged datasets. The second filter (Prognostic Signature filter) then narrows these datasets further by applying prognostic signature pattern matching (Perl regular expression signatures) to identify whether each GSE dataset is likely a prognostic dataset.
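
The two-stage filtering idea can be sketched in shell with grep. This is only an illustration under stated assumptions: the keyword list, the "signature" pattern, the `|`-separated record layout, and the GSE0001 to GSE0003 accessions are all hypothetical, and the tool itself uses its own Perl regular expressions.

```shell
# Illustrative two-stage filter. Both patterns are assumptions made for this
# sketch; they are not the regular expressions used by the tool.

# Stage 1 (text filter): keep entries mentioning a prognostic keyword.
prognostic_text_filter() {
    grep -i -E 'prognos(is|tic)|survival'
}

# Stage 2 (signature filter): keep entries matching a stricter phrase pattern.
prognostic_signature_filter() {
    grep -i -E '(overall|disease-free|relapse-free) survival|prognostic (signature|marker)'
}

# Hypothetical records, one per line: accession|summary text.
entries=$(printf '%s\n' \
    'GSE0001|Gene expression and overall survival in bladder cancer' \
    'GSE0002|Cell line response to drug treatment' \
    'GSE0003|Survival of cultured cells under hypoxia')

# Stage 1 flags GSE0001 and GSE0003; stage 2 keeps only GSE0001.
flagged=$(printf '%s\n' "$entries" | prognostic_text_filter)
likely=$(printf '%s\n' "$flagged" | prognostic_signature_filter)

# Print the accessions that passed both filters.
printf '%s\n' "$likely" | cut -d'|' -f1
```

The second stage only sees entries that passed the first, so a loose keyword match such as "Survival of cultured cells" is dropped when it lacks a more specific prognostic phrase.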

geoCancerPrognosticDatasetsRetriever dependencies

The dependencies (i.e. packages) used by geoCancerPrognosticDatasetsRetriever are:

  • strict

  • warnings

  • Term::ANSIColor

  • Getopt::Std

  • LWP::Simple

  • File::Basename

  • File::HomeDir

  • App::cpanminus

  • Net::SSLeay

Installation

geoCancerPrognosticDatasetsRetriever can be used on any Linux, macOS, or Windows machine. On Windows, you will need to install the Windows Subsystem for Linux (WSL) compatibility layer (quick installation instructions). Once WSL is launched, follow the geoCancerPrognosticDatasetsRetriever installation instructions described below.

To run the program, you need to have the following programs installed on your computer:

  • Perl (version 5.8.0 or later)

  • cURL (version 7.68.0 or later)

By default, Perl is installed on all Linux and macOS operating systems, and cURL is installed on all macOS versions. cURL may not be installed on Linux, in which case it must be installed manually through the Linux distribution’s software centre. On Linux Ubuntu, geoCancerPrognosticDatasetsRetriever installs cURL automatically.
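
Before installing, it may help to confirm that both prerequisites are present. The following is a minimal sketch, assuming a POSIX shell; the `have_cmd` helper is a hypothetical name introduced here, not part of the tool.

```shell
# Hypothetical helper: succeeds if the named program is on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Report the presence of each required program.
for prog in perl curl; do
    if have_cmd "$prog"; then
        echo "$prog: found"
    else
        echo "$prog: missing"
    fi
done

# "use 5.008;" makes Perl exit with an error if it is older than 5.8.0.
if have_cmd perl; then
    perl -e 'use 5.008;' && echo "perl version OK"
fi

# cURL reports its version on the first line of this output;
# compare it against the 7.68.0 minimum by eye.
if have_cmd curl; then
    curl --version | head -n 1
fi
```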

Manual install:

perl Makefile.PL
make
make install

On Linux Ubuntu, you might need to run the last command as a superuser (sudo make install) and you will need to manually install (if not already installed in your Perl 5 configuration) the following packages:

libfile-homedir-perl

sudo apt-get install -y libfile-homedir-perl

cpanminus

sudo apt -y install cpanminus

LWP::Simple

perl -MCPAN -e 'install "LWP::Simple"'

libnet-ssleay-perl

sudo apt-get install -y libnet-ssleay-perl
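
To see which of the Perl modules listed under the dependencies are already available, one option is to ask Perl to load each of them. This is a sketch only; `module_installed` is a hypothetical helper name, and the check simply relies on Perl's `-M` switch failing when a module cannot be loaded.

```shell
# Hypothetical helper: succeeds only if Perl can load the named module.
module_installed() {
    perl -M"$1" -e 1 >/dev/null 2>&1
}

# Modules taken from the dependency list above.
for module in Term::ANSIColor Getopt::Std LWP::Simple File::Basename File::HomeDir Net::SSLeay; do
    if module_installed "$module"; then
        echo "$module: installed"
    else
        echo "$module: missing"
    fi
done
```

Any module reported as missing can then be installed with the corresponding command above.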

CPAN install:

cpanm App::geoCancerPrognosticDatasetsRetriever

To uninstall:

cpanm --uninstall App::geoCancerPrognosticDatasetsRetriever

On Linux Ubuntu, you might need to run the two previous CPAN commands as a superuser (sudo cpanm App::geoCancerPrognosticDatasetsRetriever and sudo cpanm --uninstall App::geoCancerPrognosticDatasetsRetriever).

Data file

The required input file is a GEO DataSets file, obtained as a download from GEO DataSets when querying for a particular cancer (for example, bladder cancer) in geoCancerPrognosticDatasetsRetriever.

Execution instructions

The basic usage for running geoCancerPrognosticDatasetsRetriever is:

geoCancerPrognosticDatasetsRetriever -d "CANCER_TYPE"

An example basic usage command using "bladder cancer" as a query:

geoCancerPrognosticDatasetsRetriever -d "bladder cancer"

With the basic usage command, the mandatory -d (download) flag is used to download and then retrieve bladder cancer prognostic dataset(s) associated with the GPL570 platform code (default selection). When using this command, the input and output files of geoCancerPrognosticDatasetsRetriever will be found in the ~/geoCancerPrognosticDatasetsRetriever_files/data/ and ~/geoCancerPrognosticDatasetsRetriever_files/results/ directories, respectively.

For specialized options, allowing more fine-grained user control, the following options are made available:

-p

A list of GPL platform codes may be specified prior to execution, to expand prognostic dataset retrieval for a particular cancer (e.g. bladder cancer). For example:

geoCancerPrognosticDatasetsRetriever -d "bladder cancer" -p "GPL570 GPL97 GPL96"

-f

A user-specified absolute path to save results files (overriding the default results directory) may be specified prior to execution. For example:

geoCancerPrognosticDatasetsRetriever -d "bladder cancer" -p "GPL570 GPL97 GPL96" -f "/Bladder_cancer_files/"

With this command, the input files will be found in the same directory as a basic usage run's input files (~/geoCancerPrognosticDatasetsRetriever_files/data/). The output files will be found in the user-specified directory (for example, "/Bladder_cancer_files/"), created in the user's home directory.

-k

This option allows a user to keep large temporary/output files instead of them being removed by default. For example:

geoCancerPrognosticDatasetsRetriever -d "bladder cancer" -p "GPL570 GPL97 GPL96" -f "/Bladder_cancer_files/" -k
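
When several cancer types need to be queried with the same options, the commands above can be wrapped in a small loop. The sketch below is an assumption-laden convenience, not part of the tool: `run_retriever` and the `DRY_RUN` variable are hypothetical names, and "breast cancer" is just an illustrative second query. With `DRY_RUN=1` (the default here) it only prints the commands it would run.

```shell
# Hypothetical wrapper: $1 = cancer type, $2 = space-separated GPL codes.
run_retriever() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Dry run: show the command instead of executing it.
        printf 'geoCancerPrognosticDatasetsRetriever -d "%s" -p "%s"\n' "$1" "$2"
    else
        geoCancerPrognosticDatasetsRetriever -d "$1" -p "$2"
    fi
}

# Default to a dry run so nothing is downloaded by accident.
DRY_RUN=${DRY_RUN:-1}

for cancer in "bladder cancer" "breast cancer"; do
    run_retriever "$cancer" "GPL570 GPL96"
done
```

Setting `DRY_RUN=0` before running the script executes the real commands, which requires the tool to be installed.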

Help information can be read by typing the following command:

geoCancerPrognosticDatasetsRetriever -h

This command will print the following instructions:

Usage: geoCancerPrognosticDatasetsRetriever -h

Mandatory arguments:
  CANCER_TYPE           type of the cancer as query search term

Optional arguments:
  -p                    list of GPL platform codes
  -f                    user-specified absolute path to save results files
  -k                    option to keep temporary files
  -h                    show help message and exit

Article

A complete description of geoCancerPrognosticDatasetsRetriever can be found in the following peer-reviewed article:

Abbas Alameer and Davide Chicco, "geoCancerPrognosticDatasetsRetriever, a bioinformatics tool to easily identify cancer prognostic datasets on Gene Expression Omnibus (GEO)", Bioinformatics, 2021.

Copyright and License

Copyright 2021 by Abbas Alameer, Kuwait University

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License, version 2 (GPLv2).

Contact

geoCancerPrognosticDatasetsRetriever was developed by:
Abbas Alameer (Bioinformatics and Molecular Modelling Group, Kuwait University), in collaboration with Davide Chicco (University of Toronto)

For information, please contact Abbas Alameer at abbas.alameer(AT)ku.edu.kw
