No description, website, or topics provided.
Switch branches/tags
Nothing to show
Clone or download
Latest commit 6b9f666 Feb 21, 2018
Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
Examples Update to v 1.2.1 Jan 18, 2017
INTEGRATE-Neo-V-1.2.1 Move "}" one line down Feb 20, 2018
older_versions Update to version 1.2.1 Jan 18, 2017
README.md Update to v 1.2.1 Jan 18, 2017

README.md

INTEGRATE-Neo

INTEGRATE-Neo is a gene fusion neoantigen discovering tool using next-generation sequencing data. It is written in C++ and Python.

  • Python
  • Perl
  • awk
  • GCC

If not, please install these languages or tools. You may also need to install some prerequisite tools:

HLAminer and NetMHC are also included in the vendor directory here.

To compile the C++ part of this pipeline, you may need to install CMAKE

Installation

Download INTEGRATE-Neo at https://github.com/ChrisMaherLab/INTEGRATE-Neo.

Run the installation script:

$ cd INTEGRATE-Neo-V-1.2.0
$ chmod +x install.sh
$ ./install.sh -o /opt/bin/

Note that you can choose wherever you like to install the software. It can be different from "/opt/bin/".

Now you have installed:

  • integrate-neo

together with the modules of integrate-neo that can be used as standalone tools:

  • fusionBedpeAnnotator
  • fusionBedpeSubsetter
  • runHLAminer
  • HLAminerToTsv
  • runAddNetMHC4Result
  • runNetMHC4WithSMCRNABedpe

A setup.ini and a rule.txt file are also at your destination directory now. If you don't like them to be there, copy them to the place you like. But remember to use the --setup-file and --rule-file options to run integrate-neo if you moved them.

setup

Remember to edit the setup.ini file before your first running the pipeline. The one in the installation packages are using example paths like "/SOME/PATH/...".

For the HLAminer reference HLA alleles, i.e. HLA_ABC_CDS.fasta, remember to index it with bwa before the first run.

input

If you type the following (or python ./integrate-neo.py --help):

$ ./integrate-neo.py

you can see the 14 parameters and explanations.

The following are the required options:

    -1/--fastq1       
    -2/--fastq2       
    -f/--fusion-bedpe 
    -r/--reference    
    -g/--gene-model   

The --fastq[1/2] and --reference options are clear enough, the FASTQ and FASTA formats for sequencing reads and human reference genome.

The --fusion-bedpe option requires a BEDPE format for gene fusions. This BEDPE format follows the standardized format provided by The ICGC-TCGA DREAM Somatic Mutation Calling - RNA Challenge (SMC-RNA).

The --gene-model option requires a gene annotation genePhred file.

Download the gtf file from Ensembl:

GRCh37, e.g., v75: ftp://ftp.ensembl.org/pub/release-75/gtf/homo_sapiens/Homo_sapiens.GRCh37.75.gtf.gz

GRCh38, e.g., v86: ftp://ftp.ensembl.org/pub/release-86/gtf/homo_sapiens/Homo_sapiens.GRCh38.86.gtf.gz

and run the following command for v75:

$ gunzip Homo_sapiens.GRCh37.75.gtf.gz
$ ./gtfToGenePred -genePredExt -geneNameAsName2 Homo_sapiens.GRCh37.75.gtf Homo_sapiens.GRCh37.75.genePred

for v86:

$ gunzip Homo_sapiens.GRCh38.86.gtf.gz.
$ ./gtfToGenePred -genePredExt -geneNameAsName2 Homo_sapiens.GRCh38.86.gtf Homo_sapiens.GRCh38.86.genePred

FASTA files can also be downloaded at Ensembl:

v75: ftp://ftp.ensembl.org/pub/release-75/fasta/homo_sapiens/dna/Homo_sapiens.GRCh37.75.dna_sm.primary_assembly.fa.gz

v86: ftp://ftp.ensembl.org/pub/release-86/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna_sm.primary_assembly.fa.gz

output

The output is in BEDPE format, the first 11 columns follows the SMC-RNA format. columns 12-19 are:

  • Epitope sequence
  • Epitope Affinity (nanoMolar)
  • HLA allele
  • HLA category
  • HLA score
  • HLA e-value
  • HLA confidence

Important

The chromosome names in the reference genome, the gene models, and the fusions should be consistent.

Examples

Examples are provided for you to test the code.

Enjoy!

Release notes:

12-23-2016: INTEGRATE-Neo v 1.2.0

updated BedpeAnnotator to v 0.2.0, which includes a new column for transcript Ids, a new column for lengths of nucleotides in the coding regions at 5p transcripts, a new column for whether the peptides are in-frame, and a new column for whether the fusion transcript follows canonical dinucleotides.

01-17-2017: INTEGRATE-Neo v 1.2.1

updated BedpeAnnotator to v 0.2.1, which includes a bug fixing.