Brian Haas edited this page Oct 16, 2018 · 61 revisions

STAR-Fusion

STAR-Fusion is a component of the Trinity Cancer Transcriptome Analysis Toolkit (CTAT). STAR-Fusion uses the STAR aligner to identify candidate fusion transcripts supported by Illumina reads. STAR-Fusion further processes the output generated by the STAR aligner to map junction reads and spanning reads to a reference annotation set.

Our STAR-Fusion manuscript preprint is now available on bioRxiv: http://biorxiv.org/content/early/2017/03/24/120295

Installation

Type 'make' in the base installation directory.

In addition to STAR-Fusion, the following tools and resource data sets must be installed:

Downloading a STAR-Fusion Release (Preferred)

Visit https://github.com/STAR-Fusion/STAR-Fusion/releases

and be sure to download the 'FULL' version. The others are auto-generated by GitHub and are missing required submodules.

Installing from GitHub Clone:

(Bleeding-edge code, not preferred)

%  git clone --recursive https://github.com/STAR-Fusion/STAR-Fusion.git

The --recursive parameter is needed to integrate the required submodules. Active development currently happens off the 'master' branch.

Please consider using a proper 'FULL' release version as described above.

Installing from Galaxy toolshed

STAR-Fusion is also available for Galaxy toolshed download. An associated genome resource data manager will be required to run this tool. Please find instructions here.

Bioconda install

A Bioconda recipe is also available for STAR-Fusion.

Tools Required:


   A typical perl module installation may involve:
   perl -MCPAN -e shell
   install DB_File
   install URI::Escape
   install Set::IntervalTree
   install Carp::Assert
   install JSON::XS
   install PerlIO::gzip
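
Before running STAR-Fusion, it can help to confirm that the Perl modules listed above actually load. The following is a minimal sketch, assuming a POSIX shell with perl on the PATH; the helper name check_perl_modules is hypothetical and not part of STAR-Fusion.

```shell
# Hypothetical helper (not part of STAR-Fusion): report which Perl modules
# fail to load. Returns non-zero if at least one module is missing.
check_perl_modules() {
    missing=0
    for mod in "$@"; do
        # 'perl -MModule -e 1' exits non-zero when the module cannot be loaded.
        if perl -M"$mod" -e 1 >/dev/null 2>&1; then
            echo "ok       $mod"
        else
            echo "MISSING  $mod"
            missing=1
        fi
    done
    return "$missing"
}

# Example call (module names taken from the list above):
#   check_perl_modules DB_File URI::Escape Set::IntervalTree \
#                      Carp::Assert JSON::XS PerlIO::gzip
```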

Computing / Hardware Requirements and Execution Times

Memory requirements

If you're planning to run STAR to align reads to the human genome, then you'll need ~30G RAM. If you've already run STAR and are just planning on running STAR-Fusion given the existing STAR outputs, then modest resources are required and it should run on any commodity hardware.

When the '--FusionInspector validate' mode is used, memory requirements can increase to 40G or 50G. If '--FusionInspector inspect' mode is used, additional RAM should generally not be required.

Execution times

Execution times are largely determined by how long it takes for STAR to align reads. The fusion-finding component generally takes minutes on large samples. If '--FusionInspector validate' mode is used, expect the total execution time to roughly double, as STAR must perform an additional full alignment of the reads in FusionInspector mode.

Data Resources Required:

A reference genome and corresponding protein-coding gene annotation set, including blast-matching gene pairs, must be provided to STAR-Fusion. We provide several alternative resources for human fusion transcript detection, depending on whether you want to use the GRCh37 or GRCh38 human reference genome and the corresponding Gencode annotation set. Options are available here: https://data.broadinstitute.org/Trinity/CTAT_RESOURCE_LIB/. Choose one; below we refer to it as 'CTAT_resource_lib.tar.gz'. The 'plug-n-play' libs are just that: download one and unpack it (tar -zxvf filename.tar.gz).

If you're looking to apply STAR-Fusion using a different target, you'll need to generate the required resources as described by our FusionFilter resource builder. FusionFilter comes included in the STAR-Fusion software.

Preparing the genome resource lib

If you downloaded the large (30G) 'plug-n-play' resource lib, then just untar/gz the archive and use it directly.

Otherwise, if you downloaded the small (~2G) unprocessed resource lib, then you'll need to prep it for use with STAR-Fusion as follows:

 %  tar xvf CTAT_resource_lib.tar.gz

 %  cd CTAT_resource_lib/

 %  $STAR_FUSION_HOME/FusionFilter/prep_genome_lib.pl \
                         --genome_fa ref_genome.fa \
                         --gtf ref_annot.gtf \
                         --fusion_annot_lib CTAT_HumanFusionLib.dat.gz \
                         --annot_filter_rule AnnotFilterRule.pm \
                         --pfam_db PFAM.domtblout.dat.gz
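
Since the build can run for a long time, a quick pre-flight check that the input files named in the command above are present may save a failed run. This is only a sketch: the helper name check_resource_lib_inputs is hypothetical, and it assumes the unpacked archive contains exactly the file names used in the command above.

```shell
# Hypothetical helper (not part of STAR-Fusion): verify that the inputs
# referenced by the prep_genome_lib.pl command exist and are non-empty.
# $1 is the unpacked resource lib directory (defaults to the current one).
check_resource_lib_inputs() {
    dir=${1:-.}
    status=0
    for f in ref_genome.fa ref_annot.gtf CTAT_HumanFusionLib.dat.gz \
             AnnotFilterRule.pm PFAM.domtblout.dat.gz; do
        if [ ! -s "$dir/$f" ]; then
            echo "missing or empty: $dir/$f" >&2
            status=1
        fi
    done
    return "$status"
}

# Example call:
#   check_resource_lib_inputs CTAT_resource_lib/
```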

Once the build process completes successfully, you can then refer to the above like so with STAR-Fusion:

   STAR-Fusion --genome_lib_dir /path/to/your/CTAT_resource_lib   ...

Running STAR-Fusion starting with FASTQ files:

Given paired-end FASTQ files, run STAR-Fusion like so:

 STAR-Fusion --genome_lib_dir /path/to/your/CTAT_resource_lib \
             --left_fq reads_1.fq \
             --right_fq reads_2.fq \
             --output_dir star_fusion_outdir

If you have single-end FASTQ files, just use the --left_fq parameter:

 STAR-Fusion --genome_lib_dir /path/to/your/CTAT_resource_lib \
             --left_fq reads_1.fq \
             --output_dir star_fusion_outdir

If you set the environment variable 'CTAT_GENOME_LIB' to the '/path/to/your/ctat_genome_lib_build_dir' directory resulting from the above build process or from the plug-n-play installation, then you won't need to specify --genome_lib_dir as a STAR-Fusion parameter.
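
As an illustration of the CTAT_GENOME_LIB approach, the sketch below wraps a paired-end run and refuses to start when the variable is unset. The function name star_fusion_with_env_lib and the STAR_FUSION_BIN override (which allows a dry run with echo) are hypothetical conveniences, not part of STAR-Fusion.

```shell
# Hypothetical wrapper (not part of STAR-Fusion): run a paired-end analysis
# relying on CTAT_GENOME_LIB instead of passing --genome_lib_dir.
# STAR_FUSION_BIN is an assumed override; it defaults to 'STAR-Fusion'.
star_fusion_with_env_lib() {
    if [ -z "${CTAT_GENOME_LIB:-}" ]; then
        echo "error: CTAT_GENOME_LIB is not set" >&2
        return 1
    fi
    "${STAR_FUSION_BIN:-STAR-Fusion}" \
        --left_fq "$1" \
        --right_fq "$2" \
        --output_dir "$3"
}

# Example call:
#   export CTAT_GENOME_LIB=/path/to/your/CTAT_resource_lib
#   star_fusion_with_env_lib reads_1.fq reads_2.fq star_fusion_outdir
```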

Note: unless you have relatively long single-end reads (e.g., at least 100 bases), you will be underpowered for detecting fusion transcripts.

Alternatively, Kickstart mode: running STAR yourself, and then running STAR-Fusion using the existing outputs

It's not always the case that you want to have STAR-Fusion run STAR directly, as you may have already run STAR earlier on, or prefer to run STAR separately to use the outputs in other processes such as for expression estimates or variant detection.

Parameters that we recommend for running STAR as part of STAR-Fusion are as follows:

 STAR --genomeDir ${star_index_dir} \
      --readFilesIn ${left_fq_filename} ${right_fq_filename} \
      --twopassMode Basic \
      --outReadsUnmapped None \
      --chimSegmentMin 12 \
      --chimJunctionOverhangMin 12 \
      --alignSJDBoverhangMin 10 \
      --alignMatesGapMax 100000 \
      --alignIntronMax 100000 \
      --chimSegmentReadGapMax 3 \
      --alignSJstitchMismatchNmax 5 -1 5 5 \
      --runThreadN ${THREAD_COUNT} \
      --outSAMstrandField intronMotif \
      --chimOutJunctionFormat 1 # required as of STAR v2.6.1

This will (in part) generate a file called 'Chimeric.out.junction', which is used by STAR-Fusion like so:

 STAR-Fusion --genome_lib_dir /path/to/your/CTAT_resource_lib \
             -J Chimeric.out.junction \
             --output_dir star_fusion_outdir

Note: include the --left_fq and --right_fq parameters along with -J Chimeric.out.junction in order to compute the FFPM (normalized fusion fragments per million total RNA-Seq fragments) values in your summary report. Otherwise, you'll just get evidence fragment counts without the normalized values.
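
The kickstart invocation that also supplies the FASTQ files for FFPM computation could be wrapped roughly as follows. This is a sketch only: the function name star_fusion_kickstart and the STAR_FUSION_BIN dry-run override are hypothetical, while the STAR-Fusion parameters are the ones named in the text above.

```shell
# Hypothetical wrapper (not part of STAR-Fusion) for kickstart mode with FFPM.
# Arguments: genome lib dir, Chimeric.out.junction, left FASTQ, right FASTQ,
# output directory. STAR_FUSION_BIN is an assumed override for dry runs.
star_fusion_kickstart() {
    lib_dir=$1 junctions=$2 left_fq=$3 right_fq=$4 out_dir=$5
    # Fail early if STAR did not produce a usable junction file.
    if [ ! -s "$junctions" ]; then
        echo "error: $junctions is missing or empty" >&2
        return 1
    fi
    "${STAR_FUSION_BIN:-STAR-Fusion}" \
        --genome_lib_dir "$lib_dir" \
        -J "$junctions" \
        --left_fq "$left_fq" \
        --right_fq "$right_fq" \
        --output_dir "$out_dir"
}

# Example call:
#   star_fusion_kickstart /path/to/your/CTAT_resource_lib Chimeric.out.junction \
#                         reads_1.fq reads_2.fq star_fusion_outdir
```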

Output from STAR-Fusion

STAR-Fusion writes its predictions to a tab-delimited file named 'star-fusion.fusion_predictions.tsv'. An abridged version, 'star-fusion.fusion_predictions.abridged.tsv', omits the names of the fusion evidence reads and has the following format:

#FusionName           JunctionReadCount  SpanningFragCount  SpliceType           LeftGene                        LeftBreakpoint    RightGene                        RightBreakpoint   LargeAnchorSupport  FFPM        LeftBreakDinuc  LeftBreakEntropy  RightBreakDinuc  RightBreakEntropy  annots
THRA--AC090627.1      27                 93                 ONLY_REF_SPLICE      THRA^ENSG00000126351.8          chr17:38243106:+  AC090627.1^ENSG00000235300.3     chr17:46371709:+  YES_LDAS            23875.8456  GT              1.8892            AG               1.9656             ["CCLE","FA_CancerSupp","INTRACHROMOSOMAL[chr17:8.12Mb]"]
THRA--AC090627.1      5                  93                 ONLY_REF_SPLICE      THRA^ENSG00000126351.8          chr17:38243106:+  AC090627.1^ENSG00000235300.3     chr17:46384693:+  YES_LDAS            19498.6072  GT              1.8892            AG               1.4295             ["CCLE","FA_CancerSupp","INTRACHROMOSOMAL[chr17:8.12Mb]"]
ACACA--STAC2          12                 52                 ONLY_REF_SPLICE      ACACA^ENSG00000132142.15        chr17:35479453:-  STAC2^ENSG00000141750.6          chr17:37374426:-  YES_LDAS            12733.7844  GT              1.9656            AG               1.9656             ["ChimerSeq","CCLE","Klijn_CellLines","FA_CancerSupp","INTRACHROMOSOMAL[chr17:1.60Mb]"]
RPS6KB1--SNF8         10                 43                 ONLY_REF_SPLICE      RPS6KB1^ENSG00000108443.9       chr17:57970686:+  SNF8^ENSG00000159210.5           chr17:47021337:-  YES_LDAS            10545.1651  GT              1.3753            AG               1.8323             ["Klijn_CellLines","FA_CancerSupp","ChimerSeq","CCLE","INTRACHROMOSOMAL[chr17:10.95Mb]"]
TOB1--SYNRG           8                  30                 ONLY_REF_SPLICE      TOB1^ENSG00000141232.4          chr17:48943419:-  SYNRG^ENSG00000006114.11         chr17:35880751:-  YES_LDAS            7560.6844   GT              1.4566            AG               1.8892             ["FA_CancerSupp","CCLE","INTRACHROMOSOMAL[chr17:12.97Mb]"]
VAPB--IKZF3           4                  46                 ONLY_REF_SPLICE      VAPB^ENSG00000124164.11         chr20:56964573:+  IKZF3^ENSG00000161405.12         chr17:37934020:-  YES_LDAS            9948.269    GT              1.9656            AG               1.7819             ["FA_CancerSupp","Klijn_CellLines","CCLE","ChimerSeq","ChimerPub","INTERCHROMOSOMAL[chr20--chr17]"]
ZMYND8--CEP250        2                  44                 ONLY_REF_SPLICE      ZMYND8^ENSG00000101040.15       chr20:45852970:-  CEP250^ENSG00000126001.11        chr20:34078463:+  NO_LDAS             9152.4075   GT              1.8295            AG               1.8062             ["FA_CancerSupp","CCLE","ChimerSeq","INTRACHROMOSOMAL[chr20:11.74Mb]"]
AHCTF1--NAAA          3                  38                 ONLY_REF_SPLICE      AHCTF1^ENSG00000153207.10       chr1:247094880:-  NAAA^ENSG00000138744.10          chr4:76846964:-   YES_LDAS            8157.5805   GT              1.7232            AG               1.8062             ["FA_CancerSupp","CCLE","INTERCHROMOSOMAL[chr1--chr4]"]
VAPB--IKZF3           1                  46                 ONLY_REF_SPLICE      VAPB^ENSG00000124164.11         chr20:56964573:+  IKZF3^ENSG00000161405.12         chr17:37922746:-  NO_LDAS             9351.3729   GT              1.9656            AG               1.9329             ["FA_CancerSupp","Klijn_CellLines","CCLE","ChimerSeq","ChimerPub","INTERCHROMOSOMAL[chr20--chr17]"]
VAPB--IKZF3           1                  46                 ONLY_REF_SPLICE      VAPB^ENSG00000124164.11         chr20:56964573:+  IKZF3^ENSG00000161405.12         chr17:37944627:-  NO_LDAS             9351.3729   GT              1.9656            AG               1.8892             ["FA_CancerSupp","Klijn_CellLines","CCLE","ChimerSeq","ChimerPub","INTERCHROMOSOMAL[chr20--chr17]"]
STX16--RAE1           4                  33                 ONLY_REF_SPLICE      STX16^ENSG00000124222.17        chr20:57227143:+  RAE1^ENSG00000101146.8           chr20:55929088:+  YES_LDAS            7361.719    GT              1.9899            AG               1.9656             ["FA_CancerSupp","CCLE","INTRACHROMOSOMAL[chr20:1.27Mb]"]
AHCTF1--NAAA          1                  38                 ONLY_REF_SPLICE      AHCTF1^ENSG00000153207.10       chr1:247094431:-  NAAA^ENSG00000138744.10          chr4:76846964:-   NO_LDAS             7759.6498   GT              1.9086            AG               1.8062             ["FA_CancerSupp","CCLE","INTERCHROMOSOMAL[chr1--chr4]"]
STX16-NPEPL1--RAE1    4                  24                 INCL_NON_REF_SPLICE  STX16-NPEPL1^ENSG00000254995.4  chr20:57227143:+  RAE1^ENSG00000101146.8           chr20:55929088:+  YES_LDAS            5571.0306   GT              1.9899            AG               1.9656             INTRACHROMOSOMAL[chr20:1.27Mb]
RAB22A--MYO9B         6                  11                 ONLY_REF_SPLICE      RAB22A^ENSG00000124209.3        chr20:56886178:+  MYO9B^ENSG00000099331.9          chr19:17256207:+  YES_LDAS            3382.4115   GT              1.6895            AG               1.9656             ["FA_CancerSupp","ChimerSeq","CCLE","INTERCHROMOSOMAL[chr20--chr19]"]
MED1--ACSF2           4                  11                 ONLY_REF_SPLICE      MED1^ENSG00000125686.7          chr17:37595418:-  ACSF2^ENSG00000167107.8          chr17:48548389:+  YES_LDAS            2984.4807   GT              1.9656            AG               1.9656             ["FA_CancerSupp","CCLE","INTRACHROMOSOMAL[chr17:10.90Mb]"]
MED13--BCAS3          2                  12                 ONLY_REF_SPLICE      MED13^ENSG00000108510.5         chr17:60129898:-  BCAS3^ENSG00000141376.16         chr17:59469338:+  YES_LDAS            2785.5154   GT              1.5546            AG               1.9086             ["FA_CancerSupp","CCLE","INTRACHROMOSOMAL[chr17:0.55Mb]"]
MED1--STXBP4          1                  15                 ONLY_REF_SPLICE      MED1^ENSG00000125686.7          chr17:37607291:-  STXBP4^ENSG00000166263.9         chr17:53218671:+  NO_LDAS             3183.4461   GT              1.3996            AG               1.7968             ["CCLE","FA_CancerSupp","Klijn_CellLines","INTRACHROMOSOMAL[chr17:15.44Mb]"]
MED13--BCAS3          1                  12                 ONLY_REF_SPLICE      MED13^ENSG00000108510.5         chr17:60129898:-  BCAS3^ENSG00000141376.16         chr17:59465979:+  NO_LDAS             2586.55     GT              1.5546            AG               0.8366             ["FA_CancerSupp","CCLE","INTRACHROMOSOMAL[chr17:0.55Mb]"]
STARD3--DOK5          2                  7                  ONLY_REF_SPLICE      STARD3^ENSG00000131748.11       chr17:37793484:+  DOK5^ENSG00000101134.7           chr20:53259997:+  NO_LDAS             1790.6885   GT              1.8892            AG               1.9656             ["FA_CancerSupp","CCLE","INTERCHROMOSOMAL[chr17--chr20]"]
DIDO1--TTI1           1                  10                 ONLY_REF_SPLICE      DIDO1^ENSG00000101191.12        chr20:61569148:-  TTI1^ENSG00000101407.8           chr20:36642259:-  NO_LDAS             2188.6192   GT              1.6402            AG               1.9329             ["FA_CancerSupp","ChimerSeq","CCLE","INTRACHROMOSOMAL[chr20:24.85Mb]"]
DIDO1--TTI1           1                  10                 ONLY_REF_SPLICE      DIDO1^ENSG00000101191.12        chr20:61569148:-  TTI1^ENSG00000101407.8           chr20:36634799:-  NO_LDAS             2188.6192   GT              1.6402            AG               1.8892             ["FA_CancerSupp","ChimerSeq","CCLE","INTRACHROMOSOMAL[chr20:24.85Mb]"]
BRD4--RFX1            1                  8                  ONLY_REF_SPLICE      BRD4^ENSG00000141867.13         chr19:15443101:-  RFX1^ENSG00000132005.4           chr19:14109129:-  NO_LDAS             1790.6884   GT              1.9086            AG               1.8892             ["CCLE","FA_CancerSupp","INTRACHROMOSOMAL[chr19:1.23Mb]"]
BRD4--RFX1            1                  8                  ONLY_REF_SPLICE      BRD4^ENSG00000141867.13         chr19:15443101:-  RFX1^ENSG00000132005.4           chr19:14094407:-  NO_LDAS             1790.6884   GT              1.9086            AG               1.8295             ["CCLE","FA_CancerSupp","INTRACHROMOSOMAL[chr19:1.23Mb]"]
TRPC4AP--MRPL45       1                  8                  ONLY_REF_SPLICE      TRPC4AP^ENSG00000100991.7       chr20:33665849:-  MRPL45^ENSG00000174100.5         chr17:36478009:+  NO_LDAS             1790.6884   GT              1.6895            AG               1.9086             ["CCLE","Klijn_CellLines","FA_CancerSupp","INTERCHROMOSOMAL[chr20--chr17]"]

The JunctionReadCount column indicates the number of RNA-Seq fragments containing a read that aligns as a split read at the site of the putative fusion junction.

The SpanningFragCount column indicates the number of RNA-Seq fragments that encompass the fusion junction such that one read of the pair aligns to a different gene than the other paired-end read of that fragment.

Predictions that have very few junction reads and/or spanning fragments are going to be enriched for false positives. Note that, depending on the site of the fusion breakpoint and the length of the reads, it may not be possible to have spanning fragments, and all evidence may show up in the form of junction reads.
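
To look only at better-supported predictions, the abridged file can be filtered on total evidence. The sketch below sums the two count columns rather than thresholding each, since spanning fragments may legitimately be absent. The helper name filter_by_support is hypothetical, and the threshold is whatever you choose to pass in, not a recommended cut-off.

```shell
# Hypothetical helper: print the header plus rows whose total evidence
# (JunctionReadCount + SpanningFragCount) is at least $2 fragments.
# $1 is a tab-delimited predictions file with the header shown above.
filter_by_support() {
    awk -F'\t' -v min="$2" '
        NR == 1 {
            # Locate the two count columns by header name.
            for (i = 1; i <= NF; i++) {
                if ($i == "JunctionReadCount") j = i
                if ($i == "SpanningFragCount") s = i
            }
            if (!j || !s) {
                print "error: count columns not found" > "/dev/stderr"
                exit 1
            }
            print
            next
        }
        ($j + $s) >= min
    ' "$1"
}

# Example call:
#   filter_by_support star-fusion.fusion_predictions.abridged.tsv 5
```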

The number of fusion-supporting reads depends on both the expression of the fusion transcript and the number of reads sequenced. The deeper the sequenced data set, the greater the number of artifactual fusions that will appear with minimal supporting evidence, so taking the sequencing depth into account is important to curtail overzealous prediction of fusion transcripts with ever-so-minimal supporting evidence. We provide normalized measures of the fusion-supporting RNA-Seq fragments as FFPM (fusion fragments per million total reads) measures. A filter of 0.1 sum FFPM (meaning at least 1 fusion-supporting RNA-Seq fragment per 10M total reads) tends to be effective at excluding fusion artifacts, and is the current default for filtering fusions from the final output. Adjust the 'STAR-Fusion --min_FFPM' parameter, or set it to zero to disable FFPM-based filtering.
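
The normalization can be illustrated with a few lines of awk. The formula used here, (junction reads + spanning fragments) / total fragments x 1,000,000, is an assumption that is consistent with the example table above; the total of 5,026 fragments is back-calculated from the first example row and is not a documented value. The helper name ffpm is hypothetical.

```shell
# Hypothetical helper: compute FFPM from the junction read count ($1), the
# spanning fragment count ($2), and the total number of fragments ($3).
# Assumed formula: (junction + spanning) / total * 1e6.
ffpm() {
    awk -v j="$1" -v s="$2" -v n="$3" \
        'BEGIN { printf "%.4f\n", (j + s) / n * 1000000 }'
}

# First example row (27 junction reads, 93 spanning fragments), assuming
# 5026 total fragments:
#   ffpm 27 93 5026        # prints 23875.8456
# One supporting fragment in 10M total sits exactly at the default filter:
#   ffpm 1 0 10000000      # prints 0.1000
```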

The 'LargeAnchorSupport' column indicates whether there are split reads that provide 'long' (set to length of 25 bases) alignments on both sides of the putative breakpoint. Fusions that are supported only by split reads (no spanning fragments) and lack LargeAnchorSupport are often highly suspicious and tend to be false positives. Those with LargeAnchorSupport are labeled as 'YES_LDAS' (where LDAS = long double anchor support... yes, more jargon).
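
The suspicious category described above (split reads only, no large anchor support) can be listed directly from the predictions file. A minimal sketch, assuming the column names from the example header; the helper name list_weak_split_only is hypothetical.

```shell
# Hypothetical helper: print the names of fusions that have no spanning
# fragments and are labeled NO_LDAS in the LargeAnchorSupport column.
list_weak_split_only() {
    awk -F'\t' '
        NR == 1 {
            # Locate the relevant columns by header name.
            for (i = 1; i <= NF; i++) {
                if ($i == "SpanningFragCount")  s = i
                if ($i == "LargeAnchorSupport") a = i
            }
            if (!s || !a) {
                print "error: required columns not found" > "/dev/stderr"
                exit 1
            }
            next
        }
        $s == 0 && $a == "NO_LDAS" { print $1 }
    ' "$1"
}

# Example call:
#   list_weak_split_only star-fusion.fusion_predictions.abridged.tsv
```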

'SpliceType' indicates whether the proposed breakpoint occurs at reference exon junctions as provided by the reference transcript structure annotations (e.g., Gencode).

The abridged output file contents are shown above. See the unabridged 'star-fusion.fusion_predictions.tsv' output file for the identity of the RNA-Seq fragments identified as junction or spanning fragments, where the individual read names are provided as comma-delimited lists in each corresponding column.

The final column 'annots' provides a simplified annotation for the fusion transcript, leveraging FusionAnnotator (bundled with STAR-Fusion). For the human source or plug-n-play genome libs, the fusion annotation info is based on CTAT_HumanFusionLib, which includes many popular resources for annotating fusions known to be relevant to cancer, as well as fusions thought to be red herrings that will be automatically filtered from the final output. Rules for filtering out fusions based on annotations are encoded in a small Perl module found as '${CTAT_GENOME_LIB}/AnnotFilterRule.pm'. For the provided human genome lib, these rules exclude 'red herring' categories (described here - not a fish picture ;-) ) in addition to fusions involving mitochondrial genes or HLA loci (common artifacts). To disable annotation-based filtering, use the 'STAR-Fusion --no_annotation_filter' parameter.

If there are alternatively spliced isoforms for fusion transcripts, the same fusion pair will be listed as multiple entries but with different breakpoints identified.
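
To see which fusion pairs are reported with more than one breakpoint, the rows can be counted per fusion name. A minimal sketch, assuming the first column is '#FusionName' as in the example above; the helper name count_breakpoints is hypothetical.

```shell
# Hypothetical helper: print "fusion<TAB>number of rows" for each fusion
# name in a predictions file, sorted by name. Each row corresponds to one
# reported breakpoint pair.
count_breakpoints() {
    awk -F'\t' '
        NR > 1 { n[$1]++ }
        END    { for (f in n) print f "\t" n[f] }
    ' "$1" | LC_ALL=C sort
}

# Example call:
#   count_breakpoints star-fusion.fusion_predictions.abridged.tsv
```

In the example table above, VAPB--IKZF3 appears on three rows with different right breakpoints, so it would be reported with a count of 3.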

Further Inspection, Visualization, and Validation?

We have a companion tool called FusionInspector that provides a more in-depth view of the evidence supporting the predicted fusions. FusionInspector can also run Trinity to de novo reconstruct your predicted fusion transcripts based on the identified fusion-supporting RNA-Seq reads.

As of STAR-Fusion v1.1.0, FusionInspector is integrated into STAR-Fusion as a submodule.

FusionInspector can be run in either 'inspect' or 'validate' mode when executed downstream from STAR-Fusion:

  • '--FusionInspector inspect': only the reads identified by STAR-Fusion as evidence supporting the fusion prediction are aligned directly to a target set of fusion-gene contigs for exploration using IGV.

  • '--FusionInspector validate': involves a more rigorous process of reevaluating the entire set of input reads, aligning the reads to a combination of the reference genome and a set of fusion-gene contigs based on the STAR-Fusion predictions. Reads mapping better to the fusion-gene contigs than the reference genome are identified and reported, fusions are re-scored/quantified, and fusion transcript allelic fractions are computed.

If either mode is invoked, STAR-Fusion will run FusionInspector and create a FusionInspector/ output subdirectory containing all relevant output files. See the FusionInspector Wiki for documentation on output files and loading results into IGV for visualization.

Examine Effect of Fusions on Coding Regions

It is sometimes the case that fusion transcripts generate novel fusion proteins with altered functions. You can further explore the impact of the fusion event on coding regions by invoking the '--examine_coding_effect' parameter.

The coding effect results are appended as additional columns in the STAR-Fusion tab-delimited output file. An example set of columns includes:

  #FusionName        BCR--ABL1
  ...
  CDS_LEFT_ID        ENST00000305877.8
  CDS_LEFT_RANGE     1-2782
  CDS_RIGHT_ID       ENST00000318560.5
  CDS_RIGHT_RANGE    80-3393
  PROT_FUSION_TYPE   INFRAME
  FUSION_MODEL       chr22|+|[0]23523148-23524426[0]|[1]23595986-23596167[2]|[0]23603137-23603241[2]|[0]23603542-23603727[2]|[0]23610595-23610702[2]|[0]23613719-23613779[0]|[1]23615268-23615320[2]|[0]23615821-23615961[2]|[0]23626164-23626285[1]|[2]23627220-23627388[2]|[0]23629346-23629465[2]|[0]23630284-23630359[0]|[1]23631704-23631808[0]|[1]23632526-23632600[0]<==>chr9|+|[1]133729451-133729624[0]|[1]133730188-133730483[2]|[0]133738150-133738422[2]|[0]133747516-133747600[0]|[1]133748247-133748424[1]|[2]133750255-133750439[0]|[1]133753802-133753954[0]|[1]133755455-133755544[0]|[1]133755887-133756051[0]|[1]133759356-133761070[2]
  FUSION_CDS         atggtggacccggtgggcttcgcggaggcgtggaaggcgcagttcccggactcagagcccccgcgcatggagctgcgctcagtgggcgacatcgagcaggagctggagcgctgcaaggcctccattcggcgcctggagcaggaggtgaaccaggagcgcttccgcatgatctacctgcagacgttgctggccaaggaaaagaagagctatgaccggcagcgatggggcttccggcgcgcggcgcaggcccccgacggcgcctccgagccccgagcgtccgcgtcgcgcccgcagccagcgcccgccgacggagccgacccgccgcccgccgaggagcccgaggcccggcccgacggcgagggttctccgggtaaggccaggcccgggaccgcccgcaggcccggggcagccgcgtcgggggaacgggacgaccggggaccccccgccagcgtggcggcgctcaggtccaacttcgagcggatccgcaagggccatggccagcccggggcggacgccgagaagcccttctacgtgaacgtcgagtttcaccacgagcgcggcctggtgaaggtcaacgacaaagaggtgtcggaccgcatcagctccctgggcagccaggccatgcagatggagcgcaaaaagtcccagcacggcgcgggctcgagcgtgggggatgcatccaggcccccttaccggggacgctcctcggagagcagctgcggcgtcgacggcgactacgaggacgccgagttgaacccccgcttcctgaaggacaacctgatcgacgccaatggcggtagcaggcccccttggccgcccctggagtaccagccctaccagagcatctacgtcgggggcatgatggaaggggagggcaagggcccgctcctgcgcagccagagcacctctgagcaggagaagcgccttacctggccccgcaggtcctactccccccggagttttgaggattgcggaggcggctataccccggactgcagctccaatgagaacctcacctccagcgaggaggacttctcctctggccagtccagccgcgtgtccccaagccccaccacctaccgcatgttccgggacaaaagccgctctccctcgcagaactcgcaacagtccttcgacagcagcagtccccccacgccgcagtgccataagcggcaccggcactgcccggttgtcgtgtccgaggccaccatcgtgggcgtccgcaagaccgggcagatctggcccaacgatggcgagggcgccttccatggagacgcagatggctcgttcggaacaccacctggatacggctgcgctgcagaccgggcagaggagcagcgccggcaccaagatgggctgccctacattgatgactcgccctcctcatcgccccacctcagcagcaagggcaggggcagccgggatgcgctggtctcgggagccctggagtccactaaagcgagtgagctggacttggaaaagggcttggagatgagaaaatgggtcctgtcgggaatcctggctagcgaggagacttacctgagccacctggaggcactgctgctgcccatgaagcctttgaaagccgctgccaccacctctcagccggtgctgacgagtcagcagatcgagaccatcttcttcaaagtgcctgagctctacgagatccacaaggagttctatgatgggctcttcccccgcgtgcagcagtggagccaccagcagcgggtgggcgacctcttccagaagctggccagccagctgggtgtgtaccgggccttcgtggacaactacggagttgccatggaaatggctgagaagtgctgtcaggccaatgctcagtttgcagaaatctccgagaacctgagagccagaagcaacaaagatgccaaggatccaacgaccaagaactctctggaaactctgctctacaagcctgtggaccgtgtgacgaggagcacgctggtcctccatgactt
gctgaagcacactcctgccagccaccctgaccaccccttgctgcaggacgccctccgcatctcacagaacttcctgtccagcatcaatgaggagatcacaccccgacggcagtccatgacggtgaagaagggagagcaccggcagctgctgaaggacagcttcatggtggagctggtggagggggcccgcaagctgcgccacgtcttcctgttcaccgacctgcttctctgcaccaagctcaagaagcagagcggaggcaaaacgcagcagtatgactgcaaatggtacattccgctcacggatctcagcttccagatggtggatgaactggaggcagtgcccaacatccccctggtgcccgatgaggagctggacgctttgaagatcaagatctcccagatcaagaatgacatccagagagagaagagggcgaacaagggcagcaaggctacggagaggctgaagaagaagctgtcggagcaggagtcactgctgctgcttatgtctcccagcatggccttcagggtgcacagccgcaacggcaagagttacacgttcctgatctcctctgactatgagcgtgcagagtggagggagaacatccgggagcagcagaagaagtgtttcagaagcttctccctgacatccgtggagctgcagatgctgaccaactcgtgtgtgaaactccagactgtccacagcattccgctgaccatcaataaggaagatgatgagtctccggggctctatgggtttctgaatgtcatcgtccactcagccactggatttaagcagagttcaaAAGCCCTTCAGCGGCCAGTAGCATCTGACTTTGAGCCTCAGGGTCTGAGTGAAGCCGCTCGTTGGAACTCCAAGGAAAACCTTCTCGCTGGACCCAGTGAAAATGACCCCAACCTTTTCGTTGCACTGTATGATTTTGTGGCCAGTGGAGATAACACTCTAAGCATAACTAAAGGTGAAAAGCTCCGGGTCTTAGGCTATAATCACAATGGGGAATGGTGTGAAGCCCAAACCAAAAATGGCCAAGGCTGGGTCCCAAGCAACTACATCACGCCAGTCAACAGTCTGGAGAAACACTCCTGGTACCATGGGCCTGTGTCCCGCAATGCCGCTGAGTATCTGCTGAGCAGCGGGATCAATGGCAGCTTCTTGGTGCGTGAGAGTGAGAGCAGTCCTGGCCAGAGGTCCATCTCGCTGAGATACGAAGGGAGGGTGTACCATTACAGGATCAACACTGCTTCTGATGGCAAGCTCTACGTCTCCTCCGAGAGCCGCTTCAACACCCTGGCCGAGTTGGTTCATCATCATTCAACGGTGGCCGACGGGCTCATCACCACGCTCCATTATCCAGCCCCAAAGCGCAACAAGCCCACTGTCTATGGTGTGTCCCCCAACTACGACAAGTGGGAGATGGAACGCACGGACATCACCATGAAGCACAAGCTGGGCGGGGGCCAGTACGGGGAGGTGTACGAGGGCGTGTGGAAGAAATACAGCCTGACGGTGGCCGTGAAGACCTTGAAGGAGGACACCATGGAGGTGGAAGAGTTCTTGAAAGAAGCTGCAGTCATGAAAGAGATCAAACACCCTAACCTGGTGCAGCTCCTTGGGGTCTGCACCCGGGAGCCCCCGTTCTATATCATCACTGAGTTCATGACCTACGGGAACCTCCTGGACTACCTGAGGGAGTGCAACCGGCAGGAGGTGAACGCCGTGGTGCTGCTGTACATGGCCACTCAGATCTCGTCAGCCATGGAGTACCTGGAGAAGAAAAACTTCATCCACAGAGATCTTGCTGCCCGAAACTGCCTGGTAGGGGAGAACCACTTGGTGAAGGTAGCTGATTTTGGCCTGAGCAGGTTGATGACAGGGGACACCTACACAGCCCATGCTGGAGCCAAGTTCCCCATCAAATGGACTGCACCCGAGAGCCTGGCCTACAACAAGTTCTCCATCAAGTCCGACGTCTGGGCATTTG
GAGTATTGCTTTGGGAAATTGCTACCTATGGCATGTCCCCTTACCCGGGAATTGACCTGTCCCAGGTGTATGAGCTGCTAGAGAAGGACTACCGCATGGAGCGCCCAGAAGGCTGCCCAGAGAAGGTCTATGAACTCATGCGAGCATGTTGGCAGTGGAATCCCTCTGACCGGCCCTCCTTTGCTGAAATCCACCAAGCCTTTGAAACAATGTTCCAGGAATCCAGTATCTCAGACGAAGTGGAAAAGGAGCTGGGGAAACAAGGCGTCCGTGGGGCTGTGAGTACCTTGCTGCAGGCCCCAGAGCTGCCCACCAAGACGAGGACCTCCAGGAGAGCTGCAGAGCACAGAGACACCACTGACGTGCCTGAGATGCCTCACTCCAAGGGCCAGGGAGAGAGCGATCCTCTGGACCATGAGCCTGCCGTGTCTCCATTGCTCCCTCGAAAAGAGCGAGGTCCCCCGGAGGGCGGCCTGAATGAAGATGAGCGCCTTCTCCCCAAAGACAAAAAGACCAACTTGTTCAGCGCCTTGATCAAGAAGAAGAAGAAGACAGCCCCAACCCCTCCCAAACGCAGCAGCTCCTTCCGGGAGATGGACGGCCAGCCGGAGCGCAGAGGGGCCGGCGAGGAAGAGGGCCGAGACATCAGCAACGGGGCACTGGCTTTCACCCCCTTGGACACAGCTGACCCAGCCAAGTCCCCAAAGCCCAGCAATGGGGCTGGGGTCCCCAATGGAGCCCTCCGGGAGTCCGGGGGCTCAGGCTTCCGGTCTCCCCACCTGTGGAAGAAGTCCAGCACGCTGACCAGCAGCCGCCTAGCCACCGGCGAGGAGGAGGGCGGTGGCAGCTCCAGCAAGCGCTTCCTGCGCTCTTGCTCCGCCTCCTGCGTTCCCCATGGGGCCAAGGACACGGAGTGGAGGTCAGTCACGCTGCCTCGGGACTTGCAGTCCACGGGAAGACAGTTTGACTCGTCCACATTTGGAGGGCACAAAAGTGAGAAGCCGGCTCTGCCTCGGAAGAGGGCAGGGGAGAACAGGTCTGACCAGGTGACCCGAGGCACAGTAACGCCTCCCCCCAGGCTGGTGAAAAAGAATGAGGAAGCTGCTGATGAGGTCTTCAAAGACATCATGGAGTCCAGCCCGGGCTCCAGCCCGCCCAACCTGACTCCAAAACCCCTCCGGCGGCAGGTCACCGTGGCCCCTGCCTCGGGCCTCCCCCACAAGGAAGAAGCTGGAAAGGGCAGTGCCTTAGGGACCCCTGCTGCAGCTGAGCCAGTGACCCCCACCAGCAAAGCAGGCTCAGGTGCACCAGGGGGCACCAGCAAGGGCCCCGCCGAGGAGTCCAGAGTGAGGAGGCACAAGCACTCCTCTGAGTCGCCAGGGAGGGACAAGGGGAAATTGTCCAGGCTCAAACCTGCCCCGCCGCCCCCACCAGCAGCCTCTGCAGGGAAGGCTGGAGGAAAGCCCTCGCAGAGCCCGAGCCAGGAGGCGGCCGGGGAGGCAGTCCTGGGCGCAAAGACAAAAGCCACGAGTCTGGTTGATGCTGTGAACAGTGACGCTGCCAAGCCCAGCCAGCCGGGAGAGGGCCTCAAAAAGCCCGTGCTCCCGGCCACTCCAAAGCCACAGTCCGCCAAGCCGTCGGGGACCCCCATCAGCCCAGCCCCCGTTCCCTCCACGTTGCCATCAGCATCCTCGGCCCTGGCAGGGGACCAGCCGTCTTCCACCGCCTTCATCCCTCTCATATCAACCCGAGTGTCTCTTCGGAAAACCCGCCAGCCTCCAGAGCGGATCGCCAGCGGCGCCATCACCAAGGGCGTGGTCCTGGACAGCACCGAGGCGCTGTGCCTCGCCATCTCTAGGAACTCCGAGCAGATGGCCAGCCACAGCGCAGTGCTGGAGGCCGGCAAAAACCTCTACACGTTCTGCGTGAGCTATGTGGATTCCATCCAGCAAATGAGGAACAAGTTTGCCTTCCGAGAGGCCATCAACAAACTGGAGAAT
AATCTCCGGGAGCTTCAGATCTGCCCGGCGACAGCAGGCAGTGGTCCAGCGGCCACTCAGGACTTCAGCAAGCTCCTCAGTTCGGTGAAGGAAATCAGTGACATAGTGCAGAGGTAG
  FUSION_TRANSL      MVDPVGFAEAWKAQFPDSEPPRMELRSVGDIEQELERCKASIRRLEQEVNQERFRMIYLQTLLAKEKKSYDRQRWGFRRAAQAPDGASEPRASASRPQPAPADGADPPPAEEPEARPDGEGSPGKARPGTARRPGAAASGERDDRGPPASVAALRSNFERIRKGHGQPGADAEKPFYVNVEFHHERGLVKVNDKEVSDRISSLGSQAMQMERKKSQHGAGSSVGDASRPPYRGRSSESSCGVDGDYEDAELNPRFLKDNLIDANGGSRPPWPPLEYQPYQSIYVGGMMEGEGKGPLLRSQSTSEQEKRLTWPRRSYSPRSFEDCGGGYTPDCSSNENLTSSEEDFSSGQSSRVSPSPTTYRMFRDKSRSPSQNSQQSFDSSSPPTPQCHKRHRHCPVVVSEATIVGVRKTGQIWPNDGEGAFHGDADGSFGTPPGYGCAADRAEEQRRHQDGLPYIDDSPSSSPHLSSKGRGSRDALVSGALESTKASELDLEKGLEMRKWVLSGILASEETYLSHLEALLLPMKPLKAAATTSQPVLTSQQIETIFFKVPELYEIHKEFYDGLFPRVQQWSHQQRVGDLFQKLASQLGVYRAFVDNYGVAMEMAEKCCQANAQFAEISENLRARSNKDAKDPTTKNSLETLLYKPVDRVTRSTLVLHDLLKHTPASHPDHPLLQDALRISQNFLSSINEEITPRRQSMTVKKGEHRQLLKDSFMVELVEGARKLRHVFLFTDLLLCTKLKKQSGGKTQQYDCKWYIPLTDLSFQMVDELEAVPNIPLVPDEELDALKIKISQIKNDIQREKRANKGSKATERLKKKLSEQESLLLLMSPSMAFRVHSRNGKSYTFLISSDYERAEWRENIREQQKKCFRSFSLTSVELQMLTNSCVKLQTVHSIPLTINKEDDESPGLYGFLNVIVHSATGFKQSSKALQRPVASDFEPQGLSEAARWNSKENLLAGPSENDPNLFVALYDFVASGDNTLSITKGEKLRVLGYNHNGEWCEAQTKNGQGWVPSNYITPVNSLEKHSWYHGPVSRNAAEYLLSSGINGSFLVRESESSPGQRSISLRYEGRVYHYRINTASDGKLYVSSESRFNTLAELVHHHSTVADGLITTLHYPAPKRNKPTVYGVSPNYDKWEMERTDITMKHKLGGGQYGEVYEGVWKKYSLTVAVKTLKEDTMEVEEFLKEAAVMKEIKHPNLVQLLGVCTREPPFYIITEFMTYGNLLDYLRECNRQEVNAVVLLYMATQISSAMEYLEKKNFIHRDLAARNCLVGENHLVKVADFGLSRLMTGDTYTAHAGAKFPIKWTAPESLAYNKFSIKSDVWAFGVLLWEIATYGMSPYPGIDLSQVYELLEKDYRMERPEGCPEKVYELMRACWQWNPSDRPSFAEIHQAFETMFQESSISDEVEKELGKQGVRGAVSTLLQAPELPTKTRTSRRAAEHRDTTDVPEMPHSKGQGESDPLDHEPAVSPLLPRKERGPPEGGLNEDERLLPKDKKTNLFSALIKKKKKTAPTPPKRSSSFREMDGQPERRGAGEEEGRDISNGALAFTPLDTADPAKSPKPSNGAGVPNGALRESGGSGFRSPHLWKKSSTLTSSRLATGEEEGGGSSSKRFLRSCSASCVPHGAKDTEWRSVTLPRDLQSTGRQFDSSTFGGHKSEKPALPRKRAGENRSDQVTRGTVTPPPRLVKKNEEAADEVFKDIMESSPGSSPPNLTPKPLRRQVTVAPASGLPHKEEAGKGSALGTPAAAEPVTPTSKAGSGAPGGTSKGPAEESRVRRHKHSSESPGRDKGKLSRLKPAPPPPPAASAGKAGGKPSQSPSQEAAGEAVLGAKTKATSLVDAVNSDAAKPSQPGEGLKKPVLPATPKPQSAKPSGTPISPAPVPSTLPSASSALAGDQPSSTAFIPLISTRVSLRKTRQPPERIASGAITKGVVLDSTEALCLAISRNSEQMASHSAVLEAGKNLYTFCVSYVDSIQQMR
NKFAFREAINKLENNLRELQICPATAGSGPAATQDFSKLLSSVKEISDIVQR*
  PFAM_LEFT          Bcr-Abl_Oligo|3-75|3.6e-43^RhoGEF|503-689|1.1e-37^IQ_SEC7_PH|724-769|4.7e-06^PH_5|727-865|6.6e-05^PH|740-864|5.3e-06^Bcr-Abl_Oligo|781-802|0.45^C2|913-1007|2.2e-09^PH|919-973|0.56^RhoGAP|1068-1218|7.9e-52
  PFAM_RIGHT         SH3_1-PARTIAL|~80-113|1.6e-15^SH3_9-PARTIAL|~80-117|1.3e-11^SH3_2-PARTIAL|~80-118|1.9e-10^SH3_3|82-116|9.2e-08^SH3_6|83-112|0.00021^SH2|127-202|1.4e-26^Pkinase_Tyr|242-492|3.3e-102^Pkinase|244-491|2.1e-51^Haspin_kinase|307-388|9.1e-07^F_actin_bind|1027-1130|2.2e-33
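
The PFAM_LEFT and PFAM_RIGHT values pack several domain hits into one field. Judging from the example above, hits appear to be separated by '^' and each hit by '|' into a domain name, a coordinate range, and an E-value; that reading is inferred from the example rather than taken from a format specification. Under that assumption, a value can be split into one line per hit as sketched below (the helper name split_pfam is hypothetical).

```shell
# Hypothetical helper: split a PFAM_LEFT / PFAM_RIGHT value ($1) into one
# tab-delimited line per hit: domain, range, E-value.
# Assumes hits are '^'-separated and their fields are '|'-separated.
split_pfam() {
    printf '%s\n' "$1" \
        | tr '^' '\n' \
        | awk -F'|' 'NF == 3 { print $1 "\t" $2 "\t" $3 }'
}

# Example call:
#   split_pfam 'SH2|127-202|1.4e-26^Pkinase_Tyr|242-492|3.3e-102'
```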

Note, fusion annotation and examination of coding effects are additionally performed on FusionInspector outputs if '--FusionInspector inspect|validate' is invoked in your STAR-Fusion run.

De novo reconstruct fusion transcripts using Trinity

The reconstructed transcripts described above by the 'fusion coding effect' module are based on the reference annotations and the reference genome sequence. If you're interested in de novo reconstruction of fusion transcripts from the actual fusion-supporting RNA-Seq reads, capturing any additional variant information or novel sequence features that may be evident among the reads, invoke the STAR-Fusion '--denovo_reconstruct' parameter. This requires that you also include the '--FusionInspector inspect|validate' setting, because the reconstruction is based on the STAR-Fusion predicted fusions.

Trinity is used to assemble reads aligned to fusion contigs constructed by FusionInspector. If FusionInspector 'inspect' mode is invoked, then only the fusion-evidence reads are de novo assembled. If FusionInspector 'validate' mode is selected, then all reads aligned to the fusion gene contigs are assembled. Trinity contigs identified as fusion transcripts are captured and reported. The de novo reconstructed fusion transcripts are provided as a FASTA file '${star_fusion_output_directory}/FusionInspector/finspector.gmap_trinity_GG.fusions.fasta', and the transcript accessions are reported in the FusionInspector tab-delimited summary output file.

An example Trinity-reconstructed fusion transcript is:

>TRINITY_GG_12_c0_g1_i1 VAPB--IKZF3:1396-28254
CGGTGTCTGGACCAAGGGGCGCAGGGCTTCGGCGCCAAGATAGCTGATGGCGTTATTGATGGCTTGGTCCATCATGCGGG
TCTGTATGAGCTCACTCTCTTTCTCATACATGTAACTTGAATTATAGTTGACATCAAAGCAGTGGCGCTTCTCACCAATG
AATTTCTGAGGCATTGAGCTTTTTCGTTTTGCCACATTGCTTGCTAATCTGTCCAGTACGAGAGCTCTTTCACTTCCCAT
CTCTGCTTTGATGTGTCTTGCCTCCGCACTTGCTCGGAATTTGAGCTCGTGCTGCGGCTCGAGGCTCAGGACCTGCTCCA
CCTTCGCCATGTTCCTTAGCGGCGGAGCACCTTTGGCGGGGAGACCCCTGAGAGGTCACCGGGGCGGGAAGCGTTAATGC
TGCGCCCGCTTTAAGTTTTACAAAAAGGCGGGGACCGGTCGGGGCACGGGCGGGGGTCCTCTACCG
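
To get a quick overview of which fusions were reconstructed, the FASTA headers can be listed. This is a minimal sketch that assumes every header follows the layout of the single example above (accession, a space, then the fusion name followed by ':' and further text). The function name list_trinity_fusions is hypothetical, and the numbers after the colon are not interpreted here.

```shell
# Print "accession<TAB>fusion_name" for each FASTA record.
# Assumed header layout (taken from the one example shown):
#   >TRINITY_GG_12_c0_g1_i1 VAPB--IKZF3:1396-28254
list_trinity_fusions() {
    awk '/^>/ {
        acc = substr($1, 2)       # accession without the leading ">"
        name = $2
        sub(/:.*/, "", name)      # keep only GENE_A--GENE_B
        print acc "\t" name
    }' "$1"
}
```

The function takes the path to the FASTA file as its argument, for example the finspector.gmap_trinity_GG.fusions.fasta file in the FusionInspector output directory.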

See our FusionInspector wiki for more details.

Example data and execution:

In the included testing/ directory, you'll find a small sample of fastq reads from a tumor sample. Find fusions using your CTAT genome resource library like so:

cd testing/
 
../STAR-Fusion --left_fq reads_1.fq.gz --right_fq reads_2.fq.gz \
               -O star_fusion_outdir \
               --genome_lib_dir  /path/to/your/CTAT_resource_lib \
               --verbose_level 2  
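
After the run finishes, a quick sanity check is to count the predicted fusions. The sketch below assumes the output directory contains a tab-delimited predictions file whose header line starts with '#'. The function name count_fusions is hypothetical, and the file name in the usage note is an assumption to verify against your own output directory.

```shell
# Count data rows in a tab-delimited predictions file.
# Assumption: header/comment lines start with '#'; blank lines are ignored.
count_fusions() {
    awk '!/^#/ && NF { n++ } END { print n + 0 }' "$1"
}
```

For example, count_fusions star_fusion_outdir/star-fusion.fusion_predictions.abridged.tsv would print the number of fusion predictions, assuming that file name matches what your STAR-Fusion version writes.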

Want to use Docker?

We provide a Docker image that contains all software pre-installed for running STAR, STAR-Fusion, and FusionInspector, and it's available here: https://hub.docker.com/r/trinityctat/ctatfusion/

If you have docker installed, you can pull the image like so:

docker pull trinityctat/ctatfusion

STAR-Fusion can be run via Docker as shown below, for example from within the '${STAR_FUSION_HOME}/Docker' folder, where ${STAR_FUSION_HOME} is your base installation directory for the STAR-Fusion software.

# Run STAR-Fusion, FusionInspector 'validate', and Trinity de novo reconstruction via Docker.
# This assumes that reads_1.fq.gz and reads_2.fq.gz are in your current working directory,
# along with the ctat_genome_lib_build_dir.

docker run -v `pwd`:/data --rm trinityctat/ctatfusion \
    /usr/local/src/STAR-Fusion/STAR-Fusion \
    --left_fq /data/reads_1.fq.gz \
    --right_fq /data/reads_2.fq.gz \
    --genome_lib_dir /data/ctat_genome_lib_build_dir \
    -O /data/StarFusionOut \
    --FusionInspector validate \
    --examine_coding_effect \
    --denovo_reconstruct
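
Because the command above mounts the current directory at /data, a missing input only surfaces once the container is already running. A small pre-flight check can catch that earlier. This is a sketch only: the function name check_docker_inputs is hypothetical, and the file and directory names are the ones used in the example command.

```shell
# Return 0 only if the inputs the docker command expects under /data
# exist in the host directory that will be mounted (default: current directory).
check_docker_inputs() {
    dir="${1:-.}"
    status=0
    for f in reads_1.fq.gz reads_2.fq.gz; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing reads file: $dir/$f" >&2
            status=1
        fi
    done
    if [ ! -d "$dir/ctat_genome_lib_build_dir" ]; then
        echo "missing genome lib directory: $dir/ctat_genome_lib_build_dir" >&2
        status=1
    fi
    return "$status"
}
```

Running check_docker_inputs with no argument before the docker run command checks the current directory and reports each missing input on standard error.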

Contact Us

Questions, comments, etc?

Visit our STAR-Fusion Google group https://groups.google.com/forum/#!forum/star-fusion

Acknowledgements

This effort was largely inspired by earlier work done by Nicolas Stransky in the landmark publication "The landscape of kinase fusions in cancer" by Stransky et al., Nat Commun 2014, in addition to very nice work done by Daniel Nicorici with his FusionCatcher software.

STAR-Fusion is contributed by Brian Haas (Broad Institute), in collaboration with Alex Dobin (Cold Spring Harbor Laboratory). STAR-Fusion is one of several components being developed as part of the Trinity Cancer Transcriptome Analysis Toolkit (CTAT).
