EXITING because of FATAL GENOME INDEX FILE error: transcriptInfo.tab is corrupt, or is incompatible with the current STAR version SOLUTION: re-generate genome index #978

Open
Kusimeena opened this issue Jul 26, 2020 · 14 comments

@Kusimeena

Kusimeena commented Jul 26, 2020

Hi, I am trying to align RNA-seq data with STAR version 2.7.5a using the following command:
STAR --genomeDir /Users/Home/Desktop/STAR_RNAseq/NCBI_GRCh39_index --readFilesIn 01.fastq.gz --runThreadN 2 --readFilesCommand gunzip -c --outFileNamePrefix 01A --quantMode TranscriptomeSAM GeneCounts

and it ended up with,
EXITING because of FATAL GENOME INDEX FILE error: transcriptInfo.tab is corrupt, or is incompatible with the current STAR version
SOLUTION: re-generate genome index

I checked the Log.out from index generation and it ends with "finished successfully
DONE: Genome generation, EXITING", so I believe there was no error during index generation. Could you help me fix this issue? Thanks.

@Kusimeena Kusimeena changed the title Hi, EXITING because of FATAL GENOME INDEX FILE error: transcriptInfo.tab is corrupt, or is incompatible with the current STAR version SOLUTION: re-generate genome index Jul 26, 2020
@alexdobin
Owner

Hi @Kusimeena

please regenerate the genome and check that your drive has not run out of space after genome generation, which could lead to the corruption of the output files.
If this does not help, please send me the Log.out files for both the genome generation and mapping.

Cheers
Alex
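
The disk-space check suggested above can be sketched as a small shell helper. This is only an illustration, not part of STAR: the function name `check_index_dir` is hypothetical, and treating zero-byte files as a warning sign is an assumption (an interrupted write is one way to get them, but it is not proof of corruption).

```shell
# Hypothetical helper: report free space for the index directory and
# flag zero-byte files, which can indicate a truncated write.
check_index_dir() {
    # $1 = STAR genome index directory
    # Free space on the filesystem that holds the index
    df -h "$1"
    # Collect any empty regular files directly inside the directory
    empty_files=$(find "$1" -maxdepth 1 -type f -size 0)
    if [ -n "$empty_files" ]; then
        echo "Empty files found:"
        echo "$empty_files"
        return 1
    fi
    echo "No empty files in $1"
}
```

It would be called on the index directory from the original command, for example `check_index_dir /Users/Home/Desktop/STAR_RNAseq/NCBI_GRCh39_index`.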

@mtekman

mtekman commented Aug 3, 2020

Hi, I'm seeing this same error message.

Steps to reproduce

Load STAR environment

     conda create -n starnew star==2.7.5b
     conda activate starnew

(Edit: I can confirm this also occurs on version 2.7.5a)

Generate STAR index

    STAR --runMode genomeGenerate \
         --genomeDir 'tempstargenomedir' \
         --readFilesCommand zcat \
         --genomeFastaFiles Homo_sapiens.GRCh38.dna.chromosome.21.fa \
         --sjdbOverhang 100 \
         --sjdbGTFfile Homo_sapiens.GRCh38.100.gtf \
         --genomeSAindexNbases 4 \
         --runThreadN 5

where I retrieved the hg38 chromosome 21 FASTA and the hg38 gtf file from:

  • ftp://ftp.ensembl.org/pub/release-100/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.chromosome.21.fa.gz
  • ftp://ftp.ensembl.org/pub/release-100/gtf/homo_sapiens/Homo_sapiens.GRCh38.100.gtf.gz

This ran successfully:

  Aug 03 14:55:17 ..... started STAR run
  Aug 03 14:55:17 ... starting to generate Genome files
  Aug 03 14:55:18 ..... processing annotations GTF
  Aug 03 14:55:25 ... starting to sort Suffix Array. This may take a long time...
  Aug 03 14:55:25 ... sorting Suffix Array chunks and saving them to disk...
  Aug 03 14:55:46 ... loading chunks from disk, packing SA...
  Aug 03 14:55:47 ... finished generating suffix array
  Aug 03 14:55:47 ... generating Suffix Array index
  Aug 03 14:55:47 ... completed Suffix Array index
  Aug 03 14:55:47 ..... inserting junctions into the genome indices
  Aug 03 14:55:49 ... writing Genome to disk ...
  Aug 03 14:55:49 ... writing Suffix Array to disk ...
  Aug 03 14:55:51 ... writing SAindex to disk
  Aug 03 14:55:51 ..... finished successfully

Run STARsolo on the 1K PBMC v2 test data

I retrieved the datasets from here and the barcodes file from here, and attempted to map only the first lane, L001:

    STAR  --runThreadN 4 \
          --genomeLoad NoSharedMemory \
          --genomeDir tempstargenomedir \
          --readFilesCommand zcat \
          --readFilesIn pbmc_1k_v2_fastqs/pbmc_1k_v2_S1_L001_R2_001.fastq.gz pbmc_1k_v2_fastqs/pbmc_1k_v2_S1_L001_R1_001.fastq.gz \
          --soloType Droplet \
          --soloCBwhitelist 737K-august-2016.txt \
          --soloBarcodeReadLength 1  \
          --soloCBstart 1 \
          --soloCBlen 16 \
          --soloUMIstart 17 \
          --soloUMIlen 10 \
          --soloStrand 'Forward' \
          --soloFeatures 'Gene' \
          --soloUMIdedup '1MM_All'

and the error message I receive is:

Aug 03 14:56:52 ..... started STAR run
Aug 03 14:56:52 ..... loading genome

EXITING because of FATAL GENOME INDEX FILE error: transcriptInfo.tab is corrupt, or is incompatible with the current STAR version
SOLUTION: re-generate genome index
Aug 03 14:56:53 ...... FATAL ERROR, exiting

Attached are the transcriptInfo.tab file (renamed to .txt for uploading) in my tempstargenomedir and the Log.out file from the STARsolo run.

transcriptInfo.tab.txt
Log.out.txt

@mtekman

mtekman commented Aug 3, 2020

It looks like STAR does not like the fact that the GTF file specifies more chromosomes than the FASTA.

After constraining the GTF file to just chromosome 21, it seems to progress to the mapping stage:

    # Keep the "#!" header lines, then append only records whose first column is exactly 21
    grep '^#!' Homo_sapiens.GRCh38.100.gtf >  Homo_sapiens.GRCh38.100.chr21.gtf
    awk -F'\t' '$1 == "21"' Homo_sapiens.GRCh38.100.gtf >> Homo_sapiens.GRCh38.100.chr21.gtf
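
To see in advance whether a GTF names sequences that the FASTA lacks, a check along these lines may help. It is a minimal sketch: the function name `gtf_only_chroms` is hypothetical, and it assumes the sequence name is the first whitespace-delimited token after `>` in each FASTA header, as in the Ensembl files used here.

```shell
# Hypothetical helper: print sequence names used in the GTF
# that do not appear in any FASTA header.
gtf_only_chroms() {
    # $1 = FASTA file, $2 = GTF file
    awk -F'\t' '
        NR == FNR {                      # first file: the FASTA
            if (/^>/) { split(substr($0, 2), a, /[ \t]/); seen[a[1]] = 1 }
            next
        }
        /^#/ { next }                    # skip GTF header/comment lines
        !($1 in seen) && !reported[$1]++ { print $1 }
    ' "$1" "$2"
}
```

For the files in this report it would be run as `gtf_only_chroms Homo_sapiens.GRCh38.dna.chromosome.21.fa Homo_sapiens.GRCh38.100.gtf`; any name it prints is present in the annotation but not in the genome sequence.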

@Kusimeena
Author

Kusimeena commented Aug 4, 2020 via email

@alexdobin
Owner

Hi Meena,

the files did not get attached. They cannot be attached in an email reply; you would need to upload them via the GitHub site.

Cheers
Alex

@Kusimeena
Author

Hi Alex,

Here are the files:
01_TestMappingLog.out.zip

IndexLog.out.zip

@alexdobin
Owner

Hi Meena, Mehmet,

Mehmet is right - this issue occurs when the GTF file contains extra chromosomes not present in the FASTA file.
I will fix the issue shortly and release 2.7.5c.
The bug was introduced in 2.7.5a - so for now you can fall back to 2.7.4a for genome generation.
However, it's always better to sync your FASTA and GTF files.

Cheers
Alex
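
One way to keep the FASTA and GTF in sync, generalizing the chromosome 21 filter earlier in this thread, is to drop every GTF record whose sequence name has no FASTA header. The following is a rough sketch only: `sync_gtf_to_fasta` is a hypothetical name, and it assumes the sequence name is the first token after `>` in each FASTA header.

```shell
# Hypothetical helper: write to stdout only the GTF header lines and the
# records whose sequence name also appears in the FASTA.
sync_gtf_to_fasta() {
    # $1 = FASTA file, $2 = GTF file
    awk -F'\t' '
        NR == FNR {                      # first file: collect FASTA names
            if (/^>/) { split(substr($0, 2), a, /[ \t]/); keep[a[1]] = 1 }
            next
        }
        /^#/ || ($1 in keep)             # keep comments and matching records
    ' "$1" "$2"
}
```

Example use with the files named above: `sync_gtf_to_fasta Homo_sapiens.GRCh38.dna.chromosome.21.fa Homo_sapiens.GRCh38.100.gtf > Homo_sapiens.GRCh38.100.chr21.gtf`, after which the filtered file would be passed to `--sjdbGTFfile`.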

alexdobin added a commit that referenced this issue Aug 17, 2020
…or cases where GTF file contains extra chromosomes not present in FASTA files.
@alexdobin
Owner

Hi Meena, Mehmet,

This bug is fixed in 2.7.5c, please try it out.
Thanks for reporting it!

Cheers
Alex

@davidrequena

Hello Alex, I found the error again in version 2.7.7a, could you please take a look?
Thanks,
David

@alexdobin
Owner

Hi David,

I do not see this problem in my tests in 2.7.7a.
Have you regenerated the genome with 2.7.7a?
Please send me the first 2 lines from the transcriptInfo.tab file in the genome directory to check for this issue.

Cheers
Alex
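
The two lines requested above can be pulled out as follows. This is a trivial sketch; `transcript_info_head` is a hypothetical name, and the genome directory is whatever was passed to `--genomeDir`.

```shell
# Hypothetical helper: print the first two lines of transcriptInfo.tab
# from a STAR genome directory.
transcript_info_head() {
    # $1 = STAR genome directory
    head -n 2 "$1/transcriptInfo.tab"
}
```

For the index built earlier in this thread, that would be `transcript_info_head tempstargenomedir`.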

@jcolinge

jcolinge commented Oct 6, 2021

Dear Alex,

I am experiencing a similar problem. I was doing a 2-pass alignment, generating intermediate genome files for pass 2, using the mouse genome from Ensembl (Mus_musculus.GRCm39.104.gtf and Mus_musculus.GRCm39.dna.primary_assembly.fa). After the first pass with STAR 2.7.1a, I got this output while generating the intermediate reference genome:

EXITING because of FATAL error, the sjdb chromosome 20 is not found among the genomic chromosomes
SOLUTION: fix your file(s) --sjdbFileChrStartEnd or --sjdbGTFfile, offending junction:20 234377 235269

Based on the discussion above, I compiled and used STAR 2.7.9a but I still get:

EXITING because of FATAL error, the sjdb chromosome 20 is not found among the genomic chromosomes
SOLUTION: fix your file(s) --sjdbFileChrStartEnd or --sjdbGTFfile, offending junction:20 234377 235269

I do not know what to do at this stage. Thanks for your help.

Best,
Jacques

@alexdobin
Owner

Hi Jacques,

it looks like the file with splice junctions contains chromosomes that are not present in the genome.
What are the STAR commands that you are using?

Cheers
Alex
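
A quick way to test that explanation is to compare the chromosome names in the junction file against the names stored in the genome index. The sketch below rests on two assumptions: that the junction file has the chromosome in its first column (as in SJ.out.tab), and that the genome directory contains a chrName.txt with one chromosome name per line. The function name `sjdb_missing_chroms` is hypothetical. Note that the mouse genome has autosomes 1 to 19, so a junction on chromosome 20 may suggest the junction file was produced against a different genome.

```shell
# Hypothetical helper: print chromosome names that occur in a junction
# file but are not listed in the genome index.
sjdb_missing_chroms() {
    # $1 = junction file (e.g. SJ.out.tab), $2 = STAR genome directory
    awk '
        NR == FNR { genome[$1] = 1; next }   # chrName.txt: one name per line
        !($1 in genome) && !reported[$1]++ { print $1 }
    ' "$2/chrName.txt" "$1"
}
```

An empty result means every junction chromosome is known to the index; any printed name points at the mismatch.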

@jcolinge

jcolinge commented Oct 7, 2021 via email

@alexdobin
Owner

Hi Jacques,

was the /share/apps/STAR/indexes/Mus_musculus genome index generated with the same
--genomeFastaFiles /share/apps/STAR/indexes/Mus_musculus/Mus_musculus.GRCm39.dna.primary_assembly.fa
and
--sjdbGTFfile /share/apps/STAR/indexes/Mus_musculus/Mus_musculus.GRCm39.104.gtf
files?

Please send me the output of the failed run.

Thanks!
Alex
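
Which FASTA and GTF an existing index was built from can usually be read back from the index itself. The sketch below assumes the genome directory contains a genomeParameters.txt that records `versionGenome`, `genomeFastaFiles` and `sjdbGTFfile` as tab-separated key/value lines; that layout is an assumption to verify against your own index, and `index_inputs` is a hypothetical name.

```shell
# Hypothetical helper: show the STAR version and input files recorded
# in a genome index (assumes genomeParameters.txt has these keys).
index_inputs() {
    # $1 = STAR genome directory
    grep -E '^(versionGenome|genomeFastaFiles|sjdbGTFfile)[[:space:]]' \
        "$1/genomeParameters.txt"
}
```

For the index in question that would be `index_inputs /share/apps/STAR/indexes/Mus_musculus`, and the printed paths can then be compared with the files used in the failing run.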
