
EXITING because of FATAL ERROR in reads input: short read sequence line: 0 #493

Closed
juulluu21 opened this issue Sep 21, 2018 · 19 comments
Labels
issue: code Likely to be an issue with STAR code

Comments

@juulluu21

juulluu21 commented Sep 21, 2018

Hi,

I know this is a very old question. Has it been solved yet?
I am running STAR on paired-end reads:

star --runThreadN 50 --genomeDir index --readFilesIn ASS_1_1.fq.gz ASS__1_2.fq.gz --outFileNamePrefix ASS.mapped.sam --readFilesCommand gzcat

Sep 21 16:20:32 ..... started STAR run
Sep 21 16:20:33 ..... loading genome
Sep 21 16:20:39 ..... started mapping

EXITING because of FATAL ERROR in reads input: short read sequence line: 0
Read Name=@NB501259:103:HNTJLBGX2:1:21310:2440:4630
Read Sequence====
DEF_readNameLengthMax=50000
DEF_readSeqLengthMax=650

Sep 21 16:21:53 ...... FATAL ERROR, exiting
Segmentation fault: 11

I have trimmed my reads with cutadapt. Is there any option in STAR to work around this problem? Or do I have to do something with cutadapt now?

Thanks.

@alexdobin
Owner

Hi @juulluu21

this indicates a problem with the formatting of the fastq files.
Please check that gzcat works as a .gz decompressor on your system - it's usually just zcat.

Cheers
Alex
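A quick way to verify the decompression command before handing it to --readFilesCommand is to run it by hand and confirm it writes the uncompressed fastq to stdout. A minimal sketch with a toy one-read fastq (substitute your own file):

```shell
# Sanity-check the decompressor: it must print the uncompressed fastq
# to stdout. 'gzip -cd' is a portable equivalent of Linux's 'zcat'.
printf '@r1\nACGT\n+\nIIII\n' > tiny.fq   # toy one-read fastq
gzip -c tiny.fq > tiny.fq.gz
gzip -cd tiny.fq.gz | head -4             # should print the 4 lines back
```

If this prints garbage or errors out, STAR will see the same broken stream.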

@alexdobin alexdobin added the issue: code Likely to be an issue with STAR code label Sep 28, 2018
@juulluu21
Author

Hi Alex,

Thanks very much. I messed up during the adapter clipping step, I guess. Now it's working fine. I have also changed zcat to the gunzip -c option. Here is my command:

star --runThreadN 36 --genomeDir ./Genome --readFilesIn R1.fastq.gz R2.fastq.gz --outFileNamePrefix R.sam --readFilesCommand gunzip -c

I have two more questions.

  1. I am mapping the reads against my assembled transcriptome. Do you recommend changing any of the above parameters? The transcriptome file has ~25000 contigs (assembled transcripts).

  2. During sequencing, the library was not ribo-depleted. I am getting ~30% mapping with the parameters (command) mentioned above. Do you think this is normal? Here is my Log.final.out file:

Started job on | Oct 01 12:21:42
Started mapping on | Oct 01 12:21:49
Finished on | Oct 01 12:32:39
Mapping speed, Million of reads per hour | 234.93
Number of input reads | 42417211
Average input read length | 298
UNIQUE READS:
Uniquely mapped reads number | 11755143
Uniquely mapped reads % | 27.71%
Average mapped length | 284.56
Number of splices: Total | 84399
Number of splices: Annotated (sjdb) | 0
Number of splices: GT/AG | 29043
Number of splices: GC/AG | 3003
Number of splices: AT/AC | 468
Number of splices: Non-canonical | 51885
Mismatch rate per base, % | 1.20%
Deletion rate per base | 0.05%
Deletion average length | 2.19
Insertion rate per base | 0.03%
Insertion average length | 3.62
MULTI-MAPPING READS:
Number of reads mapped to multiple loci | 599926
% of reads mapped to multiple loci | 1.41%
Number of reads mapped to too many loci | 327
% of reads mapped to too many loci | 0.00%
UNMAPPED READS:
% of reads unmapped: too many mismatches | 0.00%
% of reads unmapped: too short | 70.85%
% of reads unmapped: other | 0.02%
CHIMERIC READS:
Number of chimeric reads | 0
% of chimeric reads | 0.00%

Thanks a ton again, Alex!

@alexdobin
Owner

Hi @juulluu21

compared to the normal mapping rate of >90% to a genome, you are getting low mappability.
It might indicate an incomplete transcriptome assembly; other possibilities are contamination (BLAST a few unmapped reads to check) or poor sequencing quality (check the quality score distribution).

Cheers
Alex

@juulluu21
Author

Actually, most of the reads come from rRNA. My library prep didn't ribo-deplete the sample.

However, I see several mismatches in the reads (when I view them in IGV). Would you suggest more stringent mapping? I am also not very happy with the transcriptome I am mapping against (I see many mismatches/SNPs when we Sanger-sequence some genes).

Thanks very much Alex!

@alexdobin
Owner

Hi @juulluu21

ribosomal reads should appear as multimappers - if their sequences are present in the genome assembly.
You could try a less stringent mapping - allowing more mismatches and/or shorter mapped length.

Cheers
Alex

@juulluu21
Author

Hi Alex,

Thanks very much for your support!

I am now using these parameters:
--outFilterScoreMinOverLread 0.3 --outFilterMatchNminOverLread 0.3 --outFilterMismatchNmax 20

My mapping has increased by 10%, but there are still ~65% unmapped reads (% of reads unmapped: too short):
Number of input reads | 29216568
Average input read length | 299
UNIQUE READS:
Uniquely mapped reads number | 9662772
Uniquely mapped reads % | 33.07%
Average mapped length | 250.60
Number of splices: Total | 54725
Number of splices: Annotated (sjdb) | 0
Number of splices: GT/AG | 29601
Number of splices: GC/AG | 1814
Number of splices: AT/AC | 173
Number of splices: Non-canonical | 23137
Mismatch rate per base, % | 1.64%
Deletion rate per base | 0.07%
Deletion average length | 1.65
Insertion rate per base | 0.04%
Insertion average length | 3.82
MULTI-MAPPING READS:
Number of reads mapped to multiple loci | 515764
% of reads mapped to multiple loci | 1.77%
Number of reads mapped to too many loci | 792
% of reads mapped to too many loci | 0.00%
UNMAPPED READS:
% of reads unmapped: too many mismatches | 0.00%
% of reads unmapped: too short | 65.14%
% of reads unmapped: other | 0.02%
CHIMERIC READS:
Number of chimeric reads | 0
% of chimeric reads | 0.00%

If I use these parameters
--outFilterScoreMinOverLread 0.0 --outFilterMatchNminOverLread 0.0 --outFilterMismatchNmax 25
I get this mapping
Number of input reads | 29216568
Average input read length | 299
UNIQUE READS:
Uniquely mapped reads number | 21430073
Uniquely mapped reads % | 73.35%
Average mapped length | 142.05
Number of splices: Total | 58962
Number of splices: Annotated (sjdb) | 0
Number of splices: GT/AG | 33036
Number of splices: GC/AG | 1837
Number of splices: AT/AC | 175
Number of splices: Non-canonical | 23914
Mismatch rate per base, % | 3.42%
Deletion rate per base | 0.07%
Deletion average length | 2.06
Insertion rate per base | 0.03%
Insertion average length | 3.65
MULTI-MAPPING READS:
Number of reads mapped to multiple loci | 7538270
% of reads mapped to multiple loci | 25.80%
Number of reads mapped to too many loci | 235685
% of reads mapped to too many loci | 0.81%
UNMAPPED READS:
% of reads unmapped: too many mismatches | 0.02%
% of reads unmapped: too short | 0.00%
% of reads unmapped: other | 0.02%
CHIMERIC READS:
Number of chimeric reads | 0
% of chimeric reads | 0.00%

If I use these parameters
--outFilterScoreMinOverLread 0.2 --outFilterMatchNminOverLread 0.2 --outFilterMismatchNmax 25

Number of input reads | 29216568
Average input read length | 299
UNIQUE READS:
Uniquely mapped reads number | 11676022
Uniquely mapped reads % | 39.96%
Average mapped length | 223.20
Number of splices: Total | 58467
Number of splices: Annotated (sjdb) | 0
Number of splices: GT/AG | 32605
Number of splices: GC/AG | 1837
Number of splices: AT/AC | 175
Number of splices: Non-canonical | 23850
Mismatch rate per base, % | 2.23%
Deletion rate per base | 0.08%
Deletion average length | 1.93
Insertion rate per base | 0.04%
Insertion average length | 3.71
MULTI-MAPPING READS:
Number of reads mapped to multiple loci | 729810
% of reads mapped to multiple loci | 2.50%
Number of reads mapped to too many loci | 105807
% of reads mapped to too many loci | 0.36%
UNMAPPED READS:
% of reads unmapped: too many mismatches | 0.00%
% of reads unmapped: too short | 57.16%
% of reads unmapped: other | 0.02%
CHIMERIC READS:
Number of chimeric reads | 0
% of chimeric reads | 0.00%

If I use these parameters
--outFilterScoreMinOverLread 0.1 --outFilterMatchNminOverLread 0.1 --outFilterMismatchNmax 25

Number of input reads | 29216568
Average input read length | 299
UNIQUE READS:
Uniquely mapped reads number | 17749123
Uniquely mapped reads % | 60.75%
Average mapped length | 165.07
Number of splices: Total | 58961
Number of splices: Annotated (sjdb) | 0
Number of splices: GT/AG | 33035
Number of splices: GC/AG | 1837
Number of splices: AT/AC | 175
Number of splices: Non-canonical | 23914
Mismatch rate per base, % | 3.23%
Deletion rate per base | 0.08%
Deletion average length | 2.06
Insertion rate per base | 0.04%
Insertion average length | 3.66
MULTI-MAPPING READS:
Number of reads mapped to multiple loci | 2750055
% of reads mapped to multiple loci | 9.41%
Number of reads mapped to too many loci | 164514
% of reads mapped to too many loci | 0.56%
UNMAPPED READS:
% of reads unmapped: too many mismatches | 0.01%
% of reads unmapped: too short | 29.25%
% of reads unmapped: other | 0.02%
CHIMERIC READS:
Number of chimeric reads | 0
% of chimeric reads | 0.00%

I am mapping the reads against a transcriptome (~25000 small assembled contigs). I do not have a genome. With outFilterScoreMinOverLread 0.1 and outFilterScoreMinOverLread 0.0, I guess I am getting mostly non-specific mapping. So I guess I have to be satisfied with --outFilterScoreMinOverLread 0.3 --outFilterMatchNminOverLread 0.3 --outFilterMismatchNmax 20.

Do you suggest changing the balance between --outFilterScoreMinOverLread 0.3 and --outFilterMatchNminOverLread 0.3?

Thanks a ton, Alex!

@alexdobin
Owner

Hi @juulluu21

as you reduce the mapping stringency (minimum mapped length) you can map more reads; however, only small portions of those reads are mapped, which cannot guarantee a correct mapping location. Effectively, you are trading precision for higher sensitivity. You will have to consider whether your downstream analysis tolerates low precision (i.e. a high false positive rate). In general, I would not recommend going below 0.6 for these parameters.

The best solution to low mappability is to figure out why it happens.
For instance, if your transcriptome assembly is incomplete, a lot of the unmapped reads likely belong to parts of the genome not included in the assembly, and forcing them to map to the limited transcriptome will skew your results - it might be better to leave such reads unmapped.

Cheers
Alex
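To make these thresholds concrete: --outFilterMatchNminOverLread is the minimum number of matched bases normalized to the total read length (the sum of both mates for paired-end data, ~300 bp in this thread), and 0.66 is the STAR default. The following is illustrative arithmetic, not STAR output:

```shell
# Minimum matched bases a ~300 bp read pair must achieve at each
# --outFilterMatchNminOverLread value tried in this thread
# (0.66 is the STAR default). Illustrative arithmetic only.
for f in 0.66 0.3 0.2 0.1 0.0; do
  awk -v f="$f" 'BEGIN { printf "%s -> at least %.0f matched bases\n", f, f * 300 }'
done
```

At 0.1 a pair can "map" on just 30 matched bases out of 300, which is why the average mapped length in the logs above drops so sharply as the threshold goes down.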

@juulluu21
Author

Thanks Alex.
Ok, I will not compromise on stringency; I do not want low-confidence mapping. Actually, ribosomal RNAs were not depleted during library prep. If I add rRNA to my transcriptome file, the % mapping goes up to 70% and more.

I will be satisfied with my ~30% mapping.

Thanks!!

@zillurbmb51

zillurbmb51 commented Feb 21, 2019

Hello there,
I am having the same problem. I have uncompressed my fastq files manually before mapping with STAR.
(base) [zillur@genomics analysis]$ head ../concat/A1001_S20_R1.fq @NS500438:247:HHK2FBGX7:1:11101:18152:1051 1:N:0:GGAATGTC TTTTTTCACTGACCCGGTGAGGCGGGGGGGCGAGCCCCGAGGGGCTCTCGCTTCTGGCGCCAAGCGCCCGGCCGCG + AAAA/EE/AEEEEEE6EEEAEEEEEEEEEEEEEAEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEAEEAEEEEEEE< @NS500438:247:HHK2FBGX7:1:11101:20051:1053 1:N:0:GGAATGTC CCTGGCCTTTCTATTAGCTCTTAGTAAGATTACACATGCAAGCATCCCCGTTCCAGTGAGTTCACCCTCTAAATC + AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE @NS500438:247:HHK2FBGX7:1:11101:6358:1054 1:N:0:GGAATGTC TCGGGGGTCGCGTAACTAGTTAGCATGCCAGAGTCTCGTTCGTTATCGGAATTAACCAGACAAATCGCTCCACCAA
STAR is giving me same error whether the fastq files are compressed or uncompressed. Any help?
Best regards
Zillur

@alexdobin
Owner

Hi @zillurbmb51

it looks like the fastqs are not properly formatted.
The "+" has to be on a separate line, i.e.
@NS500438:247:HHK2FBGX7:1:11101:18152:1051 1:N:0:GGAATGTC
TTTTTTCACTGACCCGGTGAGGCGGGGGGGCGAGCCCCGAGGGGCTCTCGCTTCTGGCGCCAAGCGCCCGGCCGCG
+
AAAA/EE/AEEEEEE6EEEAEEEEEEEEEEEEEAEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEAEEAEEEEEEE<

Cheers
Alex
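A structural check along these lines catches merged or truncated records before STAR does. A sketch (check.fq is a toy placeholder; point it at your real file):

```shell
# Quick structural check of a fastq: every record must be 4 lines,
# with line 1 starting '@' and line 3 starting '+'.
# Prints nothing when the file is well-formed.
printf '@r1\nACGT\n+\nIIII\n' > check.fq   # toy input; use your own file
awk 'NR % 4 == 1 && $0 !~ /^@/  { print "bad header at line " NR }
     NR % 4 == 3 && $0 !~ /^\+/ { print "bad + line at line " NR }
     END { if (NR % 4 != 0) print "truncated: " NR " lines" }' check.fq
```

Note this assumes 4-line records (no wrapped sequences), which is what Illumina fastqs and STAR expect.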

@zillurbmb51

Thank you for the quick reply. The "+" is on a separate line; it got merged here after copying (see the attached screenshot)!!
Thanks again.
screen shot 2019-02-21 at 5 05 02 pm

@alexdobin
Owner

These lines look OK.
Please try to cut out a few lines, e.g.
$ head -n1000 ../concat/A1001_S20_R1.fq > smallR1.fastq
and map only those reads.
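One caveat when subsetting paired-end data: take the same number of lines (a multiple of 4) from both mate files so the reads stay paired. A sketch with toy files standing in for the real R1/R2 pair:

```shell
# Toy mate files stand in for the real A1001_S20_R1.fq / R2 pair here.
printf '@a\nACGT\n+\nIIII\n@b\nGGCC\n+\nIIII\n' > R1.fq
printf '@a\nTTAA\n+\nIIII\n@b\nCCAA\n+\nIIII\n' > R2.fq
# Same line count for both mates, and a multiple of 4
# (e.g. head -n1000 keeps the first 250 reads of each):
head -n4 R1.fq > smallR1.fastq
head -n4 R2.fq > smallR2.fastq
```

If the two subsets go out of sync, STAR will be fed mismatched mates and the test tells you nothing about the original error.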

@zehemminger

Hey Alex,

Just a follow-up question related to this post. I am having a similar error: paired-end reads with the same format as zillurbmb51's. I have not tried your suggestion of a smaller subset, but when I ran them as single-end instead of paired-end reads they aligned fine without errors. Have you seen this before? I also had issues with multiple threads: my ulimit is unlimited, but I was only able to run mapping jobs with 20 threads. Anything more and I would get a permissions error when creating files. For reference, I am running STAR 2.7.0f. Also, I just wanted to say that you do an amazing job providing support to us all!

@alexdobin
Owner

Hi @zehemminger

the OP's error was never resolved; I cannot reproduce it with my data. If you can find a small subset of reads that still causes the error, please send it to me along with the Log.out file.
For ulimit, you need to run
$ ulimit -Sn
$ ulimit -Hn
which will give you the soft and hard limits. You can then increase the soft limit up to the hard limit without root privileges.

Cheers
Alex
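In a POSIX shell the check-and-raise looks like this (raising the soft limit only lasts for the current shell session; it must be at least the hard limit's value or lower):

```shell
# Check the open-file limits STAR will run under, then raise the
# soft limit up to the hard limit (no root needed).
ulimit -Sn                   # current soft limit
ulimit -Hn                   # hard limit
ulimit -Sn "$(ulimit -Hn)"   # raise soft limit to the hard limit
ulimit -Sn                   # confirm the new soft limit
```

Run this in the same shell (or job script) that launches STAR, since the change does not persist across sessions.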

@Buuntu

Buuntu commented May 2, 2019

I am getting a similar error. When running on fastq.gz files (which I've confirmed are formatted correctly), I get:

EXITING because of FATAL ERROR in input reads: unknown file format: the read ID should start with @ or > 

I am running the command

STAR --runThreadN 16 --genomeDir star --readFilesIn forward_reads.fastq.gz reverse_reads.fastq.gz --outFileNamePrefix star --readFilesCommand gzip -c

And first few lines of a sample file:

@NB502126:79:HGWMMBGX7:1:11101:7703:1047 1:N:0:TCCGCGAA+GCCTCTAT
CTAGTTCCTTCACCCGAGTTCTCTCAAGCGCCTTGGTATTCTCTACCTGACCNNCTGTNTCGGTTTGGGGTACGAT
+
AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE##EEEE#EEEEEEEEEEEEEEEEE
@NB502126:79:HGWMMBGX7:1:11101:20738:1047 1:N:0:TCCGCGAA+GCCTCTAT
CAGCGTTCAAAGGCGGCTTCCAGCGCCTGCTCCGCTTCGGCCAGCTGCGCCANNACTTNCTGCGTACGATCGTGCG
+
AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE/EEEEEEEEEE##EEEE#EEEAEEEEEE/EEEEEE
@NB502126:79:HGWMMBGX7:1:11101:8442:1047 1:N:0:TCCGCGAA+GCCTCTAT
GCACTACATGCTTAGTCAGTCTTTACATTCGCTTGCCAGCTGCGGACAGACANNCCACNAACAAACTAGCCTGATT
+

@Buuntu

Buuntu commented May 2, 2019

Okay, so when I run with zcat or gunzip -c instead, I'm now getting:

STAR --runThreadN 16 --genomeDir output/star/ --readFilesIn output/trimmed_forward_reads.fastq.gz output/trimmed_reverse_reads.fastq.gz --outFileNamePrefix output/star --readFilesCommand gunzip -c
May 02 19:31:17 ..... started STAR run
May 02 19:31:17 ..... loading genome
May 02 19:31:18 ..... started mapping
[1]    12366 segmentation fault (core dumped)  STAR --runThreadN 16 --genomeDir output/star/ --readFilesIn      gunzip -c

@alexdobin
Owner

Hi Buuntu,

please send me the Log.out file.

Cheers
Alex

@Buuntu

Buuntu commented May 3, 2019

So I was able to fix the error by using the option --genomeSAindexNbases 10 when generating the index. My genome is 5.6 MB, so quite small. It takes a while to run with that option, but it doesn't crash anymore. I guess I will have to set this manually depending on my genome size?

Thanks Alex

@alexdobin
Owner

Right - at the moment this parameter needs to be scaled down for small genomes (see Manual section 2.2.5).
Cheers
Alex
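The manual's scaling rule is min(14, log2(GenomeLength)/2 - 1). A quick check for the 5.6 Mb genome discussed above:

```shell
# Recommended --genomeSAindexNbases for a small genome
# (STAR manual, section 2.2.5): min(14, log2(GenomeLength)/2 - 1).
awk -v len=5600000 'BEGIN {
  v = log(len) / log(2) / 2 - 1   # log2(len)/2 - 1
  if (v > 14) v = 14              # cap at the default of 14
  printf "%d\n", v
}'
```

For len=5600000 this yields 10, matching the value that fixed the crash above; for genome-sized inputs the formula exceeds 14 and the default cap applies.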
