Issues regarding flair #1

ashokpatowary · 2018-10-15T18:40:35Z

I tried to run the flair pipeline on my MinION RNAseq dataset. I have faces some issues.

while trying align I have came across the following error

[ERROR] unknown preset ' splice '

So I mapped the reads independently with minimap2. Thereafter while using the collapse options python /u/home/Resource/flair/flair.py collapse -r test.fastq -g GRCh37.primary_assembly.genome.fa -q test_strand_corrected.psl -m minimap2 -f GenCode_v28.gtf , it gives me a blank output file. Also I tried using psl_to_sequence.py tools separately, it gives a blank output file.

Am I doing anything wrong?

The text was updated successfully, but these errors were encountered:

belgravia · 2018-10-15T19:25:08Z

I am taking a guess that what's causing psl_to_sequence.py to fail is an inability to match the chromosome names in the reference genome fasta with the names in the psl. I have pushed a new version of bin/psl_to_sequence.py that may help. Otherwise, I'd need to take a quick look at what your files look like.

In regards to the minimap2 usage error, that's been fixed with the newest flair.py wrapper script I just pushed, thank you for notifying me of that. That step is definitely fine to do outside of the wrapper script, as long as you convert to psl moving forward.

ashokpatowary · 2018-10-15T19:37:12Z

Hi @belgravia

Thanks for your quick response. I will try the updated one now. Just one more query. While running the last version, I was having 723040 entries in my input *.psl, 722688 entries in *_strand.psl; 129339 entries in corrected.gp; and only 10941 entries in *_strand_corrected.psl. Does it looks ok?

Thanks

belgravia · 2018-10-15T21:17:41Z

There is an expected drop in entries after correction since any reads that contain a splice junction not conforming to GT-AG splice motif and isn't within window of an annotated GT-AG is removed. A >40% drop in entries is larger than what I've normally seen.

If *_strand_corrected.psl is your output file name from the flair correct step, then what may be happening is:

Many entries are being filtered out for uncorrectable splice sites (--> corrected.gp)
More entries are being filtered out for containing novel, but valid sites (these filtering steps produce strand_corrected.psl)

Potential fixes are:
1a) You could increase the window size (-w) or use a more comprehensive annotation
1b) Your genepred genome annotation may be misformatted for how FLAIR expects it. Have you compared with the genepred in the example files yet?
2a) May be fixed with a better annotation file (1b)
2b) If you choose to move forward with novel, valid splice sites you can now specify just -n in the correct command and those will be retained (please update your flair.py script in that case)

And there should be little to no drop for the first strand/gene inference step. To address that, I'd have to look at a the missing entries since it's unclear to me what could be causing that.

ashokpatowary · 2018-10-17T00:07:00Z

Hi @belgravia

I am still having issues with running it. Also I tried to check the intermidiate files. Run4_strand_corrected_novels.collapse1.psl is a blank file

Inferring strand for reads in PSL
Correcting reads in Run4_strand.psl
chrY
chrX
chr13
chr12
chr11
chr10
chr17
chr16
chr15
chr14
chr19
chr18
chrM
chr22
chr20
chr21
chr7
chr6
chr5
chr4
chr3
chr2
chr1
chr9
chr8
Converting output gp to PSL
Collapsing isoforms
Read data extracted, collapsing
Traceback (most recent call last):
File "/u/home/flair/bin/collapse_isoforms_precise.py", line 303, in
ends = find_best_sites(isoforms[chrom][jset]['tss'], isoforms[chrom][jset]['tss_tes'], chrom)
File "/u/home/Resource/flair/bin/collapse_isoforms_precise.py", line 124, in find_best_sites
found_tss = find_tsss(sites_tss_all, finding_tss=True, max_results=max_results, chrom=chrom)
File "/u/home/Resource/flair/bin/collapse_isoforms_precise.py", line 105, in find_tsss
if t in annotends[chrom] and t - bestsite[0] < closest_annotated - bestsite[0]:
KeyError: 'chrY'
Filtering isoforms
Renaming isoforms
Aligning reads to first-pass isoform reference
[M::mm_idx_gen::0.0017.01] collected minimizers
[M::mm_idx_gen::0.0035.19] sorted minimizers
[M::main::0.0035.06] loaded/built the index for 0 target sequence(s)
[M::mm_mapopt_update::0.0034.92] mid_occ = 0
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 0
[M::mm_idx_stat::0.0034.82] distinct minimizers: 0 (-nan% are singletons); average occurrences: -nan; average spacing: -nan
[M::worker_pipeline::23.1350.76] mapped 453269 sequences
[M::main] Version: 2.12-r847-dirty
[M::main] CMD: minimap2 -a -t 4 --secondary=no /u/home/MinION/Run_04_Part/correction_output/Run4_strand_corrected_novels.collapse1.fa /u/project/
/nanopore/test.fastq
[M::main] Real time: 23.195 sec; CPU: 17.540 sec; Peak RSS: 0.845 GB
Counting isoform expression
Filtering isoforms by read coverage
Removing intermediate files/done!

belgravia · 2018-10-17T20:43:51Z

After the FLAIR v1..0 release, I had pushed an updated version of collapse_isoforms_precise.py that takes an optional gtf that also contained the bug that you found. I have fixed it now, so you can either use the most recent version (commit aa25f2c) or just use the version in the v1.0 release (no gtf support).

On another note, I hope you have fixed your issue with the loss of reads after novel junction filtering!

ashokpatowary · 2018-10-18T00:01:32Z

@belgravia

It working perfect with updates. -novel option solve the issues related to loss of reads. Thank you for your help. I am having one last query. During collapse the minimap2 command use was CMD: minimap2 -a -t 4 --secondary=no /u/home/MinION/Run_04_Part/correction_output/Run4_strand_corrected_novels.collapse1.fa /u/project/nanopore/big.fastq

Are you considering the collapse.fa as the ref genome?

Regards

belgravia · 2018-10-18T19:31:16Z

I'm glad it's working for you now.
Yes, that step is aligning the reads directly to the first-pass isoforms to assign an isoform to each read. Isoforms that surpass some number threshold of reads supporting them are retained in the final isoform output.

ashokpatowary · 2018-10-18T19:37:41Z

@belgravia

Thanks for the explanation and also for this nice tools. It makes life easy.

ashokpatowary · 2018-11-26T21:35:34Z

Hi Alison, I have a query indirectly related to flair script. In the flair output, we get the fasta files on the NOVEL isoforms. Is there any way we can have the fasta files of the KNOWN isoforms of the flair-identified NOVEL isoforms from the input file? Thanks and regards

…

On Thu, Oct 18, 2018 at 12:36 PM Alison ***@***.***> wrote: I'm glad it's working for you now. Yes, that step is aligning the reads directly to the first-pass isoforms to assign an isoform to each read. Isoforms that surpass some number threshold of reads supporting them are retained in the final isoform output. — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub <#1 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AopMlU0aleGY9QBh482riCA5rwUs6SMBks5umNgggaJpZM4Xc6oZ> .

belgravia · 2018-11-30T00:24:00Z

The fasta files should include all isoforms, including the annotated ones. If you run bin/identify_similar_annotated_isoforms.py on your final isoform psl, it should rename the isoforms that are annotated to their names in the annotation. This script is already run in the flair.py wrapper.

So for human data, the annotated isoforms should be renamed to 'ENST....' and the novel ones should have the nanopore read name as their name.

Also I'm going to be pushing a faster version of this script and others quite soon, if you can wait a little bit.

ashokpatowary · 2018-12-05T00:16:18Z

Hi Alison, Thanks for your response. I can see and updated version in flair. Is this the version you were talking about? Regards Ashok Patowary, PhD University of California Los Angeles 635 Charles E Young Dr S Los Angeles CA 90095 http://www.ashokpatowary.com/

…

On Thu, Nov 29, 2018 at 4:24 PM Alison ***@***.***> wrote: The fasta files should include all isoforms, including the annotated ones. If you run bin/identify_similar_annotated_isoforms.py on your final isoform psl, it should rename the isoforms that are annotated to their names in the annotation. This script is already run in the flair.py wrapper. So for human data, the annotated isoforms should be renamed to 'ENST....' and the novel ones should have the nanopore read name as their name. Also I'm going to be pushing a faster version of this script and others quite soon, if you can wait a little bit. — You are receiving this because you modified the open/close state. Reply to this email directly, view it on GitHub <#1 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AopMld6dAYxN0QZUtc7LNAvKCd4resrEks5u0HqggaJpZM4Xc6oZ> .

belgravia · 2018-12-18T23:58:44Z

Sorry for delay in response! Yes, that script is correct. I combined scripts and there's now a script called bin/identify_gene_isoform.py. You can run it with a command like python identify_gene_isoform.py yourisoforms.psl annotation.gtf yourisoforms_named.psl. The script will output a file using the name of the last argument with the name field rewritten with ensemble gene IDs and transcript IDs. Isoforms that do not have a transcript ID would be novel isoforms.

ashokpatowary closed this as completed Oct 18, 2018

DepledgeLab mentioned this issue Aug 16, 2019

FLAIR collapse misidentifying (and missing) isoforms #44

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Issues regarding flair #1

Issues regarding flair #1

ashokpatowary commented Oct 15, 2018

belgravia commented Oct 15, 2018

ashokpatowary commented Oct 15, 2018

belgravia commented Oct 15, 2018

ashokpatowary commented Oct 17, 2018

belgravia commented Oct 17, 2018

ashokpatowary commented Oct 18, 2018

belgravia commented Oct 18, 2018

ashokpatowary commented Oct 18, 2018

ashokpatowary commented Nov 26, 2018 via email

belgravia commented Nov 30, 2018

ashokpatowary commented Dec 5, 2018 via email

belgravia commented Dec 18, 2018

Issues regarding flair #1

Issues regarding flair #1

Comments

ashokpatowary commented Oct 15, 2018

belgravia commented Oct 15, 2018

ashokpatowary commented Oct 15, 2018

belgravia commented Oct 15, 2018

ashokpatowary commented Oct 17, 2018

belgravia commented Oct 17, 2018

ashokpatowary commented Oct 18, 2018

belgravia commented Oct 18, 2018

ashokpatowary commented Oct 18, 2018

ashokpatowary commented Nov 26, 2018 via email

belgravia commented Nov 30, 2018

ashokpatowary commented Dec 5, 2018 via email

belgravia commented Dec 18, 2018