Error parsing attribute #1

Open
edmundmiller opened this issue Feb 19, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@edmundmiller

Hey! Love the concept going on here, and I wanted to replace some Perl scripts in some pipelines with the tool.

I went to make an nf-core module for it and just got errors, though.

nf-core/modules#4951

Error output (the same panic is printed many times, interleaved across threads):

thread '<unnamed>' panicked at src/main.rs:197:33:
called `Result::unwrap()` on an `Err` value: "Error parsing attribute"

Reproduced it locally with gxf2bed -i genome.gff3 -o genome.bed as well.
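The panic comes from `Result::unwrap()` at src/main.rs:197:33: every worker thread that hits an unparsable line panics, and the message does not say which input line was at fault. Below is a minimal sketch of the usual alternative, propagating the error with `?` and attaching the line number. `ParseError`, `attribute_column`, and `parse_all` are hypothetical names for illustration, not gxf2bed's actual code.

```rust
use std::fmt;

/// Hypothetical error type that remembers which input line failed and why.
#[derive(Debug, PartialEq)]
struct ParseError {
    line_no: usize,
    reason: String,
}

impl fmt::Display for ParseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "line {}: {}", self.line_no, self.reason)
    }
}

impl std::error::Error for ParseError {}

/// Returns column 9 (the attribute column) of a tab-separated GTF/GFF line.
fn attribute_column(line: &str, line_no: usize) -> Result<&str, ParseError> {
    let cols: Vec<&str> = line.split('\t').collect();
    cols.get(8).copied().ok_or_else(|| ParseError {
        line_no,
        reason: format!("expected 9 tab-separated columns, found {}", cols.len()),
    })
}

/// Collects every attribute column, stopping at the first bad line
/// and returning an error instead of panicking.
fn parse_all(text: &str) -> Result<Vec<&str>, ParseError> {
    let mut attrs = Vec::new();
    for (i, line) in text.lines().enumerate() {
        // Skip comment/header lines and blank lines.
        if line.starts_with('#') || line.trim().is_empty() {
            continue;
        }
        // `?` hands the error to the caller; `unwrap()` here would panic.
        attrs.push(attribute_column(line, i + 1)?);
    }
    Ok(attrs)
}

fn main() {
    let input = "chr1\tsrc\texon\t1\t10\t.\t+\t.\tgene_id \"g1\";\nnot a gtf line\n";
    match parse_all(input) {
        Ok(attrs) => println!("parsed {} records", attrs.len()),
        // Reported once, with the offending line number.
        Err(e) => eprintln!("error parsing attribute at {e}"),
    }
}
```

Returning a `Result` lets `main` decide how to report the failure, so a malformed file produces one message naming the line instead of a panic per thread.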

@alejandrogzi
Owner

@edmundmiller,

Hope everything is ok! I gave it a look, and the problem arose because of the way these .gtf/.gff test files are built. I had planned to add this "parent/child/feature" functionality in the future, but given the circumstances it made it into this new release earlier than planned.

Just to clarify some things: I built this mostly with .gtf/.gff files that follow the Ensembl/GENCODE guidelines in mind (so basically GFF3 and GTF2.5). The test files do not have those "common" attribute lines, which is why the test runs failed. gxf2bed now provides 3 new optional arguments: --parent, --child, and --feature. With these, you can specify the parent structure (gene, transcript, or anything else you want if you have a custom .gtf/.gff3), the child (the features that build up the parent, e.g. exons or CDS), and the feature your .bed file should be based on (e.g. "gene_id", "transcript_id", or any other variant). These arguments default to the values v0.1 already expected ("transcript", "exon", and "transcript_id"), so no functionality has been compromised or removed.
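One way to picture the three arguments is as a grouping step. The sketch below is a minimal illustration assuming an already-parsed, hypothetical `Record` type and a hypothetical `group` function, not gxf2bed's actual internals. Records of the `--parent` type supply each group's span, records of the `--child` type supply its blocks, and the value of the `--feature` attribute decides which group a record belongs to.

```rust
use std::collections::BTreeMap;

/// Simplified, already-parsed GTF/GFF3 record (hypothetical type).
struct Record<'a> {
    kind: &'a str,                     // column 3, e.g. "gene", "transcript", "CDS"
    start: u64,                        // column 4, 1-based inclusive
    end: u64,                          // column 5, 1-based inclusive
    attrs: BTreeMap<&'a str, &'a str>, // column 9, already split into key/value pairs
}

/// Hypothetical mirror of the --parent / --child / --feature arguments.
struct Options<'a> {
    parent: &'a str,
    child: &'a str,
    feature: &'a str,
}

impl Default for Options<'static> {
    /// The v0.1 defaults described in this thread.
    fn default() -> Self {
        Options { parent: "transcript", child: "exon", feature: "transcript_id" }
    }
}

/// (span taken from the parent record, blocks taken from the child records)
type Group = (Option<(u64, u64)>, Vec<(u64, u64)>);

/// Groups parent and child records by the value of the `feature` attribute.
fn group<'a>(records: &[Record<'a>], opts: &Options) -> BTreeMap<&'a str, Group> {
    let mut out: BTreeMap<&'a str, Group> = BTreeMap::new();
    for r in records {
        // Records of any other type are ignored.
        if r.kind != opts.parent && r.kind != opts.child {
            continue;
        }
        // A record without the chosen attribute cannot be assigned to a group,
        // so it is skipped (a wrong --feature key therefore yields empty output).
        let Some(&id) = r.attrs.get(opts.feature) else { continue };
        let entry = out.entry(id).or_default();
        if r.kind == opts.parent {
            entry.0 = Some((r.start, r.end));
        } else {
            entry.1.push((r.start, r.end));
        }
    }
    out
}

fn main() {
    let recs = vec![
        Record { kind: "gene", start: 1, end: 100, attrs: BTreeMap::from([("gene_id", "g1")]) },
        Record { kind: "CDS", start: 10, end: 40, attrs: BTreeMap::from([("gene_id", "g1")]) },
        Record { kind: "CDS", start: 60, end: 90, attrs: BTreeMap::from([("gene_id", "g1")]) },
    ];
    // Equivalent of: --parent gene --child CDS --feature gene_id
    let opts = Options { parent: "gene", child: "CDS", feature: "gene_id" };
    for (id, (span, blocks)) in group(&recs, &opts) {
        println!("{id}: span {span:?}, blocks {blocks:?}");
    }
    let d = Options::default();
    println!("defaults: {} / {} / {}", d.parent, d.child, d.feature);
}
```

Keeping the defaults in one `Default` impl is what makes the new options backwards compatible in this sketch: callers that pass nothing get the v0.1 behaviour.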

In order to pass the tests nextflow provides, use:

gxf2bed -i genome.gtf --output test_gtf.bed --parent gene --child CDS --feature gene_id

for the .gtf file, and

gxf2bed -i genome.gff3 --output test_gff.bed --parent gene --child CDS --feature geneID

for the .gff3 file.

This should make the module tests pass. If you want the tests to produce a meaningful .bed, I'd recommend taking a few lines from any GENCODE or Ensembl GTF file and uploading them as custom test data.

I will publish the new version in the next few minutes, so you should be able to update the files there (I hope the conda env gets updated soon, but that depends on how quickly my PR is reviewed).

I am very grateful for your time building the gxf2bed nf module! This was planned for all my tools but I have not had enough time to do it. If you have any other ideas, recommendations or feedback I'd love to hear them!

Best,
Alejandro

edmundmiller added a commit to nf-core/modules that referenced this issue Feb 27, 2024
@edmundmiller
Author

Hey! Thanks for the extremely detailed response!

I've updated the version in the module, and added the args to the tests. The gtf one passed, but the gff didn't. 😞 Any other ideas?

Also, is it okay if I add you as a maintainer of the module?

@alejandrogzi
Owner

Hi @edmundmiller!

That was strange: with those args, no exit code 1 should appear (with the GFF3, a blank file should have been produced instead). Anyway, I took a deeper look and found some peculiarities in these test files that produced abnormal output (e.g. blank files). I had not considered GFF3 lines with a single attribute (my approach was to regex for ";" in both formats, which does not work when a line has only one attribute), like this one:

MT192765.1 Genbank gene 259 21548 . + . Parent=unknown_transcript_1
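One way to sidestep the single-attribute case is to treat ";" purely as a separator instead of something a line must contain. Splitting `Parent=unknown_transcript_1` on ";" simply yields one piece. Below is a minimal sketch with a hypothetical `parse_gff3_attrs` helper (not gxf2bed's code; GFF3 percent-encoding is ignored for brevity).

```rust
use std::collections::HashMap;

/// Parses a GFF3 column-9 string such as "ID=g1;Name=orf1ab" into key/value pairs.
/// `split(';')` always yields at least one piece, so a single attribute with no ';'
/// (e.g. "Parent=unknown_transcript_1") parses the same way as a longer list.
fn parse_gff3_attrs(col9: &str) -> Result<HashMap<&str, &str>, String> {
    let mut attrs = HashMap::new();
    for field in col9.trim().split(';') {
        let field = field.trim();
        if field.is_empty() {
            continue; // tolerate a trailing ';' or stray whitespace
        }
        // GFF3 attributes are tag=value; anything else is reported, not unwrapped.
        let (key, value) = field
            .split_once('=')
            .ok_or_else(|| format!("attribute without '=': {field:?}"))?;
        attrs.insert(key, value);
    }
    Ok(attrs)
}

fn main() {
    // Column 9 of the one-attribute GFF3 line quoted above.
    match parse_gff3_attrs("Parent=unknown_transcript_1") {
        Ok(attrs) => println!("Parent = {:?}", attrs.get("Parent")),
        Err(e) => eprintln!("error parsing attribute: {e}"),
    }
}
```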

After fixing that, some errors started appearing with the GTF file. It seems this GTF test file has an extra blank space at the end of each line:

MT192765.1 Genbank gene 259 21548 . + . gene_id "orf1ab"; gbkey "Gene"; gene "orf1ab"; gene_biotype "protein_coding";_

"_" at the end denotes that blank space.

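The trailing space only matters if the parser expects the attribute column to end exactly at the last ";". Trimming each ";"-separated piece and skipping empty ones makes it harmless. Below is a minimal sketch with a hypothetical `parse_gtf_attrs` helper (not gxf2bed's code), run on the test-file line quoted above.

```rust
use std::collections::HashMap;

/// Parses a GTF column-9 string of `key "value";` pairs.
/// Each piece is trimmed, so the space after the final ';' becomes an
/// empty piece that is skipped instead of a bogus attribute.
fn parse_gtf_attrs(col9: &str) -> Result<HashMap<&str, &str>, String> {
    let mut attrs = HashMap::new();
    for field in col9.split(';') {
        let field = field.trim();
        if field.is_empty() {
            continue;
        }
        // Key and value are separated by the first whitespace character.
        let (key, value) = field
            .split_once(char::is_whitespace)
            .ok_or_else(|| format!("attribute without a value: {field:?}"))?;
        // Values are normally double-quoted; strip the quotes if present.
        let value = value.trim().trim_matches('"');
        attrs.insert(key, value);
    }
    Ok(attrs)
}

fn main() {
    // Column 9 from the test file, including the trailing space.
    let col9 = "gene_id \"orf1ab\"; gbkey \"Gene\"; gene \"orf1ab\"; gene_biotype \"protein_coding\"; ";
    match parse_gtf_attrs(col9) {
        Ok(attrs) => println!("gene_id = {:?}", attrs.get("gene_id")),
        Err(e) => eprintln!("error parsing attribute: {e}"),
    }
}
```
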
This new version takes care of all these little things, so nf-test should pass this time. These are the commands to use:

gxf2bed -i genome.gtf --output test_gtf.bed --parent gene --child CDS --feature gene_id [remains the same]

this outputs:

MT192765.1	258	21548	orf1ab	0	+	258	21545	0	1	8085,	13202,
MT192765.1	21555	25377	S	0	+	21555	25374	0	1	3819,	0,
MT192765.1	25385	26213	ORF3a	0	+	25385	26210	0	1	825,	0,
MT192765.1	26237	26465	E	0	+	26237	26462	0	1	225,	0,
MT192765.1	26515	27184	M	0	+	26515	27181	0	1	666,	0,
MT192765.1	27194	27380	ORF6	0	+	27194	27377	0	1	183,	0,
MT192765.1	27386	27752	ORF7a	0	+	27386	27749	0	1	363,	0,
MT192765.1	27748	27880	ORF7b	0	+	27748	27877	0	1	129,	0,
MT192765.1	27886	28252	ORF8	0	+	27886	28249	0	1	363,	0,
MT192765.1	28266	29526	N	0	+	28266	29523	0	1	1257,	0,
MT192765.1	29550	29667	ORF10	0	+	29550	29664	0	1	114,	0,

and

gxf2bed -i genome.gff3 --output test_gff.bed --parent gene --child CDS --feature Parent [replaces "geneID" with "Parent"]

this outputs:

MT192765.1	29550	29667	unknown_transcript_1	0	+	29550	29667	0	1	117,	0,
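The first line of the GTF output reflects the coordinate change between formats. The orf1ab gene at 259..21548 in the GTF (1-based, inclusive) comes out as chromStart 258 and chromEnd 21548 in the BED (0-based, half-open). Below is a minimal sketch of building BED12 fields that way, using a hypothetical `to_bed12` helper. For simplicity it sets thickStart/thickEnd to the full span, whereas the GTF output above shows gxf2bed reporting a thickEnd 3 bp short of chromEnd, so this is illustrative rather than a reimplementation.

```rust
/// Builds one BED12 line from 1-based, inclusive GTF/GFF coordinates.
/// Assumes span.0 >= 1 and that every block lies inside `span`.
fn to_bed12(chrom: &str, name: &str, strand: char, span: (u64, u64), blocks: &[(u64, u64)]) -> String {
    let chrom_start = span.0 - 1; // 1-based inclusive start -> 0-based start
    let chrom_end = span.1; // an inclusive 1-based end equals a half-open 0-based end
    let mut blocks = blocks.to_vec();
    blocks.sort_unstable();
    // blockSizes: block lengths; blockStarts: offsets relative to chromStart.
    let sizes: String = blocks.iter().map(|&(s, e)| format!("{},", e - s + 1)).collect();
    let starts: String = blocks.iter().map(|&(s, _)| format!("{},", s - 1 - chrom_start)).collect();
    // Simplification: thick region = whole span (gxf2bed's own output differs).
    let (thick_start, thick_end) = (chrom_start, chrom_end);
    format!(
        "{chrom}\t{chrom_start}\t{chrom_end}\t{name}\t0\t{strand}\t{thick_start}\t{thick_end}\t0\t{}\t{sizes}\t{starts}",
        blocks.len()
    )
}

fn main() {
    // A toy transcript with two blocks, given out of order.
    println!("{}", to_bed12("chr1", "t1", '+', (1, 100), &[(61, 100), (1, 20)]));
}
```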

No functionality has been compromised. GENCODE and Ensembl GTFs/GFFs still work smoothly (and the tool now takes ~0.7 s less time thanks to some additional enhancements).

> Also, is it okay if I add you as a maintainer of the module?

Yes, of course! Thank you for taking the time to do this; it is really motivating!

The new version is already published, but we still need to wait for the bioconda team to update it.

Let me know if there is anything else I can do!

Best,
Alejandro

@alejandrogzi alejandrogzi self-assigned this Feb 27, 2024
@alejandrogzi alejandrogzi added the bug Something isn't working label Feb 27, 2024

2 participants