Sanger sequences with VER1=4.0 do not get correct error probability #7

RobinVanSchendel · 2019-01-15T23:26:54Z

See subject. For some reason, many bases get an error probability of 1.0, while other programs such as SnapGene Viewer assign these bases very different qualities. I am not 100% sure this is the cause, but it is the only apparent difference: files with VER1=3.0 are parsed fine.

dkatzel-home · 2019-01-16T01:36:03Z

Thank you for using Jillion! I'd be happy to help. Could you provide an example of a Sanger sequence with "VER1=4.0" and an equivalent one with VER1=3.0? The Sanger parsing code is the oldest code in Jillion; I wrote it about 13 years ago, and I'm not sure there was a version 4.0 back then.

Thanks again!

RobinVanSchendel · 2019-01-16T01:48:54Z

Dear Dan,

Thanks for your quick response and your help! I have added two .ab1 files:

5565810.ab1 => the VER1=4.0 file, which gives wrong error probabilities
XF1359_0…ab1 => this one is read in just fine

If you need more examples, just let me know.

Kind regards,
Robin van Schendel

RobinVanSchendel · 2019-01-16T18:34:32Z

I hope you received the .ab1 files. I am not sure whether GitHub keeps these attachments. If you did not receive them, could you send me your e-mail address so I can mail them to you? Thanks again, Robin

dkatzel-home · 2019-01-16T18:59:24Z

It doesn't look like it. According to the GitHub help on attaching files to issues, only a few file formats are supported.

Can you please try zipping the two files and attaching the zip file to this ticket? (You might need to use the web GUI.)

Thanks

RobinVanSchendel · 2019-01-16T20:21:39Z

Jillion_Test_AB1.zip

RobinVanSchendel · 2019-01-17T16:59:18Z

Hi Dan, I have uploaded the test files; I hope you can now test them.

dkatzel-home · 2019-01-17T22:06:09Z

Got it, thanks.

dkatzel-home · 2019-01-18T02:08:32Z

I think I found the problem. It will take a bit of time to fix; I'll try to get it done over the weekend.

Long story short: ABI files store two versions of the trace data inside the chromatogram file, an original version and a current version. Often they are identical, but when a different basecaller is run or manual edits are made, they diverge. ABI files store this information in data blocks, which are binary blobs inside the file, together with an index recording where each entry is located in the file.
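To make the "index" concrete, here is a minimal sketch that reads the directory of an ABI file from a byte array. It assumes the widely used ABIF layout (big-endian; the magic "ABIF", a 2-byte version, then a 28-byte root entry pointing at a table of 28-byte entries, each holding a 4-character tag name, a tag number, element type and size, element count, data size and data offset). It is not Jillion's implementation, and the names `AbifDirectorySketch`, `Entry` and `readDirectory` are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class AbifDirectorySketch {

    /** One 28-byte directory entry (assumed ABIF layout, big-endian). */
    static final class Entry {
        final String name;        // 4-character tag, e.g. "PCON"
        final int number;         // distinguishes entries that share a tag
        final short elementType;
        final short elementSize;
        final int numElements;
        final int dataSize;
        final int dataOffset;     // where this entry's data block starts

        Entry(String name, int number, short elementType, short elementSize,
              int numElements, int dataSize, int dataOffset) {
            this.name = name;
            this.number = number;
            this.elementType = elementType;
            this.elementSize = elementSize;
            this.numElements = numElements;
            this.dataSize = dataSize;
            this.dataOffset = dataOffset;
        }

        /** Tag plus number, e.g. "PCON2". */
        String key() {
            return name + number;
        }
    }

    /** Reads one 28-byte entry at the buffer's current position. */
    static Entry readEntry(ByteBuffer buf) {
        byte[] tag = new byte[4];
        buf.get(tag);
        String name = new String(tag, StandardCharsets.US_ASCII);
        int number = buf.getInt();
        short elementType = buf.getShort();
        short elementSize = buf.getShort();
        int numElements = buf.getInt();
        int dataSize = buf.getInt();
        int dataOffset = buf.getInt();
        buf.getInt(); // data handle, not needed here
        return new Entry(name, number, elementType, elementSize, numElements, dataSize, dataOffset);
    }

    /** Parses the header, then returns every directory entry in on-disk order. */
    static List<Entry> readDirectory(byte[] file) {
        ByteBuffer buf = ByteBuffer.wrap(file).order(ByteOrder.BIG_ENDIAN);
        byte[] magic = new byte[4];
        buf.get(magic);
        if (!"ABIF".equals(new String(magic, StandardCharsets.US_ASCII))) {
            throw new IllegalArgumentException("not an ABIF file");
        }
        buf.getShort(); // format version number
        Entry root = readEntry(buf); // root entry: size and location of the directory
        List<Entry> entries = new ArrayList<>(root.numElements);
        buf.position(root.dataOffset);
        for (int i = 0; i < root.numElements; i++) {
            entries.add(readEntry(buf));
        }
        return entries;
    }
}
```

The list is returned in file order on purpose: whether later code trusts that order or looks entries up by `key()` is exactly the design choice discussed in this thread.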

The Jillion ABI parser was based on data sequenced at TIGR/JCVI, which produced over 1 million Sanger reads over the course of many years. The parser took a few shortcuts when parsing this data: I'm not aware of a real file specification for the format, although many parts of it have been reverse engineered since the '90s. I think the TIGR sequencing center's files happened to always store the data blocks for the original and current versions in the same specific order.

For whatever reason, this "version 4" file not only has large differences between the original and current sequences, but some of its data blocks are also in a different order, so in the example you attached the original and current qualities are swapped. There may be other differences too; I haven't fully investigated.
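The order dependence can be illustrated with a small sketch using hypothetical types (`QualityBlockLookup`, `Block`); it is not Jillion's code, and the "first PCON block is the original" shortcut is an invented stand-in for an order-based assumption. It also assumes that tag number 2 holds the basecaller's ("original") qualities and tag number 1 the edited ("current") ones, following common descriptions of the ABIF tags; verify that mapping against the format documentation before relying on it. A positional lookup silently returns the wrong block once a file stores the blocks in a different order, while a lookup by tag name and number does not care about order.

```java
import java.util.List;

final class QualityBlockLookup {

    /** A directory entry reduced to what this sketch needs. */
    static final class Block {
        final String name;   // tag, e.g. "PCON"
        final int number;    // tag number
        final byte[] data;   // payload, here one quality value per base

        Block(String name, int number, byte[] data) {
            this.name = name;
            this.number = number;
            this.data = data;
        }
    }

    // Assumed mapping: 2 = basecaller output ("original"), 1 = edited ("current").
    static final int ORIGINAL_NUMBER = 2;
    static final int CURRENT_NUMBER = 1;

    /** Fragile shortcut: treats the first PCON block in file order as the original. */
    static byte[] originalByPosition(List<Block> blocks) {
        for (Block b : blocks) {
            if (b.name.equals("PCON")) {
                return b.data;
            }
        }
        return null; // no PCON block present
    }

    /** Order-independent: matches on tag name and tag number; null if absent. */
    static byte[] findByNumber(List<Block> blocks, String name, int number) {
        for (Block b : blocks) {
            if (b.name.equals(name) && b.number == number) {
                return b.data;
            }
        }
        return null;
    }
}
```

With the blocks stored as PCON 2 then PCON 1, both methods agree; reverse the order and only `findByNumber(blocks, "PCON", ORIGINAL_NUMBER)` still returns the original qualities.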

I will fix the parser to correctly distinguish the original and current versions. In the meantime, if you need a workaround, you can access the original version of the traces:

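       // file: a java.io.File pointing at the .ab1 chromatogram
       AbiChromatogram abiChromatogram = new AbiChromatogramBuilder("id", file).build();

       // the original version of the trace data instead of the current one
       Chromatogram original = abiChromatogram.getOriginalChromatogram();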

I would suggest checking whether the current qualities contain a lot of 0s and, if so, using the original qualities instead.
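The suggested check can be sketched on plain quality arrays. Phred-scaled qualities map to error probabilities via P = 10^(-Q/10), so a quality of 0 corresponds to an error probability of 1.0, which matches the symptom in the original report. The class `QualityFallback`, its methods and the 0.5 threshold are hypothetical; the sketch works on `byte[]` arrays rather than Jillion's own quality types, whose accessors are not shown in this thread.

```java
final class QualityFallback {

    /** Phred conversion: error probability = 10^(-Q/10), so Q = 0 gives 1.0. */
    static double errorProbability(int phredQuality) {
        return Math.pow(10.0, -phredQuality / 10.0);
    }

    /** True if more than maxZeroFraction of the quality values are 0. */
    static boolean mostlyZero(byte[] qualities, double maxZeroFraction) {
        if (qualities.length == 0) {
            return false; // nothing to judge
        }
        int zeros = 0;
        for (byte q : qualities) {
            if (q == 0) {
                zeros++;
            }
        }
        return (double) zeros / qualities.length > maxZeroFraction;
    }

    /** Hypothetical workaround: keep the current qualities unless they are mostly 0. */
    static byte[] chooseQualities(byte[] current, byte[] original) {
        return mostlyZero(current, 0.5) ? original : current;
    }
}
```

A fixed zero fraction keeps the check cheap; the threshold is arbitrary and would need tuning on real reads.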

Thanks! I'll let you know when the fix is tested and pushed.

PS: May I include these files in the Jillion test suite to prevent regressions?

RobinVanSchendel · 2019-01-19T15:51:37Z

You can add the files to the test suite. I tested your solution, and unfortunately it does not work. For example, if you open the 5565810 file in SnapGene Viewer (free software), you see that the sequence corresponds to the current version; the original sequence I get with your workaround looks quite different. The only difference is that in SnapGene Viewer the quality values deviate a lot from those of both the current and the original versions. I think SnapGene Viewer is correct in terms of sequence and quality, so how does it manage to obtain different quality values from this file?

dkatzel-home · 2019-01-19T16:18:19Z

The Jillion version had a bug that swapped the qualities of the (different) original sequence with those of the current sequence. Pushing the fix now.

dkatzel-home · 2019-01-19T16:33:18Z

Everything has been fixed and committed. If you pull the latest version from the repository and build it, it should work. If you have any questions, let me know.

Thanks!

RobinVanSchendel · 2019-01-19T16:44:01Z

Great work! It now works as expected. Thanks a lot for your help!

dkatzel-home closed this as completed in 8a0f7ae Jan 19, 2019
