Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue with reporting resistance OXA-48 #149

Open
talnor opened this issue Nov 26, 2021 · 10 comments · May be fixed by #173
Open

Issue with reporting resistance OXA-48 #149

talnor opened this issue Nov 26, 2021 · 10 comments · May be fixed by #173

Comments

@talnor
Copy link
Contributor

talnor commented Nov 26, 2021

Describe the bug
Perfect matches to OXA-48 are no longer reporter in microSALT, instead microSALT reports OXA-566 resistance of lower %identities.

Message from KS in Ticket 722237:

...angående att OXA-48 har börjat rapporteras som OXA-566 (identity 99,88%). Kör jag fastq-filerna från er i Resfinder så rapporterar den OXA-48 (100% match). Jag skulle då meddela ungefär när denna förändring inträffat och vad jag kan se så rapporteras prov 20ET200231 (Ticket 112544, v44 2020) som OXA-48 medan prov 20ET500236 (Ticket 935485, v47 2020) rapporteras som OXA-566. Är det något som ändrats i assembly under denna period?

To Reproduce
Steps to reproduce the behavior:

  1. Run sample 20ET200231 or 20ET500236 with microSALT > 3.1.3 and microSALT 3.1.0

Expected behavior
OXA-48 should be reported if this resistances has higher %identity.

Additional context
Is OXA-48 in the raw blast results?

Adding some of the commits made during the period below. Has any affected OXA-48 reporting? Scrape_blast changes likely.

@talnor talnor added the bug label Nov 26, 2021
@talnor
Copy link
Contributor Author

talnor commented Nov 26, 2021

Currently blocked by installation issues

@moahaegglund
Copy link

No longer blocked, installation issues solved in #142.

@pbiology
Copy link
Contributor

Will be resolved in Clinical-Genomics/CG-JASEN#5

@samuell
Copy link

samuell commented Jun 19, 2024

Re-opening this, as it continues to cause problems, and we might need to prioritize it if Jasen is not completed soon.

@samuell samuell reopened this Jun 19, 2024
@samuell
Copy link

samuell commented Jun 19, 2024

I have been running Spades (4.0.0) assembly + Blasting against the resfinder DB semi-manually, using the same commands as used in microSALT, both with and without the --isolate flag to Spades, and the OXA-48 gene is reported correctly with both methods.

This thus strengthens the suspicion that the error might be in the scraping logic.

(Unless there are some differences between Spades 3.13.1 used in microSALT right now, and 4.0.0.)

@samuell
Copy link

samuell commented Jun 25, 2024

(Unless there are some differences between Spades 3.13.1 used in microSALT right now, and 4.0.0.)

Tested this also with Spades 3.13.1 with the same result, so that is apparently not the issue.

@samuell
Copy link

samuell commented Jun 25, 2024

I have now confirmed that an existing OXA-48 is removed in this part of the code (for cleaning up overlapping hits):

# Cleanup of overlapping hits
if type == "seq_type":
identifier = "loci"
elif type == "resistance" or type == "expec":
identifier = "gene"
ind = 0
while ind < len(hypo) - 1:
targ = ind + 1
while targ < len(hypo):
ignore = False
if (
hypo[ind]["contig_name"] == hypo[targ]["contig_name"]
or hypo[ind][identifier] == hypo[targ][identifier]
):
# Overlapping or shared gene
if (
(
hypo[ind].get("contig_start")
>= hypo[targ].get("contig_start")
and hypo[ind].get("contig_start")
<= hypo[targ].get("contig_end")
)
or (
hypo[ind].get("contig_end")
>= hypo[targ].get("contig_start")
and hypo[ind].get("contig_end")
<= hypo[targ].get("contig_end")
)
or (hypo[ind].get(identifier) == hypo[targ].get(identifier))
):
# Rightmost is worse
if float(hypo[ind].get("identity")) * (
1 - abs(1 - hypo[ind].get("span"))
) > float(hypo[targ].get("identity")) * (
1 - abs(1 - hypo[targ].get("span"))
):
del hypo[targ]
ignore = True
# Leftmost is worse
elif float(hypo[ind].get("identity")) * (
1 - abs(1 - hypo[ind].get("span"))
) < float(hypo[targ].get("identity")) * (
1 - abs(1 - hypo[targ].get("span"))
):
del hypo[ind]
targ = ind + 1
ignore = True
# Identical identity and span, seperating based on contig coverage
else:
# Rightmost is worse
if float(hypo[ind].get("contig_coverage")) >= float(
hypo[targ].get("contig_coverage")
):
del hypo[targ]
ignore = True
# Leftmost is worse
elif float(hypo[ind].get("contig_coverage")) < float(
hypo[targ].get("contig_coverage")
):
del hypo[ind]
targ = ind + 1
ignore = True
if not ignore:
targ += 1
else:
pass
ind += 1

Perhaps there is an overlap with OXA-566 ... we'll see.

@samuell
Copy link

samuell commented Jun 25, 2024

And in one of these commits:

deed81b Removed newline char in log
ab280db Update pull_request_template.md
7e38873 Clarified logic in referencer
9757f6b Updated and fixed bugs in pubmlst and ncbi downloads
27e4685 Merge pull request #121 from Clinical-Genomics/resistance-scrape
170b013 Version bump
b4e1231 Added padder safety
5749721 Quast backwards fix
2e626ef Rolledback expec locilength generous lookup
395a218 Locilengths now ignores reference name
a05ef89 Update init.py
3f1488c Attempt to make resistance searches more compatible

@samuell
Copy link

samuell commented Jun 26, 2024

It is introduced with this one:

395a218 Locilengths now ignores reference name

(Tested by cherry-picking the commits one by one into a test branch, using a real-world resistance blast output containing both OXA-566 and OXA-48 instead of the test one, and looking for occurrences of OXA-48 in caplog.text).

samuell added a commit that referenced this issue Jun 26, 2024
This needs to be follow up with additional checks

The same change might need to be done for other matching as well.
samuell added a commit that referenced this issue Jun 26, 2024
This needs to be follow up with additional checks

The same change might need to be done for other matching as well.
@samuell samuell linked a pull request Jun 26, 2024 that will close this issue
9 tasks
samuell added a commit that referenced this issue Jul 5, 2024
This needs to be follow up with additional checks

The same change might need to be done for other matching as well.
samuell added a commit that referenced this issue Jul 11, 2024
This needs to be follow up with additional checks

The same change might need to be done for other matching as well.
@samuell
Copy link

samuell commented Jul 11, 2024

I think I'm starting to have a reasonable test now, that now fails (before implementing the fix), saying:

>     assert "blaOXA-48" in genes
E     assert 'blaOXA-48' in ["aph(3')-III", 'ant(6)-Ia', 'blaOXA-566']

(See https://github.com/Clinical-Genomics/microSALT/actions/runs/9898158769/job/27344305202 )

This is triggered by using some real-world example data where both OXA-48 and OXA-566 are included. But re-assuring to see that we reproduce the inclusion of blaOXA-566 there, as the issue described.

And, with the fix implemented, this test passes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants