Skip to content

Handle 404 NOT FOUND error for download_and_extract_antismash_data#160

Merged
gcroci2 merged 22 commits intodevfrom
fix_404_error_gcroci2
Jul 17, 2023
Merged

Handle 404 NOT FOUND error for download_and_extract_antismash_data#160
gcroci2 merged 22 commits intodevfrom
fix_404_error_gcroci2

Conversation

@gcroci2
Copy link
Contributor

@gcroci2 gcroci2 commented Jul 11, 2023

With this PR:

  • Now the hash of the project's json file (MSV000079284.json in our test case) is checked, and if the file changes an error is raised
  • Now two try-except statements handle better the creation of antismash extracted folders, which happens only if the corresponding zip folder has been downloaded. Also, the genome_status.json file now contains an empty string value for the key bgc_path in the cases in which no zip folder has been downloaded and no folder has been extracted.
  • I added one test in tests/genomics/antismash/test_antismash_downloader.py to assess that download_and_extract_antismash_data does not create an empty folder when the id does not exist in NCBI and when the id does exist there but not in the antismash database (404 error).
  • I added two tests in tests/pairedomics/test_podp_antismash_downloader.py to assess that podp_download_and_extract_antismash_data does not create an empty folder when the id does not exist in NCBI and when the id does exist there but not in the antismash database (404 error); they also check for the correctness of the genome status file.

In conclusion, we were already handling cases in which the genome id was non-existing, but we weren't handling cases in which the genome id was existing in NCBI but not in the antismash database. Now we handle both cases.

Notes for @CunliangGeng:

ERROR tests/test_nplinker_local.py::test_load_data - Exception: Failed to find *ANY* strains, missing strain_mappings.json?
  • running mypy on src/nplinker/genomics/antismash/antismash_downloader.py gives me an error about os.scandir, but I didn't find how to solve it:
src/nplinker/genomics/antismash/antismash_downloader.py:91: error: Value of type variable "AnyStr" of "scandir" cannot be "Union[PathLike[Any], Any]"  [type-var]

@gcroci2 gcroci2 linked an issue Jul 11, 2023 that may be closed by this pull request
@gcroci2 gcroci2 requested a review from CunliangGeng July 11, 2023 09:48
@CunliangGeng
Copy link
Member

We didn't catch such errors because the possibility of having a broken URL wasn't tested. Should we create a unit test for that?

Yes please!

@gcroci2 gcroci2 requested a review from CunliangGeng July 12, 2023 18:06
@gcroci2
Copy link
Contributor Author

gcroci2 commented Jul 12, 2023

See my updated comment above @CunliangGeng

Copy link
Member

@CunliangGeng CunliangGeng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we need to define what "non-existent ID" is.
For the function download_and_extract_antismash_data, the non-existent ID (fake ID, not exist in NCBI) or broken ID (not exist in antismash database) are actually the same thing, both will trigger a 404 error. It'd be easier to treat them as the same case I guess.

@CunliangGeng
Copy link
Member

the os.scandir type issue is fine, we can ignore it.

@gcroci2 gcroci2 requested a review from CunliangGeng July 14, 2023 12:58
@CunliangGeng
Copy link
Member

the os.scandir type issue is fine, we can ignore it.

you can change if any(os.scandir(extract_path)) to if any(os.scandir(str(extract_path))) to fix the mypy issue. os.scandir need string as input but we give a Pathlike.

Copy link
Member

@CunliangGeng CunliangGeng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The mypy issues should be fixed in my suggested changes. Apply them, then you can merge this PR 🎉

Co-authored-by: Cunliang Geng <c.geng@esciencecenter.nl>
gcroci2 and others added 4 commits July 17, 2023 10:21
Co-authored-by: Cunliang Geng <c.geng@esciencecenter.nl>
Co-authored-by: Cunliang Geng <c.geng@esciencecenter.nl>
Co-authored-by: Cunliang Geng <c.geng@esciencecenter.nl>
@gcroci2 gcroci2 merged commit 5704188 into dev Jul 17, 2023
@gcroci2 gcroci2 deleted the fix_404_error_gcroci2 branch July 17, 2023 08:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

http 404 error from test_nplinker_local.py

2 participants