Handle 404 NOT FOUND error for `download_and_extract_antismash_data` by gcroci2 · Pull Request #160 · NPLinker/nplinker

gcroci2 · 2023-07-11T09:48:42Z

With this PR:

The hash of the project's JSON file (MSV000079284.json in our test case) is now checked, and an error is raised if the file has changed.
Two try-except statements now handle the creation of the antiSMASH extracted folders more robustly: a folder is created only if the corresponding zip file has been downloaded. In addition, the genome_status.json file now contains an empty string for the key bgc_path when no zip file has been downloaded and no folder has been extracted.
I added one test in tests/genomics/antismash/test_antismash_downloader.py to verify that download_and_extract_antismash_data does not create an empty folder when the ID does not exist in NCBI, or when it exists there but not in the antiSMASH database (404 error).
I added two tests in tests/pairedomics/test_podp_antismash_downloader.py to verify that podp_download_and_extract_antismash_data does not create an empty folder when the ID does not exist in NCBI, or when it exists there but not in the antiSMASH database (404 error); they also check that the genome status file is correct.
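
For readers unfamiliar with the hash check mentioned in the first item, here is a minimal sketch of the idea. It is not the code in this PR: the function names are hypothetical, and SHA-256 is an assumption, since the description does not say which hash is used.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Return the hex SHA-256 digest of the file at ``path``."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_project_file(path: Path, expected_sha256: str) -> None:
    """Raise ``ValueError`` if the file no longer matches the recorded digest."""
    actual = file_sha256(path)
    if actual != expected_sha256:
        raise ValueError(
            f"{path.name} has changed: expected {expected_sha256}, got {actual}"
        )
```

A test would record the digest of the project file once and call `check_project_file` before relying on its contents, so a silent upstream change shows up as an explicit error.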

In conclusion, we were already handling genome IDs that do not exist at all, but not genome IDs that exist in NCBI and are missing from the antiSMASH database. Now we handle both cases.
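
The behaviour described above could look roughly like the sketch below. It is an illustration under stated assumptions, not the actual NPLinker code: `download_and_extract_sketch`, `collect_bgc_paths`, and the `fetch_zip` callable are hypothetical names, `urllib.error.HTTPError` stands in for whichever HTTP error class the project imports, and the real genome_status.json has more structure than the plain ID-to-path mapping used here.

```python
import os
import shutil
from pathlib import Path
from typing import Callable, Dict, Iterable
from urllib.error import HTTPError

# Hypothetical downloader signature: (genome_id, download_root) -> path of the zip.
# It is assumed to raise HTTPError when the URL does not exist (for example, a 404).
FetchZip = Callable[[str, Path], Path]


def download_and_extract_sketch(
    genome_id: str, download_root: Path, extract_root: Path, fetch_zip: FetchZip
) -> Path:
    """Extract the data for ``genome_id`` and leave no empty folder on failure."""
    extract_path = extract_root / "antismash" / genome_id
    extract_path.mkdir(parents=True, exist_ok=True)
    try:
        zip_path = fetch_zip(genome_id, download_root)
        shutil.unpack_archive(str(zip_path), str(extract_path), format="zip")
    except HTTPError:
        # Nothing was downloaded: remove the folder if it is still empty,
        # then propagate the error so the caller can record the failure.
        with os.scandir(str(extract_path)) as entries:
            is_empty = not any(entries)
        if is_empty:
            extract_path.rmdir()
        raise
    return extract_path


def collect_bgc_paths(
    genome_ids: Iterable[str],
    download_root: Path,
    extract_root: Path,
    fetch_zip: FetchZip,
) -> Dict[str, str]:
    """Map each genome ID to its extracted folder, or to "" if the lookup failed."""
    bgc_paths: Dict[str, str] = {}
    for genome_id in genome_ids:
        try:
            extracted = download_and_extract_sketch(
                genome_id, download_root, extract_root, fetch_zip
            )
            bgc_paths[genome_id] = str(extracted)
        except HTTPError:
            # Same outcome for an ID unknown to NCBI and for one that is
            # missing from the antiSMASH database: an empty bgc_path.
            bgc_paths[genome_id] = ""
    return bgc_paths
```

Passing the downloader in as a callable is a choice made for this sketch only; it keeps the example runnable without network access.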

Notes for @CunliangGeng:

tests/test_nplinker_local.py still fails, but the error is unrelated to this PR (maybe it will be solved by "Create StrainMappingLoader" #116?):

ERROR tests/test_nplinker_local.py::test_load_data - Exception: Failed to find *ANY* strains, missing strain_mappings.json?

Running mypy on src/nplinker/genomics/antismash/antismash_downloader.py gives an error about os.scandir, and I couldn't find a way to fix it:

src/nplinker/genomics/antismash/antismash_downloader.py:91: error: Value of type variable "AnyStr" of "scandir" cannot be "Union[PathLike[Any], Any]"  [type-var]

tests/test_nplinker_local.py

CunliangGeng · 2023-07-11T10:23:49Z

We didn't catch such errors because the possibility of having a broken URL wasn't tested. Should we create a unit test for that?

Yes please!

gcroci2 · 2023-07-12T18:06:24Z

See my updated comment above @CunliangGeng

CunliangGeng

I guess we need to define what a "non-existent ID" is.
For the function download_and_extract_antismash_data, a non-existent ID (a fake ID that does not exist in NCBI) and a broken ID (one that does not exist in the antiSMASH database) are effectively the same thing: both trigger a 404 error. It'd be easier to treat them as the same case, I guess.
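
One way to treat both kinds of wrong ID as a single case in the tests is to run them through the same assertions. The sketch below uses only the standard library so it runs without network access. `fetch_or_404` is a hypothetical stand-in for the real downloader, the two IDs are made up, and `urllib.error.HTTPError` is an assumption about the error type.

```python
import io
import tempfile
import unittest
from email.message import Message
from pathlib import Path
from urllib.error import HTTPError


def fetch_or_404(genome_id: str, extract_root: Path) -> Path:
    """Hypothetical stand-in for the downloader: every lookup ends in a 404."""
    extract_path = extract_root / genome_id
    extract_path.mkdir(parents=True)
    try:
        # A real downloader would request the antiSMASH zip here.
        raise HTTPError(
            f"https://example.invalid/{genome_id}.zip",
            404,
            "Not Found",
            Message(),
            io.BytesIO(),
        )
    except HTTPError:
        extract_path.rmdir()  # clean up the folder created above
        raise


class TestWrongIds(unittest.TestCase):
    def test_wrong_ids_leave_no_folder(self):
        # Both kinds of wrong ID go through exactly the same checks.
        for genome_id in ("id_not_in_ncbi", "id_not_in_antismash"):
            with self.subTest(genome_id=genome_id):
                with tempfile.TemporaryDirectory() as tmp:
                    with self.assertRaises(HTTPError) as caught:
                        fetch_or_404(genome_id, Path(tmp))
                    self.assertEqual(caught.exception.code, 404)
                    self.assertFalse((Path(tmp) / genome_id).exists())
```

With pytest, the same idea would usually be written with `pytest.mark.parametrize` over the two IDs.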

tests/genomics/antismash/test_antismash_downloader.py

src/nplinker/genomics/antismash/antismash_downloader.py

CunliangGeng · 2023-07-13T07:27:21Z

the os.scandir type issue is fine, we can ignore it.

CunliangGeng · 2023-07-14T14:52:53Z

the os.scandir type issue is fine, we can ignore it.

You can change `if any(os.scandir(extract_path))` to `if any(os.scandir(str(extract_path)))` to fix the mypy issue: mypy cannot match the PathLike we pass against the argument type it expects for os.scandir, whereas a plain string type-checks.
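
As a small, self-contained illustration of that suggestion (the helper name `is_empty_dir` is hypothetical and not part of the PR), the empty-folder check can be wrapped like this:

```python
import os
from pathlib import Path
from typing import Union


def is_empty_dir(path: Union[str, Path]) -> bool:
    """Return True if the directory at ``path`` contains no entries."""
    # str() hands os.scandir a plain string, as suggested above.
    # The context manager closes the directory iterator promptly.
    with os.scandir(str(path)) as entries:
        return not any(entries)
```

The check stops at the first entry it finds, so it stays cheap even for large folders.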

CunliangGeng

The mypy issues should be fixed in my suggested changes. Apply them, then you can merge this PR 🎉

tests/genomics/antismash/test_antismash_downloader.py

tests/pairedomics/test_podp_antismash_downloader.py

src/nplinker/genomics/antismash/antismash_downloader.py

gcroci2 added 5 commits July 11, 2023 11:43

add function to test_nplinker_local to verify has of the tested project json file

f0cd671

add try except to download_and_extract_antismash_data and remove empty extracted folder if url does not exist

e42c6d3

add try except to catch the propagated http error and assign an empty string to bgc_path

7a28f91

change save_to_file with the new to_json

449e5bf

make downloader an internal attribute for DatasetLoader class

210d028

gcroci2 linked an issue Jul 11, 2023 that may be closed by this pull request

http 404 error from test_nplinker_local.py #156

Closed

gcroci2 requested a review from CunliangGeng July 11, 2023 09:48

CunliangGeng approved these changes Jul 11, 2023

tests/test_nplinker_local.py (outdated, resolved)

gcroci2 and others added 8 commits July 12, 2023 18:43

Update function name in tests/test_nplinker_local.py

23b12d9

Co-authored-by: Cunliang Geng <c.geng@esciencecenter.nl>

Merge branch 'dev' into fix_404_error_gcroci2

19e9135

update function name in tests/test_nplinker_local.py

4af9683

add test for broken url in tests/genomics/antismash/test_antismash_downloader.py

70c8b21

fix new test with the correct assert statement

bcc196c

create two separate tests for download_and_extract_antismash_data for non existing and broken ids

6bdff4c

fix failed lookup test and add test for broken antismash id

bcd7d89

run isort on relevant files

e4cfcd2

gcroci2 requested a review from CunliangGeng July 12, 2023 18:06

CunliangGeng reviewed Jul 13, 2023

tests/genomics/antismash/test_antismash_downloader.py (outdated, resolved)

CunliangGeng reviewed Jul 13, 2023

src/nplinker/genomics/antismash/antismash_downloader.py (outdated, resolved)

gcroci2 added 3 commits July 14, 2023 14:54

unify tests for wrong ids for download_and_extract_antismash_data

e6f6a40

fix mypy error about HTTPError in download_and_extract_antismash_data

f1a6634

run yapf on relevant files

692fbec

gcroci2 requested a review from CunliangGeng July 14, 2023 12:58

add details in tests for podp_download_and_extract_antismash_data failed lookups

59139e4

CunliangGeng approved these changes Jul 14, 2023

Update tests/genomics/antismash/test_antismash_downloader.py

8aa4679

Co-authored-by: Cunliang Geng <c.geng@esciencecenter.nl>

gcroci2 and others added 4 commits July 17, 2023 10:21

Update src/nplinker/genomics/antismash/antismash_downloader.py

f392b3a

Co-authored-by: Cunliang Geng <c.geng@esciencecenter.nl>

Update src/nplinker/genomics/antismash/antismash_downloader.py

6da1b70

Co-authored-by: Cunliang Geng <c.geng@esciencecenter.nl>

Update tests/pairedomics/test_podp_antismash_downloader.py

bc3a3ed

Co-authored-by: Cunliang Geng <c.geng@esciencecenter.nl>

directly import HTTPError

7df265e

gcroci2 merged commit 5704188 into dev Jul 17, 2023

gcroci2 deleted the fix_404_error_gcroci2 branch July 17, 2023 08:33

gcroci2 merged 22 commits into dev from fix_404_error_gcroci2
