HDXDSYS-467 Refactor create revise #60

mcarans · 2024-02-23T04:11:12Z

I've done a major refactor of HDX Python API's create and update code. It fixes the issue found by HOTOSM where deleted tags were not removed from the dataset. The old logic merged the dataset dictionary read from HDX with the metadata to be updated and passed the result in the update field of package_revise. This was done to minimise changes to the code from when package_update was used. However, I misunderstood how the update field operates - elements in lists are not removed by changes in the update field.

The new logic follows the package_revise approach fully, putting only the metadata to be updated in the update field and working out what needs to be deleted to put into the filter field.
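
To make the split between the update and filter fields concrete, here is a minimal sketch of how a change might be divided between them. It is not the code in this PR: build_revise_payload and its list_keys parameter are hypothetical names. It assumes CKAN's documented filter pattern syntax, in which a pattern such as "-tags" removes that field from the current dataset before the update is applied.

```python
def build_revise_payload(existing, changes, list_keys=("tags", "groups")):
    """Build match/filter/update entries for a package_revise style call.

    Hypothetical helper for illustration only, not part of HDX Python API.
    """
    filters = []
    update = {}
    for key, new_value in changes.items():
        if existing.get(key) == new_value:
            continue  # unchanged: send nothing for this key
        if key in list_keys and key in existing:
            # Assumption: dropping the whole list via the filter and then
            # supplying the full new list stops removed elements lingering.
            filters.append(f"-{key}")
        update[key] = new_value
    return {"match": {"id": existing["id"]}, "filter": filters, "update": update}


existing = {
    "id": "abc",
    "title": "My data",
    "tags": [{"name": "roads"}, {"name": "rivers"}],
}
changes = {"title": "My data", "tags": [{"name": "roads"}]}
payload = build_revise_payload(existing, changes)
```

In this sketch the unchanged title is left out entirely, while the shortened tag list produces both a filter entry and an update entry.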

github-actions · 2024-02-23T04:13:07Z

Test Results

133 tests (+1): 133 passed ✅ (+1), 0 skipped 💤 (±0), 0 failed ❌ (±0)
1 suite (±0), 1 file (±0), duration 1m 14s ⏱️ (+46s)

Results for commit a957770. ± Comparison against base commit 4e29dc5.

♻️ This comment has been updated with latest results.

coveralls · 2024-02-23T04:13:33Z

coverage: 97.546% (-0.1%) from 97.659% when pulling a957770 on refactor_create_revise into 4e29dc5 on main.

mcarans · 2024-02-23T04:47:00Z

I had a look at the slightly reduced coverage but don't propose to try to add the 0.2% back, as it looks like the important code is covered. The drop could be due to reducing the total amount of code: any lines that remain uncovered then make up a larger percentage of the total.
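
The arithmetic behind that effect can be shown with made-up numbers (the line counts below are hypothetical, not taken from this repository): removing only covered lines leaves the count of missed lines unchanged but lowers the percentage.

```python
def coverage_percent(total_lines, missed_lines):
    """Percentage of lines covered, given total and uncovered line counts."""
    return 100 * (total_lines - missed_lines) / total_lines


# Hypothetical figures: 20 lines are uncovered before and after the refactor,
# but the refactor deletes 200 lines that were all covered.
before = coverage_percent(total_lines=1000, missed_lines=20)
after = coverage_percent(total_lines=800, missed_lines=20)
print(before, after)
```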

IanHopkinson

Looks good to me!

I think my comments are all around explanation rather than actual code.

_prepare_hdx_call could do with some documentation since it seems to be new

_revise_dataset seems to be where all the action is - it is a rather large chunk of code which could do with splitting up/commenting - maybe extract functions at lines 644 and 682

The massive lump of data in test_update_dataset_resources.py at line 101 could go into a JSON format fixture file which would make the test a bit more readable.
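
As a rough sketch of that suggestion, test data could be loaded from a JSON file with a small helper like the one below. The helper name load_fixture and the tests/fixtures directory are assumptions for illustration, not names from this repository.

```python
import json
from pathlib import Path


def load_fixture(name, fixtures_dir=Path("tests/fixtures")):
    """Load a JSON fixture file by name (hypothetical helper and directory)."""
    with open(Path(fixtures_dir) / name, encoding="utf-8") as f:
        return json.load(f)
```

A test could then call the helper (or wrap it in a pytest fixture) instead of embedding the expected dictionary inline, and other tests could reuse the same file.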

I would comment on the shift in language from merge_update to update - I think it is a good thing.

A few "process" comments based on my previous experience:

I use a PR template which has sections on the purpose of the change, a summary of major file changes, minor file changes and a check for bumping the version. (example here: https://github.com/OCHA-DAP/hdx-cli-toolkit/blob/main/PULL_REQUEST_TEMPLATE.md)
Commit messages could have been a bit more informative!
The list() -> [] and dict() -> {} changes are good but maybe should have gone into their own commit - I assume you did them with search and replace. As I understand it, [] and {} are faster than list() and dict().
The test coverage looks really good - personally I would look at tuning the coverage check so that it passes when coverage exceeds a threshold rather than failing because of a trivial reduction in test coverage. Merging a PR with a failing test makes me sad :-(
I normally raise an issue (or issues) on GitHub which describes the problem the PR is intended to fix
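
On the point about literals versus constructor calls, a quick way to check is with timeit. This is only a sketch: absolute timings depend on the interpreter and machine, so no figures are quoted here. On CPython the literals are typically faster because they compile to a single build instruction, whereas list() and dict() need a name lookup followed by a call.

```python
import timeit

RUNS = 1_000_000

# Time each way of creating an empty container.
literal_list = timeit.timeit("[]", number=RUNS)
call_list = timeit.timeit("list()", number=RUNS)
literal_dict = timeit.timeit("{}", number=RUNS)
call_dict = timeit.timeit("dict()", number=RUNS)

print(f"[]  {literal_list:.3f}s   list() {call_list:.3f}s")
print(f"{{}}  {literal_dict:.3f}s   dict() {call_dict:.3f}s")
```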

mcarans · 2024-02-23T22:48:09Z

Thanks @IanHopkinson for your helpful feedback.
I missed documenting (and adding type hints) to _prepare_hdx_call - now fixed
I've refactored _revise_dataset and pulled out two functions including the one you mentioned and documented those functions - makes it a lot clearer
I've made the separate fixture - that is a great improvement to test readability
I've fixed the things referring to merging data since you're right that this has changed

I admit I'm sometimes a bit lazy with commit messages and also in not doing separate commits (or even PRs) for changes like the dict -> {} unrelated to the main purpose of the PR :-(

Ironically adding all this has improved test coverage by virtue of there being more lines :-)

IanHopkinson · 2024-02-24T06:40:07Z

@mcarans that looks good! Some of these things I aspire to do but am not necessarily consistent in actually doing

I once parsed Scala source code with regular expressions to extract data, so I'm sensitive to seeing big blobs of data in code files - if it is extracted out into separate data files it becomes reusable.

b-j-mills · 2024-02-29T02:25:12Z

I've been doing some testing on stage (editing datasets, adding, removing, and reordering resources, adding and removing tags) and everything is behaving the way I expect it to!

mcarans added 6 commits February 21, 2024 11:28

First refactor

a4c84b6

Second refactor

930e1e9

Second refactor

7bb90d6

The only operation that works here is create

ee8d241

Refactor how updates work so that deleting tags and groups should work

ea419e5

Refactor how updates work so that deleting tags and groups should work

8c865e1

mcarans requested review from IanHopkinson and b-j-mills February 23, 2024 04:11

Add test for tag removal

fa887f7

mcarans changed the title ~~Refactor create revise~~ HDXDSYS-467 Refactor create revise Feb 23, 2024

IanHopkinson approved these changes Feb 23, 2024

mcarans added 2 commits February 24, 2024 11:19

Document _prepare_hdx_call

fa07743

Move code in _revise_dataset into 2 functions _revise_filter and _revise_files_to_upload_resource_deletions Move test data for test_update_dataset_resources into fixture

Switch order of parameters filestore_resources and new_resource_order…

c0e00ed

… so that the former is always first Improve documentation of _dataset_update_resources and remove mention of merging data since we are not doing that any more - we are comparing.

mcarans added 2 commits February 29, 2024 14:19

Add integration test that actually writes to demo server

78dc6a8

Add vocabulary and configuration cleanup to integration test to avoid…

a957770

… breaking other tests

b-j-mills approved these changes Feb 29, 2024

mcarans merged commit d4d6762 into main Feb 29, 2024
