Skip to content

Conversation

@mcarans
Copy link
Contributor

@mcarans mcarans commented Feb 23, 2024

I've done a major refactor of HDX Python API's create and update code. It fixes the issue found by HOTOSM where tags that were deleted did not get removed from the dataset. The old logic just put into the update field passed to package_revise the merge of the dataset dictionary read from HDX with the metadata to be updated. This was done to minimise changing the code from when package_update was used. However, I misunderstood how the update field operates - elements in lists are not removed by changes in the update field.

The new logic follows the package_revise approach fully, putting only the metadata to be updated in the update field and working out what needs to be deleted to put into the filter field.

@github-actions
Copy link

github-actions bot commented Feb 23, 2024

Test Results

133 tests  +1   133 ✅ +1   1m 14s ⏱️ +46s
  1 suites ±0     0 💤 ±0 
  1 files   ±0     0 ❌ ±0 

Results for commit a957770. ± Comparison against base commit 4e29dc5.

♻️ This comment has been updated with latest results.

@coveralls
Copy link

coveralls commented Feb 23, 2024

Coverage Status

coverage: 97.546% (-0.1%) from 97.659%
when pulling a957770 on refactor_create_revise
into 4e29dc5 on main.

@mcarans
Copy link
Contributor Author

mcarans commented Feb 23, 2024

I had a look at the slightly reduced coverage but don't propose to try to add the 0.2% back again as it looks like the important code is covered and it could be due to reducing the amount of code so that as a percentage of the total anything that continues missing coverage gets larger.

@mcarans mcarans changed the title Refactor create revise HDXDSYS-467 Refactor create revise Feb 23, 2024
Copy link
Contributor

@IanHopkinson IanHopkinson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

I think my comments are all around explanation rather than actual code.

_prepare_hdx_call could do with some documentation since it seems to be new

_revise_dataset seems to be where all the action is - it is a rather large chunk of code which could do with splitting up/commenting - maybe extract functions at lines 644 and 682

The massive lump of data in test_update_dataset_resources.py at line 101 could go into a JSON format fixture file which would make the test a bit more readable.

I would comment on the shift in language from merge_update to update - I think it is a good thing.

A few "process" comments based on my previous experience:

  1. I use a PR template which has sections on the purpose of the change, a summary of major file changes, minor file changes and a check for bumping the version. (example here: https://github.com/OCHA-DAP/hdx-cli-toolkit/blob/main/PULL_REQUEST_TEMPLATE.md)
  2. Commit messages could have been a bit more informative!
  3. The list() -> [] and dict() -> {} are good but maybe should have gone into their own commit - I assume you did them with search and replace. As I understand [] and {} are faster than list() and dict().
  4. The test coverage looks really good - personally I would look to tuning the coverage test so that it passes because coverage exceeds a threshold rather than failing because of a trivial reduction in test coverage. Merging a PR with failing test makes me sad :-(
  5. I normally raise an issue (or issues) on GitHub which describes the problem the PR is intended to fix

Move code in _revise_dataset into 2 functions _revise_filter and _revise_files_to_upload_resource_deletions
Move test data for test_update_dataset_resources into fixture
… so that the former is always first

Improve documentation of _dataset_update_resources and remove mention of merging data sicne we are not doing that any more - we are comparing.
@mcarans
Copy link
Contributor Author

mcarans commented Feb 23, 2024

Thanks @IanHopkinson for your helpful feedback.
I missed documenting (and adding type hints) to _prepare_hdx_call - now fixed
I've refactored _revise_dataset and pulled out two functions including the one you mentioned and documented those functions - makes it a lot clearer
I've made the separate fixture - that is a great improvement to test readability
I've fixed the things referring to merging data since you're right that this has changed

I admit I'm sometimes a bit lazy with commit messages and also in not doing separate commits (or even PRs) for changes like the dict -> {} unrelated to the main purpose of the PR :-(

Ironically adding all this has improved test coverage by virtue of there being more lines :-)

@IanHopkinson
Copy link
Contributor

@mcarans that looks good! Some of these things I aspire to do but am not necessarily consistent in actually doing

I once parsed Scala source code with regular expressions to extract data, so I'm sensitive to seeing big blobs of data in code files - if it is extracted out into separate data files it becomes reusable.

@b-j-mills
Copy link
Contributor

I've been doing some testing on stage (editing datasets, adding, removing, and reordering resources, adding and removing tags) and everything is behaving the way I expect it to!

@mcarans mcarans merged commit d4d6762 into main Feb 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants