Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convert Scripts folder to dea_tools Python module #705

Merged
merged 14 commits into from Dec 1, 2020
Merged

Conversation

MatthewJA
Copy link
Contributor

@MatthewJA MatthewJA commented Nov 5, 2020

This PR adds a dea_tools module to Tools/, which is installable with pip. It also updates every notebook to use that module instead of inserting Scripts/ into the path.

This is not meant to be a breaking change, so every notebook should still work. I ran every notebook that I could (a handful were too slow or too memory-hungry), so I have confirmed that they still work where possible.

The only breaking change that could happen is on the Sandbox, where user modules don't persist. The Sandbox will need to be updated to pip install this module from the most recent version (or cached version?) on load.

I don't want to merge this PR until we have resolved the Sandbox question (@Kirill888 may be the person to ask?), but I do want to kick-start the process of moving things over.

As discussed on Slack with @cbur24, once we have a functioning module for DEA, we can pretty easily build a DEAfrica component with a dea_tools dependency.


Here's what the old method of loading DEA scripts looked like:

import sys
sys.path.insert(1, '../Scripts')  # or ../../Scripts etc

from dea_plotting import rgb
from dea_spatialtools import xr_rasterize

New method that I introduce here:

from dea_tools.plotting import rgb
from dea_tools.spatial import xr_rasterize

@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@robbibt
Copy link
Collaborator

robbibt commented Nov 5, 2020

This is definitely a good one to tag @Kirill888 (and maybe @whatnick and @omad ) as a reviewer too.

As this is such a major change, my preference would be to run this in parallel and not merge it in to develop or stable for a little while, and probably have a chat with @caitlinadams and others from DEAfrica and Core about how this may align with standardising code and workflows between other non-DEA datacubes.

This is super cool though, and great progress on an issue we've been wanting to move forward on for a long time!

@robbibt robbibt linked an issue Nov 5, 2020 that may be closed by this pull request
@MatthewJA
Copy link
Contributor Author

Good suggestions, and I also am open to having it in a parallel branch for a while - the target branch is currently develop, but we can change that.

That said, I actually think we're better pushing this into develop. Most people in DEA seem to be running their own parallel branch anyway, and adding another major branch (that touches every notebook) seems confusing at best and a nightmare to merge at worst. The longer it stays parallel to develop the worse that merge becomes, while right now it's pretty easy.

@MatthewJA
Copy link
Contributor Author

Updated the PR with a code example.

@MatthewJA
Copy link
Contributor Author

MatthewJA commented Nov 6, 2020

Oh BTW, if anyone wants to try this out, you can pip install it anywhere you like:

pip install git+https://github.com/GeoscienceAustralia/dea-notebooks@dea-tools
import dea_tools.waterbodies
dea_tools.waterbodies.get_waterbody('r3dp1nxh8')

@robbibt robbibt changed the title dea_tools module Convert Scripts folder to dea_tools Python module Nov 6, 2020
@BexDunn
Copy link
Collaborator

BexDunn commented Nov 6, 2020

My opinion is that we should go ahead and merge it, and make the changes, with future module developments with @caitlinadams and our external collaborators to build upon this progress. We should announce the changes on the beginner's channel + potentially in a sandbox popup @whatnick ?

@robbibt robbibt requested a review from CEKrause November 6, 2020 00:16
@Kirill888
Copy link
Contributor

there are some junk files included in the commit .lock an the like.

@MatthewJA
Copy link
Contributor Author

My bad, it was hard keeping track of this PR. I'll fix them up now

@caitlinadams
Copy link
Collaborator

Perfect summary, thanks @MatthewJA!

@cbur24
Copy link
Collaborator

cbur24 commented Nov 9, 2020

Just a thought, but this could be a great project for the ODC conference in February. Most of the stakeholders will be available then and a week-long hack means we should be able to make good progress

@robbibt
Copy link
Collaborator

robbibt commented Nov 10, 2020

A reasonably small issue in the scheme of things, but if we do go for an all-of-ODC implementation, we should have a think about how this would interact with the existing odc-tools package here, as it's kind of aimed at a similar type of code, i.e.

"ODC features that DEA is experimenting with or prototyping with the intention of being integrated into odc-core in the future"

@Kirill888
Copy link
Contributor

Kirill888 commented Nov 10, 2020

odc-tools started off as prototypes for code to be maybe included in datacube-core, it have since evolved into "useful code that works with datacube library, but might be out of scope for datacube-core". But odc-tools does not and should not contain any "installation specific" knowledge, like code to load NCI version of Landsat 8, for example. Even RGB code searchers for "red|green|blue" in the names of measurement and does not check for "nbar(t)_red" or things like that.

So to me there are several layers of generality/applicability

  • datacube-core
    • data agnostic (no hard-coded product names, band names etc)
    • should be useful for data processing tasks one might do with datacube
    • data visualization tools should probably not go there (they are hard to test and we want high test coverage in core, they also might depend on way too many externals libraries)
  • odc-tools
    • data agnostic (no hard-coded product names, band names etc)
    • Generic data visualisation tools with complex dependencies can be here
    • Generic metadata munging tools
    • Various prototypes that might end up in core
  • dea-tools
    • Tools for working with specific datacube installations
    • hard-coded product/band names are ok, but best to make them "configurable" to help manage change and migrations across installations

@MatthewJA
Copy link
Contributor Author

From our discussion about this just now, I'm going to fix up this PR so that the Scripts folder passes through to the Tools folder.

@MatthewJA
Copy link
Contributor Author

Longer-term plan:

  1. Make the tools module exist and work
  2. Trial the module by manually installing it editable on Sandbox
  3. Deploy the module on Sandbox once we are happy with it
  4. Remove the Scripts folder
  5. Generalise to de-tools + deafrica-tools

@MatthewJA
Copy link
Contributor Author

MatthewJA commented Nov 24, 2020

I'd like to force push a revert back to c0a82fc to avoid touching every notebook in this PR. Previously that was needed but now since we want to maintain the notebooks exactly as they are with a passthrough Scripts folder it's not necessary any more.

@MatthewJA
Copy link
Contributor Author

MatthewJA commented Nov 24, 2020

OK, I think that should work! This should now be completely compatible with existing notebooks and workflows. You should also be able to pip install dea-tools by running pip in editable mode from the dea-notebooks directory:

pip install -e Tools

@MatthewJA
Copy link
Contributor Author

I'll be trying out my own work for a bit using this branch, to see if anything has broken (the only thing I noticed was that the __all__ definition in dea_bom.py was stopping my splat import, which I've now fixed by deleting __all__). I've run a few notebooks and they have all been fine so far.

@CEKrause
Copy link
Collaborator

I'll be trying out my own work for a bit using this branch, to see if anything has broken (the only thing I noticed was that the __all__ definition in dea_bom.py was stopping my splat import, which I've now fixed by deleting __all__). I've run a few notebooks and they have all been fine so far.

The notebook that would be worth testing is the stream gauge wofs notebook in frequently used code. If that still works, then it should be ok?

@MatthewJA
Copy link
Contributor Author

MatthewJA commented Nov 24, 2020

Funnily enough that was one of the ones I tested! It worked fine as best I could tell, though it'd be great if you could double check

@MatthewJA
Copy link
Contributor Author

This one right? dea-notebooks/Real_world_examples/Mapping_inundation_using_stream_gauges.ipynb

Copy link
Collaborator

@caitlinadams caitlinadams left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @MatthewJA,

This is all looking good to me. For clarity, I would suggest the following changes:

  • update the README in the Scripts folder to describe a little about how the Scripts folder now works (i.e. that these files now import the code stored in the Tools folder). It would also be good to explain that this ensures all current code works, but that users are encouraged to now use the DEA module from the Tools folder (see my next suggestion).
  • update the README in the Tools folder to explain how Tools works, including an example of how to import the desired functionality in a notebook.
  • update the README in the Tools folder to be a .rst file rather than .md file; this is mostly around if we ever build this into the documentation. It's not super critical, but it's easy to do now and will save someone having to do it in future. Feel free to ask for help with this if you haven't worked with .rst before.

These are mostly related to supporting users, so are nice to have rather than critical. I'm happy for you to decide if you want to make these changes and I can re-review if you want. Otherwise, way to go!

@robbibt
Copy link
Collaborator

robbibt commented Nov 30, 2020

It might also be nice to include a standard comment at the top of all the re-factored scripts (e.g. here), explaining where the functions have moved to and why:

e.g. something like:

# Functions in this script have been moved to a new location to allow them to be imported into notebooks as a Python package: <link to new location containing scripts>

That way people can easily find them if they're looking for the source code

@MatthewJA
Copy link
Contributor Author

Should be all good to go @caitlinadams and @robbibt!

Copy link
Collaborator

@robbibt robbibt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great to me, the extra documentation should help any existing users who may be confused about the changes!

Just flagging this as a future idea (not required for this PR): at some point could we move the remaining the notebookapp_* modules out of Scripts entirely and put them in Supplementary_data instead? That way we could just keep the Scripts around for compatibility in the short term, then remove it entirely later on once we have verified the new tools module works as intended.

@MatthewJA
Copy link
Contributor Author

Just gonna merge from develop into this first

@MatthewJA MatthewJA merged commit ffde386 into develop Dec 1, 2020
@MatthewJA MatthewJA deleted the dea-tools branch December 1, 2020 04:30
MatthewJA added a commit that referenced this pull request Mar 11, 2021
* start dea-tools setup

* setting up project structure

* more setup stuff

* setup.py

* added gitignore

* fixed package

* Moved Scripts into Tools

* Added passthrough scripts to Scripts

* removed __all__ from bom.py

* Response to @caitlinadams and @robbibt for #705

* Added dea-tools readme

Co-authored-by: BexDunn <bex.dunn@ga.gov.au>
Co-authored-by: Matthew Alger <matthew.alger@ga.gov.au>
emmaai pushed a commit that referenced this pull request Feb 14, 2024
* start dea-tools setup

* setting up project structure

* more setup stuff

* setup.py

* added gitignore

* fixed package

* Moved Scripts into Tools

* Added passthrough scripts to Scripts

* removed __all__ from bom.py

* Response to @caitlinadams and @robbibt for #705

* Added dea-tools readme

Co-authored-by: BexDunn <bex.dunn@ga.gov.au>
Co-authored-by: Matthew Alger <matthew.alger@ga.gov.au>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Convert Scripts folder into package
7 participants