Adding Sandbox notebooks #317

Merged
merged 19 commits into from Aug 2, 2019

caitlinadams
Collaborator

This pull request will add notebooks developed for the DEA Sandbox to the general notebook repository for DEA.

Still to do:

  • test notebooks on GA-hosted sandbox to validate indexing
  • modify notebook content/formatting to fit the needs of the GA-hosted sandbox (e.g. dask_average_pixel_example.ipynb needs explanatory text).

Happy to take any other feedback and update as necessary! I'll leave this as a draft PR until we're ready.

All notebooks confirmed to run in the FrontierSI DEA Sandbox:

- Corrects decibel formula in Water_Classification.ipynb and Shipping_Lanes.ipynb

- Fixes an instance of plot not displaying in do_it_yourself_notebook.ipynb

- Numbers tutorial notebooks for ease of use

- Updates README.md to reflect inclusion in GA Sandbox.
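For reference, converting radar backscatter from linear units to decibels uses 10·log10 for power/intensity quantities (20·log10 would apply to amplitude). The PR does not say what the earlier formula got wrong, so this is only a minimal sketch, assuming the Sentinel-1 gamma0 values are stored as linear intensity and loaded as NumPy arrays; the function names are hypothetical.

```python
import numpy as np


def linear_to_db(backscatter):
    """Convert linear backscatter intensity (e.g. gamma0) to decibels."""
    backscatter = np.asarray(backscatter, dtype=float)
    # log10(0) would emit a divide-by-zero warning; silence it and return -inf instead.
    with np.errstate(divide="ignore"):
        return 10 * np.log10(backscatter)


def db_to_linear(db):
    """Invert linear_to_db: convert decibels back to linear intensity."""
    return 10 ** (np.asarray(db, dtype=float) / 10)
```

A value of 1.0 maps to 0 dB and 0.01 maps to -20 dB; zero-valued (e.g. no-data) pixels come out as -inf and may need masking before plotting.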
@robbibt
Collaborator

robbibt commented Jul 29, 2019

These are fantastic notebooks, and they really set the level of documentation we should try to include in all notebooks in dea-notebooks.

In the longer run (not necessary now):

  1. We should probably add in the tags field we have in other notebooks (https://github.com/GeoscienceAustralia/dea-notebooks/blob/master/tags.rst) that allows them to show up in the index page on the DEA notebooks user guide (http://geoscienceaustralia.github.io/digitalearthau/genindex.html), so they're as discoverable as possible.
  2. We also need to think about the best way to differentiate NCI notebooks from DEA Sandbox notebooks. Perhaps describing this clearly at the top of notebooks would be enough?

@caitlinadams
Collaborator Author

I'm now working on running the notebooks in the GA sandbox (https://app.sandbox.dea.ga.gov.au) before we merge this PR. I'll use this comment to log the data/package issues I run into.

Case_Studies Notebooks:

  • Agriculture.ipynb: Sentinel-2 data are not available for the specified date/lat-lon range. Notebook relies on s2a/s2b_nrt_granule, which is listed as a product, but the data isn't appearing.
  • Coastal_Erosion.ipynb: Notebook relies on ls8_nbar_scene, which is not available. Could we update the notebook to use ga_ls8c_ard_3 instead, @robbibt?
  • Shipping_Lanes.ipynb: Dask client not working: OSError: Timed out trying to connect to 'tcp://dask-datacube-dask.odchub:8786' after 10 s: [Errno -2] Name or service not known. Might need to map to a different Dask client? Also need the s1_gamma0_geotif_scene product.
  • Water_Classification.ipynb: Need the s1_gamma0_geotif_scene product.

Code_Examples Notebooks:

  • dask_average_pixel_example.ipynb: Dask client not working: OSError: Timed out trying to connect to 'tcp://dask-datacube-dask.odchub:8786' after 10 s: [Errno -2] Name or service not known.

Dataset_Examples Notebooks:

  • WaterObservationsfromSpace_AnnualSummary.ipynb: WOfS annual summary data are not available for the specified date/lat-lon range. Notebook relies on wofs_annual_summary, which is listed as a product, but the data isn't appearing (Empty xarray after load).

Tutorial Notebooks:

  • 02_Do_It_Yourself.ipynb: Sentinel-2 data are not available for the specified date/lat-lon range. Notebook relies on s2a_nrt_granule, which is listed as a product, but the data isn't appearing (Empty xarray after load).

High-level summary:

  • Need to check how s2a_nrt_granule, s2b_nrt_granule and wofs_annual_summary have been indexed and make sure data is available over the regions/times of interest for the relevant notebooks
  • Need to add the s1_gamma0_geotif_scene product, as used in the DEA Sandbox: https://dashboard.dea-sandbox.test.frontiersi.io/s1_gamma0_geotif_scene
  • Rewrite Coastal_Erosion.ipynb to use Collection 3 Landsat 8 data (ga_ls8c_ard_3), or index ls8_nbar_scene.
  • Fix the dask client issue

@alexgleith and @robbibt, could you please advise about how we should proceed to address each of these issues? Feel free to tag anyone else involved in the comments so we can keep track of what's happening.

@robbibt
Collaborator

robbibt commented Jul 30, 2019

@caitlinadams For the last few hours I've been updating the Coastal_Erosion.ipynb notebook to use the Collection 3 ga_ls8c_ard_3 data, because we need that notebook for a workshop next month. It should be ready soon, and it will work much more nicely because we'll have a full 30 years of data to play with. The only drawback is that the current ga_ls8c_ard_3-style data is a test set scheduled for deletion soon, so there may be a short period between its deletion and the final data being processed during which the notebook breaks again.

@robbibt
Collaborator

robbibt commented Jul 30, 2019

Regarding 02_Do_It_Yourself.ipynb and the missing Sentinel-2 data, I've run into the same problem. I believe @alexgleith plans to index the Sentinel-2 data served from the NCI's THREDDS server, which should make the entire Sentinel-2 time series available for the notebook.

@alexgleith
Contributor

alexgleith commented Jul 30, 2019 via email

@robbibt
Collaborator

robbibt commented Jul 30, 2019

@alexgleith Regarding the other required datasets that look like they're missing or not indexed on the Sandbox... should we be recording this in any official place other than here? e.g. as issues on another repository, or an email to someone specific?

@harshurampur
Collaborator

@robbibt There wasn't any S2 NRT data indexed on the DEA Sandbox. I have started indexing S2 NRT along with Sentinel-2 Definitive, and will update you here once the indexing is complete.

@caitlinadams
Collaborator Author

The Dask issue should now be resolved. The GA sandbox's Dask scheduler is configured with a different hostname, so the client address is different. Dask should now be working; however, the calculation in the dask_average_pixel_example.ipynb notebook takes much longer than it does in the FrontierSI sandbox. I'm about to try running it again in the GA Sandbox, but it would be good if someone else could check this too.
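The `[Errno -2] Name or service not known` part of the earlier error is a DNS lookup failure: the scheduler hostname used on the FrontierSI sandbox does not exist on the GA sandbox. A minimal, standard-library-only sketch for checking whether a `tcp://host:port` scheduler address resolves before passing it to `dask.distributed.Client`. The helper names are hypothetical, and since the GA sandbox's actual scheduler address isn't given here, the example only uses the FrontierSI address from the error message.

```python
import socket
from urllib.parse import urlparse


def parse_scheduler_address(address):
    """Split a 'tcp://host:port' Dask scheduler address into (host, port)."""
    parsed = urlparse(address)
    if parsed.scheme != "tcp" or parsed.hostname is None or parsed.port is None:
        raise ValueError(f"Unexpected scheduler address: {address!r}")
    return parsed.hostname, parsed.port


def scheduler_host_resolves(address):
    """Return True if the scheduler's hostname can be resolved.

    An unresolvable name makes getaddrinfo raise socket.gaierror, which is the
    source of '[Errno -2] Name or service not known' on Linux.
    """
    host, port = parse_scheduler_address(address)
    try:
        socket.getaddrinfo(host, port)
    except socket.gaierror:
        return False
    return True
```

Calling `scheduler_host_resolves("tcp://dask-datacube-dask.odchub:8786")` on the GA sandbox would be expected to return `False`, which points at the address rather than at Dask itself. This only checks name resolution; it doesn't confirm a scheduler is listening on the port.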

@caitlinadams
Collaborator Author

I've now confirmed that the data are available for all of the notebooks. The only remaining fixes are to get Dask working on the sandbox and to update any references to the DEA Dashboard (https://dashboard.dea-sandbox.test.frontiersi.io) so they point to the DEA Explorer (https://explorer.dea.ga.gov.au). The Explorer needs to be updated to reflect the newly added datasets before this step can happen.

Additionally, I've added explanatory text to the Code_Examples/Dask_Average_Colour_Australia.ipynb notebook (https://github.com/GeoscienceAustralia/dea-notebooks/blob/caitlinadams/DEA_sandbox/Code_Examples/Dask_Average_Colour_Australia.ipynb). It would be great if someone could review this as part of this PR.

@caitlinadams caitlinadams marked this pull request as ready for review August 2, 2019 00:57
@caitlinadams
Collaborator Author

I've now marked this as ready for review. Happy to take any feedback on text/code inside notebooks before this is merged.

@robbibt
Collaborator

robbibt commented Aug 2, 2019

Note: we've deliberately not included some dea-notebooks requirements, like tags and index files, because working out how we want these AWS notebooks to show up in the user guide needs some longer-term thinking. For now, I think we can relax those rules for this pull request.

The overall intention is that this new directory will be synced directly to the sandbox and will be the first thing people see when they log in.

@CEKrause
Collaborator

CEKrause commented Aug 2, 2019

@caitlinadams I'll leave my small comments here as I go through the notebooks.

In the agricultural case study, I think this line is incorrect "It takes values from -1 to 1, with high values corresponding to dense vegetation" - it should be "It takes values from -1 to 1, with high values corresponding to healthy vegetation".
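NDVI is the normalised difference of near-infrared (NIR) and red reflectance, so for non-negative reflectances it is bounded by -1 and 1. A minimal sketch, assuming both bands are plain NumPy arrays of surface reflectance; the notebooks' own BandIndices.py may implement it differently.

```python
import numpy as np


def ndvi(nir, red):
    """Normalised Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    # Pixels where NIR + Red == 0 become NaN (0/0) or +/-inf instead of raising warnings.
    with np.errstate(divide="ignore", invalid="ignore"):
        return (nir - red) / (nir + red)
```

Healthy vegetation reflects strongly in NIR and absorbs red light, which is why it pushes NDVI towards 1.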

In the coastal erosion case study, the hyperlink in the description at the top doesn't work - 404 error.

Collaborator

@CEKrause CEKrause left a comment

There is a typo in the Coastal_Erosion notebook. Under the heading for Compute MNDWI, some words are missing from a sentence: "When it comes to interpreting the index, , while."
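The missing interpretation text can't be recovered from this thread, but the index itself is standard: MNDWI is the normalised difference of green and shortwave-infrared (SWIR) reflectance, and positive values are commonly read as water, negative values as land. A minimal sketch, assuming NumPy arrays of surface reflectance; which SWIR band the notebook actually uses (for Landsat 8, SWIR-1 is the usual choice) is an assumption here.

```python
import numpy as np


def mndwi(green, swir):
    """Modified Normalised Difference Water Index: (Green - SWIR) / (Green + SWIR)."""
    green = np.asarray(green, dtype=float)
    swir = np.asarray(swir, dtype=float)
    # Pixels where Green + SWIR == 0 become NaN/inf instead of raising warnings.
    with np.errstate(divide="ignore", invalid="ignore"):
        return (green - swir) / (green + swir)
```

Water absorbs SWIR strongly, so water pixels tend to have green reflectance higher than SWIR and a positive index.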

@caitlinadams
Collaborator Author

@CEKrause The text is actually there; the HTML that colours it in the notebook isn't rendered on GitHub, so it doesn't appear. @robbibt, I like the colour, but should it be removed so all the text can be seen on GitHub?

Collaborator

@CEKrause CEKrause left a comment

Other than a couple of very small comments, these look great!

What is the purpose of duplicating the BandIndices.py and utils.py files in both the Case Studies and Tutorials folders?

@caitlinadams
Collaborator Author

Because of how Python finds modules to import (by default it searches the notebook's own folder and installed packages), these files have to sit in the same folder as the notebook that imports them. This could be fixed by packaging the utility functions as a Python package instead, but I think that would be a significant amount of development.
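A lighter-weight alternative to duplicating BandIndices.py and utils.py, short of packaging them, is to keep one copy in a shared folder and add that folder to Python's module search path (`sys.path`) at the top of each notebook. A minimal sketch; the helper name and the `../Scripts` folder mentioned below are hypothetical, and this approach trades duplication for a relative path that breaks if notebooks are moved.

```python
import sys
from pathlib import Path


def add_shared_scripts_dir(scripts_dir):
    """Add a folder of shared .py files to the module search path.

    The folder is resolved to an absolute path and inserted at the front of
    sys.path, so modules in it take priority; repeated calls don't duplicate it.
    """
    path = str(Path(scripts_dir).resolve())
    if path not in sys.path:
        sys.path.insert(0, path)
    return path
```

In a notebook, you would call `add_shared_scripts_dir("../Scripts")` in the first cell, before `import BandIndices` or `import utils`.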

@robbibt
Collaborator

robbibt commented Aug 2, 2019

@CEKrause That's a really good point... I guess the sandbox will need those scripts to run the notebooks, but it will only be reading in the dea-sandbox subdirectory? Agreed it's not an ideal situation, though.

@robbibt
Collaborator

robbibt commented Aug 2, 2019

@caitlinadams I like the coloured text too, but I think we should probably get rid of it to reduce confusion when it doesn't render.

Collaborator

@BexDunn BexDunn left a comment

@BexDunn BexDunn merged commit a0b7dd1 into master Aug 2, 2019
BexDunn added a commit that referenced this pull request Aug 2, 2019
caitlinadams pushed a commit that referenced this pull request Aug 2, 2019
Gabzgit pushed a commit that referenced this pull request Mar 24, 2020
* Initial commit for DEA sandbox notebooks.

* Minor changes to address GA Style

All notebooks confirmed to run in the FrontierSI DEA Sandbox:

- Corrects decibel formula in Water_Classification.ipynb and Shipping_Lanes.ipynb

- Fixes an instance of plot not displaying in do_it_yourself_notebook.ipynb

- Numbers tutorial notebooks for ease of use

- Updates README.md to reflect inclusion in GA Sandbox.

* Changes dask client address

* Renames and adds text to Dask example notebook

- Reorganises structure and adds clear headings

- Reformats output colour picture

- Removes unnecessary libraries

* Fixes dask client for sandbox

- changes the client from FrontierSI key to DEA key

* Refactor coastal erosion notebook to use Collection 3 data

* Update waterline funcs load_cloudmaskedlandsat func to use Collection 3 data

* Add tidal data for Gold Coast erosion study area

* Minor fix to notebook to remove references to non-used bands

* Clears evaluated cells in Coastal_Erosion notebook

* Updates instructions to use Gold Coast tide file and Explorer link

* Add tidal file for Perth

* Reupload Perth tides with sensible name

* Delete perth_-32.046331_115.716678_tides.csv

* Delete dalyriver_-13.32_130.23_tides.csv

* Delete josephbonapartegulf_-14.95_129.54_tides.csv

* Delete pointstuart_-12.21_131.82_tides.csv

* Updates references to Explorer and dask resource page

- Removes redundant .DS_Store file

* Adds references to dask resourcing page
emmaai pushed a commit that referenced this pull request Feb 14, 2024

6 participants