Add ARD overpass notebook + supplementary data #736

Merged
8 commits merged into GeoscienceAustralia:develop on Feb 19, 2021

Conversation

@Eric-git-999 (Contributor) commented Dec 6, 2020


Proposed changes

Supplementary data for ARD_overpass_predictor notebook in dea-notebooks/Frequently_used_code/

Checklist (replace [ ] with [x] to check off)

  • Notebook created using the DEA-notebooks template
  • Remove any unused Python packages from Load packages
  • Remove any unused/empty code cells
  • Remove any guidance cells (e.g. General advice)
  • Ensure that all code cells follow the PEP8 standard for code. The jupyterlab_code_formatter tool can be used to format code cells to a consistent style: select each code cell, then click Edit and then one of the Apply X Formatter options (YAPF or Black are recommended).
  • Include relevant tags in the final notebook cell (refer to the DEA Tags Index, and re-use tags if possible)
  • Clear all outputs, run notebook from start to finish, and save the notebook in the state where all cells have been sequentially evaluated
  • Test notebook on both the NCI and DEA Sandbox (flag if not working as part of PR and ask for help to solve if needed)
  • If applicable, update the Notebook currently compatible with the NCI|DEA Sandbox environment only line below the notebook title to reflect the environments the notebook is compatible with

@Eric-git-999 changed the title from "Add files via upload" to "Add input dataset for ARD overpass predictor notebook" on Dec 6, 2020
@Eric-git-999 changed the title from "Add input dataset for ARD overpass predictor notebook" to "Add input dataset for ARD overpass notebook" on Dec 6, 2020
@MatthewJA (Contributor) commented Dec 6, 2020

Hi Eric, thanks for your PR. I can't seem to find the ARD overpass notebook in question; could you please provide a link? Thanks.

@Eric-git-999 (Contributor, Author) commented Dec 6, 2020 via email

@MatthewJA (Contributor) commented:

Great, looking forward to it :) Just add it to this PR and we can look at the notebook + data at the same time.

@Eric-git-999 changed the title from "Add input dataset for ARD overpass notebook" to "Add ARD overpass notebook + supplementary data" on Dec 7, 2020
@Eric-git-999 (Contributor, Author) commented:

Hey I have added the notebook!

@MatthewJA self-requested a review on December 7, 2020 at 00:08
@Eric-git-999 (Contributor, Author) commented:

Tidied up the first Markdown cell with a link to the DEA Sandbox, and the DEA image.

Review comment (Contributor) on the notebook:

I think you can assume that the user will change their input file if they want to analyse somewhere else. Instead of telling the user to change the file, explain how the file is formatted (which you've done below anyway) and they can make that change if they want. So I reckon remove the "Caution" line.

The input file is CSV now, right, not xlsx?

I don't quite understand this line, as I think I should be able to run the notebook without making any changes:

Make changes to the notebook, following the Steps in bold

Load packages shouldn't be a subsection of Description; it should sit in a Getting started section. Check the template and make sure you're matching how it is formatted and structured.


Reply via ReviewNB

Review comment (Contributor) on the notebook:

Why are the secondary overpasses not already in datetime format? I think you can get pd.read_csv to force them all into datetime format automatically.

What is a secondary overpass and why would I want to have one in my input? Could you please add a little explanation?

Don't use os.chdir as it has a tendency to make the rest of the notebook harder to understand and may break existing scripts. Read the file using a relative path instead. You also can't assume that this notebook is in jovyan/ (e.g. I am testing it in a different place!) so try something like ../Supplementary_data/ARD_overpass_predictor/overpass_input.csv instead.

Input file looks good!


Reply via ReviewNB
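A minimal sketch of the two suggestions above (parsing dates at read time and using a relative path). The column names passed to parse_dates are placeholders; adjust them to the real header of overpass_input.csv:

import pandas as pd

# Read the input via a relative path instead of calling os.chdir first.
# parse_dates tells pandas to convert the named columns to datetime automatically;
# the column names here are assumptions, not the notebook's actual header.
overpass_df = pd.read_csv(
    "../Supplementary_data/ARD_overpass_predictor/overpass_input.csv",
    parse_dates=["overpass", "secondary_overpass"],
)
print(overpass_df.dtypes)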

Review comment (Contributor) on the notebook:

If there's no reason not to use the more accurate timestamps, let's use those.


Reply via ReviewNB

Review comment (Contributor) on the notebook:

What's the significance of 20? I'm a bit confused as to what this is doing. I think it's finding the next 20 times Landsat will be overhead? Please make this a bit clearer.


Reply via ReviewNB
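For illustration, a minimal sketch of what generating "the next 20 overpasses" could look like if the cell steps forward in whole repeat cycles from a reference pass; the reference time and the 16-day Landsat repeat cycle used here are assumptions for the sketch, not values taken from the notebook:

import pandas as pd

reference_pass = pd.Timestamp("2020-12-01 23:50")  # hypothetical last known overpass (UTC)
repeat_cycle = pd.Timedelta(days=16)               # Landsat repeat cycle
n_passes = 20                                      # number of future overpasses to predict

future_passes = [reference_pass + i * repeat_cycle for i in range(1, n_passes + 1)]
predicted = pd.DataFrame({"predicted_overpass": future_passes})
print(predicted.head())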

Review comment (Contributor) on the notebook:

Again, why 32?


Reply via ReviewNB

Review comment (Contributor) on the notebook:

If this is evident from the input file, then we should be able to automatically extract it from the input file. Please do that instead of having to edit the notebook if at all possible.

Also, overpass_input.csv?


Reply via ReviewNB
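A minimal sketch of pulling the site list straight from the input file instead of hard-coding it, assuming a hypothetical "site" column; match the column name to the real CSV layout:

import pandas as pd

overpass_df = pd.read_csv("../Supplementary_data/ARD_overpass_predictor/overpass_input.csv")

# Derive the site names from the data so the notebook needs no editing
# when a user adds or removes sites in the CSV ("site" is a placeholder name).
sites = overpass_df["site"].unique().tolist()
print(sites)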

Review comment (Contributor) on the notebook:

I'm not sure what the datestep means. Is it a meaningful value? If not, maybe we could just give the rows an index?


Reply via ReviewNB

Review comment (Contributor) on the notebook:

Maybe we can output all of the field sites that were in the input? That would make it easier.


Reply via ReviewNB

Review comment (Contributor) on the notebook:

to_csv rather than to_excel


Reply via ReviewNB

Review comment (Contributor) on the notebook:

It's a good idea to have the output filename at the top of the notebook along with other analysis parameters like the input name.


Reply via ReviewNB
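Combining the last two comments, a minimal sketch of an up-front "analysis parameters" cell that holds the input and output filenames, followed by a CSV export; the paths and the stand-in results table are placeholders:

import pandas as pd

# Analysis parameters: the only values a user should need to edit
input_csv = "../Supplementary_data/ARD_overpass_predictor/overpass_input.csv"
output_csv = "overpass_predictions.csv"

# (the notebook body would build the real predictions; a tiny stand-in table
# keeps this sketch runnable)
results_df = pd.DataFrame({"predicted_overpass": pd.to_datetime(["2021-01-05", "2021-01-21"])})

# Export with to_csv rather than to_excel, using the parameter defined above
results_df.to_csv(output_csv, index=False)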

@MatthewJA (Contributor) commented:

This is a really useful notebook @EricHay, and it'll be a great addition to Frequently_used_code! A few comments so far that I've posted on ReviewNB, and a few I'll post here. My main request is that you add more documentation. This looks like a useful tool, but dea-notebooks is an entry point for much of the DEA environment, so we want all notebooks to be well-explained and detailed so that even beginners can understand them. Take a look at some of the other Frequently_used_code notebooks - that's the level of documentation I'd like to see! Could you please add more markdown cells that explain what you're doing and why, so that people without a strong background in the topic can understand what's going on?

There's also a lot of repetition between the satellites and locations, which could be eliminated by some loops or functions. That'd make the notebook much easier to edit and work with, and so it'd be much more useful! Thanks :)
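A hedged illustration of the loops/functions suggestion: one dictionary entry per satellite replaces a copy-pasted block per satellite, and the results are concatenated into a single table. The satellite names, reference passes and cycle lengths below are placeholders, not the notebook's actual values:

import pandas as pd

satellites = {
    "Landsat 8":   {"reference_pass": pd.Timestamp("2020-12-01 23:50"), "cycle_days": 16},
    "Sentinel-2A": {"reference_pass": pd.Timestamp("2020-12-03 00:10"), "cycle_days": 10},
    "Sentinel-2B": {"reference_pass": pd.Timestamp("2020-12-08 00:10"), "cycle_days": 10},
}

frames = []
for name, info in satellites.items():
    cycle = pd.Timedelta(days=info["cycle_days"])
    passes = [info["reference_pass"] + i * cycle for i in range(1, 21)]
    frames.append(pd.DataFrame({"satellite": name, "predicted_overpass": passes}))

# One combined table instead of a separate DataFrame per satellite
all_passes = pd.concat(frames).sort_values("predicted_overpass").reset_index(drop=True)
print(all_passes.head())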

@Eric-git-999 (Contributor, Author) commented:

Great, thanks Matthew. I sure can fix up the documentation and code. I did this a while back as a learning exercise and it is a pretty convoluted process!

@robbibt (Collaborator) commented Dec 7, 2020

Hey @EricHay, notebook looks awesome! To echo some of what Matthew wrote above, I think the key changes that are needed are:

  1. Extra markdown documentation and explanation from a really basic beginner level, explaining both what each section does, but also why it is needed. A lot of users of dea-notebooks have never dealt with complex code before, so we try and walk them through everything pretty slowly. This is particularly the case for the Frequently_used_code directory which is kind of like a "library" where users can jump in and copy and paste examples of code into their own analyses - we want them to have as good an idea as possible about what each section does so they can re-use and re-purpose the code themselves later on.

  2. Wherever possible, it would be great if all bits of code that require user input were moved up to the very top in an "analysis parameters" section (see example here). We typically find that users somewhat blindly run through most of the notebook body without paying close attention to things that need to be changed later in the code, so having all the configurable bits in one single up-front section means they only need to focus on changing one bit rather than having to be on the lookout for changes all the way through in order to get valid results.

Happy to help out with any of this stuff! It's looking great though, and will be an excellent addition to the repo. 🚀

@Eric-git-999 (Contributor, Author) commented:

Thanks @robbibt, sorry I'm a bit flat out today with meetings, but I will get to polishing this off soon. Agree on both points 👍 and thanks for the feedback!

@MatthewJA (Contributor) commented Dec 8, 2020

Take your time, no rush :)

Also feel free to Slack me (or Robbi probably?) if you need any help figuring out how to document it.

@Eric-git-999 (Contributor, Author) commented:

Ok, I have done some major tidying. I have simplified things A LOT! It is actually kinda fun to revisit old notebooks and fix them up.

The notebook is now pretty much automatic. There is no more specifying which sites to order by, etc., and it merges using pd.concat instead of pd.merge, which was messy. You can specify an output directory / file name at the top of the notebook. I have also simplified the input file to use 3 sites as an example, and added commented-out lines where you can add extra sites.

The only comment I couldn't really address was about repetition around the looping for different satellites. I am not sure how to "loop loops" :p

I combined these all into one cell and tried to explain it a bit better.

Let me know if I should change anything further!

@Eric-git-999 (Contributor, Author) commented:

Sorry, I forgot to remove some text referring to the old process. Should be good now!

Review comment (Collaborator) on the notebook:

These commented-out bits make the code a little harder to follow:

#Sentinel_2A = Sentinel_2A + datetime.timedelta(hours=10) #convert to local time (Aus eastern standard time) = utc + 10 hours
#Sentinel_2A ### to AEDT, add 11h not 10 ###

I think it might be better to simplify the code by removing them from here and instead make it really clear in the Combine dataset bit that the times are in UTC (?) (and possibly give an example of how to convert the columns in the final table there).


Reply via ReviewNB
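A minimal sketch of the alternative suggested above: keep the stored times in UTC and show the conversion once in the combined table. The column name and example times are placeholders:

import pandas as pd

combined = pd.DataFrame(
    {"overpass_utc": pd.to_datetime(["2021-01-05 23:50", "2021-01-21 23:50"])}
)

# Times are stored in UTC; convert only for display.
# AEST is a fixed UTC+10 offset; use dt.tz_convert("Australia/Sydney") on a
# timezone-aware column instead if daylight saving (AEDT) matters.
combined["overpass_aest"] = combined["overpass_utc"] + pd.Timedelta(hours=10)
print(combined)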

Review comment (Collaborator) on the notebook:

Same as above: can we remove the commented-out lines here? Anyone who wants to look at the data can always edit and add them back in themselves:

#S2B_combined

Reply via ReviewNB

Review comment (Collaborator) on the notebook:

This table looks really nice and useful! The "Site" title for the first column is a tiny bit confusing (perhaps it should be "Overpass"?) but that's a super minor thing and probably not worth updating.


Reply via ReviewNB

@robbibt (Collaborator) commented Jan 11, 2021

Hey @EricHay, I just posted a few very minor comments above, it's looking great! Thanks for all your work in updating this, I think it should be much easier to use now.

@Eric-git-999 (Contributor, Author) commented:

Thanks @robbibt. Easy enough fixes, I agree on all points. This has been a bit of a backburner project, and I didn't even notice the "Site" column in the final table! That can definitely be dropped. I will do another quick tidy, and comment again once done.

@Eric-git-999 (Contributor, Author) commented:

Ok, tidied up the unnecessary commented-out code in cells, updated the final table with "Overpass" as the index, and added optional code in the Combine Dataset section to add/alter time zones, with UTC to AEST as an example (+10 h). I think this is straightforward enough for an average person with some Python knowledge to use now :)

@robbibt (Collaborator) left a review:

Whoops, did I forget to accept this? I think this looks much nicer and easier to use/follow, thanks for putting in these changes!

When you're happy to merge, select the "squash and merge" option below :)

@robbibt merged commit f294ec0 into GeoscienceAustralia:develop on Feb 19, 2021
emmaai pushed a commit that referenced this pull request Feb 14, 2024
* Add files via upload

Supplementary data for ARD_overpass_predictor notebook in dea-notebooks/Frequently_used_code/

* Add files via upload

* Add files via upload

* Add files via upload

* Add files via upload

* Add files via upload

* Add files via upload

* Add files via upload