Ready for batch #44

Merged
merged 2 commits into from
Nov 22, 2023

Conversation

yellowcap
Member

What is changing

  • Better retry strategy for matching Sentinel-1 tiles with Sentinel-2
  • Storing data into S3 as compressed netCDF files
  • Uint16 for Sentinel-2 data to save memory and disk space
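
To make the memory saving concrete, here is a minimal sketch using NumPy. It assumes the data would otherwise be held as 32-bit floats, which the PR does not state; the array shape is hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical 4-band, 512 x 512 pixel chip; the shape is illustrative only.
shape = (4, 512, 512)

# Assumption: the alternative representation is float32 (not stated in the PR).
as_float32 = np.zeros(shape, dtype=np.float32)
as_uint16 = as_float32.astype(np.uint16)

# float32 uses 4 bytes per value, uint16 uses 2 bytes per value.
float_bytes = as_float32.nbytes
uint_bytes = as_uint16.nbytes
print(f"float32: {float_bytes} bytes, uint16: {uint_bytes} bytes")

# uint16 can only hold integers in this range, so values must fit inside it.
uint16_max = np.iinfo(np.uint16).max
```

Under that assumption the uint16 array takes half the bytes of the float32 one, before any file compression is applied.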

How it was done

  • Loop through S2 scenes by cloud cover (least cloudy first) until an S1 scene is found within 3 days
  • Use tempfile to write a compressed netCDF file and then upload it.
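
The matching loop in the first bullet can be sketched roughly as follows. This is not the code from scripts/datacube.py: the `Scene` class and `match_s2_to_s1` function are hypothetical names, and picking the S1 scene closest in time is an assumption, since the description only requires one within 3 days.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Scene:
    """Hypothetical stand-in for a catalog search result."""
    scene_id: str
    acquired: datetime
    cloud_cover: float = 0.0  # percent; only meaningful for Sentinel-2


def match_s2_to_s1(s2_scenes, s1_scenes, max_gap=timedelta(days=3)):
    """Return (s2, s1) for the least cloudy S2 scene with an S1 scene within max_gap."""
    # Try the least cloudy Sentinel-2 scene first, then fall back to cloudier ones.
    for s2 in sorted(s2_scenes, key=lambda s: s.cloud_cover):
        candidates = [
            s1 for s1 in s1_scenes if abs(s1.acquired - s2.acquired) <= max_gap
        ]
        if candidates:
            # Assumption: prefer the S1 scene closest in time to the S2 scene.
            closest = min(candidates, key=lambda s1: abs(s1.acquired - s2.acquired))
            return s2, closest
    return None  # no S2 scene has an S1 scene close enough in time


s2 = [
    Scene("S2-a", datetime(2023, 6, 1), cloud_cover=2.0),
    Scene("S2-b", datetime(2023, 6, 10), cloud_cover=15.0),
]
s1 = [Scene("S1-a", datetime(2023, 6, 12))]
pair = match_s2_to_s1(s2, s1)
```

In this example the clearest scene, S2-a, has no S1 acquisition within 3 days, so the loop moves on and pairs the cloudier S2-b with S1-a.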

Will add batch configs later if this works. It's currently in active testing on AWS Batch.
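
The write-to-tempfile-then-upload step could look roughly like the sketch below. The function `write_then_upload` and the two stand-in callables are hypothetical; the writer and uploader are injected so the example runs without xarray or AWS credentials. In practice the writer might be a call such as xarray's `to_netcdf` with zlib compression set in `encoding`, and the uploader might be boto3's `upload_file`, but both are assumptions rather than what the PR necessarily does.

```python
import tempfile
from pathlib import Path
from typing import Callable


def write_then_upload(
    write: Callable[[Path], None],
    upload: Callable[[Path, str], None],
    key: str,
) -> None:
    """Write a file into a temporary directory, upload it, then clean up."""
    with tempfile.TemporaryDirectory() as tmpdir:
        path = Path(tmpdir) / Path(key).name
        write(path)        # assumed: e.g. a compressed netCDF writer
        upload(path, key)  # assumed: e.g. an S3 upload call
    # The temporary directory and its contents are removed at this point.


# Stand-ins so the sketch is runnable on its own.
uploaded = {}


def fake_write(path: Path) -> None:
    path.write_bytes(b"netcdf-bytes")


def fake_upload(path: Path, key: str) -> None:
    uploaded[key] = path.read_bytes()


write_then_upload(fake_write, fake_upload, "tiles/example.nc")
```

Writing to a temporary directory keeps the local disk clean on a batch worker, since nothing is left behind after the upload finishes.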

Refs

#10

scripts/datacube.py (outdated, resolved)
@@ -394,22 +403,63 @@ def process(
return ds_merge
Contributor

@weiji14 weiji14 Nov 20, 2023

Managed to save one MGRS tile, and tried dragging and dropping the NetCDF into QGIS, but it ends up being georeferenced to (0, 0). We can try to add the georeferencing later (using rioxarray) to the xr.Dataset to make it easier to plot in QGIS.

Member Author

That is a good idea, I'll see if I can get this to work for the tiles.

Member Author

I did quite a bit of work on the xarrays so that they are proper rioxarrays. The algorithm now stores tiles that can be opened in QGIS and friends.

@weiji14 weiji14 added the data-pipeline Pull Requests about the data pipeline label Nov 20, 2023
@weiji14 weiji14 added this to the v0 Release milestone Nov 20, 2023
@lillythomas
Contributor

The script stack_tile.py works for calling this version of the datacube module and the tile module. We just need to add the file writing to tile.py now.

@yellowcap yellowcap marked this pull request as ready for review November 21, 2023 16:08
@yellowcap
Member Author

OK, it's working locally. Outputting compressed TIF tiles with 13 bands. I think that is even easier to use than netCDF. I'll try to run this in the cloud later today to see if it works there too.

@yellowcap
Member Author

I will add a section for this in the README, plus the job configurations and instructions for running it on Batch, over the next few days in a separate PR.

@yellowcap yellowcap merged commit b6fad67 into main Nov 22, 2023
2 checks passed
@yellowcap yellowcap deleted the ready-for-batch branch November 22, 2023 13:50
This was referenced Dec 1, 2023