Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(raster-loader) Update raster-loader to generate new Raster and Metadata table format #116

Merged

Conversation

luipir
Copy link
Contributor

@luipir luipir commented Oct 27, 2023

@shortcut-integration
Copy link

@luipir luipir self-assigned this Oct 27, 2023
@luipir luipir added the WIP Work in progress label Oct 27, 2023
@luipir luipir force-pushed the feature/sc-358996/update-raster-loader-to-generate-new-raster branch from 2d355c6 to 8335f7f Compare October 30, 2023 14:16
1) metadata can't be null due to shema
2) renamed attrs to metadata
3) ref to metadata as record where block = 0
4) removed unuseful call to old metadata update now probably unnecessary
@luipir luipir removed the WIP Work in progress label Oct 30, 2023
@Jesus89 Jesus89 changed the base branch from main to feature/refactor November 6, 2023 07:30
raster_loader/io.py Outdated Show resolved Hide resolved
raster_loader/io.py Outdated Show resolved Hide resolved
raster_loader/io.py Outdated Show resolved Hide resolved
raster_loader/io.py Outdated Show resolved Hide resolved
raster_loader/io.py Outdated Show resolved Hide resolved
raster_loader/io.py Outdated Show resolved Hide resolved
setup.cfg Outdated Show resolved Hide resolved
Copy link
Member

@Jesus89 Jesus89 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good. But there are some details to complete before merging:

Regarding the metadata: Why are width and height fields included

{
  "resolution": 8,
  "bands": [
    "band_1_uint8"
  ],
  "bounds": [
    -4.218749999999985,
    48.92249926375824,
    -1.4062499999999951,
    50.73645513701064
  ],
  "center": [
    -2.812499999999989,
    49.82947720038444,
    8
  ],
  "width": 512,
  "height": 512,
  "block_width": 256,
  "block_height": 256,
  "num_blocks": 4,
  "num_pixels": 262144
}

Additionally, let's remove some leftover code and update the tests

@luipir
Copy link
Contributor Author

luipir commented Nov 6, 2023

Overall looks good. But there are some details to complete before merging:

Regarding the metadata: Why are width and height fields included

give the overall size of the original raster. To have this info, should be calculated with block and num of blocks.
IMHO is exactly as center that can be derived from centroid of BBOX.
Should I remove then width and height?

@Jesus89
Copy link
Member

Jesus89 commented Nov 6, 2023

This is not used actually but as num_blocks or num_pixels, we can keep it as informative data 👍

@luipir
Copy link
Contributor Author

luipir commented Nov 20, 2023

I've found also an important bug:

  • The metadata is not incrementally updated on mosaic rasters. This must be done by reading the existing metadata and merging the old and new values.

There are some values missing in the metadata:

  • minresolution
  • maxresolution
  • nodata

We should keep wide_number_mode=>'round' in PARSE_JSON. This was added to solve a recent bug

@luipir @volaya

I didn't understand well @Jesus89 , could you point the mosaic raster you used, probably I missed a use case.

@luipir
Copy link
Contributor Author

luipir commented Nov 20, 2023

  • Fix progressbar count (using the latest job)

hmmm... not sure it is a reliable way to progress bar, because jobs are executed without clear execution order

@luipir
Copy link
Contributor Author

luipir commented Nov 20, 2023

  • Recompute num_blocks and num_pixels

could these values be different by that get from rasterio metadata? if a block fail to upload fail all upload so IMHO no need recompute these data. Other if there is a use case I do not considered.

@Jesus89
Copy link
Member

Jesus89 commented Nov 20, 2023

I didn't understand well @Jesus89 , could you point the mosaic raster you used, probably I missed a use case.

It's basically this code: https://github.com/CartoDB/raster-loader/pull/116/files#diff-35ccab082c7efa2b1a83c6fef7690c297582c4d96b4551043db8f491083886a2L1177. If we add two pieces of a raster mosaic, the first upload, the metadata will be associated to the first piece, but the second upload, the metadata will need to be updated to include both pieces

hmmm... not sure it is a reliable way to progress bar, because jobs are executed without clear execution order

The bug was that it always uses the num_records of the latest job, instead of the record of the specific job. For example, for 1510 blocks, it will push 500 + 500 + 500 + 10, but the previous implementation added 500 + 10 + 10 + 10. It does not matter when the jobs end, but they need to add to the progress bar the correct amount.

could these values be different by that get from rasterio metadata? if a block fail to upload fail all upload so IMHO no need recompute these data. Other if there is a use case I do not considered.

The num params are computed from the raster medatata width, height, block_width, block_height. It pushed the metadata for success upload, if the upload fails, it will have wrong metadata anyway

@volaya
Copy link
Contributor

volaya commented Nov 20, 2023

@Jesus89 I have fixed the issues that you mentioned. While adding a new test for the bounds re-calculation, I found another issue that I have fixed, related to metadata. It was not being written in some cases. Tests were passing because the fixtures were actually wrong.

Let me know what you think

@luipir luipir requested a review from Jesus89 November 21, 2023 08:48
@luipir
Copy link
Contributor Author

luipir commented Nov 21, 2023

LGTM btw Github do not allow me to approve a PR opened by me :)

raster_loader/io.py Outdated Show resolved Hide resolved
Copy link
Member

@Jesus89 Jesus89 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good @volaya!

It seems you fixed all the reported bugs, added extra checks for mosaic rasters, and added more tests. The refactor of the code looks nice too, it's more clear now.

@Jesus89
Copy link
Member

Jesus89 commented Nov 23, 2023

Let's rename "band_name" to "name" in metadata.bands @volaya. We will apply the changes in the API and AT @vdelacruzb

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants