Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add SQLite Backend #148

Merged
merged 15 commits into from Jan 15, 2021
Merged

add SQLite Backend #148

merged 15 commits into from Jan 15, 2021

Conversation

vincentsarago
Copy link
Member

@vincentsarago vincentsarago commented Jan 13, 2021

The SQLite MosaicJSON backend follows what we did in dynamoDB.

mosaicjson_metadata Table

By default we store the mosaic metadata in a reserved table mosaicjson_metadata. It's schema is following the mosaicJSON specification (minus the tiles).

Schema

row type
mosaicjson TEXT NOT NULL
name TEXT NOT NULL
description TEXT
version TEXT NOT NULL
attribution TEXT
minzoom INTEGER NOT NULL
maxzoom INTEGER NOT NULL
quadkey_zoom INTEGER
bounds JSON NOT NULL
center JSON

{mosaic} Table

When creating a mosaic, the backend will update the mosaicjson_metadata table with the metadata and update a new table with the mosaic_name provided. This table will be used to store the tiles info ({quadkey: [datasets]})

Schema

row type
quadkey TEXT NOT NULL,
assets JSON NOT NULL

path and mosaic name

The database can old multiple mosaics thus the user needs to pass it in the path name

{schema}:///{path to sqlite file}:{mosaic_name}
sqlite:///mydb.db:mosaic

To Do

  • add tests
  • add docs

ref #142

Copy link
Member

@kylebarron kylebarron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the main reason we set -1 to be the metadata quadkey for DynamoDB is that DynamoDB is a no-sql database, so the values are any json document. In a relational database like SQLite, I think it would be more standard to expand each key. For the tiles mapping, it would still only have two columns: quadkey and tiles, where quadkey could be an integer and tiles a JSONb. But I think standard schema for metadata would have it be a separate table, so that more fields could be their own columns.

cogeo_mosaic/backends/sqlite.py Outdated Show resolved Hide resolved
cogeo_mosaic/backends/sqlite.py Outdated Show resolved Hide resolved
cogeo_mosaic/backends/sqlite.py Outdated Show resolved Hide resolved
cogeo_mosaic/backends/sqlite.py Outdated Show resolved Hide resolved
cogeo_mosaic/backends/sqlite.py Outdated Show resolved Hide resolved
cogeo_mosaic/backends/sqlite.py Outdated Show resolved Hide resolved
cogeo_mosaic/backends/sqlite.py Outdated Show resolved Hide resolved
cogeo_mosaic/backends/sqlite.py Outdated Show resolved Hide resolved
vincentsarago and others added 2 commits January 13, 2021 12:07
Co-authored-by: Kyle Barron <kylebarron2@gmail.com>
@vincentsarago
Copy link
Member Author

@kylebarron

I think the main reason we set -1 to be the metadata quadkey for DynamoDB is that DynamoDB is a no-sql database, so the values are any json document. In a relational database like SQLite, I think it would be more standard to expand each key. For the tiles mapping, it would still only have two columns: quadkey and tiles, where quadkey could be an integer and tiles a JSONb. But I think standard schema for metadata would have it be a separate table, so that more fields could be their own columns.

I agree this is quite of a hack, so following what you are saying we could create 2 tables, using the current mosaic_name as a prefix:

  • {mosaic_name}_metadata
  • {mosaic_name}_tiles

my only worry is that we have to hard code the schema of MosaicJSON, meaning that any change in the mosaicJSON will need to be reflected here to.

@kylebarron
Copy link
Member

I agree this is quite of a hack, so following what you are saying we could create 2 tables, using the current mosaic_name as a prefix:

No, I was suggesting one global metadata table for all mosaics, plus one data table for each mosaic. E.g. geopackage has gpkg_metadata as the global table, plus a table for each dataset layer.

So a global metadata table like mosaicjson_metadata with one row of metadata for each table, plus a {mosaic_name} table for each mosaic.

my only worry is that we have to hard code the schema of MosaicJSON, meaning that any change in the mosaicJSON will need to be reflected here to.

Yes... With relational databases you need to have a stable schema, or otherwise deal with database migration when your schema changes.

You could still have a global metadata table + a data table for each mosaic, while still storing all fields as jsonb 🤷‍♂️

);
""",
self.metadata.dict(),
)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit verbose, maybe there is a better way to write this

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 to named, not positional, arguments

@@ -101,7 +101,7 @@ def assets_for_point(self, lng: float, lat: float) -> List[str]:

@cached(
TTLCache(maxsize=cache_config.maxsize, ttl=cache_config.ttl),
key=lambda self, x, y, z: hashkey(self.path, x, y, z),
key=lambda self, x, y, z: hashkey(self.path, x, y, z, self.mosaicid),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unrelated to the SQLite backend, but while writing the tests I realized that we were caching this even when the mosaic changed... to fix it we can use the mosaicid (built from the metadata, e.g mosaic version) to make sure we call the function when the mosaic has been updated.

@@ -151,7 +152,7 @@ def update(
"""Update existing MosaicJSON on backend."""
logger.debug(f"Updating {self.mosaic_name}...")

new_mosaic = self.mosaic_def.from_features(
new_mosaic = MosaicJSON.from_features(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think using self.mosaic_def.from_features( was a bit misleading

@vincentsarago vincentsarago marked this pull request as ready for review January 14, 2021 03:50
if not self.mosaic_def and not os.path.exists(self.db_path):
raise MosaicNotFoundError(
f"SQLite database not found at path {self.db_path}."
)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do not create a file if mosaic_def is not passed and file doesn't exist yet

@vincentsarago vincentsarago merged commit b0596ab into master Jan 15, 2021
@vincentsarago vincentsarago deleted the SQLiteBackend branch January 15, 2021 03:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants