Investigate use of Spark SQL to support download service
Investigate the connector between Spark and Elasticsearch (Elastic SQL) for reading Elasticsearch data from Spark to produce exports. See this
There are four potential types of download we could support, each with a different implementation complexity.
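As a starting point for the investigation, here is a minimal sketch of reading an index through the elasticsearch-hadoop Spark SQL data source from PySpark. It assumes the connector jar is on the Spark classpath. The helper names, the host, the `datasetKey` field and the Parquet output format are hypothetical choices for illustration, not decisions made in this issue.

```python
import json


def es_read_options(nodes, port=9200, query=None):
    """Build connector options for reading an Elasticsearch index from Spark.

    `es.nodes`, `es.port` and `es.query` are elasticsearch-hadoop settings;
    the helper itself is hypothetical.
    """
    options = {"es.nodes": nodes, "es.port": str(port)}
    if query is not None:
        # es.query accepts a query DSL document passed as a JSON string
        options["es.query"] = json.dumps(query)
    return options


def export_index(spark, index, out_path, options):
    """Read `index` via the connector and write the rows out as Parquet.

    `spark` is an existing SparkSession. Parquet is an assumed export format.
    """
    df = (
        spark.read.format("org.elasticsearch.spark.sql")
        .options(**options)
        .load(index)
    )
    df.write.mode("overwrite").parquet(out_path)
```

Usage would look something like `export_index(spark, "event", "/tmp/export", es_read_options("localhost", query={"query": {"term": {"datasetKey": "dr1"}}}))`, where the index name and path are placeholders.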
a) Single dataset download
These would be full exports of the event datasets with our interpretation (taxonomy etc).
These could be pre-generated using pipelines (similar to DwCA export pipeline) and copied to S3 or FS.
These would satisfy the EcoCommons people.
Complexity: LOW
b) Multiple dataset download
Similar to the above, but the ability to package multiple complete datasets (a zip of zips).
Complexity: MEDIUM
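The "zip of zips" packaging could be sketched as below, assuming the per-dataset archives from option a) have already been generated on a local file system. The function name and paths are hypothetical.

```python
import zipfile
from pathlib import Path


def package_datasets(archive_paths, bundle_path):
    """Bundle already-generated dataset zips into one outer zip.

    Inner archives are stored without recompression because they are
    already compressed. Archives are placed at the top level by file name,
    so the input file names are assumed to be unique.
    """
    with zipfile.ZipFile(bundle_path, "w", compression=zipfile.ZIP_STORED) as bundle:
        for path in archive_paths:
            path = Path(path)
            bundle.write(path, arcname=path.name)
    return bundle_path
```

Storing rather than deflating keeps the bundling step cheap, which matters if bundles are built on request rather than pre-generated.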
c) Query based cross dataset download
This would be the sort of download we are familiar with for occurrence data, but I question whether it is a good idea for event data, where the datasets are all quite different.
If AVRO based, then events need (globally) unique eventIDs, which is something we don't have at the moment.
Complexity: HIGH
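To illustrate the eventID problem, here is a minimal sketch that finds eventIDs shared across datasets and shows one possible way to make them globally unique by prefixing the dataset key. The `datasetKey` field name and the prefixing scheme are assumptions for illustration, not something the pipelines currently do.

```python
from collections import defaultdict


def find_shared_event_ids(events):
    """Return eventIDs that occur in more than one dataset.

    `events` is an iterable of dicts with `eventID` and `datasetKey` keys
    (field names assumed for this sketch).
    """
    datasets_by_id = defaultdict(set)
    for event in events:
        datasets_by_id[event["eventID"]].add(event["datasetKey"])
    return {
        event_id: sorted(keys)
        for event_id, keys in datasets_by_id.items()
        if len(keys) > 1
    }


def global_event_id(dataset_key, event_id):
    """Namespace a dataset-local eventID with its dataset key."""
    return f"{dataset_key}:{event_id}"
```

Running the check over existing data would give a rough idea of how often collisions occur before committing to any ID scheme.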
d) Sites by species download
Elasticsearch based, using facets
Complexity: MEDIUM
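A facet-based sites by species export might look like the sketch below: a nested terms aggregation builds the counts, and the response is pivoted into a table. The field names `locationID` and `scientificName` and the bucket size are assumptions; terms aggregations return only the top buckets, so a large index may need a different approach such as paging with a composite aggregation.

```python
def sites_by_species_query(site_field="locationID",
                           species_field="scientificName",
                           size=1000):
    """Build a search body with species facets nested inside site facets."""
    return {
        "size": 0,  # only aggregations are needed, not the hits themselves
        "aggs": {
            "sites": {
                "terms": {"field": site_field, "size": size},
                "aggs": {
                    "species": {"terms": {"field": species_field, "size": size}}
                },
            }
        },
    }


def to_matrix(response):
    """Pivot the aggregation response into rows of [site, count per species]."""
    rows = {}
    for site in response["aggregations"]["sites"]["buckets"]:
        rows[site["key"]] = {
            bucket["key"]: bucket["doc_count"]
            for bucket in site["species"]["buckets"]
        }
    species = sorted({name for counts in rows.values() for name in counts})
    table = [["site"] + species]
    for site_key in sorted(rows):
        table.append([site_key] + [rows[site_key].get(name, 0) for name in species])
    return table
```

The first row of the table is the header, so it can be written straight out as CSV.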
Exploration is required and should cover the four download types listed above.