Download service implementation for event data #30

djtfmartin · 2022-04-01T15:14:24Z

Exploration required which should include:

Investigate use of Spark SQL to support the download service
Investigate the connector between Spark and Elastic (Elastic SQL) for reading Elasticsearch from Spark to produce exports. See this
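
As a starting point for that exploration, here is a minimal sketch of reading an index through the elasticsearch-hadoop Spark SQL data source and filtering it with Spark SQL. It assumes the connector jar is on the Spark classpath and that a SparkSession is passed in. The function names, the `events` view name and the CSV output are illustrative assumptions, not a decided design.

```python
from typing import Dict, Optional


def es_read_options(nodes: str, port: int = 9200, query: Optional[str] = None) -> Dict[str, str]:
    """Build reader options for the elasticsearch-hadoop Spark SQL data source."""
    options = {"es.nodes": nodes, "es.port": str(port)}
    if query is not None:
        # es.query takes a URI query (?q=...) or a JSON query DSL string
        options["es.query"] = query
    return options


def export_events(spark, index: str, sql: str, out_path: str, options: Dict[str, str]) -> None:
    """Read an index into a DataFrame, filter it with Spark SQL, write CSV.

    `spark` is an existing SparkSession; `index`, `sql` and `out_path`
    are supplied by the caller (hypothetical values in this sketch).
    """
    df = spark.read.format("org.elasticsearch.spark.sql").options(**options).load(index)
    # Expose the index as a temporary view so the SQL can refer to "events"
    df.createOrReplaceTempView("events")
    spark.sql(sql).write.mode("overwrite").option("header", True).csv(out_path)
```

The options are built separately from the read so the connection settings can be checked without a running cluster.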

There are 4 potential types of download we could support, each with a different implementation complexity.

a) Single dataset download
These would be full exports of the event datasets with our interpretation (taxonomy etc).
These could be pre-generated using pipelines (similar to DwCA export pipeline) and copied to S3 or FS.
These would satisfy the EcoCommons people.
Complexity: LOW

b) Multiple dataset download
Similar to the above, but the ability to package multiple complete datasets (a zip of zips).
Complexity: MEDIUM
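
A minimal sketch of the "zip of zips" packaging, assuming the per-dataset archives from (a) already exist on a local filesystem. The function name and file names are hypothetical.

```python
import zipfile
from pathlib import Path
from typing import Iterable


def bundle_dataset_archives(archives: Iterable[Path], bundle_path: Path) -> Path:
    """Package already-generated per-dataset zips into one outer zip."""
    # ZIP_STORED: the inner archives are already compressed,
    # so compressing them again would gain very little.
    with zipfile.ZipFile(bundle_path, "w", compression=zipfile.ZIP_STORED) as bundle:
        for archive in archives:
            # Store each archive at the top level under its own file name
            bundle.write(archive, arcname=archive.name)
    return bundle_path
```

Two archives with the same file name would collide inside the bundle, so a real implementation would need unique names per dataset.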

c) Query based cross dataset download
This would be the sort of download we are familiar with for occurrence data, but I question whether it is a good idea for event data, where the datasets are all quite different.
If AVRO based, then events need (globally) unique eventIDs, which is something we don't have at the moment.
Complexity: HIGH
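
One possible way to get globally unique keys without changing the source data, sketched here as an assumption rather than a proposal from this thread, is to derive a key from the dataset identifier plus the dataset-local eventID. The function name and the identifiers used are hypothetical.

```python
import uuid


def global_event_key(dataset_key: str, event_id: str) -> str:
    """Derive a deterministic key that is unique across datasets."""
    # uuid5 is name-based: the same inputs always produce the same UUID.
    # The dataset gets its own namespace first, so no separator between
    # dataset_key and event_id is needed and the pair cannot be ambiguous.
    dataset_namespace = uuid.uuid5(uuid.NAMESPACE_URL, dataset_key)
    return str(uuid.uuid5(dataset_namespace, event_id))
```

The same eventID in two datasets then maps to two different keys, while re-running the export yields the same key for the same event.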

d) Sites by species download
Elasticsearch based, using facets
Complexity: MEDIUM
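
A rough sketch of how a facet-based sites by species export could work, assuming a nested terms aggregation (sites, then species within each site). The field names `locationID` and `scientificName`, the aggregation names and the CSV layout are assumptions for illustration only.

```python
import csv
import io
from typing import Any, Dict


def sites_by_species_query(site_field: str = "locationID",
                           species_field: str = "scientificName",
                           size: int = 1000) -> Dict[str, Any]:
    """Build a search body with a species facet nested under a site facet."""
    return {
        "size": 0,  # only the aggregation buckets are needed, not the hits
        "aggs": {
            "sites": {
                "terms": {"field": site_field, "size": size},
                "aggs": {
                    "species": {"terms": {"field": species_field, "size": size}}
                },
            }
        },
    }


def buckets_to_csv(response: Dict[str, Any]) -> str:
    """Turn the nested buckets into a site x species matrix of record counts."""
    site_buckets = response["aggregations"]["sites"]["buckets"]
    counts = {
        site["key"]: {sp["key"]: sp["doc_count"] for sp in site["species"]["buckets"]}
        for site in site_buckets
    }
    # Column order: every species seen in any site, sorted by name
    species = sorted({name for row in counts.values() for name in row})
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["site", *species])
    for site in sorted(counts):
        writer.writerow([site, *(counts[site].get(name, 0) for name in species)])
    return out.getvalue()
```

A terms aggregation only returns the top `size` buckets, so large datasets would likely need paging (for example a composite aggregation) rather than a single request.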

javier-molina · 2022-04-05T06:37:15Z

The new service should be reusable; GBIF is happy to adopt it in the future.

djtfmartin · 2022-05-03T08:16:07Z

Current plan after discussion is to support (a) and (d) in the first instance.

djtfmartin assigned djtfmartin and adam-collins Apr 1, 2022

djtfmartin mentioned this issue Apr 4, 2022

Download Event Data #2

Closed

2 tasks

adam-collins removed their assignment Jun 14, 2023
