Additional initializer #23
Small fixes. Also adjusted tools to use `source_id` instead of `gbif_id`
Added tool_name_override option for Tools, to be able to use custom tools
Added the use of `verification_scheme` instead of hard-coded column names for some parts of the runner, plus some minor fixes
Updated readme - added `how to access data` section. Updated pyproject.toml - added dependency libraries directly into this file, instead of link to `requirements.txt`.
Now there is a distinction between scheduled filtering or scheduling jobs and completed ones. Adjusted logic of scripts according to this change.
Extracted initializers into a class structure. Rewrote initialization calling file to have a dict with initialization types. Added `initializer_type` in mandatory config fields
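The dict-based dispatch described in this commit could look roughly like the sketch below. Only `initializer_type` and `FathomNetInitializer` appear in this conversation; `BaseInitializer`, `GBIFInitializer`, `INITIALIZERS`, `get_initializer`, the `run` method, and the registry keys are hypothetical names chosen for illustration, and the repository's actual classes and signatures may differ.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Type


class BaseInitializer(ABC):
    """Hypothetical base class; stands in for the one in `base_initializer.py`."""

    def __init__(self, config: Dict[str, Any]) -> None:
        self.config = config

    @abstractmethod
    def run(self) -> str:
        """Child initializers implement their source-specific logic here."""


class GBIFInitializer(BaseInitializer):  # hypothetical name
    def run(self) -> str:
        return "gbif"


class FathomNetInitializer(BaseInitializer):
    def run(self) -> str:
        return "fathom_net"


# Illustrative registry: maps the `initializer_type` config value to a class.
# The keys are assumptions, not the project's actual values.
INITIALIZERS: Dict[str, Type[BaseInitializer]] = {
    "gbif": GBIFInitializer,
    "fathom_net": FathomNetInitializer,
}


def get_initializer(config: Dict[str, Any]) -> BaseInitializer:
    """Validate the mandatory field, then build the matching initializer."""
    if "initializer_type" not in config:
        raise KeyError("config is missing the mandatory field 'initializer_type'")
    name = config["initializer_type"]
    if name not in INITIALIZERS:
        raise ValueError(
            f"unknown initializer_type {name!r}; expected one of {sorted(INITIALIZERS)}"
        )
    return INITIALIZERS[name](config)
```

With a registry like this, adding a custom initializer would amount to subclassing the base class and adding one entry to the dict.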
Small fix
Please add a description of the base initializer, inheritance to child initializers, and considerations for making a custom child initializer. Use existing as examples, e.g., filters that could be applied (GBIF excluding
As discussed, put this description into a README in the
Added README.md to initializer. Made small code quality adjustments to initializers
Added doc strings to `base_initializer.py`
Todo: add comments to base initializer and individual data sources as well.
A few more suggestions and questions.
And lastly, it removes any entries that have `MATERIAL_CITATION` in `basisOfRecord`.
- `FathomNetInitializer`: Initializer for the FathomNet dataset. It filters out any entries without an `uuid` or `url`
value.
Additionally, removes any entries that are "not valid" by the `valid` column.
Do you recall what determined if an entry was valid in FathomNet?
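For readers following the README text under review, the two filters it describes can be sketched on plain dictionaries as below. The column names (`basisOfRecord`, `uuid`, `url`, `valid`) come from the README excerpt; the function names are hypothetical, the real initializers likely operate on data frames rather than lists, and treating `valid` as a boolean is an assumption, since what determines validity is exactly the open question above.

```python
from typing import Any, Dict, Iterable, List

Row = Dict[str, Any]


def filter_gbif(rows: Iterable[Row]) -> List[Row]:
    """Drop rows whose `basisOfRecord` equals MATERIAL_CITATION.

    Assumption: the column holds a single value per row, so equality suffices.
    """
    return [r for r in rows if r.get("basisOfRecord") != "MATERIAL_CITATION"]


def filter_fathomnet(rows: Iterable[Row]) -> List[Row]:
    """Keep rows that have a `uuid`, a `url`, and are marked valid.

    Assumption: `valid` is a boolean flag; the real column may be encoded
    differently.
    """
    return [
        r
        for r in rows
        if r.get("uuid") and r.get("url") and r.get("valid") is True
    ]


gbif_rows = [
    {"source_id": 1, "basisOfRecord": "HUMAN_OBSERVATION"},
    {"source_id": 2, "basisOfRecord": "MATERIAL_CITATION"},
]
fathomnet_rows = [
    {"uuid": "a", "url": "https://example.org/a.jpg", "valid": True},
    {"uuid": "b", "url": None, "valid": True},
    {"uuid": "c", "url": "https://example.org/c.jpg", "valid": False},
]

kept_gbif = filter_gbif(gbif_rows)
kept_fathomnet = filter_fathomnet(fathomnet_rows)
```

Here `kept_gbif` retains only the first row and `kept_fathomnet` retains only the row with `uuid` "a".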
Added docstrings to Python files. Adjusted main `README`. Deleted unused `multimedia_scheme` file
Co-authored-by: Elizabeth Campolongo <38985481+egrace479@users.noreply.github.com>
Updated package requirements
Co-authored-by: Elizabeth Campolongo <38985481+egrace479@users.noreply.github.com>
Suggested generalization for clarifying uuids (beyond just TreeOfLife).
Co-authored-by: Elizabeth Campolongo <38985481+egrace479@users.noreply.github.com>
Updated dependency list. Updated readmes
Co-authored-by: Elizabeth Campolongo <38985481+egrace479@users.noreply.github.com>
Co-authored-by: Elizabeth Campolongo <38985481+egrace479@users.noreply.github.com>
Fix in checkpoint
competed_queue: Queue of completed batches
total_batches: Total number of batches to process
done_batches: Number of batches that have been processed
"""

server_name: str
download_complete: threading.Event
competed_queue: queue.Queue[CompletedBatch]
There is a typo here: "competed_queue" instead of "completed_queue". It seems to only exist here (search result), so it should be a simple fix.
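A minimal sketch of the dataclass with the field renamed to `completed_queue`, for reference. The field names and types follow the diff excerpt; the class name `ServerState`, the contents of `CompletedBatch`, the default values, and the `mark_done` helper are hypothetical and only there to make the example runnable.

```python
import queue
import threading
from dataclasses import dataclass, field


@dataclass
class CompletedBatch:
    """Hypothetical stand-in; the real class defines its own fields."""

    batch_id: int


@dataclass
class ServerState:  # hypothetical class name
    """
    server_name: Name of the server
    download_complete: Event set once all batches are processed
    completed_queue: Queue of completed batches
    total_batches: Total number of batches to process
    done_batches: Number of batches that have been processed
    """

    server_name: str
    total_batches: int
    done_batches: int = 0
    # Mutable members need default_factory so each instance gets its own.
    download_complete: threading.Event = field(default_factory=threading.Event)
    completed_queue: "queue.Queue[CompletedBatch]" = field(default_factory=queue.Queue)

    def mark_done(self, batch: CompletedBatch) -> None:
        """Hypothetical helper: record a finished batch (not thread-safe as written)."""
        self.completed_queue.put(batch)
        self.done_batches += 1
        if self.done_batches >= self.total_batches:
            self.download_complete.set()


state = ServerState(server_name="example-server", total_batches=2)
state.mark_done(CompletedBatch(batch_id=0))
state.mark_done(CompletedBatch(batch_id=1))
```

Because the misspelled name reportedly appears only in this file, renaming the docstring entry and the field together should be enough; any later attribute access would then use `completed_queue`.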
dataclasses.py misspelling fix
Co-authored-by: Elizabeth Campolongo <38985481+egrace479@users.noreply.github.com>
Added new initializers - fathom_net, EoL and Lila
Small fixes:
- `MATERIAL_CITATION` filtering for gbif initializer, issue described here