feat: adds data_source metadata to ElementMetadata #690

Merged
merged 21 commits into from
Jun 8, 2023

@ryannikolaidis ryannikolaidis commented Jun 6, 2023

We need to add a few abstractions to the ingest base classes to support querying data and populating it when ingesting files. To support this, Element metadata needs a few optional fields for tracking data source metadata:

  • version (to know when something is out of date; last modified could work for this field in some cases)
  • source url (where applicable, so an end user can view original source content)
  • record locator (for querying for the document, which could be used to determine if it exists). It may make sense to make this a dictionary since in some cases there may be multiple values that are used to uniquely identify a document.
  • date_created, date_modified, date_processed (for general query of the document state)

This PR adds these fields to a new data_source field in ElementMetadata. It additionally adds an exists() property to the base IngestDoc definition. Subclasses should leverage record_locator to determine whether a given IngestDoc exists on the source.
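
For illustration, here is a minimal sketch of how the fields listed above and the exists property might fit together. It is not the code from this PR: the class names `DataSourceMetadata`, `BaseIngestDoc`, and `InMemoryIngestDoc`, the field types, and the dict-backed "source" are all assumptions made for the example.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass
class DataSourceMetadata:
    """Hypothetical container for the optional data source fields."""

    url: Optional[str] = None
    version: Optional[str] = None
    # A dict, since several values may be needed to uniquely identify a document.
    record_locator: Optional[Dict[str, Any]] = None
    date_created: Optional[str] = None
    date_modified: Optional[str] = None
    date_processed: Optional[str] = None

    def to_dict(self) -> Dict[str, Any]:
        # Drop unset fields so an unpopulated data_source serializes as {}.
        return {k: v for k, v in asdict(self).items() if v is not None}


class BaseIngestDoc(ABC):
    """Hypothetical base class; subclasses decide how to check the source."""

    @property
    @abstractmethod
    def record_locator(self) -> Optional[Dict[str, Any]]:
        """Values that uniquely identify this document on the source."""

    @property
    @abstractmethod
    def exists(self) -> Optional[bool]:
        """Whether the document can currently be found on the source."""


class InMemoryIngestDoc(BaseIngestDoc):
    """Toy subclass: the 'source' is a plain dict keyed by document name."""

    def __init__(self, store: Dict[str, str], key: str) -> None:
        self.store = store
        self.key = key

    @property
    def record_locator(self) -> Dict[str, Any]:
        return {"key": self.key}

    @property
    def exists(self) -> bool:
        # Use the record locator to look the document up on the source.
        return self.record_locator["key"] in self.store
```

A real connector would replace the dict lookup with a query against its source (for example, an API call keyed on the record locator values).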

bonus: bump dependencies (adds freezegun to validate date_processed functionality)
bonus: adds functionality to exclude nested metadata by dot notation
bonus: updates ingest fixtures to account for the data_source field, which is empty once date_processed is excluded
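
As a rough sketch of what excluding nested metadata by dot notation could look like (the function name `exclude_fields` and its signature are hypothetical, not taken from this PR), a path such as `data_source.date_processed` can be split on dots, walked down to the parent dictionary, and the final key removed:

```python
import copy
from typing import Any, Dict, Iterable


def exclude_fields(metadata: Dict[str, Any], paths: Iterable[str]) -> Dict[str, Any]:
    """Return a copy of metadata with each dot-notation path removed.

    Paths that do not exist are ignored.
    """
    result = copy.deepcopy(metadata)  # leave the caller's dict untouched
    for path in paths:
        *parents, leaf = path.split(".")
        node: Any = result
        # Walk down to the dict that holds the final key.
        for key in parents:
            node = node.get(key) if isinstance(node, dict) else None
            if node is None:
                break
        if isinstance(node, dict):
            node.pop(leaf, None)
    return result


meta = {
    "filename": "a.txt",
    "data_source": {
        "url": "https://example.com/a.txt",
        "date_processed": "2023-06-08T00:00:00",
    },
}
print(exclude_fields(meta, ["data_source.date_processed"]))
# {'filename': 'a.txt', 'data_source': {'url': 'https://example.com/a.txt'}}
```

Excluding a volatile field such as date_processed this way keeps fixture output stable between runs.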

Testing

Adds unit test coverage for data source metadata fields

@cragwolfe

These abstractions make sense to me! ✅

@ryannikolaidis ryannikolaidis marked this pull request as ready for review June 8, 2023 00:52

@cragwolfe cragwolfe left a comment

LGTM!

@ryannikolaidis ryannikolaidis merged commit 2094b97 into main Jun 8, 2023
20 checks passed
@ryannikolaidis ryannikolaidis deleted the ryan/new-connector-abstractions branch June 8, 2023 04:22