Create logic for auto-providing name, description, other data sources metadata, where applicable. #158

@maxachis

Description

name and description are required fields for data sources, but neither is currently captured during labeling and annotation.

Additionally, some collectors pull in extra metadata that could be used to fill in further fields for the URL when it is submitted to data sources.

Thus, the logic should be updated to include subtasks which, depending on the batch strategy of the URL in question, extract the metadata that will be submitted to data sources.

The mapping for each strategy is given below (collector key -> data source field; keys without an arrow keep the same name):

  • ckan
    • submitted_name -> name
    • description
    • record_format
    • data_portal_type
    • supplying_entity
  • auto_googler
    • title -> name
    • snippet -> description
  • muckrock_simple_search/county_search/all_search
    • title -> name

Ideally, the logic that maps collector URL metadata to data source metadata tolerates missing keys.
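As a rough illustration of the mapping above, here is a minimal Python sketch. The names STRATEGY_FIELD_MAP and extract_data_source_metadata are hypothetical, and the expanded strategy names muckrock_county_search and muckrock_all_search are an assumption about what the shorthand in the list refers to. Keys that are absent or empty in the collector metadata are simply skipped rather than raising an error.

```python
from typing import Any, Mapping

# Collector key -> data source field, per batch strategy.
# The keys and fields come from the list in this issue; the dict layout,
# and the expanded muckrock strategy names, are assumptions.
_MUCKROCK_MAP = {"title": "name"}

STRATEGY_FIELD_MAP: dict[str, dict[str, str]] = {
    "ckan": {
        "submitted_name": "name",
        "description": "description",
        "record_format": "record_format",
        "data_portal_type": "data_portal_type",
        "supplying_entity": "supplying_entity",
    },
    "auto_googler": {"title": "name", "snippet": "description"},
    "muckrock_simple_search": _MUCKROCK_MAP,
    "muckrock_county_search": _MUCKROCK_MAP,
    "muckrock_all_search": _MUCKROCK_MAP,
}


def extract_data_source_metadata(
    strategy: str, collector_metadata: Mapping[str, Any]
) -> dict[str, Any]:
    """Map collector metadata to data source fields, skipping missing keys."""
    field_map = STRATEGY_FIELD_MAP.get(strategy, {})
    result: dict[str, Any] = {}
    for source_key, target_field in field_map.items():
        value = collector_metadata.get(source_key)
        # Treat absent, None, and empty-string values as "not provided".
        if value is None or value == "":
            continue
        result[target_field] = value
    return result
```

For example, an auto_googler URL whose collector metadata has a title but no snippet would yield only a name, leaving description to be supplied some other way (for instance during annotation).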

In the database, I'm thinking of adding a URL_optional_data_source_metadata table with a 1:1 relationship to the URL in question.

  • This will hold all metadata not required for the data source.
  • Each column will hold a single metadata field, complete with validation.
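One possible shape for such a table is sketched below, using Python's built-in sqlite3 purely as a stand-in for whatever database the project actually uses. The urls table, the column set (taken from the optional ckan fields above), and the non-empty CHECK constraints are all assumptions for illustration. The 1:1 relationship is expressed as a UNIQUE foreign key on url_id.

```python
import sqlite3

# Hypothetical schema: "urls" stands in for the existing URL table.
SCHEMA = """
CREATE TABLE urls (
    id INTEGER PRIMARY KEY,
    url TEXT NOT NULL
);

CREATE TABLE URL_optional_data_source_metadata (
    id INTEGER PRIMARY KEY,
    -- UNIQUE + NOT NULL gives at most one metadata row per URL (1:1).
    url_id INTEGER NOT NULL UNIQUE REFERENCES urls(id) ON DELETE CASCADE,
    -- Every column is nullable because the metadata is optional;
    -- the CHECKs are placeholder validation (reject empty strings).
    record_format TEXT
        CHECK (record_format IS NULL OR length(record_format) > 0),
    data_portal_type TEXT
        CHECK (data_portal_type IS NULL OR length(data_portal_type) > 0),
    supplying_entity TEXT
        CHECK (supplying_entity IS NULL OR length(supplying_entity) > 0)
);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the sketch tables and enable foreign key enforcement."""
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
```

With this layout, inserting a second metadata row for the same url_id, or an empty string in any column, fails with sqlite3.IntegrityError. Real validation rules per column would likely be stricter than a non-empty check.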
