Move document_name attribute to meta #217

Merged
merged 7 commits into from Jul 14, 2020

@tanaysoni tanaysoni commented Jul 10, 2020

This PR makes all three document stores (Elasticsearch, SQL, and InMemory) consistent in how they ingest documents through the write_documents() method.

Each input dict for write_documents() has a mandatory text field. All other metadata used for filtering documents (e.g., name, author, category) can be passed in a meta dict. A sample dict looks like this:

{
    "text": "....body-of-the-document...",  # mandatory
    "meta": {  # meta is optional
         "year": "2019",  # all values are strings
         "name": "xyz",
         "author": "abc"
    }
}

For backward compatibility, the meta fields can also be passed as top-level keys in the document dict. For example:

{
    "text": "....body-of-the-document...",  # mandatory
    "name": "abc"  # optional; this will be interpreted as a `meta` field
}
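
To illustrate the two accepted input shapes, here is a minimal sketch of how a document dict could be normalized so that both forms end up identical. The helper name `normalize_document` is hypothetical and not taken from this PR, and the rule that an explicit `meta` entry wins over a top-level key of the same name is an assumption rather than confirmed behavior of the document stores.

```python
from typing import Any, Dict


def normalize_document(doc: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: return the dict in {"text": ..., "meta": {...}} form."""
    if "text" not in doc:
        raise ValueError("document dict must contain a 'text' field")

    # Start from the explicit meta dict, if one was given (copy to avoid mutating input).
    meta = dict(doc.get("meta") or {})

    # Backward compatibility: collect any other top-level keys into meta.
    for key, value in doc.items():
        if key not in ("text", "meta"):
            # Assumption: an explicit meta entry takes precedence over a top-level key.
            meta.setdefault(key, value)

    return {"text": doc["text"], "meta": meta}


nested = normalize_document({"text": "body", "meta": {"name": "xyz"}})
flat = normalize_document({"text": "body", "name": "xyz"})
assert nested == flat
```

With a step like this applied before writing, every store would see the same structure regardless of which input form the caller used.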

@tanaysoni tanaysoni requested a review from tholor July 10, 2020 14:01
@tanaysoni tanaysoni changed the title WIP: Move document_name attribute to meta Move document_name attribute to meta Jul 10, 2020

@tholor tholor left a comment

If I got this right, we are making the document stores more similar for input like {"text": "<the-actual-text>", "meta":[{"name": "some name"}]}.
What about input like {"text": "<the-actual-text>", "some_meta_field": "value"...}? I believe this is still treated differently: for Elasticsearch we "collect" these extra fields in "meta", while we don't do this for InMemory and SQL. Right?

@tanaysoni tanaysoni changed the title Move document_name attribute to meta WIP: Move document_name attribute to meta Jul 10, 2020
@tanaysoni tanaysoni requested a review from tholor July 13, 2020 09:42
@tanaysoni tanaysoni changed the title WIP: Move document_name attribute to meta Move document_name attribute to meta Jul 13, 2020
@tanaysoni tanaysoni merged commit b886e05 into master Jul 14, 2020
@tanaysoni tanaysoni deleted the document-name-fix branch July 14, 2020 07:53