
Support for nested fields in ElasticsearchDocumentStore #291

@moise-g


Question
Does the ElasticsearchDocumentStore support nested data types? If so, how can I select a nested field as the search field or text field?

Additional context

  • I am able to retrieve documents using document_store.get_all_documents_in_index('document_index'):
In [6]: print(document_store.get_document_count())
   ...: for doc in document_store.get_all_documents_in_index('document_index'):
   ...:     pp.pprint(doc)
   ...:     break
   ...:
   ...:
11196
{   '_id': 'qWwo9nIB7CZChbLK4FjC',
    '_index': 'document_index',
    '_score': None,
    '_source': {   'actor_type': 'content',
                   'media': [   {   'actor_id': [],
                                    'actor_type': 'content',
                                    'body': 'FTS International Has Fragility '
                                            'In The Short-Term; Balance Sheet '
                                            'Remains Vulnerable (NYSE:FTSI) '
                                            'Its high leverage ratio is a '
                                            'significant risk factor in the '
                                            'current environment.\n'
                                            '\n'
                                            ...
                                    'linked_concept_search_id': [342],
                                    'locations': None,
                                    'media_type': 'rss',
                                    'meta': [   {   'key': 'link',
                                                    'value': 'https://seekingalpha.com/article/4352427-fts-international-fragility-in-short-term-balance-sheet-remains-vulnerable'}],
                                    'similar_dictionaries': [],
                                    'sql_handle_id': None,
                                    'sql_media_id': 'gXVpzzMzkQ',
                                    'tags': [   231027,
                                                233849,
                                                233408,
                                                231786,
                                                231102,
                                                231124,
                                                231857,
                                                233795
                                            ],
                                    'title': 'FTS International Has Fragility '
                                             'In The Short-Term; Balance Sheet '
                                             'Remains Vulnerable '
                                             '(NYSE:FTSI)'}]},
    '_type': 'actor',
    'sort': [2]}
  • document_store.get_all_documents(), however, raises a KeyError:
In [7]: document_store.get_all_documents()
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-7-0571d29188b7> in <module>
----> 1 document_store.get_all_documents()

~/miniconda3/lib/python3.7/site-packages/haystack/database/elasticsearch.py in get_all_documents(self)
    153     def get_all_documents(self) -> List[Document]:
    154         result = scan(self.client, query={"query": {"match_all": {}}}, index=self.index)
--> 155         documents = [self._convert_es_hit_to_document(hit) for hit in result]
    156         return documents
    157

~/miniconda3/lib/python3.7/site-packages/haystack/database/elasticsearch.py in <listcomp>(.0)
    153     def get_all_documents(self) -> List[Document]:
    154         result = scan(self.client, query={"query": {"match_all": {}}}, index=self.index)
--> 155         documents = [self._convert_es_hit_to_document(hit) for hit in result]
    156         return documents
    157

~/miniconda3/lib/python3.7/site-packages/haystack/database/elasticsearch.py in _convert_es_hit_to_document(self, hit, score_adjustment)
    285         document = Document(
    286             id=hit["_id"],
--> 287             text=hit["_source"][self.text_field],
    288             external_source_id=hit["_source"].get(self.external_source_id_field),
    289             meta=meta_data,

KeyError: 'media.full_body'
  • I have tried using a flattened (dotted) notation for the field names, but the retriever returns no results:
In [1]: from haystack.retriever.sparse import ElasticsearchRetriever
   ...: from haystack.database.elasticsearch import ElasticsearchDocumentStore

In [2]: document_store = ElasticsearchDocumentStore(host="localhost", username="", password="", text_field="media.full_body", index="vopak-monitoring", name_field="media.title",
   ...:                                             search_fields=["media.full_body", "media.title"], create_index=False)
   ...: retriever = ElasticsearchRetriever(document_store=document_store)
   ...: res = retriever.retrieve("Energy")
   ...: print(res)
[]
  • I also tried using a custom query, without success.
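The traceback shows where the lookup fails: _convert_es_hit_to_document reads hit["_source"][self.text_field], a single flat dictionary lookup, so a dotted name like "media.full_body" is treated as one literal key rather than as a path into the media array. A minimal sketch of resolving such a path by hand, assuming _source has the shape printed above; get_dotted is a hypothetical helper, not part of Haystack, and the toy dict below is shortened from the printed hit:

```python
from typing import Any, Dict, List


def get_dotted(source: Dict[str, Any], path: str) -> List[Any]:
    """Return every value found at a dotted path such as "media.full_body".

    Hypothetical helper (not part of Haystack). Lists met along the way,
    like the `media` array, are expanded, so the result is always a flat
    list of the values found (empty if the path does not exist).
    """
    current: List[Any] = [source]
    for key in path.split("."):
        found: List[Any] = []
        for value in current:
            if isinstance(value, dict) and key in value:
                child = value[key]
                # Expand arrays so each element is inspected on the next step.
                if isinstance(child, list):
                    found.extend(child)
                else:
                    found.append(child)
        current = found
    return current


# Toy `_source` shaped like the hit printed above (values shortened).
source = {"actor_type": "content",
          "media": [{"title": "FTS International ...", "body": "Its high leverage ratio ..."}]}
print(get_dotted(source, "media.title"))      # ['FTS International ...']
print(get_dotted(source, "media.full_body"))  # [] (key absent in this toy dict)
```

Where a flat source["media.full_body"] raises KeyError, this walks the nested structure instead; whether the real documents contain a full_body key at all cannot be confirmed from the truncated output above.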
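If media is mapped with the Elasticsearch nested field type (the issue does not show the index mapping, so this is an assumption), a plain multi_match on media.full_body does not search inside the nested objects, which would be consistent with the empty result above; the Query DSL requires wrapping the inner query in a nested query whose path names the nested field. A sketch that builds such a request body; build_nested_query is a hypothetical helper, and the field names are the ones from the attempt above:

```python
import json
from typing import Any, Dict, List


def build_nested_query(text: str, path: str, fields: List[str], size: int = 10) -> Dict[str, Any]:
    """Build a search body that runs multi_match inside a nested object.

    Hypothetical helper; assumes `path` is mapped with "type": "nested".
    Field names inside a nested query keep their full dotted path.
    """
    return {
        "size": size,
        "query": {
            "nested": {
                "path": path,
                "query": {"multi_match": {"query": text, "fields": fields}},
            }
        },
    }


body = build_nested_query("Energy", "media", ["media.full_body", "media.title"])
print(json.dumps(body, indent=2))
```

Under that assumption, the dict could be passed as body= to Elasticsearch.search in the 7.x elasticsearch-py client. If media is instead an ordinary object field, Elasticsearch rejects a nested query on it, and the dotted names can be used directly in a regular multi_match.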
