Not sure if I did something wrong here. I tried indexing parliament-sweden-old locally and got this error:
elasticsearch.BadRequestError: BadRequestError(400, 'mapper_parsing_exception', 'Failed to parse mapping: analyzer [stemmed_en] has not been configured in mappings')
I fixed it by changing the definition of the speech field:
So I think this corpus is just missing its own definition for the mapping (and language) of the speech field? This seems to be true for other parliament corpora too.
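The error suggests that the mapping refers to an analyzer name that the index settings never define. A minimal sketch of a check for that mismatch follows; the function name `missing_analyzers`, the `BUILTIN_ANALYZERS` set (deliberately not exhaustive), and the example index body are all hypothetical and not taken from the I-analyzer code.

```python
# Hypothetical helper: not part of I-analyzer or the elasticsearch client.
# A small, non-exhaustive set of analyzers that Elasticsearch ships with.
BUILTIN_ANALYZERS = {"standard", "simple", "whitespace", "stop", "keyword"}


def missing_analyzers(index_body):
    """Return analyzer names used in the mappings but not defined in the settings."""
    defined = set(
        index_body.get("settings", {}).get("analysis", {}).get("analyzer", {})
    )
    used = set()

    def walk(node):
        # Recursively collect every string value stored under an analyzer key.
        if isinstance(node, dict):
            for key, value in node.items():
                if key in ("analyzer", "search_analyzer") and isinstance(value, str):
                    used.add(value)
                else:
                    walk(value)

    walk(index_body.get("mappings", {}))
    return sorted(used - defined - BUILTIN_ANALYZERS)


# Assumed example: the settings define a Swedish analyzer,
# but the speech field still points at the English one.
index_body = {
    "settings": {
        "analysis": {
            "analyzer": {
                "stemmed_sv": {"tokenizer": "standard", "filter": ["lowercase"]}
            }
        }
    },
    "mappings": {
        "properties": {
            "speech": {
                "type": "text",
                "fields": {"stemmed": {"type": "text", "analyzer": "stemmed_en"}},
            }
        }
    },
}

print(missing_analyzers(index_body))  # ['stemmed_en']
```

Running a check like this before creating the index would report the problem without a round trip to Elasticsearch.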
What did you expect to happen?
The index operation should run without exceptions.
Screenshot
No response
Where did you find the bug?
a local server
Version
develop (~5.4.0)
Steps to reproduce
Configure the backend settings to include the parliament-sweden-old corpus. Add the corpus definition to CORPORA and add any string value for PP_SWEDEN_OLD_DATA.
Run yarn django index parliament-sweden-old
Yes, this is indeed still a to-do that I got stuck on. I have a branch somewhere that applies the new mapping style (with language suffix) to all corpora, but I realized that we can't deploy it unless we reindex all corpora first. I did not know the best solution at the time, and then forgot to flag the problem.
What we could do:
1. Apply the new mapping style to all non-English corpora and reindex them.
2. Overhaul the mapping style so that only corpora with multiple values in the languages array get the new mapping style.
The second option will be harder to understand for outside developers, I think, but so will be the language suffix for (the majority of) corpora which aren't multilingual.
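To make the second option concrete, here is a rough sketch. It assumes that the language suffix is applied to analyzer names (as `stemmed_en` in the error message suggests) and that monolingual corpora would keep unsuffixed names; the function `analyzer_name` and its parameters are hypothetical and not taken from the codebase.

```python
# Hypothetical sketch of the second option: suffix only for multilingual corpora.
def analyzer_name(base, language, corpus_languages):
    """Name an analyzer, adding a language suffix only for multilingual corpora."""
    if len(corpus_languages) > 1:
        return f"{base}_{language}"
    return base


print(analyzer_name("stemmed", "sv", ["sv"]))        # stemmed
print(analyzer_name("stemmed", "en", ["sv", "en"]))  # stemmed_en
```

The conditional is what makes this variant harder to follow for outside developers: the name of an analyzer depends on a property of the corpus, not only on the field.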
No, I don't think so, as the analyzers are defined per corpus. The different language analyzers won't affect the query syntax, as far as I can foresee. Visualizations, however, may be affected by this. Will have to look at this again and will comment on the issue if I spot some problems.
Hm, actually, I would prefer it if this were fixed sooner rather than later. I actually do index them quite regularly on my local machine for testing. They're now in a weird state where the code does not work but is still supposed to be maintained.
For reference, the speech field definition changed in the fix described above is at I-analyzer/backend/corpora/parliament/sweden-old.py, lines 88 to 89 in d040118.