Error while mutating Tokenizer not available for language #2601

danielmai · 2018-09-18T17:46:44Z

Title: Error while mutating Tokenizer not available for language

If you suspect this could be a bug, follow the template.

What version of Dgraph are you using?
v1.0.8
Have you tried reproducing the issue with latest release?
Yes.
What is the hardware spec (RAM, OS)?
64 GB. Ubuntu 18.04.
Steps to reproduce the issue (command/config used to run Dgraph).

Run 1 Dgraph Zero and 1 Dgraph Alpha.
Run dgraph live with the 21-million movie data set. (rdf and schema from the benchmarks repo):

dgraph live -r 21million.rdf.gz -s 21million.schema

After running for ~50 minutes, I see a lot of aborts and repeated "Tokenizer not available" errors. Something like this:

2018/09/18 10:41:52 batch.go:125: Error while mutating Tokenizer not available for language: pl
2018/09/18 10:41:52 batch.go:125: Error while mutating Tokenizer not available for language: sl
2018/09/18 10:41:52 batch.go:125: Error while mutating Tokenizer not available for language: hi
2018/09/18 10:41:52 batch.go:125: Error while mutating Tokenizer not available for language: sr
Total Txns done:    17094 RDFs per second:    3992 Time Elapsed: 1h11m22s, Aborts: 70232
Total Txns done:    17094 RDFs per second:    3990 Time Elapsed: 1h11m24s, Aborts: 70232
Total Txns done:    17094 RDFs per second:    3988 Time Elapsed: 1h11m26s, Aborts: 70232
Total Txns done:    17094 RDFs per second:    3986 Time Elapsed: 1h11m28s, Aborts: 70232
2018/09/18 10:42:00 batch.go:125: Error while mutating Tokenizer not available for language: es-419

This continues indefinitely from this point on. Dgraph live keeps retrying the failed mutations.

Expected behaviour and actual result.

Dgraph live loader finishes loading the data set.

danielmai · 2018-09-18T19:17:38Z

I tried re-running dgraph live from a fresh cluster with the following modified schema:

diff --git a/data/21million.schema b/data/21million.schema
index 9dc91a2..7900554 100644
--- a/data/21million.schema
+++ b/data/21million.schema
@@ -4,7 +4,7 @@ genre                : uid @reverse @count .
 initial_release_date : datetime @index(year) .
 rating               : uid @reverse .
 country              : uid @reverse .
-loc                  : geo @index(geo) .
-name                 : string @index(hash, fulltext, trigram) @lang .
+loc                  : geo .
+name                 : string @index(hash, trigram) @lang .
 starring             : uid @count .
 _share_hash_         : string @index(exact) .

It's been running for an hour and I see aborts in the live loader output:

Total Txns done:    19519 RDFs per second:    3823 Time Elapsed: 1h25m5s, Aborts: 1454

In the Alpha's /debug/requests I see this trace for Server.Mutate:


When | Elapsed (s) | Event
-- | -- | --
2018/09/18 12:11:18.324662 | 110.322906 | Server.Mutate
12:11:18.325889 | .  1227 | ... Added Internal edges
12:11:18.328082 | .  2192 | ... Proposing data with key: 01-17683798971607446653. Timeout: 4s
12:11:18.328369 | .   287 | ... Waiting for the proposal.
12:11:22.328219 | 3.999850 | ... Internal context timed out with error: context deadline exceeded. Retrying...
12:11:22.328249 | .    30 | ... Proposing data with key: 01-17134495942084430214. Timeout: 8s
12:11:22.328630 | .   380 | ... Waiting for the proposal.
12:11:30.328496 | 7.999866 | ... Internal context timed out with error: context deadline exceeded. Retrying...
12:11:30.328534 | .    38 | ... Proposing data with key: 01-16310624025916951683. Timeout: 16s
12:11:30.329358 | .   824 | ... Waiting for the proposal.
12:11:46.328672 | 15.999314 | ... Internal context timed out with error: context deadline exceeded. Retrying...
12:11:46.328704 | .    32 | ... Proposing data with key: 01-1565500641011909092. Timeout: 32s
12:11:46.329015 | .   311 | ... Waiting for the proposal.
12:12:18.328814 | 31.999800 | ... Internal context timed out with error: context deadline exceeded. Retrying...
12:12:18.328833 | .    18 | ... Proposing data with key: 01-9267239880346609593. Timeout: 1m4s
12:12:18.329192 | .   359 | ... Waiting for the proposal.
12:13:08.639079 | 50.309887 | ... Done with error: <nil>
12:13:08.642800 | .  3721 | ... Prewrites err: <nil>. Attempting to commit/abort immediately.
12:13:08.647566 | .  4766 | ... Status of commit at ts: 39894: <nil>

srfrog · 2018-09-29T02:22:51Z

We can fix some languages with #2602, but we are still missing some, such as @pl. Here's a list of the languages we could support: https://github.com/blevesearch/bleve/tree/master/analysis/lang

* add language aliases for broader support. Reuse known language stopwords for similar languages. If/when support for those languages is added, the aliases are ignored.
* added a test for all supported and potential fulltext index language tokenizers

Ref: #2601
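
The alias idea described above can be sketched roughly as follows. This is an illustration only, not the code from #2602: `resolveLang`, the `supported` set, and the `aliases` table are hypothetical, and their entries are placeholders rather than Dgraph's or bleve's real language lists. The sketch tries the full tag first, then the base language (so `es-419` falls back to `es`), and consults the alias table only when the language itself is not supported.

```go
package main

import (
	"fmt"
	"strings"
)

// supported is an illustrative set of languages with a fulltext tokenizer.
// Assumption: the real list comes from the tokenizer library.
var supported = map[string]bool{"en": true, "es": true, "no": true}

// aliases maps an unsupported language to a similar supported one.
// Assumption: a placeholder entry, not the table used by Dgraph.
var aliases = map[string]string{"nb": "no"}

// resolveLang (hypothetical) returns the language whose tokenizer should be
// used for tag, or ok=false if nothing suitable is available.
func resolveLang(tag string) (lang string, ok bool) {
	tag = strings.ToLower(strings.TrimSpace(tag))
	if tag == "" {
		return "", false
	}
	base := tag
	if i := strings.IndexAny(tag, "-_"); i > 0 {
		base = tag[:i] // "es-419" -> "es"
	}
	for _, c := range []string{tag, base} {
		if supported[c] {
			return c, true // native support wins, so the alias is ignored
		}
		if a, found := aliases[c]; found && supported[a] {
			return a, true
		}
	}
	return "", false
}

func main() {
	for _, tag := range []string{"es-419", "nb", "pl"} {
		lang, ok := resolveLang(tag)
		fmt.Printf("%-6s -> %q (resolved: %v)\n", tag, lang, ok)
	}
}
```

Returning `ok=false` instead of an error leaves the caller free to decide what to do with a language such as `pl`, for example skipping language-specific stopwords rather than failing the whole mutation.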

danielmai added the kind/bug Something is broken. label Sep 18, 2018

srfrog pushed a commit that referenced this issue Sep 19, 2018

add language aliases for broader support.

4636597

Reuse known language stopwords for similar languages. If/when support for those languages is added, the aliases are ignored. Ref: #2601

srfrog mentioned this issue Sep 19, 2018

add language aliases for broader support. #2602

Merged

srfrog self-assigned this Sep 28, 2018

srfrog mentioned this issue Nov 8, 2018

Refactor bleve tokenizer usage #2738

Merged

srfrog closed this as completed in #2738 Nov 16, 2018
