Question About EdgeNGram #20

Closed
dvreed77 opened this issue Jan 31, 2016 · 1 comment

@dvreed77

I'm not seeing the expected behaviour when using the EdgeNGram Analyzer. First I have some setup questions:

I have configured my settings.py as per the instructions:

HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'elasticstack.backends.ConfigurableElasticSearchEngine',
        'URL': 'http://127.0.0.1:9200/',
        'INDEX_NAME': 'haystack',
    },
}
ELASTICSEARCH_INDEX_SETTINGS = {
     'settings': {
         "analysis": {
             "analyzer": {
                 "synonym_analyzer" : {
                     "type": "custom",
                     "tokenizer" : "standard",
                     "filter" : ["synonym"]
                 },
                 "ngram_analyzer": {
                     "type": "custom",
                     "tokenizer": "lowercase",
                     "filter": ["haystack_ngram", "synonym"]
                 },
                 "edgengram_analyzer": {
                     "type": "custom",
                     "tokenizer": "standard",
                     "filter": ["haystack_edgengram"]
                 }
             },
             "tokenizer": {
                 "haystack_ngram_tokenizer": {
                     "type": "nGram",
                     "min_gram": 1,
                     "max_gram": 15,
                 },
                 "haystack_edgengram_tokenizer": {
                     "type": "edgeNGram",
                     "min_gram": 1,
                     "max_gram": 15,
                     "side": "front"
                 }
             },
             "filter": {
                 "haystack_ngram": {
                     "type": "nGram",
                     "min_gram": 1,
                     "max_gram": 15
                 },
                 "haystack_edgengram": {
                     "type": "edgeNGram",
                     "min_gram": 1,
                     "max_gram": 15
                 },
                 "synonym" : {
                     "type" : "synonym",
                     "ignore_case": "true",
                     "synonyms_path" : "synonyms.txt"
                 }
             }
         }
     }
 }

When I run python manage.py show_mapping --detail, I see:

default
-------
{
    "django_ct": {
        "include_in_all": false,
        "index": "not_analyzed",
        "type": "string"
    },
    "text": {
        "type": "string",
        "analyzer": "edgengram_analyzer"
    },
    "django_id": {
        "include_in_all": false,
        "index": "not_analyzed",
        "type": "string"
    },
    "address": {
        "type": "string",
        "analyzer": "edgengram_analyzer"
    }
}

That looks good, but it is not what I get when I run GET /haystack/_mapping with curl. Instead I see:

{
   "haystack": {
      "mappings": {
         "modelresult": {
            "properties": {
               "address": {
                  "type": "string"
               },
               "django_ct": {
                  "type": "string"
               },
               "django_id": {
                  "type": "string"
               },
               "id": {
                  "type": "string"
               },
               "text": {
                  "type": "string"
               }
            }
         }
      }
   }
}

Also, when I run GET /haystack/_settings with curl, I see:

{
   "haystack": {
      "settings": {
         "index": {
            "creation_date": "1454206129416",
            "number_of_shards": "5",
            "number_of_replicas": "1",
            "uuid": "3kN6tYG0SHWkICsDT63mBw",
            "version": {
               "created": "2010199"
            }
         }
      }
   }
}
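
One way to make this mismatch explicit is to check the settings response for the analyzers that settings.py is supposed to define. The following is only a minimal sketch: the helper name missing_analyzers is hypothetical, and it assumes that custom analyzers, when present, show up under settings.index.analysis.analyzer in the response.

```python
def missing_analyzers(settings_response, index_name, expected):
    """Return the expected analyzer names that are absent from a
    GET /<index>/_settings response (already parsed into a dict)."""
    index_settings = (
        settings_response.get(index_name, {})
        .get("settings", {})
        .get("index", {})
    )
    # Assumption: custom analyzers are listed under index.analysis.analyzer.
    defined = index_settings.get("analysis", {}).get("analyzer", {})
    return sorted(name for name in expected if name not in defined)


# Trimmed copy of the response shown above: no "analysis" key at all.
live = {
    "haystack": {
        "settings": {
            "index": {
                "number_of_shards": "5",
                "number_of_replicas": "1",
            }
        }
    }
}

expected = ["synonym_analyzer", "ngram_analyzer", "edgengram_analyzer"]
print(missing_analyzers(live, "haystack", expected))
```

For the response above, all three analyzer names come back as missing, which suggests the index was created without the custom analysis settings.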

On to expected behavior. When I do:

GET /haystack/modelresult/_search
{
    "query": {
        "match": {
            "text": "bost"
        }
    }
}

I don't get a match on "30 Boston St.", for example.
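
To illustrate why a match would be expected here, this is a rough plain-Python sketch of what a front-anchored edge n-gram filter with min_gram 1 and max_gram 15 produces. It does not call Elasticsearch; the function name edge_ngrams is hypothetical, the token list is written out by hand, and case handling is deliberately left out by using lowercase input.

```python
def edge_ngrams(token, min_gram=1, max_gram=15):
    """Return the prefixes of token with lengths min_gram..max_gram."""
    longest = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, longest + 1)]


# Hand-written tokens, roughly what a word-level tokenizer would give
# for "30 boston st." (lowercased here as a simplifying assumption).
tokens = ["30", "boston", "st"]
indexed = {gram for token in tokens for gram in edge_ngrams(token)}

print(edge_ngrams("boston"))
print("bost" in indexed)
```

With the analyzer applied at index time, "bost" would be one of the stored terms for the "boston" token, so a match query for "bost" should find the document.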

Do I have something configured wrong?

@dvreed77
Author

I figured it out. The put_mapping call to Elasticsearch was failing silently because I didn't have a synonyms.txt file.
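
A small pre-flight check can surface this kind of problem before the index is built. The sketch below walks a settings dict shaped like ELASTICSEARCH_INDEX_SETTINGS and reports any synonyms_path whose file is missing. The helper name missing_synonym_files and the config directory are assumptions: Elasticsearch resolves a relative synonyms_path against its own config directory, whose location depends on the installation.

```python
import os


def missing_synonym_files(index_settings, config_dir):
    """Return (filter_name, path) pairs for synonym files that do not exist.

    config_dir is the directory that relative synonyms_path values are
    resolved against (an assumption the caller has to supply).
    """
    filters = (
        index_settings.get("settings", {})
        .get("analysis", {})
        .get("filter", {})
    )
    missing = []
    for name, spec in sorted(filters.items()):
        path = spec.get("synonyms_path")
        if path and not os.path.isfile(os.path.join(config_dir, path)):
            missing.append((name, path))
    return missing


# Trimmed version of the filter section from settings.py above.
settings = {
    "settings": {
        "analysis": {
            "filter": {
                "haystack_edgengram": {"type": "edgeNGram", "min_gram": 1, "max_gram": 15},
                "synonym": {"type": "synonym", "synonyms_path": "synonyms.txt"},
            }
        }
    }
}

# "/etc/elasticsearch" is only an example location, not a known default here.
print(missing_synonym_files(settings, "/etc/elasticsearch"))
```

If the returned list is not empty, the listed files need to be created (or the filters removed) before the index settings and mapping are pushed.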
