add analyzer to specific field #9

Closed
aleha84 opened this issue Mar 27, 2017 · 5 comments

Comments

@aleha84

aleha84 commented Mar 27, 2017

I am using version 3.0.0-SNAPSHOT.
When I run a request like GET /index/type/_mapping/field/content, I get this:

{
  "index": {
    "mappings": {
      "type": {
        "content": {
          "full_name": "content",
          "mapping": {
            "content": {
              "type": "text",
              "fields": {
                "keyword": {
                  "type": "keyword",
                  "ignore_above": 256
                }
              }
            }
          }
        }
      }
    }
  }
}

Is it possible to set a specific analyzer for specific fields, as described here?
https://www.elastic.co/guide/en/elasticsearch/reference/current/analyzer.html
Most of my content is in Russian, and I want to search the content field using Russian morphology and stop words.

@essiembre
Contributor

This channel is for the Norconex Elasticsearch Committer only.
I think what you are asking is related to the configuration of Elasticsearch itself, not the Committer library. Please confirm.

@aleha84
Author

aleha84 commented Mar 28, 2017

If I understand correctly, Elasticsearch creates mappings automatically based on the data sent to it, so when I run the crawler for the first time with the Elasticsearch Committer, it creates the index and type automatically. But once the crawl has finished, the index is already populated, and I can't change the analyzer property of the type's fields because analysis happens at index time.

@essiembre
Contributor

That's because you are using the dynamic mapping feature of Elasticsearch, which tries to guess the data type of each field it receives. If you want to control this, you have to define the schema yourself (static mapping). This is something you do within Elasticsearch, not the Collector (refer to the Elastic documentation for this).

This being said, if you want to discover which fields are found, you can leave the dynamic mapping while you are developing/testing. Then you can analyze the fields you get and create the best schema for you before re-indexing for real.
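As a rough sketch of that "analyze the fields you get" step, the snippet below walks a saved `GET /<index>/_mapping` response and lists each top-level field with the type that dynamic mapping guessed. The `list_fields` helper and the sample response are hypothetical; the nesting (index, then `mappings`, then type, then `properties`) is assumed from the typed mapping layout implied by the request path earlier in this thread.

```python
import json


def list_fields(mapping_response):
    """Return sorted (index, type, field, datatype) tuples.

    Assumes the typed layout: index -> "mappings" -> type -> "properties".
    Only top-level fields are listed; multi-fields are ignored.
    """
    rows = []
    for index_name, index_body in mapping_response.items():
        for type_name, type_body in index_body.get("mappings", {}).items():
            for field, spec in type_body.get("properties", {}).items():
                # Fields without an explicit "type" are object fields.
                rows.append((index_name, type_name, field, spec.get("type", "object")))
    return sorted(rows)


# Hypothetical response, saved from a test crawl that used dynamic mapping.
sample = json.loads("""
{
  "index": {
    "mappings": {
      "type": {
        "properties": {
          "content": {"type": "text"},
          "title": {"type": "text"}
        }
      }
    }
  }
}
""")

for row in list_fields(sample):
    print(row)
```

The resulting list can serve as a starting point for deciding which fields to keep and which need an explicit mapping.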

You can also use a few different taggers to help you get just what you want. For instance:

  • KeepOnlyTagger: Use this to only keep the fields you are interested in.
  • RenameTagger: Rename the fields you get from the collector to what you want them to be called in Elasticsearch.
  • DebugTagger: Can print the captured fields and their values to the console/logs, so you can see what you are getting while you are developing (before it reaches Elasticsearch).

The above are part of the Importer module and it is recommended to use them as post-parse handlers so all fields extracted during the parsing of documents are there.

@aleha84
Author

aleha84 commented Mar 28, 2017

I already have a workaround. Before the first indexing, I just put a mapping for the "content" and "title" fields with specific analyzer properties. The Committer only updates the mapping and does not override the existing properties. Works fine. I will think about KeepOnlyTagger. Thanks.
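For readers wanting to reproduce this workaround, here is a minimal sketch of the request body that could be sent with `PUT /index` before the first crawl. It assumes the built-in `russian` language analyzer (stemming plus stop words; dictionary-based morphology would need a separate plugin) and the typed mapping layout of the Elasticsearch version in use at the time; later versions removed mapping types. The `build_index_body` helper and the names `index` and `type` are placeholders, not part of the Committer.

```python
import json


def build_index_body(fields, analyzer="russian", doc_type="type"):
    """Build an index-creation body mapping each field as analyzed text.

    Assumption: the index is created with this body BEFORE the first crawl,
    so dynamic mapping never gets to pick the default analyzer.
    """
    properties = {
        name: {"type": "text", "analyzer": analyzer} for name in fields
    }
    return {"mappings": {doc_type: {"properties": properties}}}


body = build_index_body(["content", "title"])

# Send this JSON as the body of PUT /index (for example with curl or Kibana).
print(json.dumps(body, indent=2))
```

Because analysis happens at index time, the mapping has to exist before any document is committed; changing the analyzer afterwards means re-indexing.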

@essiembre
Contributor

Great, thanks for confirming.
