Mapper: An analyzer mapper allowing to control the index analyzer of a document based on a document field #485

Closed
kimchy opened this issue Nov 7, 2010 · 12 comments

Member

kimchy commented Nov 7, 2010

The _analyzer mapping lets a document field's value be used as the name of the analyzer that indexes the document. That analyzer is used for every field that does not explicitly define an analyzer or index_analyzer.

Here is a sample mapping:

{
    "type1" : {
        "_analyzer" : {
            "path" : "my_field"
        }
    }
}

The mapping above uses the value of my_field to look up the analyzer registered under that name. For example, indexing the following document:

{
    "my_field" : "whitespace"
}

causes the whitespace analyzer to be used as the index analyzer for all fields without an explicit analyzer setting.

The default path is _analyzer, so the analyzer for a specific document can be chosen by setting an _analyzer field in it. If a different JSON field name is needed, set an explicit mapping with a different path.
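As an illustration, here is a minimal sketch of a document that relies on the default path. It assumes the type has no explicit _analyzer mapping and uses dynamic mapping, so the _analyzer field is added to the document automatically. The title and body fields are hypothetical:

```json
{
    "_analyzer" : "whitespace",
    "title" : "an mp3",
    "body" : "some text"
}
```

Under those assumptions, both title and body would be indexed with the whitespace analyzer, as long as neither has an explicit analyzer in the mapping.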

Member Author

kimchy commented Nov 7, 2010

Mapper: An analyzer mapper allowing to control the index analyzer of a document based on a document field, closed by 171fa4a.

sebaes commented Nov 9, 2010

Hi Shay,
I couldn't make this feature work; I think it is broken. Take a look at AnalyzerMapper, line 85:
String value = context.doc().get(path);

When indexing a document with:
curl -XPUT 'http://sd:5100/test/s/ZZZ' -d '
{
    "analyzer_field" : "es_analyzer",
    "description" : "an mp3",
    "title" : "mp3"
}
'

value is null because context.doc() doesn't contain the analyzer field.
I'm not sure how to fix the problem, though.

Sebastian.

Member Author

kimchy commented Nov 9, 2010

Did you set a mapping for type s with an _analyzer definition that points to analyzer_field? analyzer_field was just an example of the name of the field that controls the analysis.

Maybe there should be a default for this: for example, if you put an _analyzer property in the JSON, it will be used, so you won't need to define a mapping at all unless you want the analyzer to be driven by a different JSON field. I will change that.

sebaes commented Nov 9, 2010

Yes, I did. The file looks like this:

{
    "s" : {
        "_all" : {"enabled" : false},
        "_source" : {"enabled" : false},
        "_analyzer" : {"path" : "analyzer_field"},
        "dynamic" : false,
        "properties" : {
            "title" : {
                "type" : "string",
                "store" : "yes",
                "index" : "analyzed",
                "analyzer" : "all_analyzer",
                "term_vector" : "with_positions_offsets"
            },
            "description" : {
                "type" : "string",
                "store" : "yes",
                "index" : "analyzed",
                "term_vector" : "with_positions_offsets"
            },
            ...
And I used a field named "analyzer_field" in my JSON, as you can see in my previous post.

I will try the "_analyzer" default.

Thanks

Member Author

kimchy commented Nov 9, 2010

I see the problem. Because your mapping is not dynamic, you need to explicitly define "analyzer_field" in the properties mapping so that it gets added to the document. The same applies to _analyzer: you will need to add it to the properties section. Why don't you use dynamic mapping?

sebaes commented Nov 9, 2010

I see (well, I'm trying to).

I disabled dynamic mapping as a safety measure, to rule out wrong mappings coming from my client code so that I don't push invalid, unsearchable data into an index; at our scale, and with SSD space being scarce, that overhead matters. Do you think this is a bad idea?

So, what are my options? What do you mean by defining "_analyzer" or "analyzer_field" in the properties mapping? Adding another field like "title", but containing what?

Thanks
Sebastian.

Member Author

kimchy commented Nov 9, 2010

Add a mapping with the same name (for example, analyzer_field) of type string.
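Applied to the mapping posted earlier, a minimal sketch might look like the following. It is trimmed to a single data field (description), and the settings other than analyzer_field are copied from that post:

```json
{
    "s" : {
        "_analyzer" : {"path" : "analyzer_field"},
        "dynamic" : false,
        "properties" : {
            "analyzer_field" : {"type" : "string"},
            "description" : {
                "type" : "string",
                "store" : "yes",
                "index" : "analyzed",
                "term_vector" : "with_positions_offsets"
            }
        }
    }
}
```

With dynamic set to false, analyzer_field only reaches the document once it is declared under properties. That declaration is what lets the _analyzer lookup find a value.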

sebaes commented Nov 9, 2010

Hi Shay,
I added the field and everything works now. I also tried dynamic mapping, and that works too. The only problem I noticed is that the "_analyzer" field is added to Lucene as well (I checked with Luke, with both dynamic and non-dynamic settings). It shouldn't be, because it's a "control" field, not data.
Is it feasible to keep it from being added?
Thanks,
Sebastian.

sebaes commented Nov 9, 2010

To be more precise I added:
"_analyzer" : {"type" : "string"},
I thought about adding index=no and store=no, but I didn't see any example with both options disabled. Is that the way to prevent the field from being added?

sebaes commented Nov 9, 2010

Answering myself, so that anyone looking at this issue can use the information: I set both index and store to "no", and it works:

        "_analyzer" : {"type" : "string","store" : "no","index" : "no"},

sebaes commented Nov 9, 2010

My last post wasn't accurate; I misinterpreted the results. The setting that works is:
"_analyzer" : {
    "type" : "string",
    "index" : "no",
    "store" : "yes"
},
With both set to "no", "_analyzer" is ignored and the Standard analyzer, with the English stop word set, kicks in instead. That's why I thought it was working.

Shay, is there a way to avoid storing that "_analyzer" field in every document?
Thanks,
Sebastian.

Member Author

kimchy commented Nov 12, 2010

Pushed support for setting index to no (store defaults to no). I would argue that in almost all cases you want the field indexed (perhaps as not_analyzed), so that later you can query on it and see how each doc was indexed.
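A minimal sketch of the recommended setup maps the control field as not_analyzed, so that it is indexed as a single term and nothing else about the type changes. The type name s follows the earlier posts, and dynamic mapping settings are left as they are:

```json
{
    "s" : {
        "properties" : {
            "_analyzer" : {"type" : "string", "index" : "not_analyzed"}
        }
    }
}
```

As a usage example, a term query like the following should then find the documents that were indexed with, say, the whitespace analyzer:

```json
{
    "query" : {
        "term" : { "_analyzer" : "whitespace" }
    }
}
```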

medcl pushed a commit to medcl/elasticsearch that referenced this issue Jul 1, 2011
williamrandolph pushed a commit to williamrandolph/elasticsearch that referenced this issue Jun 4, 2020
emilykmarx pushed a commit to emilykmarx/elasticsearch that referenced this issue Dec 26, 2023
This issue was closed.