Support multifield path per field #4099

Closed
roytmana opened this issue Nov 5, 2013 · 11 comments

Comments

@roytmana

roytmana commented Nov 5, 2013

Currently the multi_field mapping supports the path parameter only per property, so if you need several fields mapped for a property, they will all use either the full path name or just_name.

This is inconvenient when you want a property with, say, two secondary fields: one with the full name (because it only makes sense as a variant of the primary field, say not analyzed) and one with just_name, because you want an all-like field to which many of your properties contribute.

Consider an example (part of a larger JSON document):

"category": {"code":"CTZ", "description":"My Description"}

code was indexed as a multi_field, resulting in the names:
category.code
category.code.untouched

Later I want a my_all field into which I index category.code as well as other fields.

I add path:"just_name" to my mapping, along with another field, my_all.

That immediately breaks my application, because untouched becomes a just_name mapping as well, and the untouched values from all my data elements are rolled into it.

My current workaround is to give the untouched field its full name (category.code.untouched) as the field key, so it keeps its full name. I am not sure this is intentional behavior, but it seems to work (I need to test it more).

A much cleaner approach would be to allow a path per field in the multi_field mapping.
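
The naming behaviour described above can be sketched as a small Python model. This is a simplified illustration built only from the names reported in this thread, not Elasticsearch's actual implementation; the function `multi_field_names` and its arguments are hypothetical.

```python
def multi_field_names(full_path, field_keys, path="full"):
    """Return the index name of each multi_field sub-field.

    Simplified model of the behaviour reported in this issue, not
    Elasticsearch code. `full_path` is the dotted path of the property,
    e.g. "category.code".
    """
    prop_name = full_path.rsplit(".", 1)[-1]
    names = {}
    for key in field_keys:
        if path == "just_name":
            # Assumption: every sub-field is indexed under its bare key.
            names[key] = key
        elif key == prop_name:
            # The primary field shares the property's full path.
            names[key] = full_path
        else:
            names[key] = full_path + "." + key
    return names


# Default (full) path: both names are qualified.
full = multi_field_names("category.code", ["code", "untouched"])

# Adding my_all requires just_name, which also flattens "untouched".
flat = multi_field_names(
    "category.code", ["code", "untouched", "my_all"], path="just_name"
)

# Workaround: spell out the full name as the sub-field key.
fixed = multi_field_names(
    "category.code",
    ["code", "category.code.untouched", "my_all"],
    path="just_name",
)
```

In this model the workaround succeeds only because the bare key already contains the dots, which is why it is unclear whether the behaviour is intentional.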

@imotov
Contributor

imotov commented Nov 6, 2013

I attempted to solve this problem a few months ago in #2535. I even started implementing it, but then I stumbled upon an issue that I couldn't quite figure out. To be consistent, if we support a field-level path for multi_field, we also need to support it for objects. And with objects, adding a field-level path creates an interesting issue, which the following example demonstrates. Assuming that we enable "path":"just_name" on the field level, how should we deal with the following mapping?

{
    "my_type": {
        "properties": {
            "abc": {
                "properties": {
                    "foo": {
                        "type": "object",
                        "path": "just_name",
                        "properties": {
                            "bar": { 
                                "type": "string"
                            }
                        }
                    }, 
                    "baz": {
                        "type": "string",
                        "path": "just_name"
                    }
                }
            }
        }
    }
}

Here we have an object abc with two properties, foo and baz. With baz everything is clear: we have a field-level "path":"just_name", so {"abc":{"baz":"123"}} should be indexed as baz:123.

With foo it's not as clear. Does "path":"just_name" apply to foo as a property of abc, or does it apply to foo as an object (the way it works today)? In the former case, "path":"just_name" is a field-level path and therefore {"abc": {"foo": {"bar":123}}} should be indexed as foo.bar:123. In the latter case, "path":"just_name" is an object-level path that affects only the properties of foo, and therefore {"abc": {"foo": {"bar":123}}} should be indexed as bar:123.
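
The two readings can be put side by side in a short Python sketch. It only restates the ambiguity from the comment above; the function `index_name` and the interpretation labels are hypothetical and do not correspond to any Elasticsearch API.

```python
def index_name(parents, leaf, interpretation):
    """Resolve the index name of a leaf field under the two readings.

    `parents` lists the enclosing object names, e.g. ["abc", "foo"],
    where the last one ("foo") carries "path": "just_name".
    """
    if interpretation == "field_level":
        # just_name strips foo's own ancestors; bar stays under foo.
        return ".".join(parents[-1:] + [leaf])
    if interpretation == "object_level":
        # just_name applies to foo's properties; bar loses every prefix.
        return leaf
    raise ValueError("unknown interpretation: " + interpretation)


field_level = index_name(["abc", "foo"], "bar", "field_level")
object_level = index_name(["abc", "foo"], "bar", "object_level")
```

The same mapping therefore yields two different index names, which is the inconsistency that blocked the field-level path.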

@roytmana
Author

roytmana commented Nov 6, 2013

Thank you, Igor. Do you see any danger in giving multi_field fields "."-separated names to mimic the full name?

Like award.activity.code.na in the example below. I can work around the issue this way if using "." does not pose any future compatibility risk. It works so far, but I want to be sure that is not accidental.

...
"activity": {
    "dynamic": "true",
    "properties": {
        "code": {
            "type": "multi_field",
            "path": "just_name",
            "fields": {
                "code": {
                    "type": "string",
                    "index_options": "offsets",
                    "boost": 0.3
                },
                "all": {
                    "type": "string",
                    "index_options": "offsets",
                    "boost": 0.3
                },
                "all_stem": {
                    "type": "string",
                    "index_options": "offsets",
                    "boost": 0.3
                },
                "award.activity.code.na": {
                    "type": "string",
                    "index": "not_analyzed",
                    "omit_norms": true,
                    "index_options": "docs",
                    "include_in_all": false
                }
            }
        },
...

@imotov
Contributor

imotov commented Nov 7, 2013

I think your workaround should work, but it's not pretty. I would love to come up with a better solution. Perhaps we can change the path_only behavior, add a field_path_only attribute to differentiate it from the object-level path_only, or add a full_index_name attribute that would work as follows. Fields in a document with the following mapping:

{
    "my_type": {
        "properties": {
            "obj": {
                "properties": {
                    "foo": {
                        "type": "string"
                    }, 
                    "bar": {
                        "type": "string",
                        "index_name": "alt_bar"
                    }, 
                    "baz": {
                        "type": "string",
                        "full_index_name": "alt_baz"
                    }
                }
            }
        }
    }
}

would be indexed as obj.foo, obj.alt_bar and alt_baz.
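
A minimal Python sketch of this resolution rule follows. Note that `full_index_name` is only the attribute proposed in this comment, and the helper `resolve` is hypothetical; neither is claimed to exist in Elasticsearch.

```python
def resolve(parent_path, prop_name, mapping):
    """Resolve a property's index name under the proposed rule."""
    if "full_index_name" in mapping:
        # Proposed attribute: an absolute name that ignores the parent path.
        return mapping["full_index_name"]
    # index_name replaces only the last path segment.
    local = mapping.get("index_name", prop_name)
    return parent_path + "." + local if parent_path else local


properties = {
    "foo": {"type": "string"},
    "bar": {"type": "string", "index_name": "alt_bar"},
    "baz": {"type": "string", "full_index_name": "alt_baz"},
}

resolved = {name: resolve("obj", name, m) for name, m in properties.items()}
```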

In other words, I think that we've identified the problem, but the solution still requires some brainstorming.

@clintongormley

I find the {path: just_name | full} setting rather confusing, for the reasons listed above.

My suggestion is to remove the path setting and instead to support index_name and index_path.

If you specify an index_name then it is appended to the full path.
If you specify an index_path then it sets the absolute field name.

This would allow mappings like:

{
    "my_type": {
        "properties": {
            "name": {
                "properties": {
                    "first": {
                        "type": "multi_field",
                        "fields": {
                            "first": { "type": "string" },
                            "full": { 
                                "type": "string",
                                "index_path": "name.full"
                            }
                        }
                    }, 
                    "last": {
                        "type": "multi_field",
                        "fields": {
                            "last": { "type": "string", "index_name": "surname" },
                            "full": { 
                                "type": "string",
                                "index_path": "name.full"
                            }
                        }
                    } 
                }
            }
        }
    }
}

In this example, you'd have:

  • name.first and name.first.first
  • name.last and name.last.surname
  • name.full (which would index values from name.first and name.last)

What do you think of this proposal?
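
To make the proposal concrete, here is a hedged Python sketch of how sub-field names and values might resolve under it. `index_path` is only the attribute proposed in this comment; the helpers `sub_field_name` and `index_document`, and the sample values "John" and "Smith", are made up for illustration.

```python
from collections import defaultdict


def sub_field_name(multi_field_path, key, mapping):
    """Resolve a multi_field sub-field name under the proposal."""
    if "index_path" in mapping:
        # Proposed: index_path sets the absolute field name.
        return mapping["index_path"]
    # Proposed: index_name (or the key) is appended to the full path.
    return multi_field_path + "." + mapping.get("index_name", key)


# The "fields" sections of the mapping above, keyed by multi_field path.
fields_by_path = {
    "name.first": {
        "first": {"type": "string"},
        "full": {"type": "string", "index_path": "name.full"},
    },
    "name.last": {
        "last": {"type": "string", "index_name": "surname"},
        "full": {"type": "string", "index_path": "name.full"},
    },
}


def index_document(values):
    """Collect the values that end up under each resolved field name."""
    indexed = defaultdict(list)
    for path, value in values.items():
        for key, field in fields_by_path[path].items():
            indexed[sub_field_name(path, key, field)].append(value)
    return dict(indexed)


indexed = index_document({"name.first": "John", "name.last": "Smith"})
```

In this sketch name.full collects the values of both properties because both sub-fields declare the same absolute index_path.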

@imotov
Contributor

imotov commented Nov 7, 2013

Yes! index_path - that's the name I was looking for. I love it! We need to think about the proper way of removing path though. We need to make sure that it's still possible to migrate current indices with path in them to the next version without reindexing. Maybe we can deprecate path and keep it for a while. It will continue to affect index_name, but not index_path.

@roytmana
Author

roytmana commented Nov 7, 2013

Good news! I never liked the just_name thing. Good riddance. What would prevent me from putting the wrong index name on the primary field, or is it always derived from its mapping JSON key (the one matching the property name)?

What if we replace fields with an array and get rid of the redundant JSON keys for the fields? The absence of index_name/index_path would then mark the primary field, ensuring it always stays in sync with the property name:

"last": {
    "type": "multi_field",
    "fields": [
        { "type": "string" },
        { "type": "string", "index_name": "untouched" ...},
        { "type": "string", "index_path": "name.full" }
    ]
}

@clintongormley

@roytmana for 90% of use cases, specifying the field name and the mapping is more understandable than the array with index_name, so I'd keep that as it is.

@roytmana
Author

roytmana commented Nov 7, 2013

@clintongormley Requiring a field name that matches the property name to identify the primary field is error-prone and redundant; it would be better to supply no name at all. Of course, not supplying the name would not work with a JSON object structure.

Maybe it was intuitive before you introduced the concepts of index_name and index_path, but not anymore. Now these attributes are the important ones, and the field JSON keys are truly redundant: they serve no purpose and only make the mapping bigger and more confusing, with competing naming strategies.

just my 2c

@mattweber
Contributor

I would like to see something where we can do a basic field "copy" without needing all the multi_field stuff, unless you need different analyzer settings, etc. Something like:

{
    "my_type": {
        "properties": {
            "name": {
                "properties": {
                    "first": {"type":"string", "copy_to":["name.full"]},
                    "last": {"type":"string", "index_name": "surname", "copy_to":["name.full"]},
                    "full": {"type": "string"}
                }
            }
        }
    }
}

This would give us:

name.first
name.surname
name.full (w/ both first and last)

This way we can avoid all the duplicated "full" mappings.
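
The suggested copy behaviour can be sketched in Python as follows. This models only the `copy_to` idea as written in this comment; the helper `index_with_copy_to` and the sample document values are hypothetical.

```python
def index_with_copy_to(parent, properties, doc):
    """Index each value under its own name and under every copy_to target."""
    indexed = {}
    for prop, mapping in properties.items():
        if prop not in doc:
            continue  # e.g. "full" has no value of its own in the document
        value = doc[prop]
        own_name = parent + "." + mapping.get("index_name", prop)
        indexed.setdefault(own_name, []).append(value)
        for target in mapping.get("copy_to", []):
            # copy_to targets are treated as absolute field names here.
            indexed.setdefault(target, []).append(value)
    return indexed


properties = {
    "first": {"type": "string", "copy_to": ["name.full"]},
    "last": {"type": "string", "index_name": "surname",
             "copy_to": ["name.full"]},
    "full": {"type": "string"},
}

indexed = index_with_copy_to(
    "name", properties, {"first": "John", "last": "Smith"}
)
```

Compared with the multi_field variants, the target field is declared once and each source only names it, so the "full" mapping is not duplicated.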

@roytmana
Author

roytmana commented Nov 7, 2013

Please also see #4123, particularly my last comment:


I do not know Lucene internals, but from the ES search API perspective the name field is not collapsed into one field across the type and category properties, and each can be searched independently.
I always assumed that their Lucene name was their full name (type.name and category.name), just as if I had not used just_name, since they are the primary fields of multi_field mappings (their field name matches the property name).

A quick test shows that what you say is correct: the primary field 'name' in my case acts no differently from the 'all' field in my mapping, which I intended to COLLAPSE.

If ES does not expand the primary field name to the full path, then the whole concept of multi_field is flawed! Virtually all ES examples imply otherwise.

I am already using full path names for the parts of a multi_field that I do not want to collapse, but I thought the primary fields would be different. I will have to switch to using the full path for all fields of multi_fields that I do not intend to collapse. What a pain.

@imotov
Contributor

imotov commented Jan 8, 2014

This issue was superseded by #4520 and #4521. Closing.

@imotov imotov closed this as completed Jan 8, 2014