Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Field resolution should be unambiguous #4081

Closed
clintongormley opened this issue Nov 4, 2013 · 25 comments
Closed

Field resolution should be unambiguous #4081

clintongormley opened this issue Nov 4, 2013 · 25 comments
Assignees

Comments

@clintongormley
Copy link

As far as I understand it, fields are resolved on a first found basis. So given the following documents:

PUT /index/foo/1
{
    "count": 1,
    "foo": {
        "count": 1
    }
}

PUT /index/bar/2
{
    "count": 1,
    "foo": {
        "count": 1
    }
}

.... the field foo.count could resolve to foo.count, foo.foo.count, or bar.foo.count, depending on which is found first.

Field resolution should be unambiguous. Field names should be grouped by type and sorted in descending order by number of .. So the above mappings should result in:

bar:
    foo.count
    count

foo:
   foo.count
   count

Then if no type is specified (or multiple types are specified), go through the groups in alphabetical order.

This would result in the following resolutions:

foo.foo.count   => (foo) foo.count
foo.count       => (foo) count
count           => (bar) count
*.foo.count     => (bar) foo.count
*.count         => (bar) count
*.*.count       => (bar) foo.count
@clintongormley
Copy link
Author

I'd even be tempted to change the format of fieldnames which include the type, to make the type unambiguous, eg:

foo:foo.count
foo:count

This particular syntax would however interfere with fieldname parsing in the query-string query. To use a colon as a type/field name separator in the query string query it would have to be escaped:

foo\:foo.count:10

I doubt that type names are often used in the qs syntax

@dadoonet
Copy link
Member

I'm +1 for this proposal. It really helps to clarify type vs field names.
I'm wondering if solving this will also fix #3005 (side effect)?

@benjismith
Copy link

I'd love to see this in the next ES release. We're making heavy use of ES at my company, and this is the only remaining issue causing us significant headaches.

We frequently have different fields in different types with the same names. In particular, we often have "id" fields modeled as integers or strings. And because those IDs are used throughout our architecture (relational databases, protobufs, etc), making changes to either field names or field types creates a lot of friction in our codebase.

Please consider storing fully-qualified field names for all top-level and nested types ("user.id" is a different field from "item.id" or "user.address.id"). The extra few bytes of storage a small price to pay for eliminating ambiguity in mapping and faceting.

For anyone depending on the current behavior (multiple logically-distinct fields consolidated into a single lucene field), it might be possible to provide an option (similar to the copy_to param) that lets them continue consolidating field data even when ES default behavior is to use fully-qualified field names.

Or maybe the Create Index API could provide an option for choosing between a new "unambiguous fully-qualified-field-name indexing" vs the current behavior.

Just a thought :)

@benjismith
Copy link

@clintongormley Is there any reason you prefer "foo:bar.baz" instead of just "foo.bar.baz" for fully-qualified field names? I'd like to see the dot notation used throughout the mappings. I'd even go so far as to say "index.type.nested_type.field" would be perfect.

@s1monw s1monw added v1.2.0 and removed v1.1.0 labels Mar 12, 2014
@karmi
Copy link
Contributor

karmi commented Apr 9, 2014

@clintongormley I guess I would lean to the full dot-based notation as well, ie. <INDEX>.<TYPE>.foo.bar.baz, not the the special : operator...

@clintongormley
Copy link
Author

I'm not sure that we need the index name in there as well... querying multiple indices but then in a subclause requiring a match on a field in just a particular index is a seldom used edge case, which could be accomplished by just filtering on that index.

And re type:foo.bar vs type.foo.bar, no I have no particular reason to favour the colon version over just dots.

@benjismith note: there's no bwc issue with making field resolution deterministic. Everything will work just as it does today. The only difference would be that fields can be referred to unambiguously, which isn't the case at the moment.

@benjismith
Copy link

@clintongormley Regarding bwc, I'm not sure I understand what you mean when you say "everything will work just as it does today".

I was hoping this change would fix #2603. Is that not the case? In particular, I'm interested in this comment made by @kimchy ...

Same field name under different types should have the same mappings, otherwise
there will be problems. Internally, in order to conserve space, we map them to the
same field name. If you really feel you need to separate between them, use different
indices. We should give better failure message when we detect conflicts in mappings.

After this change, will it still be problematic to have two different fields with the same name and different mappings (either in different types or at different levels of nesting within the same type) in the same index?

@clintongormley
Copy link
Author

@benjismith Correct - this doesn't fix #2603.

After this change, will it still be problematic to have two different fields with the same name and different mappings (either in different types or at different levels of nesting within the same type) in the same index?

Yes, correct. But now that I understand what you were thinking, it's an interesting idea. I've opened issue #5851 to discuss it.

@clintongormley
Copy link
Author

To make sure that path resolution is unambiguous, we should require full path names and never resolve just the short path name eg:

{ "name": { "first": "john"}}

would require searching on name.first. first would no longer work, hence bumping to 2.0

@rjernst rjernst assigned rjernst and unassigned clintongormley Dec 1, 2014
@coderplay
Copy link

@clintongormley

Does it solve the problem when two different types have the same parent ? I meant say if we have type C1, C1 with their parent name P, both C1 and C2 have internal _parent field. Thus caused a same field name in different types.

@clintongormley
Copy link
Author

@coderplay The parent field will still be configurable per type

Does it solve the problem when two different types have the same parent

What problem are you referring to? It is perfectly acceptable for two types to have the same parent.

@clintongormley
Copy link
Author

Closed in favour of #8870

@coderplay
Copy link

@clintongormley

Because that all children types share the same "_parent" index name, you will be needed to load all children types' "_parent" field data when you just try to fetch something from one of the types through the "_parent" field data. If a parent has hundreds to thousands children, that field data will be huge. right?

@clintongormley
Copy link
Author

@coderplay that's exactly how it works today

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

10 participants