Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mapping conflict error upgrading from 1.7 to 2.0 #13169

Closed
rayward opened this issue Aug 28, 2015 · 15 comments
Closed

Mapping conflict error upgrading from 1.7 to 2.0 #13169

rayward opened this issue Aug 28, 2015 · 15 comments
Assignees
Labels
blocker >bug :Search/Mapping Index mappings, including merging and defining field types v2.0.0-beta2

Comments

@rayward
Copy link

rayward commented Aug 28, 2015

[2015-08-28 01:17:40,304][ERROR][org.elasticsearch.gateway] [bcapp.dev] failed to read local state, exiting...
java.lang.IllegalStateException: unable to upgrade the mappings for the index [myindex], reason: [Mapper for [content] conflicts with existing mapping in other types:
[mapper [content] is used by multiple types. Set update_all_types to true to update [search_analyzer] across all types.]]
    at org.elasticsearch.cluster.metadata.MetaDataIndexUpgradeService.checkMappingsCompatibility(MetaDataIndexUpgradeService.java:337)
    at org.elasticsearch.cluster.metadata.MetaDataIndexUpgradeService.upgradeIndexMetaData(MetaDataIndexUpgradeService.java:113)
    at org.elasticsearch.gateway.GatewayMetaState.pre20Upgrade(GatewayMetaState.java:226)
...

The field in question is:

"content" : {
    "type" : "string",
    "analyzer": "html_strip"
}

This is the relevant analyzer configuration:

{
  "index" : {
    "analysis" : {
      "analyzer" : {
        "html_strip" : {
          "filter" : [
            "standard",
            "lowercase",
            "stop",
            "asciifolding",
            "minimal_stemmer"
          ],
          "char_filter" : [
            "html_strip"
          ],
          "tokenizer" : "standard",
          "type": "custom"
        }
      },
      "filter" : {
        "minimal_stemmer" : {
          "type" : "stemmer",
          "name" : "minimal_english"
        }
      }
    }
  }
}

This field is defined on 2 out of 6 types that I have and is identical for those two types.

@rayward
Copy link
Author

rayward commented Aug 28, 2015

Also getting a similar error on a _parent field which I have on only one type.

"_parent": {
  "type": "product",
  "fielddata": {
    "loading": "eager_global_ordinals"
  }
}
java.lang.IllegalStateException: unable to upgrade the mappings for the index [myindex], reason: [Mapper for [_parent] conflicts with existing mapping in other types:
[mapper [_parent] is used by multiple types. Set update_all_types to true to update [fielddata] across all types.]]

Also, the migration plugin only had informational messages (eg. required on _routing), nothing that would stop an upgrade.

@clintongormley clintongormley added >bug blocker :Search/Mapping Index mappings, including merging and defining field types labels Aug 28, 2015
@clintongormley
Copy link

Hi @rayward

Thanks for testing out the beta, and for reporting these bugs. I think the analyzer problem will be fixed by rjernst@2e4a053 but the _parent issue needs further investigation.

To reproduce, create this index on 1.x:

PUT /test
{
  "mappings": {
    "parent": {},
    "child": {
      "_parent": {
        "type": "parent",
        "fielddata": {
          "loading": "eager_global_ordinals"
        }
      }
    }
  }
}

@rjernst
Copy link
Member

rjernst commented Aug 31, 2015

I agree the first issue described should be fixed by #13206.

The issue with _parent is a known issue, and unavoidable at this time (without further refactoring of how mappings work). @clintongormley There are two types in your example, parent, and child. Because parent is added first (types are added one at a time), it gets the default _parent setup with lazy loading. When the second child type is parsed, the _parent field is now different.

The problem is that all types have a _parent field. To fix this, we need to selectively add _parent (only when the user actually specifies it for the type), but that is much different than we do today (all meta fields are added to all types). I'm also not sure what effect not having _parent on all types would have an existing parent/child code.

@clintongormley
Copy link

@martijnvg could you comment on #13169 (comment) please?

@martijnvg
Copy link
Member

@rjernst @clintongormley I think it may work. I'm going to try out if making the _parent optional is working out.

@martijnvg
Copy link
Member

Internally removing the _parent field from DocumentMapper works out for the old p/c implementation, but not for the new implementation (that uses doc values and Lucene's JoinUtil). The disabled _parent for parent types is used to write to a doc values join field and if the parent field is removed has_child/has_parent won't be able to work because parent docs don't write to the join field.

Just thinking out loud, but what we can try to do is:

  1. Remove the notion of 'active' from the parent field
  2. add the notion that a document can be a child or a parent
  3. all type we always have a _parent field in the mapping
  4. but it only gets used by documents if it either is a parent or a child
  5. I think this way all _parent fields in all types can have the same settings. (besides the type there are pointing to, but that is ok)

@jpountz
Copy link
Contributor

jpountz commented Sep 2, 2015

Maybe the bug is that loading should be allowed to be different across types? The _parent field actually stores data in a field that depends on the type name (_parent#type) so having different settings per type does not introduce inconsistencies at the Lucene level?

@rjernst
Copy link
Member

rjernst commented Sep 2, 2015

The _parent field actually stores data in a field that depends on the type name (_parent#type) so having different settings per type

Maybe the problem is we aren't using this name in map of field types? If we use that as the key, instead of _parent we won't have any conflicts.

@clintongormley
Copy link

@rjernst you mean internally, rather than in the API? Using it in the API would make it difficult to explain

@rjernst
Copy link
Member

rjernst commented Sep 2, 2015

Yes, I mean internally.

@martijnvg
Copy link
Member

@rjernst @jpountz I like the idea of internally mapping the _parent field under a name that takes the type into account. I quickly checked this and this seem to work out. Both the upgrade works without an error and has_child/has_parent queries work.

However after running some more tests, I realized that there are other issues. One can include the _parent field in the search hits, query by _parent field or sort by it. In these case we lookup the field mapping and that doesn't map correctly. There is no _parent field, but fields with _parent* prefix. Not sure how to resolve this.

@jpountz
Copy link
Contributor

jpountz commented Sep 3, 2015

Hmm, so maybe we should be able to have 2 field types? One called _parent that would only be stored and another one called _parent#type that would only be doc-valued?

@martijnvg
Copy link
Member

Right, that would solve it, just not sure how to would like. The ParentFieldMapper can only produce one MappedFieldType. We then need to add another field mapper?

Taking a step back, maybe just allowing the the loading part of field data settings to be different is okay for 2.0? This is a much smaller change. We can then work on a better fix for 2.x and 3.0. The fact that ParentFieldMapper now embeds more fields then it used to should be addressed? We maybe break it out in different fields (multiple unique join fields that connects docs of a different type).

@martijnvg
Copy link
Member

Yesterday we discussed that _parent field should use an unique internal field type name per type. So that when the mapping compatibility check is performed no check would fail since each field would be unique and having different field data settings wouldn't matter.

Making that change has one important implication the _parent field can't used directly in many places in all the APIs. The most notable place would be in the query dsl and we decided that we should have a _parent_id query to cover this. Also how we lookup the _parent id for search hits in the _search api requires a special sub fetch phase.

I went a head and made change: martijnvg@26e2e5d

However things got tricky and I had to make more changes that I would like:

  • Due to difference of p/c query implementation, several checks had to be in place to make sure that the right fields were picked.
  • Sorting and aggregating by _parent field wouldn't work too. Whether we should remain to support this is a good question (I lean towards not).
  • To maintain the support for specifying the parent id in the _parent field in a document required a very subtle change for 1.x indices. (this is no longer allowed in 2.x indices)
  • In the field data cache we would end up with multiple _parent field entries per field type. (not sure if this is bad)
  • Clear cache by _parent field wouldn't work too.

The change grew to a level that I'm not comfortable porting it back to 2.0 and I think we should completely revise the _parent field. For example with the new implementation we don't need to store _parent indexed and stored field. The join doc values fields are sufficient. The _parent Lucene fields just exist for the old parent child implementation. Also if we going to revise types this is going to have a big impact on the _parent field (if it still exists then).

I think for 2.0 we need to make a compromise in order to keep the change small and low risk. Disabling the field data settings would work, but that wouldn't be nice for all the other fields. So I think disabling the field type compatibility check on the _parent field is maybe the right workaround for now:
https://github.com/martijnvg/elasticsearch/commits/mapping/disable_compatibility_check_for_parent_field

The only field type related setting that can be set on _parent field is field data loading. All the other settings aren't configurable. I think that makes this workaround better then disabling field data settings for all fields.

@clintongormley
Copy link

@martijnvg ++

martijnvg added a commit that referenced this issue Sep 9, 2015
… field types:

1) A shared immutable fieldtype for the _parent field (used for direct access to that field in the dsl). This field type is stored and indexed.
2) A per type field type for the child join field. The field type has doc values enabled if index is created on or post 2.0 and field data type is allowed to be changed.
3) A per type field type for the parent join field. The field type has doc values enabled if index is created on or post 2.0.

This resolves the issue that a mapping is not compatible if parent and child types have different field data loading settings.

Closes #13169
martijnvg added a commit that referenced this issue Sep 9, 2015
… field types:

1) A shared immutable fieldtype for the _parent field (used for direct access to that field in the dsl). This field type is stored and indexed.
2) A per type field type for the child join field. The field type has doc values enabled if index is created on or post 2.0 and field data type is allowed to be changed.
3) A per type field type for the parent join field. The field type has doc values enabled if index is created on or post 2.0.

This resolves the issue that a mapping is not compatible if parent and child types have different field data loading settings.

Closes #13169
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
blocker >bug :Search/Mapping Index mappings, including merging and defining field types v2.0.0-beta2
Projects
None yet
Development

No branches or pull requests

5 participants