es.mapping.exclude doesn't work for Hive #595

Closed
absynthe opened this issue Nov 6, 2015 · 2 comments

absynthe commented Nov 6, 2015

I am using elasticsearch-hadoop-2.2.0-beta1.jar to move data from Hive to the Elasticsearch service on AWS.

I want to use the uid as the document ID without also including it in the _source of the document.

This doesn't work with the following table definition:

DROP TABLE IF EXISTS corpusElasticSearch;
CREATE EXTERNAL TABLE corpusElasticSearch (
country STRING, uid STRING, gender INT, age INT,
education INT, employment INT, income INT, householdsize INT, children INT,
domains ARRAY<STRING>, devices ARRAY<STRING>)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES(  'es.mapping.id' = 'uid',
                'es.mapping.exclude' = 'uid', -- this should exclude the uid from the _source, but for some reason it doesn't. Might it only work for JSON?
                'es.resource' = 'audiencereport/testHive21',
                'es.nodes' = 'Amazon ES endpoint',
                'es.nodes.wan.only' = 'true',
                'es.index.auto.create' = 'true');
INSERT OVERWRITE TABLE corpusElasticSearch select * from corpus limit 10;

After more testing, neither es.mapping.exclude nor es.mapping.include works in this scenario.
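
For reference, here is a rough Python sketch of the outcome I expect from combining es.mapping.id with es.mapping.exclude: the uid ends up as the document's _id in the bulk action line and is absent from the _source line. The helper `to_bulk_lines` and the sample row are made up for illustration; this is not the connector's actual code.

```python
import json

def to_bulk_lines(row, id_field, excludes):
    """Build the two lines of an Elasticsearch bulk 'index' entry.

    Hypothetical helper: it only illustrates the expected result,
    not how elasticsearch-hadoop implements it.
    """
    # The ID field is taken from the row and placed in the action metadata.
    action = {"index": {"_id": row[id_field]}}
    # Excluded fields are dropped from the document body (_source).
    source = {k: v for k, v in row.items() if k not in excludes}
    return json.dumps(action), json.dumps(source)

# Illustrative row using three of the columns from the table above.
row = {"country": "US", "uid": "abc123", "age": 34}
action_line, source_line = to_bulk_lines(row, "uid", {"uid"})
print(action_line)
print(source_line)
```

What I observe instead is that the uid still shows up in the _source.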

costin (Member) commented Nov 9, 2015

Looks like a bug (likely in the way the pattern matching is applied).

@costin costin added v2.2.0 and removed v2.2.0-rc1 labels Jan 8, 2016
@costin costin closed this as completed in 3000c22 Jan 10, 2016
costin (Member) commented Jan 10, 2016

Finally got around to looking at this bug. It was caused by Hive stripping the real names of the columns, which meant the filtering was applied to completely different values. Fixed in master (and will be backported to 2.1.3).
A nightly build will follow soon.
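
To illustrate the cause, here is a minimal Python sketch under stated assumptions. It assumes the stripped columns arrive under positional aliases such as `_col0`, and that the filter matches field names against the configured patterns. `filter_fields` is a hypothetical stand-in, not the connector's real code.

```python
from fnmatch import fnmatchcase

def filter_fields(doc, excludes):
    """Drop every field whose name matches one of the exclude patterns.

    Hypothetical stand-in for the include/exclude filtering step.
    """
    return {
        name: value
        for name, value in doc.items()
        if not any(fnmatchcase(name, pattern) for pattern in excludes)
    }

# Row as declared in the table definition (illustrative values).
row = {"country": "US", "uid": "abc123", "gender": 1}

# Assumption: the real column names are replaced by positional aliases.
stripped = {f"_col{i}": value for i, value in enumerate(row.values())}

kept_when_named = filter_fields(row, ["uid"])          # 'uid' is removed
kept_when_stripped = filter_fields(stripped, ["uid"])  # nothing matches 'uid'
print(kept_when_named)
print(kept_when_stripped)
```

With the real names in place the pattern removes `uid`. With the aliases, no field is ever named `uid`, so every field is kept.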
