Skip to content
This repository has been archived by the owner on Nov 20, 2019. It is now read-only.

[CROSSDATA] Elastic fix types #644

Merged
merged 7 commits into from Aug 31, 2016

Conversation

pfcoperez
Copy link
Contributor

@pfcoperez pfcoperez commented Aug 24, 2016

Description

This PR adds missing type conversions from elastic native types to SparkSQL's:

  • BINARY
  • TINYINT
  • FLOAT

It also changes the behavior of com.stratio.crossdata.connector.elasticsearch.DefaultSource by adding types conversions in the ElasticsearchXDRelation value extractor (executed at each node).

Note that huge parts of the original elastic search datasource have been copied into our code thus adding maintenance hazards as the data source development continues. This has been done this way because of the limitations raised by the author's use of access modifiers. So a PR, already merged, (elastic/elasticsearch-hadoop#826) has been used to change that at the connector code.

When an elasticsearch-spark-sql13 with the update code gets published, then the changes employing the datasource improvements (pfcoperez@863e9aa) should be merged into crossdata development branch.

Finally, it adds support for native sub-document selection by both: Making it possible to select sub-documents and columns and transforming the document into a rows hierarchy.

Testing

  • Unit

@pfcoperez pfcoperez changed the title Elastic fix native types [WIP] Elastic fix types Aug 24, 2016
@pfcoperez pfcoperez changed the title [WIP] Elastic fix types [WIP][CROSSDATA-] Elastic fix types Aug 24, 2016
@coveralls
Copy link

coveralls commented Aug 25, 2016

Coverage Status

Coverage increased (+0.5%) to 57.349% when pulling eb10863 on pfcoperez:elasticFixNativeTypes into f6b8b0e on Stratio:master.

@pfcoperez pfcoperez changed the title [WIP][CROSSDATA-] Elastic fix types [CROSSDATA] Elastic fix types Aug 26, 2016
@pfcoperez pfcoperez force-pushed the elasticFixNativeTypes branch 2 times, most recently from d669582 to 062d8cd Compare August 29, 2016 11:03
@coveralls
Copy link

Coverage Status

Coverage increased (+1.1%) to 58.467% when pulling 062d8cd on pfcoperez:elasticFixNativeTypes into 5348885 on Stratio:master.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.8%) to 58.405% when pulling 148cccc on pfcoperez:elasticFixNativeTypes into f3b47d5 on Stratio:master.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.7%) to 58.623% when pulling fc44690 on pfcoperez:elasticFixNativeTypes into 8798996 on Stratio:master.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.8%) to 58.685% when pulling 9a41911 on pfcoperez:elasticFixNativeTypes into 8798996 on Stratio:master.


val fieldsQuery = query.fields(stringFields.toList: _*)

if(stringFields.size != fields.size)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it could be more precise and elegant to use:

stringFields.containsAll(fields)

Copy link
Contributor Author

@pfcoperez pfcoperez Aug 31, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, that operation doesn't exists. Besides, imagine it did: It would be O(n Log n) at best, most probably O(n^2). Given that stringFields is the result of filtering out elements from stringFields, we know for sure that it is either empty or a sub set of the latter. Provided that condition you can check that one is different to the other by just comparing their size.

If, on the other hand, you take into consideration that most Seq implementers provide a O(1) length operation, not making use of the advantage of being able to compare them by just comparing their size would be a huge sacrifice of performance.

e.g: Blazing blazing fast size checking of huge streams:

val s = (1 to 1000000000).toSeq
s.size

}
}

if (filters != null && filters.size > 0) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could an Option be used in order to avoid this ugly code? 😃

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Absolutely but this is code coming from the connector, we agreed that copy-pasted code should remain untouched for maintenance purposes.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.9%) to 58.685% when pulling f74136c on pfcoperez:elasticFixNativeTypes into 39fa0d6 on Stratio:master.

@darroyocazorla
Copy link
Member

👍

@mafernandez-stratio
Copy link
Member

LGTM

@coveralls
Copy link

coveralls commented Aug 31, 2016

Coverage Status

Coverage decreased (-8.9%) to 48.977% when pulling cf2c63c on pfcoperez:elasticFixNativeTypes into 39fa0d6 on Stratio:master.

@pfcoperez pfcoperez force-pushed the elasticFixNativeTypes branch 2 times, most recently from fb134a9 to e938c82 Compare August 31, 2016 12:18
darroyo-stratio and others added 7 commits August 31, 2016 14:30
… running spark tasks in order to follow standard spark results types.

NOTE: This change relies on manually copying a significant portion of code from the ElasticSearch data source. This should be changed ASAP
…ssing so now it is possible to ask for subdocuments using native Elasticsearch.
@coveralls
Copy link

Coverage Status

Coverage increased (+0.9%) to 58.685% when pulling a5a30f0 on pfcoperez:elasticFixNativeTypes into 39fa0d6 on Stratio:master.

@pfcoperez pfcoperez merged commit b64ee38 into Stratio:master Aug 31, 2016
@pfcoperez pfcoperez deleted the elasticFixNativeTypes branch August 31, 2016 14:23
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants