-
Notifications
You must be signed in to change notification settings - Fork 1.9k
[PIO-49] Add support for Elasticsearch 5.x #336
Conversation
+1 pretty please |
We use the following template. |
8ff185c
to
36b79d7
Compare
This is really great. Thanks a lot @haginot and @marevol ! Officially ES 1.7 and before are deprecated. I will start a thread in user@ and dev@ to gauge if we need to maintain backward compatibility. If it is not too much effort, it would be nice to have two different sets of configuration and code for ES 1.x and ES 5.x. |
Thank you for your comment. |
Added Elasticsearch 1.x support! |
481b807
to
d67b3b1
Compare
d67b3b1
to
89c9d26
Compare
@marevol Thanks this looks very promising! How do you configure PIO to use 1.x vs 5.x, Templates may need to look at the config or supply their own. The UR uses the REST API for 1.x and can easily switch to 5.X so I'd like to coordinate. What config flags the Elasticsearch version |
To use the existing code for elasticsearch 1.x, please use PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=ELASTICSEARCH1 and PIO_STORAGE_SOURCES_ELASTICSEARCH1_* settings.
o.a.p.data.strorage.elasticsearch.* classes contain index mappings and it's different between 1.x and 5.x. |
Reviewing the code right now. It would be nice if we could include a migration guide in the docs if possible. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"org.clapper" %% "grizzled-slf4j" % "1.0.2", | ||
"org.elasticsearch.client" % "rest" % elasticsearchVersion.value, | ||
"org.elasticsearch" % "elasticsearch" % elasticsearch1Version.value, | ||
"org.elasticsearch" % "elasticsearch-spark-13_2.10" % elasticsearchVersion.value % "provided", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could do
"org.elasticsearch" %% "elasticsearch-spark-13" % elasticsearchVersion.value % "provided",
We do not need to change this now because PIO-30 will need to be extended to take care of cross-building, and the whole artifact name will need to change.
val typeExistResponse = indices.prepareTypesExists(index).setTypes(estype).get | ||
if (!typeExistResponse.isExists) { | ||
val json = | ||
val restClient = client.open() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we reuse the same client after it's opened? According to https://www.elastic.co/guide/en/elasticsearch/client/java-rest/current/_initialization.html the official recommendation is to have the REST client have the same lifecycle as the application. I realize this pattern is used extensively in this PR, so please let us know what you think before you go ahead and change your code.
Right now PredictionIO does not provide a cleanup step in the storage layer when the application shuts down. This is probably a good opportunity to add. Would highly appreciate if you would like to take a stab at that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is probably a good opportunity to add.
+1
We could not reuse RestClient because it's not closed.
I think it's better to call "close" process for storage at the end of application, and then reuse RestClient.
|
||
class StorageClient(val config: StorageClientConfig) extends BaseStorageClient | ||
with Logging { | ||
override val prefix = "ES" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Open discussion: Should we rename this to ES1
(and subsequently all classnames in this package)? Having this thought mainly due to the universal recommender template using the ES client directly and it would be nice to have a way to tell which version the client is. Another approach could be modifying Storage
to provide discovery mechanism based on package name. @pferrel you may want to chime in.
If there's no objection, I suggest merging this first and do the other suggested optimizations in a separate JIRA/PR later. |
I agree that in all likelihood only the UR will need a code change to support 5.x and am willing to commit to that as long as the other Templates remain untouched--ideally. My $0.02 worth |
I'm going to do some more manual testing. I'll report back and let's merge this if things look well. |
@haginot @marevol @pferrel After putting more thought to this, we should probably rename If that make sense, please feel free to modify the PR and fix the conflicts. I can also take care of this over the weekend if you are busy. |
@dszeto Did you work on this issue? I think I'm available this week. So, I'll do that if you do not work yet. |
Yes. I've created and pushed a branch (feature/es5). @pferrel is verifying
with universal recommender.
…On Mon, Feb 13, 2017 at 12:37 AM Shinsuke Sugaya ***@***.***> wrote:
@dszeto <https://github.com/dszeto> Did you work on this issue? I think
I'm available this week. So, I'll do that if you do not work yet.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#336 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ABUvSmXjcGSIx-Hn-AjKQzTuxIxAWENZks5rcBZEgaJpZM4LkKNu>
.
|
@dszeto Could you replace ELASTICSEARCH1 with ELASTICSEARCH5? feature/es5 branch works on my es5 environment if the above is fixed. |
This may be ok to stage to the feature/es5 branch but IMO should not be merged with master for release yet. The technique of linking and creating an assembly with both es1 and es5 is IMO not ideal. The reasoning is that ES refactored es-hadoop-mr and es-spark at some time between the 2 so if you pull them into a Template (the UR does this) you get duplicate classes. It is not clear yet how to solve this but it is highly unlikely that if we had 2 PIO build profiles, one for ES1 and one for ES5 as well as 2 for any template using the ES Spark integration, this would go away and would be cleaner. Another reason to do this is that the pio-env.sh would be simplified to only configure ELASTICSEARCH, not different named stores. Most of this is fine and though it would require a non-trivial change to the UR, I think this is doable and would commit to supporting it if the above issues can be solved. So to summarize I suggest we:
This will make any Template that uses ES directly just work with ES1 and the default build and it will leave the Template work to move to ES5 using the new build profile for PIO with ES5. If this pushes the ES5 build profile to 0.12.0, I personally am ok with that. I it delay 0.11.0 I'm also ok with that. |
@dszeto I gave you a link to my merged version, to modify this PR I'd need permission to collaborate on the branch from the PR author, or I can create a new PR, which do you want? @marevol are you ok with the suggestions above? I've spent several days trying to get the UR to work with this even when using ES1 and cannot. This worries me so I made the suggestions above. |
To solve this issue, how about using plugin directory? |
The current problem is linking them both in and putting them into the assembly. PIO, including assemblies should be built with one or the other, not both. Not sure how plugins fixes this. I'm no SBT expert but this is pretty easy and common to solve with maven profiles. You would build pulling in certain classes for es1 and others for es5. This means templates will need to be changed if they use any of the things that change for es1 vs es5 and I maintain one. This is ok, it's seems like the right way to do this. |
Please see our snapshot build: http://fess.codelibs.org/snapshot/apache-predictionio-0.11.0-v1-SNAPSHOT.zip IMO, I think that Plugin is better than maven profile. In our build, if you want to use ES1, (I'm working on this issue now. For ES1, more testing might be need...)
Although I did not check it on UR template, it may work with no modification(or dependencies update) with ES1. |
So you are using the old PIO plugins feature that basically puts any files in This might work since we are not linking in classes from more than one version of ES but how do you move a jar that isn't built, how do the jars get built? Seems like some part of the build process has to change even if it's to create these alternative jars? |
This would imply too that a single assembly is not enough and you'd have to deploy the entire pio directory structure to any machine that runs it, right? This in turn means that it may have implications for running in Yarn "cluster" mode. I'll defer to @dszeto for all this. |
Although I'm not sure about old plugin features, in my fix, I put plugin jar files with --jars on spark-submit: |
compute-classpath.sh deals with plugins directory, but existing code seems not to put jars in spark-submit. If plugin feature looks good, I'll create PR for feature/es5. |
@marevol @pferrel Would have never expected about the plugins mechanism being found without documentation. It's a debt that I should pay. :) The original intention is for implementing event server and prediction server plugins, so that you can put arbitrary filters in front by just dropping in JARs. Using |
Well I like passing in only what is used so the general idea seems good. But does the use of lib fit conventions? As I recall lib is used for managed dependencies but in this case it is more like a loose assembly. If this is not an issue then +1 |
+1 |
What I meant was the The I agree that we should do something like |
Please see #352
If you have better location(lib/spark and lib/extra), please change them. |
@dszeto I meant This still sounds good. What will populate @marevol this means the params in pio-env.sh do not need to flag ES5, there should be only ELASTICSEARCH and it applies to whichever is in |
Right. To use ES5, please replace pio-data-elasticsearch1-assembly-*.jar with pio-data-elasticsearch-assembly-*.jar in lib/spark directory. To change JARs location(ex. lib/spark), we can modify JAR lookup handing in compute-classpath.sh and Common.scala for #352.
ES 1.x was EOLed. |
PredictionIO / PIO-49
We work on meta/event storage support for Elasticsearch 5.x.
Although Elasticsearch 2.x does not allow dots in field names,
Elasticsearch 5.x supports it. So, it's better to upgrade to ES 5.x release.
Since ES 5.x provides Java Rest API client, we replaced
Transport communication with HTTP one. Therefore, our fix
uses HTTP(9200 port) only.