Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Use elasticsearch REST client in ES Client. #17

Closed
wants to merge 10 commits into from

Conversation

haginot
Copy link

@haginot haginot commented Apr 19, 2017

To be developed.
I'm temporary using latest repository of our organization (JPIOUG).

Once I have successfully run the test, and it will be announced.

@pferrel
Copy link
Collaborator

pferrel commented Apr 19, 2017

@haginot Have you been able to replace all uses of the Elasticsearch Transport Client?

PIO 0.10.0 uses ES1 by default. Have you tried this with ES1 as well? Given the way PIO 0.10.0 is packaged, this may just work with either. Not sure.

I'm waiting for release of PIO 0.10.0 to release v0.6.0 of the UR. This code is ready but in a local branch. If this all works it might make the first PIO 0.10.0 branch and work for ES1 or ES5 if you follow the conventions used in pio 0.10.0. Namely separate binaries for ES1 and ES5 ESClient (the PIO class).

Have you checked this? Running on ES1 and ES5 will be required or we'll have to support 2 template releases. I'd rather have a flag to the build.sbt that tells which ES to use, like a maven profile.

Very interested in merging this if you can verify you testing.

@pferrel
Copy link
Collaborator

pferrel commented Apr 20, 2017

@haginot Have you tested with PIO-0.11.0-SNAPSHOT with ES1 and ES5?

@haginot
Copy link
Author

haginot commented Apr 21, 2017

@pferrel Tnak you for your comment.
I have not tested with PIO-0.11.0-SNAPSHOT and ES1 but I plan to test this.
I'm working on other errors.
Some Errors have to do with RestClient are thrown at runtime.

@pferrel
Copy link
Collaborator

pferrel commented Apr 21, 2017

Ok, let me know if you have questions.

I'm working on a new release of The UR v0.6.0 and I'd love to include this, if it works with both ES5 and ES1. My changes do not affect EsClient so they should be easy to merge. The "develop" branch has some speedups and new features.

@pferrel
Copy link
Collaborator

pferrel commented Apr 28, 2017

I'd like to include this in the UR v0.6.0 but want to be sure of ES1 as well as ES5 compatibility. When will this be ready?

PIO 0.11.0 has been released with ES1 and ES5 support

@pferrel
Copy link
Collaborator

pferrel commented Jun 8, 2017

Hi @haginot ,

We have moved to a "develop" and "master" gitflow methodology. Can you retarget this to "develop" which is v 0.6.1-SNAPSHOT. If not I can hand-merge. I'd like to include this in 0.6.1 release, is it ready to test?

@haginot
Copy link
Author

haginot commented Jun 12, 2017

I'm testing it now.
But it is difficult to accept both ES1 and ES5.
I will try little more.

Copy link

@mars mars left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We forked @haginot's original UR 0.5.0 REST EsClient branch in order to get it working on Heroku with ES 5.1 and up.

Since then, we've made quite a few fixes that do not seem to have equivalents in this PR. I've annotated these issues with links to our commits in https://github.com/heroku/predictionio-engine-ur.

restClient.performRequest(
"POST",
s"/$indexName/_refresh",
Map.empty[String, String].asJava)
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing try { … } finally { restClient.close() } will cause train process to never exit.

Ref: heroku/predictionio-engine-ur@f027c4f

val response = restClient.performRequest(
"POST",
s"/$indexName/_search",
Map.empty[String, String].asJava)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does not actually use the query.

Missing new StringEntity(query, ContentType.APPLICATION_JSON) as last argument to performRequest.

Ref: heroku/predictionio-engine-ur@8c7a9cc


fieldNames.foreach { fieldName =>
if (typeMappings.contains(fieldName))
mappings += (fieldName + mappingsField(typeMappings(fieldName)))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JSON encoding broken unless fieldName is passed as s""""$fieldName"""".

Ref: heroku/predictionio-engine-ur@697503b

Map.empty[String, String].asJava).getStatusLine.getStatusCode match {
case 404 => {
var mappings = """
|{
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing top-level indexType. Requires wrapping entire JSON entity in { "mappings": { "$indexType": { … }}}.

Ref: heroku/predictionio-engine-ur@697503b

s"/$indexName/$typeName/$doc",
Map.empty[String, String].asJava)
val responseJValue = parse(EntityUtils.toString(response.getEntity))
(responseJValue \ "_hits" \ "hits" \ "_source").extract[Map[String, JValue]].asJava
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The \ "_hits" \ "hits" path parts should be deleted, as the ES 5.x REST API response does not include these levels of nesting.

Without this fix it breaks item recommendations.

Ref: heroku/predictionio-engine-ur@64b7daf

|}
""".stripMargin.replace("\n", "")
val entity = new NStringEntity(aliasQuery, ContentType.APPLICATION_JSON)
if (!oldIndexSet.isEmpty) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This if block is unnecessary and conflicts with the below oldIndexSet.foreach(deleteIndex(_)).

Ref: heroku/predictionio-engine-ur@2113992

val responseJValue = parse(EntityUtils.toString(response.getEntity))
// TODO to use keys
val oldIndexSet = responseJValue.extract[Map[String, JValue]].keys
val deleteOldIndexQuery = oldIndexSet.map(oldIndex => {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This results in errors when loading the indexes the first time, because there is no old index to delete.

Refs:
heroku/predictionio-engine-ur@351cccf
heroku/predictionio-engine-ur@302c753

@woutersamaey
Copy link

Is this the PR that will add ES5 support?

@pferrel
Copy link
Collaborator

pferrel commented Jul 21, 2017

Yes.

When we support ES5 we will switch completely to the REST client from what is now part TransportClient on port 9300 and REST for fast writes using Spark on port 9200. Internally several changes must be made. I think this PR is ready for integration work but splits the UR ESClient into 2, one for ES 1.7.5 and one for ES 5.x. This will likely be 2 build.sbt files chosen by the user given the choice of ES they made for PredictionIO.

This will be rather difficult to support so we are choosing our timing based on user needs. In other words no one has really pushed for ES5 yet and there are lots of ES1.7.5 installs out there.

Feel free to express and opinion on the UR support forum: https://groups.google.com/forum/#!forum/actionml-user

@msathis
Copy link

msathis commented Oct 14, 2017

Will this be merged soon?

@ftopan80
Copy link

+1 ES5 Support, It's ES5 everywhere now on our production environment

@pferrel
Copy link
Collaborator

pferrel commented Oct 16, 2017

Quite a bit required beyond this PR since PIO 0.12.0

We are working on a UR 0.7.0, which will be ES5 only and have some significant speedups. First will be UR 0.6.1, which allows existing installations to upgrade to pio 0.12.0 with ES 1.x. Legacy systems need support too.

@msathis
Copy link

msathis commented Oct 19, 2017

@pferral Can you please share the branch name where ES5 support is being worked on? I am really interested in trying out.

@pferrel
Copy link
Collaborator

pferrel commented Oct 19, 2017

Wait for an announcement on the Google Group. We use the git flow model for git branching so it will be moved when it is ready to test.

@pferrel pferrel closed this Feb 6, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants