Upgrade integration with spark 1.3.0 #400

Closed
rbraley opened this issue Mar 23, 2015 · 9 comments

@rbraley

rbraley commented Mar 23, 2015

Several breaking changes have been introduced in this new version of Spark, including renaming SchemaRDD to DataFrame and some work on simplifying the Scala and Java APIs of Spark SQL.

@rbraley
Author

rbraley commented Mar 24, 2015

For those interested in this, I have started the work here:
rbraley@42770b2
I just made the minimal set of changes to get it to work until a proper integration can be made that takes into account all the simplifications to the API in Spark 1.3.0.

@costin maybe this will help make things a bit easier for you :)

@costin
Member

costin commented Mar 24, 2015

Thanks @rbraley. 1.3 support was added last week, before Spark Summit, but hasn't been pushed out since I'd first like to try to preserve backwards compatibility with Spark 1.2 and earlier releases.
The core is compatible; however, the SQL isn't.
This is particularly important since Spark 1.0-1.2 is included in some distros and, in supporting 1.3, I don't want to drop support for those or require different binaries, if possible.

@rbraley
Author

rbraley commented Mar 24, 2015

Yeah, understood. I can just use my jars until you finish up that work :)

@MLnick
Contributor

MLnick commented Mar 24, 2015

+1 would be good to have this soon :)

@costin
Member

costin commented Apr 5, 2015

I've pushed Spark 1.3 support to master through 777fb60.
Despite my efforts to preserve compatibility with Spark SQL 1.2 (and lower), the changes in signature proved too serious: the serialization code ran nicely across both versions, but at the user API level there was no common ground.
The emerging solution was, unfortunately, to provide two different binaries, one for Spark 1.0-1.2 and one for Spark 1.3. The Spark core integration is the same, but the SQL parts differ. To keep things simple, I've kept all the functionality in one jar, so there's one for Spark 1.2 and one for Spark 1.3.

Considering the breaking changes, I've also updated the Spark SQL API signatures from esRDD to esDF, which should also address the issues in #382. The Java API for SQL is still there, but only to allow the Java collection signature to be passed in (for configuration purposes). Internally, Spark SQL doesn't make any distinction between Java and Scala types and, as such, having a different Row implementation provides no value.
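
For readers following the rename, here is a minimal sketch of what reading an index as a DataFrame might look like with the Spark 1.3 binary. It assumes the `org.elasticsearch.spark.sql` package import exposes `esDF` on `SQLContext`, that an Elasticsearch node is reachable on `localhost`, and that the resource name `myindex/mytype` (hypothetical) exists. None of these specifics come from the thread itself.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.elasticsearch.spark.sql._ // assumed to add esDF to SQLContext

object EsDataFrameSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("es-df-sketch")
      .setMaster("local[*]")
      .set("es.nodes", "localhost") // assumption: a local Elasticsearch node

    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    // "myindex/mytype" is a hypothetical index/type used for illustration only
    val df = sqlContext.esDF("myindex/mytype")

    // The result is a regular Spark SQL DataFrame
    df.printSchema()

    sc.stop()
  }
}
```

Code written against the Spark 1.0-1.2 binary would presumably keep using the older esRDD signatures, so the two call styles should not be mixed against a single jar.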

@rbraley thanks for your PR - it was useful to double check my changes and see whether I've missed something.
I plan to update the docs shortly (after fixing #415) to indicate the new packages available in master.

The artifacts have been pushed to Maven: for 1.2 use the elasticsearch-spark-1.2 artifactId; for 1.3 the usual elasticsearch-spark should work.
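
As a rough illustration of choosing between the two artifacts, an sbt build fragment might look like the sketch below. The `org.elasticsearch` group ID is inferred from the snapshot repository path mentioned later in this thread, and the version string is a placeholder rather than a real release. The published artifact names may also carry a Scala version suffix, so check the repository listing before relying on any of this.

```scala
// build.sbt (sketch, not a verified configuration)

// Snapshot builds are published to the Sonatype snapshot repository
resolvers += "Sonatype OSS Snapshots" at "https://oss.sonatype.org/content/repositories/snapshots"

// Placeholder: substitute the version actually listed in the repository
val esSparkVersion = "X.Y.Z-SNAPSHOT"

// For Spark 1.3: the usual artifact
libraryDependencies += "org.elasticsearch" % "elasticsearch-spark" % esSparkVersion

// For Spark 1.0-1.2: use this artifact instead of the one above
// libraryDependencies += "org.elasticsearch" % "elasticsearch-spark-1.2" % esSparkVersion
```

Only one of the two dependencies should be active at a time, since both jars bundle the same core integration.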

Cheers,

costin added a commit that referenced this issue Apr 5, 2015
@rbraley
Author

rbraley commented Apr 7, 2015

Hi costin,
Thanks for the good work!
I don't see the artifacts in Maven Central yet. Should they be there already?

@yanchaoguo

I am from China and my name is yanchao. I like es-spark, but "./gradlew distZip" fails to build. The exception is:

  • Where:
    Build file '/home/hdfs/gyc/elasticsearch-hadoop-master/build.gradle' line: 131

  • What went wrong:
    A problem occurred evaluating root project 'elasticsearch-hadoop'.

    Cannot invoke method exists() on null object

@costin
Member

costin commented Apr 7, 2015

@rbraley Make sure you look at the snapshot repository.
The artifacts are there: https://oss.sonatype.org/content/repositories/snapshots/org/elasticsearch/

@costin
Member

costin commented Apr 7, 2015

@yanchaoguo I've added a fix for that. It seems you were downloading the zip rather than checking it out from git. Either way, you don't have to build it: you can download the existing builds. But if you want to, it should work now.

In the future, please create a new issue rather than hijacking existing ones. Thanks.

costin closed this as completed Apr 7, 2015