Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgraded the package to spark 2.0. #9

Closed
wants to merge 10 commits into from

Conversation

Projects
None yet
4 participants
@shiv4nsh
Copy link
Contributor

commented Aug 9, 2016

I have upgraded the package with the latest spark release ! Please review the code. I faced some problems with the Writer function as now The DataFrameWriter returns the Dataset and DataFrame is no more in the Java APi of Spark and hence there are some problems.

Some tests are also breaking ! Please help me out @daschl I want to upgrade it to the top class level !

Thanks !

@cb-sdk-robot

This comment has been minimized.

Copy link

commented Aug 9, 2016

Thanks for the pull request!! To ensure quality review, Couchbase employs a code review system based on Gerrit to manage the workflow of changes in addition to tracking our contributor agreements.

To get this change in and collaborate in code review, please register on Gerrit and accept our CLA. The easiest way to do this is to follow the link below, sign in with your GitHub account and then follow through the steps provided on that page to sign an 'Individual' agreement: http://review.couchbase.org/#/settings/new-agreement.

Note: Please contact us if you have any issues registering with Gerrit! If you have not signed our CLA within 7 days, the Pull Request will be automatically closed.

::SDKBOT/PR:no_cla

@cb-sdk-robot

This comment has been minimized.

Copy link

commented Aug 19, 2016

Unfortunately it has been 7 days and we are still unable to confirm that you have signed our CLA. We sincerely appreciate your submission and hope that you will register and resubmit this Pull Request in the future!

::SDKBOT/PR:timeout

@daschl daschl reopened this Aug 19, 2016

@daschl

This comment has been minimized.

Copy link
Member

commented Aug 19, 2016

damn robot!

@shiv4nsh thank you very much for your contribution, I've been on vacation and plan to get to 2.0 migration soon.

I'd love to build on your changeset here, but for this we need to get it into gerrit (our code review tool) and you need to sign the CLA - are you interested in doing that? Its not so much work and you only need to do it once.

build.sbt Outdated
@@ -2,26 +2,27 @@ name := "spark-connector"

organization := "com.couchbase.client"

version := "1.2.1"
version := "1.2.2"

This comment has been minimized.

Copy link
@daschl

daschl Aug 23, 2016

Member

Please set to 1.3.0

build.sbt Outdated

crossScalaVersions := Seq("2.11.8", "2.10.6")
crossScalaVersions := Seq("2.11.8", "2.11.0")

This comment has been minimized.

Copy link
@daschl

daschl Aug 23, 2016

Member

that doesn't make much sense, we gotta keep 2.10 as well the default is correctly 2.11 above anyways

build.sbt Outdated
"org.apache.spark" %% "spark-core" % "2.0.0" % "provided",
"org.apache.spark" %% "spark-streaming" % "2.0.0" % "provided",
"org.apache.spark" %% "spark-sql" % "2.0.0" % "provided",
"com.couchbase.client" % "java-client" % "2.3.1",

This comment has been minimized.

Copy link
@daschl

daschl Aug 23, 2016

Member

2.3.2 is the current one

build.sbt Outdated
"io.reactivex" %% "rxscala" % "0.26.1",
"org.scalatest" %% "scalatest" % "2.2.5" % "test",
"junit" % "junit" % "4.12" % "test"
"org.scalatest" %% "scalatest" % "3.0.0" % "test",

This comment has been minimized.

Copy link
@daschl

daschl Aug 23, 2016

Member

do we need to bump scalatest right now? I'd prefer doing this later

import org.slf4j.{Logger, LoggerFactory}
import org.slf4j.impl.StaticLoggerBinder

trait Logging {

This comment has been minimized.

Copy link
@daschl

daschl Aug 23, 2016

Member

why do we need our own logging trait? is it gone from spark 2.0?

This comment has been minimized.

Copy link
@shiv4nsh

shiv4nsh Aug 23, 2016

Author Contributor

This logging trait is not gone but its made private for internal use of spark only from spark 2.0 . You can take a look here https://issues.apache.org/jira/browse/SPARK-9307

.toJSON
.map(rawJson => {
val encoded = JsonObject.fromJson(rawJson)
val id = encoded.get(idFieldName).toString
encoded.removeKey(idFieldName)
JsonDocument.create(id, encoded)
})
.saveToCouchbase(bucketName, storeMode)

new DocumentRDDFunctions[JsonDocument](datas.toJavaRDD).saveToCouchbase(bucketName, storeMode)

This comment has been minimized.

Copy link
@daschl

daschl Aug 23, 2016

Member

I'd rather avoid doing java conversions in scala code, is this really necessary?

This comment has been minimized.

Copy link
@shiv4nsh

shiv4nsh Aug 30, 2016

Author Contributor

Here also I tried to resolve it by looking what the other method needs and how can I provide that !
What should we do here Please let me know what changes are required !

)
.map(row => {
Row.fromSeq(requiredColumns.map(col => row.get(row.fieldIndex(col))).toList)
}).toJavaRDD

This comment has been minimized.

Copy link
@daschl

daschl Aug 23, 2016

Member

same here, why do we need java RDDs in pure scala code?

new DataFrameWriterFunctions(dfw)

implicit object Encoder {

This comment has been minimized.

Copy link
@daschl

daschl Aug 23, 2016

Member

why is that encoder needed?

@@ -62,20 +63,20 @@ class CouchbaseDataFrameSpec extends FlatSpec with Matchers with BeforeAndAfterA
airline
.limit(10)
.write
.mode(SaveMode.Overwrite)
.mode(SaveMode.Overwrite).

This comment has been minimized.

Copy link
@daschl

daschl Aug 23, 2016

Member

two dots?

This comment has been minimized.

Copy link
@shiv4nsh

shiv4nsh Aug 30, 2016

Author Contributor

Here we have to write some code because it can't recognize couchbase method !

@daschl

This comment has been minimized.

Copy link
Member

commented Aug 23, 2016

@shiv4nsh thanks for your contribution again! I made some comments, if you could answer/fix them and then squash them into one commit that would be great - I'll then carry it over to gerrit and we can merge it into master, doing all the subsequent changes on top.

thanks!

@shiv4nsh

This comment has been minimized.

Copy link
Contributor Author

commented Aug 23, 2016

@daschl : For others I will made the fix and update the PR as soon as possible and for the java Code I think I will need your help Please help me out here !

@daschl

This comment has been minimized.

Copy link
Member

commented Aug 23, 2016

Awesome, thanks! It doesn't have to be perfect we can iterate over it once the initial changes are merged :)

@daschl

This comment has been minimized.

Copy link
Member

commented Aug 30, 2016

@shiv4nsh thanks! I see that you also signed the CLA. So I'll get your changes here squashed and through gerrit into master and then we can iterate on the changes. I'll ping you once thats done

@shiv4nsh

This comment has been minimized.

Copy link
Contributor Author

commented Aug 30, 2016

@daschl : Thanks man ! Looking forward to it ! 👍

@daschl

This comment has been minimized.

Copy link
Member

commented Aug 30, 2016

@shiv4nsh I've squashed and merged your PR with a couple more fixes to make it compile and pass the tests .. thank you very much for contributing this!

@daschl daschl closed this Aug 30, 2016

@shiv4nsh

This comment has been minimized.

Copy link
Contributor Author

commented Aug 30, 2016

@daschl : Your welcome 👍

@vpillac

This comment has been minimized.

Copy link

commented Sep 6, 2016

@shiv4nsh thanks for your contribution 👍
@daschl when can we expect a new version on Maven?

@daschl

This comment has been minimized.

Copy link
Member

commented Sep 6, 2016

@vpillac I still have a few things I want to get done, but hopefully in september. Is there anything else you 'd like to see in it? (for example I want to upgrade it to our brand new dcp client (https://github.com/couchbaselabs/java-dcp-client) for spark streaming)

@vpillac

This comment has been minimized.

Copy link

commented Sep 6, 2016

We're not using spark streaming yet, so I'm mainly interested in having support for Spark2. Thanks for your answer!

@shiv4nsh

This comment has been minimized.

Copy link
Contributor Author

commented Sep 6, 2016

@daschl : I want to help ! When can we start ? If youll make the high level stories that may be good !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.