[SPARK-14907][MLLIB] Use repartition in GLMRegressionModel.save by dongjoon-hyun · Pull Request #12676 · apache/spark

dongjoon-hyun · 2016-04-26T00:21:07Z

What changes were proposed in this pull request?

This PR changes the GLMRegressionModel.save function as follows, which makes it similar to how other algorithms write their Parquet data:

- val dataRDD: DataFrame = sc.parallelize(Seq(data), 1).toDF()
- // TODO: repartition with 1 partition after SPARK-5532 gets fixed
- dataRDD.write.parquet(Loader.dataPath(path))
+ sqlContext.createDataFrame(Seq(data)).repartition(1).write.parquet(Loader.dataPath(path))
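To illustrate why the TODO was stale, here is a minimal sketch comparing the old and new patterns. It assumes Spark 2.x with spark-sql on the classpath. The Data case class is a hypothetical stand-in for the model's saved weights and intercept, and SinglePartitionSaveSketch is likewise made up for illustration. Both DataFrames should end up with a single partition, so the Parquet write should produce a single part file.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

// Hypothetical stand-in for the model's saved fields (weights and intercept).
// Array[Double] is used instead of an mllib Vector to keep the sketch small.
case class Data(weights: Array[Double], intercept: Double)

object SinglePartitionSaveSketch {
  /** Returns (partitions via parallelize, partitions via repartition, part files written). */
  def run(): (Int, Int, Int) = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("single-partition-save-sketch")
      .getOrCreate()
    import spark.implicits._
    try {
      val sc = spark.sparkContext
      val sqlContext = spark.sqlContext
      val data = Data(Array(0.5, -1.2), 0.3)

      // Old pattern: parallelize the one-element Seq into exactly 1 slice, then convert.
      val oldDf = sc.parallelize(Seq(data), 1).toDF()

      // New pattern from the diff: build the DataFrame locally, then repartition to 1.
      val newDf = sqlContext.createDataFrame(Seq(data)).repartition(1)

      // Write with the new pattern; one partition should yield one "part-" file.
      val path = Files.createTempDirectory("glm-save").resolve("data").toString
      newDf.write.parquet(path)
      val partFiles = new java.io.File(path).listFiles().count(_.getName.startsWith("part-"))

      (oldDf.rdd.getNumPartitions, newDf.rdd.getNumPartitions, partFiles)
    } finally {
      spark.stop()
    }
  }

  def main(args: Array[String]): Unit = {
    val (oldParts, newParts, files) = run()
    println(s"old=$oldParts new=$newParts partFiles=$files")
  }
}
```

Either pattern already pins the data to one partition. The diff only swaps the explicit parallelize slice count for an explicit repartition(1) on the DataFrame.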

How was this patch tested?

Manual.

SparkQA · 2016-04-26T00:59:26Z

Test build #56947 has finished for PR 12676 at commit b237877.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

dongjoon-hyun · 2016-04-26T03:57:36Z

Hi, @jkbradley .
Could you review this PR when you have some time?

MLnick · 2016-04-26T06:40:56Z

LGTM.

("save/load" test case in LinearRegressionSuite covers this)
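The coverage mentioned above is a save-then-load round trip. Below is a minimal sketch of that kind of check. It assumes Spark 2.x with spark-mllib on the classpath. RoundTripSketch and the weight values are hypothetical, and this is not the suite's actual test code.

```scala
import java.nio.file.Files
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LinearRegressionModel

object RoundTripSketch {
  /** Saves a small LinearRegressionModel, loads it back, and reports whether it matches. */
  def run(): Boolean = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("glm-round-trip-sketch")
    val sc = new SparkContext(conf)
    try {
      // Illustrative weights and intercept; any values would do.
      val model = new LinearRegressionModel(Vectors.dense(0.5, -1.2), 0.3)
      val path = Files.createTempDirectory("lr-model").resolve("model").toString

      // spark.mllib's LinearRegressionModel.save goes through GLMRegressionModel's
      // shared save logic, which writes the weights and intercept as Parquet.
      model.save(sc, path)
      val loaded = LinearRegressionModel.load(sc, path)

      // Vector equality compares element values, so this checks an exact round trip.
      loaded.weights == model.weights && loaded.intercept == model.intercept
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit =
    println(s"round trip ok: ${run()}")
}
```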

dongjoon-hyun · 2016-04-26T16:56:31Z

Thank you for review, @MLnick !

dongjoon-hyun · 2016-04-26T17:37:01Z

Hi, @mengxr .
Could you review this too?

jkbradley · 2016-04-26T18:23:42Z

This isn't really changing anything, and I actually think this makes the DF creation less similar to most spark.mllib save methods in terms of code style. I'd prefer to close this issue.

dongjoon-hyun · 2016-04-26T18:35:02Z

If you think so, it's okay, @jkbradley .
But, if you don't mind, could you remove that TODO yourself?
Is there any reason to keep it?

TODO comments like that always mislead community developers like me.

dongjoon-hyun · 2016-04-26T18:35:52Z

In fact, I wouldn't have tried to change it if it were just a style problem.

jkbradley · 2016-04-26T20:57:19Z

Oh, I apologize; I missed the TODO removal. You're right; that should have been removed previously when the sc.parallelize call was changed to use 1 partition.

It looks like that's the only remaining mention of SPARK-5532 in the codebase, so your fix takes care of it.

jkbradley · 2016-04-26T20:57:56Z

LGTM
I'll merge this with master
Thanks (and thanks for pushing back)!

dongjoon-hyun · 2016-04-26T21:03:04Z

Oh, thank YOU, @jkbradley .

[SPARK-14907][MLLIB] Use repartition in GLMRegressionModel.save

b237877

asfgit closed this in e4f3eec Apr 26, 2016

dongjoon-hyun deleted the SPARK-14907 branch May 12, 2016 01:00

dongjoon-hyun wants to merge 1 commit into apache:master from dongjoon-hyun:SPARK-14907
4 participants