Skip to content

[SPARK-14907][MLLIB] Use repartition in GLMRegressionModel.save#12676

Closed
dongjoon-hyun wants to merge 1 commit intoapache:masterfrom
dongjoon-hyun:SPARK-14907
Closed

[SPARK-14907][MLLIB] Use repartition in GLMRegressionModel.save#12676
dongjoon-hyun wants to merge 1 commit intoapache:masterfrom
dongjoon-hyun:SPARK-14907

Conversation

@dongjoon-hyun
Copy link
Member

What changes were proposed in this pull request?

This PR changes GLMRegressionModel.save function like the following code that is similar to other algorithms' parquet write.

- val dataRDD: DataFrame = sc.parallelize(Seq(data), 1).toDF()
- // TODO: repartition with 1 partition after SPARK-5532 gets fixed
- dataRDD.write.parquet(Loader.dataPath(path))
+ sqlContext.createDataFrame(Seq(data)).repartition(1).write.parquet(Loader.dataPath(path))

How was this patch tested?

Manual.

@SparkQA
Copy link

SparkQA commented Apr 26, 2016

Test build #56947 has finished for PR 12676 at commit b237877.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Copy link
Member Author

Hi, @jkbradley .
Could you review this PR when you have some time?

@MLnick
Copy link
Contributor

MLnick commented Apr 26, 2016

LGTM.

("save/load" test case in LinearRegressionSuite covers this)

@dongjoon-hyun
Copy link
Member Author

Thank you for review, @MLnick !

@dongjoon-hyun
Copy link
Member Author

Hi, @mengxr .
Could you review this too?

@jkbradley
Copy link
Member

This isn't really changing anything, and I actually think this makes the DF creation less similar to most spark.mllib save methods in terms of code style. I'd prefer to close this issue.

@dongjoon-hyun
Copy link
Member Author

If you think so, it's okay, @jkbradley .
But, if you don't mind, could you remove those TODO by yourself.
Do you have any reason to maintain that?

TODO comments always mislead community developer like me.

@dongjoon-hyun
Copy link
Member Author

In fact, I didn't try to change that if it's just a style problem.

@jkbradley
Copy link
Member

Oh, I apologize; I missed the TODO removal. You're right; that should have been removed previously when the sc.parallelize call was changed to use 1 partition.

It looks like that's the only remaining mention of SPARK-5532 in the codebase, so your fix be it.

@jkbradley
Copy link
Member

LGTM
I'll merge this with master
Thanks (and thanks for pushing back)!

@asfgit asfgit closed this in e4f3eec Apr 26, 2016
@dongjoon-hyun
Copy link
Member Author

Oh, thank YOU, @jkbradley .

@dongjoon-hyun dongjoon-hyun deleted the SPARK-14907 branch May 12, 2016 01:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants