[SPARK-29872] Improper cache strategy in examples #26498

Closed
Icysandwich wants to merge 1 commit into apache:master from Icysandwich:SPARK-29872

Conversation

@Icysandwich
Contributor

What changes were proposed in this pull request?

Correct the cache strategy in some examples by adding `.cache()` calls.

Why are the changes needed?

These changes can improve the performance of the examples.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Manually

@AmplabJenkins

Can one of the admins verify this patch?


  // Set the model threshold to maximize F-Measure
- val fMeasure = trainingSummary.fMeasureByThreshold
+ val fMeasure = trainingSummary.fMeasureByThreshold.cache()
Member

I'm not sure this one is important for an example. It won't matter much.


  // Because join() joins on keys, the edges are stored in reversed order.
- val edges = tc.map(x => (x._2, x._1))
+ val edges = tc.map(x => (x._2, x._1)).cache()
Member

This is more complex for an example, but I think it's reasonably important to show this pattern in this type of example.
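
For context, the pattern under discussion can be sketched roughly as below. This is a minimal sketch modeled on the SparkTC loop, not the code from the PR: the object name `CachedEdgesSketch`, the helper `closureSize`, and the local master setting are hypothetical and exist only for illustration. The point it tries to show is that `edges` is joined against on every iteration, so caching it once may avoid recomputing the `map` each time.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

object CachedEdgesSketch {
  // Hypothetical helper: returns the number of edges in the transitive closure of `graph`.
  def closureSize(spark: SparkSession, graph: Seq[(Int, Int)]): Long = {
    var tc: RDD[(Int, Int)] = spark.sparkContext.parallelize(graph).cache()

    // Because join() joins on keys, the edges are stored in reversed order.
    // This RDD is reused by the join in every iteration, hence the cache().
    val edges = tc.map(x => (x._2, x._1)).cache()

    var oldCount = 0L
    var nextCount = tc.count()
    // Keep adding newly reachable pairs until the edge count stops growing.
    while (nextCount != oldCount) {
      oldCount = nextCount
      // Join path (y, z) with edge (x, y) to obtain the new path (x, z).
      tc = tc.union(tc.join(edges).map(x => (x._2._2, x._2._1))).distinct().cache()
      nextCount = tc.count()
    }
    nextCount
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local master, so the sketch runs without a cluster.
    val spark = SparkSession.builder
      .master("local[*]")
      .appName("CachedEdgesSketch")
      .getOrCreate()
    println(closureSize(spark, Seq((1, 2), (2, 3))))
    spark.stop()
  }
}
```

On a graph as small as the one in `main`, the cache is unlikely to make a measurable difference; the benefit would be expected to show up as the number of iterations and the size of `edges` grow.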

  .textFile("examples/src/main/resources/people.txt")
  .map(_.split(","))
  .map(attributes => Person(attributes(0), attributes(1).trim.toInt))
+ .cache()
Member

This one isn't important enough to bother with, I think. The data is small.

@srowen
Member

srowen commented Nov 13, 2019

BTW "improper" isn't quite the right word. These aren't bugs per se.

@Icysandwich
Contributor Author

Okay, I see.

@srowen
Member

srowen commented Nov 13, 2019

I think the SparkTC change was OK, but yeah, the examples just don't matter much. Maybe it's best not to clutter the example.
