Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-16438] Add Asynchronous Actions documentation #14104

Closed
wants to merge 6 commits into from

Conversation

Projects
None yet
4 participants
@phalodi
Copy link
Contributor

commented Jul 8, 2016

What changes were proposed in this pull request?

Add Asynchronous Actions documentation inside action of programming guide

How was this patch tested?

check the documentation indentation and formatting with md preview.

@phalodi phalodi changed the title Add Asynchronous Actions documentation [SPARK-16438]Add Asynchronous Actions documentation Jul 8, 2016

@phalodi phalodi changed the title [SPARK-16438]Add Asynchronous Actions documentation [SPARK-16438] Add Asynchronous Actions documentation Jul 8, 2016

@phalodi

This comment has been minimized.

Copy link
Contributor Author

commented Jul 8, 2016

@AmplabJenkins I dont know i just create the issue on JIRA and make changes and make pull request for merge , I think it is valid pull request because in real world all application need non blocking execution. So if its is valid then merge it.

@srowen

This comment has been minimized.

Copy link
Member

commented Jul 8, 2016

@phalodi it's a bot, not a person.

@srowen

This comment has been minimized.

Copy link
Member

commented Jul 8, 2016

Jenkins test this please

@SparkQA

This comment has been minimized.

Copy link

commented Jul 8, 2016

Test build #61980 has finished for PR 14104 at commit aba737f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
@SparkQA

This comment has been minimized.

Copy link

commented Jul 8, 2016

Test build #61979 has finished for PR 14104 at commit aba737f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
@rxin

This comment has been minimized.

Copy link
Contributor

commented Jul 8, 2016

Can you paste a screenshot of the doc?

@phalodi

This comment has been minimized.

Copy link
Contributor Author

commented Jul 9, 2016

@rxin Below are screen shot of Asynchronous Actions inside actions and screen shot of index of programming guide.

screenshot from 2016-07-09 14-20-54

screenshot from 2016-07-09 14-21-17

@@ -1099,6 +1099,34 @@ for details.
</tr>
</table>

#### Asynchronous Actions
Spark provide asynchronous actions to execute two or more actions concurrently, these actions are execute asynchronously without blocking each other.

This comment has been minimized.

Copy link
@srowen

srowen Jul 10, 2016

Member

"without blocking each other": actions never block each other anyway. They block the calling thread.

</tr>
<tr>
<td> <b>foreachAsync</b>(<i>func</i>) </td>
<td> Applies a function f to all elements of this RDD. </td>

This comment has been minimized.

Copy link
@srowen

srowen Jul 10, 2016

Member

This isn't specific to the async version.

@srowen

This comment has been minimized.

Copy link
Member

commented Jul 10, 2016

Mostly this just duplicates the non-async documentation. I think it's worth mentioning the existence of the async methods with a pointer to the API docs, but, I don't think this table adds value.

@phalodi

This comment has been minimized.

Copy link
Contributor Author

commented Jul 10, 2016

@srowen yeah you are right its not blocking calling thread but they execute sequentially right but in async action its return future so its running on different threads soo not run sequentially soo its fully non blocking example if i have action in that we have some transformation like map n inside map i have blocking code then how its work its block or not?

@phalodi

This comment has been minimized.

Copy link
Contributor Author

commented Jul 10, 2016

@srowen yeah i agree with you its same like non async actions but yeah we should list it because its more useful in real life application when single application running multiple jobs of different users. so we should list it so user will more familiar with it and also know which are the async actions. i think its helpful for users still if you think its not a valid point then just tell me i change pull request and add just reference for async actions but i want user should know because when i first time see it i m surprise that in whole documentation no where mention about async actions.

@srowen

This comment has been minimized.

Copy link
Member

commented Jul 10, 2016

I fixed my comment above, sorry. They do block the calling thread, but actions triggered in different threads on one RDD do not block each other. The first sentence isn't correct. My suggestion is to omit the table, not all of the text you added.

@phalodi

This comment has been minimized.

Copy link
Contributor Author

commented Jul 10, 2016

@srowen ok no problem so what you suggest should i remove the table and just add the line below the actions table and give link to scala and java docs? And the line "Spark provide asynchronous actions to execute two or more actions concurrently, these actions are execute asynchronously without blocking each other." is not correct so should i add that to "return future to non-block calling thread?" even in above statement my intention is also talk about calling thread because when we call action it will always in calling thread so calling thread will block but when we use async actions it will not block so we are targeted here non-blocking for calling thread because actions are called in calling thread. so i make changes and push it again as u suggest just add single line at end of actions and give link for scala and java doc api. Its cool?

@phalodi

This comment has been minimized.

Copy link
Contributor Author

commented Jul 10, 2016

@srowen i make changes as you suggest i remove the table and add a single below actions table and also change statement to be more clear about non blocking.

@phalodi

This comment has been minimized.

Copy link
Contributor Author

commented Jul 10, 2016

@srowen @rxin Please review it and give your comments.

screenshot from 2016-07-11 01-08-21

@@ -1099,6 +1099,9 @@ for details.
</tr>
</table>

Spark provide asynchronous actions to execute two or more actions concurrently, these actions are execute asynchronously without blocking calling thread. Refer to the RDD API doc for asynchronous actions ([Scala](api/scala/index.html#org.apache.spark.rdd.AsyncRDDActions),[Java](api/java/org/apache/spark/rdd/AsyncRDDActions.html))

This comment has been minimized.

Copy link
@srowen

srowen Jul 11, 2016

Member

I still don't think this is accurate. Spark can execute actions concurrently without this API. This merely makes the call non-blocking for the caller. It really adds very little beyond calling the normal API and wrapping in a Future (though it's more useful for the Java API where that's harder). Hence why I thought it was unofficially deprecated.

@phalodi

This comment has been minimized.

Copy link
Contributor Author

commented Jul 11, 2016

@srowen so what do you think we should not add this api reference in document?

@phalodi

This comment has been minimized.

Copy link
Contributor Author

commented Jul 11, 2016

@srowen i know its nothing major just call normal functions in future but still naive user first time who learn scala and spark dont know what are future and all so at least we should add reference of it. if you suggest i simple add reference link without adding any statement of blocking and non blocking. so atleast user know there is some api like this.

@srowen

This comment has been minimized.

Copy link
Member

commented Jul 11, 2016

Although I don't think it's essential to mention this API (it's in the API doc) it wouldn't hurt to point to it here. The mention should be accurate though. I think it's correct to say that the Spark RDD API also exposes asynchronous versions of some actions like foreach, which return a handle immediately to the caller which can be used to wait for its completion.

@phalodi

This comment has been minimized.

Copy link
Contributor Author

commented Jul 11, 2016

@srowen sure i will make appropriate changes and push it again

@phalodi

This comment has been minimized.

Copy link
Contributor Author

commented Jul 11, 2016

@srowen now its perfect i think at last from table to one line but yeah you are right it good to just mention that spark provide it and not duplicate table of actions.

@phalodi

This comment has been minimized.

Copy link
Contributor Author

commented Jul 11, 2016

@@ -1099,6 +1099,9 @@ for details.
</tr>
</table>

Spark RDD API also exposes asynchronous versions of some actions like foreach, collect, count etc. which return a handle immediately to the caller which can be used to wait for its completion.

This comment has been minimized.

Copy link
@srowen

srowen Jul 11, 2016

Member

I have a few edits to suggest so let me just propose some text directly:

The Spark RDD API also exposes asynchronous versions of some actions, like foreachAsync for foreach, which immediately return a FutureAction to the caller instead of blocking on completion of the action. This can be used to manage or wait for the asynchronous execution of the action.

@phalodi

This comment has been minimized.

Copy link
Contributor Author

commented Jul 11, 2016

@srowen Thanks man i know i am not giving correct language but my intention is also same i hope its final commit :) if its done can you merge it.

@@ -1099,6 +1099,9 @@ for details.
</tr>
</table>

The Spark RDD API also exposes asynchronous versions of some actions, like foreachAsync for foreach, which immediately return a FutureAction to the caller instead of blocking on completion of the action. This can be used to manage or wait for the asynchronous execution of the action.

This comment has been minimized.

Copy link
@srowen

srowen Jul 11, 2016

Member

Last nit pick: back-tick-quote the code nouns like I did.

@phalodi

This comment has been minimized.

Copy link
Contributor Author

commented Jul 11, 2016

@srowen I put back-tick-quote for foreach, foreachAsync and FutureAction.

@phalodi

This comment has been minimized.

Copy link
Contributor Author

commented Jul 11, 2016

@srowen Below is final screen shot as you suggested i hope its looks good please review it and merge it.
screenshot from 2016-07-11 17-45-24

@phalodi

This comment has been minimized.

Copy link
Contributor Author

commented Jul 12, 2016

@srowen Is this correct?? if it is then please merge it otherwise give comment if needed.

@srowen

This comment has been minimized.

Copy link
Member

commented Jul 12, 2016

@phalodi no need to ping. We don't merge things immediately, especially when non-essential. It gives more time for review.

@srowen

This comment has been minimized.

Copy link
Member

commented Jul 13, 2016

Jenkins retest this please

@SparkQA

This comment has been minimized.

Copy link

commented Jul 13, 2016

Test build #62231 has finished for PR 14104 at commit 845a123.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@asfgit asfgit closed this in bf107f1 Jul 13, 2016

asfgit pushed a commit that referenced this pull request Jul 13, 2016

[SPARK-16438] Add Asynchronous Actions documentation
## What changes were proposed in this pull request?

Add Asynchronous Actions documentation inside action of programming guide

## How was this patch tested?

check the documentation indentation and formatting with md preview.

Author: sandy <phalodi@gmail.com>

Closes #14104 from phalodi/SPARK-16438.

(cherry picked from commit bf107f1)
Signed-off-by: Sean Owen <sowen@cloudera.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.