no method for reduce_by_key(::Spark.PipelinedRDD, ::Function)
#51
Comments
Don't expect Scala or Python examples to be reproducible line-by-line. Scala, for example, uses an implicit conversion from RDD to PairRDDFunctions to make reduceByKey available, which has no direct counterpart in Julia.
Try this:
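Something along these lines, using map_pair so that reduce_by_key gets a pair RDD to dispatch on (function names assume Spark.jl's RDD API: text_file, flat_map, map_pair, reduce_by_key; the file path is just a placeholder):

```julia
using Spark
Spark.init()
sc = SparkContext(master="local")

txt = text_file(sc, "README.md")                 # RDD of lines (placeholder path)
words = flat_map(txt, line -> split(line, " "))  # RDD of words
pairs = map_pair(words, w -> (w, 1))             # pair RDD of (word, 1)
counts = reduce_by_key(pairs, +)                 # sum the counts per word
collect(counts)
```

The important bit is map_pair: plain map returns a Spark.PipelinedRDD, for which no reduce_by_key method is defined.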
Let me know if there are any other issues with the Spark examples.
By the way, thanks for raising the issue! I think many people come across the project, try it out without success, and silently go away thinking it's not working. I'll add a couple of examples from that page to Spark.jl's docs.
That seems to work fine, thanks. I think it would be extremely useful to have some documentation that explains the differences from the Scala and Java APIs, so that it's a bit clearer what to do in cases like this. When I finish creating Julia versions of all the examples here, it might be helpful to post them in the documentation; that way people can see Julia code equivalent to well-known examples in all three other languages. I'm not sure the remaining examples can even be done yet, as they require some dataframes functionality. I've started looking through the source code and made a fork in the hopes of adding some of the missing methods, but going through the Spark API docs is pretty painful (as I'm sure you know) and I'm not a Spark expert. We'll see how far I get with this.
I don't really think there are many "cases like this". The Spark API uses different tricks in different languages. I always look at the Java API, since that is what we actually call under the hood, and it contains almost no hidden operations (like Scala's implicit conversions, for example). But for end users it's not very useful, so we need to provide independent and easily discoverable examples.
Don't hesitate to ask questions! I've spent quite a lot of time both writing Spark programs in Scala and Python and wrapping them in Julia, so I'll be glad to help whenever possible. On my side, I'm going to implement a couple of important functions from that page today or tomorrow to make things easier.
That sounds great, thanks. From my limited experience, one of the best uses of Spark is as a sane replacement for SQL, so I would consider dataframe operations such as groupby and join to be quite valuable (though I realize that under the hood it's probably just …).
I am going through the Spark documentation examples here and trying to reproduce all of them. For the very first one
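(presumably the word count, given the error), a literal Julia translation would be something along these lines (function names assume Spark.jl's RDD API; the file path is just a placeholder):

```julia
using Spark
Spark.init()
sc = SparkContext(master="local")

txt = text_file(sc, "README.md")                 # placeholder input file
words = flat_map(txt, line -> split(line, " "))
# `map` is the literal translation of the Scala/Python example, but it
# produces a Spark.PipelinedRDD rather than a pair RDD:
pairs = map(words, w -> (w, 1))
counts = reduce_by_key(pairs, +)
```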
one gets the error from the title: no method for reduce_by_key(::Spark.PipelinedRDD, ::Function).
My guess is that either these method signatures are unnecessarily restrictive (in which case some associated methods would have to be changed as well) or one of these functions is returning a PipelinedRDD when it should be returning a PipelinedPairRDD.
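For what it's worth, the "no method" wording is just Julia's multiple dispatch at work: if reduce_by_key is only defined for pair-RDD types, handing it a plain RDD raises a MethodError. A minimal, self-contained sketch with mock types (names borrowed from Spark.jl for illustration only; this is not the package's actual source):

```julia
# Mock stand-ins for Spark.jl's RDD types, for illustration only.
abstract type AbstractRDD end
struct PipelinedRDD <: AbstractRDD end
struct PipelinedPairRDD <: AbstractRDD end

# A method restricted to the pair-RDD type, as described above.
reduce_by_key(rdd::PipelinedPairRDD, f::Function) = "reduced"

reduce_by_key(PipelinedPairRDD(), +)  # dispatches fine
reduce_by_key(PipelinedRDD(), +)      # MethodError: no method matching
                                      # reduce_by_key(::PipelinedRDD, ::Function)
```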