Limiting FragmentRDD pipe parallelism #1977
Hi, I'm pretty new to Spark/ADAM. While playing with the pipe function I ran into a problem of excessive parallelism: too many subprocesses are created. Each subprocess uses a lot of RAM, so I cannot afford to run many of them on a single processing node.
I'd like to limit the number of subprocesses that my FragmentRDD is piped to (on a single machine), but I haven't found a way to do so -- there is no .repartition on FragmentRDD. Is there a way to force piping multiple partitions into a single process, or can I repartition my FragmentRDD somehow?
Thank you for your help and please forgive me if this is a trivial question,
val fragments: FragmentRDD = ...
fragments.transform(rdd => rdd.repartition(n))
You can also repartition through the underlying Dataset:
val fragments: FragmentRDD = ...
fragments.transformDataset(dataset => dataset.repartition(n))
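Since Spark runs at most one task (and hence one piped subprocess) per available core, the partition count chosen for the repartition call above bounds how many subprocesses can run concurrently. A minimal sketch of how one might pick that count from a node's RAM budget; the object name, the byte figures, and the helper signatures are all illustrative assumptions, not part of ADAM:

```scala
// Illustrative sketch: pick a partition count so concurrently piped
// subprocesses fit within each node's RAM. All names and numbers here
// are hypothetical assumptions for the example.
object PipePartitioning {
  // Subprocesses that fit on one node: round down so we never exceed
  // the RAM budget, but always allow at least one.
  def maxProcsPerNode(nodeRamBytes: Long, perProcessBytes: Long): Int =
    math.max(1, (nodeRamBytes / perProcessBytes).toInt)

  // Total partition count across the cluster: per node, concurrency is
  // limited by whichever is smaller, core count or RAM-fitting processes.
  def partitionCount(nodes: Int, coresPerNode: Int,
                     nodeRamBytes: Long, perProcessBytes: Long): Int =
    nodes * math.min(coresPerNode,
                     maxProcsPerNode(nodeRamBytes, perProcessBytes))
}
```

For example, with 3 nodes of 8 cores and 64 GiB each, and a subprocess needing 16 GiB, only 4 subprocesses fit per node, so one would repartition to 12 partitions before piping.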