This repository has been archived by the owner on Feb 1, 2024. It is now read-only.

ERROR TaskSetManager: Total size ... is bigger than spark.driver.maxResultSize (1024.0 MB) #14

Open
chrs-myrs opened this issue Jun 12, 2019 · 4 comments

Comments

@chrs-myrs

I cannot run the CloudFront task without getting this error:

ERROR TaskSetManager: Total size of serialized results of 3055 tasks (1052.9 MB) is bigger than spark.driver.maxResultSize (1024.0 MB)

@dacort
Contributor

dacort commented Jun 12, 2019

Hi @chrs-myrs - this can be an issue if you have a large number of source files that you're trying to convert. As a workaround, can you try setting spark.driver.maxResultSize on the Glue job?

In the "Security configuration, script libraries, and job parameters (optional)" section, under "Job Parameters", add the key --conf with the value spark.driver.maxResultSize=2g.
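If the job is managed through the AWS Glue API rather than the console, the same job parameter is just a key/value pair. A minimal sketch, assuming the Glue API's Arguments map on StartJobRun (DefaultArguments on CreateJob takes the same shape); the job name is hypothetical, and the boto3 call is left as a comment so the snippet runs without AWS credentials:

```python
# Job parameters as the key/value map the Glue API accepts
# (Arguments on StartJobRun, DefaultArguments on CreateJob).
job_arguments = {
    "--conf": "spark.driver.maxResultSize=2g",
}

# With boto3 installed and credentials configured, this could be passed as:
#   boto3.client("glue").start_job_run(JobName="my-cloudfront-job",
#                                      Arguments=job_arguments)
# "my-cloudfront-job" is a hypothetical job name.
print(job_arguments)
```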

Long term, we may need to find a way to better filter the initial set of inbound files to a smaller set, possibly as part of #12.
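As a rough illustration of why the file count matters, the figures in the error message give an average serialized result per task, which can be used to estimate how many tasks fit under a given limit. This is a back-of-the-envelope sketch that assumes every task returns about the same amount of data; real per-task sizes will vary.

```python
def average_result_mb(total_mb: float, tasks: int) -> float:
    """Average serialized result size per task, in MB."""
    return total_mb / tasks


def max_tasks_within(limit_mb: float, per_task_mb: float) -> int:
    """Rough number of tasks whose results fit under the limit,
    assuming every task returns roughly the same amount of data."""
    return int(limit_mb // per_task_mb)


# Figures taken from the error message in this issue.
per_task = average_result_mb(1052.9, 3055)
print(f"{per_task:.3f} MB per task")
print("tasks that fit at 1g:", max_tasks_within(1024.0, per_task))
print("tasks that fit at 2g:", max_tasks_within(2048.0, per_task))
```

Under this assumption, doubling the limit roughly doubles the number of source files the job can handle, which is why filtering the inbound files is the longer-term fix.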

@RickardCardell

When raising spark.driver.maxResultSize to 2g or higher, it's also a good idea to increase the driver memory; otherwise the driver can exceed the memory allocated to it by YARN, and the job fails.
The setting is spark.driver.memory.
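A small sanity check along these lines can catch a result limit that is too large for the driver before the job is submitted. This is a hypothetical helper, not part of Spark or Glue; treating "maxResultSize must be below driver memory" as the threshold is a rule-of-thumb assumption, since the driver also needs heap for everything else it does.

```python
import re

# Binary multipliers for the size suffixes Spark accepts (k, m, g, t).
_UNITS = {"k": 1024, "m": 1024**2, "g": 1024**3, "t": 1024**4}


def parse_size(value: str) -> int:
    """Convert a size string such as '2g' or '512m' to bytes.

    Bare numbers are rejected because their default unit differs
    between Spark settings.
    """
    match = re.fullmatch(r"(\d+)([kmgt])b?", value.strip().lower())
    if match is None:
        raise ValueError(f"unsupported size string: {value!r}")
    number, unit = match.groups()
    return int(number) * _UNITS[unit]


def check_driver_settings(max_result_size: str, driver_memory: str) -> None:
    """Raise if the result limit is not below the driver memory (rule of thumb)."""
    if parse_size(max_result_size) >= parse_size(driver_memory):
        raise ValueError(
            f"spark.driver.maxResultSize={max_result_size} should be smaller "
            f"than spark.driver.memory={driver_memory}"
        )


check_driver_settings("2g", "8g")  # the values suggested here pass
```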

To add two Spark configs, chain them in a single value like this:
Key: --conf
Value: spark.driver.maxResultSize=2g --conf spark.driver.memory=8g
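The chaining pattern above generalizes to any number of settings: the first pair is written bare because the key already supplies the leading --conf, and each later pair gets its own --conf prefix. A minimal sketch; the helper name is hypothetical:

```python
def glue_conf_value(settings: dict[str, str]) -> str:
    """Join Spark settings into one value for the Glue '--conf' key.

    The first pair is written bare (the key supplies its '--conf');
    every later pair is prefixed with its own '--conf'.
    """
    pairs = [f"{key}={value}" for key, value in settings.items()]
    return " --conf ".join(pairs)


value = glue_conf_value({
    "spark.driver.maxResultSize": "2g",
    "spark.driver.memory": "8g",
})
print(value)
# spark.driver.maxResultSize=2g --conf spark.driver.memory=8g
```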

@chrs-myrs
Author

Setting maxResultSize was enough to get this to run properly.

@jpduckwo

I'm experiencing this error, but only on subsequent job runs. The first time I run the job, it works with no memory issues, even with hundreds of thousands of files (CloudFront logs) in the processing folder. On subsequent runs, however, it keeps failing. Does anyone have an idea why? I've been trying to move files around and process them in batches, but it's a pain. Should this library be able to handle huge numbers of files without issues, or should I be pre-moving the files into day folders and only processing one day at a time?
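If you do go the one-day-at-a-time route, the day can be read straight from the log file names. A minimal sketch, assuming the logs keep CloudFront's default standard-log naming (distribution-ID.YYYY-MM-DD-HH.unique-ID.gz); the S3 keys below are made up, and listing or moving the objects in S3 is left out:

```python
import re
from collections import defaultdict

# Default CloudFront standard log name:
#   <optional prefix>/<distribution-ID>.YYYY-MM-DD-HH.<unique-ID>.gz
_LOG_NAME = re.compile(r"\.(\d{4}-\d{2}-\d{2})-\d{2}\.[^./]+\.gz$")


def group_by_day(keys):
    """Map 'YYYY-MM-DD' to the list of log keys for that day.

    Keys that don't match the default naming are skipped.
    """
    batches = defaultdict(list)
    for key in keys:
        match = _LOG_NAME.search(key)
        if match:
            batches[match.group(1)].append(key)
    return dict(batches)


keys = [
    "logs/E2EXAMPLE.2019-06-11-23.abcd1234.gz",
    "logs/E2EXAMPLE.2019-06-12-00.efgh5678.gz",
    "logs/E2EXAMPLE.2019-06-12-01.ijkl9012.gz",
]
for day, batch in sorted(group_by_day(keys).items()):
    print(day, len(batch))
```

Each day's list could then be copied to its own prefix (or fed to a separate job run), keeping the number of files, and therefore tasks, per run bounded.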
