Description
Hi everyone
I wrote a data processing job in a Jupyter Notebook (SageMaker) using the awswrangler library. The code works perfectly in that environment, but when I run it on Glue, it fails with the following error: Command Failed with exit code 10. According to the Knowledge Center, this is a memory error. I then ran a memory profiler to check how much memory the process uses and found that it uses 25 GB in a "pandas.merge", because the DataFrames are very large (more than 10 GB each).
Next, I tried converting some columns to "category" dtype to reduce memory use, but when the code runs the "merge" again, the categories are lost.
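To illustrate what I mean, here is a minimal sketch with toy data. The frames `left` and `right` and the column `key` are made up for illustration and are not my real tables. As far as I understand, pandas keeps a categorical key through a merge only when both sides have exactly the same category dtype, so the sketch also shows aligning the two columns on a shared dtype first. I have not confirmed that this alone would be enough to fit my job into Glue's memory.

```python
import pandas as pd

# Toy frames (hypothetical data): the key columns are categorical,
# but each side has a different set of categories.
left = pd.DataFrame({"key": pd.Categorical(["a", "b", "c"]), "x": [1, 2, 3]})
right = pd.DataFrame({"key": pd.Categorical(["b", "c", "d"]), "y": [10, 20, 30]})

# Merging on keys whose category sets differ: the key is no longer categorical.
lost = left.merge(right, on="key", how="inner")
print(lost["key"].dtype)

# Build one shared dtype from the union of both category sets,
# cast both key columns to it, then merge again.
shared = pd.CategoricalDtype(
    left["key"].cat.categories.union(right["key"].cat.categories)
)
left_aligned = left.astype({"key": shared})
right_aligned = right.astype({"key": shared})

kept = left_aligned.merge(right_aligned, on="key", how="inner")
print(kept["key"].dtype)  # category
```

With the shared dtype, the merged key column stays categorical in this toy case; the merge itself still has to materialize the joined rows, so peak memory may remain high.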
How can I improve this? Would it be better to rewrite everything as a Spark job (programmed in Spark)?
I assume someone has already had this problem and managed to resolve it.
Please, I need guidance.
Thank you.