Getting DEAP working with Spark #268
Hi Ryan,

First, thanks for the detailed issue, it was a pleasant read. Second, we do have a solution for this. It has been sitting in the pull-request list for almost three years, and I am the one to blame: I have taken too long a DEAP hiatus, during which I have been teaching Spark, among other things.

Here is the PR: #76

There are some conflicts since the patch is a bit old, but you should be able to merge it without too much effort and then test it with Spark. It should solve your issue. Let us know the outcome. The patch was originally meant for DEAP 2.0; this could give us the necessary motivation to assemble a new release.

Cheers,
Great, thanks @cmd-ntrf. I've been working with DEAP parallelization for a while now. Unfortunately I'm unable to publish some of my company tutorials on how to do it xD Needless to say, I've become somewhat acquainted with the source code and the like. I think it would be a good idea to expand our documentation on the parallelization options! My one hint would be that I've had luck with DEAP on AWS cfncluster using SCOOP. I would like to make it automatic with boto, but unfortunately I wouldn't be able to publish that either without corporate approval :( Just saying, it was damn hard! And if anyone not bound by an agreement has examples, they should publish them (and PM me for advice on bug fixes and directions lol)
Do you have specific examples of parallelization that would be of value to you? We thought we had it covered, but clearly there are some issues if you found it "damn hard" ;). Also, I am curious, who do you work for?
Well, I think a specific parallelization example with AWS (as opposed to something very general like "use SCOOP") would be helpful to some people. The problem really isn't so much with DEAP as with the combined documentation of the three pieces: DEAP, SCOOP, and whatever cloud platform you are working with. I work for Keysight Technologies; I use DEAP mostly for hyperparameter optimization. I'll see after a while if we can't publish some of my tutorials.
I'm having the same issue and looked at the above referenced pull request, but it appears to address problems with pickling when creator.create is called outside of the global scope; however, in my case (and, it appears, in the OP's example as well), I am calling that function in the global scope and getting a similar exception:
That being said, I'll try merging the pull request and test. |
I just tried merging the PR. The code which wasn't working for @ryanpeach worked. |
There was a minor issue though: that pull request has
I just meant that the OP's example called "creator.create" in the global scope, but the PR fix was to allow that call within a local scope, which originally didn't work due to an issue with visibility of pickled objects by worker nodes during distributed execution. That being said, I was incorrect in assuming that it wouldn't fix the problem - merging PR #76 also got it working for me. I've created PR #280 which incorporates the fix into the latest baseline and it seems to be passing CI checks. |
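The pickling behaviour being discussed can be reproduced without DEAP at all: Python's pickle stores only a module path and a class name for an instance, so a class built at runtime (as creator.create does) can only be unpickled in a process where the class has been re-registered on its module. A minimal stdlib sketch, where `create` is a hypothetical stand-in for `deap.creator.create`:

```python
import pickle

def create(name, base, **attrs):
    # Hypothetical stand-in for deap.creator.create: build a class at
    # runtime and register it on this module, because pickle records only
    # "<module>.<name>" and resolves the class by that path when loading.
    cls = type(name, (base,), attrs)
    globals()[name] = cls
    return cls

create("Individual", list)
blob = pickle.dumps(Individual([1, 2, 3]))

# Simulate a worker process that never ran create(): the class is absent.
del globals()["Individual"]
try:
    pickle.loads(blob)
except AttributeError as exc:
    # e.g. "Can't get attribute 'Individual' on <module ...>"
    print(exc)

# Re-running create() (what PR #76 arranges on the workers) fixes the load.
create("Individual", list)
restored = pickle.loads(blob)
```

This is the same failure mode as the reported `Can't get attribute Individual on module deap.creator`: the Spark executors never executed the creator.create calls, so the lookup fails there regardless of whether the call sat in global or local scope on the driver.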
Please see rsteca/sklearn-deap#59. I am having an issue with this as well. Thanks,
I created a workaround so that DEAP could work with the reusable-processes framework loky. The example below rewrites DEAP's onemax_mp.py example:
`
`
I believe this approach could also work with PySpark, thanks to its "reusable execution environment" architecture. The problem seems to arise from DEAP's unusual design for object scope and memory space: it reuses several global objects (for example, the creator) that are hard to share among different processes.
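The workaround generalizes to any pool that lets you run an initializer in each worker: re-execute the dynamic class creation there before any task arrives. A stdlib-only sketch of the pattern, where `make_classes` is a stand-in for the real `deap.creator.create` calls (loky's `get_reusable_executor` and Spark executor setup hooks can play the same role as the `initializer` argument here):

```python
import multiprocessing as mp

def make_classes():
    # Stand-in for the deap.creator.create(...) calls: build the dynamic
    # class and register it on this module so pickle can resolve it.
    globals()["Individual"] = type("Individual", (list,), {})

def init_worker():
    # Runs once in every worker process, so unpickling Individual
    # instances works there too -- the trick executors need as well.
    make_classes()

def evaluate(ind):
    # Toy fitness: sum of the genome.
    return sum(ind)

def main():
    make_classes()
    population = [Individual([i, i + 1]) for i in range(4)]
    with mp.Pool(processes=2, initializer=init_worker) as pool:
        return pool.map(evaluate, population)

if __name__ == "__main__":
    print(main())  # [1, 3, 5, 7]
```

The key point is that the class registration is repeated in every process that will deserialize individuals, rather than assumed to be inherited from the parent.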
Any further feedback on this issue being resolved? |
Goal
I'm trying to get DEAP to parallelize across a Spark cluster. I have seen this referenced by other users, as it allows for tight integration with existing server architecture via YARN. I have followed several tutorials online, cited in the references. I have working code for DEAP, and then code that I have attempted to transform to use Spark. The error that usually occurs is 'Can't get attribute Individual on module deap.creator'.
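The usual recipe for this is to swap the map function DEAP's algorithms call for a cluster-backed one. A minimal stand-in sketch of the pattern: `Toolbox` below imitates `deap.base.Toolbox.register`, and `cluster_map` is where a Spark call such as `sc.parallelize(list(xs)).map(f).collect()` would go (both names are illustrative stand-ins, not DEAP's or Spark's real code, so the sketch runs anywhere):

```python
from functools import partial

class Toolbox:
    """Minimal stand-in for deap.base.Toolbox: register() stores a partial."""
    def register(self, alias, func, *args, **kwargs):
        setattr(self, alias, partial(func, *args, **kwargs))

def cluster_map(func, iterable):
    # Stand-in for a distributed map; with Spark this would be roughly:
    #   sc.parallelize(list(iterable)).map(func).collect()
    return [func(x) for x in iterable]

toolbox = Toolbox()
toolbox.register("map", cluster_map)

# DEAP's algorithms call toolbox.map(evaluate, population) internally,
# so everything routed through it now runs on the cluster backend.
fitnesses = toolbox.map(lambda ind: (sum(ind),), [[1, 2], [3, 4]])
# fitnesses == [(3,), (7,)]
```

The catch, as the rest of this thread shows, is that the function and the individuals shipped through that map must be picklable on the worker side, which is exactly where the creator-made classes break.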
Working Code
Ran:
Output
Not Working Code
Ran:
spark-submit --master local test-deap.py
Output
Full Text (pastebin)
Highlights:
References: