Questions about TPOT #47
Can you elaborate on this? I'm not familiar with ASTs. There are functions within TPOT that generate and mutate the GP trees that represent machine learning pipelines, but those are currently hidden from the user.
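To make the "GP trees that represent pipelines" idea concrete, here is a minimal sketch in plain Python. Everything below — the operator names, `random_pipeline`, and `point_mutate` — is hypothetical illustration, not TPOT's internal API:

```python
import random

# Hypothetical sketch of a pipeline encoded as a GP tree: a nested tuple
# (operator_name, child), bottoming out at the raw input matrix.
# Operator names and helpers are illustrative, not TPOT internals.
OPERATORS = ["standard_scaler", "pca", "select_kbest", "random_forest"]

def random_pipeline(depth, rng):
    """Grow a random pipeline tree of the given depth."""
    if depth == 0:
        return "input_matrix"
    return (rng.choice(OPERATORS), random_pipeline(depth - 1, rng))

def point_mutate(tree, rng):
    """Return a copy of the tree with each operator possibly swapped."""
    if isinstance(tree, str):
        return tree
    op, child = tree
    if rng.random() < 0.5:
        op = rng.choice(OPERATORS)
    return (op, point_mutate(child, rng))

rng = random.Random(42)
pipeline = random_pipeline(3, rng)
mutant = point_mutate(pipeline, rng)
```

Mutation here preserves the tree shape and only swaps operators; real GP systems also use structural mutations such as growing or pruning subtrees.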
My boss is very much pushing for GPU support, so we may go that way eventually, but currently we're still focusing on fully developing the TPOT functionality (e.g., adding more pipeline operators #45 / #46) and expanding support to other ML problems (e.g., regression #30). We'll be looking at optimizations such as GPU support after we've reached a fairly stable state for TPOT. Currently, we set
Regarding Abstract Syntax Trees (ASTs): frameworks like deap can be used to support configurable input/output tree formats, which is to say that a DSL (domain-specific language) can be used internally by the GP framework for representation and crossover/mutation purposes. The powerful thing here is to support a programming language, like Python, as both its output and its input, i.e. similar to LISP. For simplicity, imagine a Python "hello world" script that is fed to TPOT for functions/terminals, with its fitness function requiring it to output "Hello tpot", and providing the result as Python code. Thanks for your clarifying comments regarding GPU support; you will probably want to take a look at pyopencl sooner or later.
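As a rough illustration of that fitness idea (the function name and scoring scheme are my own, not anything TPOT provides), a candidate Python program could be scored by how closely its stdout matches the target string:

```python
import io
import contextlib

def fitness(program_source, target="Hello tpot"):
    """Hypothetical fitness: run a candidate Python program and count
    position-wise character matches between its stdout and the target."""
    buf = io.StringIO()
    try:
        with contextlib.redirect_stdout(buf):
            exec(program_source, {})  # candidates are themselves Python code
    except Exception:
        return 0  # programs that crash score nothing
    output = buf.getvalue().strip()
    return sum(a == b for a, b in zip(output, target))

perfect = fitness('print("Hello tpot")')   # matches all 10 characters
partial = fitness('print("Hello world")')  # shares only the "Hello " prefix
```

A GP system would then favor candidates with higher scores when selecting parents for crossover and mutation.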
I see what you mean now. I've heard of researchers using GP as a way to create computer programs that take given inputs and produce a specific output. With TPOT, the idea is to constrain the available grammar to only machine learning operators, with the hope that such constraints will aid in faster discovery of effective pipelines. Opening the entire Python language to GP entails a much larger search space, so the idea here is to take pre-built bits of code (primarily from sklearn) and use them as the building blocks.

My boss, Jason Moore, has actually developed a system similar to what you propose. He has dozens of papers out on it now; here's one of them. His GP system evolves both the rules and the features that are used to make the classification. The big difference with his work is that he's not evolving Python code; rather, he's evolving mathematical expressions.
Thanks for your response!
Actually, a configurable subset will do. In fact, you will see that this is how deap works: you can register primitives/terminals and use Python callbacks for those, while specifying their signature/arity, even with strong typing: http://deap.gel.ulaval.ca/doc/default/examples/gp_symbreg.html

The example there uses just a handful of Python callbacks for the tree representation, which makes it possible to use Python as both the input and the output for the trees that are manipulated by GP to evolve algorithms. The paper you mentioned looks interesting. Note that using Python to literally support recursion in this way is very powerful, as it allows genetic metaprogramming, i.e. a genetic program used to modify a GP that evolves algorithms: https://mitpress.mit.edu/sites/default/files/titles/alife/0262297140chap52.pdf
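A stripped-down sketch of that primitive/terminal idea, using only the standard library rather than deap's actual `PrimitiveSet` API:

```python
import operator

# Minimal sketch of deap's primitive-set concept: primitives are plain
# Python callables registered with an arity, and a GP tree in prefix form
# is evaluated by applying them. This mimics the idea behind deap.gp,
# not its actual API.
PRIMITIVES = {
    "add": (operator.add, 2),
    "mul": (operator.mul, 2),
    "neg": (operator.neg, 1),
}

def evaluate(tree, x):
    """Evaluate a prefix tree such as ('add', ('mul', 'x', 'x'), 'x')."""
    if tree == "x":                      # terminal: the input variable
        return x
    if isinstance(tree, (int, float)):   # terminal: a constant
        return tree
    func, arity = PRIMITIVES[tree[0]]
    args = [evaluate(subtree, x) for subtree in tree[1:]]
    assert len(args) == arity, "tree violates the primitive's arity"
    return func(*args)

# x*x + x, the kind of expression symbolic regression would evolve
expr = ("add", ("mul", "x", "x"), "x")
```

Because the primitives are ordinary Python callables, the same trees can be pretty-printed back out as Python source, which is what makes Python usable as both input and output here.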
Regarding your comments on GPU support: deap is using SCOOP.
http://www.randalolson.com/2015/11/15/introducing-tpot-the-data-science-assistant/
Given that this is very much work in progress, I am primarily wondering:
and 2) if there are any plans to support OpenCL, e.g. for running things concurrently using GPUs or idle CPU cores?
Thanks
(Note that numpy-based code can often be easily moved to OpenCL using pyOpenCL.)