
Questions about TPOT #47

Closed
UniqueFool opened this issue Dec 5, 2015 · 5 comments

@UniqueFool

http://www.randalolson.com/2015/11/15/introducing-tpot-the-data-science-assistant/

Perhaps the most basic way to help is to give TPOT a try for your normal workflow and let me know how it works for you. What worked well? What didn't work well? What new features do you think would help? I have my way of doing things, but I'd like to design this tool to be useful for everyone.

Given that this is very much work in progress, I am primarily wondering:

  1. whether/how this can be used to deal directly with ASTs (parse trees) - i.e. beyond GAs, so that it can be used to seed/create and mutate syntax trees (e.g. via Python's ast module)
  2. whether there are any plans to support OpenCL, e.g. for running things concurrently on GPUs or idle CPU cores?

Thanks

(note that numpy based code can often be easily moved to OpenCL using pyOpenCL)
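To make the parenthetical concrete, here is a minimal sketch (my own illustration, not TPOT code) of moving a NumPy elementwise operation to an OpenCL device via pyOpenCL's array interface; the OpenCL portion is guarded so the snippet still runs on machines without an OpenCL driver:

```python
import numpy as np

# Plain NumPy: an elementwise add on the CPU.
a = np.random.rand(50_000).astype(np.float32)
b = np.random.rand(50_000).astype(np.float32)
res_np = a + b

try:
    # The same operation on an OpenCL device via pyopencl's array API.
    import pyopencl as cl
    import pyopencl.array as cl_array

    ctx = cl.create_some_context(interactive=False)
    queue = cl.CommandQueue(ctx)
    a_dev = cl_array.to_device(queue, a)
    b_dev = cl_array.to_device(queue, b)
    res_cl = (a_dev + b_dev).get()  # copy the result back to the host
    assert np.allclose(res_np, res_cl)
except Exception:
    # pyopencl (or an OpenCL driver/device) is not available here.
    pass
```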

@rhiever

rhiever commented Dec 7, 2015

  1. if/how can this be used to directly deal with ASTs (parse trees) - i.e. beyond GAs and so that this can be used to seed/create, mutate syntax trees (e.g. from python ast)

Can you elaborate on this? I'm not familiar with ASTs. There are functions within TPOT that generate and mutate the GP trees that represent machine learning pipelines, but those are currently hidden from the user.

  2. if there are any plans to support OpenCL, e.g. for running things concurrently using GPUs or idle CPU cores ?

My boss is very much pushing for GPU support, so we may go that way eventually, but currently we're still focusing on fully developing the TPOT functionality (e.g., adding more pipeline operators #45 / #46) and expanding support to other ML problems (e.g., regression #30). We'll be looking at optimizations such as GPU support after we've reached a fairly stable state for TPOT.

Currently, we set n_jobs=-1 everywhere possible in the sklearn code to support multithreading. Random forests, for example, will make use of all available cores when fitting and predicting. It looks like it may be possible to support multithreading in DEAP (the GA library) as well.
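As a concrete illustration of the n_jobs=-1 point (a minimal sketch, not TPOT's actual pipeline code), a random forest fit with n_jobs=-1 parallelizes tree construction and prediction across all available cores:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# A toy classification problem; the dataset shape is arbitrary.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# n_jobs=-1 asks scikit-learn to use every available CPU core
# when fitting the individual trees and when predicting.
clf = RandomForestClassifier(n_estimators=50, n_jobs=-1, random_state=0)
clf.fit(X, y)
print(clf.score(X, y))  # training accuracy
```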

@rhiever changed the title from "feedback" to "Questions about TPOT" on Dec 7, 2015
@UniqueFool

Regarding Abstract Syntax Trees (ASTs): frameworks like deap can be used to support configurable input/output tree formats, which is to say that a DSL (domain-specific language) can be used internally by the GP framework for representation and crossover/mutation purposes.

The powerful thing here is to support a programming language like Python as both the output and the input - i.e. similar to LISP.
In other words, you could throw Python code at the GP framework in terms of functions and building blocks (e.g. a subset of Python), let TPOT mutate that, and then dump the resulting Python code to a file.

For simplicity, let's imagine a Python "hello world" script that is fed to TPOT as functions/terminals, with a fitness function requiring it to output "Hello tpot", and the result emitted as a Python script.
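The "hello world" idea can be sketched with the stdlib ast module alone (a hypothetical illustration on Python 3.9+, not anything TPOT does): parse a tiny script, apply a "mutation" to its tree, and unparse the result back to Python source:

```python
import ast

# Treat a tiny Python script as a syntax tree.
source = "print('Hello world')"
tree = ast.parse(source)

class StringMutator(ast.NodeTransformer):
    """A toy 'mutation operator' that rewrites every string constant."""
    def visit_Constant(self, node):
        if isinstance(node.value, str):
            return ast.copy_location(ast.Constant(value="Hello tpot"), node)
        return node

mutated = ast.fix_missing_locations(StringMutator().visit(tree))
print(ast.unparse(mutated))  # the mutated script, back as Python source
```

A real GP loop would pick mutation sites randomly and score each candidate with a fitness function; this only shows the parse-mutate-unparse round trip.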

Thanks for your clarifying comments regarding GPU support; you will probably want to take a look at pyopencl sooner or later.

@rhiever

rhiever commented Dec 8, 2015

I see what you mean now. I've heard of researchers using GP as a way to create computer programs that take given inputs and produce a specific output. With TPOT, the idea is to constrain the available grammar to only machine learning operators, with the hope that such constraints will aid in faster discovery of effective pipelines. Opening the entire Python language to GP entails a much larger search space, so the idea here is to take pre-built bits of code (primarily from sklearn) and use them as the building blocks.

My boss, Jason Moore, has actually developed a system similar to what you propose. He has dozens of papers out on it now; here's one of them. His GP system evolves both the rules and the features that are used to make the classification. The big difference with his work is that he's not evolving Python code; rather, he's evolving mathematical expressions.

@UniqueFool

Thanks for your response!

Opening the entire Python language to GP entails a much larger search space, so the idea here is to take pre-built bits of code (primarily from sklearn) and use them as the building blocks.

Actually, a configurable subset will do. In fact, you will see that this is how deap works: you can register primitives/terminals and use Python callbacks for those, specifying their signature/arity, and even use strong typing: http://deap.gel.ulaval.ca/doc/default/examples/gp_symbreg.html

The example there uses just a handful of Python callbacks for the tree representation, which makes it possible to use Python as both the input and the output for the trees that GP manipulates to evolve algorithms.
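The register-callbacks-with-arity idea can be sketched without deap itself (a stdlib-only toy mirroring the concept, not deap's actual API): primitives are plain Python callables tagged with an arity, terminals are plain values, trees are grown from both, and evaluation calls straight back into Python:

```python
import operator
import random

# Primitives: (callback, arity) pairs, as in deap's addPrimitive.
primitives = [(operator.add, 2), (operator.mul, 2), (operator.neg, 1)]
# Terminals: "x" stands for the single input argument; 1.0 is a constant.
terminals = ["x", 1.0]

def grow(depth, rng):
    """Grow a random expression tree up to the given depth."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(terminals)
    fn, arity = rng.choice(primitives)
    return (fn, [grow(depth - 1, rng) for _ in range(arity)])

def evaluate(node, x):
    """Evaluate a tree by calling the registered Python callbacks."""
    if node == "x":
        return x
    if isinstance(node, float):
        return node
    fn, args = node
    return fn(*(evaluate(arg, x) for arg in args))

rng = random.Random(42)
tree = grow(3, rng)
print(evaluate(tree, 2.0))  # the tree's value at x = 2.0
```

Mutation and crossover then become subtree replacement and subtree swapping on these nested tuples, which is exactly what deap's gp module provides out of the box.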

The paper you mentioned looks interesting. Note that this way of using Python to literally support "recursion" is very powerful, as it allows genetic metaprogramming, i.e. a genetic program used to modify a GP system that evolves algorithms: https://mitpress.mit.edu/sites/default/files/titles/alife/0262297140chap52.pdf

@UniqueFool

Regarding your comments on GPU support: note that deap uses scoop for parallel/distributed evaluation.
