
Questions about TPOT #47

Closed
UniqueFool opened this issue Dec 5, 2015 · 5 comments

@UniqueFool

http://www.randalolson.com/2015/11/15/introducing-tpot-the-data-science-assistant/

Perhaps the most basic way to help is to give TPOT a try for your normal workflow and let me know how it works for you. What worked well? What didn't work well? What new features do you think would help? I have my way of doing things, but I'd like to design this tool to be useful for everyone.

Given that this is very much work in progress, I am primarily wondering:

  1. whether/how this can be used to deal directly with ASTs (parse trees) - i.e. beyond GAs, so that it can be used to seed/create and mutate syntax trees (e.g. via Python's ast module)
  2. whether there are any plans to support OpenCL, e.g. for running things concurrently on GPUs or idle CPU cores?

Thanks

(note that numpy based code can often be easily moved to OpenCL using pyOpenCL)
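To make the parenthetical concrete, here is a minimal sketch (my own illustration, not TPOT code) of moving a NumPy elementwise operation to an OpenCL device via pyOpenCL's array interface; the OpenCL portion is guarded so the snippet still runs on machines without an OpenCL driver:

```python
import numpy as np

# Plain NumPy: an elementwise add on the CPU.
a = np.random.rand(50_000).astype(np.float32)
b = np.random.rand(50_000).astype(np.float32)
res_np = a + b

try:
    # The same operation on an OpenCL device via pyopencl's array API.
    import pyopencl as cl
    import pyopencl.array as cl_array

    ctx = cl.create_some_context(interactive=False)
    queue = cl.CommandQueue(ctx)
    a_dev = cl_array.to_device(queue, a)
    b_dev = cl_array.to_device(queue, b)
    res_cl = (a_dev + b_dev).get()  # copy the result back to the host
    assert np.allclose(res_np, res_cl)
except Exception:
    # pyopencl (or an OpenCL driver/device) is not available here.
    pass
```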

@rhiever

rhiever commented Dec 7, 2015

  1. if/how can this be used to directly deal with ASTs (parse trees) - i.e. beyond GAs and so that this can be used to seed/create, mutate syntax trees (e.g. from python ast)

Can you elaborate on this? I'm not familiar with ASTs. There are functions within TPOT that generate and mutate the GP trees that represent machine learning pipelines, but those are currently hidden from the user.

  2. if there are any plans to support OpenCL, e.g. for running things concurrently using GPUs or idle CPU cores ?

My boss is very much pushing for GPU support, so we may go that way eventually, but currently we're still focusing on fully developing the TPOT functionality (e.g., adding more pipeline operators #45 / #46) and expanding support to other ML problems (e.g., regression #30). We'll be looking at optimizations such as GPU support after we've reached a fairly stable state for TPOT.

Currently, we set n_jobs=-1 everywhere possible in the sklearn code to support multithreading. Random forests, for example, will make use of all available cores when fitting and predicting. It looks like it may be possible to support multithreading in DEAP (the GA library) as well.
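As a concrete illustration of the n_jobs=-1 point (a minimal sketch, not TPOT's actual pipeline code), a random forest fit with n_jobs=-1 parallelizes tree construction and prediction across all available cores:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# A toy classification problem; the dataset shape is arbitrary.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# n_jobs=-1 asks scikit-learn to use every available CPU core
# when fitting the individual trees and when predicting.
clf = RandomForestClassifier(n_estimators=50, n_jobs=-1, random_state=0)
clf.fit(X, y)
print(clf.score(X, y))  # training accuracy
```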

@rhiever changed the title from "feedback" to "Questions about TPOT" on Dec 7, 2015
@UniqueFool

Regarding Abstract Syntax Trees (ASTs): frameworks like deap can be used to support configurable input/output tree formats, which is to say that a DSL (domain-specific language) can be used internally by the GP framework for representation and crossover/mutation purposes.

The powerful thing here is to support a programming language like Python as both the output and the input - i.e. similar to LISP.
In other words, you could throw Python code at the GP framework in terms of functions and building blocks (e.g. a subset of Python), let TPOT mutate that, and then dump the resulting Python code to a file.

For simplicity, let's imagine a Python "hello world" script that is fed to TPOT as functions/terminals, with a fitness function requiring it to output "Hello tpot", and the result emitted as a Python script.
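The "hello world" idea can be sketched with the stdlib ast module alone (a hypothetical illustration on Python 3.9+, not anything TPOT does): parse a tiny script, apply a "mutation" to its tree, and unparse the result back to Python source:

```python
import ast

# Treat a tiny Python script as a syntax tree.
source = "print('Hello world')"
tree = ast.parse(source)

class StringMutator(ast.NodeTransformer):
    """A toy 'mutation operator' that rewrites every string constant."""
    def visit_Constant(self, node):
        if isinstance(node.value, str):
            return ast.copy_location(ast.Constant(value="Hello tpot"), node)
        return node

mutated = ast.fix_missing_locations(StringMutator().visit(tree))
print(ast.unparse(mutated))  # the mutated script, back as Python source
```

A real GP loop would pick mutation sites randomly and score each candidate with a fitness function; this only shows the parse-mutate-unparse round trip.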

Thanks for your clarifying comments regarding GPU support; you will probably want to take a look at pyopencl sooner or later.

@rhiever

rhiever commented Dec 8, 2015

I see what you mean now. I've heard of researchers using GP as a way to create computer programs that take given inputs and produce a specific output. With TPOT, the idea is to constrain the available grammar to only machine learning operators, with the hope that such constraints will aid in faster discovery of effective pipelines. Opening the entire Python language to GP entails a much larger search space, so the idea here is to take pre-built bits of code (primarily from sklearn) and use them as the building blocks.

My boss, Jason Moore, has actually developed a system similar to what you propose. He has dozens of papers out on it now; here's one of them. His GP system evolves both the rules and the features that are used to make the classification. The big difference with his work is that he's not evolving Python code; rather, he's evolving mathematical expressions.

@UniqueFool

Thanks for your response!

Opening the entire Python language to GP entails a much larger search space, so the idea here is to take pre-built bits of code (primarily from sklearn) and use them as the building blocks.

Actually, a configurable subset will do. In fact, you will see that this is how deap works: you can register primitives/terminals and use Python callbacks for those, specifying their signature/arity, and even use strong typing: http://deap.gel.ulaval.ca/doc/default/examples/gp_symbreg.html

The example there uses just a handful of Python callbacks for the tree representation, which makes it possible to use Python as both the input and the output for the trees that GP manipulates to evolve algorithms.
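The register-callbacks-with-arity idea can be sketched without deap itself (a stdlib-only toy mirroring the concept, not deap's actual API): primitives are plain Python callables tagged with an arity, terminals are plain values, trees are grown from both, and evaluation calls straight back into Python:

```python
import operator
import random

# Primitives: (callback, arity) pairs, as in deap's addPrimitive.
primitives = [(operator.add, 2), (operator.mul, 2), (operator.neg, 1)]
# Terminals: "x" stands for the single input argument; 1.0 is a constant.
terminals = ["x", 1.0]

def grow(depth, rng):
    """Grow a random expression tree up to the given depth."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(terminals)
    fn, arity = rng.choice(primitives)
    return (fn, [grow(depth - 1, rng) for _ in range(arity)])

def evaluate(node, x):
    """Evaluate a tree by calling the registered Python callbacks."""
    if node == "x":
        return x
    if isinstance(node, float):
        return node
    fn, args = node
    return fn(*(evaluate(arg, x) for arg in args))

rng = random.Random(42)
tree = grow(3, rng)
print(evaluate(tree, 2.0))  # the tree's value at x = 2.0
```

Mutation and crossover then become subtree replacement and subtree swapping on these nested tuples, which is exactly what deap's gp module provides out of the box.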

The paper you mentioned looks interesting. Note that this way of using Python to literally support "recursion" is very powerful, as it allows genetic metaprogramming, i.e. a genetic program used to modify a GP system that evolves algorithms: https://mitpress.mit.edu/sites/default/files/titles/alife/0262297140chap52.pdf

@UniqueFool

Regarding your comments on GPU support: note that deap uses scoop for parallel/distributed evaluation.
