
First pass at pipeline export functionality #37

Merged: 9 commits merged into master from export, Dec 4, 2015

Conversation

@rhiever (Contributor) commented Dec 2, 2015

Per #5

This PR adds the export() function, which allows the user to specify an output file for the pipeline. Once export() is called (and a pipeline has already been optimized), this function converts the pipeline into its corresponding Python code and exports it to the specified output file.

Docs have been updated along with this PR to demonstrate how the export() function works.

@rasbt, I would greatly appreciate if you could review this before we merge it.

@rhiever (Contributor, Author) commented Dec 2, 2015

The decline in coverage is expected. The new export() function introduces a fair bit of new code without a unit test for it (yet). I'll eventually get to expanding the unit tests... :-)
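A minimal unit test for a code-exporting function like this could follow the pattern below. This is only a sketch of the testing pattern: export_pipeline, the expected string, and the temp-file handling are hypothetical stand-ins, not TPOT's actual API (the real export() generates the pipeline code itself).

```python
import os
import tempfile


def export_pipeline(code, output_file):
    # Hypothetical stand-in for TPOT's export(): write generated code to a file.
    with open(output_file, 'w') as f:
        f.write(code)


def test_export_writes_expected_code():
    # Compare the exported file contents against a known-good string.
    expected = "from sklearn.ensemble import RandomForestClassifier\n"
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, 'tpot_export.py')
        export_pipeline(expected, path)
        with open(path) as f:
            assert f.read() == expected


test_export_writes_expected_code()
```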

-------
None
"""
if self.optimized_pipeline_ == None:
Inline review comment from a Contributor:

 if self.optimized_pipeline_ == None:

should be

if not self.optimized_pipeline_:

@rhiever (Contributor, Author) replied:

Isn't the equality more explicit?

I just noticed in the score() function it now says:

if self.optimized_pipeline_ is None:

Personally, I think the equality is better since it is explicitly checking for the default state (None). I'm open to being convinced otherwise though.

The Contributor replied:

Oh I see; I thought this way it may be more robust, accounting for empty objects like "", [], {}. But maybe being explicit is not a bad idea here.

However

if self.optimized_pipeline_ is None:

is definitely preferred over

if self.optimized_pipeline_ == None:

It's more efficient (because you don't have the __eq__ overhead) and also cleaner style.
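A small self-contained illustration of the trade-offs discussed above (not TPOT code): `== None` can be fooled by a custom `__eq__`, while the truthiness check `if not x:` conflates None with empty containers.

```python
class AlwaysEqual:
    # A pathological class showing why `== None` is not a reliable None check.
    def __eq__(self, other):
        return True


obj = AlwaysEqual()
print(obj == None)   # True  -- __eq__ intercepts the comparison
print(obj is None)   # False -- identity cannot be overridden

# The truthiness check `if not x:` treats empty containers like None,
# which is why `is None` is the explicit, unambiguous test:
for value in (None, [], {}, ""):
    assert not value  # all falsy, not just None
```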

@rasbt (Contributor) commented Dec 2, 2015

@rasbt, I would greatly appreciate if you could review this before we merge it.

Sure, I can take a more detailed look at it later when I get home. However, would it be possible to add a small unit test comparing the actual output with what you'd expect? I think that would be helpful as "documentation" and to follow along.

@rhiever (Contributor, Author) commented Dec 2, 2015

Sure, here's an example:

from tpot import TPOT
from sklearn.datasets import load_digits
from sklearn.cross_validation import train_test_split

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(digits.data, digits.target,
                                                    train_size=0.75)

tpot = TPOT(generations=1, verbosity=2)
tpot.fit(X_train, y_train)
print(tpot.score(X_train, y_train, X_test, y_test))
tpot.export('tpot_export.py')

TPOT will likely discover that a random forest alone does well on the data set, so tpot_export.py should contain something like:

from itertools import combinations

import numpy as np
import pandas as pd

from sklearn.cross_validation import StratifiedShuffleSplit
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

# NOTE: Make sure that the class is labeled 'class' in the data file
tpot_data = pd.read_csv('PATH/TO/DATA/FILE', sep='COLUMN_SEPARATOR')
training_indices, testing_indices = next(iter(StratifiedShuffleSplit(tpot_data['class'].values, n_iter=1, train_size=0.75)))

result1 = tpot_data.copy()

# Perform classification with a random forest classifier
rfc1 = RandomForestClassifier(n_estimators=97, max_features=min(8, len(result1.columns) - 1))
rfc1.fit(result1.loc[training_indices].drop('class', axis=1).values, result1.loc[training_indices]['class'].values)
result1['rfc1-classification'] = rfc1.predict(result1.drop('class', axis=1).values)

If you want to inspect the pipelines further: print()-ing the optimized pipeline at any point will give you the nested function version, from which you can deduce what the linear code version should look like.

@rhiever (Contributor, Author) commented Dec 2, 2015

Oh, and let's make sure to close #36 when this is merged. Now when the user ends TPOT early on the command line and they provided an output file, the current best pipeline will be exported.

@rasbt (Contributor) commented Dec 2, 2015

Oh, and let's make sure to close #36 when this is merged. Now when the user ends TPOT early on the command line and they provided an output file, the current best pipeline will be exported.

Sounds cool! Maybe a nice addition would be to refactor the export method a bit so that a slimmer "core" component can be written to a log file. This also lets the user conveniently check the progress over time (e.g., via tail tpot_run_x.log) or so.

@rhiever (Contributor, Author) commented Dec 3, 2015

export() actually runs pretty quick even on large pipelines, so it wouldn't be much overhead to run it every generation.
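Running a slim export every generation could look something like the sketch below. All names here (log_generation, the scores, the nested pipeline strings) are hypothetical, not part of TPOT's API; the point is appending one compact line per generation so the log stays tail-friendly.

```python
import os
import tempfile


def log_generation(log_path, generation, score, pipeline_repr):
    # Append a one-line summary per generation so `tail -f` stays readable.
    with open(log_path, 'a') as log:
        log.write('gen {:>3}  score {:.4f}  {}\n'.format(
            generation, score, pipeline_repr))


# Demonstration with made-up scores and pipeline strings:
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'tpot_run_x.log')
    log_generation(path, 1, 0.9712, '_random_forest(ARG0, 97, 8)')
    log_generation(path, 2, 0.9754, '_random_forest(ARG0, 103, 8)')
    with open(path) as f:
        print(f.read())
```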

rhiever added commits:

Changed equality operators to "is" syntax.

Changed exceptions raised from not having an optimized pipeline to a ValueError.
@rasbt (Contributor) commented Dec 4, 2015

export() actually runs pretty quick even on large pipelines, so it wouldn't be much overhead to run it every generation.

Oh, sure, but I was more thinking of "bloated log files" here. I think that log/status files are generally helpful to keep track of errors, but also to judge the progress if you are running stuff remotely and to analyze what's going on under the hood (e.g., think back of ye goode olde times submitting jobs to HPCC ;)). So, I was thinking to refactor it into a more bare-bones "export_params" and an "export_pipeline_standalone" or so. But this is just a general suggestion, it doesn't have to be now :)

@rasbt (Contributor) commented Dec 4, 2015

Besides that, the code looks fine to me so far. But I haven't run it through a debugger yet and looked at it in detail... sorry, it's the end of the year and I am pretty busy wrapping things up before I go on a family visit, but January should be a good time for a fresh start :)

@rhiever (Contributor, Author) commented Dec 4, 2015

Alrighty, I think I'll merge this for now since I'd like to push this functionality out. Please file bug reports against it if you see anything wrong.

rhiever pushed a commit that referenced this pull request on Dec 4, 2015: "First pass at pipeline export functionality"
rhiever merged commit 1f56114 into master on Dec 4, 2015
rhiever deleted the export branch on December 4, 2015 at 14:08