TPOT command line usage help #34

Closed
SimplyAhmazing opened this issue Nov 30, 2015 · 2 comments

@SimplyAhmazing

I downloaded a sample MNIST data set as a CSV file and installed TPOT and all of its dependencies.

I tried running it from the command line. Below are the command I ran and the output I got:

$ tpot -i mnist.csv -is , -g 100 -s 42 -v 2

TPOT settings:
crossover_rate  =   0.05
generations =   100
input_file  =   mnist.csv
input_separator =   ,
mutation_rate   =   0.9
population_size =   100
random_state    =   42
verbosity   =   2

gen nevals  Minimum accuracy    Average accuracy    Maximum accuracy
0   100     0.1                 0.404918            0.964608

^C


^CTraceback (most recent call last):
  File "/Users/moi/.pyenv/versions/tpot/bin/tpot", line 9, in <module>
    load_entry_point('TPOT==0.1.3', 'console_scripts', 'tpot')()
  File "/Users/moi/.pyenv/versions/tpot/lib/python3.5/site-packages/tpot/tpot.py", line 479, in main
    training_features, training_classes)))
  File "/Users/moi/.pyenv/versions/tpot/lib/python3.5/site-packages/tpot/tpot.py", line 207, in score
    training_testing_data.rename(columns={column: str(column).zfill(5)}, inplace=True)
  File "/Users/moi/.pyenv/versions/tpot/lib/python3.5/site-packages/pandas/core/frame.py", line 2697, in rename
    **kwargs)
  File "/Users/moi/.pyenv/versions/tpot/lib/python3.5/site-packages/pandas/core/generic.py", line 606, in rename
    result._data = result._data.rename_axis(f, axis=baxis, copy=copy)
  File "/Users/moi/.pyenv/versions/tpot/lib/python3.5/site-packages/pandas/core/internals.py", line 2587, in rename_axis
    obj = self.copy(deep=copy)
  File "/Users/moi/.pyenv/versions/tpot/lib/python3.5/site-packages/pandas/core/internals.py", line 3059, in copy
    do_integrity_check=False)
  File "/Users/moi/.pyenv/versions/tpot/lib/python3.5/site-packages/pandas/core/internals.py", line 2823, in apply
    applied = getattr(b, f)(**kwargs)
  File "/Users/moi/.pyenv/versions/tpot/lib/python3.5/site-packages/pandas/core/internals.py", line 578, in copy
    values = values.copy()
KeyboardInterrupt

It took a couple of hours for TPOT to return the stats summary, and after another hour or so it was still running, so I terminated it. What does a complete TPOT run look like? For some reason I was expecting code to be written to a directory, based on this description:

Once TPOT is finished searching (or you get tired of waiting), it provides you with the Python code for the best pipeline it found so you can tinker with the pipeline from there.

Or maybe the Python source is printed to the terminal? I'm going to start reading the TPOT source more thoroughly.

@SimplyAhmazing changed the title from "TPOT command line usage expectations" to "TPOT command line usage help" on Nov 30, 2015
@rhiever
Contributor

rhiever commented Dec 1, 2015

Hi @SimplyAhmazing,

The first thing I should clarify is that TPOT will be quite slow on large data sets such as the full MNIST data set. With the default TPOT settings, each iteration of the algorithm evaluates 100 pipelines on the training set, many of which train multiple classifiers on the data. This is true of most evolutionary computation methods, where it's not uncommon to let the algorithm run for several hours, days, or even weeks. Your best bet is to set TPOT running on the data set and give it a couple of days to crunch on the data.
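As a rough, back-of-the-envelope illustration of why a run like this can take days: the run above needed about two hours to report generation 0 (100 evaluations) and was started with `-g 100`. The sketch below assumes, purely for illustration, that every later generation costs about the same as generation 0; in practice the number of evaluations and the cost per pipeline vary between generations. `estimate_runtime_hours` is a hypothetical helper, not part of TPOT.

```python
def estimate_runtime_hours(hours_per_generation, generations):
    """Rough total runtime, assuming every generation costs the same.

    Counts the initial population (generation 0) plus `generations`
    further generations. The constant-cost assumption is illustrative only.
    """
    return hours_per_generation * (generations + 1)


if __name__ == "__main__":
    # Roughly 2 hours for generation 0 in the run above, with -g 100 requested.
    total = estimate_runtime_hours(2.0, 100)
    print(f"about {total:.0f} hours, or {total / 24:.1f} days")
```

Under those assumptions the estimate comes to about 202 hours, a little over eight days, which is consistent with the advice to let the run go for days. Lowering the number of generations shrinks the estimate proportionally.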

Regarding outputting the pipeline: currently, TPOT on the command line only outputs the best pipeline, expressed in terms of TPOT functions, at the end of the run. If you terminate a command-line run early, you won't see the final pipeline. I've raised #36 as a suggestion to fix that.
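One general way to avoid losing results on early termination is to track the best candidate as the search runs and catch `KeyboardInterrupt`. The toy loop below is only a sketch of that pattern. It is not TPOT's code and not necessarily how #36 would be addressed; `evaluate` and `search` are hypothetical names, and the "pipelines" here are just random numbers.

```python
import random


def evaluate(candidate):
    """Hypothetical fitness function: higher is better."""
    return -abs(candidate - 0.75)


def search(generations, population_size, seed=42, evaluate_fn=evaluate):
    """Toy search loop that keeps the best candidate found so far.

    On Ctrl-C (KeyboardInterrupt) the loop stops early and the best
    candidate seen up to that point is still returned.
    """
    rng = random.Random(seed)
    best, best_score = None, float("-inf")
    try:
        for _ in range(generations):
            for _ in range(population_size):
                candidate = rng.random()
                score = evaluate_fn(candidate)
                if score > best_score:
                    best, best_score = candidate, score
    except KeyboardInterrupt:
        print("Interrupted; reporting the best candidate found so far.")
    return best, best_score


if __name__ == "__main__":
    best, score = search(generations=100, population_size=100)
    print(f"best candidate: {best:.4f} (score {score:.4f})")
```

The point of the design is that the best-so-far state lives outside the interrupted work, so pressing Ctrl-C costs at most the evaluation that was in progress.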

We're still working on outputting the pipelines as sklearn code. You can see the latest on this branch. It turned out to be quite tricky to convert these pipelines to usable Python code, so that feature is somewhat delayed.

@SimplyAhmazing
Author

@rhiever Thanks! I figured that it would indeed be slow to run iterations on the training set.

I think this is a really cool project.
