First pass at pipeline export functionality #37

Merged: 9 commits, Dec 4, 2015
1 change: 0 additions & 1 deletion .gitignore
@@ -64,5 +64,4 @@ docs/_build/
target/

# IPython Notebooks
Testing TPOT usage.ipynb
.ipynb_checkpoints/*
27 changes: 21 additions & 6 deletions README.md
@@ -102,7 +102,7 @@ from tpot import TPOT
pipeline_optimizer = TPOT(generations=100, random_state=42, verbosity=2)
```

Now TPOT is ready to work! You can pass TPOT some data with a scikit-learn-like interface:
Now TPOT is ready to work! You can tell TPOT to optimize a pipeline based on a data set with the `fit` function:

```Python
from tpot import TPOT
@@ -111,18 +111,31 @@ pipeline_optimizer = TPOT(generations=100, random_state=42, verbosity=2)
pipeline_optimizer.fit(training_features, training_classes)
```

then evaluate the final pipeline as such:
then evaluate the final pipeline with the `score()` function:

```Python
from tpot import TPOT

pipeline_optimizer = TPOT(generations=100, random_state=42, verbosity=2)
pipeline_optimizer.fit(training_features, training_classes)
pipeline_optimizer.score(training_features, training_classes, testing_features, testing_classes)
print(pipeline_optimizer.score(training_features, training_classes, testing_features, testing_classes))
```

Note that you need to pass the training data to the `score()` function so the pipeline re-trains the scikit-learn models on the training data.
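
The retrain-then-score behaviour described above can be sketched without TPOT itself. This is a minimal illustration, not TPOT's implementation: `MajorityClassModel` and the standalone `score` helper are hypothetical stand-ins for an optimized pipeline and for the `score()` method.

```python
from collections import Counter


class MajorityClassModel:
    """Hypothetical stand-in for an optimized pipeline (not part of TPOT)."""

    def fit(self, features, classes):
        # Remember the most common class seen in the training data.
        self.majority_ = Counter(classes).most_common(1)[0][0]
        return self

    def predict(self, features):
        # Predict the remembered class for every row.
        return [self.majority_ for _ in features]


def score(model, training_features, training_classes,
          testing_features, testing_classes):
    """Refit on the training data, then return accuracy on the testing data."""
    model.fit(training_features, training_classes)
    predictions = model.predict(testing_features)
    correct = sum(p == t for p, t in zip(predictions, testing_classes))
    return correct / len(testing_classes)


accuracy = score(MajorityClassModel(),
                 [[0], [1], [2]], ["a", "a", "b"],
                 [[3], [4]], ["a", "b"])
print(accuracy)
```

Because the model is refit inside `score`, the helper cannot work from the testing data alone, which is why all four arguments are required.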

Finally, you can tell TPOT to export the optimized pipeline to a text file with the `export()` function:

```Python
from tpot import TPOT

pipeline_optimizer = TPOT(generations=100, random_state=42, verbosity=2)
pipeline_optimizer.fit(training_features, training_classes)
print(pipeline_optimizer.score(training_features, training_classes, testing_features, testing_classes))
pipeline_optimizer.export('tpot_exported_pipeline.py')
```

Once this code finishes running, `tpot_exported_pipeline.py` will contain the Python code for the optimized pipeline.
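
Since the exported file is plain Python source, one quick sanity check is to confirm that it parses. The sketch below uses only the standard library. The file contents are a stand-in written for this example, not real TPOT output, and `is_valid_python` is a hypothetical helper.

```python
import ast
import os
import tempfile

# Stand-in for an exported file; the code TPOT actually generates will differ.
stand_in_code = "result = sum([1, 2, 3])\n"

export_path = os.path.join(tempfile.mkdtemp(), "tpot_exported_pipeline.py")
with open(export_path, "w") as handle:
    handle.write(stand_in_code)


def is_valid_python(path):
    """Return True if the file at `path` parses as Python source."""
    with open(path) as handle:
        source = handle.read()
    try:
        ast.parse(source, filename=path)
    except SyntaxError:
        return False
    return True


print(is_valid_python(export_path))
```

Parsing checks syntax only; it does not run the pipeline or verify that its imports are available.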

### Using TPOT via the command line

To use TPOT via the command line, enter the following command to see the parameters that TPOT can receive:
@@ -135,6 +148,7 @@ The following parameters will display along with their descriptions:

* `-i` / `INPUT_FILE`: The path to the data file to optimize the pipeline on. Make sure that the class column in the file is labeled as "class".
* `-is` / `INPUT_SEPARATOR`: The character used to separate columns in the input file. Commas (,) and tabs (\t) are the most common separators.
* `-o` / `OUTPUT_FILE`: The path to a file that you wish to export the pipeline code into. By default, exporting is disabled.
* `-g` / `GENERATIONS`: The number of generations to run pipeline optimization for. Must be > 0. The more generations you give TPOT to run, the longer it takes, but it's also more likely to find better pipelines.
* `-p` / `POPULATION`: The number of pipelines in the genetic algorithm population. Must be > 0. The more pipelines in the population, the slower TPOT will run, but it's also more likely to find better pipelines.
* `-mr` / `MUTATION_RATE`: The mutation rate for the genetic programming algorithm in the range [0.0, 1.0]. This tells the genetic programming algorithm how many pipelines to apply random changes to every generation. We don't recommend that you tweak this parameter unless you know what you're doing.
@@ -145,7 +159,7 @@ The following parameters will display along with their descriptions:
An example command-line call to TPOT may look like:

```Shell
tpot -i data/mnist.csv -is , -g 100 -s 42 -v 2
tpot -i data/mnist.csv -is , -o tpot_exported_pipeline.py -g 100 -s 42 -v 2
```
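
If you launch TPOT from a script, the same call can be assembled as an argument list. This is a sketch under assumptions: `build_tpot_command` and its parameter names are hypothetical, and `-s` and `-v` are taken here to be the random seed and verbosity flags, since their descriptions are not shown in this excerpt.

```python
import shlex


def build_tpot_command(input_file, separator=",", output_file=None,
                       generations=100, random_seed=42, verbosity=2):
    """Assemble the argument list for the example call shown above."""
    command = ["tpot", "-i", input_file, "-is", separator]
    if output_file is not None:
        # Exporting is disabled by default, so -o is added only on request.
        command += ["-o", output_file]
    # Assumption: -s is the random seed and -v is the verbosity level.
    command += ["-g", str(generations), "-s", str(random_seed),
                "-v", str(verbosity)]
    return command


command = build_tpot_command("data/mnist.csv",
                             output_file="tpot_exported_pipeline.py")
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` on a machine where the `tpot` command is installed.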

## Examples
@@ -163,10 +177,11 @@ X_train, X_test, y_train, y_test = train_test_split(digits.data, digits.target,

tpot = TPOT(generations=5)
tpot.fit(X_train, y_train)
tpot.score(X_train, y_train, X_test, y_test)
print(tpot.score(X_train, y_train, X_test, y_test))
tpot.export('tpot_mnist_pipeline.py')
```

Running this code should discover a pipeline that achieves ~98% testing accuracy.
Running this code should discover a pipeline that achieves ~97% testing accuracy, and the corresponding Python code should be exported to the `tpot_mnist_pipeline.py` file.

## Want to get involved with TPOT?

5 changes: 3 additions & 2 deletions docs/sources/examples/MNIST_Example.md
@@ -13,7 +13,8 @@ X_train, X_test, y_train, y_test = train_test_split(digits.data, digits.target,

tpot = TPOT(generations=5)
tpot.fit(X_train, y_train)
tpot.score(X_train, y_train, X_test, y_test)
print(tpot.score(X_train, y_train, X_test, y_test))
tpot.export('tpot_mnist_pipeline.py')
```

Running this code should discover a pipeline that achieves ~98% testing accuracy.
Running this code should discover a pipeline that achieves ~97% testing accuracy, and the corresponding Python code should be exported to the `tpot_mnist_pipeline.py` file.
19 changes: 16 additions & 3 deletions docs/sources/examples/Using_TPOT_via_code.md
@@ -33,7 +33,7 @@ from tpot import TPOT
pipeline_optimizer = TPOT(generations=100, random_state=42, verbosity=2)
```

Now TPOT is ready to work! You can pass TPOT some data with a scikit-learn-like interface:
Now TPOT is ready to work! You can tell TPOT to optimize a pipeline based on a data set with the `fit` function:

```Python
from tpot import TPOT
@@ -42,14 +42,27 @@ pipeline_optimizer = TPOT(generations=100, random_state=42, verbosity=2)
pipeline_optimizer.fit(training_features, training_classes)
```

then evaluate the final pipeline as such:
then evaluate the final pipeline with the `score()` function:

```Python
from tpot import TPOT

pipeline_optimizer = TPOT(generations=100, random_state=42, verbosity=2)
pipeline_optimizer.fit(training_features, training_classes)
pipeline_optimizer.score(training_features, training_classes, testing_features, testing_classes)
print(pipeline_optimizer.score(training_features, training_classes, testing_features, testing_classes))
```

Note that you need to pass the training data to the `score()` function so the pipeline re-trains the scikit-learn models on the training data.

Finally, you can tell TPOT to export the optimized pipeline to a text file with the `export()` function:

```Python
from tpot import TPOT

pipeline_optimizer = TPOT(generations=100, random_state=42, verbosity=2)
pipeline_optimizer.fit(training_features, training_classes)
print(pipeline_optimizer.score(training_features, training_classes, testing_features, testing_classes))
pipeline_optimizer.export('tpot_exported_pipeline.py')
```

Once this code finishes running, `tpot_exported_pipeline.py` will contain the Python code for the optimized pipeline.
3 changes: 2 additions & 1 deletion docs/sources/examples/Using_TPOT_via_the_command_line.md
@@ -10,6 +10,7 @@ The following parameters will display along with their descriptions:

* `-i` / `INPUT_FILE`: The path to the data file to optimize the pipeline on. Make sure that the class column in the file is labeled as "class".
* `-is` / `INPUT_SEPARATOR`: The character used to separate columns in the input file. Commas (,) and tabs (\t) are the most common separators.
* `-o` / `OUTPUT_FILE`: The path to a file that you wish to export the pipeline code into. By default, exporting is disabled.
* `-g` / `GENERATIONS`: The number of generations to run pipeline optimization for. Must be > 0. The more generations you give TPOT to run, the longer it takes, but it's also more likely to find better pipelines.
* `-p` / `POPULATION`: The number of pipelines in the genetic algorithm population. Must be > 0. The more pipelines in the population, the slower TPOT will run, but it's also more likely to find better pipelines.
* `-mr` / `MUTATION_RATE`: The mutation rate for the genetic programming algorithm in the range [0.0, 1.0]. This tells the genetic programming algorithm how many pipelines to apply random changes to every generation. We don't recommend that you tweak this parameter unless you know what you're doing.
@@ -20,5 +21,5 @@ The following parameters will display along with their descriptions:
An example command-line call to TPOT may look like:

```Shell
tpot -i data/mnist.csv -is , -g 100 -s 42 -v 2
tpot -i data/mnist.csv -is , -o tpot_exported_pipeline.py -g 100 -s 42 -v 2
```
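
The `-i` and `-is` descriptions above imply two things about the input file: a known column separator and a column labeled "class". A small pre-flight check along these lines may save a failed run. It is a sketch only: `has_class_column` is a hypothetical helper and the sample data is made up for illustration.

```python
import csv
import io


def has_class_column(text, separator=","):
    """Return True if the header row contains a column named 'class'."""
    reader = csv.reader(io.StringIO(text), delimiter=separator)
    header = next(reader, [])
    return "class" in header


# Made-up, tab-separated sample with the class column in last position.
sample = "pixel_0\tpixel_1\tclass\n0\t16\t3\n"
print(has_class_column(sample, separator="\t"))
```

Passing the wrong separator makes the header parse as a single column, so the check fails in the same situation where an incorrect `-is` value would cause trouble.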
12 changes: 9 additions & 3 deletions docs/sources/index.md
@@ -4,13 +4,19 @@ Consider TPOT your **Data Science Assistant**. TPOT is a Python tool that automa

TPOT will automate the most tedious part of Machine Learning by intelligently exploring thousands of possible pipelines to find the best one for your data.

![An example Machine Learning pipeline](https://github.com/rhiever/tpot/blob/master/images/tpot-ml-pipeline.png "An example Machine Learning pipeline")
<center>
<img src="https://raw.githubusercontent.com/rhiever/tpot/master/images/tpot-ml-pipeline.png" width=800 alt="An example Machine Learning pipeline" />

<p align="center"><strong>An example Machine Learning pipeline</strong></p>
<strong>An example Machine Learning pipeline</strong>
</center>

Once TPOT is finished searching (or you get tired of waiting), it provides you with the Python code for the best pipeline it found so you can tinker with the pipeline from there.

![An example TPOT pipeline](https://github.com/rhiever/tpot/blob/master/images/tpot-pipeline-example.png "An example TPOT pipeline")
<center>
<img src="https://raw.githubusercontent.com/rhiever/tpot/master/images/tpot-pipeline-example.png" width=800 alt="An example TPOT pipeline" />

<strong>An example TPOT pipeline</strong>
</center>

TPOT is built on top of scikit-learn, so all of the code it generates should look familiar... if you're familiar with scikit-learn, anyway.
