New TPOT features #498

kuratsak · 2017-06-15T12:18:30Z

What does this PR do?

Implements 3 cool new features:

A new optional argument that saves the best pipeline found so far after each generation. Run TPOT and use its pipelines without stopping its optimization process.
TPOT now saves, in every exported pipeline file, the score that the pipeline achieved during optimization.
Custom (user-defined) scoring functions are now supported in the command-line driver too.

Where should the reviewer start?

Relatively small PR, should be straightforward; just look at the full diff 💃

How should this PR be tested?

Create a local folder, e.g. local_pipeline_folder
Run TPOT and add -psp="./local_pipeline_folder/"
After 2-3 generations, TPOT will have written at least one pipeline into the folder

Any background context you want to provide?

Extremely useful features we used in our research projects!

What are the relevant issues?

Screenshots (if appropriate)

Questions:

Do the docs need to be updated? Nope
Does this PR add new (Python) dependencies? Nope


coveralls · 2017-06-15T12:22:58Z

Coverage decreased (-2.5%) to 85.102% when pulling ab809ad on kuratsak:tpot_features into f89d2de on rhiever:development.

kuratsak · 2017-06-15T12:26:43Z

Hi Approvers!

This is my second attempt, after splitting the original pull request into a separate bug-fix PR that has already been approved.

I am finishing one job and starting another, so I probably won't be able to fix code until at least July.
I have thoroughly tested and used these features in my research at my current job; I really hope we can approve this.

I am available until Sunday for any fixes to be made in this pull request; otherwise it will have to wait, and I really hope we don't have to.

coveralls · 2017-06-15T12:43:42Z

Coverage increased (+9.8%) to 97.436% when pulling 2603815 on kuratsak:tpot_features into f89d2de on rhiever:development.

weixuanfu

Nice functions!

weixuanfu · 2017-06-15T13:24:46Z

tpot/driver.py

+            print('taken from module: {}'.format(module_name))
+        except Exception as e:
+            print('failed importing custom scoring function, error: {}'.format(str(e)))
+            raise


Need to raise ValueError here.

I guess the exception string itself is enough at this point.
Done. I didn't want to lose the inner traceback, though; is there a trick for this?
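On the traceback question: in Python 3, an exception raised inside an `except` block already records the original error as `__context__`, so a plain `raise ValueError(e)` does not actually lose the inner traceback there. Writing `raise ... from e` makes the link explicit as `__cause__`. The sketch below uses a hypothetical `load_scoring_function` helper that resolves a `"module.function"` string; it is not the code from this PR, and `raise ... from` is Python 3 only (TPOT was still tested on Python 2 at the time).

```python
import importlib


def load_scoring_function(scoring_name):
    """Resolve a 'module.function' string to a callable (hypothetical helper).

    Names without a dot are returned unchanged, on the assumption that they
    refer to a built-in scorer name such as 'accuracy'.
    """
    if '.' not in scoring_name:
        return scoring_name

    module_name, func_name = scoring_name.rsplit('.', 1)
    try:
        module = importlib.import_module(module_name)
        scoring_func = getattr(module, func_name)
    except Exception as e:
        # "from e" stores the original error as __cause__, so its traceback
        # is printed above the ValueError instead of being hidden.
        raise ValueError(
            'failed importing custom scoring function, error: {}'.format(e)
        ) from e
    return scoring_func
```

With scikit-learn installed, `load_scoring_function('sklearn.metrics.auc')` would return that metric function unchanged.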

coveralls · 2017-06-15T15:09:57Z

Coverage decreased (-1.4%) to 86.204% when pulling 4d885dc on kuratsak:tpot_features into f89d2de on rhiever:development.

rhiever · 2017-06-15T17:39:35Z

tpot/base.py

    def __init__(self, generations=100, population_size=100, offspring_size=None,
                 mutation_rate=0.9, crossover_rate=0.1,
                 scoring=None, cv=5, subsample=1.0, n_jobs=1,
                 max_time_mins=None, max_eval_time_mins=5,
                 random_state=None, config_dict=None, warm_start=False,
-                 verbosity=0, disable_update_check=False):
+                 verbosity=0, disable_update_check=False, periodic_save_path=None):


Please move periodic_save_path above verbosity and disable_update_check (both here and the docstring).

rhiever · 2017-06-15T17:42:02Z

tpot/export_utils.py

+    if pscore is not None:
+        pipeline_text += '\n# Score on the training set was:{}\n'.format(pscore)
+    else:
+        pipeline_text += '\n'


Please refactor to:

    if pscore is not None:
        pipeline_text += '\n# Score on the training set was: {}'.format(pscore)
    pipeline_text += '\n'

Please also rename pscore to pipeline_score for a more descriptive variable name.

rhiever · 2017-06-15T17:49:40Z

tpot/base.py

+    def _save_periodic_pipeline(self):
+        if self.periodic_save_path is not None:
+            try:
+                write = self._pbar.write if not self._pbar.disable else print


Please give a more descriptive variable name for write.

Done 👍
Renamed it to print_func, since it does what print does.

rhiever · 2017-06-15T17:50:34Z

tpot/base.py

+                if self.verbosity >= 2:
+                    write('failed saving periodic pipeline, exception:\n{}'.format(str(e)[:250]))
+
+    def export(self, output_file_name, skip_if_repeated=False):


Please document skip_if_repeated in the docstrings.

rhiever · 2017-06-15T17:51:54Z

tpot/base.py

+
+        #dont export a pipeline you just had
+        if skip_if_repeated and (self._exported_pipeline_text == to_write):
+            return False


Please document the meaning of the True/False return values for the export function in the docstrings.
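The requested docstrings might read roughly as follows. This is a minimal sketch assuming a hypothetical `PipelineExporter` class that only writes already-generated text to disk; in TPOT, `export` first builds that text (`to_write` in the diff above) from the optimized pipeline.

```python
class PipelineExporter(object):
    """Hypothetical stand-in for the export part of TPOTBase."""

    def __init__(self):
        # Text of the most recently exported pipeline (None before any export).
        self._exported_pipeline_text = None

    def export(self, pipeline_text, output_file_name, skip_if_repeated=False):
        """Write pipeline_text to output_file_name.

        Parameters
        ----------
        pipeline_text: str
            Source code of the pipeline to export.
        output_file_name: str
            Path of the file to write.
        skip_if_repeated: bool, optional (default: False)
            If True, write nothing when pipeline_text is identical to the
            text of the previous export.

        Returns
        -------
        bool
            True if the file was written, False if the export was skipped
            because the pipeline was a repeat of the previous one.
        """
        # Don't export a pipeline we just exported.
        if skip_if_repeated and pipeline_text == self._exported_pipeline_text:
            return False

        with open(output_file_name, 'w') as output_file:
            output_file.write(pipeline_text)
        self._exported_pipeline_text = pipeline_text
        return True
```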

rhiever · 2017-06-15T17:53:51Z

tpot/gp_deap.py

@@ -164,6 +165,9 @@ def eaMuPlusLambda(population, toolbox, mu, lambda_, cxpb, mutpb, ngen, pbar,

    # Begin the generational process
    for gen in range(1, ngen + 1):
+        # after each population save a periodic pipeline
+        if periodic_pipeline_saver is not None:
+            periodic_pipeline_saver()


Noting that this addition is related to #79.

rhiever · 2017-06-15T17:54:31Z

tpot/gp_deap.py

@@ -103,7 +103,8 @@ def varOr(population, toolbox, lambda_, cxpb, mutpb):


 def eaMuPlusLambda(population, toolbox, mu, lambda_, cxpb, mutpb, ngen, pbar,
-                   stats=None, halloffame=None, verbose=0, max_time_mins=None):
+                   stats=None, halloffame=None, verbose=0, max_time_mins=None,
+                   periodic_pipeline_saver=None):


Please refactor periodic_pipeline_saver to periodic_pipeline_saving_fn or a variable name that is more descriptive. Please also document it in the docstrings.

Great idea. In fact, in the eaMuPlusLambda context this is a general "per-generation function" parameter, so I'm renaming it to per_generation_function to make its role explicit.

It can later be used for more extended logic too.
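A simplified sketch of where such a per-generation hook sits in the generational loop. `run_generations` and `evolve_one_generation` are hypothetical names; the real `eaMuPlusLambda` in tpot/gp_deap.py also performs variation, evaluation, selection, statistics, and progress-bar updates inside the loop.

```python
def run_generations(population, ngen, evolve_one_generation,
                    per_generation_function=None):
    """Run ngen generations, calling an optional hook at the start of each.

    per_generation_function: callable or None
        Called with no arguments once per generation, after the previous
        population has been evaluated and before new offspring are produced
        (e.g. to checkpoint the best pipeline found so far).
    """
    for gen in range(1, ngen + 1):
        # After each population, give the caller a chance to act on it.
        if per_generation_function is not None:
            per_generation_function()
        population = evolve_one_generation(population, gen)
    return population
```

Because the hook takes no arguments, it can be any bound method, such as a checkpoint-saving method on the estimator, without changing the loop's signature again.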

rhiever · 2017-06-15T17:57:34Z

tpot/base.py

@@ -702,7 +719,33 @@ def set_params(self, **params):

        return self

-    def export(self, output_file_name):
+    def save_pipeline_if_period(self):


save_pipeline_if_period should also be a "private" function (i.e., prefixed with an underscore).

rhiever · 2017-06-15T17:58:15Z

tpot/base.py

@@ -598,6 +614,7 @@ def _update_top_pipeline(self):
                if pipeline_scores.wvalues[1] > top_score:
                    self._optimized_pipeline = pipeline
                    top_score = pipeline_scores.wvalues[1]


The top_score variable isn't needed anymore if the same value is being stored in self._optimized_pipeline_score.

Removed the top_score variable.
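A sketch of the simplification: the running best score lives on the object itself, so no separate local is needed. `TopPipelineTracker` and the `(pipeline, wvalues)` pairs are hypothetical; it is assumed here that, as in the diff, `wvalues[1]` is the weighted fitness value holding the pipeline's score (higher is better).

```python
class TopPipelineTracker(object):
    """Hypothetical stand-in for the bookkeeping in _update_top_pipeline."""

    def __init__(self):
        self._optimized_pipeline = None
        self._optimized_pipeline_score = None

    def update_top_pipeline(self, candidates):
        """candidates: iterable of (pipeline, wvalues) pairs, where
        wvalues[1] is the weighted score to maximize."""
        self._optimized_pipeline = None
        self._optimized_pipeline_score = -float('inf')
        for pipeline, wvalues in candidates:
            # Compare directly against the stored score; no top_score local.
            if wvalues[1] > self._optimized_pipeline_score:
                self._optimized_pipeline = pipeline
                self._optimized_pipeline_score = wvalues[1]
        return self._optimized_pipeline
```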

rhiever · 2017-06-15T17:59:09Z

tpot/base.py

@@ -185,6 +189,13 @@ def __init__(self, generations=100, population_size=100, offspring_size=None,
            A setting of 2 or higher will add a progress bar during the optimization procedure.
        disable_update_check: bool, optional (default: False)
            Flag indicating whether the TPOT version checker should be disabled.
+        periodic_save_path: path string (default: None)


Please refactor the periodic_save_path name to periodic_checkpoint_path. I believe that's a more descriptive term for what it's doing.

Changed it to periodic_checkpoint_folder to make its role explicit. Thanks!

rhiever · 2017-06-15T18:00:05Z

tpot/base.py

@@ -79,12 +80,15 @@ def handler(dwCtrlType, hook_sigint=_thread.interrupt_main):
 class TPOTBase(BaseEstimator):
    """Automatically creates and optimizes machine learning pipelines using GP."""

+    # dont save periodic pipelines more often than this
+    OUTPUT_BEST_PIPELINE_PERIOD_SECONDS = 30


Should this be a TPOT parameter? Thoughts @weixuanfu2016 and @teaearlgraycold? I suppose I'm fine with a hard-coded 30 seconds, but I can see a user down the line wanting to tweak this value.

Its initial purpose is not to be user-configurable, but to prevent too many calls to export. In any case, saving is already limited to at most once per generation.

If one day we allow the user to save the pipeline after every single evaluation, for example, then we might want this (or we might NOT).

Either way, IMHO it seems a bit early to make this a parameter.
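For reference, a sketch of how a hard-coded period such as `OUTPUT_BEST_PIPELINE_PERIOD_SECONDS = 30` can throttle per-generation checkpointing. `PeriodicCheckpointer`, `save_fn`, and the injectable `clock` are assumptions for illustration (the clock only makes the throttle testable); `print_func` mirrors the renamed variable from the review, which in the PR is the progress bar's `write` when the bar is enabled and `print` otherwise.

```python
import time


class PeriodicCheckpointer(object):
    """Hypothetical sketch of rate-limited checkpoint saving."""

    # Don't save periodic pipelines more often than this.
    OUTPUT_BEST_PIPELINE_PERIOD_SECONDS = 30

    def __init__(self, save_fn, checkpoint_folder=None, verbosity=0,
                 print_func=print, clock=time.monotonic):
        self.save_fn = save_fn  # called with the checkpoint folder
        self.checkpoint_folder = checkpoint_folder
        self.verbosity = verbosity
        self.print_func = print_func
        self.clock = clock
        self._last_save_time = None

    def save_periodic_pipeline(self):
        """Save a checkpoint unless disabled or one was saved too recently.

        Returns True if save_fn ran successfully, False otherwise.
        """
        if self.checkpoint_folder is None:
            return False
        now = self.clock()
        if (self._last_save_time is not None and
                now - self._last_save_time < self.OUTPUT_BEST_PIPELINE_PERIOD_SECONDS):
            return False
        try:
            self.save_fn(self.checkpoint_folder)
        except Exception as e:
            # A failed checkpoint should not abort the optimization run.
            if self.verbosity >= 2:
                self.print_func('failed saving periodic pipeline, exception:\n{}'.format(str(e)[:250]))
            return False
        self._last_save_time = now
        return True
```

Since the check only runs when the per-generation hook fires, the period is a lower bound between saves, not a schedule: with slow generations, every generation may save.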

rhiever · 2017-06-15T18:01:11Z

tests.py

+"""
+
+    assert_equal(expected_code, export_pipeline(pipeline, tpot_obj.operators, tpot_obj._pset, pscore=0.929813743))
+


I think these tests look good, but they need to be rebased on the most recent version of the dev branch. @teaearlgraycold reorganized tests.py into a tests directory. Less clutter now.

rhiever · 2017-06-15T18:03:02Z

Overall, I'm 👍 on these changes, pending the rebase and requested changes.

Beyond the several requests for adding to the docstrings, please also update the API documentation and other documentation where relevant.

I'm also concerned that the coverage decreased by 1.4%.

Thank you for the PR, @kuratsak.

rhiever · 2017-06-15T18:06:18Z

tpot/driver.py

+            print('failed importing custom scoring function, error: {}'.format(str(e)))
+            raise ValueError(e)
+
+    return scoring_func


Will this function work if I pass a sklearn metric, e.g., sklearn.metrics.auc?

Added your function to the test; it works out of the box 👍

kuratsak · 2017-06-15T18:34:12Z

Yeah, I added some if-checks on verbosity that another comment requested. Rebase? I based this on the development branch. Is that not the most up-to-date one?


rhiever · 2017-06-15T20:11:44Z

We merged some PRs today that re-organized the unit tests. I think it's just a matter of moving the unit tests into the appropriate files in the tests directory.

kuratsak · 2017-06-15T20:49:45Z

Is there a branch? I'll merge on Sunday; it's my final work day, so it's my last chance to finish this PR. I will try to address all the comments then.


rhiever · 2017-06-15T20:56:39Z

It's merged into the dev branch now.


kuratsak · 2017-06-18T07:02:08Z

Done: I addressed all the comments as you asked and re-merged development into my branch. Sadly, this is my last day at work, so the next time I can fix anything will be in about a month (mid-July). I'm hoping we can merge this now.


kuratsak · 2017-06-18T07:44:48Z

Can someone please check and rerun the Travis build? It seems the build failed for external reasons.

Both py36 builds failed with errors related to:
Error: HTTPError: 522 against the url https://repo.continuum.io/pkgs/....

Of course, I checked locally: both the py3 and py2 tests pass.

Update:
Pushed documentation changes, and the tests passed.
As I suspected, the problem was external.

Everything looks good now 🙌

coveralls · 2017-06-18T09:44:22Z

Coverage increased (+6.5%) to 94.81% when pulling 81045ab on kuratsak:tpot_features into 2b0c29c on rhiever:development.

kuratsak · 2017-06-18T10:23:18Z

Yay! Even on the coverage front I look good!


kuratsak · 2017-06-18T12:58:47Z

I am being taken away from my work computer; they're saying I'm fired and I can't touch it anymore 🥇

rhiever · 2017-06-19T18:34:09Z

Thanks @kuratsak. Will look through this soon.

rhiever · 2017-06-21T17:53:17Z

LGTM. Thanks @kuratsak!

DanKoretsky added 3 commits June 15, 2017 14:45

new optional argument: path for saving optimized pipeline after each generation

1ab3f39

saving score to exported pipeline file

3f4777f

added support for manual scoring function in main driver "mymodule.myfunction"

ab809ad

adding tests for driver manual scoring function

2603815

weixuanfu reviewed Jun 15, 2017


CR corrections

4d885dc

rhiever added the enhancement label Jun 15, 2017

rhiever reviewed Jun 15, 2017


EpistasisLab deleted a comment from weixuanfu Jun 15, 2017

EpistasisLab deleted a comment from kuratsak Jun 15, 2017

rhiever reviewed Jun 15, 2017


DanKoretsky added 2 commits June 18, 2017 09:22

Merge branch 'development' into tpot_features

592185a

Conflicts: tests/tpot_tests.py tpot/base.py tpot/driver.py

CR fixes

d9163c2

documentation updates

81045ab

rhiever merged commit 7f2b6a7 into EpistasisLab:development Jun 21, 2017

		"""

		assert_equal(expected_code, export_pipeline(pipeline, tpot_obj.operators, tpot_obj._pset, pscore=0.929813743))

New TPOT features #498

New TPOT features #498

Conversation

kuratsak commented Jun 15, 2017 • edited Loading

What does this PR do?

Where should the reviewer start?

How should this PR be tested?

Any background context you want to provide?

What are the relevant issues?

Screenshots (if appropriate)

Questions:

coveralls commented Jun 15, 2017

kuratsak commented Jun 15, 2017

coveralls commented Jun 15, 2017

weixuanfu left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

coveralls commented Jun 15, 2017

rhiever Jun 15, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kuratsak Jun 18, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

rhiever Jun 15, 2017 • edited Loading

Choose a reason for hiding this comment

kuratsak Jun 18, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kuratsak Jun 18, 2017 • edited Loading

Choose a reason for hiding this comment

rhiever Jun 15, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

rhiever commented Jun 15, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kuratsak commented Jun 15, 2017 via email

rhiever commented Jun 15, 2017

kuratsak commented Jun 15, 2017 via email

rhiever commented Jun 15, 2017

kuratsak commented Jun 18, 2017 via email

kuratsak commented Jun 18, 2017 • edited Loading

coveralls commented Jun 18, 2017

kuratsak commented Jun 18, 2017 via email

kuratsak commented Jun 18, 2017

rhiever commented Jun 19, 2017

rhiever commented Jun 21, 2017

kuratsak commented Jun 15, 2017 •

edited

Loading

rhiever Jun 15, 2017 •

edited

Loading

kuratsak Jun 18, 2017 •

edited

Loading

rhiever Jun 15, 2017 •

edited

Loading

kuratsak Jun 18, 2017 •

edited

Loading

kuratsak Jun 18, 2017 •

edited

Loading

rhiever Jun 15, 2017 •

edited

Loading

rhiever commented Jun 15, 2017 •

edited

Loading

kuratsak commented Jun 18, 2017 •

edited

Loading