Added string representation to AutoBase #675

Merged
christopherbunn merged 10 commits into master from 481_str_rep_search on May 4, 2020

Conversation

christopherbunn
Contributor

@christopherbunn christopherbunn commented Apr 17, 2020

Before search is run:
image

After search is run:

image

Resolves #481

@christopherbunn christopherbunn changed the title Added string representation to AutoBase [WIP] Added string representation to AutoBase Apr 17, 2020
@codecov

codecov bot commented Apr 17, 2020

Codecov Report

Merging #675 into master will increase coverage by 0.02%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #675      +/-   ##
==========================================
+ Coverage   99.29%   99.32%   +0.02%     
==========================================
  Files         140      140              
  Lines        4981     5018      +37     
==========================================
+ Hits         4946     4984      +38     
+ Misses         35       34       -1     
| Impacted Files | Coverage Δ |
|---|---|
| evalml/automl/auto_base.py | 97.50% <100.00%> (+0.44%) ⬆️ |
| evalml/tests/automl_tests/test_autobase.py | 100.00% <100.00%> (ø) |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 610e168...bcbf817.

@christopherbunn
Contributor Author

christopherbunn commented Apr 17, 2020

Notes:

  • I decided to include all parameters, even those the user did not change.
  • The search results table drops the parameters column. Because of its length, keeping this column would make each row hard to read.
  • The test does not cover every available parameter.

Contributor

@jeremyliweishih jeremyliweishih left a comment

Looking good! I left some comments, and this needs more test coverage. Removing the uncalled function might do the trick.

@@ -34,6 +34,7 @@ Changelog
* Changed requirements-parser to be a core dependency :pr:`673`
* Replace `supported_problem_types` field on pipelines with `problem_type` attribute on base classes :pr:`678`
* Update `ModelFamily` values: don't list xgboost/catboost as classifiers now that we have regression pipelines for them :pr:`677`
* Added __repr__ for AutoSearch object :pr:`675`
Contributor

@jeremyliweishih jeremyliweishih Apr 24, 2020

__str__ 😄

@@ -100,6 +100,63 @@ def __init__(self, problem_type, tuner, cv, objective, max_pipelines, max_time,
logger.log("Warning: unable to import plotly; skipping pipeline search plotting\n")
self.plot = None

    def __str__(self):
        def _pipeline_names():
Contributor

@jeremyliweishih jeremyliweishih Apr 24, 2020

I'm a bit confused about how this differs from _allowed_model_families(), and about how the two are named. Is this one for pipelines, or is _allowed_model_families?

Contributor

@jeremyliweishih jeremyliweishih Apr 24, 2020

Actually, it seems like this isn't used.

all_objectives += '\n\t'
return all_objectives[:-2]

def _allowed_model_families():
Contributor

@jeremyliweishih jeremyliweishih Apr 24, 2020

Should this just be Possible Pipelines? Same thing in search_desc.

@christopherbunn christopherbunn force-pushed the 481_str_rep_search branch 3 times, most recently from 56565df to e4cc0e0 Compare Apr 29, 2020
all_objectives += objective.name
all_objectives += '\n\t'
return all_objectives[:-2]

Contributor

@angela97lin angela97lin May 1, 2020

Hmmmm... I think this could be simplified? To:

def _obj_names():
    all_objectives = ""
    for objective in self.additional_objectives:
        all_objectives += "\n\t" + objective.name
    return all_objectives

Or I think using .join could also work here (with some check for last element)?
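For reference, here is a minimal sketch of the .join variant. It assumes only that each objective exposes a `name` attribute; the `Objective` namedtuple and the standalone `obj_names` function are hypothetical stand-ins for this example, not the code in the PR. Because the separator is prepended to each name before joining, no special case for the last element should be needed.

```python
from collections import namedtuple

# Hypothetical stand-in for an evalml objective; only `.name` is used here.
Objective = namedtuple("Objective", ["name"])


def obj_names(objectives):
    """Return every objective name on its own tab-indented line."""
    # Each name carries its own leading "\n\t", so nothing trails the last one.
    return "".join("\n\t" + objective.name for objective in objectives)


print("Additional Objectives:" + obj_names([Objective("F1"), Objective("AUC")]))
```

An empty list yields an empty string, so the label is printed with nothing after it.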

all_objectives += '\n\t'
return all_objectives[:-2]

def _possible_pipelines():
Contributor

@angela97lin angela97lin May 1, 2020

Same comment as for _obj_names :D


automl.search(X, y, raise_errors=False)
str_rep = str(automl)
# str_rep = str_rep.lower()
Contributor

@angela97lin angela97lin May 1, 2020

Leftover comment here :o?

Contributor

@angela97lin angela97lin left a comment

Nice! I just left a comment about simplifying some functions, but otherwise LGTM :)

Contributor

@jeremyliweishih jeremyliweishih left a comment

LGTM if you address Angela's comments!

@christopherbunn christopherbunn changed the title [WIP] Added string representation to AutoBase Added string representation to AutoBase May 4, 2020
@christopherbunn christopherbunn merged commit c75417d into master May 4, 2020
2 checks passed
f"Detect Label Leakage: {self.detect_label_leakage}\n"
f"Start Iteration Callback: {_get_funct_name(self.start_iteration_callback)}\n"
f"Add Result Callback: {_get_funct_name(self.add_result_callback)}\n"
f"Additional Objectives: {_print_list(self.additional_objectives)}\n"
Collaborator

@dsherry dsherry May 4, 2020

What will this print out?

Contributor Author

@christopherbunn christopherbunn May 5, 2020

This will print out all of the additional objectives to score on in an indented list format. Here's an example:
image

f"Objective: {self.objective.name}\n"
f"Max Time: {self.max_time}\n"
f"Max Pipelines: {self.max_pipelines}\n"
f"Possible Pipelines: {_print_list(self.possible_pipelines)}\n"
Collaborator

@dsherry dsherry May 4, 2020

What will this print out?

Contributor Author

@christopherbunn christopherbunn May 5, 2020

In general, the _print_list function goes through any iterable of objects that have a name attribute and returns their names as a list.

This call will print out all of the possible pipelines in an indented list format. Here's an example:
image

try:
rankings_str = self.rankings.drop(['parameters'], axis='columns').to_string()
rankings_desc = f"\nSearch Results: \n{'='*20}\n{rankings_str}"
except KeyError:
Collaborator

@dsherry dsherry May 4, 2020

@christopherbunn when would this happen? And is there any way to restructure this code to avoid having a try/except? It's nice to avoid these whenever possible because they make the flow harder to follow.

Could we do something like this:

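rankings = self.rankings
if 'parameters' not in rankings.columns:
    return search_desc
# Drop the long parameters column before rendering the table.
rankings_str = rankings.drop(['parameters'], axis='columns').to_string()
return search_desc + f"\nSearch Results: \n{'='*20}\n{rankings_str}"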

Contributor Author

@christopherbunn christopherbunn May 5, 2020

@dsherry The goal of this snippet was to include the rankings only after the search process has been run. I used the try/except because rankings is a @property, and this KeyError is what I get when I call automl.rankings before any search has been run. Is there a better way to check whether the search process has run yet?

image

Collaborator

@dsherry dsherry May 5, 2020

@christopherbunn nice find. Could you please file this as a bug? :)

I suppose we could add a @property which is something like AutoBase.has_run. That way we could have whatever implementation we want for that method. Not critical though.
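A rough sketch of what such a has_run property could look like. Everything here is hypothetical: `SearchSketch`, its `results` list, and `add_result` are simplified stand-ins invented for this example, and the real AutoBase stores its results differently. The point is only that `__str__` can branch on a property instead of catching a KeyError.

```python
class SearchSketch:
    """Hypothetical, simplified stand-in for AutoBase (not the real class)."""

    def __init__(self):
        # Assumption: one dict per evaluated pipeline; empty until search runs.
        self.results = []

    @property
    def has_run(self):
        """True once at least one pipeline result has been recorded."""
        return bool(self.results)

    def add_result(self, pipeline_name, score, parameters):
        self.results.append(
            {"pipeline_name": pipeline_name, "score": score, "parameters": parameters}
        )

    def __str__(self):
        search_desc = "Search Parameters: \n" + "=" * 20
        if not self.has_run:
            return search_desc
        # The parameters entry is left out of the table, as in the PR.
        rows = "\n".join(f"{r['pipeline_name']}\t{r['score']}" for r in self.results)
        return search_desc + f"\nSearch Results: \n{'=' * 20}\n{rows}"


automl = SearchSketch()
print(automl)  # only the search parameters header
automl.add_result("Example Pipeline", 0.9, {"n_estimators": 10})
print(automl)  # header plus the results table
```

Keeping the check behind a property means the underlying implementation can change later without touching `__str__`.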



@patch('evalml.pipelines.BinaryClassificationPipeline.fit')
def test_automl_str_search(mock_fit, X_y):
Collaborator

@dsherry dsherry May 4, 2020

@christopherbunn are there any edge cases which this test doesn't currently cover? For instance, is there any way a None value could be provided as the input to one of the fields you apply your _print_list and _get_funct_name helpers to?
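One way to probe that edge case is a small sketch like the one below. The helpers are hypothetical re-implementations written for this example, assuming the behavior described in this thread (names from a `name` attribute, callback names from `__name__`); they are not the helpers merged in this PR, and `on_start` is a made-up callback.

```python
from types import SimpleNamespace


def print_list(items):
    """Return item names as a tab-indented list; None or empty gives ''."""
    if not items:
        return ""
    return "".join(f"\n\t{item.name}" for item in items)


def get_funct_name(function):
    """Return a callable's name, or None when no callable was provided."""
    if callable(function):
        # Fall back to repr() for callables without __name__ (e.g. partials).
        return getattr(function, "__name__", repr(function))
    return None


def on_start():  # hypothetical callback used only for this example
    pass


objectives = [SimpleNamespace(name="F1"), SimpleNamespace(name="Precision")]
print(f"Start Iteration Callback: {get_funct_name(on_start)}")
print(f"Add Result Callback: {get_funct_name(None)}")
print(f"Additional Objectives: {print_list(objectives)}")
print(f"Possible Pipelines: {print_list(None)}")
```

A test for the real helpers could assert the same cases: a None callback, a None list, and an empty list.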

Successfully merging this pull request may close these issues.

Add __str__ for Auto Search objects
4 participants