Add `__str__` and `__repr__` for components and pipelines #1218

eccabay · 2020-09-23T20:52:38Z

Closes #474

codecov · 2020-09-23T20:54:15Z

Codecov Report

Merging #1218 into main will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##             main    #1218   +/-   ##
=======================================
  Coverage   99.92%   99.92%           
=======================================
  Files         200      200           
  Lines       12369    12468   +99     
=======================================
+ Hits        12360    12459   +99     
  Misses          9        9

Impacted Files	Coverage Δ
evalml/pipelines/components/component_base.py	`100.00% <100.00%> (ø)`
evalml/pipelines/pipeline_base.py	`100.00% <100.00%> (ø)`
evalml/tests/component_tests/test_components.py	`100.00% <100.00%> (ø)`
evalml/tests/pipeline_tests/test_pipelines.py	`100.00% <100.00%> (ø)`
evalml/utils/__init__.py	`100.00% <100.00%> (ø)`
evalml/utils/gen_utils.py	`99.09% <100.00%> (+0.06%)`	⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e9cba15...57040d2. Read the comment docs.

jeremyliweishih

Overall looks great but can you add unit tests that test str and repr (in the same fashion you have for the mock components) for all existing components with @pytest.mark.parametrize("component_class", all_components())?

jeremyliweishih

I think this looks great! There's just one extra print to remove before merging. I do have one design question though: would users want more information out of __str__ as well? We specced it to return self.name, but maybe we could include parameters too. We can think about this a little more and file another issue, or just see if we get a feature request for it; it's not blocking for this PR. Good work!

evalml/tests/component_tests/test_components.py

angela97lin

LGTM! Left two nit-picky comments but that's all 😁

angela97lin · 2020-09-24T17:27:25Z

evalml/pipelines/components/component_base.py

+            if type(value) == str:
+                rpr = rpr + f"{parameter}='{value}',"
+            elif value == float('inf') or value == float('-inf'):
+                rpr = rpr + f"{parameter}=float('{value}'),"
+            else:
+                rpr = rpr + f"{parameter}={value},"


Maybe overkill but could be useful to make a helper function that takes in parameters and returns that portion of the repr, so that it can be shared with the pipeline_base implementation? :o

I love that idea, but unfortunately pipelines need the parameters as a dict, while components need them as parameters (basically just having : or = in there). If you have any ideas for how to work around that I'm open to them!
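One possible way around the `:` versus `=` difference is a single helper with a flag that selects the style. This is only a sketch: `format_parameters` and its `as_dict` argument are hypothetical names, not part of evalml, and it uses plain `repr` for values, so it does not yet handle the `inf` case discussed in this PR.

```python
def format_parameters(parameters, as_dict=False):
    """Render a parameter mapping as keyword arguments or as dict items.

    Hypothetical helper for illustration; not part of evalml.
    """
    if as_dict:
        # Pipeline style: 'key': value
        return ', '.join(f"{key!r}: {value!r}" for key, value in parameters.items())
    # Component style: key=value
    return ', '.join(f"{key}={value!r}" for key, value in parameters.items())


params = {'impute_strategy': 'mean', 'fill_value': None}
component_args = format_parameters(params)
pipeline_items = format_parameters(params, as_dict=True)
print(component_args)   # impute_strategy='mean', fill_value=None
print(pipeline_items)   # 'impute_strategy': 'mean', 'fill_value': None
```

The component repr would wrap the first form in `Name(...)`, and the pipeline repr would wrap the second in braces per component.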

angela97lin · 2020-09-24T17:29:58Z

evalml/tests/component_tests/test_components.py

    assert enc.describe(return_dict=True) == {'name': 'One Hot Encoder', 'parameters': {'top_n': 10,
                                                                                        'categories': None,
                                                                                        'drop': None,
                                                                                        'handle_unknown': 'ignore',
                                                                                        'handle_missing': 'error'}}
-    drop_col_transformer = DropColumns(columns=['col_one', 'col_two'])
-    assert imputer.describe(return_dict=True) == {'name': 'Simple Imputer', 'parameters': {'impute_strategy': 'mean', 'fill_value': None}}
+    assert imputer.describe(return_dict=True) == {'name': 'Imputer', 'parameters': {'categorical_impute_strategy': "most_frequent",


lol thanks for this! I wonder if there's a way to automate this since it's clear that we missed quite a few of our new components (maybe using all_components()?)

dsherry

@eccabay great work on this! There's some seriously tricky stuff going on here for a feature which looks so simple from the outside. It's really cool we're doing this :)

I left a suggested change on the impl along with some discussion. I also left a couple items on testing. Should be all set after that IMO.

evalml/pipelines/components/component_base.py

dsherry · 2020-09-26T01:13:18Z

evalml/pipelines/components/component_base.py

+            else:
+                rpr = rpr + f"{parameter}={value},"
+        rpr = rpr + ')'
+        return rpr


@eccabay good call looking into float('inf')!! Another one is float('nan'). You've hit on what appears to be a funny bug and/or design flaw with python's float type: repr(float('inf')) comes back as 'inf', which... is just messed up! 🤔 Same for float('nan'). I think they did this so that float(repr(float('inf'))) is equivalent to float('inf'), and same for nan.

The same is true for np.inf and np.nan, by the way:

In [6]: parameters = {'int': 42, 'string': 'string', 'float': 3.14159, 'inf': float('inf'), 'np inf': np.inf, 'np nan': np.nan}
In [7]: repr(parameters)
Out[7]: "{'int': 42, 'string': 'string', 'float': 3.14159, 'inf': inf, 'np inf': inf, 'np nan': nan}"

So, what do we do? It would be really nice if we could have what comes out of repr(component) be something we could paste into a terminal in order to get an identical instance.

I like the approach you're taking. I played with it for a bit and here's what I got:

# define this in gen_utils so we can use it for the pipeline repr too
def safe_repr(value):
    if isinstance(value, float):
        if pd.isna(value):
            return 'np.nan'
        if np.isinf(value):
            return f'float({repr(value)})'
    return repr(value)

# then the component repr:
def __repr__(self):
    parameters_repr = ', '.join([f'{key}={safe_repr(value)}' for key, value in self.parameters.items()])
    return f'{self.name}({parameters_repr})'

Here's a test I did of the safe_repr, which shows that it outputs the string we need in order to avoid some ugly bugs:

In [1]: [safe_repr(el) for el in ['string', float('nan'), float('-inf'), float('inf'), np.nan, -np.inf, np.inf]]
Out[1]: ["'string'", 'np.nan', 'float(-inf)', 'float(inf)', 'np.nan', 'float(-inf)', 'float(inf)']

What do you think of that? Weird stuff!
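One detail in the output above: `float(inf)` has no quotes around `inf`, so pasting it into a terminal would raise a `NameError` unless a variable named `inf` happens to exist. A variant that quotes the argument might look like the sketch below. The name `quoted_safe_repr` is hypothetical, and emitting `float('nan')` instead of `np.nan` is an assumption made here so the output needs no numpy import; it is not necessarily what was merged.

```python
import math


def quoted_safe_repr(value):
    """Return a repr that can be pasted back into Python, including nan and inf.

    Hypothetical variant of the safe_repr suggested in this review.
    """
    if isinstance(value, float):
        if math.isnan(value):
            return "float('nan')"
        if math.isinf(value):
            # repr(value) is 'inf' or '-inf'; wrap it in quotes inside float(...)
            return f"float('{value!r}')"
    return repr(value)


pos_inf = quoted_safe_repr(float('inf'))    # "float('inf')"
neg_inf = quoted_safe_repr(float('-inf'))   # "float('-inf')"
nan = quoted_safe_repr(float('nan'))        # "float('nan')"
text = quoted_safe_repr('string')           # "'string'"

# The round trip mentioned above: float() accepts the bare repr as a string.
round_trip_ok = float(repr(float('inf'))) == float('inf')   # True
```

Since `np.nan` and `np.inf` are ordinary Python floats, the same `isinstance` check covers them.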

dsherry · 2020-09-26T01:13:54Z

evalml/pipelines/pipeline_base.py

+                    rpr = rpr + f"'{parameter}': {value}, "
+            rpr = rpr + "}, "
+        rpr = rpr + '})'
+        return rpr


Same comment as for component repr, let's flatten this out, and great thinking trying nan/inf types!

dsherry · 2020-09-26T01:20:48Z

evalml/tests/pipeline_tests/test_pipelines.py

+    assert eval(repr(pipeline_with_parameters)) == pipeline_with_parameters
+
+    pipeline_with_inf_parameters = MockPipeline(parameters={'Imputer': {'numeric_fill_value': float('inf')}})
+    assert eval(repr(pipeline_with_inf_parameters)) == pipeline_with_inf_parameters


This test is pretty cool! Great thinking enforcing this invariant on repr

Two things:

RE our discussion of nan types at the top, it would be ideal if we had a similar test to this one which checks all those edge cases. Example: define a mock component which has a few input parameters, and pass in float('nan')/np.nan and the native/np infs.

There are security concerns with using python's eval. It turns out if you can compromise what goes into eval you can do a lot of bad stuff. To avoid this entirely, my recommendation is to avoid using eval and just check the string output matches what we expect. I know, this is test code so it feels kinda silly, lol--related reading if you're interested.
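A string-comparison test along these lines could look roughly like the sketch below. `MockComponent` and `CASES` are hypothetical stand-ins written for this illustration, not evalml's classes or its actual tests. The last lines show a second reason to prefer string checks: two separately created nan values never compare equal, so an equality check on rebuilt parameters would fail even when the repr is correct.

```python
import math


class MockComponent:
    """Hypothetical stand-in for a component; not evalml's ComponentBase."""

    def __init__(self, **parameters):
        self.parameters = parameters

    def __repr__(self):
        def render(value):
            # nan and inf have reprs that are not valid expressions, so spell them out.
            if isinstance(value, float) and (math.isnan(value) or math.isinf(value)):
                return f"float('{value!r}')"
            return repr(value)

        args = ', '.join(f"{key}={render(value)}" for key, value in self.parameters.items())
        return f"{type(self).__name__}({args})"


# Each case pairs constructor parameters with the exact repr string expected.
CASES = [
    ({'a': 1, 'b': 'text'}, "MockComponent(a=1, b='text')"),
    ({'a': float('nan')}, "MockComponent(a=float('nan'))"),
    ({'a': float('inf'), 'b': float('-inf')}, "MockComponent(a=float('inf'), b=float('-inf'))"),
]


def check_reprs():
    """Compare repr output to expected strings; no eval involved."""
    for parameters, expected in CASES:
        assert repr(MockComponent(**parameters)) == expected
    return True


# Distinct nan objects are unequal, so dicts holding them are unequal too.
nan_dicts_equal = {'a': float('nan')} == {'a': float('nan')}   # False
```

With pytest, the same cases could be fed through `@pytest.mark.parametrize` instead of the loop.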

dsherry

Sweet!! Approved pending a couple small things:

Remove extra curly braces from pipeline repr
Add docstring for safe_repr

evalml/pipelines/components/component_base.py

evalml/utils/gen_utils.py

evalml/tests/pipeline_tests/test_pipelines.py

dsherry · 2020-09-29T03:30:54Z

evalml/pipelines/pipeline_base.py

+            return ', '.join([f"'{key}': {safe_repr(value)}" for key, value in parameters.items()])
+
+        parameters_repr = ' '.join([f"'{component}':{{{repr_component(parameters)}}}," for component, parameters in self.parameters.items()])
+        return f'{(type(self).__name__)}(parameters={{{parameters_repr}}})'


I believe you want double curly braces instead of triple here.

Since the parameters is a dictionary and contains an expression, I do need all three! The first two are for literal curly braces and the third is for the formatting.

Hm interesting, but then the output (taken from the unit test) ends up as

MockPipeline(parameters={{'Imputer':{{'categorical_impute_strategy': 'most_frequent', 'numeric_impute_strategy': 'mean', 'categorical_fill_value': None, 'numeric_fill_value': None}}, '{final_estimator}':{{'n_estimators': 100, 'max_depth': 6, 'n_jobs': -1}},}})

But ideally I think it should be

MockPipeline(parameters={'Imputer': {'categorical_impute_strategy': 'most_frequent', 'numeric_impute_strategy': 'mean', 'categorical_fill_value': None, 'numeric_fill_value': None}, 'final_estimator': {'n_estimators': 100, 'max_depth': 6, 'n_jobs': -1}})

Do you agree? The second output could be evaluated in the python repl and turned into an object. I think the first would fail.

@eccabay but I think it's fine to merge this and we can circle back. It adds value even if there are a couple of details we may want to discuss further :)

The double curly braces in the unit tests are once again for f-string formatting! If you actually print the repr, you get the second code block you pasted there, and calling eval on the expected_repr produces the correct object.

@eccabay hmm could you paste an example? I don't follow yet. I thought that __repr__ returns a string, which requires no further formatting, and that's the end of it 😂 In other words, I thought we should output the second snippet, because that can be pasted into the python REPL and evaluated, but we're currently outputting the first.

Again, not blocking merge, I'm just trying to make sure we all agree on desired behavior.

@eccabay ok, I think I understand now. So the string I copied was a format string, which is why you need the double curlys there.

I just did a test locally and everything looks great

Realizing the unit tests were using a format string was the key point I was missing, lol. Thanks! 😆
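The brace confusion in this thread comes down to f-string escaping: inside an f-string, `{{` and `}}` produce literal braces, while a single pair marks a replacement field. A small standalone illustration, using made-up variable names and a shortened parameter set rather than the real test string:

```python
name = "Imputer"
inner = "'top_n': 10"

# Three braces: two for a literal brace, one to open or close the field.
wrapped = f"{{{inner}}}"      # {'top_n': 10}

# Two braces only: no field at all, the text "inner" stays as-is.
not_a_field = f"{{inner}}"    # {inner}

# A format string in the style of the unit test's expected_repr.
expected = f"MockPipeline(parameters={{'{name}':{{{inner}}},}})"
print(expected)   # MockPipeline(parameters={'Imputer':{'top_n': 10},})
```

So the doubled braces exist only in the source of the format string; the string that `__repr__` returns, and that the test compares against, contains single braces.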

dsherry · 2020-09-29T14:19:39Z

evalml/tests/pipeline_tests/test_pipelines.py

+        component_graph = ['Imputer', final_estimator]
+
+    pipeline = MockPipeline(parameters={})
+    expected_repr = f"MockPipeline(parameters={{'Imputer':{{'categorical_impute_strategy': 'most_frequent', 'numeric_impute_strategy': 'mean', 'categorical_fill_value': None, 'numeric_fill_value': None}}, '{final_estimator}':{{'n_estimators': 100, 'max_depth': 6, 'n_jobs': -1}},}})"


@eccabay I just noticed: can we replace '{final_estimator}': with 'final_estimator':?

Doesn't block merge though IMO! Just a detail we should probably chase down

Unfortunately the way these tests are written, we can't. The string __repr__ prints out uses the name of the estimator, so checking for string equality will fail. If I replace {final_estimator} with final_estimator and try assert eval(repr(pipeline)) == eval(expected_repr), that will pass, but that will reintroduce eval to the code.

The alternative of having __repr__ output final_estimator instead of the name of said estimator feels unnecessarily clunky.

Oh! My bad, I didn't notice this was a format string until just now. Got it, makes sense. Thanks

Wonderful, glad we're on the same page 😅. Does this also clear up our other discussion?

eccabay added 2 commits September 23, 2020 16:48

Add str and repr for components

c70a174

Add str and repr for pipelines

1494acd

Update release notes

c272e98

eccabay marked this pull request as ready for review September 23, 2020 20:57

eccabay requested review from dsherry, angela97lin and jeremyliweishih September 23, 2020 20:57

eccabay self-assigned this Sep 23, 2020

jeremyliweishih requested changes Sep 23, 2020

eccabay added 3 commits September 24, 2020 08:37

Add tests for existing component str and repr

a8fca8a

Add describe tests to make linter happy

d74cb20

Add code and tests for supporting inf in repr

8fc0c4e

eccabay requested a review from jeremyliweishih September 24, 2020 13:21

jeremyliweishih approved these changes Sep 24, 2020

evalml/tests/component_tests/test_components.py

Remove print statement

17675ef

angela97lin approved these changes Sep 24, 2020

eccabay added 2 commits September 25, 2020 09:22

Merge branch 'main' into 474_str_and_repr

a8a6641

Merge branch 'main' into 474_str_and_repr

b537d0a

dsherry suggested changes Sep 26, 2020

eccabay added 2 commits September 28, 2020 10:11

Flatten __repr__ functions using safe_repr

418c549

Replace eval() in tests

dac5ad8

eccabay requested a review from dsherry September 28, 2020 15:10

Merge branch 'main' into 474_str_and_repr

f0c4013

dsherry approved these changes Sep 29, 2020

eccabay added 2 commits September 29, 2020 08:06

Remove dead code and add docstring for safe_repr

a27f5fc

Merge branch 'main' into 474_str_and_repr

5a576d3

dsherry reviewed Sep 29, 2020

eccabay and others added 2 commits September 29, 2020 11:29

Merge branch 'main' into 474_str_and_repr

14eecc3

Lint fix

57040d2

eccabay merged commit 69d7a62 into main Sep 29, 2020

angela97lin mentioned this pull request Sep 29, 2020

Release v0.14.1 #1241

Merged

dsherry mentioned this pull request Oct 8, 2020

Implement pipeline and component str/repr/hash/eq magic methods #465

Closed

eccabay deleted the 474_str_and_repr branch November 2, 2020 16:33

exalate-issue-sync bot mentioned this pull request Feb 14, 2022

Use woodwork public method for _get_subset_schema #3168

Closed
