Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Automatically generate name for controllable primitives #481

Merged
merged 32 commits into from Apr 9, 2019

Conversation

jeff-hernandez
Copy link
Contributor

No description provided.

@codecov
Copy link

codecov bot commented Apr 5, 2019

Codecov Report

Merging #481 into master will increase coverage by 0.02%.
The diff coverage is 98.57%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #481      +/-   ##
==========================================
+ Coverage   96.23%   96.25%   +0.02%     
==========================================
  Files         104      104              
  Lines        8658     8714      +56     
==========================================
+ Hits         8332     8388      +56     
  Misses        326      326
Impacted Files Coverage Δ
...etools/primitives/base/transform_primitive_base.py 100% <100%> (ø) ⬆️
...ools/primitives/base/aggregation_primitive_base.py 100% <100%> (ø) ⬆️
...ools/primitives/standard/aggregation_primitives.py 94.69% <100%> (+0.02%) ⬆️
featuretools/primitives/base/primitive_base.py 100% <100%> (ø) ⬆️
...tools/tests/primitive_tests/test_primitive_base.py 100% <100%> (ø) ⬆️
featuretools/primitives/base/utils.py 86.66% <85.71%> (-1.57%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 23dc0b1...215cb75. Read the comment docs.

@kmax12 kmax12 mentioned this pull request Apr 5, 2019
@@ -13,13 +13,16 @@ class AggregationPrimitive(PrimitiveBase):
stack_on_self = True # whether or not it can be in input_types of self
allow_where = True # whether DFS can apply where clause to this primitive

def generate_name(self, base_feature_names, child_entity_id,
parent_entity_id, where_str, use_prev_str):
def generate_name(self, base_feature_names, child_entity_id, parent_entity_id, where_str, use_prev_str):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's no reformat the function definition

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change applied in this commit.

@@ -51,3 +54,28 @@ def get_function(self):

def get_filepath(self, filename):
return os.path.join(config.get("primitive_data_folder"), filename)

def get_args_string(self):
if not isinstance(self.__init__, types.MethodType): # __init__ must be defined
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how is it possible for __init__ not to be defined?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this example, the __init__ is not defined. I included a test case.

    class Primitive(PrimitiveBase):
        pass

Copy link
Contributor

@kmax12 kmax12 Apr 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so, I tested in python 3 and it worked without this, but just tested in python 2 and it doesn't so I see what you're saying

what if we just added this to PrimitiveBase?

def __init__(self):
        pass

I think that might be a cleaner solution

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think that should work.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change applied in this commit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the test case test_args_string_undefined be deleted?

if not isinstance(self.__init__, types.MethodType): # __init__ must be defined
return ''

v2 = version_info.major == 2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's try to use the featuretools.primitives.base.utils.inspect_function_args method

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of two functions, could we add an optional argument on inspect_function_args to restrict keywords like time?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Never mind, it might be better to split into two functions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's try it with two functions and see how it looks

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change applied in this commit.

@jeff-hernandez
Copy link
Contributor Author

Creating primitives using make_agg_primitive and make_trans_primitive will return an empty args_string because arguments are not assigned as attributes in the class and passed into __init__ as **kwargs.

@jeff-hernandez
Copy link
Contributor Author

jeff-hernandez commented Apr 8, 2019

If a primitive argument is stored as a pandas series, then the generated name won't be very readable. Is there a preferred method on working with these types of arguments?

primitive = Primitve(argument=pd.Series(range(4)))
# PRIMITIVE(argument='0    0\n1    1\n2    2\n3    3\ndtype: int64', ...)

Copy link
Contributor

@kmax12 kmax12 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few more comments. Looking good overall

default = is_same_type and arg.default == getattr(self, arg.name)
return is_positional_or_keyword and not default

string = OrderedDict()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this need to be an OrderedDict? why not just make it a list

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, doesn't need to be. I will make into list.

string[arg.name] = str(getattr(self, arg.name))
if len(string) == 0:
return ''
string = ', ' + ', '.join(map('='.join, string.items()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we break this into two lines to make it easier to read

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change applied in this commit.

@kmax12
Copy link
Contributor

kmax12 commented Apr 8, 2019

If a primitive argument is stored as a pandas series, then the generated name won't be very readable. Is there a preferred method on working with these types of arguments?

primitive = Primitve(argument=pd.Series(range(4)))
# PRIMITIVE(argument='0    0\n1    1\n2    2\n3    3\ndtype: int64', ...)

for now, let's not handle the series case specifically. whatever it does by default is fine

Copy link
Contributor

@kmax12 kmax12 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only one comment about a comment. Otherwise, ready to merge.

error = '"{}" must be attribute of {}'
assert hasattr(self, arg.name), error.format(arg.name, self.__class__.__name__)

# skip if *args or **kwargs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be "skip if not"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The *args and **kwargs would not be considered POSITIONAL_OR_KEYWORD. This comment could work as well.

# skip if not a standard argument (e.g. excluding *args and **kwargs)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good

@kmax12 kmax12 changed the title Generate controllable name Automatically generate name for controllable primitives Apr 9, 2019
@kmax12 kmax12 merged commit 40298ad into alteryx:master Apr 9, 2019
@rwedge rwedge mentioned this pull request Apr 24, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants