-
Notifications
You must be signed in to change notification settings - Fork 894
Clean up EntitySet class #145
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
72f18f0
to
0d1793d
Compare
kmax12
reviewed
Jun 6, 2018
@property | ||
def entities(self): | ||
return list(self.entity_dict.values()) | ||
|
||
@property | ||
def metadata(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think the logic here can just be
@property
def metadata(self):
'''Defined as a property because an EntitySet's metadata
is used in many places, for instance, for each feature in a feature list.
To prevent using copying the full metadata object to each feature,
we generate a new metadata object and check if it's the same as the existing one,
and if it is return the existing one. Thus, all features in the feature list
would reference the same object, rather than copies. This saves a lot of memory
'''
if self._metadata is None:
self._metadata = self._gen_metadata()
else:
new_metadata = self._gen_metadata()
# Don't want to keep making new copies of metadata
# Only make a new one if something was changed
if not self._metadata.__eq__(new_metadata):
self._metadata = new_metadata
return self._metadata
also, any reason to use not self._metadata.__eq__(new_metadata)
rather than self._metadata != new_metadata
Looks good to me. Merging |
Merged
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This PR removes unnecessary code in EntitySet, and removes BaseEntity/BaseEntitySet.
A lot of this is cleaning up duplicated or convoluted functionality. There is now only a single file for both Entity and EntitySet, and I dropped all functionality that is just a thin wrapper around pandas.
One outstanding issue with this PR: creating EntitySets now produces lots of "data type not understood" warnings. This is related to variable type checking within numpy, but I haven't pinpointed the problem.
Entity
convert_variable_types
Old:
Previously, there were several functions to convert variables types, and the underlying pandas dtypes:
entity.py::Entity.convert_variable_types
would take in a desired FT variable types dictionary, and then callself.entityset_convert_variable_type
with each variable.entity.py::Entity.entityset_convert_variable_type
would convert underlying variable database_entity.py::BaseEntity.convert_variable_type
converts both the FT type and the underlying data, callingEntity.entityset_convert_variable_type
to do the data conversionNew: Similar to old but in the same file, with better names
Entity.convert_all_variable_data
takes in desired FT variable types and callsEntity.convert_variable_data
Entity.convert_variable_data
takes in a variable and desired type, and does underlying data conversionEntity.convert_variable_type
converts a variable to a new type, and optionally callsconvert_variable_data
to change the underlying type.Indexes & sorting
Old:
Entity.set_index
sets the index on the underlying data, asserts uniqueness, callsconvert_variable_type
to convert the variable (but not the data) to an Index, and then callsBaseEntity.set_index
BaseEntity.set_index
sets theindex
attributeEntity.set_time_index
does type checking, sorts the data, and then callsBaseEntity.set_time_index
BaseEntity.set_time_index
sets thetime_index
attributeEntity.add_all_variable_statistics
needs to be called explicitlyNew:
Entity.update_data
makes sure the data is sorted properly. This method is used whenever the underlying data is replaced by a new dataframeindex
andtime_index
attributes are first set in the__init__
methodEntity.update_data
is called from__init__
, which sets the underlying dataframe as an attribute, and callsset_index(self.index)
,set_time_index(self.time_index)
, andadd_all_variable_statistics
Entity.set_index
sets the index on the underlying data, asserts uniqueness, callsconvert_variable_type
to convert the variable (but not the data) to an Index, and sets the` index attributeEntity.set_time_index
does type checking, sorts the data, and then sets thetime_index
attributeEntity.update_data
can accept either a dataframe, or the whole "data" dictionary which includes the dataframe, indexed_by, and last_time_index. It includes optional parameters to resort, reindex, and redo last_time_index. It reindexes through a new methodEntity.index_data
Adding and removing variables
Old:
BaseEntity.add_variable
adds an FT variable to the variable list, and callsadd_variable_statistics
BaseEntity.delete_variable
removes an FT variable from the variable listEntity.add_column
adds column data to the dataframe, optionally infers the variable type, adds an FT variable to thevariable_types
dictionary but not to the variable list. I don't actually know if this works, becausevariable_types
is a property that just converts the variable list into a dictionary. It does not callBaseEntity.add_variable
Entity.delete_column
deletes a column from the dataframe and from thevariable_types
property. Again, not sure how this works.New:
Entity.delete_variable
deletes a variable both from the variables list and from the dataframeEntity.add_variable
adds an FT variable to the variable list (optionally inferring the type), adds data to the underlying dataframe, and callsadd_variable_statistics
Miscellaneous methods/properties changed/deleted
name
is removedshow_instance
is removedget_shape
is removed (this was a duplicate ofshape
)get_column_stat
is removedhas_time_index
is removednum_instances
is removedis_index_column
is removedget_column_type
is removedget_column_max
/min
/etc are all removedget_all_instances
is removedget_top_n_instances
is removedsample_instances
is removedget_sliced_instance_ids
is removedget_column_data
is removedget_sample
is renamed tosample
EntitySet
Methods that were just wrappers around the Entity, or around pandas, were removed. Other methods that didn't seem to be useful were removed as well:
get_sample
is removedget_instance_data
is removednum_instances
is removedget_all_instances
is removedget_top_n_instances
is removedsample_instances
is removedget_sliced_instance_ids
is removedget_dataframe
is removedget_column_names
is removedget_index
is removedget_time_index
is removedget_secondary_time_index
is removedget_column_X
are all removedget_variable_types
is removedadd/delete_column
are both removedstore_convert_variable_type
is removed_related_instances
is now public asrelated_instances
get_name
is removed (and correspondingly removed from PrimitiveBase.get_name, and other primitives)delete_entity_variables
is removedmake_index_variable_name
is a top level function outside of the EntitySet classadd_entity
is removed, andimport_from_dataframe
is now in change of adding the entity toself.entity_stores
, whose name is changed toself.entity_dict
index_by_variable
is now private_index_by_variable