AutoNormalize is a Python library for automated datatable normalization. It allows you to build an
EntitySet from a single denormalized table and generate features for machine learning using Featuretools.
pip install featuretools[autonormalize]
pip uninstall autonormalize
- Blog Post
- Machine Learning Demo with Featuretools
- Kaggle Liquor Sales Dataset Demo
- Demo with Editing Dependencies
- Kaggle Food Production Dataset Demo
auto_entityset(df, accuracy=0.98, index=None, name=None, time_index=None)
Creates a normalized entityset from a dataframe.
df(pd.DataFrame) : the dataframe containing data
accuracy(0 < float <= 1.00; default = 0.98) : the accuracy threshold required in order to conclude a dependency (i.e. with accuracy = 0.98, the dependency LHS --> RHS must hold for at least 98% of the rows)
index(str, optional) : name of the column intended as the index of df
name(str, optional) : the name of created EntitySet
time_index(str, optional) : name of time column in the dataframe.
entityset(ft.EntitySet) : created entity set
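To make the accuracy threshold more concrete, here is a minimal pure-Python sketch of one plausible way to measure how well a dependency LHS --> RHS holds in a table. It is not AutoNormalize's implementation (the library searches for dependencies with DFD). The helper names `dependency_accuracy` and `dependency_holds`, the majority-vote rule, and the sample rows are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def dependency_accuracy(rows, lhs, rhs):
    """Return the fraction of rows consistent with lhs --> rhs.

    Assumption: for each distinct LHS value, the most common RHS value is
    treated as the correct one, and every other row counts as a violation.
    """
    if not rows:
        return 1.0
    groups = defaultdict(Counter)
    for row in rows:
        key = tuple(row[col] for col in lhs)
        groups[key][row[rhs]] += 1
    consistent = sum(counts.most_common(1)[0][1] for counts in groups.values())
    return consistent / len(rows)


def dependency_holds(rows, lhs, rhs, accuracy=0.98):
    """Accept the dependency only if enough rows are consistent with it."""
    return dependency_accuracy(rows, lhs, rhs) >= accuracy


# Hypothetical sample data: one misspelled city breaks zip --> city.
rows = [
    {"zip": "02139", "city": "Cambridge"},
    {"zip": "02139", "city": "Cambridge"},
    {"zip": "02139", "city": "Cambrige"},
    {"zip": "10001", "city": "New York"},
]

print(dependency_accuracy(rows, ["zip"], "city"))
print(dependency_holds(rows, ["zip"], "city"))
```

With a threshold below 1.0, a small number of dirty rows does not prevent a dependency from being detected, which is why the default is 0.98 rather than 1.00.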
find_dependencies(df, accuracy=0.98, index=None)
Finds dependencies within dataframe with the DFD search algorithm.
dependencies(Dependencies) : the dependencies found in the data within the constraints provided
Normalizes dataframe based on the dependencies given. Keys for the newly created DataFrames can only be columns that are strings, ints, or categories. Keys are chosen according to the priority:
- shortest length
- has "id" in some form in the name of an attribute
- has attribute furthest to left in the table
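The priority order above can be sketched as a sort key. This is an illustrative reading, not the library's code. It assumes "shortest length" means the fewest attributes in the key, that "id in some form" can be approximated by a case-insensitive substring check, and that candidate keys are given as lists of column names. The function name `choose_key` is hypothetical.

```python
def choose_key(candidates, columns):
    """Pick one candidate key (a list of column names) by the stated priority."""
    position = {name: i for i, name in enumerate(columns)}

    def rank(key):
        has_id = any("id" in attr.lower() for attr in key)
        leftmost = min(position[attr] for attr in key)
        # Tuples compare element by element: fewer attributes win first,
        # then keys with an "id"-like attribute (False sorts before True),
        # then the key whose leftmost attribute appears earliest in the table.
        return (len(key), not has_id, leftmost)

    return min(candidates, key=rank)


# Hypothetical table layout and candidate keys.
columns = ["name", "customer_id", "city", "state"]

print(choose_key([["city", "state"], ["name"], ["customer_id"]], columns))
print(choose_key([["name"], ["city"]], columns))
```

A plain substring check is crude (it also matches names such as "width"), so a real implementation would likely use a stricter pattern.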
new_dfs(list[pd.DataFrame]) : list of new dataframes
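As a rough picture of what normalizing on a single dependency means, the sketch below splits one table into two: the main table keeps the key columns, and the dependent columns move to a lookup table with one row per key value. It uses plain lists of dicts rather than DataFrames and is only an illustration. The function `split_on_dependency` and the sample data are hypothetical, and the library's actual procedure may differ.

```python
def split_on_dependency(rows, lhs, rhs):
    """Split rows into (main, lookup) tables for the dependency lhs --> rhs.

    lhs and rhs are lists of column names. The lookup table holds one row
    per distinct LHS value; the main table drops the RHS columns.
    """
    lookup = {}
    main = []
    for row in rows:
        key = tuple(row[col] for col in lhs)
        # Keep the first RHS values seen for each distinct LHS value.
        lookup.setdefault(key, {col: row[col] for col in lhs + rhs})
        main.append({col: val for col, val in row.items() if col not in rhs})
    return main, list(lookup.values())


# Hypothetical denormalized orders table: customer_id --> customer_name.
orders = [
    {"order_id": 1, "customer_id": 10, "customer_name": "Ada"},
    {"order_id": 2, "customer_id": 10, "customer_name": "Ada"},
    {"order_id": 3, "customer_id": 11, "customer_name": "Grace"},
]

main, customers = split_on_dependency(orders, ["customer_id"], ["customer_name"])
print(main)
print(customers)
```

The key columns stay in the main table so that each row can still be joined back to its lookup row.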
make_entityset(df, dependencies, name=None, time_index=None)
Creates a normalized EntitySet from dataframe based on the dependencies given. Keys are chosen in the same fashion as for
normalize_dataframe, and a new index will be created if any key has more than a single attribute.
entityset(ft.EntitySet) : created EntitySet
Returns a new normalized EntitySet from an EntitySet with a single entity.
es(ft.EntitySet) : EntitySet with a single entity to normalize
new_es(ft.EntitySet) : new normalized EntitySet
AutoNormalize is an open source project created by Feature Labs. To see the other open source projects we're working on visit Feature Labs Open Source. If building impactful data science pipelines is important to you or your business, please get in touch.