Allow installation of additional primitives #326

Merged
merged 51 commits into from
Dec 11, 2018

@kmax12 kmax12 commented Nov 25, 2018

This PR introduces a method to install additional primitives into an existing Featuretools installation.

Installation from command line

$ featuretools install s3://featuretools-static/primitives_to_install.tar.gz
Install primitives: CustomSum, CustomMean, CustomMax? (Y/n) Y
100%|███████████████| 3/3 [00:00<00:00, 1187.63it/s]

Now a user can access the primitives either by importing them or by their string name:

import featuretools as ft
from featuretools.primitives import CustomSum

es = ft.demo.load_mock_customer(return_entityset=True)
feature_matrix, feature_defs = ft.dfs(entityset=es,
                                      target_entity="customers",
                                      agg_primitives=[CustomSum, "custommax"],
                                      trans_primitives=[],
                                      max_depth=2)

This returns

            zip_code  CUSTOMSUM(transactions.amount)                         ...                          CUSTOMMAX(sessions.CUSTOMSUM(transactions.amount))  CUSTOMMAX(sessions.CUSTOMMAX(transactions.amount))
customer_id                                                                  ...
1              60091                         9025.62                         ...                                                                    1613.93                                              139.43
2              13244                         7200.28                         ...                                                                    1320.64                                              146.81
3              13244                         6236.62                         ...                                                                    1477.97                                              149.15
4              60091                         8727.68                         ...                                                                    1351.46                                              149.95
5              60091                         6349.66                         ...                                                                    1700.67                                              149.02

[5 rows x 7 columns]

How it works

The installation command takes a directory or tar archive of primitive source files. If it is an archive, the installation script extracts it to a directory. The archive can also be remote, as in the example above, in which case it is downloaded to a temporary directory first. We use smart_open to handle the downloading, so it supports downloading primitives from S3, HDFS, and HTTP/HTTPS.
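A minimal sketch of the "directory or tar archive" step described above, using only the standard library. The helper name resolve_primitives_dir and its workdir parameter are hypothetical and are not taken from this PR; remote downloading is left out, and archive entries are flattened to their base names so that no entry can be written outside the target directory.

```python
import os
import shutil
import tarfile
import tempfile


def resolve_primitives_dir(path, workdir=None):
    """Return a local directory holding the primitive source files.

    Hypothetical helper: the name and signature are illustrative only.
    """
    if os.path.isdir(path):
        # A plain directory is used as-is.
        return path
    if not tarfile.is_tarfile(path):
        raise ValueError("expected a directory or tar archive: %s" % path)

    workdir = workdir or tempfile.mkdtemp()
    with tarfile.open(path) as archive:
        for member in archive.getmembers():
            if not member.isfile():
                continue
            # Keep only the file name so an entry cannot escape workdir.
            target = os.path.join(workdir, os.path.basename(member.name))
            with archive.extractfile(member) as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
    return workdir
```

Flattening is a simplification for this sketch; an installer that needs to preserve subdirectories would have to validate each member path instead.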

Primitives are installed by copying their source files into a new directory, installed/, within the featuretools.primitives submodule. The installation script only considers files with a .py extension. It detects a primitive in a file by looking for an object that is a subclass of PrimitiveBase. Each of these files must contain exactly one primitive; otherwise the installation will throw an error. When featuretools.primitives is loaded, primitive source files in installed/ are automatically imported.
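The detection rule above (exactly one PrimitiveBase subclass per .py file) could be sketched roughly as follows. This is not the code from this PR: find_primitive is a hypothetical name, the PrimitiveBase class here is a stand-in so the example runs without Featuretools, and injecting the base class into the loaded module replaces the import a real primitive file would perform itself.

```python
import importlib.util
import inspect
import os
import tempfile


class PrimitiveBase:
    """Stand-in for featuretools' PrimitiveBase so the sketch runs on its own."""


def find_primitive(filepath, base=PrimitiveBase):
    """Load one source file and return the single primitive class it defines."""
    module_name = os.path.splitext(os.path.basename(filepath))[0]
    spec = importlib.util.spec_from_file_location(module_name, filepath)
    module = importlib.util.module_from_spec(spec)
    # Assumption for this sketch: expose the stand-in base class to the file.
    module.PrimitiveBase = base
    spec.loader.exec_module(module)

    # Count only subclasses defined in this file, not ones it merely imports.
    found = [
        obj for _, obj in inspect.getmembers(module, inspect.isclass)
        if issubclass(obj, base) and obj.__module__ == module_name
    ]
    if len(found) != 1:
        raise RuntimeError(
            "%s must define exactly one primitive, found %d"
            % (filepath, len(found))
        )
    return found[0]


# Demo: write a one-primitive file to a temporary directory and inspect it.
demo_dir = tempfile.mkdtemp()
demo_file = os.path.join(demo_dir, "custom_sum.py")
with open(demo_file, "w") as f:
    f.write("class CustomSum(PrimitiveBase):\n    name = 'custom_sum'\n")
print(find_primitive(demo_file).__name__)  # prints "CustomSum"
```

Filtering on the defining module is one way to ignore base classes such as AggregationPrimitive that a primitive file imports; the actual installer may use a different check.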

To support the CLI, there is new configuration for entry_points in the setup.py based on the instruction here: https://chriswarrick.com/blog/2014/09/15/python-apps-the-right-way-entry_points-and-scripts/
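For readers unfamiliar with entry_points, the sketch below shows the general shape of a console_scripts entry and a function it could point at. The "featuretools.__main__:cli" target, the ENTRY_POINTS constant, and the use of argparse are assumptions made for illustration; the PR's actual setup.py and CLI code may be organized differently.

```python
import argparse

# What setup(entry_points=...) might receive. The "module:function" target
# on the right-hand side is an assumption, not copied from this PR.
ENTRY_POINTS = {
    "console_scripts": ["featuretools = featuretools.__main__:cli"],
}


def build_parser():
    """Build a parser for `featuretools install <path>`."""
    parser = argparse.ArgumentParser(prog="featuretools")
    subcommands = parser.add_subparsers(dest="command")
    install = subcommands.add_parser("install", help="install primitives")
    install.add_argument("path", help="directory, archive, or remote URI")
    return parser


def cli(argv=None):
    """Function the console script would call; returns the path to install."""
    args = build_parser().parse_args(argv)
    if args.command == "install":
        # A real implementation would call ft.install_primitives(args.path).
        return args.path
    return None
```

With such an entry in place, installing the package generates a featuretools executable that calls cli(), which is what makes the command-line form shown earlier possible.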

Users can also install from a Python script using ft.install_primitives(...):

import featuretools as ft
ft.install_primitives("s3://featuretools-static/primitives_to_install.tar.gz")
Install primitives: CustomSum, CustomMean, CustomMax? (Y/n) Y
100%|███████████████| 3/3 [00:00<00:00, 930.28it/s]

In this PR I also updated the structure of featuretools.primitives to be more organized. It doesn't change any of the external API, but there are now three submodules:

  • featuretools.primitives.standard - all the primitives that come with featuretools
  • featuretools.primitives.installed - all the primitives that are installed into featuretools
  • featuretools.primitives.base - the base classes used by primitives, e.g. PrimitiveBase, AggregationPrimitive, etc.

Finally, I also removed our usage of tox. It wasn't providing any useful functionality after we separated each version into its own CircleCI job.

TODO before ready for review

  • Finalize the submodule structure

Future Development

  • Add more details to documentation
  • Host some new downloadable primitives on S3 for users to access
  • Support uninstalling primitives
  • Improve visual display during installation process
  • Support installing dependencies of primitives
  • Handle attempts to reinstall the same primitive or to install another primitive with the same name
  • Give primitives more annotations that point to the author of the primitive, the license, recommended use cases, etc
  • Provide scaffolding for easily testing primitives
  • Discuss if builtin primitives should also have the restriction of one primitive per file

@kmax12 kmax12 changed the base branch from clean-primitive-tests to master November 27, 2018 00:24
@kmax12 kmax12 changed the title (WIP) Allow installation of additional primitives Allow installation of additional primitives Nov 27, 2018
@@ -1,14 +0,0 @@
[tox]

Contributor

We can probably remove tox from dev-requirements.txt as well

s3 = s3fs.S3FileSystem(anon=True)
remote_archive = s3.open(uri, 'rb')

f.write(remote_archive.read())

Contributor

Instead of reading the whole archive, should we read it line by line or some number of bytes at a time?

Contributor Author

i don't think it matters, does it? at least for now, I don't expect these archives to be too big
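For reference, the chunked alternative raised in this thread can be sketched with the standard library alone. copy_in_chunks and its default chunk size are hypothetical; the function works on any pair of binary file objects, so the in-memory buffers below stand in for the S3 file handle and the local temporary file.

```python
import io
import shutil


def copy_in_chunks(remote_file, local_file, chunk_size=1024 * 1024):
    """Stream bytes between two binary file objects, chunk_size at a time."""
    shutil.copyfileobj(remote_file, local_file, chunk_size)


# Demo with in-memory buffers standing in for the remote and local files.
source = io.BytesIO(b"x" * (3 * 1024 * 1024))
destination = io.BytesIO()
copy_in_chunks(source, destination)
```

This keeps peak memory near the chunk size instead of the archive size, which only matters if the archives grow large.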

@kmax12 kmax12 merged commit e13e8a6 into master Dec 11, 2018
@rwedge rwedge mentioned this pull request Dec 17, 2018
@rwedge rwedge deleted the install-primitives branch July 2, 2019 21:05