Add primitive.get_filepath and example of primitive loading data from external files #380

Merged
merged 16 commits into from Jan 29, 2019
1 change: 1 addition & 0 deletions MANIFEST.in
@@ -2,5 +2,6 @@ include *.txt
include LICENSE
include README.md
include featuretools/tests/primitive_tests/primitives_to_install.tar.gz
include featuretools/primitives/data/featuretools_unit_test_example.csv
recursive-exclude * __pycache__
recursive-exclude * *.py[co]
30 changes: 29 additions & 1 deletion docs/source/guides/advanced_custom_primitives.rst
@@ -8,7 +8,7 @@ Functions With Additional Arguments

import featuretools as ft
from featuretools.primitives import make_trans_primitive
from featuretools.variable_types import Text, Numeric
from featuretools.variable_types import Text, Numeric, Categorical

One caveat with the make\_primitive functions is that the required arguments of ``function`` must be input features. Here we create a function for ``StringCount``, a primitive which counts the number of occurrences of a string in a ``Text`` input. Since ``string`` is not a feature, it needs to be a keyword argument to ``string_count``.

@@ -54,3 +54,31 @@ Passing in ``string="test"`` as a keyword argument when initializing the `String
feature_matrix.columns
feature_matrix[['STD(log.STRING_COUNT(comments, "the"))', 'SUM(log.STRING_COUNT(comments, "the"))', 'MEAN(log.STRING_COUNT(comments, "the"))']]

.. Primitives That Use External Data Files
.. =======================================
.. Some primitives require external data files to perform their computation. For example, imagine a primitive that uses a pre-trained sentiment classifier to classify text. Here is how that could be implemented:

.. .. ipython:: python

..     from featuretools.primitives import TransformPrimitive

..     class Sentiment(TransformPrimitive):
..         '''Reads in a text field and returns "negative", "neutral", or "positive"'''
..         name = "sentiment"
..         input_types = [Text]
..         return_type = Categorical
..         def get_function(self):
..             filepath = self.get_filepath('sentiment_model.pickle')  # returns absolute path to the file
..             import pickle
..             # pickle files must be opened in binary mode
..             with open(filepath, 'rb') as f:
..                 model = pickle.load(f)
..             def predict(x):
..                 return model.predict(x)
..             return predict


.. The ``get_filepath`` method is used to find the location of the trained model.

.. .. note::

..     The primitive loads the model within the ``get_function`` method, but outside of the ``predict`` function. This way the model is loaded from disk only once, when the Featuretools backend requests the primitive function, instead of every time ``predict`` is called.
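The load-once behaviour described in the note can be sketched without Featuretools at all. In this minimal illustration, `load_model`, `load_count`, and the dictionary "model" are hypothetical stand-ins for reading a pickled classifier from disk; only the closure pattern is taken from the docs above.

```python
load_count = 0  # counts how many times the (pretend) model is loaded


def load_model():
    # Hypothetical stand-in for unpickling a trained model from disk.
    global load_count
    load_count += 1
    return {"good": "positive", "bad": "negative"}


class Sentiment:
    def get_function(self):
        # The expensive load happens here, once per get_function call.
        model = load_model()

        def predict(x):
            # The inner function only reuses the already-loaded model.
            return [model.get(word, "neutral") for word in x]

        return predict


predict = Sentiment().get_function()
first = predict(["good", "bad"])
second = predict(["fine"])
print(first, second, load_count)
```

Calling `predict` repeatedly reuses the object captured by the closure, so the load cost is paid only when `get_function` itself is called.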
7 changes: 7 additions & 0 deletions featuretools/primitives/base/primitive_base.py
@@ -1,5 +1,7 @@
from __future__ import absolute_import

import os

import numpy as np


@@ -35,3 +37,8 @@ def generate_name(self):

    def get_function(self):
        raise NotImplementedError("Subclass must implement")

    def get_filepath(self, filename):
        PWD = os.path.dirname(__file__)
        path = os.path.join(PWD, "../data", filename)
        return path
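As a rough illustration of how this method resolves a filename, the sketch below mirrors the same join in a standalone function. The `base_dir` parameter is a hypothetical stand-in for `os.path.dirname(__file__)`, and the `os.path.normpath` call is added here only to show the collapsed result; it is not part of the PR.

```python
import os


def get_filepath(base_dir, filename):
    # Same join as the PR's method; base_dir stands in for
    # os.path.dirname(__file__) (hypothetical parameter for illustration).
    return os.path.join(base_dir, "../data", filename)


# Assumed location of the directory containing primitive_base.py.
base_dir = os.path.join("featuretools", "primitives", "base")

raw = get_filepath(base_dir, "featuretools_unit_test_example.csv")
resolved = os.path.normpath(raw)  # collapses the "base/.." segment

print(raw)
print(resolved)
```

The un-normalized path still opens correctly; it simply keeps the `..` segment, so the returned string points at the `data` directory that sits beside `base`.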
@@ -0,0 +1,4 @@
0
1
2
3
40 changes: 39 additions & 1 deletion featuretools/tests/primitive_tests/test_transform_features.py
@@ -1,5 +1,4 @@
# -*- coding: utf-8 -*-

import numpy as np
import pandas as pd
import pytest
@@ -50,6 +49,7 @@
SubtractNumeric,
SubtractNumericScalar,
Sum,
TransformPrimitive,
Year,
get_transform_primitives
)
@@ -1055,3 +1055,41 @@ def gen_feat_names(self):
def test_feature_names_inherit_from_make_trans_primitive():
    # R TODO
    pass


def test_get_filepath(es):
    class Mod4(TransformPrimitive):
        '''Return base feature modulo 4'''
        name = "mod4"
        input_types = [Numeric]
        return_type = Numeric

        def get_function(self):
            filepath = self.get_filepath("featuretools_unit_test_example.csv")
            reference = pd.read_csv(filepath, header=None, squeeze=True)

            def map_to_word(x):
                def _map(x):
                    if pd.isnull(x):
                        return x
                    return reference[int(x) % 4]
                return pd.Series(x).apply(_map)
            return map_to_word

    feat = ft.Feature(es['log']['value'], primitive=Mod4)
    df = ft.calculate_feature_matrix(features=[feat],
                                     entityset=es,
                                     instance_ids=range(17))

    assert pd.isnull(df["MOD4(value)"][15])
    assert df["MOD4(value)"][0] == 0
    assert df["MOD4(value)"][14] == 2

    fm, fl = ft.dfs(entityset=es,
                    target_entity="log",
                    agg_primitives=[],
                    trans_primitives=[Mod4])

    assert fm["MOD4(value)"][0] == 0
    assert fm["MOD4(value)"][14] == 2
    assert pd.isnull(fm["MOD4(value)"][15])
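The lookup that `Mod4` performs can be sketched without pandas or Featuretools. This is only an approximation of the test's inner `_map` logic: the `reference` list is assumed to hold the four rows of the example CSV (0 to 3), `mod4_lookup` is a hypothetical name, and the input values are made up for illustration rather than taken from the test entity set.

```python
import math

# Assumed contents of featuretools_unit_test_example.csv, one value per row.
reference = [0, 1, 2, 3]


def mod4_lookup(values):
    def _map(x):
        # Missing values pass through unchanged, as in the test's _map.
        if isinstance(x, float) and math.isnan(x):
            return x
        # Otherwise index the reference table by the value modulo 4.
        return reference[int(x) % 4]

    return [_map(v) for v in values]


result = mod4_lookup([0, 5, 14, float("nan")])
print(result)
```

Because the reference table here is the identity on 0 to 3, the lookup reduces to a plain modulo, which is why the test can assert against small integers directly.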