
Custom TransformerMixin #725

Closed
ChandraLingam opened this issue Apr 22, 2019 · 13 comments

Comments

@ChandraLingam

amazon-sagemaker-examples/sagemaker-python-sdk/scikit_learn_inference_pipeline/

In the abalone example, sklearn's built-in transformers/encoders are used. How do we integrate a custom transformer into the SageMaker pipeline?

I want to add new features that are computed from other features. When I include the class below as part of the pipeline, the transform job fails with the error:
AttributeError: module '__main__' has no attribute 'AddNewFeatures'

What is the recommended approach for this?

from sklearn.base import TransformerMixin

class AddNewFeatures(TransformerMixin):
    def __init__(self, *featurizers):
        self.featurizers = featurizers

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        # Do transformations
        # print(type(X))
        ...
        return X

@tonybaby16

I am facing the same issue.
Details:
Following the example - https://github.com/awslabs/amazon-sagemaker-examples/blob/master/sagemaker-python-sdk/scikit_learn_inference_pipeline/Inference%20Pipeline%20with%20Scikit-learn%20and%20Linear%20Learner.ipynb

The AddNewFeatures class (an equivalent class in my case) is defined inside script_path = 'sklearn_abalone_featurizer.py'.
The Scikit Estimator is created successfully.
The batch transform step on our training data fails with the error sagemaker_containers._errors.ClientError: module '__main__' has no attribute 'AddNewFeatures'

My guess is that the error is thrown from within model_fn (in sklearn_abalone_featurizer.py)
at the step preprocessor = joblib.load(os.path.join(model_dir, "model.joblib"))
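That guess matches how pickle works: joblib/pickle serializes a custom class by reference (defining module name plus class name), not by copying its code. A class defined in the entry script is therefore recorded under the module '__main__', which the serving container's own '__main__' does not define. A minimal sketch of the mechanism (the class here is only a stand-in mirroring this thread):

```python
import pickle

class AddNewFeatures:
    """Stand-in for the custom transformer in this thread."""
    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X

# pickle records a reference (defining module + class name), not the class code;
# when this file is the entry script, the recorded module is '__main__'
payload = pickle.dumps(AddNewFeatures())
print(b"AddNewFeatures" in payload)  # True: only the name is in the stream
```

Unpickling that payload in a process whose '__main__' does not define AddNewFeatures raises exactly the AttributeError above.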

@tonybaby16

Update:
image

This seems to work.

@ChandraLingam
Author

ChandraLingam commented May 1, 2019

After several hours of trying (including source_dir), the option that finally worked for me was the dependencies parameter of SKLearn:

script_path = 'myscript.py'

sklearn_preprocessor = SKLearn(
    entry_point=script_path,
    role=role,
    train_instance_type="ml.c4.xlarge",
    sagemaker_session=sagemaker_session,
    dependencies=['AddNewFeatures.py'])
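Passing the file via dependencies works because the class then lives in its own importable module rather than in '__main__'. A runnable sketch of that same effect locally (file and class names here are illustrative, mirroring the thread):

```python
import os
import pickle
import sys
import tempfile
import textwrap

# Simulate what `dependencies=['AddNewFeatures.py']` achieves: the transformer
# is defined in its own module file that travels with the job, so unpickling
# can import it by module name instead of looking it up on '__main__'.
src = textwrap.dedent("""\
    class AddNewFeatures:
        def __init__(self, *featurizers):
            self.featurizers = featurizers
        def fit(self, X, y=None):
            return self
        def transform(self, X):
            return X
""")
workdir = tempfile.mkdtemp()
with open(os.path.join(workdir, "AddNewFeatures.py"), "w") as f:
    f.write(src)
sys.path.insert(0, workdir)

from AddNewFeatures import AddNewFeatures

# Round-trips cleanly: the pickle references 'AddNewFeatures.AddNewFeatures'
payload = pickle.dumps(AddNewFeatures())
restored = pickle.loads(payload)
print(type(restored).__module__)  # AddNewFeatures
```

The same import must be present in the entry-point script, so that model_fn can deserialize the fitted pipeline at serving time.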

@wiltonwu
Contributor

Hi,

I apologize for the delay in response. You are exactly correct: the suggested approach is either to bring the file in through the dependencies parameter, or to put the file into a new directory and add that directory using the source_dir parameter. I'm going to close this issue as it has been resolved. Please reopen and comment if necessary!

@DanyalAndriano

I know this issue is closed, but I have the same problem. @ChandraLingam, may I ask what you included in your AddNewFeatures.py? I'm new to SageMaker and still trying to figure this all out.

@DanyalAndriano

@wiltonwu how exactly would I add the script to a new directory (and which directory) and then bring it in with source_dir?

@pranidhii

Update: image

This seems to work.

The link doesn't have any content!

@tonybaby16

Update: image
This seems to work.

The link doesn't have any content!

https://stackoverflow.com/questions/54314876/aws-sagemaker-sklearn-entry-point-allow-multiple-script

@tthpham

tthpham commented Nov 23, 2020

Hi @ChandraLingam and @wiltonwu,
I tried your proposition but still have the issue AttributeError: module '__main__' has no attribute 'DataTransformer' when publishing an endpoint.

Here's my settings:

estimator = SKLearn(
    entry_point="script.py",
    role=role_name,
    train_instance_count=1,                           # training instance count
    train_instance_type=instance_type,                # training instance type
    output_path=f's3://{bucket}/{prefix}/output',     # S3 location for output data
    sagemaker_session=sess,
    framework_version='0.23-1',
    base_job_name=base_job_name,
    hyperparameters={'data_path': dataset_to_train},
    dependencies=['DataTransformer.py'],
    source_dir='s3://mybucket/pyscripts/source.tar.gz')

The files script.py and DataTransformer.py are zipped and uploaded to S3; source_dir points to the .tar.gz file.
How would I modify my script to make it work?

@karthikph007

I am facing the same issue.
Details:
Following the example - https://github.com/awslabs/amazon-sagemaker-examples/blob/master/sagemaker-python-sdk/scikit_learn_inference_pipeline/Inference%20Pipeline%20with%20Scikit-learn%20and%20Linear%20Learner.ipynb

The AddNewFeatures class (an equivalent class in my case) is defined inside script_path = 'sklearn_abalone_featurizer.py'.
The Scikit Estimator is created successfully.
The batch transform step on our training data fails with the error sagemaker_containers._errors.ClientError: module '__main__' has no attribute 'AddNewFeatures'

My guess is that the error is thrown from within model_fn (in sklearn_abalone_featurizer.py)
at the step preprocessor = joblib.load(os.path.join(model_dir, "model.joblib"))

I am working on the same "sklearn_abalone_featurizer.py" and end up with sagemaker_containers._errors.ClientError: module '__main__' has no attribute 'AddNewFeatures'. Could you share how you resolved it?

I have followed the solution mentioned in this link https://stackoverflow.com/questions/54314876/aws-sagemaker-sklearn-entry-point-allow-multiple-script but with no result. I am still stuck with the same error.

@tonybaby16

@karthikph007
What worked for me was to add the source_dir parameter as below. script is a folder containing any helper classes that abc.py needs to import. Hope this helps.

sklearn_preprocessor = SKLearn(
    entry_point='abc.py',
    source_dir='script',
    role=role,
    train_instance_type="ml.c4.xlarge",
    sagemaker_session=sagemaker_session)
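For reference, a plausible layout under this approach (names are illustrative, matching the snippet above):

```
script/
├── abc.py              # entry point: training code plus model_fn
└── AddNewFeatures.py   # helper module defining the custom transformer
```

Everything in source_dir is copied into the container's working directory, so abc.py can do a plain `from AddNewFeatures import AddNewFeatures`.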

@karthikph007

karthikph007 commented Jun 25, 2021

@tonybaby16
Must the script folder contain the helper classes in an AddNewFeatures.py file, or a requirements.txt file?
I have tried creating an AddNewFeatures.py file and pointing to it with the source_dir parameter, but I still get the same error.
AddNewFeatures.py:

from sklearn.pipeline import Pipeline

class DataframeFunctionTransformer():
    def __init__(self, func):
        self.func = func

    def transform(self, input_df, **transform_params):
        return self.func(input_df)

    def fit(self, X, y=None, **fit_params):
        return self

def process_dataframe(input_df):
    input_df["text"] = input_df["text"].map(lambda t: t.upper())
    return input_df

@karthikph007

I have resolved the issue by creating DataframeFunctionTransformer.py with the class and importing it as a module in both training and testing:

from package.DataframeFunctionTransformer import DataframeFunctionTransformer, process_dataframe

Reference taken from:
https://stackoverflow.com/questions/56260720/import-custom-modules-in-amazon-sagemaker-jupyter-notebook
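A minimal sketch of the layout that import implies (assuming the notebook and the package/ directory sit side by side; names as in the comment above):

```
notebook.ipynb
package/
├── __init__.py
└── DataframeFunctionTransformer.py   # defines DataframeFunctionTransformer and process_dataframe
```

The __init__.py (it can be empty) is what makes package/ importable as a Python package.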
