Create MLPrimitive for feature engineering pipeline #85

micahjsmith · 2021-05-31T17:24:47Z

Creating an MLPrimitive for Ballet feature engineering pipelines will allow these pipelines to be included as a step in an MLBlocks MLPipeline.

- The primitive is generic and works for any Ballet project.
- The primitive takes an optional init param giving the fully-qualified name of the Ballet project.
- Implement an adapter that:
  - takes the init param
  - initializes a ballet.client.Client
  - loads the desired project
  - accesses the feature engineering pipeline instance
  - creates a function that returns a deepcopy of the pipeline
  - returns that function
- If the init param is not given, the project is detected from the current working directory.
- Expose the primitive in new Ballet entry points.
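The adapter steps above can be sketched as follows. This is a hedged illustration, not Ballet's implementation: FakeClient, FakeProject, load_project, and make_pipeline_factory are hypothetical stand-ins so the example runs without Ballet installed, and the real ballet.client.Client API may differ. The point it shows is the design choice in the last two steps: the project is resolved once, and each call to the returned function hands out an independent deep copy of the pipeline.

```python
import copy
from typing import Any, Callable, Optional


class FakeProject:
    """Hypothetical stand-in for a loaded Ballet project."""

    def __init__(self, name: str):
        self.name = name
        # Stand-in for the project's feature engineering pipeline instance.
        self.pipeline = {"project": name, "steps": ["feature_a", "feature_b"]}


class FakeClient:
    """Hypothetical stand-in for ballet.client.Client (real API may differ)."""

    def load_project(self, name: Optional[str] = None) -> FakeProject:
        # name=None models "detect the project from the cwd"; the
        # placeholder string below is not real detection logic.
        return FakeProject(name if name is not None else "<detected-from-cwd>")


def make_pipeline_factory(
    project_name: Optional[str] = None,
    client_factory: Callable[[], Any] = FakeClient,
) -> Callable[[], Any]:
    """Adapter: resolve the project once, then return a pipeline factory."""
    client = client_factory()                    # initialize the client
    project = client.load_project(project_name)  # load the desired project
    pipeline = project.pipeline                  # access the pipeline instance

    def factory() -> Any:
        # A fresh deep copy per call, so fitting one MLPipeline never
        # mutates the project's shared pipeline instance.
        return copy.deepcopy(pipeline)

    return factory


factory = make_pipeline_factory("predict_census_income")
first, second = factory(), factory()
print(first == second, first is second)  # True False
```

Returning a function rather than the pipeline itself presumably fits how MLBlocks obtains a primitive instance (by calling the object named in the annotation's "primitive" field), so every MLPipeline gets its own unfitted copy.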

Prototype annotation (not generic; specific to the predict_census_income project):

```json
{
    "name": "predict_census_income.engineer_features",
    "contributors": [
        "Micah Smith <micahs@mit.edu>"
    ],
    "documentation": "",
    "description": "Applies the feature engineering pipeline from the predict_census_income project",
    "classifiers": {
        "type": "preprocessor",
        "subtype": "transformer"
    },
    "modalities": [],
    "primitive": "predict_census_income.api.make_feature_engineering_pipeline",
    "fit": {
        "method": "fit",
        "args": [
            {
                "name": "X",
                "type": "pandas.DataFrame"
            },
            {
                "name": "y",
                "type": "pandas.DataFrame"
            }
        ]
    },
    "produce": {
        "method": "transform",
        "args": [
            {
                "name": "X",
                "type": "pandas.DataFrame"
            }
        ],
        "output": [
            {
                "name": "X",
                "type": "pandas.DataFrame"
            }
        ]
    },
    "hyperparameters": {}
}
```
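To make the annotation's "fit" and "produce" sections concrete, here is a simplified, stdlib-only model of the calls they describe. It is not MLBlocks' actual implementation: run_block, ToyPipeline, and the plain lists standing in for pandas.DataFrame are assumptions for illustration. Roughly, the object named by "primitive" is called to get an instance, the "fit" method is called with the arguments listed in fit.args, and the result of the "produce" method is stored under the name given in produce.output.

```python
import json

# Trimmed copy of the prototype annotation: only the keys used below.
ANNOTATION = json.loads("""
{
  "fit": {"method": "fit", "args": [{"name": "X"}, {"name": "y"}]},
  "produce": {
    "method": "transform",
    "args": [{"name": "X"}],
    "output": [{"name": "X"}]
  },
  "hyperparameters": {}
}
""")


class ToyPipeline:
    """Hypothetical stand-in for the feature engineering pipeline."""

    def fit(self, X, y):
        self.mean_ = sum(X) / len(X)  # learn a statistic from training data
        return self

    def transform(self, X):
        return [x - self.mean_ for x in X]


def make_feature_engineering_pipeline():
    # Hypothetical factory playing the role of the "primitive" entry.
    return ToyPipeline()


def run_block(annotation, primitive, context):
    """Simplified model of how an annotation drives a primitive."""
    instance = primitive()  # no arguments: no hyperparameters are declared
    fit = annotation["fit"]
    getattr(instance, fit["method"])(
        **{arg["name"]: context[arg["name"]] for arg in fit["args"]})
    produce = annotation["produce"]
    result = getattr(instance, produce["method"])(
        **{arg["name"]: context[arg["name"]] for arg in produce["args"]})
    # Write the output back under its declared name, replacing X.
    context[produce["output"][0]["name"]] = result
    return context


context = run_block(ANNOTATION, make_feature_engineering_pipeline,
                    {"X": [1.0, 2.0, 3.0], "y": [0, 1, 0]})
print(context["X"])  # [-1.0, 0.0, 1.0]
```

Because the output is named "X", the transformed features overwrite the raw ones, which is what lets the next block in the pipeline read them as its own "X".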

micahjsmith · 2021-05-31T17:35:31Z

Something like the following gives the feature engineering pipeline access to the unencoded targets (y_df) for supervised transformations, while the downstream classifier is trained on the encoded targets (y):

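```python
import mlblocks
from ballet import b
from sklearn.metrics import classification_report

# Raw training and validation splits from the current Ballet project
X_df, y_df = b.api.load_data()
X_df_te, y_df_te = b.api.load_data(input_dir='data/val')

# Encode targets for the classifier; y_df stays unencoded
encoder = b.api.encoder
y = encoder.fit_transform(y_df)
y_te = encoder.transform(y_df_te)

pipeline = mlblocks.MLPipeline(
    primitives=[
        'predict_census_income.engineer_features',
        'sklearn.ensemble.RandomForestClassifier',
    ],
    # Route the unencoded targets (context key 'y_df') to the feature
    # engineering block's 'y' argument; the classifier still reads 'y'
    input_names={
        'predict_census_income.engineer_features#1': {
            'y': 'y_df',
        }
    },
)
# Extra keyword arguments such as y_df are made available to the blocks
pipeline.fit(X_df, y, y_df=y_df)

# In-sample performance on the training split
y_pred = pipeline.predict(X_df)
report = classification_report(y, y_pred, output_dict=True)

# Performance on the held-out validation split
y_pred_te = pipeline.predict(X_df_te)
report_te = classification_report(y_te, y_pred_te, output_dict=True)
```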
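The input_names mapping is what routes y_df to the feature engineering step while the classifier keeps receiving y. Below is a simplified model, assuming (as MLBlocks does) that keyword arguments passed to pipeline.fit join a shared context dictionary from which each block reads its arguments. gather_args is a hypothetical helper, not an MLBlocks function, and the target values are made up.

```python
def gather_args(arg_names, context, input_names=None):
    """Select a block's arguments from the shared pipeline context.

    input_names maps the block's own argument name to the context key
    it should read instead, e.g. {"y": "y_df"}.
    """
    input_names = input_names or {}
    return {name: context[input_names.get(name, name)] for name in arg_names}


# Made-up values: y is the encoded target, y_df holds the raw labels.
context = {"X": [[0.1], [0.2], [0.3]], "y": [0, 1, 1], "y_df": ["neg", "pos", "pos"]}

fe_args = gather_args(["X", "y"], context, input_names={"y": "y_df"})
clf_args = gather_args(["X", "y"], context)
print(fe_args["y"])   # ['neg', 'pos', 'pos']
print(clf_args["y"])  # [0, 1, 1]
```

Since only the feature engineering block is renamed, predict needs no y_df at all: its transform step declares only X.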

micahjsmith · 2021-06-11T12:49:19Z

Added in #86

micahjsmith added the enhancement New feature or request label May 31, 2021

micahjsmith closed this as completed Jun 11, 2021
