# EvalML Components

Components are the lowest level of building blocks in EvalML. Each component represents a fundamental operation to be applied to data.

All components accept parameters as keyword arguments to their `__init__` methods. These parameters can be used to configure behavior.

Each component class definition must include a human-readable `name` for the component. Additionally, each component class may expose parameters for AutoML search by defining a `hyperparameter_ranges` attribute containing the parameters in question.

EvalML splits components into two categories: **transformers** and **estimators**.

## Transformers

Transformers subclass the `Transformer` class, and define a `fit` method to learn information from training data and a `transform` method to apply a learned transformation to new data.

For example, an [imputer](../generated/evalml.pipelines.components.SimpleImputer.ipynb) is configured with the impute strategy to follow, such as filling with the mean value. The imputer's `fit` method learns the mean from the training data, and its `transform` method fills in that learned mean for any missing values in new data.

All transformers can execute `fit` and `transform` separately or in one step by calling `fit_transform`. Defining a custom `fit_transform` method can facilitate useful performance optimizations in some cases.

In [None]:
import numpy as np
import pandas as pd
from evalml.pipelines.components import SimpleImputer

X = pd.DataFrame([[1, 2, 3], [1, np.nan, 3]])
display(X)

In [None]:
imp = SimpleImputer(impute_strategy="mean")
X = imp.fit_transform(X)

display(X)
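
The cell above fits and transforms the same data in one call. To illustrate the separate `fit` and `transform` steps described earlier, here is a minimal pure-Python sketch. `ToyMeanImputer` is a hypothetical stand-in written for this example, not EvalML's `SimpleImputer`, and it represents missing values as `None` in plain lists rather than `NaN` in a DataFrame.

```python
from statistics import mean


class ToyMeanImputer:
    """Toy stand-in for a transformer; not EvalML's SimpleImputer."""

    def fit(self, rows):
        # Learn one mean per column, ignoring missing (None) entries.
        columns = zip(*rows)
        self.means_ = [mean(v for v in col if v is not None) for col in columns]
        return self

    def transform(self, rows):
        # Fill missing entries with the means learned in fit().
        return [
            [self.means_[j] if v is None else v for j, v in enumerate(row)]
            for row in rows
        ]

    def fit_transform(self, rows):
        # Default behaviour: fit, then transform the same data.
        return self.fit(rows).transform(rows)


train = [[1, 2, 3], [1, 4, 3], [1, None, 3]]
new = [[None, None, 5]]

imputer = ToyMeanImputer().fit(train)   # means are learned from train only
filled_new = imputer.transform(new)     # and then applied to unseen rows
filled_train = ToyMeanImputer().fit_transform(train)
print(filled_new)
print(filled_train)
```

The point of the split is that the values used to fill `new` come from `train`, never from `new` itself.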

Below is a list of all transformers included with EvalML:

In [None]:
from evalml.pipelines.components.utils import all_components, Estimator, Transformer
for component in all_components:
    if issubclass(component, Transformer):
        print(f"Transformer: {component.name}")

## Estimators

Each estimator wraps an ML algorithm. Estimators subclass the `Estimator` class, and define a `fit` method to learn information from training data and a `predict` method for generating predictions from new data. Classification estimators should also define a `predict_proba` method for generating predicted probabilities.

Estimator classes each define a `model_family` attribute indicating what type of model is used.

Here's an example of using the [LogisticRegressionClassifier](../generated/evalml.pipelines.components.LogisticRegressionClassifier.ipynb) estimator to fit and predict on a simple dataset:

In [None]:
from evalml.pipelines.components import LogisticRegressionClassifier

clf = LogisticRegressionClassifier()

# X is the imputed DataFrame produced by the SimpleImputer example above
y = [1, 0]

clf.fit(X, y)
clf.predict(X)

Below is a list of all estimators included with EvalML:

In [None]:
from evalml.pipelines.components.utils import all_components, Estimator, Transformer
for component in all_components:
    if issubclass(component, Estimator):
        print(f"Estimator: {component.name}")

# Defining Custom Components

EvalML lets you create your own custom components by following the steps below.

## Class Inheritance

Your custom component must inherit from the correct subclass: `Transformer` for components that transform data or `Estimator` for components that predict new values. Both [Transformer](../generated/evalml.pipelines.components.Transformer.ipynb) and [Estimator](../generated/evalml.pipelines.components.Estimator.ipynb) are subclasses of [ComponentBase](../generated/evalml.pipelines.components.ComponentBase.ipynb).

In [None]:
from evalml.pipelines import Transformer, Estimator

class NewTransformer(Transformer):
    pass

class NewEstimator(Estimator):
    pass

## Required fields

Moreover, your component must override certain fields so that it will work with EvalML pipelines and AutoML. Subclassing without these fields will result in an error.

For a component you need to provide:
- `name` - component's name

- `hyperparameter_ranges` - dictionary mapping each parameter name (string) to its search range, given as a [SkOpt Space](https://scikit-optimize.github.io/stable/modules/classes.html#module-skopt.space.space) or a list of categorical values

Additionally, for an estimator you need to provide:
- `name` - component's name

- `model_family` - EvalML [model_family](../generated/evalml.model_family.ModelFamily.ipynb) that this component belongs to

- `supported_problem_types` - list of EvalML [problem_types](../generated/evalml.problem_types.ProblemTypes.ipynb) that this component supports

Model families and problem types include:

In [None]:
from evalml.model_family import ModelFamily
from evalml.problem_types import ProblemTypes

print([m.value for m in ModelFamily])
print([p.value for p in ProblemTypes])

In [None]:
from skopt.space import Integer, Real

class NewTransformer(Transformer):
    name = 'New Transformer'
    hyperparameter_ranges = {
        "parameter_1": ['a', 'b', 'c']
    }
    
class NewEstimator(Estimator):
    name = 'New Estimator'
    model_family = ModelFamily.LINEAR_MODEL
    supported_problem_types = [ProblemTypes.BINARY, ProblemTypes.MULTICLASS]
    hyperparameter_ranges = {
        "parameter_1": Integer(10, 1000),
        "parameter_2": Real(0.000001, 1),
    }

transformer = NewTransformer()
estimator = NewEstimator()
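
As a rough illustration of how a base class can reject components that omit a required field, the sketch below uses Python's standard `abc` module. `ToyComponentBase`, `CompleteComponent`, and `IncompleteComponent` are hypothetical names for this example. EvalML's own base classes may enforce the requirement differently, and in this sketch the error appears when the incomplete class is instantiated.

```python
from abc import ABC, abstractmethod


class ToyComponentBase(ABC):
    """Toy base class; EvalML's own base classes may enforce this differently."""

    @property
    @abstractmethod
    def name(self):
        """Human-readable component name."""

    @property
    @abstractmethod
    def hyperparameter_ranges(self):
        """Mapping of parameter name to search range."""


class CompleteComponent(ToyComponentBase):
    # Plain class attributes are enough to satisfy the abstract properties.
    name = "Complete Component"
    hyperparameter_ranges = {"parameter_1": ["a", "b", "c"]}


class IncompleteComponent(ToyComponentBase):
    name = "Incomplete Component"  # hyperparameter_ranges is missing


complete = CompleteComponent()  # all required fields present, so this works

try:
    IncompleteComponent()
except TypeError as err:
    # The message names the missing field; exact wording varies by Python version.
    error_message = str(err)
    print(error_message)
```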

## Implementation
By default, EvalML components delegate to a `component_obj` that provides the implementation for methods such as `fit()`, `predict()`, and `transform()`. This can be seen in [ComponentBase](../generated/evalml.pipelines.components.ComponentBase.ipynb), where the base class calls the `fit()` method of the `component_obj`. This applies to both transformers and estimators. You can provide this `component_obj` in the `__init__()` method of your new component.

The `__init__()` method of `ComponentBase` takes three parameters: a `parameters` dictionary holding the component's parameters, the `component_obj` described above, and the `random_state` value. The `__init__()` method of your custom component must call `super().__init__()` and pass in these three parameters. A simple example to follow is the implementation of [SimpleImputer](../generated/evalml.pipelines.components.SimpleImputer.ipynb).

In [None]:
class NewTransformer(Transformer):
    name = 'New Transformer'
    hyperparameter_ranges = {
        "parameter_1": ['a', 'b', 'c']
    }

    def __init__(self, parameter_1, random_state):
        # ThirdPartyTransformer is a placeholder for the object you are wrapping
        transformer = ThirdPartyTransformer(parameter_1)
        parameters = {"parameter_1": parameter_1}
        super().__init__(parameters=parameters,
                         component_obj=transformer,
                         random_state=random_state)
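
To make the delegation pattern concrete without depending on a real third-party library, here is a self-contained toy version. `ToyComponent`, `ThirdPartyScaler`, and `ScalerComponent` are all hypothetical names invented for this sketch. `ToyComponent` only mimics the idea of forwarding calls to a wrapped object and is not EvalML's `ComponentBase`.

```python
class ToyComponent:
    """Toy base class that forwards work to a wrapped object (hypothetical)."""

    def __init__(self, parameters=None, component_obj=None, random_state=0):
        self.parameters = parameters or {}
        self.component_obj = component_obj
        self.random_state = random_state

    def fit(self, X, y=None):
        # Delegate to the wrapped object, then return self so calls can be chained.
        self.component_obj.fit(X, y)
        return self

    def transform(self, X):
        return self.component_obj.transform(X)


class ThirdPartyScaler:
    """Hypothetical third-party object that already has a fit/transform API."""

    def __init__(self, factor):
        self.factor = factor

    def fit(self, X, y=None):
        self.max_ = max(X)
        return self

    def transform(self, X):
        return [self.factor * x / self.max_ for x in X]


class ScalerComponent(ToyComponent):
    name = "Scaler Component"

    def __init__(self, factor=1.0, random_state=0):
        # Record the parameters and hand the wrapped object to the base class.
        super().__init__(parameters={"factor": factor},
                         component_obj=ThirdPartyScaler(factor),
                         random_state=random_state)


component = ScalerComponent(factor=2.0).fit([1, 2, 4])
scaled = component.transform([2, 4])
print(component.parameters, scaled)
```

Because the wrapped object already follows the expected `fit`/`transform` API, the subclass only needs an `__init__()`.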

Furthermore, if your `component_obj` does not adhere to the same API as EvalML `Transformer`s or `Estimator`s, you may need to override the relevant methods. Please refer to the [API reference](https://evalml.alteryx.com/en/latest/api_reference.html#components) for each base class's API and implementation. An example of this can be seen in the implementation of [`CatBoostClassifier`](../generated/evalml.pipelines.components.CatBoostClassifier.ipynb).

In [None]:
class NewEstimator(Estimator):
    name = 'New Estimator'
    model_family = ModelFamily.LINEAR_MODEL
    supported_problem_types = [ProblemTypes.BINARY, ProblemTypes.MULTICLASS]
    hyperparameter_ranges = {
        "parameter_1": Integer(10, 1000),
        "parameter_2": Real(0.000001, 1),
    }
    
    def __init__(self, parameter_1, parameter_2, random_state):
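        # ThirdPartyEstimator is a placeholder for the object you are wrapping
        estimator = ThirdPartyEstimator(parameter_1, parameter_2)
        parameters = {"parameter_1": parameter_1,
                      "parameter_2": parameter_2}
        super().__init__(parameters=parameters,
                         component_obj=estimator,
                         random_state=random_state)

    def fit(self, X, y):
        # The placeholder's API differs from EvalML's, so the calls are adapted here
        self.component_obj.validate_data(X, y)
        return self.component_obj.fit(X, y)

    def predict(self, X):
        # The placeholder is assumed to expose `predicts` rather than `predict`
        return self.component_obj.predicts(X)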

However, if your component does not require a `component_obj`, you will need to override every method you intend to use. This can be seen in [DropNullColumns](../generated/evalml.pipelines.components.DropNullColumns.ipynb).

### Implementation Requirements
EvalML will operate under certain assumptions if your component is used with pipelines.

#### Estimators
- The target is expected to be integers from 0 to n-1.
- The column order of `predict_proba` output must match the class integers, and the column names must be set to those integer values.
- Estimators are not expected to support string targets.
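
When the raw target holds strings, one way to meet these assumptions is to encode the labels before they reach the estimator and map the probability columns back afterwards. The helpers below, `encode_target` and `label_probabilities`, are hypothetical and written only for this sketch. They are not EvalML functions, and the sorted-label ordering is an assumption of the example.

```python
def encode_target(y):
    """Map arbitrary labels to integers 0..n-1, using sorted label order."""
    classes = sorted(set(y))
    index = {label: i for i, label in enumerate(classes)}
    return [index[label] for label in y], classes


def label_probabilities(proba_rows, classes):
    """Pair each probability column j with the original label of class j."""
    return [dict(zip(classes, row)) for row in proba_rows]


y_raw = ["spam", "ham", "spam", "eggs"]
y_encoded, classes = encode_target(y_raw)
print(y_encoded, classes)

# Made-up predict_proba output for one row: column j is the probability of class j.
proba = [[0.1, 0.7, 0.2]]
labeled = label_probabilities(proba, classes)
print(labeled)
```

Keeping the `classes` list alongside the fitted estimator is what lets integer predictions be translated back to the original labels.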