# EvalML Components

From the [overview](overview/html), we saw that each machine learning pipeline consists of individual components that process the data before it is ultimately sent to an estimator. Below we go into more depth on each type of component.

## Component Classes

Components can be split into two distinct classes: **transformers** and **estimators**. Both may need to **fit** on data before they can be used, but that is also where they diverge.
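To make the difference concrete, here is a minimal Python sketch of the two interfaces as abstract base classes. The names `TransformerSketch`, `EstimatorSketch` and `DoublerSketch` are hypothetical and simplified; this is not EvalML's actual class hierarchy, only an illustration of the shared `fit` step and the diverging `transform` and `predict` steps.

```python
from abc import ABC, abstractmethod


class TransformerSketch(ABC):
    """Hypothetical base class: fit learns from X, transform returns altered X."""

    @abstractmethod
    def fit(self, X, y=None):
        """Learn whatever the transformation needs; return self."""

    @abstractmethod
    def transform(self, X):
        """Return an altered copy of X."""

    def fit_transform(self, X, y=None):
        # Convenience: fit, then transform the same data in one call.
        return self.fit(X, y).transform(X)


class EstimatorSketch(ABC):
    """Hypothetical base class: fit learns from X and y, predict returns labels."""

    @abstractmethod
    def fit(self, X, y):
        """Learn a mapping from X to y; return self."""

    @abstractmethod
    def predict(self, X):
        """Return one predicted label per row of X."""


class DoublerSketch(TransformerSketch):
    """Toy transformer that learns nothing and doubles every value."""

    def fit(self, X, y=None):
        return self  # nothing to learn

    def transform(self, X):
        return [2 * v for v in X]


print(DoublerSketch().fit_transform([1, 2, 3]))  # [2, 4, 6]
```

Returning `self` from `fit` is a convention shared with scikit-learn; it is what lets `fit_transform` chain the two calls.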

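```python
import numpy as np
import pandas as pd
from IPython.display import display  # available by default in Jupyter
from evalml.pipelines.components import SimpleImputer

# Two rows, three columns; the second row is missing a value in column 1
X = pd.DataFrame([[1, 2, 3], [1, np.nan, 3]])
display(X)
```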

Transformers take data as input and output altered data. An example is an imputer, which fills in missing values. To fill them with the mean or median of a column, an imputer must first learn those values from the data (X).

A transformer can fit on data and then transform it in two steps, by calling `.fit()` and then `.transform()`, or in one step by calling `.fit_transform()`.

```python
# Learn the mean of each column, then fill the missing values with it
imp = SimpleImputer(impute_strategy="mean")
X = imp.fit_transform(X)

display(X)
```
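What an imputer learns during `fit` and applies during `transform` can be sketched with plain pandas. `ImputerSketch` is a hypothetical class written for illustration, not EvalML's `SimpleImputer`; it supports only the `"mean"` and `"median"` strategies mentioned above. The sketch also shows why the two-step form is useful: values learned from one dataset can be reused on new data.

```python
import numpy as np
import pandas as pd


class ImputerSketch:
    """Hypothetical imputer, not EvalML's SimpleImputer."""

    def __init__(self, strategy="mean"):
        if strategy not in ("mean", "median"):
            raise ValueError(f"unsupported strategy: {strategy}")
        self.strategy = strategy

    def fit(self, X, y=None):
        # Learn one fill value per column; pandas skips NaN when aggregating.
        self.fill_values_ = X.mean() if self.strategy == "mean" else X.median()
        return self

    def transform(self, X):
        # Fill each column's missing entries with the value learned in fit.
        return X.fillna(self.fill_values_)

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)


X = pd.DataFrame([[1, 2, 3], [1, np.nan, 3]])

# Two steps: learn the column means from X, then apply them.
imp = ImputerSketch(strategy="mean").fit(X)
two_step = imp.transform(X)

# One step: same result.
one_step = ImputerSketch(strategy="mean").fit_transform(X)
print(two_step.equals(one_step))  # True

# The fitted imputer reuses the means learned from X on new data.
X_new = pd.DataFrame([[np.nan, np.nan, 5]])
print(imp.transform(X_new))  # missing entries become 1.0 and 2.0
```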

On the other hand, an estimator fits on data (X) and labels (y) so that it can take new data as input and return predicted labels as output. An estimator is fit on data and labels by calling `.fit()`, and then makes predictions on new data by calling `.predict()`. An example of this is the `LogisticRegressionClassifier`. Here it is trained on the imputed X from the previous step, which shows how a transformer prepares data for an estimator: many estimators, including logistic regression, cannot be fit on data that still contains missing values.

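```python
from evalml.pipelines.components import LogisticRegressionClassifier

clf = LogisticRegressionClassifier()

# X is the imputed data from the previous cell; y holds one label per row.
# This toy data only illustrates the API, not a meaningful model.
y = pd.Series([1, 0])

clf.fit(X, y)
clf.predict(X)  # predicting on the training data here only for illustration
```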
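The fit/predict contract can be sketched without EvalML. The example below swaps in a nearest-centroid classifier because it fits in a few lines; it is not logistic regression, which learns weights by optimization instead. `NearestCentroidSketch` is a hypothetical name, and the data is made up for illustration.

```python
import numpy as np


class NearestCentroidSketch:
    """Hypothetical classifier: fit stores the mean feature vector of each class,
    predict returns the label of the closest mean. Not EvalML's API."""

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        # One centroid (row) per class label.
        self.centroids_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        # Euclidean distance from every row of X to every centroid: shape (n_rows, n_classes).
        dists = np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=2)
        return self.classes_[dists.argmin(axis=1)]


X_train = [[0, 0], [0, 1], [5, 5], [6, 5]]
y_train = [0, 0, 1, 1]

clf = NearestCentroidSketch().fit(X_train, y_train)
print(clf.predict([[0.5, 0.5], [5.5, 5.0]]))  # [0 1]
```

Note that `fit` sees both X and y, while `predict` only sees new X; that asymmetry is what separates an estimator from a transformer.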

## Component Types
Components can be further separated into different types that serve different purposes. Below we go over the types of transformers and estimators.


### Transformer Types

* Imputer: fills in missing data
    * Ex: `SimpleImputer`
* Scaler: rescales numerical features, for example to zero mean and unit variance
    * Ex: `StandardScaler`
* Encoder: converts categorical data into numerical features
    * Ex: `OneHotEncoder`
* Feature Selection: keeps the most useful columns of data and drops the rest
    * Ex: `RFClassifierSelectFromModel`
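Scaling and encoding can be sketched with plain pandas. `StandardScalerSketch` is a hypothetical class that applies the usual standardization (subtract the column mean, divide by the population standard deviation); it is not EvalML's `StandardScaler`. The encoding half uses pandas' `get_dummies` as a stand-in for one-hot encoding rather than EvalML's `OneHotEncoder`; unlike a fitted encoder, `get_dummies` does not remember the categories it has seen, so new data could produce different columns.

```python
import pandas as pd


class StandardScalerSketch:
    """Hypothetical scaler: learns per-column mean and std in fit, applies them in transform."""

    def fit(self, X):
        self.mean_ = X.mean()
        self.std_ = X.std(ddof=0)  # population standard deviation
        return self

    def transform(self, X):
        return (X - self.mean_) / self.std_


# Scaler: numerical column rescaled to zero mean and unit variance
X_num = pd.DataFrame({"age": [20.0, 30.0, 40.0]})
scaled = StandardScalerSketch().fit(X_num).transform(X_num)
print(scaled["age"].tolist())  # roughly [-1.22, 0.0, 1.22]

# Encoder: one 0/1 column per category
X_cat = pd.DataFrame({"color": ["red", "blue", "red"]})
encoded = pd.get_dummies(X_cat, columns=["color"], dtype=int)
print(encoded)  # columns color_blue and color_red
```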

### Estimator Types

* Regressor: predicts numerical or continuous labels
    * Ex: `LinearRegressor`
* Classifier: predicts categorical or discrete labels
    * Ex: `XGBoostClassifier`
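The practical difference between the two estimator types shows up in the output. As a hedged illustration, `LinearRegressorSketch` below is a hypothetical ordinary least squares regressor built on `numpy.linalg.lstsq`, not EvalML's `LinearRegressor`. Its predictions are real numbers, including values it never saw during training, whereas a classifier's predictions come from the fixed set of labels present in y.

```python
import numpy as np


class LinearRegressorSketch:
    """Hypothetical ordinary least squares regressor (not EvalML's LinearRegressor)."""

    @staticmethod
    def _design(X):
        X = np.asarray(X, dtype=float)
        # Prepend a column of ones so the model learns an intercept.
        return np.column_stack([np.ones(len(X)), X])

    def fit(self, X, y):
        A = self._design(X)
        self.coef_, *_ = np.linalg.lstsq(A, np.asarray(y, dtype=float), rcond=None)
        return self

    def predict(self, X):
        return self._design(X) @ self.coef_


# Training data follows y = 1 + 2x exactly
X_train = [[0], [1], [2], [3]]
y_train = [1, 3, 5, 7]

model = LinearRegressorSketch().fit(X_train, y_train)
print(model.coef_)              # approximately [1.0, 2.0] (intercept, slope)
print(model.predict([[1.5]]))   # approximately [4.0], a value not present in y_train
```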