# Process Overview

### Import convention
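Estimators are imported from the submodule of their model family (``family`` and ``Model`` are placeholders):
``from sklearn.family import Model``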

In [1]:
from sklearn.linear_model import LinearRegression
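
The same convention applies to other model families. The sketch below is illustrative only: the three estimators are arbitrary picks, not ones used elsewhere in this notebook.

```python
# Each estimator lives in a submodule named after its model family.
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.cluster import KMeans

families = []
for cls in (LogisticRegression, RandomForestClassifier, KMeans):
    # __module__ looks like "sklearn.<family>.<private module>"
    family = cls.__module__.split(".")[1]
    families.append(family)
    print(f"sklearn.{family} -> {cls.__name__}")
```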

### Estimator parameters

In [3]:
model = LinearRegression(normalize=True)  # normalize: deprecated in scikit-learn 1.0, removed in 1.2
print(model)

LinearRegression(normalize=True)
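
Parameters set in the constructor can also be inspected and changed afterwards. A minimal sketch, assuming the `fit_intercept` parameter as the example (it is not used elsewhere in this notebook):

```python
from sklearn.linear_model import LinearRegression

# Parameters are passed as keyword arguments to the constructor.
model = LinearRegression(fit_intercept=False)

# get_params() returns the current parameters as a dict.
params = model.get_params()
print(params["fit_intercept"])

# set_params() updates parameters on the existing estimator.
model.set_params(fit_intercept=True)
print(model.get_params()["fit_intercept"])
```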


### Train test split

In [10]:
import numpy as np
from sklearn.model_selection import train_test_split

In [12]:
X, y = np.arange(10).reshape((5, 2)), range(5)
X, y

(array([[0, 1],
        [2, 3],
        [4, 5],
        [6, 7],
        [8, 9]]),
 range(0, 5))

In [15]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
X_train, X_test, y_train, y_test

(array([[0, 1],
        [8, 9],
        [6, 7]]),
 array([[2, 3],
        [4, 5]]),
 [0, 4, 3],
 [1, 2])
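
The split above is shuffled randomly, so it can differ between runs. A small sketch of making it reproducible by fixing `random_state` (the value 0 is an arbitrary choice):

```python
import numpy as np
from sklearn.model_selection import train_test_split

X, y = np.arange(10).reshape((5, 2)), list(range(5))

# The same random_state yields the same shuffle, hence the same split.
split_a = train_test_split(X, y, test_size=0.3, random_state=0)
split_b = train_test_split(X, y, test_size=0.3, random_state=0)
same = all(np.array_equal(a, b) for a, b in zip(split_a, split_b))
print(same)

# With 5 samples and test_size=0.3, 2 samples go to the test set.
X_train, X_test, y_train, y_test = split_a
print(X_train.shape, X_test.shape)
```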

### Fit the model

In [19]:
model.fit(X_train, y_train)

LinearRegression(normalize=True)

Fitting with ``normalize=True`` triggers a deprecation warning from scikit-learn. The warning recommends scaling the data with a StandardScaler in a Pipeline preprocessing stage instead. To reproduce the previous behavior:

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

model = make_pipeline(StandardScaler(with_mean=False), LinearRegression())

To pass a sample_weight parameter, pass it as a fit parameter to each step of the pipeline:

kwargs = {s[0] + '__sample_weight': sample_weight for s in model.steps}
model.fit(X, y, **kwargs)
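
The pipeline approach can be tried end to end with the sketch below. The uniform `sample_weight` array and the variable names `pipe` and `fit_kwargs` are assumptions made for illustration; the data reuses the toy arrays from the split example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X = np.arange(10, dtype=float).reshape((5, 2))
y = np.arange(5, dtype=float)
sample_weight = np.ones(len(y))  # hypothetical uniform weights

pipe = make_pipeline(StandardScaler(with_mean=False), LinearRegression())

# make_pipeline names each step after its lowercased class name, so the
# keys become "standardscaler__sample_weight" and
# "linearregression__sample_weight".
fit_kwargs = {name + "__sample_weight": sample_weight for name, _ in pipe.steps}
pipe.fit(X, y, **fit_kwargs)

print(sorted(fit_kwargs))
print(pipe.score(X, y))
```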

### Get predictions

In [20]:
predictions = model.predict(X_test)

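Some classification estimators also provide a ``model.predict_proba()`` method, which returns the predicted probability of each class for every sample. ``model.predict()`` then typically returns the label with the highest probability.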
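
To illustrate the difference between the two methods, here is a minimal sketch using `LogisticRegression` on a made-up four-sample dataset (both the estimator choice and the data are assumptions for the example):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])

clf = LogisticRegression().fit(X, y)

# One row per sample, one column per class; each row sums to 1.
proba = clf.predict_proba(X)
labels = clf.predict(X)

print(proba.shape)
print(np.allclose(proba.sum(axis=1), 1.0))
# The predicted label is the class with the largest probability.
print(np.array_equal(labels, clf.classes_[proba.argmax(axis=1)]))
```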

### Evaluate model
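- Compare predictions to the correct values
- The evaluation method depends on the type of ML algorithm (classification, regression, clustering)
- For regressors such as ``LinearRegression``, ``model.score()`` returns the coefficient of determination (R²)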

In [23]:
model.score(X_test, y_test)

1.0
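
Beyond ``model.score()``, the functions in `sklearn.metrics` compare predictions to true values directly. A short sketch with made-up values, showing two regression metrics and one classification metric:

```python
from sklearn.metrics import accuracy_score, mean_squared_error, r2_score

# Regression: hypothetical true values and predictions.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 5.0]
mse = mean_squared_error(y_true, y_pred)  # mean of squared errors
r2 = r2_score(y_true, y_pred)             # 1 - SS_res / SS_tot
print(mse, r2)

# Classification: fraction of labels predicted correctly.
labels_true = [0, 1, 1, 0]
labels_pred = [0, 1, 0, 0]
acc = accuracy_score(labels_true, labels_pred)
print(acc)
```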