In [1]:
from sklearn.linear_model import LinearRegression
import numpy as np

### Training the Estimator  
Estimators learn from data. Each estimator has a <code>fit</code> method that takes training data (X) and, for supervised tasks, target values (y) as input and learns the underlying patterns in the data.  
Estimators are used for classification, regression, and clustering.

In [2]:
x = np.array([[1], [2], [3], [4], [5]])  # features
y = np.array([2, 4, 5, 4, 5])  # target variables

# x must be 2D with shape (n_samples, n_features); y is 1D with shape (n_samples,)
# create the linear regression estimator
model = LinearRegression()

# fit the model
model.fit(x, y)

print("intercept: ", model.intercept_)
print("coeff: ", model.coef_)

intercept:  2.2
coeff:  [0.6]
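
To make the fitted values above less of a black box, the sketch below recomputes them with the textbook closed-form formulas for simple (one-feature) least squares. The names `x_flat`, `slope`, and `intercept` are illustrative and not part of scikit-learn. This only shows that the result matches, not how `LinearRegression` solves the problem internally.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([[1], [2], [3], [4], [5]])  # features, shape (5, 1)
y = np.array([2, 4, 5, 4, 5])            # targets, shape (5,)

model = LinearRegression().fit(x, y)

# Closed-form simple linear regression (illustrative, one feature only):
#   slope     = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x)^2)
#   intercept = mean_y - slope * mean_x
x_flat = x.ravel()
x_centered = x_flat - x_flat.mean()
slope = np.sum(x_centered * (y - y.mean())) / np.sum(x_centered ** 2)
intercept = y.mean() - slope * x_flat.mean()

# Both checks should print True if the formulas agree with the fitted model
print(np.isclose(slope, model.coef_[0]))
print(np.isclose(intercept, model.intercept_))
```

For this data the sums work out to 6 / 10, which gives the slope of 0.6 printed above.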


#### Predicting now on new, unseen data

In [3]:
x_new = np.array([[6]]) # still 2D
y_pred = model.predict(x_new)
y_pred  # a NumPy array with one prediction per row of x_new


array([5.8])

Many estimators also have a <code>score</code> method that evaluates the model's performance on a given dataset X and the corresponding target values y.

[!NOTE]  
For classification models, the <code>score</code> method typically returns the mean accuracy, while for regression models it returns the R-squared value (the coefficient of determination).

In [4]:
score = model.score(x, y)
print("r^2 = ", score)
# R^2 of about 0.6: the model explains roughly 60% of the variance in y (measured here on the training data)

r^2 =  0.6000000000000001
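
As a rough check of what `score` reports, the sketch below recomputes R-squared by hand for the regression above and compares a classifier's `score` with `accuracy_score`. The classification data (`x_clf`, `labels`) is a made-up toy set used only for illustration, and `LogisticRegression` is just one convenient choice of classifier.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.metrics import accuracy_score

# Regression: R^2 = 1 - SS_res / SS_tot
x = np.array([[1], [2], [3], [4], [5]])
y = np.array([2, 4, 5, 4, 5])
reg = LinearRegression().fit(x, y)

y_hat = reg.predict(x)
ss_res = np.sum((y - y_hat) ** 2)       # residual sum of squares
ss_tot = np.sum((y - y.mean()) ** 2)    # total sum of squares
r2_manual = 1 - ss_res / ss_tot
print(np.isclose(r2_manual, reg.score(x, y)))

# Classification: score is the fraction of correctly predicted labels
x_clf = np.array([[0.0], [1.0], [2.0], [3.0]])  # hypothetical toy features
labels = np.array([0, 0, 1, 1])                 # hypothetical toy labels
clf = LogisticRegression().fit(x_clf, labels)

acc = clf.score(x_clf, labels)
print(acc == accuracy_score(labels, clf.predict(x_clf)))
```

Both comparisons are made on the training data, so they say nothing about how well either model generalizes.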


### Transformers  
Transformers preprocess and transform data.  
They implement two main methods: <code>fit</code> and <code>transform</code>.


>  Similar to estimators, the fit method of a transformer learns parameters from the data. However, unlike estimators, transformers **don't** necessarily need a target variable y. For example, a StandardScaler learns the mean and standard deviation of each feature in the data.

In [5]:
from sklearn.preprocessing import StandardScaler

In [6]:
x = np.array(
    [
        [1, 2],
        [3, 4],
        [5, 6],
    ]
)

# creating StandardScaler transformer
scaler = StandardScaler()

# fit the transformer
scaler.fit(X=x)  # computes and stores each column's mean and standard deviation; the data itself is not changed
print("mean: ", scaler.mean_)
print("scale: ", scaler.scale_)  # per-column standard deviation (population, ddof=0)

mean:  [3. 4.]
scale:  [1.63299316 1.63299316]


In [7]:
x_transformed = scaler.transform(x)
print("transformed: ", x_transformed)
# NOTE: transform applies the scaling, i.e. new_value = (original_value - mean) / standard_deviation

transformed:  [[-1.22474487 -1.22474487]
 [ 0.          0.        ]
 [ 1.22474487  1.22474487]]
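
The scaling formula in the note above can be verified directly with NumPy. This is a minimal sketch assuming the default `StandardScaler` settings (centering and scaling both enabled); `manual` is an illustrative name.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

x = np.array([[1, 2], [3, 4], [5, 6]])
scaler = StandardScaler().fit(x)

# Column-wise statistics; NumPy's std defaults to ddof=0 (population std)
mean = x.mean(axis=0)
std = x.std(axis=0)

# Apply (value - mean) / std to every column
manual = (x - mean) / std

print(np.allclose(mean, scaler.mean_))
print(np.allclose(std, scaler.scale_))
print(np.allclose(manual, scaler.transform(x)))
```

Using the sample standard deviation instead (`ddof=1`) would give different values, which is a common source of small mismatches when comparing against other tools.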


In [8]:
# fit_transform method: does fit and transform in a single call
x_transformed = scaler.fit_transform(x)
print("Transformed data: \n", x_transformed)

Transformed data: 
 [[-1.22474487 -1.22474487]
 [ 0.          0.        ]
 [ 1.22474487  1.22474487]]
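
One reason `fit` and `transform` are separate methods is that a fitted transformer can be reused on rows it has not seen. The sketch below first confirms that `fit_transform` matches `fit` followed by `transform`, then scales a new row with the stored statistics. The row `x_new` is a hypothetical example, not data from this notebook's earlier cells.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

x_train = np.array([[1, 2], [3, 4], [5, 6]])
x_new = np.array([[7, 8]])  # hypothetical unseen row

scaler = StandardScaler()
x_train_scaled = scaler.fit_transform(x_train)

# fit_transform should give the same result as fit followed by transform
two_step = StandardScaler().fit(x_train).transform(x_train)
print(np.allclose(x_train_scaled, two_step))

# transform alone reuses the mean and std learned from x_train;
# it does not re-fit on x_new
x_new_scaled = scaler.transform(x_new)
print(x_new_scaled)
```

Calling `fit_transform` on `x_new` instead would recompute the statistics from that single row, which is usually not what is wanted for new data.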


### Pipelines  
A pipeline is a powerful tool for chaining multiple transformers and estimators into a single workflow.  
It consists of a sequence of named steps that run in order. Every step except the last must be a transformer; the final step is typically an estimator that makes predictions.

In [9]:
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

In [None]:
x = np.array([[1, 2], [3, 4], [5, 6]])
y = np.array([3, 6, 9])
# recall regression: 1 dependent variable (y) and 2 independent variables (x1, x2)

# create a pipeline, which chains together multiple steps
# each step = (name, transformer/estimator); steps run in order when .fit() or .predict() is called
pipeline = Pipeline(
    [
        ("scaler", StandardScaler()),
        ("linear_regression", LinearRegression()),
    ]
)

# fitting pipeline to the data:
pipeline.fit(x, y)

# predict on new data:
x_new = np.array([[7, 8]])
y_pred = pipeline.predict(x_new)
y_pred

array([12.])
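
To show what the pipeline does on each call, the sketch below repeats the same two steps by hand and compares the predictions. The names `manual_scaler`, `manual_reg`, and `manual_pred` are illustrative. `named_steps` is used to look up a fitted step by the name given in the pipeline definition.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

x = np.array([[1, 2], [3, 4], [5, 6]])
y = np.array([3, 6, 9])
x_new = np.array([[7, 8]])

pipeline = Pipeline(
    [
        ("scaler", StandardScaler()),
        ("linear_regression", LinearRegression()),
    ]
)
pipeline.fit(x, y)
pipe_pred = pipeline.predict(x_new)

# Manual equivalent: fit the scaler, then fit the regression on scaled data
manual_scaler = StandardScaler().fit(x)
manual_reg = LinearRegression().fit(manual_scaler.transform(x), y)

# At prediction time, new data is only transformed, never re-fitted
manual_pred = manual_reg.predict(manual_scaler.transform(x_new))
print(np.allclose(pipe_pred, manual_pred))

# Fitted steps stay accessible by name
print(pipeline.named_steps["scaler"].mean_)
```

The pipeline version keeps the scaling parameters and the model together in one object, so the same preprocessing is applied automatically whenever `predict` is called.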