# Linear Regression

Also known as ordinary least squares (OLS) linear regression.

LinearRegression fits a linear model with coefficients w = (w1, …, wp) that minimizes the residual sum of squares between the observed targets in the dataset and the targets predicted by the linear approximation.

In [3]:
# Importing 
import numpy as np
from sklearn.linear_model import LinearRegression

# Features
X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]])
# y = 1 * x_0 + 2 * x_1 + 3 -> Target
y = np.dot(X, np.array([1, 2])) + 3
# Fit method
reg = LinearRegression().fit(X, y)

# Return the coefficient of determination R^2 of the prediction.
print(reg.score(X, y))
# Return predicted value of y for X (array-like) using the linear model. 
print(reg.predict(np.array([[3, 5]]))) 

1.0
[16.]
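
As a cross-check on the fit above, the same least-squares problem can be solved directly with NumPy. This is a minimal sketch and not necessarily how LinearRegression works internally; the names X_aug, coef and intercept are illustrative. It appends a column of ones so the intercept is estimated as one more coefficient, then minimizes the residual sum of squares with np.linalg.lstsq.

```python
import numpy as np

# Same toy data as in the cell above
X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]], dtype=float)
y = X @ np.array([1.0, 2.0]) + 3.0

# Append a column of ones so the intercept becomes an extra coefficient
X_aug = np.hstack([X, np.ones((X.shape[0], 1))])

# Solve min_w ||X_aug @ w - y||^2 (the residual sum of squares)
w, _, rank, _ = np.linalg.lstsq(X_aug, y, rcond=None)
coef, intercept = w[:2], w[2]

print(coef, intercept)
# Prediction for a new point, computed by hand from the fitted parameters
print(np.array([3.0, 5.0]) @ coef + intercept)
```

Because the data is noise-free, the recovered values should agree with the generating equation y = 1 * x_0 + 2 * x_1 + 3 up to floating-point error.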


# Logistic Regression

Also known as logit regression or the maximum-entropy (MaxEnt) classifier.

In the multiclass case, the training algorithm uses the one-vs-rest (OvR) scheme if the ‘multi_class’ option is set to ‘ovr’, and uses the cross-entropy loss if the ‘multi_class’ option is set to ‘multinomial’. (Currently the ‘multinomial’ option is supported only by the ‘lbfgs’, ‘sag’, ‘saga’ and ‘newton-cg’ solvers.)

In [5]:
# Importing
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Load the data
X, y = load_iris(return_X_y=True)
# Fit the model
clf = LogisticRegression(random_state=0, max_iter=10000000).fit(X, y)
# Predict
print(clf.predict(X[:2, :]))
# Predict class probabilities
print(clf.predict_proba(X[:2, :]))
# Classifier Score
print(clf.score(X, y))

[0 0]
[[9.81585270e-01 1.84147159e-02 1.45076469e-08]
 [9.71333598e-01 2.86663719e-02 3.02076222e-08]]
0.9733333333333334
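
To make the cross-entropy (multinomial) case more concrete, the sketch below recomputes class probabilities by applying a softmax to the raw scores from decision_function. It assumes the installed scikit-learn version uses the multinomial formulation for this solver, which I believe is the default in recent releases; under the one-vs-rest scheme the probabilities are computed differently and the final comparison would not hold. The names shifted and manual_proba are illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000).fit(X, y)

# Raw per-class scores, one column per class
scores = clf.decision_function(X[:2, :])

# Softmax: exponentiate and normalize each row. Subtracting the row
# maximum first avoids overflow and does not change the result.
shifted = scores - scores.max(axis=1, keepdims=True)
manual_proba = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)

print(manual_proba)
# Compare against the library's own probabilities (assumes multinomial mode)
print(np.allclose(manual_proba, clf.predict_proba(X[:2, :])))
```

Whichever scheme is in use, the class with the largest score is the one returned by predict.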


# Decision Tree

Decision Trees (DTs) are a non-parametric supervised learning method used for classification and regression. The goal is to create a model that predicts the value of a target variable by learning simple decision rules inferred from the data features. A tree can be seen as a piecewise constant approximation.

In [7]:
# Importing
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Classifier object
clf = DecisionTreeClassifier(random_state=0)
# Load the data -> contains the features and the target
iris = load_iris()
# Cross-validation score
print(cross_val_score(clf, iris.data, iris.target, cv=10))

[1.         0.93333333 1.         0.93333333 0.93333333 0.86666667
 0.93333333 1.         1.         1.        ]


# Random Forest 

A random forest is a meta estimator that fits a number of decision tree classifiers on various sub-samples of the dataset and uses averaging to improve the predictive accuracy and control over-fitting. The sub-sample size is controlled with the max_samples parameter if bootstrap=True (default), otherwise the whole dataset is used to build each tree.

In [10]:
# Importing
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, n_features=4,
                           n_informative=2, n_redundant=0,
                           random_state=0, shuffle=False)
# Classifier object
clf = RandomForestClassifier(max_depth=2, random_state=0)
# Fit the model
clf.fit(X, y)
# Score
print(clf.score(X, y))
# Predict
print(clf.predict([[0, 0, 0, 0]]))

0.946
[1]
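
The averaging step described above can be checked by hand. This sketch, reusing the same synthetic data, averages the class probabilities of the individual trees in estimators_ and compares the result with the forest's own predict_proba. The names per_tree, manual_proba and matches are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=4,
                           n_informative=2, n_redundant=0,
                           random_state=0, shuffle=False)
clf = RandomForestClassifier(max_depth=2, random_state=0).fit(X, y)

# Class probabilities from every fitted tree: shape (n_trees, n_samples, n_classes)
per_tree = np.stack([tree.predict_proba(X) for tree in clf.estimators_])

# Average over the trees and compare with the forest's output
manual_proba = per_tree.mean(axis=0)
matches = np.allclose(manual_proba, clf.predict_proba(X))
print(len(clf.estimators_), matches)
```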


# SVM

Support vector machines (SVMs) are a set of supervised learning methods used for classification, regression and outlier detection.

The advantages of support vector machines are:

- Effective in high dimensional spaces.
- Still effective in cases where the number of dimensions is greater than the number of samples.
- Uses a subset of training points in the decision function (called support vectors), so it is also memory efficient.


In [11]:
# Importing
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
# Features
X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]])
# Targets
y = np.array([1, 1, 2, 2])
# Pipeline
clf = make_pipeline(StandardScaler(), SVC(gamma='auto'))
# Fit the model
clf.fit(X, y)
# Classifier score
print(clf.score(X, y))

1.0
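
The memory-efficiency point in the list of advantages can be illustrated by counting support vectors. The sketch below uses synthetic, well-separated clusters (an assumption made here for illustration, not data from this notebook) and a linear kernel, then reports how many training points the fitted model keeps as support vectors.

```python
import numpy as np
from sklearn.svm import SVC

# Two synthetic, well-separated clusters (illustrative data)
rng = np.random.default_rng(0)
X_neg = rng.normal(loc=-2.0, scale=0.5, size=(50, 2))
X_pos = rng.normal(loc=2.0, scale=0.5, size=(50, 2))
X = np.vstack([X_neg, X_pos])
y = np.array([1] * 50 + [2] * 50)

clf = SVC(kernel="linear").fit(X, y)

# Only the support vectors enter the decision function
n_sv = clf.support_vectors_.shape[0]
print(n_sv, "support vectors out of", X.shape[0], "training points")
# Number of support vectors per class
print(clf.n_support_)
```

The remaining points could be dropped after training without changing the decision function.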