
Stochastic Gradient Descent (SGD) is a simple yet efficient optimization algorithm used to find the values of parameters/coefficients of functions that minimize a cost function. 
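To make the idea concrete, here is a minimal pure-Python sketch of SGD for a linear classifier with hinge loss and an L2 penalty. It is only an illustration and not scikit-learn's implementation: the names `sgd_hinge` and `predict_sign`, the constant step size `eta`, and the fixed (unshuffled) sample order are assumptions chosen for readability.

```python
def sgd_hinge(X, y, eta=0.1, alpha=0.0001, epochs=20):
    """Fit w, b for f(x) = w.x + b, updating on one sample at a time.

    Labels in y must be -1 or +1. Hypothetical helper for illustration.
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            # Gradient of the L2 penalty: shrink the weights slightly
            w = [wj - eta * alpha * wj for wj in w]
            # Subgradient of the hinge loss: nonzero only when the margin is below 1
            if yi * score < 1:
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
                b += eta * yi
    return w, b


def predict_sign(w, b, x):
    """Return +1 or -1 depending on the side of the hyperplane."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score > 0 else -1


# Small linearly separable toy set with labels encoded as -1 / +1
X_toy = [[-1, -1], [-2, -1], [1, 1], [2, 1]]
y_toy = [-1, -1, 1, 1]
w, b = sgd_hinge(X_toy, y_toy)
predictions = [predict_sign(w, b, x) for x in X_toy]
```

Each update looks at a single sample, which is what makes the method cheap per step compared with computing the gradient over the full training set.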

**PARAMETERS**

loss − str, default = ‘hinge’

It specifies the loss function to be minimized during training. The default value is ‘hinge’, which gives us a linear SVM. The other options are −

log − This loss gives us logistic regression, i.e. a probabilistic classifier (the option is named ‘log_loss’ in scikit-learn 1.1 and later).

modified_huber − a smooth loss that brings tolerance to outliers along with probability estimates.

squared_hinge − similar to ‘hinge’ loss but it is quadratically penalized.

perceptron − as the name suggests, it is a linear loss which is used by the perceptron algorithm.
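The classification losses listed above can be compared as plain functions of the margin z = y * f(x), with y in {-1, +1}. The sketch below uses the textbook formulas for each loss; the function names are illustrative and are not part of the scikit-learn API.

```python
import math


def hinge(z):
    """Linear SVM loss: zero once the margin reaches 1."""
    return max(0.0, 1.0 - z)


def squared_hinge(z):
    """Like hinge, but margin violations are penalized quadratically."""
    return max(0.0, 1.0 - z) ** 2


def perceptron(z):
    """Penalizes only misclassified samples (negative margin)."""
    return max(0.0, -z)


def log_loss(z):
    """Logistic regression loss."""
    return math.log(1.0 + math.exp(-z))


def modified_huber(z):
    """Quadratic near the margin, linear for badly misclassified samples."""
    if z >= -1.0:
        return max(0.0, 1.0 - z) ** 2
    return -4.0 * z


# Loss of each function for a sample that is correct but inside the margin
losses_at_half = {
    fn.__name__: fn(0.5)
    for fn in (hinge, squared_hinge, perceptron, log_loss, modified_huber)
}
```

At z = 0.5 the perceptron loss is already zero, while hinge still charges 0.5: this is the practical difference between "correctly classified" and "correctly classified with a margin".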

penalty − str, ‘none’, ‘l2’, ‘l1’, ‘elasticnet’

It is the regularization term used in the model. By default, it is L2. We can use L1 or ‘elasticnet’ as well; both can bring sparsity to the model, which is not achievable with L2.

alpha − float, default = 0.0001

Alpha, the constant that multiplies the regularization term, is the tuning parameter that decides how much we want to penalize the model. The default value is 0.0001.

l1_ratio − float, default = 0.15

This is called the ElasticNet mixing parameter. Its range is 0 <= l1_ratio <= 1. If l1_ratio = 1, the penalty would be an L1 penalty. If l1_ratio = 0, the penalty would be an L2 penalty.
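How alpha and l1_ratio interact can be sketched in a few lines. The code assumes the elastic-net formulation given in the scikit-learn user guide, where the L2 part carries a factor of 1/2; the function names are hypothetical.

```python
def elastic_net_penalty(w, l1_ratio=0.15):
    """Mix of L1 and L2 penalties on the weight vector w."""
    l1 = sum(abs(wj) for wj in w)
    l2 = 0.5 * sum(wj * wj for wj in w)
    return l1_ratio * l1 + (1.0 - l1_ratio) * l2


def regularized_objective(data_loss, w, alpha=0.0001, l1_ratio=0.15):
    """Average data loss plus alpha times the penalty."""
    return data_loss + alpha * elastic_net_penalty(w, l1_ratio)


w_example = [3.0, -4.0]
pure_l1 = elastic_net_penalty(w_example, l1_ratio=1.0)   # |3| + |-4|
pure_l2 = elastic_net_penalty(w_example, l1_ratio=0.0)   # 0.5 * (9 + 16)
mixed = elastic_net_penalty(w_example, l1_ratio=0.15)
```

Increasing alpha strengthens the whole penalty, while l1_ratio only changes how it is split between the L1 and L2 parts.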

fit_intercept − Boolean, Default=True

This parameter specifies whether a constant (bias or intercept) should be added to the decision function. If it is set to False, no intercept is used in the calculation and the data is assumed to be already centered.

tol − float or none, optional, default = 1.e-3

This parameter represents the stopping criterion for iterations. Its default value is 1e-3. If it is not None, the iterations will stop when loss > best_loss - tol for n_iter_no_change successive epochs.
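The stopping rule can be traced on a list of per-epoch losses. The sketch below is an illustration of the criterion as stated above, not scikit-learn's code; `stopping_epoch` and the sample loss values are made up for the example.

```python
def stopping_epoch(losses, tol=1e-3, n_iter_no_change=5):
    """Return the 1-based epoch at which training would stop, or None."""
    best_loss = float("inf")
    no_improvement = 0
    for epoch, loss in enumerate(losses, start=1):
        # An epoch counts as "no improvement" unless it beats the best loss by tol
        if loss > best_loss - tol:
            no_improvement += 1
        else:
            no_improvement = 0
        if loss < best_loss:
            best_loss = loss
        if no_improvement >= n_iter_no_change:
            return epoch
    return None


# The loss plateaus at 0.4 from the third epoch onward
plateau = [1.0, 0.5, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4]
stop_fast = stopping_epoch(plateau, n_iter_no_change=3)
stop_default = stopping_epoch(plateau)
never = stopping_epoch([1.0, 0.9, 0.8])
```

A larger tol or a smaller n_iter_no_change stops training sooner; max_iter still caps the number of epochs either way.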

shuffle − Boolean, optional, default = True

This parameter specifies whether the training data should be shuffled after each epoch.

	
verbose − integer, default = 0

It represents the verbosity level. Its default value is 0.

epsilon − float, default = 0.1

This parameter specifies the width of the insensitive region. If loss = ‘epsilon_insensitive’, any difference between the current prediction and the correct label that is smaller than this threshold is ignored.

	
max_iter − int, optional, default = 1000

As the name suggests, it represents the maximum number of passes over the training data, i.e. epochs.

warm_start − bool, optional, default = false

With this parameter set to True, we can reuse the solution of the previous call to fit as initialization. If we keep the default, i.e. False, the previous solution is erased.

	
learning_rate − string, optional, default = ‘optimal’

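If learning rate is ‘constant’, eta = eta0;

If learning rate is ‘optimal’, eta = 1.0/(alpha*(t+t0)), where t0 is chosen by a heuristic proposed by Leon Bottou;

If learning rate is ‘invscaling’, eta = eta0/pow(t, power_t);

If learning rate is ‘adaptive’, eta = eta0 as long as the training loss keeps decreasing. Each time n_iter_no_change consecutive epochs fail to decrease the training loss by tol (or fail to increase the validation score by tol when early_stopping is True), the current learning rate is divided by 5.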
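The schedules above can be written as one small function of the step counter t. This is a sketch under stated assumptions: `t0` is passed in by hand (scikit-learn derives it from a heuristic), and `n_plateaus` is a hypothetical counter of how many times an ‘adaptive’ run has stalled, each stall dividing the rate by 5.

```python
def learning_rate(schedule, t, eta0=0.01, alpha=0.0001, power_t=0.5,
                  t0=1.0, n_plateaus=0):
    """Step size at update t (t >= 1) for the named schedule."""
    if schedule == "constant":
        return eta0
    if schedule == "optimal":
        return 1.0 / (alpha * (t + t0))
    if schedule == "invscaling":
        return eta0 / pow(t, power_t)
    if schedule == "adaptive":
        # Assumption: the rate is eta0 divided by 5 once per detected plateau
        return eta0 / (5 ** n_plateaus)
    raise ValueError(f"unknown schedule: {schedule!r}")


# invscaling decays with the square root of t when power_t = 0.5
decay = [learning_rate("invscaling", t, eta0=1.0) for t in (1, 4, 16)]
```

With ‘constant’, ‘invscaling’, and ‘adaptive’ the value of eta0 matters, so it has to be set to a positive number when one of those schedules is chosen.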

eta0 − double, default = 0.0

It represents the initial learning rate for the above-mentioned learning rate options, i.e. ‘constant’, ‘invscaling’, or ‘adaptive’.

power_t − double, default = 0.5

It is the exponent for the ‘invscaling’ learning rate.

early_stopping − bool, default = False

This parameter controls whether early stopping is used to terminate training when the validation score is not improving. Its default value is False. When set to True, it automatically sets aside a stratified fraction of the training data as a validation set and stops training when the validation score does not improve by at least tol for n_iter_no_change consecutive epochs.

validation_fraction − float, default = 0.1

It is only used when early_stopping is True. It represents the proportion of training data to set aside as a validation set for early stopping.

n_iter_no_change − int, default=5

It represents the number of iterations with no improvement that the algorithm should wait before stopping.

class_weight − dict, {class_label: weight} or “balanced”, or None, optional

This parameter represents the weights associated with classes. If not provided, the classes are supposed to have weight 1.
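The “balanced” option is not spelled out above. In scikit-learn it weights each class inversely to its frequency, using n_samples / (n_classes * count). The sketch below reproduces that formula in plain Python; `balanced_class_weights` is a hypothetical helper, not a library function.

```python
from collections import Counter


def balanced_class_weights(y):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(y)
    n_samples = len(y)
    n_classes = len(counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in counts.items()
    }


# Class 1 appears three times, class 2 only once
weights = balanced_class_weights([1, 1, 1, 2])
```

The rare class ends up with the larger weight, so its errors count more in the loss.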

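average − Boolean or int, optional, default = False

When set to True, it computes the averaged SGD weights across all updates and stores the result in the coef_ attribute. If set to an int greater than 1, averaging begins once the total number of samples seen reaches that value.

n_jobs − int or None, optional, default = None

It represents the number of CPUs to be used in OVA (One Versus All) computation, for multi-class problems. The default value is None, which means 1.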

In [4]:
import numpy as np
from sklearn import linear_model

# Toy training set: two samples per class, linearly separable
X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]])
Y = np.array([1, 1, 2, 2])

In [5]:
# Default hinge loss (linear SVM) trained with an elastic-net penalty
SGDClf = linear_model.SGDClassifier(max_iter=1000, tol=1e-3, penalty="elasticnet")
SGDClf.fit(X, Y)

SGDClassifier(penalty='elasticnet')

In [6]:
SGDClf.predict([[2.,2.]])

array([2])

In [7]:
SGDClf.coef_

array([[19.54811198,  9.77200712]])

The decision_function method returns the signed distance to the hyperplane:

In [8]:
SGDClf.decision_function([[2., 2.]])

array([48.6402382])
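The decision value can be reproduced by hand as w.x + b. The sketch below copies the coefficients printed in this particular run; the intercept is not printed in the notebook, so the value -10.0 is an assumption inferred from the two outputs shown (48.6402382 minus the dot product). A different run of SGD would give different numbers.

```python
coef = [19.54811198, 9.77200712]  # SGDClf.coef_[0] as printed in this run
intercept = -10.0                 # assumption: inferred, not printed in the notebook
classes = [1, 2]                  # sorted class labels present in Y


def decision_value(x):
    """Signed score w.x + b for a single sample."""
    return sum(wj * xj for wj, xj in zip(coef, x)) + intercept


score = decision_value([2.0, 2.0])
# For a binary problem, a positive score selects the second class
predicted = classes[1] if score > 0 else classes[0]
```

A positive score maps to the second class label, which agrees with the prediction of 2 shown earlier for the same point.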

SGDRegressor

**PARAMETERS**

For the SGDRegressor loss parameter, the possible values are as follows −

squared_loss − It refers to the ordinary least squares fit (the option is named ‘squared_error’ in scikit-learn 1.0 and later).

huber − It reduces the influence of outliers by switching from squared to linear loss past a distance of epsilon. In effect, ‘huber’ modifies ‘squared_loss’ so that the algorithm focuses less on correcting outliers.

epsilon_insensitive − It ignores errors smaller than epsilon and is linear past that.

squared_epsilon_insensitive − It is the same as epsilon_insensitive, except that it becomes a squared loss past a tolerance of epsilon.
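The four regression losses can be compared as functions of the residual r = y - prediction. The sketch uses the textbook formulas with a factor of 1/2 on the squared terms, which is an assumed convention here; the function names are illustrative only.

```python
def squared_error(r):
    """Ordinary least squares loss."""
    return 0.5 * r * r


def huber(r, epsilon=0.1):
    """Squared for small residuals, linear once |r| exceeds epsilon."""
    a = abs(r)
    if a <= epsilon:
        return 0.5 * r * r
    return epsilon * a - 0.5 * epsilon * epsilon


def epsilon_insensitive(r, epsilon=0.1):
    """Zero inside the epsilon tube, linear outside it."""
    return max(0.0, abs(r) - epsilon)


def squared_epsilon_insensitive(r, epsilon=0.1):
    """Zero inside the epsilon tube, squared outside it."""
    return max(0.0, abs(r) - epsilon) ** 2


# A large residual is penalized much less by huber than by squared_error
outlier = 10.0
ratio = huber(outlier) / squared_error(outlier)
```

For a residual of 10 the Huber loss is roughly 2% of the squared loss, which is why it is the more robust choice when the targets contain outliers.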

In [9]:
import numpy as np
from sklearn import linear_model

# Random regression problem: 10 samples, 5 features, fixed seed
n_samples, n_features = 10, 5
rng = np.random.RandomState(0)
y = rng.randn(n_samples)
X = rng.randn(n_samples, n_features)

# Huber loss with an elastic-net penalty and averaged SGD weights
SGDReg = linear_model.SGDRegressor(
    max_iter=1000, penalty="elasticnet", loss="huber", tol=1e-3, average=True
)
SGDReg.fit(X, y)

SGDRegressor(average=True, loss='huber', penalty='elasticnet')

In [10]:
SGDReg.coef_

array([-0.00493986,  0.00277446, -0.00411315,  0.00585336,  0.00458203])
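The regressor above is fit with average=True, which in scikit-learn turns on averaged SGD: coef_ holds the average of the weights over the updates rather than the last iterate. The sketch below shows only the averaging step on a made-up weight history; `running_average` is a hypothetical helper and the numbers are not taken from the fit above.

```python
def running_average(weight_history):
    """Average a sequence of weight vectors incrementally."""
    avg = [0.0] * len(weight_history[0])
    for n, w in enumerate(weight_history, start=1):
        # Incremental mean: move the average toward w by 1/n of the gap
        avg = [a + (wj - a) / n for a, wj in zip(avg, w)]
    return avg


# Three successive weight vectors from an imaginary SGD run
history = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
averaged = running_average(history)
```

Averaging smooths out the noise of individual updates, which can make the final coefficients less sensitive to the order of the samples.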