# Custom One-Versus-All Logistic Regression

By: Haiyan Cai, Joe Sellett, and Cole Wagner

## Preparation and Overview

[2 points] Explain the task and what business-case or use-case it is designed to solve (or designed to investigate). Detail exactly what the classification task is and what parties would be interested in the results. For example, would the model be deployed or used mostly for offline analysis? As in previous labs, also detail how well the classifier needs to perform in order to be useful.

[.5 points] (mostly the same processes as from previous labs) Define and prepare your class variables. Use proper variable representations (int, float, one-hot, etc.). Use pre-processing methods (as needed) for dimensionality reduction, scaling, etc. Remove variables that are not needed/useful for the analysis (give reasoning). Describe the final dataset that is used for classification/regression (include a description of any newly formed variables you created). Provide a breakdown of the variables after preprocessing (such as the mean, std, etc. for all variables, including numeric and categorical). 

[.5 points] Divide your data into training and testing splits using an 80% training and 20% testing split. Use the cross validation modules that are part of scikit-learn. Argue "for" or "against" splitting your data using an 80/20 split. That is, why is the 80/20 split appropriate (or not) for your dataset?  
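
As a rough illustration of the split described above, the sketch below applies scikit-learn's `train_test_split` to the Iris data that the modeling section also loads. The `stratify` and `random_state` arguments, and the `*_demo` variable names, are assumptions added here for class balance and reproducibility; they are not part of the lab prompt.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
X_demo, y_demo = iris.data, iris.target

# 80/20 split; stratify keeps the class proportions the same in both partitions
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=42
)

print(X_tr.shape, X_te.shape)  # (120, 4) (30, 4)
print(np.bincount(y_te))       # per-class counts in the test partition
```

With only 150 rows, the test partition holds 30 samples, so whether 80/20 is appropriate depends largely on how much variance in the test estimate is acceptable.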

## Modeling

In [16]:
import numpy as np
from numpy.linalg import pinv
from scipy.special import expit
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


### Base Class (Steepest Descent)

Class definition pulled from: https://github.com/eclarson/MachineLearningNotebooks/blob/master/06.%20Optimization.ipynb

**Modifications:** Documentation and type hints.

In [None]:
class BinaryLogisticRegression:
    """Binary Logistic Regression using gradient descent.

    Parameters
    ----------
    eta : float
        Learning rate.
    iterations : int, optional (default=20)
        Number of iterations for the solver.
    C : float, optional (default=0.001)
        Constant applied to the regularization term.

    """

    def __init__(
        self,
        eta: float,
        iterations: int = 20,
        C: float = 0.001,
    ) -> None:
        """Initialize the BinaryLogisticRegression object.

        Parameters
        ----------
        eta : float
            Learning rate.
        iterations : int, optional (default=20)
            Number of iterations for the solver.
        C : float, optional (default=0.001)
            Constant applied to the regularization term.

        """
        self.eta = eta
        self.iters = iterations
        self.C = C
        # internally we will store the weights as self.w_ to keep with sklearn conventions

    def __str__(self) -> str:
        """Return a message for the BinaryLogisticRegression object."""
        if hasattr(self, "w_"):
            return (
                "Binary Logistic Regression Object with coefficients:\n"
                + str(self.w_)
            )  # if we have trained the object
        return "Untrained Binary Logistic Regression Object"

    # convenience, private:
    @staticmethod
    def _add_bias(X: np.ndarray) -> np.ndarray:
        return np.hstack((np.ones((X.shape[0], 1)), X))  # add bias term

    @staticmethod
    def _sigmoid(theta: np.ndarray) -> np.ndarray:
        # increase stability, redefine sigmoid operation
        return expit(theta)  # 1/(1+np.exp(-theta))

    # vectorized gradient calculation with regularization using L2 Norm
    def _get_gradient(self, X: np.ndarray, y: np.ndarray) -> np.ndarray:
        ydiff = (
            y - self.predict_proba(X, add_bias=False).ravel()
        )  # get y difference
        gradient = np.mean(
            X * ydiff[:, np.newaxis], axis=0
        )  # make ydiff a column vector and multiply through

        gradient = gradient.reshape(self.w_.shape)
        gradient[1:] += -2 * self.w_[1:] * self.C

        return gradient

    # public:
    def predict_proba(
        self, X: np.ndarray, add_bias: bool = True
    ) -> np.ndarray:
        """Predict the probability of the positive class.

        Parameters
        ----------
        X : np.array
            Input data.
        add_bias : bool, optional (default=True)
            Whether to add a bias term to the input data.

        """
        # add bias term if requested
        Xb = self._add_bias(X) if add_bias else X
        return self._sigmoid(Xb @ self.w_)  # return the probability y=1

    def predict(self, X: np.ndarray) -> np.ndarray:
        """Predict the discrete labels based on a cutoff of p > 0.5.

        Parameters
        ----------
        X : np.array
            Input data.

        """
        return self.predict_proba(X) > 0.5  # return the actual prediction

    def fit(self, X: np.ndarray, y: np.ndarray) -> None:
        """Fit the model to data.

        Parameters
        ----------
        X : np.array
            Input data.
        y : np.array
            Target labels.

        """
        Xb = self._add_bias(X)  # add bias term
        num_samples, num_features = Xb.shape

        self.w_ = np.zeros(
            (num_features, 1)
        )  # init weight vector to zeros

        # for as many as the max iterations
        for _ in range(self.iters):
            gradient = self._get_gradient(Xb, y)
            self.w_ += gradient * self.eta  # multiply by learning rate
            # add because we are maximizing the log-likelihood
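
For reference, the update that `fit` and `_get_gradient` carry out can be written as below. This is a reading of the code above rather than a formula quoted from the source notebook: $g$ denotes the sigmoid, $\eta$ is `eta`, $N$ is the number of samples, and $w_0$ is the bias weight, which the code leaves unregularized.

$$
\begin{aligned}
l(\mathbf{w}) &= \frac{1}{N}\sum_{i=1}^{N}\Big[y^{(i)}\ln g\big(\mathbf{w}^{T}\mathbf{x}^{(i)}\big) + \big(1-y^{(i)}\big)\ln\Big(1-g\big(\mathbf{w}^{T}\mathbf{x}^{(i)}\big)\Big)\Big] - C\sum_{j\ge 1} w_j^{2} \\
\nabla l(\mathbf{w}) &= \frac{1}{N}\sum_{i=1}^{N}\Big(y^{(i)} - g\big(\mathbf{w}^{T}\mathbf{x}^{(i)}\big)\Big)\mathbf{x}^{(i)} - 2C\,[0, w_1, \dots, w_d]^{T} \\
\mathbf{w} &\leftarrow \mathbf{w} + \eta\,\nabla l(\mathbf{w})
\end{aligned}
$$

The plus sign in the last line is why the code adds the scaled gradient: the objective is a log-likelihood to be maximized, not a loss to be minimized.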

### Stochastic Gradient Descent

Class definition pulled from: https://github.com/eclarson/MachineLearningNotebooks/blob/master/06.%20Optimization.ipynb

**Modifications:** documentation and static typing.

In [None]:
class StochasticLogisticRegression(BinaryLogisticRegression):
    """Logistic Regression using stochastic gradient descent.

    Parameters
    ----------
    eta : float
        Learning rate.
    iterations : int, optional (default=20)
        Number of iterations for the solver.
    C : float, optional (default=0.001)
        Constant applied to the regularization term.

    """

    # stochastic gradient calculation
    def _get_gradient(self, X: np.ndarray, y: np.ndarray) -> np.ndarray:
        # grab a subset of samples in a mini-batch
        mini_batch_size = 50
        idxs = np.random.choice(len(y), mini_batch_size)

        ydiff = (
            y[idxs] - self.predict_proba(X[idxs], add_bias=False).ravel()
        )  # get y difference (one entry per mini-batch sample)
        gradient = np.mean(
            X[idxs] * ydiff[:, np.newaxis], axis=0
        )  # make ydiff a column vector and multiply through

        gradient = gradient.reshape(self.w_.shape)
        gradient[1:] += -2 * self.w_[1:] * self.C

        return gradient
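
Note that `np.random.choice(len(y), mini_batch_size)` draws indices with replacement by default, so a batch can contain repeated rows. A minimal sketch of the same sampling step with a seeded generator follows; the helper name `sample_minibatch` and its `replace` flag are illustrative assumptions, not part of the class above.

```python
import numpy as np


def sample_minibatch(n_samples, batch_size, rng, replace=True):
    """Return row indices for one mini-batch (hypothetical helper)."""
    # without replacement, the batch cannot exceed the dataset size
    if not replace:
        batch_size = min(batch_size, n_samples)
    return rng.choice(n_samples, size=batch_size, replace=replace)


rng = np.random.default_rng(0)
with_rep = sample_minibatch(120, 50, rng)                    # may repeat rows
without_rep = sample_minibatch(120, 50, rng, replace=False)  # all rows distinct

print(len(with_rep), len(np.unique(without_rep)))  # 50 50
```

Seeding the generator makes a stochastic run repeatable, which helps when comparing solvers.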

### Newton's Method

Class definition pulled from: https://github.com/eclarson/MachineLearningNotebooks/blob/master/06.%20Optimization.ipynb

**Modifications:** documentation and static typing.

In [None]:
class HessianBinaryLogisticRegression(BinaryLogisticRegression):
    """Logistic Regression using Newton's method for optimization.

    Parameters
    ----------
    eta : float
        Learning rate.
    iterations : int, optional (default=20)
        Number of iterations for the solver.
    C : float, optional (default=0.001)
        Constant applied to the regularization term.

    """

    # just overwrite gradient function
    def _get_gradient(self, X: np.ndarray, y: np.ndarray) -> np.ndarray:
        g = self.predict_proba(
            X, add_bias=False
        ).ravel()  # get sigmoid value for all samples
        hessian = (
            X.T @ np.diag(g * (1 - g)) @ X - 2 * self.C
        )  # calculate the hessian

        ydiff = y - g  # get y difference
        gradient = np.sum(
            X * ydiff[:, np.newaxis], axis=0
        )  # make ydiff a column vector and multiply through
        gradient = gradient.reshape(self.w_.shape)
        gradient[1:] += -2 * self.w_[1:] * self.C

        # Note the pinv(): the pseudo-inverse keeps the step defined
        # even when the hessian is singular
        return pinv(hessian) @ gradient
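
One detail worth checking when reading the Hessian expression above is how NumPy treats `- 2 * self.C`: subtracting a scalar from a matrix shifts every entry, not only the diagonal. The sketch below contrasts that with the textbook L2 curvature term, which adds `2 * C` on the diagonal only. The data are random stand-ins, and the names `scalar_form` and `diagonal_form` are hypothetical; this is an illustration of broadcasting, not a replacement for the class above.

```python
import numpy as np

rng = np.random.default_rng(0)
X_toy = rng.normal(size=(6, 3))        # toy design matrix (made-up data)
g = rng.uniform(0.1, 0.9, size=6)      # stand-in sigmoid outputs
C = 0.01

XtDX = X_toy.T @ np.diag(g * (1 - g)) @ X_toy

scalar_form = XtDX - 2 * C                 # same form as the class above
diagonal_form = XtDX + 2 * C * np.eye(3)   # textbook L2 term, diagonal only

# every entry of scalar_form is shifted by -2C
print(np.allclose(scalar_form - XtDX, -2 * C))  # True

# diagonal_form leaves the off-diagonal entries untouched
off_diag = ~np.eye(3, dtype=bool)
print(np.allclose(diagonal_form[off_diag], XtDX[off_diag]))  # True
```

For small `C` the two forms give similar steps, so the difference mainly matters when the regularization constant is large.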

### One-Versus-All Multiclass Classification

Class definition pulled from: https://github.com/eclarson/MachineLearningNotebooks/blob/master/06.%20Optimization.ipynb

**Modifications:** Added documentation, refactored `predict_proba()` to use list comprehension.

In [None]:
class MultiClassLogisticRegression:
    """MultiClass Logistic Regression using One-Versus-All approach.

    Parameters
    ----------
    eta : float
        Learning rate.
    iterations : int, optional (default=20)
        Number of iterations for the solver.
    C : float, optional (default=0.0001)
        Constant applied to the regularization term.
    solver : class, optional (default=HessianBinaryLogisticRegression)
        Solver class to use for binary logistic regression.

    """

    def __init__(
        self,
        eta: float,
        iterations: int = 20,
        C: float = 0.0001,
        solver: type = HessianBinaryLogisticRegression,
    ) -> None:
        """Initialize the MultiClassLogisticRegression object.

        Parameters
        ----------
        eta : float
            Learning rate.
        iterations : int, optional (default=20)
            Number of iterations for the solver.
        C : float, optional (default=0.0001)
            Constant applied to the regularization term.
        solver : class, optional (default=HessianBinaryLogisticRegression)
            Solver class to use for binary logistic regression.

        """
        self.eta = eta
        self.iters = iterations
        self.C = C
        self.solver = solver
        self.classifiers_ = []
        # internally we will store the weights as self.w_

    def __str__(self) -> str:
        """Return a message for the MultiClassLogisticRegression object."""
        if hasattr(self, "w_"):
            return (
                "MultiClass Logistic Regression Object with coefficients:\n"
                + str(self.w_)
            )  # if we have trained the object
        return "Untrained MultiClass Logistic Regression Object"

    def fit(self, X: np.ndarray, y: np.ndarray) -> None:
        """Fit the model.

        Parameters
        ----------
        X : np.array
            Input data.
        y : np.array
            Target labels.

        """
        num_samples, num_features = X.shape
        self.unique_ = np.sort(np.unique(y))  # get each unique class value
        self.classifiers_ = []
        for i, yval in enumerate(self.unique_):  # for each unique value
            y_binary = np.array(y == yval).astype(
                int
            )  # create a binary problem
            # train the binary classifier for this class

            hblr = self.solver(
                eta=self.eta, iterations=self.iters, C=self.C
            )
            hblr.fit(X, y_binary)

            # add the trained classifier to the list
            self.classifiers_.append(hblr)

        # save all the weights into one matrix, separate row for each class
        self.w_ = np.hstack([x.w_ for x in self.classifiers_]).T

    def predict_proba(self, X: np.ndarray) -> np.ndarray:
        """Predict the probability of each class.

        Parameters
        ----------
        X : np.array
            Input data.

        """
        probs = [
            hblr.predict_proba(X).reshape((len(X), 1))
            for hblr in self.classifiers_
        ]
        return np.hstack(probs)  # make into single matrix

    def predict(self, X: np.ndarray) -> np.ndarray:
        """Predict the discrete labels.

        Parameters
        ----------
        X : np.array
            Input data.

        """
        return self.unique_[
            np.argmax(self.predict_proba(X), axis=1)
        ]  # take argmax along row
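
The decision rule in `predict` can be illustrated on its own. In the sketch below the probability values are made up, and `classes` plays the role of `self.unique_`; the normalization at the end is an optional extra that the class above does not perform.

```python
import numpy as np

classes = np.array([0, 1, 2])  # plays the role of self.unique_
# rows are samples, columns are per-class sigmoid outputs (made-up values)
probs = np.array([
    [0.90, 0.20, 0.05],
    [0.10, 0.40, 0.70],
    [0.30, 0.35, 0.10],
])

labels = classes[np.argmax(probs, axis=1)]  # same rule as predict()
print(labels)  # [0 2 1]

# rows need not sum to 1, since each column comes from a separate binary model
print(probs.sum(axis=1))

# optional: rescale each row if normalized scores are wanted (argmax unchanged)
normalized = probs / probs.sum(axis=1, keepdims=True)
```

Because each binary classifier is trained independently, the outputs are better read as per-class scores than as a joint probability distribution.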

In [28]:
ds = load_iris()
X = ds.data

# X = StandardScaler().fit(X).transform(X)
y_not_binary = (
    ds.target
)  # note problem is NOT binary anymore, there are three classes!
X_train, X_test, y_train, y_test = train_test_split(
    X, y_not_binary, train_size=0.8, test_size=0.2
)


lr = MultiClassLogisticRegression(
    eta=1.0,
    iterations=4,
    C=0.01,
    solver=HessianBinaryLogisticRegression,
)
lr.fit(X_train, y_train)
print(lr)

yhat = lr.predict(X_test)
print("Accuracy of: ", accuracy_score(y_test, yhat))

MultiClass Logistic Regression Object with coefficients:
[[ -9.78265133   2.08829579   2.65681509  -2.63973866  -2.35725176]
 [  8.85621934  -0.38997276  -3.14389254   1.37751022  -2.69916585]
 [-12.91894208  -1.01274131   0.17373293   2.11562596   5.07584625]]
Accuracy of:  0.9333333333333333
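
To put the reported accuracy in context: 20% of the 150 Iris samples is 30 test rows, so each misclassification moves the score by about 3.3 percentage points. The short check below recovers the implied counts from the printed value. Because the split above sets no `random_state`, a rerun may produce a different figure.

```python
n_test = 30                      # 20% of the 150 Iris samples
accuracy = 0.9333333333333333    # value printed above

n_correct = round(accuracy * n_test)
print(n_correct, n_test - n_correct)  # 28 2
```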
