Regression

César Souza edited this page Sep 17, 2016 · 8 revisions

Standard regression problems

In a regression problem, we typically have some input vectors x and some desired output values y. Unlike in classification problems, the output values y are not restricted to class labels; they can be continuous variables or vectors.

Models

Linear Regression

Suppose we have a univariate, continuous set of input data and a corresponding univariate, continuous set of output data, such as a set of points in R². A simple linear regression fits the line relating the input variable to the output variable that minimizes the sum of squared errors between the line and the actual output points.

using Accord.Statistics.Models.Regression.Linear;

// Declare some sample test data.
double[] inputs = { 80, 60, 10, 20, 30 };
double[] outputs = { 20, 40, 30, 50, 60 };

// Use Ordinary Least Squares to learn the regression
OrdinaryLeastSquares ols = new OrdinaryLeastSquares();

// Use OLS to learn the simple linear regression
SimpleLinearRegression regression = ols.Learn(inputs, outputs);

// Compute the output for a given input:
double y = regression.Transform(85); // The answer will be 28.088

// We can also extract the slope and the intercept term
// of the line. Those will be approximately -0.2647 and 50.588, respectively.
double s = regression.Slope;     // -0.264706
double c = regression.Intercept; // 50.588235
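For intuition, the closed-form solution that ordinary least squares finds for a single input can be computed by hand: the slope is the sum of cross-deviations of x and y divided by the sum of squared deviations of x, and the intercept makes the line pass through the point of means. The sketch below is plain C# with no Accord.NET dependency; the names `SimpleOls` and `Fit` are hypothetical and only illustrate the arithmetic, not how the library implements it.

```csharp
using System;

// Hypothetical helper: closed-form ordinary least squares for y = slope * x + intercept.
public static class SimpleOls
{
    public static (double Slope, double Intercept) Fit(double[] x, double[] y)
    {
        if (x.Length != y.Length || x.Length < 2)
            throw new ArgumentException("Need at least two paired observations.");

        // Means of the inputs and the outputs.
        double meanX = 0, meanY = 0;
        for (int i = 0; i < x.Length; i++) { meanX += x[i]; meanY += y[i]; }
        meanX /= x.Length;
        meanY /= y.Length;

        // Sum of cross-deviations (Sxy) and of squared deviations of x (Sxx).
        double sxy = 0, sxx = 0;
        for (int i = 0; i < x.Length; i++)
        {
            double dx = x[i] - meanX;
            sxy += dx * (y[i] - meanY);
            sxx += dx * dx;
        }
        if (sxx == 0) throw new ArgumentException("All inputs are equal; the slope is undefined.");

        double slope = sxy / sxx;                 // minimizes the sum of squared errors
        double intercept = meanY - slope * meanX; // the line passes through (meanX, meanY)
        return (slope, intercept);
    }
}
```

With the data above, both means are 40, Sxy = -900 and Sxx = 3400, so the slope is -900 / 3400 ≈ -0.264706 and the intercept is 40 + 0.264706 · 40 ≈ 50.588235, matching the values reported by `regression.Slope` and `regression.Intercept`; the prediction at x = 85 is ≈ 28.088.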

See Simple Linear Regression

Multivariate Linear Regression

Multivariate linear regression is a generalization of multiple linear regression. In multivariate linear regression, not only are the input variables multivariate, but so are the dependent output variables.

In the following example, we will perform a regression of a 2-dimensional output variable over a 3-dimensional input variable.

double[][] inputs = 
{
    // variables:  x1  x2  x3
    new double[] {  1,  1,  1 }, // input sample 1
    new double[] {  2,  1,  1 }, // input sample 2
    new double[] {  3,  1,  1 }, // input sample 3
};

double[][] outputs = 
{
    // variables:  y1  y2
    new double[] {  2,  3 }, // corresponding output to sample 1
    new double[] {  4,  6 }, // corresponding output to sample 2
    new double[] {  6,  9 }, // corresponding output to sample 3
};

A quick inspection shows that the first output variable y1 is always twice the first input variable x1, and that the second output variable y2 is always three times x1. The other input variables (x2 and x3) are constant and go unused. Nevertheless, we will fit a multivariate regression model to confirm these impressions:

using Accord.Math.Optimization.Losses;
using Accord.Statistics.Models.Regression.Linear;

// Use Ordinary Least Squares to create the regression
OrdinaryLeastSquares ols = new OrdinaryLeastSquares();

// Now, compute the multivariate linear regression:
MultivariateLinearRegression regression = ols.Learn(inputs, outputs);

// We can obtain predictions using
double[][] predictions = regression.Transform(inputs);

// The prediction error is
double error = new SquareLoss(outputs).Loss(predictions); // 0
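A useful way to read a multivariate least-squares fit is that it decouples into one independent least-squares problem per output column: the coefficient matrix is B = (XᵀX)⁻¹XᵀY, and each column of B depends only on the matching column of Y. The plain C# sketch below (hypothetical names, no Accord.NET dependency) illustrates this on the example data. To keep the arithmetic small it deliberately regresses each output column on x1 alone, since x2 and x3 are constant; Accord.NET fits the full three-input model instead.

```csharp
using System;

// Hypothetical sketch: multivariate OLS decouples into one least-squares fit per output column.
// To keep the arithmetic small, each output column is regressed on a single input column.
public static class PerColumnOls
{
    // For every output column k, returns (slope, intercept) of outputs[.][k] ~ inputs[.][col].
    public static (double Slope, double Intercept)[] Fit(double[][] inputs, double[][] outputs, int col)
    {
        int n = inputs.Length;
        int m = outputs[0].Length;
        var fits = new (double Slope, double Intercept)[m];

        double meanX = 0;
        for (int i = 0; i < n; i++) meanX += inputs[i][col];
        meanX /= n;

        for (int k = 0; k < m; k++)
        {
            double meanY = 0;
            for (int i = 0; i < n; i++) meanY += outputs[i][k];
            meanY /= n;

            // Each column only uses its own outputs, so the fits are independent.
            double sxy = 0, sxx = 0;
            for (int i = 0; i < n; i++)
            {
                double dx = inputs[i][col] - meanX;
                sxy += dx * (outputs[i][k] - meanY);
                sxx += dx * dx;
            }
            if (sxx == 0) throw new ArgumentException("The chosen input column is constant.");

            double slope = sxy / sxx;
            fits[k] = (slope, meanY - slope * meanX);
        }
        return fits;
    }
}
```

Calling `PerColumnOls.Fit(inputs, outputs, 0)` on the arrays above should give slopes 2 and 3 with zero intercepts, i.e. y1 = 2·x1 and y2 = 3·x1, which is why the prediction error is zero.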

See Multivariate Linear Regression

Multiple Linear Regression

We will model a plane with an equation of the form "ax + by + c = z". There are two input variables (x and y), and we want to find the two coefficients a and b and the intercept term c.

using Accord.Math.Optimization.Losses;
using Accord.Statistics.Models.Regression.Linear;

// We will use Ordinary Least Squares to create a
// linear regression model with an intercept term
var ols = new OrdinaryLeastSquares()
{
    UseIntercept = true
};

// Now suppose you have some points
double[][] inputs = 
{
    new double[] { 1, 1 },
    new double[] { 0, 1 },
    new double[] { 1, 0 },
    new double[] { 0, 0 },
};

// located in the same Z (z = 1)
double[] outputs = { 1, 1, 1, 1 };

// Use Ordinary Least Squares to estimate a regression model
MultipleLinearRegression regression = ols.Learn(inputs, outputs);

// As a result, we will obtain the following:
double a = regression.Coefficients[0]; // a = 0
double b = regression.Coefficients[1]; // b = 0
double c = regression.Intercept; // c = 1

// This is the plane described by the equation
// ax + by + c = z => 0x + 0y + 1 = z => 1 = z.

// We can compute the predicted points using
double[] predicted = regression.Transform(inputs);

// And the squared error loss using 
double error = new SquareLoss(outputs).Loss(predicted);
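Fitting "ax + by + c = z" by least squares amounts to solving the normal equations (AᵀA)w = Aᵀz, where each row of A is [x, y, 1] and w = [a, b, c]. The plain C# sketch below (hypothetical names, no Accord.NET dependency) builds this 3×3 system and solves it with Gaussian elimination and partial pivoting. It assumes AᵀA is invertible; forming AᵀA explicitly squares the condition number, and Accord.NET may well use a more robust decomposition, so treat this only as an illustration of what is being solved.

```csharp
using System;

// Hypothetical sketch: least-squares plane z = a*x + b*y + c via the normal equations.
public static class PlaneFit
{
    // Returns { a, b, c }.
    public static double[] Fit(double[][] inputs, double[] outputs)
    {
        const int p = 3; // parameters a, b, c
        var ata = new double[p, p];
        var atz = new double[p];

        // Accumulate A^T A and A^T z, where each row of A is [x, y, 1].
        for (int i = 0; i < inputs.Length; i++)
        {
            double[] row = { inputs[i][0], inputs[i][1], 1.0 };
            for (int r = 0; r < p; r++)
            {
                atz[r] += row[r] * outputs[i];
                for (int c = 0; c < p; c++)
                    ata[r, c] += row[r] * row[c];
            }
        }
        return Solve(ata, atz);
    }

    // Gaussian elimination with partial pivoting; overwrites its arguments.
    private static double[] Solve(double[,] m, double[] v)
    {
        int n = v.Length;
        for (int k = 0; k < n; k++)
        {
            // Pick the row with the largest entry in column k as the pivot.
            int pivot = k;
            for (int r = k + 1; r < n; r++)
                if (Math.Abs(m[r, k]) > Math.Abs(m[pivot, k])) pivot = r;
            if (Math.Abs(m[pivot, k]) < 1e-12)
                throw new InvalidOperationException("The normal equations are singular.");

            for (int c = 0; c < n; c++) (m[k, c], m[pivot, c]) = (m[pivot, c], m[k, c]);
            (v[k], v[pivot]) = (v[pivot], v[k]);

            // Eliminate the entries below the pivot.
            for (int r = k + 1; r < n; r++)
            {
                double f = m[r, k] / m[k, k];
                for (int c = k; c < n; c++) m[r, c] -= f * m[k, c];
                v[r] -= f * v[k];
            }
        }

        // Back substitution on the upper-triangular system.
        var w = new double[n];
        for (int r = n - 1; r >= 0; r--)
        {
            double s = v[r];
            for (int c = r + 1; c < n; c++) s -= m[r, c] * w[c];
            w[r] = s / m[r, r];
        }
        return w;
    }
}
```

With the four points and outputs { 1, 1, 1, 1 } from the example, this returns a = 0, b = 0, c = 1 (up to rounding). Replacing the outputs with { 6, 4, 3, 1 }, which lie exactly on z = 2x + 3y + 1, returns a = 2, b = 3, c = 1.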

See Multiple Linear Regression and Partial Least Squares

Logistic Regression

Suppose we have the following (fictional) data about some patients. The first variable is continuous and represents the patient's age. The second variable is dichotomous and indicates whether the patient smokes. We also know whether each patient has had lung cancer, and we would like to know whether smoking has any connection with lung cancer.

double[][] input =
{              // age, smokes?, had cancer?
    new double[] { 55,    0  }, // false - no cancer
    new double[] { 28,    0  }, // false
    new double[] { 65,    1  }, // false
    new double[] { 46,    0  }, // true  - had cancer
    new double[] { 86,    1  }, // true
    new double[] { 56,    1  }, // true
    new double[] { 85,    0  }, // false
    new double[] { 33,    0  }, // false
    new double[] { 21,    1  }, // false
    new double[] { 42,    1  }, // true
};

bool[] output = // Whether each patient had lung cancer or not
{
    false, false, false, true, true, true, false, false, false, true
};

To verify this hypothesis, we are going to create a logistic regression model for those two inputs (age and smoking), learned using a method called "Iteratively Reweighted Least Squares":

using Accord.Statistics.Models.Regression;
using Accord.Statistics.Models.Regression.Fitting;

// Create a new Iteratively Reweighted Least Squares algorithm
var learner = new IterativeReweightedLeastSquares<LogisticRegression>()
{
    Tolerance = 1e-4,  // Let's set some convergence parameters
    Iterations = 100,  // maximum number of iterations to perform
    Regularization = 0 // no regularization
};

// Now, we can use the learner to finally estimate our model:
LogisticRegression regression = learner.Learn(input, output);
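For logistic regression, Iteratively Reweighted Least Squares is Newton's method applied to the log-likelihood: each iteration solves the weighted least-squares system (XᵀWX)Δ = Xᵀ(y − p), where p holds the current predicted probabilities and W is diagonal with entries p(1 − p). The plain C# sketch below (hypothetical names, no Accord.NET dependency, no regularization) implements that update for a model with an intercept term. It is meant to show the idea, not to reproduce Accord.NET's implementation or its convergence criteria. Because the log-likelihood is concave and this data set is not perfectly separable, the iterations should reach the same maximum-likelihood coefficients.

```csharp
using System;

// Hypothetical sketch: binary logistic regression fitted by Iteratively
// Reweighted Least Squares, i.e. Newton-Raphson on the log-likelihood.
public static class IrlsLogistic
{
    // Returns coefficients w: w[0] is the intercept, w[j + 1] belongs to input column j.
    public static double[] Fit(double[][] x, bool[] y, int maxIterations = 100, double tolerance = 1e-10)
    {
        int p = x[0].Length + 1;
        var w = new double[p]; // start from all-zero coefficients

        for (int iter = 0; iter < maxIterations; iter++)
        {
            var hessian = new double[p, p]; // X^T W X, with W = diag(prob * (1 - prob))
            var gradient = new double[p];   // X^T (y - prob)

            for (int i = 0; i < x.Length; i++)
            {
                double[] row = WithIntercept(x[i]);
                double prob = Probability(w, row);
                double weight = prob * (1 - prob);
                double residual = (y[i] ? 1.0 : 0.0) - prob;
                for (int r = 0; r < p; r++)
                {
                    gradient[r] += row[r] * residual;
                    for (int c = 0; c < p; c++)
                        hessian[r, c] += weight * row[r] * row[c];
                }
            }

            // Newton step: solve (X^T W X) step = X^T (y - prob).
            double[] step = Solve(hessian, gradient);
            double largest = 0;
            for (int j = 0; j < p; j++)
            {
                w[j] += step[j];
                largest = Math.Max(largest, Math.Abs(step[j]));
            }
            if (largest < tolerance) break; // converged
        }
        return w;
    }

    // Logistic (sigmoid) function of the linear predictor.
    public static double Probability(double[] w, double[] row)
    {
        double z = 0;
        for (int j = 0; j < w.Length; j++) z += w[j] * row[j];
        return 1.0 / (1.0 + Math.Exp(-z));
    }

    // Prepends the constant 1 that multiplies the intercept w[0].
    public static double[] WithIntercept(double[] features)
    {
        var row = new double[features.Length + 1];
        row[0] = 1.0;
        Array.Copy(features, 0, row, 1, features.Length);
        return row;
    }

    // Gaussian elimination with partial pivoting; overwrites its arguments.
    private static double[] Solve(double[,] m, double[] v)
    {
        int n = v.Length;
        for (int k = 0; k < n; k++)
        {
            int piv = k;
            for (int r = k + 1; r < n; r++)
                if (Math.Abs(m[r, k]) > Math.Abs(m[piv, k])) piv = r;
            for (int c = 0; c < n; c++) (m[k, c], m[piv, c]) = (m[piv, c], m[k, c]);
            (v[k], v[piv]) = (v[piv], v[k]);
            for (int r = k + 1; r < n; r++)
            {
                double f = m[r, k] / m[k, k];
                for (int c = k; c < n; c++) m[r, c] -= f * m[k, c];
                v[r] -= f * v[k];
            }
        }
        var s = new double[n];
        for (int r = n - 1; r >= 0; r--)
        {
            double acc = v[r];
            for (int c = r + 1; c < n; c++) acc -= m[r, c] * s[c];
            s[r] = acc / m[r, r];
        }
        return s;
    }
}
```

Running `IrlsLogistic.Fit(input, output)` on the arrays above should give exp(w[1]) ≈ 1.021 and exp(w[2]) ≈ 5.858, in line with the odds ratios that `GetOddsRatio` reports for this data.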

At this point, we can compute the odds ratios of our variables. In the model, index 0 always refers to the intercept term, and the input variables follow in order: index 1 is the age and index 2 is whether the patient smokes.
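For reference, the logistic model is linear in the log-odds, which is why an odds ratio can be read directly from a coefficient: raising input j by one unit while holding the others fixed multiplies the odds by exp(βⱼ). This is the standard textbook relationship; that `GetOddsRatio(i)` returns exactly exp(βᵢ) is an assumption here, consistent with the numbers below.

```latex
\log\frac{p}{1-p} = \beta_0 + \beta_1\,\mathrm{age} + \beta_2\,\mathrm{smokes}
\qquad\Longrightarrow\qquad
\mathrm{OR}_j = \frac{\mathrm{odds}(x_j + 1)}{\mathrm{odds}(x_j)} = e^{\beta_j}
```

Read in reverse, the reported odds ratios correspond to coefficients of roughly β₁ = ln 1.0209 ≈ 0.0206 for age and β₂ = ln 5.8585 ≈ 1.768 for smoking.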

// For the age variable, each additional year of age multiplies
//   the odds of getting lung cancer by about 1.021, controlling
//   for cigarette smoking.
double ageOdds = regression.GetOddsRatio(1); // 1.0208597028836701

// For the smoking/non-smoking category variable, however, we
//   have that individuals who smoke have about 5.858 times the
//   odds of developing lung cancer compared to those who do not
//   smoke, controlling for age (remember, this is completely
//   fictional and for demonstration purposes only).
double smokeOdds = regression.GetOddsRatio(2); // 5.8584748789881331

// If we would like to use the model to predict a probability for
// each patient regarding whether they are at risk of cancer or not,
// we can use the Probability function:

double[] scores = regression.Probability(input);

// Finally, if we would like to arrive at a conclusion regarding
// each patient, we can use the Decide method, which will transform
// the probabilities (from 0 to 1) into actual true/false values:

bool[] actual = regression.Decide(input);

See Logistic regression, Logistic Regression Analysis and Generalized Linear Models.

Multinomial Logistic Regression (Softmax)

See Multinomial Logistic Regression.

Support Vector Machines

See Sequential Minimal Optimization for Regression, L1-regularized logistic regression, L2-regularized logistic regression in the dual and L2-regularized L2-loss logistic regression.

Neural Networks

See Levenberg-Marquardt with Bayesian Regularization and Resilient Backpropagation.

Variations

Regression models censored in time

See Cox's Proportional Hazards Model