## Logistic Regression

In this practice, we will use the same data sets we used in the [Linear Discriminant Analysis practice notebook](Linear_Discriminant_Analysis.ipynb) to demonstrate the concept of linear separability. Take a look at that practice first if you haven't done so yet.

We will start with the first data set that has two linearly separable classes. 

In [None]:
data1 <- read.csv("../../../datasets/toydata/data1.csv",header=TRUE)

# For logistic regression, we need to change the class labels from -1 and 1 to 0 and 1. 
data1$class[data1$class == -1] <- 0
str(data1)
# Visualize the data
library(ggplot2)
pl1 <- ggplot(data1, aes(X, Y)) + geom_point(aes(colour=factor(class),shape=factor(class))) #+ theme(legend.position="none")
pl1

The classes labeled "0" and "1" are *linearly separable*; we can draw a linear decision boundary to separate them. Let's apply logistic regression (LR) to this data set.

In [None]:
glmfit1 = glm(class ~ X + Y, data=data1, family=binomial)

summary(glmfit1)

You may get warnings that the algorithm did not converge and that fitted probabilities numerically 0 or 1 occurred. This is what happens when the classes in a data set are perfectly separable: infinitely many decision lines can be drawn between the classes, and the likelihood keeps improving as the coefficients grow, so LR has no single finite optimal solution to converge to. The fitting procedure simply stops after a fixed number of iterations. Still, it finds a decision boundary.
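To see why the fit does not settle on perfectly separable classes, it can help to fit the same model with different iteration limits and compare the coefficients. The sketch below is only an illustration on a made-up data frame (`toy`, `fit_short`, and `fit_long` are hypothetical names, not part of this practice's data sets); it assumes the default fitting procedure of `glm` and uses `glm.control(maxit = ...)` to cap the number of iterations.

```r
# Hypothetical toy data (not data1.csv): two perfectly separable classes.
toy <- data.frame(
  X = c(-3, -2, -1, 1, 2, 3),
  Y = c(-1, -3, -2, 2, 1, 3),
  class = c(0, 0, 0, 1, 1, 1)
)

# Fit the same model twice, allowing 5 and then 25 fitting iterations.
# suppressWarnings() hides the expected convergence / 0-or-1 warnings.
fit_short <- suppressWarnings(
  glm(class ~ X + Y, data = toy, family = binomial,
      control = glm.control(maxit = 5))
)
fit_long <- suppressWarnings(
  glm(class ~ X + Y, data = toy, family = binomial,
      control = glm.control(maxit = 25))
)

# Compare the overall size of the coefficients from the two fits.
size_short <- sum(abs(coef(fit_short)))
size_long  <- sum(abs(coef(fit_long)))
print(c(maxit_5 = size_short, maxit_25 = size_long))
```

With more iterations allowed, the coefficients should come out larger in magnitude: the fit keeps pushing the predicted probabilities toward 0 and 1 instead of settling on one solution.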

Let's draw the decision boundary of the LR model on the data. To do that, we'll need to figure out the slope and the intercept of the decision boundary line from the model's coefficients. 
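As a brief derivation of those two quantities, write $\beta_0$, $\beta_1$, $\beta_2$ for the fitted intercept and the coefficients of `X` and `Y` (this notation is an assumption made here for illustration). The decision boundary is the set of points where the predicted probability equals 0.5, which is where the log-odds are zero:

$$
\beta_0 + \beta_1 X + \beta_2 Y = 0
\quad\Longrightarrow\quad
Y = -\frac{\beta_1}{\beta_2}\,X - \frac{\beta_0}{\beta_2}
\qquad (\beta_2 \neq 0)
$$

So the slope of the line is $-\beta_1/\beta_2$ and its intercept is $-\beta_0/\beta_2$.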

In [None]:
# Do not worry if you don't understand the next two lines; 
# it just figures out the decision line equation from the model's coefs. 
glm_slope1 <- coef(glmfit1)[2]/(-coef(glmfit1)[3])
glm_intercept1 <- coef(glmfit1)[1]/(-coef(glmfit1)[3]) 

pl1 + geom_abline(slope=glm_slope1, intercept=glm_intercept1)

The boundary separates the two classes; they are *linearly separable*. Like LDA, LR is a *linear classifier*: it finds a decision line in two dimensions, a decision plane in three dimensions, and a decision hyperplane in more than three dimensions.
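One way to sanity-check the slope and intercept computed from the coefficients is to confirm that points lying on that line get a predicted probability of 0.5. The sketch below does this on randomly generated, overlapping classes (the data frame `toy` and all other names here are hypothetical, chosen for illustration, and are not the practice data sets).

```r
# Hypothetical toy data: two overlapping Gaussian classes in two dimensions.
set.seed(1)
n <- 100
toy <- data.frame(
  X = c(rnorm(n, mean = -1), rnorm(n, mean = 1)),
  Y = c(rnorm(n, mean = -1), rnorm(n, mean = 1)),
  class = rep(c(0, 1), each = n)
)

fit <- glm(class ~ X + Y, data = toy, family = binomial)
b <- coef(fit)

# Decision line: b0 + b1*X + b2*Y = 0, rewritten as Y = intercept + slope*X.
slope     <- -b[["X"]] / b[["Y"]]
intercept <- -b[["(Intercept)"]] / b[["Y"]]

# Pick a few X values, place the points exactly on the line, and predict.
on_line <- data.frame(X = c(-2, 0, 2))
on_line$Y <- intercept + slope * on_line$X
p <- predict(fit, newdata = on_line, type = "response")
print(p)
```

The printed probabilities should all be 0.5 up to floating-point error, which is what makes the line the boundary for the 0.5 threshold.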

Now, let's compute a confusion table as we did in the LDA practice.

In [None]:
# Run the model on the same data that it was trained with and get the probabilities for each sample. 
glm1.probs <- predict(glmfit1, type="response")
# create an array to hold predictions and assign all zeros initially.
glm1.pred = rep(0,length(glm1.probs))
# based on the model's probabilities for each sample, assign the class label.
glm1.pred[glm1.probs>0.5] <- 1

# Create a confusion table.
conftable1 <- table(glm1.pred, data1$class)
conftable1

We can see that there is no confusion between classes; accuracy is 100%. (Occasionally, you may see a few misclassified points; that is because a new random data set is created every time you run the code, and a few points may end up too close to the other class.)

Let's apply LR to the second data set, where the classes can't be separated without making some errors. Here, samples of different classes lie so close together that no linear boundary can separate them without misclassifying some of them.

In [None]:
data2 <- read.csv("../../../datasets/toydata/data2.csv",header=TRUE)

# For logistic regression, we need to change the class labels from -1 and 1 to 0 and 1. 
data2$class[data2$class == -1] <- 0
# Visualize the data
pl2 <- ggplot(data2, aes(X, Y)) + geom_point(aes(colour=factor(class),shape=factor(class))) + theme(legend.position="none")
pl2

In the above plot, you can see that there is an overlap between classes. This means that some of the samples of a class will be misclassified as the other class; these samples will be on the wrong side of the decision boundary. Let's see that. 

In [None]:
glmfit2 = glm(class ~ X + Y, data=data2, family=binomial)
summary(glmfit2)
glm_slope2 <- coef(glmfit2)[2]/(-coef(glmfit2)[3])
glm_intercept2 <- coef(glmfit2)[1]/(-coef(glmfit2)[3]) 

pl2 + geom_abline(slope=glm_slope2, intercept=glm_intercept2)

The LR model did not throw a warning; it converged to an optimal solution. The classifier does a good job, but not without mistakes. Let's compute the confusion table and the accuracy:

In [None]:
# Run the model on the same data that it was trained with and get the probabilities for each sample. 
glm2.probs <- predict(glmfit2, type="response")
# create an array to hold predictions and assign all zeros initially.
glm2.pred = rep(0,length(glm2.probs))
# based on the model's probabilities for each sample, assign the class label.
glm2.pred[glm2.probs>0.5] <- 1
# Create a confusion table.
conftable2 <- table(glm2.pred, data2$class)
conftable2
print (paste("accuracy = ",sum(diag(conftable2))/length(glm2.pred)))
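A note on the accuracy formula above: `sum(diag(...))` relies on the confusion table being square, with rows and columns in the same class order. If a model happens to predict only one class, `table()` returns a single row and the diagonal no longer lines up with the correct predictions. The sketch below uses made-up label vectors (`truth` and `pred` are hypothetical, not taken from the data sets) to show one possible guard: fix the factor levels before tabulating, and cross-check with `mean(pred == truth)`.

```r
# Hypothetical labels: the model predicts class 1 for every sample.
truth <- c(0, 0, 1, 1, 1, 1)
pred  <- rep(1, length(truth))

# Plain table: only one row ("1"), so the "diagonal" is a single,
# misleading cell (predicted 1, true 0).
tab_plain <- table(pred, truth)
acc_plain <- sum(diag(tab_plain)) / length(pred)

# Fixing the levels forces a 2 x 2 table with matching row/column order.
lv <- c(0, 1)
tab_safe <- table(factor(pred, levels = lv), factor(truth, levels = lv))
acc_safe <- sum(diag(tab_safe)) / length(pred)

# Cross-check: fraction of predictions that equal the true label.
acc_mean <- mean(pred == truth)

print(c(plain = acc_plain, safe = acc_safe, mean = acc_mean))
```

Here the plain table understates the accuracy, while the fixed-level table and the direct comparison agree with each other.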

Now, we will apply the same steps to the third data set, where the classes are not linearly separable.
**It's your turn:**

In [None]:
data3 <- read.csv("../../../datasets/toydata/data3.csv",header=TRUE)

# For logistic regression, we need to change the class labels from -1 and 1 to 0 and 1. 
<what goes in here>
# Visualize the data
pl3 <- ggplot(data3, aes(X, Y)) + geom_point(aes(colour=factor(class),shape=factor(class))) + theme(legend.position="none")
pl3

In [None]:
# find model and draw decision boundary
glmfit3 = glm(<what goes in here>)
summary(glmfit3)

glm_slope3 <- coef(glmfit3)[2]/(-coef(glmfit3)[3])
glm_intercept3 <- coef(glmfit3)[1]/(-coef(glmfit3)[3]) 

pl3 + geom_abline(slope=glm_slope3, intercept=glm_intercept3)

In [None]:
# Run the model on the same data that it was trained with and get the probabilities for each sample. 
glm3.probs <- <what goes in here>
# create an array to hold predictions and assign all zeros initially.
glm3.pred = <what goes in here>
# based on the model's probabilities for each sample, assign the class label.
glm3.pred[<what goes in here>] <- 1
# Create a confusion table.
conftable3 <- <what goes in here>
conftable3
# Compute accuracy
print (paste("accuracy = ",<what goes in here>))

You can see that LR cannot classify this data set successfully; there are many misclassifications (the classes are confused with each other). These classes are *not linearly separable*.

Now, apply the same steps to the "XOR pattern" data set, where the two classes are not linearly separable even though their samples seem to be nicely separated in the plot.

**Again, it's your turn.** This is a lot of repetition of the same steps for different data sets; would you like to convert your code above into a function and just call it here for the data set *data4.csv*?

In [None]:
data4 <- read.csv("../../../datasets/toydata/data4.csv",header=TRUE)

# For logistic regression, we need to change the class labels from -1 and 1 to 0 and 1. 
<what goes in here>
# Visualize the data
pl4 <- ggplot(data4, aes(X, Y)) + geom_point(aes(colour=factor(class),shape=factor(class))) + theme(legend.position="none")
pl4

In [None]:
# find model and draw decision boundary
<what goes in here>

pl4 + geom_abline(slope=glm_slope4, intercept=glm_intercept4)

In [None]:
# Run the model on the same data that it was trained with and get the probabilities for each sample. 
<what goes in here>
# create an array to hold predictions and assign all zeros initially.
<what goes in here>
# based on the model's probabilities for each sample, assign the class label.
<what goes in here>
# Create a confusion table.
<what goes in here>
conftable4
# Compute accuracy
print (paste("accuracy = ",<what goes in here>))

Again, just like in the LDA practice, this is the worst-case scenario; the classifier does not do any better than a "coin toss" (50% accuracy). Linear models cannot deal with this data set. We'll need *nonlinear* models to classify it.