LogisticRegression

Lifecycle: stable Build Status

Logistic Regression Implementation for Binary Classification Problems

Installation

You can install the package from my GitHub repository:

devtools::install_github('AlbertoAlmuinha/LogisticRegression')

Usage

You can use the LogisticRegression package for binary classification problems. It exports two functions:

  • create_data_partition: splits a dataset into train and test sets. All of the dataset’s columns must be numeric.

  • logistic_regression: fits a model on the train set and predicts on the test set.

The first step is to create the data partition:

library(dplyr)
library(zeallot)   # provides the %<-% multiple-assignment operator
library(LogisticRegression)

OJ <- ISLR::OJ %>%
  select(1, 4:12) %>%
  mutate(Purchase = if_else(Purchase == 'CH', 1, 0))

c(train, test) %<-% create_data_partition(OJ, 0.7)

dim(train)
## [1] 749  10
dim(test)
## [1] 321  10
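For intuition, here is a minimal base-R sketch of what a 70/30 random split like the one above could look like. Note that split_data is a hypothetical stand-in, not the package's create_data_partition, whose implementation may differ.

```r
# Hypothetical sketch of a random train/test split (NOT the package's
# create_data_partition implementation).
split_data <- function(data, p = 0.7, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  idx <- sample(nrow(data), size = floor(p * nrow(data)))
  list(train = data[idx, , drop = FALSE],
       test  = data[-idx, , drop = FALSE])
}
```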

Now we can predict the target variable with the logistic_regression function. This function returns a list of five elements: the predicted train target, the predicted test target, the optimal theta (from the gradient descent algorithm), the confusion matrix, and the confusion matrix plot (the last two only when the parameter ‘probs’ is set to FALSE). The parameters of the function are:

  • train: Train dataset.
  • test: Test dataset.
  • target: Numeric parameter indicating the column position of the target variable.
  • lr: Learning rate for the gradient descent algorithm.
  • max_iter: Maximum number of iterations for the gradient descent algorithm.
  • probs: When FALSE, the results are factors; when TRUE, they are probabilities.
  • threshold: Numeric threshold for the gradient descent algorithm.
  • regu_method: Regularization method. Currently only ‘ridge’ is available.
  • regu_factor: Regularization factor. Only applies when regu_method is not NULL.

logreg <- logistic_regression(train = train,
                              test = test,
                              target = 1,
                              lr = 0.1,
                              max_iter = 1000)

logreg$conf_matrix
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction   0   1
##          0  95  21
##          1  40 165
##                                           
##                Accuracy : 0.81            
##                  95% CI : (0.7627, 0.8514)
##     No Information Rate : 0.5794          
##     P-Value [Acc > NIR] : < 2e-16         
##                                           
##                   Kappa : 0.6024          
##                                           
##  Mcnemar's Test P-Value : 0.02119         
##                                           
##             Sensitivity : 0.7037          
##             Specificity : 0.8871          
##          Pos Pred Value : 0.8190          
##          Neg Pred Value : 0.8049          
##              Prevalence : 0.4206          
##          Detection Rate : 0.2960          
##    Detection Prevalence : 0.3614          
##       Balanced Accuracy : 0.7954          
##                                           
##        'Positive' Class : 0               
## 
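For reference, the lr and max_iter parameters correspond to a plain gradient-descent loop like the sketch below. This is an illustrative reimplementation for readers unfamiliar with the algorithm, not the package's actual code.

```r
# Illustrative gradient descent for logistic regression; lr and
# max_iter mirror the parameters of logistic_regression(), but this
# is NOT the package's exact implementation.
sigmoid <- function(z) 1 / (1 + exp(-z))

gradient_descent <- function(X, y, lr = 0.1, max_iter = 1000) {
  theta <- rep(0, ncol(X))               # X includes an intercept column
  for (i in seq_len(max_iter)) {
    p     <- sigmoid(X %*% theta)        # predicted probabilities
    grad  <- t(X) %*% (p - y) / nrow(X)  # gradient of the log-loss
    theta <- theta - lr * grad
  }
  drop(theta)                            # the optimal theta in the output list
}
```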

We can also use a ridge regularization factor:

logreg_ridge <- logistic_regression(train = train,
                                    test = test,
                                    target = 1,
                                    lr = 0.1,
                                    max_iter = 1000,
                                    regu_method = 'ridge',
                                    regu_factor = 0.1)

logreg_ridge$conf_matrix
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction   0   1
##          0  96  22
##          1  39 164
##                                           
##                Accuracy : 0.81            
##                  95% CI : (0.7627, 0.8514)
##     No Information Rate : 0.5794          
##     P-Value [Acc > NIR] : <2e-16          
##                                           
##                   Kappa : 0.6032          
##                                           
##  Mcnemar's Test P-Value : 0.0405          
##                                           
##             Sensitivity : 0.7111          
##             Specificity : 0.8817          
##          Pos Pred Value : 0.8136          
##          Neg Pred Value : 0.8079          
##              Prevalence : 0.4206          
##          Detection Rate : 0.2991          
##    Detection Prevalence : 0.3676          
##       Balanced Accuracy : 0.7964          
##                                           
##        'Positive' Class : 0               
## 
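With regu_method = ‘ridge’, the regu_factor presumably acts as an L2 penalty weight (lambda) on the loss, which changes the gradient as sketched below. The intercept is conventionally left unpenalized; the package's implementation may differ.

```r
# Sketch of a ridge-regularized gradient for logistic regression;
# 'lambda' plays the role of regu_factor. Illustrative only.
grad_ridge <- function(X, y, theta, lambda) {
  p <- 1 / (1 + exp(-X %*% theta))               # sigmoid
  penalty <- lambda * c(0, theta[-1]) / nrow(X)  # skip the intercept term
  drop(t(X) %*% (p - y) / nrow(X)) + penalty
}
```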

LogisticRegression Shiny App

You can access the LogisticRegression Shiny app from my GitHub repository.

Issues

You can contact me here if you find a bug or have a question.

License

LogisticRegression is licensed under the GNU General Public License v3.0.

Permissions of this strong copyleft license are conditioned on making available complete source code of licensed works and modifications, which include larger works using a licensed work, under the same license. Copyright and license notices must be preserved. Contributors provide an express grant of patent rights.
