mewto is an R package that allows you to experiment with different thresholds for classification of prediction results in the case of binary classification problems and visualize various model evaluation metrics, confusion matrices and the ROC curve. It also allows you to calculate the optimal threshold based on a weighted evaluation criterion.

alexandrumonahov/mewto

mewto

Model Evaluation with Weighted Threshold Optimization

mewto is an R package that lets users experiment with different classification thresholds for the predictions of binary classification models and interactively visualize model evaluation metrics, confusion matrices, and the ROC and PR curves. It can also calculate the optimal threshold based on a weighted evaluation criterion and display the related performance metrics.

v1.1.0

What's new?

  • PR curve added to the visualization options
  • UI layout changed to accommodate multiple visualizations
  • code rewritten with optimization in mind: load times significantly reduced
  • minor corrections in function documentation

About mewto

mewto currently consists of two functions:

mewtoApp

This function launches a Shiny application where the user can interactively manipulate the threshold used in binary classification and view the associated metrics, confusion matrix and ROC curve. The app also allows for optimal threshold calculation according to a weighted version of Youden's J-statistic.

In R, simply call the function:

mewtoApp(actuals, probabilities)

actuals - Data of factor type with two levels: "yes" for positive and "no" for negative.

probabilities - Data of numeric type which should represent the probabilities of realization of the positive category.
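
If the labels are stored as 0/1 codes rather than as a factor, they need to be converted to the two-level factor described above before calling the function. A minimal sketch in base R, assuming hypothetical inputs named raw_labels and probs (neither is part of mewto):

```r
# Hypothetical inputs: 0/1 outcome codes and model scores (not part of mewto)
raw_labels <- c(0, 1, 1, 0, 1)
probs      <- c(0.10, 0.85, 0.40, 0.30, 0.95)

# Map 0 -> "no" and 1 -> "yes", giving a factor with exactly those two levels
actuals <- factor(raw_labels, levels = c(0, 1), labels = c("no", "yes"))

# Basic sanity checks before passing the data on
stopifnot(
  is.factor(actuals),
  identical(levels(actuals), c("no", "yes")),
  is.numeric(probs),
  all(probs >= 0 & probs <= 1),
  length(actuals) == length(probs)
)

# mewtoApp(actuals, probs)  # uncomment to launch the app (requires mewto)
```

The checks only mirror the argument descriptions given here; whether mewto performs its own validation is not stated.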

mewtoThresh

This function calculates the optimal threshold according to a weighted version of Youden's J-statistic.

mewtoThresh(actuals, probabilities, weight)

actuals - Data of factor type with two levels: "yes" for positive and "no" for negative.

probabilities - Data of numeric type which should represent the probabilities of realization of the positive category.

weight - The importance attributed to sensitivity, that is, to maximizing the true positive rate. It corresponds to "w" in the weighted metric, so specificity receives the remaining weight 1 - w.
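
To illustrate what a threshold does with these inputs, the following base R sketch classifies observations at a given cut-off and tabulates the result. The helper confusion_at() and the toy data are hypothetical and not part of mewto; the rule "predict yes when the probability is at least the threshold" is an assumption as well.

```r
# Hypothetical helper (not part of mewto): confusion matrix at a cut-off.
# Assumes an observation is predicted "yes" when its probability >= threshold.
confusion_at <- function(actuals, probabilities, threshold) {
  predicted <- factor(ifelse(probabilities >= threshold, "yes", "no"),
                      levels = c("no", "yes"))
  table(actual = actuals, predicted = predicted)
}

# Toy data for illustration only
actuals <- factor(c("no", "no", "yes", "yes", "yes", "no"),
                  levels = c("no", "yes"))
probabilities <- c(0.20, 0.60, 0.70, 0.40, 0.90, 0.10)

cm <- confusion_at(actuals, probabilities, threshold = 0.5)
print(cm)

sensitivity <- cm["yes", "yes"] / sum(cm["yes", ])  # true positive rate
specificity <- cm["no", "no"] / sum(cm["no", ])     # true negative rate
```

Raising the threshold generally trades sensitivity for specificity, which is the trade-off the weight parameter is meant to steer.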

Example

This example generates 100,000 observations of actual data (labelled "no" or "yes") and predicted values (ranging between 0 and 1), and stores them in a data frame called "df". The user can then call the functions included in the mewto package to perform exploratory analysis or obtain the optimum threshold value according to the weighted Youden J-statistic.

# Generate dataset
set.seed(123)
nobs <- 100000 # Select the number of observations to be generated
predicted <- runif(nobs, 0, 1) # Probabilities representing the predicted values
thresh <- runif(nobs, 0.2, 0.8) # Intermediary step to generate actuals 
df <- data.frame(predicted, thresh) # "predicted" and "thresh" combined in "df"
df$actuals <- factor(c("no", "yes")[(df$predicted >= df$thresh) + 1], levels = c("no", "yes")) # Actual data, stored as a two-level factor

# Call the mewto library
library(mewto)

# Run mewtoApp to launch the visual interface and experiment
mewtoApp(df$actuals, df$predicted)

# Run mewtoThresh with a weight of 0.5 to obtain the optimal threshold according to Youden's original J-statistic
mewtoThresh(df$actuals, df$predicted, weight=0.5)

Technical details

The optimal threshold is computed with a weighted version of Youden's J-statistic (Youden, 1950). For the unweighted statistic, the optimal cut-off is the threshold at which the ROC curve lies furthest (vertically) above the identity (diagonal) line. The function maximizes the metric:

w * sensitivity + (1 - w) * specificity, where "w" is the "weight" parameter.
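
The metric above can be sketched in base R as a simple grid search. This is an illustrative re-implementation under stated assumptions, not the code used by mewtoThresh: the function name weighted_youden(), the 0.01 grid and the ">=" classification rule are all hypothetical choices.

```r
# Hypothetical re-implementation for illustration; mewtoThresh may differ.
weighted_youden <- function(actuals, probabilities, weight = 0.5,
                            thresholds = seq(0, 1, by = 0.01)) {
  is_pos <- actuals == "yes"
  score <- vapply(thresholds, function(t) {
    pred_pos <- probabilities >= t                    # assumed decision rule
    sens <- sum(pred_pos & is_pos) / sum(is_pos)      # true positive rate
    spec <- sum(!pred_pos & !is_pos) / sum(!is_pos)   # true negative rate
    weight * sens + (1 - weight) * spec
  }, numeric(1))
  best <- which.max(score)  # picks the first maximum if there are ties
  list(threshold = thresholds[best], score = score[best])
}

# Toy data: the two classes are perfectly separated between 0.4 and 0.6
actuals <- factor(c("no", "no", "yes", "yes"), levels = c("no", "yes"))
probabilities <- c(0.1, 0.4, 0.6, 0.9)

res <- weighted_youden(actuals, probabilities, weight = 0.5)
print(res)
```

On data with ties, several thresholds can share the maximum score; this sketch returns the lowest of them, which is only one of several reasonable conventions.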

Youden's J-statistic has been modified by adding the weighting parameter "w". The weighted statistic takes values in the interval [0, 1]. With a weighting factor of w = 0.5, the weighted optimization function selects the same optimal threshold as Youden's original J-statistic. This statistic was chosen because it is well suited to weighting and because it is the default criterion for choosing the best threshold in the R package pROC.
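
The behaviour at w = 0.5 can be checked directly. Writing J(t) for Youden's statistic at threshold t and M_w(t) for the weighted metric (both symbols are notation introduced here for illustration, not taken from the package):

```latex
\begin{align*}
J(t)       &= \mathrm{sensitivity}(t) + \mathrm{specificity}(t) - 1, \\
M_w(t)     &= w \cdot \mathrm{sensitivity}(t) + (1 - w) \cdot \mathrm{specificity}(t), \\
M_{0.5}(t) &= \tfrac{1}{2}\bigl(\mathrm{sensitivity}(t) + \mathrm{specificity}(t)\bigr)
            = \tfrac{1}{2}\bigl(J(t) + 1\bigr).
\end{align*}
```

Since M_{0.5} is an increasing function of J, both are maximized at the same threshold, although the values of the two statistics differ.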

Download and installation

Online, from GitHub:

You can install mewto directly from GitHub. To do so, you need to have the devtools package installed and loaded. In R, run the following commands:

install.packages("devtools")

library("devtools")

install_github("alexandrumonahov/mewto")
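
The same steps can be wrapped so that devtools is installed only when it is missing. This is a small convenience sketch; the wrapper name install_mewto() is hypothetical, while the repository name is the one given above.

```r
# Hypothetical wrapper: install devtools only if needed, then install mewto.
install_mewto <- function() {
  if (!requireNamespace("devtools", quietly = TRUE)) {
    install.packages("devtools")
  }
  devtools::install_github("alexandrumonahov/mewto")
}

# install_mewto()  # uncomment to run; requires an internet connection
```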

You may encounter download errors from GitHub if you are behind a firewall or if HTTPS downloads are restricted. In that case, try setting one of the following download methods before installing ("wininet" is available on Windows only):

options(download.file.method = "libcurl")

options(download.file.method = "wininet")
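
Rather than trying the methods by hand, the choice can be made from what the running R build reports. A minimal sketch, assuming a hypothetical helper named pick_download_method(); it only selects between the two methods mentioned here and otherwise leaves R's default untouched.

```r
# Hypothetical helper: choose a download method supported by this R build.
pick_download_method <- function() {
  if (isTRUE(unname(capabilities("libcurl")))) {
    "libcurl"
  } else if (.Platform$OS.type == "windows") {
    "wininet"       # Windows-only method
  } else {
    NA_character_   # neither applies: keep R's default
  }
}

method <- pick_download_method()
if (!is.na(method)) {
  options(download.file.method = method)
}
```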

Offline, by manually downloading and installing the package files:

Alternatively, if you cannot install the package from GitHub, you can download the binary package file from the link below:

https://github.com/alexandrumonahov/zip/blob/main/mewto.zip

Place the downloaded file into the working directory of R. Then do one of the following:

Option 1) Run the following command (if the downloaded file has a different name, such as mewto.zip, adjust the file name accordingly):

install.packages('mewto_1.0.zip', repos = NULL, type = "win.binary")

Option 2) In RStudio:

Go to the Packages tab in the bottom-right pane and click on "Install". In the pop-up window that appears, click on "Browse" and choose the package mewto.zip that you have just downloaded. Click on "Install".

Once the package is installed, you can load it with the library(mewto) command.
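
To confirm that the installation succeeded before loading the package, a quick check along these lines may help. The helper package_status() is hypothetical; it relies only on base R and utils functions.

```r
# Hypothetical helper: report whether a package is installed, with its version.
package_status <- function(pkg) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    paste0(pkg, " ", as.character(utils::packageVersion(pkg)), " is installed")
  } else {
    paste0(pkg, " is not installed")
  }
}

message(package_status("mewto"))
```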

Special thanks!

I would like to give special thanks to Prof. Stefan Bender, Dr. Jens Mehrhoff, Gabriela Alves Werb and the Bundesbank ICBD team for having inspired the creation of this package.

Version history

v1.0.0

  • mewto's application launch
  • interactive threshold component added
  • weighted optimization algorithm developed based on Youden's J-Statistic
  • confusion matrix and performance metrics analysis included
  • ROC curve visualization augmented to display user's threshold selection on the curve

Author details

Alexandru Monahov, 2021
