FOR MORE INFO ABOUT THE OLD API (DECRYPTR LEGACY), SEE V0.1
decryptr is an R package to break captchas. It is also an extensible tool built in a way that enables anyone to contribute with their own captcha-breaking code.
decryptr, simply run the code below:
if (!require(devtools)) install.packages("devtools") devtools::install_github("decryptr/decryptr")
decryptr has functions for downloading and breaking captchas from multiple known sources. If you wanted to use this package with, let's say, a TRT (Regional Worker's Court), you could go by the following steps:
# Download captcha from TRT file <- download_captcha("trt", path = "./img") # Break captcha decrypt(file, model = "trt")
##  "yrt67d"
Simple, right? The
decrypt() funcion is this package's workhorse: it is able to take a captcha (either the path to a captcha file or a captcha object read with
read_captcha()) and break it with a model (either the name of a known model, the path to a model file or a model object created with
If you'd like to visualize a captcha and make sure the decryption is working, you can use the
plot() funcion to draw out the captcha image:
# Read captcha captcha <- read_captcha(file) # Plot captcha plot(captcha)
If you want to learn more about the models that already come packaged with
decryptr, check out
load_model()'s documentation (and all of these models also have a corresponding
download_captcha() method so you're always good to go).
If you're willing to create your own custom captcha-breaking models, there are some other functions you might want to know about.
classify() allows the user to manually answer a list of captchas, while
train_model() takes a bunch of classified captchas and trains a
keras model on them.
classify() has two modes: static and interactive. If you already know the answers to all captchas, simply turn them into a string vector and pass it onto the
answers argument; on the other hand, if you're going to manually classify the captchas,
classify() will plot every captcha and prompt you in the console for their answers. In the snippet below, I use static classification to label a set of 10 captchas:
# URL of a captcha (for illustrative purposes I'll be using # TRT's URL, but you can use whichever URL you want) url <- "https://pje.trt4.jus.br/consultaprocessual/seam/resource/captcha" # Download captcha from URL files <- download_captcha(url, n = 10, path = "./img") # Answers to downloaded captchas answers <- c( "ew3h3n", "ew3h3n", "da7522", "da7522", "w8kerh", "mh52v3", "ny248u", "56nwr5", "7tx6dy", "n3fue6") # Classify captchas (if answers weren't supplied, # I'd be promped for interactive classification) new_files <- classify(files, answers, path = "./img")
Now that we have a set of classified captchas, we can use them to train a captcha-breaking model.
classify() used our answers to create a new version of each file, one with the answer at the end of the filename separated by an underscore;
read_captcha() has the
ans_in_path argument that tells it to look for the answers in the filenames and create the captcha objects accordingly.
With this list of labeled captcha objects, we can call
train_model() to generate a model. The model gets automatically saved to disk so that we can load it later with
# Read answered captchas captchas <- read_captcha(new_files, ans_in_path = TRUE) # Use captchas to train a model model <- train_model(captchas, verbose = FALSE) # Use our new model for decryption decrypt(file, model = model)
##  "jre652"
# We could also have loaded the model from disk model <- load_model("./model.hdf5")
Once loaded to memory,
keras models run very quickly Also, we don't run any pre-processing on the image, so decryption is blazing fast.
microbenchmark::microbenchmark(decrypt = decrypt(captcha, model))
## Unit: milliseconds ## expr min lq mean median uq max neval ## decrypt 8.333295 9.256788 10.96834 9.713691 10.47587 106.873 100