
Shiny application for optical character recognition (OCR) of images using Tesseract


JesseVent/shiny-tesseract


Shiny Tesseract

OCR (Optical Character Recognition) With R and Shiny

Introduction

Built in R with the Shiny package and version 4.0 of the Tesseract OCR engine (https://github.com/tesseract-ocr/), provided through the tesseract R package.

This application lets you upload an image and render it in the application, where you can 'brush' (drag and select) over the parts of the image containing the text you want to extract.

The text extracted from the selected region is then displayed below the image.
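
The brush-then-extract flow described above can be sketched roughly as follows. This is a minimal illustration, not the app's actual code: the helper names `brush_to_geometry` and `ocr_brushed_region` are hypothetical, and it assumes the brush bounds (`xmin`, `xmax`, `ymin`, `ymax`) are already expressed in image pixels measured from the top-left corner, which may need adjusting depending on how the image is rendered in Shiny.

```r
# Hypothetical helper: turn brush bounds into a magick geometry string
# of the form "WIDTHxHEIGHT+X_OFFSET+Y_OFFSET".
# Assumption: bounds are in image pixels with the origin at the top left.
brush_to_geometry <- function(brush) {
  width  <- as.integer(round(brush$xmax - brush$xmin))
  height <- as.integer(round(brush$ymax - brush$ymin))
  x_off  <- as.integer(round(brush$xmin))
  y_off  <- as.integer(round(brush$ymin))
  sprintf("%dx%d+%d+%d", width, height, x_off, y_off)
}

# Hypothetical helper: crop the brushed region and run OCR on it.
# Requires the magick and tesseract packages to be installed.
ocr_brushed_region <- function(image_path, brush, language = "eng") {
  image   <- magick::image_read(image_path)
  cropped <- magick::image_crop(image, brush_to_geometry(brush))
  tesseract::ocr(cropped, engine = tesseract::tesseract(language))
}

# Example: a 100 x 50 pixel selection starting at (10, 20)
brush_to_geometry(list(xmin = 10, xmax = 110, ymin = 20, ymax = 70))
# returns "100x50+10+20"
```

In a Shiny server function, the `brush` argument would typically come from an input such as `input$image_brush` (a hypothetical id), and the returned text could be passed to a text output below the image.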

About Tesseract 4.0

Tesseract 4.0 includes a new neural network-based recognition engine that delivers significantly higher accuracy (on document images) than previous versions, in return for a significant increase in required compute power. On complex languages, however, it may actually be faster than base Tesseract.

Example

A hosted example is available at jessevent.shinyapps.io/tesseract/

library(shiny)

# The easiest way to run the app locally is with runGitHub
runGitHub("shiny-tesseract", "jessevent")

Accuracy

Usage

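The following packages are required:

# Install the dependencies (one-time setup)
install.packages("shiny")
install.packages("shinydashboard")
install.packages("magick")
install.packages("tesseract")

# Launch the app from the directory containing the app files
shiny::runApp()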
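
To avoid reinstalling packages that are already present, the setup step could check what is missing first. This is only a sketch; `missing_packages` is a hypothetical helper, not part of this repository.

```r
# Hypothetical helper: return the required packages that are not yet installed.
missing_packages <- function(required,
                             installed = rownames(installed.packages())) {
  setdiff(required, installed)
}

required <- c("shiny", "shinydashboard", "magick", "tesseract")
to_install <- missing_packages(required)

if (length(to_install) > 0) {
  message("Missing packages: ", paste(to_install, collapse = ", "))
  # Uncomment to install them:
  # install.packages(to_install)
}
```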

Next Steps

  • Add PDF support
  • Support brushing multiple regions (needs help)

Happy for any other feedback or thoughts.
