
codeclown/tesseract.js-node


tesseract.js-node

A focused node-only version of tesseract.js.

Why?

tesseract.js is developed for both node and the browser, and includes (in my opinion) bloated functionality, such as automatically downloading traineddata files in the background.

At the time of writing, it also does not have any tests for the node environment (only for the browser). An example issue where this matters: naptha/tesseract.js#339.

I just wanted a way to use Tesseract 4.0 in a node project without all this extra functionality and background downloads from third-party servers.

Usage

Download .traineddata files from a source of your choice, e.g. the official tessdata_fast repository:

mkdir tessdata
cd tessdata
curl -O -L https://github.com/tesseract-ocr/tessdata_fast/raw/master/eng.traineddata
curl -O -L https://github.com/tesseract-ocr/tessdata_fast/raw/master/fin.traineddata
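Because the library does not download anything by itself, it can help to verify that the files are in place before creating a worker. The sketch below uses a hypothetical helper, `missingTraineddata`, which is not part of tesseract.js-node; it assumes each language's data lives in `<lang>.traineddata` inside the tessdata directory, as produced by the commands above.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper (not part of tesseract.js-node): returns the languages
// whose <lang>.traineddata file is absent from the given tessdata directory.
function missingTraineddata(tessdataDir, languages) {
  return languages.filter(
    (lang) => !fs.existsSync(path.join(tessdataDir, `${lang}.traineddata`))
  );
}
```

For example, calling `missingTraineddata('/path/to/tessdata', ['eng', 'fin'])` before creating the worker and throwing if the result is non-empty would name the missing languages explicitly.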

Then use the library in a node project:

const getWorker = require('tesseract.js-node');

(async () => {
  const worker = await getWorker({
    tessdata: '/path/to/tessdata',    // directory containing the .traineddata files
    languages: ['eng', 'fin']         // languages to load
  });
  // recognize text in an image using the 'eng' language data
  const text = await worker.recognize('/path/to/image', 'eng');
  console.log(text);
})();
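Since the languages are passed to the worker up front, it is presumably cheaper to reuse one worker for several images than to create a new worker per image. A minimal sketch, assuming only that `worker.recognize(input, lang)` returns a promise as in the snippet above; `recognizeAll` is a hypothetical helper, and it awaits each image in turn rather than assuming the worker supports concurrent calls.

```javascript
// Hypothetical helper (not part of tesseract.js-node): runs OCR on several
// inputs with one already-created worker. Inputs are processed one at a time,
// since this sketch does not assume concurrent recognize() calls are safe.
async function recognizeAll(worker, inputs, lang) {
  const results = [];
  for (const input of inputs) {
    // Each input may be anything worker.recognize accepts (e.g. a path or a Buffer).
    results.push(await worker.recognize(input, lang));
  }
  return results;
}
```

Usage, inside an async function: `const texts = await recognizeAll(worker, ['/path/to/a.png', '/path/to/b.png'], 'eng');`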

You can supply the input image in various ways:

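// inside an async function, with `worker` created as above
const fs = require('fs');

// path to image
const fromPath = await worker.recognize('/path/to/image', 'eng');
// Buffer
const fromBuffer = await worker.recognize(fs.readFileSync('/path/to/image'), 'eng');
// Buffer (from node-canvas; `canvas` is an existing node-canvas Canvas instance)
const fromCanvas = await worker.recognize(canvas.toBuffer('image/png'), 'eng');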
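`fs.readFileSync` blocks the event loop while the file is read. In a server process a non-blocking read may be preferable; the sketch below reads the image with `fs.promises.readFile` and passes the resulting Buffer on, relying only on the Buffer input form shown above. `recognizeFile` is a hypothetical name, not part of tesseract.js-node.

```javascript
const fs = require('fs');

// Hypothetical helper (not part of tesseract.js-node): reads the image
// without blocking the event loop, then hands the Buffer to the worker.
async function recognizeFile(worker, imagePath, lang) {
  const image = await fs.promises.readFile(imagePath);
  return worker.recognize(image, lang);
}
```

This only changes how the file is read; the OCR call itself is the same `worker.recognize(buffer, lang)` as in the Buffer example.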

See tesseract.test.js for other examples.

Development

npm test

Useful resources:

Credits

Thanks to tesseract.js-core contributors for the groundwork!

License

Apache License 2.0
