An algorithm written entirely in JavaScript that recognises the font of a text in an image using the Tesseract optical character recognition engine and some image processing libraries.


cuulee/Typefont

Typefont

Here I’m working on an algorithm that tries to recognize the font of a text in a photo. My goal is to obtain accurate results with the image as the only input, avoiding any other manual steps.

Usage

Import the compiled module, then call the main function as in the following script. The first argument can be a string with the path of the image, a string with the base64 data of the image, a canvas instance, or an image instance.

import Typefont from "app";

Typefont("path/image.png")
    .then((res) => console.log(res))
    .catch((err) => console.log(err));

You can build the project by using webpack.

webpack src/app.js build/app.js

Preview

Text on the cover of a book (the texts are in Italian because I live in Italy).

Text on the cover of another book.

Screenshot of a text on a video from the web.

Each font in the result has a percentage of similarity with the input image and some information about the font.

Why

I had just discovered the version of Tesseract written in JavaScript and noticed that it also tries to identify the font. I wondered how to improve this process, so I used Tesseract to extract the letters from the input image and created a new system that uses the Jimp image processing library to compare the extracted letters with the fonts stored in a dedicated library.

How it works

The input image is passed to the optical character recognition engine after some filters based on its brightness have been applied. The symbols (letters) are then extracted from the input image and compared with the symbols of the fonts in the database, using both a perceptual (Hamming distance) comparison and a pixel-based comparison, in order to obtain a percentage of similarity.
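To make the two comparisons more concrete, here is a minimal sketch in plain JavaScript. It is not the code used by Typefont: the function names, the flat grayscale array used to represent a letter, the average hash, and the final averaging of the two scores are all assumptions made for illustration. Jimp provides its own `Jimp.distance` and `Jimp.diff` helpers for this kind of comparison, which a real implementation would more likely rely on.

```javascript
// Hypothetical helpers for illustration only, not Typefont's actual code.
// A letter is assumed to be a flat array of grayscale values (0..255),
// and both letters are assumed to be already scaled to the same size.

// Average hash: one bit per pixel, set when the pixel is brighter than the mean.
function averageHash(pixels) {
    const mean = pixels.reduce((sum, p) => sum + p, 0) / pixels.length;
    return pixels.map((p) => (p > mean ? 1 : 0));
}

// Hamming distance: number of positions where two equal-length hashes differ.
function hammingDistance(a, b) {
    if (a.length !== b.length) {
        throw new Error("Hashes must have the same length");
    }
    let distance = 0;
    for (let i = 0; i < a.length; i++) {
        if (a[i] !== b[i]) distance++;
    }
    return distance;
}

// Perceptual similarity in percent, derived from the Hamming distance of the hashes.
function perceptualSimilarity(a, b) {
    const hashA = averageHash(a);
    const hashB = averageHash(b);
    return (1 - hammingDistance(hashA, hashB) / hashA.length) * 100;
}

// Pixel-based similarity in percent, derived from the mean absolute difference.
function pixelSimilarity(a, b) {
    if (a.length !== b.length) {
        throw new Error("Images must have the same number of pixels");
    }
    let total = 0;
    for (let i = 0; i < a.length; i++) {
        total += Math.abs(a[i] - b[i]);
    }
    return (1 - total / (a.length * 255)) * 100;
}

// Assumption: the two scores are simply averaged into one percentage.
function similarity(a, b) {
    return (perceptualSimilarity(a, b) + pixelSimilarity(a, b)) / 2;
}
```

Calling `similarity(letterFromImage, letterFromFont)` for every letter of every font and averaging per font would give one percentage per font, which can then be sorted in descending order.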

The symbols of fonts are just a JSON structure with letters as keys and the base64 of the image of the letter as value. If you want to add a new font you must follow this structure.

{
    "meta": {
        "name": "name",
        "author": "author",
        "uri": "uri",
        "license": "license",
        "key": "value",
        ...
    },
    "alpha": {
        "a": "base64",
        "b": "base64",
        "c": "base64",
        ...
    }
}

Each key of the meta object is included in the final result.
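Before adding a new font, it can help to check that the file follows the structure above. The following is a small sketch of such a check; `validateFont` is a hypothetical helper that is not part of Typefont, and the rules it enforces (single-character keys in `alpha`, non-empty string values) are assumptions drawn from the example structure.

```javascript
// Hypothetical validator for illustration only, not part of Typefont.
// Returns a list of problems; an empty list means the structure looks valid.
function validateFont(font) {
    const isObject = (v) => v !== null && typeof v === "object" && !Array.isArray(v);
    const errors = [];

    if (!isObject(font)) {
        return ["font must be an object"];
    }
    // "meta" holds free-form key/value pairs that end up in the final result.
    if (!isObject(font.meta)) {
        errors.push('"meta" must be an object');
    }
    // "alpha" maps each letter to the base64 data of its image.
    if (!isObject(font.alpha)) {
        errors.push('"alpha" must be an object');
        return errors;
    }
    for (const [letter, data] of Object.entries(font.alpha)) {
        // Assumption: every key is exactly one character.
        if ([...letter].length !== 1) {
            errors.push(`"alpha" key "${letter}" must be a single character`);
        }
        // Assumption: every value is a non-empty base64 string.
        if (typeof data !== "string" || data.length === 0) {
            errors.push(`"alpha" value for "${letter}" must be a non-empty string`);
        }
    }
    return errors;
}
```

For example, `validateFont(JSON.parse(fileContents))` could be run once per font file, logging the returned messages when the list is not empty.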

License

MIT License.
