Octet is an exploratory OCR or text recognition library to prepare and train upon raw data

Aadv1k/OctetOCR

OctetOCR

Octet is an OCR model training system that includes functionality for both preparing the data and training on it.

Octet implements a rudimentary k-nearest-neighbour (K-NN) function that predicts the label of an OctetCharacter, which holds the raw image data (unsigned char*), by comparing it against a pre-computed array of characters derived from dataset/.
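To make the idea concrete, here is a minimal K-NN sketch over raw byte buffers. It is not Octet's implementation: SketchSample, sketch_distance and sketch_knn are hypothetical names, and the squared Euclidean distance and the tie-breaking rule are assumptions, since the README does not say which metric the library uses.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_K 16 /* hypothetical upper bound on k for this sketch */

/* Hypothetical sample type: a label plus flattened grayscale pixels. */
typedef struct {
    char label;
    const unsigned char *pixels;
} SketchSample;

/* Squared Euclidean distance between two equal-length byte buffers. */
static unsigned long sketch_distance(const unsigned char *a,
                                     const unsigned char *b, size_t n) {
    unsigned long sum = 0;
    for (size_t i = 0; i < n; i++) {
        long d = (long)a[i] - (long)b[i];
        sum += (unsigned long)(d * d);
    }
    return sum;
}

/* Returns the majority label among the k nearest training samples,
   or '\0' when there is nothing to vote on. n is the pixel count. */
static char sketch_knn(const unsigned char *query, const SketchSample *train,
                       size_t count, size_t n, size_t k) {
    unsigned long bestDist[SKETCH_MAX_K];
    char bestLabel[SKETCH_MAX_K];
    size_t found = 0;

    if (count == 0 || k == 0) return '\0';
    assert(query != NULL && train != NULL);
    if (k > SKETCH_MAX_K) k = SKETCH_MAX_K;
    if (k > count) k = count;

    for (size_t i = 0; i < count; i++) {
        unsigned long d = sketch_distance(query, train[i].pixels, n);
        size_t pos;
        if (found == k) {
            if (d >= bestDist[k - 1]) continue; /* not among the k nearest */
            pos = k - 1;                        /* replace the farthest one */
        } else {
            pos = found++;
        }
        /* Shift farther neighbours right so the list stays sorted. */
        while (pos > 0 && bestDist[pos - 1] > d) {
            bestDist[pos] = bestDist[pos - 1];
            bestLabel[pos] = bestLabel[pos - 1];
            pos--;
        }
        bestDist[pos] = d;
        bestLabel[pos] = train[i].label;
    }

    /* Majority vote; on a tie the label seen nearest to the query wins. */
    char winner = bestLabel[0];
    size_t winnerVotes = 0;
    for (size_t i = 0; i < found; i++) {
        size_t votes = 0;
        for (size_t j = 0; j < found; j++)
            if (bestLabel[j] == bestLabel[i]) votes++;
        if (votes > winnerVotes) {
            winnerVotes = votes;
            winner = bestLabel[i];
        }
    }
    return winner;
}
```

A small odd k such as 3 avoids most ties when only two labels compete. The sketch also assumes every image has been resized to the same number of pixels before comparison.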

NOTE: This library is under active development, so things may change or break. I also appreciate any input on the code, cheers!

Examples

See example.c for how you might use this library

Prepare

Loading a dataset from a folder is slow because the pre-processing has to be applied to every image each time. It is therefore advised to serialize the data to a CSV file once and read it from there on later runs. Here is what that may look like:

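#include <octet.h>
  /* ... */

  OctetData* data;

  /* dataFileExists is set elsewhere: true once ./data.csv has been written */
  if (!dataFileExists) {
    /* Slow path: pre-process every image in the folder, then cache the result */
    data = octet_load_training_data_from_dir("./dataset");
    /* Assumption: the writer takes the data to serialize as its first
       argument; check octet.h for the exact signature */
    octet_write_training_data_to_csv(data, "./data.csv");
  } else {
    /* Fast path: read the already pre-processed data back from the CSV */
    data = octet_load_training_data_from_csv("./data.csv");
  }

  octet_free_training_data(data);

  /* ... */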
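The caching step above can be sketched without the library. The block below shows one plausible row layout (the label, then each pixel as a decimal value) and one way the dataFileExists flag might be computed. The layout and the names sketch_file_exists, sketch_write_row and sketch_read_row are assumptions made for illustration; the README does not document the CSV format that Octet actually writes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Returns 1 if the file can be opened for reading, 0 otherwise. */
static int sketch_file_exists(const char *path) {
    FILE *f = fopen(path, "r");
    if (f == NULL) return 0;
    fclose(f);
    return 1;
}

/* Writes one row: the label, then every pixel as ",<value>".
   Returns 0 on success, -1 on a write error. */
static int sketch_write_row(FILE *out, char label,
                            const unsigned char *pixels, size_t n) {
    assert(out != NULL);
    if (fprintf(out, "%c", label) < 0) return -1;
    for (size_t i = 0; i < n; i++)
        if (fprintf(out, ",%u", (unsigned)pixels[i]) < 0) return -1;
    return fprintf(out, "\n") < 0 ? -1 : 0;
}

/* Reads one row produced by sketch_write_row into label and pixels.
   Returns 0 on success, -1 on end of file or a malformed row. */
static int sketch_read_row(FILE *in, char *label,
                           unsigned char *pixels, size_t n) {
    assert(in != NULL);
    int c = fgetc(in);
    if (c == EOF) return -1;
    *label = (char)c;
    for (size_t i = 0; i < n; i++) {
        unsigned value;
        if (fscanf(in, ",%u", &value) != 1 || value > 255) return -1;
        pixels[i] = (unsigned char)value;
    }
    c = fgetc(in); /* consume the end-of-row newline */
    return (c == '\n' || c == EOF) ? 0 : -1;
}
```

Storing pixels as text keeps the file easy to inspect by hand, at the cost of a larger file than a binary dump would produce.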

Train

#include <assert.h>
#include <octet.h>

  /* ... */

  /* Load the labelled samples and the single character to classify */
  OctetData* trainingData = octet_load_training_data_from_dir("./dataset");
  OctetCharacter testCharacter = octet_load_character_from_image("./tests/test_data/test-A.jpg");

  /* Predict the label from the 3 nearest training samples */
  char predictedLabel = octet_k_nearest_neighbour(testCharacter, trainingData, /* k */ 3);
  assert(predictedLabel == 'A');

  octet_free_character(testCharacter);
  octet_free_training_data(trainingData);

  /* ... */

Gallery

The program has been tested against the following samples. Feel free to add to ./dataset; if you want an alternate version of "A" in the dataset, just name it something like A1.jpg.
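The naming convention suggests that a sample's label comes from its file name. A minimal sketch of that idea follows, assuming the label is the first character of the base name, so that A.jpg and A1.jpg both map to 'A'. This is a guess at the convention rather than the library's documented behaviour, and sketch_label_from_path is a hypothetical helper.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Returns the first character of the file's base name, or '\0' when the
   base name is empty or does not start with a letter or digit. */
static char sketch_label_from_path(const char *path) {
    assert(path != NULL);
    const char *base = strrchr(path, '/');
    const char *alt = strrchr(path, '\\'); /* Win32-style separator */
    if (alt != NULL && (base == NULL || alt > base)) base = alt;
    base = (base != NULL) ? base + 1 : path;
    return isalnum((unsigned char)base[0]) ? base[0] : '\0';
}
```

Under this assumption, any suffix after the first character (the 1 in A1.jpg) only serves to keep file names unique.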

Input Training Data Match
A
B

Test

Win32

.\build.bat TEST
.\octet_test.exe

Unix

Credit to rarafael for the shell script

./build.sh TEST
./octet_test

Credits

Credit to rarafael for:

  • Providing the build.sh script
  • Implementing octet_load_training_data_from_csv at src/prep.c
  • General quality of life improvements
