Handwritten text recognition using CNN with EMNIST dataset

ShambaC/Handwritten-Text-Recognition

Handwritten Text Recognition

Handwritten text recognition using Neural Networks with EMNIST dataset. Main model is not a ConvNet.

Intro

Handwritten text recognition using various neural networks.

I am trying out multiple variations right now.

To check the details of the models, refer to Model Details

The Extended MNIST (EMNIST) dataset is used to train the model. Specifically, the byclass set is used, as it has data for all the digits and both capital and small letters.

All the scripts have comments to help people understand what the hell is going on.

🔎 How to use ?

Clone the repo and then do the following

📚 Get the EMNIST dataset :

  • Download the dataset from here
  • Extract the gzip.zip file.
  • Now from inside the gzip folder, extract the following .gz files :
    • emnist-byclass-train-images-idx3-ubyte.gz
    • emnist-byclass-train-labels-idx1-ubyte.gz
    • emnist-byclass-test-images-idx3-ubyte.gz
    • emnist-byclass-test-labels-idx1-ubyte.gz
  • Keep the binary files and delete every other file.
    • You can keep the emnist-byclass-mapping.txt file if you want to check out the label mapping. It's in the format Label<space>ASCII.
  • Move the files remaining to the following folder in your project root : Dataset/EMNIST/

Now you are good to go with the data 👍
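
If you want to double check the folder before training, something like the sketch below can help. It is not part of the repo: the function names are made up for illustration, and it assumes the extracted files keep the .gz names minus the .gz suffix. It also shows how the Label<space>ASCII mapping file can be read.

```python
from pathlib import Path

# Assumed names of the extracted binaries (the .gz names without ".gz").
EXPECTED_FILES = [
    "emnist-byclass-train-images-idx3-ubyte",
    "emnist-byclass-train-labels-idx1-ubyte",
    "emnist-byclass-test-images-idx3-ubyte",
    "emnist-byclass-test-labels-idx1-ubyte",
]


def missing_emnist_files(root="Dataset/EMNIST"):
    """Return the expected dataset files that are not present in root.

    Hypothetical helper, not a function from this repository.
    """
    root = Path(root)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


def parse_label_mapping(text):
    """Turn 'Label<space>ASCII' lines into a {label: character} dict.

    Hypothetical helper for reading emnist-byclass-mapping.txt.
    """
    mapping = {}
    for line in text.splitlines():
        if line.strip():
            label, code = line.split()
            mapping[int(label)] = chr(int(code))
    return mapping


if __name__ == "__main__":
    missing = missing_emnist_files()
    print("Missing files:", missing if missing else "none")
```

An empty list from missing_emnist_files() means all four binaries are where model.py is expected to look for them.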

⚙ Install the dependencies :

  • Install the requirements by running this in the terminal : pip install -r requirements.txt
  • Required packages are :
    • idx2numpy
    • matplotlib
    • numpy
    • opencv_python
    • Pillow
    • scikit_learn
    • tensorflow

🧾 Edit the configurations :

The config variables are located in line 18 of model.py. Change them to whatever you feel like.

⛹️‍♀️ Train the model :

Run the model.py script to train the model.

I trained the model on my PC with the following parameters :

  • learning_rate = 0.0005
  • train_epochs = 50
  • train_workers = 20
  • val_split = 0.1
  • batch_size = 100

I trained it on my GTX 1650. It used 2132 MB of GPU memory, and GPU usage was around 7-8%. Each epoch took about 39-44 seconds, and training finished in 23 minutes. It stopped after 34 epochs because the validation loss wasn't improving.
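
Stopping when the validation loss stops improving is usually done with a patience counter (in Keras, the tf.keras.callbacks.EarlyStopping callback does this). The sketch below shows the idea in plain Python. The function name and the patience value are assumptions for illustration, so check model.py for the actual settings.

```python
def early_stop_epoch(val_losses, patience=5):
    """Return the 1-based epoch at which training would stop.

    Hypothetical helper: stops once the validation loss has failed to
    improve on its best value for `patience` epochs in a row. If that
    never happens, all epochs are run.
    """
    best = float("inf")
    waited = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            waited = 0
        else:
            waited += 1
            if waited >= patience:
                return epoch
    return len(val_losses)


if __name__ == "__main__":
    # Made-up loss curve: improves for 3 epochs, then stalls.
    losses = [0.90, 0.70, 0.60, 0.61, 0.62, 0.63]
    print(early_stop_epoch(losses, patience=3))
```

As a sanity check on the numbers above, 34 epochs at 39-44 seconds each comes to roughly 22-25 minutes, which matches the 23 minutes reported.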

CPU usage was around 55%. My CPU is Ryzen 7 3750H.

Python used 5 gigs of RAM 😥. I don't remember which parameter values caused it, but RAM usage once went up to 10 gigs 😱.

You can visualise the training using tensorboard. Run tensorboard --logdir path_to_logs in terminal to start the server.

The logs are located at the following folder : Models/{timestamp}/logs
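
If you have trained more than once, the sketch below picks the newest run and builds its logs path. It is only an illustration: the function name is made up, and it assumes each folder under Models/ is named with a numeric Unix timestamp.

```python
from pathlib import Path


def latest_log_dir(models_root="Models"):
    """Return the logs folder of the newest run, or None if there are no runs.

    Hypothetical helper: assumes run folders are named with digits only
    (a Unix timestamp), so the largest number is the most recent run.
    """
    runs = [p for p in Path(models_root).iterdir()
            if p.is_dir() and p.name.isdigit()]
    if not runs:
        return None
    newest = max(runs, key=lambda p: int(p.name))
    return newest / "logs"


if __name__ == "__main__":
    if Path("Models").is_dir():
        print(latest_log_dir())
```

The printed path can then be passed to tensorboard --logdir.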

With this model I have been unable to increase the accuracy beyond 84%. Model 2 increases accuracy to 86%.

⬇️ Download pre-trained models :

Model        Type   Test Loss   Test Accuracy   Download
1679033527   1      0.4489      0.8467          Download
1679036461   1      0.4381      0.8480          Download
1679220168   2      0.3837      0.8616          Download
1679378923   3      0.3655      0.8679          Download

🏃‍♂️ Run the model :

  • Run the tkRecogIndv.py script to check for individual characters only.
  • Run the tkRecogAll.py script to recognize words along with numbers.
  • Run the textrecog_ui.py script for realtime results.
    • But this needs recogScript.py to be configured.

IN BOTH CASES MAKE SURE TO EDIT THE unixTime VARIABLE TO YOUR MODEL'S FOLDER.

📸 Screenshots

All screenshots are taken with best results. Totally not biased screenshotting.

image image

Space detection between words

image

As you can see, the model cannot differentiate between capital and small 'O'.

image

NEW UI !!

👨‍🏫 A little explanation on the pre-processing of images during inference :

  • First of all, each character in the image is separated into its own image.
    • This is done because the model recognizes individual letters and not complete words.
    • The method used for this does not work on words where the letters are joined together (like cursive writing).
  • Then a list is created with the four corner coordinates of each character.
  • Then a check is performed to find detection rectangles that lie inside other characters.
    • This happens with characters having a loop. For example :
      • e, the whole character is detected and the white space as well inside the loop.
      • same goes for a, p or any character with loops.
  • Then we sort the detected character co-ordinates in the order in which they appear from left to right.
  • Then the original image is cropped to separate the characters and store them in a list.
  • Then we detect spaces. The logic is :
    • First calculate the space between each pair of neighbouring characters.
    • Find the mean spacing.
    • Any space greater than the mean spacing is treated as a space between two words.
  • Pad the character images with white pixels to make them as close to a square as possible.
  • Resize image to 20x20.
  • Transpose image.
  • Negate image.
  • Pad image on all sides with 4 pixels, resulting in a 28x28 image.
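
The steps above can be sketched in plain Python as follows. This is not the repo's code, and every function name here is made up. It uses lists of rows instead of OpenCV images, a nearest-neighbour resize as a stand-in for whatever resize the scripts actually use, and it assumes the final 4 pixel border is black (0), since the image has already been negated at that point.

```python
# Boxes are (x, y, w, h) tuples. Images are lists of rows of 0-255 grey values.


def drop_nested_boxes(boxes):
    """Remove boxes lying fully inside another box (e.g. the loop of an 'e')."""
    def inside(inner, outer):
        ix, iy, iw, ih = inner
        ox, oy, ow, oh = outer
        return (inner != outer and ix >= ox and iy >= oy
                and ix + iw <= ox + ow and iy + ih <= oy + oh)
    return [b for b in boxes if not any(inside(b, o) for o in boxes)]


def sort_left_to_right(boxes):
    """Order boxes by their left edge."""
    return sorted(boxes, key=lambda b: b[0])


def pad_to_square(img, fill=255):
    """Centre the image on a white square canvas."""
    h, w = len(img), len(img[0])
    side = max(h, w)
    top, left = (side - h) // 2, (side - w) // 2
    out = [[fill] * side for _ in range(side)]
    for r in range(h):
        out[top + r][left:left + w] = img[r]
    return out


def resize_nearest(img, size=20):
    """Nearest-neighbour resize to size x size (stand-in for a real resize)."""
    h, w = len(img), len(img[0])
    return [[img[r * h // size][c * w // size] for c in range(size)]
            for r in range(size)]


def transpose(img):
    """Swap rows and columns."""
    return [list(col) for col in zip(*img)]


def negate(img):
    """Invert grey values so the ink becomes bright on a dark background."""
    return [[255 - v for v in row] for row in img]


def pad_border(img, border=4, fill=0):
    """Add a border on all sides (the fill value is an assumption)."""
    width = len(img[0]) + 2 * border
    top = [[fill] * width for _ in range(border)]
    bottom = [[fill] * width for _ in range(border)]
    body = [[fill] * border + list(row) + [fill] * border for row in img]
    return top + body + bottom


def to_emnist_input(char_img):
    """Run the steps in the order listed: square, 20x20, transpose, negate, 28x28."""
    img = pad_to_square(char_img)
    img = resize_nearest(img, 20)
    img = transpose(img)
    img = negate(img)
    return pad_border(img, 4, 0)
```

Calling to_emnist_input() on a cropped character gives a 28x28 grid, which is the size of the EMNIST images.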

❌ Flaws in pre-processing :

  • Small 'i' is not detected properly, as the dot of the 'i' and the bar count as separate characters. This is fixed by the alternate method that replaces MSER.
  • Space detection will result in wrong output if the input is a single word.
    • It will add spaces even though they are not needed.
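
The single word problem follows directly from the mean spacing rule, as the small sketch below shows. It is an illustration, not the repo's code: the function name is made up, and it assumes the space between two characters is measured from the right edge of one box to the left edge of the next.

```python
def find_word_breaks(boxes):
    """Return the indices i where a word space follows box i.

    Hypothetical helper. `boxes` are (x, y, w, h) tuples already sorted
    left to right. A gap counts as a word space when it is greater than
    the mean gap.
    """
    gaps = [nxt[0] - (cur[0] + cur[2]) for cur, nxt in zip(boxes, boxes[1:])]
    if not gaps:
        return []
    mean_gap = sum(gaps) / len(gaps)
    return [i for i, gap in enumerate(gaps) if gap > mean_gap]


if __name__ == "__main__":
    # Two words: gaps are 2, 2, 26, 2, so the break lands after the third box.
    two_words = [(0, 0, 10, 10), (12, 0, 10, 10), (24, 0, 10, 10),
                 (60, 0, 10, 10), (72, 0, 10, 10)]
    print(find_word_breaks(two_words))

    # One word with slightly uneven gaps (2, 4, 2): the 4 pixel gap is above
    # the mean, so a space is added even though it is not needed.
    one_word = [(0, 0, 10, 10), (12, 0, 10, 10), (26, 0, 10, 10), (38, 0, 10, 10)]
    print(find_word_breaks(one_word))
```

Unless all gaps in a word are exactly equal, at least one of them is above the mean, so a single word almost always gets a space it should not have.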

📝 To do list :

  • Make improved models to raise the accuracy
  • Improve the preprocessing of images
  • Remove MSER completely
  • Make a better UI

✔️ All goals reached 🎉