Handwritten text recognition
Handwritten text recognition using an LSTM (Long Short-Term Memory) recurrent neural network. Honestly I don't know what's happening here. The network is trained on the words dataset of the IAM Handwriting Database.
Clone the repo and then do the following.
- Make an account on the database website.
- Download the words archive
- Extract the contents into the following directory at the root of this project :
Datasets/words
- So the resulting folder structure should be :
Datasets
|_words
  |_a01
    |_a01-000u
      |_images.png
    |_a01-000x
      |_...
  |_a02
    |_...
- Now you need the labels for these images.
- Download the ascii archive
- Extract only the words.txt file to the Datasets folder.
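To sanity check the layout above, here is a minimal sketch that maps one line of words.txt to its image path and label. It assumes the usual IAM conventions: comment lines start with `#`, the first field is the word id, the second field is `ok` or `err`, the last field is the transcription, and each image is named after its word id. The function name `parse_words_line` is made up for this example and is not part of the repo.

```python
from pathlib import Path


def parse_words_line(line, words_root="Datasets/words"):
    """Return (image_path, label) for one words.txt line, or None to skip it.

    Assumes the IAM layout: <root>/a01/a01-000u/a01-000u-00-00.png
    """
    line = line.strip()
    # Skip blank lines and the commented header of words.txt
    if not line or line.startswith("#"):
        return None
    parts = line.split(" ")
    # Skip malformed lines and words flagged as badly segmented
    if len(parts) < 3 or parts[1] == "err":
        return None
    word_id = parts[0]              # e.g. a01-000u-00-00
    id_parts = word_id.split("-")
    if len(id_parts) < 2:
        return None
    top_folder = id_parts[0]                     # a01
    form_folder = f"{id_parts[0]}-{id_parts[1]}"  # a01-000u
    label = parts[-1]               # transcription is the last field
    image_path = Path(words_root) / top_folder / form_folder / f"{word_id}.png"
    return image_path, label
```

If the path returned for the first few lines does not exist on disk, the archive was probably extracted one level too deep or too shallow.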
- Install the requirements.
- Do in terminal :
pip install -r requirements.txt
- The required packages are :
- numpy : 1.22.4
- opencv_python : 4.6.0.66
- pandas : 1.3.2
- Pillow : 9.4.0
- tensorflow : 2.10.0
- tqdm : 4.62.3
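If something breaks after installing, a quick check like the one below can tell you which package differs from the pinned versions. This is only a sketch using the standard library `importlib.metadata`; the `REQUIRED` dict and the `find_mismatches` helper are made up for this example and are not part of the repo.

```python
from importlib import metadata

# Versions copied from the list above
REQUIRED = {
    "numpy": "1.22.4",
    "opencv_python": "4.6.0.66",
    "pandas": "1.3.2",
    "Pillow": "9.4.0",
    "tensorflow": "2.10.0",
    "tqdm": "4.62.3",
}


def find_mismatches(required, get_version=metadata.version):
    """Return {package: (wanted, installed)} for every package that differs.

    installed is None when the package is not installed at all.
    """
    mismatches = {}
    for name, wanted in required.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, installed) in find_mismatches(REQUIRED).items():
        print(f"{name}: wanted {wanted}, found {installed or 'not installed'}")
```

An empty output means every package matches the versions listed above.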
- Open the config.py file.
- Make whatever changes you want.
- Mainly you might want to make changes to the following variables :
- batch_size
- learning_rate
- train_epochs
- train_workers
- validation_split
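For illustration only, the variables above could be grouped like this. This is a hypothetical sketch, not the actual contents of config.py, which may be structured differently. The defaults are the values from my own training run (listed further down in this README), except `train_workers`, which is a placeholder because I am not stating a value for it here.

```python
from dataclasses import dataclass, asdict


@dataclass
class TrainingConfig:
    """Hypothetical container for the tunable variables in config.py."""
    batch_size: int = 16
    learning_rate: float = 0.0005
    train_epochs: int = 100
    train_workers: int = 4  # placeholder value, pick what suits your CPU
    validation_split: float = 0.9


if __name__ == "__main__":
    # Example: halve the batch size if you run out of GPU memory
    config = TrainingConfig(batch_size=8)
    print(asdict(config))
```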
- Run the train.py script to train the model.
- You do not need to edit anything in this file.
- I trained the model with the following parameters :
- learning rate : 0.0005
- validation split : 0.9
- batch size : 16
- epochs : 100
- I trained it on my GTX 1650. It used 2132 MB of GPU memory, and GPU usage was around 8-10 %. Each epoch took around 140-150 seconds, so all 100 epochs took about 4 hours.
- CPU usage was around 50%. My CPU is a Ryzen 7 3750H.
- RAM usage was around 2 GB.
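If you change the number of epochs, you can estimate the total training time from the per-epoch time the same way. A tiny sketch using my numbers (your hardware will give different per-epoch times, and the helper name `total_hours` is just for this example):

```python
def total_hours(epochs, seconds_per_epoch):
    """Total training time in hours for a fixed per-epoch duration."""
    return epochs * seconds_per_epoch / 3600


# 100 epochs at 140-150 seconds each
low = total_hours(100, 140)
high = total_hours(100, 150)
print(f"{low:.1f} to {high:.1f} hours")  # prints "3.9 to 4.2 hours"
```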
- You can visualise the training using TensorBoard. To start the server, run in terminal :
tensorboard --logdir=path_to_logs
- The logs are located in the following folder :
Models/Handwriting_recognition/{timestamp}/logs
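Since the folder name contains a timestamp, it changes on every run. Here is a small sketch that finds the most recently modified logs folder and prints the matching command. It assumes the layout shown above and picks the newest run by modification time rather than by parsing the timestamp, since the timestamp format is not spelled out here. The helper `latest_log_dir` is made up for this example.

```python
from pathlib import Path


def latest_log_dir(models_root="Models/Handwriting_recognition"):
    """Return the most recently modified <run>/logs folder, or None if none exist."""
    root = Path(models_root)
    if not root.is_dir():
        return None
    # Keep only run folders that actually contain a logs directory
    log_dirs = [run / "logs" for run in root.iterdir() if (run / "logs").is_dir()]
    if not log_dirs:
        return None
    return max(log_dirs, key=lambda p: p.stat().st_mtime)


if __name__ == "__main__":
    logs = latest_log_dir()
    if logs is None:
        print("No logs found yet, run train.py first.")
    else:
        print(f"tensorboard --logdir={logs}")
```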
- After training, a bunch of files are generated. The only important files are model.meow and configs.meow.
- You can run inferenceModel.py to check with only one image from the dataset itself.
- OR use the Paint GUI made with tkinter by running tkRecog.py.
Totally not copied code.