Around 2005 I became interested in optical character recognition. At work I was participating in a document management project, and at the same time I wanted to expand Copying Machine with OCR capabilities. In the end Copying Machine used the open source Tesseract engine, but I still work occasionally on my own experimental version. I found the fields of computer vision and machine learning intriguing and challenging, and this experiment combines both of them.
The source is fairly structured but a bit messy due to all the trial and error I did on the algorithms. I hope you learn from it, and I would love to discuss improvements with anyone who is interested.