Optical Character Recognition

This notebook contains the source code for my submissions to a Kaggle competition whose goal is to classify images from the MNIST handwritten digit database. The solution reached 99.928% accuracy, which placed me 60th (top 3%).

I started with an Exploratory Data Analysis, which showed that the data was not noisy but that not all pixels in the images were useful, so some Dimensionality Reduction was possible. I then built a simple Convolutional Neural Network (CNN) using Keras. The steps I followed, as described in the Jupyter Notebook, are normalization, reshaping, data augmentation, and training with the Adam optimizer and a ReduceLROnPlateau callback.
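As a rough illustration of those steps, here is a minimal sketch. It is not the notebook's actual code: the layer sizes, augmentation ranges, learning rate, callback settings, and the function names `preprocess` and `build_training_setup` are all assumptions chosen for the example. It also uses Keras augmentation layers, which may differ from what the notebook uses.

```python
import numpy as np


def preprocess(pixels: np.ndarray) -> np.ndarray:
    """Normalize flat 784-pixel rows to [0, 1] and reshape to (n, 28, 28, 1)."""
    x = pixels.astype("float32") / 255.0  # normalization
    return x.reshape(-1, 28, 28, 1)       # reshaping: height, width, one gray channel


def build_training_setup():
    """Return (model, callbacks). TensorFlow is imported lazily so that
    preprocess() can be used without it installed."""
    from tensorflow import keras
    from tensorflow.keras import layers

    model = keras.Sequential([
        keras.Input(shape=(28, 28, 1)),
        # Data augmentation (hypothetical ranges); only active during training.
        layers.RandomRotation(0.03),
        layers.RandomZoom(0.1),
        # A small CNN; the sizes below are illustrative assumptions.
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dense(10, activation="softmax"),  # one class per digit
    ])
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=1e-3),
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    # Halve the learning rate when validation loss stops improving for 3 epochs.
    reduce_lr = keras.callbacks.ReduceLROnPlateau(
        monitor="val_loss", factor=0.5, patience=3, min_lr=1e-5
    )
    return model, [reduce_lr]
```

Training would then look something like `model.fit(preprocess(train_pixels), train_labels, validation_split=0.1, epochs=30, callbacks=callbacks)`, where `train_pixels`, `train_labels`, and the epoch count are placeholders.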

This directory is an attempt at recognizing CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) images. CAPTCHAs were built in 1997 as a way to identify and block bots (in order to prevent spam, DDoS attacks, etc.). They have since been replaced by reCAPTCHA because they can be broken using Artificial Intelligence, as we will see.

To make classifying the images impossible for computers, the CAPTCHA creators distort the letters: they add noise, extra lines crossing the words, and so on. To solve this, I built a conventional CNN with a twist: at the deeper level of the model, I stacked 5 branching Convolutional Layers so that each one would be specifically trained to classify a single letter.
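The branching idea can be sketched with the Keras functional API as below. This is only an illustration under stated assumptions, not the model in this directory: the alphabet, the 50x200 grayscale input shape, the layer sizes, and the names `encode_label`, `decode_prediction`, and `build_model` are all hypothetical. Only the 5 branches, one per letter, come from the description above.

```python
import numpy as np

# Assumptions for illustration only: a lowercase + digit alphabet and
# 5 characters per CAPTCHA (one per branch).
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789"
NUM_CHARS = 5


def encode_label(text: str) -> list:
    """Turn a CAPTCHA string into one integer class index per branch."""
    if len(text) != NUM_CHARS:
        raise ValueError(f"expected {NUM_CHARS} characters, got {len(text)}")
    return [ALPHABET.index(ch) for ch in text]


def decode_prediction(branch_probs) -> str:
    """Pick the most likely character from each branch's probability vector."""
    return "".join(ALPHABET[int(np.argmax(p))] for p in branch_probs)


def build_model(input_shape=(50, 200, 1)):
    """Shared convolutional trunk followed by one convolutional branch per
    character. TensorFlow is imported lazily so the helpers above work alone."""
    from tensorflow import keras
    from tensorflow.keras import layers

    inputs = keras.Input(shape=input_shape)
    # Shared trunk: features common to all character positions.
    x = layers.Conv2D(32, 3, padding="same", activation="relu")(inputs)
    x = layers.MaxPooling2D()(x)
    x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
    x = layers.MaxPooling2D()(x)

    outputs = []
    for i in range(NUM_CHARS):
        # Each branch has its own weights and its own softmax over the alphabet.
        b = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
        b = layers.MaxPooling2D()(b)
        b = layers.Flatten()(b)
        outputs.append(
            layers.Dense(len(ALPHABET), activation="softmax", name=f"char_{i}")(b)
        )

    model = keras.Model(inputs, outputs)
    # The same loss is applied to each of the 5 outputs.
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
    return model
```

With a model like this, `model.predict` returns a list of 5 arrays, one per branch, so a single image could be decoded with `decode_prediction([p[0] for p in predictions])`.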