<a href='https://www.hexnbit.com/'> <img src='https://www.hexnbit.com/wp-content/uploads/2019/09/hexnbit_final_66px.png'/> </a>

# Optical Character Recognition (OCR)
OCR is the process of converting images of typed, handwritten, or printed text into machine-encoded text. The images may come from scanned documents, photos of documents, or scene photos (a car's number plate, speed-limit and warning road signs, etc.).

Here, we will use an image containing samples of the digits 0-9 to train a model.
Once the model is trained, we will select a random digit sample and recognize its value using the KNN (k-nearest neighbors) model.

### Importing Libraries

In [1]:
import numpy as np
import cv2

### Reading Image
The image contains the digits 0 to 9, arranged in a grid of 50 rows and 100 columns of samples (5 rows of 100 samples for each digit).<br>
Hence, each digit has 500 samples. Each digit sample spans 20 pixels by 20 pixels, so the full image is 1000 pixels high and 2000 pixels wide.

In [2]:
path="digits.png"
img=cv2.imread(path,0)  # flag 0 loads the image as single-channel grayscale

### Displaying Image

In [3]:
cv2.namedWindow("Digit Image", cv2.WINDOW_NORMAL)  # to make the window manually resizeable
cv2.imshow("Digit Image",img)
cv2.waitKey(0)
cv2.destroyAllWindows()

### Splitting Rows and Columns in order to separate all digits

In [4]:
# np.hsplit: splits a NumPy array into multiple arrays column-wise
# np.vsplit: splits a NumPy array into multiple arrays row-wise
# The image holds 50 rows and 100 columns of digits, so splitting it into 50 row bands
# and each band into 100 cells lets us index every digit separately

cells = [np.hsplit(row,100) for row in np.vsplit(img,50)]
# Convert it into a NumPy array. Shape will be (50,100,20,20)
digit_database= np.array(cells)
print(digit_database.shape)

(50, 100, 20, 20)
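To see what the `vsplit`/`hsplit` combination does without loading `digits.png`, the same pattern can be tried on a tiny array. This is only a sketch: `toy` and `toy_cells` are made-up names, and the 4x6 array stands in for an image holding a 2x3 grid of 2x2-pixel cells.

```python
import numpy as np

# Hypothetical 4x6 "image" holding a 2x3 grid of 2x2-pixel cells.
toy = np.arange(24).reshape(4, 6)

# vsplit cuts the rows into 2 bands; hsplit cuts each band into 3 cells.
toy_cells = np.array([np.hsplit(band, 3) for band in np.vsplit(toy, 2)])

print(toy_cells.shape)   # (2, 3, 2, 2)
print(toy_cells[1][2])   # bottom-right cell: rows 2-3, columns 4-5 of toy
```

The first two axes index the cell (row, column) and the last two index the pixels inside it, which is the same layout as `digit_database`.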


### Selecting a digit, printing and displaying it

In [5]:
selected_digit=digit_database[0][0] # digit at row 0, column 0
print(selected_digit)
cv2.imshow("Digit",selected_digit)
cv2.waitKey(0)
cv2.destroyAllWindows()

[[  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
    0   0]
 [  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
    0   0]
 [  0   0   0   0   0   0   0   0   0   0   0   9  33   9   0   0   0   0
    0   0]
 [  0   0   0   0   0   0   0   0   0   0  41 177 249 178  29   0   0   0
    0   0]
 [  0   0   0   0   0   0   0   0   0  33 198 255 240 255 107   0   0   0
    0   0]
 [  0   0   0   0   0   0   0   1  70 199 255 255 197 154 253  98   0   0
    0   0]
 [  0   0   0   0   0   0   0  45 238 255 205 224 222  83 224 128   0   0
    0   0]
 [  0   0   0   0   0   0  25 202 255 193  40  99  54   0 190 197  16   0
    0   0]
 [  0   0   0   0   0  20 163 246 152  72   0   0   0   0 184 252  74   0
    0   0]
 [  0   0   0   0   0  97 255 118   0   1   0   0   0   0 184 255  82   0
    0   0]
 [  0   0   0   0  20 218 216  17   0   0   0   0   0   0 183 255  78   0
    0   0]
 [  0   0   0   0  67 255 138   0   0   0   0   0   0  24 215 188

### Creating Training and Testing Dataset
Columns 0 to 49 are used as training data<br>
Columns 50 to 99 are used as testing data

In [6]:
digit_database.ndim  # digit_database is 4 dimensional

4

In [7]:
X_train = digit_database[:,:50].reshape(-1,400).astype(np.float32) # Shape = (2500,400)
X_test = digit_database[:,50:100].reshape(-1,400).astype(np.float32) # Shape = (2500,400)

In [8]:
X_train.ndim  # we reshaped slice of digit_database, hence X_train and X_test are 2 dimensional

2

In [9]:
X_train.shape # (2500,400): 50 rows x 50 columns = 2500 samples, each 20x20 digit flattened to 400 values

(2500, 400)

### Creating Labels
In the training data, each digit initially had 5 rows and 50 columns of samples; after reshaping, each digit occupies 250 consecutive rows of X_train.<br>
There are ten digits (0 to 9), so we need a NumPy array that holds each digit 250 times, in order.

#### Dummy (Only to demonstrate)

In [10]:
k=np.arange(10)
Y_train=np.repeat(k,3)[:,np.newaxis]  # repeat each digit 3 times, then turn the result into a column vector
Y_train

array([[0],
       [0],
       [0],
       [1],
       [1],
       [1],
       [2],
       [2],
       [2],
       [3],
       [3],
       [3],
       [4],
       [4],
       [4],
       [5],
       [5],
       [5],
       [6],
       [6],
       [6],
       [7],
       [7],
       [7],
       [8],
       [8],
       [8],
       [9],
       [9],
       [9]])

#### Actual Labels

In [11]:
k = np.arange(10)
Y_train = np.repeat(k,250)[:,np.newaxis]

In [12]:
# The testing data is arranged in the same way as the training data, so we copy the labels
Y_test = Y_train.copy()
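The labels only work if the order produced by `reshape(-1,400)` matches the order produced by `np.repeat`. A quick way to convince yourself is to run the same slicing on a stand-in array whose pixels store the digit they belong to. The sketch below assumes the layout described above (5 consecutive rows per digit); `fake_database`, `fake_train` and `labels` are illustrative names, not part of the notebook.

```python
import numpy as np

# Hypothetical stand-in for digit_database: same (50, 100, 20, 20) layout,
# but every pixel of a cell stores the digit its row belongs to
# (rows 0-4 -> 0, rows 5-9 -> 1, and so on).
row_digit = np.arange(50) // 5
fake_database = np.broadcast_to(row_digit[:, None, None, None], (50, 100, 20, 20))

# Same slicing and reshaping as used for X_train.
fake_train = fake_database[:, :50].reshape(-1, 400)

# Same construction as Y_train.
labels = np.repeat(np.arange(10), 250)[:, np.newaxis]

# Every flattened sample should carry the digit named by its label.
aligned = np.array_equal(fake_train[:, :1], labels)
print(aligned)
```

If the grid were laid out differently (for example, digits arranged by column), this check would fail and the labels would need to be built another way.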

### Creating Model, Training and Predicting

In [13]:
knn =  cv2.ml.KNearest_create()  # initializing the model
knn.train(X_train,cv2.ml.ROW_SAMPLE,Y_train)  # training model
ret,result,neighbours,dist = knn.findNearest(X_test,k=5)  # predicting results for testing data with K=5
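To make the idea behind k-nearest neighbors concrete, here is a minimal brute-force sketch in plain NumPy. It is not OpenCV's implementation: `knn_predict` is a hypothetical helper, the use of squared Euclidean distance is an assumption of this sketch, and the toy data is made up.

```python
import numpy as np
from collections import Counter

def knn_predict(train_x, train_y, query, k=5):
    """Toy brute-force KNN for a single query vector (hypothetical helper)."""
    # Squared Euclidean distance from the query to every training sample.
    dists = np.sum((train_x - query) ** 2, axis=1)
    # Indices of the k closest training samples (stable sort keeps ties in order).
    nearest = np.argsort(dists, kind="stable")[:k]
    votes = train_y[nearest].ravel()
    # Majority vote among the k nearest labels.
    label = Counter(votes.tolist()).most_common(1)[0][0]
    return label, votes, dists[nearest]

# Made-up 2-D data: three points near the origin (class 0), three near (5, 5) (class 1).
toy_x = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]], dtype=np.float32)
toy_y = np.array([[0], [0], [0], [1], [1], [1]])

label, votes, dists = knn_predict(toy_x, toy_y, np.array([4.0, 4.0], dtype=np.float32), k=3)
print(label, votes, dists)
```

The three returned values play roughly the same roles as `result`, `neighbours` and `dist` in the cell above: the predicted label, the labels of the k nearest samples, and their distances to the query.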

### Evaluating Performance

In [14]:
result==Y_test

array([[ True],
       [ True],
       [ True],
       ...,
       [ True],
       [ True],
       [ True]])

In [15]:
matches = (result==Y_test)
correct = np.count_nonzero(matches)
accuracy = correct*100.0/result.size
print(accuracy)

91.76
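A single accuracy figure hides which digits are harder to recognize. One possible follow-up is a per-digit breakdown; the sketch below uses a hypothetical helper, `per_class_accuracy`, and made-up toy arrays only to show the call.

```python
import numpy as np

def per_class_accuracy(predicted, actual, n_classes=10):
    """Percentage of correct predictions for each class (hypothetical helper)."""
    predicted = np.asarray(predicted).ravel().astype(int)
    actual = np.asarray(actual).ravel().astype(int)
    # Number of samples per class, and number of correctly predicted samples per class.
    totals = np.bincount(actual, minlength=n_classes)
    hits = np.bincount(actual[predicted == actual], minlength=n_classes)
    # Classes with no samples are reported as NaN instead of dividing by zero.
    return np.where(totals > 0, hits * 100.0 / np.maximum(totals, 1), np.nan)

# Made-up predictions for three classes, shaped like result and Y_test.
toy_actual = np.array([[0], [0], [1], [1], [2], [2]])
toy_predicted = np.array([[0.], [0.], [1.], [2.], [2.], [2.]], dtype=np.float32)

toy_accuracy = per_class_accuracy(toy_predicted, toy_actual, n_classes=3)
print(toy_accuracy)
```

In the notebook, the same helper could be called as `per_class_accuracy(result, Y_test)` to get one percentage per digit.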


### Handpicking a sample

In [16]:
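# Selecting the sample at index 479 and reshaping it to 20x20 to display it
# X_test is float32; converting back to uint8 lets cv2.imshow render the 0-255 pixel values correctly
cv2.imshow("Sample 1",X_test[479].reshape(20,20).astype(np.uint8))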
cv2.waitKey(0)
cv2.destroyAllWindows()

### Recognizing which digit was selected

In [17]:
ret,result,neighbours,dist=knn.findNearest(X_test[479:480],k=5)

In [18]:
print(ret)
print("Digit Recognized as: ",result)
print("Nearest Neighbors: ",neighbours)
print("Distance from Nearest Neighbors: ",dist)

1.0
Digit Recognized as:  [[1.]]
Nearest Neighbors:  [[1. 1. 1. 1. 1.]]
Distance from Nearest Neighbors:  [[116267. 133869. 147886. 148969. 169892.]]
