## Libraries and Dataset

In [1]:
import numpy as np
import cv2

Dataset: [Letter Recognition (UCI)](https://archive.ics.uci.edu/dataset/59/letter+recognition)

The UCI Letter Recognition dataset is designed for training models to classify capital letters (A–Z) from features extracted from images. It contains 20,000 samples generated from black-and-white images of letters rendered in 20 different fonts, each randomly distorted to introduce variability. Each sample is described by 16 numerical attributes, such as statistical moments and edge counts, scaled to integer values between 0 and 15. These features capture the letter's position, size, pixel distribution, and edge characteristics. The dataset is commonly split into 16,000 training and 4,000 test instances; this notebook instead splits it in half, 10,000 each. Because every sample is a short fixed-length numeric vector, the data works well with distance-based classifiers such as k-Nearest Neighbors (KNN), including the implementation in OpenCV's `cv2.ml` module.

In [3]:
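# Column 0 holds the letter; convert it to a class index (A -> 0, ..., Z -> 25)
# so every column can be stored as float32.
data = np.loadtxt(
    fname='dataset/letter-recognition.data',
    dtype='float32',
    delimiter=',',
    converters={0: lambda ch: ord(ch) - ord('A')}
)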

data.shape

(20000, 17)
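
The converter on column 0 is what lets a file that mixes letters and integers load as a single `float32` array. As a rough illustration, the sketch below applies the same mapping to one made-up row in the file's layout and then reverses it. The sample row and the helper names `letter_to_index` and `index_to_letter` are assumptions for illustration only; they are not part of the dataset or the notebook.

```python
import numpy as np

def letter_to_index(ch):
    """Map 'A'..'Z' to 0..25, mirroring the converter passed to np.loadtxt."""
    if isinstance(ch, bytes):  # some NumPy versions hand converters bytes, not str
        ch = ch.decode()
    return ord(ch) - ord('A')

def index_to_letter(idx):
    """Inverse mapping: 0..25 back to 'A'..'Z'."""
    return chr(int(idx) + ord('A'))

# Made-up row in the file's layout: one letter followed by 16 integer features
sample_line = "C,4,7,5,5,2,6,7,6,8,10,7,9,1,8,4,9"
fields = sample_line.split(',')
row = np.array(
    [letter_to_index(fields[0])] + [int(v) for v in fields[1:]],
    dtype='float32',
)

print(row.shape)                # (17,)
print(index_to_letter(row[0]))  # C
```

The inverse helper is handy later for turning predicted class indices back into readable letters.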

In [6]:
# Preprocessing

# Split the dataset in half: 10,000 samples each for the training and test sets
train, test = np.vsplit(data, 2)

# Column 0 is the label; the remaining 16 columns are the features
y_train = train[:, :1]
X_train = train[:, 1:]

y_test = test[:, :1]
X_test = test[:, 1:]

# Equivalent, more compact way to separate labels from features
y_train, X_train = np.hsplit(train, [1])
y_test, X_test = np.hsplit(test, [1])
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)


(10000, 16) (10000, 1) (10000, 16) (10000, 1)
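
The dataset description mentions a 16,000/4,000 split, while the cell above uses two halves. One possible way to get the uneven split is to pass a list of row indices to `np.vsplit` instead of a section count. The sketch below assumes a placeholder array with the same shape as `data` (random values, not the real dataset), and the `_alt` variable names are illustrative.

```python
import numpy as np

# Placeholder with the same shape as the loaded data; values are random, not real samples
rng = np.random.default_rng(0)
fake = rng.integers(0, 16, size=(20000, 17)).astype('float32')

# A list of row indices splits at those rows rather than into equal parts
train_alt, test_alt = np.vsplit(fake, [16000])

# Column 0 is the label, columns 1..16 are the features
y_train_alt, X_train_alt = np.hsplit(train_alt, [1])
y_test_alt, X_test_alt = np.hsplit(test_alt, [1])

print(X_train_alt.shape, y_train_alt.shape, X_test_alt.shape, y_test_alt.shape)
# (16000, 16) (16000, 1) (4000, 16) (4000, 1)
```

Any accuracy obtained with this split would not be directly comparable to the result from the half-and-half split used in this notebook.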


## Model: KNN from OpenCV (cv2)

In [None]:
knn = cv2.ml.KNearest_create()
knn.train(X_train, cv2.ml.ROW_SAMPLE, y_train)  # each row is one sample

# Predict each test sample from its 5 nearest training neighbours
ret, result, neighbours, dist = knn.findNearest(X_test, k=5)

# Accuracy: percentage of predictions that match the true labels
matches = result == y_test
correct = np.count_nonzero(matches)
accuracy = correct * 100.0 / result.size
print(accuracy)

93.06
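
To make the classification step less of a black box, here is a minimal brute-force k-NN sketch in plain NumPy: compute the distance from each query to every training row, take the `k` closest, and pick the most frequent label. It is an illustration of the general technique on a tiny synthetic dataset, not OpenCV's implementation; the function name `knn_predict` and the toy arrays are assumptions, and tie-breaking may differ from what `cv2.ml.KNearest` does.

```python
import numpy as np

def knn_predict(X_train, y_train, X_query, k=5):
    """Brute-force k-NN with majority vote (illustrative, not OpenCV's code)."""
    labels = y_train.ravel().astype(int)
    preds = np.empty(len(X_query), dtype=int)
    for i, q in enumerate(X_query):
        d2 = np.sum((X_train - q) ** 2, axis=1)  # squared Euclidean distances
        nearest = np.argsort(d2)[:k]             # indices of the k closest rows
        votes = np.bincount(labels[nearest])     # label counts among the neighbours
        preds[i] = np.argmax(votes)              # ties resolve to the smallest label
    return preds

# Tiny synthetic example: two well-separated clusters labelled 0 and 1
X_toy = np.array([[0, 0], [0, 1], [1, 0], [9, 9], [9, 8], [8, 9]], dtype='float32')
y_toy = np.array([[0], [0], [0], [1], [1], [1]], dtype='float32')
queries = np.array([[0.5, 0.5], [8.5, 8.5]], dtype='float32')

print(knn_predict(X_toy, y_toy, queries, k=3))  # [0 1]
```

The Python loop keeps the logic readable but would be slow on 10,000 test samples; a library implementation is the practical choice at that size.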
