<h1>Sparse Representation Classifier (SRC)</h1>

 <h2> $\min \|s\|_1 \quad \text{subject to} \quad y = \Theta s$ </h2>
<h2> where $s$ is the sparse vector, $y$ is the measurement, and $\Theta$ is the measurement matrix. </h2>

<h3> To classify a test image, we look for the sparsest vector $s$, where $y$ is the vectorized test image and $\Theta$ is a matrix whose columns are vectorized training images of the 10 handwritten digits, ordered sequentially by digit.
The positions in $s$ with the largest values correspond to the columns of $\Theta$ that most closely represent the input image.
With $n$ images for each digit, the classification is given by:</h3>

<h3>
$\lfloor \frac{i}{n} \rfloor$, where $s_i = \max_{s_j \in s} s_j$ and the index $i$ is counted from 0. </h3>
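
As a minimal sketch of this rule, assuming 0-based indices and columns grouped by digit in blocks of n, the mapping from the largest coefficient to a class can be written as follows. The helper name `classify_from_sparse` and the toy vector are illustrative only and are not part of the notebook.

```python
import numpy as np

def classify_from_sparse(s, n):
    """Return the class whose block of n columns holds the largest coefficient."""
    i = int(np.argmax(s))   # 0-based index of the largest entry of s
    return i // n           # floor(i / n): index of the block containing column i

# Toy example (assumption): 3 classes, n = 10 columns per class
s = np.zeros(30)
s[4] = 0.2    # small weight on a column of class 0
s[23] = 0.9   # largest weight on a column of class 2
label = classify_from_sparse(s, n=10)
print(label)  # 2
```

Note that the rule looks only at the single largest entry, so it ignores how the remaining weight is spread across the other classes.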


<h1>Libraries</h1>

In [1]:
# For vector and matrix manipulation
import numpy as np

# For loading the dataset
import tensorflow as tf

# For image manipulation
from PIL import Image

# For the optimization problem
from sklearn.linear_model import Lasso

In [2]:
# Data from tensorflow
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
11490434/11490434 ━━━━━━━━━━━━━━━━━━━━ 0s 0us/step


In [None]:
# Every digit should have the same number of images; I chose 100 per digit

def get_indexes(lst, element):
    return [index for index, value in enumerate(lst) if value == element]

def reduct_samples(lst, x_train, elements, n):
  """Randomly drop samples so that each label in `elements` keeps at most n."""
  from random import sample

  lst = np.asarray(lst)
  keep = np.ones(len(lst), dtype=bool)

  for element in elements:

    indices = get_indexes(lst, element)
    if len(indices) > n:
      # Choose the surplus samples to discard, without replacement
      keep[sample(indices, len(indices) - n)] = False

  # Apply the mask once instead of deleting rows one at a time
  return (lst[keep], x_train[keep])

# 100 images
(lst, x_train) = reduct_samples(y_train, x_train, range(10), 100)

# New y_train with the reduced samples
y_train = lst


# Vectorize and concatenate in order (100 images of 0, 100 images of 1, 100 images of 2, ...)
columns = []
for i in range(10):

  indices = get_indexes(y_train, i)

  for index in indices:

    image = x_train[index]
    image_flatten = image.flatten().reshape(-1, 1)
    columns.append(image_flatten)

Theta = np.hstack(columns)
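
The same column ordering can also be obtained without an explicit loop. The sketch below uses small random stand-in data rather than MNIST (an assumption made to keep it self-contained) and checks that a stable sort by label reproduces the loop-built matrix. The names `theta_loop` and `theta_vec` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in data (assumption): 30 tiny 4x4 "images" with labels 0..2, 10 per label
images = rng.integers(0, 256, size=(30, 4, 4), dtype=np.uint8)
labels = rng.permutation(np.repeat(np.arange(3), 10))

# Loop version, mirroring the cell above
columns = []
for digit in range(3):
    for index in np.flatnonzero(labels == digit):
        columns.append(images[index].flatten().reshape(-1, 1))
theta_loop = np.hstack(columns)

# Vectorized version: a stable sort keeps the original order within each label
order = np.argsort(labels, kind="stable")
theta_vec = images[order].reshape(len(order), -1).T

print(theta_vec.shape)                        # (16, 30): pixels x images
print(np.array_equal(theta_loop, theta_vec))  # True
```

With the real data the resulting matrix would have 784 rows (28 x 28 pixels) and 1000 columns (10 digits x 100 images).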

In [3]:

# I saved the Theta matrix as an image so I don't need to rerun the sample-reduction algorithm
Theta = np.array(Image.open("Theta.png"))
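
The notebook does not show how `Theta.png` was written. One way it could have been done, assuming Theta holds 8-bit pixel values as the MNIST arrays do, is sketched below with a small stand-in matrix and a temporary file. The names `theta_demo` and `Theta_demo.png` are hypothetical.

```python
import os
import tempfile

import numpy as np
from PIL import Image

# Stand-in for Theta (assumption): 8-bit values, like the MNIST pixels
theta_demo = np.arange(12, dtype=np.uint8).reshape(3, 4)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "Theta_demo.png")
    # PNG is lossless, so 8-bit grayscale values survive the round trip
    Image.fromarray(theta_demo).save(path)
    with Image.open(path) as img:
        theta_loaded = np.array(img)

print(np.array_equal(theta_demo, theta_loaded))  # True
```

A lossy format such as JPEG would alter pixel values and therefore change the dictionary, which is why a lossless format matters here.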

In [4]:
from sklearn.linear_model import LassoLars
import numpy as np


# Smaller alpha = weaker L1 penalty = less sparse solution
model = LassoLars(alpha=0.001, fit_intercept=False)
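
For context on why a Lasso solver appears here: the equality-constrained problem stated at the top is commonly relaxed to a penalized least-squares problem. In the symbols of this notebook, with $m$ denoting the number of rows of $\Theta$ (pixels per image, a symbol introduced here for illustration), the objective that scikit-learn documents for `LassoLars` reads:

```latex
\min_{s} \; \frac{1}{2m} \, \| y - \Theta s \|_2^2 \; + \; \alpha \, \| s \|_1
```

A larger $\alpha$ weights the $\ell_1$ term more heavily and tends to produce more zeros in $s$. A smaller $\alpha$ lets the fit $y \approx \Theta s$ dominate, so the solution moves closer to the constrained formulation but is typically less sparse.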


In [None]:
test = []

# 300 tests
for im, tr  in zip(x_test[:300], y_test[:300]):

  model.fit(Theta, im.reshape(-1, 1))
  s_sparse = model.coef_
  pos = s_sparse.argmax() // 100

  if(pos == tr):
    test.append(1)

  else:
    test.append(0)


# 77% correct classifications with alpha = 0.001

sum(test)/len(test) * 100
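
The fit-then-argmax step in the loop can be wrapped in a small helper, which also avoids hard-coding the block size. The sketch below runs it on a synthetic dictionary rather than MNIST. The helper name `classify_src`, the random unit-norm columns, and the choice of test signal are all assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import LassoLars

def classify_src(model, Theta, y, n):
    """Fit the sparse code of y over Theta and return the block of its largest entry."""
    model.fit(Theta, y.ravel())
    return int(np.argmax(model.coef_)) // n

rng = np.random.default_rng(0)

# Synthetic dictionary (assumption): 3 classes, 5 unit-norm columns each, 20 "pixels"
n = 5
Theta_demo = rng.normal(size=(20, 3 * n))
Theta_demo /= np.linalg.norm(Theta_demo, axis=0)

# Test signal: an exact copy of column 7, which belongs to class 7 // 5 = 1
y_demo = Theta_demo[:, 7].copy()

model_demo = LassoLars(alpha=0.001, fit_intercept=False)
predicted = classify_src(model_demo, Theta_demo, y_demo, n)
print(predicted)  # 1
```

Because the test signal is itself a dictionary column, this is the easiest possible case. Real test images are not exact copies of training images, so the accuracy reported above is the meaningful figure.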