# Project No. 1: Classification with the _Bag of Features_ Method

This project is evaluated in class groups. Grading is based on your presentation and on the code provided in your attachments. Your solution must include commented source code written in Python. You may use additional modules if necessary. Analyze your results. If you discover something interesting, let us know!

The past decade has seen the growing popularity of Bag of Features (BoF) approaches to many computer vision tasks, including image classification, video search, robot localization, and texture recognition. Part of the appeal is simplicity. BoF methods are based on orderless collections of quantized local image descriptors; they discard spatial information and are therefore conceptually and computationally simpler than many alternative methods (see [this introduction](https://arxiv.org/pdf/1101.3354.pdf)).
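To make "orderless collection of quantized local descriptors" concrete, here is a minimal sketch. It assumes a codebook of visual words already exists; the function name `bof_histogram` and the toy 2-D descriptors are illustrative only and are not taken from the referenced paper.

```python
import numpy as np

def bof_histogram(descriptors, codebook):
    """Quantize local descriptors against a codebook and return a normalized histogram."""
    # Squared Euclidean distance from every descriptor to every visual word
    diffs = descriptors[:, None, :] - codebook[None, :, :]
    dists = (diffs ** 2).sum(axis=2)
    # Each descriptor is replaced by the index of its nearest visual word
    words = dists.argmin(axis=1)
    # Count word occurrences; positions in the image are discarded
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    total = hist.sum()
    return hist / total if total > 0 else hist

# Toy example: 3 visual words and 4 local descriptors (hypothetical values)
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
descriptors = np.array([[0.1, -0.1], [0.9, 1.2], [4.8, 5.1], [5.2, 4.9]])
hist = bof_histogram(descriptors, codebook)
print(hist)
```

Reversing or shuffling the rows of `descriptors` yields the same histogram, which is what "orderless" means here.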

For this project, you have to implement the Bag of Features representation for a classification problem. To accomplish this, execute the following tasks in the given order:

1. Download the CIFAR-10 dataset [here](https://www.kaggle.com/c/cifar-10/data). Select at least 4 classes to classify.
2. For each image, compute its Bag of Features descriptor.
3. Select a classifier that takes the BoF descriptors as input. The options are a neural network and KNN.
4. Evaluate the classifier on the testing set.
5. Compute performance metrics from a confusion matrix.
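Steps 3 to 5 can be sketched as follows, assuming KNN is the chosen classifier and that each image has already been reduced to a BoF histogram. The helper names (`knn_predict`, `confusion_matrix`) and the toy 2-bin histograms are hypothetical; a library implementation would work equally well.

```python
import numpy as np

def knn_predict(train_x, train_y, test_x, k=3):
    """Predict integer labels by majority vote among the k nearest training histograms."""
    dists = ((test_x[:, None, :] - train_x[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]   # indices of the k closest samples
    votes = train_y[nearest]                     # their labels, shape (n_test, k)
    n_classes = train_y.max() + 1
    return np.array([np.bincount(v, minlength=n_classes).argmax() for v in votes])

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)
    return cm

# Toy BoF histograms for two classes (hypothetical values)
train_x = np.array([[0.9, 0.1], [0.8, 0.2], [0.7, 0.3],
                    [0.1, 0.9], [0.2, 0.8], [0.3, 0.7]])
train_y = np.array([0, 0, 0, 1, 1, 1])
test_x = np.array([[0.85, 0.15], [0.15, 0.85], [0.65, 0.35]])
test_y = np.array([0, 1, 1])

pred = knn_predict(train_x, train_y, test_x, k=3)
cm = confusion_matrix(test_y, pred, n_classes=2)
accuracy = np.trace(cm) / cm.sum()   # correct predictions lie on the diagonal
print(cm, accuracy)
```

Per-class precision and recall can be read off the same matrix by normalizing its columns and rows respectively.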

For **task #2**, you have to read and implement the approach presented [here](https://arxiv.org/pdf/1101.3354.pdf), and you can look for additional information on the internet. For **task #4**, you should measure model performance on unseen data for this classification problem.
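A BoF pipeline needs a codebook of visual words before any image can be encoded. The sketch below builds one by densely sampling patches and clustering them with a plain k-means loop. It is a simplification under stated assumptions: raw pixel patches stand in for a real local descriptor such as SIFT, random arrays stand in for grayscale CIFAR-10 images, and the names `extract_patches` and `kmeans` are illustrative rather than the paper's.

```python
import numpy as np

def extract_patches(image, size=8, stride=4):
    """Densely sample size x size patches from a 2-D grayscale image, flattened to vectors."""
    h, w = image.shape
    patches = [image[r:r + size, c:c + size].ravel()
               for r in range(0, h - size + 1, stride)
               for c in range(0, w - size + 1, stride)]
    return np.asarray(patches, dtype=float)

def kmeans(data, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means; returns the k cluster centers (the visual words)."""
    rng = np.random.default_rng(seed)
    centers = data[rng.choice(len(data), size=k, replace=False)].copy()
    for _ in range(n_iter):
        dists = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = dists.argmin(axis=1)
        for j in range(k):
            members = data[assign == j]
            if len(members):              # leave an empty cluster's center unchanged
                centers[j] = members.mean(axis=0)
    return centers

# Random stand-in for ten 32 x 32 grayscale training images (assumption, not real data)
rng = np.random.default_rng(0)
images = rng.random((10, 32, 32))

all_patches = np.vstack([extract_patches(img) for img in images])
codebook = kmeans(all_patches, k=16)
print(all_patches.shape, codebook.shape)
```

The codebook should be learned from training images only, so that the held-out images used for evaluation stay unseen.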

### 1. Importing modules and libraries.

In [43]:
import numpy as np
import pandas as pd
import os
from collections import defaultdict

### 2. Selecting training images (4 categories).

#### 2.A Reading the .csv (_comma-separated values_) label file

In [42]:
# Open the file and print its first rows to check the contents.
filename = 'trainLabels.csv'
labels = pd.read_csv(filename)
print(labels.head())

   id       label
0   1        frog
1   2       truck
2   3       truck
3   4        deer
4   5  automobile


#### 2.B Reading the image folder

In [38]:
# Path to the folder with the image files
path = 'train'
train_list = os.listdir(path)
print('Training image list - Class "{}" | Size: {}'.format(type(train_list).__name__, len(train_list)))

Training image list - Class "list" | Size: 50000


#### 2.C Selecting images based on the chosen labels.

In [83]:
# Selected categories
categorias = ['cat', 'frog', 'truck', 'ship']
# defaultdict mapping each label to a list of image file names
dictTrain = defaultdict(list)

# Keep only the rows whose label is in the selected categories.
boolCateg = labels[labels['label'].isin(categorias)]
print('Training set size for the selected categories: {}'.format(boolCateg.shape[0]))

# Build the dictionary from the filtered rows. The file name is derived from the id
# (assuming the Kaggle '<id>.png' naming) because os.listdir returns files in
# arbitrary order and ids start at 1, so train_list[index] would pick the wrong file.
for index, label in boolCateg.values:
    dictTrain[label].append('{}.png'.format(index))

print('Training dictionary keys: {}'.format(list(dictTrain.keys())))

Training set size for the selected categories: 20000
Training dictionary keys: ['frog', 'truck', 'ship', 'cat']
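
Since the classifier must be evaluated on unseen data, one option is to set aside part of each class from a label-to-files dictionary like the one above. This is a minimal sketch: `split_per_class`, the 20% fraction, and the example file names are all hypothetical choices, not part of the assignment.

```python
import random

def split_per_class(files_by_class, holdout_fraction=0.2, seed=0):
    """Shuffle each class's file list and set aside a fraction as held-out data."""
    rng = random.Random(seed)          # fixed seed so the split is reproducible
    train, holdout = {}, {}
    for label, names in files_by_class.items():
        shuffled = list(names)         # copy so the input lists are not modified
        rng.shuffle(shuffled)
        n_holdout = int(len(shuffled) * holdout_fraction)
        holdout[label] = shuffled[:n_holdout]
        train[label] = shuffled[n_holdout:]
    return train, holdout

# Hypothetical dictionary with ten files per class
example = {'cat': ['{}.png'.format(i) for i in range(1, 11)],
           'frog': ['{}.png'.format(i) for i in range(11, 21)]}
train_split, holdout_split = split_per_class(example)
print({k: len(v) for k, v in train_split.items()},
      {k: len(v) for k, v in holdout_split.items()})
```

Splitting within each class keeps the class proportions the same in both parts, which makes the confusion matrix easier to interpret.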
