# Image Classification
## Challenges:
- Semantic Gap    
To the computer, an image is just a big grid of numbers in the range [0, 255] (one value per RGB channel).
How do we map this grid to a semantically meaningful category label?
- Viewpoint Variation    
If the image changes slightly (e.g. the same individual seen from a different angle), the grid of pixel values can change greatly.    
How do we make the mapping robust to such changes?    
- Intraclass Variation    
We also need to handle variation within a category (different individuals of the same class).     
- Fine-Grained Categories
- Background Clutter
- Illumination Changes
- Deformation
e.g. different poses
- Occlusion
e.g. the object is partially covered by something else
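
As a rough illustration of the semantic gap and of how fragile raw pixel values are, the sketch below builds a synthetic image and applies a one-pixel shift and a brightness change. Random noise stands in for a photo here, and `image`, `shifted`, and `brighter` are hypothetical names. Real photos are smoother than noise, so the exact fractions will differ, but the point carries over: the content stays "the same" while most numbers in the grid change.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumption: a 32x32 RGB image stored as unsigned 8-bit integers in [0, 255]
image = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
print(image.shape, image.dtype)

# Crude stand-in for a small viewpoint change: shift one pixel sideways (with wrap-around)
shifted = np.roll(image, shift=1, axis=1)

# Crude stand-in for an illumination change: brighten every pixel, clipped at 255
brighter = np.clip(image.astype(np.int64) + 40, 0, 255).astype(np.uint8)

# Fraction of grid entries whose value differs from the original image
frac_shift = np.mean(image != shifted)
frac_bright = np.mean(image != brighter)
print(f"changed by shift: {frac_shift:.2%}, changed by brightening: {frac_bright:.2%}")
```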

## Motivation: Building Block for other tasks
Image classification can serve as a building block for other algorithms:      
-> Object Detection: one way to perform object detection is to slide a window over the image and classify each window     
-> Image Captioning: given an image, we want to write a sentence describing it. The task can be treated as a sequence of classification problems (e.g. choosing the next word at each step)      
->...
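
The sliding-window idea can be sketched as follows. This is only a toy illustration: `classify_patch` is a hypothetical stand-in for a trained image classifier (it just thresholds mean brightness), and the window size and stride are arbitrary assumptions.

```python
import numpy as np

def classify_patch(patch):
    """Hypothetical stand-in classifier: 1 ("object") if the patch is mostly bright, else 0."""
    return int(patch.mean() > 0.5)

def sliding_window_detect(image, window=4, stride=2):
    """Classify every window x window patch and return the top-left corners labeled 1."""
    hits = []
    height, width = image.shape
    for top in range(0, height - window + 1, stride):
        for left in range(0, width - window + 1, stride):
            patch = image[top:top + window, left:left + window]
            if classify_patch(patch) == 1:
                hits.append((top, left))
    return hits

# Toy 8x8 grayscale image with a bright 4x4 "object" in the middle
image = np.zeros((8, 8))
image[2:6, 2:6] = 1.0

detections = sliding_window_detect(image)
print(detections)  # [(2, 2)]
```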

## Machine Learning: Data-Driven Approach
1. Collect a dataset of images and labels
2. Use machine learning to train a classifier
```python
def train(images, labels):
    # Machine learning: fit a model to the training images and labels
    model = ...  # placeholder for the learned model
    return model
```
3. Evaluate the classifier on new images
```python
def predict(model, test_images):
    # Use the model to predict a label for each test image
    test_labels = ...  # placeholder for the predicted labels
    return test_labels
```

This differs from the classic programming pipeline, which is driven by hand-coded rules based on acquired knowledge rather than by data.


## Datasets
- MNIST
- CIFAR10
- CIFAR100
- ImageNet

## First classifier: Nearest Neighbor
### Simple Nearest Neighbor
- The training function is trivial: it just memorizes all the data and labels
- The predict function takes in an image and returns the label of the most similar training image

Therefore we first need a distance metric to measure how similar two images are.
#### Distance Metric to compare images
- L1 distance: $d_1(I_1, I_2) = \sum_p|I_1^p-I_2^p|$
![Alt text](image-1.png)
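
A small worked example of the L1 formula, using two made-up 2x2 single-channel images (the pixel values and the helper name `l1_distance` are assumptions for illustration). The cast to a signed integer type matters: subtracting `uint8` arrays directly would wrap around instead of going negative.

```python
import numpy as np

def l1_distance(a, b):
    """Sum of absolute pixel-wise differences between two images of the same shape."""
    # Cast to a signed type first so that differences of uint8 pixels do not wrap around
    a = np.asarray(a, dtype=np.int64)
    b = np.asarray(b, dtype=np.int64)
    return int(np.sum(np.abs(a - b)))

img1 = np.array([[10, 200], [30, 40]], dtype=np.uint8)
img2 = np.array([[12, 190], [35, 40]], dtype=np.uint8)

# |10-12| + |200-190| + |30-35| + |40-40| = 2 + 10 + 5 + 0
d = l1_distance(img1, img2)
print(d)  # 17
```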


```python
import numpy as np

class NearestNeighbor:
    def __init__(self):
        pass

    def train(self, X, y):
        """X is N x D where each row is an example. y is 1-d of size N"""
        # the nearest neighbor classifier simply remembers all the training data
        self.Xtr = X
        self.ytr = y

    def predict(self, X):
        """X is N x D where each row is an example we wish to predict label for"""
        num_test = X.shape[0]
        # make sure the output type matches the type of the training labels
        Ypred = np.zeros(num_test, dtype=self.ytr.dtype)

        # loop over all test rows
        for i in range(num_test):
            # L1 distance from the i-th test row to every training row (broadcasting)
            distances = np.sum(np.abs(self.Xtr - X[i]), axis=1)
            # index of the closest training example
            min_index = np.argmin(distances)
            # predict the label of that nearest example
            Ypred[i] = self.ytr[min_index]

        return Ypred
```
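
For a quick sanity check of the idea, here is a compact, vectorized variant on toy 2-D data. It is a sketch rather than a replacement for the class above: the function name `nn_predict_l1` and the data are made up, and the broadcasted intermediate array has shape (num_test, N, D), so it only suits small inputs.

```python
import numpy as np

def nn_predict_l1(Xtr, ytr, Xte):
    """Label each test row with the label of its L1-nearest training row."""
    Xtr = np.asarray(Xtr, dtype=np.float64)
    Xte = np.asarray(Xte, dtype=np.float64)
    # dists[i, j] = L1 distance between test row i and training row j
    dists = np.abs(Xte[:, None, :] - Xtr[None, :, :]).sum(axis=2)
    # for each test row, pick the label of the closest training row
    return np.asarray(ytr)[dists.argmin(axis=1)]

# Toy training set: three points, one per class
Xtr = np.array([[0, 0], [10, 10], [0, 10]])
ytr = np.array([0, 1, 2])

# Two test points: one near (0, 0), one near (10, 10)
Xte = np.array([[1, 1], [9, 8]])

preds = nn_predict_l1(Xtr, ytr, Xte)
print(preds)  # [0 1]
```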



#### Drawbacks of simple nearest neighbor
- Training complexity: $O(1)$ (it only stores the data)
- Testing complexity: $O(N)$ per test image (it compares against all $N$ training examples)     
This is the opposite of what we want: we can afford slow training, but want fast testing/predicting!

With this method we retrieve training images that look like the test image in terms of raw pixels, but the classifier does not actually know what it is looking for (it does not capture the semantic meaning).
![Alt text](image-2.png)

#### Decision Boundaries
A decision boundary is the boundary between two classification regions.    
With a single nearest neighbor, these boundaries can be noisy and sensitive to outliers.   
We want to smooth out the decision boundaries to get a more robust classification.
![Alt text](image-3.png)


### K-Nearest Neighbors
Using more neighbors (taking a majority vote among the K closest training examples) helps reduce the effect of outliers.   
But this may also create ties between classes that we need to break.
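
A minimal sketch of K-Nearest Neighbors with majority voting, on made-up data with one mislabeled-looking outlier. The tie-breaking rule used here (prefer the tied label whose neighbor is closest) is just one reasonable assumption; the notes do not prescribe one. The name `knn_predict` and the use of L1 distance are likewise illustrative choices.

```python
import numpy as np
from collections import Counter

def knn_predict(Xtr, ytr, x, k=3):
    """Predict the label of a single point x by majority vote among its k nearest neighbors."""
    Xtr = np.asarray(Xtr, dtype=np.float64)
    x = np.asarray(x, dtype=np.float64)
    # L1 distance from x to every training row
    dists = np.abs(Xtr - x).sum(axis=1)
    # indices of the k closest training rows, nearest first (stable sort keeps input order on ties)
    order = np.argsort(dists, kind="stable")[:k]
    neighbor_labels = [ytr[i] for i in order]
    counts = Counter(neighbor_labels)
    top = max(counts.values())
    # Assumed tie-break: among the most frequent labels, return the one with the closest neighbor
    for label in neighbor_labels:
        if counts[label] == top:
            return label

# Cluster of class 'a' near the origin containing one 'b' outlier at (2, 2)
Xtr = [[0, 0], [4, 0], [0, 4], [2, 2], [20, 20], [22, 20]]
ytr = ['a', 'a', 'a', 'b', 'b', 'b']
query = [3, 2]

print(knn_predict(Xtr, ytr, query, k=1))  # 'b': the single nearest neighbor is the outlier
print(knn_predict(Xtr, ytr, query, k=2))  # 'b': 1-1 tie, broken in favor of the closest neighbor
print(knn_predict(Xtr, ytr, query, k=3))  # 'a': the outlier is outvoted
```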

#### Distance Metric
L2 distance: $d_2(I_1, I_2) = \sqrt{\sum_p(I_1^p-I_2^p)^2}$
![Alt text](image-4.png)
What the L2 metric means semantically for images is not intuitively clear, but given a distance metric such as L2 we can apply K-Nearest Neighbors to any type of data.
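
The choice of metric can change which neighbor counts as "nearest". The sketch below uses made-up 2-D points (the helper names `l1` and `l2` are assumptions) where L1 and L2 disagree about which of two candidates is closer to the query.

```python
import numpy as np

def l1(a, b):
    """L1 (Manhattan) distance: sum of absolute differences."""
    return float(np.abs(a - b).sum())

def l2(a, b):
    """L2 (Euclidean) distance: square root of the sum of squared differences."""
    return float(np.sqrt(((a - b) ** 2).sum()))

query = np.array([0.0, 0.0])
p = np.array([3.0, 3.0])  # off-axis point
q = np.array([0.0, 5.0])  # point on a coordinate axis

print(l1(query, p), l1(query, q))  # 6.0 5.0 -> q is nearer under L1
print(l2(query, p), l2(query, q))  # about 4.243 and 5.0 -> p is nearer under L2
```

One design consideration: L2 distance is unchanged when the coordinate axes are rotated, while L1 distance depends on the choice of axes.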

### Hyperparameters
Choices about our learning algorithm that we do not learn from the training data; instead we set them before learning starts.
- The best value of K to use
- The best distance metric to use

The best choices are very problem-dependent.
#### Setting hyperparameters