## AI Saturdays, Boise - Week 6 (June 30, 2018)
#### AI Developers Boise
<img src="ai6-banner.png" height=400 width=650/>



## Acknowledgements
<img src="metageek-logo.svg" height=200 width=400/>
- **For supporting the workshop and sponsoring this week's refreshments for participants**

## Notes from Lecture 2, CS231n

Prepared by:
- Ashish Sharma <accssharma@gmail.com>

## Image Classification

- Core task in computer vision
- Easy for humans: we draw on experience, prior knowledge about objects, and a very strong ability to extract and understand abstract ideas from what we see
- For computers, image classification is very challenging
    - Image = just a grid of pixel values
    - Viewpoint
    - Illumination
    - Deformation
    - Occlusion
    - Background color
    - Inter-class (and intra-class) variation

- No obvious way to hard-code an algorithm that recognizes objects; explicit rules are hard to write and yield limited capability

- Solution?
    - Data driven approach
        - Collect dataset of images and labels
        - Use ML to train a classifier
        - Evaluate the classifier on new images
        
#### Simplest Classifier: Nearest Neighbor
- Memorize all data and labels
- Predict the label of the most similar training example, as measured by a distance metric


#### Distance metrics
- L1 distance (sum of pixel-wise absolute value differences)
- L2 (Euclidean) distance (square root of the sum of pixel-wise squared differences)
- Cosine similarity
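
The three metrics above can be sketched in NumPy as follows. This is a minimal illustration on flattened vectors; the function names and the sample values are made up for this example and are not part of the lecture.

```python
import numpy as np

def l1_distance(a, b):
    """Sum of absolute element-wise differences."""
    return np.sum(np.abs(a - b))

def l2_distance(a, b):
    """Square root of the sum of squared element-wise differences."""
    return np.sqrt(np.sum((a - b) ** 2))

def cosine_similarity(a, b):
    """Cosine of the angle between a and b (1.0 means same direction)."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Two tiny "images" flattened to vectors (made-up values for illustration).
img_a = np.array([1.0, 2.0, 3.0])
img_b = np.array([4.0, 6.0, 3.0])

print(l1_distance(img_a, img_b))        # 3 + 4 + 0 = 7.0
print(l2_distance(img_a, img_b))        # sqrt(9 + 16 + 0) = 5.0
print(cosine_similarity(img_a, img_a))  # a vector compared with itself
```

Note that L1 and L2 are distances (smaller means more similar), while cosine similarity is a similarity (larger means more similar), so the nearest neighbor is an `argmin` for the first two and an `argmax` for the third.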




## Nearest Neighbor Classifier: Skeleton

In [49]:
import numpy as np

class NearestNeighbor:
    """1-nearest-neighbor classifier using L1 (Manhattan) distance."""

    def __init__(self):
        pass

    def train(self, X, y):
        """Memorize the training data: X has shape (N, D), y has shape (N,)."""
        self.X = X
        self.y = y
        self.tr_size = self.X.shape[0]
        self.dim = self.X.shape[1]

    def __distance(self, frm):
        """L1 distance from the point `frm` to every training point."""
        return np.sum(np.abs(self.X - frm), axis=1)

    def predict(self, X_pred):
        """Return the label of the nearest training point for each row of X_pred."""
        assert isinstance(X_pred, (list, np.ndarray))
        X_pred = np.asarray(X_pred)
        # Expect a non-empty 2-D batch of query points.
        assert X_pred.ndim == 2 and X_pred.shape[0] > 0

        num_test = X_pred.shape[0]
        y_pred = np.zeros(num_test, dtype=self.y.dtype)

        for ind, x_pred in enumerate(X_pred):
            dist = self.__distance(x_pred)
            print(dist)  # show the distances to all training points
            assert len(dist) == self.tr_size
            min_index = np.argmin(dist)  # index of the closest training point
            y_pred[ind] = self.y[min_index]
        return y_pred

In [50]:
X = np.array([[1, 2, 3],[4, 5, 6],[7, 8, 9], [10, 11, 12]])
y = np.array([1,1,0,0])
X_pred = [[1,7,9]]

In [51]:
nn = NearestNeighbor()
nn.train(X, y)
nn.predict(X_pred)

[11  8  7 16]


array([0])

### Runtime
- Train: O(1)
- Test: O(N)
- This is the opposite of what we generally want. Spending time on training is fine, but prediction should be fast.


### k=1
- Island problem: a single outlier carves out an isolated region of its own class inside another class's region
- Not robust to noisy or spurious data points

### k > 1 => k-NN
- Instead of assigning the class based on a single nearest neighbor, take the k closest neighbors according to the distance metric and choose the label by a vote among them (a plain majority vote, a distance-weighted vote, etc.)
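
A minimal sketch of the majority-vote variant, reusing the toy data from the notebook cells above. The function `knn_predict` is a hypothetical helper written for this illustration; it uses L1 distance and an unweighted vote.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=3):
    """Predict the label of x_query by majority vote among its k nearest
    training points under L1 distance."""
    dist = np.sum(np.abs(X_train - x_query), axis=1)
    nearest = np.argsort(dist, kind="stable")[:k]  # indices of the k smallest distances
    votes = Counter(y_train[nearest].tolist())
    return votes.most_common(1)[0][0]

X_train = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
y_train = np.array([1, 1, 0, 0])
x_query = np.array([1, 7, 9])

print(knn_predict(X_train, y_train, x_query, k=1))  # 0: the single nearest point has label 0
print(knn_predict(X_train, y_train, x_query, k=3))  # 1: the three nearest labels are 0, 1, 1
```

On this data the prediction flips from 0 to 1 when k goes from 1 to 3, which shows how much the choice of k can matter.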

## How to choose k? How to choose the distance metric?
- Trial and error, guided by educated guesses
- Problem dependent
    
## Setting hyperparameters
- Choose the hyperparameters that work best on the training data? Terrible idea! NN with k=1 always works best on the training data, so this is prone to overfitting.

## Train/Test 
- Choosing the hyperparameters that work best on the test data is also a terrible idea: we are left with no idea how the algorithm will perform on new data

## Train/Val/Test
- Train on the training set
- Choose the model and its hyperparameters based on performance on a held-out validation set
- Evaluate the final model on the test set
- Train and validation sets should come from the same probability distribution if possible
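
The workflow above can be sketched as follows. Everything here is an assumption made for illustration: the synthetic two-class data, the 60/20/20 split, the candidate values of k, and the helper `knn_predict` (majority-vote k-NN with L2 distance).

```python
import numpy as np

def knn_predict(X_train, y_train, X_query, k):
    """Majority-vote k-NN with L2 distance for non-negative integer labels."""
    preds = np.empty(len(X_query), dtype=y_train.dtype)
    for i, x in enumerate(X_query):
        dist = np.sqrt(np.sum((X_train - x) ** 2, axis=1))
        nearest = np.argsort(dist)[:k]
        preds[i] = np.bincount(y_train[nearest]).argmax()
    return preds

# Synthetic two-class data (made up for illustration).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (100, 2)), rng.normal(2.0, 1.0, (100, 2))])
y = np.array([0] * 100 + [1] * 100)

# Shuffle, then split 60% / 20% / 20% into train / validation / test.
order = rng.permutation(len(X))
X, y = X[order], y[order]
X_train, y_train = X[:120], y[:120]
X_val, y_val = X[120:160], y[120:160]
X_test, y_test = X[160:], y[160:]

# Pick k using the validation set only.
candidate_ks = [1, 3, 5, 7]
val_acc = {k: np.mean(knn_predict(X_train, y_train, X_val, k) == y_val)
           for k in candidate_ks}
best_k = max(val_acc, key=val_acc.get)

# Touch the test set once, at the very end.
test_acc = np.mean(knn_predict(X_train, y_train, X_test, best_k) == y_test)
print(best_k, val_acc[best_k], test_acc)
```

Because the test set plays no part in choosing `best_k`, `test_acc` is an honest estimate of performance on unseen data.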

## Cross-validation 
- Used with small datasets and in traditional ML
- Not very practical for large neural networks, where more advanced regularization techniques are used to control overfitting
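
A minimal sketch of k-fold cross-validation for choosing k in k-NN. The fold count, the synthetic data, and the helpers `knn_accuracy` and `cross_validate` are hypothetical choices for this illustration.

```python
import numpy as np

def knn_accuracy(X_train, y_train, X_eval, y_eval, k):
    """Accuracy of majority-vote k-NN on an evaluation set."""
    correct = 0
    for x, label in zip(X_eval, y_eval):
        dist = np.sum((X_train - x) ** 2, axis=1)  # squared L2 gives the same ranking as L2
        nearest = np.argsort(dist)[:k]
        correct += int(np.bincount(y_train[nearest]).argmax() == label)
    return correct / len(X_eval)

def cross_validate(X, y, k, num_folds=5):
    """Mean validation accuracy when each fold is held out in turn."""
    X_folds = np.array_split(X, num_folds)
    y_folds = np.array_split(y, num_folds)
    scores = []
    for i in range(num_folds):
        # Train on every fold except fold i, validate on fold i.
        X_tr = np.concatenate(X_folds[:i] + X_folds[i + 1:])
        y_tr = np.concatenate(y_folds[:i] + y_folds[i + 1:])
        scores.append(knn_accuracy(X_tr, y_tr, X_folds[i], y_folds[i], k))
    return float(np.mean(scores))

# Synthetic two-class data (made up for illustration), shuffled before folding.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (50, 2)), rng.normal(2.0, 1.0, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
order = rng.permutation(len(X))
X, y = X[order], y[order]

cv_scores = {k: cross_validate(X, y, k) for k in [1, 3, 5]}
print(cv_scores)
```

Each candidate k requires `num_folds` rounds of evaluation, which is cheap for k-NN on small data but expensive when each round means training a large network.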


## Disadvantages of K-NN in images
- Very slow at test time
- Distance metrics on raw pixels are not informative
    - Boxed (partly occluded), shifted, and tinted versions of an image can all have a similar L2 distance to the original, even though the information in each image is very different
- Curse of dimensionality: the number of training examples needed grows exponentially with the number of dimensions
- K-NN on images is almost never used
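
The point about pixel distances can be illustrated with a small constructed example. The 8x8 "image" and both perturbations are made up so that the arithmetic is easy to follow; they are not taken from the lecture.

```python
import numpy as np

# A made-up 8x8 grayscale "image" with float pixel values.
original = np.arange(64, dtype=float).reshape(8, 8)

# Global change: every pixel brightened by 1 (a mild tint).
tinted = original + 1.0

# Local change: only a 4x4 patch is altered, each pixel by 2.
patched = original.copy()
patched[2:6, 2:6] += 2.0

def l2(a, b):
    """L2 distance between two arrays of the same shape."""
    return np.sqrt(np.sum((a - b) ** 2))

print(l2(original, tinted))   # sqrt(64 * 1^2) = 8.0
print(l2(original, patched))  # sqrt(16 * 2^2) = 8.0
```

A small change to every pixel and a larger change to a small patch end up at exactly the same L2 distance, so the distance alone cannot tell the two kinds of change apart.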

## Linear Classification
- this will generalize quite well to large neural networks

### Dataset
- CIFAR10
- 50,000 training images across 10 classes
- 10,000 test images
- Each image is 32x32x3

### Parametric approach
- Function approximation that takes an image and outputs n numbers, one score (weight) for each class
- Summarize the knowledge in the training data into parameters, then use these optimized parameters for inference
- Bias: a data-independent offset added to each class score (larger for classes with more examples when the data is unbalanced)
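
A minimal sketch of the parametric approach, assuming the standard linear form f(x, W) = Wx + b with CIFAR10-sized inputs. The weights here are random and untrained, and the input is a random stand-in for an image, so the predicted class is meaningless; the sketch only shows the shapes involved.

```python
import numpy as np

num_classes = 10
num_pixels = 32 * 32 * 3  # 3072 values per flattened CIFAR10 image

rng = np.random.default_rng(0)
W = rng.normal(0.0, 0.01, (num_classes, num_pixels))  # weights (random, untrained)
b = np.zeros(num_classes)                             # one bias per class

def linear_scores(x, W, b):
    """Class scores f(x, W) = W x + b for one flattened image x."""
    return W @ x + b

x = rng.integers(0, 256, num_pixels).astype(float)  # random stand-in for an image
scores = linear_scores(x, W, b)

print(scores.shape)            # (10,)
print(int(np.argmax(scores)))  # index of the highest-scoring class
```

Unlike k-NN, the training data is not needed at test time: only `W` and `b` are kept, and a prediction costs one matrix-vector product.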

## Notes from Lecture 3

- Linear classifier is one of the major building blocks of neural networks
- Define a **loss function** that quantifies our unhappiness with the scores across the training data
- Come up with a way of **efficiently finding the parameters that minimize the loss function (optimization)**
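
As one possible example of a loss function, here is a sketch of the multiclass SVM (hinge) loss for a single example. The notes above do not name a specific loss, so this choice, the margin of 1, and the sample scores are assumptions made for illustration. The optimization step is not shown.

```python
import numpy as np

def multiclass_svm_loss(scores, correct_class, margin=1.0):
    """Hinge loss for one example: the sum over wrong classes j of
    max(0, s_j - s_correct + margin)."""
    margins = np.maximum(0.0, scores - scores[correct_class] + margin)
    margins[correct_class] = 0.0  # the correct class does not contribute
    return float(np.sum(margins))

# Illustrative scores for three classes.
scores = np.array([3.2, 5.1, -1.7])

print(multiclass_svm_loss(scores, correct_class=0))  # about 2.9: class 1 outscores the correct class
print(multiclass_svm_loss(scores, correct_class=1))  # 0.0: the correct class wins by more than the margin
```

The loss is zero when the correct class beats every other class by at least the margin, and it grows as wrong classes score higher. Averaging it over the training set gives the quantity that optimization would minimize.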