#  Image Analysis for Geospatial Applications

##  Lab 1: Hand-crafted features & Bayesian classifiers

Please enter your names below (double click to edit the table).

| Name  | Matr.-nr. |
|-|-|
| your name | 12345 |

In this lab, you will first implement methods to extract features from images, and then implement and train Bayesian classifiers that classify real aerial images using these features. After that, you will apply the classifiers to a synthetic toy dataset in a 2D feature space.

The __libraries__ required for this lab can be installed by running the following command in a code cell:

``!conda install numpy matplotlib scikit-learn imageio tifffile``

__Note that this only needs to be done if you have not installed these libraries already.__

Required __imports__ for this lab. Do not forget to run the following cell (it may __take a few minutes__ because some functions are compiled).

__Note__: Please read the instructions for each exercise carefully before answering. You __should not use any external libraries except those imported in the code cell below__!

Use the following link [to download the data](https://seafile.cloud.uni-hannover.de/d/add5ddfba5bf43699f1b/). The folder `Vaihingen` contains both training and test samples and __should be placed in the `Data` folder__ of the directory from which this notebook is run.

In [None]:
# IMPORTS
import lab                                      # Given functions
import numpy as np                              # Numerical computations
import matplotlib                               # Plots
import matplotlib.pyplot as plt     
import imageio.v2 as imageio
import tifffile                                 # Reading .tiff file

# GLOBAL SETTINGS
PlotSize = 8                                     # Size of plots
matplotlib.rcParams['figure.figsize'] = [PlotSize*2, PlotSize]  
CMAP = plt.cm.Accent                             # Color mapping 
np.set_printoptions(precision=3)                 # Array print precision

# CLASS AND FEATURE DESCRIPTION
class_names   = ['STREET','HOUSE','LOW VEG.','HIGH VEG.','CAR']
feature_names = ['NDVI','NDSM','NIR','RED','GREEN']
num_classes   = len(class_names); num_features = len(feature_names)

training_set_path = './Data/Vaihingen/Train/'     # Relative path to training patch root folder
test_set_path     = './Data/Vaihingen/Test/'      # Relative path to test patch root folder


# create the Data folder if it does not exist
lab.create_data_folder()

## Exercise 1: Feature extraction & Data preparation -  (20%)

### Exercise 1.1: Normalized Difference Vegetation Index (NDVI) -  5%

__Implement__ the function below, which computes an NDVI image. You can assume that the first channel of the input image corresponds to the near infrared band ($NIR$) and the second channel corresponds to the red band ($R$). The $NDVI$ is defined as

$$ NDVI = \dfrac{NIR-R}{NIR+R} $$
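A minimal NumPy sketch of the formula, assuming an `(h, w, 3)` array with NIR in channel 0 and red in channel 1. The helper name `ndvi_sketch` and the tolerance `eps` are illustrative only; setting pixels with $NIR + R = 0$ to 0 is one possible convention for avoiding the division by zero, not necessarily the one expected in the exercise.

```python
import numpy as np

def ndvi_sketch(I, eps=1e-8):
    """Hypothetical helper: NDVI of an (h, w, 3) image with channels (NIR, R, G)."""
    # Cast before any arithmetic: uint8 sums above 255 would wrap around
    nir = I[..., 0].astype(np.float32)
    red = I[..., 1].astype(np.float32)
    denom = nir + red
    zero = np.abs(denom) < eps
    # Replace zero denominators by 1 so the division is safe, then set those pixels to 0
    ratio = (nir - red) / np.where(zero, 1.0, denom)
    return np.where(zero, 0.0, ratio)

I_toy = np.array([[[200, 50, 30], [0, 0, 0]]], dtype=np.uint8)  # 1 x 2 pixel toy image
print(ndvi_sketch(I_toy))  # first pixel: (200 - 50) / (200 + 50); second pixel: 0
```

The values lie in $[-1, 1]$; vegetation typically has a high NIR reflectance and therefore a positive NDVI.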

In [None]:
def compute_ndvi(I):
    h, w, d = I.shape
    assert d == 3, "ndvi computation only valid on multi channel images!"
    num_samples = h*w

    # YOUR CODE GOES HERE
    #
    # Step 1: compute the NDVI channel from the first (NIR) and second (R) channels of the image
    # Step 2: prevent a possible division by zero
    
    return NDVI

__Run__ the next cell to test your implementation.

In [None]:
I = lab.imread3D('images/house_nirrg.jpg')

# apply your function on the Image
NDVI = compute_ndvi(I)

# compute normalized NDVI-image
NDVI = lab.normalize(NDVI.reshape((I.shape[0], I.shape[1], 1)))

lab.imshow3D(I)
print('Original Image')

lab.imshow3D(NDVI)           
print('Normalized Difference Vegetation Index (normalized)')

### Exercise 1.2: Data pre-processing -  15%

#### Data overview

In this exercise, aerial images will be classified. There is one patch for training and one for testing. Each patch has a size of $800 \times 800$ pixels. For each pixel, the values for

- Near infrared (NIR)
- Red (R)
- Green (G)

are available in the image ('IR_R_G.png'), where the first channel corresponds to NIR, the second to Red and the third to Green. The values are quantized with 8 bits (1 byte) per pixel and channel. Additionally, a normalized digital surface model (NDSM) is available. The information is stored in an image having 32 bits per pixel ('NDSM.tif'), each pixel containing the height above ground in [m] as a `float32` value. The ground truth (GT) labels are given in a colour coded image file ('GT.png'), in which colours correspond to class labels. 

__Run__ the next two cells to visualize the data.

In [None]:
print('Training data')
lab.plot_patch(training_set_path)

In [None]:
print('Test data')
lab.plot_patch(test_set_path)

__Implement__ the function `read_patch(root_folder)`. The function should:

- Read IR_R_G, NDSM and GT images located inside the `root_folder`, using the filenames given above
- Change data type of IR, R and G to `float32`
- Compute the NDVI as an additional feature (call the function you implemented in Exercise 1.1)
- Shift and scale IR, R and G so that 0 is mapped to -1.0 and 255 is mapped to 1.0
- Normalize the NDSM by subtracting 5.0 and then dividing by 5.0
- Build the GT label maps using the given color codes. __The classes clutter and low vegetation should be merged!__

The function should return two arrays $X, y$, where $X$ is an $N\times 5$ matrix and $y$ is an $N$-dimensional vector holding the true labels. $N$ is the number of samples ($N_{rows}\times N_{cols}$, i.e. the number of rows times the number of columns of the images; here, we have $N_{rows} = N_{cols} = 800$). Each row in $X$ should contain the normalized features in the order (NIR, R, G, NDSM, NDVI).

__Colour codes of the classes:__

|Class ID|Description|Colour|RGB value|
|-|-|-|-|
|0|Street|WHITE|(255,255,255)|
|1|Building|BLUE|(0,0,255)|
|2|Low Vegetation|CYAN|(0,255,255)|
|3|High Vegetation|GREEN|(0,255,0)|
|4|Car|YELLOW|(255,255,0)|
|2|Clutter|RED|(255,0,0)|
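The colour decoding and the two scalings can be sketched as follows, assuming the GT image is loaded as an `uint8` RGB (or RGBA) array. The names `COLOUR_TO_ID`, `gt_to_labels_sketch` and `scale_sketch` are hypothetical, and marking colours that are not in the table with `-1` is an assumption; the lab does not specify how to handle them.

```python
import numpy as np

# Colour codes from the table above (clutter shares ID 2 with low vegetation)
COLOUR_TO_ID = {
    (255, 255, 255): 0,  # street
    (0, 0, 255): 1,      # building
    (0, 255, 255): 2,    # low vegetation
    (0, 255, 0): 3,      # high vegetation
    (255, 255, 0): 4,    # car
    (255, 0, 0): 2,      # clutter, merged into low vegetation
}

def gt_to_labels_sketch(gt):
    """Hypothetical helper: map an (h, w, 3) colour-coded GT image to a flat label vector."""
    gt = gt[..., :3]                                     # drop a possible alpha channel
    labels = np.full(gt.shape[:2], -1, dtype=np.int64)   # -1 marks colours not in the table
    for colour, class_id in COLOUR_TO_ID.items():
        mask = np.all(gt == np.array(colour, dtype=gt.dtype), axis=-1)
        labels[mask] = class_id
    return labels.reshape(-1)

def scale_sketch(irrg, ndsm):
    """Map 8-bit values [0, 255] linearly to [-1, 1] and normalise the NDSM as (h - 5) / 5."""
    irrg = irrg.astype(np.float32) / 127.5 - 1.0
    ndsm = (ndsm.astype(np.float32) - 5.0) / 5.0
    return irrg, ndsm

gt_toy = np.array([[[255, 0, 0], [0, 0, 255], [12, 34, 56]]], dtype=np.uint8)
print(gt_to_labels_sketch(gt_toy))  # clutter -> 2, building -> 1, unknown colour -> -1
```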

In [None]:
def read_patch(root_folder):
    
    # Read images
    # Convert IRRG to float32
    # Compute NDVI: call compute_ndvi(); your previous implementation
    # Shift and scale 'IRRG' 
    # Shift and scale NDSM by using the instructions given in the above description
    # Stack features to 'X'
    # Save labels to 'y'
    
    return X, y

__Run__ the next cell to read the training and test patches. The assertions will check the shape of the result.

In [None]:
X, y = read_patch(training_set_path)
X_t, y_t = read_patch(test_set_path)

assert X.shape == (800*800, 5), "X has a wrong shape"
assert y.shape == (800*800,),   "y has a wrong shape"

print('min/max irrg', np.min(X_t[:,:3]), np.max(X_t[:,:3]))
print('min/max ndsm', np.min(X_t[:, 3]), np.max(X_t[:, 3]))
print('min/max ndvi', np.min(X_t[:, 4]), np.max(X_t[:, 4]))

### Feature selection

In the next cell, use the output of your function `read_patch()` to select only two features: __NDSM__ and __NDVI__. The suffix `_t` in the variable names below denotes test samples.
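Since the next cell combines the selected features with `np.hstack`, it matters whether each feature is a 1D vector or an $N \times 1$ column. The toy example below (the array `X_demo` is made up) shows the difference between integer indexing and slicing.

```python
import numpy as np

# Toy feature matrix: 3 samples, columns ordered (NIR, R, G, NDSM, NDVI)
X_demo = np.arange(15, dtype=np.float32).reshape(3, 5)

col_1d = X_demo[:, 3]    # integer index drops the axis: shape (3,)
col_2d = X_demo[:, 3:4]  # slice keeps it: shape (3, 1)

# np.hstack concatenates 1D arrays end to end, but puts (N, 1) columns side by side
print(np.hstack((col_1d, X_demo[:, 4])).shape)    # (6,)
print(np.hstack((col_2d, X_demo[:, 4:5])).shape)  # (3, 2)
```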

In [None]:
X, y     = read_patch(training_set_path) # training
X_t, y_t = read_patch(test_set_path)     # testing
 
# YOUR CODE GOES HERE!

ndsm, ndsm_t, ndvi, ndvi_t = #

# New training and test sample sets
X   = np.hstack((ndsm, ndvi))
X_t = np.hstack((ndsm_t, ndvi_t))

# The assertions will again check the shape of the result.
assert X.shape == (800*800, 2), "X has a wrong shape"
assert y.shape == (800*800,),   "y has a wrong shape"

## Exercise 2: Evaluation -  (10%)

For a visual evaluation, __implement__ the function `get_labels_as_image()` that turns a label vector $Y$ back into a colour image (using the colour codes given above) with height, width and depth as defined in `shape`.
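One way to sketch the inverse mapping is a lookup table indexed by the label vector, assuming integer labels 0 to 4 as in the colour table. `PALETTE` and `labels_to_image_sketch` are hypothetical names. Clutter pixels, merged into class 2, come back as cyan; the test cell below excludes them, because their weighted colour sum $(255, 0, 0)\cdot(1, 3, 5) = 255$ is not above 255.

```python
import numpy as np

# Row i holds the RGB colour of class i (order as in the colour table)
PALETTE = np.array([[255, 255, 255],   # 0 street
                    [0, 0, 255],       # 1 building
                    [0, 255, 255],     # 2 low vegetation (incl. clutter)
                    [0, 255, 0],       # 3 high vegetation
                    [255, 255, 0]],    # 4 car
                   dtype=np.uint8)

def labels_to_image_sketch(Y, shape):
    """Fancy indexing turns N labels into an (N, 3) colour array, then reshape to the image."""
    return PALETTE[np.asarray(Y)].reshape(shape)

img = labels_to_image_sketch(np.array([0, 1, 2, 3]), (2, 2, 3))
```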

In [None]:
def get_labels_as_image(Y, shape):
    # Convert labels Y back to a color image
    
    
    return Y_color

The code in the next cell will check your implementation.

In [None]:
YC = np.sum(imageio.imread(training_set_path + 'GT.png')*(1,3,5),-1)
YP = np.sum(get_labels_as_image(y, [800,800,3])*(1,3,5),-1)
if np.equal(YC[YC>255], YP[YC>255]).all():
    print('implementation seems correct!')
else:
    print('implementation seems wrong!')

To analyze the classifiers quantitatively, we need a function that computes quality metrics.
__Implement__ the function below, which computes the following metrics (all in the range [0, 1]):

- Precision per class (1D array)
- Recall per class (1D array)
- F1-score per class (1D array)
- Overall accuracy (scalar)
- Mean F1-score  (scalar)

The function takes the array of predictions $Y$, the corresponding reference labels $y$ and the number of classes $C$ as input.
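A sketch based on a confusion matrix (rows: reference class, columns: predicted class), using precision $= TP/(TP+FP)$, recall $= TP/(TP+FN)$, $F1 = 2PR/(P+R)$ and overall accuracy $=$ correctly classified samples / all samples. The function name is hypothetical, and returning 0 for classes that never occur (instead of dividing by zero) is an assumed convention.

```python
import numpy as np

def quality_metrics_sketch(Y, y, C):
    """Y: predicted labels, y: reference labels, C: number of classes."""
    cm = np.zeros((C, C), dtype=np.int64)
    np.add.at(cm, (np.asarray(y), np.asarray(Y)), 1)   # cm[ref, pred] += 1 per sample
    tp = np.diag(cm).astype(np.float64)
    predicted = cm.sum(axis=0)   # number of samples predicted as each class (TP + FP)
    actual = cm.sum(axis=1)      # number of reference samples of each class (TP + FN)
    # Empty classes get 0 instead of a division by zero
    precision = np.divide(tp, predicted, out=np.zeros(C), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros(C), where=actual > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros(C), where=denom > 0)
    overall_accuracy = tp.sum() / cm.sum()
    return precision, recall, f1, overall_accuracy, f1.mean()

p, r, f1, oa, mf1 = quality_metrics_sketch(np.array([0, 0, 1, 1]), np.array([0, 1, 1, 1]), 2)
```

Note that the mean F1-score weights every class equally, whereas the overall accuracy is dominated by frequent classes.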

In [None]:
def compute_quality_metrics(Y, y, C):
    # YOUR CODE GOES HERE
    
    return precisions, recalls, f1_scores, overall_accuracy, mean_f1_score

## Exercise 3: Generative probabilistic classifiers -  (35%)

### Exercise 3.1: Single Gaussian Model -  10%

In the following cell, some code for a Bayesian classifier using a single Gaussian model for the likelihood (Normal Distribution Classifier, NDC) is given. The class design is adapted from scikit-learn: each classifier is implemented as a class. The method `fit(X, y)` takes the training samples $X$ and the corresponding labels $y$ as arguments and fits the model to the data; for the NDC, this means computing the mean and the covariance matrix of each class. The method `compute_likelihoods(X)` computes, for each feature vector in $X$, its likelihood with respect to each of the classes.

__Complete__ the method ``fit(X, y)`` that computes the mean and the covariance matrix for each class.

__Complete__ the method ``compute_likelihoods(X)`` that computes the likelihood of the features `X`.

__Complete__ the method ``compute_posteriors(L, P_prior)`` that computes the posterior probabilities $P_{post}$ for $N$ samples. Inputs are the likelihood matrix $L$ with shape $N\times c$ for $N$ samples and $c$ classes, and the prior probabilities $P_{prior}$ for the classes. Remember that $P_{prior}$ is a one-dimensional vector that contains the prior probability of each class: $P_{prior} = [p(C_0), \dots, p(C_{c-1})]$. Make sure not to modify $L$ in this function.
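For reference, with $F$ features the class-conditional Gaussian likelihood and Bayes' rule read

$$ p(\mathbf{x} \mid C_c) = \frac{1}{\sqrt{(2\pi)^F \det \boldsymbol{\Sigma}_c}} \exp\left(-\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_c)^T \boldsymbol{\Sigma}_c^{-1} (\mathbf{x}-\boldsymbol{\mu}_c)\right), \qquad P(C_c \mid \mathbf{x}) = \frac{p(\mathbf{x} \mid C_c)\, p(C_c)}{\sum_{k} p(\mathbf{x} \mid C_k)\, p(C_k)} $$

The following is a vectorised sketch of these steps with hypothetical helper names; it is not the class above, whose methods loop over samples. `bias=True` gives the maximum likelihood covariance (division by $N_c$ rather than $N_c - 1$).

```python
import numpy as np

def ml_mean_cov_sketch(Xc):
    """ML mean and covariance of the samples Xc [N_c x F] of one class."""
    return Xc.mean(axis=0), np.atleast_2d(np.cov(Xc, rowvar=False, bias=True))

def gaussian_likelihoods_sketch(X, means, covars):
    """X: [N x F], means: [C x F], covars: [C x F x F] -> likelihoods [N x C]."""
    N, F = X.shape
    L = np.empty((N, means.shape[0]))
    for c in range(means.shape[0]):
        diff = X - means[c]
        inv = np.linalg.inv(covars[c])
        mahal = np.einsum('nf,fg,ng->n', diff, inv, diff)  # squared Mahalanobis distance
        L[:, c] = np.exp(-0.5 * mahal) / np.sqrt((2 * np.pi) ** F * np.linalg.det(covars[c]))
    return L

def posteriors_sketch(L, P_prior):
    """Bayes' rule; builds a new array, so L itself is not modified."""
    joint = L * np.asarray(P_prior, dtype=float)[None, :]
    total = joint.sum(axis=1, keepdims=True)
    # Rows whose likelihoods all underflow to 0 would give 0/0; they are left as 0 here
    return np.divide(joint, total, out=np.zeros_like(joint), where=total > 0)
```

Far from all class means the likelihoods can underflow to zero, so some guard of this kind is worth considering when evaluating the whole feature space.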

In [None]:
class NormalDistributionClassifier():
    
    def __init__(self, num_classes, num_features):
        #  Initializes the classifier
        #  num_classes: number of classes
        #  num_features: number of features (length of each feature vector)
        
        C = num_classes
        F = num_features
        self.num_classes = C                    # Store number of classes  
        self.num_features = F                   # Store number of features 
        self.means = np.zeros((C, F))           # Init. means
        self.covars = np.zeros((C, F, F))       # Init. covars
        self.inv_covars = np.zeros((C, F, F))   # Init. inverses of covars
        self.det_covars = np.zeros((C))         # Init. determinants
    
    
    def fit(self, X, y):
        #  Computes mean and covariance matrix for each class using ML
        #  X: feature_vectors [num_feature_vectors x num_features]
        #  y: corresponding labels [num_feature_vectors]
        
        feature_vectors_by_classes = lab.get_feature_vectors_by_classes(X, y) # Group samples by classes
        for c in range(self.num_classes):
            feature_vecs_class_c = feature_vectors_by_classes[c]
            ########################################################## 
            
            # YOUR CODE GOES HERE!
            
            self.means[c] =  # mean for class c
            self.covars[c] = # covariance matrix for class c
            
            ###########################################################
            
            self.inv_covars[c] = np.linalg.inv(self.covars[c])
            self.det_covars[c] = np.linalg.det(self.covars[c])

    def uniform_prior(self):
        # Assumes uniform priors
        
        return [1/self.num_classes for i in range(self.num_classes)]
            
    def compute_likelihoods(self, X):
        #  Computes likelihoods for feature vectors
        #  X: feature_vectors [num_feature_vectors x num_features]
        
        num_samples = X.shape[0]                                  # Number of samples to predict
        likelihoods = np.zeros((num_samples, self.num_classes))   # Init. likelihood matrix
        
        for xi, x in enumerate(X):
            for c in range(self.num_classes):
                ########################################################## 
                
                # YOUR CODE GOES HERE!
            
                likelihoods[xi, c] = 
                ##########################################################
                
        return likelihoods

    
    def compute_posteriors(self, L, P_prior=None):
        # Computes posteriors from likelihoods
        #  L: likelihoods [num_feature_vectors x num_classes]
        #  P_prior: prior probabilities [num_classes] (uniform if None)
        
        # If P_prior is None, we assume a uniform prior. This allows us
        # to call .predict() without specifying priors beforehand.
        if P_prior is None:
            P_prior = self.uniform_prior()
        # Use a tolerance: priors such as 1/65 may not sum to exactly 1.0 in floating point
        assert np.isclose(np.sum(P_prior), 1.0), "The prior has to sum up to one!"
        ##########################################################
        
        # YOUR CODE GOES HERE!
        
        posteriors = 
        
        ##########################################################
        return posteriors
    
    def predict(self, X):
        
        # Predicts labels for feature vectors in X
        #  X: feature_vectors [num_feature_vectors x num_features]
        
        P = self.compute_posteriors(self.compute_likelihoods(X))
        return np.argmax(P, axis=1)

In the next cell a classifier is created and trained. __Run the cell__ to check your implementation.

In [None]:
# Create classifier instance
ndc = NormalDistributionClassifier(num_classes=num_classes, num_features=2)

# Train the classifier
ndc.fit(X, y)

The next two cells compute the predictions assuming a uniform prior and run the visual and quantitative evaluation.

In [None]:
# Make predictions

# Compute likelihoods
L_Y_t = ndc.compute_likelihoods(X_t)  

# Compute posteriors assuming uniform prior
P_Y_t = ndc.compute_posteriors(L_Y_t, [0.20,0.20,0.20,0.20,0.20])
Y_t = np.argmax(P_Y_t, axis=1)

I_pred = get_labels_as_image(Y_t, (800, 800, 3))
I_gt   = get_labels_as_image(y_t, (800, 800, 3))

lab.plot_pred_gt(I_pred, I_gt)

In [None]:
precisions, recalls, f1_scores, overall_accuracy, mean_f1_score = compute_quality_metrics(Y_t, y_t, 5)
print('precisions [%]:      ', precisions*100)
print('recalls    [%]:      ', recalls*100)
print('F1-score   [%]:      ', f1_scores*100)
print('')
print('overall accuracy: {:.2%}'.format(overall_accuracy))
print('mean F1-score   : {:.2%}'.format(mean_f1_score))

#### Training and evaluation of the NDC with multiple features

In the next cell, the remaining features (NIR, R, G) are added to the training and test data; the predictions are then computed assuming a uniform prior, followed by the visual and quantitative evaluation.

In [None]:
# Load all 5 features (NIR, R, G, NDSM, NDVI)

X_5, y_5 = read_patch(training_set_path)
X_t_5, y_t_5 = read_patch(test_set_path)

# Create classifier instance
ndc_5 = NormalDistributionClassifier(num_classes=num_classes, num_features=5)

# Train the classifier:
ndc_5.fit(X_5, y_5)   

# Make predictions

# Compute likelihoods
L_Y_t_5 = ndc_5.compute_likelihoods(X_t_5)  

# Compute posteriors assuming uniform prior
P_Y_t_5 = ndc_5.compute_posteriors(L_Y_t_5, [0.20,0.20,0.20,0.20,0.20])
Y_t_5   = np.argmax(P_Y_t_5, axis=1)

I_pred  = get_labels_as_image(Y_t_5, (800, 800, 3))
I_gt    = get_labels_as_image(y_t_5, (800, 800, 3))

lab.plot_pred_gt(I_pred, I_gt)

precisions, recalls, f1_scores, overall_accuracy, mean_f1_score = compute_quality_metrics(Y_t_5, y_t_5, 5)
print('precisions [%]:      ', precisions*100)
print('recalls    [%]:      ', recalls*100)
print('F1-score   [%]:      ', f1_scores*100)
print('')
print('overall accuracy: {:.2%}'.format(overall_accuracy))
print('mean F1-score   : {:.2%}'.format(mean_f1_score))

### Exercise 3.2: Mixture of Gaussian Model -  5%

The next cell fits a Gaussian Mixture Model (GMM) to the samples of each class. The list `N_clusters` defines the number of components per class. __Modify__ the list so that each class is modelled by TWO components; the predictions are computed assuming a uniform prior.
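For intuition on what is fitted per class: a class-conditional GMM replaces the single Gaussian by a weighted sum $p(\mathbf{x} \mid C) = \sum_k \pi_k \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)$ with $\sum_k \pi_k = 1$. The sketch below only evaluates this density for given parameters (`gmm_density_sketch` is a hypothetical name); estimating $\pi_k$, $\boldsymbol{\mu}_k$ and $\boldsymbol{\Sigma}_k$ is done inside `lab.GaussianMixtureClassifier` (trained with EM in the next cell), whose internals are not reproduced here.

```python
import numpy as np

def gmm_density_sketch(X, weights, means, covars):
    """p(x) = sum_k w_k N(x | mu_k, Sigma_k) for all rows of X [N x F]."""
    N, F = X.shape
    p = np.zeros(N)
    for w, mu, cov in zip(weights, means, covars):
        diff = X - mu
        mahal = np.einsum('nf,fg,ng->n', diff, np.linalg.inv(cov), diff)
        p += w * np.exp(-0.5 * mahal) / np.sqrt((2 * np.pi) ** F * np.linalg.det(cov))
    return p

# Two equally weighted 1D components centred at 0 and 2
x = np.array([[0.0]])
p = gmm_density_sketch(x, [0.5, 0.5], [np.array([0.0]), np.array([2.0])], [np.eye(1), np.eye(1)])
```

With a single component of weight 1, this reduces to the NDC likelihood.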

In [None]:
N_clusters = []  # To be modified!

# Instantiating the classifier
gmm = lab.GaussianMixtureClassifier(N_clusters)

# Train the classifier using EM:
gmm.fit(X, y)   

# Make predictions

# Compute likelihoods
L_Y_t = gmm.compute_likelihoods(X_t)  

# Compute posteriors assuming uniform prior
P_Y_t = gmm.likelihoods_to_posteriors(L_Y_t, [0.20,0.20,0.20,0.20,0.20])
Y_t = np.argmax(P_Y_t, axis=1)

I_pred = get_labels_as_image(Y_t, (800, 800, 3))
I_gt   = get_labels_as_image(y_t, (800, 800, 3))

# Plot predictions
lab.plot_pred_gt(I_pred, I_gt)

__Run__ the next cell for the quantitative evaluation:

In [None]:
precisions, recalls, f1_scores, overall_accuracy, mean_f1_score = compute_quality_metrics(Y_t, y_t, 5)
print('precisions [%]:      ', precisions*100)
print('recalls    [%]:      ', recalls*100)
print('F1-scores  [%]:      ', f1_scores*100)
print('')
print('overall accuracy: {:.2%}'.format(overall_accuracy))
print('mean F1-score   : {:.2%}'.format(mean_f1_score))

#### Fitting a GMM with multiple features

In the next setup, the remaining features (NIR, R, G) are added to the training and test samples; the predictions are then computed assuming a uniform prior, followed by the visual and quantitative evaluation. You can use the same setting for `N_clusters` as above.

In [None]:
# Load all 5 features (NIR, R, G, NDSM, NDVI)

X_5, y_5 = read_patch(training_set_path)
X_t_5, y_t_5 = read_patch(test_set_path)

# Use the same number of components per class as in the previous setting
N_clusters = [] # To be set! 

# Instantiating the classifier
gmm_5 = lab.GaussianMixtureClassifier(N_clusters)

# Train the classifier using EM:
gmm_5.fit(X_5, y_5)   

# Make predictions

# Compute likelihoods
L_Y_t_5 = gmm_5.compute_likelihoods(X_t_5)  

# Compute posteriors assuming uniform prior
P_Y_t_5 = gmm_5.likelihoods_to_posteriors(L_Y_t_5, [0.20,0.20,0.20,0.20,0.20])
Y_t_5   = np.argmax(P_Y_t_5, axis=1)

I_pred = get_labels_as_image(Y_t_5, (800, 800, 3))
I_gt   = get_labels_as_image(y_t_5, (800, 800, 3))

lab.plot_pred_gt(I_pred, I_gt)

precisions, recalls, f1_scores, overall_accuracy, mean_f1_score = compute_quality_metrics(Y_t_5, y_t_5, 5)
print('precisions [%]:      ', precisions*100)
print('recalls    [%]:      ', recalls*100)
print('F1-score   [%]:      ', f1_scores*100)
print('')
print('overall accuracy: {:.2%}'.format(overall_accuracy))
print('mean F1-score   : {:.2%}'.format(mean_f1_score))

### Exercise 3.3: Comparison and evaluation -  20%


The code in the following cell computes and prints the overall accuracy and the mean F1-score on the training and test sets for all four classifier variants.

In [None]:
_, _, _, oa_ndc_train, mf1_ndc_train = compute_quality_metrics(ndc.predict(X), y, 5)
_, _, _, oa_ndc_test, mf1_ndc_test = compute_quality_metrics(ndc.predict(X_t), y_t, 5)

_, _, _, oa_gmm_train, mf1_gmm_train = compute_quality_metrics(gmm.predict(X), y, 5)
_, _, _, oa_gmm_test, mf1_gmm_test = compute_quality_metrics(gmm.predict(X_t), y_t, 5)

_, _, _, oa_ndc_5_train, mf1_ndc_5_train = compute_quality_metrics(ndc_5.predict(X_5), y_5, 5)
_, _, _, oa_ndc_5_test, mf1_ndc_5_test = compute_quality_metrics(ndc_5.predict(X_t_5), y_t_5, 5)

_, _, _, oa_gmm_5_train, mf1_gmm_5_train = compute_quality_metrics(gmm_5.predict(X_5), y_5, 5)
_, _, _, oa_gmm_5_test, mf1_gmm_5_test = compute_quality_metrics(gmm_5.predict(X_t_5), y_t_5, 5)

# Collect the results: (model, OA train, OA test, mF1 train, mF1 test)
results = [
    ('SINGLE GAUSSIAN',                     oa_ndc_train,   oa_ndc_test,   mf1_ndc_train,   mf1_ndc_test),
    ('SINGLE GAUSSIAN WITH MULT. FEATURES', oa_ndc_5_train, oa_ndc_5_test, mf1_ndc_5_train, mf1_ndc_5_test),
    ('MIXTURE OF GAU.',                     oa_gmm_train,   oa_gmm_test,   mf1_gmm_train,   mf1_gmm_test),
    ('MIXTURE OF GAU. WITH MULT. FEATURES', oa_gmm_5_train, oa_gmm_5_test, mf1_gmm_5_train, mf1_gmm_5_test),
]
row = '{:<36}| {:>9} | {:>9}'   # fixed-width columns keep header and values aligned
for title, (i_train, i_test) in [('Overall Accur.', (1, 2)), ('Mean F1-Score', (3, 4))]:
    print(row.format(title, 'TRAIN-SET', 'TEST-SET'))
    print('-' * 59)
    for r in results:
        print(row.format(r[0], '{:.2%}'.format(r[i_train]), '{:.2%}'.format(r[i_test])))
    print()

__Write a discussion__ which __briefly__ answers the following questions: 

- Describe the parameters of a single Gaussian model and a Gaussian mixture model and how they are determined in the training process.

- How is an unseen feature vector classified (both models)?

- Compare the overall accuracy to the class-specific metrics of all variants of the Bayesian classifiers implemented in this experiment. Which problem can be observed? Why do you think this problem occurs, and what could be done to avoid it?

- Based on the classification results (performance of both models on the training and test sets), how does the single Gaussian model compare to the Gaussian mixture model, i.e. did the classification improve with one model w.r.t. the other? Why / why not? What can be observed when using multiple features?

#### Discussion

*Write the discussion here. Do not forget to answer all questions, item by item, and to identify which answer belongs to which question.*

## Exercise 4: Application to a synthetic toy dataset -  (35%)

### Exercise 4.1: Drawing samples -  5%

Use the given function `lab.generate_gaussian_clusters()` to create synthetic data samples in a 2D feature space. The dataset should be drawn from 5 Gaussians, each corresponding to a cluster. There are 4 classes; three classes correspond to one cluster only, whereas one class (class 1) corresponds to two clusters. The clusters are defined in the following table:

| Cluster | Center x | Center y | Angle  | Variance x | Variance y | Samples | Class |
|-|-|-|-|-|-|-|-|
| __1__ | 40 | 35 | 100 | 20 | 10 | 550 | 0 |
| __2__ | 55 | 190 | 45 | 25 | 15 | 450 | 1 |
| __3__ | 180 | 40 | 125 | 25 | 20 | 350 | 1 |
| __4__ | 120 | 120 | 45 | 35 | 20 | 500 | 2 |
| __5__ | 200 | 200 | 120 | 30 | 15 | 500| 3 |

__Modify__ the variable `clusters` according to the table. Read the documentation of the function in __lab.py__ to get further information about the arguments and the return values.
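To see how the table parameters shape a cluster, the sketch below builds a rotated covariance matrix $\boldsymbol{\Sigma} = \mathbf{R}\,\mathrm{diag}(\sigma_x^2, \sigma_y^2)\,\mathbf{R}^T$ and samples cluster 1 with NumPy. It assumes that the angle is given in degrees (counter-clockwise) and takes the "Variance" columns literally as variances along the principal axes; `lab.generate_gaussian_clusters()` may use different conventions (e.g. standard deviations), so check its docstring. The helper name is hypothetical.

```python
import numpy as np

def rotated_covariance_sketch(angle_deg, var_x, var_y):
    """Sigma = R diag(var_x, var_y) R^T, with R a counter-clockwise rotation by angle_deg."""
    t = np.deg2rad(angle_deg)
    R = np.array([[np.cos(t), -np.sin(t)],
                  [np.sin(t),  np.cos(t)]])
    return R @ np.diag([var_x, var_y]) @ R.T

# Cluster 1 of the table (class 0), taking the listed values literally as variances
rng = np.random.default_rng(0)
cov_1 = rotated_covariance_sketch(100, 20, 10)
pts_1 = rng.multivariate_normal(mean=[40, 35], cov=cov_1, size=550)
labels_1 = np.zeros(550, dtype=int)
```

The rotation changes the orientation of the ellipse but not its eigenvalues, i.e. the spread along the principal axes.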

In [None]:
clusters = [] # To be modified!

samples = lab.generate_gaussian_clusters(clusters)
print('Number of generated samples N =', samples.shape[0])

### Exercise 4.2: Splitting the data into a training set and a test set - 10%

__Implement__ the function below, which takes the $N$ `samples` as well as the ratio $r_{test} = N_{test}/N$ of test samples and returns:

- $X$: training set features as array with shape $(N_{train}\times 2)$
- $y$: training set labels as array with shape $(N_{train})$ 
- $X_t$: test set features as array with shape $(N_{test}\times 2)$
- $y_t$: test set labels as array with shape $(N_{test})$

where $N_{train} = N\cdot(1-r_{test})$ is the number of training samples and $N_{test} = N\cdot r_{test}$ is the number of samples for testing. Make sure to __shuffle__ the samples randomly (e.g. using `np.random.shuffle()`)!
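A minimal sketch, assuming `samples` is an $N \times 3$ array with columns (x, y, label); verify the actual layout of the return value in __lab.py__. The name `split_train_test_sketch` and the `seed` parameter are hypothetical. It uses `rng.permutation` instead of `np.random.shuffle`, so the input array is not shuffled in place.

```python
import numpy as np

def split_train_test_sketch(samples, r_test, seed=None):
    """Returns X, y, X_t, y_t; assumes samples [N x 3] with columns (x, y, label)."""
    rng = np.random.default_rng(seed)
    shuffled = samples[rng.permutation(len(samples))]   # shuffled copy, input untouched
    n_test = int(round(len(samples) * r_test))
    test, train = shuffled[:n_test], shuffled[n_test:]
    return train[:, :2], train[:, 2].astype(int), test[:, :2], test[:, 2].astype(int)

samples_toy = np.column_stack([np.arange(10), np.arange(10) * 2, np.arange(10) % 4]).astype(float)
Xa, ya, Xb, yb = split_train_test_sketch(samples_toy, 0.3, seed=1)
```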

In [None]:
def split_train_test(samples, r_test):
    
    # YOUR CODE GOES HERE

    X   =     # training set features
    y   =     # training set labels

    X_t =     # test set features
    y_t =     # test set labels
    
    return X, y, X_t, y_t

__Run__ the next cell. It will use your function to generate and visualize the sets for training and testing.

In [None]:
X, y, X_t, y_t = split_train_test(samples, 0.5)

# Plot both sets:
matplotlib.rcParams['figure.figsize'] = [PlotSize, PlotSize]  
plt.scatter(X[:, 0], X[:, 1], c=y, edgecolors='k', cmap=CMAP)
plt.scatter(X_t[:, 0], X_t[:, 1], c=y_t, edgecolors='k', cmap=CMAP, marker='v')
plt.xlim((0, 255)); plt.ylim((0, 255))
plt.title('Training samples (circles), Test samples (triangles)')
plt.show()

__Run__ the next cell to fit a single Gaussian model to the training data.

In [None]:
# Create the classifier instance
ndc = NormalDistributionClassifier(num_classes=4, num_features=2)

# Train the classifier
ndc.fit(X, y)

# Plot datasets
plt.scatter(X[:, 0], X[:, 1], c=y, edgecolors='k', cmap=CMAP)
plt.scatter(X_t[:, 0], X_t[:, 1], c=y_t, edgecolors='k', cmap=CMAP, marker='v')

# Plot confidence ellipses: 0.5 sigma, 1 sigma, 1.5 sigma
lab.plot_sigma_ellipses(ndc, CMAP, sigmas = [0.5, 1, 1.5])

plt.xlim((0, 255)); plt.ylim((0, 255))
plt.title('Training samples (circles), Test samples (triangles)')
plt.show()

Next, the likelihoods for each class can be visualized by computing the likelihood for each feature vector on a $256 \times 256$ grid.

In [None]:
# Create a grid of all integer feature vectors in [0, 255] x [0, 255]
xx, yy = np.meshgrid(np.arange(0, 256, 1), np.arange(0, 256, 1))
mesh_features = np.c_[xx.ravel(), yy.ravel()]

# Get likelihoods for the meshgrid samples
L = ndc.compute_likelihoods(mesh_features)
lab.print_probabilities(L, (256, 256), 'Likelihood', n_cls=4)

Another way of visually analyzing a classifier is to plot its decision boundaries. In the next cell, the whole feature space is classified using the maximum likelihood (ML) decision rule (`predict()` assumes a uniform prior), which implicitly shows the decision boundaries.

In [None]:
C = ndc.predict(mesh_features)
lab.print_decision_boundaries(C, (256, 256))

#### Considering prior information

In the next example, assume that the prior probability of class 1 is 62 times higher than that of each of the other classes. __Note__ that $P_{prior}$ must sum to 1 to be a valid distribution. Run the cell and sanity-check the results.
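Worked arithmetic for such a prior: with $p$ the prior of each of the classes 0, 2 and 3, the condition $p + 62p + p + p = 65p = 1$ gives $p = 1/65 \approx 0.0154$ and $p(C_1) = 62/65 \approx 0.954$. The sketch below (variable names are illustrative) builds the vector by normalizing relative weights; since floating-point sums may not be exactly 1.0, a tolerance-based check such as `np.isclose` is more robust than `==`.

```python
import numpy as np

ratio = 62                  # p(C_1) = 62 * p(C_k) for every other class k
n_cls = 4
weights = np.ones(n_cls)
weights[1] = ratio
P_prior_example = weights / weights.sum()   # [1/65, 62/65, 1/65, 1/65]
print(P_prior_example, np.isclose(P_prior_example.sum(), 1.0))
```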

In [None]:
P_prior = # Assume that class 1 is 62 times more probable than each other class. To be modified! (The distribution should sum up to 1)

# Get posteriors for the grid samples
P_post = ndc.compute_posteriors(L, P_prior)
lab.print_probabilities(P_post, (256, 256), 'Posterior probability', n_cls=4)

### Exercise 4.3: Mixture of Gaussian Model -  5%

__Run__ the next cell to train a Gaussian Mixture Model for each class on the training set of the toy dataset. The list `N_clusters` defines the number of components per class.

__Modify__ this list in a meaningful way.

In [None]:
N_clusters = # To be modified! 

# Instantiating the classifier
gmm = lab.GaussianMixtureClassifier(N_clusters)

# Train the classifier using EM:
gmm.fit(X, y)   

# Plot datasets
plt.scatter(X[:, 0], X[:, 1], c=y, edgecolors='k', cmap=CMAP)
plt.scatter(X_t[:, 0], X_t[:, 1], c=y_t, edgecolors='k', cmap=CMAP, marker='v')

# Plot confidence ellipses: 0.5 sigma, 1 sigma, 1.5 sigma
gmm.plot_sigma_ellipses(CMAP, sigmas = [0.5, 1, 1.5]) 

plt.xlim((0, 255)); plt.ylim((0, 255)); 
plt.title('Training samples (circles), Test samples (triangles)')
plt.show()

The next three cells again compute and show the likelihoods and posteriors for the complete feature space, and plot the decision boundaries.

In [None]:
# Get likelihoods for the meshgrid samples
L = gmm.compute_likelihoods(mesh_features)
lab.print_probabilities(L, (256, 256), 'Likelihood', n_cls=4)

In [None]:
P_prior = # Assume that class 1 is 62 times more probable than each other class. To be modified! (The distribution should sum up to 1)

# Get posteriors for the grid samples
P_post = gmm.likelihoods_to_posteriors(L, P_prior)
lab.print_probabilities(P_post, (256, 256), 'Posterior probability', n_cls=4)

In [None]:
# Plot the decision boundaries
C = gmm.predict(mesh_features) # this will assume a uniform prior
lab.print_decision_boundaries(C, (256, 256))

### Exercise 4.4: Comparison and evaluation -  15%

__Print__ the likelihood, the posterior and the predicted class for the feature vector $X_{140-90} = (140, 90)$ using the NDC and GMM (assuming a uniform prior).

In [None]:
# YOUR CODE GOES HERE!

__Write__ a brief discussion, which answers the following questions:

- Based on the prediction of the feature vector $X_{140-90}$, discuss the models' performance. Which model achieves better results and why?

- What happens when the number of components of a GMM is chosen too high / too low? Try and document different variants with respect to the number of components.

#### Discussion

*Write the discussion here. Do not forget to answer all questions, item by item, and to identify which answer belongs to which question.*