# Machine Learning Engineer Nanodegree

## Capstone Project

David Howard

10/15/2017

## I. Definition

### Project Overview

SLAM (simultaneous localization and mapping) is a well-known technique for using sensor data to create a 
map of a geographic region while also keeping track of the sensor's location within the constructed map. 
Typically this is done with either a particle filter or a Kalman filter. One application of SLAM 
operates on an unknown region and builds up knowledge about the region via sensor measurements.
Another application uses existing knowledge, such as maps, along with sensors, and is aimed at 
determining the sensor's location within the known region. [[1]](https://en.wikipedia.org/wiki/Simultaneous_localization_and_mapping)

In reading about SLAM, I found it interesting that I had previously been trained to be a human SLAM 
algorithm when I was a helicopter pilot in the U.S. Army. We trained to navigate in high-speed, low-level 
flight using only a topographic map and our eyes observing the terrain elevation features (no GPS allowed).
This technique is known as nap-of-the-earth (NOE) flight.
[[2]](https://en.wikipedia.org/wiki/Nap-of-the-earth#Helicopter_NOE_flying)

One source of potential data for use in a SLAM system is a digital elevation model, 
generically referred to as DEM. A great source of such data was created by the NASA 
Shuttle Radar Topography Mission (SRTM). In 2000, the Space Shuttle was used to create a 
digital elevation map of the entire world (minus some polar regions). The NASA SRTM data is 
publicly available for download, in various resolutions. [[3]](https://www2.jpl.nasa.gov/srtm/mission.htm)

### Problem Statement

This project will attempt to create a simple SLAM type implementation using a convolutional neural 
network to determine position within a known region. The train/test data used for learning will be 
SRTM elevation data treated as an image. Input for evaluation will be images composed of pieces of 
the same data with possible modifications such as distortion, alignment and reduced resolution. 
Output will be a predicted position of the input images.

### Metrics

The output of the model for each test image will be a set of probabilities, one per candidate 
location, similar to the dog project. The 'absolute' accuracy will be the fraction of test images 
for which the location with the highest predicted probability is the correct one. The evaluation 
will also check whether the actual location appears among the next-highest predicted probabilities 
when it isn't the highest.
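
The two measures described above can be sketched with NumPy as a top-1 and top-k accuracy. This is a minimal illustration only: the function name `top_k_accuracy` and the sample probabilities are made up for the example and are not taken from the project code.

```python
import numpy as np

def top_k_accuracy(probabilities, true_labels, k=1):
    """Fraction of samples whose true label is among the k most probable classes."""
    probabilities = np.asarray(probabilities)
    true_labels = np.asarray(true_labels)
    # Indices of the k largest probabilities in each row (order within the k is irrelevant).
    top_k = np.argsort(probabilities, axis=1)[:, -k:]
    hits = (top_k == true_labels[:, None]).any(axis=1)
    return hits.mean()

# Three test images, four candidate locations (made-up numbers).
probs = np.array([
    [0.70, 0.10, 0.15, 0.05],   # true label 0: highest probability is correct
    [0.20, 0.30, 0.45, 0.05],   # true label 1: only the second-highest is correct
    [0.40, 0.30, 0.20, 0.10],   # true label 3: not in the top two
])
labels = np.array([0, 1, 3])

absolute_accuracy = top_k_accuracy(probs, labels, k=1)
top2_accuracy = top_k_accuracy(probs, labels, k=2)
```

With `k=1` this is the 'absolute' accuracy; larger `k` relaxes the check to the k highest-ranked locations.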

## II. Analysis


### Data Exploration

SRTM data is available online at [[4]](https://dds.cr.usgs.gov/srtm).

SRTM data is available in multiple resolutions. The data is segmented into squares of 1 degree of latitude by 1 degree of longitude. 
The highest publicly available resolution is 1 arc-second per elevation posting (Level 2, ~30 meters at the equator), 
which results in a 3600x3600 matrix of elevations. The data files are actually 3601x3601 so that the edges overlap when 
multiple cells are composed together. Other resolutions are 3 arc-seconds (Level 1, ~90 meters) and 30 arc-seconds (Level 0). 
A detailed description of the data is available at [[5]](https://dds.cr.usgs.gov/srtm/version2\_1/Documentation/SRTM\_Topo.pdf)

Organization of data

- Level 1
  - Binary file labeled type '.hgt'
  - 1201x1201 grid of height postings
  - Covers 1-degree latitude/longitude
  - Post spacing is 3 arc-seconds
  - Data type is signed 16 bit, big-endian
  - Rows are lower to higher
  - Columns are left to right
  - Naming convention specifies location e.g. N39W120.hgt

- Level 2
  - Binary file labeled type '.hgt'
  - 3601x3601 grid of height postings
  - Covers 1-degree latitude/longitude
  - Post spacing is 1 arc-second
  - Data type is signed 16 bit, big-endian
  - Rows are lower to higher
  - Columns are left to right
  - Naming convention specifies location e.g. N39W120.hgt
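
A file with this layout can be loaded with NumPy in a few lines. The sketch below is an assumption about how such a reader might look, not the project's 'srtm' module: `read_hgt` is a hypothetical name, and the values are read as big-endian signed 16-bit integers (`>i2`), which is how the SRTM documentation describes them. The demo uses a tiny synthetic file rather than real SRTM data.

```python
import os
import tempfile

import numpy as np

def read_hgt(path):
    """Read an SRTM .hgt file into a square 2-D array of elevations."""
    # '>i2' = big-endian signed 16-bit integers.
    data = np.fromfile(path, dtype=">i2")
    size = int(round(np.sqrt(data.size)))   # 1201 for Level 1, 3601 for Level 2
    if size * size != data.size:
        raise ValueError("unexpected .hgt size: %d values" % data.size)
    return data.reshape(size, size)

# Round-trip demo on a synthetic 3x3 "tile" (not real SRTM data).
# -32768 is the value SRTM uses to mark a missing (void) posting.
demo = np.array([[100, 200, 300],
                 [400, 500, 600],
                 [700, 800, -32768]], dtype=">i2")
demo_path = os.path.join(tempfile.mkdtemp(), "demo.hgt")
demo.tofile(demo_path)

tile = read_hgt(demo_path)
```

Because the grid size is inferred from the file length, the same function handles both the 1201x1201 and 3601x3601 variants.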

### Exploratory Visualization

The data is in a non-image binary format but can be converted to displayable images for 
visualization by functions in the included 'srtm' module. In the following discussions the term 'image' is used interchangeably 
to refer to either the raw .hgt data or a resulting displayable image, since the two formats can be converted from one to the other. Because the data is a single value per elevation posting, the resulting images can all be treated as monochrome. 

#### Baseline Data as Images

When converting the .hgt file to an image, the data is normalized to the range [0..255] to increase the contrast between
points of lower and higher elevation.

<table>
<tr>
<td><img src="images/level2/N37W098-b.jpg" width="256" height="256"></img></td>
<td><img src="images/level2/N39W120-b.jpg" width="256" height="256"></img></td>
</tr>
<tr>
<td style="text-align:center">N37W098 (Wichita KS area, flat) </td>
<td style="text-align:center">N39W120 (Reno NV area, mountainous) </td>
</tr>
</table>
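
The normalization to [0..255] can be sketched as a linear min-max rescale. This is an assumption about the method (the report does not say exactly how the 'srtm' module normalizes), and `normalize_to_uint8` is a hypothetical name used only for this example.

```python
import numpy as np

def normalize_to_uint8(elevations):
    """Linearly rescale elevations so the lowest maps to 0 and the highest to 255."""
    elevations = np.asarray(elevations, dtype=np.float64)
    low, high = elevations.min(), elevations.max()
    if high == low:
        # A perfectly flat tile has no contrast to stretch.
        return np.zeros(elevations.shape, dtype=np.uint8)
    scaled = (elevations - low) / (high - low) * 255.0
    return np.rint(scaled).astype(np.uint8)

heights = np.array([[1200, 1500],
                    [1800, 2400]])      # made-up elevations in meters
pixels = normalize_to_uint8(heights)
```

Rescaling per tile stretches the contrast of a flat area such as N37W098 as much as that of a mountainous one, so absolute elevation is not preserved across tiles.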

#### Subdivision of input data

The full images will be subdivided into an NxN set of squares that will be the input to the model for training and test. The number of subdivisions will be determined by experimentation, based on training time and results. The smaller the subdivided images, the greater the resolution of the predicted location. However, if the subdivided images are too small, results may be less accurate because there are fewer distinguishing features between similar images. The labels will be integers from 0 to N\*N-1, starting at the upper left subdivision, in row-major order. The actual latitude/longitude location can be determined from the label and the subdivision level. 

Example of input data subdivision N39W120, 5x5 array.

<table>
<tr>
<td><img src="images/level2/a0.jpg" width="96" height="96"></img></td>
<td><img src="images/level2/a1.jpg" width="96" height="96"></img></td>
<td><img src="images/level2/a2.jpg" width="96" height="96"></img></td>
<td><img src="images/level2/a3.jpg" width="96" height="96"></img></td>
<td><img src="images/level2/a4.jpg" width="96" height="96"></img></td>
</tr>
<tr>
<td><img src="images/level2/a5.jpg" width="96" height="96"></img></td>
<td><img src="images/level2/a6.jpg" width="96" height="96"></img></td>
<td><img src="images/level2/a7.jpg" width="96" height="96"></img></td>
<td><img src="images/level2/a8.jpg" width="96" height="96"></img></td>
<td><img src="images/level2/a9.jpg" width="96" height="96"></img></td>
</tr>
<tr>
<td><img src="images/level2/a10.jpg" width="96" height="96"></img></td>
<td><img src="images/level2/a11.jpg" width="96" height="96"></img></td>
<td><img src="images/level2/a12.jpg" width="96" height="96"></img></td>
<td><img src="images/level2/a13.jpg" width="96" height="96"></img></td>
<td><img src="images/level2/a14.jpg" width="96" height="96"></img></td>
</tr>
<tr>
<td><img src="images/level2/a15.jpg" width="96" height="96"></img></td>
<td><img src="images/level2/a16.jpg" width="96" height="96"></img></td>
<td><img src="images/level2/a17.jpg" width="96" height="96"></img></td>
<td><img src="images/level2/a18.jpg" width="96" height="96"></img></td>
<td><img src="images/level2/a19.jpg" width="96" height="96"></img></td>
</tr>
<tr>
<td><img src="images/level2/a20.jpg" width="96" height="96"></img></td>
<td><img src="images/level2/a21.jpg" width="96" height="96"></img></td>
<td><img src="images/level2/a22.jpg" width="96" height="96"></img></td>
<td><img src="images/level2/a23.jpg" width="96" height="96"></img></td>
<td><img src="images/level2/a24.jpg" width="96" height="96"></img></td>
</tr>
</table>
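
The subdivision and labeling scheme can be sketched as follows. The function names `subdivide` and `label_to_latlon` are hypothetical, and the sketch assumes that row 0 of the array is the northern edge of the cell and that the file name gives the cell's southwest corner (so N39W120 has its upper left corner at 40N, 120W).

```python
import numpy as np

def subdivide(tile, n):
    """Split a square tile into n*n equal squares, labeled 0..n*n-1 in row-major order."""
    size = tile.shape[0] // n      # e.g. 3601 // 5 = 720; the overlap row/column is dropped
    squares, labels = [], []
    for row in range(n):
        for col in range(n):
            squares.append(tile[row * size:(row + 1) * size,
                                col * size:(col + 1) * size])
            labels.append(row * n + col)
    return np.stack(squares), np.array(labels)

def label_to_latlon(label, n, north_lat, west_lon):
    """Latitude/longitude of the upper left corner of the subdivision with this label."""
    row, col = divmod(label, n)
    step = 1.0 / n                 # each cell spans 1 degree in both directions
    return north_lat - row * step, west_lon + col * step

# Small synthetic tile: 10x10 values split 5x5 gives 25 squares of 2x2.
tile = np.arange(100).reshape(10, 10)
squares, labels = subdivide(tile, 5)

# Label 7 in a 5x5 split of N39W120 is row 1, column 2.
lat, lon = label_to_latlon(7, 5, north_lat=40.0, west_lon=-120.0)
```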



#### Augmentation of input data

The subdividing process results in only one image and label for each data point. Training with just one image per label is insufficient to achieve good learning results. It can result in an overfit model that could only identify a location if the input is a nearly exact duplicate of the training image in content and orientation. To overcome this, each subdivided image will be multiplied with distortion using Keras' ImageDataGenerator class. This class creates multiple instances of each subdivided image with various rotations and offsets. 

The extent of image distortion used is subject to experimentation. One metric in the report will be distortion parameters vs probability of successful detection. 


Example of 4 modifications of one subdivision, upper left corner of N39W120, renormalized and distorted. The image on the left is the original. The ImageDataGenerator parameters used were rotation_range=30, height_shift_range=0.1, width_shift_range=0.1.
<table>
<tr>
<td><img src="images/level2/a0.jpg" width="96" height="96"></img></td>
<td><img src="images/level2/g1.jpg" width="96" height="96"></img></td>
<td><img src="images/level2/g2.jpg" width="96" height="96"></img></td>
<td><img src="images/level2/g3.jpg" width="96" height="96"></img></td>
<td><img src="images/level2/g4.jpg" width="96" height="96"></img></td>
</tr>
</table>
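
To make the effect of these distortions concrete, the sketch below applies a random rotation and shift to a single image using only NumPy. It is a stand-in written for illustration, not Keras' ImageDataGenerator and not the project code: the names `random_rotate_shift` and `augment` are hypothetical, sampling is nearest-neighbour, and pixels that fall outside the source are filled from the nearest edge.

```python
import numpy as np

def random_rotate_shift(image, rotation_range=30.0, shift_range=0.1, rng=None):
    """Return a randomly rotated and shifted copy of a 2-D image."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape
    angle = np.deg2rad(rng.uniform(-rotation_range, rotation_range))
    dy = rng.uniform(-shift_range, shift_range) * h
    dx = rng.uniform(-shift_range, shift_range) * w

    # Output pixel grid, expressed relative to the (shifted) image centre.
    yy, xx = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    y0, x0 = yy - cy - dy, xx - cx - dx

    # Inverse mapping: rotate each output coordinate back into the source image.
    src_y = np.cos(angle) * y0 - np.sin(angle) * x0 + cy
    src_x = np.sin(angle) * y0 + np.cos(angle) * x0 + cx

    # Nearest-neighbour lookup, clamped to the image edges.
    src_y = np.clip(np.rint(src_y), 0, h - 1).astype(int)
    src_x = np.clip(np.rint(src_x), 0, w - 1).astype(int)
    return image[src_y, src_x]

def augment(image, label, copies, seed=0, **kwargs):
    """Create `copies` distorted versions of one image, all sharing its label."""
    rng = np.random.default_rng(seed)
    images = np.stack([random_rotate_shift(image, rng=rng, **kwargs)
                       for _ in range(copies)])
    return images, np.full(copies, label)

tile = np.arange(64, dtype=np.uint8).reshape(8, 8)
images, labels = augment(tile, label=0, copies=4)
```

Every distorted copy keeps the label of the subdivision it came from, which is what turns one image per location into a usable training set.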

#### Histograms 

Histograms of 4 sample images. X axis is pixel/height to visualize differences in height distribution. Although not directly used, it was helpful to visualize the magnitude of differences between the subdivided images.

<table>
<tr>
<td><img src="images/level2/a0.jpg" width="96" height="96"></img></td>
<td><img src="images/level2/a1.jpg" width="96" height="96"></img></td>
<td><img src="images/level2/a2.jpg" width="96" height="96"></img></td>
<td><img src="images/level2/a3.jpg" width="96" height="96"></img></td>
</tr>
</table>
<img src="images/level2/histogram.svg"></img>
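
A histogram like the one above can be computed with `numpy.histogram`. The sketch below is illustrative only: `height_histogram` is a hypothetical helper, and the two small arrays are synthetic stand-ins for a flat and a rugged subdivision.

```python
import numpy as np

def height_histogram(pixels, bins=32):
    """Count how many pixels fall into each of `bins` equal-width ranges of 0..255."""
    counts, edges = np.histogram(pixels, bins=bins, range=(0, 256))
    return counts, edges

# Synthetic examples: one image with a single height, one spread over the full range.
flat = np.full((4, 4), 10, dtype=np.uint8)
rugged = np.arange(16, dtype=np.uint8).reshape(4, 4) * 16   # values 0, 16, ..., 240

flat_counts, _ = height_histogram(flat, bins=16)
rugged_counts, _ = height_histogram(rugged, bins=16)
```

The flat image puts all of its pixels in one bin while the rugged one spreads them evenly, which is the kind of difference the plotted histograms make visible.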

### Algorithms and Techniques

The processing algorithms will leverage the techniques and general approach from the previous 'dog project'. The main difference is that the input data files have only one image for each class, rather than many images as were provided in the dog project. This limitation will be mitigated by using Keras' image augmentation to vary the unique individual images and provide sufficient training and test data to train the model. Initially, the augmentation hyperparameters will be limited in order to simplify building the model. Once a model is working, the hyperparameters will be tuned to give more variation in the input data. 

The primary algorithm used will be a convolutional neural network, implemented in Python and Keras with the tensorflow-gpu backend. The model will be iterated until no further improvement is gained within an acceptable training time. 

The model will use some number of Convolutional layers with Dropout and MaxPooling. Variations of kernel size, padding and activation functions will be tried for these layers. The final layers will be GlobalAveragePooling followed by a Dense layer with softmax activation. The model will be compiled with the categorical crossentropy loss function, and variations of the optimizer will be evaluated.
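
The softmax output and categorical crossentropy loss named above can be written out in NumPy. This is only an illustration of the math on made-up numbers, not the Keras implementation used for training; the function names are chosen for the example.

```python
import numpy as np

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1 along the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)   # for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def categorical_crossentropy(one_hot_labels, probabilities, eps=1e-12):
    """Mean over the batch of -log(probability assigned to the true class)."""
    p_true = (one_hot_labels * probabilities).sum(axis=-1)
    return float(-np.log(np.clip(p_true, eps, 1.0)).mean())

# Two samples, four candidate locations (made-up scores).
logits = np.array([[2.0, 0.0, 0.0, 0.0],    # favors class 0
                   [0.0, 0.0, 0.0, 0.0]])   # no preference
targets = np.array([[1, 0, 0, 0],
                    [0, 0, 1, 0]])

probs = softmax(logits)
loss = categorical_crossentropy(targets, probs)
```

The second sample shows the chance-level case: with no preference among 4 classes each gets probability 0.25, and its loss term is log(4).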

The variations of hyperparameters and training time are logged with the resulting evaluation metrics and included in the conclusions of this report.

### Benchmark

There is no established benchmark for this project. A benchmark is provided by using the same input to a naive 2 level feed-forward (non-convolutional) neural network. The results of this benchmark will be compared to the results from the final model.

## III. Methodology

_(approx. 3-5 pages)_

### Data Preprocessing

Processing requires reading a 3600x3600 input dataset and producing as output the X (image) and Y (label) arrays for the model.

<img src="doc/preprocess.svg"></img>
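
One plausible shape for the X and Y outputs is sketched below, assuming a Keras-style classifier trained with categorical crossentropy: X as float images with a single channel axis, Y as one-hot rows. The function `make_xy`, the scaling to [0, 1], and the random stand-in images are assumptions for illustration, not the project's preprocessing code.

```python
import numpy as np

def make_xy(squares, labels, num_classes):
    """Shape subdivided images and integer labels for a Keras-style classifier."""
    # X: float32 in [0, 1] with a trailing channel axis of size 1 (monochrome).
    x = squares.astype(np.float32) / 255.0
    x = x[..., np.newaxis]
    # Y: one-hot rows, one column per location label.
    y = np.eye(num_classes, dtype=np.float32)[labels]
    return x, y

# Stand-in data: 25 random 8x8 "subdivisions" with labels 0..24.
squares = np.random.default_rng(0).integers(0, 256, size=(25, 8, 8), dtype=np.uint8)
labels = np.arange(25)

X, Y = make_xy(squares, labels, num_classes=25)
```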

### Implementation

In this section, the process for which metrics, algorithms, and techniques that you implemented for the given data will need to be clearly documented. It should be abundantly clear how the implementation was carried out, and discussion should be made regarding any complications that occurred during this process. Questions to ask yourself when writing this section:

- _Is it made clear how the algorithms and techniques were implemented with the given datasets or input data?_
- _Were there any complications with the original metrics or techniques that required changing prior to acquiring a solution?_
- _Was there any part of the coding process (e.g., writing complicated functions) that should be documented?_

### Refinement

In this section, you will need to discuss the process of improvement you made upon the algorithms and techniques you used in your implementation. For example, adjusting parameters for certain models to acquire improved solutions would fall under the refinement category. Your initial and final solutions should be reported, as well as any significant intermediate results as necessary. Questions to ask yourself when writing this section:

- _Has an initial solution been found and clearly reported?_
- _Is the process of improvement clearly documented, such as what techniques were used?_
- _Are intermediate and final solutions clearly reported as the process is improved?_

- ****

## IV. Results

_(approx. 2-3 pages)_

### Model Evaluation and Validation

In this section, the final model and any supporting qualities should be evaluated in detail. It should be clear how the final model was derived and why this model was chosen. In addition, some type of analysis should be used to validate the robustness of this model and its solution, such as manipulating the input data or environment to see how the model's solution is affected (this is called sensitivity analysis). Questions to ask yourself when writing this section:

- _Is the final model reasonable and aligning with solution expectations? Are the final parameters of the model appropriate?_
- _Has the final model been tested with various inputs to evaluate whether the model generalizes well to unseen data?_
- _Is the model robust enough for the problem? Do small perturbations (changes) in training data or the input space greatly affect the results?_
- _Can results found from the model be trusted?_

### Justification

In this section, your model's final solution and its results should be compared to the benchmark you established earlier in the project using some type of statistical analysis. You should also justify whether these results and the solution are significant enough to have solved the problem posed in the project. Questions to ask yourself when writing this section:

- _Are the final results found stronger than the benchmark result reported earlier?_
- _Have you thoroughly analyzed and discussed the final solution?_
- _Is the final solution significant enough to have solved the problem?_

- ****

## V. Conclusion

_(approx. 1-2 pages)_

### Free-Form Visualization

In this section, you will need to provide some form of visualization that emphasizes an important quality about the project. It is much more free-form, but should reasonably support a significant result or characteristic about the problem that you want to discuss. Questions to ask yourself when writing this section:

- _Have you visualized a relevant or important quality about the problem, dataset, input data, or results?_
- _Is the visualization thoroughly analyzed and discussed?_
- _If a plot is provided, are the axes, title, and datum clearly defined?_

### Reflection

In this section, you will summarize the entire end-to-end problem solution and discuss one or two particular aspects of the project you found interesting or difficult. You are expected to reflect on the project as a whole to show that you have a firm understanding of the entire process employed in your work. Questions to ask yourself when writing this section:

- _Have you thoroughly summarized the entire process you used for this project?_
- _Were there any interesting aspects of the project?_
- _Were there any difficult aspects of the project?_
- _Does the final model and solution fit your expectations for the problem, and should it be used in a general setting to solve these types of problems?_

### Improvement

In this section, you will need to provide discussion as to how one aspect of the implementation you designed could be improved. As an example, consider ways your implementation can be made more general, and what would need to be modified. You do not need to make this improvement, but the potential solutions resulting from these changes are considered and compared/contrasted to your current solution. Questions to ask yourself when writing this section:

- _Are there further improvements that could be made on the algorithms or techniques you used in this project?_
- _Were there algorithms or techniques you researched that you did not know how to implement, but would consider using if you knew how?_
- _If you used your final solution as the new benchmark, do you think an even better solution exists?_