Fashion MNIST Tutorial - Design Concepts
==================================
Matt Clarkson, 2019-10-30

The Problem
---------------

Problems with research software include:

* Code that only one researcher can run.
* Hard-coded parameters with unknown history, i.e. which parameters have been tested? When? With which version of the code?
* At the end of a project the code dies and is not re-used. Subsequent researchers feel compelled to re-implement it in their own nuanced way, wasting time and repeating the same loop. This may not be a huge concern in the era of deep learning, as a new researcher will likely implement something newer. But if you want to use an algorithm in any other piece of code, the code must be designed for re-use.

The Solution
---------------

A researcher should develop code that:

1. is designed for re-use, i.e. has a clear, simple interface, so the code can be embedded directly in third-party programs, such as GUIs or other scripts, without cutting and pasting, and without being stuck with hard-coded parameters.
2. has core functions that are run and tested via unit tests.
3. has command line entry points so an untrained user can just run it.
4. can be used within Jupyter notebooks, as this is good for development and supervision meetings.
5. can be pip installed by others, and re-used as is, with almost zero effort.


The classification code itself is inspired by the standard TensorFlow tutorials, such as [this one](https://www.tensorflow.org/tutorials/keras/classification) and [this one](https://www.tensorflow.org/tensorboard/get_started).

Prerequisites
=============

Ensure you have already:

* Understood how to use the [PythonTemplate](https://weisslab.cs.ucl.ac.uk/WEISS/SoftwareRepositories/PythonTemplate), and know that we use tox to run pylint, pytest and coverage.
* Done [SNAPPYTutorial 01](https://weisslab.cs.ucl.ac.uk/WEISS/SoftwareRepositories/SNAPPY/SNAPPYTutorial01)
* Done [SNAPPYTutorial 02](https://weisslab.cs.ucl.ac.uk/WEISS/SoftwareRepositories/SNAPPY/SNAPPYTutorial02)


The Design
===========

In this notebook, we will step through the design of our Fashion MNIST example, and explain the design choices.

Class interface
==================

First, we can choose either a class-based interface or a function-based interface. I chose a class. A class is a way of grouping data members and methods into a coherent concept, and providing encapsulation. So, just like a black box, the user doesn't have to know the internals of how a class works; they can just use the interface. If you choose a function-based approach, related functions are not easily grouped together, so in the long run the code gets more disjointed and messy. So I prefer a class.

So, in the file ```sksurgerytf/models/fashion.py``` we basically have:

```python
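class FashionMNIST:
    """Interface outline only; method bodies are omitted."""
    def __init__(self, params): ...          # initialise network, load/prepare data
    def train(self): ...                     # train the network
    def test(self, image_to_classify): ...   # classify one new image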
```

where

* The constructor is responsible for initialising the network (see [this](https://martinfowler.com/articles/injection.html) article by Martin Fowler) and for loading/preparing data. Also see [this](http://localhost:8888/notebooks/fashion_design.ipynb) on RAII, and the books by Scott Meyers, to get the idea that once the constructor is complete you should have a fully usable object, i.e. you must never have an unusable or unready object.
* The ```train()``` method trains the network. It could be called from within the constructor.
* The ```test()``` method classifies each new image. This is what a third-party user would call, without knowing what goes on inside the black box.

So, by using encapsulation, and a simple class API, we have addressed point 1 of our proposed solution.
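
As a rough illustration of that constructor/train/test shape, here is a minimal sketch. It is not the code in ```sksurgerytf/models/fashion.py```: the class name ```TinyClassifier```, its parameters, and the nearest-mean rule standing in for a neural network are all assumptions, chosen so the example runs with the standard library alone. The point is only that the object is ready to use as soon as the constructor returns.

```python
class TinyClassifier:
    """Hypothetical stand-in showing the constructor/train/test shape."""

    def __init__(self, samples, labels):
        # Load/prepare the data (here we simply validate and store it) ...
        if len(samples) != len(labels) or not samples:
            raise ValueError("need equally many samples and labels")
        self.samples = samples
        self.labels = labels
        self.means = {}
        # ... and train immediately, so the object is usable on return.
        self.train()

    def train(self):
        """Compute the average intensity of each class."""
        sums, counts = {}, {}
        for sample, label in zip(self.samples, self.labels):
            sums[label] = sums.get(label, 0.0) + sum(sample) / len(sample)
            counts[label] = counts.get(label, 0) + 1
        self.means = {lab: sums[lab] / counts[lab] for lab in sums}

    def test(self, sample):
        """Return the label whose class average is closest to this sample."""
        value = sum(sample) / len(sample)
        return min(self.means, key=lambda lab: abs(self.means[lab] - value))


# A caller needs only the interface, not the internals.
classifier = TinyClassifier([[0, 0], [0, 1], [9, 9], [8, 9]],
                            ["dark", "dark", "bright", "bright"])
prediction = classifier.test([7, 8])
```

Because training happens inside the constructor in this sketch, there is no moment at which a caller can hold a half-initialised classifier.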

Modules for Functions/Classes and Unit Tests
=====================================

Under ```sksurgerytf/``` we can put any other sub-modules, classes and functions as necessary. 

For example:

```
sksurgerytf/maths/matrix_algebra.py
```

and its corresponding unit test:

```
tests/maths/test_matrix_algebra.py
```

That said, it's difficult to unit test large networks that take days or weeks to train. Unit tests must be fast, so in all likelihood we are talking about testing small, individual functions. Try to break your functionality into pieces you can test without running a full training cycle.

So, by separating classes and functions, and having separate unit tests, we have addressed point 2 of our proposed solution.
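
For instance, a small data-preparation helper can be tested in milliseconds, with no training involved. The sketch below is an assumption for illustration: ```scale_pixels``` is a hypothetical function, not one taken from ```sksurgerytf```, and in the repository layout above the function and its test would live in separate files under ```sksurgerytf/``` and ```tests/```.

```python
def scale_pixels(pixels, max_value=255):
    """Scale integer pixel intensities to floats in the range [0, 1]."""
    if max_value <= 0:
        raise ValueError("max_value must be positive")
    return [p / max_value for p in pixels]


def test_scale_pixels():
    """pytest collects functions named test_*; plain asserts are enough."""
    assert scale_pixels([0, 255]) == [0.0, 1.0]
    assert scale_pixels([51], max_value=102) == [0.5]
    assert scale_pixels([]) == []
```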



Command Line Entry Point
========================

For bash scripting, or for working with other people, it is useful to have a command line entry point. This repo provides a pattern that you can just copy for each command line entry point.

So, the top level python script:
```
sksurgeryfashion.py
```
contains:
```
from sksurgerytf.ui.sksurgery_fashion_command_line import main
```
which runs a command line parser in
```
sksurgerytf/ui/sksurgery_fashion_command_line.py
```
which calls through to the ```fashion.py``` module created above.

In this way, a non-trained user can just run the code, like this:
```bash
# Activate the virtualenv that tox created
source .tox/py36/bin/activate

# Run the program, just printing the command line help
python sksurgeryfashion.py --help
```

So, we now have the same code callable from a command line program, AND runnable via unit tests, AND available as a single class that can be embedded into any other program that imports the module containing the class. So, we have addressed point 3 of our proposed solution.
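
The entry-point pattern can be sketched with ```argparse``` from the standard library. This is a sketch under assumptions, not the contents of ```sksurgery_fashion_command_line.py```: the ```--epochs``` and ```--image``` options and the ```run_fashion``` function are hypothetical placeholders for whatever the real module exposes.

```python
import argparse


def run_fashion(epochs, image=None):
    """Hypothetical stand-in for the call into the model module."""
    return {"epochs": epochs, "image": image}


def main(args=None):
    """Parse command line arguments and call through to the model code."""
    parser = argparse.ArgumentParser(description="Fashion MNIST demo (sketch)")
    # Both options are illustrative assumptions, not the real CLI.
    parser.add_argument("-e", "--epochs", type=int, default=1,
                        help="number of training epochs")
    parser.add_argument("-i", "--image", default=None,
                        help="path of an image to classify")
    parsed = parser.parse_args(args)  # args=None means "read sys.argv[1:]"
    return run_fashion(parsed.epochs, parsed.image)


# Calling main with an explicit list, as a unit test would.
result = main(["--epochs", "3"])
```

Accepting ```args``` as a parameter means a unit test can drive exactly the same ```main``` that the command line uses, without touching ```sys.argv```.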


Running via Jupyter Notebook
========================

The reason we started with a standard Python script is that, once a network is developed, it's more likely to be run in standard Python scripts, on a cluster/GPU node, or embedded in a larger program. So the design above supports this. However, Jupyter notebooks are useful for development, and for writing up weekly supervision meetings.

```python
import os
import sys
print(os.getcwd())
print(sys.path)
sys.path.append("../../")
from sksurgerytf.models import fashion as f
fmn = f.FashionMNIST()
```

```
/Users/mattclarkson/build/scikit-surgerytf/doc/notebooks
['/Users/mattclarkson/build/Porcupine/binding/python', '/Users/mattclarkson/build/scikit-surgerytf/.tox/lint/lib/python36.zip', '/Users/mattclarkson/build/scikit-surgerytf/.tox/lint/lib/python3.6', '/Users/mattclarkson/build/scikit-surgerytf/.tox/lint/lib/python3.6/lib-dynload', '/Users/mattclarkson/anaconda3/envs/scikit-surgery/lib/python3.6', '', '/Users/mattclarkson/build/scikit-surgerytf/.tox/lint/lib/python3.6/site-packages', '/Users/mattclarkson/build/scikit-surgerytf/.tox/lint/lib/python3.6/site-packages/IPython/extensions', '/Users/mattclarkson/.ipython', '../../', '../../', '../../']
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
flatten (Flatten)            (None, 784)               0         
_________________________________________________________________
dense (Dense)                (None, 128)               1004
```