# SimBioSys tech interview exercise

1. Design and train an ANN classifier to determine whether an image contains an ellipse or a rectangle (images containing both shapes, or neither, are not considered). Please do not use an off-the-shelf model; instead, write a CNN using the standard building blocks available in PyTorch or TensorFlow and train it from scratch. The classification task is easy enough that you need neither a sophisticated model nor extensive training.
2. Compute metrics: accuracy, precision, recall, and ROC AUC (all in `sklearn.metrics`).
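
For reference, treat label 1 as the positive class (the default `pos_label=1` in `precision_score` and `recall_score`), and write TP, FP, TN, FN for the counts of true positives, false positives, true negatives, and false negatives at a chosen decision threshold. The first three metrics are then:

$$
\text{accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad
\text{precision} = \frac{TP}{TP + FP}, \qquad
\text{recall} = \frac{TP}{TP + FN}
$$

ROC AUC, by contrast, does not depend on a threshold. It is the area under the curve of true-positive rate against false-positive rate as the threshold sweeps over all score values, so it should be computed from continuous scores (for example sigmoid outputs) rather than from hard 0/1 predictions.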

## Data

You may use whatever framework you find most comfortable. To save time, I've included a data generator class that creates examples on the fly instead of giving you image data. I wrote this generator for a PyTorch model, so apologies if it is difficult to use with TF. The shape parameters (e.g. radius, centroid, ...) are all random, and each image is corrupted with a small amount of Gaussian noise. With PyTorch, the generator is used as follows:

```python
import torch

gen = ShapeGenerator(64)  # batch size of 64
# ... define model, loss_fn, optimizer ...
for batched_image, batched_target in gen.iter_training():
    # Training examples are purely random and not repeated
    batched_image = torch.from_numpy(batched_image).cuda()
    batched_target = torch.from_numpy(batched_target).cuda()
    batched_pred = model(batched_image)
    crit = loss_fn(batched_pred, batched_target)
    ...  # backward pass and optimizer step
# ...
for batched_image, batched_target in gen.iter_validation():
    # Validation examples are computed on instantiation of ShapeGenerator
    # and are constant over the lifetime of gen
    ...  # evaluate the model
```

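The generator yields NumPy `float32` ndarrays: images of shape `(batch_size, 128, 128)`, with no channel axis, and targets of shape `(batch_size,)` holding 0 or 1. When the batch size does not divide the number of examples, the last batch of each pass is smaller (with a batch size of 64, it holds 16 examples for both training and validation). You will need to convert these arrays to the tensor flavor of your choice.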
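A minimal PyTorch sketch of that conversion, assuming the model is built from `Conv2d` layers, which expect a `(batch, channels, height, width)` input; since the generator yields no channel axis, one is added here. The helper name `to_tensors` is hypothetical.

```python
import numpy as np
import torch


def to_tensors(images: np.ndarray, targets: np.ndarray, device: str = "cpu"):
    """Convert one generator batch to tensors (hypothetical helper).

    images:  float32 array of shape (B, H, W) -> float32 tensor of shape (B, 1, H, W)
    targets: float32 array of shape (B,)      -> float32 tensor of shape (B,)
    """
    x = torch.from_numpy(images).unsqueeze(1).to(device)  # add the channel axis Conv2d expects
    y = torch.from_numpy(targets).to(device)  # float targets suit BCEWithLogitsLoss
    return x, y


# Usage with a dummy batch shaped like the generator's output
images = np.zeros((4, 128, 128), np.float32)
targets = np.array([0, 1, 1, 0], np.float32)
x, y = to_tensors(images, targets)
print(x.shape, y.shape)  # torch.Size([4, 1, 128, 128]) torch.Size([4])
```

On a GPU, pass `device="cuda"`, which is equivalent to the `.cuda()` calls in the usage example. Note that `torch.from_numpy` shares memory with the array, so no copy is made until the tensor is moved to another device.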

With a quickly slapped-together model with no hyperparameter tuning, trained for 2 epochs, I was able to get the following metrics:

| Metric | Value|
|----------|-----|
| Accuracy | 0.9855  |
| Precision | 0.9816 |
| Recall | 0.9899 |
| ROC AUC | 0.9992 |

**You do not need to reach or exceed these metrics.** Honestly, so long as your ROC AUC is greater than 0.5 (i.e., better than chance), I'm happy. I am most interested in how you implement the model.

If you need any hints or clarification, please email me at tme@simbiosys.com

## Data generator definition

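```python
import numpy as np
from skimage.draw import polygon, ellipse


class ShapeGenerator:
    """Generates noisy 128x128 images that each contain one rectangle or one ellipse.

    Targets are 1 for a rectangle and 0 for an ellipse.
    """
    ntrain = 10000     # training examples per call to iter_training()
    nval = 2000        # validation examples, fixed for the lifetime of the instance
    shape = 128, 128   # image size (rows, cols)
    noise_std = 0.1    # standard deviation of the additive Gaussian noise
    min_size = 100     # a shape must cover more than this many pixels

    def __init__(self, batch_size):
        self.batch_size = batch_size
        # Number of batches per training / validation pass
        self.nb_training = int(np.ceil(self.ntrain/self.batch_size))
        self.nb_validation = int(np.ceil(self.nval/self.batch_size))
        self.rng = np.random.default_rng(0)
        self.validation_data = list(self._batch_images(self.nval)) # precompute these since the rng has just been seeded

    def _batch_images(self, n):
        # Yield n examples in batches of batch_size; the last batch may be smaller
        done = 0
        while done < n:
            done += self.batch_size
            nb = self.batch_size
            if done > n:
                nb = nb - (done-n)
            Xs = np.zeros((nb, *self.shape), np.float32)
            ys = np.zeros(nb, np.float32)
            for i in range(nb):
                X, y = self.random_shape()
                Xs[i,:] = X
                ys[i] = y

            yield Xs, ys

    def random_shape(self):
        def mk_rectangle():
            # Rectangle with side lengths (wr, wc) centered at (r, c), rotated by a random angle
            wr, wc = self.rng.integers([x//2 for x in self.shape])
            r, c = self.rng.integers(self.shape)
            r0, c0 = r-0.5*wr, c-0.5*wc
            r1, c1 = r+0.5*wr, c+0.5*wc
            verts = np.array([[r0,r0,r1,r1], [c0,c1,c1,c0]])
            th = self.rng.uniform(-np.pi, np.pi)
            rot = np.array([[np.cos(th), np.sin(th)], [-np.sin(th), np.cos(th)]])
            verts = (rot @ (verts-[[r],[c]])) + [[r], [c]]  # rotate the corners about the center
            rr, cc = polygon(verts[0], verts[1])
            return rr, cc, 1

        def mk_ellipse():
            # Ellipse with semi-axes (a, b) centered at (r, c), rotated by a random angle
            r, c = self.rng.integers(self.shape)
            side = min(self.shape)
            a, b = self.rng.integers([side//8, side//8], [side//2, side//2])
            th = self.rng.uniform(-np.pi, np.pi)
            rr, cc = ellipse(r, c, a, b, rotation=th)
            return rr, cc, 0

        if self.rng.uniform()<0.5:
            mk_shape = mk_ellipse
        else:
            mk_shape = mk_rectangle

        # Resample until the shape is large enough and does not touch the image border.
        # The size is checked separately because np.all() of an empty mask is True.
        for _ in range(10000):
            rr, cc, yy = mk_shape()
            inside = (rr>0) & (rr<self.shape[0]-1) & (cc>0) & (cc<self.shape[1]-1)
            if rr.size > self.min_size and np.all(inside):
                break
        else:
            raise RuntimeError("Failed to find an acceptable set of shape parameters")

        # Background pixels ~ -1, shape pixels ~ +1, plus Gaussian noise
        mk = self.rng.normal(0, self.noise_std, self.shape).astype(np.float32)
        mk[rr, cc] += 2
        mk -= 1
        return mk, yy

    def iter_training(self):
        # Fresh random examples on every call
        yield from self._batch_images(self.ntrain)

    def iter_validation(self):
        # The same precomputed examples on every call
        yield from self.validation_data
```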
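Reading `random_shape`, each generated image $X$ can be written as follows, where $S$ is the set of pixels covered by the shape (more than `min_size` = 100 pixels, none of them on the outermost rows or columns) and the noise is drawn independently per pixel with standard deviation `noise_std` = 0.1:

$$
X_{ij} = 2\,\mathbf{1}\big[(i,j) \in S\big] - 1 + \varepsilon_{ij}, \qquad \varepsilon_{ij} \sim \mathcal{N}(0,\, 0.1^2)
$$

Background pixels therefore sit near $-1$ and shape pixels near $+1$. The gap of 2 is 20 noise standard deviations, so the outline of the shape stands well clear of the noise.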

## Your model here
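One possible starting point, offered as a minimal sketch rather than a reference solution: a small PyTorch CNN built from `Conv2d`, `BatchNorm2d`, `ReLU`, and `MaxPool2d` blocks, followed by global average pooling and a single logit for label 1, trained with `BCEWithLogitsLoss` and Adam. The class name `ShapeCNN`, the helper `train_one_epoch`, the channel widths, and the learning rate are illustrative assumptions, not tuned values.

```python
import numpy as np
import torch
from torch import nn


class ShapeCNN(nn.Module):
    """Small CNN mapping (B, 1, 128, 128) images to (B,) logits for label 1."""

    def __init__(self, widths=(16, 32, 64, 64)):
        super().__init__()
        layers, in_ch = [], 1
        for out_ch in widths:
            # conv -> batch norm -> ReLU -> 2x downsampling
            # (with the default four widths, 128x128 ends up as 8x8)
            layers += [
                nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
                nn.MaxPool2d(2),
            ]
            in_ch = out_ch
        self.features = nn.Sequential(*layers)
        self.pool = nn.AdaptiveAvgPool2d(1)  # global average pooling -> (B, C, 1, 1)
        self.head = nn.Linear(in_ch, 1)

    def forward(self, x):
        if x.dim() == 3:  # accept generator batches that lack a channel axis
            x = x.unsqueeze(1)
        x = self.pool(self.features(x)).flatten(1)  # (B, C)
        return self.head(x).squeeze(1)  # (B,) raw logits


def train_one_epoch(model, batches, optimizer, device="cpu"):
    """One pass over an iterable of (images, targets) NumPy batches; returns the mean loss.

    The model must already be on `device`.
    """
    loss_fn = nn.BCEWithLogitsLoss()  # expects float targets in {0, 1}
    model.train()
    total, count = 0.0, 0
    for images, targets in batches:
        x = torch.from_numpy(images).to(device)
        y = torch.from_numpy(targets).to(device)
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()
        total += loss.item() * len(targets)
        count += len(targets)
    return total / count


# Usage with random stand-in batches; with the real data, pass gen.iter_training()
# (and for GPU training, use model.cuda() with device="cuda")
model = ShapeCNN()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
rng = np.random.default_rng(0)
fake = [(rng.normal(size=(8, 128, 128)).astype(np.float32),
         rng.integers(0, 2, 8).astype(np.float32)) for _ in range(2)]
mean_loss = train_one_epoch(model, fake, optimizer)
print(f"mean training loss: {mean_loss:.4f}")
```

Global average pooling keeps the head independent of where the shape sits in the image, which suits data whose centroids are uniformly random. Calling `train_one_epoch` once per call to `gen.iter_training()` gives a new set of random examples each epoch.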

## Your metric reporting here
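A minimal sketch of metric reporting with `sklearn.metrics`, assuming a PyTorch model that returns one logit per image. Probabilities come from a sigmoid, hard labels from an assumed 0.5 threshold (a logit threshold of 0), and ROC AUC is computed from the probabilities rather than from the hard labels. The helper names `predict_scores` and `report_metrics` are hypothetical.

```python
import numpy as np
import torch
from sklearn.metrics import accuracy_score, precision_score, recall_score, roc_auc_score


@torch.no_grad()
def predict_scores(model, batches, device="cpu"):
    """Collect integer targets and sigmoid probabilities for label 1 over (images, targets) batches."""
    model.eval()  # use BatchNorm running statistics and disable dropout
    y_true, y_score = [], []
    for images, targets in batches:
        x = torch.from_numpy(images).to(device)
        if x.dim() == 3:
            x = x.unsqueeze(1)  # (B, H, W) -> (B, 1, H, W)
        y_score.append(torch.sigmoid(model(x)).reshape(-1).cpu().numpy())
        y_true.append(targets)
    return np.concatenate(y_true).astype(int), np.concatenate(y_score)


def report_metrics(y_true, y_score, threshold=0.5):
    """Accuracy, precision, and recall at a threshold; ROC AUC from the raw scores."""
    y_pred = (y_score >= threshold).astype(int)
    return {
        "Accuracy": accuracy_score(y_true, y_pred),
        "Precision": precision_score(y_true, y_pred),
        "Recall": recall_score(y_true, y_pred),
        "ROC AUC": roc_auc_score(y_true, y_score),
    }


# Usage on hand-made scores; with a trained model, use
# report_metrics(*predict_scores(model, gen.iter_validation()))
y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.6, 0.4, 0.9])
for name, value in report_metrics(y_true, y_score).items():
    print(f"{name}: {value:.4f}")
# Accuracy: 0.5000
# Precision: 0.5000
# Recall: 0.5000
# ROC AUC: 0.7500
```

In the toy example, the 0.5 threshold misclassifies one image of each class, so the three threshold metrics are all 0.5. The scores still rank 3 of the 4 positive/negative pairs correctly, which gives the ROC AUC of 0.75.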