<a href="https://colab.research.google.com/github/SuperXiang/ADSProject1/blob/master/Jeremy_Overjet_ML_Engineering_Test.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Instructions: You have 1.5 hrs to complete the test. There are 4 questions. Please feel free to use any Python library/package. If you are not able to complete the question using code, please attempt using pseudocode. If you are unclear on any question, state your assumptions and complete accordingly.

1) Please build a function that takes in a numpy array (h x w x c) that resizes it to a given h_max and w_max but keeps the aspect ratio the same.

In [0]:
import cv2

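def resize_same_aspect_ratio(arr, width = None, height = None, inter = cv2.INTER_AREA):
  # arr has shape (h, w, c); width and height play the role of w_max and h_max
  (h, w) = arr.shape[:2]

  if width is None and height is None:
    return arr

  if width is None:
    # only the height is bounded
    scale = height / float(h)
  elif height is None:
    # only the width is bounded
    scale = width / float(w)
  else:
    # both bounds given: the smaller scale keeps the result inside both
    scale = min(width / float(w), height / float(h))

  # cv2.resize expects the target size as (width, height)
  size = (max(1, int(w * scale)), max(1, int(h * scale)))

  return cv2.resize(arr, size, interpolation = inter)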
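The aspect-ratio constraint in question 1 comes down to scaling both sides by one common factor, `min(h_max / h, w_max / w)`. The following is a minimal sketch of only that size computation, without cv2, so it can be checked by hand; the name `fit_within` is hypothetical and not part of the test.

```python
def fit_within(h, w, h_max, w_max):
  """Return (new_h, new_w) that fits inside (h_max, w_max) with the same aspect ratio."""
  # One common scale for both sides preserves the aspect ratio;
  # taking the smaller of the two ratios keeps both sides within their bounds.
  scale = min(h_max / float(h), w_max / float(w))
  new_h = max(1, int(round(h * scale)))
  new_w = max(1, int(round(w * scale)))
  return new_h, new_w

# A 400 x 200 (h x w) image bounded by 100 x 100 is limited by its height.
print(fit_within(400, 200, 100, 100))
```

Note that `cv2.resize` takes its target size in `(width, height)` order, so the result would be passed as `(new_w, new_h)`.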

2) The Dockerfile below installs PyTorch and serves a PyTorch model. Please suggest two ways to optimize the image size.

```dockerfile

FROM debian:stable

RUN echo 'debconf debconf/frontend select Noninteractive' | debconf-set-selections

# install basics
RUN apt-get update -y \
 && apt-get install -y \
    apt-utils \ 
    git \ 
    curl \ 
    ca-certificates \ 
    bzip2 \
    cmake \ 
    tree \ 
    bmon \ 
    g++ \
    libglib2.0-0 \ 
    libsm6 \
    libxext6 \
    libxrender-dev


# Install Miniconda
RUN curl -so /miniconda.sh https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh \
 && chmod +x /miniconda.sh \
 && /miniconda.sh -b -p /miniconda

ENV PATH=/miniconda/bin:$PATH

# Create a Python 3.6 environment
RUN /miniconda/bin/conda install -y conda-build \
 && /miniconda/bin/conda create -y --name py36 python=3.6.7 \
 && /miniconda/bin/conda clean -ya

ENV CONDA_DEFAULT_ENV=py36
ENV CONDA_PREFIX=/miniconda/envs/$CONDA_DEFAULT_ENV
ENV PATH=$CONDA_PREFIX/bin:$PATH
ENV CONDA_AUTO_UPDATE_CONDA=false


# Install PyTorch 1.0 Nightly
RUN conda install pytorch-nightly -c pytorch \
 && conda clean -ya

RUN conda install torchvision -c pytorch \
 && conda clean -ya

RUN conda install -c conda-forge pycocotools && conda clean -ya

# install PyTorch Detection
RUN git clone https://github.com/facebookresearch/maskrcnn-benchmark.git \
 && cd maskrcnn-benchmark \
 && python setup.py build develop

# install apex
RUN git clone https://github.com/NVIDIA/apex.git \
 && cd apex \
 && python setup.py install --cpp_ext


# Copy local code to the container image.
ENV APP_HOME /app
WORKDIR $APP_HOME
COPY tools/app.py ./
COPY tools/output_predictions_online.py ./
COPY tools/output_predictions_single.py ./
COPY configs/diseases/calc_deploy.yaml ./
COPY maskrcnn_benchmark ./
COPY ./model_best.pth ./best_weights.pth
COPY tools/requirements.txt ./


# Install production dependencies.
RUN pip install -r requirements.txt

# Install production dependencies.
RUN pip install Flask gunicorn

# Run the web service on container startup.
CMD exec gunicorn --bind :$PORT --workers 1 --threads 1 app:app

```




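1. Combine as many of the RUN commands as possible into a single RUN statement, and clean up in that same statement (for example, the apt package lists and the cloned source trees), so that temporary files are never committed to a layer.
2. Squash the image layers, either with the docker-squash tool or by passing --squash to docker build (an experimental option).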

3) Please compute the Precision and Recall for predictions of bounding boxes against ground truth of bounding boxes given a threshold. A reference is https://www.jeremyjordan.me/evaluating-image-segmentation-models/ (See section: "Calculating Precision"). The function signature is given below. Note that you should implement a function that returns the intersection and union of two bounding boxes.



In [0]:
def compute_precision_recall(prediction, ground_truth, threshold):
  """prediction and ground_truth are lists of bounding boxes. Each list is of format: [[(x1,y1),(x1,y2),(x2,y2),(x2,y1)],...]
  Assume you have available a function that returns the intersection and the union of two bounding boxes:
  intersection, union = get_intersection_union(bbox1,bbox2)
  """
  # Assumption: each ground-truth box can be matched by at most one prediction,
  # and predictions are matched greedily in the order they are given.
  matched = set()
  TP = 0
  for pred_box in prediction:
    best_iou, best_idx = 0.0, None
    for idx, gt_box in enumerate(ground_truth):
      if idx in matched:
        continue
      intersection, union = get_intersection_union(pred_box, gt_box)
      iou = intersection / union if union > 0 else 0.0
      if iou > best_iou:
        best_iou, best_idx = iou, idx
    # A prediction is a true positive if its best IoU reaches the threshold
    if best_idx is not None and best_iou >= threshold:
      TP = TP + 1
      matched.add(best_idx)

  FP = len(prediction) - TP     # predictions without a matching ground truth
  FN = len(ground_truth) - TP   # ground-truth boxes that were never matched

  precision = TP / (TP + FP) if (TP + FP) > 0 else 0.0
  recall = TP / (TP + FN) if (TP + FN) > 0 else 0.0

  return precision, recall
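Question 3 also asks for the function that returns the intersection and union of two bounding boxes. The following is a minimal sketch, assuming the boxes are axis-aligned and given as four `(x, y)` corner tuples as in the docstring; the corner order is not relied on, only the minimum and maximum coordinates.

```python
def get_intersection_union(bbox1, bbox2):
  """Return (intersection_area, union_area) of two axis-aligned boxes."""
  def bounds(bbox):
    # Reduce the four corners to (x_min, y_min, x_max, y_max)
    xs = [p[0] for p in bbox]
    ys = [p[1] for p in bbox]
    return min(xs), min(ys), max(xs), max(ys)

  ax1, ay1, ax2, ay2 = bounds(bbox1)
  bx1, by1, bx2, by2 = bounds(bbox2)

  # Width and height of the overlap; clamped at 0 when the boxes are disjoint
  inter_w = max(0, min(ax2, bx2) - max(ax1, bx1))
  inter_h = max(0, min(ay2, by2) - max(ay1, by1))
  intersection = inter_w * inter_h

  area_a = (ax2 - ax1) * (ay2 - ay1)
  area_b = (bx2 - bx1) * (by2 - by1)
  union = area_a + area_b - intersection
  return intersection, union

# Two 2 x 2 boxes overlapping in a 1 x 1 square
a = [(0, 0), (0, 2), (2, 2), (2, 0)]
b = [(1, 1), (1, 3), (3, 3), (3, 1)]
print(get_intersection_union(a, b))
```

For the two boxes above, the overlap has area 1 and the union has area 4 + 4 - 1 = 7, so the IoU is 1/7.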

4) You are given filenames with the format: ```00...0patientid00.TIF```. The ```0``` to the left of ```patientid``` repeats a variable number of times, as shown by the ellipsis. ```patientid``` is numeric, such as `18` or `9999`, and doesn't begin with zero. The ```0``` to the right of ```patientid``` repeats exactly twice, as shown.

Write a function with the signature below to extract the patient id. E.g. for ```00001700.TIF```, the function should return ```17```. This function should work for arbitrary extensions such as ```TIF```, ```JPEG``` and so on.

```python
def extract(filename):


  return patient_id
```




In [0]:
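import re

def extract(filename):
  # Leading zeros, then the patient id (no leading zero), then exactly
  # two zeros, then a dot and an arbitrary extension
  match = re.match(r'^0*([1-9]\d*)00\.[^.]+$', filename)
  if match is None:
    raise ValueError('unexpected filename format: ' + filename)
  patient_id = int(match.group(1))

  return patient_id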
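The filename format in question 4 can also be handled without a regular expression: drop the extension, drop the two trailing zeros, and let `int` discard the leading zeros. The following is a minimal sketch under that reading of the format; the name `extract_without_regex` is hypothetical, and the input is assumed to be well formed, so no validation is done.

```python
import os

def extract_without_regex(filename):
  # Split off the extension, whatever it is (TIF, JPEG, ...)
  stem, _extension = os.path.splitext(filename)
  # The last two characters of the stem are the fixed "00" suffix
  digits = stem[:-2]
  # int() ignores the variable run of leading zeros
  return int(digits)

print(extract_without_regex('00001700.TIF'))
```

Because the id never begins with zero and the suffix is always exactly two zeros, an id that itself ends in zeros (such as `100` in `0010000.png`) is still recovered correctly.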