# Interview Exercise - Senior Data Scientist, Computer Vision

---



## Introduction

This notebook contains a short (15-minute) series of exercises to help identify a **Data Scientist - Computer Vision Contractor** for the NHS AI Lab Skunkworks SWAT team.

### What is a SWAT Team?

A SWAT team is a rapid-action team spun up to help an NHS organisation, such as a trust, build and/or implement an AI solution. This may be a new project built from scratch, or a continuation of a project that we have released as open source on our [GitHub](https://github.com/nhsx).

### What is the goal of this exercise?

This is not a Google coding interview. The goal is to confirm sufficient proficiency with the data science tools this project requires, but above all, the **ability to learn quickly and experiment**.

* Feel free to use Google or Stack Overflow for help during this process ("open book").
* Don't worry about completing all the exercises during this interview.

## Instructions

You will be presented with a number of activities. Take your time to read through the instructions and let your interviewer know when you are ready to start.


## Exercise 1 - Working with DICOM files

This project will involve working with DICOM files.

Using the [pydicom](https://github.com/pydicom/pydicom) library, load the `CT_small.dcm` sample and plot it.

```python
%pip install matplotlib pydicom
```

```python
import matplotlib.pyplot as plt
from pydicom import dcmread
from pydicom.data import get_testdata_file

# The path to a pydicom test dataset
path = get_testdata_file("CT_small.dcm")
ds = dcmread(path)
# `arr` is a numpy.ndarray
arr = ds.pixel_array

plt.imshow(arr, cmap="gray")
plt.show()
```
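Stored CT pixel values are not necessarily Hounsfield units (HU): the Rescale Slope and Rescale Intercept attributes map them via HU = stored value * slope + intercept, and a display window then chooses which HU range fills the grey scale. The sketch below shows both steps in plain NumPy. The helper names and the default soft-tissue window (centre 40 HU, width 400 HU) are illustrative assumptions, not part of the exercise.

```python
import numpy as np


def to_hounsfield(pixels, slope=1.0, intercept=0.0):
    """Map stored pixel values to Hounsfield units: HU = stored * slope + intercept."""
    return np.asarray(pixels, dtype=np.float64) * slope + intercept


def apply_window(hu, center=40.0, width=400.0):
    """Clip HU to [center - width/2, center + width/2] and rescale to [0, 1].

    The default centre/width is a commonly used soft-tissue window (an assumption).
    """
    low = center - width / 2.0
    high = center + width / 2.0
    return (np.clip(hu, low, high) - low) / (high - low)
```

For the sample above, this might be used as `plt.imshow(apply_window(to_hounsfield(arr, float(ds.get("RescaleSlope", 1)), float(ds.get("RescaleIntercept", 0)))), cmap="gray")`, falling back to a slope of 1 and an intercept of 0 if the attributes are absent.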

In line with GDPR, our projects typically do not include personal data, and a Data Protection Impact Assessment (DPIA) is completed ahead of time.

Can you check for personal data in the test dataset?

```python
ds
```
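Printing `ds` lists every data element, which is easy to skim but also easy to miss things in. As a minimal sketch, the helper below flags top-level elements that have a person-name (`PN`) value representation or a keyword from a hand-picked list. The keyword list and the function name are assumptions for illustration; the DICOM standard's confidentiality profile (PS3.15, Annex E) is the authoritative list of attributes to consider, and nested sequences are not searched here.

```python
# Hand-picked, non-exhaustive keywords that commonly identify a patient or site
# (an assumption; DICOM PS3.15 Annex E gives the authoritative list).
IDENTIFYING_KEYWORDS = {
    "PatientName", "PatientID", "PatientBirthDate", "PatientSex", "PatientAge",
    "PatientAddress", "OtherPatientIDs", "InstitutionName", "InstitutionAddress",
    "ReferringPhysicianName", "AccessionNumber", "StudyDate",
}


def find_personal_data(ds):
    """Return (keyword, value) pairs for top-level elements that may hold personal data.

    An element is flagged if its keyword is in IDENTIFYING_KEYWORDS or its value
    representation is PN (person name). Nested sequences are not searched.
    """
    hits = []
    for elem in ds:  # iterating a pydicom Dataset yields its DataElement objects
        if elem.keyword in IDENTIFYING_KEYWORDS or elem.VR == "PN":
            value = str(elem.value).strip()
            if value:  # ignore elements that are present but empty
                hits.append((elem.keyword, value))
    return hits
```

Usage might look like `for keyword, value in find_personal_data(ds): print(keyword, value)`. Because the function only reads `keyword`, `VR` and `value` from each element, it avoids touching large elements such as the pixel data.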

## Exercise 2 - Working with DICOM slices

We are going to work with CT data, in which each scan consists of multiple "slices".

We're going to start by loading the slice data from [PCIR](http://www.pcir.org/researchers/54879843_20060101.html), which we have saved to a shared Google Drive:


```python
import glob

import numpy as np
from google.colab import drive
from pydicom import dcmread

drive.mount('/content/gdrive')
slice_data_path = "/content/gdrive/Shareddrives/NHS AI Lab & Team/Skunkworks/Projects and procurement/2 SWAT projects/sample_data/DICOM/Crane SPC/"

# load every file in the folder as a DICOM dataset (sorted for a reproducible order)
files = []
for fname in sorted(glob.glob(slice_data_path + "*")):
    print("loading: {}".format(fname))
    files.append(dcmread(fname))

print("file count: {}".format(len(files)))
```

```python
# skip files with no SliceLocation (e.g. scout views)
slices = []
skipcount = 0
for f in files:
    if hasattr(f, 'SliceLocation'):
        slices.append(f)
    else:
        skipcount += 1

print("skipped, no SliceLocation: {}".format(skipcount))
```

```python
# ensure they are in the correct order
slices = sorted(slices, key=lambda s: s.SliceLocation)
```
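`SliceLocation` is an optional attribute, so when it is missing or unreliable a common alternative is to order slices by their distance along the normal to the image plane: the cross product of the row and column direction cosines in `ImageOrientationPatient`, dotted with each slice's `ImagePositionPatient`. The sketch below assumes every slice shares the same orientation; the function names are hypothetical.

```python
import numpy as np


def slice_positions(orientation, positions):
    """Signed distance (mm) of each slice along the image-plane normal.

    orientation: ImageOrientationPatient shared by all slices (6 direction
        cosines: row x, y, z, then column x, y, z).
    positions: ImagePositionPatient of each slice (x, y, z in mm), shape (n, 3).
    """
    iop = np.asarray(orientation, dtype=float)
    normal = np.cross(iop[:3], iop[3:])  # row cosines x column cosines
    return np.asarray(positions, dtype=float) @ normal


def sort_order(orientation, positions):
    """Indices that put the slices in ascending order along the normal."""
    return np.argsort(slice_positions(orientation, positions))
```

Applied to this exercise, that could be `order = sort_order(slices[0].ImageOrientationPatient, [s.ImagePositionPatient for s in slices])` followed by `slices = [slices[i] for i in order]`.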

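```python
# pixel aspects, assuming all slices share the same spacing and thickness.
# PixelSpacing is [row spacing (vertical), column spacing (horizontal)] in mm, and
# matplotlib's set_aspect expects (height of one y unit) / (width of one x unit).
# SliceThickness stands in for the spacing between slices, which only holds
# when slices neither overlap nor leave gaps.
ps = slices[0].PixelSpacing
ss = slices[0].SliceThickness
ax_aspect = ps[0]/ps[1]
sag_aspect = ps[0]/ss
cor_aspect = ss/ps[1]
```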
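`SliceThickness` describes how thick each slice is, not how far apart consecutive slices are, and the two differ when slices overlap or leave gaps. A simple sanity check is to compute the gaps between slice positions directly. The helper below is a hypothetical sketch that takes positions in millimetres (for example the `SliceLocation` values used for sorting) and reports the median gap and whether all gaps agree.

```python
import numpy as np


def slice_spacing(locations, rtol=1e-3):
    """Estimate the distance (mm) between consecutive slices from their positions.

    Returns (median gap, True if every gap matches the median within rtol).
    """
    gaps = np.diff(np.sort(np.asarray(locations, dtype=float)))
    if gaps.size == 0:
        raise ValueError("need at least two slice positions")
    spacing = float(np.median(gaps))
    uniform = bool(np.allclose(gaps, spacing, rtol=rtol))
    return spacing, uniform
```

For example, `spacing, uniform = slice_spacing([s.SliceLocation for s in slices])`; if `uniform` is `True`, `spacing` could replace `ss` in the sagittal and coronal aspect ratios.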

```python
# create 3D array
img_shape = list(slices[0].pixel_array.shape)
img_shape.append(len(slices))
img3d = np.zeros(img_shape)

# fill 3D array with the images from the files
for i, s in enumerate(slices):
    img2d = s.pixel_array
    img3d[:, :, i] = img2d
```
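The preallocate-and-fill loop assumes every slice has the same shape; if one does not, the assignment fails with a broadcasting error. An equivalent sketch using `np.stack` checks the shapes up front and keeps the original pixel dtype (the `np.zeros` version converts everything to float64). The function name is illustrative.

```python
import numpy as np


def stack_slices(images):
    """Stack equally sized 2-D slices into a (rows, cols, n_slices) volume."""
    if not images:
        raise ValueError("no slices to stack")
    shapes = {img.shape for img in images}
    if len(shapes) != 1:
        raise ValueError("slices have differing shapes: {}".format(sorted(shapes)))
    # np.stack keeps the input dtype (e.g. int16) instead of converting to float64
    return np.stack(images, axis=-1)
```

Usage would be `img3d = stack_slices([s.pixel_array for s in slices])`, which produces the same layout as the loop, so the plotting code can index it unchanged.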


```python
# plot 3 orthogonal slices through the middle of the volume
a1 = plt.subplot(2, 2, 1)
plt.imshow(img3d[:, :, img_shape[2]//2], cmap="gray")
a1.set_title("axial")
a1.set_aspect(ax_aspect)

a2 = plt.subplot(2, 2, 2)
plt.imshow(img3d[:, img_shape[1]//2, :], cmap="gray")
a2.set_title("sagittal")
a2.set_aspect(sag_aspect)

a3 = plt.subplot(2, 2, 3)
plt.imshow(img3d[img_shape[0]//2, :, :].T, cmap="gray")
a3.set_title("coronal")
a3.set_aspect(cor_aspect)

plt.show()
```

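```python
from ipywidgets import interact

def dicom_animation(x):
    # draw slice x; interact re-runs this function whenever the slider moves
    plt.imshow(slices[x].pixel_array, cmap="gray")
    plt.title("slice {} of {}".format(x, len(slices) - 1))
    plt.show()

interact(dicom_animation, x=(0, len(slices) - 1));
```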