# Identifeye ML Engineering Challenge

Thank you for spending time on this take-home! We are delighted that you are considering a machine learning role at Identifeye Health. In this assignment, you will solve a problem with immediate relevance to us: developing a model that predicts the quality of captured fundus images.

## Context

Doctors rely heavily on good-quality images to diagnose diseases and conditions confidently. It is therefore important to evaluate the quality of each captured image: checking that it is free of artifacts, has the right color balance, shows the required anatomical features, and so on. When an image's quality is judged insufficient, the image is recaptured to increase the chances of obtaining one that the doctor can interpret reliably. This is one challenge to which we apply machine learning!

Fundus imaging is a well-established retinal imaging technique. Diagnosis relies on inspecting the optic disc, the macula, and the blood vessels. In a good-quality image, these main retinal elements need to be clearly visible, depending on where the image is centered. Other quality parameters also determine the overall usability of the image for interpretation, such as illumination uniformity, blur, color balance, contrast, field definition, and the presence of artifacts. Below, we show sample bad-quality images, with the type of defect in each column header, followed by a set of good-quality images.


<center> <h3>Bad Quality Images</h3> </center>

 Dust | Eye Blink | Artifact
 :- | :- | :- 
 ![alt](Data/examples/dust.png) | ![alt](Data/examples/eye_blink.png) | ![alt](Data/examples/artifact3.jpg)

Overexposed | Underexposed |  Uneven Illumination
:- | :- | :- 
![alt](Data/examples/overexposed.png) | ![alt](Data/examples/underexposed.png) | ![alt](Data/examples/unevenillum.png)



<center> <h3>Good Quality Images</h3> </center>


Right Eye | Left Eye | Left Eye  
:- | :- | :- 
![alt](Data/examples/8664_right.png) | ![alt](Data/examples/9288_left.png) | ![alt](Data/examples/978_left.png) 

Right Eye | Right Eye | Left Eye 
:- | :- | :- 
![alt](Data/examples/1034_right.png) | ![alt](Data/examples/10457_right.png) | ![alt](Data/examples/11618_left.png) 




## Challenge

**Based on this use case, your assignment is to build a model to predict the quality label of an image.** You have been given good-quality (label 0) and bad-quality (label 1) images. We have structured the assignment so that you work through incremental components of the problem, building toward a more complete understanding. Please make sure to read the directions in each step!

As you go through this task, it is suggested that you prioritize **exploratory data analysis and principled model selection**. You are also encouraged to consider alternatives to methods you may try and suggest them as future work to explore.

## Ground Rules/Expectations

* You **are not** expected, within the limited time frame, to solve this problem entirely and invent a new way of analyzing image quality. We understand that you have a busy life, and have thus capped the amount of time you can spend on this problem to 3 hours. So, don't prioritize just getting the best performing model for each step!
* You **are** expected to communicate your overarching approach to the problem and the components we have laid out, with clear articulation of answers and presentation of supporting data (tables, plots, etc). So, do walk us through and explain how you used the available data to draw appropriate conclusions!
* You are welcome to use any open-source libraries that you would like. 


## Data

This folder consists of two components:

1. 1300 .pngs in the "fundus" folder consisting of fundus images of varying image quality
2. Two CSV files containing the image names and quality labels
    - Simple model subset
        - This file contains 300 image names and their quality labels. You will use this subset in the first section of the assignment. Quality labels are good (0) and bad (1). The accuracy of the labels in this subset has been confirmed by medically trained professionals.
    - CNN subset
        - This file contains 1000 image names and their quality labels. Note that the CNN subset does not contain any of the images listed in the simple model subset. Quality labels are good (0) and bad (1).

Use this data to work through this assignment.

## Rubric

In evaluating this assignment, we will consider the following skills:
1. demonstration of applied ML process
2. coding style and conventions
3. ability to thoughtfully communicate rationale, methods, and next steps

Best of luck!

# Step 0: Exploratory Data Analysis

Explore the data and share your insights!

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import pathlib
import cv2
```

### Define the paths for the dataset 

```python
data_path = pathlib.Path("Data")
image_path = data_path / "fundus_images"
simple_model_subset_path = data_path / "simple_model_subset.csv"
cnn_subset_path = data_path / "cnn_subset.csv"
```


### Load the label files

```python
simple_model_subset = pd.read_csv(simple_model_subset_path, index_col=0)
cnn_subset = pd.read_csv(cnn_subset_path, index_col=0)
```


```python
print(simple_model_subset)
print(cnn_subset)
```

### Define a function for loading images

### Define a function for plotting sample images with their labels 
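One way to plot a grid of images with their quality labels, as a sketch. `plot_images` and its parameters are hypothetical; the optional `titles` argument leaves room for extra text per panel (for example, model predictions later on). Unused grid cells are hidden.

```python
import matplotlib.pyplot as plt
import numpy as np

LABEL_NAMES = {0: "good", 1: "bad"}


def plot_images(images, labels, titles=None, ncols=4, size=3.0):
    """Plot RGB images in a grid, titling each with its quality label (0 = good, 1 = bad)."""
    nrows = max(1, int(np.ceil(len(images) / ncols)))
    fig, axes = plt.subplots(nrows, ncols, figsize=(ncols * size, nrows * size), squeeze=False)
    for ax in axes.ravel():
        ax.axis("off")  # hide ticks and frames, including on unused cells
    for i, (ax, img, label) in enumerate(zip(axes.ravel(), images, labels)):
        ax.imshow(img)
        title = f"{LABEL_NAMES.get(int(label), label)} ({int(label)})"
        if titles is not None:
            title = f"{titles[i]}\n{title}"
        ax.set_title(title, fontsize=9)
    fig.tight_layout()
    return fig
```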

### Plot and observe the differences between the bad and good images on a data subset 
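To compare the classes side by side, it helps to draw the same number of images from each class. The sketch below does that with pandas. The label column name is passed in as `label_col` because it is not stated here; `"quality"` in the usage line is a placeholder, not the real column name.

```python
import pandas as pd


def sample_per_class(labels_df, label_col, n_per_class=4, seed=0):
    """Randomly pick up to n_per_class rows for each distinct value of label_col."""
    parts = [
        group.sample(n=min(len(group), n_per_class), random_state=seed)
        for _, group in labels_df.groupby(label_col)
    ]
    return pd.concat(parts)
```

For example, `sample_per_class(simple_model_subset, label_col="quality", n_per_class=6)` returns six good and six bad rows whose images can then be loaded and plotted in two rows for visual comparison.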

### Explore the Data 

# Step 1: Simple Model 

Now that we have gained insights about the dataset, we can start building models for predicting image quality. In this section, build a model that classifies the images into good and bad classes using only the small data subset. This subset has 300 images from the good (label 0) and bad (label 1) classes. Given the size of the dataset, we do not expect you to train a neural network in this section. Feel free to use any other model you like.

### Q: Define a model for training 

Tell us why you picked this model. What are the advantages/disadvantages?

### Your Answer:
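One candidate for 300 images, sketched under assumptions rather than prescribed: fixed hand-crafted features (here, per-channel intensity histograms) fed to an L2-regularized logistic regression. Advantages: few parameters, fast to train, interpretable coefficients, and relatively hard to overfit. Disadvantages: histograms discard spatial information, so localized defects such as dust or small artifacts may be missed. `histogram_features` and `make_simple_model` are hypothetical names; `class_weight="balanced"` is included in case the classes are uneven.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def histogram_features(img_rgb, bins=16):
    """Concatenate normalized per-channel intensity histograms of an RGB uint8 image."""
    return np.concatenate([
        np.histogram(img_rgb[..., c], bins=bins, range=(0, 256), density=True)[0]
        for c in range(3)
    ])


def make_simple_model(C=1.0):
    """Standardize features, then fit an L2-regularized logistic regression."""
    return make_pipeline(
        StandardScaler(),
        LogisticRegression(C=C, class_weight="balanced", max_iter=1000),
    )
```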

### Q: Metrics

- Explain which metrics you will use to evaluate model performance.
- If the task were multi-level classification (good, accept, bad), which metrics would you look at?
- If the data are not evenly distributed across classes, how would you modify the metrics above or add other metrics to best present your model's performance?

### Your Answer: 
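A sketch of a binary metrics report with scikit-learn, assuming "bad" (label 1) is treated as the positive class, since missing a bad image means the patient leaves without a usable one. Recall on the bad class measures how many bad images are caught; precision on the bad class reflects unnecessary recaptures. Balanced accuracy and per-class recall stay meaningful when classes are imbalanced, and ROC AUC is threshold-independent (`average_precision_score` is a further option under strong imbalance). `binary_report` is a hypothetical name.

```python
from sklearn.metrics import (
    accuracy_score,
    balanced_accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)


def binary_report(y_true, y_pred, y_score):
    """Binary metrics with 'bad' (label 1) as the positive class.

    y_score is the predicted probability of label 1, used for ROC AUC.
    """
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "balanced_accuracy": balanced_accuracy_score(y_true, y_pred),
        "precision_bad": precision_score(y_true, y_pred, pos_label=1),
        "recall_bad": recall_score(y_true, y_pred, pos_label=1),
        "f1_bad": f1_score(y_true, y_pred, pos_label=1),
        "roc_auc": roc_auc_score(y_true, y_score),
        "confusion_matrix": confusion_matrix(y_true, y_pred, labels=[0, 1]),
    }
```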

### Q: Train & Test

Train/test the model of your choice and present the results. 

### Your Answer:
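With only 300 images, a single random split gives noisy estimates. A sketch of one principled protocol: hold out a stratified test set, use stratified k-fold cross-validation on the remainder to estimate performance (and later to compare models), then score the refit model once on the held-out set. The logistic-regression pipeline inside is a placeholder for whichever model is chosen; `train_and_evaluate` is a hypothetical name and `X` is any feature matrix with one row per image.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score, roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_validate, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def train_and_evaluate(X, y, test_size=0.2, n_splits=5, seed=0):
    """Stratified hold-out plus k-fold CV on the development set; returns model and scores."""
    X_dev, X_test, y_dev, y_test = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=seed
    )
    model = make_pipeline(
        StandardScaler(), LogisticRegression(class_weight="balanced", max_iter=1000)
    )
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    cv_scores = cross_validate(
        model, X_dev, y_dev, cv=cv, scoring=["balanced_accuracy", "roc_auc"]
    )
    # Mean and standard deviation across folds for each scoring metric
    cv_summary = {
        name: (float(np.mean(vals)), float(np.std(vals)))
        for name, vals in cv_scores.items()
        if name.startswith("test_")
    }
    model.fit(X_dev, y_dev)  # refit on all development data before the one-time test
    test_summary = {
        "balanced_accuracy": balanced_accuracy_score(y_test, model.predict(X_test)),
        "roc_auc": roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]),
    }
    return model, cv_summary, test_summary
```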

### Q: Plotting Predictions

Part of making a good model is visualizing the results. Update the image plotting function to inspect where the model is working and where it is not. 

### Your Answer:
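To see where the model works and where it fails, it helps to split the test images by outcome and title each panel with both the true label and the prediction. A minimal sketch, assuming label 1 ("bad") is the positive class and `y_prob` is the predicted probability of "bad"; both function names are hypothetical.

```python
import numpy as np


def prediction_groups(y_true, y_pred):
    """Indices of true/false positives/negatives, with 'bad' (1) as the positive class."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return {
        "true_positive": np.flatnonzero((y_true == 1) & (y_pred == 1)),
        "true_negative": np.flatnonzero((y_true == 0) & (y_pred == 0)),
        "false_positive": np.flatnonzero((y_true == 0) & (y_pred == 1)),  # good image flagged bad
        "false_negative": np.flatnonzero((y_true == 1) & (y_pred == 0)),  # bad image missed
    }


def prediction_titles(y_true, y_prob, threshold=0.5):
    """Per-image titles such as 'true=bad | pred=good (p_bad=0.31)'."""
    names = {0: "good", 1: "bad"}
    return [
        f"true={names[int(t)]} | pred={names[int(p >= threshold)]} (p_bad={p:.2f})"
        for t, p in zip(y_true, y_prob)
    ]
```

Plotting each group separately makes it easier to spot patterns, for example whether false negatives are concentrated in a particular defect type such as uneven illumination.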

### Q: Improvements 

Describe what steps, if any, should be taken to make the model perform better.

### Your Answer:
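One concrete improvement step is to treat the choice of model family and regularization strength as part of model selection, scored with stratified cross-validation rather than a single split. The sketch below searches over a logistic regression and a random forest inside one scikit-learn `Pipeline`; the grid values are illustrative assumptions, and `tune_model` is a hypothetical name. Other steps (not shown) include richer features such as brightness, contrast, and sharpness statistics, or histograms computed on spatial patches so that localized defects are not averaged away.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def tune_model(X, y, seed=0):
    """Grid search over model families and hyperparameters, scored by ROC AUC."""
    pipe = Pipeline([("scale", StandardScaler()), ("clf", LogisticRegression(max_iter=1000))])
    param_grid = [
        {
            "clf": [LogisticRegression(class_weight="balanced", max_iter=1000)],
            "clf__C": [0.01, 0.1, 1.0, 10.0],
        },
        {
            "clf": [RandomForestClassifier(class_weight="balanced", random_state=seed)],
            "clf__n_estimators": [100],
            "clf__max_depth": [None, 5],
        },
    ]
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    search = GridSearchCV(pipe, param_grid, scoring="roc_auc", cv=cv)
    return search.fit(X, y)
```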

# Step 2: CNN Model

In this section, we will train a CNN model to classify the images into good and bad classes using the larger data subset. This subset has 1000 images from the good (label 0) and bad (label 1) classes. Feel free to use any model, including pre-trained models.

### Q: Loss function

Your model predicts the quality of an image. At this step we have two classes: good and bad. Tell us which loss function you will use and why.

### Your Answer:
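Binary cross-entropy on a single output logit is the standard choice for a two-class problem: it is the negative log-likelihood of a Bernoulli model, so the sigmoid of the logit can be read as the probability that an image is bad. A weight on the positive class counteracts class imbalance. The NumPy sketch below mirrors what PyTorch's `torch.nn.BCEWithLogitsLoss(pos_weight=...)` computes, working from logits for numerical stability; `weighted_bce_with_logits` is a hypothetical name.

```python
import numpy as np


def weighted_bce_with_logits(logits, targets, pos_weight=1.0):
    """Mean binary cross-entropy from raw logits z and targets y in {0, 1}.

    loss_i = -[pos_weight * y_i * log(sigmoid(z_i)) + (1 - y_i) * log(1 - sigmoid(z_i))]
    """
    z = np.asarray(logits, dtype=float)
    y = np.asarray(targets, dtype=float)
    log_p = -np.logaddexp(0.0, -z)  # log(sigmoid(z)) without overflow
    log_not_p = -np.logaddexp(0.0, z)  # log(1 - sigmoid(z)) without overflow
    return float(np.mean(-(pos_weight * y * log_p + (1.0 - y) * log_not_p)))
```

A common choice for `pos_weight` is the ratio of good to bad images in the training split, so that both classes contribute roughly equally to the loss.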


### Q: Train the model

### Your Answer:

### Q: Training Metrics

Plot the loss and any other metrics of interest on the training and validation data. What observations or comments can you make regarding the training job?

### Your Answer:
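A sketch for plotting per-epoch training curves, assuming they were recorded in a dictionary such as `{"train_loss": [...], "val_loss": [...], "val_auc": [...]}` (the key names are hypothetical). Losses go on the left panel and other metrics on the right. When the validation loss bottoms out while the training loss keeps falling, the widening gap suggests overfitting, and the epoch of minimum validation loss is a natural early-stopping point.

```python
import matplotlib.pyplot as plt
import numpy as np


def plot_history(history):
    """Plot per-epoch curves: keys ending in 'loss' on the left, other metrics on the right."""
    fig, (ax_loss, ax_metric) = plt.subplots(1, 2, figsize=(10, 4))
    for name, values in history.items():
        ax = ax_loss if name.endswith("loss") else ax_metric
        ax.plot(np.arange(1, len(values) + 1), values, marker="o", label=name)
    for ax, ylabel in [(ax_loss, "loss"), (ax_metric, "metric")]:
        ax.set_xlabel("epoch")
        ax.set_ylabel(ylabel)
        if ax.get_legend_handles_labels()[0]:  # only add a legend if something was plotted
            ax.legend()
    fig.tight_layout()
    return fig


def best_epoch(history, key="val_loss"):
    """1-based epoch with the lowest value of history[key]."""
    return int(np.argmin(history[key])) + 1
```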

### Q: Plotting Predictions

Plot some of the model's predictions alongside their true labels. Share your insights.

### Your Answer:
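Reviewing the model's most confident mistakes first is an efficient way to find systematic errors and possible label noise; only the simple model subset is described as confirmed by medically trained professionals. A minimal sketch, assuming `y_prob` is the predicted probability of "bad"; `most_confident_errors` is a hypothetical name.

```python
import numpy as np


def most_confident_errors(y_true, y_prob, k=8, threshold=0.5):
    """Indices of misclassified images, most confident first.

    Confidence is the distance of the predicted probability of 'bad' from the threshold.
    """
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob, dtype=float)
    y_pred = (y_prob >= threshold).astype(int)
    wrong = np.flatnonzero(y_pred != y_true)
    order = np.argsort(-np.abs(y_prob[wrong] - threshold), kind="stable")
    return wrong[order][:k]
```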

### Q: Improvements

Describe what steps, if any, should be taken to make the model perform better. If you have time, try to fix any issues that you've identified and train a new model. Feel free to write your own functions in the notebook to inspect or correct the training data, if necessary.
- Is your model making any obvious errors?
- Compare the results of the CNN model with those of the simple model you trained earlier. Comment on the outcome of this comparison.

### Your Answer:
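For a fair comparison, both models should be scored on the same images. The held-out test split of the simple model subset from Step 1 is a natural choice: the CNN subset contains none of those images, and their labels were confirmed by medically trained professionals. Because that test set is small, a paired test helps judge whether a difference in accuracy is more than noise. The sketch below implements the exact (binomial) form of McNemar's test; `mcnemar_exact` is a hypothetical name.

```python
from math import comb

import numpy as np


def mcnemar_exact(y_true, pred_a, pred_b):
    """Exact McNemar test for two classifiers evaluated on the same images.

    Returns (b, c, p_value), where b counts images A got right and B got wrong,
    and c counts images A got wrong and B got right.
    """
    y_true, pred_a, pred_b = map(np.asarray, (y_true, pred_a, pred_b))
    a_right = pred_a == y_true
    b_right = pred_b == y_true
    b = int(np.sum(a_right & ~b_right))
    c = int(np.sum(~a_right & b_right))
    n = b + c
    if n == 0:
        return b, c, 1.0  # the models never disagree on correctness
    # Two-sided p-value under Binomial(n, 0.5) for the discordant pairs
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2**n
    return b, c, min(1.0, 2 * tail)
```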

# Step 3: Multi-level Classification

In this stage, consider building a model to predict the image quality with multiple class labels: good, accept and bad. You are not expected to provide code or achieve perfection! Rather, we would like to at least see in writing what next steps you think are useful and why. Specifically, try to address:
- What methods you would implement from a model and/or data preprocessing standpoint
- What challenges you foresee in implementing your selected model and/or data preprocessing approach
- How you would evaluate a model that predicts these 3-class labels against the previous models
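A sketch of how 3-class predictions could be evaluated, assuming the labels are encoded ordinally as 0 = good, 1 = accept, 2 = bad (an assumption, not a given encoding). Macro-averaged F1 weights each class equally, and quadratically weighted Cohen's kappa penalizes a good/bad confusion more than a confusion between adjacent levels. To compare against the binary models, the 3-class labels can be collapsed to the binary scheme; whether "accept" maps to good or bad is a clinical decision, so it is left as a parameter.

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score, confusion_matrix, f1_score


def multiclass_report(y_true, y_pred):
    """Metrics for 3-level quality labels encoded as 0=good, 1=accept, 2=bad."""
    return {
        "macro_f1": f1_score(y_true, y_pred, average="macro", labels=[0, 1, 2]),
        "quadratic_kappa": cohen_kappa_score(y_true, y_pred, weights="quadratic"),
        "confusion_matrix": confusion_matrix(y_true, y_pred, labels=[0, 1, 2]),
    }


def collapse_to_binary(y, accept_as=0):
    """Map 3-level labels to the binary scheme (0=good, 1=bad); 'accept' becomes accept_as."""
    y = np.asarray(y)
    return np.where(y == 1, accept_as, np.where(y == 2, 1, 0))
```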