# Objectives:
* Understand standard metrics to use in classification
* Define Precision, Recall, and F1 measure and the relationship between them
* Define terms like Training Set and Holdout Set

In [1]:
from IPython.display import HTML

# Before we continue in this notebook, let's all pause and complete this short quiz so that we can review the responses together once everyone has finished:
https://docs.google.com/forms/d/1rD7J9d-d-ELzczakx_6YmHY9BkNnPN63C9vyw_d98Uk/viewform?edit_requested=true

## NOTE : The quiz above is based upon watching the video below which you've hopefully already seen.  If you missed it, I recommend watching it later:

In [2]:
HTML('<div style="position:relative;height:0;padding-bottom:56.25%"><iframe width="500" height="280" src="https://www.youtube.com/embed/VPZiJGNX4_s" '\
     'frameborder="0" allowfullscreen></iframe></div>')

## What we need to do next is define True Positives, False Positives, False Negatives, etc.  This is essential for defining the metrics we will use.  This example relates to disease, but these concepts are used for all classification analytics including:
* Disease prediction
* Spam filtering
* Fraud detection
* Image recognition
* etc...
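
As a minimal sketch of how the four cells of the 2x2 grid are counted, the snippet below compares gold (expert) labels against system predictions. The function name `confusion_counts` and the six labels are made up for illustration; they are not course data.

```python
def confusion_counts(gold, predicted):
    """Count TP, FP, FN, TN for binary labels (True = positive class)."""
    tp = fp = fn = tn = 0
    for g, p in zip(gold, predicted):
        if p and g:
            tp += 1   # predicted positive, truly positive
        elif p and not g:
            fp += 1   # predicted positive, truly negative
        elif not p and g:
            fn += 1   # predicted negative, truly positive
        else:
            tn += 1   # predicted negative, truly negative
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}


# Made-up labels for six hypothetical reports (True = disease present)
gold = [True, True, False, False, True, False]
predicted = [True, False, True, False, True, False]

counts = confusion_counts(gold, predicted)
print(counts)  # {'TP': 2, 'FP': 1, 'FN': 1, 'TN': 2}
```

The same counting applies unchanged to spam filtering or fraud detection; only the meaning of "positive" changes.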

### (OPTIONAL) If you want a better explanation in medical informatics terms, I recommend this video from USMLE Biostatistics, which illustrates these concepts in terms of disease:

In [5]:
HTML('<div style="position:relative;height:0;padding-bottom:56.25%"><iframe src="https://www.youtube.com/embed/VAogHvCqf3E?ecver=2"'\
     'width="500" height="280" frameborder="0" style="position:absolute;width:100%;height:100%;left:0" allowfullscreen></iframe></div>')

## Now that we have a solid definition of True Positives, False Positives, and this 2x2 grid we can establish the metrics of Precision, Recall and F1 measure.

### (OPTIONAL) If you would like a different view on the subject from NLP, I recommend this video below.  Christopher Manning is an outstanding researcher and professor in Natural Language Processing.  In this video, he explains how we translate our counts of TP, FP, TN, FN into these values we can use to optimize our solutions:

In [9]:
# NOTE : Try to show this video : https://www.youtube.com/watch?v=VAogHvCqf3E
HTML('<div style="position:relative;height:0;padding-bottom:56.25%"><iframe src="https://d19vezwu8eufl6.cloudfront.net/nlp/recoded_videos%2Fnlp-142.mp4 " '\
     'width="500" height="280" frameborder="0" style="position:absolute;width:100%;height:100%;left:0" allowfullscreen></iframe></div>')

# Objectives of our Group Project:
## In this course on rule-based NLP methods for identifying cases of pneumonia, we will be working in groups to develop rules to improve classification of pneumonia based on radiology text reports from chest x-rays.

## We will measure and optimize performance by F1 measure.  This means that Precision and Recall will both be important, as F1 is the harmonic mean of the two.

In [12]:
HTML('<div><p>Source : Wikipedia</p><p><img src="https://wikimedia.org/api/rest_v1/media/math/render/svg/7d63c1f5c659f95b5dfe5893213cc8ea7f8bea0a"></p></div>')
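
The harmonic mean relationship can be sketched in a few lines of Python. The counts below are invented for illustration, and returning 0.0 when a denominator is zero is one common convention assumed here, not something this course prescribes.

```python
def precision(tp, fp):
    """Fraction of predicted positives that are truly positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Fraction of true positives that the system found."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1_measure(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Invented counts: 8 true positives, 2 false positives, 8 false negatives
tp, fp, fn = 8, 2, 8
p = precision(tp, fp)   # 8 / 10 = 0.8
r = recall(tp, fn)      # 8 / 16 = 0.5
f1 = f1_measure(p, r)   # 0.8 / 1.3, about 0.615

print("Precision: {:.3f}  Recall: {:.3f}  F1: {:.3f}".format(p, r, f1))
```

Note that F1 (about 0.615) is lower than the simple average of the two values (0.65). The harmonic mean is pulled toward the smaller number, so a system cannot score well on F1 by excelling at only one of Precision or Recall.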

## One of the most significant challenges in developing rule-based NLP systems is ensuring that our solution will generalize.  
### By this, we mean that we want our system to perform well not only on our initial dataset but also on any data it may encounter in the future.  We do this by breaking the data up into sets.

# Generally speaking in NLP (and also in Machine Learning) there are typically 2 main datasets that labeled data may be broken into : 

1. **Training Set** : Labeled data used to **"train"** or **develop rules**.  The goal is that this process will extend to good performance on other sets.  Evaluation metrics are used on this dataset, but they must be considered carefully since this is only a subset of all possible instances or documents
2. **Testing Set (Holdout Set)** : This is data that is typically **not observed or used during training time**.  Hence the name **"hold out"**: this data is held to the side so that it cannot be used to improve rules.  This set is typically not used until all training/development is complete.  Evaluation metrics on this set usually indicate how well a particular system may be able to generalize
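
A minimal sketch of such a split is shown below. The helper name `split_train_holdout`, the 30% holdout fraction, the random seed, and the document identifiers are all assumptions made for illustration; they do not describe how the course data was actually divided.

```python
import random


def split_train_holdout(documents, holdout_fraction=0.3, seed=42):
    """Shuffle a copy of the documents and split it into (train, holdout)."""
    shuffled = list(documents)
    # A fixed seed keeps the split reproducible between runs
    random.Random(seed).shuffle(shuffled)
    n_holdout = int(round(len(shuffled) * holdout_fraction))
    holdout = shuffled[:n_holdout]
    train = shuffled[n_holdout:]
    return train, holdout


# Hypothetical document identifiers
report_ids = ["report_{:03d}".format(i) for i in range(100)]
train, holdout = split_train_holdout(report_ids)

print(len(train), len(holdout))  # 70 30
```

The important property is that no document appears in both sets, so the holdout documents are never seen while rules are being developed.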

## Note that you may encounter the term Validation Set in analytics tasks as well.  This is common in Machine Learning tasks.  
## For this course, our Training Set will also serve as our Validation Set.  In other words, we will use that "slice" of the entire set for both development and evaluation.

## In this course, we have a total set of 100 chest x-ray reports which an expert pulmonologist has annotated for us to use.   Here is a picture of Dr. Barbara Jones.  Please thank her if you meet her:

In [13]:
HTML('<div><p><a href="https://healthcare.utah.edu/fad/mddetail.php?physicianID=u0102859">Dr. Barbara Jones</a></p>'\
     '<p><img src="https://securembm.uuhsc.utah.edu/zeus/public/mbm-media/faculty-profile?facultyPK=u0102859"></p></div>')

# Plan for Group Projects

## In this course, we will work with a Training Set for the first 2 days.  This data can be used for development of rules.  On the third and final day of the course, the Test (Holdout) set will be made available and we will determine how each team performed on this task.

## (OPTIONAL) Before we conclude, it's useful to illustrate examples in the literature where different terms may be used to define the same mathematical concept.

### This chart from Wikipedia outlines how these metrics all relate.  While we will be focusing on Precision, Recall and F1, it's useful to see how these interact, since clinicians often use other terms like Sensitivity and Specificity.  Some of these terms actually mean the same thing as Precision and Recall.  Specifically, Sensitivity is the same as Recall, and Precision is often called Positive Predictive Value (PPV):

In [14]:
HTML('<div><p>Source : Wikipedia</p><p><img src="images/classification_metrics.jpg"></p></div>')
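
To make the shared definitions concrete, here is a small sketch that computes the clinical terms from the same four counts. The function name `clinical_metrics` and the counts are invented for illustration, and the code assumes no denominator is zero.

```python
def clinical_metrics(tp, fp, fn, tn):
    """Compute common clinical metrics from the 2x2 counts."""
    return {
        "sensitivity": tp / (tp + fn),  # same formula as Recall
        "specificity": tn / (tn + fp),  # true negative rate
        "ppv": tp / (tp + fp),          # same formula as Precision
        "npv": tn / (tn + fn),          # negative predictive value
    }


# Invented counts for 100 hypothetical reports
tp, fp, fn, tn = 8, 2, 8, 82
metrics = clinical_metrics(tp, fp, fn, tn)

recall_value = tp / (tp + fn)
precision_value = tp / (tp + fp)

print(metrics["sensitivity"] == recall_value)   # True
print(metrics["ppv"] == precision_value)        # True
```

Specificity and Negative Predictive Value are the only two that depend on True Negatives, which is why they have no counterpart among Precision, Recall and F1.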

<br/><br/>This material was presented as part of the DeCART Data Science for the Health Science Summer Program at the University of Utah in 2019.<br/>
Presenters : Dr. Wendy Chapman, Kelly Peterson, Alec Chapman, Jianlin Shi <br> Acknowledgement: Many thanks to Olga Patterson, from whose previous work part of this material is adapted.