# Now it's time to work on the final group project.  All of the pieces we've learned are integrated into this starter template, and the goal is to improve the F1 measure for classifying pneumonia evidence

# You are welcome to spend your time however you'd like but here are a few ideas of how to improve your system:
* Improve targets.  Are there any False Negatives your system is missing?  Are there regular expressions that would help?
* Improve modifiers.  Not all modifiers typically used in practice are in the modifiers starter file.  Are there some to add?
* Improve document classification rules.  What rules work best?  What is the best "default" classification?
* Consider sectioning the document.  Are there certain headers or subsections that are more or less likely to contain evidence?  You could modify your own "markup" function to do this, or in some cases you could add modifiers to do this
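
As one way to start on the sectioning idea, here is a minimal sketch that splits a report into (header, body) pairs with a regular expression. The function name `split_sections`, the header pattern, and the sample report are all hypothetical; your reports may use different header conventions, so treat the pattern as a starting point rather than a finished rule.

```python
import re

# Assumption: section headers are upper-case words at the start of a line,
# followed by a colon (for example "IMPRESSION:"). Adjust for your data.
SECTION_HEADER = re.compile(r"^(?P<header>[A-Z][A-Z /]+):", re.MULTILINE)


def split_sections(report_text):
    """Return a list of (header, body) pairs.

    Text that appears before the first header is returned with header None.
    """
    matches = list(SECTION_HEADER.finditer(report_text))
    if not matches:
        return [(None, report_text.strip())]

    sections = []
    preamble = report_text[:matches[0].start()].strip()
    if preamble:
        sections.append((None, preamble))

    for idx, match in enumerate(matches):
        # A section body runs until the next header or the end of the report
        if idx + 1 < len(matches):
            end = matches[idx + 1].start()
        else:
            end = len(report_text)
        body = report_text[match.end():end].strip()
        sections.append((match.group("header").strip(), body))
    return sections


# Hypothetical report text, for illustration only
example_report = (
    "INDICATION: cough and fever\n"
    "FINDINGS: Right lower lobe opacity.\n"
    "IMPRESSION: Findings consistent with pneumonia."
)
sections = split_sections(example_report)
for header, body in sections:
    print(header, "->", body)
```

With sections separated like this, you could, for example, run your markup only on the sections you trust, or weight evidence by section when classifying.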

# Also before we get going, a few Pro Tips:
* Remember that pyConText files need to be tab delimited.  If you edit these files in JupyterHub, it might be difficult to see the tabs, and pressing "TAB" will actually insert spaces, so try to use Copy-and-Paste to reuse an existing tab
* Classification rules and modifiers are difficult.  Don't be afraid to ask for help!
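
Since tabs are hard to see, a quick check can catch lines where spaces slipped in. The sketch below is a hypothetical helper (`find_untabbed_lines` is not part of pyConText) that flags any non-empty line containing no tab character. The sample lines are made up for illustration; in practice you would pass in the lines read from your own targets or modifiers file.

```python
def find_untabbed_lines(lines):
    """Return (line_number, line) pairs for non-empty lines with no tab."""
    problems = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if not text.strip():
            continue  # skip blank lines
        if "\t" not in text:
            problems.append((number, text))
    return problems


# Illustrative lines only; the third one uses spaces instead of tabs
sample_lines = [
    "Lex\tType\tRegex\tDirection\n",
    "no evidence of\tDEFINITE_NEGATED_EXISTENCE\t\tforward\n",
    "pneumonia    EVIDENCE_OF_PNEUMONIA        \n",
]

for number, text in find_untabbed_lines(sample_lines):
    print("Line {0} has no tabs: {1!r}".format(number, text))
```

To check a real file, something like `find_untabbed_lines(open(path).readlines())` would work, where `path` points to the file you edited.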

## Development of your system:
* We have found the tools below for highlighting and graphing False Positives and False Negatives to be very useful.  We've provided them in case they help you as well

In [None]:
# NOTE : You may need to modify this color mapping if you add  Target or Modifier categories not found here
# prepare some colors for displaying any markup we might see
colors = {
    "evidence_of_pneumonia": "orange",
    "definite_negated_existence": "red",
    "probable_negated_existence": "indianred",
    "ambivalent_existence": "orange",
    "probable_existence": "forestgreen",
    "definite_existence": "green",
    "historical": "goldenrod",
    "indication": "pink",
    "acute": "gold"
}

In [None]:
# This function lets us iterate through all documents and view the pyConText graph and highlighted markup
def view_pycontext_graph(class_results, colors):
    @interact(i=ipywidgets.IntSlider(min=0, max=len(class_results)-1))
    def _view_markup(i):
        class_result = class_results[i]
        rview.markup_to_pydot(class_result)
        display(Image("tmp.png"))
        
        report_html = mark_document_with_html(class_result.context_document, colors=colors, default_color="black")
        
        display(HTML(report_html))
        
# This function lets us iterate through all annotated documents and view the expert annotations
def view_annotation_markup(anno_docs, colors):
    @interact(i=ipywidgets.IntSlider(min=0, max=len(anno_docs)-1))
    def _view_markup(i):
        report_html = pneumonia_annotation_html_markup(anno_docs[i])
        
        display(HTML(report_html))

In [None]:
def list_false_negatives(gold_docs, prediction_function):
    """Return the documents labeled positive (1) that the classifier predicted negative (0)."""
    fn_docs = {}
    for doc_name, gold_doc in gold_docs.items():
        gold_label = gold_doc.positive_label
        pred_label = prediction_function(gold_doc.text)
        if gold_label == 1 and pred_label == 0:
            fn_docs[doc_name] = gold_doc
    return fn_docs


def list_false_positives(gold_docs, prediction_function):
    """Return the documents labeled negative (0) that the classifier predicted positive (1)."""
    fp_docs = {}
    for doc_name, gold_doc in gold_docs.items():
        gold_label = gold_doc.positive_label
        pred_label = prediction_function(gold_doc.text)
        if gold_label == 0 and pred_label == 1:
            fp_docs[doc_name] = gold_doc
    return fp_docs

In [None]:
%%time

# get our current set of false negatives and false positives if we use our simple toy classifier
# which uses targets and a simplified implementation of modifiers
current_false_negatives = list_false_negatives(annotated_doc_map, target_modifier_toy_classifier.predict)
current_false_positives = list_false_positives(annotated_doc_map, target_modifier_toy_classifier.predict)

# prepare each of these for visualization
fn_report_results = []
fp_report_results = []
print('Marking up False Negatives')
for anno_doc in current_false_negatives.values():
    report_context = markup_context_document(anno_doc.text, modifiers3, targets3)
    # package this up into a class that the RadNLP utilities can use
    results = classrslts(context_document=report_context, exam_type="Chest X-Ray", report_text=anno_doc.text, classification_result='N/A')
    fn_report_results.append(results)
    
print('Marking up False Positives')
for anno_doc in current_false_positives.values():
    report_context = markup_context_document(anno_doc.text, modifiers3, targets3)
    # package this up into a class that the RadNLP utilities can use
    results = classrslts(context_document=report_context, exam_type="Chest X-Ray", report_text=anno_doc.text, classification_result='N/A')
    fp_report_results.append(results)

print('Current total False Negatives : {0}'.format(len(current_false_negatives)))
print('Current total False Positives : {0}'.format(len(current_false_positives)))
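
The False Negative and False Positive counts above feed directly into the F1 measure. As a reminder of how they combine, here is a minimal, self-contained sketch that computes precision, recall, and F1 from parallel lists of gold and predicted labels. The function name `f1_from_labels` and the toy labels are hypothetical, and this is not necessarily how the course's own evaluation code computes the score.

```python
def f1_from_labels(gold_labels, predicted_labels):
    """Return (precision, recall, f1) for binary labels where 1 is positive."""
    pairs = list(zip(gold_labels, predicted_labels))
    tp = sum(1 for gold, pred in pairs if gold == 1 and pred == 1)
    fp = sum(1 for gold, pred in pairs if gold == 0 and pred == 1)
    fn = sum(1 for gold, pred in pairs if gold == 1 and pred == 0)

    # Guard against division by zero when there are no positives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy labels for illustration: 2 true positives, 1 false positive, 1 false negative
toy_gold = [1, 1, 0, 0, 1]
toy_pred = [1, 0, 1, 0, 1]
precision, recall, f1 = f1_from_labels(toy_gold, toy_pred)
print("Precision: {0:.3f}  Recall: {1:.3f}  F1: {2:.3f}".format(precision, recall, f1))
```

Reducing False Negatives raises recall and reducing False Positives raises precision; F1 only improves substantially when both are kept in balance.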

## For False Negatives, it's most useful to see the expert span annotations for positive pneumonia evidence, since they may reveal targets that should be added

In [None]:
view_annotation_markup(list(current_false_negatives.values()), colors)

## For False Positives, it's most useful to see a pyConText graph, since modifiers may need to be adjusted so that targets are properly used in classification

In [None]:
view_pycontext_graph(fp_report_results, colors)