## Narrative Embeddings for clinical report classification

Ok, so we have a database of clinical incident reports, written as chronological stories narrated by one of the healthcare professionals involved.

Many research papers analyze narratives as chronologically ordered series of events involving various actors. Those tools could greatly help to classify our clinical reports, since the available word and document vector embeddings lack any narrative representation. I am trying to create a 'narrative embedding' for my reports, which I will use to train and test a multi-label classification model.

This is a three-step project:

1) Extract event chains from the dataset, following the 2009 paper by Chambers and Jurafsky: https://www.usna.edu/Users/cs/nchamber/pubs/acl09-narrative-schema.pdf

2) Build a Narrative Event Evolutionary Graph (NEEG), following the paper arXiv:1805.05081v2.

3) Implement a Scaled Graph Neural Network (SGNN), i.e. a neural network that takes the NEEG as input.
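To make step 2 more concrete, here is a minimal sketch of the graph-construction part of a NEEG. It assumes, as we read arXiv:1805.05081v2, that the edge from event `v_i` to event `v_j` is weighted by the bigram conditional probability `w(v_j | v_i) = count(v_i, v_j) / sum_k count(v_i, v_k)`, counted over consecutive events in the chains. The `build_neeg` helper and the "verb-dependency" node labels are hypothetical illustrations; the paper's own event representation may differ.

```python
from collections import Counter, defaultdict


def build_neeg(chains):
    """Build a weighted, directed event graph from temporally ordered chains.

    Assumption: edge v_i -> v_j is weighted by
    count(v_i, v_j) / sum_k count(v_i, v_k).
    """
    # count each consecutive (earlier event, later event) pair
    pair_counts = Counter()
    for chain in chains:
        for a, b in zip(chain, chain[1:]):
            pair_counts[(a, b)] += 1

    # total number of outgoing transitions per source event
    out_totals = Counter()
    for (a, _), c in pair_counts.items():
        out_totals[a] += c

    # normalize so the outgoing weights of each node sum to 1
    graph = defaultdict(dict)
    for (a, b), c in pair_counts.items():
        graph[a][b] = c / out_totals[a]
    return dict(graph)


# Hypothetical chains, loosely inspired by an ASRS narrative
chains = [
    ["cleared-obj", "armed-subj", "looked-subj", "saw-subj"],
    ["cleared-obj", "armed-subj", "saw-subj"],
]
neeg = build_neeg(chains)
print(neeg["armed-subj"])  # {'looked-subj': 0.5, 'saw-subj': 0.5}
```

Events that never precede another one (here `saw-subj`) have no outgoing edges, so they do not appear as keys of the returned dictionary.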

I use spaCy and the HuggingFace coreference resolver, neuralcoref, to extract event chains from the database: https://github.com/huggingface/neuralcoref

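```python
import pandas as pd
import numpy as np
from tqdm import tqdm_notebook

import spacy
import neuralcoref

# neuralcoref only works with spaCy 2.x; the explicit model name is more
# robust than the 'en' shortcut link
nlp = spacy.load('en_core_web_sm')
neuralcoref.add_to_pipe(nlp)
```

```
<spacy.lang.en.English at 0x1fc89d4e9c8>
```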

```python
# first try on the ASRS database
df = pd.read_excel('datasets/ASRStotal.xlsx')
# drop columns that are entirely empty (dropna is not in-place by default)
df = df.dropna(how='all', axis=1)

doc = nlp(df.Narrative[30001])
print(doc._.coref_clusters)
print(df.Narrative[30001])
```

```
[It: [It, It], us: [us, we, our, us, our, we, We], the long day: [the long day, the day, the day], the approach: [the approach, the approach], the runway: [the runway, the runway, the parallel runway], the ILS: [the ILS, It, it], the Controller: [the Controller, the Controller]]
It was a late flight and both of us were getting tired from the long day. It was leg three plus I had commuted by air to start the day. The total flight was 3 hours. Everything went great until we were cleared for the approach. It was hazy with a visibility of 5 miles. I was looking for the runway but because our windows were fogged up more than usual I could not see very well. As I armed the approach I did not realize I had the VOR, not the ILS tuned as the active frequency. It was in the backup. As I looked at the FMA, I noticed it had VAPP instead of LOC. I finally saw the runway while still on the initial heading the Controller had given to us and by that time--as I tried to turn--the Controller cancelled o
```

```python
# Re-parse the text after coreference resolution, so that every mention
# of an entity is replaced by the main mention of its cluster
doc2 = nlp(doc._.coref_resolved)

events = []

for token in doc2:
    # only verbs are treated as event predicates
    if token.pos_ != "VERB":
        continue
    event = {'predicat': token, 'subj': '', 'obj': '', 'iobj': ''}
    for t in token.children:
        # t.subtree is a one-shot generator: store the whole span instead,
        # so the argument can still be read (and printed) later
        span = doc2[t.left_edge.i : t.right_edge.i + 1]
        if t.dep_ in ("nsubj", "nsubjpass"):
            event['subj'] = span
        elif t.dep_ in ("dobj", "obj", "pobj"):
            event['obj'] = span
        elif t.dep_ == "prep":
            event['iobj'] = span
    events.append(event)

print(len(events))
```

```
62
```
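The `events` above are still flat, per-verb records. A next step toward Chambers-style event chains could be to group events by their protagonist (the coreferring argument they share) and to score event pairs with pointwise mutual information. The sketch below is a simplified version of that idea. It assumes each event has already been reduced to a hypothetical `(verb_lemma, dependency, protagonist)` triple (that conversion is not shown), and `narrative_chains` and `pmi_scores` are illustrative helpers, not part of spaCy or neuralcoref.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def narrative_chains(triples):
    """Group (verb, dependency, protagonist) triples into one chain per
    protagonist, keeping text order, with events written as 'verb-dep'."""
    chains = defaultdict(list)
    for verb, dep, protagonist in triples:
        chains[protagonist].append(f"{verb}-{dep}")
    return dict(chains)


def pmi_scores(all_chains):
    """Simplified PMI between events sharing a protagonist:
    log P(a, b) / (P(a) * P(b)), estimated from counts over all chains."""
    event_counts = Counter()
    pair_counts = Counter()
    for chain in all_chains:
        event_counts.update(chain)
        # unordered pairs of distinct events in the same chain,
        # sorted so that each pair has a single canonical key
        for a, b in combinations(sorted(set(chain)), 2):
            pair_counts[(a, b)] += 1
    n_events = sum(event_counts.values())
    n_pairs = sum(pair_counts.values())
    return {
        (a, b): math.log(
            (c / n_pairs)
            / ((event_counts[a] / n_events) * (event_counts[b] / n_events))
        )
        for (a, b), c in pair_counts.items()
    }


# Hypothetical triples, loosely inspired by the ASRS narrative above
triples = [
    ("tire", "subj", "us"),
    ("clear", "obj", "us"),
    ("arm", "subj", "I"),
    ("look", "subj", "I"),
    ("cancel", "subj", "the Controller"),
]
print(narrative_chains(triples))
# {'us': ['tire-subj', 'clear-obj'], 'I': ['arm-subj', 'look-subj'], 'the Controller': ['cancel-subj']}

# Hypothetical chains standing in for a whole corpus
corpus_chains = [
    ["clear-obj", "arm-subj"],
    ["clear-obj", "arm-subj"],
    ["clear-obj", "see-subj"],
    ["cancel-subj", "see-subj"],
]
scores = pmi_scores(corpus_chains)
print(round(scores[("arm-subj", "clear-obj")], 3))  # 1.674
```

In practice, `pmi_scores` would be fed the chains of the whole ASRS corpus rather than of a single report, since counts from one narrative are far too sparse for meaningful estimates.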
