# Example for using TBI Extractor

Depending on your installation, NumPy may emit "RuntimeWarning" messages. They appear whenever you import a package that was compiled against an older NumPy than the one installed. They are harmless here, and the following code suppresses them.

In [1]:
# ignore warnings
import warnings
warnings.filterwarnings("ignore", message="numpy.dtype size changed")
warnings.filterwarnings("ignore", message="numpy.ufunc size changed")
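
The filters above stay active for the rest of the session. If you would rather limit the suppression to the imports that trigger the messages, a minimal sketch using the standard library's `warnings.catch_warnings()` context manager could look like this. The helper name `import_quietly` is hypothetical and not part of TBI Extractor, and `json` only stands in for whichever package produces the warning on your machine.

```python
import importlib
import warnings


def import_quietly(module_name):
    """Import a module while hiding the two NumPy size-change warnings.

    Hypothetical helper for illustration; the filters are restored
    as soon as the `with` block exits.
    """
    with warnings.catch_warnings():
        warnings.filterwarnings("ignore", message="numpy.dtype size changed")
        warnings.filterwarnings("ignore", message="numpy.ufunc size changed")
        return importlib.import_module(module_name)


# "json" is a stand-in; replace it with the package that emits the warning.
json_module = import_quietly("json")
```

Because the context manager restores the previous filter list, other warnings raised later in the notebook remain visible.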

In [2]:
# imports
# Python     3.6.6
# pandas     0.23.4
import pandas as pd
import nlp_algorithm
import nlp_summarize
import datetime

The directory structure under the root directory should be as follows:

> data

    example_df.csv
    lexical_modifiers.tsv
    lexical_targets.tsv

> scripts

    nlp_algorithm.py
    nlp_run_algorithm.ipynb
    nlp_summarize.py
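
Before running the remaining cells, it can help to confirm that this layout is in place. The sketch below is one way to do that with `pathlib`; the function `missing_files` and the constant `EXPECTED_FILES` are hypothetical names introduced here, and the file list is copied from the structure above.

```python
from pathlib import Path

# Relative paths taken from the directory structure described above.
EXPECTED_FILES = [
    "data/example_df.csv",
    "data/lexical_modifiers.tsv",
    "data/lexical_targets.tsv",
    "scripts/nlp_algorithm.py",
    "scripts/nlp_run_algorithm.ipynb",
    "scripts/nlp_summarize.py",
]


def missing_files(root_path):
    """Return the expected files that do not exist under root_path."""
    root = Path(root_path)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]
```

Calling `missing_files(root_path)` with the root directory entered in the next cell should return an empty list when everything is where the notebook expects it.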

In [3]:
# inputs and outputs
root_path = input('Enter root directory path: ') # example: '/data1/nlp/tbiExtractor'
data_path = root_path + '/data'

Enter root directory path: /data1/nlp/tbiExtractor


In [4]:
# load example df
infile = data_path + '/example_df.csv'
df_to_algorithm = pd.read_csv(infile, dtype=str)    

In [5]:
# show structure of df_to_algorithm
df_to_algorithm.head()

,PatientNum,CT_report,CT_report_id
0,1001,Findings: There is hyperattenuation predominan...,321306
1,1002,Findings: There is no definite evidence of int...,502453


In [6]:
# submit data to main nlp algorithm and reset index
filepart = 'example'
df_from_algorithm = nlp_algorithm.main_nlp(df_to_algorithm, filepart, data_path)
df_from_algorithm = df_from_algorithm.reset_index(drop=True)

In [7]:
# show structure of df_from_algorithm
df_from_algorithm.head()

,CT_report_id,target,target_group,modifier,modifier_type
0,321306,shear,diffuse_axonal,severe,present
1,321306,"parenchymal hemorrhages,",intraparenchymal_hemorrage,multifocal,present
2,321306,suprasellar cistern,cistern,leftward,present
3,321306,hemorrhage is noted in the occipital horn,intraventricular_hemorrhage,additional,present
4,321306,subarachnoid hemorrhage,subarachnoid_hemorrhage,multifocal,present


In [8]:
# load target list
targets = pd.read_csv(data_path + '/lexical_targets.tsv', delimiter='\t')
target_list = list(targets['Type'].drop_duplicates(keep='first').str.lower())
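
To make the effect of this cell concrete, here is a small sketch on made-up data. Only the `Type` column name comes from the notebook; the `Lex` column and the three rows are assumptions for illustration and do not reflect the real contents of lexical_targets.tsv. Note that this sketch lowercases before dropping duplicates, which differs from the cell above: it also collapses values that differ only in case.

```python
import io

import pandas as pd

# Toy stand-in for lexical_targets.tsv; "Lex" and all rows are invented.
toy_tsv = (
    "Lex\tType\n"
    "sdh\tsubdural_hemorrhage\n"
    "subdural hematoma\tsubdural_hemorrhage\n"
    "shear\tDiffuse_Axonal\n"
)

toy_targets = pd.read_csv(io.StringIO(toy_tsv), delimiter="\t")

# Lowercase first, then keep the first occurrence of each target group.
toy_target_list = list(
    toy_targets["Type"].str.lower().drop_duplicates(keep="first")
)
print(toy_target_list)
```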

In [9]:
# get unique ct report ids
ct_report_id_unique = list(df_from_algorithm['CT_report_id'].drop_duplicates(keep='first').astype(str).str.lower())

In [10]:
# for each report, summarize
for ct_report in ct_report_id_unique:
    df_from_algorithm = nlp_summarize.ct_summary_report(df_from_algorithm, ct_report, target_list)

In [11]:
# setup output file
get_today = datetime.date.today()
outfile = data_path + '/nlp_algorithm_output_summarized_' + filepart + '_' + str(get_today) + '.csv'
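
The same dated file name is assembled again later for the pivoted report. As a sketch, the pattern could be wrapped in a small helper; `build_outfile` is a hypothetical name, not a function provided by TBI Extractor, and the fixed date in the example is only there to make the result reproducible.

```python
import datetime
from pathlib import Path


def build_outfile(data_path, prefix, filepart, day=None):
    """Build '<data_path>/<prefix>_<filepart>_<YYYY-MM-DD>.csv'.

    Hypothetical helper; `day` defaults to today's date.
    """
    day = day or datetime.date.today()
    return str(Path(data_path) / f"{prefix}_{filepart}_{day}.csv")


example_outfile = build_outfile(
    "/data1/nlp/tbiExtractor/data",
    "nlp_algorithm_output_summarized",
    "example",
    day=datetime.date(2019, 1, 2),  # fixed date for illustration only
)
```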

In [12]:
# write output to file
df_from_algorithm.to_csv(outfile, index=False)  

In [13]:
# pivot to create one report per row with each target in a column
pivot_df = df_from_algorithm.pivot(index='CT_report_id', columns='target_group', values='modifier_type')
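
`DataFrame.pivot` requires each (`CT_report_id`, `target_group`) pair to appear exactly once, which is presumably what the summarization step guarantees. The sketch below uses a toy frame (values invented for illustration, column names taken from the notebook) to show the reshaping and what happens when a pair is duplicated.

```python
import pandas as pd

# Toy summarized data: one row per report and target group.
toy = pd.DataFrame({
    "CT_report_id": ["321306", "321306", "502453", "502453"],
    "target_group": ["cistern", "fluid", "cistern", "fluid"],
    "modifier_type": ["abnormal", "present", "normal", "absent"],
})

# One row per report, one column per target group.
pivot_toy = toy.pivot(
    index="CT_report_id", columns="target_group", values="modifier_type"
)

# A repeated (report, target group) pair makes pivot raise ValueError.
toy_with_duplicate = pd.concat([toy, toy.iloc[[0]]], ignore_index=True)
try:
    toy_with_duplicate.pivot(
        index="CT_report_id", columns="target_group", values="modifier_type"
    )
    pivot_failed = False
except ValueError:
    pivot_failed = True
```

If the pivot in the cell above raises this error on your own data, it is a sign that some report was not summarized down to one row per target group.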

In [14]:
# show structure of pivot_df
pivot_df

CT_report_id,aneurysm,anoxic,atrophy,cistern,contusion,diffuse_axonal,epidural_hemorrhage,facial_fracture,fluid,gray_white_differentiation,...,intraventricular_hemorrhage,ischemia,mass_effect,microhemorrhage,midline_shift,pneumocephalus,skull_fracture,subarachnoid_hemorrhage,subdural_hemorrhage,swelling
321306,absent,absent,absent,abnormal,absent,present,absent,absent,present,normal,...,present,absent,absent,absent,absent,absent,absent,present,absent,absent
502453,absent,present,absent,normal,absent,present,absent,present,absent,abnormal,...,absent,absent,absent,absent,absent,absent,present,absent,absent,present


In [15]:
# transpose to show full results
pivot_df.transpose()

target_group,321306,502453
aneurysm,absent,absent
anoxic,absent,present
atrophy,absent,absent
cistern,abnormal,normal
contusion,absent,absent
diffuse_axonal,present,present
epidural_hemorrhage,absent,absent
facial_fracture,absent,present
fluid,present,absent
gray_white_differentiation,normal,abnormal


In [16]:
# compare above results to report
df_to_algorithm.loc[df_to_algorithm['CT_report_id'] == '321306', 'CT_report'].item()

'Findings: There is hyperattenuation predominantly involving the right sylvian fissure, left superior parietal sulci, right cingulate sulci and in the quadrigeminal cistern. There is layering hyperattenuation within the occipital horn of the left lateral ventricle. There is layering hyperattenuation in the suprasellar cistern. Hyperattenuation is noted around the partially visualized spinal cord. Foci of parenchymal hemorrhages are noted in the inferior right temporal lobe, left frontal lobe and the right subthalamic nuclei. There is no significant midline shift. The bony calvarium and the bones of the skull base appear normal. The visualized portions of the paranasal sinuses and the mastoid air cells are clear. No external soft tissue swelling. The orbits are unremarkable. There is a small amount of fluid in the right sphenoid sinus. Impression: 1. Multifocal subarachnoid hemorrhage as described above most notably in the right sylvian fissure and left superior parietal lobe. Hemorrhag

In [17]:
# compare above results to report
df_to_algorithm.loc[df_to_algorithm['CT_report_id'] == '502453', 'CT_report'].item()

'Findings: There is no definite evidence of intracranial hemorrhage, mass effect, midline shift or abnormal extraaxial fluid collection. There are numerous subtle punctate hyperdense foci scattered throughout the brain. The ventricles do not appear enlarged out of proportion to the cerebral sulci. Gray-white differentiation subtle a slightly decreased. There is subtle diffuse swelling of the brain. There are multiple skull base fractures and extensive facial fractures which will be more completely detailed on the accompanying facial bone CT reconstructions. There are mildly displaced bilateral frontal bone fractures through the anterior table of the frontal sinuses and nondisplaced fracture of the greater wing of the right sphenoid.. There are extensive sinus fractures with near complete opacification of the maxillary sinuses, ethmoid air cells, and sphenoid sinuses. There are a few scattered left ethmoid air cell opacities. Right ethmoid air cells are relatively clear.. There are mult

In [18]:
# setup output file
get_today = datetime.date.today()
outfile = data_path + '/nlp_report_output_' + filepart + '_' + str(get_today) + '.csv'

In [19]:
# write output to file; keep the index, because CT_report_id is the index of pivot_df
pivot_df.to_csv(outfile)
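
Since `CT_report_id` is the index of the pivoted frame, whether it reaches the file depends on the `index` argument of `to_csv`. A quick way to see the difference is to write a toy frame to an in-memory buffer and compare the header lines; the data below is invented for illustration.

```python
import io

import pandas as pd

# Toy pivoted frame with CT_report_id as the index.
toy_pivot = pd.DataFrame(
    {"cistern": ["abnormal", "normal"], "fluid": ["present", "absent"]},
    index=pd.Index(["321306", "502453"], name="CT_report_id"),
)

with_index = io.StringIO()
toy_pivot.to_csv(with_index)  # index is written by default

without_index = io.StringIO()
toy_pivot.to_csv(without_index, index=False)  # report IDs are dropped

header_with = with_index.getvalue().splitlines()[0]
header_without = without_index.getvalue().splitlines()[0]
print(header_with)
print(header_without)
```

Only the first header contains `CT_report_id`, so rows written without the index can no longer be matched back to their reports.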

# eof