# Part A - Finding the NASH

Natural Language Processing (NLP) doesn't have to be hard! For many tasks, simply finding a set of relevant notes is enough. In this example we have a nice term (NASH, short for nonalcoholic steatohepatitis) that is fairly unambiguous. We just want to find patients who may have NASH so they can be studied further.

In [None]:
# First off - load all the silly python libraries we are going to need 
import pandas as pd
import numpy as np
import random
from IPython.display import display, HTML


from google.colab import auth
from google.cloud import bigquery
from google.colab import files
import os

In [0]:
auth.authenticate_user() # authenticating - connecting to google

In [0]:
# This is how you connect to our project. You would change this in your own project.
project_id='new-zealand-2018-datathon'
os.environ["GOOGLE_CLOUD_PROJECT"]=project_id
# Read data from BigQuery into pandas dataframes.
# (The `verbose` argument is omitted: newer pandas / pandas-gbq releases no longer accept it.)
def run_query(query):
  return pd.io.gbq.read_gbq(query, project_id=project_id, configuration={'query':{'useLegacySql': False}})

In [0]:
# Now load the data
# Notice this looks like the SQL queries we practiced in the SQL section
notes = run_query('''
SELECT ROW_ID, SUBJECT_ID, HADM_ID, TEXT
FROM `physionet-data.mimiciii_notes.noteevents`
WHERE CATEGORY = 'Discharge summary'
''')

In [0]:
# Here is the list of terms we are going to consider "good"
terms = ['NASH', 'nonalcoholic steatohepatitis']

In [0]:
# Now scan through all of the notes. Do any of the terms appear? If so stash the note 
# id for future use

matches = []

for index, row in notes.iterrows():
    if any(x in row['TEXT'] for x in terms):
        matches.append(row['ROW_ID'])

print("Found " + str(len(matches)) + " matching notes.")

Found 0 matching notes.
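
The loop above matches exact, case-sensitive substrings, so a note that says "Nash" or "Nonalcoholic steatohepatitis" would be skipped, while a looser search could pick up "Nashville". One possible alternative, sketched here under assumptions, is a single case-insensitive regular expression with word boundaries applied through pandas `str.contains`. The `toy_notes` DataFrame, its contents, and the names `pattern`, `matching_ids`, and `patient_ids` are hypothetical stand-ins so the cell runs without BigQuery; the column names are assumed to match the ones used in this notebook.

```python
import re
import pandas as pd

# Hypothetical stand-in for the `notes` DataFrame (made-up text, same column names).
toy_notes = pd.DataFrame({
    "ROW_ID": [1, 2, 3, 4],
    "SUBJECT_ID": [10, 11, 11, 12],
    "TEXT": [
        "History of NASH cirrhosis.",
        "Patient lives in Nashville.",
        "Dx: Nonalcoholic steatohepatitis.",
        "No liver disease noted.",
    ],
})

search_terms = ["NASH", "nonalcoholic steatohepatitis"]

# \b word boundaries keep "NASH" from matching inside "Nashville";
# re.escape protects against regex metacharacters inside a term.
pattern = r"\b(?:" + "|".join(re.escape(t) for t in search_terms) + r")\b"

# case=False makes the search case-insensitive; na=False treats missing text as no match.
mask = toy_notes["TEXT"].str.contains(pattern, case=False, regex=True, na=False)

matching_ids = toy_notes.loc[mask, "ROW_ID"].tolist()
patient_ids = sorted(toy_notes.loc[mask, "SUBJECT_ID"].unique().tolist())

print("Found " + str(len(matching_ids)) + " matching notes.")
```

Whether case-insensitive matching is appropriate depends on the term: for an acronym like NASH it trades a few extra false positives for fewer missed notes, so it is worth reading a sample of the matches either way.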


In [0]:
# Display a random note that matches. You can rerun this cell to get another note.
# The fancy stuff is just highlighting the match to make it easier to find.

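if matches:
    display_id = random.choice(matches)
    text = notes[notes['ROW_ID'] == display_id].iloc[0]['TEXT']
    for term in terms:
        text = text.replace(term, "<font color=\"red\">" + term + "</font>")
    display(HTML("<pre>" + text + "</pre>"))
else:
    # random.choice raises an IndexError on an empty list, so bail out politely
    print("No matching notes to display.")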
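
The highlighting above inserts the note text into HTML unescaped, so characters such as `<` in a note can be misread as markup, and `str.replace` only catches the exact casing listed in `terms`. A minimal sketch of one way to handle both is shown below, assuming a hypothetical helper named `highlight_terms` (not part of the original notebook) built only on the standard-library `html` and `re` modules.

```python
import html
import re

def highlight_terms(text, terms):
    """Return HTML-safe text with every term occurrence wrapped in a red <span>.

    Hypothetical helper for illustration; assumes `terms` is a non-empty list.
    """
    # Escape the note first so characters like "<" display literally.
    escaped = html.escape(text)
    # Escape the terms the same way, then build one case-insensitive pattern.
    pattern = re.compile(
        "|".join(re.escape(html.escape(t)) for t in terms),
        re.IGNORECASE,
    )
    # m.group(0) keeps the casing found in the note rather than the casing in `terms`.
    return pattern.sub(
        lambda m: '<span style="color:red">' + m.group(0) + "</span>",
        escaped,
    )

sample = "Pt with nash; BP <120."
highlighted = highlight_terms(sample, ["NASH"])
print(highlighted)
```

In the notebook, the result could be shown with `display(HTML("<pre>" + highlight_terms(text, terms) + "</pre>"))`.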
