<h1>Mapping Bronchiectasis Patients</h1>

Dr Mustafa Elsayed at Southmead Hospital in Bristol is currently researching non-tuberculous mycobacteria (NTM) isolates in patients with Bronchiectasis. His research interests include the possibility of patient-to-patient transmission of NTM strains. At his request I geocoded the Bronchiectasis cohort his research team was using, which is the subject of this notebook.

The dataset consisted of 312 unique patients, detailing each patient's postcode and the bacterial isolate obtained from sputum culture. Using the Google Maps API I obtained latitude and longitude coordinates for each patient, and created an interactive map detailing their corresponding sputum culture result; each patient is represented by an interactive pinpoint that details the Pseudomonas or NTM bacterial strain isolated. The hope is that visualising the data in this way could give insight into whether related strains cluster in particular patient locations.

Due to the nature of this work I am unable to share the raw data files or findings, but below is the source code used for data processing and producing the interactive map.

<h3>1. Import Dependencies</h3>

In [7]:
import pandas as pd
import requests
import time
import gmplot
from bokeh.io import output_file, output_notebook, show
from bokeh.models import (
    GMapPlot, GMapOptions, ColumnDataSource, Circle, LogColorMapper, BasicTicker, ColorBar,
    Range1d, PanTool, WheelZoomTool, BoxSelectTool, HoverTool
)
from bokeh.models.mappers import LinearColorMapper
from bokeh.palettes import Viridis5

<h3>2. Data imports and ad hoc investigations</h3>

In [None]:
data = pd.read_excel("Bronchiectasis_Mastersheet_Postcode.xlsx")

In [None]:
data.head()

In [None]:
data.info()

<h3>3. Data munging</h3>

Create a pandas dataframe with the patient ID, postcode, NTM and PSE status, and the NTM species

In [None]:
data = data[['UID','PostCode','NTM','PSE', 'NTM sp']]

In [None]:
data.info()

In [None]:
data['NTM'].value_counts()

In [None]:
data['PSE'].value_counts()

In [None]:
ntm_dummies = pd.get_dummies(data['NTM'], prefix="NTM")

In [None]:
pse_dummies = pd.get_dummies(data['PSE'], prefix="PSE")

With dummy columns created for the values representing PSE and NTM status, drop the original columns and append the dummies in their place

In [None]:
data.drop(['PSE', 'NTM'], inplace=True, axis=1)

In [None]:
data = pd.concat([data, pse_dummies, ntm_dummies], axis=1)

In [None]:
data.info()
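
Since the raw data cannot be shared, the dummy-column step is easier to follow on a toy example. The sketch below uses made-up records and assumes the status columns hold the values 'Y' and 'N', which is inferred from the resulting column names (PSE_Y, PSE_N, NTM_Y, NTM_N) rather than confirmed by the source data.

```python
import pandas as pd

# Hypothetical records standing in for the real cohort (values are made up)
toy = pd.DataFrame({
    "UID": ["P1", "P2", "P3"],
    "NTM": ["Y", "N", "N"],
    "PSE": ["N", "Y", "N"],
})

# One indicator column per category, prefixed with the source column name.
# astype(int) keeps 0/1 integers on pandas versions that return booleans.
ntm_dummies = pd.get_dummies(toy["NTM"], prefix="NTM").astype(int)
pse_dummies = pd.get_dummies(toy["PSE"], prefix="PSE").astype(int)

# Swap the original status columns for their indicator columns
toy = pd.concat([toy.drop(columns=["NTM", "PSE"]), pse_dummies, ntm_dummies], axis=1)
print(toy)
```

Each patient ends up with a 1 in exactly one column of each pair, which is what the later filtering on PSE_Y and NTM_Y relies on.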

<h3>4. Get coordinates for each patient using postcode</h3>

In [None]:
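def get_long(postcode, retries=5):
    """Using patient postcode, fetch longitude coordinate using Google Maps API"""
    query = str(postcode).replace(" ", "+")
    for _ in range(retries):
        try:
            response = requests.get('https://maps.googleapis.com/maps/api/geocode/json?address={}'.format(query))
            resp_json_payload = response.json()
            return resp_json_payload['results'][0]['geometry']['location']['lng']
        except (requests.RequestException, ValueError, KeyError, IndexError):
            #Request failed or returned no results (e.g. rate limited); pause, then retry
            time.sleep(2)
    #Give up after a bounded number of attempts instead of recursing indefinitely
    return None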

In [None]:
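def get_lat(postcode, retries=5):
    """Using patient postcode, fetch latitude coordinate using Google Maps API"""
    query = str(postcode).replace(" ", "+")
    for _ in range(retries):
        try:
            response = requests.get('https://maps.googleapis.com/maps/api/geocode/json?address={}'.format(query))
            resp_json_payload = response.json()
            return resp_json_payload['results'][0]['geometry']['location']['lat']
        except (requests.RequestException, ValueError, KeyError, IndexError):
            #Request failed or returned no results (e.g. rate limited); pause, then retry
            time.sleep(2)
    #Give up after a bounded number of attempts instead of recursing indefinitely
    return None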

In [None]:
data['long'] = data.PostCode.apply(get_long)

In [None]:
data['lat'] = data.PostCode.apply(get_lat)

In [None]:
data.info()
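
Looking up latitude and longitude separately means two requests per patient, and patients who share a postcode are geocoded repeatedly. A minimal sketch of an alternative is shown below: it reads both coordinates from one response and caches results per postcode. The helper names (`parse_location`, `geocode_all`, `fake_fetch`) are hypothetical, and the postcodes and coordinates are made up; the HTTP call is replaced by a stand-in function so the example runs without network access or an API key.

```python
import time


def parse_location(payload):
    """Return (lat, lng) from a Geocoding-style JSON payload, or None if there is no match."""
    if payload.get("status") != "OK" or not payload.get("results"):
        return None
    location = payload["results"][0]["geometry"]["location"]
    return location["lat"], location["lng"]


def geocode_all(postcodes, fetch, pause=0.0):
    """Geocode each distinct postcode once; `fetch` maps a postcode to a payload dict."""
    cache = {}
    for postcode in postcodes:
        key = postcode.strip().upper()  # treat "ba1 1aa " and "BA1 1AA" as the same postcode
        if key not in cache:
            cache[key] = parse_location(fetch(key))
            time.sleep(pause)  # optional gap between requests to respect rate limits
    return cache


calls = []


def fake_fetch(postcode):
    """Stand-in for the real HTTP request, returning made-up coordinates."""
    calls.append(postcode)
    if postcode == "BA1 1AA":
        return {"status": "OK",
                "results": [{"geometry": {"location": {"lat": 51.38, "lng": -2.36}}}]}
    return {"status": "ZERO_RESULTS", "results": []}


coords = geocode_all(["BA1 1AA", "ba1 1aa ", "ZZ99 9ZZ"], fake_fetch)
print(coords)
print(calls)
```

In real use, `fetch` could wrap `requests.get` against the geocode endpoint used above, passing the postcode as the `address` parameter together with an API key, which Google now requires for Geocoding requests. The resulting dictionary can then be mapped back onto the PostCode column to fill `lat` and `long` in one pass.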

<h3>5. Plot data using the gmplot Python library</h3>

To get a general feel for the geographic distribution of data, I start with a static map using <a href="https://github.com/gmplot/gmplot">gmplot</a>, a Python library available on GitHub.

In [None]:
#Save the processed dataset to a CSV file, drop index
data.to_csv("loc_data.csv", index=False)

In [None]:
#Separate data into Pseudomonas isolates, NTM isolates, and all other data
pyo = data[data['PSE_Y'] == 1]
ntm = data[data['NTM_Y'] == 1]
all_other = data[(data['PSE_Y'] == 0) & (data['NTM_Y'] == 0)]

In [None]:
#Plot data using the gmap scatter method
gmap = gmplot.GoogleMapPlotter(51.3758, -2.3599, 11)
gmap.scatter(all_other['lat'].tolist(), all_other['long'].tolist(), '#3B0B39', size=40, marker=True)
gmap.scatter(pyo['lat'].tolist(), pyo['long'].tolist(), '#32CD32', size=40, marker=False)
gmap.scatter(ntm['lat'].tolist(), ntm['long'].tolist(), '#ff0000', size=40, marker=False)
gmap.draw("all_samples.html")

<h3>6. Interactive plot</h3>

Happy with the static plot, I proceed to use the Bokeh library to create an interactive display of the Bronchiectasis cohort.

In [2]:
data = pd.read_csv("loc_data.csv")

In [3]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 312 entries, 0 to 311
Data columns (total 9 columns):
UID         312 non-null object
PostCode    312 non-null object
NTM sp      236 non-null object
PSE_N       312 non-null int64
PSE_Y       312 non-null int64
NTM_N       312 non-null int64
NTM_Y       312 non-null int64
long        312 non-null float64
lat         312 non-null float64
dtypes: float64(2), int64(4), object(3)
memory usage: 18.3+ KB


In [12]:
def color_col(row):
    """Color code each record according to the bacterial isolate result"""
    if row.NTM_Y == 1:
        #1 == red
        return 1
    elif row.PSE_Y == 1:
        #2 == green
        return 2
    else:
        #0 == blue
        return 0

In [13]:
#Create new column corresponding to assigned color
data['color'] = data.apply(color_col, axis=1)
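
The row-wise `apply` above is fine for a few hundred patients. As a possible alternative, the same coding can be written in vectorised form with `numpy.select`, sketched below on made-up indicator values. It also makes the precedence explicit: a patient positive for both NTM and Pseudomonas is coded as NTM, because the conditions are checked in order.

```python
import numpy as np
import pandas as pd

# Hypothetical indicator values; the third patient is positive for both isolates
toy = pd.DataFrame({"NTM_Y": [1, 0, 1, 0], "PSE_Y": [0, 1, 1, 0]})

# The first matching condition wins, mirroring the if/elif order in color_col
toy["color"] = np.select(
    [toy["NTM_Y"] == 1, toy["PSE_Y"] == 1],
    [1, 2],      # 1 == red (NTM), 2 == green (PSE)
    default=0,   # 0 == blue (all other)
)
print(toy["color"].tolist())  # [1, 2, 1, 0]
```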

In [None]:
#Create Bokeh GMap object
map_options = GMapOptions(lat=51.3758, lng=-2.3599, map_type="roadmap", zoom=11)
plot = GMapPlot(x_range=Range1d(), y_range=Range1d(), map_options=map_options, sizing_mode='stretch_both')
plot.title.text = "Bronchiectasis Bath"

# For GMaps to function, Google requires you obtain and enable an API key:
#https://developers.google.com/maps/documentation/javascript/get-api-key
plot.api_key = "*********************************"

#Bind to pandas dataframe
source = ColumnDataSource(
    data=dict(
        index=data.UID.tolist(),
        lat=data.lat.tolist(),
        lon=data.long.tolist(),
        desc=data['NTM sp'].tolist(),
        color=data.color.tolist() #0:BLUE:OTHER, 1:RED:NTM, 2:GREEN:PSE 
    )
)

#Instantiate color mapper object; fixing low/high pins 0, 1, 2 to blue, red, green
color_mapper = LinearColorMapper(palette=['blue', 'red', 'green'], low=0, high=2)

#Define glyph objects
circle = Circle(x="lon", y="lat", size=6, fill_color={'field': 'color', 'transform': color_mapper}, fill_alpha=0.5, line_color=None)
plot.add_glyph(source, circle)

#Add plot tools and define tooltips for hover tool
plot.add_tools(PanTool(), WheelZoomTool(), BoxSelectTool(), HoverTool(tooltips = [
    ("Index", "@index"),
    ("Species", "@desc"),
]))
#Save plot to HTML file
output_file("interactive_plot.html")
show(plot)