In [1]:
%run ./resources/library.py

In [2]:
style_notebook()

Digital Case Study: Multidrug-Resistant Tuberculosis (MDR-TB) Outbreak - Revisiting the 2005 Outbreak Investigation in Thailand by John Oeltmann

## Note: Work in progress...

# Notebook 6, Part 2: Establishing Links Between Cases

We are halfway through the simulation [workflow](./resources/Work_flow_diagram2_08022017.pdf) described in Notebook 3. The red rectangle highlights our next step.


Now that we have successfully joined the case line listing data with the genotyping data and created the analytic dataset, we will create a map of the genotyped cases and symbolize these cases by drug-resistance type. We have received an email that lists each MIRU-ID together with its drug-resistance type. We need to create a new field in the analytic dataset for DRTYPE and code each line-listed case by this type. Please run the code below to classify the MIRU-IDs by DRTYPE:
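The notebook cells that follow load a dataset (`df5`) that already carries a `DRTYPE` column. For readers curious how such a classification could be coded, here is a minimal sketch using a lookup dictionary. The label strings are borrowed from the map legend used later in this notebook, and the `classify_drtype` helper and the second sample case are hypothetical; the real mapping should come from the classification table in the email.

```python
# Hypothetical lookup table: MIRU-ID -> drug-resistance type.
# Labels are taken from the map legend in this notebook, not from the email itself.
DRTYPE_BY_MIRU_ID = {
    1: "MDR-TB",
    2: "Res to >1 drug",
    3: "Pansusceptible",
}

def classify_drtype(miru_id):
    """Return the drug-resistance label for a MIRU-ID, or 'Unknown' if unmapped."""
    return DRTYPE_BY_MIRU_ID.get(miru_id, "Unknown")

cases = [
    {"CaseNo": "TH-101579", "FAKEMIRUID": 1},
    {"CaseNo": "TH-000000", "FAKEMIRUID": 4},  # made-up case for illustration
]
for case in cases:
    case["DRTYPE"] = classify_drtype(case["FAKEMIRUID"])
```

With a dataframe, the same idea can be written as `df['FAKEMIRUID'].map(DRTYPE_BY_MIRU_ID).fillna('Unknown')`.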


In [3]:
import pandas as pd

pd.__version__

'0.25.3'

In [4]:
pd.set_option('display.max_rows', None)  
pd.set_option('display.max_columns', None)  
pd.set_option('max_colwidth', -1)  
pd.set_option('display.width', 1000)

In [5]:
import folium

folium.__version__

'0.10.0'

## Map 3 From Notebook 5...

In [6]:
import folium
import pandas as pd

from folium import plugins

df5 = pd.read_pickle('outputs/df5.pickle')

# create 3rd map of the case data 
#Store the coordinates of the refugee camp listed in the EOC email
CAMP_COORDINATES = (14.699859, 100.829019)

# create empty map zoomed in on Refugee camp
map3 = folium.Map(location=CAMP_COORDINATES, zoom_start=16)
#folium.TileLayer("openstreetmap").add_to(map3)

# we will add tile layer options
TileLayer1 = folium.TileLayer('openstreetmap')
TileLayer2 = folium.TileLayer('cartodbpositron')
TileLayer3 = folium.TileLayer('stamentoner')

TileLayer1.layer_name = 'Open Street Map'
TileLayer2.layer_name = 'CartoDB Positron'
TileLayer3.layer_name = 'Stamen Toner'

TileLayer1.add_to(map3)
TileLayer2.add_to(map3)
TileLayer3.add_to(map3)

fg = folium.FeatureGroup(name='All Markers')
map3.add_child(fg)

g1 = plugins.FeatureGroupSubGroup(fg, 'MDR-TB')
map3.add_child(g1)

g2 = plugins.FeatureGroupSubGroup(fg, 'Res to >1 drug')
map3.add_child(g2)

g3 = plugins.FeatureGroupSubGroup(fg, 'Pansusceptible')
map3.add_child(g3)

g4 = plugins.FeatureGroupSubGroup(fg, 'Unknown')
map3.add_child(g4)

# create marker color Python dictionary
marker_color = {1:'red', \
               2:'yellow', \
               3:'blue', \
               4:'green', \
               5:'green'}

# loop through dataframe records
for each in df5.iterrows():
    # load record values into marker variables
    point=each[1]['COORDS']
    caseno=each[1]['CaseNo']
    miru_id=each[1]['FAKEMIRUID']
    drtype=each[1]['DRTYPE']
    # set up popup display string
    popup_string = "<b>Case No.:</b> "+caseno+"<br/>"+ \
                   "<b>Location:</b> "+point+"<br/>"+ \
                   "<b>MIRU ID:</b> "+str(miru_id)+"<br/>"+ \
                   "<b>Drug Resistance:</b> "+drtype
    # construct Folium popup
    popup=folium.Popup(popup_string, max_width='200')
    
    # Add the right marker to map depending on MIRU ID
    if miru_id == 1:
        folium.RegularPolygonMarker(point.split(','), \
            popup=popup, fill_color=marker_color[miru_id], \
            color='black', fill_opacity=1, weight=0.8,\
            number_of_sides=3, radius=10).add_to(g1)
    elif miru_id == 2:
        folium.RegularPolygonMarker(point.split(','), \
            popup=popup, fill_color=marker_color[miru_id], \
            color='black',fill_opacity=1, weight=0.8,\
            number_of_sides=4, radius=8).add_to(g2)
    elif miru_id == 3:
        folium.CircleMarker(point.split(','), \
            popup=popup, fill_color=marker_color[miru_id], \
            color='black',fill=True,fill_opacity=0.8, weight=0.7,\
            radius=7).add_to(g3)
    else:
        folium.CircleMarker(point.split(','), \
            popup=popup, fill_color=marker_color[miru_id], \
            color='black',fill=True, fill_opacity=1,weight=0.7,\
            radius=7).add_to(g4)

# let's use the "Fullscreen" plugin
# add the button to the top right corner
plugins.Fullscreen(
    position='topright',
    title='Expand me',
    title_cancel='Exit me',
    force_separate_button=True
).add_to(map3)

# add the layer control
folium.LayerControl(collapsed=False).add_to(map3)

map3

## Step 10. Linking Cases


 The map of TB cases with MIRU-ID and Drug Resistance Type is helpful for the contact investigation team. They have conducted interviews of patients to try to establish potential epi-links between cases with similar genotypes and drug resistance patterns. The team has provided a [spreadsheet](./resources/Thailand_cases_exercise_4th_spreadsheet_07302017.xls) with unique patient IDs and epi-links to other patients.  We will read this spreadsheet as we have done with other source data. Please run the code below to read in the epi-link data.

In [7]:
# read the spreadsheet of epi-links and create a dataframe
# Import the excel file and call it xls_file4
xls_file4 = pd.ExcelFile('resources/Thailand_cases_exercise_4th_spreadsheet_07302017.xls')
xls_file4

<pandas.io.excel._base.ExcelFile at 0x7f2b45e11a90>

In [8]:
# View the excel file's sheet names
xls_file4.sheet_names

['TBDATA4', 'EPILINKDATA']

In [9]:
# Load the xls file's EPILINKDATA sheet as a dataframe
df6 = xls_file4.parse('EPILINKDATA')

df6

Unnamed: 0,ORIGNO,OLAT,OLON,DESTNO,DLAT,DLON
0,TH-101579,14.702731,100.82844,TH-101823,14.699857,100.828917
1,TH-101579,14.702731,100.82844,TH-102637,14.696637,100.829703
2,TH-101823,14.699857,100.828917,TH-101579,14.702731,100.82844
3,TH-101823,14.699857,100.828917,TH-101783,14.696434,100.828634
4,TH-101823,14.699857,100.828917,TH-102445,14.696133,100.82895
5,TH-101823,14.699857,100.828917,TH-102460,14.700809,100.827556
6,TH-101783,14.696434,100.828634,TH-101823,14.699857,100.828917
7,TH-102445,14.696133,100.82895,TH-101823,14.699857,100.828917
8,TH-102460,14.700809,100.827556,TH-101823,14.699857,100.828917
9,TH-102637,14.696637,100.829703,TH-101579,14.702731,100.82844


In [10]:
df6.to_pickle("outputs/df6.pickle")

### Map 4: Visualizing Epi Links: Drawing PolyLines in Folium

As you may have guessed, `df6` looks like something we can use for a network graph. Recall that a network is made up of nodes (cases) and edges (links between cases).

To visualize these connections on a map we can use the `folium` `PolyLine` feature. A `folium` `PolyLine` can hold several line segments, and each segment is defined by at least two points.
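As a small illustration of the data shape a multi-segment `PolyLine` expects, the sketch below turns link records into a list of `[origin, destination]` point pairs. The `to_segments` helper is hypothetical, and the records are plain dictionaries whose keys mirror the `df6` columns.

```python
# Each record is one epi link; field names mirror the df6 columns.
links = [
    {"OLAT": 14.702731, "OLON": 100.82844, "DLAT": 14.699857, "DLON": 100.828917},
    {"OLAT": 14.702731, "OLON": 100.82844, "DLAT": 14.696637, "DLON": 100.829703},
]

def to_segments(records):
    """Convert link records into [[origin, destination], ...] point pairs."""
    return [
        [[r["OLAT"], r["OLON"]], [r["DLAT"], r["DLON"]]]
        for r in records
    ]

segments = to_segments(links)
```

Records in this form can be obtained from a dataframe with `df6.to_dict('records')`.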

In [11]:
# create 4th map of the case data 
# Store the coordinates of the refugee camp listed in the EOC email
CAMP_COORDINATES = (14.699859, 100.829019)

# create empty map zoomed in on Refugee camp
map4 = folium.Map(location=CAMP_COORDINATES, zoom_start=16)
folium.TileLayer("openstreetmap").add_to(map4)

# construct lines for epi links
# create an empty list for polyline
lines=[]
# go through df6 records
for each in df6.iterrows():
    # store record values into variables
    # start point
    origin = [each[1]['OLAT'],each[1]['OLON']]
    # end point
    destination = [each[1]['DLAT'],each[1]['DLON']]
    # define line
    line = [origin, destination]
    # add this line to polyline, lines
    lines.append(line)

# create Folium PolyLine and add to map
# note: the path option for line transparency is `opacity`
folium.PolyLine(lines, color='black', weight=2, opacity=0.8).add_to(map4)

map4

If you see multiple black lines representing the `folium` `PolyLine` object on the map, congratulations! You have drawn the first component of the network visualization, the edges, on a `folium` map.

To build a more complete picture, let's add both the `PolyLine` and the case markers to a new map, `map5`.

### Map 5: Nodes and Edges

In [12]:
# create 5th map of the case data 
# Store the coordinates of the refugee camp listed in the EOC email
CAMP_COORDINATES = (14.699859, 100.829019)

# create empty map zoomed in on Refugee camp
map5 = folium.Map(location=CAMP_COORDINATES, zoom_start=16)
folium.TileLayer("openstreetmap").add_to(map5)

# construct polyline for epi links
lines=[]
for each in df6.iterrows():
    origin = [each[1]['OLAT'],each[1]['OLON']]
    destination = [each[1]['DLAT'],each[1]['DLON']]
    line = [origin, destination]
    lines.append(line)

# create Folium PolyLine and add to map
folium.PolyLine(lines, color='black', weight=2, opacity=0.8).add_to(map5)

df5 = pd.read_pickle('outputs/df5.pickle')
# loop through dataframe records
for each in df5.iterrows():
    # load record values into marker variables
    point=each[1]['COORDS']
    caseno=each[1]['CaseNo']
    miru_id=each[1]['FAKEMIRUID']
    drtype=each[1]['DRTYPE']
    # set up popup display string
    popup_string = "<b>Case No.:</b> "+caseno+"<br/>"+ \
                   "<b>Location:</b> "+point+"<br/>"+ \
                   "<b>MIRU ID:</b> "+str(miru_id)+"<br/>"+ \
                   "<b>Drug Resistance:</b> "+drtype
    # construct Folium popup
    popup=folium.Popup(popup_string, max_width='200')
    
    # Add the right marker to map depending on MIRU ID
    if miru_id == 1:
        folium.RegularPolygonMarker(point.split(','), \
            popup=popup, fill_color=marker_color[miru_id], \
            color='black', fill_opacity=1, weight=0.8,\
            number_of_sides=3, radius=10).add_to(map5)
map5

### Reviewing "network analysis" data (data on epi links)

We now have a map of the TB cases by MIRU-ID and Drug Resistance Type, with the known epi links between cases displayed. Several cases share the same MIRU-ID and Drug Resistance Type and are located close together, yet were not established as epi-linked. The TB contact investigation team has asked whether the EOC geographers can help find suspect cases such as these and provide a line listing for a second round of contact interviews.

There are 5 cases with no apparent links to the other cases. Let's use a network graph analysis approach.
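To put a number on "located close together", one option is the great-circle (haversine) distance between two case locations. The sketch below is an illustration only: the `haversine_m` helper is hypothetical, it assumes a spherical Earth with a mean radius of 6,371 km, and the coordinates are those of cases TH-101651 and TH-101931 from the line listing.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points given in degrees."""
    earth_radius_m = 6371000.0  # assumed mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * earth_radius_m * math.asin(math.sqrt(a))

# Distance between TH-101651 and TH-101931
d = haversine_m(14.700701, 100.829259, 14.700087, 100.829533)
```

Any distance threshold used to flag "proximate" cases would be a judgment call for the investigation team; none is specified in this exercise.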

In [13]:
# select the cases with the same genotype and miruid that are proximate 
#   but not epi-linked
# read the spreadsheet of epi-links and create a dataframe
# Import the excel file and call it xls_file4
xls_file4 = \
 pd.ExcelFile('resources/Thailand_cases_exercise_4th_spreadsheet_07302017.xls')

xls_file4

<pandas.io.excel._base.ExcelFile at 0x7f2b44661588>

In [14]:
# View the excel file's sheet names
xls_file4.sheet_names

['TBDATA4', 'EPILINKDATA']

In [15]:
# Load the xls file's TBDATA4 sheet as a dataframe
df7 = xls_file4.parse('TBDATA4')

df7.head(20)

Unnamed: 0,CaseNo,LON,LAT,COORDS,FAKEMIRUVNTR,FAKEMIRUID,SYMBOL,EPILINKID1,EPILINKID2,EPILINKID3,EPILINKID4
0,TH-101579,100.82844,14.702731,"14.702731,100.82844",012345678901234567890123,1,Red Triangle,TH-101823,TH-102637,NO EPI LINKS,NO EPI LINKS
1,TH-101823,100.828917,14.699857,"14.699857,100.828917",012345678901234567890123,1,Red Triangle,TH-101579,TH101783,TH-102445,TH-102460
2,TH-101783,100.828634,14.696434,"14.696434,100.828634",012345678901234567890123,1,Red Triangle,TH-101823,NO EPI LINKS,NO EPI LINKS,NO EPI LINKS
3,TH-102445,100.82895,14.696133,"14.696133,100.82895",012345678901234567890123,1,Red Triangle,TH-101823,NO EPI LINKS,NO EPI LINKS,NO EPI LINKS
4,TH-102460,100.827556,14.700809,"14.700809,100.827556",012345678901234567890123,1,Red Triangle,TH-101823,NO EPI LINKS,NO EPI LINKS,NO EPI LINKS
5,TH-102637,100.829703,14.696637,"14.696637,100.829703",012345678901234567890123,1,Red Triangle,TH-101579,NO EPI LINKS,NO EPI LINKS,NO EPI LINKS
6,TH-101651,100.829259,14.700701,"14.700701,100.829259",012345678901234567890123,1,Red Triangle,NO EPI LINKS,NO EPI LINKS,NO EPI LINKS,NO EPI LINKS
7,TH-101931,100.829533,14.700087,"14.700087,100.829533",012345678901234567890123,1,Red Triangle,NO EPI LINKS,NO EPI LINKS,NO EPI LINKS,NO EPI LINKS
8,TH-103347,100.830247,14.697342,"14.697342,100.830247",012345678901234567890123,1,Red Triangle,TH-103009,TH-102909,TH-103773,NO EPI LINKS
9,TH-103009,100.831095,14.699261,"14.699261,100.831095",012345678901234567890123,1,Red Triangle,TH-103347,NO EPI LINKS,NO EPI LINKS,NO EPI LINKS


### Summarizing the data on epi links

Let's zero in on the columns where cases are identified.

In [16]:
df7.iloc[:, [0,7,8,9,10]]

Unnamed: 0,CaseNo,EPILINKID1,EPILINKID2,EPILINKID3,EPILINKID4
0,TH-101579,TH-101823,TH-102637,NO EPI LINKS,NO EPI LINKS
1,TH-101823,TH-101579,TH101783,TH-102445,TH-102460
2,TH-101783,TH-101823,NO EPI LINKS,NO EPI LINKS,NO EPI LINKS
3,TH-102445,TH-101823,NO EPI LINKS,NO EPI LINKS,NO EPI LINKS
4,TH-102460,TH-101823,NO EPI LINKS,NO EPI LINKS,NO EPI LINKS
5,TH-102637,TH-101579,NO EPI LINKS,NO EPI LINKS,NO EPI LINKS
6,TH-101651,NO EPI LINKS,NO EPI LINKS,NO EPI LINKS,NO EPI LINKS
7,TH-101931,NO EPI LINKS,NO EPI LINKS,NO EPI LINKS,NO EPI LINKS
8,TH-103347,TH-103009,TH-102909,TH-103773,NO EPI LINKS
9,TH-103009,TH-103347,NO EPI LINKS,NO EPI LINKS,NO EPI LINKS


Visual inspection reveals two potential errors:
1. One case number under `EPILINKID2` is missing its dash: `TH101783`. This could prevent cases from being matched uniquely during computation. Let's fix that with the code below and store the result as `df8`.
2. One case number under `EPILINKID1` has 10 characters instead of 9. Assuming we checked the original data and confirmed it was mistyped (`TH-104090` entered as `TH-1014090`), we can correct it with the code below as well.
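Visual inspection does not scale well, so a pattern check can help flag such entries automatically. The sketch below assumes, based only on the IDs shown in this notebook, that a valid case number is `TH-` followed by six digits. The `find_malformed` helper is hypothetical.

```python
import re

# Assumed pattern, inferred from the IDs in this notebook: "TH-" plus six digits.
CASE_NO_PATTERN = re.compile(r"^TH-\d{6}$")

def find_malformed(values, placeholder="NO EPI LINKS"):
    """Return values that are neither the placeholder nor a well-formed case number."""
    return [v for v in values
            if v != placeholder and not CASE_NO_PATTERN.match(v)]

sample = ["TH-101823", "TH101783", "NO EPI LINKS", "TH-1014090"]
malformed = find_malformed(sample)
```

A flagged value still needs to be checked against the source records before it is corrected.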

In [17]:
# take an explicit copy so the corrections below do not trigger SettingWithCopyWarning
df8 = df7.iloc[:, [0,7,8,9,10]].copy()
df8.at[1, 'EPILINKID2'] = 'TH-101783'
df8.at[15, 'EPILINKID1'] = 'TH-104090'

df8

Unnamed: 0,CaseNo,EPILINKID1,EPILINKID2,EPILINKID3,EPILINKID4
0,TH-101579,TH-101823,TH-102637,NO EPI LINKS,NO EPI LINKS
1,TH-101823,TH-101579,TH-101783,TH-102445,TH-102460
2,TH-101783,TH-101823,NO EPI LINKS,NO EPI LINKS,NO EPI LINKS
3,TH-102445,TH-101823,NO EPI LINKS,NO EPI LINKS,NO EPI LINKS
4,TH-102460,TH-101823,NO EPI LINKS,NO EPI LINKS,NO EPI LINKS
5,TH-102637,TH-101579,NO EPI LINKS,NO EPI LINKS,NO EPI LINKS
6,TH-101651,NO EPI LINKS,NO EPI LINKS,NO EPI LINKS,NO EPI LINKS
7,TH-101931,NO EPI LINKS,NO EPI LINKS,NO EPI LINKS,NO EPI LINKS
8,TH-103347,TH-103009,TH-102909,TH-103773,NO EPI LINKS
9,TH-103009,TH-103347,NO EPI LINKS,NO EPI LINKS,NO EPI LINKS


### Preparing for network analysis: Creating nodes and edges

Let's create epi link pairs from `df8`, the summarized epi links dataset. Think of each pair as the two endpoints of an edge between nodes in a network graph.

In [18]:
epi_link_pairs = []
unlinked_nodes = []

for each in df8.iterrows():
    node1 = each[1]['CaseNo']
    node2 = each[1]['EPILINKID1']
    node3 = each[1]['EPILINKID2']
    node4 = each[1]['EPILINKID3']
    node5 = each[1]['EPILINKID4']
    # rows with unlinked cases all have "NO EPI LINKS" entries
    if all(string == 'NO EPI LINKS' for string in [node2,node3,node4,node5]):
        unlinked_nodes.append(node1)
    if node2 != 'NO EPI LINKS':
        epi_link_pairs.append([node1,node2])
    if node3 != 'NO EPI LINKS':
        epi_link_pairs.append([node1,node3])
    if node4 != 'NO EPI LINKS':
        epi_link_pairs.append([node1,node4])
    if node5 != 'NO EPI LINKS':
        epi_link_pairs.append([node1,node5])


The code above creates two sets of data:

1. Epi links represented as pairs (edges): `epi_link_pairs`
2. Potential cases (nodes) with no links: `unlinked_nodes`
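The same split can be written without one `if` per link column, which helps if the number of `EPILINKID` columns changes. This is a sketch on plain dictionaries rather than the notebook's dataframe; the `split_links` helper and the `links` key are hypothetical.

```python
NO_LINK = "NO EPI LINKS"

rows = [
    {"CaseNo": "TH-101579", "links": ["TH-101823", "TH-102637", NO_LINK, NO_LINK]},
    {"CaseNo": "TH-101651", "links": [NO_LINK, NO_LINK, NO_LINK, NO_LINK]},
]

def split_links(records):
    """Return (edge pairs, unlinked case numbers) from wide-format link records."""
    pairs, unlinked = [], []
    for rec in records:
        # keep only real case numbers, dropping the placeholder entries
        targets = [t for t in rec["links"] if t != NO_LINK]
        if not targets:
            unlinked.append(rec["CaseNo"])
        pairs.extend([rec["CaseNo"], t] for t in targets)
    return pairs, unlinked

pairs, unlinked = split_links(rows)
```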

In [19]:
epi_link_pairs

[['TH-101579', 'TH-101823'],
 ['TH-101579', 'TH-102637'],
 ['TH-101823', 'TH-101579'],
 ['TH-101823', 'TH-101783'],
 ['TH-101823', 'TH-102445'],
 ['TH-101823', 'TH-102460'],
 ['TH-101783', 'TH-101823'],
 ['TH-102445', 'TH-101823'],
 ['TH-102460', 'TH-101823'],
 ['TH-102637', 'TH-101579'],
 ['TH-103347', 'TH-103009'],
 ['TH-103347', 'TH-102909'],
 ['TH-103347', 'TH-103773'],
 ['TH-103009', 'TH-103347'],
 ['TH-102909', 'TH-103347'],
 ['TH-103773', 'TH-103347'],
 ['TH-104090', 'TH-103927'],
 ['TH-103927', 'TH-104090']]

In [20]:
unlinked_nodes

['TH-101651', 'TH-101931', 'TH-103679', 'TH-104039']

### Deduplicating edges

If you examine the `epi_link_pairs` list, you will see that it contains duplicate pairs of points 1 and 2 (disregarding order within each pair). The code below removes the duplicate pairs.

The code `[ tuple(sorted(i)) for i in epi_link_pairs ]` is a list comprehension, a "Pythonic" way of handling data structures concisely.

In [21]:
no_duplicates = set( [ tuple(sorted(i)) for i in epi_link_pairs ])

no_duplicates

{('TH-101579', 'TH-101823'),
 ('TH-101579', 'TH-102637'),
 ('TH-101783', 'TH-101823'),
 ('TH-101823', 'TH-102445'),
 ('TH-101823', 'TH-102460'),
 ('TH-102909', 'TH-103347'),
 ('TH-103009', 'TH-103347'),
 ('TH-103347', 'TH-103773'),
 ('TH-103927', 'TH-104090')}
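A Python `set` does not guarantee any particular order, which is why rows built from it can appear shuffled. If a stable order matters, an order-preserving variant is possible. The `dedupe_undirected` helper below is a hypothetical sketch, not part of the original notebook.

```python
def dedupe_undirected(pairs):
    """Drop repeated undirected pairs, keeping first-seen order."""
    seen = set()
    unique = []
    for a, b in pairs:
        key = tuple(sorted((a, b)))  # (A, B) and (B, A) share one key
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique

pairs = [
    ["TH-101579", "TH-101823"],
    ["TH-101823", "TH-101579"],
    ["TH-101579", "TH-102637"],
]
unique_pairs = dedupe_undirected(pairs)
```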

Let's create a new dataframe, `edges_df1`, out of the `no_duplicates` set.

In [22]:
# convert the set to a list first; newer pandas versions may reject unordered sets
edges_df1 = pd.DataFrame(list(no_duplicates))

edges_df1.rename(columns={0:'CaseNo1',1:'CaseNo2'}, inplace=True)

edges_df1

Unnamed: 0,CaseNo1,CaseNo2
0,TH-101783,TH-101823
1,TH-101579,TH-102637
2,TH-101579,TH-101823
3,TH-103927,TH-104090
4,TH-101823,TH-102445
5,TH-102909,TH-103347
6,TH-103347,TH-103773
7,TH-103009,TH-103347
8,TH-101823,TH-102460


From the case numbers in the edge list, we can build our inventory of nodes, a dataframe called `nodes_df1`.

In [23]:
nodes_df1 = pd.DataFrame(pd.melt(edges_df1).value.unique())

nodes_df1.rename(columns={0:'CaseNo'}, inplace=True)

nodes_df1

Unnamed: 0,CaseNo
0,TH-101783
1,TH-101579
2,TH-103927
3,TH-101823
4,TH-102909
5,TH-103347
6,TH-103009
7,TH-102637
8,TH-104090
9,TH-102445


If you remember from Notebook 5, `df5` is the dataframe that holds all the data for cases, including drug resistance and genotyping data. We can compare rows from `df5` filtered by `FAKEMIRUID==1` to `nodes_df1`. Assuming `nodes_df1` is an inventory of nodes with links, the **difference** between the two will be the nodes found in the list `unlinked_nodes`.

In [24]:
df5.query("FAKEMIRUID==1").CaseNo.unique()

array(['TH-101579', 'TH-101823', 'TH-101783', 'TH-102445', 'TH-102460',
       'TH-102637', 'TH-101651', 'TH-101931', 'TH-103347', 'TH-103009',
       'TH-102909', 'TH-103773', 'TH-103679', 'TH-104090', 'TH-104039',
       'TH-103927'], dtype=object)

To compare the two sets of nodes we use the `numpy` function `setdiff1d()`.
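For intuition, `setdiff1d()` returns the sorted, unique values of its first argument that are absent from its second. A plain-Python equivalent, shown here as a sketch with a hypothetical helper name and a shortened case list, looks like this:

```python
def set_difference_sorted(a, b):
    """Sorted unique values in `a` that do not appear in `b`."""
    return sorted(set(a) - set(b))

all_cases = ["TH-101579", "TH-101651", "TH-101823", "TH-101931"]
linked_cases = ["TH-101823", "TH-101579"]
unlinked = set_difference_sorted(all_cases, linked_cases)
```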

In [25]:
import numpy as np

nonmatch = \
    np.setdiff1d(df5.query("FAKEMIRUID==1").CaseNo.unique(), nodes_df1.CaseNo.unique())

nonmatch

array(['TH-101651', 'TH-101931', 'TH-103679', 'TH-104039'], dtype=object)

In [26]:
unlinked_nodes

['TH-101651', 'TH-101931', 'TH-103679', 'TH-104039']

Both contain the same cases.

To add more information to the nodes in `nodes_df1`, we will do a `pandas` `merge` of `nodes_df1` and `df5`, joined on `CaseNo`. This is equivalent to a SQL inner join.

In [27]:
nodes_df2 = pd.merge(nodes_df1, df5, on='CaseNo', how='inner')

nodes_df2

Unnamed: 0,CaseNo,FAKEMIRUVNTR,FAKEMIRUID,DRTYPE,LON,LAT,COORDS,SYMBOL
0,TH-101783,012345678901234567890123,1,MDR-TB,100.828634,14.696434,"14.696434,100.828634",Red Triangle
1,TH-101579,012345678901234567890123,1,MDR-TB,100.82844,14.702731,"14.702731,100.82844",Red Triangle
2,TH-103927,012345678901234567890123,1,MDR-TB,100.829917,14.700072,"14.700072,100.829917",Red Triangle
3,TH-101823,012345678901234567890123,1,MDR-TB,100.828917,14.699857,"14.699857,100.828917",Red Triangle
4,TH-102909,012345678901234567890123,1,MDR-TB,100.830151,14.69898,"14.69898,100.830151",Red Triangle
5,TH-103347,012345678901234567890123,1,MDR-TB,100.830247,14.697342,"14.697342,100.830247",Red Triangle
6,TH-103009,012345678901234567890123,1,MDR-TB,100.831095,14.699261,"14.699261,100.831095",Red Triangle
7,TH-102637,012345678901234567890123,1,MDR-TB,100.829703,14.696637,"14.696637,100.829703",Red Triangle
8,TH-104090,012345678901234567890123,1,MDR-TB,100.830587,14.698245,"14.698245,100.830587",Red Triangle
9,TH-102445,012345678901234567890123,1,MDR-TB,100.82895,14.696133,"14.696133,100.82895",Red Triangle


Similarly, to add coordinates for both endpoints of each edge, we will do a two-step `pandas` `merge` between:
1. `edges_df1` and `df5[['CaseNo','LAT','LON']]` producing `edges_df2`
2. `edges_df2` and `df5[['CaseNo','LAT','LON']]` producing `edges_df3`
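Conceptually, the two merges look up the coordinates of each endpoint by case number. The sketch below shows the same idea with a dictionary lookup instead of `pandas`. The `attach_coords` helper is hypothetical, and the second edge uses a made-up ID to show that, as with an inner join, edges with an unknown endpoint are dropped.

```python
# Coordinates keyed by case number (values copied from the case data)
coords = {
    "TH-101783": (14.696434, 100.828634),
    "TH-101823": (14.699857, 100.828917),
}
# The second edge refers to a made-up ID that has no coordinates
edges = [("TH-101783", "TH-101823"), ("TH-101783", "TH-999999")]

def attach_coords(edge_list, lookup):
    """Add endpoint coordinates to each edge; skip edges with an unknown endpoint."""
    out = []
    for a, b in edge_list:
        if a in lookup and b in lookup:
            out.append({
                "CaseNo1": a, "CaseNo2": b,
                "LAT1": lookup[a][0], "LON1": lookup[a][1],
                "LAT2": lookup[b][0], "LON2": lookup[b][1],
            })
    return out

edges_with_coords = attach_coords(edges, coords)
```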

In [28]:
edges_df2 = pd.merge(edges_df1, df5[['CaseNo','LAT','LON']], \
                     left_on='CaseNo1', right_on='CaseNo', how='inner')

edges_df2.rename(columns={'LAT':'LAT1', 'LON':'LON1'}, inplace=True)

edges_df2.drop(columns=['CaseNo'], inplace=True)

edges_df3 = pd.merge(edges_df2, df5[['CaseNo','LAT','LON']], \
                    left_on='CaseNo2', right_on='CaseNo', how='inner')

edges_df3.rename(columns={'LAT':'LAT2', 'LON':'LON2'}, inplace=True)

edges_df3.drop(columns=['CaseNo'], inplace=True)

edges_df3

Unnamed: 0,CaseNo1,CaseNo2,LAT1,LON1,LAT2,LON2
0,TH-101783,TH-101823,14.696434,100.828634,14.699857,100.828917
1,TH-101579,TH-101823,14.702731,100.82844,14.699857,100.828917
2,TH-101579,TH-102637,14.702731,100.82844,14.696637,100.829703
3,TH-103927,TH-104090,14.700072,100.829917,14.698245,100.830587
4,TH-101823,TH-102445,14.699857,100.828917,14.696133,100.82895
5,TH-101823,TH-102460,14.699857,100.828917,14.700809,100.827556
6,TH-102909,TH-103347,14.69898,100.830151,14.697342,100.830247
7,TH-103009,TH-103347,14.699261,100.831095,14.697342,100.830247
8,TH-103347,TH-103773,14.697342,100.830247,14.700229,100.82848


In [29]:
suspected_links_df1 = pd.DataFrame(unlinked_nodes)

suspected_links_df1.rename(columns={0:'CaseNo'}, inplace=True)

suspected_links_df1

Unnamed: 0,CaseNo
0,TH-101651
1,TH-101931
2,TH-103679
3,TH-104039


`suspected_links_df1` should look the same as `nonmatch` converted to a dataframe.

In [30]:
pd.DataFrame(nonmatch).rename(columns={0:'CaseNo'})

Unnamed: 0,CaseNo
0,TH-101651
1,TH-101931
2,TH-103679
3,TH-104039


To add more information to `suspected_links_df1` we merge it with `df5` on the `CaseNo` column.

In [31]:
suspected_links_df2 = pd.merge(suspected_links_df1, df5, \
                              on='CaseNo', how='inner')

suspected_links_df2

Unnamed: 0,CaseNo,FAKEMIRUVNTR,FAKEMIRUID,DRTYPE,LON,LAT,COORDS,SYMBOL
0,TH-101651,012345678901234567890123,1,MDR-TB,100.829259,14.700701,"14.700701,100.829259",Red Triangle
1,TH-101931,012345678901234567890123,1,MDR-TB,100.829533,14.700087,"14.700087,100.829533",Red Triangle
2,TH-103679,012345678901234567890123,1,MDR-TB,100.829528,14.702659,"14.702659,100.829528",Red Triangle
3,TH-104039,012345678901234567890123,1,MDR-TB,100.829623,14.697903,"14.697903,100.829623",Red Triangle


It's always good to review the result of the code by comparing the pairs in `edges_df1` with those in `df8`. You should be able to find all of the linked cases from `df8` in `edges_df1`, and the unlinked ones in `suspected_links_df1`.

In [32]:
df8

Unnamed: 0,CaseNo,EPILINKID1,EPILINKID2,EPILINKID3,EPILINKID4
0,TH-101579,TH-101823,TH-102637,NO EPI LINKS,NO EPI LINKS
1,TH-101823,TH-101579,TH-101783,TH-102445,TH-102460
2,TH-101783,TH-101823,NO EPI LINKS,NO EPI LINKS,NO EPI LINKS
3,TH-102445,TH-101823,NO EPI LINKS,NO EPI LINKS,NO EPI LINKS
4,TH-102460,TH-101823,NO EPI LINKS,NO EPI LINKS,NO EPI LINKS
5,TH-102637,TH-101579,NO EPI LINKS,NO EPI LINKS,NO EPI LINKS
6,TH-101651,NO EPI LINKS,NO EPI LINKS,NO EPI LINKS,NO EPI LINKS
7,TH-101931,NO EPI LINKS,NO EPI LINKS,NO EPI LINKS,NO EPI LINKS
8,TH-103347,TH-103009,TH-102909,TH-103773,NO EPI LINKS
9,TH-103009,TH-103347,NO EPI LINKS,NO EPI LINKS,NO EPI LINKS


In [33]:
edges_df1

Unnamed: 0,CaseNo1,CaseNo2
0,TH-101783,TH-101823
1,TH-101579,TH-102637
2,TH-101579,TH-101823
3,TH-103927,TH-104090
4,TH-101823,TH-102445
5,TH-102909,TH-103347
6,TH-103347,TH-103773
7,TH-103009,TH-103347
8,TH-101823,TH-102460


In [34]:
suspected_links_df1

Unnamed: 0,CaseNo
0,TH-101651
1,TH-101931
2,TH-103679
3,TH-104039


In [35]:
# export the suspected epi-linked cases to a comma-delimited text file (.csv)
suspected_links_df2.to_csv('outputs/suspected_epilink_cases.csv')

In [36]:
df8.to_pickle('outputs/df8.pickle')
nodes_df2.to_pickle('outputs/nodes_df2.pickle')
edges_df3.to_pickle('outputs/edges_df3.pickle')
suspected_links_df2.to_pickle('outputs/suspected_links_df2.pickle')

### Map 6: Mapping after review using data from network analysis approach

In [37]:
# create 6th map of the case data 
# Store the coordinates of the refugee camp listed in the EOC email
CAMP_COORDINATES = (14.699859, 100.829019)

# create empty map zoomed in on Refugee camp
map6 = folium.Map(location=CAMP_COORDINATES, zoom_start=16)
folium.TileLayer("openstreetmap").add_to(map6)

# construct polyline for epi links
lines=[]
for each in edges_df3.iterrows():
    origin = [each[1]['LAT1'],each[1]['LON1']]
    destination = [each[1]['LAT2'],each[1]['LON2']]
    line = [origin, destination]
    lines.append(line)

# create Folium PolyLine and add to map
folium.PolyLine(lines, color='black', weight=2, opacity=0.8).add_to(map6)

nodes_df2 = pd.read_pickle('outputs/nodes_df2.pickle')
suspected_links_df2 = pd.read_pickle('outputs/suspected_links_df2.pickle')

# loop through dataframe records
for each in pd.concat([nodes_df2, suspected_links_df2]).iterrows():
    # load record values into marker variables
    point=each[1]['COORDS']
    caseno=each[1]['CaseNo']
    miru_id=each[1]['FAKEMIRUID']
    drtype=each[1]['DRTYPE']
    # set up popup display string
    popup_string = "<b>Case No.:</b> "+caseno+"<br/>"+ \
                   "<b>Location:</b> "+point+"<br/>"+ \
                   "<b>MIRU ID:</b> "+str(miru_id)+"<br/>"+ \
                   "<b>Drug Resistance:</b> "+drtype
    # construct Folium popup
    popup=folium.Popup(popup_string, max_width='200')
    
    # Add the right marker to map depending on MIRU ID
    if miru_id == 1:
        folium.RegularPolygonMarker(point.split(','), \
            popup=popup, fill_color=marker_color[miru_id], \
            color='black', fill_opacity=1, weight=0.8,\
            number_of_sides=3, radius=10).add_to(map6)
map6

These are the two network maps side by side. Do you notice any missing links?
<img src="images/side-by-side.png" alt="Network maps 1 and 2" width="550px" style="float:left;">

### Second Interview

The contact investigation team has conducted secondary interviews and they have found additional epi-links. They have provided a new spreadsheet with the additional epi-linked cases. We will read this spreadsheet as a new data frame. Please run the code below.

In [38]:
# read the spreadsheet of new epi-linked cases
# Import the excel file and call it xls_file5
xls_file5 = \
    pd.ExcelFile('resources/Thailand_cases_exercise_5th_spreadsheet_07312017.xls')

xls_file5

<pandas.io.excel._base.ExcelFile at 0x7f2b44574940>

In [39]:
# View the excel file's sheet names
xls_file5.sheet_names

['TBDATA5', 'NEWEPILINKDATA']

In [40]:
# Load the xls file's NEWEPILINKDATA sheet as a dataframe
df9 = xls_file5.parse('NEWEPILINKDATA')

df9

Unnamed: 0,ORIGNO,OLAT,OLON,DESTNO,DLAT,DLON
0,TH-101579,14.702731,100.82844,TH-103927,14.700072,100.829917
1,TH-101579,14.702731,100.82844,TH-104039,14.697903,100.829623
2,TH-104039,14.697903,100.829623,TH-101579,14.702731,100.82844
3,TH-103927,14.700072,100.829917,TH-104090,14.698245,100.830587
4,TH-103927,14.700072,100.829917,TH-101579,14.702731,100.82844


In [41]:
suspected_links_df1

Unnamed: 0,CaseNo
0,TH-101651
1,TH-101931
2,TH-103679
3,TH-104039


In [42]:
df6

Unnamed: 0,ORIGNO,OLAT,OLON,DESTNO,DLAT,DLON
0,TH-101579,14.702731,100.82844,TH-101823,14.699857,100.828917
1,TH-101579,14.702731,100.82844,TH-102637,14.696637,100.829703
2,TH-101823,14.699857,100.828917,TH-101579,14.702731,100.82844
3,TH-101823,14.699857,100.828917,TH-101783,14.696434,100.828634
4,TH-101823,14.699857,100.828917,TH-102445,14.696133,100.82895
5,TH-101823,14.699857,100.828917,TH-102460,14.700809,100.827556
6,TH-101783,14.696434,100.828634,TH-101823,14.699857,100.828917
7,TH-102445,14.696133,100.82895,TH-101823,14.699857,100.828917
8,TH-102460,14.700809,100.827556,TH-101823,14.699857,100.828917
9,TH-102637,14.696637,100.829703,TH-101579,14.702731,100.82844


In [43]:
df6['batch'] = 1
df9['batch'] = 2

In [44]:
df6.to_pickle('outputs/df6.pickle')
df9.to_pickle("outputs/df9.pickle")

In [45]:
# re-create map6, this time showing both batches of epi links
# Store the coordinates of the refugee camp listed in the EOC email
CAMP_COORDINATES = (14.699859, 100.829019)

# create empty map zoomed in on Refugee camp
map6 = folium.Map(location=CAMP_COORDINATES, zoom_start=16)
folium.TileLayer("openstreetmap").add_to(map6)

# construct lines for epi links
lines=[]
for each in df6.iterrows():
    origin = [each[1]['OLAT'],each[1]['OLON']]
    destination = [each[1]['DLAT'],each[1]['DLON']]
    line = [origin, destination]
    lines.append(line)

lines_new=[]
for each in df9.iterrows():
    origin = [each[1]['OLAT'],each[1]['OLON']]
    destination = [each[1]['DLAT'],each[1]['DLON']]
    line = [origin, destination]
    lines_new.append(line)

# create Folium PolyLine and add to map
# first-round links in black, second-round links in red
folium.PolyLine(lines, color='black', weight=2, opacity=0.8).add_to(map6)
folium.PolyLine(lines_new, color='red', weight=2, opacity=0.8).add_to(map6)

df5 = pd.read_pickle('outputs/df5.pickle')
# loop through dataframe records
for each in df5.iterrows():
    # load record values into marker variables
    point=each[1]['COORDS']
    caseno=each[1]['CaseNo']
    miru_id=each[1]['FAKEMIRUID']
    drtype=each[1]['DRTYPE']
    # set up popup display string
    popup_string = "<b>Case No.:</b> "+caseno+"<br/>"+ \
                   "<b>Location:</b> "+point+"<br/>"+ \
                   "<b>MIRU ID:</b> "+str(miru_id)+"<br/>"+ \
                   "<b>Drug Resistance:</b> "+drtype
    # construct Folium popup
    popup=folium.Popup(popup_string, max_width='200')
    
    # Add the right marker to map depending on MIRU ID
    if miru_id == 1:
        folium.RegularPolygonMarker(point.split(','), \
            popup=popup, fill_color=marker_color[miru_id], \
            color='black', fill_opacity=1, weight=0.8,\
            number_of_sides=3, radius=10).add_to(map6)
map6

### Merge `df6` and `df9`

In [46]:
df6.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 16 entries, 0 to 15
Data columns (total 7 columns):
ORIGNO    16 non-null object
OLAT      16 non-null float64
OLON      16 non-null float64
DESTNO    16 non-null object
DLAT      16 non-null float64
DLON      16 non-null float64
batch     16 non-null int64
dtypes: float64(4), int64(1), object(2)
memory usage: 1.0+ KB


In [47]:
df9.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 7 columns):
ORIGNO    5 non-null object
OLAT      5 non-null float64
OLON      5 non-null float64
DESTNO    5 non-null object
DLAT      5 non-null float64
DLON      5 non-null float64
batch     5 non-null int64
dtypes: float64(4), int64(1), object(2)
memory usage: 408.0+ bytes


Because the two dataframes have the same columns, we can simply concatenate them.

In [48]:
# ignore_index=True renumbers the rows 0..n-1 in one step
df10 = pd.concat([df6, df9], ignore_index=True)

df10

Unnamed: 0,ORIGNO,OLAT,OLON,DESTNO,DLAT,DLON,batch
0,TH-101579,14.702731,100.82844,TH-101823,14.699857,100.828917,1
1,TH-101579,14.702731,100.82844,TH-102637,14.696637,100.829703,1
2,TH-101823,14.699857,100.828917,TH-101579,14.702731,100.82844,1
3,TH-101823,14.699857,100.828917,TH-101783,14.696434,100.828634,1
4,TH-101823,14.699857,100.828917,TH-102445,14.696133,100.82895,1
5,TH-101823,14.699857,100.828917,TH-102460,14.700809,100.827556,1
6,TH-101783,14.696434,100.828634,TH-101823,14.699857,100.828917,1
7,TH-102445,14.696133,100.82895,TH-101823,14.699857,100.828917,1
8,TH-102460,14.700809,100.827556,TH-101823,14.699857,100.828917,1
9,TH-102637,14.696637,100.829703,TH-101579,14.702731,100.82844,1


In [49]:
df10.to_pickle('outputs/df10.pickle')

### Prepare an additional `popup` item: number of links per case

In [50]:
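# take an explicit copy so the column assignment below does not trigger SettingWithCopyWarning
subset = df10[['ORIGNO','DESTNO','batch']].copy()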
subset['batch'] = subset['batch'].astype(str)
#edge_tuples = [tuple(x) for x in subset.values]
subset.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 21 entries, 0 to 20
Data columns (total 3 columns):
ORIGNO    21 non-null object
DESTNO    21 non-null object
batch     21 non-null object
dtypes: object(3)
memory usage: 632.0+ bytes


In [51]:
subset.values

array([['TH-101579', 'TH-101823', '1'],
       ['TH-101579', 'TH-102637', '1'],
       ['TH-101823', 'TH-101579', '1'],
       ['TH-101823', 'TH-101783', '1'],
       ['TH-101823', 'TH-102445', '1'],
       ['TH-101823', 'TH-102460', '1'],
       ['TH-101783', 'TH-101823', '1'],
       ['TH-102445', 'TH-101823', '1'],
       ['TH-102460', 'TH-101823', '1'],
       ['TH-102637', 'TH-101579', '1'],
       ['TH-103347', 'TH-103009', '1'],
       ['TH-103347', 'TH-102909', '1'],
       ['TH-103347', 'TH-103773', '1'],
       ['TH-103009', 'TH-103347', '1'],
       ['TH-102909', 'TH-103347', '1'],
       ['TH-103773', 'TH-103347', '1'],
       ['TH-101579', 'TH-103927', '2'],
       ['TH-101579', 'TH-104039', '2'],
       ['TH-104039', 'TH-101579', '2'],
       ['TH-103927', 'TH-104090', '2'],
       ['TH-103927', 'TH-101579', '2']], dtype=object)

In [52]:
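# sort each row so that (A, B) and (B, A) collapse into one undirected edge;
# the batch label ('1' or '2') sorts ahead of the 'TH-...' case numbers, so it ends up first
no_duplicates2 = set( [ tuple(sorted(i)) for i in subset.values ])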

no_duplicates2

{('1', 'TH-101579', 'TH-101823'),
 ('1', 'TH-101579', 'TH-102637'),
 ('1', 'TH-101783', 'TH-101823'),
 ('1', 'TH-101823', 'TH-102445'),
 ('1', 'TH-101823', 'TH-102460'),
 ('1', 'TH-102909', 'TH-103347'),
 ('1', 'TH-103009', 'TH-103347'),
 ('1', 'TH-103347', 'TH-103773'),
 ('2', 'TH-101579', 'TH-103927'),
 ('2', 'TH-101579', 'TH-104039'),
 ('2', 'TH-103927', 'TH-104090')}
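
Sorting all three strings in a row keeps the batch label in first position only because `'1'` and `'2'` happen to sort before `'TH-...'`. A minimal sketch of a variant that sorts only the two case numbers and leaves the batch label alone is shown here. The helper name `undirected_edges` is hypothetical, and the records are a few rows copied from the array above.

```python
# A few directed link records from the array above: (origin, destination, batch)
records = [
    ("TH-101579", "TH-101823", "1"),
    ("TH-101823", "TH-101579", "1"),
    ("TH-101579", "TH-103927", "2"),
    ("TH-103927", "TH-101579", "2"),
    ("TH-103927", "TH-104090", "2"),
]


def undirected_edges(rows):
    """Return unique (batch, case_a, case_b) tuples with case_a <= case_b."""
    unique = set()
    for origin, destination, batch in rows:
        # Order only the two endpoints; the batch label is never compared to them
        case_a, case_b = sorted((origin, destination))
        unique.add((batch, case_a, case_b))
    return sorted(unique)


edges = undirected_edges(records)
for edge in edges:
    print(edge)
```

This version does not depend on how the batch labels compare with the case numbers, so it would keep working if the labels were changed to something like `'new'` and `'old'`.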

In [53]:
edges_df4 = pd.DataFrame(no_duplicates2)

edges_df4.rename(columns={0:'batch', 1:'CaseNo1',2:'CaseNo2'}, inplace=True)

edges_df4

Unnamed: 0,batch,CaseNo1,CaseNo2
0,1,TH-101579,TH-102637
1,2,TH-103927,TH-104090
2,1,TH-102909,TH-103347
3,1,TH-101783,TH-101823
4,1,TH-101579,TH-101823
5,2,TH-101579,TH-104039
6,1,TH-101823,TH-102445
7,1,TH-101823,TH-102460
8,2,TH-101579,TH-103927
9,1,TH-103347,TH-103773


In [54]:
edges_df5 = pd.merge(edges_df4, df5[['CaseNo','LAT','LON']], \
                     left_on='CaseNo1', right_on='CaseNo', how='inner')

edges_df5.rename(columns={'LAT':'LAT1', 'LON':'LON1'}, inplace=True)

edges_df5.drop(columns=['CaseNo'], inplace=True)

edges_df6 = pd.merge(edges_df5, df5[['CaseNo','LAT','LON']], \
                    left_on='CaseNo2', right_on='CaseNo', how='inner')

edges_df6.rename(columns={'LAT':'LAT2', 'LON':'LON2'}, inplace=True)

edges_df6.drop(columns=['CaseNo'], inplace=True)

edges_df6

Unnamed: 0,batch,CaseNo1,CaseNo2,LAT1,LON1,LAT2,LON2
0,1,TH-101579,TH-102637,14.702731,100.82844,14.696637,100.829703
1,1,TH-101579,TH-101823,14.702731,100.82844,14.699857,100.828917
2,1,TH-101783,TH-101823,14.696434,100.828634,14.699857,100.828917
3,2,TH-101579,TH-104039,14.702731,100.82844,14.697903,100.829623
4,2,TH-101579,TH-103927,14.702731,100.82844,14.700072,100.829917
5,2,TH-103927,TH-104090,14.700072,100.829917,14.698245,100.830587
6,1,TH-102909,TH-103347,14.69898,100.830151,14.697342,100.830247
7,1,TH-103009,TH-103347,14.699261,100.831095,14.697342,100.830247
8,1,TH-101823,TH-102445,14.699857,100.828917,14.696133,100.82895
9,1,TH-101823,TH-102460,14.699857,100.828917,14.700809,100.827556


In [55]:
pairs = pd.Series(pd.concat([edges_df4['CaseNo1'],edges_df4['CaseNo2']]).tolist())

pairs

0     TH-101579
1     TH-103927
2     TH-102909
3     TH-101783
4     TH-101579
5     TH-101579
6     TH-101823
7     TH-101823
8     TH-101579
9     TH-103347
10    TH-103009
11    TH-102637
12    TH-104090
13    TH-103347
14    TH-101823
15    TH-101823
16    TH-104039
17    TH-102445
18    TH-102460
19    TH-103927
20    TH-103773
21    TH-103347
dtype: object

In [56]:
nodes_df3 = pd.DataFrame(pd.melt(edges_df4[['CaseNo1','CaseNo2']]).value.unique())
nodes_df3.rename(columns={0:'CaseNo'}, inplace=True)

nodes_df3

Unnamed: 0,CaseNo
0,TH-101579
1,TH-103927
2,TH-102909
3,TH-101783
4,TH-101823
5,TH-103347
6,TH-103009
7,TH-102637
8,TH-104090
9,TH-104039


In [57]:
[ i  for i in pairs ].count('TH-101579')

4

In [58]:
pair_list = pairs.tolist()
link_count = []
for case_no in nodes_df3['CaseNo']:
    # count the edges in which this case appears as either endpoint
    link_count.append([case_no, pair_list.count(case_no)])

link_count_df1 = pd.DataFrame(link_count).rename(columns={0:'CaseNo',1:'links'})

link_count_df1

Unnamed: 0,CaseNo,links
0,TH-101579,4
1,TH-103927,2
2,TH-102909,1
3,TH-101783,1
4,TH-101823,4
5,TH-103347,3
6,TH-103009,1
7,TH-102637,1
8,TH-104090,1
9,TH-104039,1
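
The same per-case link count can also be obtained without an explicit loop. The sketch below uses `value_counts()` on the stacked endpoint columns. The frame `edges` is a hypothetical three-row excerpt of the edge list, so the counts differ from the full table above.

```python
import pandas as pd

# Hypothetical excerpt of the undirected edge list (one row per linked pair)
edges = pd.DataFrame({
    "CaseNo1": ["TH-101579", "TH-101579", "TH-103927"],
    "CaseNo2": ["TH-101823", "TH-103927", "TH-104090"],
})

# Stack both endpoint columns, then count how often each case number appears
endpoints = pd.concat([edges["CaseNo1"], edges["CaseNo2"]], ignore_index=True)
link_counts = (
    endpoints.value_counts()
    .rename_axis("CaseNo")
    .reset_index(name="links")
)
print(link_counts)
```

Each case's count equals the number of edges it participates in, which is the quantity the loop above computes one case at a time.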


In [59]:
nodes_df4 = pd.merge(nodes_df3, link_count_df1, on='CaseNo', how='inner')

nodes_df4

Unnamed: 0,CaseNo,links
0,TH-101579,4
1,TH-103927,2
2,TH-102909,1
3,TH-101783,1
4,TH-101823,4
5,TH-103347,3
6,TH-103009,1
7,TH-102637,1
8,TH-104090,1
9,TH-104039,1


In [60]:
nodes_df5 = pd.merge(nodes_df4, df5, on='CaseNo', how='inner')

nodes_df5

Unnamed: 0,CaseNo,links,FAKEMIRUVNTR,FAKEMIRUID,DRTYPE,LON,LAT,COORDS,SYMBOL
0,TH-101579,4,012345678901234567890123,1,MDR-TB,100.82844,14.702731,"14.702731,100.82844",Red Triangle
1,TH-103927,2,012345678901234567890123,1,MDR-TB,100.829917,14.700072,"14.700072,100.829917",Red Triangle
2,TH-102909,1,012345678901234567890123,1,MDR-TB,100.830151,14.69898,"14.69898,100.830151",Red Triangle
3,TH-101783,1,012345678901234567890123,1,MDR-TB,100.828634,14.696434,"14.696434,100.828634",Red Triangle
4,TH-101823,4,012345678901234567890123,1,MDR-TB,100.828917,14.699857,"14.699857,100.828917",Red Triangle
5,TH-103347,3,012345678901234567890123,1,MDR-TB,100.830247,14.697342,"14.697342,100.830247",Red Triangle
6,TH-103009,1,012345678901234567890123,1,MDR-TB,100.831095,14.699261,"14.699261,100.831095",Red Triangle
7,TH-102637,1,012345678901234567890123,1,MDR-TB,100.829703,14.696637,"14.696637,100.829703",Red Triangle
8,TH-104090,1,012345678901234567890123,1,MDR-TB,100.830587,14.698245,"14.698245,100.830587",Red Triangle
9,TH-104039,1,012345678901234567890123,1,MDR-TB,100.829623,14.697903,"14.697903,100.829623",Red Triangle


In [61]:
link_count_df1.to_pickle('outputs/link_count_df1.pickle')
nodes_df5.to_pickle('outputs/nodes_df5.pickle')
edges_df6.to_pickle('outputs/edges_df6.pickle')

### Merge `df5` and `link_count_df1`

Let's do a left merge of `df5` with `link_count_df1`, so that every case in `df5` is kept even if it has no links.

In [62]:
df11 = pd.merge(df5, link_count_df1, on='CaseNo', how='left')

df11.head(10)

Unnamed: 0,CaseNo,FAKEMIRUVNTR,FAKEMIRUID,DRTYPE,LON,LAT,COORDS,SYMBOL,links
0,TH-102678,012345678901234567893120,3,PANSUSCEPTIBLE,100.828607,14.704461,"14.704461,100.828607",Blue Circle,
1,TH-101007,012345678901234567894320,4,UNKNOWN,100.829347,14.702266,"14.702266,100.829347",Green Circle,
2,TH-101290,012345678901234567894320,4,UNKNOWN,100.825159,14.699828,"14.699828,100.825159",Green Circle,
3,TH-101067,012345678901234567894320,4,UNKNOWN,100.824887,14.700197,"14.700197,100.824887",Green Circle,
4,TH-101184,012345678901234567890423,5,UNKNOWN,100.829032,14.697482,"14.697482,100.829032",Green Circle,
5,TH-100913,012345678901234567893120,3,PANSUSCEPTIBLE,100.829261,14.702418,"14.702418,100.829261",Green Circle,
6,TH-101176,012345678901234567894320,4,UNKNOWN,100.829228,14.702959,"14.702959,100.829228",Green Circle,
7,TH-101497,012345678901234567894320,4,UNKNOWN,100.82914,14.702244,"14.702244,100.82914",Green Circle,
8,TH-101280,012345678901234567894320,4,UNKNOWN,100.829344,14.702908,"14.702908,100.829344",Blue Circle,
9,TH-101055,012345678901234567894320,4,UNKNOWN,100.829494,14.701485,"14.701485,100.829494",Green Circle,


As expected, cases without any links end up with `NaN` (not a number) in the `links` column, so let's fill those with zeros.

In [63]:
# only `links` is expected to contain NaN, so fill that column alone
df11['links'] = df11['links'].fillna(0).astype(int)

df11.head(10)

Unnamed: 0,CaseNo,FAKEMIRUVNTR,FAKEMIRUID,DRTYPE,LON,LAT,COORDS,SYMBOL,links
0,TH-102678,012345678901234567893120,3,PANSUSCEPTIBLE,100.828607,14.704461,"14.704461,100.828607",Blue Circle,0
1,TH-101007,012345678901234567894320,4,UNKNOWN,100.829347,14.702266,"14.702266,100.829347",Green Circle,0
2,TH-101290,012345678901234567894320,4,UNKNOWN,100.825159,14.699828,"14.699828,100.825159",Green Circle,0
3,TH-101067,012345678901234567894320,4,UNKNOWN,100.824887,14.700197,"14.700197,100.824887",Green Circle,0
4,TH-101184,012345678901234567890423,5,UNKNOWN,100.829032,14.697482,"14.697482,100.829032",Green Circle,0
5,TH-100913,012345678901234567893120,3,PANSUSCEPTIBLE,100.829261,14.702418,"14.702418,100.829261",Green Circle,0
6,TH-101176,012345678901234567894320,4,UNKNOWN,100.829228,14.702959,"14.702959,100.829228",Green Circle,0
7,TH-101497,012345678901234567894320,4,UNKNOWN,100.82914,14.702244,"14.702244,100.82914",Green Circle,0
8,TH-101280,012345678901234567894320,4,UNKNOWN,100.829344,14.702908,"14.702908,100.829344",Blue Circle,0
9,TH-101055,012345678901234567894320,4,UNKNOWN,100.829494,14.701485,"14.701485,100.829494",Green Circle,0
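
To see which rows a left merge leaves unmatched, the `indicator` option of `pd.merge` can help. The sketch below uses three case numbers and link counts taken from this notebook. The frame names `cases` and `link_counts` are hypothetical.

```python
import pandas as pd

# Three cases from this notebook; only two of them appear in the link counts
cases = pd.DataFrame({
    "CaseNo": ["TH-101579", "TH-101055", "TH-104090"],
    "DRTYPE": ["MDR-TB", "UNKNOWN", "MDR-TB"],
})
link_counts = pd.DataFrame({
    "CaseNo": ["TH-101579", "TH-104090"],
    "links": [4, 1],
})

# how='left' keeps every row of `cases`; indicator=True adds a `_merge` column
merged = pd.merge(cases, link_counts, on="CaseNo", how="left", indicator=True)

# Rows marked 'left_only' had no match, so their `links` value is NaN
merged["links"] = merged["links"].fillna(0).astype(int)
print(merged)
```

Rows flagged `left_only` are exactly the cases that receive a link count of zero.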


In [64]:
df11.query('CaseNo=="TH-104090"')

Unnamed: 0,CaseNo,FAKEMIRUVNTR,FAKEMIRUID,DRTYPE,LON,LAT,COORDS,SYMBOL,links
181,TH-104090,012345678901234567890123,1,MDR-TB,100.830587,14.698245,"14.698245,100.830587",Red Triangle,1


In [65]:
df11.query('CaseNo=="TH-101055"')

Unnamed: 0,CaseNo,FAKEMIRUVNTR,FAKEMIRUID,DRTYPE,LON,LAT,COORDS,SYMBOL,links
9,TH-101055,012345678901234567894320,4,UNKNOWN,100.829494,14.701485,"14.701485,100.829494",Green Circle,0


In [66]:
df11.query('FAKEMIRUID==1')

Unnamed: 0,CaseNo,FAKEMIRUVNTR,FAKEMIRUID,DRTYPE,LON,LAT,COORDS,SYMBOL,links
21,TH-101579,012345678901234567890123,1,MDR-TB,100.82844,14.702731,"14.702731,100.82844",Red Triangle,4
31,TH-101823,012345678901234567890123,1,MDR-TB,100.828917,14.699857,"14.699857,100.828917",Red Triangle,4
33,TH-101783,012345678901234567890123,1,MDR-TB,100.828634,14.696434,"14.696434,100.828634",Red Triangle,1
61,TH-102445,012345678901234567890123,1,MDR-TB,100.82895,14.696133,"14.696133,100.82895",Red Triangle,1
62,TH-102460,012345678901234567890123,1,MDR-TB,100.827556,14.700809,"14.700809,100.827556",Red Triangle,1
89,TH-102637,012345678901234567890123,1,MDR-TB,100.829703,14.696637,"14.696637,100.829703",Red Triangle,1
100,TH-101651,012345678901234567890123,1,MDR-TB,100.829259,14.700701,"14.700701,100.829259",Red Triangle,0
122,TH-101931,012345678901234567890123,1,MDR-TB,100.829533,14.700087,"14.700087,100.829533",Red Triangle,0
129,TH-103347,012345678901234567890123,1,MDR-TB,100.830247,14.697342,"14.697342,100.830247",Red Triangle,3
134,TH-103009,012345678901234567890123,1,MDR-TB,100.831095,14.699261,"14.699261,100.831095",Red Triangle,1


In [67]:
df11.to_pickle('outputs/df11.pickle')

### Map 7: Plot map with enhanced `popup` displays

In [68]:
edges_df6.loc[edges_df6['batch'] == '1']

Unnamed: 0,batch,CaseNo1,CaseNo2,LAT1,LON1,LAT2,LON2
0,1,TH-101579,TH-102637,14.702731,100.82844,14.696637,100.829703
1,1,TH-101579,TH-101823,14.702731,100.82844,14.699857,100.828917
2,1,TH-101783,TH-101823,14.696434,100.828634,14.699857,100.828917
6,1,TH-102909,TH-103347,14.69898,100.830151,14.697342,100.830247
7,1,TH-103009,TH-103347,14.699261,100.831095,14.697342,100.830247
8,1,TH-101823,TH-102445,14.699857,100.828917,14.696133,100.82895
9,1,TH-101823,TH-102460,14.699857,100.828917,14.700809,100.827556
10,1,TH-103347,TH-103773,14.697342,100.830247,14.700229,100.82848


In [69]:
edges_df6.loc[edges_df6['batch'] == '2']

Unnamed: 0,batch,CaseNo1,CaseNo2,LAT1,LON1,LAT2,LON2
3,2,TH-101579,TH-104039,14.702731,100.82844,14.697903,100.829623
4,2,TH-101579,TH-103927,14.702731,100.82844,14.700072,100.829917
5,2,TH-103927,TH-104090,14.700072,100.829917,14.698245,100.830587


Map 7

In [70]:
# create 7th map of the case data 
# Store the coordinates of the refugee camp listed in the EOC email
CAMP_COORDINATES = (14.699859, 100.829019)

# create empty map zoomed in on Refugee camp
map7 = folium.Map(location=CAMP_COORDINATES, zoom_start=16)
folium.TileLayer("openstreetmap").add_to(map7)

# construct lines for epi links
lines=[]
for each in edges_df6.loc[edges_df6['batch'] == '1'].iterrows():
    origin = [each[1]['LAT1'],each[1]['LON1']]
    destination = [each[1]['LAT2'],each[1]['LON2']]
    line = [origin, destination]
    lines.append(line)

lines_new=[]
for each in edges_df6.loc[edges_df6['batch'] == '2'].iterrows():
    origin = [each[1]['LAT1'],each[1]['LON1']]
    destination = [each[1]['LAT2'],each[1]['LON2']]
    line = [origin, destination]
    lines_new.append(line)

# create Folium PolyLine and add to map
# `opacity` is the stroke-opacity option; `line_opacity` is not recognized and is ignored
folium.PolyLine(lines, color='black',weight=2,opacity=0.8).add_to(map7)
folium.PolyLine(lines_new, color='red',weight=2,opacity=0.8).add_to(map7)

# loop through dataframe records
for each in nodes_df5.iterrows():
    # load record values into marker variables
    point=each[1]['COORDS']
    caseno=each[1]['CaseNo']
    miru_id=each[1]['FAKEMIRUID']
    drtype=each[1]['DRTYPE']
    links=each[1]['links']
    # Add the right marker to map depending on MIRU ID
    if miru_id == 1:
        # set up popup display string
        popup_string = "<b>Case No.:</b> "+caseno+"<br/>"+ \
                       "<b>Location:</b> "+point+"<br/>"+ \
                       "<b>MIRU ID:</b> "+str(miru_id)+"<br/>"+ \
                       "<b>Drug Resistance:</b> "+drtype+"<br/>" + \
                       "<b>Link Count:</b> "+str(links)
        # construct Folium popup
        popup=folium.Popup(popup_string, max_width=200)
        folium.RegularPolygonMarker(point.split(','), \
            popup=popup, fill_color=marker_color[miru_id], \
            color='black', fill_opacity=1, weight=0.8,\
            number_of_sides=3, radius=10).add_to(map7)
map7
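
The two loops that build `lines` and `lines_new` differ only in the batch they filter on. A minimal sketch of a shared helper is shown here. The function name `segments_for_batch` is hypothetical, and the two edge rows are copied from `edges_df6`. The sketch stops short of drawing anything, so it does not need `folium`.

```python
import pandas as pd

# Two rows copied from edges_df6: one first-batch link and one second-batch link
edges = pd.DataFrame({
    "batch": ["1", "2"],
    "LAT1": [14.702731, 14.700072],
    "LON1": [100.82844, 100.829917],
    "LAT2": [14.699857, 14.698245],
    "LON2": [100.828917, 100.830587],
})


def segments_for_batch(edge_table, batch):
    """Return [[lat1, lon1], [lat2, lon2]] segments for one batch label."""
    rows = edge_table.loc[edge_table["batch"] == batch]
    return [
        [[row.LAT1, row.LON1], [row.LAT2, row.LON2]]
        for row in rows.itertuples(index=False)
    ]


old_links = segments_for_batch(edges, "1")
new_links = segments_for_batch(edges, "2")
print(old_links)
print(new_links)
```

Each returned list has the same nested shape as `lines` and `lines_new`, so it could be passed to `folium.PolyLine` in the same way.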

## Case Study Questions

### Question 1

Recent papers from South Africa suggest that XDR-TB threats involve direct transmission, and that stopping transmission requires high-precision geographic data and molecular epidemiology data. The 2005 outbreak in Thailand suggests the same. How can we plan to gather the high-precision geographic and molecular epidemiology data needed for future outbreaks?

Please type your answer below:


### Question 2


Tuberculosis is often found in areas with crowding and poor ventilation. How can we learn from the past to predict where future outbreaks are likely to occur?

Please type your answer below:


### Congratulations, you have completed Notebook 6 of the MDR-TB Case Study!


## References


CDC. 2017. Tuberculosis Genotyping: What is tuberculosis (TB) genotyping? CDC TB Fact Sheets 2017. URL: https://www.cdc.gov/tb/publications/factsheets/statistics/genotyping.htm

CDC. 2017. GENType: New Genotyping Terminology to Integrate 24-locus MIRU-VNTR. CDC TB Fact Sheets 2017. URL: https://www.cdc.gov/tb/publications/factsheets/statistics/genotypingterminology.pdf

CDC. 2017. A New Tool to Diagnose Tuberculosis: The Xpert MTB/RIF Assay. URL: https://www.cdc.gov/tb/publications/factsheets/pdf/xpertmtb-rifassayfactsheet_final.pdf

Oeltmann, J. E., Varma, J. K., Ortega, L., Liu, Y., O’Rourke, T., Cano, M., … Maloney, S. A. (2008). Multidrug-Resistant Tuberculosis Outbreak among US-bound Hmong Refugees, Thailand, 2005. Emerging Infectious Diseases, 14(11), 1715–1721. http://doi.org/10.3201/eid1411.071629

Shah, N.S., et al. 2017. Transmission of Extensively Drug-Resistant Tuberculosis in South Africa. New England Journal of Medicine. January 19, 2017. 376:3. URL: http://www.nejm.org/doi/pdf/10.1056/NEJMoa1604544

Additional readings:

Reichman, Lee B., and Janice Hopkins Tanne. 2001. Timebomb: the global epidemic of multi-drug resistant tuberculosis. ISBN 0-07-135924-9. McGraw-Hill. URL: https://www.goodreads.com/book/show/1733578