# Lab 04 Tasks

This lab works with a dataset of contacts between patients and health-care workers in a hospital ward in France. The data was gathered using wearable sensors that detect face-to-face, close-range proximity between the participants wearing them.

For more details see:

- http://www.sociopatterns.org/datasets/hospital-ward-dynamic-contact-network/
- Génois, M., & Barrat, A. (2018). Can co-location be used as a proxy for face-to-face contacts? EPJ Data Science, 7(1), 11.

The raw data is provided as two tab-separated files:

1. *hospital-metadata.csv*: One line per participant in the hospital ward, where each participant has a unique numeric ID and a "status" (e.g. patient, doctor, etc.).
2. *hospital-contacts.csv*: Each line indicates a contact event between two participants, with the time of the event and the numeric IDs of the participants involved.

### Task 1

Load the hospital metadata and contact data into two separate Pandas Data Frames.

In [3]:
import pandas as pd

df_metadata = pd.read_csv("lab04-data/hospital-metadata.csv", sep="\t").set_index("participant_id")

df_metadata.head()

participant_id,status
1179,admin
1232,admin
1658,admin
1209,admin
1098,admin


In [5]:
df_contactdata = pd.read_csv("lab04-data/hospital-contacts.csv", sep="\t")
df_contactdata.head()

,time,participant1,participant2
0,40080,1238,1246
1,41580,1238,1246
2,41660,1238,1246
3,41760,1238,1246
4,44460,1238,1246
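
Before building a network it can help to confirm that the two tables agree. The sketch below is only an illustration: it uses a small made-up sample in place of the real files, and the helper name `unknown_participants` is hypothetical. It checks that every ID in the contact table also appears in the metadata index.

```python
import io

import pandas as pd

# Made-up tab-separated samples standing in for the real files (assumption).
metadata_tsv = "participant_id\tstatus\n1\tdoctor\n2\tnurse\n3\tpatient\n"
contacts_tsv = "time\tparticipant1\tparticipant2\n20\t1\t2\n40\t2\t3\n60\t1\t4\n"

df_meta = pd.read_csv(io.StringIO(metadata_tsv), sep="\t").set_index("participant_id")
df_contacts = pd.read_csv(io.StringIO(contacts_tsv), sep="\t")


def unknown_participants(contacts, metadata):
    """Return the sorted IDs that occur in the contacts but not in the metadata."""
    seen = pd.concat([contacts["participant1"], contacts["participant2"]]).unique()
    return sorted(int(p) for p in seen if p not in metadata.index)


missing = unknown_participants(df_contacts, df_meta)
print(missing)
```

An empty list would mean that every contact refers to a known participant; in this made-up sample, participant 4 has no metadata row.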


### Task 2

Create a weighted undirected network from the two Data Frames, such that:

- There is a node for every participant in the study. Each node should have an attribute indicating their "status".
- Edges between nodes have a "weight", indicating the number of times two participants (nodes) have been in contact.

What size is the resulting network?

In [6]:
import networkx as nx

#Create Weighted Undirected Graph of Participants
g = nx.Graph()
for participant_id, row in df_metadata.iterrows():
    g.add_node(participant_id, status=row["status"])


NodeView((1179, 1232, 1658, 1209, 1098, 1671, 1525, 1535, 1109, 1115, 1142, 1149, 1190, 1193, 1196, 1202, 1207, 1210, 1613, 1632, 1105, 1114, 1116, 1164, 1181, 1205, 1238, 1245, 1246, 1261, 1295, 1485, 1625, 1629, 1100, 1108, 1130, 1148, 1221, 1260, 1157, 1191, 1144, 1152, 1159, 1168, 1660, 1305, 1307, 1320, 1323, 1327, 1332, 1352, 1362, 1363, 1365, 1373, 1374, 1377, 1378, 1383, 1385, 1391, 1393, 1395, 1399, 1401, 1416, 1460, 1469, 1547, 1701, 1702, 1769, 1784, 1513, 1518, 1580, 1590, 1594))

In [8]:
#Count number of times two participants have been in contact. This will be edge Weight
from collections import Counter
counts = Counter()
for i, row in df_contactdata.iterrows():
    node1 = row["participant1"]
    node2 = row["participant2"]
    pair = frozenset([node1, node2])
    counts[pair] += 1

In [10]:
# Add one edge per pair, using the contact count as the edge weight
for pair, count in counts.items():
    node1, node2 = pair
    g.add_edge(node1, node2, weight=count)

In [11]:
#View Network Details
print("The network has %d nodes and %d edges" % (g.number_of_nodes(), g.number_of_edges()))

The network has 81 nodes and 1156 edges
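
The counting step relies on treating (a, b) and (b, a) as the same pair. A minimal, standard-library-only sketch of that idea follows, using made-up contact events and a sorted tuple as an alternative key to `frozenset`. The function name `count_contacts` is hypothetical.

```python
from collections import Counter


def count_contacts(events):
    """Count contacts per unordered pair of participants."""
    counts = Counter()
    for a, b in events:
        # Sorting makes (a, b) and (b, a) map to the same key.
        counts[tuple(sorted((a, b)))] += 1
    return counts


# Made-up events: the same pair appears three times, in both orders.
events = [(1238, 1246), (1246, 1238), (1238, 1100), (1246, 1238)]
weights = count_contacts(events)
print(weights[(1238, 1246)])
```

A sorted tuple always has two elements, so it unpacks directly as `node1, node2` even if a row ever lists the same participant twice, whereas a `frozenset` would collapse such a row to a single element.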


### Task 3

From the network created in Task 2, remove all isolated nodes (i.e. nodes with degree 0).

In [15]:
# List the isolated nodes (nodes with degree 0)
list(nx.isolates(g))

[1671, 1632, 1116, 1629, 1152]

In [17]:
#Remove All Isolated Nodes
g.remove_nodes_from(list(nx.isolates(g)))

In [18]:
#View Network Details
print("The network has %d nodes and %d edges" % (g.number_of_nodes(), g.number_of_edges()))

The network has 76 nodes and 1156 edges
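
The node count drops while the edge count stays the same, because an isolated node has no edges to lose. The sketch below reproduces that behaviour on a made-up four-node graph (the graph and its labels are assumptions for illustration only).

```python
import networkx as nx

# Toy graph: "d" has no edges, so it is the only isolated node.
toy = nx.Graph()
toy.add_nodes_from(["a", "b", "c", "d"])
toy.add_edge("a", "b", weight=2)
toy.add_edge("b", "c", weight=1)

edges_before = toy.number_of_edges()

# Materialise the isolates as a list before mutating the graph.
isolated = list(nx.isolates(toy))
toy.remove_nodes_from(isolated)

print(isolated, toy.number_of_nodes(), toy.number_of_edges())
```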


### Task 4

Export the filtered network from Task 3 as a new GEXF file.

In [19]:
nx.write_gexf(g, "hospital.gexf")
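
One way to check the export before opening it in Gephi is to read the file back. The sketch below does this for a made-up two-node graph written to a temporary directory; the graph and file name are assumptions. Node IDs are stored as text in the file, so `node_type=int` is passed to `nx.read_gexf` to convert them back to integers.

```python
import os
import tempfile

import networkx as nx

# Toy graph with the same kinds of attributes as the hospital network.
toy = nx.Graph()
toy.add_node(1, status="doctor")
toy.add_node(2, status="nurse")
toy.add_edge(1, 2, weight=3)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy.gexf")
    nx.write_gexf(toy, path)
    # node_type=int converts the IDs back from the text stored in the file.
    loaded = nx.read_gexf(path, node_type=int)

print(loaded.nodes[1]["status"], loaded[1][2]["weight"])
```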

### Task 5

Load the GEXF file from Task 4 in *Gephi*, and complete the following steps:
    
1. Colour the nodes in the network, based on their "status" attribute, via the *Appearance Panel* on the *Overview* screen.
2. Calculate the **weighted degree** of the nodes, via the *Statistics Panel* on the *Overview* screen.
3. Scale the size of the nodes, via the *Appearance Panel* on the *Overview* screen, ranked based on their **weighted degree**.
4. Apply the **Force Atlas** algorithm to lay out the network on the *Overview* screen. Adjust the parameters on the *Layout* panel to improve the network's readability.
5. Filter the nodes in the network to only display nodes with the "status" attribute equal to "doctor", via the *Filters Panel* on the *Overview* screen.
6. Use the *Preview* screen to adjust the final appearance of the network, and then export an image of the network as a PNG file.
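
Steps 2 and 5 can be cross-checked outside Gephi. The sketch below is not part of the Gephi workflow; it computes the weighted degree and a "doctor"-only subgraph in NetworkX on a made-up three-node graph, so the same calls could be compared against the values Gephi reports for the real network.

```python
import networkx as nx

# Made-up graph: two doctors and one nurse.
toy = nx.Graph()
toy.add_node(1, status="doctor")
toy.add_node(2, status="doctor")
toy.add_node(3, status="nurse")
toy.add_edge(1, 2, weight=5)
toy.add_edge(1, 3, weight=2)

# Weighted degree: sum of the weights of the edges attached to each node.
weighted_degree = dict(toy.degree(weight="weight"))

# Induced subgraph containing only the nodes whose status is "doctor".
doctors = [n for n, status in toy.nodes(data="status") if status == "doctor"]
doctor_net = toy.subgraph(doctors)

print(weighted_degree, sorted(doctor_net.nodes()), doctor_net.number_of_edges())
```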

In [None]:
#See hospital.gephi