# Synthetic Human Contact Network

This is the first step in creating the Urban Transportation Disease Spread Simulator (TranEpiSim). The synthetic human contact network is the backbone of the simulator. I used the methodology proposed by [**Burger et al. (2017)**](https://dl.acm.org/doi/abs/10.1145/3145574.3145593?casa_token=z6J9RUlCP3UAAAAA:7Ie0L7W9_bmIsaR9JHfutQgyYJ9aIj0dC2ZnuGhQaF4x46QOczVq1FwfH7dsDloAcwDCH26pjOM) to create a large-scale human contact network adapted for a disease system. The code for constructing the network was obtained from **Talha Oz's** Jupyter notebook, which can be accessed [**here**](https://nbviewer.org/gist/oztalha/a1c167f3879c5b95f721acef791c8111/Population%20Synthesis%20for%20ABM.ipynb). That notebook has wonderful, detailed instructions, and it is a good starting point if you want to develop a network for your own application. In this notebook I keep the instructions and details concise, except where changes were made to the network.



## 1. Install requirements


* graph-tool documentation:

    https://graph-tool.skewed.de/static/doc/index.html
* See here for instructions on how to install graph-tool on different platforms, including Colab:

    https://graph-tool.skewed.de/static/doc/index.html#installing-graph-tool

    https://git.skewed.de/count0/graph-tool/-/wikis/installation-instructions

* To install a library that is not available in Colaboratory by default, you can use `!apt-get install`. Since graph-tool is not in the official Ubuntu repositories, we first need to add its repository to the apt sources list.

In [1]:
!echo "deb http://downloads.skewed.de/apt focal main" >> /etc/apt/sources.list
!apt-key adv --keyserver keyserver.ubuntu.com --recv-key 612DEFB798507F25
!apt-get update
!apt-get install python3-graph-tool python3-matplotlib python3-cairo

Executing: /tmp/apt-key-gpghome.MG8s5quh4n/gpg.1.sh --keyserver keyserver.ubuntu.com --recv-key 612DEFB798507F25
gpg: key 612DEFB798507F25: public key "Tiago de Paula Peixoto <tiago@skewed.de>" imported
gpg: Total number processed: 1
gpg:               imported: 1
Get:1 https://cloud.r-project.org/bin/linux/ubuntu focal-cran40/ InRelease [3,622 B]
Get:2 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64  InRelease [1,581 B]
Hit:3 http://archive.ubuntu.com/ubuntu focal InRelease
Get:4 http://archive.ubuntu.com/ubuntu focal-updates InRelease [114 kB]
Get:5 http://security.ubuntu.com/ubuntu focal-security InRelease [114 kB]
Hit:6 http://ppa.launchpad.net/c2d4u.team/c2d4u4.0+/ubuntu focal InRelease
Get:7 http://archive.ubuntu.com/ubuntu focal-backports InRelease [108 kB]
Hit:8 http://ppa.launchpad.net/cran/libgit2/ubuntu focal InRelease
Get:9 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64  Packages [1,084 kB]
Hit:10 http://ppa.launchpad.n

In [2]:
# Colab uses a Python install that deviates from the system's, so we need some workarounds.
!apt purge python3-cairo
!apt install libcairo2-dev pkg-config python3-dev
!pip install --force-reinstall pycairo
!pip install zstandard

Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following packages will be REMOVED:
  python3-cairo*
0 upgraded, 0 newly installed, 1 to remove and 15 not upgraded.
After this operation, 270 kB disk space will be freed.
(Reading database ... 126596 files and directories currently installed.)
Removing python3-cairo:amd64 (1.16.2-2ubuntu2) ...
Reading package lists... Done
Building dependency tree       
Reading state information... Done
pkg-config is already the newest version (0.29.1-0ubuntu4).
python3-dev is already the newest version (3.8.2-0ubuntu2).
python3-dev set to manually installed.
The following additional packages will be installed:
  libblkid-dev libblkid1 libcairo-script-interpreter2 libffi-dev
  libglib2.0-dev libglib2.0-dev-bin liblzo2-2 libmount-dev libmount1
  libpixman-1-dev libselinux1-dev libsepol1-dev libxcb-render0-dev
  libxcb-shm0-dev
Suggested packages:
  libcairo2-doc libgirepository1.0-dev libglib2.0-doc lib

In [3]:
!pip install signatory # May take some time...

Collecting signatory
  Downloading signatory-1.2.6.1.9.0.tar.gz (62 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: signatory
  Building wheel for signatory (setup.py) ... done
  Created wheel for signatory: filename=signatory-1.2.6.1.9.0-cp310-cp310-linux_x86_64.whl size=13908076 sha256=741d7fa11060ea152c8bc0625bb02e0b5cbb62e520e861cb10ba0f1c6bc8a866
  Stored in directory: /root/.cache/pip/wheels/71/b4/17/46d769da4808e9f83f9790a2b805f81f43ececc2c02f5b1e62
Successfully built signatory
Installing collected packages: signatory
Successfully installed signatory-1.2.6.1.9.0


In [4]:
!pip install pandas geopandas mapclassify rtree simpledbf

Collecting mapclassify
  Downloading mapclassify-2.5.0-py3-none-any.whl (39 kB)
Collecting rtree
  Downloading Rtree-1.0.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.0 MB)
Collecting simpledbf
  Downloading simpledbf-0.2.6.tar.gz (17 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: simpledbf
  Building wheel for simpledbf (setup.py) ... done
  Created wheel for simpledbf: filename=simpledbf-0.2.6-py3-none-any.whl size=13785 sha256=54a56590195126ec2de3552cf40c4999f1f85cdf8e05f363debd0c1f5069276c
  Stored in directory: /root/.cache/pip/wheels/e5/41/13/ebdef29165b9309ec4e235dbff19eca8b6759125b0924ad430
Successfully built simpledbf
Installing collected packages: simpledbf, rtree, mapclassify
Successfully installed mapclassify-2.5.0 rtree-1.0.1 simpledbf-0.2.6


In [5]:
!apt-get install gtk+3

Reading package lists... Done
Building dependency tree       
Reading state information... Done
Note, selecting 'libcaribou-gtk3-module' for regex 'gtk+3'
Note, selecting 'libgtk3.0-cil-dev' for regex 'gtk+3'
Note, selecting 'libghc-gtk3-dev-0.15.1-9336f' for regex 'gtk+3'
Note, selecting 'spacefm-gtk3' for regex 'gtk+3'
Note, selecting 'libghc-gtk3-prof' for regex 'gtk+3'
Note, selecting 'libdbusmenu-gtk3-dev' for regex 'gtk+3'
Note, selecting 'gtk3-engines-breeze' for regex 'gtk+3'
Note, selecting 'libgtk3-webkit2-perl' for regex 'gtk+3'
Note, selecting 'python-wxgtk3.0-dev' for regex 'gtk+3'
Note, selecting 'uim-gtk3-immodule' for regex 'gtk+3'
Note, selecting 'libreoffice-gtk3' for regex 'gtk+3'
Note, selecting 'fcitx5-frontend-gtk3' for regex 'gtk+3'
Note, selecting 'monodoc-gtk3.0-manual' for regex 'gtk+3'
Note, selecting 'libcanberra-gtk3-dev' for regex 'gtk+3'
Note, selecting 'ruby-gtk3' for regex 'gtk+3'
Note, selecting 'packagekit-gtk3-module' for regex 'gtk+3'
Note, selectin

In [6]:
!pip install seaborn plotly
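
After the install cells have run, it can help to confirm that the key packages are importable before moving on. The sketch below is one way to do that. The helper name `missing_modules` and the list of module names are illustrative and not part of the original notebook.

```python
import importlib.util

def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# Top-level import names of the packages installed above (an assumed list;
# adjust it to match your own setup).
required = ["graph_tool", "cairo", "geopandas", "mapclassify", "rtree",
            "simpledbf", "seaborn", "plotly"]

missing = missing_modules(required)
if missing:
    print("Not importable:", ", ".join(missing))
else:
    print("All required packages were found.")
```

`find_spec` only locates the modules without importing them, so the check is quick and does not trigger any heavy initialization.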



## 2. Add required packages


* See the Requirement.txt for all required packages.
* To access the project files from Google Colaboratory, you need to mount your Google Drive.

In [7]:
from google.colab import drive
drive.mount('gdrive')

Mounted at gdrive


In [8]:
# General:
import os
import sys
import multiprocessing
from multiprocessing import pool
from multiprocessing import *
from io import StringIO
from IPython.display import display, HTML, Image
import numpy as np
import pandas as pd
import pickle
import datetime as dt
from datetime import timedelta, date
import timeit
import gzip
import shutil
from functools import partial
from dateutil.parser import parse
from concurrent.futures import ThreadPoolExecutor
from itertools import chain
from glob import glob
from sklearn.preprocessing import normalize
import random
# import rtree
import mapclassify

# Spatial:
import geopandas as gpd
from shapely.prepared import prep
from shapely.ops import snap, linemerge, nearest_points
from shapely.geometry import MultiLineString, LineString, Point, Polygon, GeometryCollection
# import pygeos

# Visualization:
import matplotlib.pyplot as plt
import matplotlib as mpl
import matplotlib.animation as ani
from matplotlib import rc
import matplotlib.dates as mdates
mpl.rcParams.update(mpl.rcParamsDefault)
%matplotlib inline
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots

# Graph analyses:
from graph_tool.all import graph_tool as gt
from graph_tool.all import *
import networkx as nx
#import cairo



In [7]:
# General:
import os
import sys
import numpy as np
import pandas as pd
import timeit
from itertools import chain
from sklearn.preprocessing import normalize

# Spatial:
import geopandas as gpd
from shapely.geometry import MultiLineString, LineString, Point, Polygon, GeometryCollection

# Visualization:
import matplotlib as mpl
mpl.rcParams.update(mpl.rcParamsDefault)
%matplotlib inline

# Graph analyses:
from graph_tool.all import *
import networkx as nx



## 3. Data sources

**3.1. Roads:** 2010 Census TIGER [shapefiles](https://www.census.gov/cgi-bin/geo/shapefiles/index.php?year=2010&layergroup=Roads) (to be updated with the 2020 Census soon).

**3.2. Demographics:** 2010 Census-tract level [Demographic Profile (DP)](https://www.census.gov/programs-surveys/decennial-census/guidance/2010/2010-data-products-at-a-glance.html) (to be updated with the 2020 Census soon).

**3.3. School:** The Educational Institution [dataset](https://geodata.epa.gov/arcgis/rest/services/OEI/ORNL_Education/MapServer).

**3.4. Establishment numbers:** Census Bureau’s County Business Patterns [(CBP)](https://www.census.gov/data/datasets/2010/econ/cbp/2010-cbp.html).

**3.5. Workflow:** Census Bureau’s Longitudinal Employer-Household Dynamics (LEHD) Origin-Destination Employment Statistics [(LODES)](https://lehd.ces.census.gov/data/).
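
LODES origin-destination files are published at the census-block level, with 15-digit `h_geocode` (home) and `w_geocode` (work) columns and a total job count in `S000`, whereas this notebook reads a tract-level file. How that tract-level file was produced is not stated here. The following is a minimal sketch of one plausible aggregation, assuming the first 11 digits of a block code identify its tract. The function name and the toy rows (made-up block codes and counts) are illustrative only.

```python
from collections import Counter

def aggregate_to_tracts(block_rows):
    """Sum job counts from block-level OD rows into (home_tract, work_tract) pairs."""
    flows = Counter()
    for row in block_rows:
        home_tract = row["h_geocode"][:11]   # state (2) + county (3) + tract (6)
        work_tract = row["w_geocode"][:11]
        flows[(home_tract, work_tract)] += int(row["S000"])
    return flows

# Toy rows with made-up block codes and counts, for illustration only.
rows = [
    {"w_geocode": "170318391001000", "h_geocode": "170318403001001", "S000": "2"},
    {"w_geocode": "170318391001005", "h_geocode": "170318403001002", "S000": "3"},
    {"w_geocode": "170318391001000", "h_geocode": "170318402001000", "S000": "1"},
]

flows = aggregate_to_tracts(rows)
for (home, work), jobs in sorted(flows.items()):
    print(home, "->", work, jobs)
```

Keeping the geocodes as strings throughout preserves leading zeros, which is also why the notebook reads the first two columns of the OD file as `str`.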

In [9]:
os.chdir('/content/gdrive/MyDrive/TranEpiSim')
# 1. Road file
road = gpd.read_file('data/road/roads.shp')

# 2. Demographic profile
dp = gpd.read_file('data/dp/dp.shp').set_index('GEOID10')
dp['portion'] = dp.apply(lambda tract: tract.geometry.area / tract.Shape_Area, axis=1)

# 3. Schools and daycares
school = gpd.read_file('data/education/school.shp')
daycare = gpd.read_file('data/education/day_care.shp')

# 4. Number of establishments per county per size
cbp = pd.read_csv('data/cbp/cbp10co.zip')
# All types of establishments included; .copy() avoids pandas' SettingWithCopyWarning below
cbp = cbp[cbp.naics.str.startswith('-')].copy()
cbp['fips'] = cbp.fipstate.map("{:02}".format) + cbp.fipscty.map("{:03}".format)
cbp = cbp.set_index('fips')

# 5. Origin (home) - destination (job) at census-tract level
od = pd.read_csv('data/od/tract-od15Cook.csv',
                 dtype={i:(str if i<2 else int) for i in range(6)})
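
The `fips` column built in the cell above concatenates a zero-padded two-digit state code with a zero-padded three-digit county code. As a small worked example, the sketch below shows that formatting on plain integers and how the resulting key lines up with the first five characters of a tract GEOID. Whether `syn.number_of_wp` matches counties to tracts this way is an assumption, and both helper names are hypothetical.

```python
def county_fips(state_code, county_code):
    """Build a 5-character county FIPS string from numeric state and county codes."""
    return "{:02}".format(state_code) + "{:03}".format(county_code)

def county_of_tract(tract_geoid):
    """Return the county FIPS prefix of an 11-digit census tract GEOID."""
    return tract_geoid[:5]

cook = county_fips(17, 31)
print(cook)                                    # 17031
print(county_fips(6, 1))                       # 06001
print(county_of_tract("17031840300") == cook)  # True
```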


## 4. Synthesize and save population

In [None]:
sys.path.insert(0,"src/")
import synthesizer as syn

#Add workplace counts and sizes to dp
dp['WP_CNT'] = syn.number_of_wp(dp,od,cbp)
dp['WP_PROBA'] = dp.WP_CNT.map(syn.wp_proba)

# Create a unified file for education
school = syn.clean_schools(school,daycare)

population = []
errors = []
wps = []
# Profile the per-tract synthesis (the %prun line magic needs the statement on the same line)
%prun dp.apply(lambda t: syn.synthesize(t,od,road,school,errors, population, wps, dp),axis=1)

# Save the results
with open('output/errors.pkl', 'wb') as f:
    pickle.dump(errors, f)
with open('output/population.pkl', 'wb') as f:
    pickle.dump(population, f)
with open('output/wps.pkl', 'wb') as f:
    pickle.dump(wps, f)

 17031840300 started... 17031840300 now ended (4.3 secs)
17031840200 started... 17031840200 now ended (2.8 secs)
17031841100 started... 17031841100 now ended (8.7 secs)
17031841200 started... 17031841200 now ended (6.0 secs)
17031838200 started... 17031838200 now ended (1.6 secs)
17031770201 started... 17031770201 now ended (7.1 secs)
17031804610 started... 17031804610 now ended (2.6 secs)
17031804715 started... 17031804715 now ended (3.5 secs)
17031804108 started... 17031804108 now ended (4.8 secs)
17031803701 started... 17031803701 now ended (3.1 secs)
17031650301 started... 17031650301 now ended (6.3 secs)
17031530503 started... 17031530503 now ended (5.7 secs)
17031530501 started... 17031530501 now ended (5.4 secs)
17031760803 started... 17031760803 now ended (7.6 secs)
17031540102 started... 17031540102 now ended (3.4 secs)
17031540101 started... 17031540101 now ended (4.0 secs)
17031440201 started... 17031440201 now ended (5.8 secs)
17031839000 started... 17031839000 now ended (8
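
The synthesis cell passes the `population`, `errors`, and `wps` lists into `syn.synthesize`, which appears to fill them tract by tract before they are pickled. The internals of `syn.synthesize` are not shown in this notebook, so the following is only a sketch of that accumulator pattern under stated assumptions: `synthesize_tract` is a hypothetical stand-in, and the tract sizes and record fields are made up.

```python
import pickle
import tempfile
import time
from pathlib import Path

def synthesize_tract(tract_id, size, population, errors):
    """Hypothetical stand-in for a per-tract synthesis step."""
    start = time.time()
    try:
        if size <= 0:
            raise ValueError("tract has no residents")
        # One record per synthetic person; real records would carry more fields.
        population.append([{"tract": tract_id, "person": i} for i in range(size)])
    except ValueError as exc:
        # Record the failure and keep going instead of aborting the whole run.
        errors.append((tract_id, str(exc)))
    print(f"{tract_id} done ({time.time() - start:.1f} secs)")

population, errors = [], []
for tract_id, size in [("17031840300", 3), ("17031840200", 0), ("17031841100", 2)]:
    synthesize_tract(tract_id, size, population, errors)

# Round-trip one list through pickle, as the cell above does for its three lists.
out_dir = Path(tempfile.mkdtemp())
with open(out_dir / "population.pkl", "wb") as f:
    pickle.dump(population, f)
with open(out_dir / "population.pkl", "rb") as f:
    restored = pickle.load(f)
```

Collecting failures in a separate list means one problematic tract does not discard the work already done for the others, and the failures can be inspected after the run.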

## 5. Create and save the network

In [None]:
# Read synthesized population
with open('output/population.pkl','rb') as f:
    people = pd.concat(pickle.load(f))

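# Create and save the networks
# NOTE: create_networks and k are not defined in the cells shown here; both must
# already exist in the session (e.g., from an earlier cell) before this runs.
g = create_networks(people,k=k,p=.3)
nx.write_gml(g,'output/contact_network.gml')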

# Create networks by contact types
for etype in ['hhold','work','school']:
    sg = nx.Graph([(u,v) for u,v,d in g.edges(data=True) if d['etype']==etype])
    nx.write_gml(sg, f'output/{etype}_contact_network.gml')

# Create a network for work-school contacts
work_school = nx.Graph([(u,v) for u,v,d in g.edges(data=True) if d['etype'] in ['work','school']])
nx.write_gml(work_school,'output/work_school_contact_network.gml')
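
The per-type networks above are built by passing a filtered list of `(u, v)` pairs to `nx.Graph`, which keeps the topology but drops node and edge attributes. If those attributes are needed downstream, `Graph.edge_subgraph` is one alternative. The sketch below compares the two on a toy graph; the node ids, the `age` attribute, and the helper `edges_of_type` are made up for illustration.

```python
import networkx as nx

# Toy contact network; node ids and attributes are made up for illustration.
g = nx.Graph()
g.add_node("a", age=34)
g.add_edges_from([
    ("a", "b", {"etype": "hhold"}),
    ("a", "c", {"etype": "work"}),
    ("c", "d", {"etype": "school"}),
])

def edges_of_type(graph, etypes):
    """Return the (u, v) pairs whose 'etype' attribute is in etypes."""
    return [(u, v) for u, v, d in graph.edges(data=True) if d["etype"] in etypes]

work_edges = edges_of_type(g, {"work"})

# Approach used above: a fresh graph built from bare edge tuples (attributes dropped).
bare = nx.Graph(work_edges)

# Alternative: an independent subgraph that keeps node and edge attributes.
kept = g.edge_subgraph(work_edges).copy()

print(bare.nodes["a"])       # {}
print(kept.nodes["a"])       # {'age': 34}
print(kept.edges["a", "c"])  # {'etype': 'work'}
```

Both approaches leave out nodes that have no edge of the selected type, so the per-type files can contain fewer nodes than the full contact network.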