# Universal graph neural networks for multi-task transfer learning
---

### Overview
1. Import data

---

Assume that every event in the world occurs at a real or virtual point in time $\mathbf{t}$ and space $\mathbf{s}$, and is represented by a node $\mathbf{n}_{t, s}$ that concatenates $\mathbf{t}$, $\mathbf{s}$ and additional node features $\mathbf{f}_{t, s}$. Any relationship or dependency between different events can then be described by three types of adjacency: temporal adjacency $\mathbf{a}_{(t_{o}, t_{d}), s}$, spatial adjacency $\mathbf{a}_{t, (s_{o}, s_{d})}$ and spatio-temporal adjacency $\mathbf{a}_{(t_{o}, t_{d}), (s_{o}, s_{d})}$. Temporal adjacency $\mathbf{a}_{(t_{o}, t_{d}), s}$ describes the relationship between events at the same point $\mathbf{s}$ in space but different origin $\mathbf{t}_{o}$ and destination $\mathbf{t}_{d}$ points in time. Spatial adjacency $\mathbf{a}_{t, (s_{o}, s_{d})}$ describes the relationship between events at the same point $\mathbf{t}$ in time but different origin $\mathbf{s}_{o}$ and destination $\mathbf{s}_{d}$ points in space. Spatio-temporal adjacency $\mathbf{a}_{(t_{o}, t_{d}), (s_{o}, s_{d})}$ describes the relationship between events whose origin and destination points $(\mathbf{t}_{o}, \mathbf{t}_{d}), (\mathbf{s}_{o}, \mathbf{s}_{d})$ differ in both time and space.

Let the coordinates $(\mathbf{t}, \mathbf{s})$ describe the location of a node using arbitrary data $\mathbf{t} \in \mathbb{R}^{D_{t}}$ and $\mathbf{s} \in \mathbb{R}^{D_{s}}$. Further, let the node features be $\mathbf{f}_{t, s} \in \mathbb{R}^{D_f}$, so that $\mathbf{n}_{t, s} = concat(\mathbf{t}, \mathbf{s}, \mathbf{f}_{t, s}) \in \mathbb{R}^{D_n}$ with $D_{n}= D_{t} + D_{s} + D_{f}$, and let the adjacency types $\mathbf{a}_{(t_{o}, t_{d}), s} \in \mathbb{R}^{D_{at}}$, $\mathbf{a}_{t, (s_{o}, s_{d})} \in \mathbb{R}^{D_{as}}$, and $\mathbf{a}_{(t_{o}, t_{d}), (s_{o}, s_{d})} \in \mathbb{R}^{D_{ast}}$ be described by arbitrary data as well. Then, every consistent dataset can be described by a dynamic graph that evolves in space-time as we measure new data. We call this our **universal graph representation of data**.
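The node definition above can be illustrated with a minimal sketch. The helper name `make_node` and the example values are hypothetical and only serve to show that the node dimension is the sum of the three component dimensions.

```python
from typing import List, Sequence


def make_node(
    t: Sequence[float],
    s: Sequence[float],
    f: Sequence[float] = (),
) -> List[float]:
    """Concatenate time coordinates, space coordinates and node features."""
    return [*t, *s, *f]


# Hypothetical example with D_t = 2, D_s = 3 and D_f = 1.
t = [2019.0, 4.0]
s = [0.59, 0.01, 0.81]
f = [1.0]

n = make_node(t, s, f)

# D_n = D_t + D_s + D_f
assert len(n) == len(t) + len(s) + len(f)
```

A dataset without node features simply passes an empty `f`, in which case the node reduces to the concatenation of `t` and `s`.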

The figure below illustrates our universal graph representation of data for the example case $\mathbf{t}, \mathbf{s} \in \mathbb{R}^1$. In our proposed representation, an undirected graph is the special case in which every relationship also has a counterpart in the opposite direction. We can further note that temporal and spatial adjacency are special cases of spatio-temporal adjacency. Distinguishing the three types allows us to use more data types for expressing adjacency.


![UniversalDataGraph](../figures/UniversalDataGraph.png)
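As a rough sketch of the two special cases mentioned above, the following code classifies an edge by comparing its origin and destination coordinates, and expands undirected edges into pairs of opposite directed edges. The function names and the tuple layout `(t_o, t_d, s_o, s_d)` are assumptions made for illustration, as is the extra `"self"` label for an edge whose origin and destination coincide.

```python
from typing import Hashable, List, Tuple

# Assumed edge layout: (t_o, t_d, s_o, s_d)
Edge = Tuple[Hashable, Hashable, Hashable, Hashable]


def adjacency_type(t_o: Hashable, t_d: Hashable, s_o: Hashable, s_d: Hashable) -> str:
    """Name the adjacency type implied by origin and destination coordinates."""
    if t_o == t_d and s_o == s_d:
        return "self"  # not part of the original taxonomy; added for completeness
    if s_o == s_d:
        return "temporal"  # same point in space, different points in time
    if t_o == t_d:
        return "spatial"  # same point in time, different points in space
    return "spatio-temporal"  # differs in both time and space


def as_directed(edges: List[Edge]) -> List[Edge]:
    """Represent each undirected edge as two directed edges in opposite directions."""
    directed: List[Edge] = []
    for t_o, t_d, s_o, s_d in edges:
        directed.append((t_o, t_d, s_o, s_d))
        directed.append((t_d, t_o, s_d, s_o))
    return directed


edge_types = [
    adjacency_type(0, 1, 5, 5),
    adjacency_type(0, 0, 5, 6),
    adjacency_type(0, 1, 5, 6),
]
both_directions = as_directed([(0, 1, 5, 6)])
```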



Given our universal graph representation of data, our goal is to design a neural network prediction model that is able to solve multiple prediction tasks using arbitrary data types for $\mathbf{t}, \mathbf{s}, \mathbf{f}_{t, s}, \mathbf{n}_{t, s}, \mathbf{a}_{(t_{o}, t_{d}), s}, \mathbf{a}_{t, (s_{o}, s_{d})}$ and $\mathbf{a}_{(t_{o}, t_{d}), (s_{o}, s_{d})}$ in each task. In order to achieve this, we must be able to transform our universal graph representation of data into a computation graph that can be trained using traditional deep learning algorithms like backpropagation and stochastic gradient descent. Graph neural networks (GNNs) can solve a wide range of prediction problems as node-, edge- or graph-level tasks whenever the data can be structured as a graph. We therefore use a **GNN architecture as the backbone** of our prediction model, whose parameters we share among different tasks.

In order to share a single GNN backbone among multiple tasks, we need to ensure that we can handle multiple data types and deal with the characteristic sparsity of data that are involved in different prediction tasks. We therefore introduce the convention that any data in our universal graph representation must be embedded into a **consistent feature space of fixed dimension** before it is processed by our backbone GNN and that missing data for generating these embeddings are tackled by **augmenting our data graph with constant values** like zeros or ones. This allows us to customize operations on the raw data that a task involves and use for example convolutional neural networks (CNNs) for images, transformers for sequences, or GNNs again for graph data. 
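The augmentation convention can be sketched as follows, under several assumptions: the component names in `COMPONENTS`, the helper `augment_missing` and the placeholder dimension are all hypothetical, and the task-specific encoders that would afterwards map each component into the shared feature space are left out.

```python
from typing import Dict, List, Optional, Sequence

# Assumed names for the components of the universal graph representation.
COMPONENTS = ("t", "s", "f", "a_t", "a_s", "a_st")


def augment_missing(
    sample: Dict[str, Optional[Sequence[float]]],
    placeholder_dim: int = 1,
    fill_value: float = 0.0,
) -> Dict[str, List[float]]:
    """Fill every absent or empty component with a constant placeholder vector."""
    augmented: Dict[str, List[float]] = {}
    for name in COMPONENTS:
        values = sample.get(name)
        if values is None or len(values) == 0:
            augmented[name] = [fill_value] * placeholder_dim
        else:
            augmented[name] = list(values)
    return augmented


# Hypothetical sample that only provides t, s and spatial adjacency.
sample = {
    "t": [2019, 4, 8, 0],
    "s": [0.59, 0.01, 0.81],
    "a_s": [1087.11, 209.30, 1068.90, 1.20],
}
augmented = augment_missing(sample)
```

After this step every sample exposes the same set of components, so a shared backbone never has to branch on which data a task happens to provide.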

In [1]:
import torch
import pandas as pd
import hyper
import data

HYPER = hyper.HyperParameter()

### 1. Import data

In this section, we import each dataset and describe how it fits into our universal graph representation of data. We then identify the type of prediction task we are dealing with and define what our features and labels are.

In [2]:
datasets = data.Datasets()

#### 1.1 Import Uber Movement data

The Uber Movement data consists of the entries displayed below. Each row of the first table represents a data point consisting of features x and labels y that we are going to describe in more detail below. The data fits into our universal graph representation of data in the following way:

* $\mathbf{t} \in \mathbb{R}^4$: a unique point in time is identified by four values. These are 'year', 'quarter_of_year', 'hour_of_day' and 'daytype'. While the first three are self-explanatory, the 'daytype' column describes weekends with a value of 0 and weekdays with a value of 1. Together, these four values represent our coordinate $\mathbf{t}$ in time of dimension $D_{t}=4$. A densely connected layer may be suitable for processing this information.

* $\mathbf{s} \in \mathbb{R}^{3 \cdot D_{cityzone}}$: every data point is expressed as the travel time statistics between different zones of a city at a particular point in time. For every city (described by the column 'city_id' and mapped to the city name in a separate file), we can map city zones (described by the columns 'source_id' and 'destination_id') into a set of geographic coordinates that describe the corners of a polygon for each zone in a separate file. The corners of the polygons were initially described by latitude and longitude coordinates. We transformed each original (lat,long) pair into a point on a 3-dimensional unit sphere expressed by (x,y,z) coordinates. This describes different geographic points on our planet better. Each city zone, however, is described by a different polygon. The number of polygon corners $D_{cityzone}$ is therefore variable and depends on the respective city zone. A transformer layer may therefore be suitable for processing this information.

* $\mathbf{f}_{t, s} \in \emptyset$

* $\mathbf{n}_{t, s} = concat(\mathbf{t}, \mathbf{s}) \in \mathbb{R}^{4 + 3 \cdot D_{cityzone}}$

* $\mathbf{a}_{(t_o, t_d), s} \in \emptyset$

* $\mathbf{a}_{t, (s_{o}, s_{d})}\in \mathbb{R}^4$: the only type of adjacency we have in our data is spatial adjacency. These are the travel time statistics between an origin-destination pair of points in $\mathbf{s}$ for the same point in $\mathbf{t}$. The travel time statistics are expressed as the arithmetic mean, standard deviation, geometric mean and geometric standard deviation, which gives us a spatial adjacency of dimension $D_{as}=4$.

* $\mathbf{a}_{(t_o, t_d), (s_o, s_d)} \in \emptyset$
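The transformation of polygon corners from (lat,long) to (x,y,z) described for $\mathbf{s}$ can be sketched as below. The exact convention used for the dataset is not stated, so this assumes the common mapping with the z-axis through the poles and the x-axis through the intersection of the equator and the prime meridian. The function name and the example coordinates are hypothetical.

```python
import math
from typing import Tuple


def latlon_to_unit_sphere(lat_deg: float, lon_deg: float) -> Tuple[float, float, float]:
    """Map latitude and longitude in degrees to a point on the unit sphere."""
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    x = math.cos(lat) * math.cos(lon)
    y = math.cos(lat) * math.sin(lon)
    z = math.sin(lat)
    return x, y, z


# Approximate coordinates of a point in Leeds, chosen for illustration only.
x, y, z = latlon_to_unit_sphere(53.8, -1.55)

# Every converted corner lies on the unit sphere.
assert abs(x * x + y * y + z * z - 1.0) < 1e-12
```

One plausible advantage of this encoding is that nearby locations always receive nearby coordinates, whereas raw longitude jumps between -180 and 180 degrees at the antimeridian.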

Given an origin-destination pair of city zones for a particular city at a particular time, we want to predict travel time statistics. This results in a supervised learning problem and an edge-level prediction task. In an end-to-end prediction model, we want our features and labels to be structured in the following way:
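One possible way to express this split in code is sketched below. The column names are taken from the displayed Uber Movement sample, while the helper `split_features_labels` and the plain-dictionary row are assumptions for illustration. The identifiers are kept raw here; mapping zones to polygon coordinates would be a separate step.

```python
from typing import Dict, List, Tuple

# Columns that identify the city, the origin-destination pair and the point in time.
FEATURE_COLUMNS = [
    "city_id",
    "source_id",
    "destination_id",
    "year",
    "quarter_of_year",
    "daytype",
    "hour_of_day",
]

# Travel time statistics, i.e. the spatial adjacency we want to predict.
LABEL_COLUMNS = [
    "mean_travel_time",
    "standard_deviation_travel_time",
    "geometric_mean_travel_time",
    "geometric_standard_deviation_travel_time",
]


def split_features_labels(row: Dict[str, float]) -> Tuple[List[float], List[float]]:
    """Split one data row into features x and labels y."""
    x = [row[column] for column in FEATURE_COLUMNS]
    y = [row[column] for column in LABEL_COLUMNS]
    return x, y


# First row of the displayed sample.
row = {
    "city_id": 5,
    "source_id": 809,
    "destination_id": 1164,
    "year": 2019,
    "quarter_of_year": 4,
    "daytype": 0,
    "hour_of_day": 8,
    "mean_travel_time": 1087.11,
    "standard_deviation_travel_time": 209.30,
    "geometric_mean_travel_time": 1068.90,
    "geometric_standard_deviation_travel_time": 1.20,
}
x, y = split_features_labels(row)
```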



In [3]:
datasets.import_ubermovement_sample(HYPER, display_data=True)

| | city_id | source_id | destination_id | year | quarter_of_year | daytype | hour_of_day | mean_travel_time | standard_deviation_travel_time | geometric_mean_travel_time | geometric_standard_deviation_travel_time |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 5 | 809 | 1164 | 2019 | 4 | 0 | 8 | 1087.11 | 209.30 | 1068.90 | 1.20 |
| 1 | 4 | 84 | 411 | 2020 | 1 | 1 | 3 | 266.32 | 121.32 | 245.56 | 1.46 |
| 2 | 5 | 904 | 160 | 2018 | 3 | 0 | 22 | 1732.00 | 222.71 | 1716.77 | 1.14 |
| 3 | 7 | 205 | 432 | 2016 | 1 | 1 | 10 | 861.25 | 167.60 | 847.09 | 1.19 |
| 4 | 2 | 542 | 2173 | 2020 | 1 | 1 | 20 | 2158.75 | 118.11 | 2155.49 | 1.06 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 19999995 | 5 | 1151 | 760 | 2019 | 4 | 1 | 10 | 1515.33 | 188.20 | 1501.84 | 1.15 |
| 19999996 | 7 | 891 | 281 | 2019 | 4 | 1 | 20 | 1995.56 | 1255.90 | 1762.82 | 1.56 |
| 19999997 | 2 | 1080 | 1995 | 2020 | 1 | 1 | 16 | 4055.79 | 988.24 | 3929.61 | 1.29 |
| 19999998 | 7 | 867 | 21 | 2018 | 3 | 1 | 6 | 992.76 | 260.56 | 964.30 | 1.26 |


Exemplar geographic city zone data for city of Leeds


Unnamed: 0,x_cord_1,x_cord_2,x_cord_3,x_cord_4,x_cord_5,x_cord_6,x_cord_7,x_cord_8,x_cord_9,x_cord_10,...,z_cord_290,z_cord_291,z_cord_292,z_cord_293,z_cord_294,z_cord_295,z_cord_296,z_cord_297,z_cord_298,z_cord_299
0,0.590381,0.590303,0.590438,0.590119,0.590361,0.590260,0.590540,0.590568,0.590443,0.590445,...,0.805492,0.805164,0.805160,0.805520,0.807008,0.807081,0.807136,0.807085,0.806968,0.806991
1,0.590422,0.590334,0.590476,0.590174,0.590398,0.590270,0.590553,0.590668,0.590517,0.590486,...,0.805455,0.805144,0.805106,0.805489,0.806959,0.807065,0.807066,0.807061,0.806949,0.806964
2,0.590388,0.590329,0.590481,0.590176,0.590388,0.590270,0.590527,0.590666,0.590575,0.590561,...,0.805442,0.805163,0.805228,0.805491,0.806961,0.807057,0.807043,0.807062,0.806911,0.806971
3,0.590398,0.590332,0.590461,0.590194,0.590422,0.590347,0.590565,0.590631,0.590580,0.590611,...,0.805395,0.805160,0.805291,0.805479,0.806926,0.807018,0.807029,0.807061,0.806907,0.806970
4,0.590361,0.590305,0.590506,0.590229,0.590440,0.590355,0.590651,0.590669,0.590626,0.590645,...,0.805364,0.805264,0.805308,0.805391,0.806940,0.806989,0.807005,0.807040,0.806861,0.806971
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
65,,,,,,,,,,,...,,,,,,,,,,
66,,,,,,,,,,,...,,,,,,,,,,
67,,,,,,,,,,,...,,,,,,,,,,
68,,,,,,,,,,,...,,,,,,,,,,


| city | city_id |
|---|---|
| Guadalajara | 0 |
| Stockholm | 1 |
| San Francisco | 2 |
| Perth | 3 |
| Auckland | 4 |
| Boston | 5 |
| Brussels | 6 |
| London | 7 |
| Miami | 8 |
| Leeds | 9 |


#### 1.2 Import ClimART data

In [4]:
datasets.import_climart_sample(HYPER, display_data=True)

Unnamed: 0,year,hour_of_year,x_cord,y_cord,z_cord,cszrow,gtrow,pressg,oztop,emisrow,...,rsuc_40,rsuc_41,rsuc_42,rsuc_43,rsuc_44,rsuc_45,rsuc_46,rsuc_47,rsuc_48,rsuc_49
0,1991,2050,-0.091897,0.075887,0.992873,0.000000,244.15,100791.21,0.000005,0.969831,...,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
1,1991,1435,-0.068821,-0.257608,0.963795,0.529358,298.49,101565.25,0.000005,0.962340,...,44.651253,43.977707,43.420510,42.933510,42.456097,41.987305,41.530230,41.080820,40.717087,40.421080
2,1994,3895,0.340944,0.238611,0.909297,0.511699,263.78,83471.88,0.000003,1.000000,...,451.944180,452.067080,452.176820,452.284640,452.447720,452.661440,452.946380,453.286560,453.589540,453.852200
3,2099,2460,0.149672,-0.906003,0.395925,0.000000,273.52,92331.06,0.000009,0.989625,...,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
4,2097,4305,0.794490,0.269864,-0.544021,0.371641,276.00,100094.73,0.000007,0.965906,...,44.199547,43.631054,43.163925,42.749810,42.344850,41.948055,41.564670,41.190006,40.892826,40.652206
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
34995,2099,2460,-0.744942,-0.511587,-0.428183,0.020969,305.63,100883.52,0.000006,0.964020,...,1.934605,1.903411,1.878408,1.856656,1.835692,1.815542,1.796215,1.777640,1.762911,1.751088
34996,1991,1230,-0.082869,0.280139,0.956376,0.705204,303.36,92388.58,0.000005,0.987282,...,98.222490,97.839680,97.530800,97.259740,97.006905,96.777440,96.633290,96.579050,96.566154,96.576836
34997,2099,4510,0.409733,0.363492,0.836656,0.820885,299.90,102211.80,0.000006,0.964844,...,37.317398,36.449253,35.732426,35.093975,34.465492,33.843662,33.228992,32.619190,32.121902,31.714237
34998,1994,2255,-0.903161,-0.145099,-0.404038,0.757296,298.14,98177.45,0.000005,0.995801,...,99.160230,98.788230,98.519135,98.331750,98.199560,98.145710,98.154220,98.227180,98.343895,98.469300
