-
Notifications
You must be signed in to change notification settings - Fork 27
/
1_8_exploration.ipynb
133 lines (133 loc) · 5.51 KB
/
1_8_exploration.ipynb
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 1.8 Exploration: Multi-order analysis of [paths and time-stamped social networks](https://github.com/IngoScholtes/kdd2018-tutorial/tree/master/data)\n",
"\n",
"**Ingo Scholtes** \n",
"Data Analytics Group \n",
"Department of Informatics (IfI) \n",
"University of Zurich \n",
"\n",
"\n",
"**August 22 2018**\n",
"\n",
"In the last (open-ended) exploration, you get the chance to apply multi-order representation in the analysis of real data. In addition to the pathway data from session 1, we will consider data that we provide in the SQLite database `temporal_networks.db`. You can check which tables it contains by checking the `metadata` table:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"eu_email_dept1 \t\t E-Mail exchanges in EU institution (department 1)\n",
"eu_email_dept2 \t\t E-Mail exchanges in EU institution (department 2)\n",
"eu_email_dept3 \t\t E-Mail exchanges in EU institution (department 3)\n",
"eu_email_dept4 \t\t E-Mail exchanges in EU institution (department 4)\n",
"sociopatterns_workplace \t\t Social contacts in a workspace\n",
"sociopatterns_hospital \t\t Social contacts in a hospital\n",
"sociopatterns_highschool_2013 \t\t High school contact networks\n",
"manufacturing_email \t\t E-Mail exchanges in Polish manufacturing company\n",
"lotr \t\t Character co-occurrences in The Lord of the Rings\n",
"haggle \t\t Contacts between humans recorded by smart devices\n",
"sociopatterns_primaryschool \t\t Primary School contact networks\n",
"sociopatterns_hypertext \t\t Social contacts at Hypertext conference\n"
]
}
],
"source": [
"import pathpy as pp\n",
"import sqlite3\n",
"\n",
"con = sqlite3.connect('../data/temporal_networks.db',)\n",
"con.row_factory = sqlite3.Row\n",
"\n",
"for row in con.execute('SELECT * from metadata'):\n",
" print('{0} \\t\\t {1}'.format(row['tag'], row['name']))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Details on the origin of these data can be found [here](https://github.com/IngoScholtes/kdd2018-tutorial/tree/master/data). Below, we include boilerplate code to load these data sets into the `TemporalNetwork` class in `pathpy`:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"2018-08-21 16:40:34 [Severity.INFO]\tRetrieving directed time-stamped links ...\n",
"2018-08-21 16:40:35 [Severity.INFO]\tBuilding index data structures ...\n",
"2018-08-21 16:40:35 [Severity.INFO]\tSorting time stamps ...\n",
"2018-08-21 16:40:35 [Severity.INFO]\tfinished.\n",
"Nodes:\t\t\t167\n",
"Time-stamped links:\t82927\n",
"Links/Nodes:\t\t496.5688622754491\n",
"Observation period:\t[1262454010, 1285880892]\n",
"Observation length:\t 23426882 \n",
"Time stamps:\t\t 57842 \n",
"Avg. inter-event dt:\t 405.02207776490724\n",
"Min/Max inter-event dt:\t 1/225913\n"
]
}
],
"source": [
"table = 'manufacturing_email'\n",
"\n",
"# Check whether network is directed or not\n",
"directed_network = bool(con.execute(\"SELECT directed FROM metadata WHERE tag='{0}'\".format(table)).fetchone()['directed'])\n",
"t = pp.TemporalNetwork.from_sqlite(con.execute('SELECT source, target, time FROM ' + table), \n",
" directed=directed_network)\n",
"print(t)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Using these data and the methods introduced in our tutorial, we suggest to study the following problems (in ascending order of difficulty):\n",
"\n",
"- Generate higher-order visualisations of the US Flight and London Tube data and visually compare the graph layouts calculated for the first and optimal-order models.\n",
"- Use the `MultiOrderModel` class to learn the optimal order of a temporal network. How does the detected optimal order change with the time scale $\\delta$ that you use in the extraction of causal paths?\n",
"- Use the `MultiOrderModel` class to learn the optimal order of the London Tube data set. How does the detected optimal order compare to the prediction performance studied in exploration 1.4?\n",
"- Study the change in the algebraic connectivity between the second-order model and the second-order null model for (i) a temporal network data set and (ii) the US Flights data.\n",
"- Perform a spectral clustering of a dynamic social network based on the Laplacian of higher-order networks at different orders. How does the clustering differ from a first-order clustering?\n",
"\n",
"Again, these are only suggestions and you are welcome to use the time to study other data sets or questions that come to your mind. We'll be happy to help you with the analysis."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.5"
}
},
"nbformat": 4,
"nbformat_minor": 2
}