<!DOCTYPE html>
<html lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<meta name="description"
content="Yazan Boshmaf, Research Scientist in Cyber Security at Qatar Computing Research Institute, HBKU">
<meta name="keywords" content="Yazan Boshmaf, Boshmaf, News, Security, Privacy, Research, QCRI, HBKU">
<title>Research - Yazan Boshmaf</title>
<link rel="stylesheet" href="css/common.css" type="text/css">
<style type="text/css">
.title {
margin: 5px 0 5px 0;
font-weight: bold;
}
</style>
</head>
<body>
<div class="fore">
<h1>Research Statement</h1>
<h5 style="margin-bottom: 0px; margin-top: 5px;">A highly opinionated response to why I do research</h5>
<h2><a href="index.html">« Home</a></h2>
<div class=spacer></div>
<p class="indent">
<p style="padding: 16px 38px;"><em><b>Summary.</b> We live in the Information Age. Innovations in information
technology, such as the World Wide Web (a.k.a. the Web), play a pivotal role in advancing today's
knowledge-based societies and should be protected at all costs. My research revolves around the security and
privacy of the Web, with an emphasis on problems that impact the way we adopt and use its underlying
technologies. My ultimate goal is to empower Web users by building software systems that are resilient to centralized control, give users full control
of their data, and offer higher levels of privacy and security.</em></p>
<div class="title">Background</div>
<p>One characteristic of the Information Age is that our society is surrounded by a high-tech global economy in
which wealth is knowledge and growth is learning. This system of the world leads to a fundamental question about
our experience with the Web: When we go online, how do we identify ourselves, connect with others, and gain
knowledge?</p>
<p>When I started exploring this question in 2010, as part of an early investigation of Web single sign-on [<a
href="#1">1</a>], the answer was predominantly "Facebook." True then as it is now, we live in the Facebook Age.
We identify ourselves with our Facebook accounts, we connect with others by sending friend requests, and we learn
about the world from our newsfeeds. For many, the Web has become Facebook; or Google, Twitter, and the like.
Regardless of which “walled garden” we choose to visit, the Web has drifted far from its open-standards origins.
Today’s Web is dominated by a few large websites that use closed, centralized systems. Moreover, most of these
websites implement a business model that allows them to be used for free in exchange for user data, which is then
used for highly-targeted advertising. Beyond treating users as commodities, this “free-to-use” business model
is fundamentally flawed. A price of zero reduces a website's liability to its users, especially for privacy and security;
it incentivizes abusive retention strategies, which profile users and herd them towards sharing more data;
and it hides pricing signals, which prevents market learning and hinders competition and innovation. Yet, the Web
seems to work well in practice, does it not? Starting in early 2011, I set out to test this proposition, focusing on
the privacy and security implications of using large, centralized websites like Facebook.</p>
<div class="title">The Centralized Web: The Case Against Facebook</div>
<p>In 2011, Facebook had 500 million monthly active users and was projected to reach its first billion within a year. I
decided to investigate how feasible it is for an attacker to achieve a seemingly far-fetched goal: to automate,
and effectively fake, the whole Facebook experience for adversarial objectives such as private data
collection and misinformation.</p>
<div style="text-align: center;">
<img src="img/social_botnet.png" alt="Social botnet" style="width:360px;" />
<br><span><b>Figure 1.</b> Social botnet</span>
</div>
<p>After presenting the case for this investigation to The Office of the Privacy Commissioner of Canada, I started
designing a Web automation tool called a socialbot. In concept, a socialbot controls a fictitious Facebook account
and can perform social activities similar to those of real users, such as sending friend requests and posting
messages. As such, a socialbot is designed to deceive and pass itself off as a human being. A network of
socialbots can be controlled by an adversary in a command-and-control fashion, similar to malware botnets. For a
social botnet to operate at scale, all aspects of its operation must be automated, including account creation,
profile setup, befriending, and posting messages. The main challenge is not automation per se, but being able to
infiltrate and connect with the rest of the Facebook network. This is the case because successful social
infiltration is essential for evaluating privacy and security concerns, as isolated socialbots pose minimal risk
to users and are easy to detect. To overcome this challenge, the socialbots are designed to exploit known concepts
in sociology, such as targeting users with whom they have mutual friends (i.e., triadic closure), in order to
improve their infiltration success rate.</p>
<p>After consulting with the university’s research ethics board, I carefully operated a network of 100 socialbots on
Facebook for eight weeks, resulting in one main finding: With up to 80% infiltration rate, 250GB of raw data
collected from more than one million users, and less than 20% detection rate, Facebook and its users are left
defenseless [<a href="#2">2</a>,<a href="#3">3</a>]. This was the first comprehensive study to show the
vulnerability of large, centralized online social networks (OSNs) like Facebook to fake, abusive automation, and
it received the outstanding paper award at ACSAC '11. It also resulted in a private presentation to the InfoSec
Technology Transition Council for the Science &amp; Technology Directorate at the U.S. Department of Homeland
Security, in addition to international news coverage, years before Facebook was formally questioned by the U.S.
Senate about its handling of user data, fake news, and political bias in 2018.</p>
<p>In 2012, shortly after this investigation, I started looking at how OSNs can protect their users from automated
fake accounts [<a href="#4">4</a>]. Because OSNs are closed and centralized, it is not possible to evaluate new defenses
against the ones they deploy in practice unless researchers find a way to collaborate with them. Back
then, there were two main approaches to detecting fake accounts. The first is statistical and relies on detecting
anomalies in user behavior using machine learning. The second is mathematical and relies on computing properties
of the social graph, which represents users and their friendships, using combinatorics and probability theory. In
2013, I showed that both approaches are ineffective against socialbots, as their threat models assume fake
accounts that cannot behave like real users; they must be spammy and relatively isolated from real accounts [<a
href="#5">5</a>]. This investigation resulted in the best paper award at ASONAM '13, and highlighted the
challenge of designing defense systems that consider a more realistic and resourceful adversary who can operate a
social botnet.</p>
<div style="text-align: center;">
<img src="img/integro.png" alt="Integro's high-level design" style="width:340px;" />
<br><span><b>Figure 2.</b> Integro's high-level design</span>
</div>
<p>From 2014 to 2015, after studying the human factors affecting social infiltration [<a href="#6">6</a>], I worked
on building and improving Integro: A defense system that combines both approaches in a novel,
infiltration-resilient user ranking scheme that helps OSNs detect fake accounts [<a href="#7">7</a>,<a
href="#8">8</a>]. Integro predicts victims of fake accounts (i.e., potential victims) using supervised machine
learning, with features extracted from basic account information. As such, it makes no limiting assumptions about
fakes and shifts the focus to real accounts. Integro weights the social graph such that edges incident to
potential victims have lower weights than others. It then computes a rank for each account based on the landing
probability of a short, modified random walk that starts from a known real account. This walk is biased towards
traversing accounts that are reachable through edges with higher weights, which means it is highly unlikely to
land on fake accounts and most real accounts will be ranked higher than fake accounts, even if fakes have
infiltrated many real accounts.</p>
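<p>To make the ranking step concrete, the following is a minimal Python sketch of a victim-aware, weighted random walk. It is an illustration under stated assumptions, not Integro's actual implementation: the function name <code>integro_style_rank</code>, the weighting rule <code>1 - alpha * p</code>, the parameters <code>alpha</code> and <code>steps</code>, and the toy graph are all hypothetical, and the victim probabilities are supplied as inputs rather than produced by a trained classifier.</p>

```python
from collections import defaultdict


def integro_style_rank(edges, victim_prob, seed, steps=4, alpha=0.9):
    """Rank accounts by the landing probability of a short, weighted random walk.

    edges:       iterable of (u, v) friendships (undirected).
    victim_prob: account -> predicted probability of being a victim (assumed input).
    seed:        a known real account where the walk starts.
    """
    # Edges incident to likely victims get lower weights (hypothetical rule).
    weight = defaultdict(dict)
    for u, v in edges:
        p = max(victim_prob.get(u, 0.0), victim_prob.get(v, 0.0))
        w = 1.0 - alpha * p
        weight[u][v] = w
        weight[v][u] = w
    if seed not in weight:
        raise ValueError("seed account is not in the graph")

    # Weighted degree of each account, used to normalize transitions and ranks.
    strength = {u: sum(nbrs.values()) for u, nbrs in weight.items()}

    # Propagate probability mass from the seed for a small number of steps.
    prob = {u: 0.0 for u in weight}
    prob[seed] = 1.0
    for _ in range(steps):
        nxt = {u: 0.0 for u in weight}
        for u, mass in prob.items():
            for v, w in weight[u].items():
                nxt[v] += mass * w / strength[u]
        prob = nxt

    # Normalize by weighted degree so well-connected accounts are not favored.
    return {u: prob[u] / strength[u] for u in weight}


if __name__ == "__main__":
    # Toy graph: r1, r2, r3 are real; r3 is a likely victim befriended by fake f1.
    edges = [("r1", "r2"), ("r1", "r3"), ("r2", "r3"), ("r3", "f1"), ("f1", "f2")]
    ranks = integro_style_rank(edges, {"r3": 0.9}, seed="r1")
    for account, rank in sorted(ranks.items(), key=lambda kv: -kv[1]):
        print(account, round(rank, 4))
```

<p>In this toy graph, the only path from the seed to the fake accounts crosses low-weight edges around the predicted victim, so little probability mass reaches the fakes within a few steps. A real deployment would need to choose the walk length and weighting carefully for the size of the graph.</p>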
<p>I implemented Integro on top of open-source distributed systems and deployed it at Tuenti, a progressive OSN with
millions of users. A production-class evaluation resulted in detecting nearly 10 times more fake accounts than
existing defense systems. This was done in under 30 minutes for a social graph with 160 million nodes on 33
commodity machines.</p>
<p>It is important to highlight that websites should take the initiative and integrate, or at least evaluate, new
defense systems against theirs, hopefully before innovation is stifled by new, restrictive Web regulations that
are drafted in response to emerging security and privacy concerns. This is especially pressing now that the EU's GDPR is in
force and there is a recent push in the U.S. to regulate high-tech companies.</p>
<div class="title">The Decentralized Web: The Case for Bitcoin</div>
<p>In 2016, I started exploring how victim prediction can be applied to different Web security problems [<a
href="#9">9</a>]. I quickly realized, however, that this approach gives more legitimacy to collecting user data
by closed, centralized systems. Moreover, operating “big data silos” will eventually be infeasible, as there will
be far more data than a single, centralized system can manage for free.</p>
<p>With this perspective in mind, I revisited the threat of social botnets as an instance of the Sybil attack, which
is a well-studied problem in distributed, decentralized identity-based systems, such as Peer-to-Peer (P2P)
networks. In the Sybil attack, an adversary abuses the system by forging multiple, dishonest identities, each
called a Sybil, and joins the system under these identities for malicious objectives. A decentralized, distributed
system that is resilient to the Sybil attack is a good candidate for influencing the design of next-generation Web
systems. This is when I shifted my focus to Bitcoin and its underlying blockchain technology.</p>
<p>Bitcoin is a digital currency that uses a P2P network to move money from one user to another, each represented by
a single node in the network, without the need for intermediaries, such as banks. Transactions are verified by
loosely-organized nodes called miners and recorded in a public, distributed ledger called a blockchain. The
security of the blockchain is established by a chain of cryptographic puzzles that miners compete to solve. Each
miner that successfully solves a crypto puzzle is allowed to record a set of transactions, and to collect a reward
in Bitcoins. The more mining power (i.e., resources) a miner applies, the better its chances of solving the
puzzle first. This puzzle-and-reward mechanism, known as proof-of-work, provides an incentive for miners to contribute
their resources to the system, and is essential to the currency’s decentralized nature.</p>
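<p>The puzzle itself can be sketched in a few lines of Python. This is a deliberately simplified illustration, not Bitcoin's actual block format: Bitcoin applies SHA-256 twice to a binary block header, whereas the sketch hashes a plain string once, and the names <code>solve_puzzle</code> and <code>difficulty_bits</code> as well as the sample transactions are hypothetical.</p>

```python
import hashlib


def solve_puzzle(prev_hash: str, transactions: str, difficulty_bits: int = 12):
    """Search for a nonce whose block hash falls below the difficulty target."""
    target = 2 ** (256 - difficulty_bits)  # a smaller target means a harder puzzle
    nonce = 0
    while True:
        payload = f"{prev_hash}|{transactions}|{nonce}".encode()
        digest = hashlib.sha256(payload).hexdigest()
        if target > int(digest, 16):
            return nonce, digest
        nonce += 1


if __name__ == "__main__":
    genesis = "0" * 64
    nonce1, hash1 = solve_puzzle(genesis, "alice pays bob 1")
    # The second puzzle commits to the first block's hash, which forms the chain.
    nonce2, hash2 = solve_puzzle(hash1, "bob pays carol 1")
    print(nonce1, hash1)
    print(nonce2, hash2)
```

<p>Because each puzzle includes the hash of the previous block, changing an old transaction would require re-solving that block's puzzle and every puzzle after it. Verifying a solution, in contrast, takes a single hash computation.</p>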
<p>The Bitcoin network relies on IP address-based identity. In order to defend against Sybil attackers who attempt
to fill the network with dishonest nodes, an honest node only makes an outbound connection to one IP address per
/16 subnet (i.e., one IP out of a block of 65,536 IPs). This reduces the probability that all of an honest node's
peers are dishonest. To defend against a Sybil attacker who controls miners,
consensus in Bitcoin requires the majority of the mining power in the network to agree on the state of the blockchain. As it
is financially infeasible for an adversary to control the majority of the mining power in the network, the
security of the blockchain is preserved. This new crypto economy defined by Bitcoin is novel, and is already
fueling new blockchain-based Web systems, such as Filecoin for data storage and Steemit for social networking.</p>
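<p>The one-connection-per-/16 rule can be illustrated with Python's standard <code>ipaddress</code> module. This is a sketch of the idea only, limited to IPv4; the function names <code>netgroup</code> and <code>pick_outbound_peers</code>, the default of eight peers, and the sample addresses are assumptions made for illustration, not Bitcoin's actual peer-selection code.</p>

```python
import ipaddress


def netgroup(ip: str) -> str:
    """Return the /16 network that contains an IPv4 address."""
    return str(ipaddress.ip_network(f"{ip}/16", strict=False))


def pick_outbound_peers(candidates, max_peers=8):
    """Choose outbound peers so that no two share the same /16 network."""
    chosen, used_groups = [], set()
    for ip in candidates:
        group = netgroup(ip)
        if group in used_groups:
            continue  # already connected to a peer in this /16 block
        chosen.append(ip)
        used_groups.add(group)
        if len(chosen) == max_peers:
            break
    return chosen


if __name__ == "__main__":
    # The first three candidates share 10.1.0.0/16, so only one of them is kept.
    candidates = ["10.1.2.3", "10.1.9.9", "10.1.200.7", "10.2.0.1", "192.0.2.10"]
    print(pick_outbound_peers(candidates))
    print(ipaddress.ip_network("10.1.0.0/16").num_addresses)
```

<p>An attacker who rents many addresses inside a single /16 block therefore gains at most one of the node's outbound slots; filling all slots requires addresses spread across many distinct blocks.</p>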
<div style="text-align: center;">
<img src="img/cibr-lab.png" alt="CIBR-Lab's high-level architecture" style="width:620px;" />
<br><span><b>Figure 3.</b> CIBR-Lab's high-level architecture</span>
</div>
<p>In 2017, to better understand how decentralized systems like Bitcoin affect the way we use the Web, I started a new
focused effort called <a href="https://cibr.qcri.org" target="_blank">Cybersecurity Initiative for Blockchain Research (CIBR)</a>,
and kicked off the research by looking at ways to link off-chain data (e.g., a Twitter account) to on-chain data
(e.g., a Bitcoin address), focusing on applications related to privacy research and law enforcement. Existing blockchain
analysis systems, such as BlockSci, focus on efficiently analyzing on-chain data, with little or no support for auxiliary data. To
address this limitation, I am currently building and improving CIBR-Lab: A full-stack search, tagging, classification, and
analysis system for blockchains [<a href="#10">10</a>,<a href="#11">11</a>]. The challenge is to find a way to systematically incorporate auxiliary data into the
blockchain so that a new class of queries, which involve clustering, linking, classifying, and searching on/off-chain data, can
be performed using existing analysis systems. CIBR-Lab addresses this challenge by annotating on-chain data with
tags, which are auxiliary, off-chain data crawled from the Web and other information networks, such as Tor. These
tags are managed separately from the analysis sub-system as annotations, which can be clustered, linked, classified, and
searched through blockchain transactions. To simplify blockchain analysis, CIBR-Lab separates functionality by defining a
component-based, layered system architecture (i.e., a stack), where blockchain, data management, annotation, and
analytics are separate components with well-defined, extensible interfaces between them. This allows CIBR-Lab to provide quick,
partial answers to high-level queries like "Which Twitter accounts made Bitcoin payments to Silk Road?"</p>
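<p>The idea of keeping tags separate from on-chain data and joining them at query time can be sketched as follows. This is a toy model under loud assumptions, not CIBR-Lab's interface: the <code>Tag</code> and <code>Payment</code> types, the <code>accounts_paying</code> function, and all addresses and labels are made up for illustration, and real Bitcoin transactions have multiple inputs and outputs rather than a single sender and receiver.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tag:
    address: str  # on-chain identifier being annotated
    source: str   # where the off-chain data was crawled from
    label: str    # off-chain identity, e.g. an account or service name


@dataclass(frozen=True)
class Payment:
    sender: str
    receiver: str


def accounts_paying(payments, tags, source, target_label):
    """Return labels from `source` whose addresses paid an address tagged `target_label`."""
    tags_by_address = {}
    for tag in tags:
        tags_by_address.setdefault(tag.address, []).append(tag)
    target_addresses = {tag.address for tag in tags if tag.label == target_label}

    result = set()
    for payment in payments:
        if payment.receiver in target_addresses:
            for tag in tags_by_address.get(payment.sender, []):
                if tag.source == source:
                    result.add(tag.label)
    return sorted(result)


if __name__ == "__main__":
    # Entirely fictional annotations and payments.
    tags = [
        Tag("addr_a", "twitter", "@alice_example"),
        Tag("addr_b", "twitter", "@bob_example"),
        Tag("addr_m", "tor", "Example Market"),
    ]
    payments = [Payment("addr_a", "addr_m"), Payment("addr_b", "addr_c")]
    print(accounts_paying(payments, tags, "twitter", "Example Market"))
```

<p>Because the tags live outside the transaction data, new annotations can be added or corrected without touching the underlying blockchain records, and the answer is only as complete as the tags collected so far.</p>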
<p>As a proof of concept, we deployed CIBR-Lab in 2019 to investigate emerging security and privacy issues arising from
analyzing on/off-chain data of popular technologies. In particular, we showed that one can deanonymize Tor hidden
service users [<a href="#12">12</a>], uncover Ponzi schemes on Bitcoin [<a href="#13">13</a>], and identify users
who made donations to open-source projects on GitHub [<a href="#14">14</a>]. My ongoing work with CIBR-Lab has led
to a private presentation to The Office of Technology Research and Investigation at the U.S. Federal Trade Commission,
multi-year, multi-million dollar research grants from Qatar National Research Fund, and international news
coverage.</p>
<div class="title">Moving Forward: Towards a Private, Secure, and Decentralized Web</div>
<p>For the short term, I plan to continue working on CIBR-Lab, focusing on fraud detection and transaction risk
prediction. I then plan to deploy CIBR-Lab as an interactive blockchain analytics service, with dashboards displaying
real-time results of important, user-defined queries. Based on feedback from the U.S. Federal Trade Commission
and Qatar Financial Center Regulatory Authority, such capabilities are extremely helpful for protecting customers,
complying with Know Your Customer and Anti-Money Laundering laws, and drafting new, investor-friendly cryptocurrency
regulations.</p>
<p>For the longer term, I plan to analyze the privacy and security of general-purpose, decentralized Internet
platforms, such as Ethereum, EOS, and IPFS, which are currently used to build next-generation websites in the new
Web, or Web 3.0. As users
should have the freedom to choose where their data is stored and who is allowed to access it, I plan to focus on
achieving true data ownership on the Web. One future direction is to decouple Web data from its applications, with
mapping information residing on a public, permissioned blockchain. As such, a Web browser acts as a decentralized
content explorer that executes user-defined, self-contained, tokenized applications. This also allows both
versions of the Web to co-exist, with current centralized websites acting as both content and application
providers.</p>
<div class="title">References</div>
<ol>
<li id="1">
<p>
<div><span class="pdf"><a href="http://www.nspw.org/2009/proceedings/2010/nspw2010-sun.pdf">A Billion Keys, but
Few Locks: The Crisis of Web Single Sign-on</a></span></div>
<small>
San-Tsai Sun, Yazan Boshmaf, Kirstie Hawkey, and Konstantin Beznosov
<div class=ref><i>Proc. of 2010 New Security Paradigms Workshop</i><br>
NSPW ’10, Colonial Inn, Concord, MA, Dec 2010</div>
</small>
</p>
</li>
<li id="2">
<p>
<div><span class="pdf"><a href="http://lersse-dl.ece.ubc.ca/record/264/files/264.pdf">The Socialbot Network:
When Bots Socialize for Fame and Money</a></span></div>
<small>
Yazan Boshmaf, Ildar Muslukhov, Konstantin Beznosov, and Matei Ripeanu
<div class=ref><i>Proc. of 27th Annual Computer Security Applications Conference</i><br>
ACSAC ’11, Orlando, FL, Dec 2011 — <b>outstanding paper award</b></div>
</small>
</p>
</li>
<li id="3">
<p>
<div><span class="pdf"><a href="http://137.82.84.194/record/277/files/277.pdf">Design and Analysis of a Social
Botnet</a></span></div>
<small>
Yazan Boshmaf, Ildar Muslukhov, Konstantin Beznosov, and Matei Ripeanu
<div class=ref><i>Elsevier Computer Networks</i><br>
Volume 57 Issue 2, Pages 556-578, Feb 2013</div>
</small>
</p>
</li>
<li id="4">
<p>
<div><span class="pdf"><a href="http://lersse-dl.ece.ubc.ca/record/275/files/275.pdf">Key Challenges in
Defending against Malicious Socialbots</a></span></div>
<small>
Yazan Boshmaf, Ildar Muslukhov, Konstantin Beznosov, and Matei Ripeanu
<div class=ref><i>Proc. of 5th USENIX Workshop on Large-Scale Exploits and Emergent Threats</i><br>
LEET ’12, San Jose, CA, April 2012</div>
</small>
</p>
</li>
<li id="5">
<p>
<div><span class="pdf"><a href="http://lersse-dl.ece.ubc.ca/record/286/files/ASONAM_2013.pdf">Graph-based Sybil
Detection in Social and Information Systems</a></span></div>
<small>
Yazan Boshmaf, Konstantin Beznosov, and Matei Ripeanu
<div class=ref><i>Proc. of 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and
Mining</i><br>
ASONAM ’13, Niagara Falls, Canada, August 2013 — <b>best paper award</b></div>
</small>
</p>
</li>
<li id="6">
<p>
<div><span class="pdf"><a
href="https://www.usenix.org/system/files/conference/soups2014/soups14-paper-rashtian.pdf">To Befriend or
Not? A Model of Friend Request Acceptance on Facebook</a></span></div>
<small>
Hootan Rashtian, Yazan Boshmaf, Pooya Jaferian, and Konstantin Beznosov
<div class=ref><i>Proc. of 2014 Symposium on Usable Privacy and Security</i><br>
SOUPS ’14, Menlo Park, CA, July 2014</div>
</small>
</p>
</li>
<li id="7">
<p>
<div><span class="pdf"><a href="http://lersse-dl.ece.ubc.ca/record/296/files/NDSS_260_Final.pdf">Leveraging
Victim Prediction for Robust Fake Account Detection in OSNs</a></span></div>
<small>
Yazan Boshmaf, Dionysios Logothetis, Georgos Siganos, Jorge Lería, Jose Lorenzo, Matei Ripeanu, and Konstantin
Beznosov
<div class=ref><i>Proc. of 2015 Network and Distributed System Security Symposium</i><br>
NDSS ’15, San Diego, CA, Feb 2015</div>
</small>
</p>
</li>
<li id="8">
<p>
<div><span class="pdf"><a
href="https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43862.pdf">Thwarting
Fake OSN Accounts by Predicting their Victims</a></span></div>
<small>
Yazan Boshmaf, Matei Ripeanu, Konstantin Beznosov, and Elizeu Santos-Neto
<div class=ref><i>Proc. of 8th ACM Workshop on Artificial Intelligence and Security</i><br>
AI-Sec ’15, Denver, CO, Oct 2015</div>
</small>
</p>
</li>
<li id="9">
<p>
<div><span class="pdf"><a
href="http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/45557.pdf">Harvesting
the Low-Hanging Fruits: Defending Against Automated Large-Scale Cyber-Intrusions by Focusing on the
Vulnerable Population</a></span></div>
<small>
Hassan Halawa, Konstantin Beznosov, Yazan Boshmaf, Baris Coskun, Elizeu Santos-Neto, and Matei Ripeanu
<div class=ref><i>Proc. of 2016 New Security Paradigms Workshop</i><br>
NSPW ’16, C Lazy U Ranch, CO, Sep 2016</div>
</small>
</p>
</li>
<li id="10">
<p>
<div><span class="pdf"><a href="https://arxiv.org/pdf/1809.06044">BlockTag: Design and Applications of a Tagging
System for Blockchain Analysis</a></span></div>
<small>
Yazan Boshmaf, Husam Al Jawaheri, and Mashael Al Sabah
<div class=ref><i>Proc. of 34th International Conference on ICT Systems Security and Privacy
Protection</i><br>
IFIP SEC ’19, Lisbon, Portugal, Jun 2019</div>
</small>
</p>
</li>
<li id="11">
<p>
<div><span class="pdf"><a href="https://arxiv.org/pdf/2209.07202.pdf">Dizzy: Large-Scale Crawling and Analysis of Onion Services</a></span></div>
<small>
Yazan Boshmaf, Isuranga Perera, Udesh Kumarasinghe, Sajitha Liyanage, and Husam Al Jawaheri
<div class=ref><i>arXiv preprint</i><br>
arXiv:2209.07202, Sep 2022</div>
</small>
</p>
</li>
<li id="12">
<p>
<div><span class="pdf"><a href="https://arxiv.org/pdf/1801.07501.pdf">Deanonymizing Tor Hidden Service Users
Through Bitcoin Transactions Analysis</a></span></div>
<small>
Husam Al Jawaheri, Mashael Al Sabah, Yazan Boshmaf, and Aiman Erbad
<div class=ref><i>Elsevier Computers &amp; Security</i><br>
Volume 89, Article 101684, Feb 2020</div>
</small>
</p>
</li>
<li id="13">
<p>
<div><span class="pdf"><a href="https://arxiv.org/pdf/1910.12244">Investigating MMM Ponzi Scheme on
Bitcoin</a></span></div>
<small>
Yazan Boshmaf, Charitha Elvitigala, Husam Al Jawaheri, Primal Wijesekera, and Mashael Al Sabah
<div class=ref><i>Proc. of 15th ACM Asia Conference on Computer and Communications Security</i><br>
AsiaCCS ’20, Taipei, Taiwan, Oct 2020</div>
</small>
</p>
</li>
<li id="14">
<p>
<div><span class="pdf"><a href="https://arxiv.org/pdf/1907.04002">Characterizing Bitcoin Donations
to Open Source Software on GitHub</a></span></div>
<small>
Yury Zhauniarovich, Yazan Boshmaf, Husam Al Jawaheri, and Mashael Al Sabah
<div class=ref><i>arXiv preprint</i><br>
arXiv:1907.04002, Jul 2019</div>
</small>
</p>
</li>
</ol>
</p>
</div>
</body>
</html>