# The Geoprivacy Implications of RTT Measurement Data

**Author: Brian Trammell <brian@trammell.ch>**

**Revision of** *7 November 2018*; latest revision is [here](https://nbviewer.jupyter.org/github/britram/trilateration/blob/master/paper.ipynb)

***Abstract***

abstract goes here



# Introduction

*what we look at, and why...*

This paper is derived from the [original trilateration notebook](https://github.com/britram/trilateration/blob/rev1/paper.ipynb) developed for the QUIC RTT Design Team in August 2017, as well as from Trammell, B., and Kühlewind, M., ["Revisiting the Privacy Implications of two-way Internet Latency Data"](https://github.com/mami-project/rtt-privacy-paper), in the proceedings of the 2018 Passive and Active Measurement Conference, Berlin, March 2018. It updates the methodology slightly and bases its discussion on a different and newer subset of the RIPE Atlas anchoring measurements.

## The Components of Round-Trip Time


We begin with an examination of the components of end-to-end latency as can be
observed at either endpoint of a transport-layer connection, the sender of an
ICMP Echo Request, or the observer of a TCP flow with full information about
sequence and acknowledgment numbers and timestamps in both directions of a
flow. This observable RTT $RTT_{obs}$ is given by the equation below,
for $f$ hops in one direction and $r$ hops in the opposite direction, where
$D_{prop}$ is propagation delay on a link, $D_{queue}$ is queueing delay at a
forwarding node, $D_{proc}$ is processing delay at a forwarding node,
$D_{stack}$ is stack delay at the remote endpoint (the time it takes for a
packet to make it from the network interface to the application and back,
including acknowledgment delay [Ding15] when traffic is unidirectional),
and $D_{app}$ is application delay at that endpoint.

\begin{equation}
    \begin{split}
        RTT_{obs} = \sum_{n=0}^f(D_{prop_{n \rightarrow n+1}} + D_{queue_n} + D_{proc_n}) + \\
        \sum_{m=0}^r(D_{prop_{m \rightarrow m+1}} + D_{queue_m} + D_{proc_m}) + \\
         D_{stack} + D_{app}
    \end{split}
\end{equation}

This equation illustrates the confounding effect of end-to-end RTT
measurement, which we will explore in more detail later. Each potential threat
to privacy uses only one component of delay measured in the observable RTT,
but all components are mixed together in a given RTT sample. The challenge in
exploiting this information is then to reduce the irrelevant components to a
known constant. For example, in the geolocation case, the desired RTT would be
(a) perfectly symmetric and (b) made up of only propagation delay (c) in a
straight line between endpoints, which would allow a distance measurement as
in the equation below, where $kc$ is the speed of light in the
Internet: the speed of light in vacuum $c$ scaled by a known and constant
factor $k$ for refraction in optical fiber and/or propagation in other
physical media. The expression for $dist$ is an inequality because even in an
ideal case (c) does not hold: the great-circle path between two points and the
path actually followed by physical Internet infrastructure differ.

<a id="eq-dist"></a>

\begin{equation}
    dist < kc \frac{\sum_{n=0}^fD_{prop_{n \rightarrow n+1}} + \sum_{m=0}^rD_{prop_{m \rightarrow m+1}}}{2}
\end{equation}
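
As a rough worked example, assume $k = 2/3$ (a common approximation for optical fiber; this value is an assumption made here for illustration and is not fixed by the text above) and a round trip made up of exactly 1 ms of propagation delay in total. With $c \approx 299\,792$ km/s, the bound evaluates to:

\begin{equation}
    dist < \frac{2}{3} \times 299\,792\ \mathrm{km/s} \times \frac{0.001\ \mathrm{s}}{2} \approx 100\ \mathrm{km}
\end{equation}

Under that assumption, each millisecond of round-trip propagation delay corresponds to at most about 100 km of distance between the endpoints.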
                        


On the flip side, if light distance could be known, and processing and queueing
delay were zero, the propagation terms could be subtracted out of the observed
RTT, yielding only stack and application delay and turning RTT observations
into load observations, as in the following equation:

\begin{equation}
    load \propto D_{stack} + D_{app}
\end{equation}

The utility of RTT measurements to various geolocation and activity
fingerprinting tasks, then, is directly related to the separability of these
terms. This is the question we address in the rest of this work.
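
The separability question can be made concrete with a small sketch. The model below is hypothetical: the `Hop` class, the function names, and all delay values are illustrative assumptions, not part of the measurement data. It builds an observable RTT from its components and then shows what remains once propagation delay is subtracted.

```python
from dataclasses import dataclass


@dataclass
class Hop:
    """Delays contributed by one hop, in milliseconds (hypothetical model)."""
    prop_ms: float   # propagation delay on the link leaving this node
    queue_ms: float  # queueing delay at this node
    proc_ms: float   # processing delay at this node


def path_delay_ms(hops):
    """Sum propagation, queueing and processing delay over one direction."""
    return sum(h.prop_ms + h.queue_ms + h.proc_ms for h in hops)


def rtt_obs_ms(forward, reverse, stack_ms, app_ms):
    """Observable RTT: both directions plus endpoint stack and application delay."""
    return path_delay_ms(forward) + path_delay_ms(reverse) + stack_ms + app_ms


def endpoint_residual_ms(rtt_ms, forward, reverse):
    """RTT minus propagation delay only.

    This equals D_stack + D_app only when queueing and processing delay are zero;
    otherwise those terms remain mixed into the residual.
    """
    prop = sum(h.prop_ms for h in forward) + sum(h.prop_ms for h in reverse)
    return rtt_ms - prop


# Illustrative values only, not measurements.
forward = [Hop(4.0, 0.0, 0.0), Hop(6.0, 0.0, 0.0)]
reverse = [Hop(6.0, 0.0, 0.0), Hop(4.0, 0.0, 0.0)]
rtt = rtt_obs_ms(forward, reverse, stack_ms=0.5, app_ms=1.5)
residual = endpoint_residual_ms(rtt, forward, reverse)
```

With zero queueing and processing delay, `residual` recovers the 2.0 ms of stack and application delay. Setting any `queue_ms` or `proc_ms` to a nonzero value inflates the residual by the same amount, which is the confounding effect described above.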


# Methodology

In this work, we focus on the relationship between round-trip time and distance, in an attempt to characterize and quantify the *geoprivacy risk* associated with a given collection of RTT data: the probability that said data could be used to produce or refine an estimate of the location of an Internet endpoint with unknown location, and the maximum resolution of that estimate.

Put differently, we reconsider the problem of using Internet latency measurements for geolocation, but treat it as the *attacker's* problem. This problem has been well-considered in the literature: *[EDITOR'S NOTE: link citations from the PAM paper here]*.

Our methodology assumes that our input is RTT data between an endpoint with a known location and an endpoint with an unknown location, and that our output is an estimate of the location of the unknown endpoint. This would be the case, for example, with passively measured RTT between an access network and a content-delivery network, where the location of the particular CDN endpoint can be assumed to reasonably high resolution given an understanding of how the CDN distributes and assigns names and addresses to its front-end servers.

The most useful approach to geolocation from latency measurements is *exclusion*: given a measurement from a known point toward an unknown point, the unknown point cannot be outside a sphere around the known point whose radius $r$ is the distance given by the [second equation](#eq-dist) above. Taking a series of RTT measurements, and taking the minimum of these to minimize $D_{queue}$ and $D_{proc}$ in each direction, gives

\begin{equation}
    r < kc\frac{\min_n(RTT_n)}{2}
\end{equation}

(where $\frac{RTT}{2}$ represents the maximum one-way delay a given RTT can represent).
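
The radius computation above can be sketched as follows. The propagation factor `K_ASSUMED = 2/3` is an assumption chosen for illustration (roughly the value for optical fiber), and the function name and sample values are hypothetical.

```python
C_KM_PER_MS = 299.792458  # speed of light in vacuum, in km per millisecond
K_ASSUMED = 2.0 / 3.0     # ASSUMPTION: propagation factor k, roughly that of optical fiber


def exclusion_radius_km(rtt_samples_ms, k=K_ASSUMED):
    """Upper bound on the distance to the unknown endpoint, from a series of RTT samples.

    Taking the minimum sample reduces the contribution of queueing and
    processing delay; halving it converts round-trip delay to one-way delay.
    """
    if not rtt_samples_ms:
        raise ValueError("need at least one RTT sample")
    min_rtt_ms = min(rtt_samples_ms)
    if min_rtt_ms < 0:
        raise ValueError("RTT samples must be non-negative")
    return k * C_KM_PER_MS * min_rtt_ms / 2.0


# Illustrative samples in milliseconds, not measurements.
radius = exclusion_radius_km([23.9, 20.0, 31.4, 20.7])
```

Here the minimum sample is 20.0 ms, so the unknown endpoint is bounded to within roughly 2000 km of the known point under the assumed value of $k$. A smaller minimum RTT yields a proportionally tighter bound.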

Intersecting the spheres around multiple known points, and then intersecting the resulting volume with the surface of the Earth, where most endpoints are presumed to be, narrows down the region within which the unknown endpoint must lie.
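
One way to sketch this intersection is to test candidate points on the Earth's surface against every exclusion sphere. The code below is a minimal illustration under stated assumptions: a spherical Earth with mean radius 6371 km, a coarse latitude/longitude grid, and hypothetical known points and radii. It is not the notebook's own implementation.

```python
import math

EARTH_RADIUS_KM = 6371.0  # ASSUMPTION: spherical Earth with mean radius


def to_xyz(lat_deg, lon_deg):
    """Convert latitude/longitude in degrees to 3-D coordinates on a spherical Earth."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (
        EARTH_RADIUS_KM * math.cos(lat) * math.cos(lon),
        EARTH_RADIUS_KM * math.cos(lat) * math.sin(lon),
        EARTH_RADIUS_KM * math.sin(lat),
    )


def chord_km(a, b):
    """Straight-line (through-the-Earth) distance between two (lat, lon) points."""
    return math.dist(to_xyz(*a), to_xyz(*b))


def in_feasible_region(candidate, constraints):
    """True if the candidate surface point lies inside every exclusion sphere.

    constraints is a list of ((lat, lon), radius_km) pairs, one per known point.
    """
    return all(chord_km(candidate, known) <= radius for known, radius in constraints)


def feasible_grid(constraints, step_deg=1.0):
    """Grid points on the Earth's surface that satisfy all constraints."""
    points = []
    lat = -90.0
    while lat <= 90.0:
        lon = -180.0
        while lon < 180.0:
            if in_feasible_region((lat, lon), constraints):
                points.append((lat, lon))
            lon += step_deg
        lat += step_deg
    return points


# Hypothetical known points (approximate city coordinates) and radii in km.
constraints = [((47.38, 8.54), 450.0), ((52.37, 4.90), 450.0)]
candidate_ok = in_feasible_region((50.11, 8.68), constraints)
candidate_far = in_feasible_region((40.42, -3.70), constraints)
```

In this example the first candidate lies within both spheres and the second lies outside them. Each additional known point can only shrink the set returned by `feasible_grid`, which is why more vantage points improve the resolution of the location estimate.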

To validate the utility of RTT data for this methodology, we need a source of RTT data covering a relatively diverse set of Internet paths where the locations of both endpoints are known with some precision; we turn to [RIPE Atlas](https://atlas.ripe.net) for this data.

## Data Sources



RIPE Atlas periodically takes *anchoring measurements...*

```python
# Add code here to retrieve or prepare data from source, if appropriate.
# Alternately, data preparation can be done in a separate notebook, linked from here.
```

# Analysis and Findings

data analysis results, plots, tables, etc go here.

```python
# code for plots and tables etc. goes here
```

# Conclusions

conclusions go here