```python
import pandas as pd
import folium
from folium.plugins import MarkerCluster
```

We aim to evaluate the privacy risks associated with the user trajectories that can be extracted from pois.csv and queries.csv.
We assume an honest-but-curious adversary. The adversary has access to the query metadata (as in queries.csv), namely the IP address, location, timestamp and POI type of each query, as well as the locations returned by the service for each POI type (as in pois.csv). We keep the handout's assumption that each IP address corresponds to a unique user, and we further assume that the adversary can link an IP address to the real-life person they are targeting. The adversary's goal is to use this data to breach the privacy of specific users by uncovering their sensitive locations (home and work) and their interests.

We propose three attacks:
- We first show how this data can be used to map a user's movements over a given period (here, twenty days).
- Using the same tool, we then show how filtering the data (for example, where the user most often is on weekends, or after work hours on weekdays) reveals sensitive locations such as home and work.
- Finally, we use query.py to learn more about the user's interests and the places they might frequent.

# Mapping User Movement
This attack relies only on the query metadata. We use the ip_address field to isolate a single user's queries, then place each query's location on a map together with its timestamp. Timestamps are expressed in hours since the start of the simulation, which we take to be 05/05/2025 at 00:00 (a Monday), so converting them gives the date and time of day of each query.
To keep the map readable, make_map takes start and end parameters (1-indexed, inclusive day numbers) so that only a subset of the data is shown. For example, we may want to look at a single week, since a week is typically a cycle that repeats for most people.
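
As a sanity check on the time handling, the sketch below isolates the date arithmetic that make_map performs. It assumes, as the notebook code does, that the timestamp column counts hours since the simulation start and that start and end are 1-indexed, inclusive day numbers. The names `SIM_START`, `hours_to_datetime` and `day_window` are hypothetical helpers introduced only for this illustration.

```python
import pandas as pd

# Simulation start assumed by the analysis: 5 May 2025, 00:00 (a Monday).
SIM_START = pd.Timestamp("2025-05-05 00:00")


def hours_to_datetime(hours_since_start):
    """Convert a timestamp given in hours since SIM_START to a datetime."""
    return SIM_START + pd.to_timedelta(hours_since_start, unit="h")


def day_window(start, end):
    """Return the half-open [start_time, end_time) window for 1-indexed days start..end."""
    start_time = SIM_START + pd.Timedelta(days=start - 1)
    end_time = SIM_START + pd.Timedelta(days=end)
    return start_time, end_time


# Day 1 starts on a Monday, so day_window(1, 7) covers exactly one week.
lo, hi = day_window(1, 7)
print(lo.day_name(), hi - lo)   # Monday 7 days 00:00:00
print(hours_to_datetime(12.5))  # 2025-05-05 12:30:00
```

Using a half-open window (`>= start_time` and `< end_time`) means a query sent exactly at midnight is counted in the day it starts, never in two days at once.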

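```python
def make_map(start, end, ip):
    """Plot the queries sent by `ip` between days `start` and `end` (1-indexed, inclusive)."""
    df = pd.read_csv(
        "queries.csv",
        sep=r"\s+",  # whitespace-separated columns; raw string avoids an invalid-escape warning
        header=0,
        names=["ip_address", "lat", "lon", "timestamp", "poi_type_query"],
    )
    # Keep only this user's queries; .copy() avoids SettingWithCopyWarning below.
    df = df[df["ip_address"] == ip].copy()

    # Timestamps are hours elapsed since the simulation start (Monday 5 May 2025, 00:00).
    sim_start = pd.to_datetime("2025-05-05 00:00")
    df["datetime"] = sim_start + pd.to_timedelta(df["timestamp"], unit="h")

    # Restrict to the half-open window [start of day `start`, end of day `end`).
    start_time = sim_start + pd.Timedelta(days=start - 1)
    end_time = sim_start + pd.Timedelta(days=end)
    df = df[(df["datetime"] >= start_time) & (df["datetime"] < end_time)].copy()
    if df.empty:
        raise ValueError(f"No queries from {ip} between day {start} and day {end}")

    def ordinal(n):
        # 10..20 (including 11th, 12th, 13th) take "th"; otherwise use the last digit.
        if 10 <= n % 100 <= 20:
            suffix = "th"
        else:
            suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
        return f"{n}{suffix}"

    # Human-readable label, e.g. "Monday 5th 12:30".
    df = df.sort_values("datetime")
    df["day_ordinal"] = df["datetime"].dt.day.apply(ordinal)
    df["label"] = (
        df["datetime"].dt.strftime("%A ")
        + df["day_ordinal"]
        + df["datetime"].dt.strftime(" %H:%M")
    )

    center = [df["lat"].mean(), df["lon"].mean()]
    m = folium.Map(location=center, zoom_start=13)
    cluster = MarkerCluster().add_to(m)

    # One clustered marker per query, with its time and POI type in the popup.
    for _, row in df.iterrows():
        popup = folium.Popup(
            html=(
                f"<b>Time:</b> {row['label']}<br>"
                f"<b>Query:</b> {row['poi_type_query']}"
            ),
            max_width=200,
        )
        folium.Marker(
            location=[row["lat"], row["lon"]],
            popup=popup,
            icon=folium.Icon(icon="info-sign"),
        ).add_to(cluster)

    # Connect the queries in chronological order (computed once, after the loop).
    coords = df[["lat", "lon"]].values.tolist()
    folium.PolyLine(locations=coords, weight=3, opacity=0.7).add_to(m)
    return m


make_map(1, 1, "146.71.112.211")
```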

Different information can be extracted depending on the selected dates. Suppose we are following the user with IP address 146.71.112.211. By changing the start and end arguments of make_map, we can map the user's movements over one or several days and see where they were at specific times and dates. For example, here we selected day 1, a Monday, and we can see that the user was in the Renens Gare area around midday and in Prilly at night.
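
The map answers "where was the user at time t?" visually; the same question can also be asked programmatically. The sketch below uses a hypothetical helper, `last_known_location`, which assumes a per-user DataFrame with the `lat`, `lon`, `datetime` and `poi_type_query` columns built inside make_map and returns the most recent query sent at or before the requested time. It only reveals where the user was when they last queried the service, so it is an approximation of their actual position.

```python
import pandas as pd


def last_known_location(df, when):
    """Return (lat, lon, datetime, poi_type_query) of the latest query at or before `when`.

    Assumes `df` holds a single user's queries with the columns built in make_map.
    Returns None if the user sent no query before `when`.
    """
    when = pd.Timestamp(when)
    earlier = df[df["datetime"] <= when].sort_values("datetime")
    if earlier.empty:
        return None
    row = earlier.iloc[-1]
    return row["lat"], row["lon"], row["datetime"], row["poi_type_query"]


# Toy data made up for illustration; it is not taken from queries.csv.
toy = pd.DataFrame({
    "lat": [1.0, 2.0],
    "lon": [1.0, 2.0],
    "datetime": pd.to_datetime(["2025-05-05 12:10", "2025-05-05 21:40"]),
    "poi_type_query": ["restaurant", "bar"],
})
print(last_known_location(toy, "2025-05-05 18:00"))
```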

The same mapping function also lets us infer users' sensitive locations. Let's view the user's locations over a week:

```python
make_map(1, 7, "146.71.112.211")
```

Most of their queries still come from the same two areas, around Renens Gare and Prilly. For most people, the two places where they spend the most time during a week are home and work: typically work during the day and home at night. People also tend to search for places to eat for lunch while at work, whereas they are usually at home in the evenings (perhaps searching for places to unwind, such as bars) and on weekend mornings.

This pattern matches our user's data well. They often send "restaurant" and "cafeteria" queries between 11 am and 1 pm from the location near Renens Gare, specifically around Av. du Tir-Fédéral 15, 1024 Ecublens, or possibly a nearby building. We can therefore reasonably infer that this is where they work. A quick look at Google Maps shows a few possible employers in that area, such as Holinger SA, Unimed SA or a kindergarten.

The user also sends many queries from the Prilly area on weekday evenings and early weekend mornings, more specifically from around Chem. des Charmilles 10, 1004 Lausanne. Judging from the map, and after a quick sanity check on Google Maps, this looks like a residential area, which is most likely where the user lives.
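
The manual reasoning above can be approximated in code. The sketch below is an illustrative heuristic, not the procedure used to produce the maps: `infer_home_work` is a hypothetical helper that expects a per-user DataFrame with `lat`, `lon` and `datetime` columns (like the one built inside make_map). It rounds coordinates to three decimals (roughly 100 m) to group nearby queries, then returns the most frequent location during weekday office hours (a guess at work) and during nights and weekend mornings (a guess at home). All hour boundaries are assumptions and would need tuning for each user.

```python
import pandas as pd


def infer_home_work(df, precision=3, work_hours=(9, 18), night_hours=(20, 8),
                    weekend_morning_end=12):
    """Guess (work, home) as the most frequent rounded location in two time windows.

    Expects columns lat, lon and datetime. The hour boundaries are illustrative
    assumptions, not values derived from the dataset. Returns a pair of
    (lat, lon) tuples; an entry is None if its time window contains no query.
    """
    d = df.copy()
    # Rounding to `precision` decimals merges queries within roughly 100 m (for 3).
    d["lat_r"] = d["lat"].round(precision)
    d["lon_r"] = d["lon"].round(precision)

    hour = d["datetime"].dt.hour
    weekend = d["datetime"].dt.dayofweek >= 5  # Monday=0, ..., Saturday=5, Sunday=6

    # Work: weekdays during office hours.
    is_work = ~weekend & (hour >= work_hours[0]) & (hour < work_hours[1])
    # Home: late evenings and early mornings on any day, plus weekend mornings.
    is_home = (hour >= night_hours[0]) | (hour < night_hours[1]) | (
        weekend & (hour < weekend_morning_end)
    )

    def most_frequent_cell(mask):
        sub = d[mask]
        if sub.empty:
            return None
        # idxmax on a Series with a MultiIndex returns the (lat_r, lon_r) label as a tuple.
        return sub.groupby(["lat_r", "lon_r"]).size().idxmax()

    return most_frequent_cell(is_work), most_frequent_cell(is_home)
```

Rounding to a grid is a crude substitute for proper spatial clustering (for instance DBSCAN with a haversine metric): two queries from the same building can land in different cells if they straddle a grid boundary. The returned coordinates can then be looked up on a map, as was done manually above.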



# User Interests

