# Tracking without space and time attributes

This notebook showcases an example where a personal database does not contain any time or space attributes. Still, that data may become critical when combined with other data sources that do have spatial and/or temporal attributes.

Copyright Jens Dittrich & Christian Schön, [Big Data Analytics Group](https://bigdata.uni-saarland.de/), [CC-BY-SA](https://creativecommons.org/licenses/by-sa/4.0/legalcode)

Define the schema and load the data of **Mr. M's** personal database:

In [1]:
-- personal sneakers "database" (in fact just one tiny relation) of Mr. M:
PRAGMA foreign_keys = OFF;

DROP TABLE IF EXISTS sneakers;

PRAGMA foreign_keys = ON;

CREATE TABLE sneakers (
    id INTEGER PRIMARY KEY,
    label TEXT,
    size FLOAT
);



In [2]:
-- load csv data into tables:
-- enable csv mode:
.mode csv

-- import the necessary files:
.import data/sneakers/MrM_sneakers_no_header.csv sneakers

-- enable pretty formatting:
.mode columns
.headers on



In [3]:
SELECT * FROM sneakers;

id          label       size      
----------  ----------  ----------
4           Flyer       41.0      
6           Jumper      44.0      
7           Runner      40.0      
9           Flyer       43.0      
10          Walker      39.0      
12          Jumper      41.0      


Define the schema and load the data of the **shop's** private database:

In [4]:
-- database available at a shop:
-- just a subset of a possibly larger schema to make the point
PRAGMA foreign_keys = OFF;

DROP TABLE IF EXISTS shoes;
DROP TABLE IF EXISTS purchases;

PRAGMA foreign_keys = ON;

CREATE TABLE shoes (
    id INTEGER PRIMARY KEY,
    label TEXT,
    size FLOAT
);

CREATE TABLE purchases (
    shoes_id INTEGER,
    amount INTEGER,
    date TEXT,
    FOREIGN KEY(shoes_id) REFERENCES shoes(id)
);



In [5]:
-- load csv data into tables:
-- enable csv mode:
.mode csv

-- import the necessary files:
.import data/sneakers/shop_shoes_no_header.csv shoes
.import data/sneakers/shop_purchases_no_header.csv purchases

-- enable pretty formatting:
.mode columns
.headers on



In [6]:
SELECT * FROM shoes;

id          label       size      
----------  ----------  ----------
1           Runner      43.0      
2           Walker      42.0      
3           Flyer       46.0      
4           Flyer       41.0      
5           Walker      44.0      
6           Jumper      44.0      
7           Jumper      43.0      
8           Jumper      41.0      


In [7]:
SELECT * FROM purchases;

shoes_id    amount      date               
----------  ----------  -------------------
2           1           2019-04-01 09:03:29
3           1           2019-04-02 10:03:29
8           1           2019-04-27 12:01:19


In [8]:
-- show shoe purchases with timestamps:
DROP VIEW IF EXISTS ShoePurchases;

CREATE VIEW ShoePurchases AS
SELECT label, size, amount, date
FROM shoes
JOIN purchases ON shoes.id = purchases.shoes_id;

SELECT *
FROM ShoePurchases;

label       size        amount      date               
----------  ----------  ----------  -------------------
Walker      42.0        1           2019-04-01 09:03:29
Flyer       46.0        1           2019-04-02 10:03:29
Jumper      41.0        1           2019-04-27 12:01:19


In [9]:
-- Which shoes exist both in Mr. M's and
-- the shop's database in the same sizes?
DROP VIEW IF EXISTS SameShoes;

CREATE VIEW SameShoes AS
SELECT label, size
FROM sneakers
INTERSECT
SELECT label, size
FROM shoes;

SELECT *
FROM SameShoes;

label       size      
----------  ----------
Flyer       41.0      
Jumper      41.0      
Jumper      44.0      


In [10]:
-- Does Mr. M own a shoe whose label and size match a purchase at the shop?
-- (With the data above, the only match was purchased on April 27.)
DROP VIEW IF EXISTS MrMOwnsPurchasedShoe;

CREATE VIEW MrMOwnsPurchasedShoe AS
SELECT ShoePurchases.label, ShoePurchases.size, date
FROM ShoePurchases
JOIN SameShoes
    ON ShoePurchases.label = SameShoes.label
    AND ShoePurchases.size = SameShoes.size;

SELECT *
FROM MrMOwnsPurchasedShoe;

label       size        date               
----------  ----------  -------------------
Jumper      41.0        2019-04-27 12:01:19


From this we conclude that Mr. M owns a shoe of a label and size that was purchased at that shop on 2019-04-27 12:01:19.
This **does not** imply that we are talking about the same physical shoe. Nor does it imply that Mr. M bought that shoe on that day. He might have acquired it elsewhere...
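
How ambiguous the link is can be checked directly. The following is a minimal sketch, assuming the cells above have been run so that the views `SameShoes` and `ShoePurchases` exist; the column alias `matching_purchases` is introduced here for illustration only.

```sql
-- For every label/size that both Mr. M and the shop have,
-- count how many recorded purchases match it.
-- 0: owned by Mr. M, but no purchase of it is recorded at this shop.
-- 1: exactly one candidate purchase.
-- more than 1: several candidate purchases, so the link is ambiguous.
SELECT ss.label, ss.size, COUNT(sp.date) AS matching_purchases
FROM SameShoes AS ss
LEFT JOIN ShoePurchases AS sp
    ON sp.label = ss.label
    AND sp.size = ss.size
GROUP BY ss.label, ss.size;
```

With the data shown above, only the Jumper in size 41 should have a non-zero count. Even a count of 1 only says that one purchase fits; it does not say that Mr. M made it.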

Notice that no banking card information is associated here (with it, he would be easy to track anyway). Let's assume that the shop was paid in cash.

**What if?**

1. That shoe is a rare item that is sold only once every couple of days?

2. We get access to a copy of Mr. M's database from shortly before April 27 that does not contain that shoe?

3. We systematically acquire other data sources from that shop and from that day that show activities of Mr. M, e.g. other purchases, video surveillance footage, audio recordings, etc.? The search space for all of these has shrunk dramatically thanks to our analysis above.
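
The three scenarios can be sketched as queries. This is an illustration under assumptions, not part of the original analysis: it assumes the cells above have been run, the table `sneakers_old` and its rows are a hypothetical older snapshot invented for this example, and the 30-minute window in the last query is an arbitrary choice.

```sql
-- Scenario 1: how often was each label/size sold in the recorded period?
-- A small units_sold value makes a single purchase stand out.
SELECT label, size, SUM(amount) AS units_sold,
       MIN(date) AS first_sale, MAX(date) AS last_sale
FROM ShoePurchases
GROUP BY label, size;

-- Scenario 2: HYPOTHETICAL older snapshot of Mr. M's database.
-- These rows are made up for illustration: the current data
-- minus the Jumper in size 41.
DROP TABLE IF EXISTS sneakers_old;

CREATE TABLE sneakers_old (
    id INTEGER PRIMARY KEY,
    label TEXT,
    size FLOAT
);

INSERT INTO sneakers_old (id, label, size) VALUES
    (4, 'Flyer', 41.0),
    (6, 'Jumper', 44.0),
    (7, 'Runner', 40.0),
    (9, 'Flyer', 43.0),
    (10, 'Walker', 39.0);

-- Shoes that appear only in the newer copy, joined with the
-- shop's purchases of the same label and size:
SELECT n.label, n.size, sp.date
FROM (
    SELECT label, size FROM sneakers
    EXCEPT
    SELECT label, size FROM sneakers_old
) AS n
JOIN ShoePurchases AS sp
    ON sp.label = n.label
    AND sp.size = n.size;

-- Scenario 3: other purchases at the shop close to the timestamp
-- found above (the window of +/- 30 minutes is an assumption).
SELECT *
FROM purchases
WHERE date BETWEEN datetime('2019-04-27 12:01:19', '-30 minutes')
               AND datetime('2019-04-27 12:01:19', '+30 minutes');
```

The set difference in scenario 2 narrows the acquisition down to the time between the two snapshots, and the time window in scenario 3 is where additional data sources would be searched first.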

All this information is circumstantial evidence (German: Indizien), not proof.
