# Tracking One Train

The challenge is to track S-Bahn trains along their route using only the planned departure time at each station. The data has no unique ID for a train trip, so instead we have to:
- identify the trip by the first station it starts at
- match the next stop based on the route and planned departure

### Questions about the Data Model

- lines have a start and an end, but the end isn't always the same as a departure's destination

#### Lines

This data was mostly created by hand by David, so it may contain some bugs or idiosyncrasies.

line:

 - name of the line plus a variation suffix (e.g. `S1a`)
 - only distinguishes the physical forks in the track, not the other variations S-Bahn services have

station_id

 - a station id, however these aren't unique to a line

start & end

 - these should represent something like the start and end of the line. I guess this can help distinguish between the variations too
 
order
 - the position of the station along the line
 
#### Stations
 
station_id:
 
 - the station id; unique in this table

station_name:

 - common name of the station; unique in this table
 
 
#### Departures
 
station

- station id from which the S-Bahn will depart

departure_id

- unique id for the departure event; should be globally unique

departure_time

- datetime of the planned departure

product

- which S-Bahn line (e.g. `SBAHN:S1`); the variation is not specified

destination

- common name of the destination; this can differ from the end of the line

delay 

- amount of delay if present

passed_before
platform
canceled
created_at
updated_at


## Imports 

In [3]:
import os
from uuid import uuid4
# settings.py
from dotenv import load_dotenv
load_dotenv()

import pandas as pd
import sqlalchemy
from sqlalchemy import create_engine


## DB Connection 

In [4]:
engine = create_engine('postgresql://{username}:{password}@167.99.243.10/sbahn'.format(username=os.getenv('POSTGRE_USERNAME'), password=os.getenv('POSTGRE_PASSWORD')))
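
Credentials that contain characters such as `@` or `/` can break a URL assembled with plain string formatting. Below is a minimal sketch of one way to guard against that using only the standard library. The helper name `build_db_url`, the host `localhost`, and the fallback credentials are assumptions for illustration.

```python
import os
from urllib.parse import quote


def build_db_url(username, password, host, database):
    """Assemble a PostgreSQL URL with percent-encoded credentials."""
    return "postgresql://{user}:{pw}@{host}/{db}".format(
        user=quote(username, safe=""),  # safe="" also encodes "/" and "@"
        pw=quote(password, safe=""),
        host=host,
        db=database,
    )


# Same environment variables as in the cell above; the fallback values and
# the host are placeholders, not real settings.
url = build_db_url(
    username=os.getenv("POSTGRE_USERNAME", "user"),
    password=os.getenv("POSTGRE_PASSWORD", "p@ss/word"),
    host="localhost",
    database="sbahn",
)
print(url)
```

The resulting string can be passed to `create_engine` in place of the formatted URL.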

## Queries

### Queries for Stations

In [5]:
station_names = "select distinct(station_name) \
FROM departures \
JOIN stations on station_id = station \
WHERE destination = 'Flughafen München' \
AND product = 'SBAHN:S1'; "

In [6]:
station_names_df = pd.read_sql(station_names, engine)

In [7]:
station_names_df.sort_values(by='station_name')

Unnamed: 0,station_name
6,Donnersbergerbrücke
19,Eching
15,Fasanerie
10,Feldmoching
8,Flughafen Besucherpark
18,Hackerbrücke
3,Hauptbahnhof
11,Hirschgarten
0,Isartor
12,Karlsplatz


In [8]:
stops = {'station_name':
    [
        'Leuchtenbergring',
        'Ostbahnhof München',
        'Rosenheimer Platz',
        'Isartor',
        'Marienplatz',
        'Karlsplatz',
        'Hauptbahnhof',
        'Hackerbrücke',
        'Donnersbergerbrücke',
        'Hirschgarten',
        'Laim',
        'Moosach',
        'Fasanerie',
        'Feldmoching',
        'Oberschleißheim',
        'Unterschleißheim',
        'Lohhof',
        'Eching',
        'Neufahrn',
        'Flughafen Besucherpark',
    ]
}

In [9]:
ordered_stops = pd.DataFrame(stops).reset_index()
ordered_stops.columns = ['stop_index', 'station_name']

In [10]:
ordered_stops

Unnamed: 0,stop_index,station_name
0,0,Leuchtenbergring
1,1,Ostbahnhof München
2,2,Rosenheimer Platz
3,3,Isartor
4,4,Marienplatz
5,5,Karlsplatz
6,6,Hauptbahnhof
7,7,Hackerbrücke
8,8,Donnersbergerbrücke
9,9,Hirschgarten


### Getting Stops for line

In [13]:
stops_query = """
SELECT * 
FROM lines AS l
WHERE SUBSTRING(l.line,1,2) = 'S1'
AND (l.start = 'Flughafen München' OR l.end = 'Flughafen München');
"""

In [14]:
ordered_stops = pd.read_sql(stops_query, engine)

In [15]:
ordered_stops

Unnamed: 0,line,station_id,start,end,order
0,S1a,de:09162:900,Flughafen München,Leuchtenbergring,21
1,S1a,de:09162:1,Flughafen München,Leuchtenbergring,16
2,S1a,de:09162:2,Flughafen München,Leuchtenbergring,17
3,S1a,de:09162:3,Flughafen München,Leuchtenbergring,18
4,S1a,de:09162:300,Flughafen München,Leuchtenbergring,10
5,S1a,de:09162:31,Flughafen München,Leuchtenbergring,12
6,S1a,de:09162:310,Flughafen München,Leuchtenbergring,9
7,S1a,de:09162:320,Flughafen München,Leuchtenbergring,8
8,S1a,de:09162:4,Flughafen München,Leuchtenbergring,19
9,S1a,de:09162:5,Flughafen München,Leuchtenbergring,20


### Queries for Stops along S1 -> Flughafen

Here we focus on one variation of the route taken by the S1 S-Bahn.

In [8]:
all_stops = """
SELECT departure_id, product, destination, station_name, departure_time
FROM departures
JOIN stations ON station_id = station
WHERE departure_time::timestamp > timestamp '2020-12-07 04:00:00'
AND departure_time::timestamp < timestamp '2020-12-08 08:00:00'
AND destination = 'Flughafen München'
AND product = 'SBAHN:S1'
ORDER BY departure_time ASC
"""

In [9]:
all_stops_df = pd.read_sql(all_stops, engine)

In [10]:
all_stops_df

Unnamed: 0,departure_id,product,destination,station_name,departure_time
0,c2d3f8f623aa3bc066085d9f279b42b1#1607313720000...,SBAHN:S1,Flughafen München,Karlsplatz,2020-12-07 04:02:00
1,1e9c4149d58f67e0f8aaaac8f212accc#1607313780000...,SBAHN:S1,Flughafen München,Hauptbahnhof,2020-12-07 04:03:00
2,1fa5aeac2ab6370704db0d59e7b22ae3#1607313840000...,SBAHN:S1,Flughafen München,Flughafen Besucherpark,2020-12-07 04:04:00
3,4d7eef504808df3521b9dae4bba5a7f2#1607313900000...,SBAHN:S1,Flughafen München,Hackerbrücke,2020-12-07 04:05:00
4,1c399d918d0280c432c33aa4afbafba7#1607314020000...,SBAHN:S1,Flughafen München,Donnersbergerbrücke,2020-12-07 04:07:00
...,...,...,...,...,...
1082,2adbe22978a66f15cdb8b20eafd1f784#1607413920000...,SBAHN:S1,Flughafen München,Eching,2020-12-08 07:52:00
1083,2fb5f6af00bac3e3bca051239c7251b1#1607414100000...,SBAHN:S1,Flughafen München,Ostbahnhof München,2020-12-08 07:55:00
1084,0320be1ebbadb5c3e3ed09bd4524bc4f#1607414160000...,SBAHN:S1,Flughafen München,Moosach,2020-12-08 07:56:00
1085,5f7de375ad5bdec28506341d8a1efff7#1607414220000...,SBAHN:S1,Flughafen München,Rosenheimer Platz,2020-12-08 07:57:00


## Joining Stops with Stop Index


The query below is imperfect: it can also match departures of trains that run on a different variation of the line but share the same start/end station, for example Leuchtenbergring.

Another problem is that the data model currently represents each train line in a single direction only. Here we match `departures.destination` on `lines.end`, so we only query trains heading to our arbitrarily defined end station.

To get the proper order when matching on `lines.start` instead, we could find the maximum `lines.order` per line and take the difference from it to recalculate the order in the reverse direction.

In [105]:
all_stops_and_indexes = """
select d.departure_id, s.line, d.destination, s.station_id, s.station_name, d.departure_time, s.start, s.end,  s.order, d.delay
from departures as d, (
    select station_name, l.station_id, l.line, l.start, l.end, l.order
    from lines as l, stations as s
    where s.station_id = l.station_id) as s
where d.station = s.station_id
AND SUBSTRING(d.product, 7,2) = SUBSTRING(s.line, 1,2)
AND d.destination = s.end
;
"""

In [106]:
#all_stops_df = pd.merge(all_stops_df, ordered_stops, on='station_name', how='left')

In [107]:
all_stops_df = pd.read_sql(all_stops_and_indexes, engine)

In [108]:
all_stops_df.shape

(34513, 10)
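
To prototype the same matching logic without a database, here is a minimal pandas sketch of the join on toy data. The frames, station ids, and values are made up for illustration; only the column names follow the tables described above. SQL `SUBSTRING(d.product, 7, 2)` is 1-based, so it corresponds to the Python slice `[6:8]`.

```python
import pandas as pd

# Toy stand-ins for the three tables; every value here is an illustrative assumption.
departures = pd.DataFrame({
    "departure_id": ["d1", "d2", "d3"],
    "station": ["st:1", "st:2", "st:1"],
    "product": ["SBAHN:S1", "SBAHN:S1", "SBAHN:S7"],
    "destination": ["Leuchtenbergring"] * 3,
})
stations = pd.DataFrame({
    "station_id": ["st:1", "st:2"],
    "station_name": ["Laim", "Hirschgarten"],
})
lines = pd.DataFrame({
    "line": ["S1a", "S1a"],
    "station_id": ["st:1", "st:2"],
    "start": ["Flughafen München"] * 2,
    "end": ["Leuchtenbergring"] * 2,
    "order": [11, 12],
})


def join_departures_with_order(departures, lines, stations):
    """Attach line, station name and stop order to each departure."""
    # Inner subquery: lines joined with stations on station_id
    line_stops = lines.merge(stations, on="station_id")
    # 'S1a' -> 'S1' and 'SBAHN:S1' -> 'S1', so both sides share a comparable key
    line_stops = line_stops.assign(line_prefix=line_stops["line"].str[:2])
    deps = departures.assign(product_line=departures["product"].str[6:8])
    # Same three conditions as the WHERE clause of the SQL query
    return deps.merge(
        line_stops,
        left_on=["station", "product_line", "destination"],
        right_on=["station_id", "line_prefix", "end"],
        how="inner",
    )


joined = join_departures_with_order(departures, lines, stations)
print(joined[["departure_id", "line", "station_name", "order"]])
```

The S7 departure drops out because no toy line matches its prefix, which mirrors how the SQL query filters on the line name.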

## Reversing Orders
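
A minimal sketch of the recalculation described earlier: take the per-line maximum of `order` and subtract each stop's order from it. It assumes `order` is 1-based and contiguous within a line, hence the `+ 1`. The function name `add_reversed_order`, the new column `reversed_order`, and the toy rows are hypothetical.

```python
import pandas as pd


def add_reversed_order(lines):
    """Return a copy of `lines` with the stop order for the opposite direction."""
    out = lines.copy()
    # Per-line maximum, broadcast back to every row of that line
    max_order = out.groupby("line")["order"].transform("max")
    # Assumes order starts at 1: the last stop becomes 1, the first becomes max
    out["reversed_order"] = max_order - out["order"] + 1
    return out


# Toy rows for illustration only
toy_lines = pd.DataFrame({
    "line": ["S1a", "S1a", "S1a", "S7", "S7"],
    "order": [1, 2, 3, 1, 2],
})
reversed_lines = add_reversed_order(toy_lines)
print(reversed_lines)
```

For the reverse direction, `start` and `end` would also need to be swapped before matching on `destination`.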


## Creating Train Trips

### Create Unique IDs

Every departure from the first stop of a line (`order == 1`) is obviously a unique train, so each one gets a fresh ID. Starting from these, we work forward iteratively to find each train's departure at the next stop.

In [118]:
all_stops_df['train_id'] = all_stops_df.apply(lambda x: uuid4() if x.order == 1 else None, axis=1)

In [119]:
all_stops_df.train_id.isnull().sum()


33512

### Implementing Logic To Find Next Stop 

In [120]:
def find_next_stop(data, current_index, timestamp):
    """Return the departure_id of the closest unassigned departure at the next stop, or None."""
    # Keep only unassigned departures at the next stop that leave after `timestamp`
    candidates = data[
        data.order.eq(current_index + 1)
        & data.departure_time.gt(timestamp)
        & data.train_id.isnull()
    ].copy()
    if candidates.empty:
        print('none found')
        return None
    candidates['time_delta'] = candidates.departure_time - timestamp
    candidates = candidates.sort_values(by='time_delta')
    # Negative deltas cannot occur because of the `gt(timestamp)` filter above,
    # so only the upper bound needs checking
    if candidates.time_delta.min() > pd.Timedelta(25, unit='minutes'):
        print('Time delta to big')
        return None
    return candidates.iloc[0]['departure_id']
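
The rule in `find_next_stop` (the nearest later departure within 25 minutes) can also be expressed for a whole stop at once with `pd.merge_asof`. This is a sketch of an alternative, not what the notebook runs. The function name `match_next_stop` and the toy data are assumptions. Unlike a one-to-one assignment, `merge_asof` does not stop two trains from matching the same candidate.

```python
import pandas as pd


def match_next_stop(current, candidates, max_gap=pd.Timedelta(minutes=25)):
    """For each row in `current`, find the next later departure in `candidates`."""
    left = current.sort_values("departure_time")
    right = (
        candidates.sort_values("departure_time")
        .rename(columns={"departure_id": "next_departure_id"})
        # Keep the candidate's own time, since the join key column comes from `left`
        .assign(next_departure_time=lambda df: df["departure_time"])
        [["departure_time", "next_departure_id", "next_departure_time"]]
    )
    return pd.merge_asof(
        left,
        right,
        on="departure_time",
        direction="forward",        # look for later departures only
        tolerance=max_gap,          # drop matches further away than max_gap
        allow_exact_matches=False,  # strictly later, like `.gt(timestamp)`
    )


# Toy departures at stop 1 (already assigned to trains) and stop 2 (unassigned)
current = pd.DataFrame({
    "departure_id": ["a1", "a2", "a3"],
    "train_id": ["t1", "t2", "t3"],
    "departure_time": pd.to_datetime(
        ["2020-12-07 08:00", "2020-12-07 08:20", "2020-12-07 12:00"]
    ),
})
candidates = pd.DataFrame({
    "departure_id": ["b1", "b2"],
    "departure_time": pd.to_datetime(["2020-12-07 08:03", "2020-12-07 08:23"]),
})

matched = match_next_stop(current, candidates)
print(matched[["train_id", "departure_id", "next_departure_id"]])
```

In this toy data the train leaving at 12:00 has no candidate within 25 minutes, so its `next_departure_id` stays empty.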

Now we just need to loop through the stop indices in order and try to find the corresponding next stop for each train.

In [121]:
lines_destinations = all_stops_df[['line', 'destination']]
unique_lines_destinations = lines_destinations.drop_duplicates()

In [122]:
# Loop over each line/destination pair and its stops in order, and assume the
# nearest candidate in time at the next physical stop is the right match.
for index, row in unique_lines_destinations.iterrows():
    # NOTE: row.index is the label Index of the row Series (hence the
    # "Index([...])" in the output below), not the loop counter held in `index`.
    print(row.line, row.index)
    correct_destination = all_stops_df.destination.eq(row.destination)
    correct_line = all_stops_df.line.eq(row.line)

    stop_indices = all_stops_df[correct_destination & correct_line].order.unique()
    stop_indices.sort()

    # We skip the last station since it's missing from the departures anyway
    for stop in stop_indices[:-1]:
        print(stop)
        id_not_null = all_stops_df.train_id.notnull()
        current_stops = all_stops_df[id_not_null & correct_destination & correct_line & all_stops_df.order.eq(stop)]
        # NOTE: next_stops is a snapshot (copy). Train IDs assigned inside the inner
        # loop are not visible to find_next_stop, so two trains can claim the same departure.
        next_stops = all_stops_df[correct_destination & correct_line & all_stops_df.order.eq(stop+1)]

        # Distinct names avoid shadowing the outer `index` and `row`
        for _, current in current_stops.iterrows():
            next_stops_departure_id = find_next_stop(next_stops, stop, current.departure_time)

            if next_stops_departure_id is not None:
                all_stops_df.loc[all_stops_df.departure_id.eq(next_stops_departure_id), 'train_id'] = current.train_id

S2b Index(['line', 'destination'], dtype='object')
1
none found
none found
none found
none found
none found
none found
none found
none found
none found
none found
2
3
4
none found
none found
none found
none found
none found
none found
5
6
7
8
9
none found
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
S2a Index(['line', 'destination'], dtype='object')
1
none found
none found
none found
none found
none found
none found
2
none found
3
4
5
6
7
8
none found
none found
none found
none found
none found
none found
none found
none found
none found
9
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
none found
none found
none found
none found
10
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
...
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
none found
14
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
...
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
...
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
none found
23
Time delta to big
Time delta to big
Time delta to big
...
Time delta to big
Time delta to big
Time delta to big
Time delta to big
27
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
...
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
...
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
S7 Index(['line', 'destination'], dtype='object')
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
S1b Index(['line', 'destination'], dtype='object')
1
2
3
4
5
6
7
8
9
none found
none found
10
11
12
13
none found
14
15
16
17
18
19
S1a Index(['line', 'destination'], dtype=
...
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
6
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
...
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
none found
none found
none found
none found
none found
10
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
...
Time delta to big
Time delta to big
none found
14
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
...
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
Time delta to big
19
Time delta to big
...

In [123]:
all_stops_df[~all_stops_df.train_id.isnull()].order.max()

36

In [124]:
all_stops_df.train_id.count()

30161