Code to retrieve data from the OpenSky database

See example from the documentation: https://traffic-viz.github.io/gallery/kaliningrad.html

Note: there's no need to store username and password in a config file. The Jupyter output will show a link for Trino authentication, which opens in a separate browser window.

Install the traffic Python package if necessary


In [66]:
!pip install traffic





Include libraries

In [67]:
from datetime import datetime, timedelta
import pandas as pd
#from cartes.crs import Lambert93  # type: ignore

Always use python engine for dataframe queries, see
https://stackoverflow.com/questions/67063643/is-there-a-way-to-force-pandas-dataframe-query-to-use-python-as-default-engine

In [68]:
from functools import partialmethod

pd.DataFrame.query = partialmethod(pd.DataFrame.query, engine="python")



... and the traffic library used to handle the OpenSky queries

In [69]:
import traffic
traffic.config_file

PosixPath('/root/.config/traffic/traffic.conf')

In [70]:
from traffic.data import opensky
#from traffic.core import Flight
#from traffic.data import eurofirs


Define list of airports we are interested in: based on https://de.wikipedia.org/wiki/Liste_der_Verkehrsflugh%C3%A4fen_in_Deutschland

In [78]:
icaolist=["EDBH", "EDDB", "EDVE", "EDDW", "ETMN", "EDLW", "EDDC", "EDDL", "EDDE", "EDDF", "EDFH", "EDNY", "EDDH", "EDDV", "EDAH", "ETSI", "EDSB", "EDVK",
           "EDDK", "EDTL", "EDDP", "EDHL", "EDBC", "EDJA", "EDDM", "EDDG", "EDBN", "EDLV", "EDDN", "EDMO", "EDLP", "ETNL", "EDDR", "EDGS", "EDDS", "EDXW"]
# note: it was necessary to split the list in two parts, having one giant list was not supported by the database query
print(icaolist)

['EDBH', 'EDDB', 'EDVE', 'EDDW', 'ETMN', 'EDLW', 'EDDC', 'EDDL', 'EDDE', 'EDDF', 'EDFH', 'EDNY', 'EDDH', 'EDDV', 'EDAH', 'ETSI', 'EDSB', 'EDVK', 'EDDK', 'EDTL', 'EDDP', 'EDHL', 'EDBC', 'EDJA', 'EDDM', 'EDDG', 'EDBN', 'EDLV', 'EDDN', 'EDMO', 'EDLP', 'ETNL', 'EDDR', 'EDGS', 'EDDS', 'EDXW']
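
The comment in the cell above mentions that one very long airport list was not accepted by the database query. A minimal sketch of one possible workaround, assuming the query can simply be repeated per sub-list and the results concatenated afterwards. The helper `chunked`, the example list and the chunk size are illustrative choices, not part of the traffic library:

```python
def chunked(items, size):
    """Yield consecutive sub-lists of at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Hypothetical short list; in this notebook it would be icaolist.
example_icaos = ["EDDB", "EDDF", "EDDH", "EDDK", "EDDM"]
parts = list(chunked(example_icaos, 2))
print(parts)
```

Each sub-list could then be passed as `departure_airport` (or `arrival_airport`) in its own `opensky.flightlist` call, and the resulting dataframes combined with `pd.concat`.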


First test: let's get the flightlist of all flights starting at any airport from the list

 The syntax is:
 OpenSky.flightlist(start, stop=None, *args, departure_airport=None, arrival_airport=None, airport=None, callsign=None, icao24=None, cached=True, compress=False, limit=None, **kwargs)

In [72]:
# time range to retrieve
startdate="2019-12-01"
stopdate="2019-12-31"

In [81]:
pdDeparture = opensky.flightlist(
    startdate,
    stopdate,
    departure_airport=icaolist,
)


In [82]:
print(f'retrieved {len(pdDeparture)} flights departing at any of the selected airports')


retrieved 4189 flights departing at any of the selected airports


In [85]:
pdArrival = opensky.flightlist(
    startdate,
    stopdate,
    arrival_airport=icaolist,
)


In [86]:
print(f'retrieved {len(pdArrival)} flights arriving at any of the selected airports')

retrieved 4506 flights arriving at any of the selected airports


Now combine the two, while dropping duplicate rows (i.e. flights that went from one airport on the list to another should only be listed once).

Then output the end (or the beginning) of the resulting dataframe.

In [87]:
pdAll = pd.concat([pdDeparture, pdArrival]).drop_duplicates().reset_index(drop=True)

In [88]:
print(f'retrieved {len(pdAll)} flights departing or arriving at any of the selected airports')

retrieved 7760 flights departing or arriving at any of the selected airports
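
In the run above, 4189 + 4506 = 8695 rows shrink to 7760, so presumably 935 flights both departed from and arrived at a listed airport. A small sketch of the same concat-and-deduplicate step on made-up data (the two toy frames stand in for `pdDeparture` and `pdArrival`; all values are invented):

```python
import pandas as pd

# Toy stand-ins for pdDeparture and pdArrival (values are made up).
dep = pd.DataFrame({
    "icao24": ["aaa111", "bbb222"],
    "departure": ["EDDF", "EDDH"],
    "arrival": ["EDDM", "LFPG"],
    "callsign": ["TST1", "TST2"],
})
arr = pd.DataFrame({
    "icao24": ["aaa111", "ccc333"],
    "departure": ["EDDF", "EGLL"],
    "arrival": ["EDDM", "EDDB"],
    "callsign": ["TST1", "TST3"],
})

# The EDDF -> EDDM flight appears in both frames and is kept only once.
combined = pd.concat([dep, arr]).drop_duplicates().reset_index(drop=True)
print(len(dep) + len(arr), "rows before,", len(combined), "rows after")
```

Note that `drop_duplicates()` only removes rows that agree in every column, so two records of the same flight with slightly different timestamps would both survive.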


Just a quick look at what the information inside the data frame looks like

In [89]:
#print(pdAll.head())

pdAll.tail()

Unnamed: 0,icao24,firstseen,departure,lastseen,arrival,callsign,day
7755,3c48f0,2019-12-01 20:12:00+00:00,LFPG,2019-12-01 21:05:35+00:00,EDDL,EWG7XK,2019-12-01 00:00:00+00:00
7756,3c48f0,2019-12-01 16:22:02+00:00,EDDT,2019-12-01 17:14:29+00:00,EDDL,EWG4SF,2019-12-01 00:00:00+00:00
7757,3c4b33,2019-12-01 06:11:20+00:00,,2019-12-01 14:28:57+00:00,EDDF,DLH499,2019-12-01 00:00:00+00:00
7758,3c4aab,2019-12-01 04:26:30+00:00,,2019-12-01 06:25:47+00:00,EDDF,CFG103,2019-12-01 00:00:00+00:00
7759,4ba9a9,2019-12-01 06:35:34+00:00,,2019-12-01 09:19:29+00:00,EDDV,THY7ZS,2019-12-01 00:00:00+00:00


Mount google drive to save the resulting csv file

In [91]:
from google.colab import drive
drive.mount('/content/drive')

import os

# where to save the data (making sure that the directory exists)
folder_path = "/content/drive/My Drive/2024KomplexeNetze/Daten"
os.makedirs(folder_path, exist_ok=True)


Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


Last step: save the retrieved data into csv, with a name defined by start and stop date.

In [93]:
outputfile=startdate+"_"+stopdate+".csv"
file_path = os.path.join(folder_path, outputfile)
pdAll.to_csv(file_path, index=False)  # save without the DataFrame index

In [94]:
print(f'Output file has been generated in {file_path}')

Output file has been generated in /content/drive/My Drive/2024KomplexeNetze/Daten/2019-12-01_2019-12-03.csv


--------------------------------------
Note: if we want, we could also get some airport info provided by traffic, like geo coordinates, city, name.

In [95]:
from traffic.data import airports
# the command below is also getting Hamburg, USA, and hospital helipads in HH, ...
#airports.search("Hamburg")


In [96]:
airports.query('icao=="EDDB" and country=="Germany"')

Unnamed: 0,name,iata,icao,latitude,longitude,country,altitude,type,municipality
22735,Berlin Brandenburg Airport,BER,EDDB,52.362247,13.500672,Germany,157.0,large_airport,Berlin


I thought about adding this information to the output file, but that would increase the file size a lot. We can work with the ICAO codes throughout and add that info only at the end of the analysis, for making nice plots.
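
A minimal sketch of how the airport info could be attached at the end, assuming an airport table with the column names shown in the output above. The flight rows are made up, and the `departure_` prefix is just one possible naming choice:

```python
import pandas as pd

# Made-up flight rows with the departure/arrival columns of the flight list.
flights = pd.DataFrame({
    "callsign": ["TST1", "TST2"],
    "departure": ["EDDB", None],
    "arrival": ["EDDL", "EDDB"],
})

# Toy airport table; values copied from the EDDB row shown above.
airport_info = pd.DataFrame({
    "icao": ["EDDB"],
    "name": ["Berlin Brandenburg Airport"],
    "latitude": [52.362247],
    "longitude": [13.500672],
})

# Prefix the airport columns so departure info cannot clash with arrival info.
dep_info = airport_info.add_prefix("departure_")
enriched = flights.merge(
    dep_info, how="left", left_on="departure", right_on="departure_icao"
)
print(enriched[["callsign", "departure", "departure_name"]])
```

The left join keeps flights with an unknown departure airport; their airport columns are simply left empty.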

iata==iata seems to be the best way to keep only the airports that have a value in that column (a missing value does not compare equal to itself).

In [None]:
airports.query('country=="Germany" and iata==iata')

Unnamed: 0,name,iata,icao,latitude,longitude,country,altitude,type,municipality
22659,Leipzig–Altenburg Airport,AOC,EDAC,50.981945,12.506389,Germany,640.0,medium_airport,Nobitz
22663,Heringsdorf Airport,HDF,EDAH,53.878700,14.152300,Germany,93.0,medium_airport,Zirchow
22676,Riesa-Göhlis Airport,IES,EDAU,51.293610,13.356111,Germany,322.0,small_airport,Riesa
22679,Rechlin-Lärz Airport,REB,EDAX,53.306389,12.752222,Germany,220.0,small_airport,Lärz
22683,Cochstedt Airport,CSO,EDBC,51.856400,11.420300,Germany,594.0,small_airport,Hecklingen
...,...,...,...,...,...,...,...,...,...
24853,Geilenkirchen Air Base,GKE,ETNG,50.960800,6.042420,Germany,296.0,medium_airport,Geilenkirchen
24856,Rostock-Laage Airport,RLG,ETNL,53.918201,12.278300,Germany,138.0,medium_airport,Laage
24859,Schleswig Air Base,WBG,ETNS,54.459301,9.516330,Germany,70.0,medium_airport,Jagel
24865,Wiesbaden Army Airfield,WIE,ETOU,50.049801,8.325400,Germany,461.0,medium_airport,Wiesbaden
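
The `iata==iata` filter used above relies on missing values never comparing equal to themselves. A small sketch on made-up data, with `notna()` shown as an equivalent and arguably more explicit alternative ("XXXX" is a hypothetical code for an airport without an IATA entry):

```python
import numpy as np
import pandas as pd

# Toy airport table: the second row has no IATA code.
toy = pd.DataFrame({
    "icao": ["EDDB", "XXXX"],
    "iata": ["BER", np.nan],
})

# NaN == NaN is False, so the self-comparison drops rows without an IATA code.
with_iata = toy.query("iata == iata", engine="python")

# Same selection, written with an explicit missing-value check.
same_result = toy[toy["iata"].notna()]
print(with_iata)
```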


Still not sure which of these gives me the best "airport" lists for commercial flights - maybe I'll have to plug in my own list / from another source.

Any query that spits out Schleswig Jagel is not exclusive enough...


Another idea: check the published Covid-19 dataset and see which German airports are in there?