# Traffic and Airprox Correlations
> Author: A.Pilko@soton.ac.uk

2019 air traffic data and 2000-2021 airprox data are used to investigate correlations between the two datasets.

## Hypotheses:
- Airprox locations will have less ordered traffic flow. Concretely, airprox locations will positively correlate with the variance of traffic direction
- Airprox locations will positively correlate with traffic density
- Airprox locations will positively correlate with mean traffic flow speed
- Airprox locations will positively correlate with the variance of the flow speed


Import required libraries and pre-cleaned data

In [1]:
import geopandas as gpd
import pandas as pd
import seaborn as sns
import traffic
import numpy as np
import pyproj
from scipy.spatial.distance import cdist
import matplotlib.pyplot as plt
import joblib as jl

from cartopy.crs import Projection
from traffic.drawing import countries, lakes, ocean
from traffic.data import airports

%matplotlib notebook



In [7]:
airprox_gdf = gpd.GeoDataFrame(pd.read_pickle('../data/airprox_asp_2000_2021.pkl'))
tfc_clean = traffic.core.Traffic.from_file('../data/cornwall/cornwall_tfc_clean_30s_lt3000ft_2019_f16.pkl.bz2')
# tfc_clean = traffic.core.Traffic.from_file('../data/southeng_tfc_clean_lt5000ft_2019.pkl.bz2')

In [4]:
tfc_clean_data = pd.read_pickle('../data/cornwall/cornwall_tfc_clean_30s_lt3000ft_2019_f16.pkl.bz2')

In [27]:
tfc_clean.data.describe()

Unnamed: 0,altitude,groundspeed,latitude,longitude,track,vertical_rate
count,111733.0,111733.0,111733.0,111733.0,111733.0,111733.0
mean,,,,,,
std,,,0.0,0.0,,
min,4.167969,0.0,49.78125,-6.378906,0.0,-5376.0
25%,1250.0,100.0,50.1875,-5.535156,91.1875,-448.0
50%,1800.0,125.0,50.34375,-5.136719,230.875,-64.0
75%,2700.0,149.0,50.40625,-4.976562,260.0,64.0
max,4000.0,498.0,50.8125,-4.648438,360.0,4928.0


## Airspace

There is little point in analysing traffic patterns in controlled airspace, where ATC issues instructions and aircraft (usually) follow standard routes such as SIDs and STARs. The UK airspace definitions are therefore used to filter out the traffic state vectors located inside controlled airspace, and only the remaining traffic in uncontrolled airspace is used for the actual analysis.

In [17]:
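import requests

# Download the UK airspace definitions and cache them locally
req = requests.get('https://storage.googleapis.com/29f98e10-a489-4c82-ae5e-489dbcd4912f/gb_asp.geojson')
req.raise_for_status()  # fail early if the download did not succeed
with open('gb_asp.geojson', 'w', encoding='utf-8') as f:
    f.write(req.text)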

In [21]:
ASP_TYPES = {
    0: "Other",
    1: "Restricted",
    2: "Danger",
    3: "Prohibited",
    4: "Controlled Tower Region (CTR)",
    5: "Transponder Mandatory Zone (TMZ)",
    6: "Radio Mandatory Zone (RMZ)",
    7: "Terminal Maneuvering Area (TMA)",
    8: "Temporary Reserved Area (TRA)",
    9: "Temporary Segregated Area (TSA)",
    10: "Flight Information Region (FIR)",
    11: "Upper Flight Information Region (UIR)",
    12: "Air Defense Identification Zone (ADIZ)",
    13: "Airport Traffic Zone (ATZ)",
    14: "Military Airport Traffic Zone (MATZ)",
    15: "Airway",
    16: "Military Training Route (MTR)",
    17: "Alert Area",
    18: "Warning Area",
    19: "Protected Area",
    20: "Helicopter Traffic Zone (HTZ)",
    21: "Gliding Sector",
    22: "Transponder Setting (TRP)",
    23: "Traffic Information Zone (TIZ)",
    24: "Traffic Information Area (TIA)",
    25: "Military Training Area (MTA)",
    26: "Controlled Area (CTA)",
    27: "ACC Sector (ACC)",
    28: "Aerial Sporting Or Recreational Activity",
    29: "Low Altitude Overflight Restriction"
}

ASP_CLASS = {
    0: "A",
    1: "B",
    2: "C",
    3: "D",
    4: "E",
    5: "F",
    6: "G",
    7: "Special Use Airspace (SUA)",
    8: "Unclassified"
}

ASP_ACTIVITIES = {
    0: "None - No specific activity (default)",
    1: "Parachuting Activity",
    2: "Aerobatics Activity",
    3: "Aeroclub And Aerial Work Area",
    4: "Ultra Light Machine (ULM) Activity",
    5: "Hang Gliding/Paragliding"
}

ASP_ALT_UNIT = {
    0: "Meter",
    1: "Feet",
    6: "Flight Level",
}

ASP_ALT_DATUM = {
    0: "GND",
    1: "MSL",
    2: "STD",
}

In [61]:

asp_gdf = gpd.read_file('gb_asp.geojson')
asp_gdf = asp_gdf[(asp_gdf['approved'] == True) & (asp_gdf['onDemand'] == False) & (asp_gdf['onRequest'] == False) & (
        asp_gdf['byNotam'] == False) & (asp_gdf['specialAgreement'] == False)]
asp_gdf = asp_gdf.cx[
          tfc_clean.data.longitude.min():tfc_clean.data.longitude.max(),
          tfc_clean.data.latitude.min(): tfc_clean.data.latitude.max()
          ].reset_index()
asp_upper_lims = pd.DataFrame(pd.json_normalize(asp_gdf.upperLimit))
asp_lower_lims = pd.DataFrame(pd.json_normalize(asp_gdf.lowerLimit))
asp_upper_lims.columns = ['upperLimit_value', 'upperLimit_unit', 'upperLimit_ref']
asp_lower_lims.columns = ['lowerLimit_value', 'lowerLimit_unit', 'lowerLimit_ref']
asp_lim_df = pd.concat([asp_lower_lims, asp_upper_lims], axis=1)
pd.concat([asp_gdf, asp_lim_df], axis=1)

# asp_gdf

Unnamed: 0,index,_id,approved,name,type,icaoClass,activity,onDemand,onRequest,byNotam,...,updatedAt,createdBy,updatedBy,geometry,lowerLimit_value,lowerLimit_unit,lowerLimit_ref,upperLimit_value,upperLimit_unit,upperLimit_ref
0,344,62cacff80ba090870156219e,True,CULDROSE ATZ 134.050,13,7,0,False,False,False,...,2022-07-10 13:11:20.571000+00:00,OPONcQnzWGOLiJSceNaf8pvx1fA2,OPONcQnzWGOLiJSceNaf8pvx1fA2,"POLYGON ((-5.25417 50.12692, -5.25824 50.12683...",0,1,0,2268,1,1
1,362,62cacffa0ba0908701562254,True,PREDANNACK ATZ 134.050,13,7,0,False,False,False,...,2022-07-10 13:11:22.765000+00:00,OPONcQnzWGOLiJSceNaf8pvx1fA2,OPONcQnzWGOLiJSceNaf8pvx1fA2,"POLYGON ((-5.23167 50.03526, -5.23492 50.03519...",0,1,0,2299,1,1
2,384,62cacffd0ba0908701562330,True,CULDROSE MATZ 134.050,14,6,0,False,False,False,...,2022-07-10 13:11:25.291000+00:00,OPONcQnzWGOLiJSceNaf8pvx1fA2,OPONcQnzWGOLiJSceNaf8pvx1fA2,"POLYGON ((-5.35278 50.03139, -5.35278 50.03139...",0,1,0,3299,1,1
3,385,62cacffd0ba090870156233a,True,CULDROSE MATZ 134.050,14,6,0,False,False,False,...,2022-07-10 13:11:25.414000+00:00,OPONcQnzWGOLiJSceNaf8pvx1fA2,OPONcQnzWGOLiJSceNaf8pvx1fA2,"POLYGON ((-5.03667 49.98972, -4.99611 50.05083...",1268,1,1,3268,1,1
4,422,62cad0010ba09087015624ae,True,D005A PREDANNACK (NOTAM),0,6,0,False,False,False,...,2022-07-10 13:11:29.653000+00:00,OPONcQnzWGOLiJSceNaf8pvx1fA2,OPONcQnzWGOLiJSceNaf8pvx1fA2,"POLYGON ((-5.23167 50.05191, -5.23655 50.05181...",0,1,0,8000,1,1
5,423,62cad0010ba09087015624b8,True,D005B PREDANNACK CORRIDOR (NOTAM),0,6,0,False,False,False,...,2022-07-10 13:11:29.759000+00:00,OPONcQnzWGOLiJSceNaf8pvx1fA2,OPONcQnzWGOLiJSceNaf8pvx1fA2,"POLYGON ((-5.28944 50.03556, -5.37833 49.97167...",0,1,0,8000,1,1
6,424,62cad0010ba09087015624c2,True,D006A FALMOUTH BAY,2,7,0,False,False,False,...,2022-07-10 13:11:29.870000+00:00,OPONcQnzWGOLiJSceNaf8pvx1fA2,OPONcQnzWGOLiJSceNaf8pvx1fA2,"POLYGON ((-4.78306 50.21222, -4.70778 50.12389...",0,1,0,22000,1,1
7,425,62cad0010ba09087015624cb,True,D006B FALMOUTH BAY,2,7,0,False,False,False,...,2022-07-10 13:11:29.965000+00:00,OPONcQnzWGOLiJSceNaf8pvx1fA2,OPONcQnzWGOLiJSceNaf8pvx1fA2,"POLYGON ((-5.08500 49.98528, -4.99667 50.08333...",0,1,0,8000,1,1
8,426,62cad0020ba09087015624d6,True,D007A FOWEY,2,7,0,False,False,False,...,2022-07-10 13:11:30.115000+00:00,OPONcQnzWGOLiJSceNaf8pvx1fA2,OPONcQnzWGOLiJSceNaf8pvx1fA2,"POLYGON ((-4.61194 50.30028, -4.53111 50.30556...",0,1,0,22000,1,1
9,427,62cad0020ba09087015624df,True,D007C FOWEY INNER,2,7,0,False,False,False,...,2022-07-10 13:11:30.214000+00:00,OPONcQnzWGOLiJSceNaf8pvx1fA2,OPONcQnzWGOLiJSceNaf8pvx1fA2,"POLYGON ((-4.72611 50.29250, -4.61194 50.30028...",0,1,0,2000,1,1


In [62]:
asp_gdf = gpd.read_file('gb_asp.geojson')
asp_gdf = asp_gdf[(asp_gdf['approved'] == True) & (asp_gdf['onDemand'] == False) & (asp_gdf['onRequest'] == False) & (
        asp_gdf['byNotam'] == False) & (asp_gdf['specialAgreement'] == False)]
asp_gdf = asp_gdf.cx[
          tfc_clean.data.longitude.min():tfc_clean.data.longitude.max(),
          tfc_clean.data.latitude.min(): tfc_clean.data.latitude.max()
          ].reset_index()
asp_upper_lims = pd.DataFrame(pd.json_normalize(asp_gdf.upperLimit))
asp_lower_lims = pd.DataFrame(pd.json_normalize(asp_gdf.lowerLimit))
asp_upper_lims.columns = ['upperLimit_value', 'upperLimit_unit', 'upperLimit_ref']
asp_lower_lims.columns = ['lowerLimit_value', 'lowerLimit_unit', 'lowerLimit_ref']
asp_lim_df = pd.concat([asp_lower_lims, asp_upper_lims], axis=1)
asp_gdf = pd.concat([asp_gdf, asp_lim_df], axis=1)
asp_gdf = asp_gdf.drop(
    labels=['_id', 'approved', 'specialAgreement', 'onDemand', 'onRequest', 'byNotam', 'createdAt', 'createdBy',
            'updatedAt', 'updatedBy', 'upperLimit', 'lowerLimit'], axis=1)
for col in ['type', 'icaoClass', 'activity']:
    asp_gdf[col] = pd.Categorical(asp_gdf[col])
asp_gdf['type'] = asp_gdf['type'].cat.rename_categories(ASP_TYPES)
asp_gdf['icaoClass'] = asp_gdf['icaoClass'].cat.rename_categories(ASP_CLASS)
asp_gdf['activity'] = asp_gdf['activity'].cat.rename_categories(ASP_ACTIVITIES)

In [63]:
def alt_std(row):
    cr = row.copy()
    if cr['upperLimit_unit'] == 0:
        cr['upperLimit_value'] *= 3.28084
    elif cr['upperLimit_unit'] == 6:
        cr['upperLimit_value'] *= 100

    if cr['lowerLimit_unit'] == 0:
        cr['lowerLimit_value'] *= 3.28084
    elif cr['lowerLimit_unit'] == 6:
        cr['lowerLimit_value'] *= 100

    return cr


# asp_gdf = asp_gdf.apply(alt_std, axis=1).dropna()
asp_gdf = asp_gdf[asp_gdf['lowerLimit_value'] <= 5000]
asp_gdf = asp_gdf.drop(labels=['upperLimit_unit', 'upperLimit_ref', 'lowerLimit_unit', 'lowerLimit_ref'], axis=1)

asp_gdf

Unnamed: 0,index,name,type,icaoClass,activity,country,geometry,lowerLimit_value,upperLimit_value
0,344,CULDROSE ATZ 134.050,Airport Traffic Zone (ATZ),Special Use Airspace (SUA),None - No specific activity (default),GB,"POLYGON ((-5.25417 50.12692, -5.25824 50.12683...",0,2268
1,362,PREDANNACK ATZ 134.050,Airport Traffic Zone (ATZ),Special Use Airspace (SUA),None - No specific activity (default),GB,"POLYGON ((-5.23167 50.03526, -5.23492 50.03519...",0,2299
2,384,CULDROSE MATZ 134.050,Military Airport Traffic Zone (MATZ),G,None - No specific activity (default),GB,"POLYGON ((-5.35278 50.03139, -5.35278 50.03139...",0,3299
3,385,CULDROSE MATZ 134.050,Military Airport Traffic Zone (MATZ),G,None - No specific activity (default),GB,"POLYGON ((-5.03667 49.98972, -4.99611 50.05083...",1268,3268
4,422,D005A PREDANNACK (NOTAM),Other,G,None - No specific activity (default),GB,"POLYGON ((-5.23167 50.05191, -5.23655 50.05181...",0,8000
5,423,D005B PREDANNACK CORRIDOR (NOTAM),Other,G,None - No specific activity (default),GB,"POLYGON ((-5.28944 50.03556, -5.37833 49.97167...",0,8000
6,424,D006A FALMOUTH BAY,Danger,Special Use Airspace (SUA),None - No specific activity (default),GB,"POLYGON ((-4.78306 50.21222, -4.70778 50.12389...",0,22000
7,425,D006B FALMOUTH BAY,Danger,Special Use Airspace (SUA),None - No specific activity (default),GB,"POLYGON ((-5.08500 49.98528, -4.99667 50.08333...",0,8000
8,426,D007A FOWEY,Danger,Special Use Airspace (SUA),None - No specific activity (default),GB,"POLYGON ((-4.61194 50.30028, -4.53111 50.30556...",0,22000
9,427,D007C FOWEY INNER,Danger,Special Use Airspace (SUA),None - No specific activity (default),GB,"POLYGON ((-4.72611 50.29250, -4.61194 50.30028...",0,2000
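
The `alt_std` helper is defined above, but its row-wise application is commented out. That is harmless for the rows shown here, since every limit is already in feet (unit code 1). If mixed units did appear, a vectorised conversion along the following lines might be preferable to `DataFrame.apply`. This is a sketch only: the name `limits_to_feet` and the toy frame are illustrative and not part of the notebook, and the flight-level branch ignores the difference between pressure altitude and altitude above MSL.

```python
import pandas as pd

# Multipliers to feet, keyed by the unit codes in ASP_ALT_UNIT
# (0 = metres, 1 = feet, 6 = flight level). Unknown codes map to NaN.
TO_FEET = {0: 3.28084, 1: 1.0, 6: 100.0}


def limits_to_feet(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with both limit columns expressed in feet."""
    out = df.copy()
    for side in ("lower", "upper"):
        factor = out[f"{side}Limit_unit"].map(TO_FEET)
        out[f"{side}Limit_value"] = out[f"{side}Limit_value"] * factor
    return out


# Toy airspace limits (made-up values) covering all three unit codes.
toy = pd.DataFrame({
    "lowerLimit_value": [0, 300, 0],
    "lowerLimit_unit": [1, 0, 1],
    "upperLimit_value": [2000, 65, 1000],
    "upperLimit_unit": [1, 6, 0],
})
converted = limits_to_feet(toy)
print(converted[["lowerLimit_value", "upperLimit_value"]])
```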


In [132]:
tfc_gdf = gpd.GeoDataFrame(tfc_clean.data,
                           geometry=gpd.points_from_xy(tfc_clean.data['longitude'], tfc_clean.data['latitude'],
                                                       tfc_clean.data['altitude']), crs='epsg:4326')

Shapely's spatial predicates operate in the XY plane only and ignore any Z coordinate, so filtering against 3D airspace volumes takes two steps. For each airspace, we first select all traffic between its floor and ceiling, then run the usual 2D point-in-polygon test on that subset.

This takes a decent chunk of time...
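
The two-stage idea can be illustrated without any geospatial libraries. The sketch below uses a hand-written even-odd ray casting test in place of the geopandas `sjoin` used in the notebook, so it shows the same logic rather than the same implementation. The helper names and the toy square "airspace" are hypothetical.

```python
from typing import List, Tuple

Point3D = Tuple[float, float, float]   # (lon, lat, altitude_ft)
Ring = List[Tuple[float, float]]       # polygon exterior as (lon, lat) pairs


def point_in_ring(x: float, y: float, ring: Ring) -> bool:
    """Even-odd ray casting test in the XY plane."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def points_in_volume(points: List[Point3D], ring: Ring,
                     floor_ft: float, ceiling_ft: float) -> List[Point3D]:
    # Stage 1: cheap altitude band filter.
    in_band = [p for p in points if floor_ft <= p[2] <= ceiling_ft]
    # Stage 2: 2D point-in-polygon test on the survivors only.
    return [p for p in in_band if point_in_ring(p[0], p[1], ring)]


# Toy square airspace from the surface to 2268 ft (made-up footprint).
ring = [(-5.3, 50.0), (-5.2, 50.0), (-5.2, 50.1), (-5.3, 50.1)]
points = [
    (-5.25, 50.05, 1500.0),  # inside footprint, inside altitude band
    (-5.25, 50.05, 3000.0),  # inside footprint, above the ceiling
    (-5.00, 50.05, 1500.0),  # outside footprint
]
inside = points_in_volume(points, ring, 0.0, 2268.0)
print(inside)
```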

In [65]:
def tfc_within(lim_asp):
    lim_tfc = tfc_gdf[
        (tfc_gdf['altitude'] >= lim_asp['lowerLimit_value']) & (tfc_gdf['altitude'] <= lim_asp['upperLimit_value'])]
    return lim_tfc.sjoin(gpd.GeoDataFrame(lim_asp.to_frame().T).set_crs(asp_gdf.crs), predicate='within')


joined_dfs = jl.Parallel(n_jobs=-1, verbose=10)(jl.delayed(tfc_within)(lim_asp) for _, lim_asp in asp_gdf.iterrows())

con_asp_tfc_gdf = pd.concat(joined_dfs, axis=0)

[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   3 out of  22 | elapsed:    4.1s remaining:   25.7s
[Parallel(n_jobs=-1)]: Done   6 out of  22 | elapsed:    6.6s remaining:   17.5s
[Parallel(n_jobs=-1)]: Done   9 out of  22 | elapsed:    8.4s remaining:   12.1s
[Parallel(n_jobs=-1)]: Done  12 out of  22 | elapsed:    9.6s remaining:    8.0s
[Parallel(n_jobs=-1)]: Done  15 out of  22 | elapsed:   12.0s remaining:    5.6s
[Parallel(n_jobs=-1)]: Done  18 out of  22 | elapsed:   14.0s remaining:    3.1s
[Parallel(n_jobs=-1)]: Done  22 out of  22 | elapsed:   16.6s finished


In [66]:
con_asp_tfc_gdf.to_pickle('../data/cornwall_con_asp_tfc_2019.pkl.bz2', compression='bz2')
print(con_asp_tfc_gdf.shape)
con_asp_tfc_gdf.head()

(47812, 18)


Unnamed: 0,timestamp,altitude,groundspeed,latitude,longitude,track,vertical_rate,flight_id,geometry,index_right,index,name,type,icaoClass,activity,country,lowerLimit_value,upperLimit_value
59,2019-08-18 13:47:00+00:00,2200.0,121.0,50.125,-5.265625,79.0625,-64.0,GCDEO_140,POINT Z (-5.26562 50.12500 2200.00000),0,344,CULDROSE ATZ 134.050,Airport Traffic Zone (ATZ),Special Use Airspace (SUA),None - No specific activity (default),GB,0,2268
60,2019-08-18 13:47:30+00:00,2176.0,128.0,50.125,-5.238281,78.75,-128.0,GCDEO_140,POINT Z (-5.23828 50.12500 2176.00000),0,344,CULDROSE ATZ 134.050,Airport Traffic Zone (ATZ),Special Use Airspace (SUA),None - No specific activity (default),GB,0,2268
7,2019-08-22 11:34:00+00:00,750.0,108.0,50.09375,-5.316406,39.0,192.0,GCDEO_151,POINT Z (-5.31641 50.09375 750.00000),0,344,CULDROSE ATZ 134.050,Airport Traffic Zone (ATZ),Special Use Airspace (SUA),None - No specific activity (default),GB,0,2268
8,2019-08-22 11:34:30+00:00,790.0,101.0,50.09375,-5.300781,348.5,128.0,GCDEO_151,POINT Z (-5.30078 50.09375 790.00000),0,344,CULDROSE ATZ 134.050,Airport Traffic Zone (ATZ),Special Use Airspace (SUA),None - No specific activity (default),GB,0,2268
9,2019-08-22 11:35:00+00:00,830.0,101.0,50.09375,-5.285156,348.5,128.0,GCDEO_151,POINT Z (-5.28516 50.09375 830.00000),0,344,CULDROSE ATZ 134.050,Airport Traffic Zone (ATZ),Special Use Airspace (SUA),None - No specific activity (default),GB,0,2268


In [67]:
unc_asp_tfc_gdf = pd.merge(tfc_gdf, con_asp_tfc_gdf, how="outer", indicator=True
                           ).query('_merge=="left_only"').drop(labels=['_merge'], axis=1)
unc_asp_tfc_gdf

Unnamed: 0,timestamp,altitude,groundspeed,latitude,longitude,track,vertical_rate,flight_id,geometry,index_right,index,name,type,icaoClass,activity,country,lowerLimit_value,upperLimit_value
29,2019-08-21 07:00:00+00:00,1425.0,121.0,50.43750,-5.082031,227.375,384.0,IOS207_1001,POINT Z (-5.08203 50.43750 1425.00000),,,,,,,,,
30,2019-08-21 07:00:30+00:00,1575.0,125.0,50.43750,-5.097656,227.875,384.0,IOS207_1001,POINT Z (-5.09766 50.43750 1575.00000),,,,,,,,,
31,2019-08-21 07:01:00+00:00,1625.0,135.0,50.40625,-5.121094,229.875,64.0,IOS207_1001,POINT Z (-5.12109 50.40625 1625.00000),,,,,,,,,
32,2019-08-21 07:01:30+00:00,1600.0,137.0,50.40625,-5.140625,230.250,-128.0,IOS207_1001,POINT Z (-5.14062 50.40625 1600.00000),,,,,,,,,
33,2019-08-21 07:02:00+00:00,1575.0,135.0,50.37500,-5.164062,228.875,0.0,IOS207_1001,POINT Z (-5.16406 50.37500 1575.00000),,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
114691,2019-04-03 15:48:30+00:00,1900.0,123.0,50.40625,-4.929688,339.500,-704.0,RCH172_4568,POINT Z (-4.92969 50.40625 1900.00000),,,,,,,,,
114692,2019-04-03 15:49:00+00:00,1900.0,123.0,50.40625,-4.929688,339.500,-704.0,RCH172_4568,POINT Z (-4.92969 50.40625 1900.00000),,,,,,,,,
114693,2019-04-03 15:49:30+00:00,1900.0,123.0,50.40625,-4.929688,339.500,-704.0,RCH172_4568,POINT Z (-4.92969 50.40625 1900.00000),,,,,,,,,
114694,2019-04-03 15:50:00+00:00,1900.0,123.0,50.40625,-4.929688,339.500,-704.0,RCH172_4568,POINT Z (-4.92969 50.40625 1900.00000),,,,,,,,,
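
The outer merge with `indicator=True` acts as an anti-join: rows that exist only in the full traffic table are tagged `left_only`, and those are the points that matched no airspace. A minimal sketch on a toy frame (the data and column values here are invented for illustration) shows the mechanics:

```python
import pandas as pd

all_points = pd.DataFrame({
    "flight_id": ["A", "A", "B", "C"],
    "t": [0, 30, 0, 0],
    "altitude": [1500.0, 1600.0, 900.0, 2500.0],
})
# Rows already matched to an airspace; the extra column mimics the sjoin output.
controlled = all_points.iloc[[1, 3]].assign(name=["SOME ATZ", "SOME MATZ"])

# With no `on=` argument, pandas joins on every column the two frames share.
merged = pd.merge(all_points, controlled, how="outer", indicator=True)
uncontrolled = merged.query('_merge == "left_only"').drop(columns="_merge")
print(uncontrolled)
```

Because the join keys include every shared column, the approach relies on the matched rows being exact copies of rows in the full table, which holds here since they are derived from it.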


In [68]:
unc_asp_tfc_gdf.to_pickle('../data/cornwall_unc_asp_tfc_2019.pkl.bz2', compression='bz2')

In [69]:
unc_asp_tfc_gdf['type'] = 0
unc_asp_tfc_gdf['icaoClass'] = 6
unc_asp_tfc_gdf['name'] = 'UNCONTROLLED AIRSPACE'

unc_asp_tfc_gdf = unc_asp_tfc_gdf.drop(
    labels=['index_right', 'country', 'lowerLimit_value', 'upperLimit_value', 'activity'], axis=1)

In [70]:
unc_asp_tfc = traffic.core.Traffic(unc_asp_tfc_gdf)

Aggregate traffic data by projected XY and collect statistics for each cell.

In [142]:
res = 6000
tfc_unc_xy_gdf = unc_asp_tfc.compute_xy('epsg:3857')
tfc_agg = tfc_unc_xy_gdf.assign(
    x=lambda elt: (elt.x // res) * res,
    y=lambda elt: (elt.y // res) * res,
).groupby(["x", "y"]).agg(altitude_mean=pd.NamedAgg('altitude', np.nanmean),
                          altitude_std=pd.NamedAgg('altitude', np.std), track_mean=pd.NamedAgg('track', np.nanmean),
                          track_std=pd.NamedAgg('track', np.std),
                          groundspeed_mean=pd.NamedAgg('groundspeed', np.nanmean),
                          groundspeed_std=pd.NamedAgg('groundspeed', np.std),
                          flight_id_nunique=('flight_id', 'nunique'))
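
To make the binning step concrete, here is a small sketch on invented points. It illustrates two properties of the cell above: floor division labels each cell by its south-west corner (also for negative coordinates), and the number of state-vector samples in a cell can differ from the number of unique flights. The `n_samples` column is an addition for illustration and is not computed in the notebook.

```python
import pandas as pd

res = 6000  # cell size in projected units, as in the notebook

# Toy projected points (made-up values).
pts = pd.DataFrame({
    "x": [-617500.0, -613000.0, -612001.0, -611999.0],
    "y": [6480100.0, 6485999.0, 6480000.0, 6480000.0],
    "groundspeed": [100.0, 120.0, 140.0, 90.0],
    "flight_id": ["A", "A", "B", "C"],
})

# Floor division rounds towards negative infinity, so each point is
# snapped to the south-west corner of the cell that contains it.
binned = pts.assign(
    x=lambda d: (d.x // res) * res,
    y=lambda d: (d.y // res) * res,
)
cells = binned.groupby(["x", "y"]).agg(
    groundspeed_mean=("groundspeed", "mean"),
    n_samples=("groundspeed", "size"),
    flight_id_nunique=("flight_id", "nunique"),
)
print(cells)
```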

Only cells containing more than 30 unique flights are used, following the common rule of thumb for applying the Central Limit Theorem. Under that rule the sampling distribution of each cell mean is approximately Gaussian; the underlying observations within a cell need not be.

In [72]:
tfc_magg = tfc_agg[tfc_agg['flight_id_nunique'] > 30]
tfc_gdf = tfc_agg.reset_index()
tfc_mgdf = tfc_magg.reset_index()
tfc_magg.head(10)

Unnamed: 0_level_0,Unnamed: 1_level_0,altitude_mean,altitude_std,track_mean,track_std,groundspeed_mean,groundspeed_std,flight_id_nunique
x,y,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
-624000.0,6486000.0,2524.0,1221.827691,194.125,88.615884,130.875,26.369813,47
-618000.0,6474000.0,2574.0,903.288369,141.125,90.873882,134.0,43.585051,75
-618000.0,6480000.0,2666.0,1036.673502,200.0,77.101737,135.0,26.402432,384
-618000.0,6486000.0,2548.0,1203.78045,195.375,77.314834,137.875,19.681153,230
-618000.0,6492000.0,3048.0,931.113359,145.375,96.642526,146.75,31.620701,37
-612000.0,6462000.0,1770.0,1152.675931,183.25,89.316928,107.75,29.373423,48
-612000.0,6468000.0,2014.0,894.61493,164.125,82.422277,124.1875,29.476093,99
-612000.0,6474000.0,2202.0,916.310429,177.0,90.639142,116.375,33.543025,132
-612000.0,6480000.0,2110.0,910.520221,178.5,92.427015,115.3125,33.185733,208
-612000.0,6486000.0,3018.0,956.18668,188.375,82.156879,139.375,20.70973,426


In [73]:
airprox_gdf

Unnamed: 0,AirproxID,Latitude,Longitude,Altitude,Risk,Aircraft1_Classification,Aircraft1_Category,Aircraft1_Type,Aircraft1_FlightRules,Aircraft2_Classification,Aircraft2_Category,Aircraft2_Type,Aircraft2_FlightRules,Combined_Rules,x,y,geometry,name,type,icaoClass
3081,2014131,52.616667,-1.033333,2.0,c,general_aviation,rotorcraft_-_helicopter,ROBINSON - R22,vfr,general_aviation,fixed_wing_-_aeroplane,PIPER - PA34,vfr,vfr,-115030.140486,6.912404e+06,POINT Z (-1.03333 52.61667 2.00000),LEICESTER ATZ 122.130,Airport Traffic Zone (ATZ),Special Use Airspace (SUA)
1118,2015219,53.016667,-0.483333,6.0,b,military,fixed_wing_-_aeroplane,GROB - G115,vfr,military,fixed_wing_-_aeroplane,GROB - G115,vfr,vfr,-53804.420550,6.986081e+06,POINT Z (-0.48333 53.01667 6.00000),CRANWELL ATZ 124.450,Airport Traffic Zone (ATZ),Special Use Airspace (SUA)
1118,2015219,53.016667,-0.483333,6.0,b,military,fixed_wing_-_aeroplane,GROB - G115,vfr,military,fixed_wing_-_aeroplane,GROB - G115,vfr,vfr,-53804.420550,6.986081e+06,POINT Z (-0.48333 53.01667 6.00000),BARKSTON/CRANWELL MATZ 124.450,Military Airport Traffic Zone (MATZ),G
1530,2007083,50.683333,-1.116667,10.0,c,general_aviation,civil_private_or_club,SCOUT A.H. MK I,vfr,suas,model_aircraft,MODEL AIRCRAFT,vfr,vfr,-124306.764719,6.565469e+06,POINT Z (-1.11667 50.68333 10.00000),BEMBRIDGE 123.255,Gliding Sector,Special Use Airspace (SUA)
919,2011013,51.016667,-2.633333,10.0,e,military,rotorcraft_-_helicopter,OTHER - Military (Lynx),vfr,military,fixed_wing_-_aeroplane,OTHER - Military (Hawk),vfr,vfr,-293141.325756,6.624242e+06,POINT Z (-2.63333 51.01667 10.00000),YEOVILTON ATZ 127.350,Airport Traffic Zone (ATZ),Special Use Airspace (SUA)
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4587,2021235,51.416667,-0.100000,1350.0,c,emergency_services,rotorcraft_-_helicopter,MBB - BK117 (EC145),vfr,general_aviation,rotorcraft_-_helicopter,AGUSTA - A109,vfr,vfr,-11131.949079,6.695331e+06,POINT Z (-0.10000 51.41667 1350.00000),UNCONTROLLED AIRSPACE,0,6
4588,2021236,52.933333,-4.550000,260.0,d,general_aviation,other_-_paraglider,OTHER (Paraglider),vfr,unknown_aircraft,fixed_wing_-_aeroplane,UNKNOWN,unknown,unknown-vfr,-506503.683109,6.970676e+06,POINT Z (-4.55000 52.93333 260.00000),UNCONTROLLED AIRSPACE,0,6
4593,2021242,50.933333,-2.883333,14900.0,e,emergency_services,fixed_wing_-_aeroplane,BAE - AVRO146RJ - 100 - 70,vfr,military,fixed_wing_-_aeroplane,OTHER - Military (Hawk T1),vfr,vfr,-320971.198454,6.609510e+06,POINT Z (-2.88333 50.93333 14900.00000),UNCONTROLLED AIRSPACE,0,6
4595,2021245,53.666667,-0.533333,7400.0,e,civil_commercial,fixed_wing_-_aeroplane,CESSNA - 404,vfr,military,fixed_wing_-_aeroplane,OTHER - Military (E3),unknown,unknown-vfr,-59370.395090,7.107278e+06,POINT Z (-0.53333 53.66667 7400.00000),UNCONTROLLED AIRSPACE,0,6


In [74]:
x_idx = np.array(tfc_agg.index.levels[0])
y_idx = np.array(tfc_agg.index.levels[1])

In [75]:
airprox_gdf = airprox_gdf[
    (airprox_gdf.Latitude >= tfc_clean.data.latitude.min()) &
    (airprox_gdf.Latitude <= tfc_clean.data.latitude.max()) &
    (airprox_gdf.Longitude >= tfc_clean.data.longitude.min()) &
    (airprox_gdf.Longitude <= tfc_clean.data.longitude.max()) &
    ((airprox_gdf.icaoClass == 6) | (airprox_gdf.icaoClass == 'G') | (
            airprox_gdf.type == 'Radio Mandatory Zone (RMZ)') | (airprox_gdf.type == 'Gliding Sector'))
    ]

In [76]:
transformer = pyproj.Transformer.from_proj(pyproj.Proj("epsg:4326"), pyproj.Proj("epsg:3857"), always_xy=True)
x, y = transformer.transform(
    airprox_gdf.Longitude.values,
    airprox_gdf.Latitude.values,
)
airprox_gdf = airprox_gdf.assign(x=x, y=y)

Match each airprox location to the traffic statistics of the nearest retained grid cell.

In [77]:
tfc_grid = np.array(tfc_magg.reset_index()[['x', 'y']])
airprox_locs = np.array(airprox_gdf[['x', 'y']])

In [78]:
tfc_idxs = cdist(tfc_grid, airprox_locs).argmin(axis=0)
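
One subtlety worth checking: the grid coordinates produced by the floor-division binning are cell corners, not cell centres, so a nearest-neighbour search against them can pick a neighbouring cell. The sketch below, on invented coordinates, compares nearest-corner matching, nearest-centre matching, and a direct lookup using the same floor division. Whether this materially changes the matches in the notebook has not been checked here.

```python
import numpy as np
from scipy.spatial.distance import cdist

res = 6000
# South-west corners of three retained cells (toy values).
cell_corners = np.array([[-618000.0, 6480000.0],
                         [-612000.0, 6480000.0],
                         [-612000.0, 6486000.0]])
# One airprox location inside the first cell, close to its eastern edge.
airprox_xy = np.array([[-612500.0, 6482000.0]])

# Index of the closest cell when cells are represented by their corners...
nearest_corner = cdist(cell_corners, airprox_xy).argmin(axis=0)
# ...and when they are represented by their centres.
nearest_centre = cdist(cell_corners + res / 2, airprox_xy).argmin(axis=0)

# Corner of the cell that actually contains the point.
containing = (airprox_xy // res) * res

print(nearest_corner, nearest_centre, containing)
```

Note also that a nearest-neighbour search always returns some cell, even when the containing cell was dropped by the sample-count filter, so a maximum-distance check may be worth adding.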

In [79]:
tfc_cells = tfc_magg.iloc[tfc_idxs].reset_index()
airproxes_with_tfc = pd.concat([airprox_gdf.reset_index(), tfc_cells], axis=1)
airproxes_with_tfc = airproxes_with_tfc.drop(labels=['index', 'x', 'y'], axis=1)
airproxes_with_tfc

Unnamed: 0,AirproxID,Latitude,Longitude,Altitude,Risk,Aircraft1_Classification,Aircraft1_Category,Aircraft1_Type,Aircraft1_FlightRules,Aircraft2_Classification,...,name,type,icaoClass,altitude_mean,altitude_std,track_mean,track_std,groundspeed_mean,groundspeed_std,flight_id_nunique
0,2011120,50.1,-5.133333,500.0,a,military,rotorcraft_-_helicopter,EH INDUSTRIES - EH101,vfr,military,...,CULDROSE MATZ 134.050,Military Airport Traffic Zone (MATZ),G,719.0,628.12901,190.625,98.662258,89.8125,41.818283,33
1,2017058,50.083333,-5.25,800.0,c,military,fixed_wing_-_aeroplane_-_military_aeroplane_-_...,OTHER - Military (Hawk),vfr,suas,...,CULDROSE MATZ 134.050,Military Airport Traffic Zone (MATZ),G,2268.0,925.52438,185.375,96.109859,121.375,42.351927,34
2,2007147,50.083333,-5.25,1000.0,b,general_aviation,civil_glider,GLIDER (UNSPECIFIED),vfr,general_aviation,...,CULDROSE MATZ 134.050,Military Airport Traffic Zone (MATZ),G,2268.0,925.52438,185.375,96.109859,121.375,42.351927,34
3,2000040,50.366667,-4.7,2900.0,c,commercial_air_transport,civil_air_transport_(scheduled_passenger),DHC-8 (DASH 8),ifr,military,...,UNCONTROLLED AIRSPACE,0,6,2790.0,766.637039,234.75,52.695919,168.5,47.797436,166
4,2001127,50.366667,-4.75,1000.0,a,general_aviation,civil_commercial_(hire_&_reward),DAUPHIN SA 365,vfr,general_aviation,...,UNCONTROLLED AIRSPACE,0,6,2876.0,683.453017,231.125,54.735886,166.25,42.238962,284
5,2004103,50.466667,-4.683333,800.0,b,general_aviation,civil_private_or_club,"R-21/00 /12 /60, ALPHA",vfr,general_aviation,...,UNCONTROLLED AIRSPACE,0,6,2596.0,661.289644,159.625,107.047996,124.0,53.301231,38
6,2008099,50.316667,-4.816667,500.0,c,emergency_services,ambulance_helicopter,EC135,vfr,general_aviation,...,UNCONTROLLED AIRSPACE,0,6,2698.0,715.118536,250.625,97.594131,150.125,40.977537,308
7,2011077,50.216667,-5.416667,1200.0,b,emergency_services,rotorcraft_-_helicopter,EUROCOPTER - EC135,vfr,general_aviation,...,UNCONTROLLED AIRSPACE,0,6,1959.0,908.24035,172.375,97.297264,100.625,44.333899,211
8,2012072,50.2,-5.25,2000.0,c,general_aviation,fixed_wing_-_aeroplane,CFM - SHADOW,vfr,emergency_services,...,UNCONTROLLED AIRSPACE,0,6,2418.0,1008.605906,128.25,87.974833,130.0,34.663851,82
9,2012122,50.15,-5.4,500.0,e,military,rotorcraft_-_helicopter,EH INDUSTRIES - EH101,vfr,general_aviation,...,UNCONTROLLED AIRSPACE,0,6,2158.0,933.774234,138.0,90.607474,122.875,30.901723,112


In [80]:
non_airprox_tfc = pd.merge(tfc_magg.reset_index(), tfc_cells.reset_index(), how="outer", indicator=True
                           ).query('_merge=="left_only"')
non_airprox_tfc

Unnamed: 0,x,y,altitude_mean,altitude_std,track_mean,track_std,groundspeed_mean,groundspeed_std,flight_id_nunique,index,_merge
0,-624000.0,6486000.0,2524.0,1221.827691,194.125,88.615884,130.875,26.369813,47,,left_only
1,-618000.0,6474000.0,2574.0,903.288369,141.125,90.873882,134.000,43.585051,75,,left_only
2,-618000.0,6480000.0,2666.0,1036.673502,200.000,77.101737,135.000,26.402432,384,,left_only
3,-618000.0,6486000.0,2548.0,1203.780450,195.375,77.314834,137.875,19.681153,230,,left_only
4,-618000.0,6492000.0,3048.0,931.113359,145.375,96.642526,146.750,31.620701,37,,left_only
...,...,...,...,...,...,...,...,...,...,...,...
146,-528000.0,6534000.0,2584.0,724.309867,218.875,92.609117,121.750,56.325335,40,,left_only
147,-522000.0,6498000.0,1915.0,1073.142653,216.500,89.706484,134.250,58.717125,32,,left_only
148,-522000.0,6504000.0,2326.0,901.429218,205.875,82.045453,131.375,50.987008,72,,left_only
150,-522000.0,6516000.0,2770.0,744.627203,211.125,60.467294,164.750,48.855528,106,,left_only


Sanity check the data at this point by plotting

In [81]:
airproxes_with_tfc.explore('altitude_mean', cmap='inferno')

Examine the spatial coverage of the data. This is the area within which we can apply the CLT and extract valid distributions.

In [83]:
fig, ax = plt.subplots(
    1, 1, figsize=(11, 11), subplot_kw=dict(projection=Projection('epsg:3857')),
)

ax.add_feature(countries())
ax.add_feature(lakes())
ax.add_feature(ocean())

flow = ax.tricontourf(
    tfc_gdf[tfc_gdf['flight_id_nunique'] > 30]['x'],
    tfc_gdf[tfc_gdf['flight_id_nunique'] > 30]['y'],
    tfc_gdf[tfc_gdf['flight_id_nunique'] > 30]['flight_id_nunique'],
    alpha=0.5,
    cmap='inferno')

# aps = ax.scatter(airprox_gdf['x'], airprox_gdf['y'], c='r', marker='x')

ax.set_title('Sample Count')
cb = fig.colorbar(flow)
cb.set_label('Samples')
# ax.legend([aps], ['Airprox'])


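Plot a correlation matrix between all variables using the Pearson correlation coefficient. Every column, numeric or categorical, is first integer-encoded with `pd.factorize`, which assigns codes in order of first appearance, so the resulting coefficients are indicative only.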

In [84]:
# corr = airproxes_with_tfc.corr(method='spearman')
corr = airproxes_with_tfc.apply(lambda x: pd.factorize(x)[0]).corr(method='pearson', min_periods=1)
fig, ax = plt.subplots(figsize=(20, 20))
sns.heatmap(corr, square=True, cmap=sns.color_palette('icefire', as_cmap=True), annot=True, ax=ax)
plt.savefig('corr.svg')
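
Because `pd.factorize` replaces values with order-of-appearance codes, applying it to numeric columns discards their magnitudes. A possible alternative, sketched here on a made-up three-column frame, is to encode only the non-numeric columns and leave numeric ones untouched. Sorting the categories gives meaningful ranks for an ordinal column such as `Risk`, but codes for nominal columns remain arbitrary either way.

```python
import pandas as pd

# Toy data (invented values): one ordinal column, two numeric columns.
df = pd.DataFrame({
    "Risk": ["a", "c", "b", "c"],
    "track_std": [98.7, 96.1, 52.7, 54.7],
    "groundspeed_mean": [89.8, 121.4, 168.5, 166.3],
})

# Encoding every column: numeric values become 0, 1, 2, ... in row order.
all_encoded = df.apply(lambda s: pd.factorize(s)[0])

# Alternative: encode only non-numeric columns, keep numeric values as they are.
mixed = df.copy()
for col in mixed.select_dtypes(exclude="number").columns:
    mixed[col] = pd.factorize(mixed[col], sort=True)[0]

r_encoded = all_encoded["track_std"].corr(all_encoded["groundspeed_mean"])
r_mixed = mixed["track_std"].corr(mixed["groundspeed_mean"])
print(f"Pearson r, all columns factorized: {r_encoded:.3f}")
print(f"Pearson r, numeric values kept:    {r_mixed:.3f}")
```

In this toy case the fully factorized version reports a perfect positive correlation simply because both numeric columns contain four distinct values, while the true relationship between the values is negative.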


Compute vectors for the quiver plot

In [85]:
tfc_mgdf['track_scale'] = 1 - (tfc_mgdf['track_std'] / tfc_mgdf['track_std'].max())
# track is a compass bearing (degrees clockwise from north),
# so the east component is sin(track) and the north component is cos(track)
tfc_mgdf['track_u'] = np.sin(np.radians(tfc_mgdf['track_mean'])) * tfc_mgdf['track_scale']
tfc_mgdf['track_v'] = np.cos(np.radians(tfc_mgdf['track_mean'])) * tfc_mgdf['track_scale']
tfc_mgdf.head()

Unnamed: 0,x,y,altitude_mean,altitude_std,track_mean,track_std,groundspeed_mean,groundspeed_std,flight_id_nunique,track_scale,track_u,track_v
0,-624000.0,6486000.0,2524.0,1221.827691,194.125,88.615884,130.875,26.369813,47,0.324261,-0.079324,-0.314444
1,-618000.0,6474000.0,2574.0,903.288369,141.125,90.873882,134.0,43.585051,75,0.307043,0.192801,-0.238978
2,-618000.0,6480000.0,2666.0,1036.673502,200.0,77.101737,135.0,26.402432,384,0.412062,-0.140741,-0.387314
3,-618000.0,6486000.0,2548.0,1203.78045,195.375,77.314834,137.875,19.681153,230,0.410437,-0.108922,-0.395807
4,-618000.0,6492000.0,3048.0,931.113359,145.375,96.642526,146.75,31.620701,37,0.263054,0.149509,-0.216429


Plot the mean traffic flow direction for cells with sufficient samples. Arrow length decreases linearly with the standard deviation of track in the cell (`track_scale = 1 - track_std / max(track_std)`), so the longer the arrow, the more unidirectional and organised the traffic flow.

Vector colour encodes the mean track and is there only to make the arrows easier to tell apart.

Airprox locations are superimposed for information only.

Both a quiver plot and a contour plot are made from the same data.
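
One caveat on the track statistics: `track_mean` and `track_std` are ordinary arithmetic statistics of a bearing in degrees, which wraps at 0/360. Traffic heading roughly north can therefore appear to have a mean near 180 and a very large spread. A possible alternative, sketched here on made-up bearings rather than the notebook's data, is to use the circular statistics in `scipy.stats`.

```python
import numpy as np
from scipy import stats

# Four bearings that all point roughly north (invented values).
tracks = np.array([350.0, 355.0, 5.0, 10.0])

# Arithmetic statistics treat 350 and 10 as far apart.
arith_mean = tracks.mean()
arith_std = tracks.std(ddof=1)

# Circular statistics account for the wrap-around at 0/360 degrees.
circ_mean = stats.circmean(tracks, high=360.0, low=0.0)
circ_std = stats.circstd(tracks, high=360.0, low=0.0)

print(f"arithmetic: mean = {arith_mean:.1f}, std = {arith_std:.1f}")
print(f"circular:   mean = {circ_mean:.1f}, std = {circ_std:.1f}")
```

Swapping these into the aggregation would change the values of `track_mean` and `track_std` and hence the plots and tests that follow, so it is offered as a check rather than as a drop-in replacement.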

In [86]:
fig, ax = plt.subplots(
    1, 1, figsize=(11, 11), subplot_kw=dict(projection=Projection('epsg:3857')),
)

ax.add_feature(countries())
ax.add_feature(lakes())
ax.add_feature(ocean())

flow = ax.quiver(tfc_mgdf['x'],
                 tfc_mgdf['y'],
                 tfc_mgdf['track_u'],
                 tfc_mgdf['track_v'],
                 tfc_mgdf['track_mean'],
                 scale_units=None,
                 cmap='cool')

aps = ax.scatter(airprox_gdf['x'], airprox_gdf['y'], c='r', marker='x')

ax.set_title('Mean traffic flow')
cb = fig.colorbar(flow)
cb.set_label('Mean traffic flow')
ax.legend([aps], ['Airprox'])

# airports['EGHQ'].point.plot(ax)
# airports['EGHE'].point.plot(ax)
# airports['EGHC'].point.plot(ax)


In [87]:
from cartopy.crs import Projection
from traffic.drawing import countries, lakes, ocean
from traffic.data import airports

fig, ax = plt.subplots(
    1, 1, figsize=(11, 11), subplot_kw=dict(projection=Projection('epsg:3857')),
)

ax.add_feature(countries())
ax.add_feature(lakes())
ax.add_feature(ocean())

flow = ax.tricontourf(tfc_mgdf['x'],
                      tfc_mgdf['y'],
                      # tfc_gdf['track_u'],
                      # tfc_gdf['track_v'],
                      tfc_mgdf['track_mean'],
                      alpha=0.5,
                      cmap='inferno')

aps = ax.scatter(airprox_gdf['x'], airprox_gdf['y'], c='r', marker='x')

ax.set_title('Mean traffic flow')
cb = fig.colorbar(flow)
cb.set_label('Mean traffic bearing')
ax.legend([aps], ['Airprox'])

# airports['EGHQ'].point.plot(ax)
# airports['EGHE'].point.plot(ax)
# airports['EGHC'].point.plot(ax)


In [88]:
fig, ax = plt.subplots(
    1, 1, figsize=(11, 11), subplot_kw=dict(projection=Projection('epsg:3857')),
)

ax.add_feature(countries())
ax.add_feature(lakes())
ax.add_feature(ocean())

flow = ax.tricontourf(
    tfc_mgdf['x'],
    tfc_mgdf['y'],
    tfc_mgdf['altitude_mean'],
    alpha=0.5,
    cmap='inferno')

# aps = ax.scatter(airprox_gdf['x'], airprox_gdf['y'], c='r', marker='x')

ax.set_title('Mean altitude')
cb = fig.colorbar(flow)
cb.set_label('Mean altitude')
# ax.legend([aps], ['Airprox'])


## Aggregate Stats

In [168]:
ceiling_alt = 3000 / 3.281
cell_vol = res * res * ceiling_alt
# Unique flights seen in each uncontrolled cell over the year, per unit cell volume
cell_traffic_densities = np.array(tfc_magg['flight_id_nunique']) / cell_vol
print(f'Mean Traffic Density in uncontrolled airspace for data area: {cell_traffic_densities.mean()} aircraft/m^3')

Mean Traffic Density in uncontrolled airspace for data area: 5.315831616384595e-09 aircraft/m^3
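
The density figure above is a count of unique flights divided by the cell volume, so it is cumulative over the period covered by the data rather than an instantaneous density. A minimal sketch of the same arithmetic follows. The 6000 m cell side is an assumption inferred from the grid spacing visible in the aggregated tables, and `cell_density` is a hypothetical helper, not part of the notebook.

```python
FT_TO_M = 0.3048  # exact feet-to-metre conversion


def cell_density(n_flights, res_m, ceiling_ft):
    """Unique flights per cubic metre for a square cell of side res_m."""
    cell_volume_m3 = res_m * res_m * ceiling_ft * FT_TO_M
    return n_flights / cell_volume_m3


# Hypothetical cell: 6000 m side (assumed), 3000 ft ceiling, 179 unique flights
print(cell_density(179, 6000.0, 3000.0))
```

Using the exact factor 0.3048 instead of dividing by 3.281 changes the result only slightly, but avoids a rounded constant.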


In [174]:
import shapely.geometry as sg

print('World space stats (uncontrolled volumes):')
xy_points = gpd.points_from_xy(tfc_unc_xy_gdf.data['x'], tfc_unc_xy_gdf.data['y'], tfc_unc_xy_gdf.data['altitude'])
convex_hull = sg.MultiPoint(xy_points).convex_hull
min_rot_rect = convex_hull.minimum_rotated_rectangle
print(f"Total area: {min_rot_rect.area} m^2")
print(f"Total volume: {min_rot_rect.area * ceiling_alt} m^3")
coords = [np.array(c) for c in min_rot_rect.exterior.coords[:-1]]
coord_dists = np.unique(cdist(coords, coords).round(decimals=3))
coord_dists = coord_dists[coord_dists > 0]
print(f'Total x,y,z dimensions are {coord_dists[0]}m, {coord_dists[1]}m, {ceiling_alt}m with xy diagonal {coord_dists[2]}m')

World space stats (uncontrolled volumes):
Total area: 34316832283.439693 m^2
Total volume: 31377780204303.28 m^3
Total x,y,z dimensions are 179871.055m, 190785.74m, 914.3553794574824m with xy diagonal 262207.542m
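
The side lengths and diagonal above are read off from the pairwise distances between the four corners of the minimum rotated rectangle. A dependency-free sketch of that step is given below; `rectangle_dimensions` is a hypothetical helper. Sorting all six pairwise distances, rather than taking unique rounded values, also behaves sensibly when the rectangle happens to be a square.

```python
import math
from itertools import combinations


def rectangle_dimensions(corners):
    """Return (short side, long side, diagonal) from four rectangle corners."""
    dists = sorted(math.dist(a, b) for a, b in combinations(corners, 2))
    # Six pairwise distances: two short sides, two long sides, two diagonals
    return dists[0], dists[2], dists[4]


# Toy 3 x 4 rectangle, not taken from the dataset
corners = [(0.0, 0.0), (3.0, 0.0), (3.0, 4.0), (0.0, 4.0)]
print(rectangle_dimensions(corners))
```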


## Testing hypotheses
All tests are carried out at the 5% significance level unless otherwise specified.

In [89]:
from scipy import stats as ss

sig_lvl = 0.05
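
When reporting the tests in this section, note that a p-value above the significance level means the null hypothesis is not rejected, which is weaker than showing it to be true. The sketch below shows one way to phrase the outcome accordingly. `describe_result` is a hypothetical helper, and the example values are the Bartlett statistic and p-value reported for the track test.

```python
def describe_result(test_name, statistic, p_value, alpha=0.05):
    """Summarise a test outcome without claiming the null is proven."""
    if p_value <= alpha:
        decision = 'reject the null hypothesis'
    else:
        decision = 'fail to reject the null hypothesis'
    return (f'{test_name}: statistic={statistic:.3f}, p={p_value:.3f}, '
            f'{decision} at alpha={alpha}')


print(describe_result('Bartlett', 0.32156393744218986, 0.5706692882690401))
```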

### Track correlation
First, the correlation of direction variance with airprox location is tested. The mean of the per-cell standard deviation of track is computed for the cells without airproxes and compared with that of the cells where airproxes occurred:

In [90]:
print('Overall mean of stddev: ', non_airprox_tfc['track_std'].mean(), ' for ', len(non_airprox_tfc['track_std']),
      ' samples')
print('Airprox location mean of stddev: ', airproxes_with_tfc['track_std'].mean(), 'for ',
      len(airproxes_with_tfc['track_std']), ' samples')

Overall mean of stddev:  90.62413543410744  for  137  samples
Airprox location mean of stddev:  88.66543962695775 for  16  samples


In [91]:
F, p = ss.bartlett(non_airprox_tfc['track_std'], airproxes_with_tfc['track_std'])
print(f'Bartlett equal variance test gives score of {F} at a p-significance of {p}')
if p <= sig_lvl:
    print(f'The hypothesis is accepted (F={F}, p={p})')
else:
    print('Null hypothesis is accepted.')

Bartlett equal variance test gives score of 0.32156393744218986 at a p-significance of 0.5706692882690401
Null hypothesis is accepted.
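
Bartlett's test checks whether the two groups of per-cell values have equal variance. The hypothesis, however, is directional: that the spread of track is larger where airproxes occur. One way to test that directly is a one-sided permutation test on the difference of group means. This is a swapped-in technique that the notebook does not use, sketched here on toy numbers only; `permutation_p_value` is a hypothetical helper.

```python
import random


def permutation_p_value(group_a, group_b, n_resamples=10000, seed=0):
    """One-sided p-value for mean(group_b) > mean(group_a) under label shuffling."""
    rng = random.Random(seed)
    observed = sum(group_b) / len(group_b) - sum(group_a) / len(group_a)
    pooled = list(group_a) + list(group_b)
    n_b = len(group_b)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        # After shuffling, the first n_b values play the role of group_b
        diff = sum(pooled[:n_b]) / n_b - sum(pooled[n_b:]) / (len(pooled) - n_b)
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero
    return (hits + 1) / (n_resamples + 1)


# Toy values only, not taken from the dataset
background = [80.0, 85.0, 90.0, 92.0, 95.0, 88.0, 91.0, 87.0]
airprox = [89.0, 93.0, 86.0, 90.0]
print(permutation_p_value(background, airprox))
```

The permutation approach makes no normality assumption, which may matter with only 16 airprox samples.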


### Density Correlation

The count of unique flights within a cell is used as a measure of traffic density.

The procedure otherwise follows the one above, with a one-way ANOVA comparing the group means in place of the Bartlett test.

In [92]:
print('Overall mean: ', non_airprox_tfc['flight_id_nunique'].mean(), ' for ', len(non_airprox_tfc['flight_id_nunique']),
      ' samples')
print('Airprox location mean: ', airproxes_with_tfc['flight_id_nunique'].mean(), 'for ',
      len(airproxes_with_tfc['flight_id_nunique']), ' samples')

Overall mean:  179.2189781021898  for  137  samples
Airprox location mean:  124.0625 for  16  samples


In [93]:
F, p = ss.f_oneway(non_airprox_tfc['flight_id_nunique'], airproxes_with_tfc['flight_id_nunique'])
print(f'One-Way ANOVA test gives F-score of {F} at a p-significance of {p}')
if p <= sig_lvl:
    print(f'The hypothesis is accepted (F={F}, p={p})')
else:
    print('Null hypothesis is accepted.')

One-Way ANOVA test gives F-score of 1.4844277015737615 at a p-significance of 0.224983231354477
Null hypothesis is accepted.


### Speed correlation

First, the mean traffic flow speed is compared between cells with and without airproxes.

In [94]:
print('Overall mean: ', np.array(non_airprox_tfc['groundspeed_mean']).mean(), ' for ',
      len(non_airprox_tfc['groundspeed_mean']),
      ' samples')
print('Airprox location mean: ', np.array(airproxes_with_tfc['groundspeed_mean']).mean(), 'for ',
      len(airproxes_with_tfc['groundspeed_mean']), ' samples')

Overall mean:  136.9  for  137  samples
Airprox location mean:  126.9 for  16  samples


In [95]:
F, p = ss.f_oneway(non_airprox_tfc['groundspeed_mean'], airproxes_with_tfc['groundspeed_mean'])
print(f'One-Way ANOVA test gives F-score of {F} at a p-significance of {p}')
if p <= sig_lvl:
    print(f'The hypothesis is accepted (F={F}, p={p})')
else:
    print('Null hypothesis is accepted.')

One-Way ANOVA test gives F-score of 2.0247388537652915 at a p-significance of 0.15681822524529967
Null hypothesis is accepted.


Next, the *spread* of traffic flow speeds is compared between cells with and without airproxes.

In [96]:
print('Overall mean of stddev: ', np.array(non_airprox_tfc['groundspeed_std']).mean(), ' for ',
      len(non_airprox_tfc['groundspeed_std']),
      ' samples')
print('Airprox location mean of stddev: ', np.array(airproxes_with_tfc['groundspeed_std']).mean(), 'for ',
      len(airproxes_with_tfc['groundspeed_std']), ' samples')

Overall mean of stddev:  37.533975855413445  for  137  samples
Airprox location mean of stddev:  41.439882197447524 for  16  samples


In [97]:
F, p = ss.bartlett(non_airprox_tfc['groundspeed_std'], airproxes_with_tfc['groundspeed_std'])
print(f'Bartlett equal variance test gives score of {F} at a p-significance of {p}')
if p <= sig_lvl:
    print(f'The hypothesis is accepted (F={F}, p={p})')
else:
    print('Null hypothesis is accepted.')

Bartlett equal variance test gives score of 1.2311940900286606 at a p-significance of 0.2671749416304806
Null hypothesis is accepted.


Only cells with more than 30 unique flights are used, so that the Central Limit Theorem can reasonably be invoked. With this many samples, the sampling distribution of each cell's mean is approximately Gaussian.

In [98]:
tfc_magg = tfc_agg[tfc_agg['flight_id_nunique'] > 30]
tfc_gdf = tfc_agg.reset_index()
tfc_mgdf = tfc_magg.reset_index()
tfc_magg.head(10)

Unnamed: 0_level_0,Unnamed: 1_level_0,altitude_mean,altitude_std,track_mean,track_std,groundspeed_mean,groundspeed_std,flight_id_nunique
x,y,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
-624000.0,6486000.0,2524.0,1221.827691,194.125,88.615884,130.875,26.369813,47
-618000.0,6474000.0,2574.0,903.288369,141.125,90.873882,134.0,43.585051,75
-618000.0,6480000.0,2666.0,1036.673502,200.0,77.101737,135.0,26.402432,384
-618000.0,6486000.0,2548.0,1203.78045,195.375,77.314834,137.875,19.681153,230
-618000.0,6492000.0,3048.0,931.113359,145.375,96.642526,146.75,31.620701,37
-612000.0,6462000.0,1770.0,1152.675931,183.25,89.316928,107.75,29.373423,48
-612000.0,6468000.0,2014.0,894.61493,164.125,82.422277,124.1875,29.476093,99
-612000.0,6474000.0,2202.0,916.310429,177.0,90.639142,116.375,33.543025,132
-612000.0,6480000.0,2110.0,910.520221,178.5,92.427015,115.3125,33.185733,208
-612000.0,6486000.0,3018.0,956.18668,188.375,82.156879,139.375,20.70973,426


In [99]:
airprox_gdf

Unnamed: 0,AirproxID,Latitude,Longitude,Altitude,Risk,Aircraft1_Classification,Aircraft1_Category,Aircraft1_Type,Aircraft1_FlightRules,Aircraft2_Classification,Aircraft2_Category,Aircraft2_Type,Aircraft2_FlightRules,Combined_Rules,x,y,geometry,name,type,icaoClass
3805,2011120,50.1,-5.133333,500.0,a,military,rotorcraft_-_helicopter,EH INDUSTRIES - EH101,vfr,military,fixed_wing_-_aeroplane,OTHER - Military (Hawk),vfr,vfr,-571440.052739,6463612.0,POINT Z (-5.13333 50.10000 500.00000),CULDROSE MATZ 134.050,Military Airport Traffic Zone (MATZ),G
872,2017058,50.083333,-5.25,800.0,c,military,fixed_wing_-_aeroplane_-_military_aeroplane_-_...,OTHER - Military (Hawk),vfr,suas,rpas_-_unmanned_aircraft_below_150,UNKNOWN (RPAS),unknown,unknown-vfr,-584427.326665,6460720.0,POINT Z (-5.25000 50.08333 800.00000),CULDROSE MATZ 134.050,Military Airport Traffic Zone (MATZ),G
398,2007147,50.083333,-5.25,1000.0,b,general_aviation,civil_glider,GLIDER (UNSPECIFIED),vfr,general_aviation,civil_commercial_(hire_&_reward),F/A 406 CARAVAN II,vfr,vfr,-584427.326665,6460720.0,POINT Z (-5.25000 50.08333 1000.00000),CULDROSE MATZ 134.050,Military Airport Traffic Zone (MATZ),G
41,2000040,50.366667,-4.7,2900.0,c,commercial_air_transport,civil_air_transport_(scheduled_passenger),DHC-8 (DASH 8),ifr,military,military_helicopter,"SEA KING, S-61 (MIL MODELS)",vfr,ifr-vfr,-523201.606728,6510020.0,POINT Z (-4.70000 50.36667 2900.00000),UNCONTROLLED AIRSPACE,0,6
340,2001127,50.366667,-4.75,1000.0,a,general_aviation,civil_commercial_(hire_&_reward),DAUPHIN SA 365,vfr,general_aviation,civil_private_or_club,CESSNA 150,vfr,vfr,-528767.581268,6510020.0,POINT Z (-4.75000 50.36667 1000.00000),UNCONTROLLED AIRSPACE,0,6
924,2004103,50.466667,-4.683333,800.0,b,general_aviation,civil_private_or_club,"R-21/00 /12 /60, ALPHA",vfr,general_aviation,civil_private_or_club,CESSNA 172,vfr,vfr,-521346.281882,6527490.0,POINT Z (-4.68333 50.46667 800.00000),UNCONTROLLED AIRSPACE,0,6
1683,2008099,50.316667,-4.816667,500.0,c,emergency_services,ambulance_helicopter,EC135,vfr,general_aviation,civil_private_or_club,R44 ASTRO (ROBINSON),vfr,vfr,-536188.880654,6501299.0,POINT Z (-4.81667 50.31667 500.00000),UNCONTROLLED AIRSPACE,0,6
2185,2011077,50.216667,-5.416667,1200.0,b,emergency_services,rotorcraft_-_helicopter,EUROCOPTER - EC135,vfr,general_aviation,fixed_wing_-_aeroplane,CESSNA - 172,vfr,vfr,-602980.57513,6483884.0,POINT Z (-5.41667 50.21667 1200.00000),UNCONTROLLED AIRSPACE,0,6
2341,2012072,50.2,-5.25,2000.0,c,general_aviation,fixed_wing_-_aeroplane,CFM - SHADOW,vfr,emergency_services,rotorcraft_-_helicopter,EUROCOPTER - EC145,vfr,vfr,-584427.326665,6480985.0,POINT Z (-5.25000 50.20000 2000.00000),UNCONTROLLED AIRSPACE,0,6
2389,2012122,50.15,-5.4,500.0,e,military,rotorcraft_-_helicopter,EH INDUSTRIES - EH101,vfr,general_aviation,fixed_wing_-_aeroplane,DORNIER - DO28A,vfr,vfr,-601125.250284,6472294.0,POINT Z (-5.40000 50.15000 500.00000),UNCONTROLLED AIRSPACE,0,6


In [100]:
x_idx = np.array(tfc_agg.index.levels[0])
y_idx = np.array(tfc_agg.index.levels[1])

In [101]:
airprox_gdf = airprox_gdf[
    (airprox_gdf.Latitude >= tfc_clean.data.latitude.min()) &
    (airprox_gdf.Latitude <= tfc_clean.data.latitude.max()) &
    (airprox_gdf.Longitude >= tfc_clean.data.longitude.min()) &
    (airprox_gdf.Longitude <= tfc_clean.data.longitude.max()) &
    ((airprox_gdf.icaoClass == 6) | (airprox_gdf.icaoClass == 'G') | (
            airprox_gdf.type == 'Radio Mandatory Zone (RMZ)') | (airprox_gdf.type == 'Gliding Sector'))
    ]

In [102]:
transformer = pyproj.Transformer.from_proj(pyproj.Proj("epsg:4326"), pyproj.Proj("epsg:3857"), always_xy=True)
x, y = transformer.transform(
    airprox_gdf.Longitude.values,
    airprox_gdf.Latitude.values,
)
airprox_gdf = airprox_gdf.assign(x=x, y=y)

Match each airprox location to the traffic statistics of its nearest grid cell.

In [103]:
tfc_grid = np.array(tfc_magg.reset_index()[['x', 'y']])
airprox_locs = np.array(airprox_gdf[['x', 'y']])

In [104]:
tfc_idxs = cdist(tfc_grid, airprox_locs).argmin(axis=0)
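
The nearest-cell lookup above can be sketched without any dependencies as follows. Because only cells with enough samples are kept, the nearest remaining cell may lie some distance from the airprox. The optional `max_dist` guard is an addition for illustration and is not part of the notebook; `nearest_cell` is a hypothetical helper.

```python
import math


def nearest_cell(point, cell_centres, max_dist=None):
    """Index of the closest cell centre, or None if it is farther than max_dist."""
    dists = [math.dist(point, c) for c in cell_centres]
    idx = min(range(len(dists)), key=dists.__getitem__)
    if max_dist is not None and dists[idx] > max_dist:
        return None
    return idx


# Toy grid with 6000 m spacing (hypothetical coordinates)
cells = [(0.0, 0.0), (6000.0, 0.0), (0.0, 6000.0), (6000.0, 6000.0)]
print(nearest_cell((5000.0, 500.0), cells))             # 1
print(nearest_cell((50000.0, 50000.0), cells, 4500.0))  # None
```

A threshold slightly above half the cell diagonal, as in the second call, rejects points that fall outside every retained cell.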

In [105]:
tfc_cells = tfc_magg.iloc[tfc_idxs].reset_index()
airproxes_with_tfc = pd.concat([airprox_gdf.reset_index(), tfc_cells], axis=1)
airproxes_with_tfc = airproxes_with_tfc.drop(labels=['index', 'x', 'y'], axis=1)
airproxes_with_tfc

Unnamed: 0,AirproxID,Latitude,Longitude,Altitude,Risk,Aircraft1_Classification,Aircraft1_Category,Aircraft1_Type,Aircraft1_FlightRules,Aircraft2_Classification,...,name,type,icaoClass,altitude_mean,altitude_std,track_mean,track_std,groundspeed_mean,groundspeed_std,flight_id_nunique
0,2011120,50.1,-5.133333,500.0,a,military,rotorcraft_-_helicopter,EH INDUSTRIES - EH101,vfr,military,...,CULDROSE MATZ 134.050,Military Airport Traffic Zone (MATZ),G,719.0,628.12901,190.625,98.662258,89.8125,41.818283,33
1,2017058,50.083333,-5.25,800.0,c,military,fixed_wing_-_aeroplane_-_military_aeroplane_-_...,OTHER - Military (Hawk),vfr,suas,...,CULDROSE MATZ 134.050,Military Airport Traffic Zone (MATZ),G,2268.0,925.52438,185.375,96.109859,121.375,42.351927,34
2,2007147,50.083333,-5.25,1000.0,b,general_aviation,civil_glider,GLIDER (UNSPECIFIED),vfr,general_aviation,...,CULDROSE MATZ 134.050,Military Airport Traffic Zone (MATZ),G,2268.0,925.52438,185.375,96.109859,121.375,42.351927,34
3,2000040,50.366667,-4.7,2900.0,c,commercial_air_transport,civil_air_transport_(scheduled_passenger),DHC-8 (DASH 8),ifr,military,...,UNCONTROLLED AIRSPACE,0,6,2790.0,766.637039,234.75,52.695919,168.5,47.797436,166
4,2001127,50.366667,-4.75,1000.0,a,general_aviation,civil_commercial_(hire_&_reward),DAUPHIN SA 365,vfr,general_aviation,...,UNCONTROLLED AIRSPACE,0,6,2876.0,683.453017,231.125,54.735886,166.25,42.238962,284
5,2004103,50.466667,-4.683333,800.0,b,general_aviation,civil_private_or_club,"R-21/00 /12 /60, ALPHA",vfr,general_aviation,...,UNCONTROLLED AIRSPACE,0,6,2596.0,661.289644,159.625,107.047996,124.0,53.301231,38
6,2008099,50.316667,-4.816667,500.0,c,emergency_services,ambulance_helicopter,EC135,vfr,general_aviation,...,UNCONTROLLED AIRSPACE,0,6,2698.0,715.118536,250.625,97.594131,150.125,40.977537,308
7,2011077,50.216667,-5.416667,1200.0,b,emergency_services,rotorcraft_-_helicopter,EUROCOPTER - EC135,vfr,general_aviation,...,UNCONTROLLED AIRSPACE,0,6,1959.0,908.24035,172.375,97.297264,100.625,44.333899,211
8,2012072,50.2,-5.25,2000.0,c,general_aviation,fixed_wing_-_aeroplane,CFM - SHADOW,vfr,emergency_services,...,UNCONTROLLED AIRSPACE,0,6,2418.0,1008.605906,128.25,87.974833,130.0,34.663851,82
9,2012122,50.15,-5.4,500.0,e,military,rotorcraft_-_helicopter,EH INDUSTRIES - EH101,vfr,general_aviation,...,UNCONTROLLED AIRSPACE,0,6,2158.0,933.774234,138.0,90.607474,122.875,30.901723,112


In [106]:
non_airprox_tfc = pd.merge(tfc_magg.reset_index(), tfc_cells.reset_index(), how="outer", indicator=True
                           ).query('_merge=="left_only"')
non_airprox_tfc

Unnamed: 0,x,y,altitude_mean,altitude_std,track_mean,track_std,groundspeed_mean,groundspeed_std,flight_id_nunique,index,_merge
0,-624000.0,6486000.0,2524.0,1221.827691,194.125,88.615884,130.875,26.369813,47,,left_only
1,-618000.0,6474000.0,2574.0,903.288369,141.125,90.873882,134.000,43.585051,75,,left_only
2,-618000.0,6480000.0,2666.0,1036.673502,200.000,77.101737,135.000,26.402432,384,,left_only
3,-618000.0,6486000.0,2548.0,1203.780450,195.375,77.314834,137.875,19.681153,230,,left_only
4,-618000.0,6492000.0,3048.0,931.113359,145.375,96.642526,146.750,31.620701,37,,left_only
...,...,...,...,...,...,...,...,...,...,...,...
146,-528000.0,6534000.0,2584.0,724.309867,218.875,92.609117,121.750,56.325335,40,,left_only
147,-522000.0,6498000.0,1915.0,1073.142653,216.500,89.706484,134.250,58.717125,32,,left_only
148,-522000.0,6504000.0,2326.0,901.429218,205.875,82.045453,131.375,50.987008,72,,left_only
150,-522000.0,6516000.0,2770.0,744.627203,211.125,60.467294,164.750,48.855528,106,,left_only


Sanity-check the data at this point by plotting it on an interactive map.

In [107]:
airproxes_with_tfc.explore('altitude_mean', cmap='inferno')

Examine the spatial coverage of the data. This is the area within which the CLT can be applied and valid distributions extracted.

In [108]:
fig, ax = plt.subplots(
    1, 1, figsize=(11, 11), subplot_kw=dict(projection=Projection('epsg:3857')),
)

ax.add_feature(countries())
ax.add_feature(lakes())
ax.add_feature(ocean())

flow = ax.tricontourf(
    tfc_gdf[tfc_gdf['flight_id_nunique'] > 30]['x'],
    tfc_gdf[tfc_gdf['flight_id_nunique'] > 30]['y'],
    tfc_gdf[tfc_gdf['flight_id_nunique'] > 30]['flight_id_nunique'],
    alpha=0.5,
    cmap='inferno')

# aps = ax.scatter(airprox_gdf['x'], airprox_gdf['y'], c='r', marker='x')

ax.set_title('Sample Count')
cb = fig.colorbar(flow)
cb.set_label('Samples')
# ax.legend([aps], ['Airprox'])


Plot a correlation matrix between all variables using the Pearson correlation coefficient. Every column is first converted to integer codes with `pd.factorize`.

In [109]:
# corr = airproxes_with_tfc.corr(method='spearman')
corr = airproxes_with_tfc.apply(lambda x: pd.factorize(x)[0]).corr(method='pearson', min_periods=1)
fig, ax = plt.subplots(figsize=(20, 20))
sns.heatmap(corr, square=True, cmap=sns.color_palette('icefire', as_cmap=True), annot=True, ax=ax)
plt.savefig('corr.svg')
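
One caveat with correlating factorized columns is that the integer codes are assigned in order of first appearance, so the Pearson coefficient of a categorical column can depend on row order. The dependency-free sketch below illustrates this on toy values. `factorize` here is a hypothetical stand-in that mimics the default behaviour of `pd.factorize`, and `pearson` is a plain implementation of the coefficient.

```python
def factorize(values):
    """Integer codes in order of first appearance, mimicking pd.factorize."""
    codes, seen = [], {}
    for v in values:
        codes.append(seen.setdefault(v, len(seen)))
    return codes


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


risk = ['a', 'c', 'b', 'c']  # toy categorical column
altitude = [500.0, 800.0, 1000.0, 2900.0]
print(pearson(factorize(risk), altitude))

# Same pairs, different row order: the codes change, and so does the coefficient
order = [1, 2, 0, 3]
print(pearson(factorize([risk[i] for i in order]), [altitude[i] for i in order]))
```

For ordered categories, an explicit mapping to ranks followed by a rank correlation may be a more stable choice.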


Compute vectors for the quiver plot

In [110]:
# Arrow length shrinks linearly as the spread of directions in the cell grows
tfc_mgdf['track_scale'] = 1 - (tfc_mgdf['track_std'] / tfc_mgdf['track_std'].max())
# Track is a bearing clockwise from north, so east (u) uses sin and north (v) uses cos
tfc_mgdf['track_u'] = np.sin(np.radians(tfc_mgdf['track_mean'])) * tfc_mgdf['track_scale']
tfc_mgdf['track_v'] = np.cos(np.radians(tfc_mgdf['track_mean'])) * tfc_mgdf['track_scale']
tfc_mgdf.head()

Unnamed: 0,x,y,altitude_mean,altitude_std,track_mean,track_std,groundspeed_mean,groundspeed_std,flight_id_nunique,track_scale,track_u,track_v
0,-624000.0,6486000.0,2524.0,1221.827691,194.125,88.615884,130.875,26.369813,47,0.324261,-0.079324,-0.314444
1,-618000.0,6474000.0,2574.0,903.288369,141.125,90.873882,134.0,43.585051,75,0.307043,0.192801,-0.238978
2,-618000.0,6480000.0,2666.0,1036.673502,200.0,77.101737,135.0,26.402432,384,0.412062,-0.140741,-0.387314
3,-618000.0,6486000.0,2548.0,1203.78045,195.375,77.314834,137.875,19.681153,230,0.410437,-0.108922,-0.395807
4,-618000.0,6492000.0,3048.0,931.113359,145.375,96.642526,146.75,31.620701,37,0.263054,0.149509,-0.216429


Plot the mean traffic flow direction for cells with sufficient samples. The length of each vector decreases linearly with the standard deviation of the directions in that cell, normalised by the largest standard deviation in the grid. In practice, this means the longer the arrow, the more unidirectional and organised the traffic flow is.

Vector colouring is based on the direction of the vector and only serves to provide more visual contrast.

Airprox locations are superimposed for information only.

Both a quiver plot and a contour plot are made from the same data.
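
Two details matter when turning track angles into arrows. The sketch below assumes that track is a compass bearing measured clockwise from north, so the east component uses the sine and the north component the cosine. It also shows a circular mean, because an arithmetic mean of bearings misbehaves around the 0/360 wrap. Both helpers are hypothetical and not part of the notebook; the mean resultant length is offered as one possible measure of how organised the flow is.

```python
import math


def bearing_to_uv(bearing_deg, length=1.0):
    """East (u) and north (v) components of a compass bearing."""
    rad = math.radians(bearing_deg)
    return length * math.sin(rad), length * math.cos(rad)


def circular_mean_and_resultant(bearings_deg):
    """Circular mean bearing in degrees (wrapped with % 360) and mean resultant length."""
    n = len(bearings_deg)
    east = sum(math.sin(math.radians(b)) for b in bearings_deg) / n
    north = sum(math.cos(math.radians(b)) for b in bearings_deg) / n
    mean_deg = math.degrees(math.atan2(east, north)) % 360.0
    return mean_deg, math.hypot(east, north)


print(bearing_to_uv(90.0))                         # approximately (1.0, 0.0): due east
print(circular_mean_and_resultant([350.0, 10.0]))  # mean near 0/360, not 180
```

A resultant length close to 1 indicates tightly aligned traffic, while a value close to 0 indicates no preferred direction.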

In [111]:
fig, ax = plt.subplots(
    1, 1, figsize=(11, 11), subplot_kw=dict(projection=Projection('epsg:3857')),
)

ax.add_feature(countries())
ax.add_feature(lakes())
ax.add_feature(ocean())

flow = ax.quiver(tfc_mgdf['x'],
                 tfc_mgdf['y'],
                 tfc_mgdf['track_u'],
                 tfc_mgdf['track_v'],
                 tfc_mgdf['track_mean'],
                 scale_units=None,
                 cmap='cool')

aps = ax.scatter(airprox_gdf['x'], airprox_gdf['y'], c='r', marker='x')

ax.set_title('Mean traffic flow')
cb = fig.colorbar(flow)
cb.set_label('Mean traffic bearing')
ax.legend([aps], ['Airprox'])

# airports['EGHQ'].point.plot(ax)
# airports['EGHE'].point.plot(ax)
# airports['EGHC'].point.plot(ax)


In [112]:
from cartopy.crs import Projection
from traffic.drawing import countries, lakes, ocean
from traffic.data import airports

fig, ax = plt.subplots(
    1, 1, figsize=(11, 11), subplot_kw=dict(projection=Projection('epsg:3857')),
)

ax.add_feature(countries())
ax.add_feature(lakes())
ax.add_feature(ocean())

flow = ax.tricontourf(tfc_mgdf['x'],
                      tfc_mgdf['y'],
                      # tfc_gdf['track_u'],
                      # tfc_gdf['track_v'],
                      tfc_mgdf['track_mean'],
                      alpha=0.5,
                      cmap='inferno')

aps = ax.scatter(airprox_gdf['x'], airprox_gdf['y'], c='r', marker='x')

ax.set_title('Mean traffic flow')
cb = fig.colorbar(flow)
cb.set_label('Mean traffic bearing')
ax.legend([aps], ['Airprox'])

# airports['EGHQ'].point.plot(ax)
# airports['EGHE'].point.plot(ax)
# airports['EGHC'].point.plot(ax)


In [113]:
fig, ax = plt.subplots(
    1, 1, figsize=(11, 11), subplot_kw=dict(projection=Projection('epsg:3857')),
)

ax.add_feature(countries())
ax.add_feature(lakes())
ax.add_feature(ocean())

flow = ax.tricontourf(
    tfc_mgdf['x'],
    tfc_mgdf['y'],
    tfc_mgdf['altitude_mean'],
    alpha=0.5,
    cmap='inferno')

# aps = ax.scatter(airprox_gdf['x'], airprox_gdf['y'], c='r', marker='x')

ax.set_title('Mean altitude')
cb = fig.colorbar(flow)
cb.set_label('Mean altitude')
# ax.legend([aps], ['Airprox'])


## Testing hypotheses
All tests are carried out at the 5% significance level unless otherwise specified.

In [114]:
from scipy import stats as ss

sig_lvl = 0.05

# The CLT-based tests below need enough airprox samples to be meaningful
if airproxes_with_tfc.shape[0] < 30:
    raise ValueError('Insufficient samples for CLT!')
elif airproxes_with_tfc.shape[0] < 50:
    print('Low number of samples. Consider more airprox samples.')

### Track correlation
First, the correlation of direction variance with airprox location is tested. The mean of the per-cell standard deviation of track is computed for the cells without airproxes and compared with that of the cells where airproxes occurred:

In [115]:
print('Overall mean of stddev: ', non_airprox_tfc['track_std'].mean(), ' for ', len(non_airprox_tfc['track_std']),
      ' samples')
print('Airprox location mean of stddev: ', airproxes_with_tfc['track_std'].mean(), 'for ',
      len(airproxes_with_tfc['track_std']), ' samples')

Overall mean of stddev:  90.62413543410744  for  137  samples
Airprox location mean of stddev:  88.66543962695775 for  16  samples


In [116]:
F, p = ss.bartlett(non_airprox_tfc['track_std'], airproxes_with_tfc['track_std'])
print(f'Bartlett equal variance test gives score of {F} at a p-significance of {p}')
if p <= sig_lvl:
    print(f'The hypothesis is accepted (F={F}, p={p})')
else:
    print('Null hypothesis is accepted.')

Bartlett equal variance test gives score of 0.32156393744218986 at a p-significance of 0.5706692882690401
Null hypothesis is accepted.


### Density Correlation

The count of unique flights within a cell is used as a measure of traffic density.

The procedure otherwise follows the one above, with a one-way ANOVA comparing the group means in place of the Bartlett test.

In [117]:
print('Overall mean: ', non_airprox_tfc['flight_id_nunique'].mean(), ' for ', len(non_airprox_tfc['flight_id_nunique']),
      ' samples')
print('Airprox location mean: ', airproxes_with_tfc['flight_id_nunique'].mean(), 'for ',
      len(airproxes_with_tfc['flight_id_nunique']), ' samples')

Overall mean:  179.2189781021898  for  137  samples
Airprox location mean:  124.0625 for  16  samples


In [118]:
F, p = ss.f_oneway(non_airprox_tfc['flight_id_nunique'], airproxes_with_tfc['flight_id_nunique'])
print(f'One-Way ANOVA test gives F-score of {F} at a p-significance of {p}')
if p <= sig_lvl:
    print(f'The hypothesis is accepted (F={F}, p={p})')
else:
    print('Null hypothesis is accepted.')

One-Way ANOVA test gives F-score of 1.4844277015737615 at a p-significance of 0.224983231354477
Null hypothesis is accepted.


### Speed correlation

First, the mean traffic flow speed is compared between cells with and without airproxes.

In [119]:
print('Overall mean: ', np.array(non_airprox_tfc['groundspeed_mean']).mean(), ' for ',
      len(non_airprox_tfc['groundspeed_mean']),
      ' samples')
print('Airprox location mean: ', np.array(airproxes_with_tfc['groundspeed_mean']).mean(), 'for ',
      len(airproxes_with_tfc['groundspeed_mean']), ' samples')

Overall mean:  136.9  for  137  samples
Airprox location mean:  126.9 for  16  samples


In [120]:
F, p = ss.f_oneway(non_airprox_tfc['groundspeed_mean'], airproxes_with_tfc['groundspeed_mean'])
print(f'One-Way ANOVA test gives F-score of {F} at a p-significance of {p}')
if p <= sig_lvl:
    print(f'The hypothesis is accepted (F={F}, p={p})')
else:
    print('Null hypothesis is accepted.')

One-Way ANOVA test gives F-score of 2.0247388537652915 at a p-significance of 0.15681822524529967
Null hypothesis is accepted.


Next, the *spread* of traffic flow speeds is compared between cells with and without airproxes.

In [121]:
print('Overall mean of stddev: ', np.array(non_airprox_tfc['groundspeed_std']).mean(), ' for ',
      len(non_airprox_tfc['groundspeed_std']),
      ' samples')
print('Airprox location mean of stddev: ', np.array(airproxes_with_tfc['groundspeed_std']).mean(), 'for ',
      len(airproxes_with_tfc['groundspeed_std']), ' samples')

Overall mean of stddev:  37.533975855413445  for  137  samples
Airprox location mean of stddev:  41.439882197447524 for  16  samples


In [122]:
F, p = ss.bartlett(non_airprox_tfc['groundspeed_std'], airproxes_with_tfc['groundspeed_std'])
print(f'Bartlett equal variance test gives score of {F} at a p-significance of {p}')
if p <= sig_lvl:
    print(f'The hypothesis is accepted (F={F}, p={p})')
else:
    print('Null hypothesis is accepted.')

Bartlett equal variance test gives score of 1.2311940900286606 at a p-significance of 0.2671749416304806
Null hypothesis is accepted.
