## Purpose of this document

The purpose of this notebook is to discuss possible bugs in the linkage between regulations and airblocks. It shows why I think there might still be missing links in the database. The basic interlinking requirements are documented in the datAcron Interlinking Report of 31 January 2017 (UPRC, Giorgos Santipantakis, George Vouros, Christos Doulkeridis), p. 12.


#### Summary
We show that many regulation events (over 2000 of about 2700 in total) are not linked or not linkable to their respective airspaces in the triple store. In particular, the weather regulations that are important for the FM01 - FM03 scenarios are reduced from 177 to 18. 
A possible reason is that the regulations file is delivered by CRIDA, whereas the airspace definition file is delivered as a DDR file, so the airspaces or sectors may have different codes. For example, the sector LECBFMP, which is mentioned in one of the regulation raw data files, cannot be found anywhere in the airspaces, sectors or airblocks. 
This has serious consequences. If we cannot find the correct linkage, then:
 - we can conclude that the realization of scenarios FM01 and maybe FM02 is not feasible.
 - my bachelor's thesis will conclude that it is not possible to learn from the data provided, as the required links cannot be established.
 
I hope we can analyze and discuss this situation as soon as possible. 


#### Table of contents
  1. Prior work: the query recommended by Giorgos
  2. Analysis of Giorgos' query
  3. Comparison to a simpler query
  4. Comparison to another query (data transformation)
  5. Comparison to the raw data
  6. Conclusion


## 1. Prior Work: the query recommended by Giorgos

According to Giorgos' mail of 8 July, the following query should show the graph path between regulations and airblocks. 
In my mail of 12 July, I promised Giorgos to check this query thoroughly. Here we go.

We will see that Giorgos' query does return some results. But the results are incomplete, and the links of the most important regulations (weather regulations) are almost completely missing!

In [1]:
import os.path
import numpy as np
import matplotlib as mp
import matplotlib.pyplot as plt
import pandas as pd
import json
import cesiumpy
import random
from geomet import wkt
from pandas.io.json import json_normalize, read_json
from SPARQLWrapper import SPARQLWrapper, JSON, XML, RDF
from datetime import datetime
from IPython.display import HTML

#Set some parameters for nicer visualizations
pd.set_option('display.expand_frame_repr', False) #do not wrap the printout of Pandas DataFrames
pd.set_option('display.precision', 2)
mp.rcParams['figure.figsize'] = (15, 9)
# keep matplotlib's default style (mp.pyplot.style.use is a function and must not be assigned to)


# initialize my connection module which allows connecting to both datAcron graph databases
from datacron_connector import TripleStoreConnector
ts107 = TripleStoreConnector(0)
ts109 = TripleStoreConnector(1)

#some technical comments
# PREFIX bif: <java:datAcronTester.unipi.gr.sparql_functions.>   <--- only to be used in 109

In [3]:
qry = """
PREFIX : <http://www.datacron-project.eu/datAcron#>
PREFIX dul: <http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX myfn: <java:datAcronTester.unipi.gr.sparql_functions.>

SELECT ?regulation ?type ?airspace (myfn:getGeom(?g) as ?WKT) WHERE {
  ?regulation a ?type . 
  ?type       rdfs:subClassOf :FM_Regulation .
  ?regulation dul:hasRegion ?airspace .
  ?airspace   dul:hasPart ?sector .
  ?sector     dul:hasPart ?block .
  ?block      :hasGeometry ?g
}
"""
 #?s rdf:type/rdfs:subClassOf* :SpatiotemporalRegion 

df = ts109.query(qry)
df = ts109.clean(df)
df.describe()

Unnamed: 0,regulation,type,airspace,WKT
count,11938,11938,11938,11938
unique,696,11,171,1215
top,LGSTD506_411,ATC_Routing,Airspace_LIRRALL_411,"POLYGON ((-5.5 47.25, -4.9975 47.5, -3.5227777..."
freq,424,3380,848,105


In [4]:
df.head(5)

Unnamed: 0,regulation,type,airspace,WKT
0,LFLBA11_411,ATC_AerodromeCapacityRegulation,Airspace_LFLBTMA_411,"POLYGON ((5.59916666666667 45.9711111111111, 5..."
1,LFLBA11_411,ATC_AerodromeCapacityRegulation,Airspace_LFLBTMA_411,"POLYGON ((5.9175 45.6375, 5.93083333333333 45...."
2,LFLBA11_411,ATC_AerodromeCapacityRegulation,Airspace_LFLBTMA_411,"POLYGON ((5.53333333333333 45.6833333333333, 5..."
3,LFLBA11_411,ATC_AerodromeCapacityRegulation,Airspace_LFLBTMA_411,"POLYGON ((5.94527777777778 45.9302777777778, 5..."
4,LFLBA11_411,ATC_AerodromeCapacityRegulation,Airspace_LFLBTMA_411,"POLYGON ((5.96 45.9519444444444, 6.05916666666..."





## 2. Analysis of the results of Giorgos' query

As we can see, this unrestricted query returns 696 unique regulations for 171 unique airspaces. The extract above suggests that regulations for the whole European airspace are stored in the database, because "LFLB..." codes stand for airspaces and airports in France. How many ATC_WeatherRegulations do we have in this result?

In [7]:
weather_rows = (df['type'] == 'ATC_WeatherRegulation')   #create a boolean mask and filter for Weather Regulations
df_wx = df[weather_rows]                                 #apply boolean mask
df_wx.describe()

Unnamed: 0,regulation,type,airspace,WKT
count,229,229,229,229
unique,18,1,10,99
top,EGSAJ15_411,ATC_WeatherRegulation,Airspace_EGTTSAJ_411,"POLYGON ((17.8666666666667 59.3969444444444, 1..."
freq,44,229,88,14


In [8]:
df_wx['regulation'].unique()

array(['ESSAE07_411', 'ESSAE07A_411', 'ESSAA08_411', 'KCHI1K15_411',
       'EDGAOS12_411', 'EGSAJ15_411', 'GCINB17_411', 'GCINB18_411',
       'ESSAT19A_411', 'LET1E121_411', 'ESSAR21A_411', 'ESSAT22_411',
       'LEBP1I23_411', 'ESSAE25_411', 'EGSAJ26_411', 'EGPNE26_411',
       'EGTLA26_411', 'EDGPAD27_411'], dtype=object)

We can see that only 18 (!) weather regulations have been discovered by Giorgos' query, of which only two (!) are relevant for the Spanish airspace. We will now compare this to a query that ignores airblock linkage and to the raw data.
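The number of Spain-related rows can also be checked programmatically instead of by reading the array. The following is only a minimal sketch: it assumes that Spanish airspaces can be recognised by the ICAO location prefixes `LE` (mainland) and `GC` (Canary Islands) at the start of the airspace code. The helper `filter_by_icao_prefix` and the sample rows are hypothetical stand-ins for `df_wx`, and the sample airspace names are invented for illustration.

```python
import pandas as pd

def filter_by_icao_prefix(df, prefixes=("LE", "GC"), column="airspace"):
    """Keep rows whose airspace code starts with one of the two-letter ICAO prefixes."""
    # Drop the leading "Airspace_" so that the ICAO code comes first
    codes = df[column].str.replace("Airspace_", "", regex=False)
    return df[codes.str[:2].isin(list(prefixes))]

# Hypothetical sample rows in the shape of df_wx (airspace names are made up)
sample = pd.DataFrame({
    "regulation": ["GCINB17_411", "EGSAJ15_411", "LEBP1I23_411"],
    "airspace": ["Airspace_GCCCINB_411", "Airspace_EGTTSAJ_411", "Airspace_LECBP1I_411"],
})

spanish = filter_by_icao_prefix(sample)
print(spanish["regulation"].unique())
```

On the real data this would be called as `filter_by_icao_prefix(df_wx)`. Whether it reproduces the manual count depends on how the airspaces are actually named in the triple store.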





## 3. Comparison to a simpler query

Will a simpler query that searches only for regulations and not for linked airblocks return different results? If yes, then the linkage to airspaces or airblocks in the triple store is incomplete, buggy or needs further clarification.

In [10]:
qry = """
PREFIX : <http://www.datacron-project.eu/datAcron#>
PREFIX dul: <http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX myfn: <java:datAcronTester.unipi.gr.sparql_functions.>

SELECT ?regulation ?type ?airspace 
WHERE {
  ?regulation a ?type . 
  ?type       rdfs:subClassOf :FM_Regulation .
  ?regulation dul:hasRegion ?airspace .
}
"""
 #?s rdf:type/rdfs:subClassOf* :SpatiotemporalRegion 

df = ts109.query(qry)
df = ts109.clean(df)
df.describe()

Unnamed: 0,regulation,type,airspace
count,2704,2704,2704
unique,2704,14,590
top,LFPGA01M_411,ATC_Capacity,Airspace_LTBA_411
freq,1,641,77


The simpler query returns 2704 regulations, far more than the 696 unique regulations returned by the first query! This means that by requiring a "link" to the airblocks, we are losing more than 2000 regulations (2704 - 696 = 2008)! How many weather regulations did we lose?
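To see which regulations are lost and not only how many, the two result sets can be compared directly. This is a minimal sketch under the assumption that both query results are kept in separate DataFrames (in this notebook `df` is overwritten by each query, so the first result would have to be saved under another name). The names `df_all`, `df_linked` and `unlinked_regulations` are hypothetical, as are the sample rows.

```python
import pandas as pd

def unlinked_regulations(df_all, df_linked, key="regulation"):
    """Return the rows of df_all whose key never appears in df_linked."""
    linked_ids = set(df_linked[key].unique())
    return df_all[~df_all[key].isin(linked_ids)]

# Hypothetical stand-ins for the results of the simple and the linked query
df_all = pd.DataFrame({
    "regulation": ["R1", "R2", "R3", "R4"],
    "type": ["ATC_Capacity", "ATC_WeatherRegulation",
             "ATC_Routing", "ATC_WeatherRegulation"],
})
# The linked query returns one row per airblock, so regulations repeat
df_linked = pd.DataFrame({"regulation": ["R1", "R1", "R3"]})

missing = unlinked_regulations(df_all, df_linked)
print(missing["regulation"].tolist())
```

The resulting list of identifiers could then be sent to the data providers as concrete examples of regulations without a path to an airblock.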

In [12]:
weather_rows = (df['type'] == 'ATC_WeatherRegulation')
df_wx = df[weather_rows]
df_wx.describe()

Unnamed: 0,regulation,type,airspace
count,177,177,177
unique,177,1,68
top,GCXOA07_411,ATC_WeatherRegulation,Airspace_LSZH_411
freq,1,177,22


We get 177 weather regulations, far more than the 18 wx regulations that remained in the linked query above.
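The same comparison can be made for every regulation type at once, which shows whether the loss is specific to weather regulations. The sketch below assumes two DataFrames with `regulation` and `type` columns, as returned by the queries above; the function name `retention_by_type` and the sample data are hypothetical.

```python
import pandas as pd

def retention_by_type(df_all, df_linked):
    """Count unique regulations per type before and after requiring airblock links."""
    total = df_all.groupby("type")["regulation"].nunique().rename("total")
    linked = df_linked.groupby("type")["regulation"].nunique().rename("linked")
    # Types without any linked regulation get a count of 0
    out = pd.concat([total, linked], axis=1).fillna(0).astype(int)
    out["share_linked"] = out["linked"] / out["total"]
    return out

# Hypothetical sample data
df_all = pd.DataFrame({
    "regulation": ["R1", "R2", "R3", "R4", "R5"],
    "type": ["ATC_Capacity"] * 3 + ["ATC_WeatherRegulation"] * 2,
})
df_linked = pd.DataFrame({
    "regulation": ["R1", "R1", "R2"],
    "type": ["ATC_Capacity"] * 3,
})

summary = retention_by_type(df_all, df_linked)
print(summary)
```

For the weather regulations in this notebook, such a table would be expected to show 177 in the `total` column and 18 in the `linked` column.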

## 4. Comparison with a data transformation query provided by UNIPI

The following query lists all weather regulations and links them to the NOAA grib files. It is taken from the document "data_transformation sparql examples.pdf" that was provided by the University of Piraeus. How many wx regulations will this query discover?

In [17]:
qry = """
PREFIX : <http://www.datacron-project.eu/datAcron#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dul: <http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#>
PREFIX myfn: <java:datAcronTester.unipi.gr.sparql_functions.>
SELECT ?r (myfn:toNOAAurl(str(?ts)) as ?n) WHERE {
?r a :ATC_WeatherRegulation ; dul:hasTimeInterval ?t .
?t :TimeStart ?ts .
}
"""
 #?s rdf:type/rdfs:subClassOf* :SpatiotemporalRegion 

df = ts109.query(qry)
df = ts109.clean(df)
df.describe()

Unnamed: 0,r,n
count,177,177
unique,177,67
top,GCXOA07_411,ftp://nomads.ncdc.noaa.gov/GFS/Grid4/201604/20...
freq,1,13


This query finds 177 weather regulations, which is consistent with the number of wx regulations found by our query above.

## 5. Comparison with the raw data provided by CRIDA

In [15]:
df_crida = pd.read_csv('data/CRIDA-Regulations-20160401-20160430.csv', delimiter=';')
df_crida.head(3)

Unnamed: 0,dateReference,RegulationStart,RegulationId,TrafficVolumeSet,ReferenceLocation,ReferenceLocationType,TrafficVolume,RegulationEnd,RegulationActivity,RegulationCancelTime,RegulationDuration,AiracCycle,RegulationCategory,RegulationReasonCode,RegulationDescription
0,20160401,20160401 00:00:00,AR1ORT01,SCENAR,ORTIS,SP,AR1ORT,20160401 04:00:00,T,,240,411,T,R,
1,20160401,20160401 00:00:00,AR2RV01,SCENAR,SARAY,SP,AR2RV,20160401 04:00:00,T,,240,411,T,R,
2,20160401,20160401 00:00:00,BBDX01N,LFBBFMP,LFBBBDX,AS,LFBBDX,20160401 04:00:00,C,20160331 17:44:05,-375,411,T,I,


In [16]:
weather_rows = (df_crida['RegulationReasonCode'] == 'W')
df_wx = df_crida[weather_rows]
df_wx.describe()

Unnamed: 0,dateReference,RegulationDuration,AiracCycle
count,173.0,173.0,173.0
mean,20200000.0,121.84,411.1
std,8.79,155.87,0.31
min,20200000.0,-824.0,411.0
25%,20200000.0,70.0,411.0
50%,20200000.0,108.0,411.0
75%,20200000.0,160.0,411.0
max,20200000.0,1040.0,412.0


The raw data yields 173 wx regulations; the other four are hidden due to delimiter problems in the CSV import. In total, the raw data therefore also contains 177 weather regulations between 1 April 2016 and 30 April 2016.
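Rows affected by delimiter problems can be located by counting the fields of each record. The sketch below assumes that one plausible cause is an extra `;` inside a free-text field such as `RegulationDescription`; the actual cause in the CRIDA file has not been verified here. The function name and the sample text are hypothetical.

```python
import csv
import io

def rows_with_unexpected_field_count(text, delimiter=";"):
    """Return the header width and a list of (line number, field count) for odd rows."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    bad = []
    for row in reader:
        if len(row) != len(header):
            # reader.line_num is the number of the source line just read
            bad.append((reader.line_num, len(row)))
    return len(header), bad

# Hypothetical sample with one row that contains an extra ';' in the free text
sample = (
    "RegulationId;RegulationReasonCode;RegulationDescription\n"
    "AR1ORT01;R;\n"
    "XXWX01;W;low visibility; fog\n"
)

expected, bad = rows_with_unexpected_field_count(sample)
print(expected, bad)
```

For the real file, the text could be read with `open(path, newline='').read()` and passed to the same function.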

## 6. Conclusion

We have seen that many regulation events (over 2000) are not linked or not linkable to their respective airspaces. In particular, the weather regulations that are important for the FM01 - FM03 scenarios are reduced from 177 to 18. 

A possible reason is that the regulations file is delivered by CRIDA, whereas the airspace definition file is delivered as a DDR file, so the airspaces or sectors may have different codes. For example, the sector LECBFMP, which is mentioned in one of the regulations, cannot be found anywhere in the airspaces, sectors or airblocks. 
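The suspected code mismatch can be tested by comparing the codes used in the regulations with the codes known in the airspace data. This is a minimal sketch: it assumes that airspace names follow the pattern `Airspace_<code>_<AIRAC>` seen in the query results above, and that the comparable code sits in one of the CRIDA columns (which column is an assumption). The helper `airspace_code` and the sample lists are hypothetical.

```python
import pandas as pd

def airspace_code(local_name):
    """'Airspace_LFLBTMA_411' -> 'LFLBTMA' (assumes the pattern Airspace_<code>_<AIRAC>)."""
    parts = local_name.split("_")
    return "_".join(parts[1:-1])

# Airspace names as they appear in the query results above
known_airspaces = ["Airspace_LFLBTMA_411", "Airspace_LIRRALL_411", "Airspace_EGTTSAJ_411"]
known_codes = {airspace_code(name) for name in known_airspaces}

# Hypothetical codes taken from a CRIDA column
crida_codes = pd.Series(["LFLBTMA", "LECBFMP", "EGTTSAJ"])

not_found = crida_codes[~crida_codes.isin(known_codes)]
print(not_found.tolist())
```

Running this against the complete lists of airspaces, sectors and airblocks would show how many regulation codes have no counterpart at all, as opposed to a counterpart without geometry.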

This has serious consequences. If we cannot find the correct linkage, then:
 - we can conclude that the realization of scenarios FM01 - FM03 is not feasible
 - my bachelor's thesis will conclude that it is not possible to learn from the data provided, as the required links cannot be established.
 
I hope we can analyze and discuss this situation as soon as possible. 


Brgds
Jörg

![The datAcron ontology](images/FM01%20regulations%20detection%20and%20prediction.png)



