# Regulation Inspection

The goal is to be able to forecast regulations, based on raw data like WX conditions or traffic density. 

My focus is on WX regulations.
According to the document "CRIDA-Regulations-20160401-20160430.csv", regulations have the following properties:

| dateReference | RegulationStart   | RegulationId | TrafficVolumeSet | ReferenceLocation | ReferenceLocationType | TrafficVolume | RegulationEnd     | RegulationActivity | RegulationCancelTime | RegulationDuration | AiracCycle | RegulationCategory | RegulationReasonCode | RegulationDescription | 
|---------------|-------------------|--------------|------------------|-------------------|-----------------------|---------------|-------------------|--------------------|----------------------|--------------------|------------|--------------------|----------------------|-----------------------| 
| 20160401      | 20160401 04:40:00 | EDDFA01      | EDGGFMP1         | EDDF              | AD                    | EDDFAWX       | 20160401 07:20:00 | C                  | 20160401 05:52:57    | 72                 | 411        | T                  | W                    | STRONG WINDS          | 
| 20160401      | 20160401 06:00:00 | LEBLA01M     | LECBFMP          | LEBL              | AD                    | LEBLARR       | 20160401 13:00:00 | C                  | 20160401 11:41:42    | 341                | 411        | T                  | W                    | CB + STORM            | 
| 20160401      | 20160401 08:40:00 | LECVN01M     | LECBFMP          | LECBCVN           | AS                    | LECBCVN1      | 20160401 11:20:00 | C                  | 20160401 08:16:03    | -23                | 411        | T                  | C                    |                       | 



Acc. "DatAcron Data sources for Flow Management scenarios.pptx", Traffic Volume is the sector name.

These are three typical examples for WX-caused regulations. It is important to note the column "referenceLocation" which indicates us that the critical weather, that caused the regulation, is not co-located with the regulated sector. Note also the "referenceLocationType", which denotes if the area of interest is an airport or an airspace.

The second important column is the "RegulationCancelTime", which denotes that a regulation may be cancelled if, for example, the actual weather turns out to be better than expected, or the network resilience is good enough to cope with demand exceeding capacity. The third regulation was even cancelled before it begun, yielding a negative "regulation time".



With the following data exploration, I hope to pull these important properties of the regulation out of the data store. 


In [1]:
import pandas as pd
from pandas.io.json import json_normalize, read_json
from SPARQLWrapper import SPARQLWrapper, JSON, XML, RDF

#import my connection module which allows to connect to the datAcron rdf store
from datacron_connector import TripleStoreConnector

ts107 = TripleStoreConnector(0)
ts109 = TripleStoreConnector(1)

Let's check what kind of regulations are stored in the triple store.

## Get all airblocks of spain

First, I will get all airblocks, together with their airspaces and super-airspaces.

In [100]:
qry = """
PREFIX : <http://www.datacron-project.eu/datAcron#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dul: <http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#>
PREFIX myfn: <java:datAcronTester.unipi.gr.sparql_functions.>

SELECT  ?bigairspace ?airspace ?block ?wkt ?lowerlevel ?upperlevel 
WHERE { 
   
   ?airspace dul:hasPart       ?block .
   ?block       :hasLowerLevel ?lowerlevel ;
                :hasUpperLevel ?upperlevel ;
                :hasGeometry   ?geom.
   ?geom        :hasMBR_WKT  ?wkt .
   
   OPTIONAL {?bigairspace dul:hasPart ?airspace.}.
   
   FILTER regex(str(?block), 'Airblock_LE', "i")
}
"""

df2 = ts109.query(qry)
df2 = ts109.clean(df2)
%time df2 = df2.sort_values('block')

df2.to_csv('allspanishairblocks.csv')
df2.head(10)


CPU times: user 12 ms, sys: 0 ns, total: 12 ms
Wall time: 106 ms


Unnamed: 0,bigairspace,airspace,block,wkt,lowerlevel,upperlevel
0,Airspace_LETOT_411,Airspace_LEABTA_411,Airblock_LEABTA_249LE,"POLYGON ((-2.99166666666667 39, -2.99166666666...",0.0,7467.6
1,Airspace_LEABTMA_411,Airspace_LEABTA_411,Airblock_LEABTA_249LE,"POLYGON ((-2.99166666666667 39, -2.99166666666...",0.0,7467.6
2,Airspace_LETOT_411,Airspace_LEAMTA_411,Airblock_LEAMTA_650LE,"POLYGON ((-2.84083333333333 36.4166666666667, ...",0.0,4419.6
3,Airspace_LEAMTMA_411,Airspace_LEAMTA_411,Airblock_LEAMTA_650LE,"POLYGON ((-2.84083333333333 36.4166666666667, ...",0.0,4419.6
4,Airspace_LETOT_411,Airspace_LEAMTA_411,Airblock_LEAMTA_651LE,"POLYGON ((-1.83333333333333 36.85, -1.83333333...",0.0,4419.6
5,Airspace_LEAMTMA_411,Airspace_LEAMTA_411,Airblock_LEAMTA_651LE,"POLYGON ((-1.83333333333333 36.85, -1.83333333...",0.0,4419.6
6,Airspace_LETOT_411,Airspace_LEAMTA_411,Airblock_LEAMTA_652LE,"POLYGON ((-1.86111111111111 36.7138888888889, ...",0.0,4419.6
7,Airspace_LEAMTMA_411,Airspace_LEAMTA_411,Airblock_LEAMTA_652LE,"POLYGON ((-1.86111111111111 36.7138888888889, ...",0.0,4419.6
9,Airspace_LEAMTMA_411,Airspace_LEAMTA_411,Airblock_LEAMTA_653LE,"POLYGON ((-1.92361111111111 36.4530555555556, ...",0.0,4419.6
8,Airspace_LETOT_411,Airspace_LEAMTA_411,Airblock_LEAMTA_653LE,"POLYGON ((-1.92361111111111 36.4530555555556, ...",0.0,4419.6


## Get all information about the three regulations above

First, I will check what kind of regulations are available and then, I will check what information is available on the three regulations from the example in the introduction.

In [8]:
qry = """
PREFIX : <http://www.datacron-project.eu/datAcron#>
PREFIX dul: <http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#>

SELECT ?s
WHERE {
  ?s rdfs:subClassOf* :FM_Regulation .
}
"""
 #?s rdf:type/rdfs:subClassOf* :SpatiotemporalRegion 

df = ts107.query(qry)
df = ts107.clean(df)
df.head(10)



Unnamed: 0,s
0,FM_Regulation
1,ATC_WeatherAlternateRegulation
2,ATC_SecurityRegulation
3,ATC_RestrictionWeatherAtDestinationRegulation
4,ATC_RestrictionStaffShortageRegulation
5,ATC_RestrictionRegulation
6,ATC_RestrictionAtDestinationRegulation
7,ATC_RestrictionAtDepartureRegulation
8,ATC_OtherRegulationAtDestination
9,ATC_ImmigrationCustomsHealthRegulation


What properties are available for the WX regulations? I hope to see some of the columns listed in the introduction.

In [13]:
qry = """
PREFIX : <http://www.datacron-project.eu/datAcron#>
PREFIX dul: <http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#>

SELECT DISTINCT ?p
WHERE {
  ?s a :ATC_WeatherRegulation ; 
     ?p ?o .
}
"""
df = ts107.query(qry)
df = ts107.clean(df)

df

Unnamed: 0,p
0,type
1,RegulationAiracCycle
2,RegulationDescription
3,hasRegion
4,hasParticipant
5,hasTimeInterval


PROBLEM

It seems that at least the 

 - reference location
 - reference location type
 - and the cancellation time 
 
are not available. Maybe hidden somewhere? I will inspect further. In the next query, I inspect all properties of the three regulations cited above.

In [117]:
qry = """
PREFIX : <http://www.datacron-project.eu/datAcron#>
PREFIX dul: <http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#>

SELECT ?s ?stype ?sector ?time ?participant ?description ?start ?end
WHERE {
  VALUES ?s {:EDDFA01_411 :LEBLA01M_411 :LECVN01M_411} 
  ?s    :hasRegion             ?sector ;
     dul:hasParticipant        ?participant ;
        :RegulationDescription ?description ;
     dul:hasTimeInterval       ?time  ;
         a                     ?stype.
  
  ?time :TimeStart ?start ;
        :TimeEnd   ?end .      
}
"""

reguls = ts107.query(qry)
reguls = ts107.clean(reguls)
reguls.head(20)




Unnamed: 0,s,stype,sector,time,participant,description,start,end
0,EDDFA01_411,ATC_WeatherRegulation,Sector_EDGGFMP1,intv_1459485600000_1459485600000,ATC_EDDF,STRONG WINDS,2016-04-01T04:40:00,2016-04-01T04:40:00
1,LEBLA01M_411,ATC_WeatherRegulation,Sector_LECBFMP,intv_1459490400000_1459490400000,ATC_LEBL,CB + STORM,2016-04-01T06:00:00,2016-04-01T06:00:00
2,LECVN01M_411,ATC_Capacity,Sector_LECBFMP,intv_1459500000000_1459500000000,ATC_LECBCVN,,2016-04-01T08:40:00,2016-04-01T08:40:00


RESULT TECHNICAL

Unfortunately, neither reference location type nor cancellation time are encoded anywhere. But these would be necessary. The _reference location_ seems to be encoded in the _particpant_ property. Another interesting entity is the Sector_xyz, which I have not seen anwhere else.

NOTE

The 109 yields no results with the query above.

Lets inspect the _participant_ and the _sector_ objects further. It will be important to be able to get their coordinates, because at _particpant_, is the place where the WX regulation was caused. And at _sector_, the coordinates are important to be able to intersect them with the trajectories. 

The following query asks if the participants are ?subjects or ?objects in any other triple in the data store.
The query thereafter asks if the sectors are ?subjects or ?objects in any other triple in the data store.


In [119]:
qry = """PREFIX : <http://www.datacron-project.eu/datAcron#>
PREFIX dul: <http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#>"""

qry1 = qry + ' SELECT ?s ?p WHERE {?s ?p :ATC_LECBCVN }'
qry2 = qry + ' SELECT ?p ?o WHERE {:ATC_LECBCVN ?p ?o }'
p1 = ts107.clean(ts107.query(qry1))
p2 = ts107.clean(ts107.query(qry2))

print(p1)
print(' ')
print(p2)

              s               p
0  LECVN01M_411  hasParticipant
 
Empty DataFrame
Columns: [p, o]
Index: []


In [123]:


qry = """PREFIX : <http://www.datacron-project.eu/datAcron#>
PREFIX dul: <http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#>"""

qry1 = qry + """ SELECT ?s ?type ?p 
                 WHERE {?s ?p :Sector_LECBFMP;
                           a ?type .
                 }"""

qry2 = qry + ' SELECT ?p ?o WHERE {:Sector_LECBFMP ?p ?o }'
p1 = ts107.clean(ts107.query(qry1))
p2 = ts107.clean(ts107.query(qry2))
p1.to_csv('data/sectorrefs.csv')
print(p1.head(5))
print(' ')
print(p2)

              s                              type          p
0  LECVN01M_411                      ATC_Capacity  hasRegion
1  LEBLA03A_411   ATC_AerodromeCapacityRegulation  hasRegion
2  LEMNI103_411                      ATC_Capacity  hasRegion
3  LEBLA03L_411  ATC_EnvironmentalIssueRegulation  hasRegion
4  LEBP1U04_411                      ATC_Capacity  hasRegion
 
Empty DataFrame
Columns: [p, o]
Index: []


Conclusion: except for the regulation triples, no information about the _participant_ objects and the _sector_ Objects is available (tried with all three). Maybe something wrong with the query?


WHAT IS OK

It seems like the "referenceLocation" is coded as "hasParticipant". So, while I do not 100% agree if "hasParticipant" is the correct property for real-life semantics, at least the data is there.


PROBLEMS

As we can see, some data that was available in the raw data is missing.
 - start and end times are the same, what went wrong?
 - is the "hasParticipant" property really the reference location?
 - if yes, unfortunately, as seen in the query above, "participant" seems like a dead end. Except for appearing in the context of a regulation, there are no other triples. I would have expected to get a link to an airport (if ref loc is AD type) or to an airspace (if ref loc is an airspace). In this case, I would have expected to get the link to the airports LEBL and EDDF. Please note also, that the system has added ATC\_ in front of the reference location. In my eyes, this changes semantics, because it does not consider the "reference location type", which can be an airspace _or_ an aerodrome. And, just to remember, the EDDF and the LEBL example regulations were _aerodrome_ regulations.
 - the notion of Sector_xyz seems to be unknown for the rest of the database. This also looks like I am hitting a dead end, because I can therefore not link the regulations to airblocks (and their coordinates).

Somehow, I need to get from regulations to airblocks.

Maybe something wrong with my queries?




#### Another fun thing

The query below explodes if I try to add the subquery for the times. Then, I get incoherent results (multiple times for the same regulation ID, negative durations ...) Maybe a bug in the processing of the data?

In [134]:
qry = """
PREFIX : <http://www.datacron-project.eu/datAcron#>
PREFIX dul: <http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#>

SELECT ?s ?sector ?time ?participant ?description ?start ?end ?duration
WHERE {
  VALUES ?s {:EDDFA01_411 :LEBLA01M_411 :LECVN01M_411 } 
  ?s    :hasRegion             ?sector ;
     dul:hasParticipant        ?participant ;
        :RegulationDescription ?description ;
     dul:hasTimeInterval       ?time .
   
   ?time :TimeStart ?start ;
        :TimeEnd   ?end ; 
        :duration ?duration .

        
}
""" 

df = ts107.query(qry)
df = ts107.clean(df)
df.head(20)




Unnamed: 0,s,sector,time,participant,description,start,end,duration
0,EDDFA01_411,Sector_EDGGFMP1,intv_1459485600000_1459485600000,ATC_EDDF,STRONG WINDS,2016-04-01T04:40:00,2016-04-01T04:40:00,72
1,LEBLA01M_411,Sector_LECBFMP,intv_1459490400000_1459490400000,ATC_LEBL,CB + STORM,2016-04-01T06:00:00,2016-04-01T06:00:00,-50
2,LEBLA01M_411,Sector_LECBFMP,intv_1459490400000_1459490400000,ATC_LEBL,CB + STORM,2016-04-01T06:00:00,2016-04-01T06:00:00,44
3,LEBLA01M_411,Sector_LECBFMP,intv_1459490400000_1459490400000,ATC_LEBL,CB + STORM,2016-04-01T06:00:00,2016-04-01T06:00:00,341
4,LEBLA01M_411,Sector_LECBFMP,intv_1459490400000_1459490400000,ATC_LEBL,CB + STORM,2016-04-01T06:00:00,2016-04-01T06:00:00,-35
5,LEBLA01M_411,Sector_LECBFMP,intv_1459490400000_1459490400000,ATC_LEBL,CB + STORM,2016-04-01T06:00:00,2016-04-01T06:00:00,80
6,LEBLA01M_411,Sector_LECBFMP,intv_1459490400000_1459490400000,ATC_LEBL,CB + STORM,2016-04-01T06:00:00,2016-04-01T06:00:00,870
7,LEBLA01M_411,Sector_LECBFMP,intv_1459490400000_1459490400000,ATC_LEBL,CB + STORM,2016-04-01T06:00:00,2016-04-01T06:00:00,36
8,LEBLA01M_411,Sector_LECBFMP,intv_1459490400000_1459490400000,ATC_LEBL,CB + STORM,2016-04-01T06:00:00,2016-04-01T06:00:00,99
9,LECVN01M_411,Sector_LECBFMP,intv_1459500000000_1459500000000,ATC_LECBCVN,,2016-04-01T08:40:00,2016-04-01T08:40:00,80


# Weather Analysis (related to page 5 of tutorial) 

I think I will need some help here too!

Though we have seen that Sectors and Participants cannot be linked with other objects, I can infer from the semantics of the regulations that the airports EDDF and LEBL and the airspace LECB must have had severe wx at that time. 

Lets try to pull it out.

To my knowledge, the wx is written to the nodes, so I need to find the trajectories which have a spatio-temporal intersection with the wx event at the sector. For this, I begin with the basic intersection query from the tutorial, page 5.

In [144]:
# PREFIX bif: <java:datAcronTester.unipi.gr.sparql_functions.>   <--- only to be used in 109

qry = """
PREFIX : <http://www.datacron-project.eu/datAcron#>
PREFIX dul: <http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?vessel ?time ?wkt ?speed  
WHERE {
?s1   a :Node ;
        :ofMovingObject ?vessel ;
        :hasSpeed ?speed ;
     dul:hasConstituent/:TimeStart ?time ;
     dul:hasConstituent/:hasWKT ?wkt .

FILTER(bif:st_distance(bif:st_geomfromtext ("POINT(13.139045 44.466133)"),bif:st_geomfromtext(?wkt))<=5)
}
"""

df = ts107.query(qry)
df = ts107.clean(df)
df.head(20)




EndPointInternalError: EndPointInternalError: endpoint returned code 500 and response. 

Response:
b'Virtuoso S1T00 Error SR171: Transaction timed out\n\nSPARQL query:\ndefine sql:big-data-const 0 \n#output-format:application/sparql-results+json\n\nPREFIX : <http://www.datacron-project.eu/datAcron#>\nPREFIX dul: <http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#>\nPREFIX xsd: <http://www.w3.org/2001/XMLSchema#>\n\nSELECT ?vessel ?time ?wkt ?speed  \nWHERE {\n?s1   a :Node ;\n        :ofMovingObject ?vessel ;\n        :hasSpeed ?speed ;\n     dul:hasConstituent/:TimeStart ?time ;\n     dul:hasConstituent/:hasWKT ?wkt .\n\nFILTER(bif:st_distance(bif:st_geomfromtext ("POINT(13.139045 44.466133)"),bif:st_geomfromtext(?wkt))<=5)\n}\n'

PROBLEMS
 - I get a timeout on the 107 store
 - I get zero results on the 109 store
 - note: on the 107 store, the bif: prefix must not be used, on the 109 store, it must be used. Therefore, you cannot query both stores with the same string atm.
 
 
Back to EDDF and LEBL and LECB. Our goal is to get some weather for these places at regulation time. 

As a first step, we get the geometries of the _reference locations_.

In [137]:
qry = """
PREFIX : <http://www.datacron-project.eu/datAcron#>
PREFIX dul: <http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#>

SELECT DISTINCT ?airport ?iata ?placename ?geo ?o
WHERE {
  VALUES ?airport {'EDDF' 'LEBL' 'LECB'}
  ?s :hasICAOcode ?airport ;
     :hasIATAcode ?iata ;
     :hasPlaceName ?placename ;
     :hasGeometry ?geo .
  ?geo :hasMBR_WKT ?o .
     
}
"""

reflocs_manually = ts107.query(qry)
reflocs_manually = ts107.clean(reflocs_manually)
reflocs_manually.head(20)

Unnamed: 0,airport,iata,placename,geo,o
0,LEBL,BCN,BARCELONA/EL PRAT,geom_2_0783333778381348_41_29694366455078,POINT (2.0783333778381348 41.29694366455078)
1,EDDF,FRA,FRANKFURT MAIN,geom_8_570555686950684_50_03333282470703,POINT (8.570555686950684 50.03333282470703)


INFO

For the above query, once again, the 109 yields no results.


Next step: lets get time and position of Frankfurt airport and try to find some spatio-temporal intersecting trajectories.

In [139]:
time = reguls.iloc[0]['start']     #get the time of EDDF regulation
rpoint = reflocs_manually.iloc[0]['o'] #get the coordinates of Frankfurt airport (manually)

print(time)
print(rpoint)

2016-04-01T04:40:00
POINT (2.0783333778381348 41.29694366455078)


In the next query, I have to enter the times manually, because (as we have seen) the end time of a regulation is not available in the system. AFAIK, the nodes themselves only have start times, as they only represent a point in time.

In [145]:

qry = """
PREFIX : <http://www.datacron-project.eu/datAcron#>
PREFIX dul: <http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX bif: <java:datAcronTester.unipi.gr.sparql_functions.>

SELECT ?vessel ?time ?wkt ?speed 
WHERE {
    ?s1     a              :Node ;
           :ofMovingObject ?vessel ;
           :hasSpeed       ?speed ;
        dul:hasConstituent/:TimeStart ?time ;
        dul:hasConstituent/:hasWKT ?wkt .
        
    FILTER(bif:st_distance( bif:st_geomfromtext ('""" + rpoint + """'), bif:st_geomfromtext(?wkt)) <= 500 &&
    xsd:dateTime(substr(?time,1,19))<xsd:dateTime("2016-04-01 04:40:00") && 
    xsd:dateTime(substr(?time,1,19))>xsd:dateTime("2016-04-01 06:52:00"))
}
"""

df = ts109.query(qry)
df = ts109.clean(df)
df.head(20)





Unnamed: 0,vessel,time,wkt,speed


PROBLEM

 - According to this query, no trajectory intersects with EDDF.... I think this must be a query problem.
 
 
 
Lets go on and check if wx conditions are available in the system.

In [4]:
qry = """
PREFIX : <http://www.datacron-project.eu/datAcron#>
PREFIX dul: <http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX sw: <http://sweet.jpl.nasa.gov>

SELECT ?s ?w
WHERE {
    ?s     :hasWeatherCondition ?w .
}
"""

df = ts107.query(qry)
df = ts107.clean(df)
df.head(5)




Unnamed: 0,s,w
0,node_0_1453961518000_15.896575_43.7204233333333,weather_43.7204233333333_15.896575_0.0_1453971600
1,node_100000001_1453961006000_3.05081333333333_...,weather_36.7109833333333_3.05081333333333_0.0_...
2,node_104938_1453961970000_2.35546_51.042906666...,weather_51.0429066666667_2.35546_0.0_1453971600
3,node_10772765_1453961146000_4.71531666666667_5...,weather_51.828_4.71531666666667_0.0_1453971600
4,node_10772765_1453961746000_4.71518833333333_5...,weather_51.8279616666667_4.71518833333333_0.0_...


LITTLE PROBLEM

Only in the 107 data store, I get results.

In [5]:
qry = """
PREFIX : <http://www.datacron-project.eu/datAcron#>
PREFIX dul: <http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX sw: <http://sweet.jpl.nasa.gov>

SELECT ?p ?o
WHERE {
   :weather_43.7204233333333_15.896575_0.0_1453971600   ?p ?o
}
"""

df = ts107.query(qry)
df = ts107.clean(df)
df.head(5)




Unnamed: 0,p,o
0,type,WeatherCondition
1,reportedDewPoint,14.859985
2,reportedMaxTemperature,14.799988
3,reportedMinTemperature,14.799988
4,reportedPressure,101688.46
