# Constructing a feature vector

In notebook 5.1, we reconstructed the airspace openings and sliced them into occurrences of one hour. Additionally, one auxiliary feature was prepared: the general traffic demand/capacity ratio. The goal of this notebook is to add the weather information (features) and the regulation information (label) to the DataFrame. In the end, we want a large matrix in which every row represents one observation shorter than one hour, together with the weather observed at that time. The general construction therefore follows the approach of Andrienko and Andrienko:

TODO: show the split into referrers and features here
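Notebook 5.1 is not reproduced here. Below is a minimal sketch of the hourly slicing step. It assumes that each opening is cut at full-hour boundaries and that each piece ends one second before the next full hour, as in the `start`/`end` columns of open_airblocks_hourly.csv (e.g. 02:00:00 to 02:59:59). The helper `slice_hourly` is hypothetical. It is not the code used in notebook 5.1.

```python
import pandas as pd


def slice_hourly(start, end):
    """Split the opening [start, end] into pieces that never cross a full hour.

    Hypothetical helper: each piece ends one second before the next full hour
    (or at `end`), mirroring the format seen in open_airblocks_hourly.csv.
    """
    start, end = pd.Timestamp(start), pd.Timestamp(end)
    pieces = []
    cur = start
    while cur <= end:
        # first full hour strictly after `cur`
        next_hour = cur.replace(minute=0, second=0, microsecond=0) + pd.Timedelta(hours=1)
        piece_end = min(next_hour - pd.Timedelta(seconds=1), end)
        pieces.append((cur, piece_end, piece_end - cur))
        cur = next_hour
    return pd.DataFrame(pieces, columns=['start', 'end', 'duration'])


print(slice_hourly('2016-04-01 02:30:00', '2016-04-01 04:15:00'))
```

Each resulting row is one candidate observation of the feature matrix. The features (demand/capacity ratio, weather) and the regulation label are then attached per row.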

## 0. Imports and preparation

In [None]:
import os.path
import numpy as np
import matplotlib as mp
import matplotlib.pyplot as plt
import pandas as pd
import json
import cesiumpy
import random
from geomet import wkt
from pandas.io.json import json_normalize, read_json
from SPARQLWrapper import SPARQLWrapper, JSON, XML, RDF
from datetime import datetime
from IPython.display import HTML

#Set some parameters for nicer visualizations
pd.set_option('display.expand_frame_repr', False) #do not wrap the printout of Pandas DataFrames
pd.set_option('display.precision', 2)
mp.rcParams['figure.figsize'] = (15, 9)


# initialize my connection module, which connects to both datAcron graph databases
from datacron_connector import TripleStoreConnector
ts107 = TripleStoreConnector(0)
ts109 = TripleStoreConnector(1)

#some technical comments
# PREFIX bif: <java:datAcronTester.unipi.gr.sparql_functions.>   <--- only to be used in 109

## 1. Get the open airblocks file from the last notebook

Load the open_airblocks_hourly.csv file that we created in notebook 5.1:

In [None]:
import ast

oblock_file = 'data/open_airblocks_hourly.csv'
if not os.path.isfile(oblock_file):
    # without this file dfob is undefined, so stop here instead of failing later
    raise FileNotFoundError(oblock_file + ' not found. Run notebook 5.1 first.')

# File found. Reload it and convert / assert data types:
print('File found. Reading from filestore...')
dfob = pd.read_csv(oblock_file, index_col=0)
print('JSON to dict conversion...')
dfob['actualJSON'] = dfob['actualJSON'].map(ast.literal_eval)  # convert string to dict

print('Datatype conversion...')
dfob['capacity'] = dfob['capacity'].apply(pd.to_numeric, errors='ignore', downcast='unsigned')
dfob['lowerlevel'] = dfob['lowerlevel'].apply(pd.to_numeric, errors='ignore', downcast='float')
dfob['upperlevel'] = dfob['upperlevel'].apply(pd.to_numeric, errors='ignore', downcast='float')
dfob['start'] = pd.to_datetime(dfob['start'])
dfob['end'] = pd.to_datetime(dfob['end'])
dfob['duration'] = pd.to_timedelta(dfob['duration'])

dfob.head(3)

In [3]:
len(dfob)

272369

## 2. Get regulation data

First, we check which types of regulations are available. Thereafter, we pull all regulations from the triple store and keep only the relevant regulations within the Spanish airspace.

In [4]:
qry = """
PREFIX : <http://www.datacron-project.eu/datAcron#>
PREFIX dul: <http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?s
WHERE {
  ?s rdfs:subClassOf* :FM_Regulation .
}
"""
#?s rdf:type/rdfs:subClassOf* :SpatiotemporalRegion

df = ts109.query(qry)
df = ts109.clean(df)
f = str(df['s'].tolist())
f

"['FM_Regulation', 'ATC_AccidentCausedRegulation', 'ATC_AerodromeCapacityRegulation', 'ATC_AirportFalicilitiesLimitationRegulation', 'ATC_AirspaceManagementRegulation', 'ATC_Capacity', 'ATC_DeIcingRegulation', 'ATC_EnvironmentalIssueRegulation', 'ATC_Equipment', 'ATC_ImmigrationCustomsHealthRegulation', 'ATC_IndustrialAction', 'ATC_OtherRegulation', 'ATC_OtherRegulationAtDestination', 'ATC_RestrictionAtDepartureRegulation', 'ATC_RestrictionAtDestinationRegulation', 'ATC_RestrictionRegulation', 'ATC_RestrictionStaffShortageRegulation', 'ATC_RestrictionWeatherAtDestinationRegulation', 'ATC_Routing', 'ATC_SecurityRegulation', 'ATC_SpecialEventRegulation', 'ATC_Staffin', 'ATC_WeatherAlternateRegulation', 'ATC_WeatherRegulation', 'NON_ATC_Equipment', 'NON_ATC_IndustrialAction']"

The above list shows all possible types of regulations. Not all of them are relevant for the weather regulation detection scenario. After discussion with CRIDA experts, the following regulations are mainly caused by weather: ATC_WeatherRegulation, ATC_AerodromeCapacityRegulation, ATC_DeIcingRegulation and ATC_RestrictionWeatherAtDestinationRegulation.

In [21]:
qry = """
PREFIX : <http://www.datacron-project.eu/datAcron#>
PREFIX dul: <http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?regID ?regType ?sector ?start ?end ?canceltime ?description ?refLoc ?icao ?coord 
WHERE { 
  ?regID dul:hasRegion          ?sector ;
         :hasReferenceLocation  ?refLoc ;
         :RegulationDescription ?description ;
         dul:hasTimeInterval    ?time  ;
         :regulationCancelTime  ?ctime ;
         a                      ?regType.

  ?time :TimeStart ?start ;
        :TimeEnd   ?end .

  ?ctime :TimeStart ?canceltime
        
  OPTIONAL {?refLoc :hasICAOcode ?icao.}
  OPTIONAL {?refLoc :hasGeometry/:hasWKT ?coord.}
}
"""

df = ts109.query(qry)
df = ts109.clean(df)

Post-Processing:

In [22]:
df['start'] = pd.to_datetime(df['start']) 
df['end'] = pd.to_datetime(df['end'])
df['canceltime'] = pd.to_datetime(df['canceltime'])
spainmask = (df['sector'].str.contains('Airspace_LE'))
dfw = df[spainmask]

Filter for relevant regulation types only:

In [23]:
mask1 = (dfw['regType'].str.contains('ATC_WeatherRegulation'))
mask2 = (dfw['regType'].str.contains('ATC_AerodromeCapacityRegulation'))
mask3 = (dfw['regType'].str.contains('ATC_DeIcingRegulation'))
mask4 = (dfw['regType'].str.contains('ATC_RestrictionWeatherAtDestinationRegulation'))
dfw2 = dfw[mask1 | mask2 | mask3 | mask4]

Filter out the WIP regulations (WIP means work in progress, e.g. construction work at an airport):

In [24]:
mask5 = (~ dfw2['description'].str.contains('WIP'))
dfw3 = dfw2[mask5]
dfw3

| | regID | regType | sector | start | end | canceltime | description | refLoc | icao | coord |
|---|---|---|---|---|---|---|---|---|---|---|
| 135 | LEBLA01M_411 | ATC_WeatherRegulation | Airspace_LEBL_411 | 2016-04-01 06:00:00 | 2016-04-01 13:00:00 | 2016-04-01 11:41:42 | CB + STORM_ | Place_Barcelona_El_Prat_Airport | | POINT (2.078003349812917 41.30315527974634) |
| 136 | LEBLA01A_411 | ATC_AerodromeCapacityRegulation | Airspace_LEBL_411 | 2016-04-01 16:40:00 | 2016-04-01 19:40:00 | 2016-04-01 18:36:00 | _ | Place_Barcelona_El_Prat_Airport | | POINT (2.078003349812917 41.30315527974634) |
| 138 | LEBLA08_411 | ATC_WeatherRegulation | Airspace_LEBL_411 | 2016-04-08 06:50:00 | 2016-04-08 15:00:00 | 2016-04-08 11:42:15 | HEAVY RAIN AND CB ACTIVITY_ | Place_Barcelona_El_Prat_Airport | | POINT (2.078003349812917 41.30315527974634) |
| 139 | LEBLA08A_411 | ATC_AerodromeCapacityRegulation | Airspace_LEBL_411 | 2016-04-08 14:00:00 | 2016-04-08 15:40:00 | 2016-04-08 13:12:54 | _ | Place_Barcelona_El_Prat_Airport | | POINT (2.078003349812917 41.30315527974634) |
| 142 | LEBLA16_411 | ATC_AerodromeCapacityRegulation | Airspace_LEBL_411 | 2016-04-16 15:00:00 | 2016-04-16 17:00:00 | 2016-04-16 14:58:25 | INCIDENT ON RWY 25L_ | Place_Barcelona_El_Prat_Airport | | POINT (2.078003349812917 41.30315527974634) |
| 143 | LEBLA21M_411 | ATC_WeatherRegulation | Airspace_LEBL_411 | 2016-04-21 07:40:00 | 2016-04-21 11:20:00 | 2016-04-21 10:32:11 | THUNDERSTORMS_ | Place_Barcelona_El_Prat_Airport | | POINT (2.078003349812917 41.30315527974634) |
| 146 | LEBLA23A_411 | ATC_AerodromeCapacityRegulation | Airspace_LEBL_411 | 2016-04-23 17:40:00 | 2016-04-23 19:40:00 | 2016-04-23 19:05:28 | DUE WEATHER SITUATION_ | Place_Barcelona_El_Prat_Airport | | POINT (2.078003349812917 41.30315527974634) |
| 147 | LEBLA24_411 | ATC_AerodromeCapacityRegulation | Airspace_LEBL_411 | 2016-04-24 18:00:00 | 2016-04-24 20:00:00 | 2016-04-24 18:47:38 | _ | Place_Barcelona_El_Prat_Airport | | POINT (2.078003349812917 41.30315527974634) |
| 537 | LEALA15M_411 | ATC_WeatherRegulation | Airspace_LEAL_411 | 2016-04-15 07:00:00 | 2016-04-15 08:30:00 | 2016-04-15 08:03:43 | LOW VIS_ | Place_Alicante_Elche_Airport | LEAL | POINT (-0.5572304403635879 38.286640899392886) |
| 538 | LEALA15M_411 | ATC_WeatherRegulation | Airspace_LEAL_411 | 2016-04-15 07:00:00 | 2016-04-15 08:30:00 | 2016-04-15 08:03:43 | LOW VIS_ | Place_Alicante_Elche_Airport | LEAL | POINT (-0.558055579662323 38.282222747802734) |
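The three filtering steps above (Spanish sectors, the four weather-related types, no WIP) can also be expressed in one function. The following is a minimal sketch that assumes `regType` already holds the bare class names, as after `ts109.clean`. It uses exact matching via `isin` rather than the substring matching of `str.contains`. The function name `filter_weather_regulations` is hypothetical.

```python
import pandas as pd

# the four regulation types named as mainly weather-caused by the CRIDA experts
WEATHER_REG_TYPES = {
    'ATC_WeatherRegulation',
    'ATC_AerodromeCapacityRegulation',
    'ATC_DeIcingRegulation',
    'ATC_RestrictionWeatherAtDestinationRegulation',
}


def filter_weather_regulations(df):
    """Keep Spanish, weather-related, non-WIP regulations (hypothetical helper)."""
    spanish = df['sector'].str.contains('Airspace_LE', na=False)
    relevant = df['regType'].isin(WEATHER_REG_TYPES)          # exact type match
    not_wip = ~df['description'].str.contains('WIP', na=False)
    return df[spanish & relevant & not_wip]
```

Exact matching avoids accidentally picking up a type whose name merely contains one of the four strings. It only works if the cleaned `regType` values carry no URI prefix.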


## 3. Join Airblock data and regulation data


In this section, we check every airblock opening in the `dfob` DataFrame loaded from open_airblocks_hourly.csv. For each opening, we determine whether the corresponding sector was regulated during the time of the opening.
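The nested `iterrows` loop below compares every opening with every regulation, which becomes slow for the 272,369 rows of `dfob`. Below is a minimal vectorized sketch under the same assumptions as the loop: it matches on `sector` (the open question of sector vs. airspace is not addressed), and the overlap test for closed intervals is `reg_start <= end and reg_end >= start`. The helper `flag_regulated` is hypothetical. It needs a unique index on the openings.

```python
import pandas as pd


def flag_regulated(openings, regulations):
    """Flag openings that overlap a regulation of the same sector (hypothetical helper)."""
    out = openings.copy()
    out['regulated'] = False
    out['regulRefLoc'] = ''
    out['regulDescr'] = ''
    out['regulCoord'] = ''
    regs = regulations[['sector', 'start', 'end', 'refLoc', 'description', 'coord']].rename(
        columns={'start': 'reg_start', 'end': 'reg_end'})
    # keep the original row label so results can be written back
    left = out[['sector', 'start', 'end']].copy()
    left['row_id'] = out.index
    pairs = left.merge(regs, on='sector', how='inner')
    # closed intervals [start, end] and [reg_start, reg_end] overlap
    overlap = (pairs['reg_start'] <= pairs['end']) & (pairs['reg_end'] >= pairs['start'])
    # if several regulations overlap one opening, keep the last match in merge order
    hits = pairs[overlap].drop_duplicates(subset='row_id', keep='last').set_index('row_id')
    out.loc[hits.index, 'regulated'] = True
    out.loc[hits.index, 'regulRefLoc'] = hits['refLoc']
    out.loc[hits.index, 'regulDescr'] = hits['description']
    out.loc[hits.index, 'regulCoord'] = hits['coord']
    return out
```

Usage would be `dfob_flagged = flag_regulated(dfob, dfw3)`. The merge produces one row per opening/regulation pair within a sector, so memory grows with the number of regulations per sector rather than with the full cross product.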


In [9]:
dfob.head(2)

| | config | airspace | sector | block | actualWKT | lowerlevel | upperlevel | start | end | capacity | duration | actualJSON | centerPoint | demand | ratio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 83350 | AirspaceConfiguration_LECSCTA_C2A_411 | Airspace_LECSCTA_411 | Airspace_LECSMALB_411 | Airblock_LECSMALB_007LE | POLYGON ((-3.31805555555556 37, -3.35611111111... | 0.0 | 9296.4 | 2016-04-01 02:00:00 | 2016-04-01 02:59:59 | 24.99 | 00:59:59 | {'type': 'Polygon', 'coordinates': [[[-3.31805... | [-2.033055555555555, 36.664583333333354] | 14.0 | 0.56 |
| 83340 | AirspaceConfiguration_LECSCTA_C2A_411 | Airspace_LECSCTA_411 | Airspace_LECSCORB_411 | Airblock_LECSCORB_004LE | POLYGON ((-4.30027777777778 38.5194444444444, ... | 0.0 | 9296.4 | 2016-04-01 02:00:00 | 2016-04-01 02:59:59 | 24.99 | 00:59:59 | {'type': 'Polygon', 'coordinates': [[[-4.30027... | [-2.55986111111111, 38.168611111111105] | 83.98 | 3.36 |


In [17]:
from IPython.display import clear_output

dfob['regulated'] = False
dfob['regulRefLoc'] = ''
dfob['regulDescr'] = ''
dfob['regulCoord'] = ''


#### I think I have to take the airspace from the block file instead of the sector!
# work on an explicit copy of the first 5000 openings (avoids writing into a slice of dfob)
dfiter = dfob.head(5000).copy()

for i, row in dfiter.iterrows():
    for regindex, regrow in dfw3.iterrows():
        # same sector AND the regulation interval overlaps the opening interval;
        # this single test covers a regulation that ends inside, starts inside,
        # or spans the whole opening
        if (row['sector'] == regrow['sector']
                and regrow['start'] <= row['end']
                and regrow['end'] >= row['start']):
            dfiter.at[i, 'regulated'] = True
            dfiter.at[i, 'regulRefLoc'] = regrow['refLoc']
            dfiter.at[i, 'regulDescr'] = regrow['description']
            dfiter.at[i, 'regulCoord'] = regrow['coord']
    clear_output(wait=True)
    print('Row' + str(i) + ' done.')


Row91349 done.


In [18]:
dfiter.to_csv('data/open_airblocks_hourly_with_reguls.csv')

TODO:

 - add to the airblock dataset whether it was regulated; then with refLoc
 - add the weather from wunderground.com
 
 

## This works on the 107! Trajectories and sectors:


    PREFIX : <http://www.datacron-project.eu/datAcron#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX dul: <http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#>
    SELECT ?fp ?segment ?entry ?exit ?sector
    WHERE {
       ?fp a :FM_FTFM ;
           :reportsTrajectory ?t .
       ?t dul:hasPart ?segment .
       ?segment a :Segment ;
                :within ?sector ;
                :hasTemporalFeature ?time .
       ?time :TimeStart ?entry ;
             :TimeEnd ?exit .
    } ORDER BY ?sector
    LIMIT 10000
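The query above is kept as plain text. The internals of `TripleStoreConnector` are not shown in this notebook. As an alternative, the `SPARQLWrapper` package imported at the top can return results in the standard SPARQL 1.1 JSON results format. Below is a minimal sketch of flattening such a result into a DataFrame. The helper `bindings_to_frame` is hypothetical, and unbound variables simply become `NaN`.

```python
import pandas as pd


def bindings_to_frame(results):
    """Flatten a SPARQL 1.1 JSON result dict into a DataFrame of plain values.

    Hypothetical helper: every variable listed in head/vars becomes a column;
    variables missing from a binding (e.g. unmatched OPTIONALs) become NaN.
    """
    columns = results['head']['vars']
    rows = [
        {var: binding[var]['value'] for var in columns if var in binding}
        for binding in results['results']['bindings']
    ]
    return pd.DataFrame(rows, columns=columns)
```

A possible usage, with a placeholder endpoint because the 107 URL is not given here: `sparql = SPARQLWrapper('<107 endpoint URL>')`, `sparql.setQuery(qry)`, `sparql.setReturnFormat(JSON)`, then `bindings_to_frame(sparql.query().convert())`.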

## SPARQL scratch pad:

In [26]:
qry = """
PREFIX : <http://www.datacron-project.eu/datAcron#>
PREFIX dul: <http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX myfn: <java:datAcronTester.unipi.gr.sparql_functions.>

SELECT ?regulation ?airspace (myfn:getGeom(?g) as ?WKT) WHERE {
  ?regulation a ?type . ?type rdfs:subClassOf :FM_Regulation .
  ?regulation dul:hasRegion ?airspace .
  ?airspace dul:hasPart ?s .
  ?s dul:hasPart ?b .
  ?b :hasGeometry ?g
}

"""

df3 = ts109.query(qry)
#df3 = ts109.clean(df3)

df3.to_csv('data/regsfromgiorgos.csv')
df3.head(5)

| | regulation | airspace | WKT |
|---|---|---|---|
| 0 | http://www.datacron-project.eu/datAcron#LFLBA1... | http://www.datacron-project.eu/datAcron#Airspa... | POLYGON ((5.59916666666667 45.9711111111111, 5... |
| 1 | http://www.datacron-project.eu/datAcron#LFLBA1... | http://www.datacron-project.eu/datAcron#Airspa... | POLYGON ((5.9175 45.6375, 5.93083333333333 45.... |
| 2 | http://www.datacron-project.eu/datAcron#LFLBA1... | http://www.datacron-project.eu/datAcron#Airspa... | POLYGON ((5.53333333333333 45.6833333333333, 5... |
| 3 | http://www.datacron-project.eu/datAcron#LFLBA1... | http://www.datacron-project.eu/datAcron#Airspa... | POLYGON ((5.94527777777778 45.9302777777778, 5... |
| 4 | http://www.datacron-project.eu/datAcron#LFLBA1... | http://www.datacron-project.eu/datAcron#Airspa... | POLYGON ((5.96 45.9519444444444, 6.05916666666... |


In [39]:
qry = """
PREFIX : <http://www.datacron-project.eu/datAcron#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dul: <http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#>
PREFIX myfn: <java:datAcronTester.unipi.gr.sparql_functions.>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?aspace ?sector ?block ?lowerlevel ?upperlevel ?regulation ?refLoc ?regstart ?regend
WHERE { 
   
   # get all blocks for the open airspaces (vastly increasing the result set):
   ?aspace dul:hasPart                   ?sector.
   ?sector dul:hasPart                   ?block .
   ?block       :hasLowerLevel           ?lowerlevel ; :hasUpperLevel ?upperlevel ; :hasGeometry   ?geom.
   
   # get regulations, if exist, for the open airspaces:
   ?regulation dul:hasRegion             ?sector ;
                  :hasReferenceLocation  ?refLoc ;
               dul:hasTimeInterval       ?regtime  .
   ?regtime       :TimeStart             ?regstart ;
                  :TimeEnd               ?regend .   
         
   
   # at this point every airblock is paired with every regulation ever issued for its sector.
   # We still need a temporal filter that keeps only results where the airblock opening time overlaps the regulation time.
   # note: it would be better to apply the temporal intersection right here in a FILTER, but I couldn't get it to work,
   # e.g. with something like (xsd:dateTime(?regend) > xsd:dateTime(?start) &&....
   
   # filter for only spanish airblocks:
   FILTER (regex(str(?block), 'Airblock_LE', "i") )
   
}
"""

df = ts109.query(qry)
df = ts109.clean(df)
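The comments in the query above note that the temporal intersection could not be expressed as a FILTER. Below is a minimal sketch of generating such a clause for one fixed time window, using the closed-interval overlap test `regstart <= window_end && regend >= window_start`. It is untested against the 109 store. It assumes the store accepts `xsd:dateTime()` casts on `?regstart`/`?regend` (the `xsd:` prefix is already declared in the query). It cannot express a per-opening window, because the opening times live in `dfob` and not in the store. The helper `overlap_filter` is hypothetical.

```python
from datetime import datetime

XSD_FMT = '%Y-%m-%dT%H:%M:%S'


def overlap_filter(win_start, win_end, start_var='?regstart', end_var='?regend'):
    """Return a SPARQL FILTER keeping intervals that overlap [win_start, win_end].

    Hypothetical helper. Closed intervals [a, b] and [c, d] overlap iff a <= d and b >= c.
    """
    return ('FILTER (xsd:dateTime({s}) <= xsd:dateTime("{we}") && '
            'xsd:dateTime({e}) >= xsd:dateTime("{ws}"))').format(
        s=start_var, e=end_var,
        ws=win_start.strftime(XSD_FMT), we=win_end.strftime(XSD_FMT))


print(overlap_filter(datetime(2016, 4, 1, 2, 0, 0), datetime(2016, 4, 1, 2, 59, 59)))
```

The generated line would sit inside the `WHERE` block next to the existing `regex` FILTER. Whether the cast is needed at all depends on how the store types the `TimeStart`/`TimeEnd` literals.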

In [None]:
Comment 8:

PREFIX : <http://www.datacron-project.eu/datAcron#>
PREFIX dul: <http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX myfn: <java:datAcronTester.unipi.gr.sparql_functions.>

SELECT ?regulation ?airspace (myfn:getGeom(?g) as ?WKT) WHERE {
  ?regulation a ?type . ?type rdfs:subClassOf :FM_Regulation .
  ?regulation dul:hasRegion ?airspace .
  ?airspace dul:hasPart ?s .
  ?s dul:hasPart ?b .
  ?b :hasGeometry ?g
} limit 10


In [None]:
"If you try:

        PREFIX : <http://www.datacron-project.eu/datAcron#>
        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        PREFIX dul: <http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#> 
        PREFIX myfn: <java:datAcronTester.unipi.gr.sparql_functions.> 
        PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

        SELECT  * WHERE {
        ?s :hasMBR_WKT ?mbr
        } LIMIT 10

you get the MBR. If you try:

        PREFIX : <http://www.datacron-project.eu/datAcron#>
        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        PREFIX dul: <http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#>
        PREFIX myfn: <java:datAcronTester.unipi.gr.sparql_functions.>
        PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

        SELECT ?s (myfn:getGeom(?s) as ?actualGeom) WHERE {
        ?s :hasMBR_WKT ?mbr
        } LIMIT 10

you get the WKT of the actual geometry :) .

Kind regards,
Giorgos"
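Giorgos's mail distinguishes the MBR (`:hasMBR_WKT`) from the actual geometry (`myfn:getGeom`). The MBR is the axis-aligned minimum bounding rectangle, so it covers more area than the real sector shape. Below is a minimal sketch that derives an MBR from a simple `POLYGON` WKT string to make the difference concrete. The parser handles only the exterior ring of a plain 2D polygon. The exact MBR format returned by the store is an assumption, and the function names are hypothetical.

```python
def parse_polygon_wkt(wkt_text):
    """Return the exterior ring of 'POLYGON ((x y, x y, ...))' as (x, y) tuples.

    Minimal sketch: no MULTIPOLYGON, no Z/M values; holes are ignored.
    """
    text = wkt_text.strip()
    inner = text[text.index('((') + 2:text.rindex('))')]
    exterior = inner.split(')')[0]          # drop any interior rings
    # str.split() without arguments tolerates tabs and repeated spaces
    return [tuple(float(v) for v in pt.split()) for pt in exterior.split(',')]


def mbr(points):
    """Minimum bounding rectangle as (minx, miny, maxx, maxy)."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return min(xs), min(ys), max(xs), max(ys)


def mbr_wkt(points):
    """The MBR written back as a closed WKT polygon."""
    minx, miny, maxx, maxy = mbr(points)
    return 'POLYGON (({0} {1}, {2} {1}, {2} {3}, {0} {3}, {0} {1}))'.format(minx, miny, maxx, maxy)


print(mbr_wkt(parse_polygon_wkt('POLYGON ((0 0, 4 0, 4 3, 1 5, 0 0))')))
```

Intersection tests on the MBR can therefore report false positives near the sector boundary. This is why the actual geometry from `getGeom` matters for sector/trajectory intersections.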

Background on the approach: we use built-in functions of the triple store to compute the geospatial intersection between a sector and a trajectory. According to UPRC, the following functions are available in the 109 triple store:
"...
 - after(a,b): returns true if a is temporally after b
 - before(a,b): returns true if a is temporally before b
 - crosses(a,b): returns true if geometry a crosses geometry b
 - crossesWKT(a,b): same as crosses/2, but arguments here are WKT
 - distanceWKT(a,b): computes the distance (in degrees) between two WKT geometries
 - during(a,b): returns true, if a is a temporal interval within b
 - during_sf(a,b): same as during/2, but here a can also start or finish b
 - equal(a,b): returns true if the temporal intervals a,b are equal
 - finishes(a,b): returns true if interval a has the same end time as b
 - getGeom(a): returns the WKT representation of the geometry a (a is the URI of the geometry)
 - maxDate(a,b): returns the latest datetime among a and b
 - meets(a,b): returns true if end time of a is start time of b (and vice versa)
 - minDate(a,b): returns the oldest datetime among a and b
 - starts(a,b): returns true if a temporally starts b
 - temporalAny(a,b): returns true if there is any temporal relation between a and b
 - toNOAAurl(a): returns the link to the NOAA file given a datetime a (using "a minus 2" hours).
We can add any other function needed on the 109 endpoint (but not on 107). Please also notice, that the functions at 107 are not available at 109 and vice versa. ..."
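The functions above run server-side on the 109 store. To sanity-check query results locally, the temporal ones can be mirrored in Python. Below is a minimal sketch based only on the one-line descriptions above, with intervals given as `(start, end)` tuples. `starts` follows Allen's definition (same start, earlier end), which is an assumption. The server's exact semantics (strict vs. non-strict bounds) may differ.

```python
from datetime import datetime

# Local interpretations of some UPRC temporal functions; intervals are (start, end).
# Based only on the textual descriptions, the server-side semantics may differ.


def before(a, b):
    """a ends before b starts."""
    return a[1] < b[0]


def after(a, b):
    """a starts after b ends."""
    return before(b, a)


def meets(a, b):
    """End of a is start of b, or vice versa."""
    return a[1] == b[0] or b[1] == a[0]


def during(a, b):
    """a lies strictly inside b."""
    return b[0] < a[0] and a[1] < b[1]


def during_sf(a, b):
    """Like during, but a may also start or finish b."""
    return b[0] <= a[0] and a[1] <= b[1]


def starts(a, b):
    """Assumed Allen semantics: same start, a ends first."""
    return a[0] == b[0] and a[1] < b[1]


def finishes(a, b):
    """a has the same end time as b."""
    return a[1] == b[1]


def equal(a, b):
    return a[0] == b[0] and a[1] == b[1]


opening = (datetime(2016, 4, 1, 6), datetime(2016, 4, 1, 8))
regulation = (datetime(2016, 4, 1, 5), datetime(2016, 4, 1, 11))
print(during(opening, regulation))
```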



THIS IS THE QUERY THAT ALREADY WORKS AND THAT I NEED:


    PREFIX : <http://www.datacron-project.eu/datAcron#>
    PREFIX dul: <http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX bif: <java:datAcronTester.unipi.gr.sparql_functions.>
    PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

    SELECT ?fp ?node ?nwkt ?naltitude ?ntemp  
    WHERE{
    ?fp a    :FM_FTFM;
             :reportsTrajectory/a :IntendedTrajectory;
             :reportsTrajectory/dul:hasPart ?node;
             :reportsTrajectory/ dul:hasGeometry/:hasWKT ?twkt .

    ?node    
             :hasGeometry/:hasWKT ?nwkt;
             :hasGeometry/:hasAltitude ?naltitude;          
             dul:hasTemporalFeature/:TimeStart ?ntemp.       
  
    FILTER(bif:crossesWKT(?twkt,
      "POLYGON ((-4.19666666666667 42.6947222222222, -3.61527777777778 42.7497222222222, -3.13083333333333 42.7936111111111, -3.32638888888889 42.6677777777778, -3.00472222222222 42.385, -2.75222222222222 42.3458333333333, -2.7375 42.4277777777778, -2.62083333333333 42.5280555555556, -2.62444444444444 42.2563888888889, -4.375 42.3333333333333, -4.19666666666667 42.6947222222222))")
       && ?ntemp > xsd:dateTime("2016-04-01T09:00:00") && ?ntemp < xsd:dateTime("2016-04-01T18:00:00"))
    } LIMIT 20
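The polygon in the query above was pasted by hand. To run the same query for other sectors or time windows, the FILTER clause can be generated from a coordinate list instead. Below is a minimal sketch that assumes the same `bif:` prefix and the variable names `?twkt` and `?ntemp` used above. The helpers `polygon_wkt` and `crossing_filter` are hypothetical.

```python
from datetime import datetime

XSD_FMT = '%Y-%m-%dT%H:%M:%S'


def polygon_wkt(coords):
    """Build a closed 'POLYGON ((lon lat, ...))' string from (lon, lat) pairs."""
    ring = list(coords)
    if ring[0] != ring[-1]:
        ring.append(ring[0])  # a WKT polygon ring must end where it starts
    return 'POLYGON ((' + ', '.join('{} {}'.format(x, y) for x, y in ring) + '))'


def crossing_filter(coords, t_from, t_to):
    """FILTER clause: trajectory crosses the polygon and node time lies in (t_from, t_to)."""
    return ('FILTER(bif:crossesWKT(?twkt, "{wkt}")\n'
            '  && ?ntemp > xsd:dateTime("{a}") && ?ntemp < xsd:dateTime("{b}"))').format(
        wkt=polygon_wkt(coords), a=t_from.strftime(XSD_FMT), b=t_to.strftime(XSD_FMT))


print(crossing_filter([(-4.375, 42.3333333333333), (-3.00472222222222, 42.385),
                       (-3.61527777777778, 42.7497222222222)],
                      datetime(2016, 4, 1, 9), datetime(2016, 4, 1, 18)))
```

Generating the string this way also avoids whitespace debris (such as tabs) inside the WKT literal. Not every WKT parser on the server side is guaranteed to tolerate such debris.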