# datAcron Ontology Traversals

This Jupyter notebook validates the example queries and also serves as a navigation and exploration notebook for the datAcron triple store. Where data is saved as CSV, the files are available in the data/ folder.


## Definition of a query function

First, we define a query function that connects to the RDF store and returns the query results. There are two triple stores available: 
 - http://83.212.239.107:8890/sparql : the big triple store containing all data
 - http://83.212.239.109:3434/sparql : the centralized sandbox triple store
 
In general, I connect to the big store, because, as we will see later, the ...109 store yields no results in most cases.

In [320]:
import pandas as pd
from pandas.io.json import json_normalize
from SPARQLWrapper import SPARQLWrapper, JSON

def query_data(sparql_query, sparql_service_url='http://83.212.239.107:8890/sparql'):
    """
    Query the endpoint with the given query string and return the results as a pandas Dataframe.
    """
    # create the connection to the endpoint; set return format; ask for result
    sparql = SPARQLWrapper(sparql_service_url)  
    sparql.setQuery(sparql_query)
    sparql.setReturnFormat(JSON)
    result = sparql.query().convert()
    
    #clean up the column mess (thanks to David Knodt)
    for row in result['results']['bindings']:
        for key in row.keys():
            row[key] = row[key]['value']            
    if len(result["results"]["bindings"]):
        return json_normalize(result["results"]["bindings"])
    else:
        return pd.DataFrame(columns=(result['head']['vars']))
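
To illustrate what the clean-up loop in query_data does, here is a minimal offline sketch. The sample dictionary is hand-written in the SPARQL JSON results layout and is not real endpoint output; bindings_to_frame is a hypothetical helper name that does not exist in this notebook.

```python
import pandas as pd

# Hand-written sample in the SPARQL JSON results layout (not real query output).
sample_result = {
    "head": {"vars": ["c", "capacity"]},
    "results": {"bindings": [
        {"c": {"type": "uri", "value": "http://www.datacron-project.eu/datAcron#Config_A"},
         "capacity": {"type": "literal", "value": "38"}},
        {"c": {"type": "uri", "value": "http://www.datacron-project.eu/datAcron#Config_B"},
         "capacity": {"type": "literal", "value": "999"}},
    ]},
}

def bindings_to_frame(result):
    """Keep only the 'value' entry of every binding and build a DataFrame."""
    variables = result["head"]["vars"]
    rows = [{var: cell["value"] for var, cell in row.items()}
            for row in result["results"]["bindings"]]
    # Passing columns= keeps the header even when there are no rows.
    return pd.DataFrame(rows, columns=variables)

flat = bindings_to_frame(sample_result)
print(flat)
```

Note that all values arrive as strings, so numeric columns such as capacity need an explicit cast before numeric comparisons.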
    

    

In [321]:
def query_value_cleaning(df):
    """
    Drop URI prefixes from result values. Expects a pandas DataFrame as parameter.
    """
    if not isinstance(df, pd.DataFrame):
        print('Query cleaning not possible: no pandas DataFrame was passed.')
        print('The object is of type {}'.format(type(df)))
        return df
    
    prefixes = ['http://www.datacron-project.eu/datAcron#']
    prefixes.append('http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#')
    prefixes.append('http://www.w3.org/2001/XMLSchema#')
    prefixes.append('http://www.w3.org/2000/01/rdf-schema#')
    prefixes.append('java:datAcronTester.unipi.gr.sparql_functions.')
    prefixes.append('http://www.w3.org/1999/02/22-rdf-syntax-ns#')
    prefixes.append('http://www.openlinksw.com/schemas/virtrdf#')
    
    for item in prefixes:
        for column in df:
            # assign the result back instead of replacing in place on a column selection
            df[column] = df[column].replace(to_replace=item, value='', regex=True)
    return df
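
As a small offline illustration of the prefix stripping, the following sketch works on a two-cell sample frame. It is a variant, not the function above: strip_prefixes is a hypothetical name, it escapes the prefixes so that characters like '.' are matched literally, and it returns a copy instead of modifying its input.

```python
import re
import pandas as pd

# Two of the prefixes from the list above; extend as needed.
PREFIXES = [
    "http://www.datacron-project.eu/datAcron#",
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
]

def strip_prefixes(df, prefixes=PREFIXES):
    """Return a copy of df with the given URI prefixes removed from string cells."""
    # re.escape makes '.' match literally instead of acting as regex syntax.
    pattern = "|".join(re.escape(p) for p in prefixes)
    return df.replace(to_replace=pattern, value="", regex=True)

sample = pd.DataFrame({
    "p": ["http://www.w3.org/1999/02/22-rdf-syntax-ns#type"],
    "o": ["http://www.datacron-project.eu/datAcron#FM_Airspace"],
})
cleaned = strip_prefixes(sample)
print(cleaned)
```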

## 2. Tutorial-Queries from "SPARQL_queries_example.pdf"

The following (long) section represents my work trying to replicate the given example queries.

### 2.1 Pull all concepts

The first query returns all concepts for which at least one assertion exists in the triple store.

In [322]:
sparql_query = """
PREFIX datp: <http://datacron-project.eu#>
SELECT DISTINCT ?Concept 
WHERE {
    GRAPH ?g {[] a ?Concept}
} LIMIT 100
"""

df = query_data(sparql_query)
df = query_value_cleaning(df)
df.head(10)


Unnamed: 0,Concept
0,QuadMapFormat
1,QuadStorage
2,array-of-QuadMapFormat
3,QuadMap
4,QuadMapValue
5,array-of-QuadMapColumn
6,QuadMapColumn
7,array-of-QuadMapATable
8,QuadMapATable
9,QuadMapFText


### 2.2 Pull all properties

This query pulls all properties from the RDF store for which at least one instance exists.
TODO: Unfortunately, the result set is empty in both the ...107 and the ...109 store.


In [323]:
sparql_query = """PREFIX datp: <http://datacron-project.eu#>
SELECT DISTINCT ?Property 
WHERE {
GRAPH ?g { [] ?Property []} 
FILTER(?g=<http://localhost:8890/DAV>)
}
"""

df = query_data(sparql_query)
#df = query_value_cleaning(df)
df.head(10)



Unnamed: 0,Property


### 2.3 Pull sector configs, affected airspaces, time periods and capacities

This query pulls the sector configurations that were selected in the past and lists the affected airspaces and the defined capacities.

In [324]:
sparql_query = """
PREFIX : <http://www.datacron-project.eu/datAcron#>
PREFIX dul: <http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#>

SELECT * 
WHERE {
  ?c a :FM_Configuration ;
       :hasCapacity ?capacity ;
       :configurationOfAirspace ?airspace ;
       dul:hasConstituent/:TimeStart ?start ;
       dul:hasConstituent/:TimeEnd ?end.
}
ORDER BY ?start
LIMIT 2000
"""

df = query_data(sparql_query)
df = query_value_cleaning(df)
df.to_csv('data/configs_and_affected_airspaces1.csv')
df.head(5)



Unnamed: 0,airspace,c,capacity,end,start
0,Airspace_LFMMXCTA_411,AirspaceConfiguration_LFMMXCTA_CF1_411,999,2016-04-01T23:59:00,2016-03-31T00:00:00
1,Airspace_LFMMXCTA_411,AirspaceConfiguration_LFMMXCTA_CF1_411,999,2016-04-02T23:59:00,2016-03-31T00:00:00
2,Airspace_LFMMXCTA_411,AirspaceConfiguration_LFMMXCTA_CF1_411,999,2016-04-03T23:59:00,2016-03-31T00:00:00
3,Airspace_LFMMXCTA_411,AirspaceConfiguration_LFMMXCTA_CF1_411,999,2016-04-04T23:59:00,2016-03-31T00:00:00
4,Airspace_LFMMXCTA_411,AirspaceConfiguration_LFMMXCTA_CF1_411,999,2016-04-05T23:59:00,2016-03-31T00:00:00


### 2.4 Configurations and their sectors (own work)

If we want to know which configuration affects which sectors, we can alter the above query: we omit the timestamp and select only DISTINCT values. 

In [325]:
sparql_query = """
PREFIX : <http://www.datacron-project.eu/datAcron#>
PREFIX dul: <http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#>

SELECT DISTINCT * 
WHERE {
  ?c a :FM_Configuration ;
       :hasCapacity ?capacity ;
       :configurationOfAirspace ?airspace .
}
LIMIT 25000
"""

df = query_data(sparql_query)
df = query_value_cleaning(df)
%time df = df.sort_values('airspace')
df.to_csv('data/configs_and_affected_airspaces2.csv')
df.head(10)



CPU times: user 4 ms, sys: 0 ns, total: 4 ms
Wall time: 56.4 ms


Unnamed: 0,airspace,c,capacity
0,Airspace_BIRDCTA_411,AirspaceConfiguration_BIRDCTA_CONF1_411,999
1,Airspace_BIRDICTA_411,AirspaceConfiguration_BIRDICTA_CNF1_411,999
2,Airspace_BIRDTOCA_411,AirspaceConfiguration_BIRDTOCA_CONF1_411,999
3,Airspace_DAAACTA_411,AirspaceConfiguration_DAAACTA_CONF1_411,999
4,Airspace_DAAATCTA_411,AirspaceConfiguration_DAAATCTA_CNF1_411,999
5,Airspace_DTTCCTA_411,AirspaceConfiguration_DTTCCTA_CONF2_411,999
6,Airspace_DTTCTCTA_411,AirspaceConfiguration_DTTCTCTA_CNF1_411,999
13,Airspace_EBBUCTA_411,AirspaceConfiguration_EBBUCTA_CE2W2L_411,38
12,Airspace_EBBUCTA_411,AirspaceConfiguration_EBBUCTA_CE2W2H_411,999
11,Airspace_EBBUCTA_411,AirspaceConfiguration_EBBUCTA_CE2W1_411,999


RESULT TECHNICAL: to avoid timeouts, it is better to pull the data first and let pandas do the sorting. With an ORDER BY clause this query times out, so I use the pandas sort_values method instead.
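
The client-side sorting can be sketched offline as follows. The three rows are picked from the result table above; casting capacity to int is my addition, since the values arrive as strings and would otherwise sort lexically.

```python
import pandas as pd

# Illustrative rows in the shape of the result above (values picked from that table).
df_sample = pd.DataFrame({
    "airspace": ["Airspace_EBBUCTA_411", "Airspace_BIRDCTA_411", "Airspace_EBBUCTA_411"],
    "c": ["AirspaceConfiguration_EBBUCTA_CE2W2L_411",
          "AirspaceConfiguration_BIRDCTA_CONF1_411",
          "AirspaceConfiguration_EBBUCTA_CE2W1_411"],
    "capacity": ["38", "999", "999"],
})

# SPARQL JSON delivers literals as strings; cast before sorting numerically.
df_sample["capacity"] = df_sample["capacity"].astype(int)

# Client-side counterpart of ORDER BY ?airspace ?capacity.
df_sorted = df_sample.sort_values(["airspace", "capacity"]).reset_index(drop=True)
print(df_sorted)
```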

### 2.5 Pull airspaces with capacity != 999

The following query (from the tutorial) pulls only those configurations for which a capacity other than 999 was set.

In [237]:
sparql_query = """
PREFIX : <http://www.datacron-project.eu/datAcron#>
PREFIX dul: <http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#>

SELECT * 
WHERE {
   ?c a :FM_Configuration ;
      :hasCapacity ?capacity ;
      :configurationOfAirspace ?airspace ;
      dul:hasConstituent/:TimeStart ?start ;
      dul:hasConstituent/:TimeEnd ?end.
   FILTER(?capacity !='999')
}
"""

df = query_data(sparql_query)
df = query_value_cleaning(df)
%time df = df.sort_values('airspace')
df.to_csv('data/capacity_limited_configs.csv')
df.head(10)



CPU times: user 24 ms, sys: 0 ns, total: 24 ms
Wall time: 21.2 ms


Unnamed: 0,airspace,c,capacity,end,start
993,Airspace_EBBUCTA_411,AirspaceConfiguration_EBBUCTA_CE2W2L_411,38,2016-04-19T06:59:00,2016-04-19T08:00:00
973,Airspace_EBBUCTA_411,AirspaceConfiguration_EBBUCTA_CE2W2L_411,38,2016-04-19T08:59:00,2016-04-13T08:00:00
974,Airspace_EBBUCTA_411,AirspaceConfiguration_EBBUCTA_CE2W2L_411,38,2016-04-19T08:59:00,2016-04-13T08:00:00
975,Airspace_EBBUCTA_411,AirspaceConfiguration_EBBUCTA_CE2W2L_411,38,2016-04-19T08:59:00,2016-04-13T08:00:00
977,Airspace_EBBUCTA_411,AirspaceConfiguration_EBBUCTA_CE2W2L_411,38,2016-04-13T08:59:00,2016-04-19T04:40:00
978,Airspace_EBBUCTA_411,AirspaceConfiguration_EBBUCTA_CE2W2L_411,38,2016-04-13T08:59:00,2016-04-19T04:40:00
979,Airspace_EBBUCTA_411,AirspaceConfiguration_EBBUCTA_CE2W2L_411,38,2016-04-13T08:59:00,2016-04-19T04:40:00
980,Airspace_EBBUCTA_411,AirspaceConfiguration_EBBUCTA_CE2W2L_411,38,2016-04-19T06:59:00,2016-04-19T04:40:00
981,Airspace_EBBUCTA_411,AirspaceConfiguration_EBBUCTA_CE2W2L_411,38,2016-04-19T06:59:00,2016-04-19T04:40:00
982,Airspace_EBBUCTA_411,AirspaceConfiguration_EBBUCTA_CE2W2L_411,38,2016-04-19T06:59:00,2016-04-19T04:40:00


### 2.6 Configs and airspaces with limited capacity: only DISTINCT values

I want to know if there is a direct relation between a configuration and the resulting airspace capacity. If this is the case, then the following query should only return one row per configuration ID.


In [330]:
sparql_query = """
PREFIX : <http://www.datacron-project.eu/datAcron#>
PREFIX dul: <http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#>

SELECT DISTINCT * 
WHERE {
   ?c a :FM_Configuration ;
      :hasCapacity ?capacity ;
      :configurationOfAirspace ?airspace .
   FILTER(?capacity !='999')
}
"""

df = query_data(sparql_query)
df = query_value_cleaning(df)
%time df = df.sort_values('c')
df.to_csv('data/capacity_limited_configs2.csv')
df.iloc[10:14]



CPU times: user 0 ns, sys: 0 ns, total: 0 ns
Wall time: 997 µs


Unnamed: 0,airspace,c,capacity
11,Airspace_EDWWCTAE_411,AirspaceConfiguration_EDWWCTAE_E7A_411,30
10,Airspace_EDWWCTAE_411,AirspaceConfiguration_EDWWCTAE_E7A_411,15
12,Airspace_EDWWCTAE_411,AirspaceConfiguration_EDWWCTAE_E8A_411,15
13,Airspace_EDWWCTAE_411,AirspaceConfiguration_EDWWCTAE_E8A_411,30


RESULT SEMANTIC: as the configuration AirspaceConfiguration_EDWWCTAE_E7A_411 shows (EDWW is Bremen Radar), a single configuration can have two different capacities. Therefore, the assumption of a direct dependency config --> capacity does not hold.

It should be noted, though, that I could not find duplicate entries among the Spanish sector configs.
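
Instead of checking the rows by eye, the same test can be sketched with a groupby. The first four rows are copied from the result table above; the fifth row comes from the table in 2.4 and is only assumed to be a single-capacity configuration for the sake of the example.

```python
import pandas as pd

# Four rows from the table above plus one row assumed to have a unique capacity.
rows = pd.DataFrame({
    "c": ["AirspaceConfiguration_EDWWCTAE_E7A_411",
          "AirspaceConfiguration_EDWWCTAE_E7A_411",
          "AirspaceConfiguration_EDWWCTAE_E8A_411",
          "AirspaceConfiguration_EDWWCTAE_E8A_411",
          "AirspaceConfiguration_EBBUCTA_CE2W2L_411"],
    "capacity": ["30", "15", "15", "30", "38"],
})

# Count distinct capacities per configuration.
capacities_per_config = rows.groupby("c")["capacity"].nunique()

# Configurations with more than one capacity contradict a one-to-one mapping.
ambiguous = capacities_per_config[capacities_per_config > 1]
print(ambiguous)
```

Applied to the full query result, an empty `ambiguous` series would support the config --> capacity assumption; any entry refutes it.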



### 2.7 The other way around: pull all configs for a specific airspace

The following query pulls all configurations that belong to the airspace Airspace_LSAGCTA_411.


In [331]:
sparql_query = """
PREFIX : <http://www.datacron-project.eu/datAcron#>
PREFIX dul: <http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#>

SELECT * 
WHERE {
  ?c a :FM_Configuration ;
     :hasCapacity ?capacity;
     :configurationOfAirspace :Airspace_LSAGCTA_411 ;
     dul:hasConstituent/:TimeStart ?start ;
     dul:hasConstituent/:TimeEnd ?end.
}
"""

df = query_data(sparql_query)
df = query_value_cleaning(df)
%time df = df.sort_values('start')
df.to_csv('data/configs_of_athens.csv')
df.head(3)



CPU times: user 20 ms, sys: 0 ns, total: 20 ms
Wall time: 20.7 ms


Unnamed: 0,c,capacity,end,start
0,AirspaceConfiguration_LSAGCTA_I1A_411,40,2016-04-01T03:54:00,2016-04-01T00:00:00
565,AirspaceConfiguration_LSAGCTA_I1A_411,40,2016-04-18T10:44:00,2016-04-01T00:00:00
566,AirspaceConfiguration_LSAGCTA_I1A_411,40,2016-04-18T10:49:00,2016-04-01T00:00:00


RESULT TECHNICAL:

Note the syntactic subtlety: 'Airspace_LSAGCTA_411' is not the name but the ID of the airspace in the triple store. Therefore, we do not need to search for a name and then combine the queries; it is sufficient to enter the ID directly with the ":" syntax.

The following query confirms that.

In [332]:
sparql_query = """
PREFIX : <http://www.datacron-project.eu/datAcron#>
PREFIX dul: <http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#>

SELECT ?p ?o
WHERE {
  :Airspace_LSAGCTA_411 ?p ?o .
}
LIMIT 2000
"""

df = query_data(sparql_query)
df = query_value_cleaning(df)
%time df = df.sort_values('p')
df.to_csv('data/airspace_properties.csv')
df.head(10)



CPU times: user 0 ns, sys: 0 ns, total: 0 ns
Wall time: 620 µs


Unnamed: 0,o,p
1,Airspace_LSAGE_411,hasPart
2,Airspace_LSAGN_411,hasPart
3,Airspace_LSAGS_411,hasPart
0,FM_Airspace,type


### 2.8 Further inspect a specific airspace (own query)

The above result shows that the airspace Airspace_LSAGCTA_411 consists of three sub-airspaces: Airspace_LSAGE_411, Airspace_LSAGN_411 and Airspace_LSAGS_411.

Let's inspect these sub-airspaces and see what they are made of.

RESULT TECHNICAL:
Use the VALUES syntax to repeat the input parameters in the result and get complete triples in the result table.
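
A minimal sketch of how such a VALUES query could be generated from a list of IDs. The helper names values_clause and triples_query are hypothetical; the sketch only builds the query string and does not contact the endpoint.

```python
def values_clause(variable, local_names):
    """Build a SPARQL VALUES clause for names in the default ':' namespace."""
    terms = " ".join(":" + name for name in local_names)
    return "VALUES ?{} {{ {} }}".format(variable, terms)

def triples_query(local_names, limit=2000):
    """Return a query listing all triples whose subject is one of the given IDs."""
    return (
        "PREFIX : <http://www.datacron-project.eu/datAcron#>\n"
        "SELECT ?s ?p ?o\n"
        "WHERE {\n"
        "  " + values_clause("s", local_names) + "\n"
        "  ?s ?p ?o .\n"
        "}\n"
        "LIMIT " + str(limit)
    )

query = triples_query(["Airspace_LSAGE_411", "Airspace_LSAGN_411", "Airspace_LSAGS_411"])
print(query)
```

Because ?s is bound through VALUES and selected, every result row carries its own subject, which is what makes the triples complete.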

In [350]:
sparql_query = """
PREFIX : <http://www.datacron-project.eu/datAcron#>
PREFIX dul: <http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#>

SELECT  ?s ?p ?o
WHERE { 
  VALUES ?s { :Airspace_LSAGE_411 :Airspace_LSAGN_411 :Airspace_LSAGS_411}
  ?s ?p ?o. 
}

LIMIT 2000
"""

blocks = query_data(sparql_query)
blocks = query_value_cleaning(blocks)
%time blocks = blocks.sort_values('p')
blocks.to_csv('data/airspace_inspection.csv')
blocks

CPU times: user 0 ns, sys: 0 ns, total: 0 ns
Wall time: 1.2 ms


Unnamed: 0,o,p,s
49,INITIAL EAST,hasName,Airspace_LSAGE_411
50,_,hasName,Airspace_LSAGN_411
51,INITIAL SUD,hasName,Airspace_LSAGS_411
28,Airblock_LSAGN_LSAGN_726LF_0_195_0_195,hasPart,Airspace_LSAGN_411
29,Airblock_LSAGN_LSAGN_727LF_0_195_0_19,hasPart,Airspace_LSAGN_411
30,Airblock_LSAGS_LSAGS_040LF_0_195_0_195,hasPart,Airspace_LSAGS_411
31,Airblock_LSAGS_LSAGS_077LF_195_245_195_245,hasPart,Airspace_LSAGS_411
32,Airblock_LSAGS_LSAGS_416LI_0_245_0_245,hasPart,Airspace_LSAGS_411
33,Airblock_LSAGS_LSAGS_418LI_195_245_195_245,hasPart,Airspace_LSAGS_411
34,Airblock_LSAGS_LSAGS_505LS_0_245_0_245,hasPart,Airspace_LSAGS_411


RESULT SEMANTIC

The sub-airspaces consist of airblocks. It should be possible to inspect these airblocks in more detail.

TODO
Unfortunately, the following query does not return more details for any airblock ID that I pass in. I would have expected to get the 3D coordinates of the airblocks. This prevents me from proceeding with my regulations analysis.

In [351]:
# Construct the list of airblocks (named airblock_list to avoid shadowing the built-in str)
airblock_list = ''
for i in range(len(blocks)):
    if blocks.iloc[i]['p'] == 'hasPart':
        airblock_list = airblock_list + ' :' + blocks.iloc[i]['o']
        

# Inject the list of airblocks into the SPARQL query and get all properties of all airblocks
sparql_query2 = """
PREFIX : <http://www.datacron-project.eu/datAcron#>
PREFIX dul: <http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#>

SELECT ?p ?o
WHERE {
  VALUES ?s {""" + airblock_list + """}
  ?s ?p ?o.
}
LIMIT 2000
"""
print(sparql_query2)

df2 = query_data(sparql_query2)
df2 = query_value_cleaning(df2)
%time df2 = df2.sort_values('p')

df2




PREFIX : <http://www.datacron-project.eu/datAcron#>
PREFIX dul: <http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#>

SELECT ?p ?o
WHERE {
  VALUES ?s { :Airblock_LSAGN_LSAGN_726LF_0_195_0_195 :Airblock_LSAGN_LSAGN_727LF_0_195_0_19 :Airblock_LSAGS_LSAGS_040LF_0_195_0_195 :Airblock_LSAGS_LSAGS_077LF_195_245_195_245 :Airblock_LSAGS_LSAGS_416LI_0_245_0_245 :Airblock_LSAGS_LSAGS_418LI_195_245_195_245 :Airblock_LSAGS_LSAGS_505LS_0_245_0_245 :Airblock_LSAGS_LSAGS_506LS_155_245_155_245 :Airblock_LSAGS_LSAGS_523LF_195_245_195_245 :Airblock_LSAGS_LSAGS_710LF_0_245_0_245 :Airblock_LSAGS_LSAGS_711LF_195_245_195_245 :Airblock_LSAGS_LSAGS_712LF_195_245_195_245 :Airblock_LSAGS_LSAGS_713LF_195_245_195_245 :Airblock_LSAGS_LSAGS_714LF_195_245_195_245 :Airblock_LSAGS_LSAGS_714LI_0_245_0_245 :Airblock_LSAGS_LSAGS_715LI_195_245_195_245 :Airblock_LSAGS_LSAGS_719LF_155_245_155_245 :Airblock_LSAGS_LSAGS_720LF_155_245_155_245 :Airblock_LSAGS_LSAGS_734LF_195_245_195_245 :Airblock_LSAGS_LSAGS_803LS_130_245

Unnamed: 0,p,o


### 2.9 Get the name of an airspace 

As noted above, the string ending in _411 is the airspace ID. To get the human-readable name of an airspace, query:

In [349]:
sparql_query = """ 
PREFIX : <http://www.datacron-project.eu/datAcron#>
PREFIX dul: <http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#>

SELECT DISTINCT ?s ?o WHERE {
      ?s a :FM_Airspace ;
         :hasName ?o.
}
LIMIT 100
"""
df = query_data(sparql_query)
query_value_cleaning(df)
#df.to_csv("airspaceconfigs.csv")
df.head(5)

Unnamed: 0,o,s
0,BARDARBUNGA,Airspace_BIRDBARDAR_411
1,LUXEMBURG SECTOR 195-245,Airspace_EBBULUH_411
2,ADJACENT CCAMS EB AOI,Airspace_EBCCAOI_411
3,ADJACENT CCAMS EB AOP,Airspace_EBCCAOP_411
4,ELSENBORN01,Airspace_EBR04_411


### 2.10 Change of subject: Trajectories Inspection

The following query (from the tutorial) pulls 100 trajectories out of the store, directly with their coordinates.

NOTE
Be aware that the direct availability of the coordinates is a convenience feature created by the RDF maintainers. It saves us from traversing every single semantic node of a trajectory to get the coordinates and allows for easy plotting.

TODO
Unfortunately, this convenience feature only gives us 2D coordinates without altitude or time information. To intersect trajectories with airblocks, we need 4D coordinates including time and altitude. The chosen data format, well-known text (WKT), can handle 4D coordinates: the z coordinate is available for altitude and the m coordinate for a linear measure (the time passed since the epoch can be treated as the linear distance of a point from the epoch).
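
To make the 4D idea concrete, here is a minimal sketch that writes a WKT LINESTRING ZM with altitude as z and epoch seconds as m. The two coordinate pairs come from the first trajectory; the altitudes and timestamps are invented, treating the timestamps as UTC is an assumption, and linestring_zm is a hypothetical helper rather than something the triple store provides.

```python
from datetime import datetime, timezone

def to_epoch_seconds(timestamp):
    """Parse an ISO timestamp and interpret it as UTC (an assumption)."""
    parsed = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%S")
    return int(parsed.replace(tzinfo=timezone.utc).timestamp())

def linestring_zm(points):
    """points: iterable of (lon, lat, altitude, iso_timestamp) tuples."""
    parts = ["{} {} {} {}".format(lon, lat, alt, to_epoch_seconds(ts))
             for lon, lat, alt, ts in points]
    return "LINESTRING ZM (" + ", ".join(parts) + ")"

# Altitudes and timestamps below are invented for illustration.
sample_points = [
    (3.961389, 43.583333, 0, "2016-04-01T03:52:30"),
    (3.968056, 43.580833, 150, "2016-04-01T03:53:00"),
]
print(linestring_zm(sample_points))
```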


In [400]:
sparql_query = """ 
PREFIX : <http://www.datacron-project.eu/datAcron#>
PREFIX dul: <http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#>

SELECT DISTINCT * WHERE {
  ?flight_plan_ID a :FM_RTFM ;
                    :departureAirport ?dep ;
                    :destinationAirport ?dest ;
                    :reportsTrajectory ?trajectory_ID .
  ?trajectory_ID dul:hasConstituent/:hasWKT ?geom

} LIMIT 100

"""

#to query another endpoint, change the URL for the service and the query
df = query_data(sparql_query)
query_value_cleaning(df)
df.to_csv("data/trajectories.csv")
df.head(10)

Unnamed: 0,dep,dest,flight_plan_ID,geom,trajectory_ID
0,Place_MontpellierMediterranee_Airport,Place_BastiaPoretta_Airport,flight_plan_AA51114077,"LINESTRING(3.961389 43.583333, 3.968056 43.580...",traj_AA51114077_20160401003000
1,Place_Bristol___Lulsgate,Place_TenerifeSur_ReinaSofia_Airport,flight_plan_AA51114336,"LINESTRING(-2.719167 51.382778, -2.736111 51.3...",traj_AA51114336_20160331154500
2,Place_Valencia_Manises_Airport,Place_Bucuresti_HenriCoanda_Airport,flight_plan_AA51114524,"LINESTRING(-0.481667 39.489444, -0.492222 39.4...",traj_AA51114524_20160331183000
3,Place_Nantes_Atlantique_Airport,Place_London___Gatwick_Airport,flight_plan_AA51115368,"LINESTRING(-1.607778 47.156944, -1.601389 47.1...",traj_AA51115368_20160331155000
4,Place_QuimperPluguffan_Airport,Place_ParisOrly_Airport,flight_plan_AA51121974,"LINESTRING(-4.167778 47.975, -4.143056 47.9677...",traj_AA51121974_20160401044000
5,Place_Valencia_Manises_Airport,Place_Lisboa___Portela,flight_plan_AA51125894,"LINESTRING(-0.481667 39.489444, -0.472222 39.4...",traj_AA51125894_20160401045000
6,Place_ParisOrly_Airport,Place_Lisboa___Portela,flight_plan_AA51134032,"LINESTRING(2.379444 48.723333, 2.365278 48.720...",traj_AA51134032_20160401043000
7,Place_Barcelona_ElPrat_Airport,Place_Lisboa___Portela,flight_plan_AA51134247,"LINESTRING(2.078333 41.296944, 2.163333 41.290...",traj_AA51134247_20160401044000
8,Place_Erzurum_Airport,Place_Istanbul_SabihaGokcen_Airport,flight_plan_AA51137475,"LINESTRING(41.170556 39.955833, 41.197778 39.9...",traj_AA51137475_20160401053000
9,Place_Praha_Ruzyne_Airport,Place_FrankfurtMain_Airport,flight_plan_AA51138917,"LINESTRING(14.26 50.100833, 14.247222 50.0975,...",traj_AA51138917_20160401052000


We can visualize these trajectories in 2D using geomet [https://pypi.python.org/pypi/geomet/0.1.0] for the WKT --> GeoJSON transformation and folium [https://github.com/python-visualization/folium], a Python wrapper for Leaflet.js interactive maps.

In [401]:
from geomet import wkt
import json

# Convert the WKT column into a GeoJSON column
df["geojson"] = df["geom"].apply(lambda x: json.dumps(wkt.loads(x)))
df = df.drop(columns='geom')
df.head(3)

Unnamed: 0,dep,dest,flight_plan_ID,trajectory_ID,geojson
0,Place_MontpellierMediterranee_Airport,Place_BastiaPoretta_Airport,flight_plan_AA51114077,traj_AA51114077_20160401003000,"{""type"": ""LineString"", ""coordinates"": [[3.9613..."
1,Place_Bristol___Lulsgate,Place_TenerifeSur_ReinaSofia_Airport,flight_plan_AA51114336,traj_AA51114336_20160331154500,"{""type"": ""LineString"", ""coordinates"": [[-2.719..."
2,Place_Valencia_Manises_Airport,Place_Bucuresti_HenriCoanda_Airport,flight_plan_AA51114524,traj_AA51114524_20160331183000,"{""type"": ""LineString"", ""coordinates"": [[-0.481..."


In [402]:
df.iloc[0]['geojson']

'{"type": "LineString", "coordinates": [[3.961389, 43.583333], [3.968056, 43.580833], [3.974722, 43.578333], [3.986389, 43.575278], [3.998333, 43.5725], [4.163333, 43.530556], [4.269167, 43.503889], [4.398889, 43.471111], [4.540278, 43.435278], [4.716944, 43.390556], [4.917222, 43.34], [4.928889, 43.336944], [5.159722, 43.276667], [5.333056, 43.231667], [5.729722, 43.229167], [5.828889, 43.228611], [6.023889, 43.226667], [6.601944, 43.219444], [6.841667, 43.346111], [6.893333, 43.373333], [7.3625, 43.327778], [7.621389, 43.301944], [8.303056, 43.170833], [8.434722, 43.145556], [8.626944, 43.118333], [8.759167, 43.099722], [9.059444, 43.057222], [9.083333, 43.053889], [9.395556, 42.978333], [9.488333, 42.955833], [9.603889, 42.927778], [9.574444, 42.847222], [9.568056, 42.829444], [9.542222, 42.758889], [9.526111, 42.714722], [9.513333, 42.679444], [9.506944, 42.661944], [9.474722, 42.573611], [9.484722, 42.55]]}'

In [403]:
import folium

map1 = folium.Map(location=[40,10], zoom_start=4, control_scale=True, prefer_canvas=True)
for index, row in df.head(5).iterrows():
    c = folium.GeoJson(row['geojson'], name = (row['dep']+ row['dest']),overlay=True, 
                       style_function = lambda feature: {'fillColor': '#ffaf00','color': 'blue', 'weight': 2.5,'dashArray': '5, 5'},
                       highlight_function = lambda feature: {'fillColor': '#ffaf00','color': 'green', 'weight': 5,'dashArray': '5, 5'})
    c.add_child(folium.Popup(row['dep'] +'\n' + row['dest']))
    c.add_to(map1)
folium.LayerControl().add_to(map1)
map1.save(outfile='maps/map1.html')

map1    

### 2.11 Visualize trajectory with time

With the query above, we were able to view the trajectories in 2D. To include time information, I need to inspect a trajectory in more detail. For a single trajectory ID (e.g. Bristol to Tenerife, traj_AA51114336_20160331154500), we can pull the nodes with their timestamps and positions as follows; the query below uses traj_AA51114077_20160401003000, the Montpellier to Bastia flight from the table above:

In [411]:
sparql_query = """ 
PREFIX : <http://www.datacron-project.eu/datAcron#>
PREFIX dul: <http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#>

SELECT ?node ?time ?position WHERE {
    :traj_AA51114077_20160401003000 :hasPart ?node .
    ?node dul:hasConstituent/:TimeStart ?time ;
          dul:hasConstituent/:hasWKT ?position ; 
          ?p ?o .
}ORDER BY ?time

"""

singletraj = query_data(sparql_query)
query_value_cleaning(singletraj)
singletraj.to_csv("data/singletrajectory.csv")
#convert position to geoJson
singletraj["geojson"] = singletraj["position"].apply(lambda x: wkt.loads(x))
singletraj.head(5)

Unnamed: 0,node,position,time,geojson
0,n_AA51114077_20160401003000_1,POINT(3.961389 43.583333),2016-04-01T03:52:30,"{'type': 'Point', 'coordinates': [3.961389, 43..."
1,n_AA51114077_20160401003000_1,POINT(3.961389 43.583333),2016-04-01T03:52:30,"{'type': 'Point', 'coordinates': [3.961389, 43..."
2,n_AA51114077_20160401003000_1,POINT(3.961389 43.583333),2016-04-01T03:52:30,"{'type': 'Point', 'coordinates': [3.961389, 43..."
3,n_AA51114077_20160401003000_1,POINT(3.961389 43.583333),2016-04-01T03:52:30,"{'type': 'Point', 'coordinates': [3.961389, 43..."
4,n_AA51114077_20160401003000_1,POINT(3.961389 43.583333),2016-04-01T03:52:30,"{'type': 'Point', 'coordinates': [3.961389, 43..."
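
The repeated rows probably stem from the unbound `?p ?o` pattern in the query, which yields one row per triple of each node; this is my assumption and not verified against the store. Either that pattern can be dropped from the query, or the duplicates can be removed client-side, as in this sketch with illustrative rows (node names shortened, second timestamp invented):

```python
import pandas as pd

# Illustrative rows shaped like the result above; the second node's timestamp is invented.
raw = pd.DataFrame({
    "node": ["n_1", "n_2", "n_1", "n_2", "n_1"],
    "time": ["2016-04-01T03:52:30", "2016-04-01T03:53:10",
             "2016-04-01T03:52:30", "2016-04-01T03:53:10", "2016-04-01T03:52:30"],
    "position": ["POINT(3.961389 43.583333)", "POINT(3.968056 43.580833)",
                 "POINT(3.961389 43.583333)", "POINT(3.968056 43.580833)",
                 "POINT(3.961389 43.583333)"],
})

# Parse the timestamps so that sorting is chronological rather than lexical.
raw["time"] = pd.to_datetime(raw["time"])

# One row per node, ordered along the flight.
unique_nodes = (raw.drop_duplicates(subset=["node", "time", "position"])
                   .sort_values("time")
                   .reset_index(drop=True))
print(unique_nodes)
```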


In [412]:
singletraj.iloc[0]['geojson']

{'coordinates': [3.961389, 43.583333], 'type': 'Point'}

I will now define a function inject_time that adds the time to the geojson column.

In [413]:
def inject_time(geojson, time):
    """
    Injects Time dimension into geoJSON coordinates. Expects  a dict in geojson POINT format.
    """
    #geojson['coordinates'] = [geojson['coordinates'][0], geojson['coordinates'][1], time]
    geojson['properties'] = {'times' : [time]}
    return geojson

singletraj['geojson3D'] =  singletraj.apply(lambda x: inject_time(x['geojson'], x['time']), axis=1)

singletraj.iloc[0]['geojson3D']

{'coordinates': [3.961389, 43.583333],
 'properties': {'times': ['2016-04-01T03:52:30']},
 'type': 'Point'}

In [414]:
def extract_coordinates(geojson):
    """
    Returns only the coordinates from a geoJSON POINT object
    """
    return geojson['coordinates']

singletraj['coord_only'] =  singletraj['geojson'].apply(lambda x: extract_coordinates(x))

singletraj.iloc[0]['coord_only']

[3.961389, 43.583333]

Now we have, for each row, a valid geoJSON point which includes the timestamp. We now need to combine the points to a new geoJSON dict that can be accepted by folium. Reminder:

'{"type": "LineString", "coordinates": [[3.961389, 43.583333], [3.968056, 43.580833], [3.974722, 43.578333], [3.986389, 43.575278], [3.998333, 43.5725], [4.163333, 43.530556], [4.269167, 43.503889], [4.398889, 43.471111],  [9.484722, 42.55]]}
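
The combination step can also be sketched as a small helper. As far as I understand the folium documentation for TimestampedGeoJson, the times list is expected under the properties of the Feature rather than inside the geometry, so the sketch places it there; treat that placement as an assumption. build_timestamped_feature is a hypothetical name and the second timestamp is invented.

```python
def build_timestamped_feature(coordinates, times):
    """Combine per-point coordinates and timestamps into one GeoJSON Feature."""
    if len(coordinates) != len(times):
        raise ValueError("need exactly one timestamp per coordinate pair")
    return {
        "type": "Feature",
        "geometry": {"type": "LineString", "coordinates": list(coordinates)},
        # Assumption: the time plugin reads 'times' from the Feature's properties.
        "properties": {"times": list(times)},
    }

feature = build_timestamped_feature(
    [[3.961389, 43.583333], [3.968056, 43.580833]],
    ["2016-04-01T03:52:30", "2016-04-01T03:53:10"],  # second timestamp is invented
)
print(feature)
```

The length check guards against the silent misalignment that duplicate or missing rows would otherwise cause between coordinates and times.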


In [421]:
ng = {}
ng['type'] = 'LineString'
ng['coordinates'] = []
ng['properties'] = {'times': []}

for x in range(len(singletraj)):
    ng['coordinates'].append(singletraj.iloc[x]['coord_only'])
    ng['properties']['times'].append(singletraj.iloc[x]['time'])

ng

ng2 = {}
ng2['type']  = 'Feature'
ng2['geometry'] = ng

ng2

{'geometry': {'coordinates': [[3.961389, 43.583333],
   [3.961389, 43.583333],
   [3.961389, 43.583333],
   [3.961389, 43.583333],
   [3.961389, 43.583333],
   [3.961389, 43.583333],
   [8.626944, 43.118333],
   [8.626944, 43.118333],
   [9.506944, 42.661944],
   [9.506944, 42.661944],
   [3.968056, 43.580833],
   [3.968056, 43.580833],
   [4.917222, 43.34],
   [4.917222, 43.34],
   [3.986389, 43.575278],
   [3.986389, 43.575278],
   [3.998333, 43.5725],
   [3.998333, 43.5725],
   [4.163333, 43.530556],
   [4.163333, 43.530556],
   [4.269167, 43.503889],
   [4.269167, 43.503889],
   [4.398889, 43.471111],
   [4.398889, 43.471111],
   [4.540278, 43.435278],
   [4.540278, 43.435278],
   [4.716944, 43.390556],
   [4.716944, 43.390556],
   [5.159722, 43.276667],
   [5.159722, 43.276667],
   [5.729722, 43.229167],
   [5.729722, 43.229167],
   [8.303056, 43.170833],
   [8.303056, 43.170833],
   [8.759167, 43.099722],
   [8.759167, 43.099722],
   [9.059444, 43.057222],
   [9.059444, 43.057222

Let's show these coordinates as a trajectory on a map.

In [423]:
from folium.plugins import TimestampedGeoJson

map2 = folium.Map(location=[40,10], zoom_start=4, control_scale=True, prefer_canvas=True)
tgj = TimestampedGeoJson(ng2)
map2.add_child(tgj, name='sometimestampedgeojson')


#c = folium.GeoJson(ng, name = 'Traj with time',overlay=True, 
#                       style_function = lambda feature: {'fillColor': '#ffaf00','color': 'blue', 'weight': 2.5,'dashArray': '5, 5'},
#                       highlight_function = lambda feature: {'fillColor': '#ffaf00','color': 'green', 'weight': 5,'dashArray': '5, 5'})
#c.add_child(folium.Popup('some popup'))
#c.add_to(map2)
#folium.LayerControl().add_to(map2)
map2.save(outfile='maps/map2.html')

map2  

In [370]:



# Convert the WKT column into a GeoJSON column
df["geojson"] = df["geom"].apply(lambda x: json.dumps(wkt.loads(x)))
df = df.drop(columns='geom')
df.head(3)

po = json.loads(singletraj.iloc[0]['geojson'])

print(po)
po['coordinates'] = [po['coordinates'][0], po['coordinates'][1], singletraj.iloc[0]['time'] ]
print(po)
type(po)

{'type': 'Point', 'coordinates': [3.961389, 43.583333]}
{'type': 'Point', 'coordinates': [3.961389, 43.583333, '2016-04-01T03:52:30']}


dict

In [311]:

sparql_query = """ 
PREFIX : <http://www.datacron-project.eu/datAcron#>
PREFIX dul: <http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#>

SELECT * WHERE {
    :n_AA51114077_20160401003000_1 ?p ?o .
}
"""

#to query another endpoint, change the URL for the service and the query
node = query_data(sparql_query)
query_value_cleaning(st)
node.to_csv("node.csv")
node.head(10)

Unnamed: 0,o,p
0,http://www.datacron-project.eu/datAcron#t_1459...,http://www.ontologydesignpatterns.org/ont/dul/...
1,http://www.datacron-project.eu/datAcron#geom_LFMT,http://www.ontologydesignpatterns.org/ont/dul/...
2,http://www.datacron-project.eu/datAcron#n_AA51...,http://www.datacron-project.eu/datAcron#precedes


TODO Why do I not get any weather shown here? Why no altitude?

And of course we want to visualize this trajectory with Leaflet again, using Folium's timing popup!

In [318]:
sparql_query = """ 
PREFIX : <http://www.datacron-project.eu/datAcron#>
PREFIX dul: <http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#>

SELECT ?vessel ?time ?wkt ?speed ?event ?weather 
WHERE {
    ?s1 a :Node ;
          :ofMovingObject ?vessel ;
          :hasSpeed ?speed ;
        dul:hasConstituent/:TimeStart ?time ;
        dul:hasConstituent/:hasWKT ?wkt .
FILTER(bif:st_distance( bif:st_geomfromtext ("POINT(13.139045 44.466133)"), bif:st_geomfromtext(?wkt)) <= 5 &&
    xsd:dateTime(substr(?time,1,19))<xsd:dateTime("2016-01-08 16:12:41") && 
    xsd:dateTime(substr(?time,1,19))>xsd:dateTime("2016-01-08 14:12:41"))
}
"""

#to query another endpoint, change the URL for the service and the query
ci = query_data(sparql_query)
query_value_cleaning(ci)
ci.to_csv("closeins.csv")
ci.head(10)



EndPointInternalError: EndPointInternalError: endpoint returned code 500 and response. 

Response:
b'Virtuoso S1T00 Error SR171: Transaction timed out\n\nSPARQL query:\ndefine sql:big-data-const 0 \n#output-format:application/sparql-results+json\n \nPREFIX : <http://www.datacron-project.eu/datAcron#>\nPREFIX dul: <http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#>\n\nSELECT ?vessel ?time ?wkt ?speed ?event ?weather \nWHERE {\n    ?s1 a :Node ;\n          :ofMovingObject ?vessel ;\n          :hasSpeed ?speed ;\n        dul:hasConstituent/:TimeStart ?time ;\n        dul:hasConstituent/:hasWKT ?wkt .\nFILTER(bif:st_distance( bif:st_geomfromtext ("POINT(13.139045 44.466133)"), bif:st_geomfromtext(?wkt)) <= 5 &&\n    xsd:dateTime(substr(?time,1,19))<xsd:dateTime("2016-01-08 16:12:41") && \n    xsd:dateTime(substr(?time,1,19))>xsd:dateTime("2016-01-08 14:12:41"))\n}\n'

In [317]:
sparql_query = """
PREFIX : <http://www.datacron-project.eu/datAcron#>
PREFIX dul: <http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema\#>

SELECT ?vessel ?time ?wkt 
WHERE {
    ?s1 a :Node ;
    :ofMovingObject ?vessel ;
    OPTIONAL{:occurs ?event ;}
    dul:hasConstituent/:TimeStart ?time ;
    dul:hasConstituent/:hasWKT ?wkt .
    FILTER(bif:st_distance(bif:st_geomfromtext ("POINT(13.139045 44.466133)"),bif:st_geomfromtext(?wkt))<=5 &&
    xsd:dateTime(substr(?time,1,19))<xsd:dateTime("2016-01-08 16:12:41") && 
    xsd:dateTime(substr(?time,1,19))>xsd:dateTime("2016-01-08 14:12:41"))
}
"""

#to query another endpoint, change the URL for the service and the query
ci = query_data(sparql_query)
query_value_cleaning(ci)
ci.to_csv("closeins.csv")
ci.head(10)



QueryBadFormed: QueryBadFormed: a bad request has been sent to the endpoint, probably the sparql query is bad formed. 

Response:
b'Virtuoso 37000 Error SP030: SPARQL compiler, line 0: Bad character \'\\\' (0x5c) in SPARQL expression at \'\\\'\n\nSPARQL query:\ndefine sql:big-data-const 0 \n#output-format:application/sparql-results+json\n\nPREFIX : <http://www.datacron-project.eu/datAcron#>\nPREFIX dul: <http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#>\nPREFIX xsd: <http://www.w3.org/2001/XMLSchema\\#>\n\nSELECT ?vessel ?time ?wkt \nWHERE {\n    ?s1 a :Node ;\n    :ofMovingObject ?vessel ;\n    OPTIONAL{:occurs ?event ;}\n    dul:hasConstituent/:TimeStart ?time ;\n    dul:hasConstituent/:hasWKT ?wkt .\n    FILTER(bif:st_distance(bif:st_geomfromtext ("POINT(13.139045 44.466133)"),bif:st_geomfromtext(?wkt))<=5 &&\n    xsd:dateTime(substr(?time,1,19))<xsd:dateTime("2016-01-08 16:12:41") && \n    xsd:dateTime(substr(?time,1,19))>xsd:dateTime("2016-01-08 14:12:41"))\n}\n'
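
The QueryBadFormed response points at the backslash in the xsd prefix. In addition, the OPTIONAL block in that cell sits in the middle of a predicate list, which SPARQL syntax does not allow. A syntactically cleaned-up sketch of the query could look like the following; it has not been run against the endpoint, the LIMIT is my addition, and whether it avoids the timeout is unknown.

```python
# Assumption: same filter values as in the failing cell; not run against the endpoint.
sparql_query_fixed = """
PREFIX : <http://www.datacron-project.eu/datAcron#>
PREFIX dul: <http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?vessel ?time ?wkt ?event
WHERE {
    ?s1 a :Node ;
        :ofMovingObject ?vessel ;
        dul:hasConstituent/:TimeStart ?time ;
        dul:hasConstituent/:hasWKT ?wkt .
    OPTIONAL { ?s1 :occurs ?event . }
    FILTER(bif:st_distance(bif:st_geomfromtext("POINT(13.139045 44.466133)"),
                           bif:st_geomfromtext(?wkt)) <= 5 &&
           xsd:dateTime(substr(?time,1,19)) < xsd:dateTime("2016-01-08 16:12:41") &&
           xsd:dateTime(substr(?time,1,19)) > xsd:dateTime("2016-01-08 14:12:41"))
}
LIMIT 100
"""

# A cheap local sanity check before sending anything to the endpoint.
has_backslash = "\\" in sparql_query_fixed
print("contains backslash:", has_backslash)
```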

TODO It cannot be right that I get a timeout on an example query? And I even left out the OPTIONALs...

### Pull regulations

Regulations live in a time interval. So whenever we want to pull regulations, we have to specify which time interval should be covered. This requires a start date $t_s$, an end date $t_e$ and the interval size $\Delta t$. The number of time steps is then $$n = \frac{t_e - t_s}{\Delta t}$$
It follows that the $i$-th period is the one that starts at $t_s + i \cdot \Delta t$.

We begin with an example for the airspace Airspace_LTBA_411, because it is regulated very often.
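
The interval arithmetic can be sketched with the standard library. Rounding n up, so that a final shorter interval is kept, is my assumption; interval_starts is a hypothetical helper name.

```python
from datetime import datetime, timedelta
from math import ceil

def interval_starts(t_s, t_e, delta):
    """Return the start of every interval of length delta between t_s and t_e."""
    # Rounding up is an assumption; it keeps a final, shorter interval.
    n = ceil((t_e - t_s) / delta)
    return [t_s + i * delta for i in range(n)]

# April 2016 in daily steps.
starts = interval_starts(datetime(2016, 4, 1), datetime(2016, 5, 1), timedelta(days=1))
print(len(starts), starts[0], starts[-1])
```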

In [143]:
sparql_query = """ 
PREFIX : <http://www.datacron-project.eu/datAcron#>
PREFIX dul: <http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX myfn: <java:datAcronTester.unipi.gr.sparql_functions.>

SELECT (count (DISTINCT ?regulation) as ?count)
WHERE {?regulation rdf:type/rdfs:subClassOf* :FM_Regulation ;
                   dul:hasRegion 'Airspace_LTBA_411' ;
                   dul:hasTimeInterval '6' .
       ?t :TimeStart ?s ;
          :TimeEnd   ?e .
FILTER(myfn:overlaps(?s, ?e, '2016-04-01 00:00:00'^^<http://www.w3.org/2001/XMLSchema#DateTime>, 
                             '2016-04-30 23:59:59'^^<http://www.w3.org/2001/XMLSchema#DateTime>))
}
"""

df = query_data(sparql_query, 'http://83.212.239.109:3434/sparql')  # query the small central store
query_value_cleaning(df)
#df.to_csv("airspaceconfigs.csv")
#df = df[df.p != "#type"]
df.head(20)

small central store at 109 is being used




TypeError: byte indices must be integers or slices, not str