# Semantic Annotation of Data using JSON Linked Data
 
The EarthCube Geosemantics Framework (http://ecgs.ncsa.illinois.edu/) developed a prototype of a decentralized framework that combines Linked Data and RESTful web services to annotate, connect, integrate, and reason about the integration of geoscience resources. The framework allows the semantic enrichment of web resources and semantic mediation among heterogeneous geoscience resources, such as models and data.

This notebook provides examples of how the Semantic Annotation Service can be used to manage linked controlled vocabularies using JSON Linked Data (JSON-LD). It shows how to query the built-in RDF graphs for existing linked standard vocabularies based on the Community Surface Dynamics Modeling System (CSDMS), Observations Data Model (ODM2), and Unidata udunits2 vocabularies; how to query the built-in crosswalks between the CSDMS and ODM2 vocabularies using SKOS; and how to add new linked vocabularies to the service. The JSON-LD based definitions provided by these endpoints will be used to annotate sample data available within the IML Critical Zone Observatory data repository using the Clowder Web Service API (https://data.imlczo.org/). By supporting JSON-LD, the Semantic Annotation Service and the Clowder framework illustrate how portable and semantically defined metadata can be used to better annotate data across repositories and services.

## Linked Data

The Linked Data paradigm emerged in the context of Semantic Web technologies for publishing and sharing data over the Web. It connects related individual Web resources in a graph database, where resources are the graph nodes and an edge connects a pair of nodes. Publishing and linking scientific resources using Semantic Web technologies requires that the user community follow the three principles of Linked Data.

First, each resource needs to be represented using a unique Uniform Resource Identifier (URI), which consists of (i) a Uniform Resource Locator (URL) that defines the server path over the Web, and (ii) a Uniform Resource Name (URN) that describes the exact name of the resource.

Second, the relationships between resources are described using the triple format, where a subject S has a predicate P with an object O. A predicate is either an undirected (bi-directional) relationship, which connects two entities in both directions, or a directed (uni-directional) relationship, where the presence of a relationship between two entities in one direction does not imply the presence of the reverse relationship. The triple is the structural unit of a Linked Data system.

Finally, the HyperText Transfer Protocol (HTTP) is used as a universal access mechanism for resources on the Web. More details are available at http://linkeddata.org/.
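The triple format can be illustrated with a minimal Python sketch that needs no network access. The `http://example.org/` URIs and the `hasOutput` and `hasUnit` predicates are hypothetical, chosen only for illustration; they are not part of the service's vocabularies.

```python
# Each triple is (subject, predicate, object). All URIs here are hypothetical.
EX = "http://example.org/"

triples = {
    (EX + "model-1", EX + "hasOutput", EX + "windSpeed"),
    (EX + "windSpeed", EX + "hasUnit", EX + "meter_per_second"),
}

def objects(graph, subject, predicate):
    """Return every object O such that (subject, predicate, O) is in the graph."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)

# A directed relationship holds only in the direction in which it was stated.
forward = objects(triples, EX + "model-1", EX + "hasOutput")
reverse = objects(triples, EX + "windSpeed", EX + "hasOutput")
print(forward)
print(reverse)
```

Looking up the same predicate from the object's side returns nothing, which is what "directed" means here: the reverse relationship exists only if it is stated as its own triple.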

## Basic Requirements

In [45]:
import requests
import json

host = 'http://hcgs.ncsa.illinois.edu'

In [None]:
r = requests.get(host + "/gsis/CSNqueryName?graph=csdms&name=air__dynamic_shear_viscosity")
r.json()

## Semantic Annotation Service

### Time Instant Annotation
Returns a temporal annotation for a time instant in JSON-LD format.

In [None]:
r = requests.get(host + "/gsis/sas/temporal?time=2014-01-01T08:01:01-09:00")
r.json()
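Building the query string by concatenation works for the timestamp above, but a value containing reserved characters (for example the `+` of a positive UTC offset) may be decoded as a space on the server side. The following is a minimal sketch that percent-encodes parameters with the standard library; `build_url` is a hypothetical helper and the `+01:00` timestamp is an illustrative value, neither is part of the service.

```python
from urllib.parse import urlencode

host = 'http://hcgs.ncsa.illinois.edu'  # same host as defined earlier in the notebook

def build_url(base, path, **params):
    """Join base and path, then append percent-encoded query parameters."""
    query = urlencode(params)
    return "{}{}?{}".format(base, path, query) if query else base + path

# Hypothetical timestamp with a positive offset; '+' becomes %2B and ':' becomes %3A.
url = build_url(host, "/gsis/sas/temporal", time="2014-01-01T08:01:01+01:00")
print(url)
```

The resulting string can be passed to `requests.get` in the same way as the hand-built URLs in the other cells.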

### List the names of all graphs in the knowledge base

Returns a list of all the graphs stored in the knowledge base.

In [20]:
r = requests.get(host + "/gsis/listGraphNames")
r.json()

{'graph_names': ['csdms',
  'odm2-vars',
  'udunits2-base',
  'udunits2-derived',
  'udunits2-accepted',
  'udunits2-prefix',
  'google-unit',
  'model-2',
  'model-3',
  'data-1',
  'data-2',
  'data-3',
  'variable_name_crosswalk',
  'variable_name_crosswalk-owl',
  'variable_name_crosswalk-skos',
  'model_test',
  'model_test11',
  'config_vars.ttl',
  'model-x',
  'Info',
  'Inf',
  'demo-model',
  'csv-mappings',
  'models_graph9d7d400f53864989a05d3ae539f30a78',
  'models_graph37baec3114d74ca6abd72cce75f966db',
  'models_graphe604316f14334985aaf4ebd6fe220e77',
  'models_graph26e29f5026664f11b244072bf6956f74']}
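The listing mixes the standard vocabularies with test and model graphs, so it can help to filter it by name. A minimal sketch, using a shortened copy of the response above so that it runs offline; `graphs_with_prefix` is a hypothetical helper name.

```python
# Shortened copy of the response shown above, so the sketch runs without the service.
response = {'graph_names': ['csdms', 'odm2-vars', 'udunits2-base',
                            'udunits2-derived', 'udunits2-accepted',
                            'udunits2-prefix', 'google-unit']}

def graphs_with_prefix(listing, prefix):
    """Return the graph names in the listing that start with the given prefix."""
    return [name for name in listing.get('graph_names', [])
            if name.startswith(prefix)]

udunits_graphs = graphs_with_prefix(response, 'udunits2-')
print(udunits_graphs)
```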

In [21]:
r = requests.get(host + "/gsis/read?graph=udunits2-prefix")
r.json()

{'@graph': [{'@id': 'http://mmisw.org/ont/mmitest/udunits2-prefix/Prefix',
   '@type': 'owl:Class',
   'rdfs:label': 'Prefix',
   'subClassOf': 'skos:Concept'},
  {'@id': 'http://mmisw.org/ont/mmitest/udunits2-prefix/atto',
   '@type': 'http://mmisw.org/ont/mmitest/udunits2-prefix/Prefix',
   'http://mmisw.org/ont/mmitest/udunits2-prefix/name': 'atto',
   'http://mmisw.org/ont/mmitest/udunits2-prefix/symbol': 'a',
   'http://mmisw.org/ont/mmitest/udunits2-prefix/value': '1e-18'},
  {'@id': 'http://mmisw.org/ont/mmitest/udunits2-prefix/centi',
   '@type': 'http://mmisw.org/ont/mmitest/udunits2-prefix/Prefix',
   'http://mmisw.org/ont/mmitest/udunits2-prefix/name': 'centi',
   'http://mmisw.org/ont/mmitest/udunits2-prefix/symbol': 'c',
   'http://mmisw.org/ont/mmitest/udunits2-prefix/value': '.01'},
  {'@id': 'http://mmisw.org/ont/mmitest/udunits2-prefix/deci',
   '@type': 'http://mmisw.org/ont/mmitest/udunits2-prefix/Prefix',
   'http://mmisw.org/ont/mmitest/udunits2-prefix/name': 'deci
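The `@graph` entries returned by `/gsis/read` can be flattened into a simple lookup table. A minimal sketch, using only the class node and the two complete prefix entries shown above as sample input; `prefix_table` is a hypothetical helper name, and converting the `value` strings to floats is an assumption about how they are meant to be used.

```python
NS = 'http://mmisw.org/ont/mmitest/udunits2-prefix/'

# Sample input copied from the complete entries in the response above.
sample = {'@graph': [
    {'@id': NS + 'Prefix', '@type': 'owl:Class',
     'rdfs:label': 'Prefix', 'subClassOf': 'skos:Concept'},
    {'@id': NS + 'atto', '@type': NS + 'Prefix',
     NS + 'name': 'atto', NS + 'symbol': 'a', NS + 'value': '1e-18'},
    {'@id': NS + 'centi', '@type': NS + 'Prefix',
     NS + 'name': 'centi', NS + 'symbol': 'c', NS + 'value': '.01'},
]}

def prefix_table(doc):
    """Map each prefix name to its numeric multiplier, skipping non-Prefix nodes."""
    table = {}
    for node in doc.get('@graph', []):
        if node.get('@type') == NS + 'Prefix':
            table[node[NS + 'name']] = float(node[NS + 'value'])
    return table

table = prefix_table(sample)
print(table)
```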

In [22]:
r = requests.get(host + "/gsis/sas/vars/list?term=wind+speed")
r.json()

['csn:earth_surface_wind__range_of_speed',
 'csn:land_surface_wind__reference_height_speed',
 'csn:land_surface_wind__speed_reference_height',
 'csn:projectile_origin_wind__speed',
 'odm2:windGustSpeed',
 'odm2:windSpeed']
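The search result mixes terms from several vocabularies, distinguished by their prefix (`csn:` and `odm2:` here). A minimal sketch that groups the terms by prefix, using the response above as input; `group_by_prefix` is a hypothetical helper name.

```python
from collections import defaultdict

# Copy of the response shown above, so the sketch runs without the service.
results = ['csn:earth_surface_wind__range_of_speed',
           'csn:land_surface_wind__reference_height_speed',
           'csn:land_surface_wind__speed_reference_height',
           'csn:projectile_origin_wind__speed',
           'odm2:windGustSpeed',
           'odm2:windSpeed']

def group_by_prefix(terms):
    """Group 'prefix:local_name' terms into {prefix: [local_name, ...]}."""
    groups = defaultdict(list)
    for term in terms:
        prefix, _, local = term.partition(':')
        groups[prefix].append(local)
    return dict(groups)

grouped = group_by_prefix(results)
print(grouped['odm2'])
```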

In [23]:
r = requests.get(host + "/gsis/CSNqueryName?graph=csdms&name=air__dynamic_shear_viscosity")
r.json()

{'name': 'air__dynamic_shear_viscosity',
 'type': 'http://ecgs.ncsa.illinois.edu/2015/csn/name',
 'object_fullname': 'air',
 'quantity_fullname': 'dynamic_shear_viscosity',
 'base_object': 'air',
 'base_quantity': 'viscosity',
 'object_part': ['air'],
 'quantity_part': ['dynamic', 'shear', 'viscosity']}
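The response suggests that the name is split into an object and a quantity at the double underscore, and that each side is then split into parts at single underscores. The sketch below reproduces only that much locally; it is an inference from this one response, not the service's actual parsing logic, and it does not attempt to derive `base_object` or `base_quantity`. `split_csn_name` is a hypothetical helper name.

```python
def split_csn_name(name):
    """Split an 'object__quantity' name into its object and quantity parts."""
    object_fullname, sep, quantity_fullname = name.partition('__')
    if not sep:
        raise ValueError("expected 'object__quantity', got {!r}".format(name))
    return {
        'name': name,
        'object_fullname': object_fullname,
        'quantity_fullname': quantity_fullname,
        # Assumption: parts are separated by single underscores.
        'object_part': object_fullname.split('_'),
        'quantity_part': quantity_fullname.split('_'),
    }

parsed = split_csn_name('air__dynamic_shear_viscosity')
print(parsed)
```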

In [24]:
r = requests.get(host + "/gsis/CSNqueryName?graph=odm2-vars&name=windSpeed")
r.json()

{}

## Clowder

In [42]:
clowder = 'https://data.imlczo.org/clowder'
key = ''
headers = {'Content-type': 'application/json'}

### Create dataset

In [43]:
def create_dataset(name, description, access, space, collection):
    '''
    Create an empty Clowder dataset and return the parsed JSON response.

    params: name, description,
        access: "PUBLIC" or "PRIVATE",
        space: a list of strings, can be empty,
        collection: a list of strings, can be empty
    '''
    url = "{}/api/datasets/createempty?key={}".format(clowder, key)
    payload = json.dumps({'name': name,
                          'description': description,
                          'access': access,
                          'space': space,
                          'collection': collection})

    r = requests.post(url,
                      data=payload,
                      headers=headers)
    print(r.status_code)
    print(r.text)
    # Return the parsed body so the caller can read the new dataset id.
    return r.json()

In [46]:
# Use a name other than `json` so the json module is not shadowed.
dataset = create_dataset(name="new dataset", description="...", access="PRIVATE",
                         space=['5ebb1c114f0c0ce46113839d'],
                         collection=['5ebb1c2d4f0c0ce4611383bb'])
datasetid = dataset.get("id")

200
{"id":"5ebc21604f0c0ce4611411c7"}


In [None]:
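# `datasetid` is the id of the target dataset, e.g. the "id" printed when the dataset was created.
url = "{}/api/uploadToDataset/{}?key={}".format(clowder, datasetid, key)
for i in range(1, 9000):
    print('Uploading file ' + str(i))
    # Open the file in a context manager so the handle is closed after each upload.
    with open('emptyfile', 'rb') as f:
        result = requests.post(url, files={"File": f})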