# Geosemantics Framework

We face many challenges in extracting meaningful information from data. Frequently, these obstacles compel scientists to integrate models with data manually. Manual integration becomes considerably more difficult when a user aims to integrate long-tail data (data collected by individual researchers or small research groups) and long-tail models (models developed by individuals or small modeling communities). We focus on these long-tail resources because, despite their often narrow scope, they have significant impacts on scientific studies and present an opportunity to address critical gaps through automated integration. The goal of the Geosemantics Framework is to provide a framework rooted in semantic techniques and approaches to support the integration of “long-tail” models and data.


## Semantic Annotation of Data using JSON Linked Data
 
The EarthCube Geosemantics Framework (http://ecgs.ncsa.illinois.edu/) developed a prototype of a decentralized framework that combines Linked Data and RESTful web services to annotate, connect, integrate, and reason about the integration of geoscience resources. The framework supports the semantic enrichment of web resources and semantic mediation among heterogeneous geoscience resources, such as models and data.

This notebook provides examples of how the Semantic Annotation Service can be used to manage linked controlled vocabularies using JSON Linked Data (JSON-LD). It shows how to query the built-in RDF graphs for existing linked standard vocabularies based on the Community Surface Dynamics Modeling System (CSDMS), Observations Data Model (ODM2), and Unidata udunits2 vocabularies; how to query built-in crosswalks between the CSDMS and ODM2 vocabularies using SKOS; and how to add new linked vocabularies to the service. The JSON-LD definitions provided by these endpoints are then used to annotate sample data in the IML Critical Zone Observatory data repository (https://data.imlczo.org/) through the Clowder Web Service API. By supporting JSON-LD, the Semantic Annotation Service and the Clowder framework illustrate how portable, semantically defined metadata can be used to better annotate data across repositories and services.

## Linked Data

The Linked Data paradigm emerged in the context of Semantic Web technologies for publishing and sharing data over the Web. It connects related individual Web resources in a graph, where resources are the nodes and each edge connects a pair of nodes. Publishing and linking scientific resources using Semantic Web technologies requires that the user community follow the three principles of Linked Data:
1. Each resource needs to be represented using a unique Uniform Resource Identifier (URI), which consists of: (i) a Uniform Resource Locator (URL) to define the server path over the Web, and (ii) a Uniform Resource Name (URN) to describe the exact name of the resource.
2. The relationships between resources are described using the triple format, where a subject S has a predicate P with an object O. A predicate is either an undirected (bi-directional) relationship, which connects two entities in both directions, or a directed (uni-directional) relationship, where the presence of a relationship between two entities in one direction does not imply the presence of the reverse relationship. The triple is the basic structural unit of a Linked Data system.
3. The HyperText Transfer Protocol (HTTP) is used as a universal access mechanism for resources on the Web. 
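
The triple format and the distinction between directed and undirected predicates described above can be sketched in plain Python. This is only an illustration, not the framework's implementation: the resource URIs under `http://example.org/` and the helper `holds` are hypothetical, while the two predicate URIs come from the W3C SKOS vocabulary, where `exactMatch` is symmetric and `broader` is not.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """A single (subject, predicate, object) statement; each part is a URI."""
    subject: str
    predicate: str
    obj: str


SKOS = "http://www.w3.org/2004/02/skos/core#"
BASE = "http://example.org/vocab/"  # hypothetical namespace, for illustration only

WIND_SPEED_A = BASE + "a/wind_speed"  # hypothetical term in vocabulary A
WIND_SPEED_B = BASE + "b/windSpeed"   # hypothetical term in vocabulary B
WIND = BASE + "a/wind"                # hypothetical broader term

triples = {
    Triple(WIND_SPEED_A, SKOS + "exactMatch", WIND_SPEED_B),
    Triple(WIND_SPEED_A, SKOS + "broader", WIND),
}

# Predicates treated as undirected (bi-directional) relationships.
SYMMETRIC = {SKOS + "exactMatch"}


def holds(subject, predicate, obj):
    """Return True if the statement is asserted, or implied by symmetry."""
    if Triple(subject, predicate, obj) in triples:
        return True
    return predicate in SYMMETRIC and Triple(obj, predicate, subject) in triples
```

With these definitions, `holds(WIND_SPEED_B, SKOS + "exactMatch", WIND_SPEED_A)` is true even though only the reverse statement was asserted, whereas `holds(WIND, SKOS + "broader", WIND_SPEED_A)` is false because `broader` is directed.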

For more information about linked data, please visit https://www.w3.org/standards/semanticweb/data.


## Basic Requirements

In [None]:
import requests
import json
import ipywidgets as widgets

from IPython.display import display

host = 'http://hcgs.ncsa.illinois.edu'

In [None]:
r = requests.get(host + "/gsis/CSNqueryName?graph=csdms&name=air__dynamic_shear_viscosity")
r.json()

## Temporal Annotation Services
Time values follow the W3C date and time format, a profile of ISO 8601. Times are expressed either in UTC (Coordinated Universal Time) or in local time together with a time zone offset in hours and minutes. For more information, please visit https://www.w3.org/TR/NOTE-datetime.
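
As a small sketch of the time format described above, Python's standard `datetime` module can produce such strings from timezone-aware values. The helper name `to_w3c_datetime` is hypothetical and not part of the service.

```python
from datetime import datetime, timedelta, timezone


def to_w3c_datetime(dt):
    """Format a timezone-aware datetime as YYYY-MM-DDThh:mm:ss+hh:mm."""
    if dt.tzinfo is None:
        # A naive datetime has no offset, so the result would be ambiguous.
        raise ValueError("datetime must be timezone-aware")
    return dt.isoformat(timespec="seconds")


# Local time nine hours behind UTC.
local = datetime(2014, 1, 1, 8, 1, 1, tzinfo=timezone(timedelta(hours=-9)))
print(to_w3c_datetime(local))                           # 2014-01-01T08:01:01-09:00
print(to_w3c_datetime(local.astimezone(timezone.utc)))  # 2014-01-01T17:01:01+00:00
```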

### Time Instant Annotation
Query parameters: 
* **time** (string): time value in UTC format

In [None]:
# Get a temporal annotation for a time instant in a JSON-LD format.
time = '2014-01-01T08:01:01-09:00'
r = requests.get(f"{host}/gsis/sas/temporal?time={time}")
print(r.json())

### Time Interval Annotation
Query parameters: 
* **beginning** (string): time value in UTC format.
* **end** (string): time value in UTC format.

In [None]:
# Get a temporal annotation for a time interval in a JSON-LD format.
beginning = '2014-01-01T08:01:01-10:00'
end = '2014-12-31T08:01:01-10:00'
r = requests.get(f"{host}/gsis/sas/temporal?beginning={beginning}&end={end}")
print(r.json())

### Time Series Annotation
Query parameters:
* **beginning** (string): time value in UTC format.
* **end** (string): time value in UTC format.
* **interval** (float): time step.

In [None]:
# Get a temporal annotation for a time series in a JSON-LD format.
beginning = '2014-01-01T08:01:01-10:00'
end = '2014-03-01T08:01:01-10:00'
interval = '4'
r = requests.get(f"{host}/gsis/sas/temporal?beginning={beginning}&end={end}&interval={interval}")
print(r.json())
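
The requests above place the time values directly in the URL. That works for negative offsets, but a positive offset such as `+05:00` contains a `+`, which is commonly decoded as a space in query strings. A minimal sketch of building the query string with percent-encoding follows; the helper name `temporal_query_url` is hypothetical, and the endpoint path is simply the one used in the cells above.

```python
from urllib.parse import urlencode


def temporal_query_url(host, **params):
    """Build a temporal annotation URL with percent-encoded query parameters."""
    # Drop parameters that were not supplied, then encode the rest.
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{host}/gsis/sas/temporal?{query}"


url = temporal_query_url("http://hcgs.ncsa.illinois.edu",
                         time="2014-01-01T08:01:01+05:00")
print(url)
```

Passing a dictionary through the `params` argument of `requests.get` encodes the values in the same way.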

## Semantic Annotation Service

### List the names of all graphs in the knowledge base

Returns a list of all the graphs stored in the knowledge base.

In [None]:
r = requests.get(host + "/gsis/listGraphNames")
r.json()

In [None]:
r = requests.get(host + "/gsis/read?graph=udunits2-prefix")
r.json()

In [None]:
r = requests.get(host + "/gsis/sas/vars/list?term=wind+speed")
r.json()

In [None]:
r = requests.get(host + "/gsis/CSNqueryName?graph=csdms&name=air__dynamic_shear_viscosity")
r.json()

In [None]:
r = requests.get(host + "/gsis/CSNqueryName?graph=odm2-vars&name=windSpeed")
r.json()

## Variable Annotation Services

### List the names of all graphs in the knowledge base

In [None]:
# Get a list of the names of all graphs in the knowledge base.
r = requests.get(f"{host}/gsis/listGraphNames")
print(r.json())

### List the content of a Graph (for example, CSDMS Standard Names)

In [None]:
# Get the content stored in a specific graph in a JSON-LD format.
graph = 'csdms'
r = requests.get(f"{host}/gsis/read?graph={graph}")
print(r.json())

### List of CSDMS Standard Names and ODM2 Variable Names

In [None]:
# Get the CSDMS Standard Names as a flat list.
r = requests.get(f"{host}/gsis/sas/sn/csn")
print(r.json())

In [None]:
# Get the ODM2 Variable Names as a flat list.
r = requests.get(f"{host}/gsis/sas/sn/odm2")
print(r.json())

### Search Across Registered Graphs

In [None]:
# Get all properties of a given CSDMS Standard Name from a specific graph in a JSON-LD format.
graph = 'csdms'
name = 'air__dynamic_shear_viscosity'
r = requests.get(f"{host}/gsis/CSNqueryName?graph={graph}&name={name}")
print(r.json())

### Units

In [None]:
# Get the list of udunits2 units in JSON format.
r = requests.get(f"{host}/gsis/sas/unit/udunits2")
print(r.json())

In [None]:
# Get the list of Google units in JSON format.
r = requests.get(host + "/gsis/sas/unit/google")
print(r.json())

## Clowder

In [None]:
clowder = 'https://data.imlczo.org/clowder'
key = ''
headers = {'Content-type': 'application/json', 'X-API-Key': key}

### Search by metadata

Search by keyword

In [None]:
query = 'test'
url = "{}/api/search?query={}".format(clowder, query)
r = requests.get(url, headers=headers)
r.raise_for_status()
r.json()

Search by `ODM2 Variable Name = precipitation`

In [None]:
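# Search for datasets annotated with the given ODM2 variable name.
query = '"ODM2 Variable Name":"precipitation"'
url = "{}/api/search".format(clowder) + "?query={}".format(query)
r = requests.get(url, headers=headers)
r.raise_for_status()
# Do not name this variable `json`: that would shadow the json module.
results = r.json().get('results')
datasetId = results[0].get('id')
print('Dataset id: ' + datasetId)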
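
The cell above indexes the first search result directly, which raises an error when the search returns nothing. A defensive variant can be sketched as follows; the helper `first_result_id` is hypothetical, and it assumes only the response shape already used above (a `results` list whose items carry an `id`).

```python
def first_result_id(search_response):
    """Return the id of the first search result, or None if there is none."""
    # Treat a missing or null `results` entry the same as an empty list.
    results = search_response.get('results') or []
    if not results:
        return None
    return results[0].get('id')


# Usage with hand-written sample responses (not real server output).
print(first_result_id({'results': [{'id': 'abc123'}]}))
print(first_result_id({'results': []}))
```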

In [None]:
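# List files in dataset
url = "{}/api/datasets/{}/files".format(clowder, datasetId)
r = requests.get(url, headers=headers)
r.raise_for_status()
# Do not name this variable `json`: that would shadow the json module.
fileList = r.json()
# Download the first file
fileId = fileList[0].get('id')
fileName = fileList[0].get('filename')
url = "{}/api/files/{}/blob".format(clowder, fileId)
r = requests.get(url, headers=headers)
r.raise_for_status()
# Use a context manager so the file is closed after writing.
with open(fileName, 'wb') as f:
    f.write(r.content)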

### Create new dataset

In [None]:
def create_dataset(name, description, access, space, collection):
    '''
    Create an empty dataset in Clowder and return the JSON response.

    params: name, description,
        access: 'PUBLIC' or 'PRIVATE',
        space: a list of strings, can be empty,
        collection: a list of strings, can be empty
    '''
    url = "{}/api/datasets/createempty".format(clowder)
    payload = json.dumps({'name':name, 
                          'description':description,
                          'access':access,
                          'space':space,
                          'collection':collection}) 

    r = requests.post(url,
                     data=payload,
                     headers=headers)
    print(r.status_code)
    print(r.text)
    return r.json()

In [None]:
# Re-import json in case an earlier cell rebound the name `json`.
import json
dataset = create_dataset(name="new dataset", description="...", access="PRIVATE",
                         space=['5ebb1c114f0c0ce46113839d'],
                         collection=['5ebb1c2d4f0c0ce4611383bb'])
newDatasetId = dataset.get('id')
print(f'View at {clowder}/datasets/{newDatasetId}')

### Upload file to new dataset

In [None]:
url = "{}/api/uploadToDataset/{}".format(clowder, newDatasetId)
# Open the file in a context manager so it is closed after the upload.
with open(fileName, 'rb') as f:
    r = requests.post(url, files={'file': f}, headers={'X-API-Key': key})
r.raise_for_status()
print(r.json())
fileId = r.json().get('id')
print(f'View at {clowder}/files/{fileId}')

### Add metadata to new file

In [None]:
import json
url = "{}/api/files/{}/metadata.jsonld".format(clowder, fileId)
payload = {
    "@context":[
        "https://clowder.ncsa.illinois.edu/contexts/metadata.jsonld"
    ],
    "agent":{
        "@type":"cat:extractor",
        "name":"ECGS Notebook",
        "extractor_id":"https://clowder.ncsa.illinois.edu/api/extractors/ecgs"
    },
    "content":{
        "foo": "bar"
    }
}
r = requests.post(url, headers = headers, data=json.dumps(payload))
r.json()
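
The payload above carries placeholder content (`"foo": "bar"`). As a sketch of how a semantic annotation might be attached instead, the helper below wraps arbitrary content in the same payload structure. The function `build_metadata` is hypothetical, and using `ODM2 Variable Name` as a content key is an assumption based on the search example earlier in this notebook.

```python
import json

CLOWDER_CONTEXT = "https://clowder.ncsa.illinois.edu/contexts/metadata.jsonld"


def build_metadata(content,
                   agent_name="ECGS Notebook",
                   extractor_id="https://clowder.ncsa.illinois.edu/api/extractors/ecgs"):
    """Wrap `content` in the JSON-LD metadata structure used in the cell above."""
    return {
        "@context": [CLOWDER_CONTEXT],
        "agent": {
            "@type": "cat:extractor",
            "name": agent_name,
            "extractor_id": extractor_id,
        },
        # Copy the mapping so later changes by the caller do not leak in.
        "content": dict(content),
    }


# Assumed annotation key, mirroring the metadata search shown earlier.
annotation = build_metadata({"ODM2 Variable Name": "precipitation"})
print(json.dumps(annotation, indent=2))
```

The resulting dictionary can be posted in the same way as `payload` above, with `data=json.dumps(annotation)`.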

In [None]:
# The context file describes the basic elements of a Clowder metadata document
r = requests.get('https://clowder.ncsa.illinois.edu/contexts/metadata.jsonld')
r.json()