---
title: "Working with JSON-LD"
author: "Charles F. Vardeman II"
date: "2023-09-09"
format:
    html:
        code-fold: true
        grid:
          margin-width: 350px
reference-location: margin
citation-location: margin
bibliography: kg.bib
---

## Some Notes on JSON-LD
We have been using the [kglab tutorial](https://derwen.ai/docs/kgl/tutorial/) to learn how W3C web standards can be used to construct knowledge graphs. One particularly useful standard, touched on only briefly in the tutorial's [serialization exercise](https://derwen.ai/docs/kgl/ex1_0/#serialization-as-json-ld), is JSON-LD. That exercise corresponds to [building a knowledge graph with rdflib](kglab/examples/ex1_0.ipynb) in the kglab/examples subdirectory. The central concept to understand is the [*JSON-LD context*](https://www.w3.org/TR/json-ld11/#the-context).

### Transforming Standard JSON to a Knowledge Graph
JSON-LD can turn ordinary JSON data into a knowledge graph with very little extra markup. By adding an `@context` to a JSON document, developers map each key to an IRI, which defines how the data should be interpreted semantically. The flat data structure can then be read as a set of linked statements, which opens up querying and linking possibilities that plain JSON does not offer.
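To make the role of the context concrete, here is a minimal sketch using only the Python standard library. It is a toy illustration, not the JSON-LD expansion algorithm: it handles only simple term-to-IRI mappings on a single flat node, and the document, the `example.org` identifiers, and the `to_triples` helper are all assumptions made up for this example. A real application would use a JSON-LD-aware library such as rdflib.

```python
# Toy illustration only: real JSON-LD processing is far more involved.
doc = {
    "@context": {
        "name": "http://schema.org/name",
        "homepage": "http://schema.org/url",
    },
    "@id": "https://example.org/people/alice",  # hypothetical identifier
    "name": "Alice",
    "homepage": "https://example.org/alice",
}


def to_triples(document):
    """Map each plain key to the IRI given in the context.

    Returns (subject, predicate, value) tuples for keys that the
    context defines; keywords such as @id and @context are skipped.
    """
    context = document["@context"]
    subject = document["@id"]
    return [
        (subject, context[key], value)
        for key, value in document.items()
        if not key.startswith("@") and key in context
    ]


for triple in to_triples(doc):
    print(triple)
```

The same JSON keys (`name`, `homepage`) stay readable for ordinary JSON consumers, while the context supplies the globally unique property IRIs that a graph needs.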

### JSON-LD 1.1 Features
The release of [JSON-LD 1.1](https://www.w3.org/TR/json-ld11) introduced a range of new features aimed at enhancing its functionality and ease of use. Key updates include improved context management through scoped contexts, support for graph containers, and nested properties via `@nest`. These features make it easier to construct intricate knowledge graphs and offer more control over how data is contextualized.
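As a rough sketch of what two of these 1.1 features look like in practice, the document below is written as a Python dictionary and serialized with the standard library. The `sources` term, the `example.org` IRIs, and the article data are hypothetical; only the keywords (`@version`, a type-scoped `@context`, and `"@container": "@graph"`) come from the JSON-LD 1.1 specification.

```python
import json

doc = {
    "@context": {
        "@version": 1.1,  # opt in to JSON-LD 1.1 processing
        "schema": "http://schema.org/",
        # Type-scoped context: "title" is only defined for Article nodes
        "Article": {
            "@id": "schema:Article",
            "@context": {"title": "schema:headline"},
        },
        # Graph container: the value of "sources" is treated as a graph
        "sources": {
            "@id": "http://example.org/vocab#sources",  # hypothetical term
            "@container": "@graph",
        },
    },
    "@type": "Article",
    "title": "Working with JSON-LD",
    "sources": {
        "@id": "http://example.org/data/1",  # hypothetical node
        "schema:name": "An example source",
    },
}

print(json.dumps(doc, indent=2))
```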

## JSON-LD and Web APIs
In the modern web ecosystem, APIs serve as the bridges between different services and applications. Because a JSON-LD document is still valid JSON, it fits naturally into existing web APIs, which makes it a strong choice for developers who need to consume or provide structured, linked data. RESTful services and their clients can keep working in a familiar environment while benefiting from enhanced data semantics.

Moreover, because JSON-LD identifies resources with IRIs, data returned by different API calls can be aggregated, filtered, and transformed without guessing which records refer to the same thing. This creates opportunities for richer, more interactive applications that can adapt in real time to changes in the underlying data. For example, by using JSON-LD in a RESTful API for a content management system, you could dynamically link related articles, authors, and tags, thereby providing a richer user experience.
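A small sketch of that content management scenario is shown below. The two responses, their `example.org` identifiers, and the `index_by_id` helper are assumptions invented for illustration, and the merge is deliberately naive: it assumes both responses share the same context and simply indexes nodes by `@id`.

```python
# Hypothetical responses from two endpoints of a content management API.
articles_response = {
    "@context": {"@vocab": "http://schema.org/"},
    "@graph": [
        {
            "@id": "https://example.org/articles/1",
            "@type": "Article",
            "headline": "Working with JSON-LD",
            "author": {"@id": "https://example.org/people/alice"},
        }
    ],
}
authors_response = {
    "@context": {"@vocab": "http://schema.org/"},
    "@graph": [
        {"@id": "https://example.org/people/alice", "@type": "Person", "name": "Alice"}
    ],
}


def index_by_id(*documents):
    """Merge the nodes of several documents into one dict keyed by @id."""
    nodes = {}
    for document in documents:
        for node in document.get("@graph", []):
            nodes.setdefault(node["@id"], {}).update(node)
    return nodes


nodes = index_by_id(articles_response, authors_response)
article = nodes["https://example.org/articles/1"]
author = nodes[article["author"]["@id"]]  # follow the link by identifier
print(author["name"])  # Alice
```

Because the article refers to its author by `@id` rather than by an embedded copy, the client can join the two responses without any endpoint-specific logic.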

Additionally, JSON-LD is independent of the transport and security layers, so it can be combined with other web standards such as OAuth for authorization or CORS for cross-origin resource sharing. This lets it slot into robust and scalable API ecosystems rather than replace any part of them.

Finally, JSON-LD also plays a significant role in Web APIs for semantic search engines and linked data platforms. These APIs can consume JSON-LD to understand the contextual relationships between different pieces of information, thereby enabling more precise search queries.



## Example: Exposing JSON-LD Context via HTTP Link Header
In many real-world applications, the JSON-LD context can be exposed via an HTTP `Link` header, similar to how Schema.org does it. This enables clients to discover the context automatically and understand how to interpret plain JSON as linked data.

Suppose you have a RESTful API for a blog platform, and you want to expose a JSON-LD context for articles. The HTTP response could include a `Link` header pointing to the JSON-LD context:

```http
HTTP/1.1 200 OK
Content-Type: application/json
Link: <https://yourapi.com/docs/jsonldcontext.json>; rel="http://www.w3.org/ns/json-ld#context"; type="application/ld+json"
```

With this setup, clients consuming the API can follow the link to fetch the context and understand the semantics of the data. Here is a simplified example of what the `jsonldcontext.json` might look like:

```json
{
  "@context": {
    "title": "http://schema.org/headline",
    "author": "http://schema.org/author",
    "datePublished": "http://schema.org/datePublished",
    "content": "http://schema.org/text"
  }
}
```

By using the `Link` header to expose the JSON-LD context, you make it easier for clients to consume and understand your API's data without changing the JSON payload itself. This follows the JSON-LD 1.1 mechanism for interpreting plain JSON as JSON-LD and allows for greater interoperability and semantic richness.
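On the client side, discovering the context amounts to finding the link whose relation is `http://www.w3.org/ns/json-ld#context`. The following is a minimal sketch of that step using only the standard library. The `find_context_url` helper is hypothetical, and its parsing is simplified: it assumes quoted `rel` values and does not cover every form the `Link` header syntax allows.

```python
import re
from typing import Optional

JSONLD_CONTEXT_REL = "http://www.w3.org/ns/json-ld#context"


def find_context_url(link_header: str) -> Optional[str]:
    """Return the target of the first link whose rel is the JSON-LD context relation."""
    # Split on commas that are followed by the "<" of the next link target
    for part in re.split(r",\s*(?=<)", link_header):
        match = re.match(r"\s*<([^>]*)>(.*)", part)
        if not match:
            continue
        url, params = match.groups()
        rel = re.search(r'rel\s*=\s*"([^"]*)"', params)
        # A rel value may hold several space-separated relation types
        if rel and JSONLD_CONTEXT_REL in rel.group(1).split():
            return url
    return None


header = (
    '<https://yourapi.com/docs/jsonldcontext.json>; '
    'rel="http://www.w3.org/ns/json-ld#context"; type="application/ld+json"'
)
print(find_context_url(header))
```

A client would then fetch that URL and apply the returned context to the `application/json` body of the original response.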


## Example: Using JSON-LD to construct a Knowledge Graph from Wikipedia tables

Wikipedia can provide a starting point for structuring and enriching knowledge graphs. For our Navy project, we want to provide LLM agents with context for factual information that may have changed since pre-training, using [`Retrieval Augmented Generation`](https://learn.microsoft.com/en-us/azure/machine-learning/concept-retrieval-augmented-generation?view=azureml-api-2#Technical%20overview%20of%20using%20RAG%20on%20Large%20Language%20Models%20(LLMs)). To that end, we want to augment our KG with broader contextual information that agents can draw on for question answering. One source of such context is the [List of current ships of the United States Navy](https://en.wikipedia.org/wiki/List_of_current_ships_of_the_United_States_Navy), which contains a number of tables that could be useful additions to our knowledge graph.

[Wikidata](https://www.wikidata.org/wiki/Wikidata:Main_Page) does contain most of the `ships`, but the entities are not always complete. For example, the Wikidata entity page for [USS Abraham Lincoln](https://www.wikidata.org/wiki/Q587557) shows a rather rich set of relations, including `events` that correspond to `ship naming ceremony` and `ship commissioning`. However, entries for ships like [USS Carter Hall](https://www.wikidata.org/wiki/Q2468798) are very basic and contain little information we can extract. The [history for `List of current ships of the United States Navy`](https://en.wikipedia.org/w/index.php?title=List_of_current_ships_of_the_United_States_Navy&action=history) shows that the page is fairly active and that Wikipedia contributors keep it more up to date than the Wikidata entries.

Let's do a little [Exploratory Data Analysis](https://www.kaggle.com/code/jhoward/getting-started-with-nlp-for-absolute-beginners) using large language models, similar to [Getting Started With LLMs](https://www.kaggle.com/code/jhoward/getting-started-with-llms/). We will use `requests` to retrieve the Wikipedia page and [Beautiful Soup](https://pypi.org/project/beautifulsoup4/) to find the HTML tables in the document.

```python
import requests
from bs4 import BeautifulSoup

url = "https://en.wikipedia.org/wiki/List_of_current_ships_of_the_United_States_Navy"
response = requests.get(url)
response.raise_for_status()  # fail early on HTTP errors
soup = BeautifulSoup(response.text, "html.parser")

# The first table on the page is the "Commissioned" ships table
table = soup.find_all("table")[0]
```


Now that we have the first table, let's create a pandas dataframe from it and sanity check that it matches [the first table on the page](https://en.wikipedia.org/wiki/List_of_current_ships_of_the_United_States_Navy#Commissioned).

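```python
#| label: fig-commissioned
#| fig-cap: Dataframe from Wikipedia Commissioned ship table.
#| echo: true

from io import StringIO

import pandas as pd

# Wrap the HTML in StringIO; passing a literal HTML string to read_html is deprecated
df = pd.read_html(StringIO(str(table)))[0]
df
```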



|     | Ship name | Hull number | Class | Type | Commission date | Homeport[2] | Note |
|-----|-----------|-------------|-------|------|-----------------|-------------|------|
| 0 | USS Abraham Lincoln | CVN-72 | Nimitz | Aircraft carrier | 11 November 1989 | San Diego, CA | [3] |
| 1 | USS Alabama | SSBN-731 | Ohio | Ballistic missile submarine | 25 May 1985 | Bangor, WA | [4] |
| 2 | USS Alaska | SSBN-732 | Ohio | Ballistic missile submarine | 25 January 1986 | Kings Bay, GA | [5] |
| 3 | USS Albany | SSN-753 | Los Angeles | Attack submarine | 7 April 1990 | Norfolk, VA | [6] |
| 4 | USS Alexandria | SSN-757 | Los Angeles | Attack submarine | 29 June 1991 | San Diego, CA | [7] Scheduled to be decommissioned 2026[8] |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 234 | USS William P. Lawrence | DDG-110 | Arleigh Burke | Destroyer | 19 May 2011 | San Diego, CA | [242] |
| 235 | USS Winston S. Churchill | DDG-81 | Arleigh Burke | Destroyer | 10 March 2001 | Norfolk, VA | [243] |
| 236 | USS Wichita | LCS-13 | Freedom | Littoral combat ship | 12 January 2019 | Mayport, FL | [244] Proposed to be decommissioned 2023[17] |
| 237 | USS Wyoming | SSBN-742 | Ohio | Ballistic missile submarine | 13 July 1996 | Kings Bay, GA | [245] |


If we want to use the column names as the basis for eventually constructing URIs, we unfortunately need to make them web safe by removing spaces and other problematic characters. The other question is what a row represents. If we want to build a knowledge graph of all of the ships of the Navy, we might want to start with a basic ontology. The first table lists "commissioned" ships, and being `commissioned` is a role relationship since it has a start time and an end time.
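One way to picture a role with a start and an end time is as a small record that is separate from the ship itself. The sketch below is only an illustration of that idea, not a proposed ontology: the `CommissionedRole` class, its field names, and the `is_active` helper are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class CommissionedRole:
    """A ship's time in commission, modeled as a role with a time interval."""

    ship: str                   # identifier of the ship, here a hull number
    start: date                 # commission date
    end: Optional[date] = None  # decommission date, unknown while in service

    def is_active(self, on: date) -> bool:
        """Return True if the ship holds the role on the given day."""
        return self.start <= on and (self.end is None or on < self.end)


# Commission date taken from the first row of the table
role = CommissionedRole(ship="CVN-72", start=date(1989, 11, 11))
print(role.is_active(date(2023, 9, 9)))
```

Keeping the role separate from the ship means a later decommissioning only adds an end date to the role and leaves the ship entity unchanged.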

```python
column_names = df.columns.tolist()
column_names
```

```text
['Ship name',
 'Hull number',
 'Class',
 'Type',
 'Commission date',
 'Homeport[2]',
 'Note']
```

```python
#| label: fig-pandasJSON
#| fig-cap: JSON generated directly from Pandas Dataframe.
#| echo: true

import json
import uuid

from pprint import pprint

# Normalize column names in DataFrame: underscores for spaces, no brackets
normalized_columns = {
    col: col.replace(" ", "_").replace("[", "").replace("]", "")
    for col in column_names
}
df.rename(columns=normalized_columns, inplace=True)

# Let's change the date to be consistent with xsd:date
# df['Commission_date'] = pd.to_datetime(df['Commission_date']).dt.strftime('%Y-%m-%d')

# Convert DataFrame to JSON-formatted string
df_json_str = df.to_json(orient="records", indent=4)

# Convert JSON-formatted string to Python object
df_json = json.loads(df_json_str)

# Pretty-print the first record
pprint(df_json[0])
```

```text
{'Class': 'Nimitz',
 'Commission_date': '11 November 1989',
 'Homeport2': 'San Diego, CA',
 'Hull_number': 'CVN-72',
 'Note': '[3]',
 'Ship_name': 'USS\xa0Abraham Lincoln',
 'Type': 'Aircraft carrier'}
```
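
The record above is still plain JSON. As a rough sketch of a possible next step, the code below turns one such record into a JSON-LD node. Several things here are assumptions rather than decisions made in these notes: the `https://example.org/navy/` vocabulary and the `Ship` type are placeholders, `record_to_jsonld` is a hypothetical helper, a random `urn:uuid:` is just one option for the node identifier, and the date conversion only handles the "day month year" form seen in the first record.

```python
import json
import uuid
from datetime import datetime

# Placeholder namespace; a real project would point at its own ontology
VOCAB = "https://example.org/navy/"

CONTEXT = {
    "@vocab": VOCAB,
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    # Type coercion so the date string is read as an xsd:date literal
    "Commission_date": {"@id": VOCAB + "Commission_date", "@type": "xsd:date"},
}


def record_to_jsonld(record):
    """Wrap one table row (a dict) as a JSON-LD node."""
    # Replace the non-breaking spaces that Wikipedia puts in ship names
    node = {
        key: value.replace("\xa0", " ") if isinstance(value, str) else value
        for key, value in record.items()
    }
    # Convert e.g. "11 November 1989" to the ISO form expected by xsd:date
    parsed = datetime.strptime(node["Commission_date"], "%d %B %Y")
    node["Commission_date"] = parsed.date().isoformat()
    return {"@context": CONTEXT, "@id": uuid.uuid4().urn, "@type": "Ship", **node}


first_record = {
    "Class": "Nimitz",
    "Commission_date": "11 November 1989",
    "Homeport2": "San Diego, CA",
    "Hull_number": "CVN-72",
    "Note": "[3]",
    "Ship_name": "USS\xa0Abraham Lincoln",
    "Type": "Aircraft carrier",
}
print(json.dumps(record_to_jsonld(first_record), indent=2))
```

With `@vocab` in place, every normalized column name becomes a property IRI under the placeholder namespace, which is why making the column names web safe matters.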
