# LangGraph_Agent

> Quick-and-dirty prototype of a FAIR agent using LangGraph, walked through by hand

## References for Prototype

- [PIDINST Schema](https://github.com/iodepo/odis-arch/issues/68)
- [LangGraph HIL Reference](https://github.com/langchain-ai/langgraph/blob/main/examples/human-in-the-loop.ipynb)
- [Forcing JSON-LD output structure](https://www.reddit.com/r/LangChain/comments/1dna58k/need_advice_in_structuring_json_output_in/)
- [PIDINST Schema Elements](https://schema.eudat.eu/eudatinstruments_metadataelements/)
- [Schema.org Shapes](https://datashapes.org/schema.ttl)
- [Herbie - The Semantic Laboratory Notebook & Research Database](https://codebase.helmholtz.cloud/hereon-mb/herbie)

In [1]:
# Requires the 'langchain-openai' and 'python-dotenv' packages.
# If they are missing, install them from Jupyter with: !pip install langchain-openai python-dotenv

import os
from langchain_openai import ChatOpenAI
from langchain_core.messages import AIMessage, BaseMessage, HumanMessage
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from dotenv import load_dotenv

load_dotenv()

True
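
The cell above assumes that `langchain_openai`, `langchain_core`, and `dotenv` are importable. A minimal pre-flight check could look like the sketch below, which uses only the standard library. The helper name `missing_packages` and the `REQUIRED` list are illustrative assumptions, not part of the notebook.

```python
from importlib.util import find_spec


def missing_packages(module_names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in module_names if find_spec(name) is None]


# Import names (not pip names) the notebook relies on; adjust as needed.
REQUIRED = ["langchain_openai", "langchain_core", "dotenv"]

missing = missing_packages(REQUIRED)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are importable.")
```

Running this before the imports turns an `ImportError` traceback into a short list of what still needs installing.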

In [3]:
experiment_description = """
The owner of the instrument is the NSF facility MagLab (https://ror.org/03s53g630). Josiah Carberry’s (https://orcid.org/0000-0002-1825-0097) data acquisition info:

The experimental data (spectra) are acquired by a spectrometer (#1 HRS750, #2 IsoPlane, Teledyne Princeton Instruments). The spectrometers are (almost) fully automated and controlled via the LightField software (Teledyne Princeton Instruments). LightField automatically saves the acquired data and all experiment settings (spectrometer settings) in one file. https://www.princetoninstruments.com/products/software-family/lightfield

LightField saves files in *.SPE format (whatever it means).


Folder name: PI name_Experiment ID_Magnet system-Instrument_Start date
File name: Type of the experiment: PL, Ra(man), Re(flectance), Tr(ansmittance) Sample short name: **** Magnetic field: ***T (or from to ) Temperature: ***K Light source: SC, 532nm, 785nm, … - Power: ***mW or uW, or percentage Central frequency / wl/energy: ***cm-1, nm, eV Slit: value: *** um Acq.time: ***min or sec Objective NA: ***NA Other: gate voltage, pressure, …

PL_WSe2-MoSe2_00.0T_to_05.2T_ 10K_633nm-100uW_720nm_30um_2min_0.65NA.SPE Ra_CsPr_30T_7.2K_532nm-2mW_550cm-1_30um_3x2min_0.82NA.SPE Re_InSe_0T_5K_SC-20%600meV_50um_5sec 0.65NA_Gate Sweep -10V to +20V.SPE
"""

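The description above lays out an underscore-delimited file naming convention. Here is a minimal sketch of splitting such a name into labeled fields. It assumes exactly nine underscore-separated fields in the order listed and no field ranges, so names containing `_to_` ranges, spaces, or trailing "Other" fields would need extra handling. The function name `parse_spe_filename` and the field labels are hypothetical.

```python
from pathlib import PurePath

# Field order taken from the "File name" convention in the description.
FIELDS = [
    "experiment_type",     # PL, Ra, Re, Tr
    "sample",
    "magnetic_field",
    "temperature",
    "light_source_power",
    "central_value",       # central frequency / wavelength / energy
    "slit",
    "acquisition_time",
    "objective_na",
]


def parse_spe_filename(filename: str) -> dict:
    """Split a simple nine-field .SPE file name into a dict of raw string fields."""
    stem = PurePath(filename).stem  # drop the final .SPE extension only
    parts = stem.split("_")
    if len(parts) != len(FIELDS):
        raise ValueError(f"Expected {len(FIELDS)} fields, got {len(parts)}: {parts}")
    return dict(zip(FIELDS, parts))


record = parse_spe_filename("Ra_CsPr_30T_7.2K_532nm-2mW_550cm-1_30um_3x2min_0.82NA.SPE")
print(record["experiment_type"], record["sample"], record["temperature"])
```

Fields extracted this way could be passed to the model alongside the free-text description, or used to cross-check what the model puts in the manifest.
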
In [7]:
# Necessary imports
import os
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from dotenv import load_dotenv

load_dotenv()

# Initialize the LLM model and prompt template
prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are an expert in FAIR data practices, schema.org, ROCrates. "
            "Construct a JSON-LD ROCrate manifest based on this description. "
        ),
        MessagesPlaceholder(variable_name="messages"),
    ]
)

llm = ChatOpenAI(model="gpt-4-turbo")

model_generate = prompt | llm

# Function to generate JSON-LD string
def generate_jsonld_from_description(experiment_description: str) -> str:
    model_as_text = ""
    
    request = HumanMessage(
        content=experiment_description
    )
    
    for chunk in model_generate.stream({"messages": [request]}):
        model_as_text += chunk.content

    # Return the raw model response; it may include prose around the JSON-LD
    return model_as_text

# Example usage
experiment_description = """
The owner of the instrument is the NSF facility MagLab (https://ror.org/03s53g630). Josiah Carberry’s (https://orcid.org/0000-0002-1825-0097) data acquisition info:

The experimental data (spectra) are acquired by a spectrometer (#1 HRS750, #2 IsoPlane, Teledyne Princeton Instruments). The spectrometers are (almost) fully automated and controlled via the LightField software (Teledyne Princeton Instruments). LightField automatically saves the acquired data and all experiment settings (spectrometer settings) in one file. https://www.princetoninstruments.com/products/software-family/lightfield

LightField saves files in *.SPE format (whatever it means).

Folder name: PI name_Experiment ID_Magnet system-Instrument_Start date
File name: Type of the experiment: PL, Ra(man), Re(flectance), Tr(ansmittance) Sample short name: **** Magnetic field: ***T (or from to ) Temperature: ***K Light source: SC, 532nm, 785nm, … - Power: ***mW or uW, or percentage Central frequency / wl/energy: ***cm-1, nm, eV Slit: value: *** um Acq.time: ***min or sec Objective NA: ***NA Other: gate voltage, pressure, …

PL_WSe2-MoSe2_00.0T_to_05.2T_ 10K_633nm-100uW_720nm_30um_2min_0.65NA.SPE Ra_CsPr_30T_7.2K_532nm-2mW_550cm-1_30um_3x2min_0.82NA.SPE Re_InSe_0T_5K_SC-20%600meV_50um_5sec 0.65NA_Gate Sweep -10V to +20V.SPE
"""

jsonld_output = generate_jsonld_from_description(experiment_description)
print(jsonld_output)
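
Chat models often wrap JSON in explanatory prose and a Markdown code fence, so the raw string returned by `generate_jsonld_from_description` may not parse directly. Below is a minimal sketch of pulling out the JSON object with the standard library. `extract_json_object` is a hypothetical helper, the sample response is invented for illustration, and the sketch assumes the first `{` in the text opens the JSON object.

```python
import json


def extract_json_object(text: str) -> dict:
    """Parse the JSON object that starts at the first '{', ignoring surrounding prose."""
    start = text.find("{")
    if start == -1:
        raise ValueError("No JSON object found in model output")
    try:
        # raw_decode parses one JSON value from the given index and ignores trailing text.
        obj, _end = json.JSONDecoder().raw_decode(text, start)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Model output contains incomplete or invalid JSON: {exc}") from exc
    return obj


FENCE = "`" * 3  # built indirectly so this example does not contain a literal fence
sample_response = (
    "Here is the manifest:\n"
    f"{FENCE}json\n"
    '{"@context": "https://w3id.org/ro/crate/1.1/context", "@graph": []}\n'
    f"{FENCE}\n"
    "Let me know if you need changes."
)

manifest = extract_json_object(sample_response)
print(manifest["@context"])
```

A truncated response raises `ValueError` rather than returning a partial object, which makes cut-off generations easy to spot.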


To construct a JSON-LD ROCrate manifest for the described dataset, we will include details about the dataset, the experiment, the instruments used, and the associated software. Below is an example of what the ROCrate manifest might look like based on the provided details:

```json
{
  "@context": "https://w3id.org/ro/crate/1.1/context",
  "@graph": [
    {
      "@id": "ro-crate-metadata.json",
      "@type": "CreativeWork",
      "about": {"@id": "./"},
      "identifier": "ro-crate-metadata.json",
      "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"}
    },
    {
      "@id": "./",
      "@type": "Dataset",
      "name": "Spectrometer Experimental Data",
      "description": "Experimental data acquired using spectrometers at the NSF facility MagLab.",
      "author": {
        "@id": "https://orcid.org/0000-0002-1825-0097"
      },
      "dateCreated": "2023-09-20",
      "license": "https://creativecommons.org/licenses/by/4.0/"
    },
    {
      "@id": "https://orcid.org/000
```

In [15]:
# Necessary imports
import os
import json
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.output_parsers import JsonOutputParser
from dotenv import load_dotenv

load_dotenv()

# Initialize the LLM model and prompt template
prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are an expert in FAIR data practices, schema.org, ROCrates. "
            "Construct a JSON-LD ROCrate manifest based on this description. Ensure the output is valid JSON."
        ),
        MessagesPlaceholder(variable_name="messages"),
    ]
)

llm = ChatOpenAI(model="gpt-4-turbo")

model_generate = prompt | llm

# Function to generate JSON-LD as a dict, parsing the model response with JsonOutputParser
def generate_jsonld_from_description(experiment_description: str) -> dict:
    model_as_text = ""
    
    request = HumanMessage(
        content=experiment_description
    )
    
    for chunk in model_generate.stream({"messages": [request]}):
        model_as_text += chunk.content

    # Parse the output using JsonOutputParser
    output_parser = JsonOutputParser()
    
    try:
        json_ld_output = output_parser.parse(model_as_text)
    except Exception as e:
        # Chain the original exception so the parser's traceback is preserved
        raise ValueError(f"Generated output is not valid JSON: {e}") from e

    return json_ld_output

# Example usage
experiment_description = """
The owner of the instrument is the NSF facility MagLab (https://ror.org/03s53g630). Josiah Carberry’s (https://orcid.org/0000-0002-1825-0097) data acquisition info:

The experimental data (spectra) are acquired by a spectrometer (#1 HRS750, #2 IsoPlane, Teledyne Princeton Instruments). The spectrometers are (almost) fully automated and controlled via the LightField software (Teledyne Princeton Instruments). LightField automatically saves the acquired data and all experiment settings (spectrometer settings) in one file. https://www.princetoninstruments.com/products/software-family/lightfield

LightField saves files in *.SPE format (whatever it means).

Folder name: PI name_Experiment ID_Magnet system-Instrument_Start date
File name: Type of the experiment: PL, Ra(man), Re(flectance), Tr(ansmittance) Sample short name: **** Magnetic field: ***T (or from to ) Temperature: ***K Light source: SC, 532nm, 785nm, … - Power: ***mW or uW, or percentage Central frequency / wl/energy: ***cm-1, nm, eV Slit: value: *** um Acq.time: ***min or sec Objective NA: ***NA Other: gate voltage, pressure, …

PL_WSe2-MoSe2_00.0T_to_05.2T_ 10K_633nm-100uW_720nm_30um_2min_0.65NA.SPE Ra_CsPr_30T_7.2K_532nm-2mW_550cm-1_30um_3x2min_0.82NA.SPE Re_InSe_0T_5K_SC-20%600meV_50um_5sec 0.65NA_Gate Sweep -10V to +20V.SPE
"""

jsonld_output = generate_jsonld_from_description(experiment_description)
print(json.dumps(jsonld_output, indent=2))


{
  "@context": "https://w3id.org/ro/crate/1.1/context",
  "@graph": [
    {
      "@id": "./",
      "@type": "Dataset",
      "name": "Experimental Spectra Data",
      "description": "This dataset contains experimental spectra acquired using spectrometers controlled via LightField software.",
      "datePublished": "2023-01-01",
      "license": "http://creativecommons.org/licenses/by/4.0/",
      "publisher": {
        "@id": "https://ror.org/03s53g630",
        "@type": "Organization",
        "name": "MagLab"
      }
    },
    {
      "@id": "https://orcid.org/0000-0002-1825-0097",
      "@type": "Person",
      "name": "Josiah Carberry"
    },
    {
      "@id": "https://www.princetoninstruments.com/products/software-family/lightfield",
      "@type": "SoftwareApplication",
      "name": "LightField",
      "publisher": "Teledyne Princeton Instruments"
    },
    {
      "@id": "./data/",
      "@type": "Dataset",
      "name": "Spectra Files",
      "content": [
        {
    

In [13]:
import json
import rdflib
from pyshacl import validate

# Load SHACL shapes
shacl_graph = rdflib.Graph()
shacl_graph.parse("./pidinst-sh.ttl", format="ttl")

# Convert JSON-LD to an RDF graph
data_graph = rdflib.Graph().parse(data=json.dumps(jsonld_output), format="json-ld")

# Validate data graph against SHACL shapes
conforms, results_graph, results_text = validate(
    data_graph,
    shacl_graph=shacl_graph,
    inference='rdfs',
    abort_on_first=False,
    meta_shacl=False,
    advanced=True,
    debug=False
)

# Output results
print("Conforms:", conforms)
print("Results:")
print(results_text)


Conforms: True
Results:
Validation Report
Conforms: True

