## JSON, JSONLoader and JSON Agent

#### JSON (JavaScript Object Notation)

- There are many online JSON viewers for exploring the structure of a file.
- JSON is an open standard file format and data interchange format that uses human-readable text to store and transmit data objects consisting of attribute–value pairs and arrays (or other serializable values).
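
As a small illustration of the attribute-value pairs and arrays described above, the sketch below parses a made-up JSON string with Python's standard `json` module. The sample text and its keys (`name`, `active`, `tags`) are assumptions chosen for illustration, not data from this notebook.

```python
import json

# Hypothetical sample: an object (attribute-value pairs) that holds an array.
raw = '{"name": "User 1", "active": true, "tags": ["a", "b"]}'

# json.loads maps a JSON object to a dict, an array to a list, and true to True.
parsed = json.loads(raw)
print(type(parsed).__name__)
print(parsed["tags"])

# Serialising back to text and parsing again yields the same structure.
round_trip = json.loads(json.dumps(parsed))
```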

In [1]:
%%capture
!pip install langchain watermark openai jq

In [2]:
%load_ext watermark
%watermark -a "Dhaval Antala" -vmp langchain,openai,jq

Author: Dhaval Antala

Python implementation: CPython
Python version       : 3.10.0
IPython version      : 8.25.0

langchain: 0.2.6
openai   : 1.32.0
jq       : 1.7.0

Compiler    : Clang 12.0.0 
OS          : Darwin
Release     : 23.5.0
Machine     : arm64
Processor   : arm
CPU cores   : 8
Architecture: 64bit



In [3]:
import os
import openai
import warnings

warnings.filterwarnings("ignore")

In [4]:
# get your openai api key from https://platform.openai.com/account/api-keys 🔑
from getpass import getpass

OPENAI_API_KEY = getpass()
os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY
openai.api_key = os.getenv("OPENAI_API_KEY")
     

In [12]:
# download facebook_chat.json from langchain github repo
!wget https://raw.githubusercontent.com/langchain-ai/langchain/master/docs/docs/integrations/document_loaders/example_data/facebook_chat.json
  

--2024-07-09 12:31:11--  https://raw.githubusercontent.com/langchain-ai/langchain/master/docs/docs/integrations/document_loaders/example_data/facebook_chat.json
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.109.133, 185.199.110.133, 185.199.108.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.109.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2167 (2.1K) [text/plain]
Saving to: ‘facebook_chat.json.1’


2024-07-09 12:31:11 (9.70 MB/s) - ‘facebook_chat.json.1’ saved [2167/2167]



In [14]:
import json 
from pathlib import Path
from pprint import pprint

file_path = "/Users/dhavalantala/Desktop/langchain/langchain/facebook_chat.json.1"
data = json.loads(Path(file_path).read_text())

In [15]:
pprint(data)

{'image': {'creation_timestamp': 1675549016, 'uri': 'image_of_the_chat.jpg'},
 'is_still_participant': True,
 'joinable_mode': {'link': '', 'mode': 1},
 'magic_words': [],
 'messages': [{'content': 'Bye!',
               'sender_name': 'User 2',
               'timestamp_ms': 1675597571851},
              {'content': 'Oh no worries! Bye',
               'sender_name': 'User 1',
               'timestamp_ms': 1675597435669},
              {'content': 'No Im sorry it was my mistake, the blue one is not '
                          'for sale',
               'sender_name': 'User 2',
               'timestamp_ms': 1675596277579},
              {'content': 'I thought you were selling the blue one!',
               'sender_name': 'User 1',
               'timestamp_ms': 1675595140251},
              {'content': 'Im not interested in this bag. Im interested in the '
                          'blue one!',
               'sender_name': 'User 1',
               'timestamp_ms': 1675595109305},
   

### Using JSONLoader

- The JSONLoader uses a specified jq schema to parse JSON files.
- It relies on the `jq` Python package. See the jq manual for detailed documentation of the jq syntax.
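
To make the schema used below less opaque, here is a rough plain-Python equivalent of the jq expression `.messages[].content`. It is only a sketch of what the expression selects, not how `jq` or JSONLoader works internally, and the `sample` dict is a hypothetical two-message miniature of the chat file.

```python
# Hypothetical miniature of the chat file's structure (two messages only).
sample = {
    "messages": [
        {"sender_name": "User 2", "content": "Bye!"},
        {"sender_name": "User 1", "content": "Oh no worries! Bye"},
    ]
}

# `.messages[]` iterates over the array; `.content` picks one field per item.
contents = [message["content"] for message in sample["messages"]]
print(contents)
```

Each selected value would then become the `page_content` of one `Document`.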

In [16]:
from langchain.document_loaders import JSONLoader

In [17]:
loader = JSONLoader(
    file_path = "/Users/dhavalantala/Desktop/langchain/langchain/facebook_chat.json.1",
    jq_schema = '.messages[].content', 
    text_content = False
)

In [18]:
data = loader.load()

In [19]:
pprint(data)

[Document(page_content='Bye!', metadata={'source': '/Users/dhavalantala/Desktop/langchain/langchain/facebook_chat.json.1', 'seq_num': 1}),
 Document(page_content='Oh no worries! Bye', metadata={'source': '/Users/dhavalantala/Desktop/langchain/langchain/facebook_chat.json.1', 'seq_num': 2}),
 Document(page_content='No Im sorry it was my mistake, the blue one is not for sale', metadata={'source': '/Users/dhavalantala/Desktop/langchain/langchain/facebook_chat.json.1', 'seq_num': 3}),
 Document(page_content='I thought you were selling the blue one!', metadata={'source': '/Users/dhavalantala/Desktop/langchain/langchain/facebook_chat.json.1', 'seq_num': 4}),
 Document(page_content='Im not interested in this bag. Im interested in the blue one!', metadata={'source': '/Users/dhavalantala/Desktop/langchain/langchain/facebook_chat.json.1', 'seq_num': 5}),
 Document(page_content='Here is $129', metadata={'source': '/Users/dhavalantala/Desktop/langchain/langchain/facebook_chat.json.1', 'seq_num': 6

#### Extracting metadata

In [20]:
# Define the metadata extraction function.
def metadata_func(record: dict, metadata: dict) -> dict:

    metadata["sender_name"] = record.get("sender_name")
    metadata["timestamp_ms"] = record.get("timestamp_ms")

    return metadata

In [None]:
loader = JSONLoader(
    file_path='/Users/dhavalantala/Desktop/langchain/langchain/facebook_chat.json.1',
    jq_schema='.messages[]',
    content_key="content",
    text_content=False,
    metadata_func=metadata_func
)

data = loader.load()
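
The cell above shows no output, so the following sketch applies `metadata_func` by hand to a single record to show what it adds. The starting metadata dict is an assumption modelled on the `source` and `seq_num` keys seen in the earlier output; the loader's actual internals are not shown in this notebook.

```python
def metadata_func(record: dict, metadata: dict) -> dict:
    # Same function as in the cell above, repeated so this sketch runs alone.
    metadata["sender_name"] = record.get("sender_name")
    metadata["timestamp_ms"] = record.get("timestamp_ms")
    return metadata

# First message from the pprint output earlier in the notebook.
record = {"content": "Bye!", "sender_name": "User 2", "timestamp_ms": 1675597571851}

# Assumed starting metadata (hypothetical values for illustration).
base = {"source": "facebook_chat.json", "seq_num": 1}

# Pass a copy so the original base dict is left untouched.
enriched = metadata_func(record, dict(base))
print(enriched)
```

Because the function mutates and returns the dict it receives, passing a copy keeps `base` unchanged.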

#### JSON Agent
- Agent designed to interact with large JSON/dict objects

- When is it needed?
    - It is useful when you want to answer questions about a JSON blob that is too large to fit in the context window of an LLM.
    - The agent can iteratively explore the blob to find what it needs to answer the user's question.

- Let's use the JSON agent to answer some questions about the OpenAI API spec.
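
The iterative exploration described above can be pictured with a small plain-Python sketch: list the keys at one level, pick one, and descend. The helper names `list_keys` and `get_value` and the toy `spec` dict are hypothetical and invented for this sketch; they are not the toolkit's own API or the real layout of the OpenAI spec.

```python
from typing import Any, List


def _walk(data: Any, path: List[str]) -> Any:
    # Follow each key in `path` down through the nested dicts.
    node = data
    for key in path:
        node = node[key]
    return node


def list_keys(data: Any, path: List[str]) -> List[str]:
    """Hypothetical helper: keys available at `path`."""
    return list(_walk(data, path).keys())


def get_value(data: Any, path: List[str], max_length: int = 200) -> str:
    """Hypothetical helper: value at `path`, truncated to `max_length` characters."""
    return str(_walk(data, path))[:max_length]


# Toy spec with an invented layout, small enough to read at a glance.
spec = {
    "info": {"title": "Toy API"},
    "paths": {"/items": {"post": {"required": ["name", "price"]}}},
}

top_level = list_keys(spec, [])                       # step 1: look at the root
endpoints = list_keys(spec, ["paths"])                # step 2: descend into "paths"
answer = get_value(spec, ["paths", "/items", "post", "required"])  # step 3: read a leaf
print(top_level, endpoints, answer)
```

Truncating each value is what keeps any single observation small, so only the slice of the blob relevant to the question ever reaches the model.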

In [25]:
# download the yaml file from openai github page
!wget https://raw.githubusercontent.com/openai/openai-openapi/master/openapi.yaml -O openai_openapi.yml

--2024-07-09 12:43:26--  https://raw.githubusercontent.com/openai/openai-openapi/master/openapi.yaml
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.109.133, 185.199.110.133, 185.199.108.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.109.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 656684 (641K) [text/plain]
Saving to: ‘openai_openapi.yml’


2024-07-09 12:43:27 (8.98 MB/s) - ‘openai_openapi.yml’ saved [656684/656684]



In [26]:
import os
import yaml

from langchain.agents import (
    create_json_agent,
    AgentExecutor
)
from langchain.agents.agent_toolkits import JsonToolkit
from langchain.chains import LLMChain
from langchain.llms.openai import OpenAI
from langchain.requests import TextRequestsWrapper
from langchain.tools.json.tool import JsonSpec

In [30]:
with open("/Users/dhavalantala/Desktop/langchain/langchain/openai_openapi.yml") as f:
    data = yaml.safe_load(f)  # safe_load avoids constructing arbitrary Python objects
json_spec = JsonSpec(dict_=data, max_value_length=4000)
json_toolkit = JsonToolkit(spec=json_spec)

json_agent_executor = create_json_agent(
    llm=OpenAI(temperature=0),
    toolkit=json_toolkit,
    verbose=True
)
     

In [31]:
json_agent_executor.run("What are the required parameters in the request body to the /completions endpoint?")



> Entering new AgentExecutor chain...
Action: json_spec_list_keys
Action Input: data
Observation: ['openapi', 'info', 'servers', 'tags', 'paths', 'components', 'security', 'x-oaiMeta']
Thought: I should look at the keys in the paths key to see what paths are available
Action: json_spec_list_keys
Action Input: data["paths"]
Observation: ['/chat/completions', '/completions', '/images/generations', '/images/edits', '/images/variations', '/embeddings', '/audio/speech', '/audio/transcriptions', '/audio/translations', '/files', '/files/{file_id}', '/files/{file_id}/content', '/fine_tuning/jobs', '/fine_tuning/jobs/{fine_tuning_job_id}', '/fine_tuning/jobs/{fine_tuning_job_id}/events', '/fine_tuning/jobs/{fine_tuning_job_id}/cancel', '/fine_tuning/jobs/{fine_tuning_job_id}/checkpoints', '/models', '/models/{model}', '/moderations', '/assistants', '/assistants/{assistant_id}', '/threads', '/threads/{thread_id}', '/thread

"['model', 'prompt']"