<a href="https://colab.research.google.com/github/CHETAN1KUKREJA/llm-backend-prompt-engineering/blob/main/Jsonformer_example.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [2]:
!pip install langchain_core langchain_huggingface langchain_experimental

Collecting langchain_huggingface
  Downloading langchain_huggingface-0.1.2-py3-none-any.whl.metadata (1.3 kB)
Collecting langchain_experimental
  Downloading langchain_experimental-0.3.3-py3-none-any.whl.metadata (1.7 kB)
Collecting langchain-community<0.4.0,>=0.3.0 (from langchain_experimental)
  Downloading langchain_community-0.3.11-py3-none-any.whl.metadata (2.9 kB)
Collecting dataclasses-json<0.7,>=0.5.7 (from langchain-community<0.4.0,>=0.3.0->langchain_experimental)
  Downloading dataclasses_json-0.6.7-py3-none-any.whl.metadata (25 kB)
Collecting httpx-sse<0.5.0,>=0.4.0 (from langchain-community<0.4.0,>=0.3.0->langchain_experimental)
  Downloading httpx_sse-0.4.0-py3-none-any.whl.metadata (9.0 kB)
Collecting langchain<0.4.0,>=0.3.11 (from langchain-community<0.4.0,>=0.3.0->langchain_experimental)
  Downloading langchain-0.3.11-py3-none-any.whl.metadata (7.1 kB)
Collecting langchain_core
  Downloading langchain_core-0.3.24-py3-none-any.whl.metadata (6.3 kB)
Collecting pydantic-se

In [3]:
!pip install transformers accelerate jsonformer

Collecting jsonformer
  Downloading jsonformer-0.12.0-py3-none-any.whl.metadata (5.0 kB)
Downloading jsonformer-0.12.0-py3-none-any.whl (6.6 kB)
Installing collected packages: jsonformer
Successfully installed jsonformer-0.12.0


In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer

print("Loading model and tokenizer...")
model_name = "databricks/dolly-v2-3b"
model = AutoModelForCausalLM.from_pretrained(model_name, use_cache=True, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True, use_cache=True)
print("Loaded model and tokenizer")

Loading model and tokenizer...
Loaded model and tokenizer


In [7]:
action_schemas = {
            "talk": {
                "type": "object",
                "properties": {
                    "volume": {"type": "string", "enum": ["quiet", "normal", "loud"]},
                    "content": {"type": "string"}
                }
            },
            "take": {
                "type": "object",
                "properties": {
                    "item": {"type": "string"},
                    "amount": {"type": "number"}
                }
            },
            "drop": {
                "type": "object",
                "properties": {
                    "item": {"type": "string"},
                    "amount": {"type": "number"}
                }
            }
            # Add more action schemas as needed
        }

In [8]:
examples = {
            "talk": {
                "query": "I want to ask Maria what she has to trade",
                "response": {
                    "action": "talk",
                    "action_input": {
                        "volume": "normal",
                        "content": "Hello Maria, what do you have to trade?"
                    }
                }
            },
            "take": {
                "query": "I want to take 5 apples",
                "response": {
                    "action": "take",
                    "action_input": {
                        "item": "apples",
                        "amount": 5
                    }
                }
            },
        }


In [9]:
prompt_template = """You must respond using JSON format, with a single action and single action input. You have to read the entire human query and most of the time the output is given between '<toolcall> </toolcall>'
          but if you dont find this you can select one action from the list that is stated in the query
          There may be a lot of Thought process in the query you dont have to get swayed by that and focus on the action and action and action input.
          Available actions: {actions}
          EXAMPLES:
          {formatted_examples}
          BEGIN! Parse the following request into an appropriate action:
          Human: {query}
          Assistant:
        """

In [10]:
def generate_prompt(query,available_actions=None):
        """Generate a prompt with examples for available actions"""
        if available_actions is None:
            available_actions = list(action_schemas.keys())

        formatted_examples = "\n".join(
            f"Human: {ex['query']}\n"
            f"Assistant: {ex['response']}\n"
            for action in available_actions
            if action in examples
            for ex in [examples[action]]
        )
        return prompt_template.format(
            actions=", ".join(available_actions),
            formatted_examples=formatted_examples,
            query=query  # Will be filled in during parse_action
        )

In [11]:
output_from_LLM="""
## Step 0: Plan what to do for a short period
To maximize the amount of money, the first step is to gather information about the current state of the trade centre and the forest. Since there is another agent, Maria, at the trade centre, it might be beneficial to interact with her to see if there are any trade opportunit
ies. The plan is to:
1. Talk to Maria to gather information about her current stock of apples and money.
2. Ask Maria if she is willing to trade apples for money.
3. If a trade is possible, negotiate the terms of the trade.
4. If no trade is possible with Maria, consider going to the forest to collect apples.

## Step 1: Extract a sequence of actions and parameters pairs from the plan
### Substep 1.1.1: Output description of the action
1. Action: talk
2. Parameters: volume, content
3. list:
- Talk to Maria to gather information about her current stock of apples and money.
- Ask Maria if she is willing to trade apples for money.

### Substep 1.1.2: Verify the action and parameters
1. The action "talk" appears in the plan.
2. The sentences covering the action and parameters appear in the plan.
3. The parameter "content" might need to be specified based on Maria's response, which will be answered by her.
Considering the need to interact with Maria first, the sequence of actions starts with talking to her.

## Step 2: Format the plan as function calls in JSON objects within single XML tags
Given the plan and the actions available, the first step is to talk to Maria. The function call for this action is:

<tool_call>
{"action": "talk", "action_input": {"volume": "normal", "content": "Hello Maria, what are you trading today?"}}
</tool_call>

This initial interaction is aimed at gathering information and setting the stage for potential trades or other actions based on Maria's response. Further actions will depend on her answer, which could involve negotiating a trade, deciding to go to the forest, or other option
s based on the information exchanged.
Finished in 162.5875s
"""

In [12]:
def get_decoder_schema(available_actions=None):
        """
        Generate a decoder schema based on available actions

        Args:
            available_actions (list): List of action names that are currently available
        """
        if available_actions is None:
            available_actions = list(action_schemas.keys())

        return {
            "type": "object",
            "properties": {
                "action": {
                    "type": "string",
                    "enum": available_actions
                },
                "action_input": {
                    "type": "object",
                    "oneOf": [
                        {
                            "if": {"properties": {"action": {"const": action}}},
                            "then": action_schemas[action]
                        }
                        for action in available_actions
                    ]
                }
            }
        }

In [21]:
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("databricks/dolly-v2-3b",use_cache=True, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("databricks/dolly-v2-3b", use_fast=True, use_cache=True)


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


config.json:   0%|          | 0.00/819 [00:00<?, ?B/s]

pytorch_model.bin:   0%|          | 0.00/5.68G [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/450 [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/2.11M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/228 [00:00<?, ?B/s]

In [40]:
import json
import os

import requests
from langchain_core.tools import tool

HF_TOKEN = os.environ.get("HUGGINGFACE_API_KEY")


@tool
def ask_star_coder(query: str, temperature: float = 1.0, max_new_tokens: float = 250):
    """Query the BigCode StarCoder model about coding questions."""
    url = "https://api-inference.huggingface.co/models/bigcode/starcoder"
    headers = {
        "Authorization": f"Bearer {HF_TOKEN}",
        "content-type": "application/json",
    }
    payload = {
        "inputs": f"{query}\n\nAnswer:",
        "temperature": temperature,
        "max_new_tokens": int(max_new_tokens),
    }
    response = requests.post(url, headers=headers, data=json.dumps(payload))
    response.raise_for_status()
    return json.loads(response.content.decode("utf-8"))

In [41]:
prompt = """You must respond using JSON format, with a single action and single action input.
You may 'ask_star_coder' for help on coding problems.

{arg_schema}

EXAMPLES
----
Human: "So what's all this about a GIL?"
AI Assistant:{{
  "action": "ask_star_coder",
  "action_input": {{"query": "What is a GIL?", "temperature": 0.0, "max_new_tokens": 100}}"
}}
Observation: "The GIL is python's Global Interpreter Lock"
Human: "Could you please write a calculator program in LISP?"
AI Assistant:{{
  "action": "ask_star_coder",
  "action_input": {{"query": "Write a calculator program in LISP", "temperature": 0.0, "max_new_tokens": 250}}
}}
Observation: "(defun add (x y) (+ x y))\n(defun sub (x y) (- x y ))"
Human: "What's the difference between an SVM and an LLM?"
AI Assistant:{{
  "action": "ask_star_coder",
  "action_input": {{"query": "What's the difference between SGD and an SVM?", "temperature": 1.0, "max_new_tokens": 250}}
}}
Observation: "SGD stands for stochastic gradient descent, while an SVM is a Support Vector Machine."

BEGIN! Answer the Human's question as best as you are able.
------
Human: 'What's the difference between an iterator and an iterable?'
AI Assistant:""".format(arg_schema=ask_star_coder.args)

In [50]:
from langchain_huggingface import HuggingFacePipeline
from transformers import pipeline

hf_model = pipeline(
    "text-generation", model="cerebras/Cerebras-GPT-590M", max_new_tokens=200,device="cuda"
)

original_model = HuggingFacePipeline(pipeline=hf_model)

generated = original_model.predict(prompt, stop=["Observation:", "Human:"])
print(generated)

Setting `pad_token_id` to `eos_token_id`:None for open-end generation.


You must respond using JSON format, with a single action and single action input.
You may 'ask_star_coder' for help on coding problems.

{'query': {'title': 'Query', 'type': 'string'}, 'temperature': {'default': 1.0, 'title': 'Temperature', 'type': 'number'}, 'max_new_tokens': {'default': 250, 'title': 'Max New Tokens', 'type': 'number'}}

EXAMPLES
----
Human: "So what's all this about a GIL?"
AI Assistant:{
  "action": "ask_star_coder",
  "action_input": {"query": "What is a GIL?", "temperature": 0.0, "max_new_tokens": 100}"
}
Observation: "The GIL is python's Global Interpreter Lock"
Human: "Could you please write a calculator program in LISP?"
AI Assistant:{
  "action": "ask_star_coder",
  "action_input": {"query": "Write a calculator program in LISP", "temperature": 0.0, "max_new_tokens": 250}
}
Observation: "(defun add (x y) (+ x y))
(defun sub (x y) (- x y ))"
Human: "What's the difference between an SVM and an LLM?"
AI Assistant:{
  "action": "ask_star_coder",
  "action_input": 

In [45]:
decoder_schema = {
    "title": "Decoding Schema",
    "type": "object",
    "properties": {
        "action": {"type": "string", "default": ask_star_coder.name},
        "action_input": {
            "type": "object",
            "properties": ask_star_coder.args,
        },
    },
}

In [48]:
from langchain_experimental.llms import JsonFormer

json_former = JsonFormer(json_schema=decoder_schema, pipeline=hf_model)

In [49]:
results = json_former.predict(prompt, stop=["Observation:", "Human:"])
print(results)

Setting `pad_token_id` to `eos_token_id`:None for open-end generation.


You must respond using JSON format, with a single action and single action input.
You may 'ask_star_coder' for help on coding problems.

{'query': {'title': 'Query', 'type': 'string'}, 'temperature': {'default': 1.0, 'title': 'Temperature', 'type': 'number'}, 'max_new_tokens': {'default': 250, 'title': 'Max New Tokens', 'type': 'number'}}

EXAMPLES
----
Human: "So what's all this about a GIL?"
AI Assistant:{
  "action": "ask_star_coder",
  "action_input": {"query": "What is a GIL?", "temperature": 0.0, "max_new_tokens": 100}"
}
Observation: "The GIL is python's Global Interpreter Lock"
Human: "Could you please write a calculator program in LISP?"
AI Assistant:{
  "action": "ask_star_coder",
  "action_input": {"query": "Write a calculator program in LISP", "temperature": 0.0, "max_new_tokens": 250}
}
Observation: "(defun add (x y) (+ x y))
(defun sub (x y) (- x y ))"
Human: "What's the difference between an SVM and an LLM?"
AI Assistant:{
  "action": "ask_star_coder",
  "action_input": 

In [4]:
!huggingface-cli login


    _|    _|  _|    _|    _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|_|_|_|    _|_|      _|_|_|  _|_|_|_|
    _|    _|  _|    _|  _|        _|          _|    _|_|    _|  _|            _|        _|    _|  _|        _|
    _|_|_|_|  _|    _|  _|  _|_|  _|  _|_|    _|    _|  _|  _|  _|  _|_|      _|_|_|    _|_|_|_|  _|        _|_|_|
    _|    _|  _|    _|  _|    _|  _|    _|    _|    _|    _|_|  _|    _|      _|        _|    _|  _|        _|
    _|    _|    _|_|      _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|        _|    _|    _|_|_|  _|_|_|_|

    To log in, `huggingface_hub` requires a token generated from https://huggingface.co/settings/tokens .
Enter your token (input will not be visible): 
Add token as git credential? (Y/n) Y
Token is valid (permission: read).
The token `balvinder` has been saved to /root/.cache/huggingface/stored_tokens
[1m[31mCannot authenticate through git-credential as no helper is defined on your machine.
You might have to re-authentic

In [None]:
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

In [5]:
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
import torch

# Load model directly

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b-it")
model = AutoModelForCausalLM.from_pretrained("google/gemma-2b-it")
hf_model = pipeline("text-generation", model=model, tokenizer=tokenizer,max_new_tokens=2048,device="cuda")


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


tokenizer_config.json:   0%|          | 0.00/34.2k [00:00<?, ?B/s]

tokenizer.model:   0%|          | 0.00/4.24M [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/17.5M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/636 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/627 [00:00<?, ?B/s]

model.safetensors.index.json:   0%|          | 0.00/13.5k [00:00<?, ?B/s]

Downloading shards:   0%|          | 0/2 [00:00<?, ?it/s]

model-00001-of-00002.safetensors:   0%|          | 0.00/4.95G [00:00<?, ?B/s]

model-00002-of-00002.safetensors:   0%|          | 0.00/67.1M [00:00<?, ?B/s]

`config.hidden_act` is ignored, you should use `config.hidden_activation` instead.
Gemma's activation function will be set to `gelu_pytorch_tanh`. Please, use
`config.hidden_activation` if you want to override this behaviour.
See https://github.com/huggingface/transformers/pull/29402 for more details.


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/137 [00:00<?, ?B/s]

In [13]:
from langchain_experimental.llms import JsonFormer

json_former = JsonFormer(json_schema=get_decoder_schema(), pipeline=hf_model)

In [14]:
results = json_former.predict(generate_prompt(output_from_LLM), stop=["Observation:", "Human:"])
print(results)

  results = json_former.predict(generate_prompt(output_from_LLM), stop=["Observation:", "Human:"])


You must respond using JSON format, with a single action and single action input. You have to read the entire human query and most of the time the output is given between '<toolcall> </toolcall>' 
          but if you dont find this you can select one action from the list that is stated in the query
          There may be a lot of Thought process in the query you dont have to get swayed by that and focus on the action and action and action input.
          Available actions: talk, take, drop
          EXAMPLES:
          Human: I want to ask Maria what she has to trade
Assistant: {'action': 'talk', 'action_input': {'volume': 'normal', 'content': 'Hello Maria, what do you have to trade?'}}

Human: I want to take 5 apples
Assistant: {'action': 'take', 'action_input': {'item': 'apples', 'amount': 5}}

          BEGIN! Parse the following request into an appropriate action:
          Human: 
## Step 0: Plan what to do for a short period                       
To maximize the amount of mone

In [20]:
def clean_jsonformer_response(response, prompt_string):
    # Strip leading/trailing spaces from the prompt string and response
    prompt_string = prompt_string.strip()
    response = response.strip()

    # Check if the response starts with the prompt string and remove it
    if response.startswith(prompt_string):
        response = response[len(prompt_string):].strip()

    # Return the cleaned response
    return response


In [23]:
a=clean_jsonformer_response(results,generate_prompt(output_from_LLM))

In [26]:
import json


AttributeError: 'int' object has no attribute 'write'

In [32]:
json.dumps(a, indent=4)

'"{\'action\': \'talk\', \'action_input\': {\'volume\': \'normal\', \'content\': \'Hello Maria, what are you trading today?\'}}"'

In [33]:
import importlib
import pkg_resources

def get_library_version(lib_name):
    try:
        module = importlib.import_module(lib_name)
        version = module.__version__
        print(f"{lib_name}=={version}")
    except ImportError:
        print(f"{lib_name} is not installed.")
    except AttributeError:
        # Some modules might not have __version__, fallback to pkg_resources
        try:
            version = pkg_resources.get_distribution(lib_name).version
            print(f"{lib_name}=={version}")
        except pkg_resources.DistributionNotFound:
            print(f"{lib_name} is not installed.")

# List of libraries to check
libraries = [
    "huggingface_hub",
    "langchain",
    "langchain_experimental",
    "transformers",
    "torch"  # Include torch since it's often used with these models
]

for lib in libraries:
    get_library_version(lib)


  import pkg_resources


huggingface_hub==0.26.3
langchain==0.3.11
langchain_experimental==0.3.3
transformers==4.46.3
torch==2.5.1+cu121
