Respect natural JSON order by default #1184

Closed

lukestanley commented Feb 13, 2024

The JSON schema to grammar string converter currently sorts the properties, even when no specific order was requested. This is a big pain when you want to keep the existing, natural order of a JSON schema generated by Pydantic, since property order can have a big impact on the output an LLM produces.
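
To illustrate the difference, here is a minimal sketch using only the standard library. The schema dict is a hand-written stand-in for what Pydantic would emit, and `property_order` is a hypothetical helper, not part of llama-cpp-python. It only shows how sorting changes the order a grammar would end up enforcing.

```python
import json
from typing import List

# Hand-written stand-in for a Pydantic-generated schema (assumption for illustration).
schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "film_names": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["name", "film_names"],
}


def property_order(schema: dict, sort: bool) -> List[str]:
    """Return property names sorted, or in their natural (insertion) order."""
    names = list(schema["properties"])
    return sorted(names) if sort else names


natural = property_order(schema, sort=False)
alphabetical = property_order(schema, sort=True)

# A JSON round trip keeps insertion order, so the natural order survives json.dumps.
round_tripped = list(json.loads(json.dumps(schema))["properties"])

print("natural:     ", natural)
print("alphabetical:", alphabetical)
```

With sorting, `film_names` would be generated before `name`, which is not the order the model declared.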

abetlen commented Feb 13, 2024

@lukestanley good catch! I think we both found the issue at the same time last night. It should be fixed from d1822fe onward.

abetlen closed this Feb 13, 2024

lukestanley commented Feb 13, 2024

Hahaha! That's nuts! I was just putting together an example:

import json
from typing import List
from pprint import pprint
from pydantic import BaseModel, Field
import llama_cpp



class Actor(BaseModel):
    name: str = Field(..., description="Name of an actor")
    film_names: List[str] = Field(..., description="List of films they starred in")

    # Pydantic v2 style config; a plain dict is accepted for model_config.
    model_config = {
        "json_schema_extra": {
            "example": {
                "name": "Jim Carrey",
                "film_names": [
                    "Ace Ventura: Pet Detective",
                    "The Mask",
                    "Dumb and Dumber",
                ],
            }
        }
    }


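# Build the JSON schema; properties keep the field declaration order.
schema = Actor.model_json_schema()
# json_schema_extra is merged into the schema, so drop the example before building the grammar.
del schema["example"]
json_schema = json.dumps(schema)
# Reuse the example as the one-shot answer in the prompt.
example = Actor.model_config['json_schema_extra']['example']
# Constrain generation to output that matches the schema.
grammar = llama_cpp.LlamaGrammar.from_json_schema(json_schema)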


prompt = f"""Instruction:
Provide the name of actors in JSON format matching this: {json_schema}.
Please provide a comedy actor.
Output:
{json.dumps(example)}
Instruction: Now please provide an actor who started as playing a mafia figure but plays lots of different roles.
Output:
"""

llm = llama_cpp.Llama(model_path="/home/user/Downloads/phi-2.Q4_K_M.gguf")
output_text = llm(
    prompt,
    max_tokens=300,
    stop=["Instruction:", "Output:"],
    echo=False,
    temperature=0.3,
    grammar=grammar,
)["choices"][0]["text"]
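# Validate the constrained output, then print it as a dict so sort_dicts=False applies.
model = Actor.model_validate_json(output_text)
pprint(model.model_dump(), sort_dicts=False, indent=2)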


# Same request again, this time streaming the tokens as they are generated.
stream = llm(
    prompt,
    max_tokens=300,
    stop=["Instruction:", "Output:"],
    echo=False,
    temperature=0.3,
    grammar=grammar,
    stream=True
)

output_text = ""
for chunk in stream:
    result = chunk["choices"][0]
    print(result["text"], end='', flush=True)
    output_text = output_text + result["text"]
# Validate the accumulated stream output, then print it as a dict in field order.
model = Actor.model_validate_json(output_text)
pprint(model.model_dump(), sort_dicts=False, indent=2)

Would something like that be useful somewhere in the examples directory?
If so, I can open a PR.
@abetlen

abetlen commented Feb 14, 2024

Sure thing! To show off the JSON property order behaviour, instructor has some good examples that add a chain_of_thought property before an answer property.
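
As a rough sketch of that idea (not taken from instructor itself), the schema below is hand-written and its descriptions are illustrative assumptions. It shows why order matters here: alphabetical sorting would move answer ahead of chain_of_thought, so the model would have to commit to an answer before writing its reasoning.

```python
import json

# Hand-written schema (assumption for illustration): reasoning first, answer second.
schema = {
    "type": "object",
    "properties": {
        "chain_of_thought": {
            "type": "string",
            "description": "Step-by-step reasoning written before the answer",
        },
        "answer": {"type": "string", "description": "Final answer"},
    },
    "required": ["chain_of_thought", "answer"],
}

# json.dumps without sort_keys keeps the declaration order of the properties.
json_schema = json.dumps(schema)

natural_order = list(json.loads(json_schema)["properties"])
sorted_order = sorted(natural_order)

print("natural:", natural_order)
print("sorted: ", sorted_order)
```

The resulting `json_schema` string could then be passed to `llama_cpp.LlamaGrammar.from_json_schema`, as in the example earlier in this thread.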
