👩‍🏫 adrienbrault/instructrice

Typing LLM completions

Best-in-class LLMs can output JSON that follows a schema you provide, usually JSON-Schema. This significantly expands the ways you can leverage LLMs in your application!

Think of the input as:

A context, anything that is or can be converted to text, like emails, PDFs, HTML, or XLSX files
A schema, "Here is the form you need to fill to complete your task"
An optional prompt, giving a specific task, rules, etc.

And the output/outcome is whichever structure best matches your use case and domain.

The Python instructor cookbook has interesting examples.

Introduction

Instructrice is a PHP library that simplifies working with structured output from LLMs in a type-safe manner.

Features:

Flexible schema options:
- Classes using api-platform/json-schema
- Dynamically generated types using PSL\Type
- Or a JSON-Schema array generated by a third party library, or in plain PHP
symfony/serializer integration to deserialize LLM outputs
Streaming first:
- As a developer you can be more productive with faster feedback loops than waiting for outputs to complete. This also makes slower local models more usable.
- You can provide a much better and snappier UX to your users.
- The headaches of parsing incomplete JSON are handled for you.
A set of pre-configured LLMs with the best available settings. Set your API keys and switch between different providers and models without having to think about the model name, json mode, function calling, etc.
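
To illustrate what parsing incomplete JSON involves, here is a minimal sketch in plain PHP. It is not Instructrice's implementation, and the helper names closeJson and decodePartialJson are hypothetical. The idea is to balance the open string and brackets of a truncated document, then drop trailing characters until the result decodes.

```php
<?php

/**
 * Append the closing quote and brackets needed to balance a truncated JSON string.
 * Hypothetical helper, for illustration only.
 */
function closeJson(string $partial): string
{
    $stack = [];        // closing characters still owed, innermost last
    $inString = false;  // true while the scanner is inside a JSON string
    $escaped = false;   // true right after a backslash inside a string

    $length = strlen($partial);
    for ($i = 0; $i < $length; $i++) {
        $char = $partial[$i];

        if ($inString) {
            if ($escaped) {
                $escaped = false;
            } elseif ($char === '\\') {
                $escaped = true;
            } elseif ($char === '"') {
                $inString = false;
            }
            continue;
        }

        if ($char === '"') {
            $inString = true;
        } elseif ($char === '{') {
            $stack[] = '}';
        } elseif ($char === '[') {
            $stack[] = ']';
        } elseif ($char === '}' || $char === ']') {
            array_pop($stack);
        }
    }

    return $partial . ($inString ? '"' : '') . implode('', array_reverse($stack));
}

/**
 * Decode as much of a truncated JSON document as possible.
 * Trailing characters are dropped until the balanced candidate is valid JSON.
 */
function decodePartialJson(string $partial): mixed
{
    for ($end = strlen($partial); $end > 0; $end--) {
        $decoded = json_decode(closeJson(substr($partial, 0, $end)), true);
        if (json_last_error() === JSON_ERROR_NONE) {
            return $decoded;
        }
    }

    return null;
}

// Simulate a response arriving in three chunks and print each intermediate result.
$buffer = '';
foreach (['[{"name":"Jack","ra', 'nk":"Colonel"},{"na', 'me":"Sam"}]'] as $chunk) {
    $buffer .= $chunk;
    echo json_encode(decodePartialJson($buffer)), PHP_EOL;
}
```

A prefix-trimming loop like this is quadratic in the worst case and can briefly report a shortened number or string, which is usually acceptable for showing progress but not for final values.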

A Symfony Bundle is also available.

Installation and Usage

$ composer require adrienbrault/instructrice:@dev

use AdrienBrault\Instructrice\InstructriceFactory;
use AdrienBrault\Instructrice\LLM\Provider\Ollama;
use AdrienBrault\Instructrice\LLM\Provider\OpenAi;
use AdrienBrault\Instructrice\LLM\Provider\Anthropic;

$instructrice = InstructriceFactory::create(
    defaultLlm: Ollama::HERMES2PRO_MISTRAL_7B,
    apiKeys: [ // Unless you inject keys here, api keys will be fetched from environment variables
        OpenAi::class => $openAiApiKey,
        Anthropic::class => $anthropicApiKey,
    ],
);

List of objects

use AdrienBrault\Instructrice\Attribute\Prompt;

class Character
{
    // The Prompt attribute lets you add instructions specific to a property
    #[Prompt('Just the first name.')]
    public string $name;
    public ?string $rank = null;
}

$characters = $instructrice->getList(
    Character::class,
    'Colonel Jack O\'Neil walks into a bar and meets Major Samanta Carter. They call Teal\'c to join them.',
);

/*
dump($characters);
array:3 [
  0 => Character^ {
    +name: "Jack"
    +rank: "Colonel"
  }
  1 => Character^ {
    +name: "Samanta"
    +rank: "Major"
  }
  2 => Character^ {
    +name: "Teal'c"
    +rank: null
  }
]
*/

Object

$character = $instructrice->get(
    type: Character::class,
    context: 'Colonel Jack O\'Neil.',
);

/*
dump($character);
Character^ {
  +name: "Jack"
  +rank: "Colonel"
}
*/

Dynamic Schema

$label = $instructrice->get(
    type: [
        'type' => 'string',
        'enum' => ['positive', 'neutral', 'negative'],
    ],
    context: 'Amazing great cool nice',
    prompt: 'Sentiment analysis',
);

/*
dump($label);
"positive"
*/
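
A schema written in plain PHP is not limited to enums. The sketch below builds an object schema as a nested array and prints it as JSON for inspection. The variable name $characterSchema is illustrative, and passing such an array as type, as well as the shape of the value returned, is an assumption based on the enum example above rather than something this README confirms.

```php
<?php

// A JSON-Schema description of a character, written as a plain PHP array.
$characterSchema = [
    'type' => 'object',
    'properties' => [
        'name' => [
            'type' => 'string',
            'description' => 'Just the first name.',
        ],
        'rank' => [
            'type' => ['string', 'null'],
        ],
    ],
    'required' => ['name'],
];

// Print the schema as the JSON document the LLM would be asked to follow.
echo json_encode($characterSchema, JSON_PRETTY_PRINT), PHP_EOL;

// Assumption: the array can then be passed like the enum schema above, e.g.
// $character = $instructrice->get(type: $characterSchema, context: 'Colonel Jack O\'Neil.');
```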

You can also use third party json schema libraries like goldspecdigital/oooas to generate the schema:

examples/oooas.php


Supported providers

Provider	Environment Variables	Enum	API Key Creation URL
Ollama	`OLLAMA_HOST`	Ollama
OpenAI	`OPENAI_API_KEY`	OpenAi	API Key Management
Anthropic	`ANTHROPIC_API_KEY`	Anthropic	API Key Management
Mistral	`MISTRAL_API_KEY`	Mistral	API Key Management
Fireworks AI	`FIREWORKS_API_KEY`	Fireworks	API Key Management
Groq	`GROQ_API_KEY`	Groq	API Key Management
Together AI	`TOGETHER_API_KEY`	Together	API Key Management
Deepinfra	`DEEPINFRA_API_KEY`	Deepinfra	API Key Management
Perplexity	`PERPLEXITY_API_KEY`	Perplexity	API Key Management
Anyscale	`ANYSCALE_API_KEY`	Anyscale	API Key Management
OctoAI	`OCTOAI_API_KEY`	OctoAI	API Key Management

The supported providers are enums. You can pass one of their cases as defaultLlm to InstructriceFactory::create, or per call through the llm argument:

use AdrienBrault\Instructrice\InstructriceFactory;
use AdrienBrault\Instructrice\LLM\Provider\OpenAi;

$instructrice->get(
    ...,
    llm: OpenAi::GPT_4T, // API Key will be fetched from the OPENAI_API_KEY environment variable
);
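
Since keys are read from environment variables by default, it can help to check which providers are configured before switching between them. The following is a small sketch in plain PHP, not part of the library: the constant PROVIDER_ENV_VARS and the function configuredProviders are hypothetical names, and the variable names are copied from the table above.

```php
<?php

// Environment variable expected by each provider, copied from the providers table.
const PROVIDER_ENV_VARS = [
    'Ollama' => 'OLLAMA_HOST',
    'OpenAI' => 'OPENAI_API_KEY',
    'Anthropic' => 'ANTHROPIC_API_KEY',
    'Mistral' => 'MISTRAL_API_KEY',
    'Fireworks AI' => 'FIREWORKS_API_KEY',
    'Groq' => 'GROQ_API_KEY',
    'Together AI' => 'TOGETHER_API_KEY',
    'Deepinfra' => 'DEEPINFRA_API_KEY',
    'Perplexity' => 'PERPLEXITY_API_KEY',
    'Anyscale' => 'ANYSCALE_API_KEY',
    'OctoAI' => 'OCTOAI_API_KEY',
];

/**
 * Return the providers whose environment variable is set and non-empty.
 *
 * @return list<string>
 */
function configuredProviders(): array
{
    $configured = [];
    foreach (PROVIDER_ENV_VARS as $provider => $variable) {
        $value = getenv($variable);
        if ($value !== false && $value !== '') {
            $configured[] = $provider;
        }
    }

    return $configured;
}

echo 'Configured providers: ', implode(', ', configuredProviders()) ?: 'none', PHP_EOL;
```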

Supported models

Strategy	📄 Text	🧩 JSON	🚀 Function

Commercial usage 💼	✅ Yes	⚠️ Yes, but	❌ Nope

Open Weights

Foundation

	💼	ctx	Ollama	Mistral	Fireworks	Groq	Together	DeepInfra	Perplexity	Anyscale	OctoAI
Mistral 7B	✅	32k		🧩 68/s			📄 98/s		📄 88/s !ctx=16k!	🧩	🧩
Mixtral 8x7B	✅	32k		🧩 44/s	🧩 237/s	📄 560/s	🚀 99/s		📄 119/s !ctx=16k!	🧩	🧩
Mixtral 8x22B	✅	65k		🧩 77/s	🧩 77/s		📄 52/s	🧩 40/s	📄 62/s !ctx=16k!	🧩	🧩
Phi-3-Mini-128K	✅	128k	🧩
Llama3 8B	⚠️	8k	📄		🧩 280/s	📄 800/s	📄 194/s	🧩 133/s	📄 121/s	🧩	🧩
Llama3 70B	⚠️	8k	🧩		🧩 116/s	📄 270/s	📄 105/s	🧩 26/s	📄 42/s	🧩	🧩
Gemma 7B	⚠️	8k				📄 800/s	📄 118/s	🧩 64/s		🧩
DBRX	⚠️	32k			🧩 50/s		📄 72/s	🧩
Qwen1.5 32B	⚠️	32k					📄				🧩
Command R	❌	128k	📄
Command R+	❌	128k	📄

Throughputs from https://artificialanalysis.ai/leaderboards/providers .

Fine Tune

	💼	ctx	Base	Ollama	Fireworks	Together	DeepInfra	OctoAI
Hermes 2 Pro Mistral 7B	✅		Mistral 7B	🧩	🧩			🧩
FireFunction V1	✅		Mixtral 8x7B		🚀
WizardLM 2 7B	✅		Mistral 7B				🧩
WizardLM 2 8x22B	✅		Mixtral 8x22B			📄	🧩	🧩
Capybara 34B	✅	200k	Yi 34B		🧩
Hermes 2 Pro Llama3 8B	⚠️		Llama3 8B	📄
Dolphin 2.9	⚠️	8k	Llama3 8B	🧩		📄	🧩

Proprietary

Provider	Model	ctx
Mistral	Large	32k	✅ 26/s
OpenAI	GPT-4 Turbo	128k	🚀 24/s
OpenAI	GPT-3.5 Turbo	16k	🚀 72/s
Anthropic	Claude 3 Haiku	200k	📄 88/s
Anthropic	Claude 3 Sonnet	200k	📄 59/s
Anthropic	Claude 3 Opus	200k	📄 26/s
Perplexity	Sonar Small Chat	16k	📄
Perplexity	Sonar Small Online	12k	📄
Perplexity	Sonar Medium Chat	16k	📄
Perplexity	Sonar Medium Online	12k	📄

Throughputs from https://artificialanalysis.ai/leaderboards/providers .

Automate updating these tables by scraping https://artificialanalysis.ai, along with Chatbot Arena Elo? This would be a good use case and showcase for this library/CLI.

Custom Models

You can also use any OpenAI-compatible API by passing an LLMConfig:

use AdrienBrault\Instructrice\InstructriceFactory;
use AdrienBrault\Instructrice\LLM\LLMConfig;
use AdrienBrault\Instructrice\LLM\Cost;
use AdrienBrault\Instructrice\LLM\OpenAiLLM;
use AdrienBrault\Instructrice\LLM\OpenAiJsonStrategy;
use AdrienBrault\Instructrice\LLM\Provider\ProviderModel;
use AdrienBrault\Instructrice\Http\GuzzleStreamingClient;
use GuzzleHttp\Client;

$instructrice->get(
    ...,
    llm: new LLMConfig(
        uri: 'https://api.together.xyz/v1/chat/completions',
        model: 'meta-llama/Llama-3-70b-chat-hf',
        contextWindow: 8000,
        label: 'Llama 3 70B',
        provider: 'Together',
        cost: Cost::create(0.9),
        strategy: OpenAiJsonStrategy::JSON,
        headers: [
            'Authorization' => 'Bearer ' . $apiKey,
        ]
    ),
);

You may also implement LLMInterface.

Acknowledgements

Obviously inspired by instructor-php and instructor.

How is it different from instructor-php?

Both libraries essentially do the same thing:

Automatic schema generation from classes
Multiple LLM/Providers abstraction/support
Many strategies to extract data: function calling, json mode, etc
Automatic deserialization/hydration
Maybe validation/retries later for this lib.

However, instructrice differs with:

Streaming first.
Preconfigured providers and LLMs, so you don't have to worry about:
- JSON mode, function calling, etc.
- The best prompt format to use
- Your options for local models
- Whether streaming works. For example, Groq can only stream without JSON mode/function calling.
PSR-3 logging
Guzzle+symfony/http-client support
No messages. You just pass context, prompt.
- I am hoping that this choice enables cool things later like supporting few-shot examples, evals, etc.
More flexible schema options
Higher level abstraction. You aren't able to provide a list of messages, while it is possible with instructor-php.

Notes/Ideas

Things to look into:

Unstructured
Llama Parse
EMLs
jina-ai/reader -> This is awesome, $client->request('GET', 'https://r.jina.ai/' . $url)
firecrawl

DSPy is very interesting. There are great ideas to be inspired by.

Ideally this library is good to prototype with, but can also support more advanced extraction workflows with few-shot examples, some sort of eval system, generating samples/output like DSPy, etc.

Would be cool to have a CLI, that accepts a FQCN and a context.

instructrice get "App\Entity\Customer" "$(cat some_email_body.md)"

Autosave all input/schema/output in a SQLite database, like llm? Leverage that to test examples, add few-shot examples, evals?
