Structures mostly done, cleanup in docs/hub (subfolders)
ddebowczyk committed Apr 28, 2024
1 parent cd4d0e7 commit 5c2c30e
Showing 116 changed files with 1,288 additions and 608 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -518,7 +518,7 @@ class Skill {
You can use the `ValidationMixin` trait to add custom validation to your data objects.

```php
use Cognesy\Instructor\Traits\ValidationMixin
use Cognesy\Instructor\Validation\Traits\ValidationMixin;

class User {
use ValidationMixin;
5 changes: 0 additions & 5 deletions docs/concepts/classification.md
@@ -70,11 +70,6 @@ assert($prediction->classLabel == Label::SPAM);

## Multi-Label Classification

!!! example

Run example from CLI: `php examples/ClassificationMulticlass/run.php`


### Defining the Structures

For multi-label classification, we introduce a new enum class and a different PHP class to handle multiple labels.
8 changes: 0 additions & 8 deletions docs/examples/index.md

This file was deleted.

File renamed without changes.
File renamed without changes.
@@ -21,7 +21,7 @@ class UserDetails
#[Assert\Callback]
public function validateName(ExecutionContextInterface $context, mixed $payload) {
if ($this->name !== strtoupper($this->name)) {
$context->buildViolation("Name must be in uppercase.")
$context->buildViolation("Name must be in all uppercase letters.")
->atPath('name')
->setInvalidValue($this->name)
->addViolation();
File renamed without changes.
5 changes: 4 additions & 1 deletion docs/hub/scalars.md → docs/hub/advanced/scalars.md
@@ -1,4 +1,4 @@
# Extracting Scalar Values
# Extracting scalar values

Sometimes we just want to get quick results without defining a class for
the response model, especially if we're trying to get a straight, simple
@@ -7,6 +7,9 @@ a simplified API for such cases.

```php
<?php
$loader = require 'vendor/autoload.php';
$loader->add('Cognesy\\Instructor\\', __DIR__ . '/../../src/');

use Cognesy\Instructor\Extras\Scalars\Scalar;
use Cognesy\Instructor\Instructor;

File renamed without changes.
77 changes: 77 additions & 0 deletions docs/hub/advanced/structure.md
@@ -0,0 +1,77 @@
# Structures

Structures let you dynamically define the shape of the data to be extracted
by the LLM.

Use `Structure::define()` to define the structure and pass it to Instructor
as the response model.

If a `Structure` instance is provided as the response model, Instructor
returns an array in the shape you defined.

`Structure::define()` accepts an array of `Field` objects.

The following field types are currently supported:
- `Field::bool()` - bool value
- `Field::int()` - int value
- `Field::string()` - string value
- `Field::float()` - float value
- `Field::enum()` - enum value
- `Field::structure()` - for nesting structures

Fields can be marked as optional with `$field->optional(bool $isOptional = true)`.
By default, all defined fields are required.

You can also provide extra instructions for the LLM with
`$field->description(string $description)`.

Instructor also supports validation for structures.

You can define a field validator with:
- `$field->validator(callable $validator)` - `$validator` has to return an instance of `ValidationResult`
- `$field->validIf(callable $condition, string $message)` - `$condition` has to return false if validation fails; `$message` will be provided to the LLM as an explanation so it can self-correct on the next extraction attempt
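
The `validIf()` contract can be tried out without calling the LLM. The following is a minimal sketch in plain PHP that assumes only what the text above states: the condition returns false when validation fails, and the message is what gets passed back for self-correction. The `checkField()` helper is hypothetical and is not part of Instructor.

```php
<?php
// Hypothetical helper, NOT part of Instructor: applies a validIf-style
// condition and returns null on success or the message on failure.
function checkField(callable $condition, string $message, mixed $value): ?string {
    return $condition($value) ? null : $message;
}

// Same style of condition and message as used with Field::int()->validIf(...)
$isPositive = fn($value) => $value > 0;
$message = "Age has to be a positive number";

$ok = checkField($isPositive, $message, 25);     // condition passes
$failed = checkField($isPositive, $message, -3); // condition fails
```

Here `$ok` is null and `$failed` holds the message. How Instructor wires the message into the retry prompt is not shown here.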

```php
<?php
global $events;
ini_set('display_errors', 1);
ini_set('display_startup_errors', 1);
error_reporting(E_ALL);

$loader = require 'vendor/autoload.php';
$loader->add('Cognesy\\Instructor\\', __DIR__ . '/../../src/');

use Cognesy\Instructor\Extras\Structure\Field;
use Cognesy\Instructor\Extras\Structure\Structure;
use Cognesy\Instructor\Instructor;

enum Role : string {
    case Manager = 'manager';
    case Line = 'line';
}

// Define the shape of the data to extract, including a nested structure
$structure = Structure::define([
    'name' => Field::string('Name of the person'),
    'age' => Field::int('Age of the person')->validIf(fn($value) => $value > 0, "Age has to be a positive number"),
    'address' => Field::structure(Structure::define([
        'street' => Field::string('Street name')->optional(),
        'city' => Field::string('City name'),
        'zip' => Field::string('Zip code')->optional(),
    ]), 'Address of the person'),
    'role' => Field::enum(Role::class, 'Role of the person'),
//    'properties' => Field::array(Structure::define([
//        'name' => Field::string('Name of the property'),
//        'value' => Field::string('Value of the property'),
//    ]), 'Additional properties of the person'),
], 'Person', 'A person object');

$text = "Jane Doe lives in Springfield. She is 25 years old and works as a line worker. McDonald's in New York is located at 456 Elm St, NYC, 12345.";

$person = (new Instructor)->respond(
    messages: $text,
    responseModel: $structure,
);

dump($person);
?>
```
File renamed without changes.
File renamed without changes.
@@ -1,4 +1,4 @@
# Azure support
# Support for Azure OpenAI API

You can connect to an Azure OpenAI instance using a dedicated client provided
by Instructor. Please note it requires setting up your own model deployment
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
56 changes: 56 additions & 0 deletions docs/hub/basics/basic.md
@@ -0,0 +1,56 @@
# Basic use

Instructor allows you to use large language models to extract information
from text (or the content of chat messages) while following the structure
you define.

The LLM does not 'parse' the text to find and retrieve the information.
Extraction leverages the LLM's ability to comprehend the provided text and infer
the meaning of the information it contains, filling the fields of the
response object with values that match the types and semantics of the
class fields.

The simplest way to use Instructor is to call the `respond` method
on an Instructor instance. This method takes a string (or an array of
messages in the OpenAI chat format) as input and returns
data extracted from the provided text (or chat) using LLM inference.

The returned object contains the values of the fields extracted from the text.

The format of the extracted data is defined by the response model, which
in this case is a simple PHP class with some public properties.

```php
<?php
$loader = require 'vendor/autoload.php';
$loader->add('Cognesy\\Instructor\\', __DIR__ . '/../../src/');

use Cognesy\Instructor\Instructor;

// Step 1: Define a class that represents the structure and semantics
// of the data you want to extract
class User {
    public int $age;
    public string $name;
}

// Step 2: Get the text (or chat messages) you want to extract data from
$text = "Jason is 25 years old and works as an engineer.";
print("Input text:\n");
print($text . "\n\n");

// Step 3: Extract structured data using default language model API (OpenAI)
print("Extracting structured data using LLM...\n\n");
$user = (new Instructor)->respond(
    messages: $text,
    responseModel: User::class,
);

// Step 4: Now you can use the extracted data in your application
print("Extracted data:\n");
dump($user);

assert(isset($user->name));
assert(isset($user->age));
?>
```
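
The two input forms accepted by `respond` can be related with a small sketch. It assumes that a bare string is treated as a single `user` message, which is an assumption based on the examples in this commit rather than a confirmed detail of Instructor's internals. The `toChatMessages()` helper is hypothetical.

```php
<?php
// Hypothetical helper, NOT Instructor's API: normalizes the input into
// OpenAI-style chat messages, assuming a bare string is one user message.
function toChatMessages(string|array $messages): array {
    if (is_string($messages)) {
        return [['role' => 'user', 'content' => $messages]];
    }
    return $messages;
}

$text = "Jason is 25 years old and works as an engineer.";

// Plain string input
$fromString = toChatMessages($text);

// Explicit chat message input
$fromArray = toChatMessages([
    ['role' => 'user', 'content' => $text],
]);
```

Under that assumption both calls produce the same message list, so the string form is simply the shorter way to pass a single user message.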
42 changes: 42 additions & 0 deletions docs/hub/basics/basic_via_mixin.md
@@ -0,0 +1,42 @@
# Basic use with HandlesExtraction trait

Instructor provides the `HandlesExtraction` trait, which you can use to enable
extraction capabilities directly on a class via a static `extract()` method.

The `extract()` method returns an instance of the class with the data extracted
using Instructor.

The `extract()` method has the following signature (you can also find it in the
`CanHandleExtraction` interface):

```php
static public function extract(
    string|array $messages, // (required) The message(s) to extract data from
    string $model = '', // (optional) The model to use for extraction (otherwise - use default)
    int $maxRetries = 2, // (optional) The number of retries in case of validation failure
    array $options = [], // (optional) Additional data to pass to the Instructor or LLM API
    Instructor $instructor = null // (optional) The Instructor instance to use for extraction
) : static;
```

```php
<?php
$loader = require 'vendor/autoload.php';
$loader->add('Cognesy\\Instructor\\', __DIR__ . '/../../src/');

use Cognesy\Instructor\Extras\Mixin\HandlesExtraction;

class User {
    use HandlesExtraction;

    public int $age;
    public string $name;
}

// extract() returns a User instance populated from the text
$user = User::extract("Jason is 25 years old and works as an engineer.");
dump($user);

assert(isset($user->name));
assert(isset($user->age));
?>
```
File renamed without changes.
File renamed without changes.
@@ -29,7 +29,7 @@ class User
class UserWithPrivateField
{
public string $name;
protected int $age = 0;
private int $age = 0;
private string $password = '';

public function getAge(): int {
@@ -45,26 +45,32 @@ $text = <<<TEXT
Jason is 25 years old. His password is '123admin'.
TEXT;


// CASE 1: Class with public fields

$user = (new Instructor)->respond(
messages: [['role' => 'user', 'content' => $text]],
messages: $text,
responseModel: User::class
);

$userPriv = (new Instructor)->respond(
messages: [['role' => 'user', 'content' => $text]],
responseModel: UserWithPrivateField::class
);

echo "User with public 'password' field\n";
echo "User with public fields\n";
dump($user);

echo "User with private 'password' field\n";
dump($userPriv);

assert($user->name === "Jason");
assert($user->getAge() === 25);
assert($user->getPassword() === '123admin');


// CASE 2: Class with some private fields

$userPriv = (new Instructor)->respond(
messages: $text,
responseModel: UserWithPrivateField::class,
);

echo "User with private 'password' and 'age' fields\n";
dump($userPriv);

assert($userPriv->name === "Jason");
assert($userPriv->getAge() === 0);
assert($userPriv->getPassword() === '');
File renamed without changes.
44 changes: 44 additions & 0 deletions docs/hub/basics/validation.md
@@ -0,0 +1,44 @@
# Validation

Instructor uses validation to verify that the response generated by the LLM
meets the requirements of your response model. If the response does not
meet the requirements, Instructor will throw an exception.

Instructor uses Symfony's Validator component to validate the response.
See its documentation for more information on usage:
https://symfony.com/doc/current/components/validator.html

The following example demonstrates how to use Symfony Validator constraints
to validate the email field of the response.

```php
<?php
$loader = require 'vendor/autoload.php';
$loader->add('Cognesy\\Instructor\\', __DIR__ . '/../../src/');

use Cognesy\Instructor\Instructor;
use Symfony\Component\Validator\Constraints as Assert;

class UserDetails
{
    public string $name;
    #[Assert\Email]
    #[Assert\NotBlank]
    /** Find user's email provided in the text */
    public string $email;
}

// The input contains no email address, so validation is expected to fail
$caughtException = false;
$user = (new Instructor)->request(
    messages: [['role' => 'user', 'content' => "you can reply to me via mail -- Jason"]],
    responseModel: UserDetails::class,
)->onError(function($e) use (&$caughtException) {
    $caughtException = true;
})->get();

dump($user);

assert($user === null);
assert($caughtException === true);
?>
```
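
The kind of check that the `NotBlank` and `Email` constraints express can be approximated in plain PHP. The sketch below swaps in PHP's built-in `filter_var()` with `FILTER_VALIDATE_EMAIL`; it is not how Symfony implements its constraints, and the two may disagree on edge cases. The `emailProblem()` function and its messages are hypothetical.

```php
<?php
// Stand-in check using PHP's built-in filter_var(); this is NOT
// Symfony's implementation of the NotBlank and Email constraints.
function emailProblem(string $email): ?string {
    if (trim($email) === '') {
        return 'Email should not be blank.';
    }
    if (filter_var($email, FILTER_VALIDATE_EMAIL) === false) {
        return 'Email is not a valid email address.';
    }
    return null; // no problem found
}

$missing = emailProblem('');                    // blank value
$malformed = emailProblem('jason-at-example');  // not an email address
$valid = emailProblem('jason@example.com');     // passes both checks
```

A text that mentions no email address, like the one in the example above, leaves the model with nothing valid to put in the field, which is why validation is expected to fail there.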
53 changes: 53 additions & 0 deletions docs/hub/basics/validation_mixin.md
@@ -0,0 +1,53 @@
# Validation across multiple fields

Sometimes property-level validation is not enough: you may want to check the values of multiple
properties and accept or reject the response based on their combination.
Or the assertions provided by Symfony may not be enough for your use case.

In such cases, you can easily add custom validation code to your response model by:
- using `ValidationMixin`
- and defining validation logic in the `validate()` method.

In this example, the LLM should be able to correct the typo in the message (the graduation year we provided
is `1010` instead of `2010`) and respond with the correct graduation year.

```php
<?php
$loader = require 'vendor/autoload.php';
$loader->add('Cognesy\\Instructor\\', __DIR__ . '/../../src/');

use Cognesy\Instructor\Instructor;
use Cognesy\Instructor\Validation\Traits\ValidationMixin;
use Cognesy\Instructor\Validation\ValidationResult;

class UserDetails
{
    use ValidationMixin;

    public string $name;
    public int $birthYear;
    public int $graduationYear;

    // Cross-field rule: graduation must come after birth
    public function validate() : ValidationResult {
        if ($this->graduationYear > $this->birthYear) {
            return ValidationResult::valid();
        }
        return ValidationResult::fieldError(
            field: 'graduationYear',
            value: $this->graduationYear,
            message: "Graduation year has to be after birth year.",
        );
    }
}

$user = (new Instructor)->respond(
    messages: [['role' => 'user', 'content' => 'Jason was born in 1990 and graduated in 1010.']],
    responseModel: UserDetails::class,
    maxRetries: 2
);

dump($user);

assert($user->graduationYear === 2010);
?>
```
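
The cross-field rule itself can be exercised in isolation, without the LLM or the `ValidationMixin` trait. This is a minimal sketch in plain PHP: the `graduationError()` function is hypothetical and returns a message string (or null) instead of Instructor's `ValidationResult`.

```php
<?php
// Hypothetical stand-in for the validate() logic above; returns null
// when the values are consistent, or an error message otherwise.
function graduationError(int $birthYear, int $graduationYear): ?string {
    if ($graduationYear > $birthYear) {
        return null;
    }
    return "Graduation year has to be after birth year.";
}

$typo = graduationError(1990, 1010);  // the value as written in the message
$fixed = graduationError(1990, 2010); // the value expected after self-correction
```

The first call reports the error that would trigger a retry, while the second passes, matching the final assertion in the example.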
