Implements a unified `encode`ing/`decode`ing pipeline for `llm` #442

kiranandcode · 2025-12-12T02:50:28Z

This PR refactors the encoding/decoding pipeline to go through a unified encode/decode operation.

effectful/handlers/llm/encoding.py implements an associated type morally equivalent to:
```
structure Encodable (A: Type) where
    U: Type
    encode: A -> U
    decode: U -> A
```
it provides a function type_to_encodable_type which does a sort of type-class resolution and returns an instance of Encodable for a given type T.
effectful/handlers/llm/provider.py now uniformly constructs these Encodable[T] objects and uses their encode and decode operations when calling the LLM.

The high level pipeline to implement Template.__call__(template, args):

use type_to_encodable_type to obtain an Encodable[T] for each type in the signature of template
use the associated encode operators to convert the arguments args to pydantic types
use serialize to map pydantic types to OpenAIMessages
call provider with message
while tool_calls:
- use decode to map llm provided pydantic types back to python user types
- call tool with arguments
- use encode to map result to a pydantic type
- use serialize to map result to OpenAIMessage
- send back to LLM
take final call, use type_to_encodable_type to get Encodable for return type
use decode to map pydantic type to user type
return final result

I expect some discussion about the interface around Tool - for this initial version, I kept most of the API and logic the same, but we might want to restructure it a bit.

Another aspect is the encoding type resolution might be a bit overengineered. Currently it descends into types such as tuple[Image.Image, Image.Image] to construct an Encodable that does the right thing for nested types as well.

eb8680

This looks great. Two high-level questions in addition to the comments below:

I'm not seeing the type_to_encodable_type logic for the case when the user type T is a pydantic.BaseModel subtype, or any test cases. Is that case meant to be the object case where encode and decode are identity mappings? Can you add tests that exercise this logic?
What happened to deserialize? Have we decided to identify it with pydantic.BaseModel.model_validate_json?

effectful/handlers/llm/providers.py

effectful/handlers/llm/encoding.py

effectful/handlers/llm/providers.py

kiranandcode · 2025-12-12T16:07:36Z

@eb8680

I'm not seeing the type_to_encodable_type logic for the case when the user type T is a pydantic.BaseModel subtype, or any test cases. Is that case meant to be the object case where encode and decode are identity mappings? Can you add tests that exercise this logic?

Yes, will do.

What happened to deserialize? Have we decided to identify it with pydantic.BaseModel.model_validate_json?

Yes, from implementing this, a slight refinement on the design. Pydantic gives us:

serialize: U -> json
deserialize: json -> U

it could be that the serialize defop I've defined is misnamed, what it provides is actually serialize: U -> OpenAIMessage, for which we don't need an inverse.

kiranandcode · 2025-12-12T17:56:15Z

@eb8680 @jfeser after some iterations, whittled down type_to_encodable_type a bit more. Now we have an implementation, what I'm realising is that for most case type_to_encodable_type is identity (U is the same type as T).

The one case where we don't want identity is for images:

@type_to_encodable_type.register(Image.Image)
class EncodableImage(_Encodable[Image.Image, ChatCompletionImageUrlObject]):
    t = ChatCompletionImageUrlObject

    @classmethod
    def encode(cls, image: Image.Image) -> ChatCompletionImageUrlObject:
        return {
            "detail": "auto",
            "url": _pil_image_to_base64_data_uri(image),
        }

    @classmethod
    def decode(cls, image: ChatCompletionImageUrlObject) -> Image.Image:
        image_url = image["url"]
        if not image_url.startswith("data:image/"):
            raise RuntimeError(
                f"expected base64 encoded image as data uri, received {image_url}"
            )
        data = image_url.split(",")[1]
        return Image.open(fp=io.BytesIO(base64.b64decode(data)))

and on top of this, we want non-identity mappings for two more cases on top of this: list[Image.Image], and tuple[....,Image.Image,...].

We map these types to list[ChatCompletionImageUrlObject] and tuple[...,ChatCompletionImageUrlObject,...] so when we use pydantic to serialize user-input that contains PIL.Image.Image's, we don't get an exception.

That being said, maybe this is incorrect? Pydantic won't complain now, but currently serialize will just call str on the encoded U, which will provide a string input to the llm, and return a list with a single ChatCompletionTextObject but maybe we'd want serialize to return a list of ChatCompletionImageUrlObjects.

adding a test for this functionality (whether tools that return list[Image] work as expected)

kiranandcode · 2025-12-12T18:04:41Z

Test fails, as expected:

I retrieved a set of 5 images. Each image appears to be a simple placeholder graphic encoded in Base64. Here's a general description:

1. **Image 1 to 5:** All images are identical, consisting of a very basic visual placeholder. They are tiny (8x8 pixels) representations that lack distinctive features or detailed content, likely used just as a stand-in graphic, possibly for testing purposes. They don't contain any meaningful scenes or identifiable objects.

These are minimalistic graphics and don't offer much to describe. Please provide more specific criteria if you need a different type of image or additional detail!

effectful/handlers/llm/encoding.py

eb8680

Looks good to me, but @jfeser and @datvo06 should probably also sign off before we merge.

datvo06

Thanks! LGTM!

eb8680 · 2025-12-15T14:55:13Z

@jfeser any further comments before this lands?

jfeser

A few nits, but otherwise looks good!

effectful/handlers/llm/encoding.py

kiranandcode added 2 commits December 11, 2025 17:15

implemented unified encoding type

4c907e5

implemented decoding

3477aa1

kiranandcode requested review from eb8680 and jfeser December 12, 2025 02:50

kiranandcode mentioned this pull request Dec 12, 2025

implemented type-based decoding #440

Closed

eb8680 reviewed Dec 12, 2025

View reviewed changes

eb8680 requested a review from datvo06 December 12, 2025 14:59

eb8680 mentioned this pull request Dec 12, 2025

Initial version of Lexical Context Collection - Collecting Tools and Template #434

Merged

jfeser reviewed Dec 12, 2025

View reviewed changes

effectful/handlers/llm/providers.py Outdated Show resolved Hide resolved

kiranandcode added 13 commits December 12, 2025 11:13

unified __init__

63c1c98

added tests for basemodels

418253a

s/@property/@functools.cached_property/

9de0073

type for encode and decode

9530180

removed handling for numbers.Number and explicit tests for complex

e11759c

fixed is_dataclass checks

2ed0254

updated to check parameter annotations in Tool.of_operation constructor

14f0906

updated serializer to be more selective in what is an image

ade480d

reducing number of #type: ignores, and switching to typing.Any

39c4cfa

removed comment

9378d09

dropped dataclass support

5ec1451

dropped tests for dataclass with image

6d6f5f2

updated dataclass tests to stop assuming pydantic models

6cf4157

test for tool that returns list of images

2ab6ad1

made serialization a parameter of encodable and thus type-directed

193b777

datvo06 reviewed Dec 12, 2025

View reviewed changes

effectful/handlers/llm/encoding.py Outdated Show resolved Hide resolved

kiranandcode added 2 commits December 12, 2025 14:21

dropped test for tool that returns list of images

60195d6

dropped registration of encodable types

d4cda9a

kiranandcode requested a review from jfeser December 12, 2025 20:31

eb8680 approved these changes Dec 12, 2025

View reviewed changes

kiranandcode requested a review from datvo06 December 13, 2025 07:40

datvo06 approved these changes Dec 14, 2025

View reviewed changes

jfeser requested changes Dec 15, 2025

View reviewed changes

effectful/handlers/llm/encoding.py Outdated Show resolved Hide resolved

effectful/handlers/llm/encoding.py Outdated Show resolved Hide resolved

kiranandcode added 2 commits December 15, 2025 10:52

dropped unused typevar

2c33590

s/_Encodable/EncodableAs/

a2506de

kiranandcode requested a review from jfeser December 15, 2025 15:55

eb8680 added the module:llm label Dec 15, 2025

eb8680 added this to the LLM Infrastructure milestone Dec 15, 2025

jfeser approved these changes Dec 15, 2025

View reviewed changes

jfeser merged commit 44d7d12 into staging-llm Dec 15, 2025
6 checks passed

jfeser deleted the kg-encodable branch December 15, 2025 16:08

This was referenced Dec 16, 2025

Adding mypy check and test #422

Open

Type-based decoding for Template results (generalises Constrained Decoding) #432

Closed

Implements a unified encodeing/decodeing pipeline for llm #442

Implements a unified encodeing/decodeing pipeline for llm #442

Uh oh!

Conversation

kiranandcode commented Dec 12, 2025

Uh oh!

eb8680 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

kiranandcode commented Dec 12, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

kiranandcode commented Dec 12, 2025

Uh oh!

kiranandcode commented Dec 12, 2025

Uh oh!

Uh oh!

eb8680 left a comment

Choose a reason for hiding this comment

Uh oh!

datvo06 left a comment

Choose a reason for hiding this comment

Uh oh!

eb8680 commented Dec 15, 2025

Uh oh!

jfeser left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Implements a unified `encode`ing/`decode`ing pipeline for `llm` #442

Implements a unified `encode`ing/`decode`ing pipeline for `llm` #442

kiranandcode commented Dec 12, 2025 •

edited

Loading