Skip to content

Add Model Capabilities Support #77

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 8 commits into
base: main
Choose a base branch
from
Open

Conversation

ilopezluna
Copy link
Contributor

@ilopezluna ilopezluna commented May 20, 2025

This PR introduces support for specifying model capabilities during packaging, allowing users to define input/output types and tool usage capabilities for their models.

Added new command-line flags to the package command:
--input: Comma-separated list of input types (text, embedding, image, audio, video)
--output: Comma-separated list of output types (text, embedding, image, audio, video)
--tool-usage: Flag to enable tool usage support

Note: my first attempt was to define the capabilities during model instantiation. The benefit of that approach is that the config is created once and never updated.
However, I ultimately discarded it in favor of the current approach, as I believe it's cleaner from an implementation perspective.
Let me know which approach you prefer 🙏

@ilopezluna ilopezluna marked this pull request as ready for review May 20, 2025 14:35
@ilopezluna ilopezluna requested a review from a team May 20, 2025 14:35
@bergwolf
Copy link

@ilopezluna We are adding similar fields to the model spec. It borrows your idea of differenciating input and output types, and extend the capabilities with reasoning, knowledge cut off date and instruction following. Does the extention make sense to you?

@ilopezluna
Copy link
Contributor Author

@ilopezluna We are adding similar fields to the model spec. It borrows your idea of differenciating input and output types, and extend the capabilities with reasoning, knowledge cut off date and instruction following. Does the extention make sense to you?

Added my feedback in your PR 👍

Copy link
Contributor

@xenoscopic xenoscopic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM from a business logic perspective, but Emily will probably have more useful feedback.

@@ -22,6 +23,13 @@ const (
MediaTypeLicense = types.MediaType("application/vnd.docker.ai.license")

FormatGGUF = Format("gguf")

// IOTypeText Valid IO types
IOTypeText = "text"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a thought: Would it be worth defining type IOType string and then typing the constants? If you defined func (t IOType) MarshalJSON and func (t *IOType) UnmarshalJSON then you could perform the validation during decoding.

@ilopezluna ilopezluna requested a review from ekcasey May 29, 2025 10:25
types/config.go Outdated
// Capabilities describes what the model can do
type Capabilities struct {
IO IOTypes `json:"io"`
ToolUsage bool `json:"tool_usage"`
Copy link
Contributor

@ekcasey ekcasey Jun 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a *bool might be better here so we can differentiate "doesn't support this capability" from "unknown". This could be important if we add new capabilities and runtimes choose to validate capabilities - older models published before the field addition should be nil rather than false, so runtimes can handle the unknown case differently if desired.


// Capabilities describes what the model can do
type Capabilities struct {
IO IOTypes `json:"io"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: instead of io maybe modalities? I like the grouping, but io isn't really adding additional context when the two fields are input and output, whereas modalities communicates what exactly about the input and output is being described.

wdyt @bergwolf? Since we are aiming for convergence, consensus is probably more important than the exact keys here. I am also okay with input_types and output_types (without the additional layer of nesting) like in modelpack/model-spec#52 if there is a strong preference

…lity" from "unknown". This could be important if we add new capabilities and runtimes choose to validate capabilities - older models published before the field addition should be nil rather than false, so runtimes can handle the unknown case differently if desired.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants