-
Notifications
You must be signed in to change notification settings - Fork 4
Add Model Capabilities Support #77
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
@ilopezluna We are adding similar fields to the model spec. It borrows your idea of differenciating input and output types, and extend the capabilities with reasoning, knowledge cut off date and instruction following. Does the extention make sense to you? |
Added my feedback in your PR 👍 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM from a business logic perspective, but Emily will probably have more useful feedback.
@@ -22,6 +23,13 @@ const ( | |||
MediaTypeLicense = types.MediaType("application/vnd.docker.ai.license") | |||
|
|||
FormatGGUF = Format("gguf") | |||
|
|||
// IOTypeText Valid IO types | |||
IOTypeText = "text" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a thought: Would it be worth defining type IOType string
and then typing the constants? If you defined func (t IOType) MarshalJSON
and func (t *IOType) UnmarshalJSON
then you could perform the validation during decoding.
types/config.go
Outdated
// Capabilities describes what the model can do | ||
type Capabilities struct { | ||
IO IOTypes `json:"io"` | ||
ToolUsage bool `json:"tool_usage"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a *bool
might be better here so we can differentiate "doesn't support this capability" from "unknown". This could be important if we add new capabilities and runtimes choose to validate capabilities - older models published before the field addition should be nil
rather than false
, so runtimes can handle the unknown case differently if desired.
|
||
// Capabilities describes what the model can do | ||
type Capabilities struct { | ||
IO IOTypes `json:"io"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: instead of io
maybe modalities
? I like the grouping, but io
isn't really adding additional context when the two fields are input
and output
, whereas modalities
communicates what exactly about the input and output is being described.
wdyt @bergwolf? Since we are aiming for convergence, consensus is probably more important than the exact keys here. I am also okay with input_types
and output_types
(without the additional layer of nesting) like in modelpack/model-spec#52 if there is a strong preference
…lity" from "unknown". This could be important if we add new capabilities and runtimes choose to validate capabilities - older models published before the field addition should be nil rather than false, so runtimes can handle the unknown case differently if desired.
This PR introduces support for specifying model capabilities during packaging, allowing users to define input/output types and tool usage capabilities for their models.
Added new command-line flags to the package command:
--input
: Comma-separated list of input types (text, embedding, image, audio, video)--output
: Comma-separated list of output types (text, embedding, image, audio, video)--tool-usage
: Flag to enable tool usage supportNote: my first attempt was to define the capabilities during model instantiation. The benefit of that approach is that the config is created once and never updated.
However, I ultimately discarded it in favor of the current approach, as I believe it's cleaner from an implementation perspective.
Let me know which approach you prefer 🙏