Closed
Labels
- .NET: Issue or Pull requests regarding .NET code
- Build: Features planned for next Build conference
- SK-H2-Planning: Issues tagged with this label are listed in SK H2 Planning loop
- ai connector: Anything related to AI connectors
- sk team issue: A tag to denote issues that were created by the Semantic Kernel team (i.e., not the community)
Description
Implement hybrid model orchestration within Semantic Kernel that uses both local and cloud models. The system should default to local models for inference where available and fall back seamlessly to cloud models otherwise. It should also support local memory storage and retrieval, with cloud-based solutions serving as a fallback or as an additional backup. This hybrid strategy should be abstracted within Semantic Kernel so that developers can specify preferences and priorities without managing the underlying complexity. It should build on the capabilities we already have.
Scenarios
- As a developer, I want my Semantic Kernel application to utilize local models for inference to achieve low-latency responses while falling back to cloud models when local models are unavailable or insufficient.
Requirements
Model Orchestration Layer:
- Create a model orchestration layer within the Semantic Kernel capable of routing requests to either local or cloud models based on availability and priority settings.
- Develop a configuration file where users can specify local and cloud model endpoints and prioritize their usage.
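To make the two requirements above concrete, here is a minimal, language-agnostic sketch written in Python. It is not Semantic Kernel API: the config keys (`models`, `kind`, `priority`), the placeholder endpoint URLs, and the names `ModelEndpoint`, `PriorityRouter`, and `ModelUnavailableError` are all assumptions made for illustration. The idea is that endpoints are read from a configuration document, sorted by priority, and tried in order until one answers.

```python
import json
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical configuration: keys and endpoint URLs are placeholders,
# not a format defined by Semantic Kernel.
CONFIG_JSON = """
{
  "models": [
    {"name": "cloud-chat", "kind": "cloud", "endpoint": "https://example.invalid/v1", "priority": 2},
    {"name": "local-chat", "kind": "local", "endpoint": "http://localhost:8080/v1", "priority": 1}
  ]
}
"""


@dataclass(frozen=True)
class ModelEndpoint:
    name: str
    kind: str        # "local" or "cloud"
    endpoint: str
    priority: int    # lower number is tried first


class ModelUnavailableError(Exception):
    """Raised by a backend when its model cannot serve the request."""


# A backend takes the endpoint description and a prompt and returns text.
Backend = Callable[[ModelEndpoint, str], str]


def load_endpoints(raw: str) -> List[ModelEndpoint]:
    """Parse the configuration and return endpoints ordered by priority."""
    entries = json.loads(raw)["models"]
    return sorted((ModelEndpoint(**e) for e in entries), key=lambda m: m.priority)


class PriorityRouter:
    """Tries each configured endpoint in priority order until one succeeds."""

    def __init__(self, endpoints: List[ModelEndpoint], backends: Dict[str, Backend]):
        self._endpoints = list(endpoints)
        self._backends = backends

    def complete(self, prompt: str) -> str:
        errors = []
        for ep in self._endpoints:
            try:
                return self._backends[ep.kind](ep, prompt)
            except ModelUnavailableError as exc:
                errors.append(f"{ep.name}: {exc}")
        raise ModelUnavailableError("all models failed: " + "; ".join(errors))


# Stand-in backends for the demo; real ones would call a local runtime or a cloud service.
def offline_local(ep: ModelEndpoint, prompt: str) -> str:
    raise ModelUnavailableError("local runtime not running")


def fake_cloud(ep: ModelEndpoint, prompt: str) -> str:
    return f"[{ep.name}] {prompt}"


router = PriorityRouter(load_endpoints(CONFIG_JSON),
                        {"local": offline_local, "cloud": fake_cloud})
# The local stand-in fails, so the request falls through to the cloud stand-in.
print(router.complete("Hello"))
```

Keeping the priority in configuration rather than in code means the local-first default can be changed per deployment without touching the calling application.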
Inference Abstraction:
- Abstract model inference calls such that the application can make a single call, and the underlying architecture decides whether to use local or cloud resources.
- Support dynamic switching between local and cloud models based on real-time performance monitoring (e.g., latency, throughput).
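One possible reading of the two points above, again as a hedged Python sketch rather than Semantic Kernel API: the application calls a single `infer` method, and a selector decides between local and cloud based on recently observed local latency. The class name `LatencyAwareSelector`, the `budget_seconds` and `window` parameters, and the ageing rule used to retry the local model are assumptions for illustration. Throughput is not modelled here.

```python
import time
from collections import deque
from typing import Callable, Deque


class LatencyAwareSelector:
    """Prefers the local model until its recent average latency exceeds a budget."""

    def __init__(self,
                 local: Callable[[str], str],
                 cloud: Callable[[str], str],
                 budget_seconds: float,
                 window: int = 5,
                 clock: Callable[[], float] = time.perf_counter):
        self._local = local
        self._cloud = cloud
        self._budget = budget_seconds
        # Sliding window of the most recent local latencies, in seconds.
        self._samples: Deque[float] = deque(maxlen=window)
        self._clock = clock

    def local_average(self) -> float:
        """Average of the recorded local latencies, or 0.0 if none are recorded."""
        if not self._samples:
            return 0.0
        return sum(self._samples) / len(self._samples)

    def choose(self) -> str:
        """Return "local" while the average is within budget, otherwise "cloud"."""
        return "local" if self.local_average() <= self._budget else "cloud"

    def infer(self, prompt: str) -> str:
        """Single entry point: the caller never names the target model."""
        if self.choose() == "cloud":
            # Age out one local sample per cloud call so the local model
            # is eventually retried instead of being abandoned for good.
            if self._samples:
                self._samples.popleft()
            return self._cloud(prompt)
        start = self._clock()
        result = self._local(prompt)
        self._samples.append(self._clock() - start)
        return result


# Stand-in models for the demo; real ones would run actual inference.
selector = LatencyAwareSelector(
    local=lambda p: f"local:{p}",
    cloud=lambda p: f"cloud:{p}",
    budget_seconds=0.5,
)
print(selector.infer("Hello"))
```

The injectable `clock` is there so the switching behaviour can be tested deterministically. A production version would likely also need error handling and a more careful re-probing policy than the simple ageing rule shown here.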
Status
Sprint: Done