
[dotnet-ai] ONNX Runtime + local LLM inference#243

Draft
luisquintanilla wants to merge 2 commits into dotnet:main from luisquintanilla:dotnet-ai/onnx-local-llm

Conversation

@luisquintanilla
Contributor

Fixes #231
Part of #225

Dependencies

Merge order: this PR should be merged after #237 (Plugin scaffold).

Summary

Adds two inference skills covering model execution at the edge:

  • onnx-runtime-inference: Running pre-trained ONNX models via a standalone InferenceSession or ML.NET's ApplyOnnxModel(), with execution provider selection (CPU, CUDA, DirectML). Includes references/tensors.md for tensor I/O patterns and a text preprocessing section using Microsoft.ML.Tokenizers for BERT/WordPiece models.
  • local-llm-inference: Running LLMs locally via Ollama and Foundry Local, both surfacing models through MEAI's IChatClient abstraction (provider-agnostic).
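
For illustration, a minimal sketch of the standalone InferenceSession path with execution provider selection. The model file name, input name, and shape here are placeholders, not taken from this PR, and exact SessionOptions method availability depends on which OnnxRuntime package variant is installed:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML.OnnxRuntime;
using Microsoft.ML.OnnxRuntime.Tensors;

// Pick an execution provider; CUDA/DirectML require the matching package.
using var options = new SessionOptions();
// options.AppendExecutionProvider_CUDA(); // Microsoft.ML.OnnxRuntime.Gpu
// options.AppendExecutionProvider_DML();  // Microsoft.ML.OnnxRuntime.DirectML
// (no call => default CPU provider)

// "model.onnx" and the input name/shape below are hypothetical.
using var session = new InferenceSession("model.onnx", options);

var input = new DenseTensor<float>(new[] { 1, 3, 224, 224 });
var inputs = new List<NamedOnnxValue>
{
    NamedOnnxValue.CreateFromTensor("input", input)
};

using var results = session.Run(inputs);
var output = results.First().AsTensor<float>();
```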

Changes

| File | Description |
| --- | --- |
| plugins/dotnet-ai/skills/onnx-runtime-inference/SKILL.md | Standalone ONNX + ML.NET ONNX, execution providers, text preprocessing with tokenizers |
| plugins/dotnet-ai/skills/onnx-runtime-inference/references/tensors.md | Tensor I/O patterns: shape management, named inputs/outputs, batch processing, OrtValue |
| plugins/dotnet-ai/skills/local-llm-inference/SKILL.md | Ollama + Foundry Local setup, model selection, IChatClient abstraction |
| tests/dotnet-ai/onnx-runtime-inference/eval.yaml | Eval: ONNX model inference scenario |
| tests/dotnet-ai/local-llm-inference/eval.yaml | Eval: local LLM setup + production rejection scenarios |
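
For illustration, a hedged sketch of the provider-agnostic IChatClient path via OllamaSharp. The endpoint, the llama3.2 model name, and the GetResponseAsync call are assumptions (method names have shifted across Microsoft.Extensions.AI preview versions; older previews used CompleteAsync):

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.Extensions.AI;
using OllamaSharp;

// OllamaApiClient implements MEAI's IChatClient, so calling code stays
// provider-agnostic; Foundry Local can be swapped in behind the same interface.
// The endpoint and model name below are assumptions for a local Ollama install.
IChatClient client = new OllamaApiClient(
    new Uri("http://localhost:11434"), "llama3.2");

var response = await client.GetResponseAsync(
    "Summarize ONNX Runtime in one sentence.");
Console.WriteLine(response.Text);
```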

Key Packages

  • Microsoft.ML.OnnxRuntime / .Gpu / .DirectML
  • Microsoft.ML.OnnxTransformer
  • OllamaSharp
  • Microsoft.AI.Foundry.Local

Validation

  • onnx-runtime-inference covers both standalone and ML.NET approaches
  • Execution providers (CPU, CUDA, DirectML) documented
  • tensors.md provides tensor shape and memory management guidance
  • local-llm-inference presents Ollama and Foundry Local neutrally
  • Both eval.yaml files have scenarios
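
To make the tensor memory-management point concrete, a sketch of the OrtValue pattern that tensors.md covers, with hypothetical input/output names ("input_ids", "logits") and shapes; the multi-argument Run overload shown here requires a recent ONNX Runtime release, so treat it as an assumption rather than a definitive API reference:

```csharp
using System;
using Microsoft.ML.OnnxRuntime;

// Hypothetical model and tensor names; shapes depend on the actual model.
using var session = new InferenceSession("model.onnx");
float[] data = new float[1 * 128];

// OrtValue wraps the managed buffer directly, avoiding an extra copy.
using var inputValue = OrtValue.CreateTensorValueFromMemory(
    data, new long[] { 1, 128 });

using var runOptions = new RunOptions();
using var outputs = session.Run(
    runOptions,
    new[] { "input_ids" },   // input names (model-specific)
    new[] { inputValue },
    new[] { "logits" });     // output names (model-specific)

ReadOnlySpan<float> logits = outputs[0].GetTensorDataAsSpan<float>();
```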

Adds two inference skills: onnx-runtime-inference (standalone and ML.NET
ONNX, execution providers, tensors reference) and local-llm-inference
(Ollama and Foundry Local through IChatClient).

Fixes dotnet#231
Part of dotnet#225

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
.NET 10 is the current LTS (released Nov 2025). .NET 8 reaches
end-of-support Nov 2026.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions
Contributor

github-actions bot commented Apr 6, 2026

This PR has been automatically marked as stale because it has no activity for 30 days. It will be closed if no further activity occurs within another 7 days of this comment. If it is closed, you may reopen it anytime when you're ready again.

Generated by Close Stale Pull Requests
