@@ -3,13 +3,13 @@ title: Run a Phi-4-mini chatbot powered by ONNX Runtime
 weight: 2

 overview: |
-This Learning Path shows you how to use a 32-core Azure Dpls_v6 instance powered by an Arm Neoverse-N2 CPU to build a simple chatbot server that you can then use to provide a chatbot to serve a small number of concurrent users.
+This Learning Path shows you how to use a 32-core Azure Dpls_v6 instance powered by an Arm Neoverse N2 CPU to build a simple chatbot that you can use to serve a small number of concurrent users.

-This architecture is suitable for businesses looking to deploy the latest Generative AI technologies with RAG capabilities using their existing CPU compute capacity and deployment pipelines.
+This architecture is suitable for deploying the latest Generative AI technologies with RAG capabilities using your existing CPU compute capacity and deployment pipelines.

-The demo uses the ONNX runtime, which Arm has enhanced with its own Kleidi technologies. Further optimizations are achieved by using the smaller Phi-4-mini model, which has been optimized at INT4 quantization to minimize memory usage.
+The demo uses ONNX Runtime, which Arm has integrated with KleidiAI. Further optimizations are achieved by using the smaller Phi-4-mini model, which has been optimized at INT4 quantization to minimize memory usage.

-Chat with the chatbot LLM below to see the performance for yourself, and then follow the Learning Path to build your own Generative AI service on Arm Neoverse.
+Chat with the LLM below to see the performance for yourself, and then follow the Learning Path to build your own Generative AI service on Arm Neoverse.


 demo_steps:
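The revised overview describes running an INT4-quantized Phi-4-mini model through ONNX Runtime on a Neoverse N2 CPU. As an illustration only, here is a minimal sketch of what that looks like with the onnxruntime-genai Python bindings; the model folder, prompt template, and generation settings are assumptions, and exact API names can differ between onnxruntime-genai releases. The KleidiAI acceleration mentioned above is applied inside the runtime, so no application-level changes are needed to benefit from it.

```python
# Minimal sketch: stream a reply from an INT4 Phi-4-mini ONNX model on a CPU
# using onnxruntime-genai. Paths, prompt template, and settings are placeholders.
import onnxruntime_genai as og

model = og.Model("./phi4-mini-int4-cpu")    # folder with the INT4 ONNX export (assumed)
tokenizer = og.Tokenizer(model)
stream = tokenizer.create_stream()

params = og.GeneratorParams(model)
params.set_search_options(max_length=512)   # keep responses short for a quick test

# Phi-style chat template, assumed for Phi-4-mini.
prompt = "<|user|>What is Arm Neoverse N2?<|end|><|assistant|>"

generator = og.Generator(model, params)
generator.append_tokens(tokenizer.encode(prompt))

# Generate one token at a time and print it as soon as it is decoded.
while not generator.is_done():
    generator.generate_next_token()
    print(stream.decode(generator.get_next_tokens()[0]), end="", flush=True)
print()
```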
@@ -7,7 +7,7 @@ who_is_this_for: This is an advanced topic for developers, ML engineers, and clo

 learning_objectives:
 - Quantize and run the Phi-4-mini model with ONNX Runtime on Azure.
-- Analyze performance on Arm Neoverse-N2 based Azure Cobalt 100 VMs.
+- Analyze performance on Arm Neoverse N2 based Azure Cobalt 100 VMs.

 prerequisites:
 - An [Arm-based instance](/learning-paths/servers-and-cloud-computing/csp/) from an appropriate cloud service provider. This Learning Path has been tested on an Azure Cobalt 100 virtual machine.
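The first learning objective covers quantizing Phi-4-mini before running it. As a hedged sketch only: the onnxruntime-genai model builder can produce the INT4 CPU model used in the snippet above. The module path, flags, and Hugging Face model ID below are assumptions based on recent onnxruntime-genai releases and may differ in your installed version.

```python
# Sketch: export Phi-4-mini to an INT4 ONNX model for CPU with the
# onnxruntime-genai model builder. Flags and model ID are assumptions.
import subprocess
import sys

subprocess.run(
    [
        sys.executable, "-m", "onnxruntime_genai.models.builder",
        "-m", "microsoft/Phi-4-mini-instruct",   # Hugging Face model ID (assumed)
        "-o", "./phi4-mini-int4-cpu",            # output folder loaded by og.Model(...)
        "-p", "int4",                            # INT4 weight quantization
        "-e", "cpu",                             # target the CPU execution provider
    ],
    check=True,
)
```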