Merged
61 changes: 61 additions & 0 deletions content/learning-paths/servers-and-cloud-computing/onnx/_demo.md
@@ -0,0 +1,61 @@
---
title: Run a Phi-4-mini chatbot powered by ONNX Runtime

overview: |
  This Learning Path shows you how to use a 32-core Azure Dpls_v6 instance, powered by Arm Neoverse N2 CPUs, to build a simple chatbot server that can serve a small number of concurrent users.

  This architecture suits businesses that want to deploy the latest Generative AI technologies, including RAG capabilities, on their existing CPU compute capacity and deployment pipelines.

  The demo uses ONNX Runtime, which Arm has enhanced with its Kleidi technologies. Further optimizations come from the smaller Phi-4-mini model, quantized to INT4 to minimize memory usage.

  Chat with the chatbot LLM below to see the performance for yourself, then follow the Learning Path to build your own Generative AI service on Arm Neoverse.


demo_steps:
- Type and send a message to the chatbot.
- Receive the chatbot's reply.
- View performance statistics demonstrating how well Azure Cobalt 100 instances run LLMs.

diagram: config-diagram-dark.png
diagram_blowup: config-diagram.png

terms_and_conditions: demo-terms-and-conditions.txt

prismjs: true # enable prismjs rendering of code snippets

example_user_prompts:
- Prompt 1?
- Prompt 2?


rag_data_cutoff_date: 2025/01/17

title_chatbot_area: Phi-4-mini Chatbot Demo




### Details specific to this demo
# ================================================================================
tps_max: 30 # sets stat visuals for tps
tps_ranges:
- name: Low
context: Around the average human reading rate of 3-5 words per second.
color: var(--arm-green)
min: 0
max: 5
- name: High
context: This is well above the average human reading rate of 3-5 words per second, delivering a stable, usable chatbot experience from the Phi-4-mini LLM using ONNX Runtime.
color: var(--arm-green)
min: 5
max: 1000

### FIXED, DO NOT MODIFY
# ================================================================================
demo_template_name: phi_onnx_chatbot_demo # allows the 'demo.html' partial to route to the correct Configuration and Demo/Stats sub partials for page render.
weight: 2 # _index.md always has weight of 1 to order correctly
layout: "learningpathall" # All files under learning paths have this same wrapper
learning_path_main_page: "yes" # This should be surfaced when looking for related content. Only set for _index.md of learning path content.
---
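The `tps_max` and `tps_ranges` values above drive the performance-stat visuals. As a rough illustration of how a partial could consume them (a hypothetical sketch, not the actual `demo-stats--llm-chatbot.html` partial; the class names are invented), a Hugo template might render one band per range:

```go-html-template
{{/* Hypothetical sketch: render the tokens-per-second bands defined in front matter */}}
{{ range .Params.tps_ranges }}
  <div class="tps-band" style="background-color: {{ .color | safeCSS }}">
    <span class="tps-band-name">{{ .name }} ({{ .min }}-{{ .max }} tokens/s)</span>
    <small class="tps-band-context">{{ .context }}</small>
  </div>
{{ end }}
```

`safeCSS` is needed because the front matter stores CSS custom-property values such as `var(--arm-green)`, which Hugo would otherwise escape.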
@@ -1,6 +1,6 @@
---
title: Interact with the Phi-4-mini Chatbot
-weight: 4
+weight: 5

layout: learningpathall
---
@@ -1,6 +1,6 @@
---
title: Run the Chatbot Server
-weight: 3
+weight: 4

layout: learningpathall
---
@@ -2,7 +2,7 @@
# User change
title: "Build ONNX Runtime and set up the Phi-4-mini Model"

-weight: 2
+weight: 3

# Do not modify these elements
layout: "learningpathall"
@@ -440,6 +440,7 @@
all_messages_div.removeChild(all_messages_div.firstChild);
}
{{ else if eq .Params.demo_template_name "llm_chatbot_first_demo" }}
{{ else if eq .Params.demo_template_name "phi_onnx_chatbot_demo" }}
{{ else }}
{{ end }}

@@ -629,6 +630,9 @@
{{ else if eq .Params.demo_template_name "llm_chatbot_first_demo" }}
{{ $server_location = getenv "HUGO_LLM_API" | base64Encode }}
console.log('Using LLM API.');
{{ else if eq .Params.demo_template_name "phi_onnx_chatbot_demo" }}
{{ $server_location = getenv "HUGO_PHI_ONNX_LLM_API" | base64Encode }}
console.log('Using HUGO_PHI_ONNX_LLM_API.');
{{ else }}
console.log('No server location provided.');
{{ end }}
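The new branch reads the chatbot endpoint from the `HUGO_PHI_ONNX_LLM_API` environment variable and base64-encodes it at build time via `base64Encode`. Assuming a hypothetical endpoint URL, you can reproduce the encoded value from the shell before running Hugo:

```shell
# Hypothetical endpoint URL; substitute the address of your Phi-4-mini ONNX server
export HUGO_PHI_ONNX_LLM_API="https://example.com/api"

# Same transformation Hugo's base64Encode applies before embedding the value in the page
printf '%s' "$HUGO_PHI_ONNX_LLM_API" | base64
# aHR0cHM6Ly9leGFtcGxlLmNvbS9hcGk=
```

The variable must be set in the environment of the Hugo build itself, since `getenv` is evaluated at template-render time, not in the browser.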
@@ -24,6 +24,9 @@
{{else if eq .Params.demo_template_name "whisper_audio_demo"}}
{{/* {{partial "demo-components/config-params-only.html" .}} */}}

{{else if eq .Params.demo_template_name "phi_onnx_chatbot_demo"}}
{{/* {{partial "demo-components/config-params-only.html" .}} */}}

{{else if eq .Params.demo_template_name "kubectl_demo"}}
{{partial "demo-components/config-param-and-file.html" .}}

@@ -42,6 +45,10 @@
{{partial "demo-components/llm-voice-transcriber/demo-stats--llm-voice-transcriber.html" .}}
{{partial "demo-components/llm-voice-transcriber/javascript--llm-voice-transcriber.html" .}}

{{else if eq .Params.demo_template_name "phi_onnx_chatbot_demo"}}
{{partial "demo-components/llm-chatbot/demo-stats--llm-chatbot.html" .}}
{{partial "demo-components/llm-chatbot/javascript--llm-chatbot.html" .}}

{{else if eq .Params.demo_template_name "kubectl_demo"}}
{{partial "demo-components/demo--kubectl.html" .}}

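With the routing above in place, any page can opt into this demo by declaring the matching template name in its front matter; `demo.html` then selects the chatbot config, stats, and JavaScript partials. A minimal sketch (field values other than `demo_template_name` follow the conventions shown earlier in this PR):

```yaml
# Minimal front matter for a page routed to the Phi-4 ONNX chatbot partials
demo_template_name: phi_onnx_chatbot_demo
layout: "learningpathall"
prismjs: true
```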