17 changes: 11 additions & 6 deletions content/learning-paths/servers-and-cloud-computing/rag/_demo.md
Original file line number Diff line number Diff line change
@@ -2,13 +2,17 @@
title: Run a llama.cpp chatbot powered by Arm Kleidi technology

overview: |
Some description of this sucker.
This Arm learning path shows how to use a single c4a-standard-64 Google Axion instance, powered by an Arm Neoverse CPU, to build a simple "Token as a Service" RAG-enabled server, which powers the chatbot below and serves a small number of concurrent users.

This architecture would be suitable for businesses looking to deploy the latest Generative AI technologies with RAG capabilities using their existing CPU compute capacity and deployment pipelines. It enables semantic search over chunked documents using a FAISS vector store. The demo uses the open source llama.cpp framework, which Arm has enhanced by contributing the latest Arm Kleidi technologies. Further optimizations are achieved by using the smaller 8-billion-parameter Llama 3.1 model, which has been quantized to reduce memory usage.

Chat with the Llama-3.1-8B RAG-enabled LLM below to see the performance for yourself, then follow the learning path to build your own Generative AI service on Arm Neoverse.


demo_steps:
- Type & send a message to the chatbot.
- Receive the chatbot's reply.
- View stats showing how well AWS Graviton runs LLMs.
- Receive the chatbot's reply, including references from RAG data.
- View stats showing how well Google Axion runs LLMs.

diagram: config-diagram-dark.png
diagram_blowup: config-diagram.png
@@ -18,9 +22,10 @@ terms_and_conditions: demo-terms-and-conditions.txt
prismjs: true # enable prismjs rendering of code snippets

example_user_prompts:
- Do Hyperscan and Snort3 work on Graviton4?
- How can I easily build multi-architecture Docker images?

- How can I build multi-architecture Docker images?
- How do I test Java performance on Google Axion instances?


rag_data_cutoff_date: 2025/01/17

title_chatbot_area: Arm RAG Demo
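The demo steps in the front matter above (send a message, receive a reply with RAG references, view stats) map onto a single chat-completion request. A minimal sketch, assuming the server exposes llama.cpp's OpenAI-compatible `/v1/chat/completions` endpoint; the base URL and model id below are illustrative placeholders, not values from this PR:

```javascript
// Build the request body for a single-turn chat message.
// The model id is a placeholder for whatever the server loads.
function buildChatRequest(userMessage) {
  return {
    model: 'llama-3.1-8b',
    messages: [{ role: 'user', content: userMessage }],
    stream: false,
  };
}

// POST the request to the chatbot server (baseUrl is a placeholder)
// and return the reply text from the first choice.
async function sendChat(baseUrl, userMessage) {
  const res = await fetch(`${baseUrl}/v1/chat/completions`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(buildChatRequest(userMessage)),
  });
  if (!res.ok) throw new Error(`HTTP ${res.status}`);
  const data = await res.json();
  return data.choices[0].message.content;
}
```

Setting `stream: true` instead would deliver the reply token by token, which is how chat UIs usually render partial responses.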
@@ -1,6 +1,6 @@
---
next_step_guidance: >
Thank you for completing this Learning path on how to run a LLM chatbot on an Arm-based server. You might be interested in learning how to run a NLP sentiment analysis model on an Arm-based server.
Thank you for completing this Learning Path on how to run a RAG-enabled LLM chatbot on an Arm-based server. You might be interested in learning how to run an NLP sentiment analysis model on an Arm-based server.

recommended_path: "/learning-paths/servers-and-cloud-computing/nlp-hugging-face/"

@@ -17,10 +17,6 @@ further_reading:
title: Democratizing Generative AI with CPU-based inference
link: https://blogs.oracle.com/ai-and-datascience/post/democratizing-generative-ai-with-cpu-based-inference
type: blog
- resource:
title: Llama-2-7B-Chat-GGUF
link: https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF
type: website


# ================================================================================
45 changes: 0 additions & 45 deletions content/learning-paths/servers-and-cloud-computing/rag/_review.md

This file was deleted.

@@ -22,7 +22,10 @@
<div class="c-row u-gap-1/2 u-flex-nowrap u-padding-top-0">
<div class="c-col">
<h2>RAG Vector Store Details</h2>
<p>This app uses all data on this site, <a href="https://www.learn.arm.com">learn.arm.com</a>, as the RAG data set. The Markdown formatted content across Learning Paths and Install Guides was segmented into labeled chunks, and vector embeddings were generated. FAISS is used for the embedded similarity search. The LLM demo below references this vector store for your query.</p>
<p>This application uses all data on <a href="https://www.learn.arm.com">learn.arm.com</a>
as the RAG dataset. The content across Learning Paths and Install Guides is segmented into labeled chunks,
and vector embeddings are generated.
This LLM demo references the FAISS vector store to answer your query.</p>
<p><b>Note:</b> Data was sourced on {{.Params.rag_data_cutoff_date}}.</p>
</div>
</div>
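The vector-store description above (segment the Markdown into labeled chunks, embed each chunk, run a similarity search per query) can be sketched without FAISS. A toy example using cosine similarity over pre-computed embeddings; the chunk labels and 3-dimensional vectors are made up for illustration, where real embeddings have hundreds of dimensions:

```javascript
// Cosine similarity between two equal-length vectors.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Rank chunks by similarity to the query embedding and keep the top k.
function topChunks(queryEmbedding, chunks, k = 2) {
  return chunks
    .map((c) => ({ ...c, score: cosine(queryEmbedding, c.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Toy data: labeled chunks with tiny illustrative embeddings.
const chunks = [
  { label: 'docker-multiarch', embedding: [0.9, 0.1, 0.0] },
  { label: 'java-on-axion',    embedding: [0.1, 0.9, 0.1] },
  { label: 'kleidi-overview',  embedding: [0.0, 0.2, 0.9] },
];
```

In the real demo, the retrieved chunks are prepended to the LLM prompt, which is what lets the chatbot cite references from the RAG data. FAISS does the same ranking, but with approximate-nearest-neighbor indexes that scale to millions of vectors.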
@@ -232,6 +232,38 @@

const renderer = new marked.Renderer();

renderer.link = (link) => {
  // Extract the link parts
  const href = link.href;
  const text = link.text;
  const title = link.title;

  // Escape href to prevent XSS attacks
  const escapedHref = href
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');

  // Escape title if it exists
  const escapedTitle = title
    ? title
        .replace(/&/g, '&amp;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;')
        .replace(/"/g, '&quot;')
        .replace(/'/g, '&#39;')
    : '';

  // Create the link element with target="_blank".
  // The link text is left unescaped: marked has already rendered it as inline HTML.
  return `
    <a href="${escapedHref}"${escapedTitle ? ` title="${escapedTitle}"` : ''} target="_blank" rel="noopener noreferrer">
      ${text}
    </a>
  `.replace(/\n\s+/g, ''); // Remove unnecessary newlines and spaces
};
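The attribute-escaping chain used above for `href` and `title` can be exercised on its own. A standalone sketch; the helper name `escapeAttr` is illustrative and not part of the PR's code:

```javascript
// Same five replacements the renderer applies, in the same order:
// '&' must go first so the '&' produced by later entities
// (e.g. '&lt;') is not double-escaped into '&amp;lt;'.
function escapeAttr(value) {
  return value
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}
```

Running the input through this helper before interpolating it into an attribute prevents a crafted URL like `" onmouseover="...` from breaking out of the quoted attribute.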

// Customize the code block rendering
renderer.code = (code) => {
  const language = code.lang;