@@ -3,11 +3,11 @@ title: Deploy a RAG-based Chatbot with llama-cpp-python using KleidiAI on Arm Se

minutes_to_complete: 45

-who_is_this_for: This Learning Path is for software developers, ML engineers, and those looking to deploy production-ready LLM chatbots with RAG capabilities, knowledge base integration, and performance optimization for Arm Architecture.
+who_is_this_for: This Learning Path is for software developers, ML engineers, and those looking to deploy production-ready LLM chatbots with Retrieval Augmented Generation (RAG) capabilities, knowledge base integration, and performance optimization for Arm Architecture.

learning_objectives:
- Set up llama-cpp-python optimized for Arm servers.
-- Implement RAG architecture using the FAISS vector database.
+- Implement RAG architecture using the Facebook AI Similarity Search (FAISS) vector database.
- Optimize model performance through 4-bit quantization.
- Build a web interface for document upload and chat.
- Monitor and analyze inference performance metrics.
@@ -13,9 +13,14 @@ Open the web application in your browser using either the local URL or the exter
http://localhost:8501 or http://75.101.253.177:8501
```

+{{% notice Note %}}
+
+To access the links you may need to allow inbound TCP traffic in your instance's security rules. Always review these permissions with caution as they may introduce security vulnerabilities.
+
+{{% /notice %}}
## Upload a PDF File and Create a New Index

-Now you can upload a PDF file in the web browser by selecting the **Create New Store** option.
+Now you can upload a PDF file in the web browser by selecting the **Create New Store** option.

Follow these steps to create a new index:

@@ -14,7 +14,7 @@ This learning path demonstrates how to build and deploy a Retrieval Augmented Ge

## Overview

-In this Learning Path, you learn how to build a Retrieval Augmented Generation (RAG) chatbot using llama-cpp-python, a Python binding for llama.cpp that enables efficient LLM inference on Arm CPUs.
+In this Learning Path, you learn how to build a RAG chatbot using llama-cpp-python, a Python binding for llama.cpp that enables efficient LLM inference on Arm CPUs.

The tutorial demonstrates how to integrate the FAISS vector database with the Llama-3.1-8B model for document retrieval, while leveraging llama-cpp-python's optimized C++ backend for high-performance inference.
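
Reviewer's illustration, not part of the changed file: the retrieve-then-generate flow described in the paragraph above can be sketched roughly as follows. This is a minimal sketch using only the Python standard library. The toy bag-of-words `embed`, the brute-force `retrieve` (standing in for a FAISS index lookup), the `build_prompt` helper, and the sample chunks are all hypothetical names and data chosen for illustration; in the Learning Path, a real embedding model and FAISS would do this work, and the assembled prompt would be passed to the Llama model through llama-cpp-python.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words vector (assumption: stands in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, k=1):
    """Brute-force nearest-neighbour search (assumption: stands in for a FAISS lookup)."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(query, chunks, k=1):
    """Prepend the retrieved chunks to the question before calling the LLM."""
    context = "\n".join(retrieve(query, chunks, k))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


# Hypothetical document chunks, as might be extracted from an uploaded PDF.
chunks = [
    "KleidiAI provides optimized kernels for Arm CPUs.",
    "FAISS stores document embeddings for similarity search.",
    "The web interface listens on port 8501.",
]
query = "Which component stores embeddings for search?"
prompt = build_prompt(query, chunks)
```

The point of the sketch is the ordering: the retrieval step selects context before generation, so the model answers from the uploaded documents rather than from its weights alone.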
