From 699ea6c50356f10d04c5bca740d83a187e661f0e Mon Sep 17 00:00:00 2001 From: Madeline Underwood <167196745+madeline-underwood@users.noreply.github.com> Date: Tue, 15 Oct 2024 16:16:20 +0000 Subject: [PATCH 01/14] Editorial first-pass --- .../servers-and-cloud-computing/milvus-rag/_index.md | 12 ++++-------- .../milvus-rag/offline_data_loading.md | 6 +++--- .../milvus-rag/prerequisite.md | 6 +++--- 3 files changed, 10 insertions(+), 14 deletions(-) diff --git a/content/learning-paths/servers-and-cloud-computing/milvus-rag/_index.md b/content/learning-paths/servers-and-cloud-computing/milvus-rag/_index.md index 0bef6ba40f..bd2c5249ce 100644 --- a/content/learning-paths/servers-and-cloud-computing/milvus-rag/_index.md +++ b/content/learning-paths/servers-and-cloud-computing/milvus-rag/_index.md @@ -1,21 +1,17 @@ --- title: Build a Retrieval-Augmented Generation (RAG) application using Zilliz Cloud on Arm servers -draft: true -cascade: - draft: true - minutes_to_complete: 20 who_is_this_for: This is an introductory topic for software developers who want to create a RAG application on Arm servers. learning_objectives: - - Create a simple RAG application using Zilliz Cloud - - Launch a LLM service on Arm servers + - Create a simple RAG application using Zilliz Cloud. + - Launch a LLM service on Arm servers. prerequisites: - - Basic understanding of a RAG pipeline. - - An AWS Graviton3 c7g.2xlarge instance, or any [Arm based instance](/learning-paths/servers-and-cloud-computing/csp) from a cloud service provider or an on-premise Arm server. + - A basic understanding of a RAG pipeline. + - An AWS Graviton3 c7g.2xlarge instance, or any [Arm-based instance](/learning-paths/servers-and-cloud-computing/csp) from a cloud service provider or an on-premise Arm server. - A [Zilliz account](https://zilliz.com/cloud), which you can sign up for with a free trial. 
author_primary: Chen Zhang diff --git a/content/learning-paths/servers-and-cloud-computing/milvus-rag/offline_data_loading.md b/content/learning-paths/servers-and-cloud-computing/milvus-rag/offline_data_loading.md index d69bd1ffad..ddcb7d68b1 100644 --- a/content/learning-paths/servers-and-cloud-computing/milvus-rag/offline_data_loading.md +++ b/content/learning-paths/servers-and-cloud-computing/milvus-rag/offline_data_loading.md @@ -10,7 +10,7 @@ In this section, you will learn how to setup a cluster on Zilliz Cloud. You will ### Create a dedicated cluster -You will need to [register](https://docs.zilliz.com/docs/register-with-zilliz-cloud) for a free account on Zilliz Cloud. +Begin by [registering](https://docs.zilliz.com/docs/register-with-zilliz-cloud) for a free account on Zilliz Cloud. After you register, [create a cluster](https://docs.zilliz.com/docs/create-cluster) on Zilliz Cloud. In this Learning Path, you will create a dedicated cluster deployed in AWS using Arm-based machines to store and retreive the vector data as shown: @@ -21,14 +21,14 @@ When you select the `Create Cluster` Button, you should see the cluster running ![running](running_cluster.png) {{% notice Note %}} -You can use self-hosted Milvus as an alternative to Zilliz Cloud. This option is more complicated to set up. We can also deploy [Milvus Standalone](https://milvus.io/docs/install_standalone-docker-compose.md) and [Kubernetes](https://milvus.io/docs/install_cluster-milvusoperator.md) on Arm-based machines. For more information about Milvus installation, please refer to the [installation documentation](https://milvus.io/docs/install-overview.md). +You can use self-hosted Milvus as an alternative to Zilliz Cloud. This option is more complicated to set up. You can also deploy [Milvus Standalone](https://milvus.io/docs/install_standalone-docker-compose.md) and [Kubernetes](https://milvus.io/docs/install_cluster-milvusoperator.md) on Arm-based machines. 
For more information about Milvus installation, please refer to the [installation documentation](https://milvus.io/docs/install-overview.md). {{% /notice %}} ### Create the Collection With the dedicated cluster running in Zilliz Cloud, you are now ready to create a collection in your cluster. -Within your activated python `venv`, start by creating a file named `zilliz-llm-rag.py` and copy the contents below into it: +Within your activated python virtual environment `venv`, start by creating a file named `zilliz-llm-rag.py`, and copy the contents below into it: ```python from pymilvus import MilvusClient diff --git a/content/learning-paths/servers-and-cloud-computing/milvus-rag/prerequisite.md b/content/learning-paths/servers-and-cloud-computing/milvus-rag/prerequisite.md index 9d336b341d..ff467b0a67 100644 --- a/content/learning-paths/servers-and-cloud-computing/milvus-rag/prerequisite.md +++ b/content/learning-paths/servers-and-cloud-computing/milvus-rag/prerequisite.md @@ -8,12 +8,12 @@ layout: learningpathall ## Overview -In this Learning Path, you will learn how to build a Retrieval-Augmented Generation (RAG) application on Arm-based servers. RAG applications often use vector databases to efficiently store and retrieve high-dimensional vector representations of text data. Vector databases are optimized for similarity search and can handle large volumes of vector data, making them ideal for the retrieval component of RAG systems. In this example, you will utilize [Zilliz Cloud](https://zilliz.com/cloud), the fully-managed Milvus vector database as your vector storage. Zilliz Cloud is available on major cloud such as AWS, GCP and Azure. In this demo you will use Zilliz Cloud deployed on AWS with Arm based servers. For the LLM, you will use the `Llama-3.1-8B` model running on an AWS Arm-based server using `llama.cpp`. +In this Learning Path, you will learn how to build a Retrieval-Augmented Generation (RAG) application on Arm-based servers. 
RAG applications often use vector databases to efficiently store and retrieve high-dimensional vector representations of text data. Vector databases are optimized for similarity search and can handle large volumes of vector data, making them ideal for the retrieval component of RAG systems. In this example, you will utilize [Zilliz Cloud](https://zilliz.com/cloud), the fully managed Milvus vector database as your vector storage. Zilliz Cloud is available on major cloud computing service providers, for example AWS, GCP, and Azure. In this demo you will use Zilliz Cloud that is deployed on AWS with Arm-based servers. For the LLM, you will use the `Llama-3.1-8B` model running on an AWS Arm-based server using `llama.cpp`. ## Install dependencies -This Learning Path has been tested on an AWS Graviton3 `c7g.2xlarge` instance running Ubuntu 22.04 LTS system. -You need at least four cores and 8GB of RAM to run this example. Configure disk storage up to at least 32 GB. +This Learning Path has been tested on an AWS Graviton3 `c7g.2xlarge` instance running a Ubuntu 22.04 LTS system. +You need at least four cores and 8GB of RAM to run this example. Configure the disk storage up to at least 32 GB. After you launch the instance, connect to it and run the following commands to prepare the environment. 
From 5a0410a8b2ed51de28995eaad94d6e060a4f54bb Mon Sep 17 00:00:00 2001 From: Madeline Underwood <167196745+madeline-underwood@users.noreply.github.com> Date: Wed, 16 Oct 2024 12:35:15 +0000 Subject: [PATCH 02/14] Editorial of review questions, --- .../milvus-rag/_review.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/content/learning-paths/servers-and-cloud-computing/milvus-rag/_review.md b/content/learning-paths/servers-and-cloud-computing/milvus-rag/_review.md index 7722e4c24c..fc44dbd984 100644 --- a/content/learning-paths/servers-and-cloud-computing/milvus-rag/_review.md +++ b/content/learning-paths/servers-and-cloud-computing/milvus-rag/_review.md @@ -12,23 +12,23 @@ review: - questions: question: > - Can Llama3.1 model run on Arm? + Can Meta Llama 3.1 run on Arm? answers: - "Yes" - "No" correct_answer: 1 explanation: > - The Llama-3.1-8B model from Meta can be used on Arm-based servers with llama.cpp. + You can use the Llama 3.1-8B model from Meta on Arm-based servers with llama.cpp. - questions: question: > - Which of the following is true about about Zilliz Cloud? + Which of the following is true about Zilliz Cloud? answers: - - "It is a fully-managed version of Milvus vector database" - - "It is a self-hosted version of Milvus vector database" + - "It is a fully managed version of Milvus vector database." + - "It is a self-hosted version of Milvus vector database." correct_answer: 1 explanation: > - Zilliz Cloud is a fully-managed version of Milvus. + Zilliz Cloud is a fully managed version of Milvus. From 4ddbbd4dcf753e0081cc413a8d2706b2ba71e852 Mon Sep 17 00:00:00 2001 From: Madeline Underwood <167196745+madeline-underwood@users.noreply.github.com> Date: Wed, 16 Oct 2024 12:50:47 +0000 Subject: [PATCH 03/14] Removed expanded form of RAG so title fits screen. 
--- .../servers-and-cloud-computing/milvus-rag/_index.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/content/learning-paths/servers-and-cloud-computing/milvus-rag/_index.md b/content/learning-paths/servers-and-cloud-computing/milvus-rag/_index.md index bd2c5249ce..3d2442c834 100644 --- a/content/learning-paths/servers-and-cloud-computing/milvus-rag/_index.md +++ b/content/learning-paths/servers-and-cloud-computing/milvus-rag/_index.md @@ -1,9 +1,9 @@ --- -title: Build a Retrieval-Augmented Generation (RAG) application using Zilliz Cloud on Arm servers +title: Build a RAG application using Zilliz Cloud on Arm servers minutes_to_complete: 20 -who_is_this_for: This is an introductory topic for software developers who want to create a RAG application on Arm servers. +who_is_this_for: This is an introductory topic for software developers who want to create a Retrieval-Augmented Generation (RAG) application on Arm servers. learning_objectives: - Create a simple RAG application using Zilliz Cloud. From a35f9f677afbbf0febe9df169e292dff362c8703 Mon Sep 17 00:00:00 2001 From: Madeline Underwood <167196745+madeline-underwood@users.noreply.github.com> Date: Thu, 17 Oct 2024 14:26:54 +0000 Subject: [PATCH 04/14] Editorial clean-up. --- .../servers-and-cloud-computing/milvus-rag/_index.md | 4 ++-- .../milvus-rag/offline_data_loading.md | 2 +- .../milvus-rag/prerequisite.md | 10 +++++++--- 3 files changed, 10 insertions(+), 6 deletions(-) diff --git a/content/learning-paths/servers-and-cloud-computing/milvus-rag/_index.md b/content/learning-paths/servers-and-cloud-computing/milvus-rag/_index.md index 3d2442c834..8cd15700f2 100644 --- a/content/learning-paths/servers-and-cloud-computing/milvus-rag/_index.md +++ b/content/learning-paths/servers-and-cloud-computing/milvus-rag/_index.md @@ -7,11 +7,11 @@ who_is_this_for: This is an introductory topic for software developers who want learning_objectives: - Create a simple RAG application using Zilliz Cloud. 
- - Launch a LLM service on Arm servers. + - Launch an LLM service on Arm servers. prerequisites: - A basic understanding of a RAG pipeline. - - An AWS Graviton3 c7g.2xlarge instance, or any [Arm-based instance](/learning-paths/servers-and-cloud-computing/csp) from a cloud service provider or an on-premise Arm server. + - An AWS Graviton3 C7g.2xlarge instance, or any [Arm-based instance](/learning-paths/servers-and-cloud-computing/csp) from a cloud service provider or an on-premise Arm server. - A [Zilliz account](https://zilliz.com/cloud), which you can sign up for with a free trial. author_primary: Chen Zhang diff --git a/content/learning-paths/servers-and-cloud-computing/milvus-rag/offline_data_loading.md b/content/learning-paths/servers-and-cloud-computing/milvus-rag/offline_data_loading.md index ddcb7d68b1..a7f3f1ffcb 100644 --- a/content/learning-paths/servers-and-cloud-computing/milvus-rag/offline_data_loading.md +++ b/content/learning-paths/servers-and-cloud-computing/milvus-rag/offline_data_loading.md @@ -6,7 +6,7 @@ weight: 3 layout: learningpathall --- -In this section, you will learn how to setup a cluster on Zilliz Cloud. You will then learn how to load your private knowledge database into the cluster. +In this section, you will learn how to set up a cluster on Zilliz Cloud. You will then learn how to load your private knowledge database into the cluster. ### Create a dedicated cluster diff --git a/content/learning-paths/servers-and-cloud-computing/milvus-rag/prerequisite.md b/content/learning-paths/servers-and-cloud-computing/milvus-rag/prerequisite.md index ff467b0a67..96d965d80e 100644 --- a/content/learning-paths/servers-and-cloud-computing/milvus-rag/prerequisite.md +++ b/content/learning-paths/servers-and-cloud-computing/milvus-rag/prerequisite.md @@ -8,14 +8,18 @@ layout: learningpathall ## Overview -In this Learning Path, you will learn how to build a Retrieval-Augmented Generation (RAG) application on Arm-based servers. 
RAG applications often use vector databases to efficiently store and retrieve high-dimensional vector representations of text data. Vector databases are optimized for similarity search and can handle large volumes of vector data, making them ideal for the retrieval component of RAG systems. In this example, you will utilize [Zilliz Cloud](https://zilliz.com/cloud), the fully managed Milvus vector database as your vector storage. Zilliz Cloud is available on major cloud computing service providers, for example AWS, GCP, and Azure. In this demo you will use Zilliz Cloud that is deployed on AWS with Arm-based servers. For the LLM, you will use the `Llama-3.1-8B` model running on an AWS Arm-based server using `llama.cpp`. +In this Learning Path, you will learn how to build a Retrieval-Augmented Generation (RAG) application on Arm-based servers. + +RAG applications often use vector databases to efficiently store and retrieve high-dimensional vector representations of text data. Vector databases are optimized for similarity search and can handle large volumes of vector data, making them ideal for the retrieval component of RAG systems. + +In this example, for your vector storage, you will utilize [Zilliz Cloud](https://zilliz.com/cloud), the fully managed Milvus vector database. Zilliz Cloud is available on major cloud computing service providers, for example AWS, GCP, and Azure. In particular, you will use Zilliz Cloud that is deployed on AWS with Arm-based servers. For the LLM, you will use the Llama-3.1-8B model running on an AWS Arm-based server using `llama.cpp`. ## Install dependencies -This Learning Path has been tested on an AWS Graviton3 `c7g.2xlarge` instance running a Ubuntu 22.04 LTS system. +This Learning Path has been tested on an AWS Graviton3 `C7g.2xlarge` instance running a Ubuntu 22.04 LTS system. You need at least four cores and 8GB of RAM to run this example. Configure the disk storage up to at least 32 GB. 
-After you launch the instance, connect to it and run the following commands to prepare the environment. +After you have launched the instance, connect to it and run the following commands to prepare the environment. Install python: From e6729ace3fd62b7c99e3726e7a48dbb537c071b2 Mon Sep 17 00:00:00 2001 From: Madeline Underwood <167196745+madeline-underwood@users.noreply.github.com> Date: Thu, 17 Oct 2024 16:31:01 +0000 Subject: [PATCH 05/14] Editorial review --- .../milvus-rag/offline_data_loading.md | 11 +++++------ .../milvus-rag/online_rag.md | 13 +++++-------- 2 files changed, 10 insertions(+), 14 deletions(-) diff --git a/content/learning-paths/servers-and-cloud-computing/milvus-rag/offline_data_loading.md b/content/learning-paths/servers-and-cloud-computing/milvus-rag/offline_data_loading.md index a7f3f1ffcb..ce48104d7d 100644 --- a/content/learning-paths/servers-and-cloud-computing/milvus-rag/offline_data_loading.md +++ b/content/learning-paths/servers-and-cloud-computing/milvus-rag/offline_data_loading.md @@ -5,10 +5,9 @@ weight: 3 ### FIXED, DO NOT MODIFY layout: learningpathall --- +## Create a dedicated cluster -In this section, you will learn how to set up a cluster on Zilliz Cloud. You will then learn how to load your private knowledge database into the cluster. - -### Create a dedicated cluster +In this section, you will learn how to set up a cluster on Zilliz Cloud. Begin by [registering](https://docs.zilliz.com/docs/register-with-zilliz-cloud) for a free account on Zilliz Cloud. @@ -16,7 +15,7 @@ After you register, [create a cluster](https://docs.zilliz.com/docs/create-clust ![cluster](create_cluster.png) -When you select the `Create Cluster` Button, you should see the cluster running in your Default Project. +When you select the **Create Cluster** Button, you should see the cluster running in your Default Project. 
![running](running_cluster.png) @@ -24,7 +23,7 @@ When you select the `Create Cluster` Button, you should see the cluster running You can use self-hosted Milvus as an alternative to Zilliz Cloud. This option is more complicated to set up. You can also deploy [Milvus Standalone](https://milvus.io/docs/install_standalone-docker-compose.md) and [Kubernetes](https://milvus.io/docs/install_cluster-milvusoperator.md) on Arm-based machines. For more information about Milvus installation, please refer to the [installation documentation](https://milvus.io/docs/install-overview.md). {{% /notice %}} -### Create the Collection +## Create the Collection With the dedicated cluster running in Zilliz Cloud, you are now ready to create a collection in your cluster. @@ -63,7 +62,7 @@ You will use inner product distance as the default metric type. For more informa You can now prepare the data to use in this collection. -### Prepare the data +## Prepare the data In this example, you will use the FAQ pages from the [Milvus Documentation 2.4.x](https://github.com/milvus-io/milvus-docs/releases/download/v2.4.6-preview/milvus_docs_2.4.x_en.zip) as the private knowledge that is loaded in your RAG dataset/collection. diff --git a/content/learning-paths/servers-and-cloud-computing/milvus-rag/online_rag.md b/content/learning-paths/servers-and-cloud-computing/milvus-rag/online_rag.md index ced3778b20..f7b674e9e7 100644 --- a/content/learning-paths/servers-and-cloud-computing/milvus-rag/online_rag.md +++ b/content/learning-paths/servers-and-cloud-computing/milvus-rag/online_rag.md @@ -5,10 +5,7 @@ weight: 5 ### FIXED, DO NOT MODIFY layout: learningpathall --- - -In this section, you will build the online RAG part of your application. - -### Prepare the embedding model +## Prepare the embedding model In your python script, generate a test embedding and print its dimension and first few elements. @@ -31,7 +28,7 @@ Run the script. 
The output should look like: ### Retrieve data for a query -You will specify a frequent question about Milvus and then search for the question in the collection and retrieve the semantic top-3 matches. +Now specify a common question about Milvus, search for the question in the collection, and retrieve the top 3 semantic matches. Append the code shown below to `zilliz-llm-rag.py`: @@ -77,9 +74,9 @@ Run the script again and the output with the top 3 matches will look like: You are now ready to use the LLM and obtain a RAG response. -For the LLM, you will use the OpenAI SDK to request the Llama service you launched in the previous section. You don't need to use any API key because it is running locally on your machine. +For the LLM, you will use the OpenAI SDK to request the Llama service you launched in the previous section. You do not need to use an API key because the service is running locally on your machine. -You will then convert the retrieved documents into a string format. Define system and user prompts for the Language Model. This prompt is assembled with the retrieved documents from Milvus. Finally use the LLM to generate a response based on the prompts. +You will then convert the retrieved documents into a string format. Define system and user prompts for the Language Model. This prompt is assembled with the retrieved documents from Milvus. Finally, use the LLM to generate a response based on the prompts. Append the code below into `zilliz-llm-rag.py`: @@ -117,7 +114,7 @@ print(response.choices[0].message.content) ``` {{% notice Note %}} -Make sure your llama.cpp server from the previous section is running before you proceed +Make sure your llama.cpp server from the previous section is running before you proceed. {{% /notice %}} Run the script one final time with these changes using `python3 zilliz-llm-rag.py`. 
The output should look like: From fffb568059df57a717829f58227c05e3566f3afb Mon Sep 17 00:00:00 2001 From: Madeline Underwood <167196745+madeline-underwood@users.noreply.github.com> Date: Fri, 18 Oct 2024 11:11:50 +0000 Subject: [PATCH 06/14] Editorial tweaks. --- .../milvus-rag/prerequisite.md | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/content/learning-paths/servers-and-cloud-computing/milvus-rag/prerequisite.md b/content/learning-paths/servers-and-cloud-computing/milvus-rag/prerequisite.md index 96d965d80e..e34c9fdaba 100644 --- a/content/learning-paths/servers-and-cloud-computing/milvus-rag/prerequisite.md +++ b/content/learning-paths/servers-and-cloud-computing/milvus-rag/prerequisite.md @@ -1,5 +1,5 @@ --- -title: Install dependencies +title: Overview and Install dependencies weight: 2 ### FIXED, DO NOT MODIFY @@ -12,14 +12,16 @@ In this Learning Path, you will learn how to build a Retrieval-Augmented Generat RAG applications often use vector databases to efficiently store and retrieve high-dimensional vector representations of text data. Vector databases are optimized for similarity search and can handle large volumes of vector data, making them ideal for the retrieval component of RAG systems. -In this example, for your vector storage, you will utilize [Zilliz Cloud](https://zilliz.com/cloud), the fully managed Milvus vector database. Zilliz Cloud is available on major cloud computing service providers, for example AWS, GCP, and Azure. In particular, you will use Zilliz Cloud that is deployed on AWS with Arm-based servers. For the LLM, you will use the Llama-3.1-8B model running on an AWS Arm-based server using `llama.cpp`. +In this Learning Path, you will use [Zilliz Cloud](https://zilliz.com/cloud) for your vector storage, which is a fully managed Milvus vector database. Zilliz Cloud is available on major cloud computing service providers; for example, AWS, GCP, and Azure. 
+ +Specifically, you will use Zilliz Cloud deployed on AWS with Arm-based servers. For the LLM, you will use the Llama-3.1-8B model running on an AWS Arm-based server using `llama.cpp`. ## Install dependencies -This Learning Path has been tested on an AWS Graviton3 `C7g.2xlarge` instance running a Ubuntu 22.04 LTS system. +This Learning Path has been tested on an AWS Graviton3 `C7g.2xlarge` instance running an Ubuntu 22.04 LTS system. You need at least four cores and 8GB of RAM to run this example. Configure the disk storage up to at least 32 GB. -After you have launched the instance, connect to it and run the following commands to prepare the environment. +After you have launched the instance, connect to it, and run the following commands to prepare the environment. Install python: From 5ac781b790f8971cb9d6b9a173edc0525b86af34 Mon Sep 17 00:00:00 2001 From: Madeline Underwood <167196745+madeline-underwood@users.noreply.github.com> Date: Fri, 18 Oct 2024 12:42:51 +0000 Subject: [PATCH 07/14] Fixing spelling --- .../milvus-rag/offline_data_loading.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/content/learning-paths/servers-and-cloud-computing/milvus-rag/offline_data_loading.md b/content/learning-paths/servers-and-cloud-computing/milvus-rag/offline_data_loading.md index ce48104d7d..0a11cbb61d 100644 --- a/content/learning-paths/servers-and-cloud-computing/milvus-rag/offline_data_loading.md +++ b/content/learning-paths/servers-and-cloud-computing/milvus-rag/offline_data_loading.md @@ -11,7 +11,7 @@ In this section, you will learn how to set up a cluster on Zilliz Cloud. Begin by [registering](https://docs.zilliz.com/docs/register-with-zilliz-cloud) for a free account on Zilliz Cloud. -After you register, [create a cluster](https://docs.zilliz.com/docs/create-cluster) on Zilliz Cloud. 
In this Learning Path, you will create a dedicated cluster deployed in AWS using Arm-based machines to store and retreive the vector data as shown: +After you register, [create a cluster](https://docs.zilliz.com/docs/create-cluster) on Zilliz Cloud. In this Learning Path, you will create a dedicated cluster deployed in AWS using Arm-based machines to store and retrieve the vector data as shown: ![cluster](create_cluster.png) From a0de3a7b6f5ebc096637a6d1beb5fc83b0b3d5ef Mon Sep 17 00:00:00 2001 From: Madeline Underwood <167196745+madeline-underwood@users.noreply.github.com> Date: Fri, 18 Oct 2024 12:45:03 +0000 Subject: [PATCH 08/14] Formality fix --- .../milvus-rag/offline_data_loading.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/content/learning-paths/servers-and-cloud-computing/milvus-rag/offline_data_loading.md b/content/learning-paths/servers-and-cloud-computing/milvus-rag/offline_data_loading.md index 0a11cbb61d..2f164dc717 100644 --- a/content/learning-paths/servers-and-cloud-computing/milvus-rag/offline_data_loading.md +++ b/content/learning-paths/servers-and-cloud-computing/milvus-rag/offline_data_loading.md @@ -57,7 +57,7 @@ milvus_client.create_collection( ``` This code checks if a collection already exists and drops it if it does. You then, create a new collection with the specified parameters. -If you don't specify any field information, Milvus will automatically create a default `id` field for primary key, and a `vector` field to store the vector data. A reserved JSON field is used to store non-schema-defined fields and their values. +If you do not specify any field information, Milvus will automatically create a default `id` field for primary key, and a `vector` field to store the vector data. A reserved JSON field is used to store non-schema-defined fields and their values. You will use inner product distance as the default metric type. 
For more information about distance types, you can refer to [Similarity Metrics page](https://milvus.io/docs/metric.md?tab=floating) You can now prepare the data to use in this collection. @@ -73,7 +73,7 @@ wget https://github.com/milvus-io/milvus-docs/releases/download/v2.4.6-preview/m unzip -q milvus_docs_2.4.x_en.zip -d milvus_docs ``` -You will load all the markdown files from the folder `milvus_docs/en/faq` into your data collection. For each document, use "# " to separate the content in the file, which can roughly separate the content of each main part of the markdown file. +You will load all the markdown files from the folder `milvus_docs/en/faq` into your data collection. For each document, use "# " to separate the content in the file, which can separate the content of each main part of the markdown file. Open `zilliz-llm-rag.py` and append the following code to it: From 41c13e47f070ecc2767feec3f89c9bcf81d2cc51 Mon Sep 17 00:00:00 2001 From: Madeline Underwood <167196745+madeline-underwood@users.noreply.github.com> Date: Fri, 18 Oct 2024 12:52:47 +0000 Subject: [PATCH 09/14] Editorial --- .../milvus-rag/offline_data_loading.md | 20 ++++++++++--------- 1 file changed, 11 insertions(+), 9 deletions(-) diff --git a/content/learning-paths/servers-and-cloud-computing/milvus-rag/offline_data_loading.md b/content/learning-paths/servers-and-cloud-computing/milvus-rag/offline_data_loading.md index 2f164dc717..895d1c46bf 100644 --- a/content/learning-paths/servers-and-cloud-computing/milvus-rag/offline_data_loading.md +++ b/content/learning-paths/servers-and-cloud-computing/milvus-rag/offline_data_loading.md @@ -11,7 +11,9 @@ In this section, you will learn how to set up a cluster on Zilliz Cloud. Begin by [registering](https://docs.zilliz.com/docs/register-with-zilliz-cloud) for a free account on Zilliz Cloud. -After you register, [create a cluster](https://docs.zilliz.com/docs/create-cluster) on Zilliz Cloud. 
In this Learning Path, you will create a dedicated cluster deployed in AWS using Arm-based machines to store and retrieve the vector data as shown: +After you register, [create a cluster](https://docs.zilliz.com/docs/create-cluster) on Zilliz Cloud. + +In this Learning Path, you will create a dedicated cluster deployed in AWS using Arm-based machines to store and retrieve the vector data as shown: ![cluster](create_cluster.png) @@ -37,7 +39,7 @@ milvus_client = MilvusClient( ) ``` -Replace and with the `URI` and `Token` for your running cluster. Refer to [Public Endpoint and Api key](https://docs.zilliz.com/docs/on-zilliz-cloud-console#free-cluster-details) in Zilliz Cloud for more details. +Replace ** and ** with the `URI` and `Token` for your running cluster. Refer to [Public Endpoint and Api key](https://docs.zilliz.com/docs/on-zilliz-cloud-console#free-cluster-details) in Zilliz Cloud for further information. Now, append the following code to `zilliz-llm-rag.py` and save the contents: @@ -55,16 +57,16 @@ milvus_client.create_collection( consistency_level="Strong", # Strong consistency level ) ``` -This code checks if a collection already exists and drops it if it does. You then, create a new collection with the specified parameters. +This code checks if a collection already exists and drops it if it does. It then creates a new collection with the specified parameters. -If you do not specify any field information, Milvus will automatically create a default `id` field for primary key, and a `vector` field to store the vector data. A reserved JSON field is used to store non-schema-defined fields and their values. -You will use inner product distance as the default metric type. 
For more information about distance types, you can refer to [Similarity Metrics page](https://milvus.io/docs/metric.md?tab=floating) +If you do not specify any field information, Milvus automatically creates a default `id` field for the primary key, and a `vector` field to store the vector data. A reserved JSON field is used to store non-schema-defined fields and their values. +This example uses inner product distance as the metric type. For more information about distance types, refer to the [Similarity Metrics page](https://milvus.io/docs/metric.md?tab=floating). You can now prepare the data to use in this collection. ## Prepare the data -In this example, you will use the FAQ pages from the [Milvus Documentation 2.4.x](https://github.com/milvus-io/milvus-docs/releases/download/v2.4.6-preview/milvus_docs_2.4.x_en.zip) as the private knowledge that is loaded in your RAG dataset/collection. +In this example, you will use the FAQ pages from the [Milvus Documentation 2.4.x](https://github.com/milvus-io/milvus-docs/releases/download/v2.4.6-preview/milvus_docs_2.4.x_en.zip) as the private knowledge that is loaded into your RAG dataset. Download the zip file and extract documents to the folder `milvus_docs`. @@ -73,7 +75,7 @@ wget https://github.com/milvus-io/milvus-docs/releases/download/v2.4.6-preview/m unzip -q milvus_docs_2.4.x_en.zip -d milvus_docs ``` -You will load all the markdown files from the folder `milvus_docs/en/faq` into your data collection. For each document, use "# " to separate the content in the file, which can separate the content of each main part of the markdown file. +Now load all the Markdown files from the folder `milvus_docs/en/faq` into your data collection. For each document, split the content on "# ", which separates the file into its main sections. 
Open `zilliz-llm-rag.py` and append the following code to it: @@ -90,9 +92,9 @@ for file_path in glob("milvus_docs/en/faq/*.md", recursive=True): ``` ### Insert data -You will now prepare a simple but efficient embedding model [all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) that can convert the loaded text into embedding vectors. +Now you can prepare a simple but efficient embedding model [all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) that can convert the loaded text into embedding vectors. -You will iterate through the text lines, create embeddings, and then insert the data into Milvus. +You can iterate through the text lines, create embeddings, and then insert the data into Milvus. Append and save the code shown below into `zilliz-llm-rag.py`: From f1068ff0d42621609c130d9a768e42f64765a52c Mon Sep 17 00:00:00 2001 From: Madeline Underwood <167196745+madeline-underwood@users.noreply.github.com> Date: Fri, 18 Oct 2024 14:03:50 +0000 Subject: [PATCH 10/14] Editorial checks --- .../milvus-rag/launch_llm_service.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/content/learning-paths/servers-and-cloud-computing/milvus-rag/launch_llm_service.md b/content/learning-paths/servers-and-cloud-computing/milvus-rag/launch_llm_service.md index 583d1fba35..5ddae4486c 100644 --- a/content/learning-paths/servers-and-cloud-computing/milvus-rag/launch_llm_service.md +++ b/content/learning-paths/servers-and-cloud-computing/milvus-rag/launch_llm_service.md @@ -6,13 +6,13 @@ weight: 4 layout: learningpathall --- -In this section, you will build and run the `llama.cpp` server program using an OpenAI-compatible API on your running AWS Arm-based server instance. +### Llama 3.1 model and llama.cpp -### Llama 3.1 model & llama.cpp +In this section, you will build and run the `llama.cpp` server program using an OpenAI-compatible API on your AWS Arm-based server instance. 
The [Llama-3.1-8B model](https://huggingface.co/cognitivecomputations/dolphin-2.9.4-llama3.1-8b-gguf) from Meta belongs to the Llama 3.1 model family and is free to use for research and commercial purposes. Before you use the model, visit the Llama [website](https://llama.meta.com/llama-downloads/) and fill in the form to request access. -[llama.cpp](https://github.com/ggerganov/llama.cpp) is an open source C/C++ project that enables efficient LLM inference on a variety of hardware - both locally, and in the cloud. You can conveniently host a Llama 3.1 model using `llama.cpp`. +[llama.cpp](https://github.com/ggerganov/llama.cpp) is an open-source C/C++ project that enables efficient LLM inference on a variety of hardware - both locally, and in the cloud. You can conveniently host a Llama 3.1 model using `llama.cpp`. ### Download and build llama.cpp @@ -33,7 +33,7 @@ Clone the source repository for llama.cpp: git clone https://github.com/ggerganov/llama.cpp ``` -By default, `llama.cpp` builds for CPU only on Linux and Windows. You don't need to provide any extra switches to build it for the Arm CPU that you run it on. +By default, `llama.cpp` builds for CPU only on Linux and Windows. You do not need to provide any extra switches to build it for the Arm CPU that you run it on. 
Run `make` to build it: From 92e81e41bd8649f58b1a79d241091dd9df3fbecc Mon Sep 17 00:00:00 2001 From: Madeline Underwood <167196745+madeline-underwood@users.noreply.github.com> Date: Fri, 18 Oct 2024 14:07:09 +0000 Subject: [PATCH 11/14] deleted repeated word --- .../servers-and-cloud-computing/milvus-rag/online_rag.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/content/learning-paths/servers-and-cloud-computing/milvus-rag/online_rag.md b/content/learning-paths/servers-and-cloud-computing/milvus-rag/online_rag.md index f7b674e9e7..aa6f4b4ffc 100644 --- a/content/learning-paths/servers-and-cloud-computing/milvus-rag/online_rag.md +++ b/content/learning-paths/servers-and-cloud-computing/milvus-rag/online_rag.md @@ -65,7 +65,7 @@ Run the script again and the output with the top 3 matches will look like: 0.5974207520484924 ], [ - "What is the maximum dataset size Milvus can handle?\n\n \nTheoretically, the maximum dataset size Milvus can handle is determined by the hardware it is run on, specifically system memory and storage:\n\n- Milvus loads all specified collections and partitions into memory before running queries. Therefore, memory size determines the maximum amount of data Milvus can query.\n- When new entities and and collection-related schema (currently only MinIO is supported for data persistence) are added to Milvus, system storage determines the maximum allowable size of inserted data.\n\n###", + "What is the maximum dataset size Milvus can handle?\n\n \nTheoretically, the maximum dataset size Milvus can handle is determined by the hardware it is run on, specifically system memory and storage:\n\n- Milvus loads all specified collections and partitions into memory before running queries. 
Therefore, memory size determines the maximum amount of data Milvus can query.\n- When new entities and collection-related schema (currently only MinIO is supported for data persistence) are added to Milvus, system storage determines the maximum allowable size of inserted data.\n\n###", 0.5833579301834106 ] ] From d87580220d36b6d511f724ea8e97644637ef8ada Mon Sep 17 00:00:00 2001 From: Madeline Underwood <167196745+madeline-underwood@users.noreply.github.com> Date: Fri, 18 Oct 2024 14:16:04 +0000 Subject: [PATCH 12/14] Editorial tweaks --- .../milvus-rag/online_rag.md | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/content/learning-paths/servers-and-cloud-computing/milvus-rag/online_rag.md b/content/learning-paths/servers-and-cloud-computing/milvus-rag/online_rag.md index aa6f4b4ffc..c2d4107fbf 100644 --- a/content/learning-paths/servers-and-cloud-computing/milvus-rag/online_rag.md +++ b/content/learning-paths/servers-and-cloud-computing/milvus-rag/online_rag.md @@ -5,11 +5,11 @@ weight: 5 ### FIXED, DO NOT MODIFY layout: learningpathall --- -## Prepare the embedding model +## Prepare the Embedding Model -In your python script, generate a test embedding and print its dimension and first few elements. +In your Python script, generate a test embedding and print its dimension and the first few elements. -For the LLM, you will use the OpenAI SDK to request the Llama service launched before. You don't need to use any API key because it is running locally on your machine. +For the LLM, you will use the OpenAI SDK to request the Llama service that you launched previously. You do not need to use an API key because it is running locally on your machine. Append the code below to `zilliz-llm-rag.py`: @@ -28,7 +28,7 @@ Run the script. The output should look like: ### Retrieve data for a query -Now specify a common question about Milvus, search for the question in the collection, retrieving the semantic top 3 matches. 
+Now specify a common question about Milvus, and search for the question in the collection, in order to retrieve the top 3 semantic matches. Append the code shown below to `zilliz-llm-rag.py`: @@ -52,7 +52,7 @@ retrieved_lines_with_distances = [ ] print(json.dumps(retrieved_lines_with_distances, indent=4)) ``` -Run the script again and the output with the top 3 matches will look like: +Run the script again, and the output with the top 3 matches should look like: ```output [ @@ -70,13 +70,13 @@ Run the script again and the output with the top 3 matches will look like: ] ] ``` -### Use LLM to get a RAG response +### Use the LLM to obtain a RAG response You are now ready to use the LLM and obtain a RAG response. -For the LLM, you will use the OpenAI SDK to request the Llama service you launched in the previous section. You do not need to use a API key because it is running locally on your machine. +For the LLM, you will use the OpenAI SDK to request the Llama service you launched in the previous section. You do not need to use an API key because it is running locally on your machine. -You will then convert the retrieved documents into a string format. Define system and user prompts for the Language Model. This prompt is assembled with the retrieved documents from Milvus. Finally, use the LLM to generate a response based on the prompts. +You will then convert the retrieved documents in to a string format. Define system and user prompts for the Language Model. This prompt is assembled with the retrieved documents from Milvus. Finally, use the LLM to generate a response based on the prompts. 
Append the code below into `zilliz-llm-rag.py`: From 46ea4011d806a6febefd272e31555fdd8605c74d Mon Sep 17 00:00:00 2001 From: Madeline Underwood <167196745+madeline-underwood@users.noreply.github.com> Date: Fri, 18 Oct 2024 15:40:56 +0000 Subject: [PATCH 13/14] Editorial --- .../milvus-rag/launch_llm_service.md | 16 +++++++-------- .../milvus-rag/offline_data_loading.md | 20 +++++++++---------- .../milvus-rag/prerequisite.md | 2 +- 3 files changed, 19 insertions(+), 19 deletions(-) diff --git a/content/learning-paths/servers-and-cloud-computing/milvus-rag/launch_llm_service.md b/content/learning-paths/servers-and-cloud-computing/milvus-rag/launch_llm_service.md index 5ddae4486c..341cfa7d62 100644 --- a/content/learning-paths/servers-and-cloud-computing/milvus-rag/launch_llm_service.md +++ b/content/learning-paths/servers-and-cloud-computing/milvus-rag/launch_llm_service.md @@ -6,18 +6,18 @@ weight: 4 layout: learningpathall --- -### Llama 3.1 model and llama.cpp +### Llama 3.1 Model and Llama.cpp In this section, you will build and run the `llama.cpp` server program using an OpenAI-compatible API on your AWS Arm-based server instance. The [Llama-3.1-8B model](https://huggingface.co/cognitivecomputations/dolphin-2.9.4-llama3.1-8b-gguf) from Meta belongs to the Llama 3.1 model family and is free to use for research and commercial purposes. Before you use the model, visit the Llama [website](https://llama.meta.com/llama-downloads/) and fill in the form to request access. -[llama.cpp](https://github.com/ggerganov/llama.cpp) is an open-source C/C++ project that enables efficient LLM inference on a variety of hardware - both locally, and in the cloud. You can conveniently host a Llama 3.1 model using `llama.cpp`. +[Llama.cpp](https://github.com/ggerganov/llama.cpp) is an open-source C/C++ project that enables efficient LLM inference on a variety of hardware - both locally, and in the cloud. You can conveniently host a Llama 3.1 model using `llama.cpp`. 
-### Download and build llama.cpp +### Download and build Llama.cpp -Run the following commands to install make, cmake, gcc, g++, and other essential tools required for building llama.cpp from source: +Run the following commands to install make, cmake, gcc, g++, and other essential tools required for building Llama.cpp from source: ```bash sudo apt install make cmake -y @@ -27,7 +27,7 @@ sudo apt install build-essential -y You are now ready to start building `llama.cpp`. -Clone the source repository for llama.cpp: +Clone the source repository for Llama.cpp: ```bash git clone https://github.com/ggerganov/llama.cpp @@ -64,7 +64,7 @@ You can now download the model using the huggingface cli: ```bash huggingface-cli download cognitivecomputations/dolphin-2.9.4-llama3.1-8b-gguf dolphin-2.9.4-llama3.1-8b-Q4_0.gguf --local-dir . --local-dir-use-symlinks False ``` -The GGUF model format, introduced by the llama.cpp team, uses compression and quantization to reduce weight precision to 4-bit integers, significantly decreasing computational and memory demands and making Arm CPUs effective for LLM inference. +The GGUF model format, introduced by the Llama.cpp team, uses compression and quantization to reduce weight precision to 4-bit integers, significantly decreasing computational and memory demands and making Arm CPUs effective for LLM inference. ### Re-quantize the model weights @@ -91,10 +91,10 @@ Start the server from the command line, and it listens on port 8080: The output from this command should look like: ```output -'main: server is listening on 127.0.0.1:8080 - starting the main loop +main: server is listening on 127.0.0.1:8080 - starting the main loop ``` -You can also adjust the parameters of the launched LLM to adapt it to your server hardware to obtain ideal performance. For more parameter information, see the `llama-server --help` command. +You can also adjust the parameters of the launched LLM to adapt it to your server hardware to achieve an ideal performance. 
For more parameter information, see the `llama-server --help` command. You have started the LLM service on your AWS Graviton instance with an Arm-based CPU. In the next section, you will directly interact with the service using the OpenAI SDK. diff --git a/content/learning-paths/servers-and-cloud-computing/milvus-rag/offline_data_loading.md b/content/learning-paths/servers-and-cloud-computing/milvus-rag/offline_data_loading.md index 895d1c46bf..d2c255e80a 100644 --- a/content/learning-paths/servers-and-cloud-computing/milvus-rag/offline_data_loading.md +++ b/content/learning-paths/servers-and-cloud-computing/milvus-rag/offline_data_loading.md @@ -7,29 +7,29 @@ layout: learningpathall --- ## Create a dedicated cluster -In this section, you will learn how to set up a cluster on Zilliz Cloud. +In this section, you will set up a cluster on Zilliz Cloud. Begin by [registering](https://docs.zilliz.com/docs/register-with-zilliz-cloud) for a free account on Zilliz Cloud. -After you register, [create a cluster](https://docs.zilliz.com/docs/create-cluster) on Zilliz Cloud. +After you register, [create a cluster](https://docs.zilliz.com/docs/create-cluster). -In this Learning Path, you will create a dedicated cluster deployed in AWS using Arm-based machines to store and retrieve the vector data as shown: +Now create a **Dedicated** cluster deployed in AWS using Arm-based machines to store and retrieve the vector data as shown: ![cluster](create_cluster.png) -When you select the **Create Cluster** Button, you should see the cluster running in your Default Project. +When you select the **Create Cluster** Button, you should see the cluster running in your **Default Project**. ![running](running_cluster.png) {{% notice Note %}} -You can use self-hosted Milvus as an alternative to Zilliz Cloud. This option is more complicated to set up. 
You can also deploy [Milvus Standalone](https://milvus.io/docs/install_standalone-docker-compose.md) and [Kubernetes](https://milvus.io/docs/install_cluster-milvusoperator.md) on Arm-based machines. For more information about Milvus installation, please refer to the [installation documentation](https://milvus.io/docs/install-overview.md). +You can use self-hosted Milvus as an alternative to Zilliz Cloud. This option is more complicated to set up. You can also deploy [Milvus Standalone](https://milvus.io/docs/install_standalone-docker-compose.md) and [Kubernetes](https://milvus.io/docs/install_cluster-milvusoperator.md) on Arm-based machines. For more information about installing Milvus, see the [Milvus installation documentation](https://milvus.io/docs/install-overview.md). {{% /notice %}} ## Create the Collection -With the dedicated cluster running in Zilliz Cloud, you are now ready to create a collection in your cluster. +With the Dedicated cluster running in Zilliz Cloud, you are now ready to create a collection in your cluster. -Within your activated python virtual environment `venv`, start by creating a file named `zilliz-llm-rag.py`, and copy the contents below into it: +Within your activated Python virtual environment `venv`, start by creating a file named `zilliz-llm-rag.py`, and copy the contents below into it: ```python from pymilvus import MilvusClient @@ -59,7 +59,7 @@ milvus_client.create_collection( ``` This code checks if a collection already exists and drops it if it does. If this happens, you can create a new collection with the specified parameters. -If you do not specify any field information, Milvus automatically creates a default `id` field for the primary key, and a `vector` field to store the vector data. A reserved JSON field is used to store non-schema-defined fields and their values. 
+If you do not specify any field information, Milvus automatically creates a default `id` field for the primary key, and a `vector` field to store the vector data. A reserved JSON field is used to store non-schema defined fields and their values. You can use inner product distance as the default metric type. For more information about distance types, you can refer to [Similarity Metrics page](https://milvus.io/docs/metric.md?tab=floating) You can now prepare the data to use in this collection. @@ -116,10 +116,10 @@ for i, (line, embedding) in enumerate( milvus_client.insert(collection_name=collection_name, data=data) ``` -Run the python script, to check that you have successfully created the embeddings on the data you loaded into the RAG collection: +Run the Python script, to check that you have successfully created the embeddings on the data you loaded into the RAG collection: ```bash -python3 python3 zilliz-llm-rag.py +python3 zilliz-llm-rag.py ``` The output should look like: diff --git a/content/learning-paths/servers-and-cloud-computing/milvus-rag/prerequisite.md b/content/learning-paths/servers-and-cloud-computing/milvus-rag/prerequisite.md index e34c9fdaba..1008493283 100644 --- a/content/learning-paths/servers-and-cloud-computing/milvus-rag/prerequisite.md +++ b/content/learning-paths/servers-and-cloud-computing/milvus-rag/prerequisite.md @@ -14,7 +14,7 @@ RAG applications often use vector databases to efficiently store and retrieve hi In this Learning Path, you will use [Zilliz Cloud](https://zilliz.com/cloud) for your vector storage, which is a fully managed Milvus vector database. Zilliz Cloud is available on major cloud computing service providers; for example, AWS, GCP, and Azure. -Specifically, you will use Zilliz Cloud deployed on AWS with Arm-based servers. For the LLM, you will use the Llama-3.1-8B model running on an AWS Arm-based server using `llama.cpp`. +Here, you will use Zilliz Cloud deployed on AWS with an Arm-based server. 
For the LLM, you will use the Llama-3.1-8B model also running on an AWS Arm-based server, but using `llama.cpp`. ## Install dependencies From fccb522394054b9441be0e380548c63eeea5702d Mon Sep 17 00:00:00 2001 From: Madeline Underwood <167196745+madeline-underwood@users.noreply.github.com> Date: Fri, 18 Oct 2024 15:51:01 +0000 Subject: [PATCH 14/14] Editorial --- .../milvus-rag/launch_llm_service.md | 10 +++++----- .../milvus-rag/offline_data_loading.md | 4 ++-- .../milvus-rag/online_rag.md | 2 +- 3 files changed, 8 insertions(+), 8 deletions(-) diff --git a/content/learning-paths/servers-and-cloud-computing/milvus-rag/launch_llm_service.md b/content/learning-paths/servers-and-cloud-computing/milvus-rag/launch_llm_service.md index 341cfa7d62..aa75e24c51 100644 --- a/content/learning-paths/servers-and-cloud-computing/milvus-rag/launch_llm_service.md +++ b/content/learning-paths/servers-and-cloud-computing/milvus-rag/launch_llm_service.md @@ -1,5 +1,5 @@ --- -title: Launch LLM Server +title: Launch the LLM Server weight: 4 ### FIXED, DO NOT MODIFY @@ -67,20 +67,20 @@ huggingface-cli download cognitivecomputations/dolphin-2.9.4-llama3.1-8b-gguf do The GGUF model format, introduced by the Llama.cpp team, uses compression and quantization to reduce weight precision to 4-bit integers, significantly decreasing computational and memory demands and making Arm CPUs effective for LLM inference. -### Re-quantize the model weights +### Requantize the model weights -To re-quantize the model, run: +To requantize the model, run: ```bash ./llama-quantize --allow-requantize dolphin-2.9.4-llama3.1-8b-Q4_0.gguf dolphin-2.9.4-llama3.1-8b-Q4_0_8_8.gguf Q4_0_8_8 ``` -This will output a new file, `dolphin-2.9.4-llama3.1-8b-Q4_0_8_8.gguf`, which contains reconfigured weights that allow `llama-cli` to use SVE 256 and MATMUL_INT8 support. 
+This outputs a new file, `dolphin-2.9.4-llama3.1-8b-Q4_0_8_8.gguf`, which contains reconfigured weights that allow `llama-cli` to use SVE 256 and MATMUL_INT8 support. This requantization is optimal specifically for Graviton3. For Graviton2, the optimal requantization should be performed in the `Q4_0_4_4` format, and for Graviton4, the `Q4_0_4_8` format is the most suitable for requantization. ### Start the LLM Server -You can utilize the `llama.cpp` server program and send requests via an OpenAI-compatible API. This allows you to develop applications that interact with the LLM multiple times without having to repeatedly start and stop it. Additionally, you can access the server from another machine where the LLM is hosted over the network. +You can utilize the `llama.cpp` server program and send requests through an OpenAI-compatible API. This allows you to develop applications that interact with the LLM multiple times without having to repeatedly start and stop it. Additionally, you can access the server from another machine where the LLM is hosted over the network. Start the server from the command line, and it listens on port 8080: diff --git a/content/learning-paths/servers-and-cloud-computing/milvus-rag/offline_data_loading.md b/content/learning-paths/servers-and-cloud-computing/milvus-rag/offline_data_loading.md index d2c255e80a..0299590493 100644 --- a/content/learning-paths/servers-and-cloud-computing/milvus-rag/offline_data_loading.md +++ b/content/learning-paths/servers-and-cloud-computing/milvus-rag/offline_data_loading.md @@ -60,7 +60,7 @@ milvus_client.create_collection( This code checks if a collection already exists and drops it if it does. If this happens, you can create a new collection with the specified parameters. If you do not specify any field information, Milvus automatically creates a default `id` field for the primary key, and a `vector` field to store the vector data. 
A reserved JSON field is used to store non-schema defined fields and their values. -You can use inner product distance as the default metric type. For more information about distance types, you can refer to [Similarity Metrics page](https://milvus.io/docs/metric.md?tab=floating) +You can use inner product distance as the default metric type. For more information about distance types, you can refer to [Similarity Metrics page](https://milvus.io/docs/metric.md?tab=floating). You can now prepare the data to use in this collection. @@ -75,7 +75,7 @@ wget https://github.com/milvus-io/milvus-docs/releases/download/v2.4.6-preview/m unzip -q milvus_docs_2.4.x_en.zip -d milvus_docs ``` -Now load all the markdown files from the folder `milvus_docs/en/faq` into your data collection. For each document, use "# " to separate the content in the file, which can separate the content of each main part of the markdown file. +Now load all the markdown files from the folder `milvus_docs/en/faq` into your data collection. For each document, use "# " to separate the content in the file. This divides the content of each main part of the markdown file. Open `zilliz-llm-rag.py` and append the following code to it: diff --git a/content/learning-paths/servers-and-cloud-computing/milvus-rag/online_rag.md b/content/learning-paths/servers-and-cloud-computing/milvus-rag/online_rag.md index c2d4107fbf..a3622b92c1 100644 --- a/content/learning-paths/servers-and-cloud-computing/milvus-rag/online_rag.md +++ b/content/learning-paths/servers-and-cloud-computing/milvus-rag/online_rag.md @@ -76,7 +76,7 @@ You are now ready to use the LLM and obtain a RAG response. For the LLM, you will use the OpenAI SDK to request the Llama service you launched in the previous section. You do not need to use an API key because it is running locally on your machine. -You will then convert the retrieved documents in to a string format. Define system and user prompts for the Language Model. 
This prompt is assembled with the retrieved documents from Milvus. Finally, use the LLM to generate a response based on the prompts. +You will then convert the retrieved documents into a string format. Define system and user prompts for the Language Model. This prompt is assembled with the retrieved documents from Milvus. Finally, use the LLM to generate a response based on the prompts. Append the code below into `zilliz-llm-rag.py`: