Adding LlamaIndex docs to integrations #4803

Merged · 10 commits · May 13, 2024

Changes from 9 commits
3 changes: 1 addition & 2 deletions docs/_source/index.rst
@@ -66,5 +66,4 @@
 Github <https://github.com/argilla-io/argilla>
 community/developer_docs
 community/contributing
-community/migration-rubrix.md
-
+community/migration-rubrix.md
@@ -30,6 +30,11 @@ Add text descriptives to your metadata to simplify the data annotation and filte

 Add semantic representations to your records using vector embeddings to simplify the data annotation and search process.
 ```
+```{grid-item-card} llama-index: Build LLM applications with LlamaIndex.
+:link: llama_index.html
+
+Build LLM applications with LlamaIndex and automatically log and monitor the predictions with Argilla.
+```
 ````
 
 ```{toctree}
@@ -40,4 +45,5 @@ process_documents_with_unstructured
 monitor_endpoints with_fastapi
 add_text_descriptives_as_metadata
 add_sentence_transformers_embeddings_as_vectors
+llama_index
 ```
219 changes: 219 additions & 0 deletions docs/_source/tutorials_and_integrations/integrations/llama_index.ipynb
@@ -0,0 +1,219 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# `llama-index`: Build LLM applications with LlamaIndex and monitor the data with Argilla\n",
"\n",
"This integration allows the user to include the feedback loop that Argilla offers in the LlamaIndex ecosystem. It is based on a callback handler that runs within the LlamaIndex workflow.\n",
"\n",
"LlamaIndex is a specialized data framework tailored for supporting LLM-driven application development. It provides a sophisticated structure enabling developers to seamlessly integrate various data sources with large language models. These sources encompass diverse file formats like PDFs and PowerPoints, popular applications such as Notion and Slack, and databases like Postgres and MongoDB. Through a range of connectors, the framework streamlines data ingestion, facilitating smooth interactions with LLMs. Additionally, LlamaIndex offers an efficient interface for data retrieval and queries. This functionality allows developers to input LLM prompts and receive context-rich, knowledge-enhanced outputs.\n",
"\n",
"In essence, LlamaIndex acts as an intermediary for managing interactions with an LLM by constructing an index from input data. This index is then leveraged to answer queries related to the provided data. LlamaIndex is flexible, capable of generating different types of indexes—vector, tree, list, or keyword indexes—tailored to specific requirements.\n",
"\n",
"LlamaIndex offers a wide array of tools that facilitate data ingestion, retrieval, structuring, and integration with diverse application frameworks.\n",
"Don't hesitate to check out both [LlamaIndex](https://github.com/run-llama/llama_index) and [Argilla](https://github.com/argilla-io/argilla)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Running Argilla\n",
"\n",
"For this tutorial, you will need to have an Argilla server running. There are two main options for deploying and running Argilla:\n",
"\n",
"\n",
"**Deploy Argilla on Hugging Face Spaces**: If you want to run tutorials with external notebooks (e.g., Google Colab) and you have an account on Hugging Face, you can deploy Argilla on Spaces with a few clicks:\n",
"\n",
"[![deploy on spaces](https://huggingface.co/datasets/huggingface/badges/raw/main/deploy-to-spaces-lg.svg)](https://huggingface.co/new-space?template=argilla/argilla-template-space)\n",
"\n",
"For details about configuring your deployment, check the [official Hugging Face Hub guide](https://huggingface.co/docs/hub/spaces-sdks-docker-argilla).\n",
"\n",
"\n",
"**Launch Argilla using Argilla's quickstart Docker image**: This is the recommended option if you want [Argilla running on your local machine](../../getting_started/quickstart.ipynb). Note that this option will only let you run the tutorial locally and not with an external notebook service.\n",
"\n",
"For more information on deployment options, please check the Deployment section of the documentation.\n",
"\n",
"<div class=\"alert alert-info\">\n",
"\n",
"Tip\n",
" \n",
"This tutorial is a Jupyter Notebook. There are two options to run it:\n",
"\n",
"- Use the Open in Colab button at the top of this page. This option allows you to run the notebook directly on Google Colab. Don't forget to change the runtime type to GPU for faster model training and inference.\n",
"- Download the .ipynb file by clicking on the View source link at the top of the page. This option allows you to download the notebook and run it on your local machine or on a Jupyter notebook tool of your choice.\n",
"</div>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Getting started\n",
"\n",
"You first need to install argilla and argilla-llama-index as follows:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%pip install argilla argilla-llama-index"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You will need an Argilla server running to monitor the LLM. You can either install the server locally or host it on Hugging Face Spaces. For a complete guide on how to install and initialize the server, you can refer to the [Quickstart Guide](https://docs.argilla.io/en/latest/getting_started/quickstart_installation.html)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Usage\n",
"\n",
"Logging your data into Argilla within your LlamaIndex workflow takes a single step: calling the handler before running your LLM. We will use GPT-3.5 from OpenAI as our LLM. For this, you will need a valid API key from OpenAI, which you can obtain via [this link](https://openai.com/blog/openai-api). After you get your API key, the easiest way to import it is through an environment variable or via `getpass()`."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"from getpass import getpass\n",
"\n",
"# Read the key from the environment, or prompt for it if it is not set\n",
"openai_api_key = os.getenv(\"OPENAI_API_KEY\") or getpass(\"Enter your OpenAI API key: \")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's now write all the necessary imports and initialize the Argilla client."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"from llama_index.core import VectorStoreIndex, ServiceContext, SimpleDirectoryReader, set_global_handler\n",
"from llama_index.llms.openai import OpenAI\n",
"\n",
"import argilla as rg\n",
"\n",
"rg.init(\n",
" api_url=\"http://localhost:6900\",\n",
" api_key=\"owner.apikey\",\n",
" workspace=\"admin\"\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, we will set up an Argilla global handler for LlamaIndex. By doing so, we ensure that the predictions obtained using LlamaIndex are automatically uploaded to the Argilla client we initialized before. Within the handler, we need to provide the dataset name that we will use. If the dataset does not exist, it will be created with the given name. You can also set the API key, API URL, and workspace name. You can learn more about the variables that control Argilla initialization [here](https://docs.argilla.io/en/latest/getting_started/installation/configurations/workspace_management.html)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"set_global_handler(\"argilla\", dataset_name=\"query_model\")"
]
},
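{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a minimal sketch of that configuration: instead of hardcoding the connection settings, the Argilla client can also read them from environment variables. The variable names below belong to the Argilla client itself; whether the handler picks them up in your setup is an assumption worth verifying."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"# Assumption: the Argilla client reads these variables at initialization time,\n",
"# so you can avoid hardcoding credentials in rg.init() or the handler\n",
"os.environ[\"ARGILLA_API_URL\"] = \"http://localhost:6900\"\n",
"os.environ[\"ARGILLA_API_KEY\"] = \"owner.apikey\"\n",
"os.environ[\"ARGILLA_WORKSPACE\"] = \"admin\""
]
},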
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's now create the `llm` instance, using GPT-3.5 from OpenAI."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"llm = OpenAI(model=\"gpt-3.5-turbo\", temperature=0.8, api_key=openai_api_key)"
]
},
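{
"cell_type": "markdown",
"metadata": {},
"source": [
"Optionally, as a quick sanity check that the key and model are wired up correctly, you can send a one-off completion before building the full workflow (this step is not required for the rest of the tutorial):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# A single completion outside the query engine, just to verify the connection\n",
"print(llm.complete(\"Say hello in one short sentence.\"))"
]
},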
{
"cell_type": "markdown",
"metadata": {},
"source": [
"With the code snippet below, you can create a basic workflow with LlamaIndex. You will also need a `.txt` file as the data source inside a `data` folder. We have an example `.txt` file ready for you inside that folder, obtained from the [LlamaIndex documentation](https://docs.llamaindex.ai/en/stable/getting_started/starter_example.html)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Configure the LLM, load the documents, and build a vector index over them\n",
"service_context = ServiceContext.from_defaults(llm=llm)\n",
"docs = SimpleDirectoryReader(\"../../data\").load_data()\n",
"index = VectorStoreIndex.from_documents(docs, service_context=service_context)\n",
"query_engine = index.as_query_engine()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, let's run the `query_engine` to get a response from the model."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"response = query_engine.query(\"What did the author do growing up?\")\n",
"response"
]
},
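{
"cell_type": "markdown",
"metadata": {},
"source": [
"Besides the answer itself, the response object exposes the retrieved source nodes, so you can inspect which document chunks grounded the answer. A minimal sketch:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Each source node carries the retrieval score and the originating chunk's metadata\n",
"for source_node in response.source_nodes:\n",
"    print(source_node.score, source_node.node.metadata)"
]
},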
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As the global handler was set, an Argilla dataset is created with the dataset name that we introduced as a parameter. Everything predicted using the `query` function is automatically logged in this dataset, together with information about the steps taken to produce the prediction. The given prompt and the response are logged, as you can see in this example of Argilla's UI:\n",
"\n",
"![Argilla Dataset](../../_static/tutorials/llama_index/argilla-ui-dataset.png)"
]
}
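,
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To close the loop, you can pull the logged records back from Argilla for inspection. This is a minimal sketch, assuming the handler created a `FeedbackDataset` named `query_model` in the `admin` workspace configured above; adjust the names to your setup."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Dataset and workspace names are assumptions based on the configuration above\n",
"dataset = rg.FeedbackDataset.from_argilla(\"query_model\", workspace=\"admin\")\n",
"\n",
"for record in dataset.records:\n",
"    print(record.fields)"
]
}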
],
"metadata": {
"kernelspec": {
"display_name": ".venv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.13"
}
},
"nbformat": 4,
"nbformat_minor": 2
}