# Memoire Python tutorial

This tutorial will teach you the basics of Memoire for your RAG or document retrieval/search pipeline.

Before you start, make sure the Memoire Docker container is running ([setup instructions are in the README of our GitHub repository](https://github.com/A-star-logic/memoire)).

## **1. Import required libraries**
First, we import an HTTP request library and store the application URL and the API key in variables.

In [1]:
import requests
import json

API_KEY = "abc123" # secure me
MEMOIRE_URL = 'http://localhost:3003'

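Hard-coding the API key is fine for a quick local test, but you will probably want to keep it out of the notebook. A minimal sketch of one common approach is to read both settings from environment variables. The variable names `MEMOIRE_API_KEY` and `MEMOIRE_URL` and the helper `load_settings` are assumptions made for this example, not names defined by Memoire.

```python
import os


def load_settings(env=os.environ):
    """Read the Memoire URL and API key from environment variables.

    MEMOIRE_API_KEY and MEMOIRE_URL are hypothetical names chosen for this sketch.
    """
    api_key = env.get("MEMOIRE_API_KEY")
    if not api_key:
        raise RuntimeError("Set the MEMOIRE_API_KEY environment variable first")
    # Fall back to the local URL used in this tutorial.
    url = env.get("MEMOIRE_URL", "http://localhost:3003")
    return url, api_key
```

You could then write `MEMOIRE_URL, API_KEY = load_settings()` instead of assigning the values by hand.
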
## **2. Define helper functions**
Next, we define two helper functions for convenience: one to ingest documents and one to search.

In [None]:
def index_documents(documents):
    """Send a list of {"documentID", "url"} items to Memoire for ingestion."""
    return requests.post(
        MEMOIRE_URL + "/memoire/ingest/urls",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"documents": documents},
    )


def search(query):
    """Run a search query and return the raw HTTP response."""
    return requests.post(
        MEMOIRE_URL + "/memoire/search",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "operationMode": "speed",
            "maxResults": 3,  # return at most three results
            "query": query,
        },
    )

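The helpers return the raw HTTP response, so a failed request (for example, a wrong API key) would go unnoticed until you read the body. As a hedged sketch, a small wrapper can raise on error status codes before decoding the JSON. The name `parse_response` is hypothetical, and it assumes the argument behaves like a `requests.Response`.

```python
def parse_response(response):
    """Return the decoded JSON body, or raise if the request failed.

    `response` is expected to be a `requests.Response` (assumption of this sketch).
    """
    # raise_for_status() raises requests.HTTPError on 4xx/5xx status codes.
    response.raise_for_status()
    return response.json()
```

For example, `parse_response(search("text"))` would return the decoded result dictionary or raise if the server rejected the request.
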
## **3. Ingest Documents**
Now it is time to ingest some documents.

In this example, we index three sample files (a plain-text file, an MS Word document, and a CSV file) hosted in the Memoire GitHub repository.

In [3]:
documents = [
    {
        "documentID": "document1",
        "url": "https://raw.githubusercontent.com/A-star-logic/memoire/refs/heads/main/src/parser/tests/sampleFiles/test.txt"
    },
    {
        "documentID": "def-456",
        "url": "https://github.com/A-star-logic/memoire/raw/refs/heads/main/src/parser/tests/sampleFiles/test.docx"
    },
    {
        "documentID": "def-789",
        "url": "https://github.com/A-star-logic/memoire/raw/refs/heads/main/src/parser/tests/sampleFiles/test.csv"
    }
]


response = index_documents(documents)
response.json()

{'message': 'ok'}

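Before sending a larger batch, it can help to sanity-check the payload locally. The sketch below assumes that every item needs a `documentID` and an `http(s)` `url`, and that IDs should be unique within a batch; these rules are assumptions for illustration rather than documented Memoire requirements, and `validate_documents` is a hypothetical helper.

```python
from urllib.parse import urlparse


def validate_documents(documents):
    """Return a list of problems found in an ingestion payload (empty if none)."""
    problems = []
    seen_ids = set()
    for position, doc in enumerate(documents):
        doc_id = doc.get("documentID")
        url = doc.get("url") or ""
        if not doc_id:
            problems.append(f"item {position}: missing documentID")
        elif doc_id in seen_ids:
            problems.append(f"item {position}: duplicate documentID {doc_id!r}")
        else:
            seen_ids.add(doc_id)
        # Accept only absolute http(s) URLs (an assumption of this sketch).
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append(f"item {position}: invalid url {url!r}")
    return problems
```

Calling `validate_documents(documents)` on the list above should return an empty list, so you can guard the call to `index_documents` with it.
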
## **4. Search retrieval**
Finally, we can run a search query against the documents we just ingested.

In [None]:
search_response = search("text")

print(json.dumps(search_response.json(), indent=2))

{
  "results": [
    {
      "content": "text for tests\nand another line ",
      "documentID": "document1",
      "score": 0.016393442622950817,
      "highlights": "text for tests\nand another line ",
      "metadata": {}
    },
    {
      "content": "text,list,region\r\n\"gravida. Aliquam tincidunt, nunc ac mattis ornare, lectus ante dictum mi, ac mattis velit justo nec ante. Maecenas mi felis, adipiscing fringilla, porttitor vulputate, posuere vulputate, lacus. Cras interdum. Nunc sollicitudin commodo ipsum. Suspendisse non leo. Vivamus nibh dolor, nonummy ac, feugiat non, lobortis quis, pede. Suspendisse dui. Fusce diam nunc, ullamcorper eu, euismod ac, fermentum vel, mauris. Integer sem elit, pharetra ut, pharetra sed, hendrerit a, arcu. Sed et libero. Proin mi. Aliquam gravida mauris ut mi. Duis risus odio, auctor vitae, aliquet nec, imperdiet nec, leo. Morbi neque tellus, imperdiet non, vestibulum nec, euismod in, dolor. Fusce feugiat.\",7,Emilia-Romagna\r\n\"vestibulum, nequ

## Interpreting the results and using them correctly

Great, you've got the results. You can now pipe the output to an LLM or show it directly to your user.

Because Memoire uses hybrid search, the results are presented in a way that may look "quirky" if you are used to traditional search.

- The `content` field contains the original document. It is always present because we save the original documents. However, it may be too big for an LLM (imagine a 30-page Word document).
- The `highlights` field contains the current chunk of the document. It has the most contextual relevance to your query. However, in very rare cases, it can be empty.
- Some documents may appear multiple times with different highlights (and that's a good sign).

How does this translate to your application? Use the highlights of the top results, and fall back on the content only when a highlight is not available.
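
The highlight-first rule can be sketched as a small function that works on the decoded search response. The helper name `select_passages` and the `max_content_chars` cut-off are assumptions for this example; the truncation is only there because the full `content` may be too long for an LLM prompt.

```python
def select_passages(search_json, max_content_chars=2000):
    """Pick one passage per result: the highlight, else the (truncated) content."""
    passages = []
    for result in search_json.get("results", []):
        highlight = (result.get("highlights") or "").strip()
        if highlight:
            text = highlight
        else:
            # Rare case: no highlight, so fall back on the original document.
            text = (result.get("content") or "").strip()[:max_content_chars]
        if text:
            passages.append({"documentID": result.get("documentID"), "text": text})
    return passages
```

With the response from the previous section, `select_passages(search_response.json())` would give you a list of passages ready to join into an LLM prompt. A document that appears several times keeps one entry per highlight.
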