# Best Practices for Building a Modern App with Vector Search

This notebook demonstrates how to build a modern LLM-powered application using:

- **Jina Embeddings v3** via Elastic Inference Service (EIS) - GPU-accelerated multilingual embeddings
- **Elasticsearch 9.3+** for vector storage and semantic search
- **Agent Builder** for creating AI agents that can query your data

## What You'll Learn

1. Setting up inference endpoints for embeddings
2. Creating optimized indices for vector search
3. Ingesting data with automatic embedding generation
4. Performing semantic searches
5. Building an AI agent with Agent Builder

## Setup and Configuration

In [1]:
# Install required packages
!pip install elasticsearch requests python-dotenv -q


In [2]:
import os
import json
import requests
from elasticsearch import Elasticsearch
from dotenv import load_dotenv

load_dotenv()

# Elasticsearch configuration
ELASTICSEARCH_URL = os.getenv("ELASTICSEARCH_URL")
ELASTIC_API_KEY = os.getenv("ELASTICSEARCH_API_KEY")
KIBANA_URL = os.getenv("KIBANA_URL")

# Initialize Elasticsearch client
es = Elasticsearch(ELASTICSEARCH_URL, api_key=ELASTIC_API_KEY)
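
`os.getenv` returns `None` for an unset variable, so a missing `.env` entry would only surface later as a connection error. A minimal sketch of an early check, assuming the three variable names used above; `missing_settings` is a hypothetical helper, not part of the `elasticsearch` client:

```python
import os

# Variable names taken from the configuration cell above
REQUIRED_VARS = ("ELASTICSEARCH_URL", "ELASTICSEARCH_API_KEY", "KIBANA_URL")


def missing_settings(env=None, required=REQUIRED_VARS):
    """Return the names of required settings that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


missing = missing_settings()
if missing:
    print(f"Missing settings: {', '.join(missing)}")
```

Running this right after `load_dotenv()` makes a misconfigured environment visible before any request is sent.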

## Create Inference Endpoint for Embeddings

We'll use **Jina Embeddings v3** through Elastic Inference Service (EIS). This model:

- Is multilingual out of the box
- Runs on Elastic's GPU infrastructure (no ML nodes needed)

### Best Practice: Use EIS for Production Workloads
EIS eliminates the need to manage ML infrastructure while providing GPU-accelerated performance.

In [3]:
INFERENCE_ENDPOINT_ID = "jina-embeddings-v3"

# Create the inference endpoint for Jina Embeddings v3
inference_config = {
    "service": "elastic",
    "service_settings": {"model_id": "jina-embeddings-v3"},
}

try:
    response = es.inference.put(
        inference_id=INFERENCE_ENDPOINT_ID,
        task_type="text_embedding",
        body=inference_config,
    )

    print(f"Created inference endpoint: {INFERENCE_ENDPOINT_ID}")
    print(json.dumps(response.body, indent=2))
except Exception as e:
    print(f"Error: {e}")

Error: Connection timed out


In [4]:
# Test the inference endpoint
test_response = es.inference.inference(
    inference_id=INFERENCE_ENDPOINT_ID,
    body={
        "input": "Elasticsearch is a distributed search and analytics engine.",
        "input_type": "ingest",
    },
)

embedding = test_response.body["text_embedding"][0]["embedding"]
print(f"Embedding dimensions: {len(embedding)}")
print(f"First 5 values: {embedding[:5]}")

Embedding dimensions: 1024
First 5 values: [0.09668593, -0.0195286, 0.03599555, -0.04717219, 0.07873663]
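
Semantic search ranks documents by how close their vectors are to the query vector. As an illustration only, here is cosine similarity, one common closeness measure, in plain Python. Which similarity function the index actually uses is not shown in this notebook, and `cosine_similarity` is a hypothetical helper:

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy 2-dimensional vectors; real embeddings from the endpoint have 1024 values
same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
print(same_direction, orthogonal)
```

Vectors pointing the same way score close to 1, unrelated (orthogonal) vectors score 0.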


## Create an Optimized Index for Vector Search

### Best Practices for Index Design:
1. **Use `semantic_text` field type** - Automatically handles chunking and embedding generation
2. **Use `copy_to` pattern** - Keep original fields for BM25 and copy content to a dedicated semantic field
3. **Consider field lengths** - Jina v3 performs optimally with 2048-4096 tokens


In [5]:
INDEX_NAME = "tech-articles"

# Create index with semantic_text field using copy_to pattern
# Best Practice: Keep original fields for BM25 search and use copy_to for semantic search
index_mappings = {
    "mappings": {
        "properties": {
            "title": {"type": "text", "copy_to": "semantic_field"},
            "content": {"type": "text", "copy_to": "semantic_field"},
            "category": {"type": "keyword"},
            "published_date": {"type": "date"},
            "semantic_field": {
                "type": "semantic_text",
                "inference_id": INFERENCE_ENDPOINT_ID,
            },
        }
    }
}

# Only create the index if it doesn't exist
if not es.indices.exists(index=INDEX_NAME):
    response = es.indices.create(index=INDEX_NAME, body=index_mappings)
    print(f"Created index: {INDEX_NAME}")
    print(json.dumps(response.body, indent=2))
else:
    print(f"Index '{INDEX_NAME}' already exists, skipping creation")

Index 'tech-articles' already exists, skipping creation


## Ingest Sample Data

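### Best Practice: Use Bulk Operations
The `helpers.bulk` function sends documents in batches instead of one request per document, which reduces round trips during ingestion.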

In [6]:
from elasticsearch import helpers


def build_data(json_file, index_name):
    """Generator function to yield documents for bulk indexing."""
    with open(json_file, "r") as f:
        data = json.load(f)

    for doc in data:
        yield {"_index": index_name, "_source": doc}


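# Bulk index the documents from JSON file
try:
    # raise_on_error=False returns per-document failures in `failed`
    # instead of raising on the first failed batch
    success, failed = helpers.bulk(
        es,
        build_data("dataset.json", INDEX_NAME),
        raise_on_error=False,
    )
    print(f"{success} documents indexed successfully")

    if failed:
        print(f"Errors: {failed}")
except Exception as e:
    print(f"Error: {str(e)}")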

5 documents indexed successfully
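
The actions yielded above carry no `_id`, so Elasticsearch generates one per document and re-running the ingest cell indexes the same articles again. One possible way to make ingestion repeatable is to derive the `_id` from the document itself. This sketch assumes titles are unique in the dataset; `with_stable_ids` and the `key_field` parameter are hypothetical:

```python
import hashlib


def with_stable_ids(docs, index_name, key_field="title"):
    """Yield bulk actions whose _id is a hash of one field of the document."""
    for doc in docs:
        key = str(doc[key_field]).encode("utf-8")
        yield {
            "_index": index_name,
            "_id": hashlib.sha256(key).hexdigest(),  # same title -> same _id
            "_source": doc,
        }


# In-memory sample documents, not the contents of dataset.json
sample = [{"title": "Introduction to Vector Databases"}]
first_run = list(with_stable_ids(sample, "tech-articles"))
second_run = list(with_stable_ids(sample, "tech-articles"))
print(first_run[0]["_id"] == second_run[0]["_id"])
```

Because the `_id` is the same on every run, a repeated bulk request overwrites the existing document instead of adding a copy.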


In [7]:
# Check document count
count = es.count(index=INDEX_NAME)
print(f"Total documents in index: {count.body['count']}")

Total documents in index: 5


## Semantic Search

### Best Practice: Use `match` Query on `semantic_text` Fields
The `match` query automatically detects `semantic_text` fields and handles embedding generation for your search query.

In [8]:
def semantic_search(query: str, size: int = 3):
    """Perform semantic search on the tech-articles index."""
    response = es.search(
        index=INDEX_NAME,
        body={
            "query": {"match": {"semantic_field": {"query": query}}},
            "size": size,
            "_source": ["title", "category", "content"],
        },
    )

    print(f"Query: '{query}'\n")
    print(f"Found {response.body['hits']['total']['value']} results:\n")

    for hit in response.body["hits"]["hits"]:
        print(f"Score: {hit['_score']:.4f}")
        print(f"Title: {hit['_source']['title']}")
        print(f"Category: {hit['_source']['category']}")
        print(f"Content: {hit['_source']['content'][:150]}...\n")

    return response

In [9]:
# Test semantic search
semantic_search("How do I implement similarity search in my application?")

Query: 'How do I implement similarity search in my application?'

Found 5 results:

Score: 0.6704
Title: Introduction to Vector Databases
Category: databases
Content: Vector databases are specialized systems designed to store and query high-dimensional vectors efficiently. They enable similarity search, which is fun...

Score: 0.6694
Title: Building RAG Applications with Elasticsearch
Category: ai
Content: Retrieval-Augmented Generation (RAG) combines the power of large language models with external knowledge retrieval. Elasticsearch provides an excellen...

Score: 0.6445
Title: Embeddings: The Foundation of Semantic Search
Category: ai
Content: Embeddings are numerical representations of text that capture semantic meaning. Modern embedding models like Jina v3 can process text in multiple lang...




## Hybrid Search (BM25 + Semantic)

### Best Practice: Combine Lexical and Semantic Search
Hybrid search combines exact keyword matching (BM25) with semantic understanding by fusing the two result rankings with Reciprocal Rank Fusion (RRF).

In [10]:
def hybrid_search(query: str, size: int = 3):
    """Perform hybrid search combining BM25 and semantic search."""
    response = es.search(
        index=INDEX_NAME,
        body={
            "retriever": {
                "rrf": {
                    "retrievers": [
                        {
                            "standard": {
                                "query": {
                                    "multi_match": {
                                        "query": query,
                                        "fields": [
                                            "title^2",
                                            "content",
                                        ],  # BM25 on original text fields
                                    }
                                }
                            }
                        },
                        {
                            "standard": {
                                "query": {"match": {"semantic_field": {"query": query}}}
                            }
                        },
                    ],
                    "rank_window_size": 50,
                    "rank_constant": 20,
                }
            },
            "size": size,
            "_source": ["title", "category"],
        },
    )

    print(f"Hybrid Search Query: '{query}'\n")

    for hit in response.body["hits"]["hits"]:
        print(
            f"Score: {hit['_score']:.4f} | {hit['_source']['title']} [{hit['_source']['category']}]"
        )

    return response

In [11]:
hybrid_search("What are the best practices for semantic search in Elasticsearch?")

Hybrid Search Query: 'What are the best practices for semantic search in Elasticsearch?'

Score: 0.0931 | Scaling Elasticsearch for Production [infrastructure]
Score: 0.0893 | Embeddings: The Foundation of Semantic Search [ai]
Score: 0.0889 | Building RAG Applications with Elasticsearch [ai]


ObjectApiResponse({'took': 289, 'timed_out': False, '_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0}, 'hits': {'total': {'value': 5, 'relation': 'eq'}, 'max_score': 0.09307359, 'hits': [{'_index': 'tech-articles', '_id': 'piBuNJwBTUn0vNv4PgWW', '_score': 0.09307359, '_source': {'title': 'Scaling Elasticsearch for Production', 'category': 'infrastructure'}}, {'_index': 'tech-articles', '_id': 'pSBuNJwBTUn0vNv4PgWW', '_score': 0.08928572, '_source': {'title': 'Embeddings: The Foundation of Semantic Search', 'category': 'ai'}}, {'_index': 'tech-articles', '_id': 'pCBuNJwBTUn0vNv4PgWW', '_score': 0.08893281, '_source': {'title': 'Building RAG Applications with Elasticsearch', 'category': 'ai'}}]}})
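
The hybrid scores are much smaller than the semantic scores because RRF works on ranks, not on raw scores: each retriever contributes `1 / (rank_constant + rank)` for every document it returns. The sketch below reimplements that formula in plain Python. The two rankings are hypothetical (the response does not show per-retriever ranks), chosen as one combination that reproduces the scores printed above with `rank_constant=20`:

```python
def rrf_scores(rankings, rank_constant=20):
    """Fuse ranked lists of document ids with Reciprocal Rank Fusion."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):  # ranks start at 1
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (rank_constant + rank)
    # Highest fused score first
    return dict(sorted(scores.items(), key=lambda item: item[1], reverse=True))


# Hypothetical per-retriever orderings, using short labels for the articles
lexical = ["scaling", "rag", "other", "embeddings"]
semantic = ["embeddings", "scaling", "rag"]

for doc_id, score in rrf_scores([lexical, semantic]).items():
    print(f"{score:.4f} {doc_id}")
```

Under these assumed rankings, the top document scores 1/21 + 1/22, which is about 0.0931. A document only needs to rank well in both lists to win, even if it is not first in either one.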

## Create an AI Agent with Agent Builder

Agent Builder uses LLMs to power agent reasoning and decision-making.

### Default vs Custom LLM
- **Elastic Managed LLM** (default): Available out-of-the-box on Elastic Cloud. No configuration or API keys needed.
- **Custom LLM**: You can configure third-party providers (OpenAI, Azure, Anthropic, etc.) using connectors.

### SSL Certificate Verification
For production environments, keep certificate verification enabled (`verify=True`, the `requests` default) when making HTTPS requests:
- **Elastic Cloud**: Certificates are valid by default, no additional configuration needed.
- **Self-managed**: You may need to provide the CA certificate path (e.g., `verify="/path/to/ca.crt"`).
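
The two cases above can be folded into one small helper that produces the value passed as `verify=`. This is a sketch under assumptions: `resolve_verify` and the `ELASTIC_CA_CERT` environment variable are hypothetical names, not something Elastic or `requests` defines:

```python
import os


def resolve_verify(ca_cert_path=None):
    """Return the value to pass as `verify=` to requests."""
    if not ca_cert_path:
        return True  # Elastic Cloud case: use the default trusted CA bundle
    if not os.path.isfile(ca_cert_path):
        raise FileNotFoundError(f"CA certificate not found: {ca_cert_path}")
    return ca_cert_path  # self-managed case: verify against this CA file


# ELASTIC_CA_CERT is a hypothetical variable name; unset means "use defaults"
verify = resolve_verify(os.getenv("ELASTIC_CA_CERT"))
print(verify)
```

The helper never returns `False`, so verification cannot be switched off by accident through configuration.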

In [None]:
# Create an Agent using the default Elastic Managed LLM

headers = {
    "kbn-xsrf": "true",
    "Authorization": f"ApiKey {ELASTIC_API_KEY}",
    "Content-Type": "application/json",
}

agent_payload = {
    "id": "tech-articles-assistant",
    "name": "Tech Articles Assistant",
    "description": "An AI assistant that helps users find information about technology topics from our knowledge base.",
    "configuration": {
        "tools": [{"tool_ids": ["platform.core.search", "platform.core.execute_esql"]}],
        "instructions": f"""You are a helpful assistant that answers questions about technology topics.

Use the search tool to find relevant articles from the '{INDEX_NAME}' index.
When searching, prefer semantic search for natural language questions.
Always cite the article titles when providing information.
If you cannot find relevant information, say so clearly.""",
    },
}

response = requests.post(
    f"{KIBANA_URL}/api/agent_builder/agents",
    headers=headers,
    json=agent_payload,
    verify=True,
)

if response.status_code == 200:
    agent_data = response.json()
    agent_id = agent_data.get("id")
    print(f"Created agent: {agent_id}")
    print(json.dumps(agent_data, indent=2))
else:
    print(f"Error creating agent: {response.text}")
    agent_id = None

Error creating agent: {"statusCode":400,"error":"Bad Request","message":"Agent with id tech-articles-assistant already exists","attributes":{}}
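
On a re-run the agent already exists, so the cell above sets `agent_id = None` and the chat example is skipped. A hedged sketch of one way to tell "already exists" apart from a real failure, based only on the error body shown above. Matching on the message text is an assumption and may be brittle; `classify_agent_response` is a hypothetical helper:

```python
def classify_agent_response(status_code, body_text):
    """Classify the create-agent response as 'created', 'exists', or 'error'."""
    if status_code == 200:
        return "created"
    # Assumption: the "already exists" wording matches the error shown above
    if status_code == 400 and "already exists" in body_text:
        return "exists"
    return "error"


sample_body = (
    '{"statusCode":400,"error":"Bad Request",'
    '"message":"Agent with id tech-articles-assistant already exists","attributes":{}}'
)
print(classify_agent_response(400, sample_body))
```

When the result is `"exists"`, the notebook could reuse `agent_payload["id"]` as `agent_id` rather than treating the response as a failure.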


In [12]:
# Chat with the agent
def chat_with_agent(agent_id: str, message: str):
    """Send a message to the agent and get a response."""
    chat_payload = {"input": message, "agent_id": agent_id}

    response = requests.post(
        f"{KIBANA_URL}/api/agent_builder/converse",
        headers=headers,
        json=chat_payload,
        verify=True,
    )

    if response.status_code == 200:
        return response.json()
    else:
        return {"error": response.text, "status_code": response.status_code}

In [None]:
# Example conversation
if agent_id:
    result = chat_with_agent(
        agent_id, "What are the best practices for building RAG applications?"
    )
    print(json.dumps(result, indent=2))

In [None]:
# es.indices.delete(index=INDEX_NAME)
# es.inference.delete(inference_id=INFERENCE_ENDPOINT_ID)