# Elasticsearch Inference API & Hugging Face

This notebook demonstrates how to use Hugging Face chat completions through the Elasticsearch Inference API. It is based on the article [Inference API & Hugging Face](https://www.elastic.co/search-labs/blog/inference-api-and-hugging-face).

In [26]:
%pip install requests elasticsearch -q

Note: you may need to restart the kernel to use updated packages.


## Installing dependencies and importing packages

In [121]:
import os
import json
import requests

from elasticsearch import Elasticsearch, helpers
from getpass import getpass

## Setting up environment variables

In [None]:
# os.environ["HUGGING_FACE_INFERENCE_ENDPOINT_URL"] = getpass(
#     "Enter your Hugging Face Inference Endpoint URL: "
# )
# os.environ["ELASTICSEARCH_API_KEY"] = getpass("Enter your Elasticsearch API key: ")
# os.environ["ELASTICSEARCH_URL"] = getpass("Enter your Elasticsearch URL: ")
# os.environ["HUGGING_FACE_API_KEY"] = getpass("Enter your Hugging Face API key: ")


INDEX_NAME = "company-blog-posts"
INFERENCE_ENDPOINT_ID = "hugging-face-gpt-oss-safeguard"
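
The cells that follow read four environment variables with `os.environ[...]`, which raises `KeyError` as soon as one of them is unset. As a minimal sketch, a pre-flight check could report everything that is missing at once. The helper name `missing_env_vars` is hypothetical and not part of the original notebook.

```python
import os

# Variable names taken from the setup cell above.
REQUIRED_ENV_VARS = [
    "ELASTICSEARCH_URL",
    "ELASTICSEARCH_API_KEY",
    "HUGGING_FACE_API_KEY",
    "HUGGING_FACE_INFERENCE_ENDPOINT_URL",
]


def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in `env` (defaults to os.environ)."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


missing = missing_env_vars(REQUIRED_ENV_VARS)
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All required environment variables are set.")
```

Running this before creating the client makes a configuration problem visible in one message instead of one `KeyError` per cell.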

## Elasticsearch Python client

In [123]:
es_client = Elasticsearch(
    os.environ["ELASTICSEARCH_URL"], api_key=os.environ["ELASTICSEARCH_API_KEY"]
)

## Hugging Face completions inference endpoint setup

Let’s create an Elasticsearch inference endpoint backed by the Hugging Face model. The endpoint uses the `chat_completion` task type, and we will call it later to generate responses about the documents in the dataset. To create it, we use the [inference API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-inference-put).

In [124]:
try:
    resp = es_client.inference.put(
        task_type="chat_completion",
        inference_id=INFERENCE_ENDPOINT_ID,
        body={
            "service": "hugging_face",
            "service_settings": {
                "api_key": os.environ["HUGGING_FACE_API_KEY"],
                "url": os.environ["HUGGING_FACE_INFERENCE_ENDPOINT_URL"],
            },
        },
    )

    print(
        "Chat completion inference endpoint created successfully:", resp["inference_id"]
    )
except Exception as e:
    print("Error creating chat completion inference endpoint:", e)

Chat completion inference endpoint created successfully: hugging-face-gpt-oss-safeguard


## Index setup

### Creating mappings


In [23]:
try:
    mapping = {
        "mappings": {
            "properties": {
                "id": {"type": "keyword"},
                "title": {
                    "type": "text",
                    "fields": {"keyword": {"type": "keyword"}},
                    "copy_to": "semantic_field",
                },
                "author": {"type": "keyword", "copy_to": "semantic_field"},
                "category": {"type": "keyword", "copy_to": "semantic_field"},
                "content": {"type": "text", "copy_to": "semantic_field"},
                "date": {"type": "date"},
                "semantic_field": {"type": "semantic_text"},
            }
        }
    }

    es_client.indices.create(index=INDEX_NAME, body=mapping)
    print(f"Index {INDEX_NAME} created successfully")
except Exception as e:
    print(f"Error creating index: {e}")

Index company-blog-posts created successfully


### Ingesting data to Elasticsearch

In [24]:
def build_data(json_file, index_name):
    with open(json_file, "r") as f:
        data = json.load(f)

    for doc in data:
        yield {"_index": index_name, "_source": doc}


try:
    success, errors = helpers.bulk(es_client, build_data("dataset.json", INDEX_NAME))
    print(f"{success} documents indexed successfully")

    if errors:
        print("Errors during indexing:", errors)
except Exception as e:
    print(f"Error: {str(e)}")

10 documents indexed successfully
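
Because `build_data` does not set `_id`, Elasticsearch assigns a fresh one to every action, so re-running the ingest cell would index the same posts a second time. One possible variant, assuming each document carries a unique `id` field (as the sample hit printed later in this notebook does), reuses that value as `_id`. The function name `build_actions` and the sample documents below are illustrative only.

```python
def build_actions(docs, index_name):
    """Yield bulk actions that reuse each document's `id` as the Elasticsearch `_id`."""
    for doc in docs:
        action = {"_index": index_name, "_source": doc}
        # Assumption: `id` is unique per document. Documents without it
        # fall back to an auto-generated `_id`.
        if doc.get("id") is not None:
            action["_id"] = str(doc["id"])
        yield action


# Made-up documents, used only to show the shape of the generated actions.
sample_docs = [
    {"id": "1", "title": "First post"},
    {"title": "Post without an id"},
]
actions = list(build_actions(sample_docs, "company-blog-posts"))
print(actions)
```

The generator can be passed to `helpers.bulk(es_client, ...)` in place of `build_data(...)`; with a fixed `_id`, a second run overwrites the existing documents rather than duplicating them.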


## Function to execute semantic search

In [70]:
def semantic_search(user_question: str, size: int = 5):
    try:
        response = es_client.search(
            index=INDEX_NAME,
            body={
                "query": {
                    "semantic": {
                        "field": "semantic_field",
                        "query": user_question,
                    }
                },
                "size": size,
            },
        )

        return {
            "hits": response["hits"]["hits"],
            "total_hits": len(response["hits"]["hits"]),
        }

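    except Exception as e:
        print(f"Error searching index: {str(e)}")
        # Return an empty result so callers can still read "hits" and "total_hits"
        return {"hits": [], "total_hits": 0}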

In [52]:
results = semantic_search(
    user_question="Search for updates related to data encryption.", size=1
)

print(f"Total hits: {results['total_hits']}")
for hit in results["hits"]:
    print(f"{json.dumps(hit['_source'], indent=2)}")

Total hits: 1
{
  "id": "8",
  "title": "Security update: encrypted data pipelines",
  "author": "Daniel Vega",
  "date": "2025-11-08",
  "category": "security",
  "content": "We have enhanced our data pipelines with full encryption in transit and at rest. This aligns with our company\u2019s commitment to protecting customer data."
}


## Generating completions function

In [125]:
def stream_chat_completion(messages: list, inference_id: str = INFERENCE_ENDPOINT_ID):
    try:

        response = requests.post(
            url=f"{os.environ['ELASTICSEARCH_URL']}/_inference/chat_completion/{inference_id}/_stream",
            json={
                "messages": messages,
            },
            headers={
                "Authorization": f"ApiKey {os.environ['ELASTICSEARCH_API_KEY']}",
                "Content-Type": "application/json",
            },
            stream=True,
        )
        response.raise_for_status()
        response.encoding = "utf-8"

        for line in response.iter_lines(decode_unicode=True):
            if line:
                line = line.strip()

                # Skip event lines like "event: message"
                if line.startswith("event:"):
                    continue

                # Process data lines
                if line.startswith("data: "):
                    data_content = line[6:]  # Remove "data: " prefix

                    if not data_content.strip() or data_content.strip() == "[DONE]":
                        continue

                    try:
                        chunk_data = json.loads(data_content)

                        # Extract the content from the response structure
                        if "choices" in chunk_data and len(chunk_data["choices"]) > 0:
                            choice = chunk_data["choices"][0]
                            if "delta" in choice and "content" in choice["delta"]:
                                content = choice["delta"]["content"]
                                if content:
                                    yield content

                    except json.JSONDecodeError as json_err:
                        print(f"\nJSON decode error: {json_err}")
                        print(f"Problematic data: {data_content}")
                        continue

    except requests.exceptions.RequestException as e:
        yield f"Error: {str(e)}"
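
The parsing logic inside `stream_chat_completion` can be exercised without a live endpoint. The sketch below isolates it in a hypothetical helper, `extract_delta_content`, and feeds it hand-written lines that imitate the chunk shape the function expects (`choices[0].delta.content`). The sample lines are not captured from a real response.

```python
import json


def extract_delta_content(line: str):
    """Return the text delta carried by one SSE line, or None if there is none."""
    line = line.strip()
    if not line.startswith("data:"):
        return None  # blank lines and "event: ..." lines carry no content

    payload = line[len("data:"):].strip()
    if not payload or payload == "[DONE]":
        return None

    try:
        chunk = json.loads(payload)
    except json.JSONDecodeError:
        return None  # malformed chunk, skipped in this sketch

    if not isinstance(chunk, dict):
        return None
    choices = chunk.get("choices") or []
    if not choices:
        return None
    delta = choices[0].get("delta") or {}
    return delta.get("content") or None


# Hand-written sample lines (assumed shape, for illustration only).
sample_lines = [
    "event: message",
    'data: {"choices": [{"delta": {"content": "Hello"}}]}',
    'data: {"choices": [{"delta": {"content": ", world"}}]}',
    'data: {"choices": [{"delta": {}}]}',
    "data: [DONE]",
]
text = "".join(c for c in map(extract_delta_content, sample_lines) if c)
print(text)  # Hello, world
```

Keeping the parser separate from the HTTP call makes it straightforward to unit test the handling of empty deltas, `[DONE]` markers, and malformed JSON.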

## RAG Chat with Streaming


In [126]:
def moderation_chat(
    user_query: str,
    inference_id: str = INFERENCE_ENDPOINT_ID,
    ruleset_path: str = "policies.txt",
):
    print(
        f"Moderation chat invoked with QUERY: {user_query} and INFERENCE_ID: {inference_id}"
    )
    search_results = semantic_search(user_query)
    context_docs = search_results["hits"]

    try:
        with open(ruleset_path, "r", encoding="utf-8") as f:
            ruleset = f.read()
    except FileNotFoundError:
        raise FileNotFoundError(
            f"{ruleset_path} not found. Please provide the company policies file."
        )

    system_prompt = f"""
        You are a Safeguard moderation assistant.
        Evaluate the provided article according to the company content policies.

        The evaluation must follow these instructions:
        - Only use the provided ruleset to judge compliance.
        - Focus on tone, confidentiality, and adherence to internal communication standards.

        Company ruleset:
        {ruleset}
    """

    results = []

    for i, doc in enumerate(context_docs, 1):
        title = doc["_source"].get("title", f"Article {i}")
        content = doc["_source"].get("content", "")

        user_prompt = f"""
            Evaluate this article:
            Title: {title}
            Content body:
            {content}
        """

        messages = [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ]

        print("\n------------------------------------------------")
        print(
            f"Evaluating article {i}/{len(context_docs)}\n TITLE: {title} \n CONTENT: {content}\n"
        )

        try:
            full_response = ""
            for chunk in stream_chat_completion(messages, inference_id=inference_id):
                print(chunk, end="", flush=True)
                full_response += chunk

            results.append({"title": title, "response": full_response.strip()})
        except Exception as e:
            print(f"\n⚠️ Error evaluating {title}: {e}")
            results.append({"title": title, "response": None, "error": str(e)})

    return results
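
The triple-quoted f-strings in `moderation_chat` keep their source indentation, so every prompt line reaches the model with leading spaces. That is usually harmless, but if clean prompts are preferred, one option is to dedent the templates first and fill in the values afterwards. The sketch below assumes shortened prompt wording; `build_messages` and the two template constants are hypothetical names, not part of the original notebook.

```python
from textwrap import dedent

# Dedent the templates before formatting, so multi-line values such as the
# ruleset cannot change the common indentation that dedent removes.
SYSTEM_TEMPLATE = dedent(
    """\
    You are a Safeguard moderation assistant.
    Evaluate the provided article according to the company content policies.

    Company ruleset:
    {ruleset}
    """
)

USER_TEMPLATE = dedent(
    """\
    Evaluate this article:
    Title: {title}
    Content body:
    {content}
    """
)


def build_messages(ruleset: str, title: str, content: str) -> list:
    """Build the system and user messages for one article."""
    return [
        {"role": "system", "content": SYSTEM_TEMPLATE.format(ruleset=ruleset.strip())},
        {"role": "user", "content": USER_TEMPLATE.format(title=title, content=content)},
    ]


# Placeholder values, for illustration only.
messages = build_messages(
    ruleset="1. Placeholder rule.",
    title="Debugging authentication issues",
    content="Example body.",
)
print(messages[1]["content"])
```

The returned list has the same shape as the `messages` variable built inside `moderation_chat`, so it could be passed to `stream_chat_completion` unchanged.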

In [127]:
response = moderation_chat(
    user_query="Find posts explaining debugging or authentication issues"
)
# Another query to try:
# "Find articles about analytics dashboards or visualization tools."

Moderation chat invoked with QUERY: Find posts explaining debugging or authentication issues and INFERENCE_ID: hugging-face-gpt-oss-safeguard

------------------------------------------------
Evaluating article 1/5
 TITLE: Debugging authentication issues 
 CONTENT: When debugging authentication, never share production tokens or client credentials. Always use our test environment and anonymized data for public examples.


**Evaluation Result: PASS**  
The article does **not** contain any of the prohibited content categories:

1. No confidential project names, pricing, or sensitive credentials are disclosed.

In [None]:
NEW_INFERENCE_ENDPOINT_ID = "gpt-oss-20b-endpoint"
NEW_INFERENCE_ENDPOINT_URL = (
    "https://aajflm.us-east-2.aws.endpoints.huggingface.cloud/v1/chat/completions"
)

try:
    resp = es_client.inference.put(
        task_type="chat_completion",
        inference_id=NEW_INFERENCE_ENDPOINT_ID,
        body={
            "service": "hugging_face",
            "service_settings": {
                "api_key": os.environ["HUGGING_FACE_API_KEY"],
                "url": NEW_INFERENCE_ENDPOINT_URL,
            },
        },
    )

    print(
        "Chat completion inference endpoint created successfully:", resp["inference_id"]
    )
except Exception as e:
    print("Error creating chat completion inference endpoint:", e)

Chat completion inference endpoint created successfully: gpt-oss-20b-endpoint


In [119]:
response = moderation_chat(
    "Find posts explaining debugging or authentication issues",
    inference_id=NEW_INFERENCE_ENDPOINT_ID,
)

Moderation chat invoked with QUERY: Find posts explaining debugging or authentication issues and INFERENCE_ID: gpt-oss-20b-endpoint

------------------------------------------------
Evaluating article 1/5
 TITLE: Debugging authentication issues 
 CONTENT: When debugging authentication, never share production tokens or client credentials. Always use our test environment and anonymized data for public examples.

**Compliance Evaluation**

- **Rule 1 – Confidential Information:** No internal project names, client data, or non‑public credentials are disclosed.  
- **Rule 2 – Internal Pricing:** No pricing or financial details are shared.  
- **Rule 3 – Unprofessional Language:** The tone is factual and respectful; no mocking or negative comparisons.  
- **Rule 4 – Personal Data:** No identifiable client or employee information appears.  
- **Rule 5 – Unsafe Practices:** The post explicitly advises against sharing production tokens or credentials, discouraging unsafe behavior.  

**Result:*

## Deleting

Delete the index and the inference endpoints created in this notebook so that they stop consuming cluster resources.

In [None]:
# Cleanup - Delete Index
es_client.indices.delete(index=INDEX_NAME)

ObjectApiResponse({'acknowledged': True})

In [120]:
# Cleanup - Delete Inference Endpoints
es_client.inference.delete(inference_id=INFERENCE_ENDPOINT_ID)

# es_client.inference.delete(inference_id=NEW_INFERENCE_ENDPOINT_ID)

ObjectApiResponse({'acknowledged': True, 'pipelines': [], 'indexes': []})