Elasticsearch explorer
The aim is to let you enter the URL of an unauthenticated Elasticsearch cluster and then query it in natural language.
You can ask "please aggregate all agent status records by companyId for the last 6 months" and get the results in a table, which saves me looking up the syntax all the time.
There are two tools for reading the searchable aliases and finding the mapping for each. You can ask to "list all aliases", ask to see the mapping for a particular alias, or just ask for the field list for an alias.
If you ask, you can get just the search query displayed, which is useful for debugging.
If you ask to query for records, the LLM will first create a query, then use the search tool to execute it and show you the results formatted as a table.
The current date and the "companyId" (if you are using one) are set dynamically in the prompt.

In [1]:
import os
from dotenv import load_dotenv
from openai import OpenAI
import gradio as gr
import requests
import json
from datetime import datetime

In [2]:
# Load environment variables in a file called .env
# Print the key prefixes to help with any debugging

load_dotenv(override=True)
openai_api_key = os.getenv('OPENAI_API_KEY')

if openai_api_key:
    print(f"OpenAI API Key exists and begins {openai_api_key[:8]}")
else:
    print("OpenAI API Key not set")
openai = OpenAI()
MODEL = 'gpt-4.1'
alias_and_mappinglist = []

OpenAI API Key exists and begins sk-proj-


This system prompt instructs the LLM to create Elasticsearch queries. It also tells the LLM to use the tools to fetch the lists of aliases and mappings, which help it construct the queries.
I have added a companyId because it is always present in my data; you may want to remove it.

In [3]:
def system_prompt(company_id, elasticsearch_url):
    example1 = {
      "query": {
        "bool": {
          "must": [
            {
              "term": {
                "companyId": 11529601
              }
            },
            {
              "range": {
                "tsStart": {
                  "gte": 1793842800000,
                  "lt": 1793846400000
                }
              }
            }
          ]
        }
      }
    }
    
    return f'''Your job is to help the user to generate and run elasticsearch queries for
        the company {company_id}. The Elasticsearch cluster is available at: {elasticsearch_url}
        If the user asks for a list of aliases, you can use the get_alias_list function.
        If the user asks for the mapping for a specific alias, you can use the get_alias_mapping function.
        It might be helpful to just provide the fields in the mapping to the user.
        If the user asks you to generate a search query and the company id is present, you must always include the companyId field in the search query.
        If the user asks to search for documents, first you must generate a search body compatible with elasticsearch
        syntax then you can use the search_elasticsearch function and pass in
        the search_body and alias. You can use the mapping to help generate the search_body.
        If the user asks a question about the data, you can use any of the alias names from the get_alias_list
        function and the mapping to help generate the search_body.
        When presenting data to the user, make sure to include the alias name and the index name.
        Never return more than 100 rows of data.
        If the user asks for relative date ranges, the current date time is {datetime.now().strftime("%Y-%m-%d %H:%M:%S")}
        dates in the elasticsearch query should be in epoch ms time.         
        The data coming back from elasticsearch is in json format. please convert this to a simple tabular form
        and include summary data like the total number of hits.
        for example if you are asked to generate a search query for the alias "v2-agent-status" you might return 
        {json.dumps(example1)}
        for example if you are asked to execute a query, or search for records, then you might execute the tool call
        search_elasticsearch with the search_body and alias.
        search_elasticsearch("{elasticsearch_url}", {json.dumps(example1)}, "v2-agent-status")
        '''
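
The prompt asks the LLM to turn relative date ranges into epoch-millisecond bounds and to include companyId when one is set. As a rough sketch of the kind of body it might produce for an "aggregate by field over the last 6 months" request, here is a hand-written builder. The function name `build_aggregation_body`, the 182-day approximation of six months, and the `by_group` bucket name are assumptions for illustration; `tsStart` and `companyId` are the field names used in the example above.

```python
from datetime import datetime, timedelta, timezone


def build_aggregation_body(group_field, company_id=None, now=None, days=182):
    """Build a search body that counts documents per group_field over the last `days` days."""
    now = now or datetime.now(timezone.utc)
    start = now - timedelta(days=days)

    def to_epoch_ms(dt):
        # Elasticsearch range bounds are passed here as epoch milliseconds
        return int(dt.timestamp() * 1000)

    filters = [{"range": {"tsStart": {"gte": to_epoch_ms(start), "lt": to_epoch_ms(now)}}}]
    if company_id:
        # The UI passes the company id as text, so convert it before building the term filter
        filters.insert(0, {"term": {"companyId": int(company_id)}})

    return {
        "size": 0,  # only the aggregation buckets are wanted, not the individual hits
        "query": {"bool": {"must": filters}},
        "aggs": {"by_group": {"terms": {"field": group_field, "size": 100}}},
    }


body = build_aggregation_body("agentId", company_id="11529601")
```

Passing a timezone-aware `now` keeps the epoch conversion unambiguous; the prompt itself uses the local time of the machine running the notebook.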


Elasticsearch tool functions
These are the functions that query Elasticsearch.

In [4]:

def get_elasticsearch_aliases_and_mappings(elasticsearch_url):
    """
    Fetches all aliases and their mappings from an Elasticsearch cluster.
    Returns an array of objects with {alias, mapping} for each alias.
    If an alias points to multiple indices, uses the first index's mapping.
    
    Args:
        elasticsearch_url (str): The base URL of the Elasticsearch cluster 
                                (e.g., 'http://localhost:9200')
    
    Returns:
        list: An array of dictionaries, each containing:
              {
                  'alias': 'alias_name',
                  'mapping': {...}  # mapping from the first index this alias points to
              }
    """
    global alias_and_mappinglist
    
    # Lazy return if already populated
    if alias_and_mappinglist != []:
        return alias_and_mappinglist
    
    try:
        # Remove trailing slash if present
        base_url = elasticsearch_url.rstrip('/')
        
        # Get all aliases
        aliases_url = f"{base_url}/_aliases"
        aliases_response = requests.get(aliases_url, timeout=10)
        aliases_response.raise_for_status()
        aliases_data = aliases_response.json()
        
        # Get all mappings
        mappings_url = f"{base_url}/_mapping"
        mappings_response = requests.get(mappings_url, timeout=10)
        mappings_response.raise_for_status()
        mappings_data = mappings_response.json()
        
        # Build a map: alias_name -> first_index_it_appears_in
        alias_to_index = {}
        
        # Iterate through all indices
        for index_name, index_data in aliases_data.items():
            # Check if this index has aliases
            if 'aliases' in index_data and index_data['aliases']:
                # For each alias on this index
                for alias_name in index_data['aliases'].keys():
                    # Only store the first index we encounter for this alias
                    if alias_name not in alias_to_index:
                        alias_to_index[alias_name] = index_name
        
        # Build result array: for each alias, get its mapping from the first index
        alias_and_mappinglist = []
        for alias_name, first_index in alias_to_index.items():
            # Get the mapping for the first index this alias points to
            if first_index in mappings_data:
                mapping = mappings_data[first_index].get('mappings', {})
                alias_and_mappinglist.append({
                    'alias': alias_name,
                    'mapping': mapping
                })
        
        return alias_and_mappinglist
    
    except requests.exceptions.RequestException as e:
        return [{
            'error': f"Failed to connect to Elasticsearch: {str(e)}",
            'alias': None,
            'mapping': None
        }]
    except json.JSONDecodeError as e:
        return [{
            'error': f"Failed to parse response: {str(e)}",
            'alias': None,
            'mapping': None
        }]

# Example usage:
# result = get_elasticsearch_aliases_and_mappings('http://localhost:9200')
# print(json.dumps(result, indent=2))

def search_elasticsearch(elasticsearch_url, search_body, index_or_alias=None):
    """
    Executes a search query against Elasticsearch.
    
    Args:
        elasticsearch_url (str): The base URL of the Elasticsearch cluster 
                                (e.g., 'http://localhost:9200')
        search_body (dict or str): The search query body as a dictionary or JSON string
                                   (e.g., {'query': {'match': {'field': 'value'}}})
        index_or_alias (str, optional): The index or alias to search. If None, searches all indices.
    
    Returns:
        dict: The JSON response from Elasticsearch containing search results
    """
    try:
        # Remove trailing slash if present
        base_url = elasticsearch_url.rstrip('/')
        
        # Build the search URL
        if index_or_alias:
            search_url = f"{base_url}/{index_or_alias}/_search"
        else:
            search_url = f"{base_url}/_search"
        
        # Convert search_body to dict if it's a string
        if isinstance(search_body, str):
            search_body = json.loads(search_body)
        
        # Set headers for JSON content
        headers = {'Content-Type': 'application/json'}
        
        # Execute the search request
        response = requests.post(
            search_url,
            json=search_body,
            headers=headers,
            timeout=30
        )
        response.raise_for_status()
        
        return response.json()
    
    except requests.exceptions.RequestException as e:
        return {
            'error': f"Failed to execute search: {str(e)}",
            'status_code': getattr(e.response, 'status_code', None) if hasattr(e, 'response') else None
        }
    except json.JSONDecodeError as e:
        return {
            'error': f"Failed to parse search body or response: {str(e)}"
        }

# Example usage:
# search_query = {
#     "query": {
#         "match": {
#             "field_name": "search term"
#         }
#     },
#     "size": 10
# }
# result = search_elasticsearch('http://localhost:9200', search_query, index_or_alias='my_index')
# print(json.dumps(result, indent=2))
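
The system prompt tells the LLM never to return more than 100 rows, but nothing in `search_elasticsearch` enforces that. One possible guard, sketched here under the assumption that only the top-level `size` needs limiting, is to clamp it before the request is sent. `cap_result_size` and `MAX_ROWS` are hypothetical names and are not part of the notebook.

```python
import copy

MAX_ROWS = 100  # mirrors the "never return more than 100 rows" rule in the system prompt


def cap_result_size(search_body, max_rows=MAX_ROWS):
    """Return a copy of search_body whose top-level "size" does not exceed max_rows."""
    capped = copy.deepcopy(search_body)  # leave the caller's dict untouched
    requested = capped.get("size")
    if isinstance(requested, int) and requested > max_rows:
        capped["size"] = max_rows
    # When "size" is absent, Elasticsearch falls back to its default of 10 hits,
    # which is already under the limit, so nothing is added.
    return capped


print(cap_result_size({"query": {"match_all": {}}, "size": 5000})["size"])
```

This sketch does not touch bucket sizes inside `aggs`, so a large terms aggregation would still come back in full.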


Test the Elasticsearch functions before proceeding

In [5]:
get_elasticsearch_aliases_and_mappings("http://golda.eng.tcs.local:9200/")

[{'alias': '.kibana-event-log-7.9.3',
  'mapping': {'dynamic': 'false',
   'properties': {'@timestamp': {'type': 'date'},
    'ecs': {'properties': {'version': {'type': 'keyword',
       'ignore_above': 1024}}},
    'error': {'properties': {'message': {'type': 'text', 'norms': False}}},
    'event': {'properties': {'action': {'type': 'keyword',
       'ignore_above': 1024},
      'duration': {'type': 'long'},
      'end': {'type': 'date'},
      'outcome': {'type': 'keyword', 'ignore_above': 1024},
      'provider': {'type': 'keyword', 'ignore_above': 1024},
      'start': {'type': 'date'}}},
    'kibana': {'properties': {'alerting': {'properties': {'instance_id': {'type': 'keyword',
         'ignore_above': 1024}}},
      'saved_objects': {'type': 'nested',
       'properties': {'id': {'type': 'keyword', 'ignore_above': 1024},
        'namespace': {'type': 'keyword', 'ignore_above': 1024},
        'rel': {'type': 'keyword', 'ignore_above': 1024},
        'type': {'type': 'keyword', 'i
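
The mapping output above is deeply nested, which makes it hard to scan when all you want is the field list. A small recursive helper along these lines could flatten it into dotted field paths. `list_mapping_fields` is a hypothetical helper, and it assumes the typeless mapping shape shown above, where `properties` sits at the top level.

```python
def list_mapping_fields(mapping, prefix=""):
    """Return a dict of dotted field path -> field type for an Elasticsearch mapping."""
    fields = {}
    for name, spec in mapping.get("properties", {}).items():
        path = f"{prefix}{name}"
        if "properties" in spec:
            # Object and nested fields carry their own "properties"; recurse into them
            fields.update(list_mapping_fields(spec, prefix=f"{path}."))
        else:
            fields[path] = spec.get("type", "object")
    return fields


sample_mapping = {
    "properties": {
        "@timestamp": {"type": "date"},
        "event": {"properties": {"duration": {"type": "long"}, "start": {"type": "date"}}},
    }
}
for field_path, field_type in sorted(list_mapping_fields(sample_mapping).items()):
    print(f"{field_path}: {field_type}")
```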

Tool functions
These are the tool glue functions and the tool metadata passed to the LLM.

In [6]:
def get_alias_list():
    if alias_and_mappinglist != []:
        alias_list = []
        for alias in alias_and_mappinglist:
            alias_list.append(alias['alias'])
        return alias_list
    return None

def get_alias_mapping(aliasName):
    if alias_and_mappinglist != []:
        for alias in alias_and_mappinglist:
            if alias['alias'] == aliasName:
                return alias['mapping']
    return None

def handle_tool_calls(message):
    responses = []
    for tool_call in message.tool_calls:
        if tool_call.function.name == "get_alias_list":
            # This tool reads the cached alias list, so its arguments are not needed
            alias_list = get_alias_list()
            # Tool response content must be a string (JSON stringified)
            responses.append({
                "role": "tool",
                "content": json.dumps(alias_list) if alias_list is not None else "null",
                "tool_call_id": tool_call.id
            })
        elif tool_call.function.name == "get_alias_mapping":
            arguments = json.loads(tool_call.function.arguments)
            alias_name = arguments.get('alias_name')
            mapping = get_alias_mapping(alias_name)
            # Tool response content must be a string (JSON stringified)
            responses.append({
                "role": "tool",
                "content": json.dumps(mapping) if mapping is not None else "null",
                "tool_call_id": tool_call.id
            })
        elif tool_call.function.name == "search_elasticsearch":
            arguments = json.loads(tool_call.function.arguments)
            elasticsearch_url = arguments.get('elasticsearch_url')
            search_body = arguments.get('search_body')
            index_or_alias = arguments.get('index_or_alias')
            # Call the actual search_elasticsearch function
            search_result = search_elasticsearch(elasticsearch_url, search_body, index_or_alias)
            # Tool response content must be a string (JSON stringified)
            responses.append({
                "role": "tool",
                "content": json.dumps(search_result) if search_result is not None else "null",
                "tool_call_id": tool_call.id
            })
    return responses

get_alias_list_function = {
    "name": "get_alias_list",
    "description": "Get a list of all the aliases in the Elasticsearch cluster",
    "parameters": {
        "type": "object",
        "properties": {
            "elasticsearch_url": {
                "type": "string",
                "description": "The URL of the Elasticsearch cluster",
            },
        },
        "required": ["elasticsearch_url"],
        "additionalProperties": False
    }
}

get_alias_mapping_function = {
    "name": "get_alias_mapping",
    "description": "Given the name of an alias get the mapping of the index it points to",
    "parameters": {
        "type": "object",
        "properties": {
            "alias_name": {
                "type": "string",
                "description": "The name of the alias to get the mapping for",
            },
        },
        "required": ["alias_name"],
        "additionalProperties": False
    }
}

search_function = {
    "name": "search_elasticsearch",
    "description": "Search for documents in an Elasticsearch cluster",
    "parameters": {
        "type": "object",
        "properties": {
            "elasticsearch_url": {
                "type": "string",
                "description": "The URL of the Elasticsearch cluster",
            },
            "search_body": {
                "type": "object",
                "description": "The search query body",
            },
            "index_or_alias": {
                "type": "string",
                "description": "The index or alias to search",  
            },
        },
        "required": ["elasticsearch_url", "search_body", "index_or_alias"],
        "additionalProperties": False
    }
}

tools = [
    {"type": "function", "function": get_alias_list_function},
    {"type": "function", "function": get_alias_mapping_function},
    {"type": "function", "function": search_function}]
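
The if/elif chain in `handle_tool_calls` produces no reply when the model names a tool that is not listed, and the chat API generally expects one tool message per tool call. As an alternative sketch, dispatch can go through a dictionary with an explicit fallback. `dispatch_tool_call`, `demo_handlers`, and the `SimpleNamespace` stand-in for the SDK's tool-call object are all illustrative assumptions.

```python
import json
from types import SimpleNamespace


def dispatch_tool_call(tool_call, handlers):
    """Run the handler registered for tool_call and wrap its result as a tool message."""
    handler = handlers.get(tool_call.function.name)
    if handler is None:
        result = {"error": f"Unknown tool: {tool_call.function.name}"}
    else:
        arguments = json.loads(tool_call.function.arguments or "{}")
        result = handler(**arguments)
    # Tool message content must be a string, so the result is JSON-encoded
    return {"role": "tool", "content": json.dumps(result), "tool_call_id": tool_call.id}


# Stand-in handlers so the sketch runs without an Elasticsearch cluster
demo_handlers = {
    "get_alias_list": lambda **_ignored: ["v2-agent-status"],
    "get_alias_mapping": lambda alias_name: {"properties": {"companyId": {"type": "long"}}},
}

fake_call = SimpleNamespace(
    id="call_1",
    function=SimpleNamespace(
        name="get_alias_mapping",
        arguments='{"alias_name": "v2-agent-status"}',
    ),
)
reply = dispatch_tool_call(fake_call, demo_handlers)
print(reply["content"])
```

In the notebook, the dictionary values would be the real `get_alias_list`, `get_alias_mapping`, and `search_elasticsearch` functions; `get_alias_list` would need to tolerate the `elasticsearch_url` argument that its schema marks as required.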

In [7]:
search_elasticsearch("http://golda.eng.tcs.local:9200/", {"query": {"match": {"companyId": "11529601"}}}, "v2-agent-status")

{'took': 10,
 'timed_out': False,
 '_shards': {'total': 4, 'successful': 4, 'skipped': 0, 'failed': 0},
 'hits': {'total': {'value': 23, 'relation': 'eq'},
  'max_score': 1.0,
  'hits': [{'_index': 'v2-agent-status-000001',
    '_type': 'agent-status-change',
    '_id': 'f8bae8a6-459d-5c41-959a-80c4939fb751',
    '_score': 1.0,
    '_source': {'tsStart': 1761303058588,
     'tsEnd': 1761303058872,
     'timeInOldState': 284,
     'agentId': 103678201,
     'agentName': 'agent1 agent1',
     'oldState': 101,
     'oldStateName': 'Unavailable',
     'oldStateGroup': 7,
     'oldStateGroupName': 'Unavailable',
     'newState': 0,
     'newStateName': 'Available',
     'newStateGroup': 0,
     'newStateGroupName': 'Available',
     'companyId': 11529601,
     'resellerId': 1,
     'timeInOldStateText': '0.284s'}},
   {'_index': 'v2-agent-status-000001',
    '_type': 'agent-status-change',
    '_id': '2a675fd9-73fd-5129-9b63-715e2b96ff43',
    '_score': 1.0,
    '_source': {'tsStart': 17613
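
In this notebook the LLM converts the raw response above into a table. For comparison, or for checking its output, the same conversion can be sketched locally. `hits_to_markdown` is a hypothetical helper; it assumes the response shape shown above, with `hits.total.value` and each hit carrying a `_source`.

```python
def hits_to_markdown(search_result, columns, max_rows=100):
    """Return (total_hits, markdown_table) for the chosen _source columns."""
    hits = search_result.get("hits", {})
    total = hits.get("total", {})
    # Some Elasticsearch versions report the total as a plain number instead of a dict
    total_value = total.get("value") if isinstance(total, dict) else total

    lines = [
        "| " + " | ".join(columns) + " |",
        "| " + " | ".join("---" for _ in columns) + " |",
    ]
    for hit in hits.get("hits", [])[:max_rows]:
        source = hit.get("_source", {})
        lines.append("| " + " | ".join(str(source.get(col, "")) for col in columns) + " |")
    return total_value, "\n".join(lines)


sample_result = {
    "hits": {
        "total": {"value": 23, "relation": "eq"},
        "hits": [
            {"_source": {"agentName": "agent1 agent1", "newStateName": "Available"}},
        ],
    }
}
total_hits, table = hits_to_markdown(sample_result, ["agentName", "newStateName"])
print(f"Total hits: {total_hits}")
print(table)
```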

Chat function
This adds the tool definitions to the request and converts any non-string message content to a JSON string.

In [8]:
def chat(company_id, es_url, message, history):
    # Ensure all history messages have content as a string
    processed_history = []
    for h in history:
        content = h.get("content", "")
        # If content is not a string, convert it to string
        if not isinstance(content, str):
            content = "" if content is None else json.dumps(content)
        processed_history.append({"role": h["role"], "content": content})
    
    messages = [{"role": "system", "content": system_prompt(company_id, es_url)}] + processed_history + [{"role": "user", "content": message}]
    response = openai.chat.completions.create(model=MODEL, messages=messages, tools=tools)

    while response.choices[0].finish_reason=="tool_calls":
        message_obj = response.choices[0].message
        responses = handle_tool_calls(message_obj)
        # Convert message object to dict format for API
        message_dict = message_obj.model_dump() if hasattr(message_obj, 'model_dump') else {
            "role": message_obj.role,
            "content": message_obj.content or "",
            "tool_calls": [{"id": tc.id, "type": tc.type, "function": {"name": tc.function.name, "arguments": tc.function.arguments}} for tc in message_obj.tool_calls] if message_obj.tool_calls else []
        }
        # Ensure content is a string, not None
        if message_dict.get("content") is None:
            message_dict["content"] = ""
        messages.append(message_dict)
        messages.extend(responses)
        response = openai.chat.completions.create(model=MODEL, messages=messages, tools=tools)
    
    return response.choices[0].message.content
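
The `while` loop in `chat` keeps calling the model for as long as it asks for tools, so a model that never stops requesting tools would loop indefinitely. A bounded variant might look like the sketch below. `run_tool_loop`, `max_rounds`, and the fake client used to exercise it are assumptions for illustration; in the notebook, `create_completion` would wrap `openai.chat.completions.create` and `handle_tools` would be `handle_tool_calls`.

```python
from types import SimpleNamespace


def run_tool_loop(create_completion, handle_tools, messages, max_rounds=5):
    """Call the model until it stops requesting tools or max_rounds is reached."""
    response = create_completion(messages)
    rounds = 0
    while response.choices[0].finish_reason == "tool_calls":
        if rounds >= max_rounds:
            return f"Stopped after {max_rounds} tool rounds without a final answer."
        assistant_message = response.choices[0].message
        messages.append(assistant_message.model_dump())
        messages.extend(handle_tools(assistant_message))
        response = create_completion(messages)
        rounds += 1
    return response.choices[0].message.content or ""


class FakeMessage:
    """Stand-in for the SDK message object, for illustration only."""

    def __init__(self, content=None):
        self.content = content

    def model_dump(self):
        return {"role": "assistant", "content": self.content or ""}


def make_fake_client(tool_rounds):
    """Return (create_completion, calls); the fake asks for tools tool_rounds times, then answers."""
    calls = []

    def create_completion(messages):
        calls.append(len(messages))
        if len(calls) <= tool_rounds:
            choice = SimpleNamespace(finish_reason="tool_calls", message=FakeMessage())
        else:
            choice = SimpleNamespace(finish_reason="stop", message=FakeMessage("done"))
        return SimpleNamespace(choices=[choice])

    return create_completion, calls


def fake_handle_tools(message):
    # One canned tool reply per round
    return [{"role": "tool", "content": "null", "tool_call_id": "call_1"}]


create, calls = make_fake_client(tool_rounds=2)
print(run_tool_loop(create, fake_handle_tools, []))
```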

Gradio
Finally, the UI that lets you chat.

In [9]:
# Create Gradio interface
history = []

with gr.Blocks() as demo:
    gr.Markdown("# Elasticsearch Explorer")
    gr.Markdown("Enter your company ID and Elasticsearch URL, then ask questions in the chat.")
    
    with gr.Row():
        with gr.Column(scale=1):
            company_id_input = gr.Textbox(
                label="Company ID",
                placeholder="Enter company ID",
                value=""
            )
            elasticsearch_url_input = gr.Textbox(
                label="Elasticsearch URL",
                placeholder="http://localhost:9200",
                value="http://golda.eng.tcs.local:9200/"
            )
    
    with gr.Row():
        chatbot = gr.Chatbot(
            label="Chat",
            height=400,
            type="messages"
        )
    
    with gr.Row():
        msg = gr.Textbox(
            label="Your Question",
            placeholder="Type your question here...",
            scale=4
        )
        submit_btn = gr.Button("Submit", scale=1)
    
    def respond(company_id, es_url, message, chat_history):
        if not es_url:
            return chat_history, "Please provide an Elasticsearch URL."
        
        if not message:
            return chat_history, ""
        
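        # Load aliases and mappings (will use cached version if already loaded).
        # The function caches successful results itself. Error results are not stored,
        # so a failed connection is retried on the next message instead of being cached.
        loaded = get_elasticsearch_aliases_and_mappings(es_url)
        if loaded and loaded[0].get("error"):
            chat_history = (chat_history or []) + [
                {"role": "user", "content": message},
                {"role": "assistant", "content": loaded[0]["error"]},
            ]
            return chat_history, ""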
        
        # With type='messages', chat_history is already in OpenAI format
        # Format: [{"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}]
        openai_history = chat_history if chat_history else []
        
        # Get response
        response = chat(company_id, es_url, message, openai_history)
        
        # Update chat history (Gradio messages format: list of dicts with 'role' and 'content')
        chat_history.append({"role": "user", "content": message})
        chat_history.append({"role": "assistant", "content": response})
        
        return chat_history, ""
    
    submit_btn.click(
        respond,
        inputs=[company_id_input, elasticsearch_url_input, msg, chatbot],
        outputs=[chatbot, msg]
    )
    
    msg.submit(
        respond,
        inputs=[company_id_input, elasticsearch_url_input, msg, chatbot],
        outputs=[chatbot, msg]
    )

demo.launch(server_name="0.0.0.0")


* Running on local URL:  http://0.0.0.0:7860
* To create a public link, set `share=True` in `launch()`.


