# ELI Intelligent Search APIs

### What is Retrieval Augmented Generation (RAG)?

Retrieval Augmented Generation (RAG) is a framework that combines the generative capabilities of large language models (LLMs) with external data sources to generate more accurate and up-to-date responses.

RAG helps overcome limitations of standalone LLMs, such as lack of up-to-date information and inability to provide sources for responses.
By grounding the LLM in external data, RAG can reduce hallucinations and increase the reliability of answers.
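
The grounding idea can be sketched in a few lines. This is a toy illustration under stated assumptions, not how ELI implements it: the word-overlap `retrieve` function, the `build_grounded_prompt` helper, and the sample passages are all invented for the example.

```python
import re


def words(text: str) -> set[str]:
    """Lower-case word set, ignoring punctuation."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(query: str, passages: list[str], top_k: int = 2) -> list[str]:
    """Toy retriever: rank passages by the number of words shared with the query."""
    query_words = words(query)
    ranked = sorted(passages, key=lambda p: len(query_words & words(p)), reverse=True)
    return ranked[:top_k]


def build_grounded_prompt(query: str, sources: list[str]) -> str:
    """Number the retrieved passages so that an answer can cite them as [1], [2], ..."""
    numbered = "\n".join(f"[{i}] {text}" for i, text in enumerate(sources, start=1))
    return (
        "Answer the question using only the sources below and cite them.\n\n"
        f"Sources:\n{numbered}\n\nQuestion: {query}"
    )


# Hypothetical passages, for illustration only.
passages = [
    "DRX lets a UE monitor the paging channel discontinuously.",
    "The vector store holds embedded document sections.",
    "Paging is used to reach a UE in idle state.",
]
query = "What does DRX do for paging?"
prompt = build_grounded_prompt(query, retrieve(query, passages))
print(prompt)
```

The prompt built this way is what a generative model would receive in place of the bare question, which is why its answer can point back to numbered sources.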

### Intelligent Search for Telecom

Intelligent Search for Telecom (ELI Semantic) is a Retrieval Augmented Generation (RAG) solution enhanced with Telecom-adapted language models (LMs). It covers various data sources, including CPI, 3GPP, and Wikipedia.
It provides an end-to-end RAG system with retrieval, reranking, and generation stages.

### ELI Gateway

ELI Gateway is the entry point to the ELI platform and provides API access to the deployed language models and use cases.

> These APIs are meant for exploration purposes. If you are targeting a production deployment, contact the [ELI Team](https://ericsson.sharepoint.com/sites/GAIA-Portal/SitePages/ELI--Ericsson-Language-Intelligence.aspx).

ELI provides locally deployed models where no data leaves to external services, but there is also support for external models such as those from Azure OpenAI and AWS Bedrock.

### Install requirements

Install the requirements for this guide as follows.

If you have not already done so, it is good practice to create a Python virtual environment to avoid conflicts with your system dependencies.

You can follow [this guide](https://docs.python.org/3/tutorial/venv.html) to create and activate an environment.

In [2]:
%%sh

pip install --no-cache-dir -q -r requirements.txt

Couldn't find program: 'sh'


### ELI API Key

To use the ELI APIs, you will need to get your API Key as follows:

1. Register or login at https://gateway.eli.gaia.gic.ericsson.se using your Ericsson account.
2. Copy the API key from the user [Profile](https://gateway.eli.gaia.gic.ericsson.se/profile) page.
3. Save the API key to a file called **.eli-key** in the same path as this notebook, so that it can be loaded directly as shown below.

DO NOT SHARE YOUR **PERSONAL** API KEY with others!!

As a good practice, do not put the API key in your code or notebooks.

In [3]:
# Load the API key from the .eli-key file (reminder: do not hard-code keys in notebooks/code)
with open(".eli-key", encoding="utf-8") as key_file:
    ELI_API_KEY = key_file.read().strip()

# Or, for quick local experiments only, paste the key directly
# ELI_API_KEY = "<your-api-key>"

# Set ELI Gateway URL
ELI_API_URL = "https://gateway.eli.gaia.gic.ericsson.se"
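
If you prefer not to depend on a single location for the key, a small loader can check an environment variable first and fall back to the key file. This is only a sketch: the `load_eli_api_key` helper and the `ELI_API_KEY` environment variable name are assumptions made for this example, not part of the ELI platform.

```python
import os
from pathlib import Path


def load_eli_api_key(key_file: str = ".eli-key", env_var: str = "ELI_API_KEY") -> str:
    """Return the API key from an environment variable or, failing that, a key file."""
    # The environment variable name is a hypothetical convention for this sketch.
    key = os.environ.get(env_var, "").strip()
    if key:
        return key
    path = Path(key_file)
    if path.is_file():
        return path.read_text(encoding="utf-8").strip()
    raise FileNotFoundError(
        f"No API key found: set {env_var} or create {key_file} next to the notebook."
    )


# Usage (raises FileNotFoundError if neither source is available):
# ELI_API_KEY = load_eli_api_key()
```

Failing loudly when no key is found avoids sending requests with an empty bearer token and then debugging an authorization error.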

### List Generative LLMs on ELI

This API endpoint responds with a list of the names of the generative LLMs deployed on ELI.

Any of the model names can be used in later requests.

In [4]:
import warnings
import requests

warnings.filterwarnings("ignore")


def eli_list_generative_llms():
    """Make API call to list the generative LLMs available in ELI."""

    try:
        headers = {
            "Authorization": f"Bearer {ELI_API_KEY}",
            "Content-Type": "application/json",
        }
        response = requests.get(f"{ELI_API_URL}/api/v1/llm/list", headers=headers, timeout=(300, 300), verify=False)
        if response.ok:
            return response.json().get("models", [])
        print("Failed to get response.", response.text)
    except Exception as ex:
        print(f"An error occurred: {ex}.")

In [5]:
# List LLMs

print(eli_list_generative_llms())

['deepseekr1-14b', 'llama3.1-8b', 'mistral-12b', 'phi4-14b', 'qwen2.5-7b', 'qwen3-8b']


Any of the above models can be used in the API requests below.

> Note that if you provide an unknown model name in a request, then the default model will be used to respond.

> As of June 2024, the default generative LLM in ELI is `LLaMA3-8B`
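
Because an unknown model name silently falls back to the default, it can help to check the name on the client side before sending a request. The sketch below assumes exact, case-sensitive matching against the listed names; whether the gateway itself matches names this strictly is not stated in this guide. `resolve_model` is a hypothetical helper.

```python
def resolve_model(requested: str, available: list[str]) -> str:
    """Return `requested` if it is in the deployed list; otherwise raise with the valid names."""
    if requested in available:
        return requested
    raise ValueError(
        f"Unknown model {requested!r}; choose one of: {', '.join(sorted(available))}"
    )


# Names copied from the listing output above.
available_models = [
    "deepseekr1-14b", "llama3.1-8b", "mistral-12b",
    "phi4-14b", "qwen2.5-7b", "qwen3-8b",
]
print(resolve_model("mistral-12b", available_models))
```

In the notebook, `available_models` would normally come from `eli_list_generative_llms()` rather than a hard-coded list.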

### ELI Intelligent Search APIs

Telecom LM-based retrieval/reranking + generation. The search part is performed in two stages, followed by answer generation:

- Stage-I: retrieve relevant sections from the vector store.
- Stage-II: (optional) rerank the results with a cross-encoder model.
- Finally, a generative model is used to compose an answer to the query.
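
The two search stages can be pictured with a toy pipeline. The scorers here are deliberate stand-ins and not what ELI runs: word overlap replaces vector similarity in Stage-I, and overlap density replaces the cross-encoder score in Stage-II. All function names and sample sections are assumptions for the example.

```python
import re


def tokens(text: str) -> set[str]:
    """Lower-case word set, ignoring punctuation."""
    return set(re.findall(r"\w+", text.lower()))


def stage1_retrieve(query: str, sections: list[str], top_k: int) -> list[str]:
    """Stand-in for vector search: score = number of words shared with the query."""
    q = tokens(query)
    ranked = sorted(sections, key=lambda s: len(q & tokens(s)), reverse=True)
    return ranked[:top_k]


def stage2_rerank(query: str, candidates: list[str]) -> list[str]:
    """Stand-in for a cross encoder: score = shared words / words in the section."""
    q = tokens(query)

    def density(section: str) -> float:
        section_tokens = tokens(section)
        return len(q & section_tokens) / max(len(section_tokens), 1)

    return sorted(candidates, key=density, reverse=True)


# Hypothetical sections, for illustration only.
sections = [
    "The DRX cycle length controls how often the UE wakes up",
    "Maximum DRX cycle length",
    "Paging cycle and DRX interact in idle mode",
    "Vector stores index embeddings",
]
query = "DRX cycle length"
candidates = stage1_retrieve(query, sections, top_k=3)  # cheap, broad recall
reranked = stage2_rerank(query, candidates)             # finer ordering on few items
print(reranked)
```

The point of the split is cost: the first stage scans the whole collection with a cheap score, while the second applies a more precise score only to the `top_k` survivors.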

**Request Structure**

The request payload sent to ELI Gateway is structured as follows: 
```json
{
  "query": "",
  "chat_history": [],
  "category": "",
  "cpi_library_id": "",
  "cpi_library_title": "",
  "index_name": "",
  "top_k": 10,
  "rerank": true,
  "hybrid_alpha": 1,
  "generate_answer": true,
  "model": "LLaMA3-8b",
  "max_new_tokens": 512,
  "temperature": 0.01,
  "client": "api",
  "stream": false,
  "stream_batch_tokens": 10,
  "rag_method": "v2"
}
```

The `chat_history` messages use the following format:
```python
chat_history = [
    {"role": "user", "content" : "<message>"},
    {"role": "assistant", "content" : "<message>"},
]
```

Use an asterisk (`*`) as the `index_name` to query the default index.
If you are not sure what an attribute means, use its default value.
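
One way to follow that advice is to start from the documented values and override only the attributes you understand. The helper below is a sketch: `DEFAULT_RAG_PAYLOAD` simply copies the example request shown above, and `build_rag_payload` is a hypothetical convenience function, not part of any ELI client library.

```python
import copy

# Values copied from the example request structure in this guide.
DEFAULT_RAG_PAYLOAD = {
    "query": "",
    "chat_history": [],
    "category": "",
    "cpi_library_id": "",
    "cpi_library_title": "",
    "index_name": "",
    "top_k": 10,
    "rerank": True,
    "hybrid_alpha": 1,
    "generate_answer": True,
    "model": "LLaMA3-8b",
    "max_new_tokens": 512,
    "temperature": 0.01,
    "client": "api",
    "stream": False,
    "stream_batch_tokens": 10,
    "rag_method": "v2",
}


def build_rag_payload(query: str, **overrides) -> dict:
    """Return a request payload with documented defaults plus explicit overrides."""
    unknown = set(overrides) - set(DEFAULT_RAG_PAYLOAD)
    if unknown:
        # Catch misspelled attribute names before they reach the gateway.
        raise KeyError(f"Unknown payload attributes: {sorted(unknown)}")
    payload = copy.deepcopy(DEFAULT_RAG_PAYLOAD)
    payload.update(overrides)
    payload["query"] = query
    return payload


payload = build_rag_payload("What is DRX?", index_name="*", top_k=5)
print(payload["index_name"], payload["top_k"], payload["rerank"])
```

Rejecting unknown keys is a design choice for this sketch; it turns a typo such as `cpi_libary_id` into an immediate error instead of a silently ignored filter.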


**Response Structure**

The response from ELI Gateway is structured as follows:

```json
{
  "answer": "",
  "evidences": [
    {
      "hit_id": "1",
      "rec_id": "",
      "category": "CPI Store",
      "doc_name": "",
      "doc_url": "",
      "doc_title": "",
      "doc_subtitle": "",
      "section_title": "",
      "doc_text": "",
      "doc_type": "",
      "doc_page_num": "",
      "cpi_folder_id": "",
      "cpi_folder_title": "",
      "cpi_library_id": "",
      "cpi_library_title": "",
      "cpi_library_date": "",
      "stage1_rank": 1,
      "stage1_score": 0.0,
      "stage2_rank": 1,
      "stage2_score": 0.0
    }
  ],
  "message": "",
  "time_elapsed": "",
  "status": "success"
}
```
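
When working with a response, it is often enough to look at a compact view of the evidences. The sketch below orders them by `stage2_rank` when it is present and by `stage1_rank` otherwise. It assumes that ranks start at 1 and that a lower rank is better, which matches the example above but is not confirmed by this guide; `summarize_evidences` is a hypothetical helper.

```python
def summarize_evidences(response: dict) -> list[dict]:
    """Return a compact, rank-ordered view of the evidences in a response."""

    def rank(evidence: dict) -> float:
        # Prefer the reranker's rank; fall back to the retrieval rank.
        return evidence.get("stage2_rank") or evidence.get("stage1_rank") or float("inf")

    evidences = sorted(response.get("evidences") or [], key=rank)
    return [
        {
            "hit_id": ev.get("hit_id"),
            "title": ev.get("doc_title") or ev.get("doc_name"),
            "url": ev.get("doc_url"),
            "rank": rank(ev),
        }
        for ev in evidences
    ]


# Hand-written sample in the documented shape (not real ELI output).
sample_response = {
    "answer": "",
    "evidences": [
        {"hit_id": "1", "doc_title": "A", "doc_url": "u1", "stage1_rank": 1, "stage2_rank": 2},
        {"hit_id": "2", "doc_title": "", "doc_name": "b.pdf", "doc_url": "u2",
         "stage1_rank": 2, "stage2_rank": 1},
    ],
    "status": "success",
}
summary = summarize_evidences(sample_response)
print(summary)
```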

In [19]:
import json
import warnings
import requests

warnings.filterwarnings("ignore")


def eli_semantic_categories(index: str):
    """Make API call to list the data source categories in ELI Semantic."""

    try:
        headers = {
            "Authorization": f"Bearer {ELI_API_KEY}",
            "Content-Type": "application/json",
        }
        response = requests.get(
            f"{ELI_API_URL}/api/v1/rag/categories/{index}",
            headers=headers,
            timeout=(300, 300),
            verify=False,
        )
        if response.ok:
            return response.json().get("categories", [])
        print("Failed to get response.", response.text)
    except Exception as ex:
        print(f"An error occurred: {ex}.")


def eli_semantic_cpi_library_identities(index: str):
    """Make API call to list the cpi library identities in ELI Semantic."""

    try:
        headers = {
            "Authorization": f"Bearer {ELI_API_KEY}",
            "Content-Type": "application/json",
        }
        response = requests.get(
            f"{ELI_API_URL}/api/v1/rag/cpi/library_identities/{index}",
            headers=headers,
            timeout=(300, 300),
            verify=False,
        )
        if response.ok:
            return response.json().get("categories", [])
        print("Failed to get response.", response.text)
    except Exception as ex:
        print(f"An error occurred: {ex}.")


def eli_semantic_cpi_library_titles(index: str):
    """Make API call to list the cpi library titles in ELI Semantic."""

    try:
        headers = {
            "Authorization": f"Bearer {ELI_API_KEY}",
            "Content-Type": "application/json",
        }
        response = requests.get(
            f"{ELI_API_URL}/api/v1/rag/cpi/library_titles/{index}",
            headers=headers,
            timeout=(300, 300),
            verify=False,
        )
        if response.ok:
            return response.json().get("categories", [])
        print("Failed to get response.", response.text)
    except Exception as ex:
        print(f"An error occurred: {ex}.")

def eli_semantic_search_filters(index: str):
    """Make API call to list the search filters in ELI Semantic."""

    try:
        headers = {
            "Authorization": f"Bearer {ELI_API_KEY}",
            "Content-Type": "application/json",
        }
        response = requests.get(
            f"{ELI_API_URL}/api/v1/rag/search_filters/{index}",
            headers=headers,
            timeout=(300, 300),
            verify=False,
        )
        if response.ok:
            return response.json().get("search_filters", [])
        print("Failed to get response.", response.text)
    except Exception as ex:
        print(f"An error occurred: {ex}.")


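def eli_semantic_rag(
    index: str,
    query: str,
    chat_history: list = None,  # None means no previous conversation
    model: str = "mistral",
    category="",  # a category name or a list of names
    cpi_library_id="",  # a library identity or a list of identities
    rerank: bool = True,
    top_k: int = 5,
    generate_answer: bool = True,
):
    """
    Make API call to ELI Semantic endpoint and ask questions on Ericsson internal documents.

    index: the index to search, "*" for the default index
    query: the question to ask
    chat_history: previous user/assistant messages, if any
    category: the category of information source, empty to cover all
    cpi_library_id: the CPI library identity (or list of identities) to filter on
    """

    try:
        payload = {
            "index_name": index,
            "model": model,
            "query": query,
            # Avoid a shared mutable default: build a fresh list when none is given
            "chat_history": chat_history or [],
            "category": category,
            "cpi_library_id": cpi_library_id,
            "rerank": rerank,
            "top_k": top_k,
            "generate_answer": generate_answer,
        }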
        headers = {
            "Authorization": f"Bearer {ELI_API_KEY}",
            "Content-Type": "application/json",
        }
        response = requests.post(
            f"{ELI_API_URL}/api/v1/rag/query",
            json=payload,
            headers=headers,
            timeout=(300, 300),
            verify=False,
        )
        if response.ok:
            return response.json()
        print("Failed to get response.", response.text)
    except Exception as ex:
        print(f"An error occurred: {ex}.")

In [20]:
# Default index on ELI is marked as *.
# This can be changed to other index names if you get information from the ELI Team.
INDEX_NAME = "*"

In [21]:
categories = eli_semantic_categories(index=INDEX_NAME)
cpi_library_identities = eli_semantic_cpi_library_identities(index=INDEX_NAME)
cpi_library_titles = eli_semantic_cpi_library_titles(index=INDEX_NAME)
search_filters = eli_semantic_search_filters(index=INDEX_NAME)

print(f"Categories:\n{categories}")
print()
print(f"CPI Library IDs:\n{cpi_library_identities}")
print()
print(f"CPI Library Titles:\n{cpi_library_titles}")
print()
print(f"Search Filters:\n{search_filters}")
print()
print("end")

Categories:
['All Sources', 'CAL Store - BEAM', 'CAL Store - CBA', 'CAL Store - CEE', 'CAL Store - CNIS Solution', 'CAL Store - CNOM', 'CAL Store - Cloud Core', 'CAL Store - Cloud SDN', 'CAL Store - DM 5GC Small on Ericsson CNIS', 'CAL Store - DSE', 'CAL Store - ENIQ', 'CAL Store - ENM', 'CAL Store - Ericsson CCD', 'CAL Store - Ericsson Charging 22.10, Support', 'CAL Store - Ericsson Charging 22.11, Support', 'CAL Store - Ericsson Charging 22.8, Support', 'CAL Store - Ericsson Charging 22.9, Support', 'CAL Store - Ericsson Mediation', 'CAL Store - Ericsson Operations Manager Cloud Infrastructure [OMC]', 'CAL Store - Ericsson Orchestrator', 'CAL Store - Ericsson Transport Automation Controller', 'CAL Store - Hyperscale Datacenter System', 'CAL Store - IMS', 'CAL Store - Internal Test Libraries', 'CAL Store - MSC', 'CAL Store - Mobile Core', 'CAL Store - NFVI Solution', 'CAL Store - Network Solutions', 'CAL Store - R19', 'CPI Store - CBEV', 'CPI Store - ENIQ Statistics', 'CPI Store - ENM

In [11]:
len(cpi_library_identities)

247

**Search Filtering**

In order to achieve better accuracy, it is very important that you apply the right search filters and narrow down the scope in the API requests to Intelligent Search.

For example, provide a list of CPI library identities that you're interested in, instead of searching all of CPI Store.
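
Before sending a filtered query, it can be worth checking the requested library identities against the list returned by the listing call. The helper below is a sketch; `split_known_unknown` is a hypothetical name, and the sample `available_ids` list is made up for the example (in the notebook it would be `cpi_library_identities`).

```python
def split_known_unknown(requested: list[str], available: list[str]) -> tuple[list[str], list[str]]:
    """Split requested filter values into those that exist and those that do not."""
    available_set = set(available)
    known = [value for value in requested if value in available_set]
    unknown = [value for value in requested if value not in available_set]
    return known, unknown


# Hypothetical available list; the second requested identity is deliberately wrong.
available_ids = ["EN/LZN 745 0028 R43A", "EN/LZN 000 0000 R1A"]
requested_ids = ["EN/LZN 745 0028 R43A", "EN/LZN 999 9999 R9Z"]

known_ids, unknown_ids = split_known_unknown(requested_ids, available_ids)
if unknown_ids:
    print(f"Ignoring unknown library identities: {unknown_ids}")
print(f"Filtering on: {known_ids}")
```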

Now let's run end-to-end ELI Intelligent Search queries that cover the retrieval, reranking, and generation stages.

In [None]:
%%time

query1 = "Tell me all about Ericsson Internal support, and all its libraries associated with it."

response = eli_semantic_rag(
    index=INDEX_NAME,
    category=[],  # category can be empty or a list of items
    query=query1,
    cpi_library_id=['EN/LZN 745 0028 R43A'],  # <- cpi_library_id can be a list of multiple libraries
    generate_answer=True,
)

print(json.dumps(response, indent=2))

{
  "message": "Result can be found in the 'answer' and 'evidences' fields.",
  "status": "success",
  "time_elapsed": "24.5134",
  "rephrased_query": "",
  "answer": "Based on the provided search results, here's a concise summary of Ericsson Internal Support for vBGF 1.42.0:\n\n**Ericsson Internal Support:**\n- **Purpose:** Provides internal software interfaces (commands, files, directories) for troubleshooting the Virtual Border Gateway Function (vBGF).\n- **Intended Users:** Authorized Ericsson personnel.\n- **Security:** Descriptions serve to provide security semantics, stating what can be done by which user group. Unauthorized use relieves Ericsson of support obligations [2].\n- **Assumption:** The vBGF has been installed and configured as specified in the documentation.\n\n**Ericsson Library Explorer (ELEX):**\n- **Purpose:** A tool for browsing Ericsson Customer Product Information (CPI) libraries in a standard web browser.\n- **Compatibility:** Supports latest versions of Chrom


In [13]:
# Get the final answer
query1_answer = response.get("answer", "")
print(query1_answer)

DRX (Discontinuous Reception) is a feature that allows User Equipment (UEs) in CM-IDLE state to reduce power consumption by discontinuously monitoring the paging channel [1][3]. It enables low paging frequency, with up to 2.56 seconds of DRX cycle length [1][3].


#### Including Chat History in RAG Queries

For demonstration purposes, we have used `query1` and `query1_answer` variables to track the previous conversation and we will use those to build chat history.

In a real application, a more robust data structure is needed to track the user conversation and continuously update the chat history.
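
As one possible shape for such a structure, the sketch below keeps messages in the `chat_history` format and drops the oldest turns once a limit is reached. The `Conversation` class and the `max_turns` limit are assumptions for this example; the guide does not state whether ELI imposes any limit on history length.

```python
from dataclasses import dataclass, field


@dataclass
class Conversation:
    """Tracks user/assistant turns in the format expected by `chat_history`."""

    max_turns: int = 5
    messages: list = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.max_turns < 1:
            raise ValueError("max_turns must be at least 1")

    def add_turn(self, user_message: str, assistant_message: str) -> None:
        self.messages.append({"role": "user", "content": user_message})
        self.messages.append({"role": "assistant", "content": assistant_message})
        # Keep only the most recent `max_turns` user/assistant pairs.
        self.messages = self.messages[-2 * self.max_turns:]


conversation = Conversation(max_turns=2)
conversation.add_turn("q1", "a1")
conversation.add_turn("q2", "a2")
conversation.add_turn("q3", "a3")
print(conversation.messages)
```

`conversation.messages` can then be passed as the `chat_history` argument of the next query.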

In [11]:
chat_history = [
    {"role": "user", "content": query1},
    {"role": "assistant", "content": query1_answer},
]

print("Chat History:", json.dumps(chat_history, indent=2))

Chat History: [
  {
    "role": "user",
    "content": "What is DRX?"
  },
  {
    "role": "assistant",
    "content": "DRX stands for Discontinuous Reception, which is a mechanism that allows User Equipment (UE) to monitor PDCCH discontinuously in connected mode, enabling energy savings by reducing the time spent listening for data transmissions [1]."
  }
]


In [12]:
%%time

# Alternative follow-up question: "How does it work?"
query2 = "How is it configured?"

response = eli_semantic_rag(
    index=INDEX_NAME,
    category="CPI Store",
    query=query2,
    chat_history=chat_history,
    cpi_library_id=[]  # <- cpi_library_id can be a list of multiple libraries
)

CPU times: user 11.9 ms, sys: 3.24 ms, total: 15.1 ms
Wall time: 9.13 s


In [13]:
# Get final answer for second round query
query2_answer = response.get("answer", "")
print(query2_answer)

DRX (Discontinuous Reception) configuration includes:

1. **drx-Cycle**: Sets the cycle length for DRX, e.g., `drx-Cycle = long` or `short`.
2. **onDurationTimer**: Defines how long the UE stays awake after a paging occasion.
3. **inactivityTimer**: Stops the drx-InactiveRetransmissionTimer upon leaving RRC_INACTIVE.

These parameters are configured via dedicated signaling (RRC Reconfiguration) and can be found in Ericsson's official product documentation, such as the LTE/NR RAN Node Specifications.


In [14]:
# Extend chat history with the second round conversation
chat_history.append({"role": "user", "content": query2})
chat_history.append({"role": "assistant", "content": query2_answer})

print("Chat History:", json.dumps(chat_history, indent=2))

Chat History: [
  {
    "role": "user",
    "content": "What is DRX?"
  },
  {
    "role": "assistant",
    "content": "DRX stands for Discontinuous Reception, which is a mechanism that allows User Equipment (UE) to monitor PDCCH discontinuously in connected mode, enabling energy savings by reducing the time spent listening for data transmissions [1]."
  },
  {
    "role": "user",
    "content": "How is it configured?"
  },
  {
    "role": "assistant",
    "content": "DRX (Discontinuous Reception) configuration includes:\n\n1. **drx-Cycle**: Sets the cycle length for DRX, e.g., `drx-Cycle = long` or `short`.\n2. **onDurationTimer**: Defines how long the UE stays awake after a paging occasion.\n3. **inactivityTimer**: Stops the drx-InactiveRetransmissionTimer upon leaving RRC_INACTIVE.\n\nThese parameters are configured via dedicated signaling (RRC Reconfiguration) and can be found in Ericsson's official product documentation, such as the LTE/NR RAN Node Specifications."
  }
]


In [15]:
# This Intelligent Search powered conversation can continue

# ...

### Note on Search Filtering

For various reasons you might sometimes get an empty list in the `evidences` field in the response.

This could mean one of the following:
1. Non-existent entries are used in filtering, for example a library identity that does not exist.
   > Note: When filtering with multiple library identities the operation is OR, so it only fails if none of the libraries are found.
2. Misaligned filters. For example, setting "3GPP" as the category and also adding a CPI library identity.  
   > Note: Cross-field filtering (e.g. category + cpi_library_id) uses the AND operation, so the filters need to align.
3. The server-side target index/collection on the Vector Store is empty.
   > Check that you are using the right `index_name` in the request, or leave it empty for default.
4. The server-side Vector Store is down or an internal connection is broken.

Issues 1 and 2 can be verified and addressed by the user, while issues 3 and 4 need attention on the ELI server side.

If you notice that the evidences field is continuously empty, you can reach out to the ELI team for fixes.
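
The OR/AND behaviour described in the notes can be reproduced locally to see why misaligned filters return nothing. This is a simplified model, not the server's implementation: it uses exact string matching, treats an empty filter as "match everything", and the `matches_filters` helper and sample records are invented for the example.

```python
def matches_filters(evidence: dict, categories: list[str], library_ids: list[str]) -> bool:
    """OR within a field, AND across fields; an empty filter matches everything."""
    category_ok = not categories or evidence.get("category") in categories
    library_ok = not library_ids or evidence.get("cpi_library_id") in library_ids
    return category_ok and library_ok


# Hypothetical indexed records.
records = [
    {"category": "CPI Store", "cpi_library_id": "LIB-A"},
    {"category": "3GPP", "cpi_library_id": ""},
]

# One existing and one missing library identity: OR still finds LIB-A.
or_hits = [r for r in records if matches_filters(r, [], ["LIB-A", "LIB-MISSING"])]

# "3GPP" category combined with a CPI library identity: AND leaves nothing.
and_hits = [r for r in records if matches_filters(r, ["3GPP"], ["LIB-A"])]

print(len(or_hits), len(and_hits))
```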

~end~