# Building Semantic Search with Memory: a problem!



In [1]:
#!python -m pip install semantic-kernel==0.4.5.dev0

In [2]:
from typing import Tuple
import os
import semantic_kernel as sk
from semantic_kernel.connectors.ai.open_ai import (
    OpenAIChatCompletion,
    OpenAITextEmbedding,
    AzureChatCompletion,
    AzureTextEmbedding,
)
from dotenv import load_dotenv
load_dotenv()
endpoint = os.environ["AZURE_OPENAI_ENDPOINT"]
deployment_name = os.environ["AZURE_OPENAI_DEPLOYMENT_NAME"]
key = os.environ["AZURE_OPENAI_API_KEY"]
embeddings = os.environ["OPENAI_EMBEDDINGS_MODEL_NAME"]

In [3]:
kernel = sk.Kernel()

useAzureOpenAI = True

# Configure AI service used by the kernel
if useAzureOpenAI:
    # the chat deployment name is read from AZURE_OPENAI_DEPLOYMENT_NAME; set that variable to the name of your chat model deployment
    azure_chat_service = AzureChatCompletion(deployment_name=deployment_name, endpoint=endpoint, api_key=key)
    # the embeddings deployment name is read from OPENAI_EMBEDDINGS_MODEL_NAME; set that variable to the name of your embedding model deployment
    azure_text_embedding = AzureTextEmbedding(deployment_name=embeddings, endpoint=endpoint, api_key=key)
    kernel.add_chat_service("chat_completion", azure_chat_service)
    kernel.add_text_embedding_generation_service("ada", azure_text_embedding)
else:
    api_key, org_id = sk.openai_settings_from_dot_env()
    oai_chat_service = OpenAIChatCompletion(ai_model_id="gpt-3.5-turbo", api_key=api_key, org_id=org_id)
    oai_text_embedding = OpenAITextEmbedding(ai_model_id="text-embedding-ada-002", api_key=api_key, org_id=org_id)
    kernel.add_chat_service("chat-gpt", oai_chat_service)
    kernel.add_text_embedding_generation_service("ada", oai_text_embedding)

kernel.register_memory_store(memory_store=sk.memory.VolatileMemoryStore())
kernel.import_skill(sk.core_skills.TextMemorySkill())

{'recall': SKFunction(), 'save': SKFunction()}

At its core, Semantic Memory is a set of data structures that let you store the meaning of text that comes from different data sources, and optionally the source text too. These texts can come from the web, e-mail providers, chats, a database, or your local directory, and are hooked up to the Semantic Kernel through data source connectors.

The texts are embedded, or compressed, into a vector of floats that represents their content and meaning mathematically. You can read more about embeddings [here](https://aka.ms/sk/embeddings).
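To make the idea of "a vector of floats" concrete, here is a minimal sketch of how two embeddings can be compared. It assumes cosine similarity as the metric, which is a common choice but is not stated in this notebook, and it uses made-up 3-dimensional vectors instead of real 1536-dimensional ones.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy "embeddings" with invented values, for illustration only.
revenue_query = [0.9, 0.1, 0.0]
revenue_doc = [0.8, 0.2, 0.1]
weather_doc = [0.0, 0.1, 0.9]

# The document about the same topic should score higher than the unrelated one.
print(cosine_similarity(revenue_query, revenue_doc))
print(cosine_similarity(revenue_query, weather_doc))
```

A memory search then amounts to embedding the question and returning the stored records whose vectors score highest against it.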

Now you might be wondering what happens if you have so much data that it doesn't fit into your RAM. That's where you want an external vector database built specifically for storing and retrieving embeddings. Fortunately, Semantic Kernel makes this easy thanks to an extensive list of available connectors. In the following section, we connect to an existing Azure AI Search service and use it as an external vector database to store and retrieve embeddings.


In [4]:
from semantic_kernel.connectors.memory.azure_cognitive_search import (
    AzureCognitiveSearchMemoryStore,
)

azure_ai_search_api_key, azure_ai_search_url = sk.azure_aisearch_settings_from_dot_env()

# text-embedding-ada-002 uses a 1536-dimensional embedding vector
kernel.register_memory_store(
    memory_store=AzureCognitiveSearchMemoryStore(
        vector_size=1536,
        search_endpoint=azure_ai_search_url,
        admin_key=azure_ai_search_api_key,
    )
)

In [5]:
index = os.environ["AZURE_AISEARCH_INDEX_NAME"]
question = 'what is the Revenue of Microsoft?'
result = await kernel.memory.search_async(index, question)
print(f"Answer: {result[0].text}\n")

HttpResponseError: () Invalid expression: Could not find a property named 'Id' on type 'search.document'.
Parameter name: $select
Code: 
Message: Invalid expression: Could not find a property named 'Id' on type 'search.document'.
Parameter name: $select

# Where the problem is

The preceding detail shows the chain of function calls that produces the error.

> This call: `File ~/.cache/pypoetry/virtualenvs/semantic-kernel-lHv8C67X-py3.9/lib/python3.9/site-packages/semantic_kernel/memory/semantic_text_memory.py:143, in SemanticTextMemory.search_async(self, collection, query, limit, min_relevance_score, with_embeddings)
    130 """Search the memory (calls the memory store's get_nearest_matches method).`

The `memory` class contains a method named `search_async`, whose result must have the following structure:


```
return MemoryQueryResult(
    is_reference=record._is_reference,
    external_source_name=record._external_source_name,
    id=record._id,
    description=record._description,
    text=record._text,
    additional_metadata=record._additional_metadata,
    embedding=record._embedding,
    relevance=relevance,
)
```
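The error message says that `$select` asked for a property named `Id` that the index does not have. A minimal sketch for spotting this kind of mismatch is to compare the field names of your index against the names the memory store asks for. Only `Id` is confirmed by the error above. The other names in `ASSUMED_REQUIRED_FIELDS` are assumptions that mirror the `MemoryQueryResult` attributes, and `my_index_fields` is a hypothetical schema.

```python
from typing import Iterable, List

# Field names the memory store is assumed to request.
# Only "Id" is confirmed by the error message; the rest are assumptions.
ASSUMED_REQUIRED_FIELDS = (
    "Id",
    "Text",
    "Description",
    "AdditionalMetadata",
    "ExternalSourceName",
    "IsReference",
    "Embedding",
)


def missing_fields(
    index_fields: Iterable[str],
    required: Iterable[str] = ASSUMED_REQUIRED_FIELDS,
) -> List[str]:
    """Return the required names that are absent from the index.

    The comparison is case-sensitive, so "id" does not satisfy "Id".
    """
    present = set(index_fields)
    return [name for name in required if name not in present]


# Hypothetical schema of an index populated by a separate ingestion pipeline.
my_index_fields = ["id", "content", "contentVector", "title"]
print(missing_fields(my_index_fields))
```

If the list is not empty, the memory store cannot read the index as it is, which is consistent with the `HttpResponseError` shown earlier.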

# This problem is discussed for .NET in the semantic-kernel Discord forum, which says:

### https://discord.com/channels/1063152441819942922/1098714016890769408/1194920438384570378   

Richard Urwin — 11/01/2024 5:27
Hi all,
     
 I've got a pre-existing ingestion pipeline, indexing content (with embeddings) into Azure AI search, so I don't need the indexing functionality provided by KernelMemory (server or serverless).
     
 I want to use KernelMemory as part of a RAG process, but the stumbling block seems to be that the field names in my index must match those hardcoded in the AzureAISearchMemoryRecord because there appears to be no way of overriding them (the class is sealed, the values are constants, it doesn't implement an interface, etc.). There's even a comment to this end:
     
```
namespace Microsoft.KernelMemory.MemoryDb.AzureAISearch;
     
// TODO: support bring your own index schema
public sealed class AzureAISearchMemoryRecord
{
    internal const string IdField = "id";
    internal const string VectorField = "embedding";
    private const string TagsField = "tags";
    private const string PayloadField = "payload";
...
}
``` 

The files in question are:
- https://github.com/microsoft/kernel-memory/blob/4b92c2896eb14dc4f948c792937238057a694838/extensions/AzureAISearch/AzureAISearch/AzureAISearchMemoryRecord.cs#L14
- https://github.com/microsoft/kernel-memory/blob/4b92c2896eb14dc4f948c792937238057a694838/extensions/AzureAISearch/AzureAISearch/AzureAISearchMemoryRecord.cs#L14

Any suggestions as to how to work around this, other than taking a copy of AzureAISearchMemory.cs and AzureAISearchMemoryRecord.cs, modifying them as required and then registering the former in the KernelMemoryBuilder using WithCustomMemoryDb(...)?
    
Thanks,
Rich

---

The fix appears to be implemented in .NET but is not yet available:

Devis Lucato
SK architect @ Microsoft
GitHub @dlucquark — 11/01/2024 6:34

here's the fix https://github.com/microsoft/kernel-memory/pull/251 it's merged. However, the service runs with nugets, so it won't see the latest code in main. We need to release the new packages and update the service to use the latest.


### Other comments:
- One drawback of the above approach is that the Semantic Kernel requires data to be indexed only in the MemoryRecord format, which includes fields such as ‘externalSourceName,’ ‘externalId,’ ‘description,’ ‘text,’ etc. Therefore, if you are ingesting data from an external source, you will need to convert the data into the required fields before ingesting it. [link](https://medium.com/@akshaykokane09/step-by-step-guide-to-integrate-azure-cognitives-vector-search-in-your-chatgpt-like-app-part-2-7e32155dbf9e)
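The conversion step mentioned in that comment can be sketched as a plain mapping function. The target names `externalSourceName`, `externalId`, `description` and `text` are taken from the quote above. The input keys `id`, `title` and `content` and the helper name `to_memory_fields` are hypothetical, and the exact field names expected by a given Semantic Kernel version may differ.

```python
from typing import Any, Dict


def to_memory_fields(doc: Dict[str, Any], source_name: str) -> Dict[str, Any]:
    """Map an external document onto the fields named in the quote above.

    The input keys ("id", "title", "content") are hypothetical.
    """
    return {
        "externalSourceName": source_name,
        "externalId": str(doc["id"]),
        "description": doc.get("title", ""),
        "text": doc["content"],
    }


# Hypothetical document coming from an existing ingestion pipeline.
external_doc = {
    "id": 42,
    "title": "Annual report",
    "content": "Revenue figures for the fiscal year ...",
}
print(to_memory_fields(external_doc, source_name="reports-pipeline"))
```

A mapping like this would run before ingestion, so it does not help when the index already exists with a different schema, which is the situation in this notebook.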

# The workaround

Microsoft's own material on semantic-kernel implements a query without the `memory` class:

https://techcommunity.microsoft.com/t5/educator-developer-blog/teach-chatgpt-to-answer-questions-using-azure-ai-search-amp/ba-p/3985395 
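As an illustration of what querying the index without the `memory` class could look like, here is a minimal sketch that builds a vector search request for the Azure AI Search REST API using only the standard library. It is not the code from the article linked above. The API version `2023-11-01`, the field names `content` and `contentVector`, and the helper names are assumptions to adapt to your own index, and the query vector would come from the same embedding model that was used to populate it.

```python
import json
import urllib.request
from typing import Any, Dict, List, Sequence

API_VERSION = "2023-11-01"  # assumed REST API version with vector query support


def build_vector_search_request(
    endpoint: str,
    index: str,
    api_key: str,
    query_vector: Sequence[float],
    vector_field: str,
    select_fields: Sequence[str],
    k: int = 3,
) -> urllib.request.Request:
    """Build (but do not send) a POST request for a pure vector query."""
    url = f"{endpoint.rstrip('/')}/indexes/{index}/docs/search?api-version={API_VERSION}"
    body = {
        # Request only fields that exist in *your* index schema.
        "select": ",".join(select_fields),
        "vectorQueries": [
            {
                "kind": "vector",
                "vector": list(query_vector),
                "fields": vector_field,
                "k": k,
            }
        ],
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="POST",
    )


def run_search(request: urllib.request.Request) -> List[Dict[str, Any]]:
    """Send the request and return the matching documents."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)["value"]


# Hypothetical values; nothing is sent until run_search(request) is called.
request = build_vector_search_request(
    endpoint="https://example.search.windows.net",
    index="my-index",
    api_key="<your-key>",
    query_vector=[0.0] * 1536,  # placeholder for a real question embedding
    vector_field="contentVector",
    select_fields=["content"],
)
print(request.full_url)
```

Because `select_fields` and `vector_field` are passed explicitly, this approach does not depend on the index using the field names hardcoded in the memory store.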