# Introduction to Memory in Semantic Kernel

In AI applications, memory is crucial for creating contextual, personalized experiences. Semantic Kernel provides powerful memory management capabilities that allow your AI applications to:

- Remember facts and knowledge over time
- Find information based on meaning rather than exact matches
- Use previous context in ongoing conversations
- Implement Retrieval-Augmented Generation (RAG) patterns



This notebook explores how to implement and use memory capabilities in Semantic Kernel applications. 


Let's visualize how memory fits into the Semantic Kernel architecture:

In [1]:
%pip install -U "semantic-kernel[azure]" mermaid-py --quiet

Note: you may need to restart the kernel to use updated packages.


In [2]:
import mermaid as md
from mermaid.graph import Graph

In [3]:
sequence = Graph('Memory-architecture',"""
graph TD
    A[Application] --> B[Kernel]
    B --> C[AI Models]
    B --> D[Memory System]
    B --> E[Plugins]
    D --> F[Short-term Memory]
    D --> G[Long-term Memory]
    G --> H[Vector Embeddings]
    G --> I[Memory Store]
    I --> J[Volatile Store]
    I --> K[Persistent Store]
    style D fill:#f9d5e5,stroke:#333,stroke-width:2px
""")
md.Mermaid(sequence)


## Memory
In SK, **memory** refers to the storage and recall of information the AI has learned or been provided. There are two primary forms of memory:

### Semantic Memory (Long-term)
- This is usually an **external vector store** that holds **embeddings of text**, allowing the AI to store facts or documents and later retrieve them by **semantic similarity**.
- SK provides **Memory Connectors** to various vector databases (like **Azure Cognitive Search, Pinecone, Qdrant**, etc.) via a common interface.
- By using a memory store, you can implement the **retrieval** part of **RAG**: store chunks of knowledge and fetch relevant pieces at query time.
- We’ll see how to add and use such memory in our chatbot.

### Conversation History (Short-term Memory)
- SK also manages the **immediate dialogue context** with a **Chat History object** for multi-turn conversations.
- This ensures the AI remembers prior user queries and its own responses, maintaining context across turns.
- We will leverage this to keep the conversation coherent.

In [4]:
from semantic_kernel import __version__
print(__version__)

1.24.0


In [5]:
from semantic_kernel.connectors.ai.open_ai.services.azure_chat_completion import AzureChatCompletion
from semantic_kernel.connectors.ai.open_ai.services.azure_text_embedding import AzureTextEmbedding
from semantic_kernel.core_plugins.text_memory_plugin import TextMemoryPlugin
from semantic_kernel.kernel import Kernel
from semantic_kernel.memory.semantic_text_memory import SemanticTextMemory
from semantic_kernel.memory.volatile_memory_store import VolatileMemoryStore
import os



In [6]:
embedding_deployment_name = os.getenv("AZURE_OPENAI_EMBEDDING_DEPLOYMENT", "text-embedding-ada-002")
api_key = os.getenv("AZURE_OPENAI_API_KEY")
base_url = os.getenv("AZURE_OPENAI_ENDPOINT")

# Create the embedding service
embedding_service = AzureTextEmbedding(
    endpoint=base_url,
    deployment_name=embedding_deployment_name,
    api_key=api_key
)

This cell creates an embedding service that connects to Azure OpenAI. The service converts text into vector embeddings, which are numerical representations that capture semantic meaning. The environment variables should be set in your `.env` file.
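
A missing variable tends to surface only later, as an authentication or endpoint error, so it can help to check the environment before constructing the service. The sketch below is one way to do that. The helper name `missing_env_vars` is hypothetical; the variable names are the ones this notebook reads without a default value.

```python
import os

# Variables this notebook reads without a default value.
REQUIRED_VARS = ["AZURE_OPENAI_API_KEY", "AZURE_OPENAI_ENDPOINT"]


def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in the given environment mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print(f"Set these variables in your .env file first: {', '.join(missing)}")
else:
    print("All required variables are set.")
```

Passing a plain dictionary as `env` makes the helper easy to try without touching the real environment.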

In [7]:
memory = SemanticTextMemory(storage=VolatileMemoryStore(), embeddings_generator=embedding_service)

Here we initialize our semantic memory system with:

- A VolatileMemoryStore - an in-memory vector database (data will be lost when your session ends)
- The embedding service we created earlier, which will generate vector embeddings for text


In [8]:
collection_id = "generic"


async def populate_memory(memory: SemanticTextMemory) -> None:
    # Add some documents to the semantic memory
    await memory.save_information(collection=collection_id, id="info1", text="Your budget for 2024 is $100,000")
    await memory.save_information(collection=collection_id, id="info2", text="Your savings from 2023 are $50,000")
    await memory.save_information(collection=collection_id, id="info3", text="Your investments are $80,000")


await populate_memory(memory)

This function adds information to our memory store. Each memory item consists of:

- `collection`: A namespace for organizing related memories (like a database table)
- `id`: A unique identifier for retrieving specific memories
- `text`: The actual information to store


When we save information, Semantic Memory:

1. Generates an embedding vector for the text
2. Stores both the text and its vector in the memory store
3. Associates it with the given ID and collection
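
The three steps above can be sketched with a toy store. This is an illustration only, not Semantic Kernel's implementation: `toy_embed` and `ToyMemoryStore` are hypothetical names, and the bag-of-words "embedding" is a stand-in for a real embedding model.

```python
from collections import Counter


def toy_embed(text):
    """Stand-in for an embedding model: a bag-of-words count vector (NOT a real embedding)."""
    return Counter(text.lower().split())


class ToyMemoryStore:
    """Hypothetical in-memory store keyed by collection, then by id."""

    def __init__(self):
        self.collections = {}

    def save_information(self, collection, id, text):
        vector = toy_embed(text)                      # 1. generate an embedding for the text
        record = {"text": text, "embedding": vector}  # 2. keep the text and its vector together
        self.collections.setdefault(collection, {})[id] = record  # 3. file it under collection and id


store = ToyMemoryStore()
store.save_information("generic", "info1", "Your budget for 2024 is $100,000")
store.save_information("generic", "info1", "Your budget for 2024 is $120,000")  # same id: overwrites
print(store.collections["generic"]["info1"]["text"])
```

In this toy store, saving under an existing `id` replaces the earlier record, which is why the collection still holds a single entry after two saves.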

In [9]:
async def search_memory_examples(memory: SemanticTextMemory) -> None:
    questions = [
        "What is my budget for 2024?",
        "What are my savings from 2023?",
        "What are my investments?",
    ]

    for question in questions:
        print(f"Question: {question}")
        result = await memory.search(collection_id, question)
        print(f"Answer: {result[0].text}\n")

In [10]:
await search_memory_examples(memory)

Question: What is my budget for 2024?
Answer: Your budget for 2024 is $100,000

Question: What are my savings from 2023?
Answer: Your savings from 2023 are $50,000

Question: What are my investments?
Answer: Your investments are $80,000



---
### How does semantic search work?

1. We provide a natural language query (e.g., "What is my budget for 2024?")
2. The memory system:
   - Converts the query to a vector embedding
   - Compares this vector against stored embeddings using cosine similarity
   - Returns the closest matching results
   
The search works even if the query doesn't exactly match the stored text, as it finds semantically similar content.
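
As a rough illustration of the comparison step, the sketch below ranks stored vectors by cosine similarity to a query vector. The function names and the three-dimensional vectors are made up for the example; real embeddings come from the embedding service and have far more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest(query_vector, stored, limit=1):
    """Rank stored (text, vector) pairs by similarity to the query, best first."""
    scored = [(cosine_similarity(query_vector, vec), text) for text, vec in stored]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:limit]


# Made-up 3-dimensional vectors standing in for real embeddings.
stored = [
    ("budget", [0.9, 0.1, 0.0]),
    ("savings", [0.1, 0.9, 0.0]),
    ("investments", [0.0, 0.2, 0.9]),
]
query = [0.8, 0.2, 0.1]
score, text = nearest(query, stored)[0]
print(text, round(score, 3))
```

The query vector points mostly in the same direction as the "budget" vector, so that entry ranks first even though the two vectors are not identical.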

---

### Exercise: Adding and Retrieving Custom Memories

Try adding your own information to the memory and retrieving it with semantic search.

1. Create a new collection called "personal"
2. Add at least three facts about a fictional person
3. Search for those facts using natural language queries

<details>
  <summary>Click to see solution</summary>
  
```python
# Create a new collection
personal_collection = "personal"

# Add information to memory
async def add_personal_info(memory):
    await memory.save_information(collection=personal_collection, id="fact1", text="John was born in Seattle in 1980")
    await memory.save_information(collection=personal_collection, id="fact2", text="John graduated from University of Washington in 2002")
    await memory.save_information(collection=personal_collection, id="fact3", text="John has two children named Alex and Sam")

await add_personal_info(memory)

# Search for information
questions = [
    "Where was John born?",
    "When did John graduate college?",
    "Does John have kids?"
]

for question in questions:
    print(f"Question: {question}")
    result = await memory.search(personal_collection, question)
    print(f"Answer: {result[0].text}\n")
```

</details>

In [11]:
# Your code goes here

# Create a new collection

# Add information to memory

# Search for information


## Kernel Setup

The code below creates a new Kernel instance and adds both:

1. A chat completion service for generating responses
2. The embedding service we created earlier for vector operations

This configuration allows the kernel to generate text and work with vector embeddings in memory operations.

In [12]:
from semantic_kernel.kernel import Kernel
import os
from semantic_kernel.connectors.ai.open_ai.services.azure_chat_completion import AzureChatCompletion
from dotenv import load_dotenv

load_dotenv('../.env', override=True)

kernel = Kernel()


deployment_name = os.getenv("AZURE_OPENAI_DEPLOYMENT")
api_key = os.getenv("AZURE_OPENAI_API_KEY")
base_url = os.getenv("AZURE_OPENAI_ENDPOINT")


chat_completion = AzureChatCompletion(
        endpoint=base_url,    
        deployment_name=deployment_name,
        api_key=api_key,
        service_id='chat',
    )

kernel.add_service(chat_completion)

# we also add the embedding service to the kernel
kernel.add_service(embedding_service)

In [13]:
from semantic_kernel.core_plugins.text_memory_plugin import TextMemoryPlugin
from semantic_kernel.memory.semantic_text_memory import SemanticTextMemory
from semantic_kernel.memory.volatile_memory_store import VolatileMemoryStore

In [14]:
memory = SemanticTextMemory(storage=VolatileMemoryStore(), embeddings_generator=embedding_service)
kernel.add_plugin(TextMemoryPlugin(memory), "TextMemoryPlugin")

KernelPlugin(name='TextMemoryPlugin', description=None, functions={'recall': KernelFunctionFromMethod(metadata=KernelFunctionMetadata(name='recall', plugin_name='TextMemoryPlugin', description='Recall a fact from the long term memory', parameters=[KernelParameterMetadata(name='ask', description='The information to retrieve', default_value=None, type_='str', is_required=True, type_object=<class 'str'>, schema_data={'type': 'string', 'description': 'The information to retrieve'}, include_in_function_choices=True), KernelParameterMetadata(name='collection', description='The collection to search for information.', default_value='generic', type_='str', is_required=False, type_object=<class 'str'>, schema_data={'type': 'string', 'description': 'The collection to search for information.'}, include_in_function_choices=True), KernelParameterMetadata(name='relevance', description='The relevance score, from 0.0 to 1.0; 1.0 means perfect match', default_value=0.75, type_='float', is_required=False

Here we:
1. Create a `SemanticTextMemory` object with our in-memory store and embedding service
2. Add the `TextMemoryPlugin` to the kernel, which provides memory-related functions

The `TextMemoryPlugin` exposes memory operations to the kernel, allowing:
- Semantic search through the `recall` function
- Saving new information during conversations
- Integration of memory capabilities into AI responses


----

Now we set up a chat function that incorporates memory:

1. We define a prompt template that:
   - Gives the AI a role and instructions
   - Uses the `{{recall '...'}}` syntax to search memory for relevant information
   - Includes the user's request via `{{$request}}`

2. We create a kernel function from this template

The `{{recall 'query'}}` syntax tells Semantic Kernel to:
1. Search the memory for information relevant to the query
2. Insert the retrieved information into the prompt
3. Let the AI use this information in its response

This creates a chatbot that can reference previously stored financial information.
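
To make the substitution concrete, here is a toy renderer that expands `{{recall '...'}}` and `{{$name}}` placeholders. It is not how Semantic Kernel's template engine is implemented; `render_prompt`, `fake_recall`, and the `facts` dictionary are hypothetical stand-ins, and the dictionary lookup replaces the real semantic search.

```python
import re


def render_prompt(template, recall, variables):
    """Toy renderer: expand {{recall '...'}} and {{$name}} placeholders in a prompt."""

    def expand_recall(match):
        return recall(match.group(1))

    def expand_variable(match):
        return str(variables.get(match.group(1), ""))

    rendered = re.sub(r"\{\{\s*recall\s+'([^']*)'\s*\}\}", expand_recall, template)
    return re.sub(r"\{\{\s*\$(\w+)\s*\}\}", expand_variable, rendered)


# Hypothetical lookup standing in for a semantic search over stored memories.
facts = {"budget by year": "Your budget for 2024 is $100,000"}


def fake_recall(query):
    return facts.get(query, "")


template = "- {{recall 'budget by year'}} What is my budget for 2024?\n{{$request}}"
print(render_prompt(template, fake_recall, {"request": "Summarize my budget."}))
```

The rendered prompt is what the model actually sees: the recalled fact sits inline next to the question, followed by the user's request.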

In [15]:
from semantic_kernel.functions import KernelFunction
from semantic_kernel.prompt_template import PromptTemplateConfig


async def setup_chat_with_memory(
    kernel: Kernel,
    service_id: str,
) -> KernelFunction:
    prompt = """
    ChatBot can have a conversation with you about any topic.
    It can give explicit instructions or say 'I don't know' if
    it does not have an answer.

    Information about me, from previous conversations:
    - {{recall 'budget by year'}} What is my budget for 2024?
    - {{recall 'savings from previous year'}} What are my savings from 2023?
    - {{recall 'investments'}} What are my investments?

    {{$request}}
    """.strip()

    prompt_template_config = PromptTemplateConfig(
        template=prompt,
        execution_settings={
            service_id: kernel.get_service(service_id).get_prompt_execution_settings_class()(service_id=service_id)
        },
    )

    return kernel.add_function(
        function_name="chat_with_memory",
        plugin_name="chat",
        prompt_template_config=prompt_template_config,
    )

In [16]:
print("Populating memory...")
await populate_memory(memory)

print("Asking questions... (manually)")
await search_memory_examples(memory)

print("Setting up a chat (with memory!)")
chat_func = await setup_chat_with_memory(kernel, 'chat')

print("Begin chatting (type 'exit' to exit):\n")
print(
    "Welcome to the chat bot!\
    \n  Type 'exit' to exit.\
    \n  Try asking a question about your finances (i.e. \"talk to me about my finances\")."
)


async def chat(user_input: str):
    print(f"User: {user_input}")
    answer = await kernel.invoke(chat_func, request=user_input)
    print(f"ChatBot:> {answer}")

Populating memory...
Asking questions... (manually)
Question: What is my budget for 2024?
Answer: Your budget for 2024 is $100,000

Question: What are my savings from 2023?
Answer: Your savings from 2023 are $50,000

Question: What are my investments?
Answer: Your investments are $80,000

Setting up a chat (with memory!)
Begin chatting (type 'exit' to exit):

Welcome to the chat bot!    
  Type 'exit' to exit.    
  Try asking a question about your finances (i.e. "talk to me about my finances").


In [17]:
await chat("What is my budget for 2024?")

User: What is my budget for 2024?
ChatBot:> Your budget for 2024 is $100,000.


In [24]:
await chat("talk to me about my finances")

User: talk to me about my finances
ChatBot:> Based on the information you've shared from previous conversations, here's a brief overview of your financial situation:

1. **Budget for 2024**: You have a budget of $100,000 set for the year. This indicates the amount you have planned to allocate or spend across various needs and goals.

2. **Savings from 2023**: You have accumulated savings of $50,000 from the previous year. This amount could be set aside for future financial security, emergencies, or specific goals.

3. **Investments**: You currently hold investments totaling $80,000. This might be invested in various assets like stocks, bonds, real estate, or other financial vehicles meant to grow over time.

Considering this information, here are a few things you might focus on or discuss:

- **Financial goals**: What specific goals do you have for your budget, savings, and investments? Are you planning any significant purchases, further investments, or building a stronger emergency fu

In [None]:
kernel.remove_all_services()


kernel.add_service(chat_completion)

# we also add the embedding service to the kernel
kernel.add_service(embedding_service)

## Retrieval-Augmented Generation (RAG) with Self-Critique

This section demonstrates a powerful pattern combining memory retrieval with response evaluation:

1. **RAG Prompt**: This prompt template:
   - Retrieves information from memory relevant to the user's question
   - Provides this context to the AI
   - Uses the context to generate an informed answer

2. **Self-Critique**: This second prompt evaluates the quality of RAG responses:
   - Takes the original question, retrieved context, and generated answer
   - Classifies the answer as "Grounded", "Ungrounded", or "Unclear"
   - Helps ensure responses are properly using retrieved information

This pattern creates more reliable AI responses by:
- Providing relevant facts from memory
- Checking if responses properly use this information
- Identifying when responses make claims beyond available information
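
Once the critique returns a label, the application still has to decide what to do with it. The sketch below shows one possible policy: normalize the reply to one of the three labels and only pass through answers judged "Grounded". The helper names and the fallback message are assumptions made for this example, not part of Semantic Kernel.

```python
LABELS = ("Grounded", "Ungrounded", "Unclear")


def parse_verdict(reply):
    """Map a free-text critique reply onto one of the three labels, defaulting to 'Unclear'."""
    text = reply.strip().lower()
    for label in LABELS:
        if text.startswith(label.lower()):
            return label
    return "Unclear"


def accept_answer(answer, verdict_reply, fallback="I don't know."):
    """Return the answer only when the critique judged it grounded."""
    return answer if parse_verdict(verdict_reply) == "Grounded" else fallback


print(accept_answer("You have lived in Seattle since 2005.", "Grounded"))
print(accept_answer("Yes, you live in New York City.", " ungrounded."))
```

Treating anything unrecognized as "Unclear" is a conservative choice: a malformed critique reply never causes an unverified answer to be shown.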

In [27]:
import asyncio
import os
from semantic_kernel import Kernel
from semantic_kernel.connectors.ai.open_ai import AzureChatCompletion, AzureTextEmbedding
from semantic_kernel.connectors.memory.azure_cognitive_search import AzureCognitiveSearchMemoryStore
from semantic_kernel.connectors.memory.azure_cognitive_search.azure_ai_search_settings import AzureAISearchSettings
from semantic_kernel.contents import ChatHistory
from semantic_kernel.core_plugins import TextMemoryPlugin
from semantic_kernel.memory import SemanticTextMemory




COLLECTION_NAME = "generic"


async def populate_memory(memory: SemanticTextMemory) -> None:
    # Add some documents to the ACS semantic memory
    await memory.save_information(COLLECTION_NAME, id="info1", text="My name is Andrea")
    await memory.save_information(COLLECTION_NAME, id="info2", text="I currently work as a tour guide")
    await memory.save_information(COLLECTION_NAME, id="info3", text="I've been living in Seattle since 2005")
    await memory.save_information(
        COLLECTION_NAME,
        id="info4",
        text="I visited France and Italy five times since 2015",
    )
    await memory.save_information(COLLECTION_NAME, id="info5", text="My family is from New York")

azure_ai_search_settings = AzureAISearchSettings.create(endpoint=os.getenv("AZURE_AI_SEARCH_ENDPOINT"), api_key=os.getenv("AZURE_AI_SEARCH_API_KEY"))
vector_size = 1536


acs_connector = AzureCognitiveSearchMemoryStore(
    vector_size=vector_size,
    search_endpoint=azure_ai_search_settings.endpoint,
    admin_key=azure_ai_search_settings.api_key,
)

memory = SemanticTextMemory(storage=acs_connector, embeddings_generator=embedding_service)
kernel.add_plugin(TextMemoryPlugin(memory), "TextMemoryPlugin")

print("Populating memory...")
await populate_memory(memory)

Populating memory...


In [28]:
# Optional instruction that can be added to the prompt below:
# "It can give explicit instructions or say 'I don't know' if it does not have an answer."

sk_prompt_rag = """
Assistant can have a conversation with you about any topic.

Here is some background information about the user that you should use to answer the question below:
{{ recall $user_input }}
User: {{$user_input}}
Assistant: """.strip()

user_input = "Do I live in Seattle?"
print(f"Question: {user_input}")
req_settings = kernel.get_prompt_execution_settings_from_service_id(service_id="chat")
chat_func = kernel.add_function(
    function_name="rag", plugin_name="RagPlugin", prompt=sk_prompt_rag, prompt_execution_settings=req_settings
)

chat_history = ChatHistory()
chat_history.add_user_message(user_input)

answer = await kernel.invoke(
    chat_func,
    user_input=user_input,
    chat_history=chat_history,
)
chat_history.add_assistant_message(str(answer))
print(f"Answer: {str(answer).strip()}")


Question: Do I live in Seattle?
Answer: Based on the information provided, yes, you have been living in Seattle since 2005.


In [None]:

sk_prompt_rag_sc = """
You will get a question, background information to be used with that question and an answer that was given.
You have to answer Grounded or Ungrounded or Unclear.
Grounded if the answer is based on the background information and clearly answers the question.
Ungrounded if the answer could be true but is not based on the background information.
Unclear if the answer does not answer the question at all.
Question: {{$user_input}}
Background: {{ recall $user_input }}
Answer: {{$rag_output}}
Remember, just answer Grounded or Ungrounded or Unclear: """.strip()

self_critique_func = kernel.add_function(
    function_name="self_critique_rag",
    plugin_name="RagPlugin",
    prompt=sk_prompt_rag_sc,
    prompt_execution_settings=req_settings,
)

# Case 1: critique the real RAG answer against the original question
print(f"Answer: {str(answer).strip()}")
check = await kernel.invoke(
    self_critique_func, user_input=user_input, rag_output=str(answer), chat_history=chat_history
)
print(f"The answer was {str(check).strip()}")

# Case 2: same question, but the answer contradicts the stored facts
print("-" * 50)
print("   Let's pretend the answer was wrong...")
print(f"Answer: {str(answer).strip()}")
check = await kernel.invoke(
    self_critique_func,
    user_input=user_input,
    rag_output="Yes, you live in New York City.",
    chat_history=chat_history,
)
print(f"The answer was {str(check).strip()}")

# Case 3: same question, but the answer does not address it at all
print("-" * 50)
print("   Let's pretend the answer is not related...")
print(f"Answer: {str(answer).strip()}")
check = await kernel.invoke(
    self_critique_func,
    user_input=user_input,
    rag_output="Yes, the earth is not flat.",
    chat_history=chat_history,
)
print(f"The answer was {str(check).strip()}")



Answer: Based on the information provided, yes, you have been living in Seattle since 2005.
The answer was Grounded
--------------------------------------------------
   Let's pretend the answer was wrong...
Answer: Based on the information provided, yes, you have been living in Seattle since 2005.
The answer was Ungrounded
--------------------------------------------------
   Let's pretend the answer is not related...
Answer: Based on the information provided, yes, you have been living in Seattle since 2005.
The answer was Unclear



In [None]:
!pip install semantic-kernel

In [None]:
import os
import semantic_kernel as sk
from semantic_kernel.connectors.ai.open_ai.services.azure_text_embedding import AzureTextEmbedding
from semantic_kernel.connectors.ai.open_ai.services.azure_chat_completion import AzureChatCompletion
from semantic_kernel.memory.semantic_text_memory import SemanticTextMemory
# VolatileMemoryStore is also exported from the semantic_kernel.memory package
from semantic_kernel.memory import VolatileMemoryStore

# This cell is self-contained: it rebuilds the services and the kernel from scratch

# 1. Set up the embedding service
embedding_deployment_name = os.getenv("AZURE_OPENAI_EMBEDDING_DEPLOYMENT", "text-embedding-ada-002")
api_key = os.getenv("AZURE_OPENAI_API_KEY")
base_url = os.getenv("AZURE_OPENAI_ENDPOINT")

# Create the embedding service
embedding_service = AzureTextEmbedding(
    endpoint=base_url,
    deployment_name=embedding_deployment_name,
    api_key=api_key
)


deployment_name = os.getenv("AZURE_OPENAI_DEPLOYMENT")
api_key = os.getenv("AZURE_OPENAI_API_KEY")
base_url = os.getenv("AZURE_OPENAI_ENDPOINT")


chat_completion = AzureChatCompletion(
        endpoint=base_url,    
        deployment_name=deployment_name,
        api_key=api_key,
    )



# Add the embedding service to the kernel
kernel = sk.Kernel()
kernel.add_service(embedding_service)
kernel.add_service(chat_completion)

# 2. Create a memory store and semantic memory
memory_store = VolatileMemoryStore()
memory = SemanticTextMemory(storage=memory_store, embeddings_generator=embedding_service)

# 3. Save information to memory
async def populate_memory():
    # Add some facts about programming languages
    await memory.save_information(
        collection="programming",
        id="python1",
        text="Python is a high-level, interpreted programming language known for its readability and versatility."
    )
    
    await memory.save_information(
        collection="programming",
        id="javascript1",
        text="JavaScript is a scripting language that enables interactive web pages and is an essential part of web applications."
    )
    
    await memory.save_information(
        collection="programming",
        id="csharp1",
        text="C# is a modern, object-oriented programming language developed by Microsoft for the .NET platform."
    )
    
    # Add some facts about AI and machine learning
    await memory.save_information(
        collection="ai",
        id="ml1",
        text="Machine learning is a subset of AI that enables systems to learn and improve from experience without being explicitly programmed."
    )
    
    await memory.save_information(
        collection="ai",
        id="nlp1",
        text="Natural Language Processing (NLP) is a field of AI that focuses on the interaction between computers and human language."
    )
    
    await memory.save_information(
        collection="ai",
        id="cv1",
        text="Computer Vision is an AI field that trains computers to interpret and understand visual information from the world."
    )
    
    # Add some facts about Semantic Kernel
    await memory.save_information(
        collection="semantic_kernel",
        id="sk1",
        text="Semantic Kernel is an open-source SDK that integrates Large Language Models (LLMs) with conventional programming languages."
    )
    
    await memory.save_information(
        collection="semantic_kernel",
        id="sk2",
        text="Semantic Kernel provides a flexible plugin system that allows developers to extend its functionality."
    )
    
    print("Memory populated with information.")

# 4. Query the memory
async def query_memory():
    # Search for information about Python
    results = await memory.search("programming", "Tell me about Python programming")
    print("Query: Tell me about Python programming")
    for result in results:
        print(f"Relevance: {result.relevance:.4f}, ID: {result.id}")
        print(f"Text: {result.text}\n")
    
    # Search for information about AI
    results = await memory.search("ai", "How does natural language processing work?")
    print("Query: How does natural language processing work?")
    for result in results:
        print(f"Relevance: {result.relevance:.4f}, ID: {result.id}")
        print(f"Text: {result.text}\n")
    
    # Search every collection: search() takes one collection name at a time
    print("Query: What is Semantic Kernel?")
    for collection in await memory.get_collections():
        results = await memory.search(collection, "What is Semantic Kernel?")
        for result in results:
            print(f"Relevance: {result.relevance:.4f}, Collection: {collection}, ID: {result.id}")
            print(f"Text: {result.text}\n")

# 5. Build a simple Q&A system
async def memory_qa_system():
    # Create a semantic function for answering questions
    answer_prompt = """
    Answer the following question based on the provided context.
    
    Context:
    {{$context}}
    
    Question:
    {{$question}}
    
    Answer:
    """
    
    answer_fn = kernel.add_function(
        prompt=answer_prompt,
        function_name="answer_with_context",
        plugin_name="QA",
        max_tokens=150
    )
    
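    async def ask_question(question, collection=None, limit=3):
        # search() takes a single collection name, so query every collection by default
        collections = [collection] if collection else await memory.get_collections()
        results = []
        for name in collections:
            results.extend(await memory.search(name, question, limit=limit))

        if not results:
            return "I don't have enough information to answer that question."

        # Keep only the most relevant matches across all searched collections
        results = sorted(results, key=lambda r: r.relevance, reverse=True)[:limit]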
        
        # Combine the retrieved information as context
        context = "\n".join([f"- {result.text}" for result in results])
        
        # Use the answer function to generate a response
        answer = await kernel.invoke(answer_fn, context=context, question=question)
        return str(answer)
    
    # Test the Q&A system with various questions
    questions = [
        "What is Python used for?",
        "Explain the difference between machine learning and computer vision.",
        "How does Semantic Kernel help developers work with AI?",
        "What programming language works best with web development?"
    ]
    
    for question in questions:
        print(f"Question: {question}")
        answer = await ask_question(question)
        print(f"Answer: {answer}\n")

# Run the demonstration
async def run_memory_demo():
    await populate_memory()
    await query_memory()
    print("\n--- Q&A System Demo ---\n")
    await memory_qa_system()

# Execute the demo
await run_memory_demo()

# Top-level await works in a notebook. In a plain Python script, replace the
# line above with:
# import asyncio
# asyncio.run(run_memory_demo())



**Semantic Memory** is crucial for building AI apps that retain knowledge or context over time. SK’s memory system allows you to store and retrieve information (text, embeddings, documents) so that the AI can use it later in its reasoning. This is key for scenarios such as:
- Recalling past conversations.
- Searching a knowledge base (Retrieval-Augmented Generation).
- Personalizing responses.

Key aspects of SK memory management:

- **Embeddings and Vector Stores**: Text is converted into numerical vectors (embeddings) and stored in a vector database (e.g., Azure Cognitive Search, Pinecone, Redis, etc.). This allows you to perform similarity searches at runtime.
- **Memory as a Plugin**: Your vector store can be exposed as a plugin function (e.g., `KnowledgePlugin.SearchDocuments(query)`), which the AI can call to retrieve relevant text.
- **Storing Memories**: You generate embeddings for text and upsert them into the store along with an identifier and metadata.
- **Retrieving Memories**: At query time, you search the memory store using the query’s embedding. The results (relevant text snippets) are then provided to the AI as context.
- **Short-term vs Long-term Memory**: SK manages short-term conversational context and long-term semantic memory separately, allowing efficient handling of both.
- **Memory Connectors Abstraction**: The updated SK memory interface supports advanced features like multiple embedding vectors per record, metadata filtering, and more.




The example above demonstrates how memory search pulls the matching fact from the store, which the AI can then use to give a factual answer.

---

In summary, the **Semantic Kernel Intro** section covers the following key points:
- An overview of SK and its importance in bridging AI and application code.
- Core components like the Kernel, AI services, plugins, context, planner, and memory.
- How functions (semantic and native) are organized into plugins.
- The power of automatic function calling for multi-step AI reasoning.
- The role of filters in ensuring secure and validated execution.
- Detailed memory management for integrating external knowledge into AI workflows.

