# Cosmos DB as a single source

# Prereqs


In [149]:
#!import code/Setup.cs
#!import code/VectorModels.cs


# Vector Store Setup

## Database Configuration: Overview


### **Understand Index Type, Vector Data Type, and Distance Functions**

#### **`Vector Index Type`**

This option determines how vectors are indexed within Cosmos DB to optimize search performance.

- **`flat` Index Type**: Use for low-dimensional, exact searches on smaller datasets.
- **`quantizedFlat` Index Type**: Choose when you need to balance performance and storage with acceptable accuracy loss in high-dimensional data.
- **`diskANN` Index Type**: Opt for large-scale, high-dimensional datasets where approximate searches suffice, and speed is critical.

<details>
<summary>
Options
</summary>

- **`flat`**: Stores vectors alongside other indexed properties without additional indexing structures. Supports up to **505 dimensions**.

  **When to Use:**

  - **Low-dimensional data**: Ideal for applications with vectors up to 505 dimensions.
  - **Exact search requirements**: When you need precise search results.
  - **Small to medium datasets**: Efficient for datasets where the index size won't become a bottleneck.

  **Real-World Scenario:**

  - **Customer Segmentation**: A retail company uses customer feature vectors (age, income, purchase history) with dimensions well below 505 to segment customers. Exact matches are important for targeted marketing campaigns.

- **`quantizedFlat`**: Compresses (quantizes) vectors before indexing, improving performance at the cost of some accuracy. Supports up to **4096 dimensions**.

  **When to Use:**

  - **High-dimensional data with storage constraints**: Suitable for vectors up to 4096 dimensions where storage efficiency is important.
  - **Performance-critical applications**: When reduced latency and higher throughput are needed.
  - **Acceptable accuracy trade-off**: Minor losses in accuracy are acceptable for performance gains.

  **Real-World Scenario:**

  - **Mobile Image Recognition**: An app recognizes objects using high-dimensional image embeddings. Quantization reduces the index's storage footprint and improves search speed, which keeps lookups responsive for mobile clients.

- **`diskANN`**: Utilizes the DiskANN algorithm for approximate nearest neighbor searches, optimized for speed and efficiency. Supports up to **4096 dimensions**.

  **When to Use:**

  - **Large-scale, high-dimensional data**: Best for big datasets where quick approximate searches are acceptable.
  - **Real-time applications**: When fast response times are critical.
  - **Scalability needs**: Suitable for applications expected to grow significantly.

  **Real-World Scenario:**

  - **Semantic Search Engines**: A search engine indexes millions of documents using embeddings from language models like BERT (768 dimensions). DiskANN allows users to get fast search results by efficiently handling high-dimensional data.
</details>

---

#### **`Vector Data Type`**

Specifies the data type of the vector components.

- **`float32` Datatype**: Default choice for precision; use when storage is less of a concern.
- **`uint8` and `int8` Datatypes**: Use for storage efficiency, particularly when data can be quantized.
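
As a rough worked example of the storage trade-off, consider a single vector at the default 1536 dimensions. The figures below count only the raw component payload and ignore document encoding and index overhead, so treat them as an approximation:

$$
\underbrace{1536 \times 4\ \text{bytes}}_{\texttt{float32}} = 6144\ \text{bytes}
\qquad \text{vs.} \qquad
\underbrace{1536 \times 1\ \text{byte}}_{\texttt{int8} \text{ or } \texttt{uint8}} = 1536\ \text{bytes}
$$

Under these assumptions, the 8-bit types need about a quarter of the space per vector, at the cost of reduced precision in each component.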

<details>
<summary>Options</summary>

- **`float32`** (default): 32-bit floating-point numbers.

  **When to Use:**

  - **High precision requirements**: Necessary when the application demands precise calculations.
  - **Standard ML embeddings**: Most machine learning models output float32 vectors.

  **Real-World Scenario:**

  - **Scientific Simulations**: In climate modeling, vectors represent complex data where precision is vital for accurate simulations and predictions.

- **`uint8`**: 8-bit unsigned integers.

  **When to Use:**

  - **Memory optimization**: Reduces storage needs when precision can be sacrificed.
  - **Quantized models**: When vectors are output from models that already quantize data.

  **Real-World Scenario:**

  - **Basic Image Features**: Storing color histograms for image retrieval systems, where each bin can be represented with an 8-bit integer.

- **`int8`**: 8-bit signed integers (-128 to 127).

  **When to Use:**

  - **Signed quantization schemes**: When a model or compression step maps floating-point values onto a signed 8-bit integer scale.
  - **Edge devices**: Ideal for applications on devices with extreme memory limitations.

  **Real-World Scenario:**

  - **Audio Fingerprinting**: Compressing audio feature vectors for song recognition apps where storage and quick retrieval are essential.
</details>

---
#### **`Dimension Size`**

The length of the vectors being indexed. Ranges from 1 to 4096; the default is **1536**.

<details>
<summary>Options</summary>


**When to Consider Lower Dimensions (≤ 505):**

- **Simpler models**: Applications using basic embeddings or feature vectors.
- **Flat index type**: Required when using the `flat` index type, which supports at most 505 dimensions.

*Real-World Scenario:*

- **Keyword Matching**: Using TF-IDF vectors reduced to a few hundred dimensions (for example, with truncated SVD) for document similarity in a content management system.

**When to Consider Higher Dimensions (506 to 4096):**

- **Complex models**: Deep learning applications with high-dimensional embeddings.
- **Advanced search features**: When richer representations of data are necessary for accuracy.

*Real-World Scenario:*

- **Face Recognition**: Using high-dimensional embeddings (e.g., 2048 dimensions) to represent facial features for security systems.
</details>

---

#### **`Distance Function`**

Determines how similarity between vectors is calculated. Select based on the nature of similarity in your application: `cosine` for orientation, `dotproduct` when magnitude matters, and `euclidean` for spatial relevance.
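
For reference, the three measures are conventionally defined as follows for two vectors $a$ and $b$ of dimension $n$. These are the standard textbook definitions, not formulas taken from this notebook's code:

$$
\text{cosine}(a, b) = \frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert},
\qquad
a \cdot b = \sum_{i=1}^{n} a_i b_i,
\qquad
d_{\text{euclidean}}(a, b) = \sqrt{\sum_{i=1}^{n} (a_i - b_i)^2}
$$

As a small worked example, take $a = (1, 0)$ and $b = (2, 0)$. The cosine similarity is $2 / (1 \cdot 2) = 1$ because the vectors point the same way, the dot product is $2$ because it also reflects magnitude, and the Euclidean distance is $1$.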

<details>
<summary>Options</summary>

- **`cosine`**: Measures the cosine of the angle between vectors.

  **When to Use:**

  - **Orientation-focused similarity**: When the magnitude is less important than the direction.
  - **Normalized data**: Ideal when vectors are normalized to unit length.

  **Real-World Scenario:**

  - **Document Similarity**: In text analytics, comparing documents based on topic similarity where word counts are normalized.

- **`dotproduct`**: Computes the scalar (dot) product of two vectors.

  **When to Use:**

  - **Magnitude matters**: When both direction and magnitude are significant.
  - **Machine learning models**: Often used in recommendation systems where strength of preferences is important.

  **Real-World Scenario:**

  - **Personalized Recommendations**: Matching users to products by calculating the dot product of user and item embeddings in a collaborative filtering system.

- **`euclidean`**: Calculates the straight-line distance between vectors.

  **When to Use:**

  - **Spatial distance relevance**: When physical distance correlates with similarity.
  - **High-dimensional data**: Suitable for embeddings where both magnitude and direction impact similarity.

  **Real-World Scenario:**

  - **Anomaly Detection**: Identifying outliers in network traffic patterns by measuring Euclidean distances in feature space.
</details>

---

### **Option Combinations and Preferred Use-Cases**



#### **Combination 1: Low-Dimensional, Exact Searches**

- **`vectorIndexType`**: `flat`
- **`datatype`**: `float32`
- **`dimensions`**: ≤ 505
- **`distanceFunction`**: `cosine`

**Real-World Scenario:**

- **Small-Scale Text Classification**: A startup builds a news categorization tool using word embeddings (300 dimensions). Exact cosine similarity searches ensure accurate article tagging without the overhead of approximate methods.

---

#### **Combination 2: High-Dimensional, Performance-Critical Applications**

- **`vectorIndexType`**: `diskANN`
- **`datatype`**: `float32`
- **`dimensions`**: 768 - 1536
- **`distanceFunction`**: `cosine` or `dotproduct`

**Real-World Scenario:**

- **Real-Time Recommendations**: A streaming service uses user and content embeddings (1024 dimensions) to provide instantaneous movie recommendations. DiskANN accelerates search times, offering a smooth user experience despite the large dataset.
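
As a rough illustration, Combination 2 could be expressed as the container's vector embedding policy and indexing policy shown below. The property names follow the Cosmos DB for NoSQL policy format. The `/embedding` path and the choice of 1024 dimensions are assumptions made for this sketch, not values taken from this notebook.

```json
{
  "vectorEmbeddings": [
    {
      "path": "/embedding",
      "dataType": "float32",
      "distanceFunction": "cosine",
      "dimensions": 1024
    }
  ]
}
```

```json
{
  "indexingMode": "consistent",
  "includedPaths": [{ "path": "/*" }],
  "excludedPaths": [{ "path": "/embedding/*" }],
  "vectorIndexes": [{ "path": "/embedding", "type": "diskANN" }]
}
```

The vector path is listed under `excludedPaths` so that it is served only by the vector index and not also by the regular property index, which generally keeps insert costs lower.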

---

#### **Combination 3: Storage-Efficient High-Dimensional Data**

- **`vectorIndexType`**: `quantizedFlat`
- **`datatype`**: `uint8` or `int8`
- **`dimensions`**: 2048
- **`distanceFunction`**: `cosine`

**Real-World Scenario:**

- **Mobile Visual Search**: An app allows users to search for products by uploading photos. High-dimensional image embeddings are quantized to reduce storage, and the compressed vectors keep searches fast at the cost of some accuracy.

---

#### **Combination 4: Precision-Critical Scientific Computing**

- **`vectorIndexType`**: `flat`
- **`datatype`**: `float32`
- **`dimensions`**: ≤ 505 (the `flat` index limit; 4096-dimensional vectors require `quantizedFlat` or `diskANN`)
- **`distanceFunction`**: `euclidean`

**Real-World Scenario:**

- **Genomic Data Analysis**: Researchers analyze genetic sequences represented as vectors reduced to at most 505 dimensions. Precise Euclidean distance calculations are essential for identifying genetic similarities and mutations.

---

#### **Combination 5: Medium-Dimensional Data with Storage Constraints**

- **`vectorIndexType`**: `quantizedFlat`
- **`datatype`**: `uint8`
- **`dimensions`**: 500
- **`distanceFunction`**: `dotproduct`

**Real-World Scenario:**

- **IoT Sensor Data**: A network of sensors generates medium-dimensional vectors representing environmental data. Quantization reduces storage and transmission costs, and dot product calculations help in identifying patterns and anomalies efficiently.

## Implementation using Financial Datasets

### Setup Containers


1. **`CompanyData`**

    - **Data Types**: `BasicCompanyInfo`, `CompanyOfficer`
    - **Partition Key**: `/Cik`
    - **Id**: `/Cik`, so that each company has exactly one basic-information document
    - **Vector Paths**: ``
    - **Notes**:
        - **Optimized for Company Queries**: Facilitates queries and reports scoped to specific companies.
        - **Rationale**: Storing officers alongside the company record in the same `/Cik` partition avoids cross-partition queries and improves read performance when retrieving a company together with its officers.


2. **`FinancialFilings`**

    - **Data Types**: `Form10KSection`, `Form13D`
    - **Partition Key**: `/Cik`
    - **Id**: `
    - **Indexing**:
        - **Enable Vector Indexing**: For `Form10KSection` embeddings.
    - **Notes**:
        - **Efficient Semantic Search**: Supports AI-driven searches over financial filings.

3. **`MarketData`**

    - **Data Types**: `DailyMarketData`
    - **Partition Key**: `/Symbol`
    - **Notes**:
        - **High Write Throughput**: Allocate sufficient RU/s to handle frequent updates.

4. **`Holdings`**

    - **Data Types**: `Form13FHolding`
    - **Partition Key**: `/Cusip`
    - **Alternate Partition Key**: `/ManagerName` if queries are more often by manager.
    - **Notes**:
        - **Facilitates Cross-Company Queries**: Efficiently retrieve holdings data for reports.

5. **`NewsArticles`**

    - **Data Types**: `NewsArticle`
    - **Partition Key**: `/PublishDate` (e.g., formatted as `yyyy-MM` for monthly partitions)
    - **Indexing**:
        - **Enable Vector Indexing**: For `ArticleText` embeddings.
    - **Notes**:
        - **Time-Based Partitioning**: Improves performance for time-bound queries.
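
To make the list more concrete, a definition for the `FinancialFilings` container might look roughly like the following. This is only a sketch: the container name and the `/Cik` partition key come from the list above, while the `/Embedding` path, the 1536 dimensions, and the `quantizedFlat` and `cosine` choices are assumptions for illustration.

```json
{
  "id": "FinancialFilings",
  "partitionKey": { "paths": ["/Cik"], "kind": "Hash" },
  "vectorEmbeddingPolicy": {
    "vectorEmbeddings": [
      {
        "path": "/Embedding",
        "dataType": "float32",
        "distanceFunction": "cosine",
        "dimensions": 1536
      }
    ]
  },
  "indexingPolicy": {
    "includedPaths": [{ "path": "/*" }],
    "excludedPaths": [{ "path": "/Embedding/*" }],
    "vectorIndexes": [{ "path": "/Embedding", "type": "quantizedFlat" }]
  }
}
```

Containers without vector search, such as `MarketData`, would simply omit the `vectorEmbeddingPolicy` and `vectorIndexes` entries.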


### Classes

- Every document that supports vector search has exactly one embedding field.


In [150]:
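#pragma warning disable SKEXP0001,SKEXP0020

// Namespaces used below; redundant if code/Setup.cs already imports them.
using System.Linq;
using System.Runtime.CompilerServices; // [EnumeratorCancellation]
using System.Text;                     // StringBuilder
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;
using Microsoft.SemanticKernel.Embeddings;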

public class ChatService
{
    private readonly VectorCollection<ChatThread> _chatThreadCollection;
    private readonly VectorCollection<ChatThreadMessage> _chatThreadMessageCollection;
    private readonly VectorCollection<CacheItem> _cacheCollection;
    private readonly Tokenizer _tokenizer;
    private readonly ITextEmbeddingGenerationService _embeddingService;
    private readonly IChatCompletionService _chatCompletionService;
    private readonly Kernel _semanticKernel;

    private const string SystemMessage = """

    """;
    public ChatService(Kernel semanticKernel)
    {
        _semanticKernel = semanticKernel;
        var vectorStore = _semanticKernel.GetRequiredService<VectorStore>();
        _chatThreadCollection = vectorStore.GetContainer<ChatThread>();
        _chatThreadMessageCollection = vectorStore.GetContainer<ChatThreadMessage>();
        _cacheCollection = vectorStore.GetContainer<CacheItem>();
        _tokenizer = _semanticKernel.GetRequiredService<Tokenizer>();
        _chatCompletionService = _semanticKernel.GetRequiredService<IChatCompletionService>();
        _embeddingService = _semanticKernel.GetRequiredService<ITextEmbeddingGenerationService>();
    }

    public async IAsyncEnumerable<ChatThreadMessage> GetChatCompletionAsync(ChatThread thread, string userQuery, [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {   
        var queryMessage = new ChatThreadMessage(
            Id: Guid.NewGuid().ToString(),
            UserId: thread.UserId,
            ThreadId: thread.ThreadId,
            MessageContent: new ChatMessageContent(
                content: userQuery,
                role: AuthorRole.User
            )
        )
        {
            Tokens = _tokenizer.CountTokens(userQuery)
        };
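        // Enumerable.Append returns a new sequence instead of mutating, so assign the result back
        thread.Messages = thread.Messages.Append(queryMessage).ToList();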
        await _chatThreadMessageCollection.UpsertAsync(queryMessage, cancellationToken: cancellationToken);

        // Generate embedding for the user query and check the cache
        var queryEmbedding = await _embeddingService.GenerateEmbeddingAsync(userQuery, cancellationToken: cancellationToken);
        var cacheItems = await _cacheCollection.GetNearestMatchesAsync(
            embedding: queryEmbedding,
            fields: new string[] { "id", "prompts", "completion" },
            limit: 1,
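            // NOTE: a threshold of 0.00 presumably accepts any nearest neighbor as a cache hit; raise it for stricter matching
            minRelevanceScore: 0.00,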
            cancellationToken: cancellationToken
        ).ToListAsync(cancellationToken);

        if(cacheItems.Any())
        {
            var (cacheItem, relevanceScore) = cacheItems.First();
            cacheItem.RegisterHit();
            await _cacheCollection.UpsertAsync(cacheItem, cancellationToken: cancellationToken);
            var response = new ChatThreadMessage(
                Id: Guid.NewGuid().ToString(),
                UserId: thread.UserId,
                ThreadId: thread.ThreadId,
                MessageContent: new ChatMessageContent(
                    content: cacheItem.Completion,
                    role: AuthorRole.Assistant
                )
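            )
            {
                CacheHit = true,
                Tokens = cacheItem.Tokens
            };
            // Enumerable.Append returns a new sequence instead of mutating, so assign the result back
            thread.Messages = thread.Messages.Append(response).ToList();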
            await _chatThreadMessageCollection.UpsertAsync(response, cancellationToken: cancellationToken);
            yield return response;
        }
        else
        {
            // Cache miss: perform streaming chat completion
            // Initialize a StringBuilder to accumulate the response
            var contextWindow = thread.GetContextWindow();
            var responseBuilder = new StringBuilder();
            // Create a new ChatThreadMessage to hold the streamed content
            var streamingResponse = new ChatThreadMessage(
                Id: Guid.NewGuid().ToString(),
                UserId: thread.UserId,
                ThreadId: thread.ThreadId,
                MessageContent: new ChatMessageContent(
                    content: string.Empty, // Will be filled incrementally
                    role: AuthorRole.Assistant
                )
            )
            {
                CacheHit = false,
                FinishedStream = false
            };

            // Save the initial streaming message
            await _chatThreadMessageCollection.UpsertAsync(streamingResponse, cancellationToken: cancellationToken);

            // Stream the response from the chat completion service
            await foreach (var partialContent in GetChatCompletionStreamingAsync(thread, userQuery, streamingResponse, responseBuilder, cancellationToken))
            {
                // The helper updates streamingResponse in place; yield the same instance after each chunk
                yield return streamingResponse;
            }

            var finalResponseString = responseBuilder.ToString();

            // After streaming completes, set the final content and persist
            var finalResponse = new ChatThreadMessage(
                Id: streamingResponse.Id,
                UserId: streamingResponse.UserId,
                ThreadId: streamingResponse.ThreadId,
                MessageContent: new ChatMessageContent(
                    content: finalResponseString,
                    role: AuthorRole.Assistant
                )
            )
            {
                CacheHit = false,
                Tokens = _tokenizer.CountTokens(finalResponseString),
                FinishedStream = true
            };
            // Update the streaming message with the final content
            await _chatThreadMessageCollection.UpsertAsync(finalResponse, cancellationToken: cancellationToken);

            
            // Cache the completion, keyed by the context window that produced it
            await AddToCacheAsync(contextWindow, finalResponseString, cancellationToken);
            
            // Yield the final complete message
            yield return finalResponse;
        }
    }

    /// <summary>
    /// Streams chat completion content from the chat completion service.
    /// Updates the streaming response message incrementally.
    /// </summary>
    private async IAsyncEnumerable<string> GetChatCompletionStreamingAsync(
        ChatThread thread,
        string userQuery,
        ChatThreadMessage streamingResponse,
        StringBuilder responseBuilder,
        [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        // Call the chat completion service to get streaming content
        await foreach (var partialContent in _chatCompletionService.GetStreamingChatMessageContentsAsync(
            chatHistory: thread,
            kernel: this._semanticKernel,
            cancellationToken: cancellationToken))
        {
            // Append the partial content to the response builder
            responseBuilder.Append(partialContent.Content);

            // Update the streaming response's text (InnerContent holds the provider's raw response object, not the text)
            var contentSoFar = responseBuilder.ToString();
            streamingResponse.MessageContent.Content = contentSoFar;

            // Keep the token count in sync with the accumulated text
            streamingResponse.Tokens = _tokenizer.CountTokens(contentSoFar);

            // Optionally, persist the updated streaming response while streaming
            // await _chatThreadMessageCollection.UpsertAsync(streamingResponse, cancellationToken: cancellationToken);

            // Yield the newly received chunk of text
            yield return partialContent.Content;
        }
    }

    public async Task<ChatThread> CreateChatThreadAsync(string userId, string displayName, CancellationToken cancellationToken = default)
    {
        var thread = new ChatThread(
            ThreadId: Guid.NewGuid().ToString(),
            UserId: userId,
            DisplayName: displayName
        );
        await _chatThreadCollection.UpsertAsync(thread, cancellationToken: cancellationToken);
        return thread;
    }

    public async Task<ChatThread> GetChatThreadAsync(ChatThread thread, int maxMessages = 50, CancellationToken cancellationToken = default)
    {
        var foundThread = await _chatThreadCollection.GetAsync(thread, cancellationToken);
        if (foundThread == null)
            throw new KeyNotFoundException($"ChatThread with ID {thread.Id} not found.");

        var messages = GetRecentMessagesForThreadAsync(foundThread, maxMessages, cancellationToken);
        foundThread.Messages = await messages.ToListAsync(cancellationToken);
        return foundThread;
    }

    public async Task<List<ChatThread>> GetChatThreadsForUser(string userId, int maxResults = 10, CancellationToken cancellationToken = default)
    {   
        List<ChatThread> results = new();
        var items = _chatThreadCollection.FindItems(
          predicate: (x) => x.UserId == userId && x.Type == "ChatThread", 
          select: (s) => new { s.Id, s.UserId, s.DisplayName, s.Type }, 
          maxResults: maxResults, 
          cancellationToken: cancellationToken);

        await foreach (var item in items)
        {
            results.Add(item);
        }
        return results;
    } 

    /// <summary>
    /// Retrieves up to <paramref name="maxItems"/> messages from the given chat thread.
    /// </summary>
    private async IAsyncEnumerable<ChatThreadMessage> GetRecentMessagesForThreadAsync(ChatThread thread, int maxItems = 50, [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        var items = _chatThreadMessageCollection.FindItems(
            predicate: (x) => x.UserId == thread.UserId && x.ThreadId == thread.ThreadId && x.Type == "ChatThreadMessage",
            select: (s) => new { s.Id, s.ThreadId, s.MessageContent, s.Tokens, s.CreatedAt, s.Type },
            maxResults: maxItems,
            cancellationToken: cancellationToken
        );

        await foreach (var item in items)
        {
            yield return item;
        }
    }
    /// <summary>
    /// Adds a new cache item to the cache collection.
    /// </summary>
    private async Task AddToCacheAsync(string contextWindow, string completion, CancellationToken cancellationToken = default)
    {
        // Generate embedding for the contextWindow
        var embedding = await _embeddingService.GenerateEmbeddingAsync(contextWindow, cancellationToken: cancellationToken);
        var cacheItem = new CacheItem(
            Id: Guid.NewGuid().ToString(),
            Prompts: contextWindow,
            Completion: completion,
            Embedding: embedding
        );

        await _cacheCollection.UpsertAsync(cacheItem, updateEmbeddingFields: false, cancellationToken: cancellationToken);
    }
}


In [None]:
#pragma warning disable SKEXP0001,SKEXP0020

public class CompanyFinancialsAgent
{
    public const string AgentName = "CompanyFinancialsAgent";

    private readonly VectorCollection<CompanyInfo> _companyInfoCollection;
    
    public CompanyFinancialsAgent(Kernel semanticKernel)
    {
        var vectorStore = semanticKernel.GetRequiredService<VectorStore>();
        _companyInfoCollection = vectorStore.GetContainer<CompanyInfo>();
    }


}

// Resolved once here and captured by the factory lambda below
var database = cosmosNoSqlService.databaseClient;
skBuilder.Services.AddTransient(sp => {
  var tokenizer = sp.GetRequiredService<Tokenizer>();
  var textEmbeddingService = sp.GetRequiredService<ITextEmbeddingGenerationService>();
  return new VectorStore(database, tokenizer, textEmbeddingService);
});
skBuilder.Services.AddKeyedSingleton<ChatCompletionAgent>(CompanyFinancialsAgent.AgentName, (sp,key) => new ChatCompletionAgent(){
  Instructions = "",
  Name = "",
  Kernel = sp.GetRequiredService<Kernel>().Clone(),
});
var skInstance = skBuilder.Build();
var ragContextBuilder = new RagContextBuilder(skInstance);
// var vectorStore = skInstance.GetRequiredService<VectorStore>();
// await vectorStore.CreateContainers(); 

# Semantic Search on Cosmos DB

# Structured Database Copilot
NL2SQL - Database query generation

### Considerations
- The model usually builds most of the database query correctly, but the `WHERE` clause often needs prompt tuning or native functions to get right.

**Example User Stories**
- I have application monitoring or metric data that I want to derive insights from. 
- I want to chat over the entire corpus of ServiceNow or other ICM support ticket information.