# Semantic Search Quick Start

This interactive notebook introduces some basic operations with **Elasticsearch**, using the official `Elastic.Clients.Elasticsearch` .NET client. You'll perform semantic search using an Azure OpenAI embedding model (`text-embedding-3-small`) to embed text. You'll also see how traditional text-based search and semantic search fit together in a hybrid search system.

## Create Elastic Deployment

A separate notebook runs Elasticsearch locally using *Testcontainers*. Open [src/_infra/setup-elastic-infrastructure.ipynb](../_infra/setup-elastic-infrastructure.ipynb) and run it to start a local Elasticsearch container.

### Install packages and import modules

In [None]:
#r "nuget: Elastic.Clients.Elasticsearch, 8.15.10"
#r "nuget: System.Net.Http.Json, 8.0.1"

#!import ./Utils.cs
#!import ../_infra/get-connection-string.ipynb

## Initialize the Elasticsearch client

Now, we need to initialize the Elasticsearch client. We will use the [Elasticsearch client for .NET](https://www.elastic.co/guide/en/elasticsearch/client/net-api/current/index.html) to connect to Elasticsearch.

In [None]:
using Elastic.Transport;
using Elastic.Clients.Elasticsearch;
using Elastic.Transport.Products.Elasticsearch;

var elasticSettings = new ElasticsearchClientSettings(connectionString)
    .DisableDirectStreaming()
    .ServerCertificateValidationCallback(CertificateValidations.AllowAll);

var client = new ElasticsearchClient(elasticSettings);

## Test the Client

Before you continue, confirm that the client has connected with this test.


In [None]:
var info = await client.InfoAsync();

DumpResponse(info);

## Set Up the Embedding Model


In [None]:
#r "nuget: Microsoft.Extensions.AI.OpenAI, 9.0.0-preview.*"
#r "nuget: Azure.AI.OpenAI, 2.0.0"

In [None]:
using Azure.AI.OpenAI;
using Microsoft.Extensions.AI;

AzureOpenAIClient aiClient = new AzureOpenAIClient(
    new Uri(envs["AZURE_OPENAI_ENDPOINT"]),
    new System.ClientModel.ApiKeyCredential(envs["AZURE_OPENAI_APIKEY"]));

IEmbeddingGenerator<string,Embedding<float>> generator = aiClient
    .AsEmbeddingGenerator(modelId: "text-embedding-3-small");

// Vector size requested from the model; the Dims value in the index mapping must match it.
var textEmeddingDimension = 384;

## Index some test data
Our client is set up and connected to our Elastic deployment. Now we need some data to test out the basics of Elasticsearch queries. We'll use a small index of books with the following fields:

In [None]:
using System.Text.Json.Serialization;

public class Book
{
    [JsonPropertyName("title")]
    public string Title { get; set; }

    [JsonPropertyName("summary")]
    public string Summary { get; set; }

    [JsonPropertyName("authors")]
    public List<string> Authors { get; set; }

    [JsonPropertyName("publish_date")]
    public DateTime publish_date { get; set; }

    [JsonPropertyName("num_reviews")]
    public int num_reviews { get; set; }

    [JsonPropertyName("publisher")]
    public string Publisher { get; set; }


    public float[] TitleVector { get; set; }
}
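
The `[JsonPropertyName]` attributes decide which JSON field feeds each property. As a minimal sketch of that binding, the following deserializes one document with `System.Text.Json`. The `SampleBook` type and the sample values are hypothetical and only mirror a few of the fields above. They are not taken from the dataset.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical sample document using snake_case field names.
var json = @"{
  ""title"": ""Example Book"",
  ""authors"": [""A. Author"", ""B. Writer""],
  ""publish_date"": ""2019-10-29"",
  ""num_reviews"": 30
}";

// Each JSON field is bound to the property whose JsonPropertyName matches it.
var sample = JsonSerializer.Deserialize<SampleBook>(json)!;

Console.WriteLine($"{sample.Title}: {sample.Authors.Count} authors, {sample.NumReviews} reviews");

// Hypothetical type for illustration; PascalCase properties, snake_case JSON.
public class SampleBook
{
    [JsonPropertyName("title")]
    public string Title { get; set; } = "";

    [JsonPropertyName("authors")]
    public List<string> Authors { get; set; } = new();

    [JsonPropertyName("publish_date")]
    public DateTime PublishDate { get; set; }

    [JsonPropertyName("num_reviews")]
    public int NumReviews { get; set; }
}
```

A property without the attribute is bound by its own name under the serializer's naming policy, so it is worth checking how such a property ends up named in the stored document.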

## Create an index

First ensure that you do not have a previously created index with the name `book_index`.

In [None]:
var deleteIndexResponse = await client.Indices.DeleteAsync("book_index");

Dump(deleteIndexResponse);

Let's create an Elasticsearch index with the correct mappings for our test data.

In [None]:
using Elastic.Clients.Elasticsearch;
using Elastic.Clients.Elasticsearch.IndexManagement;
using Elastic.Clients.Elasticsearch.Mapping;

var indexResponse = await client.Indices.CreateAsync<Book>("book_index", d =>
    d.Mappings(m => m
        .Properties(pp => pp
            .Text(p => p.Title)
            .DenseVector(Infer.Property<Book>(p => p.TitleVector),
                d => d
                    .Dims(textEmeddingDimension)
                    .Index(true)
                    .Similarity(DenseVectorSimilarity.Cosine))
            .Text(p => p.Summary)
            .Date(p => p.publish_date)
            .IntegerNumber(p => p.num_reviews)
            .Keyword(p => p.Publisher)
            .Keyword(p => p.Authors)
        )
    ));

DumpRequest(indexResponse);

## Index test data

Run the following cell to download some test data: information about 10 popular programming books, taken from the `data.json` file in the `elastic/elasticsearch-labs` repository.

In [None]:
using System.Net.Http;
using System.Net.Http.Json;

var http = new HttpClient();
var url = "https://raw.githubusercontent.com/elastic/elasticsearch-labs/main/notebooks/search/data.json";
var books =  await http.GetFromJsonAsync<Book[]>(url);

// books.DisplayTable();

In [None]:
async Task<float[]> ToEmbedding(string text) {
    GeneratedEmbeddings<Embedding<float>> embeddings = await generator
        .GenerateAsync(text, new EmbeddingGenerationOptions{
            AdditionalProperties = new AdditionalPropertiesDictionary{
                {"dimensions", textEmeddingDimension}
            }
        });

    return embeddings.First().Vector.ToArray();
}

var embedding = await ToEmbedding("The quick brown fox jumps over the lazy dog");
display($"Dimensions length = {embedding.Length}");


`ToEmbedding` will encode the text into a vector on the fly, using the model we initialized earlier.

In [None]:
foreach(var book in books)
{
    book.TitleVector = await ToEmbedding(book.Title);
}
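
The loop above sends one embedding request per title. For larger datasets it may be worth grouping texts into batches, assuming the embedding service accepts several inputs per request. The sketch below shows only the batching logic. `EmbedBatchAsync` is a hypothetical stand-in that returns fake one-element vectors, so the code runs without a model.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

var requestCount = 0;

// Hypothetical stand-in for a batched embedding call (not a real API):
// returns one fake vector per input and counts how often it is called.
Task<float[][]> EmbedBatchAsync(IReadOnlyList<string> texts)
{
    requestCount++;
    return Task.FromResult(texts.Select(t => new float[] { t.Length }).ToArray());
}

async Task<Dictionary<string, float[]>> EmbedAllAsync(IEnumerable<string> texts, int batchSize)
{
    var result = new Dictionary<string, float[]>();

    // Enumerable.Chunk splits the input into arrays of at most batchSize items.
    foreach (var batch in texts.Chunk(batchSize))
    {
        var vectors = await EmbedBatchAsync(batch);
        for (var i = 0; i < batch.Length; i++)
        {
            result[batch[i]] = vectors[i];
        }
    }

    return result;
}

var titles = new[] { "Title A", "Title B", "Title C", "Title D", "Title E" };
var embedded = await EmbedAllAsync(titles, batchSize: 2);

Console.WriteLine($"Embedded {embedded.Count} titles in {requestCount} requests");
// prints "Embedded 5 titles in 3 requests"
```

For the 10 books used here the sequential loop is fine. Batching mainly pays off when request latency dominates.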

Now we can use the Bulk API to upload the data to Elasticsearch.

In [None]:
var bulkResponse = await client.BulkAsync("book_index", d => d
    .IndexMany<Book>(books, (bd, b) => bd.Index("book_index"))
);

bulkResponse.Display();

## Making queries

Let's run a keyword search first to check that the data was indexed.

In [None]:
var searchResponse = await client.SearchAsync<Book>(s => s
    .Index("book_index")
    .Query(q => q.Match(m => m.Field(f => f.Title).Query("JavaScript")))
);

DumpRequest(searchResponse);
searchResponse.Documents.Select(x => x.Title).DisplayTable();
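
The `title` field is mapped as `text`, so its values are analyzed (split into tokens and lowercased) at index time, and the query string goes through the same analysis. That is why a `match` query for "JavaScript" can find titles regardless of letter case. The sketch below imitates this with a deliberately simplified analyzer. It is a rough approximation for illustration, not the Elasticsearch implementation, and the title is a hypothetical example.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Simplified stand-in for text analysis: lowercase, then split on
// anything that is not a letter or digit.
string[] Analyze(string text) =>
    Regex.Split(text.ToLowerInvariant(), "[^a-z0-9]+")
         .Where(token => token.Length > 0)
         .ToArray();

// A match query (default OR operator) hits when the field and the query
// share at least one token after analysis.
bool Matches(string fieldValue, string query) =>
    Analyze(fieldValue).Intersect(Analyze(query)).Any();

var title = "Learning JavaScript Basics"; // hypothetical title

Console.WriteLine(Matches(title, "JavaScript")); // True
Console.WriteLine(Matches(title, "javascript")); // True
Console.WriteLine(Matches(title, "Python"));     // False
```

By contrast, a `keyword` field is stored as a single unanalyzed value, so it only matches the exact string.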

Now that we have indexed the books, we want to perform a semantic search for books that are similar to a given query. We embed the query, run a kNN search against the `TitleVector` field, and keep only the hits whose score exceeds a threshold.

In [None]:
var searchQuery = "javascript books";
var queryEmbedding = await ToEmbedding(searchQuery);
var searchResponse = await client.SearchAsync<Book>(s => s
    .Index("book_index")
    .Knn(d => d
        .Field(f => f.TitleVector)
        .QueryVector(queryEmbedding)
        .k(5)
        .NumCandidates(100))
);

var threshold = 0.7;
searchResponse.Hits
    .Where(x => x.Score > threshold)
    .Select(x => new { x.Source.Title, x.Score })
    .DisplayTable();
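
The index mapping uses cosine similarity, and for that setting Elasticsearch documents the kNN `_score` as `(1 + cosine) / 2`. A score threshold of 0.7 therefore corresponds to a cosine similarity above 0.4. The sketch below computes both values for small hand-made vectors. The vectors are illustrative only and are not real embeddings.

```csharp
using System;

// Cosine similarity: dot product divided by the product of the vector lengths.
double CosineSimilarity(float[] a, float[] b)
{
    if (a.Length != b.Length)
    {
        throw new ArgumentException("Vectors must have the same length.");
    }

    double dot = 0, normA = 0, normB = 0;
    for (var i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }

    return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
}

// Maps a cosine value in [-1, 1] to a kNN score in [0, 1].
double ToKnnScore(double cosine) => (1 + cosine) / 2;

var query = new float[] { 1f, 0f };
var sameDirection = new float[] { 2f, 0f };
var orthogonal = new float[] { 0f, 3f };
var opposite = new float[] { -1f, 0f };

Console.WriteLine(ToKnnScore(CosineSimilarity(query, sameDirection))); // same direction, highest score
Console.WriteLine(ToKnnScore(CosineSimilarity(query, orthogonal)));    // unrelated, middle of the range
Console.WriteLine(ToKnnScore(CosineSimilarity(query, opposite)));      // opposite direction, lowest score
```

A suitable threshold depends on the embedding model and the data, so treat 0.7 as a starting point to tune rather than a fixed rule.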

## Filtering

Filter context is mostly used for filtering structured data. For example, use filter context to answer questions like:

* Does this timestamp fall into the range 2015 to 2016?
* Is the status field set to "published"?

Filter context is in effect whenever a query clause is passed to a filter parameter, such as the `filter` or `must_not` parameters in a `bool` query.

[Learn more](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-filter-context.html#filter-context) about filter context in the Elasticsearch docs.
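
Conceptually, a filter clause is a yes/no test: it decides whether a document stays in the result set but does not change its relevance score. The sketch below applies the two example questions above to a few hypothetical in-memory documents. The tuple fields and values are invented for illustration.

```csharp
using System;
using System.Linq;

// Hypothetical documents; Score stands for relevance computed elsewhere.
var docs = new[]
{
    (Id: 1, Status: "published", Timestamp: new DateTime(2015, 6, 1), Score: 0.9),
    (Id: 2, Status: "draft",     Timestamp: new DateTime(2015, 8, 1), Score: 0.8),
    (Id: 3, Status: "published", Timestamp: new DateTime(2018, 1, 1), Score: 0.7),
    (Id: 4, Status: "published", Timestamp: new DateTime(2016, 3, 1), Score: 0.4),
};

var rangeStart = new DateTime(2015, 1, 1);
var rangeEnd = new DateTime(2017, 1, 1); // exclusive, so all of 2016 is included

var hits = docs
    // Filters only include or exclude; they never touch Score.
    .Where(d => d.Status == "published")
    .Where(d => d.Timestamp >= rangeStart && d.Timestamp < rangeEnd)
    // Ranking still uses the unchanged score.
    .OrderByDescending(d => d.Score)
    .ToList();

foreach (var hit in hits)
{
    Console.WriteLine($"Document {hit.Id}");
}
// prints "Document 1" then "Document 4"
```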

### Example: Keyword Filtering
This is an example of adding a keyword filter to the query.

The example retrieves the top books that are similar to "javascript books" based on their title vectors and that also have Addison-Wesley as the publisher.

In [None]:
var searchQuery = "javascript books";
var queryEmbedding = await ToEmbedding(searchQuery);
var searchResponse = await client.SearchAsync<Book>(s => s
    .Index("book_index")
    .Knn(d => d
        .Field(f => f.TitleVector)
        .QueryVector(queryEmbedding)
        .k(5)
        .NumCandidates(100)
        .Filter(f => f.Term(t => t.Field(p => p.Publisher).Value("addison-wesley"))) 
    )
);

searchResponse.Hits
    .Select(x => new { x.Source.Title, x.Score })
    .DisplayTable(); 
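
A filter placed inside the `knn` clause is applied during the kNN search itself, so Elasticsearch can still return up to `k` matching documents. Filtering after the top `k` have been selected may leave fewer results, or none. The brute-force sketch below contrasts the two orders on a hypothetical catalog with precomputed similarities. It illustrates the idea only and does not reflect how the approximate search works internally.

```csharp
using System;
using System.Linq;

// Hypothetical catalog; Similarity is the similarity to some query vector.
var catalog = new[]
{
    (Title: "Book A", Publisher: "other",          Similarity: 0.95),
    (Title: "Book B", Publisher: "other",          Similarity: 0.90),
    (Title: "Book C", Publisher: "addison-wesley", Similarity: 0.85),
    (Title: "Book D", Publisher: "other",          Similarity: 0.80),
    (Title: "Book E", Publisher: "addison-wesley", Similarity: 0.75),
};

var k = 2;

// Pre-filtering: restrict the candidates first, then take the k nearest.
var preFiltered = catalog
    .Where(b => b.Publisher == "addison-wesley")
    .OrderByDescending(b => b.Similarity)
    .Take(k)
    .ToList();

// Post-filtering: take the k nearest overall, then drop non-matching ones.
var postFiltered = catalog
    .OrderByDescending(b => b.Similarity)
    .Take(k)
    .Where(b => b.Publisher == "addison-wesley")
    .ToList();

Console.WriteLine($"Pre-filtered: {preFiltered.Count}, post-filtered: {postFiltered.Count}");
// prints "Pre-filtered: 2, post-filtered: 0"
```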