## AI Search Experimentation
Experimenting with different search configurations (vector, hybrid, exhaustive KNN and semantic) on the same index to find which returns the best results.

In [6]:
from azure.core.credentials import AzureKeyCredential  
from azure.search.documents import SearchClient
from azure.search.documents.models import (
    QueryAnswerType,
    QueryCaptionType,
    QueryType,
    VectorizedQuery   
)
from dotenv import dotenv_values
from langchain_community.embeddings import HuggingFaceEmbeddings

In [5]:
# Load Azure AI Search credentials from credential.env
config = dotenv_values('credential.env')
ai_search_location = config['ai_search_location']
ai_search_key = config['ai_search_key']
ai_search_url = config['ai_search_url']
ai_search_index = 'oewg-speech-meeeting-index'
ai_search_name = 'oewg-meeting'
embedding_length = 768
cog_search_cred = AzureKeyCredential(ai_search_key)
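
`dotenv_values` returns a plain dict, so a missing or empty entry in `credential.env` only surfaces later as a `KeyError` or an authentication failure. Below is a minimal sketch of failing fast instead. `require_settings` is a hypothetical helper and is not part of python-dotenv.

```python
from typing import Mapping, Optional


def require_settings(config: Mapping[str, Optional[str]], keys: list[str]) -> dict[str, str]:
    """Return the requested settings, raising if any are missing or empty.

    Hypothetical helper: dotenv_values() maps a key written without a value
    to None, so both absent and empty entries are treated as missing here.
    """
    missing = [k for k in keys if not config.get(k)]
    if missing:
        raise ValueError(f"Missing settings in credential.env: {', '.join(missing)}")
    return {k: str(config[k]) for k in keys}
```

Usage in this notebook might look like `settings = require_settings(config, ["ai_search_key", "ai_search_url"])`.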

In [7]:
# Convert text to vector embeddings.
# The model is loaded once and reused; all-mpnet-base-v2 returns 768-dimensional vectors,
# matching embedding_length above.
embedding_model = HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")

def generate_embeddings(text):
    return embedding_model.embed_query(text)
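
The index fields were defined with `embedding_length = 768`. If the embedding model is ever swapped, a query vector of a different size should be rejected by the service, so a cheap local check before searching can save a round trip. Below is a minimal sketch. `check_embedding` is a hypothetical helper and does not load the model itself.

```python
import math
from typing import Sequence


def check_embedding(vector: Sequence[float], expected_dim: int = 768) -> list[float]:
    """Validate a query embedding before sending it to the index.

    Hypothetical helper: checks the dimension against the index field
    definition and rejects NaN or infinite values.
    """
    if len(vector) != expected_dim:
        raise ValueError(f"Expected {expected_dim} dimensions, got {len(vector)}")
    if not all(math.isfinite(x) for x in vector):
        raise ValueError("Embedding contains NaN or infinite values")
    return [float(x) for x in vector]
```

For example, `question = check_embedding(generate_embeddings(search_word), embedding_length)`.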

In [13]:
search_word = 'Russia'

In [8]:
question = generate_embeddings(search_word)


In [11]:
#vector search
search_client = SearchClient(ai_search_url, ai_search_index, AzureKeyCredential(ai_search_key))  
vector_query = VectorizedQuery(vector=question, 
                               k_nearest_neighbors=3, 
                               fields="SpeakerEmbeddings")

results = search_client.search(  
    search_text=None,  
    vector_queries=[vector_query],
    select=["Speaker", "Text","Meeting", "Session"],
    top=5
)  
  
for result in results:  
    print(f"Score: {result['@search.score']}")  
    print(f"Captions: {result['@search.captions']}")  
    print(f"Highlights: {result['@search.highlights']}")  
    print(f"Text: {result['Text']}\n")  
    print(f"Country: {result['Speaker']}\n")  
    print(f"Meeting: {result['Meeting']}\n")  
    print(f"Session: {result['Session']}\n")  
    print("###############################") 

Score: 1.0000001
Captions: None
Highlights: None
Text: space for undertaking military activity With a view to ensuring their dominance And prevalence superiority. There is a need as an intermediary measure to affirm the predominance Of the existing and enforce international legal norms and principles Regulating activities in space. First and foremost, the 1967 Space Treaty And the resolution of the first UN General Assembly Special Session on Disarmament of 1978 Ssot 1As well as a series of other documents which we have made reference to yesterday morning. At

Country: Russia

Meeting: 5

Session: 3

###############################
Score: 1.0000001
Captions: None
Highlights: None
Text: to provide its recommendations to the UNGA. Moreover, within that Commission Next year, for example, there is a plan for the review of the UN Secretary General's report. Which is in the near future to be prepared, pursuant to resolution the Union can resolution On transparency and confidence building mea
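
Every search cell in this notebook repeats the same print loop. Below is a minimal sketch of factoring it into one function. `format_result` and `print_results` are hypothetical names. The sketch only assumes that each result behaves like a dict, as the items yielded by `SearchClient.search` do. Using `.get()` also avoids a `KeyError` when a field such as `@search.captions` is absent.

```python
from typing import Any, Iterable, Mapping

# Index field name -> label to print. Speaker can also be "Chairman",
# so it is labelled Speaker rather than Country here.
DEFAULT_FIELDS = {
    "@search.score": "Score",
    "Text": "Text",
    "Speaker": "Speaker",
    "Meeting": "Meeting",
    "Session": "Session",
}


def format_result(result: Mapping[str, Any], fields: Mapping[str, str] = DEFAULT_FIELDS) -> str:
    """Render one search result as labelled lines; missing fields print as None."""
    return "\n".join(f"{label}: {result.get(name)}" for name, label in fields.items())


def print_results(results: Iterable[Mapping[str, Any]], fields: Mapping[str, str] = DEFAULT_FIELDS) -> int:
    """Print each result followed by a separator and return how many were printed."""
    count = 0
    for result in results:
        print(format_result(result, fields))
        print("#" * 31)
        count += 1
    return count
```

Each cell could then end with `print_results(results)`, passing a larger `fields` mapping (for example with `@search.captions`) where needed.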

In [16]:
# Hybrid search (keyword + vector) on the TextEmbeddings field
search_client = SearchClient(ai_search_url, ai_search_index, AzureKeyCredential(ai_search_key))  
vector_query = VectorizedQuery(vector=question, 
                               k_nearest_neighbors=3, 
                               fields="TextEmbeddings")

results = search_client.search(  
    search_text=search_word,  
    vector_queries=[vector_query],
    select=["Speaker", "Text","Meeting", "Session"],
    top=3
)  
  
for result in results:  
    print(f"Score: {result['@search.score']}")  
    print(f"Text: {result['Text']}\n")  
    print(f"Country: {result['Speaker']}\n")  
    print(f"Meeting: {result['Meeting']}\n")  
    print(f"Session: {result['Session']}\n") 
    print("###############################")

Score: 0.027893736958503723
Text: What's but esteemed chairperson the Russian Federation considers Outer space as an exclusively peaceful environment, Space vehicles Execute numerous and very varied functions. This includes supporting communications Research into the surface of the Earth And the Near Earth Space orbit and much more Satellites in orbit And also space systems and equipment. By their very nature, can be deployed. Both for peaceful and military purposes. Therefore, any assessment of dual use potential Should be conducted on the

Country: Russia

Meeting: 5

Session: 3

###############################
Score: 0.01666666753590107
Text: countries, including those identified as potential drop zones of re-entering debris from the launch that pose a potential risk of injury to people or damage or destruction to property. Thank you, Mr. Chair.

Country: Philippines

Meeting: 5

Session: 3

###############################
Score: 0.01666666753590107
Text: Brazil therefore believes t

In [15]:
# Hybrid search (keyword + vector) on the SpeakerEmbeddings field
search_client = SearchClient(ai_search_url, ai_search_index, AzureKeyCredential(ai_search_key))  
vector_query = VectorizedQuery(vector=question, 
                               k_nearest_neighbors=3, 
                               fields="SpeakerEmbeddings")

results = search_client.search(  
    search_text=search_word,  
    vector_queries=[vector_query],
    select=["Speaker", "Text","Meeting", "Session"],
    top=3
)  
  
for result in results:  
    print(f"Score: {result['@search.score']}")  
    print(f"Text: {result['Text']}\n")  
    print(f"Country: {result['Speaker']}\n")  
    print(f"Meeting: {result['Meeting']}\n")  
    print(f"Session: {result['Session']}\n") 
    print("###############################")

Score: 0.028861789032816887
Text: the framework of this working group's work on various different issues. It never brings anything positive, it simply leads to possible discrimination by certain Member States Against other Member States. We consider this Unnecessary We consider this to be counterproductive from the standpoint of international law and ultimately. The aim of ensuring space security and the security of space activities, therefore We call Upon colleagues to strictly base themselves on legally binding international norms and

Country: Russia

Meeting: 5

Session: 3

###############################
Score: 0.0265975221991539
Text: as I previously said, intentions can be interpreted in very different ways and they may look initially very different. They may intentions may differ greatly from actual actions in the real world. In developing various voluntary norms in the form of transparency and confidence building measures, which Create restrictions for the military use of elem

In [17]:
#Exhaustive KNN Search and speaker embeddings
search_client = SearchClient(ai_search_url, ai_search_index, AzureKeyCredential(ai_search_key)) 
vector_query = VectorizedQuery(vector=question, k_nearest_neighbors=3, 
                               fields="SpeakerEmbeddings",exhaustive=True)

results = search_client.search(  
    search_text=None,  
    vector_queries=[vector_query],
    select=["Speaker", "Text","Meeting", "Session"],
    top=3
)  
  
for result in results:  
    print(f"Score: {result['@search.score']}")  
    print(f"Captions: {result['@search.captions']}")  
    print(f"Highlights: {result['@search.highlights']}")  
    print(f"Text: {result['Text']}\n")  
    print(f"Country: {result['Speaker']}\n")  
    print(f"Meeting: {result['Meeting']}\n")  
    print(f"Session: {result['Session']}\n")  
    print("###############################") 

Score: 1.0
Captions: None
Highlights: None
Text: Mr. Chairman. We would like to take the floor on both sub items on the topic. Two subtopic A&B, In conjunction in this connection. We would like to underscore.  That outer space capabilities, also called anti space capabilities are one of the key categories of systems. Which require very serious consideration and regulation as part of the work to prevent an arms race in outer space. It is our view that discussion of norms, rules, and principles, Concerning such systems. Require consideration of the aspects

Country: Russia

Meeting: 5

Session: 3

###############################
Score: 1.0
Captions: None
Highlights: None
Text: It is our view that discussion of norms, rules, and principles, Concerning such systems. Require consideration of the aspects of the functioning of those systems that are both Earth and space based. As we have already noted in light of the growing threat of weaponization of space in connection with the policy of a 
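
With `exhaustive=True`, the service compares the query against every vector in the field instead of walking the approximate index (typically HNSW). Results are exact, but each query costs more. The scores of 1.0 above are what an identical vector would produce. This is presumably because the query "Russia" matches the embedded Speaker value. Below is a minimal brute-force cosine k-NN sketch over an in-memory matrix. It assumes NumPy and a cosine metric on the field. Azure converts similarity into `@search.score` according to the configured metric, and this sketch does not reproduce that conversion.

```python
import numpy as np


def exhaustive_knn(query: np.ndarray, vectors: np.ndarray, k: int = 3) -> list[tuple[int, float]]:
    """Exact k-nearest neighbours by cosine similarity (brute force).

    query:   shape (d,)
    vectors: shape (n, d), one row per indexed document; all rows assumed non-zero
    Returns (row_index, cosine_similarity) pairs, most similar first.
    """
    q = query / np.linalg.norm(query)
    v = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sims = v @ q  # cosine similarity against every stored vector
    k = min(k, len(sims))
    top = np.argsort(-sims, kind="stable")[:k]
    return [(int(i), float(sims[i])) for i in top]
```

The cost grows linearly with the number of stored vectors. That is why approximate search is the default and exhaustive search is mainly useful as a ground truth.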

In [18]:
#Exhaustive KNN Search and text embeddings
search_client = SearchClient(ai_search_url, ai_search_index, AzureKeyCredential(ai_search_key)) 
vector_query = VectorizedQuery(vector=question, k_nearest_neighbors=3, 
                               fields="TextEmbeddings",exhaustive=True)

results = search_client.search(  
    search_text=None,  
    vector_queries=[vector_query],
    select=["Speaker", "Text","Meeting", "Session"],
    top=3
)  
  
for result in results:  
    print(f"Score: {result['@search.score']}")  
    print(f"Captions: {result['@search.captions']}")  
    print(f"Highlights: {result['@search.highlights']}")  
    print(f"Text: {result['Text']}\n")  
    print(f"Country: {result['Speaker']}\n")  
    print(f"Meeting: {result['Meeting']}\n")  
    print(f"Session: {result['Session']}\n")  
    print("###############################") 

Score: 0.62874544
Captions: None
Highlights: None
Text: countries, including those identified as potential drop zones of re-entering debris from the launch that pose a potential risk of injury to people or damage or destruction to property. Thank you, Mr. Chair.

Country: Philippines

Meeting: 5

Session: 3

###############################
Score: 0.6224268
Captions: None
Highlights: None
Text: I thank the distinguished representative of of the Russian Federation. And now I would like to give the photo, the distinguished representative of Iran. You have the first, Sir.

Country: Chairman

Meeting: 5

Session: 3

###############################
Score: 0.6182716
Captions: None
Highlights: None
Text: What's but esteemed chairperson the Russian Federation considers Outer space as an exclusively peaceful environment, Space vehicles Execute numerous and very varied functions. This includes supporting communications Research into the surface of the Earth And the Near Earth Space orbit and much
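
Running the same query with and without `exhaustive=True` gives a simple way to measure how much the approximate index loses. Below is a minimal recall@k sketch. `recall_at_k` is a hypothetical helper. The sketch assumes you collect a stable document identifier from both result sets, for example the index key field, which the cells above do not select.

```python
from typing import Hashable, Sequence


def recall_at_k(approximate: Sequence[Hashable], exact: Sequence[Hashable], k: int) -> float:
    """Fraction of the exact top-k ids that the approximate search also returned in its top k."""
    if k <= 0:
        raise ValueError("k must be positive")
    truth = set(exact[:k])
    if not truth:
        return 1.0  # nothing to recall
    return len(truth & set(approximate[:k])) / len(truth)
```

For example, `recall_at_k(ids_without_exhaustive, ids_with_exhaustive, k=3)`, where both id lists are hypothetical variables built from the two result sets. A value of 1.0 means the approximate search found the same top 3.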

In [19]:
# Semantic search (hybrid query + semantic ranker) on the TextEmbeddings field
search_client = SearchClient(ai_search_url, ai_search_index, AzureKeyCredential(ai_search_key))
vector_query = VectorizedQuery(vector=question,
                               k_nearest_neighbors=3,
                               fields="TextEmbeddings")

results = search_client.search(  
    search_text=search_word,  
    vector_queries=[vector_query],
    select=["Speaker", "Text","Meeting", "Session"],
    query_type=QueryType.SEMANTIC, 
    semantic_configuration_name='my-semantic-config', 
    query_caption=QueryCaptionType.EXTRACTIVE, 
    query_answer=QueryAnswerType.EXTRACTIVE,
    top=3
)  

In [None]:
# Not available in the free version
semantic_answers = results.get_answers()
for answer in semantic_answers:
    if answer.highlights:
        print(f"Semantic Answer: {answer.highlights}")
    else:
        print(f"Semantic Answer: {answer.text}")
    print(f"Semantic Answer Score: {answer.score}\n")

In [None]:
# Not available in the free version
for result in results:
    print(f"Reranker Score: {result['@search.reranker_score']}")
    print(f"Content: {result['Text']}\n")

    captions = result["@search.captions"]
    if captions:
        caption = captions[0]
        if caption.highlights:
            print(f"Caption: {caption.highlights}\n")
        else:
            print(f"Caption: {caption.text}\n")