## This notebook retrieves data from the database

In [45]:
#import libraries
import os
import re
from langchain_community.embeddings import HuggingFaceEmbeddings
import json
import pandas as pd
from openai import OpenAI
from dotenv import dotenv_values
import json 
import requests
from azure.core.credentials import AzureKeyCredential  
from azure.search.documents import SearchClient
from azure.search.documents.models import (
    QueryAnswerType,
    QueryCaptionType,
    QueryType,
    VectorizedQuery   
)


In [56]:
#import cosmos db credentials 
config = dotenv_values('credential.env')
ai_search_location = config['ai_search_location']
ai_search_key = config['ai_search_key']
ai_search_url = config['ai_search_url']
ai_search_index = 'oewg-speech-meeeting-index'
ai_search_name = 'oewg-meeting'

openai_key = config['openai_api_key']
openai_deployment_name = "gpt-4"
openai_url = config['open_ai_endpoint']
search_client = SearchClient(ai_search_url, ai_search_index, AzureKeyCredential(ai_search_key)) 

In [21]:
#convert data to vector embeddings
def generate_embeddings(text):
    embedding_model = HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")
    embeddings = embedding_model.embed_query(text)
    return embeddings

In [49]:
question = 'what is iran stand?'
question_embedding = generate_embeddings(question)

In [81]:
def retrieve_vector_top_chunks(k, question_embedding):
    """Retrieve the top K entries from Azure AI Search using vector search with speaker embedding.""" 
    vector_query = VectorizedQuery(vector=question_embedding, 
                                k_nearest_neighbors=k, 
                                fields="SpeakerEmbeddings")

    results = search_client.search(  
        search_text=None,  
        vector_queries=[vector_query],
        select=["Speaker", "Text","Meeting", "Session"],
        top=k
    )  
    output = [[f'Session: {result["Session"]}',f'Session: {result["Meeting"]}',f'{result["Speaker"]}: {result["Text"]}'] for result in results]

    return output



In [82]:
results = retrieve_vector_top_chunks(3, question_embedding)
for result in results:
    print(result)

['Session: 3', 'Session: 5', 'Iran: To the chair for this morning meeting with so far, and I had the intention to keep silence, to actually listen carefully to what other colleagues are saying, Of course, before me, distinguished representative of the Russian Federation has put some difficult elements of this issue of dual use capability so far, in particular supporting. His view regarding, That to what extent? This dealing with the issue through through a perceptions or misperceptions and based on behavioural approach would have a lot of']
['Session: 3', 'Session: 5', 'Iran: of the dual use capabilities. So but so far however, Experience has shown that discussion of dual use capability and other disarmament fora has resulted in exerting 2 policies of restrictions and export controlled by developed countries against developing countries. That would hamper their peaceful use Exploration and use of outer space and that is the outer space activity which is now very needed for their daily 

In [94]:
def retrieve_hybrid_top_chunks(k, question, question_embedding):
    """Retrieve the top K entries from Azure AI Search using hybrid search with speaker embedding.""" 
    vector_query = VectorizedQuery(vector=question_embedding, 
                                k_nearest_neighbors=k, 
                                fields="TextEmbeddings")

    results = search_client.search(  
        search_text=question,  
        vector_queries=[vector_query],
        select=["Speaker", "Text","Meeting", "Session"],
        top=k
    )    

    output = [[f'Session: {result["Session"]}',f'Session: {result["Meeting"]}',f'{result["Speaker"]}: {result["Text"]}'] for result in results]  

    return output

In [95]:
results = retrieve_hybrid_top_chunks(3, question, question_embedding)
results

[['Session: 3',
  'Session: 5',
  'Chairman: I thank the distinguished representative of Iran for his statement. And now I would like to give the floor to the distinguished representative of the Republic of Korea. You have the floor.'],
 ['Session: 3',
  'Session: 5',
  'Chairman: I thank the distinguished representative of of the Russian Federation. And now I would like to give the photo, the distinguished representative of Iran. You have the first, Sir.'],
 ['Session: 3',
  'Session: 5',
  'Iran: To the chair for this morning meeting with so far, and I had the intention to keep silence, to actually listen carefully to what other colleagues are saying, Of course, before me, distinguished representative of the Russian Federation has put some difficult elements of this issue of dual use capability so far, in particular supporting. His view regarding, That to what extent? This dealing with the issue through through a perceptions or misperceptions and based on behavioural approach would h

In [98]:
def retrieve_knn_top_chunks(k, question_embedding):
    """Retrieve the top K entries from Azure AI Search using hybrid search with Text embedding."""
    vector_query = VectorizedQuery(vector=question_embedding, k_nearest_neighbors=k, 
                                fields="SpeakerEmbeddings",exhaustive=True)

    results = search_client.search(  
        search_text=None,  
        vector_queries=[vector_query],
        select=["Speaker", "Text","Meeting", "Session"],
        top=k
    )  
    output = [[f'Session: {result["Session"]}',f'Session: {result["Meeting"]}',f'{result["Speaker"]}: {result["Text"]}'] for result in results]  

    return output

In [99]:
results = retrieve_knn_top_chunks(3, question_embedding)
results


[['Session: 3',
  'Session: 5',
  'Iran: To the chair for this morning meeting with so far, and I had the intention to keep silence, to actually listen carefully to what other colleagues are saying, Of course, before me, distinguished representative of the Russian Federation has put some difficult elements of this issue of dual use capability so far, in particular supporting. His view regarding, That to what extent? This dealing with the issue through through a perceptions or misperceptions and based on behavioural approach would have a lot of'],
 ['Session: 3',
  'Session: 5',
  'Iran: dealing with the issue through through a perceptions or misperceptions and based on behavioural approach would have a lot of side effects for the developing countries. Peaceful use of outer space, which is Now very needed in the everyday life of countries. So far this issue of operations involving dual use capability in our view is among the very complex issues that require careful examination. Some sta

In [100]:
def get_vector_context(user_question, retrieved_k = 5):
    # Generate embeddings for the question
    question_embedding = generate_embeddings(user_question)

    # Retrieve the top K entries
    output = retrieve_vector_top_chunks(retrieved_k, question_embedding)

    # concatenate the content of the retrieved documents
    context = '. '.join([item for sublist in output for item in sublist])

    return context

In [101]:
context = get_vector_context('iran', retrieved_k = 5)
context

"Session: 3. Session: 5. Iran: To the chair for this morning meeting with so far, and I had the intention to keep silence, to actually listen carefully to what other colleagues are saying, Of course, before me, distinguished representative of the Russian Federation has put some difficult elements of this issue of dual use capability so far, in particular supporting. His view regarding, That to what extent? This dealing with the issue through through a perceptions or misperceptions and based on behavioural approach would have a lot of. Session: 3. Session: 5. Iran: now very needed for their daily life. We are of the view that resorting to dual use capability context might be converted to a tool in hands of some to exert to discrimination policies. So, taking into account dual use capability, it might be in the context of behavioral approach would lead to misleading policies. Therefore, instead of instead of such a such a denial approaches that comes from such a behavioral approach. I th

In [102]:
def get_hybrid_context(user_question, retrieved_k = 5):
    # Generate embeddings for the question
    question_embedding = generate_embeddings(user_question)

    # Retrieve the top K entries
    output = retrieve_hybrid_top_chunks(retrieved_k, user_question, question_embedding)

    # concatenate the content of the retrieved documents
    context = '. '.join([item for sublist in output for item in sublist])

    return context

In [103]:
context = get_hybrid_context('iran', retrieved_k = 5)
context

"Session: 3. Session: 5. Chairman: I thank the distinguished representative of Iran for his statement. And now I would like to give the floor to the distinguished representative of the Republic of Korea. You have the floor.. Session: 3. Session: 5. Chairman: I thank the distinguished representative of of the Russian Federation. And now I would like to give the photo, the distinguished representative of Iran. You have the first, Sir.. Session: 3. Session: 5. Philippines: countries, including those identified as potential drop zones of re-entering debris from the launch that pose a potential risk of injury to people or damage or destruction to property. Thank you, Mr. Chair.. Session: 3. Session: 5. Iran: To the chair for this morning meeting with so far, and I had the intention to keep silence, to actually listen carefully to what other colleagues are saying, Of course, before me, distinguished representative of the Russian Federation has put some difficult elements of this issue of dua

In [104]:
def get_knn_context(user_question, retrieved_k = 5):
    # Generate embeddings for the question
    question_embedding = generate_embeddings(user_question)

    # Retrieve the top K entries
    output = retrieve_knn_top_chunks(retrieved_k, question_embedding)

    # concatenate the content of the retrieved documents
    context = '. '.join([item for sublist in output for item in sublist])

    return context

In [105]:
context = get_knn_context('iran', retrieved_k = 5)
context

'Session: 3. Session: 5. Iran: To the chair for this morning meeting with so far, and I had the intention to keep silence, to actually listen carefully to what other colleagues are saying, Of course, before me, distinguished representative of the Russian Federation has put some difficult elements of this issue of dual use capability so far, in particular supporting. His view regarding, That to what extent? This dealing with the issue through through a perceptions or misperceptions and based on behavioural approach would have a lot of. Session: 3. Session: 5. Iran: dealing with the issue through through a perceptions or misperceptions and based on behavioural approach would have a lot of side effects for the developing countries. Peaceful use of outer space, which is Now very needed in the everyday life of countries. So far this issue of operations involving dual use capability in our view is among the very complex issues that require careful examination. Some states of course this morn

In [107]:
#define chatbot 
def prompt_engineering(KnnContext, VectorContext,HybridContext):
    KnnContext, VectorContext,HybridContext = KnnContext, VectorContext, HybridContext

    chat_context_prompt = f"""

    You are an assistant to answer questions about the Meetings by the UN Open Ended Working Group on Space Threats. 
    Do not hallucinate. 
    Use all the information below to answer your questions
    
    first information: {VectorContext}

    Second information : {HybridContext}

    Third information : {KnnContext}

    If the answer to the question is not in information above,respond 'I am unable to provide a response on that'

    """

    return chat_context_prompt

In [108]:
client = OpenAI(api_key=openai_key)

def PR_Assistant(text,chat_context_prompt):
    MESSAGES = [
    {"role": "system", "content": chat_context_prompt},
    {"role": "user", "content": text},
    ]
    MESSAGES.append({"role": "user", "content": text})

    completion = client.chat.completions.create(model="gpt-4", messages=MESSAGES,temperature=0.9)
    return completion.choices[0].message.content

In [109]:
question = "What is iran's overall sentiment?"

KnnContext = get_knn_context(question, retrieved_k = 5)
VectorContext = get_vector_context(question, retrieved_k = 5)
HybridContext = get_hybrid_context(question, retrieved_k = 5)

chat_context_prompt = prompt_engineering(KnnContext, VectorContext,HybridContext)

client_response = PR_Assistant(question, chat_context_prompt)

print(client_response)

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Iran expressed concerns about dual use capabilities in space operations, citing that the behavioural approach to this issue could lead to misperceptions and unintended side effects for developing countries. They suggested that this issue is complex and requires careful examination. Iran is of the view that the dual use capability context could potentially be used as a tool for some to exert discriminatory policies. Therefore, they recommend assurance and reassurance approaches through legal norms. Iran also emphasized the importance of peaceful use of outer space, which is very needed for everyday life in countries. They warned against policies of restrictions and export control by developed countries against developing nations that could hamper their peaceful exploration and use of outer space.


Iran expressed concerns about dual use capabilities in space operations, citing that the behavioural approach to this issue could lead to misperceptions and unintended side effects for developing countries. They suggested that this issue is complex and requires careful examination. Iran is of the view that the dual use capability context could potentially be used as a tool for some to exert discriminatory policies. Therefore, they recommend assurance and reassurance approaches through legal norms. Iran also emphasized the importance of peaceful use of outer space, which is very needed for everyday life in countries. They warned against policies of restrictions and export control by developed countries against developing nations that could hamper their peaceful exploration and use of outer space.

In [110]:
question = "list the countries that have similar sentiments to iran?"

KnnContext = get_knn_context(question, retrieved_k = 5)
VectorContext = get_vector_context(question, retrieved_k = 5)
HybridContext = get_hybrid_context(question, retrieved_k = 5)

chat_context_prompt = prompt_engineering(KnnContext, VectorContext,HybridContext)

client_response = PR_Assistant(question, chat_context_prompt)

print(client_response)

I am unable to provide a response on that. The provided information does not indicate specific countries that share similar sentiments with Iran on the discussed issues.


In [111]:
question = "what did the united states say in session 3 meeting 5?"

KnnContext = get_knn_context(question, retrieved_k = 5)
VectorContext = get_vector_context(question, retrieved_k = 5)
HybridContext = get_hybrid_context(question, retrieved_k = 5)

chat_context_prompt = prompt_engineering(KnnContext, VectorContext,HybridContext)

client_response = PR_Assistant(question, chat_context_prompt)

print(client_response)

In Session 3, Meeting 5, the United States raised concerns about the Chinese system She-Jin 21 (SJ21), which is described as being used to test and verify space debris mitigation technologies. The U.S. expressed frustration at having to rely on military, civil, and commercial space situational awareness systems to detect the behavior of SJ21, rather than receiving information about its function and intentions from China. 

The U.S. also discussed the issue of using civil space systems for military purposes and expressed the opinion that banning such usage is not possible since many militaries use dual-purpose systems. Instead of legally binding restrictions, the U.S. proposed that states consider elaborating general principles. 

The U.S. expressed a desire for open dialogue and communication between states to avoid misunderstandings and misperceptions. 

Later in the session, the U.S. reiterated the importance of space launch notifications, citing that they reduce tensions and aid cou