# Load DB

In [23]:
import os
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.llms import AzureOpenAI

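# Map the Azure-specific variables onto the names LangChain's OpenAI wrappers read.
# Indexing os.environ raises a KeyError naming the variable if it is not set.
os.environ["OPENAI_API_BASE"] = os.environ["OPENAI_AZURE_ENDPOINT"]
os.environ["OPENAI_API_KEY"] = os.environ["OPENAI_AZURE_KEY"]
os.environ["OPENAI_API_VERSION"] = "2023-05-15"
os.environ["OPENAI_API_TYPE"] = "azure"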

embedding = OpenAIEmbeddings(
    deployment='text-embedding-ada-002',
    chunk_size=1,  # embed one text per request
)

llm = AzureOpenAI(
    deployment_name='gpt-35-turbo',
    model_name='gpt-35-turbo',
)

In [5]:
from langchain.vectorstores import Chroma
persist_directory = '../assets/persistence/chroma/'

vectordb = Chroma(
    persist_directory=persist_directory,
    embedding_function=embedding
)

In [6]:
# Number of entries stored in the collection (_collection is a private attribute).
print(vectordb._collection.count())

288


# Pure Information Retrieval
Embedding-based vector search, with no LLM involved.

In [13]:
question = "O que é Recuperação de Informação?"  # Portuguese for "What is Information Retrieval?"
docs_retrieved = vectordb.max_marginal_relevance_search(question, k=3)
print(len(docs_retrieved))
print(docs_retrieved[0].page_content)

3
Recuperação de Informação (RI) 
▰Encontrar documentos  de natureza não-estruturada que 
satisfaça uma necessidade de informação, a partir de uma 
grande coleção de materiais 
▰Requer cuidadosa avaliação  para 
demonstrar a performance 
superior de uma nova técnica 
4


In [15]:
question = "What is Information Retrieval?"
docs_retrieved = vectordb.max_marginal_relevance_search(question, k=3)
print(len(docs_retrieved))
print(docs_retrieved[0].page_content)

3
Recuperação de Informação (RI) 
▰Encontrar documentos  de natureza não-estruturada que 
satisfaça uma necessidade de informação, a partir de uma 
grande coleção de materiais 
▰Requer cuidadosa avaliação  para 
demonstrar a performance 
superior de uma nova técnica 
4
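
Both searches above use `max_marginal_relevance_search`, which trades relevance to the query against redundancy among the results. The following is a minimal pure-Python sketch of that idea, not the library's implementation: the function names, the toy 2-D vectors, and the `lambda_mult` weighting are assumptions chosen for illustration.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(
    query: Sequence[float],
    candidates: List[Sequence[float]],
    k: int = 3,
    lambda_mult: float = 0.5,  # hypothetical weight: 1.0 = relevance only
) -> List[int]:
    """Greedily pick k candidate indices, penalizing similarity to earlier picks."""
    selected: List[int] = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:

        def score(i: int) -> float:
            relevance = cosine(query, candidates[i])
            # Highest similarity to anything already selected (0.0 for the first pick).
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy embeddings: candidates 0 and 1 are near-duplicates, candidate 2 is distinct.
toy_query = [1.0, 1.0]
toy_candidates = [[1.0, 0.9], [1.0, 0.88], [0.2, 1.0]]

diverse = mmr_select(toy_query, toy_candidates, k=2, lambda_mult=0.5)
relevance_only = mmr_select(toy_query, toy_candidates, k=2, lambda_mult=1.0)
print(diverse, relevance_only)
```

With the balanced weighting the near-duplicate is skipped in favor of the distinct candidate, while relevance-only ranking returns the two near-duplicates.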


# Retrieval Augmented QnA

In [33]:
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate



In [86]:
template = """
Use the following pieces of context to answer the question at the end.
If the answer is not contained in the context, just say that you don't know, don't try to make up an answer.
The context is written in Portuguese, but both question and answer should be in English.
Do not include any reference or link.
Do NOT include any mark like "<|im_end|>", just the answer and nothing else.
Use at most 50 words; under no condition write more than 50 words. Keep the answer as concise as possible.
{context}

Question: {question}

Answer:"""

class Bot:
    """Retrieval-augmented QA bot that answers from the vector store."""

    def __init__(self, llm, retriever):
        self.qa_chain = RetrievalQA.from_chain_type(
            llm,
            retriever=retriever,
            return_source_documents=True,  # keep the retrieved chunks in the result
            chain_type_kwargs={"prompt": PromptTemplate.from_template(template)},
        )

    def answer(self, question: str) -> None:
        """Run the chain and print the answer without the end-of-turn marker."""
        result = self.qa_chain({"query": question})
        answer = result['result'].replace('<|im_end|>', '').strip()
        print(answer)


bot = Bot(llm, vectordb.as_retriever())
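
To make the prompt mechanics concrete, here is a rough sketch of how the retrieved chunks might be combined into the `{context}` slot before the prompt reaches the model. It uses plain string formatting rather than LangChain; the shortened `PROMPT`, the `build_prompt` helper, and the blank-line separator are assumptions for illustration only.

```python
from typing import List

# Shortened, hypothetical stand-in for the notebook's template.
PROMPT = (
    "Use the following pieces of context to answer the question at the end.\n"
    "{context}\n\n"
    "Question: {question}\n\n"
    "Answer:"
)


def build_prompt(chunks: List[str], question: str, separator: str = "\n\n") -> str:
    """Join the retrieved chunks into one context string and fill the template."""
    context = separator.join(chunks)
    return PROMPT.format(context=context, question=question)


example_prompt = build_prompt(
    ["1.4 Objetivos Específicos", "1.3 Objetivo Geral"],
    "What are the specific goals of this work?",
)
print(example_prompt)
```

Because every retrieved chunk lands in a single prompt, the number of chunks returned by the retriever directly affects the prompt length.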

In [82]:
question = "What are the specific goals of this work?"
result = bot.qa_chain({"query": question})
print(result['result'])
print('---------------' * 3)
print('---------------' * 3)
# Print each source document the chain retrieved for this answer.
print(f"\n{'-' * 100}\n".join(f"Document {i + 1}:\n\n{d.page_content}" for i, d in enumerate(result['source_documents'])))

 The specific goals of this work are to compare IR techniques, implement IR techniques, use Skylink data to compare solutions, implement a prototype with the best technique, and test the prototype in real conditions.<|im_end|>
---------------------------------------------
---------------------------------------------
Document 1:

analistas de suporte.
1.4 Objetivos Específicos
Para alcançar o objetivo geral, foram traçados os seguintes objetivos especí-
ficos:
a)entender as técnicas de Recuperação de Informação aplicáveis ao domí-
nio do problema;
----------------------------------------------------------------------------------------------------
Document 2:

SUMÁRIO
1 INTRODUÇÃO . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.1 Justificativa . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.2 Definição do Problema . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.3 Objetivo Geral . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.

In [85]:
result['result']

'The specific goals of this work are to compare IR techniques, implement IR techniques, use Skylink data to compare solutions, implement a prototype with the best technique, and test the prototype in real conditions.'
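
The raw chain output can end with an end-of-turn marker such as `<|im_end|>`, which `Bot.answer` removes with `str.replace`. A slightly more defensive variant, sketched below under the assumption that any text after the first marker is unwanted, cuts the string at that marker instead. The `clean_answer` helper is hypothetical and not part of the notebook.

```python
def clean_answer(raw: str, stop_marker: str = "<|im_end|>") -> str:
    """Keep only the text before the first stop marker and trim whitespace."""
    head, _, _ = raw.partition(stop_marker)
    return head.strip()


cleaned = clean_answer(" The specific goals are to compare IR techniques.<|im_end|>")
print(cleaned)
```

If the marker is absent, `partition` returns the whole string, so the helper only strips surrounding whitespace.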

In [93]:
bot.answer(question = 'What is Information Retrieval?')

The process of finding unstructured documents that fulfill an information need, from a large collection of materials, which requires careful evaluation to demonstrate the superior performance of a new technique.


In [94]:
bot.answer(question = 'What are the specific goals of this work?')

To compare IR techniques to search for similar support tickets; to implement the IR techniques; to use Skylink data to compare the solutions; to implement a prototype with the best technique; to test the prototype in real conditions.


In [95]:
bot.answer(question = 'What are the main results of this work?')

The work demonstrated the viability of a support ticket recovery system and made the dataset used freely and completely available. It also described in detail the implementation of each technique and explored the conditions affecting the results of each model. Finally, the study identified future possibilities.


In [96]:
bot.answer(question = 'Summarize the methodology used to compare the techniques')

The study compared different techniques that are representative of the main existing approaches to the problem, and the methodology had low sensitivity to the annotation methodology or metric. The low adoption rate indicates the need to improve interaction. All techniques were better than the random one and were identified for future possibilities.


In [97]:
bot.answer(question = 'In which company was the data collected?')

Not specified.


In [90]:
bot.answer(question = 'Who is the author (student)?')

F. S. Dyrhovden
