### LangChain
[LangChain](https://python.langchain.com/en/latest/index.html) is a framework for developing applications powered by language models. Its premise is that the most powerful and differentiated applications will not only call out to a language model, but will also be:
- Data-aware: connect a language model to other sources of data
- Agentic: allow a language model to interact with its environment

The LangChain framework is designed around these principles.

We will use the LangChain framework for the rest of the workshop.

#### Question Answering over the docs/index
Question answering in this context means answering questions over your own document data. When there are many documents, you almost always want to build an index over them. The index is used to retrieve only the documents most relevant to a given question, so you avoid passing every document to the LLM, which saves time and money.
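
To make the idea concrete, here is a minimal sketch of index-based retrieval. It is not what LangChain does internally: `embed` is a toy bag-of-words stand-in for a real embedding model, the three documents are made-up placeholders, and all function names are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: word counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(documents):
    """Embed every document once, up front."""
    return [(doc, embed(doc)) for doc in documents]


def retrieve(index, question, k=2):
    """Return the k documents most similar to the question."""
    query = embed(question)
    ranked = sorted(index, key=lambda item: cosine(query, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]


# Made-up placeholder documents, not rows from the workshop data
documents = [
    "state: NY deaths: 100",
    "state: IL hospitalized: 50",
    "state: CT recovered: 20",
]
index = build_index(documents)
top = retrieve(index, "hospitalized in state IL", k=1)
print(top)
```

Only the documents returned by `retrieve` would be sent to the LLM, which is where the time and cost savings come from.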

#### Set Environment Variables

In [1]:
import json  
import openai
from Utilities.envVars import *
import os

os.environ["AZURESEARCH_FIELDS_ID"] = "id"
os.environ["AZURESEARCH_FIELDS_CONTENT"] = "content"
os.environ["AZURESEARCH_FIELDS_CONTENT_VECTOR"] = "contentVector"
os.environ["AZURESEARCH_FIELDS_TAG"] = "{}"

# Azure Cognitive Search index name (loaded from Utilities.envVars)
indexName = SearchIndex

# Azure OpenAI endpoint (loaded from Utilities.envVars); fail fast if it is missing
openAiEndPoint = f"{OpenAiEndPoint}"
assert openAiEndPoint, "ERROR: Azure OpenAI Endpoint is missing"

#### Generate an answer to a question from the document we already indexed in the vector store

In [2]:
import logging

from langchain.chat_models import AzureChatOpenAI, ChatOpenAI
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.docstore.document import Document

embeddingModelType = "azureopenai"
temperature = 0.3
tokenLength = 1000

if embeddingModelType == "azureopenai":
    # Point the openai client at the Azure OpenAI resource
    openai.api_type = "azure"
    openai.api_key = OpenAiKey
    openai.api_version = OpenAiVersion
    openai.api_base = f"{OpenAiEndPoint}"

    llm = AzureChatOpenAI(
        openai_api_base=openai.api_base,
        openai_api_version=OpenAiVersion,
        deployment_name=OpenAiChat,
        temperature=temperature,
        openai_api_key=OpenAiKey,
        openai_api_type="azure",
        max_tokens=tokenLength,
    )
    # Pass the Azure deployment name as `deployment`; passing it as `engine`
    # triggers the "engine was transferred to model_kwargs" warning.
    embeddings = OpenAIEmbeddings(deployment=OpenAiEmbedding, chunk_size=1, openai_api_key=OpenAiKey)
    logging.info("LLM Setup done")
elif embeddingModelType == "openai":
    # Point the openai client at the public OpenAI API
    openai.api_type = "open_ai"
    openai.api_base = "https://api.openai.com/v1"
    openai.api_version = "2020-11-07"
    openai.api_key = OpenAiApiKey

    llm = ChatOpenAI(
        temperature=temperature,
        openai_api_key=OpenAiApiKey,
        model_name="gpt-3.5-turbo",
        max_tokens=tokenLength,
    )
    embeddings = OpenAIEmbeddings(openai_api_key=OpenAiApiKey)
else:
    # Without this branch, llm and embeddings would be silently undefined
    raise ValueError(f"Unsupported embeddingModelType: {embeddingModelType}")


In [3]:
from langchain.document_loaders.csv_loader import CSVLoader

loader = CSVLoader(
    file_path="../Workshop/Data/CSV/Covid19HistorySmall.csv",
    csv_args={
        "delimiter": ",",
        "quotechar": '"',
        "fieldnames": ["date","state","death","deathConfirmed","deathIncrease","deathProbable","hospitalized","hospitalizedCumulative","hospitalizedCurrently","hospitalizedIncrease","inIcuCumulative","inIcuCurrently","negative","negativeIncrease","negativeTestsAntibody","negativeTestsPeopleAntibody","negativeTestsViral","onVentilatorCumulative","onVentilatorCurrently","positive","positiveCasesViral","positiveIncrease","positiveScore","positiveTestsAntibody","positiveTestsAntigen","positiveTestsPeopleAntibody","positiveTestsPeopleAntigen","positiveTestsViral","recovered","totalTestEncountersViral","totalTestEncountersViralIncrease","totalTestResults","totalTestResultsIncrease","totalTestsAntibody","totalTestsAntigen","totalTestsPeopleAntibody","totalTestsPeopleAntigen","totalTestsPeopleViral","totalTestsPeopleViralIncrease","totalTestsViral","totalTestsViralIncrease"],
    },
    encoding="utf-8",
)
docs = loader.load()
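
As a rough picture of what the loader produces, the sketch below turns each CSV row into one document whose text is a list of `column: value` lines. This is an approximation written with the standard `csv` module, not LangChain's implementation; `rows_to_documents` and the sample data are hypothetical. It also shows a property of `csv.DictReader` worth knowing here: when `fieldnames` is passed explicitly and the file still starts with a header row, that header row is read as data.

```python
import csv
import io


def rows_to_documents(csv_text, fieldnames=None):
    """Build one document (text plus metadata) per CSV row."""
    reader = csv.DictReader(io.StringIO(csv_text), fieldnames=fieldnames)
    documents = []
    for i, row in enumerate(reader):
        # One "column: value" line per field
        content = "\n".join(f"{key}: {value}" for key, value in row.items())
        documents.append({"page_content": content, "metadata": {"row": i}})
    return documents


# Made-up sample with a header row and two data rows
sample = "date,state,death\n2021-03-07,AA,1\n2021-03-07,BB,2\n"

inferred = rows_to_documents(sample)
explicit = rows_to_documents(sample, fieldnames=["date", "state", "death"])

print(len(inferred), len(explicit))
print(explicit[0]["page_content"])
```

If the workshop CSV has a header row, it may be worth checking whether `docs[0]` is such a header document; this sketch cannot tell you that about the real file.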

In [4]:
from langchain.vectorstores import DocArrayInMemorySearch
db = DocArrayInMemorySearch.from_documents(docs, embeddings)

In [5]:
from langchain.chains import RetrievalQA

retriever = db.as_retriever()
chain = RetrievalQA.from_chain_type(llm=llm, retriever=retriever, verbose=True, chain_type="stuff")
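
The `"stuff"` chain type places all retrieved documents into a single prompt together with the question. A minimal sketch of that idea follows; `build_stuff_prompt` is a hypothetical helper and the template wording is illustrative, not LangChain's exact prompt.

```python
def build_stuff_prompt(question, documents):
    """Concatenate every retrieved document into one prompt for the LLM."""
    context = "\n\n".join(documents)
    return (
        "Use the following pieces of context to answer the question at the end. "
        "If you don't know the answer, just say that you don't know.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Helpful Answer:"
    )


# Made-up placeholder documents standing in for retriever output
prompt = build_stuff_prompt("How many rows?", ["state: AA", "state: BB"])
print(prompt)
```

Because everything goes into one prompt, this approach only works while the retrieved documents fit in the model's context window.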

In [6]:
def runChain(question):
    response = chain.run(question)
    return response

In [7]:
questions = [
    "How many people died in New York in 2021?",
    "How many hospitalized in Illinois on 03-07-2021?",
    "How many states reported the information?",
    "How many recovered in Connecticut?",
]

for question in questions:
    response = runChain(question)
    print(f"Question: {question}")
    print(f"Answer: {response}")
    print("")

> Entering new RetrievalQA chain...

> Finished chain.
Question: How many people died in New York in 2021?
Answer: As of the given data on March 7, 2021, the number of deaths in New York in 2021 is 39,029.

> Entering new RetrievalQA chain...

> Finished chain.
Question: How many hospitalized in Illinois on 03-07-2021?
Answer: On 03-07-2021, there were 1,141 people currently hospitalized in Illinois.

> Entering new RetrievalQA chain...

> Finished chain.
Question: How many states reported the information?
Answer: The information provided includes data from 4 states.

> Entering new RetrievalQA chain...

> Finished chain.
Question: How many recovered in Connecticut?
Answer: I don't have the information on the number of recovered individuals in Connecticut.
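
The answer of 4 states may reflect how many rows were retrieved rather than the dataset itself. To my knowledge, LangChain vector store retrievers return four documents by default, and the limit can be raised with `db.as_retriever(search_kwargs={"k": ...})`, so an aggregate question only sees the rows that were retrieved. The sketch below illustrates that effect with made-up placeholder rows; `count_states_from_retrieved` is a hypothetical helper and the slice stands in for similarity search.

```python
def count_states_from_retrieved(rows, k):
    """Count distinct states among only the k rows the retriever returned."""
    retrieved = rows[:k]  # stand-in for the top-k similarity search result
    return len({row["state"] for row in retrieved})


# Made-up placeholder rows, one per state
rows = [{"state": s} for s in ["AA", "BB", "CC", "DD", "EE", "FF"]]

seen_by_llm = count_states_from_retrieved(rows, k=4)
actual = len({row["state"] for row in rows})
print(seen_by_llm, actual)
```

Questions that aggregate over the whole table are usually better answered by computing over the data directly than by retrieval alone.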

