# Azure OpenAI Service - Chat on private data using LangChain

First, create a file named `.env` in this folder and add the following content, replacing the placeholders with your Azure OpenAI resource's key and endpoint:

```
OPENAI_API_KEY=xxxxxx
OPENAI_API_BASE=https://xxxxxxx.openai.azure.com/
```

Then, let's install all dependencies:

## Installing requirements

You can install the requirements directly from this notebook, but a better option is to install them when you create the virtual environment (`.venv`); see the main README for how to do this.
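If you do run `pip` from the notebook, check first that the kernel is running inside your virtual environment. Otherwise the packages may end up in your global Python. A minimal sketch using only the standard library. `in_virtualenv` is a hypothetical helper name. The check covers environments created with the `venv` module; conda environments are detected differently.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when running inside an environment created by `venv`."""
    # In a venv, sys.prefix points at the environment directory, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


print("Running inside a virtual environment:", in_virtualenv())
```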

```python
# !pip install -r ../requirements.txt
```

```python
import os
import openai
from dotenv import load_dotenv
from langchain.chat_models import AzureChatOpenAI
from langchain.embeddings import OpenAIEmbeddings

# Load environment variables (set OPENAI_API_KEY and OPENAI_API_BASE in .env)
load_dotenv()

# Configure the openai SDK to talk to Azure OpenAI Service
openai.api_type = "azure"
openai.api_version = "2023-03-15-preview"
openai.api_base = os.getenv("OPENAI_API_BASE")
openai.api_key = os.getenv("OPENAI_API_KEY")

# Init the chat model; deployment_name must match the name of your GPT-4 deployment in Azure
llm = AzureChatOpenAI(deployment_name="gpt4-32", temperature=0, openai_api_version="2023-03-15-preview")

# Init the embeddings model; chunk_size=1 sends one text per request
# to stay within Azure's per-request input limit for embeddings
embeddings = OpenAIEmbeddings(model="text-embedding-ada-002", chunk_size=1)
```
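If the LLM or embeddings calls fail later with authentication or connection errors, a quick first check is whether both values from `.env` were actually loaded. A minimal sketch to run after `load_dotenv()`. `missing_env_vars` is a hypothetical helper. It only reports variable names, never the key itself.

```python
import os

# Variables this notebook expects in .env
REQUIRED_VARS = ("OPENAI_API_KEY", "OPENAI_API_BASE")


def missing_env_vars(env=None, required=REQUIRED_VARS):
    """Return the names in `required` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


missing = missing_env_vars()
print("Missing variables:", ", ".join(missing) if missing else "none")
```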

First, we load the `.txt` documents from the `../data/qna/` directory.

## The code below fails with a codec (text encoding) error

```python
# from langchain.document_loaders import DirectoryLoader
# from langchain.document_loaders import TextLoader
# from langchain.text_splitter import TokenTextSplitter

# loader = DirectoryLoader('../data/qna/', glob="*.txt", loader_cls=TextLoader)

# documents = loader.load()
# text_splitter = TokenTextSplitter(chunk_size=1000, chunk_overlap=0)
# docs = text_splitter.split_documents(documents)
```
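A codec error like this usually means a file's bytes are not valid in the encoding used to decode it. For example, `open()` without an explicit `encoding` falls back to the platform's locale encoding, which is often cp1252 on Windows. The notebook doesn't say which file or encoding triggered it here. So the sketch below only illustrates the two decoding strategies involved, on a made-up byte string:

```python
# Made-up example: "café" saved in the Windows cp1252 code page, where é is byte 0xE9
raw = "café".encode("cp1252")

# Strict UTF-8 decoding (the default error handler) raises on 0xE9,
# which UTF-8 expects to start a multi-byte sequence
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError as exc:
    strict_ok = False
    print("strict failed:", exc.reason)

# errors="replace" substitutes U+FFFD for the undecodable byte instead of raising
replaced = raw.decode("utf-8", errors="replace")
print("replace:", repr(replaced))

# Decoding with the encoding the bytes were actually written in is lossless
print("cp1252:", raw.decode("cp1252"))
```

`errors="replace"` keeps loading robust but silently loses the affected characters. If you know the files' real encoding, passing it explicitly avoids that loss.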

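## I replaced it with the code below

The cell below was rewritten with help from GPT-4. It uses a custom loader that reads each file as UTF-8 and replaces undecodable bytes instead of raising an error.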

```python
# Import the required classes from the langchain library
from langchain.document_loaders import DirectoryLoader
from langchain.document_loaders import TextLoader
from langchain.schema import Document
from langchain.text_splitter import TokenTextSplitter

# Custom text loader that inherits from the langchain TextLoader class
# and lets us choose how decoding errors are handled
class CustomTextLoader(TextLoader):
    def __init__(self, file_path, encoding='utf-8', errors='strict'):
        super().__init__(file_path, encoding)  # Base class stores the file path and encoding
        self.errors = errors  # Error handling strategy passed to open(), e.g. 'strict' or 'replace'

    def load(self):
        # Read the text file using the specified encoding and error handling strategy
        try:
            with open(self.file_path, encoding=self.encoding, errors=self.errors) as f:
                text = f.read()
        except UnicodeDecodeError as e:
            # Only raised with errors='strict': report which file could not be decoded
            raise RuntimeError(f"Could not load text file '{self.file_path}'") from e
        # Return a standard LangChain Document; storing the file path as 'source'
        # lets return_source_documents show where each retrieved chunk came from
        return [Document(page_content=text, metadata={"source": str(self.file_path)})]

# Initialize the DirectoryLoader with the custom loader: read files as UTF-8 and
# replace undecodable bytes with U+FFFD instead of failing
loader = DirectoryLoader('../data/qna/', glob="*.txt",
                         loader_cls=lambda file_path: CustomTextLoader(file_path, encoding='utf-8', errors='replace'))

# Load the documents from the specified directory using the custom text loader
documents = loader.load()

# Initialize the TokenTextSplitter with the desired chunk size and overlap (in tokens)
text_splitter = TokenTextSplitter(chunk_size=1000, chunk_overlap=0)

# Split the loaded documents into smaller chunks using the TokenTextSplitter
docs = text_splitter.split_documents(documents)
```
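`TokenTextSplitter` measures length in tokens (via `tiktoken`) rather than characters. It cuts each document into windows of at most `chunk_size` tokens, each starting `chunk_size - chunk_overlap` tokens after the previous one. Below is a simplified sketch of that sliding window, not LangChain's actual code. `split_into_chunks` is a hypothetical helper, and whitespace-separated words stand in for real BPE tokens:

```python
def split_into_chunks(tokens, chunk_size, chunk_overlap=0):
    """Split a token list into windows of at most chunk_size tokens.

    Consecutive windows share chunk_overlap tokens.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once a window reaches the end, so no trailing chunk is fully overlapped
        if start + chunk_size >= len(tokens):
            break
    return chunks


words = "one two three four five six seven".split()
print(split_into_chunks(words, chunk_size=3))
print(split_into_chunks(words, chunk_size=3, chunk_overlap=1))
```

With `chunk_overlap=0`, as in the cell above, chunks share no tokens. A small overlap can help when the text needed for an answer straddles a chunk boundary.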


Next, let's embed the chunks and store them in a FAISS index so we can efficiently search them by similarity:

```python
from langchain.vectorstores import FAISS

# Embed every chunk and store the vectors in an in-memory FAISS index
db = FAISS.from_documents(documents=docs, embedding=embeddings)
```
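Conceptually, the retriever embeds the query and returns the stored chunks whose vectors are closest to it. In the LangChain versions this notebook targets, the `FAISS` wrapper builds an exact (flat) L2 index by default, and `as_retriever()` returns the 4 nearest chunks unless you pass `search_kwargs`. Below is a minimal pure-Python sketch of that exact nearest-neighbor search on toy 2-D vectors. `l2_distance` and `nearest` are hypothetical helpers; FAISS performs the same comparison in optimized native code.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(query, vectors, k=4):
    """Return (index, distance) pairs for the k vectors closest to query."""
    scored = [(i, l2_distance(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]


# Toy 2-D "embeddings"; real text-embedding-ada-002 vectors have 1536 dimensions
index = [[0.0, 1.0], [1.0, 0.0], [0.7, 0.7]]
print(nearest([0.9, 0.1], index, k=2))
```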

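Now let's create a `ConversationalRetrievalChain` that runs the whole chat over our FAISS index. It rewrites follow-up questions into standalone ones, retrieves the most relevant chunks, and asks the LLM to answer from them: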

```python
from langchain.chains import ConversationalRetrievalChain
from langchain.prompts import PromptTemplate

# Prompt used to rephrase a follow-up question into a standalone question (adapt if needed)
CONDENSE_QUESTION_PROMPT = PromptTemplate.from_template("""Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question.

Chat History:
{chat_history}
Follow Up Input: {question}
Standalone question:""")

# Combine the LLM with a retriever over the FAISS index;
# return_source_documents=True also returns the retrieved chunks
qa = ConversationalRetrievalChain.from_llm(llm=llm,
                                           retriever=db.as_retriever(),
                                           condense_question_prompt=CONDENSE_QUESTION_PROMPT,
                                           return_source_documents=True,
                                           verbose=False)
```
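On a follow-up turn, the chain first fills `CONDENSE_QUESTION_PROMPT` with the previous turns and the new question. It then asks the LLM for a standalone question and uses that for retrieval; with an empty history it uses the question as-is. The sketch below only renders that prompt in plain Python so you can see what the LLM receives. `format_chat_history` is a hypothetical helper that approximates LangChain's default Human/Assistant formatting, and the previous answer is a placeholder:

```python
CONDENSE_TEMPLATE = """Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question.

Chat History:
{chat_history}
Follow Up Input: {question}
Standalone question:"""


def format_chat_history(chat_history):
    """Flatten (human, ai) tuples into a Human:/Assistant: transcript."""
    lines = []
    for human, ai in chat_history:
        lines.append(f"Human: {human}")
        lines.append(f"Assistant: {ai}")
    return "\n".join(lines)


# Placeholder history; in the notebook this comes from earlier qa(...) calls
history = [("what is Azure OpenAI Service?", "<previous answer>")]
prompt = CONDENSE_TEMPLATE.format(
    chat_history=format_chat_history(history),
    question="Which regions does the service support?",
)
print(prompt)
```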

Now let's ask a question:

```python
chat_history = []
query = "what is azure openai service?"
result = qa({"question": query, "chat_history": chat_history})
print(result["answer"])
```

```
Azure OpenAI service provides access to powerful language models, including GPT-3, Codex, and Embeddings series, through REST APIs, Python SDK, or the web-based interface in the Azure OpenAI Studio. These models can be adapted for various tasks such as content generation, summarization, semantic search, and natural language to code translation. The service offers features like virtual network support, managed identity via Azure Active Directory, and responsible AI content filtering. Azure OpenAI ensures compatibility and a smooth transition from OpenAI APIs while providing the security and enterprise capabilities of Microsoft Azure.
```
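Because the chain was built with `return_source_documents=True`, `result` also contains a `source_documents` list with the chunks the answer was based on. Below is a minimal sketch for printing them. `describe_sources` and `FakeDoc` are hypothetical helpers, and `fake_result` only mimics the shape of the chain's output so the snippet runs on its own:

```python
from dataclasses import dataclass, field


@dataclass
class FakeDoc:
    """Hypothetical stand-in exposing the same two attributes as a LangChain Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def describe_sources(result, preview_chars=80):
    """Return one line per retrieved chunk: its source and the start of its text."""
    lines = []
    for doc in result.get("source_documents", []):
        source = doc.metadata.get("source", "<unknown source>")
        preview = doc.page_content[:preview_chars].replace("\n", " ")
        lines.append(f"{source}: {preview}")
    return lines


# Hypothetical result mimicking the chain's output shape with return_source_documents=True
fake_result = {
    "answer": "...",
    "source_documents": [
        FakeDoc("Azure OpenAI provides REST API access to language models.",
                {"source": "../data/qna/example.txt"}),
    ],
}

for line in describe_sources(fake_result):
    print(line)
```

With the real chain, `describe_sources(result)` works the same way, since LangChain `Document`s expose `page_content` and `metadata`. Chunks without a `source` entry in their metadata are shown as `<unknown source>`.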


We can use this to easily implement multi-turn chat by passing earlier (question, answer) pairs as `chat_history`:

```python
chat_history = []

query = "what is Azure OpenAI Service?"
result = qa({"question": query, "chat_history": chat_history})
print("Question:", query)
print("Answer:", result["answer"])

chat_history = [(query, result["answer"])]
query = "Which regions does the service support?"
result = qa({"question": query, "chat_history": chat_history})
print("Question:", query)
print("Answer:", result["answer"])
```

```
Question: what is Azure OpenAI Service?
Answer: Azure OpenAI Service is a platform that provides access to OpenAI's powerful language models, including the GPT-3, Codex, and Embeddings model series, through REST APIs, Python SDK, or the web-based interface in the Azure OpenAI Studio. These models can be adapted for various tasks such as content generation, summarization, semantic search, and natural language to code translation. Azure OpenAI Service offers features like virtual network support, managed identity via Azure Active Directory, and responsible AI content filtering. It is designed to provide advanced language AI capabilities with the security and enterprise promise of Microsoft Azure.
Question: Which regions does the service support?
Answer: The Azure OpenAI Service is supported in the following regions:

1. East US
2. South Central US
3. West Europe
```
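The cell above overwrites `chat_history` with just the latest turn, which is enough for two questions. For longer conversations, one option is to keep appending turns. A minimal sketch: `run_conversation` and `fake_chain` are hypothetical names, and `fake_chain` only stands in for `qa` so the loop runs without Azure:

```python
def run_conversation(chain, questions):
    """Ask each question in turn, passing all earlier (question, answer) pairs as history."""
    chat_history = []
    for question in questions:
        # Pass a copy so the chain sees the history exactly as it was at this turn
        result = chain({"question": question, "chat_history": list(chat_history)})
        # Append instead of overwriting, so later turns see the whole conversation
        chat_history.append((question, result["answer"]))
    return chat_history


def fake_chain(inputs):
    """Hypothetical stand-in for qa: reports how many prior turns it received."""
    return {"answer": f"turn {len(inputs['chat_history'])}"}


for question, answer in run_conversation(fake_chain, ["first question", "second question", "third question"]):
    print("Question:", question)
    print("Answer:", answer)
```

With the real chain, call `run_conversation(qa, [...])`. The whole history is rendered into the condense prompt, so very long conversations may need trimming to stay within the model's context window.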
