# LangChain example with Azure OpenAI

[https://python.langchain.com/docs/integrations/chat/azure_chat_openai](https://python.langchain.com/docs/integrations/chat/azure_chat_openai)

- For the example to work, the following environment variables must be set:
    - `AZURE_OPENAI_ENDPOINT` [for example `https://<base>.openai.azure.com/`]
    - `OPENAI_API_VERSION` [for example `2023-07-01-preview`]
    - `AZURE_OPENAI_API_KEY`
    - `OPENAI_ORG_ID` [`<company name>`]
    - `OPENAI_API_DEPLOYMENT_NAME` [for example `gpt-4-32k`]
    - `OPENAI_API_MODEL` [for example `gpt-4-32k`]

-   Note that no other OpenAI environment variables may be set, because the program flow may otherwise branch in unintended ways.
    Set only the variables listed above and nothing else.

-   At the time this notebook was written, the FAISS library required Python 3.11.7. I therefore switched to LanceDB, which supported Python 3.12.2.

-   The following command installs the required Python libraries:
    `pip install -U langchain-community faiss-cpu langchain-openai tiktoken`
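
Because the notebook depends on exactly the variables listed above, it can help to check the environment before running the cells. The following is a minimal sketch using only the standard library. The helper name `check_environment` is hypothetical, and treating every other variable whose name contains `OPENAI` as "unexpected" is an assumption about what the note above means.

```python
import os

# Variables the notebook expects, taken from the list above.
REQUIRED = (
    "AZURE_OPENAI_ENDPOINT",
    "OPENAI_API_VERSION",
    "AZURE_OPENAI_API_KEY",
    "OPENAI_ORG_ID",
    "OPENAI_API_DEPLOYMENT_NAME",
    "OPENAI_API_MODEL",
)


def check_environment(env):
    """Return (missing, unexpected) variable names for a mapping such as os.environ."""
    # A variable counts as missing when it is absent or empty.
    missing = [name for name in REQUIRED if not env.get(name)]
    # Assumption: any other variable with "OPENAI" in its name is unwanted.
    unexpected = sorted(
        name for name in env if "OPENAI" in name.upper() and name not in REQUIRED
    )
    return missing, unexpected


missing, unexpected = check_environment(os.environ)
print("missing:", missing)
print("unexpected:", unexpected)
```

If `unexpected` is not empty, unsetting those variables before starting the notebook kernel is probably the safest option.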

In [2]:
pip install -U langchain langchain-community lancedb langchain-openai tiktoken openai pandas

Collecting tiktoken
  Using cached tiktoken-0.6.0-cp312-cp312-win_amd64.whl.metadata (6.8 kB)
Collecting pandas
  Downloading pandas-2.2.0-cp312-cp312-win_amd64.whl.metadata (19 kB)
Collecting pytz>=2020.1 (from pandas)
  Downloading pytz-2024.1-py2.py3-none-any.whl.metadata (22 kB)
Collecting tzdata>=2022.7 (from pandas)
  Downloading tzdata-2023.4-py2.py3-none-any.whl.metadata (1.4 kB)
Downloading pandas-2.2.0-cp312-cp312-win_amd64.whl (11.5 MB)

In [2]:
import os
from langchain.text_splitter import CharacterTextSplitter
from langchain_community.document_loaders import TextLoader
from langchain_openai import AzureOpenAIEmbeddings
from langchain_openai import AzureOpenAI
from langchain_community.vectorstores import LanceDB
from langchain.chains.question_answering import load_qa_chain
import lancedb

azure_endpoint = os.getenv("AZURE_OPENAI_ENDPOINT")
# The deployment used for embeddings must be an embedding model.
# The chat models "gpt-35-turbo", "gpt-4" and "gpt-4-32k" were each rejected with
#   BadRequestError: Error code: 400 - 'OperationNotSupported'
#   ('The embeddings operation does not work with the specified model, ...').
# Models supported per operation: https://go.microsoft.com/fwlink/?linkid=2197993
deployment_name = "text-embedding-ada-002" 
model_name = "text-embedding-ada-002" 
openai_api_key = os.getenv("AZURE_OPENAI_API_KEY")
openai_api_version = os.getenv("OPENAI_API_VERSION")
openai_organization = os.getenv("OPENAI_ORG_ID")

# Open the text file and read the text; the with-block closes the file afterwards.
with open("./data/how_AI_could_empower_any_business.txt", "r") as text_file:
    raw_text = text_file.read()

# Split the text at newlines into chunks of about 1000 characters with 200 characters overlap.
# A single line longer than chunk_size is kept whole, so some chunks can exceed the limit.
text_splitter = CharacterTextSplitter(
    separator="\n",
    chunk_size=1000,
    chunk_overlap=200,
    length_function=len,
)
textTexts = text_splitter.split_text(raw_text)

# Show how many chunks of text are generated.
len(textTexts)

# Pass the text chunks to the Embedding Model from Azure OpenAI API to generate embeddings.
embeddings = AzureOpenAIEmbeddings(
    azure_endpoint=azure_endpoint,
    azure_deployment=deployment_name,
    openai_api_key=openai_api_key,
    openai_api_version=openai_api_version,
    openai_organization=openai_organization,
    chunk_size=1,
)

# Use LanceDB to index the embeddings. This will allow us to perform a similarity search on the texts using the embeddings.
# https://python.langchain.com/docs/integrations/vectorstores/lancedb
db = lancedb.connect("/tmp/lancedb")
table = db.create_table(
    "my_table",
    data=[
        {
            "vector": embeddings.embed_query("Hello World"),
            "text": "Hello World",
            "id": "1",
        }
    ],
    mode="overwrite",
)
docSearch = LanceDB.from_texts(texts=textTexts, embedding=embeddings, connection=table)

# Create a Question Answering chain using the embeddings and the similarity search.
# https://docs.langchain.com/docs/components/chains/index_related_chains
azureOpenAI = AzureOpenAI(
    openai_api_key=openai_api_key, 
    deployment_name="gpt-4-32k", 
    model_name="gpt-4-32k")
chain = load_qa_chain(azureOpenAI, chain_type="stuff")

# Perform first sample of question answering.
inquiry = "What is this file talking about?"
docs = docSearch.similarity_search(inquiry)
chain.run(input_documents=docs, question=inquiry)

# Perform second sample of question answering.
inquiry = "How could AI empower any business?"
docs = docSearch.similarity_search(inquiry)
chain.run(input_documents=docs, question=inquiry)

# Perform third sample of question answering.
inquiry = "Can you give me an example of how AI empowers the business?"
docs = docSearch.similarity_search(inquiry)
chain.run(input_documents=docs, question=inquiry)

# Perform fourth sample of question answering.
inquiry = "Please help to summarize this file into 300 words."
docs = docSearch.similarity_search(inquiry)
chain.run(input_documents=docs, question=inquiry)

Created a chunk of size 1017, which is longer than the specified 1000
Created a chunk of size 1455, which is longer than the specified 1000
  warn_deprecated(
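
The two "Created a chunk of size ..." warnings come from the text splitter. With `separator = "\n"` it only cuts at line breaks, so a line longer than `chunk_size` cannot be shortened and is emitted as an oversized chunk. The sketch below illustrates this with a simplified greedy packer. It is not LangChain's implementation, it ignores `chunk_overlap`, and `split_on_newlines` is a hypothetical name.

```python
def split_on_newlines(text, chunk_size):
    """Greedily pack whole lines into chunks of at most chunk_size characters."""
    chunks, current = [], ""
    for line in text.split("\n"):
        candidate = line if not current else current + "\n" + line
        if len(candidate) <= chunk_size or not current:
            # The line fits, or it starts a new chunk and must be kept whole.
            current = candidate
        else:
            # The line does not fit: close the current chunk and start a new one.
            chunks.append(current)
            current = line
    if current:
        chunks.append(current)
    return chunks


sample = "short line\n" + "x" * 30 + "\nanother short line"
for chunk in split_on_newlines(sample, chunk_size=20):
    flag = " (over limit)" if len(chunk) > 20 else ""
    print(f"{len(chunk)}{flag}")
```

The 30-character line ends up as its own chunk although the limit is 20, which mirrors the chunks of size 1017 and 1455 reported above.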


BadRequestError: Error code: 400 - {'error': {'code': 'OperationNotSupported', 'message': 'The completion operation does not work with the specified model, text-embedding-ada-002. Please choose different model and try again. You can learn more about which models can be used with each operation here: https://go.microsoft.com/fwlink/?linkid=2197993.'}}
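
The final `BadRequestError` says that a completion request reached the embedding model `text-embedding-ada-002`; the output does not show why the request was routed there. Independently of that, the GPT-4 models are chat models, and the LangChain page linked at the top documents `AzureChatOpenAI` for them. The following is a minimal, untested sketch of building such a client. It assumes that a chat deployment named `gpt-4-32k` exists in the Azure resource, and `chat_model_kwargs` and `build_chat_model` are hypothetical helper names.

```python
import os


def chat_model_kwargs(deployment):
    """Collect the constructor arguments for a chat client from the environment."""
    return {
        "azure_endpoint": os.getenv("AZURE_OPENAI_ENDPOINT"),
        "azure_deployment": deployment,  # assumption: name of a chat deployment
        "openai_api_key": os.getenv("AZURE_OPENAI_API_KEY"),
        "openai_api_version": os.getenv("OPENAI_API_VERSION"),
    }


def build_chat_model(deployment="gpt-4-32k"):
    """Create an AzureChatOpenAI client for the given deployment."""
    # Imported lazily so that chat_model_kwargs works without langchain-openai installed.
    from langchain_openai import AzureChatOpenAI

    return AzureChatOpenAI(**chat_model_kwargs(deployment))
```

The object returned by `build_chat_model()` could be passed to `load_qa_chain` in place of `azureOpenAI`. Whether this resolves the error in this particular setup has not been verified.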