## Integrating Ray Serve with LangChain and OpenAI

<img src="ray_langchain.png" width="50%" height="25%">

The process involves two stages.

1. Build the vector embedding indexes from the PDF document and store them in a Chroma database. Long documents, running to hundreds of pages, can be split into parts whose embeddings are computed in parallel with Ray tasks.
2. Once the Chroma vector database is built and persisted, load the embeddings from disk and use them to answer questions about the document.
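Stage 1 boils down to partitioning the pages and processing the parts concurrently. The sketch below shows that pattern under stated assumptions: it uses the standard library's `ThreadPoolExecutor` as a stand-in for Ray tasks so it runs without a cluster, and `split_into_batches`, `process_in_parallel`, and the page strings are hypothetical names and data, not part of LangChain or Ray.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def split_into_batches(items: Sequence[T], batch_size: int) -> List[List[T]]:
    """Split items (for example, PDF pages) into consecutive batches of at most batch_size."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [list(items[i:i + batch_size]) for i in range(0, len(items), batch_size)]


def process_in_parallel(batches: List[List[T]], work_fn: Callable[[List[T]], R],
                        max_workers: int = 4) -> List[R]:
    """Apply work_fn to every batch concurrently and return results in batch order.

    Stand-in for Ray tasks: with Ray, work_fn would be a @ray.remote function,
    each batch would be submitted with work_fn.remote(batch), and the results
    gathered with ray.get(...).
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in the same order as the input batches.
        return list(pool.map(work_fn, batches))


if __name__ == "__main__":
    # Hypothetical page texts; a real pipeline would load pages with PyPDFLoader.
    pages = [f"page {n}" for n in range(1, 352)]  # 351 pages, like the AI report
    batches = split_into_batches(pages, batch_size=100)
    # Hypothetical work function (len); a real one would compute embeddings for the batch.
    counts = process_in_parallel(batches, len)
    print(len(batches), counts)  # 4 [100, 100, 100, 51]
```

With a batch size of 100, the 351-page report would become four tasks; for a document this size, a single task is also reasonable.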

<img src="create_index_flow.png" width="70%" height="35%">

Our PDF document is an AI report of 351 pages, small enough to process in a single remote Ray task.

<img src="ai_report.png" width="20%" height="10%">




```python
import os
import requests
from pathlib import Path

from langchain.document_loaders import PyPDFLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.chains import ConversationalRetrievalChain
from langchain.chat_models import ChatOpenAI

from ray import serve
from starlette.requests import Request

# Simple example inspired by
# https://medium.com/geekculture/automating-pdf-interaction-with-langchain-and-chatgpt-e723337f26a6
# and https://github.com/sophiamyang/tutorials-LangChain/tree/main
# ChatOpenAI uses OpenAI's gpt-3.5-turbo model by default.
```

### Global variables

```python
OPEN_AI_KEY = "your_key"  # replace with your OpenAI API key
VECTOR_OPEN_AI_DB = "vector_oai_db"  # directory holding the persisted Chroma index
vector_db = Path(Path.cwd(), VECTOR_OPEN_AI_DB).as_posix()
```
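Hard-coding the key in a notebook makes it easy to leak. A minimal alternative, assuming the key is exported as the `OPENAI_API_KEY` environment variable before the notebook starts, is to read it once on the driver; `get_openai_key` is a hypothetical helper, not part of LangChain or Ray.

```python
import os


def get_openai_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the OpenAI API key from the environment, failing loudly if it is unset."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable before starting the deployment.")
    return key
```

Usage: `OPEN_AI_KEY = get_openai_key()`. The key is still passed to the deployment's constructor, since replicas may run in processes that do not see environment variables set later in the notebook.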

### Create our Serve deployment class

  * Loads the persisted embeddings from the [Chroma Vector Store](https://python.langchain.com/en/latest/ecosystem/chroma.html)
  * Creates a LangChain `ConversationalRetrievalChain`, a high-level abstraction that provides an interface to OpenAI's `gpt-3.5-turbo` large language model
  * Its `__call__` method uses the chain to retrieve the document chunks most relevant to the question, sends them with the question to the model, and returns the response
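Conceptually, the retriever returned by `as_retriever()` embeds the question and looks up the stored chunks whose vectors are closest to it; the chain then passes those chunks to the model as context. The sketch below is a toy version of that lookup, not Chroma's implementation: the three-dimensional vectors and chunk names are made up, and it ranks by cosine similarity, whereas Chroma's default distance metric may differ.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: Sequence[float], store: Dict[str, Sequence[float]],
          k: int = 2) -> List[Tuple[str, float]]:
    """Return the k stored chunks most similar to the query, best first."""
    scored = [(chunk, cosine_similarity(query_vec, vec)) for chunk, vec in store.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Hypothetical chunk embeddings; real OpenAI embeddings have far more dimensions.
    store = {
        "publications chunk": [0.9, 0.1, 0.0],
        "jobs chunk": [0.1, 0.9, 0.1],
        "public opinion chunk": [0.0, 0.2, 0.9],
    }
    question_vec = [1.0, 0.0, 0.0]  # pretend embedding of a publications question
    for chunk, score in top_k(question_vec, store):
        print(f"{chunk}: {score:.3f}")
```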

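```python
@serve.deployment(route_prefix="/",
                  autoscaling_config={
                      "min_replicas": 2,
                      "initial_replicas": 2,
                      "max_replicas": 4,
                  })
class AnswerOpenAIPDFQuestions:
    def __init__(self, vector_db_path: str, open_ai_key: str):
        # LangChain's OpenAI wrappers read the key from the environment.
        os.environ["OPENAI_API_KEY"] = open_ai_key
        embeddings = OpenAIEmbeddings()
        # Load the persisted Chroma index; the same embedding function embeds incoming questions.
        self._vectordb = Chroma(persist_directory=vector_db_path, embedding_function=embeddings)
        self._pdf_qa_chain = ConversationalRetrievalChain.from_llm(ChatOpenAI(temperature=0.9),
                                                                   self._vectordb.as_retriever())
        # Kept empty: each question is answered independently, and replicas do not share state.
        self._chat_history = []

    async def __call__(self, http_request: Request) -> dict:
        # Expected body: a JSON list such as [{"text": "question"}] or [{"text": ["q1", "q2"]}].
        json_request: list = await http_request.json()
        prompts = []
        for prompt in json_request:
            text = prompt["text"]
            if isinstance(text, list):
                prompts.extend(text)
            else:
                prompts.append(text)
        # Only the first prompt is answered; the result dict contains the
        # "question", "chat_history", and "answer" keys.
        result = self._pdf_qa_chain({"question": prompts[0], "chat_history": self._chat_history})
        return result
```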
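The prompt-parsing loop in `__call__` can be pulled out into a plain function and checked without starting Ray or calling OpenAI. `extract_prompts` is a hypothetical helper name; its body mirrors the loop in the class above.

```python
from typing import Any, Dict, List


def extract_prompts(json_request: List[Dict[str, Any]]) -> List[str]:
    """Flatten [{"text": str | list[str]}, ...] into one flat list of prompts."""
    prompts: List[str] = []
    for prompt in json_request:
        text = prompt["text"]
        if isinstance(text, list):
            prompts.extend(text)  # a list of questions in one entry
        else:
            prompts.append(text)  # a single question
    return prompts


if __name__ == "__main__":
    body = [{"text": "What is the total number of publications?"},
            {"text": ["First extra question?", "Second extra question?"]}]
    print(extract_prompts(body))
```

Because the deployment answers only `prompts[0]`, a request carrying several prompts still gets a single answer; sending one question per request, as the client code in this notebook does, avoids silently dropping the rest.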

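### Create the Serve deployment

The deployment starts with two replicas and can autoscale up to four; each replica answers prompts (questions) and returns responses. To start it, bind the class to its constructor arguments and run it, for example with `serve.run(AnswerOpenAIPDFQuestions.bind(vector_db, OPEN_AI_KEY))`.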
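The `autoscaling_config` in the decorator bounds how many replicas Serve may run. The following is a deliberately simplified model of that bound, assuming a policy that sizes the deployment to a target number of in-flight requests per replica. Ray Serve's real policy also smooths its decisions and applies upscale and downscale delays, which this sketch ignores; `desired_replicas` and `target_per_replica` are illustrative names, not Ray Serve APIs.

```python
import math


def desired_replicas(ongoing_requests: int, target_per_replica: float,
                     min_replicas: int = 2, max_replicas: int = 4) -> int:
    """Simplified model: enough replicas to keep about target_per_replica
    in-flight requests on each, clamped to [min_replicas, max_replicas]."""
    if target_per_replica <= 0:
        raise ValueError("target_per_replica must be positive")
    needed = math.ceil(ongoing_requests / target_per_replica)
    return max(min_replicas, min(max_replicas, needed))


if __name__ == "__main__":
    for load in (0, 3, 10):
        print(load, "in-flight requests ->", desired_replicas(load, target_per_replica=1))
```

With the notebook's bounds, an idle deployment stays at two replicas and heavy load is capped at four.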

### Send questions to the LLM served by Ray Serve

<img src="ray_serve_request_response.png" width="50%" height="25%">


```python
# Questions to send to the deployment
prompts = ["What is the total number of publications?",
           "What is the percentage increase in the number of AI-related job postings?",
           "Why Chinese citizens are more optimistic about AI than Americans?",
           # "What are the top takeaways from this report?",
           # "List the benchmarks released to evaluate AI workloads.",
           # "Describe and summarize the technical ethical issues raised in the report.",
           # "How many bills containing “artificial intelligence” were passed into law?",
           ]
```

```python
# Send each question as its own request; the deployment answers one prompt per request.
sample_inputs = [{"text": prompt} for prompt in prompts]
for sample_input in sample_inputs:
    output = requests.post("http://localhost:8000/", json=[sample_input]).json()
    print(f"Question: {output['question']}\nAnswer: {output['answer']}\n")
```

```
Question: What is the total number of publications?
Answer: From 2010 to 2021, the total number of AI publications more than doubled, growing from 200,000 in 2010 to almost 500,000 in 2021. However, it is important to note that this only includes English-language and Chinese-language AI publications globally.

(HTTPProxyActor pid=50775) INFO 2023-04-21 11:10:49,270 http_proxy 127.0.0.1 http_proxy.py:373 - POST / 200 5465.9ms
(ServeReplica:AnswerOpenAIPDFQuestions pid=50778) INFO 2023-04-21 11:10:49,265 AnswerOpenAIPDFQuestions AnswerOpenAIPDFQuestions#AfPppG replica.py:518 - HANDLE __call__ OK 5455.8ms

Question: What is the percentage increase in the number of AI-related job postings?
Answer: The data provides information on an increase in AI-related job postings, but it does not provide a single percentage increase across all job postings. It shows percentage increases and rankings by country, geographic area, sector, and state for different time periods.

(HTTPProxyActor pid=50775) INFO 2023-04-21 11:10:53,386 http_proxy 127.0.0.1 http_proxy.py:373 - POST / 200 4103.2ms
(ServeReplica:AnswerOpenAIPDFQuestions pid=50777) INFO 2023-04-21 11:10:53,381 AnswerOpenAIPDFQuestions AnswerOpenAIPDFQuestions#hsvqBJ replica.py:518 - HANDLE __call__ OK 4091.0ms

Question: Why Chinese citizens are more optimistic about AI than Americans?
Answer: According to a 2022 IPSOS survey, 78% of Chinese respondents agreed with the statement that products and services using AI have more benefits than drawbacks, whereas only 35% of sampled Americans agreed with this statement. The report does not provide a clear explanation for why Chinese citizens are more optimistic about AI than Americans. However, it does suggest that opinions about AI vary across countries, and sentiment relating to AI products and services seems to be strongly correlated within specific countries.
```
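The loop in the client cell sends the questions one at a time. Since the deployment keeps at least two replicas, a client could instead issue the requests concurrently so that Serve can spread them across replicas. A minimal sketch, assuming a `post_fn` callable that POSTs a payload to the endpoint and returns the decoded JSON (for example `lambda p: requests.post("http://localhost:8000/", json=p).json()`); `ask_concurrently` and the offline `fake_post` are hypothetical names used so the sketch runs without a live deployment.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# post_fn takes the JSON payload (a list like [{"text": ...}]) and returns the response dict.
PostFn = Callable[[list], Dict[str, str]]


def ask_concurrently(prompts: List[str], post_fn: PostFn,
                     max_workers: int = 4) -> List[Dict[str, str]]:
    """Send each prompt as its own request, in parallel, preserving input order."""
    payloads = [[{"text": prompt}] for prompt in prompts]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in the same order as the payloads.
        return list(pool.map(post_fn, payloads))


if __name__ == "__main__":
    # Offline stand-in for the Serve endpoint.
    def fake_post(payload: list) -> Dict[str, str]:
        question = payload[0]["text"]
        return {"question": question, "answer": f"(answer to: {question})"}

    for out in ask_concurrently(["First question?", "Second question?"], fake_post):
        print(f"Question: {out['question']}\nAnswer: {out['answer']}\n")
```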



```python
serve.shutdown()
```

```
(HTTPProxyActor pid=50775) INFO 2023-04-21 11:11:00,307 http_proxy 127.0.0.1 http_proxy.py:373 - POST / 200 6912.8ms
(ServeReplica:AnswerOpenAIPDFQuestions pid=50778) INFO 2023-04-21 11:11:00,303 AnswerOpenAIPDFQuestions AnswerOpenAIPDFQuestions#AfPppG replica.py:518 - HANDLE __call__ OK 6903.1ms
(ServeController pid=50769) INFO 2023-04-21 11:11:01,849 controller 50769 deployment_state.py:1359 - Removing 2 replicas from deployment 'AnswerOpenAIPDFQuestions'.
```
