# Deploy a Simple LLM Chain API with LangServe

## Install OpenAI and LangChain dependencies

Install httpx 0.27.2, which is pinned here for compatibility with the other libraries used in this notebook.

In [1]:
!pip install httpx==0.27.2

Collecting httpx==0.27.2
  Downloading httpx-0.27.2-py3-none-any.whl.metadata (7.1 kB)
Downloading httpx-0.27.2-py3-none-any.whl (76 kB)
Installing collected packages: httpx
  Attempting uninstall: httpx
    Found existing installation: httpx 0.28.1
    Uninstalling httpx-0.28.1:
      Successfully uninstalled httpx-0.28.1
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
google-genai 1.16.1 requires httpx<1.0.0,>=0.28.1, but you have httpx 0.27.2 which is incompatible.
Successfully installed httpx-0.27.2


In [2]:
!pip install langchain==0.2.0
!pip install langchain-openai==0.1.7
!pip install langchain-community==0.2.0
!pip install langserve[all]==0.2.1

Collecting langchain==0.2.0
  Downloading langchain-0.2.0-py3-none-any.whl.metadata (13 kB)
Collecting dataclasses-json<0.7,>=0.5.7 (from langchain==0.2.0)
  Downloading dataclasses_json-0.6.7-py3-none-any.whl.metadata (25 kB)
Collecting langchain-core<0.3.0,>=0.2.0 (from langchain==0.2.0)
  Downloading langchain_core-0.2.43-py3-none-any.whl.metadata (6.2 kB)
Collecting langchain-text-splitters<0.3.0,>=0.2.0 (from langchain==0.2.0)
  Downloading langchain_text_splitters-0.2.4-py3-none-any.whl.metadata (2.3 kB)
Collecting langsmith<0.2.0,>=0.1.17 (from langchain==0.2.0)
  Downloading langsmith-0.1.147-py3-none-any.whl.metadata (14 kB)
Collecting numpy<2,>=1 (from langchain==0.2.0)
  Downloading numpy-1.26.4-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (61 kB)
Collecting tenacity<9.0.0,>=8.1.0 (from langchain==0.2.0)
  Downloading tenacity-8.5.0-p

## Set Up Environment Variables

In [3]:
from google.colab import userdata
import os

os.environ['OPENAI_API_KEY'] = userdata.get('OPENAI_API_KEY')
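
The cell above relies on `google.colab.userdata`, which only exists inside Colab. Outside Colab, one possible fallback is to read the key from the environment and prompt for it only when it is missing. The helper name `ensure_api_key` and its `prompt_fn` parameter are hypothetical, introduced here for illustration only.

```python
import os
from getpass import getpass
from typing import Callable


def ensure_api_key(name: str = "OPENAI_API_KEY",
                   prompt_fn: Callable[[str], str] = getpass) -> str:
    """Return the key from the environment, prompting once if it is missing.

    `ensure_api_key` and `prompt_fn` are illustrative names, not part of
    LangChain or Colab. `prompt_fn` defaults to getpass so the key is not
    echoed to the screen.
    """
    if not os.environ.get(name):
        # Store the value so the server process started later inherits it.
        os.environ[name] = prompt_fn(f"Enter {name}: ")
    return os.environ[name]
```

Calling `ensure_api_key()` once before the server is started leaves `OPENAI_API_KEY` set for the server process, which is where `ChatOpenAI` looks for it by default.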

In [4]:
%%writefile langchain_server_api.py
# LangServe API server, saved to a Python file so it can be deployed.
from fastapi import FastAPI
from langchain_openai import ChatOpenAI
from fastapi.middleware.cors import CORSMiddleware
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langserve import add_routes
import os

# Create an instance of FastAPI to serve as the main application.
app = FastAPI(
    title="LangChain Server",
    version="1.0",
    description="Spin up a simple API server using Langchain's Runnable interfaces",
)

# Configure CORS middleware to allow all origins, enabling cross-origin requests.
# details: https://fastapi.tiangolo.com/tutorial/cors/
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
    expose_headers=["*"],
)

@app.get("/liveness")
def liveness():
    """
    Define a liveness check endpoint.

    This route is used to verify that the API is operational and responding to requests.

    Returns:
        A simple string message indicating the API is working.
    """
    return 'API Works!'


# create input prompt
prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "Act as a helpful AI assistant, answer questions in detail with examples as necessary"),
        ("human", "{input}"),
    ]
)

# Initialize the OpenAI Chat instance with specific model parameters.
chatgpt = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)

# create simple llm chain
llm_chain = (prompt
                |
             chatgpt
                |
             StrOutputParser()
)


# Register the chain's routes with LangServe's add_routes, which exposes the chain under /llm_chain.
add_routes(
    app,
    llm_chain,
    path="/llm_chain",
)

if __name__ == "__main__":
    import uvicorn
    # Start the server on localhost at port 8989.
    uvicorn.run(app, host="127.0.0.1", port=8989)

Writing langchain_server_api.py


In [5]:
!python langchain_server_api.py &>./app_logs.txt &

In [6]:
!ps -ef | grep langchain_server_api

root        1093       1 70 11:54 ?        00:00:02 python3 langchain_server_api.py
root        1111     410  0 11:54 ?        00:00:00 /bin/bash -c ps -ef | grep langchain_server_api
root        1113    1111  0 11:54 ?        00:00:00 grep langchain_server_api


In [7]:
# Run this only to stop the server: replace 12764 with the PID reported by `ps` above.
!sudo kill -9 12764

kill: (12764): No such process


## Load Dependencies

In [8]:
from langchain_core.prompts import ChatPromptTemplate
import requests

## Check if API works

In [9]:
response = requests.get('http://127.0.0.1:8989/liveness')

In [10]:
response.json(), response.status_code

('API Works!', 200)
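
Because the server is launched in the background, a liveness request sent immediately afterwards can fail with a connection error while Uvicorn is still starting. A minimal sketch of polling the endpoint until it responds is shown below. It uses only the standard library, and the helper name `wait_for_server` and its default timings are assumptions made for illustration.

```python
import time
import urllib.request


def wait_for_server(url: str, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll `url` until it answers with HTTP 200 or `timeout` seconds elapse.

    `wait_for_server` is a hypothetical helper, not part of LangServe.
    Returns True once the server responds, False if the deadline passes.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval + 1) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # Covers refused connections, timeouts, and HTTP error statuses:
            # the server is not ready yet, so wait and retry.
            pass
        time.sleep(interval)
    return False
```

In this notebook, `wait_for_server('http://127.0.0.1:8989/liveness')` could be run before the `requests.get` cell. A `False` result suggests checking `app_logs.txt` for startup errors.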

## Connect to the LLM API endpoint

In [11]:
from langserve import RemoteRunnable

chain_endpoint = RemoteRunnable("http://127.0.0.1:8989/llm_chain")

### Try a simple prompt

In [12]:
prompt = 'Tell me about Generative AI in 3 bullet points'

In [13]:
response = chain_endpoint.invoke({'input': prompt})

In [14]:
print(response)

1. Generative AI is a type of artificial intelligence that is capable of creating new content, such as images, text, or music, based on patterns it has learned from existing data. This is in contrast to other types of AI, such as discriminative AI, which focuses on classification tasks.

2. One popular application of generative AI is in the field of image generation, where models like Generative Adversarial Networks (GANs) are used to create realistic images that are indistinguishable from real ones. For example, GANs have been used to generate photorealistic images of non-existent people, animals, or even landscapes.

3. Generative AI has also been used in natural language processing tasks, such as text generation and language translation. Models like OpenAI's GPT-3 can generate human-like text based on a given prompt, while neural machine translation models like Google Translate use generative techniques to translate text between languages.
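
`RemoteRunnable` is a thin client over plain HTTP, so the same call can be made without LangServe on the client side. The sketch below assumes LangServe's default REST layout, in which `add_routes` exposes `POST <path>/invoke`, expects the chain input under a top-level `"input"` key, and returns the result under `"output"`. The function names are hypothetical, and the standard library is used in place of `requests` to keep the block self-contained.

```python
import json
import urllib.request


def build_invoke_payload(question: str) -> dict:
    """Build the JSON body for the /invoke route (illustrative helper).

    The outer "input" key is LangServe's request envelope; the inner "input"
    key is the {input} variable of the prompt template used in this notebook.
    """
    return {"input": {"input": question}}


def invoke_over_http(base_url: str, question: str, timeout: float = 60.0) -> str:
    """POST a question to `<base_url>/invoke` and return the chain's output."""
    body = json.dumps(build_invoke_payload(question)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/invoke",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        # Assumption: the response JSON carries the chain result under "output".
        return json.loads(resp.read().decode("utf-8"))["output"]
```

With the server running, `invoke_over_http('http://127.0.0.1:8989/llm_chain', prompt)` should be equivalent to the `chain_endpoint.invoke` call above, which makes it a convenient way to call the API from clients that do not have LangServe installed.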


### API supports native streaming

In [15]:
content = ''
for chunk in chain_endpoint.stream({'input': prompt}):
    print(chunk, end="", flush=True)
    content+=chunk

1. Generative AI is a type of artificial intelligence that is capable of creating new content, such as images, text, or music, based on patterns it has learned from existing data. This is in contrast to other types of AI, such as discriminative AI, which focuses on classification tasks.

2. One popular application of generative AI is in the field of image generation, where models like Generative Adversarial Networks (GANs) are used to create realistic images that are indistinguishable from real ones. For example, GANs have been used to generate photorealistic images of non-existent people, animals, or even landscapes.

3. Generative AI has also been used in natural language processing tasks, such as text generation and language translation. Models like OpenAI's GPT-3 can generate coherent and contextually relevant text based on a given prompt, while neural machine translation models like Google Translate use generative techniques to translate text between languages.

In [16]:
content

"1. Generative AI is a type of artificial intelligence that is capable of creating new content, such as images, text, or music, based on patterns it has learned from existing data. This is in contrast to other types of AI, such as discriminative AI, which focuses on classification tasks.\n\n2. One popular application of generative AI is in the field of image generation, where models like Generative Adversarial Networks (GANs) are used to create realistic images that are indistinguishable from real ones. For example, GANs have been used to generate photorealistic images of non-existent people, animals, or even landscapes.\n\n3. Generative AI has also been used in natural language processing tasks, such as text generation and language translation. Models like OpenAI's GPT-3 can generate coherent and contextually relevant text based on a given prompt, while neural machine translation models like Google Translate use generative techniques to translate text between languages."

In [17]:
qs = ['Tell me about Generative AI in 3 bullet points',
      'Tell me what is a large language model',
      'Explain prompt engineering in 1 line']
prompts = [{'input' : q} for q in qs]
prompts

[{'input': 'Tell me about Generative AI in 3 bullet points'},
 {'input': 'Tell me what is a large language model'},
 {'input': 'Explain prompt engineering in 1 line'}]

### Batch Execution

In [18]:
responses = await chain_endpoint.abatch(prompts)

In [19]:
for response in responses:
    print(response)
    print('-----')

1. Generative AI is a type of artificial intelligence that is capable of creating new content, such as images, text, or music, based on patterns it has learned from existing data. This is in contrast to other types of AI, such as discriminative AI, which focuses on classification tasks.

2. One popular application of generative AI is in the field of image generation, where models like Generative Adversarial Networks (GANs) are used to create realistic images that are indistinguishable from real ones. For example, GANs have been used to generate photorealistic images of non-existent people, animals, or even landscapes.

3. Generative AI has also been used in natural language processing tasks, such as text generation and language translation. Models like OpenAI's GPT-3 can generate coherent and contextually relevant text based on a given prompt, while neural machine translation models like Google Translate use generative techniques to translate text between languages.
-----
A large langu