<center><a href="https://www.nvidia.com/en-us/training/"><img src="https://dli-lms.s3.amazonaws.com/assets/general/DLI_Header_White.png" width="400" height="186" /></a></center>

<br>

# <font color="#76b900">**Notebook 9:** LangServe and Assessment</font>

<br>

## LangServe Server Setup

This notebook is a playground for those interested in developing interactive web applications using LangChain and [**LangServe**](https://python.langchain.com/docs/langserve). The aim is to provide a minimal-code example to illustrate the potential of LangChain in web application contexts.

This section walks through setting up a simple API server using LangChain's Runnable interfaces with FastAPI. The example shows how to integrate a LangChain model, such as `ChatNVIDIA`, and expose it through API routes. With this in place, you can supply functionality to the frontend service's [**`frontend_server.py`**](frontend/frontend_server.py) session, which expects:
- A simple endpoint named `:9012/basic_chat` for the basic chatbot, exemplified below.
- A pair of endpoints named `:9012/retriever` and `:9012/generator` for the RAG chatbot.
- All three for the **Evaluate** utility, which will be required for the final assessment. *More on that later!*
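
For orientation, the sketch below lists the URLs that would end up being called for the three routes above. It assumes LangServe's standard sub-routes (`/invoke`, `/batch`, `/stream`) under each path registered with `add_routes`. The helper name `langserve_urls` and the default `localhost` host are illustrative and not part of the course code.

```python
# Illustrative helper (not part of the course code): list the URLs that
# LangServe exposes for each route the frontend expects on port 9012.
EXPECTED_ROUTES = ["basic_chat", "retriever", "generator"]
LANGSERVE_METHODS = ["invoke", "batch", "stream"]  # standard add_routes sub-routes


def langserve_urls(host: str = "localhost", port: int = 9012) -> dict:
    """Map each expected route name to its invoke/batch/stream URLs."""
    base = f"http://{host}:{port}"
    return {
        route: [f"{base}/{route}/{method}" for method in LANGSERVE_METHODS]
        for route in EXPECTED_ROUTES
    }


if __name__ == "__main__":
    for route, urls in langserve_urls().items():
        print(route)
        for url in urls:
            print("  ", url)
```

Passing `host="lab"` gives the addresses as seen from the frontend microservice, which reaches this server under that hostname.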

**IMPORTANT NOTES:**
- Make sure to click the square ( $\square$ ) button twice to shut down an active FastAPI cell. The first click might fall through or only trigger exception handling in an asynchronous process.
- If it still doesn't work, do a hard restart on this notebook by using **Kernel -> Restart Kernel**.
- While a FastAPI server is running in a cell, expect the process to block this notebook. Other notebooks should not be affected.
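
To confirm from another notebook whether the server really shut down (or is up), a plain TCP check is usually enough. This is a minimal sketch using only the standard library. The function name `port_is_open` is illustrative, and it assumes the server listens on port 9012 of the local machine.

```python
import socket


def port_is_open(host: str = "localhost", port: int = 9012, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections and timeouts
        return False


if __name__ == "__main__":
    state = "still running" if port_is_open() else "not reachable"
    print(f"Server on port 9012 is {state}")
```

If the port still accepts connections after stopping the cell, the kernel restart described above is the next step.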

<br>

### **Part 1:** Delivering the /basic_chat endpoint

Instructions are provided for launching a `/basic_chat` endpoint as a standalone Python file. The frontend will use this endpoint to make basic decisions with no internal reasoning.

In [None]:
%%writefile server_app.py
from fastapi import FastAPI
from operator import itemgetter
import os
from typing import Any, Dict

from langserve import add_routes

from langchain_community.vectorstores import FAISS
from langchain_nvidia_ai_endpoints import ChatNVIDIA, NVIDIAEmbeddings

from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnableLambda

# ---------------- Config ----------------
EMBED_MODEL = os.getenv("EMBED_MODEL", "nvidia/nv-embed-v1")   # must match index build model
LLM_MODEL   = os.getenv("LLM_MODEL", "meta/llama-3.1-8b-instruct")
INDEX_DIR   = os.getenv("INDEX_DIR", "docstore_index")
PORT        = int(os.getenv("PORT", "9012"))

# -------------- Utilities ---------------
def normalize_input(x: Any) -> Dict[str, Any]:
    """
    Accept raw strings or dicts and always return a dict with at least 'input'.
    Preserves extra keys like 'context' or 'docs' if provided.
    """
    if isinstance(x, str):
        return {"input": x}
    if isinstance(x, dict):
        if "input" in x and isinstance(x["input"], str):
            return x
        # common alternates some UIs send
        for k in ("question", "query", "text", "prompt"):
            if k in x and isinstance(x[k], str):
                return {"input": x[k], **{kk: vv for kk, vv in x.items() if kk != k}}
        # last resort: stringify the whole dict; 'input' goes last so it
        # overrides any non-string 'input' already present in x
        return {**x, "input": str(x)}
    return {"input": str(x)}

normalize = RunnableLambda(normalize_input)

# ----------- Load FAISS & LLM -----------
if not os.path.isdir(INDEX_DIR):
    raise FileNotFoundError(f"Missing '{INDEX_DIR}/'. Make sure your FAISS index is present.")

embedder = NVIDIAEmbeddings(model=EMBED_MODEL, truncate="END")
docstore = FAISS.load_local(INDEX_DIR, embedder, allow_dangerous_deserialization=True)
retriever = docstore.as_retriever(search_kwargs={"k": 4})

llm = ChatNVIDIA(model=LLM_MODEL) | StrOutputParser()

# ---------------- Prompts ----------------
basic_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a concise, helpful assistant."),
    ("user", "{input}")
])

rag_prompt = ChatPromptTemplate.from_messages([
    ("system",
     "Answer ONLY using the provided context. If the context is insufficient, say you don't know.\n\n"
     "Context:\n{context}"),
    ("user", "{input}")
])

# ---------------- Chains -----------------
# /basic_chat: accept string or dict
basic_chat_chain = normalize | basic_prompt | llm

# /retriever: MUST return List[Document] (no reorder/docs2str here; frontend does it)
retriever_chain = normalize | itemgetter("input") | retriever

# /generator: expects {"input": "...", "context": "<string>"}
# (frontend builds the context string for you)
generator_chain = normalize | rag_prompt | llm

# --------------- App ---------------------
app = FastAPI(
    title="LangChain Server",
    version="1.0",
    description="RAG endpoints compatible with the course frontend/grader."
)

add_routes(app, basic_chat_chain, path="/basic_chat")
add_routes(app, retriever_chain,  path="/retriever")
add_routes(app, generator_chain,  path="/generator")

@app.get("/")
def root():
    return {"status": "ok",
            "routes": ["/basic_chat/invoke", "/basic_chat/stream",
                       "/retriever/invoke", "/retriever/stream",
                       "/generator/invoke", "/generator/stream"]}

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=PORT)

In [None]:
## Works, but will block the notebook.
!python server_app.py  

## Will technically work, but not recommended in a notebook. 
## You may be surprised at the interesting side effects...
# import os
# os.system("python server_app.py &")

<br>

### **Part 2:** Using the Server

This setup cannot be easily used within Google Colab (or at least not without a lot of special tricks). In this environment, the above script keeps a running server tied to the notebook process. While the server is running, do not attempt to use this notebook except to shut down or restart the service.

In another file, however, you should be able to access the `basic_chat` endpoint using the following interface:

```python
from langserve import RemoteRunnable
from langchain_core.output_parsers import StrOutputParser

llm = RemoteRunnable("http://0.0.0.0:9012/basic_chat/") | StrOutputParser()
for token in llm.stream("Hello World! How is it going?"):
    print(token, end='')
```

**Please try it out in a different file and see if it works!**
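
The same endpoint can also be reached without `RemoteRunnable`, over plain HTTP. The sketch below assumes LangServe's usual JSON envelope for `/invoke` (request body `{"input": ...}`, response body with an `"output"` field) and uses only the standard library. The helper names `build_invoke_request` and `invoke`, and the `localhost` base URL, are illustrative assumptions.

```python
import json
import urllib.request


def build_invoke_request(
    text: str, base_url: str = "http://localhost:9012/basic_chat"
) -> urllib.request.Request:
    """Build (but do not send) a POST request for the route's /invoke sub-route."""
    payload = json.dumps({"input": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/invoke",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def invoke(text: str) -> str:
    """Send the request; this only works while the server is running."""
    with urllib.request.urlopen(build_invoke_request(text), timeout=60) as resp:
        return json.loads(resp.read())["output"]
```

Calling `invoke("Hello World! How is it going?")` from a different file should return the full reply in one piece, in contrast to the token-by-token streaming shown above.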


<br>

### **Part 3:** Final Assessment

**This notebook will be used to complete the final assessment!** When you have otherwise finished the course, we recommend cloning this notebook, opening the frontend in a new tab, and implementing the Evaluate functionality by providing the `/generator` and `/retriever` endpoints above. For a quick link to the frontend, run the cell below:

In [None]:
%%js
var url = 'http://'+window.location.host+':8090';
element.innerHTML = '<a style="color:#76b900;" target="_blank" href='+url+'><h2>< Link To Gradio Frontend ></h2></a>';

<hr>
<br>

#### **Assessment Hint:** 
Note that the following functionality is already implemented in the frontend microservice. 

```python
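## Imports this excerpt relies on
from operator import itemgetter
from langserve import RemoteRunnable
from langchain_core.runnables.passthrough import RunnableAssign
from langchain_community.document_transformers import LongContextReorder
## docs2str and output_puller are helpers defined in the frontend code

## Necessary Endpoints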
chains_dict = {
    'basic' : RemoteRunnable("http://lab:9012/basic_chat/"),
    'retriever' : RemoteRunnable("http://lab:9012/retriever/"),  ## For the final assessment
    'generator' : RemoteRunnable("http://lab:9012/generator/"),  ## For the final assessment
}

basic_chain = chains_dict['basic']

## Retrieval-Augmented Generation Chain

retrieval_chain = (
    {'input' : (lambda x: x)}
    | RunnableAssign(
        {'context' : itemgetter('input') 
        | chains_dict['retriever'] 
        | LongContextReorder().transform_documents
        | docs2str
    })
)

output_chain = RunnableAssign({"output" : chains_dict['generator'] }) | output_puller
rag_chain = retrieval_chain | output_chain
```

**To conform to this endpoint ingestion strategy, make sure not to duplicate pipeline functionality and only deploy the features that are missing!**
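
To make the division of labor concrete, here is a plain-Python stand-in for the data flow in the excerpt above, with no LangChain or network involved. Everything in it is hypothetical: `Doc` mimics a LangChain document, `fake_retriever` and `fake_generator` play the roles of the two remote endpoints, and this `docs2str` is only a guess at what the frontend helper does. The reordering step is omitted for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Stand-in for a LangChain Document (hypothetical, for illustration only)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def fake_retriever(query: str) -> list:
    """Plays the role of /retriever: query string in, list of documents out."""
    return [Doc(f"Passage about {query}", {"Title": "Example Paper"})]


def docs2str(docs: list) -> str:
    """Assumed behavior of the frontend helper: join documents into one string."""
    return "\n".join(
        f"[{d.metadata.get('Title', 'Untitled')}] {d.page_content}" for d in docs
    )


def fake_generator(state: dict) -> str:
    """Plays the role of /generator: receives both 'input' and 'context'."""
    missing = {"input", "context"} - set(state)
    if missing:
        raise KeyError(f"generator payload is missing keys: {sorted(missing)}")
    return f"Answer to {state['input']!r} using {len(state['context'])} context characters"


def rag_pipeline(question: str) -> str:
    state = {"input": question}                                  # frontend: wrap the input
    state["context"] = docs2str(fake_retriever(state["input"]))  # frontend: retrieve + stringify
    return fake_generator(state)                                 # server: generate the answer


if __name__ == "__main__":
    print(rag_pipeline("retrieval-augmented generation"))
```

The point of the sketch is the shapes at each boundary: `/retriever` returns documents rather than a string, and `/generator` receives a dictionary that already contains a context string built by the frontend.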
