# Importing Required Libraries

First, we import the libraries used in this notebook. They cover web scraping, document loading and splitting, embeddings, vector storage, and chat responses.


In [None]:
# Uncomment and run the code below to install dependencies
# !pip install bs4 langchain openai chromadb unstructured tiktoken

In [None]:
import os
import requests
from bs4 import BeautifulSoup
from urllib.parse import urlparse, urljoin
from langchain.document_loaders import UnstructuredURLLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.chains.question_answering import load_qa_chain
from langchain.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate


# Defining Helper Functions

Two helper functions are defined:

- `is_valid(url)`: Checks that a URL has both a scheme and a network location.
- `get_all_website_links(url)`: Retrieves the same-domain links found on a given page.


In [None]:
def is_valid(url):
    parsed = urlparse(url)
    return bool(parsed.netloc) and bool(parsed.scheme)

def get_all_website_links(url):
    urls = set()
    domain_name = urlparse(url).netloc
    try:
        # A timeout keeps the notebook from hanging on an unresponsive site
        soup = BeautifulSoup(requests.get(url, timeout=10).content, "html.parser")
    except requests.exceptions.RequestException as e:
        print(e)
        return urls

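    for a_tag in soup.find_all("a"):
        href = a_tag.attrs.get("href")
        if href == "" or href is None:
            continue
        # Resolve relative links against the page URL
        parsed_href = urlparse(urljoin(url, href))
        # Skip non-web links such as mailto: or javascript:
        if parsed_href.scheme not in ("http", "https"):
            continue
        # Rebuild the URL without its query string and fragment
        href = parsed_href.scheme + "://" + parsed_href.netloc + parsed_href.path
        if not is_valid(href):
            continue
        # Keep only links on the starting host or one of its subdomains
        host = parsed_href.netloc
        if host != domain_name and not host.endswith("." + domain_name):
            continue
        urls.add(href)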
    return urls
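
To see what the link normalization inside `get_all_website_links` does, the sketch below applies the same `urljoin` and `urlparse` steps to a few hand-written `href` values. The base URL and the hrefs are made-up examples, and `normalize` is a hypothetical helper name that does not appear in the notebook.

```python
from urllib.parse import urljoin, urlparse

def is_valid(url):
    parsed = urlparse(url)
    return bool(parsed.netloc) and bool(parsed.scheme)

def normalize(base_url, href):
    """Resolve href against base_url and drop the query string and fragment."""
    parsed = urlparse(urljoin(base_url, href))
    return parsed.scheme + "://" + parsed.netloc + parsed.path

base = "https://example.com/team/"  # hypothetical starting page
for href in ["../papers?year=2023", "/contact#form", "https://other.org/page"]:
    url = normalize(base, href)
    same_site = urlparse(url).netloc == urlparse(base).netloc
    print(href, "->", url, "| valid:", is_valid(url), "| same site:", same_site)
```

Relative links are resolved against the page they were found on, so two different `href` values can collapse to the same URL. Collecting results in a `set` then removes such duplicates.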


# Data Scraping

We use the helper function `get_all_website_links(url)` to collect the same-domain links found on the specified page. We then load the content behind these links using `UnstructuredURLLoader`.


In [None]:
links = get_all_website_links("https://wanglab.ml") # Replace with your website of choice

loader = UnstructuredURLLoader(urls=list(links))  # the loader expects a list, not a set
data = loader.load()




# Text Splitting

The loaded data is split into smaller chunks using the `RecursiveCharacterTextSplitter` from the langchain library.


In [None]:
import tiktoken

tokenizer = tiktoken.get_encoding('cl100k_base')

# create the length function
def tiktoken_len(text):
    tokens = tokenizer.encode(
        text,
        disallowed_special=())
    return len(tokens)

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=400,  # maximum number of tokens per chunk
    chunk_overlap=20,  # number of tokens shared between consecutive chunks
    length_function=tiktoken_len,
    separators=['\n\n', '\n', ' ', '']
)

docs = text_splitter.split_documents(data)
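
The interaction between `chunk_size` and `chunk_overlap` can be illustrated with a simplified fixed-window splitter. This is only a sketch under stated assumptions: `RecursiveCharacterTextSplitter` splits on the listed separators and merges the pieces, so its real chunk boundaries differ. Whitespace-separated words stand in for tiktoken tokens here, and `chunk_tokens` is a hypothetical helper.

```python
def chunk_tokens(tokens, chunk_size, chunk_overlap):
    """Split tokens into windows of at most chunk_size items.

    Each window repeats the last chunk_overlap items of the previous one.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once the current window reaches the end of the input
        if start + chunk_size >= len(tokens):
            break
    return chunks

# Words stand in for tokens (an assumption made for readability)
tokens = "the quick brown fox jumps over the lazy dog again".split()
for chunk in chunk_tokens(tokens, chunk_size=4, chunk_overlap=1):
    print(chunk)
```

The overlap means a sentence that straddles a chunk boundary still appears at least partly in both neighbouring chunks, which can help retrieval find it.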



Additional data sources, such as PDFs, can also be added to the list of documents before creating the database for retrieval.



# Converting the Dataset

We then convert the split documents into embeddings using `OpenAIEmbeddings` from the langchain library and store them in a Chroma vector store.


In [None]:
os.environ['OPENAI_API_KEY'] = '' # Place your API key here
persist_directory = '' # Place your target directory here
embeddings = OpenAIEmbeddings()
vector_store = Chroma.from_documents(docs, embeddings, persist_directory=persist_directory)
vector_store.persist() # Only need to run this if running the code in a notebook
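
Conceptually, a vector store keeps one embedding per chunk and answers a query by returning the chunks whose embeddings lie closest to the query embedding. The sketch below shows that idea with hand-made three-dimensional vectors and cosine similarity. The vectors, chunk texts, and the `top_k` helper are all hypothetical, and cosine similarity is only one common choice; the metric Chroma actually applies depends on its configuration.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, store, k=2):
    """Return the k texts whose vectors are most similar to query_vec."""
    ranked = sorted(
        store,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]

# Toy "vector store": (chunk text, embedding) pairs with made-up values
store = [
    ("chunk about publications", [0.9, 0.1, 0.0]),
    ("chunk about lab members", [0.1, 0.9, 0.1]),
    ("chunk about contact details", [0.0, 0.2, 0.9]),
]
query_vec = [0.8, 0.2, 0.1]  # stand-in for an embedded query
print(top_k(query_vec, store, k=2))
```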


# Chat Model Mechanics

Next, we set up the chat model mechanics. We start by creating an instance of `ChatOpenAI`. We then define a template for the chat prompt and use it to create a `PromptTemplate` instance. We also set up a question-answering chain with the help of `load_qa_chain`.


In [None]:
llm = ChatOpenAI(temperature=0, streaming=True, model='gpt-3.5-turbo-16k')

template = """You are a bot, named [name of bot], having a conversation with a human.

Background info: [desired background info]

Given the following extracted parts of a long document and a question, create a final answer.

{context}

{chat_history}
Human: {human_input}
[name of bot]:"""

prompt = PromptTemplate(
    input_variables=["chat_history", "human_input", "context"], template=template
)

chain = load_qa_chain(
   llm, chain_type="stuff", prompt=prompt
)
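
A `stuff` chain places all retrieved chunks into a single prompt. The sketch below imitates that with plain string formatting so the final prompt text can be inspected. It uses a shortened copy of the template, and both the `build_prompt` helper and the blank-line separator between chunks are assumptions made for illustration, not langchain's internals.

```python
# Shortened copy of the notebook's template with the same three placeholders
template = """Given the following extracted parts of a long document and a question, create a final answer.

{context}

{chat_history}
Human: {human_input}
[name of bot]:"""

def build_prompt(chunks, chat_history, human_input):
    """Join the retrieved chunks and substitute them into the template."""
    context = "\n\n".join(chunks)  # assumed separator between chunks
    return template.format(
        context=context, chat_history=chat_history, human_input=human_input
    )

# Placeholder chunks stand in for the documents returned by the vector store
print(build_prompt(
    ["First retrieved chunk.", "Second retrieved chunk."],
    "",
    "what are some representative papers from the lab?",
))
```

Printing the assembled prompt this way is a quick check that the placeholders line up with the `input_variables` passed to `PromptTemplate`.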

# Generating Response

Finally, we generate a response from the chat model. We start by setting up a query and getting relevant documents related to the query from the vector store. We then pass these documents along with the query to the question-answering chain to generate a response.


In [None]:
query = 'what are some representative papers from the lab?'
docs = vector_store.similarity_search(query)

# Start with an empty history on the first run; later runs reuse the existing value
try:
    chat_history
except NameError:
    chat_history = ''

resp = chain(
    {"input_documents": docs, "human_input": query, "chat_history": chat_history},
    return_only_outputs=True,
)
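
To carry the conversation into the next query, each exchange can be appended to `chat_history` in the same `Human:` / `[name of bot]:` layout the prompt template uses. The following is a minimal sketch: `append_turn` is a hypothetical helper, the reply text is a placeholder, and reading the answer from an `"output_text"` key is an assumption about the chain's return value that should be checked against your langchain version.

```python
def append_turn(chat_history, human_input, bot_reply, bot_name="[name of bot]"):
    """Return chat_history extended with one Human/bot exchange."""
    return chat_history + f"Human: {human_input}\n{bot_name}: {bot_reply}\n"

# Stand-in for the dict returned by the chain (key name is an assumption)
resp = {"output_text": "(model reply goes here)"}

chat_history = append_turn(
    "",
    "what are some representative papers from the lab?",
    resp["output_text"],
)
print(chat_history)
```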


---

## FastAPI Application for Chat Model

This script sets up a FastAPI application to generate responses for chat queries using a language model. It's designed to be hosted on a cloud platform like Render.

### Key Components

**Imported Libraries**: These include FastAPI for the web application, threading and queue for managing concurrent tasks, and various components from the `langchain` package for setting up the chat model.

**ThreadedGenerator & ChainStreamHandler**: These are classes that manage the generation of responses in a separate thread, allowing responses to be streamed as they are generated.

**llm_thread & chain Functions**: These functions work together to start a new thread for each chat request, generate the response from the language model, and send the response back to the client.

**FastAPI Application**: This is the actual web application. It has a single endpoint, `/chain`, which accepts POST requests. Each request should have a JSON body with a `message` and `chat_history`. The endpoint responds with the generated chat response.

### Deployment

To deploy this application on Render:

1. Create a GitHub repository and add this script, a `requirements.txt` file with the necessary Python packages, and your generated dataset.
2. Set up a new Web Service on Render, linked to your GitHub repository. Render will automatically deploy your app and keep it synced with your repository.

> Note: If your dataset is too large for GitHub or contains sensitive data, add it to your `.gitignore` file and upload it directly to Render.

```python
import os
import threading
import queue
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from fastapi.middleware.cors import CORSMiddleware
from pydantic import BaseModel

from langchain.chat_models import ChatOpenAI
from langchain.callbacks.manager import AsyncCallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.prompts import PromptTemplate
from langchain.chains.question_answering import load_qa_chain

from langchain.vectorstores import Chroma
from langchain.embeddings.openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()
persist_directory = 'data/j30'
vector_store = Chroma(persist_directory=persist_directory, embedding_function=embeddings)

template = """You are a bot, named [name of bot], having a conversation with a human.

Background info: [desired background info]

Given the following extracted parts of a long document and a question, create a final answer.

{context}

{chat_history}
Human: {human_input}
[name of bot]:"""

pp = PromptTemplate(
    input_variables=["chat_history", "human_input", "context"], template=template
)

class ThreadedGenerator:
    def __init__(self):
        self.queue = queue.Queue()

    def __iter__(self):
        return self

    def __next__(self):
        # Block until the worker thread sends a token or the end-of-stream marker
        item = self.queue.get()
        if item is StopIteration:
            raise item
        return item

    def send(self, data):
        self.queue.put(data)

    def close(self):
        self.queue.put(StopIteration)

class ChainStreamHandler(StreamingStdOutCallbackHandler):
    def __init__(self, gen):
        super().__init__()
        self.gen = gen

    def on_llm_new_token(self, token: str, **kwargs):
        self.gen.send(token)

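def llm_thread(g, prompt, chat_history):
    """Run the QA chain in a worker thread, streaming tokens into generator g."""
    try:
        llm = ChatOpenAI(
            temperature=0,
            streaming=True,
            model='gpt-3.5-turbo-16k',
            callback_manager=AsyncCallbackManager([ChainStreamHandler(g)]),
        )
        chain = load_qa_chain(llm, chain_type="stuff", prompt=pp)

        # Retrieve the chunks most similar to the user's message
        docs = vector_store.similarity_search(prompt)
        # Tokens reach the client through ChainStreamHandler, so the return value is unused
        chain({"input_documents": docs, "human_input": prompt, "chat_history": chat_history}, return_only_outputs=True)
    finally:
        # Always signal end-of-stream, even if the chain raised
        g.close()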

def chain(prompt, chat_history):
    g = ThreadedGenerator()
    threading.Thread(target=llm_thread, args=(g, prompt, chat_history)).start()
    return g

class ChainRequest(BaseModel):
    message: str
    chat_history: str

app = FastAPI()

# CORS configuration
origins = ["*"]

app.add_middleware(
    CORSMiddleware,
    allow_origins=origins,
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

@app.post("/chain")
async def _chain(request: ChainRequest):
    gen = chain(request.message, request.chat_history)
    return StreamingResponse(gen, media_type="text/plain")
```
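
The streaming mechanism can be tried without an API key by replacing the language model with a function that emits a fixed list of tokens. The sketch below reuses the `ThreadedGenerator` class from the script; `fake_llm` and its token list are hypothetical stand-ins for `llm_thread` and real model output.

```python
import queue
import threading

class ThreadedGenerator:
    """Iterator fed by another thread through a queue."""

    def __init__(self):
        self.queue = queue.Queue()

    def __iter__(self):
        return self

    def __next__(self):
        item = self.queue.get()  # blocks until the producer sends something
        if item is StopIteration:
            raise item
        return item

    def send(self, data):
        self.queue.put(data)

    def close(self):
        self.queue.put(StopIteration)

def fake_llm(gen, tokens):
    """Stand-in for llm_thread: push tokens one at a time, then close."""
    try:
        for token in tokens:
            gen.send(token)
    finally:
        gen.close()

gen = ThreadedGenerator()
threading.Thread(target=fake_llm, args=(gen, ["Hello", ", ", "world"])).start()

# The consumer sees tokens as soon as the producer thread emits them
streamed = "".join(gen)
print(streamed)
```

Because the consumer blocks on the queue, `close()` has to be called on every code path; this is why the script calls it inside a `finally` block.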

## Deploying the FastAPI Application on Render

Follow the steps below to deploy your FastAPI application on Render:

1. **Create an account on Render**: If you haven't already, head over to [Render.com](https://render.com/) and sign up for a new account.

2. **Create a new web service**: After logging in, click on the "New+" dropdown menu on the dashboard. Select "Web Service" from the dropdown.

3. **Link your GitHub repository**: Connect your GitHub account and select the repository where you saved the FastAPI script as `app.py`. This repository should also contain the `requirements.txt` file and the dataset.

4. **Set the environment**: Make sure the environment is set to Python 3.

5. **Specify the build and start commands**:
    - In the "Build Command" field, enter `pip install -r requirements.txt`.
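    - In the "Start Command" field, enter `uvicorn app:app --host 0.0.0.0 --port $PORT`. The `app:app` part refers to the `app` object inside `app.py`, so adjust it if your script has a different file name.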

6. **Set the environment variables**: Navigate to the "Environment" tab and set the following keys:
    - Key: `PYTHON_VERSION`, Value: `3.10.0`
    - Key: `OPENAI_API_KEY`, Value: [your OpenAI API key]

Once you've entered these settings, click "Create" to create the web service. Render will automatically build and deploy your application, and it will remain synced with your GitHub repository.

> Note: Ensure that your dataset is included in the repository if it's not too large. If it's too large for GitHub, you can upload it directly to Render via the shell after deployment.


## `requirements.txt` File

The `requirements.txt` file lists the Python packages that your project depends on. Render installs them through the build command, `pip install -r requirements.txt`. Here's what the `requirements.txt` file looks like for this project:

```plaintext
fastapi[all]
langchain==0.0.205
openai
chromadb
gunicorn
nest_asyncio
```

## Testing the FastAPI Application

You can test the application by sending POST requests to the `/chain` endpoint. In this example, we're using the `requests` library in Python to send these requests. Note that we're managing the chat history on the client side. This means that we append the chat history to new messages to ensure they remain in context.

Here is an example script to test the application:

```python
import requests

# Initialize the chat message and history
data = {
    "message": "who is rex ma",
    "chat_history": ""
}

# Send the first request to the /chain endpoint of your Render service and update the chat history
response = requests.post("[your render link]", json=data)
data['chat_history'] += 'Human: ' + data['message'] + '\n' + '[name of bot]: ' + response.text + '\n'

# Print the response
print(response.text)
```

The output should be something like:

```
Rex is a research student pursuing a PhD at the lab. His research area focuses on multi-modality integration in healthcare and biology using deep learning. He holds a BASc in Computer Engineering from the University of Toronto.
```

You can continue the conversation by sending more requests:

```python
# Set a new message
data['message'] = 'wait, is he an engineer?'

# Send the second request to the same /chain endpoint and update the chat history
response = requests.post("[your render link]", json=data)
data['chat_history'] += 'Human: ' + data['message'] + '\n' + '[name of bot]: ' + response.text + '\n'

# Print the response
print(response.text)
```

The output should be something like:

```
Yes, Rex Ma has a BASc in Computer Engineering from the University of Toronto.
```

As you can see, the application is able to generate relevant responses based on the chat history.
