# FastChat Demo
This notebook uses the [OpenAI REST API](https://platform.openai.com/docs/api-reference/introduction) to interact with LLMs hosted in a [FastChat](https://github.com/lm-sys/FastChat) deployment.
FastChat only supports chat completion and embeddings API endpoints.
For use on [jupyterhub.sdsu.edu](https://jupyterhub.sdsu.edu), select the image "Stack PRP". The Stack PRP image and FastChat both use version 0.28.1 of the OpenAI Python library (`openai`).

The OpenAI REST API endpoint is available at [https://sdsu-rci-fastchat.nrp-nautilus.io/v1](https://sdsu-rci-fastchat.nrp-nautilus.io/v1).
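Under the hood, the `openai` library sends plain HTTP requests to this endpoint. The sketch below uses only the standard library and builds, but does not send, a chat completion request. It assumes the standard OpenAI REST conventions: a `POST` to `<base_url>/chat/completions`, a bearer token in the `Authorization` header, and a JSON body with `model` and `messages`. The helper name `build_chat_request` is hypothetical.

```python
import json
import urllib.request


def build_chat_request(base_url: str, api_key: str, model: str,
                       messages: list) -> urllib.request.Request:
    """Build (but do not send) a POST request to the chat completions endpoint.

    Hypothetical helper: assumes the server follows the OpenAI REST layout,
    i.e. <base_url>/chat/completions with bearer-token authentication.
    """
    url = base_url.rstrip("/") + "/chat/completions"
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To actually send the request, pass it to `urllib.request.urlopen(...)` and decode the JSON reply with `json.load`. The reply text sits at `["choices"][0]["message"]["content"]`, which is the same path the `openai` library exposes as attributes.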

Your credentials should be stored in a file `env.yaml`. Your instructor will share the API key with you. Your `env.yaml` file should follow the structure of the provided template `env-template.yaml`.

In [1]:
import yaml
import openai
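The cells below use `openai.Model.list()` and `openai.ChatCompletion.create(...)`. Both belong to the pre-1.0 `openai` interface and were removed in `openai` 1.x. The following is a minimal sketch that checks the installed version before running anything else. The helper names are hypothetical, and it only reads package metadata through the standard library.

```python
from importlib.metadata import PackageNotFoundError, version


def uses_legacy_api(version_string: str) -> bool:
    """Return True if an `openai` version string predates the 1.0 rewrite."""
    major = int(version_string.split(".")[0])
    return major < 1


def installed_openai_version():
    """Return the installed `openai` distribution version, or None if absent."""
    try:
        return version("openai")
    except PackageNotFoundError:
        return None


found = installed_openai_version()
if found is None:
    print("openai is not installed")
elif not uses_legacy_api(found):
    # openai.ChatCompletion and openai.Model do not exist in 1.x
    print(f"openai {found} found; this notebook expects 0.28.x (pip install openai==0.28.1)")
else:
    print(f"openai {found} provides openai.ChatCompletion and openai.Model")
```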

## Load Configuration from env.yaml

In [7]:
with open('env.yaml', 'r') as f:
    env = yaml.safe_load(f)

print(env["fastchat"]["base_url"])

https://sdsu-rci-fastchat.nrp-nautilus.io/v1
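`env-template.yaml` is not reproduced here. However, the cells in this notebook read three keys from the loaded dictionary: `fastchat.base_url`, `fastchat.api_key`, and `file_name_path`. The sketch below assumes those are the only required keys and reports any that are missing, so a malformed `env.yaml` fails early with a readable message instead of a `KeyError` later on. `REQUIRED_KEYS` and `missing_keys` are hypothetical names.

```python
# Keys this notebook reads from env.yaml (inferred from the cells; an assumption)
REQUIRED_KEYS = [
    ("fastchat", "base_url"),
    ("fastchat", "api_key"),
    ("file_name_path",),
]


def missing_keys(cfg) -> list:
    """Return dotted names of required keys absent from a loaded config dict."""
    missing = []
    for path in REQUIRED_KEYS:
        node = cfg
        for key in path:
            # yaml.safe_load returns None for an empty file, so check the type too
            if not isinstance(node, dict) or key not in node:
                missing.append(".".join(path))
                break
            node = node[key]
    return missing
```

After the cell above, `missing_keys(env)` should return an empty list.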


## Setup API Credentials

In [31]:
openai.api_key = env["fastchat"]["api_key"]
openai.api_base = env["fastchat"]["base_url"]

# Test config by printing available models
models = openai.Model.list()
print(models)

{
  "object": "list",
  "data": [
    {
      "id": "vicuna-13b-v1.5-16k",
      "object": "model",
      "created": 1707262950,
      "owned_by": "fastchat",
      "root": "vicuna-13b-v1.5-16k",
      "parent": null,
      "permission": [
        {
          "id": "modelperm-kqNnouRJzMd25voUMjzuCw",
          "object": "model_permission",
          "created": 1707262950,
          "allow_create_engine": false,
          "allow_sampling": true,
          "allow_logprobs": true,
          "allow_search_indices": true,
          "allow_view": true,
          "allow_fine_tuning": false,
          "organization": "*",
          "group": null,
          "is_blocking": false
        }
      ]
    },
    {
      "id": "vicuna-33b-v1.3",
      "object": "model",
      "created": 1707262950,
      "owned_by": "fastchat",
      "root": "vicuna-33b-v1.3",
      "parent": null,
      "permission": [
        {
          "id": "modelperm-gvDNsDX7SfMPQbdFfyx9BZ",
          "object": "model_permission
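Further down, the notebook picks a model by position (`models.data[1].id`). That choice silently changes if the server adds, removes, or reorders models. A sketch that selects by id instead is shown below. It assumes that, in `openai` 0.28.x, the object returned by `openai.Model.list()` supports dict-style access with a `"data"` list of entries that each carry an `"id"`, as in the printed output above. The function names are hypothetical.

```python
def model_ids(listing) -> list:
    """Collect the `id` of every entry in a /v1/models response."""
    return [entry["id"] for entry in listing["data"]]


def choose_model(listing, preferred: str) -> str:
    """Return `preferred` if the server lists it; otherwise fail loudly."""
    ids = model_ids(listing)
    if preferred not in ids:
        raise ValueError(f"{preferred!r} is not served; available: {ids}")
    return preferred
```

For example, `model = choose_model(models, "vicuna-33b-v1.3")` could stand in for the index-based lookup.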

## Preprocess paper.txt

In [8]:
text_filename = env['file_name_path']
text_filename

'./paper.txt'

In [16]:
transcript_raw = ""

with open(text_filename, 'r') as f:
    transcript_raw = f.read()

# Calculate and print info about raw file
rawCharCount = len(transcript_raw)
rawWordCount = len(transcript_raw.split())
rawLineCount = len(transcript_raw.split("\n"))

print(f"Raw transcript character count: {rawCharCount}")
print(f"Raw transcript word count: {rawWordCount}")
print(f"Raw transcript line count: {rawLineCount}")


Raw transcript character count: 5130
Raw transcript word count: 697
Raw transcript line count: 7
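`transcript_raw.split("\n")` yields one extra empty string when the text ends with a newline. So if `paper.txt` ends with a newline, the line count above is one higher than the number of non-empty trailing lines. `str.splitlines()` does not add that trailing element and also handles `\r\n` line endings. The small comparison below illustrates this; `count_lines` is a hypothetical helper.

```python
def count_lines(text: str) -> int:
    """Count lines without treating a trailing newline as an extra empty line."""
    return len(text.splitlines())


sample = "first paragraph\nsecond paragraph\n"
print(len(sample.split("\n")))  # 3: split("\n") leaves a trailing ''
print(count_lines(sample))      # 2
```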


In [25]:
# Split the raw text into lines, then rejoin them with spaces so the paper
# becomes one continuous string without fusing words across line breaks
transcript_transform = transcript_raw.split("\n")
transcript_concat = " ".join(transcript_transform)

# The full text is kept here. Models accept a limited number of tokens (their
# context window), so for longer inputs truncate instead, e.g.
# transcript_concat[:N] keeps only the first N characters
final_sentence = transcript_concat[:]
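Slicing by characters can cut a word in half. A sketch of a cleaner alternative, which normalizes all whitespace and then keeps a word budget, follows. Two caveats apply. First, the model's real limit is measured in tokens, not words, so a word budget is only a rough proxy. Second, `max_words` is a hypothetical knob to tune per model, not a documented FastChat setting.

```python
def normalize_whitespace(text: str) -> str:
    """Collapse every run of whitespace (including newlines) into one space."""
    return " ".join(text.split())


def truncate_words(text: str, max_words: int) -> str:
    """Keep at most `max_words` whitespace-separated words (a rough token proxy)."""
    return " ".join(text.split()[:max_words])
```

For example, `final_sentence = truncate_words(transcript_raw, 500)` would send at most 500 words. Here, 500 is an arbitrary illustrative value.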

## Ask the LLM to Perform the Analysis


In [40]:
# Model can be replaced with the model id from the previous call
# "vicuna-33b-v1.3" is the second model
model = models.data[1].id

initial_prompt = (
    "You will be given the introduction to a scientific paper in Artificial Intelligence. "
    "From this introduction: Provide the top 3 items discussed and what the researcher is trying to accomplish."
)

prompt = final_sentence

# Create a chat completion
completion = openai.ChatCompletion.create(
    model=model,
    messages=[
        {"role": "system", "content": initial_prompt},
        {"role": "user", "content": prompt},
    ],
)

# print the completion
print(completion.choices[0].message.content)

The top 3 items discussed in the paper are:

1. Attention mechanisms and their limitations: The paper discusses the limitations of the self-attention mechanism in Transformer models, such as the fixed-length context window and the potential difficulties in capturing long-term dependencies in practice.
2. The need for a more robust and explainable attention mechanism: The paper aims to address these limitations by introducing the Multi-Head Gaussian Adaptive Attention Mechanism (GAAM), which uses a Gaussian-based modulation of input features to improve the attention mechanism's performance and interpretability.
3. The applicability and potential benefits of GAAM: The paper suggests that GAAM can significantly enhance model performance in various domains, such as multimedia recommendation, image classification, and text classification, and offers improved accuracy, robustness, and user experience across diverse and challenging real-world applications.
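The cell above pairs a system message (the instructions) with a user message (the paper text). When you want to repeat the analysis with other prompts or papers, a small wrapper can help. The sketch below takes the `create` function as a parameter, which lets you test it without a server. It assumes the response supports dict-style access at `["choices"][0]["message"]["content"]`, which is how the `openai` 0.28.x response object behaves. `build_messages` and `ask` are hypothetical names.

```python
def build_messages(system_prompt: str, user_text: str) -> list:
    """Pair the instructions (system role) with the input text (user role)."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_text},
    ]


def ask(create, model: str, system_prompt: str, user_text: str, **params) -> str:
    """Send one system+user exchange through `create` and return the reply text.

    `create` is any callable with the ChatCompletion.create keyword convention,
    e.g. openai.ChatCompletion.create from openai 0.28.x. Extra keyword
    arguments (such as temperature) are forwarded unchanged.
    """
    response = create(model=model, messages=build_messages(system_prompt, user_text), **params)
    return response["choices"][0]["message"]["content"]
```

In this notebook, that would be `print(ask(openai.ChatCompletion.create, model, initial_prompt, final_sentence))`. You could add an illustrative `temperature=0.2` to make the answers less varied between runs.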
