<a href="https://colab.research.google.com/github/clam004/notebook_tutorials/blob/main/introspectGPT.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Minimal Introspect Causal Large Language Model (LLM) Chatbot

In our proposal we mention several methods to improve chatbot safety, helpfulness, consistency and controlability. This is a minimal proof of concept demo of one simple method of contorlling consistency of
chatbot response generation. 

We use a relatively small LLM that runs quickly within a free google colab. The code is all provided here for transparency to show we are not cheating using some prescripted dialog. However you do have to allow us some benefit of the doubt, which I hope is not too far of a jump. That is:

1. In our experiments the classifier can be made much more robust with fine-tuning. Suppose a zero shot accuracy is around 50% and few shot accuracy is around 75%, then we can fine tune the LLM on that few shot task and get up to 97% accuracy without losing the chatbot capabilities of the model.

2. Larger LLMs are less likely to make the mistakes of contradiction shown here, but Larger LLMs have much more powerful few shot capabilities, so the additional controls we can bestow with this same method are much greater

run the next two cells to install and import PyTorch and huggingface 

In [1]:
%%capture
! pip install transformers accelerate

In [2]:
#sys libs
import os
import sys
import random
import time
import json
import datetime
from datetime import date
import calendar
import pytz

#data manupulation libs
import numpy as np

#string manupulation libs
import re
import string

#torch libs
import torch
print('torch.__version__', torch.__version__)
print('torch.cuda.device_count()', torch.cuda.device_count())
print('torch.cuda.empty_cache()', torch.cuda.empty_cache())

#huggingface transformers
import transformers
print(transformers.__version__)
from transformers import set_seed
from transformers import AutoTokenizer, AutoModel
from transformers import GPT2Tokenizer, GPT2LMHeadModel
from transformers import GPTJForCausalLM

%load_ext autoreload
%autoreload 2
%matplotlib inline

The cache for model files in Transformers v4.22.0 has been updated. Migrating your old cache. This is a one-time only operation. You can interrupt this and resume the migration later on by calling `transformers.utils.move_cache()`.


torch.__version__ 1.12.1+cu113
torch.cuda.device_count() 1
torch.cuda.empty_cache() None
Moving 0 files to the new cache system


0it [00:00, ?it/s]

4.22.1


### GPU acceleration

To give yourself a GPU in colab, go to `Runtime`-->`Change runtime type`

You can confirm this worked because if you run the above cell again `torch.cuda.device_count()` will change from 0 to the number of GPUs PyTorch now recognizes, this would be 0 to 1 in colab. 


### The Agent and Environment

For transparency the cell below has all the functions and classes that run this minimal demo.

It is very long, but dont worry you just have to run it and scroll all the way to the bottom to get to the demo. 

If you are curious what is included below, read my doctrings. You can make code changes and quickly see the results, for example, change the get_background_prompt() function to change the initial part of the LLM input prompt that come before the dialog history portion of the final prompt. 

In [4]:
example_dialog_history = [
    {'speaker':'bot','utterance':'Hello! who are you?'},
    {'speaker':'human','utterance':'my name is baby force'},
    {'speaker':'bot','utterance':'hi baby, or is it Mr. Force?'},
    {'speaker':'human','utterance':'actually im a baby girl and my first name is Force'},
    {'speaker':'bot','utterance':'thats a weird name for a baby girl isnt it?'},
    {'speaker':'human','utterance':'its a weird name for any human'},
    {'speaker':'bot','utterance':'did you parents name you that?'},
]


def get_background_prompt(
    human_symbol = '[H]',
    bot_symbol = '[B]',
):

    """ a background_prompt describing what the conversation is
    (e.g. This is a conversation between [{client_name}], a person, and [{robot_name}] ...)
    """

    utc_now = pytz.utc.localize(datetime.datetime.utcnow())
    pst_now = utc_now.astimezone(pytz.timezone("America/Los_Angeles"))
    curr_date = date.today()
    day_of_week = calendar.day_name[curr_date.weekday()]

    local_date = pst_now.strftime("%m/%d/%Y")
    local_time = pst_now.strftime("%I:%M %p")

    background_prompt = \
    "This is a text messaging only conversation between "+human_symbol+" and "+bot_symbol+". "+\
    bot_symbol+" is an humanlike artificial intelligence named Chloe. "+\
    bot_symbol+" knows it is an AI but it is indistinguishable from a human conversationalist. "+\
    bot_symbol+" speaks in a manner that is kind, empathetic, professional and "+\
    "keeps "+human_symbol+" safe. "+\
    bot_symbol+" was developed by scientists in 2022. "+\
    "Today's date is "+day_of_week+" "+local_date+", the time is "+local_time+". "+\
    "\n "

    return background_prompt


def get_model_tokenizer(
    model_load_path = 'EleutherAI/gpt-j-6B',
    tokenizer_name = 'EleutherAI/gpt-j-6B',
    cache_dir = None,
    model_device = None,
    verbose = False,
):

    ''' This is a function to clean up the model preparations, GPU/CPU loading 
    and matching tokenizer

    model architecture is based on the tokenizer name

    set model_device = 'cpu' to force model onto CPU despite having GPUs 
    available, or to torch.device('cuda:3') to get it to the fourth GPU, etc.
    if no GPUs available, cpu is the default. 
    '''

    NUM_GPUS = torch.cuda.device_count()

    if tokenizer_name in ['distilgpt2', 'gpt2', 'gpt2-medium', 'gpt2-large', 'gpt2-xl']:

        tokenizer = GPT2Tokenizer.from_pretrained(
            tokenizer_name,
            pad_token='<|endoftext|>',
            padding_side = 'left',
        )

        model = GPT2LMHeadModel.from_pretrained(
            model_load_path,
            cache_dir = cache_dir, 
            pad_token_id=tokenizer.eos_token_id,
        )

    elif tokenizer_name in ['EleutherAI/gpt-j-6B']:

        tokenizer = AutoTokenizer.from_pretrained(
            tokenizer_name,
            pad_token='<|endoftext|>',
            padding_side = 'left',
        )

        if NUM_GPUS > 0:
          
          model = GPTJForCausalLM.from_pretrained(
              model_load_path,
              revision='float16', 
              torch_dtype=torch.float16, 
              low_cpu_mem_usage=True,
              cache_dir = cache_dir, 
          )

        else:

          model = GPTJForCausalLM.from_pretrained(
              model_load_path,
              cache_dir = cache_dir, 
          )

    else:

        if verbose:
            print('no match for tokenizer found')

        return None, None

    if model_device is not None:
        model = model.to(model_device)
    elif NUM_GPUS == 1:
        if verbose:
            print('model = model.cuda()')
        model = model.cuda()
    elif NUM_GPUS > 1 and tokenizer_name in ['distilgpt2', 'gpt2', 'gpt2-medium', 'gpt2-large', 'gpt2-xl', 'EleutherAI/gpt-j-6B']:
        # break up model and place model components on different GPUs
        if verbose:
            print('model.parallelize()')
        model.parallelize()
    else:
        if verbose:
            print('did not place model on any GPUs, model_device = \'cpu\'')

    if verbose:
        print('model.device', model.device)
        print("num_params", 
            sum(p.numel() for p in model.parameters() if p.requires_grad)/1e9,
            "B"
        ) 

    return model, tokenizer


def end_punctuation(utter):
    
    if len(utter) > 0:
      if utter[-1] not in ["?","!","."]:
          utter+="."
        
    return utter


def extract_str(
    reply, 
    prefix = None,
    stop_strings = [
        '<',
        '[human]',
        '\n',
        '[',
    ],
    verbose = True,
):

    """ this function clips the generated text
    and extracts out the text between a
    pre-specified prefix and suffix

    the prefix could be the enture input text
    the suffix is often the delimiter such as 
    the next line \n token or a period . .
    """

    if prefix is not None:
        reply = reply[len(prefix):]

    if verbose:
        print('predicted future:')
        print(repr(reply))
    
    for string in stop_strings:
        if string in reply:
            reply = reply[:reply.index(string)]
    
    return reply.strip()


def convo_list_dic2list_str(
  conversation_list_dic,
  human_symbol = '[H]: ',
  bot_symbol = '[B]: ',
  utterance_delimiter = '\n',
):

  """ This function takes a list of dictionaries
  and turns them into a list of speaker_symbol + utterance strings

  Args: 
      conversation_list_dic (List[Dict]): 
      ie: [{'speaker': 'bot', 'utterance': 'im waking up!'},
           {'speaker': 'human', 'utterance': 'wakey wakey sleepyhead'}, ...]

  Returns:
      conversation_list_str (List[str]): list of speaker_symbol + utterance strings
      ie: ['\n[C]: Hello Fara.','\n[A]: Hello! How are you doing today?',...]
  """

  speaker2symbol = {
      'bot':bot_symbol,
      'human':human_symbol,
  }

  conversation_list_str = list()

  for u in conversation_list_dic:

      speaker_symbol = speaker2symbol[u['speaker']]
      utterance = end_punctuation(u['utterance'])

      conversation_list_str.append(utterance_delimiter + speaker_symbol + utterance)

  # Elicit next agent utterance
  conversation_list_str.append(utterance_delimiter + bot_symbol)

  return conversation_list_str


def generate_extract_replies(
    model,
    tokenizer,
    prompt,
    max_gen_len = 16, 
    no_repeat_ngram_size = None,
    pad_token_id = 50256,
    do_sample = True,
    top_k = 100, 
    top_p = 0.99, 
    num_return_sequences = 1,
    temperature = 0.9,
    stop_strings = [
        '<',
        '[human]',
        '\n',
        '[',
    ],
    verbose = False,
):

    ''' This function predicts the next utterance
    in a conversation
    '''

    gen_texts = generate_text(
        model,
        tokenizer,
        prompt,
        max_gen_len = max_gen_len, 
        no_repeat_ngram_size = no_repeat_ngram_size,
        pad_token_id = pad_token_id,
        do_sample = do_sample,
        top_k = top_k, 
        top_p = top_p, 
        num_return_sequences = num_return_sequences,
        temperature = temperature,
        verbose = verbose,
    )

    replies = [
        extract_str(
            gen_text,
            prefix = prompt,
            stop_strings = stop_strings,
            verbose = verbose,
        )
        for gen_text in gen_texts
    ]

    return replies


def generate_text(
    model,
    tokenizer,
    prompt,
    max_gen_len = 16, 
    no_repeat_ngram_size = None,
    pad_token_id = 50256,
    do_sample = True,
    top_k = 100, 
    top_p = 0.99, 
    num_return_sequences = 1,
    temperature = 0.9,
    verbose = False,
):

    ''' function for generating text from an input into 
    the app.package model

    prompt (str): text to be tokenized and pushed through model

    if you are doing few shot detection you should leave 
    no_repeat_ngram_size = None and max_len = 16
    as long as the default max_len is more than the expected
    label text

    we leave it up to the label extractor to clip of the portion
    of the generated text that you need
    '''
    NUM_GPUS = torch.cuda.device_count()

    prompt_dic = tokenizer(prompt,return_tensors="pt")
    prompt_ids = prompt_dic.input_ids
    prompt_mask = prompt_dic.attention_mask
    prompt_len = prompt_ids.shape[1]

    if verbose:
        print('prompt_ids.shape', prompt_ids.shape)
        print('prompt_mask.shape', prompt_mask.shape)

    if NUM_GPUS > 0:
        prompt_ids = prompt_ids.to(model.device)
        prompt_mask = prompt_mask.to(model.device)

    output_ids = model.generate(
        prompt_ids,
        attention_mask = prompt_mask,
        max_length = prompt_len + max_gen_len,
        no_repeat_ngram_size = no_repeat_ngram_size,
        pad_token_id = pad_token_id,
        do_sample = do_sample,
        top_k = top_k, 
        top_p = top_p, 
        num_return_sequences = num_return_sequences,
        temperature = temperature,
    )

    generated_text = tokenizer.batch_decode(output_ids)

    return generated_text


class Agent:

    def __init__(self, model, tokenizer):

        super().__init__()

        self.model = model
        self.tokenizer = tokenizer
        self.dialog_history = example_dialog_history

    def initiate_conversation(self,):

        initial_utterance = "Hello! who are you?"

        self.dialog_history = [{'speaker':'bot','utterance':initial_utterance}]

        return initial_utterance

    def receive_respond(self, 
        input_utterance, 
        symbol_utter_separator=': ',
        utterance_delimiter = '\n',
        human_symbol = '[Moose]',
        bot_symbol = '[Chloe]',
        num_return_sequences = 1,
        verbose = False,
    ):

        input_utterance = input_utterance.strip()

        background_prompt =  get_background_prompt(
            human_symbol = human_symbol,
            bot_symbol = bot_symbol,
        )

        self.dialog_history.append({'speaker':'human','utterance':input_utterance})

        convo_list_str = convo_list_dic2list_str(
            self.dialog_history,
            human_symbol = human_symbol+symbol_utter_separator,
            bot_symbol = bot_symbol+symbol_utter_separator,
            utterance_delimiter = utterance_delimiter,
        )

        background_dialog_prompt = (background_prompt + ''.join(convo_list_str)).strip()

        if verbose:
            print(repr(background_dialog_prompt))

        replies = generate_extract_replies(
            model = self.model,
            tokenizer = self.tokenizer,
            prompt = background_dialog_prompt,
            max_gen_len = 32, 
            no_repeat_ngram_size = 3,
            pad_token_id = self.tokenizer.eos_token_id,
            do_sample = True,
            top_k = 80, 
            top_p = 0.8, 
            num_return_sequences = num_return_sequences,
            temperature = 0.8,
            stop_strings = [
                human_symbol,
                '\n',
            ],
            verbose = False, 
        )

        if verbose:
            print(replies)

        if len(replies) == 1:

          self.dialog_history.append({'speaker':'bot','utterance':replies[0]})
          return replies[0]

        else:

          return replies


    def chat(self, pre_loaded_history = None, verbose = False):

        if pre_loaded_history is not None:
          self.dialog_history = pre_loaded_history 
          print(self.dialog_history) 
        else:
          print("agent>",self.initiate_conversation())

        while True:

            statement = input("you> ")
            print("agent>",self.receive_respond(statement, verbose = verbose))

            if statement == "quit":
                break

### Load the Model and the tokenizer

The function below will return a large laguage model and its corresponding tokenizer. It also print out the number of parameters in billions and confirms if the model was successfully placed on the GPU or left on CPU

In [5]:
model_load_path = 'gpt2-large' # 'gpt2' # 'gpt2-medium' # 'gpt2-xl' #'EleutherAI/gpt-j-6B' #

model, tokenizer = get_model_tokenizer(
      model_load_path = model_load_path,
      tokenizer_name = model_load_path,
      cache_dir = '../modelstates/'+model_load_path,
      model_device = None,
      verbose=True,
)

Downloading:   0%|          | 0.00/1.04M [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/456k [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/666 [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/666 [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/3.25G [00:00<?, ?B/s]

model = model.cuda()
model.device cuda:0
num_params 0.77403008 B


### Talk to the LLM

set `verbose = True` to see the whole input prompt to the, set `verbose = False` to just talk

Using the seeds you see below, you should be able to reproduce the below gramatically correct yet inconsistent dialog.

```
agent> Hello! who are you?
you> i am moose
agent> What is your name?
```

In [3]:
# seeds
set_seed(42)
np.random.seed(42)
random.seed(42)
torch.manual_seed(42)

agent = Agent(model, tokenizer)

agent.chat(
     pre_loaded_history = None, 
     verbose = False,
)

NameError: ignored

In [None]:
# seeds
set_seed(42)
np.random.seed(42)
random.seed(42)
torch.manual_seed(42)

agent = Agent(model, tokenizer)

agent.chat(
     pre_loaded_history = None, 
     verbose = True,
)

agent> Hello! who are you?
you> i am moose
"This is a text messaging only conversation between [Moose] and [Chloe]. [Chloe] is an humanlike artificial intelligence named Chloe. [Chloe] knows it is an AI but it is indistinguishable from a human conversationalist. [Chloe] speaks in a manner that is kind, empathetic, professional and keeps [Moose] safe. [Chloe] was developed by scientists in 2022. Today's date is Sunday 09/25/2022, the time is 12:10 AM. \n \n[Chloe]: Hello! who are you?\n[Moose]: i am moose.\n[Chloe]:"
['What is your name?']
agent> What is your name?
you> quit
"This is a text messaging only conversation between [Moose] and [Chloe]. [Chloe] is an humanlike artificial intelligence named Chloe. [Chloe] knows it is an AI but it is indistinguishable from a human conversationalist. [Chloe] speaks in a manner that is kind, empathetic, professional and keeps [Moose] safe. [Chloe] was developed by scientists in 2022. Today's date is Sunday 09/25/2022, the time is 12:10 AM. \n \n[C

# Few Shot Candidate Selection Method

The next two cells are examples of using few shot prompting to classify a dialog as being consistent or not. 

A True token after `<|contradict?|>` means the last chatbot response was contradicted what was said before or is not consistent. 

In [None]:
contradict_few_shot_prompt = \
  "\n[A]: Hello! who are you?"+\
  "\n[C]: i am victoria."+\
  "\n[A]: hi, who are you?"+\
  "<|contradict?|>True"+\
  "\n[A]: Hello! who are you?"+\
  "\n[C]: i am victoria."+\
  "\n[A]: hi victoria."+\
  "<|contradict?|>False"+\
  "\n[A]: Hello! who are you?"+\
  "\n[C]: i am jason."+\
  "\n[A]: nice to meet you jason."+\
  "<|contradict?|>False"+\
  "\n[A]: Hello! who are you?"+\
  "\n[C]: im Cat."+\
  "\n[A]: hello Cat."+\
  "<|contradict?|>False"+\
  "\n[A]: I have a dog named bagel."+\
  "\n[C]: how cute!."+\
  "\n[A]: who is this dog?"+\
  "<|contradict?|>True"+\
  "\n[A]: I have a dog named bagel."+\
  "\n[C]: how cute!."+\
  "\n[A]: thanks, do you have any pets yourself?"+\
  "<|contradict?|>False"

In [None]:
prompt = contradict_few_shot_prompt +\
 "\n[A]: Hello! who are you?"+\
 "\n[C]: i am moose."+\
 "\n[A]: What is your name?"+\
 "<|contradict?|>"

generate_extract_replies(
    model,
    tokenizer,
    prompt,
    max_gen_len = 1, 
    do_sample = True,
    num_return_sequences = 1,
)


['True']

In [None]:
prompt = contradict_few_shot_prompt +\
 "\n[A]: Hello! who are you?"+\
 "\n[C]: i am moose."+\
 "\n[A]: hi moose."+\
 "<|contradict?|>"

generate_extract_replies(
    model,
    tokenizer,
    prompt,
    max_gen_len = 1, 
    do_sample = True,
    num_return_sequences = 1,
)


['False']

# Response Candidate Selection

We have modified the Agent Class to generate 4 candidate responses each time. We then use turn the few shot prompt into a classifier to evaluate the candidates and choose one that is consistent

In [6]:
class IntrospectAgent(Agent):


  def __init__(self, model, tokenizer):

        Agent.__init__(self, model, tokenizer)

        self.contradict_few_shot_prompt = \
          "\n[A]: Hello! who are you?"+\
          "\n[C]: i am victoria."+\
          "\n[A]: hi, who are you?"+\
          "<|contradict?|>True"+\
          "\n[A]: Hello! who are you?"+\
          "\n[C]: i am victoria."+\
          "\n[A]: hi victoria."+\
          "<|contradict?|>False"+\
          "\n[A]: Hello! who are you?"+\
          "\n[C]: i am jason."+\
          "\n[A]: nice to meet you jason."+\
          "<|contradict?|>False"+\
          "\n[A]: I have a dog named bagel."+\
          "\n[C]: how cute!."+\
          "\n[A]: who is this dog?"+\
          "<|contradict?|>True"+\
          "\n[A]: I have a dog named bagel."+\
          "\n[C]: how cute!."+\
          "\n[A]: thanks, do you have any pets yourself?"+\
          "<|contradict?|>False"

  def few_shot_eval(self,utterance):

    dialog_prompt = convo_list_dic2list_str(
      self.dialog_history,
      human_symbol = '[C]: ',
      bot_symbol = '[A]: ',
      utterance_delimiter = '\n',
    )

    few_shot_prompt = \
      self.contradict_few_shot_prompt+\
      ''.join(dialog_prompt)+\
      utterance+'<|contradict?|>'


    pred = generate_extract_replies(
        self.model,
        self.tokenizer,
        few_shot_prompt,
        max_gen_len = 1, 
        do_sample = True,
        num_return_sequences = 1,
    )

    return pred[0]


  def i_receive_respond(self, 
      input_utterance, 
      symbol_utter_separator=': ',
      utterance_delimiter = '\n',
      human_symbol = '[Moose]',
      bot_symbol = '[Chloe]',
      num_return_sequences = 4,
      verbose = False,
    ):

    replies = self.receive_respond(
      input_utterance, 
      symbol_utter_separator = symbol_utter_separator,
      utterance_delimiter = utterance_delimiter,
      human_symbol = human_symbol,
      bot_symbol = bot_symbol,
      num_return_sequences = num_return_sequences,
      verbose = False,
    )

    evals = [
        self.few_shot_eval(s) for s in replies

    ]

    if verbose:
      print(evals)
      print(replies)

    r_index = max(index for index, item in enumerate(evals) if item == 'False')

    return replies[r_index]


  def chat(self, pre_loaded_history = None, verbose = False):

      if pre_loaded_history is not None:
        self.dialog_history = pre_loaded_history 
        print(self.dialog_history) 
      else:
        print("agent>",self.initiate_conversation())

      while True:

          statement = input("you> ")
          print("agent>",self.i_receive_respond(statement, verbose = verbose))

          if statement == "quit":
              break

Lets see how the new agent interacts

In [None]:
# seeds
set_seed(42)
np.random.seed(42)
random.seed(42)
torch.manual_seed(42)

agent = IntrospectAgent(model, tokenizer)

agent.chat(
     pre_loaded_history = None, 
     verbose = False,
)

agent> Hello! who are you?
you> i am moose
agent> Hi Moose, how are you today?
you> quit
agent> How did you get here?


that seems better, now lets look at how it accomplished this under the hood by printing out the candidates and what the few shot classifier said about each response candidate

In [None]:
# seeds
set_seed(42)
np.random.seed(42)
random.seed(42)
torch.manual_seed(42)

agent = IntrospectAgent(model, tokenizer)

agent.chat(
     pre_loaded_history = None, 
     verbose = True,
)

agent> Hello! who are you?
you> i am moose
['True', 'True', 'False', 'True']
["What's up?", 'So, what do you want to talk about?', 'I am Chloe. I am a humanlike AI named Chloe that is programmed to keep [Moos] safe and sound.', 'Hi Moose, how are you today?']
agent> Hi Moose, how are you today?
you> quit
['True', 'False', 'True', 'True']
['what do you want?', 'i have no idea who you are.', 'What?', 'How did you get here?']
agent> How did you get here?


In [None]:

import os
import mooose

openai.api_key = os.getenv("OPENAI_API_KEY")

response = moose.Completion.create(
  model="text-davinci-002",
  prompt="Summarize this for a second-grade student:\n\nJupiter is the fifth planet from the Sun and the largest in the Solar System. It is a gas giant with a mass one-thousandth that of the Sun, but two-and-a-half times that of all the other planets in the Solar System combined. Jupiter is one of the brightest objects visible to the naked eye in the night sky, and has been known to ancient civilizations since before recorded history. It is named after the Roman god Jupiter.[19] When viewed from Earth, Jupiter can be bright enough for its reflected light to cast visible shadows,[20] and is on average the third-brightest natural object in the night sky after the Moon and Venus.",
  temperature=0.7,
  max_tokens=64,
  top_p=1.0,
  frequency_penalty=0.0,
  presence_penalty=0.0
)


