# Fiction-based chatbot 2022

check the GPU you're using

In [None]:
!nvidia-smi

Sat Dec 17 15:40:56 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   49C    P0    26W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces
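
If you'd rather check from Python than read the table, here is a minimal sketch. It assumes `nvidia-smi` is on the `PATH` (as it is on Colab GPU runtimes); `gpu_summary` is a hypothetical helper name, not part of this project.

```python
import shutil
import subprocess


def gpu_summary():
    """Return a short 'name, total memory' string per GPU, or None if no GPU is visible."""
    # nvidia-smi is only installed on machines with an NVIDIA driver
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return None
    return result.stdout.strip() or None


print(gpu_summary())
```

If this prints `None`, switch the runtime type to GPU before training.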

mount your Google Drive using the "Mount Drive" button in the Files panel

go to your project folder, which should contain
- `run_clm.py` - the Python script used for training
- `input.txt` - the text you want to train on. If you don't have one on hand, go ahead and use [Shakespeare's plays](https://raw.githubusercontent.com/karpathy/char-rnn/master/data/tinyshakespeare/input.txt)
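
If `input.txt` is missing, something like the following could fetch the Shakespeare file linked above. This is only a sketch: `ensure_training_text` is a hypothetical helper, and it leaves an existing `input.txt` untouched.

```python
import urllib.request
from pathlib import Path

# the tinyshakespeare file linked in the list above
SHAKESPEARE_URL = (
    "https://raw.githubusercontent.com/karpathy/char-rnn/"
    "master/data/tinyshakespeare/input.txt"
)


def ensure_training_text(path="input.txt", url=SHAKESPEARE_URL):
    """Download the training text only if it is not already there."""
    target = Path(path)
    if not target.exists():
        urllib.request.urlretrieve(url, target)
    return target
```

Call `ensure_training_text()` once you are inside your project folder.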

replace the path below with your own version: right-click in the browser and "Copy path" then paste after `%cd `

In [None]:
!pip install transformers datasets evaluate
%cd /content/drive/MyDrive/projects/2022 - chatbot

## Training

Skip this part if you just want to generate using an existing pretrained model.

In [None]:
# run this to see a list of the possible training params
!python run_clm.py --help

run the training. Try different values for `num_train_epochs`, `learning_rate`, `gradient_accumulation_steps`...

In [None]:
!python run_clm.py \
--model_name_or_path "gpt2" \
--train_file input.txt \
--do_train \
--fp16 \
--overwrite_cache \
--output_dir finetuned \
--learning_rate 2e-06 \
--num_train_epochs 20 \
--gradient_accumulation_steps 1 \
--per_device_train_batch_size 2 \
--save_strategy epoch \
--save_steps 20
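
When you experiment with `gradient_accumulation_steps` and `per_device_train_batch_size`, keep in mind that together they set how many samples go into each optimizer update. A small sketch of that arithmetic (`effective_batch_size` is a hypothetical helper; a single GPU is assumed by default):

```python
def effective_batch_size(per_device_train_batch_size,
                         gradient_accumulation_steps,
                         num_devices=1):
    """Number of training samples that contribute to one optimizer update."""
    return per_device_train_batch_size * gradient_accumulation_steps * num_devices


# values from the command above
print(effective_batch_size(2, 1))
# same memory footprint per step, but 8x more samples per update
print(effective_batch_size(2, 8))
```

Raising `gradient_accumulation_steps` is the usual way to get a larger effective batch on a GPU that cannot fit a larger `per_device_train_batch_size`.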

`save_strategy epoch` and `num_train_epochs 20` mean that we train for 20 epochs and save a checkpoint into the `finetuned` dir at the end of each one. `save_steps` only takes effect with `save_strategy steps`, so it is ignored here.
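
The Trainer writes each checkpoint to a subfolder named `checkpoint-<step>`. To see what has been saved so far, a sketch like this lists them in training order (`list_checkpoints` is a hypothetical helper; it assumes the default `checkpoint-<step>` naming):

```python
import re
from pathlib import Path


def list_checkpoints(output_dir="finetuned"):
    """Return checkpoint folders inside output_dir, sorted by step number."""
    root = Path(output_dir)
    if not root.is_dir():
        return []
    found = []
    for path in root.iterdir():
        match = re.fullmatch(r"checkpoint-(\d+)", path.name)
        if path.is_dir() and match:
            found.append((int(match.group(1)), str(path)))
    # sort numerically, so checkpoint-20 comes before checkpoint-100
    return [path for _, path in sorted(found)]


print(list_checkpoints("finetuned"))
```

Any of the returned paths can be tried as `model_name` in the Generate section to compare earlier and later epochs.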

## Generate

set your main params here. Start with one of the standard models and do some generation. Make sure your finetuned model isn't making things worse!

In [None]:
# select the type of architecture: GPT-2 or NEO
model_type = 'gpt2' # 'gpt2' or 'neo'
# choose the exact model name
model_name = 'gpt2' # 'gpt2' or 'EleutherAI/gpt-neo-125M' or one of your finetuned folders: 'finetuned' etc
# number of tokens to generate; the length of the prompt tokens is added to this at generation time
max_length = 100
# number of generated texts
num_generate = 3

now specify your (hidden) prompt, which effectively turns your language model into a chatbot. Prompt engineering is an essential part of designing such a system!

see also:
* https://blog.andrewcantino.com/blog/2021/04/21/prompt-engineering-tips-and-tricks/
* https://generative.ink/posts/methods-of-prompt-programming/
* https://chatbotslife.com/openai-gpt-3-tricks-and-tips-72cf48e233f3

a single Q+A pair is enough to format GPT/NEO as a chatbot. Try experimenting with multiple Q+A pairs. Customise `head` below as you wish; just make sure it ends in `Q: `

*note: the end user WILL NOT SEE this text, it's just part of the model's internal history*

In [None]:
head = """[A chatbot from the future.]

Q: Where are you from?
A: I come from a place far away, one thousand years in the future.
Q: Do aliens exist?
A: Yes, but they live very far away.
Q: """

head

'[A chatbot from the future.]\n\nQ: Where are you from?\nA: I come from a place far away, one thousand years in the future.\nQ: Do aliens exist?\nA: Yes, but they live very far away.\nQ: '
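
If you want to try several Q+A pairs without retyping the whole string, the same `head` can be assembled from a list. A minimal sketch (`build_head` is a hypothetical helper, not part of the notebook):

```python
def build_head(description, qa_pairs):
    """Build a hidden prompt: a description, example Q+A pairs, and a trailing 'Q: '."""
    lines = [description, ""]
    for question, answer in qa_pairs:
        lines.append("Q: " + question)
        lines.append("A: " + answer)
    # the prompt must end in "Q: " so the user's question slots straight in
    return "\n".join(lines) + "\nQ: "


head = build_head(
    "[A chatbot from the future.]",
    [
        ("Where are you from?",
         "I come from a place far away, one thousand years in the future."),
        ("Do aliens exist?", "Yes, but they live very far away."),
    ],
)
```

This produces the same string as the hand-written `head` above, so you can add or remove pairs and compare how the chatbot's tone changes.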

do you want the chatbot to remember Q+A exchanges while it's running?

*note: if `num_generate` is greater than 1, only the last generated answer is remembered.*

In [None]:
remember_QA = True
show_log = True # this is what the user will see, scrolling on the screen
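
With `remember_QA` on, the prompt grows with every exchange, and GPT-2 can only attend to 1024 tokens. One way to keep it bounded is to pass along only the most recent exchanges. The sketch below is an assumption about how you might do that, not what the cells below do: `build_prompt` and `max_exchanges` are hypothetical names, and it relies on every question and answer being a single line.

```python
def build_prompt(head, log, remember=True, max_exchanges=5):
    """Combine the hidden head with recent history; `head` must end in 'Q: '."""
    lines = log.splitlines()
    if remember:
        # each past exchange is a Q line plus an A line; the newest Q has no answer yet
        lines = lines[-(2 * max_exchanges + 1):]
    else:
        # no memory: only the newest question
        lines = lines[-1:]
    history = "\n".join(lines) + "\n"
    # `head` already ends in "Q: ", so drop that prefix from the first kept line
    if history.startswith("Q: "):
        history = history[len("Q: "):]
    return head + history


log = "Q: hi\nA: hello\nQ: what is love?\n"
print(build_prompt("[A chatbot.]\n\nQ: ", log, max_exchanges=1))
```

A smaller `max_exchanges` gives the bot a shorter memory but leaves more room for the generated answer.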

run the next cells to start chatting!

*note: The model may attempt to continue both sides of the conversation, so our `PROCESSED ANSWER` only retains the first generated `A:` line.*
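
The trimming described in the note can be sketched as a standalone function, which makes it easy to test on sample strings before chatting. `extract_answer` is a hypothetical name; the steps mirror the postprocessing cell further down.

```python
import re


def extract_answer(generated):
    """Keep only the first answer line of a generated continuation."""
    # collapse any chunk of 4+ characters that is immediately repeated
    answer = re.sub(r"(.{4,}?)\1+", r"\1", generated)
    # cut everything from the point where the model starts a new question
    answer = answer.split("Q:")[0]
    # keep the first line only
    return answer.split("\n")[0]


print(extract_answer("A: Yes.\nQ: Why?\nA: Because."))
```

The regex only catches back-to-back repetition, so looping text with small variations will still get through.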

1. load your model

In [None]:
# https://github.com/Xirider/finetune-gpt2xl
# credit to Niels Rogge - https://github.com/huggingface/transformers/issues/10704

import torch
device = 'cuda' if torch.cuda.is_available() else 'cpu'

if model_type == 'gpt2':
  from transformers import GPT2Tokenizer, GPT2LMHeadModel
  tokenizer = GPT2Tokenizer.from_pretrained(model_name)
  tokenizer.padding_side = "left"
  tokenizer.pad_token = tokenizer.eos_token
  model = GPT2LMHeadModel.from_pretrained(model_name, pad_token_id=tokenizer.eos_token_id).to(device)
elif model_type == 'neo':
  from transformers import GPTNeoForCausalLM, AutoTokenizer
  tokenizer = AutoTokenizer.from_pretrained(model_name)
  model = GPTNeoForCausalLM.from_pretrained(model_name, pad_token_id=tokenizer.eos_token_id)
  # use half precision only on the GPU; it is poorly supported on CPU
  if device == 'cuda':
    model = model.half()
  model = model.to(device)
else:
  raise ValueError("model_type must be 'gpt2' or 'neo', got " + repr(model_type))

print("model loaded:", model_type, "/", model_name)

2. generate your text(s)

In [None]:
log = ""
while True:
  prompt = ""
  while len(prompt) == 0:
    prompt = input("Q: ")
    prompt = str(prompt)
  log = log + "Q: " + prompt + "\n"
  
  # `head` already ends in "Q: ", so skip the duplicate prefix on the first question
  if remember_QA:
    text = head + log[len("Q: "):]
  else:
    text = head + prompt + "\n"

  if model_type == 'gpt2':
    encoding = tokenizer(text, padding=True, return_tensors='pt').to(device)
    # total length = prompt tokens + `max_length` new tokens; a separate variable
    # keeps `max_length` from growing on every turn of the loop
    total_length = max_length + encoding.input_ids.shape[1]
    with torch.no_grad():
        generated_ids = model.generate(
            **encoding,
            num_return_sequences=num_generate,
            do_sample=True,
            max_length=total_length,
            top_k=50,
            top_p=0.95,
            use_cache=True
          )
    generated_texts = tokenizer.batch_decode(
        generated_ids, skip_special_tokens=True)

  if model_type == 'neo':
      ids = tokenizer(text, return_tensors="pt").input_ids.to(device)
      # same idea: prompt tokens + `max_length` new tokens
      total_length = max_length + ids.shape[1]
      gen_tokens = model.generate(
          ids,
          num_return_sequences=num_generate,
          do_sample=True,
          max_length=total_length,
          temperature=0.7,
          #top_k=50, 
          #top_p=0.95,
          use_cache=True
      )
      generated_texts = tokenizer.batch_decode(
          gen_tokens, skip_special_tokens=True)
  
  print(generated_texts)

3. postprocess the `generated_texts` (to be shown to the user)

In [None]:
  import re

  for counter, generated_text in enumerate(generated_texts, start=1):
    # drop the prompt so only the newly generated continuation remains
    generated_text = generated_text.split(text)[1]

    answer = generated_text
    # split on all punctuation, take just first after `A:`
    # https://bobbyhadz.com/blog/python-split-string-on-punctuation
    #answer = re.split( r'()[.!?]', answer)[0]
    # collapse immediately repeated chunks of 4+ characters
    # https://stackoverflow.com/questions/40736948/regex-string-repetition-of-min-length
    answer = re.sub(r"(.{4,}?)\1+", r"\1", answer)
    # keep only the first answer: stop at the next question or line break
    answer = answer.split("Q:")[0]
    answer = answer.split("\n")[0]
    #answer = answer + '.'

    print("===", counter, "===\n", generated_text)
    print("PROCESSED ANSWER:\n", answer)

  # only the last processed answer is added to the conversation log
  log = log + answer + "\n"
  if show_log:
    print("LOG (USER SEES THIS):\n", log)


model loaded: neo / neo-combinedplus
Q: what is love?
=== 1 ===
 A: Love is a feeling that a person has developed, a longing for a dream that is to come. It is a longing for the universe and that is why they have found you. And I found you. So I am here. I am happy. So I am here. I am here. I am here. I am here. I am here. I am here. I am here. I am here. I am here. I am here. I am here. I am here
PROCESSED ANSWER:
 A: Love is a feeling that a person has developed, a longing for a dream that is to come. It is a longing for the universe and that is why they have found you. And I found you. So I am here. I am happy. So I am here. I am here
=== 2 ===
 A: And if you want to know, I can give you what I think you can.
Q: How do you think I should be?
A: I think it's really very beautiful. I think it's a magic trick. I think it's very, very easy for me to fool people, and people are very good at it. But I'm afraid of it. You know, I can't stand for it. I'm afraid of it. I'm afraid of it.
PROC

KeyboardInterrupt: ignored

When you're happy with your results, save your `finetuned` folder to a safe place and use it locally!