# Week 8: Generative AI Chatbots

This week you will learn how to generate text with large language models (LLMs).

This code uses the excellent [huggingface transformers library](https://huggingface.co/docs/transformers/en/index), which provides many LLMs and other transformer-based models.

Today you will be using [huggingface's SmolLM](https://huggingface.co/blog/smollm), an LLM with a small parameter count. This means it needs little disk space and memory, and can run efficiently on low-powered consumer hardware such as laptops. SmolLM was trained on [a curated dataset of educational and synthetically generated text](https://huggingface.co/datasets/HuggingFaceTB/smollm-corpus) designed specifically for conversational instruct-based LLMs.

Before you get started, though, let's make sure that this notebook is set up to run using the `nlp` conda environment that you created at the start of term.

To set this notebook to the right environment, click the **Select kernel** button in the top right corner of this notebook, then select **Python Environments...** and then select the environment `nlp`.

To double check you have done this correctly, hit the run cell button (▶) on the cell below:

In [None]:
import os
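# Print the name of the active conda environment (this should be `nlp`)
print(os.environ.get('CONDA_DEFAULT_ENV', 'no conda environment detected'))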

Now you can import the libraries you will need:

(This cell also sets [environment variables](https://docs.python.org/3/using/cmdline.html#environment-variables) and a warnings filter so that the huggingface transformers library does not print unwanted messages to the terminal and disrupt the flow of conversation from the chatbot.)

In [None]:
import os
import re
import warnings
warnings.filterwarnings('ignore')
os.environ['TRANSFORMERS_VERBOSITY'] = 'error'
os.environ['TOKENIZERS_PARALLELISM'] = 'False'

from transformers import AutoModelForCausalLM, AutoTokenizer

## Comparing different types of LLM

In this notebook you will see code to run two different versions of the SmolLM model.

The first is the [standard LLM](#standard-llm) model that predicts the next token in a sequence without being conditioned on prompt instructions. 

The second is the [Instruct-LLM](#instruct-llm) model that has been fine-tuned on an instruct dataset. This allows it to be given prompt instructions that it responds to appropriately.

### Standard LLM

This cell loads in the standard [SmolLM-135M](https://huggingface.co/HuggingFaceTB/SmolLM-135M) model weights and generates text based on a prompt. This standard LLM is useful for completing text based on a prompt, for instance finishing a story or song lyrics (these types of models are also used in email autocomplete and code co-pilot autocomplete).

In [None]:
checkpoint = "HuggingFaceTB/SmolLM-135M"

device = "cpu"  # run on the CPU
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

input_str = "There once was a man from Nantucket"

# Load the model weights, then convert the prompt into a tensor of token ids
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)
inputs = tokenizer.encode(input_str, return_tensors="pt").to(device)
# Sample up to 50 new tokens that continue the prompt
outputs = model.generate(inputs, max_new_tokens=50, temperature=0.5, top_p=0.99, min_p=0.1, do_sample=True)
# Convert the token ids (prompt plus continuation) back into text
print(tokenizer.decode(outputs[0]))

Try changing the `input_str` to get generations based on different inputs. Try changing the parameters `temperature`, `top_p`, `min_p` and `max_new_tokens` to alter the randomness and length of the generated output.
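To build some intuition for what these parameters do, here is a minimal sketch in plain Python of how `temperature`, `min_p` and `top_p` could reshape a next-token probability distribution. The toy vocabulary, the scores and all function names are made up for illustration. The transformers library works on tensors and applies its filters in its own order, so treat this as an approximation of the idea, not as the library's implementation.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; a lower temperature sharpens them."""
    scaled = [x / temperature for x in logits]
    biggest = max(scaled)
    exps = [math.exp(x - biggest) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def filter_min_p(probs, min_p):
    """Zero out tokens whose probability is below min_p * (top probability)."""
    threshold = min_p * max(probs)
    return [p if p >= threshold else 0.0 for p in probs]

def filter_top_p(probs, top_p):
    """Keep the smallest set of most-likely tokens whose total mass reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [0.0] * len(probs), 0.0
    for i in order:
        kept[i] = probs[i]
        mass += probs[i]
        if mass >= top_p:
            break
    return kept

def renormalise(probs):
    """Rescale the surviving probabilities so they sum to 1 again."""
    total = sum(probs)
    return [p / total for p in probs]

# Hypothetical toy vocabulary and scores for the next token
vocab = ["who", "that", "and", "banana"]
logits = [2.0, 1.5, 0.5, -3.0]

for temperature in (1.0, 0.5):
    probs = softmax(logits, temperature)
    probs = renormalise(filter_top_p(filter_min_p(probs, min_p=0.1), top_p=0.99))
    print(temperature, {w: round(p, 3) for w, p in zip(vocab, probs)})
```

Lowering the temperature moves probability towards the most likely token, while `min_p` and `top_p` remove unlikely tokens from consideration before one is sampled.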

Try comparing the results from the same prompt in this model with those from the [Instruct-LLM](#instruct-llm) model.

### Instruct LLM

This cell loads in the [SmolLM-135M-Instruct](https://huggingface.co/HuggingFaceTB/SmolLM-135M-Instruct) model weights and generates text based on a prompt. This instruct LLM has been fine-tuned on an instruct dataset, which conditions the model to answer questions and perform tasks based on prompts.

In [None]:
checkpoint = "HuggingFaceTB/SmolLM-135M-Instruct"

device = "cpu"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)

user_prompt_str = "How old is Taylor Swift?"
system_prompt_str = "You are a friendly chatbot that answers questions in single sentences."

messages = [
    {"role": "system", 
     "content": system_prompt_str,
    },
    {"role": "user", 
     "content": user_prompt_str
    }]

# Format the messages into a single string using the model's chat template
input_text = tokenizer.apply_chat_template(messages, tokenize=False)

inputs = tokenizer.encode(input_text, return_tensors="pt").to(device)
input_token_len = inputs.shape[-1]
outputs = model.generate(inputs, max_new_tokens=100, temperature=0.5, top_p=0.99, min_p=0.1, do_sample=True)
# Decode only the newly generated tokens, skipping the prompt tokens
out_str = tokenizer.decode(outputs[0][input_token_len:])
print(out_str)

Run the following cell to remove the instruct formatting from the generated output:

In [None]:
out_str = re.sub(r'(<\|im_start\|>assistant\n)|(<\|im_end\|>)','',out_str)
print(out_str)
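It can help to see what that instruct formatting looks like. The sketch below hand-builds a ChatML-style string of the kind the chat template is expected to produce for this model. The exact template is defined by the tokenizer, so the helper `to_chatml` is a hypothetical stand-in and not the library's code. The sketch then applies the same regular expression as the cell above to a made-up model output.

```python
import re

def to_chatml(messages):
    """Hypothetical helper: wrap each message in ChatML-style role markers."""
    return "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )

messages = [
    {"role": "system", "content": "You are a friendly chatbot."},
    {"role": "user", "content": "How old is Taylor Swift?"},
]
print(to_chatml(messages))

# Made-up example of raw generated text, including the special markers
raw_output = "<|im_start|>assistant\nI am not sure.<|im_end|>"
clean = re.sub(r'(<\|im_start\|>assistant\n)|(<\|im_end\|>)', '', raw_output)
print(clean)
```

Printing `input_text` in the notebook is a quick way to check the real formatted prompt against this approximation.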

Now try changing the `user_prompt_str` to get generations based on different inputs. To alter the behaviour of the model when generating the output try altering the `system_prompt_str`.

Try changing the parameters `temperature`, `top_p`, `min_p` and `max_new_tokens` to alter the randomness and length of the generated output.

How do the results from this model compare to the same prompt in the [Standard LLM](#standard-llm) model?

### Adapting LLMs for different kinds of chatbot

Go to the file [week-8b-instruct-LLM-chatbot.py](week-8b-instruct-LLM-chatbot.py). It contains all of the code for a simple wrapper around the [Instruct-LLM](#instruct-llm) chatbot.

There are a number of tasks you could do to build on this code. Try these in any order, and if you think you could use this code in your chatbot for your project, experiment with customising it to best fit with your project.

**Task A:** Try changing the system prompt and other hyper-parameters to get the chatbot to behave in a different way, with either a different personality or functionality.

**Task B:** Feed local data or data from the web into the prompt (this could be in either the user prompt or the system prompt) to make a basic retrieval-augmented-generation (RAG) chatbot.
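As a starting point for Task B, here is a minimal sketch of one way to do this, assuming a tiny in-memory list of documents and a simple word-overlap retriever. The example documents and the helper names `tokenise`, `retrieve` and `build_rag_messages` are hypothetical, and a real RAG system would normally use a stronger retrieval method. The resulting `messages` list has the same shape as the one passed to the chat template earlier in this notebook.

```python
import re

# Hypothetical local "knowledge base"; in practice this could be loaded from files
documents = [
    "The library opens at 9am and closes at 8pm on weekdays.",
    "The canteen serves vegetarian food every day.",
    "Printing costs 5p per page in black and white.",
]

def tokenise(text):
    """Lowercase the text and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query, docs, k=1):
    """Return the k documents sharing the most words with the query."""
    query_words = tokenise(query)
    ranked = sorted(docs, key=lambda d: len(query_words & tokenise(d)), reverse=True)
    return ranked[:k]

def build_rag_messages(user_prompt_str, docs):
    """Put the retrieved text into the system prompt ahead of the question."""
    context = "\n".join(retrieve(user_prompt_str, docs))
    system_prompt_str = (
        "You are a friendly chatbot. Answer using only this information:\n" + context
    )
    return [
        {"role": "system", "content": system_prompt_str},
        {"role": "user", "content": user_prompt_str},
    ]

messages = build_rag_messages("When does the library close?", documents)
print(messages[0]["content"])
```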

**Task C:** Write scaffolding around the code to use the instruct LLM for a specific task. This could be:
- Using an LLM as a partner in an interactive game
- Using the LLM to write parts of a story 
- Using an LLM to summarise text
- Using an LLM to make suggestions based on a user input
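One possible shape for this kind of scaffolding is sketched below, assuming each task is selected by swapping in a different system prompt. The preset prompts and the names `TASK_PROMPTS`, `make_task_messages`, `run_task` and `placeholder_generate` are all hypothetical. The placeholder function stands in for the real model call so that the sketch runs without downloading any weights.

```python
# Hypothetical task presets: each maps a task name to a system prompt
TASK_PROMPTS = {
    "summarise": "Summarise the user's text in one short sentence.",
    "story": "Continue the user's story with one more paragraph.",
    "suggest": "Suggest three ideas based on the user's input.",
}

def make_task_messages(task, text):
    """Build the chat messages for a named task."""
    if task not in TASK_PROMPTS:
        raise ValueError(f"Unknown task: {task}")
    return [
        {"role": "system", "content": TASK_PROMPTS[task]},
        {"role": "user", "content": text},
    ]

def run_task(task, text, generate_fn):
    """Run a task using any function that maps messages to a reply string."""
    return generate_fn(make_task_messages(task, text))

def placeholder_generate(messages):
    """Stand-in for the real model call so the sketch runs without downloads."""
    return f"[model reply to: {messages[-1]['content'][:30]}]"

print(run_task("summarise", "Large language models predict the next token.", placeholder_generate))
```

To use the real model, `placeholder_generate` could be replaced by a function that applies the chat template, calls `model.generate` and decodes the new tokens, as in the Instruct LLM cell.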

**Task D:** Integrate this LLM into another chatbot. Can you combine this with a retrieval-based chatbot, an intent-based chatbot, or a rule-based chatbot that only uses the LLM at specific times?
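For the rule-based option, a minimal sketch might look like the following, assuming a short list of regular-expression rules that are checked first, with the LLM used only as a fallback. The rules, the canned replies and the `placeholder_llm` function are hypothetical, and the placeholder stands in for a call to the instruct model.

```python
import re

# Hypothetical hand-written rules: (pattern, canned reply)
RULES = [
    (re.compile(r"\b(hello|hi|hey)\b", re.IGNORECASE), "Hello! How can I help?"),
    (re.compile(r"\b(bye|goodbye)\b", re.IGNORECASE), "Goodbye!"),
]

def placeholder_llm(user_input):
    """Stand-in for a call to the instruct model."""
    return f"(LLM answer to '{user_input}')"

def respond(user_input, llm_fn=placeholder_llm):
    """Use a rule if one matches; otherwise fall back to the LLM."""
    for pattern, reply in RULES:
        if pattern.search(user_input):
            return reply
    return llm_fn(user_input)

print(respond("Hi there"))
print(respond("What is a transformer?"))
```

Putting the rules first keeps predictable replies under your control and reserves the slower, less predictable model for open-ended inputs.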

**Task E:** Create a chatbot that uses a [standard LLM model](#standard-llm) instead of an instruct LLM. Think about what kinds of tasks a standard LLM might be better at.
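Because a standard LLM only continues text, one common approach is to write the conversation as a script and let the model complete the bot's next line. The sketch below assumes that approach. The speaker labels, the framing sentence and the helper names `build_transcript_prompt` and `extract_reply` are hypothetical, and the `completion` string is a made-up example of what a model might generate.

```python
def build_transcript_prompt(history, user_input):
    """Format the conversation as a script that ends where the bot should speak."""
    lines = ["The following is a conversation between a User and a helpful Bot.", ""]
    for speaker, text in history:
        lines.append(f"{speaker}: {text}")
    lines.append(f"User: {user_input}")
    lines.append("Bot:")
    return "\n".join(lines)

def extract_reply(completion):
    """Keep only the bot's turn: cut where the model starts writing the next User line."""
    reply = completion.split("\nUser:")[0]
    return reply.strip()

history = [("User", "Hello"), ("Bot", "Hi! What would you like to talk about?")]
prompt = build_transcript_prompt(history, "Tell me a joke.")
print(prompt)

# Made-up example of what a base model might append to the prompt
completion = " Why did the chicken cross the road?\nUser: I don't know.\nBot: To get"
print(extract_reply(completion))
```

The `prompt` string would take the place of `input_str` in the Standard LLM cell, and `extract_reply` would be applied to the newly generated text only.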