# Walkthrough 2: Interacting with LLMs programmatically

If you completed Walkthrough 1, you will know that a key risk of using public services such as ChatGPT is data privacy. There we showed that we can run Open Source LLMs locally using initiatives such as Mozilla LlamaFile (https://github.com/Mozilla-Ocho/llamafile). However, this is not the only way to access and use Open Source LLMs, and this walkthrough shows you another.

**Note:** We can also access closed source LLMs such as GPT-3.5 and GPT-4 (which power ChatGPT). While these can be private and generate good output, there is a cost associated with using them.

Additionally, while you may have been using platforms such as ChatGPT or Copilot in a conversational manner where you have a sequence of interactions with the model and the model remembers (up to a point) the previous interactions, this is not the only way to interact with LLMs.


In this walkthrough we will use an Open Source LLM from HuggingFace and the HuggingFace Library to programmatically interact with a model.

>HuggingFace (https://huggingface.co) is an amazing platform and community that focuses on building models and datasets for Machine Learning. They also provide a Python Library that simplifies many of the tasks we need to perform to load and interact with models.

HuggingFace provides a set of pre-trained models that we can either run locally, if we want to protect our data, or use as hosted services. For today's walkthrough we will run a model locally, since we are mostly addressing data privacy.


However, another interesting aspect of being able to interact with our models programmatically is that we can start to build more interesting AI-based solutions by chaining different models together and enabling our solutions to access other sources of data, and even other tools, rather than trying to express everything as a prompt.

The HuggingFace library is a very popular choice for building such applications, so if you want to dive more deeply into this I would recommend the free HuggingFace NLP Course (https://huggingface.co/learn/nlp-course).

I would also recommend investigating the LangChain library (https://www.langchain.com/langchain) for building AI Agents and complex AI based applications.

# Related Resources:

| Link                                                       | Description                                                                                                                                                |
|------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------|
| https://www.datacamp.com/blog/top-open-source-llms         | Introductory article on using Open Source LLMs                                                                                                             |
| https://huggingface.co/models                              | Link to the HuggingFace Pre-trained Models Hub                                                                                                             |
| https://huggingface.co/learn/nlp-course                    | A free course provided by HuggingFace that gets you up to speed with Natural Language Processing using the HuggingFace library (requires Python knowledge) |
| https://python.langchain.com/docs/get_started/introduction | Free tutorial provided by LangChain (requires Python knowledge)                                                                                            |


## Using LLMs programmatically using HuggingFace Models

During the challenge we used LLMs by typing in our prompts, and we enhanced our prompts through various Prompt Engineering methods. Writing prompts into a chat-style interface is certainly one way of interacting with an LLM; however, it is not the only way.

Imagine you've crafted a prompt to assess the quality of a defect report. Perhaps it scores the defect report in terms of clarity and completeness and, if the report is lacking, the LLM outputs a set of questions for the defect author.

This might be a useful addition to your defect handling workflow but, let's face it, if you need to type (or copy) the prompt and the defect description into a chat interface for each defect, it will quickly become tedious.

Instead, we can programmatically extract any new defects and, for each one, call an LLM to evaluate the defect report. We can achieve this in a straightforward manner using an LLM from HuggingFace and a bit of Python code.
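
To make the defect-review idea concrete, here is a minimal sketch of such a loop. The prompt wording, the function names (`build_prompt`, `review_defects`) and the stand-in `fake_llm` function are all assumptions made for illustration; in a real workflow `ask_llm` would be a function that calls an actual model.

```python
from typing import Callable

# Hypothetical prompt template; the wording is an illustration only.
PROMPT_TEMPLATE = (
    "Score the following defect report from 1 to 5 for clarity and completeness. "
    "If information is missing, list questions for the author.\n\n"
    "Defect report:\n{report}"
)

def build_prompt(report: str) -> str:
    """Insert one defect report into the prompt template."""
    return PROMPT_TEMPLATE.format(report=report.strip())

def review_defects(reports: list[str], ask_llm: Callable[[str], str]) -> list[str]:
    """Call the LLM once per defect report and collect the responses."""
    return [ask_llm(build_prompt(report)) for report in reports]

# Stand-in for a real model call so the sketch runs without a GPU.
def fake_llm(prompt: str) -> str:
    return f"(model response to a {len(prompt)}-character prompt)"

new_defects = [
    "Login button does nothing.",
    "App crashes on checkout when the basket holds more than 10 items.",
]
for review in review_defects(new_defects, fake_llm):
    print(review)
```

Passing the model call in as a function keeps the looping logic separate from the choice of model, so the same loop could work with a local or a hosted LLM.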

> For this walkthrough, you do not need to write any code. The code presented here is complete and designed to show how we could integrate AI models. If you want to learn more about the coding aspect of integrating AI models then I would suggest you start with the HuggingFace NLP Course.


# Let's Get Started

This walkthrough has the following elements:

1. Install some required Dependencies
2. Download and configure the Open Source LLM model
3. Start interacting with the model


**IMPORTANT**

The LLM we are running needs a GPU so, before we run any code cells, we need to switch to a GPU runtime.

> You can do this by selecting "Runtime -> Change Runtime Type" from the Colab menu.
>
> Then select the "T4 GPU" option and click on "Save"

This will give you access to a small GPU for a period of time.
Do this now before continuing with the walkthrough.

Notes:
* Occasionally Colab will not have any available GPUs for you to use; if this happens you will need to try again later.
* The name of the Runtime may be different in different regions so just pick one that has "GPU" in the label.
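
If you would like to confirm that the runtime switch worked, a check along the following lines is one option. This is a sketch that assumes PyTorch is available (it usually is in Colab); the helper name `gpu_available` is made up for this example.

```python
import importlib.util

def gpu_available() -> bool:
    """Return True only if PyTorch is installed and can see a CUDA GPU."""
    if importlib.util.find_spec("torch") is None:
        return False  # PyTorch is not installed, so we cannot check for a GPU
    import torch
    return torch.cuda.is_available()

if gpu_available():
    print("GPU detected: ready to continue.")
else:
    print("No GPU detected: check 'Runtime -> Change Runtime Type'.")
```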

# 1. Install Dependencies
First we need to install some dependencies.
Run the following code cell to install them into your Colab environment.

This can take a few minutes to complete.

In [ ]:
!pip install transformers "bitsandbytes>=0.39.0" accelerate
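
Once the install has finished, you can optionally confirm which versions were installed. The following is a small sketch using only the Python standard library; the helper name `installed_versions` is an assumption for this example.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

def installed_versions(packages: list[str]) -> dict[str, Optional[str]]:
    """Map each package name to its installed version, or None if it is missing."""
    found: dict[str, Optional[str]] = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found

for name, ver in installed_versions(["transformers", "bitsandbytes", "accelerate"]).items():
    print(f"{name}: {ver or 'NOT INSTALLED'}")
```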

# 2. Download and configure the Open Source LLM 

Next we will use the HuggingFace Transformers library to load the pre-trained, Open Source `Mistral-7B` model.

This model has 7 billion parameters, which may sound large, but GPT-3.5 (which powers ChatGPT) is estimated to have approximately 20 billion parameters (OpenAI has not published an official figure), so it is quite small really. However, it is about the largest model we can use within Colab.

The HuggingFace Transformers library makes loading a model very easy: it takes just one line of code.

Running the following cell will download the model; this could take a few minutes to complete.

In [ ]:
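from transformers import AutoModelForCausalLM, BitsAndBytesConfig
model_name = "mistralai/Mistral-7B-v0.1"

# Load the weights in 4-bit precision (via bitsandbytes) to reduce GPU memory use
quantization = BitsAndBytesConfig(load_in_4bit=True)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", quantization_config=quantization)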

If you remember, when you created your prompts earlier in the challenge you used fairly natural language to specify what you wanted.

However, machines don't really understand text, they only deal with numbers so we need to perform a task called Tokenization, which converts the text we type into a numerical form that the model can process. Moreover, the method we use to tokenize our text needs to be one that the model understands.

Tokenization is an involved process but again, HuggingFace makes this really easy for us: with just a few lines of code we can tokenize sentences.

In [ ]:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(model_name, padding_side="left")


To understand what the Tokenizer does, you can run the following cell with some text to see what the model receives as input

In [ ]:
sample_input = "I've nearly completed the Ministry of Testing 30 Days of AI in Testing challenge"
sample_model_input = tokenizer([sample_input], return_tensors="pt")

print(sample_model_input)

As you can see from the above output, our text gets converted into a sequence of numbers (the `input_ids`), one for each token.
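
To build some intuition for what is happening, here is a deliberately simplified toy tokenizer. It is an illustration only and not how the Mistral tokenizer works: real tokenizers split text into subword pieces using a vocabulary learned from data, whereas this sketch simply gives each whole word an ID. The function names are made up for this example.

```python
# Toy word-level tokenizer: an illustration only, NOT the Mistral tokenizer.

def build_vocab(texts: list[str]) -> dict[str, int]:
    """Give every distinct lowercase word an integer ID, in order of first appearance."""
    vocab: dict[str, int] = {"<unk>": 0}  # ID 0 is reserved for unknown words
    for text in texts:
        for word in text.lower().split():
            vocab.setdefault(word, len(vocab))
    return vocab

def encode(text: str, vocab: dict[str, int]) -> list[int]:
    """Convert text into a list of token IDs."""
    return [vocab.get(word, vocab["<unk>"]) for word in text.lower().split()]

def decode(ids: list[int], vocab: dict[str, int]) -> str:
    """Convert token IDs back into text."""
    id_to_word = {i: w for w, i in vocab.items()}
    return " ".join(id_to_word[i] for i in ids)

vocab = build_vocab(["AI in Testing", "30 Days of AI in Testing"])
ids = encode("30 Days of Testing", vocab)
print(ids)                 # [4, 5, 6, 3]
print(decode(ids, vocab))  # 30 days of testing
```

The key point carries over to the real tokenizer: the same vocabulary must be used for encoding and decoding, which is why the tokenizer is loaded using the same `model_name` as the model.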

# 3. Start interacting with the model
To interact with the LLM model we need to provide our prompt and call the *generate()* method for the model.
The model will output a *tokenized* version of the output that we then need to decode back into text. 

Yet again, HuggingFace makes this a straightforward task. For convenience we've wrapped all the HuggingFace code for generation into the following function.

In [ ]:
def get_model_response(model, tokenizer, model_prompt) -> str:
    """
    Tokenize model_prompt, call the model's generate() method, then decode
    the generated token IDs back into text.
    """
    # Tokenize the prompt that was passed in and move the tensors to the GPU
    tokens = tokenizer([model_prompt], return_tensors="pt").to("cuda")
    # Generate the response as a sequence of token IDs
    generated_ids = model.generate(**tokens)
    # Decode the token IDs back into text; [0] selects the response for our single prompt
    response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]

    return response
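
Conceptually, text generation works by repeatedly predicting the next token and appending it to the sequence. The toy sketch below illustrates that loop with a hard-coded lookup table standing in for the model; the table, the `<end>` marker and the function name `toy_generate` are assumptions for illustration, and a real LLM computes its predictions with a neural network.

```python
# Toy "model": a lookup table giving the next token for each token.
NEXT_TOKEN = {"testing": "is", "is": "fun", "fun": "<end>"}

def toy_generate(prompt_tokens: list[str], max_new_tokens: int = 5) -> list[str]:
    """Greedy generation: repeatedly append the predicted next token."""
    tokens = list(prompt_tokens)
    if not tokens:
        return tokens  # nothing to continue from
    for _ in range(max_new_tokens):
        next_token = NEXT_TOKEN.get(tokens[-1], "<end>")
        if next_token == "<end>":
            break  # stop when the "model" signals the end of the sequence
        tokens.append(next_token)
    return tokens

print(toy_generate(["software", "testing"]))  # ['software', 'testing', 'is', 'fun']
```

Note that the result contains the prompt tokens followed by the new ones. The real `generate()` method also accepts a `max_new_tokens` argument, which you can use to control how long the response is allowed to be.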

Now let's create a prompt and get a response from our local model.

The following cell contains a `prompt_text` and you can replace the text between the quotation marks with your own prompt.

You can also change the text and re-run the cell as many times as you want.

In [ ]:
prompt_text = "What are the key risks associated with using AI for decision making?"

print(get_model_response(model, tokenizer, prompt_text))
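
With decoder-only models such as Mistral-7B, the decoded output typically begins with the prompt itself, followed by the newly generated text. If you only want the new text, a small helper such as the following sketch can remove the echoed prompt. The helper name `strip_prompt` and the example strings are assumptions for illustration, and the decoded text may not always match the prompt character for character, in which case the response is returned unchanged.

```python
def strip_prompt(response: str, prompt: str) -> str:
    """Return only the newly generated text if the response starts with the prompt."""
    if response.startswith(prompt):
        return response[len(prompt):].strip()
    return response.strip()  # prompt not echoed verbatim: leave the text as it is

# Hypothetical decoded output, used here so the sketch runs without the model.
example_prompt = "What are the key risks?"
example_response = "What are the key risks? Bias and lack of transparency."
print(strip_prompt(example_response, example_prompt))  # Bias and lack of transparency.
```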

And that is it. Using libraries such as HuggingFace can simplify the building of AI-empowered tools.
The simple and versatile programming library along with a large (and ever-growing) pre-trained model hub is a powerful combination.

If you want to learn more about building using HuggingFace, I would recommend the HuggingFace NLP Course (https://huggingface.co/learn/nlp-course)


# Questions for Reflection
This walkthrough covered a basic setup for programmatically interacting with a local LLM. When working with models in this way, we have a lot of scope for adding bespoke logic, chaining interactions and integrating external tool use into our AI-empowered tooling. From an AI in Testing perspective, this can open up a world beyond prompt engineering and allow teams to innovate on how they use AI in Testing.
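
As a small illustration of what chaining interactions can look like, the sketch below feeds the response from one call into the prompt for the next. The prompts, the function name `summarise_then_suggest` and the stand-in `recording_llm` are assumptions for this example; in the notebook, `ask_llm` could instead be a function that calls `get_model_response(model, tokenizer, prompt)`.

```python
from typing import Callable

LLM = Callable[[str], str]  # any function that maps a prompt to a response

def summarise_then_suggest(defect_report: str, ask_llm: LLM) -> dict[str, str]:
    """Chain two calls: the first response becomes part of the second prompt."""
    summary = ask_llm(f"Summarise this defect report in one sentence:\n{defect_report}")
    tests = ask_llm(f"Suggest regression tests for this defect:\n{summary}")
    return {"summary": summary, "tests": tests}

# Stand-in model that records each prompt it receives, so the sketch runs anywhere.
calls: list[str] = []

def recording_llm(prompt: str) -> str:
    calls.append(prompt)
    return f"response {len(calls)}"

result = summarise_then_suggest("Checkout crashes with more than 10 items.", recording_llm)
print(result)
```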

Before closing this workbook, reflect on the following questions:

1. How might this type of approach be used to address other concerns raised about using AI in Testing?
2. How might your team use this type of approach to build AI empowered applications or integrate LLMs into existing test tooling you use?