# Open Source LLM Prompt Engineering with Falcon-7b-instruct
* Notebook by Adam Lang
* Date: 3/1/2024
* We will demonstrate prompt engineering using an open source model from Hugging Face, Falcon-7B-Instruct.
* Falcon-7B-Instruct is a 7B-parameter causal decoder-only model built by TII, based on Falcon-7B and fine-tuned on a mixture of chat/instruct datasets. It is available under the Apache 2.0 license.
* Hugging Face model card: https://huggingface.co/tiiuae/falcon-7b-instruct


### Install Libraries from huggingface
* Transformers
* Accelerate
    * Accelerate is a library that lets the same PyTorch code run across any distributed configuration with only a few added lines (its documentation says four), which simplifies training and inference at scale.
    * docs: https://huggingface.co/docs/accelerate/en/index

* einops
    * "einops stands for Einstein-Inspired Notation for operations. TLDR of the library would be, einops makes matrix ops more comprehensible and intuitive."
    * blogpost about this library: https://medium.com/ml-summaries/einops-making-tensor-ops-easy-in-deep-learning-236d2a1dc631

* bitsandbytes
    * From the repo: "lightweight Python wrapper around CUDA custom functions, in particular 8-bit optimizers, matrix multiplication (LLM.int8()), and 8 & 4-bit quantization functions."
    * Docs page: https://huggingface.co/docs/bitsandbytes/main/en/index
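
To see why 8-bit and 4-bit quantization matter for a model of this size, a back-of-the-envelope estimate of weight memory is parameters × bytes per parameter. The sketch below assumes a round 7 billion parameters (the exact count differs slightly) and counts weights only, ignoring activations, the KV cache, and quantization overhead. The name `weight_memory_gb` is illustrative, not part of any library.

```python
# Rough weight-memory estimate: parameters x bytes per parameter.
# Assumption: a round 7e9 parameters; weights only, no activations or overhead.
BYTES_PER_PARAM = {"float32": 4.0, "bfloat16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(n_params: float, dtype: str) -> float:
    """Return the approximate weight size in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in BYTES_PER_PARAM:
    print(f"{dtype:>8}: {weight_memory_gb(7e9, dtype):.1f} GB")
```

The bfloat16 estimate of about 14 GB is roughly in line with the two checkpoint shards this notebook downloads later (9.95 GB and 4.48 GB).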

In [3]:
!pip install -q transformers einops accelerate langchain bitsandbytes


### Log in with a Hugging Face access token

In [4]:
# log in to the Hugging Face Hub with an access token
from huggingface_hub import login
login("<insert huggingface token here>")

Token will not been saved to git credential helper. Pass `add_to_git_credential=True` if you want to set the git credential as well.
Token is valid (permission: write).
Your token has been saved to /root/.cache/huggingface/token
Login successful
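
Pasting a token into a cell makes it easy to leak when the notebook is shared. One alternative, sketched below, reads it from an environment variable instead. This assumes the token has been exported as `HF_TOKEN` beforehand; the helper names `resolve_hf_token` and `login_from_env` are hypothetical.

```python
import os
from typing import Mapping, Optional


def resolve_hf_token(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the token stored in HF_TOKEN, or None if it is unset or blank."""
    token = env.get("HF_TOKEN", "").strip()
    return token or None


def login_from_env(env: Mapping[str, str] = os.environ) -> bool:
    """Log in if a token is available; return True when a login was attempted."""
    token = resolve_hf_token(env)
    if token is None:
        print("HF_TOKEN is not set; public models can still be downloaded anonymously.")
        return False
    # Imported lazily so the helper can be defined where huggingface_hub is absent.
    from huggingface_hub import login
    login(token=token)
    return True
```

Calling `login_from_env()` in place of the hard-coded `login(...)` call keeps the secret out of the notebook file.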


# Load Falcon-7B model from huggingface

In [5]:
from langchain import HuggingFacePipeline
from transformers import AutoTokenizer, pipeline
import torch

# specify model to use
model = "tiiuae/falcon-7b-instruct"

In [6]:
# load pre-trained tokenizer
tokenizer = AutoTokenizer.from_pretrained(model)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


tokenizer_config.json:   0%|          | 0.00/287 [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/2.73M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/281 [00:00<?, ?B/s]

In [7]:
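# set up the pipeline for text generation
# note: this assignment rebinds the imported `pipeline` function, so re-import it before re-running this cell
pipeline = pipeline(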
    "text-generation", #task
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
    max_length=200,
    do_sample=True,
    top_k=10,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id
)


config.json:   0%|          | 0.00/1.05k [00:00<?, ?B/s]

configuration_falcon.py:   0%|          | 0.00/7.16k [00:00<?, ?B/s]

A new version of the following files was downloaded from https://huggingface.co/tiiuae/falcon-7b-instruct:
- configuration_falcon.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.



modeling_falcon.py:   0%|          | 0.00/56.9k [00:00<?, ?B/s]

A new version of the following files was downloaded from https://huggingface.co/tiiuae/falcon-7b-instruct:
- modeling_falcon.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.


pytorch_model.bin.index.json:   0%|          | 0.00/16.9k [00:00<?, ?B/s]

Downloading shards:   0%|          | 0/2 [00:00<?, ?it/s]

pytorch_model-00001-of-00002.bin:   0%|          | 0.00/9.95G [00:00<?, ?B/s]

pytorch_model-00002-of-00002.bin:   0%|          | 0.00/4.48G [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]



generation_config.json:   0%|          | 0.00/117 [00:00<?, ?B/s]
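
The sampling arguments passed to the pipeline above (`do_sample=True`, `top_k=10`) mean that each decoding step keeps only the 10 highest-scoring candidate tokens and then samples among them. The toy sketch below illustrates the idea in pure Python with made-up logits and a smaller `k`. It is not how `transformers` implements it, and `sample_top_k` is a hypothetical helper.

```python
import math
import random


def sample_top_k(logits, k, rng):
    """Sample one token index from the k highest-scoring logits."""
    # Keep the indices of the k largest logits.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax weights over the kept logits only (subtract the max for stability).
    peak = max(logits[i] for i in top)
    weights = [math.exp(logits[i] - peak) for i in top]
    # Draw one index in proportion to its weight.
    return rng.choices(top, weights=weights, k=1)[0]


rng = random.Random(0)
toy_logits = [2.0, 1.0, 0.5, -1.0, -3.0]  # hypothetical scores for a 5-token vocabulary
draws = [sample_top_k(toy_logits, k=2, rng=rng) for _ in range(1000)]
print(sorted(set(draws)))  # only indices 0 and 1 can appear when k=2
```

A smaller `top_k` makes output more predictable, while a larger one admits less likely tokens and more varied text.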

# Define function that accepts prompt and returns response

In [8]:
from langchain import PromptTemplate, LLMChain

# define prompt function
def get_response(question):
  template = """
  You are an intelligent chatbot. Help the following question with brilliant answers.
  Question: {question}
  Answer:"""
  prompt = PromptTemplate(template=template, input_variables=["question"])
  llm = HuggingFacePipeline(pipeline = pipeline, model_kwargs = {'temperature':0})
  llm_chain = LLMChain(prompt=prompt, llm=llm)

  # run the chain and print the model's answer (the function itself returns None)
  print(llm_chain.run(question))
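
Conceptually, the template step only substitutes the question into a fixed string before the text reaches the model. A minimal sketch of that substitution with plain `str.format` follows. It approximates what the prompt template does rather than reproducing LangChain's implementation; `build_prompt` and the sample question are illustrative.

```python
# Same wording as the template used in get_response, minus the indentation.
TEMPLATE = """
You are an intelligent chatbot. Help the following question with brilliant answers.
Question: {question}
Answer:"""


def build_prompt(question: str) -> str:
    """Fill the {question} placeholder and return the full prompt text."""
    return TEMPLATE.format(question=question)


print(build_prompt("What is the capital of Vermont?"))
```

Printing the filled-in prompt like this is a quick way to inspect exactly what the model will see when iterating on wording.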

# Prompt Templates
* We can reuse the same LangChain prompt template for any question by passing a different `question` string to `get_response`.

In [15]:
question = "What is the tallest mountain in Vermont and what is its elevation in feet?"


In [16]:
# get_response prints the answer itself; wrapping it in print() would also print its return value, None
get_response(question)

Setting `pad_token_id` to `eos_token_id`:11 for open-end generation.


 The tallest mountain in Vermont is Mount Mansfield and its elevation is 4,346 feet (1,325 meters).
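
The answer above reports the elevation in both feet and meters, so a quick sanity check is to confirm that the two figures agree. The sketch below checks internal consistency only. It does not establish that the elevation itself is right, which needs an authoritative source. `feet_to_meters` is an illustrative helper.

```python
FEET_TO_METERS = 0.3048  # exact by definition of the international foot


def feet_to_meters(feet: float) -> float:
    """Convert a length in feet to meters."""
    return feet * FEET_TO_METERS


# 4,346 ft is the figure from the model's answer; it rounds to 1325 m.
print(round(feet_to_meters(4346)))
```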
