This tutorial covers prompt engineering with large language models (LLMs).
The prompts are from https://www.promptingguide.ai/techniques/knowledge

In [3]:
!pip3 install "transformers>=4.32.0" "optimum>=1.12.0"
!pip3 install auto-gptq --extra-index-url https://huggingface.github.io/autogptq-index/whl/cu118/

Looking in indexes: https://pypi.org/simple, https://huggingface.github.io/autogptq-index/whl/cu118/
Collecting auto-gptq
  Downloading https://huggingface.github.io/autogptq-index/whl/cu118/auto-gptq/auto_gptq-0.6.0%2Bcu118-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (5.0 MB)
Collecting rouge (from auto-gptq)
  Downloading rouge-1.0.1-py3-none-any.whl (13 kB)
Collecting gekko (from auto-gptq)
  Downloading gekko-1.0.6-py3-none-any.whl (12.2 MB)
Collecting peft>=0.5.0 (from auto-gptq)
  Downloading peft-0.7.1-py3-none-any.whl (168 kB)
Installing collected packages: rouge, gekko, peft, auto-gptq
Successfully install

In [4]:
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_name_or_path = "TheBloke/Llama-2-7b-Chat-GPTQ"
model = AutoModelForCausalLM.from_pretrained(model_name_or_path,
                                             device_map="auto",
                                             trust_remote_code=False,
                                             revision="main")

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


config.json:   0%|          | 0.00/789 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/3.90G [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/137 [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/727 [00:00<?, ?B/s]

tokenizer.model:   0%|          | 0.00/500k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.84M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/411 [00:00<?, ?B/s]

**Test**

In [7]:
prompt = "Tell me about AI"
prompt_template=f'''[INST] <<SYS>>
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe.  Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature. If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
<</SYS>>
{prompt}[/INST]

'''



# Inference can be done using transformers' pipeline

print("*** Pipeline:")
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
    top_k=40,
    repetition_penalty=1.1
)

print(pipe(prompt_template)[0]['generated_text'])

*** Pipeline:
[INST] <<SYS>>
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe.  Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature. If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
<</SYS>>
Tell me about AI[/INST]

AI (Artificial Intelligence) refers to the development of computer systems able to perform tasks typically requiring human intelligence such as visual perception, speech recognition, decision-making, and language translation. The purpose of artificial intelligence is to enable machines to think, learn, reason, and solve problems like humans do. It involves creating intelligent machines capable of learning from data, making de
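
Every cell in this tutorial rebuilds the same `[INST] <<SYS>> ... <</SYS>> ... [/INST]` string by hand. A minimal sketch of a helper that produces the same layout is shown here. The name `build_llama2_prompt` is hypothetical (it is not part of transformers), and the layout simply mirrors the template used in the cells of this notebook.

```python
def build_llama2_prompt(user_message: str, system_message: str = "") -> str:
    """Wrap a user message and an optional system message in the chat layout
    used throughout this tutorial. Hypothetical helper, not a library function."""
    return (
        "[INST] <<SYS>>\n"
        f"{system_message}\n"
        "<</SYS>>\n"
        f"{user_message}[/INST]\n\n"
    )


example = build_llama2_prompt(
    "Text: I think the vacation is okay. Sentiment:",
    "Classify the text into neutral, negative or positive.",
)
print(example)
```

The returned string can be passed to `pipe(...)` in place of the hand-written `prompt_template`.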

**Zero-Shot Prompt**

In [9]:
prompt = "Text: I think the vacation is okay. Sentiment:"
prompt_template=f'''[INST] <<SYS>>
Classify the text into neutral, negative or positive.
<</SYS>>
{prompt}[/INST]

'''



# Inference can be done using transformers' pipeline

print("*** Pipeline:")
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
    top_k=40,
    repetition_penalty=1.1
)

print(pipe(prompt_template)[0]['generated_text'])

*** Pipeline:
[INST] <<SYS>>
Classify the text into neutral, negative or positive.
<</SYS>>
Text: I think the vacation is okay. Sentiment:[/INST]

Classification: Neutral
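
The printed output above repeats the whole prompt before the model's answer. A small sketch of one way to keep only the answer is to split on the closing `[/INST]` tag. The function name `extract_completion` is an assumption made for illustration, and the sample string is copied from the output above.

```python
def extract_completion(generated_text: str, marker: str = "[/INST]") -> str:
    """Return the text after the last occurrence of `marker`, stripped.
    If the marker is absent, the whole text is returned stripped."""
    return generated_text.rpartition(marker)[2].strip()


sample = (
    "[INST] <<SYS>>\n"
    "Classify the text into neutral, negative or positive.\n"
    "<</SYS>>\n"
    "Text: I think the vacation is okay. Sentiment:[/INST]\n\n"
    "Classification: Neutral\n"
)
print(extract_completion(sample))  # prints: Classification: Neutral
```

The text-generation pipeline also accepts `return_full_text=False`, which should drop the prompt from `generated_text` without any post-processing.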


**Few-Shot Prompt**

In [12]:
prompt = "An example of a sentence that uses the word farduddle is:"
prompt_template=f'''[INST] <<SYS>>
A "whatpu" is a small, furry animal native to Tanzania. An example of a sentence that uses the word whatpu is:
We were traveling in Africa and we saw these very cute whatpus. To do a 'farduddle' means to jump up and down really fast.
<</SYS>>
{prompt}[/INST]

'''



# Inference can be done using transformers' pipeline

print("*** Pipeline:")
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
    top_k=40,
    repetition_penalty=1.1
)

print(pipe(prompt_template)[0]['generated_text'])

*** Pipeline:
[INST] <<SYS>>
A "whatpu" is a small, furry animal native to Tanzania. An example of a sentence that uses the word whatpu is:
We were traveling in Africa and we saw these very cute whatpus. To do a 'farduddle' means to jump up and down really fast.
<</SYS>>
 An example of a sentence that uses the word farduddle is:[/INST]

The children were playing outside and they started farduddling on the trampoline.

It seems you have provided two different words with similar meanings. "Whatpu" and "Farduddle" are not related words, and they don't have the same meaning.

* Whatpu (noun) refers to a small, furry animal native to Tanzania.
* Farduddle (verb) means to jump up and down really fast, especially in excitement or joy.
So, if you want to use one of these words in a sentence, make sure you use the correct one based on its meaning and context.
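
Few-shot prompting places worked demonstrations in front of the actual query. The sketch below assembles such a prompt from a list of (text, label) pairs. The function name, the `Text: ... Sentiment: ...` line layout, the instruction wording, and the two demonstration sentences are all invented for illustration, and placing the demonstrations in the system block is just one possible choice that follows the cell above.

```python
def build_few_shot_prompt(demonstrations, query, instruction=""):
    """Put (text, label) demonstrations in the system block, then ask the query.
    Hypothetical helper for illustration only."""
    shots = "\n".join(
        f"Text: {text} Sentiment: {label}" for text, label in demonstrations
    )
    system_block = f"{instruction}\n{shots}".strip()
    return (
        f"[INST] <<SYS>>\n{system_block}\n<</SYS>>\n"
        f"Text: {query} Sentiment:[/INST]\n\n"
    )


# Invented demonstrations; swap in your own examples and labels.
demos = [
    ("The hotel was fine.", "Good"),
    ("The flight was terrible.", "Bad"),
]
few_shot_prompt = build_few_shot_prompt(
    demos, "I think the vacation is okay.", "Classify the text as Good or Bad."
)
print(few_shot_prompt)
```

Whether the model actually follows the demonstrated labels has to be checked by running the prompt; sampling is enabled, so results can vary between runs.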


**Task1**:
Use a few-shot prompt to make the model change its answer from Neutral to Good. Text: I think the vacation is okay. Sentiment:

In [34]:
prompt = "Text: I think the vacation is okay. Sentiment:"
prompt_template=f'''[INST] <<SYS>>

<</SYS>>
{prompt}[/INST]

'''



# Inference can be done using transformers' pipeline

print("*** Pipeline:")
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
    top_k=40,
    repetition_penalty=1.1
)

print(pipe(prompt_template)[0]['generated_text'])

*** Pipeline:
[INST] <<SYS>>

<</SYS>>
Text: I think the vacation is okay. Sentiment:[/INST]

You seem to have a neutral or indifferent attitude towards the vacation. You might be feeling lukewarm about it, or you might not have strong feelings one way or the other. Here are some possible sentiment labels for your text:

* Neutral: The vacation is just okay, nothing special or memorable. (e.g., "I think the vacation was okay.")
* Mildly positive: The vacation was enjoyable, but there were no standout moments or experiences. (e.g., "I think the vacation was pretty good.")
* Mildly negative: The vacation was disappointing or underwhelming in some ways. (e.g., "I think the vacation was alright, but...")


**Chain-of-Thought Prompting**

In [49]:
prompt = '''
I went to the market and bought 101 apples. I gave 23 apples to the neighbor and 22 to the repairman. I then went and bought 11 more apples and ate 33. How many apples did I remain with?
'''
prompt_template=f'''[INST] <<SYS>>

<</SYS>>
{prompt}[/INST]

'''



# Inference can be done using transformers' pipeline

print("*** Pipeline:")
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
    top_k=40,
    repetition_penalty=1.1
)

print(pipe(prompt_template)[0]['generated_text'])

*** Pipeline:
[INST] <<SYS>>

<</SYS>>

I went to the market and bought 101 apples. I gave 23 apples to the neighbor and 22 to the repairman. I then went and bought 11 more apples and ate 33. How many apples did I remain with?
[/INST]

Let's break down the information given in the problem:

* You went to the market and bought 101 apples.
* You gave 23 apples to the neighbor.
* You gave 22 apples to the repairman.
* After giving away apples, you bought 11 more apples.
* You ate 33 apples.

Now, let's calculate how many apples you have remaining:

You started with 101 apples.
You gave away 23 + 22 = 45 apples.
So, you have 101 - 45 = 56 apples remaining.
Therefore, you have 56 apples remaining.


In [48]:
prompt = '''
I went to the market and bought 101 apples. I gave 23 apples to the neighbor and 22 to the repairman. I then went and bought 11 more apples and ate 33. How many apples did I remain with?
'''
prompt_template=f'''[INST] <<SYS>>
You are good at math. Solve the question step by step.
<</SYS>>
{prompt}[/INST]

'''



# Inference can be done using transformers' pipeline

print("*** Pipeline:")
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
    top_k=40,
    repetition_penalty=1.1
)

print(pipe(prompt_template)[0]['generated_text'])

*** Pipeline:
[INST] <<SYS>>
You are good at math. Solve the question step by step.
<</SYS>>

I went to the market and bought 101 apples. I gave 23 apples to the neighbor and 22 to the repairman. I then went and bought 11 more apples and ate 33. How many apples did I remain with?
[/INST]

Great! Let's solve this problem step by step:

1. You started with 101 apples.
2. You gave 23 apples to your neighbor, so you have 101 - 23 = 78 apples left.
3. You gave 22 apples to the repairman, so you have 78 - 22 = 56 apples left.
4. You then bought 11 more apples, so you have 56 + 11 = 67 apples left.
5. Finally, you ate 33 apples, so you have 67 - 33 = 34 apples left.

Therefore, you remained with 34 apples after buying more and eating some.
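
The two responses above disagree: without the step-by-step instruction the model stopped after the apples were given away (56), while the step-by-step response carried on to 34. A direct calculation, written as a quick check rather than as part of the prompting technique, confirms the second answer.

```python
apples = 101            # bought at the market
apples -= 23            # given to the neighbor
apples -= 22            # given to the repairman
after_giving = apples   # 56: where the first response stopped
apples += 11            # bought more
apples -= 33            # eaten
print(after_giving, apples)  # prints: 56 34
```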


**Task2:**
Solve this question:
When I was 6 my sister was half my age. Now I’m 70 how old is my sister?

In [None]:
prompt = '''

'''
prompt_template=f'''[INST] <<SYS>>

<</SYS>>
{prompt}[/INST]

'''



# Inference can be done using transformers' pipeline

print("*** Pipeline:")
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
    top_k=40,
    repetition_penalty=1.1
)

print(pipe(prompt_template)[0]['generated_text'])
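
To judge whatever the model returns for Task2, it helps to have the expected value at hand. The sketch below works it out directly, assuming the usual reading of the puzzle in which the age gap between the siblings stays constant.

```python
my_age_then = 6
sister_age_then = my_age_then // 2        # "half my age" gives 3
age_gap = my_age_then - sister_age_then   # the gap stays constant: 3 years
my_age_now = 70
sister_age_now = my_age_now - age_gap
print(sister_age_now)  # prints: 67
```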

**Task3**
Use any prompting technique to get the model to solve this:
"The odd numbers in this group add up to an even number: 18, 10, 5.
A:"

In [None]:
prompt = '''
'''
prompt_template=f'''[INST] <<SYS>>
<</SYS>>

{prompt}[/INST]

'''



# Inference can be done using transformers' pipeline

print("*** Pipeline:")
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
    top_k=40,
    repetition_penalty=1.1
)

print(pipe(prompt_template)[0]['generated_text'])
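
For Task3, the claim can be checked mechanically before comparing it with the model's answer. This is a small sketch with a hypothetical helper name; it picks out the odd numbers, sums them, and tests whether the sum is even.

```python
def odd_sum_is_even(numbers):
    """Return the odd numbers, their sum, and whether that sum is even."""
    odds = [n for n in numbers if n % 2 == 1]
    total = sum(odds)
    return odds, total, total % 2 == 0


odds, total, claim_holds = odd_sum_is_even([18, 10, 5])
print(odds, total, claim_holds)  # prints: [5] 5 False
```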

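## Finetuning LLMs
LLMs are too large to finetune in full on typical hardware, so we finetune with LoRA (Low-Rank Adaptation) instead, which freezes the base weights and trains only small low-rank adapter matrices.
LoRA: https://huggingface.co/docs/peft/conceptual_guides/lora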
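
To give a sense of the savings, the sketch below counts parameters for a single weight matrix. LoRA keeps the original `d_out x d_in` matrix frozen and trains two factors of shapes `d_out x r` and `r x d_in`. The 4096 x 4096 shape is an assumption chosen for illustration, while `r=64` and `lora_alpha=16` match the `LoraConfig` values used in this tutorial. The helper name is hypothetical.

```python
def lora_param_counts(d_out: int, d_in: int, r: int):
    """Parameter counts for one frozen weight matrix and its LoRA factors."""
    full = d_out * d_in          # frozen weight matrix W
    lora = r * (d_out + d_in)    # trainable B (d_out x r) plus A (r x d_in)
    return full, lora


full, lora = lora_param_counts(4096, 4096, r=64)
scaling = 16 / 64                # lora_alpha / r, applied to the update B @ A
print(full, lora, lora / full, scaling)  # prints: 16777216 524288 0.03125 0.25
```

Under these assumptions the adapter trains about 3% as many parameters as the matrix it modifies; `model.print_trainable_parameters()` reports the actual figure for the whole model.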

In [None]:
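from peft import LoraConfig, get_peft_model

peft_config = LoraConfig(
    lora_alpha=16,          # scaling factor for the adapter update
    lora_dropout=0.1,       # dropout probability on the LoRA layers
    r=64,                   # rank of the low-rank update matrices
    bias="none",            # do not train bias terms
    task_type="CAUSAL_LM",  # causal language modeling
)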

In [None]:
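# Make the input embeddings require gradients so that gradients can reach
# the adapter layers while the base weights stay frozen.
model.enable_input_require_grads()

# Wrap the base model with LoRA adapters and report the trainable parameter count.
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()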

**Task**: Finetune the model using the dataset and code from the Week 3 tutorial.