# Code 12-1-2 如何在 Google Colaboratory 運行 Llama3

## Code 12-1-2-1 安裝套件

In [1]:
%%capture
!pip install llama-cpp-python

## Code 12-1-2-2 引用套件

In [2]:
from llama_cpp import Llama

## Code 12-1-2-3 下載模型

In [3]:
llm = Llama.from_pretrained(
    repo_id='QuantFactory/Meta-Llama-3-8B-Instruct-GGUF',
    filename='Meta-Llama-3-8B-Instruct.Q4_K_M.gguf', # 使用 Q4_K_M 的量化版本
    n_ctx=2048,
    verbose=False # 不顯示輸出
)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Meta-Llama-3-8B-Instruct.Q4_K_M.gguf:   0%|          | 0.00/4.92G [00:00<?, ?B/s]

## Code 12-1-2-4 設定 prompt

In [62]:
# 創建歷史訊息
messages = [
    {
        'role': 'system',
        'content':'You are a professional physics master, you can answer the physics questions you know.' # 在聊天前先告訴模型他是個物理大師
    }
]

## Code 12-1-2-5 聊天迴圈

In [63]:
while True:
    user_input = input('\033[94mYou: \033[0m') # 使用者由此輸入
    user_message = {'role': 'user', 'content': user_input}
    messages.append(user_message) # 將對話內容加入歷史訊息
    response = llm.create_chat_completion(
        messages=messages,
        stream=True
    )

    # 以串流方式呈現文字輸出
    for chunk in response:
        delta = chunk['choices'][0]['delta']
        if 'role' in delta:
            messages.append({'role': delta['role'], 'content': ''})
            print("\033[95mPhysics Master: \033[0m", end='', flush=True)
        elif 'content' in delta:
            token = delta['content']
            messages[-1]['content'] += token # 將對話內容加入歷史訊息
            print(token, end="", flush=True)
    print()

[94mYou: [0msqrt((8.9-1.9)*3.2^2)=?
[95mPhysics Master: [0mA nice math problem!

Let's break it down step by step:

1. Evaluate the expression inside the square root: `(8.9-1.9)*3.2^2`
= `(7.0)*3.2^2` (since 8.9 - 1.9 = 7.0)
= `(7.0)*(3.2)^2` (since 3.2^2 = 10.24)
= `(7.0)*(10.24)`
= 71.68

2. Take the square root of the result: `sqrt(71.68)`
≈ 8.51

So, the final answer is: `sqrt((8.9-1.9)*3.2^2) ≈ 8.51`


KeyboardInterrupt: Interrupted by user

# Code 12-2-2 Wolfram|Alpha 與 Llama3 的結合

## Code 12-2-2-1 設定 App ID

In [4]:
WOLFRAMALPHA_KEY = 'WH46VK-6Q5UT7PJ8R' # 將上一章節中取得的 App ID 複製至這裡

## Code 12-2-2-2 引用套件

In [5]:
import urllib.parse
import requests

## Code 12-2-2-3 編寫 Wolfram|Alpha 呼叫函式

In [19]:
def ask_wolframalpha(query):
    query = urllib.parse.quote_plus('calculate ' + query) # 將問題進行 URL 編碼，並且將空格替換為加號。
    query_url = ''.join([
        f"http://api.wolframalpha.com/v2/query?", # Wolfram|Alpha Full Results API 網址
        f"appid={WOLFRAMALPHA_KEY}", # 填入 App ID
        f"&input={query}", # 填入問題
        f"&includepodid=Result", # 只回傳結果
        f"&output=json" # 輸出 json 格式
    ])
    r = requests.get(query_url) # 發出 GET 請求
    if r.status_code == 200: # 如果請求成功則回傳答案
        try:
            return r.json()["queryresult"]["pods"][0]["subpods"][0]["plaintext"]
        except: ...
    return '\n\nSorry. There are some problems with wolframalpha. I will not use wolframalpha.' # 如果請求失敗則回傳提示

## Code 12-2-2-4 歷史訊息轉提示

In [33]:
def prompt_from_messages(messages):
    prompt = ''
    for message in messages:
        prompt += f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
        prompt += f"{message['content']}<|eot_id|>"
    prompt = prompt[:-10] # 去除最後的 <|eot_id|>
    return prompt

## Code 12-2-2-5 重新設定 prompt

In [57]:
messages = [
    {
        'role': 'system',
        # 加入新的提示，告訴模型遇到不會的計算要輸出 <|wolframalpha|> 和問題
        'content':'You are a professional physics master, you can answer the physics questions you know.' + ' You can use wolframalpha to help you calculate. If you are not sure about the answer, especially problems that require calculation or internet, you must use WolframAlpha. To use WolframAlpha, please ONLY output "<|wolframalpha|>" in the message, and then ONLY output the calculaion formula you want to use WolframAlpha to calculate and <|wolframalphaend|>. For example, "<|wolframalpha|>x^2+2x+1=0<|wolframalphaend|>" or "<|wolframalpha|>0.6-0.25*2<|wolframalphaend|>", a <|wolframalpha|> tag can ONLY use in a calculation formula, other information is not necessary. <|wolframalpha|x^2+2x+1=0|> is a wrong example. After system get the answer from WolframAlpha, you can continue to answer the question. If you have new question, please output "<|wolframalpha|>your question here<|wolframalphaend|>" again.'
    }
]

## Code 12-2-2-6 在聊天迴圈中增加 Wolfram|Alpha 使用判斷機制

In [59]:
use_wolframalpha = False # 是否使用 WolframAlpha

while True:
    print()
    if use_wolframalpha: # 如果使用 WolframAlpha
        response = llm.create_completion( # 模型依照 WolframAlpha 的回答繼續接話
            prompt=prompt_from_messages(messages),
            stream=True
        )
        use_wolframalpha = False
        for token in wolframalpha_token:
            print(token, end="", flush=True)
    else:
        print('\033[94mYou: \033[0m', end='')
        user_input = input() # 使用者由此輸入
        user_message = {'role': 'user', 'content': user_input}
        messages.append(user_message) # 將對話內容加入歷史訊息
        response = llm.create_chat_completion( # 模型依照歷史訊息聊天
            messages=messages,
            stream=True
        )
        messages.append({'role': 'assistant', 'content': ''})
        print("\033[95mPhysics Master: \033[0m", end='', flush=True)

    temp_token = ''
    question = ''

    # 以串流方式呈現文字輸出
    for chunk in response:
        if 'delta' in chunk['choices'][0]:
            delta = chunk['choices'][0]['delta'] # 聊天模式的結構
        else:
            delta = {'content': chunk['choices'][0]['text']} # 接話模式的結構
        if 'content' in delta:
            token = delta['content']
            temp_token += token
            if '<|wolframalpha|>' in temp_token: # 如果輸出包含 <|wolframalpha|> 就使用 WolframAlpha
                use_wolframalpha = True
                question = temp_token.split('<|wolframalpha|>')[-1]
                temp_token = ''
            if use_wolframalpha:
                if temp_token: question += token # 將輸出加入問題
                if '\n\n' in question or '<|wolframalphaend|>' in question: # 如果輸出包含 <|wolframalphaend|> 就計算問題
                    question = question.replace('\n', '').replace('<|wolframalphaend|>', '')
                    answer = ask_wolframalpha(question)
                    wolframalpha_token = question + '\n\nAccording to WolframAlpha the answer is ' + answer
                    messages[-1]['content'] += wolframalpha_token
                    break
                continue
            if temp_token != '<|wolframalpha|>'[:len(temp_token)] and '<|wolframalpha|>' not in temp_token: # 如果輸出不包含 <|wolframalpha|> 就輸出
                messages[-1]['content'] += temp_token # 將對話內容加入歷史訊息
                print(temp_token, end="", flush=True)
                temp_token = ''


[94mYou: [0m0.2*sqrt(8.1-2.4*2.21)=?
[95mPhysics Master: [0mI'll use WolframAlpha to calculate the exact value:


0.2*sqrt(8.1-2.4*2.21)

According to WolframAlpha the answer is 0.334425...
[94mYou: [0m

KeyboardInterrupt: Interrupted by user

# Code 12-3 利用 Llama 學習物理

## Code 12-3-0-1 載入微調的模型

In [None]:
llm = Llama.from_pretrained(
    repo_id='gallen881/Meta-Llama-3-8B-Instruct-GGUF', # 使用自己微調的模型
    filename='Meta-Llama-3-8B-Instruct.Q4_K_M.gguf', # 使用自己的模型名稱
    n_ctx=2048,
    verbose=False
)

## Code 12-3-0-2 依照先前的方法推論

以下程式與 Code 12-2-2-5、Code 12-2-2-6 相同，使用一樣的聊天方法，Wolfram|Alpha一樣適用。

In [None]:
messages = [
    {
        'role': 'system',
        'content':'You are a professional physics master, you can answer the physics questions you know.' + ' You can use wolframalpha to help you calculate. If you are not sure about the answer, especially problems that require calculation or internet, you must use WolframAlpha. To use WolframAlpha, please ONLY output "<|wolframalpha|>" in the message, and then ONLY output the calculaion formula you want to use WolframAlpha to calculate and <|wolframalphaend|>. For example, "<|wolframalpha|>x^2+2x+1=0<|wolframalphaend|>" or "<|wolframalpha|>0.6-0.25*2<|wolframalphaend|>", a <|wolframalpha|> tag can ONLY use in a calculation formula, other information is not necessary. <|wolframalpha|x^2+2x+1=0|> is a wrong example. After system get the answer from WolframAlpha, you can continue to answer the question. If you have new question, please output "<|wolframalpha|>your question here<|wolframalphaend|>" again.'
    }
]

In [None]:
use_wolframalpha = False # 是否使用 WolframAlpha

while True:
    print()
    if use_wolframalpha: # 如果使用 WolframAlpha
        response = llm.create_completion( # 模型依照 WolframAlpha 的回答繼續接話
            prompt=prompt_from_messages(messages),
            stream=True
        )
        use_wolframalpha = False
        for token in wolframalpha_token:
            print(token, end="", flush=True)
    else:
        print('\033[94mYou: \033[0m', end='')
        user_input = input() # 使用者由此輸入
        user_message = {'role': 'user', 'content': user_input}
        messages.append(user_message) # 將對話內容加入歷史訊息
        response = llm.create_chat_completion( # 模型依照歷史訊息聊天
            messages=messages,
            stream=True
        )
        messages.append({'role': 'assistant', 'content': ''})
        print("\033[95mPhysics Master: \033[0m", end='', flush=True)

    temp_token = ''
    question = ''

    # 以串流方式呈現文字輸出
    for chunk in response:
        if 'delta' in chunk['choices'][0]:
            delta = chunk['choices'][0]['delta'] # 聊天模式的結構
        else:
            delta = {'content': chunk['choices'][0]['text']} # 接話模式的結構
        if 'content' in delta:
            token = delta['content']
            temp_token += token
            if '<|wolframalpha|>' in temp_token: # 如果輸出包含 <|wolframalpha|> 就使用 WolframAlpha
                use_wolframalpha = True
                question = temp_token.split('<|wolframalpha|>')[-1]
                temp_token = ''
            if use_wolframalpha:
                if temp_token: question += token # 將輸出加入問題
                if '\n\n' in question or '<|wolframalphaend|>' in question: # 如果輸出包含 <|wolframalphaend|> 就計算問題
                    question = question.replace('\n', '').replace('<|wolframalphaend|>', '')
                    answer = ask_wolframalpha(question)
                    wolframalpha_token = question + '\n\nAccording to WolframAlpha the answer is ' + answer
                    messages[-1]['content'] += wolframalpha_token
                    break
                continue
            if temp_token != '<|wolframalpha|>'[:len(temp_token)] and '<|wolframalpha|>' not in temp_token: # 如果輸出不包含 <|wolframalpha|> 就輸出
                messages[-1]['content'] += temp_token # 將對話內容加入歷史訊息
                print(temp_token, end="", flush=True)
                temp_token = ''