<a href="https://colab.research.google.com/github/HansHenseler/DFRWS-APAC-LLM-Workshop/blob/main/Part_III_Hands_on_with_Llama2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Part III: Hands on with Llama2

This notebook demonstrates instruction answering and chat conversation. In instruction answering, you give the LLM a single instruction and it generates a response (Stage 3). In chat conversation, earlier prompts and answers are fed back to the LLM so that it can refer to them (Stage 4).

```
This notebook is part of the DFRWS APAC 2023 workshop "LLMs: Prompt Engineering and Retrieval Augmented Generation for Digital Forensics" by

Hans Henseler, University of Applied Sciences Leiden and Netherlands Forensic Institute, Netherlands
Kwok-Yan Lam, Nanyang Technological University, Singapore
Zee Kin Yeong, Singapore Academy of Law, Singapore
and Victor C.W. Cheng, TAU Express Pte Ltd, Singapore

For more information see: https://dfrws.org/presentation/llms-prompt-engineering-and-retrieval-augmented-generation/
```

Model used: `llama-2-13b-chat.Q4_0.gguf`

- Model: Llama 2
- Size: 13B
- Model type: chat model
- Quantization: 4-bit, native method
- Model format: GGUF
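As a rough sanity check on whether this model fits on a single GPU, the size of the weights can be estimated from the parameter count and the bits stored per weight. The sketch below assumes about 13 billion parameters and that Q4_0 stores 32 four-bit weights plus one 16-bit scale per block (about 4.5 bits per weight). It ignores file metadata and any tensors kept at higher precision, so treat the result as a ballpark figure only.

```python
def estimate_size_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate size in GiB of a quantized model's weights."""
    return n_params * bits_per_weight / 8 / 2**30

# Assumption: Q4_0 packs 32 four-bit weights plus one 16-bit scale per block.
Q4_0_BITS_PER_WEIGHT = (32 * 4 + 16) / 32  # 4.5 bits per weight

size = estimate_size_gib(13e9, Q4_0_BITS_PER_WEIGHT)
print(f"{size:.1f} GiB")
```

Under these assumptions the weights come to a little under 7 GiB, which leaves headroom on the 15360 MiB Tesla T4 used in this notebook.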

First step: get a GPU! The runs below used a Tesla T4.


## Stage 1: Install langchain and llama-cpp-python

In [6]:
!pip install langchain



Install llama-cpp-python with cuBLAS (GPU) support:

In [7]:
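# CMAKE_ARGS must be set on the same line as pip: a separate `!export` runs in its
# own subshell in a notebook, so it would not affect the install command.
!CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python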
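Both packages change quickly, so it can help to record which versions were actually installed before moving on. The following is a minimal sketch using only the standard library; the helper name `installed_version` is introduced here for illustration and is not part of either package.

```python
from importlib import metadata
from typing import Optional

def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a pip package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

for name in ("langchain", "llama-cpp-python"):
    print(name, installed_version(name) or "NOT INSTALLED")
```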



## Stage 2: Get the Llama 2 13B chat model

In [4]:
# This is the chat model!
!wget https://huggingface.co/TheBloke/Llama-2-13B-chat-GGUF/resolve/main/llama-2-13b-chat.Q4_0.gguf

--2023-10-16 07:10:59--  https://huggingface.co/TheBloke/Llama-2-13B-chat-GGUF/resolve/main/llama-2-13b-chat.Q4_0.gguf
Resolving huggingface.co (huggingface.co)... 13.33.33.110, 13.33.33.102, 13.33.33.55, ...
Connecting to huggingface.co (huggingface.co)|13.33.33.110|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://cdn-lfs.huggingface.co/repos/8d/b1/8db1d1f73b4caa58e947ccbfe2fb27ac5e495c2ad8457ad299d15987aee3b520/eda2a15d532bea4ce6fc14d15c2b72638243396816252f73a94dceeb0429112f?response-content-disposition=attachment%3B+filename*%3DUTF-8%27%27llama-2-13b-chat.Q4_0.gguf%3B+filename%3D%22llama-2-13b-chat.Q4_0.gguf%22%3B&Expires=1697699459&Policy=eyJTdGF0ZW1lbnQiOlt7IkNvbmRpdGlvbiI6eyJEYXRlTGVzc1RoYW4iOnsiQVdTOkVwb2NoVGltZSI6MTY5NzY5OTQ1OX19LCJSZXNvdXJjZSI6Imh0dHBzOi8vY2RuLWxmcy5odWdnaW5nZmFjZS5jby9yZXBvcy84ZC9iMS84ZGIxZDFmNzNiNGNhYTU4ZTk0N2NjYmZlMmZiMjdhYzVlNDk1YzJhZDg0NTdhZDI5OWQxNTk4N2FlZTNiNTIwL2VkYTJhMTVkNTMyYmVhNGNlNmZjMTRkMTVjMmI3MjYzODI0MzM5Njg
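A download of this size can be interrupted, or can return an error page instead of the model. A quick check before loading is to confirm that the file exists and starts with the four-byte GGUF magic `GGUF`. The sketch below assumes `wget` saved the file into the current working directory; `looks_like_gguf` is a hypothetical helper, and it only inspects the header, so it does not prove the download is complete.

```python
import os

def looks_like_gguf(path: str) -> bool:
    """Return True if the file exists and begins with the magic bytes b'GGUF'."""
    if not os.path.isfile(path):
        return False
    with open(path, "rb") as f:
        return f.read(4) == b"GGUF"

model_path = "llama-2-13b-chat.Q4_0.gguf"  # assumption: saved in the current directory
if looks_like_gguf(model_path):
    print(f"{model_path}: {os.path.getsize(model_path) / 2**30:.2f} GiB")
else:
    print(f"{model_path}: missing or not a GGUF file")
```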

In [5]:
from langchain.llms import LlamaCpp

# Make sure the model path is correct for your system!
llm = LlamaCpp(
    model_path="/content/llama-2-13b-chat.Q4_0.gguf",  # location of the model file
    temperature=0.75,   # sampling temperature; higher values give more varied output
    max_tokens=2000,    # maximum number of tokens to generate
    top_p=0.9,          # nucleus sampling: keep tokens covering 90% of the probability mass
    top_k=30,           # sample only from the 30 most likely tokens
    n_gpu_layers=43,    # number of model layers to offload to the GPU
    verbose=True,       # print llama.cpp load and timing information
    n_batch=200,        # number of prompt tokens processed in parallel
)

AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | 
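To make the roles of `temperature`, `top_k` and `top_p` more concrete, here is a small pure-Python sketch of how these three settings can narrow down the next-token choice. It illustrates the general technique only: the function name and the toy logits are made up for this example, and the sampler inside llama.cpp may order or implement the steps differently.

```python
import math
import random

def sample_token(logits, temperature=0.75, top_k=30, top_p=0.9, rng=random):
    """Pick a token id from raw logits using temperature, top-k and top-p."""
    # Temperature: divide the logits before the softmax; lower values sharpen the distribution.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]
    total = sum(weights)
    probs = sorted(((w / total, i) for i, w in enumerate(weights)), reverse=True)

    # Top-k: keep only the k most likely tokens.
    probs = probs[:top_k]

    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for p, i in probs:
        kept.append((p, i))
        cumulative += p
        if cumulative >= top_p:
            break

    # Sample among the survivors; random.choices treats the weights as relative.
    ids = [i for _, i in kept]
    return rng.choices(ids, weights=[p for p, _ in kept], k=1)[0]

print(sample_token([2.0, 1.0, 0.5, -1.0], top_k=2))
```

With `top_k=2` only the two highest-scoring tokens (ids 0 and 1 here) can ever be returned, whatever the other settings are.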


Let's check what GPU we have:

```
Mon Oct 16 04:18:30 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17   Driver Version: 525.105.17   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   54C    P0    27W /  70W |   7949MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
```



In [1]:
!nvidia-smi

Mon Oct 16 07:08:35 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17   Driver Version: 525.105.17   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   42C    P8     9W /  70W |      0MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

## Stage 3: Try the model with Instruction Answering

In [8]:
llm("what is water?")

"\nWater is a clear, colorless liquid that is essential for all known forms of life. It is the most abundant substance on Earth and covers about 71% of the planet's surface. Water is composed of hydrogen and oxygen atoms, with two hydrogen atoms bonded to one oxygen atom.\n\nThe physical properties of water are unique and critical for its role in supporting life. Some of these properties include:\n\n1. High specific heat capacity: Water has a high specific heat capacity, meaning it can absorb and release a lot of heat energy without a large change in temperature. This helps regulate the Earth's climate and maintain a relatively stable environment.\n2. High thermal conductivity: Water is an excellent conductor of heat, which allows it to transfer heat energy efficiently. This property is essential for the regulation of ocean currents and the formation of weather patterns.\n3. Density: Water is denser than most other substances, which helps it sink and flow downhill. This property is cri

In [9]:
llm("plan a 2 days trip in Singapore")

Llama.generate: prefix-match hit


", what are the top things to do?\n\nAnswer:\n\nIf you're planning a two-day trip to Singapore, here are some top things to do to make the most of your visit:\n\nDay 1:\n\n1. Gardens by the Bay - Start your day with a visit to this iconic attraction. Take a stroll through the Flower Dome and Cloud Forest cooled conservatories, and enjoy the breathtaking views of the Supertree Grove.\n2. Marina Bay Sands SkyPark - Head over to the rooftop garden of the Marina Bay Sands hotel for panoramic views of the city skyline. You can also take a walk along the promenade and enjoy the light and water show at night.\n3. Merlion Park - Visit this famous landmark and take photos with the iconic half-lion, half-fish statue. You can also enjoy the nearby Singapore River and take a boat ride to see the city from a different perspective.\n4. Chinatown - Explore the vibrant streets of Chinatown and experience its rich history and culture. Visit the Buddha Tooth Relic Temple, Chinatown Heritage Centre, and 

In [8]:
txt = '''
1. Gardens by the Bay - I want to see the Supertree Grove and the Flower Dome and Cloud Forest Domes.
2. Marina Bay Sands - I want to take a photo with the iconic hotel and enjoy the light and water show at night.
3. Merlion - I want to see the famous statue and take a photo with it.
4. Singapore Flyer - I want to ride the giant Ferris wheel and enjoy the panoramic view of the city.
5. Chinatown - I would like to explore the colorful streets and try some delicious street food.
6. Little India - I would like to experience the vibrant Indian culture and try some authentic Indian cuisine.
7. Haw Par Villa - I want to see the largest theme park in Singapore and enjoy the Chinese mythology-themed rides and attractions.
8. Clarke Quay - I would like to visit the nightlife district and enjoy the bars, clubs, and live music performances.
'''
llm("Please summarize the following passage in 50 words. Passage: " + txt)

Llama.generate: prefix-match hit


'9. Orchard Road - I want to go shopping at the famous shopping street and explore the various malls and department stores.'
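Note that the output above continues the numbered list instead of summarizing it. One plausible reason is that the chat model receives a bare string with no instruction markers and treats it as text to complete. A hedged sketch of wrapping a single instruction in the Llama 2 chat markers (the same `[INST]` and `<<SYS>>` markers used in Stage 4) follows. The helper name is hypothetical, and the leading `<s>` is left out on the assumption that the tokenizer adds the beginning-of-sequence token itself.

```python
def as_llama2_instruction(
    instruction: str,
    system: str = "You are a helpful bot. Your answers are clear and concise.",
) -> str:
    """Wrap a single instruction in the Llama 2 chat markers."""
    return f"[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{instruction.strip()} [/INST]"

prompt = as_llama2_instruction(
    "Please summarize the following passage in 50 words. Passage: ..."
)
print(prompt)
# With the model loaded as above, the wrapped prompt would be sent with: llm(prompt)
```

Whether this produces a proper summary depends on the model and the sampling settings, so compare the result against the unwrapped call.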

In [9]:
llm("which countries are located in Europe")

Llama.generate: prefix-match hit


"?\n\nThere are 50 countries located in Europe, and they are:\n\n1. Albania\n2. Andorra\n3. Armenia\n4. Austria\n5. Azerbaijan\n6. Belarus\n7. Belgium\n8. Bosnia and Herzegovina\n9. Bulgaria\n10. Croatia\n11. Cyprus\n12. Czech Republic\n13. Denmark\n14. Estonia\n15. Faroe Islands\n16. Finland\n17. France\n18. Georgia\n19. Germany\n20. Gibraltar\n21. Greece\n22. Hungary\n23. Iceland\n24. Ireland\n25. Italy\n26. Kazakhstan\n27. Kosovo\n28. Latvia\n29. Liechtenstein\n30. Lithuania\n31. Luxembourg\n32. Malta\n33. Moldova\n34. Monaco\n35. Montenegro\n36. Netherlands\n37. Norway\n38. Poland\n39. Portugal\n40. Romania\n41. Russia (partially)\n42. San Marino\n43. Serbia\n44. Slovakia\n45. Slovenia\n46. Spain\n47. Sweden\n48. Switzerland\n49. Turkey (partially)\n50. Ukraine\n\nNote that some countries have territories located outside of Europe, such as Russia's Kaliningrad Oblast and Turkey's Istanbul province. Additionally, some countries have disputed borders or territorial claims, so this li

## Stage 4: Try the model with Chat Conversation


In [1]:
from langchain.llms import LlamaCpp
model_path = "/content/llama-2-13b-chat.Q4_0.gguf"
llm = LlamaCpp(model_path=model_path,
               # n_ctx=4096,
               seed=0,
               temperature=0.1,
               top_k=30,          # original 50
               top_p=0.75,        # original 0.95
               n_batch=300,       # number of prompt tokens processed in parallel
               n_gpu_layers=43,   # important for GPU
               n_threads=10,      # important for CPU
               max_tokens=1024,
               repeat_penalty=1.18,
               )

history = []

def convert_message_and_history_to_llama2_chat_format(message: str, memory_limit: int = 7) -> str:
    # memory_limit=7 -> keep only the latest 7 (user, assistant) message pairs
    global history

    SYSTEM_PROMPT = """<s>[INST] <<SYS>>
    You are a helpful bot. Your answers are clear and concise.
    <</SYS>>

    """

    if len(history) > memory_limit:
        history = history[-memory_limit:]

    if len(history) == 0:
        return SYSTEM_PROMPT + f"{message} [/INST]"

    # The first message pair shares its [INST] block with the system prompt
    llama2_history_formatted_message = SYSTEM_PROMPT + f"{history[0][0]} [/INST] {history[0][1]} </s>"
    # The 2nd and later message pairs each get their own <s>[INST] ... [/INST] block
    for user_msg, assistant_msg in history[1:]:
        llama2_history_formatted_message += f"<s>[INST] {user_msg} [/INST] {assistant_msg} </s>"
    # The current user message is left open for the model to answer
    llama2_history_formatted_message += f"<s>[INST] {message} [/INST]"

    return llama2_history_formatted_message


def generate_response(message: str) -> None:
    global history

    query = convert_message_and_history_to_llama2_chat_format(message)
    responses = llm(query)
    history.append((message, responses))
    print("user: ", message)
    print("assistant: ", responses)
    print('\n')


AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | 
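The chat memory above is a plain list that is trimmed to the latest `memory_limit` pairs each time a prompt is built. An alternative sketch of the same idea uses `collections.deque` with `maxlen`, which drops the oldest pair automatically on every append. The questions and answers below are placeholder strings, not model output.

```python
from collections import deque

MEMORY_LIMIT = 7  # same value as memory_limit in the function above

# A deque with maxlen discards the oldest entry once the limit is reached.
history = deque(maxlen=MEMORY_LIMIT)

for turn in range(10):
    history.append((f"question {turn}", f"answer {turn}"))

print(len(history))  # 7
print(history[0])    # ('question 3', 'answer 3')
```

A `deque` does not support slicing, so code that relies on `history[1:]` would need `list(history)[1:]` or `itertools.islice` instead.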


In [2]:
generate_response("hi")
generate_response("what is iron?")
generate_response("what is the boiling point of it?")

user:  hi
assistant:    Hello! I'm here to help answer any questions you may have. What can I assist you with today?




Llama.generate: prefix-match hit


user:  what is iron?
assistant:    Iron is a chemical element with the symbol Fe and atomic number 26. It is a metal that belongs to the group of transition metals and is one of the most common elements on Earth.

Iron is an essential nutrient for many living organisms, including humans, where it plays a crucial role in the functioning of red blood cells and the transportation of oxygen throughout the body. It is also a key component in the synthesis of hemoglobin, which is the protein in red blood cells that carries oxygen.

In addition to its biological importance, iron is also a critical material in many industrial processes, such as the production of steel, which is an alloy of iron and carbon. Steel is a key component in construction, transportation, and other industries, and it is estimated that over 90% of all metal produced worldwide is steel.

Iron is also a significant component in the Earth's core, where it is thought to make up about 85% of the planet's total iron content. 

Llama.generate: prefix-match hit


user:  what is the boiling point of it?
assistant:    The boiling point of iron is approximately 2860°C (5176°F) at standard atmospheric pressure. This means that iron melts at a temperature of around 1538°C (2800°F) and boils at a temperature of around 2860°C (5176°F). However, it's important to note that the exact boiling point of iron can vary depending on the specific conditions in which it is being he


