<a href="https://colab.research.google.com/github/dikshithakalva/LLM_Llama2/blob/main/Llama2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## **Llama 2**

Llama 2 is a family of pre-trained and fine-tuned large language models (LLMs) released by Meta AI in 2023. The models are free of charge for research and commercial use and handle a variety of natural language processing (NLP) tasks, from text generation to writing programming code.

## **Step 1: Install the Required Packages**

In [None]:
# Build llama-cpp-python with cuBLAS so model layers can be offloaded to the GPU.
# llama-cpp-python and numpy are pinned here, so they need no separate install.
!CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python==0.1.78 numpy==1.23.4 --force-reinstall --upgrade --no-cache-dir --verbose
!pip install huggingface_hub

Using pip 23.1.2 from /usr/local/lib/python3.10/dist-packages/pip (python 3.10)
Collecting llama-cpp-python==0.1.78
  Downloading llama_cpp_python-0.1.78.tar.gz (1.7 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.7/1.7 MB 8.6 MB/s eta 0:00:00
  Running command pip subprocess to install build dependencies
  Using pip 23.1.2 from /usr/local/lib/python3.10/dist-packages/pip (python 3.10)
  Collecting setuptools>=42
    Using cached setuptools-69.2.0-py3-none-any.whl (821 kB)
  Collecting scikit-build>=0.13
    Using cached scikit_build-0.17.6-py3-none-any.whl (84 kB)
  Collecting cmake>=3.18
    Using cached cmake-3.29.0.1-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (26.6 MB)
  Collecting ninja
    Using cached ninja-1.11.1.1-py2.py3-none-manylinux1_x86_64.manylinux_2_5_x86_64.whl (307 kB)
  Collecting distro (from scikit-build>=0.13)
    Using cached distro-1.9.0-py3-none-any.whl (20 kB)
  Collecting packaging (from scikit-bu
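
Once the build finishes, it may be worth confirming that the pinned versions are the ones actually installed. The following is a minimal sketch using only the standard library; `installed_version` is a helper name introduced here for illustration and is not part of the notebook.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Packages installed in Step 1; the notebook pins the first two.
for name in ("llama-cpp-python", "numpy", "huggingface_hub"):
    print(f"{name}: {installed_version(name)}")
```

Newer llama-cpp-python releases expect GGUF files rather than GGML, which is likely why the version is pinned to 0.1.78 here.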

## **Step 2: Choose the Model**

In [None]:
model_name_or_path = "TheBloke/Llama-2-13B-chat-GGML"
model_basename = "llama-2-13b-chat.ggmlv3.q5_1.bin" # GGML v3 weights with q5_1 (5-bit) quantization, stored as a .bin file
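
The file name encodes the model, the file format, and the quantization level. Below is a small sketch that splits it apart, assuming the `<model>.<format>.<quantization>.<extension>` pattern seen in this one file name. `parse_model_basename` is a hypothetical helper, and other repositories may name their files differently.

```python
def parse_model_basename(basename):
    """Split a file name of the form <model>.<format>.<quantization>.<extension>."""
    model, file_format, quantization, extension = basename.rsplit(".", 3)
    return {
        "model": model,
        "format": file_format,
        "quantization": quantization,
        "extension": extension,
    }


parts = parse_model_basename("llama-2-13b-chat.ggmlv3.q5_1.bin")
print(parts)
```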

## **Step 3: Import the Libraries and Download the Model**

In [None]:
from huggingface_hub import hf_hub_download

In [None]:
from llama_cpp import Llama

In [None]:
model_path = hf_hub_download(repo_id=model_name_or_path, filename=model_basename)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
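
`hf_hub_download` returns the local path of the cached file, so a quick look at that path can confirm that the download completed. This is a minimal sketch: `file_size_gib` is a helper introduced here, and the commented-out line assumes `model_path` from the cell above.

```python
import os


def file_size_gib(path):
    """Return the size of the file at `path` in GiB."""
    return os.path.getsize(path) / 2**30


# In the notebook, after the download cell:
# print(f"{model_path}: {file_size_gib(model_path):.2f} GiB")
```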


## **Step 4: Load the Model**

In [None]:
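# Load the model and offload part of it to the GPU
lcpp_llm = None  # Drop any previously loaded model before loading again
lcpp_llm = Llama(
    model_path=model_path,
    n_threads=2,      # Number of CPU threads to use
    n_batch=512,      # Should be between 1 and n_ctx; consider the amount of VRAM in your GPU
    n_gpu_layers=32,  # Layers to offload to the GPU; change this based on your model and your GPU VRAM pool
)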

AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 | 


In [None]:
# Check how many layers were requested for GPU offload
lcpp_llm.params.n_gpu_layers

32
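
How many layers to offload depends on how much VRAM is free. The sketch below is a rough heuristic, not how llama.cpp actually allocates memory: it assumes every layer takes an equal share of the model file and reserves a fixed amount of VRAM for the context and scratch buffers. The function name, the 2 GiB reserve, and the example sizes are all assumptions made for illustration.

```python
def layers_that_fit(model_bytes, n_layers, vram_bytes, reserve_bytes=2 * 2**30):
    """Estimate how many layers fit in VRAM, assuming equally sized layers."""
    if n_layers <= 0 or model_bytes <= 0:
        raise ValueError("model_bytes and n_layers must be positive")
    bytes_per_layer = model_bytes / n_layers
    usable = max(0, vram_bytes - reserve_bytes)
    return min(n_layers, int(usable // bytes_per_layer))


GIB = 2**30
# Hypothetical numbers: a 10 GiB model file with 40 layers on a 15 GiB GPU.
print(layers_that_fit(10 * GIB, 40, 15 * GIB))
```

If loading fails with an out-of-memory error, lowering `n_gpu_layers` below the estimate is the simplest adjustment.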

## **Step 5: Creating a template for the prompt**

In [None]:
prompt = "Write a linear regression in python"
prompt_template=f'''SYSTEM: You are a helpful, respectful and honest assistant. Always answer as helpfully.

USER: {prompt}

ASSISTANT:
'''
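
Because the same template is reused for every prompt in this notebook, it can be wrapped in a small function. This is a minimal sketch: `build_prompt` and `DEFAULT_SYSTEM` are names introduced here, and the string it produces is meant to match the f-string above character for character.

```python
DEFAULT_SYSTEM = "You are a helpful, respectful and honest assistant. Always answer as helpfully."


def build_prompt(user_message, system_message=DEFAULT_SYSTEM):
    """Fill the SYSTEM/USER/ASSISTANT template used in this notebook."""
    return f"SYSTEM: {system_message}\n\nUSER: {user_message}\n\nASSISTANT:\n"


prompt_template = build_prompt("Write a linear regression in python")
print(prompt_template)
```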

## **Step 6: Generate Responses**

### **Response 1**

In [None]:
# Generate up to 256 new tokens; echo=True returns the prompt together with the completion
response = lcpp_llm(prompt=prompt_template, max_tokens=256, temperature=0.5, top_p=0.95,
                    repeat_penalty=1.2, top_k=150,
                    echo=True)

Llama.generate: prefix-match hit
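
The `top_k` and `top_p` arguments both narrow the set of tokens the model may sample from at each step. The sketch below is a simplified illustration of that idea on a toy distribution, not llama.cpp's actual sampler; `temperature` and `repeat_penalty` are left out, and `filter_candidates` and the toy probabilities are made up for the example.

```python
def filter_candidates(probs, top_k, top_p):
    """Keep the top_k most likely tokens, then the smallest prefix reaching top_p.

    `probs` maps token -> probability. Returns the surviving tokens with
    probabilities renormalized to sum to 1.
    """
    # Rank tokens by probability and keep only the top_k most likely ones.
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:top_k]
    total = sum(p for _, p in ranked)
    kept, cumulative = [], 0.0
    for token, p in ranked:
        # Accumulate renormalized probability until the top_p mass is covered.
        kept.append((token, p / total))
        cumulative += p / total
        if cumulative >= top_p:
            break
    kept_total = sum(p for _, p in kept)
    return {token: p / kept_total for token, p in kept}


toy = {"Paris": 0.5, "London": 0.3, "Rome": 0.15, "Oslo": 0.05}
print(list(filter_candidates(toy, top_k=3, top_p=0.7)))
```

With the notebook's settings (`top_k=150`, `top_p=0.95`) the filter is fairly permissive, so most of the variation control comes from `temperature=0.5`.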


In [None]:
print(response)

{'id': 'cmpl-89d107fe-d08e-4d11-8e2f-9b1de1bb3621', 'object': 'text_completion', 'created': 1712313870, 'model': '/root/.cache/huggingface/hub/models--TheBloke--Llama-2-13B-chat-GGML/snapshots/3140827b4dfcb6b562cd87ee3d7f07109b014dd0/llama-2-13b-chat.ggmlv3.q5_1.bin', 'choices': [{'text': 'SYSTEM: You are a helpful, respectful and honest assistant. Always answer as helpfully.\n\nUSER: Write a linear regression in python\n\nASSISTANT:\n\nTo write a linear regression in Python, you can use the scikit-learn library. Here is an example of how to do this:\n```\nimport numpy as np\nfrom sklearn.linear_model import LinearRegression\nfrom sklearn.datasets import load_boston\nfrom sklearn.model_selection import train_test_split\n\n# Load the Boston housing dataset\nboston = load_boston()\n\n# Split the dataset into training and testing sets\nX_train, X_test, y_train, y_test = train_test_split(boston.data, boston.target, test_size=0.3)\n\n# Create a LinearRegression object and fit it to the data
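
The returned dictionary follows a text-completion layout: the generated text sits under `choices[0]["text"]`, next to a `finish_reason`. Below is a minimal sketch of pulling out those two fields, using a hand-written stand-in dictionary. The values in `example` are made up and are not model output, and `first_choice` is a helper name introduced here.

```python
def first_choice(response):
    """Return (text, finish_reason) from the first choice of a completion response."""
    choice = response["choices"][0]
    return choice["text"], choice.get("finish_reason")


# Stand-in with the same shape as the printed response; values are illustrative.
example = {
    "object": "text_completion",
    "choices": [{"text": "Hello", "index": 0, "logprobs": None, "finish_reason": "length"}],
}
text, reason = first_choice(example)
print(reason, repr(text))
```

In llama-cpp-python, a `finish_reason` of `"length"` indicates that generation stopped at `max_tokens`, while `"stop"` indicates that the model ended on its own.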

In [None]:
print(response["choices"][0]["text"])

SYSTEM: You are a helpful, respectful and honest assistant. Always answer as helpfully.

USER: Write a linear regression in python

ASSISTANT:

To write a linear regression in Python, you can use the scikit-learn library. Here is an example of how to do this:
```
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split

# Load the Boston housing dataset
boston = load_boston()

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(boston.data, boston.target, test_size=0.3)

# Create a LinearRegression object and fit it to the data
reg = LinearRegression()
reg.fit(X_train, y_train)

# Make predictions on the testing set
y_pred = reg.predict(X_test)

# Evaluate the performance of the model using mean squared error (MSE)
mse = np.mean((y_test - y_pred) ** 2)
print("Mean squared error: ", mse)
```
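
Because the call uses `echo=True`, the returned text begins with the prompt itself. One way to show only the model's answer is to strip that prefix, as in this minimal sketch. `completion_only` is a helper introduced here, and the stand-in response is made up for the example.

```python
def completion_only(response, prompt):
    """Return the generated text without the echoed prompt."""
    text = response["choices"][0]["text"]
    if text.startswith(prompt):
        text = text[len(prompt):]
    return text.strip()


# Stand-in response; in the notebook, pass `response` and `prompt_template`.
prompt = "USER: Hi\n\nASSISTANT:\n"
fake_response = {"choices": [{"text": prompt + "\nHello there."}]}
print(completion_only(fake_response, prompt))
```

Passing `echo=False` to the model call avoids the echoed prompt in the first place.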



### **Response 2**

In [None]:
prompt = "Write a blog about GenAI"
prompt_template=f'''SYSTEM: You are a helpful, respectful and honest assistant. Always answer as helpfully.

USER: {prompt}

ASSISTANT:
'''

In [None]:
response=lcpp_llm(prompt=prompt_template, max_tokens=256, temperature=0.5, top_p=0.95,
                  repeat_penalty=1.2, top_k=150,
                  echo=True)

Llama.generate: prefix-match hit
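
The sampling settings are identical for every prompt in this notebook, so they can be kept in one place. The sketch below is one way to do that: `GENERATION_KWARGS` and `generate` are names introduced here, and the function only assumes that the model object can be called with `prompt=` and the keyword arguments already used above.

```python
# Sampling settings copied from the calls in this notebook.
GENERATION_KWARGS = {
    "max_tokens": 256,
    "temperature": 0.5,
    "top_p": 0.95,
    "repeat_penalty": 1.2,
    "top_k": 150,
    "echo": True,
}


def generate(llm, prompt, **overrides):
    """Call `llm` with the shared settings, letting keyword arguments override them."""
    kwargs = {**GENERATION_KWARGS, **overrides}
    return llm(prompt=prompt, **kwargs)


# In the notebook: response = generate(lcpp_llm, prompt_template, echo=False)
```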


In [None]:
print(response)



In [None]:
print(response["choices"][0]["text"])

SYSTEM: You are a helpful, respectful and honest assistant. Always answer as helpfully.

USER: Write a blog about GenAI

ASSISTANT:

---

Introduction to GenAI: The Future of Artificial Intelligence

Artificial intelligence (AI) has been rapidly advancing in recent years, and one of the most exciting developments is the emergence of Generative Adversarial Networks (GANs). GANs are a type of deep learning algorithm that enables machines to generate new content, such as images, videos, music, and text. In this blog post, we'll explore what GenAI is, how it works, and its potential applications in various industries.

What is GenAI?
--------------

GenAI, short for Generative Adversarial Networks, is a type of deep learning algorithm that consists of two neural networks: the generator and the discriminator. The generator creates new content, while the discriminator evaluates the generated content and tells the generator whether it's realistic or not. Through this process, the generator im

### **Response 3**

In [None]:
prompt = "What is the capital of India"
prompt_template=f'''SYSTEM: You are a helpful, respectful and honest assistant. Always answer as helpfully.

USER: {prompt}

ASSISTANT:
'''

In [None]:
response=lcpp_llm(prompt=prompt_template, max_tokens=256, temperature=0.5, top_p=0.95,
                  repeat_penalty=1.2, top_k=150,
                  echo=True)

Llama.generate: prefix-match hit


In [None]:
print(response)

{'id': 'cmpl-1143abe2-4753-4397-bf6b-d740ad2f7a61', 'object': 'text_completion', 'created': 1712314388, 'model': '/root/.cache/huggingface/hub/models--TheBloke--Llama-2-13B-chat-GGML/snapshots/3140827b4dfcb6b562cd87ee3d7f07109b014dd0/llama-2-13b-chat.ggmlv3.q5_1.bin', 'choices': [{'text': "SYSTEM: You are a helpful, respectful and honest assistant. Always answer as helpfully.\n\nUSER: What is the capital of India\n\nASSISTANT:\nThe capital of India is New Delhi. However, it's important to note that there are several other cities in India that could be considered capitals depending on the context and perspective. For example, some people may consider Mumbai (formerly known as Bombay) to be the financial capital of India, while others may view Bangalore as the technological hub of the country. Additionally, there are many other important cities in India that play significant roles in various fields such as politics, culture, and religion.", 'index': 0, 'logprobs': None, 'finish_reason': 

In [None]:
print(response["choices"][0]["text"])

SYSTEM: You are a helpful, respectful and honest assistant. Always answer as helpfully.

USER: What is the capital of India

ASSISTANT:
The capital of India is New Delhi. However, it's important to note that there are several other cities in India that could be considered capitals depending on the context and perspective. For example, some people may consider Mumbai (formerly known as Bombay) to be the financial capital of India, while others may view Bangalore as the technological hub of the country. Additionally, there are many other important cities in India that play significant roles in various fields such as politics, culture, and religion.
