### 1. Open-Source Llama Model without LangChain

Running an open-source quantized Llama 2 model requires a few setup steps first. Open a terminal and run the following commands in order:

1. sudo apt-get update
2. sudo apt-get install build-essential
3. export CC=/usr/bin/gcc
4. export CXX=/usr/bin/g++
5. CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python==0.1.78 numpy==1.23.4 --force-reinstall --upgrade --no-cache-dir --verbose
6. pip install huggingface_hub
7. pip install llama-cpp-python==0.1.78
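
Since two of the steps above install `llama-cpp-python`, it can help to confirm which versions actually ended up in the environment. The following is a minimal sketch using only the standard library; the helper name `installed_version` is made up for illustration.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report the packages touched by the install steps.
for name in ("llama-cpp-python", "huggingface_hub", "numpy"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

A missing package is reported as `not installed` rather than raising, so the check can run before or after the setup.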


In [12]:
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

In [9]:
model_name_or_path = 'TheBloke/Llama-2-13B-chat-GGML'
model_basename = 'llama-2-13b-chat.ggmlv3.q5_1.bin' # The model in bin format

In [15]:
model_path = hf_hub_download(
    repo_id = model_name_or_path,
    filename = model_basename
)

In [16]:
model_path

'/home/anubrata/.cache/huggingface/hub/models--TheBloke--Llama-2-13B-chat-GGML/snapshots/3140827b4dfcb6b562cd87ee3d7f07109b014dd0/llama-2-13b-chat.ggmlv3.q5_1.bin'
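
The returned path follows the cache layout visible in the output: `models--<org>--<name>/snapshots/<revision>/<filename>`. As a rough sketch, a path like this can be split back into its parts. The helper `parse_hf_cache_path` is hypothetical and assumes exactly that layout (it would misread a repository name that itself contains `--`).

```python
from pathlib import PurePosixPath


def parse_hf_cache_path(path: str) -> dict:
    """Split a cached model path into repo id, revision, and filename.

    Assumes the layout seen in the output:
    .../models--<org>--<name>/snapshots/<revision>/<filename>
    """
    parts = PurePosixPath(path).parts
    idx = parts.index("snapshots")  # raises ValueError if the layout differs
    repo_dir = parts[idx - 1]
    if not repo_dir.startswith("models--"):
        raise ValueError(f"unexpected repo directory: {repo_dir}")
    repo_id = repo_dir[len("models--"):].replace("--", "/")
    return {
        "repo_id": repo_id,
        "revision": parts[idx + 1],
        "filename": "/".join(parts[idx + 2:]),
    }


example = (
    "/home/anubrata/.cache/huggingface/hub/"
    "models--TheBloke--Llama-2-13B-chat-GGML/snapshots/"
    "3140827b4dfcb6b562cd87ee3d7f07109b014dd0/"
    "llama-2-13b-chat.ggmlv3.q5_1.bin"
)
info = parse_hf_cache_path(example)
print(info["repo_id"], info["revision"][:8], info["filename"])
```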

In [18]:
# To run on GPU
lcpp_llm = None

lcpp_llm = Llama(
    model_path = model_path,
    n_threads = 2, # No. of CPU cores
    n_batch = 512, # Should be between 1 and n_ctx, consider the amount of VRAM in your GPU
    n_gpu_layers = 32 # Change this value based on your model and your GPU VRAM pool
)

llama.cpp: loading model from /home/anubrata/.cache/huggingface/hub/models--TheBloke--Llama-2-13B-chat-GGML/snapshots/3140827b4dfcb6b562cd87ee3d7f07109b014dd0/llama-2-13b-chat.ggmlv3.q5_1.bin
llama_model_load_internal: format     = ggjt v3 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 512
llama_model_load_internal: n_embd     = 5120
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 40
llama_model_load_internal: n_head_kv  = 40
llama_model_load_internal: n_layer    = 40
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: n_gqa      = 1
llama_model_load_internal: rnorm_eps  = 5.0e-06
llama_model_load_internal: n_ff       = 13824
llama_model_load_internal: freq_base  = 10000.0
llama_model_load_internal: freq_scale = 1
llama_model_load_internal: ftype      = 9 (mostly Q5_1)
llama_model_load_internal: model size = 13B
llama_model_load_internal: ggml ctx size =    0.11 MB
llama_model_load_in

In [24]:
# Check how many layers were offloaded to the GPU
lcpp_llm.params.n_gpu_layers

32
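
The load log reports `n_layer = 40` while `n_gpu_layers` is 32, so with this configuration 8 layers stay on the CPU. A rough sketch for picking the value is shown below. It assumes all layers are the same size, and both the `reserve_gb` headroom and the example sizes are illustrative guesses, not measurements from this run.

```python
def layers_that_fit(model_size_gb: float, n_layers: int,
                    free_vram_gb: float, reserve_gb: float = 1.0) -> int:
    """Estimate how many layers fit in VRAM, assuming equal-sized layers.

    reserve_gb is headroom left for the KV cache and scratch buffers
    (an illustrative guess; tune it for your setup).
    """
    if n_layers <= 0 or model_size_gb <= 0:
        raise ValueError("model_size_gb and n_layers must be positive")
    per_layer_gb = model_size_gb / n_layers
    usable_gb = max(free_vram_gb - reserve_gb, 0.0)
    return min(n_layers, int(usable_gb // per_layer_gb))


# Hypothetical numbers: a 10 GB model file with 40 layers on an 8 GB GPU.
n_gpu = layers_that_fit(model_size_gb=10.0, n_layers=40, free_vram_gb=8.0)
print(f"offload {n_gpu} layers to the GPU, keep {40 - n_gpu} on the CPU")
```

Treat the result as a starting point only: if loading fails with an out-of-memory error, lower `n_gpu_layers` and try again.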

In [19]:
prompt = "Write a linear regression code"

In [28]:
prompt_template = f"""
    SYSTEM: You are a helpful, respectful and an honest assistant. Always answer to the best of your ability.
    
    USER: {prompt}
    
    ASSISTANT:
"""

In [29]:
response = lcpp_llm(
    prompt = prompt_template,
    max_tokens = 256,
    temperature = 0.5,
    top_p = 0.95,
    repeat_penalty = 1.2,
    top_k = 150,
    echo = True
)

Llama.generate: prefix-match hit

llama_print_timings:        load time = 21496.90 ms
llama_print_timings:      sample time =   128.27 ms /   256 runs   (    0.50 ms per token,  1995.84 tokens per second)
llama_print_timings: prompt eval time =  6958.11 ms /    15 tokens (  463.87 ms per token,     2.16 tokens per second)
llama_print_timings:        eval time = 117123.75 ms /   255 runs   (  459.31 ms per token,     2.18 tokens per second)
llama_print_timings:       total time = 124707.93 ms
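
The per-token figures in the timing lines can be reproduced by hand: 117123.75 ms spread over 255 eval runs is about 459.31 ms per token, or roughly 2.18 tokens per second. A small sketch of that arithmetic (the function name `throughput` is illustrative):

```python
def throughput(total_ms: float, n_tokens: int) -> tuple:
    """Return (ms per token, tokens per second) for a timed run."""
    if n_tokens <= 0 or total_ms <= 0:
        raise ValueError("total_ms and n_tokens must be positive")
    ms_per_token = total_ms / n_tokens
    tokens_per_second = 1000.0 / ms_per_token
    return ms_per_token, tokens_per_second


# Values copied from the "eval time" line of the log.
ms_tok, tok_s = throughput(117123.75, 255)
print(f"{ms_tok:.2f} ms per token, {tok_s:.2f} tokens per second")
```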


In [30]:
response

{'id': 'cmpl-0146d0ce-f865-4ad5-8e75-8dcabe8e58b5',
 'object': 'text_completion',
 'created': 1720969375,
 'model': '/home/anubrata/.cache/huggingface/hub/models--TheBloke--Llama-2-13B-chat-GGML/snapshots/3140827b4dfcb6b562cd87ee3d7f07109b014dd0/llama-2-13b-chat.ggmlv3.q5_1.bin',
 'choices': [{'text': "\n    SYSTEM: You are a helpful, respectful and an honest assistant. Always answer to the best of your ability.\n    \n    USER: Write a linear regression code\n    \n    ASSISTANT:\n        Sure! Here's an example of how you could write a simple linear regression code in Python using scikit-learn library:\n```\nimport pandas as pd\nfrom sklearn.linear_model import LinearRegression\nfrom sklearn.models import Model\n\n# Load the data\ndf = pd.read_csv('data.csv')\n\n# Create an X and y matrix from the data\nX = df[['feature1', 'feature2']]\ny = df['target']\n\n# Split the data into training and testing sets\nX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)\n\n# Cr
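
Because the call sets `echo = True`, the `text` field starts with the prompt itself. One option is to pass `echo = False`; another is to strip the prompt afterwards, as in this sketch. `completion_only` is a hypothetical helper and the dictionary is a stand-in with the same shape as the response above.

```python
def completion_only(response: dict, prompt: str) -> str:
    """Return the generated text with the echoed prompt removed."""
    text = response["choices"][0]["text"]
    if text.startswith(prompt):
        text = text[len(prompt):]
    return text.strip()


# Stand-in dictionary shaped like the llama-cpp-python response.
fake_response = {"choices": [{"text": "USER: hi\nASSISTANT: hello there"}]}
print(completion_only(fake_response, "USER: hi\nASSISTANT:"))
```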

In [31]:
print(response['choices'][0]['text'])


    SYSTEM: You are a helpful, respectful and an honest assistant. Always answer to the best of your ability.
    
    USER: Write a linear regression code
    
    ASSISTANT:
        Sure! Here's an example of how you could write a simple linear regression code in Python using scikit-learn library:
```
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.models import Model

# Load the data
df = pd.read_csv('data.csv')

# Create an X and y matrix from the data
X = df[['feature1', 'feature2']]
y = df['target']

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

# Create a linear regression object with default parameters
reg = LinearRegression()

# Train the model on the training data
reg.fit(X_train, y_train)

# Make predictions on the testing data
y_pred = reg.predict(X_test)

# Evaluate the performance of the model using mean squared error (MSE)
mse = np.mean((y_test - y_pred) ** 2)
```


### 2. Open-Source Llama Model using LangChain

In [17]:
import torch
import warnings
import transformers
import huggingface_hub

from dotenv import load_dotenv

from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain.llms import HuggingFacePipeline

from transformers import AutoTokenizer

load_dotenv()
warnings.filterwarnings('ignore')

In [2]:
huggingface_hub.login()


In [5]:
# Meta's official checkpoints are gated, so you need to request access on Hugging Face first.
# If you don't have access, use the community-uploaded Llama 2 7B chat model in daryll_model instead.
# meta_model = "meta-llama/Meta-Llama-3-8B-Instruct"
daryll_model = "daryl149/llama-2-7b-chat-hf"

In [6]:
tokenizer = AutoTokenizer.from_pretrained(daryll_model)

You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama_fast.LlamaTokenizerFast'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565 - if you loaded a llama tokenizer from a GGUF file you can ignore this message.


In [7]:
pipeline = transformers.pipeline(
    "text-generation",
    model = daryll_model,
    tokenizer = tokenizer,
    torch_dtype = torch.bfloat16,
    trust_remote_code = True,
    device_map = "auto",
    max_length = 1000,
    do_sample = True,
    top_k = 10,
    num_return_sequences = 1,
    eos_token_id = tokenizer.eos_token_id
)

Some parameters are on the meta device device because they were offloaded to the cpu.
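
The pipeline above sets `do_sample = True` and `top_k = 10`, which restricts each sampling step to the 10 most likely next tokens. The idea can be sketched in plain Python as follows. This is a toy illustration of top-k sampling over a made-up logit table, not the `transformers` implementation.

```python
import math
import random
from typing import Optional


def top_k_sample(logits: dict, k: int, temperature: float = 1.0,
                 rng: Optional[random.Random] = None) -> str:
    """Sample one token from the k highest-scoring candidates."""
    if k <= 0 or temperature <= 0:
        raise ValueError("k and temperature must be positive")
    rng = rng or random.Random()
    # Keep only the k tokens with the largest logits.
    top = sorted(logits.items(), key=lambda kv: kv[1], reverse=True)[:k]
    # Softmax weights over the survivors (subtract the max for stability).
    max_logit = top[0][1]
    weights = [math.exp((score - max_logit) / temperature) for _, score in top]
    tokens = [token for token, _ in top]
    return rng.choices(tokens, weights=weights, k=1)[0]


# Made-up scores for five candidate tokens.
toy_logits = {"the": 2.0, "a": 1.5, "cat": 0.3, "dog": 0.1, "xylophone": -3.0}
rng = random.Random(0)
picks = {top_k_sample(toy_logits, k=2, rng=rng) for _ in range(50)}
print(picks)
```

With `k=2`, only the two highest-scoring tokens can ever be returned, however many samples are drawn.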


In [14]:
meta_hf_model = HuggingFacePipeline(
    pipeline = pipeline,
    model_kwargs = {
        'temperature': 0 # No need for creativity
    }
)

In [15]:
prompt = "What do you know about Arsenal FC?"

In [16]:
print(meta_hf_model(prompt))

What do you know about Arsenal FC?

Arsenal Football Club is an English professional football club based in Islington, London, that plays in the Premier League, the top flight of English football. The club was founded in 1886 as Dial Square and changed its name to Royal Arsenal in 1892, before being renamed as Arsenal in 1913.
Arsenal has a long and successful history, having won 13 league titles, a record 13 FA Cups, two League Cups, and 16 FA Community Shields. The club has also enjoyed success in European competitions, winning the European Cup Winners' Cup in 1994 and the UEFA Cup Winners' Cup in 1993.
Arsenal plays its home games at the Emirates Stadium, which opened in 2006 and has a seating capacity of over 60,000 spectators. The club's traditional colors are red and white, and its crest features a cannon and a bow and arrow, symbolizing the club's nickname, "The Gunners."
Arsenal has a large and dedicated fan base, with supporters clubs around the world. The club has a rivalry w

In [23]:
# Using Langchain Prompt Templates
prompt_template = PromptTemplate(
    input_variables=['football_club'],
    template="What do you know about {football_club} FC?"
)

input_prompt = prompt_template.format(football_club = "Barcelona")
print(input_prompt)

What do you know about Barcelona FC?
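
Conceptually, the template step is a string substitution with a check that the declared variables match the placeholders. The class below is a toy stand-in written for illustration; `MiniPromptTemplate` is hypothetical and is not how LangChain implements `PromptTemplate`.

```python
from string import Formatter


class MiniPromptTemplate:
    """Toy stand-in for a prompt template (not LangChain's implementation)."""

    def __init__(self, input_variables, template):
        # Collect the placeholder names that appear in the template string.
        found = {name for _, name, _, _ in Formatter().parse(template) if name}
        if found != set(input_variables):
            raise ValueError(
                f"template uses {sorted(found)}, declared {sorted(input_variables)}"
            )
        self.input_variables = list(input_variables)
        self.template = template

    def format(self, **kwargs):
        missing = set(self.input_variables) - set(kwargs)
        if missing:
            raise KeyError(f"missing variables: {sorted(missing)}")
        return self.template.format(**kwargs)


mini = MiniPromptTemplate(
    input_variables=["football_club"],
    template="What do you know about {football_club} FC?",
)
print(mini.format(football_club="Barcelona"))
```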


In [26]:
# Complete the chain and extract the results
chain = LLMChain(
    llm = meta_hf_model,
    prompt = prompt_template,
    verbose = True
)

response = chain.run("Barcelona")
print(response)



> Entering new LLMChain chain...
Prompt after formatting:
What do you know about Barcelona FC?

> Finished chain.
What do you know about Barcelona FC? Barcelona FC, also known as Barça, is a professional football club based in Barcelona, Spain. They are one of the most successful clubs in Spanish football history and have won numerous domestic and international titles. Here are some key facts about Barcelona FC:
History: Barcelona FC was founded in 1899 by a group of football enthusiasts led by Joan Gamper. The club has a rich history and has won 26 La Liga titles, 14 Copa del Rey trophies, and 5 UEFA Champions League titles.
Playing style: Barcelona has a distinct playing style that focuses on possession football, quick passing, and creative attacking play. Their philosophy emphasizes teamwork, creativity, and individual skill.
Home ground: Barcelona plays their home matches at the Camp Nou stadium, which has a seating capacity of over 99,000 spectat
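
The chain's output begins with the prompt itself, followed by the generated answer. As a rough mental model, a chain fills the template and passes the result to the model. The sketch below uses a fake model function so that it runs without any downloads; `run_chain` and `echoing_llm` are hypothetical names, not LangChain APIs.

```python
def run_chain(template: str, llm, **variables) -> str:
    """Fill the template, call the model, and return its raw output."""
    prompt = template.format(**variables)
    return llm(prompt)


def echoing_llm(prompt: str) -> str:
    """Fake model that repeats the prompt before its 'answer'."""
    return f"{prompt} (generated text would follow here)"


template = "What do you know about {football_club} FC?"
raw = run_chain(template, echoing_llm, football_club="Barcelona")

# Drop the echoed prompt so that only the answer remains.
prompt = template.format(football_club="Barcelona")
answer = raw[len(prompt):].strip() if raw.startswith(prompt) else raw
print(answer)
```

With a real `transformers` text-generation pipeline, passing `return_full_text=False` is another way to get only the newly generated text.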