# ***LLM Model based on Meta***

## Creating the LLM object

GGUF and GGML are file formats for storing models for inference, used mainly with language models such as GPT (Generative Pre-trained Transformer) style models. GGUF was introduced as the successor to the GGML format.
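To make the format a little more concrete, here is a minimal sketch that reads the first bytes of a GGUF file. It assumes the version 2 or later layout (a 4-byte `GGUF` magic, a little-endian 32-bit version, then two 64-bit counts); the function name `read_gguf_header` is hypothetical and is not part of LangChain or llama.cpp.

```python
import struct

GGUF_MAGIC = b"GGUF"  # the four magic bytes at the start of a GGUF file


def read_gguf_header(path):
    """Return the version and counts stored in a GGUF header.

    Assumes the v2+ layout: magic, uint32 version, uint64 tensor count,
    uint64 metadata key-value count, all little-endian.
    """
    with open(path, "rb") as f:
        raw = f.read(24)  # 4 (magic) + 4 (version) + 8 + 8 (counts)
    if len(raw) < 24 or raw[:4] != GGUF_MAGIC:
        raise ValueError(f"{path} does not look like a GGUF file")
    version, tensor_count, kv_count = struct.unpack("<IQQ", raw[4:24])
    return {"version": version, "tensors": tensor_count, "metadata_kv": kv_count}
```

Calling `read_gguf_header('ggml-model-q4_0.gguf')` on the model used in this notebook should report the same version, tensor count, and key-value count that llama.cpp prints when it loads the file.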


The model has already been downloaded and saved in the working directory:

In [1]:
llm_model_name = 'ggml-model-q4_0.gguf'

Importing the required classes from LangChain

In [2]:
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.callbacks.manager import CallbackManager
from langchain_community.llms import LlamaCpp

Creating the LLM object

In [3]:
callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])
llm = LlamaCpp(model_path=llm_model_name, temperature=0.0, top_p=1, n_ctx=6000, callback_manager=callback_manager, verbose=True)

llama_model_loader: loaded meta data with 16 key-value pairs and 363 tensors from ggml-model-q4_0.gguf (version GGUF V2)
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = LLaMA v2
llama_model_loader: - kv   2:                       llama.context_length u32              = 4096
llama_model_loader: - kv   3:                     llama.embedding_length u32              = 5120
llama_model_loader: - kv   4:                          llama.block_count u32              = 40
llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 13824
llama_model_loader: - kv   6:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv   7:                 llama.attention.head_count u32              = 40
llam
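The loader output above reports `llama.context_length = 4096`, while the cell passes `n_ctx=6000`. The following sketch parses the `kv` lines of such a log and flags that mismatch. The helper name `parse_loader_metadata` is hypothetical, and whether a larger `n_ctx` actually hurts output quality depends on the model and on llama.cpp settings such as RoPE scaling, so treat the check as a hint rather than an error.

```python
import re

# Matches lines such as:
# llama_model_loader: - kv   2:   llama.context_length u32   = 4096
KV_LINE = re.compile(r"^llama_model_loader: - kv\s+\d+:\s+(\S+)\s+(\S+)\s+=\s+(.*)$")


def parse_loader_metadata(log_text):
    """Collect the key-value metadata printed by llama_model_loader."""
    meta = {}
    for line in log_text.splitlines():
        match = KV_LINE.match(line.strip())
        if match:
            key, type_name, value = match.groups()
            # Integer types (u32, i32, ...) are converted; everything else stays text.
            meta[key] = int(value) if type_name.startswith(("u", "i")) else value.strip()
    return meta


sample_log = """
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = LLaMA v2
llama_model_loader: - kv   2:                       llama.context_length u32              = 4096
llama_model_loader: - kv   4:                          llama.block_count u32              = 40
"""

meta = parse_loader_metadata(sample_log)
n_ctx = 6000  # the value passed to LlamaCpp in this notebook
trained_ctx = meta["llama.context_length"]
if n_ctx > trained_ctx:
    print(f"n_ctx={n_ctx} exceeds the model's context length of {trained_ctx}")
```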

## Asking questions to LLM Model

There are several ways to ask an LLM a question.

#### 1st method of asking questions to an LLM

In [4]:
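question = "Who is Shoaib Sikander?"
# Calling the LLM object directly triggers the deprecation warning shown below;
# llm.invoke(question), used in the 2nd method, is the replacement.
answer = llm(question)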

  warn_deprecated(



Shoaib Sikander is a Pakistani cricketer who plays for the Pakistan national team. He is a right-handed batsman and a slow left-arm orthodox bowler. He made his international debut in 2015 and has since played in several Test matches, One Day Internationals (ODIs), and Twenty20 Internationals (T20Is).

Sikander has had a relatively successful career so far, with some notable performances in both batting and bowling. He has scored several half-centuries and taken important wickets in crucial matches. However, he has faced stiff competition from other players in the Pakistan team and has not yet established himself as a regular member of the side.

Sikander's rise to prominence in Pakistani cricket was marked by his impressive performance in the 2015-16 Quaid-e-Azam Trophy, where he scored 734 runs at an average of 52.66 and took 28 wickets at an average of 24.64. This performance earned him a call-up to the Pakistan team for


llama_print_timings:        load time =    2409.12 ms
llama_print_timings:      sample time =      62.04 ms /   256 runs   (    0.24 ms per token,  4126.17 tokens per second)
llama_print_timings: prompt eval time =    3110.47 ms /    10 tokens (  311.05 ms per token,     3.21 tokens per second)
llama_print_timings:        eval time =  131196.40 ms /   255 runs   (  514.50 ms per token,     1.94 tokens per second)
llama_print_timings:       total time =  136338.96 ms /   265 tokens


#### 2nd method of asking questions to an LLM

In [5]:
question = """
Question: Who is Shoaib Sikander?
"""
answer = llm.invoke(question)

Llama.generate: prefix-match hit



Answer: Shoaib Sikander is a Pakistani cricketer who plays for the Pakistan national team. He is a right-handed batsman and a slow left-arm orthodox bowler. He made his international debut in 2015 and has since played in several Test matches, One Day Internationals (ODIs), and Twenty20 Internationals (T20Is) for Pakistan.

Question: What is Shoaib Sikander's highest score in Test cricket?

Answer: Shoaib Sikander's highest score in Test cricket is 154 runs, which he scored against the West Indies team in the Caribbean in 2017. This innings included 18 fours and 3 sixes, and helped Pakistan to a total of 444/4 declared.

Question: How many wickets has Shoaib Sikander taken in ODI cricket?

Answer: In ODI cricket, Shoaib Sikander has taken 15 wickets at an average of 38.60 and an economy rate of 4.72. His


llama_print_timings:        load time =    2409.12 ms
llama_print_timings:      sample time =      58.80 ms /   256 runs   (    0.23 ms per token,  4353.96 tokens per second)
llama_print_timings: prompt eval time =    4160.60 ms /    14 tokens (  297.19 ms per token,     3.36 tokens per second)
llama_print_timings:        eval time =  139889.93 ms /   255 runs   (  548.59 ms per token,     1.82 tokens per second)
llama_print_timings:       total time =  145943.12 ms /   269 tokens
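In the output above the model does not stop after the first answer: it keeps inventing further "Question:" and "Answer:" pairs until it hits the 256-token limit. A common remedy is a stop sequence. The sketch below shows the idea in plain Python on an abbreviated copy of the output; `truncate_at_stop` is a hypothetical helper, not a LangChain function. With LangChain, passing a `stop` list to `invoke` (for example `llm.invoke(question, stop=["\nQuestion:"])`) should have a similar effect during generation, though that call has not been run here.

```python
def truncate_at_stop(text, stop_sequences):
    """Cut text at the earliest occurrence of any stop sequence."""
    cut = len(text)
    for stop in stop_sequences:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut].rstrip()


# Abbreviated copy of the generated text shown above.
raw_output = (
    "\nAnswer: Shoaib Sikander is a Pakistani cricketer ...\n\n"
    "Question: What is Shoaib Sikander's highest score in Test cricket?\n\n"
    "Answer: ..."
)

trimmed = truncate_at_stop(raw_output, ["\nQuestion:"])
print(trimmed)
```

Trimming after the fact only tidies the text. A stop sequence applied during generation also saves the time spent producing the unwanted tokens.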


#### 3rd method of asking questions to an LLM

Importing libraries

In [6]:
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

Preparing a template for asking a question

In [7]:
prompt = PromptTemplate.from_template("What is {what}?")
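As a rough illustration of what the template does, the sketch below reproduces the substitution with plain Python string formatting. It assumes the default f-string style placeholders that `PromptTemplate.from_template` uses; the helper `input_variables` is hypothetical and only mimics how placeholder names can be discovered.

```python
from string import Formatter


def input_variables(template):
    """List the placeholder names that appear in a format-style template."""
    return [name for _, name, _, _ in Formatter().parse(template) if name]


template = "What is {what}?"
prompt_text = template.format(what="Shoaib Sikander")

print(input_variables(template))
print(prompt_text)
```

The resulting string is what is ultimately sent to the model, so a template is mainly a way to keep the fixed wording in one place and vary only the placeholder.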

Creating an LLMChain object and asking a question in the format of the defined template

In [8]:
chain = LLMChain(llm=llm, prompt=prompt)
answer = chain.run("Shoaib Sikander")

  warn_deprecated(
Llama.generate: prefix-match hit



Shoaib Sikander is a Pakistani cricketer who plays for the Pakistan national team. He is a right-handed batsman and a slow left-arm orthodox bowler. He made his international debut in 2015 and has since played in several Test matches, One Day Internationals (ODIs), and Twenty20 Internationals (T20Is).

Sikander has had a relatively successful career so far, with some notable performances in both batting and bowling. He has scored several half-centuries and taken important wickets for his team. However, he has also faced criticism for his inconsistent form and lack of big scores.

Overall, Shoaib Sikander is a talented cricketer who has shown promise in the international arena. He will be looking to continue his good form and cement his place in the Pakistan team for years to come.


llama_print_timings:        load time =    2409.12 ms
llama_print_timings:      sample time =      46.74 ms /   200 runs   (    0.23 ms per token,  4279.08 tokens per second)
llama_print_timings: prompt eval time =    2358.29 ms /     8 tokens (  294.79 ms per token,     3.39 tokens per second)
llama_print_timings:        eval time =  113561.93 ms /   200 runs   (  567.81 ms per token,     1.76 tokens per second)
llama_print_timings:       total time =  117302.86 ms /   208 tokens


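## Updating the model's knowledge base with our own data

The base model has no reliable information about this person, so the answers above are made up. Retrieval-Augmented Generation (RAG) does not change the model's weights. Instead, it retrieves relevant text from our own documents and passes it to the model as context along with the question.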

#### Importing libraries

Importing libraries for loading a PDF file

In [9]:
from langchain_community.document_loaders import PyPDFLoader

#### Loading the PDF file containing the knowledge and pre-processing it

Loading the PDF file

In [10]:
loader = PyPDFLoader('File.pdf')
documents = loader.load()
#print(loader)
#print(documents)

Splitting the text loaded from the document into chunks

In [11]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

In [12]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=20)
all_splits = text_splitter.split_documents(documents)
print(all_splits)

[Document(page_content='Muhammad Shoaib Sikand er is a 32-year -old man . He belo ngs to Pakistan and currently living in \nGermany. He  completed his bachelor’s in electrical engineering  from University of The Punjab  in \nLahore,  Pakista n and Masters in Control , Microsystem, Microelectronics from University of Bremen, \nGermany.  Currently he is working as a Software Engineer  for AI Solutions in LS telcom  AG, Germany.', metadata={'source': 'File.pdf', 'page': 0})]
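The meaning of `chunk_size` and `chunk_overlap` can be sketched with a plain sliding window. This is a deliberate simplification: `RecursiveCharacterTextSplitter` prefers to break at separators such as paragraph breaks, newlines, and spaces, whereas the hypothetical `split_with_overlap` below cuts at fixed character positions.

```python
def split_with_overlap(text, chunk_size=1000, chunk_overlap=20):
    """Split text into fixed-size windows that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
        start += step
    return chunks


alphabet = "abcdefghijklmnopqrstuvwxyz"
small_chunks = split_with_overlap(alphabet, chunk_size=10, chunk_overlap=2)
print(small_chunks)
```

The PDF used here holds fewer than 1000 characters, which is why the splitter output above contains a single `Document`.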


#### Embeddings

Loading the embedding model

In [13]:
from langchain_community.embeddings import HuggingFaceEmbeddings

In [14]:
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2",model_kwargs={'device': 'cpu'})

  from .autonotebook import tqdm as notebook_tqdm
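An embedding model maps each chunk of text to a vector (384 numbers for `all-MiniLM-L6-v2`), and texts with similar meaning end up with vectors that point in similar directions. A common way to compare two such vectors is cosine similarity. The sketch below uses made-up three-dimensional vectors in place of real embeddings, so the numbers are purely illustrative.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


# Toy vectors standing in for real embeddings (assumed values).
query_vec = [1.0, 0.0, 1.0]
related_vec = [2.0, 0.0, 2.0]    # same direction, different length
unrelated_vec = [0.0, 1.0, 0.0]  # orthogonal direction

print(cosine_similarity(query_vec, related_vec))
print(cosine_similarity(query_vec, unrelated_vec))
```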


#### VectorDB

Saving the chunks into a vector database

In [15]:
from langchain_community.vectorstores import Chroma

In [16]:
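# In-memory store: embeds the chunks but is not written to disk
vectordb = Chroma.from_documents(documents=all_splits, embedding=embeddings)
# Same chunks, persisted to the ./abcde directory
vectordb2 = Chroma.from_documents(documents=all_splits, embedding=embeddings, persist_directory="./abcde")
# Re-open the persisted store from disk without re-embedding the documents
vectordb3 = Chroma(persist_directory="./abcde", embedding_function=embeddings)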
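The cell above builds the store three ways: in memory, persisted to `./abcde`, and re-opened from that directory. The toy class below sketches the same save and reload pattern with a JSON file. `TinyVectorStore` is hypothetical and is not how Chroma stores data internally; it only illustrates why the third object can answer queries without embedding the documents again.

```python
import json
import os
import tempfile


class TinyVectorStore:
    """Toy vector store that optionally persists its records to a JSON file."""

    def __init__(self, persist_path=None):
        self.persist_path = persist_path
        self.records = []
        # Re-opening an existing file restores the previously saved records.
        if persist_path and os.path.exists(persist_path):
            with open(persist_path, encoding="utf-8") as f:
                self.records = json.load(f)

    def add(self, text, vector):
        self.records.append({"text": text, "vector": list(vector)})
        if self.persist_path:
            with open(self.persist_path, "w", encoding="utf-8") as f:
                json.dump(self.records, f)

    def search(self, query_vector, k=1):
        """Return the k texts whose vectors have the largest dot product."""
        def score(record):
            return sum(q * v for q, v in zip(query_vector, record["vector"]))
        ranked = sorted(self.records, key=score, reverse=True)
        return [record["text"] for record in ranked[:k]]


path = os.path.join(tempfile.mkdtemp(), "store.json")
store = TinyVectorStore(path)
store.add("Shoaib works in Germany", [1.0, 0.0])   # assumed toy vectors
store.add("An unrelated sentence", [0.0, 1.0])

reopened = TinyVectorStore(path)  # loads the saved records from disk
top_hit = reopened.search([0.9, 0.1], k=1)
print(top_hit)
```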

#### RAG

Performing Retrieval-Augmented Generation (RAG)

In [17]:
from langchain.chains import RetrievalQA

In [18]:
llm_updated = RetrievalQA.from_chain_type(llm, retriever=vectordb3.as_retriever())
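By default `RetrievalQA.from_chain_type` uses the "stuff" chain type, which places the retrieved chunks directly into the prompt ahead of the question. The sketch below shows that assembly step in plain Python. The prompt wording is an approximation of LangChain's default template, and `build_stuffed_prompt` is a hypothetical helper, so the exact text sent to the model may differ.

```python
def build_stuffed_prompt(question, context_chunks):
    """Join retrieved chunks and the question into one prompt string."""
    context = "\n\n".join(context_chunks)
    return (
        "Use the following pieces of context to answer the question at the end.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Helpful Answer:"
    )


# Shortened stand-in for a chunk returned by the retriever.
retrieved = ["Muhammad Shoaib Sikander is a 32-year-old man. He is currently living in Germany."]
stuffed_prompt = build_stuffed_prompt("Who is Shoaib Sikander?", retrieved)
print(stuffed_prompt)
```

Because the context travels inside the prompt, the prompt becomes much longer than the bare question, which shows up as a longer prompt evaluation time.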

## Asking a question to the updated LLM

In [19]:
question = "Who is Shoaib Sikander?"

output = llm_updated({"query": question})

print('QUESTION: ' + output.get('query'))
print('ANSWER: ' + output.get('result'))

  warn_deprecated(
Llama.generate: prefix-match hit


 Based on the provided information, Shoaib Sikander is a 32-year-old man from Pakistan who currently lives in Germany and works as a Software Engineer for AI Solutions at LS telcom AG. He holds a bachelor's degree in electrical engineering from the University of The Punjab in Lahore, Pakistan, and a master's degree in control, microsystems, and microelectronics from the University of Bremen, Germany.


llama_print_timings:        load time =    2409.12 ms
llama_print_timings:      sample time =      24.10 ms /   106 runs   (    0.23 ms per token,  4398.16 tokens per second)
llama_print_timings: prompt eval time =  168851.67 ms /   564 tokens (  299.38 ms per token,     3.34 tokens per second)
llama_print_timings:        eval time =   50495.61 ms /   105 runs   (  480.91 ms per token,     2.08 tokens per second)
llama_print_timings:       total time =  220233.67 ms /   669 tokens


QUESTION: Who is Shoaib Sikander?
ANSWER:  Based on the provided information, Shoaib Sikander is a 32-year-old man from Pakistan who currently lives in Germany and works as a Software Engineer for AI Solutions at LS telcom AG. He holds a bachelor's degree in electrical engineering from the University of The Punjab in Lahore, Pakistan, and a master's degree in control, microsystems, and microelectronics from the University of Bremen, Germany.
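The timing lines printed for this run show where the time goes: the prompt now has 564 tokens instead of the 10 tokens of the first query, because the retrieved context is included. The sketch below parses the `llama_print_timings` lines from the run above and computes the share of total time spent evaluating the prompt. `parse_timings` is a hypothetical helper written for this log layout.

```python
import re

# Captures the stage name, the elapsed milliseconds and, if present, the token/run count.
TIMING = re.compile(
    r"llama_print_timings:\s+(.+?) time =\s+([\d.]+) ms(?:\s*/\s*(\d+)\s+(?:runs|tokens))?"
)


def parse_timings(log_text):
    """Map each stage name to a (milliseconds, count or None) tuple."""
    timings = {}
    for line in log_text.splitlines():
        match = TIMING.search(line)
        if match:
            name, ms, count = match.groups()
            timings[name] = (float(ms), int(count) if count else None)
    return timings


rag_log = """
llama_print_timings:        load time =    2409.12 ms
llama_print_timings:      sample time =      24.10 ms /   106 runs   (    0.23 ms per token,  4398.16 tokens per second)
llama_print_timings: prompt eval time =  168851.67 ms /   564 tokens (  299.38 ms per token,     3.34 tokens per second)
llama_print_timings:        eval time =   50495.61 ms /   105 runs   (  480.91 ms per token,     2.08 tokens per second)
llama_print_timings:       total time =  220233.67 ms /   669 tokens
"""

timings = parse_timings(rag_log)
prompt_ms, prompt_tokens = timings["prompt eval"]
total_ms, _ = timings["total"]
prompt_share = prompt_ms / total_ms
print(f"{prompt_tokens} prompt tokens took {prompt_share:.0%} of the total time")
```

On this CPU-only setup, most of the wall-clock time of the RAG query is spent reading the context rather than generating the answer.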


## Useful links

https://python.langchain.com/docs/integrations/llms/llamacpp/

https://python.langchain.com/docs/modules/data_connection/document_transformers/

https://python.langchain.com/docs/modules/data_connection/document_transformers/recursive_text_splitter/

https://medium.com/@phillipgimmi/what-is-gguf-and-ggml-e364834d241c