#### With llama-cpp-python natively

In [1]:
from llama_cpp import Llama
import json


In [2]:
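# Configuration notes:
# - CPU-only: leave out n_gpu_layers (it defaults to 0, so nothing is offloaded).
# - With GPU: n_gpu_layers sets how many layers are offloaded to the GPU; any
#   value at least as large as the model's layer count offloads the whole model.
# - max_tokens is not a constructor argument. Pass it when calling the model,
#   e.g. llm(prompt, max_tokens=512); otherwise the call's default limit applies
#   (the 128 runs in the timings below are consistent with a default of 128).

llm = Llama(
    model_path="./llama-2-13b-chat.Q4_K_M.gguf",
    n_gpu_layers=8192,  # larger than the layer count: offload every layer
    n_batch=512,        # max prompt tokens processed per batch
    n_ctx=2048,         # context window (prompt + generated tokens)
    f16_kv=True,        # keep the KV cache in half precision
)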

ggml_init_cublas: GGML_CUDA_FORCE_MMQ:   no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 1 CUDA devices:
  Device 0: NVIDIA TITAN RTX, compute capability 7.5
llama_model_loader: loaded meta data with 19 key-value pairs and 363 tensors from ./llama-2-13b-chat.Q4_K_M.gguf (version GGUF V2)
llama_model_loader: - tensor    0:                token_embd.weight q4_K     [  5120, 32000,     1,     1 ]
llama_model_loader: - tensor    1:           blk.0.attn_norm.weight f32      [  5120,     1,     1,     1 ]
llama_model_loader: - tensor    2:            blk.0.ffn_down.weight q6_K     [ 13824,  5120,     1,     1 ]
llama_model_loader: - tensor    3:            blk.0.ffn_gate.weight q4_K     [  5120, 13824,     1,     1 ]
llama_model_loader: - tensor    4:              blk.0.ffn_up.weight q4_K     [  5120, 13824,     1,     1 ]
llama_model_loader: - tensor    5:            blk.0.ffn_norm.weight f32      [  5120,     1,     1,     1 ]
llama_model_loader: - tensor    6:     

In [3]:
prompt = "Tell me about the Roman Empire?"


In [5]:
reply = ""
for token in llm(prompt, stream=True, echo=False):
    item = token["choices"][0]["text"]
    reply += item
    print(item, end="", flush=True)  # show each piece as soon as it arrives


Llama.generate: prefix-match hit

I'd be happy to tell you about the Roman Empire! The Roman Empire was a vast and powerful state that lasted for over a thousand years, from 27 BC to 476 AD. It was one of the most influential civilizations in human history, leaving a lasting legacy in fields such as law, architecture, engineering, art, literature, and religion.

Here are some key facts about the Roman Empire:

1. Rise of the Roman Empire: The Roman Empire began as a small city-state in central Italy, but it gradually expanded to include much of Europe, North


llama_print_timings:        load time =     101.96 ms
llama_print_timings:      sample time =      45.68 ms /   128 runs   (    0.36 ms per token,  2801.92 tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     1 tokens (    0.00 ms per token,      inf tokens per second)
llama_print_timings:        eval time =    2422.39 ms /   128 runs   (   18.92 ms per token,    52.84 tokens per second)
llama_print_timings:       total time =    2753.70 ms
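Each `llama_print_timings` line reports a total time and a count of runs or tokens. Latency per token is total / count and throughput is 1000 / latency, so for the eval line 2422.39 ms / 128 runs ≈ 18.92 ms per token ≈ 52.84 tokens per second. A small parsing sketch for comparing runs programmatically; the regular expression assumes exactly the line layout printed above and is not part of any llama.cpp API.

```python
import re

# Matches "= <total> ms / <count> runs|tokens" inside a llama_print_timings line.
TIMING_RE = re.compile(r"=\s*([\d.]+)\s*ms\s*/\s*(\d+)\s*(?:runs|tokens)")


def throughput(line: str) -> tuple[float, float]:
    """Return (ms per token, tokens per second) for one timing line."""
    match = TIMING_RE.search(line)
    if match is None:
        raise ValueError(f"not a per-token timing line: {line!r}")
    total_ms, count = float(match.group(1)), int(match.group(2))
    ms_per_token = total_ms / count
    # llama.cpp prints "inf" when the measured time is zero; mirror that.
    tokens_per_s = 1000.0 / ms_per_token if ms_per_token > 0 else float("inf")
    return ms_per_token, tokens_per_s


line = ("llama_print_timings:        eval time =    2422.39 ms /   128 runs"
        "   (   18.92 ms per token,    52.84 tokens per second)")
ms, tps = throughput(line)
print(f"{ms:.2f} ms/token, {tps:.2f} tokens/s")  # 18.92 ms/token, 52.84 tokens/s
```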


### With LangChain

In [1]:
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.chains import LLMChain
from langchain.llms import LlamaCpp
from langchain.prompts import PromptTemplate

In [2]:
template = """Question: {question}

Answer: Let's work this out in a step by step way to be sure we have the right answer."""

prompt = PromptTemplate(template=template, input_variables=["question"])
# Callbacks support token-wise streaming
callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])
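With its default f-string format, `PromptTemplate` fills `{question}` the same way Python's `str.format` does. The sketch below reproduces that substitution in plain Python to preview the text the model would receive. Note that the later cells call `llm` on plain strings, so this template only takes effect if it is actually formatted first (for example with `PromptTemplate.format(question=...)` or through an `LLMChain`); the helper name `render` is hypothetical.

```python
TEMPLATE = """Question: {question}

Answer: Let's work this out in a step by step way to be sure we have the right answer."""


def render(template: str, **variables: str) -> str:
    """Fill {placeholders} the way an f-string-style prompt template does."""
    return template.format(**variables)


filled = render(TEMPLATE, question="Tell me about the Roman Empire")
print(filled)
```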

In [4]:
llm = LlamaCpp(
    model_path="./llama-2-13b-chat.Q4_K_M.gguf",
    temperature=0.75,
    max_tokens=2000,
    top_p=1,
    callback_manager=callback_manager,
    verbose=True,  # Verbose is required to pass to the callback manager
)

llama_model_loader: loaded meta data with 19 key-value pairs and 363 tensors from ./llama-2-13b-chat.Q4_K_M.gguf (version GGUF V2)
llama_model_loader: - tensor    0:                token_embd.weight q4_K     [  5120, 32000,     1,     1 ]
llama_model_loader: - tensor    1:           blk.0.attn_norm.weight f32      [  5120,     1,     1,     1 ]
llama_model_loader: - tensor    2:            blk.0.ffn_down.weight q6_K     [ 13824,  5120,     1,     1 ]
llama_model_loader: - tensor    3:            blk.0.ffn_gate.weight q4_K     [  5120, 13824,     1,     1 ]
llama_model_loader: - tensor    4:              blk.0.ffn_up.weight q4_K     [  5120, 13824,     1,     1 ]
llama_model_loader: - tensor    5:            blk.0.ffn_norm.weight f32      [  5120,     1,     1,     1 ]
llama_model_loader: - tensor    6:              blk.0.attn_k.weight q4_K     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor    7:         blk.0.attn_output.weight q4_K     [  5120,  5120,     1,     1 ]
llama

In [14]:
prompt = """
Tell me about the roman empire
"""
llm(prompt)


Llama.generate: prefix-match hit

The Roman Empire was a powerful and influential civilization that lasted for over a thousand years. At its peak, it covered much of Europe, North Africa, and parts of Asia, and was home to a diverse population of millions of people. Here are some key facts about the Roman Empire:

1. Origins: The Roman Empire began as a small city-state in Italy called Rome, which was founded in 753 BC. Over time, Rome grew in power and influence, eventually becoming an empire that spanned across three continents.
2. Expansion: The Roman Empire expanded rapidly, conquering much of the Mediterranean region and beyond. At its peak, it included territories in Europe, North Africa, and parts of Asia, with a total population of around 50-60 million people.
3. Government: The Roman Empire was ruled by an autocratic government, with the emperor holding absolute power. The empire was divided into provinces, each governed by a Roman governor appointed by the emperor.
4. Military: The Roman military was one of the most p


llama_print_timings:        load time =     393.18 ms
llama_print_timings:      sample time =     212.75 ms /   604 runs   (    0.35 ms per token,  2839.01 tokens per second)
llama_print_timings: prompt eval time =     136.64 ms /     9 tokens (   15.18 ms per token,    65.87 tokens per second)
llama_print_timings:        eval time =   11654.55 ms /   603 runs   (   19.33 ms per token,    51.74 tokens per second)
llama_print_timings:       total time =   13564.62 ms



### Introduce a Knowledge Base in the Form of Documents

In [3]:
import os

# Put the PDF files that form the knowledge base into a local folder (here: main_data/).
DOCS_PATH = "main_data/"

# Collect the paths of all PDF files in that folder
files = os.listdir(DOCS_PATH)
pdf_files = [os.path.join(DOCS_PATH, file) for file in files if file.endswith(".pdf")]
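`os.listdir` returns entries in arbitrary order and `endswith(".pdf")` is case-sensitive, so a file named `Report.PDF` would be skipped. A `pathlib` sketch that matches the extension case-insensitively, ignores directories and sorts the result so runs are reproducible; the helper name `list_pdfs` is hypothetical, and in the notebook it would be used as `pdf_files = list_pdfs(DOCS_PATH)`.

```python
from pathlib import Path


def list_pdfs(folder: str) -> list[str]:
    """Return the PDF files directly inside `folder`, sorted by path."""
    return sorted(
        str(path)
        for path in Path(folder).iterdir()
        # is_file() skips directories such as "scans.pdf/"; suffix check is case-insensitive
        if path.is_file() and path.suffix.lower() == ".pdf"
    )
```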

In [4]:
pdf_files

['main_data/Wahlpflichtmodule_Master-Informatik_WS23_Stand_21-Sep-2023.pdf',
 'main_data/119_ZuSMa_Senat_18012022.pdf',
 'main_data/Infoveranstaltung_Masterstudiengaenge-Informatik_HTWG-Konstanz.pdf',
 'main_data/124_SPOMa_AT_Senat_08112022.pdf',
 'main_data/Modulhandbuch_MSI_SS23_Stand_10-Jan-2023.pdf',
 'main_data/SPO_MSI_SPONr5_Senat_10122019.pdf']

In [5]:
from haystack.nodes import PDFToTextConverter

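# Drop mostly-numeric table rows; valid_languages flags extracted text that does
# not look German (the source PDFs are German), a hint at encoding problems.
converter_pdf = PDFToTextConverter(remove_numeric_tables=True, valid_languages=["de"])

# convert() returns a list of Documents; keep the one Document produced per file.
doc_pdf_all = [converter_pdf.convert(file_path=path, meta=None)[0] for path in pdf_files]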
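`remove_numeric_tables=True` asks the converter to drop table rows that consist mostly of numbers, since they add little for keyword retrieval. A plain-Python sketch of a digit-ratio heuristic of that kind; the 40% threshold and the rule of keeping lines that end with a period (so numeric sentences survive) are assumptions of this sketch, not details taken from the source.

```python
def drop_numeric_rows(text: str, max_numeric_share: float = 0.4) -> str:
    """Remove lines whose words are mostly numeric, keeping sentence-like lines."""
    kept = []
    for line in text.splitlines():
        words = line.split()
        # A word counts as numeric if it contains at least one digit.
        numeric = [w for w in words if any(ch.isdigit() for ch in w)]
        is_table_row = (
            bool(words)
            and len(numeric) / len(words) > max_numeric_share
            and not line.strip().endswith(".")
        )
        if not is_table_row:
            kept.append(line)
    return "\n".join(kept)


sample = "Module ECTS Semester\nIT Security 6 1\nThe program comprises 90 ECTS credits."
cleaned = drop_numeric_rows(sample)
print(cleaned)
```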

In [6]:
from haystack.nodes import PreProcessor

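# Clean the extracted text and split it into chunks of about 500 words.
# With split_respect_sentence_boundary=True sentences are never cut in half, and
# split_length directly controls how much text each retrieved chunk adds to the prompt.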
preprocessor = PreProcessor(
    clean_empty_lines=True,
    clean_whitespace=True,
    clean_header_footer=True,
    split_by="word",
    split_length=500,
    split_respect_sentence_boundary=True,
)
docs_default = preprocessor.process(doc_pdf_all)

Preprocessing: 100%|████████████████████████████| 6/6 [00:00<00:00, 20.44docs/s]
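Splitting by word count while respecting sentence boundaries means whole sentences are packed into a chunk until adding the next one would exceed `split_length`. A plain-Python sketch of that packing; the regex sentence splitter is a simplifying assumption (Haystack's `PreProcessor` uses a proper sentence tokenizer), and `split_by_words` is a hypothetical name.

```python
import re


def split_by_words(text: str, split_length: int = 500) -> list[str]:
    """Group whole sentences into chunks of at most `split_length` words.

    A single sentence longer than `split_length` becomes its own oversized
    chunk instead of being cut in half.
    """
    # Naive sentence split: ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current, current_len = [], [], 0
    for sentence in sentences:
        n_words = len(sentence.split())
        # Start a new chunk if this sentence would push the current one past the limit.
        if current and current_len + n_words > split_length:
            chunks.append(" ".join(current))
            current, current_len = [], 0
        current.append(sentence)
        current_len += n_words
    if current:
        chunks.append(" ".join(current))
    return chunks


chunks = split_by_words("One two three. Four five. Six seven eight nine.", split_length=5)
print(chunks)  # ['One two three. Four five.', 'Six seven eight nine.']
```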


In [7]:
from haystack.document_stores import InMemoryDocumentStore

# Set up an in-memory document store with a BM25 index and add the chunks
document_store = InMemoryDocumentStore(use_bm25=True)
document_store.write_documents(docs_default)

document_store

Updating BM25 representation...: 100%|█████| 84/84 [00:00<00:00, 2706.38 docs/s]


<haystack.document_stores.memory.InMemoryDocumentStore at 0x7f52985e6cb0>

In [8]:
from haystack.nodes import BM25Retriever

# Initialize Retriever
bm25_retriever = BM25Retriever(document_store=document_store)
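`BM25Retriever` ranks chunks with BM25, a lexical score that rewards query terms which are frequent in a chunk but rare across the collection, normalized by chunk length. A from-scratch sketch of Okapi BM25 to show what that ranking does; the regex tokenizer, the values k1=1.5 and b=0.75, and the `log(... + 1)` IDF variant are assumptions of this sketch, and Haystack's in-memory store may tokenize and weight terms differently. Because matching is on exact tokens, a query term that never occurs in a chunk contributes nothing, so a misspelled term is effectively ignored.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-case word tokens (a simple regex tokenizer, assumed here)."""
    return re.findall(r"\w+", text.lower())


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Okapi BM25 score of every document in `docs` for `query`."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            length_norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / length_norm
        scores.append(score)
    return scores


sample_docs = [
    "Admission requirements: a bachelor's degree in computer science.",
    "The module handbook lists all elective modules.",
    "Exam regulations for the master's program.",
]
scores = bm25_scores("admission requirements master", sample_docs)
best = max(range(len(sample_docs)), key=scores.__getitem__)
print(best, [round(s, 3) for s in scores])
```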

In [9]:
query = "Was sind die Zulassungsvoraussetzungen für ein Master Studium?"  # "What are the admission requirements for a Master's program?"

In [10]:
from haystack.pipelines import DocumentSearchPipeline

# DocumentSearchPipeline wraps only the retriever: it returns the top-k matching
# chunks and does not extract answers (there is no reader node in this pipeline).
search_pipeline = DocumentSearchPipeline(bm25_retriever)

docs = search_pipeline.run(query=query, params={"Retriever": {"top_k": 3}})
docs

{'documents': [<Document: {'content': 'Hochschule Konstanz | Brauneggerstr. 55 | 78462 Konstanz | www.htwg-konstanz.de\nHerzlich willkommen bei der\nInformationsveranstaltung\nMaster Informatik (MSI) und\nBusiness Information Technology (BIT)\nan der HTWG Konstanz\x0cHochschule Konstanz\nZugangsvoraussetzungen MSI\n▪\nBachelor- oder Diplom-Abschluss in Informatik oder einem\nverwandten Studium (z.B. WIN oder GIB),\n•\nmindestens 60 ECTS in Fächern aus dem Bereich Informatik\n▪\nAbschlussnote 2,4 oder besser (harte Grenze)\nInformatik (M. Sc.)\x0cHochschule Konstanz\nZugangsvoraussetzungen BIT\n▪\nGrundständiges Hochschulstudium (mind. 180 ECTS) der Informatik\noder Betriebswirtschaftslehre oder verwandter Studiengang\n•\nerster Studienabschluss in Informatik (mind. 100 ECTS) in\no\na) Grundlagen der Informatik, b) Programmierung, c) Algorithmen und Datenstrukturen, d) Datenbanken\n•\nerster Studienabschluss in Betriebswirtschaftslehre (mind. 100 ECTS) in\no\na) Grundlagen der Betriebsw

In [11]:
input = f"""
Answer the question based solely on the given documents and 
by grouping the given documents into 3 bullet points.
Documents:
{str(docs)}

Question: {query}
"""
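Interpolating `str(docs)` places the entire result dictionary into the prompt, including `Document` reprs, ids, metadata and escape sequences such as `\x0c`, all of which consume context tokens. A sketch that keeps only the text of each retrieved chunk and numbers it; in the notebook its input would be something like `[d.content for d in docs["documents"]]` (Haystack `Document` objects expose their text as `.content`), and the helper name `build_prompt` is hypothetical.

```python
def build_prompt(contents: list[str], question: str) -> str:
    """Assemble a retrieval-augmented prompt from plain document texts."""
    numbered = "\n\n".join(
        f"[{i}] {text.strip()}" for i, text in enumerate(contents, start=1)
    )
    return (
        "Answer the question based solely on the given documents, "
        "in 3 bullet points.\n"
        f"Documents:\n{numbered}\n\n"
        f"Question: {question}\n"
    )


prompt_text = build_prompt(
    ["Bachelor's degree in computer science.", "Final grade of 2.4 or better."],
    "What are the admission requirements?",
)
print(prompt_text)
```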

In [12]:

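llm = LlamaCpp(
    model_path="./llama-2-13b-chat.Q4_K_M.gguf",
    temperature=0.75,
    max_tokens=5000,  # upper bound on generated tokens; prompt + output must fit in n_ctx
    n_gpu_layers=43,  # number of layers offloaded to the GPU
    n_batch=512,
    f16_kv=True,
    n_ctx=5570,  # context window; Llama 2 was trained with a 4096-token context
    top_p=1,
    callback_manager=callback_manager,
    verbose=True,  # Verbose is required to pass to the callback manager
)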

ggml_init_cublas: GGML_CUDA_FORCE_MMQ:   no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 1 CUDA devices:
  Device 0: NVIDIA TITAN RTX, compute capability 7.5
llama_model_loader: loaded meta data with 19 key-value pairs and 363 tensors from ./llama-2-13b-chat.Q4_K_M.gguf (version GGUF V2)
llama_model_loader: - tensor    0:                token_embd.weight q4_K     [  5120, 32000,     1,     1 ]
llama_model_loader: - tensor    1:           blk.0.attn_norm.weight f32      [  5120,     1,     1,     1 ]
llama_model_loader: - tensor    2:            blk.0.ffn_down.weight q6_K     [ 13824,  5120,     1,     1 ]
llama_model_loader: - tensor    3:            blk.0.ffn_gate.weight q4_K     [  5120, 13824,     1,     1 ]
llama_model_loader: - tensor    4:              blk.0.ffn_up.weight q4_K     [  5120, 13824,     1,     1 ]
llama_model_loader: - tensor    5:            blk.0.ffn_norm.weight f32      [  5120,     1,     1,     1 ]
llama_model_loader: - tensor    6:     

In [13]:
llm(input)


Based on the provided documents, the Zulassungsvoraussetzungen (admission requirements) for a Master's program at Hochschule Konstanz are as follows:

1. Bachelor's degree or equivalent in Informatics or a related field.
2. At least 60 ECTS credits in informatics-related subjects, including programming, algorithms, data structures, and databases.
3. A final grade of at least 2.4 (on a scale of 1 to 4) in the Bachelor's thesis or a comparable academic achievement.
4. Proof of proficiency in German or English, if the language of instruction is not German.
5. Successful participation in an aptitude test or an alternative examination approved by the university.
6. Completion of a pre-study internship or work experience in a relevant field.
7. A letter of motivation and a curriculum vitae.
8. Proof of having completed the pre-requisites for the desired Master's program, such as mathematics, statistics, and computer science.

It is important to note that the admission requirements may vary 


llama_print_timings:        load time =     393.18 ms
llama_print_timings:      sample time =     107.16 ms /   306 runs   (    0.35 ms per token,  2855.54 tokens per second)
llama_print_timings: prompt eval time =    3568.98 ms /  3515 tokens (    1.02 ms per token,   984.87 tokens per second)
llama_print_timings:        eval time =    8562.91 ms /   305 runs   (   28.08 ms per token,    35.62 tokens per second)
llama_print_timings:       total time =   12990.21 ms


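The context window `n_ctx` has to hold the prompt and the generated tokens together. With the 3515-token prompt reported in the timings above and `n_ctx=5570`, at most 5570 - 3515 = 2055 tokens remain for the answer, well below `max_tokens=5000`; this run stopped after 306 sampling runs, so the limit was not reached. A small sketch of that budget check; the function name `generation_budget` is hypothetical.

```python
def generation_budget(n_ctx: int, prompt_tokens: int, max_tokens: int) -> int:
    """Number of new tokens that can actually be generated within n_ctx."""
    if prompt_tokens >= n_ctx:
        raise ValueError(f"prompt ({prompt_tokens} tokens) does not fit in n_ctx={n_ctx}")
    # Whichever is smaller: the requested limit or the space left in the window.
    return min(max_tokens, n_ctx - prompt_tokens)


print(generation_budget(n_ctx=5570, prompt_tokens=3515, max_tokens=5000))  # 2055
```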