<a href="https://colab.research.google.com/github/HansHenseler/DFRWS-APAC-LLM-Workshop/blob/main/Part_IV_Retrieval_Augmented_Generation_demo_with_Llama2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Part IV Retrieval Augmented Generation Demo with Llama2 13B model

In this notebook we combine Retrieval Augmented Generation (RAG) with the Llama 2 Chat 13B model to ask questions about the 550-page PDF of the Singapore Criminal Procedure Code 2010 (2020 Revised Edition).

There are four stages in this notebook:

1. Download the PDF, divide it into text chunks, and create a vector store from the chunks
2. Inference stage
3. Example prompts
4. Evaluation and improvements

```
This notebook is part of the DFRWS APAC 2023 workshop "LLMs: Prompt Engineering and Retrieval Augmented Generation for Digital Forensics" by

Hans Henseler, University of Applied Sciences Leiden and Netherlands Forensic Institute, Netherlands
Kwok-Yan Lam, Nanyang Technological University, Singapore
Zee Kin Yeong, Singapore Academy of Law, Singapore
and Victor C.W. Cheng, TAU Express Pte Ltd, Singapore

For more information see: https://dfrws.org/presentation/llms-prompt-engineering-and-retrieval-augmented-generation/
```

## Stage 1: Download PDF file and create vector store

In [1]:
# GET GPU FIRST !!!!!
# T4 GPU

# First we download the PDF file.
# For simplicity, we download and process a single PDF file. You can also create a folder and place your own PDF files there.

# The URL is quoted so that the shell does not treat "&" as a background operator.
!wget -O criminal_procedure_code.pdf "https://sso.agc.gov.sg/Act/CPC2010?ViewType=Pdf&_=20230926140009"

--2023-10-17 14:56:30--  https://sso.agc.gov.sg/Act/CPC2010?ViewType=Pdf
Resolving sso.agc.gov.sg (sso.agc.gov.sg)... 199.184.145.21
Connecting to sso.agc.gov.sg (sso.agc.gov.sg)|199.184.145.21|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1574942 (1.5M) [application/pdf]
Saving to: ‘criminal_procedure_code.pdf’


2023-10-17 14:56:32 (1.56 MB/s) - ‘criminal_procedure_code.pdf’ saved [1574942/1574942]



In [2]:
!ls -l

total 1544
-rw-r--r-- 1 root root 1574942 Sep 26 06:00 criminal_procedure_code.pdf
drwxr-xr-x 1 root root    4096 Oct 13 13:22 sample_data


In [3]:
!pip install langchain==0.0.310
# Each "!" command runs in its own subshell, so "!export" would not persist; %env does.
%env LLAMA_CUBLAS=1
!CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python==0.2.11
!pip install sentence-transformers==2.2.2
!pip install chromadb==0.3.26
!pip install unstructured==0.10.19 pdf2image==1.16.3 pdfminer==20191125
!pip install glob2

Collecting langchain==0.0.310
  Downloading langchain-0.0.310-py3-none-any.whl (1.8 MB)
Collecting dataclasses-json<0.7,>=0.5.7 (from langchain==0.0.310)
  Downloading dataclasses_json-0.6.1-py3-none-any.whl (27 kB)
Collecting jsonpatch<2.0,>=1.33 (from langchain==0.0.310)
  Downloading jsonpatch-1.33-py2.py3-none-any.whl (12 kB)
Collecting langsmith<0.1.0,>=0.0.40 (from langchain==0.0.310)
  Downloading langsmith-0.0.44-py3-none-any.whl (40 kB)
Collecting marshmallow<4.0.0,>=3.18.0 (from dataclasses-json<0.7,>=0.5.7->langchain==0.0.310)
  Downloading marshmallow-3.20.1-py3-none-any.whl (49 kB)
Collecting typing-inspect<1,>=0.4

In [4]:
!pip uninstall pdfminer
!pip install pdfminer.six
!pip install unstructured==0.10.19
#from pdfminer.pdfinterp import PDFResourceManager
#from pdfminer.converter import PDFPageAggregator

Found existing installation: pdfminer 20191125
Uninstalling pdfminer-20191125:
  Would remove:
    /usr/local/bin/dumppdf.py
    /usr/local/bin/pdf2txt.py
    /usr/local/lib/python3.10/dist-packages/pdfminer-20191125.dist-info/*
    /usr/local/lib/python3.10/dist-packages/pdfminer/*
Proceed (Y/n)? Y
  Successfully uninstalled pdfminer-20191125
Collecting pdfminer.six
  Downloading pdfminer.six-20221105-py3-none-any.whl (5.6 MB)
Installing collected packages: pdfminer.six
Successfully installed pdfminer.six-20221105


In [5]:
import chromadb
from typing import List
from chromadb.config import Settings
from langchain.document_loaders import (
    TextLoader,
    UnstructuredPDFLoader,

)

LOADER_MAPPING = {
    ".txt": (TextLoader, {}),       # text loader class, no parameter
    ".pdf": (UnstructuredPDFLoader, {}),      # PDF loader class, no parameter

}

In [6]:
# Load one document and save to a document object
import glob
import os

from langchain.docstore.document import Document

def load_one_document(file_path):
  # get file extension (including the leading dot)
  file_ext = os.path.splitext(file_path)[1]

  # check whether the file type is in LOADER_MAPPING
  #   if YES, load the file content into "Document" objects
  #   if NO, skip the file by returning an empty list
  docs_data = []
  if file_ext in LOADER_MAPPING:
    loader_class, loader_args = LOADER_MAPPING[file_ext]
    loader = loader_class(file_path, **loader_args)
    docs_data = loader.load()
  return docs_data


# Load all documents
def load_all_documents(source_dir):
  all_files = glob.glob(os.path.join(source_dir, '*.*'))
  all_docs_data = []
  for one_file in all_files:
    # unsupported file types yield an empty list, so extend() is always safe
    all_docs_data.extend(load_one_document(one_file))
  return all_docs_data

With the loader functions defined, the next step is to load all files into document objects and split the contents held in those objects into chunks (smaller pieces of text).

In [7]:
# source_directory is the path where the input files are located
from langchain.text_splitter import RecursiveCharacterTextSplitter

source_directory = "/content"
chunk_size = 1000
chunk_overlap = 100

# load files and split contents
all_docs_data = load_all_documents(source_directory)
text_splitter = RecursiveCharacterTextSplitter(chunk_size=chunk_size, chunk_overlap=chunk_overlap)
chunked_docs = text_splitter.split_documents(all_docs_data)

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.
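
To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of fixed-width chunking with overlap. It is a simplification: `RecursiveCharacterTextSplitter` prefers to split on separators such as paragraph breaks before falling back to smaller units, whereas the hypothetical `sliding_chunks` helper below (not part of LangChain) simply slides a window over the characters.

```python
def sliding_chunks(text, chunk_size, chunk_overlap):
    """Split text into windows of chunk_size characters.

    Consecutive windows share chunk_overlap characters. Illustrative
    helper only; this is not how LangChain's splitter is implemented.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


demo_text = "abcdefghijklmnopqrstuvwxyz"
demo_chunks = sliding_chunks(demo_text, chunk_size=10, chunk_overlap=3)
for chunk in demo_chunks:
    print(chunk)
```

With the notebook's settings (`chunk_size = 1000`, `chunk_overlap = 100`), neighbouring chunks can share up to 100 characters, which reduces the chance that a provision is cut exactly at a chunk boundary.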


In [11]:
chunked_docs[110]

Document(page_content='Form of search warrant 26.—(1) A search warrant issued by a court under this Code must be in writing bearing the seal of the court, and signed by a Magistrate or District Judge (as the case may be), or in the case of the General Division of the High Court, by a Judge or by the Registrar of the Supreme Court.\n\n[40/2019]\n\nInformal Consolidation – version in force from 26/6/2023\n\n2020 Ed.\n\nCriminal Procedure Code 2010\n\n52\n\n(2) A search warrant must ordinarily be issued to the Commissioner of Police and to one or more other police officers to be designated by name in the warrant, and all or any of those police officers may execute it.\n\n(3) The court may in appropriate circumstances issue a search warrant to one or more named persons who are not police officers, and all or any of those persons may execute it.', metadata={'source': '/content/criminal_procedure_code.pdf'})

In [12]:
# Create embedding function

from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import Chroma
from chromadb.config import Settings


# configuration of the vector store
persist_directory = "./"

CHROMA_SETTINGS = Settings(
    chroma_db_impl = 'duckdb+parquet',  # use duckdb to r/w parquet file
    anonymized_telemetry = False     # refuse telemetry of usage info
)
model_name = "sentence-transformers/all-MiniLM-L12-v2"
model_kwargs = {'device': 'cuda'}      # you must have a gpu, otherwise change it to cpu
encode_kwargs = {'normalize_embeddings': True}
embedding_func = HuggingFaceEmbeddings(
    model_name = model_name,
    model_kwargs = model_kwargs,
    encode_kwargs = encode_kwargs
)

print("Creating new vector store...")
db = Chroma.from_documents(
    chunked_docs,
    embedding_func,
    persist_directory=persist_directory,
    client_settings=CHROMA_SETTINGS
)

db.persist()
db = None
print("Files ingestion completed!")

Creating new vector store...
Files ingestion completed!
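
The `normalize_embeddings` option scales each embedding to unit length. The sketch below illustrates why that is convenient, using made-up three-dimensional vectors rather than real model output: for unit-length vectors, the plain dot product equals the cosine similarity. The helper names are chosen for this example only.

```python
import math


def normalize(vec):
    """Scale a vector to unit (L2) length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def dot(a, b):
    """Dot product of two equally long vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of any length."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Toy vectors standing in for embeddings (illustrative values only).
a = [3.0, 4.0, 0.0]
b = [4.0, 3.0, 0.0]

cos_ab = cosine_similarity(a, b)
dot_unit = dot(normalize(a), normalize(b))
print(round(cos_ab, 6), round(dot_unit, 6))
```

For unit-length vectors the squared Euclidean distance is `2 - 2 * dot(a, b)`, so ranking by dot product, cosine similarity, or Euclidean distance yields the same order.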


In [13]:
!ls -l

total 4716
-rw-r--r-- 1 root root     557 Oct 17 15:15 chroma-collections.parquet
-rw-r--r-- 1 root root 3238797 Oct 17 15:15 chroma-embeddings.parquet
-rw-r--r-- 1 root root 1574942 Sep 26 06:00 criminal_procedure_code.pdf
drwxr-xr-x 2 root root    4096 Oct 17 15:15 index
drwxr-xr-x 1 root root    4096 Oct 13 13:22 sample_data


In [14]:
!ls -l index

total 2076
-rw-r--r-- 1 root root   38921 Oct 17 15:15 id_to_uuid_3c4b1790-6a1f-45af-9ab1-f002adb0f438.pkl
-rw-r--r-- 1 root root 2029800 Oct 17 15:15 index_3c4b1790-6a1f-45af-9ab1-f002adb0f438.bin
-rw-r--r-- 1 root root     105 Oct 17 15:15 index_metadata_3c4b1790-6a1f-45af-9ab1-f002adb0f438.pkl
-rw-r--r-- 1 root root   45553 Oct 17 15:15 uuid_to_id_3c4b1790-6a1f-45af-9ab1-f002adb0f438.pkl


## Stage 2: Inference stage

The input files have been ingested and the Chroma database has been created. We now move on to the inference stage.

In [15]:
# This is the chat model!
!wget https://huggingface.co/TheBloke/Llama-2-13B-chat-GGUF/resolve/main/llama-2-13b-chat.Q4_0.gguf

--2023-10-17 15:15:30--  https://huggingface.co/TheBloke/Llama-2-13B-chat-GGUF/resolve/main/llama-2-13b-chat.Q4_0.gguf
Resolving huggingface.co (huggingface.co)... 18.239.50.16, 18.239.50.49, 18.239.50.103, ...
Connecting to huggingface.co (huggingface.co)|18.239.50.16|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://cdn-lfs.huggingface.co/repos/8d/b1/8db1d1f73b4caa58e947ccbfe2fb27ac5e495c2ad8457ad299d15987aee3b520/eda2a15d532bea4ce6fc14d15c2b72638243396816252f73a94dceeb0429112f?response-content-disposition=attachment%3B+filename*%3DUTF-8%27%27llama-2-13b-chat.Q4_0.gguf%3B+filename%3D%22llama-2-13b-chat.Q4_0.gguf%22%3B&Expires=1697814930&Policy=eyJTdGF0ZW1lbnQiOlt7IkNvbmRpdGlvbiI6eyJEYXRlTGVzc1RoYW4iOnsiQVdTOkVwb2NoVGltZSI6MTY5NzgxNDkzMH19LCJSZXNvdXJjZSI6Imh0dHBzOi8vY2RuLWxmcy5odWdnaW5nZmFjZS5jby9yZXBvcy84ZC9iMS84ZGIxZDFmNzNiNGNhYTU4ZTk0N2NjYmZlMmZiMjdhYzVlNDk1YzJhZDg0NTdhZDI5OWQxNTk4N2FlZTNiNTIwL2VkYTJhMTVkNTMyYmVhNGNlNmZjMTRkMTVjMmI3MjYzODI0MzM5N

In [16]:
from langchain.llms import LlamaCpp
# Make sure the model path is correct for your system!
llm = LlamaCpp(
    n_ctx=4096,                      # context window size in tokens
    model_path="/content/llama-2-13b-chat.Q4_0.gguf",  # location of the model
    temperature=0.75,                # sampling temperature
    max_tokens=2000,                 # max. number of tokens to be generated
    top_p=0.9,                       # nucleus sampling threshold
    top_k=30,                        # sample only from the 30 most likely tokens
    n_gpu_layers=43,                 # number of layers to offload to GPU
    verbose=True,                    # print llama.cpp system and timing information
    n_batch=200,                     # number of prompt tokens processed in parallel
)

AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | 
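
The sampling parameters passed to `LlamaCpp` shape how the next token is chosen. The following toy sketch shows the general idea behind `temperature`, `top_k` and `top_p` on a hand-made distribution. It is an illustration under simplifying assumptions, not llama.cpp's actual sampler (the order and details of the steps there may differ), and the function name `filter_candidates` and the example logits are invented for this demo.

```python
import math


def filter_candidates(logits, temperature, top_k, top_p):
    """Return renormalised probabilities of the tokens that survive
    temperature scaling, top-k and top-p (nucleus) filtering."""
    # Temperature: divide logits before the softmax; values below 1 sharpen
    # the distribution, values above 1 flatten it.
    scaled = {tok: value / temperature for tok, value in logits.items()}
    max_logit = max(scaled.values())
    exp = {tok: math.exp(value - max_logit) for tok, value in scaled.items()}
    total = sum(exp.values())
    probs = {tok: e / total for tok, e in exp.items()}

    # top-k: keep only the k most probable tokens.
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:top_k]

    # top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for tok, p in ranked:
        kept.append((tok, p))
        cumulative += p
        if cumulative >= top_p:
            break

    kept_total = sum(p for _, p in kept)
    return {tok: p / kept_total for tok, p in kept}


toy_logits = {"shall": 2.0, "may": 1.5, "must": 0.5, "banana": -3.0}
survivors = filter_candidates(toy_logits, temperature=0.75, top_k=3, top_p=0.9)
print(survivors)
```

In this toy case `top_k=3` drops the least likely token and `top_p=0.9` then keeps only the two most likely ones, because together they already cover more than 90% of the probability mass.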


In [17]:
B_INST, E_INST = "[INST]", "[/INST]"
B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"
DEFAULT_SYSTEM_PROMPT = "You are a helpful, respectful and honest assistant."

def create_prompt(instruction, new_system_prompt=DEFAULT_SYSTEM_PROMPT):
  SYSTEM_PROMPT = B_SYS + new_system_prompt + E_SYS
  prompt = B_INST + SYSTEM_PROMPT + instruction + E_INST
  return "<s>" + prompt
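
To see what `create_prompt` produces, the sketch below repeats the definitions from the cell above so that it runs on its own, then prints the wrapped prompt for a short instruction. The example instruction is made up for illustration.

```python
B_INST, E_INST = "[INST]", "[/INST]"
B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"
DEFAULT_SYSTEM_PROMPT = "You are a helpful, respectful and honest assistant."


def create_prompt(instruction, new_system_prompt=DEFAULT_SYSTEM_PROMPT):
    # Same logic as the notebook cell: the system prompt goes inside the
    # <<SYS>> tags, everything is wrapped in [INST] ... [/INST], and the
    # result is prefixed with <s>.
    system_prompt = B_SYS + new_system_prompt + E_SYS
    return "<s>" + B_INST + system_prompt + instruction + E_INST


example_prompt = create_prompt("Question: {question}")
print(example_prompt)
```

The `{question}` placeholder is left untouched, so the returned string can still be used as a template and filled in later.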

In [18]:
# open the vector database
from langchain.embeddings import HuggingFaceEmbeddings
from chromadb.config import Settings
from langchain.vectorstores import Chroma


model_name = "sentence-transformers/all-MiniLM-L12-v2"
model_kwargs = {'device': 'cuda'}      # you must have a gpu, otherwise change it to cpu
encode_kwargs = {'normalize_embeddings': True}
embedding_func = HuggingFaceEmbeddings(
    model_name = model_name,
    model_kwargs = model_kwargs,
    encode_kwargs = encode_kwargs
)


persist_directory = "./"

CHROMA_SETTINGS = Settings(
    chroma_db_impl = 'duckdb+parquet',  # use duckdb to r/w parquet file
    anonymized_telemetry = False     # refuse telemetry of usage info
)

vector_db = Chroma(persist_directory=persist_directory,
            embedding_function=embedding_func,
            client_settings=CHROMA_SETTINGS)

In [19]:
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate


instruction = """Given the following passage, you find out the answer to the question.

Passage: {context}

Question: {question}

Helpful answer: """

prompt_template = create_prompt(instruction)
PROMPT = PromptTemplate(
        template=prompt_template, input_variables=["context", "question"]
    )

chain_type_kwargs = {"prompt": PROMPT}
retriever = vector_db.as_retriever(search_kwargs={"k": 5})
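
Conceptually, the retriever returns the `k` chunks most similar to the query, and a "stuff"-style chain places those chunks into the `{context}` slot of the prompt. The sketch below imitates both steps in plain Python. The word-overlap score, the toy chunks, the `"\n\n"` separator and the helper names `score` and `retrieve` are assumptions made for illustration; they stand in for the embedding-based search performed by the vector store.

```python
def score(query, chunk):
    """Toy relevance score: number of distinct query words found in the chunk."""
    chunk_words = set(chunk.lower().split())
    return sum(1 for word in set(query.lower().split()) if word in chunk_words)


def retrieve(query, chunks, k):
    """Return the k highest-scoring chunks (ties keep their original order)."""
    ranked = sorted(chunks, key=lambda chunk: score(query, chunk), reverse=True)
    return ranked[:k]


toy_chunks = [
    "a search warrant must be in writing",
    "the court may impose a fine",
    "a search warrant may be issued to a police officer",
]
template = "Passage: {context}\n\nQuestion: {question}\n\nHelpful answer: "

question = "who may execute a search warrant"
top_chunks = retrieve(question, toy_chunks, k=2)
# "Stuffing": join the retrieved chunks and substitute them into the template.
filled_prompt = template.format(context="\n\n".join(top_chunks), question=question)
print(filled_prompt)
```

Because every retrieved chunk is pasted into a single prompt, `k` and `chunk_size` together determine how much of the model's context window the passage occupies.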

## Stage 3: Example prompts

### Prompt 1: Give an overview of different  sentences in Singapore

In [27]:
query = "Give an overview of different  sentences in Singapore"
relevant_docs = retriever.get_relevant_documents(query)

for doc in relevant_docs:
  print(doc)

page_content='(d) if, having changed his or her residence to a place outside Singapore, the person later returns to Singapore, personally present himself or herself and notify his or her return and his or her place of residence in Singapore to the officer in charge of the police division in which his or her residence in Singapore is situated; and\n\n(e) if the person intends to be absent from his or her last notified residence for more than 48 hours without changing his or her place of residence, personally present himself or herself and notify his or her intention, where he or she intends to go and how long he or she will be away to the officer in charge of the police division in which his or her residence is situated.' metadata={'source': '/content/criminal_procedure_code.pdf'}
page_content='THE STATUTES OF THE REPUBLIC OF SINGAPORE\n\nCRIMINAL PROCEDURE CODE 2010\n\n2020 REVISED EDITION\n\nThis revised edition incorporates all amendments up to and including 1 December 2021 and comes

In [28]:
# Q&A Method using langchain RetrievalQA chain

qa_chain = RetrievalQA.from_chain_type(
                          llm=llm,
                          chain_type="stuff",
                          retriever=retriever,
                          return_source_documents=True,
                          chain_type_kwargs=chain_type_kwargs,
                      )
answer = qa_chain(query)
print("Question: ", answer['query'] + '\n\n')
print("Answer:", answer['result'] + '\n\n')

Llama.generate: prefix-match hit


Question:  Give an overview of different  sentences in Singapore


Answer:   Based on the given passage, here is an overview of different sentences in Singapore:

1. Imprisonment: The sentence may include imprisonment for a term not exceeding 48 hours, or such longer term as the court may consider appropriate (section 114(1)).
2. Fine: The court may impose a fine on the person, and in default of payment, the person may be imprisoned for a period not exceeding one month (section 114(2)).
3. Prohibition from leaving Singapore: The court may order the person to remain in Singapore for a reasonable period to facilitate the investigation (section 114(1)).
4. Compensation: The court may order due provision to be made for the maintenance of the person and compensate them for their loss of time (section 114(2)).
5. Computer access restrictions: The court may order any person not to access or use any computer in Singapore, or to stop accessing or using any such computer (section 80).
6. Confisc

In [None]:
answer['source_documents']

[Document(page_content='(d) if, having changed his or her residence to a place outside Singapore, the person later returns to Singapore, personally present himself or herself and notify his or her return and his or her place of residence in Singapore to the officer in charge of the police division in which his or her residence in Singapore is situated; and\n\n(e) if the person intends to be absent from his or her last notified residence for more than 48 hours without changing his or her place of residence, personally present himself or herself and notify his or her intention, where he or she intends to go and how long he or she will be away to the officer in charge of the police division in which his or her residence is situated.', metadata={'source': '/content/criminal_procedure_code.pdf'}),
 Document(page_content='THE STATUTES OF THE REPUBLIC OF SINGAPORE\n\nCRIMINAL PROCEDURE CODE 2010\n\n2020 REVISED EDITION\n\nThis revised edition incorporates all amendments up to and including 1 

### Prompt 2: What offences are punished with imprisonment for more than 8 years? Can you give examples?

In [None]:
query="What offences are punished with imprisonment for more than 8 years? Can you give examples?"
answer = qa_chain(query)
print("Question: ", answer['query'] + '\n\n')
print("Answer:", answer['result'] + '\n\n')

Llama.generate: prefix-match hit


Question:  What offences are punished with imprisonment for more than 8 years? Can you give examples?


Answer:   Based on the provided passage, the following offences are punishable with imprisonment for more than 8 years:

1. Offences punishable with imprisonment for life or death.
2. Offences punishable with imprisonment for a term of 5 years or more.

Examples of such offences include, but are not limited to:

1. Murder (punishable with imprisonment for life or death).
2. Robbery (punishable with imprisonment for a term of 5 years or more).
3. Theft (punishable with imprisonment for a term of 5 years or more).
4. Criminal intimidation (punishable with imprisonment for a term of 5 years or more).

Please note that this answer is based on the provided passage and may not be exhaustive or applicable to all situations.




In [None]:
answer['source_documents']

[Document(page_content='[1/2014]\n\n(5) Where an offender who is sentenced under subsection (1) or (2) is also convicted at the same trial of any offence other than an offence punishable with imprisonment for 2 years or more, the court may, on the application of the Public Prosecutor, instead of imposing any term of imprisonment as may be prescribed for that offence, take into account that offence for the purposes of determining the period of corrective training or preventive detention, as the case may be.\n\n(6) A person sentenced to corrective training or preventive detention must be detained in a prison for the term of his or her sentence in accordance with the regulations made under section 428.\n\nReformative training 305.—(1) Where a person is convicted by a court of an offence punishable with imprisonment and that person is, on the day of his or her conviction —\n\n(a) of or above 16 years of age but below 21 years of age; or', metadata={'source': '/content/criminal_procedure_co

### Prompt 3: What offences are punished with community service? Can you give examples?

In [None]:
query="What offences are punished with community service? Can you give examples?"
answer = qa_chain(query)
print("Question: ", answer['query'] + '\n\n')
print("Answer:", answer['result'] + '\n\n')

Llama.generate: prefix-match hit


Question:  What offences are punished with community service? Can you give examples?


Answer:   Based on the provided passage, the following offences are punishable with community service:

1. Section 356 of the Penal Code 1871, which involves assaulting or using criminal force to any person with intent to commit an offence punishable with imprisonment.
2. Section 357 of the Penal Code 1871, which involves causing grievous hurt to any person with intent to commit an offence punishable with imprisonment.
3. Section 358 of the Penal Code 1871, which involves causing grievous hurt to any person by doing an act with intention to cause such hurt.

Examples of these offences include:

* Assaulting someone with a weapon or causing them grievous hurt with intent to commit a further offence, such as theft or robbery.
* Using criminal force to restrain or injure someone, such as by using physical violence or threatening them with violence.

It's important to note that the specific examples of c

In [None]:
answer['source_documents']

[Document(page_content='(d) Intending to cause injury to B, A falsely accuses B of having committed an offence knowing that there is no just or lawful basis for the charge. At the trial, A gives false evidence against B, intending thereby to cause B to be convicted of a capital offence. A may be separately charged with offences under sections 211 and 194 of the Penal Code 1871.\n\n(e) A, with 6 others, commits the offences of rioting, causing grievous hurt and assaulting a public servant B trying to suppress the riot in the discharge of B’s duty. A may be separately charged with offences under sections 145, 325 and 152 of the Penal Code 1871.\n\n(f) A threatens B, C and D at the same time with injury to their persons with intent to cause alarm to them. A may be separately charged with each of the 3 offences under section 506 of the Penal Code 1871.', metadata={'source': '/content/criminal_procedure_code.pdf'}),
 Document(page_content='(a) A wrongfully strikes B with a cane. A may be se

## Stage 4: Evaluation and improvements

Although the results look promising, a legal expert has identified various problems, which are listed below:

```
Question 1:  Give an overview of different  sentences in Singapore
Answer:   Based on the passage provided, there are several types of sentences that can be imposed in Singapore, including:

1. Imprisonment: The accused may be sentenced to imprisonment for a term not exceeding certain limits, depending on the offense committed.
2. Fine: The accused may be ordered to pay a fine, which is a monetary penalty for the offense committed.
3. Probation: The accused may be placed on probation, which means that they are released from imprisonment but are subject to certain conditions and restrictions.
4. Community service: As an alternative to imprisonment or in addition to imprisonment, the accused may be ordered to perform community service as a form of punishment.
5. Caning: For certain offenses, such as drug trafficking and other serious crimes, the accused may be sentenced to caning in addition to imprisonment.
6. Restitution: The accused may be ordered to pay restitution to the victim as a condition of their sentence.
7. Suspension of sentence: In some cases, the sentence may be suspended, meaning that it is not immediately enforced but is instead postponed until a later date.
8. Prohibition orders: The court may issue an order prohibiting the accused from engaging in certain activities or behaviors for a specified period.
9. Forfeiture of property: The court may order the forfeiture of any property that was used in the commission of the offense.

It is important to note that the specific sentence imposed will depend on the circumstances of the case and the discretion of the court.
```

**Evaluation by expert:**
The list is pretty good. It is missing a couple of sentences, e.g. reformative training, corrective training, preventive detention and police supervision.

S/No 7, 8 and 9 are, strictly speaking, not sentences. (7) Suspension of sentence is not a sentence, but something the President can do to suspend a sentence. (8) I don't know where it got prohibition orders from; these are a type of court order, but they are not mentioned in the Criminal Procedure Code. (9) Forfeiture is a bit ambiguous as it follows a conviction and is made in addition to a sentence, so some people do consider it a type of sentence.

```
Question 2:  What offences are punished with imprisonment for more than 8 years? Can you give examples?
Answer:   Based on the provided passage, offences punishable with imprisonment for more than 8 years include:

1. An offence under section 305 of the Penal Code, where the maximum punishment prescribed is imprisonment for a term of 5 years or more.
2. An offence under section 306 of the Penal Code, where the maximum punishment prescribed is imprisonment for life.
3. An offence under section 307 of the Penal Code, where the maximum punishment prescribed is death.
```

**Evaluation by expert**: This section of the answer is a hallucination. These sections exist, and all of them do prescribe imprisonment sentences of more than 8 years, but the sentence lengths shown in all three answers are wrong.


```
Examples of such offences include:

1. Theft under section 380 of the Penal Code, where the maximum punishment prescribed is imprisonment for a term of 5 years or more.
2. Criminal breach of trust under section 409 of the Penal Code, where the maximum punishment prescribed is imprisonment for life.
3. Murder under section 302 of the Penal Code, where the maximum punishment prescribed is death.
```

**Evaluation by expert**: In this part, S/No 1 is wrong as section 380 prescribes a maximum sentence of 7 years. S/No 2 is only partly correct: the prescribed maximum sentence is 20 years, so the length of sentence shown in the answer is wrong. S/No 3 is correct.

For this set of answers, the model could not refer to information in the CPC. So its answers were not constrained to the trusted source document. This could explain why there is so much hallucination.
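
One rough way to flag answers that stray from the retrieved text is to check whether the section numbers cited in an answer occur anywhere in the source passages. The sketch below is a hypothetical helper (`unsupported_sections` is not part of LangChain) built on a crude regular-expression heuristic; for instance, it only captures the first number in a phrase like "sections 211 and 194". The sample answer is a shortened, made-up stand-in, and the sample passage is abbreviated from a retrieved chunk.

```python
import re

SECTION_PATTERN = re.compile(r"\bsections?\s+(\d+[A-Z]?)", re.IGNORECASE)
# Section headings in the passages look like "305." followed by a long dash.
HEADING_PATTERN = re.compile(r"\b(\d+[A-Z]?)\.\u2014")


def unsupported_sections(answer_text, passages):
    """Return section numbers cited in the answer but absent from all passages."""
    cited = SECTION_PATTERN.findall(answer_text)
    supported = set()
    for passage in passages:
        supported.update(SECTION_PATTERN.findall(passage))
        supported.update(HEADING_PATTERN.findall(passage))
    # dict.fromkeys removes duplicates while keeping first-seen order.
    return [sec for sec in dict.fromkeys(cited) if sec not in supported]


sample_answer = "Theft under section 380 and an offence under section 305."
sample_passages = ["Reformative training 305.\u2014(1) Where a person is convicted ..."]
flagged = unsupported_sections(sample_answer, sample_passages)
print(flagged)
```

A flagged section is not necessarily wrong, and an unflagged one is not necessarily right; the check only indicates that the citation did not come from the passages handed to the model.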

```
Question 3:  What offences are punished with community service? Can you give examples?
Answer:   Based on the provided passage, the following offences can be punished with community service:

1. Offences under sections 319 to 338 of the Penal Code 1871, which include assaults and injuries.
2. Offences under section 356, 357 or 358 of the Penal Code 1871, which involve injury to persons.
3. Child abuse offences under section 28(2) of the Children and Young Persons Act 1993.
4. Offences punishable under the Organised Crime Act 2015.
5. Sexual offences under section 169(3) of the Women's Charter 1961.
6. Any other offence that the Minister may, after consulting the Chief Justice, prescribe.

Examples of offences that can be punished with community service include:

* Assaulting a person and causing grievous hurt.
* Causing injury to a child or a young person.
* Committing an act of sexual violence.
* Engaging in organized crime activities, such as money laundering or drug trafficking.

It's important to note that the specific offences and penalties may vary depending on the jurisdiction and the circumstances of the case
```

**Evaluation by expert**: I am not going through each and every one of the references, as it would take up too much time. But there are clear hallucinations and even outright errors here. For example, offences under the Organised Crime Act 2015 cannot be punished with a community sentence. Likewise, offences involving sexual violence cannot be punished with a community sentence.

The problem could be caused by a couple of things. "Community service" is not the phrase used in the CPC; "community sentence" is. This could have "confused" the model, such that it relied on its general training and didn't constrain itself to the trusted document. Further, as observed in the preceding comment, the answers that it provided cannot be found in the CPC. Hence the high degree of hallucination.

One cause of these problems may be that the questions ask for a summary and examples that are likely to be spread out over the document. Another is probably that we didn't instruct the model to restrict its answer to the facts provided in the text chunks.
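
One possible way to act on the second point is to put the restriction into the instruction template itself rather than into each question. The wording below is only an illustrative assumption, not a tested or recommended prompt; it keeps the `{context}` and `{question}` placeholders used by the notebook's `PromptTemplate`.

```python
# Hypothetical stricter instruction; the wording is an assumption for illustration.
strict_instruction = """Answer the question using only the facts stated in the passage.
If the passage does not contain the answer, reply "The passage does not say."
Do not rely on knowledge from outside the passage.

Passage: {context}

Question: {question}

Helpful answer: """

# Quick check that the template can be filled in the same way as the original one.
preview = strict_instruction.format(
    context="(retrieved chunks go here)",
    question="What is the punishment for kidnapping?",
)
print(preview)
```

Such a template could be passed to `create_prompt` in place of `instruction`; whether it actually reduces hallucination would need to be checked against the expert's evaluation.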

Let's try a different approach, asking for the punishment for a specific crime or violation.

### Prompt 4: What is the punishment in Singapore kidnapping a person? Restrict the answer to the facts provided in the passage.

In [30]:
query="What is the punishment in Singapore kidnapping a person? Restrict the answer to the facts provided in the passage."
answer = qa_chain(query)
print("Question: ", answer['query'] + '\n\n')
print("Answer:", answer['result'] + '\n\n')

Llama.generate: prefix-match hit


Question:  What is the punishment in Singapore kidnapping a person? Restrict the answer to the facts provided in the passage.


Answer:   Based on the facts provided in the passage, the punishment for kidnapping a person in Singapore is as follows:

* If the offender is convicted before the General Division of the High Court or a District Court of an offence punishable with imprisonment for 2 years or more, and has been convicted in Singapore or elsewhere at least twice since he or she reached 16 years of age for offences punishable with such a sentence, the punishment shall be imprisonment for at least one month (Criminal Procedure Code 2010, section 304(5)).
* If the offender is convicted at one trial before the General Division of the High Court or a District Court of 3 or more distinct offences punishable with imprisonment for 2 years or more, and has been convicted and sentenced in Singapore or elsewhere to imprisonment for at least one month since he or she reached 16 years of ag

In [31]:
answer['source_documents']

[Document(page_content='(a) the offender pleaded guilty to the offence;\n\n(b) the offender is a first-time offender;\n\n(c) the offender is of previous good character.\n\n[15/2019]\n\nCorrective training and preventive detention 304.—(1) Where a person of 18 years of age or above —\n\n(a) is convicted before the General Division of the High Court or a District Court of an offence punishable with imprisonment for 2 years or more, and has been convicted in Singapore or elsewhere at least twice since he or she reached 16 years of age for offences punishable with such a sentence; or\n\n(b) is convicted at one trial before the General Division of the High Court or a District Court of 3 or more distinct offences punishable with imprisonment for 2 years or more, and has been convicted and sentenced in Singapore or elsewhere to imprisonment for at least one month since he or she reached 16 years of age for an offence punishable with imprisonment for 2 years or more,', metadata={'source': '/co