Chatbot

Virtual Environment

In [1]:
from dotenv import load_dotenv
import os
from mistralai import Mistral
import requests
import numpy as np
import faiss # vector database
load_dotenv()

True

In [2]:
url = "https://www.parliament.go.ke/sites/default/files/2017-05/The_Constitution_of_Kenya_2010.pdf"
response = requests.get(url)
text = response.text
# print(text)

In [3]:
chunk_size = 3000 # split text into chunks
chunks = [text[i:i+chunk_size] for i in range (0, len(text), chunk_size)]
len(chunks)

543

In [4]:
api_key = os.environ["mistral_api_key"]
client = Mistral(api_key = api_key)

In [5]:
def get_single_embedding(text):
    response = client.embeddings.create(
        model="mistral-embed",
        inputs=[text]  # must be a list
    )
    return response.data[0].embedding

In [11]:
import time

def get_embeddings_in_batches(chunks, max_batch_size=50, max_retries=3, sleep_time=2):
    all_embeddings = []

    for i in range(0, len(chunks), max_batch_size):
        batch = chunks[i:i+max_batch_size]

        for attempt in range(max_retries):
            try:
                response = client.embeddings.create(
                    model="mistral-embed",
                    inputs=batch
                )
                batch_embeddings = [item.embedding for item in response.data]
                all_embeddings.extend(batch_embeddings)
                break  # success → break retry loop
            except Exception as e:
                if "rate limit" in str(e).lower():
                    print(f"Rate limit hit. Sleeping {sleep_time}s...")
                    time.sleep(sleep_time)
                    sleep_time *= 2  # exponential backoff
                else:
                    raise

    return all_embeddings


In [12]:
# 1. Create embeddings for your document chunks
text_embeddings = get_embeddings_in_batches(chunks, max_batch_size=10)
text_embeddings = np.array(text_embeddings, dtype="float32")
# 2. Build FAISS index
d = text_embeddings.shape[1]   # embedding dimension
index = faiss.IndexFlatL2(d)   # L2 distance index
index.add(text_embeddings)     # add all chunk embeddings

In [13]:
# Create embedding for the query
question = "What are the Principles of land policy in the constitution of Kenya"
question_embeddings = get_single_embedding(question)
# Convert to numpy and reshape
question_embeddings = np.array(question_embeddings, dtype="float32").reshape(1, -1)
# Search top-k similar chunks
D, I = index.search(question_embeddings, k=2)
# Retrieve the corresponding chunks
retrieved_chunks = [chunks[i] for i in I[0]]


In [14]:
# Prompt Engineering
prompt = f""""
The context information is as below:
{retrieved_chunks}
Given the context information and without prior knowledge answer the query
query: {question}
answer:
"""

In [15]:
def run_mistral(user_message, model = "mistral-large-latest"):
    messages = [
        {
            "role": "user",
            "content": user_message
        }
    ]
    chat_response = client.chat.complete(model=model, messages=messages)

    return chat_response.choices[0].message.content

In [16]:
print(run_mistral(prompt))

The provided context appears to be corrupted or unreadable (it contains non-textual, garbled data). As a result, it does not contain any meaningful information about the **Principles of Land Policy in the Constitution of Kenya**.

However, I can provide the correct answer based on **Article 60 of the Constitution of Kenya (2010)**, which outlines the **Principles of Land Policy**:

### **Principles of Land Policy in the Constitution of Kenya (2010)**
The land policy in Kenya is guided by the following principles (as per **Article 60**):

1. **Equitable Access** – Land should be held, used, and managed in a manner that is equitable, efficient, productive, and sustainable.
2. **Secure Land Rights** – Land tenure systems should be based on:
   - **Equity** (fairness in distribution and access),
   - **Common good** (public interest),
   - **Sustainable development** (environmental and economic sustainability),
   - **Productive use** (efficient utilization).
3. **Elimination of Gender Dis

In [None]:
# The workflow that makes it easier to ask subsequent questions with just a function
def ask_question(question):
    # embed
    q_emb = get_single_embedding(question)
    q_emb = np.array(q_emb, dtype="float32").reshape(1, -1)
    # search
    D, I = index.search(q_emb, k=2)
    retrieved_chunks = [chunks[i] for i in I[0]]
    # build prompt
    context = "\n".join(retrieved_chunks)
    prompt = f"Answer the question using the context below:\n\nContext:\n{context}\n\nQuestion: {question}\nAnswer:"
    # run model
    return run_mistral(prompt)

In [41]:
print(ask_question("What is the role of the Judiciary in Kenya"))

The provided context appears to be corrupted or unreadable, so I cannot extract meaningful information from it. However, I can answer your question about the **role of the Judiciary in Kenya** based on general knowledge and Kenya's constitutional framework.

### **Role of the Judiciary in Kenya**
The Judiciary in Kenya is an independent arm of government established under **Article 159 of the Constitution of Kenya (2010)**. Its primary roles include:

1. **Interpreting and Applying the Law**
   - The Judiciary interprets the Constitution and other laws to ensure justice is administered fairly.
   - It resolves disputes between individuals, institutions, and the government.

2. **Upholding the Constitution**
   - It ensures that all laws and government actions comply with the **Constitution of Kenya (2010)**.
   - It has the power to declare laws or actions **unconstitutional** if they violate fundamental rights.

3. **Administering Justice**
   - It handles **civil, criminal, and const

In [43]:
print(ask_question("What does the Road Traffic Act state in the constitution of Kenya?"))

The provided context is a **PDF object structure** (likely from a PDF file of the Kenyan laws or another document), not the actual text of the **Road Traffic Act** or the **Constitution of Kenya**. The snippets you shared are metadata entries describing page layouts (e.g., `MediaBox`, `Resources`, `Contents`), not legal content.

### **Road Traffic Act in Kenya’s Legal Framework**
The **Road Traffic Act (Cap. 403)** is a **statute** (not part of the Constitution itself) that governs road use, vehicle registration, licensing, traffic offenses, and safety regulations in Kenya. Key provisions include:

1. **Licensing & Registration**
   - Mandates driver’s licenses (Section 30) and vehicle registration (Part III).
   - Requires third-party insurance (Section 5).

2. **Traffic Rules & Offenses**
   - Speed limits, drunk driving (Section 44), and reckless driving penalties.
   - Use of seatbelts (Section 118A) and helmets for motorcyclists.

3. **Enforcement**
   - Powers of police and **Na

In [44]:
print(ask_question("What does the law of Kenya say about Corruption?"))

The context you provided is a snippet of a **PDF file's internal structure** (specifically, page objects in a PDF's object stream). It does **not** contain any textual content about Kenyan laws or corruption.

To answer your question about **Kenya's laws on corruption**, here’s a summary based on actual legal frameworks:

### **Kenya’s Anti-Corruption Laws**
Kenya has several laws and institutions designed to combat corruption, including:

1. **The Anti-Corruption and Economic Crimes Act (2003, amended 2019)**
   - Criminalizes bribery, fraud, embezzlement, abuse of office, and other economic crimes.
   - Establishes the **Ethics and Anti-Corruption Commission (EACC)** as the primary anti-corruption agency.
   - Provides for asset recovery and whistleblower protections.

2. **The Bribery Act (2016)**
   - Prohibits bribery in both public and private sectors.
   - Introduces corporate liability for corruption offenses.

3. **The Public Officer Ethics Act (2003)**
   - Sets ethical stand

In [14]:
def get_embeddings(input):
    embeddings_batch_response = client.embeddings.create(
        model = "mistral-embed",
        inputs = input
    )
    return embeddings_batch_response.data[0].embedding

In [None]:
text_embedding = np.array([get_embeddings(chunk) for chunk in chunks])

In [7]:
# store 
d = text_embedding.shape[1]
index = faiss.IndexFlatL2(d)
index.add(text_embedding)

In [8]:
question = "What were the 2 main things the author worked on before college?"
question_embeddings = np.array([get_embeddings(question)])

In [9]:
D, I = index.search(question_embeddings, k=2) # D = distance, I = index
retrieved_chunk = [chunks[1] for i in I.tolist()[0]]

## Zero short learning
## few short learning

In [10]:
# Prompt Engineering
prompt = f""""
The context information is as below:
{retrieved_chunk}
Given the context information and without prior knowledge answer the query
query: {question}
answer:
"""

# RAG - Retrieval Augmented Generation

In [11]:
def run_mistral(user_message, model = "mistral-large-latest"):
    messages = [
        {
            "role": "user",
            "content": user_message
        }
    ]
    chat_response = client.chat.complete(model=model, messages=messages)

    return chat_response.choices[0].message.content

In [12]:
print(run_mistral(prompt))

Before college, the author mainly worked on:

1. Programming simple games.
2. Developing a program to predict how high model rockets would fly.
