# Kenya Constitution Chatbot

This notebook is designed to extract and process text from the Kenya Constitution PDF, enabling a chatbot to answer questions about the constitution using natural language processing (NLP) techniques.


***
## Business Understanding
***
### Overview

A nation's Constitution serves as its foundational legal framework, outlining the structure of government, the rights and duties of citizens, and the principles that guide the rule of law. In the case of the Constitution of Kenya, it establishes the basis for democratic governance, justice, and the protection of human rights. It is a critical document that influences both the operation of state institutions and the freedoms of individuals. Therefore, it is crucial for every citizen to have access to and understand their country's Constitution.

However, many people face challenges when trying to access and comprehend the Constitution, particularly if they are unfamiliar with legal language or the document’s structure. This project seeks to address these obstacles by creating a question-answering system focused on the Constitution of Kenya. Utilizing supervised machine learning and natural language processing (NLP) techniques, this system will allow users to ask questions about the Constitution and receive accurate answers in real time, directly sourced from the document's content.

By developing this system, the project aims to foster a deeper understanding of constitutional rights and responsibilities among users. It is designed to empower individuals by improving access to crucial legal information, encouraging civic engagement, and supporting legal education. This tool will provide an accessible and user-friendly way to navigate the complexities of the Constitution, helping users, including legal professionals and the general public, better understand their rights and the workings of the law in Kenya.
### Business Problem
In Kenya, there is a significant gap in public understanding of the Constitution. Many citizens, including students, legal practitioners, and the general public, often face challenges when trying to access information on constitutional rights, duties, and legal interpretations. The complex legal language used in the Constitution can be intimidating, undermining public comprehension. This lack of understanding can lead to confusion about legal matters and reduced engagement in civic duties, as well as hinder individuals' ability to seek justice and engage meaningfully with governance. By addressing these issues, the proposed platform aims to simplify access to constitutional knowledge, empowering individuals to advocate for their rights and participate actively in the democratic process.

### Stakeholders

1. **Lawyers and legal practitioners:** For quick reference to constitutional clauses and provisions.
2. **Government institutions:** To facilitate better governance through enhanced public understanding of constitutional mandates.
3. **Citizens:** To empower individuals by making legal information accessible.
4. **Media:** As a tool for accurate reporting on constitutional matters.
5. **Civic activists:** To support advocacy and public education on constitutional rights.

### Objectives

1. **Create a User-Friendly Interface:** Develop a clean, intuitive user interface that enables users to easily interact with the Q&A system, ensuring a seamless user experience that encourages frequent use.
2. **Improve Legal Literacy:** Educate users about their rights and responsibilities under the Constitution.
3. **Support Legal Practitioners:** Assist legal professionals in quickly retrieving relevant constitutional information to enhance their practice and advocacy. 
4. **Leverage Natural Language Processing (NLP) Techniques:** Apply advanced NLP techniques to extract relevant information, interpret questions correctly, and match them with the most appropriate sections of the Constitution.

***
## Data Understanding
***
### Data Source:
* **Kenyan Constitution:** The full PDF of the Kenyan Constitution is the primary data source, covering various chapters and articles that define the structure of government, judicial authority, human rights, and other foundational legal aspects.
### Content and Structure Analysis:
* **Chapters and Sections:** The document contains 18 chapters with multiple sections and sub-sections. Each chapter addresses distinct themes such as “Judicial Authority and Legal System,” “Human Rights and Freedoms,” and “Representation of the People.”

* **Language and Terminology:** Since constitutional language is formal and legalistic, it is essential to understand common terminology and possible user variations to structure queries effectively.
### Challenges:
* **Complex Language:** Identifying ways to simplify or interpret legal terminology for broader public comprehension.

* **Contextual Overlaps:** Some sections contain overlapping terms (e.g., “court,” “justice”) that can lead to misclassification. Techniques to manage synonym mapping and context filtering will be vital.

***
## Data Preparation
***
### Text Extraction:
* **PDF Processing:** Using pdfplumber, the text is extracted in a structured way. Each chapter is split into sections, with a focus on maintaining the original structure for consistency.

* **Function for Section Splitting:** split_chapter function organizes chapters into distinct sections based on headings and articles, facilitating better indexing and retrieval.
### Text Cleaning and Preprocessing:
* **Tokenization:** The document is tokenized into words and phrases to break down the text into manageable parts.

* **Stopword Removal and Lemmatization:** To enhance query matching, stopwords (like "and," "the") are removed, and words are normalized to their root forms.
### Synonym and Keyword Mapping:
* **Synonym Mapping:** A dictionary maps legal terms to layperson synonyms (e.g., “jurisdiction” mapped to “authority”).

* **Spelling Correction:** Integrated SpellChecker to address common spelling mistakes, ensuring queries are correctly matched with document sections.
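
The two steps above can be sketched with the standard library alone. This is a minimal illustration, not the notebook's implementation: it swaps `SpellChecker` for `difflib.get_close_matches`, and the `SYNONYMS` entries and helper names (`build_vocabulary`, `correct_word`, `find_section_key`) are assumptions made for the example.

```python
import difflib

# Hypothetical lay-term table: section key -> phrases a user might type.
SYNONYMS = {
    "supremacy": ["supremacy", "authority", "ultimate power"],
    "defence": ["defence", "defense", "protection"],
}


def build_vocabulary(synonyms):
    """Collect every single word that appears in any synonym phrase."""
    vocab = set()
    for phrases in synonyms.values():
        for phrase in phrases:
            vocab.update(phrase.split())
    return sorted(vocab)


def correct_word(word, vocabulary, cutoff=0.8):
    """Return the closest vocabulary word, or the word unchanged."""
    matches = difflib.get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else word


def find_section_key(query, synonyms):
    """Correct each query word, then look for any synonym phrase in the query."""
    vocabulary = build_vocabulary(synonyms)
    corrected = " ".join(correct_word(w, vocabulary) for w in query.lower().split())
    for key, phrases in synonyms.items():
        if any(phrase in corrected for phrase in phrases):
            return key
    return None
```

Correcting only against the synonym vocabulary keeps legal terms from being "fixed" into unrelated everyday words, at the cost of ignoring typos in words outside the table.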

***
## Modeling
***
The system’s core functionality lies in its Question-Answering Mechanism, designed to interpret user queries accurately and retrieve the most relevant constitutional sections. This is achieved through advanced Natural Language Processing (NLP) and Natural Language Understanding (NLU) components, which work in tandem to understand, match, and respond to user questions.

### Question-Answering Mechanism
#### Matching User Queries
* **answer_question_nlp Function:** This custom function analyzes user queries for keywords and phrases, matching them to relevant sections in the qa_mapping database, which is structured to cover critical constitutional topics. The function’s matching mechanism ensures that user questions are directed to the correct sections, regardless of variations in wording.

#### NLP Techniques
* **Named Entity Recognition (NER):** Leveraging spaCy’s NER capabilities, the system identifies and highlights essential entities in user queries, such as "President," "rights," and "court." This step aids in narrowing down relevant sections by directly mapping the entities to specific articles or sub-sections within the Constitution.

* **Semantic Similarity Scoring:** To handle diverse query phrasing, the bot calculates semantic similarity scores, allowing it to match terms with similar or alternative wording. For example, queries containing terms like "entitlements" are effectively matched to sections on "rights," increasing the system's robustness in interpreting different user expressions.

* **Query Expansion:** To ensure high response accuracy, the model employs query expansion techniques that enhance its ability to recognize variations of key legal terms. This way, synonyms and related terms are captured, broadening the model’s understanding and ability to retrieve accurate responses for users.
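
As a rough illustration of query expansion, the sketch below adds related terms to the query and then ranks sections by term overlap. It is an assumption-laden stand-in: the `RELATED_TERMS` table, the section keyword sets, and the Jaccard overlap score are hypothetical choices for the example and do not reproduce spaCy's similarity scoring.

```python
# Hypothetical related-term table; the entries are illustrative assumptions.
RELATED_TERMS = {
    "entitlements": {"rights"},
    "holiday": {"day"},
}


def expand_query(tokens, related_terms):
    """Return the original tokens plus any related terms."""
    expanded = set(tokens)
    for token in tokens:
        expanded |= related_terms.get(token, set())
    return expanded


def overlap_score(query_terms, section_terms):
    """Jaccard overlap between two term sets (0.0 when both are empty)."""
    union = query_terms | section_terms
    return len(query_terms & section_terms) / len(union) if union else 0.0


def best_section(query, section_keywords, related_terms):
    """Pick the section whose keywords overlap most with the expanded query."""
    terms = expand_query(query.lower().split(), related_terms)
    scores = {name: overlap_score(terms, kws) for name, kws in section_keywords.items()}
    if not scores:
        return None
    name = max(scores, key=scores.get)
    return name if scores[name] > 0 else None
```

With a hypothetical section table such as `{"bill of rights": {"rights", "freedoms"}}`, a query containing "entitlements" reaches that section only because of the expansion step.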

### Natural Language Understanding (NLU) 
The NLU component is integral to interpreting the purpose of a user’s query, focusing on Intent Recognition, Entity Recognition, and Response Matching:

* **Intent Recognition:** The NLU module discerns the underlying intent of user questions, mapping them to predefined legal themes. For instance, queries like “What are my rights?” or “Explain judiciary powers” are linked to topics such as citizens' rights and judicial authority, facilitating accurate document retrieval.

* **Entity Recognition:** Essential entities within user queries (e.g., “President,” “court system,” “constitution”) are identified and used to pinpoint sections within the Constitution. This helps the bot retrieve text that aligns closely with the user's inquiry by understanding specific legal terms and names.

* **Response Matching:** By combining intent recognition and entity identification, the NLU module prioritizes the most relevant sections, ensuring that responses align closely with the intended question. This matching process improves the bot’s precision and accuracy, especially for queries that may overlap in meaning across different sections.
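
A minimal keyword-based sketch of the first two NLU steps follows. The theme names, trigger words, and entity list are hypothetical, and a production system would more likely rely on a trained model or spaCy's pipeline than on fixed word lists; response matching would then rank sections using the two outputs returned here.

```python
import re

# Hypothetical legal themes and the words that trigger them.
INTENT_KEYWORDS = {
    "citizens_rights": {"rights", "freedom", "freedoms"},
    "judicial_authority": {"judiciary", "court", "courts", "judge"},
}

# Hypothetical entity list used for simple lookup-based recognition.
ENTITIES = {"president", "court", "constitution", "parliament"}


def tokenize(query):
    """Lowercase the query and keep alphabetic words only."""
    return re.findall(r"[a-z]+", query.lower())


def recognise_intent(tokens):
    """Return the theme with the most trigger-word hits, or None."""
    token_set = set(tokens)
    hits = {intent: len(words & token_set) for intent, words in INTENT_KEYWORDS.items()}
    best = max(hits, key=hits.get)
    return best if hits[best] > 0 else None


def extract_entities(tokens):
    """Return known entities in the order they appear in the query."""
    return [t for t in tokens if t in ENTITIES]


def interpret(query):
    """Combine intent and entities into one structure for response matching."""
    tokens = tokenize(query)
    return {"intent": recognise_intent(tokens), "entities": extract_entities(tokens)}
```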






***
## Evaluation
***
### Testing and Accuracy:
* **Functionality Testing:** A series of sample questions (e.g., “What is the supremacy of the constitution?”) were posed to the chatbot to confirm it accurately retrieves the relevant sections.

* **Manual Verification:** Each response is checked for accuracy, especially in high-ambiguity areas such as overlapping legal terms.

* **Relevance:** Ensuring the answer aligns well with user intent and contains the most relevant constitutional articles.

* **User Feedback:** Post-deployment feedback collected via Telegram will help in refining the system.
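
Functionality testing of this kind can be partly automated. The sketch below is one possible harness, assuming an `answer_fn` callable that takes a question and returns the answer text; the function name `evaluate` and the "expected snippet" convention are assumptions, and manual verification is still needed for ambiguous cases.

```python
def evaluate(answer_fn, test_cases):
    """Score a Q&A function against (question, expected_snippet) pairs.

    A case passes when the expected snippet appears in the answer
    (case-insensitive). Returns (accuracy, failures).
    """
    failures = []
    for question, expected in test_cases:
        answer = answer_fn(question)
        if expected.lower() not in answer.lower():
            failures.append((question, expected, answer))
    if not test_cases:
        return 0.0, failures
    accuracy = (len(test_cases) - len(failures)) / len(test_cases)
    return accuracy, failures
```

The returned failures list keeps the actual answer next to the expected snippet, which makes the manual review step quicker.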

### Limitations:
* **Ambiguity in Legal Terminology:** Variances in phrasing might lead to misinterpretation without sufficient synonym mapping.

* **Complex Queries:** Longer, multifaceted questions may require additional processing steps to break down and respond accurately.
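
One possible first step for multifaceted questions is to split them into simpler parts and answer each part separately. The sketch below is an assumption about how that could look, not something the notebook implements; the function name and the list of split markers are hypothetical.

```python
import re


def split_compound_query(query):
    """Split a question on '?', ';' and the connectives 'and' / 'also'."""
    parts = re.split(r"\?|;|\band\b|\balso\b", query, flags=re.IGNORECASE)
    return [part.strip() for part in parts if part.strip()]
```

Splitting on "and" is deliberately naive: a phrase such as "State and religion" would be broken in two, so a real system would need to protect known headings before splitting.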


***
## Deployment
***
* **Telegram Integration:** The chatbot is deployed on Telegram, allowing users to ask questions in a familiar messaging environment. Telegram’s API facilitates interaction between the chatbot and users, making it easily accessible.

* **Real-time Response:** When a user submits a query on Telegram, the NLU component processes it, and the system responds with the best-matching constitutional article.

* **Continued Updates:** Based on user feedback, the model can be updated to improve understanding and expand coverage.
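
The request/response flow can be sketched against the Telegram Bot API's `sendMessage` method using only the standard library. This is an illustration under assumptions: the notebook does not state which Telegram client library it uses, and `build_reply`, `send_reply`, and `answer_fn` are hypothetical names.

```python
import json
import urllib.request

API_URL = "https://api.telegram.org/bot{token}/sendMessage"
MAX_MESSAGE_LENGTH = 4096  # Telegram's limit for a single text message


def build_reply(update, answer_fn):
    """Turn an incoming update dict into a sendMessage payload, or None."""
    message = update.get("message") or {}
    chat = message.get("chat") or {}
    text = message.get("text")
    if text is None or "id" not in chat:
        return None  # not a plain text message; nothing to answer
    answer = answer_fn(text)
    return {"chat_id": chat["id"], "text": answer[:MAX_MESSAGE_LENGTH]}


def send_reply(token, payload):
    """POST the payload to Telegram; needs a real bot token and network access."""
    request = urllib.request.Request(
        API_URL.format(token=token),
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Keeping `build_reply` free of network calls means the matching logic can be tested offline, while `send_reply` stays a thin wrapper around the HTTP request.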


## Table of Contents
1. [Imports and Setup](#imports-and-setup)
2. [Text Extraction from PDF](#text-extraction-from-pdf)
3. [Text Preprocessing](#text-preprocessing)
4. [Synonym and QA Mapping](#synonym-and-qa-mapping)
5. [Question Answering System](#question-answering-system)
6. [Bot Integration](#bot-integration)
7. [Data Export](#data-export)

---

## Imports and Setup


In [1]:
import spacy
import pdfplumber
from spellchecker import SpellChecker


## Text Extraction from PDF

### Function to Extract Specific Pages


In [2]:
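# Path to your PDF file (raw string so the backslashes are not treated as escapes)
pdf_path = r"E:\SCHOOL\Phase 5\Kenya Constitution\Kenya Constitution.pdf"

# Function to extract text from specific pages
# start_page and end_page are 0-based page indices; end_page itself is excluded
def extract_specific_pages(pdf_path, start_page, end_page):
    with pdfplumber.open(pdf_path) as pdf:
        extracted_text = ""
        for page_num in range(start_page, end_page):
            page = pdf.pages[page_num]
            # Guard against extract_text() returning None for a page with no text
            extracted_text += (page.extract_text() or "") + "\n"
        return extracted_text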

# Chapter 1

Extract text for this chapter

In [3]:
chapter_1 = extract_specific_pages(pdf_path, 12, 14)
chapter_1_trimmed = chapter_1.split("CHAPTER TWO")[0].strip()
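
Because `extract_specific_pages` loops over `range(start_page, end_page)`, the call above reads page indices 12 and 13 only. A small helper can make the conversion from human-style page numbers explicit. This is a hypothetical convenience, not part of the original notebook, and it assumes pages are counted from 1 as a PDF viewer would show them.

```python
def page_indices(first_page, last_page):
    """Map 1-based, inclusive page numbers to 0-based indices for pdf.pages."""
    if first_page < 1 or last_page < first_page:
        raise ValueError("expected 1 <= first_page <= last_page")
    return range(first_page - 1, last_page)
```

For example, `page_indices(13, 14)` covers the same two pages as the `(12, 14)` arguments used in the cell above.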



## Text Preprocessing

### Preprocessing User Queries


In [4]:
# Load spaCy model
nlp = spacy.load("en_core_web_sm")

# Preprocess the user query using spaCy
def preprocess_query(query):
    # Parse the query with spaCy
    doc = nlp(query)
    # Normalize the query: lowercase, lemmatize, and remove stopwords
    tokens = [token.lemma_.lower() for token in doc if not token.is_stop and not token.is_punct]
    return " ".join(tokens)

# Example usage of the preprocessing function
user_query = "What is the supremacy of the constitution?"
processed_query = preprocess_query(user_query)
print(processed_query)


supremacy constitution


In [5]:
def split_chapter(chapter_text):
    # Split at key headings and strip extra whitespace
    sections = {
        "sovereignty": [],
        "supremacy": [],
        "defence": []   
    }

    # Split by new lines to process line by line
    lines = chapter_text.splitlines()

    current_section = None

    for line in lines:
        stripped_line = line.strip()

        if stripped_line.startswith("Sovereignty of the people"):
            current_section = "sovereignty"
        elif stripped_line.startswith("Supremacy of this Constitution"):
            current_section = "supremacy"
        elif stripped_line.startswith("Defence of this Constitution"):
            current_section = 'defence'

        # Append line to the current section if it's set
        if current_section:
            sections[current_section].append(stripped_line)

    # Join each section into a single string
    for key in sections:
        sections[key] = "\n".join(sections[key])

    return sections

# Split the chapter into sections
chapter_1_sections = split_chapter(chapter_1_trimmed)

print("Example")
print("\nSupremacy Section:\n", chapter_1_sections['supremacy'])
print("\nDefence Section:\n", chapter_1_sections['defence'])

Example

Supremacy Section:
 Supremacy of this Constitution.
2. (1) This Constitution is the supreme law of the Republic and
binds all persons and all State organs at both levels of government.
(2) No person may claim or exercise State authority except as
authorised under this Constitution.
(3) The validity or legality of this Constitution is not subject to
challenge by or before any court or other State organ.
(4) Any law, including customary law, that is inconsistent with this
Constitution is void to the extent of the inconsistency, and any act or
omission in contravention of this Constitution is invalid.
(5) The general rules of international law shall form part of the
law of Kenya.
14 Constitution of Kenya, 2010
(6) Any treaty or convention ratified by Kenya shall form part of
the law of Kenya under this Constitution.

Defence Section:
 Defence of this Constitution.
3. (1) Every person has an obligation to respect, uphold and
defend this Constitution.
(2) Any attempt to establish a
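
Since `split_chapter` is rewritten for each chapter with a new `if`/`elif` chain, the same logic could be driven by a heading-to-key table instead. The sketch below is one way to do that; `split_by_headings` and `CHAPTER_1_HEADINGS` are hypothetical names, while the heading strings are the ones used in the cell above.

```python
CHAPTER_1_HEADINGS = {
    "Sovereignty of the people": "sovereignty",
    "Supremacy of this Constitution": "supremacy",
    "Defence of this Constitution": "defence",
}


def split_by_headings(chapter_text, heading_to_key):
    """Assign each line to the section whose heading most recently appeared."""
    sections = {key: [] for key in heading_to_key.values()}
    current = None
    for line in chapter_text.splitlines():
        stripped = line.strip()
        for heading, key in heading_to_key.items():
            if stripped.startswith(heading):
                current = key
                break
        # Lines before the first heading are skipped
        if current is not None:
            sections[current].append(stripped)
    return {key: "\n".join(lines) for key, lines in sections.items()}
```

With this shape, supporting another chapter only requires a new dictionary rather than a new function definition.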

### Question Answering mechanism for chapter 1

In [6]:
# Load spaCy model
nlp = spacy.load("en_core_web_sm")

# Preprocess the user query using spaCy
def preprocess_query(query):
    # Parse the query with spaCy
    doc = nlp(query)
    # Normalize the query: lowercase, lemmatize, and remove stopwords
    tokens = [token.lemma_.lower() for token in doc if not token.is_stop and not token.is_punct]
    return " ".join(tokens)

# Define the sections variable with Chapter 1
sections = chapter_1_sections

# Define the QA mapping based on key phrases and corresponding sections
qa_mapping = {
    "supremacy": "supremacy of this constitution",
    "sovereignty": "sovereignty of the people",
    "defence": "defence of this constitution"
    # Add more mappings as needed
}

# Update the Q&A system to use preprocessed queries
def answer_question_nlp(query, sections, qa_mapping):
    # Preprocess the user query
    processed_query = preprocess_query(query)
    
    # Search for a key in qa_mapping that matches the preprocessed query
    for key in qa_mapping:
        if key in processed_query:
            # Return the relevant section in the text
            return sections[key]
            
    return "Sorry, I couldn't find an answer to your question."

# Example usage
user_query = "What is the supremacy of the constitution?"
answer = answer_question_nlp(user_query, sections, qa_mapping)
print(answer)

# Example 2
user_query_2 = 'Explain what the defence the people states'
answer_2 = answer_question_nlp(user_query_2, sections, qa_mapping)
answer_2


Supremacy of this Constitution.
2. (1) This Constitution is the supreme law of the Republic and
binds all persons and all State organs at both levels of government.
(2) No person may claim or exercise State authority except as
authorised under this Constitution.
(3) The validity or legality of this Constitution is not subject to
challenge by or before any court or other State organ.
(4) Any law, including customary law, that is inconsistent with this
Constitution is void to the extent of the inconsistency, and any act or
omission in contravention of this Constitution is invalid.
(5) The general rules of international law shall form part of the
law of Kenya.
14 Constitution of Kenya, 2010
(6) Any treaty or convention ratified by Kenya shall form part of
the law of Kenya under this Constitution.


'Defence of this Constitution.\n3. (1) Every person has an obligation to respect, uphold and\ndefend this Constitution.\n(2) Any attempt to establish a government otherwise than in\ncompliance with this Constitution is unlawful.'

# CHAPTER 1 NLU

In [7]:
# Define synonym mapping
synonyms_1 = {
    "supremacy" : ["supremacy", "authority", "ultimate power"],
    "sovereignty" : ["sovereignty", "power of the people", "authority of the people"],
    # Include the key's own spelling so queries using "defence" also match
    "defence" : ["defence", "defense", "protection", "preservation"]
}

# QA mapping
qa_mapping = {
    "supremacy": "supremacy",
    "sovereignty": "sovereignty",
    "defence": "defence"
}

sections = chapter_1_sections

# Preprocess the query
def preprocess_query(query):
    doc = nlp(query)
    tokens = [token.lemma_.lower() for token in doc if not token.is_stop and not token.is_punct]
    return " ".join(tokens)

# Function to correct the spelling
def correct_spelling(processed_query):
    spell = SpellChecker()
    words = processed_query.split()
    # Find misspelled words
    misspelled_words = spell.unknown(words)


    corrected_words = []
    for word in words:
        # Correct the word if it's misspelled
        if word in misspelled_words:
            # correction() can return None when no candidate is found, so fall back to the original word
            corrected_word = spell.correction(word) or word
            corrected_words.append(corrected_word)
        else:
            corrected_words.append(word)

    # Reconstruct the initial sentence
    corrected_input = " ".join(corrected_words)

    return corrected_input

            

# Match with synonym support
def match_with_synonyms(query, qa_mapping, synonyms_1):
    processed_query = preprocess_query(query)
    print(f"Processed Query: {processed_query}")  # Debugging line

    # Correct spelling in the processed query
    corrected_query = correct_spelling(processed_query)
    print(f"Corrected Query: {corrected_query}")  # Debugging line


    for key, value in qa_mapping.items():
        print(f"Checking key: {key}, value: {value}")  # Debugging line
        for synonym in synonyms_1.get(key, [key]):
            print(f"Trying synonym: {synonym}")  # Debugging line
            if synonym in corrected_query:
                print(f"Match found with synonym: {synonym}")  # Debugging line
                return key  # Only return section value of the key
    
    print("No match found")  # Debugging line if no match is found
    return None

# Answer function with synonym matching and spelling correction
def answer_question_nlp(query, sections, qa_mapping, synonyms):
    section_key = match_with_synonyms(query, qa_mapping, synonyms)
    
    if section_key:
        # Retrieve the relevant section from the specified chapter
        return sections.get(section_key, "Section not found.")
    
    return "Sorry, I couldn't find an answer to your question."

# Example
user_query = "What is the ultimate power of the constitution?"
answer = answer_question_nlp(user_query, sections, qa_mapping, synonyms_1)
print(answer)

# Example 2
user_query2 = "What does the constitution say about defensey?"
answer = answer_question_nlp(user_query2, sections, qa_mapping, synonyms_1)
print(answer)


Processed Query: ultimate power constitution
Corrected Query: ultimate power constitution
Checking key: supremacy, value: supremacy
Trying synonym: supremacy
Trying synonym: authority
Trying synonym: ultimate power
Match found with synonym: ultimate power
Supremacy of this Constitution.
2. (1) This Constitution is the supreme law of the Republic and
binds all persons and all State organs at both levels of government.
(2) No person may claim or exercise State authority except as
authorised under this Constitution.
(3) The validity or legality of this Constitution is not subject to
challenge by or before any court or other State organ.
(4) Any law, including customary law, that is inconsistent with this
Constitution is void to the extent of the inconsistency, and any act or
omission in contravention of this Constitution is invalid.
(5) The general rules of international law shall form part of the
law of Kenya.
14 Constitution of Kenya, 2010
(6) Any treaty or convention ratified by Kenya 
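
One caveat with the matching above: the query has its stopwords removed before comparison, but the synonym phrases do not, so a phrase such as "power of the people" cannot appear in a processed query like "power people". The sketch below shows one possible remedy, normalising both sides the same way and preferring the longest matching phrase. It is a stdlib-only illustration: `STOPWORDS` is a small stand-in for spaCy's stop list, there is no lemmatization, and the function names are hypothetical.

```python
import re

# Small stand-in stop list; spaCy's own list is much larger.
STOPWORDS = {"what", "is", "the", "of", "a", "an", "does", "say", "about"}


def normalise(text):
    """Lowercase, drop punctuation and stopwords; returns a token list."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def contains_phrase(tokens, phrase_tokens):
    """True when phrase_tokens occurs as a contiguous run inside tokens."""
    n = len(phrase_tokens)
    return n > 0 and any(
        tokens[i:i + n] == phrase_tokens for i in range(len(tokens) - n + 1)
    )


def match_section(query, synonyms):
    """Return the key whose longest normalised synonym appears in the query."""
    tokens = normalise(query)
    best_key, best_len = None, 0
    for key, phrases in synonyms.items():
        for phrase in phrases:
            phrase_tokens = normalise(phrase)
            if len(phrase_tokens) > best_len and contains_phrase(tokens, phrase_tokens):
                best_key, best_len = key, len(phrase_tokens)
    return best_key
```

Preferring the longest phrase also lets "authority of the people" resolve to the sovereignty section even though the shorter synonym "authority" belongs to supremacy and is checked first.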

# CHAPTER 2

In [8]:
# Extract chapter 2 content
chapter_2 = extract_specific_pages(pdf_path, 13, 16)
chapter_2_cleaned = chapter_2.split("CHAPTER TWO")[1].strip()
chapter_2_trimmed = chapter_2_cleaned.split("CHAPTER THREE")[0].strip()
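
The two `split(...)` calls above assume both chapter markers are present in the extracted pages; if "CHAPTER TWO" were missing, indexing `[1]` would raise an `IndexError`. A small helper based on `str.partition` can make that failure explicit. This is a hypothetical sketch (`trim_between` is not part of the notebook).

```python
def trim_between(text, start_marker=None, end_marker=None):
    """Return the text after start_marker and before end_marker, stripped."""
    if start_marker is not None:
        _, found, text = text.partition(start_marker)
        if not found:
            raise ValueError(f"start marker {start_marker!r} not found")
    if end_marker is not None:
        # If the end marker is absent, the remaining text is kept whole
        text, _, _ = text.partition(end_marker)
    return text.strip()
```

Like the original `split` calls, this cuts at the first occurrence of each marker, so a marker that also appears in a page header would still need handling.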

In [9]:
def split_chapter(chapter_text):
    # Split at key headings and strip extra whitespace
    sections = {
        "declaration republic": [],
        "territory": [],
        "devolution": [],
        "language": [],
        "religion": [],
        "symbol": [],
        "day": [],
        "national value principle governance": [],
        "culture": []
    }

    # Split by new lines to process line by line
    lines = chapter_text.splitlines()

    current_section = None

    for line in lines:
        stripped_line = line.strip()

        # Start of new sections
        # Compared case-insensitively in case the PDF heading capitalises "Republic"
        if stripped_line.lower().startswith("declaration of the republic"):
            current_section = "declaration republic"
        elif stripped_line.startswith("Territory of Kenya"):
            current_section = "territory"
        elif stripped_line.startswith("Devolution and access to services"):
            current_section = 'devolution'
        elif stripped_line.startswith("National, official and other languages"):
            current_section = "language"
        elif stripped_line.lower().startswith("state and religion"):  # case-insensitive match
            current_section = "religion"
        elif stripped_line.startswith("National symbols and national days"):
            current_section = "symbol"
        elif stripped_line.startswith("The national days are"):
            current_section = "day"
        elif stripped_line.startswith("National values and principles"):
            current_section = "national value principle governance"
        elif stripped_line.startswith("Culture"):
            current_section = "culture"

        # Append line to the current section if it's set
        if current_section:
            # Prevent adding "days" content to "symbols"
            if current_section == "symbol" and "The national days are" in stripped_line:
                current_section = "day"

            sections[current_section].append(stripped_line)

    # Join each section into a single string
    for key in sections:
        sections[key] = "\n".join(sections[key])

    return sections

# Split the chapter into sections
chapter_2_sections = split_chapter(chapter_2_trimmed)

# Print to verify the results
print("\n National Symbols Section:\n", chapter_2_sections["symbol"])
print("\n National Days Section:\n", chapter_2_sections["day"])



 National Symbols Section:
 National symbols and national days.
9. (1) The national symbols of the Republic are—
(a) the national flag;
(b) the national anthem;
(c) the coat of arms; and
(d) the public seal.
(2) The national symbols are as set out in the Second Schedule.

 National Days Section:
 (3) The national days are—
(a) Madaraka Day, to be observed on 1st June;
(b) Mashujaa Day, to be observed on 20th October; and
(c) Jamhuri Day, to be observed on 12th December.
(4) A national day shall be a public holiday.
(5) Parliament may enact legislation prescribing other public
holidays, and providing for observance of public holidays.


# Question Answering mechanism

In [10]:
# Load spaCy model
nlp = spacy.load("en_core_web_sm")

# Preprocess the user query using spaCy
def preprocess_query(query):
    # Parse the query with spaCy
    doc = nlp(query)
    # Normalize the query: lowercase, lemmatize, and remove stopwords
    tokens = [token.lemma_.lower() for token in doc if not token.is_stop and not token.is_punct]
    return " ".join(tokens)

# Define the sections variable with Chapter 2
sections = chapter_2_sections

# Define the QA mapping based on key phrases and corresponding sections
qa_mapping = {
    "declaration republic": "declaration republic",
    "territory": "territory", 
    "devolution": "devolution",
    "language": "language",
    "religion": "state and religion",
    "symbol": "symbol",
    "day": "day",
    "national value principle governance": "national value principle governance",
    "culture": "culture"
}



# Update the Q&A system to use preprocessed queries
def answer_question_nlp(query, sections, qa_mapping):
    # Preprocess the user query
    processed_query = preprocess_query(query)

    # Debug
    print(f"Processed query: {processed_query}")
    
    # Search for a key in qa_mapping that matches the preprocessed query
    for key in qa_mapping:
        if key in processed_query:
            # Debug line
            print(f"key: {key}")
            # Return the relevant section in the text
            return sections[key]
        
    
        
    return "Sorry, I couldn't find an answer to your question."

# Example usage
user_query = "National days in the constitution?"
answer = answer_question_nlp(user_query, sections, qa_mapping)
print(answer)

print('\n')

user_query2 = "What does the culture entail?"
answer = answer_question_nlp(user_query2, sections, qa_mapping)
print(answer)

Processed query: national day constitution
key: day
(3) The national days are—
(a) Madaraka Day, to be observed on 1st June;
(b) Mashujaa Day, to be observed on 20th October; and
(c) Jamhuri Day, to be observed on 12th December.
(4) A national day shall be a public holiday.
(5) Parliament may enact legislation prescribing other public
holidays, and providing for observance of public holidays.


Processed query: culture entail
key: culture
Culture.
11. (1) This Constitution recognises culture as the foundation of
the nation and as the cumulative civilization of the Kenyan people and
nation.
(2) The State shall—
(a) promote all forms of national and cultural expression through
literature, the arts, traditional celebrations, science,
communication, information, mass media, publications,
libraries and other cultural heritage;
(b) recognise the role of science and indigenous technologies in
the development of the nation; and
(c) promote the intellectual property rights of the people of Keny

# CHAPTER 2 NLU

In [11]:
# Define synonym mapping
synonyms_2 = {
    "declaration": ["declaration", "proclamation"],
    "territory": ["territory", "land", "region",],
    "devolution": ["devolution", "decentralization"],
    "language": ["language", "dialects", "official languages"],
    "religion": ["religion", "faith", "belief",],
    "symbol": ["symbol", "emblem", "insignia"],
    "day": ["day", "holiday", "public holiday"],
    "national value principle governance": ["national value principle governance"],
    "culture": ["culture", "heritage", "tradition"]
    
}

# QA mapping
qa_mapping = {
    "declaration republic": "declaration republic",
    "territory": "territory", 
    "devolution": "devolution",
    "language": "language",
    "religion": "religion",
    "symbol": "symbol",
    "day": "day",
    "national value principle governance": "national value principle governance",
    "culture": "culture"
}

sections = chapter_2_sections

# Preprocess the query
def preprocess_query(query):
    doc = nlp(query)
    tokens = [token.lemma_.lower() for token in doc if not token.is_stop and not token.is_punct]
    return " ".join(tokens)

# Function to correct the spelling
def correct_spelling(processed_query):
    spell = SpellChecker()
    words = processed_query.split()
    # Find misspelled words
    misspelled_words = spell.unknown(words)


    corrected_words = []
    for word in words:
        # Correct the word if it's misspelled
        if word in misspelled_words:
            # correction() can return None when no candidate is found, so fall back to the original word
            corrected_word = spell.correction(word) or word
            corrected_words.append(corrected_word)
        else:
            corrected_words.append(word)

    # Reconstruct the initial sentence
    corrected_input = " ".join(corrected_words)

    return corrected_input

            

# Match with synonym support
def match_with_synonyms(query, qa_mapping, synonyms_2):
    processed_query = preprocess_query(query)
    print(f"Processed Query: {processed_query}")  # Debugging line

    # Correct spelling in the processed query
    corrected_query = correct_spelling(processed_query)
    print(f"Corrected Query: {corrected_query}")  # Debugging line


    for key, value in qa_mapping.items():
        print(f"Checking key: {key}, value: {value}")  # Debugging line
        for synonym in synonyms_2.get(key, [key]):
            print(f"Trying synonym: {synonym}")  # Debugging line
            if synonym in corrected_query:
                print(f"Match found with synonym: {synonym}")  # Debugging line
                return key  # Only return section value of the key
    
    print("No match found")  # Debugging line if no match is found
    return None

# Answer function with synonym matching and spelling correction
def answer_question_nlp(query, sections, qa_mapping, synonyms):
    section_key = match_with_synonyms(query, qa_mapping, synonyms)
    
    if section_key:
        # Retrieve the relevant section from the specified chapter
        return sections.get(section_key, "Section not found.")
    
    return "Sorry, I couldn't find an answer to your question."

# Example
user_query = "What of heritage?"
answer = answer_question_nlp(user_query, sections, qa_mapping, synonyms_2)
print(answer)

# Example 2
user_query2 = "What does the constitution say about insignia?"
answer = answer_question_nlp(user_query2, sections, qa_mapping, synonyms_2)
print(answer)


Processed Query: heritage
Corrected Query: heritage
Checking key: declaration republic, value: declaration republic
Trying synonym: declaration republic
Checking key: territory, value: territory
Trying synonym: territory
Trying synonym: land
Trying synonym: region
Checking key: devolution, value: devolution
Trying synonym: devolution
Trying synonym: decentralization
Checking key: language, value: language
Trying synonym: language
Trying synonym: dialects
Trying synonym: official languages
Checking key: religion, value: religion
Trying synonym: religion
Trying synonym: faith
Trying synonym: belief
Checking key: symbol, value: symbol
Trying synonym: symbol
Trying synonym: emblem
Trying synonym: insignia
Checking key: day, value: day
Trying synonym: day
Trying synonym: holiday
Trying synonym: public holiday
Checking key: national value principle governance, value: national value principle governance
Trying synonym: national value principle governance
Checking key: culture, value: culture


# CHAPTER 3

In [12]:
# Extract the relevant pages
chapter_3 = extract_specific_pages(pdf_path, 15, 19)
chapter_3_cleaned = chapter_3.split("CHAPTER THREE")[1].strip()
chapter_3_trimmed = chapter_3_cleaned.split("CHAPTER FOUR")[0].strip()
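
The `split(...)[1]` lookup above raises an `IndexError` if the start marker is missing from the extracted pages, and `split(...)[0]` silently returns everything if the end marker is missing. A minimal sketch of a more defensive helper follows, assuming each marker appears in the expected order. `slice_between` and the sample text are hypothetical and not part of the notebook.

```python
def slice_between(text, start_marker, end_marker):
    """Return the text between two markers, or raise a clear error."""
    # partition() returns (before, separator, after); an empty separator
    # means the marker was not found.
    _, found_start, rest = text.partition(start_marker)
    if not found_start:
        raise ValueError(f"Start marker not found: {start_marker!r}")
    body, found_end, _ = rest.partition(end_marker)
    if not found_end:
        raise ValueError(f"End marker not found: {end_marker!r}")
    return body.strip()


# Hypothetical sample standing in for the extracted PDF pages
sample = "... CHAPTER THREE\nCITIZENSHIP\nEntitlements of citizens.\nCHAPTER FOUR\nTHE BILL OF RIGHTS"
chapter_3_demo = slice_between(sample, "CHAPTER THREE", "CHAPTER FOUR")
print(chapter_3_demo)
```

Failing loudly here makes a wrong page range easier to spot than an empty or oversized chapter further down the pipeline.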


In [13]:
def split_chapter(chapter_text):
    # Split at key headings and strip extra whitespace
    sections = {
        "entitlement citizen": [],
        "retention": [],
        "birth": [],
        "registration": [],
        "dual": [],
        "revocation": [],
        "legislation citizen" : [],
    }

    # Split by new lines to process line by line
    lines = chapter_text.splitlines()

    current_section = None

    for line in lines:
        stripped_line = line.strip()

        if stripped_line.startswith("Entitlements of citizens"):
            current_section = "entitlement citizen"
        elif stripped_line.startswith("Retention and acquisition of citizenship"):
            current_section = "retention"
        elif stripped_line.startswith("Citizenship by birth"):
            current_section = 'birth'
        elif stripped_line.startswith("Citizenship by registration"):
            current_section = "registration"
        elif stripped_line.startswith("Dual citizenship"):
            current_section = "dual"
        elif stripped_line.startswith("Revocation of citizenship"):
            current_section = "revocation"
        elif stripped_line.startswith("Legislation on citizenship"):
            current_section = "legislation citizen"

        # Append line to the current section if it's set
        if current_section:
            sections[current_section].append(stripped_line)

    # Join each section into a single string
    for key in sections:
        sections[key] = "\n".join(sections[key])

    return sections

# Split the chapter into sections
chapter_3_sections = split_chapter(chapter_3_trimmed)

print("Example")
print("\nLegislation Section:\n", chapter_3_sections['legislation citizen'])
print('\n')
print("\nEntitlements Section:\n", chapter_3_sections['entitlement citizen'])

Example

Legislation Section:
 Legislation on citizenship.
18. Parliament shall enact legislation—
(a) prescribing procedures by which a person may become a
citizen;
(b) governing entry into and residence in Kenya;
(c) providing for the status of permanent residents;
(d) providing for voluntary renunciation of citizenship;
(e) prescribing procedures for revocation of citizenship;
(f) prescribing the duties and rights of citizens; and
(g) generally giving effect to the provisions of this Chapter.



Entitlements Section:
 Entitlements of citizens.
12. (1) Every citizen is entitled to—
(a) the rights, privileges and benefits of citizenship, subject to the
limits provided or permitted by this Constitution; and
(b) a Kenyan passport and any document of registration or
identification issued by the State to citizens.
Constitution of Kenya, 2010 17
(2) A passport or other document referred to in clause (1) (b) may
be denied, suspended or confiscated only in accordance with an Act of
Parliament tha

#### Question Answering mechanism for chapter 3

In [14]:
# Load spaCy model
nlp = spacy.load("en_core_web_sm")

# Preprocess the user query using spaCy
def preprocess_query(query):
    # Parse the query with spaCy
    doc = nlp(query)
    # Normalize the query: lowercase, lemmatize, and remove stopwords
    tokens = [token.lemma_.lower() for token in doc if not token.is_stop and not token.is_punct]
    return " ".join(tokens)

# Define the sections variable with Chapter 3
sections = chapter_3_sections

# Define the QA mapping based on key phrases and corresponding sections
qa_mapping = {
    "entitlement citizen": "entitlement citizen",
    "retention": "retention",  # Ensure this matches your sections key
    "birth": "birth",
    "registration": "registration",
    "dual": "dual",
    "revocation": "revocation",
    "legislation citizen": "legislation citizen"
    
}

# Update the Q&A system to use preprocessed queries
def answer_question_nlp(query, sections, qa_mapping):
    # Preprocess the user query
    processed_query = preprocess_query(query)
    
    # Search for a key in qa_mapping that matches the preprocessed query
    for key in qa_mapping:
        if key in processed_query:
            # Return the relevant section in the text
            return sections[key]
            
    return "Sorry, I couldn't find an answer to your question."

# Example usage
user_query = "dual citizenship?"
answer = answer_question_nlp(user_query, sections, qa_mapping)
print(answer)

# Example 2
user_query_2 = "how do I become a citizen by registration"
answer_2 = answer_question_nlp(user_query_2, sections, qa_mapping)
print(answer_2)


Dual citizenship.
16. A citizen by birth does not lose citizenship by acquiring the
citizenship of another country.
Citizenship by registration.
15. (1) A person who has been married to a citizen for a period of
at least seven years is entitled on application to be registered as a
citizen.
(2) A person who has been lawfully resident in Kenya for a
continuous period of at least seven years, and who satisfies the
conditions prescribed by an Act of Parliament, may apply to be
registered as a citizen.
(3) A child who is not a citizen, but is adopted by a citizen, is
entitled on application to be registered as a citizen.
18 Constitution of Kenya, 2010
(4) Parliament shall enact legislation establishing conditions on
which citizenship may be granted to individuals who are citizens of
other countries.
(5) This Article applies to a person as from the effective date, but
any requirements that must be satisfied before the person is entitled to
be registered as a citizen shall be regarded as havi

# CHAPTER 3 NLU

In [15]:
citizenship_mapping = {
    "dual citizenship": "dual",
    "retention of citizenship": "retention",
    "citizenship by birth": "birth",
    "citizenship by registration": "registration",
    "legislation on citizenship": "legislation citizen",
    "revocation of citizenship": "revocation"
}


for subtopic, section_key in citizenship_mapping.items():
    print(section_key)

dual
retention
birth
registration
legislation citizen
revocation


In [16]:
# Define synonym mapping
synonyms_3 = {
    "entitlement citizen": ["entitlement citizen", "entitlement right"],
    "retention": ["retention", "maintenance"],
    "birth": ["birth", "native", "inborn citizenship"],
    "registration": ["registration", "citizenship application", "naturalization"],
    "dual": ["dual", "dual citizenship"],
    "revocation": ["revocation", "forfeiture"],
    "legislation citizen": ["legislation citizen", "legislation citizenship"]

}

# QA mapping
qa_mapping = {
    "entitlement citizen": "entitlement citizen",
    "retention": "retention",
    "birth": "birth",
    "registration": "registration",
    "dual": "dual",
    "revocation": "revocation",
    "legislation citizen" : "legislation citizen"
}

# Define a dedicated mapping for citizenship subtopics
citizenship_mapping = {
    "retention citizenship": "retention",
    "citizenship birth": "birth",
    "citizenship registration": "registration",
    "dual citizenship": "dual",
    "revocation citizenship": "revocation",
    "legislation citizenship": "legislation citizen"
}


sections = chapter_3_sections

# Preprocess the query
def preprocess_query(query):
    doc = nlp(query)
    tokens = [token.lemma_.lower() for token in doc if not token.is_stop and not token.is_punct]
    return " ".join(tokens)

# Function to correct the spelling
def correct_spelling(processed_query):
    spell = SpellChecker()
    words = processed_query.split()
    # Find misspelled words
    misspelled_words = spell.unknown(words)


    corrected_words = []
    for word in words:
        # Correct the word if it's misspelled
        if word in misspelled_words:
            corrected_word = spell.correction(word) or word  # Most likely correction; keep the word if none is found
            corrected_words.append(corrected_word)
        else:
            corrected_words.append(word)

    # Reconstruct the initial sentence
    corrected_input = " ".join(corrected_words)

    return corrected_input


# Match with synonym support
def match_with_synonyms(query, qa_mapping, synonyms, citizenship_mapping):
    processed_query = preprocess_query(query)
    print(f"Processed Query: {processed_query}")  # Debugging line

    # Correct spelling in the processed query
    corrected_query = correct_spelling(processed_query)
    print(f"Corrected Query: {corrected_query}")  # Debugging line

    # Check for specific citizenship subtopics first
    for subtopic, section_key in citizenship_mapping.items():
        if subtopic in corrected_query:
            print(f"Specific citizenship topic detected: {subtopic}")
            return section_key  # Return the section key for the specific subtopic

    # Only check for general citizenship if no specific subtopic was found
    if "citizenship" in corrected_query:
        print("General citizenship query detected.")  # Debugging line
        return "citizenship"  # Return a placeholder indicating interest in citizenship

    for key in qa_mapping:
        print(f"Checking key: {key}")  # Debugging line
        for synonym in synonyms.get(key, [key]):
            print(f"Trying synonym: {synonym}")  # Debugging line
            if synonym in corrected_query:
                print(f"Match found with synonym: {synonym}")  # Debugging line
                return key  # Only return section value of the key

    print("No match found")  # Debugging line if no match is found
    return None

# Answer function with synonym matching and spelling correction
def answer_question_nlp(query, sections, qa_mapping, synonyms):
    section_key = match_with_synonyms(query, qa_mapping, synonyms, citizenship_mapping)
    
    if section_key in sections:
        # Retrieve the relevant section from the specified chapter
        return sections.get(section_key, "Section not found.")
    elif section_key == "citizenship":
        return (f"It seems you're interested in citizenship. "
                f"Available subtopics include: {list(citizenship_mapping.keys())}.")
    
    return "Sorry, I couldn't find an answer to your question."

# Example
user_query = "What about entitlements of ctizens?"
answer = answer_question_nlp(user_query, sections, qa_mapping, synonyms_3)
print(answer)

Processed Query: entitlement ctizen
Corrected Query: entitlement citizen
Checking key: entitlement citizen
Trying synonym: entitlement citizen
Match found with synonym: entitlement citizen
Entitlements of citizens.
12. (1) Every citizen is entitled to—
(a) the rights, privileges and benefits of citizenship, subject to the
limits provided or permitted by this Constitution; and
(b) a Kenyan passport and any document of registration or
identification issued by the State to citizens.
Constitution of Kenya, 2010 17
(2) A passport or other document referred to in clause (1) (b) may
be denied, suspended or confiscated only in accordance with an Act of
Parliament that satisfies the criteria referred to in Article 24.


# Since all the functions above run as expected, we combine the constants into shared variables for easy reference in the final code

In [17]:
# Define the synonyms explicitly so that we can easily reference them later.
# Each key must match a key in the combined sections, because the matcher returns the key itself.
synonyms = {
    "supremacy": ["supremacy", "authority", "ultimate power"],
    "sovereignty": ["sovereignty", "power of the people", "authority of the people", "self rule", "autonomy"],
    "defence": ["defence", "defense", "protection", "preservation"],
    "declaration republic": ["declaration republic", "declaration", "proclamation", "statement", "announcement", "affirmation"],
    "territory": ["territory", "land", "region", "area", "jurisdiction", "bounds"],
    "devolution": ["devolution", "decentralization", "delegation", "transfer of power", "local governance", "subsidiarity"],
    "language": ["language", "languages", "tongues", "dialects", "official languages", "linguistic diversity"],
    "religion": ["religion", "faith", "belief systems", "spiritual practice", "secularism", "church-state separation"],
    "symbol": ["symbol", "emblem", "insignia", "representation", "national icon"],
    "day": ["day", "holiday", "observance", "public holiday", "commemoration", "remembrance"],
    "national value principle governance": ["national value principle governance", "value", "principle", "ethic", "core value", "standard", "national ideal",
                                            "governance", "government", "administration", "management", "public service", "political structure"],
    "culture": ["culture", "heritage", "tradition", "customs", "societal norms", "arts"],
    "entitlement citizen": ["entitlement citizen", "entitlement right"],
    "retention": ["retention", "maintenance", "keeping", "preservation", "continuation"],
    "birth": ["birth", "nativity", "origin", "ancestry", "inborn citizenship"],
    "registration": ["registration", "enlistment", "enrollment", "citizenship application", "naturalization"],
    "dual": ["dual", "multiple", "dual nationality", "two-fold citizenship"],
    "revocation": ["revocation", "cancellation", "annulment", "rescission", "forfeiture", "withdrawal"],
    "legislation citizen": ["legislation citizen"]
}

# Define qa mapping explicitly too (keys and values are the section keys)
qa_mapping = {
    "supremacy": "supremacy",
    "sovereignty": "sovereignty",
    "defence": "defence",
    "declaration republic": "declaration republic",
    "territory": "territory",
    "devolution": "devolution",
    "language": "language",
    "religion": "religion",
    "symbol": "symbol",
    "day": "day",
    "national value principle governance": "national value principle governance",
    "culture": "culture",
    "entitlement citizen": "entitlement citizen",
    "retention": "retention",
    "birth": "birth",
    "registration": "registration",
    "dual": "dual",
    "revocation": "revocation",
    "legislation citizen": "legislation citizen"
}

combined_sections = {**chapter_1_sections, **chapter_2_sections, **chapter_3_sections}
# Print the keys of the dictionary
combined_sections.keys()

dict_keys(['sovereignty', 'supremacy', 'defence', 'declaration republic', 'territory', 'devolution', 'language', 'religion', 'symbol', 'day', 'national value principle governance', 'culture', 'entitlement citizen', 'retention', 'birth', 'registration', 'dual', 'revocation', 'legislation citizen'])
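
Because the matcher returns a key that is then looked up in `combined_sections`, any key that exists in one dictionary but not the others fails silently at query time. A small consistency check can surface such mismatches early. The sketch below is illustrative only: `find_key_mismatches` and the toy dictionaries are hypothetical and are not the notebook's data.

```python
def find_key_mismatches(qa_mapping, synonyms, sections):
    """Report keys that would never resolve to a section or a synonym list."""
    return {
        # Matcher keys with no section text behind them
        "qa_keys_without_section": sorted(k for k in qa_mapping if k not in sections),
        # Matcher keys that fall back to [key] because no synonyms are registered
        "qa_keys_without_synonyms": sorted(k for k in qa_mapping if k not in synonyms),
        # Synonym lists the matcher never reaches
        "synonym_keys_unused": sorted(k for k in synonyms if k not in qa_mapping),
    }


# Hypothetical toy data that reproduces typical naming drift
demo_sections = {"language": "...", "declaration republic": "..."}
demo_qa = {"language": "language", "declaration": "declaration"}
demo_syn = {"languages": ["languages", "dialects"], "declaration": ["declaration"]}

report = find_key_mismatches(demo_qa, demo_syn, demo_sections)
for name, keys in report.items():
    print(name, keys)
```

Running a check like this right after building the combined dictionaries keeps the three tables in step as more chapters are added.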


# CHAPTER 4

In [18]:
# extracting chapter 4 and trimming only to contain chapter 4 text
chapter_4 = extract_specific_pages(pdf_path, 18, 41)
chapter_4_trimmed = chapter_4.split("CHAPTER FOUR", 1)[1].split("CHAPTER FIVE", 1)[0].strip()

In [19]:
# Load spaCy model
nlp = spacy.load("en_core_web_sm")

# Preprocess the user query using spaCy
def preprocess_query(query):
    # Parse the query with spaCy
    doc = nlp(query)
    # Normalize the query: lowercase, lemmatize, and remove stopwords
    tokens = [token.lemma_.lower() for token in doc if not token.is_stop and not token.is_punct]
    return " ".join(tokens)

# Example usage of the preprocessing function
user_query = "What is the supremacy of the constitution?"
processed_query = preprocess_query(user_query)
print(processed_query)

supremacy constitution


In [20]:
def split_chapter_4(chapter_text):
    # Split at key headings and strip extra whitespace
    sections = {
        "fundamental right freedom": [],
        "application bill right": [],
        "implementation right": [],
        "enforcement bill right": [],
        "authority court": [],
        "limitation right": [],
        "fundamental right freedom limit": [],
        "life": [],
        "equality": [],
        "dignity": [],
        "security": [],
        "slavery": [],
        "privacy": [],
        "conscience": [],
        "expression": [],
        "medium": [],
        "information": [],
        "association": [],
        "assembly": [],
        "political": [],
        "movement": [],
        "property": [],
        "work": [],
        "environment": [],
        "economic": [],
        "language culture": [],
        "family": [],
        "consumer": [],
        "fair administrative action": [],
        "justice": [],
        "arrest": [],
        "fair hearing": [],
        "custody": [],
        "interpret": [],
        "infant": [],
        "disable": [],
        "youth": [],
        "minority": [],
        "old": [],
        "emergency": [],
        "national human right commission": []
    }

    # Split by new lines to process line by line
    lines = chapter_text.splitlines()

    current_section = None

    for line in lines:
        stripped_line = line.strip()

        if stripped_line.startswith("Rights and fundamental freedoms"):
            current_section = "fundamental right freedom"
        elif stripped_line.startswith("Application of Bill of Rights"):
            current_section = "application bill right"
        elif stripped_line.startswith("Implementation of rights and fundamental freedoms"):
            current_section = "implementation right"
        elif stripped_line.startswith("Enforcement of Bill of Rights"):
            current_section = "enforcement bill right"
        elif stripped_line.startswith("Authority of courts to uphold and enforce the Bill of Rights"):
            current_section = "authority court"
        elif stripped_line.startswith("Limitation of rights and fundamental freedoms"):
            current_section = "limitation right"
        elif stripped_line.startswith("Fundamental Rights and freedoms that may not be limited"):
            current_section = "fundamental right freedom limit"
        elif stripped_line.startswith("Right to life"):
            current_section = "life"
        elif stripped_line.startswith("Equality and freedom from discrimination"):
            current_section = "equality"
        elif stripped_line.startswith("Human dignity"):
            current_section = "dignity"
        elif stripped_line.startswith("Freedom and security of the person"):
            current_section = "security"
        elif stripped_line.startswith("Slavery, servitude and forced labour"):
            current_section = "slavery"
        elif stripped_line.startswith("Privacy"):
            current_section = "privacy"
        elif stripped_line.startswith("Freedom of conscience, religion, belief and opinion"):
            current_section = "conscience"
        elif stripped_line.startswith("Freedom of expression"):
            current_section = "expression"
        elif stripped_line.startswith("Freedom of the media"):
            current_section = "medium"
        elif stripped_line.startswith("Access to information"):
            current_section = "information"
        elif stripped_line.startswith("Freedom of association"):
            current_section = "association"
        elif stripped_line.startswith("Assembly, demonstration, picketing and petition"):
            current_section = "assembly"
        elif stripped_line.startswith("Political rights"):
            current_section = "political"
        elif stripped_line.startswith("Freedom of movement and residence"):
            current_section = "movement"
        elif stripped_line.startswith("Protection of right to property"):
            current_section = "property"
        elif stripped_line.startswith("Labour relations"):
            current_section = "work"
        elif stripped_line.startswith("Environment"):
            current_section = "environment"
        elif stripped_line.startswith("Economic and social rights"):
            current_section = "economic"
        elif stripped_line.startswith("Language and culture"):
            current_section = "language culture"
        elif stripped_line.startswith("Family"):
            current_section = "family"
        elif stripped_line.startswith("Consumer rights"):
            current_section = "consumer"
        elif stripped_line.startswith("Fair administrative action"):
            current_section = "fair administrative action"
        elif stripped_line.startswith("Access to justice"):
            current_section = "justice"
        elif stripped_line.startswith("Rights of arrested persons"):
            current_section = "arrest"
        elif stripped_line.startswith("Fair hearing"):
            current_section = "fair hearing"
        elif stripped_line.startswith("Rights of persons detained, held in custody or imprisoned"):
            current_section = "custody"
        elif stripped_line.startswith("Interpretation of this Part"):
            current_section = "interpret"
        elif stripped_line.startswith("Children"):
            current_section = "infant"
        elif stripped_line.startswith("Persons with disabilities"):
            current_section = "disable"
        elif stripped_line.startswith("Youth"):
            current_section = "youth"
        elif stripped_line.startswith("Minorities and marginalised groups"):
            current_section = "minority"
        elif stripped_line.startswith("Older members of society"):
            current_section = "old"
        elif stripped_line.startswith("State of emergency"):
            current_section = "emergency"
        elif stripped_line.startswith("Kenya National Human Rights and Equality Commission"):
            current_section = "national human right commission"
        # Append line to the current section if it's set
        if current_section:
            sections[current_section].append(stripped_line)

    # Join each section into a single string
    for key in sections:
        sections[key] = "\n".join(sections[key])

    return sections

# Split the chapter into sections
chapter_4_sections = split_chapter_4(chapter_4_trimmed)
chapter_4_sections

{'fundamental right freedom': 'Rights and fundamental freedoms.\n19. (1) The Bill of Rights is an integral part of Kenya’s democratic\nstate and is the framework for social, economic and cultural policies.\n(2) The purpose of recognising and protecting human rights and\nfundamental freedoms is to preserve the dignity of individuals and\ncommunities and to promote social justice and the realisation of the\npotential of all human beings.\n(3) The rights and fundamental freedoms in the Bill of Rights—\n(a) belong to each individual and are not granted by the State;\n(b) do not exclude other rights and fundamental freedoms not in\nthe Bill of Rights, but recognised or conferred by law, except\nto the extent that they are inconsistent with this Chapter; and\n(c) are subject only to the limitations contemplated in this\nConstitution.',
 'application bill right': 'Application of Bill of Rights.\n20. (1) The Bill of Rights applies to all law and binds all State\norgans and all persons.\n20 Con
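
The long `elif` chain in `split_chapter_4` repeats one pattern: when a line starts with a known heading, switch the current section. A table-driven variant is sketched below as one possible restructuring. `split_by_headings`, the two-entry heading table, and the placeholder article text are hypothetical and only stand in for the full chapter.

```python
def split_by_headings(chapter_text, heading_to_key):
    """Group lines under the most recently seen heading."""
    sections = {key: [] for key in heading_to_key.values()}
    current_section = None

    for line in chapter_text.splitlines():
        stripped_line = line.strip()
        # Switch section when the line starts with a known heading
        for heading, key in heading_to_key.items():
            if stripped_line.startswith(heading):
                current_section = key
                break
        # Lines before the first heading are skipped
        if current_section:
            sections[current_section].append(stripped_line)

    return {key: "\n".join(lines) for key, lines in sections.items()}


# Hypothetical two-heading table and placeholder text
demo_headings = {"Right to life": "life", "Human dignity": "dignity"}
demo_text = "preamble\nRight to life.\n(article text A)\nHuman dignity.\n(article text B)"

demo_sections = split_by_headings(demo_text, demo_headings)
print(demo_sections["dignity"])
```

With this shape, adding a chapter means supplying a new heading table rather than another function. Short headings such as "Privacy" or "Family" can still match body lines that happen to start with the same word, so the heading table needs the same care as the original chain.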

### Question Answering mechanism for chapter 4

In [21]:
# Define the sections variable with Chapter 4
sections = chapter_4_sections

# Define the QA mapping based on key phrases and corresponding sections
qa_mapping = {
    "fundamental right freedom": "rights and fundamental freedoms",
    "application bill right": "application of Bill of Rights", 
    "implementation right": "implementation of rights and fundamental freedoms", 
    "enforcement bill right": "enforcement of Bill of Rights", 
    "authority court": "authority court",
    "limitation right": "limitation of rights and fundamental freedoms", 
    "fundamental right freedom limit": "fundamental Rights and freedoms that may not be limited",
    "life": "right to life", 
    "equality": "equality and freedom from discrimination", 
    "dignity": "human dignity", 
    "security": "freedom and security of the person", 
    "slavery": "slavery, servitude and forced labour", 
    "privacy": "privacy", 
    "conscience": "freedom of conscience, religion, belief and opinion", 
    "expression": "freedom of expression", 
    "medium": "freedom of the media", 
    "information": "access to information", 
    "association": "freedom of association", 
    "assembly": "assembly, demonstration, picketing and petition", 
    "political": "political rights", 
    "movement": "freedom of movement and residence", 
    "property": "protection of right to property", 
    "work": "labour relations",
    "environment": "environment", 
    "economic": "economic and social rights", 
    "language culture": "language culture", 
    "family": "family", 
    "consumer": "consumer rights", 
    "fair administrative action": "fair administrative action", 
    "justice": "access to justice" ,
    "arrest": "rights of arrested persons", 
    "fair hearing": "fair hearing", 
    "custody": "rights of persons detained, held in custody or imprisoned", 
    "interpret": "interpretation of this Part", 
    "infant": "children", 
    "disable": "persons with disabilities", 
    "youth": "youth", 
    "minority": "minorities and marginalised groups", 
    "old": "older members of society", 
    "emergency": "state of emergency",
    "national human right commission": "kenya national human rights and equality Commission"
}

# Update the Q&A system to use preprocessed queries
def answer_question_nlp(query, sections, qa_mapping):
    # Preprocess the user query
    processed_query = preprocess_query(query)
    print(processed_query)
    
    # Search for a key in qa_mapping that matches the preprocessed query
    for key in qa_mapping:
        if key in processed_query:
            # Return the relevant section in the text
            return sections[key]
            
    return "Sorry, I couldn't find an answer to your question."

# Example usage
user_query = "What about fair hearing?"
answer = answer_question_nlp(user_query, sections, qa_mapping)
print(answer)

fair hearing
Fair hearing.
50. (1) Every person has the right to have any dispute that can be
resolved by the application of law decided in a fair and public hearing
before a court or, if appropriate, another independent and impartial
tribunal or body.
(2) Every accused person has the right to a fair trial, which
includes the right—
(a) to be presumed innocent until the contrary is proved;
(b) to be informed of the charge, with sufficient detail to answer it;
(c) to have adequate time and facilities to prepare a defence;
(d) to a public trial before a court established under this
Constitution;
(e) to have the trial begin and conclude without unreasonable
delay;
(f) to be present when being tried, unless the conduct of the
accused person makes it impossible for the trial to proceed;
(g) to choose, and be represented by, an advocate, and to be
informed of this right promptly;
(h) to have an advocate assigned to the accused person by the
State and at State expense, if substantial injustic
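
The lookup above returns the first key that occurs as a substring of the processed query, so a shorter key can shadow a longer key that contains it (for example "fundamental right freedom" and "fundamental right freedom limit" among the section keys). One possible remedy is to try longer keys first. The sketch below illustrates this idea; `match_longest_key` and the demo keys are hypothetical.

```python
def match_longest_key(processed_query, keys):
    """Return the longest key contained in the query, or None."""
    # Longer keys are more specific, so they are tested before shorter ones
    for key in sorted(keys, key=len, reverse=True):
        if key in processed_query:
            return key
    return None


# Hypothetical subset of section keys
demo_keys = ["fundamental right freedom", "fundamental right freedom limit", "life"]

print(match_longest_key("fundamental right freedom limit", demo_keys))
print(match_longest_key("right life", demo_keys))
```

Sorting by length is a heuristic rather than a full ranking, but it removes the dependence on dictionary insertion order.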

# CHAPTER 4 NLU

In [22]:
# Define synonym mapping
synonyms = {
    "fundamental freedom": ["fundamental freedom", "right fundamental freedom"],
    "application bill right": ["application bill right"],
    "implementation right": ["implementation right", "implementation freedom"],
    "authority court": ["authority court"],
    "limitation right": ["limitation", "freedom restriction", "limit entitlement"],
    "limit": ["limit","absolute right", "inalienable freedom", "immutable right",],
    "life": ["life", "existence", "entitlement to life", "survive"],
    "equality": ["equality", "equal treatment", "justice"],
    "dignity": ["dignity", "intrinsic respect", "inherent value", "personal honor"],
    "security": ["security", "liberty", "personal safety"],
    "slavery": ["slavery", "bondage", "force servitude", "involuntary labor", "coerce work"],
    "privacy": ["privacy", "confidential", "personal space"],
    "conscience": ["conscience", "liberty thought", "religion freedom", "belief"],
    "expression": ["expression", "communication"],
    "medium": ["medium", "press autonomy", "journalist freedom", "right information"],
    "information": ["information", "transparent", "datum"],
    "association": ["association", "union", "group formation", "association right"],
    "assembly": ["assembly", "dissent", "peace assembly", "demonstrate", "petition right"],
    "political": ["political", "electoral right", "political engagement", "vote right"],
    "movement": ["movement", "travel", "resident freedom", "movement liberty"],
    "property": ["property", "property safeguard", "assets security"],
    "work": ["work", "labor relation", "labour practice"],
    "environment": ["environment"],
    "economic": ["economic"],
    "language culture": ["language culture"],
    "family": ["family", "society foundation","parent"],
    "consumer": ["consumer", "consumer right", "client entitlement"],
    "administrative action": ["administrative action", "fair administrative action"],
    "justice": ["justice", "legal access", "access justice", "justice access"],
    "arrest": ["arrest", "right arrest"],
    "fair hear": ["fair hear", "hear"],
    "custody": ["custody","held custody", "imprisoned"],
    "interpret": ["interpret", "explain", "clarify", "overview"],
    "infant": ["infant", "child", "toddler", "kid"],
    "disable": ["disable","disability", "handicap", "impairment", "challenge"],
    "youth": ["youth", "young adult", "adolescent"],
    "minority": ["minority", "marginalise", "marginalize"],
    "old": ["old", "elder", "veteran"],
    "emergency": ["emergency","state emergency"],
    "national human right commission": ["national human right","human right equality commission","national human right equality commission" ]
}

# QA mapping
qa_mapping = {
    "fundamental": ("fundamental"),
    "application": ("application"),
    "implementation": ("implementation"),
    "authority court": ("authority court"),
    "limitation": ("limitation"),
    "limit": ("limit"),
    "life": ("life"),
    "equality": ("equality"),
    "dignity": ("dignity"),
    "security": ("security"),
    "slavery": ("slavery"),
    "privacy": ("privacy"),
    "conscience": ("conscience"),
    "expression": ("expression"),
    "medium": ("medium"),
    "information": ("information"),
    "association": ("association"),
    "assembly": ("assembly"),
    "political": ("political"),
    "movement": ("movement"),
    "property": ("property"),
    "work": ("work"),
    "environment": ("environment"),
    "economic": ("economic"),
    "language": ("language"),
    "family": ("family"),
    "consumer": ("consumer"),
    "fair administrative action": ("fair administrative action"),
    "justice": ("justice"),
    "arrest": ("arrest"),
    "fair hearing": ("fair hearing"),
    "custody": ("custody"),
    "interpret": ("interpret"),
    "infant": ("infant"),
    "disable": ("disable"),
    "youth": ("youth"),
    "minority": ("minority"),
    "old": ("old"),
    "emergency": ("emergency"),
    "national human right commission": ("national human right commission")
}

# Preprocess the query
def preprocess_query(query):
    doc = nlp(query)
    tokens = [token.lemma_.lower() for token in doc if not token.is_stop and not token.is_punct]
    return " ".join(tokens)

# Function to correct the spelling
def correct_spelling(processed_query):
    spell = SpellChecker()
    words = processed_query.split()
    # Find misspelled words
    misspelled_words = spell.unknown(words)


    corrected_words = []
    for word in words:
        # Correct the word if it's misspelled
        if word in misspelled_words:
            corrected_word = spell.correction(word) or word  # Most likely correction; keep the word if none is found
            corrected_words.append(corrected_word)
        else:
            corrected_words.append(word)

    # Reconstruct the initial sentence
    corrected_input = " ".join(corrected_words)

    return corrected_input


# Match with synonym support
def match_with_synonyms(query, qa_mapping, synonyms):
    processed_query = preprocess_query(query)
    print(f"Processed Query: {processed_query}")  # Debugging line

    # Correct spelling in the processed query
    corrected_query = correct_spelling(processed_query)
    print(f"Corrected Query: {corrected_query}")  # Debugging line


    for key, value in qa_mapping.items():
        print(f"Checking key: {key}, value: {value}")  # Debugging line
        for synonym in synonyms.get(key, [key]):
            print(f"Trying synonym: {synonym}")  # Debugging line
            if synonym in corrected_query:
                print(f"Match found with synonym: {synonym}")  # Debugging line
                return key  # Only return section value of the key
    
    print("No match found")  # Debugging line if no match is found
    return None

# Answer function with synonym matching and spelling correction
def answer_question_nlp(query, sections, qa_mapping, synonyms):
    section_key = match_with_synonyms(query, qa_mapping, synonyms)
    
    if section_key:
        # Retrieve the relevant section from the specified chapter
        return sections.get(section_key, "Section not found.")
    
    return "Sorry, I couldn't find an answer to your question."

# Example
user_query = "fair administrative action"
answer = answer_question_nlp(user_query, sections, qa_mapping, synonyms)
print(answer)

Processed Query: fair administrative action
Corrected Query: fair administrative action
Checking key: fundamental, value: fundamental
Trying synonym: fundamental
Checking key: application, value: application
Trying synonym: application
Checking key: implementation, value: implementation
Trying synonym: implementation
Checking key: authority court , value: authority court
Trying synonym: authority court
Checking key: limitation, value: limitation
Trying synonym: limitation
Checking key: limit, value: limit
Trying synonym: limit
Trying synonym: absolute right
Trying synonym: inalienable freedom
Trying synonym: immutable right
Checking key: life, value: life
Trying synonym: life
Trying synonym: existence
Trying synonym: entitlement to life
Trying synonym: survive
Checking key: equality, value: equality
Trying synonym: equality
Trying synonym: equal treatment
Trying synonym: justice
Checking key: dignity, value: dignity
Trying synonym: dignity
Trying synonym: intrinsic respect
Trying synon

# Chapter 5

In [23]:
# Extract pages 41 to 48, which contain Chapter 5, then keep only the text
# between the "CHAPTER FIVE" and "CHAPTER SIX" headings
chapter_5 = extract_specific_pages(pdf_path, 40, 48)
chapter_5_trimmed = chapter_5.split("CHAPTER FIVE", 1)[1].split("CHAPTER SIX", 1)[0].strip()

In [24]:
def split_chapter_5(chapter_text):
    # Split at key headings and strip extra whitespace
    sections = {
        "principle land": [],
        "classification land": [],
        "public land": [],
        "community land": [],
        "private land": [],
        "landhold non citizen": [],
        "regulation land use": [],
        "land commission": [],
        "land legislation": [],
        "obligation respect environment": [],
        "enforcement environmental right": [],
        "agreement relating natural resource": [],
        "legislation environment": []
        
    }

    # Split by new lines to process line by line
    lines = chapter_text.splitlines()

    current_section = None

    for line in lines:
        stripped_line = line.strip()

        if stripped_line.startswith("Principles of land policy"):
            current_section = "principle land"
        elif stripped_line.startswith("Classification of land"):
            current_section = "classification land"
        elif stripped_line.startswith("Public land"):
            current_section = "public land"
        elif stripped_line.startswith("Community land"):
            current_section = "community land"
        elif stripped_line.startswith("Private land"):
            current_section = "private land"
        elif stripped_line.startswith("Landholding by non-citizens"):
            current_section = "landhold non citizen"
        elif stripped_line.startswith("Regulation of land use and property"):
            current_section = "regulation land use"
        elif stripped_line.startswith("National Land Commission"):
            current_section = "land commission"
        elif stripped_line.startswith("Legislation on land"):
            current_section = "land legislation"
        elif stripped_line.startswith("Obligations in respect of the environment"):
            current_section = "obligation respect environment"
        elif stripped_line.startswith("Enforcement of environmental rights"):
            current_section = "enforcement environmental right"
        elif stripped_line.startswith("Agreements relating to natural resource"):
            current_section = "agreement relating natural resource"
        elif stripped_line.startswith("Legislation relating to the environment"):
            current_section = "legislation environment"
        # Append line to the current section if it's set
        if current_section:
            sections[current_section].append(stripped_line)

    # Join each section into a single string
    for key in sections:
        sections[key] = "\n".join(sections[key])

    return sections

# Split the chapter into sections
chapter_5_sections = split_chapter_5(chapter_5_trimmed)
chapter_5_sections['agreement relating natural resource']

'Agreements relating to natural resource.\n71. (1) A transaction is subject to ratification by Parliament if it—\n(a) involves the grant of a right or concession by or on behalf of\nany person, including the national government, to another\nperson for the exploitation of any natural resource of Kenya;\nand\n48 Constitution of Kenya, 2010\n(b) is entered into on or after the effective date.\n(2) Parliament shall enact legislation providing for the classes of\ntransactions subject to ratification under clause (1).'
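The section text above still carries a running page header ("48 Constitution of Kenya, 2010") in the middle of clause (1). A minimal sketch of how such lines could be filtered out before the chapter is split, assuming each header sits on a line of its own. The helper name `strip_page_headers` and the mirrored "title then page number" form are assumptions, not part of the notebook.

```python
import re

# Matches a running header on its own line, e.g. "48 Constitution of Kenya, 2010".
# The second alternative (title first, page number last) is an assumed variant.
PAGE_HEADER = re.compile(
    r"\d+\s+Constitution of Kenya, 2010|Constitution of Kenya, 2010\s+\d+"
)

def strip_page_headers(text):
    """Drop lines that consist only of a page number and the document title."""
    kept = [
        line for line in text.splitlines()
        if not PAGE_HEADER.fullmatch(line.strip())
    ]
    return "\n".join(kept)

sample = "and\n48 Constitution of Kenya, 2010\n(b) is entered into on or after the effective date."
print(strip_page_headers(sample))
```

Applied to `chapter_5_trimmed` before `split_chapter_5`, this would keep the header out of every section, at the cost of also dropping any body line that happens to match the pattern exactly.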

# Question Answering mechanism for chapter 5

In [25]:
sections = chapter_5_sections
##Define Q&A mapping for Chapter 5
qa_mapping = {
    "principle land": "principle land",
    "classification land": "classification land",
    "public land": "public land",
    "community land": "community land",
    "private land": "private land",
    "landhold non citizen": "landhold non citizen" ,
    "regulation land use": "regulation land use",
    "land commission": "land commission",
    "land legislation": "land legislation",
    "obligation respect environment": "obligation respect environment",
    "enforcement environmental right": "enforcement environmental right",
    "agreement relating natural resource": "agreement relating natural resource",
    "legislation environment": "legislation environment"
}

# Update the Q&A system to use preprocessed queries
def answer_question_nlp(query, sections, qa_mapping):
    # Preprocess the user query
    processed_query = preprocess_query(query)

    # Debug
    print(f"Processed query: {processed_query}")
    
    # Search for a key in qa_mapping that matches the preprocessed query
    for key in qa_mapping:
        if key in processed_query:
            # Debug line
            print(f"key: {key}")
            # Return the relevant section in the text
            return sections[key]
        
    
    return "Sorry, I couldn't find an answer to your question."

# Example usage
user_query = "explain legislation on environment?"
answer = answer_question_nlp(user_query, sections, qa_mapping)
print(answer)

Processed query: explain legislation environment
key: legislation environment
Legislation relating to the environment.
72. Parliament shall enact legislation to give full effect to the
provisions of this Part.


# Chapter 5 NLU

In [26]:
# Define synonym mapping
synonyms = {
    "principle land": ["principle", "land management principle", "land policy guide"],
    "classification land": ["classification", "land category", "types of land"],
    "public land": ["public", "government land", "state property", "national land", "public landhold"],
    "community land": ["community", "ethnic land", "cultural landhold", "community land"],
    "private land": ["private", "individual landhold", "personal property", "freehold land", "private land"],
    "landhold non citizen" : ["landhold", "foreign lease", "alien land", "non-citizen land"],
    "regulation land use" : ["regulation", "land use policy", "property regulation", "land oversight"],
    "land commission" : ["commission", "land authority", "public land commission", "land policy agency"],
    "land legislation" : ["land", "land legislation", "property law", "land-use regulation", "land act"],
    "obligation respect environment" : ["obligation respect environment"],
    "enforcement environmental right" : ["enforcement environmental right"],
    "agreement relating natural resource" : ["agreement relating natural resource"],
    "legislation environment" : ["legislation environment"]
}

# QA mapping
qa_mapping = {
    "principle land": "principle land",
    "classification land": "classification land",
    "public land": "public land",
    "community land": "community land",
    "private land": "private land",
    "landhold non citizen" : "landhold non citizen",
    "regulation land use" : "regulation land use",
    "land commission" : "land commission",
    "land legislation" : "land legislation",
    "obligation respect environment" : "obligation respect environment",
    "enforcement environmental right" : "enforcement environmental right",
    "agreement relating natural resource" : "agreement relating natural resource",
    "legislation environment" : "legislation environment"
    
    
}

sections = chapter_5_sections

# Preprocess the query
def preprocess_query(query):
    doc = nlp(query)
    tokens = [token.lemma_.lower() for token in doc if not token.is_stop and not token.is_punct]
    return " ".join(tokens)

# Function to correct the spelling
def correct_spelling(processed_query):
    spell = SpellChecker()
    words = processed_query.split()
    # Find misspelled words
    misspelled_words = spell.unknown(words)


    corrected_words = []
    for word in words:
        # Correct the word if it's misspelled
        if word in misspelled_words:
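            # Get the most likely correction; fall back to the original word
            # because correction() can return None when no candidate is found
            corrected_word = spell.correction(word) or word
            corrected_words.append(corrected_word)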
        else:
            corrected_words.append(word)

    # Reconstruct the initial sentence
    corrected_input = " ".join(corrected_words)

    return corrected_input

            

# Match with synonym support
def match_with_synonyms(query, qa_mapping, synonyms):
    processed_query = preprocess_query(query)
    print(f"Processed Query: {processed_query}")  # Debugging line

    # Correct spelling in the processed query
    corrected_query = correct_spelling(processed_query)
    print(f"Corrected Query: {corrected_query}")  # Debugging line


    for key, value in qa_mapping.items():
        print(f"Checking key: {key}, value: {value}")  # Debugging line
        for synonym in synonyms.get(key, [key]):
            print(f"Trying synonym: {synonym}")  # Debugging line
            if synonym in corrected_query:
                print(f"Match found with synonym: {synonym}")  # Debugging line
                return key  # Only return section value of the key
    
    print("No match found")  # Debugging line if no match is found
    return None

# Answer function with synonym matching and spelling correction
def answer_question_nlp(query, sections, qa_mapping, synonyms):
    section_key = match_with_synonyms(query, qa_mapping, synonyms)
    
    if section_key:
        # Retrieve the relevant section from the specified chapter
        return sections.get(section_key, "Section not found.")
    
    return "Sorry, I couldn't find an answer to your question."

# Example
user_query = "What is principle?"
answer = answer_question_nlp(user_query, sections, qa_mapping, synonyms)
print(answer)

Processed Query: principle
Corrected Query: principle
Checking key: principle land, value: principle land
Trying synonym: principle
Match found with synonym: principle
Principles of land policy.
60. (1) Land in Kenya shall be held, used and managed in a
manner that is equitable, efficient, productive and sustainable, and in
accordance with the following principles—
(a) equitable access to land;
(b) security of land rights;
(c) sustainable and productive management of land resources;
(d) transparent and cost effective administration of land;
(e) sound conservation and protection of ecologically sensitive
areas;
42 Constitution of Kenya, 2010
(f) elimination of gender discrimination in law, customs and
practices related to land and property in land; and
(g) encouragement of communities to settle land disputes
through recognised local community initiatives consistent with
this Constitution.
(2) These principles shall be implemented through a national land
policy developed and reviewed reg
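`match_with_synonyms` tests `synonym in corrected_query`, which is a substring check, so a short synonym such as "land" or "old" also matches inside a longer word such as "landhold". A minimal sketch of whole-word matching follows; the helper names `contains_phrase` and `match_key` are hypothetical and the debug prints are omitted.

```python
import re

def contains_phrase(phrase, text):
    """Return True if phrase occurs in text as whole words, not inside another word."""
    pattern = r"(?<!\w)" + re.escape(phrase) + r"(?!\w)"
    return re.search(pattern, text) is not None

def match_key(corrected_query, qa_mapping, synonyms):
    """Return the first key whose synonym appears as a whole phrase in the query."""
    for key in qa_mapping:
        for synonym in synonyms.get(key, [key]):
            if contains_phrase(synonym, corrected_query):
                return key
    return None

# Small illustrative mappings (a subset of the Chapter 5 entries)
demo_mapping = {"land legislation": "land legislation",
                "landhold non citizen": "landhold non citizen"}
demo_synonyms = {"land legislation": ["land", "land legislation"],
                 "landhold non citizen": ["landhold", "foreign lease"]}

print(match_key("landhold foreigner", demo_mapping, demo_synonyms))
```

With the substring test, the same query would stop at "land legislation" because "land" is contained in "landhold".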

# Chapter 6

In [27]:
# Extract pages 48 to 51, which contain Chapter 6, then keep only the Chapter 6 text
chapter_6 = extract_specific_pages(pdf_path, 47, 51)
chapter_6_trimmed = chapter_6.split("CHAPTER SIX", 1)[1].split("CHAPTER SEVEN", 1)[0].strip()

In [28]:
def split_chapter_6(chapter_text):
    # Split at key headings and strip extra whitespace
    sections = {
        "responsibility leadership": [],
        "oath office": [],
        "conduct state": [],
        "financial probity": [],
        "restriction activity": [],
        "citizenship leadership": [],
        "establish ethic anti corruption": [],
        "legislation leadership": []
    }

    # Split by new lines to process line by line
    lines = chapter_text.splitlines()

    current_section = None

    for line in lines:
        stripped_line = line.strip()

        if stripped_line.startswith("Responsibilities of leadership"):
            current_section = "responsibility leadership"
        elif stripped_line.startswith("Oath of office of State officers"):
            current_section = "oath office"
        elif stripped_line.startswith("Conduct of State officers"):
            current_section = "conduct state"
        elif stripped_line.startswith("Financial probity of State officers"):
            current_section = "financial probity"
        elif stripped_line.startswith("Restriction on activities of State officers"):
            current_section = "restriction activity"
        elif stripped_line.startswith("Citizenship and leadership"):
            current_section = "citizenship leadership"
        elif stripped_line.startswith("Legislation to establish the "):
            current_section = "establish ethic anti corruption"
        elif stripped_line.startswith("Legislation on leadership"):
            current_section = "legislation leadership"
        # Append line to the current section if it's set
        if current_section:
            sections[current_section].append(stripped_line)

    # Join each section into a single string
    for key in sections:
        sections[key] = "\n".join(sections[key])

    return sections

# Split the chapter into sections
chapter_6_sections = split_chapter_6(chapter_6_trimmed)

print("Financial probity Section:\n", chapter_6_sections['financial probity'])

Financial probity Section:
 Financial probity of State officers.
76. (1) A gift or donation to a State officer on a public or official
occasion is a gift or donation to the Republic and shall be delivered to
the State unless exempted under an Act of Parliament.
(2) A State officer shall not—
(a) maintain a bank account outside Kenya except in accordance
with an Act of Parliament; or
(b) seek or accept a personal loan or benefit in circumstances
that compromise the integrity of the State officer.
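`split_chapter_5` and `split_chapter_6` differ only in their headings and keys, so the `if`/`elif` chain could be driven by a dictionary instead. A minimal sketch, assuming the same "line starts with a heading" rule; `split_by_headings` and its `heading_to_key` parameter are hypothetical names.

```python
def split_by_headings(chapter_text, heading_to_key):
    """Group the lines of a chapter under the section key of the last heading seen."""
    sections = {key: [] for key in heading_to_key.values()}
    current_section = None

    for line in chapter_text.splitlines():
        stripped_line = line.strip()
        # Switch section when the line starts with one of the known headings
        for heading, key in heading_to_key.items():
            if stripped_line.startswith(heading):
                current_section = key
                break
        # Lines before the first heading are skipped, as in the original functions
        if current_section is not None:
            sections[current_section].append(stripped_line)

    return {key: "\n".join(lines) for key, lines in sections.items()}

demo_headings = {
    "Financial probity of State officers": "financial probity",
    "Citizenship and leadership": "citizenship leadership",
}
demo_text = ("Financial probity of State officers.\n76. (1) A gift\n"
             "Citizenship and leadership.\n78. (1) A person")
print(split_by_headings(demo_text, demo_headings)["financial probity"])
```

As in the original functions, a body line that happens to start with a heading phrase would also switch sections, so the headings should be chosen to be as specific as possible.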


In [29]:
chapter_6_sections

{'responsibility leadership': 'Responsibilities of leadership.\n73. (1) Authority assigned to a State officer—\n(a) is a public trust to be exercised in a manner that—\n(i) is consistent with the purposes and objects of this\nConstitution;\n(ii) demonstrates respect for the people;\n(iii) brings honour to the nation and dignity to the office; and\n(iv) promotes public confidence in the integrity of the office;\nand\n(b) vests in the State officer the responsibility to serve the people,\nrather than the power to rule them.\n(2) The guiding principles of leadership and integrity include—\n(a) selection on the basis of personal integrity, competence and\nsuitability, or election in free and fair elections;\n(b) objectivity and impartiality in decision making, and in ensuring\nthat decisions are not influenced by nepotism, favouritism,\nother improper motives or corrupt practices;\n(c) selfless service based solely on the public interest,\ndemonstrated by—\n(i) honesty in the execution of 

### Question Answering mechanism for chapter 6

In [30]:
sections = chapter_6_sections
##Define Q&A mapping for Chapter 6
qa_mapping = {
        "responsibility leadership": "responsibility leadership",
        "oath office": "oath office",
        "conduct state": "conduct state",
        "financial probity": "financial probity",
        "restriction activity": "restriction activity",
        "citizenship leadership": "citizenship leadership",
        "establish ethic anti corruption": "establish ethic anti corruption",
        "legislation leadership": "legislation leadership"
}

# Update the Q&A system to use preprocessed queries
def answer_question_nlp(query, sections, qa_mapping):
    # Preprocess the user query
    processed_query = preprocess_query(query)
    
    # Search for a key in qa_mapping that matches the preprocessed query
    for key in qa_mapping:
        if key in processed_query:
            # Return the relevant section in the text
            return sections[key]
            
    return "Sorry, I couldn't find an answer to your question."

# Example usage
user_query = "what does constitution state about oath of office?"
answer = answer_question_nlp(user_query, sections, qa_mapping)
print(answer)

Oath of office of State officers.
74. Before assuming a State office, acting in a State office, or
performing any functions of a State office, a person shall take and
subscribe the oath or affirmation of office, in the manner and form
prescribed by the Third Schedule or under an Act of Parliament.


# Chapter 6 NLU

In [31]:
# Define synonym mapping
synonyms = {
    "responsibility leadership": ["responsibility leadership"],
    "oath office": ["oath office", "state office affirmation"],
    "conduct state": ["conduct state", "behaviour state"],
    "financial probity": ["financial probity"],
    "restriction activity": ["restriction activity"],
    "citizenship leadership": ["citizenship leadership"],
    "establish ethic anti corruption": ["establish ethic anti corruption"],
    "legislation leadership": ["legislation leadership"]
}

# QA mapping
qa_mapping = {
    "responsibility leadership": "responsibility leadership",
    "oath office": "oath office",
    "conduct state": "conduct state",
    "financial probity": "financial probity",
    "restriction activity": "restriction activity",
    "citizenship leadership": "citizenship leadership",
    "establish ethic anti corruption": "establish ethic anti corruption",
    "legislation leadership": "legislation leadership"
}

sections = chapter_6_sections

# Preprocess the query
def preprocess_query(query):
    doc = nlp(query)
    tokens = [token.lemma_.lower() for token in doc if not token.is_stop and not token.is_punct]
    return " ".join(tokens)

# Function to correct the spelling
def correct_spelling(processed_query):
    spell = SpellChecker()
    words = processed_query.split()
    # Find misspelled words
    misspelled_words = spell.unknown(words)


    corrected_words = []
    for word in words:
        # Correct the word if it's misspelled
        if word in misspelled_words:
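            # Get the most likely correction; fall back to the original word
            # because correction() can return None when no candidate is found
            corrected_word = spell.correction(word) or word
            corrected_words.append(corrected_word)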
        else:
            corrected_words.append(word)

    # Reconstruct the initial sentence
    corrected_input = " ".join(corrected_words)

    return corrected_input

            

# Match with synonym support
def match_with_synonyms(query, qa_mapping, synonyms):
    processed_query = preprocess_query(query)
    print(f"Processed Query: {processed_query}")  # Debugging line

    # Correct spelling in the processed query
    corrected_query = correct_spelling(processed_query)
    print(f"Corrected Query: {corrected_query}")  # Debugging line


    for key, value in qa_mapping.items():
        print(f"Checking key: {key}, value: {value}")  # Debugging line
        for synonym in synonyms.get(key, [key]):
            print(f"Trying synonym: {synonym}")  # Debugging line
            if synonym in corrected_query:
                print(f"Match found with synonym: {synonym}")  # Debugging line
                return key  # Only return section value of the key
    
    print("No match found")  # Debugging line if no match is found
    return None

# Answer function with synonym matching and spelling correction
def answer_question_nlp(query, sections, qa_mapping, synonyms):
    section_key = match_with_synonyms(query, qa_mapping, synonyms)
    
    if section_key:
        # Retrieve the relevant section from the specified chapter
        return sections.get(section_key, "Section not found.")
    
    return "Sorry, I couldn't find an answer to your question."

# Example
user_query = "What about responsiblity and leadership?"
answer = answer_question_nlp(user_query, sections, qa_mapping, synonyms)
print(answer)

Processed Query: responsiblity leadership
Corrected Query: responsibility leadership
Checking key: responsibility leadership, value: responsibility leadership
Trying synonym: responsibility leadership
Match found with synonym: responsibility leadership
Responsibilities of leadership.
73. (1) Authority assigned to a State officer—
(a) is a public trust to be exercised in a manner that—
(i) is consistent with the purposes and objects of this
Constitution;
(ii) demonstrates respect for the people;
(iii) brings honour to the nation and dignity to the office; and
(iv) promotes public confidence in the integrity of the office;
and
(b) vests in the State officer the responsibility to serve the people,
rather than the power to rule them.
(2) The guiding principles of leadership and integrity include—
(a) selection on the basis of personal integrity, competence and
suitability, or election in free and fair elections;
(b) objectivity and impartiality in decision making, and in ensuring
that deci
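The run above corrects "responsiblity" with a general English dictionary. An alternative, sketched here with the standard library's `difflib` rather than `SpellChecker`, is to correct only against the words that actually occur in the section keys and synonyms, so a correction can never land on a word the matcher does not know. The function names and the `cutoff` value are assumptions.

```python
import difflib

def build_vocabulary(synonyms):
    """Collect every single word used in the keys and synonym phrases."""
    vocabulary = set()
    for key, phrases in synonyms.items():
        for phrase in [key, *phrases]:
            vocabulary.update(phrase.split())
    return sorted(vocabulary)  # sorted for deterministic tie-breaking

def correct_with_vocabulary(query, vocabulary, cutoff=0.8):
    """Replace each unknown word with its closest vocabulary word, if one is close enough."""
    corrected = []
    for word in query.split():
        if word in vocabulary:
            corrected.append(word)
            continue
        close = difflib.get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
        corrected.append(close[0] if close else word)
    return " ".join(corrected)

demo_synonyms = {
    "responsibility leadership": ["responsibility leadership"],
    "oath office": ["oath office", "state office affirmation"],
}
demo_vocabulary = build_vocabulary(demo_synonyms)
print(correct_with_vocabulary("responsiblity leadership", demo_vocabulary))
```

Words with no close vocabulary entry are left unchanged, so this step only narrows the query toward known keys and does not attempt general spelling correction.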

In [32]:
# Define the synonyms explicitly so that we can easily reference them later
synonyms = {
    "supremacy": ["supremacy", "authority", "ultimate power"],
    "sovereignty": ["sovereignty", "power of the people", "authority of the people", "self rule", "autonomy"],
    "defence": ["defence", "defense", "protection", "preservation"],
    "declaration": ["declaration", "proclamation", "statement", "announcement", "affirmation"],
    "territory": ["territory", "land", "region", "area", "jurisdiction", "bounds"],
    "devolution": ["devolution", "decentralization", "delegation", "transfer of power", "local governance", "subsidiarity"],
    "languages": ["languages", "tongues", "dialects", "official languages", "linguistic diversity"],
    "religion": ["religion", "faith", "belief systems", "spiritual practice", "secularism", "church-state separation"],
    "symbol": ["symbol", "emblem", "insignia", "representation", "national icon"],
    "day": ["day", "holiday", "observance", "public holiday", "commemoration", "remembrance"],
    "value": ["value", "principle", "ethic", "core value", "standard", "national ideal"],
    "governance": ["governance", "government", "administration", "management", "public service", "political structure"],
    "culture": ["culture", "heritage", "tradition", "customs", "societal norms", "arts"],
    "entitlement": ["entitlement", "right", "eligibility", "entitlement rights", "benefits", "privileges"],
    "retention": ["retention", "maintenance", "keeping", "preservation", "continuation"],
    "birth": ["birth", "nativity", "origin", "ancestry", "inborn citizenship"],
    "registration": ["registration", "enlistment", "enrollment", "citizenship application", "naturalization"],
    "dual": ["dual", "multiple", "dual nationality", "two-fold citizenship"],
    "revocation": ["revocation", "cancellation", "annulment", "rescission", "forfeiture", "withdrawal"],
    "legislation": ["legislation", "laws", "legal framework", "statutes", "enactment"],
    "fundamental": ["fundamental", "fundamental freedoms", "essential"],
    "application": ["application", "exercise", "freedom enforcement"],
    "implementation": ["implementation", "right support", "execution"],
    "authority": ["authority", "judicial power", "court jurisdiction", "judicial review"],
    "limitation": ["limitation", "freedom restriction", "limit entitlement"],
    "limit": ["limit","absolute right", "inalienable freedom", "immutable right"],
    "life": ["life", "existence", "entitlement to life", "survive"],
    "equality": ["equality", "equal treatment", "justice"],
    "dignity": ["dignity", "intrinsic respect", "inherent value", "personal honor"],
    "security": ["security", "liberty", "personal safety"],
    "slavery": ["slavery", "bondage", "force servitude", "involuntary labor", "coerce work"],
    "privacy": ["privacy", "confidential", "personal space"],
    "conscience": ["conscience", "liberty of thought", "religion freedom", "belief"],
    "expression": ["expression", "communication", "speak"],
    "medium": ["medium", "press autonomy", "journalist freedom", "right to information"],
    "information": ["information", "transparent", "datum"],
    "association": ["association", "union", "group formation", "association right"],
    "assembly": ["assembly", "dissent", "peace assembly", "demonstrate", "petition right"],
    "political": ["political", "electoral right", "political engagement", "vote right"],
    "movement": ["movement", "migration", "resident freedom", "movement liberty"],
    "property": ["property", "property safeguard", "assets security"],
    "work": ["work", "benefit", "employ"],
    "environment": ["environment", "conserve", "right to a healthy environment"],
    "economic": ["economic", "standard of life", "social welfare"],
    "language": ["language", "cultural right", "linguistic freedom", "cultural expression"],
    "family": ["family", "society foundation", "parent right"],
    "consumer": ["consumer", "client entitlement", "buyer safeguard"],
    "administrative": ["administrative", "fair administration rule", "legal administration measure"],
    "justice": ["justice", "legal access", "justice available"],
    "arrest": ["arrest", "capture", "seizure", "apprehension"],
    "fair hearing": ["fair hearing", "prosecute", "equity hearing"],
    "custody": ["custody", "confinement", "imprisonment", "prison"],
    "interpret": ["interpret", "explain", "clarify", "overview"],
    "infant": ["infant", "child", "kid", "toddler"],
    "disable": ["disable", "handicap", "impairment", "challenge"],
    "youth": ["youth", "young adult", "adolescent"],
    "minority": ["minority", "diversity right", "affirmative right"],
    "old": ["old", "elder", "vintage"],
    "emergency": ["emergency", "danger", "crisis", "disaster"],
    "national": ["national", "human right", "equality body", "right authority"],
    "principle": ["principle", "land management principle", "land policy guide"],
    "classification": ["classification", "land category", "types of land"],
    "public": ["public", "government land", "state property", "national land", "public landhold"],
    "community": ["community", "ethnic land", "cultural landhold", "community land"],
    "private": ["private", "individual landhold", "personal property", "freehold land", "private land"],
    "landhold" : ["landhold", "foreign lease", "alien land", "non-citizen land"],
    "regulation" : ["regulation", "land use policy", "property regulation", "land oversight"],
    "commission" : ["commission", "land authority", "public land commission", "land policy agency"],
    "land" : ["land", "land legislation", "property law", "land-use regulation", "land act"],
    "obligation" : ["obligation", "biodiversity duty", "conservation mandate"],
    "environmental" : ["environmental", "ecological redress", "sustainability enforcement"],
    "resource" : ["resource", "resource use agreement", "environmental concession"],
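    # Duplicate key: this entry replaces the earlier "legislation" entry,
    # so both synonym lists are merged here
    "legislation" : ["legislation", "laws", "legal framework", "statutes", "enactment", "environmental law", "green policy enactment"],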
    "responsibility leadership": ["responsibility leadership"],
    "oath office": ["oath office", "state office affirmation"],
    "conduct state": ["conduct state", "behaviour state"],
    "financial probity": ["financial probity"],
    "restriction activity": ["restriction activity"],
    "citizenship leadership": ["citizenship leadership"],
    "establish ethic anti corruption": ["establish ethic anti corruption"],
    "legislation leadership": ["legislation leadership"]

}

# Define qa mapping explicitly too
qa_mapping = {
    "supremacy": "supremacy",
    "sovereignty": "sovereignty",
    "defence": "defence",
    "declaration": "declaration",
    "territory": "territory", 
    "devolution": "devolution",
    "language": "languages",
    "religion": "religion",
    "symbol": "symbol",
    "day": "day",
    "value": "value",
    "culture": "culture",
    "entitlement": "entitlement",
    "retention": "retention",  
    "birth": "birth",
    "registration": "registration",
    "dual": "dual",
    "revocation": "revocation",
    "legislation": "legislation",
    "fundamental": "fundamental",
    "application": "application",
    "implementation": "implementation",
    "authority": "authority",
    "limitation": "limitation",
    "limit": "limit",
    "life": "life",
    "equality": "equality",
    "dignity": "dignity",
    "security": "security",
    "slavery": "slavery",
    "privacy": "privacy",
    "conscience": "conscience",
    "expression": "expression",
    "medium": "medium",
    "information": "information",
    "association": "association",
    "assembly": "assembly",
    "political": "political",
    "movement": "movement",
    "property": "property",
    "work": "work",
    "environment": "environment",
    "economic": "economic",
    "language": "language",
    "family": "family",
    "consumer": "consumer",
    "administrative": "administrative",
    "justice": "justice",
    "arrest": "arrest",
    "fair hearing": "fair hearing",
    "custody": "custody",
    "interpret": "interpret",
    "infant": "infant",
    "disable": "disable",
    "youth": "youth",
    "minority": "minority",
    "old": "old",
    "emergency": "emergency",
    "national": "national",
    "principle": "principle",
    "classification": "classification",
    "public": "public",
    "community": "community",
    "private": "private",
    "landhold": "landhold",
    "regulation": "regulation",
    "commission": "commission",
    "land": "land",
    "obligation": "obligation",
    "environmental": "environmental",
    "resource": "resource",
    "legislation": "legislation",
    "responsibility leadership": "responsibility leadership",
    "oath office": "oath office",
    "conduct state": "conduct state",
    "financial probity": "financial probity",
    "restriction activity": "restriction activity",
    "citizenship leadership": "citizenship leadership",
    "establish ethic anti corruption": "establish ethic anti corruption",
    "legislation leadership": "legislation leadership"
    
}

combined_sections = {**chapter_1_sections, **chapter_2_sections, **chapter_3_sections, **chapter_4_sections, **chapter_5_sections, **chapter_6_sections }
# Print the keys of the dictionary
combined_sections.keys()

dict_keys(['sovereignty', 'supremacy', 'defence', 'declaration republic', 'territory', 'devolution', 'language', 'religion', 'symbol', 'day', 'national value principle governance', 'culture', 'entitlement citizen', 'retention', 'birth', 'registration', 'dual', 'revocation', 'legislation citizen', 'fundamental right freedom', 'application bill right', 'implementation right', 'enforcement bill right', 'authority court', 'limitation right', 'fundamental right freedom limit', 'life', 'equality', 'dignity', 'security', 'slavery', 'privacy', 'conscience', 'expression', 'medium', 'information', 'association', 'assembly', 'political', 'movement', 'property', 'work', 'environment', 'economic', 'language culture', 'family', 'consumer', 'fair administrative action', 'justice', 'arrest', 'fair hearing', 'custody', 'interpret', 'infant', 'disable', 'youth', 'minority', 'old', 'emergency', 'national human right commission', 'principle land', 'classification land', 'public land', 'community land', 'p
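Several keys in the combined `qa_mapping` (for example "declaration") do not appear among the `combined_sections` keys listed above (which has "declaration republic"), and `sections.get(section_key, "Section not found.")` falls back to "Section not found." in that case. A small check such as the following could list the mismatches before the chatbot is used; `find_unmapped_keys` is a hypothetical helper.

```python
def find_unmapped_keys(qa_mapping, sections):
    """Return the mapping keys that have no section with the same name."""
    return sorted(key for key in qa_mapping if key not in sections)

# Illustrative subset of the two dictionaries
demo_mapping = {"declaration": "declaration", "life": "life"}
demo_sections = {"declaration republic": "...", "life": "..."}

print(find_unmapped_keys(demo_mapping, demo_sections))
```

Running it as `find_unmapped_keys(qa_mapping, combined_sections)` would show which mapping keys need to be renamed to the section keys, or which section keys need an alias.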

# CHAPTER 7-9 JASMINE!

In [33]:
import re

In [34]:
# Extracting chapter 7 content
chapter_7_raw_text = extract_specific_pages(pdf_path, 50, 58)
chapter_7_parts = re.split(r'CHAPTER SEVEN', chapter_7_raw_text, flags=re.IGNORECASE)
# Keep the text after the "CHAPTER SEVEN" heading; fall back to the raw text if the heading is missing
chapter_7 = chapter_7_parts[1] if len(chapter_7_parts) > 1 else chapter_7_raw_text
# Drop everything from the "CHAPTER EIGHT" heading onwards
chapter_8_parts = re.split(r'CHAPTER EIGHT', chapter_7, flags=re.IGNORECASE)
chapter_7 = chapter_8_parts[0]
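The same "split at this chapter heading, then cut at the next one" pattern recurs for every chapter. It could be factored into one helper, sketched below under the assumption that the chapter headings appear as plain text in the extracted pages. `slice_chapter` and the sample string are hypothetical and not part of the notebook.

```python
import re


def slice_chapter(raw_text, start_heading, end_heading):
    """Return the text between two chapter headings (case-insensitive)."""
    # Keep everything after the first occurrence of the start heading
    parts = re.split(re.escape(start_heading), raw_text, maxsplit=1, flags=re.IGNORECASE)
    body = parts[1] if len(parts) > 1 else raw_text
    # Drop everything from the next chapter heading onwards
    return re.split(re.escape(end_heading), body, maxsplit=1, flags=re.IGNORECASE)[0]


# Hypothetical sample text standing in for the extracted PDF pages
sample = "end of six\nCHAPTER SEVEN\nelectoral text\nCHAPTER EIGHT\nlegislature text"
print(slice_chapter(sample, "CHAPTER SEVEN", "CHAPTER EIGHT"))
```

Note that `maxsplit=1` keeps all text after the first occurrence of the start heading. If the heading is repeated (for example in a running page header), this differs from taking only the second element of an unlimited split.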

In [35]:
def split_chapter(chapter_text):
  # Split at key headings and strip extra whitespace

  sections = {
      "general principle electoral system":[],
      "legislation election":[],
      "registration voter":[],
      "candidate election":[],
      "eligibility stand independent candidate":[],
      "vote":[],
      "electoral dispute":[],
      "independent electoral boundary commission":[],
      "delimitation electoral unit":[],
      "allocation party seat":[],
      "requirement political party":[],
      "legislation political party":[],
  }

  ## Split by new lines to process line by line
  lines = chapter_text.splitlines()
  current_section = None

  for line in lines:
    line = line.strip()
    # Switch to a new section when the line starts with a known heading
    if line.startswith("General principles for the electoral system"):
      current_section = "general principle electoral system"
    elif line.startswith("Legislation on elections"):
        current_section = "legislation election"
    elif line.startswith("Registration as a voter"):
        current_section = "registration voter"
    elif line.startswith("Candidates for election and political parties"):
        current_section = "candidate election"
    elif line.startswith("Eligibility to stand as an independent candidate"):
        current_section = "eligibility stand independent candidate"
    elif line.startswith("Voting"):
        current_section = "vote"
    elif line.startswith("Electoral disputes"):
        current_section = "electoral dispute"
    elif line.startswith("Independent Electoral and Boundaries Commission"):
        current_section = "independent electoral boundary commission"
    elif line.startswith("Delimitation of electoral units"):
        current_section = "delimitation electoral unit"
    elif line.startswith("Allocation of party list seats"):
        current_section = "allocation party seat"
    elif line.startswith("Basic requirements for political parties"):
        current_section = "requirement political party"
    elif line.startswith("Legislation on political parties"):
        current_section = "legislation political party"
    # Append the line to the current section once a heading has been seen
    if current_section:
        sections[current_section].append(line)
  # Join each section into a single string
  for key in sections:
    sections[key] = "\n".join(sections[key])
  return sections
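The long `elif` chain can also be expressed as data. The sketch below assumes a dictionary that maps each heading prefix to its section key; `split_by_headings`, `chapter_7_headings` and the sample body text are hypothetical and shown only to illustrate the idea with two of the Chapter 7 headings.

```python
def split_by_headings(chapter_text, heading_to_key):
    """Group the lines of a chapter under the section heading that precedes them."""
    sections = {key: [] for key in heading_to_key.values()}
    current_section = None
    for line in chapter_text.splitlines():
        line = line.strip()
        # Switch section when the line starts with one of the known headings
        for heading, key in heading_to_key.items():
            if line.startswith(heading):
                current_section = key
                break
        if current_section:
            sections[current_section].append(line)
    return {key: "\n".join(lines) for key, lines in sections.items()}


# Two headings from Chapter 7, used here only as an illustration
chapter_7_headings = {
    "Legislation on elections": "legislation election",
    "Registration as a voter": "registration voter",
}
sample_lines = [
    "Legislation on elections.",
    "(sample body text A)",
    "Registration as a voter.",
    "(sample body text B)",
]
result = split_by_headings("\n".join(sample_lines), chapter_7_headings)
print(result["registration voter"])
```

With this shape, adding a chapter means supplying another heading dictionary rather than redefining `split_chapter`. If one heading is a prefix of another, the first entry in the dictionary wins, so the more specific heading should be listed first.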

In [36]:
# Split the chapter into sections
chapter_7_sections = split_chapter(chapter_7)
chapter_7_sections

{'general principle electoral system': 'General principles for the electoral system.\n81. The electoral system shall comply with the following\nprinciples—\n(a) freedom of citizens to exercise their political rights under\nArticle 38;\n(b) not more than two-thirds of the members of elective public\nbodies shall be of the same gender;\n(c) fair representation of persons with disabilities;\n(d) universal suffrage based on the aspiration for fair\nrepresentation and equality of vote; and\n(e) free and fair elections, which are—\n(i) by secret ballot;\n(ii) free from violence, intimidation, improper influence or\ncorruption;\n(iii) conducted by an independent body;\n(iv) transparent; and\n(v) administered in an impartial, neutral, efficient, accurate\nand accountable manner.',
 'legislation election': 'Legislation on elections.\n82. (1) Parliament shall enact legislation to provide for—\n(a) the delimitation by the Independent Electoral and Boundaries\nCommission of electoral units for ele

# Chapter 7 Question Answering Mechanism

In [37]:
# Load spaCy model
nlp = spacy.load("en_core_web_sm")

# Preprocess the user query using spaCy
def preprocess_query(query):
    # Parse the query with spaCy
    doc = nlp(query)
    # Normalize the query: lowercase, lemmatize, and remove stopwords
    tokens = [token.lemma_.lower() for token in doc if not token.is_stop and not token.is_punct]
    return " ".join(tokens)

# Define the sections variable with Chapter 7
sections = chapter_7_sections  # Ensure this variable is defined with the correct content

# Define synonyms
synonyms = {
    "general principle electoral system": ["general principle electoral system"],
    "legislation election": ["legislation election"],
    "registration voter": ["registration voter"],
    "candidate election": ["candidate election"],
    "eligibility stand independent candidate": ["eligibility stand independent candidate"],
    "vote": ["vote"],
    "electoral dispute": ["electoral dispute"],
    "independent electoral boundary commission": ["independent electoral boundary commission"],
    "delimitation electoral unit": ["delimitation electoral unit"],
    "allocation party seat": ["allocation party seat"],
    "requirement political party": ["requirement political party"],
    "legislation political party": ["legislation political party"]
}

# Define the Q&A mapping
qa_mapping = {
      "general principle electoral system": "general principle electoral system",
      "legislation election": "legislation election",
      "registration voter": "registration voter",
      "candidate election": "candidate election",
      "eligibility stand independent candidate": "eligibility stand independent candidate",
      "vote": "vote",
      "electoral dispute": "electoral dispute",
      "independent electoral boundary commission": "independent electoral boundary commission",
      "delimitation electoral unit": "delimitation electoral unit",
      "allocation party seat": "allocation party seat",
      "requirement political party": "requirement political party",
      "legislation political party": "legislation political party"
}


# Update the Q&A system to use preprocessed queries
def answer_question_nlp(query, sections, qa_mapping):
    # Preprocess the query
    preprocessed_query = preprocess_query(query)

    # Search for a key in qa_mapping that appears in the preprocessed query
    for key in qa_mapping:  # Iterating over a dict yields its keys
        if key in preprocessed_query:  # Substring match
            # Return the relevant section in the text
            return sections[key]
    
    return "Sorry, I couldn't find an answer to your question."

# Example usage
user_query = "What is the delimitation of electoral units?"
answer = answer_question_nlp(user_query, sections, qa_mapping)
print(answer)
print('\n')
user_query = "What is legislation of political parties?"
answer = answer_question_nlp(user_query, sections, qa_mapping)  
print(answer)

Delimitation of electoral units.
89. (1) There shall be two hundred and ninety constituencies for
the purposes of the election of the members of the National Assembly
provided for in Article 97 (1) (a).
(2) The Independent Electoral and Boundaries Commission shall
review the names and boundaries of constituencies at intervals of not
less than eight years, and not more than twelve years, but any review
shall be completed at least twelve months before a general election of
members of Parliament.
(3) The Commission shall review the number, names and
boundaries of wards periodically.
(4) If a general election is to be held within twelve months after
the completion of a review by the Commission, the new boundaries
shall not take effect for purposes of that election.
(5) The boundaries of each constituency shall be such that the
number of inhabitants in the constituency is, as nearly as possible,
equal to the population quota, but the number of inhabitants of a
constituency may be greater or

# CHAPTER 7 NLU

In [38]:
# Define synonym mapping
synonyms = {
      "general principle electoral system": ["general principle electoral system"],
      "legislation election": ["legislation election"],
      "registration voter": ["registration voter"],
      "candidate election": ["candidate election"],
      "eligibility stand independent candidate": ["eligibility stand independent candidate"],
      "vote": ["vote"],
      "electoral dispute": ["electoral dispute"],
      "independent electoral boundary commission": ["independent electoral boundary commission"],
      "delimitation electoral unit": ["delimitation electoral unit"],
      "allocation party seat": ["allocation party seat"],
      "requirement political party": ["requirement political party"],
      "legislation political party": ["legislation political party"]
}

# QA mapping
qa_mapping = {
    "general principle electoral system": "general principle electoral system",
    "legislation election": "legislation election",
    "registration voter": "registration voter",
    "candidate election": "candidate election",
    "eligibility stand independent candidate": "eligibility stand independent candidate",
    # Must be "vote" (the lemmatized form) to match the key used in sections and synonyms
    "vote": "vote",
    "electoral dispute": "electoral dispute",
    "independent electoral boundary commission": "independent electoral boundary commission",
    "delimitation electoral unit": "delimitation electoral unit",
    "allocation party seat": "allocation party seat",
    "requirement political party": "requirement political party",
    "legislation political party": "legislation political party"
}

sections = chapter_7_sections


# Function to correct the spelling
def correct_spelling(processed_query):
    spell = SpellChecker()
    words = processed_query.split()
    # Find misspelled words
    misspelled_words = spell.unknown(words)


    corrected_words = []
    for word in words:
        # Correct the word if it's misspelled
        if word in misspelled_words:
            # correction() can return None when no candidate is found, so keep the original word in that case
            corrected_word = spell.correction(word) or word
            corrected_words.append(corrected_word)
        else:
            corrected_words.append(word)

    # Reconstruct the initial sentence
    corrected_input = " ".join(corrected_words)

    return corrected_input

            

# Match with synonym support
def match_with_synonyms(query, qa_mapping, synonyms):
    processed_query = preprocess_query(query)
    print(f"Processed Query: {processed_query}")  # Debugging line

    # Correct spelling in the processed query
    corrected_query = correct_spelling(processed_query)
    print(f"Corrected Query: {corrected_query}")  # Debugging line


    for key, value in qa_mapping.items():
        print(f"Checking key: {key}, value: {value}")  # Debugging line
        for synonym in synonyms.get(key, [key]):
            print(f"Trying synonym: {synonym}")  # Debugging line
            if synonym in corrected_query:
                print(f"Match found with synonym: {synonym}")  # Debugging line
                return key  # Return the key of the matching section
    
    print("No match found")  # Debugging line if no match is found
    return None

# Answer function with synonym matching and spelling correction
def answer_question_nlp(query, sections, qa_mapping, synonyms):
    section_key = match_with_synonyms(query, qa_mapping, synonyms)
    
    if section_key:
        # Retrieve the relevant section from the specified chapter
        return sections.get(section_key, "Section not found.")
    
    return "Sorry, I couldn't find an answer to your question."


# Example 2
user_query2 = "What does the constitution say about the general principles of the electoral system?"
answer = answer_question_nlp(user_query2, sections, qa_mapping, synonyms)
print(answer)


Processed Query: constitution general principle electoral system
Corrected Query: constitution general principle electoral system
Checking key: general principle electoral system, value: general principle electoral system
Trying synonym: general principle electoral system
Match found with synonym: general principle electoral system
General principles for the electoral system.
81. The electoral system shall comply with the following
principles—
(a) freedom of citizens to exercise their political rights under
Article 38;
(b) not more than two-thirds of the members of elective public
bodies shall be of the same gender;
(c) fair representation of persons with disabilities;
(d) universal suffrage based on the aspiration for fair
representation and equality of vote; and
(e) free and fair elections, which are—
(i) by secret ballot;
(ii) free from violence, intimidation, improper influence or
corruption;
(iii) conducted by an independent body;
(iv) transparent; and
(v) administered in an impar
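Substring matching only succeeds when the processed query contains a section key exactly. One possible fallback, sketched here with the standard library's `difflib`, is to pick the key that is most similar to the query. `closest_section_key`, the `cutoff` value and the sample keys are assumptions for illustration, not part of the notebook.

```python
import difflib


def closest_section_key(processed_query, section_keys, cutoff=0.6):
    """Return the section key most similar to the query, or None if nothing is close."""
    matches = difflib.get_close_matches(processed_query, section_keys, n=1, cutoff=cutoff)
    return matches[0] if matches else None


keys = ["registration voter", "electoral dispute", "delimitation electoral unit"]
print(closest_section_key("delimitation electorl unit", keys))
print(closest_section_key("weather forecast", keys))
```

Because the whole query is compared with each key, this works best on short, already preprocessed queries. A higher `cutoff` makes the match stricter.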

# CHAPTER 8

In [39]:
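# Extracting chapter 8 content
chapter_8_raw_text = extract_specific_pages(pdf_path, 57, 75)

# Keep the text after the "CHAPTER EIGHT" heading; fall back to the raw text if the heading is missing
chapter_8_parts = re.split(r'CHAPTER EIGHT', chapter_8_raw_text, flags=re.IGNORECASE)
chapter_8 = chapter_8_parts[1] if len(chapter_8_parts) > 1 else chapter_8_raw_text

# Drop everything from the "CHAPTER NINE" heading onwards
chapter_9_parts = re.split(r'CHAPTER NINE', chapter_8, flags=re.IGNORECASE)
chapter_8 = chapter_9_parts[0]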

In [40]:
def split_chapter(chapter_text):
    # Split at key headings and strip extra whitespace
    sections = {
        "establishment parliament": [],
        "role parliament": [],
        "role national assembly": [],
        "role senate": [],
        "membership national assembly": [],
        "membership senate": [],
        "qualification member parliament": [],
        "promotion representation marginalised group": [],
        "election member parliament": [],
        "term parliament": [],
        "vacation office": [],
        "right recall": [],
        "question membership": [],
        "speaker parliament": [],
        "presiding parliament": [],
        "party leader": [],
        "exercise legislative power": [],
        "bill county government": [],
        "special bill county government": [],
        "ordinary bill county government": [],
        "mediation committee": [],
        "money bill": [],
        "presidential assent": [],
        "come force law": [],
        "power privilege immunity": [],
        "public access participation": [],
        "right petition parliament": [],
        "official language parliament": [],
        "quorum": [],
        "vote parliament": [],
        "decision senate": [],
        "committee standing order": [],
        "power call evidence": [],
        "location sitting parliament": [],
        "parliamentary service commission": [],
        "clerk staff parliament": []
    }

    # Split by new lines to process line by line
    lines = chapter_text.splitlines()
    current_section = None

    for line in lines:
        line = line.strip()

        if line.startswith("Establishment of Parliament"):
            current_section = "establishment parliament"
        elif line.startswith("Role of Parliament"):
            current_section = "role parliament"
        elif line.startswith("Role of the National Assembly"):
            current_section = "role national assembly"
        elif line.startswith("Role of the Senate"):
            current_section = "role senate"
        elif line.startswith("Membership of the National Assembly"):
            current_section = "membership national assembly"
        elif line.startswith("Membership of the Senate"):
            current_section = "membership senate"
        elif line.startswith("Qualifications and disqualifications for election as member"):
            current_section = "qualification member parliament"
        elif line.startswith("Promotion of representation of marginalised groups"):
            current_section = "promotion representation marginalised group"
        elif line.startswith("Election of members of Parliament"):
            current_section = "election member parliament"
        elif line.startswith("Term of Parliament"):
            current_section = "term parliament"
        elif line.startswith("Vacation of office of member of Parliament"):
            current_section = "vacation office"
        elif line.startswith("Right of recall"):
            current_section = "right recall"
        elif line.startswith("Determination of questions of membership"):
            current_section = "question membership"
        elif line.startswith("Speakers and Deputy Speakers of Parliament"):
            current_section = "speaker parliament"
        elif line.startswith("Presiding in Parliament"):
            current_section = "presiding parliament"
        elif line.startswith("Party leaders"):
            current_section = "party leader"
        elif line.startswith("Exercise of legislative powers"):
            current_section = "exercise legislative power"
        elif line.startswith("Bills concerning county government"):
            current_section = "bill county government"
        elif line.startswith("Special Bills concerning county governments"):
            current_section = "special bill county government"
        elif line.startswith("Ordinary Bills concerning county governments"):
            current_section = "ordinary bill county government"
        elif line.startswith("Mediation committees"):
            current_section = "mediation committee"
        elif line.startswith("Money Bills"):
            current_section = "money bill"
        elif line.startswith("Presidential assent and referral"):
            current_section = "presidential assent"
        elif line.startswith("Coming into force of laws"):
            current_section = "come force law"
        elif line.startswith("Powers, privileges and immunities"):
            current_section = "power privilege immunity"
        elif line.startswith("Public access and participation"):
            current_section = "public access participation"
        elif line.startswith("Right to petition Parliament"):
            current_section = "right petition parliament"
        elif line.startswith("Official languages of Parliament"):
            current_section = "official language parliament"
        elif line.startswith("Quorum"):
            current_section = "quorum"
        elif line.startswith("Voting in Parliament"):
            current_section = "vote parliament"
        elif line.startswith("Decisions of Senate"):
            current_section = "decision senate"
        elif line.startswith("Committees and Standing Orders"):
            current_section = "committee standing order"
        elif line.startswith("Power to call for evidence"):
            current_section = "power call evidence"
        elif line.startswith("Location of sittings of Parliament"):
            current_section = "location sitting parliament"
        elif line.startswith("Parliamentary Service Commission"):
            current_section = "parliamentary service commission"
        elif line.startswith("Clerks and staff of Parliament"):
            current_section = "clerk staff parliament"

        # Append line to the current section if it's set
        if current_section:
            sections[current_section].append(line)

    # Join each section into a single string
    for key in sections:
        sections[key] = "\n".join(sections[key])

    return sections


In [41]:
# Split the chapter into sections
chapter_8_sections = split_chapter(chapter_8)
chapter_8_sections.keys()

dict_keys(['establishment parliament', 'role parliament', 'role national assembly', 'role senate', 'membership national assembly', 'membership senate', 'qualification member parliament', 'promotion representation marginalised group', 'election member parliament', 'term parliament', 'vacation office', 'right recall', 'question membership', 'speaker parliament', 'presiding parliament', 'party leader', 'exercise legislative power', 'bill county government', 'special bill county government', 'ordinary bill county government', 'mediation committee', 'money bill', 'presidential assent', 'come force law', 'power privilege immunity', 'public access participation', 'right petition parliament', 'official language parliament', 'quorum', 'vote parliament', 'decision senate', 'committee standing order', 'power call evidence', 'location sitting parliament', 'parliamentary service commission', 'clerk staff parliament'])

# Chapter 8 Question Answering mechanism

In [42]:
qa_mapping = {
    "establishment parliament": "establishment parliament",
    "role parliament": "role parliament",
    "role national assembly": "role national assembly",
    "role senate": "role senate",
    "membership national assembly": "membership national assembly",
    "membership senate": "membership senate",
    "qualification member parliament": "qualification member parliament",
    "promotion representation marginalised group": "promotion representation marginalised group",
    "election member parliament": "election member parliament",
    "term parliament": "term parliament",
    "vacation office": "vacation office",
    "right recall": "right recall",
    "question membership": "question membership",
    "speaker parliament": "speaker parliament",
    "presiding parliament": "presiding parliament",
    "party leader": "party leader",
    "exercise legislative power": "exercise legislative power",
    "bill county government": "bill county government",
    "special bill county government": "special bill county government",
    "ordinary bill county government": "ordinary bill county government",
    "mediation committee": "mediation committee",
    "money bill": "money bill",
    "presidential assent": "presidential assent",
    "come force law": "come force law",
    "power privilege immunity": "power privilege immunity",
    "public access participation": "public access participation",
    "right petition parliament": "right petition parliament",
    "official language parliament": "official language parliament",
    "quorum": "quorum",
    "vote parliament": "vote parliament",
    "decision senate": "decision senate",
    "committee standing order": "committee standing order",
    "power call evidence": "power call evidence",
    "location sitting parliament": "location sitting parliament",
    "parliamentary service commission": "parliamentary service commission",
    "clerk staff parliament": "clerk staff parliament"
}

# Load spaCy model
nlp = spacy.load("en_core_web_sm")

# Preprocess the user query using spaCy
def preprocess_query(query):
    # Parse the query with spaCy
    doc = nlp(query)
    # Normalize the query: lowercase, lemmatize, and remove stopwords
    tokens = [token.lemma_.lower() for token in doc if not token.is_stop and not token.is_punct]
    return " ".join(tokens)

# Define the sections variable with Chapter 8
sections = chapter_8_sections

# Update the Q&A system to use preprocessed queries
def answer_question_nlp(query, sections, qa_mapping):
    # Preprocess the user query
    processed_query = preprocess_query(query)

    # Debug
    print(f"Processed query: {processed_query}")

    # Priority-based key selection: resolve queries that could match several keys
    prioritized_keys = []

    if "special" in processed_query:
        prioritized_keys.append("special bill county government")
    elif "ordinary" in processed_query:
        prioritized_keys.append("ordinary bill county government")
    elif "vote" in processed_query and "parliament" in processed_query:
        prioritized_keys.append("vote parliament")

    # Return the first prioritized key that exists in this chapter's sections
    for key in prioritized_keys:
        if key in sections:
            return sections[key]

    # Fallback: return the first section whose key appears in the processed query
    for key in qa_mapping:
        if key in processed_query:
            # Debug line
            print(f"key: {key}")
            return sections[key]

    return "Sorry, I couldn't find an answer to your question."

# Example usage
user_query = "voting in parliament?"
answer = answer_question_nlp(user_query, sections, qa_mapping)
print(answer)


Processed query: vote parliament
Voting in Parliament.
122. (1) Except as otherwise provided in this Constitution, any
question proposed for decision in either House of Parliament shall be
determined by a majority of the members in that House, present and
voting.
(2) On a question proposed for decision in either House—
(a) the Speaker has no vote; and
(b) in the case of a tie, the question is lost.
(3) A member shall not vote on any question in which the
member has a pecuniary interest.
(4) In reckoning the number of members of a House of
Parliament for any purpose of voting in that House, the Speaker of that
House shall not be counted as a member.
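The priority rules exist because some keys are substrings of others: "bill county government" is contained in "special bill county government", so plain dictionary order can return the less specific section. A minimal alternative sketch is to try longer keys first. `find_section_key` and the sample keys are hypothetical and not part of the notebook.

```python
def find_section_key(processed_query, section_keys):
    """Return the longest section key contained in the query, or None."""
    # Longer keys are more specific, so try them before their shorter substrings
    for key in sorted(section_keys, key=len, reverse=True):
        if key in processed_query:
            return key
    return None


keys = ["bill county government", "special bill county government", "quorum"]
print(find_section_key("special bill county government", keys))
```

This removes the need for hand-written priority branches in the overlapping-key case, though it does not help when the query wording differs from the key.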


# CHAPTER 8 NLU

In [43]:
# Define synonym mapping
synonyms = {
    "establishment parliament": ["establishment parliament"],
    "role parliament": ["role parliament"],
    "role national assembly": ["role national assembly"],
    "role senate": ["role senate"],
    "membership national assembly": ["membership national assembly"],
    "membership senate": ["membership senate"],
    "qualification member parliament": ["qualification member parliament"],
    "promotion representation marginalised group": ["promotion representation marginalised group"],
    "election member parliament": ["election member parliament"],
    "term parliament": ["term parliament"],
    "vacation office": ["vacation office"],
    "right recall": ["right recall"],
    "question membership": ["question membership"],
    "speaker parliament": ["speaker parliament"],
    "presiding parliament": ["presiding parliament"],
    "party leader": ["party leader"],
    "exercise legislative power": ["exercise legislative power"],
    "bill county government": ["bill county government"],
    "special bill county government": ["special bill county government"],
    "ordinary bill county government": ["ordinary bill county government"],
    "mediation committee": ["mediation committee"],
    "money bill": ["money bill"],
    "presidential assent": ["presidential assent"],
    "come force law": ["come force law"],
    "power privilege immunity": ["power privilege immunity"],
    "public access participation": ["public access participation"],
    "right petition parliament": ["right petition parliament"],
    "official language parliament": ["official language parliament"],
    "quorum": ["quorum"],
    "vote parliament": ["vote parliament"],
    "decision senate": ["decision senate"],
    "committee standing order": ["committee standing order"],
    "power call evidence": ["power call evidence"],
    "location sitting parliament": ["location sitting parliament"],
    "parliamentary service commission": ["parliamentary service commission"],
    "clerk staff parliament": ["clerk staff parliament"]
}


# QA mapping
qa_mapping = {
    "establishment parliament": "establishment parliament",
    "role parliament": "role parliament",
    "role national assembly": "role national assembly",
    "role senate": "role senate",
    "membership national assembly": "membership national assembly",
    "membership senate": "membership senate",
    "qualification member parliament": "qualification member parliament",
    "promotion representation marginalised group": "promotion representation marginalised group",
    "election member parliament": "election member parliament",
    "term parliament": "term parliament",
    "vacation office": "vacation office",
    "right recall": "right recall",
    "question membership": "question membership",
    "speaker parliament": "speaker parliament",
    "presiding parliament": "presiding parliament",
    "party leader": "party leader",
    "exercise legislative power": "exercise legislative power",
    "bill county government": "bill county government",
    "special bill county government": "special bill county government",
    "ordinary bill county government": "ordinary bill county government",
    "mediation committee": "mediation committee",
    "money bill": "money bill",
    "presidential assent": "presidential assent",
    "come force law": "come force law",
    "power privilege immunity": "power privilege immunity",
    "public access participation": "public access participation",
    "right petition parliament": "right petition parliament",
    "official language parliament": "official language parliament",
    "quorum": "quorum",
    "vote parliament": "vote parliament",
    "decision senate": "decision senate",
    "committee standing order": "committee standing order",
    "power call evidence": "power call evidence",
    "location sitting parliament": "location sitting parliament",
    "parliamentary service commission": "parliamentary service commission",
    "clerk staff parliament": "clerk staff parliament"
}


sections = chapter_8_sections

# Preprocess the query
def preprocess_query(query):
    doc = nlp(query)
    tokens = [token.lemma_.lower() for token in doc if not token.is_stop and not token.is_punct]
    return " ".join(tokens)

# Function to correct the spelling
def correct_spelling(processed_query):
    spell = SpellChecker()
    words = processed_query.split()
    # Find misspelled words
    misspelled_words = spell.unknown(words)


    corrected_words = []
    for word in words:
        # Correct the word if it's misspelled
        if word in misspelled_words:
            # correction() can return None when no candidate is found, so keep the original word in that case
            corrected_word = spell.correction(word) or word
            corrected_words.append(corrected_word)
        else:
            corrected_words.append(word)

    # Reconstruct the initial sentence
    corrected_input = " ".join(corrected_words)

    return corrected_input

            

# Match with synonym support
def match_with_synonyms(query, qa_mapping, synonyms):
    processed_query = preprocess_query(query)
    print(f"Processed Query: {processed_query}")  # Debugging line

    # Correct spelling in the processed query
    corrected_query = correct_spelling(processed_query)
    print(f"Corrected Query: {corrected_query}")  # Debugging line


    for key, value in qa_mapping.items():
        print(f"Checking key: {key}, value: {value}")  # Debugging line
        for synonym in synonyms.get(key, [key]):
            print(f"Trying synonym: {synonym}")  # Debugging line
            if synonym in corrected_query:
                print(f"Match found with synonym: {synonym}")  # Debugging line
                return key  # Return the key of the matching section
    
    print("No match found")  # Debugging line if no match is found
    return None

# Answer function with synonym matching
def answer_question_nlp(query, sections, qa_mapping, synonyms):
    section_key = match_with_synonyms(query, qa_mapping, synonyms)
    
    if section_key:
        # Retrieve the relevant section from the specified chapter
        return sections.get(section_key, "Section not found.")
    
    return "Sorry, I couldn't find an answer to your question."

# Example
user_query = "What about coming into force of laws?"
answer = answer_question_nlp(user_query, sections, qa_mapping, synonyms)
print(answer)

Processed Query: come force law
Corrected Query: come force law
Checking key: establishment parliament, value: establishment parliament
Trying synonym: establishment parliament
Checking key: role parliament, value: role parliament
Trying synonym: role parliament
Checking key: role national assembly, value: role national assembly
Trying synonym: role national assembly
Checking key: role senate, value: role senate
Trying synonym: role senate
Checking key: membership national assembly, value: membership national assembly
Trying synonym: membership national assembly
Checking key: membership senate, value: membership senate
Trying synonym: membership senate
Checking key: qualification member parliament, value: qualification member parliament
Trying synonym: qualification member parliament
Checking key: promotion representation marginalised group, value: promotion representation marginalised group
Trying synonym: promotion representation marginalised group
Checking key: election member parli

# CHAPTER 9

In [44]:
# Extract Chapter 9 content
chapter_9_raw_text = extract_specific_pages(pdf_path, 74, 95)

# Keep the text after the "CHAPTER NINE" heading; fall back to the raw text if the heading is missing
chapter_9_parts = re.split(r'CHAPTER NINE', chapter_9_raw_text, flags=re.IGNORECASE)
chapter_9 = chapter_9_parts[1] if len(chapter_9_parts) > 1 else chapter_9_raw_text

# Drop everything from the "CHAPTER TEN" heading onward
chapter_10_parts = re.split(r'CHAPTER TEN', chapter_9, flags=re.IGNORECASE)
chapter_9 = chapter_10_parts[0]
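
The extraction step above keeps the text that sits between two chapter headings. A minimal sketch of the same idea as a reusable helper follows. `slice_between_headings` is a hypothetical name that does not appear elsewhere in this notebook, and the sketch assumes the headings survive PDF extraction as plain text. Unlike the cell above, it splits only on the first occurrence of each heading.

```python
import re

def slice_between_headings(text, start_heading, end_heading=None):
    """Return the text after start_heading and before end_heading (case-insensitive)."""
    # Split once on the start heading; keep the whole text if the heading is missing
    parts = re.split(re.escape(start_heading), text, maxsplit=1, flags=re.IGNORECASE)
    body = parts[1] if len(parts) > 1 else text
    if end_heading:
        # Drop everything from the end heading onward
        body = re.split(re.escape(end_heading), body, maxsplit=1, flags=re.IGNORECASE)[0]
    return body.strip()

# Made-up sample text, not taken from the PDF
sample = "preamble CHAPTER NINE The Executive ... CHAPTER TEN Judiciary"
print(slice_between_headings(sample, "chapter nine", "chapter ten"))
```

With a helper like this, each chapter cell could reduce to one call on the output of `extract_specific_pages`.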

In [45]:
def split_chapter(chapter_text):
    # Split at key headings and strip extra whitespace
    sections = {
    "principle executive authority" : [],
    "national executive" : [],
    "authority president": [],
    "function president": [],
    "power mercy": [],
    "presidential powers temporary incumbency" : [],
    "decision president" : [],
    "election president" : [],
    "qualification disqualification election president" : [],
    "procedure presidential election" : [],
    "death assume office" : [],
    "validity presidential election" : [],
    "assumption office president" : [],
    "term office president" : [],
    "protection legal proceeding" : [],
    "removal president incapacity" : [],
    "removal president impeachment" : [],
    "vacancy office president" : [],
    "function deputy president" : [],
    "election deputy president" : [],
    "vacancy office deputy president" : [],
    "removal deputy president" : [],
    "benefit president" : [],
    "cabinet": [],
    "decision cabinet" : [],
    "secretary cabinet" : [],
    "principal secretary" : [], 
    "attorney general": [],
    "director public prosecution": [],
    "removal director public prosecution": []
        
    }

    # Split by new lines to process line by line
    lines = chapter_text.splitlines()
    current_section = None

    for line in lines:
        line = line.strip()
        
        # Check for section headers
        if line.startswith("Principles of executive authority"):
            current_section = "principle executive authority"
        elif line.startswith("The National Executive"):
            current_section = "national executive"
        elif line.startswith("Authority of the President"):
            current_section = "authority president"
        elif line.startswith("Functions of the President"):
            current_section = "function president"
        elif line.startswith("Power of mercy"):
            current_section = "power mercy"
        elif line.startswith("Exercise of presidential powers"):
            current_section = "presidential powers temporary incumbency"
        elif line.startswith("Decisions of the President"):
            current_section = "decision president"
        elif line.startswith("Election of the President"):
            current_section = "election president"
        elif line.startswith("Qualifications and disqualifications for election"):
            current_section = "qualification disqualification election president"
        elif line.startswith("Elections of the president"):
            current_section = "election president"
        elif line.startswith("Procedure at presidential election"):
            current_section = "procedure presidential election"
        elif line.startswith("Death before assuming office"):
            current_section = "death assume office"
        elif line.startswith("Questions as to validity of presidential election"):
            current_section = "validity presidential election"
        elif line.startswith("Assumption of office of President"):
            current_section = "assumption office president"
        elif line.startswith("Term of office of President"):
            current_section = "term office president"
        elif line.startswith("Protection from legal proceedings"):
            current_section = "protection legal proceeding"
        elif line.startswith("Removal of President on grounds of incapacity"):
            current_section = "removal president incapacity"
        elif line.startswith("Removal of President by impeachment"):
            current_section = "removal president impeachment"
        elif line.startswith("Vacancy in the office of President"):
            current_section = "vacancy office president"
        elif line.startswith("Functions of the Deputy President"):
            current_section = "function deputy president"
        elif line.startswith("Election and swearing in of Deputy President"):
            current_section = "election deputy president"
        elif line.startswith("Vacancy in the office of Deputy President"):
            current_section = "vacancy office deputy president"    
        elif line.startswith("Removal of Deputy President"):
            current_section = "removal deputy president"    
        elif line.startswith("Remuneration and benefits of President and Deputy President"):
            current_section = "benefit president"
        elif line.startswith("Cabinet."):
            current_section = "cabinet"
        elif line.startswith("Decisions, responsibility and accountability of the Cabinet"):
            current_section = "decision cabinet"
        elif line.startswith("Secretary to the Cabinet"):
            current_section = "secretary cabinet"
        elif line.startswith("Principal Secretaries"):
            current_section = "principal secretary"    
        elif line.startswith("Attorney-General"):
            current_section = "attorney general"
        elif line.startswith("Director of Public Prosecutions"):
            current_section = "director public prosecution"
        elif line.startswith("Removal and resignation of Director of Public Prosecutions"):
            current_section = "removal director public prosecution"


        # Append line to the current section if it's set
        if current_section and line:
            sections[current_section].append(line)

    # Join each section into single string
    for key in sections:
        sections[key] = "\n".join(sections[key])
    
    return sections

# Split Chapter 9 into sections
chapter_9_sections = split_chapter(chapter_9)
chapter_9_sections

{'principle executive authority': 'Principles of executive authority.\n129. (1) Executive authority derives from the people of Kenya\nand shall be exercised in accordance with this Constitution.\n(2) Executive authority shall be exercised in a manner compatible\nwith the principle of service to the people of Kenya, and for their well-\nbeing and benefit.',
 'national executive': 'The National Executive.\n130. (1) The national executive of the Republic comprises the\nPresident, the Deputy President and the rest of the Cabinet.\n(2) The composition of the national executive shall reflect the\nregional and ethnic diversity of the people of Kenya.\nPART 2—THE PRESIDENT AND DEPUTY PRESIDENT',
 'authority president': 'Authority of the President.\n131. (1) The President—\n(a) is the Head of State and Government;\n(b) exercises the executive authority of the Republic, with the\nassistance of the Deputy President and Cabinet Secretaries;\n(c) is the Commander-in-Chief of the Kenya Defence Force
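
The long `if`/`elif` chain in `split_chapter` can also be written as a lookup table. The sketch below is one possible variant, not the notebook's implementation: `split_by_headings` and the two-entry `headings` table are hypothetical, and the demo text is made up rather than taken from the PDF. Longer headings are tested first so that a short prefix cannot shadow a longer one.

```python
def split_by_headings(chapter_text, heading_to_key):
    """Group lines under the most recent heading found in heading_to_key."""
    # Test longer headings first so a short prefix cannot shadow a longer one
    ordered = sorted(heading_to_key.items(), key=lambda item: len(item[0]), reverse=True)
    sections = {key: [] for key in heading_to_key.values()}
    current = None
    for raw_line in chapter_text.splitlines():
        line = raw_line.strip()
        for heading, key in ordered:
            if line.startswith(heading):
                current = key
                break
        # Skip blank lines and anything before the first heading
        if current and line:
            sections[current].append(line)
    return {key: "\n".join(lines) for key, lines in sections.items()}

# Hypothetical two-entry table; the full Chapter 9 list would go here
headings = {
    "Cabinet.": "cabinet",
    "Secretary to the Cabinet": "secretary cabinet",
}
demo = "Cabinet.\nline about the cabinet\nSecretary to the Cabinet.\nline about the secretary"
result = split_by_headings(demo, headings)
print(result["secretary cabinet"])
```

Keeping the headings in one dictionary means a new chapter only needs a new table, not a new copy of the function.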

## CHAPTER 9 Question Answering Mechanism

In [46]:
sections = chapter_9_sections
# Define Q&A mapping for Chapter 9
qa_mapping = {
    "principle executive authority": "principle executive authority",
    "national executive": "national executive",
    "authority president": "authority president",
    "function president": "function president",
    "power mercy": "power mercy",
    "presidential powers temporary incumbency": "presidential powers temporary incumbency",
    "decision president": "decision president",
    "election president": "election president",
    "qualification disqualification election president": "qualification disqualification election president",
    "procedure presidential election": "procedure presidential election",
    "death assume office": "death assume office",
    "validity presidential election": "validity presidential election",
    "assumption office president": "assumption office president",
    "term office president": "term office president",
    "protection legal proceeding": "protection legal proceeding",
    "removal president incapacity": "removal president incapacity",
    "removal president impeachment": "removal president impeachment",
    "vacancy office president": "vacancy office president",
    "function deputy president": "function deputy president",
    "election deputy president": "election deputy president",
    "vacancy office deputy president": "vacancy office deputy president",
    "removal deputy president": "removal deputy president",
    "benefit president": "benefit president",
    "cabinet": "cabinet",
    "decision cabinet": "decision cabinet",
    "secretary cabinet": "secretary cabinet",
    "principal secretary": "principal secretary",
    "attorney general": "attorney general",
    "director public prosecution": "director public prosecution",
    "removal director public prosecution": "removal director public prosecution"
        
}

# Update the Q&A system to use preprocessed queries
def answer_question_nlp(query, sections, qa_mapping):
    # Preprocess the user query
    processed_query = preprocess_query(query)
    
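    # Search for a key in qa_mapping that occurs in the preprocessed query.
    # Keys are tested in insertion order, so a short key such as "cabinet"
    # matches before a longer one such as "secretary cabinet".
    for key in qa_mapping:
        if key in processed_query:
            # Return the relevant section in the text
            return sections[qa_mapping[key]]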
            
    return "Sorry, I couldn't find an answer to your question."

# Example usage
user_query = "What does the constitution state about the secretary to the cabinet?"
answer = answer_question_nlp(user_query, sections, qa_mapping)
print(answer)

Cabinet.
152. (1) The Cabinet consists of—
(a) the President;
(b) the Deputy President;
(c) the Attorney-General; and
(d) not fewer than fourteen and not more than twenty-two Cabinet
Secretaries.
(2) The President shall nominate and, with the approval of the
National Assembly, appoint Cabinet Secretaries.
(3) A Cabinet Secretary shall not be a Member of Parliament.
(4) Each person appointed as a Cabinet Secretary—
(a) assumes office by swearing or affirming faithfulness to the
people and the Republic of Kenya and obedience to this
Constitution, before the President and in accordance with the
Third Schedule; and
(b) may resign by delivering a written statement of resignation to
the President.
(5) The President—
(a) may re-assign a Cabinet Secretary;
90 Constitution of Kenya, 2010
(b) may dismiss a Cabinet Secretary; and
(c) shall dismiss a Cabinet Secretary if required to do so by a
resolution adopted under clauses (6) to (10).
(6) A member of the National Assembly, supported by at leas
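
The query above asks about the Secretary to the Cabinet, yet the Cabinet section is returned. The key `cabinet` is tested before `secretary cabinet`, and both are substrings of the processed query. One possible remedy, sketched below, is to test longer keys first. `find_most_specific_key` is a hypothetical helper, and the processed query is written out by hand as an assumption rather than produced by `preprocess_query`.

```python
def find_most_specific_key(processed_query, keys):
    """Return the longest key contained in the processed query, or None."""
    for key in sorted(keys, key=len, reverse=True):
        if key in processed_query:
            return key
    return None

# Three of the Chapter 9 keys that overlap on the word "cabinet"
keys = ["cabinet", "decision cabinet", "secretary cabinet"]
print(find_most_specific_key("constitution state secretary cabinet", keys))
# prints: secretary cabinet
```

Sorting by length is a simple heuristic; it only helps when the more specific key is also the longer string.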

In [47]:
# Example usage
user_query = "What does the constitution state about the procedure for presidential election?"
answer = answer_question_nlp(user_query, sections, qa_mapping)
print(answer)

Procedure at presidential election.
138. (1) If only one candidate for President is nominated, that
candidate shall be declared elected.
(2) If two or more candidates for President are nominated, an
election shall be held in each constituency.
(3) In a presidential election—
(a) all persons registered as voters for the purposes of
parliamentary elections are entitled to vote;
(b) the poll shall be taken by secret ballot on the day specified in
Article 101 (1) at the time, in the places and in the manner
prescribed under an Act of Parliament; and
(c) after counting the votes in the polling stations, the
Independent Electoral and Boundaries Commission shall tally
and verify the count and declare the result.
(4) A candidate shall be declared elected as President if the
candidate receives—
(a) more than half of all the votes cast in the election; and
(b) at least twenty-five per cent of the votes cast in each of more
than half of the counties.
(5) If no candidate is elected, a fresh electi

## Chapter 9 NLU

In [48]:
# Define synonym mapping
synonyms = {
    "principle executive authority": ["principle executive authority"],
    "national executive": ["national executive"],
    "authority president": ["authority president"],
    "function president": ["function president"],
    "power mercy": ["power mercy"],
    "presidential powers temporary incumbency": ["presidential powers temporary incumbency"],
    "decision president": ["decision president"],
    "election president": ["election president"],
    "qualification disqualification election president": ["qualification disqualification election president"],
    "procedure presidential election": ["procedure presidential election"],
    "death assume office": ["death assume office"],
    "validity presidential election": ["validity presidential election"],
    "assumption office president": ["assumption office president"],
    "term office president": ["term office president"],
    "protection legal proceeding": ["protection legal proceeding"],
    "removal president incapacity": ["removal president incapacity"],
    "removal president impeachment": ["removal president impeachment", "impeachment president"],
    "vacancy office president": ["vacancy office president"],
    "function deputy president": ["function deputy president"],
    "election deputy president": ["election deputy president"],
    "vacancy office deputy president": ["vacancy office deputy president"],
    "removal deputy president": ["removal deputy president", "impeachment deputy president"],
    "benefit president": ["benefit president"],
    "cabinet": ["cabinet"],
    "decision cabinet": ["decision cabinet"],
    "secretary cabinet": ["secretary cabinet"],
    "principal secretary": ["principal secretary"],
    "attorney general": ["attorney general"],
    "director public prosecution": ["director public prosecution"],
    "removal director public prosecution": ["removal director public prosecution"]
}



# QA mapping
qa_mapping = {
    "principle executive authority": "principle executive authority",
    "national executive": "national executive",
    "authority president": "authority president",
    "function president": "function president",
    "power mercy": "power mercy",
    "presidential powers temporary incumbency": "presidential powers temporary incumbency",
    "decision president": "decision president",
    "election president": "election president",
    "qualification disqualification election president": "qualification disqualification election president",
    "procedure presidential election": "procedure presidential election",
    "death assume office": "death assume office",
    "validity presidential election": "validity presidential election",
    "assumption office president": "assumption office president",
    "term office president": "term office president",
    "protection legal proceeding": "protection legal proceeding",
    "removal president incapacity": "removal president incapacity",
    "removal president impeachment": "removal president impeachment",
    "vacancy office president": "vacancy office president",
    "function deputy president": "function deputy president",
    "election deputy president": "election deputy president",
    "vacancy office deputy president": "vacancy office deputy president",
    "removal deputy president": "removal deputy president",
    "benefit president": "benefit president",
    "cabinet": "cabinet",
    "decision cabinet": "decision cabinet",
    "secretary cabinet": "secretary cabinet",
    "principal secretary": "principal secretary",
    "attorney general": "attorney general",
    "director public prosecution": "director public prosecution",
    "removal director public prosecution": "removal director public prosecution"
}

sections = chapter_9_sections

# Preprocess the query
def preprocess_query(query):
    doc = nlp(query)
    tokens = [token.lemma_.lower() for token in doc if not token.is_stop and not token.is_punct]
    return " ".join(tokens)

# Function to correct the spelling
def correct_spelling(processed_query):
    spell = SpellChecker()
    words = processed_query.split()
    # Find misspelled words
    misspelled_words = spell.unknown(words)


    corrected_words = []
    for word in words:
        # Correct the word if it's misspelled
        if word in misspelled_words:
            corrected_word = spell.correction(word) # Get the most likely correction
            corrected_words.append(corrected_word)
        else:
            corrected_words.append(word)

    # Reconstruct the initial sentence
    corrected_input = " ".join(corrected_words)

    return corrected_input

            

# Match with synonym support
def match_with_synonyms(query, qa_mapping, synonyms):
    processed_query = preprocess_query(query)
    print(f"Processed Query: {processed_query}")  # Debugging line

    # Correct spelling in the processed query
    corrected_query = correct_spelling(processed_query)
    print(f"Corrected Query: {corrected_query}")  # Debugging line


    for key, value in qa_mapping.items():
        print(f"Checking key: {key}, value: {value}")  # Debugging line
        for synonym in synonyms.get(key, [key]):
            print(f"Trying synonym: {synonym}")  # Debugging line
            if synonym in corrected_query:
                print(f"Match found with synonym: {synonym}")  # Debugging line
                return key  # Only return section value of the key
    
    print("No match found")  # Debugging line if no match is found
    return None

# Answer function with synonym matching
def answer_question_nlp(query, sections, qa_mapping, synonyms):
    section_key = match_with_synonyms(query, qa_mapping, synonyms)
    
    if section_key:
        # Retrieve the relevant section from the specified chapter
        return sections.get(section_key, "Section not found.")
    
    return "Sorry, I couldn't find an answer to your question."

# Example
user_query = "What does it say about the attorney genral?"
answer = answer_question_nlp(user_query, sections, qa_mapping, synonyms)
print(answer)



Processed Query: attorney genral
Corrected Query: attorney general
Checking key: principle executive authority, value: principle executive authority
Trying synonym: principle executive authority
Checking key: national executive, value: national executive
Trying synonym: national executive
Checking key: authority president, value: authority president
Trying synonym: authority president
Checking key: function president, value: function president
Trying synonym: function president
Checking key: power mercy, value: power mercy
Trying synonym: power mercy
Checking key: presidential powers temporary incumbency, value: presidential powers temporary incumbency
Trying synonym: presidential powers temporary incumbency
Checking key: decision president, value: decision president
Trying synonym: decision president
Checking key: election president, value: election president
Trying synonym: election president
Checking key: qualification disqualification election president, value: qualification disqua

In [49]:
# Example 2
user_query = "impeachment of deputy president?"
answer = answer_question_nlp(user_query, sections, qa_mapping, synonyms)
print(answer)

Processed Query: impeachment deputy president
Corrected Query: impeachment deputy president
Checking key: principle executive authority, value: principle executive authority
Trying synonym: principle executive authority
Checking key: national executive, value: national executive
Trying synonym: national executive
Checking key: authority president, value: authority president
Trying synonym: authority president
Checking key: function president, value: function president
Trying synonym: function president
Checking key: power mercy, value: power mercy
Trying synonym: power mercy
Checking key: presidential powers temporary incumbency, value: presidential powers temporary incumbency
Trying synonym: presidential powers temporary incumbency
Checking key: decision president, value: decision president
Trying synonym: decision president
Checking key: election president, value: election president
Trying synonym: election president
Checking key: qualification disqualification election president, va
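
The `print` calls in `match_with_synonyms` produce a long trace for every query, as the two outputs above show. A minimal sketch of the same loop using the standard `logging` module follows, so the trace can be switched on only when needed. `match_key`, the logger name, and the trimmed `demo_synonyms` table are assumptions made for illustration, and the function takes an already corrected query instead of calling `preprocess_query` and `correct_spelling`.

```python
import logging

logger = logging.getLogger("constitution_qa")

def match_key(corrected_query, synonyms):
    """Return the first key whose synonym occurs in the query, logging each attempt."""
    for key, phrases in synonyms.items():
        for phrase in phrases:
            logger.debug("Trying synonym %r for key %r", phrase, key)
            if phrase in corrected_query:
                logger.debug("Match found with synonym %r", phrase)
                return key
    logger.debug("No match found")
    return None

# Trimmed copy of the Chapter 9 synonym table, for illustration only
demo_synonyms = {
    "removal president impeachment": ["removal president impeachment", "impeachment president"],
    "removal deputy president": ["removal deputy president", "impeachment deputy president"],
}

logging.basicConfig(level=logging.WARNING)  # switch to logging.DEBUG to see the trace
print(match_key("impeachment deputy president", demo_synonyms))
# prints: removal deputy president
```

At the `WARNING` level only the answer is printed; setting the level to `DEBUG` restores a trace similar to the one shown above.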

# CHAPTER 10

In [50]:
# Extract chapter 10
chapter_10 = extract_specific_pages(pdf_path, 94, 106)
chapter_10_trimmed = chapter_10.split("CHAPTER TEN")[1].strip()

In [51]:
# Load spaCy model
nlp = spacy.load("en_core_web_sm")

# Preprocess the user query using spaCy
def preprocess_query(query):
    # Parse the query with spaCy
    doc = nlp(query)
    # Normalize the query: lowercase, lemmatize, and remove stopwords
    tokens = [token.lemma_.lower() for token in doc if not token.is_stop and not token.is_punct]
    return " ".join(tokens)

# Example usage of the preprocessing function
user_query = "What is Judicial Authority and Legal System?"
processed_query = preprocess_query(user_query)
print(processed_query)

judicial authority legal system


In [52]:
def split_chapter(chapter_text):
    # Split at key headings and strip extra whitespace
    sections = {
        "judicial authority": [],
        "judicial independence": [],
        "judicial offices": [], 
        "court systems": [],
        "supreme court": [],
        "appeal court": [],
        "high court" : [],
        "judicial appointments": [],
        "office tenure": [],
        "office removal": [],
        "subordinate courts" : [],
        "jsc establishment": [],
        "jsc functions": [],
        "judiciary fund": [] 
    }

    # Split by new lines to process line by line
    lines = chapter_text.splitlines()

    current_section = None

    for line in lines:
        stripped_line = line.strip()
        

        if stripped_line.startswith("Judicial authority"):
            current_section = "judicial authority"
        elif stripped_line.startswith("Independence of the Judiciary"):
            current_section = "judicial independence"
        elif stripped_line.startswith("Judicial offices and officers"):
            current_section = "judicial offices"
        elif stripped_line.startswith("System of courts"):
            current_section = "court systems"
        elif stripped_line.startswith("Supreme Court"):
            current_section = "supreme court"
        elif stripped_line.startswith("Court of Appeal"):
            current_section = "appeal court"
        elif stripped_line.startswith("High Court"):
            current_section = "high court"
        elif ("Appointment" in stripped_line and
              ("Chief Justice" in stripped_line or "Deputy Chief Justice" in stripped_line or "judges" in stripped_line)):
            current_section = "judicial appointments"
        elif stripped_line.startswith("Tenure of office of the Chief Justice and other judges"):
            current_section = "office tenure"
        elif stripped_line.startswith("Removal from office"):
            current_section = "office removal"
        elif stripped_line.startswith("Subordinate courts"):
            current_section = "subordinate courts"
        elif stripped_line.startswith("Establishment of the Judicial Service Commission"):
            current_section = "jsc establishment"
        elif stripped_line.startswith("Functions of the Judicial Service Commission"):
            current_section = "jsc functions"
        elif stripped_line.startswith("Judiciary Fund"):
            current_section = "judiciary fund"  

        # Append line to the current section if it's set
        if current_section:
            sections[current_section].append(stripped_line)

    # Join each section into a single string
    for key in sections:
        sections[key]= "\n".join(sections[key])

    return sections

# Split the chapter into sections
chapter_10_sections= split_chapter(chapter_10_trimmed)

# Print a specific section as an example
print("judicial authority Section:\n", chapter_10_sections['judicial authority'])

judicial authority Section:
 Judicial authority.
159. (1) Judicial authority is derived from the people and vests in,
and shall be exercised by, the courts and tribunals established by or
under this Constitution.
(2) In exercising judicial authority, the courts and tribunals shall
be guided by the following principles—
(a) justice shall be done to all, irrespective of status;
(b) justice shall not be delayed;
(c) alternative forms of dispute resolution including reconciliation,
96 Constitution of Kenya, 2010
mediation, arbitration and traditional dispute resolution
mechanisms shall be promoted, subject to clause (3);
(d) justice shall be administered without undue regard to
procedural technicalities; and
(e) the purpose and principles of this Constitution shall be
protected and promoted.
(3) Traditional dispute resolution mechanisms shall not be used
in a way that—
(a) contravenes the Bill of Rights;
(b) is repugnant to justice and morality or results in outcomes that
are repugnant to 

## Chapter 10 Question Answering Mechanism

In [53]:
# Load spaCy model
nlp = spacy.load("en_core_web_sm")

# Preprocess the user query using spaCy
def preprocess_query(query):
    # Parse the query with spaCy
    doc = nlp(query)
    # Normalize the query: lowercase, lemmatize, and remove stopwords
    tokens = [token.lemma_.lower() for token in doc if not token.is_stop and not token.is_punct]
    return " ".join(tokens)

# Define the sections variable with Chapter 10
sections = chapter_10_sections

# Define the QA mapping based on key phrases and corresponding sections
qa_mapping = {
    "judicial authority": "judicial authority",
    "judicial independence": "judicial independence",
    "judicial office": "judicial offices",
    "system": "court systems",
    "supreme court": "supreme court",
    "appeal court": "appeal court",
    "high court": "high court",
    "judicial appointment": "judicial appointments",
    "office tenure": "office tenure",
    "office removal": "office removal",
    "subordinate court": "subordinate courts",
    "jsc establishment": "jsc establishment",
    "jsc function": "jsc functions",
    "judiciary fund": "judiciary fund"
}


# Update the Q&A system to use preprocessed queries
def answer_question_nlp(query, sections, qa_mapping):
    # Preprocess the user query
    processed_query = preprocess_query(query)
    # Debug
    print(f"Processed query:{processed_query}")
    
    # Search for a key in qa_mapping that matches the preprocessed query
    for key in qa_mapping:
        if key in processed_query:
            # Return the relevant section in the text
            return sections[qa_mapping[key]]
    
        
    return "Sorry, I couldn't find an answer to your question."

# Example 1
user_query = "what is the judiciary fund?"
answer = answer_question_nlp(user_query, sections, qa_mapping)
print(answer)

Processed query:judiciary fund
Judiciary Fund.
173. (1) There is established a fund to be known as the Judiciary
Fund which shall be administered by the Chief Registrar of the
Judiciary.
(2) The Fund shall be used for administrative expenses of the
Judiciary and such other purposes as may be necessary for the
discharge of the functions of the Judiciary.
(3) Each financial year, the Chief Registrar shall prepare
estimates of expenditure for the following year, and submit them to the
National Assembly for approval.
(4) On approval of the estimates by the National Assembly, the
expenditure of the Judiciary shall be a charge on the Consolidated
Fund and the funds shall be paid directly into the Judiciary Fund.
(5) Parliament shall enact legislation to provide for the regulation
of the Fund.


In [54]:
# Example 2
user_query = "what is office tenure?"
answer = answer_question_nlp(user_query, sections, qa_mapping)
print(answer)

Processed query:office tenure
Tenure of office of the Chief Justice and other judges.
167. (1) A judge shall retire from office on attaining the age of
seventy years, but may elect to retire at any time after attaining the
age of sixty-five years.
(2) The Chief Justice shall hold office for a maximum of ten years
or until retiring under clause (1), whichever is the earlier.
(3) If the Chief Justice’s term of office expires before the Chief
Justice retires under clause (1), the Chief Justice may continue in
102 Constitution of Kenya, 2010
office as a judge of the Supreme Court.
(4) If, on the expiry of the term of office of a Chief Justice, the
Chief Justice opts to remain on the Supreme Court under clause (3),
the next person appointed as Chief Justice may be selected in
accordance with Article 166 (1), even though that appointment may
result in there being more than the maximum permitted number of


## Chapter 10 NLU

In [55]:
# Define synonym mapping
synonyms_1 = {
    "judicial authority": ["judicial authority","legal authority","court jurisdiction"],
    "judicial independence": ["judicial independence","judicial autonomy","judicial impartiality"],
    "judicial offices": ["judicial office","legal position","judicial role"],
    "court systems": ["court system","judicial system","legal system"],
    "supreme court": ["supreme court","highest court","apex court"],
    "appeal court": ["appeal court","appellate court","review court"],
    "high court": ["high court","superior court"],
    "judicial appointments": ["judicial appointment","judge appointment","judicial selection"],
    "office tenure": ["office tenure","office term","service duration"],
    "office removal": ["office removal","office dismissal","office termination"],
    "subordinate courts": ["subordinate court","local court","inferior court"],
    "jsc establishment": ["judicial service commission establishment","judicial service commission creation","judicial service commission formation"],
    "jsc functions": ["judicial service commission function","judicial service commission role","judicial service commission responsibility"],
    "judiciary fund": ["judiciary fund","judicial fund","legal fund"]

  
}
# QA mapping
qa_mapping = {
    "judicial authority": "judicial authority",
    "judicial independence": "judicial independence",
    "judicial offices": "judicial offices",
    "court systems": "court systems",
    "supreme court": "supreme court",
    "appeal court": "appeal court",
    "high court": "high court",
    "judicial appointments": "judicial appointments",
    "office tenure": "office tenure",
    "office removal": "office removal",
    "subordinate courts": "subordinate courts",
    "jsc establishment": "jsc establishment",
    "jsc functions": "jsc functions",
    "judiciary fund": "judiciary fund"
}

sections = chapter_10_sections

# Preprocess the query
def preprocess_query(query):
    doc = nlp(query)
    tokens = [token.lemma_.lower() for token in doc if not token.is_stop and not token.is_punct]
    return " ".join(tokens)

# Function to correct the spelling
def correct_spelling(processed_query):
    spell = SpellChecker()
    words = processed_query.split()
    # Find misspelled words
    misspelled_words = spell.unknown(words)


    corrected_words = []
    for word in words:
        # Correct the word if it's misspelled
        if word in misspelled_words:
            corrected_word = spell.correction(word) # Get the most likely correction
            corrected_words.append(corrected_word)
        else:
            corrected_words.append(word)

    # Reconstruct the initial sentence
    corrected_input = " ".join(corrected_words)

    return corrected_input

            

# Match with synonym support
def match_with_synonyms(query, qa_mapping, synonyms_1):
    processed_query = preprocess_query(query)
    print(f"Processed Query: {processed_query}")  

    # Correct spelling in the processed query
    corrected_query = correct_spelling(processed_query)
    print(f"Corrected Query: {corrected_query}")  


    for key, value in qa_mapping.items():
        print(f"Checking key: {key}, value: {value}")  
        for synonym in synonyms_1.get(key, [key]):
            print(f"Trying synonym: {synonym}")  
            if synonym in corrected_query:
                print(f"Match found with synonym: {synonym}")  
                return key  # Return the key of the matching section
    
    print("No match found")  
    return None

# Answer function with synonym matching
def answer_question_nlp(query, sections, qa_mapping, synonyms):
    section_key = match_with_synonyms(query, qa_mapping, synonyms)
    
    if section_key:
        # Retrieve the relevant section from the specified chapter
        return sections.get(section_key, "Section not found.")
    
    return "Sorry, I couldn't find an answer to your question."

# Example 1
user_query = "What is court jurisdiction?"
answer = answer_question_nlp(user_query, sections, qa_mapping, synonyms_1)
print(answer)



Processed Query: court jurisdiction
Corrected Query: court jurisdiction
Checking key: judicial authority, value: judicial authority
Trying synonym: judicial authority
Trying synonym: legal authority
Trying synonym: court jurisdiction
Match found with synonym: court jurisdiction
Judicial authority.
159. (1) Judicial authority is derived from the people and vests in,
and shall be exercised by, the courts and tribunals established by or
under this Constitution.
(2) In exercising judicial authority, the courts and tribunals shall
be guided by the following principles—
(a) justice shall be done to all, irrespective of status;
(b) justice shall not be delayed;
(c) alternative forms of dispute resolution including reconciliation,
96 Constitution of Kenya, 2010
mediation, arbitration and traditional dispute resolution
mechanisms shall be promoted, subject to clause (3);
(d) justice shall be administered without undue regard to
procedural technicalities; and
(e) the purpose and principles of th

In [56]:
# Example 2
user_query2 = " what is process for judge appointments?"
answer = answer_question_nlp(user_query2, sections, qa_mapping, synonyms_1)
print(answer)

Processed Query:   process judge appointment
Corrected Query: process judge appointment
Checking key: judicial authority, value: judicial authority
Trying synonym: judicial authority
Trying synonym: legal authority
Trying synonym: court jurisdiction
Checking key: judicial independence, value: judiciary independence
Trying synonym: judicial independence
Trying synonym: judicial autonomy
Trying synonym: judicial impartiality
Checking key: judicial offices, value: judicial offices
Trying synonym: judicial office
Trying synonym: legal position
Trying synonym: judicial role
Checking key: court systems, value: court systems
Trying synonym: court system
Trying synonym: judicial system
Trying synonym: legal system
Checking key: supreme court, value: supreme court
Trying synonym: supreme court
Trying synonym: highest court
Trying synonym: apex court
Checking key: appeal court, value: appeal court
Trying synonym: appeal court
Trying synonym: appellate court
Trying synonym: review court
Checking 
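
The spelling step above relies on a general English dictionary. One possible alternative, sketched here with only the standard library, is to snap each unknown word to the closest word that actually occurs in the synonym map. This is a swapped-in technique (`difflib.get_close_matches`), not what `correct_spelling` does; the names `correct_to_vocabulary` and `demo_synonyms` and the `cutoff` value are assumptions made for illustration.

```python
from difflib import get_close_matches

def correct_to_vocabulary(query, vocabulary, cutoff=0.8):
    """Replace each out-of-vocabulary word with its closest vocabulary word, if any."""
    corrected = []
    for word in query.split():
        if word in vocabulary:
            corrected.append(word)
            continue
        close = get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
        # Keep the original word when nothing is similar enough
        corrected.append(close[0] if close else word)
    return " ".join(corrected)

# Hypothetical excerpt of a synonym map, using one entry from Chapter 10
demo_synonyms = {"judiciary fund": ["judiciary fund", "judicial fund", "legal fund"]}

# Vocabulary = every distinct word used in any synonym phrase
vocabulary = sorted(
    {word for phrases in demo_synonyms.values() for phrase in phrases for word in phrase.split()}
)

print(correct_to_vocabulary("judiciry fund", vocabulary))
```

Restricting candidates to the map's own vocabulary means a correction can only ever move a query closer to a known phrase, at the cost of leaving other misspellings untouched.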

# Chapter 11

In [57]:
# chapter 11 extraction
chapter_11 = extract_specific_pages(pdf_path, 106, 121)
# Keep only the text between the Chapter Eleven and Chapter Twelve headings
chapter_11_trimmed = chapter_11.split("CHAPTER ELEVEN")[1].strip()
chapter_11_trimmed = chapter_11_trimmed.split("CHAPTER TWELVE")[0].strip()

In [58]:
def split_chapter(chapter_text):
    # Split at key headings and strip extra whitespace
    sections = {
        "object devolution": [],
        "devolved government": [],
        "county government": [],
        "membership county assembly": [],
        "speaker county assembly": [],
        "county executive committee": [],
        "election governor": [],
        "removal governor": [],
        "vacancy office governor": [],
        "function county executive committee": [],
        "urban area": [],
        "legislative authority county assembly": [],
        "power national government": [],
        "transfer power level government": [],
        "boundary county": [],
        "cooperation national county government": [],
        "support county government": [],
        "conflict law": [],
        "suspension county government": [],
        "qualification election member county assembly": [],
        "vacation office member county assembly": [],
        "county assembly power summon witness": [],
        "public participation county assembly power": [],
        "county assembly gender balance": [],
        "county government transition": [],
        "publication county legislation": [],
        "legislation chapter": []
    }

    # Split by new lines to process line by line
    lines = chapter_text.splitlines()

    current_section = None

    for line in lines:
        stripped_line = line.strip()

        if stripped_line.startswith("Objects of devolution"):
            current_section = "object devolution"
        elif stripped_line.startswith("Principles of devolved government"):
            current_section = "devolved government"
        elif stripped_line.startswith("County governments"):
            current_section = "county government"
        elif stripped_line.startswith("Membership of county assembly"):
            current_section = "membership county assembly"
        elif stripped_line.startswith("Speaker of a county assembly"):
            current_section = "speaker county assembly"
        elif stripped_line.startswith("County executive committees"):
            current_section = "county executive committee"
        elif stripped_line.startswith("Election of county governor and deputy county governor"):
            current_section = "election governor"
        elif stripped_line.startswith("Removal of a county governor"):
            current_section = "removal governor"
        elif stripped_line.startswith("Vacancy in the office of county governor"):
            current_section = "vacancy office governor"
        elif stripped_line.startswith("Functions of county executive committees"):
            current_section = "function county executive committee"
        elif stripped_line.startswith("Urban areas and cities"):
            current_section = "urban area"
        elif stripped_line.startswith("Legislative authority of county assemblies"):
            current_section = "legislative authority county assembly"
        elif stripped_line.startswith("Respective functions and powers of national"):
            current_section = "power national government"
        elif stripped_line.startswith("Transfer of functions and powers between levels of government"):
            current_section = "transfer power level government"
        elif stripped_line.startswith("Boundaries of counties"):
            current_section = "boundary county"
        elif stripped_line.startswith("Cooperation between national and county governments"):
            current_section = "cooperation national county government"
        elif stripped_line.startswith("Support for county governments"):
            current_section = "support county government"
        elif stripped_line.startswith("Conflict of laws"):
            current_section = "conflict law"
        elif stripped_line.startswith("Suspension of a county government"):
            current_section = "suspension county government"
        elif stripped_line.startswith("Qualifications for election as member of county assembly"):
            current_section = "qualification election member county assembly"
        elif stripped_line.startswith("Vacation of office of member of county assembly"):
            current_section = "vacation office member county assembly"
        elif stripped_line.startswith("County assembly power to summon witnesses"):
            current_section = "county assembly power summon witness"
        elif stripped_line.startswith("Public participation and county assembly powers"):
            current_section = "public participation county assembly power"
        elif stripped_line.startswith("County assembly gender balance and diversity"):
            current_section = "county assembly gender balance"
        elif stripped_line.startswith("County government during transition"):
            current_section = "county government transition"
        elif stripped_line.startswith("Publication of county legislation"):
            current_section = "publication county legislation"
        elif stripped_line.startswith("Legislation on Chapter"):
            current_section = "legislation chapter"
        # Append line to the current section if it's set
        if current_section:
            sections[current_section].append(stripped_line)

    # Join each section into a single string
    for key in sections:
        sections[key] = "\n".join(sections[key])

    return sections



# Split the chapter into sections
chapter_11_sections = split_chapter(chapter_11_trimmed)

# List the section keys to confirm the split
chapter_11_sections.keys()

dict_keys(['object devolution', 'devolved government', 'county government', 'membership county assembly', 'speaker county assembly', 'county executive committee', 'election governor', 'removal governor', 'vacancy office governor', 'function county executive committee', 'urban area', 'legislative authority county assembly', 'power national government', 'transfer power level government', 'boundary county', 'cooperation national county government', 'support county government', 'conflict law', 'suspension county government', 'qualification election member county assembly', 'vacation office member county assembly', 'county assembly power summon witness', 'public participation county assembly power', 'county assembly gender balance', 'county government transition', 'publication county legislation', 'legislation chapter'])
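
The long `if`/`elif` chain in `split_chapter` could also be driven by a single heading-to-key table, which keeps the headings and keys in one place. The following is a minimal sketch of that idea, not the notebook's implementation; `split_by_headings` and `demo_headings` are hypothetical names, and the sample text reuses two Chapter 11 headings.

```python
def split_by_headings(chapter_text, heading_to_key):
    """Group lines under the key of the most recent heading they follow."""
    sections = {key: [] for key in heading_to_key.values()}
    current = None
    for line in chapter_text.splitlines():
        stripped = line.strip()
        # Switch section when the line starts with a known heading
        for heading, key in heading_to_key.items():
            if stripped.startswith(heading):
                current = key
                break
        # Lines before the first heading are skipped
        if current:
            sections[current].append(stripped)
    return {key: "\n".join(lines) for key, lines in sections.items()}

# Hypothetical two-entry table; the full chapter would list every heading
demo_headings = {
    "Conflict of laws": "conflict law",
    "County assembly gender balance and diversity": "county assembly gender balance",
}

sample = (
    "Conflict of laws.\n"
    "191. (1) This Article applies to conflicts between national and\n"
    "County assembly gender balance and diversity.\n"
    "197. (1) Not more than two-thirds of the members of any county"
)

demo_sections = split_by_headings(sample, demo_headings)
print(list(demo_sections))
```

As in the original function, a body line that happens to start with a heading phrase would still switch sections, so the table does not remove that limitation.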

## Question Answering Mechanism

In [59]:
# Load spaCy model
nlp = spacy.load("en_core_web_sm")

# Preprocess the user query using spaCy
def preprocess_query(query):
    # Parse the query with spaCy
    doc = nlp(query)
    # Normalize the query: lowercase, lemmatize, and remove stopwords
    tokens = [token.lemma_.lower() for token in doc if not token.is_stop and not token.is_punct]
    return " ".join(tokens)

# Define the sections variable with Chapter 11
sections = chapter_11_sections

# Define the QA mapping based on key phrases and corresponding sections
qa_mapping = {
    "object devolution": "object devolution",
    "devolved government": "devolved government",
    "county government": "county government",
    "membership county assembly": "membership county assembly",
    "speaker county assembly": "speaker county assembly",
    "county executive committee": "county executive committee",
    "election governor": "election governor",
    "removal governor": "removal governor",
    "vacancy office governor": "vacancy office governor",
    "function county executive committee": "function county executive committee",
    "urban area": "urban area",
    "legislative authority county assembly": "legislative authority county assembly",
    "power national government": "power national government",
    "transfer power level government": "transfer power level government",
    "boundary county": "boundary county",
    "cooperation national county government": "cooperation national county government",
    "support county government": "support county government",
    "conflict law": "conflict law",
    "suspension county government": "suspension county government",
    "qualification election member county assembly": "qualification election member county assembly",
    "vacation office member county assembly": "vacation office member county assembly",
    "county assembly power summon witness": "county assembly power summon witness",
    "public participation county assembly power": "public participation county assembly power",
    "county assembly gender balance": "county assembly gender balance",
    "county government transition": "county government transition",
    "publication county legislation": "publication county legislation",
    "legislation chapter": "legislation chapter"
}

# Update the Q&A system to use preprocessed queries
def answer_question_nlp(query, sections, qa_mapping):
    # Preprocess the user query
    processed_query = preprocess_query(query)

    # Debug
    print(f"Processed query: {processed_query}")
    
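    # Search for a key in qa_mapping that appears in the preprocessed query.
    # Longer keys are tried first so that a generic key such as
    # "county government" does not shadow "suspension county government".
    for key in sorted(qa_mapping, key=len, reverse=True):
        if key in processed_query:
            # Return the relevant section in the text
            return sections[qa_mapping[key]]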
            
    return "Sorry, I couldn't find an answer to your question."


# Example 1
user_query = "what about conflict law?"
answer = answer_question_nlp(user_query, sections, qa_mapping)
print(answer)

Processed query: conflict law
Conflict of laws.
191. (1) This Article applies to conflicts between national and
county legislation in respect of matters falling within the concurrent
jurisdiction of both levels of government.
(2) National legislation prevails over county legislation if—
(a) the national legislation applies uniformly throughout Kenya and
any of the conditions specified in clause (3) is satisfied; or
(b) the national legislation is aimed at preventing unreasonable
action by a county that—
(i) is prejudicial to the economic, health or security
interests of Kenya or another county; or
(ii) impedes the implementation of national economic policy.
(3) The following are the conditions referred to in clause (2) (a)—
(a) the national legislation provides for a matter that cannot be
regulated effectively by legislation enacted by the individual
counties;
(b) the national legislation provides for a matter that, to be dealt
with effectively, requires uniformity across the nation, a

In [60]:
user_query = "what is county assembly gender balance?"
answer= answer_question_nlp(user_query, sections, qa_mapping)
print(answer)

Processed query: county assembly gender balance
County assembly gender balance and diversity.
197. (1) Not more than two-thirds of the members of any county
assembly or county executive committee shall be of the same gender.
(2) Parliament shall enact legislation to—
(a) ensure that the community and cultural diversity of a county is
reflected in its county assembly and county executive
committee; and
(b) prescribe mechanisms to protect minorities within counties.


## Chapter 11 NLU

In [61]:
# Define synonym mapping
synonyms = {
    "object devolution": ["object devolution"],
    "devolved government": ["devolved government", "principle devolved government"],
    "county government": ["county government"],
    "membership county assembly": ["membership county assembly", "member county assembly"],
    "speaker county assembly": ["speaker county assembly"],
    "county executive committee": ["county executive committee", "county executive"],
    "election governor": ["election governor", "election deputy governor", "election county governor deputy county governor"],
    "removal governor": ["removal governor", "removal county governor"],
    "vacancy office governor": ["vacancy office governor", "vacancy governor"],
    "function county executive committee": ["function county executive committee", "function county executive"],
    "urban area": ["urban area", "city", "urban area city"],
    "legislative authority county assembly": ["legislative authority county assembly", "authority county assembly"],
    "power national government": ["power national government", "power county government", "power national county government", "function national government", "function county government"],
    "transfer power level government": ["transfer power level government", "transfer function level government", "transfer power government", "transfer function government", "transfer function power level government"],
    "boundary county": ["boundary county"],
    "cooperation national county government": ["cooperation national county government", "cooperation government"],
    "support county government": ["support county government", "assistance county government"],
    "conflict law": ["conflict law"],
    "suspension county government": ["suspension county government", "suspension government"],
    "qualification election member county assembly": ["qualification election member county assembly", "election county assembly"],
    "vacation office member county assembly": ["vacation office member county assembly", "vacation county assembly", "vacation member county"],
    "county assembly power summon witness": ["county assembly power summon witness", "power summon witness"],
    "public participation county assembly power": ["public participation county assembly power", "power county assembly", "county assembly power", "public participation county assembly power immunity"],
    "county assembly gender balance": ["county assembly gender balance", "county assembly diversity", "county gender balance", "county diversity"],
    "county government transition": ["county government transition", "transition county government"],
    "publication county legislation": ["publication county legislation"],
    "legislation chapter": ["legislation chapter"]
}


# Define the qa mapping
qa_mapping = {
    "object devolution": "object devolution",
    "devolved government": "devolved government",
    "county government": "county government",
    "membership county assembly": "membership county assembly",
    "speaker county assembly": "speaker county assembly",
    "county executive committee": "county executive committee",
    "election governor": "election governor",
    "removal governor": "removal governor",
    "vacancy office governor": "vacancy office governor",
    "function county executive committee": "function county executive committee",
    "urban area": "urban area",
    "legislative authority county assembly": "legislative authority county assembly",
    "power national government": "power national government",
    "transfer power level government": "transfer power level government",
    "boundary county": "boundary county",
    "cooperation national county government": "cooperation national county government",
    "support county government": "support county government",
    "conflict law": "conflict law",
    "suspension county government": "suspension county government",
    "qualification election member county assembly": "qualification election member county assembly",
    "vacation office member county assembly": "vacation office member county assembly",
    "county assembly power summon witness": "county assembly power summon witness",
    "public participation county assembly power": "public participation county assembly power",
    "county assembly gender balance": "county assembly gender balance",
    "county government transition": "county government transition",
    "publication county legislation": "publication county legislation",
    "legislation chapter": "legislation chapter"
}


sections = chapter_11_sections

# Preprocess the query
def preprocess_query(query):
    doc = nlp(query)
    tokens = [token.lemma_.lower() for token in doc if not token.is_stop and not token.is_punct]
    return " ".join(tokens)

# Function to correct the spelling
def correct_spelling(processed_query):
    spell = SpellChecker()
    words = processed_query.split()
    # Find misspelled words
    misspelled_words = spell.unknown(words)

    corrected_words = []
    for word in words:
        if word in misspelled_words:
            # spell.correction() can return None when it has no candidate,
            # so fall back to the original word to keep the join below safe
            corrected_words.append(spell.correction(word) or word)
        else:
            corrected_words.append(word)

    # Rebuild the query from the corrected words
    return " ".join(corrected_words)

# Match with synonym support
def match_with_synonyms(query, qa_mapping, synonyms_2):
    processed_query = preprocess_query(query)
    print(f"Processed Query: {processed_query}")  

    # Correct spelling in the processed query
    corrected_query = correct_spelling(processed_query)
    print(f"Corrected Query: {corrected_query}")  


    for key, value in qa_mapping.items():
        print(f"Checking key: {key}, value: {value}")  
        for synonym in synonyms_2.get(key, [key]):
            print(f"Trying synonym: {synonym}")  
            if synonym in corrected_query:
                print(f"Match found with synonym: {synonym}")  
                return key  # Return the key of the matching section
    
    print("No match found")  
    return None

# Answer function with synonym matching
def answer_question_nlp(query, sections, qa_mapping, synonyms):
    section_key = match_with_synonyms(query, qa_mapping, synonyms)
    
    if section_key:
        # Retrieve the relevant section from the specified chapter
        return sections.get(section_key, "Section not found.")
    
    return "Sorry, I couldn't find an answer to your question."

# Example 1
user_query = "vacancy office governor?"
# Pass the Chapter 11 synonym map defined in this cell
answer = answer_question_nlp(user_query, sections, qa_mapping, synonyms)
print(answer)


Processed Query: vacancy office governor
Corrected Query: vacancy office governor
Checking key: object devolution, value: object devolution
Trying synonym: object devolution
Checking key: devolved government, value: devolved government
Trying synonym: devolved government
Checking key: county government, value: county government
Trying synonym: county government
Checking key: membership county assembly, value: membership county assembly
Trying synonym: membership county assembly
Checking key: speaker county assembly, value: speaker county assembly
Trying synonym: speaker county assembly
Checking key: county executive committee, value: county executive committee
Trying synonym: county executive committee
Checking key: election governor, value: election governor
Trying synonym: election governor
Checking key: removal governor, value: removal governor
Trying synonym: removal governor
Checking key: vacancy office governor, value: vacancy office governor
Trying synonym: vacancy office govern
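
Because `match_with_synonyms` uses a plain substring test and returns the first hit, a short synonym such as `city` would also match inside a word like `electricity`, and a generic key can win over a more specific one that is checked later. A possible refinement, sketched below under the assumption that whole-word, longest-phrase-first matching is acceptable, is shown with the hypothetical names `find_section_key` and `demo_synonyms`.

```python
import re

def find_section_key(query, synonym_map):
    """Return the key whose longest synonym appears in the query as whole words."""
    candidates = [
        (synonym, key)
        for key, synonym_list in synonym_map.items()
        for synonym in synonym_list
    ]
    # Longest phrases first, so specific sections beat generic ones
    candidates.sort(key=lambda pair: len(pair[0]), reverse=True)
    for synonym, key in candidates:
        # \b anchors the phrase at word boundaries on both sides
        if re.search(rf"\b{re.escape(synonym)}\b", query):
            return key
    return None

# Hypothetical three-entry excerpt of the Chapter 11 synonym map
demo_synonyms = {
    "county government": ["county government"],
    "urban area": ["urban area", "city"],
    "suspension county government": ["suspension county government", "suspension government"],
}

print(find_section_key("suspension county government", demo_synonyms))
print(find_section_key("electricity supply", demo_synonyms))
```

The query is assumed to be already lemmatized and lowercased, as `preprocess_query` produces it.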

# Chapter 12

In [62]:
chapter_12 = extract_specific_pages(pdf_path, 120, 138)
chapter_12_trimmed = chapter_12.split("CHAPTER TWELVE")[1].strip()

In [63]:
def split_chapter(chapter_text):
    # Split at key headings and strip extra whitespace
    sections = {
        "principle public finance": [],
        "equitable share national revenue": [],
        "equitable share": [],
        "equalisation fund": [],
        "consultation financial legislation": [],
        "consolidate fund": [],
        "revenue fund": [],
        "contingency fund": [],
        "power impose tax": [],
        "imposition tax": [],
        "borrow national government": [],
        "borrow county": [],
        "loan guarantee": [],
        "public debt": [],
        "commission revenue allocation": [],
        "function commission": [],
        "division revenue": [],
        "annual division revenue": [],
        "transfer equitable share": [],
        "budget form content": [],
        "budget estimate": [],
        "expenditure budget": [],
        "supplementary appropriation": [],
        "county appropriation bill": [],
        "financial control": [],
        "account audit": [],
        "procurement public good": [],
        "controller budget": [],
        "auditor general": [],
        "salary remuneration commission": [],
        "central bank kenya": []
    }

    # Split by new lines to process line by line
    lines = chapter_text.splitlines()

    current_section = None

    for line in lines:
        stripped_line = line.strip()

        if stripped_line.startswith("Principles of public finance"):
            current_section = "principle public finance"
        elif stripped_line.startswith("Equitable sharing of national revenue"):
            current_section = "equitable share national revenue"
        elif stripped_line.startswith("Equitable share and other financial laws"):
            current_section = "equitable share"
        elif stripped_line.startswith("Equalisation Fund"):
            current_section = "equalisation fund"
        elif stripped_line.startswith("Consultation on financial legislation affecting counties"):
            current_section = "consultation financial legislation"
        elif stripped_line.startswith("Consolidated Fund and other public funds"):
            current_section = "consolidate fund"
        elif stripped_line.startswith("Revenue Funds for county governments"):
            current_section = "revenue fund"
        elif stripped_line.startswith("Contingencies Fund"):
            current_section = "contingency fund"
        elif stripped_line.startswith("Power to impose taxes and charges"):
            current_section = "power impose tax"
        elif stripped_line.startswith("Imposition of tax"):
            current_section = "imposition tax"
        elif stripped_line.startswith("Borrowing by national government"):
            current_section = "borrow national government"
        elif stripped_line.startswith("Borrowing by counties"):
            current_section = "borrow county"
        elif stripped_line.startswith("Loan guarantees by national government"):
            current_section = "loan guarantee"
        elif stripped_line.startswith("Public debt"):
            current_section = "public debt"
        elif stripped_line.startswith("Commission on Revenue Allocation"):
            current_section = "commission revenue allocation"
        elif stripped_line.startswith("Functions of the Commission on Revenue Allocation"):
            current_section = "function commission"
        elif stripped_line.startswith("Division of revenue"):
            current_section = "division revenue"
        elif stripped_line.startswith("Annual Division and Allocation of Revenue Bills"):
            current_section = "annual division revenue"
        elif stripped_line.startswith("Transfer of equitable share"):
            current_section = "transfer equitable share"
        elif stripped_line.startswith("Form, content and timing of budgets"):
            current_section = "budget form content"
        elif stripped_line.startswith("Budget estimates and annual Appropriation Bill"):
            current_section = "budget estimate"
        elif stripped_line.startswith("Expenditure before annual budget is passed"):
            current_section = "expenditure budget"
        elif stripped_line.startswith("Supplementary appropriation"):
            current_section = "supplementary appropriation"
        elif stripped_line.startswith("County appropriation Bills"):
            current_section = "county appropriation bill"
        elif stripped_line.startswith("Financial control"):
            current_section = "financial control"
        elif stripped_line.startswith("Accounts and audit of public entities"):
            current_section = "account audit"
        elif stripped_line.startswith("Procurement of public goods and services"):
            current_section = "procurement public good"
        elif stripped_line.startswith("Controller of Budget"):
            current_section = "controller budget"
        elif stripped_line.startswith("Auditor-General"):
            current_section = "auditor general"
        elif stripped_line.startswith("Salaries and Remuneration Commission"):
            current_section = "salary remuneration commission"
        elif stripped_line.startswith("Central Bank of Kenya"):
            current_section = "central bank kenya"

        # Append line to the current section if it's set
        if current_section:
            sections[current_section].append(stripped_line)

    # Join each section into a single string
    for key in sections:
        sections[key] = "\n".join(sections[key])

    return sections



# Split Chapter 12 into sections
chapter_12_sections = split_chapter(chapter_12_trimmed)
chapter_12_sections


{'principle public finance': 'Principles of public finance.\n201. The following principles shall guide all aspects of public\nfinance in the Republic—\n(a) there shall be openness and accountability, including public\nparticipation in financial matters;\n(b) the public finance system shall promote an equitable society,\nand in particular—\n(i) the burden of taxation shall be shared fairly;\n(ii) revenue raised nationally shall be shared equitably\namong national and county governments; and\n(iii) expenditure shall promote the equitable development of\nthe country, including by making special provision for\nmarginalised groups and areas;\n(c) the burdens and benefits of the use of resources and public\nborrowing shall be shared equitably between present and\nfuture generations;\n122 Constitution of Kenya, 2010\n(d) public money shall be used in a prudent and responsible way;\nand\n(e) financial management shall be responsible, and fiscal\nreporting shall be clear.',
 'equitable share na

### Question Answering Mechanism

In [64]:
# Load spacy model
nlp = spacy.load("en_core_web_sm")

# Preprocess the user query using spaCy
def preprocess_query(query):
    # Parse the query with spaCy
    doc = nlp(query)
    # Normalize the query: lowercase, lemmatize, and remove stopwords
    tokens = [token.lemma_.lower() for token in doc if not token.is_stop and not token.is_punct]
    return " ".join(tokens)

# Define the sections variable with Chapter 12
sections = chapter_12_sections
    
# Define the QA mapping based on key phrases and corresponding sections
# (every value must be a key of chapter_12_sections)
qa_mapping = {
    "principle public finance": "principle public finance",
    "equitable share national revenue": "equitable share national revenue",
    "equitable share": "equitable share",
    "equalisation fund": "equalisation fund",
    "consultation financial legislation": "consultation financial legislation",
    "consolidate fund": "consolidate fund",
    "revenue fund": "revenue fund",
    "contingency fund": "contingency fund",
    "power impose tax": "power impose tax",
    "imposition tax": "imposition tax",
    "borrow national government": "borrow national government",
    "borrow county": "borrow county",
    "loan guarantee": "loan guarantee",
    "public debt": "public debt",
    "commission revenue allocation": "commission revenue allocation",
    "function commission": "function commission",
    "division revenue": "division revenue",
    "annual division revenue": "annual division revenue",
    "transfer equitable share": "transfer equitable share",
    "budget form content": "budget form content",
    "budget estimate": "budget estimate",
    "expenditure budget": "expenditure budget",
    "supplementary appropriation": "supplementary appropriation",
    "county appropriation bill": "county appropriation bill",
    "financial control": "financial control",
    "account audit": "account audit",
    "procurement public good": "procurement public good",
    "controller budget": "controller budget",
    "auditor general": "auditor general",
    "salary remuneration commission": "salary remuneration commission",
    "central bank kenya": "central bank kenya"
}


# Update the Q&A system to use preprocessed queries
def answer_question_nlp(query, sections, qa_mapping):
    # Preprocess the user query
    processed_query = preprocess_query(query)

    # Debug
    print(f"Processed query: {processed_query}")
    
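    # Search for a key in qa_mapping that appears in the preprocessed query.
    # Longer keys are tried first so that "equitable share" does not shadow
    # "transfer equitable share", nor "division revenue" "annual division revenue".
    for key in sorted(qa_mapping, key=len, reverse=True):
        if key in processed_query:
            # Return the relevant section, or a fallback if the mapped key is missing
            return sections.get(qa_mapping[key], "Section not found.")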
            
    return "Sorry, I couldn't find an answer to your question."

# Example 1
user_query = "What are loan guarantees?"
response = answer_question_nlp(user_query, sections, qa_mapping)
print(response)


Processed query: loan guarantee
Loan guarantees by national government.
213. (1) An Act of Parliament shall prescribe terms and
conditions under which the national government may guarantee loans.
(2) Within two months after the end of each financial year, the
national government shall publish a report on the guarantees that it
gave during that year.


In [65]:
# Example 2
user_query = "Explain division on procurement of public goods?"
answer = answer_question_nlp(user_query, sections, qa_mapping)
print(answer)

Processed query: explain division procurement public good
Procurement of public goods and services.
227. (1) When a State organ or any other public entity contracts
for goods or services, it shall do so in accordance with a system that is
fair, equitable, transparent, competitive and cost-effective.
(2) An Act of Parliament shall prescribe a framework within which
policies relating to procurement and asset disposal shall be
implemented and may provide for all or any of the following—
(a) categories of preference in the allocation of contracts;
(b) the protection or advancement of persons, categories of
persons or groups previously disadvantaged by unfair
competition or discrimination;
(c) sanctions against contractors that have not performed
according to professionally regulated procedures, contractual
agreements or legislation; and
(d) sanctions against persons who have defaulted on their tax
obligations, or have been guilty of corrupt practices or serious
violations of fair employmen
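
The loop in `answer_question_nlp` returns the first key that occurs in the query, so the result depends on the order of `qa_mapping`. For example, a query that lemmatizes to "annual division revenue" also contains the shorter key "division revenue", which is listed first. One possible way to make the choice independent of dictionary order is to try longer keys first. The sketch below is only an illustration: `find_section_key` and `demo_mapping` are hypothetical names, not part of the notebook.

```python
def find_section_key(processed_query, qa_mapping):
    """Return the longest qa_mapping key contained in the query, or None."""
    # Try longer keys first so that a specific phrase such as
    # "annual division revenue" wins over the shorter "division revenue".
    for key in sorted(qa_mapping, key=len, reverse=True):
        if key in processed_query:
            return key
    return None


# Hypothetical two-entry mapping, for illustration only
demo_mapping = {
    "division revenue": "division revenue",
    "annual division revenue": "annual division revenue",
}
print(find_section_key("annual division revenue", demo_mapping))
```

The returned key could then be used to index `sections` in the same way as in the cell above.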

In [66]:
# Define synonym mapping
synonyms = {
    "principle public finance": ["principle public finance", "public finance guideline"],
    "equitable share national revenue": ["equitable share national revenue", "fair distribution revenue"],
    "equitable share": ["equitable share", "financial law"],
    "equalisation fund": ["equalisation fund", "equalization fund"],
    "consultation financial legislation": ["consultation financial legislation"],
    "consolidate fund": ["consolidate fund", "public fund"],
    "revenue fund": ["revenue fund", "revenue fund county"],
    "contingency fund": ["contingency fund"],
    "power impose tax": ["power impose tax", "power impose charge"],
    "imposition tax": ["imposition tax"],
    "borrow national government": ["borrow national government"],
    "borrow county": ["borrow county"],
    "loan guarantee": ["loan guarantee", "loan guarantee national government"],
    "public debt": ["public debt"],
    "commission revenue allocation": ["commission revenue allocation", "revenue allocation"],
    "function commission": ["function commission", "function commission revenue allocation"],
    "division revenue": ["division revenue"],
    "annual division revenue": ["annual division revenue", "annual allocation"],
    "transfer equitable share": ["transfer equitable share"],
    "budget form content": ["budget form content"],
    "budget estimate": ["budget estimate"],
    "expenditure budget": ["expenditure budget"],
    "supplementary appropriation": ["supplementary appropriation"],
    "county appropriation bill": ["county appropriation bill"],
    "financial control": ["financial control"],
    "account audit": ["account audit", "account audit public entity", "audit entity"],
    "procurement public good": ["procurement public good", "procurement private good", "procurement service"],
    "controller budget": ["controller budget"],
    "auditor general": ["auditor general"],
    "salary remuneration commission": ["salary remuneration commission", "salary remuneration"],
    "central bank kenya": ["central bank kenya", "central bank"]
}

# qa mapping
qa_mapping = {
    "principle public finance": "principle public finance",
    "equitable share national revenue": "equitable share national revenue",
    "equitable share": "equitable share",
    "equalisation fund": "equalisation fund",
    "consultation financial legislation": "consultation financial legislation",
    "consolidate fund": "consolidate fund",
    "revenue fund": "revenue fund",
    "contingency fund": "contingency fund",
    "power impose tax": "power impose tax",
    "imposition tax": "imposition tax",
    "borrow national government": "borrow national government",
    "borrow county": "borrow county",
    "loan guarantee": "loan guarantee",
    "public debt": "public debt",
    "commission revenue allocation": "commission revenue allocation",
    "function commission": "function commission",
    "division revenue": "division revenue",
    "annual division revenue": "annual division revenue",
    "transfer equitable share": "transfer equitable share",
    "budget form content": "budget form content",
    "budget estimate": "budget estimate",
    "expenditure budget": "expenditure budget",
    "supplementary appropriation": "supplementary appropriation",
    "county appropriation bill": "county appropriation bill",
    "financial control": "financial control",
    "account audit": "account audit",
    "procurement public good": "procurement public good",
    "controller budget": "controller budget",
    "auditor general": "auditor general",
    "salary remuneration commission": "salary remuneration commission",
    "central bank kenya": "central bank kenya"
}



sections = chapter_12_sections

# Preprocess the query
def preprocess_query(query):
    doc = nlp(query)
    tokens = [token.lemma_.lower() for token in doc if not token.is_stop and not token.is_punct]
    return " ".join(tokens)

# Function to correct the spelling
def correct_spelling(processed_query):
    spell = SpellChecker()
    words = processed_query.split()
    # Find misspelled words
    misspelled_words = spell.unknown(words)


    corrected_words = []
    for word in words:
        # Correct the word if it's misspelled
        if word in misspelled_words:
            # Get the most likely correction; keep the original word if none is found
            corrected_word = spell.correction(word) or word
            corrected_words.append(corrected_word)
        else:
            corrected_words.append(word)

    # Reconstruct the initial sentence
    corrected_input = " ".join(corrected_words)

    return corrected_input

# Match with synonym support
def match_with_synonyms(query, qa_mapping, synonyms):
    processed_query = preprocess_query(query)
    print(f"Processed Query: {processed_query}")

    # Correct spelling in the processed query
    corrected_query = correct_spelling(processed_query)
    print(f"Corrected Query: {corrected_query}")

    for key, value in qa_mapping.items():
        print(f"Checking key: {key}, value: {value}")
        # Fall back to the key itself when no synonyms are listed for it
        for synonym in synonyms.get(key, [key]):
            print(f"Trying synonym: {synonym}")
            if synonym in corrected_query:
                print(f"Match found with synonym: {synonym}")
                return key  # Return the qa_mapping key, used below to look up the section

    print("No match found")
    return None

# Answer function with synonym matching and spelling correction
def answer_question_nlp(query, sections, qa_mapping, synonyms):
    section_key = match_with_synonyms(query, qa_mapping, synonyms)

    if section_key:
        # Retrieve the relevant section from the specified chapter
        return sections.get(section_key, "Section not found.")

    return "Sorry, I couldn't find an answer to your question."

# Example 1
user_query = "what is the equalisation fund?"
answer = answer_question_nlp(user_query, sections, qa_mapping, synonyms)
print(answer)


Processed Query: equalisation fund
Corrected Query: equalization fund
Checking key: principle public finance, value: principle public finance
Trying synonym: principle public finance
Trying synonym: public finance guideline
Checking key: equitable share national revenue, value: equitable share national revenue
Trying synonym: equitable share national revenue
Trying synonym: fair distribution revenue
Checking key: equitable share, value: equitable share
Trying synonym: equitable share
Trying synonym: financial law
Checking key: equalisation fund, value: equalisation fund
Trying synonym: equalisation fund
Trying synonym: equalization fund
Match found with synonym: equalization fund
Equalisation Fund.
204. (1) There is established an Equalisation Fund into which
shall be paid one half per cent of all the revenue collected by the
national government each year calculated on the basis of the most
recent audited accounts of revenue received, as approved by the
National Assembly.
(2) The natio
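
In the output above, the spell checker rewrites "equalisation" as "equalization", and the match only succeeds because that spelling is listed as a synonym. A fuzzy comparison against the known keys is one way to tolerate such spelling variants without listing each one. The sketch below swaps in the standard library's `difflib` for this purpose; it is not what the notebook does, and `closest_key`, `demo_keys` and the `cutoff` value are assumptions chosen for illustration.

```python
import difflib


def closest_key(processed_query, keys, cutoff=0.8):
    """Return the key most similar to the query, or None if nothing is close."""
    matches = difflib.get_close_matches(processed_query, list(keys), n=1, cutoff=cutoff)
    return matches[0] if matches else None


# Hypothetical key list, for illustration only
demo_keys = ["equalisation fund", "contingency fund", "public debt"]
print(closest_key("equalization fund", demo_keys))
print(closest_key("teacher", demo_keys))
```

Because the whole query is compared with each key, this works best as a fallback for short queries; longer questions would still need the substring or synonym search first.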

In [67]:
# Example 2
user_query = "equitable share?"
answer = answer_question_nlp(user_query, sections, qa_mapping, synonyms)
print(answer)

Processed Query: equitable share
Corrected Query: equitable share
Checking key: principle public finance, value: principle public finance
Trying synonym: principle public finance
Trying synonym: public finance guideline
Checking key: equitable share national revenue, value: equitable share national revenue
Trying synonym: equitable share national revenue
Trying synonym: fair distribution revenue
Checking key: equitable share, value: equitable share
Trying synonym: equitable share
Match found with synonym: equitable share
Equitable share and other financial laws.
203. (1) The following criteria shall be taken into account in
determining the equitable shares provided for under Article 202 and in
all national legislation concerning county government enacted in terms
of this Chapter—
(a) the national interest;
(b) any provision that must be made in respect of the public debt
and other national obligations;
(c) the needs of the national government, determined by
objective criteria;
(d) the 
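
The test `if synonym in corrected_query` is a plain substring check, so a short key can match inside a longer, unrelated word: "land" is contained in "landhold", and "old" in "household". A possible alternative is to compare whole tokens instead. The helper below is a minimal sketch under that assumption; `contains_phrase` is a hypothetical name and is not defined elsewhere in the notebook.

```python
def contains_phrase(processed_query, phrase):
    """True if the words of `phrase` occur as consecutive whole tokens in the query."""
    query_tokens = processed_query.split()
    phrase_tokens = phrase.split()
    n = len(phrase_tokens)
    if n == 0:
        return False
    # Slide a window of n tokens over the query and compare it with the phrase
    return any(
        query_tokens[i:i + n] == phrase_tokens
        for i in range(len(query_tokens) - n + 1)
    )


print("old" in "household threshold")                   # substring check
print(contains_phrase("household threshold", "old"))    # whole-token check
print(contains_phrase("right old person", "old"))
```

Since `preprocess_query` already returns space-separated lemmas, the helper could replace the `in` test without any other change to the matching loop.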

## Cumulative synonyms, qa_mapping and sections

In [68]:
# Define the synonyms explicitly so that we can easily reference them later
synonyms = {
    "supremacy": ["supremacy", "authority", "ultimate power"],
    "sovereignty": ["sovereignty", "power of the people", "authority of the people", "self rule", "autonomy"],
    "defence": ["defense", "protection", "preservation"],
    "declaration": ["declaration", "proclamation", "statement", "announcement", "affirmation"],
    "territory": ["territory", "land", "region", "area", "jurisdiction", "bounds"],
    "devolution": ["devolution", "decentralization", "delegation", "transfer of power", "local governance", "subsidiarity"],
    "languages": ["languages", "tongues", "dialects", "official languages", "linguistic diversity"],
    "religion": ["religion", "faith", "belief systems", "spiritual practice", "secularism", "church-state separation"],
    "symbol": ["symbol", "emblem", "insignia", "representation", "national icon"],
    "day": ["day", "holiday", "observance", "public holiday", "commemoration", "remembrance"],
    "value": ["value", "principle", "ethic", "core value", "standard", "national ideal"],
    "governance": ["governance", "government", "administration", "management", "public service", "political structure"],
    "culture": ["culture", "heritage", "tradition", "customs", "societal norms", "arts"],
    "entitlement": ["entitlement", "right", "eligibility", "entitlement rights", "benefits", "privileges"],
    "retention": ["retention", "maintenance", "keeping", "preservation", "continuation"],
    "birth": ["birth", "nativity", "origin", "ancestry", "inborn citizenship"],
    "registration": ["registration", "enlistment", "enrollment", "citizenship application", "naturalization"],
    "dual": ["dual", "multiple", "dual nationality", "two-fold citizenship"],
    "revocation": ["revocation", "cancellation", "annulment", "rescission", "forfeiture", "withdrawal"],
    "legislation": ["legislation", "laws", "legal framework", "statutes", "enactment"],
    "fundamental": ["fundamental", "fundamental freedoms", "essential"],
    "application": ["application", "exercise", "freedom enforcement"],
    "implementation": ["implementation", "right support", "execution"],
    "authority": ["authority", "judicial power", "court jurisdiction", "judicial review"],
    "limitation": ["limitation", "freedom restriction", "limit entitlement"],
    "limit": ["limit","absolute right", "inalienable freedom", "immutable right"],
    "life": ["life", "existence", "entitlement to life", "survive"],
    "equality": ["equality", "equal treatment", "justice"],
    "dignity": ["dignity", "intrinsic respect", "inherent value", "personal honor"],
    "security": ["security", "liberty", "personal safety"],
    "slavery": ["slavery", "bondage", "force servitude", "involuntary labor", "coerce work"],
    "privacy": ["privacy", "confidential", "personal space"],
    "conscience": ["conscience", "liberty of thought", "religion freedom", "belief"],
    "expression": ["expression", "communication", "speak"],
    "medium": ["medium", "press autonomy", "journalist freedom", "right to information"],
    "information": ["information", "transparent", "datum"],
    "association": ["association", "union", "group formation", "association right"],
    "assembly": ["assembly", "dissent", "peace assembly", "demonstrate", "petition right"],
    "political": ["political", "electoral right", "political engagement", "vote right"],
    "movement": ["movement", "migration", "resident freedom", "movement liberty"],
    "Property": ["Property", "property safeguard", "assets security"],
    "work": ["work", "benefit", "employ"],
    "environment": ["environment", "conserve", "right to a healthy environment"],
    "economic": ["economic", "standard of life", "social welfare"],
    "language": ["language", "cultural right", "linguistic freedom", "cultural expression"],
    "family": ["family", "society foundation", "parent right"],
    "consumer": ["consumer", "client entitlement", "buyer safeguard"],
    "administrative": ["administrative", "fair administration rule", "legal administration measure"],
    "justice": ["justice", "legal access", "justice available"],
    "arrest": ["arrest", "capture", "seizure", "apprehension"],
    "fair hearing": ["fair hearing", "prosecute", "equity hearing"],
    "custody": ["custody", "confinement", "imprisonment", "prison"],
    "interpret": ["interpret", "explain", "clarify", "overview"],
    "infant": ["infant", "child", "kid", "toddler"],
    "disable": ["disable", "handicap", "impairment", "challenge"],
    "youth": ["youth", "young adult", "adolescent"],
    "minority": ["minority", "diversity right", "affirmative right"],
    "old": ["old", "elder", "vintage"],
    "emergency": ["emergency", "danger", "crisis", "disaster"],
    "national": ["national", "human right", "equaity Body", "right authority"],
    "principle": ["principle", "land management principle", "land policy guide"],
    "classification": ["classification", "land category", "types of land"],
    "public": ["public", "government land", "state property", "national land", "public landhold"],
    "community": ["community", "ethnic land", "cultural landhold", "community land"],
    "private": ["private", "individual landhold", "personal property", "freehold land", "private land"],
    "landhold" : ["landhold", "foreign lease", "alien land", "non-citizen land"],
    "regulation" : ["regulation", "land use policy", "property regulation", "land oversight"],
    "commission" : ["commission", "land authority", "public land commission", "land policy agency"],
    "land" : ["land", "land legislation", "property law", "land-use regulation", "land act"],
    "obligation" : ["obligation", "biodiversity duty", "conservation mandate"],
    "environmental" : ["environmental", "ecological redress", "sustainability enforcement"],
    "resource" : ["resource", "resource use agreement", "environmental concession"],
    "legislation" : ["legislation", "environmental law", "green policy enactment"],
    "responsibility": ["responsibility", "accountable", "commitment", "charge"],
    "oath": ["oath", "swear", "state office affirmation", "vow", "declare"],
    "conduct": ["conduct", "behavior", "comportment", "demeanor"],
    "finance": ["finance", "finance integrity", "wealth", "invest", "finance management"],
    "restrict": ["restrict", "constrain", "regulate", "prohibit", "curtail", "control"],
    "citizen": ["citizen", "inhabitant", "local resident"],
    "establish": ["establish", "corrupt", "bribe", "fraud"],
    "leader": ["leader", "governing, authority"],
    "judicial authority": ["judicial authority","legal authority","court jurisdiction"],
    "judicial independence": ["judicial independence","judicial autonomy","judicial impartiality"],
    "judicial offices": ["judicial office","legal position","judicial role"],
    "court systems": ["court system","judicial system","legal system"],
    "supreme court": ["supreme court","highest court","apex court"],
    "appeal court": ["appeal court","appellate court","review court"],
    "high court": ["high court","superior court"],
    "judicial appointments": ["judicial appointment","judge appointment","judicial selection"],
    "office tenure": ["office tenure","office term","service duration"],
    "office removal": ["office removal","office dismissal","office termination"],
    "subordinate courts": ["subordinate court","local court","inferior court"],
    "jsc establishment": ["judicial service commission establishment","judicial service commission creation","judicial service commission formation"],
    "jsc functions": ["judicial service commission function","judicial service commission role","judicial service commission responsibility"],
    "judiciary fund": ["judiciary fund","judicial fund","legal fund"],
    "object devolution": ["object devolution", "power decentralization", "government devolution"],
    "devolve government": ["devolve government", "decentralization principle"],
    "county government": ["county government", "local government", "regional administration"],
    "membership": ["membership", "composition", "structure"],
    "speaker": ["speaker", "assembly speaker", "preside officer"],
    "county executive committee": ["county executive committee", "regional executive body", "local executive board"],
    "county election": ["county election", "local election", "regional election"],
    "county removal": ["county removal", "county office dismissal", "regional removal"],
    "vacancy": ["vacancy", "position vacancy", "office vacancy"],
    "functions of county executive committees": ["functions of county executive committees", "duties of county executive commitees", "county executive commitees roles"],
    "city": ["city", "urban center", "municipality"],
    "legislative authority": ["legislative authority", "law make power", "regulatory authority"],
    "function and power of national and county government": ["function and power of national and county government", "government duty", "government responsibility"],
    "transfer": ["transfer", "delegation", "reassignment"],
    "county boundary": ["county boundary", "county border", "regional limit"],
    "cooperation": ["cooperation", "collaboration", "coordination"],
    "support": ["support", "assistance", "aid"],
    "conflict of law": ["conflict of law", "legal discrepancy", "jurisdiction conflict"],
    "suspension": ["suspension", "interruption", "temporary halt"],
    "qualification": ["qualification", "eligibility", "requirement"],
    "vacation": ["vacation", "office leave", "office absence"],
    "summon witness": ["summon witness", "convene witness", "request witness attendance"],
    "public participation county assembly power privilege immunity": ["public participation county assembly power privilege immunity", "county assembly right", "local governance privilege"],
    "gender balance": ["gender balance", "gender equity", "equal representation"],
    "county transition": ["county transition", "regional handover", "local government transition"],
    "publication": ["publication", "official release", "public announcement"],
    "legislation": ["legislation", "regulation", "law"],
    "principle public finance": ["principle public finance", "public finance guideline", "fundamental financial rule"],
    "equitable sharing of national revenue": ["equitable share of national revenue", "fair distribution of revenue"],
    "equitable share law": ["equitable share law", "financial law", "equitable distribution law"],
    "equalisation fund": ["equalisation fund", "equalization", "equal opportunity fund"],
    "consultation": ["consultation", "discussion", "advisement"],
    "consolidated fund": ["consolidated fund", "main fund", "central fund"],
    "revenue funds": ["revenue funds", "revenue", "income fund"],
    "contingencies fund": ["contingency fund", "reserve fund", "emergency fund"],
    "tax impose": ["tax impose", "tax authority", "tax power"],
    "national government borrow": ["national government borrow", "national debt", "federal borrow"],
    "loan guarantee": ["loan guarantee", "credit guarantee", "loan assurance"],
    "public debt": ["public debt", "national debt", "government debt"],
    "commission on revenue allocation": ["commission on revenue allocation", "allocation commission", "revenue allocation committee"],
    "division of revenue": ["division of revenue", "revenue distribution", "allocation of revenue"],
    "budget content": ["budget content", "budget structure", "budget framework"],
    "expenditure": ["expenditure", "cost", "expense"],
    "supplementary appropriation": ["supplementary appropriation", "additional appropriation", "extra allocation"],
    "financial control": ["financial control", "financial oversight", "budget control"],
    "procurement": ["procurement", "acquisition", "purchase"],
    "budget contoller": ["budget contoller", "cost controller"],
    "auditor-general": ["auditor general", "chief auditor", "head auditor"],
    "salary": ["salary","remuneration", "pay"],
    "central bank of kenya": ["central bank of kenya", "central bank", "central financial authority"]

}

# Define qa mapping explicitly too
qa_mapping = {
    "supremacy": "supremacy",
    "sovereignty": "sovereignty",
    "defence": "defence",
    "declaration": "declaration",
    "territory": "territory", 
    "devolution": "devolution",
    "language": "languages",
    "religion": "religion",
    "symbol": "symbol",
    "day": "day",
    "value": "value",
    "culture": "culture",
    "entitlement": "entitlement",
    "retention": "retention",  
    "birth": "birth",
    "registration": "registration",
    "dual": "dual",
    "revocation": "revocation",
    "legislation": "legislation",
    "fundamental": "fundamental",
    "application": "application",
    "implementation": "implementation",
    "authority": "authority",
    "limitation": "limitation",
    "limit": "limit",
    "life": "life",
    "equality": "equality",
    "dignity": "dignity",
    "security": "security",
    "slavery": "slavery",
    "privacy": "privacy",
    "conscience": "conscience",
    "expression": "expression",
    "medium": "medium",
    "information": "information",
    "association": "association",
    "assembly": "assembly",
    "political": "political",
    "movement": "movement",
    "property": "property",
    "work": "work",
    "environment": "environment",
    "economic": "economic",
    "language": "language",
    "family": "family",
    "consumer": "consumer",
    "administrative": "administrative",
    "justice": "justice",
    "arrest": "arrest",
    "fair hearing": "fair hearing",
    "custody": "custody",
    "interpret": "interpret",
    "infant": "infant",
    "disable": "disable",
    "youth": "youth",
    "minority": "minority",
    "old": "old",
    "emergency": "emergency",
    "national": "national",
    "principle": "principle",
    "classification": "classification",
    "public": "public",
    "community": "community",
    "private": "private",
    "landhold": "landhold",
    "regulation": "regulation",
    "commission": "commission",
    "land": "land",
    "obligation": "obligation",
    "environmental": "environmental",
    "resource": "resource",
    "legislation": "legislation",
    "responsibility": "responsibility",
    "oath": "oath",
    "conduct": "conduct",
    "finance": "finance",
    "restrict": "restrict",
    "citizen": "citizen",
    "establish": "establish",
    "leader": "leader",
    "judicial authority": "judicial authority",
    "judicial independence" :"judiciary independence",
    "judicial offices": "judicial offices",
    "court systems": "court systems",
    "supreme court": "supreme court",
    "appeal court": "appeal court",
    "high court": "high court",
    "judicial appointments": "judicial appointments",
    "office tenure": "office tenure",
    "office removal": "office removal",
    "subordinate courts": "subordinate courts",
    "jsc establishment": "jsc establishment",
    "jsc functions": "jsc functions",
    "judiciary fund": "judiciary fund",
    "object devolution": "object devolution",
    "devolve government": "devolve government",
    "county government": "county government",
    "membership": "membership",
    "speaker": "speaker",
    "county executive committee": "county executive committee",
    "county election": "county election",
    "county removal": "county removal",
    "vacancy": "vacancy",
    "functions of county executive committees": "executive function",
    "city": "city",
    "legislative authority": "legislative authority",
    "function and power of national and county government": "function and power of national and county government",
    "transfer": "transfer",
    "county boundary": "county boundary",
    "cooperation": "cooperation",
    "support": "support",
    "conflict of law": "law conflict",
    "suspension": "suspension county government",
    "qualification": "qualification",
    "vacation": "vacation",
    "summon witness": "summon witness",
    "public participation county assembly power privilege immunity": "public participation county assembly power privilege immunity",
    "gender balance": "gender balance",
    "county transition": "transition",
    "publication": "publication",
    "legislation": "legislation",
    "principle public finance": "principle public finance",
    "equitable sharing of national revenue": "equitable sharing of national revenue",
    "equitable share law": "equitable share",
    "equalisation fund": "equalisation fund",
    "consultation": "consultation",
    "consolidated fund": "consolidate",
    "revenue funds": "revenue",
    "contingencies fund": "contingency",
    "tax impose": "tax",
    "national government borrow": "borrow",
    "loan guarantee": "loan",
    "public debt": "debt",
    "commission on revenue allocation": "commission",
    "division of revenue": "division",
    "budget content": "budget",
    "expenditure": "expenditure",
    "supplementary appropriation": "supplementary",
    "financial control": "control",
    "procurement": "procurement",
    "budget controller": "budget controller",
    "auditor-general": "auditor",
    "salary": "salary",
    "central bank of kenya": "bank"

    
}

# Merge the per-chapter dictionaries; if two chapters share a key,
# the later chapter's text replaces the earlier one
combined_sections = {
    **chapter_1_sections, **chapter_2_sections, **chapter_3_sections,
    **chapter_4_sections, **chapter_5_sections, **chapter_6_sections,
    **chapter_10_sections, **chapter_11_sections, **chapter_12_sections,
}
# Count the keys in the combined dictionary
len(combined_sections)

153
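
Merging with `{**a, **b}` keeps only the last value for a repeated key, so if two chapter dictionaries share a key (the mappings above reuse keys such as "legislation"), the earlier chapter's text is silently dropped from `combined_sections`. A small check can make such collisions visible before relying on the merged dictionary. The following is a sketch only: `merge_sections` and the `demo` data are hypothetical, and the section texts are placeholders.

```python
def merge_sections(named_chapters):
    """Merge chapter dictionaries and report keys defined by more than one chapter."""
    merged = {}
    origin = {}      # key -> name of the chapter that first defined it
    collisions = {}  # key -> names of all chapters that define it
    for name, chapter in named_chapters.items():
        for key, text in chapter.items():
            if key in merged:
                collisions.setdefault(key, [origin[key]]).append(name)
            else:
                origin[key] = name
            merged[key] = text  # same "last one wins" rule as {**a, **b}
    return merged, collisions


# Tiny hypothetical chapters with placeholder text
demo = {
    "chapter_3": {"legislation": "Legislation on citizenship.", "birth": "..."},
    "chapter_5": {"legislation": "Legislation on land.", "land": "..."},
}
merged, collisions = merge_sections(demo)
print(len(merged))
print(collisions)
```

One way to avoid the collisions altogether would be to prefix each key with its chapter, at the cost of updating `qa_mapping` to match.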

# CHAPTER 13

In [69]:
chapter_13 = extract_specific_pages(pdf_path, 138,143)
chapter_13_trimmed = chapter_13.split("CHAPTER FOURTEEN")[0].strip()


In [70]:
# Load spaCy
nlp = spacy.load("en_core_web_sm")

# Preprocess the user query using spaCy
def preprocess_query(query):
    # Parse the query with spaCy
    doc = nlp(query)
    # Normalize the query: lowercase, lemmatize, and remove stopwords and punctuation
    tokens = [token.lemma_.lower() for token in doc if not token.is_stop and not token.is_punct]
    return " ".join(tokens)

# Example
user_query = "What are Values and principles of public service?"
processed_query = preprocess_query(user_query)
print(processed_query)


value principle public service


In [71]:
def split_chapter(chapter_text):
    # Split the chapter at its article headings, stripping extra whitespace

    sections = {
        "principle": [],
        "public service commission": [],
        "functions": [],
        "staffing": [],
        "protection": [],
        "teacher service commission": []
    }

    # Split by new lines to process line by line
    lines = chapter_text.splitlines()
    current_section = None

    for line in lines:
        line = line.strip()
        # A line that starts with a known heading opens a new section
        if line.startswith("Values and principles of public service"):
            current_section = "principle"
        elif line.startswith("The Public Service Commission"):
            current_section = "public service commission"
        elif line.startswith("Functions and powers of the Public Service Commission"):
            current_section = "functions"
        elif line.startswith("Staffing of county governments"):
            current_section = "staffing"
        elif line.startswith("Protection of public officers"):
            current_section = "protection"
        elif line.startswith("Teachers Service Commission"):
            current_section = "teacher service commission"

        # Append line to the current section if it's set
        if current_section:
            sections[current_section].append(line)

    # Join each section into a single string
    for key in sections:
        sections[key] = "\n".join(sections[key])
    return sections

chapter_13_sections = split_chapter(chapter_13_trimmed)
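
`split_chapter` is redefined for every chapter with a new `if`/`elif` chain, although only the headings and keys change. One possible refactoring is a single splitter that takes the heading-to-key table as data. The sketch below assumes that design; `split_by_headings`, `demo_headings` and `demo_text` are hypothetical, and the article text is a placeholder.

```python
def split_by_headings(chapter_text, headings):
    """Split text into sections; `headings` maps a heading prefix to a section key."""
    sections = {key: [] for key in headings.values()}
    # Check longer prefixes first so a specific heading is not shadowed by a shorter one
    ordered = sorted(headings.items(), key=lambda item: len(item[0]), reverse=True)
    current = None
    for raw_line in chapter_text.splitlines():
        line = raw_line.strip()
        for prefix, key in ordered:
            if line.startswith(prefix):
                current = key
                break
        # Lines before the first heading are skipped
        if current is not None:
            sections[current].append(line)
    return {key: "\n".join(lines) for key, lines in sections.items()}


# Hypothetical heading table and placeholder text
demo_headings = {
    "Staffing of county governments": "staffing",
    "Protection of public officers": "protection",
}
demo_text = """CHAPTER THIRTEEN
Staffing of county governments.
235. (1) placeholder text
Protection of public officers.
236. placeholder text"""
demo_sections = split_by_headings(demo_text, demo_headings)
print(demo_sections["staffing"])
```

With this shape, adding a chapter would only require a new heading table rather than a new function.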

## Question answering mechanism

In [72]:
# Load spaCy model
nlp = spacy.load("en_core_web_sm")

# Preprocess the user query using spaCy
def preprocess_query(query):
    # Parse the query with spaCy
    doc = nlp(query)
    # Normalize the query: lowercase, lemmatize, and remove stopwords
    tokens = [token.lemma_.lower() for token in doc if not token.is_stop and not token.is_punct]
    return " ".join(tokens)

# Define the sections variable with Chapter 13
sections = chapter_13_sections

# Define the QA mapping based on key phrases and corresponding sections
qa_mapping = {
    "principle": "Values and principles of public service",
    "public service commission": "The Public Service Commission",
    "functions": "Functions and powers of the Public Service Commission",
    "staffing" : "Staffing of county governments",
    "protection": "Protection of public officers",
    "teacher service commission": "Teachers Service Commision"

}

# Update the Q&A system to use preprocessed queries
def answer_question_nlp(query, sections, qa_mapping):
    # Preprocess the user query
    processed_query = preprocess_query(query)
    
    # Search for a key in qa_mapping that matches the preprocessed query
    for key in qa_mapping:
        if key in processed_query:
            # Return the relevant section in the text
            return sections[key]
            
    return "Sorry, I couldn't find an answer to your question."

# Example usage
user_query = "What about teacher service commission?"
answer = answer_question_nlp(user_query, sections, qa_mapping)
print(answer)


Teachers Service Commission.
237. (1) There is established the Teachers Service Commission.
(2) The functions of the Commission are—
(a) to register trained teachers;
(b) to recruit and employ registered teachers;
Constitution of Kenya, 2010 143
(c) to assign teachers employed by the Commission for service in
any public school or institution;
(d) to promote and transfer teachers;
(e) to exercise disciplinary control over teachers; and
(f) to terminate the employment of teachers.
(3) The Commission shall—
(a) review the standards of education and training of persons
entering the teaching service;
(b) review the demand for and the supply of teachers; and
(c) advise the national government on matters relating to the
teaching profession.
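
The printed section above still carries the PDF's running page header ("Constitution of Kenya, 2010 143") in the middle of clause (2). A minimal sketch of one way to drop such lines before splitting a chapter is shown below. The pattern and the helper name `strip_page_headers` are assumptions for illustration, based only on the two header shapes that appear in this notebook's extracted text.

```python
import re

# Assumed header shapes, taken from the extracted text in this notebook:
#   "Constitution of Kenya, 2010 143"  and  "150 Constitution of Kenya, 2010"
PAGE_HEADER = re.compile(r"^\s*(?:\d+\s+)?Constitution of Kenya, 2010(?:\s+\d+)?\s*$")

def strip_page_headers(text: str) -> str:
    """Drop lines that consist only of the running page header."""
    kept = [line for line in text.splitlines() if not PAGE_HEADER.match(line)]
    return "\n".join(kept)

sample = (
    "(b) to recruit and employ registered teachers;\n"
    "Constitution of Kenya, 2010 143\n"
    "(c) to assign teachers employed by the Commission for service in"
)
print(strip_page_headers(sample))
```

Because the pattern is anchored to the whole line, a sentence that merely mentions the Constitution is left untouched.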


### Chapter 13 NLU

# CHAPTER 14

In [73]:
chapter_14 = extract_specific_pages(pdf_path, 142,150)
# Trim the chapter_14 text to start from "CHAPTER FOURTEEN"
chapter_14_trimmed = chapter_14[chapter_14.find("CHAPTER FOURTEEN"):].strip()

## Further trim to remove Chapter 15 if it's present on the same page
if "CHAPTER FIFTEEN" in chapter_14_trimmed:
  chapter_14_trimmed = chapter_14_trimmed.split("CHAPTER FIFTEEN")[0].strip()
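
The same trim-by-heading steps recur for each chapter, so they could be wrapped in one helper. The sketch below is an assumption-laden illustration (the name `trim_chapter` is hypothetical and the sample text is abbreviated). It also guards the case where the start heading is missing: `str.find` returns -1 there, and slicing with `text[-1:]` would silently keep only the last character.

```python
def trim_chapter(text: str, start_heading: str, end_heading: str) -> str:
    """Return the text from start_heading up to, but not including, end_heading."""
    start = text.find(start_heading)
    if start == -1:
        # Fail loudly instead of slicing from index -1
        raise ValueError(f"{start_heading!r} not found in the extracted pages")
    trimmed = text[start:]
    # Look for the next chapter heading only after the start heading itself
    end = trimmed.find(end_heading, len(start_heading))
    if end != -1:
        trimmed = trimmed[:end]
    return trimmed.strip()

# Abbreviated stand-in for the text returned by extract_specific_pages
pages = (
    "tail of chapter thirteen\n"
    "CHAPTER FOURTEEN\n"
    "Principles of national security.\n"
    "CHAPTER FIFTEEN\n"
    "Application of Chapter."
)
print(trim_chapter(pages, "CHAPTER FOURTEEN", "CHAPTER FIFTEEN"))
```

With this helper, each chapter cell would reduce to one call with that chapter's two headings.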

In [74]:
## load spacy
nlp = spacy.load("en_core_web_sm")

## preprocess the user query using spacy

def preprocess_query(query):
  ## Parse the query with spacy
  doc = nlp(query)
  ## Normalize the query: lowercase, lemmatize and remove stopwords
  tokens = [token.lemma_.lower() for token in doc if not token.is_stop and not token.is_punct]
  return " ".join(tokens)

## Example
user_query = "What are the principles of national security?"
## Store the result under a new name so the preprocess_query function is not overwritten
processed_query = preprocess_query(user_query)
print(processed_query)


principle national security


In [75]:
def split_chapter(chapter_text):
    # Define sections with lowercase keys
    sections = {
        "principle national security": [],
        "national security organ": [],
        "national security council": [],
        "defence force": [],
        "national intelligence service": [],
        "national police service": [],
        "function national police service": [],
        "command national police service": [],
        "national police service commission": [],
        "police service": []
    }

    # Split by new lines to process line by line
    lines = chapter_text.splitlines()


    current_section = None

    for line in lines:
        stripped_line = line.strip()

        # Match line starts with capitalized section headers
        if stripped_line.startswith("Principles of national security"):
            current_section = "principle national security"
        elif stripped_line.startswith("National security organs"):
            current_section = "national security organ"
        elif stripped_line.startswith("Establishment of the National Security Council"):
            current_section = "national security council"
        elif stripped_line.startswith("Establishment of Defence Forces and Defence Council"):
            current_section = "defence force"
        elif stripped_line.startswith("Establishment of National Intelligence Service"):
            current_section = "national intelligence service"
        elif stripped_line.startswith("Establishment of the National Police Service"):
            current_section = "national police service"
        elif stripped_line.startswith("Objects and functions of the National Police Service"):
            current_section = "function national police service"
        elif stripped_line.startswith("Command of the National Police Service"):
            current_section = "command national police service"
        elif stripped_line.startswith("National Police Service Commission"):
            current_section = "national police service commission"
        elif stripped_line.startswith("Other police services"):
            current_section = "police service"

        # Append line to the current section if it's set
        if current_section:
            sections[current_section].append(line)

    # Join each section into a single string
    for key in sections:
        sections[key] = "\n".join(sections[key])

    return sections

# Apply the function to the chapter 14
chapter_14_sections = split_chapter(chapter_14_trimmed)
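
The `if`/`elif` chain in `split_chapter` pairs each heading prefix with a section key, so the same logic can be driven by a table of pairs. The following is a sketch of that idea, not the notebook's own implementation. The names `CHAPTER_14_HEADINGS` and `split_by_headings` are hypothetical, the pairs are copied from the chain above, and the sample text is abbreviated.

```python
# (heading prefix, section key) pairs copied from the startswith chain in split_chapter
CHAPTER_14_HEADINGS = [
    ("Principles of national security", "principle national security"),
    ("National security organs", "national security organ"),
    ("Establishment of the National Security Council", "national security council"),
    ("Establishment of Defence Forces and Defence Council", "defence force"),
    ("Establishment of National Intelligence Service", "national intelligence service"),
    ("Establishment of the National Police Service", "national police service"),
    ("Objects and functions of the National Police Service", "function national police service"),
    ("Command of the National Police Service", "command national police service"),
    ("National Police Service Commission", "national police service commission"),
    ("Other police services", "police service"),
]

def split_by_headings(chapter_text, headings):
    """Group the lines of chapter_text under the most recent matching heading."""
    sections = {key: [] for _, key in headings}
    current_section = None
    for line in chapter_text.splitlines():
        stripped_line = line.strip()
        # The first matching prefix wins, as in the original elif chain
        for prefix, key in headings:
            if stripped_line.startswith(prefix):
                current_section = key
                break
        if current_section:
            sections[current_section].append(line)
    return {key: "\n".join(lines) for key, lines in sections.items()}

sample_text = (
    "Principles of national security.\n"
    "238. (1) National security is the protection against internal and\n"
    "National security organs.\n"
    "239. (1) ..."
)
sample_sections = split_by_headings(sample_text, CHAPTER_14_HEADINGS)
print(sample_sections["national security organ"])
```

Adding a chapter then means supplying a new list of pairs rather than writing another function.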


In [76]:
chapter_14_sections

{'principle national security': 'Principles of national security.\n238. (1) National security is the protection against internal and\nexternal threats to Kenya’s territorial integrity and sovereignty, its\npeople, their rights, freedoms, property, peace, stability and prosperity,\nand other national interests.\n(2) The national security of Kenya shall be promoted and\nguaranteed in accordance with the following principles—\n(a) national security is subject to the authority of this Constitution\nand Parliament;\n(b) national security shall be pursued in compliance with the law\nand with the utmost respect for the rule of law, democracy,\nhuman rights and fundamental freedoms;\n(c) in performing their functions and exercising their powers,\nnational security organs shall respect the diverse culture of\nthe communities within Kenya; and\n(d) recruitment by the national security organs shall reflect the\ndiversity of the Kenyan people in equitable proportions.',
 'national security organ':

## Question answering mechanism

In [77]:
qa_mapping = {
    "principle national security": "principle national security",
    "national security organ": "national security organ",
    "national security council": "national security council",
    "defence force": "defence force",
    "national intelligence service": "national intelligence service",
    "national police service": "national police service",
    "function national police service": "function national police service",
    "command national police service": "command national police service",
    "national police service commission": "national police service commission",
    "police service": "police service"
}

# Preprocess the user query using spaCy
def preprocess_query(query):
    # Parse the query with spaCy
    doc = nlp(query)
    # Normalize the query: lowercase, lemmatize, and remove stopwords
    tokens = [token.lemma_.lower() for token in doc if not token.is_stop and not token.is_punct]
    return " ".join(tokens)

# Update the Q&A system to use preprocessed queries
def answer_question_nlp(query, sections, qa_mapping):
    # Preprocess the user query
    processed_query = preprocess_query(query)

    # Debug
    print(f"Processed query: {processed_query}")

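    # Search for a key in qa_mapping that matches the preprocessed query.
    # Longer keys are tried first so that "function national police service"
    # is not shadowed by the shorter "national police service".
    for key in sorted(qa_mapping, key=len, reverse=True):
        if key in processed_query:
            # Debug line
            print(f"key: {key}")
            # Return the section that the matched key maps to
            return sections[qa_mapping[key]]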

    return "Sorry, I couldn't find an answer to your question."

##Example
user_query = "What are the principles of national security?"
answer = answer_question_nlp(user_query,chapter_14_sections, qa_mapping) # Pass the user_query
answer

Processed query: principle national security
key: principle national security


'Principles of national security.\n238. (1) National security is the protection against internal and\nexternal threats to Kenya’s territorial integrity and sovereignty, its\npeople, their rights, freedoms, property, peace, stability and prosperity,\nand other national interests.\n(2) The national security of Kenya shall be promoted and\nguaranteed in accordance with the following principles—\n(a) national security is subject to the authority of this Constitution\nand Parliament;\n(b) national security shall be pursued in compliance with the law\nand with the utmost respect for the rule of law, democracy,\nhuman rights and fundamental freedoms;\n(c) in performing their functions and exercising their powers,\nnational security organs shall respect the diverse culture of\nthe communities within Kenya; and\n(d) recruitment by the national security organs shall reflect the\ndiversity of the Kenyan people in equitable proportions.'

# CHAPTER 15

In [78]:
chapter_15 = extract_specific_pages(pdf_path, 148,155)

In [79]:
# Trim the chapter_15 text to start from "CHAPTER FIFTEEN"
chapter_15_trimmed = chapter_15[chapter_15.find("CHAPTER FIFTEEN"):].strip()

# Further trim to remove Chapter 16 if it's present on the same page
if "CHAPTER SIXTEEN" in chapter_15_trimmed:
  chapter_15_trimmed = chapter_15_trimmed.split("CHAPTER SIXTEEN")[0].strip()

chapter_15_trimmed

'CHAPTER FIFTEEN—COMMISSIONS AND INDEPENDENT\nOFFICES\nApplication of Chapter.\n248. (1) This Chapter applies to the commissions specified in\nclause (2) and the independent offices specified in clause (3), except\nto the extent that this Constitution provides otherwise.\n(2) The commissions are—\n150 Constitution of Kenya, 2010\n(a) the Kenya National Human Rights and Equality Commission;\n(b) the National Land Commission;\n(c) the Independent Electoral and Boundaries Commission;\n(d) the Parliamentary Service Commission;\n(e) the Judicial Service Commission;\n(f) the Commission on Revenue Allocation;\n(g) the Public Service Commission;\n(h) the Salaries and Remuneration Commission;\n(i) the Teachers Service Commission; and\n(j) the National Police Service Commission.\n(3) The independent offices are—\n(a) the Auditor-General; and\n(b) the Controller of Budget.\nObjects, authority and funding of commissions and independent\noffices.\n249. (1) The objects of the commissions and the ind

In [80]:
## preprocess the user query using spacy

def preprocess_query(query):
  ## Parse the query with spacy
  doc = nlp(query)
  ## Normalize the query: lowercase, lemmatize and remove stopwords
  tokens = [token.lemma_.lower() for token in doc if not token.is_stop and not token.is_punct]
  return " ".join(tokens)

## Example
user_query = "What are the commissions and independent offices?"
## Store the result under a new name so the preprocess_query function is not overwritten
processed_query = preprocess_query(user_query)
print(processed_query)

commission independent office


In [81]:
def split_chapter_15(chapter_text):
  # Split the chapter at its key headings, stripping extra whitespace from each line
  # Initialize sections as a dictionary with list values
  sections = {
        "application chapter": [],
        "commission independent office": [],
        "term office": [],
        "removal office": [],
        "general function power": [],
        "incorporation commission independent office": [],
        "report commission independent office": []
    }

  # Split by new lines to process line by line
  lines = chapter_text.splitlines()
  current_section = None

  for line in lines:
    stripped_line = line.strip()

    # Match headings on the stripped line so leading whitespace cannot hide them
    if stripped_line.startswith("Application of Chapter"):
      current_section = "application chapter"
    elif stripped_line.startswith("Objects, authority and funding of commissions and independent"):
      current_section = "commission independent office"
    elif stripped_line.startswith("Composition, appointment and terms of office"):
      current_section = "term office"
    elif stripped_line.startswith("Removal from office"):
      current_section = "removal office"
    elif stripped_line.startswith("General functions and powers"):
      current_section = "general function power"
    elif stripped_line.startswith("Incorporation of commissions and independent offices"):
      current_section = "incorporation commission independent office"
    elif stripped_line.startswith("Reporting by commissions and independent offices"):
      current_section = "report commission independent office"

    # Append line to the current section if it's set
    if current_section:
      sections[current_section].append(stripped_line)

  # Join each section into a single string
  for key in sections:
    sections[key] = "\n".join(sections[key])
  return sections

chapter_15_sections = split_chapter_15(chapter_15_trimmed)
chapter_15_sections

{'application chapter': 'Application of Chapter.\n248. (1) This Chapter applies to the commissions specified in\nclause (2) and the independent offices specified in clause (3), except\nto the extent that this Constitution provides otherwise.\n(2) The commissions are—\n150 Constitution of Kenya, 2010\n(a) the Kenya National Human Rights and Equality Commission;\n(b) the National Land Commission;\n(c) the Independent Electoral and Boundaries Commission;\n(d) the Parliamentary Service Commission;\n(e) the Judicial Service Commission;\n(f) the Commission on Revenue Allocation;\n(g) the Public Service Commission;\n(h) the Salaries and Remuneration Commission;\n(i) the Teachers Service Commission; and\n(j) the National Police Service Commission.\n(3) The independent offices are—\n(a) the Auditor-General; and\n(b) the Controller of Budget.',
 'commission independent office': 'Objects, authority and funding of commissions and independent\noffices.\n249. (1) The objects of the commissions and t

In [82]:
# Define sections with lemmatized keys
sections = chapter_15_sections

# Map lemmatized query phrases to the keys of chapter_15_sections
qa_mapping = {
    "application chapter": "application chapter",
    "object commission independent office": "commission independent office",
    "composition term office": "term office",
    "removal office commission": "removal office",
    "general function power commission": "general function power",
    "incorporation commission independent office": "incorporation commission independent office",
    "report commission independent office": "report commission independent office"
}


def preprocess_query(query):
    ## Parse the query with spacy
    doc = nlp(query)
    ## Normalize the query: lowercase, lemmatize and remove stopwords
    tokens = [token.lemma_.lower() for token in doc if not token.is_stop and not token.is_punct]
    return " ".join(tokens)


# Update the Q&A system to use preprocessed queries
def answer_question_nlp(query, sections, qa_mapping):
    # Preprocess the user query
    processed_query = preprocess_query(query)

    # Debug
    print(f"Processed query: {processed_query}")

    # Search for a phrase in qa_mapping that appears in the preprocessed query
    for key in qa_mapping:
        if key in processed_query:
            # Debug line
            print(f"key: {key}")
            # Return the section that the matched phrase maps to
            return sections[qa_mapping[key]]
    return "Sorry, I couldn't find an answer to your question."

# Example usage
user_query = "What about the commission independent office?"
answer = answer_question_nlp(user_query, sections, qa_mapping)
print(answer)

Processed query: commission independent office
Sorry, I couldn't find an answer to your question.
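
The fallback message above follows from how the keys line up. The lookup tests whether a `qa_mapping` key occurs inside the processed query, and no key of this chapter's `qa_mapping` occurs inside "commission independent office". Several mapping keys also differ from the keys produced by `split_chapter_15`. A small check along these lines can surface such mismatches before querying. It is only a sketch: the two key collections are copied by hand from the cells above, and the helper names are made up for illustration.

```python
# Keys of chapter_15_sections, copied from split_chapter_15
section_keys = {
    "application chapter",
    "commission independent office",
    "term office",
    "removal office",
    "general function power",
    "incorporation commission independent office",
    "report commission independent office",
}

# Keys of the Chapter 15 qa_mapping, copied from the cell above
mapping_keys = [
    "application chapter",
    "object commission independent office",
    "composition term office",
    "removal office commission",
    "general function power commission",
    "incorporation commission independent office",
    "report commission independent office",
]

def keys_missing_from_sections(mapping_keys, section_keys):
    """Mapping keys that have no section stored under the same name."""
    return [key for key in mapping_keys if key not in section_keys]

def keys_found_in_query(mapping_keys, processed_query):
    """Mapping keys that occur as a substring of the processed query."""
    return [key for key in mapping_keys if key in processed_query]

print(keys_missing_from_sections(mapping_keys, section_keys))
print(keys_found_in_query(mapping_keys, "commission independent office"))  # -> []
```

The first call lists four mapping keys that do not name a section, and the second returns an empty list, which is why the query falls through to the fallback message.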


# Cumulative sections, qa mapping and synonyms

In [83]:
# Define the synonyms explicitly so that we can easily reference them later
synonyms = {
    "supremacy": ["supremacy", "authority", "ultimate power"],
    "sovereignty": ["sovereignty", "power of the people", "authority of the people", "self rule", "autonomy"],
    "defence": ["defence", "defense", "protection", "preservation"],
    "declaration": ["declaration", "proclamation", "statement", "announcement", "affirmation"],
    "territory": ["territory", "land", "region", "area", "jurisdiction", "bounds"],
    "devolution": ["devolution", "decentralization", "delegation", "transfer of power", "local governance", "subsidiarity"],
    "languages": ["languages", "tongues", "dialects", "official languages", "linguistic diversity"],
    "religion": ["religion", "faith", "belief systems", "spiritual practice", "secularism", "church-state separation"],
    "symbol": ["symbol", "emblem", "insignia", "representation", "national icon"],
    "day": ["day", "holiday", "observance", "public holiday", "commemoration", "remembrance"],
    "value": ["value", "principle", "ethic", "core value", "standard", "national ideal"],
    "governance": ["governance", "government", "administration", "management", "public service", "political structure"],
    "culture": ["culture", "heritage", "tradition", "customs", "societal norms", "arts"],
    "entitlement": ["entitlement", "right", "eligibility", "entitlement rights", "benefits", "privileges"],
    "retention": ["retention", "maintenance", "keeping", "preservation", "continuation"],
    "birth": ["birth", "nativity", "origin", "ancestry", "inborn citizenship"],
    "registration": ["registration", "enlistment", "enrollment", "citizenship application", "naturalization"],
    "dual": ["dual", "multiple", "dual nationality", "two-fold citizenship"],
    "revocation": ["revocation", "cancellation", "annulment", "rescission", "forfeiture", "withdrawal"],
    "legislation": ["legislation", "laws", "legal framework", "statutes", "enactment"],
    "right fundamental freedom": ["right fundamental freedom"],
    "application": ["application", "exercise", "freedom enforcement"],
    "implementation right fundamental freedom": ["implementation", "right support", "execution"],
    "authority": ["authority", "judicial power", "court jurisdiction", "judicial review"],
    "limitation": ["limitation", "freedom restriction", "limit entitlement"],
    "limit": ["limit","absolute right", "inalienable freedom", "immutable right"],
    "life": ["life", "existence", "entitlement to life", "survive"],
    "equality": ["equality", "equal treatment", "justice"],
    "dignity": ["dignity", "intrinsic respect", "inherent value", "personal honor"],
    "security": ["security", "liberty", "personal safety"],
    "slavery": ["slavery", "bondage", "force servitude", "involuntary labor", "coerce work"],
    "privacy": ["privacy", "confidential", "personal space"],
    "conscience": ["conscience", "liberty of thought", "religion freedom", "belief"],
    "expression": ["expression", "communication", "speak"],
    "medium": ["medium", "press autonomy", "journalist freedom", "right to information"],
    "information": ["information", "transparent", "datum"],
    "association": ["association", "union", "group formation", "association right"],
    "assembly": ["assembly", "dissent", "peace assembly", "demonstrate", "petition right"],
    "political": ["political", "electoral right", "political engagement", "vote right"],
    "movement": ["movement", "migration", "resident freedom", "movement liberty"],
    "property": ["property", "property safeguard", "assets security"],
    "work": ["work", "benefit", "employ"],
    "environment": ["environment", "conserve", "right to a healthy environment"],
    "economic": ["economic", "standard of life", "social welfare"],
    "language": ["language", "cultural right", "linguistic freedom", "cultural expression"],
    "family": ["family", "society foundation", "parent right"],
    "consumer": ["consumer", "client entitlement", "buyer safeguard"],
    "administrative": ["administrative", "fair administration rule", "legal administration measure"],
    "justice": ["justice", "legal access", "justice available"],
    "arrest": ["arrest", "capture", "seizure", "apprehension"],
    "fair hearing": ["fair hearing", "prosecute", "equity hearing"],
    "custody": ["custody", "confinement", "imprisonment", "prison"],
    "interpret": ["interpret", "explain", "clarify", "overview"],
    "infant": ["infant", "child", "kid", "toddler"],
    "disable": ["disable", "handicap", "impairment", "challenge"],
    "youth": ["youth", "young adult", "adolescent"],
    "minority": ["minority", "diversity right", "affirmative right"],
    "old": ["old", "elder", "vintage"],
    "emergency": ["emergency", "danger", "crisis", "disaster"],
    "national": ["national", "human right", "equality body", "right authority"],
    "principle": ["principle", "land management principle", "land policy guide"],
    "classification": ["classification", "land category", "types of land"],
    "public land": ["public land", "government land", "state property", "national land"],
    "community land": ["community", "ethnic land", "cultural landhold", "community land"],
    "private land": ["private land", "individual land", "personal property"],
    "landhold noncitizen" : ["landhold noncitizen", "foreign lease", "alien land", "non-citizen land"],
    "regulation" : ["regulation", "land use policy", "property regulation", "land oversight"],
    "commission" : ["commission", "land authority", "public land commission", "land policy agency"],
    "land" : ["land", "land legislation", "property law", "land-use regulation", "land act"],
    "obligation" : ["obligation", "biodiversity duty", "conservation mandate"],
    "environmental" : ["environmental", "ecological redress", "sustainability enforcement"],
    "resource" : ["resource", "resource use agreement", "environmental concession"],
    "legislation" : ["legislation", "environmental law", "green policy enactment"],
    "responsibility": ["responsibility", "accountable", "commitment", "charge"],
    "oath": ["oath", "swear", "state office affirmation", "vow", "declare"],
    "conduct": ["conduct", "behavior", "comportment", "demeanor"],
    "finance": ["finance", "finance integrity", "wealth", "invest", "finance management"],
    "restrict": ["restrict", "constrain", "regulate", "prohibit", "curtail", "control"],
    "citizen": ["citizen", "inhabitant", "local resident"],
    "establish": ["establish", "corrupt", "bribe", "fraud"],
    "leader": ["leader", "governing authority"],
    "judicial authority": ["judicial authority","legal authority","court jurisdiction"],
    "judicial independence": ["judicial independence","judicial autonomy","judicial impartiality"],
    "judicial offices": ["judicial office","legal position","judicial role"],
    "court systems": ["court system","judicial system","legal system"],
    "supreme court": ["supreme court","highest court","apex court"],
    "appeal court": ["appeal court","appellate court","review court"],
    "high court": ["high court","superior court"],
    "judicial appointments": ["judicial appointment","judge appointment","judicial selection"],
    "office tenure": ["office tenure","office term","service duration"],
    "office removal": ["office removal","office dismissal","office termination"],
    "subordinate courts": ["subordinate court","local court","inferior court"],
    "jsc establishment": ["judicial service commission establishment","judicial service commission creation","judicial service commission formation"],
    "jsc functions": ["judicial service commission function","judicial service commission role","judicial service commission responsibility"],
    "judiciary fund": ["judiciary fund","judicial fund","legal fund"],
    "object devolution": ["object devolution", "power decentralization", "government devolution"],
    "devolve government": ["devolve government", "decentralization principle"],
    "county government": ["county government", "local government", "regional administration"],
    "membership": ["membership", "composition", "structure"],
    "speaker": ["speaker", "assembly speaker", "preside officer"],
    "county executive committee": ["county executive committee", "regional executive body", "local executive board"],
    "county election": ["county election", "local election", "regional election"],
    "county removal": ["county removal", "county office dismissal", "regional removal"],
    "vacancy": ["vacancy", "position vacancy", "office vacancy"],
    "functions of county executive committees": ["functions of county executive committees", "duties of county executive committees", "county executive committees roles"],
    "city": ["city", "urban center", "municipality"],
    "legislative authority": ["legislative authority", "law make power", "regulatory authority"],
    "function and power of national and county government": ["function and power of national and county government", "government duty", "government responsibility"],
    "transfer": ["transfer", "delegation", "reassignment"],
    "county boundary": ["county boundary", "county border", "regional limit"],
    "cooperation": ["cooperation", "collaboration", "coordination"],
    "support": ["support", "assistance", "aid"],
    "conflict of law": ["conflict of law", "legal discrepancy", "jurisdiction conflict"],
    "suspension": ["suspension", "interruption", "temporary halt"],
    "qualification": ["qualification", "eligibility", "requirement"],
    "vacation": ["vacation", "office leave", "office absence"],
    "summon witness": ["summon witness", "convene witness", "request witness attendance"],
    "public participation county assembly power privilege immunity": ["public participation county assembly power privilege immunity", "county assembly right", "local governance privilege"],
    "gender balance": ["gender balance", "gender equity", "equal representation"],
    "county transition": ["county transition", "regional handover", "local government transition"],
    "publication": ["publication", "official release", "public announcement"],
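    # A repeated key keeps only its last value, so this entry also carries the synonyms from the earlier "legislation" entries
    "legislation": ["legislation", "laws", "legal framework", "statutes", "enactment", "environmental law", "green policy enactment", "regulation", "law"],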
    "principle public finance": ["principle public finance", "public finance guideline", "fundamental financial rule"],
    "equitable sharing of national revenue": ["equitable share of national revenue", "fair distribution of revenue"],
    "equitable share law": ["equitable share law", "financial law", "equitable distribution law"],
    "equalisation fund": ["equalisation fund", "equalization", "equal opportunity fund"],
    "consultation": ["consultation", "discussion", "advisement"],
    "consolidated fund": ["consolidated fund", "main fund", "central fund"],
    "revenue funds": ["revenue funds", "revenue", "income fund"],
    "contingencies fund": ["contingency fund", "reserve fund", "emergency fund"],
    "tax impose": ["tax impose", "tax authority", "tax power"],
    "national government borrow": ["national government borrow", "national debt", "federal borrow"],
    "loan guarantee": ["loan guarantee", "credit guarantee", "loan assurance"],
    "public debt": ["public debt", "national debt", "government debt"],
    "commission on revenue allocation": ["commission on revenue allocation", "allocation commission", "revenue allocation committee"],
    "division of revenue": ["division of revenue", "revenue distribution", "allocation of revenue"],
    "budget content": ["budget content", "budget structure", "budget framework"],
    "expenditure": ["expenditure", "cost", "expense"],
    "supplementary appropriation": ["supplementary appropriation", "additional appropriation", "extra allocation"],
    "financial control": ["financial control", "financial oversight", "budget control"],
    "procurement": ["procurement", "acquisition", "purchase"],
    "budget contoller": ["budget controller", "cost controller"],
    "auditor-general": ["auditor general", "chief auditor", "head auditor"],
    "salary": ["salary","remuneration", "pay"],
    "central bank of kenya": ["central bank of kenya", "central bank", "central financial authority"],
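    # A repeated key keeps only its last value, so this entry also carries the synonyms from the earlier "principle" entry
    "principle": ["principle", "land management principle", "land policy guide", "values and principles of public service", "public service values", "core principles of public service"],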
    "public service commission": ["public service commission", "the public service commission", "public service authority", "government service commission"],
    "functions": ["functions", "functions and powers of the public service commission", "roles of the public service commission", "duties of the public service commission"],
    "staffing": ["staffing", "staffing of county governments", "county government staffing", "human resources for county governments"],
    "protection": ["protection", "protection of public officers", "safeguarding public officers", "security for public officials"],
    "teachers service commission": ["teachers service commission", "commission for teachers", "educators service commission"],
    "national security": ["national security", "principles of national security"],
    "national security organs": ["national security organs", "security organs", "national security agencies"],
    "national security council": ["national security council", "establishment of the national security council", "national security council formation", "creation of the national security council"],
    "defence forces": ["defence forces and defence council", "defence forces establishment", "formation of defence forces and council"],
    "national intelligence service": ["national intelligence service", "establishment of national intelligence service", "national intelligence service creation", "formation of national intelligence service"],
    "national police service": ["national police service", "national police", "establishment national police service", "national police service formation", "creation of the national police service"],
    "functions national police service": ["functions national police service", "objects and functions of the national police service", "roles of the national police service", "duties and functions of the national police"],
    "command national police service": ["command national police service", "command of the national police service", "national police service command", "leadership of the national police service"],
    "national police service commission": ["national police service commission", "police service commission", "commission for the national police service"],
    "other police service": ["other police service", "additional police service", "alternative police service"],
    "application chapter": ["application of chapter", "chapter application", "provision chapter"],
    "commission independent office": ["commission independent office", "independent commissions", "commission offices"],
    "term office": ["term office", "office term"],
    "removal office": ["removal office", "dismissal office", "termination office"],
    "general function power": ["general function power", "overall powers", "general powers and functions"],
    "incorporation commission independent office": ["incorporation commission independent office", "incorporation commissions and independent offices", "establishment of commissions", "setting up independent office"],
    "reporting commission independent office": ["reporting commission independent office", "commissions reporting", "report independent office"]


}

# Define qa mapping explicitly too
qa_mapping = {
    "supremacy": "supremacy",
    "sovereignty": "sovereignty",
    "defence": "defence",
    "declaration": "declaration",
    "territory": "territory", 
    "devolution": "devolution",
    "language": "languages",
    "religion": "religion",
    "symbol": "symbol",
    "day": "day",
    "value": "value",
    "culture": "culture",
    "entitlement": "entitlement",
    "retention": "retention",  
    "birth": "birth",
    "registration": "registration",
    "dual": "dual",
    "revocation": "revocation",
    "legislation": "legislation",
    "right fundamental freedom": "right fundamental freedom",
    "application": "application",
    "implementation right fundamental freedom": "implementation right fundamental freedom",
    "authority": "authority",
    "limitation": "limitation",
    "limit": "limit",
    "life": "life",
    "equality": "equality",
    "dignity": "dignity",
    "security": "security",
    "slavery": "slavery",
    "privacy": "privacy",
    "conscience": "conscience",
    "expression": "expression",
    "medium": "medium",
    "information": "information",
    "association": "association",
    "assembly": "assembly",
    "political": "political",
    "movement": "movement",
    "property": "property",
    "work": "work",
    "environment": "environment",
    "economic": "economic",
    "language": "language",
    "family": "family",
    "consumer": "consumer",
    "administrative": "administrative",
    "justice": "justice",
    "arrest": "arrest",
    "fair hearing": "fair hearing",
    "custody": "custody",
    "interpret": "interpret",
    "infant": "infant",
    "disable": "disable",
    "youth": "youth",
    "minority": "minority",
    "old": "old",
    "emergency": "emergency",
    "national": "national",
    "principle": "principle",
    "classification": "classification",
    "public land": "public land",
    "community land": "community land",
    "private land": "private land",
    "landhold non citizen": "landhold non citizen",
    "regulation": "regulation",
    "commission": "commission",
    "land": "land",
    "obligation": "obligation",
    "environmental": "environmental",
    "resource": "resource",
    "legislation": "legislation",
    "responsibility": "responsibility",
    "oath": "oath",
    "conduct": "conduct",
    "finance": "finance",
    "restrict": "restrict",
    "citizen": "citizen",
    "establish": "establish",
    "leader": "leader",
    "judicial authority": "judicial authority",
    "judicial independence": "judiciary independence",
    "judicial offices": "judicial offices",
    "court systems": "court systems",
    "supreme court": "supreme court",
    "appeal court": "appeal court",
    "high court": "high court",
    "judicial appointments": "judicial appointments",
    "office tenure": "office tenure",
    "office removal": "office removal",
    "subordinate courts": "subordinate courts",
    "jsc establishment": "jsc establishment",
    "jsc functions": "jsc functions",
    "judiciary fund": "judiciary fund",
    "object devolution": "object devolution",
    "devolve government": "devolve government",
    "county government": "county government",
    "membership": "membership",
    "speaker": "speaker",
    "county executive committee": "county executive committee",
    "county election": "county election",
    "county removal": "county removal",
    "vacancy": "vacancy",
    "functions of county executive committees": "executive function",
    "city": "city",
    "legislative authority": "legislative authority",
    "function and power of national and county government": "function and power of national and county government",
    "transfer": "transfer",
    "county boundary": "county boundary",
    "cooperation": "cooperation",
    "support": "support",
    "conflict of law": "law conflict",
    "suspension": "suspension county government",
    "qualification": "qualification",
    "vacation": "vacation",
    "summon witness": "summon witness",
    "public participation county assembly power privilege immunity": "public participation county assembly power privilege immunity",
    "gender balance": "gender balance",
    "county transition": "transition",
    "publication": "publication",
    "principle public finance": "principle public finance",
    "equitable sharing of national revenue": "equitable sharing of national revenue",
    "equitable share law": "equitable share",
    "equalisation fund": "equalisation fund",
    "consultation": "consultation",
    "consolidated fund": "consolidate",
    "revenue funds": "revenue",
    "contingencies fund": "contingency",
    "tax impose": "tax",
    "national government borrow": "borrow",
    "loan guarantee": "loan",
    "public debt": "debt",
    "commission on revenue allocation": "commission",
    "division of revenue": "division",
    "budget content": "budget",
    "expenditure": "expenditure",
    "supplementary appropriation": "supplementary",
    "financial control": "control",
    "procurement": "procurement",
    "budget controller": "budget controller",
    "auditor-general": "auditor",
    "salary": "salary",
    "central bank of kenya": "bank",
    "public service commission": "public service commission",
    "functions": "functions",
    "staffing": "staffing",
    "protection": "protection",
    "teachers service commission": "teachers service commission",
    "national security": "national security",
    "national security organs": "national security organs",
    "national security council": "national security council",
    "defence forces": "defence forces",
    "national intelligence service": "national intelligence service",
    "national police service": "national police service",
    "functions national police service": "functions national police service",
    "command national police service": "command national police service",
    "national police service commission": "national police service commission",
    "other police services": "other police services",
    "application of chapter": "application of chapter",
    "commission independent office": "commission independent office",
    "term of office": "term of office",
    "removal from office": "removal from office",
    "general function power": "general function power",
    "incorporation commission independent office": "incorporation commission independent office",
    "reporting commission independent office": "reporting commission independent office"


}

combined_sections = {**chapter_1_sections, **chapter_2_sections, **chapter_3_sections, **chapter_4_sections,
                     **chapter_5_sections, **chapter_6_sections, **chapter_10_sections, **chapter_11_sections,
                     **chapter_12_sections, **chapter_13_sections, **chapter_14_sections, **chapter_15_sections}
# Count the keys in the combined dictionary
len(combined_sections.keys())

176
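
Merging with `{**a, **b}` keeps only the last value for a repeated key, so if two chapters share a key such as "legislation", one section is silently dropped from the combined dictionary. The sketch below shows one way to surface such collisions before merging. The helper name `merge_with_report` and the toy chapter dictionaries are hypothetical and are not part of the notebook.

```python
from collections import defaultdict


def merge_with_report(named_dicts):
    """Merge dictionaries left to right and report keys defined more than once.

    `named_dicts` maps a label (for example "chapter_5") to that chapter's
    sections. Later dictionaries win, exactly as with {**a, **b}.
    """
    merged = {}
    owners = defaultdict(list)  # key -> labels of every dictionary that defines it
    for label, sections in named_dicts.items():
        for key, text in sections.items():
            owners[key].append(label)
            merged[key] = text
    collisions = {key: labels for key, labels in owners.items() if len(labels) > 1}
    return merged, collisions


# Toy data standing in for the real chapter dictionaries (assumed contents)
toy_chapters = {
    "chapter_3": {"legislation": "citizenship legislation text", "birth": "birth text"},
    "chapter_5": {"legislation": "land legislation text", "public land": "public land text"},
}
merged, collisions = merge_with_report(toy_chapters)
print(len(merged))   # 3
print(collisions)    # {'legislation': ['chapter_3', 'chapter_5']}
```

If the report is non-empty, renaming the colliding keys (for example "legislation citizen" and "legislation land") keeps both sections reachable.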

# CHAPTER 16

In [84]:
chapter_16 = extract_specific_pages(pdf_path, 153, 157)
# Trim the text so that it starts at the Chapter 16 heading
chapter_16_trimmed = chapter_16[chapter_16.find('CHAPTER SIXTEEN'):].strip()

# Drop Chapter 17 if it begins on one of the extracted pages
if "CHAPTER SEVENTEEN" in chapter_16_trimmed:
    chapter_16_trimmed = chapter_16_trimmed.split("CHAPTER SEVENTEEN")[0].strip()


In [85]:
# Load spaCy model
nlp = spacy.load("en_core_web_sm")

# Preprocess the user query using spaCy
def preprocess_query(query):
    # Parse the query with spaCy
    doc = nlp(query)
    # Normalize the query: lowercase, lemmatize, and remove stopwords and punctuation
    tokens = [token.lemma_.lower() for token in doc if not token.is_stop and not token.is_punct]
    return " ".join(tokens)

# Example: store the result under a new name so the function is not overwritten
user_query = "What is the amendment of this constitution?"
processed_query = preprocess_query(user_query)
print(processed_query)


amendment constitution


In [86]:
def split_chapter(chapter_text):
    # Collect the lines of each article under a key that matches the lemmatized query form
    sections = {
        "amendment constitution": [],
        "parliamentary initiative": [],
        "popular initiative": []
    }

    # Process the chapter line by line
    lines = chapter_text.splitlines()
    current_section = None

    for line in lines:
        line = line.strip()
        # A heading line switches the active section and is not stored itself
        if line.startswith("Amendment of this Constitution"):
            current_section = "amendment constitution"
        elif line.startswith("Amendment by parliamentary initiative"):
            current_section = "parliamentary initiative"
        elif line.startswith("Amendment by popular initiative"):
            current_section = "popular initiative"
        elif current_section:
            # Any other line belongs to the section opened by the last heading
            sections[current_section].append(line)

    # Join each section's lines back into a single string
    for key, value in sections.items():
        sections[key] = "\n".join(value)
    return sections

chapter_16_sections = split_chapter(chapter_16_trimmed)
chapter_16_sections

{'amendment constitution': '255. (1) A proposed amendment to this Constitution shall be\nenacted in accordance with Article 256 or 257, and approved in\naccordance with clause (2) by a referendum, if the amendment relates\nto any of the following matters—\n(a) the supremacy of this Constitution;\n(b) the territory of Kenya;\n(c) the sovereignty of the people;\n(d) the national values and principles of governance referred to in\nArticle 10 (2) (a) to (d);\n(e) the Bill of Rights;\n(f) the term of office of the President;\n(g) the independence of the Judiciary and the commissions and\nindependent offices to which Chapter Fifteen applies;\n(h) the functions of Parliament;\nConstitution of Kenya, 2010 155\n(i) the objects, principles and structure of devolved government;\nor\n(j) the provisions of this Chapter.\n(2) A proposed amendment shall be approved by a referendum\nunder clause (1) if—\n(a) at least twenty per cent of the registered voters in each of at\nleast half of the counties vo
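
The extracted section text still carries the PDF's running header, for example the line "Constitution of Kenya, 2010 155" in the output above. The mirrored form with the page number first ("158 Constitution of Kenya, 2010") also occurs in the extracted text. A minimal sketch of removing both forms before the text is split into sections follows. The names `PAGE_HEADER` and `strip_page_headers` are hypothetical, and the pattern assumes the header always sits on a line of its own.

```python
import re

# Matches a whole line that is only the running header, with the page number
# either after the title or before it.
PAGE_HEADER = re.compile(
    r"^\s*(?:\d+\s+)?Constitution of Kenya, 2010(?:\s+\d+)?\s*$"
)


def strip_page_headers(text):
    """Drop running-header lines and return the remaining lines joined by newlines."""
    kept = [line for line in text.splitlines() if not PAGE_HEADER.match(line)]
    return "\n".join(kept)


sample = (
    "(h) the functions of Parliament;\n"
    "Constitution of Kenya, 2010 155\n"
    "(i) the objects, principles and structure of devolved government;\n"
    "158 Constitution of Kenya, 2010\n"
    "or"
)
cleaned = strip_page_headers(sample)
print(cleaned)
```

Applying such a filter to the trimmed chapter text before calling `split_chapter` would keep the header out of the answers returned to users.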

## Question Answering Mechanism

In [87]:
# Load spaCy model
nlp = spacy.load("en_core_web_sm")

# Preprocess the user query using spaCy
def preprocess_query(query):
    # Parse the query with spaCy
    doc = nlp(query)
    # Normalize the query: lowercase, lemmatize, and remove stopwords
    tokens = [token.lemma_.lower() for token in doc if not token.is_stop and not token.is_punct]
    return " ".join(tokens)

# Define the sections variable with Chapter 16
sections = chapter_16_sections

# Define the QA mapping based on key phrases and corresponding sections
qa_mapping = {
    "amendment constitution":"amendment constitution",
    "parliamentary initiative":"parliamentary initiative",
    "popular initiative":"popular initiative"
    # Add more mappings as needed
}

# Update the Q&A system to use preprocessed queries
def answer_question_nlp(query, sections, qa_mapping):
    # Preprocess the user query
    processed_query = preprocess_query(query)
    
    # Search for a key in qa_mapping that matches the preprocessed query
    for key in qa_mapping:
        if key in processed_query:
            # Return the relevant section in the text
            return sections[key]
            
    return "Sorry, I couldn't find an answer to your question."

# Example usage
user_query = "What is the amendment of the constitution?"
answer = answer_question_nlp(user_query, sections, qa_mapping)
print(answer)

# Example 2
user_query_2 = 'Explain popular initiative'
answer_2 = answer_question_nlp(user_query_2, sections, qa_mapping)
answer_2


255. (1) A proposed amendment to this Constitution shall be
enacted in accordance with Article 256 or 257, and approved in
accordance with clause (2) by a referendum, if the amendment relates
to any of the following matters—
(a) the supremacy of this Constitution;
(b) the territory of Kenya;
(c) the sovereignty of the people;
(d) the national values and principles of governance referred to in
Article 10 (2) (a) to (d);
(e) the Bill of Rights;
(f) the term of office of the President;
(g) the independence of the Judiciary and the commissions and
independent offices to which Chapter Fifteen applies;
(h) the functions of Parliament;
Constitution of Kenya, 2010 155
(i) the objects, principles and structure of devolved government;
or
(j) the provisions of this Chapter.
(2) A proposed amendment shall be approved by a referendum
under clause (1) if—
(a) at least twenty per cent of the registered voters in each of at
least half of the counties vote in the referendum; and
(b) the amendment is su

'257. (1) An amendment to this Constitution may be proposed by\na popular initiative signed by at least one million registered voters.\n(2) A popular initiative for an amendment to this Constitution may\nbe in the form of a general suggestion or a formulated draft Bill.\n(3) If a popular initiative is in the form of a general suggestion, the\npromoters of that popular initiative shall formulate it into a draft Bill.\n(4) The promoters of a popular initiative shall deliver the draft Bill\nand the supporting signatures to the Independent Electoral and\nBoundaries Commission, which shall verify that the initiative is\nsupported by at least one million registered voters.\n(5) If the Independent Electoral and Boundaries Commission is\nsatisfied that the initiative meets the requirements of this Article, the\nCommission shall submit the draft Bill to each county assembly for\nconsideration within three months after the date it was submitted by\nthe Commission.\n(6) If a county assembly appro
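
The lookup in `answer_question_nlp` uses `key in processed_query`, which is a character-level substring test. With short keys this can match inside unrelated words: "old" is a substring of "threshold", and "land" is a substring of "landhold". The sketch below compares whole tokens instead and tries longer keys first. The helper names `contains_phrase` and `find_section_key` are hypothetical, and the key list is a toy sample.

```python
def contains_phrase(query_tokens, phrase_tokens):
    """Return True if phrase_tokens occurs as a contiguous run inside query_tokens."""
    n = len(phrase_tokens)
    return any(
        query_tokens[i:i + n] == phrase_tokens
        for i in range(len(query_tokens) - n + 1)
    )


def find_section_key(processed_query, keys):
    """Return the key with the most words that appears as whole tokens, or None."""
    query_tokens = processed_query.split()
    # Try keys with more words first so a specific key beats a shorter one
    for key in sorted(keys, key=lambda k: len(k.split()), reverse=True):
        if contains_phrase(query_tokens, key.split()):
            return key
    return None


keys = ["old", "land", "public land", "limit"]
print("old" in "threshold vote")                        # True (substring test)
print(find_section_key("threshold vote", keys))         # None (token test)
print(find_section_key("public land ownership", keys))  # public land
```

The token test expects the same lemmatized, lowercase form that `preprocess_query` produces, so it could replace the `if key in processed_query` check without changing the keys.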

# CHAPTER 17

In [88]:
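chapter_17 = extract_specific_pages(pdf_path, 156, 164)
# Keep the text after the "CHAPTER SEVENTEEN" heading and cut it at "CHAPTER EIGHTEEN" if that heading is present
if "CHAPTER SEVENTEEN" not in chapter_17:
    # Fail early instead of leaving chapter_17_trimmed undefined for the next cell
    raise ValueError("CHAPTER SEVENTEEN heading not found in the extracted pages")
chapter_17_trimmed = chapter_17.split("CHAPTER SEVENTEEN", 1)[1].split("CHAPTER EIGHTEEN", 1)[0].strip()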


In [89]:
# Load spaCy model
nlp = spacy.load("en_core_web_sm")

# Preprocess the user query using spaCy
def preprocess_query(query):
    # Parse the query with spaCy
    doc = nlp(query)
    # Normalize the query: lowercase, lemmatize, and remove stopwords
    tokens = [token.lemma_.lower() for token in doc if not token.is_stop and not token.is_punct]
    return " ".join(tokens)

# Example usage of the preprocessing function
user_query = "What is the supremacy of the constitution?"
processed_query = preprocess_query(user_query)
print(processed_query)

supremacy constitution


In [90]:
def split_chapter(chapter_text):
    # Split at key headings and strip extra whitespace
    sections = {
        "enforcement constitution": [],
        "construe constitution": [],
        "interpretation": []
    }

    # Split by new lines to process line by line
    lines = chapter_text.splitlines()

    current_section = None

    for line in lines:
        stripped_line = line.strip()

        if stripped_line.startswith("Enforcement of this Constitution"):
            current_section = "enforcement constitution"
        elif stripped_line.startswith("Construing this Constitution"):
            current_section = "construe constitution"
        elif stripped_line.startswith("Interpretation"):
            current_section = "interpretation"

        # Append line to the current section if it's set
        if current_section:
            sections[current_section].append(stripped_line)

    # Join each section into a single string
    for key in sections:
        sections[key] = "\n".join(sections[key])

    return sections

# Split the chapter into sections
chapter_17_sections = split_chapter(chapter_17_trimmed)
chapter_17_sections

{'enforcement constitution': 'Enforcement of this Constitution.\n258. (1) Every person has the right to institute court proceedings,\nclaiming that this Constitution has been contravened, or is threatened\nwith contravention.\n(2) In addition to a person acting in their own interest, court\nproceedings under clause (1) may be instituted by—\n(a) a person acting on behalf of another person who cannot act in\ntheir own name;\n(b) a person acting as a member of, or in the interest of, a group\nor class of persons;\n(c) a person acting in the public interest; or\n(d) an association acting in the interest of one or more of its\nmembers.',
 'construe constitution': 'Construing this Constitution.\n259. (1) This Constitution shall be interpreted in a manner that—\n(a) promotes its purposes, values and principles;\n(b) advances the rule of law, and the human rights and\nfundamental freedoms in the Bill of Rights;\n(c) permits the development of the law; and\n(d) contributes to good governance.\

## Question Answering Mechanism

In [91]:
# Load spaCy model
nlp = spacy.load("en_core_web_sm")

# Preprocess the user query using spaCy
def preprocess_query(query):
    # Parse the query with spaCy
    doc = nlp(query)
    # Normalize the query: lowercase, lemmatize, and remove stopwords
    tokens = [token.lemma_.lower() for token in doc if not token.is_stop and not token.is_punct]
    return " ".join(tokens)

# Define the sections variable with Chapter 17
sections = chapter_17_sections

# Define the QA mapping based on key phrases and corresponding sections
qa_mapping = {
    "enforcement constitution": "enforcement constitution",
    "construe constitution": "construe constitution",
    "interpretation": "interpretation"
}

# Update the Q&A system to use preprocessed queries
def answer_question_nlp(query, sections, qa_mapping):
    # Preprocess the user query
    processed_query = preprocess_query(query)
    
    # Search for a key in qa_mapping that matches the preprocessed query
    for key in qa_mapping:
        if key in processed_query:
            # Return the relevant section in the text
            return sections[key]
            
    return "Sorry, I couldn't find an answer to your question."

# Example usage
user_query = "What is the enforcement of this Constitution?"
answer = answer_question_nlp(user_query, sections, qa_mapping)
print(answer)

# Example 2
user_query_2 = "explain what construing constitution means"
answer_2 = answer_question_nlp(user_query_2, sections, qa_mapping)
answer_2


Enforcement of this Constitution.
258. (1) Every person has the right to institute court proceedings,
claiming that this Constitution has been contravened, or is threatened
with contravention.
(2) In addition to a person acting in their own interest, court
proceedings under clause (1) may be instituted by—
(a) a person acting on behalf of another person who cannot act in
their own name;
(b) a person acting as a member of, or in the interest of, a group
or class of persons;
(c) a person acting in the public interest; or
(d) an association acting in the interest of one or more of its
members.


'Construing this Constitution.\n259. (1) This Constitution shall be interpreted in a manner that—\n(a) promotes its purposes, values and principles;\n(b) advances the rule of law, and the human rights and\nfundamental freedoms in the Bill of Rights;\n(c) permits the development of the law; and\n(d) contributes to good governance.\n(2) If there is a conflict between different language versions of\nthis Constitution, the English language version prevails.\n158 Constitution of Kenya, 2010\n(3) Every provision of this Constitution shall be construed\naccording to the doctrine of interpretation that the law is always\nspeaking and, therefore, among other things—\n(a) a function or power conferred by this Constitution on an office\nmay be performed or exercised as occasion requires, by the\nperson holding the office;\n(b) any reference in this Constitution to a State or other public\noffice or officer, or a person holding such an office, includes a\nreference to the person acting in or other

# CHAPTER 18

In [92]:
chapter_18 = extract_specific_pages(pdf_path, 163,165 )
chapter_18_cleaned = chapter_18.split("CHAPTER EIGHTEEN")[1].strip()
chapter_18_trimmed = chapter_18_cleaned.split('SCHEDULES')[0].strip()

In [93]:
def split_chapter(chapter_text):
    # Split at key headings and strip extra whitespace
    sections = {
        "consequential legislation": [],
        "consequential provision": [],
        "effective date": [],
        "repeal": []
    }

    # Split by new lines to process line by line
    lines = chapter_text.splitlines()

    current_section = None

    for line in lines:
        stripped_line = line.strip()

        # Check for headings and update current_section accordingly
        if stripped_line.startswith("Consequential legislation"):
            current_section = "consequential legislation"
        elif stripped_line.startswith("Transitional and consequential provisions"):
            current_section = "consequential provision"
        elif stripped_line.startswith("Effective Date"):  # Both words are capitalized in this heading
            current_section = "effective date"
        elif stripped_line.startswith("Repeal of previous constitution"):
            current_section = "repeal"
        else:
            # Only append lines if there is an active section
            if current_section:
                sections[current_section].append(stripped_line)

    # Join each section into a single string, only if there are lines in the section
    for key in sections:
        sections[key] = "\n".join(sections[key]).strip()  # Use strip to remove leading/trailing whitespace

    return sections

# Split the chapter into sections
chapter_18_sections = split_chapter(chapter_18_trimmed)
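
The three `split_chapter` definitions differ only in their heading strings and in whether the heading line itself is stored (the Chapter 17 version keeps it, the Chapter 16 and 18 versions drop it). A single parameterized helper could cover all of them. This is a sketch under that assumption: `split_by_headings`, its `keep_heading` flag and the toy text are hypothetical and not taken from the notebook.

```python
def split_by_headings(chapter_text, heading_to_key, keep_heading=False):
    """Split chapter text into sections keyed by the heading that opens each one.

    heading_to_key maps the start of a heading line (as printed in the PDF)
    to the dictionary key used for that section.
    """
    collected = {key: [] for key in heading_to_key.values()}
    current = None
    for raw_line in chapter_text.splitlines():
        line = raw_line.strip()
        # Find the first configured heading that this line starts with, if any
        matched = next(
            (key for heading, key in heading_to_key.items() if line.startswith(heading)),
            None,
        )
        if matched is not None:
            current = matched
            if keep_heading:
                collected[current].append(line)
        elif current is not None:
            collected[current].append(line)
    return {key: "\n".join(lines).strip() for key, lines in collected.items()}


# Toy text: lines before the first heading are ignored
toy_text = "preamble to ignore\nHeading A.\n1. first line\ncontinued\nHeading B.\n2. second line\n"
toy_headings = {"Heading A": "alpha", "Heading B": "beta"}
toy_sections = split_by_headings(toy_text, toy_headings)
print(toy_sections)  # {'alpha': '1. first line\ncontinued', 'beta': '2. second line'}
```

For Chapter 18 the mapping would pair "Consequential legislation" with "consequential legislation", and so on for the other three headings, which also removes the need to redefine the function in every chapter cell.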


## Question Answering Mechanism

In [94]:
# Load spaCy model
nlp = spacy.load("en_core_web_sm")

# Preprocess the user query using spaCy
def preprocess_query(query):
    # Parse the query with spaCy
    doc = nlp(query)
    # Normalize the query: lowercase, lemmatize, and remove stopwords
    tokens = [token.lemma_.lower() for token in doc if not token.is_stop and not token.is_punct]
    return " ".join(tokens)

# Define the sections variable with Chapter 18
sections = chapter_18_sections

# Define the QA mapping based on key phrases and corresponding sections
qa_mapping = {
    "consequential legislation": "consequential legislation",
    "consequential provision": "consequential provision",
    "effective date": "effective date",
    "repeal": "repeal"
}



# Update the Q&A system to use preprocessed queries
def answer_question_nlp(query, sections, qa_mapping):
    # Preprocess the user query
    processed_query = preprocess_query(query)

    # Debug
    print(f"Processed query: {processed_query}")

    # Search for a key in qa_mapping that matches the preprocessed query
    for key in qa_mapping:
        if key in processed_query:
            # Return the relevant section in the text
            return sections[key]

    return "Sorry, I couldn't find an answer to your question."

# Example usage
user_query = "constitution repeal?"
answer = answer_question_nlp(user_query, sections, qa_mapping)
print(answer)
print('\n')
# example 2
user_query = "consequential provision?"
answer = answer_question_nlp(user_query, sections, qa_mapping)
print(answer)

Processed query: constitution repeal
264. Subject to the Sixth Schedule, for the avoidance of doubt,
the Constitution in force immediately before the effective date shall
stand repealed on the effective date.


Processed query: consequential provision
262. The transitional and consequential provisions set out in the
Sixth Schedule shall take effect on the effective date.
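
The exact substring check returns the fallback message as soon as a keyword is misspelled, for example "constitution repael". One standard-library option is to score each key against runs of query words with `difflib.SequenceMatcher` and accept the best score above a cutoff. The sketch below illustrates this. The function name `fuzzy_find_key` and the cutoff of 0.8 are assumptions chosen for illustration, not values from the notebook.

```python
from difflib import SequenceMatcher


def fuzzy_find_key(processed_query, keys, cutoff=0.8):
    """Return the key most similar to some run of query words, or None below cutoff."""
    tokens = processed_query.split()
    best_key, best_score = None, cutoff
    for key in keys:
        n = len(key.split())
        # Compare the key with every run of n consecutive query words
        for i in range(max(len(tokens) - n + 1, 0)):
            window = " ".join(tokens[i:i + n])
            score = SequenceMatcher(None, key, window).ratio()
            if score >= best_score:
                best_key, best_score = key, score
    return best_key


chapter_18_keys = [
    "consequential legislation",
    "consequential provision",
    "effective date",
    "repeal",
]
print(fuzzy_find_key("constitution repael", chapter_18_keys))  # repeal
print(fuzzy_find_key("weather today", chapter_18_keys))        # None
```

A lower cutoff tolerates more typos but raises the chance of matching an unrelated section, so the value is worth tuning against real user queries.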


In [95]:
# Define the synonyms
synonyms = {
    "consequential legislation": ["consequential legislation", "related laws"],
    "consequential provision": ["consequential provision", "transitional provisions"],
    "effective date": ["effective date", "commencement date"],
    "repeal": ["repeal", "constitution repeal", "repeal of the constitution"]
}

# Define the qa mapping
qa_mapping = {
    "consequential legislation": "consequential legislation",
    "consequential provision": "consequential provision",
    "effective date": "effective date",
    "repeal": "repeal"
}

sections = chapter_18_sections

# Preprocess the query
def preprocess_query(query):
    doc = nlp(query)
    tokens = [token.lemma_.lower() for token in doc if not token.is_stop and not token.is_punct]
    return " ".join(tokens)

# Correct the spelling of each word in the processed query
def correct_spelling(processed_query):
    spell = SpellChecker()
    words = processed_query.split()
    # Words the spell checker does not recognize
    misspelled_words = spell.unknown(words)

    corrected_words = []
    for word in words:
        if word in misspelled_words:
            # correction() may return None when it has no candidate, so fall back to the original word
            corrected_words.append(spell.correction(word) or word)
        else:
            corrected_words.append(word)

    # Rebuild the query from the corrected words
    return " ".join(corrected_words)

# Match with synonym support
def match_with_synonyms(query, qa_mapping, synonyms, sections):
    processed_query = preprocess_query(query)
    print(f"Processed Query: {processed_query}")  # Debugging line

    # Correct spelling in the processed query
    corrected_query = correct_spelling(processed_query)
    print(f"Corrected Query: {corrected_query}")  # Debugging line

    for key in qa_mapping.keys():
        print(f"Checking key: {key}")  # Debugging line

        # Get synonyms for current key
        for synonym in synonyms.get(key, [key]):
            print(f"Trying synonym: {synonym}")  # Debugging line
            if synonym in corrected_query:
                print(f"Match found with synonym: {synonym}")  # Debugging line
                # Return the section content corresponding to the found key
                section_content = sections.get(key, "Section not found.")
                print(f"Returning section content: {section_content}")  # Debugging line
                return section_content

    print("No match found")  # Debugging line if no match is found
    return None

# Answer function with synonym matching and spelling correction
def answer_question_nlp(query, sections, qa_mapping, synonyms):
    section_content = match_with_synonyms(query, qa_mapping, synonyms, sections)

    if section_content:
        return section_content  # Return the relevant section content

    return "Sorry, I couldn't find an answer to your question."

# Sample user query
user_query = "What is a repeal?"
answer = answer_question_nlp(user_query, sections, qa_mapping, synonyms)
print(answer)


Processed Query: repeal
Corrected Query: repeal
Checking key: consequential legislation
Trying synonym: consequential legislation
Trying synonym: related laws
Checking key: consequential provision
Trying synonym: consequential provision
Trying synonym: transitional provisions
Checking key: effective date
Trying synonym: effective date
Trying synonym: commencement date
Checking key: repeal
Trying synonym: repeal
Match found with synonym: repeal
Returning section content: 264. Subject to the Sixth Schedule, for the avoidance of doubt,
the Constitution in force immediately before the effective date shall
stand repealed on the effective date.
264. Subject to the Sixth Schedule, for the avoidance of doubt,
the Constitution in force immediately before the effective date shall
stand repealed on the effective date.
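
Queries are lemmatized and stripped of stop words before matching, but the synonym phrases are stored as typed. A phrase such as "transitional provisions" can therefore never be a substring of the processed query "transitional provision". One way around this is to pass every synonym through the same normalizer once and build a phrase-to-key index. In the sketch below, `build_synonym_index`, `lookup` and `toy_normalize` are hypothetical names. `toy_normalize` is only a crude stand-in so the example runs without spaCy; in the notebook, `preprocess_query` would be passed instead.

```python
def build_synonym_index(synonyms, normalize):
    """Map every normalized synonym phrase to its section key."""
    index = {}
    for key, phrases in synonyms.items():
        for phrase in phrases:
            index[normalize(phrase)] = key
    return index


def lookup(query, index, normalize):
    """Return the section key of the longest normalized phrase found in the query."""
    processed = normalize(query)
    # Longest phrases first so a specific synonym beats a shorter one
    for phrase in sorted(index, key=len, reverse=True):
        if phrase and phrase in processed:
            return index[phrase]
    return None


def toy_normalize(text):
    """Stand-in for preprocess_query: lowercase, drop a few stop words, strip a plural 's'."""
    stop_words = {"what", "is", "the", "of", "a", "are"}
    words = [w.strip("?.,!").lower() for w in text.split()]
    kept = [w for w in words if w and w not in stop_words]
    return " ".join(w[:-1] if w.endswith("s") and len(w) > 3 else w for w in kept)


toy_synonyms = {
    "consequential provision": ["consequential provision", "transitional provisions"],
    "repeal": ["repeal", "repeal of the constitution"],
}
index = build_synonym_index(toy_synonyms, toy_normalize)
print(lookup("What are the transitional provisions?", index, toy_normalize))
# consequential provision
```

Because the index is built once, the per-query work is a single pass over the phrases rather than the nested loop over keys and synonyms.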


# Cumulative sections, qa_mapping and synonyms

In [96]:
# Define the synonyms explicitly so that we can easily reference them later
synonyms = {
    "supremacy": ["supremacy", "ultimate power"],
    "sovereignty": ["sovereignty", "power of the people", "authority of the people", "self rule", "autonomy"],
    "defence": ["defence", "defense", "protection", "preservation"],
    "declaration republic": ["declaration republic", "proclamation republic"],
    "territory": ["territory"],
    "devolution": ["devolution", "decentralization", "delegation", "transfer of power", "local governance", "subsidiarity"],
    "languages": ["languages", "tongues", "dialects", "official languages", "linguistic diversity"],
    "religion": ["religion", "faith", "belief systems", "spiritual practice", "secularism", "church-state separation"],
    "symbol": ["symbol", "emblem", "insignia", "representation", "national icon"],
    "day": ["day", "holiday", "observance", "public holiday", "commemoration", "remembrance"],
    "national value principle governance": ["national value principle governance"],
    "governance": ["governance", "government", "administration", "management", "public service", "political structure"],
    "culture": ["culture", "heritage", "tradition", "customs", "societal norms", "arts"],
    "entitlement citizen": ["entitlement citizen", "entitlement right"],
    "retention": ["retention", "maintenance", "keeping", "preservation", "continuation"],
    "birth": ["birth", "nativity", "origin", "ancestry", "inborn citizenship"],
    "registration": ["registration", "enlistment", "enrollment", "citizenship application", "naturalization"],
    "dual": ["dual", "multiple", "dual nationality", "two-fold citizenship"],
    "revocation": ["revocation", "cancellation", "annulment", "rescission", "forfeiture", "withdrawal"],
    "legislation citizen": ["legislation citizen"],
    "right fundamental freedom": ["right fundamental freedom"],
    "application": ["application", "exercise", "freedom enforcement"],
    "implementation right fundamental freedom": ["implementation right fundamental freedom"],
    "authority": ["authority", "judicial power", "court jurisdiction", "judicial review"],
    "limitation": ["limitation", "freedom restriction", "limit entitlement"],
    "limit": ["limit", "absolute right", "inalienable freedom", "immutable right"],
    "life": ["life", "existence", "entitlement to life", "survive"],
    "equality": ["equality", "equal treatment", "justice"],
    "dignity": ["dignity", "intrinsic respect", "inherent value", "personal honor"],
    "security": ["security", "liberty", "personal safety"],
    "slavery": ["slavery", "bondage", "force servitude", "involuntary labor", "coerce work"],
    "privacy": ["privacy", "confidential", "personal space"],
    "conscience": ["conscience", "liberty of thought", "religion freedom", "belief"],
    "expression": ["expression", "communication", "speak"],
    "medium": ["medium", "press autonomy", "journalist freedom", "right to information"],
    "information": ["information", "transparent", "datum"],
    "association": ["association", "union", "group formation", "association right"],
    "assembly": ["assembly", "dissent", "peace assembly", "demonstrate", "petition right"],
    "political": ["political", "electoral right", "political engagement", "vote right"],
    "movement": ["movement", "migration", "resident freedom", "movement liberty"],
    "property": ["property", "property safeguard", "assets security"],
    "work": ["work", "benefit", "employ"],
    "environment": ["environment", "conserve", "right to a healthy environment"],
    "economic": ["economic", "standard of life", "social welfare"],
    "language": ["language", "cultural right", "linguistic freedom", "cultural expression"],
    "family": ["family", "society foundation", "parent right"],
    "consumer": ["consumer", "client entitlement", "buyer safeguard"],
    "administrative": ["administrative", "fair administration rule", "legal administration measure"],
    "justice": ["justice", "legal access", "justice available"],
    "arrest": ["arrest", "capture", "seizure", "apprehension"],
    "fair hearing": ["fair hearing", "prosecute", "equity hearing"],
    "custody": ["custody", "confinement", "imprisonment", "prison"],
    "interpret": ["interpret", "explain", "clarify", "overview"],
    "infant": ["infant", "child", "kid", "toddler"],
    "disable": ["disable", "handicap", "impairment", "challenge"],
    "youth": ["youth", "young adult", "adolescent"],
    "minority": ["minority", "diversity right", "affirmative right"],
    "old": ["old", "elder"],
    "emergency": ["emergency", "danger", "crisis", "disaster"],
    "national": ["national", "human right", "equality body", "right authority"],
    "principle land": ["principle land", "land management principle", "land policy"],
    "classification land": ["classification land", "land category", "type land"],
    "public land": ["public land", "government land", "state property", "national land", "public landhold"],
    "community land": ["community land", "ethnic land", "cultural landhold"],
    "private land": ["private land", "individual landhold", "personal property", "freehold land"],
    "landhold non citizen": ["landhold non citizen", "foreign lease", "alien land", "non citizen land"],
    "regulation land use": ["regulation land use", "land use policy", "property regulation", "land oversight"],
    "land commission": ["land commission", "land authority", "public land commission", "land policy agency"],
    "land legislation": ["land legislation", "property law", "land use regulation", "land act"],
    "obligation respect environment": ["obligation respect environment"],
    "enforcement environmental right": ["enforcement environmental right"],
    "agreement relating natural resource": ["agreement relating natural resource"],
    "legislation environment": ["legislation environment"],
    "responsibility leadership": ["responsibility leadership"],
    "oath office": ["oath office", "state office affirmation"],
    "conduct state": ["conduct state", "behaviour state"],
    "financial probity": ["financial probity"],
    "restriction activity": ["restriction activity"],
    "citizenship leadership": ["citizenship leadership"],
    "establish ethic anti corruption": ["establish ethic anti corruption"],
    "legislation leadership": ["legislation leadership"],
    "general principles electoral system": ["general principles electoral system"],
    "legislation election": ["legislation election"],
    "registration voter": ["registration voter"],
    "candidate election": ["candidate election"],
    "eligibility stand independent candidate": ["eligibility stand independent candidate"],
    "vote": ["vote"],
    "electoral dispute": ["electoral dispute"],
    "independent electoral boundary commission": ["independent electoral boundary commission"],
    "delimitation electoral unit": ["delimitation electoral unit"],
    "allocation party seat": ["allocation party seat"],
    "requirement political party": ["requirement political party"],
    "legislation political party": ["legislation political party"],
    "establishment parliament": ["establishment parliament"],
    "role parliament": ["role parliament"],
    "role national assembly": ["role national assembly"],
    "role senate": ["role senate"],
    "membership national assembly": ["membership national assembly"],
    "membership senate": ["membership senate"],
    "qualification member parliament": ["qualification member parliament"],
    "promotion representation marginalised group": ["promotion representation marginalised group"],
    "election member parliament": ["election member parliament"],
    "term parliament": ["term parliament"],
    "vacation office": ["vacation office"],
    "right recall": ["right recall"],
    "question membership": ["question membership"],
    "speaker parliament": ["speaker parliament"],
    "presiding parliament": ["presiding parliament"],
    "party leader": ["party leader"],
    "exercise legislative power": ["exercise legislative power"],
    "bill county government": ["bill county government"],
    "special bill county government": ["special bill county government"],
    "ordinary bill county government": ["ordinary bill county government"],
    "mediation committee": ["mediation committee"],
    "money bill": ["money bill"],
    "presidential assent": ["presidential assent"],
    "come force law": ["come force law"],
    "power privilege immunity": ["power privilege immunity"],
    "public access participation": ["public access participation"],
    "right petition parliament": ["right petition parliament"],
    "official language parliament": ["official language parliament"],
    "quorum": ["quorum"],
    "vote parliament": ["vote parliament"],
    "decision senate": ["decision senate"],
    "committee standing order": ["committee standing order"],
    "power call evidence": ["power call evidence"],
    "location sitting parliament": ["location sitting parliament"],
    "parliamentary service commission": ["parliamentary service commission"],
    "clerk staff parliament": ["clerk staff parliament"],
    "principle executive authority": ["principle executive authority"],
    "national executive": ["national executive"],
    "authority president": ["authority president"],
    "function president": ["function president"],
    "power mercy": ["power mercy"],
    "presidential powers temporary incumbency": ["presidential powers temporary incumbency"],
    "decision president": ["decision president"],
    "election president": ["election president"],
    "qualification disqualification election president": ["qualification disqualification election president"],
    "procedure presidential election": ["procedure presidential election"],
    "death assume office": ["death assume office"],
    "validity presidential election": ["validity presidential election"],
    "assumption office president": ["assumption office president"],
    "term office president": ["term office president"],
    "protection legal proceeding": ["protection legal proceeding"],
    "removal president incapacity": ["removal president incapacity"],
    "removal president impeachment": ["removal president impeachment", "impeachment president"],
    "vacancy office president": ["vacancy office president"],
    "function deputy president": ["function deputy president"],
    "election deputy president": ["election deputy president"],
    "vacancy office deputy president": ["vacancy office deputy president"],
    "removal deputy president": ["removal deputy president", "impeachment deputy president"],
    "benefit president": ["benefit president"],
    "cabinet": ["cabinet"],
    "decision cabinet": ["decision cabinet"],
    "secretary cabinet": ["secretary cabinet"],
    "principal secretary": ["principal secretary"],
    "attorney general": ["attorney general"],
    "director public prosecution": ["director public prosecution"],
    "removal director public prosecution": ["removal director public prosecution"],
    "judicial authority": ["judicial authority","legal authority","court jurisdiction"],
    "judicial independence": ["judicial independence","judicial autonomy","judicial impartiality"],
    "judicial offices": ["judicial office","legal position","judicial role"],
    "court systems": ["court system","judicial system","legal system"],
    "supreme court": ["supreme court","highest court","apex court"],
    "appeal court": ["appeal court","appellate court","review court"],
    "high court": ["high court","superior court"],
    "judicial appointments": ["judicial appointment","judge appointment","judicial selection"],
    "office tenure": ["office tenure","office term","service duration"],
    "office removal": ["office removal","office dismissal","office termination"],
    "subordinate courts": ["subordinate court","local court","inferior court"],
    "jsc establishment": ["judicial service commission establishment","judicial service commission creation","judicial service commission formation"],
    "jsc functions": ["judicial service commission function","judicial service commission role","judicial service commission responsibility"],
    "judiciary fund": ["judiciary fund","judicial fund","legal fund"],
    "object devolution": ["object devolution", "power decentralization", "government devolution"],
    "devolve government": ["devolve government", "decentralization principle"],
    "county government": ["county government", "local government", "regional administration"],
    "membership": ["membership", "composition", "structure"],
    "speaker": ["speaker", "assembly speaker", "preside officer"],
    "county executive committee": ["county executive committee", "regional executive body", "local executive board"],
    "county election": ["county election", "local election", "regional election"],
    "county removal": ["county removal", "county office dismissal", "regional removal"],
    "vacancy": ["vacancy", "position vacancy", "office vacancy"],
    "functions of county executive committees": ["functions of county executive committees", "duties of county executive commitees", "county executive commitees roles"],
    "city": ["city", "urban center", "municipality"],
    "legislative authority": ["legislative authority", "law make power", "regulatory authority"],
    "function and power of national and county government": ["function and power of national and county government", "government duty", "government responsibility"],
    "transfer": ["transfer", "delegation", "reassignment"],
    "county boundary": ["county boundary", "county border", "regional limit"],
    "cooperation": ["cooperation", "collaboration", "coordination"],
    "support": ["support", "assistance", "aid"],
    "conflict of law": ["conflict of law", "legal discrepancy", "jurisdiction conflict"],
    "suspension": ["suspension", "interruption", "temporary halt"],
    "qualification": ["qualification", "eligibility", "requirement"],
    "vacation": ["vacation", "office leave", "office absence"],
    "summon witness": ["summon witness", "convene witness", "request witness attendance"],
    "public participation county assembly power privilege immunity": ["public participation county assembly power privilege immunity", "county assembly right", "local governance privilege"],
    "gender balance": ["gender balance", "gender equity", "equal representation"],
    "county transition": ["county transition", "regional handover", "local government transition"],
    "publication": ["publication", "official release", "public announcement"],
    "legislation": ["legislation", "regulation", "law"],
    "principle public finance": ["principle public finance", "public finance guideline", "fundamental financial rule"],
    "equitable sharing of national revenue": ["equitable share of national revenue", "fair distribution of revenue"],
    "equitable share law": ["equitable share law", "financial law", "equitable distribution law"],
    "equalisation fund": ["equalisation fund", "equalization", "equal opportunity fund"],
    "consultation": ["consultation", "discussion", "advisement"],
    "consolidated fund": ["consolidated fund", "main fund", "central fund"],
    "revenue funds": ["revenue funds", "revenue", "income fund"],
    "contingencies fund": ["contingency fund", "reserve fund", "emergency fund"],
    "tax impose": ["tax impose", "tax authority", "tax power"],
    "national government borrow": ["national government borrow", "national debt", "federal borrow"],
    "loan guarantee": ["loan guarantee", "credit guarantee", "loan assurance"],
    "public debt": ["public debt", "national debt", "government debt"],
    "commission on revenue allocation": ["commission on revenue allocation", "allocation commission", "revenue allocation committee"],
    "division of revenue": ["division of revenue", "revenue distribution", "allocation of revenue"],
    "budget content": ["budget content", "budget structure", "budget framework"],
    "expenditure": ["expenditure", "cost", "expense"],
    "supplementary appropriation": ["supplementary appropriation", "additional appropriation", "extra allocation"],
    "financial control": ["financial control", "financial oversight", "budget control"],
    "procurement": ["procurement", "acquisition", "purchase"],
    "budget contoller": ["budget contoller", "cost controller"],
    "auditor-general": ["auditor general", "chief auditor", "head auditor"],
    "salary": ["salary","remuneration", "pay"],
    "central bank of kenya": ["central bank of kenya", "central bank", "central financial authority"],
    "principle": ["principle", "values and principles of public service", "public service values", "core principles of public service"],
    "public service commission": ["public service commission", "the public service commission", "public service authority", "government service commission"],
    "functions": ["function", "functions powers public service commission", "role the public service commission", "duties public service commission"],
    "staffing": ["staffing", "staffing of county governments", "county government staffing", "human resources for county governments"],
    "protection": ["protection", "protection of public officers", "safeguarding public officers", "security for public officials"],
    "teachers service commission": ["teachers service commission", "commission for teachers", "educators service commission"],
    "national security": ["national security", "principles of national security"],
    "national security organs": ["national security organs", "security organs", "national security agencies"],
    "national security council": ["national security council", "establishment of the national security council", "national security council formation", "creation of the national security council"],
    "defence forces": ["defence forces and defence council", "defence forces establishment", "formation of defence forces and council"],
    "national intelligence service": ["national intelligence service", "establishment of national intelligence service", "national intelligence service creation", "formation of national intelligence service"],
    "national police service": ["national police service", "establishment of the national police service", "national police service formation", "creation of the national police service"],
    "functions national police service": ["functions national police service", "objects and functions of the national police service", "roles of the national police service", "duties and functions of the national police"],
    "command national police service": ["command national police service", "command of the national police service", "national police service command", "leadership of the national police service"],
    "national police service commission": ["national police service commission", "police service commission", "commission for the national police service"],
    "police service": ["police service", "additional police service", "alternative police service"],
    "application of chapter": ["application of chapter", "chapter application", "provisions of the chapter"],
    "commission independent office": ["commission independent office", "commissions and independent offices", "independent commissions", "commission offices"],
    "term office": ["term office", "term office", "office term", "tenure condition"],
    "removal office": ["removal from office", "dismissal office", "termination office"],
    "general function power": ["general function power", "overall power", "general power functions"],
    "incorporation commission independent office": ["incorporation commission independent office", "incorporation commissionindependent office", "establishment commission", "set up independent office"],
    "reporting commission independent office": ["reporting commission independent office", "reporting commission independent office", "commissions report", "report independent offices"]


}

# Define qa mapping explicitly too
qa_mapping = {
    "supremacy": "supremacy",
    "sovereignty": "sovereignty",
    "defence": "defence",
    "declaration republic": "declaration republic",
    "territory": "territory", 
    "devolution": "devolution",
    "language": "languages",
    "religion": "religion",
    "symbol": "symbol",
    "day": "day",
    "national value principle governance": "national value principle governance",
    "culture": "culture",
    "entitlement citizen": "entitlement citizen",
    "retention": "retention",  
    "birth": "birth",
    "registration": "registration",
    "dual": "dual",
    "revocation": "revocation",
    "legislation citizen": "legislation citizen",
    "right fundamental freedom": "right fundamental freedom",
    "application": "application",
    "implementation right fundamental freedom": "implementation right fundamental freedom",
    "authority": "authority",
    "limitation": "limitation",
    "limit": "limit",
    "life": "life",
    "equality": "equality",
    "dignity": "dignity",
    "security": "security",
    "slavery": "slavery",
    "privacy": "privacy",
    "conscience": "conscience",
    "expression": "expression",
    "medium": "medium",
    "information": "information",
    "association": "association",
    "assembly": "assembly",
    "political": "political",
    "movement": "movement",
    "property": "property",
    "work": "work",
    "environment": "environment",
    "economic": "economic",
    "language": "language",
    "family": "family",
    "consumer": "consumer",
    "administrative": "administrative",
    "justice": "justice",
    "arrest": "arrest",
    "fair hearing": "fair hearing",
    "custody": "custody",
    "interpret": "interpret",
    "infant": "infant",
    "disable": "disable",
    "youth": "youth",
    "minority": "minority",
    "old": "old",
    "emergency": "emergency",
    "national": "national",
    "principle land": "principle land",
    "classification land": "classification land",
    "public land": "public land",
    "community land": "community land",
    "private land": "private land",
    "landhold non citizen": "landhold non citizen",
    "regulation land use": "regulation land use",
    "land commission": " land commission",
    "land legislation": "land legislation",
    "obligation respect environment" : "obligation respect environment",
    "enforcement environmental right" : "enforcement environmental right",
    "agreement relating natural resource" : "agreement relating natural resource",
    "legislation environment" : "legislation environment",
    "responsibility leadership": "responsibility leadership",
    "oath office": "oath office",
    "conduct state": "conduct state",
    "financial probity": "financial probity",
    "restriction activity": "restriction activity",
    "citizenship leadership": "citizenship leadership",
    "establish ethic anti corruption": "establish ethic anti corruption",
    "legislation leadership": "legislation",
    "general principle electoral system": "general principle electoral system",
    "legislation election": "legislation election",
    "registration voter": "registration voter",
    "candidate election": "candidate election",
    "eligibility stand independent candidate": "eligibility stand independent candidate",
    "vote": "vote",
    "electoral dispute": "electoral dispute",
    "independent electoral boundary commission": "independent electoral boundary commission",
    "delimitation electoral unit": "delimitation electoral unit",
    "allocation party seat": "allocation party seat",
    "requirement political party": "requirement political party",
    "legislation political party": "legislation political party",
    "establishment parliament": "establishment parliament",
    "role parliament": "role parliament",
    "role national assembly": "role national assembly",
    "role senate": "role senate",
    "membership national assembly": "membership national assembly",
    "membership senate": "membership senate",
    "qualification member parliament": "qualification member parliament",
    "promotion representation marginalised group": "promotion representation marginalised group",
    "election member parliament": "election member parliament",
    "term parliament": "term parliament",
    "vacation office": "vacation office",
    "right recall": "right recall",
    "question membership": "question membership",
    "speaker parliament": "speaker parliament",
    "presiding parliament": "presiding parliament",
    "party leader": "party leader",
    "exercise legislative power": "exercise legislative power",
    "bill county government": "bill county government",
    "special bill county government": "special bill county government",
    "ordinary bill county government": "ordinary bill county government",
    "mediation committee": "mediation committee",
    "money bill": "money bill",
    "presidential assent": "presidential assent",
    "come force law": "come force law",
    "power privilege immunity": "power privilege immunity",
    "public access participation": "public access participation",
    "right petition parliament": "right petition parliament",
    "official language parliament": "official language parliament",
    "quorum": "quorum",
    "vote parliament": "vote parliament",
    "decision senate": "decision senate",
    "committee standing order": "committee standing order",
    "power call evidence": "power call evidence",
    "location sitting parliament": "location sitting parliament",
    "parliamentary service commission": "parliamentary service commission",
    "clerk staff parliament": "clerk staff parliament",
    "principle executive authority": "principle executive authority",
    "national executive": "national executive",
    "authority president": "authority president",
    "function president": "function president",
    "power mercy": "power mercy",
    "presidential powers temporary incumbency": "presidential powers temporary incumbency",
    "decision president": "decision president",
    "election president": "election president",
    "qualification disqualification election president": "qualification disqualification election president",
    "procedure presidential election": "procedure presidential election",
    "death assume office": "death assume office",
    "validity presidential election": "validity presidential election",
    "assumption office president": "assumption office president",
    "term office president": "term office president",
    "protection legal proceeding": "protection legal proceeding",
    "removal president incapacity": "removal president incapacity",
    "removal president impeachment": "removal president impeachment",
    "vacancy office president": "vacancy office president",
    "function deputy president": "function deputy president",
    "election deputy president": "election deputy president",
    "vacancy office deputy president": "vacancy office deputy president",
    "removal deputy president": "removal deputy president",
    "benefit president": "benefit president",
    "cabinet": "cabinet",
    "decision cabinet": "decision cabinet",
    "secretary cabinet": "secretary cabinet",
    "principal secretary": "principal secretary",
    "attorney general": "attorney general",
    "director public prosecution": "director public prosecution",
    "removal director public prosecution": "removal director public prosecution",
    "judicial authority": "judicial authority",
    "judicial independence" :"judiciary independence",
    "judicial offices": "judicial offices",
    "court systems": "court systems",
    "supreme court": "supreme court",
    "appeal court": "appeal court",
    "high court": "high court",
    "judicial appointments": "judicial appointments",
    "office tenure": "office tenure",
    "office removal": "office removal",
    "subordinate courts": "subordinate courts",
    "jsc establishment": "jsc establishment",
    "jsc functions": "jsc functions",
    "judiciary fund": "judiciary fund",
    "object devolution": "object devolution",
    "devolve government": "devolve government",
    "county government": "county government",
    "membership": "membership",
    "speaker": "speaker",
    "county executive committee": "county executive committee",
    "county election": "county election",
    "county removal": "county removal",
    "vacancy": "vacancy",
    "functions of county executive committees": "executive function",
    "city": "city",
    "legislative authority": "legislative authority",
    "function and power of national and county government": "function and power of national and county government",
    "transfer": "transfer",
    "county boundary": "county boundary",
    "cooperation": "cooperation",
    "support": "support",
    "conflict of law": "law conflict",
    "suspension": "suspension county government",
    "qualification": "qualification",
    "vacation": "vacation",
    "summon witness": "summon witness",
    "public participation county assembly power privilege immunity": "public participation county assembly power privilege immunity",
    "gender balance": "gender balance",
    "county transition": "transition",
    "publication": "publication",
    "legislation": "legislation",
    "principle public finance": "principle public finance",
    "equitable sharing of national revenue": "equitable sharing of national revenue",
    "equitable share law": "equitable share",
    "equalisation fund": "equalisation fund",
    "consultation": "consultation",
    "consolidated fund": "consolidate",
    "revenue funds": "revenue",
    "contingencies fund": "contingency",
    "tax impose": "tax",
    "national government borrow": "borrow",
    "loan guarantee": "loan",
    "public debt": "debt",
    "commission on revenue allocation": "commission",
    "division of revenue": "division",
    "budget content": "budget",
    "expenditure": "expenditure",
    "supplementary appropriation": "supplementary",
    "financial control": "control",
    "procurement": "procurement",
    "budget controller": "budget controller",
    "auditor-general": "auditor",
    "salary": "salary",
    "central bank of kenya": "bank",
    "principle": "principle",
    "public service commission": "public service commission",
    "functions": "functions",
    "staffing": "staffing",
    "protection": "protection",
    "teachers service commission": "teachers service commission",
    "national security": "national security",
    "national security organs": "national security organs",
    "national security council": "national security council",
    "defence forces": "defence forces",
    "national intelligence service": "national intelligence service",
    "national police service": "national police service",
    "functions national police service": "functions national police service",
    "command national police service": "command national police service",
    "national police service commission": "national police service commission",
    "police service": "police service",
    "application of chapter": "application of chapter",
    "commission independent office": "commission independent office",
    "term of office": "term of office",
    "removal from office": "removal from office",
    "general function power": "general function power",
    "incorporation commission independent office": "incorporation commission independent office",
    "reporting commission independent office": "reporting commission independent office",
    "amendment constitution":"amendment constitution",
    "parliamentary initiative":"parliamentary initiative",
    "popular initiative":"popular initiative",
    "enforcement constitution": "enforcement constitution",
    "construe constitution": "construing constitution",
    "interpretation": "interpretation",
    "consequential legislation": "consequential legislation",
    "consequential provision": "consequential provision",
    "effective date": "effective date",
    "repeal": "repeal"

}
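
The two dictionaries above appear intended to share the same keys, yet a few entries differ slightly (for example `"term office"` in `synonyms` versus `"term of office"` in `qa_mapping`). The following is a minimal sketch of a consistency check, not part of the original notebook: the helper names `find_mapping_gaps` and `find_duplicate_phrases` are hypothetical, and the sample dictionaries are small illustrative stand-ins modelled on the entries above.

```python
def find_mapping_gaps(synonyms, qa_mapping):
    """Return (keys only in synonyms, keys only in qa_mapping), each sorted."""
    only_in_synonyms = sorted(set(synonyms) - set(qa_mapping))
    only_in_mapping = sorted(set(qa_mapping) - set(synonyms))
    return only_in_synonyms, only_in_mapping


def find_duplicate_phrases(synonyms):
    """Return {key: [phrases listed more than once]} for lists with repeats."""
    duplicates = {}
    for key, phrases in synonyms.items():
        seen, repeated = set(), []
        for phrase in phrases:
            if phrase in seen and phrase not in repeated:
                repeated.append(phrase)
            seen.add(phrase)
        if repeated:
            duplicates[key] = repeated
    return duplicates


# Illustrative stand-ins (assumed for this sketch), shaped like the real dictionaries
sample_synonyms = {
    "quorum": ["quorum"],
    "term office": ["term office", "term office", "office term", "tenure condition"],
    "removal office": ["removal from office", "dismissal office", "termination office"],
}
sample_qa_mapping = {
    "quorum": "quorum",
    "term of office": "term of office",
    "removal from office": "removal from office",
}

missing_in_mapping, missing_in_synonyms = find_mapping_gaps(sample_synonyms, sample_qa_mapping)
print(missing_in_mapping)   # ['removal office', 'term office']
print(missing_in_synonyms)  # ['removal from office', 'term of office']
print(find_duplicate_phrases(sample_synonyms))  # {'term office': ['term office']}
```

Running the same two helpers on the full `synonyms` and `qa_mapping` dictionaries would list every key that one dictionary has and the other lacks, which is usually quicker than comparing several hundred lines by eye.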

In [97]:
combined_sections = {**chapter_1_sections, **chapter_2_sections, **chapter_3_sections, **chapter_4_sections,
                     **chapter_5_sections, **chapter_6_sections, **chapter_7_sections, **chapter_8_sections,
                     **chapter_9_sections, **chapter_10_sections, **chapter_11_sections,
                     **chapter_12_sections, **chapter_13_sections, **chapter_14_sections, **chapter_15_sections,
                     **chapter_16_sections, **chapter_17_sections, **chapter_18_sections}
# Count the sections in the combined dictionary
len(combined_sections)

264
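
Merging with `{**a, **b}` keeps only the last value for any repeated key, so a section name reused in two chapters would silently drop out of the merged dictionary. The sketch below shows one way to detect such collisions before merging. The helper `find_key_collisions` and the dictionaries `chapter_a` and `chapter_b` are hypothetical stand-ins, not names from the notebook.

```python
from collections import defaultdict


def find_key_collisions(named_dicts):
    """Map each key found in more than one dictionary to the names of those dictionaries."""
    owners = defaultdict(list)
    for name, mapping in named_dicts.items():
        for key in mapping:
            owners[key].append(name)
    return {key: names for key, names in owners.items() if len(names) > 1}


# Hypothetical stand-ins for two chapter dictionaries
chapter_a = {"legislation": "text A", "quorum": "text Q"}
chapter_b = {"legislation": "text B"}

merged = {**chapter_a, **chapter_b}  # the later dictionary wins on "legislation"
collisions = find_key_collisions({"chapter_a": chapter_a, "chapter_b": chapter_b})

print(len(merged))            # 2
print(merged["legislation"])  # text B
print(collisions)             # {'legislation': ['chapter_a', 'chapter_b']}
```

If the count of merged sections is lower than the sum of the individual chapter sizes, a check like this would show which keys were overwritten.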

# The function below is the main code and can be integrated with the Telegram chatbot interface

In [98]:
import spacy
from spellchecker import SpellChecker

# Load the English language model (requires spaCy; download the model first with: python -m spacy download en_core_web_sm)
nlp = spacy.load("en_core_web_sm")

sections = combined_sections

synonyms = {
    "supremacy": ["supremacy","ultimate power"],
    "sovereignty": ["sovereignty", "power of the people", "authority of the people", "self rule", "autonomy"],
    "defence": ["defense", "protection", "preservation"],
    "declaration republic": ["declaration republic" "proclamation republic"],
    "territory": ["territory"],
    "devolution": ["devolution", "decentralization", "delegation", "transfer of power", "local governance", "subsidiarity"],
    "languages": ["languages", "tongues", "dialects", "official languages", "linguistic diversity"],
    "religion": ["religion", "faith", "belief systems", "spiritual practice", "secularism", "church-state separation"],
    "symbol": ["symbol", "emblem", "insignia", "representation", "national icon"],
    "day": ["day", "holiday", "observance", "public holiday", "commemoration", "remembrance"],
    "national value principle governance": ["national value principle governance"],
    "governance": ["governance", "government", "administration", "management", "public service", "political structure"],
    "culture": ["culture", "heritage", "tradition", "customs", "societal norms", "arts"],
    "entitlement citizen": ["entitlement citizen", "entitlement right"],
    "retention": ["retention", "maintenance", "keeping", "preservation", "continuation"],
    "birth": ["birth", "nativity", "origin", "ancestry", "inborn citizenship"],
    "registration": ["registration", "enlistment", "enrollment", "citizenship application", "naturalization"],
    "dual": ["dual", "multiple", "dual nationality", "two-fold citizenship"],
    "revocation": ["revocation", "cancellation", "annulment", "rescission", "forfeiture", "withdrawal"],
    "legislation citizen": ["legislation citizen"],
    "fundamental right freedom": ["right fundamental freedom"],
    "application bill right": ["application", "exercise", "freedom enforcement"],
    "implementation right": ["implementation right fundamental freedom"],
    "authority court ": ["authority court","authority court bill rights" "uphold bill right"],
    "limitation right": ["limitation", "freedom restriction", "limit entitlement"],
    "limit": ["limit","absolute right", "inalienable freedom", "immutable right",],
    "life": ["life", "existence", "entitlement to life", "survive"],
    "equality": ["equality", "equal treatment", "justice"],
    "dignity": ["dignity", "intrinsic respect", "inherent value", "personal honor"],
    "security": ["security", "liberty", "personal safety"],
    "slavery": ["slavery", "bondage", "force servitude", "involuntary labor", "coerce work"],
    "privacy": ["privacy", "confidential", "personal space"],
    "conscience": ["conscience", "liberty thought", "religion freedom", "belief"],
    "expression": ["expression", "communication"],
    "medium": ["medium", "press autonomy", "journalist freedom", "right information"],
    "information": ["information", "transparent", "datum"],
    "association": ["association", "union", "group formation", "association right"],
    "assembly": ["assembly", "dissent", "peace assembly", "demonstrate", "petition right"],
    "political": ["political", "electoral right", "political engagement", "vote right"],
    "movement": ["movement", "travel", "resident freedom", "movement liberty"],
    "Property": ["Property", "property safeguard", "assets security"],
    "work": ["work", "labor relation", "labour practice"],
    "environment": ["environment"],
    "economic": ["economic"],
    "language culture": ["language culture"],
    "family": ["family", "society foundation","parent"],
    "consumer": ["consumer", "consumer right" "client entitlement"],
    "administrative action": ["administrative action", "fair administrative action"],
    "justice": ["justice", "legal access", "access justice", "justice access"],
    "arrest": ["arrest", "right arrest"],
    "fair hear": ["fair hear", "hear"],
    "custody": ["custody","held custody", "imprisoned"],
    "interpret": ["interpret", "explain", "clarify", "overview"],
    "infant": ["infant", "child", "toddler", "kid"],
    "disable": ["disable","disability", "handicap", "impairment", "challenge"],
    "youth": ["youth", "young adult", "adolescent"],
    "minority": ["minority", "marginalise", "marginalize"],
    "old": ["old", "elder", "veteran"],
    "emergency": ["emergency","state emergency"],
    "national human right commission": ["national human right","human right equality commission","national human right equality commission"],
    "principle land": ["principle land", "land management principle", "land policy"],
    "classification land": ["classification land", "land category", "type land"],
    "public land": ["public land", "government land", "state property", "national land", "public landhold"],
    "community land": ["community land", "ethnic land", "cultural landhold", "community land"],
    "private land": ["private land", "individual landhold", "personal property", "freehold land"],
    "landhold non citizen" : ["landhold non citizen", "foreign lease", "alien land", "non citizen land"],
    "regulation land use" : ["regulation land use", "land use policy", "property regulation", "land oversight"],
    "land commission" : ["land commission", "land authority", "public land commission", "land policy agency"],
    "land legislation" : ["land legislation", "property law", "land use regulation", "land act"],
    "obligation respect environment" : ["obligation respect environment"],
    "enforcement environmental right" : ["enforcement environmental right"],
    "agreement relating natural resource" : ["agreement relating natural resource"],
    "legislation environment" : ["legislation environment"],
    "responsibility leadership": ["responsibility leadership"],
    "oath office": ["oath office", "state office affirmation"],
    "conduct state": ["conduct state", "behaviour state"],
    "financial probity": ["financial probity"],
    "restriction activity": ["restriction activity"],
    "citizenship leadership": ["citizenship leadership"],
    "establish ethic anti corruption": ["establish ethic anti corruption"],
    "legislation leadership": ["legislation leadership"],
    "general principle electoral system": ["general principle electoral system"],
    "legislation election": ["legislation election"],
    "registration voter": ["registration voter"],
    "candidate election": ["candidate election"],
    "eligibility stand independent candidate": ["eligibility stand independent candidate"],
    "vote": ["vote"],
    "electoral dispute": ["electoral dispute"],
    "independent electoral boundary commission": ["independent electoral boundary commission"],
    "delimitation electoral unit": ["delimitation electoral unit"],
    "allocation party seat": ["allocation party seat"],
    "requirement political party": ["requirement political party"],
    "legislation political party": ["legislation political party"],
    "establishment parliament": ["establishment parliament"],
    "role parliament": ["role parliament"],
    "role national assembly": ["role national assembly"],
    "role senate": ["role senate"],
    "membership national assembly": ["membership national assembly"],
    "membership senate": ["membership senate"],
    "qualification member parliament": ["qualification member parliament"],
    "promotion representation marginalised group": ["promotion representation marginalised group"],
    "election member parliament": ["election member parliament"],
    "term parliament": ["term parliament"],
    "vacation office": ["vacation office"],
    "right recall": ["right recall"],
    "question membership": ["question membership"],
    "speaker parliament": ["speaker parliament"],
    "presiding parliament": ["presiding parliament"],
    "party leader": ["party leader"],
    "exercise legislative power": ["exercise legislative power"],
    "bill county government": ["bill county government"],
    "special bill county government": ["special bill county government"],
    "ordinary bill county government": ["ordinary bill county government"],
    "mediation committee": ["mediation committee"],
    "money bill": ["money bill"],
    "presidential assent": ["presidential assent"],
    "come force law": ["come force law"],
    "power privilege immunity": ["power privilege immunity"],
    "public access participation": ["public access participation"],
    "right petition parliament": ["right petition parliament"],
    "official language parliament": ["official language parliament"],
    "quorum": ["quorum"],
    "vote parliament": ["vote parliament"],
    "decision senate": ["decision senate"],
    "committee standing order": ["committee standing order"],
    "power call evidence": ["power call evidence"],
    "location sitting parliament": ["location sitting parliament"],
    "parliamentary service commission": ["parliamentary service commission"],
    "clerk staff parliament": ["clerk staff parliament"],
    "principle executive authority": ["principle executive authority"],
    "national executive": ["national executive"],
    "authority president": ["authority president"],
    "function president": ["function president"],
    "power mercy": ["power mercy"],
    "presidential powers temporary incumbency": ["presidential powers temporary incumbency"],
    "decision president": ["decision president"],
    "election president": ["election president"],
    "qualification disqualification election president": ["qualification disqualification election president"],
    "procedure presidential election": ["procedure presidential election"],
    "death assume office": ["death assume office"],
    "validity presidential election": ["validity presidential election"],
    "assumption office president": ["assumption office president"],
    "term office president": ["term office president"],
    "protection legal proceeding": ["protection legal proceeding"],
    "removal president incapacity": ["removal president incapacity"],
    "removal president impeachment": ["removal president impeachment", "impeachment president"],
    "vacancy office president": ["vacancy office president"],
    "function deputy president": ["function deputy president"],
    "election deputy president": ["election deputy president"],
    "vacancy office deputy president": ["vacancy office deputy president"],
    "removal deputy president": ["removal deputy president", "impeachment deputy president"],
    "benefit president": ["benefit president"],
    "cabinet": ["cabinet"],
    "decision cabinet": ["decision cabinet"],
    "secretary cabinet": ["secretary cabinet"],
    "principal secretary": ["principal secretary"],
    "attorney general": ["attorney general"],
    "director public prosecution": ["director public prosecution"],
    "removal director public prosecution": ["removal director public prosecution"],
    "judicial authority": ["judicial authority","legal authority","court jurisdiction"],
    "judicial independence": ["judicial independence","judicial autonomy","judicial impartiality"],
    "judicial offices": ["judicial office","legal position","judicial role"],
    "court systems": ["court system","judicial system","legal system"],
    "supreme court": ["supreme court","highest court","apex court"],
    "appeal court": ["appeal court","appellate court","review court"],
    "high court": ["high court","superior court"],
    "judicial appointments": ["judicial appointment","judge appointment","judicial selection"],
    "office tenure": ["office tenure","office term","service duration"],
    "office removal": ["office removal","office dismissal","office termination"],
    "subordinate courts": ["subordinate court","local court","inferior court"],
    "jsc establishment": ["judicial service commission establishment","judicial service commission creation","judicial service commission formation"],
    "jsc functions": ["judicial service commission function","judicial service commission role","judicial service commission responsibility"],
    "judiciary fund": ["judiciary fund","judicial fund","legal fund"],
    "object devolution": ["object devolution"],
    "devolved government": ["devolved government", "principle devolved government"],
    "county government": ["county government"],
    "membership county assembly": ["membership county assembly", "member county assembly"],
    "speaker county assembly": ["speaker county assembly"],
    "county executive committee": ["county executive committee", "county executive"],
    "election governor": ["election governor", "election deputy governor", "election county governor deputy county governor"],
    "removal governor": ["removal governor", "removal county governor"],
    "vacancy office governor": ["vacancy office governor", "vacancy governor"],
    "function county executive committee": ["function county executive committee", "function county executive"],
    "urban area": ["urban area", "city", "urban area city"],
    "legislative authority county assembly": ["legislative authority county assembly", "authority county assembly"],
    "power national government": ["power national government", "power county government", "power national county government", "function national government", "function county government"],
    "transfer power level government": ["transfer power level government", "transfer function level government", "transfer power government", "transfer function government", "transfer function power level government"],
    "boundary county": ["boundary county"],
    "cooperation national county government": ["cooperation national county government", "cooperation government"],
    "support county government": ["support county government", "assistance county government"],
    "conflict law": ["conflict law"],
    "suspension county government": ["suspension county government", "suspension government"],
    "qualification election member county assembly": ["qualification election member county assembly", "election county assembly"],
    "vacation office member county assembly": ["vacation office member county assembly", "vacation county assembly", "vacation member county"],
    "county assembly power summon witness": ["county assembly power summon witness", "power summon witness"],
    "public participation county assembly power": ["public participation county assembly power", "power county assembly", "county assembly power", "public participation county assembly power immunity"],
    "county assembly gender balance": ["county assembly gender balance", "county assembly diversity", "county gender balance", "county diversity"],
    "county government transition": ["county government transition", "transition county government"],
    "publication county legislation": ["publication county legislation"],
    "legislation chapter": ["legislation chapter"],
    "principle public finance": ["principle public finance", "public finance guideline"],
    "equitable share national revenue": ["equitable share national revenue", "fair distribution revenue"],
    "equitable share": ["equitable share", "financial law", "equitable sharing"],
    "equalisation fund": ["equalisation fund", "equalization fund"],
    "consultation financial legislation": ["consultation financial legislation"],
    "consolidate fund": ["consolidate fund", "public fund"],
    "revenue fund": ["revenue fund", "revenue fund county"],
    "contingency fund": ["contingency fund"],
    "power impose tax": ["power impose tax", "power impose charge"],
    "imposition tax": ["imposition tax"],
    "borrow national government": ["borrow national government"],
    "borrow county": ["borrow county"],
    "loan guarantee": ["loan guarantee", "loan guarantee national government"],
    "public debt": ["public debt"],
    "commission revenue allocation": ["commission revenue allocation", "revenue allocation"],
    "function commission": ["function commission", "function commission revenue allocation"],
    "division revenue": ["division revenue"],
    "annual division revenue": ["annual division revenue", "annual allocation"],
    "transfer equitable share": ["transfer equitable share"],
    "budget form content": ["budget form content"],
    "budget estimate": ["budget estimate"],
    "expenditure budget": ["expenditure budget"],
    "supplementary appropriation": ["supplementary appropriation"],
    "county appropriation bill": ["county appropriation bill"],
    "financial control": ["financial control"],
    "account audit": ["account audit", "account audit public entity", "audit entity"],
    "procurement public good": ["procurement public good", "procurement private good", "procurement service"],
    "controller budget": ["controller budget"],
    "auditor general": ["auditor general"],
    "salary remuneration commission": ["salary remuneration commission", "salary remuneration"],
    "central bank kenya": ["central bank kenya", "central bank"],
    "principle": ["principle", "value principle public service", "public service value", "core principle public service"],
    "public service commission": ["public service commission", "public service authority", "government service commission"],
    "functions": ["function", "function power public service commission", "role public service commission", "duty public service commission"],
    "staffing": ["staffing", "staffing county government", "county government staffing", "human resource county government"],
    "protection": ["protection", "protection public officer", "safeguard public officer", "security public official"],
    "teachers service commission": ["teachers service commission", "commission for teachers", "educators service commission"],
    "principle national security": ["principle national security"],
    "national security organ": ["national security organ", "security organ", "national security agency"],
    "national security council": ["national security council", "establishment national security council", "national security council formation", "creation national security council"],
    "defence force": ["defence force", "defence force defence council", "defence force establishment"],
    "national intelligence service": ["national intelligence service"],
    "national police service": ["national police service", "establishment national police service", "national police service formation", "creation national police service"],
    "function national police service": ["function national police service"],
    "command national police service": ["command national police service"],
    "national police service commission": ["national police service commission", "police service commission"],
    "police service": ["police service", "additional police service"],
    "application chapter": ["application chapter", "chapter application", "provision chapter"],
    "commission independent office": ["commission independent office", "independent commission", "commission office"],
    "term office": ["term office", "office term", "tenure condition"],
    "removal office": ["removal office", "dismissal office", "termination office"],
    "general function power": ["general function power", "overall power", "general power function"],
    "incorporation commission independent office": ["incorporation commission independent office", "establishment commission", "set independent office"],
    "report commission independent office": ["report commission independent office", "reporting commission independent office", "commission report", "report independent office"]


}

# Define qa mapping explicitly too
qa_mapping = {
    "supremacy": "supremacy",
    "sovereignty": "sovereignty",
    "defence": "defence",
    "declaration republic": "declaration republic",
    "territory": "territory", 
    "devolution": "devolution",
    "language": "languages",
    "religion": "religion",
    "symbol": "symbol",
    "day": "day",
    "national value principle governance": "national value principle governance",
    "culture": "culture",
    "entitlement citizen": "entitlement citizen",
    "retention": "retention",  
    "birth": "birth",
    "registration": "registration",
    "dual": "dual",
    "revocation": "revocation",
    "legislation citizen": "legislation citizen",
    "fundamental right freedom": "right fundamental freedom",
    "application bill right": "application",
    "implementation right": "implementation right",
    "authority bill right": "authority bill right",
    "limitation right": "limitation right",
    "limit": "limit",
    "life": "life",
    "equality": "equality",
    "dignity": "dignity",
    "security": "security",
    "slavery": "slavery",
    "privacy": "privacy",
    "conscience": "conscience",
    "expression": "expression",
    "medium": "medium",
    "information": "information",
    "association": "association",
    "assembly": "assembly",
    "political": "political",
    "movement": "movement",
    "property": "property",
    "work": "work",
    "environment": "environment",
    "economic": "economic",
    "language culture": "language culture",
    "family": "family",
    "consumer": "consumer",
    "administrative": "administrative",
    "justice": "justice",
    "arrest": "arrest",
    "fair hearing": "fair hearing",
    "custody": "custody",
    "interpret": "interpret",
    "infant": "infant",
    "disable": "disable",
    "youth": "youth",
    "minority": "minority",
    "old": "old",
    "emergency": "emergency",
    "national": "national",
    "principle land": "principle land",
    "classification land": "classification land",
    "public land": "public land",
    "community land": "community land",
    "private land": "private land",
    "landhold non citizen": "landhold non citizen",
    "regulation land use": "regulation land use",
    "land commission": "land commission",
    "land legislation": "land legislation",
    "obligation respect environment" : "obligation respect environment",
    "enforcement environmental right" : "enforcement environmental right",
    "agreement relating natural resource" : "agreement relating natural resource",
    "legislation environment" : "legislation environment",
    "responsibility leadership": "responsibility leadership",
    "oath office": "oath office",
    "conduct state": "conduct state",
    "financial probity": "financial probity",
    "restriction activity": "restriction activity",
    "citizenship leadership": "citizenship leadership",
    "establish ethic anti corruption": "establish ethic anti corruption",
    "legislation leadership": "legislation",
    "general principle electoral system": "general principle electoral system",
    "legislation election": "legislation election",
    "registration voter": "registration voter",
    "candidate election": "candidate election",
    "eligibility stand independent candidate": "eligibility stand independent candidate",
    "vote": "vote",
    "electoral dispute": "electoral dispute",
    "independent electoral boundary commission": "independent electoral boundary commission",
    "delimitation electoral unit": "delimitation electoral unit",
    "allocation party seat": "allocation party seat",
    "requirement political party": "requirement political party",
    "legislation political party": "legislation political party",
    "establishment parliament": "establishment parliament",
    "role parliament": "role parliament",
    "role national assembly": "role national assembly",
    "role senate": "role senate",
    "membership national assembly": "membership national assembly",
    "membership senate": "membership senate",
    "qualification member parliament": "qualification member parliament",
    "promotion representation marginalised group": "promotion representation marginalised group",
    "election member parliament": "election member parliament",
    "term parliament": "term parliament",
    "vacation office": "vacation office",
    "right recall": "right recall",
    "question membership": "question membership",
    "speaker parliament": "speaker parliament",
    "presiding parliament": "presiding parliament",
    "party leader": "party leader",
    "exercise legislative power": "exercise legislative power",
    "bill county government": "bill county government",
    "special bill county government": "special bill county government",
    "ordinary bill county government": "ordinary bill county government",
    "mediation committee": "mediation committee",
    "money bill": "money bill",
    "presidential assent": "presidential assent",
    "come force law": "come force law",
    "power privilege immunity": "power privilege immunity",
    "public access participation": "public access participation",
    "right petition parliament": "right petition parliament",
    "official language parliament": "official language parliament",
    "quorum": "quorum",
    "vote parliament": "vote parliament",
    "decision senate": "decision senate",
    "committee standing order": "committee standing order",
    "power call evidence": "power call evidence",
    "location sitting parliament": "location sitting parliament",
    "parliamentary service commission": "parliamentary service commission",
    "clerk staff parliament": "clerk staff parliament",
    "principle executive authority": "principle executive authority",
    "national executive": "national executive",
    "authority president": "authority president",
    "function president": "function president",
    "power mercy": "power mercy",
    "presidential powers temporary incumbency": "presidential powers temporary incumbency",
    "decision president": "decision president",
    "election president": "election president",
    "qualification disqualification election president": "qualification disqualification election president",
    "procedure presidential election": "procedure presidential election",
    "death assume office": "death assume office",
    "validity presidential election": "validity presidential election",
    "assumption office president": "assumption office president",
    "term office president": "term office president",
    "protection legal proceeding": "protection legal proceeding",
    "removal president incapacity": "removal president incapacity",
    "removal president impeachment": "removal president impeachment",
    "vacancy office president": "vacancy office president",
    "function deputy president": "function deputy president",
    "election deputy president": "election deputy president",
    "vacancy office deputy president": "vacancy office deputy president",
    "removal deputy president": "removal deputy president",
    "benefit president": "benefit president",
    "cabinet": "cabinet",
    "decision cabinet": "decision cabinet",
    "secretary cabinet": "secretary cabinet",
    "principal secretary": "principal secretary",
    "attorney general": "attorney general",
    "director public prosecution": "director public prosecution",
    "removal director public prosecution": "removal director public prosecution",
    "judicial authority": "judicial authority",
    "judicial independence" :"judiciary independence",
    "judicial offices": "judicial offices",
    "court systems": "court systems",
    "supreme court": "supreme court",
    "appeal court": "appeal court",
    "high court": "high court",
    "judicial appointments": "judicial appointments",
    "office tenure": "office tenure",
    "office removal": "office removal",
    "subordinate courts": "subordinate courts",
    "jsc establishment": "jsc establishment",
    "jsc functions": "jsc functions",
    "judiciary fund": "judiciary fund",
    "object devolution": "object devolution",
    "devolved government": "devolved government",
    "county government": "county government",
    "membership county assembly": "membership county assembly",
    "speaker county assembly": "speaker county assembly",
    "county executive committee": "county executive committee",
    "election governor": "election governor",
    "removal governor": "removal governor",
    "vacancy office governor": "vacancy office governor",
    "function county executive committee": "function county executive committee",
    "urban area": "urban area",
    "legislative authority county assembly": "legislative authority county assembly",
    "power national government": "power national government",
    "transfer power level government": "transfer power level government",
    "boundary county": "boundary county",
    "cooperation national county government": "cooperation national county government",
    "support county government": "support county government",
    "conflict law": "conflict law",
    "suspension county government": "suspension county government",
    "qualification election member county assembly": "qualification election member county assembly",
    "vacation office member county assembly": "vacation office member county assembly",
    "county assembly power summon witness": "county assembly power summon witness",
    "public participation county assembly power": "public participation county assembly power",
    "county assembly gender balance": "county assembly gender balance",
    "county government transition": "county government transition",
    "publication county legislation": "publication county legislation",
    "legislation chapter": "legislation chapter",
    "principle public finance": "principle public finance",
    "equitable share national revenue": "equitable share national revenue",
    "equitable sharing": "equitable sharing",
    "equalisation fund": "equalisation fund",
    "consultation financial legislation": "consultation financial legislation",
    "consolidate fund": "consolidate fund",
    "revenue fund": "revenue fund",
    "contingency fund": "contingency fund",
    "power impose tax": "power impose tax",
    "imposition tax": "imposition tax",
    "borrow national government": "borrow national government",
    "borrow county": "borrow county",
    "loan guarantee": "loan guarantee",
    "public debt": "public debt",
    "commission revenue allocation": "commission revenue allocation",
    "function commission": "function commission",
    "division revenue": "division revenue",
    "annual division revenue": "annual division revenue",
    "transfer equitable share": "transfer equitable share",
    "budget form content": "budget form content",
    "budget estimate": "budget estimate",
    "expenditure budget": "expenditure budget",
    "supplementary appropriation": "supplementary appropriation",
    "county appropriation bill": "county appropriation bill",
    "financial control": "financial control",
    "account audit": "account audit",
    "procurement public good": "procurement public good",
    "controller budget": "controller budget",
    "auditor general": "auditor general",
    "salary remuneration commission": "salary remuneration commission",
    "central bank kenya": "central bank kenya",
    "principle": "principle",
    "public service commission": "public service commission",
    "functions": "functions",
    "staffing": "staffing",
    "protection": "protection",
    "teachers service commission": "teachers service commission",
    "principle national security": "principle national security",
    "national security organ": "national security organ",
    "national security council": "national security council",
    "defence force": "defence force",
    "national intelligence service": "national intelligence service",
    "national police service": "national police service",
    "function national police service": "function national police service",
    "command national police service": "command national police service",
    "national police service commission": "national police service commission",
    "police service": "police service",
    "application chapter": "application of chapter",
    "commission independent office": "commission independent office",
    "term office": "term of office",
    "removal office": "removal from office",
    "general function power": "general function power",
    "incorporation commission independent office": "incorporation commission independent office",
    "report commission independent office": "reporting commission independent office",
    "amendment constitution": "amendment constitution",
    "parliamentary initiative": "parliamentary initiative",
    "popular initiative": "popular initiative",
    "enforcement constitution": "enforcement constitution",
    "construe constitution": "construing constitution",
    "interpretation": "interpretation",
    "consequential legislation": "consequential legislation",
    "consequential provision": "consequential provision",
    "effective date": "effective date",
    "repeal": "repeal"

}
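
# Optional consistency check (a sketch, not part of the original notebook).
# match_with_synonyms below only consults synonyms[key] for keys that also
# exist in qa_mapping, so a synonyms entry whose key is spelled differently
# is silently ignored. The helper name `unused_synonym_keys` is hypothetical.
def unused_synonym_keys(qa_mapping, synonyms):
    """Return the synonym keys that have no matching key in qa_mapping."""
    return sorted(key for key in synonyms if key not in qa_mapping)

# Example use: print(unused_synonym_keys(qa_mapping, synonyms))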


# Preprocess the user query using spaCy
def preprocess_query(query):
    doc = nlp(query)
    tokens = [token.lemma_.lower() for token in doc if not token.is_stop and not token.is_punct]
    print(tokens)
    return " ".join(tokens)

# Function to correct spelling
def correct_spelling(processed_query):
    spell = SpellChecker()
    words = processed_query.split()
    # Find misspelled words
    misspelled_words = spell.unknown(words)

    corrected_words = []
    for word in words:
        if word in misspelled_words:
            # correction() may return None when it finds no candidate,
            # so fall back to the original word to keep the join below safe
            corrected_words.append(spell.correction(word) or word)
        else:
            corrected_words.append(word)

    # Reconstruct the initial sentence
    corrected_input = " ".join(corrected_words)
    return corrected_input

# Match with synonym and citizenship support
def match_with_synonyms(query, qa_mapping, synonyms, citizenship_mapping):
    processed_query = preprocess_query(query)
    corrected_query = correct_spelling(processed_query)

    # Check for specific citizenship subtopics first
    for subtopic, section_key in citizenship_mapping.items():
        if subtopic in corrected_query:
            return section_key  # Return the section key for the specific subtopic

    # Check for general citizenship if no specific subtopic was found
    if "citizenship" in corrected_query:
        return "citizenship"  # Return a placeholder indicating interest in citizenship

    # Look for synonym matches
    for key in qa_mapping:
        for synonym in synonyms.get(key, [key]):
            if synonym in corrected_query:
                return key  # Return the key if a match is found

    # No match found, return None or a generic response
    return None
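
# Optional stricter matcher (a sketch, not part of the original notebook).
# The `synonym in corrected_query` test above is a plain substring check, so a
# short key such as "old" also matches inside "household". The hypothetical
# helper below only accepts matches on whole-word boundaries and could replace
# that check if this turns out to be a problem.
import re

def contains_phrase(phrase, text):
    """Return True if `phrase` occurs in `text` as whole words."""
    pattern = r"(?<!\w)" + re.escape(phrase) + r"(?!\w)"
    return re.search(pattern, text) is not None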

# Answer function
def answer_question_nlp(query, sections, qa_mapping, synonyms, citizenship_mapping):
    section_key = match_with_synonyms(query, qa_mapping, synonyms, citizenship_mapping)
    
    if section_key is None:
        return "Sorry, I couldn't find an answer to your question."

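    # Custom priority matching; more flexible rules can be added here.
    # Lowercase the raw query so the keyword checks below are case-insensitive.
    query = query.lower()
    prioritized_keys = []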
    if "language" in query and "culture" in query:
        prioritized_keys.append('language culture')
    elif "implementation" in query and "rights" in query:
        prioritized_keys.append("implementation right")
    elif "authority" in query and "court" in query and "bill" in query and "right" in query:
        prioritized_keys.append("authority court bill right")
    elif "secretary" in query and "cabinet" in query:
        prioritized_keys.append("secretary cabinet")
    elif "special" in query:
        prioritized_keys.append("special bill county government")
    elif "ordinary" in query and "county" in query:
        prioritized_keys.append("ordinary bill county government")
    elif "vote" in query and "parliament" in query:
        prioritized_keys.append("vote parliament")
    elif "defence" in query and "forces" in query:
        prioritized_keys.append("defence force")
    elif "legislative" in query and "authority" in query:
        prioritized_keys.append("legislative authority")
    elif "equitable" in query and "sharing" in query and "revenue" in query:
        prioritized_keys.append("equitable share national revenue")
    elif "national" in query and "debt" in query:
        prioritized_keys.append("national debt")
    elif "commission" in query and "revenue" in query:
        prioritized_keys.append("commission revenue allocation")
    elif "central" in query and "financial" in query and "authority" in query:
        prioritized_keys.append("central financial authority")
    elif "division" in query and "revenue" in query:
        prioritized_keys.append("division revenue")
    elif "revenue" in query and "distribution" in query:
        prioritized_keys.append("revenue distribution")
    elif "allocation" in query and "revenue" in query:
        prioritized_keys.append("allocation of revenue")
    elif "budget" in query and "content" in query:
        prioritized_keys.append("budget content")
    elif "budget" in query and "structure" in query:
        prioritized_keys.append("budget structure")
    elif "budget" in query and "framework" in query:
        prioritized_keys.append("budget framework")
    elif "amendment" in query and "parliamentary" in query:
        prioritized_keys.append("parliamentary initiative")
    elif "amendment" in query and "popular" in query:
        prioritized_keys.append("popular initiative")
    elif "decision" in query and "cabinet" in query:
        prioritized_keys.append("decision cabinet")
    elif "incorporation" in query and "independent" in query:
        prioritized_keys.append("incorporation commission independent office")
    elif "principle" in query and "security" in query:
        prioritized_keys.append("principle national security")
    elif "national" in query and "security" in query and "organ" in query:
        prioritized_keys.append("national security organ")
    elif "security" in query and "council" in query:
        prioritized_keys.append("national security council")
    elif "function" in query and "police" in query:
        prioritized_keys.append("function national police service")
    elif "command" in query and "police" in query:
        prioritized_keys.append("command national police service")
    elif "police" in query and "commission" in query:
        prioritized_keys.append("national police service commission")
    elif "national" in query and "police" in query:
        prioritized_keys.append("national police service")

    # Process the priorities first
    for key in prioritized_keys:
        if key in sections:
            return sections[key]
    
    # If no specific priority match found, return the default section
    if section_key in sections:
        return sections[section_key]
    elif section_key == "citizenship":
        return f"It seems you're interested in citizenship. Available subtopics include: {list(citizenship_mapping.keys())}."
    
    return "Sorry, I couldn't find an answer to your question."

# Example usage
user_query = "What about command national police service ?"
answer = answer_question_nlp(user_query, sections, qa_mapping, synonyms, citizenship_mapping)
print(answer)

['command', 'national', 'police', 'service']
Command of the National Police Service.
245. (1) There is established the office of the Inspector-General
of the National Police Service.
(2) The Inspector-General—
(a) is appointed by the President with the approval of Parliament;
and
(b) shall exercise independent command over the National Police
Service, and perform any other functions prescribed by
national legislation.
148 Constitution of Kenya, 2010
(3) The Kenya Police Service and the Administration Police
Service shall each be headed by a Deputy Inspector-General
appointed by the President in accordance with the recommendation of
the National Police Service Commission.
(4) The Cabinet secretary responsible for police services may
lawfully give a direction to the Inspector-General with respect to any
matter of policy for the National Police Service, but no person may
give a direction to the Inspector-General with respect to—
(a) the investigation of any particular offence or offences;
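
The keyword checks in `answer_question_nlp` could also be written as a rule table, which keeps the ordering visible and makes new rules a one-line change. The following is only a minimal sketch under stated assumptions: `PRIORITY_RULES` and `find_priority_key` are hypothetical names that do not appear in the notebook, only a few of the rules are reproduced, and the query is lowercased before matching.

```python
# Hypothetical rule table: each entry pairs the keywords that must all appear
# in the query with the section key to return. Order matters, because the
# first matching rule wins, as in the if/elif chain.
PRIORITY_RULES = [
    (("language", "culture"), "language culture"),
    (("command", "police"), "command national police service"),
    (("police", "commission"), "national police service commission"),
    (("national", "police"), "national police service"),
]


def find_priority_key(query, rules=PRIORITY_RULES):
    """Return the section key of the first rule whose keywords all occur in the query."""
    text = query.lower()  # assumption: matching should ignore case
    for keywords, section_key in rules:
        # Substring test, the same kind of check as `"police" in query`
        if all(keyword in text for keyword in keywords):
            return section_key
    return None


print(find_priority_key("What about command national police service ?"))
# prints: command national police service
```

Because the rules are checked in order, more specific entries (such as "command" plus "police") have to stay above broader ones (such as "national" plus "police").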

In [99]:
import spacy
from spellchecker import SpellChecker
from telegram import Update
from telegram.ext import ApplicationBuilder, CommandHandler, MessageHandler, filters, CallbackContext
import nest_asyncio
import asyncio
import os
from dotenv import load_dotenv
load_dotenv()  # Loads the .env file
bot_token = os.getenv("BOT_TOKEN")

# Apply the nest_asyncio patch
nest_asyncio.apply()

# Initialize spaCy and load the language model
nlp = spacy.load("en_core_web_sm")

# Initialize spell checker
spell = SpellChecker()

# Function to preprocess query
def preprocess_query(query):
    doc = nlp(query)
    tokens = [token.lemma_.lower() for token in doc if not token.is_stop and not token.is_punct]
    return " ".join(tokens)

# Function to correct spelling
def correct_spelling(processed_query):
    words = processed_query.split()
    misspelled_words = spell.unknown(words)
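    # spell.correction() can return None when no candidate is found, so fall back to the original word
    corrected_words = [(spell.correction(word) or word) if word in misspelled_words else word for word in words]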
    return " ".join(corrected_words)

# Function to match with synonym support
def match_with_synonyms(query, qa_mapping, synonyms, citizenship_mapping):
    processed_query = preprocess_query(query)
    corrected_query = correct_spelling(processed_query)

    # Check for specific citizenship subtopics first
    for subtopic, section_key in citizenship_mapping.items():
        if subtopic in corrected_query:
            return section_key

    # General citizenship check
    if "citizenship" in corrected_query:
        return "citizenship"

    # Synonym matching
    for key in qa_mapping:
        for synonym in synonyms.get(key, [key]):
            if synonym in corrected_query:
                return key
    return None

# Answer function
def answer_question_nlp(query):
    # Lowercase once so the keyword checks below also match capitalized input
    query = query.lower()
    prioritized_keys = []

    if "language" in query and "culture" in query:
        prioritized_keys.append('language culture')
    elif "implementation" in query and "rights" in query:
        prioritized_keys.append("implementation right")
    elif "authority" in query and "court" in query and "bill" in query and "right" in query:
        prioritized_keys.append("authority court bill right")
    elif "secretary" in query and "cabinet" in query:
        prioritized_keys.append("secretary cabinet")
    elif "special" in query:
        prioritized_keys.append("special bill county government")
    elif "ordinary" in query and "county" in query:
        prioritized_keys.append("ordinary bill county government")
    elif "vote" in query and "parliament" in query:
        prioritized_keys.append("vote parliament")
    elif "defence" in query and "forces" in query:
        prioritized_keys.append("defence forces")
    elif "legislative" in query and "authority" in query:
        prioritized_keys.append("legislative authority")
    elif "equitable" in query and "sharing" in query and "revenue" in query:
        prioritized_keys.append("equitable share national revenue")
    elif "national" in query and "debt" in query:
        prioritized_keys.append("national debt")
    elif "commission" in query and "revenue" in query:
        prioritized_keys.append("commission on revenue allocation")
    elif "central" in query and "financial" in query and "authority" in query:
        prioritized_keys.append("central financial authority")
    elif "division" in query and "revenue" in query:
        prioritized_keys.append("division revenue")
    elif "revenue" in query and "distribution" in query:
        prioritized_keys.append("revenue distribution")
    elif "allocation" in query and "revenue" in query:
        prioritized_keys.append("allocation of revenue")
    elif "budget" in query and "content" in query:
        prioritized_keys.append("budget content")
    elif "budget" in query and "structure" in query:
        prioritized_keys.append("budget structure")
    elif "budget" in query and "framework" in query:
        prioritized_keys.append("budget framework")
    elif "amendment" in query and "parliamentary" in query:
        prioritized_keys.append("parliamentary initiative")
    elif "amendment" in query and "popular" in query:
        prioritized_keys.append("popular initiative")
    elif "decision" in query and "cabinet" in query:
        prioritized_keys.append("decision cabinet")
    elif "incorporation" in query and "independent" in query:
        prioritized_keys.append("incorporation commission independent office")
    elif "principle" in query and "security" in query:
        prioritized_keys.append("principle national security")
    elif "national" in query and "security" in query and "organ" in query:
        prioritized_keys.append("national security organ")
    elif "security" in query and "council" in query:
        prioritized_keys.append("national security council")
    elif "function" in query and "police" in query:
        prioritized_keys.append("function national police service")
    elif "command" in query and "police" in query:
        prioritized_keys.append("command national police service")
    elif "police" in query and "commission" in query:
        prioritized_keys.append("national police service commission")
    elif "national" in query and "police" in query:
        prioritized_keys.append("national police service")
    
    # Check for a prioritized match
    for key in prioritized_keys:
        if key in sections:
            return sections[key]
    
    # Fallback to general synonym mapping if no priority match is found
    section_key = match_with_synonyms(query, qa_mapping, synonyms, citizenship_mapping)
    if section_key in sections:
        return sections[section_key]
    elif section_key == "citizenship":
        return (f"It seems you're interested in citizenship. "
                f"Available subtopics include: {list(citizenship_mapping.keys())}.")
    return "Sorry, I couldn't find an answer to your question."

# Handler for messages
async def handle_message(update: Update, context: CallbackContext) -> None:
    user_query = update.message.text
    answer = answer_question_nlp(user_query)
    await update.message.reply_text(answer)

# Main function to set up the bot
async def main() -> None:
    # Build the application with the token read from BOT_TOKEN in the .env file
    application = ApplicationBuilder().token(bot_token).build()
    
    # Adding the message handler for text messages
    application.add_handler(MessageHandler(filters.TEXT & ~filters.COMMAND, handle_message))
    
    # Start the bot
    await application.initialize()
    await application.start()
    await application.updater.start_polling()

# Run the bot in the script
if __name__ == "__main__":
    asyncio.run(main())


## For reproducibility, the final function will be exported to a Python script

In [100]:
import json

In [101]:
# Save the combined sections to a JSON file
with open("combined_sections.json", "w") as json_file:
    json.dump(combined_sections, json_file, indent=4, ensure_ascii=False)

In [102]:
# Save the synonyms as JSON (the filename has no .json extension)
with open("synonyms", "w") as json_file:
    json.dump(synonyms, json_file, indent=4, ensure_ascii=False)

In [103]:
# Save the QA mapping as JSON (the filename has no .json extension)
with open("qa_mapping", "w") as json_file:
    json.dump(qa_mapping, json_file, indent=4, ensure_ascii=False)

In [104]:
# Save the citizenship mapping as JSON (the filename has no .json extension)
with open("citizenship_mapping", "w") as json_file:
    json.dump(citizenship_mapping, json_file, indent=4, ensure_ascii=False)
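
The exported script will need to read these four files back before it can answer questions. The following is a minimal sketch of that loading step, not the actual script: `load_json` and `load_chatbot_data` are hypothetical helper names, and the filenames are assumed to be exactly those written in the cells above.

```python
import json
from pathlib import Path


def load_json(path):
    """Read one JSON file and return the decoded Python object."""
    # Uses the platform default encoding, mirroring the cells that wrote the files
    with open(path, "r") as json_file:
        return json.load(json_file)


def load_chatbot_data(directory="."):
    """Load the four exported files from `directory` into one dictionary."""
    base = Path(directory)
    return {
        "combined_sections": load_json(base / "combined_sections.json"),
        "synonyms": load_json(base / "synonyms"),
        "qa_mapping": load_json(base / "qa_mapping"),
        "citizenship_mapping": load_json(base / "citizenship_mapping"),
    }
```

Calling `load_chatbot_data()` from the directory that holds the exported files should return the same dictionaries that were saved, since `json.load` reverses `json.dump` for plain dictionaries, lists and strings.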