#### Step 1. Install Python libraries in venv

In [None]:
! pip install langchain langchain-core langchain-community langchain-ollama langchain-text-splitters chromadb pypdf

#### Step 2. Load your documents

In [1]:
from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader(r"/home/lucille/Desktop/projects/FinanceBill-Assistant-KE2025/The_Finance_Bill_2025.pdf") # insert your document path
docs = loader.load()

Ignoring wrong pointing object 511 0 (offset 0)
Ignoring wrong pointing object 4016 0 (offset 0)


#### Step 3. Split data/text into chunks

In [2]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(chunk_size=1500, chunk_overlap=200)
chunks = splitter.split_documents(docs)

# Print the number of chunks
print(len(chunks))

257
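
To make `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of overlapping windows over a string. It is a simplification and not how `RecursiveCharacterTextSplitter` works internally (the real splitter also tries to break on separators such as paragraphs and sentences); `sliding_chunks` is a hypothetical helper written only for this illustration.

```python
def sliding_chunks(text, chunk_size, chunk_overlap):
    """Split text into fixed-size windows that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
pieces = sliding_chunks(sample, chunk_size=10, chunk_overlap=3)
print(pieces)
# ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

Each piece starts with the last three characters of the previous one, which is the role the 200-character overlap plays for the 1500-character chunks above.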


#### Step 4. Embed the chunks (turn into vectors)

In [3]:
from langchain_ollama import OllamaEmbeddings

embeddings = OllamaEmbeddings(model="nomic-embed-text")
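
Once chunks are vectors, retrieval comes down to comparing a query vector against the stored ones. The sketch below uses cosine similarity on made-up three-dimensional vectors purely for illustration; the vector values and chunk names are hypothetical, real embeddings have hundreds of dimensions, and the distance metric actually used depends on how the vector store is configured.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical toy embeddings, one per chunk
toy_chunks = {
    "vat": [0.9, 0.1, 0.0],
    "excise": [0.1, 0.9, 0.0],
    "stamp": [0.0, 0.2, 0.9],
}
query_vector = [0.8, 0.2, 0.1]

# Rank chunks from most to least similar to the query
ranked = sorted(
    toy_chunks.items(),
    key=lambda item: cosine_similarity(query_vector, item[1]),
    reverse=True,
)
print([name for name, _ in ranked])
# ['vat', 'excise', 'stamp']
```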

#### Step 5. Store in a vector database

In [None]:
from langchain_community.vectorstores import Chroma

# Create the store and persist it to disk
vectorstore = Chroma.from_documents(
    chunks,
    embedding=embeddings,
    persist_directory="./chroma_store"
)

# On later runs, reload the persisted store instead of recreating it
# vectorstore = Chroma(
#     persist_directory="./chroma_store",
#     embedding_function=embeddings
# )
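
Rather than commenting one block in and out by hand, the choice between creating and reloading could be made from whether the persist directory already holds data. This is only a sketch: `store_exists` is a hypothetical helper, and it assumes that a non-empty directory means a usable store, which is not guaranteed if an earlier run was interrupted.

```python
from pathlib import Path


def store_exists(persist_directory):
    """Return True if the directory exists and contains at least one entry."""
    path = Path(persist_directory)
    return path.is_dir() and any(path.iterdir())


# Possible use with the objects defined in this notebook:
# if store_exists("./chroma_store"):
#     vectorstore = Chroma(persist_directory="./chroma_store", embedding_function=embeddings)
# else:
#     vectorstore = Chroma.from_documents(chunks, embedding=embeddings, persist_directory="./chroma_store")
```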


#### Step 6. Set Up LLM

In [5]:
from langchain_ollama.chat_models import ChatOllama

# Initialize LLaMA 3.1 (the "answering brain")
llm = ChatOllama(model="llama3.1", temperature=0)

#### Step 7. Set Up Retriever

In [11]:
# QUERY_PROMPT comes from prompt.py, assumed here to be a module in this project
from prompt import QUERY_PROMPT
from langchain.retrievers.multi_query import MultiQueryRetriever

retriever = MultiQueryRetriever.from_llm(
    vectorstore.as_retriever(),
    llm,
    prompt=QUERY_PROMPT
)
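
The idea behind multi-query retrieval is to ask the LLM for several phrasings of the question, search with each one, and keep the unique union of the hits. The sketch below imitates that flow with plain functions; `toy_rewrite` and `toy_search` are hypothetical stand-ins for the LLM call and the vector search, and the chunk texts are invented for the example.

```python
TOY_CHUNKS = [
    "excise duty rates",
    "withholding tax obligations",
    "tax deducted at source by payers",
    "stamp duty transfers",
]


def toy_rewrite(question):
    # Stand-in for the LLM call that proposes alternative phrasings
    return [question, "tax deducted by payers", "withholding tax obligations"]


def toy_search(query):
    # Stand-in for vector search: keep chunks sharing at least two words with the query
    words = set(query.lower().split())
    return [c for c in TOY_CHUNKS if len(words & set(c.split())) >= 2]


def multi_query_retrieve(question, rewrite, search):
    """Search with every phrasing and merge the results without duplicates."""
    seen = set()
    merged = []
    for query in rewrite(question):
        for chunk in search(query):
            if chunk not in seen:
                seen.add(chunk)
                merged.append(chunk)
    return merged


hits = multi_query_retrieve("withholding tax", toy_rewrite, toy_search)
print(hits)
# ['withholding tax obligations', 'tax deducted at source by payers']
```

The second phrasing surfaces a chunk that the original wording misses, while the third only repeats a chunk already found and is dropped by the deduplication.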

In [13]:
from langchain_core.prompts import ChatPromptTemplate

# RAG prompt Template
template = """
Answer the question based ONLY on the following context:
{context}

Question: {question}

Your response must be in the same language as the question. If the question is in Swahili, respond in Swahili. If it is in English, respond in English.
"""
prompt = ChatPromptTemplate.from_template(template)

#### Step 8. Putting Retriever + Prompt Together

In [14]:
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough


parser = StrOutputParser()


rag_chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm
    | parser
)
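
In the chain above, the retriever's output (a list of document objects) is passed into `{context}` as-is, so the prompt receives the list's string form, metadata included. A common alternative is to join only the text of each chunk first. The sketch below shows that step on its own; `Doc` is a hypothetical stand-in for a retrieved document with a `page_content` attribute, and `format_docs` is an illustrative helper, not part of this notebook.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document."""
    page_content: str


def format_docs(docs):
    """Join the text of each retrieved chunk, separated by blank lines."""
    return "\n\n".join(doc.page_content for doc in docs)


context = format_docs([Doc("First clause."), Doc("Second clause.")])
print(context)
```

LangChain tutorials commonly wire such a helper in as `retriever | format_docs` for the `"context"` entry; whether it improves answers for this document is something to test rather than assume.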

#### Step 9. Ask a question

In [15]:
from IPython.display import display, Markdown 

def ask_rag(question):
    """Run the RAG chain on a question and render the answer as Markdown."""
    answer = rag_chain.invoke(question)
    display(Markdown(answer))

In [None]:
# Question 1

ask_rag("What is the main idea of this document?")

The main idea of this document appears to be a draft or proposal for amendments to the Income Tax Act in Kenya. The document outlines various changes and additions to the tax laws, including provisions related to multinational enterprise groups, country-by-country reporting, and exemptions from taxation for certain types of income. It also includes definitions and explanations of key terms and concepts used in the proposed amendments.

In [None]:
# Question 2

ask_rag("What Taxes does the document address?")

The document addresses various taxes including:

1. Income Tax
2. Value Added Tax (VAT)
3. Capital Gains Tax
4. Withholding Tax
5. Corporation Tax
6. Excise Duty
7. Stamp Duty
8. Pay As You Earn (PAYE) tax

These taxes are mentioned in the context of amendments to existing laws and regulations, as well as new provisions aimed at improving tax compliance and administration.

In [None]:
# Question 3 (Swahili for "What is the 2025 Finance Bill about?")

ask_rag("Mswada wa fedha wa 2025 unahusu nini?")


Mswada wa Fedha wa 2025 unaangazia mabadiliko katika Sheria ya Kifedha (Cap. 470), Sheria ya Kitaifa ya Kifedha (Cap. 476), Sheria ya Kifedha cha Ushuru (Cap. 472), Sheria ya Utendakazi wa Kifedha (Cap. 469B) na Sheria ya Malipo ya Vya Vilevile na Mipango (Cap. 469C). Pia inahusisha kubadilishwa kwa Sheria ya Ushuru wa Stempu ili kuwafungua watumiaji wa ushuru wa stempu kutoka kwa shirika la kampuni kwenda kwa washikadau wake kama sehemu ya mabadiliko ya ndani.

In [19]:
# Question 4

ask_rag("What does the document say about soccer?")

The document does not mention soccer at all. It appears to be a legislative document related to taxation and financial regulations in Kenya.

In [None]:
# Question 5

ask_rag("What does the document say about Income tax? Response in English first then provide another response in Swahili")

**English Response**

The document discusses various aspects of income tax, including:

1. Tax rates: The document mentions different tax rates for individuals and companies, but does not specify the exact rates.
2. Deductions: It outlines deductions that can be made from taxable income, such as interest paid on borrowed money, donations to certain institutions, and contributions to a post-retirement medical fund.
3. Exemptions: Certain types of income are exempt from tax, including income from the construction of public schools, hospitals, roads, and other social infrastructure.
4. Filing requirements: The document mentions that ultimate parent entities and constituent entities must file country-by-country reports with the Commissioner.
5. Amendments to assessments: It proposes amendments to section 31 of Cap.469B, which allows the Commissioner to amend assessments based on available information.

**Swahili Response**

Dokumeni hiki kinajadili mambo mbalimbali ya kodi ya mapato, ikiwemo:

1. Kiasi cha kodi: Dokumeni inaelezea viwango tofauti vya kodi kwa watu binafsi na mashirika, lakini haishangaa kiwango gani.
2. Kupungua kwa mapato: Inataja mapato ambayo yanaweza kupunguzwa kutoka kwenye mapato yaliyotolewa, kama vile faida iliyopatikana kutokana na biashara ya mawasiliano ya simu au radio.
3. Uchaguzi wa kodi: Kuna uwezekano kwamba baadhi ya mapato yanaweza kuachiliwa kodi, ikiwemo mapato yaliyotolewa kutoka kwenye ujenzi wa shule za umma, hospitali, barabara na infrastrukturu nyingine za kijamii.
4. Matumizi ya taarifa: Dokumeni inaelezea kwamba makampuni makuu na makampuni yanayohusika lazima yawe na taarifa za kila nchi zinazowasilishwa kwa Komisheni.
5. Ubadilishaji wa utambuzi: Inasema kuwa Komisheni inaweza kubadilisha utambuzi kulingana na taarifa iliyopo.

Kumbuka kwamba swali lililotolewa liliwekwa katika lugha ya Kiingereza, hivyo jibu langu pia litakuwa katika lugha hiyo.

In [21]:
# Question 6

ask_rag("What does the document say about Withholding Tax?")

The document discusses various aspects of taxation, including withholding tax. According to Section 13 of Cap.472, a supply of excisable services shall be deemed to be made in Kenya if the services are supplied from a place of business of the supplier in Kenya.

Withholding Tax is mentioned in several sections:

* Section 17(1) states that the Commissioner shall consider an application under section 16 and may grant or refuse to issue the applicant with a licence, which implies that withholding tax may be applicable to certain activities.
* Section 13 of Cap.472 proposes amendments related to the place of supply of excisable services, which may involve withholding tax.

However, there is no specific information provided about the rates, thresholds, or procedures for Withholding Tax in the document.

In terms of specific details about Withholding Tax, the document mentions:

* Section 17(2) states that the Commissioner may refuse an application under section 16 if satisfied that the applicant has been convicted of an offence involving dishonesty or fraud under any law.
* Section 13 of Cap.472 proposes amendments related to the place of supply of excisable services, which may involve withholding tax.

It is worth noting that the document appears to be a legislative proposal, and the specific details about Withholding Tax may be subject to change or further clarification in subsequent legislation or regulations.