# **Kenya Revenue Authority (KRA) ChatBot**

## Overview

The **Kenya Revenue Authority (KRA)** is responsible for the collection of revenue on behalf of the Government of Kenya. With increasing demand for tax-related services, citizens often encounter delays and confusion when trying to understand tax obligations, file returns, or access services through traditional channels.
To enhance public service delivery and ease access to tax information, this project aims to develop an AI-powered KRA Chatbot capable of answering common tax-related queries for individuals and businesses in real time.

## Business Understanding

The **Kenya Revenue Authority (KRA)** plays a critical role in national development by collecting taxes that fund public services and infrastructure. However, many citizens face challenges navigating Kenya’s complex tax ecosystem. Taxpayers (especially small business owners, first-time filers, informal sector workers, and youth) often struggle to understand their tax obligations, registration processes, deadlines, and compliance procedures. This leads to widespread confusion, delays in filing, non-compliance, and an overburdened KRA support system.

To address these challenges, the KRA Chatbot will be developed to provide real-time, accessible, and accurate assistance to users seeking tax-related information and services. The goal is to enhance accessibility, reduce misinformation, and improve user engagement by leveraging natural language processing (NLP) for dynamic, intelligent responses.

## Problem Statement 

In Kenya, gaps in taxpayer education remain a significant barrier to compliance and to understanding of the tax system. Many Kenyans struggle with filing taxes on time, understanding how much to pay, and navigating the KRA iTax system. Despite the resources available on the KRA website, the clarity and accessibility of information remain a challenge, and taxpayers often face delays and confusion when trying to access support or find accurate information. Language barriers between English and Swahili further hinder accessibility. The KRA Chatbot addresses these challenges by providing an AI-powered, bilingual assistant that handles KRA-related queries precisely and returns relevant responses.

## Data Understanding

## Objectives 

## Success Metrics

## Loading the datasets

In [1]:
#importing libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import tensorflow as tf  # alias fixed: "sns" would overwrite the seaborn import above
import warnings

warnings.filterwarnings('ignore')

In [3]:
# Loading the FAQs dataset
data = pd.read_csv("KRAfaqs.csv")
data

| | Questions | Answers |
|---|---|---|
| 0 | How do I register for a KRA PIN | You can register for a KRA PIN by visiting the... |
| 1 | When is the deadline for filing individual inc... | The deadline for filing individual income tax ... |
| 2 | What is the penalty for late filing of VAT ret... | The penalty is KES 10,000 or 5% of the VAT due... |
| 3 | Can I recover a forgotten KRA password | Yes. Go to the iTax login page and click on 'F... |
| 4 | How do I apply for a KRA PIN | To apply for a KRA PIN, visit https://itax.kra... |
| ... | ... | ... |
| 619 | Are there penalties for not declaring online i... | Yes. You may face audits, penalties, and inter... |
| 620 | How can I reduce my tax as a freelancer? | Track allowable expenses (e.g., internet, soft... |
| 621 | Can I use Turnover Tax (TOT) as a freelancer? | Only if your annual turnover is between KES 1M... |
| 622 | Is remote work for a foreign company taxed in ... | Yes. Kenyan residents must pay tax on worldwid... |


In [8]:
# Checking the info in the dataset
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 624 entries, 0 to 623
Data columns (total 2 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   Questions  615 non-null    object
 1   Answers    615 non-null    object
dtypes: object(2)
memory usage: 9.9+ KB


This dataset has 624 entries and 2 columns (Questions, Answers), both containing text data (object dtype). Each column has 615 non-null values, so 9 values are missing in each.

In [9]:
#Describing the dataset
data.describe()

| | Questions | Answers |
|---|---|---|
| count | 615 | 615 |
| unique | 606 | 612 |
| top | What happens if I don't file my tax returns | VAT returns must be filed monthly, on or befor... |
| freq | 2 | 2 |


* The count is 615 for both columns: of the 624 rows, 615 hold a non-null question and 615 hold a non-null answer.

* The Questions column has 606 unique values, so some questions are repeated.

* The Answers column has 612 unique values, so a few answers are repeated as well.

* The most common question is "What happens if I don't file my tax returns", appearing 2 times in the dataset. This indicates a repeated entry in the FAQ file; it does not measure how often users ask the question.

* The most common answer is "VAT returns must be filed monthly, on or befor...", appearing 2 times in the dataset, which suggests that some questions share the same response.
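To see exactly which questions repeat, one option is to count the occurrences of each value and keep those seen more than once. The sketch below runs on a small toy frame standing in for `data`; the rows and the names `sample` and `repeated_questions` are illustrative assumptions, not taken from `KRAfaqs.csv`.

```python
import pandas as pd

# Toy stand-in for the FAQ frame; rows are illustrative only.
sample = pd.DataFrame({
    "Questions": ["How do I file?", "How do I file?", "What is VAT?"],
    "Answers": ["Use iTax.", "Use iTax.", "A consumption tax."],
})

# Count how often each question occurs (missing values are skipped by default).
question_counts = sample["Questions"].value_counts()

# Keep only the questions that occur more than once.
repeated_questions = question_counts[question_counts > 1]
print(repeated_questions)
```

Applied to the real frame, the same two lines on `data["Questions"]` would list every repeated question, not just the single `top` value that `describe()` reports.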

## Data Cleaning

### Accuracy

In [5]:
#Checking for duplicates
data.duplicated().sum()

np.int64(8)

There are 8 duplicate rows in the dataset. These duplicates will be dropped.
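A minimal sketch of the drop step, assuming the first occurrence of each row should be kept. It uses a toy frame in place of `data` (rows are illustrative only). One caveat worth checking on the real data: `duplicated()` treats rows whose values are all missing as repeats of one another, so part of the duplicate count may come from empty rows and not from repeated FAQ pairs.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the FAQ frame; rows are illustrative only.
sample = pd.DataFrame({
    "Questions": ["How do I file?", "How do I file?", np.nan, np.nan],
    "Answers": ["Use iTax.", "Use iTax.", np.nan, np.nan],
})

# duplicated() flags every repeat after the first occurrence;
# rows that are entirely missing also count as repeats of each other.
n_duplicates = int(sample.duplicated().sum())

# Keep the first occurrence of each row and renumber the index.
deduped = sample.drop_duplicates().reset_index(drop=True)
print(n_duplicates, len(deduped))
```

Resetting the index is optional; it simply keeps row labels contiguous for later steps.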

In [7]:
#Checking for null values
data.isnull().sum()

Questions    9
Answers      9
dtype: int64

There are 9 null values in the Questions column and 9 in the Answers column. Rows containing null values will be dropped.
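The null-dropping step could look like the sketch below, which removes any row missing either a question or an answer. It again runs on a toy frame in place of `data`; the rows and the name `cleaned` are assumptions for illustration.

```python
import pandas as pd

# Toy stand-in for the FAQ frame; rows are illustrative only.
sample = pd.DataFrame({
    "Questions": ["How do I file?", None, "What is VAT?"],
    "Answers": ["Use iTax.", None, "A consumption tax."],
})

# Drop rows where either column is missing, then renumber the index.
cleaned = sample.dropna(subset=["Questions", "Answers"]).reset_index(drop=True)

# Confirm that no missing values remain.
print(cleaned.isnull().sum())
```

Passing `subset` explicitly makes the intent clear if more columns are added later; with only these two columns, a bare `dropna()` would behave the same way.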