## Social Health Authority (SHA) FAQ Chatbot for Kenya

## 1. Introduction and Overview
(to update)

## 2. Business Understanding
### 2.1 Business Problem
(to update)
### 2.2 Objectives
(to update)

## 3. Data Understanding

This section explores the data to become familiar with its characteristics, identify data quality issues, and gather initial insights to guide further analysis.

In [1]:
# Importing the necessary libraries
import pandas as pd
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
import re

In [2]:
# Loading and previewing the SHA FAQ dataset
df = pd.read_csv(r'\Users\user\Desktop\DS_Projects\phase_5_project\capstone_project\data\FAQ_extracted.csv')
df

Unnamed: 0,Question,Answer
0,Why was the shift to Universal Health Coverage...,To ensure that all Kenyans have access to qual...
1,What is Social Health Insurance (SHI)?,It is a form of health financing mechanism bas...
2,What are the Principles of Social Health Insur...,"Accessible, quality, affordable healthcare ser..."
3,Why is Social Health Insurance important?,An efficient way of managing health funds – ab...
4,What are the 4 Health Laws?,1. The Social Health Insurance Law (2023) – Cr...
...,...,...
88,There is a need for measures to enhance the qu...,The Quality-of-Care Bill is underway to optimi...
89,Where should providers claim payment of pendin...,"SHA has absorbed all NHIF liabilities, therefo..."
90,What will be the role of SHA in ensuring commo...,SHA will reimburse healthcare providers based ...
91,How can individuals in Kenya file complaints o...,The Social Health Authority will develop a com...


The cells below explore the structure and contents of the dataset.

In [3]:
# Previewing the top of the dataset
df.head()

Unnamed: 0,Question,Answer
0,Why was the shift to Universal Health Coverage...,To ensure that all Kenyans have access to qual...
1,What is Social Health Insurance (SHI)?,It is a form of health financing mechanism bas...
2,What are the Principles of Social Health Insur...,"Accessible, quality, affordable healthcare ser..."
3,Why is Social Health Insurance important?,An efficient way of managing health funds – ab...
4,What are the 4 Health Laws?,1. The Social Health Insurance Law (2023) – Cr...


In [4]:
# Define the category assignments based on question ranges
df['Category'] = ''  # Initialize the 'Category' column

# Assign categories based on the question ranges with meaningful names
df.loc[0:3, 'Category'] = 'Understanding Social Health Authority (SHA)'       # Questions 1 - 4
df.loc[4:17, 'Category'] = 'Institutions Created by UHC Laws and Transition'  # Questions 5 - 18
df.loc[18:23, 'Category'] = 'NHIF Staff Considerations During Transition'     # Questions 19 - 24
df.loc[24:30, 'Category'] = 'Primary Health Care & the PHC Fund'              # Questions 25 - 31
df.loc[31:35, 'Category'] = 'Emergency, Chronic, and Critical Illness Fund'   # Questions 32 - 36
df.loc[36:71, 'Category'] = 'Registration, Means Testing & Contributions'     # Questions 37 - 72
df.loc[72:83, 'Category'] = 'Benefits, Tariffs & Claims Management'           # Questions 73 - 84
df.loc[84:90, 'Category'] = 'Access & Quality of Service Provision'           # Questions 85 - 91
df.loc[91:92, 'Category'] = 'Feedback and Dispute Resolution'                 # Questions 92 - 93
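
Because `df.loc` slices by label and includes both endpoints, the nine ranges above are meant to tile rows 0 to 92 with no gaps or overlaps. The following is a minimal sketch of how that could be checked without pandas; `CATEGORY_RANGES` and `check_ranges` are hypothetical names introduced here for illustration and are not part of the original notebook.

```python
# Inclusive (start, end) row ranges copied from the assignment cell above.
CATEGORY_RANGES = [
    (0, 3, 'Understanding Social Health Authority (SHA)'),
    (4, 17, 'Institutions Created by UHC Laws and Transition'),
    (18, 23, 'NHIF Staff Considerations During Transition'),
    (24, 30, 'Primary Health Care & the PHC Fund'),
    (31, 35, 'Emergency, Chronic, and Critical Illness Fund'),
    (36, 71, 'Registration, Means Testing & Contributions'),
    (72, 83, 'Benefits, Tariffs & Claims Management'),
    (84, 90, 'Access & Quality of Service Provision'),
    (91, 92, 'Feedback and Dispute Resolution'),
]


def check_ranges(ranges, n_rows):
    """Return rows per category after confirming the ranges tile 0..n_rows-1."""
    expected_start = 0
    counts = {}
    for start, end, name in ranges:
        if start != expected_start:
            raise ValueError(f"gap or overlap before row {start} ({name!r})")
        if end < start:
            raise ValueError(f"empty range for {name!r}")
        counts[name] = end - start + 1  # both endpoints are included
        expected_start = end + 1
    if expected_start != n_rows:
        raise ValueError(f"ranges end at row {expected_start - 1}, expected {n_rows - 1}")
    return counts


counts = check_ranges(CATEGORY_RANGES, n_rows=93)
for name, n in counts.items():
    print(f"{n:>3}  {name}")
```

Keeping the ranges in one list also makes the class imbalance easy to see: under these ranges the registration category holds 36 of the 93 questions, while the feedback category holds only 2.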

In [5]:
# Previewing the top of the dataset
df.head()

Unnamed: 0,Question,Answer,Category
0,Why was the shift to Universal Health Coverage...,To ensure that all Kenyans have access to qual...,Understanding Social Health Authority (SHA)
1,What is Social Health Insurance (SHI)?,It is a form of health financing mechanism bas...,Understanding Social Health Authority (SHA)
2,What are the Principles of Social Health Insur...,"Accessible, quality, affordable healthcare ser...",Understanding Social Health Authority (SHA)
3,Why is Social Health Insurance important?,An efficient way of managing health funds – ab...,Understanding Social Health Authority (SHA)
4,What are the 4 Health Laws?,1. The Social Health Insurance Law (2023) – Cr...,Institutions Created by UHC Laws and Transition


In [6]:
# Previewing the bottom of the dataset
df.tail()

Unnamed: 0,Question,Answer,Category
88,There is a need for measures to enhance the qu...,The Quality-of-Care Bill is underway to optimi...,Access & Quality of Service Provision
89,Where should providers claim payment of pendin...,"SHA has absorbed all NHIF liabilities, therefo...",Access & Quality of Service Provision
90,What will be the role of SHA in ensuring commo...,SHA will reimburse healthcare providers based ...,Access & Quality of Service Provision
91,How can individuals in Kenya file complaints o...,The Social Health Authority will develop a com...,Feedback and Dispute Resolution
92,Which takes precedence between procurement tri...,The Public Procurement Tribunal as provisioned...,Feedback and Dispute Resolution


In [7]:
# Checking the dimensions of the data
df.shape

(93, 3)

This output shows:

 - Number of Rows (FAQs): There are 93 rows in the DataFrame, one per question-answer pair.
 - Number of Columns (Variables): There are 3 columns in the DataFrame: `Question`, `Answer`, and `Category`.

In [8]:
# Checking the data types and non-null counts
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 93 entries, 0 to 92
Data columns (total 3 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   Question  93 non-null     object
 1   Answer    93 non-null     object
 2   Category  93 non-null     object
dtypes: object(3)
memory usage: 2.3+ KB


In [9]:
# Checking columns
df.columns

Index(['Question', 'Answer', 'Category'], dtype='object')

## 4. Data Preparation
Here I will clean and preprocess the data, including selecting relevant features, handling missing values, and transforming the data into the desired format for modeling.

In [10]:
# Checking for missing values
df.isnull().sum()

Question    0
Answer      0
Category    0
dtype: int64

From the output we can conclude that there are no missing values in any of the columns in the dataset.

In [11]:
# Checking for duplicates
df.duplicated().sum()

0

From the output we can conclude that there are no duplicate rows in the dataset. Note that `df.duplicated()` compares entire rows, so it only flags rows that match in every column.

**NOTE:** Since there are no missing values or duplicate rows in the dataset, no further data cleaning is required.
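
An exact row comparison would not catch two questions that differ only in case, spacing, or punctuation. The sketch below shows one way such near-duplicates could be surfaced using only the standard library. The helper names `normalise` and `find_near_duplicates` are hypothetical, and the `sample` list is illustrative: its first entry is taken from the dataset preview, while the variant at position 2 is invented to trigger a match.

```python
import re
from collections import defaultdict


def normalise(text):
    """Lowercase, drop punctuation, and collapse runs of whitespace."""
    text = re.sub(r'[^\w\s]', '', text.lower())
    return ' '.join(text.split())


def find_near_duplicates(questions):
    """Map each normalised question seen more than once to its positions."""
    seen = defaultdict(list)
    for position, question in enumerate(questions):
        seen[normalise(question)].append(position)
    return {key: rows for key, rows in seen.items() if len(rows) > 1}


# Illustrative input only; position 2 is an invented variant of position 0.
sample = [
    "What is Social Health Insurance (SHI)?",
    "What are the 4 Health Laws?",
    "what is  social health insurance (SHI) ?",
]

print(find_near_duplicates(sample))
# {'what is social health insurance shi': [0, 2]}
```

Assuming the notebook's `df` is in scope, the same function could be called as `find_near_duplicates(df['Question'])`, since it accepts any iterable of strings.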

### 4.1 Data Visualization
(to update)

### 4.2 Text Preprocessing

In [12]:
# Text preprocessing
# Build the stopword set and lemmatizer once instead of on every call.
# Both need the NLTK 'stopwords' and 'wordnet' corpora (see nltk.download).
stop_words = set(stopwords.words('english'))
lemmatizer = WordNetLemmatizer()

def preprocess_text(text):
    # Lowercase conversion
    text = text.lower()
    # Remove punctuation and other special characters
    text = re.sub(r'[^\w\s]', '', text)
    # Tokenize on whitespace
    words = text.split()
    # Remove stopwords
    words = [word for word in words if word not in stop_words]
    # Lemmatization (uses the default noun part of speech)
    words = [lemmatizer.lemmatize(word) for word in words]
    # Join words back into a string
    return ' '.join(words)

# Apply preprocessing to question and answer columns
df['preprocessed_question'] = df['Question'].apply(preprocess_text)
df['preprocessed_answer'] = df['Answer'].apply(preprocess_text)
df['preprocessed_category'] = df['Category'].apply(preprocess_text)
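
One side effect of the regex in `preprocess_text` is worth noting: because it deletes punctuation outright, a hyphenated term such as "Quality-of-Care" (which appears in the answers) collapses into the single token `qualityofcare`. The sketch below compares that behaviour with a possible alternative that replaces punctuation with a space instead. The function names `strip_punctuation` and `space_punctuation` are hypothetical, and the alternative is a suggestion rather than what the notebook does.

```python
import re


def strip_punctuation(text):
    """Same regex as preprocess_text: punctuation is deleted."""
    return re.sub(r'[^\w\s]', '', text.lower())


def space_punctuation(text):
    """Alternative: punctuation becomes a space, then whitespace is collapsed."""
    spaced = re.sub(r'[^\w\s]', ' ', text.lower())
    return ' '.join(spaced.split())


sample = "The Quality-of-Care Bill is underway"

print(strip_punctuation(sample).split())
# ['the', 'qualityofcare', 'bill', 'is', 'underway']

print(space_punctuation(sample).split())
# ['the', 'quality', 'of', 'care', 'bill', 'is', 'underway']
```

The trade-off is that spacing out punctuation also splits contractions (for example, "don't" becomes `don` and `t`), so which variant works better depends on how the preprocessed text is matched against user queries later on.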

In [13]:
# Previewing the top of the preprocessed data
df.head()

Unnamed: 0,Question,Answer,Category,preprocessed_question,preprocessed_answer,preprocessed_category
0,Why was the shift to Universal Health Coverage...,To ensure that all Kenyans have access to qual...,Understanding Social Health Authority (SHA),shift universal health coverage necessary,ensure kenyan access quality affordable compre...,understanding social health authority sha
1,What is Social Health Insurance (SHI)?,It is a form of health financing mechanism bas...,Understanding Social Health Authority (SHA),social health insurance shi,form health financing mechanism based risk res...,understanding social health authority sha
2,What are the Principles of Social Health Insur...,"Accessible, quality, affordable healthcare ser...",Understanding Social Health Authority (SHA),principle social health insurance,accessible quality affordable healthcare servi...,understanding social health authority sha
3,Why is Social Health Insurance important?,An efficient way of managing health funds – ab...,Understanding Social Health Authority (SHA),social health insurance important,efficient way managing health fund able set mo...,understanding social health authority sha
4,What are the 4 Health Laws?,1. The Social Health Insurance Law (2023) – Cr...,Institutions Created by UHC Laws and Transition,4 health law,1 social health insurance law 2023 creating in...,institution created uhc law transition
