# AI-Powered Chatbot for Local Businesses: Project Walkthrough

## Problem Understanding
Small businesses in Africa often struggle to answer customer questions quickly and consistently. Many don’t have dedicated customer service teams, so an AI chatbot can help by providing instant answers to common questions, making customer service more reliable and freeing up staff.

## Goal of the Chatbot
Build a simple AI chatbot that helps local businesses answer frequently asked customer questions, using a small dataset and simple techniques.

## Dataset Summary
- **Main Question:** The typical way a customer asks the question
- **Alternative Ways to Ask:** Other ways to phrase the same question, separated by `|`
- **Simplified Phrasings:** Even simpler versions, also separated by `|`
- **Simplified Response:** The answer the chatbot should give

## Approach
We use a rule-based chatbot with basic NLP: lowercasing, punctuation removal, and fuzzy string matching with `SequenceMatcher` from Python's `difflib`. This keeps the system simple and is adequate for a small dataset (462 question patterns here).

## Data Pipeline Steps
1. Load the data
2. Inspect the data
3. Build a dictionary of question patterns to responses
4. Clean text for matching
5. Use fuzzy matching to find the closest question
6. Evaluate chatbot accuracy
7. Simulate training with train-test split

## Code Chunk Explanations
- **Chunk 1:** Load the dataset and preview the first few rows.
- **Chunk 2:** Build a dictionary mapping all question phrasings to their answers.
- **Chunk 3:** Define `get_response`, which tries an exact match first and then a fuzzy match with `difflib.get_close_matches` (cutoff 0.5).
- **Chunk 4:** Clean up text by lowercasing and removing punctuation.
- **Chunk 5:** Define `find_best_match` and `get_answer`, which use `SequenceMatcher` and accept a match only when the similarity exceeds 0.45.
- **Chunk 6:** Create a DataFrame from the dictionary for splitting.
- **Chunk 7:** Back up the dictionary so it can be restored later.
- **Chunk 8:** Define an ask function and show example interactions.
- **Chunk 9:** Split data into training and test sets (80/20) to simulate training.
- **Chunk 10:** Build a training dictionary and define a matcher using only training data.
- **Chunk 11:** Evaluate chatbot accuracy and print the result.

## How Rule-Based and NLP Techniques Are Used
This project uses a rule-based approach, meaning the chatbot matches user questions to answers using a dictionary of known patterns. To make matching smarter, we use simple NLP techniques:
- Lowercasing: All questions and patterns are converted to lowercase so "Password" and "password" are treated the same.
- Punctuation Cleaning: We remove punctuation to avoid mismatches caused by extra symbols.
- Fuzzy Matching: If a user's question isn't an exact match, we use SequenceMatcher to find the closest known question.

These steps help the chatbot understand different ways customers might ask the same thing, even if the wording is slightly different.
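
The three techniques can be seen working together in a minimal sketch. The `toy_pairs` dictionary, its placeholder answers, and the helper names `normalize` and `closest_pattern` are hypothetical and only illustrate the idea; the 0.45 threshold mirrors the one used in the notebook's matcher.

```python
import re
from difflib import SequenceMatcher

# Hypothetical toy patterns; the real project builds qa_pairs from the CSV.
toy_pairs = {
    "how do i reset my password": "answer about passwords",
    "how long does shipping take": "answer about shipping",
}

def normalize(text):
    """Lowercase, then drop everything except letters, digits, and whitespace."""
    return re.sub(r"[^a-z0-9\s]", "", text.lower()).strip()

def closest_pattern(question, patterns, threshold=0.45):
    """Return the most similar known pattern, or None if nothing clears the threshold."""
    question = normalize(question)

    def similarity(pattern):
        return SequenceMatcher(None, question, pattern).ratio()

    best = max(patterns, key=similarity)
    return best if similarity(best) > threshold else None

print(normalize("Password?!"))                                     # password
print(closest_pattern("How do I reset my PASSWORD?", toy_pairs))   # identical after cleaning
print(closest_pattern("How can I reset my password", toy_pairs))   # close, but not identical
print(closest_pattern("zzzz", toy_pairs))                          # None: nothing is similar
```

Both password questions resolve to the stored pattern `how do i reset my password`, while an input that shares no characters with any pattern falls below the threshold and returns `None`.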

## Training Simulation
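We split the 462 question patterns into training (80%) and test (20%) sets. Because the chatbot is rule-based, "training" here just means building the pattern dictionary from the training set only. The held-out test patterns then show how well the bot answers phrasings it has not stored.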
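
As a rough illustration of what the split does, here is a sketch that uses only the standard library. It is a stand-in for scikit-learn's `train_test_split`, which the notebook actually uses, and it will not reproduce the same shuffle. The function name `split_patterns` and the toy data are hypothetical.

```python
import math
import random

def split_patterns(pairs, test_fraction=0.2, seed=42):
    """Shuffle (pattern, response) items and hold out a fraction for testing."""
    items = sorted(pairs.items())            # fixed starting order for reproducibility
    random.Random(seed).shuffle(items)
    n_test = math.ceil(len(items) * test_fraction)  # round the test size up
    return dict(items[n_test:]), dict(items[:n_test])

# Hypothetical toy data: 10 phrasings mapped to 2 answers.
toy_pairs = {f"question {i}": f"answer {i % 2}" for i in range(10)}
train, test = split_patterns(toy_pairs)

print(len(train), len(test))      # 8 2
print(math.ceil(462 * 0.2))       # 93, the test-set size reported in the notebook output
```

Only `train` is used to build the matcher's dictionary; every pattern in `test` is then asked as if it were a new customer question.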

## Accuracy Calculation
Accuracy is the percentage of test questions answered correctly. For example, 75% accuracy means the chatbot got 3 out of 4 questions right.
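
The arithmetic can be sketched in a few lines. The `accuracy` helper and the toy lists are hypothetical; the 70 and 93 come from the notebook's reported result.

```python
def accuracy(predicted, expected):
    """Fraction of predictions that exactly equal the expected response."""
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)

# Toy case: 3 of 4 answers match. A None (no match found) counts as wrong.
print(f"{accuracy(['a', 'b', 'c', None], ['a', 'b', 'c', 'd']):.2%}")  # 75.00%

# The notebook's reported result: 70 correct out of 93 test patterns.
print(f"{70 / 93:.2%}")  # 75.27%
```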

## Challenges & Limitations
- Limited data and local language support
- Rule-based system can't generalize: it only handles phrasings that are textually close to a known pattern
- No context awareness

## Possible Improvements
- Collect more data, especially in local languages
- Use more advanced NLP models
- Add context handling and user feedback

## Final Summary
This project shows how a simple AI chatbot can help local businesses answer customer questions quickly and consistently. With basic NLP and a rule-based approach, the chatbot answered about 75% of held-out question patterns correctly (70 of 93), which sets the stage for future improvements.

In [225]:
import pandas as pd

# Load your dataset
file_path = "ChatbotData.csv"
df = pd.read_csv(file_path)

# Preview first rows
print(df.head())


             Source                Topic  \
0  Generic Retailer       Order Tracking   
1  Generic Retailer    Returns & Refunds   
2  Generic Retailer  Shipping & Delivery   
3  Generic Retailer  Shipping & Delivery   
4  Generic Retailer    Payment & Billing   

                          Main Question  \
0              How do I track my order?   
1           What is your return policy?   
2          How long does shipping take?   
3  Do you offer international shipping?   
4    What payment methods are accepted?   

                             Alternative Ways to Ask  \
0  Where is my order? | Can I see my delivery sta...   
1   Can I return items? | What's the refund process?   
2  When will my order arrive? | Delivery time est...   
3      Can you ship worldwide? | Ship outside Kenya?   
4  How can I pay? | Which cards/mobile money are ...   

                                   Response Template  \
0  To track your order, **sign into your account ...   
1  We offer a **60-day hass

In [226]:
qa_pairs = {}

for _, row in df.iterrows():
    # Extract the bot’s answer 
    response = row["Simplified Response"]

    # 1. Add the Main Question
    main_q = str(row["Main Question"]).lower().strip()
    qa_pairs[main_q] = response

    # 2. Add Alternative Ways to Ask
    if pd.notna(row["Alternative Ways to Ask"]):
        for alt in row["Alternative Ways to Ask"].split("|"):
            alt_q = alt.lower().strip()
            if alt_q:
                qa_pairs[alt_q] = response

    # 3. Add Simplified Phrasings
    if pd.notna(row["Simplified Phrasings"]):
        for phr in row["Simplified Phrasings"].split("|"):
            phr_q = phr.lower().strip()
            if phr_q:
                qa_pairs[phr_q] = response

print("Total patterns loaded:", len(qa_pairs))


Total patterns loaded: 462


In [227]:
from difflib import get_close_matches

def get_response(user_input):
    user_input = user_input.lower().strip()

    # Step 1: Try exact match
    if user_input in qa_pairs:
        return qa_pairs[user_input]

    # Step 2: Try fuzzy match (similar questions)
    possible_matches = get_close_matches(user_input, qa_pairs.keys(), n=1, cutoff=0.5)

    if possible_matches:
        best_match = possible_matches[0]
        return qa_pairs[best_match]

    # Step 3: Default fallback
    return "Sorry, I don't understand that. Could you rephrase your question?"


In [228]:
import re

def clean_text(text):
    text = text.lower()
    text = re.sub(r'[^a-z0-9\s]', '', text)
    return text.strip()


In [229]:
from difflib import SequenceMatcher

def find_best_match(user_q):
    user_q = clean_text(user_q)
    best_score = 0
    best_match = None

    for pattern in qa_pairs.keys():
        # Clean the stored pattern too, so both sides are compared without punctuation
        score = SequenceMatcher(None, user_q, clean_text(pattern)).ratio()
        if score > best_score:
            best_score = score
            best_match = pattern

    # Only return match if similarity is strong enough
    if best_score > 0.45:
        return best_match
    return None


def get_answer(user_q):
    match = find_best_match(user_q)

    if match:
        return qa_pairs[match]

    return "Sorry, I couldn't understand that. Could you ask in a different way?"


In [230]:
import pandas as pd

# Rebuild a simple DataFrame of pattern -> response
patterns = []
for pattern, resp in qa_pairs.items():
    patterns.append([pattern, resp])

patterns_df = pd.DataFrame(patterns, columns=["pattern", "response"])
    

In [231]:
qa_pairs_backup = qa_pairs.copy()     # backup


In [232]:
# Restore original QA pairs (copy, so later changes don't also alter the backup)
qa_pairs = qa_pairs_backup.copy()


In [233]:
def ask(question):
    print("You:", question)
    print("Bot:", get_answer(question))
    print()
  

## Test the Bot Here
You can test the chatbot by calling the `ask()` function with any customer question. For example:

```
ask("How do I reset my password?")
```
This will print both your question and the bot's response. Try different questions to see how well the bot matches and answers!

In [236]:
# Example interactions
ask("How can I reset my password?")
ask("What is the return policy?")
ask("Tell me about shipping options.")
ask("How do I contact customer support?")
ask("what should i do if i chose pay on delivery (pod)")
ask("how can i contact konga customer service")
ask("does zando offer free returns")
ask("how do i link my safaricom account to masoko")
ask("how do i reset my account password")
ask("what payment methods are accepted on jumia")
ask("how to track my order on jumia")


You: How can I reset my password?
Bot: Click **'Forgot Password'** on the login page.

You: What is the return policy?
Bot: **Returns are accepted** within the specified period (e.g., 7-30 days). Check the policy page for details.

You: Tell me about shipping options.
Bot: **Collect your order for free** at a designated secure location.

You: How do I contact customer support?
Bot: Use the **Contact Us page** or **live chat** for support.

You: what should i do if i chose pay on delivery (pod)
Bot: **Do not pay until delivered**. Report early payment requests.

You: how can i contact konga customer service
Bot: **Contact customer service** via the official website's **Email or WhatsApp**.

You: does zando offer free returns
Bot: **Returns are accepted** within the specified period (e.g., 7-30 days). Check the policy page for details.

You: how do i link my safaricom account to masoko
Bot: Account is often **linked to your Safaricom/M-Pesa credentials**.

You: how do i reset my account 

In [238]:
import pandas as pd
from sklearn.model_selection import train_test_split

# Build dataset from your rule-based patterns
patterns_df = pd.DataFrame([[p, qa_pairs[p]] for p in qa_pairs], 
                           columns=["pattern", "response"])

# Train/Test split
train_df, test_df = train_test_split(patterns_df, test_size=0.2, random_state=42)

# Build training dictionary ONLY
qa_train = {row["pattern"]: row["response"] for _, row in train_df.iterrows()}

# Local matcher that uses ONLY qa_train
from difflib import SequenceMatcher

def get_answer_train_only(q):
    q = q.lower().strip()
    best_match = None
    best_score = 0

    # Score the question against every training pattern and keep the best
    for pattern in qa_train:
        score = SequenceMatcher(None, q, pattern).ratio()
        if score > best_score:
            best_score = score
            best_match = pattern

    # Same 0.45 similarity threshold as find_best_match
    if best_score > 0.45:
        return qa_train[best_match]
    return None

# Evaluate accuracy
correct = 0
for _, row in test_df.iterrows():
    expected = row["response"]
    predicted = get_answer_train_only(row["pattern"])
    if predicted == expected:
        correct += 1

accuracy = correct / len(test_df)
print(f"Chatbot Accuracy: {accuracy:.2%} ({correct}/{len(test_df)})")


Chatbot Accuracy: 75.27% (70/93)
