
In [1]:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import json

# Load FAQ data
with open('faq_data.json', 'r') as f:
    faq_data = json.load(f)

# Extract FAQ questions for TF-IDF processing
faq_questions = [faq['question'] for faq in faq_data['faqs']]

In [2]:
# Function to get the best-matching FAQ using TF-IDF
def get_best_faq_answer(user_query):
    # Add the user's query to the list of FAQ questions
    questions = [user_query] + faq_questions

    # Vectorize the questions using TF-IDF (row 0 is the user's query)
    tfidf_matrix = TfidfVectorizer().fit_transform(questions)

    # Compute cosine similarity between the user query and each FAQ question
    # (cosine_similarity accepts the sparse matrix directly)
    cosine_similarities = cosine_similarity(tfidf_matrix[0:1], tfidf_matrix[1:])

    # Find the index of the most similar question
    best_match_index = cosine_similarities.argmax()

    # Get the answer for the best-matching question
    best_match_answer = faq_data['faqs'][best_match_index]['answer']

    return best_match_answer

In [4]:
# Example test
user_query = "election on?"
answer = get_best_faq_answer(user_query)
print(f"Question: {user_query}\nAnswer: {answer}")


Question: election on?
Answer: The election will be held on November 5th, 2024.
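
To make the TF-IDF and cosine-similarity steps above more concrete, here is a dependency-free sketch of roughly what they compute. It follows what are understood to be scikit-learn's default settings (lowercased tokens of two or more characters, smoothed IDF, L2 normalization), so scores should be close to the library's but are not guaranteed to be identical. The names `tokenize`, `tfidf_vectors`, `best_match`, the `min_score` cutoff, and the sample questions are illustrative assumptions, not part of the notebook. The cutoff is there because `argmax` always returns an index, even when every similarity is zero.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Lowercase and keep tokens of two or more word characters
    return re.findall(r"\b\w\w+\b", text.lower())

def tfidf_vectors(docs):
    """Build L2-normalized TF-IDF vectors (as dicts) for a list of strings."""
    counts_per_doc = [Counter(tokenize(doc)) for doc in docs]
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears
    doc_freq = Counter(term for counts in counts_per_doc for term in counts)
    vectors = []
    for counts in counts_per_doc:
        weights = {
            term: tf * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, tf in counts.items()
        }
        norm = math.sqrt(sum(w * w for w in weights.values()))
        vectors.append({t: w / norm for t, w in weights.items()} if norm else {})
    return vectors

def cosine(vec_a, vec_b):
    # Vectors are already unit length, so the dot product is the cosine
    return sum(w * vec_b.get(term, 0.0) for term, w in vec_a.items())

def best_match(query, questions, min_score=0.1):
    """Return (index, score); index is None when no question scores min_score."""
    if not questions:
        return None, 0.0
    vectors = tfidf_vectors([query] + questions)
    scores = [cosine(vectors[0], vec) for vec in vectors[1:]]
    best = max(range(len(scores)), key=scores.__getitem__)
    if scores[best] < min_score:
        return None, scores[best]
    return best, scores[best]

# Hypothetical FAQ questions, used only for this illustration
sample_questions = ["When is the election?", "How do I register to vote?"]
print(best_match("election on?", sample_questions))
print(best_match("pizza toppings", sample_questions))
```

Returning `None` below the cutoff lets the caller fall back to a "no matching FAQ" reply instead of always answering with the first FAQ entry.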


In [None]:
!pip install python-Levenshtein

In [6]:
import json
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from Levenshtein import distance as levenshtein_distance


In [7]:
# === Data Loading ===
def load_data(file_path):
    with open(file_path, 'r') as file:
        return json.load(file)

faq_data = load_data('faq_data.json')
manifesto_data = load_data('manifesto_data.json')

# List of candidates for easy reference
candidates = list(manifesto_data['candidates'].keys())

# === FAQ Query Handling ===
faq_questions = [faq['question'] for faq in faq_data['faqs']]

def get_best_faq_answer(user_query):
    """Finds the best FAQ match using TF-IDF and cosine similarity."""
    questions = [user_query] + faq_questions
    # Row 0 of the TF-IDF matrix is the user's query; the rest are FAQ questions
    tfidf_matrix = TfidfVectorizer().fit_transform(questions)
    cosine_similarities = cosine_similarity(tfidf_matrix[0:1], tfidf_matrix[1:])
    best_match_index = cosine_similarities.argmax()
    return faq_data['faqs'][best_match_index]['answer']

# === Manifesto Query Handling ===
def get_manifesto_answer(candidate, topic):
    """Fetches the manifesto answer based on candidate and topic."""
    candidate_manifesto = manifesto_data['candidates'].get(candidate, {})
    return candidate_manifesto.get(topic, "No information available on this topic.")

# === Fuzzy Matching (Typos and Incomplete Names) ===
def fuzzy_match(query, candidates, threshold=2):
    """Returns the first candidate whose full name is within `threshold` edits of the whole query."""
    for candidate in candidates:
        if levenshtein_distance(candidate.lower(), query.lower()) <= threshold:
            return candidate
    return None

# === Edge Case: Detect Non-Relevant Queries ===
def detect_non_relevant_query(user_query):
    """Detects off-topic queries."""
    off_topic_keywords = ["weather", "movie", "song", "sports", "news"]
    for word in off_topic_keywords:
        if word in user_query.lower():
            return True
    return False

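# === Edge Case: Handle Multiple Candidates/Topics ===
import re  # used to split on "and" as a whole word only

def handle_multiple_entities(user_query):
    """Handles cases where multiple candidates or topics are mentioned."""
    # Split on the standalone word "and" so that words that merely contain it
    # (for example "candidate") are left intact.
    parts = re.split(r"\band\b", user_query, flags=re.IGNORECASE)
    sub_queries = [part.strip() for part in parts if part.strip()]
    if len(sub_queries) > 1:
        responses = [handle_user_query(sub_query) for sub_query in sub_queries]
        return " ".join(responses)
    return None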

# === Edge Case: Handle Vague Queries ===
def handle_vague_queries(user_query):
    """Handles queries that are too short or lack context."""
    if len(user_query.split()) < 3:
        return "Can you provide more details? Are you asking about a candidate's stance or general election info?"
    return None

# === Edge Case: Handle Overlapping or Missing Data ===
def handle_missing_data(candidate, topic):
    """Handles cases where candidate or topic data is missing."""
    if candidate not in manifesto_data['candidates']:
        return f"Sorry, I don't have information on {candidate}."
    if topic not in manifesto_data['candidates'][candidate]:
        return f"Sorry, I don't have {candidate}'s stance on {topic}."
    return None

# === Main Function to Handle User Queries ===
def handle_user_query(user_query):
    """The main function to process user queries."""
    # Check for off-topic queries
    if detect_non_relevant_query(user_query):
        return "I'm only trained to answer election-related questions. Can I help with something else?"

    # Handle multiple queries in one statement (e.g., multiple candidates)
    multi_entity_response = handle_multiple_entities(user_query)
    if multi_entity_response:
        return multi_entity_response

    # Handle vague or incomplete queries
    vague_query_response = handle_vague_queries(user_query)
    if vague_query_response:
        return vague_query_response

    # Check if the query is related to a candidate's manifesto (by name)
    query_lower = user_query.lower()
    for candidate in candidates:
        if candidate.lower() in query_lower:
            # Answer with the first manifesto topic mentioned in the query
            for topic in manifesto_data['candidates'][candidate]:
                if topic.lower() in query_lower:
                    return get_manifesto_answer(candidate, topic)
            # Known candidate, but none of their topics was mentioned
            return handle_missing_data(candidate, "that topic")

    # Try fuzzy matching for candidate names (in case of typos)
    candidate = fuzzy_match(user_query, candidates)
    if candidate:
        return f"Did you mean {candidate}? Please ask again with more details."

    # If no candidate is detected, treat it as an FAQ query
    return get_best_faq_answer(user_query)
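
`fuzzy_match` compares the whole query against each candidate name, so a misspelled name inside a longer question falls outside the threshold. One possible alternative, sketched here under assumptions, is to slide a window of name-length word groups across the query and compare each window to the name. The helper names `levenshtein` and `fuzzy_find_candidate` are hypothetical; the pure-Python `levenshtein` stands in for `Levenshtein.distance` only so that the sketch runs without extra packages.

```python
import re

def levenshtein(a, b):
    """Edit distance counting insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]

def fuzzy_find_candidate(query, candidates, threshold=2):
    """Return the candidate closest to any word window of the query, or None."""
    # Keep ASCII letter runs only, so "Doe's" becomes "doe" and "s"
    words = re.findall(r"[a-z]+", query.lower())
    best = None  # (distance, candidate)
    for candidate in candidates:
        name_words = candidate.lower().split()
        target = " ".join(name_words)
        size = len(name_words)
        # Compare each run of `size` consecutive query words to the name
        for start in range(len(words) - size + 1):
            window = " ".join(words[start:start + size])
            dist = levenshtein(window, target)
            if dist <= threshold and (best is None or dist < best[0]):
                best = (dist, candidate)
    return best[1] if best else None

print(fuzzy_find_candidate(" Jonh Doe's stance on healtcare?", ["John Doe", "Jane Smith"]))
```

Because the tokenizer keeps only ASCII letters, names containing accents or digits would need a wider pattern.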

In [9]:
# === Example Usage ===
if __name__ == "__main__":
    # Testing FAQ
    print(handle_user_query("election on"))
    print(handle_user_query("I wanna vote!"))

    # Testing manifesto queries
    # What is John Doe's stance on healthcare?
    print(handle_user_query("John cares about health?"))
    # Jane Smith's stance on education
    print(handle_user_query("Jane Smith on education?"))

    # Testing fuzzy matching (typo handling)
    # What is Jonh Doe's stance on healthcare?
    print(handle_user_query(" Jonh Doe's stance on healtcare?"))

    # Testing multiple queries
    print(handle_user_query("What is John Doe's stance on healthcare and Jane Smith's stance on education?"))

    # Testing vague query
    print(handle_user_query("healthcare"))

    # Testing non-relevant query
    print(handle_user_query("What's the weather today?"))


Can you provide more details? Are you asking about a candidate's stance or general election info?
You can register to vote by visiting the official government website before the registration deadline.
John Doe supports universal healthcare and plans to expand coverage to all citizens.
Jane Smith plans to increase funding for vocational training programs.
John Doe supports universal healthcare and plans to expand coverage to all citizens.
John Doe supports universal healthcare and plans to expand coverage to all citizens. Jane Smith plans to increase funding for vocational training programs.
Can you provide more details? Are you asking about a candidate's stance or general election info?
I'm only trained to answer election-related questions. Can I help with something else?


Edge Cases Covered:

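Ambiguous Queries: Queries with fewer than three words are treated as vague, and the bot asks the user for clarification.

Missing Data: If a candidate or topic is missing from the manifesto data, the bot replies that it has no information on it.

Typos: Fuzzy matching (Levenshtein distance of at most 2) catches a misspelled candidate name when the query is essentially just the name. A misspelled name inside a longer question is not matched and falls through to the FAQ lookup.

Multiple Queries: A query that mentions several candidates or topics joined by "and" is split, and each part is answered separately.

Non-Election Queries: Queries containing an off-topic keyword (weather, movie, song, sports, news) get a reply that the bot only answers election-related questions.

Overlapping Candidate Names: Only partly covered. A query that nearly matches a candidate's full name triggers a "Did you mean ...?" prompt, but a partial name shared by several candidates is not detected.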
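
A minimal sketch of how ambiguous or incomplete candidate names could be resolved, assuming candidate names are plain space-separated words. The functions `match_candidates_by_partial_name` and `resolve_candidate`, and the extra candidate "John Smith", are hypothetical and exist only to illustrate an overlap; they are not part of the notebook.

```python
import re

def match_candidates_by_partial_name(query, candidates):
    """Return every candidate with at least one name word present in the query."""
    query_words = set(re.findall(r"[a-z]+", query.lower()))
    return [c for c in candidates if query_words & set(c.lower().split())]

def resolve_candidate(query, candidates):
    """Return (candidate, None) on a unique match, else (None, message or None)."""
    query_lower = query.lower()
    # A full name in the query wins outright
    for candidate in candidates:
        if candidate.lower() in query_lower:
            return candidate, None
    matches = match_candidates_by_partial_name(query, candidates)
    if len(matches) == 1:
        return matches[0], None
    if len(matches) > 1:
        # Several candidates share the mentioned name, so ask the user
        return None, "Did you mean " + " or ".join(matches) + "?"
    return None, None

# Hypothetical candidate list with an overlapping first name and surname
names = ["John Doe", "John Smith", "Jane Smith"]
print(resolve_candidate("What does John think about education?", names))
print(resolve_candidate("Jane on healthcare", names))
```

A function like this could run before the fuzzy-matching step, with the returned message passed straight back to the user when it is not `None`.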
