<a href="https://colab.research.google.com/github/LJThao/atlas-machine_learning/blob/main/LJ_QA_Bot.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [2]:
from google.colab import files
uploaded = files.upload()

Saving zendeskarticles.zip to zendeskarticles.zip


In [3]:
import zipfile

with zipfile.ZipFile("zendeskarticles.zip", "r") as zip_ref:
    zip_ref.extractall("zendeskarticles")

In [4]:
import os

print(os.listdir("zendeskarticles"))

['__MACOSX', 'ZendeskArticles']
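The listing shows a `__MACOSX` folder next to `ZendeskArticles`. macOS adds this folder, which holds resource-fork copies of files, when it creates an archive. The sketch below is one way to skip those entries during extraction. It uses the real `members` argument of `ZipFile.extractall`. The helper name `extract_without_macosx` and the demo archive are hypothetical.

```python
import os
import tempfile
import zipfile


def extract_without_macosx(zip_path, dest):
    """Extract zip_path into dest, skipping macOS __MACOSX metadata entries."""
    with zipfile.ZipFile(zip_path, "r") as zf:
        # keep every member except the resource-fork copies macOS adds
        members = [name for name in zf.namelist()
                   if not name.startswith("__MACOSX/")]
        zf.extractall(dest, members=members)
    return sorted(os.listdir(dest))


# demo: build a tiny archive shaped like zendeskarticles.zip
with tempfile.TemporaryDirectory() as tmp:
    zip_path = os.path.join(tmp, "demo.zip")
    with zipfile.ZipFile(zip_path, "w") as zf:
        zf.writestr("ZendeskArticles/PeerLearningDays.md", "PLD Overview")
        zf.writestr("__MACOSX/ZendeskArticles/._PeerLearningDays.md", "")
    result = extract_without_macosx(zip_path, os.path.join(tmp, "out"))

print(result)  # ['ZendeskArticles']
```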


In [None]:
!pip install --user tensorflow-hub==0.15.0
!pip install --user transformers==4.44.2

0-qa.py file - Question Answering - Task 0

In [34]:
#!/usr/bin/env python3
"""Question Answering Module"""
import tensorflow_hub as hub
import tensorflow as tf
from transformers import BertTokenizer


def question_answer(question, reference):
    """Find a snippet of text within a reference document that answers
    a question.

    question is a string containing the question to answer
    reference is a string containing the reference document from which
    to find the answer
    Returns: a string containing the answer, or None if no answer is found

    Uses the bert-uncased-tf2-qa model from TensorFlow Hub and the
    pre-trained BertTokenizer
    bert-large-uncased-whole-word-masking-finetuned-squad from the
    transformers library.
    """
    # load resources
    tokenizer = BertTokenizer.from_pretrained(
        "bert-large-uncased-whole-word-masking-finetuned-squad")
    model = hub.load("https://tfhub.dev/see--/bert-uncased-tf2-qa/1")

    # tokenize input using tokenizer API
    q_tokens = tokenizer.tokenize(question)
    r_tokens = tokenizer.tokenize(reference)

    # special token formatting
    tokens = ["[CLS]"] + q_tokens + ["[SEP]"] + r_tokens + ["[SEP]"]
    input_ids = tokenizer.convert_tokens_to_ids(tokens)

    # segment and attention masks
    q_len = len(q_tokens) + 2
    segment_ids = [0] * q_len + [1] * (len(r_tokens) + 1)
    attention_mask = [1] * len(input_ids)

    # convert to tensors
    ids = tf.constant([input_ids])
    mask = tf.constant([attention_mask])
    segments = tf.constant([segment_ids])

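    # run the model once; outputs[0] and outputs[1] hold the start and
    # end logits for each token position
    outputs = model([ids, mask, segments])

    # predict start and end, skipping the [CLS] position at index 0
    start_idx = int(tf.argmax(outputs[0][0][1:])) + 1
    end_idx = int(tf.argmax(outputs[1][0][1:])) + 1

    # treat a reversed or single-position span as no answer
    if start_idx >= end_idx:
        return None

    # return decoded text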
    return tokenizer.convert_tokens_to_string(tokens[start_idx:end_idx+1]).strip()
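`question_answer` sends the whole reference to the model. Standard BERT checkpoints have a fixed number of position embeddings (512), so a long article could go over that limit. The sketch below builds the same `[CLS] question [SEP] reference [SEP]` layout, segment ids, and attention mask, and truncates the reference so the sequence fits. The helper `build_bert_inputs` is hypothetical. `max_len=512` is an assumption about this TF Hub model.

```python
def build_bert_inputs(q_tokens, r_tokens, max_len=512):
    """Lay out [CLS] question [SEP] reference [SEP] plus segment/mask ids.

    Reference tokens are truncated so the sequence has at most max_len
    tokens (512 is assumed as the model's position limit).
    """
    # room left for the reference after [CLS], the question and two [SEP]s
    room = max_len - len(q_tokens) - 3
    if room <= 0:
        raise ValueError("question is too long for max_len")
    r_tokens = r_tokens[:room]

    tokens = ["[CLS]"] + q_tokens + ["[SEP]"] + r_tokens + ["[SEP]"]
    # segment 0 covers [CLS], the question and the first [SEP];
    # segment 1 covers the reference and the final [SEP]
    segment_ids = [0] * (len(q_tokens) + 2) + [1] * (len(r_tokens) + 1)
    attention_mask = [1] * len(tokens)
    return tokens, segment_ids, attention_mask


tokens, segs, mask = build_bert_inputs(
    ["when", "are", "pl", "##ds", "?"],
    ["pl", "##ds", "are", "on", "-", "site"],
    max_len=10)
print(tokens)  # reference cut to ['pl', '##ds'] so 10 tokens fit
print(segs)    # [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
```

With this approach, an answer that falls after the cut-off cannot be found. The usual remedy is to run the model over overlapping windows of the reference.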


In [35]:
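# keep a handle on the Task 0 function: the Task 4 cell reuses the name
# question_answer for a different function
answer_question = question_answer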

In [36]:
with open('/content/zendeskarticles/ZendeskArticles/PeerLearningDays.md') as f:
    reference = f.read()

print(question_answer('When are PLDs?', reference))

on - site days from 9 : 00 am to 3 : 00 pm
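`question_answer` picks the start and end positions with two independent argmax calls. It returns None whenever the end lands at or before the start. A common alternative in SQuAD-style readers scores every valid `(start, end)` pair as `start_logit + end_logit` and keeps the best one. Valid pairs lie inside the reference segment, have `start <= end`, and stay under a length cap. The sketch below assumes the logits are plain 1-D sequences. `best_span` and `max_answer_len=30` are hypothetical choices, not part of the task.

```python
import numpy as np


def best_span(start_logits, end_logits, first, last, max_answer_len=30):
    """Return (start, end) maximising start + end logits over valid spans.

    Only spans inside [first, last] (the reference segment), with
    start <= end and at most max_answer_len tokens, are considered.
    """
    start_logits = np.asarray(start_logits, dtype=float)
    end_logits = np.asarray(end_logits, dtype=float)
    best, best_score = None, -np.inf
    for s in range(first, last + 1):
        # e runs from s up to s + max_answer_len - 1, capped at last
        for e in range(s, min(s + max_answer_len, last + 1)):
            score = start_logits[s] + end_logits[e]
            if score > best_score:
                best, best_score = (s, e), score
    return best


start_logits = [0, 1, 0, 5, 0]
end_logits = [0, 4, 0, 1, 3]
# independent argmax gives start=3, end=1 (reversed, so no answer)
span = best_span(start_logits, end_logits, first=1, last=4)
print(span)  # (3, 4)
```

With the token layout used in `question_answer`, the reference segment runs from `first = len(q_tokens) + 2` to `last = len(tokens) - 2`, so the span can never land on the question or on a `[SEP]` token.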


1-loop.py file - Create the loop - Task 1

In [7]:
#!/usr/bin/env python3
"""Create the loop Module
Create a script that takes input from the user with the prompt Q: and
prints A: as a response. If the user inputs exit, quit, goodbye, or bye
(case-insensitive), print A: Goodbye and exit.
"""

# QA loop
while True:
    question = input("Q: ")
    if question.lower() in ["exit", "quit", "goodbye", "bye"]:
        print("A: Goodbye")
        break
    print("A:")

Q: Hello
A:
Q: How are you?
A:
Q: BYE
A: Goodbye
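The loop above reads from the keyboard, so it can only be checked by hand. The sketch below passes the input and output functions in as parameters so the same exit logic can run on scripted input. The names `run_loop`, `read`, and `write` are hypothetical. Unlike the cell above, it also applies `strip()`, so `" BYE "` counts as an exit command.

```python
EXIT_COMMANDS = {"exit", "quit", "goodbye", "bye"}


def run_loop(read=input, write=print):
    """Q/A loop whose input and output functions are injectable."""
    while True:
        question = read("Q: ")
        # strip() and lower() make " BYE " count as an exit command
        if question.strip().lower() in EXIT_COMMANDS:
            write("A: Goodbye")
            break
        write("A:")


# drive the loop with scripted input instead of the keyboard
scripted = iter(["Hello", "How are you?", "BYE"])
output = []
run_loop(read=lambda prompt: next(scripted), write=output.append)
print(output)  # ['A:', 'A:', 'A: Goodbye']
```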


2-qa.py file - Answer Questions - Task 2

In [23]:
#!/usr/bin/env python3
"""Answer Questions Module"""
EXIT_COMMANDS = {"exit", "quit", "bye", "goodbye"}


def answer_loop(reference):
    """Function that answers questions from a reference text:

    reference is the reference text
    If the answer cannot be found in the reference text, respond
    with Sorry, I do not understand your question.

    """
    while True:
        try:
            q = input("Q: ").strip()
        except (EOFError, KeyboardInterrupt):
            print("\nA: Goodbye")
            break

        if q.lower() in EXIT_COMMANDS:
            print("A: Goodbye")
            break

        # question_answer is the Task 0 function (0-qa.py) defined above
        ans = question_answer(q, reference)
        print("A:", ans or "Sorry, I do not understand your question.")

In [24]:
with open("/content/zendeskarticles/ZendeskArticles/PeerLearningDays.md") as f:
    reference = f.read()

In [14]:
answer_loop(reference)

Q: When are PLDs?
A: on - site days from 9 : 00 am to 3 : 00 pm
Q: What are Mock Interviews?
A: Sorry, I do not understand your question.
Q: What does PLD stand for?
A: peer learning days
Q: EXIT
A: Goodbye
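The answers keep spaces around punctuation ("on - site", "9 : 00"). `BertTokenizer.convert_tokens_to_string` joins tokens with spaces and only merges `##` word pieces. The sketch below shows a heuristic clean-up for the two cases seen here. `tidy_answer` is hypothetical. It would also collapse a genuine spaced dash between two words, so treat it as display polish only.

```python
import re


def tidy_answer(text):
    """Heuristic clean-up of spacing left by WordPiece decoding."""
    # "9 : 00" -> "9:00" (a spaced colon between two digits)
    text = re.sub(r"(?<=\d) : (?=\d)", ":", text)
    # "on - site" -> "on-site" (a spaced hyphen between word characters)
    text = re.sub(r"(?<=\w) - (?=\w)", "-", text)
    return text


print(tidy_answer("on - site days from 9 : 00 am to 3 : 00 pm"))
# on-site days from 9:00 am to 3:00 pm
```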


3-semantic_search.py file - Semantic Search - Task 3

In [7]:
#!/usr/bin/env python3
"""Semantic Search Module"""
import os
import numpy as np
import tensorflow_hub as hub


def semantic_search(corpus_path, sentence):
    """Function that performs semantic search on a corpus of documents:

    corpus_path is the path to the corpus of reference documents on which
    to perform semantic search
    sentence is the sentence from which to perform semantic search
    Returns: the reference text of the document most similar to sentence

    """
    # list to hold the input sentence and reference documents
    texts = [sentence]

    # load reference documents (sorted so the order is deterministic)
    for filename in sorted(os.listdir(corpus_path)):
        if filename.endswith(".md"):
            file_path = os.path.join(corpus_path, filename)
            with open(file_path, "r", encoding="utf-8") as f:
                texts.append(f.read())

    # load Universal Sentence Encoder model
    model = hub.load(
        "https://tfhub.dev/google/universal-sentence-encoder-large/5")
    embeddings = model(texts)

    # compute cosine similarities between input sentence and each document
    input_vector = embeddings[0]
    similarities = [
        np.dot(input_vector, doc_vec) /
        (np.linalg.norm(input_vector) * np.linalg.norm(doc_vec))
        for doc_vec in embeddings[1:]
    ]

    # find the most similar document
    reference_task = texts[np.argmax(similarities) + 1]
    return reference_task
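`semantic_search` computes the cosine similarities in a Python loop and keeps only the single best document. If the embeddings are held as a NumPy matrix, you can normalise the rows once and compute every cosine with one matrix-vector product. `np.argsort` then ranks the documents. The sketch below returns the top `k` indices, which could be useful if the QA step should try more than one candidate (an assumption, not part of the task). `rank_by_cosine` and the toy vectors are hypothetical.

```python
import numpy as np


def rank_by_cosine(query_vec, doc_vecs, k=3):
    """Return indices of the k documents most cosine-similar to query_vec."""
    query = np.asarray(query_vec, dtype=float)
    docs = np.asarray(doc_vecs, dtype=float)
    # normalise once, then one matrix-vector product gives every cosine
    query = query / np.linalg.norm(query)
    docs = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    scores = docs @ query
    # argsort is ascending, so reverse it to put the highest scores first
    return np.argsort(scores)[::-1][:k].tolist()


docs = [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]]
top = rank_by_cosine([1.0, 0.2], docs, k=2)
print(top)  # [0, 1]
```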


In [8]:
result = semantic_search("/content/zendeskarticles/ZendeskArticles", "When are PLDs?")
print(result)

PLD Overview
Peer Learning Days (PLDs) are a time for you and your peers to ensure that each of you understands the concepts you've encountered in your projects, as well as a time for everyone to collectively grow in technical, professional, and soft skills. During PLD, you will collaboratively review prior projects with a group of cohort peers.
PLD Basics
PLDs are mandatory on-site days from 9:00 AM to 3:00 PM. If you cannot be present or on time, you must use a PTO. 
No laptops, tablets, or screens are allowed until all tasks have been whiteboarded and understood by the entirety of your group. This time is for whiteboarding, dialogue, and active peer collaboration. After this, you may return to computers with each other to pair or group program. 
Peer Learning Days are not about sharing solutions. This doesn't empower peers with the ability to solve problems themselves! Peer learning is when you share your thought process, whether through conversation, whiteboarding, debugging, or li

4-qa.py file - Multi-reference Question Answering - Task 4

In [37]:
#!/usr/bin/env python3
"""Multi-reference Question Answering Module"""
EXIT_COMMANDS = {"exit", "quit", "bye", "goodbye"}


def question_answer(corpus_path):
    """Function that answers questions from multiple reference texts:

    corpus_path is the path to the corpus of reference documents

    """
    while True:
        try:
            q = input("Q: ").strip()
        except (EOFError, KeyboardInterrupt):
            print("\nA: Goodbye")
            break

        if q.lower() in EXIT_COMMANDS:
            print("A: Goodbye")
            break

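        # pick the most relevant article, then run the Task 0 QA model on it
        # (answer_question is the alias saved before this cell redefined
        # question_answer)
        ref = semantic_search(corpus_path, q)
        ans = answer_question(q, ref)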
        print("A:", ans or "Sorry, I do not understand your question.")

In [38]:
question_answer("/content/zendeskarticles/ZendeskArticles")

Q: When are PLDs?
A: on - site days from 9 : 00 am to 3 : 00 pm
Q: What are Mock Interviews?
A: help you train for technical interviews
Q: What does PLD stand for?
A: peer learning days
Q: goodbye
A: Goodbye
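Each question in this loop calls `semantic_search`, which re-reads every article and reloads the Universal Sentence Encoder before embedding the whole corpus again. The sketch below embeds the documents once and then embeds only the query on each call. `CorpusIndex` and `toy_embed` are hypothetical. `embed` stands for any function that maps a list of strings to a 2-D array. In this notebook it could wrap the TF Hub encoder, for example `lambda texts: np.asarray(model(texts))`, which is an assumption and not tested here.

```python
import numpy as np


class CorpusIndex:
    """Embed a fixed list of documents once and answer nearest-doc queries."""

    def __init__(self, docs, embed):
        self.docs = list(docs)
        self.embed = embed
        vecs = np.asarray(embed(self.docs), dtype=float)
        # store unit-length rows so a dot product equals cosine similarity
        self.vecs = vecs / np.linalg.norm(vecs, axis=1, keepdims=True)

    def most_similar(self, sentence):
        """Return the stored document most cosine-similar to sentence."""
        query = np.asarray(self.embed([sentence]), dtype=float)[0]
        query = query / np.linalg.norm(query)
        return self.docs[int(np.argmax(self.vecs @ query))]


def toy_embed(texts):
    """Hypothetical stand-in embedding: keyword counts plus a bias term."""
    vocab = ["pld", "peer", "mock", "interview"]
    return [[t.lower().count(w) for w in vocab] + [1] for t in texts]


docs = ["PLDs are peer learning days",
        "Mock interviews help you train for technical interviews"]
index = CorpusIndex(docs, toy_embed)
print(index.most_similar("What are Mock Interviews?"))
# Mock interviews help you train for technical interviews
```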
