H. Hsu and N. Huang, “Xiao-Shih: A Self-enriched Question Answering Bot With Machine Learning on Chinese-Based MOOCs,” <I>IEEE Trans. Learning Technologies</I>. (Under Review)

# Xiao-Shih: Question Answering Bot

### xiaoshih.QA(wv_model_path, QA_model_path, bert_model_path, keywords, ml_features_path)
<b>Parameters:</b>
- wv_model_path: str, <I>path to a word2vec model</I>
- QA_model_path: str, <I>path to a question-answering ML model</I>
- bert_model_path: str, <I>path to a BERT model</I>
- keywords: set, <I>set of keywords used for tokenizing text</I>
- ml_features_path: str, <I>path to the ML features, stored as a DataFrame</I>

```python
import warnings
warnings.filterwarnings('ignore')
```

## 1. Preparing keywords and models (word2vec, QA, BERT)
Course "Python for Data Science" (PDS): 
- word2vec model: word2vec_model/pds
- QA model: QA_model/pds.pkl
- BERT model: bert_model/pds
- keywords: corpus/keywords_pds.txt
- features: QA_model/features_pds

Course "Introduction to Computer Networks" (ICN): 
- word2vec model: word2vec_model/icn
- QA model: QA_model/icn.pkl
- BERT model: bert_model/icn
- keywords: corpus/keywords_icn.txt
- features: QA_model/features_icn
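
Both courses follow the same directory layout, so all five paths can be derived from a short course code. The helper below is hypothetical (it is not part of `xiaoshih`) and simply assumes the layout listed above holds for every course:

```python
from typing import Dict

# Hypothetical helper, not part of the xiaoshih package. It assumes every
# course code (e.g. 'pds', 'icn') follows the directory layout listed above.
def course_paths(course: str) -> Dict[str, str]:
    """Return the model, keyword, and feature paths for a course code."""
    return {
        "wv_model_path": f"word2vec_model/{course}",
        "QA_model_path": f"QA_model/{course}.pkl",
        "bert_model_path": f"bert_model/{course}",
        "keywords_path": f"corpus/keywords_{course}.txt",
        "ml_features_path": f"QA_model/features_{course}",
    }

paths = course_paths("icn")
print(paths["QA_model_path"])  # QA_model/icn.pkl
```

Note that `keywords_path` is a file, whereas `QA` expects `keywords` as an already-loaded set, so the file still has to be read first.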

```python
keywords = set()
with open('corpus/keywords_pds.txt','r') as f:
    for line in f:
        keywords.add(line.strip())
```
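
The cell above adds an empty string to the set whenever the file contains a blank line, and it opens the file with the platform's default encoding. A slightly more defensive variant, assuming the keyword file is UTF-8 with one keyword per line (`load_keywords` is a hypothetical name):

```python
from typing import Set

def load_keywords(path: str) -> Set[str]:
    """Read one keyword per line, skipping blank lines."""
    keywords: Set[str] = set()
    # Keyword lists for Chinese-based courses may contain CJK terms, so the
    # encoding is set explicitly (assumed UTF-8) instead of the OS default.
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            word = line.strip()
            if word:  # skip blank lines, which would otherwise add ''
                keywords.add(word)
    return keywords
```

Usage: `keywords = load_keywords('corpus/keywords_pds.txt')`.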

## 2. Predicting whether an archived question is a duplicate
When Xiao-Shih receives a new question, it uses its ML model to find all candidate answers among the archived QA pairs.
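
One way to picture this retrieval step is a loop that asks a duplicate classifier about each archived pair and keeps the answers it accepts. This is a minimal sketch, not Xiao-Shih's implementation: the `(question, answer, answerer)` tuple layout, the 0.5 threshold, and the `find_candidate_answers` name are assumptions, and the classifier is passed in as a function so any predictor can be plugged in.

```python
from typing import Callable, List, Tuple

# Hypothetical record layout: (archived_question, archived_answer, answerer).
ArchivedPair = Tuple[str, str, str]

def find_candidate_answers(
    predict: Callable[[str, str, str], float],
    new_question: str,
    archived: List[ArchivedPair],
    threshold: float = 0.5,
) -> List[str]:
    """Return the answers of archived questions predicted to duplicate new_question."""
    candidates: List[str] = []
    for question, answer, answerer in archived:
        # predict(new_question, archived_question, answerer) is any duplicate
        # classifier returning a score; scores >= threshold count as duplicates.
        if predict(new_question, question, answerer) >= threshold:
            candidates.append(answer)
    return candidates

def keyword_stub(new_q: str, archived_q: str, answerer: str) -> float:
    # Toy predictor for demonstration only: flags pairs that both mention "dot".
    return 1.0 if "dot" in new_q and "dot" in archived_q else 0.0

archived = [
    ("dot command not found", "answer A", "instructor"),
    ("pandas merge question", "answer B", "student"),
]
print(find_candidate_answers(keyword_stub, "'dot' is not recognized", archived))
# ['answer A']
```

Once `qa` is constructed as in the cells below, a wrapper such as `lambda n, q, r: qa.answer_prediction(new_question=n, archived_question=q, answerer=r)` could presumably take the stub's place.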

### QA.duplicate_question_prediction(new_question, archived_question, answerer)
<b>Parameters:</b>
- new_question: str, <I>the text of a new question</I>
- archived_question: str, <I>the text of an archived question</I>
- answerer: str, <I>{'instructor', 'student', 'stackoverflow'}</I>

<b>Returns: boolean</b>

Whether the new question and the archived question are duplicates. If they are, Xiao-Shih may respond to the learner with the archived question's answer.
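
Xiao-Shih makes this decision with its trained QA model, configured through the word2vec, BERT, and feature paths above. Purely to illustrate the shape of the inputs and output, the sketch below swaps in a much simpler technique: cosine similarity between term-count vectors compared against a fixed threshold. The tokenizer (ASCII words plus single CJK characters), the 0.5 threshold, and all function names are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List

ANSWERERS = {"instructor", "student", "stackoverflow"}

# Hypothetical tokenizer: ASCII words as tokens, each CJK character as its own
# token. Xiao-Shih's own tokenization uses the keyword set described above.
TOKEN_RE = re.compile(r"[A-Za-z0-9_.\-]+|[\u4e00-\u9fff]")

def tokenize(text: str) -> List[str]:
    return [t.lower() for t in TOKEN_RE.findall(text)]

def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between the term-count vectors of two texts."""
    ca, cb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0

def is_duplicate(new_question: str, archived_question: str, answerer: str,
                 threshold: float = 0.5) -> bool:
    """Toy stand-in for the trained classifier: thresholded cosine similarity."""
    if answerer not in ANSWERERS:
        raise ValueError(f"unknown answerer: {answerer!r}")
    # A trained model could use the answerer as a feature; this toy ignores it.
    return cosine_similarity(new_question, archived_question) >= threshold
```

Character-level CJK tokens keep the sketch dependency-free, but they ignore the word boundaries that a segmenter or the course keyword set would capture.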

```python
from xiaoshih import QA
qa = QA(wv_model_path='word2vec_model/pds', 
        QA_model_path='QA_model/pds.pkl',
        bert_model_path='bert_model/pds',
        keywords=keywords, 
        ml_features_path='QA_model/features_pds')
```

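```python
# EN: "Problem with !dot -Tpng tree.dot -o tree.png. Hello teacher: running it for the decision tree gives: 'dot' is not recognized as an internal or external command, operable program or batch file. What causes this?"
new_question = "!dot -Tpng tree.dot -o tree.png 的問題 老師好:  我在執行決策分類樹時，執行!dot -Tpng tree.dot -o tree.png跑出來的結果是:'dot' 不是內部或外部命令、可執行的程式或批次檔。不知道是什麼原因造成這樣，麻煩老師了。"
# EN: "dot command not found. While following the course video, converting dot to PNG (!dot -Tpng tree.dot -o tree.png) fails with the same error; downloading Graphviz did not help. How can I fix it? PS: Windows 10"
archived_question = "dot command not found 在觀看課程影片的時候，dot轉換成png檔時發生問題，執行程式: !dot -Tpng tree.dot -o tree.png錯誤訊息: 'dot' 不是內部或外部命令、可執行的程式或批次檔。後來我去上網下載graphviz後，依然沒辦法解決。想請問有什麼方法可以下載和解決?PS: 電腦是使用windows 10"
```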

```python
qa.answer_prediction(new_question=new_question, archived_question=archived_question, answerer='instructor')
```

```
1.0
```