Latest commit 4791d29 Mar 27, 2013 @bwbaugh Remove reference to fast-paced development
Currently have been busy with other course projects, so removing this
out of date reference.


Cause of Why

The goal of this project is to implement a Question Answering (QA) system that answers causal questions. It uses Wikipedia as its knowledge base and extracts answers to user questions from the articles.

We are currently focused on getting the system's engine working, so the user interface is on the back burner. Please stay tuned for updates!

Causal Questions

Causal questions are generally why-questions. They ask for a reason or a cause, such as "Why do birds sing?". This sets the project apart from most other QA systems, which usually try to answer factoid questions, such as "Where is the Louvre located?".
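To make the distinction concrete, here is a minimal sketch of a surface-level check for causal questions. It is not the project's actual classifier: the function name `looks_causal` and the cue phrases beyond "why" are assumptions chosen for illustration.

```python
import re

# Hypothetical cue phrases. The README only says causal questions are
# "generally why-questions"; the "what causes/caused" cue is an assumption.
CAUSAL_CUES = re.compile(r"^\s*(why|what\s+(causes|caused))\b", re.IGNORECASE)


def looks_causal(question):
    """Return True if the question opens with one of the causal cue phrases."""
    return bool(CAUSAL_CUES.match(question))


print(looks_causal("Why do birds sing?"))            # causal question
print(looks_causal("Where is the Louvre located?"))  # factoid question
```

A prefix check like this would miss causal questions phrased in other ways, so treat it only as an illustration of the two question types.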

Required Libraries

This project uses several libraries that must either be installed or be present in the project's lib/ directory. The following list names each required library and gives at least one source for obtaining it.

nltk

Natural Language Processing (NLP) functions such as sentence segmentation, word tokenization, and more.

nltk resources

In addition, once the nltk library is installed, you will need to download the following nltk resources using nltk.download().

  • 'taggers/maxent_treebank_pos_tagger/english.pickle'
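The download step can also be scripted. The sketch below fetches the resource only when nltk cannot already find it. The helper names `package_id` and `ensure_resource` are hypothetical, and the sketch assumes the package identifier passed to nltk.download() is the second component of the resource path.

```python
RESOURCE_PATH = 'taggers/maxent_treebank_pos_tagger/english.pickle'


def package_id(resource_path):
    """Return the package name embedded in a resource path.

    Assumption: the identifier is the second path component.
    """
    return resource_path.split('/')[1]


def ensure_resource(resource_path=RESOURCE_PATH):
    """Download the resource only if nltk cannot already find it."""
    import nltk  # imported here so package_id() works without nltk installed

    try:
        nltk.data.find(resource_path)
    except LookupError:
        nltk.download(package_id(resource_path))
```

Calling `ensure_resource()` once at startup would avoid the interactive downloader. Newer nltk releases ship different tagger resources, so the path above may need adjusting for them.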

gensim

Useful Information Retrieval (IR) algorithms, including string-to-vector transformations such as TF-IDF and similarity queries over the resulting vectors. Also implements topic models such as Latent Semantic Analysis.
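As a rough illustration of what TF-IDF weighting and a similarity query involve, here is a pure-Python sketch. It does not use gensim's API and is not how this project builds its index. The function names and the toy documents are assumptions, and the weighting is the plain `count * log(N / df)` variant.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: count * math.log(float(n_docs) / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


docs = [
    ["birds", "sing", "to", "attract", "mates"],
    ["birds", "fly", "south"],
    ["the", "louvre", "is", "in", "paris"],
]
vectors = tfidf_vectors(docs)
# The two bird documents share a term; the Louvre document shares none.
print(cosine(vectors[0], vectors[1]) > cosine(vectors[0], vectors[2]))
```

Terms that appear in every document get a weight of zero under this scheme, which is why common words contribute little to the similarity score.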

unidecode

Converts Unicode strings to their closest ASCII equivalent.
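A short sketch of the conversion follows. The wrapper `to_ascii` is hypothetical. It calls `unidecode` when the library is importable and otherwise falls back to a standard-library approximation that only strips combining accents, so the fallback handles far fewer characters than unidecode does.

```python
import unicodedata

try:
    from unidecode import unidecode
except ImportError:  # library not installed; use the rough fallback below
    unidecode = None


def to_ascii(text):
    """Return an ASCII approximation of a Unicode string."""
    if unidecode is not None:
        return unidecode(text)
    # Fallback (assumption, not unidecode's algorithm): decompose accented
    # characters, then drop everything that is not plain ASCII.
    decomposed = unicodedata.normalize('NFKD', text)
    return decomposed.encode('ascii', 'ignore').decode('ascii')


print(to_ascii("Café Müller"))
```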

Tornado

Provides a web server interface.

WikiExtractor.py

Converts text from MediaWiki markup format to plain text.

Optional Libraries

PyMongo

Tools for interacting with MongoDB databases. This is useful when the indices cannot be held entirely in memory. That is not a problem for a small corpus such as the Simple English Wikipedia, but it is for larger corpora such as the full English Wikipedia.

MongoDB

PyMongo is only a client interface, so a running instance of the MongoDB server itself is also required.

  • Pick the version for your platform.
  • If using Windows 7 or higher get the Windows 2008+ build.
  • Tested with version: 2.2.1

Start the database process before running the application.
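One way to confirm that the database process is up before launching the application is a plain TCP check, sketched below with the standard library only. The function name `mongod_is_listening` is hypothetical, and the sketch assumes MongoDB is listening on its default port, 27017, on the local machine.

```python
import socket


def mongod_is_listening(host='localhost', port=27017, timeout=1.0):
    """Return True if something accepts TCP connections on host:port.

    This only shows that the port is open; it does not verify that the
    listener is actually MongoDB.
    """
    try:
        conn = socket.create_connection((host, port), timeout)
    except OSError:  # refused, unreachable, or timed out
        return False
    conn.close()
    return True


if __name__ == '__main__':
    if not mongod_is_listening():
        print("MongoDB does not appear to be running on localhost:27017.")
```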