#### Problem Description
The program is a Question-Answering (QA) system that attempts to answer questions starting with "Where," "Who," "What," or "When." It utilizes the Wikipedia API to retrieve information and provide answers to user queries. The program also handles various question formats, including queries about birthdates, ages, and general information.

#### Usage Instructions
Users can ask questions starting with "Where," "Who," "What," or "When."
To exit the program, users can type "exit."

#### Examples of Input and Output 
Input: Who is Albert Einstein
Output: Albert Einstein ( EYEN-styne; German: [ˈalbɛɐt ˈʔaɪnʃtaɪn] ; 14 March 1879 – 18 April 1955) was a German-born theoretical physicist, widely held to be one of the greatest and most influential scientists of all time.

Please ask another question or type 'exit' to exit from the program:

Input: Where is the Eiffel Tower
Output: The Eiffel Tower ( EYE-fəl; French: Tour Eiffel [tuʁ ɛfɛl] ) is a wrought-iron lattice tower on the Champ de Mars in Paris, France.

Please ask another question or type 'exit' to exit from the program:

Input: exit
Output: Thank you! Goodbye.

#### Algorithm Description
The program starts with a loop that continuously accepts user input for questions.
It tokenizes the user's input using NLTK's natural language processing tools, including word tokenization and part-of-speech tagging.
It identifies the type of question (Where, Who, What, or When) by using regular expressions and extracts the relevant information from the question.
Depending on the question type and content, the program queries Wikipedia using the extracted information.
The program handles cases where multiple answers or disambiguation pages are encountered.
Finally, the program displays the answer or informs the user that the answer could not be found.
While the program is running, it logs each question and answer to a text file named "mylogfile.txt".
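
The logging step can be sketched in isolation. This is a minimal illustration, assuming the numbered "Ques)" / "A)" entry format used by the program; the helper name `log_exchange`, the in-memory buffer, and the sample answer text are hypothetical and only stand in for the real file and the real Wikipedia output.

```python
import io


def log_exchange(log, counter, question, answer):
    """Write one numbered question/answer pair to an open text stream."""
    log.write(f"{counter}Ques) {question}\n")
    # An empty answer is recorded with the same fallback text the program uses
    log.write(f"{counter}A) {answer or 'Answer not found.'}\n\n")


# io.StringIO stands in for the log file so the sketch runs without touching disk
buffer = io.StringIO()
log_exchange(buffer, 1, "Who is Albert Einstein", "Albert Einstein was a physicist.")
log_exchange(buffer, 2, "Who is Xyzzy", "")
print(buffer.getvalue())
```

With a real file, the same helper could be called inside `with open('mylogfile.txt', 'w', encoding='utf-8') as log:`, which closes the file even if the loop raises.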

In [None]:
!pip install wikipedia

In [None]:
import wikipedia
import sys
import os
import lxml
import re
import nltk
from nltk.chunk import tree2conlltags
from datetime import datetime
from nltk import ne_chunk, pos_tag, word_tokenize

In [None]:
# Downloading NLTK data

nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
nltk.download('maxent_ne_chunker')
nltk.download('words')

In [None]:
# This function tokenizes sentences and will extract named entities

def tokenizer(sent):
    words = nltk.word_tokenize(sent)
    pos_tags = nltk.pos_tag(words)
    tree = nltk.ne_chunk(pos_tags)
    conll_tags = nltk.tree2conlltags(tree)
    return conll_tags
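
The tokenizer returns a list of `(word, POS tag, IOB tag)` triples, so checking for a person comes down to scanning the third element of each triple. The sketch below assumes that shape; `has_person` is a hypothetical helper, and the sample triples are written by hand for illustration rather than captured from an actual NLTK run.

```python
def has_person(conll_tags):
    """Return True if any (word, pos, iob_tag) triple is tagged as a person."""
    return any(tag in ("B-PERSON", "I-PERSON") for _word, _pos, tag in conll_tags)


# Hand-written samples in the (word, POS, IOB) shape; not real NLTK output
sample = [
    ("Who", "WP", "O"),
    ("Is", "VBZ", "O"),
    ("Albert", "NNP", "B-PERSON"),
    ("Einstein", "NNP", "I-PERSON"),
]
no_person = [("Where", "WRB", "O"), ("Is", "VBZ", "O"), ("Paris", "NNP", "B-GPE")]

print(has_person(sample))
print(has_person(no_person))
```

Scanning every triple avoids relying on the entity sitting at a fixed position in the sentence.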

In [None]:
# This function extracts the question content from the user's input

def get_questions(s):
    rules = [[r'Where (Is|Was) (.+)', ["{0}"]],
             [r'Who (Is|Was) (.+)', ["{0}"]],
             [r'What (Is|Was) (.+) Age', ["{0}"]],
             [r'What (Is|Was) (.+)', ["{0}"]],
             [r'When (Is|Was) (.+) Born', ["{0}"]],
             [r'When (Is|Was) (.+) Birthday', ["{0}"]],
             [r'When (Is|Was) (.+)', ["{0}"]],
            ]
    
    for r, c in rules:
        m = re.match(r, s.rstrip('.!?'))
        if m:
            return m.groups()[1]
    return ""
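
Because the first matching rule wins, the specific patterns have to come before the general ones. The sketch below illustrates this with two of the "When" patterns; `extract_subject` and the two list names are hypothetical and exist only for this comparison.

```python
import re

SPECIFIC_FIRST = [
    r"When (Is|Was) (.+) Born",
    r"When (Is|Was) (.+)",
]
GENERAL_FIRST = list(reversed(SPECIFIC_FIRST))


def extract_subject(question, patterns):
    """Return the subject captured by the first matching pattern, or ''."""
    cleaned = question.title().rstrip(".!?")
    for pattern in patterns:
        match = re.match(pattern, cleaned)
        if match:
            return match.group(2)
    return ""


question = "when was albert einstein born?"
print(extract_subject(question, SPECIFIC_FIRST))  # Albert Einstein
print(extract_subject(question, GENERAL_FIRST))   # Albert Einstein Born
```

With the general pattern first, the trailing "Born" leaks into the subject, and the Wikipedia lookup would then be made with the wrong query.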

In [None]:
# This function is used to retrieve a Wikipedia summary for a given query

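def get_ans_wiki(query):
    try:
        return wikipedia.summary(query, sentences = 1)
    except wikipedia.exceptions.DisambiguationError as e:
        # Fall back to the first suggested page; that lookup can fail as well
        try:
            return wikipedia.summary(e.options[0], sentences = 1)
        except Exception:
            return ""
    except wikipedia.exceptions.PageError:
        # No page matches the query
        return ""
    except Exception:
        # Network errors and anything else unexpected
        return ""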

In [None]:
# This function will check for any digits within a text

def check_digits(text):
    return re.findall(r'\d+', text)
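
A short illustration of what the digit search returns, and how a year can be picked out of it. The summary string is abridged from the Albert Einstein example output, and `first_year` is a hypothetical helper, not part of the program.

```python
import re


def first_year(text):
    """Return the first standalone four-digit number in text as an int, or None."""
    match = re.search(r"\b(\d{4})\b", text)
    return int(match.group(1)) if match else None


summary = "Albert Einstein (14 March 1879 – 18 April 1955) was a German-born theoretical physicist."
print(re.findall(r"\d+", summary))  # ['14', '1879', '18', '1955']
print(first_year(summary))          # 1879
```

Taking the first four-digit number as the birth year is a heuristic: it holds for summaries that open with a lifespan, but any other four-digit number appearing earlier in the text would be picked instead.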

In [None]:
# This function will transform the user's question into a statement

def transform(u, q):
    u = u.rstrip('.!?')
    ur = re.sub(r'When (Is|Was) (.+) Born', r'\2 Was Born On ', u)
    if ur != u:
        return ur + q.strftime("%B %d, %Y") + '.'
    
    ur = re.sub(r'When (Is|Was) (.+) Birthday', r'\2 Birthday Is On ', u)
    if ur != u:
        # A birthday has no year, so format only the month and day
        return ur + q.strftime("%B %d") + '.'
    
    ur = re.sub(r'What (Is|Was) (.+) Age', r'\2 Age Is ', u)
    if ur != u:
        return ur + str(q) + '.'
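
Note that a `datetime` parsed from a year alone defaults to January 1, so the month and day in the generated statement only mean something if a full date is parsed. The sketch below shows one possible way to recover a "D Month YYYY" date from a summary. `parse_full_date` is a hypothetical helper, the summary string is abridged from the Albert Einstein example output, and `%B` is assumed to run under an English locale.

```python
import re
from datetime import datetime


def parse_full_date(text):
    """Return the first 'D Month YYYY' date in text as a datetime, or None."""
    match = re.search(r"\b(\d{1,2}) ([A-Z][a-z]+) (\d{4})\b", text)
    if not match:
        return None
    try:
        return datetime.strptime(match.group(0), "%d %B %Y")
    except ValueError:
        # The capitalized word was not a month name, or the day was out of range
        return None


summary = "Albert Einstein (14 March 1879 – 18 April 1955) was a German-born theoretical physicist."
year_only = datetime.strptime("1879", "%Y")
full = parse_full_date(summary)
print(year_only.strftime("%B %d, %Y"))  # January 01, 1879
print(full.strftime("%B %d, %Y"))       # March 14, 1879
```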
    

In [None]:
def main():
    try:
        # Here a log file is opened and will record the user's questions and the program's answers
        log = open('mylogfile.txt', "w+", encoding = "utf-8")
        print("*** This is a QA system by Abiha Abbas, Devin Schechter, Jeffrey Stejskal, and Stuti Tandon. It will try to answer questions that start with Where, Who, What or When")
        counter = 1
        while True:
            s = input()
            e = s
            
            if s == "exit":
                print("\nThank you! Goodbye.")
                break
                
            log.write(str(counter) + "Ques) " + s + "\n")
            flag = False
            final = ""
            wiki = ""
            
            if s != "" and s is not None:
                
                s = s.title()             # This capitalizes the first letter of each word in the input
                n = tokenizer(s)          # This tokenizes and extracts named entities from the input 
                r = get_questions(s)      # This extracts the question content from the input
                
                if r != "":
                    if ("Where" in s or "What" in s):
                        
                        wiki = get_ans_wiki(r)
                        match = re.search(r'\d{4}', wiki)
                        
                        if match is not None:
                            wiki = ""
                            
                    # Set the flag when any token carries a PERSON entity tag
                    if any(tag in ('B-PERSON', 'I-PERSON') for _, _, tag in n):
                        flag = True
                        
                    if "Who" in s and flag:
                        wiki = get_ans_wiki(r)
                        
                    if ("When" in s) and flag:
                        
                        wiki = get_ans_wiki(r)
                        result = check_digits(wiki)
                        
                        if result:
                            for dd in result:
                                if len(dd) == 4:
                                    birthdate = datetime.strptime(dd, "%Y")
                                    appends = transform(s, birthdate)
                                    if appends is not None:
                                        final = appends
                                        break
                                
                        else:
                            final = "" 
                    else:
                        final = get_ans_wiki(r)
                else:
                    final = ""
            else:
                print("Please ask a valid question!")
            
            if final == "":
                log.write(str(counter) + "A) Answer not found.\n\n")
                print("I am sorry, I don't know the answer.\n")
                
            else:
                try:
                    log.write(str(counter) + "A) " + final + "\n\n")
                    print(final + "\n")
                except Exception as GeneralException:
                    log.write(str(counter) + "A) Answer not found.\n\n")
                    print("I am sorry, I don't know the answer.\n")
                    
            counter = counter + 1
            print("Please ask another question or type 'exit' to exit from the program:\n")
    except Exception as GeneralException:
        print(GeneralException)
    finally:
        log.close()

# This will start the main program        
main()

#### References
[1] Dr. Heidari. (2023, October 6). George Mason. AIT526 - Natural Language Processing. Programming Assignment 3 Guide.