<a href="https://colab.research.google.com/github/JKrse/nlp_QA_QG_app/blob/multilingual/nlp_QA_QG_streamlit_app.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Streamlit demo for Question Answering & Question Generation

Updated: 1st September 2020

---

This notebook lets you run a streamlit app through COLAB. The demo loads several models for Question Answering and Question Generation, which is demanding in both storage and computation; running it on COLAB means you don't have to provide those resources yourself.

The final streamlit app combines two separate COLAB prototypes, one for Question Answering and one for Question Generation, in a single app with a user interface.



## **Question Answering**


- https://colab.research.google.com/drive/1s6WH45ZNSyM38FDcpqYfLXejDUDKTn5s?usp=sharing

This is a prototype with various pre-trained NLP models for the Question Answering task.

A pre-defined text snippet is set in "Text snippet - Data"; simply change the input to try a new topic.

Hugging Face models can be seen here (with the name of the pre-trained model): https://huggingface.co/transformers/pretrained_models.html

It is easy to add an extra model: simply follow the pattern in the code and add the name to 'model' and 'user2model'.
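The linked prototype is not reproduced here, so the sketch below uses the equivalent pattern from this notebook's own `functions.py`, where `Config.models_qa` maps a display name to the library that loads it. The helper names `add_huggingface_model` and `load_qa_model` are hypothetical, and the checkpoint name is chosen only for illustration.

```python
# Display name -> loading library, mirroring Config.models_qa in functions.py.
models_qa = {
    "distilbert-base-cased-distilled-squad": "huggingface_pipline",
}


def add_huggingface_model(registry, model_name):
    """Register a Hugging Face checkpoint under its own name (hypothetical helper)."""
    registry[model_name] = "huggingface_pipline"
    return registry


def load_qa_model(registry, model_name):
    """Load a registered Hugging Face QA model (downloads weights on first use)."""
    if registry.get(model_name) != "huggingface_pipline":
        raise ValueError(f"Not a registered Hugging Face model: {model_name!r}")
    # Imported lazily so the registry can be inspected without transformers.
    from transformers import pipeline

    return pipeline("question-answering", model=model_name)


# Example checkpoint name, used here only as an illustration.
add_huggingface_model(models_qa, "deepset/roberta-base-squad2")
```

Loading by library rather than by name means a new Hugging Face model needs only the one registry entry, instead of an extra `elif` branch.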

The pre-trained AllenNLP models can be found under Usage: https://demo.allennlp.org/reading-comprehension

## **Question Generation**

- https://colab.research.google.com/drive/1D5T4Y2AZapoeNeD9fNcX0hqNQ36Vg7Vp?usp=sharing

**To get started, simply collapse "Demo for Question Generation" and press run. Next, open "Main" and run the script.**

---

The framework mimics the 🤗 transformers pipeline for easy inference.

**There are three pipeline tasks:** 
1. **question-generation: for single task question generation models**
2. **multitask-qa-qg: for multi-task qa,qg models**
3. **e2e-qg: for end-to-end question generation** 

Each task can be run with a "small" or a "base" model. All models are easy to plug in, but ***this demo includes only the end-to-end question generation (both the small and base model)***.
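As a minimal sketch of how the end-to-end task is invoked, assuming the question_generation repository has been cloned into the working directory (as done later in this notebook): the two checkpoint names are the ones this demo uses, while `e2e_model_name` and `generate_questions` are hypothetical helper names.

```python
def e2e_model_name(size):
    """Return the end-to-end QG checkpoint name for a model size."""
    if size not in ("small", "base"):
        raise ValueError(f"Unknown model size: {size!r}")
    return f"valhalla/t5-{size}-e2e-qg"


def generate_questions(text, size="small"):
    """Generate questions for `text` with the end-to-end pipeline."""
    # Imported lazily: needs the cloned repository and downloads the model.
    from question_generation.pipelines import pipeline

    nlp = pipeline("e2e-qg", model=e2e_model_name(size))
    return nlp(text)
```

Calling `generate_questions(some_text, size="base")` would then return the generated questions for the passage.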

---
This demo uses patil-suraj's work: https://github.com/patil-suraj/question_generation

For fine-tuning the models, please see the git repository.

# Get started

## Installing Packages

In [1]:
!pip install streamlit
!pip install pyngrok
!pip install gitpython
!pip install nltk

!pip install allennlp==1.0.0
!pip install allennlp_models==1.0.0

!pip install -U transformers==3.0.0

!pip install langdetect


## Clone Git
The Question Generation git repository is cloned into this COLAB.

In [2]:
import os
import git
if not os.path.exists(f"{os.getcwd()}/question_generation/"):
    git.Git(os.getcwd()).clone("https://github.com/patil-suraj/question_generation.git")


# Write App scripts
These cells write the Python scripts used when running the app.

### App

In [14]:
%%writefile app.py

#===============================================================================

import streamlit as st
from screens import *

#===============================================================================

PAGE_CONFIG = {"page_title":"QG_QA_demo.io","page_icon":":shark:","layout":"centered"}
st.beta_set_page_config(**PAGE_CONFIG)

# App start: 
def main():
    menu = ["About", "Question Answering", "Question Generation"]
    choice = st.sidebar.selectbox("Menu", menu)
    
    #===========================================================================
    # Main Page: 
    if choice == "About":
        main_screen()

    #===========================================================================
    # QA Page: 
    if choice == "Question Answering": 
        QA_screen()

    #===========================================================================
    # QG Page: 
    if choice == "Question Generation":
        QG_screen()


if __name__ == '__main__':
    main()


Overwriting app.py


### Screens

In [15]:
%%writefile screens.py

#===============================================================================
import streamlit as st
from functions import *

import nltk
nltk.download('punkt')

from nltk.tokenize import sent_tokenize

import numpy as np
#===============================================================================

# Input variables  
models_dict_qg = Config.models_qg
models_dict_qa = Config.models_qa

demo_text_qg = Config.demo_text_qg
demo_text_qa = Config.demo_text_qa["context"]
demo_ques_qa = Config.demo_text_qa["question"]

language_lookup = Config.language_lookup

#===============================================================================

def QG_screen():
    st.title("Question Generation")	
    st.write("Question Generation (QG) aims to generate natural language "\
            "questions based on given contents, where the generated questions "\
            "need to be answerable from the contents.")
    #####################################
    st.subheader("Model selection")
    #### Model select and user input ####
    option_qg = st.selectbox("Select model size:",
                        (list(models_dict_qg.keys())))
    st.subheader("Provide context")
    user_input_qg = st.text_area("Please provide context text:", height=200,
                                value=f"{demo_text_qg}", max_chars=500)
    
    ######################################
    # List user context sentence by sentence
    sentences_qg = sent_tokenize(user_input_qg)
    list_context("Sentences", sentences_qg)

    ######################################
    #### Load the NLP model: ####
    nlp_qg  = modelsConfig_qg(option_qg)
    questions = nlp_qg(user_input_qg)
    
    st.write("**Question Generated:**")
    for i, question in enumerate(questions):
        st.write(f"{i+1}. {question}")


#===============================================================================

def QA_screen():
    st.title("Question Answering")
    st.write("Reading comprehension is the task of answering questions about "\
        "a passage of text to show that the system understands the passage.")
    #####################################
    #### Model select and user input ####
    st.subheader("Model selection")
    option_qa = st.selectbox("Select model:",
        (list(models_dict_qa.keys())))
    


    st.subheader("Provide context:")
    user_context_qa = st.text_area("Please provide text:", height=100,
                                value=f"{demo_text_qa}", max_chars=500)
    
    #####################################
    #### Context sentence by sentence ####
    sentences_qa = sent_tokenize(user_context_qa)
    list_context("Sentences", sentences_qa, checkbox=True)
    
    st.subheader("Provide the question(s):")
    user_question_qa = st.text_area("Please provide question text:", height=50,
                                value=f"{demo_ques_qa}", max_chars=200)
    

    questions = sent_tokenize(user_question_qa)


    #####################################
    #### Load the NLP model ####
    nlp_qa = modelsConfig_qa(option_qa)
    
    answers = {}
    for i, question in enumerate(questions):
        st.write(f"{i+1}. **Question**: {question}")
        answer = qa_compute_answer(nlp_qa, questions[i], 
                                    user_context_qa, models_dict_qa[option_qa])
        
        answers[answer] = answer_index(answer, user_context_qa)
        st.write(f"{i+1}. **Answer**: {answer}")


    #####################################
    #### Language detection ####
    
    if option_qa == "mrm8488/bert-multi-cased-finetuned-xquadv1 [multilingual]": 
        st.sidebar.markdown("**Note - This model is multilingual and supports the following languages:**")
        st.sidebar.markdown(", ".join(language_lookup.values()))
        st.sidebar.markdown("")
        
        st.sidebar.markdown("Context:")
        language_detect(user_context_qa,sidebar=True)
        st.sidebar.markdown("Question:")
        language_detect(user_question_qa,sidebar=True)

    #####################################
    #### Highlight answer in context ####
    st.subheader("Answer shown in context")
    num_answer = range(1, len(answers)+1)
    
    if len(num_answer) == 1: 
        index_range = answers[list(answers)[0]]
        write_answer(user_context_qa, index_range)
    
    elif len(num_answer) > 1:
        option_qa = st.selectbox("Select question in context:", list(num_answer))
        index_range = answers[list(answers)[option_qa-1]]
        write_answer(user_context_qa, index_range)


    #####################################
    
    
#===============================================================================


def main_screen():
    st.title("Question Answering & Question Generation")
    #####################################
    st.subheader("About :trophy:")
    st.write("Hello there and welcome!")
    st.write("In this prototype you are able to play around with "\
            "state-of-the-art NLP models in an easy, user-friendly environment.")
    st.write("There are two main tasks, namely Question Answering and Question "\
            "Generation. Use the sidebar to navigate to them.")
    
    #####################################
    st.subheader("Github :zap:")
    st.write("You can find the git repository for the app here:")
    st.write("https://github.com/JKrse/nlp_streamlit_QG_QA")

    #####################################
    st.subheader("COLAB :crocodile:")
    st.write("Playing around with all of the models can be demanding in both "\
            "storage (~6 GB) and computation. To cope with this, a COLAB notebook "\
            "has been developed that lets you run the streamlit app through COLAB. "\
            "Models are then stored, and computations run, on Google's service.")
    st.write("https://colab.research.google.com/drive/1zjWn1OEvL_OJxQufjCnrtIIq25qT9DMz?usp=sharing")
    st.write("Note you will need to modify the script a little bit by adding "\
            "your own ngrok authtoken to expose the local server.")







Overwriting screens.py
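`screens.py` calls `language_detect(text, sidebar=True)`, but no definition of it appears in the `functions.py` cell of this notebook. A minimal sketch of such a helper follows, assuming it is meant to use the `langdetect` package installed above; the message wording and the `language_name` helper are hypothetical.

```python
# Copy of Config.language_lookup, repeated so the sketch is self-contained.
LANGUAGE_LOOKUP = {
    "ar": "Arabic", "de": "German", "el": "Greek", "en": "English",
    "es": "Spanish", "hi": "Hindi", "ru": "Russian", "th": "Thai",
    "tr": "Turkish", "vi": "Vietnamese", "zh": "Chinese",
}


def language_name(code, lookup=LANGUAGE_LOOKUP):
    """Map a detected code such as 'en' or 'zh-cn' to a display name."""
    # Keep only the base language, so regional variants still match.
    base = code.split("-")[0].lower()
    return lookup.get(base, f"Unsupported ({code})")


def language_detect(text, sidebar=False):
    """Detect the language of `text` and show it in the page or the sidebar."""
    # Imported lazily: both packages are installed at the top of the notebook.
    import streamlit as st
    from langdetect import detect

    message = f"Detected language: {language_name(detect(text))}"
    if sidebar:
        st.sidebar.markdown(message)
    else:
        st.write(message)
    return message
```

If used, the two functions would go into `functions.py`, since `screens.py` imports everything from that module.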


### Functions

In [16]:
%%writefile functions.py

#===============================================================================
import streamlit as st

from allennlp.predictors.predictor import Predictor
import allennlp_models.rc

import torch
import transformers

import git
import os
if not os.path.exists(f"{os.getcwd()}/question_generation/"):
    git.Git(os.getcwd()).clone("https://github.com/patil-suraj/question_generation.git")

from question_generation.pipelines import pipeline as qg_pipline
from transformers import pipeline as qa_pipline

#===============================================================================
class Config:
    models_qg = {
        "Question generation (without answer supervision) [small]" : "qg",
        "Question generation (without answer supervision) [base]" : "qg",
    }
    
    models_qa = {
        "ELMo-BiDAF (Trained on SQuAD)" : "allennlp",
        "BiDAG (Trained on SQuAD)" : "allennlp",
        # "Transformer QA (Trained on SQuAD)" : "allennlp", # not working [hack]
        "distilbert-base-cased-distilled-squad" : "huggingface_pipline", 
        "bert-large-uncased-whole-word-masking-finetuned-squad"  : "huggingface_pipline",
        # Multilingual:
        "mrm8488/bert-multi-cased-finetuned-xquadv1 [multilingual]" : "huggingface_pipline",
        }


    demo_text_qg = "Infosys Limited is an Indian multinational corporation " \
        "that provides business consulting, information technology " \
        "and outsourcing services. The company is headquartered in " \
        "Bangalore, Karnataka, India. Infosys is the second-largest " \
        "Indian IT company after Tata Consultancy Services by 2017 revenue " \
        "figures and the 596th largest public company in the world based " \
        "on revenue. On 29 March 2019, its market capitalisation was $46.52 billion."


    demo_text_qa = {"context" : "Python is a programming language. Created by Guido van Rossum and first released in 1991.",
                    "question" : "Who created Python? When was Python first released?"}

    language_lookup = {
        "ar" : "Arabic",
        "de" : "German",
        "el" : "Greek",
        "en" : "English",
        "es" : "Spanish",
        "hi" : "Hindi",
        "ru" : "Russian",
        "th" : "Thai",
        "tr" : "Turkish",
        "vi" : "Vietnamese",
        "zh" : "Chinese"
    }

def list_context(title, list_input, checkbox = False):
    
    if checkbox:
        checkbox_sent = st.checkbox(f"Show {str(title).lower()}")
        
        if checkbox_sent:
            st.write(f"**{title}:**")
            for sent in list_input:
                st.write(f"- {sent}")
    else: 
        st.write(f"**{title}:**")
        for sent in list_input:
            st.write(f"- {sent}")


## Load model
@st.cache
def modelsConfig_qg(model):
    ## Question Generation: 
    if model == "Question generation (without answer supervision) [small]":
        model_selected = qg_pipline("e2e-qg", model="valhalla/t5-small-e2e-qg")
    
    elif model == "Question generation (without answer supervision) [base]":
        model_selected = qg_pipline("e2e-qg", model="valhalla/t5-base-e2e-qg")    
    
    else:
        raise Exception("Not a valid model")   

    return model_selected

@st.cache(allow_output_mutation=True)
def modelsConfig_qa(model):
    ## Question Answering: 
    if model == "ELMo-BiDAF (Trained on SQuAD)":
        model_selected = Predictor.from_path(
            "https://storage.googleapis.com/allennlp-public-models/bidaf-elmo-model-2020.03.19.tar.gz")
    
    elif model == "BiDAG (Trained on SQuAD)":
        model_selected = Predictor.from_path(
            "https://storage.googleapis.com/allennlp-public-models/bidaf-model-2020.03.19.tar.gz")
    
    elif model == "Transformer QA (Trained on SQuAD)":
        model_selected = Predictor.from_path(
            "https://storage.googleapis.com/allennlp-public-models/transformer-qa-2020-05-26.tar.gz")
    
    elif model == "distilbert-base-cased-distilled-squad":
        model_selected = qa_pipline("question-answering", model=f"{model}")
    
    elif model == "bert-large-uncased-whole-word-masking-finetuned-squad":
        model_selected = qa_pipline("question-answering", model=f"{model}")
    
    # Multilingual:
    elif model == "mrm8488/bert-multi-cased-finetuned-xquadv1 [multilingual]":
        model = "mrm8488/bert-multi-cased-finetuned-xquadv1"
        model_selected = qa_pipline("question-answering", model=f"{model}")
    
    else:
        raise Exception("Not a valid model")    
    return model_selected



def qa_compute_answer(model, question, context, model_library):
    if model_library == "allennlp":
        answer = predict_QnA_allennlp(question, context, model)["best_span_str"]
    
    elif model_library == "huggingface_pipline":
        answer = model(question=question, context=context)["answer"]
    return answer


def predict_QnA_allennlp(question, passage, model): 
    ''' 
    Helper that calls an AllenNLP predictor with the (question, context)
    input convention used for the Hugging Face models.
    Returns the full prediction dict.
    '''
    prediction = model.predict(passage=passage, question=question)
    return prediction


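def answer_index(answer, context):
    """Return [start, end] character offsets of the answer in the context."""
    # Prefer an exact match of the whole answer string.
    start = context.find(answer)
    if start != -1:
        return [start, start + len(answer)]

    # Fall back to locating the first and last answer words separately.
    words = answer.split()
    if not words:
        return [0, 0]
    first_idx = context.find(words[0])
    last_idx = context.find(words[-1])

    return [first_idx, last_idx + len(words[-1])]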


def write_answer(context, answer_span):
    st.write(f"{context[0: answer_span[0]]}"\
            f"**{context[answer_span[0]: answer_span[1]]}**"
            f"{context[answer_span[1]:]}")    

Overwriting functions.py
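The highlighting done by `answer_index` and `write_answer` can be tried outside Streamlit. The sketch below is a standalone variant under the assumption that the answer occurs verbatim in the context; `find_answer_span` and `highlight_answer` are hypothetical names, not functions from the app.

```python
def find_answer_span(answer, context):
    """Return (start, end) offsets of `answer` in `context`, or None if absent."""
    start = context.find(answer)
    if start == -1 or not answer:
        return None
    return start, start + len(answer)


def highlight_answer(context, span):
    """Wrap the span in Markdown bold markers; return the context as-is if None."""
    if span is None:
        return context
    start, end = span
    return f"{context[:start]}**{context[start:end]}**{context[end:]}"


context = ("Python is a programming language. "
           "Created by Guido van Rossum and first released in 1991.")
span = find_answer_span("Guido van Rossum", context)
print(highlight_answer(context, span))
```

Returning `None` for a missing answer keeps the context unmarked instead of bolding an arbitrary slice.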


# Connecting Streamlit app from COLAB

## Your **ngrok** input - get Your Authentication Tokens
First, sign up at ngrok.com and create a free account. This will give you access to several features.

To use ngrok, you will need an authentication token, which can be found on the dashboard of your ngrok account (https://dashboard.ngrok.com/get-started/setup).

You can find your authtoken under ‘Connect Your Account’, shown like this:

./ngrok authtoken xxxxxxxxxxxxxxxxxxxx

This is what you will use to connect your account.

This notebook follows this guide for the connection: https://blog.jcharistech.com/2020/08/16/how-to-run-streamlit-apps-from-googles-colab/
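If you prefer not to paste the token into a cell, one option is to read it from an environment variable and hand it to pyngrok. This is only a sketch: the variable name `NGROK_AUTHTOKEN` and both helper names are assumptions, and `ngrok.set_auth_token` is used as the pyngrok call that stores the token.

```python
import os


def read_ngrok_token(env=os.environ, name="NGROK_AUTHTOKEN"):
    """Return the token stored in `env[name]`, rejecting empty or placeholder values."""
    token = env.get(name, "").strip()
    # A value made only of "x" characters is treated as the unedited placeholder.
    if not token or set(token) == {"x"}:
        raise ValueError(f"Set {name} to the authtoken from your ngrok dashboard")
    return token


def configure_ngrok(env=os.environ):
    """Register the token with pyngrok instead of typing it into a cell."""
    # Imported lazily; pyngrok is installed at the top of this notebook.
    from pyngrok import ngrok

    ngrok.set_auth_token(read_ngrok_token(env))
```

Calling `configure_ngrok()` once before opening the tunnel would replace the `!ngrok authtoken` cell.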

In [None]:
!ngrok authtoken xxxxxxxxxxxxxxxxxxxx


## Generate the proxy server!
You are ready to start the app and open a tunnel to it.

In [7]:
!streamlit run app.py &>/dev/null&
from pyngrok import ngrok
# Setup a tunnel to the streamlit port 8501
public_url = ngrok.connect(port='8501')

## Get the public URL of your local server

In [8]:
print(public_url)

http://5bb19e934973.ngrok.io
