This notebook generates question recommendations using three different weak supervision approaches.

In [1]:
from tqdm import tqdm
import random
from IPython.display import clear_output
import os

from haystack.pipelines import QuestionGenerationPipeline, TranslationWrapperPipeline, QuestionAnswerGenerationPipeline
from haystack.nodes import QuestionGenerator, TransformersTranslator, FARMReader
from haystack.document_stores import ElasticsearchDocumentStore, InMemoryDocumentStore
from haystack.utils import print_questions

INFO - haystack.modeling.model.optimization -  apex not found, won't use it. See https://nvidia.github.io/apex/
ERROR - root -  Failed to import 'magic' (from 'python-magic' and 'python-magic-bin' on Windows). FileTypeClassifier will not perform mimetype detection on extensionless files. Please make sure the necessary OS libraries are installed if you need this functionality.


General helper functions needed by all three weak supervision approaches. They are called once, for the third weak supervision approach, as it performs best. The document store is the same for all three approaches, so the text is added to the document store only once.

In [2]:
def read_txt(txt_name: str) -> str:
    """Read the text of a file in data/split_files/ into a string."""

    dir_path = 'data/split_files/'
    path = os.path.join(dir_path, txt_name)
    with open(path, "r") as file:
        return file.read()

In [3]:
def write_doc(text1: str) -> InMemoryDocumentStore:
    """Write the given text into a new document store."""

    docs = [{"content": text1}]

    # Use an InMemoryDocumentStore for fast loading, as the store will only
    # contain a single document (one txt file).
    document_store = InMemoryDocumentStore()
    document_store.write_documents(docs)
    return document_store

Third weak supervision approach: Question Answer Generation Pipeline in combination with a Translation Wrapper Pipeline

In [7]:
def initialize_QA_Pipeline():
    """Load the pipeline with two translation models and a SQuAD reader model."""

    question_generator = QuestionGenerator()
    # Reader uses the locally saved "deepset/roberta-base-squad2" model
    reader = FARMReader('deepset/roberta-base-squad2')
    qag_pipeline = QuestionAnswerGenerationPipeline(question_generator, reader)

    # Input translator (German to English): "Helsinki-NLP/opus-mt-de-en"
    in_translator = TransformersTranslator(
        model_name_or_path="Helsinki-NLP/opus-mt-de-en")

    # Output translator (English to German): "Helsinki-NLP/opus-mt-en-de"
    out_translator = TransformersTranslator(
        model_name_or_path="Helsinki-NLP/opus-mt-en-de")

    # Wrap the question answer generation pipeline between the two translators
    pipeline_with_translation = TranslationWrapperPipeline(
        input_translator=in_translator,
        output_translator=out_translator,
        pipeline=qag_pipeline)
    return pipeline_with_translation

In [12]:
# QA generation
def get_QA_results(document_store, pipeline_with_translation):
    """Return the question answer pairs the pipeline creates for a document.

    Note that the document store always contains a single document, so the
    result of the last (and only) iteration is returned.
    """
    result = None  # stays None if the document store is empty
    for idx, document in enumerate(tqdm(document_store)):
        print(f"\n * Generating questions and answers for document {idx}: {document.content[:100]}...\n")
        result = pipeline_with_translation.run(documents=[document])
    return result

In [None]:
# Initialize the pipeline of the third weak supervision approach
pipeline_with_translation = initialize_QA_Pipeline()

In [76]:
dir_path = 'data/split_files/'

# Randomly select a txt file. This file is manually uploaded into the annotation tool and then annotated based
# on the annotator's intuition and the question answer pairs suggested by the pipeline.
random_txt_file = random.choice(os.listdir(dir_path))
random_txt_file

'WAZ.DE___167311___wirtschaft.txt'
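
Because `random.choice` draws a different file on every run, the selection above cannot be repeated later. The following is a minimal sketch, not part of the original notebook, of a reproducible variant. The helper name `pick_txt_file` and the `seed` parameter are hypothetical, and the sketch additionally assumes that only `.txt` entries in the directory should be considered.

```python
import os
import random


def pick_txt_file(dir_path: str, seed: int) -> str:
    """Return one .txt file name from dir_path, chosen reproducibly.

    Hypothetical helper: the original cell calls random.choice directly.
    """
    # os.listdir returns entries in arbitrary order, so sort them first;
    # otherwise the same seed could select different files on different runs.
    candidates = sorted(
        name for name in os.listdir(dir_path) if name.endswith(".txt")
    )
    if not candidates:
        raise FileNotFoundError(f"No .txt files found in {dir_path!r}")
    # A dedicated Random instance leaves the global random state untouched.
    return random.Random(seed).choice(candidates)
```

Calling `pick_txt_file('data/split_files/', seed=0)` in place of `random.choice(os.listdir(dir_path))` would then return the same file on every run, as long as the directory contents do not change.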

In [77]:
# Execute all steps to print the question answer pairs suggested by the third weak supervision approach
text1 = read_txt(random_txt_file)
print(text1)
document_store = write_doc(text1)
result = get_QA_results(document_store, pipeline_with_translation)
clear_output(wait=True)
print_questions(result)


Generated pairs:
 - Q:Was hat Porsche für einen halben Milliarden Euro zu bezahlen?
      A: Geldbußen
 - Q:Was ist der Grund, warum sich die Staatsanwaltschaft im Landkreis Stuttgart entschieden hat?
      A: der Diesel-Skandal
 - Q:An welchem Tag hat die Staatsanwaltschaft in Stuttgart beschlossen, eine Geldstrafe zu zahlen?
      A: Dienstag
 - Q:Was war Audis Geldstrafe im Oktober?
      A: 800 Millionen Euro
 - Q:Wie viel erhielt Audi in einer Strafe?
      A: 800 Millionen Euro
 - Q:In welcher Stadt hatte Audi einen Streit über die Verteilung?
      A: Braunschweig
 - Q:In Braunschweig hatte sich bereits in welchem Zuge über die Verteilung des Fahrzeugs gestritten?
      A: Diesel-Untersuchungen
 - Q:Was hatte bereits bei Bekanntgabe der Quartalszahlen in Höhe von einer Milliarde Euro Geld zurückgezahlt?
      A: Volkswagen
 - Q:Wie hoch waren die Bußgelder gegen VW und Audi?
      A: eine Milliarde Euro
 - Q:Was war von VW nicht ausführlich erklärt worden?
      A: die Geldbuße
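
The printed suggestions are convenient to read but awkward to reuse. As a rough sketch (not part of the original notebook), the text printed above can be parsed back into pairs and grouped by answer, which makes near-duplicate questions such as the two Audi questions easy to spot. The helper names `parse_pairs` and `group_by_answer` are hypothetical, and the parser assumes exactly the ` - Q:` / `A:` line layout shown in the output.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def parse_pairs(printed: str) -> List[Tuple[str, str]]:
    """Parse text in the ' - Q:... / A: ...' layout into (question, answer) pairs."""
    pairs = []
    question = None
    for line in printed.splitlines():
        stripped = line.strip()
        if stripped.startswith("- Q:"):
            question = stripped[len("- Q:"):].strip()
        elif stripped.startswith("A:") and question is not None:
            # A question may be followed by more than one answer line.
            pairs.append((question, stripped[len("A:"):].strip()))
    return pairs


def group_by_answer(pairs: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Collect all questions that share the same answer string."""
    grouped = defaultdict(list)
    for question, answer in pairs:
        grouped[answer].append(question)
    return dict(grouped)


# Two pairs copied from the output above.
sample = """Generated pairs:
 - Q:Was war Audis Geldstrafe im Oktober?
      A: 800 Millionen Euro
 - Q:Wie viel erhielt Audi in einer Strafe?
      A: 800 Millionen Euro
"""

for answer, questions in group_by_answer(parse_pairs(sample)).items():
    print(answer, "<-", questions)
```

Answers with more than one question are candidates for merging before the suggestions are handed to an annotator.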

First weak supervision approach: Question Generation Pipeline

In [78]:
question_generator = QuestionGenerator()
question_generation_pipeline = QuestionGenerationPipeline(question_generator)
for document in document_store:
    result = question_generation_pipeline.run(documents=[document])
    # Print the questions generated by the pipeline
    print_questions(result)

INFO - haystack.modeling.utils -  Using devices: CPU
INFO - haystack.modeling.utils -  Number of GPUs: 0



Generated questions:
 -  What does the Volkswagen-Tochter Porsche have to pay?
 -  What is the amount of money that must be paid to the Porsche?
 -  How much money must the Porsche pay for its Bußgeld?
 -  How much did Audi have to pay in fines?
 -  What was Audi accused of doing in October?
 -  How much money did Audi pay for a fine against Audi?
 -  How much money did Volkswagen receive in exchange for diesel-motors?
 -  What was the name of the company that received the money from Volkswagen and Audi?
 -  Where did the money received from VW and Audi go?
 -  Geldbußen gegen VW und Audi gingen an die jeweiligen Länderkassen.
 -  Volkswagen hatte bei der Verkündung seiner Quartalszahlen bereits Rückstellungen in Höhe von einer Milliarde Euro bekanntgegeben, wie hoch?
 -  What was the name of the marque Volkswagen?
 -  How many million Euro was the total profit of Porsche?
 -  What were the losses from the sale of the Porsches considered?
 -  How much did Audi owe Audi for dieselskand
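
Not every item in the list above is a usable question: one entry is a plain German statement and the last one is cut off. A minimal sketch of a post-processing filter, not part of the original notebook, could keep only entries that end in a question mark. The helper name `keep_questions` is hypothetical, and the check is a simple heuristic that does not detect the language or the quality of a question.

```python
from typing import List


def keep_questions(candidates: List[str]) -> List[str]:
    """Keep only entries that look like complete questions (end with '?')."""
    return [text.strip() for text in candidates if text.strip().endswith("?")]


# Three entries copied from the output above.
generated = [
    "How much did Audi have to pay in fines?",
    "Geldbußen gegen VW und Audi gingen an die jeweiligen Länderkassen.",
    "How much did Audi owe Audi for dieselskand",
]

print(keep_questions(generated))
```

Mixed-language entries that still end in a question mark pass this filter, so it only removes the most obvious failures.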

Second weak supervision approach: Question Generation Pipeline in combination with a Translation Wrapper Pipeline

In [80]:
# This cell loads the pipeline and prints the questions generated for the document in the document store

question_generator = QuestionGenerator()
question_generation_pipeline = QuestionGenerationPipeline(question_generator)

# Define translator for input data
in_translator = TransformersTranslator(
    model_name_or_path="Helsinki-NLP/opus-mt-de-en")

# Define translator for output data
out_translator = TransformersTranslator(
    model_name_or_path="Helsinki-NLP/opus-mt-en-de")

# Put two pipelines together
pipeline_with_translation = TranslationWrapperPipeline(
    input_translator=in_translator, output_translator=out_translator, pipeline=question_generation_pipeline)

# Get questions for the text of the document store
for document in document_store:
    result = pipeline_with_translation.run(documents=[document])
    print_questions(result)

INFO - haystack.modeling.utils -  Using devices: CPU
INFO - haystack.modeling.utils -  Number of GPUs: 0



Generated questions:
 -  What does Porsche have to pay for a half-billion euros?
 -  What is the reason the public prosecutor decided to do in the district of Stuttgart?
 -  On what date did the public prosecutor's office in Stuttgart decide to pay a fine?
 -  What was Audi's fine in October?
 -  How much did Audi receive in a penalty?
 -  In what city did Audi have a dispute over distribution?
 -  In Braunschweig had already declared a dispute over the distribution of the vehicle in the course of what?
 -  What had already returned money in the event of the announcement of its quarterly figures had already been announced in the amount of one billion euros?
 -  How much did the fines against VW and Audi amount to?
 -  What had not been explained in detail by VW?
