# 1. Scrape Pages

In [47]:
!pip install wikipedia -q



In [105]:
%load_ext autoreload
%autoreload 2

import wikipedia
import json

# Load the first five topics from the topic list
with open("wiki-list.json") as topic_list:
    topics = json.load(topic_list)["Topics"][:5]

# Fetch each topic's Wikipedia page and keep the first 512 characters as its context
contexts = []

for topic in topics:
    page = wikipedia.page(topic)
    contexts.append(page.content[:512])

print(contexts)

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload
Ancient_Egypt
Artificial Intelligence
Deep Learning
Natural Language Processing
Data Science
['Ancient Egypt was a civilization in Northeast Africa situated in the Nile Valley. Ancient Egyptian civilization followed prehistoric Egypt and coalesced around 3100 BC (according to conventional Egyptian chronology) with the political unification of Upper and Lower Egypt under Menes (often identified with Narmer). The history of ancient Egypt occurred as a series of stable kingdoms, separated by periods of relative instability known as Intermediate Periods: the Old Kingdom of the Early Bronze Age, the Middl', 'Artificial intelligence (AI) is intelligence—perceiving, synthesizing, and inferring information—demonstrated by machines, as opposed to intelligence displayed by non-human animals and humans. Example tasks in which this is done include speech recognition, computer vision, translation between (natura
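
The hard cut at 512 characters ends the first context mid-word ("the Middl"). One possible alternative, shown here only as a sketch, is to cut at the last sentence boundary inside the limit. The helper name `truncate_at_sentence` and the sample string are hypothetical and not part of the original notebook.

```python
def truncate_at_sentence(text: str, max_chars: int = 512) -> str:
    """Return at most max_chars characters, ending on a sentence boundary when possible."""
    if len(text) <= max_chars:
        return text
    clipped = text[:max_chars]
    # Find the last period that is followed by a space inside the clipped text
    cut = clipped.rfind(". ")
    if cut == -1:
        # No sentence boundary found: fall back to the hard cut
        return clipped
    return clipped[:cut + 1]


# Hypothetical sample text, used only to illustrate the helper
sample = (
    "Ancient Egypt was a civilization. It lasted for millennia. "
    "The Old Kingdom of the Early Bronze Age"
)
print(truncate_at_sentence(sample, max_chars=70))
```

In the loop above this could replace `page.content[:512]` with `truncate_at_sentence(page.content)`. The period-plus-space rule is a simple heuristic and will also cut after abbreviations such as "c. 3100".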

# 2. Generate Questions

In [66]:
%pip install transformers -q
%pip install datasets -q

Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.


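Import the QG module. The next cell first adds the parent directory to `sys.path` so that the `models` package can be imported.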

In [67]:
import sys
import os

# Resolve the directory one level above the folder that contains this notebook
notebook_path = os.path.abspath("wikiscraper.ipynb")  # Replace "wikiscraper.ipynb" with the name of your notebook
parent_dir = os.path.dirname(os.path.dirname(notebook_path))

# Add the parent directory to sys.path
sys.path.insert(0, parent_dir)
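
Re-running the cell above inserts the same directory into `sys.path` every time. The following is a minimal sketch of an idempotent variant using `pathlib`. The helper name `add_to_sys_path` is hypothetical, and the sketch assumes the notebook's working directory sits one level below the project root, as the cell above does.

```python
import sys
from pathlib import Path


def add_to_sys_path(directory: Path) -> str:
    """Put directory at the front of sys.path unless it is already listed."""
    resolved = str(directory.resolve())
    if resolved not in sys.path:
        sys.path.insert(0, resolved)
    return resolved


# Assumption: the current working directory is the folder that holds the notebook
project_root = Path.cwd().parent
added = add_to_sys_path(project_root)
print(added)
```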

In [109]:
%load_ext autoreload
%autoreload 2

from models.qg import QG

# context = "Python is an interpreted, high-level, general-purpose programming language. Created by Guido van Rossum and first released in 1991, Python's design philosophy emphasizes code readability with its notable use of significant whitespace."

qg = QG(model="valhalla/t5-base-e2e-qg", tokenizer="t5-base")
# qg1 = QG("the-coorporation/t5-qgar", "t5-base")

contexts_questions = []

for context in contexts:
    context_questions = qg(context)
    contexts_questions.append(context_questions)

print(contexts_questions)

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload
[{'context': 'Ancient Egypt was a civilization in Northeast Africa situated in the Nile Valley. Ancient Egyptian civilization followed prehistoric Egypt and coalesced around 3100 BC (according to conventional Egyptian chronology) with the political unification of Upper and Lower Egypt under Menes (often identified with Narmer). The history of ancient Egypt occurred as a series of stable kingdoms, separated by periods of relative instability known as Intermediate Periods: the Old Kingdom of the Early Bronze Age, the Middl', 'questions': ['Where was Ancient Egypt located?', 'When did Ancient Egypt civilization coalesce?', 'Who was often identified with Narmer?', 'What are the periods of relative instability known as?']}, {'context': 'Artificial intelligence (AI) is intelligence—perceiving, synthesizing, and inferring information—demonstrated by machines, as opposed to intelligence displayed by non-hum
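
The output above is a list of dictionaries, each with a `context` string and a list of `questions`. As a sketch of how that structure might be consumed, the hypothetical helper `flatten_questions` below turns it into `(context, question)` pairs. The example record reuses one question from the output with a shortened context.

```python
from typing import Dict, List, Tuple


def flatten_questions(contexts_questions: List[Dict]) -> List[Tuple[str, str]]:
    """Turn [{'context': ..., 'questions': [...]}] into (context, question) pairs."""
    pairs = []
    for entry in contexts_questions:
        for question in entry["questions"]:
            pairs.append((entry["context"], question))
    return pairs


# Shortened example record in the same shape as the QG output
example = [
    {
        "context": "Ancient Egypt was a civilization in Northeast Africa situated in the Nile Valley.",
        "questions": [
            "Where was Ancient Egypt located?",
            "When did Ancient Egypt civilization coalesce?",
        ],
    }
]
pairs = flatten_questions(example)
for context, question in pairs:
    print(question)
```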

# 4. Question Answering

In [125]:
%load_ext autoreload

%autoreload 2

from transformers import pipeline
question_answerer = pipeline("question-answering", model='distilbert-base-cased-distilled-squad')

question_context_answers = []

for context_questions in contexts_questions:
    context = context_questions['context']
    question_answers = []

    # Answer every generated question against the context it was generated from
    for question in context_questions['questions']:
        print(question)
        result = question_answerer(question=question, context=context)
        question_answers.append({ "question": question, "answer": result["answer"], "score": result["score"] })

    # Keep the context together with its answered questions
    question_context_answers.append({
        "context": context,
        "questions": question_answers
    })

for question_context_answer in question_context_answers:
    print(question_context_answer)

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload
Where was Ancient Egypt located?
When did Ancient Egypt civilization coalesce?
Who was often identified with Narmer?
What are the periods of relative instability known as?
What does AI stand for?
What are examples of tasks in which AI is displayed by machines?
Google Search is an example of what type of AI?
Deep learning is part of what family of machine learning methods?
Deep learning can be supervised, semi-supervised or what?
What has deep learning architectures been applied to?
What subfield of linguistics, computer science, and artificial intelligence is concerned with the interactions between computers and human language?
What is the goal of a computer capable of understanding the contents of documents?
The technology can then accurately extract what?
Data science uses what to extract or extrapolate knowledge and insights from noisy, structured, and unstructured data?
Data science integrates d

# 3. Filter Questions
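
One way the questions could be filtered, offered only as a sketch, is to keep those whose question-answering `score` reaches a threshold. The helper name `filter_by_score`, the `min_score` value of 0.5, and the scores and answers in the example record are all placeholders chosen for illustration; they are not output from the model and not necessarily the approach this notebook takes.

```python
from typing import Dict, List


def filter_by_score(question_context_answers: List[Dict], min_score: float = 0.5) -> List[Dict]:
    """Keep only the questions whose answer score is at least min_score."""
    filtered = []
    for entry in question_context_answers:
        kept = [qa for qa in entry["questions"] if qa["score"] >= min_score]
        # Drop contexts that have no remaining questions
        if kept:
            filtered.append({"context": entry["context"], "questions": kept})
    return filtered


# Placeholder record in the shape built in the Question Answering step;
# the answers and scores are invented for illustration
example = [
    {
        "context": "Ancient Egypt was a civilization in Northeast Africa situated in the Nile Valley.",
        "questions": [
            {"question": "Where was Ancient Egypt located?", "answer": "Northeast Africa", "score": 0.9},
            {"question": "Who was often identified with Narmer?", "answer": "Nile Valley", "score": 0.2},
        ],
    }
]
filtered = filter_by_score(example, min_score=0.5)
print(filtered)
```

A suitable threshold depends on the model and the data, so it is worth inspecting the score distribution before fixing a value.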