In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

pd.set_option('display.max_colwidth', None)

# First Model at: https://huggingface.co/potsawee/t5-large-generation-race-Distractor

In [2]:
MAX_INPUT_LENGTH = 512
MAX_TARGET_LENGTH = 128

In [3]:
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

first_tokenizer = AutoTokenizer.from_pretrained("potsawee/t5-large-generation-race-Distractor")
first_model = AutoModelForSeq2SeqLM.from_pretrained("potsawee/t5-large-generation-race-Distractor").to("cuda")

  from .autonotebook import tqdm as notebook_tqdm


In [4]:
import re
def postprocess_distractor(dis):
    """Post process generated distractors.
    Pipeline: Remove model's tags, remove redundant spaces, capitalize first word.

    Args:
        dis (str): generated distractor

    Returns:
        str: cleaned distractor
    """

    new_dis = dis
    special_tags = ['</s>', '<unk>', '<sep>', '<pad>',]
    for tag in special_tags:
        new_dis = new_dis.replace(tag, '') 

    new_dis = re.sub(r'\s+', ' ', new_dis)

    new_dis_c = list(new_dis)
    for i in range(len(new_dis_c)):
        if len(new_dis_c[i].strip()) == 0:
            continue
        new_dis_c[i] = new_dis_c[i].upper()
        new_dis_c = new_dis_c[i:]
        break
    return ''.join(new_dis_c)

In [5]:
def generate_distractors_first_model(context, question, answer):
    """Generate 3 distractors using the first model. Need available first_model and first_tokenizer

    Args:
        context (str): 
        question (str): 
        answer (str): 
    Returns:
        (dis1, dis2, dis3)
    """
    input = ' '.join([question, first_tokenizer.sep_token, answer, first_tokenizer.sep_token, context])
    input_tokens = first_tokenizer(input, return_tensors='pt').to('cuda')

    output = first_model.generate(**input_tokens, max_new_tokens=MAX_TARGET_LENGTH)

    distractors = first_tokenizer.decode(output[0], skip_special_tokens=False)
    distractors = distractors.replace(first_tokenizer.pad_token, "").replace(first_tokenizer.eos_token, "")
    distractors = [dis.strip() for dis in distractors.split(first_tokenizer.sep_token)]

    assert len(distractors) == 3
    dis1 = postprocess_distractor(distractors[0])
    dis2 = postprocess_distractor(distractors[1])
    dis3 = postprocess_distractor(distractors[2])

    return (dis1, dis2, dis3)

# Second Model: https://huggingface.co/voidful/bart-distractor-generation-both

In [47]:
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

second_tokenizer = AutoTokenizer.from_pretrained("voidful/bart-distractor-generation-both")
second_model = AutoModelForSeq2SeqLM.from_pretrained("voidful/bart-distractor-generation-both")

In [46]:
MAX_SECOND_INPUT_LENGTH = 1024
MAX_SECOND_TARGET_LENGTH = 128

In [63]:
def generate_distractors_second_model(context, question, answer):
    """Generate 3 distractors using the second model. Need available second_model and second_tokenizer

    Args:
        context (str): 
        question (str): 
        answer (str): 
    Returns:
        (dis1, dis2, dis3)
    """
    input = ' '.join([context, second_tokenizer.sep_token, question, second_tokenizer.sep_token, answer])
    input_tokens = second_tokenizer(input, max_length=MAX_SECOND_INPUT_LENGTH, padding='max_length', truncation=True, return_tensors='pt')

    output = second_model.generate(input_tokens['input_ids'])
    distractor = second_tokenizer.decode(output[0], skip_special_tokens=True)
    
    return distractor

In [72]:
context = r"""
Machine learning (ML) is a field of study in artificial intelligence concerned with the development and study of statistical algorithms that can learn 
from data and generalize to unseen data, and thus perform tasks without explicit instructions. Recently, artificial neural networks have been able to 
surpass many previous approaches in performance. ML finds application in many fields, including natural language processing, computer vision, speech recognition, 
email filtering, agriculture, and medicine. When applied to business problems, it is known under the name predictive analytics. Although not all machine learning 
is statistically based, computational statistics is an important source of the field\'s methods. The mathematical foundations of ML are provided by mathematical 
optimization (mathematical programming) methods. Data mining is a related (parallel) field of study, focusing on exploratory data analysis (EDA) through unsupervised learning. 
From a theoretical viewpoint, probably approximately correct (PAC) learning provides a framework for describing machine learning. HistoryThe term machine learning was 
coined in 1959 by Arthur Samuel, an IBM employee and pioneer in the field of computer gaming and artificial intelligence. The synonym self-teaching computers was also 
used in this time period. Although the earliest machine learning model was introduced in the 1950s when Arthur Samuel invented a program that calculated the winning 
chance in checkers for each side, the history of machine learning roots back to decades of human desire and effort to study human cognitive processes. In 1949, Canadian 
psychologist Donald Hebb published the book The Organization of Behavior, in which he introduced a theoretical neural structure formed by certain interactions among nerve cells. 
Hebb\'s model of neurons interacting with one another set a groundwork for how AIs and machine learning algorithms work under nodes, or artificial neurons used by computers 
to communicate data. Other researchers who have studied human cognitive systems contributed to the modern machine learning technologies as well, including logician Walter 
Pitts and Warren McCulloch, who proposed the early mathematical models of neural networks to come up with algorithms that mirror human thought processes. By the early 1960s an 
experimental "learning machine" with punched tape memory, called Cybertron, had been developed by Raytheon Company to analyze sonar signals, electrocardiograms, and speech 
patterns using rudimentary reinforcement learning. It was repetitively "trained" by a human operators/teacher to recognize patterns and equipped with a "goof" button to cause it 
to re-evaluate incorrect decisions. A representative book on research into machine learning during the 1960s was Nilsson\'s book on Learning Machines, dealing mostly with machine 
learning for pattern classification. Interest related to pattern recognition continued into the 1970s, as described by Duda and Hart in 1973. In 1981 a report was given on using 
teaching strategies so that an artificial neural network learns to recognize 40 characters (26 letters, 10 digits, and 4 special symbols) from a computer terminal.""".replace("\n", "")

question = """For what purpose did Walter Pitts and Warren McCulloch propose the early mathematical models of neural networks?"""
answer = """To come up with algorithms that mirror human thought processes."""

generate_distractors_second_model(context, question, answer)

'To show how machine learning can be applied'

# Overall Run

In [6]:
def add_distractors_to_dataframe(question_path, context):
    """Add distractors to dataframe containing questions and answers

    Args:
        question_path (DataFrame):
    Returns:
        DataFrame: dataframe with distractors added
    """

    questions = pd.read_csv(question_path)
    distractors = {'dis1': [], 'dis2': [], 'dis3': []}
    for ques, ans in zip(questions['question'], questions['answer']):
        dis1, dis2, dis3 = generate_distractors_first_model(context, ques, ans)
        distractors['dis1'].append(dis1)
        distractors['dis2'].append(dis2)
        distractors['dis3'].append(dis3)

    full_questions_with_distractors = pd.concat([questions, pd.DataFrame(distractors)], axis=1)
    
    return full_questions_with_distractors

In [7]:
def get_distractors_for_dataframe(question_path, context):
    """get distractors for dataframe containing questions and answers

    Args:
        question_path (DataFrame):
    Returns:
        DataFrame: dataframe with distractors added
    """

    questions = pd.read_csv(question_path)
    distractors = {'question': [], 'answer': [], 'dis1': [], 'dis2': [], 'dis3': []}

    for ques, ans in zip(questions['question'], questions['answer']):
        dis1, dis2, dis3 = generate_distractors_first_model(context, ques, ans)
        distractors['dis1'].append(dis1)
        distractors['dis2'].append(dis2)
        distractors['dis3'].append(dis3)
        distractors['question'].append(ques)
        distractors['answer'].append(ans)
    
    return pd.DataFrame(distractors)

In [8]:
def count_same_distractor(questions, keys=['dis1', 'dis2', 'dis3']):
    cnt = 0
    for row in questions.iterrows():
        dis1 = row[1][keys[0]]
        dis2 = row[1][keys[1]]
        dis3 = row[1][keys[2]]

        if (dis1.lower().strip() == dis2.lower().strip()) and (dis1.lower().strip() == dis3.lower().strip()) and (dis2.lower().strip() == dis3.lower().strip()):
            cnt += 1
    return cnt

In [9]:
context_path = 'wikipedia_articles/personalized_learning.txt'
question_path = 'generated_questions/my_model/personalized_learning/questions_and_distractors_personalized_learning.csv'

In [10]:
f = open(context_path, 'r')
context = f.read()

# questions_and_distractors = add_distractors_to_dataframe(question_path, context)
questiosn_and_distactors = get_distractors_for_dataframe(question_path, context)
count_same_distractor(questiosn_and_distactors)

Token indices sequence length is longer than the specified maximum sequence length for this model (2120 > 512). Running this sequence through the model will result in indexing errors


11

In [11]:
context_path = 'wikipedia_articles/economic_depression.txt'
question_path = 'generated_questions/my_model/economic_depression/questions_and_distractors_economic_depression.csv'

In [12]:
f = open(context_path, 'r')
context = f.read()

# questions_and_distractors = add_distractors_to_dataframe(question_path, context)
questiosn_and_distactors = get_distractors_for_dataframe(question_path, context)
count_same_distractor(questiosn_and_distactors)

3

Personality Same Distractor: 38
Personalized Learning Same Distractor: 1
Economic Depression Same Distractor: 3

In [9]:
questions_and_distractors.to_csv("generated_questions/mixqg_questions_and_t5lgrd_distractors_personlized_learning.csv")

In [10]:
questions_and_distractors.head(5)

Unnamed: 0,source_sent,question,answer,dis1,dis2,dis3
0,"Personalized learning, individualized instruction, personal learning environment and direct instruction all refer to efforts to tailor education to meet the different needs of students.","What do personalized learning, personal learning environment, and direct instruction all refer to efforts to tailor education to meet?",Different needs of students.,Different levels of students.,Different types of students.,Different types of teachers.
1,"\n\nThe use of the term ""personalized learning"" dates back to at least the early 1960s, but there is no widespread agreement on the definition and components of a personal learning environment.","When did the term ""personalized learning"" date back to?",Early 1960s.,Early 1970s.,Early 2000s.,Early 2000s.
2,Even enthusiasts for the concept admit that personal learning is an evolving term and doesn't have any widely accepted definition.,Is personalized learning a widely accepted term?,Doesn't have any widely accepted definition.,Is a new concept in education.,Is a new way of teaching.,Is a new way of teaching.
3,"\n\nIn 2005, Dan Buckley defined two ends of the personalized learning spectrum: ""personalization for the learner"", in which the teacher tailors the learning, and ""personalization by the learner"", in which the learner develops skills to tailor his own learning.","Who defines personalized learning as ""personalization for the learner""?",Dan Buckley.,Lucy Calkins.,Katie Wood Ray.,Eduard Pogorskiy.
4,"\n\nIn 2005, Dan Buckley defined two ends of the personalized learning spectrum: ""personalization for the learner"", in which the teacher tailors the learning, and ""personalization by the learner"", in which the learner develops skills to tailor his own learning.",What does personalized learning mean?,"""personalization for the learner"".","""personalization by the learner"".","""personalization by the teacher"".","""personalization by the learner""."


In [11]:
# write full question to file

questions_and_distractors_file_path = "generated_questions/full_mixqg_questions_and_t5lgrd_distractors_personalized_learning"
with open(questions_and_distractors_file_path, 'w') as f:
    f.write(f"Context: {context}\n")
    f.write("--------------------\n")

    for row in questions_and_distractors.iterrows():
        f.write(f"\nQuestion: {row[1]['question']}")
        f.write(f"\nAnswer: {row[1]['answer']}")
        f.write(f"\nDistractor 1: {row[1]['dis1']}")
        f.write(f"\nDistractor 2: {row[1]['dis2']}")
        f.write(f"\nDistractor 3: {row[1]['dis3']}\n")

In [12]:
questions_and_distractors

Unnamed: 0,source_sent,question,answer,dis1,dis2,dis3
0,"Personalized learning, individualized instruction, personal learning environment and direct instruction all refer to efforts to tailor education to meet the different needs of students.","What do personalized learning, personal learning environment, and direct instruction all refer to efforts to tailor education to meet?",Different needs of students.,Different levels of students.,Different types of students.,Different types of teachers.
1,"\n\nThe use of the term ""personalized learning"" dates back to at least the early 1960s, but there is no widespread agreement on the definition and components of a personal learning environment.","When did the term ""personalized learning"" date back to?",Early 1960s.,Early 1970s.,Early 2000s.,Early 2000s.
2,Even enthusiasts for the concept admit that personal learning is an evolving term and doesn't have any widely accepted definition.,Is personalized learning a widely accepted term?,Doesn't have any widely accepted definition.,Is a new concept in education.,Is a new way of teaching.,Is a new way of teaching.
3,"\n\nIn 2005, Dan Buckley defined two ends of the personalized learning spectrum: ""personalization for the learner"", in which the teacher tailors the learning, and ""personalization by the learner"", in which the learner develops skills to tailor his own learning.","Who defines personalized learning as ""personalization for the learner""?",Dan Buckley.,Lucy Calkins.,Katie Wood Ray.,Eduard Pogorskiy.
4,"\n\nIn 2005, Dan Buckley defined two ends of the personalized learning spectrum: ""personalization for the learner"", in which the teacher tailors the learning, and ""personalization by the learner"", in which the learner develops skills to tailor his own learning.",What does personalized learning mean?,"""personalization for the learner"".","""personalization by the learner"".","""personalization by the teacher"".","""personalization by the learner""."
5,"\n\nIn 2005, Dan Buckley defined two ends of the personalized learning spectrum: ""personalization for the learner"", in which the teacher tailors the learning, and ""personalization by the learner"", in which the learner develops skills to tailor his own learning.",What does personalized learning mean?,"""personalization by the learner"".","""personalization for the learner"".","""personalization for the teacher"".","""personalization for the teacher""."
6,\n\nThe United States National Education Technology Plan 2017 defines personalized learning as follows:\nPersonalized learning refers to instruction in which the pace of learning and the instructional approach are optimized for the needs of each learner.,What defines personalized learning as follows?,United States National Education Technology Plan 2017.,The Writing Workshop.,The Teaching of English.,The Practical Guide to Envisioning and Transforming Education.
7,\n\nThe United States National Education Technology Plan 2017 defines personalized learning as follows:\nPersonalized learning refers to instruction in which the pace of learning and the instructional approach are optimized for the needs of each learner.,What is optimized for the needs of each learner?,Pace of learning.,Instructional approach.,Learning objectives.,Instructional content.
8,"In addition, learning activities are meaningful and relevant to learners, driven by their interests, and often self-initiated.","What is the meaning of ""personalized learning""?",Meaningful and relevant to learners.,Developed by Microsoft.,Developed by teachers.,Developed by teachers.
9,"\n\nAccording to researcher Eduard Pogorskiy:\nICT can be a powerful tool for personalized learning as it allows learners access to research and information, and provides a mechanism for communication, debate, and recording learning achievements.",Who said that ICT can be a powerful tool for personalized learning?,Eduard Pogorskiy.,Antony Smith.,Lucy Calkins.,Katie Wood Ray.
