# Quiz / Question Generation using NLP

# Part A – Application area review

## 1.1 Utilities of Quiz applications

Quizzes are an integral part of modern educational practice, and quiz software uses technology to enhance the learning experience. In digital education, quiz applications offer numerous advantages, including real-time tests, immediate feedback, device flexibility, and the ability to run assessments outside the traditional classroom (Romero and Remón, 2022). This section explores these benefits.

### Advantages of Quiz-Related Software Applications
**Real-time Tests:**
Quiz-related software applications enable the creation and administration of real-time tests, allowing educators to assess students' knowledge instantly. This feature fosters a dynamic learning environment, where learners can receive timely feedback and adapt their understanding of the material promptly.

**Immediate Feedback:**
One of the standout features of quiz applications is the provision of immediate feedback. Learners can gauge their performance instantly, identifying areas of strength and weakness. This quick feedback loop promotes active learning, as students can address misconceptions in real-time, reinforcing the learning process.

**Device Flexibility:**
Quiz applications offer the convenience of accessibility from various devices such as smartphones, tablets, and personal computers. This flexibility accommodates diverse learning styles and preferences, allowing students to engage with quizzes at their convenience, regardless of the device they use.

**Anywhere, Anytime Access:**
The asynchronous nature of quiz applications facilitates learning beyond traditional classroom boundaries. Students can participate in tests and games at any time and from anywhere, promoting a self-paced learning approach. This accessibility empowers learners to take control of their education, fostering a sense of autonomy.

### Effectiveness of quiz-related applications
Research supports the effectiveness of quiz-related software applications in education, particularly online pre-lecture quizzes. Evans et al. (2021) found that introducing pre-lecture quizzes produced a sustained, measurable increase in students' engagement with course content and improved lecture attendance. This heightened engagement dispelled concerns that online quizzes would weaken students' commitment to traditional lectures. The intervention also improved overall academic performance, with observed gains in grades, suggesting that quiz-related software contributes to better comprehension and retention of course content.

## 1.2 Question generation using AI

Automatic Question Generation (AQG) is a prominent application of artificial intelligence that creates questions with minimal human intervention. It uses machine learning models to produce valid, fluent, and comprehensible questions, typically from a given passage and, optionally, a target answer that provides additional context (Shaheer et al., 2023). The overall aim is to generate questions that are syntactically and semantically correct and relevant to the provided context. The key objectives are the following:

**Valid Questions:**
AQG aims to produce questions that are not only grammatically correct but also logically valid. The machine learning models are trained to ensure that the generated questions make sense in the given context and contribute meaningfully to the overall comprehension.

**Fluency:**
Another crucial objective is to achieve fluency in question construction. The generated questions should read naturally, mirroring the linguistic nuances present in the original passage. This fluency enhances the overall learning experience for students.

**Minimal Human Involvement:**
One of the defining features of AQG is its ability to minimize human involvement in the question generation process. By automating this task, educators can save time and resources while applying a consistent, standardized approach to generating questions (Mulla and Gharpure, 2023).

Overall, Automatic Question Generation, propelled by advances in machine learning, has shown potential to reshape educational practice. By creating valid, fluent, and contextually relevant questions with minimal human input, AQG offers a powerful way to support language learning and comprehension. As the technology evolves, integrating AQG into educational settings promises more dynamic and efficient learning environments.

The figures below, from Mulla and Gharpure (2023), summarize the evaluation results reported for these approaches.

<a href="https://ibb.co/ssH8zjs"><img src="https://i.ibb.co/9ytQSny/SS1.png" alt="SS1" border="0"></a>
<a href="https://ibb.co/qBk1tqq"><img src="https://i.ibb.co/8M68Wvv/SS2.png" alt="SS2" border="0"></a>


# Part B – Evaluation of AI techniques

## 2.1 Question generation using Neural Networks

Neural networks play a central role in question generation: they learn to turn a given text sequence into meaningful questions. Ferreira (2019) applied two neural architectures to this task: Recurrent Neural Networks (RNNs) with attention mechanisms, and the Transformer. Because neural networks capture intricate patterns and relationships in data, they can generate distinct and contextually relevant questions. In the study's human evaluations, the generated questions received positive scores for being correct, relevant, and even interesting.

The approach also has challenges. The comparison between the RNN and Transformer models was inconclusive, because the generated questions shared few words with the reference questions and showed coherence problems such as repeated words. Even so, the automatic evaluation scores approached state-of-the-art levels, which suggests the Transformer architecture is a promising alternative for advancing question generation tools. A further drawback is that generated questions often include words from the answer itself, producing unintended questions. This limitation is especially visible in existing neural question generation (NQG) systems built on the RNN sequence-to-sequence model: RNNs struggle to model high-level variability, so they can generate improper questions that lack a specific target (Kim et al., 2019).

To improve these models further, the study suggests exploring alternative pre-trained word embeddings, such as BERT, and tuning the hyperparameters, which were initially left at their default values.
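The operation shared by attention-based RNN decoders and the Transformer is a weighted average over encoder states, where the weights express how relevant each passage position is to the current output position. The following is a minimal NumPy sketch of scaled dot-product attention, the scoring variant used in the Transformer; RNN models may use other scoring functions, such as additive attention. The random matrices are placeholders for learned query, key, and value projections, and the function name is illustrative.

```python
import numpy as np


def scaled_dot_product_attention(q, k, v):
    """Compute softmax(Q K^T / sqrt(d_k)) V.

    q: (n_queries, d_k), k: (n_keys, d_k), v: (n_keys, d_v)
    Returns the attended values (n_queries, d_v) and the weights (n_queries, n_keys).
    """
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)                          # similarity of each query to each key
    scores = scores - scores.max(axis=-1, keepdims=True)     # subtract row max for numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax over the keys
    return weights @ v, weights


rng = np.random.default_rng(0)
q = rng.normal(size=(2, 4))  # e.g. two decoder (question) positions
k = rng.normal(size=(5, 4))  # five encoder (passage) positions
v = rng.normal(size=(5, 3))
out, w = scaled_dot_product_attention(q, k, v)
print(out.shape, w.shape)  # (2, 3) (2, 5)
```

Each row of `w` is a probability distribution over the passage positions, so every generated token is conditioned on a context-dependent mix of the input.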

Overall, while neural networks offer a powerful tool for question generation, ongoing efforts are needed to overcome specific challenges and unlock their full potential in improving the quality and relevance of generated questions.

## 2.2 Question generation using Reinforcement Learning
Kumar et al. (2018) address automatic question generation (QG), a complex task in natural language processing (NLP), with a novel deep reinforcement learning (RL) framework. QG involves generating questions that are both syntactically and semantically correct from diverse inputs, such as textual passages or knowledge bases. Traditional methods rely on rule-based transformations and predefined templates, and neural sequence-to-sequence models, although more flexible, struggle with rare words and word repetition. These limitations motivate more sophisticated approaches.

The proposed framework uses a sequence-to-sequence model with a copy mechanism, which lets the model reproduce rare words directly from the input, and a coverage mechanism, which reduces word repetition. The model is trained to optimize task-specific scores, including BLEU, GLEU, and ROUGE-L, through rewards supplied by an evaluator model. Because these evaluation measures are optimized directly, training is more targeted and can improve both the quality and the diversity of the generated questions.
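To make reward-based training concrete, here is a minimal pure-Python sketch of two ingredients: a ROUGE-L F1 score (based on the longest common subsequence) used as a scalar reward, and a generic REINFORCE-style loss with a baseline. This is an illustrative assumption, not Kumar et al.'s exact formulation. Real systems obtain token log-probabilities from the model as tensors, use proper tokenization, and may combine several metrics. The helper names `rouge_l_f1` and `reinforce_loss` are hypothetical.

```python
import math


def lcs_length(a, b):
    """Length of the longest common subsequence of token lists a and b (dynamic programming)."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]


def rouge_l_f1(candidate, reference):
    """ROUGE-L F1 between two whitespace-tokenized strings, used here as the reward."""
    c, r = candidate.lower().split(), reference.lower().split()
    lcs = lcs_length(c, r)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(c), lcs / len(r)
    return 2 * precision * recall / (precision + recall)


def reinforce_loss(token_log_probs, reward, baseline):
    """REINFORCE with a baseline: -(reward - baseline) * sum of log p(token)."""
    return -(reward - baseline) * sum(token_log_probs)


sampled = "who sent the first wireless message"
reference = "who was the first american to send a wireless message"
r = rouge_l_f1(sampled, reference)  # LCS = 5 tokens -> P = 5/6, R = 5/10, F1 = 0.625
# Pretend the model assigned probability 0.5 to each of the 6 sampled tokens
loss = reinforce_loss([math.log(0.5)] * 6, reward=r, baseline=0.3)
print(round(r, 3), round(loss, 3))
```

Because the reward (0.625) exceeds the baseline (0.3), minimizing this loss increases the log-probability of the sampled question; a reward below the baseline would push it down. This is also why reward design matters: the model is pushed toward whatever the metric favours.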

However, RL-based models also have potential disadvantages. They add complexity to the training phase, requiring careful parameter tuning and sometimes longer training times. The proposed approach may also add complexity at inference time, which could affect real-time applications. Finally, designing an effective reward function remains a challenge: a poorly designed reward can lead to suboptimal results, so it needs careful consideration during model development.

In summary, the RL-based framework offers notable advantages for specific QG challenges, namely rare words, word repetition, and direct optimization of evaluation metrics. Its added complexity in training and inference, however, calls for a balanced approach when applying RL to question generation in NLP applications.

## 2.3 Question generation using Transformer Models
Question generation with transformer models uses machine learning to automatically create valid, coherent questions from a given passage and target answer. Transformer models such as T5 (Text-to-Text Transfer Transformer) have proven highly effective at this task (Shaheer et al., 2023).

**Methodology:**

**Contextual Understanding:** Transformer models are pre-trained on massive corpora, allowing them to understand contextual nuances and linguistic structures.

**Fine-Tuning:** The pre-trained models are fine-tuned on task-specific datasets, such as the Stanford Question Answering Dataset (SQuAD), to adapt them to question generation.

**Input and Output:** During training, the model receives passages and target answers as inputs and learns to generate questions as outputs. The transformer's encoder-decoder architecture enables this sequence-to-sequence transformation.
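The input/output formatting step can be sketched as a small function. It follows the `context: <passage>  answer: <answer>` input and `question: <question>` target convention this notebook uses when building its training set. The function name `build_t5_example` is hypothetical, and the check that the answer appears in the passage is an assumption that holds for extractive, SQuAD-style data.

```python
from typing import Optional, Tuple


def build_t5_example(passage: str, answer: str,
                     question: Optional[str] = None) -> Tuple[str, Optional[str]]:
    """Format one record as a text-to-text pair.

    The encoder sees the passage plus the target answer; during training the
    decoder learns to emit the question. At inference time `question` is None.
    """
    if answer not in passage:
        raise ValueError("expected an extractive answer that occurs in the passage")
    source = f"context: {passage}  answer: {answer}"
    target = f"question: {question}" if question is not None else None
    return source, target


src, tgt = build_t5_example("Denver Broncos won Super Bowl 50.",
                            "Denver Broncos",
                            "Who won Super Bowl 50?")
print(src)  # context: Denver Broncos won Super Bowl 50.  answer: Denver Broncos
print(tgt)  # question: Who won Super Bowl 50?
```

Putting the answer in the input is what makes the model answer-aware: the same passage can yield different questions depending on which span is selected as the answer.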

**Advantages:**

**Contextual Awareness:** Transformer models excel at capturing contextual information, allowing them to generate questions that consider the broader context of a passage.

**Linguistic Accuracy:** Due to their pre-training on vast amounts of text data, transformers often produce linguistically accurate and well-structured questions.

**Versatility:** These models can be fine-tuned for various downstream tasks, making them versatile for question generation in different contexts.

**Effective Transfer Learning:** Pre-training on extensive corpora enables effective transfer learning, so the model can perform well even on unseen data.

**Disadvantages:**

**Computational Resources:** Training and fine-tuning transformer models require substantial computational resources, limiting accessibility for some researchers or applications.

**Need for Large Datasets:** Successful fine-tuning often relies on large, high-quality datasets. Availability of such datasets might be a limitation in certain domains or languages.

**Overfitting to Training Data:** Depending on the training data, transformers might overfit to specific styles or contexts, affecting their generalization capabilities.

**Lack of Explainability:** Transformer models are complex neural networks whose internal decisions are opaque, so it is hard to explain why a particular question was generated, which limits their interpretability.

In conclusion, question generation using transformer models offers advanced capabilities in understanding context and generating linguistically accurate questions. However, challenges include resource-intensive training, dataset requirements, potential overfitting, and the inherent lack of explainability. As transformer technology evolves, addressing these challenges will likely enhance their effectiveness in question generation tasks.


# Part C – Implementation

## 3.1 High Level Diagram
<a href="https://ibb.co/9hN6TK4"><img src="https://i.ibb.co/KDXdq4s/highlevel-diagram-drawio.png" alt="highlevel-diagram-drawio" border="0"></a>

# Dependencies

In [1]:
!pip install --quiet  datasets
!pip install --quiet flashtext==2.7
!pip install git+https://github.com/boudinfl/pke.git
!pip install --quiet pyarrow
!pip install --quiet  tqdm
!pip install --quiet transformers==4.8.1
!pip install --quiet tokenizers
!pip install --quiet sentencepiece==0.1.95
!pip install --quiet pytorch-lightning
!pip install --quiet torchtext
!pip install --quiet textwrap3==0.9.2
!pip install --quiet gradio==3.0.20
!pip install --quiet strsim==0.0.3
!pip install --quiet sense2vec==2.0.0
!pip install --quiet ipython-autotime
!pip install --quiet sentence-transformers==2.2.2

  Preparing metadata (setup.py) ... done
  Building wheel for flashtext (setup.py) ... done
Collecting git+https://github.com/boudinfl/pke.git
  Cloning https://github.com/boudinfl/pke.git to /tmp/pip-req-build-nc0brro3
  Running command git clone --filter=blob:none --quiet https://github.com/boudinfl/pke.git /tmp/pip-req-build-nc0brro3
  Resolved https://github.com/boudinfl/pke.git to commit 69871ffdb720b83df23684fea53ec8776fd87e63
  Preparing metadata (setup.py) ... done
Collecting unidecode (from pke==2.0.0)
  Downloading Unidecode-1.3.7-py3-none-any.whl (235 kB)

# Fetch SQuAD dataset for training (3.2 Required Input)

In [2]:
#imports
import pandas as pd
import torch
from tqdm import tqdm
from datasets import load_dataset
from torch.utils.data import Dataset, DataLoader
from pprint import pprint
import copy

device  = 'cuda' if torch.cuda.is_available() else "cpu"
pd.options.display.max_rows , pd.options.display.max_columns  = 100,100

In [3]:
def create_pandas_dataset(data,
                          answer_threshold=7,
                          verbose=False):
    '''Create a pandas DataFrame from a Hugging Face SQuAD split.

    Params:
        data: iterable of SQuAD records with 'context', 'question' and 'answers' fields.
        answer_threshold: keep only question-answer pairs whose first answer has
            fewer than this many words (i.e. short answers).
        verbose: if True, also return the number of dropped (long) and kept (short) pairs.
    '''
    count_long, count_short = 0, 0
    rows = []  # collect rows in a list; appending to a DataFrame row by row is very slow
    for val in tqdm(data):
        passage = val['context']
        question = val['question']
        answer = val['answers']['text'][0]  # use the first reference answer
        if len(answer.split()) >= answer_threshold:
            count_long += 1
            continue
        rows.append((passage, answer, question))
        count_short += 1
    result_df = pd.DataFrame(rows, columns=['context', 'answer', 'question'])
    if verbose:
        return result_df, count_long, count_short
    return result_df

In [4]:
train_dataset = load_dataset('squad', split='train')
valid_dataset = load_dataset('squad', split='validation')
print(f"Total Train Samples:{len(train_dataset)} , Total Validation Samples:{len(valid_dataset)}")

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Total Train Samples:87599 , Total Validation Samples:10570


Display a sample record from the validation dataset

In [5]:
sample_validation_dataset = next(iter(valid_dataset))
pprint (sample_validation_dataset)

{'answers': {'answer_start': [177, 177, 177],
             'text': ['Denver Broncos', 'Denver Broncos', 'Denver Broncos']},
 'context': 'Super Bowl 50 was an American football game to determine the '
            'champion of the National Football League (NFL) for the 2015 '
            'season. The American Football Conference (AFC) champion Denver '
            'Broncos defeated the National Football Conference (NFC) champion '
            'Carolina Panthers 24–10 to earn their third Super Bowl title. The '
            "game was played on February 7, 2016, at Levi's Stadium in the San "
            'Francisco Bay Area at Santa Clara, California. As this was the '
            '50th Super Bowl, the league emphasized the "golden anniversary" '
            'with various gold-themed initiatives, as well as temporarily '
            'suspending the tradition of naming each Super Bowl game with '
            'Roman numerals (under which the game would have been known as '
            '"Super

In [6]:
df_train , df_validation = create_pandas_dataset(train_dataset) , create_pandas_dataset(valid_dataset)
print(f"\n Total Train Samples:{df_train.shape} , Total Validation Samples:{df_validation.shape}")

100%|██████████| 87599/87599 [05:56<00:00, 245.54it/s]
100%|██████████| 10570/10570 [00:34<00:00, 303.66it/s]


 Total Train Samples:(78664, 3) , Total Validation Samples:(9652, 3)





In [7]:
# Save data for future use
df_train.to_parquet('train_squad.parquet')
df_validation.to_parquet('validation_squad.parquet')

# Create PyTorch Dataset for training and validation of T5 model (3.2)

In [8]:
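# AdamW is imported from torch.optim in the fine-tuning cell; the transformers
# AdamW class is deprecated, so only the model and tokenizer are imported here
from transformers import (
    T5ForConditionalGeneration,
    T5Tokenizer,
)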

t5_tokenizer = T5Tokenizer.from_pretrained('t5-base',model_max_length=512)
t5_model = T5ForConditionalGeneration.from_pretrained('t5-base')

You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thouroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


In [9]:
class QuestionGenerationDataset(Dataset):
    def __init__(self, tokenizer, filepath, max_len_inp=512, max_len_out=96, max_rows=2000):
        self.path = filepath

        self.passage_column = "context"
        self.answer = "answer"
        self.question = "question"

        # Only the first `max_rows` rows are used, to keep tokenization and training time manageable
        self.data = pd.read_parquet(self.path).iloc[:max_rows, :]

        self.max_len_input = max_len_inp
        self.max_len_output = max_len_out
        self.tokenizer = tokenizer
        self.inputs = []
        self.targets = []
        self._build()

    def __len__(self):
        return len(self.inputs)

    def __getitem__(self, index):
        # squeeze() drops the batch dimension: [1, seq_len] -> [seq_len]
        source_ids = self.inputs[index]["input_ids"].squeeze()
        target_ids = self.targets[index]["input_ids"].squeeze()
        src_mask = self.inputs[index]["attention_mask"].squeeze()
        target_mask = self.targets[index]["attention_mask"].squeeze()

        # Replace padding ids with -100 so the cross-entropy loss ignores those positions
        labels = target_ids.clone()
        labels[labels == self.tokenizer.pad_token_id] = -100

        return {"source_ids": source_ids, "source_mask": src_mask,
                "target_ids": target_ids, "target_mask": target_mask,
                "labels": labels}

    def _build(self):
        for _, val in tqdm(self.data.iterrows()):  # iterate over the dataframe rows
            passage, answer, target = val[self.passage_column], val[self.answer], val[self.question]

            input_ = f"context: {passage}  answer: {answer}"  # T5 input format
            target = f"question: {str(target)}"  # T5 output format

            # tokenize inputs
            tokenized_inputs = self.tokenizer.batch_encode_plus(
                [input_], max_length=self.max_len_input, padding='max_length',
                truncation=True, return_tensors="pt"
            )
            # tokenize targets
            tokenized_targets = self.tokenizer.batch_encode_plus(
                [target], max_length=self.max_len_output, padding='max_length',
                truncation=True, return_tensors="pt"
            )

            self.inputs.append(tokenized_inputs)
            self.targets.append(tokenized_targets)
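
The `-100` replacement in `__getitem__` works because T5's loss is PyTorch's cross-entropy with `ignore_index=-100`, so positions labelled -100 do not contribute. The following NumPy sketch (no PyTorch needed) mimics that behaviour to show that padded positions leave the loss unchanged. The token ids and random logits are made up for illustration; only the pad id 0 and end-of-sequence id 1 match T5's tokenizer.

```python
import numpy as np

PAD_ID, IGNORE_INDEX = 0, -100  # T5 pad id is 0; -100 is PyTorch's default ignore_index


def mask_padding(target_ids, pad_id=PAD_ID):
    """Copy the target ids and replace padding with IGNORE_INDEX."""
    labels = target_ids.copy()
    labels[labels == pad_id] = IGNORE_INDEX
    return labels


def masked_token_nll(logits, labels):
    """Mean negative log-likelihood over positions whose label is not IGNORE_INDEX."""
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
    keep = labels != IGNORE_INDEX
    picked = log_probs[np.arange(len(labels))[keep], labels[keep]]
    return -picked.mean()


target_ids = np.array([5, 7, 1, 0, 0])  # two question tokens, </s> (id 1), then two pad tokens
labels = mask_padding(target_ids)       # [5, 7, 1, -100, -100]
logits = np.random.default_rng(0).normal(size=(5, 10))  # 5 positions, toy vocabulary of 10
loss_masked = masked_token_nll(logits, labels)
loss_first3 = masked_token_nll(logits[:3], labels[:3])  # same loss without the padded positions
print(labels.tolist(), np.isclose(loss_masked, loss_first3))
```

Without the masking, the model would also be rewarded for predicting padding, which inflates the apparent fit on short questions.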

In [10]:
train_path = 'train_squad.parquet'
validation_path = 'validation_squad.parquet'
train_dataset = QuestionGenerationDataset(t5_tokenizer,train_path)
validation_dataset = QuestionGenerationDataset(t5_tokenizer,validation_path)

2000it [00:05, 337.35it/s]
2000it [00:05, 372.42it/s]


Show training input and output

In [11]:
train_sample = train_dataset[50] # thanks to __getitem__
decoded_train_input = t5_tokenizer.decode(train_sample['source_ids'])
decoded_train_output = t5_tokenizer.decode(train_sample['target_ids'])

print(decoded_train_input)
print(decoded_train_output)

context: In 1882, Albert Zahm (John Zahm's brother) built an early wind tunnel used to compare lift to drag of aeronautical models. Around 1899, Professor Jerome Green became the first American to send a wireless message. In 1931, Father Julius Nieuwland performed early work on basic reactions that was used to create neoprene. Study of nuclear physics at the university began with the building of a nuclear accelerator in 1936, and continues now partly through a partnership in the Joint Institute for Nuclear Astrophysics. answer: Professor Jerome Green</s><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad>

# Fine-Tuning T5 Model (3.2-3.3)

In [12]:
import pytorch_lightning as pl
from torch.optim import AdamW


class T5Tuner(pl.LightningModule):

    def __init__(self, t5model, t5tokenizer, batchsize=4):
        super().__init__()
        self.model = t5model
        self.tokenizer = t5tokenizer
        self.batch_size = batchsize

    def forward(self, input_ids, attention_mask=None,
                decoder_attention_mask=None,
                lm_labels=None):
        # With `labels` set, T5 shifts them right to build the decoder inputs
        # and returns the cross-entropy loss as the first output
        outputs = self.model(
            input_ids=input_ids,
            attention_mask=attention_mask,
            decoder_attention_mask=decoder_attention_mask,
            labels=lm_labels,
        )
        return outputs

    def training_step(self, batch, batch_idx):
        outputs = self.forward(
            input_ids=batch["source_ids"],
            attention_mask=batch["source_mask"],
            decoder_attention_mask=batch['target_mask'],
            lm_labels=batch['labels']
        )
        loss = outputs[0]
        self.log('train_loss', loss)
        return loss

    def validation_step(self, batch, batch_idx):
        outputs = self.forward(
            input_ids=batch["source_ids"],
            attention_mask=batch["source_mask"],
            decoder_attention_mask=batch['target_mask'],
            lm_labels=batch['labels']
        )
        loss = outputs[0]
        self.log("val_loss", loss)
        return loss

    # The dataloaders use the `train_dataset` and `validation_dataset`
    # objects created in the previous cells
    def train_dataloader(self):
        return DataLoader(train_dataset, batch_size=self.batch_size,
                          shuffle=True,  # reshuffle the training examples every epoch
                          num_workers=2)

    def val_dataloader(self):
        return DataLoader(validation_dataset,
                          batch_size=self.batch_size,
                          num_workers=2)

    def configure_optimizers(self):
        optimizer = AdamW(self.parameters(), lr=3e-4, eps=1e-8)
        return optimizer

In [13]:
model = T5Tuner(t5_model,t5_tokenizer)

trainer = pl.Trainer(max_epochs = 3,accelerator=device)

trainer.fit(model)

INFO:pytorch_lightning.utilities.rank_zero:GPU available: True (cuda), used: True
INFO:pytorch_lightning.utilities.rank_zero:TPU available: False, using: 0 TPU cores
INFO:pytorch_lightning.utilities.rank_zero:IPU available: False, using: 0 IPUs
INFO:pytorch_lightning.utilities.rank_zero:HPU available: False, using: 0 HPUs
INFO:pytorch_lightning.accelerators.cuda:LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
INFO:pytorch_lightning.callbacks.model_summary:
  | Name  | Type                       | Params
-----------------------------------------------------
0 | model | T5ForConditionalGeneration | 222 M 
-----------------------------------------------------
222 M     Trainable params
0         Non-trainable params
222 M     Total params
891.614   Total estimated model params size (MB)


INFO:pytorch_lightning.utilities.rank_zero:`Trainer.fit` stopped: `max_epochs=3` reached.


In [14]:
# Save the fine-tuned model and tokenizer
# (save_pretrained creates the directories if they do not exist, so no mkdir is needed)
model.model.save_pretrained('t5_trained_model')
t5_tokenizer.save_pretrained('t5_tokenizer')

('t5_tokenizer/tokenizer_config.json',
 't5_tokenizer/special_tokens_map.json',
 't5_tokenizer/spiece.model',
 't5_tokenizer/added_tokens.json')

# Load Models (3.3 Implementation of working prototype)

In [15]:
question_model_1 = T5ForConditionalGeneration.from_pretrained('ramsrigouthamg/t5_squad_v1')
question_tokenizer_1 = T5Tokenizer.from_pretrained('ramsrigouthamg/t5_squad_v1')
question_model_1 = question_model_1.to(device)

In [16]:
question_model_2 = T5ForConditionalGeneration.from_pretrained('mrm8488/t5-base-finetuned-question-generation-ap')
question_tokenizer_2 = T5Tokenizer.from_pretrained('mrm8488/t5-base-finetuned-question-generation-ap')
question_model_2 = question_model_2.to(device)

The `xla_device` argument has been deprecated in v4.4.0 of Transformers. It is ignored and you can safely remove it from your `config.json` file.


In [25]:
trained_model_path = 't5_trained_model'
trained_tokenizer = 't5_tokenizer'
custom_model = T5ForConditionalGeneration.from_pretrained(trained_model_path)
custom_tokenizer = T5Tokenizer.from_pretrained(trained_tokenizer)
custom_model = custom_model.to(device)

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


# Load Texts (4.0 Part D - Testing)

In [48]:
from textwrap3 import wrap

text = """
Following Apple's fiscal quarter report on October 21, 2008, it was revealed that only 14.21% of the total revenue for that period came from iPods. The decline in iPod
sales became more pronounced over the subsequent years. At the Apple Event keynote presentation on September 9, 2009, Phil Schiller announced a milestone – cumulative sales of
iPods had surpassed 220 million units. However, this success was not indicative of the future trajectory for iPods. By June 2009, Apple's Chief Financial Officer,
Peter Oppenheimer, acknowledged the anticipated decline in traditional MP3 player sales, attributing it to the cannibalization effect caused by the success of the iPod
Touch and the iPhone. As expected, iPod sales consistently dwindled in every financial quarter following 2009, leading to the notable absence of a new iPod model
introduction in 2013. This strategic shift in focus, prioritizing the iPhone and iPod Touch, reflected Apple's adaptability in responding to market trends and evolving consumer
preferences, ultimately reshaping the company's product portfolio."""

for wrp in wrap(text, 150):
  print (wrp)
print ("\n")

 Following Apple's fiscal quarter report on October 21, 2008, it was revealed that only 14.21% of the total revenue for that period came from iPods.
The decline in iPod  sales became more pronounced over the subsequent years. At the Apple Event keynote presentation on September 9, 2009, Phil
Schiller announced a milestone – cumulative sales of  iPods had surpassed 220 million units. However, this success was not indicative of the future
trajectory for iPods. By June 2009, Apple's Chief Financial Officer,  Peter Oppenheimer, acknowledged the anticipated decline in traditional MP3
player sales, attributing it to the cannibalization effect caused by the success of the iPod  Touch and the iPhone. As expected, iPod sales
consistently dwindled in every financial quarter following 2009, leading to the notable absence of a new iPod model  introduction in 2013. This
strategic shift in focus, prioritizing the iPhone and iPod Touch, reflected Apple's adaptability in responding to market trends an

# Text Summarization (4.0)

In [49]:
import torch
from transformers import T5ForConditionalGeneration,T5Tokenizer
summary_model = T5ForConditionalGeneration.from_pretrained('t5-base')
summary_tokenizer = T5Tokenizer.from_pretrained('t5-base')

summary_model = summary_model.to(device)

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


In [50]:
import random
import numpy as np

def set_seed(seed: int):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

set_seed(42)

In [51]:
import nltk
nltk.download('punkt')
nltk.download('brown')
nltk.download('wordnet')
from nltk.corpus import wordnet as wn
from nltk.tokenize import sent_tokenize

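def postprocesstext (content):
  # Capitalize the first letter of each sentence. Note that str.capitalize()
  # also lower-cases the rest of the sentence, which is why proper nouns such
  # as "iPod" and "Apple" appear in lower case in the summaries below.
  final=""
  for sent in sent_tokenize(content):
    sent = sent.capitalize()
    final = final +" "+sent
  return final


def summarizer(text,model,tokenizer):
  # Join the lines and add the "summarize: " task prefix expected by T5.
  text = text.strip().replace("\n"," ")
  text = "summarize: "+text
  max_len = 512
  # Truncate inputs longer than 512 tokens; padding=False replaces the
  # deprecated pad_to_max_length=False argument.
  encoding = tokenizer.encode_plus(text,max_length=max_len, padding=False, truncation=True, return_tensors="pt").to(device)

  input_ids, attention_mask = encoding["input_ids"], encoding["attention_mask"]

  # Beam search with 3 beams; no_repeat_ngram_size=2 prevents any bigram from
  # repeating, and the summary length is kept between 75 and 300 tokens.
  outs = model.generate(input_ids=input_ids,
                        attention_mask=attention_mask,
                        early_stopping=True,
                        num_beams=3,
                        num_return_sequences=1,
                        no_repeat_ngram_size=2,
                        min_length = 75,
                        max_length=300)

  dec = [tokenizer.decode(ids,skip_special_tokens=True) for ids in outs]
  summary = dec[0]
  summary = postprocesstext(summary)
  summary= summary.strip()

  return summary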


summarized_text = summarizer(text,summary_model,summary_tokenizer)


print ("\noriginal Text >>")
for wrp in wrap(text, 150):
  print (wrp)
print ("\n")
print ("Summarized Text >>")
for wrp in wrap(summarized_text, 150):
  print (wrp)
print ("\n")

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package brown to /root/nltk_data...
[nltk_data]   Package brown is already up-to-date!
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!



original Text >>
 Following Apple's fiscal quarter report on October 21, 2008, it was revealed that only 14.21% of the total revenue for that period came from iPods.
The decline in iPod  sales became more pronounced over the subsequent years. At the Apple Event keynote presentation on September 9, 2009, Phil
Schiller announced a milestone – cumulative sales of  iPods had surpassed 220 million units. However, this success was not indicative of the future
trajectory for iPods. By June 2009, Apple's Chief Financial Officer,  Peter Oppenheimer, acknowledged the anticipated decline in traditional MP3
player sales, attributing it to the cannibalization effect caused by the success of the iPod  Touch and the iPhone. As expected, iPod sales
consistently dwindled in every financial quarter following 2009, leading to the notable absence of a new iPod model  introduction in 2013. This
strategic shift in focus, prioritizing the iPhone and iPod Touch, reflected Apple's adaptability in responding t
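Because `str.capitalize()` lower-cases everything after the first character, the summary loses the casing of names such as "iPod" and "Apple". A minimal sketch of an alternative that only upper-cases the first character of each sentence is shown below; the function name `sentence_case` is hypothetical, and the regex splitter is a simplified stand-in for NLTK's `sent_tokenize`, assumed here only to keep the example self-contained.

```python
import re

# Split after ".", "!" or "?" followed by whitespace (simplified; NLTK's
# sent_tokenize handles abbreviations such as "Dr." much better).
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def sentence_case(text: str) -> str:
    """Upper-case the first character of each sentence, leaving the rest as is."""
    sentences = _SENTENCE_END.split(text.strip())
    fixed = [s[:1].upper() + s[1:] for s in sentences if s]
    return " ".join(fixed)


sample = "the decline in iPod sales grew. by June 2009, Apple acknowledged it."
print(sentence_case(sample))
# The decline in iPod sales grew. By June 2009, Apple acknowledged it.
print(" ".join(s.capitalize() for s in _SENTENCE_END.split(sample)))
# The decline in ipod sales grew. By june 2009, apple acknowledged it.
```

Since flashtext matching in `get_keywords` is case-insensitive by default, preserving the original casing should not change which keywords are found in the summary.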

# Keywords (Answers) Extraction (4.0)

In [52]:
import nltk
nltk.download('stopwords')
from nltk.corpus import stopwords
import string
import pke
import traceback

def get_nouns_multipartite(content):
    out=[]
    try:
        extractor = pke.unsupervised.MultipartiteRank()
        extractor.load_document(input=content,language='en')
        # 1. Select the longest sequences of proper nouns and nouns as
        #    candidates; by default pke filters out candidates that contain
        #    punctuation marks or stopwords (its own English stoplist).
        pos = {'PROPN','NOUN'}
        # Custom stoplist: punctuation, Penn Treebank bracket tokens and NLTK
        # English stopwords. It is not passed to the extractor, so it is
        # currently unused; in pke 2.x a custom list is given to
        # load_document(..., stoplist=stoplist).
        stoplist = list(string.punctuation)
        stoplist += ['-lrb-', '-rrb-', '-lcb-', '-rcb-', '-lsb-', '-rsb-']
        stoplist += stopwords.words('english')
        extractor.candidate_selection(pos=pos)
        # 2. Build the multipartite graph and rank candidates using a random
        #    walk; alpha controls the weight adjustment mechanism, see
        #    TopicRank for the threshold/method (topic clustering) parameters.
        extractor.candidate_weighting(alpha=1.1,
                                      threshold=0.75,
                                      method='average')
        # 3. Keep the 15 best keyphrases; each item is a (phrase, score) pair.
        keyphrases = extractor.get_n_best(n=15)

        for val in keyphrases:
            out.append(val[0])
    except Exception:
        out = []
        traceback.print_exc()

    return out

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


In [53]:
from flashtext import KeywordProcessor


def get_keywords(originaltext,summarytext):
  # Rank candidate keyphrases on the full (unsummarized) text.
  keywords = get_nouns_multipartite(originaltext)
  print ("keywords unsummarized: ",keywords)
  # Find which of those keyphrases also occur in the summary (flashtext
  # matches whole words, case-insensitively by default).
  keyword_processor = KeywordProcessor()
  for keyword in keywords:
    keyword_processor.add_keyword(keyword)

  keywords_found = keyword_processor.extract_keywords(summarytext)
  keywords_found = list(set(keywords_found))
  print ("keywords_found in summarized: ",keywords_found)

  # Keep the original ranking order and return the top four as answers.
  important_keywords =[]
  for keyword in keywords:
    if keyword in keywords_found:
      important_keywords.append(keyword)

  return important_keywords[:4]


imp_keywords = get_keywords(text,summarized_text)
print (imp_keywords)


keywords unsummarized:  ['ipods', 'apple', 'sales', 'touch', 'decline', 'iphone', 'success', 'quarter report', 'chief financial officer', 'presentation', 'june', 'milestone', 'september', 'trajectory', 'adaptability']
keywords_found in summarized:  ['sales', 'iphone', 'success', 'june', 'milestone', 'apple', 'ipods', 'decline', 'touch', 'trajectory']
['ipods', 'apple', 'sales', 'touch']
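The selection step in `get_keywords` keeps the MultipartiteRank candidates, in rank order, that also occur in the summary, and returns the first four. A minimal sketch of the same selection with the standard library is shown below, assuming case-insensitive whole-word regex matching as a stand-in for flashtext's `KeywordProcessor`; the helper name `select_answers` is hypothetical.

```python
import re


def select_answers(ranked_keywords, summary, k=4):
    """Return up to k keywords, in rank order, that occur as whole words in summary."""
    selected = []
    for keyword in ranked_keywords:
        # \b anchors approximate flashtext's word-boundary matching.
        pattern = r"\b" + re.escape(keyword) + r"\b"
        if re.search(pattern, summary, flags=re.IGNORECASE):
            selected.append(keyword)
        if len(selected) == k:
            break
    return selected


ranked = ["ipods", "apple", "sales", "touch", "decline", "quarter report"]
summary = ("Cumulative sales of ipods had surpassed 220 million units in 2009. "
           "By june 2009, apple acknowledged the anticipated decline of traditional "
           "mp3 player sales, attributing it to the cannibalization effect caused "
           "by the success of both the ipod touch and the iphone.")
print(select_answers(ranked, summary))  # ['ipods', 'apple', 'sales', 'touch']
```

Stopping at the first four matches mirrors `important_keywords[:4]`. Candidates absent from the summary (here "quarter report") are skipped, presumably because the questions are generated from the summary text, which must contain the answer.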


# Question Generation (4.0)

In [54]:
def get_question(context,answer,model,tokenizer):
  # Answer-aware input format used for all three question-generation models.
  text = "context: {} answer: {}".format(context,answer)
  # padding=False replaces the deprecated pad_to_max_length=False argument.
  encoding = tokenizer.encode_plus(text,max_length=384, padding=False, truncation=True, return_tensors="pt").to(device)
  input_ids, attention_mask = encoding["input_ids"], encoding["attention_mask"]

  # Beam search (5 beams) returning the single best question of up to 72 tokens.
  outs = model.generate(input_ids=input_ids,
                        attention_mask=attention_mask,
                        early_stopping=True,
                        num_beams=5,
                        num_return_sequences=1,
                        no_repeat_ngram_size=2,
                        max_length=72)

  dec = [tokenizer.decode(ids,skip_special_tokens=True) for ids in outs]

  # Remove the "question:" prefix from the decoded output, if present.
  question = dec[0].replace("question:","")
  question = question.strip()
  return question



for wrp in wrap(summarized_text, 150):
  print (wrp)
print ("\n")

for answer in imp_keywords:
  ques_1 = get_question(summarized_text,answer,question_model_1,question_tokenizer_1)
  print("Model 1 \n")
  print (ques_1)
  print (answer.capitalize())
  print ("\n")

for answer in imp_keywords:
  ques_2 = get_question(summarized_text,answer,question_model_2,question_tokenizer_2)
  print("Model 2 \n")
  print (ques_2)
  print (answer.capitalize())
  print ("\n")

for answer in imp_keywords:
  custom_mod = get_question(summarized_text,answer,custom_model,custom_tokenizer)
  print("Custom Model \n")
  print (custom_mod)
  print (answer.capitalize())
  print ("\n")

# Used for testing with SQuAD dataset values

# for i in range(5):
#   ques = get_question(train_contexts[2000+i],train_answers[2000+i],question_model_1,question_tokenizer_1)
#   print (ques)
#   print (train_answers[2000+i].capitalize())
#   print ("\n")

# for i in range(5):
#   print(train_questions[2000+i])

The decline in ipod sales became more pronounced over the subsequent years. Cumulative sales of ipods had surpassed 220 million units in 2009 - a
milestone that was not indicative of the future trajectory for the company's product portfolio. By june 2009, apple acknowledged the anticipated
decline of traditional mp3 player sales, attributing it to the cannibalization effect caused by the success of both the ipod touch and the iphone.


Model 1 

What device's sales reached 220 million units in 2009?
Ipods


Model 1 

What company acknowledged the decline of traditional mp3 player sales in 2009?
Apple


Model 1 

What declined in ipods in 2009?
Sales


Model 1 

Along with the iphone, what ipod product was blamed for the decline in mp3 sales?
Touch


Model 2 

Why did apple blame the decline in traditional mp3 player sales?
Ipods


Model 2 

What did apple attribute the decline in mp3 player sales to?
Apple


Model 2 

What did apple attribute the decline in traditional mp3 player sales
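All three models are called through `get_question` with the same input contract: the context and answer joined as `"context: ... answer: ..."`, with a `question:` label removed from the decoded output. The sketch below isolates that string handling; the helper names `build_qg_input` and `clean_question` are hypothetical. One design difference is deliberate: it strips the label only at the start of the output, whereas `str.replace` in `get_question` would also delete the substring if it appeared inside a generated question.

```python
def build_qg_input(context: str, answer: str) -> str:
    """Format the model input the same way get_question does."""
    return "context: {} answer: {}".format(context, answer)


def clean_question(decoded: str, prefix: str = "question:") -> str:
    """Remove a leading task label (if present) and surrounding whitespace."""
    text = decoded.strip()
    if text.lower().startswith(prefix):
        text = text[len(prefix):]
    return text.strip()


print(build_qg_input("Apple sold 220 million iPods.", "Apple"))
# context: Apple sold 220 million iPods. answer: Apple
print(clean_question("question: What company sold 220 million iPods?"))
# What company sold 220 million iPods?
```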

# Text Evaluation (5.0 Part E - Evaluation)

In [55]:
from nltk.translate.bleu_score import sentence_bleu

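def compute_cumulative_bleu(reference_questions, candidate_questions):
    """Average sentence-level BLEU over aligned (reference, candidate) pairs."""
    assert len(reference_questions) == len(candidate_questions), "Mismatch in number of reference and candidate questions"

    total_bleu = 0
    for ref, cand in zip(reference_questions, candidate_questions):
        # ref and cand are plain strings, so sentence_bleu iterates over their
        # characters: the averages reported below are character-level BLEU.
        # Tokenize both (e.g. with nltk.word_tokenize) for word-level BLEU.
        total_bleu += sentence_bleu([ref], cand)

    average_bleu = total_bleu / len(reference_questions)
    return average_bleu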

In [56]:
from nltk.translate.meteor_score import meteor_score

# METEOR score calculation
def calculate_meteor(reference, hypothesis):
  """Average METEOR over two aligned lists of question strings."""
  # Recent NLTK versions require pre-tokenized input, so split on whitespace.
  reference = [ref.split() for ref in reference]
  hypothesis = [hyp.split() for hyp in hypothesis]
  scores = [meteor_score([ref], hyp) for ref, hyp in zip(reference, hypothesis)]
  avg_score = sum(scores) / len(scores)
  return avg_score

In [57]:
!pip install rouge-score

Collecting rouge-score
  Downloading rouge_score-0.1.2.tar.gz (17 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: rouge-score
  Building wheel for rouge-score (setup.py) ... done
  Created wheel for rouge-score: filename=rouge_score-0.1.2-py3-none-any.whl size=24933 sha256=9ebbd1025903c286fb82855c728edf4695a4a836d9e690505b661f8ee26a3d2b
  Stored in directory: /root/.cache/pip/wheels/5f/dd/89/461065a73be61a532ff8599a28e9beef17985c9e9c31e541b4
Successfully built rouge-score
Installing collected packages: rouge-score
Successfully installed rouge-score-0.1.2


In [58]:
from rouge_score import rouge_scorer

def calculate_rogue_score(references, hypotheses):
    """Mean F1 pooled over ROUGE-1, ROUGE-2 and ROUGE-L for all pairs."""
    # Initialize the scorer (Porter stemming is applied to both texts)
    scorer = rouge_scorer.RougeScorer(['rouge1', 'rouge2', 'rougeL'], use_stemmer=True)

    # Compute the ROUGE scores for each reference and hypothesis pair
    scores = []
    for reference, hypothesis in zip(references, hypotheses):
        score = scorer.score(reference, hypothesis)
        # Keep the F1 value of each ROUGE variant; all three go into one list,
        # so the result mixes ROUGE-1, ROUGE-2 and ROUGE-L.
        f1_scores = [value.fmeasure for value in score.values()]
        scores.extend(f1_scores)

    # Compute the average score
    average_score = sum(scores) / len(scores)

    return average_score
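`calculate_rogue_score` pools the ROUGE-1, ROUGE-2 and ROUGE-L F1 values of every pair into a single mean, so the reported "ROUGE score" averages three different metrics. A minimal sketch that keeps one mean per variant is shown below, assuming each pair's scores have already been reduced to a `{name: f1}` dict; the helper name `average_per_variant` and the sample values are hypothetical.

```python
from collections import defaultdict


def average_per_variant(per_pair_f1):
    """Average F1 separately for each ROUGE variant across all pairs."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for scores in per_pair_f1:
        for name, f1 in scores.items():
            totals[name] += f1
            counts[name] += 1
    return {name: totals[name] / counts[name] for name in totals}


pairs = [
    {"rouge1": 0.625, "rouge2": 0.5714, "rougeL": 0.625},
    {"rouge1": 0.5, "rouge2": 0.25, "rougeL": 0.5},
]
print(average_per_variant(pairs))
```

With `rouge_score`, each dict can be built as `{name: s.fmeasure for name, s in scorer.score(ref, hyp).items()}`. When every pair has all three variants, the pooled mean in `calculate_rogue_score` equals the mean of these three per-variant averages; reporting them separately shows, for example, whether a model loses mainly on bigram overlap.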


**BLEU (Bilingual Evaluation Understudy):**

**Purpose**: BLEU is a metric used to evaluate the quality of machine-generated text, especially in the context of machine translation.

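**How it works:** It measures n-gram overlap between the generated text and one or more reference texts. BLEU computes clipped (modified) precision for n-grams of length 1 to 4, where each candidate n-gram is counted at most as many times as it appears in a reference, combines these precisions with a geometric mean, and multiplies the result by a brevity penalty when the candidate is shorter than the reference.

**Interpretation:** A higher BLEU score (closer to 1.0) indicates a better match between the generated and reference texts. However, BLEU only rewards exact surface matches, so it does not account for synonyms, paraphrases, or semantic meaning. Without smoothing, a single n-gram order with no matches drives the sentence-level score to (nearly) 0, which is common for short texts such as questions.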
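For intuition, a minimal sketch of single-reference sentence BLEU written from the definition above: clipped n-gram precision for n = 1 to 4, uniform weights, geometric mean, brevity penalty, and no smoothing. It is not NLTK's code, and the function names are hypothetical. It takes token lists; feeding it lists of characters instead illustrates why passing raw strings, as `compute_cumulative_bleu` does with NLTK's `sentence_bleu`, yields character-level scores, which for this pair are higher than the word-level one. The example pair is the SQuAD/generated question pair printed in the ROUGE cell further down, with the SQuAD question used as the reference.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams (as tuples) in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu_sketch(reference, candidate, max_n=4):
    """Single-reference BLEU: uniform weights, brevity penalty, no smoothing."""
    if not candidate:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngram_counts(candidate, n)
        ref_counts = ngram_counts(reference, n)
        # Clip each candidate n-gram count by its count in the reference.
        matched = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if matched == 0:
            return 0.0  # without smoothing, one empty n-gram order zeroes the score
        log_precision_sum += math.log(matched / total)
    # Brevity penalty: candidates shorter than the reference are penalized.
    c, r = len(candidate), len(reference)
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(log_precision_sum / max_n)


squad = "Where was the first horse racetrack located ?".split()
generated = "What was the first horse racetrack in Mokotów ?".split()
word_level = sentence_bleu_sketch(squad, generated)
char_level = sentence_bleu_sketch(list(" ".join(squad)), list(" ".join(generated)))
print(round(word_level, 4))     # 0.4671, i.e. (6/9 * 4/8 * 3/7 * 2/6) ** 0.25
print(char_level > word_level)  # True
```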

In [89]:
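# Generate a question for validation example 3001 with Model 1 and compute
# word-level BLEU against the SQuAD question (both texts are tokenized).
refer_text = get_question(valid_dataset[3001]['context'],valid_dataset[3001]['answers']['text'][0],question_model_1,question_tokenizer_1)
referr = nltk.word_tokenize(refer_text)
candidate_text = nltk.word_tokenize(valid_dataset[3001]['question'])
bleu_score = sentence_bleu([referr], candidate_text)

for wrp in wrap(valid_dataset[3001]['context'], 150):
  print (wrp)
print('\n')
print("Generated question: ")
print(refer_text)
print("Question from SQUAD")
# Note: this line prints example 3000, whereas the BLEU score above uses the
# question of example 3001, so the question shown below is not the scored one.
print(valid_dataset[3000]['question'])
print("BLEU Score for Model 1 : " + str(bleu_score))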

As of August 2010, Victoria had 1,548 public schools, 489 Catholic schools and 214 independent schools. Just under 540,800 students were enrolled in
public schools, and just over 311,800 in private schools. Over 61 per cent of private students attend Catholic schools. More than 462,000 students
were enrolled in primary schools and more than 390,000 in secondary schools. Retention rates for the final two years of secondary school were 77 per
cent for public school students and 90 per cent for private school students. Victoria has about 63,519 full-time teachers.


Generated question: 
What percentage of private students attend Catholic schools?
Question from SQUAD
How many full time teachers does Victoria have?
BLEU Score for Model 1 : 0.3508439695638686



**METEOR (Metric for Evaluation of Translation with Explicit ORdering):**

**Purpose**: METEOR is another metric commonly used for machine translation evaluation.

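**How it works:** METEOR aligns the unigrams of the candidate and reference texts using exact matches, stem matches, and synonym matches (via WordNet). It combines unigram precision and recall in a harmonic mean that weights recall more heavily than precision, then applies a fragmentation penalty that lowers the score when the matched words are scattered rather than appearing in contiguous, same-order chunks.

**Interpretation:** A higher METEOR score indicates better performance. Thanks to stem and synonym matching, it is more robust than BLEU to variations in word forms and wording, and the fragmentation penalty makes it sensitive to word order.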
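The scoring formula can be sketched in a simplified form that keeps only the exact-match stage (no stemming or WordNet synonyms) and aligns greedily, each hypothesis token taking the first unused identical reference token. The parameter values alpha = 0.9, beta = 3 and gamma = 0.5 are NLTK's defaults; the function name is hypothetical, and because NLTK adds stem and synonym matching, its scores will generally differ from this sketch.

```python
def meteor_exact_sketch(reference, hypothesis, alpha=0.9, beta=3.0, gamma=0.5):
    """Simplified METEOR on token lists: exact (lower-cased) unigram matches only."""
    ref = [t.lower() for t in reference]
    hyp = [t.lower() for t in hypothesis]
    used = set()
    alignment = []  # (hypothesis index, reference index), in hypothesis order
    for i, tok in enumerate(hyp):
        for j, ref_tok in enumerate(ref):
            if j not in used and ref_tok == tok:
                used.add(j)
                alignment.append((i, j))
                break
    m = len(alignment)
    if m == 0:
        return 0.0
    precision = m / len(hyp)
    recall = m / len(ref)
    # Weighted harmonic mean; alpha = 0.9 puts most of the weight on recall.
    fmean = precision * recall / (alpha * precision + (1 - alpha) * recall)
    # A chunk is a run of matches that are adjacent in both texts.
    chunks = 1
    for (h0, r0), (h1, r1) in zip(alignment, alignment[1:]):
        if not (h1 == h0 + 1 and r1 == r0 + 1):
            chunks += 1
    penalty = gamma * (chunks / m) ** beta
    return fmean * (1 - penalty)


squad = "Where is a palm house with subtropic plants from all over the world on display ?".split()
generated = "Where is a palm house located ?".split()
print(round(meteor_exact_sketch(squad, generated), 3))  # 0.39
```

In this pair, six of the seven generated tokens match, forming two chunks ("Where is a palm house" and "?"); the low recall (6 of 16 reference tokens) dominates the score, which reflects METEOR's emphasis on recall.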

In [90]:
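refer_text = get_question(valid_dataset[1000]['context'],valid_dataset[1000]['answers']['text'][0],question_model_1,question_tokenizer_1)
hypothesis = valid_dataset[1000]['question']
# calculate_meteor expects lists of questions; passing two bare strings would
# make it iterate over their characters, so wrap the pair in one-element lists.
meteor_1 = calculate_meteor([refer_text],[hypothesis])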

for wrp in wrap(valid_dataset[1000]['context'], 150):
  print (wrp)
print('\n')
print("Generated Question")
print(refer_text)
print("Question from SQUAD")
print(valid_dataset[1000]['question'])
print("METEOR score for Model 1 :" + str(meteor_1))

Other green spaces in the city include the Botanic Garden and the University Library garden. They have extensive botanical collection of rare domestic
and foreign plants, while a palm house in the New Orangery displays plants of subtropics from all over the world. Besides, within the city borders,
there are also: Pole Mokotowskie (a big park in the northern Mokotów, where was the first horse racetrack and then the airport), Park Ujazdowski
(close to the Sejm and John Lennon street), Park of Culture and Rest in Powsin, by the southern city border, Park Skaryszewski by the right Vistula
bank, in Praga. The oldest park in Praga, the Praga Park, was established in 1865–1871 and designed by Jan Dobrowolski. In 1927 a zoological garden
(Ogród Zoologiczny) was established on the park grounds, and in 1952 a bear run, still open today.


Generated Question
Where is a palm house located?
Question from SQUAD
Where is a palm house with subtropic plants from all over the world on display?
METEOR sc

**ROUGE (Recall-Oriented Understudy for Gisting Evaluation):**

**Purpose:** ROUGE is a family of metrics used for evaluating the quality of summaries, machine translation, and other text generation tasks.

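**How it works:** ROUGE measures the overlap between the generated text and reference texts. ROUGE-N counts overlapping n-grams (ROUGE-1 for unigrams, ROUGE-2 for bigrams), ROUGE-L uses the longest common subsequence (words that appear in the same order but not necessarily contiguously), and ROUGE-W is a weighted variant of ROUGE-L that favors consecutive matches. Each is reported as precision, recall, and F1 (F-measure); the `rouge_score` package used here computes ROUGE-1, ROUGE-2 and ROUGE-L.

**Interpretation:** Higher ROUGE scores generally indicate better agreement between the generated and reference texts. Like BLEU, ROUGE relies on surface word overlap (optionally after stemming), so it does not reward synonyms or paraphrases.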
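A minimal from-scratch sketch of ROUGE-N and ROUGE-L, assuming a tokenizer that lower-cases and splits on anything other than a to z and 0 to 9 (an approximation of the `rouge_score` tokenizer, with stemming omitted). The function names are hypothetical. As in `rouge_score`, precision is computed against the prediction and recall against the target.

```python
import re
from collections import Counter


def rouge_tokenize(text):
    # Lower-case, replace every non [a-z0-9] run with a space, then split.
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).split()


def prf(overlap, n_pred, n_target):
    """Precision, recall and F1 from an overlap count."""
    p = overlap / n_pred if n_pred else 0.0
    r = overlap / n_target if n_target else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def rouge_n(target, prediction, n):
    t = Counter(tuple(target[i:i + n]) for i in range(len(target) - n + 1))
    p = Counter(tuple(prediction[i:i + n]) for i in range(len(prediction) - n + 1))
    overlap = sum((t & p).values())  # clipped n-gram matches
    return prf(overlap, sum(p.values()), sum(t.values()))


def lcs_length(a, b):
    # Dynamic-programming longest common subsequence length.
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            if x == y:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[-1][-1]


def rouge_l(target, prediction):
    return prf(lcs_length(target, prediction), len(prediction), len(target))


target = rouge_tokenize("What was the first horse racetrack in Mokotów?")
prediction = rouge_tokenize("Where was the first horse racetrack located?")
for name, score in [("rouge1", rouge_n(target, prediction, 1)),
                    ("rouge2", rouge_n(target, prediction, 2)),
                    ("rougeL", rouge_l(target, prediction))]:
    print(name, [round(v, 4) for v in score])
# rouge1 [0.7143, 0.5556, 0.625]
# rouge2 [0.6667, 0.5, 0.5714]
# rougeL [0.7143, 0.5556, 0.625]
```

On the pair scored in the next cell this gives the same precision, recall and F1 values that `rouge_score` reports there. The tokenizer splits "Mokotów" into "mokot" and "w", and Porter stemming, which the sketch omits, does not change the overlap for this pair.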

In [93]:
from rouge_score import rouge_scorer

# Initialize the scorer
scorer = rouge_scorer.RougeScorer(['rouge1', 'rouge2', 'rougeL'], use_stemmer=True)
# Reference: question generated by Model 1; hypothesis: the SQuAD question
reference = get_question(valid_dataset[1001]['context'],valid_dataset[1001]['answers']['text'][0],question_model_1,question_tokenizer_1)
hypothesis = valid_dataset[1001]['question']

# Compute the ROUGE scores (scorer.score takes the target first, then the prediction)
scores = scorer.score(reference, hypothesis)

# Print the scores
print(reference)
print(hypothesis)
print(scores)

What was the first horse racetrack in Mokotów?
Where was the first horse racetrack located?
{'rouge1': Score(precision=0.7142857142857143, recall=0.5555555555555556, fmeasure=0.6250000000000001), 'rouge2': Score(precision=0.6666666666666666, recall=0.5, fmeasure=0.5714285714285715), 'rougeL': Score(precision=0.7142857142857143, recall=0.5555555555555556, fmeasure=0.6250000000000001)}


In [74]:
# Generate a question with each model for 1,000 validation examples
# (indices 1000 to 1999) and keep the matching SQuAD question.
gen_ques_1 = []
gen_ques_2 = []
gen_cust = []
squad_ques = []
for i in range(1000):
  context = valid_dataset[1000+i]['context']
  answer = valid_dataset[1000+i]['answers']['text'][0]
  question = valid_dataset[1000+i]['question']

  ques_1 = get_question(context,answer,question_model_1,question_tokenizer_1)
  gen_ques_1.append(ques_1)
  ques_2 = get_question(context,answer,question_model_2,question_tokenizer_2)
  gen_ques_2.append(ques_2)
  cust_1 = get_question(context,answer,custom_model,custom_tokenizer)
  gen_cust.append(cust_1)
  squad_ques.append(question)

# The generated questions are passed in the "reference" position and the SQuAD
# questions as candidates. BLEU and METEOR are not symmetric, so swapping the
# arguments may change their scores; ROUGE F1 does not depend on the order.
avg_bleu_1 = compute_cumulative_bleu(gen_ques_1,squad_ques)
avg_bleu_2 = compute_cumulative_bleu(gen_ques_2,squad_ques)
avg_bleu_3 = compute_cumulative_bleu(gen_cust,squad_ques)

meteor_score_1 = calculate_meteor(gen_ques_1,squad_ques)
meteor_score_2 = calculate_meteor(gen_ques_2,squad_ques)
meteor_score_3 = calculate_meteor(gen_cust,squad_ques)

rouge_score_1 = calculate_rogue_score(gen_ques_1,squad_ques)
rouge_score_2 = calculate_rogue_score(gen_ques_2,squad_ques)
rouge_score_3 = calculate_rogue_score(gen_cust,squad_ques)

print("Average BLEU score for Model 1 :" + str(avg_bleu_1))
print("Average BLEU score for Model 2 :" + str(avg_bleu_2))
print("Average BLEU score for custom Model :" + str(avg_bleu_3))
print("METEOR score for Model 1 :" + str(meteor_score_1))
print("METEOR score for Model 2 :" + str(meteor_score_2))
print("METEOR score for custom Model :" + str(meteor_score_3))
print("ROUGE score for Model 1 :" + str(rouge_score_1))
print("ROUGE score for Model 2 :" + str(rouge_score_2))
print("ROUGE score for custom Model :" + str(rouge_score_3))


Average BLEU score for Model 1 :0.494377568020209
Average BLEU score for Model 2 :0.4268287183169667
Average BLEU score for custom Model :0.38884463560537286
METEOR score for Model 1 :0.45008744681849067
METEOR score for Model 2 :0.36517929532752447
METEOR score for custom Model :0.32109491213387004
ROUGE score for Model 1 :0.45971469023407713
ROUGE score for Model 2 :0.3853947042543647
ROUGE score for custom Model :0.3429193880629611


Overall, the pre-trained Model 1 outperforms both the pre-trained Model 2 and the custom model on all three evaluation metrics (BLEU, METEOR, and ROUGE). Model 2 scores higher than the custom model on every metric, and the custom model consistently has the lowest scores.

Despite the relatively low scores on these metrics, all three models generate questions that are relevant to the context, given well-chosen keywords (answers). The metrics are also limited by design: each generated question is compared with a single SQuAD reference question, so a valid question that is simply worded differently is penalized. The best approach for evaluating this type of model would therefore be human evaluation, as in other cited works.

# References

Evans, T., Kensington-Miller, B., Novak, J., 2021. Effectiveness, efficiency, engagement: Mapping the impact of pre-lecture quizzes on educational exchange. AJET 163–177. https://doi.org/10.14742/ajet.6258

Ferreira,  ., 2019. Question Generation using Deep Neural Networks.

Kim, Y., Lee, H., Shin, J., Jung, K., 2019. Improving Neural Question Generation Using Answer Separation. AAAI 33, 6602–6609. https://doi.org/10.1609/aaai.v33i01.33016602

Kumar, V., Ramakrishnan, G., Li, Y.-F., 2018. A framework for automatic question generation from text using deep reinforcement learning. ArXiv.

Mulla, N., Gharpure, P., 2023. Automatic question generation: a review of methodologies, datasets, evaluation metrics, and applications. Prog Artif Intell 12, 1–32. https://doi.org/10.1007/s13748-023-00295-9

Romero, E., Remón, J., 2022. Software for increasing engagement in the classroom through quizzes and gamification: A critical revision addressing characteristics, advantages and disadvantages. INTED2022 Proceedings 10169–10175. https://doi.org/10.21125/inted.2022.2673

Shaheer, S., Hossain, I., Sarna, S.N., Kabir Mehedi, M.H., Rasel, A.A., 2023. Evaluating Question generation models using QA systems and Semantic Textual Similarity, in: 2023 IEEE 13th Annual Computing and Communication Workshop and Conference (CCWC), pp. 0431–0435. https://doi.org/10.1109/CCWC57344.2023.10099244