## Multiple-Choice Question Generation from a Given Text Using NLP

### Problem statement:
We are tasked with developing a solution that automatically generates
objective questions with multiple correct answers from a given chapter of a subject.
The generated questions should test the reader's understanding of the chapter, and each
should have more than one possible correct answer to make it more complex and challenging.
Beyond testing comprehension, the questions should encourage readers to think past the
surface level and explore different perspectives and possibilities. Ultimately, the
objective of this project is to develop a robust and accurate solution that helps
educators create engaging and challenging assessments for their students.

### Solution methodology:
The problem is fairly complex: developing a solution from scratch takes time, along with thorough training and fine-tuning on domain-specific data. Although many APIs such as GPT-3 are available, this problem is posed with production use at a company in mind, so we cannot rely heavily on external APIs. Instead, we take small pretrained models and use them to demonstrate the overall workflow for this kind of problem.
In the interest of time, the model developed here generates multiple-choice questions with a single correct answer. A more robust model for the multiple-correct-answer task is left to the future scope of the project.

### Procedure outline:
1. Given an article, we perform either extractive summarization (picking out the important sentences verbatim to form a summary) or abstractive summarization (producing a summary that rephrases or rewrites the sentences).
2. If we perform extractive summarization, we then paraphrase the sentences using a language model.
3. We extract keywords from the processed text using algorithms such as YAKE, TopicRank, KeyBERT, or MultipartiteRank.
4. We generate a question for each extracted keyword by passing the processed text together with the keyword to the fine-tuned model.
5. We generate the distractors (wrong choices) for each question using WordNet, sense2vec, word2vec, or similar resources.
6. We now have each question with its correct answer and its wrong answers, so we display the result.
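
The outline above can be read as a chain of stages. The sketch below is a minimal illustration of that chain, not the notebook's implementation: `build_mcqs` and the stage callables passed to it are hypothetical names, and the toy lambdas stand in for the real summarizer, keyword extractor, question generator, and distractor generator.

```python
from typing import Callable, Dict, List


def build_mcqs(
    text: str,
    summarize: Callable[[str], str],
    extract_keywords: Callable[[str], List[str]],
    make_question: Callable[[str, str], str],
    make_distractors: Callable[[str, str], List[str]],
) -> List[Dict[str, object]]:
    """Chain the stages: summarize -> keywords -> question -> distractors."""
    summary = summarize(text)                          # steps 1-2
    results = []
    for keyword in extract_keywords(text):             # step 3
        question = make_question(summary, keyword)     # step 4
        wrong = make_distractors(keyword, question)    # step 5
        results.append(
            {"question": question, "answer": keyword, "distractors": wrong}
        )
    return results                                     # step 6


# Toy stand-ins (assumptions) so the sketch runs without any model.
demo = build_mcqs(
    "Paris is the capital of France.",
    summarize=lambda t: t,
    extract_keywords=lambda t: ["Paris"],
    make_question=lambda ctx, ans: ctx.replace(ans, "_____"),
    make_distractors=lambda ans, q: ["Lyon", "Nice", "Lille"],
)
print(demo[0]["question"])  # _____ is the capital of France.
```

Passing the stages in as callables keeps the workflow independent of any particular model, so a summarizer or distractor source can be swapped without touching the loop.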

#### Importing all the necessary libraries

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
pip install --upgrade pip


Collecting pip
  Downloading pip-24.0-py3-none-any.whl (2.1 MB)
Installing collected packages: pip
  Attempting uninstall: pip
    Found existing installation: pip 23.1.2
    Uninstalling pip-23.1.2:
      Successfully uninstalled pip-23.1.2
Successfully installed pip-24.0


In [None]:
!pip install git+https://github.com/boudinfl/pke.git


Collecting git+https://github.com/boudinfl/pke.git
  Cloning https://github.com/boudinfl/pke.git to /tmp/pip-req-build-2f3egfgg
  Running command git clone --filter=blob:none --quiet https://github.com/boudinfl/pke.git /tmp/pip-req-build-2f3egfgg
  Resolved https://github.com/boudinfl/pke.git to commit 69871ffdb720b83df23684fea53ec8776fd87e63
  Preparing metadata (setup.py) ... done
Collecting unidecode (from pke==2.0.0)
  Downloading Unidecode-1.3.8-py3-none-any.whl.metadata (13 kB)
Downloading Unidecode-1.3.8-py3-none-any.whl (235 kB)
Building wheels for collected packages: pke
  Building wheel for pke (setup.py) ... done
  Created wheel for pke: filename=pke-2.0.0-py3-none-any.whl size=6160628 sha256=da65c534f73e82020efba0a0937573958af955c20e013f6b7463780cc7ace56e
  Stored in directory: /tmp/pip-ephem-wheel-cache-3u90bye4/wheels/8c/07/

In [None]:
# Import all the necessary libraries
import warnings
warnings.filterwarnings("ignore")
import torch
from transformers import T5ForConditionalGeneration,T5Tokenizer
!pip install sense2vec
from sense2vec import Sense2Vec
!pip install sentence-transformers
from sentence_transformers import SentenceTransformer
import textwrap
import random
import numpy as np
import nltk
nltk.download('punkt')
nltk.download('brown')
nltk.download('wordnet')
from nltk.corpus import wordnet as wn
from nltk.tokenize import sent_tokenize
nltk.download('stopwords')
from nltk.corpus import stopwords
import string
!pip install pke
import pke
import traceback
!pip install flashtext
from flashtext import KeywordProcessor
from collections import OrderedDict
from sklearn.metrics.pairwise import cosine_similarity
nltk.download('omw-1.4')
!pip install python-Levenshtein
!pip install python-string-similarity
!pip install strsimpy
from strsimpy.normalized_levenshtein import NormalizedLevenshtein
from Levenshtein import distance
import pickle
import time
import os

Collecting sense2vec
  Downloading sense2vec-2.0.2-py2.py3-none-any.whl.metadata (54 kB)
Downloading sense2vec-2.0.2-py2.py3-none-any.whl (40 kB)
Installing collected packages: sense2vec
Successfully installed sense2vec-2.0.2
Collecting sentence-transformers
  Downloading sentence_transformers-2.7.0-py3-none-any.whl.metadata (11 kB)
Collecting nvidia-cuda-nvrtc-cu12==12.1.105 (from torch>=1.11.0->sentence-transformers)
  Downloading nvidia_cuda_nvrtc_cu12-12.1.105-py3-none-manylinux1_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-runtime-cu12==12.1.105 (from torch>=1.11.0->sentence-transformers)
  Downloading nvidia_cuda_runtime_cu12-12.1.105-py3-none-manylinux1_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-cupti-cu12==12.1.105 (

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package brown to /root/nltk_data...
[nltk_data]   Unzipping corpora/brown.zip.
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.


Collecting flashtext
  Downloading flashtext-2.7.tar.gz (14 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: flashtext
  Building wheel for flashtext (setup.py) ... done
  Created wheel for flashtext: filename=flashtext-2.7-py2.py3-none-any.whl size=9296 sha256=cf8a70c47ba73eea6cf171a3f583a48958305233411fbb534ec8a39aa0321815
  Stored in directory: /root/.cache/pip/wheels/bc/be/39/c37ad168eb2ff644c9685f52554440372129450f0b8ed203dd
Successfully built flashtext
Installing collected packages: flashtext
Successfully installed flashtext-2.7

[nltk_data] Downloading package omw-1.4 to /root/nltk_data...


Collecting python-Levenshtein
  Downloading python_Levenshtein-0.25.1-py3-none-any.whl.metadata (3.7 kB)
Collecting Levenshtein==0.25.1 (from python-Levenshtein)
  Downloading Levenshtein-0.25.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (3.3 kB)
Collecting rapidfuzz<4.0.0,>=3.8.0 (from Levenshtein==0.25.1->python-Levenshtein)
  Downloading rapidfuzz-3.8.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (11 kB)
Downloading python_Levenshtein-0.25.1-py3-none-any.whl (9.4 kB)
Downloading Levenshtein-0.25.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (177 kB)
Downloading rapidfuzz-3.8.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.4 MB)
Installing collected packages: r

#### Checking for GPU availability

In [None]:
import torch
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)

cpu


#### Getting the pretrained models and tokenizers from disk:
1. Sense2vec model trained on Reddit 2015 data.
2. T5-base pretrained summarizer model.
3. Pretrained T5 question generator model trained on the SQuAD v1 dataset.
4. msmarco-distilbert-base-v2 pretrained sentence-transformer model.

In [None]:
# from sense2vec import Sense2Vec
# import spacy

# # Load English tokenizer, tagger, parser, NER, and word vectors
# nlp = spacy.load("en_core_web_sm")


# # Load the pretrained Sense2Vec model (Reddit 2019 vectors) from disk
# sense2vec_model = Sense2Vec().from_disk("reddit_2019_s2v_model.bin")

# # Optionally, save the loaded model to another location on disk
# sense2vec_model.to_disk("trained_sense2vec_model.bin")


In [None]:
# Load the sense2vec model trained on Reddit 2015 data; it is reported to give better results than the 2019 one.
s2v = Sense2Vec().from_disk('/content/drive/MyDrive/s2v_reddit_2015_md/s2v_old')

# Getting the summary model and its tokenizer
if os.path.exists("t5_summary_model.pkl"):
    with open('t5_summary_model.pkl', 'rb') as f:
        summary_model = pickle.load(f)
    print("Summary model found in the disc, model is loaded successfully.")

else:
    print("Summary model does not exists in the path specified, downloading the model from web....")
    start_time = time.time()
    summary_model = T5ForConditionalGeneration.from_pretrained('t5-base')
    end_time = time.time()

    print("downloaded the summary model in ",(end_time-start_time)/60," min , now saving it to disc...")

    with open("t5_summary_model.pkl", 'wb') as f:
        pickle.dump(summary_model,f)

    print("Done. Saved the model to disc.")

if os.path.exists("t5_summary_tokenizer.pkl"):
    with open('t5_summary_tokenizer.pkl', 'rb') as f:
        summary_tokenizer = pickle.load(f)
    print("Summary tokenizer found in the disc and is loaded successfully.")
else:
    print("Summary tokenizer does not exists in the path specified, downloading the model from web....")

    start_time = time.time()
    summary_tokenizer = T5Tokenizer.from_pretrained('t5-base')
    end_time = time.time()

    print("downloaded the summary tokenizer in ",(end_time-start_time)/60," min , now saving it to disc...")

    with open("t5_summary_tokenizer.pkl",'wb') as f:
        pickle.dump(summary_tokenizer,f)

    print("Done. Saved the tokenizer to disc.")


#Getting question model and tokenizer
if os.path.exists("t5_question_model.pkl"):
    with open('t5_question_model.pkl', 'rb') as f:
        question_model = pickle.load(f)
    print("Question model found in the disc, model is loaded successfully.")
else:
    print("Question model does not exists in the path specified, downloading the model from web....")
    start_time= time.time()
    question_model = T5ForConditionalGeneration.from_pretrained('ramsrigouthamg/t5_squad_v1')
    end_time = time.time()

    print("downloaded the question model in ",(end_time-start_time)/60," min , now saving it to disc...")

    with open("t5_question_model.pkl", 'wb') as f:
        pickle.dump(question_model,f)

    print("Done. Saved the model to disc.")

if os.path.exists("t5_question_tokenizer.pkl"):
    with open('t5_question_tokenizer.pkl', 'rb') as f:
        question_tokenizer = pickle.load(f)
    print("Question tokenizer found in the disc, model is loaded successfully.")
else:
    print("Question tokenizer does not exists in the path specified, downloading the model from web....")

    start_time = time.time()
    question_tokenizer = T5Tokenizer.from_pretrained('ramsrigouthamg/t5_squad_v1')
    end_time=time.time()

    print("downloaded the question tokenizer in ",(end_time-start_time)/60," min , now saving it to disc...")

    with open("t5_question_tokenizer.pkl",'wb') as f:
        pickle.dump(question_tokenizer,f)

    print("Done. Saved the tokenizer to disc.")

# Move the models to the GPU if one is available
summary_model = summary_model.to(device)
question_model = question_model.to(device)

#Getting the sentence transformer model and its tokenizer
# paraphrase-distilroberta-base-v1
if os.path.exists("sentence_transformer_model.pkl"):
    with open("sentence_transformer_model.pkl",'rb') as f:
        sentence_transformer_model = pickle.load(f)
    print("Sentence transformer model found in the disc, model is loaded successfully.")
else:
    print("Sentence transformer model does not exists in the path specified, downloading the model from web....")
    start_time=time.time()
    sentence_transformer_model = SentenceTransformer("sentence-transformers/msmarco-distilbert-base-v2")
    end_time=time.time()

    print("downloaded the sentence transformer in ",(end_time-start_time)/60," min , now saving it to disc...")

    with open("sentence_transformer_model.pkl",'wb') as f:
        pickle.dump(sentence_transformer_model,f)

    print("Done saving to disc.")


Summary model does not exists in the path specified, downloading the model from web....


config.json:   0%|          | 0.00/1.21k [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/892M [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/147 [00:00<?, ?B/s]

downloaded the summary model in  0.2034963846206665  min , now saving it to disc...
Done. Saved the model to disc.
Summary tokenizer does not exists in the path specified, downloading the model from web....


spiece.model:   0%|          | 0.00/792k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.39M [00:00<?, ?B/s]

You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


downloaded the summary tokenizer in  0.020698606967926025  min , now saving it to disc...
Done. Saved the tokenizer to disc.
Question model does not exists in the path specified, downloading the model from web....


config.json:   0%|          | 0.00/1.21k [00:00<?, ?B/s]

pytorch_model.bin:   0%|          | 0.00/892M [00:00<?, ?B/s]

downloaded the question model in  0.40117359161376953  min , now saving it to disc...
Done. Saved the model to disc.
Question tokenizer does not exists in the path specified, downloading the model from web....


tokenizer_config.json:   0%|          | 0.00/1.86k [00:00<?, ?B/s]

spiece.model:   0%|          | 0.00/792k [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/1.79k [00:00<?, ?B/s]

downloaded the question tokenizer in  0.009210840861002604  min , now saving it to disc...
Done. Saved the tokenizer to disc.
Sentence transformer model does not exists in the path specified, downloading the model from web....


modules.json:   0%|          | 0.00/229 [00:00<?, ?B/s]

config_sentence_transformers.json:   0%|          | 0.00/122 [00:00<?, ?B/s]

README.md:   0%|          | 0.00/3.75k [00:00<?, ?B/s]

sentence_bert_config.json:   0%|          | 0.00/53.0 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/545 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/265M [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/440 [00:00<?, ?B/s]

vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/466k [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/112 [00:00<?, ?B/s]

1_Pooling/config.json:   0%|          | 0.00/190 [00:00<?, ?B/s]

downloaded the sentence transformer in  0.05575402180353801  min , now saving it to disc...
Done saving to disc.
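
The loading cell above repeats the same check, load, or download-and-save pattern for each model and tokenizer. As a sketch, the pattern can be factored into one helper. `load_or_create` is a hypothetical name, and the demo uses a plain dictionary in place of a model so that it runs offline. For Hugging Face models, `save_pretrained` and `from_pretrained` are generally a more portable alternative to pickling.

```python
import os
import pickle
import tempfile
from typing import Any, Callable


def load_or_create(path: str, factory: Callable[[], Any]) -> Any:
    """Return the object pickled at `path`, or build it with `factory` and cache it."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    obj = factory()
    with open(path, "wb") as f:
        pickle.dump(obj, f)
    return obj


# Demo with a stand-in object (assumption). In the notebook the factory would be
# something like: lambda: T5ForConditionalGeneration.from_pretrained("t5-base")
calls = []


def fake_download():
    calls.append(1)
    return {"name": "stand-in model"}


with tempfile.TemporaryDirectory() as tmp:
    cache_path = os.path.join(tmp, "model.pkl")
    first = load_or_create(cache_path, fake_download)   # builds and saves
    second = load_or_create(cache_path, fake_download)  # loads from disk

print(first == second, len(calls))  # True 1
```

The factory runs only once; the second call is served from the cached file, which is the behavior the notebook relies on between sessions.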


#### Defining all the utility functions

In [None]:
def set_seed(seed: int):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

def postprocesstext (content):
  """
  Tokenize the text into sentences, capitalize each sentence, and join the sentences back into a single string.
  Note that str.capitalize() also lowercases the rest of each sentence, so an acronym such as "NLP" becomes "Nlp" or "nlp".
  """
  final=""
  for sent in sent_tokenize(content):
    sent = sent.capitalize()
    final = final +" "+sent
  return final

def summarizer(text,model,tokenizer):
  """
  Summarize the given text with the given model and tokenizer.
  The input is truncated to 512 tokens before generation.
  """
  text = text.strip().replace("\n"," ")
  text = "summarize: "+text
  # print (text)
  max_len = 512
  # padding=False replaces the deprecated pad_to_max_length=False
  encoding = tokenizer.encode_plus(text,max_length=max_len, padding=False,truncation=True, return_tensors="pt").to(device)

  input_ids, attention_mask = encoding["input_ids"], encoding["attention_mask"]

  outs = model.generate(input_ids=input_ids,
                                  attention_mask=attention_mask,
                                  early_stopping=True,
                                  num_beams=3,
                                  num_return_sequences=1,
                                  no_repeat_ngram_size=2,
                                  min_length = 75,
                                  max_length=300)

  dec = [tokenizer.decode(ids,skip_special_tokens=True) for ids in outs]
  summary = dec[0]
  summary = postprocesstext(summary)
  summary= summary.strip()

  return summary

def get_nouns_multipartite(content):
    """
    Extract up to 15 keyphrases from the content with MultipartiteRank, so that they can be used as answers and for context-based distractors.
    """
    out=[]
    try:
        extractor = pke.unsupervised.MultipartiteRank()
        extractor.load_document(input=content,language='en')
        #    not contain punctuation marks or stopwords as candidates.
        #pos = {'PROPN','NOUN',}
        pos = {'PROPN', 'NOUN', 'ADJ', 'VERB', 'ADP', 'ADV', 'DET', 'CONJ', 'NUM', 'PRON', 'X'}

        #pos = {'PROPN','NOUN'}
        stoplist = list(string.punctuation)
        stoplist += ['-lrb-', '-rrb-', '-lcb-', '-rcb-', '-lsb-', '-rsb-']
        stoplist += stopwords.words('english')
        # extractor.candidate_selection(pos=pos, stoplist=stoplist)
        extractor.candidate_selection( pos=pos)
        # 4. build the Multipartite graph and rank candidates using random walk,
        #    alpha controls the weight adjustment mechanism, see TopicRank for
        #    threshold/method parameters.
        extractor.candidate_weighting(alpha=1.1,
                                      threshold=0.75,
                                      method='average')
        keyphrases = extractor.get_n_best(n=15)


        for val in keyphrases:
            out.append(val[0])
    except Exception:
        # Fall back to an empty keyword list if extraction fails
        out = []
        #traceback.print_exc()

    return out

def get_keywords(originaltext):
  """
  Extract keywords from the original text with MultipartiteRank.
  The commented-out code below additionally kept only the keywords that also appear in the summary.
  """
  keywords = get_nouns_multipartite(originaltext)
  #print ("keywords unsummarized: ",keywords)
  #keyword_processor = KeywordProcessor()
  #for keyword in keywords:
    #keyword_processor.add_keyword(keyword)

  #keywords_found = keyword_processor.extract_keywords(summarytext)
  #keywords_found = list(set(keywords_found))
  #print ("keywords_found in summarized: ",keywords_found)

  #important_keywords =[]
  #for keyword in keywords:
    #if keyword in keywords_found:
      #important_keywords.append(keyword)

  #return important_keywords
  return keywords

def get_question(context,answer,model,tokenizer):
  """
  Generate a question from the context whose answer is the given keyword, using the pretrained question-generation model and its tokenizer.
  """
  text = "context: {} answer: {}".format(context,answer)
  # padding=False replaces the deprecated pad_to_max_length=False
  encoding = tokenizer.encode_plus(text,max_length=384, padding=False,truncation=True, return_tensors="pt").to(device)
  input_ids, attention_mask = encoding["input_ids"], encoding["attention_mask"]

  outs = model.generate(input_ids=input_ids,
                                  attention_mask=attention_mask,
                                  early_stopping=True,
                                  num_beams=5,
                                  num_return_sequences=1,
                                  no_repeat_ngram_size=2,
                                  max_length=72)


  dec = [tokenizer.decode(ids,skip_special_tokens=True) for ids in outs]


  Question = dec[0].replace("question:","")
  Question= Question.strip()
  return Question

def filter_same_sense_words(original,wordlist):

  """
  Keep only the words that share the sense of the original word.
  Each entry in wordlist is a (key, score) pair whose key has the form "word|SENSE".
  """
  filtered_words=[]
  base_sense =original.split('|')[1]
  #print (base_sense)
  for eachword in wordlist:
    if eachword[0].split('|')[1] == base_sense:
      filtered_words.append(eachword[0].split('|')[0].replace("_", " ").title().strip())
  return filtered_words

def get_highest_similarity_score(wordlist,wrd):
  """
  Return the highest normalized Levenshtein similarity between the given word and any word in the wordlist.
  The score is used to reject candidates spelled too similarly to options already chosen, so that the options stay distinct while still relating to the same context.
  """
  score=[]
  normalized_levenshtein = NormalizedLevenshtein()
  for each in wordlist:
    score.append(normalized_levenshtein.similarity(each.lower(),wrd.lower()))
  return max(score)

def sense2vec_get_words(word,s2v,topn,question):
    """
    Take the input word, the sense2vec model, the number of similar entries to retrieve (topn), and the question.
    Find the best sense of the word, then retrieve the most similar entries that share that sense.
    Return those that are not too similar in spelling to the word or to each other and that do not appear in the question.
    """
    output = []
    #print ("word ",word)
    try:
      sense = s2v.get_best_sense(word, senses= ["NOUN", "PERSON","PRODUCT","LOC","ORG","EVENT","NORP","WORK OF ART","FAC","GPE","NUM","FACILITY"])
      most_similar = s2v.most_similar(sense, n=topn)
      # print (most_similar)
      output = filter_same_sense_words(sense,most_similar)
      #print ("Similar ",output)
    except:
      output =[]

    threshold = 0.6
    final=[word]
    checklist =question.split()
    for x in output:
      if get_highest_similarity_score(final,x)<threshold and x not in final and x not in checklist:
        final.append(x)

    return final[1:]

def mmr(doc_embedding, word_embeddings, words, top_n, lambda_param):
    """
    The mmr function takes document and word embeddings, along with other parameters, and uses the Maximal Marginal Relevance (MMR) algorithm to extract a specified number of keywords/keyphrases from the document. The MMR algorithm balances the relevance of keywords with their diversity, helping to select keywords that are both informative and distinct from each other.
    """

    # Extract similarity within words, and between words and the document
    word_doc_similarity = cosine_similarity(word_embeddings, doc_embedding)
    word_similarity = cosine_similarity(word_embeddings)

    # Initialize candidates and already choose best keyword/keyphrase
    keywords_idx = [np.argmax(word_doc_similarity)]
    candidates_idx = [i for i in range(len(words)) if i != keywords_idx[0]]

    for _ in range(top_n - 1):
        # Extract similarities within candidates and
        # between candidates and selected keywords/phrases
        candidate_similarities = word_doc_similarity[candidates_idx, :]
        target_similarities = np.max(word_similarity[candidates_idx][:, keywords_idx], axis=1)

        # Calculate MMR (named mmr_scores so it does not shadow the function name)
        mmr_scores = (lambda_param) * candidate_similarities - (1-lambda_param) * target_similarities.reshape(-1, 1)
        mmr_idx = candidates_idx[np.argmax(mmr_scores)]

        # Update keywords & candidates
        keywords_idx.append(mmr_idx)
        candidates_idx.remove(mmr_idx)

    return [words[idx] for idx in keywords_idx]

def get_distractors_wordnet(word):
    """
    Use WordNet to find the first noun synset of the input word, then collect the hyponyms of its first hypernym as distractors.
    These are sibling terms that belong to the same category as the input word.
    """
    distractors=[]
    try:
      word= word.lower()
      orig_word = word
      # WordNet joins multi-word lemmas with underscores
      word = word.replace(" ","_")
      syn = wn.synsets(word,'n')[0]

      hypernym = syn.hypernyms()
      if len(hypernym) == 0:
          return distractors
      for item in hypernym[0].hyponyms():
          name = item.lemmas()[0].name()
          #print ("name ",name, " word",orig_word)
          name = name.replace("_"," ")
          # Skip the input word itself
          if name.lower() == orig_word:
              continue
          name = " ".join(w.capitalize() for w in name.split())
          if name not in distractors:
              distractors.append(name)
    except Exception:
      print ("Wordnet distractors not found")
    return distractors

def get_distractors (word,origsentence,sense2vecmodel,sentencemodel,top_n,lambdaval):
  """
  Generate distractors (wrong answer choices) for a target word in the context of the given sentence.
  Candidates come from sense2vec and are then re-ranked with MMR on sentence-transformer embeddings, so that they are relevant to the context yet distinct from each other. The target word itself is excluded from the result.
  """
  distractors = sense2vec_get_words(word,sense2vecmodel,top_n,origsentence)
  #print ("distractors ",distractors)
  if len(distractors) ==0:
    return distractors
  distractors_new = [word.capitalize()]
  distractors_new.extend(distractors)
  # print ("distractors_new .. ",distractors_new)

  embedding_sentence = origsentence+ " "+word.capitalize()
  # embedding_sentence = word
  keyword_embedding = sentencemodel.encode([embedding_sentence])
  distractor_embeddings = sentencemodel.encode(distractors_new)

  # filtered_keywords = mmr(keyword_embedding, distractor_embeddings,distractors,4,0.7)
  max_keywords = min(len(distractors_new),5)
  filtered_keywords = mmr(keyword_embedding, distractor_embeddings,distractors_new,max_keywords,lambdaval)
  # filtered_keywords = filtered_keywords[1:]
  final = [word.capitalize()]
  for wrd in filtered_keywords:
    if wrd.lower() !=word.lower():
      final.append(wrd.capitalize())
  final = final[1:]
  return final

def get_mca_questions(context: str):
    """
    Generate multiple-choice questions from the given context.
    The context is summarized, keywords are extracted from the original text, a question is generated for each keyword, and the correct answer is placed at a random position among the distractors.
    """
    summarized_text = summarizer(context,summary_model,summary_tokenizer)

    #imp_keywords = get_keywords(context ,summarized_text)
    imp_keywords = get_keywords(context)
    output_list=[]
    for answer in imp_keywords:
      output=""
      ques = get_question(summarized_text,answer,question_model,question_tokenizer)

      distractors = get_distractors(answer.capitalize(),ques,s2v,sentence_transformer_model,40,0.2)

      output = output + ques + "\n"
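      if len(distractors) == 0:
         # Fall back to the other keywords when sense2vec yields no distractors
         distractors = [k for k in imp_keywords if k != answer]

      options = distractors[:4]
      if len(options)>0:
        # Place the correct answer in a random slot among the options actually shown
        random_integer = random.randint(0, len(options) - 1)
        alpha_list = ['(a)','(b)','(c)','(d)']
        for d,distractor in enumerate(options):
            if d == random_integer:
               output = output + alpha_list[d] + answer + "\n"
            else:
              output = output + alpha_list[d] + distractor + "\n"
        output = output + "Correct answer is : " + alpha_list[random_integer] + "\n\n"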

      output_list.append(output)

    mca_questions = output_list
    return mca_questions
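
To make the `mmr` selection step concrete, the following is a small, dependency-free sketch of Maximal Marginal Relevance on hand-made 2-D vectors. `mmr_select`, the toy vectors, and the candidate words are illustrative assumptions; the notebook's `mmr` applies the same scoring with NumPy and scikit-learn on sentence-transformer embeddings.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mmr_select(doc_vec, cand_vecs, words: List[str], top_n: int, lam: float) -> List[str]:
    """Greedy MMR: score = lam * sim(cand, doc) - (1 - lam) * max sim(cand, selected)."""
    relevance = [cosine(v, doc_vec) for v in cand_vecs]
    # Start with the candidate most similar to the document.
    selected = [max(range(len(words)), key=lambda i: relevance[i])]
    remaining = [i for i in range(len(words)) if i != selected[0]]
    while remaining and len(selected) < top_n:
        def score(i):
            redundancy = max(cosine(cand_vecs[i], cand_vecs[j]) for j in selected)
            return lam * relevance[i] - (1 - lam) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return [words[i] for i in selected]


doc = [1.0, 0.0]
words = ["cat", "kitten", "dog"]
vecs = [[1.0, 0.0], [0.8, 0.6], [0.6, -0.8]]  # toy unit vectors

high_lambda = mmr_select(doc, vecs, words, top_n=2, lam=0.9)  # relevance dominates
low_lambda = mmr_select(doc, vecs, words, top_n=2, lam=0.2)   # diversity dominates
print(high_lambda)  # ['cat', 'kitten']
print(low_lambda)   # ['cat', 'dog']
```

With a high `lam` the second pick is the candidate closest to the document, while a low `lam` (the notebook passes 0.2) favors the candidate least similar to what is already selected.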

#### Testing the solution on an example text

Example: 1

In [None]:
text_1 = "Natural Language Processing (NLP) stands as an interdisciplinary subfield bridging computer science and information retrieval, striving to equip computers with the capacity to comprehend and manipulate human language effectively. NLP draws upon concepts from theoretical linguistics to achieve this goal, aiming to accurately extract information and insights from documents while organizing and categorizing them efficiently. Historically, NLP traces its origins back to the 1940s, with notable milestones such as Alan Turing's proposal of the Turing test in 1950, which indirectly addressed automated interpretation and generation of natural language. Despite early optimism, progress in machine translation was slower than anticipated, leading to reduced funding post the ALPAC report in 1966. Nonetheless, the 1960s witnessed the development of successful NLP systems like SHRDLU and ELIZA, showcasing the potential of natural language processing. Machine learning algorithms, particularly statistical and neural network-based methods, gained prominence due to their ability to effectively process language data. Approaches in NLP have evolved over time, encompassing symbolic, statistical, and neural network-based methods. These tasks serve practical purposes across various domains, from information retrieval to sentiment analysis and machine translation. Cognition plays a pivotal role in shaping higher-level NLP applications, emulating intelligent behavior and comprehension of natural language."
final_questions = get_mca_questions(text_1)
for q in final_questions:
    print(q)

What does nlp stand for?
(a)Nlp
(b)Computer architecture
(c)Information retrieval
(d)natural language processing
Correct answer is : (d)


What does natural language processing stand for?
(a)Machine learning
(b)Mbti
Correct answer is : (d)


What is nlp also known as?
(a)information retrieval
(b)Mathematical modeling
(c)Bioinformatics
(d)Business processes
Correct answer is : (a)


Along with neural network-based methods, what type of machine learning algorithms gained prominence?
(a)natural language processing
(b)particularly statistical
(c)information retrieval
(d)particularly statistical
Correct answer is : (b)


What type of machine learning algorithms gained prominence?
(a)neural network-based methods
(b)nlp
(c)information retrieval
(d)particularly statistical
Correct answer is : (a)


What is the goal of nlp?
(a)striving
(b)Supplication
(c)Equanimity
(d)Highest good
Correct answer is : (a)


What gained prominence due to their ability to effectively process language data?
(a)Busi

Example: 2

In [None]:
text_2 = "Elon Musk and Bitcoin: A Complex Relationship.The intersection of Elon Musk, the enigmatic billionaire entrepreneur, and Bitcoin, the groundbreaking cryptocurrency, has been the subject of much fascination, speculation, and scrutiny in recent years. Musk, known for his ventures like Tesla, SpaceX, Neuralink, and The Boring Company, has proven to be a polarizing figure in the world of finance and technology, with his tweets and actions having significant implications for the price and perception of Bitcoin. This narrative explores the multifaceted relationship between Elon Musk and Bitcoin, delving into key events, controversies, and the broader implications for the cryptocurrency landscape.Elon Musk: A Brief Overview.Before delving into Musk's connection with Bitcoin, it's essential to understand who Elon Musk is and his role in the tech and automotive industries. Born in South Africa in 1971, Musk displayed a prodigious talent for technology from a young age. He moved to the United States to attend the University of Pennsylvania, where he earned dual bachelor's degrees in physics and economics.Musk's entrepreneurial journey began with the creation of Zip2, a software company he co-founded in 1995, which provided business directories and maps for newspapers. In 1999, Compaq acquired Zip2 for nearly $300 million, providing Musk with his first significant windfall.With his newfound wealth, Musk co-founded X.com, an online payment company, in 1999. X.com later evolved into PayPal and was sold to eBay in 2002 for $1.5 billion. Musk, however, did not rest on his laurels. Instead, he turned his attention to two ambitious and groundbreaking industries: electric vehicles and space exploration.Tesla, SpaceX, and Beyond.In 2004, Musk founded Tesla Motors (now Tesla, Inc.), with the goal of accelerating the world's transition to sustainable energy. 
Tesla's electric vehicles, beginning with the Roadster and followed by models like the Model S, Model 3, Model X, and Model Y, have transformed the automotive industry. Tesla's innovations in battery technology and electric drivetrains have played a crucial role in popularizing electric cars.Musk's SpaceX, founded in 2002, has achieved remarkable milestones in the field of space exploration. SpaceX developed the Falcon 1, the first privately developed liquid-fueled rocket to reach orbit, and later the Falcon 9 and Falcon Heavy, which have significantly reduced the cost of launching payloads into space. SpaceX also aims to establish a human presence on Mars through its Starship spacecraft.Beyond Tesla and SpaceX, Musk has pursued other ventures. Neuralink focuses on developing brain-computer interface technology, while The Boring Company is dedicated to creating underground transportation tunnels to alleviate urban congestion.Musk's achievements and ambitions have made him one of the most influential and scrutinized figures in the tech and business worlds. His social media presence, especially on Twitter, where he often shares updates on his companies and personal thoughts, has further amplified his reach and impact.Bitcoin: A Digital Revolution.Bitcoin, conceived in a 2008 whitepaper by the pseudonymous Satoshi Nakamoto, is a decentralized digital currency that operates on a peer-to-peer network. It represents a fundamental departure from traditional financial systems, as it doesn't rely on centralized authorities like banks or governments to validate and record transactions. Instead, Bitcoin transactions are confirmed by a network of computers, secured by cryptography, and recorded on a public ledger called the blockchain.Bitcoin's value proposition includes security, transparency, and the potential to serve as a store of value, similar to gold. 
It has gained traction as a means of transferring funds across borders, as a hedge against inflation, and as a speculative investment.As Bitcoin's popularity grew, it attracted attention from a diverse range of individuals and institutions, including tech entrepreneurs like Elon Musk.Elon Musk's Early Interest in Bitcoin.Musk's interest in Bitcoin became apparent through his tweets and comments on various occasions. While he didn't dive headfirst into the cryptocurrency, he showed a fascination with its technology and potential.In 2019, during a podcast interview with Ark Invest's Cathie Wood, Musk stated that Bitcoin is \"a far better way to transfer value than pieces of paper.\" He acknowledged its utility as a means of circumventing traditional financial intermediaries, highlighting its appeal to those who want greater control over their funds.Musk's acknowledgment of Bitcoin as a valuable innovation added to its credibility and mainstream acceptance. However, it was in the subsequent years that Musk's relationship with Bitcoin would become more complex, leading to significant fluctuations in the cryptocurrency's price and public perception.Tesla's Bitcoin Investment: A Game-Changer.The most pivotal moment in Elon Musk's involvement with Bitcoin came in early 2021 when Tesla, Inc. announced a groundbreaking move. In a filing with the U.S. Securities and Exchange Commission (SEC), Tesla revealed that it had purchased $1.5 billion worth of Bitcoin and intended to accept Bitcoin as a form of payment for its electric vehicles.This announcement sent shockwaves through the financial world, as it marked one of the most substantial endorsements of Bitcoin by a major corporation to date. 
Tesla's decision to allocate a portion of its corporate treasury to Bitcoin was seen as a bold move, signaling confidence in the cryptocurrency's long-term value.The rationale behind Tesla's Bitcoin investment, as outlined in its SEC filing, revolved around diversifying and maximizing returns on its cash reserves. Traditional avenues for cash management, like holding government bonds or placing funds in interest-bearing accounts, were yielding minimal returns. By allocating some of its capital to Bitcoin, Tesla sought to benefit from the cryptocurrency's potential for appreciation.This move had immediate repercussions on the price of Bitcoin, which surged to new all-time highs, surpassing $60,000 per BTC in early 2021. It also ignited a broader conversation about whether more corporations would follow Tesla's lead and allocate some of their treasuries to Bitcoin.Elon Musk's Tweets and Bitcoin's Volatility.Elon Musk's impact on Bitcoin extended beyond Tesla's investment. His prolific and often whimsical tweeting habits started influencing the cryptocurrency's price in ways that few could have predicted.Musk's tweets and public statements about Bitcoin and other cryptocurrencies were a double-edged sword. On one hand, his endorsement and public discussion of Bitcoin brought mainstream attention to the cryptocurrency, fueling interest and investment. On the other hand, his tweets could also lead to massive price swings, causing concern and frustration among investors and regulators.One of the most notable episodes occurred in May 2021 when Musk announced via Twitter that Tesla would no longer accept Bitcoin as payment for its vehicles, citing environmental concerns related to Bitcoin mining's energy consumption. This single tweet triggered a sharp decline in the price of Bitcoin, erasing billions of dollars in market value in a matter of hours.The environmental debate surrounding Bitcoin's energy use intensified following Musk's tweet. 
Critics argued that Bitcoin's energy consumption, driven by the proof-of-work consensus mechanism, was unsustainable and environmentally damaging. Proponents of Bitcoin countered that its energy use was comparable to, or even more efficient than, traditional banking and gold mining.Musk's influence over Bitcoin's price volatility didn't stop there. He continued to tweet about cryptocurrencies, often in cryptic and playful ways. For example, he tweeted about \"Baby Doge Coin\" and \"Shiba Inu,\" two meme cryptocurrencies, which led to speculative buying frenzies.These episodes prompted calls for greater regulation of Musk's Twitter activity and its potential impact on financial markets. Regulators expressed concerns about market manipulation and the need for greater transparency in Musk's communication regarding Tesla's Bitcoin holdings and intentions.Elon Musk's Impact on Altcoins.In addition to his influence on Bitcoin, Musk's tweets and comments have also affected various altcoins, which are cryptocurrencies other than Bitcoin. Dogecoin, in particular, emerged as a notable example of Musk's ability to move markets.Dogecoin, originally created as a meme cryptocurrency in 2013, gained a cult following on the internet due to its Shiba Inu dog logo and lighthearted community. However, its value remained relatively low and stable for years.In early 2021, Musk began tweeting about Dogecoin, referring to it as\"the people's crypto.\" His tweets, often accompanied by playful memes and comments, caused massive surges in Dogecoin\'s price. At one point, it reached an all-time high of over $0.60 per DOGE, a substantial increase from its previous fractions-of-a-penny valuation.Musk's involvement with Dogecoin extended to his hosting of \"Saturday Night Live\" in May 2021. During the show, he referred to Dogecoin as a \"hustle,\" which led to a temporary price drop. 
Nevertheless, his overall engagement with Dogecoin contributed to its growing popularity and market capitalization.The phenomenon of \"Musk tweets\" driving the prices of cryptocurrencies sparked debates about the role of celebrity endorsements and social media influencers in the cryptocurrency space. It also raised questions about the fundamental value of assets like Dogecoin and the risks of investing based on internet trends and social media hype.The Impact of Elon Musk\'s Environmental Concerns.Musk\'s environmental concerns regarding Bitcoin mining were not limited to his May 2021 tweet. He continued to advocate for more sustainable practices within the cryptocurrency industry. He called on Bitcoin miners to transition to renewable energy sources and even engaged in discussions with prominent figures in the cryptocurrency community about potential solutions.These concerns about Bitcoin\'s carbon footprint drew attention to a longstanding issue within the cryptocurrency space. Bitcoin's proof-of-work consensus mechanism requires significant computational power, which, in turn, demands substantial energy consumption. The majority of Bitcoin mining operations relied on fossil fuels, particularly coal, which contributed to concerns about its environmental impact.In response to Musk\'s comments and mounting environmental criticism, some Bitcoin miners began exploring cleaner energy sources and sustainable practices. Several Bitcoin mining companies announced commitments to using renewable energy, and discussions about transitioning to a more eco-friendly consensus mechanism gained momentum.Musk's advocacy for sustainability in the cryptocurrency sector highlighted the growing importance of environmental, social, and governance (ESG) considerations for investors and corporations. 
It also led to broader conversations about the environmental impact of blockchain technologies and the need for responsible innovation in the crypto space.Elon Musk\'s Evolving Stance on Bitcoin.Musk\'s relationship with Bitcoin and his stance on the cryptocurrency have evolved over time. While he initially praised Bitcoin for its technological advantages and supported its adoption within Tesla, he later raised concerns about its environmental impact and price volatility.In July 2021, during a B Word conference, Musk revealed that SpaceX held Bitcoin and that he personally held Bitcoin, Ethereum, and Dogecoin. This disclosure indicated that, despite his reservations, Musk maintained a personal interest in cryptocurrencies.As the year progressed, Musk\'s tweets about Bitcoin became less frequent, and his public statements about the cryptocurrency became more balanced. He acknowledged that he wanted to see Bitcoin succeed and that Tesla would likely accept Bitcoin again once its mining operations became more environmentally friendly.By late 2021, Musk indicated that he was working with Dogecoin developers to improve the cryptocurrency's efficiency, signaling his continued involvement in the crypto space.Elon Musk, Bitcoin, and Regulatory Scrutiny.The volatility in Bitcoin\'s price caused by Musk\'s tweets and statements raised concerns among regulators and lawmakers. The potential for market manipulation and the need for investor protection came into sharp focus.The U.S. Securities and Exchange Commission (SEC) and other regulatory bodies began examining Musk\'s social media activity and its impact on cryptocurrency and stock markets. The SEC had previously clashed with Musk over his tweets regarding Tesla's stock, resulting in legal settlements and restrictions on his communication.While Musk\'s influence over cryptocurrency markets is undoubtedly significant, the regulatory landscape for cryptocurrencies remains relatively nascent and complex. 
Regulators are grappling with how to address the unique challenges posed by the digital asset space, including the influence of high-profile individuals like Musk.Elon Musk's Vision for the Future of Cryptocurrency.Despite the complexities and controversies surrounding Elon Musk\'s involvement with cryptocurrency, his vision for the future of digital assets and blockchain technology remains of interest.Musk has expressed support for the concept of decentralized finance (DeFi), which leverages blockchain technology to create open and permissionless financial systems. DeFi platforms enable activities like lending, borrowing, trading, and earning interest without the need for traditional financial intermediaries.Furthermore, Musk's involvement with SpaceX and his aspirations for Mars colonization have raised questions about the role of cryptocurrencies in space exploration. Some have speculated that cryptocurrencies could become the primary means of conducting financial transactions and economic activities on future Martian colonies.Additionally, Musk\'s advocacy for renewable energy solutions aligns with the broader trend of sustainable blockchain technologies. Several cryptocurrencies, like Ethereum, are transitioning from proof-of-work to proof-of-stake consensus mechanisms, which are more energy-efficient and environmentally friendly. Musk's influence and financial resources could potentially contribute to the development and adoption of greener blockchain solutions.Conclusion: The Ongoing Saga of Elon Musk and Bitcoin.The relationship between Elon Musk and Bitcoin has been a rollercoaster ride, characterized by enthusiasm, controversy, and unpredictability. 
Musk\'s tweets have demonstrated the power of celebrity influence in the world of cryptocurrency, shaping market sentiment and investor behavior.While Musk\'s statements and actions have generated significant volatility in cryptocurrency markets, they have also drawn attention to important issues within the industry, such as environmental sustainability and responsible innovation. His involvement has forced a broader discussion about the role of cryptocurrencies in the global economy and their potential to reshape traditional financial systems.As cryptocurrency continues to evolve and mature, it remains to be seen how Musk\'s influence and vision will intersect with this rapidly changing landscape. Whether through his investments, innovations, or advocacy for sustainability, Elon Musk will likely continue to be a central figure in the ongoing narrative of cryptocurrencies and their place in the digital age."

final_questions = get_mca_questions(text_2)
for q in final_questions:
    print(q)

What is the name of Musk's venture?
(a)Tesla
(b)elon musk
(c)Richard branson
(d)Jeff bezos
Correct answer is : (b)


The narrative explores the multifaceted relationship between elon musk and what?
(a)bitcoin
(b)Fiat
(c)Coinbase
(d)Cryptocurrency
Correct answer is : (a)


What is the name of elon musk?
(a)Elon musk
(b)Gates
(c)Bezos
(d)musk
Correct answer is : (d)


Along with spacex, neuralink and boring company, what is a notable venture of Elon Musk?
(a)Tesla motors
(b)tesla
(c)Prius
(d)Nissan
Correct answer is : (b)


What is elon musk known for?
(a)tweets
(b)Twitter
(c)Reddit posts
(d)Facebook posts
Correct answer is : (a)


What aspect of bitcoin is elon musk known for?
(a)price
(b)Ridiculous price
(c)Market
(d)Price increase
Correct answer is : (a)


What type of investment did elon musk make?
(a)elon musk
(b)bitcoin
(c)musk
(d)bitcoin investment
Correct answer is : (d)


Along with tesla and neuralink, what is a notable venture of Elon Musk?
(a)Ula
(b)spacex
(c)Boeing
(d)Virgin