<a href="https://colab.research.google.com/github/adi-sharma707/Prepquest/blob/main/PrepQuest.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Installation of libraries

In [None]:
!pip install --quiet flashtext==2.7
!pip install git+https://github.com/boudinfl/pke.git

In [None]:
!pip install --quiet transformers==4.8.1
!pip install --quiet sentencepiece==0.1.95
!pip install --quiet textwrap3==0.9.2
!pip install --quiet gradio==3.0.20

In [None]:
!pip install --quiet strsim==0.0.3
!pip install --quiet sense2vec==2.0.0


In [None]:
!pip install --quiet ipython-autotime
%load_ext autotime

time: 375 µs (started: 2023-11-29 05:08:29 +00:00)


In [None]:
!pip install sentence-transformers==2.2.2

In [None]:
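import os
# Kill the current kernel process (signal 9). Colab then restarts the runtime,
# presumably so that the packages installed above are picked up on import.
os.kill(os.getpid(), 9)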

In [None]:
from textwrap3 import wrap

text = """An external force is a force originating from outside an object rather than a force internal to an object. For instance, the force of gravity that Earth exerts on the moon is an external force on the moon. However, the force of gravity that the inner core of the moon exerts on the outer crust of the moon is an internal force on the moon. Internal forces within an object can't cause a change in that object's overall motion."""

for wrp in wrap(text, 150):
  print(wrp)
print("\n")

An external force is a force originating from outside an object rather than a force internal to an object. For instance, the force of gravity that
Earth exerts on the moon is an external force on the moon. However, the force of gravity that the inner core of the moon exerts on the outer crust of
the moon is an internal force on the moon. Internal forces within an object can't cause a change in that object's overall motion.




# **Summarization with T5**

In [None]:
import torch
from transformers import T5ForConditionalGeneration,T5Tokenizer
summary_model = T5ForConditionalGeneration.from_pretrained('t5-base')
summary_tokenizer = T5Tokenizer.from_pretrained('t5-base')

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
summary_model = summary_model.to(device)


In [None]:
import random
import numpy as np

def set_seed(seed: int):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

set_seed(42)

In [None]:
import nltk
import pke
nltk.download('punkt')
nltk.download('brown')
nltk.download('wordnet')
from nltk.corpus import wordnet as wn
from nltk.tokenize import sent_tokenize

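def postprocesstext(content):
  # Note: str.capitalize() upper-cases the first character of each sentence
  # and lower-cases the rest, so proper nouns lose their capitals.
  final = ""
  for sent in sent_tokenize(content):
    sent = sent.capitalize()
    final = final + " " + sent
  return final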


def summarizer(text, model, tokenizer):
  # T5 expects a task prefix; newlines are flattened to spaces first.
  text = text.strip().replace("\n", " ")
  text = "summarize: " + text
  max_len = 512
  # No padding is needed for a single input; inputs longer than max_len tokens are truncated.
  encoding = tokenizer.encode_plus(text, max_length=max_len, padding=False, truncation=True, return_tensors="pt").to(device)

  input_ids, attention_mask = encoding["input_ids"], encoding["attention_mask"]

  # Beam search with 3 beams. min_length=75 forces at least 75 output tokens,
  # even when the input passage is short.
  outs = model.generate(input_ids=input_ids,
                        attention_mask=attention_mask,
                        early_stopping=True,
                        num_beams=3,
                        num_return_sequences=1,
                        no_repeat_ngram_size=2,
                        min_length=75,
                        max_length=300)

  dec = [tokenizer.decode(ids, skip_special_tokens=True) for ids in outs]
  summary = dec[0]
  summary = postprocesstext(summary)
  summary = summary.strip()

  return summary


summarized_text = summarizer(text,summary_model,summary_tokenizer)


print("\noriginal Text >>")
for wrp in wrap(text, 150):
  print(wrp)
print("\n")
print("Summarized Text >>")
for wrp in wrap(summarized_text, 150):
  print(wrp)
print("\n")

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package brown to /root/nltk_data...
[nltk_data]   Unzipping corpora/brown.zip.
[nltk_data] Downloading package wordnet to /root/nltk_data...



original Text >>
An external force is a force originating from outside an object rather than a force internal to an object. For instance, the force of gravity that
Earth exerts on the moon is an external force on the moon. However, the force of gravity that the inner core of the moon exerts on the outer crust of
the moon is an internal force on the moon. Internal forces within an object can't cause a change in that object's overall motion.


Summarized Text >>
The force of gravity that earth exerts on the moon is an external force. The internal force within an object can't cause a change in that object's
overall motion, says sanjay gupta, dr. joe schmidt and edward mccartney jnr, respectively.
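
The summary above shows "earth" and the personal names in lower case because `str.capitalize()` upper-cases the first character of a string and lower-cases everything after it. Below is a minimal sketch of an alternative, assuming only the first letter of each sentence should change. `sentence_case` is a hypothetical helper that is not part of the notebook, and the regex split is a naive stand-in for `sent_tokenize`.

```python
import re

def sentence_case(content: str) -> str:
    """Upper-case the first character of each sentence and leave the rest untouched."""
    # Naive split after '.', '!' or '?' followed by whitespace (stand-in for sent_tokenize).
    sentences = re.split(r"(?<=[.!?])\s+", content.strip())
    return " ".join(s[:1].upper() + s[1:] for s in sentences if s)

sample = "the force that Earth exerts on the Moon is external. internal forces cancel."
print(sample.capitalize())    # lower-cases "Earth" and "Moon"
print(sentence_case(sample))  # keeps the existing capitals
```

Swapping this in for `sent.capitalize()` would preserve proper nouns, at the cost of leaving any odd casing produced by the model untouched.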




# **Answer Span Extraction (Keywords and Noun Phrases)**

In [None]:
import nltk
nltk.download('stopwords')
from nltk.corpus import stopwords
import string

import traceback

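def get_nouns_multipartite(content):
    """Return up to 15 noun / proper-noun keyphrases ranked by MultipartiteRank."""
    out = []
    try:
        # Candidates should not contain punctuation marks or stopwords.
        stoplist = list(string.punctuation)
        stoplist += ['-lrb-', '-rrb-', '-lcb-', '-rcb-', '-lsb-', '-rsb-']
        stoplist += stopwords.words('english')

        extractor = pke.unsupervised.MultipartiteRank()
        # Assumption: pke 2.x (installed from GitHub above) takes the stoplist in load_document.
        extractor.load_document(input=content, language='en', stoplist=stoplist)

        # Keep only nouns and proper nouns as candidates.
        pos = {'PROPN', 'NOUN'}
        extractor.candidate_selection(pos=pos)
        extractor.candidate_weighting(alpha=1.1,
                                      threshold=0.75,
                                      method='average')
        keyphrases = extractor.get_n_best(n=15)

        # Each entry is a (phrase, score) pair; keep the phrase only.
        for val in keyphrases:
            out.append(val[0])
    except Exception:
        out = []
        traceback.print_exc()

    return out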

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.


In [None]:
import spacy

# Load the small English pipeline once and check that POS tagging works.
nlp = spacy.load("en_core_web_sm")
doc = nlp("This is a sentence.")
print([(w.text, w.pos_) for w in doc])

[('This', 'PRON'), ('is', 'AUX'), ('a', 'DET'), ('sentence', 'NOUN'), ('.', 'PUNCT')]


In [None]:
from flashtext import KeywordProcessor


def get_keywords(originaltext, text):
  # Rank keyphrases on the original passage.
  keywords = get_nouns_multipartite(originaltext)
  print("keywords unsummarized: ", keywords)
  keyword_processor = KeywordProcessor()
  for keyword in keywords:
    keyword_processor.add_keyword(keyword)

  # Find which of those keyphrases also occur in the summary.
  keywords_found = keyword_processor.extract_keywords(text)
  keywords_found = list(set(keywords_found))
  print("keywords_found in summarized: ", keywords_found)

  # Preserve the original ranking order and keep the top four.
  important_keywords = []
  for keyword in keywords:
    if keyword in keywords_found:
      important_keywords.append(keyword)

  return important_keywords[:4]


imp_keywords = get_keywords(text, summarized_text)
print(imp_keywords)


keywords unsummarized:  ['force', 'moon', 'object', 'gravity', 'earth exerts', 'instance', 'core', 'crust', 'change', 'moon exerts', 'motion']
keywords_found in summarized:  ['force', 'moon', 'object', 'earth exerts', 'change', 'gravity', 'motion']
['force', 'moon', 'object', 'gravity']
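
The filtering step in `get_keywords` keeps the keyphrases in their ranked order, retains only those that also occur in the summary, and returns the top four. A minimal standard-library sketch of that logic, using the two lists printed above, follows. `filter_ranked` is a hypothetical name, and a plain set lookup stands in for the `KeywordProcessor` matching.

```python
def filter_ranked(ranked, found, limit=4):
    """Return up to `limit` items of `ranked` that also appear in `found`, in ranked order."""
    found_set = set(found)
    return [kw for kw in ranked if kw in found_set][:limit]

ranked = ['force', 'moon', 'object', 'gravity', 'earth exerts', 'instance',
          'core', 'crust', 'change', 'moon exerts', 'motion']
found = ['force', 'moon', 'object', 'earth exerts', 'change', 'gravity', 'motion']

print(filter_ranked(ranked, found))
```

Iterating over `ranked` rather than `found` matters here: `found` has passed through `set()`, so its order is arbitrary, while `ranked` carries the MultipartiteRank ordering.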


# **Question generation with T5**

In [None]:
question_model = T5ForConditionalGeneration.from_pretrained('ramsrigouthamg/t5_squad_v1')
question_tokenizer = T5Tokenizer.from_pretrained('ramsrigouthamg/t5_squad_v1')
question_model = question_model.to(device)


In [None]:
def get_question(context, answer, model, tokenizer):
  # The model is prompted with the passage and the target answer span.
  text = "context: {} answer: {}".format(context, answer)
  encoding = tokenizer.encode_plus(text, max_length=384, padding=False, truncation=True, return_tensors="pt").to(device)
  input_ids, attention_mask = encoding["input_ids"], encoding["attention_mask"]

  outs = model.generate(input_ids=input_ids,
                        attention_mask=attention_mask,
                        early_stopping=True,
                        num_beams=5,
                        num_return_sequences=1,
                        no_repeat_ngram_size=2,
                        max_length=72)

  dec = [tokenizer.decode(ids, skip_special_tokens=True) for ids in outs]

  # Strip the "question:" label that the model emits before the question text.
  question = dec[0].replace("question:", "")
  question = question.strip()
  return question



for wrp in wrap(text, 150):
  print(wrp)
print("\n")

for answer in imp_keywords:
  ques = get_question(text, answer, question_model, question_tokenizer)
  print(ques)
  print(answer.capitalize())
  print("\n")


The newton’s law of universal gravitation shows us that the gravitational force (F) of attraction between two substance let their masses be (m1) and
(m2), separated by the distance (r). Kepler’s laws of planetary motion state that:  (a) All planets move in elliptical orbits with the Sun at one of
the focal points  (b) The radius vector drawn from the Sun to a planet sweeps out equal areas in equal time intervals. This follows from the fact that
the force of gravitation on the planet is central and hence angular momentum is conserved.  (c) The square of the orbital period of a planet is
proportional to the cube of the semi-major axis of the elliptical orbit of the planet


What move in elliptical orbits with the Sun at one of the focal points?
Planets


What force is central on a planet?
Gravitation


What is gravitational force?
Force


Kepler's laws state that all planets move in elliptical orbits with what at one of the focal points?
Sun
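
`get_question` wraps the passage and the target answer in a `context: ... answer: ...` prompt and strips the `question:` label from the decoded output. Here is a small sketch of just that string handling, with no model involved. `build_prompt` and `clean_question` are hypothetical helper names, and the sample decoded string is made up for illustration rather than taken from a model run.

```python
def build_prompt(context: str, answer: str) -> str:
    """Format the input the question-generation model is given."""
    return "context: {} answer: {}".format(context, answer)

def clean_question(decoded: str) -> str:
    """Remove the 'question:' label and surrounding whitespace from decoded output."""
    return decoded.replace("question:", "").strip()

prompt = build_prompt("All planets move in elliptical orbits.", "planets")
print(prompt)

# Illustrative decoded string (an assumption, not real model output).
print(clean_question("question: What moves in elliptical orbits?"))
```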


time: 1.93 s (started: 2023-05-16 07:39:57 +