In [1]:
!pip install transformers
!pip install sentencepiece

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting sentencepiece
  Downloading sentencepiece-0.1.97-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)
Installing collected packages: sentencepiece
Successfully installed sentencepiece-0.1.97


In [2]:
from transformers import T5ForConditionalGeneration, AutoTokenizer

import spacy
nlp = spacy.load("en_core_web_sm")   # small English pipeline, chosen for efficiency; works well for this project

import re

In [3]:
MODEL_NAME = 'mrm8488/t5-base-finetuned-question-generation-ap'

# load the model and tokenizer once, rather than on every call to get_question
mdl = T5ForConditionalGeneration.from_pretrained(MODEL_NAME)
tknizer = AutoTokenizer.from_pretrained(MODEL_NAME)

def get_question(sentence, answer):
    """
    Generates a question for the sentence based on a given answer.
    """

    text = "context: {} answer: {}".format(sentence, answer)
    max_len = 256
    # `padding=False` replaces the deprecated `pad_to_max_length=False`
    encoding = tknizer(text,
                       max_length=max_len,
                       padding=False,
                       truncation=True,
                       return_tensors="pt")

    input_ids, attention_mask = encoding["input_ids"], encoding["attention_mask"]

    outs = mdl.generate(input_ids=input_ids,
                        attention_mask=attention_mask,
                        early_stopping=True,
                        num_beams=5,
                        num_return_sequences=1,
                        no_repeat_ngram_size=2,
                        max_length=300)

    dec = [tknizer.decode(ids, skip_special_tokens=True) for ids in outs]

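    question = dec[0].replace("question:", "").strip()

    # rephrase in the second person; \b keeps the swaps from touching
    # letters inside other words (e.g. the "I" in "Is", the "my" in "army")
    question = re.sub(r"\bI am\b", "you are", question)
    question = re.sub(r"\bI\b", "you", question)
    question = re.sub(r"\bmy\b", "your", question)

    return question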
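
To see what the final substitutions do on their own, here is a small standalone sketch of the first-person to second-person rewrite. The name `to_second_person` and the second sample string are made up for illustration. The sketch uses `\b` word boundaries, since a bare `I` or `my` pattern would also match inside words such as "Is" or "army".

```python
import re

def to_second_person(text):
    """Swap first-person words for second-person ones (illustrative helper)."""
    # order matters: handle "I am" before the bare "I"
    text = re.sub(r"\bI am\b", "you are", text)
    text = re.sub(r"\bI\b", "you", text)
    text = re.sub(r"\bmy\b", "your", text)
    return text

print(to_second_person("What kind of cloud do I feel like I am living under?"))
print(to_second_person("Is my army ready?"))  # "Is" and "army" are left alone
```

Only whole-word matches are rewritten, so the rest of the generated question passes through unchanged.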

> <span style="color:LightSalmon"> <span style='font-family:sans-serif'> <span style="font-size: 18px"> First pass: we provide no answer (an empty list, which the prompt template renders as `[]`) and let the model come up with its own question:

In [4]:
# ADJ is present before noun
sentence = "I feel like I am living under a dark, heavy cloud!!!"
get_question(sentence, [])

The `xla_device` argument has been deprecated in v4.4.0 of Transformers. It is ignored and you can safely remove it from your `config.json` file.

'What kind of cloud do you feel like you are living under?'
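
For reference, the exact text that reaches the model can be reproduced without loading anything. This is a minimal sketch that reuses the format string from `get_question`; the helper name `build_prompt` is hypothetical.

```python
def build_prompt(sentence, answer):
    """Format the model input with the same template get_question uses."""
    return "context: {} answer: {}".format(sentence, answer)

sentence = "I feel like I am living under a dark, heavy cloud!!!"
print(build_prompt(sentence, []))      # the "no answer" case: the empty list is rendered as []
print(build_prompt(sentence, "dark"))  # an explicit answer
```

So "no answer" here means the answer slot holds the literal text `[]`, not an empty string.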

In [5]:
# ADJs are present, but not directly before "forest" or "way"; they appear in the later clauses
sentence = "I am trying to hack my way through a forest; the way is unclear, and the forest is dense."
get_question(sentence, [])

'How do you hack your way through a forest?'

In [6]:
# two metaphors
sentence = "I am trying to hack my way through a dense forest, and also feel like I am living under a dark cloud!"
get_question(sentence, [])

'What is the feeling of living under a dark cloud?'

In [7]:
# No ADJ
sentence = "I am trying to hack my way through a forest"
get_question(sentence, [])

'How do you hack your way through a forest?'

In [8]:
# No ADJ
sentence = "in my heart, that felt like a dagger"
get_question(sentence, [])

'What felt like a dagger?'

> <span style="color:LightSalmon"> <span style='font-family:sans-serif'> <span style="font-size: 18px"> These `t5`-generated questions look pretty good!

> <span style="color:LightSalmon"> <span style='font-family:sans-serif'> <span style="font-size: 18px"> We can also try supplying an "answer" as model input:

>> <span style="color:LightSalmon"> <span style='font-family:sans-serif'> <span style="font-size: 16px"> After testing different parts of speech as the answer, I found that targeting the ADJ in the utterance makes the model generate questions that closely resemble clean-language questions. So picking ADJs is our first way of providing an answer to the model.
    
>> <span style="color:LightSalmon"> <span style='font-family:sans-serif'> <span style="font-size: 16px"> If there is no ADJ, we can use "pobj" (object of a preposition) as the answer.
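
The selection rule described above can be sketched in plain Python. The token triples below are hand-written for illustration and are not spaCy output; the name `pick_answer` is hypothetical, and treating any dependency label that contains `obj` as an object is an assumption of this sketch.

```python
# Hand-written (text, POS, dependency) triples for illustration only;
# in the notebook these labels would come from spaCy.
TOKENS_WITH_ADJ = [
    ("way", "NOUN", "dobj"),
    ("dense", "ADJ", "amod"),
    ("forest", "NOUN", "pobj"),
]
TOKENS_WITHOUT_ADJ = [
    ("way", "NOUN", "dobj"),
    ("forest", "NOUN", "pobj"),
]

def pick_answer(tokens):
    """Prefer the first ADJ; otherwise fall back to up to two objects."""
    adjs = [text for text, pos, dep in tokens if pos == "ADJ"]
    if adjs:
        return "adj", adjs[0]
    objs = [text for text, pos, dep in tokens if "obj" in dep]
    return "obj", objs[:2]

print(pick_answer(TOKENS_WITH_ADJ))
print(pick_answer(TOKENS_WITHOUT_ADJ))
```

Returning the kind of answer alongside the answer itself lets the caller decide whether to query the model or use a fixed template.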

In [9]:
sentence = "I feel like I am living under a dark, heavy cloud!!!"
answer = "dark"   # ADJ

get_question(sentence, answer)

'What kind of cloud do you feel like you are living under?'

In [10]:
sentence = "I am trying to hack my way through a dense forest"
answer = "dense"   # ADJ

get_question(sentence, answer)

'How dense is the forest?'

In [11]:
sentence = "I am trying to hack my way through a forest"
answer = "forest"   # POBJ

get_question(sentence, answer)

'you are trying to hack your way through a what?'

> <span style="color:LightSalmon"> <span style='font-family:sans-serif'> <span style="font-size: 18px"> The last question is not quite what we want as a clean-language question, so I added some basic text templating for the case where a POBJ is the answer:

In [12]:
# custom fn: get objects (dobj, pobj) from text as a list
def extract_obj(text):
    doc = nlp(text)
    obj = []

    for token in doc:
        if 'obj' in token.dep_:  # dependency label contains 'obj' --> dobj (direct object), pobj (object of a preposition)
            obj.append(token.text)

    return list(dict.fromkeys(obj))   # unique objects, in order of first appearance

In [13]:
# custom fn: get ADJs from text as a list
def extract_adj(text):
    doc = nlp(text)
    adj = []

    for token in doc:
        if token.pos_ == 'ADJ':
            adj.append(token.text)

    return list(dict.fromkeys(adj))   # unique ADJs, in order of first appearance
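
Both helpers return unique tokens. When the order of those tokens matters downstream (for example, when only the first one or two are used), de-duplicating with `dict.fromkeys` keeps first-appearance order, whereas a plain `set` makes no ordering promise. A minimal standalone sketch; `unique_in_order` is a hypothetical name:

```python
def unique_in_order(items):
    """Drop duplicates while keeping the order of first appearance."""
    return list(dict.fromkeys(items))

words = ["way", "forest", "way"]
print(unique_in_order(words))   # order of first appearance is kept
print(sorted(set(words)))       # a set has no guaranteed order, so it is sorted before printing
```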

In [14]:
def get_question_full():
    # note: reads the notebook-level `sentence` variable
    adjs = extract_adj(sentence)
    if len(adjs) == 0:                            # if no ADJ
        objs = extract_obj(sentence)
        answer = " and the ".join(objs[:2])       # for now, if more than two objects exist in the utterance, we only use the first two
        print(f'Can you describe/tell me more about the {answer}?')
    else:
        answer = adjs[0]                          # for now, if more than one ADJ, we only use the first one
        print(get_question(sentence, answer))

In [15]:
sentence = "I am trying to hack my way through a forest"
get_question_full()

Can you describe/tell me more about the way and the forest?


In [16]:
sentence = "in my heart, that felt like a dagger"
get_question_full()

Can you describe/tell me more about the heart and the dagger?
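
The fallback template used in the no-ADJ case can be isolated as a small function, which makes the behaviour for one, two, or more objects easy to check. This is a sketch only: `template_question` and its `limit` parameter are hypothetical, and the wording for the empty case is an assumption that is not part of the notebook.

```python
def template_question(objects, limit=2):
    """Build the fallback question from at most `limit` objects."""
    chosen = objects[:limit]
    if not chosen:
        # assumed wording for an utterance with no objects at all
        return "Can you tell me more about that?"
    joined = " and the ".join(chosen)
    return f"Can you describe/tell me more about the {joined}?"

print(template_question(["way", "forest"]))
print(template_question(["forest"]))
print(template_question(["heart", "dagger", "chest"]))  # only the first two are used
```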
