## 1- Using AllenNLP in Python

The code below is taken from the AllenNLP demo site. The site covers more tasks; we will examine only three of them.

This source works better:
https://demo.allennlp.org (check the tab named "Usage" in each demo)

This one sometimes does not work:
https://docs.allennlp.org/v1.0.0rc3/tutorials/getting_started/using_pretrained_models/


### a- Reading Comprehension (Question Answering)

In [1]:
!pip install allennlp allennlp_models

Collecting allennlp
  Downloading https://files.pythonhosted.org/packages/72/f5/f4dd3424b3ae9dec0a55ae7f7f34ada3ee60e4b10a187d2ba7384c698e09/allennlp-1.3.0-py3-none-any.whl (506kB)
Collecting allennlp_models
  Downloading https://files.pythonhosted.org/packages/5e/e8/3eab9b645a1bd4abac892229952572cd8df5b5de3bd316774f845cbd10f1/allennlp_models-1.3.0-py3-none-any.whl (378kB)
Collecting tensorboardX>=1.2
  Downloading https://files.pythonhosted.org/packages/af/0c/4f41bcd45db376e6fe5c619c01100e9b7531c55791b7244815bac6eac32c/tensorboardX-2.1-py2.py3-none-any.whl (308kB)
Collecting jsonpickle
  Downloading https://files.pythonhosted.org/packages/ee/d5/1cc282dc23346a43aab461bf2e8c36593aacd34242bee1a13fa750db0cfe/jsonpickle-1.4.2-py2.py3-none-any.whl
Collecting jsonnet>=0.10.0; sys_platform != "wi

In [2]:
from allennlp.predictors.predictor import Predictor
import allennlp_models.rc

In [3]:
qa_model = Predictor.from_path("https://storage.googleapis.com/allennlp-public-models/bidaf-elmo-model-2020.03.19.tar.gz")

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Unzipping corpora/wordnet.zip.


Plugin allennlp_models could not be loaded: No module named 'nltk.translate.meteor_score'
downloading: 100%|##########| 418130723/418130723 [00:07<00:00, 57778103.11B/s]
downloading: 100%|##########| 336/336 [00:00<00:00, 93896.07B/s]
downloading: 100%|##########| 374434792/374434792 [00:05<00:00, 72255837.53B/s]


In [4]:
results = qa_model.predict(
  passage="The Matrix is a 1999 science fiction action film written and directed by The Wachowskis, starring Keanu Reeves, Laurence Fishburne, Carrie-Anne Moss, Hugo Weaving, and Joe Pantoliano.",
  question="Who stars in The Matrix?")

In [5]:
print(' '.join(results['passage_tokens']), ' '.join(results['question_tokens']), results['best_span_str'], sep = '\n\n')

The Matrix is a 1999 science fiction action film written and directed by The Wachowskis , starring Keanu Reeves , Laurence Fishburne , Carrie - Anne Moss , Hugo Weaving , and Joe Pantoliano .

Who stars in The Matrix ?

Keanu Reeves, Laurence Fishburne, Carrie-Anne Moss, Hugo Weaving, and Joe Pantoliano


### b- Named Entity Recognition (Tagging, Entity Extraction)

In [6]:
import allennlp_models.tagging

In [7]:
ner_model = Predictor.from_path("https://storage.googleapis.com/allennlp-public-models/ner-model-2020.02.10.tar.gz")

ERROR:allennlp.common.plugins:Plugin allennlp_models could not be loaded: No module named 'nltk.translate.meteor_score'
downloading: 100%|##########| 711853497/711853497 [00:10<00:00, 66611343.39B/s]


In [8]:
results = ner_model.predict(
  sentence="Did Uriah honestly think he could beat The Legend of Zelda in under three hours?")

In [11]:
results.keys()

dict_keys(['logits', 'mask', 'tags', 'words'])

In [9]:
for word, tag in zip(results["words"], results["tags"]):
    print(f"{word}\t{tag}")

Did	O
Uriah	U-PER
honestly	O
think	O
he	O
could	O
beat	O
The	B-MISC
Legend	I-MISC
of	I-MISC
Zelda	L-MISC
in	O
under	O
three	O
hours	O
?	O
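
The tags above follow the BILOU scheme (`B`egin, `I`nside, `L`ast, `O`utside, `U`nit), so multi-word entities such as "The Legend of Zelda" are spread over several rows. The following is a minimal sketch of how the per-token tags could be merged into whole entities. The helper name `group_bilou` is hypothetical and not part of AllenNLP, and the word and tag lists are copied from the output above rather than produced by the model.

```python
def group_bilou(words, tags):
    """Collapse BILOU tags into (entity_text, entity_type) pairs."""
    entities, current = [], []
    for word, tag in zip(words, tags):
        if tag == "O":
            # Outside any entity: discard an unfinished span, if any.
            current = []
            continue
        prefix, label = tag.split("-", 1)
        if prefix == "U":
            # Single-token entity.
            entities.append((word, label))
            current = []
        elif prefix == "B":
            # First token of a multi-token entity.
            current = [word]
        elif prefix == "I":
            current.append(word)
        elif prefix == "L":
            # Last token: close the span and emit it.
            current.append(word)
            entities.append((" ".join(current), label))
            current = []
    return entities


# Words and tags copied from the NER output above.
words = ["Did", "Uriah", "honestly", "think", "he", "could", "beat",
         "The", "Legend", "of", "Zelda", "in", "under", "three", "hours", "?"]
tags = ["O", "U-PER", "O", "O", "O", "O", "O",
        "B-MISC", "I-MISC", "I-MISC", "L-MISC", "O", "O", "O", "O", "O"]

entities = group_bilou(words, tags)
print(entities)
```

In the notebook, the same helper could be called as `group_bilou(results["words"], results["tags"])`.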


### c- Sentiment Analysis

In [12]:
import allennlp_models.classification

In [13]:
sentiment_classifier = Predictor.from_path("https://storage.googleapis.com/allennlp-public-models/basic_stanford_sentiment_treebank-2020.06.09.tar.gz")

ERROR:allennlp.common.plugins:Plugin allennlp_models could not be loaded: No module named 'nltk.translate.meteor_score'
downloading: 100%|##########| 37033341/37033341 [00:00<00:00, 41165720.57B/s]


In [22]:
# Review of the movie "Joker" on Rotten Tomatoes
review = "A movie of a cynicism so vast and pervasive as to \
          render the viewing experience even emptier than its \
          slapdash aesthetic does."

In [23]:
result = sentiment_classifier.predict(sentence=review)

In [24]:
# Report the predicted label together with its probability
if result['label'] == '1':
    print('Positive with confidence of', result['probs'][0])
else:
    print('Negative with confidence of', result['probs'][1])

Negative with confidence of 0.8122999668121338


## 2- Using HuggingFace in Python

If you have either PyTorch >= 1.1 or TensorFlow >= 2.0 installed, run the following command to install the HuggingFace Transformers library:

```pip install transformers```

### Using "Pipeline" for Sentiment Analysis

In [16]:
!pip install transformers



In [17]:
from transformers import pipeline

print(pipeline('sentiment-analysis')('we love you'))

[{'label': 'POSITIVE', 'score': 0.9998704791069031}]


### Using "Pipeline" for Question Answering

In [18]:
nlp = pipeline("question-answering")

context = r"""
Extractive Question Answering is the task of extracting an answer from a text given a question. An example of a
question answering dataset is the SQuAD dataset, which is entirely based on that task. If you would like to fine-tune
a model on a SQuAD task, you may leverage the examples/question-answering/run_squad.py script.
"""

In [19]:
result = nlp(question="What is extractive question answering?", context=context)
print(f"Answer: '{result['answer']}', score: {round(result['score'], 4)}, start: {result['start']}, end: {result['end']}")

result = nlp(question="What is a good example of a question answering dataset?", context=context)
print(f"Answer: '{result['answer']}', score: {round(result['score'], 4)}, start: {result['start']}, end: {result['end']}")

Answer: 'the task of extracting an answer from a text given a question', score: 0.6226, start: 34, end: 95
Answer: 'SQuAD dataset', score: 0.5053, start: 147, end: 160
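
The `start` and `end` values in the output are character offsets into `context`, so slicing the context with them should give back the answer text. A small sketch that checks this without loading any model: the context is the one defined above, and the `predictions` list (a name introduced here for illustration) is copied by hand from the printed output.

```python
context = r"""
Extractive Question Answering is the task of extracting an answer from a text given a question. An example of a
question answering dataset is the SQuAD dataset, which is entirely based on that task. If you would like to fine-tune
a model on a SQuAD task, you may leverage the examples/question-answering/run_squad.py script.
"""

# (answer, start, end) triples copied from the pipeline output above.
predictions = [
    ("the task of extracting an answer from a text given a question", 34, 95),
    ("SQuAD dataset", 147, 160),
]

for answer, start, end in predictions:
    # Slice the context with the reported offsets and compare to the answer.
    span = context[start:end]
    print(repr(span), span == answer)
```

The offsets are handy when the answer needs to be highlighted in the original text, since they count from the first character of `context`, including its leading newline.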


### Text Generation with GPT-2

This example requires TensorFlow >= 2.1 to be installed on the system.

In [33]:
import tensorflow as tf
from transformers import TFGPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

# add the EOS token as PAD token to avoid warnings
model = TFGPT2LMHeadModel.from_pretrained("gpt2", pad_token_id=tokenizer.eos_token_id)

All model checkpoint layers were used when initializing TFGPT2LMHeadModel.

All the layers of TFGPT2LMHeadModel were initialized from the model checkpoint at gpt2.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFGPT2LMHeadModel for predictions without further training.


In [34]:
# encode context the generation is conditioned on
input_ids = tokenizer.encode('I enjoy walking with my cute dog', return_tensors='tf')

In [35]:
beam_output = model.generate(
    input_ids, 
    max_length=50, 
    num_beams=5, 
    no_repeat_ngram_size=2, 
    early_stopping=True
)

print("Output:\n" + 100 * '-')
print(tokenizer.decode(beam_output[0], skip_special_tokens=False))

Output:
----------------------------------------------------------------------------------------------------
I enjoy walking with my cute dog, but I'm not sure if I'll ever be able to walk with him again.

I've been thinking about this for a while now, and I think it's time for me to take a break
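
The `num_beams=5` argument above switches generation from greedy decoding to beam search, which keeps several partial sequences alive at each step instead of only the single most likely one. The sketch below illustrates the idea on a toy next-token table. The table `NEXT_TOKEN_PROBS`, its probabilities, and the function `beam_search` are all hypothetical and unrelated to GPT-2 or the `transformers` implementation, and the sketch leaves out `no_repeat_ngram_size` and `early_stopping`.

```python
import math

# Hypothetical next-token probabilities (made up for illustration).
NEXT_TOKEN_PROBS = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"dog": 0.55, "cat": 0.45},
    "a": {"dog": 0.9, "cat": 0.1},
    "dog": {"</s>": 1.0},
    "cat": {"</s>": 1.0},
}


def beam_search(num_beams, max_steps):
    """Return the highest-scoring token sequence found with `num_beams` beams."""
    beams = [(0.0, ["<s>"])]  # (log-probability, tokens so far)
    for _ in range(max_steps):
        candidates = []
        for score, seq in beams:
            if seq[-1] == "</s>":
                # Finished sequences are carried over unchanged.
                candidates.append((score, seq))
                continue
            for token, prob in NEXT_TOKEN_PROBS[seq[-1]].items():
                candidates.append((score + math.log(prob), seq + [token]))
        # Keep only the `num_beams` best candidates.
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:num_beams]
    return beams[0][1]


greedy = beam_search(num_beams=1, max_steps=3)
beamed = beam_search(num_beams=2, max_steps=3)
print("greedy:", " ".join(greedy))
print("beam  :", " ".join(beamed))
```

With one beam the search commits to "the" (0.6) and ends with a sequence of probability 0.33, while two beams also keep "a" alive and find "a dog" with probability 0.36.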


### Conversational Agent

In [28]:
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-large")
model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-large")

# Let's chat for 5 lines
for step in range(5):
    # encode the new user input, add the eos_token and return a PyTorch tensor
    new_user_input_ids = tokenizer.encode(input(">> User:") + tokenizer.eos_token, return_tensors='pt')

    # append the new user input tokens to the chat history
    bot_input_ids = torch.cat([chat_history_ids, new_user_input_ids], dim=-1) if step > 0 else new_user_input_ids

    # generate a response while limiting the total chat history to 1000 tokens
    chat_history_ids = model.generate(bot_input_ids, max_length=1000, pad_token_id=tokenizer.eos_token_id)

    # pretty-print the last output tokens from the bot
    print("DialoGPT: {}".format(tokenizer.decode(chat_history_ids[:, bot_input_ids.shape[-1]:][0], skip_special_tokens=True)))

KeyboardInterrupt: ignored
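
The loop above relies on two bookkeeping steps: each turn is appended to the running chat history, and the bot's reply is recovered by slicing off everything that was already in the input. The sketch below mirrors that logic with plain Python lists so it runs without downloading the model. The token IDs, the `EOS` value, and `fake_generate` are hypothetical stand-ins for the tokenizer output and `model.generate`.

```python
EOS = 0  # hypothetical end-of-sequence token ID


def fake_generate(input_ids):
    """Stand-in for model.generate: echoes the input and appends a canned reply."""
    reply = [len(input_ids), EOS]  # made-up reply tokens
    return input_ids + reply


chat_history = []
replies = []
for user_tokens in ([11, 12], [21, 22, 23]):
    # Append the new user turn (terminated by EOS) to the history so far.
    bot_input = chat_history + user_tokens + [EOS]

    # The "model" returns the full sequence: input plus newly generated tokens.
    chat_history = fake_generate(bot_input)

    # Everything after the input length is the reply for this turn.
    replies.append(chat_history[len(bot_input):])

print(replies)
print(len(chat_history))
```

Because the history grows with every turn, the real loop caps it with `max_length=1000`.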