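In this notebook we explore the pretrained models of ParlAI's Wizard of Wikipedia project: the knowledge retriever, the end-to-end generator, the full-dialogue retrieval model, and the TF-IDF Wikipedia retriever.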

In [2]:
import yaml
from typing import Dict

from parlai.core.agents import create_agent
from parlai.agents.image_seq2seq.image_seq2seq import ImageSeq2seqAgent
from parlai.scripts.interactive import setup_args

In [3]:
models = {
    'end2end': 'zoo:wizard_of_wikipedia/end2end_generator/model',
    'full_dialogue': 'zoo:wizard_of_wikipedia/full_dialogue_retrieval_model/model',
    'tf_idf': 'zoo:wikipedia_full/tfidf_retriever/model',
    'knowledge_retriever': 'zoo:wizard_of_wikipedia/knowledge_retriever/model',
}

## Knowledge Retriever

In [63]:
parser = setup_args()
parser.set_params(model_file=models['knowledge_retriever'])
opt = parser.parse_kwargs()
parlai_agent = create_agent(opt, requireModelExists=True)

19:22:21 | Overriding opt["model_file"] to /Users/gianmarcodonofrio/miniconda/envs/nlp-project/lib/python3.11/site-packages/data/models/wizard_of_wikipedia/knowledge_retriever/model (previously: /checkpoint/edinan/20191119/wizard_doc_reader/lr=1e-05_lr-scheduler-patience=3_lr-scheduler-decay=0.9_warmupupdates=2000/model)
19:22:21 | loading dictionary from /Users/gianmarcodonofrio/miniconda/envs/nlp-project/lib/python3.11/site-packages/data/models/wizard_of_wikipedia/knowledge_retriever/model.dict
19:22:21 | num words = 54944
19:22:21 | Biencoder: full interactive mode on.
19:22:21 | Setting fixed_candidates path to: /Users/gianmarcodonofrio/miniconda/envs/nlp-project/lib/python3.11/site-packages/data/models/wizard_of_wikipedia/knowledge_retriever/model.cands-wizard_of_wikipedia:docreader.cands
19:22:24 | Total parameters: 256,081,920 (256,081,920 trainable)
19:22:24 | Loading existing model parameters from /Users/gianmarcodonofrio/miniconda/envs/nlp-project/lib/python3.11/site

In [76]:
parlai_agent.observe({'text': 'Flying', 'episode_done': True})

{'text': 'Flying',
 'episode_done': True,
 'full_text': 'Flying',
 'text_vec': tensor([   1, 2146,    1]),
 'full_text_vec': [2146],
 'context_original_length': 1,
 'context_truncate_rate': False,
 'context_truncated_length': 0,
 'added_start_end_tokens': True}

In [77]:
parlai_agent.act()['text']

'An airplane or aeroplane (informally plane) is a powered, fixed-wing aircraft that is propelled forward by thrust from a jet engine or propeller.'
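The four loading lines above recur for every model in this notebook. As a minimal sketch, they could be wrapped in a small helper. The name `load_zoo_agent` and its `overrides` parameter are hypothetical and not part of ParlAI; the body only uses the calls already shown above (`setup_args`, `set_params`, `parse_kwargs`, `create_agent`), and the ParlAI imports are deferred so that defining the function does not itself load the library.

```python
from typing import Any


def load_zoo_agent(model_file: str, **overrides: Any):
    """Build an agent from a model file such as a `zoo:` path.

    Hypothetical convenience wrapper around the loading cell above.
    Extra keyword arguments are forwarded to `parser.set_params`.
    """
    # Imported lazily: ParlAI is only needed when the helper is called.
    from parlai.core.agents import create_agent
    from parlai.scripts.interactive import setup_args

    parser = setup_args()
    parser.set_params(model_file=model_file, **overrides)
    opt = parser.parse_kwargs()
    # Fail early if the model file cannot be found or downloaded.
    return create_agent(opt, requireModelExists=True)
```

With this in place, a call such as `load_zoo_agent(models['end2end'])` would stand in for the four-line pattern, assuming the model needs no extra options.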

## End2End

In [37]:
parser = setup_args()
parser.set_params(model_file=models['end2end'])
opt = parser.parse_kwargs()
parlai_agent = create_agent(opt, requireModelExists=True)

19:10:52 | Overriding opt["model_file"] to /Users/gianmarcodonofrio/miniconda/envs/nlp-project/lib/python3.11/site-packages/data/models/wizard_of_wikipedia/end2end_generator/model (previously: /tmp/wizard_endtoend_model)
19:10:52 | Old model inference method inferred as greedy
19:10:52 | Loading model with `--beam-block-full-context false`
19:10:52 | loading dictionary from /Users/gianmarcodonofrio/miniconda/envs/nlp-project/lib/python3.11/site-packages/data/models/wizard_of_wikipedia/end2end_generator/model.dict
19:10:52 | num words = 34883
19:10:52 | EndToEnd: full interactive mode on.
19:10:52 | Total parameters: 15,585,024 (15,519,488 trainable)
19:10:52 | Loading existing model params from /Users/gianmarcodonofrio/miniconda/envs/nlp-project/lib/python3.11/site-packages/data/models/wizard_of_wikipedia/end2end_generator/model


In [38]:
parlai_agent.observe({'text': 'Gardening', 'episode_done': False})

{'text': 'Gardening',
 'episode_done': False,
 'full_text': 'Gardening',
 'text_vec': tensor([2450]),
 'full_text_vec': [2450],
 'context_original_length': 1,
 'context_truncate_rate': False,
 'context_truncated_length': 0}

In [39]:
parlai_agent.act()['text']

'i love to shop , i love to shop at a small container , what about you ?'

## Full dialogue

In [11]:
from projects.wizard_of_wikipedia.wizard_transformer_ranker.wizard_transformer_ranker import WizardTransformerRankerAgent
parser = setup_args()
WizardTransformerRankerAgent.add_cmdline_args(parser, partial_opt=None)
parser.set_params(
    task='wizard_of_wikipedia',
    model='projects:wizard_of_wikipedia:wizard_transformer_ranker',
    model_file='models:wizard_of_wikipedia/full_dialogue_retrieval_model/model',
    datatype='test',
    n_heads=6,
    ffn_size=1200,
    embeddings_scale=False,
    delimiter=' __SOC__ ',
    n_positions=1000,
    legacy=True,
    eval_candidates='fixed',
    interactive_mode=True,
)
opt = parser.parse_kwargs()
parlai_agent = create_agent(opt, requireModelExists=True)

20:02:46 | Overriding opt["task"] to wizard_of_wikipedia (previously: internal:wizard_of_perzona:WizardDialogKnowledge,squad:sentence)
20:02:46 | Overriding opt["model"] to projects:wizard_of_wikipedia:wizard_transformer_ranker (previously: internal:transformer_ranker)
20:02:46 | Overriding opt["model_file"] to /Users/gianmarcodonofrio/miniconda/envs/nlp-project/lib/python3.11/site-packages/data/models/wizard_of_wikipedia/full_dialogue_retrieval_model/model (previously: /checkpoint/edinan/20180908/transformer_wizard_response_finetune_squad_NEW/learn-embeddings=False/model)
20:02:46 | Overriding opt["datatype"] to test (previously: train)
20:02:46 | loading dictionary from /Users/gianmarcodonofrio/miniconda/envs/nlp-project/lib/python3.11/site-packages/data/models/wizard_of_wikipedia/full_dialogue_retrieval_model/model.dict
20:02:46 | num words = 250004
20:02:46 | WizardTransformerRanker: full interactive mode on.
20:02:46 | Setting fixed_candidates p

### Ireland test

In [12]:
parlai_agent.observe({'text': 'Ireland', 'episode_done': False})
parlai_agent.act()['text']

'Hi, have you ever been to Dublin? It is the capital and largest city in Ireland.'

In [13]:
parlai_agent.observe({'text': 'no, i haven\'t but would love to', 'episode_done': False})
parlai_agent.act()['text']

"Make sure to go to Dublin then! It's a 1000 year old city. Plus the Guinness Museum is there."

In [14]:
parlai_agent.observe({'text': 'I’ve always found Ireland to be fascinating and would love to visit sometime', 'episode_done': False})
parlai_agent.act()['text']

'Same here, or really anywhere in Ireland honestly. Dublin is the best choice though.'
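The three cells above repeat the same observe/act pair once per turn. A minimal sketch of a helper that runs a whole list of messages through an agent is given here. `run_dialogue` is a hypothetical name, and `EchoAgent` is a stand-in written only for this example (it is not a ParlAI class); the helper assumes nothing about the agent beyond the `observe` and `act` methods used in this notebook.

```python
from typing import Any, List, Optional, Tuple


def run_dialogue(
    agent: Any, messages: List[str], topic: Optional[str] = None
) -> List[Tuple[str, str]]:
    """Send each message to the agent and collect (message, reply) pairs."""
    transcript = []
    for turn, message in enumerate(messages):
        observation = {'text': message, 'episode_done': False}
        # The topic, if any, is only attached to the first turn.
        if turn == 0 and topic is not None:
            observation['topic'] = topic
        agent.observe(observation)
        reply = agent.act()['text']
        transcript.append((message, reply))
    return transcript


class EchoAgent:
    """Hypothetical stand-in agent that repeats the last observed text."""

    def __init__(self) -> None:
        self.last = None

    def observe(self, observation):
        self.last = observation
        return observation

    def act(self):
        return {'text': 'echo: ' + self.last['text']}


demo = run_dialogue(EchoAgent(), ['Ireland', "no, i haven't but would love to"])
for message, reply in demo:
    print(f'{message!r} -> {reply!r}')
```

Passing `parlai_agent` instead of `EchoAgent()` should reproduce the exchange above in a single call, provided the agent starts from an empty history.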

### Chatbot

In [79]:
topic = input('Enter topic: ')
message = input('Enter message: ')
parlai_agent.observe({'text': message, 'topic': topic, 'episode_done': False})

while True:
    response = parlai_agent.act()
    print(response['text'])
    message = input('Enter message: ')
    # Check for the exit command before observing, so that '[EXIT]'
    # is never appended to the dialogue history.
    if message == '[EXIT]':
        break
    print(parlai_agent.observe({'text': message, 'episode_done': False}))


The first e-books were typed in plain text format and published as text files; other formats were made available later.
{'text': 'yes i like the physical fell and smell of a real book', 'episode_done': False, 'full_text': 'i do not know why, but I have never gotten into e-books\nAn electronic book (or e-book) is a book publication made available in digital form, consisting of text, images, or both, readable on the flat-panel display of computers or other electronic devices.\n[EXiT]\nLOL, or lol, is an acronym for laugh(ing) out loud or lots of laughs, and a popular element of Internet slang.\n[EXIT]]\ni do not know why but i have never gotten into e-books\nThe first e-books were typed in plain text format and published as text files; other formats were made available later.\nyes i like the physical fell and smell of a real book', 'text_vec': tensor([    1,    14,    59,    48,    70,   129,     6,    45,    14,    49,
          154,  2977,    88,   289,    23,  1089,     4,    50,  401

In [48]:
parlai_agent.observe({'text': 'Artificial Intelligence', 'topic': 'Computer Science', 'episode_done': False})

{'text': 'Artificial Intelligence',
 'topic': 'Computer Science',
 'episode_done': False,
 'full_text': 'Artificial Intelligence',
 'text_vec': tensor([250003,  39193,  16243]),
 'full_text_vec': [39193, 16243],
 'context_original_length': 2,
 'context_truncate_rate': False,
 'context_truncated_length': 0,
 'memory_vecs': []}

In [49]:
parlai_agent.act()['text']

"I'd ask you if I was really an A.I. (artificial intelligence.) all my life..."

In [50]:
parlai_agent.observe({'text': 'What are your favorite plants?', 'episode_done': False})

{'text': 'What are your favorite plants?',
 'episode_done': False,
 'full_text': "Artificial Intelligence __SOC__ I'd ask you if I was really an A.I. (artificial intelligence.) all my life... __SOC__ What are your favorite plants?",
 'text_vec': tensor([250003,  39193,  16243, 250003,      5,      4,    132,    423,     12,
             49,      5,     31,     83,     63,    206,      0,      5,      0,
             26,   7624,   2986,      0,     23,     54,     39,    231,      0,
              0,      0, 250003,    191,     33,     47,    616,   3307,     22]),
 'full_text_vec': [39193,
  16243,
  250003,
  5,
  4,
  132,
  423,
  12,
  49,
  5,
  31,
  83,
  63,
  206,
  0,
  5,
  0,
  26,
  7624,
  2986,
  0,
  23,
  54,
  39,
  231,
  0,
  0,
  0,
  250003,
  191,
  33,
  47,
  616,
  3307,
  22],
 'context_original_length': 35,
 'context_truncate_rate': False,
 'context_truncated_length': 0,
 'memory_vecs': []}

In [51]:
parlai_agent.act()['text']

'All kinds but I am really interested in cultivated plant taxonomy. I like to see how all of the flowers are related.'
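The `full_text` field in the observation above is the dialogue so far, with turns separated by the `' __SOC__ '` delimiter passed to `set_params` when the agent was created. The sketch below only reproduces that visible string from the three turns; `join_history` is a hypothetical helper and makes no claim about how ParlAI builds its history internally.

```python
from typing import List

# Delimiter configured for the full-dialogue model in this notebook.
DELIMITER = ' __SOC__ '


def join_history(turns: List[str], delimiter: str = DELIMITER) -> str:
    """Concatenate dialogue turns into a single context string."""
    return delimiter.join(turns)


turns = [
    'Artificial Intelligence',
    "I'd ask you if I was really an A.I. (artificial intelligence.) all my life...",
    'What are your favorite plants?',
]
context = join_history(turns)
print(context)
```

The resulting string matches the `full_text` shown in the observation, which is why the agent's second reply can depend on the earlier turns.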

## TF-IDF

In [88]:
parser = setup_args()
parser.set_params(model_file=models['tf_idf'])
opt = parser.parse_kwargs()
parlai_agent = create_agent(opt, requireModelExists=True)

19:32:10 | Overriding opt["model_file"] to /Users/gianmarcodonofrio/miniconda/envs/nlp-project/lib/python3.11/site-packages/data/models/wikipedia_full/tfidf_retriever/model (previously: wiki_full_notitle)
19:32:10 | Loading /Users/gianmarcodonofrio/miniconda/envs/nlp-project/lib/python3.11/site-packages/data/models/wikipedia_full/tfidf_retriever/model.tfidf


In [89]:
parlai_agent.observe({'text': 'Gardening'})

{'text': 'Gardening'}

In [90]:
print(parlai_agent.act()['text'])


Gardening is the practice of growing and cultivating plants as part of horticulture. In gardens, ornamental plants are often grown for their flowers, foliage, or overall appearance; useful plants, such as root vegetables, leaf vegetables, fruits, and herbs, are grown for consumption, for use as dyes, or for medicinal or cosmetic use. Gardening is considered by many people to be a relaxing activity.

Gardening ranges in scale from fruit orchards, to long boulevard plantings with one or more different types of shrubs, trees, and herbaceous plants, to residential yards including lawns and foundation plantings, to plants in large or small containers grown inside or outside. Gardening may be very specialized, with only one type of plant grown, or involve a large number of different plants in mixed plantings. It involves an active participation in the growing of plants, and tends to be labor-intensive, which differentiates it from farming or forestry.

Forest gardening, a forest-based food 