# Walkthrough for Directed Query Generation
This notebook outlines the process of generating novel questions based on a user's seed topic using MULTIVAC's semantic knowledge graph and trained query generator. 
First, we set up the required imports and arguments for the test. 

In [None]:
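import os
import tensorflow as tf
# The wildcard import is expected to supply the other names used below (config, glob, pd, ...).
from multivac.src.rdf_graph.map_queries import *
from multivac.src.gan.gen_test import run
# Silence TensorFlow logging below the ERROR level.
tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)
os.chdir('src/gan')  # the relative paths in later cells assume this directory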

In [None]:
args_dict = {'dir': os.path.abspath('../../data'),
             'out': os.path.abspath('../../models'),
             'glove': '../../models/glove.42B.300d',
             'run': 'model',
             'model': 'transe',
             'threshold': 0.1,
             'num_top_rel': 10}


Next, we load the previously trained knowledge graph embedding model. This embedding model lets us assign probabilities to nodes or relationships that are missing from the knowledge graph but proposed by submitted queries. Here we use TransE, an approach that models each relationship as a translation between the low-dimensional embeddings of its head and tail entities.
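
To make the translation idea concrete, here is a minimal sketch of how a TransE-style score can be computed. It is not MULTIVAC's implementation: the function name `transe_score` and the toy 3-dimensional vectors are made up for illustration, and the only thing taken from TransE itself is the distance `||h + r - t||`, where a smaller value means a more plausible triple.

```python
import numpy as np

def transe_score(head, relation, tail, norm=1):
    """Return the TransE distance ||h + r - t||; lower means more plausible."""
    return float(np.linalg.norm(head + relation - tail, ord=norm))

# Toy embeddings, invented for this example only.
h = np.array([0.1, 0.2, 0.3])
r = np.array([0.4, 0.0, -0.1])
t_good = h + r                       # a tail that the translation hits exactly
t_bad = np.array([-1.0, 1.0, 1.0])   # an unrelated tail

score_good = transe_score(h, r, t_good)
score_bad = transe_score(h, r, t_bad)
print(score_good < score_bad)
```

Ranking candidate tails by this distance is one common way such a model is used for link prediction.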

In [None]:
con = config.Config()
con.set_in_path(args_dict['dir']+os.path.sep)
con.set_work_threads(8)
con.set_dimension(100)
con.set_test_link_prediction(True)
con.set_test_triple_classification(True)

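# Find the saved model files and pick the most recent one by its timestamp.
files = glob.glob(os.path.join(args_dict['out'], '*tf*'))
# File names look like model.vec.<timestamp>.tf; split the base name so dots in the directory path are ignored.
times = {os.path.basename(f).split('.')[2] for f in files}
ifile = max(datetime.strptime(x, '%d%b%Y-%H:%M:%S') for x in times).strftime('%d%b%Y-%H:%M:%S')
con.set_import_files(os.path.join(args_dict['out'], 'model.vec.{}.tf'.format(ifile)))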

con.init()
kem = set_model_choice(args_dict['model'])
con.set_model(kem)


files = [x for x in os.listdir(con.in_path) if '2id' in x]
rel_file = get_newest_file(con.in_path, files, 'relation')
ent_file = get_newest_file(con.in_path, files, 'entity')
trn_file = get_newest_file(con.in_path, files, 'train')

entities = pd.read_csv(ent_file, sep='\t', 
                       names=["Ent","Id"], skiprows=1)
relations = pd.read_csv(rel_file, sep='\t', 
                        names=["Rel","Id"], skiprows=1)
train = pd.read_csv(trn_file, sep='\t', 
                    names=["Head","Tail","Relation"], skiprows=1)

We then set up a GloVe embedding model. We use the large-scale, pre-trained GloVe vectors because submitted questions are open-domain and may use vocabulary from any field.
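
For readers unfamiliar with the format, the sketch below shows one way to read GloVe's plain-text layout, in which each line holds a token followed by its space-separated vector components. The helper `read_glove_text` is hypothetical and written only for this example; the file layout and return values that `load_word_vectors` actually uses are not shown in this notebook and may differ.

```python
import io
import numpy as np

def read_glove_text(handle):
    """Parse GloVe-style text lines into a (vocab, matrix) pair."""
    vocab, rows = {}, []
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vocab[parts[0]] = len(rows)  # token -> row index
        rows.append([float(x) for x in parts[1:]])
    return vocab, np.array(rows, dtype=np.float32)

# Two made-up 3-dimensional vectors stand in for the real 300-dimensional file.
sample = io.StringIO("avian 0.1 0.2 0.3\nflu 0.4 0.5 0.6\n")
toy_vocab, toy_emb = read_glove_text(sample)
print(toy_vocab, toy_emb.shape)
```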

In [None]:
glove_vocab, glove_emb = load_word_vectors(args_dict['glove'])


Finally, we input our seed topic and extract the knowledge graph elements and predicted elements most related to that topic. The system identifies all triples that contain the topic or are closely semantically related to it, and returns the top `num_top_rel` results (by default, 10).
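
As a rough illustration of what "closely semantically related" can mean, the sketch below ranks candidate phrases by cosine similarity to the topic, using averaged word vectors, a similarity threshold, and a top-k cutoff. This is an assumption about the general technique, not a description of how `predict_object` works internally; `phrase_vector`, `top_related`, and the toy vectors are all hypothetical.

```python
import numpy as np

def phrase_vector(phrase, vocab, emb):
    """Average the vectors of the in-vocabulary words of a phrase."""
    idx = [vocab[w] for w in phrase.lower().split() if w in vocab]
    if not idx:
        raise KeyError(f"no known words in {phrase!r}")
    return emb[idx].mean(axis=0)

def top_related(topic, candidates, vocab, emb, k=10, threshold=0.1):
    """Keep at most k candidates whose cosine similarity to the topic meets the threshold."""
    q = phrase_vector(topic, vocab, emb)
    scored = []
    for cand in candidates:
        try:
            v = phrase_vector(cand, vocab, emb)
        except KeyError:
            continue  # no usable words, so the candidate cannot be compared
        sim = float(q @ v / (np.linalg.norm(q) * np.linalg.norm(v)))
        if sim >= threshold:
            scored.append((cand, sim))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy vocabulary and 3-dimensional vectors, invented for this example.
toy_vocab = {"avian": 0, "flu": 1, "bird": 2, "influenza": 3, "stock": 4, "market": 5}
toy_emb = np.array([
    [1.0, 0.1, 0.0],
    [0.9, 0.3, 0.0],
    [0.8, 0.0, 0.1],
    [0.9, 0.4, 0.0],
    [0.0, 0.1, 1.0],
    [-0.1, 0.0, 0.9],
])
ranked = top_related("avian flu", ["bird influenza", "stock market", "unknown words"],
                     toy_vocab, toy_emb, k=2)
print(ranked)
```

In this toy setup only "bird influenza" clears the threshold: "stock market" points in a nearly orthogonal direction and "unknown words" has no vectors to compare.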

In [None]:
sample_topic = 'avian flu'

In [None]:
results = predict_object(con, sample_topic, relations, entities, train, glove_vocab, glove_emb, exact=False)

These results are then fed to the query generator, which produces questions in response to each topic. The `run()` function called below does two main things: 1) it submits the "query" triples to the Generator system, which parses them into a tree object representing the constituency parse of an English-language question, and 2) it translates that parse into surface text for presentation:
```python
    results = netG.parse(query, beam_size=netG.args['beam_size'])
    texts = [asdl_ast_to_english(x.tree) for x in results]

    return texts
```

In [None]:
questions = results.Text.apply(lambda x: run({'query': list(x), 
                                              'model': os.path.join(args_dict['out'], 'gen_checkpoint.pth')}))

In [None]:
questions.values