# Walkthrough for Pure Automated Query Generation
This notebook outlines the process of generating novel questions from MULTIVAC's trained query generator based on automated analysis of MULTIVAC's semantic knowledge graph. 
First, we set up the required imports and arguments for the test. 

In [None]:
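import glob
import os
from datetime import datetime
import pandas as pd
import tensorflow as tf  # explicit imports; the wildcard import below may also provide these
from multivac.get_kg_query_params import build_network, analyze_network
from multivac.src.rdf_graph.map_queries import *
tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)
from multivac.src.gan.gen_test import run
os.chdir('src/gan')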

In [None]:
args_dict = {'dir': os.path.abspath('../../data'),
             'out': os.path.abspath('../../models'),
             'glove': '../../models/glove.42B.300d',
             'run': 'model',
             'model': 'transe',
             'threshold': 0.1,
             'num_top_rel': 10}

Next, we load the previously trained knowledge graph embedding model. This model lets us assign probabilities to missing nodes or relationships in the knowledge graph that are proposed via submitted queries. Here we use TransE, an approach that models relationships as translations operating on the low-dimensional embeddings of entities: for a true triple (head, relation, tail), the head embedding plus the relation embedding should lie close to the tail embedding.
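To make the translation idea concrete, here is a minimal sketch of a TransE-style plausibility score. It is not MULTIVAC's or the embedding library's code; the function name `transe_score` and the toy vectors are assumptions chosen purely for illustration. The score is the distance between `head + relation` and `tail`, so lower values indicate a more plausible triple.

```python
import math

def transe_score(head, relation, tail):
    """L2 distance ||head + relation - tail||; lower means more plausible."""
    return math.sqrt(sum((h + r - t) ** 2
                         for h, r, t in zip(head, relation, tail)))

# Toy 3-dimensional embeddings (hypothetical values, not taken from any model).
head_vec = [0.9, 0.1, 0.0]
relation_vec = [-0.4, 0.5, 0.2]
likely_tail = [0.5, 0.6, 0.2]    # roughly head_vec + relation_vec
unlikely_tail = [0.1, 0.9, 0.7]

likely = transe_score(head_vec, relation_vec, likely_tail)
unlikely = transe_score(head_vec, relation_vec, unlikely_tail)
print(likely < unlikely)
```

Ranking candidate tails (or heads) by this distance is how a translation-based model can propose missing elements for a partially specified triple.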

In [None]:
con = config.Config()
con.set_in_path(args_dict['dir']+os.path.sep)
con.set_work_threads(8)
con.set_dimension(100)
con.set_test_link_prediction(True)
con.set_test_triple_classification(True)

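# Find the most recently saved TensorFlow model by the timestamp in its filename
# (expected form: model.vec.<timestamp>.tf*).
files = glob.glob(os.path.join(args_dict['out'], '*tf*'))
# Split the basename, not the full path, so dots in directory names cannot shift the index.
times = {os.path.basename(f).split('.')[2] for f in files}
fmt = '%d%b%Y-%H:%M:%S'
ifile = max(datetime.strptime(x, fmt) for x in times).strftime(fmt)
con.set_import_files(os.path.join(args_dict['out'], 'model.vec.{}.tf'.format(ifile)))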

con.init()
kem = set_model_choice(args_dict['model'])
con.set_model(kem)


# The *2id files hold the entity ids, relation ids, and training triples.
# For each kind, pick the most recently modified matching file.
files = [x for x in os.listdir(con.in_path) if '2id' in x]

def latest_file(keyword):
    """Return the path of the newest '2id' file whose name contains `keyword`."""
    matches = [x for x in files if keyword in x]
    newest = max(matches,
                 key=lambda x: (os.path.getmtime(os.path.join(con.in_path, x)), x))
    return os.path.join(con.in_path, newest)

rel_file = latest_file('relation')
ent_file = latest_file('entity')
trn_file = latest_file('train')

entities = pd.read_csv(ent_file, sep='\t', 
                       names=["Ent","Id"], skiprows=1)
relations = pd.read_csv(rel_file, sep='\t', 
                        names=["Rel","Id"], skiprows=1)
train = pd.read_csv(trn_file, sep='\t', 
                    names=["Head","Tail","Relation"], skiprows=1)

In [None]:
glove_vocab, glove_emb = load_word_vectors(args_dict['glove'])


In this scenario, we let the knowledge graph guide topic selection and, ultimately, query generation. We feed the graph's node and edge tuples to `networkx`, which builds a graph model from them.

In [None]:
network = build_network(train.apply(lambda x: tuple(x), axis=1))

We then apply graph analytic measures to identify key elements in the graph for exploration. Here for demonstration purposes we use eigenvector centrality, returning the ten nodes with the highest eigenvector centrality values in the knowledge graph. 
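For intuition about what this measure computes, the following is a dependency-free sketch of eigenvector centrality by power iteration. It is a stand-in for illustration only, not the code behind `analyze_network` or `networkx`; the helper names `eigenvector_centrality` and `top_nodes`, the undirected treatment of edges, and the toy triples are all assumptions. Triples follow the notebook's (head, tail, relation) column order.

```python
import math
from collections import defaultdict

def eigenvector_centrality(triples, max_iter=200, tol=1e-10):
    """Power iteration on an undirected graph built from (head, tail, relation) triples."""
    neighbors = defaultdict(set)
    for head, tail, _relation in triples:
        neighbors[head].add(tail)
        neighbors[tail].add(head)
    nodes = list(neighbors)
    score = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(max_iter):
        # Iterate with (A + I): keep each node's own score and add its neighbors'.
        updated = {n: score[n] + sum(score[m] for m in neighbors[n]) for n in nodes}
        norm = math.sqrt(sum(v * v for v in updated.values()))
        updated = {n: v / norm for n, v in updated.items()}
        change = sum(abs(updated[n] - score[n]) for n in nodes)
        score = updated
        if change < tol:
            break
    return score

def top_nodes(triples, num_results=10):
    """Return the `num_results` (node, score) pairs with the highest centrality."""
    score = eigenvector_centrality(triples)
    return sorted(score.items(), key=lambda item: item[1], reverse=True)[:num_results]

# Toy graph: node 0 is a hub; node 3 links the hub to node 4.
toy_triples = [(0, 1, 0), (0, 2, 0), (0, 3, 1), (3, 4, 1)]
top = top_nodes(toy_triples, num_results=2)
print([node for node, _ in top])
```

A node scores highly when it is connected to other high-scoring nodes, which is why the hub and its best-connected neighbor rank first in the toy graph.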

In [None]:
results = analyze_network(network, {'measure': 'eigenvector',
                                'num_results': 10})

In [None]:
# Map the top-ranked node ids back to their entity labels.
ids = [x[0] for x in results]
key_nodes = entities.Ent[entities.Id.isin(ids)]

key_nodes

Finally, we select a seed topic at random from these results and extract the knowledge graph elements, both existing and predicted, that are most related to that topic. The system identifies all triples that contain the topic or are semantically close to it, and returns the top `num_top_rel` results (by default, 10).
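The internals of `predict_object` are not shown in this notebook, so the following is only a sketch of one way such a ranking could work: score each triple by the cosine similarity between the topic's word vector and the vectors of the triple's head and tail, then keep the top `num_top_rel`. The function names `cosine` and `related_triples`, the two-dimensional toy vectors, and the example triples are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def related_triples(topic, triples, vectors, num_top_rel=10):
    """Rank (head, tail, relation) triples by similarity of head or tail to the topic."""
    scored = []
    for head, tail, relation in triples:
        sim = max(cosine(vectors[topic], vectors[head]),
                  cosine(vectors[topic], vectors[tail]))
        scored.append((sim, (head, tail, relation)))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [triple for _, triple in scored[:num_top_rel]]

# Toy word vectors and triples (hypothetical, for illustration only).
vectors = {
    "virus": [1.0, 0.0], "pathogen": [0.9, 0.1], "host": [0.2, 0.8],
    "weather": [0.0, 1.0], "rainfall": [0.1, 0.9],
}
triples = [
    ("weather", "rainfall", "affects"),
    ("pathogen", "host", "infects"),
    ("virus", "host", "infects"),
]
closest = related_triples("virus", triples, vectors, num_top_rel=2)
print(closest)
```

With `exact=False`, as in the call below, one would expect near-synonyms of the topic to contribute triples as well as the topic itself, which is the behavior this sketch imitates.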

In [None]:
sample_topic = key_nodes.sample().values[0]

In [None]:
results = predict_object(con, sample_topic, relations, entities, train, glove_vocab, glove_emb, exact=False)

In [None]:
questions = results.Text.apply(lambda x: run({'query': list(x), 
                                              'model': os.path.join(args_dict['out'], 'gen_checkpoint.pth')}))

In [None]:
questions.values