# LDA evaluation notebook

This interactive notebook evaluates our summarization models with our LDA topic model.

The notebook relies on custom modules defined in other files. We run the evaluation in a notebook rather than a script so that the data and models are loaded once and stay in memory, instead of being reloaded on every run.

### Setup logging

In [None]:
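import logging
from logging import config

# Configure loggers, handlers, and formatters from the project's logging.conf.
config.fileConfig('./logging.conf')

def pp(*args, **kwargs):
    """Shorthand for logging.info on the root logger."""
    logging.info(*args, **kwargs)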

### Resource paths

In [None]:
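import os

cwd = os.getcwd()

# All resources are resolved relative to the current working directory.
data_path = os.path.join(cwd, 'bart_output.json')
model_path = os.path.join(cwd, 'model', 'grid-xxx')
tf_idf_path = os.path.join(cwd, 'tf_idf')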

### Load pre-computed resources

In [None]:
from gensim.models import TfidfModel

tf_idf = TfidfModel.load(tf_idf_path)

In [None]:
from lda_model import LdaModel

lda = LdaModel.load(model_path)

### Load data

We use a JSON file that we built ourselves. Each record stores the original article and its abstract (both part of the dataset) together with the summary generated by the BART model.

In [None]:
import json

with open(data_path) as fin:
    data = json.load(fin)

articles = [doc['article'] for doc in data]
abstracts = [doc['abstract'] for doc in data]
summaries = [doc['bart'] for doc in data]
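
The list comprehensions above raise a `KeyError` on the first record that lacks a field. As a minimal sketch, assuming each record is a flat dictionary with the string fields `article`, `abstract`, and `bart`, a check like the following could flag incomplete records before they are used. The helper `check_records` and the sample records are hypothetical and not part of the project.

```python
REQUIRED_KEYS = ('article', 'abstract', 'bart')

def check_records(records):
    """Return the indices of records with a missing, non-string, or blank required field."""
    bad = []
    for i, doc in enumerate(records):
        if not all(isinstance(doc.get(k), str) and doc[k].strip() for k in REQUIRED_KEYS):
            bad.append(i)
    return bad

# Hypothetical records, only to illustrate the assumed layout of the JSON file.
sample = [
    {'article': 'Full article text.', 'abstract': 'Reference abstract.', 'bart': 'Model summary.'},
    {'article': 'Another article.', 'abstract': 'Another abstract.'},  # 'bart' is missing
]
bad_indices = check_records(sample)
```

Calling `check_records(data)` right after `json.load` would show which records to skip or repair.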

### Tokenize and pre-process test data

The LDA model expects bag-of-words (BOW) vectors, in our case TF-IDF weighted, rather than raw strings. We therefore convert each text into that format.

In [None]:
from generate_preprocessed import PreProcessor
from generate_bow import BowProcessor
from generate_tf_idf import TfIdfProcessor

pp_processor = PreProcessor()
bow_processor = BowProcessor(lda.dictionary)
tf_idf_processor = TfIdfProcessor(tf_idf)

articles_pp = pp_processor(articles)
abstracts_pp = pp_processor(abstracts)
summaries_pp = pp_processor(summaries)

articles_bow = bow_processor(articles_pp)
abstracts_bow = bow_processor(abstracts_pp)
summaries_bow = bow_processor(summaries_pp)

articles_tf_idf = tf_idf_processor(articles_bow)
abstracts_tf_idf = tf_idf_processor(abstracts_bow)
summaries_tf_idf = tf_idf_processor(summaries_bow)
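
The three conversion steps can be illustrated without the project modules. The following is a sketch in plain Python, assuming already tokenized documents, a vocabulary built from the documents themselves, and a `count * log2(N / df)` weight followed by L2 normalization. The real `PreProcessor`, `BowProcessor`, and `TfIdfProcessor` are defined in other files and may behave differently; every name below is hypothetical.

```python
import math
from collections import Counter

def build_vocab(tokenized_docs):
    """Map each distinct token to an integer id, in order of first appearance."""
    vocab = {}
    for tokens in tokenized_docs:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab))
    return vocab

def to_bow(tokens, vocab):
    """Return sorted (token_id, count) pairs, skipping out-of-vocabulary tokens."""
    counts = Counter(vocab[t] for t in tokens if t in vocab)
    return sorted(counts.items())

def to_tf_idf(bow, doc_freq, n_docs):
    """Weight counts by log2(n_docs / df), drop zero weights, and L2-normalize."""
    weighted = [(tid, cnt * math.log2(n_docs / doc_freq[tid])) for tid, cnt in bow]
    weighted = [(tid, w) for tid, w in weighted if w > 0.0]
    norm = math.sqrt(sum(w * w for _, w in weighted))
    return [(tid, w / norm) for tid, w in weighted] if norm else []

# Hypothetical pre-processed (tokenized) documents.
docs = [['topic', 'model', 'topic'], ['model', 'summary'], ['summary', 'text']]
vocab = build_vocab(docs)
bows = [to_bow(d, vocab) for d in docs]
# Document frequency: in how many documents each token id occurs.
doc_freq = Counter(tid for bow in bows for tid, _ in bow)
vectors = [to_tf_idf(bow, doc_freq, len(docs)) for bow in bows]
```

Each document ends up as a sparse list of `(token_id, weight)` pairs. Note that in the notebook the vocabulary comes from `lda.dictionary` and the weights from the pre-computed `tf_idf` model, not from the evaluation texts.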

### Evaluate the topics for each doc and calculate distances

For every original article, we have two gists: one human-made (abstract) and one computer-made (summary).  
We calculate the distance for each of the two pairs, (original, abstract) and (original, summary), and check which gist retains the article's topics better. A smaller distance means better retention.

In [None]:
from lda_eval import LdaEvaluator

evaluator = LdaEvaluator(lda)

human_better = 0
comp_better = 0

for article, abstract, summary in zip(articles_tf_idf, abstracts_tf_idf, summaries_tf_idf):
    human_dist = evaluator.distance(article, abstract)
    comp_dist = evaluator.distance(article, summary)
    diff = abs(human_dist - comp_dist)
    pp(f'{human_dist:.3f}, {comp_dist:.3f} --> {diff:.3f}')
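    # The gist with the smaller distance stays closer to the article's topics.
    # Note: ties are counted in favor of the computer-made summary.
    if human_dist < comp_dist:
        human_better += 1
    else:
        comp_better += 1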

pp('---------------------------------------------------')
pp(f'Human [{human_better}] vs. Comp [{comp_better}]')
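
The metric behind `evaluator.distance` lives in `lda_eval` and is not shown in this notebook. As one possible illustration, the sketch below assumes each document is represented as a dense topic probability vector and compares vectors with the Hellinger distance. It also counts ties separately, whereas the loop above assigns them to the computer-made summary. The functions `hellinger` and `tally` and the sample distributions are hypothetical.

```python
import math

def hellinger(p, q):
    """Hellinger distance between two discrete distributions (0 = identical, 1 = disjoint)."""
    if len(p) != len(q):
        raise ValueError('distributions must have the same number of topics')
    total = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(total / 2.0)

def tally(pairs):
    """Count which gist is closer to the article; ties are counted separately."""
    human = comp = ties = 0
    for human_dist, comp_dist in pairs:
        if math.isclose(human_dist, comp_dist):
            ties += 1
        elif human_dist < comp_dist:
            human += 1
        else:
            comp += 1
    return human, comp, ties

# Hypothetical topic distributions over three topics.
article = [0.7, 0.2, 0.1]
abstract = [0.6, 0.3, 0.1]
summary = [0.1, 0.2, 0.7]
result = tally([(hellinger(article, abstract), hellinger(article, summary))])
```

Here the abstract keeps most of its mass on the article's dominant topic, so its distance is smaller and the pair is counted for the human-made gist.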
