This notebook explores the results of predicting the `click_rate` from a `source_article` to a `target_article` with five models: Doc2Vec, Wikipedia2Vec, and Smash-RNN at the paragraph, sentence and word levels.

The `ResultsAnalyzer` class encapsulates the logic for computing the results. Main features:
- `get_ndcg_for_all_models`: Calculates the Normalized Discounted Cumulative Gain (NDCG) for each model
- `get_map_for_all_models`: Calculates the Mean Average Precision (MAP) for each model
- `get_top_5_predicted_by_article_and_model(source_article, model)`: Gets the top 5 predictions for the `source_article`. The column `is_in_top_5` shows whether the `target_article` is among the **actual** top 5 by click rate.
- `ResultsAnalyzer.results`: A pandas DataFrame containing the consolidated results
- `get_sample_source_articles`: Samples 10 random source articles, which can be used to check the results manually

In [1]:
import pandas as pd
from results_analyzer import ResultsAnalyzer

results_analyzer = ResultsAnalyzer()

Getting NDCG for all models:

In [2]:
results_analyzer.get_ndcg_for_all_models()

[2020-06-08 18:25:56,717] [INFO] Calculating NDCG for each model (get_ndcg_for_all_models@results_analyzer.py:186)
100%|██████████| 5/5 [01:05<00:00, 13.19s/it]


{'doc2vec': 0.767686533154876,
 'wikipedia2vec': 0.7934964048031573,
 'word': 0.7690180439706009,
 'sentence': 0.7667518345306319,
 'paragraph': 0.761542096405753}
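
For reference, NDCG divides the discounted cumulative gain of a predicted ranking by that of the ideal ranking. The sketch below is a minimal illustration, not the code in `results_analyzer.py`: the function names `dcg` and `ndcg`, the optional cut-off `k`, and the use of actual click rates as graded relevance are all assumptions.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain: each relevance is divided by log2(rank + 1)."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))


def ndcg(relevances_in_predicted_order, k=None):
    """NDCG for one source article (hypothetical helper).

    `relevances_in_predicted_order` holds the actual relevance (assumed here
    to be the click rate) of each target article, listed in the order the
    model ranked them. `k` optionally truncates both rankings.
    """
    ranked = list(relevances_in_predicted_order)
    ideal = sorted(ranked, reverse=True)  # best possible ordering
    ideal_dcg = dcg(ideal[:k])
    if ideal_dcg == 0:
        return 0.0  # no relevant targets, so the score is defined as 0 here
    return dcg(ranked[:k]) / ideal_dcg


# Made-up relevance values, for illustration only.
perfect = ndcg([3, 2, 1])   # model order matches the ideal order
reversed_ = ndcg([1, 2, 3])  # model order is the worst possible
```

A ranking that matches the ideal order scores 1.0, and any other ordering scores lower. A per-model figure such as the ones above would then typically be an average over source articles, although the aggregation used by `ResultsAnalyzer` is not shown in this notebook.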

Getting MAP for all models:

In [None]:
results_analyzer.get_map_for_all_models()
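
As a rough guide to what MAP measures, the sketch below averages the precision at each rank where a relevant target appears, then averages that value over source articles. It is an illustration under stated assumptions, not the implementation in `results_analyzer.py`: the helper names, the cut-off `k=5`, and treating the actual top 5 targets as the relevant set are hypothetical choices.

```python
def average_precision(predicted, relevant, k=5):
    """Average precision at k for one source article (hypothetical helper).

    `predicted` is the model's ranked list of target articles and
    `relevant` is the set of targets considered correct.
    """
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, target in enumerate(predicted[:k], start=1):
        if target in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    # Normalize by the number of relevant items that could fit in the top k.
    return precision_sum / min(len(relevant), k)


def mean_average_precision(predicted_lists, relevant_sets, k=5):
    """Mean of the per-article average precision values."""
    scores = [average_precision(p, r, k)
              for p, r in zip(predicted_lists, relevant_sets)]
    return sum(scores) / len(scores) if scores else 0.0


# Made-up target names, for illustration only.
ap = average_precision(["a", "b", "c"], {"a", "c"}, k=3)
```

In this toy case the relevant targets sit at ranks 1 and 3, so the average precision is (1/1 + 2/3) / 2.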

Getting a sample of the results:

In [None]:
results_analyzer.results.sample(n=10)

Getting a sample of the source articles:

In [None]:
results_analyzer.get_sample_source_articles()

Getting all the available models (the `paragraph`, `sentence` and `word` models refer to the Smash-RNN levels):

In [None]:
results_analyzer.get_models()

Getting the top 5 predictions for a `source_article` and a `model`:

In [None]:
sample_source_article = "Gerald Ford"
model = "sentence"

results_analyzer.get_top_5_predicted_by_article_and_model(sample_source_article, model)
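
The `is_in_top_5` flag described earlier can be pictured with the plain-Python sketch below. It works on a list of row dictionaries, such as the one `DataFrame.to_dict("records")` returns. The function name, the `predicted_key` argument and the `predicted` key are hypothetical, and the real method in `ResultsAnalyzer` may work differently.

```python
def top_5_with_flag(records, source_article, predicted_key,
                    actual_key="click_rate"):
    """Return the 5 highest-scored rows for `source_article`, each with an
    `is_in_top_5` flag telling whether its target is also in the actual
    top 5 by click rate. All key names here are assumptions.
    """
    rows = [r for r in records if r["source_article"] == source_article]
    actual_top_5 = {
        r["target_article"]
        for r in sorted(rows, key=lambda r: r[actual_key], reverse=True)[:5]
    }
    predicted_top_5 = sorted(rows, key=lambda r: r[predicted_key],
                             reverse=True)[:5]
    return [{**r, "is_in_top_5": r["target_article"] in actual_top_5}
            for r in predicted_top_5]


# Made-up rows, for illustration only: six targets for one source article.
actual = [0.6, 0.5, 0.4, 0.3, 0.2, 0.1]
predicted = [0.1, 0.9, 0.8, 0.7, 0.6, 0.5]
toy_records = [
    {"source_article": "S", "target_article": f"T{i + 1}",
     "click_rate": a, "predicted": p}
    for i, (a, p) in enumerate(zip(actual, predicted))
]

top_5 = top_5_with_flag(toy_records, "S", predicted_key="predicted")
flags = [(row["target_article"], row["is_in_top_5"]) for row in top_5]
```

Here the model ranks `T6` in its top 5 even though `T6` has the lowest actual click rate, so that row is the only one flagged `False`.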

Next steps:
- Create analytics to better understand the results for each model (I will need help here!)