Popularity Ranker re-ranks results obtained via TF-IDF Ranker <tfidf_ranking>
using information about the number of article views. Wikipedia article view counts are publicly available through the Wikimedia REST API. We assigned to each article in our English Wikipedia database enwiki20180211 its mean number of views over the period from 2017/11/05 to 2018/11/05.
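As an illustration of how a mean view count could be collected, the sketch below queries the per-article pageviews endpoint of the Wikimedia REST API using only the standard library. The choice of daily granularity, the all-access and user filters, the User-Agent string, and the helper names pageviews_url, mean_views and fetch_mean_views are assumptions made for this example; the source does not say which settings were used to build the popularity data.

```python
import json
import urllib.parse
import urllib.request

API_ROOT = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def pageviews_url(title, start="20171105", end="20181105",
                  project="en.wikipedia", granularity="daily"):
    """Build the per-article pageviews URL for one title."""
    # Titles use underscores instead of spaces and must be percent-encoded.
    article = urllib.parse.quote(title.replace(" ", "_"), safe="")
    return f"{API_ROOT}/{project}/all-access/user/{article}/{granularity}/{start}/{end}"


def mean_views(payload):
    """Average the 'views' field over all items of a parsed API response."""
    counts = [item["views"] for item in payload.get("items", [])]
    return sum(counts) / len(counts) if counts else 0.0


def fetch_mean_views(title, user_agent="popularity-example/0.1 (contact@example.org)"):
    """Download the view counts for one article and return their mean."""
    # Wikimedia asks API clients to identify themselves; this value is a placeholder.
    request = urllib.request.Request(pageviews_url(title),
                                     headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=30) as response:
        return mean_views(json.load(response))
```

Calling fetch_mean_views('Ivan Pavlov') performs a network request, so it is left out of the block itself.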
Internally, Popularity Ranker is a logistic regression classifier based on three features:
- TF-IDF score of the article
- popularity of the article
- product of the two features above
The classifier is trained on the SQuAD-v1.1 train set.
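The three features and the re-ranking step can be sketched in plain Python as follows. This is a minimal illustration, not the trained model: the weights, bias, candidate scores, and the helper names features, relevance_probability and rerank are hypothetical, and the real classifier is loaded from the pre-trained file rather than written by hand.

```python
import math


def features(tfidf_score, popularity):
    """Return the three features: TF-IDF, popularity, and their product."""
    return [tfidf_score, popularity, tfidf_score * popularity]


def relevance_probability(x, weights, bias):
    """Logistic regression decision function: sigmoid(w . x + b)."""
    z = bias + sum(w * v for w, v in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def rerank(candidates, weights, bias):
    """Sort (title, tfidf_score, popularity) tuples by predicted probability."""
    def score(candidate):
        _, tfidf_score, popularity = candidate
        return relevance_probability(features(tfidf_score, popularity), weights, bias)
    return [title for title, _, _ in sorted(candidates, key=score, reverse=True)]


# Hypothetical weights and scores, chosen only to make the example run.
WEIGHTS, BIAS = [1.0, 0.5, 2.0], -1.0
candidates = [
    ("Article A", 0.9, 0.1),
    ("Article B", 0.7, 0.8),
    ("Article C", 0.2, 0.9),
]
print(rerank(candidates, WEIGHTS, BIAS))
```

With these made-up numbers, an article with a slightly lower TF-IDF score but much higher popularity moves ahead of the top TF-IDF candidate, which is the effect the product feature is meant to capture.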
Before using the model, make sure that all required packages are installed by running the command:
python -m deeppavlov install en_ranker_pop_enwiki20180211.json
Building the model
from deeppavlov import build_model, configs
ranker = build_model(configs.doc_retrieval.en_ranker_pop_enwiki20180211, download=True)
Inference
result = ranker(['Who is Ivan Pavlov?'])
print(result[:5])
Output
>> ['Ivan Pavlov', 'Vladimir Bekhterev', 'Classical conditioning', 'Valentin Pavlov', 'Psychology']
The text for the output titles can then be extracted with the deeppavlov.vocabs.wiki_sqlite.WikiSQLiteVocab class.
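Conceptually, retrieving article text by title is a keyed lookup in an SQLite database. The sketch below shows that idea with the standard sqlite3 module only. The table name documents, its columns id and text, and the helper fetch_texts are assumptions for illustration; they are not taken from DeepPavlov and may not match the layout of the real database.

```python
import sqlite3


def fetch_texts(connection, titles):
    """Return the stored text for each title, or None when a title is absent."""
    texts = []
    for title in titles:
        row = connection.execute(
            "SELECT text FROM documents WHERE id = ?", (title,)
        ).fetchone()
        texts.append(row[0] if row else None)
    return texts


# In-memory stand-in for the real database, filled with placeholder text.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE documents (id TEXT PRIMARY KEY, text TEXT)")
connection.execute("INSERT INTO documents VALUES (?, ?)",
                   ("Ivan Pavlov", "Placeholder article text."))
print(fetch_texts(connection, ["Ivan Pavlov", "Missing title"]))
```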
The default ranker config is doc_retrieval/en_ranker_pop_enwiki20180211.json
Note
About 17 GB of RAM is required.
In interactive mode, the ranker returns the titles of the relevant documents.
Run the following to interact with the ranker:
python -m deeppavlov interact en_ranker_pop_enwiki20180211 -d
By default, the available Wikipedia article popularity data is downloaded to ~/.deeppavlov/downloads/odqa/popularities.json
and the pre-trained logistic regression classifier is downloaded to ~/.deeppavlov/models/odqa/logreg_3features.joblib.