Popularity Ranker re-ranks the results obtained via :doc:`TF-IDF Ranker <tfidf_ranking>` using information about the number of article views. The number of views of each Wikipedia article is openly available and can be obtained via the Wikimedia REST API. Each article in our English Wikipedia database enwiki20180211 was assigned its mean number of views over the period from 2017/11/05 to 2018/11/05.
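The averaging step can be sketched as follows. This is a minimal illustration, not the procedure actually used to build the popularity data: it assumes the Wikimedia REST API per-article pageviews endpoint with daily granularity and the ``all-access``/``all-agents`` filters, and the helper names ``pageviews_url`` and ``mean_views`` are hypothetical. No request is sent; a small hand-made response with invented numbers stands in for the API output.

```python
from statistics import mean
from urllib.parse import quote

# Per-article pageviews endpoint of the Wikimedia REST API. The filters used
# below (all-access, all-agents, daily) are assumptions made for this sketch.
PAGEVIEWS_BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def pageviews_url(title: str, start: str = "20171105", end: str = "20181105") -> str:
    """Build a daily per-article pageviews URL for English Wikipedia (hypothetical helper)."""
    article = quote(title.replace(" ", "_"), safe="")
    return (f"{PAGEVIEWS_BASE}/en.wikipedia/all-access/all-agents/"
            f"{article}/daily/{start}/{end}")


def mean_views(response: dict) -> float:
    """Average the 'views' field over all items of a pageviews JSON response."""
    items = response.get("items", [])
    if not items:
        return 0.0
    return float(mean(item["views"] for item in items))


# Hand-made response in the shape returned by the endpoint (invented numbers).
fake_response = {"items": [{"views": 100}, {"views": 200}, {"views": 300}]}
print(pageviews_url("Ivan Pavlov"))
print(mean_views(fake_response))
```

Returning ``0.0`` for articles without any recorded views keeps every article in the database scoreable; how the real data treats such articles is not stated here.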
The inner algorithm of Popularity Ranker is a logistic regression classifier based on 3 features:

- TF-IDF score of the article
- popularity of the article
- product of the two features above
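The scoring idea behind these three features can be sketched in plain Python. This is not the pretrained model: the weights and bias below are hypothetical placeholders (the real ones are stored in the trained classifier), and popularity is assumed to be already scaled to a range comparable with the TF-IDF score. Each candidate is scored with the logistic function of a weighted sum of the features, and candidates are sorted by that probability.

```python
import math
from typing import List, Tuple

# Hypothetical weights and bias; the real values come from the pretrained classifier.
WEIGHTS = (1.0, 0.5, 0.2)  # (tfidf, popularity, tfidf * popularity)
BIAS = -1.0


def features(tfidf: float, popularity: float) -> Tuple[float, float, float]:
    """The three features: TF-IDF score, popularity, and their product."""
    return (tfidf, popularity, tfidf * popularity)


def relevance(tfidf: float, popularity: float) -> float:
    """Logistic regression probability: sigmoid(w . x + b)."""
    z = sum(w * x for w, x in zip(WEIGHTS, features(tfidf, popularity))) + BIAS
    return 1.0 / (1.0 + math.exp(-z))


def rerank(candidates: List[Tuple[str, float, float]]) -> List[str]:
    """Sort (title, tfidf, popularity) candidates by descending probability."""
    ranked = sorted(candidates, key=lambda c: relevance(c[1], c[2]), reverse=True)
    return [title for title, _, _ in ranked]


# Invented candidates: "B" has a lower TF-IDF score than "A" but is far more popular.
docs = [("A", 0.9, 0.1), ("B", 0.8, 2.0), ("C", 0.2, 0.5)]
print(rerank(docs))  # ['B', 'A', 'C']
```

With these placeholder weights, the popular article "B" overtakes "A" even though its TF-IDF score is lower, which is the kind of re-ordering the popularity features are meant to enable.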
Before using the model, make sure that all required packages are installed by running the command:

.. code:: bash

    python -m deeppavlov install en_ranker_pop_enwiki20180211
Building the model
------------------
.. code:: python

    from deeppavlov import configs
    from deeppavlov.core.commands.infer import build_model

    ranker = build_model(configs.doc_retrieval.en_ranker_pop_enwiki20180211, load_trained=True)

    # The ranker takes a batch of queries and returns one list of titles per query
    result = ranker(['Who is Ivan Pavlov?'])
    print(result[0][:5])

    # >> ['Ivan Pavlov', 'Vladimir Bekhterev', 'Classical conditioning', 'Valentin Pavlov', 'Psychology']
The text of the returned articles can then be retrieved by title with the :class:`~deeppavlov.vocabs.wiki_sqlite.WikiSQLiteVocab` class.
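The sketch below does not use ``WikiSQLiteVocab``. It only illustrates title-to-text lookup in a SQLite database with the standard ``sqlite3`` module. The schema (a ``documents`` table with ``id`` holding the title and ``text`` holding the article body) and the helper ``fetch_texts`` are assumptions made for illustration. An in-memory database with invented contents stands in for the real Wikipedia database.

```python
import sqlite3
from typing import List, Optional

# Assumed schema (hypothetical): documents(id = article title, text = article body).


def fetch_texts(conn: sqlite3.Connection, titles: List[str]) -> List[Optional[str]]:
    """Return the stored text for each title, or None if the title is missing."""
    texts = []
    for title in titles:
        row = conn.execute("SELECT text FROM documents WHERE id = ?", (title,)).fetchone()
        texts.append(row[0] if row else None)
    return texts


# In-memory stand-in for the Wikipedia database, with invented contents.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id TEXT PRIMARY KEY, text TEXT)")
conn.execute("INSERT INTO documents VALUES (?, ?)",
             ("Ivan Pavlov", "Ivan Pavlov was a Russian physiologist."))
print(fetch_texts(conn, ["Ivan Pavlov", "Unknown title"]))
```

Using a parameterized query (``?``) rather than string formatting keeps titles containing quotes from breaking the SQL.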
Running the Ranker
------------------
About 17 GB of RAM is required.
In interactive mode, the ranker returns the titles of the relevant documents. Run the following to interact with the ranker (the ``-d`` flag downloads the required data and pretrained models):

.. code:: bash

    python -m deeppavlov interact en_ranker_pop_enwiki20180211 -d
Available Data and Pretrained Models
------------------------------------
Information about the popularity of Wikipedia articles is downloaded to
and the pretrained logistic regression classifier is downloaded to
``~/.deeppavlov/models/odqa/logreg_3features.joblib`` by default.