InformationRetrievalTasks

Implementations of simple information retrieval (IR) models:

Vector model (http://www.minerazzi.com/tutorials/term-vector-3.pdf)

Every document and request is represented as a vector in a vector space. Documents are ranked by relevance using the cosine of the angle between the request vector and the document vector.
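Below is a minimal sketch of this ranking. It assumes raw term-frequency vectors and a plain lowercase whitespace tokenizer. The actual scripts depend on pymorphy2 and may preprocess and weight terms differently (for example with TF-IDF). The names `term_vector`, `cosine` and `rank` are hypothetical and are not the functions in `stat_model.py`:

```python
import math
from collections import Counter


def term_vector(text):
    # Hypothetical tokenizer: lowercase + whitespace split (no lemmatization)
    return Counter(text.lower().split())


def cosine(a, b):
    # Dot product over shared terms divided by the product of vector lengths
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector has no direction; treat as irrelevant
    return dot / (norm_a * norm_b)


def rank(request, documents):
    # Returns (document index, cosine score) pairs, most relevant first
    q = term_vector(request)
    scored = [(i, cosine(q, term_vector(d))) for i, d in enumerate(documents)]
    # sorted() is stable, so documents with equal scores keep their order
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

For example, `rank("vector retrieval", ["information retrieval models", "cooking recipes", "vector space retrieval"])` puts document 2 first (cosine 2/sqrt(6), about 0.82), then document 0 (1/sqrt(6), about 0.41), then document 1 (0).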

Language model (https://en.wikipedia.org/wiki/Language_model)

A statistical language model is a probability distribution over sequences of words. Relevance is computed as a weighted sum of the probability of the request in the document and its probability in the whole document collection (I use a lambda coefficient to weight them).
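Below is a minimal sketch of this scoring. It assumes maximum-likelihood unigram estimates and a whitespace tokenizer. It also assumes that the per-term mixture `lam * P(t|d) + (1 - lam) * P(t|C)` is multiplied across request terms, which the code computes as a sum of logs. This lambda mixture is commonly called Jelinek-Mercer smoothing. The function names and the default `lam=0.5` are hypothetical and are not taken from `language_model.py`:

```python
import math
from collections import Counter


def tokens(text):
    # Hypothetical tokenizer: lowercase and split on whitespace
    return text.lower().split()


def log_likelihood(request, doc_counts, coll_counts, coll_len, lam):
    """log P(request | document), mixing document and collection probabilities."""
    doc_len = sum(doc_counts.values())
    total = 0.0
    for term in tokens(request):
        p_coll = coll_counts[term] / coll_len if coll_len else 0.0
        if p_coll == 0.0:
            # Term occurs nowhere in the collection: it would affect every
            # document equally, so skip it instead of taking log(0)
            continue
        p_doc = doc_counts[term] / doc_len if doc_len else 0.0
        total += math.log(lam * p_doc + (1 - lam) * p_coll)
    return total


def rank(request, documents, lam=0.5):
    # lam < 1 keeps the collection term, so no seen term gets probability 0
    if not 0.0 <= lam < 1.0:
        raise ValueError("lam must be in [0, 1)")
    doc_counts = [Counter(tokens(d)) for d in documents]
    coll_counts = Counter()
    for counts in doc_counts:
        coll_counts.update(counts)  # collection model = all documents pooled
    coll_len = sum(coll_counts.values())
    scored = [
        (i, log_likelihood(request, c, coll_counts, coll_len, lam))
        for i, c in enumerate(doc_counts)
    ]
    # Highest log-likelihood first; ties keep the original document order
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

With the documents `["vector space retrieval", "language model retrieval", "cooking recipes"]` and the request `"language model"`, document 1 ranks first with score `2 * log(11/48)`. The two other documents tie at `2 * log(1/16)`, because they can only draw on the collection probability.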

Requirements

python3

pymorphy2

How to run

Vector model:

python stat_model.py [ARTICLE_PATH] [REQUEST_PATH]

Example: python stat_model.py ./data/art_2019/art_1.txt ./data/requests/req_2019.txt

Language model:

python language_model.py [ARTICLE_PATH] [REQUEST_PATH] [epsilon]

epsilon - parameter for smoothing

Example: python language_model.py ./data/art_2019/art_1.txt ./data/requests/req_2019.txt 1e-3
