This repository contains all the relevant scripts and code for the course IR: Information Retrieval.
- Verify Zipf's Law for the provided dataset
- Implement Porter's Stemmer and a stemmer for Bengali
- Explore the rules for the above
- Implement pre-processing (tokenization, stop word removal and stemming)
- Create a document index from the same
- Implement boolean retrieval based on the same for the following languages:
  - English
  - Bengali
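
The pre-processing, indexing and boolean retrieval tasks above can be sketched roughly as follows. This is only an illustration and not the code in this repository: `STOP_WORDS`, `preprocess`, `build_index` and the `boolean_*` helpers are hypothetical names, the stop word list is a tiny placeholder, and stemming is left out. The regex tokenizer is a simplification that suits English-like text; Bengali text would likely need a tokenizer that keeps combining vowel signs attached to their words.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately tiny stop word list (illustration only).
STOP_WORDS = {"a", "an", "and", "the", "of", "is", "in", "to"}


def preprocess(text):
    """Lowercase the text, split it into word tokens, and drop stop words."""
    tokens = re.findall(r"\w+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def build_index(docs):
    """Build an inverted index: term -> set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in preprocess(text):
            index[term].add(doc_id)
    return index


def boolean_and(index, *terms):
    """Documents that contain every one of the given terms."""
    postings = [index.get(t, set()) for t in terms]
    return set.intersection(*postings) if postings else set()


def boolean_or(index, *terms):
    """Documents that contain at least one of the given terms."""
    return set().union(*(index.get(t, set()) for t in terms))


def boolean_not(index, term, all_ids):
    """Documents (out of all_ids) that do not contain the term."""
    return set(all_ids) - index.get(term, set())


# Toy collection, made up for this example.
docs = {
    1: "The cat sat in the hat",
    2: "A cat and a dog",
    3: "The dog barked",
}
index = build_index(docs)
print(boolean_and(index, "cat", "dog"))  # ids of documents with both terms
print(boolean_not(index, "cat", docs))   # ids of documents without "cat"
```

Storing postings as Python sets keeps the AND, OR and NOT operations to one line each; a sorted postings list with a merge step would be the more traditional alternative.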
- Please ensure that you have downloaded the unzipped datasets in a `data` directory with the name of the language that you want to analyse in the code.
- Install the required dependencies with `pip install -r requirements.txt`.
- Run the code file for the particular assignment from the root of the project. For example, `python assignment2/english.py`.
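
For the Zipf's Law task in the list above, one possible way to check the law is to rank terms by frequency and fit a straight line to log(frequency) against log(rank). Zipf's Law predicts that frequency is roughly proportional to 1/rank, so the fitted slope should come out close to -1. The sketch below is an assumption about how such a check could look, not the repository's own script; `rank_frequency` and `zipf_slope` are hypothetical names, and the simple regex tokenizer is only appropriate for English-like text.

```python
import math
import re
from collections import Counter


def rank_frequency(text):
    """Return (rank, frequency) pairs, with the most frequent term at rank 1."""
    counts = Counter(re.findall(r"\w+", text.lower()))
    freqs = sorted(counts.values(), reverse=True)
    return list(enumerate(freqs, start=1))


def zipf_slope(pairs):
    """Least-squares slope of log(frequency) against log(rank).

    Needs at least two (rank, frequency) pairs, otherwise the
    variance of log(rank) is zero and the slope is undefined.
    """
    xs = [math.log(rank) for rank, _ in pairs]
    ys = [math.log(freq) for _, freq in pairs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var
```

On a real corpus, `zipf_slope(rank_frequency(text))` gives a single number to compare against -1; plotting the pairs on log-log axes is a useful complement, since the fit is usually worst at the very top and the long tail of the ranking.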