# Introduction to PyTerrier

_IN4325: Information retrieval lecture, TU Delft_

**Part 1: Setup**

[Terrier](http://terrier.org) is an open-source information retrieval platform aimed at research and experimentation. In this lecture, we'll use [PyTerrier](https://pyterrier.readthedocs.io/), which provides a Python API for Terrier. This series of notebooks gives a brief introduction to PyTerrier.

## Installation

PyTerrier can be installed using `pip`:


In [None]:
%pip install python-terrier

Consider installing into a virtual environment, for example one created with [`venv`](https://docs.python.org/3/library/venv.html) or [`conda`](https://www.anaconda.com/download). You'll also need an up-to-date version of the [Java development kit](https://www.oracle.com/java/technologies/downloads/) installed and the `JAVA_HOME` environment variable set. More detailed installation instructions and troubleshooting tips can be found in the [official installation guide](https://pyterrier.readthedocs.io/en/latest/installation.html).
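A quick way to check the `JAVA_HOME` requirement from Python is sketched below. The helper name `check_java_home` is made up for illustration and is not part of PyTerrier; it only inspects the environment variable and does not verify the Java version.

```python
import os
from pathlib import Path


def check_java_home(env=None):
    """Return a short message describing the state of JAVA_HOME.

    `env` defaults to the current process environment; passing a plain
    dict makes the function easy to try out with other values.
    """
    env = os.environ if env is None else env
    java_home = env.get("JAVA_HOME")
    if not java_home:
        return "JAVA_HOME is not set"
    if not Path(java_home).is_dir():
        return f"JAVA_HOME points to a missing directory: {java_home}"
    return f"JAVA_HOME looks fine: {java_home}"


print(check_java_home())
```

If the message reports a problem, set `JAVA_HOME` to the directory of your JDK installation and restart the notebook kernel so the change is picked up.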

Now you should be able to import `pyterrier`:


In [None]:
import pyterrier as pt
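If the import fails, or you are unsure which version ended up in your environment, the standard library can report what `pip` installed. This is a minimal sketch; `installed_version` is a hypothetical helper, not a PyTerrier function.

```python
from importlib import metadata


def installed_version(distribution):
    """Return the installed version of a pip distribution, or None if absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


# "python-terrier" is the name used with pip; the import name is "pyterrier".
print(installed_version("python-terrier"))
```

A result of `None` means the package is not installed in the environment this notebook kernel runs in.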

## Configuration

As PyTerrier uses Terrier under the hood, we first need to load the corresponding Java package. At the same time, we can configure PyTerrier to display progress bars correctly in Jupyter notebooks:


In [None]:
if not pt.started():
    pt.init(tqdm="notebook")

## A test run

Time to test our setup! PyTerrier provides support for loading and indexing a large number of IR datasets (more on that later). Let's load the [ANTIQUE](https://arxiv.org/abs/1905.08957) dataset:


In [None]:
dataset = pt.get_dataset("irds:antique")

Now we can print one of the documents in the corpus:


In [None]:
from pprint import pprint

for doc in dataset.get_corpus_iter():
    pprint(doc)
    break

If a document is printed above, congratulations: the setup was successful. If not, take a look at the [troubleshooting section](https://pyterrier.readthedocs.io/en/latest/installation.html#installation-troubleshooting) in the official documentation.
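The loop with `break` shows only the first document. To look at a handful instead, `itertools.islice` is one option. The sketch below uses a hypothetical helper `peek` and a stand-in generator with made-up documents, so it runs without downloading any data.

```python
from itertools import islice
from pprint import pprint


def peek(iterable, n=3):
    """Return the first n items of an iterable as a list."""
    return list(islice(iterable, n))


def fake_corpus():
    # Stand-in for dataset.get_corpus_iter(); these documents are made up.
    for i in range(1000):
        yield {"docno": f"doc{i}", "text": f"placeholder text {i}"}


pprint(peek(fake_corpus(), n=2))
```

In the notebook, `peek(dataset.get_corpus_iter())` should work the same way, assuming the iterator yields one dictionary per document as in the cell above.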
