# Introduction to PyTerrier
__Part 1: Setup__

[Terrier](http://terrier.org) is an open-source information retrieval platform aimed at research and experimentation. In this lecture, we'll use [PyTerrier](https://pyterrier.readthedocs.io/), which provides a Python API for Terrier. This series of notebooks gives a brief introduction to PyTerrier.

## Installation
PyTerrier can be installed using `pip`:

In [1]:
%pip install python-terrier

Note: you may need to restart the kernel to use updated packages.


Consider installing into a virtual environment, such as [`venv`](https://docs.python.org/3/library/venv.html) or [`conda`](https://www.anaconda.com/download). You'll also need an up-to-date version of the [Java Development Kit](https://www.oracle.com/java/technologies/downloads/) (JDK) installed and the `JAVA_HOME` environment variable set. More detailed installation instructions and troubleshooting tips can be found in the [installation guide](https://pyterrier.readthedocs.io/en/latest/installation.html).
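Before importing PyTerrier, it can help to confirm that Java is discoverable from Python. The following is a minimal sketch using only the standard library; the helper name `check_java_environment` is hypothetical and not part of PyTerrier, and a passing check does not guarantee that the installed Java version is recent enough.

```python
import os
import shutil


def check_java_environment(env=None):
    """Report whether JAVA_HOME is set and whether `java` is on the PATH.

    `env` defaults to os.environ; passing a plain dict makes the check
    easy to try out with different settings.
    """
    env = os.environ if env is None else env
    java_home = env.get("JAVA_HOME")
    return {
        # Value of JAVA_HOME, or None if it is not set
        "java_home": java_home,
        # True only if JAVA_HOME is set and points to an existing directory
        "java_home_is_dir": bool(java_home) and os.path.isdir(java_home),
        # Full path of the `java` executable, or None if it is not found
        "java_on_path": shutil.which("java", path=env.get("PATH")),
    }


print(check_java_environment())
```

If `java_home` is `None` or `java_home_is_dir` is `False`, set `JAVA_HOME` to the JDK installation directory and restart the kernel before continuing.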

Now you should be able to import `pyterrier`:

In [2]:
import pyterrier as pt

## Configuration
As PyTerrier uses Terrier under the hood, we first need to load the corresponding Java package. We can also set PyTerrier up to display progress bars correctly in Jupyter notebooks:

In [3]:
if not pt.started():
    pt.init(tqdm="notebook")

PyTerrier 0.10.0 has loaded Terrier 5.8 (built by craigm on 2023-11-01 18:05) and terrier-helper 0.0.8



## A test run
Time to test our setup! PyTerrier provides support for loading and indexing a large number of IR datasets (more on that later). Let's load the [ANTIQUE](https://arxiv.org/abs/1905.08957) dataset:

In [4]:
dataset = pt.get_dataset("irds:antique")

Now we can print one of the documents in the corpus:

In [5]:
from pprint import pprint

for doc in dataset.get_corpus_iter():
    pprint(doc)
    break

antique documents:   0%|          | 0/403666 [00:00<?, ?it/s]

[INFO] Please confirm you agree to the authors' data usage agreement found at <https://ciir.cs.umass.edu/downloads/Antique/readme.txt>
[INFO] If you have a local copy of https://ciir.cs.umass.edu/downloads/Antique/antique-collection.txt, you can symlink it here to avoid downloading it again: /home/steve/.ir_datasets/downloads/684f7015aff377062a758e478476aac8
[INFO] [starting] https://ciir.cs.umass.edu/downloads/Antique/antique-collection.txt

[INFO] [finished] https://ciir.cs.umass.edu/downloads/Antique/antique-collection.txt: [01:14] [93.6MB] [1.26MB/s]

{'docno': '2020338_0',
 'text': 'A small group of politicians believed strongly that the fact that '
         'Saddam Hussien remained in power after the first Gulf War was a '
         'signal of weakness to the rest of the world, one that invited '
         'attacks and terrorism. Shortly after taking power with George Bush '
         'in 2000 and after the attack on 9/11, they were able to use the '
         'terrorist attacks to justify war with Iraq on this basis and '
         'exaggerated threats of the development of weapons of mass '
         'destruction. The military strength of the U.S. and the brutality of '
         "Saddam's regime led them to imagine that the military and political "
         'victory would be relatively easy.'}


If a document is printed above, congratulations: the setup was successful! If not, take a look at the [troubleshooting section](https://pyterrier.readthedocs.io/en/latest/installation.html#installation-troubleshooting) in the official documentation.
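The `for ... break` pattern above prints exactly one document. To look at the first few instead, `itertools.islice` is a convenient alternative. The sketch below uses a hypothetical helper `peek` and a stand-in `toy_corpus` generator so that it runs without downloading anything; in the notebook, `dataset.get_corpus_iter()` could be passed in its place.

```python
from itertools import islice
from pprint import pprint


def peek(iterable, n=3):
    """Return the first `n` items of an iterable as a list."""
    return list(islice(iterable, n))


# Stand-in corpus (an assumption for illustration only); each item mimics
# the {'docno': ..., 'text': ...} shape of the document printed above.
toy_corpus = (
    {"docno": f"toy_{i}", "text": f"placeholder text {i}"} for i in range(1000)
)

for doc in peek(toy_corpus, n=2):
    pprint(doc)
```

Because `islice` stops after `n` items, the remaining documents are never read, which matters for a corpus with several hundred thousand entries.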