## Getting started with RadIO

Welcome to the first notebook in a series of tutorials on deep learning research into lung cancer using RadIO. Here you will (1) learn RadIO's general approach to working with a dataset of scans and (2) perform basic preprocessing operations (loading data into memory and resizing scans). As you will see in a minute, (1) saves you the trouble of looping through a dataset that cannot fit into memory. In turn, (2) will give you a good idea of how to build complex preprocessing pipelines with RadIO.

In this tutorial you will work with the [LUNA16 competition dataset](https://luna16.grand-challenge.org/), which consists of 888 cancer-annotated Computed Tomography scans (CT-scans). The annotations are simply a csv-table that contains the location and diameter of cancerous nodules. To follow the tutorial, we advise you to download at least part of this dataset from [this link](https://luna16.grand-challenge.org/download/) (registration is required).

Alternatively, you can download a small set of scans along with cancer-annotations [here](some-dropbox-link). The archive takes no more than 500 MB and is just enough to get started with RadIO.

### 1. Working with a dataset of scans with ease

The very first thing that one should understand about CT-scans is that they are **voluminous**. It is not possible to fit more than a few scans into memory. The only way to work with such large datasets is through an **indexing structure** (think of ...). In RadIO, you can create this structure in two lines of code:

##### Note
RadIO works with datasets using the *Dataset-framework.* Check [the link](https://github.com/analysiscenter/dataset) to find out more.

In [2]:
import sys
sys.path.append('../')

In [3]:
from radio.dataset import FilesIndex
LUNA_MASK = 'D:/SCANS/v-20161117/*'  # set the mask for scans from the Luna dataset here
luna_index = FilesIndex(path=LUNA_MASK, dirs=True)  # prepare the indexing structure
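
To make the idea of an indexing structure concrete, here is a minimal sketch in plain Python. It is not how RadIO or the Dataset framework implements `FilesIndex`; the helpers `build_index` and `iter_batches` and the temporary folder are hypothetical, introduced only for illustration. The point is that the index stores scan ids and paths, never the scans themselves, so you can walk through the dataset in small batches.

```python
import os
import tempfile


def build_index(root):
    """Map each scan id (sub-directory name) to its path; no scan data is read."""
    return {name: os.path.join(root, name)
            for name in sorted(os.listdir(root))
            if os.path.isdir(os.path.join(root, name))}


def iter_batches(index, batch_size):
    """Yield lists of scan ids, batch_size ids at a time."""
    ids = list(index)
    for start in range(0, len(ids), batch_size):
        yield ids[start:start + batch_size]


# Hypothetical stand-in for a folder of scans: five empty directories.
root = tempfile.mkdtemp()
for i in range(5):
    os.mkdir(os.path.join(root, "scan_%d" % i))

index = build_index(root)
batches = list(iter_batches(index, batch_size=2))
print(batches)
```

Only the ids in the current batch would need to be loaded into memory at any given moment, which is what makes datasets larger than memory manageable.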

RadIO works with scans using the *batch-classes* **`CTImagesBatch`** and **`CTImagesMaskedBatch`**, which describe the logic of processing batches of CT-scans (for example, `load` data from disk or `resize` scans to a different shape). When we combine the data-processing logic and the index, we obtain the `dataset`:

In [4]:
from radio.dataset import Dataset
from radio import CTImagesMaskedBatch
luna_dataset = Dataset(index=luna_index, batch_class=CTImagesMaskedBatch)

...and that's it: all the necessary prerequisites are set up. It is time to get your hands on RadIO's real data-processing operations: `load` and `resize`.

### 2. Preprocessing with RadIO

The RadIO team thinks of preprocessing as a chained sequence of operations called `actions`. A sequence of actions is called a `Pipeline`. Whatever preprocessing you want to do, you should start by setting up a pipeline in **lazy-mode**. That is, you only describe what will happen to the data, and no real computation is performed.

First and foremost, you need to load scan data from disk. Let us set up a short pipeline for this task.

In [18]:
from radio.dataset import Pipeline
preprocessing = Pipeline() # initialize empty Pipeline-object
preprocessing = preprocessing.load(fmt='raw') # the thing is lazy. No calculation is performed here!

What we did was assemble a `Pipeline` from a single action, `load`, with the argument `fmt='raw'`. The argument `fmt` specifies the data format, and `raw` stands for the **MetaImage format** in which the Luna dataset is stored.

Real calculation starts only when we pass a dataset through a pipeline.

For example, you can generate a batch of 3 scans with loaded data using this code:

In [None]:
batch = (luna_dataset >> preprocessing).next_batch(batch_size=3)
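
The lazy behaviour described above can be illustrated with a toy pipeline in plain Python. This is only a sketch of the general idea, not RadIO's implementation: the class `LazyPipeline`, its methods `add` and `run`, and the stand-in `load` and `resize` functions are all hypothetical. Adding actions merely records them; the work happens when a batch is passed through.

```python
class LazyPipeline:
    """Toy pipeline: records actions first, runs them only on request."""

    def __init__(self, actions=()):
        self.actions = list(actions)

    def add(self, func, **kwargs):
        # Return a new pipeline with one more recorded action; nothing runs yet.
        return LazyPipeline(self.actions + [(func, kwargs)])

    def run(self, batch):
        # Only here are the recorded actions applied, in order.
        for func, kwargs in self.actions:
            batch = func(batch, **kwargs)
        return batch


calls = []  # log of executed actions, to show when work actually happens


def load(batch, fmt):
    calls.append("load")
    return ["%s loaded as %s" % (item, fmt) for item in batch]


def resize(batch, shape):
    calls.append("resize")
    return ["%s, resized to %s" % (item, shape) for item in batch]


pipeline = LazyPipeline().add(load, fmt="raw").add(resize, shape=(128, 256, 256))
calls_before_run = list(calls)  # still empty: nothing has been computed
result = pipeline.run(["scan_0", "scan_1"])
print(calls_before_run, calls, result[0])
```

Separating the description of the work from its execution means the same pipeline can be reused on any dataset and applied one batch at a time.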

RadIO includes the following actions:
* load (read data from disk)
* resize (change the shape of scans to, say, **[128, 256, 256]**)
* unify_spacing (resample scans so that all of them have the same spacing, say, **[1.0, 1.0, 1.0]** mm)
* dump (save scans, possibly preprocessed, to disk)
* ...and many more

Go [here](link-on-preprocessing-page) to see the full list.
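
To see why unifying spacing changes the shape of a scan, consider the arithmetic behind resampling. The sketch below is plain Python and does not call RadIO; the function `shape_after_respacing` and the example scan are hypothetical, and RadIO's own `unify_spacing` may differ in details. It assumes only that the physical size of a scan along each axis (number of voxels times spacing, in mm) stays the same after resampling.

```python
def shape_after_respacing(shape, spacing, target_spacing=(1.0, 1.0, 1.0)):
    """Voxel counts a scan needs along each axis to keep its physical size
    (voxels * spacing, in mm) when resampled to target_spacing."""
    return tuple(int(round(n * s / t))
                 for n, s, t in zip(shape, spacing, target_spacing))


# Hypothetical scan: 120 slices 2.5 mm apart, 512 x 512 pixels of 0.7 mm each.
new_shape = shape_after_respacing((120, 512, 512), (2.5, 0.7, 0.7))
print(new_shape)
```

Scans acquired with different spacings therefore end up with different shapes after respacing, which is why a pipeline that feeds a neural network typically also has to bring scans to one common shape.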