In [None]:
import sys
sys.path.append('../')

# Diving into scan preprocessing with RadIO

## Quick reminder

Hello again! This is the second tutorial in the series dedicated to lung cancer research with RadIO. In the [first notebook](link-on-first) we used RadIO to create a `Dataset` of scans from the [LUNA16 competition dataset](https://luna16.grand-challenge.org/). In short, a `Dataset` simplifies working with large datasets that do not fit in memory (see more [here](link-on-dataset)). Setting up a `Dataset` takes only a few lines of code:

In [None]:
from radio.dataset import FilesIndex, Dataset
from radio import CTImagesMaskedBatch

LUNA_MASK = '/notebooks/data/MRT/luna/s*/*.mhd'                            # set glob-mask for scans from Luna-dataset here
luna_index = FilesIndex(path=LUNA_MASK, no_ext=True)                       # preparing indexing structure
luna_dataset = Dataset(index=luna_index, batch_class=CTImagesMaskedBatch)

We've also seen how easy it is to build simple preprocessing pipelines that `load` data from disk and `resize` scans to a different shape:

In [None]:
from radio.dataset import Pipeline
preprocessing = (Pipeline()                      # initialize empty workflow
                 .load(fmt='raw')                # add load of scans from MetaImage to the workflow
                 .resize(shape=(92, 256, 256)))  # add resize to a shape to the workflow. Nothing is computed here,
                                                 # the whole thing is lazy!

...and generate a batch of 3 loaded and resized scans:

In [None]:
batch = (luna_dataset >> preprocessing).next_batch(3)  # pass a batch of luna-scans of size 3 through the workflow 
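
The laziness noted in the comments above can be pictured with a toy example. This is only a sketch of the general idea, not RadIO's implementation: the class `LazyPipeline` and the functions `load` and `scale` are hypothetical names invented for the illustration. Steps are recorded when the workflow is declared and executed only when data is actually requested.

```python
class LazyPipeline:
    """Toy stand-in for a lazy workflow (hypothetical, not RadIO's Pipeline)."""

    def __init__(self):
        self._steps = []                    # recorded (function, kwargs) pairs

    def add(self, func, **kwargs):
        self._steps.append((func, kwargs))  # only remember the step, do not run it
        return self                         # returning self allows chaining

    def run(self, data):
        for func, kwargs in self._steps:    # steps execute here, in declaration order
            data = func(data, **kwargs)
        return data


calls = []                                  # log of executed steps


def load(item, fmt):
    # fmt is accepted only to mimic a load-like signature; it is unused here
    calls.append("load")
    return [float(v) for v in item]


def scale(item, factor):
    calls.append("scale")
    return [v * factor for v in item]


workflow = LazyPipeline().add(load, fmt="raw").add(scale, factor=2.0)
calls_before_run = list(calls)              # nothing has been executed yet
result = workflow.run(["1", "2", "3"])      # both steps run only now
```

Declaring the workflow separately from running it is what allows the same description to be applied to batch after batch without holding the whole dataset in memory.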

In this tutorial we dive deeper into preprocessing with RadIO. We will cover actions that let you considerably augment the Luna dataset: `sample_nodules`, which samples cancerous and non-cancerous scan crops, as well as `unify_spacing`, `rotate` and `central_crop`. We will also cover the actions `create_mask` and `fetch_nodules_info`, which make it easy to transform [Luna cancer annotations](link-on-annots) into *cancerous masks*, the **target (Y)** for segmenting nets (think of [Vnet](Vnet-link)). In short, after reading this tutorial you will be able to prepare a large, augmented dataset of crops for training [Vnet](Vnet-link).
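
To make the notion of a *cancerous mask* concrete before we get to the RadIO actions, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not what `create_mask` does internally: the helper `make_nodule_mask` is hypothetical, the nodule is modelled as a sphere, and its center is assumed to be already converted from world coordinates to voxel indices.

```python
import numpy as np


def make_nodule_mask(shape, center, diameter, spacing):
    """Binary mask of one spherical nodule (hypothetical helper, not a RadIO call).

    shape    -- scan shape in voxels, (z, y, x)
    center   -- nodule center in voxel indices, (z, y, x)
    diameter -- nodule diameter in mm
    spacing  -- voxel size in mm along (z, y, x)
    """
    grids = np.ogrid[tuple(slice(0, s) for s in shape)]  # open grids of voxel indices
    # squared distance from the center in mm, accounting for anisotropic voxels
    dist_sq = sum(((g - c) * sp) ** 2 for g, c, sp in zip(grids, center, spacing))
    return (dist_sq <= (diameter / 2.0) ** 2).astype(np.uint8)


# a 6 mm nodule in the middle of a small scan with 1 mm isotropic voxels
mask = make_nodule_mask(shape=(8, 16, 16), center=(4, 8, 8),
                        diameter=6.0, spacing=(1.0, 1.0, 1.0))
```

The resulting array has the same shape as the scan, with ones inside the nodule and zeros elsewhere, which is the kind of voxel-wise target a segmenting net is trained against.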

### Augmentation of Luna

### `unify_spacing`: alternative to `resize`

### Some more augmenting actions

### Preparing a target tensor for a segmenting net