# Example: Set Collection and Process Tiles

### Some Assumptions You Should Know

To keep the framework simple and consistent, organize the folders of raw data that will be processed into collections as shown below.
See [this example](../uab_collectionFunctions.py.py) for more details.
- Server (some NAS shared by the group)
    - [Dataset Name 1] (name of the dataset, e.g. 'inria', 'isprs', 'urban_mapper'; avoid spaces in the name)
        - data (all the raw and processed files associated with this dataset)
            - Original_Tiles (the raw data, including RGB data as well as ground truth)
                - Naming rule for files: [CityName]\_[TileName]\_[fileType].[extension]
                - e.g. Austin_1_RGB.tif
            - Processed_Tiles (Directory for all preprocessed tiles of this dataset, organized by folder)
                - [preproc_result]
                    - [TileName]\_[preprocExtension].[extension]
                
        - meta_data
            - collection.txt (file that is updated each time a new channel is made using preprocessing)
            - colTileNames.txt (file that contains the name of each tile in the collection without extensions)
            - mean_values.npy (file that contains the mean value of each channel)
        - collectionMeta.txt (a user-made file that specifies information of interest about this collection, e.g. data resolution)
    - [Dataset Name 2]
    - ...
    - [Dataset Name N]
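
The naming rule for files in `Original_Tiles` can be checked mechanically. The following is a minimal sketch and not part of the framework: `parse_tile_filename` and `TileFile` are hypothetical names, and the sketch assumes that neither the city name nor the tile name contains an underscore.

```python
import os
from collections import namedtuple

# Hypothetical container for the parts of a raw tile file name.
TileFile = namedtuple('TileFile', ['city', 'tile', 'file_type', 'extension'])


def parse_tile_filename(path):
    """Split '[CityName]_[TileName]_[fileType].[extension]' into its parts.

    Assumes no underscores inside the city or tile name.
    """
    base = os.path.basename(path)
    stem, ext = os.path.splitext(base)
    parts = stem.split('_')
    if len(parts) != 3 or not all(parts) or not ext:
        raise ValueError('File name does not follow the naming rule: %s' % base)
    return TileFile(parts[0], parts[1], parts[2], ext[1:])


print(parse_tile_filename('Austin_1_RGB.tif'))
```

Running a check like this over `Original_Tiles` before making a collection can catch misnamed files early.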

### Use the Framework in Your Settings

**A. Change [uabRepoPaths.py](../uabRepoPaths.py)**

In [1]:
import os
# This example applies when one folder on one machine holds both the data and the
# results. Your setup may differ.
parentDir = r'/media/ei-edl01/data/remote_sensing_data/'
dataPath = parentDir
resPath = os.path.join(parentDir, 'Results')

`parentDir` is the folder that holds the raw files of all datasets, organized according to the rules above. As the path above suggests, we have already made some collections in `/ei-edl01/remote_sensing_data/`. `dataPath` is where the data lives, and `resPath` is where the results go, including extracted patches and experiment results. We recommend making `resPath` a local path, because during training the data should sit as close to the GPU as possible.
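
If your data sits on a shared server while results should go to a local disk, the two paths can be set independently. The snippet below is only a sketch of that idea: `make_repo_paths` is a hypothetical helper, and the local folder `~/uab_results` is an assumed example location, not something the framework requires.

```python
import os


def make_repo_paths(data_dir, res_dir=None):
    """Return (dataPath, resPath).

    Hypothetical helper: if no results folder is given, fall back to a
    'Results' folder next to the data, as in the cell above.
    """
    if res_dir is None:
        res_dir = os.path.join(data_dir, 'Results')
    return data_dir, res_dir


# Data stays on the shared drive; results go to an assumed local folder.
dataPath, resPath = make_repo_paths(
    r'/media/ei-edl01/data/remote_sensing_data/',
    res_dir=os.path.join(os.path.expanduser('~'), 'uab_results'),
)
print(dataPath)
print(resPath)
```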

**B. Make Collection**

Here is an example script:

In [6]:
import uab_collectionFunctions
import danielCustom.uabPreprocClasses
import uabPreprocClasses

blCol = uab_collectionFunctions.uabCollection('inria_orgd')
opDetObj = danielCustom.uabPreprocClasses.uabOperTileDiffRescale(13, 7)
rescObj = uabPreprocClasses.uabPreprocMultChanOp(
    [1, 2, 3], 'RDIFF.tif', 'Linearly rescale difference between R & B', [1, 2], opDetObj)
rescObj.run(blCol)



'/media/ei-edl01/data/remote_sensing_data/inria_orgd/data/TilePreproc/MultChanOp_chans1-2_DiffResc_rF13p000_rB7p000'

The first statement, `blCol = uab_collectionFunctions.uabCollection('inria_orgd')`, makes a new collection that reads data from `/media/ei-edl01/data/remote_sensing_data/inria_orgd`. The next two statements define a linear operation on channels `[1, 2]` (the second and third channels). The operation classes are defined as follows:

In [5]:
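import util_functions  # repository helper module; provides d2s()


class uabOperTileOps(object):
    """Base class for operations that combine several tile channels."""
    def __init__(self, defName):
        self.defaultName = defName

    def getName(self):
        # Name of the operation (it appears in the output folder name above)
        raise NotImplementedError('Must be implemented by the subclass')

    def run(self, tiles):
        # Compute the new channel from the input tiles
        raise NotImplementedError('Must be implemented by the subclass')


class uabOperTileDiffRescale(uabOperTileOps):
    """Linearly rescale the difference between two channels."""
    def __init__(self, rescFact, rescBias, defName='DiffResc'):
        super(uabOperTileDiffRescale, self).__init__(defName)
        self.rescFact = rescFact
        self.rescBias = rescBias

    def getName(self):
        # Encode both parameters in the name, e.g. DiffResc_rF13p000_rB7p000
        return '%s_rF%s_rB%s' % (self.defaultName,
                                 util_functions.d2s(self.rescFact, 3),
                                 util_functions.d2s(self.rescBias, 3))

    def run(self, tiles):
        # rescFact * (second tile - first tile) + rescBias
        return self.rescFact * (tiles[1] - tiles[0]) + self.rescBias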
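
To make the arithmetic in `run()` concrete, here is a small sketch that applies the same formula, with the factor 13 and bias 7 from the example script, to two short lists standing in for tile channels. `diff_rescale` is a hypothetical stand-alone function written for illustration; real tiles are assumed to be arrays rather than Python lists.

```python
def diff_rescale(chan_a, chan_b, resc_fact, resc_bias):
    """Apply resc_fact * (b - a) + resc_bias element by element."""
    return [resc_fact * (b - a) + resc_bias for a, b in zip(chan_a, chan_b)]


chan_a = [10, 20, 30]   # plays the role of tiles[0]
chan_b = [12, 20, 25]   # plays the role of tiles[1]
result = diff_rescale(chan_a, chan_b, 13, 7)
print(result)  # [33, 7, -58]
```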

To define your own operation, write a class that inherits from `uabOperTileOps` and override both `getName()` and `run()`.
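
As an illustration of such a subclass, the sketch below defines a weighted sum of two channels. Everything except the base class is hypothetical: `uabOperTileWeightedSum` and the `_fmt` helper are names made up for this example, `_fmt` only imitates the `13p000` style seen in the folder name above, and the base class is repeated here so that the snippet runs on its own.

```python
class uabOperTileOps(object):
    """Stand-in copy of the base class shown above."""
    def __init__(self, defName):
        self.defaultName = defName

    def getName(self):
        raise NotImplementedError('Must be implemented by the subclass')

    def run(self, tiles):
        raise NotImplementedError('Must be implemented by the subclass')


def _fmt(value):
    # Hypothetical formatter: three decimals, with '.' replaced by 'p'.
    return ('%.3f' % value).replace('.', 'p')


class uabOperTileWeightedSum(uabOperTileOps):
    """Hypothetical operation: w0 * tiles[0] + w1 * tiles[1]."""
    def __init__(self, w0, w1, defName='WSum'):
        super(uabOperTileWeightedSum, self).__init__(defName)
        self.w0 = w0
        self.w1 = w1

    def getName(self):
        # Put the parameters in the name so different settings stay distinct.
        return '%s_w%s_w%s' % (self.defaultName, _fmt(self.w0), _fmt(self.w1))

    def run(self, tiles):
        return self.w0 * tiles[0] + self.w1 * tiles[1]


op = uabOperTileWeightedSum(0.5, 0.5)
print(op.getName())    # WSum_w0p500_w0p500
print(op.run([2, 4]))  # 3.0
```

The example calls `run()` with plain numbers to stay dependency-free; the same expression should also work on array-like tiles that support element-wise arithmetic.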

Once the collection has been made, call `readMetadata()` to view its metadata. It lists the channels that exist in this dataset, with each channel's index at the beginning of the line and its file extension at the end.

In [7]:
blCol.readMetadata()

Description:  these are all the preprocessed tiles available for this dataset.  Use the indexes output on the start of each line to select this tile-type when going to patch extraction in the following step
[0] Original Layer 0: Original_Tiles, [ext: GT.tif]
[1] Channel RGB Layer 0: TilePreproc/TileChanSplit_chan0, [ext: RGB0.tif]
[2] Channel RGB Layer 1: TilePreproc/TileChanSplit_chan1, [ext: RGB1.tif]
[3] Channel RGB Layer 2: TilePreproc/TileChanSplit_chan2, [ext: RGB2.tif]
[4] Linearly rescale difference between R & B: TilePreproc/MultChanOp_chans1-2_DiffResc_rF13p000_rB7p000, [ext: RDIFF.tif]
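
If you want to pick indexes programmatically instead of reading them off the listing, the printed lines can be parsed. This is a sketch under assumptions: the line format is inferred only from the output shown above, `parse_metadata_lines` and `ChannelEntry` are hypothetical names, and the text is pasted in as a string because the return value of `readMetadata()` is not documented here.

```python
import re
from collections import namedtuple

# Hypothetical record for one line of the listing.
ChannelEntry = namedtuple('ChannelEntry', ['index', 'description', 'folder', 'ext'])

# Format assumed from the output above: "[idx] description: folder, [ext: EXT]"
_LINE_RE = re.compile(r'^\[(\d+)\]\s+(.*): (\S+), \[ext: ([^\]]+)\]$')


def parse_metadata_lines(text):
    """Return a ChannelEntry for every line that matches the assumed format."""
    entries = []
    for line in text.splitlines():
        match = _LINE_RE.match(line.strip())
        if match:
            idx, desc, folder, ext = match.groups()
            entries.append(ChannelEntry(int(idx), desc, folder, ext))
    return entries


listing = """\
[0] Original Layer 0: Original_Tiles, [ext: GT.tif]
[1] Channel RGB Layer 0: TilePreproc/TileChanSplit_chan0, [ext: RGB0.tif]
[2] Channel RGB Layer 1: TilePreproc/TileChanSplit_chan1, [ext: RGB1.tif]
[3] Channel RGB Layer 2: TilePreproc/TileChanSplit_chan2, [ext: RGB2.tif]
[4] Linearly rescale difference between R & B: TilePreproc/MultChanOp_chans1-2_DiffResc_rF13p000_rB7p000, [ext: RDIFF.tif]
"""

entries = parse_metadata_lines(listing)
rgb_indexes = [e.index for e in entries if e.ext.startswith('RGB')]
print(rgb_indexes)  # [1, 2, 3]
```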
