# iCOR on MEP notebook
This notebook shows how to use iCOR on the MEP platform from a Jupyter notebook. The input products must be available on the MEP platform. Processing runs on the MEP Hadoop cluster (through Spark), and the outputs are written to your MEP Private folder (`/data/users/Private/<username>/icor_results`).

## step 1: specify the input products
The input products must be located in your PROBA-V MEP Public or Private folder (https://proba-v-mep.esa.int/documentation/manuals/how-get-data-mep) so that they are accessible from the Hadoop processing nodes. In the example below, each entry points to the `MTD_MSIL1C.xml` metadata file inside a Sentinel-2 `.SAFE` product folder. Replace the defaults below with your own input files.

```python
products = ['/data/users/Private/daemsd/icor_input/S2A_20170527T102031_T33UUB_20170527T102301.SAFE/MTD_MSIL1C.xml',
            '/data/users/Private/daemsd/icor_input/S2A_20170822T101031_T32TQN_20170822T101057.SAFE/MTD_MSIL1C.xml']
```
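Because every product path must be readable from the processing nodes, it can help to check the list before starting the job. The sketch below uses two hypothetical helpers that are not part of iCOR or the MEP platform: `find_s2_products` collects the `MTD_MSIL1C.xml` files of all `.SAFE` folders directly under a folder, and `missing_products` reports paths that do not exist. It assumes the Sentinel-2 layout used in the example above and only checks the paths as seen from the notebook.

```python
import glob
import os


def find_s2_products(input_folder):
    """Hypothetical helper: return the sorted paths of the MTD_MSIL1C.xml
    files of all .SAFE folders directly under input_folder."""
    pattern = os.path.join(input_folder, '*.SAFE', 'MTD_MSIL1C.xml')
    return sorted(glob.glob(pattern))


def missing_products(products):
    """Hypothetical helper: return the product paths that are not files."""
    return [product for product in products if not os.path.isfile(product)]


# Example (replace <username> with your MEP user name):
# products = find_s2_products('/data/users/Private/<username>/icor_input')
# assert not missing_products(products), missing_products(products)
```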

## step 2: specify iCOR parameters
Specify the iCOR parameters and change them if needed. All values, including numbers and booleans, are given as strings.

```python
icor_params = {
  # input data type: 'S2' (Sentinel-2) or 'L8' (Landsat 8)
  "data_type" : 'S2',
  # band to which the low cloud threshold is applied: 'B01','B02','B03','B04','B05','B06','B07','B08','B8A','B09','B10','B11' or 'B12'
  "low_band" : 'B01',
  # band used for water detection: 'B01','B02','B03','B04','B05','B06','B07','B08','B8A','B09','B10','B11' or 'B12'
  "water_band" : 'B08',
  # water detection threshold
  "water_threshold" : '0.05',
  # upper threshold on the average of the visible bands for a pixel to be detected as cloud. Range 0.0 .. 1.0.
  "average_threshold" : '0.19',
  # threshold on the low band for a pixel to be detected as cloud. Range 0.0 .. 1.0.
  "low_threshold" : '0.25',
  # apply the AOT (aerosol optical thickness) retrieval algorithm: true or false
  "aot" : 'true',
  # size in pixels of the square window used for AOT estimation
  "aot_window_size" : '100',
  # AOT override value
  "aot_override" : '0.1',
  # use the cirrus band for cloud detection: true or false
  "cirrus" : 'true',
  # cirrus cloud mask threshold value. Range 0.0 .. 1.0
  "cirrus_threshold" : '0.01',
  # apply adjacency correction (SIMEC): true or false
  "simec" : 'true',
  # apply water vapor estimation: true or false
  "watervapor" : 'false',
  # water vapor override value. Range 0.0 .. 5.0.
  "watervapor_override" : '0.1',
  # default background window size
  "bg_window" : '1',
  # ozone override value
  "ozone_override" : '0.33',
  # keep intermediate files: true or false (process_product always sets this to "false")
  "keep_intermediate" : "false"
}
```
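Since all values are strings, a typo such as `'ture'` or a threshold outside its range is easy to miss until the job runs on the cluster. The following is a hypothetical helper, not a validator shipped with iCOR: it checks the parameters against the allowed values and ranges stated in the comments above. It only checks band names for Sentinel-2, because the Landsat 8 band names are not listed here.

```python
# Allowed values and ranges, copied from the comments in the parameter cell.
S2_BANDS = {'B01', 'B02', 'B03', 'B04', 'B05', 'B06', 'B07',
            'B08', 'B8A', 'B09', 'B10', 'B11', 'B12'}
BOOLEAN_KEYS = ('aot', 'cirrus', 'simec', 'watervapor', 'keep_intermediate')
RANGES = {
    'average_threshold': (0.0, 1.0),
    'low_threshold': (0.0, 1.0),
    'cirrus_threshold': (0.0, 1.0),
    'watervapor_override': (0.0, 5.0),
}


def validate_icor_params(params):
    """Hypothetical helper: return a list of problems (empty if none found)."""
    problems = []
    if params.get('data_type') not in ('S2', 'L8'):
        problems.append("data_type must be 'S2' or 'L8'")
    if params.get('data_type') == 'S2':
        for key in ('low_band', 'water_band'):
            if params.get(key) not in S2_BANDS:
                problems.append('%s must be a Sentinel-2 band name' % key)
    for key in BOOLEAN_KEYS:
        if key in params and params[key] not in ('true', 'false'):
            problems.append("%s must be 'true' or 'false'" % key)
    for key, (low, high) in RANGES.items():
        if key not in params:
            continue
        try:
            value = float(params[key])
        except ValueError:
            problems.append('%s is not a number' % key)
            continue
        if not low <= value <= high:
            problems.append('%s must be in [%s, %s]' % (key, low, high))
    return problems


# Example: assert not validate_icor_params(icor_params)
```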

## step 3: set up iCOR
This step defines the function `process_product`, which applies iCOR to a single input product and returns the product's output folder (or `None` if processing failed). Normally, you should not need to change the code below.

```python
def process_product(product, params):
    """Run iCOR on one input product; return its output folder or None on failure."""
    import ConfigParser
    import errno
    import getpass
    import logging
    import os
    import traceback

    import icor.context
    import icor.landsat8
    import icor.sentinel2

    # work on a copy so the caller's parameter dict is not modified
    params = dict(params)

    # setup logging
    logger = logging.getLogger("py4j")
    logger.setLevel(logging.INFO)
    # replace any existing handlers so that each message is printed only once
    logger.handlers = [logging.StreamHandler()]

    logging.root = logger
    logging.Logger.manager.root = logger

    conf = ConfigParser.SafeConfigParser()
    try:
        icor_dir = str(os.environ['ICOR_DIR'])
        logger.info('Using iCOR dir %s', icor_dir)
    except KeyError:
        icor_dir = '/data/icor/v1.0.0'
        os.environ['ICOR_DIR'] = icor_dir
        logger.info('Using default iCOR dir %s', icor_dir)

    # set GDAL data dir, otherwise images are not projected
    os.environ['GDAL_DATA'] = '/opt/gdal2/share/gdal'

    conf.set("DEFAULT", "install_dir", icor_dir)

    # read the configuration for the requested data type
    if params.get("data_type") == "L8":
        conf.read(icor_dir + "/src/config/local_landsat8_simec.ini")
    elif params.get("data_type") == "S2":
        conf.read(icor_dir + "/src/config/local_sentinel2_simec.ini")
    else:
        logger.error("Unknown 'data_type': %s", params.get("data_type"))
        return None

    # create /data/users/Private/<username>/icor_results/ if it does not exist
    parent_output_dir = "/data/users/Private/" + getpass.getuser() + "/icor_results/"
    try:
        os.makedirs(parent_output_dir)
    except OSError as e:
        if e.errno != errno.EEXIST:
            raise

    # name the output folder after the product folder (e.g. the .SAFE folder)
    product_name = os.path.basename(os.path.dirname(product))
    output_dir = os.path.join(parent_output_dir, product_name)

    # intermediate files are always discarded
    params["keep_intermediate"] = "false"
    params["output_file"] = output_dir

    # merge the configuration: DEFAULT options as-is, section options as <section>_<option>
    for param, value in conf.items("DEFAULT"):
        params[param] = value

    for section in conf.sections():
        for param, value in conf.items(section):
            params[section + "_" + param] = value

    # convert to params for context, after the configuration has been merged
    context = icor.context.SimpleContext(params, logger=logger)

    try:
        working_folder = os.getcwd()
        if context["instrument"] == "landsat8":
            if context["workflow"] == "simec":
                icor.landsat8.process_tgz(context, product, working_folder)
            else:
                raise ValueError("Unknown 'workflow'")
        elif context["instrument"] == "sentinel2":
            if context["workflow"] == "simec":
                icor.sentinel2.process_tar(context, product, working_folder)
            else:
                raise ValueError("Unknown 'workflow'")
        else:
            raise ValueError("Unknown 'instrument'")

        return output_dir
    except Exception:
        logger.error(traceback.format_exc())
        return None
```
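The two loops in `process_product` flatten the iCOR `.ini` configuration into the parameter dictionary. The standalone sketch below reproduces only that step with Python's `configparser` module (falling back to `ConfigParser` on Python 2), so its effect can be inspected without iCOR. The function name `flatten_config` is hypothetical.

```python
try:
    import configparser  # Python 3
except ImportError:
    import ConfigParser as configparser  # Python 2, as used in this notebook


def flatten_config(conf, params):
    """Hypothetical helper mirroring the loops in process_product: copy the
    DEFAULT options as-is and every section option as '<section>_<option>'."""
    flat = dict(params)
    for option, value in conf.items('DEFAULT'):
        flat[option] = value
    for section in conf.sections():
        # items(section) also returns the DEFAULT options, interpolated
        for option, value in conf.items(section):
            flat[section + '_' + option] = value
    return flat
```

For example, with a `[DEFAULT]` section containing `install_dir = /opt/icor` and a hypothetical `[paths]` section containing `aux = %(install_dir)s/aux`, the result contains `install_dir`, `paths_aux` (interpolated to `/opt/icor/aux`) and also `paths_install_dir`, because `items(section)` includes the DEFAULT options. It also shows that a user parameter with the same name as a configuration option is overwritten by the configuration value.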

## step 4: initialize Spark and start the job
This step first initializes the Spark context and then starts the job, which applies `process_product` to every input product on the cluster. Depending on the number of products you process, this can take a while.

```python
from pyspark import SparkConf, SparkContext

# Set up the Spark job: 2 executors with 1 core and 6 GB of memory each,
# plus 1024 MB of YARN memory overhead per executor
conf = SparkConf()
conf.set('spark.yarn.executor.memoryOverhead', '1024')
conf.set('spark.executor.memory', '6g')
conf.set('spark.executor.cores', '1')
conf.set('spark.executor.instances', '2')

sc = SparkContext(appName='icor-mep', conf=conf)

try:
    # distribute the products and process each one on an executor
    productsRDD = sc.parallelize(products)
    outputs = productsRDD.map(lambda product: process_product(product, icor_params)).collect()
    print('Output files can be found here:\n')
    print(outputs)
finally:
    # always release the cluster resources, also when processing fails
    sc.stop()
```

```
Output files can be found here:

['/data/users/Private/daemsd/icor_results/S2A_20170527T102031_T33UUB_20170527T102301.SAFE', '/data/users/Private/daemsd/icor_results/S2A_20170822T101031_T32TQN_20170822T101057.SAFE']
```
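`process_product` logs the error and returns `None` when a product fails, so the collected `outputs` list can mix output folders and `None` values. A small hypothetical helper can pair them back with the inputs; it relies on `collect()` returning the results in the same order as the list passed to `parallelize`.

```python
def summarize_outputs(products, outputs):
    """Hypothetical helper: split the collected outputs into the output
    folders of successful products and the input paths of failed products
    (process_product returns None for those)."""
    succeeded = [output for output in outputs if output is not None]
    failed = [product for product, output in zip(products, outputs)
              if output is None]
    return succeeded, failed


# Example, run in the notebook after the Spark job:
# succeeded, failed = summarize_outputs(products, outputs)
# print('%d succeeded, %d failed' % (len(succeeded), len(failed)))
# for product in failed:
#     print('Failed: ' + product)
```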

