Created on Thu Dec 16 15:41:50 2021

@author: bwijers1


The aim of this script is to show a working example of the pvol_vp_processor used in the pipeline.

In general the workflow is as follows:

    1. Retrieve / Identify input
        a. A list of items to be processed. To reduce complexity, the files
        are assumed to be valid ODIM .h5 polar volumes
    2. Run necessary integrity checks
        a. Lack of parameters
        b. Corruption of source file
        c. Format of file: does it require format conversion pre-processing?
        d. Repair of broken metadata
    3. Apply the naming convention of the Theoretical Computational Ecology
    (TCE) group
        a. {CORAD}_{TYPE}_{DATE}T{TIME}_WMOCODE.EXTENSION
            1. CORAD
                1. COuntry : UK / NL / DE / BE / ..
                2. RAdar:
            2. TYPE : Polar Volume (PVOL) or Vertical Profile (VP)
            3. DATE : %Y%m%d
            4. TIME : %H%M
            5. WMOCODE : Reference radar code, part of ODIM system
            6. EXTENSION : .h5 (hdf5)
    4. Process PVOL -> VP
        a. Calls vol2bird (a C program) through subprocess
    5. Upload results to S3
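
The naming convention in step 3 can be illustrated with a small sketch. This is
not the pipeline's own implementation: the function name `build_tce_filename`
and the radar code `XYZ` and WMO code `00000` are hypothetical placeholders,
and the sketch assumes that CORAD is the country code followed directly by the
radar code.

```python
from datetime import datetime


def build_tce_filename(country, radar, data_type, timestamp, wmo_code, extension=".h5"):
    """Assemble a name of the form CORAD_TYPE_DATETTIME_WMOCODE.EXTENSION.

    Hypothetical helper: assumes CORAD = country code + radar code.
    """
    if data_type.upper() not in {"PVOL", "VP"}:
        raise ValueError(f"TYPE must be PVOL or VP, got {data_type!r}")
    corad = f"{country}{radar}"
    date = timestamp.strftime("%Y%m%d")  # DATE : %Y%m%d
    time = timestamp.strftime("%H%M")    # TIME : %H%M
    # Accept the extension with or without a leading dot.
    ext = extension.lstrip(".")
    return f"{corad}_{data_type}_{date}T{time}_{wmo_code}.{ext}"


# Placeholder values, for illustration only.
name = build_tce_filename("NL", "XYZ", "PVOL", datetime(2021, 12, 16, 15, 41), "00000")
print(name)  # NLXYZ_PVOL_20211216T1541_00000.h5
```

The sketch leaves the letter case of its inputs untouched, since the convention
above does not state which case the final file name should use.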

For the example, a test dataset with test credentials is used. This provides a
working example that shows how the components interact. In reality, there are
many ways of calling this script and providing input data. However, these are
pre-processing steps and therefore not part of a minimal example.

Therefore, a UvA-hosted S3 dataset is used here. The output is also written to
S3 buckets, as that is where our data will end up.

Furthermore, command line arguments have largely been removed, as for the
example everything except input and output locations is preset. In reality,
command line arguments are added to provide flexibility in processing
parameters.

Lastly, many specific checks have been removed from the script, because
preparing the dataset for conversion in advance reduces the complexity of the
script.
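
As an illustration of the kind of check that is left out here, the sketch below
tests for one form of source file corruption (step 2b). It is an assumption
about how such a check could look, not the pipeline's own code: the function
name `has_hdf5_signature` is hypothetical. It relies only on the fact that an
HDF5 file without a user block starts with a fixed 8-byte signature.

```python
import os
import tempfile

# The 8-byte signature that opens an HDF5 file (at offset 0 when no user block is present).
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def has_hdf5_signature(path):
    """Return True if the file at `path` starts with the HDF5 signature.

    Hypothetical helper: a cheap first check only. It does not validate the
    ODIM structure and ignores files that carry a user block before the signature.
    """
    with open(path, "rb") as handle:
        return handle.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE


# Demonstration on two throwaway files.
with tempfile.TemporaryDirectory() as tmpdir:
    good = os.path.join(tmpdir, "good.h5")
    bad = os.path.join(tmpdir, "bad.h5")
    with open(good, "wb") as handle:
        handle.write(HDF5_SIGNATURE + b"payload")
    with open(bad, "wb") as handle:
        handle.write(b"not an hdf5 file")
    good_ok = has_hdf5_signature(good)
    bad_ok = has_hdf5_signature(bad)

print(good_ok, bad_ok)  # True False
```

A file that passes this check can still be truncated or lack required ODIM
attributes, so it would only be the first of several checks.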


In [1]:
import multiprocessing as mp
import os

from tools.cluster_exceptions import NoInputFilesFound, UnsupportedInputFile
from tools.countries import PVOL_PROCESSING_POSSIBLE_COUNTRIES, determine_country
from tools.helpers import load_radar_db

from tools.minio_tools import (
    list_bucket_objects,
    download_and_uncompress_bucket_objects,
    get_minio_client,
)
from tools import fops, strops, helpers

# For exiting help pages with ctrl-c
from signal import signal, SIGINT

from tools.vol2bird import v2bo, print_v2b_options, check_v2b_options


ModuleNotFoundError: No module named 'tools'

# Setup
## Return version of program, for welcome message


In [2]:
# Return arguments for this run

def get_script_args():
    """Create the argument parser, configure it, and return the parsed args."""
    parser = helpers.get_base_argument_parser(
        description="PVOL to VP (example) processing"
    )
    parser.add_argument(
        "-v",
        metavar="--verbose",
        type=str,
        help="Control verbosity: [CRITICAL,ERROR,WARNING,INFO,DEBUG,NOTSET]",
        required=False,
        default="NOTSET",
    )
    
    parser.add_argument(
        "-l",
        metavar="--localdir",
        type=str,
        help="Local processing directory. Dry run controls if this data is sent to remote or kept local. Default: $TMPDIR",
        required=False,
        default=os.environ.get("TMPDIR"),
    )

    parser.add_argument(
        "-w",
        metavar="--workers",
        type=int,
        help="Number of cpu cores to use. Default: all",
        required=False,
        default=int(mp.cpu_count()),
    )

    parser.add_argument(
        "-d",
        metavar="--dry",
        type=helpers.str2bool,
        help="Should this be a Dry run or not. Dry run does NOT output to server.",
        required=False,
        default=False,
    )

    parser.add_argument(
        "-x",
        type=str,
        help="Output pvol bucket. Default: pvol",
        required=False,
        default="pvol",
    )

    parser.add_argument(
        "-y",
        type=str,
        help="Output vp bucket. Default: verticalprofilepredb",
        required=False,
        default="verticalprofilepredb",
    )
    

    parser.add_argument(
        "-r", type=str, help="Project name", required=False, default=None
    )

    # The vol2bird configuration is passed as a JSON string, which decodes to a dict.
    parser.add_argument(
        "-c",
        type=str,
        help="Configuration [vol2bird] {PARAM1 [STR] : VAL1 [INT], PARAM2 : VAL2 [FLOAT]} \n"
        '\'{"RANGEMIN" : 35000, "RANGEMAX" : 100000}\'',
        required=False,
        default=None,
    )

    parser.add_argument(
        "-z",
        help="Print all available vol2bird configuration options",
        action="store_true",
    )

    parser.add_argument(
        "-u",
        type=str,
        help="Configuration [vol2bird] name",
        required=False,
        default=None,
    )

    parser.add_argument(
        "-s", type=str, help="pvol Setting name", required=False, default=None
    )
    return parser.parse_args()


repository_revision = helpers.get_git_revision_hash()
args = get_script_args()
print(args)
project = args.r
pvol_setting = args.s
vp_setting = args.u

NameError: name 'helpers' is not defined
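
The `-c` option of `get_script_args` takes the vol2bird configuration as a JSON
string. The sketch below shows how such a string could be decoded into a dict
with the standard library. It is a minimal illustration, not the pipeline's own
parsing code: the function name `parse_v2b_config` is hypothetical, and a plain
`argparse.ArgumentParser` stands in for `helpers.get_base_argument_parser`.

```python
import argparse
import json


def parse_v2b_config(raw):
    """Decode the JSON string given to -c into a dict.

    Hypothetical helper: returns an empty dict when no configuration is given.
    """
    if raw is None:
        return {}
    config = json.loads(raw)
    if not isinstance(config, dict):
        raise ValueError("The vol2bird configuration must be a JSON object")
    return config


# Stand-in parser with only the -c option, fed an explicit argument list.
parser = argparse.ArgumentParser(description="PVOL to VP (example) processing")
parser.add_argument("-c", type=str, required=False, default=None)
args = parser.parse_args(["-c", '{"RANGEMIN": 35000, "RANGEMAX": 100000}'])

config = parse_v2b_config(args.c)
print(config)  # {'RANGEMIN': 35000, 'RANGEMAX': 100000}
```

On a shell command line the JSON string needs single quotes around it, as in
the help text of `-c`, so that the inner double quotes survive.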