# Working with the *Vortex* provider and "stacks"

In [1]:
%load_ext ivortex
%vortex tmpcocoon
# Fake MTOOL directory for demonstration purposes
e.MTOOLDIR=e.HOME
# Create the archive on localhost (for test purposes)
e.VORTEX_DEFAULT_STORAGE='localhost'
# Decrease footprints.collectors verbosity in order to have a better presentation
fp.collectors.logger.setLevel('ERROR')

# [2020/12/10-19:41:00][vortex.sessions][_set_rundir:0155][INFO]: Session <root> set rundir </home/meunierlf/vortex-workdir/auto_cocoon_58wehh2k>


Vortex 1.7.2 loaded ( Thursday 10. December 2020, at 19:40:59 )
The working directory is now: /home/meunierlf/vortex-workdir/auto_cocoon_58wehh2k/root


Sadly, it is sometimes necessary to archive small files (such as listings). Since such files usually weigh only a few megabytes, they are poorly handled by data handling systems such as *HPSS*. Consequently, a recurring feature request is the ability to gather such files into a single file (like a *tar* file) and to archive them as a whole. The "stack" mechanism was designed to address that need; however, it comes with some drawbacks:

* Since the small files are produced by several tasks, the "stack" has to be archived manually once all the files that belong to it are available. It is the user's responsibility to trigger this archiving at the appropriate time.
* If a task is restarted, the archived "stack" may become inconsistent (unless it is archived again).
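
The idea of gathering several small files into one archivable object can be illustrated with the standard library alone. The sketch below is not how Vortex implements "stacks" (the actual storage format is not described here). It simply assumes a plain *tar* file, and the helper names `pack_files` and `extract_one` are hypothetical.

```python
import io
import tarfile


def pack_files(files):
    """Pack a {name: bytes} mapping into a single in-memory tar archive."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        for name, payload in sorted(files.items()):
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


def extract_one(archive, name):
    """Return the content of a single member of the archive."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r") as tar:
        return tar.extractfile(name).read()


# Three small "listings", one per member (names are illustrative only).
listings = {
    "mb{:03d}/forecast/listing".format(i):
        "This is listing number #{:d}".format(i).encode()
    for i in range(3)
}
stack = pack_files(listings)                        # one object to archive
one = extract_one(stack, "mb001/forecast/listing")  # read back a single file
```

Archiving one object instead of three is the whole point. The price is that the archive has to be rebuilt whenever one of its members changes, which is the inconsistency risk mentioned in the second drawback.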

A typical workflow when working with "stacks" is:

* The stackable resources are placed in cache using a dedicated namespace called ``vortex.stack.fr``. (__Technical note__: for a resource to be stackable, its Python class needs to be modified.) ;
* From then on, the stacked resources are accessible interchangeably through the ``vortex.stack.fr`` or ``vortex.cache.fr`` namespaces. This ensures compatibility with existing resources and/or un-stacked resources ;
* At some point, the stacked resources can be archived ;
* From then on, the individual stacked resources are accessible interchangeably through standard Vortex namespaces such as ``vortex.archive.fr`` or ``vortex.multi.fr``. (__Technical note__: if the resource needs to be fetched from the archive, the whole stack is retrieved and the desired resource is extracted from it. When using ``vortex.multi.fr``, the whole stack is cached for future use.)

This will be demonstrated below:

## Creating some fake listing files (for demonstration purposes)

In [2]:
for i in range(3):
    with open('local_listing.{:04d}'.format(i), 'w') as fh_l:
        fh_l.write("This is listing number #{:d}".format(i))

## Appending the listings on the "stack"

Let's imagine that a forecast task produced a listing for member number 0:

In [3]:
rh_l0s = toolbox.output(
    now=True, role='Listing',
    # Resource
    kind='listing',
    task='forecast',
    model='arpege',
    date='2020120100',
    cutoff='production',
    # Provider
    experiment='ABCD@meunierlf',
    block='forecast',
    member=0,
    namespace='vortex.stack.fr',
    # Container
    local='local_listing.[member%04d]',
)

# New output section with options :
+ role         = Listing

# Resource handler description :
+ block        = forecast
+ cutoff       = production
+ date         = 2020120100
+ experiment   = ABCD@meunierlf
+ kind         = listing
+ local        = local_listing.[member%04d]
+ member       = 0
+ model        = arpege
+ namespace    = vortex.stack.fr
+ task         = forecast

# This command options :
+ complete     = False
+ loglevel     = None
+ now          = True
+ verbose      = True


----------------------------------------------------------------------------------------------------
# Resource no 01/01                                                                                #
----------------------------------------------------------------------------------------------------
01. <vortex.data.handlers.Handler object at 0x7f8de9350a10>
  Complete  : True
  Container : <vortex.data.containers.SingleFile object at 0x7f8e004c0f50 | path='local_listing.0000'>
  Provider  : <vor

# [2020/12/10-19:41:00][vortex.data.abstractstores][incacheput:1251][INFO]: incacheput to vortex://vortex-free.stacked-cache-mt.fr/play/sandbox/ABCD@meunierlf/20201201T0000P/stacks/flow_logs.filespack/mb000/forecast/listing.arpege-forecast.all (from: local_listing.0000)
# [2020/12/10-19:41:00][vortex.data.abstractstores][incacheput:1260][INFO]: incacheput insert rc=True location=/home/meunierlf/cache/vortex/play/sandbox/ABCD@meunierlf/20201201T0000P/stacks/flow_logs.filespack/mb000/forecast/listing.arpege-forecast.all



# ----  Result from put: [True]  ---- #


__Technical note__: The URI generated by the resource handler looks familiar, except for the additional `stackfmt` and `stackpath` query items that describe the "stack" into which the resource will be inserted. `stackfmt` and `stackpath` are computed from the resource's class, which needs to define a `stackedstorage_resource` method. This method returns a dictionary that provides the arguments necessary to build a new resource describing the "stack" itself. From this resource describing the "stack", `stackfmt` and `stackpath` are generated using the Vortex name builder.
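
To see what these two query items carry, one of the URIs printed in this notebook's output can be decoded with the standard library. This is only an inspection aid, not part of the Vortex API. The URI is truncated in the notebook output, so the decoded `stackpath` is incomplete too.

```python
from urllib.parse import parse_qs, urlsplit

# URI copied verbatim from the notebook output (truncated there, hence the
# incomplete stackpath below).
uri = (
    "vortex://vortex-free.cache.fr/play/sandbox/ABCD@meunierlf/20201201T0000P"
    "/mb001/forecast/listing.arpege-forecast.all"
    "?stackfmt=filespack&stackpath=play%2Fsandbox%2FABCD%40meunierlf%2F20201201"
)

parts = urlsplit(uri)
query = parse_qs(parts.query)  # percent-encoded values are decoded

stackfmt = query["stackfmt"][0]    # format of the "stack"
stackpath = query["stackpath"][0]  # (truncated) location of the "stack"
print(parts.netloc, stackfmt, stackpath)
```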

Let's have a look at the result of the *locate* method:

In [4]:
rh_l0s[0].locate()

'/home/meunierlf/cache/vortex/play/sandbox/ABCD@meunierlf/20201201T0000P/stacks/flow_logs.filespack/mb000/forecast/listing.arpege-forecast.all'

The "stack" is stored in a predefined `stacks` block and is identified by its name (computed by the Vortex name builder; see the technical note above). Below these subdirectories, the usual Vortex cache layout is used, which allows an arbitrary number of listings to be stacked for various members, blocks and tasks.

The "stack" has a name (defined in the Vortex code). This allows for some flexibility: all the listings (and, more generally, log files) can be stored and archived in the "flow_logs" stack, while other kinds of resources could be stacked separately (provided that a dedicated "stack" resource is defined for them).
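
The layout can be made explicit by decomposing the path returned by `locate` above. The sketch below is plain path manipulation, not a Vortex function. It assumes that the directory following `stacks` reads as `<stack name>.<stack format>`, which is consistent with the `flow_logs` name and the `filespack` format seen in the output.

```python
from pathlib import PurePosixPath

# Path returned by rh_l0s[0].locate() in the cell above.
located = PurePosixPath(
    "/home/meunierlf/cache/vortex/play/sandbox/ABCD@meunierlf/20201201T0000P"
    "/stacks/flow_logs.filespack/mb000/forecast/listing.arpege-forecast.all"
)

parts = located.parts
idx = parts.index("stacks")              # the predefined "stacks" block
stack_name, stack_format = parts[idx + 1].rsplit(".", 1)
inner_path = PurePosixPath(*parts[idx + 2:])  # usual layout inside the stack

print(stack_name, stack_format, inner_path)
```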

For demonstration purposes, let's suppose that members 1 and 2 also produce a listing file:

In [5]:
rh_lothers = toolbox.output(
    now=True, role='Listing',
    # Resource
    kind='listing',
    task='forecast',
    model='arpege',
    date='2020120100',
    cutoff='production',
    # Provider
    experiment='ABCD@meunierlf',
    block='forecast',
    member=[1, 2],
    namespace='vortex.stack.fr',
    # Container
    local='local_listing.[member%04d]',
)

# New output section with options :
+ role         = Listing

# Resource handler description :
+ block        = forecast
+ cutoff       = production
+ date         = 2020120100
+ experiment   = ABCD@meunierlf
+ kind         = listing
+ local        = local_listing.[member%04d]
+ member       = [1, 2]
+ model        = arpege
+ namespace    = vortex.stack.fr
+ task         = forecast

# This command options :
+ complete     = False
+ loglevel     = None
+ now          = True
+ verbose      = True


----------------------------------------------------------------------------------------------------
# Resource no 01/02                                                                                #
----------------------------------------------------------------------------------------------------
01. <vortex.data.handlers.Handler object at 0x7f8de9235390>
  Complete  : True
  Container : <vortex.data.containers.SingleFile object at 0x7f8de922cf10 | path='local_listing.0001'>
  Provider  :

# [2020/12/10-19:41:00][vortex.data.abstractstores][incacheput:1251][INFO]: incacheput to vortex://vortex-free.stacked-cache-mt.fr/play/sandbox/ABCD@meunierlf/20201201T0000P/stacks/flow_logs.filespack/mb001/forecast/listing.arpege-forecast.all (from: local_listing.0001)
# [2020/12/10-19:41:00][vortex.data.abstractstores][incacheput:1260][INFO]: incacheput insert rc=True location=/home/meunierlf/cache/vortex/play/sandbox/ABCD@meunierlf/20201201T0000P/stacks/flow_logs.filespack/mb001/forecast/listing.arpege-forecast.all



# ----  Result from put: [True]  ---- #

----------------------------------------------------------------------------------------------------
# Resource no 02/02                                                                                #
----------------------------------------------------------------------------------------------------
02. <vortex.data.handlers.Handler object at 0x7f8de9235610>
  Complete  : True
  Container : <vortex.data.containers.SingleFile object at 0x7f8de922cbd0 | path='local_listing.0002'>
  Provider  : <vortex.data.providers.VortexFreeStd object at 0x7f8de922c690 | namespace='vortex.stack.fr' block='forecast'>
  Resource  : <common.data.logs.Listing object at 0x7f8de922c910 | model='arpege' date='2020-12-01T00:00:00Z' cutoff='production' task='forecast' part='all' binary='arpege'>

# ----  Action PUT on vortex://vortex-free.stack.fr/play/sandbox/ABCD@meunierlf/20201201T0000P/mb002/forecast/listing.arpege-forecast.all?stackfmt=filespack&stackpath=play%2F

# [2020/12/10-19:41:00][vortex.data.abstractstores][incacheput:1251][INFO]: incacheput to vortex://vortex-free.stacked-cache-mt.fr/play/sandbox/ABCD@meunierlf/20201201T0000P/stacks/flow_logs.filespack/mb002/forecast/listing.arpege-forecast.all (from: local_listing.0002)
# [2020/12/10-19:41:00][vortex.data.abstractstores][incacheput:1260][INFO]: incacheput insert rc=True location=/home/meunierlf/cache/vortex/play/sandbox/ABCD@meunierlf/20201201T0000P/stacks/flow_logs.filespack/mb002/forecast/listing.arpege-forecast.all



# ----  Result from put: [True]  ---- #


## Accessing the stacked resources (transparently)

As mentioned in the introduction, the stacked resources can be accessed transparently using the usual `vortex.cache.fr` namespace. For example, to access the listing of member 1:

In [6]:
rh_l1s = toolbox.input(
    now=True, role='Listing',
    # Resource
    kind='listing',
    task='forecast',
    model='arpege',
    date='2020120100',
    cutoff='production',
    # Provider
    experiment='ABCD@meunierlf',
    block='forecast',
    member=1,
    namespace='vortex.cache.fr',
    local='fromcache_listing.[member%04d]',
)

# New input section with options :
+ role         = Listing

# Resource handler description :
+ block        = forecast
+ cutoff       = production
+ date         = 2020120100
+ experiment   = ABCD@meunierlf
+ insitu       = False
+ kind         = listing
+ local        = fromcache_listing.[member%04d]
+ member       = 1
+ model        = arpege
+ namespace    = vortex.cache.fr
+ task         = forecast

# This command options :
+ complete     = False
+ loglevel     = None
+ now          = True
+ verbose      = True


----------------------------------------------------------------------------------------------------
# Early-get for all resources.                                                                     #
----------------------------------------------------------------------------------------------------


# [2020/12/10-19:41:00][vortex.toolbox][add_section:0364][INFO]: Early-get was unavailable for all of the resources.



----------------------------------------------------------------------------------------------------
# Resource no 01/01                                                                                #
----------------------------------------------------------------------------------------------------
01. <vortex.data.handlers.Handler object at 0x7f8e004a8250>
  Complete  : True
  Container : <vortex.data.containers.SingleFile object at 0x7f8de9289b10 | path='fromcache_listing.0001'>
  Provider  : <vortex.data.providers.VortexFreeStd object at 0x7f8de926c810 | namespace='vortex.cache.fr' block='forecast'>
  Resource  : <common.data.logs.Listing object at 0x7f8de8a941d0 | model='arpege' date='2020-12-01T00:00:00Z' cutoff='production' task='forecast' part='all' binary='arpege'>

# ----  Action GET on vortex://vortex-free.cache.fr/play/sandbox/ABCD@meunierlf/20201201T0000P/mb001/forecast/listing.arpege-forecast.all?stackfmt=filespack&stackpath=play%2Fsandbox%2FABCD%40meunierlf%2F20201201

# [2020/12/10-19:41:00][vortex.data.abstractstores][incacheget:1231][INFO]: incacheget on vortex://vortex-free.cache-mt.fr//play/sandbox/ABCD@meunierlf/20201201T0000P/mb001/forecast/listing.arpege-forecast.all (to: fromcache_listing.0001)
# [2020/12/10-19:41:00][vortex.data.abstractstores][incacheget:1231][INFO]: incacheget on vortex://vortex-free.stacked-cache-mt.fr/play/sandbox/ABCD@meunierlf/20201201T0000P/stacks/flow_logs.filespack/mb001/forecast/listing.arpege-forecast.all (to: fromcache_listing.0001)
# [2020/12/10-19:41:00][vortex.data.abstractstores][incacheget:1245][INFO]: incacheget retrieve rc=True location=/home/meunierlf/cache/vortex/play/sandbox/ABCD@meunierlf/20201201T0000P/stacks/flow_logs.filespack/mb001/forecast/listing.arpege-forecast.all



# ----  Result from get: [True]  ---- #


The `locate` method shows that the `vortex.cache.fr` namespace is associated with a multi store, which makes the retrieval of stacked resources painless (the user does not need to know whether the resource is stored in a "stack" or not):

In [7]:
rh_l1s[0].locate()

'/home/meunierlf/cache/vortex/play/sandbox/ABCD@meunierlf/20201201T0000P/mb001/forecast/listing.arpege-forecast.all;/home/meunierlf/cache/vortex/play/sandbox/ABCD@meunierlf/20201201T0000P/stacks/flow_logs.filespack/mb001/forecast/listing.arpege-forecast.all'
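
The string returned here lists two candidate locations separated by a semicolon. A small helper can tell which of them actually exists on disk. This is a sketch, assuming the `;` separator observed in the output above. The name `existing_locations` is hypothetical, and temporary files stand in for the Vortex cache so that the example is self-contained.

```python
import os
import tempfile


def existing_locations(located, separator=";"):
    """Split a multi-location string and keep only the paths that exist."""
    candidates = [path for path in located.split(separator) if path]
    return [path for path in candidates if os.path.exists(path)]


# Demonstration with temporary files instead of a real cache.
with tempfile.TemporaryDirectory() as tmpdir:
    plain = os.path.join(tmpdir, "plain_listing")      # never created
    stacked = os.path.join(tmpdir, "stacked_listing")  # created below
    with open(stacked, "w") as fh:
        fh.write("demo")
    found = existing_locations(plain + ";" + stacked)
    only_stacked = found == [stacked]
```

In a live session, the same helper could be fed with `rh_l1s[0].locate()`.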

## Archiving the whole "stack"

A dedicated `toolbox` method has been created for that purpose. It only needs to be provided with the description of the "stack" resource and provider. (Note: the `block` attribute is omitted since it is always equal to `stacks` when working with "stack" resources.)

In [8]:
rh_refills = toolbox.stack_archive_refill(
    kind='flow_logs',
    date='2020120100',
    cutoff='production',
    experiment='ABCD@meunierlf',
    namespace='vortex.multi.fr',
)

# Archive Refill Ressource+Provider description :
+ block        = stacks
+ cutoff       = production
+ date         = 2020120100
+ experiment   = ABCD@meunierlf
+ kind         = flow_logs
+ namespace    = vortex.multi.fr


----------------------------------------------------------------------------------------------------
# Resource no 01/01                                                                                #
----------------------------------------------------------------------------------------------------


# [2020/12/10-19:41:00][vortex.data.abstractstores][incacheget:1231][INFO]: incacheget on vortex://vortex-free.cache-mt.fr//play/sandbox/ABCD@meunierlf/20201201T0000P/stacks/flow_logs.filespack (to: archive_refills/f236bb7d572c4cd5b0f8fe5f5c36ea02)
# [2020/12/10-19:41:00][vortex.data.abstractstores][incacheget:1245][INFO]: incacheget retrieve rc=True location=/home/meunierlf/cache/vortex/play/sandbox/ABCD@meunierlf/20201201T0000P/stacks/flow_logs.filespack
# [2020/12/10-19:41:00][vortex.data.abstractstores][_verbose_log:0146][INFO]: Skip this "<class 'vortex.data.stores.VortexCacheMtStore'>" store because an archive is requested
# [2020/12/10-19:41:00][vortex.data.abstractstores][_load_config:1003][INFO]: Some store configuration data is needed (for vortex://vortex-free.archive-legacy.fr)
# [2020/12/10-19:41:00][vortex.data.abstractstores][_load_config:1006][INFO]: Reading config file: @store-vortex-free.ini
# [2020/12/10-19:41:00][vortex.data.abstractstores][_load_config:1017][INFO]: 

01. <vortex.data.handlers.Handler object at 0x7f8e004d11d0>
  Complete  : True
  Container : <vortex.data.containers.Uuid4UnamedSingleFile object at 0x7f8e004d1250 | path='archive_refills/f236bb7d572c4cd5b0f8fe5f5c36ea02'>
  Provider  : <vortex.data.providers.VortexFreeStd object at 0x7f8de928c310 | namespace='vortex.multi.fr' block='stacks'>
  Resource  : <common.data.logs.FlowLogsStack object at 0x7f8de9283990 | date='2020-12-01T00:00:00Z' cutoff='production'>

# ----  Result from get: [True], from put: [True]  ---- #


Additional information:

* For this method to actually do something, a namespace associated with a multi store needs to be provided. Otherwise, it silently does nothing (it does not crash).
* If an extra `fatal=False` argument is added, the `stack_archive_refill` method won't crash even if the operation fails.

__Technical note:__ Under the hood, it is fairly simple. First, the "stack" is fetched from the cache into the current working directory using the Handler's usual `get` method (the container is created on the fly). Then, it is archived using the Handler's `put` method.
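
The two steps can be mimicked with plain directories. The sketch below is a toy model under strong assumptions: the cache and the archive are both local directories, the "stack" is a directory tree, and `archive_refill` is a hypothetical name. It does not reproduce the Vortex stores or the real archive format.

```python
import os
import shutil
import tempfile
import uuid


def archive_refill(cache_dir, archive_dir, relpath, workdir):
    """Copy a stack from the cache to the archive through a local container."""
    source = os.path.join(cache_dir, relpath)
    if not os.path.exists(source):
        return False  # nothing to archive
    # Step 1 ("get"): fetch the stack into a uniquely named local container.
    local = os.path.join(workdir, "archive_refills", uuid.uuid4().hex)
    os.makedirs(os.path.dirname(local), exist_ok=True)
    shutil.copytree(source, local)
    # Step 2 ("put"): send the local copy to the archive.
    destination = os.path.join(archive_dir, relpath)
    os.makedirs(os.path.dirname(destination), exist_ok=True)
    shutil.copytree(local, destination)
    return True


with tempfile.TemporaryDirectory() as root:
    cache_dir = os.path.join(root, "cache")
    archive_dir = os.path.join(root, "archive")
    workdir = os.path.join(root, "work")
    stack = os.path.join("stacks", "flow_logs.filespack")
    member_dir = os.path.join(cache_dir, stack, "mb000", "forecast")
    os.makedirs(member_dir)
    with open(os.path.join(member_dir, "listing"), "w") as fh:
        fh.write("This is listing number #0")

    done = archive_refill(cache_dir, archive_dir, stack, workdir)
    archived_ok = os.path.isfile(
        os.path.join(archive_dir, stack, "mb000", "forecast", "listing")
    )
    skipped = archive_refill(
        cache_dir, archive_dir, os.path.join("stacks", "unknown"), workdir
    )
```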

For the following part of this demo to be meaningful, let's delete the "stack" from cache:

In [9]:
rh_refills[0].delete(incache=True)

# [2020/12/10-19:41:00][vortex.data.abstractstores][incachedelete:1266][INFO]: incachedelete on vortex://vortex-free.cache-mt.fr//play/sandbox/ABCD@meunierlf/20201201T0000P/stacks/flow_logs.filespack
# [2020/12/10-19:41:00][vortex.data.abstractstores][_verbose_log:0146][INFO]: Skip this "<class 'vortex.data.stores.VortexFreeStdBaseArchiveStore'>" store because a cache is requested


True

## Retrieving a stacked resource from the archive

That's easy: just proceed as usual and use a multi store:

In [10]:
rh_l1s = toolbox.input(
    now=True, role='Listing',
    # Resource
    kind='listing',
    task='forecast',
    model='arpege',
    date='2020120100',
    cutoff='production',
    # Provider
    experiment='ABCD@meunierlf',
    block='forecast',
    member=1,
    namespace='vortex.multi.fr',
    local='fromarchive_listing.[member%04d]',
)

# New input section with options :
+ role         = Listing

# Resource handler description :
+ block        = forecast
+ cutoff       = production
+ date         = 2020120100
+ experiment   = ABCD@meunierlf
+ insitu       = False
+ kind         = listing
+ local        = fromarchive_listing.[member%04d]
+ member       = 1
+ model        = arpege
+ namespace    = vortex.multi.fr
+ task         = forecast

# This command options :
+ complete     = False
+ loglevel     = None
+ now          = True
+ verbose      = True


----------------------------------------------------------------------------------------------------
# Early-get for all resources.                                                                     #
----------------------------------------------------------------------------------------------------


# [2020/12/10-19:41:00][vortex.toolbox][add_section:0364][INFO]: Early-get was unavailable for all of the resources.



----------------------------------------------------------------------------------------------------
# Resource no 01/01                                                                                #
----------------------------------------------------------------------------------------------------
01. <vortex.data.handlers.Handler object at 0x7f8de921e3d0>
  Complete  : True
  Container : <vortex.data.containers.SingleFile object at 0x7f8de8d98850 | path='fromarchive_listing.0001'>
  Provider  : <vortex.data.providers.VortexFreeStd object at 0x7f8de8d98ad0 | namespace='vortex.multi.fr' block='forecast'>
  Resource  : <common.data.logs.Listing object at 0x7f8de926c750 | model='arpege' date='2020-12-01T00:00:00Z' cutoff='production' task='forecast' part='all' binary='arpege'>

# ----  Action GET on vortex://vortex-free.multi.fr/play/sandbox/ABCD@meunierlf/20201201T0000P/mb001/forecast/listing.arpege-forecast.all?stackfmt=filespack&stackpath=play%2Fsandbox%2FABCD%40meunierlf%2F202012

# [2020/12/10-19:41:00][vortex.data.abstractstores][incacheget:1231][INFO]: incacheget on vortex://vortex-free.cache-mt.fr//play/sandbox/ABCD@meunierlf/20201201T0000P/mb001/forecast/listing.arpege-forecast.all (to: fromarchive_listing.0001)
# [2020/12/10-19:41:00][vortex.data.abstractstores][incacheget:1231][INFO]: incacheget on vortex://vortex-free.stacked-cache-mt.fr/play/sandbox/ABCD@meunierlf/20201201T0000P/stacks/flow_logs.filespack/mb001/forecast/listing.arpege-forecast.all (to: fromarchive_listing.0001)
# [2020/12/10-19:41:00][vortex.data.abstractstores][_verbose_log:0431][INFO]: Multistore get vortex://vortex-free.cache.fr: none of the opened store succeeded.
# [2020/12/10-19:41:00][vortex.data.abstractstores][inarchiveget:0847][INFO]: inarchiveget on vortex://vortex-free.archive-legacy.fr//home/meunierlf/vortex/play/sandbox/ABCD/20201201T0000P/mb001/forecast/listing.arpege-forecast.all (to: fromarchive_listing.0001)
# [2020/12/10-19:41:00][vortex.tools.storage][_ftpretrieve:06


# ----  Result from get: [True]  ---- #


__Technical note:__ The usual location (i.e. outside of the stack) is looked up first, and this fails. The "stack" location is then looked up: the whole "stack" is retrieved and refilled into the Vortex cache store for later use.
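
This lookup order can be summarised with a toy model in which dictionaries play the role of the stores. It is only a sketch of the behaviour described in this note. The function name `fetch`, its arguments and the source labels are hypothetical and do not belong to Vortex.

```python
def fetch(path, stack, cache, cache_stacks, archive, archive_stacks):
    """Return (content, source) following the lookup order described above."""
    # 1. Usual location in cache.
    if path in cache:
        return cache[path], "cache"
    # 2. Stacked location in cache.
    if path in cache_stacks.get(stack, {}):
        return cache_stacks[stack][path], "stacked-cache"
    # 3. Usual location in the archive.
    if path in archive:
        return archive[path], "archive"
    # 4. Archived stack: retrieve it as a whole and refill the cache with it.
    if path in archive_stacks.get(stack, {}):
        cache_stacks[stack] = dict(archive_stacks[stack])
        return cache_stacks[stack][path], "archived-stack"
    raise KeyError(path)


archive_stacks = {
    "flow_logs": {"mb001/listing": "listing #1", "mb002/listing": "listing #2"},
}
cache, cache_stacks, archive = {}, {}, {}

# The first request goes all the way to the archived stack ...
first = fetch("mb001/listing", "flow_logs",
              cache, cache_stacks, archive, archive_stacks)
# ... while the next one is served from the refilled cache.
second = fetch("mb002/listing", "flow_logs",
               cache, cache_stacks, archive, archive_stacks)
```

The design choice illustrated here is that the cost of retrieving the whole "stack" is paid only once: every later request for a member of the same "stack" is answered from the cache.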

The whole "stack" is now available in cache:

In [11]:
rh_l1s[0].check(incache=True)

os.stat_result(st_mode=33060, st_ino=571983283, st_dev=64517, st_nlink=3, st_uid=2022, st_gid=20103, st_size=25, st_atime=1607625660, st_mtime=1607625660, st_ctime=1607625660)

For example, the listing of member 2 can now easily be retrieved (from the cache, since the stack has been refilled):

In [12]:
rh_l1s = toolbox.input(
    now=True, role='Listing',
    # Resource
    kind='listing',
    task='forecast',
    model='arpege',
    date='2020120100',
    cutoff='production',
    # Provider
    experiment='ABCD@meunierlf',
    block='forecast',
    member=2,
    namespace='vortex.multi.fr',
    local='fromarchive_listing.[member%04d]',
)

# New input section with options :
+ role         = Listing

# Resource handler description :
+ block        = forecast
+ cutoff       = production
+ date         = 2020120100
+ experiment   = ABCD@meunierlf
+ insitu       = False
+ kind         = listing
+ local        = fromarchive_listing.[member%04d]
+ member       = 2
+ model        = arpege
+ namespace    = vortex.multi.fr
+ task         = forecast

# This command options :
+ complete     = False
+ loglevel     = None
+ now          = True
+ verbose      = True


----------------------------------------------------------------------------------------------------
# Early-get for all resources.                                                                     #
----------------------------------------------------------------------------------------------------


# [2020/12/10-19:41:00][vortex.toolbox][add_section:0364][INFO]: Early-get was unavailable for all of the resources.



----------------------------------------------------------------------------------------------------
# Resource no 01/01                                                                                #
----------------------------------------------------------------------------------------------------
01. <vortex.data.handlers.Handler object at 0x7f8de8a83250>
  Complete  : True
  Container : <vortex.data.containers.SingleFile object at 0x7f8de8a833d0 | path='fromarchive_listing.0002'>
  Provider  : <vortex.data.providers.VortexFreeStd object at 0x7f8de8a834d0 | namespace='vortex.multi.fr' block='forecast'>
  Resource  : <common.data.logs.Listing object at 0x7f8de8a83350 | model='arpege' date='2020-12-01T00:00:00Z' cutoff='production' task='forecast' part='all' binary='arpege'>

# ----  Action GET on vortex://vortex-free.multi.fr/play/sandbox/ABCD@meunierlf/20201201T0000P/mb002/forecast/listing.arpege-forecast.all?stackfmt=filespack&stackpath=play%2Fsandbox%2FABCD%40meunierlf%2F202012

# [2020/12/10-19:41:00][vortex.data.abstractstores][incacheget:1231][INFO]: incacheget on vortex://vortex-free.cache-mt.fr//play/sandbox/ABCD@meunierlf/20201201T0000P/mb002/forecast/listing.arpege-forecast.all (to: fromarchive_listing.0002)
# [2020/12/10-19:41:00][vortex.data.abstractstores][incacheget:1231][INFO]: incacheget on vortex://vortex-free.stacked-cache-mt.fr/play/sandbox/ABCD@meunierlf/20201201T0000P/stacks/flow_logs.filespack/mb002/forecast/listing.arpege-forecast.all (to: fromarchive_listing.0002)
# [2020/12/10-19:41:00][vortex.data.abstractstores][incacheget:1245][INFO]: incacheget retrieve rc=True location=/home/meunierlf/cache/vortex/play/sandbox/ABCD@meunierlf/20201201T0000P/stacks/flow_logs.filespack/mb002/forecast/listing.arpege-forecast.all



# ----  Result from get: [True]  ---- #
