# The Uget/Uenv providers: A way to rob the official GCO store

## Introduction

The **Uget**/**Uenv** providers have been designed to:

  * allow any user to start from an existing GCO "Genv", modify it and use it in any Vortex script (written manually or generated automatically with Olive);
  * create a new "Genv" from scratch in order to build a completely new system. Once the new system has been extensively tested, it is handed over to GCO, which will create an official "Genv".

The **Uenv**/**Uget** architecture is strongly inspired by the **Gget**/**Genv** system:

  * The **Uget** store and provider are used jointly to retrieve *elements* identified by their unique ID keys. With **Uget**, the ID key (later referred to as *UgetID*) must follow a precise syntax: `uget:unique_id@location`. Here, *location* is the account or project where the data is stored (typically the user's logname) and *unique_id* is the identifier of the *element* (it must be unique within a given location).
  * The **Uenv** provider reads a mapping between *UenvKey*s and **Uget**/**Gget** *elements*. It then retrieves the targeted *element* using either the **Uget** or the **Gget** store. The file that maps *UenvKey*s to *UgetID*s or **Gget** IDs is later referred to as a *UenvCycle* (the *UenvCycle* mapping file is itself stored using the **Uget** store).
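The following minimal sketch splits a *UgetID* into its two parts, to make the syntax concrete. The `parse_uget_id` helper and its regular expression are hypothetical illustrations and are not part of the Vortex API. They assume that neither part may contain `@` or whitespace:

```python
import re

# Hypothetical helper (not part of Vortex): split a UgetID such as
# "uget:demo.constant.01@ugetdemo" into its unique_id and location parts.
# ASSUMPTION: neither part contains '@' or whitespace.
_UGET_ID_RE = re.compile(r'^uget:(?P<unique_id>[^@\s]+)@(?P<location>[^@\s]+)$')


def parse_uget_id(uget_id):
    """Return (unique_id, location) for a well-formed UgetID, else raise ValueError."""
    match = _UGET_ID_RE.match(uget_id)
    if match is None:
        raise ValueError('Not a valid UgetID: {!r}'.format(uget_id))
    return match.group('unique_id'), match.group('location')


print(parse_uget_id('uget:demo.constant.01@ugetdemo'))
# ('demo.constant.01', 'ugetdemo')
```

A plain **Gget** identifier such as `var.sat.misc_rtcoef.23.tgz` has no `uget:` prefix, so this helper rejects it.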

On the Vortex script side, the **Uget**/**Uenv** providers can only retrieve data. However, a command-line interface is provided in the *bin* sub-directory to help manage the **Uget**/**Uenv** stores. This command-line interface, `uget.py`, is self-documented. At its prompt, type "help" to get a list of available commands, or "help command-name" to get detailed information on a given command.

## Structure of the **Uget** stores

![](../images/uget_store.png) 


The **Uget** MultiStore (`uget.multi.fr`) is the one usually used. It has three tiers (see the diagram above):

  * the first tier is the *Hack* store. Most of the time it should be empty, but it can be used to create new *elements* (hence the name *Hack* store). When the developer is happy with the new *elements*, they should be pushed to the third tier of the MultiStore so that everyone can see them;
  * the second tier is a regular cache store whose purpose is to speed up data retrieval. If an *element* is missing from the *Cache* store, it is fetched from the third tier and refiled in the cache, so that it is available for any subsequent call. The *Cache* store is not intended to be a persistent storage area. It can therefore be cleaned up regularly to free disk space;
  * the third (and last) tier is the persistent storage area of the **Uget** system. It is located on a mass archive system and should not be cleaned.
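The lookup order described above can be modelled with a toy class. This sketch rests on simplifying assumptions: each tier is a plain Python dict, and `ToyMultiStore` is a hypothetical name. It is not the actual `uget.multi.fr` implementation. It only illustrates the Hack, then Cache, then Archive order, and how the cache is refilled on an archive hit:

```python
# Hypothetical model of the three-tier lookup (not the Vortex implementation):
# each tier is represented by a plain dict mapping element keys to contents.


class ToyMultiStore(object):

    def __init__(self, hack, cache, archive):
        self.hack = hack
        self.cache = cache
        self.archive = archive

    def get(self, key):
        """Look in Hack, then Cache, then Archive; refill the cache on an archive hit."""
        if key in self.hack:
            return self.hack[key], 'hack'
        if key in self.cache:
            return self.cache[key], 'cache'
        if key in self.archive:
            # Refile the element so that the next call is a cache hit
            self.cache[key] = self.archive[key]
            return self.archive[key], 'archive'
        raise KeyError(key)


store = ToyMultiStore(hack={}, cache={},
                      archive={'data/demo.constant.01@ugetdemo': "I'm a fraud...\n"})
print(store.get('data/demo.constant.01@ugetdemo')[1])  # archive
print(store.get('data/demo.constant.01@ugetdemo')[1])  # cache
```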

On the first and second tiers, the data hierarchy is relatively simple: `location/type/unique_id` where *location* and *unique_id* are taken from the *UgetID* while *type* can be either 'data' or 'env' depending on whether the stored *element* is a regular data file or a *UenvCycle* mapping file.
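The following sketch builds the first- and second-tier path of an *element*. `local_tier_path` is a hypothetical helper. The root directory in the example is the *Hack* store entry that appears in the `uget.py` outputs of this document:

```python
import posixpath


def local_tier_path(root, location, kind, unique_id):
    """Hypothetical helper: location/type/unique_id layout used by tiers 1 and 2."""
    if kind not in ('data', 'env'):
        raise ValueError("type must be 'data' or 'env'")
    return posixpath.join(root, location, kind, unique_id)


print(local_tier_path('/home/meunierlf/.vortexrc/hack/uget',
                      'ugetdemo', 'data', 'demo.constant.01'))
# /home/meunierlf/.vortexrc/hack/uget/ugetdemo/data/demo.constant.01
```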

On the third (and last) tier, the data layout is a bit more complicated. The root directory, later referred to as *archivebase*, depends on the *location* specified in the *UgetID* key (this can be configured using Vortex's configuration files). Below this *archivebase*, the data hierarchy is organised as `type/hashkey/unique_id`. Here, *type* and *unique_id* have the same meaning as above, and *hashkey* is a single hexadecimal digit computed by applying a hash function to the *unique_id*. This hashing mechanism prevents too many *elements* from piling up in a single directory.
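A similar sketch for the third tier follows. The actual hash function used by Vortex is not described here, so MD5 serves purely as a stand-in (an assumption). The digit it produces will not necessarily match the one Vortex computes. The *archivebase* value is borrowed from the `uget.py check` output shown later, and `archive_path` is a hypothetical name:

```python
import hashlib
import posixpath


def archive_path(archivebase, kind, unique_id):
    """Hypothetical type/hashkey/unique_id layout below archivebase.

    ASSUMPTION: hashkey is the first hex digit of the MD5 digest of unique_id;
    the real Vortex hash function may differ. With one hex digit, elements are
    spread over at most 16 sub-directories per type.
    """
    hashkey = hashlib.md5(unique_id.encode('utf-8')).hexdigest()[0]
    return posixpath.join(archivebase, kind, hashkey, unique_id)


path = archive_path('~meunierlf/ugetdemo/uget', 'data', 'demo.constant.01')
print(path)
```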

Since third-tier data is available to all users (and possibly mirrored in several second-tier caches), an *element* should never be modified once it has been pushed to the third tier.

Note: The **Uget** store behaves like the GCO **Gget** store. It automatically untars files whose local name ends with `.tar`, `.tgz` or `.tar.gz`, and the `extract` feature can be used (i.e. a particular file can be extracted from a Tar file).
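The suffix rule above can be sketched with the standard `tarfile` module. `should_untar` and `maybe_untar` are hypothetical helpers that illustrate the behaviour. They are not the Uget store's own code:

```python
import tarfile


def should_untar(local_name):
    """Hypothetical check: True if the local name ends with .tar, .tgz or .tar.gz."""
    return local_name.endswith(('.tar', '.tgz', '.tar.gz'))


def maybe_untar(local_name, destination='.'):
    """Extract local_name into destination when its suffix calls for it."""
    if not should_untar(local_name):
        return False
    # The default 'r' mode transparently handles uncompressed and gzipped archives
    with tarfile.open(local_name) as tar:
        tar.extractall(destination)
    return True


print(should_untar('the_rtcoef.tgz'))  # True
print(should_untar('the_demo'))        # False
```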

## Example: Creating a customised Uenv starting from scratch

```
%load_ext ivortex
%vortex tmpcocoon
import common, gco
```

```
Vortex 1.1.0 loaded ( Monday 06. March 2017, at 17:47:41 )

# [2017/03/06-17:47:41][vortex.sessions][_set_rundir:0150][INFO]: Session <root> set rundir </home/meunierlf/vortex-workdir/auto_cocoon_5sMjCW>

The working directory is now: /home/meunierlf/vortex-workdir/auto_cocoon_5sMjCW/root
```


Let's consider an existing resource, `common.data.consts.RtCoef`, which expects the "rtcoef_tgz" *Uenv/Genv* key, and an entirely new resource defined below:

```python
class MyDemoResource(common.data.consts.GenvModelResource):
    _footprint = dict(
        info = 'Demo constant file',
        attr = dict(
            kind = dict(
                values  = [ 'democst' ]
            ),
            gvar = dict(
                default = 'demo_cstfile'
            ),
        )
    )

    @property
    def realkind(self):
        return 'democst'
```

First, we need to create the proper directory structure in the **Uget** *Hack* store:

```
# The uget.py command is used
!uget.py bootstrap_hack ugetdemo
# NB: In real life, replace "ugetdemo" by your logname
```

```
Vortex 1.1.0 loaded ( Monday 06. March 2017, at 17:47:42 )
/home/meunierlf/.vortexrc/hack/uget/ugetdemo/env created (if necessary)
/home/meunierlf/.vortexrc/hack/uget/ugetdemo/data created (if necessary)
Vortex 1.1.0 completed ( Monday 06. March 2017, at 17:47:42 )
```


Now, let's create the *UenvCycle* needed for this demo:

```python
with open(sh.path.join(e.HOME, '.vortexrc/hack/uget/ugetdemo/env/cy00_demo.01'), 'w') as fh:
    fh.write("\n".join(['RTCOEF_TGZ=var.sat.misc_rtcoef.23.tgz',
                        'DEMO_CSTFILE=uget:demo.constant.01@ugetdemo']))
# Of course, outside of a notebook, a sane person would do that with vi, emacs or whatever...
```

...and a fake constant file:

```python
with open(sh.path.join(e.HOME, '.vortexrc/hack/uget/ugetdemo/data/demo.constant.01'), 'w') as fh:
    fh.write("I'm a fraud...\n")
```

That's it! We can start our tests:

```python
rhcoeffs = toolbox.rh(model='arpege',
                      kind='rtcoef',
                      genv='uenv:cy00_demo.01@ugetdemo',
                      local='the_rtcoef.tgz')
```

```python
# Where will the data be looked for ?
rhcoeffs.location()
```

```
# [2017/03/06-17:47:42][vortex.data.stores][incacheget:1421][INFO]: incacheget on uget://uget.hack.fr/ugetdemo/env/cy00_demo.01 (to: <StringIO.StringIO instance at 0x7f24e8367a28>)

'gget://gco.meteo.fr/tampon/var.sat.misc_rtcoef.23.tgz'
```

```python
# Note: Since it was the first use of the 'cy00_demo.01' UenvCycle it's been initialised (see the logger message above)
```

```python
rhdemo = toolbox.rh(model='arpege',
                    kind='democst',
                    genv='uenv:cy00_demo.01@ugetdemo',
                    local='the_demo')
```

```python
# Where will the data be looked for ?
rhdemo.location()
```

```
'uget://uget.multi.fr/data/demo.constant.01@ugetdemo'
```

Let's summarise. As requested in the *UenvCycle* file, the RtCoef data will be fetched using the **Gget** store, whereas the DemoCst data will come from our fake `demo.constant.01` file.
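The resolution step can be sketched in two parts. First, parse the `KEY=value` lines of the *UenvCycle*. Then route each target to **Uget** or **Gget** depending on its `uget:` prefix. This is a hypothetical model, not Vortex code. In particular, it assumes that the *UenvKey* is the upper-cased `gvar` of the resource (`demo_cstfile` becomes `DEMO_CSTFILE`). That assumption is consistent with this example but is not confirmed here:

```python
# Hypothetical sketch, not Vortex code: parse a UenvCycle mapping and decide
# which store each entry would be fetched from.
CYCLE_TEXT = """RTCOEF_TGZ=var.sat.misc_rtcoef.23.tgz
DEMO_CSTFILE=uget:demo.constant.01@ugetdemo"""


def parse_uenv_cycle(text):
    """Return a dict mapping each UenvKey to its Uget/Gget identifier."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or '=' not in line:
            continue  # skip blank or malformed lines
        key, value = line.split('=', 1)
        mapping[key.strip()] = value.strip()
    return mapping


def resolve(mapping, gvar):
    """ASSUMPTION: the UenvKey is the upper-cased gvar ('demo_cstfile' -> 'DEMO_CSTFILE')."""
    target = mapping[gvar.upper()]
    store = 'uget' if target.startswith('uget:') else 'gget'
    return store, target


cycle = parse_uenv_cycle(CYCLE_TEXT)
print(resolve(cycle, 'rtcoef_tgz'))    # ('gget', 'var.sat.misc_rtcoef.23.tgz')
print(resolve(cycle, 'demo_cstfile'))  # ('uget', 'uget:demo.constant.01@ugetdemo')
```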

Let's try to retrieve the constant file:

```python
rhdemo.get()
```

```
# [2017/03/06-17:47:42][vortex.data.stores][incacheget:1421][INFO]: incacheget on uget://uget.hack.fr/ugetdemo/data/demo.constant.01 (to: the_demo)

True
```

```
%ls
```

```
the_demo
```


```
%cat the_demo
```

```
I'm a fraud...
```


After a lot of testing, we are very happy with our work. It's time to push the new *elements* to the Archive store (Third tier). The `uget.py` script offers such a feature:

```
!uget.py push env cy00_demo.01@ugetdemo
```

```
Vortex 1.1.0 loaded ( Monday 06. March 2017, at 17:47:43 )
Digging into this particular Uenv:
Uploading: uget:demo.constant.01@ugetdemo
Unchecked: var.sat.misc_rtcoef.23.tgz

Vortex 1.1.0 completed ( Monday 06. March 2017, at 17:47:49 )
```


That's great: the new environment has been pushed to the third-tier store, and the `uget:demo.constant.01@ugetdemo` *element* was uploaded as well. Now that the data are safely stowed away in the archive, we no longer need them in the *Hack* store. The `uget.py` script can be used to get rid of them safely (only data that are also available in the third tier will be deleted):

```
!echo "y" | uget.py clean_hack
```

```
Vortex 1.1.0 loaded ( Monday 06. March 2017, at 17:47:49 )
The following elements will be deleted:
  /data/demo.constant.01@ugetdemo
  /env/cy00_demo.01@ugetdemo
Please, confirm... [y/N/q] Vortex 1.1.0 completed ( Monday 06. March 2017, at 17:47:49 )
```
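The safety rule applied by `clean_hack` (only delete what the archive already holds) can be sketched as a simple membership test. `elements_safe_to_delete` and the `draft.02` element are hypothetical. A real implementation might also compare file contents, which this sketch does not do:

```python
def elements_safe_to_delete(hack_elements, archive_elements):
    """Hypothetical rule: only Hack elements already present in the archive may go."""
    archived = set(archive_elements)
    return sorted(elt for elt in hack_elements if elt in archived)


# 'draft.02' is a made-up element that was never pushed, so it must be kept
hack = ['/data/demo.constant.01@ugetdemo', '/env/cy00_demo.01@ugetdemo',
        '/data/draft.02@ugetdemo']
archive = ['/data/demo.constant.01@ugetdemo', '/env/cy00_demo.01@ugetdemo']
print(elements_safe_to_delete(hack, archive))
# ['/data/demo.constant.01@ugetdemo', '/env/cy00_demo.01@ugetdemo']
```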


If we fetch the `the_demo` file again, it should be retrieved from the third tier store (and refiled in the cache store):

```python
sh.rm('the_demo')
rhdemo.get()
```

```
# [2017/03/06-17:47:50][vortex.data.stores][incacheget:1421][INFO]: incacheget on uget://uget.hack.fr/ugetdemo/data/demo.constant.01 (to: the_demo)
# [2017/03/06-17:47:50][vortex.data.stores][_verbose_log:0262][INFO]: UgetHackCacheStore get on the_demo was not successful (rc=False)
# [2017/03/06-17:47:50][vortex.data.stores][incacheget:1421][INFO]: incacheget on uget://uget.cache.fr/ugetdemo/data/demo.constant.01 (to: the_demo)
# [2017/03/06-17:47:50][vortex.data.stores][_verbose_log:0262][INFO]: UgetMtCacheStore get on the_demo was not successful (rc=False)
# [2017/03/06-17:47:50][vortex.data.stores][_load_config:1059][INFO]: Some store configuration data is needed (for uget://uget.archive.fr)
# [2017/03/06-17:47:50][vortex.data.stores][_load_config:1066][INFO]: Reading config file: @store-uget.ini
# [2017/03/06-17:47:50][vortex.data.stores][_load_config:1073][INFO]: Reading config file: @store-uget-hendrix.ini
# [2017/03/06-17:47:50][vortex.data.stores][ftpget:0975][INFO]: ftpget on

True
```

## The uget.py script magic

`uget.py` can be talkative:

```
!uget.py info
```

```
Vortex 1.1.0 loaded ( Monday 06. March 2017, at 17:47:50 )
Default location: None
Hack store      : <gco.data.stores.UgetHackCacheStore object at 0x7f1ad5c27f90 | entry=/home/meunierlf/.vortexrc/hack/uget>
Archive store   : <gco.data.stores.UgetArchiveStore object at 0x7f1ad5c37190 | hostname=hendrix.meteo.fr>

Vortex 1.1.0 completed ( Monday 06. March 2017, at 17:47:50 )
```


It is possible to check whether a given *element* exists:

```
!uget.py check env cy00_demo.01@ugetdemo
```

```
Vortex 1.1.0 loaded ( Monday 06. March 2017, at 17:47:51 )
Hack   : MISSING (/home/meunierlf/.vortexrc/hack/uget/ugetdemo/env/cy00_demo.01)
Archive: Ok      (meunierlf@hendrix.meteo.fr:~meunierlf/ugetdemo/uget/env/8/cy00_demo.01)

Digging into this particular Uenv:
  DEMO_CSTFILE                        : Archive        (uget:demo.constant.01@ugetdemo)
  RTCOEF_TGZ                          : unchecked      (var.sat.misc_rtcoef.23.tgz)

Vortex 1.1.0 completed ( Monday 06. March 2017, at 17:47:51 )
```


```
!uget.py check data demo.constant.01@ugetdemo
```

```
Vortex 1.1.0 loaded ( Monday 06. March 2017, at 17:47:52 )
Hack   : MISSING (/home/meunierlf/.vortexrc/hack/uget/ugetdemo/data/demo.constant.01)
Archive: Ok      (meunierlf@hendrix.meteo.fr:~meunierlf/ugetdemo/uget/data/a/demo.constant.01)

Vortex 1.1.0 completed ( Monday 06. March 2017, at 17:47:53 )
```


Of course, it is also possible to retrieve any existing *element*:

```
!uget.py pull env cy00_demo.01@ugetdemo
```

```
Vortex 1.1.0 loaded ( Monday 06. March 2017, at 17:47:53 )

RTCOEF_TGZ=var.sat.misc_rtcoef.23.tgz
DEMO_CSTFILE=uget:demo.constant.01@ugetdemo

Vortex 1.1.0 completed ( Monday 06. March 2017, at 17:47:55 )
```


```
!uget.py pull data demo.constant.01@ugetdemo
```

```
Vortex 1.1.0 loaded ( Monday 06. March 2017, at 17:47:55 )
Vortex 1.1.0 completed ( Monday 06. March 2017, at 17:47:57 )
```


```
%ls
```

```
demo.constant.01  the_demo
```


The `uget.py` script can be used as a shell command (as demonstrated above), but it also provides an interactive prompt: just type `uget.py` and discover what it can do for you! (The *help* command is a good place to start.)

## Other use cases

The example above is probably not the most common use case since, most of the time, we will want to hack existing GCO cycles. `uget.py` can help you achieve this:

```
!uget.py help hack
```

```
Vortex 1.1.0 loaded ( Monday 06. March 2017, at 17:47:57 )

        Retrieve an element and place it in the Hack store.

        Syntax: hack [data|env|gdata|genv] SourceId into UgetId

        * The kacked element may originate from different sources:
          * hack data: A Uget element is looked for (in such a case,
            'SourceId' must be a valid UgetId)
          * hack env: A Uget environment file is looked for (in such
            a case, 'SourceId' must be a valid UgetId)
          * hack gdata: A Gget element is looked for (in such a case
            'SourceId' must be a valid gget identifier).
          * hack genv: A genv cycle is looked for (in such a case
            'SourceId' must be a valid genv identifier).
        * Once the source element is retrieved, it is saved to the Hack
          store using the UgetId identifier
        * The 'UgetId' identifies a Uget element. Its formed of an *element_name* and
          of a *location* : it looks like 'element_name@
```

## Final notes for IFS/Arpege users

Most of the *IFS/Arpege* binaries and AlgoComponents use the GCO Genv cycle name to detect the source code version number and act accordingly. For example, from *Arpege/IFS* version 41 onward, no command-line arguments are added when the model is started. A side effect of this feature is that you should be really careful when choosing a *UenvCycle* name. It is recommended that you keep the official GCO naming convention and add a suffix of your choice.

For example, if you start from the `cy42_op2.11` GCO cycle, you may choose something like: `cy42_op2.11a`.
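Since the cycle name drives version detection, a quick sanity check on a *UenvCycle* name may help. The sketch below is hypothetical and is not how the IFS/Arpege AlgoComponents actually parse cycle names. It simply assumes that the version is the number following the leading `cy`, and that a hacked cycle must extend the GCO name with a non-empty suffix:

```python
import re

# Hypothetical check (not Vortex code): extract the cycle number from a
# GCO-style name and verify that a UenvCycle name extends the GCO name.
_CYCLE_RE = re.compile(r'^cy(?P<number>\d+)')


def cycle_number(name):
    """Return the integer after the leading 'cy', or None if there is none."""
    match = _CYCLE_RE.match(name)
    return int(match.group('number')) if match else None


def keeps_gco_prefix(uenv_cycle, gco_cycle):
    """True if uenv_cycle is gco_cycle followed by a non-empty suffix."""
    return uenv_cycle.startswith(gco_cycle) and uenv_cycle != gco_cycle


print(cycle_number('cy42_op2.11a'))                      # 42
print(keeps_gco_prefix('cy42_op2.11a', 'cy42_op2.11'))   # True
print(keeps_gco_prefix('my_test_cycle', 'cy42_op2.11'))  # False
```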