In [None]:
import sys
sys.path.insert(0, '../')

# First Steps

This section outlines the steps required to get started with the main features
of the library. Before starting, make sure the library is configured to run on
your machine.

## Initialization of the environment

First, we create a test dataset that will be used to populate our collection.

In [None]:
import zcollection.tests.data


def create_dataset():
    generator = zcollection.tests.data.create_test_dataset_with_fillvalue()
    return next(generator)


ds = create_dataset()
ds.to_xarray()

Then we will create a file system in memory.

In [None]:
import fsspec


fs = fsspec.filesystem('memory')

Finally, we create a local Dask cluster that uses only threads
(`processes=False`), so that every worker shares the file system stored in
memory.
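
The reason for using threads can be pictured with a small, library-independent sketch. Here a plain dictionary named `store` (a hypothetical stand-in, not the `fsspec` implementation) plays the role of the in-memory file system: worker threads of one process all write to the same object, whereas separate worker processes would each hold their own copy.

```python
from concurrent.futures import ThreadPoolExecutor

store = {}  # hypothetical stand-in for the in-memory file system


def write(path):
    """Write a small payload under ``path`` in the shared store."""
    store[path] = f"data for {path}"
    return path


# Threads live in the same process, so they all see the same ``store``
with ThreadPoolExecutor(max_workers=4) as pool:
    written = list(pool.map(write, ["/a", "/b", "/c"]))

# Every write made by a worker thread is visible from the main thread
print(sorted(store))
```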

In [None]:
import dask.distributed

cluster = dask.distributed.LocalCluster(processes=False)
client = dask.distributed.Client(cluster)
client

## Collection

This introduction describes the main operations available on a collection:
creating, opening, loading and modifying it.

Before creating our collection, we define the partitioning of our dataset. In
this example, we will partition the data by **month** using the variable
`time`.
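
To make the idea concrete, here is a minimal sketch of monthly partitioning written with the standard library only. It is not the code used by `zcollection`; the helper names `month_key` and `partition_by_month` are hypothetical, and the printed directory names merely imitate the `year=...` layout without claiming to match the exact format written on disk.

```python
from collections import defaultdict
from datetime import datetime


def month_key(timestamp):
    """Return the (year, month) partition key of a timestamp."""
    return (timestamp.year, timestamp.month)


def partition_by_month(timestamps):
    """Group timestamps by calendar month, preserving input order."""
    partitions = defaultdict(list)
    for timestamp in timestamps:
        partitions[month_key(timestamp)].append(timestamp)
    return dict(partitions)


samples = [
    datetime(2000, 1, 5),
    datetime(2000, 1, 20),
    datetime(2000, 2, 3),
]
partitions = partition_by_month(samples)

# Hypothetical directory names: the real on-disk format may differ
for (year, month), items in partitions.items():
    print(f"year={year}/month={month}: {len(items)} sample(s)")
```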

In [None]:
import zcollection

partition_handler = zcollection.partitioning.Date(("time", ), resolution="M")

Finally, we create our collection.

In [None]:
collection = zcollection.create_collection(axis="time",
                                           ds=ds,
                                           partition_handler=partition_handler,
                                           partition_base_dir="/my_collection",
                                           filesystem=fs)

---
**Note**

Once created, the collection can be reopened with the following command:

    >>> collection = zcollection.open_collection("/my_collection",
    ...                                          filesystem=fs)
---

Creating the collection also writes a configuration file. This file contains
all the metadata needed to ensure that data inserted later has the same
features as the existing data (data consistency).
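
As an illustration of what such a consistency check could look like, the sketch below compares a dataset with a stored description of its variables. The layout of `config` and the function `check_consistency` are hypothetical and much simpler than the metadata actually stored by the library.

```python
def check_consistency(config, dataset):
    """Raise ValueError if ``dataset`` does not match the stored ``config``."""
    expected = set(config["variables"])
    actual = set(dataset)
    if expected != actual:
        raise ValueError(f"variables differ: {sorted(expected ^ actual)}")
    for name, dtype in config["variables"].items():
        if not all(isinstance(value, dtype) for value in dataset[name]):
            raise ValueError(f"{name!r} does not hold {dtype.__name__} values")


# Hypothetical configuration: variable name -> expected Python type
config = {"variables": {"time": int, "var1": float}}

# A dataset with the same variables and types passes silently
check_consistency(config, {"time": [0, 1], "var1": [0.5, 1.5]})

# A dataset missing a variable is rejected
try:
    check_consistency(config, {"time": [0, 1]})
except ValueError as err:
    print(err)
```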

In [None]:
collection.metadata.get_config()

Now that the collection has been created, we can insert new records.

In [None]:
collection.insert(ds)

---
**Note**

When inserting data, it's possible to specify the merge strategy for a
partition. By default, the last inserted data overwrites the existing data.
Other strategies can be defined, for example to update existing data
(overwriting the updated data while keeping the rest). This last strategy
allows an existing partition to be updated incrementally.

    >>> import zcollection.merging
    >>>
    >>> collection.insert(
    ...     ds, merge_callable=zcollection.merging.merge_time_series)
---
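
The difference between the two strategies described in the note can be sketched with plain dictionaries mapping a time to a value. This is a simplified model under that assumption, not the algorithm implemented by `merge_time_series`; the names `merge_replace` and `merge_update` are hypothetical.

```python
def merge_replace(existing, inserted):
    """Overwrite: the inserted records replace the partition content."""
    return dict(inserted)


def merge_update(existing, inserted):
    """Keep existing records, overwriting only the times being inserted."""
    merged = dict(existing)
    merged.update(inserted)
    # Keep the records ordered along the time axis
    return dict(sorted(merged.items()))


existing = {1: "a", 2: "b", 3: "c"}  # time -> value already in the partition
inserted = {3: "C", 4: "D"}          # time -> value being inserted

print(merge_replace(existing, inserted))
print(merge_update(existing, inserted))
```

With the update strategy, the records at times 1 and 2 survive the insertion, which is what makes incremental updates of a partition possible.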

Let's look at the different partitions thus created.

In [None]:
fs.listdir("/my_collection/year=2000")

This collection is composed of several partitions, but it is always handled as a
single dataset.

### Loading data

To load the dataset, call the `load` method on the collection. By default, the
method loads all partitions stored in the collection.

In [None]:
collection.load()

You can also restrict loading to selected partitions by passing a filter
expression written in terms of the partitioning keys (here `year` and `month`).
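
One way to picture this selection is to evaluate the expression once per partition, with the partition keys as the only visible names. The sketch below does this with the built-in `eval`; how the library itself evaluates filters is not described here, so `select_partitions` is purely a hypothetical illustration (and `eval` should not be used on untrusted input).

```python
def select_partitions(partitions, expression=None):
    """Return the partitions matching ``expression``, or None if none match."""
    if expression is None:
        return list(partitions)
    code = compile(expression, "<filter>", "eval")
    selected = [
        keys for keys in partitions
        # Built-ins are hidden; only the partition keys can be referenced
        if eval(code, {"__builtins__": {}}, dict(keys))
    ]
    return selected or None


partitions = [
    {"year": 2000, "month": 1},
    {"year": 2000, "month": 2},
    {"year": 2001, "month": 2},
]

print(select_partitions(partitions, "year == 2000 and month == 2"))
print(select_partitions(partitions, "year == 2002 and month == 2"))
```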

In [None]:
collection.load("year == 2000 and month == 2")

Note that the `load` method returns `None` if no partition is selected.

In [None]:
collection.load("year == 2002 and month == 2") is None

### Editing variables

*The functions for modifying collections are not usable if the collection is
open in read-only mode.*

It's possible to delete a variable from a collection.

In [None]:
collection.drop_variable("var2")

In [None]:
collection.load()

**Warning**: The variable used for partitioning cannot be deleted, so the
following call is rejected.

In [None]:
collection.drop_variable("time")

The `add_variable` method allows you to add a new variable to the collection.

In [None]:
collection.add_variable(ds.metadata().variables["var2"])

The newly created variable is initialized with its default value.

In [None]:
collection.load().variables["var2"].values

Finally, it's possible to update existing variables.

In this example, we will alter the variable `var2` by setting it to 1 anywhere
the variable `var1` is defined.
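
The `* 0 + 1` idiom is worth a short illustration. The sketch below assumes that undefined cells are represented by NaN, which may differ from how the test dataset encodes its fill values; `ones_like_defined` is a hypothetical helper operating on a plain list.

```python
import math


def ones_like_defined(values):
    """Return 1 where a value is defined and NaN where it is undefined."""
    # NaN * 0 is still NaN, so undefined cells stay undefined after "+ 1"
    return [value * 0 + 1 for value in values]


var1 = [3.5, math.nan, -2.0]
var2 = ones_like_defined(var1)
print(var2)
```

Assigning a constant 1 everywhere would instead fill the undefined cells as well, which is why the result is derived from `var1`.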

In [None]:
def ones(ds):
    """Return an array shaped like `var1`, set to 1 where `var1` is defined."""
    return ds.variables["var1"].values * 0 + 1


collection.update(ones, "var2")

In [None]:
collection.load().variables["var2"].values

## Views

A view lets you extend, with additional variables, a collection that you are
not allowed to modify (the view reference).

In [None]:
view = zcollection.create_view("/my_view",
                               zcollection.view.ViewReference(
                                   "/my_collection", fs),
                               filesystem=fs)

When the view is created, it has no data of its own; it uses all the data
defined in the reference collection.
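
This behaviour can be modelled as an overlay: a read-only reference plus a set of variables owned by the view. The class below is a toy model under that assumption; `OverlayView` and the choice of `ValueError` are hypothetical and do not describe the library's implementation.

```python
class OverlayView:
    """Toy model of a view: read-only reference data plus extra variables."""

    def __init__(self, reference):
        self._reference = reference  # never modified by the view
        self._own = {}               # variables stored by the view itself

    def add_variable(self, name, values):
        """Add a variable owned by the view."""
        if name in self._reference:
            raise ValueError(f"{name!r} belongs to the reference")
        self._own[name] = values

    def drop_variable(self, name):
        """Delete a variable owned by the view."""
        if name not in self._own:
            raise ValueError(f"{name!r} is not a variable of the view")
        del self._own[name]

    def load(self):
        """Combine the reference variables with those owned by the view."""
        return {**self._reference, **self._own}


reference = {"time": [0, 1, 2], "var1": [1.0, 2.0, 3.0]}
toy_view = OverlayView(reference)

# A freshly created view exposes exactly the reference data
print(toy_view.load() == reference)

# Variables added to the view never touch the reference
toy_view.add_variable("var3", [1, 1, 1])
print(sorted(toy_view.load()), sorted(reference))
```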

In [None]:
fs.listdir("/my_view")

In [None]:
view.load()

In this state the view is not very interesting, but it is possible to add and
modify variables in order to enhance it.

In [None]:
var3 = ds.metadata().variables["var2"]
var3.name = "var3"

In [None]:
view.add_variable(var3)

This step creates all necessary partitions for the new variable.

In [None]:
fs.listdir("/my_view/year=2000")

The new variable is not initialized.

In [None]:
view.load().variables["var3"].values

Variables of the view are updated in the same way as those of a collection.

In [None]:
view.update(ones, "var3")

In [None]:
var3 = view.load().variables["var3"].values
var2 = view.load().variables["var2"].values
var2 - var3

**Warning**: The variables of the reference collection cannot be edited, so the
following call is rejected.

In [None]:
view.update(ones, "var2")

In [None]:
view.load()

Finally, the `drop_variable` method deletes variables from the view.

In [None]:
view.drop_variable("var3")

**Warning**: The variables of the reference collection cannot be deleted, so
the following call is rejected.

In [None]:
view.drop_variable("var2")