# Versioned multi-dimensional storage

### Features:
- Multiple branches
- Multiple collaborators
- Jump to any point in the history at any time
- Data is not replicated, and reads and writes incur no extra cost

### Data structure:

![Data structure](img/datastucture.png)
![Data structure](img/datastucture2.png)

Every change is a new commit:

![Commits](img/commits.png)
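
As a rough illustration of the idea only (the figures above remain the reference), the behaviour seen in the demo further down can be modelled with an index that maps each grid position to a numbered raw file. In this sketch every write allocates a new raw file and every commit snapshots the index, so older versions stay readable without copying chunk data. The class `ToyVersionedIndex` and all of its methods are hypothetical and are not part of the library.

```python
class ToyVersionedIndex:
    """Hypothetical in-memory model, not the library's implementation."""

    def __init__(self):
        self.index = {}      # grid position -> raw file number
        self.raw = {}        # raw file number -> chunk payload (never overwritten)
        self.commits = {}    # commit id -> frozen copy of the index
        self._next_file = 1  # raw files are numbered from 1

    def write_block(self, grid_position, payload):
        # A write never touches an existing raw file; it allocates a new one.
        file_id = self._next_file
        self._next_file += 1
        self.raw[file_id] = payload
        self.index[grid_position] = file_id
        return file_id

    def commit(self, commit_id):
        # Only the small index is snapshotted, not the chunk payloads.
        self.commits[commit_id] = dict(self.index)

    def checkout(self, commit_id):
        self.index = dict(self.commits[commit_id])

    def read_block(self, grid_position):
        file_id = self.index.get(grid_position)
        return None if file_id is None else self.raw[file_id]


store = ToyVersionedIndex()
store.write_block((0, 0, 0), "v1")
store.commit("c1")
store.write_block((0, 0, 0), "v2")  # new raw file; "v1" is kept
store.commit("c2")

store.checkout("c1")
print(store.read_block((0, 0, 0)))  # v1
store.checkout("c2")
print(store.read_block((0, 0, 0)))  # v2
```

Because a checkout only swaps the index, switching between versions in this model costs the same regardless of how much chunk data exists.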

### How to:
#### Create data:
`data = VersionedZarrData(root_path=PATH, dimension=DIMS, chunk_size=CHUNK_SIZE)`

`data.create(overwrite=True)`

#### Open data:
`data = open_versioned_data(PATH)`

#### Commit a new modification:
`data.write_block(data=dummy_data, grid_position=grid_position)`

#### Check out a branch:
`data.git.checkout_branch(branch_name=BRANCH_NAME, create=True)`

#### Check out a historical commit:
`data.git.checkout_branch(COMMIT_ID)`

Benchmarking of different scenarios is currently in progress:



![](img/benchmarking/best_chunk_all.png)
![](img/benchmarking/best_chunk_commit_1.png)
![](img/benchmarking/git_compression.png)

# Demo: VersionedData Zarr storage

In [1]:
import sys
import numpy as np
import zarr

### Import our library
`VersionedDataStore` is a Zarr storage class.

In [2]:
from versionedzarrlib import VersionedDataStore

In [3]:
path = "/Users/zouinkhim/Desktop/versioned_data"
dims = (600, 600, 600)
chunk_size = (128, 128, 128)

#### Create new data

In [4]:
data = VersionedDataStore(path=path, shape=dims, raw_chunk_size=chunk_size)
data.create(overwrite=True)

Grid dimensions: [5, 5, 5]
Start file creation ..
File already exists ! 
File will be deleted !
{'zarr_format': 2, 'shape': (600, 600, 600), 'chunks': (128, 128, 128), 'dtype': dtype('int8'), 'compressor': {'blocksize': 0, 'clevel': 5, 'cname': 'lz4', 'id': 'blosc', 'shuffle': 1}, 'fill_value': 0, 'order': 'C', 'filters': None, 'total_chunks': 1}
Dataset created!
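
The `Grid dimensions: [5, 5, 5]` line in the output above is consistent with a ceiling division of the array shape by the chunk size along each axis. The helper below is a minimal sketch of that arithmetic; the name `grid_dimensions` is hypothetical and is not taken from the library.

```python
def grid_dimensions(shape, chunk_size):
    """Number of chunks needed along each axis (ceiling division)."""
    return [(s + c - 1) // c for s, c in zip(shape, chunk_size)]


grid = grid_dimensions((600, 600, 600), (128, 128, 128))
print(grid)  # [5, 5, 5]
```

Four chunks of 128 cover only 512 voxels, so a fifth, partially filled chunk is needed along each 600-voxel axis.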


#### Open in Zarr

In [5]:
data2 = VersionedDataStore.open(path=path)

{'zarr_format': 2, 'shape': (600, 600, 600), 'chunks': (128, 128, 128), 'dtype': dtype('int8'), 'compressor': {'blocksize': 0, 'clevel': 5, 'cname': 'lz4', 'id': 'blosc', 'shuffle': 1}, 'fill_value': 0, 'order': 'C', 'filters': None, 'total_chunks': 1}
Grid dimensions: [5, 5, 5]


In [6]:
z = zarr.open(store=data2)
z.info

| Property | Value |
|---|---|
| Type | zarr.core.Array |
| Data type | int8 |
| Shape | (600, 600, 600) |
| Chunk shape | (128, 128, 128) |
| Order | C |
| Read-only | False |
| Compressor | Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0) |
| Store type | versionedzarrlib.data.VersionedDataStore |
| No. bytes | 216000000 (206.0M) |
| No. bytes stored | 388 |
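
The `No. bytes` figure reported above can be checked by hand: it is the number of elements times the item size, and `int8` takes one byte per element. The short sketch below reproduces the arithmetic with the standard library only.

```python
import math

shape = (600, 600, 600)
itemsize = 1  # bytes per element for int8

n_bytes = math.prod(shape) * itemsize
size_label = f"{n_bytes / 2**20:.1f}M"  # size in units of 2**20 bytes
print(n_bytes, size_label)  # 216000000 206.0M
```

The uncompressed size is therefore about 206 MiB, while only 388 bytes are stored at this point because no chunk has been written yet.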


In [7]:
z[500, 500, 500] = 5
z[10,10,10] = 10

New file /Users/zouinkhim/Desktop/versioned_data/raw/1
Writing (3, 3, 3)
New file /Users/zouinkhim/Desktop/versioned_data/raw/2
Writing (0, 0, 0)
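
The grid positions in the log above follow from integer division of the voxel coordinates by the chunk size: index 500 falls in chunk 3 along each axis, and index 10 falls in chunk 0. The helper names `grid_position` and `offset_in_chunk` below are hypothetical and only illustrate that mapping.

```python
def grid_position(voxel, chunk_size):
    """Grid position of the chunk that contains a voxel."""
    return tuple(v // c for v, c in zip(voxel, chunk_size))


def offset_in_chunk(voxel, chunk_size):
    """Position of the voxel relative to the origin of its chunk."""
    return tuple(v % c for v, c in zip(voxel, chunk_size))


chunk_size = (128, 128, 128)
print(grid_position((500, 500, 500), chunk_size))    # (3, 3, 3)
print(grid_position((10, 10, 10), chunk_size))       # (0, 0, 0)
print(offset_in_chunk((500, 500, 500), chunk_size))  # (116, 116, 116)
```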


In [8]:
z[7,10,10] = 9

File to open:/Users/zouinkhim/Desktop/versioned_data/raw/2
New file /Users/zouinkhim/Desktop/versioned_data/raw/3
Writing (0, 0, 0)


In [9]:
print(z[5:11,10,10])

File to open:/Users/zouinkhim/Desktop/versioned_data/raw/3
[ 0  0  9  0  0 10]
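
Only one raw file is opened for this read because the slice `5:11` at `(10, 10)` lies entirely inside chunk `(0, 0, 0)`. As a sketch of how the set of chunks touched by a rectangular region can be computed, assuming half-open `(start, stop)` ranges per axis, consider the hypothetical helper `chunks_touched`:

```python
from itertools import product


def chunks_touched(region, chunk_size):
    """Grid positions overlapping a region given as per-axis (start, stop),
    with stop exclusive."""
    axis_ranges = [
        range(start // c, (stop - 1) // c + 1)
        for (start, stop), c in zip(region, chunk_size)
    ]
    return list(product(*axis_ranges))


chunk_size = (128, 128, 128)
# z[5:11, 10, 10] expressed as half-open ranges
print(chunks_touched([(5, 11), (10, 11), (10, 11)], chunk_size))
# [(0, 0, 0)]

# A slice crossing the boundary at index 128 touches two chunks
print(chunks_touched([(120, 130), (10, 11), (10, 11)], chunk_size))
# [(0, 0, 0), (1, 0, 0)]
```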


In [10]:
# This chunk was never written, so there is no chunk file to open
print(z[300,300,300])

0


In [11]:
data.vc.show_history()

Committed by mzouink on Mon, 11 Apr 2022 14:39 with sha 61d5db31a5f4c9d27ba8958e1ce0dc6912a342fa
Committed by mzouink on Mon, 11 Apr 2022 14:39 with sha 1ee71b8e12b98e6574fa7afb7af939d0b57496ad
Committed by mzouink on Mon, 11 Apr 2022 14:39 with sha e88259ae8ff7466d78137b12f406387243d6cd20


In [22]:
data.vc.checkout_commit("1ee71b8e12b98e6574fa7afb7af939d0b57496ad")

In [23]:
print(z[5:11,10,10])

File to open:/Users/zouinkhim/Desktop/versioned_data/raw/2
[ 0  0  0  0  0 10]


In [14]:
data.vc.checkout_branch("dev",create=True)

Create new branch: dev


### Direct manipulation
Without going through `zarr.open`:

In [15]:
dummy_data = np.ones(data.raw_chunk_size, dtype='i1')  # 'i1' is int8, matching the dataset dtype ('i8' would be int64)

In [16]:
data2.write_block(dummy_data,(2,2,2))

In [17]:
data2.vc.add_all()
data2.vc.commit("Add block at {}".format((2, 2, 2)))

In [18]:
tmp = data2.get_chunk((1,2,2))

raw file for (1, 2, 2) is 0
No data valid for position: (1, 2, 2)


In [19]:
tmp = data2.get_chunk((2,2,2))

raw file for (2, 2, 2) is 4
/Users/zouinkhim/Desktop/versioned_data/raw/4
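
The two lookups above suggest that the index stores one integer per grid position, with `0` meaning that no raw file exists for that position. The sketch below models this with a flat, row-major list. The layout, the `NO_DATA` sentinel name, and the helpers `flat` and `raw_file_for` are assumptions made for illustration and do not describe the library's actual index format.

```python
GRID = (5, 5, 5)
NO_DATA = 0  # assumption: 0 marks a grid position that was never written

# One integer per grid position, all initially NO_DATA.
index = [NO_DATA] * (GRID[0] * GRID[1] * GRID[2])


def flat(position, grid=GRID):
    """Row-major (C-order) offset of a grid position."""
    offset = 0
    for p, g in zip(position, grid):
        offset = offset * g + p
    return offset


def raw_file_for(position):
    """Relative raw file path for a position, or None if nothing was written."""
    file_id = index[flat(position)]
    return None if file_id == NO_DATA else f"raw/{file_id}"


index[flat((2, 2, 2))] = 4  # mirrors the block written at (2, 2, 2)

print(raw_file_for((1, 2, 2)))  # None
print(raw_file_for((2, 2, 2)))  # raw/4
```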
