# A quick guide to some WEAVE tools

## The problem

In this demo we simulate dropping a ball and letting it bounce.

We then show how to take the simulation results and convert them to a Sina format, so they can be ingested by other scripts or into a Sina store.

## The "simulation"

We will use [this script](ball_bounce.py) to run a single simulation of a bouncing ball:

```bash
usage: ball_bounce.py [-h] [--xpos XPOS] [--ypos YPOS] [--zpos ZPOS]
                      [--xvel XVEL] [--yvel YVEL] [--zvel ZVEL]
                      [--gravity GRAVITY] [--box_side_length BOX_SIDE_LENGTH]
                      [--runtime RUNTIME] [--frequency FREQUENCY]
                      [--drag DRAG] [--output OUTPUT] [--group GROUP]
                      [--run RUN]

optional arguments:
  -h, --help            show this help message and exit
  --xpos XPOS, -x XPOS  initial x position (default: 0.0)
  --ypos YPOS, -y YPOS  initial y position (default: 0.0)
  --zpos ZPOS, -z ZPOS  initial z position (default: 0.0)
  --xvel XVEL, -X XVEL  initial x velocity (default: 0.0)
  --yvel YVEL, -Y YVEL  initial y velocity (default: 0.0)
  --zvel ZVEL, -Z ZVEL  initial z velocity (default: 0.0)
  --gravity GRAVITY, -g GRAVITY
                        gravity (default: 9.81)
  --box_side_length BOX_SIDE_LENGTH, -b BOX_SIDE_LENGTH
                        length of the box's sides (default: 10)
  --runtime RUNTIME, -r RUNTIME
                        length of time we let the simualtion run for (default:
                        20)
  --frequency FREQUENCY, --ticks_per_seconds FREQUENCY
                        sampling rate (default: 20)
  --drag DRAG, -d DRAG  drag coefficient (default: 0.1)
  --output OUTPUT, -o OUTPUT
                        output file (default: None)
  --group GROUP, -G GROUP
                        group id (default: 1)
  --run RUN, -R RUN     run id (default: 1)
```


This simulation produces a delimiter-separated values (`dsv`) file containing the results.
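
To make the output format concrete, here is a minimal sketch of a bouncing-ball time loop that writes delimiter-separated rows. It is not the code in `ball_bounce.py`: the function names `simulate_bounce` and `write_dsv`, the 1D physics, the column names, and the `%` delimiter are all assumptions chosen for illustration.

```python
import csv
import io


def simulate_bounce(z0=10.0, vz0=0.0, gravity=9.81, drag=0.1,
                    runtime=2.0, frequency=20):
    """Toy 1D bounce with a floor at z = 0 (illustrative only)."""
    dt = 1.0 / frequency
    z, vz, t = z0, vz0, 0.0
    rows = []
    num_bounces = 0
    for _ in range(int(runtime * frequency)):
        vz -= gravity * dt      # gravity pulls the ball down
        vz -= drag * vz * dt    # simple linear drag (assumed form)
        z += vz * dt
        if z < 0.0:             # crossed the floor: reflect position and velocity
            z = -z
            vz = -vz
            num_bounces += 1
        t += dt
        rows.append((round(t, 6), z, vz))
    return rows, num_bounces


def write_dsv(rows, handle, delimiter="%"):
    """Write a header plus one row per time step using the given delimiter."""
    writer = csv.writer(handle, delimiter=delimiter, lineterminator="\n")
    writer.writerow(["time", "z_pos", "z_vel"])
    writer.writerows(rows)


rows, num_bounces = simulate_bounce()
buffer = io.StringIO()
write_dsv(rows, buffer)
print(buffer.getvalue().splitlines()[0])  # header line
print(len(rows), "samples,", num_bounces, "bounce(s)")
```

Writing to an in-memory buffer keeps the sketch free of side effects; passing an open file handle instead would produce a file on disk.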

## Running many parameters

### Basic maestro

We can easily run many of these simulations with Maestro using [this yaml file](ball_bounce_simple.yaml):

```bash
maestro run ball_bounce_simple.yaml
```

### PGEN

As the number of simulations increases, manually entering all of these numbers in the yaml file becomes very tedious.

Fortunately, Maestro allows parameters to be generated with Python (PGEN). [This file](pgen.py) will generate 20 random samples for us.
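
As a rough idea of what such a generator can look like, the sketch below draws random initial conditions and hands them to Maestro through a `get_custom_generator` function. The parameter names, the value ranges, and the helper `sample_parameters` are hypothetical and are not taken from the actual `pgen.py`.

```python
import random

# Hypothetical parameter names and ranges; the real pgen.py may differ.
PARAMETER_RANGES = {
    "X_POS_INITIAL": (0, 100),
    "Y_POS_INITIAL": (0, 100),
    "Z_POS_INITIAL": (0, 100),
    "X_VEL_INITIAL": (-10, 10),
    "Y_VEL_INITIAL": (-10, 10),
    "Z_VEL_INITIAL": (-10, 10),
}


def sample_parameters(n_samples=20, seed=None):
    """Return {parameter name: list of n_samples random integers}."""
    rng = random.Random(seed)
    return {
        name: [rng.randint(low, high) for _ in range(n_samples)]
        for name, (low, high) in PARAMETER_RANGES.items()
    }


def get_custom_generator(env, **kwargs):
    """Function Maestro looks for in the file passed with `-p`."""
    # Imported lazily so sample_parameters() can be tried without Maestro installed.
    from maestrowf.datastructures.core import ParameterGenerator

    p_gen = ParameterGenerator()
    for name, values in sample_parameters().items():
        # The label names each run; "%%" stands for the parameter value.
        p_gen.add_parameter(name, values, f"{name}.%%")
    return p_gen


samples = sample_parameters(n_samples=20, seed=0)
print({name: values[:3] for name, values in samples.items()})
```

Keeping the sampling in its own function makes it easy to check the generated values before launching a study.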

## Keeping track of what we ran: Sina

As the number of simulations grows, it quickly becomes hard to keep track of what we ran.

Sina can help with this.

## Creating sina records from the simulation results

The [following script](dsv_to_sina.py) combs through our generated `dsv` files and ingests them into a Sina catalog.
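
The conversion can be pictured as reading the columns of a `dsv` file and arranging them in Sina's JSON layout (scalars under `data`, curves under `curve_sets`). The sketch below uses only the standard library and is not the content of `dsv_to_sina.py`; the sample text, the `%` delimiter, the record id `demo_1`, and the helper `dsv_to_record` are assumptions.

```python
import csv
import io
import json

# Hypothetical DSV content; the real files may use other columns or delimiters.
DSV_TEXT = """time%x_pos%y_pos
0.0%87.0%86.0
0.05%87.5%85.5
0.1%88.0%85.0
"""


def dsv_to_record(text, record_id, delimiter="%"):
    """Build one record dict in Sina's JSON layout from DSV text."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    columns = {name: [] for name in reader.fieldnames}
    for row in reader:
        for name, value in row.items():
            columns[name].append(float(value))
    time = columns.pop("time")  # the independent curve
    return {
        "id": record_id,
        "type": "csv_rec",
        # Scalars that summarise the run sit next to the curves.
        "data": {"x_pos_final": {"value": columns["x_pos"][-1]}},
        "curve_sets": {
            "time_series": {
                "independent": {"time": {"value": time}},
                "dependent": {
                    name: {"value": values} for name, values in columns.items()
                },
            }
        },
    }


record = dsv_to_record(DSV_TEXT, "demo_1")
# A Sina JSON document holds a list of records and a list of relationships.
document_text = json.dumps({"records": [record], "relationships": []}, indent=2)
print(document_text[:200])
```

A file written this way is the kind of `.json` input that the `sina ingest` command mentioned in this section works on.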

Some LLNL codes have Sina built in and produce the `.json` files as they run. You could also run the `sina ingest` command on these files to create the store.

In [this maestro yaml file](ball_bounce_suite.yaml) we add an extra step that generates the store after the simulations have run.

Let's run the following command to generate the data:

```bash
maestro run -p pgen.py ball_bounce_suite.yaml
```

### Loading the store

Now that we have a store, let's open it up and run some queries on it.





In [1]:
import sina

store = sina.connect("output.sqlite")

In [2]:
# Let's see how many records are in the store:
print(len(list(store.records.find())))

105


In [3]:
# Let's open the record with the highest number of bounces
rec = next(store.records.find_with_max("num_bounces", 1))
# Alternatively, fetch a record by id: rec = store.records.get(r_id)
print(rec.raw)

{'id': 'b14f7f_3', 'type': 'csv_rec', 'data': {'x_pos_initial': {'value': 87.0}, 'y_pos_initial': {'value': 86.0}, 'z_pos_initial': {'value': 91.0}, 'x_vel_initial': {'value': 10.0}, 'y_vel_initial': {'value': -9.0}, 'z_vel_initial': {'value': 5.0}, 'gravity': {'value': 0.5}, 'box_side_length': {'value': 100.0}, 'group_id': {'value': 'b14f7f'}, 'x_pos_final': {'value': 48.87384400847369}, 'y_pos_final': {'value': 0.0}, 'z_pos_final': {'value': 19.870369796226736}, 'x_vel_final': {'value': 0.3332205903240224}, 'y_vel_final': {'value': 0.0}, 'z_vel_final': {'value': -0.199977670715381}, 'num_bounces': {'value': 25.0}}, 'curve_sets': {'time_series': {'independent': {'time': {'value': [0.0, 0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95, 1.0, 1.05, 1.1, 1.15, 1.2, 1.25, 1.3, 1.35, 1.4, 1.45, 1.5, 1.55, 1.6, 1.65, 1.7, 1.75, 1.8, 1.85, 1.9, 1.95, 2.0, 2.05, 2.1, 2.15, 2.2, 2.25, 2.3, 2.35, 2.4, 2.45, 2.5, 2.55, 2.6, 2.65, 2.7, 2.75, 2
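
The raw record is a nested dictionary in which each scalar is reached as `raw["data"][name]["value"]`. The sketch below mimics, on plain dictionaries, the kind of selection `find_with_max` performs. It only models the layout: a real store runs the query in its backend. The helper `scalar` is hypothetical, and the two abbreviated records reuse values from two records of this demo.

```python
# Abbreviated raw records (only a few scalars kept, curves omitted).
raw_records = [
    {
        "id": "b14f7f_3",
        "type": "csv_rec",
        "data": {"num_bounces": {"value": 25.0}, "gravity": {"value": 0.5}},
    },
    {
        "id": "b14f7f_6",
        "type": "csv_rec",
        "data": {"num_bounces": {"value": 21.0}, "gravity": {"value": 0.5}},
    },
]


def scalar(raw, name):
    """Return the value of a scalar stored in a raw record dictionary."""
    return raw["data"][name]["value"]


# Pick the record whose num_bounces is largest.
best = max(raw_records, key=lambda raw: scalar(raw, "num_bounces"))
print(best["id"], scalar(best, "num_bounces"))
```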


## Sina

In [this notebook](visualization.ipynb) we take a look at some of Sina's query and visualization capabilities.


## Kosh

We have seen how Sina can help us track our simulations and search through them.

Kosh is built on top of Sina and lets the user access data that is too big to be kept in the store.

In this example we will be working with small files.

In [4]:
import kosh

store = kosh.connect("output.sqlite")  # Similar syntax to Sina

# Let's open a dataset using the id of the record we found with Sina above (the one with the most bounces)

dataset = store.open(rec["id"])
print(dataset)

KOSH DATASET
	id: b14f7f_3
	name: ???
	creator: ???

--- Attributes ---
	box_side_length: 100.0
	gravity: 0.5
	group_id: b14f7f
	num_bounces: 25.0
	x_pos_final: 48.87384400847369
	x_pos_initial: 87.0
	x_vel_final: 0.3332205903240224
	x_vel_initial: 10.0
	y_pos_final: 0.0
	y_pos_initial: 86.0
	y_vel_final: 0.0
	y_vel_initial: -9.0
	z_pos_final: 19.870369796226736
	z_pos_initial: 91.0
	z_vel_final: -0.199977670715381
	z_vel_initial: 5.0
--- Associated Data (1)---
	Mime_type: sina/curve
		internal ( time_series )
--- Ensembles (0)---
	[]
--- Ensemble Attributes ---



In [5]:
# Attributes on a Kosh dataset are easy to alter, and by default changes are written to the db immediately
print("N bounces:", dataset.num_bounces)
dataset.my_new_attribute = 6.
print("New:", dataset.my_new_attribute)

N bounces: 25.0
New: 6.0


In [6]:
# One can also easily access curves:
print(dataset.list_features())
dataset["time_series/time"][:5]

['time_series', 'time_series/time', 'time_series/x_pos', 'time_series/y_pos', 'time_series/z_pos']


array([0.  , 0.05, 0.1 , 0.15, 0.2 ])

In [7]:
# Let's loop through all the records/datasets in this record group, compute the velocities,
# and store them in an HDF5 file (outside of the db)
import h5py
for ds in store.find(group_id=dataset.group_id):
    x_pos = ds["time_series/x_pos"]
    y_pos = ds["time_series/y_pos"]
    z_pos = ds["time_series/z_pos"]
    time = ds["time_series/time"]
    # Forward differences: each velocity curve has one fewer sample than the positions
    x_vel = (x_pos[1:] - x_pos[:-1])/(time[1:]-time[:-1])
    y_vel = (y_pos[1:] - y_pos[:-1])/(time[1:]-time[:-1])
    z_vel = (z_pos[1:] - z_pos[:-1])/(time[1:]-time[:-1])
    # Note: this "speed" is the mean of the three velocity components (it can be negative),
    # not the magnitude of the velocity vector
    speed = (x_vel+y_vel+z_vel)/3.
    nm = f"vel_{ds.id}.hdf5"
    # The context manager closes the file even if a write fails
    with h5py.File(nm, "w") as h5:
        h5["x_vel"] = x_vel
        h5["y_vel"] = y_vel
        h5["z_vel"] = z_vel
        h5["speed"] = speed
    # Associate this new external data with the dataset
    ds.associate(nm, "hdf5")

print(ds)
print(ds.list_features())

KOSH DATASET
	id: b14f7f_6
	name: ???
	creator: ???

--- Attributes ---
	box_side_length: 100.0
	gravity: 0.5
	group_id: b14f7f
	num_bounces: 21.0
	x_pos_final: 47.58586072308898
	x_pos_initial: 87.0
	x_vel_final: 0.28564555614411885
	x_vel_initial: -8.0
	y_pos_final: 0.0
	y_pos_initial: 86.0
	y_vel_final: 0.0
	y_vel_initial: 5.0
	z_pos_final: 70.94562198166595
	z_pos_initial: 91.0
	z_vel_final: 0.3332205903240224
	z_vel_initial: -10.0
--- Associated Data (2)---
	Mime_type: hdf5
		/g/g19/cdoutrix/git/weave_demos/ball_bounce/vel_b14f7f_6.hdf5 ( 37cb0785037d4fbda4f6693d342c7def )
	Mime_type: sina/curve
		internal ( time_series )
--- Ensembles (0)---
	[]
--- Ensemble Attributes ---

['time_series', 'time_series/time', 'time_series/x_pos', 'time_series/y_pos', 'time_series/z_pos', 'speed', 'x_vel', 'y_vel', 'z_vel']


In [8]:
# We can access both curves and external data in the same way:
print(dataset["time_series/x_pos"][:5])
print(dataset["x_vel"][:5])

[87.5        87.999375   88.49812656 88.99625624 89.49376559]
[9.9875     9.97503123 9.96259357 9.95018692 9.93781114]
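
The `x_vel` values printed above are forward differences of `x_pos` over `time`. The sketch below reproduces them in plain Python from the first samples shown in the notebook. It also contrasts the notebook's `speed` (the mean of the three velocity components) with the magnitude of the velocity vector. The helper names are hypothetical.

```python
import math

# First samples copied from the notebook output above.
time = [0.0, 0.05, 0.1, 0.15, 0.2]
x_pos = [87.5, 87.999375, 88.49812656, 88.99625624, 89.49376559]


def forward_difference(values, times):
    """Approximate d(values)/d(times); returns one fewer sample than the input."""
    return [
        (v1 - v0) / (t1 - t0)
        for v0, v1, t0, t1 in zip(values, values[1:], times, times[1:])
    ]


def mean_component(vx, vy, vz):
    """What the notebook stores under "speed": the mean of the components."""
    return (vx + vy + vz) / 3.0


def euclidean_speed(vx, vy, vz):
    """Magnitude of the velocity vector, which is never negative."""
    return math.sqrt(vx * vx + vy * vy + vz * vz)


x_vel = forward_difference(x_pos, time)
print(x_vel)
# Initial velocity of record b14f7f_3: (10.0, -9.0, 5.0)
print(mean_component(10.0, -9.0, 5.0), euclidean_speed(10.0, -9.0, 5.0))
```

Because components of opposite sign cancel in the mean, the two quantities can differ substantially; which one is wanted depends on the analysis.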


In [9]:
# Kosh also offers the notion of ensembles, which is based on Sina's relationships
my_group = store.create_ensemble()
# Attributes of an ensemble are shared by all its members
my_group.a_group_attribute = "foo"

# Let's add our group members to this ensemble:
for ds in store.find(group_id=dataset.group_id):
    my_group.add(ds)

In [10]:
print(my_group)


KOSH ENSEMBLE
	id: 20b6a8fde3b14f54ad210d7271e1e328
	name: Unnamed Ensemble
	creator: cdoutrix

--- Attributes ---
	a_group_attribute: foo
	creator: cdoutrix
	name: Unnamed Ensemble
--- Associated Data (0)---
--- Member Datasets (10)---
	['b14f7f_10', 'b14f7f_1', 'b14f7f_3', 'b14f7f_8', 'b14f7f_4', 'b14f7f_9', 'b14f7f_7', 'b14f7f_2', 'b14f7f_5', 'b14f7f_6']


In [11]:
print(ds.a_group_attribute)

foo
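
One way to picture why a member dataset reports an attribute that was set on the ensemble is an attribute lookup that falls back to the ensembles the dataset belongs to. The classes below are a toy model written for this guide, not Kosh's implementation: `ToyDataset` and `ToyEnsemble` are hypothetical names.

```python
class ToyEnsemble:
    """Holds shared attributes and a list of member datasets."""

    def __init__(self):
        self.attributes = {}
        self.members = []

    def add(self, dataset):
        self.members.append(dataset)
        dataset.ensembles.append(self)


class ToyDataset:
    """Looks up its own attributes first, then those of its ensembles."""

    def __init__(self, id_, **attributes):
        self.id = id_
        self.attributes = dict(attributes)
        self.ensembles = []

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        own = self.__dict__.get("attributes", {})
        if name in own:
            return own[name]
        for ensemble in self.__dict__.get("ensembles", []):
            if name in ensemble.attributes:
                return ensemble.attributes[name]
        raise AttributeError(name)


group = ToyEnsemble()
member = ToyDataset("b14f7f_6", num_bounces=21.0)
group.add(member)
group.attributes["a_group_attribute"] = "foo"
print(member.a_group_attribute, member.num_bounces)
```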


In [12]:
# We can also search within the ensemble
dss = list(my_group.find_datasets(num_bounces=sina.utils.DataRange(min=21)))
print(len(dss))

8
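
The query above keeps the datasets whose `num_bounces` falls inside a range with only a lower bound. The sketch below spells out that kind of range test in plain Python. It assumes the lower bound is inclusive and the upper bound exclusive by default, which is how `DataRange` is understood to behave here. The function `in_range` and the entries `toy_a` and `toy_b` are hypothetical.

```python
def in_range(value, minimum=None, maximum=None,
             min_inclusive=True, max_inclusive=False):
    """Return True if value lies within the (possibly open-ended) range."""
    if minimum is not None:
        if value < minimum or (value == minimum and not min_inclusive):
            return False
    if maximum is not None:
        if value > maximum or (value == maximum and not max_inclusive):
            return False
    return True


# Two values from this demo plus two made-up ones for contrast.
num_bounces = {"b14f7f_3": 25.0, "b14f7f_6": 21.0, "toy_a": 18.0, "toy_b": 20.0}
matches = [name for name, value in num_bounces.items() if in_range(value, minimum=21)]
print(matches)
```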


In [13]:
# Let's compute the average speed for this ensemble.
# For this we will use an operator
@kosh.numpy_operator
def Avg(*inputs):
    avg = inputs[0][:]
    for input_ in inputs[1:]:
        # Use "avg + ..." rather than "+=" so the first input is never modified in place
        avg = avg + input_[:]
    return avg/len(inputs)


avg_speed = Avg(*(member["speed"] for member in my_group.find_datasets()))[:]
print(avg_speed[:5])

[ -2.93349167  -6.25931359  -9.57061229 -12.85922575 -16.11718603]
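
The operator averages the member curves element by element: the first value of the result is the mean of every member's first sample, and so on. A plain-Python sketch of the same idea, which builds a new list and leaves its inputs untouched, could look as follows. The function `average_curves` and the sample curves are hypothetical.

```python
def average_curves(*curves):
    """Element-wise mean of equally long sequences of numbers."""
    if not curves:
        raise ValueError("at least one curve is required")
    length = len(curves[0])
    if any(len(curve) != length for curve in curves):
        raise ValueError("all curves must have the same length")
    # zip(*curves) groups the i-th sample of every curve together.
    return [sum(samples) / len(curves) for samples in zip(*curves)]


curve_a = [1.0, 2.0, 3.0]
curve_b = [3.0, 4.0, 5.0]
print(average_curves(curve_a, curve_b))
```

The length check matters because `zip` would otherwise silently truncate to the shortest curve.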


In [14]:
# We can now store that result in a file and associate that file with the group
import numpy
nm = f"avg_speed_{my_group.id}.hdf5"
with h5py.File(nm, "w") as h5:
    h5["avg_speed"] = avg_speed

my_group.associate(nm, "hdf5")
my_group.group_speed = float(numpy.average(avg_speed))
print(my_group)
print(ds.group_speed)

KOSH ENSEMBLE
	id: 20b6a8fde3b14f54ad210d7271e1e328
	name: Unnamed Ensemble
	creator: cdoutrix

--- Attributes ---
	a_group_attribute: foo
	creator: cdoutrix
	group_speed: -2.9704100812090957
	name: Unnamed Ensemble
--- Associated Data (1)---
	Mime_type: hdf5
		/g/g19/cdoutrix/git/weave_demos/ball_bounce/avg_speed_20b6a8fde3b14f54ad210d7271e1e328.hdf5 ( 1efa7aa19644455a85f0ec486b0ec64e )
--- Member Datasets (10)---
	['b14f7f_10', 'b14f7f_1', 'b14f7f_3', 'b14f7f_8', 'b14f7f_4', 'b14f7f_9', 'b14f7f_7', 'b14f7f_2', 'b14f7f_5', 'b14f7f_6']
-2.9704100812090957


In [16]:
print(my_group["avg_speed"][:5])

[ -2.93349167  -6.25931359  -9.57061229 -12.85922575 -16.11718603]
