# Using SimStore

This notebook will show the current way to use SimStore if you're starting a new project. This will change over time as more functionality gets added to SimStore.

We'll load an existing trajectory using a netcdfplus file. Other than that, all objects will be created fresh and in the most OPS-2.0 style currently possible.

NOTE: The main caveat here is that you cannot disk-cache results of an old CV with new storage. However, once new-style CVs are available, you can disk-cache a new CV in new storage.

In [1]:
import openpathsampling as paths
from openpathsampling.engines import toy as toys

## Load up old sample set



In [2]:
old_storage = paths.Storage("../toy_mstis_1k_OPS1_py36.nc")
samples = old_storage.steps[-1].active

## Create simulation objects

This is all taken from the standard OPS MSTIS example.

In [3]:
pes = (toys.OuterWalls([1.0, 1.0], [0.0, 0.0])
       + toys.Gaussian(-0.7, [12.0, 12.0], [0.0, 0.4])
       + toys.Gaussian(-0.7, [12.0, 12.0], [-0.5, -0.5])
       + toys.Gaussian(-0.7, [12.0, 12.0], [0.5, -0.5]))

topology = toys.Topology(n_spatial=2,
                         masses=[1.0, 1.0],
                         pes=pes)

integ = toys.LangevinBAOABIntegrator(dt=0.02, temperature=0.1, gamma=2.5)

options = {'integ': integ,
           'n_frames_max': 5000,
           'n_steps_per_frame': 1}

engine = toys.Engine(options=options,
                     topology=topology).named('toy_engine')

# note: no need for a template!

In [4]:
from openpathsampling.experimental.storage.collective_variables import CoordinateFunctionCV

Currently implemented in the experimental storage:

* `simstore.StorableFunction`: Completely generic function wrapper
* `CollectiveVariable`: Anything that takes snapshots as input
* `CoordinateFunctionCV`: Takes snapshots; independent of arrow of time
* `MDTrajFunctionCV`: Wrapper for MDTraj

More are on the way; help would be appreciated!
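A `CoordinateFunctionCV` promises that its value does not depend on the direction of time. One payoff is caching: a snapshot and its time-reversed partner can share one cached value. The sketch below illustrates that idea with hypothetical `Snapshot` and `CoordinateCache` classes (not SimStore's implementation), assuming time reversal keeps the coordinates and flips the velocities:

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

Coords = Tuple[float, ...]


@dataclass(frozen=True)
class Snapshot:
    # Hypothetical stand-in for a snapshot: coordinates plus velocities.
    coordinates: Coords
    velocities: Coords

    def time_reversed(self) -> "Snapshot":
        # Time reversal keeps the coordinates and flips the velocities.
        return Snapshot(self.coordinates, tuple(-v for v in self.velocities))


class CoordinateCache:
    """Caches results by coordinates only, so a snapshot and its
    time-reversed copy share one entry. Only valid for functions that
    ignore velocities, i.e. that are independent of the arrow of time."""

    def __init__(self, func: Callable[[Coords], float]):
        self.func = func
        self.cache: Dict[Coords, float] = {}
        self.n_evaluations = 0

    def __call__(self, snap: Snapshot) -> float:
        key = snap.coordinates
        if key not in self.cache:
            self.n_evaluations += 1
            self.cache[key] = self.func(key)
        return self.cache[key]


x_coord = CoordinateCache(lambda coords: coords[0])
snap = Snapshot((0.5, -0.5), (1.0, 2.0))
print(x_coord(snap), x_coord(snap.time_reversed()))  # 0.5 0.5
print(x_coord.n_evaluations)  # 1
```

A general `CollectiveVariable` could not make this shortcut, since its value may legitimately change when the velocities flip.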


Major differences from the old CVs:

* A name is no longer required; name these the same way as other objects (with `.named()`).
* kwarg `f` renamed to `func`.
* Currently requires you to provide `result_type`; this will change soon, though.
* It is no longer necessary to provide a template object to storage before saving a CV.
* Everything is disk-cached by default. You can turn that off by setting `cv.mode = 'no-caching'`, but note that this will also turn off memory caching.

More minor differences:

* A small set of kwargs is off-limits (the CV reserves these names, so they can't be used as keyword arguments for `func`):

    * `result_type`
    * `func_config`
    * `store_source`
    
  I hope that `result_type` will be removed from this list soon.
* The `store_source` parameter determines whether the source code of `func` is stored in the object. By default, the source is stored if `func` was defined in `__main__` (i.e., if `func` isn't imported from another package). For example, the source of `md.compute_distances` will not be stored, but if you write a min-dist function that wraps `md.compute_distances`, the source for that function will be stored.
* The old `cv_*` flags have been replaced by individual `simstore.storable_functions.Processor` objects, which get registered with a `simstore.storable_functions.StorableFunctionConfig` object. You can combine these to get various pre-processing and post-processing effects, and you can create new, custom processors (e.g., for MDTraj). The `StorableFunctionConfig` is passed as the `func_config` parameter.
* There are several "modes" that you can use. Set them with `cv.mode = mode`, where mode is one of the strings:

    * `'no-caching'`: do not do any caching at all; always evaluate the function
    * `'analysis'`: first look to memory cache, then to disk cache, then evaluate as last resort
    * `'production'`: first look to memory cache, then evaluate. Note: this will still save the results to disk.
  
  CVs will *always* load in `'analysis'` mode, so you must change that after loading. It would be useful to add something to `PathSimulators` to automatically set all CVs into `'production'` mode, but that's for the future.
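The processor design described in the list above can be pictured as a chain of small pre- and post-processing steps wrapped around `func`. This is only an illustration: `Processor`, `FunctionConfig`, the `stage` field, and `scalarize_singleton` are hypothetical names, not SimStore's `Processor`/`StorableFunctionConfig` API.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass
class Processor:
    # Hypothetical processor: a named transformation applied around func.
    name: str
    stage: str  # "pre" transforms the input, "post" transforms the result
    transform: Callable[[Any], Any]


@dataclass
class FunctionConfig:
    # Hypothetical config: processors run in the order they are registered.
    processors: List[Processor] = field(default_factory=list)

    def register(self, processor: Processor) -> "FunctionConfig":
        if processor.stage not in ("pre", "post"):
            raise ValueError(f"unknown stage: {processor.stage}")
        self.processors.append(processor)
        return self

    def wrap(self, func: Callable[[Any], Any]) -> Callable[[Any], Any]:
        pre = [p.transform for p in self.processors if p.stage == "pre"]
        post = [p.transform for p in self.processors if p.stage == "post"]

        def wrapped(value: Any) -> Any:
            for step in pre:
                value = step(value)
            result = func(value)
            for step in post:
                result = step(result)
            return result

        return wrapped


def scalarize_singleton(result: Any) -> Any:
    # Example post-processing step: unwrap one-element lists.
    return result[0] if isinstance(result, list) and len(result) == 1 else result


config = (FunctionConfig()
          .register(Processor("coordinates", "pre", lambda snap: snap["xyz"]))
          .register(Processor("scalarize", "post", scalarize_singleton)))
first_x = config.wrap(lambda xyz: [xyz[0][0]])
print(first_x({"xyz": [[0.25, 1.0]]}))  # 0.25
```

Keeping each effect in its own object is what makes the combinations composable: a new effect is a new processor rather than another flag on every CV class.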


In [5]:
def circle(snapshot, center):
    import math
    return math.sqrt((snapshot.xyz[0][0]-center[0])**2
                     + (snapshot.xyz[0][1]-center[1])**2)
    
opA = CoordinateFunctionCV(func=circle, result_type='float',
                           center=[-0.5, -0.5]).named("opA")
opB = CoordinateFunctionCV(func=circle, result_type='float', 
                           center=[0.5, -0.5]).named("opB")
opC = CoordinateFunctionCV(func=circle, result_type='float', 
                           center=[0.0, 0.4]).named("opC")
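Because `circle` is defined in the notebook itself (`__main__`), the default `store_source` rule described above means its source is saved along with the CVs. That is presumably also why `circle` imports `math` inside its body: the stored source is then self-contained. A sketch of the rule as stated in the text (the helper `should_store_source` is hypothetical, not SimStore's code):

```python
import json
import math
from typing import Callable


def should_store_source(func: Callable) -> bool:
    # Default rule as described: store the source only for functions
    # defined in __main__, not for ones imported from a package.
    return getattr(func, "__module__", None) == "__main__"


def notebook_function(x):
    return x

# In a notebook this is already "__main__"; set it explicitly so the demo
# behaves the same way when this code runs inside some other module.
notebook_function.__module__ = "__main__"

print(should_store_source(notebook_function))  # True
print(should_store_source(math.sqrt))          # False (module "math")
print(should_store_source(json.dumps))         # False (module "json")
```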

In [6]:
# when running simulations, setting CV mode to production is a major speed boost
# for analysis, leave it as the default ('analysis')
# NB: in practice, toy models like this might use 'no-caching' mode,
# because calculating the CV is cheap compared to looking it up on disk.
for op in [opA, opB, opC]:
    op.mode = 'production'
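A sketch of the lookup order for each of the three modes listed earlier, with plain dicts standing in for the memory and disk caches. `ModalFunction` is hypothetical, not SimStore's code; writing freshly evaluated results to the disk dict in `'analysis'` mode is an assumption of the sketch. It also shows why `'production'` is faster during a run: it never pays for a disk lookup that, for newly generated snapshots, would almost always miss.

```python
from typing import Any, Callable, Dict, Hashable


class ModalFunction:
    """Sketch of the three lookup orders. `memory` and `disk` are plain
    dicts standing in for the real memory cache and storage backend."""

    MODES = ("no-caching", "analysis", "production")

    def __init__(self, func: Callable[[Hashable], Any],
                 disk: Dict[Hashable, Any]):
        self.func = func
        self.disk = disk
        self.memory: Dict[Hashable, Any] = {}
        self.mode = "analysis"  # mirrors the default mode after loading
        self.n_evaluations = 0

    def _evaluate(self, key: Hashable) -> Any:
        self.n_evaluations += 1
        return self.func(key)

    def __call__(self, key: Hashable) -> Any:
        if self.mode not in self.MODES:
            raise ValueError(f"unknown mode: {self.mode}")
        if self.mode == "no-caching":
            return self._evaluate(key)  # neither cache is read or written
        if key in self.memory:
            return self.memory[key]
        if self.mode == "analysis" and key in self.disk:
            value = self.disk[key]  # only 'analysis' consults the disk
        else:
            value = self._evaluate(key)
            self.disk[key] = value  # new results still go to disk
        self.memory[key] = value
        return value


disk = {1: 2}
double = ModalFunction(lambda key: 2 * key, disk)
print(double(1), double.n_evaluations)  # 2 0  (read from "disk")
double.mode = "production"
print(double(3), double.n_evaluations)  # 6 1  (evaluated, then saved)
double.memory.clear()
print(double(1), double.n_evaluations)  # 2 2  (disk skipped, re-evaluated)
```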

In [7]:
# set this before creating the interface sets
paths.InterfaceSet.simstore = True

In [8]:
stateA = paths.CVDefinedVolume(opA, 0.0, 0.2).named("A")
stateB = paths.CVDefinedVolume(opB, 0.0, 0.2).named("B")
stateC = paths.CVDefinedVolume(opC, 0.0, 0.2).named("C")

interfacesA = paths.VolumeInterfaceSet(opA, 0.0, [0.2, 0.3, 0.4])
interfacesB = paths.VolumeInterfaceSet(opB, 0.0, [0.2, 0.3, 0.4])
interfacesC = paths.VolumeInterfaceSet(opC, 0.0, [0.2, 0.3, 0.4])
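A state such as `stateA` is the region where `opA` lies between the two bounds, and `VolumeInterfaceSet(opA, 0.0, [0.2, 0.3, 0.4])` gives one such region per listed upper bound. A minimal pure-Python sketch of that membership test, assuming a half-open interval `lambda_min <= value < lambda_max` (which I believe is what `CVDefinedVolume` uses) and a plain `(x, y)` tuple standing in for `snapshot.xyz[0]`; the helper names are hypothetical:

```python
import math
from typing import Callable, List, Sequence

Point = Sequence[float]


def circle_distance(point: Point, center: Point) -> float:
    # Same formula as `circle` above, applied to a bare (x, y) point.
    return math.hypot(point[0] - center[0], point[1] - center[1])


def cv_volume(cv: Callable[[Point], float], lambda_min: float,
              lambda_max: float) -> Callable[[Point], bool]:
    # Assumed half-open interval: lambda_min <= cv(point) < lambda_max.
    return lambda point: lambda_min <= cv(point) < lambda_max


def op_a(point: Point) -> float:
    return circle_distance(point, (-0.5, -0.5))


state_a = cv_volume(op_a, 0.0, 0.2)
interfaces_a: List[Callable[[Point], bool]] = [
    cv_volume(op_a, 0.0, lambda_max) for lambda_max in (0.2, 0.3, 0.4)]

point = (-0.5, -0.25)  # distance 0.25 from the center of state A
print(state_a(point))                              # False
print([inside(point) for inside in interfaces_a])  # [False, True, True]
```

Because every interface shares the lower bound 0.0, the volumes are nested: a point inside the 0.3 interface is also inside the 0.4 one.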

In [9]:
ms_outers = paths.MSOuterTISInterface.from_lambdas(
    {ifaces: 0.5
     for ifaces in [interfacesA, interfacesB, interfacesC]}
)
mstis = paths.MSTISNetwork(
    [(stateA, interfacesA),
     (stateB, interfacesB),
     (stateC, interfacesC)],
    ms_outers=ms_outers
)

In [10]:
%%time
# this is slow because it has to load the trajectories from netcdfplus
scheme = paths.DefaultScheme(mstis, engine=engine)
init_conds = scheme.initial_conditions_from_trajectories(samples)

No missing ensembles.
No extra ensembles.
CPU times: user 18.6 s, sys: 1.55 s, total: 20.1 s
Wall time: 21.9 s


## Use new storage in a simulation

In [11]:
from openpathsampling.experimental.simstore import SQLStorageBackend
from openpathsampling.experimental.storage import Storage

In [12]:
backend = SQLStorageBackend("storage_01.db", mode='w')
storage = Storage.from_backend(backend)

In [13]:
simulation = paths.PathSampling(storage=storage,
                                move_scheme=scheme,
                                sample_set=init_conds)

In [14]:
simulation.save_frequency = 50  # with toy models, we can hold many steps in memory
simulation.run(750)  # increase to >= 10000 for better analysis (the output shown is from a 10000-step run)

Working on Monte Carlo cycle number 10000
Running for 13 minutes 19 seconds -  0.08 seconds per step
Estimated time remaining: 0 seconds
DONE! Completed 10000 Monte Carlo cycles.
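`save_frequency` trades memory for fewer writes: steps accumulate in memory and are written out in batches. A conceptual sketch of that buffering (the function `run_with_buffer` is hypothetical and is not how `PathSampling` is implemented):

```python
from typing import List


def run_with_buffer(n_steps: int, save_frequency: int) -> List[List[int]]:
    """Collect step numbers in memory and 'write' them in batches."""
    written: List[List[int]] = []
    buffer: List[int] = []
    for step in range(1, n_steps + 1):
        buffer.append(step)
        if step % save_frequency == 0:
            written.append(buffer)  # one write covering save_frequency steps
            buffer = []
    if buffer:  # flush whatever is left at the end of the run
        written.append(buffer)
    return written


batches = run_with_buffer(750, save_frequency=50)
print(len(batches), len(batches[0]))  # 15 50
```

With cheap toy-model steps, a larger `save_frequency` means fewer, larger writes; with expensive real systems, a smaller one limits how much work is lost if a run dies.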


In [15]:
print(storage.summary())

File: storage_01.db
Includes tables:
* samples: 13224 items
* sample_sets: 10001 items
* trajectories: 8046 items
* move_changes: 47467 items
* steps: 10001 items
* details: 46838 items
* storable_functions: 6 items
* simulation_objects: 393 items
* storage_objects: 0 items
* snapshot0: 1484 items
* snapshot1: 321982 items



There are two snapshot tables (`snapshot0` and `snapshot1`) because the engine attached to the snapshots we loaded from the old file does not have the same UUID as the engine we created here.
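One way to picture this: snapshots are grouped by the UUID of their engine, and each new engine UUID gets its own table. The sketch below is hypothetical (in particular the naming-by-first-appearance scheme is an assumption chosen to match the names in the summary), not SimStore's code:

```python
import uuid
from typing import Dict, List


class SnapshotTables:
    """Hypothetical: one table per distinct engine UUID, named in order
    of first appearance (snapshot0, snapshot1, ...)."""

    def __init__(self) -> None:
        self.table_for_engine: Dict[str, str] = {}
        self.rows: Dict[str, List[str]] = {}

    def save(self, engine_uuid: str, snapshot_id: str) -> str:
        if engine_uuid not in self.table_for_engine:
            name = f"snapshot{len(self.table_for_engine)}"
            self.table_for_engine[engine_uuid] = name
            self.rows[name] = []
        table = self.table_for_engine[engine_uuid]
        self.rows[table].append(snapshot_id)
        return table


old_engine, new_engine = str(uuid.uuid4()), str(uuid.uuid4())
tables = SnapshotTables()
tables.save(old_engine, "loaded-1")
tables.save(new_engine, "generated-1")
tables.save(new_engine, "generated-2")
print({name: len(rows) for name, rows in tables.rows.items()})
# {'snapshot0': 1, 'snapshot1': 2}
```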

In [16]:
old_storage.close()
storage.close()
# new storage doesn't really need to be closed
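Assuming the backend uses its default SQLite dialect, `storage_01.db` is an ordinary SQLite file, so the standard-library `sqlite3` module can inspect it independently of OPS. A small helper (hypothetical, not part of SimStore) that counts rows per table; note the raw listing may include SimStore's own bookkeeping tables, so it need not match `storage.summary()` exactly:

```python
import sqlite3
from typing import Dict


def table_row_counts(conn: sqlite3.Connection) -> Dict[str, int]:
    """Row count for every user table in an SQLite database."""
    names = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name")]
    counts = {}
    for name in names:
        quoted = '"' + name.replace('"', '""') + '"'  # safe identifier quoting
        counts[name] = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
    return counts


# Self-contained demo on an in-memory database:
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE steps (idx INTEGER)")
demo.executemany("INSERT INTO steps VALUES (?)", [(0,), (1,)])
demo.execute("CREATE TABLE snapshot0 (idx INTEGER)")
print(table_row_counts(demo))  # {'snapshot0': 0, 'steps': 2}

# On the file written above (opened read-only):
# conn = sqlite3.connect("file:storage_01.db?mode=ro", uri=True)
# print(table_row_counts(conn))
```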

## Performing analysis

In [17]:
# this is only because we're in the same notebook and using this as a test --
# it makes it look like we haven't previously opened this file
Storage._known_storages = {}
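The reset works because the class keeps a registry of storages it has already seen. The general pattern looks roughly like this; `OpenFiles` is a hypothetical stand-in, and keying the registry by filename is an assumption, not a description of how `Storage._known_storages` is actually keyed:

```python
from typing import Dict


class OpenFiles:
    """Hypothetical class-level registry: opening the same filename again
    returns the object that is already open."""

    _known: Dict[str, "OpenFiles"] = {}

    def __init__(self, filename: str):
        self.filename = filename

    @classmethod
    def open(cls, filename: str) -> "OpenFiles":
        if filename not in cls._known:
            cls._known[filename] = cls(filename)
        return cls._known[filename]


first = OpenFiles.open("storage_01.db")
print(OpenFiles.open("storage_01.db") is first)  # True: reused
OpenFiles._known = {}  # forget everything opened so far
print(OpenFiles.open("storage_01.db") is first)  # False: fresh object
```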

In [18]:
%%time
backend = SQLStorageBackend("storage_01.db", mode='r')
storage = Storage.from_backend(backend)

CPU times: user 410 ms, sys: 19.1 ms, total: 429 ms
Wall time: 462 ms


In [19]:
scheme = storage.schemes[0]

In [20]:
scheme.move_summary(storage.steps)

repex ran 22.720% (expected 22.39%) of the cycles with acceptance 492/2272 (21.65%)
shooting ran 44.810% (expected 44.78%) of the cycles with acceptance 3355/4481 (74.87%)
pathreversal ran 24.540% (expected 24.88%) of the cycles with acceptance 2153/2454 (87.73%)
minus ran 3.140% (expected 2.99%) of the cycles with acceptance 304/314 (96.82%)
ms_outer_shooting ran 4.790% (expected 4.98%) of the cycles with acceptance 337/479 (70.35%)
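Every percentage in this summary is a ratio of the printed counts: the trial counts add up to the 10000 Monte Carlo cycles of the run, and acceptance is accepted/trials. Recomputing them from the numbers above (nothing is read from storage; the "expected" fractions come from the move scheme and are not recomputed here):

```python
# Counts copied from the move summary above: (trials, accepted)
counts = {
    "repex": (2272, 492),
    "shooting": (4481, 3355),
    "pathreversal": (2454, 2153),
    "minus": (314, 304),
    "ms_outer_shooting": (479, 337),
}
total = sum(trials for trials, _ in counts.values())  # 10000 cycles
for mover, (trials, accepted) in counts.items():
    print(f"{mover} ran {100 * trials / total:.3f}% of the cycles "
          f"with acceptance {accepted}/{trials} ({100 * accepted / trials:.2f}%)")
```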


In [21]:
tis_analysis = paths.analysis.tis.StandardTISAnalysis(
    network=scheme.network,
    scheme=scheme,
    max_lambda_calcs={t: {'bin_width': 0.05, 'bin_range': (0.0, 0.5)}
                      for t in scheme.network.sampling_transitions}
)
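`max_lambda_calcs` sets up the histogram of the maximum order-parameter value reached along each path: bins of width 0.05 covering 0.0 to 0.5. Within a single interface ensemble, the crossing probability at λ is the fraction of paths whose maximum λ is at least λ. A stdlib sketch of that one-ensemble step on illustrative values (not data from this simulation); combining the ensembles of a transition, as `StandardTISAnalysis` does, is not shown, and the helper names are hypothetical:

```python
from typing import List, Sequence, Tuple


def bin_edges(bin_width: float, bin_range: Tuple[float, float]) -> List[float]:
    # Edges lo, lo + w, ..., hi; round() guards against float division error.
    lo, hi = bin_range
    n_bins = round((hi - lo) / bin_width)
    return [lo + i * bin_width for i in range(n_bins + 1)]


def crossing_probability(max_lambdas: Sequence[float],
                         edges: Sequence[float]) -> List[float]:
    # P(max lambda >= edge) for each edge, from one ensemble's paths.
    n = len(max_lambdas)
    return [sum(m >= edge for m in max_lambdas) / n for edge in edges]


edges = bin_edges(0.05, (0.0, 0.5))
max_lambdas = [0.22, 0.31, 0.27, 0.48, 0.21]  # illustrative values only
probs = crossing_probability(max_lambdas, edges)
print(len(edges), probs[0], probs[-1])  # 11 1.0 0.0
```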

In [22]:
%%time
rate_matrix = tis_analysis.rate_matrix(steps=storage.steps).to_pandas()

CPU times: user 4min 40s, sys: 4.91 s, total: 4min 45s
Wall time: 4min 49s


In [23]:
rate_matrix

|   | B | C | A |
|---|---|---|---|
| B |   | 0.00159841 | 0.00339158 |
| C | 0.000632501 |   | 0.00122459 |
| A | 0.00113445 | 0.00203247 |   |
