In [None]:
import datetime
import xedocs

In [None]:
# Create an arbitrary datetime object to demonstrate time-based queries
dtime = datetime.datetime(2020, 5, 16, 13, 10, 4, 781502)

# Introduction
Processing XENON data requires a large number of detector parameters, corrections and other metadata that are not produced in the plugin dependency chain. These metadata collections have unique version control requirements. The xedocs package is a collection of schemas and interfaces for accessing this data from multiple data sources such as MongoDB, pandas dataframes and API servers. The schema definitions enforce consistent and uniform data, and the common interface prevents hard coding of data access details in the plugins, notebooks and scripts that use the data. This allows analysts to easily switch out the data source, e.g. for testing, development or when a database connection is not available.

## Indexed documents
The scope of xedocs has been generalized to include all indexed metadata, including versioned metadata. A collection of documents has at least one index field and a common schema for all documents. The combination of index field values must be unique for each document of a given schema. By default, correction collections are insert-only, meaning you cannot change the values stored under an index that has already been set.
All schemas in xedocs should inherit from `xedocs.schemas.XeDoc` or one of its subclasses.
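
The uniqueness and insert-only rules can be illustrated with a small in-memory sketch. This is not the xedocs implementation: `ToyDoc` and `InsertOnlyCollection` are hypothetical names, and the values are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyDoc:
    """Hypothetical stand-in for a schema with two index fields."""

    version: str  # index field
    field: str    # index field
    value: float  # data field

    def index(self):
        # The tuple of index values must be unique within a collection
        return (self.version, self.field)


class InsertOnlyCollection:
    """Toy in-memory collection, NOT the xedocs implementation."""

    def __init__(self):
        self._docs = {}

    def insert(self, doc):
        key = doc.index()
        if key in self._docs:
            # Insert-only: an index that is already set cannot be overwritten
            raise ValueError(f"index {key} is already set")
        self._docs[key] = doc

    def __len__(self):
        return len(self._docs)


collection = InsertOnlyCollection()
collection.insert(ToyDoc(version="v1", field="drift_velocity", value=1.0))
collection.insert(ToyDoc(version="v2", field="drift_velocity", value=2.0))

try:
    # Same (version, field) index as the first document, so this is rejected
    collection.insert(ToyDoc(version="v1", field="drift_velocity", value=3.0))
    rejected = False
except ValueError:
    rejected = True
```

Changing a value therefore means inserting a document under a new index, for example a new version, rather than editing an existing one.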

## Detector Numbers (Bodega)
Bodega (detector parameters) is a simple example of a collection of versioned documents which share a common schema. The first step in migrating Bodega to xedocs is defining the schema. This code can be found in `xedocs/schemas/detector_numbers.py`:

```python

import rframe
import datetime

from .base_schemas import VersionedXeDoc
from .constants import PARTITION


class DetectorNumber(VersionedXeDoc):
    """Detector parameters
    A characteristic value of our detector 
    used in analysis, that is constant in time.
    """

    _ALIAS = "detector_numbers"

    field: str = rframe.Index(max_length=80)
    partition: PARTITION = rframe.Index(default="all_tpc")

    value: float
    uncertainty: float
    definition: str
    reference: str = ""
    date: datetime.datetime
    comments: str = ""


```

Notice that we inherit from the `VersionedXeDoc` class, so the `version` field is already defined for us. We add two additional indices: `field`, which stores the name of the parameter, and `partition`, which defaults to `"all_tpc"`. The rest of the schema is simply copied from the structure of the Bodega JSON collection. Standard Python type hints can be used to enforce the field types. All `pydantic` field types are supported by the framework, but a given data storage backend may have some constraints.


### Registering a datasource

Datasources can be registered with the `register_datasource` method. This method takes any supported datasource, such as a pandas dataframe, a MongoDB collection, a JSON list or an fsspec path to a file:

```python
Schema.register_datasource('github://org:repo@/path/to/file.csv', name='github')
```

### Query interface
Querying a specific datasource can be done using the query API. `Schema.datasource.find_docs(version='v1', field=...)` will return a list of matching documents and `Schema.datasource.find_one(datasource, version='v1', field=...)` will return the first match. Each document will be an instance of the schema class. If you do not pass a datasource to the query methods, the default datasource will be queried.
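
The difference between the two query methods can be sketched on a plain list of dictionaries. The functions below are hypothetical stand-ins that only mimic the semantics described above (all matches versus the first match). They are not the xedocs implementation, and returning `None` when nothing matches is an assumption of this sketch.

```python
def find_docs(docs, **labels):
    """Return all documents whose fields equal every given label."""
    return [
        doc for doc in docs
        if all(doc.get(name) == value for name, value in labels.items())
    ]


def find_one(docs, **labels):
    """Return the first matching document, or None if there is no match."""
    matches = find_docs(docs, **labels)
    return matches[0] if matches else None


# Placeholder documents; the values are illustrative, not real detector numbers
docs = [
    {"version": "v1", "field": "drift_velocity", "value": 1.0},
    {"version": "v2", "field": "drift_velocity", "value": 2.0},
    {"version": "v1", "field": "other_parameter", "value": 3.0},
]

# Omitting an index (here: version) matches every value of that index
all_versions = find_docs(docs, field="drift_velocity")

# Fixing every index narrows the result to a single document
first_v1 = find_one(docs, version="v1", field="drift_velocity")
```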

In [None]:
from xedocs.schemas import DetectorNumber

In [None]:
drift_velocities = DetectorNumber.staging_db.find_docs(field='drift_velocity')
drift_velocity = drift_velocities[0]

In [None]:
drift_velocity = DetectorNumber.staging_db.find_one(field='drift_velocity', version='v1')
drift_velocity

In [None]:
v1_df = DetectorNumber.staging_db.find_df(version='v1')
v1_df

In [None]:
# Document fields can be accessed as attributes
drift_velocity.value

In [None]:
# convert to python dictionary
drift_velocity.dict()

# convert to json string
print(drift_velocity.json(indent=1))

## Datasource contexts
To easily access common datasources, you can load the appropriate context and use tab completion to discover registered datasets.

In [None]:
import xedocs

# Production DB
db = xedocs.production_db()


# Staging DB
db = xedocs.staging_db() 


doc = db.pmt_gains.find_one(version='v1', pmt=0, run_id='020000')
doc

## Operation reports
Xedocs is also used to store metadata about operations relevant to analysis, such as anode ramps and calibrations.
You can find these datasets with xedocs.

### Anode ramps

In [None]:
db.anode_ramps.find_df()

### U-tube calibrations

In [None]:
db.utube_calibrations.find_df()

## Fax Configs
The WFSim configuration has also been migrated to the CMT2.0 framework in `xedocs/schemas/fax.py`. The schema definition is as follows:

```python
from typing import Literal, Union

import rframe

from .base_schemas import VersionedXeDoc


class FaxConfig(VersionedXeDoc):
    """fax configuration values for WFSim"""

    _ALIAS = "fax_configs"

    class Config:
        smart_union = True

    field: str = rframe.Index()
    experiment: Literal["1t", "nt", "nt_design"] = rframe.Index(default="nt")
    detector: Literal["tpc", "muon_veto", "neutron_veto"] = rframe.Index(default="tpc")
    science_run: str = rframe.Index()
    version: str = rframe.Index(default="v1")

    value: Union[int, float, bool, str, list, dict]
    resource: str


```

In this case the documents are also indexed by experiment, detector, and science run.

In [None]:

s2_secondary_sc_gain = db.fax_configs.find_one(field='s2_secondary_sc_gain', version='v0')
s2_secondary_sc_gain.value

In [None]:
# Get results as a dataframe

fax_configs = db.fax_configs.find_df(experiment='nt', version='v0')
fax_configs

# RemoteFrame: pandas/xarray interface
For convenience, additional query APIs are implemented, inspired by the pandas and xarray packages. Most of these methods return a pandas dataframe with the requested data selection.

In [None]:
# Each datasource accessor exposes its remote frame
# through the .rframe attribute

rf = db.pmt_gains.rframe

In [None]:
db.pmt_gains.find_dicts(detector='tpc', version='v1', limit=2)

### xarray api

In [None]:
# calling the .sel() method returns a pandas
# dataframe with the selection result

df = rf.sel(detector='tpc', version='ONLINE', time=dtime)
df

### pandas api

In [None]:
# pandas style multi-indexing also returns a pandas
# dataframe with the selection result

df = rf.loc['v1', dtime, 'tpc', 1]
df

#### Scalar lookup

In [None]:
# pandas api

rf.at[('v1', dtime, 'tpc', 1), 'value']

In [None]:
# simple callable

rf('value', detector='tpc', version='v1', time=dtime, pmt=1)

### Inserting new documents
New documents can be saved either by creating a new instance of the Schema class and calling the `.save(datasource)` method or by calling the `insert(docs)` method on the datasource accessor.

Before inserting the new data, xedocs will run any checks defined on the schema and raise an error if any check fails.
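
The check-then-insert behaviour can be sketched as follows. This is a minimal illustration under stated assumptions, not the xedocs mechanism: `CheckFailed`, `insert_with_checks` and both check functions are hypothetical names invented for this example.

```python
class CheckFailed(Exception):
    """Raised when a document fails a pre-insert check (hypothetical name)."""


def check_positive_value(doc):
    # Hypothetical check: reject non-positive values
    if doc["value"] <= 0:
        raise CheckFailed("value must be positive")


def check_has_version(doc):
    # Hypothetical check: every document needs a non-empty version
    if not doc.get("version"):
        raise CheckFailed("version must be set")


def insert_with_checks(store, doc, checks):
    # Run every check first; the document is stored only if none of them raise
    for check in checks:
        check(doc)
    store.append(doc)


store = []
checks = [check_positive_value, check_has_version]

# Passes both checks and is stored
insert_with_checks(store, {"pmt": 0, "value": 1.0, "version": "test"}, checks)

try:
    # Fails the first check, so nothing is stored
    insert_with_checks(store, {"pmt": 1, "value": -1.0, "version": "test"}, checks)
except CheckFailed as err:
    error_message = str(err)
```

Because the checks run before anything is written, a failed check leaves the datasource unchanged.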

In [None]:
data = {'pmt': 0, 'value': 1.0, 'version': 'test', 'time': dtime, 'detector': 'tpc'}
# db.pmt_gains.insert(data)

### Saving to a local database
You can create a new document and call the `.save` method with any supported writable datasource.

In [None]:
import pymongo
from xedocs.schemas import PmtGain

datasource = pymongo.MongoClient('mongodb://localhost:27017')['xedocs']['pmt_gains']

doc = PmtGain(**data)
doc.save(datasource)