# Using Data Server Helper Functions

This notebook demonstrates basic usage of the data server helper functions provided in Svalbard. They are mostly lightweight wrappers around PyMongo.

For constructing more advanced queries, please refer to [MongoDB's official documentation](https://www.mongodb.com/docs/manual/tutorial/query-documents/) and the [official cheat sheet](https://www.mongodb.com/developer/products/mongodb/cheat-sheet/#crud).



## Setup

In [None]:
from svalbard.utility import data_server_helper_functions as ds_helper

# If you are using aq_measurements, you can import the config path directly:
#
#     from aq_measurements.setup import DATA_CONFIG_PATH
#
# The universal alternative is to construct the path yourself:
from pathlib import Path

DATA_CONFIG_PATH = Path("~/.aq_config/data_server.json").expanduser()


# helper for printing the first `limit` documents in a cursor
import pymongo


def print_cursor(cursor, limit=10):
    for i, document in enumerate(cursor):
        print(document)
        if i + 1 >= limit:
            break

## Get number of documents

### Total number of documents

In [None]:
ds_helper.get_number_of_documents(DATA_CONFIG_PATH)

### Number of documents with name 'rabi'

Here we filter by the `name` field in the `MetaData`.

The `"$regex": "rabi"` query command matches names that contain the string `"rabi"`.

The `"$options": "i"` query command makes the regex filter case insensitive.
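To make the effect of the two commands concrete, the sketch below reproduces the same matching behaviour locally with Python's `re` module. The document names are hypothetical, and MongoDB uses its own regex engine, so this is only an approximation for simple patterns like this one.

```python
import re

# Hypothetical document names, used only for illustration.
names = ["Rabi_q1", "rabi_amplitude", "T1_q1", "time_rabi"]

# re.IGNORECASE plays the role of "$options": "i".
pattern = re.compile("rabi", re.IGNORECASE)

# re.search matches anywhere in the string, like an unanchored "$regex".
matches = [name for name in names if pattern.search(name)]
print(matches)
```

Without the case-insensitive option, `"Rabi_q1"` would not match in this sketch.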



In [None]:
ds_helper.get_number_of_documents(
    DATA_CONFIG_PATH, {"name": {"$regex": "rabi", "$options": "i"}}
)

In general, queries are constructed in the format:

```
{
    "field.to.query": {"$query_command": "filter_parameter", ...},
    "another.field": {"$another_command": "another_parameter", ...},
    ...
}
```

## Getting Documents

The `get_many_documents` function returns a `pymongo.cursor.Cursor` object that has to be iterated over to get multiple documents.

*Note*: Here we use a "projection" to limit which document fields are returned for the query. Projections are covered below when discussing retrieving single documents.

In [None]:
import datetime

date_start = datetime.datetime(2024, 7, 30, 0, 0, 0).isoformat()
date_end = datetime.datetime(2024, 7, 30, 4, 10, 0).isoformat()
docs = ds_helper.get_many_documents(
    DATA_CONFIG_PATH,
    {
        "name": {"$regex": "t1"},  # use regex to match partial string
        "date": {
            "$gte": date_start,
            "$lt": date_end,
        },  # use $gte and $lt to match a range (here date range)
        "station": "Atlantis - 1",  # match exact string; a regex also works here
    },
    {
        "_id": 1,
        "name": 1,
        "data_path": 1,
        "date": 1,
    },  # projection to only return these fields
)
type(docs)
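The `date` range above compares strings, because `date_start` and `date_end` come from `.isoformat()`. This works as long as the stored `date` values use the same ISO 8601 layout, since such strings sort in chronological order. That the stored values follow this layout is an assumption here, and the `stored` value below is hypothetical.

```python
import datetime

date_start = datetime.datetime(2024, 7, 30, 0, 0, 0).isoformat()
date_end = datetime.datetime(2024, 7, 30, 4, 10, 0).isoformat()

# Hypothetical stored value, assumed to use the same ISO 8601 layout.
stored = datetime.datetime(2024, 7, 30, 2, 30, 0).isoformat()

# Plain string comparison mirrors {"$gte": date_start, "$lt": date_end}.
in_range = date_start <= stored < date_end

# "$lt" is exclusive: a document dated exactly at date_end is not matched.
end_included = date_start <= date_end < date_end

print(date_start, stored, in_range, end_included)
```

If the stored strings mixed formats or time zone offsets, string ordering would no longer be guaranteed to follow chronological ordering.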

In [None]:

date_start = datetime.datetime(2024, 7, 30, 0, 0, 0).isoformat()
date_end = datetime.datetime(2024, 7, 30, 4, 10, 0).isoformat()


atlantis_measurements = ds_helper.get_number_of_documents(
    DATA_CONFIG_PATH, {
        "name": {"$regex": "t1"},  # use regex to match partial string
        "date": {
            "$gte": date_start,
            "$lt": date_end,
        },  # use $gte and $lt to match a range (here date range)
        "station": "Atlantis - 1",  # match exact string
    }
)
bermuda_measurements = ds_helper.get_number_of_documents(
    DATA_CONFIG_PATH, {
        "name": {"$regex": "t1"},  # use regex to match partial string
        "date": {
            "$gte": date_start,
            "$lt": date_end,
        },  # use $gte and $lt to match a range (here date range)
        "station": "Bermuda - 2",  # match exact string
    }
)
print("Atlantis - 1", atlantis_measurements)
print("Bermuda - 2", bermuda_measurements)
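The two filters above differ only in the `station` value. One way to avoid the repetition is a small builder function, sketched below. `build_t1_filter` is a hypothetical helper written for this notebook, not part of Svalbard.

```python
def build_t1_filter(station, date_start, date_end):
    """Return a filter for 't1' measurements on one station in a date range.

    Hypothetical helper for illustration; not part of Svalbard.
    """
    return {
        "name": {"$regex": "t1"},  # partial string match
        "date": {"$gte": date_start, "$lt": date_end},  # half-open range
        "station": station,  # exact string match
    }


# One filter per station, using the same date range as above.
filters = {
    station: build_t1_filter(station, "2024-07-30T00:00:00", "2024-07-30T04:10:00")
    for station in ("Atlantis - 1", "Bermuda - 2")
}
print(filters["Bermuda - 2"])
```

Each entry can then be passed as the second argument to `ds_helper.get_number_of_documents`, in place of the literal dictionaries.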

By iterating over the cursor, we can access the documents.

In [None]:
for i, doc in enumerate(docs):
    print(i, doc)
    if i + 1 >= 3:
        break

Iterating over the cursor consumes it, so running the `print_cursor` function on the same cursor yields the next documents rather than the ones already printed.

In [None]:
print_cursor(docs, limit=3)

Once all the documents have been read out of the cursor, iterating over it returns nothing.

In [None]:
for doc in docs:
    pass

print_cursor(docs)
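The same behaviour can be reproduced without a database, because any one-shot Python iterator behaves this way. The sketch below uses a generator as a stand-in for the cursor. `fake_cursor` and its documents are hypothetical.

```python
from itertools import islice


def fake_cursor(n):
    """Stand-in for a database cursor: yields n hypothetical documents once."""
    for i in range(n):
        yield {"_id": i}


docs = fake_cursor(5)

first = list(islice(docs, 3))   # consumes documents 0, 1, 2
second = list(islice(docs, 3))  # continues with documents 3, 4
third = list(docs)              # nothing left to read

print(first, second, third)
```

If the documents are needed more than once, one option is to materialise them up front with `list(...)` and work with the list instead.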

## Getting a single document

The `get_document` function retrieves a single document by its "ObjectID" string.

A "projection" is used to select which fields are returned for the document.

In [None]:
ds_helper.get_document(
    DATA_CONFIG_PATH,
    "YOUR_OBJECT_ID",
    {"_id": 1, "name": 1, "data_path": 1, "date": 1, "station": 1},
)

The `get_name_and_date` function has a built-in projection that returns just the name and date.

The `get_name_and_data_path` function has a built-in projection that returns just the name and data_path.

In [None]:
print(ds_helper.get_name_and_date(DATA_CONFIG_PATH, "YOUR_OBJECT_ID"))
print(ds_helper.get_name_and_data_path(DATA_CONFIG_PATH, "YOUR_OBJECT_ID"))

"Projections" are either entirely inclusive or entirely exclusive. Trying to mix inclusions and exclusions in a projection raises an error; the one exception is `_id`, which may be excluded from an otherwise inclusive projection.

In [None]:
ds_helper.get_document(
    DATA_CONFIG_PATH, "YOUR_OBJECT_ID", {"instruments": 0, "compiler_data": 0}
)
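The inclusive-or-exclusive rule can be checked locally before a query is sent. The sketch below is a simplified check written for this notebook: `is_valid_projection` is a hypothetical helper, it only handles plain `0`/`1` flags, and it ignores `_id` because MongoDB allows excluding `_id` from an otherwise inclusive projection.

```python
def is_valid_projection(projection):
    """Return True if the projection does not mix inclusions and exclusions.

    Hypothetical helper; handles only plain 0/1 flags and skips "_id".
    """
    flags = {bool(value) for field, value in projection.items() if field != "_id"}
    # At most one kind of flag (all inclusive or all exclusive) is allowed.
    return len(flags) <= 1


print(is_valid_projection({"_id": 1, "name": 1, "date": 1}))        # inclusive
print(is_valid_projection({"instruments": 0, "compiler_data": 0}))  # exclusive
print(is_valid_projection({"name": 1, "instruments": 0}))           # mixed
```

The database remains the authority on what is accepted. This check only catches the simple mixed case early.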

The `get_and_exclude_large_fields` function excludes large fields such as `instruments` and `compiler_data` from the returned document.

In [None]:
ds_helper.get_and_exclude_large_fields(DATA_CONFIG_PATH, "YOUR_OBJECT_ID")

If you need to access many fields of a document, it is probably best to convert it to a `MetaData` object.

In [None]:
from svalbard.data_model.data_file import MetaData

document = ds_helper.get_document(
    DATA_CONFIG_PATH, "YOUR_OBJECT_ID", {"_id": 0}
)
metadata = MetaData(**document)

metadata.name

## Get Data

The `get_data_group` function can be used to get a zarr array of the data belonging to the document.

In [None]:
zarr_array = ds_helper.get_data_group(DATA_CONFIG_PATH, "YOUR_OBJECT_ID")

In [None]:
for key in zarr_array:
    print(key)

In [None]:
import matplotlib.pyplot as plt

plt.plot(zarr_array["Time"][:], zarr_array["average_population"][:])
plt.xlabel("Time (s)")
plt.ylabel("Population")

## The entire document

Finally, this is the raw output of the entire document, retrieved without a projection.

In [None]:
ds_helper.get_document(DATA_CONFIG_PATH, "YOUR_OBJECT_ID")