<img src="../_static/DREGS_logo_v2.png" width="300"/>

# Registering production datasets

Production datasets are treated slightly differently from the rest: they are registered into an independent schema, their data are stored in a separate production workspace, and, once registered, they cannot be overwritten.

### What we cover in this tutorial

In this tutorial we will learn how to:

- Register a production dataset
- Update a production dataset with a new version

### Before we begin

If you want to run this tutorial interactively and haven't done so already, check out the "getting set up" page in the docs.

A quick way to check that everything is set up correctly is to run the first cell below, which should load the `dataregistry` package and print its version.

```python
import dataregistry
print("Working with dataregistry version:", dataregistry.__version__)
```

## The production schema

Production datasets can only be registered into the `production` schema. Its layout (i.e., its tables and the columns of those tables) is no different from that of the regular working schema.

Because we can only register datasets into the schema we declare when connecting, we need to set `schema_version` to `"production"` when initializing the `DataRegistry` class:

```python
from dataregistry import DataRegistry

# Connect to the database (default configuration), working in the production schema
datareg = DataRegistry(schema_version="production", owner_type="production")
```

Note that the only `owner_type` allowed in the production schema is `"production"`, so we set it globally at initialization rather than passing it with every registration.
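As a rough illustration of the rule just described, the sketch below shows the kind of check that makes `"production"` the only acceptable `owner_type` in the production schema. `check_owner_type` and `ALLOWED_OWNER_TYPES` are hypothetical names invented for this example and are not part of `dataregistry`; `"user"` is used only as an example of a non-production owner type.

```python
from typing import Dict, Optional, Set

# Hypothetical rule table (not dataregistry code): only the production schema's
# constraint is stated in this tutorial, so other schemas are left unchecked here.
ALLOWED_OWNER_TYPES: Dict[str, Set[str]] = {"production": {"production"}}


def check_owner_type(schema_version: str, owner_type: str) -> None:
    """Raise ValueError if `owner_type` is not permitted in `schema_version`."""
    allowed: Optional[Set[str]] = ALLOWED_OWNER_TYPES.get(schema_version)
    if allowed is not None and owner_type not in allowed:
        raise ValueError(
            f"owner_type={owner_type!r} is not allowed in the {schema_version!r} schema; "
            f"allowed values: {sorted(allowed)}"
        )


check_owner_type("production", "production")  # accepted silently
try:
    check_owner_type("production", "user")  # rejected
except ValueError as err:
    print(err)
```

Setting `owner_type` once on the `DataRegistry` object means a check of this kind is satisfied for every registration made through that connection.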

## Registering a new production dataset

Now that we have connected to the database, we can register a dataset using the `Registrar` extension of the `DataRegistry` class.

```python
# Create a small text file to stand in for the output of some production code
with open("dummy_production_dataset.txt", "w") as f:
    f.write("some data")

# Add a new entry to the production schema (version 1.0.0)
dataset_id = datareg.Registrar.register_dataset(
    "nersc_tutorial/my_desc_production_dataset",
    "1.0.0",
    description="An output from some production DESC code",
    owner="DESC Working Group",
    old_location="dummy_production_dataset.txt"
)
```

## Updating a previously registered production dataset with a newer version

As pointed out above, production datasets are always non-overwritable (setting `is_overwritable=True` will not work). Therefore, to update a production dataset, the new version must be registered as a new entry. However, as in the previous tutorial, we can link the new version to the previous one through the dataset `name`.

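As a reminder, passing `"minor"` as the version, rather than an explicit version string, automatically bumps the minor version of the dataset registered under the given `name`, so the entry below becomes version `"1.1.0"`.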

```python
# Add a new entry for the updated dataset, with an updated version
dataset_id = datareg.Registrar.register_dataset(
    "nersc_tutorial/my_updated_desc_production_dataset",
    "minor",  # Automatically bumps to "1.1.0"
    description="An output from some production DESC code (updated)",
    old_location="dummy_production_dataset.txt",
    name="my_desc_production_dataset"  # Using this name links it to the previous dataset
)
```
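To make the `"minor"` bump in the cell above concrete, here is a minimal sketch of the version arithmetic, assuming the usual semantic-versioning convention (a minor bump increments the middle number and resets the patch number to zero). `bump_version` is a hypothetical helper, not part of `dataregistry`; the package's own logic, including how it finds the existing version for a given `name`, may differ.

```python
import re

# Hypothetical helper illustrating semantic-version bumps; not dataregistry's implementation.
SEMVER_PATTERN = re.compile(r"^(\d+)\.(\d+)\.(\d+)$")


def bump_version(current: str, bump: str) -> str:
    """Return the version that follows `current` for a 'major', 'minor' or 'patch' bump."""
    match = SEMVER_PATTERN.match(current)
    if match is None:
        raise ValueError(f"expected a MAJOR.MINOR.PATCH version, got {current!r}")
    major, minor, patch = (int(part) for part in match.groups())
    if bump == "major":
        return f"{major + 1}.0.0"  # a major bump resets minor and patch
    if bump == "minor":
        return f"{major}.{minor + 1}.0"  # a minor bump resets patch
    if bump == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown bump keyword: {bump!r}")


print(bump_version("1.0.0", "minor"))  # 1.1.0
```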

Remember that the combination of `name`, `version` and `version_suffix` for any dataset must be unique.
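The uniqueness rule, together with the fact that production entries cannot be overwritten, can be pictured with a small in-memory toy. `ToyProductionRegistry` and its `register` method are hypothetical illustrations only, not the `dataregistry` API; the real registry's checks, error types and messages will differ.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# Hypothetical, in-memory stand-in for the production schema; not dataregistry code.
Key = Tuple[str, str, Optional[str]]


@dataclass
class ToyProductionRegistry:
    """Keeps one entry per (name, version, version_suffix) and never overwrites it."""

    entries: Dict[Key, int] = field(default_factory=dict)
    next_id: int = 1

    def register(self, name: str, version: str, version_suffix: Optional[str] = None) -> int:
        key = (name, version, version_suffix)
        if key in self.entries:
            # Production entries are non-overwritable, so a repeated key is rejected.
            raise ValueError(f"{key} is already registered and cannot be overwritten")
        dataset_id = self.next_id
        self.entries[key] = dataset_id
        self.next_id += 1
        return dataset_id


registry = ToyProductionRegistry()
first_id = registry.register("my_desc_production_dataset", "1.0.0")
second_id = registry.register("my_desc_production_dataset", "1.1.0")  # new version: allowed
try:
    registry.register("my_desc_production_dataset", "1.0.0")  # same key: rejected
except ValueError as err:
    print("Rejected:", err)
```

Registering `"1.1.0"` under the same `name` succeeds because the key differs, which is exactly why an update must be a new entry with a new version rather than a modification of the old one.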

## Querying production data, and linking to them as dependencies

To do