> [!Warning] 
> **This project is still in an early phase of development.**
>
> The [Python API](../api.html) is not yet stable, and some aspects of the schema for the [blueprint](../terminology.html#term-blueprint) will likely evolve.
> Therefore, whilst you are welcome to try out the package, we cannot yet guarantee backwards compatibility.
> We expect to reach a more stable version in 2025.
>
> To see which systems C-Star has been tested on so far, see [Supported Systems](../machines.html).

# Building a `Simulation` and exporting it as a blueprint

## Contents
1. [Introduction](#1.-Introduction)
2. [Building the components of the ROMSSimulation](#2.-Building-the-Simulation)
    - [Constructing the AdditionalCode instances](#2i.-Constructing-the-AdditionalCode-instances)
    - [Constructing the InputDataset instances](#2ii.-Constructing-the-InputDataset-instances)
    - [Constructing the Discretization instance](#2iii.-Constructing-the-Discretization-instance)
    - [Creating the ROMSSimulation instance](#2iv.-Creating-the-ROMSSimulation-instance)
3. [Exporting the ROMSSimulation to a blueprint](#3.-Exporting-the-ROMSSimulation-to-a-blueprint)

## 1. Introduction

[(return to top)](#Contents)

The `Simulation` is the primary object of C-Star, and contains all the information needed to run a particular simulation. Once prepared, Simulations can be stored in ["blueprints"](../terminology.html#term-blueprint): `.yaml` files telling C-Star what goes into each Simulation and where to find it. These blueprints can then be shared with anyone interested in reproducing the simulation they describe.

In this guide, we will create a ROMS-MARBL C-Star Simulation ([ROMS](http://research.atmos.ucla.edu/cesr/ROMS_page.html) for ocean physics modeling, [MARBL](https://eesm.science.energy.gov/projects/marine-biogeochemistry-library-marbl) for biogeochemistry) and export it to a blueprint. On [the next page](../tutorials/2_importing_and_running_a_simulation_from_a_blueprint.html) we will look at how to _run_ the simulation, starting from a blueprint.

### The structure of the Simulation:
<!-- [Here](../terminology.html#structure-of-c-star-simulation) you can get a general overview of a C-Star simulation.  -->
For our `roms_marbl_example` [case](https://github.com/CWorthy-ocean/cstar_blueprint_roms_marbl_example), the simulation structure breaks down like this:  
```
ROMSSimulation
├── codebase (ROMSExternalCodeBase)
├── marbl_codebase (MARBLExternalCodeBase)
├── runtime_code (AdditionalCode)
├── compile_time_code (AdditionalCode)
├── model_grid (ROMSInputDataset)
├── initial_conditions (ROMSInputDataset)
├── tidal_forcing (ROMSInputDataset)
├── river_forcing (ROMSInputDataset)
├── surface_forcing (list of ROMSInputDatasets)
├── boundary_forcing (list of ROMSInputDatasets)
└── discretization (ROMSDiscretization)

```
These are all the elements needed to create a unique, reproducible ROMS-MARBL simulation. You will notice that the `ExternalCodeBase`, `InputDataset`, and `Discretization` objects here are specific to the model they describe (e.g. `ROMSExternalCodeBase`). This is because the `ROMSExternalCodeBase` object describing ROMS may have attributes or operations that differ from those of the object describing MARBL, which has its own subclass, `MARBLExternalCodeBase`.
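
The subclassing pattern described above can be sketched with toy classes. This is only an illustration, not C-Star's actual implementation: the names `ToyExternalCodeBase`, `ToyROMSCodeBase`, and `ToyMARBLCodeBase` are hypothetical, and the default repositories and checkout targets are simply the values that appear in the output later in this tutorial.

```python
from dataclasses import dataclass


@dataclass
class ToyExternalCodeBase:
    """Hypothetical stand-in for a generic description of an external codebase."""

    source_repo: str
    checkout_target: str

    def describe(self) -> str:
        # Shared behaviour lives on the base class.
        return f"{type(self).__name__}: {self.source_repo} @ {self.checkout_target}"


@dataclass
class ToyROMSCodeBase(ToyExternalCodeBase):
    """Model-specific subclass carrying ROMS defaults (values taken from this tutorial)."""

    source_repo: str = "https://github.com/CESR-lab/ucla-roms.git"
    checkout_target: str = "main"


@dataclass
class ToyMARBLCodeBase(ToyExternalCodeBase):
    """Model-specific subclass carrying MARBL defaults (values taken from this tutorial)."""

    source_repo: str = "https://github.com/marbl-ecosys/MARBL.git"
    checkout_target: str = "marbl0.45.0"


print(ToyROMSCodeBase().describe())
print(ToyMARBLCodeBase().describe())
```

Each subclass shares the base behaviour but supplies its own defaults, which is the general idea behind having model-specific classes.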

To build this Simulation from the bottom up, we'll need to assemble any code and input datasets.

## 2. Building the components of the `ROMSSimulation`

[(return to top)](#Contents)

### 2i. Constructing the AdditionalCode instances

[(return to top)](#Contents)

`AdditionalCode` objects hold collections of related code, either in a local directory or remote repository. To construct an `AdditionalCode` object, we provide a `location` argument pointing to one of the two.

As we are using additional code hosted in a remote repository for this example, we also need:

- a `subdir` (subdirectory relative to the repository top level in which to find the code) 
- a `checkout_target` argument (branch, tag, or commit hash)

We also need to provide a list of filenames corresponding to our `AdditionalCode`.

For our `ROMSSimulation`, we will build two `AdditionalCode` instances: one used by ROMS at runtime, one used at compile-time. First, the runtime `AdditionalCode`:

In [1]:
from cstar.base import AdditionalCode
roms_runtime_code = AdditionalCode(
    location = "https://github.com/CWorthy-ocean/cstar_blueprint_roms_marbl_example.git",
    subdir = "roms_runtime_code",
    checkout_target = "main",
    files = [
        "roms.in_TEMPLATE",
        "marbl_in",
        "marbl_tracer_output_list",
        "marbl_diagnostic_output_list"
    ]
)
print(roms_runtime_code)

AdditionalCode
--------------
Location: https://github.com/CWorthy-ocean/cstar_blueprint_roms_marbl_example.git
Subdirectory: roms_runtime_code
Checkout target: main
Working path: None
Exists locally: False (get with AdditionalCode.get())
Files:
    roms.in_TEMPLATE      (roms.in will be used by C-Star based on this template)
    marbl_in
    marbl_tracer_output_list
    marbl_diagnostic_output_list


<div class="alert alert-info">

Note

For `roms_runtime_code`, the first entry under `files` is a template for the namelist file. C-Star recognises the `_TEMPLATE` suffix and works with a local copy (in this case `roms.in`), which it modifies to reflect user choices (such as run length) and then uses to run ROMS.
</div>
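
The naming convention in the note above can be sketched in a few lines. This is a minimal illustration of the suffix rule only, not C-Star's own code; the helper name `working_copy_name` is hypothetical.

```python
TEMPLATE_SUFFIX = "_TEMPLATE"


def working_copy_name(filename: str) -> str:
    """Return the name of the local working copy for a (possibly templated) file.

    Hypothetical helper: files ending in ``_TEMPLATE`` map to the same name
    without the suffix; all other files keep their name.
    """
    if filename.endswith(TEMPLATE_SUFFIX):
        return filename[: -len(TEMPLATE_SUFFIX)]
    return filename


runtime_files = [
    "roms.in_TEMPLATE",
    "marbl_in",
    "marbl_tracer_output_list",
    "marbl_diagnostic_output_list",
]

for name in runtime_files:
    print(f"{name} -> {working_copy_name(name)}")
```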

Next, the compile-time `AdditionalCode` (such as ROMS' `.opt` files, which are used to set parameters):

In [2]:
roms_compile_time_code = AdditionalCode(
    location = "https://github.com/CWorthy-ocean/cstar_blueprint_roms_marbl_example.git",
    subdir = "roms_compile_time_code",
    checkout_target = "main",
    files = [
        "bgc.opt",
        "bulk_frc.opt",
        "cppdefs.opt",
        "diagnostics.opt",
        "ocean_vars.opt",
        "param.opt",
        "tracers.opt",
        "river_frc.opt",
        "Makefile",
        "Make.depend",
    ]
)

print(roms_compile_time_code)

AdditionalCode
--------------
Location: https://github.com/CWorthy-ocean/cstar_blueprint_roms_marbl_example.git
Subdirectory: roms_compile_time_code
Checkout target: main
Working path: None
Exists locally: False (get with AdditionalCode.get())
Files:
    bgc.opt
    bulk_frc.opt
    cppdefs.opt
    diagnostics.opt
    ocean_vars.opt
    param.opt
    tracers.opt
    river_frc.opt
    Makefile
    Make.depend


---
### 2ii. Constructing the InputDataset instances

[(return to top)](#Contents)

In addition to the additional code, we need several types of input dataset, each described by a specialized subclass of the [InputDataset class](../generated/cstar.base.InputDataset.html):

- a grid file supplying information about the domain to ROMS ([ROMSModelGrid](../generated/cstar.roms.ROMSModelGrid.html))
- an initial condition file from which to start the run ([ROMSInitialConditions](../generated/cstar.roms.ROMSInitialConditions.html))
- boundary forcing files providing information at the edge of the domain ([ROMSBoundaryForcing](../generated/cstar.roms.ROMSBoundaryForcing.html))
- surface forcing files providing information at the upper boundary ([ROMSSurfaceForcing](../generated/cstar.roms.ROMSSurfaceForcing.html))
- tidal forcing files providing information on tidal constituents ([ROMSTidalForcing](../generated/cstar.roms.ROMSTidalForcing.html))
- a river forcing file providing information on river inputs (`ROMSRiverForcing`)


<div class="alert alert-info">

Note

For simplicity, in this tutorial we will work with pre-prepared input data in netCDF format. To learn how to prepare this data yourself, see [the roms-tools Python package documentation](https://roms-tools.readthedocs.io/en/latest/). For more information on `InputDataset`s in general, including supported formats, see [this page](../howto_guides/2_working_with_inputdatasets.html).

</div>

<div class="alert alert-info">

Note

In the following, the `location` attribute can be either a **local path** or a **URL**. Here each `location` is a URL pointing to a binary file, so a `file_hash` (a 256-bit checksum) must also be provided to verify the download.

</div>
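
If you prepare your own input files, you will need a checksum for each one. The sketch below computes one with Python's standard library, assuming that the 256-bit checksum is a SHA-256 digest (an assumption on our part, consistent with the 64-character hexadecimal hashes used in this tutorial). The helper name `sha256_of_file` is hypothetical, and the demonstration uses a small temporary file rather than a real dataset.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks.

    Chunked reading avoids loading large netCDF files into memory at once.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Demonstration on a tiny temporary file (stand-in for a real input dataset).
with tempfile.TemporaryDirectory() as tmp:
    demo_file = Path(tmp) / "demo.bin"
    demo_file.write_bytes(b"abc")
    demo_hash = sha256_of_file(demo_file)

print(demo_hash)
```

The resulting string is the kind of value that would be passed as `file_hash`, provided the assumption about the hash algorithm holds.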

In [3]:
from cstar.roms import (
    ROMSModelGrid,
    ROMSInitialConditions,
    ROMSTidalForcing,
    ROMSBoundaryForcing,
    ROMSSurfaceForcing,
    ROMSRiverForcing,
)

netcdf_dataset_location = "https://github.com/CWorthy-ocean/cstar_blueprint_roms_marbl_example/raw/netcdf_inputs/input_datasets_netcdf/"

# Boundary
roms_phys_boundary_forcing = ROMSBoundaryForcing(
    location=netcdf_dataset_location + "roms_bry.nc",
    file_hash="3b51a46b1bd50d8a0e7c7f96c7b153c9d2c7fb26a8e9d97ce957d43210944909",
)
roms_bgc_boundary_forcing = ROMSBoundaryForcing(
    location=netcdf_dataset_location + "roms_bry_bgc.nc",
    file_hash="366af33acf309c7644fab8f7bd5385c99123634c82d5c15f09d2033ec7103a6e",
)

# Surface
roms_phys_surface_forcing = ROMSSurfaceForcing(
    location=netcdf_dataset_location + "roms_frc.nc",
    file_hash="b9d1884c5175c8e690ad372d0585583ccaa04baa35bb1e8f3c0d2f2b37666829",
)
roms_bgc_surface_forcing = ROMSSurfaceForcing(
    location=netcdf_dataset_location + "roms_frc_bgc.nc",
    file_hash="f78fce51e2178adcd128ea8d92bf091a450e4245dcb023027faae8e2d3963e72",
)

# Grid
roms_model_grid = ROMSModelGrid(
    location=netcdf_dataset_location + "roms_grd.nc",
    file_hash="41397c80fd00536dc414f6b8039b08fbe5d9f234aec66c3fc9b5a8e13353502a",
)

# Initial conditions
roms_initial_conditions = ROMSInitialConditions(
    location=netcdf_dataset_location + "roms_ini.nc",
    file_hash="c8eda3bab223d8f247055da5afe6a69234833733c75cba7dc6b5a85b06263d52",
)

# Tides
roms_tidal_forcing = ROMSTidalForcing(
    location=netcdf_dataset_location + "roms_tides.nc",
    file_hash="9466a6cacf33f3b3cbfaa87044c70cc8ef12e963f42ce3e72e30b564541afef1",
)

# Rivers
roms_river_forcing = ROMSRiverForcing(
    location=netcdf_dataset_location + "roms_riv_frc.nc",
    file_hash="43f99a44ef85d648c1e940172400f031bf6c41cd283883938543b7ca8b39a800",
)



We can query each input dataset to get pertinent information about its state, e.g.:

In [4]:
print(roms_phys_boundary_forcing)

-------------------
ROMSBoundaryForcing
-------------------
Source location: https://github.com/CWorthy-ocean/cstar_blueprint_roms_marbl_example/raw/netcdf_inputs/input_datasets_netcdf/roms_bry.nc
Source file hash: 3b51a46b1bd50d8a0e7c7f96c7b153c9d2c7fb26a8e9d97ce957d43210944909
Working path: None ( does not yet exist. Call InputDataset.get() )


### 2iii. Constructing the Discretization instance

[(return to top)](#Contents)

Lastly, we need to tell C-Star how we will be discretizing our components. MARBL piggybacks off the discretization of its host model, so we only need to create a `ROMSDiscretization` object. This contains:

- the time step (`time_step`, in seconds)
- the number of processors along the x and y directions for running in parallel (`n_procs_x`, `n_procs_y`)

In [5]:
from cstar.roms import ROMSDiscretization

roms_discretization = ROMSDiscretization(time_step = 60,
                                         n_procs_x = 3,
                                         n_procs_y = 3)
print(roms_discretization)

ROMSDiscretization
------------------
time_step: 60s
n_procs_x: 3 (Number of x-direction processors)
n_procs_y: 3 (Number of y-direction processors)
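
To get a feel for what these values imply, the short calculation below works out the total processor count and the number of time steps in a 30-day run. It is plain arithmetic rather than a C-Star call, and it assumes one process per subdomain (so the total is `n_procs_x * n_procs_y`) and a run spanning the whole of the valid date range used later in this tutorial.

```python
from datetime import datetime

time_step_s = 60  # seconds, as in the ROMSDiscretization above
n_procs_x, n_procs_y = 3, 3

# Assumption: one process per subdomain of the n_procs_x by n_procs_y decomposition.
total_procs = n_procs_x * n_procs_y

# Assumption: the run covers the full valid date range used later in this tutorial.
run_start = datetime(2012, 1, 1, 12, 0, 0)
run_end = datetime(2012, 1, 31, 12, 0, 0)
run_length_s = (run_end - run_start).total_seconds()
n_steps = int(run_length_s // time_step_s)

print(f"Total processors: {total_procs}")
print(f"Time steps for the full period: {n_steps}")
```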


### 2iv. Creating the ROMSSimulation instance

[(return to top)](#Contents)

We now have everything we need to create the `ROMSSimulation` object describing our experiment. To put it together, we should provide the objects we constructed above, and some additional information:
- A `name` for the simulation
- A `directory` in which to curate input files and code, and ultimately run the simulation
- A `valid_start_date` and `valid_end_date`, defining the date range in which the simulation is valid.

We do not intend to run this simulation yet, but typically we would also provide a `start_date` and `end_date`, within the valid date range, associated with our desired simulation period. As this information is not exported to our blueprint, we do not include it.

<div class="alert alert-info">

**Note**
    

The "valid" date range specified by `valid_start_date` and `valid_end_date` is the range of dates in which this `ROMSSimulation` **can** be run, rather than the date range for which it **will** be run. The restriction may reflect scientific validation of a certain period, or simply the availability of input data (as in this example, where we only have forcing data for January 2012).

A `ROMSSimulation` should typically also be initialized with `start_date` and `end_date`, which are unique to that instance and specify the dates for which it **will** be run.

As we are building this `ROMSSimulation` to export, not run, we omit the `start_date` and `end_date` parameters for now, as they are not exported. C-Star will automatically set them to span the full valid range.
    
</div>
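
The relationship between the two date ranges can be made concrete with a small check. This is a sketch using only the standard library, not C-Star's own validation logic; the helper `is_within_valid_range` is hypothetical, and the date format matches the strings used in this tutorial.

```python
from datetime import datetime

DATE_FORMAT = "%Y%m%d %H:%M:%S"  # e.g. "20120101 12:00:00"


def is_within_valid_range(
    start_date: str,
    end_date: str,
    valid_start_date: str,
    valid_end_date: str,
) -> bool:
    """Return True if [start_date, end_date] is a non-empty period inside the valid range."""
    start = datetime.strptime(start_date, DATE_FORMAT)
    end = datetime.strptime(end_date, DATE_FORMAT)
    valid_start = datetime.strptime(valid_start_date, DATE_FORMAT)
    valid_end = datetime.strptime(valid_end_date, DATE_FORMAT)
    return valid_start <= start < end <= valid_end


valid_range = ("20120101 12:00:00", "20120131 12:00:00")

# A one-week run inside January 2012 is allowed.
print(is_within_valid_range("20120105 00:00:00", "20120112 00:00:00", *valid_range))
# A run extending into February falls outside the valid range.
print(is_within_valid_range("20120125 00:00:00", "20120205 00:00:00", *valid_range))
```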

In [6]:
from cstar.roms import ROMSSimulation
roms_simulation = ROMSSimulation(
    # Instantiation parameters:
    name='roms_marbl_example_cstar_simulation',
    directory = "../../examples/roms_marbl_example_case",
    valid_start_date = "20120101 12:00:00",
    valid_end_date = "20120131 12:00:00",
    # Constructs from above:
    runtime_code = roms_runtime_code,
    compile_time_code = roms_compile_time_code,
    discretization = roms_discretization,
    model_grid = roms_model_grid,
    initial_conditions = roms_initial_conditions,
    tidal_forcing = roms_tidal_forcing,
    river_forcing = roms_river_forcing,
    boundary_forcing = [roms_phys_boundary_forcing,roms_bgc_boundary_forcing],
    surface_forcing = [roms_phys_surface_forcing, roms_bgc_surface_forcing]
)
print(roms_simulation)

          • Source location: https://github.com/CESR-lab/ucla-roms.git
          • Checkout target: main

          • Source location: https://github.com/marbl-ecosys/MARBL.git
          • Checkout target: marbl0.45.0



ROMSSimulation
--------------
Name: roms_marbl_example_cstar_simulation
Directory: /Users/dafyddstephenson/Code/my_c_star/examples/roms_marbl_example_case
Start date: 2012-01-01 12:00:00
End date: 2012-01-31 12:00:00
Valid start date: 2012-01-01 12:00:00
Valid end date: 2012-01-31 12:00:00

Discretization: ROMSDiscretization(time_step = 60, n_procs_x = 3, n_procs_y = 3)

Code:
Codebase: ROMSExternalCodeBase instance (query using ROMSSimulation.codebase)
Runtime code: AdditionalCode instance with 4 files (query using ROMSSimulation.runtime_code)
Compile-time code: AdditionalCode instance with 10 files (query using ROMSSimulation.compile_time_code)
MARBL Codebase: MARBLExternalCodeBase instance (query using ROMSSimulation.marbl_codebase)

Input Datasets:
Model grid: <ROMSModelGrid instance>
Initial conditions: <ROMSInitialConditions instance>
Tidal forcing: <ROMSTidalForcing instance>
River forcing: <ROMSRiverForcing instance>
Surface forcing: <list of 2 ROMSSurfaceForcing instances>
Bou

<div class="alert alert-info">

**Note**
    
In addition to the warnings about the missing `start_date` and `end_date` covered in the above note, we see that we have not provided "External code bases" for ROMS and MARBL, and that defaults will be used. Should an advanced user wish to use a specific release of ROMS or MARBL, or a different source repository such as a fork, they can also provide `codebase` and `marbl_codebase` arguments to `ROMSSimulation` at initialization, specifying these preferences.

See the [page for working with ExternalCodeBases](../howto_guides/3_working_with_externalcodebases.html) for further information.
    
</div>

### Visualizing the Simulation:
We can see how the simulation directory will look once the simulation is set up using `Simulation.tree()`:

In [7]:
print(roms_simulation.tree())

/Users/dafyddstephenson/Code/my_c_star/examples/roms_marbl_example_case
└── ROMS
    ├── input_datasets
    │   ├── roms_grd.nc
    │   ├── roms_ini.nc
    │   ├── roms_tides.nc
    │   ├── roms_riv_frc.nc
    │   ├── roms_bry.nc
    │   ├── roms_bry_bgc.nc
    │   ├── roms_frc.nc
    │   └── roms_frc_bgc.nc
    ├── runtime_code
    │   ├── roms.in_TEMPLATE
    │   ├── marbl_in
    │   ├── marbl_tracer_output_list
    │   └── marbl_diagnostic_output_list
    └── compile_time_code
        ├── bgc.opt
        ├── bulk_frc.opt
        ├── cppdefs.opt
        ├── diagnostics.opt
        ├── ocean_vars.opt
        ├── param.opt
        ├── tracers.opt
        ├── river_frc.opt
        ├── Makefile
        └── Make.depend



## 3. Exporting the ROMSSimulation to a blueprint

[(return to top)](#Contents)

We can save all the information associated with this simulation to a YAML file using `ROMSSimulation.to_blueprint(filename)`.
On the [next page](../tutorials/2_importing_and_running_a_simulation_from_a_blueprint.html) we will import and run a `ROMSSimulation` using a blueprint.

In [8]:
roms_simulation.to_blueprint("roms_marbl_example_simulation.yaml")

Let's take a look at the `blueprint` file. We will see it contains all the information we provided above:

In [9]:
from pathlib import Path
print(Path("roms_marbl_example_simulation.yaml").read_text())

name: roms_marbl_example_cstar_simulation
valid_start_date: 2012-01-01 12:00:00
valid_end_date: 2012-01-31 12:00:00
codebase:
  source_repo: https://github.com/CESR-lab/ucla-roms.git
  checkout_target: main
discretization:
  time_step: 60
  n_procs_x: 3
  n_procs_y: 3
runtime_code:
  location: https://github.com/CWorthy-ocean/cstar_blueprint_roms_marbl_example.git
  subdir: roms_runtime_code
  checkout_target: main
  files:
  - roms.in_TEMPLATE
  - marbl_in
  - marbl_tracer_output_list
  - marbl_diagnostic_output_list
compile_time_code:
  location: https://github.com/CWorthy-ocean/cstar_blueprint_roms_marbl_example.git
  subdir: roms_compile_time_code
  checkout_target: main
  files:
  - bgc.opt
  - bulk_frc.opt
  - cppdefs.opt
  - diagnostics.opt
  - ocean_vars.opt
  - param.opt
  - tracers.opt
  - river_frc.opt
  - Makefile
  - Make.depend
marbl_codebase:
  source_repo: https://github.com/marbl-ecosys/MARBL.git
  checkout_target: marbl0.45.0
model_grid:
  location: https://github.com