# A first distributed simulation

Until now, we've only been working with single grid point simulations. This is fine for learning and playing around with different configurations of the model, since the runs take less time. A more typical use case of COSIPY is to run a distributed simulation over the entire surface of the glacier, which gives us a much more detailed view of the glacier.

In [None]:
# Have to change the cwd for the ipython session, otherwise COSIPY
# will look for things in the wrong places.
import os
import sys
# Note: this is fragile. If the cell is re-run, we move up another
# level and end up in the wrong directory.
os.chdir('./../')
sys.path.append(os.getcwd())

In [None]:
from cosipy.utils import edu_utils
# cfg gives us the NAMELIST
import numpy as np
from matplotlib import pyplot as plt
import xarray as xr

In [None]:
# Have to tell matplotlib to plot inline
%matplotlib inline

Setting up a distributed simulation is fairly easy. All we need is 2D input data (3D with time). This data could either be GCM/reanalysis output or be interpolated from station data. For the tutorials we provide a distributed input file, `Zhadang_ERA5_2009_dst.nc`, for Zhadang. We can take a look at it:

In [None]:
# This is where the data is located.
input_path = './data/input/'
with xr.open_dataset(input_path+'Zhadang/Zhadang_ERA5_2009_dst.nc') as ds:
    ds = ds.isel(time=slice(0, -1)).load()
ds

The input data contains 91 gridpoints. However, only 17 of these are actually within the glacier boundaries and are used for the run. The input data also covers the whole of 2009, so let's restrict the run to a single summer month.

<div class="alert alert-warning">
    <details>
        <Summary> <b>Where is the information about the number of gridpoints within the glacier?</b> <i>Click me for a hint</i></Summary>
        The points within the glacier are selected based on the MASK variable. You can plot it or "show" the data by pressing the storage symbol in the output above. It is also possible to count the points with the method .count(). Try it below!
    </details>
</div>

In [None]:
# Empty cell for the reader


## Running a distributed simulation

Now that we've confirmed that the data has the right dimensions, we can initialize the datasets needed to run the simulation. First we have to set the new input file in the `opt_dict`:

In [None]:
# print_options returns a pandas DataFrame, so we can index it.
edu_utils.print_options().loc['input_netcdf']

In [None]:
# opt_dict
opt_dict = dict()
# We change the input_netcdf
opt_dict['input_netcdf'] = 'Zhadang/Zhadang_ERA5_2009_dst.nc'


Let's also change the start and end dates:

In [None]:
edu_utils.print_options().loc['time_start']

In [None]:
opt_dict['time_start'] = '2009-07-01T00:00'
opt_dict['time_end'] = '2009-07-31T00:00'
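
To get a feeling for the size of this window, the short sketch below computes its length with NumPy. The step count rests on two assumptions that are not stated in this tutorial: that the forcing data is hourly and that both end points are included in the run.

```python
import numpy as np

# The same start and end strings as in opt_dict above.
time_start = np.datetime64("2009-07-01T00:00")
time_end = np.datetime64("2009-07-31T00:00")

# Length of the simulation window.
span = time_end - time_start
n_days = span / np.timedelta64(1, "D")

# Assumption: hourly forcing, with both end points included.
n_steps = int(span / np.timedelta64(1, "h")) + 1

print(n_days, n_steps)
```

Under these assumptions, the window ends at the first time step of 31 July, so the rest of that day is not part of the run.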

With this done, we can initialize the datasets needed for running the model:

In [None]:
IO, DATA, RESULTS = edu_utils.create_IO(opt_dict)

The output above confirms that we have 17 glacier gridpoints in our input data.

## Running the model
We are now ready to run the model. This is just as simple as in the single-point case. In the background, the results for each gridpoint are calculated individually; we're basically stacking multiple point simulations next to each other. `edu_utils.run_model` distributes the work so that gridpoints are executed in parallel. Note, however, that this might still take some time.

In [None]:
edu_utils.run_model(DATA, IO, RESULTS, opt_dict)
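
The idea of stacking independent point simulations can be illustrated with a small, self-contained sketch. This is not how `edu_utils.run_model` is implemented: `run_point` is a hypothetical stand-in for a point simulation, the forcing is random, and a thread pool from the standard library is used purely to keep the example simple.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def run_point(forcing_series):
    """Hypothetical stand-in for one point simulation: sums the forcing."""
    return float(np.sum(forcing_series))


# Synthetic mask (assumption): 1 marks glacier cells, NaN everything else.
mask = np.array([[np.nan, 1.0, 1.0],
                 [1.0, 1.0, np.nan]])

# Synthetic forcing with dimensions (time, lat, lon).
rng = np.random.default_rng(0)
forcing = rng.normal(size=(24,) + mask.shape)

# Only glacier cells are simulated.
points = [(int(y), int(x)) for y, x in np.argwhere(mask == 1)]

# Each gridpoint is independent, so the points can be processed in parallel.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(
        pool.map(lambda yx: run_point(forcing[:, yx[0], yx[1]]), points)
    )

# Stitch the per-point results back onto the 2D grid.
out = np.full(mask.shape, np.nan)
for (y, x), value in zip(points, results):
    out[y, x] = value

print(out)
```

Because no gridpoint depends on its neighbours in this sketch, the order in which the points finish does not affect the result.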

As before, we can take a quick look at the results. First, however, we have to reduce the dimensions of the data so that it can be plotted in 2D. A simple way is to select a single time step:

In [None]:
RESULTS.isel(time=1).MB.plot();

Alternatively, we can reduce one of the spatial dimensions by taking the mean along it, creating a so-called Hovmöller diagram:

In [None]:
RESULTS.MB.mean(dim='lon').plot();
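
To make the two reductions concrete, the sketch below applies them to a small synthetic array and prints the remaining dimensions. The dimension order `(time, lat, lon)`, the coordinate values and the data are assumptions made for illustration; only `isel` and `mean` are taken from the cells above.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for RESULTS.MB with dimensions (time, lat, lon).
data = np.arange(4 * 3 * 2, dtype=float).reshape(4, 3, 2)
mb = xr.DataArray(
    data,
    dims=("time", "lat", "lon"),
    coords={"time": np.arange(4), "lat": [0, 1, 2], "lon": [0, 1]},
    name="MB",
)

# Selecting one time step leaves a (lat, lon) map.
snapshot = mb.isel(time=1)

# Averaging over lon leaves a (time, lat) array, the basis of a Hovmöller diagram.
hovmoller = mb.mean(dim="lon")

print(snapshot.dims, hovmoller.dims)
```

For floating point data, `mean` skips NaN values by default, so gridpoints without a value do not turn the whole average into NaN.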

## Next steps
[Back to overview](welcome.ipynb)
