#### Running the Simulation

The main simulation code can be found in the file **gom_b_sim_v2.py**.

The code is split into four parts: the FieldSet, the ParticleSet, the Kernels, and the simulation run. Each of these components follows the [OceanParcels](https://docs.oceanparcels.org/en/latest/examples/tutorial_parcels_structure.html) documentation very closely. For more information on how each component was set up, I recommend reading that documentation.

To run the simulation, I split the computation into 5 runs, each covering a 6-month beaching period, between 07/01/2019 and 12/01/2021. This required creating the following SLURM scripts,

 - gombsim_2019_12_7.sh
 - gombsim_2020_6_1.sh
 - gombsim_2020_12_7.sh
 - gombsim_2021_6_1.sh
 - gombsim_2021_12_7.sh

one for each labeled beaching period. Each run tracks particles for 180 days, with a backwards-in-time release window equal to the window provided on the command line. For example, a call to the simulation inside a SLURM script might look like,

```bash
python gom_b_sim_v2.py data/output_v2/raw1/ 2019 12 7 180
```
where the first argument is the path to write the trajectory output, the second is the year, the third is the starting month (backwards in time), the fourth is the ending month, and the last is the number of days each particle is tracked. Note that the simulation code can only release particles within a single year, over any number of months. To simulate more than a year in one call, the simulation code must be altered.
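As an illustration of the argument order described above, here is a minimal sketch of how such a command line could be parsed with `argparse`. It is not taken from **gom_b_sim_v2.py**: the function name `parse_args`, the argument names, and the check that the starting month is not earlier than the ending month are all assumptions made for this example.

```python
import argparse


def parse_args(argv=None):
    """Parse the five positional arguments in the order described above.

    Hypothetical sketch: names and validation are illustrative only and
    are not copied from gom_b_sim_v2.py.
    """
    parser = argparse.ArgumentParser(description="Run one simulation chunk.")
    parser.add_argument("output_dir", help="path to write the trajectory output")
    parser.add_argument("year", type=int, help="year of the release window")
    parser.add_argument("start_month", type=int, help="starting month (backwards in time)")
    parser.add_argument("end_month", type=int, help="ending month")
    parser.add_argument("n_days", type=int, help="days each particle is tracked")
    args = parser.parse_args(argv)

    # Assumption: the window runs backwards within one year, so the
    # starting month must not be earlier than the ending month.
    for month in (args.start_month, args.end_month):
        if not 1 <= month <= 12:
            parser.error(f"month out of range: {month}")
    if args.start_month < args.end_month:
        parser.error("start_month must be >= end_month for a backwards window")
    return args


# Same arguments as the example command above.
args = parse_args(["data/output_v2/raw1/", "2019", "12", "7", "180"])
print(args.year, args.start_month, args.end_month, args.n_days)
```

Rejecting an out-of-order window early means a mistyped SLURM script fails immediately instead of after the job has been queued and started.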

The main reason for setting up the simulation this way is that OceanParcels (as of October 2023) has a fairly rudimentary parallelization module. The main issue with its current implementation is that it tries to minimize the amount of the fieldset needed for the simulation, so it automatically chunks the simulation based only on the particle set, so that it knows where in the fieldset it needs to advect particles. The developers have already pointed out the main issue with this method and will be fixing it. However, this does not work for us, because our particle set is not constant across release windows. For example, one day we might release 10 particles, but the next day 10000 particles may be released. In the current parallelization code, this would create two chunks of 10 and 10000 particles. One chunk is then still running 10000 particles while the other is only running 10, which defeats the purpose of parallelizing in the first place. To get around this, I chunked my runs ahead of time and ran everything at once within each predetermined chunk/run.
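The load imbalance in this example can be made concrete with a small calculation. The sketch below does not use OceanParcels. The function `imbalance` is hypothetical, the particle counts are the illustrative 10 and 10000 from the example, and the even split is only a reference point, not the chunking used in this project.

```python
def imbalance(chunk_sizes):
    """Return the largest chunk divided by the mean chunk size.

    A value of 1.0 means every chunk holds the same number of particles;
    larger values mean the biggest chunk dominates the wall time.
    """
    mean_size = sum(chunk_sizes) / len(chunk_sizes)
    return max(chunk_sizes) / mean_size


# Chunking by release day: one chunk has 10 particles, the other 10000.
by_release_day = [10, 10000]

# Reference point: the same 10010 particles split evenly in two.
evenly_split = [5005, 5005]

print(imbalance(by_release_day))
print(imbalance(evenly_split))
```

With chunks of 10 and 10000, the larger chunk holds almost twice the mean load, so the second chunk contributes close to nothing to the speed-up.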

Once each run has completed, we rechunk the files following the recommendation in [Dealing with large output files](https://docs.oceanparcels.org/en/latest/examples/documentation_LargeRunsOutput.html). This greatly improves the read performance of the trajectory files. With the output of each of the 5 runs in hand, we loop over each file separately and do the relevant post-processing.

#### Post-Processing

The post-processing is split into ordered notebooks, numbered 01 through 05, which contain the implementation from defining the domain to generating the final graphs. In each notebook, we also extract the relevant statistics, figures, and numbers needed in the thesis. Below, we check that the number of particles released matches the nurdle statistics table generated in **05_plot_graphs.py**.

---

##### Number of Trajectories

To determine the number of trajectories, we extracted the exact number of particles from the simulation files in two ways. The first is by counting the non-empty trajectories (those that are not entirely NaN) directly from the output of the simulation. The second is by manually checking the SLURM output files and recording the number of particles released, which is printed at the beginning of each simulation run's output.

```python
import pandas as pd
import numpy as np
from glob import glob
from os import path
import xarray as xr

inputDir_Sim = 'data/output_v2/rechunked/'
inputFiles_Sim = sorted(glob(inputDir_Sim + '*.zarr'))

trajectory_counts = []

# Trajectories are stored in 5 separate files
for filename in inputFiles_Sim:
    # Drop trajectories that contain only NaN values before counting
    df = xr.open_zarr(filename).dropna(dim="trajectory", how="all")
    trajectory_counts.append(df.sizes['trajectory'])

# First count: trajectories found in the simulation output
first_count = np.sum(trajectory_counts)
# Second count: particle numbers recorded from the 5 SLURM output files
second_count = 23475 + 116025 + 103173 + 44915 + 61589
assert first_count == second_count
print('Total Nurdle Count:', first_count)
```

```
Total Nurdle Count: 349177
```


##### Nurdle Statistics Table

```python
outputDir = 'data/posterior_computation_data/'
outputgraphsDir = outputDir + 'graphs/'

pd.read_csv(outputgraphsDir + 'nurdle_stats.csv')
```

| Unnamed: 0 | region | survey_count | survey_perc | total_nurdle_count | total_nurdle_perc | mean_nurdle_count |
|---|---|---|---|---|---|---|
| 0 | Alabama | 44.0 | 2.093 | 317.0 | 0.091 | 7.205 |
| 1 | Florida | 66.0 | 3.14 | 2051.0 | 0.587 | 31.076 |
| 2 | Louisiana | 104.0 | 4.948 | 115107.0 | 32.965 | 1106.798 |
| 3 | Mississippi | 177.0 | 8.421 | 30725.0 | 8.799 | 173.588 |
| 4 | Texas | 1711.0 | 81.399 | 200977.0 | 57.557 | 117.462 |
| 5 | Total | 2102.0 | 100.0 | 349177.0 | 100.0 | 1436.128 |


They match! Both the trajectory count and the table give a total of 349177 nurdles.
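The same comparison can be made programmatically rather than by eye. The following is a minimal sketch using only the standard library, with the relevant columns of the table above copied into a string so the example is self-contained. The helper `check_totals` is a hypothetical name and is not part of the notebooks.

```python
import csv
import io

# Columns copied from the nurdle_stats.csv output shown above.
NURDLE_STATS_CSV = """region,survey_count,total_nurdle_count
Alabama,44.0,317.0
Florida,66.0,2051.0
Louisiana,104.0,115107.0
Mississippi,177.0,30725.0
Texas,1711.0,200977.0
Total,2102.0,349177.0
"""


def check_totals(csv_text, column):
    """Return (sum over regional rows, value in the 'Total' row) for a column."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    regional = [float(row[column]) for row in rows if row["region"] != "Total"]
    total = next(float(row[column]) for row in rows if row["region"] == "Total")
    return sum(regional), total


regional_sum, reported_total = check_totals(NURDLE_STATS_CSV, "total_nurdle_count")

# The regional rows must add up to the 'Total' row, and both must equal
# the trajectory count obtained from the simulation output.
assert regional_sum == reported_total == 349177
```

Checking the regional sum as well as the `Total` row guards against a region being dropped or counted twice when the table is generated.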