# Biodiversity time series analyses from European marine ecosystems

![images/biotisan_euromarec_flowchart.png](images/biotisan_euromarec_flowchart.png)

To run the workflow, do the following:

### Prepare the data
You can run the workflow with the sample data or with your own data. If you want to try the workflow with the fully prepared sample data, skip ahead to the next paragraph: [Run the workflow in NaaVRE](#Run-the-workflow-in-NaaVRE).  
If you want to use your own data, do the following in NaaVRE:

Go to the _File Browser_ and open _Cloud storage_ -> _naa-vre-public_ -> _vl-biotisan-euromarec_:

![images/cloud_storage.png](images/cloud_storage.png)

Download the _Template_MBO_Example_raw.xlsx_ file:

![images/download_template.png](images/download_template.png)

Fill in the Excel file with your own data.

In the _File Browser_, go to _Cloud storage_ -> _naa-vre-user-data_ and upload the Excel file you just filled in with your own data:

![images/upload_data.png](images/upload_data.png)

The file should now be visible in your user data directory:

![images/data_in_cloud_storage.png](images/data_in_cloud_storage.png)

<br><br><br>

### Run the workflow in NaaVRE
To run the workflow in NaaVRE, open the workflow file: [../workflows/Biodiversity_time_series_analyser.naavrewf](./../workflows/Biodiversity_time_series_analyser.naavrewf).

![images/open_workflow.png](images/open_workflow.png)

Optionally, if you have a wide screen, you can drag the workflow window next to this tutorial window:

<img src="images/two_window_view.png" alt="images/two_window_view.png" width="1200">

Press "Run":

![images/run_workflow.png](images/run_workflow.png)

A parameter window will show, which allows you to set sampling parameters:

|      |      |
| ---- | ---- |
|  ![images/sampling_parameters.png](images/sampling_parameters.png)    |  `param_07_first_month` and `param_08_last_month` define the period of the year that the samples come from, <br>1 being January and 12 being December (so 1 and 12 will keep the full dataset). <br>This can be useful if you only want to compare the community metrics from a certain season <br>(for example, if the input is 7 and 9, the analysis will cover the summer). <br><br> `param_09_years` is the minimum number of different years required for each sampling site. <br>You may want to analyse or to exclude shorter time series (1 will keep the full dataset). <br><br> `param_NN_upper/lower_limit_max/min depth` define the range of depths included. <br> Max and min refer to specific rows in the input file, related to the sampling methodology. <br>Upper and lower limit refer to the range of values desired for both the maximum and the minimum sampling depth (0 and 10000 will keep the full dataset). <br>It is possible that the dataset does not have depth values (as in the example dataset, because it is for coastal birds). <br>In that case, rows with empty depth values will always be included.   |
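
To make the effect of these sampling parameters more concrete, the following is a minimal Python sketch of the filtering logic as described in the table. It is not the workflow's actual code: the record keys (`site`, `year`, `month`, `min_depth`, `max_depth`), the function names, and the order in which the filters are applied (months and depths first, then the number of years per site) are assumptions made for illustration only.

```python
from collections import defaultdict


def in_depth_range(value, lower, upper):
    """Keep a depth if it lies within [lower, upper]; empty depths (None) are always kept."""
    return value is None or lower <= value <= upper


def filter_samples(samples, first_month=1, last_month=12, min_years=1,
                   max_depth_limits=(0, 10000), min_depth_limits=(0, 10000)):
    """Apply the month, depth and minimum-years filters to a list of sample records."""
    # Step 1: keep samples inside the month window and the two depth ranges.
    kept = [
        s for s in samples
        if first_month <= s["month"] <= last_month
        and in_depth_range(s["max_depth"], *max_depth_limits)
        and in_depth_range(s["min_depth"], *min_depth_limits)
    ]
    # Step 2: count the distinct years that remain for each sampling site.
    years_per_site = defaultdict(set)
    for s in kept:
        years_per_site[s["site"]].add(s["year"])
    # Step 3: drop sites with fewer distinct years than required.
    return [s for s in kept if len(years_per_site[s["site"]]) >= min_years]


# Hypothetical records without depth values, as in the example dataset.
samples = [
    {"site": "A", "year": 2001, "month": 7, "min_depth": None, "max_depth": None},
    {"site": "A", "year": 2002, "month": 8, "min_depth": None, "max_depth": None},
    {"site": "A", "year": 2002, "month": 1, "min_depth": None, "max_depth": None},
    {"site": "B", "year": 2001, "month": 7, "min_depth": None, "max_depth": None},
]
summer_only = filter_samples(samples, first_month=7, last_month=9, min_years=2)
print(len(summer_only))
```

With the default values (1, 12, 1, and 0 to 10000) every record passes, which matches the "will keep the full dataset" remarks in the table.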

Press "Use default parameter values":

![images/default_parameter_values.png](images/default_parameter_values.png)

If any parameters remain empty, you can use the values shown in the image below.  

If you are using your own data, note that the values of `param_02_input_data_sheet` and `param_03_input_metadata_sheet` are case-sensitive.
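
If you want to check the sheet names before running the workflow, the sketch below lists the sheets of an `.xlsx` file using only the Python standard library and reports whether a requested name matches exactly or differs only in case. The function names `sheet_names` and `check_sheet` are hypothetical helpers for this illustration and are not part of the workflow.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespace used by the workbook part of an .xlsx file.
NS = {"main": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def sheet_names(xlsx_file):
    """Return the sheet names of an .xlsx file (a path or a file-like object)."""
    with zipfile.ZipFile(xlsx_file) as archive:
        root = ET.fromstring(archive.read("xl/workbook.xml"))
    return [sheet.get("name") for sheet in root.findall("main:sheets/main:sheet", NS)]


def check_sheet(requested, available):
    """Describe whether `requested` matches one of the `available` names exactly."""
    if requested in available:
        return "ok"
    lookalikes = [name for name in available if name.lower() == requested.lower()]
    if lookalikes:
        return f"case mismatch: did you mean '{lookalikes[0]}'?"
    return "not found"
```

For example, `check_sheet("data", sheet_names("my_file.xlsx"))` would point out a case mismatch if the sheet is actually called `Data` (here `my_file.xlsx` stands for your own file).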

You can change the parameters if you want to deviate from the default setting.

Once you have set the parameters, press "Run":

![images/press_run.png](images/press_run.png)

Check the notifications at the bottom right to confirm whether the workflow is running:

![images/running_workflow.png](images/running_workflow.png)

Wait for the workflow to complete.

In case the workflow succeeds, you will see a green checkmark and can proceed to the next paragraph ["Inspect the outcome"](#Inspect-the-outcome):

![images/successful_run.png](images/successful_run.png)

In case the workflow fails, you will see a red cross and can explore why by following the steps in the paragraph ["Inspect workflow errors"](#Inspect-workflow-errors):

![images/failed_run.png](images/failed_run.png)

<br><br><br>
 
### Inspect the outcome
In case the workflow run succeeded, go to the File Browser (Folder icon in the vertical menu on the left) and click on the Folder icon next to it to go to your home folder:

![images/home_folder.png](images/home_folder.png)

Navigate to _Cloud Storage -> naa-vre-user-data_. Within 60 seconds you should now see an output file from your workflow: *"[Timestamp]__[Data_filename]__final_results_all.csv"*

![images/view_results.png](images/view_results.png)

Additionally, if you have left the parameter *"param_make_plot"* set to *"true"*, you will see three plots in *.png* format in the directory, similar to this:

![images/frequency_distribution.png](images/frequency_distribution.png)

The plots automatically adjust the limits of the x-axis, so it is good practice to check the axis range when you inspect them.

If you set the parameter *"param_output_samples_ecological_parameters"* to *"true"*, you will also see an output file *"[Timestamp]__[Data_filename]__samples_ecological_parameters.csv"*.
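
If you prefer to inspect the output files from a notebook rather than the File Browser, the sketch below picks the most recent output file and previews its first rows. It assumes that the timestamp prefix sorts chronologically as text and that the file is comma-separated; the directory you pass in and the function names `latest_output` and `preview` are hypothetical.

```python
import csv
from pathlib import Path


def latest_output(directory, suffix="__final_results_all.csv"):
    """Return the newest output file in `directory`, or None if there is none.

    Assumes the timestamp prefix of the file name sorts chronologically as text.
    """
    candidates = sorted(Path(directory).glob(f"*{suffix}"))
    return candidates[-1] if candidates else None


def preview(csv_path, n_rows=5):
    """Return the column names and the first `n_rows` rows of a CSV file."""
    with open(csv_path, newline="") as handle:
        reader = csv.DictReader(handle)
        rows = [row for _, row in zip(range(n_rows), reader)]
        return reader.fieldnames, rows
```

Passing `suffix="__samples_ecological_parameters.csv"` selects the optional per-sample output file instead.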

> **_What if:_**  you don't see an output file? Please get in touch with the NaaVRE support team: see naavre.net or send an e-mail to _VLIC at lifewatch.eu_.

<br><br><br>

### Inspect workflow errors
In case the workflow run fails, go to the File Browser (Folder icon in the vertical menu on the left) and click on the Folder icon next to it to go to your home folder:

![images/home_folder.png](images/home_folder.png)

Navigate to _Cloud Storage -> naa-vre-user-data_. In case the workflow failed on validations, you will see a file *"[timestamp]_validation_log.txt"*:

![images/validation_log.png](images/validation_log.png)

In case this file appears, open it to check which validation errors occurred.

In case no validation errors occurred but the workflow still failed, press "_Show in workflow engine_" to explore the errors:

![images/failed_run.png](images/failed_run.png)

The first time you might encounter an error "_Failed to load version/info Error_", which you can ignore. If you see a login prompt, use the leftmost login button:

<img src="images/login_to_argo.png" alt="images/login_to_argo.png" width="900">

Argo might then ask you what you are using it for. You can simply close this prompt. You should now see your workflow run:

<img src="images/workflow_in_argo.png" alt="images/workflow_in_argo.png" width="900">

Click on the failed node:

![images/click_failed_node.png](images/click_failed_node.png)

A pop-up should appear on the screen. Click on "LOGS" to inspect the output of the failed workflow component:

![images/click_logs.png](images/click_logs.png)

<br><br><br>

### Adapt the workflow 
You can adapt the workflow in NaaVRE to suit your own research objectives. To do this, copy the content of _Virtual Labs -> Biodiversity Time Series Analyses -> Git public_ to  _Virtual Labs -> Biodiversity Time Series Analyses -> My data_, or fork and clone the [git repository](https://github.com/QCDIS/Biodiversity_time_series_analyses_from_European_marine_ecosystems) to _My data_. 

To adapt the workflow, change the source code available in this virtual lab: [codebase/Data_cleaning_analysis....ipynb](../codebase/Data_cleaning_analysis_Example_1_03.ipynb). After changing the source code, you can recontainerize the Jupyter Notebook cell and update the adapted workflow node in [workflows/biodiversity_time_series_analyses.naavrewf](../workflows/Biodiversity_time_series_analyses.naavrewf). For documentation on how to make these changes, go to https://naavre.net/docs/tutorials/#from-notebook-to-workflow.