# An Open Salmon Knowledge Graph

cumulative = collaborative

Scott A. Akenhead ([ORCID 0000-0003-1218-3118](https://orcid.org/0000-0003-1218-3118)), Fisheries and Oceans Canada (retired)  
March 12, 2024

Goals: A knowledge graph for all aspects of salmon science and management, with a labels schema (faceted classification) by and for humans. Rich nodes richly linked. All content from public sources.

library(tidyverse); library(magrittr)

# 1. Introduction

## 1.1 Goal: data to decisions

The overall goal is to build conduits from data, through analyses, to improved decisions about salmon.

1.  **Predictions.** The effects of global warming (climate chaos) mean that historical patterns and correlations in salmon ecology no longer apply; the required assumption of stationarity, of an ergodic system, has been invalidated. Predictions now require understanding how a salmon will react, via physiology, behaviour, and survival, to the conditions it will encounter in every habitat throughout its life.

    1.  This implies models that track individual salmon’s lives across a time-and-space map of habitats, wherein *observed* environmental variables are translated into *perceived* variables that a salmon will respond to.

    2.  The result is a **life path**, a trajectory through multiple dimensions (time, location, fish length, energetics, ontogeny, etc.) that includes the probability of mortality.

    3.  Mathematically, this is an integro-differential equation in which every *rate* (of swimming, of growth, etc.) is a function of the fish *state* and of the habitat *state* at the time the fish is there.

2.  **SDM, SKM.** Salmon Data Mobilization (**SDM**; Diack et al. [2023](#ref-diack2023)) is required to deliver FAIR[1] datasets (Wilkinson et al. [2016](#ref-wilkinson2016)) for the analyses, insights, and datasets underlying models such as those above. This initiative is extended as Salmon Knowledge Mobilization (**SKM**) to discover and realize collaboration opportunities: **Who is doing what, where, how, and why?**

    1.  The information technology to implement this is a Salmon Knowledge Graph (**SKG**) maintained in [neo4j Aura](https://neo4j.com/cloud/platform/aura-graph-database). The size and complexity of *all things salmon-related* are challenging but approachable via:

        1.  a **labels schema** that is meaningful to, and created/maintained by, end-users. The SKG must reflect how the salmon community thinks about topics (nodes, entities).

        2.  **personalization** that narrows the information presented to what is relevant and important to each individual, so that users are not overwhelmed by irrelevant material. Personalization must not preclude discovery of previously remote but potentially valuable knowledge.

        3.  myriad **user interfaces**, reflecting myriad activities: planning and management, data processing and analysis, field and lab work, research and modelling, communications and decisions, documents and data products, etc. Interface components are shared for building new interfaces and are continuously improved.

3.  **Decisions.** From the preceding, decision-support products are offered to enable better decisions via predictions about **competing scenarios**, presented as interactive graphics and dashboards.

4.  **Engagement.** Recognizing that drive-by science is obsolete, the engagement required to communicate effectively with decision-makers is critical (Chapman [2019](#ref-chapman2019); Archibald, McIver, and Rangeley [2021](#ref-archibald2021)). How to effect such **inclusivity** is a question in hand.
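The life-path idea in item 1 can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the habitat sequence, the temperatures and residence times, and the functions `growth_rate()` and `mortality_rate()` are hypothetical placeholders, not fitted models. The sketch only shows the structure: each rate depends on the fish state and on the state of the habitat the fish currently occupies, and the life path accumulates by stepping through time (simple daily Euler steps).

``` r
# Hypothetical habitats encountered in sequence, with an assumed
# temperature (deg C) and residence time (days) for each.
habitats <- data.frame(
  habitat     = c("river", "estuary", "coastal", "ocean"),
  temperature = c(8, 12, 14, 10),
  days        = c(60, 20, 40, 120),
  stringsAsFactors = FALSE
)

# Assumed rate functions: placeholders, not fitted relationships.
# Growth (cm/day) peaks at 12 deg C and slows as length approaches 80 cm.
growth_rate <- function(length_cm, temperature) {
  0.1 * exp(-((temperature - 12) / 5)^2) * (1 - length_cm / 80)
}
# Instantaneous mortality (per day) declines with fish length.
mortality_rate <- function(length_cm, temperature) {
  0.02 / sqrt(length_cm)
}

# Step one fish through the habitat sequence, one day at a time.
simulate_life_path <- function(habitats, length0 = 3) {
  # Expand the habitat table to one row per day.
  env <- habitats[rep(seq_len(nrow(habitats)), habitats$days), ]
  n <- nrow(env)
  len <- numeric(n)
  survival <- numeric(n)
  current_length <- length0
  current_survival <- 1
  for (i in seq_len(n)) {
    temp_i <- env$temperature[i]
    # Survival is the running product of daily survival probabilities.
    current_survival <- current_survival *
      exp(-mortality_rate(current_length, temp_i))
    current_length <- current_length + growth_rate(current_length, temp_i)
    len[i] <- current_length
    survival[i] <- current_survival
  }
  data.frame(day = seq_len(n), habitat = env$habitat,
             length_cm = len, survival = survival,
             stringsAsFactors = FALSE)
}

life_path <- simulate_life_path(habitats)
tail(life_path, 1)  # state of the fish at the end of the path
```

Each row of `life_path` is one point on the trajectory. A fuller model would replace the fixed habitat sequence with movement decisions and translate observed variables into the perceived variables a salmon responds to.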

## 1.2 How?

Beyond mobilizing data and mobilizing knowledge, it is necessary that the salmon community itself mobilize: embrace and implement a paradigm (a change in culture, perspective, attitudes, etc.) that  
(a) reflects urgency to respond to recent and impending extirpations of salmon populations, and  
(b) embraces collaboration to deliver that response. This is largely a communications problem with two aspects:

1.  reducing barriers to collaboration

2.  clarifying and enabling benefits from sharing.

# 2. Proposed

Here, “knowledge graph” means a labelled property graph database implemented in neo4j.
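As a minimal sketch of what a labelled property graph holds, nodes (each with a label and properties) and typed, directed relationships can be mimicked with two data frames. All node names, labels, and relationship types below are invented placeholders, not the SKG labels schema, and the real graph would be queried in neo4j rather than in R. The example answers a small version of "who is doing what, where?".

``` r
# Nodes: an id, a label (its class in a labels schema), and a property.
# All entries are hypothetical.
nodes <- data.frame(
  id    = c("p1", "p2", "a1", "a2", "l1"),
  label = c("Person", "Person", "Activity", "Activity", "Location"),
  name  = c("Person A", "Person B", "smolt survey", "sonar count", "River X"),
  stringsAsFactors = FALSE
)

# Relationships: directed, typed links between node ids.
relationships <- data.frame(
  from = c("p1", "p2", "a1", "a2"),
  type = c("CONDUCTS", "CONDUCTS", "LOCATED_AT", "LOCATED_AT"),
  to   = c("a1", "a2", "l1", "l1"),
  stringsAsFactors = FALSE
)

# Lookup from node id to node name.
node_name <- setNames(nodes$name, nodes$id)

# Follow Person -CONDUCTS-> Activity -LOCATED_AT-> Location.
# A roughly equivalent Cypher pattern would be:
#   MATCH (p:Person)-[:CONDUCTS]->(a:Activity)-[:LOCATED_AT]->(l:Location)
#   RETURN p.name, a.name, l.name
conducts <- relationships[relationships$type == "CONDUCTS", c("from", "to")]
names(conducts) <- c("person", "activity")
located_at <- relationships[relationships$type == "LOCATED_AT", c("from", "to")]
names(located_at) <- c("activity", "location")
paths <- merge(conducts, located_at, by = "activity")

who_what_where <- data.frame(
  who   = unname(node_name[paths$person]),
  what  = unname(node_name[paths$activity]),
  where = unname(node_name[paths$location]),
  stringsAsFactors = FALSE
)
who_what_where
```

In a graph database the two-step join is a single path pattern, which is the main reason a graph suits questions that chain across people, activities, and places.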

Example: The abundance of returning salmon, $R$, that are the offspring of a preceding abundance of spawners, $S$, has conventionally been treated as a simplistic domed function,

<span id="eq-ricker">$$
R_t = S_{t-4} e^{\alpha-\beta S_{t-4}}
 \qquad(1)$$</span>

where $\alpha$ is the log of the initial productivity of a salmon population (returns per spawner at low density), before the effect, $\beta$, of salmon density reduces survival from spawners to returns. The use of $t-4$ reflects the presumption of a four-year life cycle: spawners in 2001 will be returns (to spawn the next generation) in 2005. Using <a href="#eq-ricker" class="quarto-xref">Equation 1</a> for non-linear fitting is preferable to the linearized version,

<span id="eq-OLR_Ricker">$$
\log\left(\frac{R_t}{S_{t-4}}\right) = \alpha - \beta S_{t-4} + \mathcal{N}(0,\sigma)
 \qquad(2)$$</span>

which is not ordinary linear regression, for several reasons: the error term has no known distribution (it is the log of a ratio of Poisson variables), the predictor variable is part of the predicted variable, and cases with imprecise estimates at low abundances of $S$ have undue influence.
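The two fitting approaches can be compared in a minimal sketch with simulated data. The parameter values, the range of spawner abundances, and the multiplicative log-normal error are assumptions chosen only for illustration; no real salmon data are used, and the year lag is dropped because each simulated pair is already a matched spawner and return value. `nls()` fits the curve of Equation 1 directly, and `lm()` fits the linearized form.

``` r
set.seed(1)

# Assumed "true" parameters for the simulation (illustrative only).
alpha_true <- 1.5     # log of returns per spawner at low density
beta_true  <- 0.002   # density dependence, per spawner
n_years    <- 40

# Simulated spawners and returns with multiplicative log-normal error.
S <- runif(n_years, min = 50, max = 1500)
R <- S * exp(alpha_true - beta_true * S + rnorm(n_years, sd = 0.3))

# Non-linear least squares fit of the Ricker curve.
fit_nls <- nls(R ~ S * exp(alpha - beta * S),
               start = list(alpha = 1, beta = 0.001))

# Ordinary least squares fit of the linearized form.
# The slope estimates -beta, so its sign is flipped below.
fit_lm <- lm(log(R / S) ~ S)

estimates <- rbind(
  nls = coef(fit_nls),
  lm  = c(alpha = unname(coef(fit_lm)[1]),
          beta  = -unname(coef(fit_lm)[2]))
)
estimates

# The dome peaks where dR/dS = 0, that is, at S = 1 / beta.
1 / estimates[, "beta"]
```

Which fit performs better depends on the error structure, which is itself an assumption here: `nls()` treats errors as additive on $R$, while the simulation and the linearized fit treat them as additive on the log scale.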

``` r
la_palma |> 
  ggplot(aes(Longitude, Latitude)) +
  geom_point(aes(color = Magnitude, size = 40-`Depth(km)`)) +
  scale_color_viridis_c(direction = -1) + 
  scale_size(range = c(0.5, 2), guide = "none") +
  theme_bw()
```

<figure id="fig-spatial-plot">
<img src="attachment:index_files/figure-ipynb/notebooks-explore-earthquakes-fig-spatial-plot-output-1.png" />
<figcaption>Figure 1: Locations of earthquakes on La Palma since 2017</figcaption>
</figure>

<a href="#fig-spatial-plot" class="quarto-xref">Figure 1</a> shows the locations of recent earthquakes on La Palma.

# 3. Methods

# 4. Results

# 5. Conclusion

# 6. Acknowledgements

# 7. References

Archibald, D. W., R. McIver, and R. Rangeley. 2021. “Untimely Publications: Delayed Canadian Fisheries Science Advice Limits Transparency of Decision-Making.” *Marine Policy* 132 (October): 104690. <https://doi.org/10.1016/j.marpol.2021.104690>.

Chapman, Kelly. 2019. “First International Year of the Salmon Data Laboratory (ISDL) Workshop.” In *NPAFC Technical Report* 14: 15–20. Vancouver, BC, Canada. <https://npafc.org/wp-content/uploads/technical-reports/Tech-Report-14/Technical-Report-14_Final.pdf>.

Diack, Graeme, Tom Bird, Scott Akenhead, Jennifer M. Bayer, Deirdre Brophy, Colin Bull, Elvira de Eyto, et al. 2023. “Salmon Data Mobilization,” December. <https://osf.io/hk4gu>.

Wilkinson, Mark D., Michel Dumontier, IJsbrand Jan Aalbersberg, Gabrielle Appleton, Myles Axton, Arie Baak, Niklas Blomberg, et al. 2016. “The FAIR Guiding Principles for Scientific Data Management and Stewardship.” *Scientific Data* 3 (1): 160018. <https://doi.org/10.1038/sdata.2016.18>.

[1] FAIR: findable, accessible, interoperable, reusable.