# Add new similarity aspects

This notebook illustrates how a set of new similarity aspects can be added to the Atlas recommender configuration. A similarity aspect corresponds to an instance of `EmbeddingModel` (stored in the model catalog) that produces vectors describing particular data points (e.g. instances of `NeuronMorphology`) in some vector space. The similarity between two vectors represents the similarity of the corresponding data points with respect to that aspect. For example, the neurite-features similarity aspect of morphologies corresponds to a model that produces a neurite feature vector for each neuron morphology. These vectors are then used to compute a similarity score between two morphologies.

Each record in the `configuration` field of the `RecommenderConfiguration` resource typically has the following shape:

```
{
    "embeddingModel": {
        "@id": <MODEL_ID>,
        "@type": "EmbeddingModel"
    },
    "similarityView": {
        "@id": <MASTER_AGG_VIEW>,
        "@type": "AggregateElasticSearchView"
    },
    "statisticsView": {
        "@id": <STATS_VIEW>,
        "@type": "ElasticSearchView"
    },
    "boostingView": {
        "@id": <AGG_VIEW_WITH_BOOSTING_FACTORS>,
        "@type": "AggregateElasticSearchView"
    }
}
```

where

- `MODEL_ID`: ID of the model representing the new similarity aspect;
- `MASTER_AGG_VIEW`: ID of the Master view, an aggregate ElasticSearch (ES) view that combines the individual ES views of the projects where the data points reside. These individual views index the embedding vectors produced by the above-mentioned model;
- `STATS_VIEW`: ID of the ES view holding statistics on the Master view;
- `AGG_VIEW_WITH_BOOSTING_FACTORS`: ID of the aggregate ES view that combines the individual ES views with boosting factors from the projects where the data points reside.
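As a rough illustration, one such record can be assembled in Python as a plain dict. The helper `make_config_record` below is hypothetical (it is not part of kgforge or Nexus). It writes `id`/`type` rather than `@id`/`@type`, following the keys that kgforge resources expose and that the update cell further down also uses, and it stores `None` for any view that does not exist yet.

```python
from typing import Optional


def make_config_record(
    model_id: str,
    similarity_view: Optional[str] = None,
    statistics_view: Optional[str] = None,
    boosting_view: Optional[str] = None,
) -> dict:
    """Build one recommender configuration record (hypothetical helper).

    A view ID left as None means the corresponding ES view is not created yet.
    """
    def ref(view_id: Optional[str], view_type: str) -> Optional[dict]:
        # A reference is an {id, type} pair, or None while the view is missing
        return {"id": view_id, "type": view_type} if view_id is not None else None

    return {
        "embeddingModel": {"id": model_id, "type": "EmbeddingModel"},
        "similarityView": ref(similarity_view, "AggregateElasticSearchView"),
        "statisticsView": ref(statistics_view, "ElasticSearchView"),
        "boostingView": ref(boosting_view, "AggregateElasticSearchView"),
    }
```

For example, `make_config_record(model_id)` yields a record that points only to the model, which is the kind of record this notebook adds.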

Unfortunately, Nexus does not allow creating an empty aggregate ES view, so the Master view (as well as the Stats view and the aggregate boosting view) can only be created together with the first ES view that indexes a batch of embedding vectors. Therefore, in this notebook we create new records that point only to the embedding model and leave the other components empty (`None`).
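Because such placeholder records stay incomplete until the views exist, it can help to list which components a record still lacks. The sketch below is a hypothetical helper, assuming records are plain dicts (e.g. obtained with `forge.as_json`); a missing key is treated the same as an explicit `None`.

```python
from typing import List

# The three view references a complete record is expected to hold
VIEW_KEYS = ("similarityView", "statisticsView", "boostingView")


def pending_components(record: dict) -> List[str]:
    """Return the view keys that are still unset in a configuration record."""
    return [key for key in VIEW_KEYS if record.get(key) is None]


# A freshly added record points only to its embedding model
placeholder = {
    "embeddingModel": {"id": "https://example.org/models/m1", "type": "EmbeddingModel"},
    "similarityView": None,
    "statisticsView": None,
    "boostingView": None,
}
print(pending_components(placeholder))
# ['similarityView', 'statisticsView', 'boostingView']
```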


Related JIRA tickets: 
* https://bbpteam.epfl.ch/project/issues/browse/DKE-718
* https://bbpteam.epfl.ch/project/issues/browse/DKE-715

Prerequisites:

- Models have been built and pushed to the model catalog.
- Model metadata has a `vectorDimension` field specifying the dimension of the embedding vectors.
- The `RecommenderConfiguration` resource exists in the Atlas configuration project.
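The `vectorDimension` prerequisite can be checked before a model is added. This is a minimal sketch, assuming the model metadata is available as a plain dict (for instance via `forge.as_json` on the retrieved model resource) and that the dimension is stored as an integer; `check_vector_dimension` is a hypothetical helper name.

```python
def check_vector_dimension(model_metadata: dict) -> int:
    """Return the embedding dimension, or raise if it is missing or invalid.

    Assumes `vectorDimension` is stored as a positive integer.
    """
    dim = model_metadata.get("vectorDimension")
    # bool is a subclass of int in Python, so exclude it explicitly
    if isinstance(dim, bool) or not isinstance(dim, int) or dim <= 0:
        raise ValueError(
            f"Model {model_metadata.get('id', '<unknown>')} has no valid "
            f"'vectorDimension' (got {dim!r})"
        )
    return dim
```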

Steps:

1. Fetch the `RecommenderConfiguration` resource from the Atlas configuration project.
2. Add a new record to its `configuration` field for each specified model that is not configured yet.
3. Push the updated resource back to Nexus.
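Step 2 can be expressed as a pure function over plain dicts, which makes the merge logic testable without a Nexus connection. This is a sketch, not the notebook's code; `merge_model_records` is a hypothetical name, and unlike a plain loop it also records each newly added ID so that a model listed twice is only added once.

```python
from typing import Iterable, List, Tuple


def merge_model_records(
    records: List[dict], model_ids: Iterable[str]
) -> Tuple[List[dict], List[str]]:
    """Append a placeholder record for every model ID that is not configured yet.

    Returns a new list of records (the input list is not mutated) and the
    list of skipped IDs.
    """
    merged = list(records)
    configured = {r["embeddingModel"]["id"] for r in merged}
    skipped = []
    for model_id in model_ids:
        if model_id in configured:
            skipped.append(model_id)
            continue
        merged.append({
            "embeddingModel": {"id": model_id, "type": "EmbeddingModel"},
            "similarityView": None,
            "statisticsView": None,
            "boostingView": None,
        })
        # Track the new ID so duplicates in model_ids are added only once
        configured.add(model_id)
    return merged, skipped
```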

In [1]:
import getpass
import warnings

from kgforge.core import KnowledgeGraphForge

## User input

In [2]:
ENDPOINT = "https://bbp.epfl.ch/nexus/v1"
DOWNLOAD_DIR = "../../data"
TOKEN = getpass.getpass()

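When the notebook is run non-interactively, the password prompt can be avoided by reading the token from the environment first. A minimal sketch; the variable name `NEXUS_TOKEN` and the helper `read_token` are hypothetical conventions, not Nexus settings.

```python
import getpass
import os


def read_token(env_var: str = "NEXUS_TOKEN") -> str:
    """Return the Nexus token from an environment variable, else prompt for it.

    NEXUS_TOKEN is a hypothetical variable name chosen for this sketch.
    """
    token = os.environ.get(env_var)
    if token:
        return token
    # Fall back to the interactive prompt used in the cell above
    return getpass.getpass("Nexus token: ")
```

With this helper, `TOKEN = read_token()` would replace the direct `getpass.getpass()` call.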


IDs of the embedding models in the model catalog.
__IF YOU ARE ADDING A NEW MODEL, ADD IT TO THE LIST BELOW:__

In [10]:
MODEL_IDS = [
    "https://bbp.epfl.ch/nexus/v1/resources/dke/embedding-pipelines/_/84519407-ad30-4d31-877e-1d6560325393", # Axon projection
    "https://bbp.epfl.ch/nexus/v1/resources/dke/embedding-pipelines/_/9fe6873b-ef6a-41b5-854a-382bc1be9fff", # Dendrite projection
    "https://bbp.epfl.ch/nexus/v1/resources/dke/embedding-pipelines/_/d0c21fd5-cb9c-445c-b0a4-94847ba61f5a", # Neurite features
    "https://bbp.epfl.ch/nexus/v1/resources/dke/embedding-pipelines/_/1c4fcd2e-000f-437b-b65b-844ee211105a", # Brain regions (CCfv3)
    "https://bbp.epfl.ch/nexus/v1/resources/dke/embedding-pipelines/_/608fab85-0cc9-4ff9-a4bd-4249589b5889", # Coordinates
    "https://bbp.epfl.ch/nexus/v1/resources/dke/embedding-pipelines/_/43965be4-72f9-4901-9a95-d9ca13da8fb4", # TMD
    "https://bbp.epfl.ch/nexus/v1/resources/dke/embedding-pipelines/_/7a111efa-7467-42d2-9e0c-c1ca7a883216" # TMD (scaled)
]
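Since `MODEL_IDS` is edited by hand, a quick sanity check for accidental duplicates can catch copy-paste mistakes before anything is pushed. A small sketch; `find_duplicate_ids` is a hypothetical helper.

```python
from collections import Counter
from typing import List


def find_duplicate_ids(ids: List[str]) -> List[str]:
    """Return the IDs that appear more than once, in first-seen order."""
    counts = Counter(ids)
    # Counter keeps insertion order, so duplicates come out in first-seen order
    return [model_id for model_id in counts if counts[model_id] > 1]
```

Running `assert not find_duplicate_ids(MODEL_IDS)` right after this cell would stop the notebook early if an ID was listed twice.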

Atlas configuration project

In [11]:
ATLAS_CONFIG_ORG = "bbp"
ATLAS_CONFIG_PROJECT = "atlas"

ID of the recommender configuration in the Atlas configuration project.

In [12]:
ATLAS_RECOMMENDER_CONFIG = "https://bbp.epfl.ch/neurosciencegraph/data/d9938314-4e27-4c45-8afe-44484b02636d"

## Create sessions

Session for updating Atlas configs

In [13]:
forge_atlas = KnowledgeGraphForge(
    "../../configs/new-forge-config.yaml",
    endpoint=ENDPOINT,
    token=TOKEN, 
    bucket=f"{ATLAS_CONFIG_ORG}/{ATLAS_CONFIG_PROJECT}")

## Update Atlas recommender configuration 

In [7]:
config_resource = forge_atlas.retrieve(ATLAS_RECOMMENDER_CONFIG)

In [8]:
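# The configuration may hold a list of records or, if there is only one, a single record
configuration_records = (
    [forge_atlas.as_json(el) for el in config_resource.configuration]
    if isinstance(config_resource.configuration, list)
    else [forge_atlas.as_json(config_resource.configuration)]
)
# IDs of the models that already have a configuration record
configured_models = {
    el["embeddingModel"]["id"] for el in configuration_records
}
for model_id in MODEL_IDS:
    if model_id in configured_models:
        warnings.warn(
            f"A configuration record for model '{model_id}' already exists, skipping",
            UserWarning)
    else:
        # Placeholder record: the views are filled in once the first
        # ES view indexing this model's embedding vectors is created
        configuration_records.append({
            "embeddingModel": {
                "id": model_id,
                "type": "EmbeddingModel"
            },
            "similarityView": None,
            "statisticsView": None,
            "boostingView": None
        })
        # Avoid adding the same model twice if it is listed twice in MODEL_IDS
        configured_models.add(model_id)

# Convert the records back into forge resources before updating
config_resource.configuration = forge_atlas.from_json(
    configuration_records)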

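The list-or-single branch above most likely exists because JSON-LD compaction, which Nexus applies to the resources it returns, collapses a one-element array into a bare value. That normalization can be factored into a small helper; `as_list` is a hypothetical name, and treating `None` as "no records" is an assumption of this sketch (the cell above would instead fail if the field were absent).

```python
from typing import Any, List


def as_list(value: Any) -> List[Any]:
    """Normalize a possibly compacted JSON-LD property value to a list.

    A one-element array may come back as a bare value; None is treated
    here as an absent property (an assumption of this sketch).
    """
    if value is None:
        return []
    if isinstance(value, list):
        return value
    return [value]
```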


In [9]:
forge_atlas.update(config_resource)

<action> _update_one
<succeeded> True
