## TODO

- Make sure boosting factors are computed with __normalized__ scores (so we need to pass the new statistics)


## Forge issues to address

- `forge.search({"type": "Embedding"}, limit=200)` throws 'Server disconnected'. Current (very slow!) workaround:

```python
query = f"""
    SELECT ?id
    WHERE {{
        ?id a {DATA_TYPE_FILTER} ;
            <https://bluebrain.github.io/nexus/vocabulary/deprecated> false .
    }}
"""
resources = forge.sparql(query, limit=HARD_RESOURCE_LIMIT)
resources = [forge.retrieve(r.id) for r in resources]
```


- `forge.elastic` prints the query it executes even without debug (with debug, it prints it twice)
- `forge.elastic` requires the `limit` parameter (it is not possible to ask for all the documents). Current workaround: set `HARD_RESOURCE_LIMIT=10000`, a number large enough that all the resources can be fetched.
- `forge.update` after retrieve adds a full context payload
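
One possible alternative to a single very large limit is to page through the results with `from`/`size`. The sketch below only builds the query bodies; `build_paged_queries` and its parameters are hypothetical names (not part of forge), and it assumes the total number of documents is known or can be bounded. Note that Elasticsearch by default rejects requests where `from + size` exceeds `index.max_result_window` (10000), so deeper paging would need `search_after` or the scroll API.

```python
import json


def build_paged_queries(total, page_size=1000):
    """Return a list of ES query bodies (JSON strings) covering `total` documents.

    Hypothetical helper: each body could be passed to `forge.elastic` in turn.
    """
    queries = []
    for offset in range(0, total, page_size):
        body = {
            "from": offset,
            "size": min(page_size, total - offset),
            "query": {"term": {"_deprecated": False}},
        }
        queries.append(json.dumps(body))
    return queries


pages = build_paged_queries(total=2500, page_size=1000)
print(len(pages))  # 3
```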

## Context issues

Add new types:
- `EmbeddingModel`
- `Embedding`
- `SimilarityBoostingFactor`
- `ElasticSearchViewStatistics`
- `RecommenderConfiguration`
- any other properties to add:
    - from `EmbeddingModel`: `similarity`, `vectorDimension`
    - from `Embedding`: `embedding`
    - from `SimilarityBoostingFactor`: `scriptScore`, `vectorParameter`
    - from `ElasticSearchViewStatistics`: `boosted`, `scriptScore`, `vectorParameter` 
    - from `RecommenderConfiguration`: `embeddingModel`, `boostingViewmodel`, `similarityView`, `statisticsView`
    

# Add/update a set of embedding vectors


When adding or updating a set of embedding vectors, we need to perform the following sequence of steps.

I. Create a new ES view for the new/updated vectors as follows:

1. Get dimensions of the embedding vectors
2. Create a Nexus ES View resource with:
- `resourceTypes` being `Embedding`
- mapping that has `"embedding": "dense_vector"` with the right dimensions
- `resourceTag` field corresponds to the model UUID and its revision (e.g. `e2b953b9-6724-4278-a1e5-3472bd63e374?rev=1`)
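
As an illustration of the mapping in step 2, the snippet below fills in `dims` on a `dense_vector` field. It is a minimal sketch: `make_embedding_mapping` is a hypothetical helper, and the mapping is reduced to the fields relevant here.

```python
import copy

# Reduced base mapping, for illustration only
BASE_MAPPING = {
    "properties": {
        "@id": {"type": "keyword"},
        "@type": {"type": "keyword"},
        "embedding": {"type": "dense_vector"},
    }
}


def make_embedding_mapping(dimension):
    """Return a copy of the base mapping with the vector dimension filled in."""
    mapping = copy.deepcopy(BASE_MAPPING)
    mapping["properties"]["embedding"]["dims"] = dimension
    return mapping


mapping = make_embedding_mapping(128)
print(mapping["properties"]["embedding"])
```

The deep copy keeps the base mapping untouched, so the same template can be reused for models with different dimensions.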

 
II. Update an existing similarity aspect in the recommender config

__Pre-requisites:__ the `RecommenderConfiguration` resource exists and the aspect has been added to it (see `Add new similarity aspects.ipynb`)

1. Create a new aggregated view including the new similarity view. This view will be the new master view. Make sure all the vectors have been indexed. 
2. Compute raw statistics (min/max/mean/std) of similarity values from the master view and push them as an `ElasticSearchViewStatistics` resource (created if it doesn't exist, updated if it exists), tagged with the new revision of the master view.
3. Compute boosting factors for all the data points (vectors) indexed by the master view and push them as separate resources into the respective projects (created if they don't exist, updated if they exist). Tag them with the new revision of the master view.
4. In the bucket with embedding data, create a new ES view for boosting factors (tagged with the new master view ID). Make sure that all the boosting factors have finished indexing.
5. Compute statistics (min/max/mean/std) of similarity values from the master view with boosting, then push and tag them with the new revision of the master view.
6. Create a new ES view serving statistics (both raw and boosted) tagged with the new revision of the master view. Make sure that all the stats have finished indexing.
7. Create a new aggregated view for boosting factors targeting all the new boosting ES views.
8. Update the Recommender Configuration to point to the new revision of the master view, the new ES view with the stats and the new aggregated view with the boosting factors.
9. In each of the individual projects, deprecate the old boosting ES view and the old ES view serving embedding vectors (if such a view exists).
10. If necessary, deprecate the old stats view and the old aggregated view for boosting.

Related JIRA tickets: 
* https://bbpteam.epfl.ch/project/issues/browse/DKE-718
* https://bbpteam.epfl.ch/project/issues/browse/DKE-715

# Setup

## Imports

In [1]:
import copy
import getpass
import math
import time
import urllib.parse

from collections import OrderedDict, namedtuple
from urllib.parse import quote_plus

import numpy as np
import nexussdk as nxs
import requests

from kgforge.core import KnowledgeGraphForge

In [2]:
from kgforge.version import __version__
print(__version__)

0.6.3


## Helpers

ES view mappings

In [3]:
SIMILARITY_VIEW_MAPPING = {
    "properties": {
      "@id": {
        "type": "keyword"
      },
      "@type": {
        "type": "keyword"
      },
      "derivation": {
        "properties": {
          "entity": {
            "properties": {
              "@id": {
                "type": "keyword"
              }
            },
            "type": "nested"
          }
        },
        "type": "nested"
      },
      "embedding": {
        "type": "dense_vector"
      },
      "generation": {
        "properties": {
          "activity": {
            "properties": {
              "used": {
                "properties": {
                  "@id": {
                    "type": "keyword"
                  }
                },
                "type": "nested"
              }
            },
            "type": "nested"
          }
        },
        "type": "nested"
      }
    }
}


BOOSTING_VIEW_MAPPING = {
    "properties": {
      "@id": {
        "type": "keyword"
      },
      "@type": {
        "type": "keyword"
      },
      "value": {
        "type": "float"
      },
      "scriptScore": {
        "type": "keyword"
      },
      "vectorParameter": {
        "type": "keyword"
      },
      "derivation": {
        "properties": {
          "entity": {
            "properties": {
              "@id": {
                "type": "keyword"
              }
            },
            "type": "nested"
          }
        },
        "type": "nested"
      },
      "generation": {
        "properties": {
          "activity": {
            "properties": {
              "used": {
                "properties": {
                  "@id": {
                    "type": "keyword"
                  }
                },
                "type": "nested"
              }
            },
            "type": "nested"
          }
        },
        "type": "nested"
      }
    }
}

STATS_VIEW_MAPPING = {
    "properties": {
        "@id": {
            "type": "keyword"
        },
        "@type": {
            "type": "keyword"
        }, 
        "boosted": {
            "type": "boolean"
        },
        "scriptScore": {
            "type": "keyword"
        },
        "vectorParameter": {
            "type": "keyword"
        },
        "derivation": {
            "properties": {
              "entity": {
                "properties": {
                  "@id": {
                    "type": "keyword"
                  }
                },
                "type": "nested"
              }
            },
            "type": "nested"
        },
        "series": {
            "properties": {
                "statistic": {
                    "type": "keyword"
                },
                "value": {
                    "type": "float"
                }
            },
            "type": "nested"
        }
    } 
}

In [4]:
BucketConfiguration = namedtuple(
    'BucketConfiguration', 'endpoint org proj')


def create_forge_session(bucket_config):
    return KnowledgeGraphForge(
        "../../configs/new-forge-config.yaml",
        token=TOKEN, 
        endpoint=bucket_config.endpoint,        
        bucket=f"{bucket_config.org}/{bucket_config.proj}")


def get_es_view_mappings(dimension):
    mapping = copy.deepcopy(SIMILARITY_VIEW_MAPPING)
    mapping["properties"]["embedding"]["dims"] = dimension
    return mapping


def get_current_config(config_resource, model_id):
    """Get the configuration record corresponding to the input model."""
    current_config = None
    if isinstance(config_resource.configuration, list):
        for el in config_resource.configuration:
            if el.embeddingModel.id == model_id:
                current_config = el
    else:
        if config_resource.configuration.embeddingModel.id == model_id:
            current_config = config_resource.configuration

    return current_config


def update_current_config(forge, config_resource, current_config):
    """Update the configuration record."""
    if isinstance(config_resource.configuration, list):
        new_configs = []
        for el in config_resource.configuration:
            if el is not current_config:
                new_configs.append(el)
        new_configs.append(current_config)
        config_resource.configuration = new_configs
    else:
        config_resource.configuration = [
            current_config
        ]
    # Drop the context payload (if present) before updating the resource
    try:
        del config_resource.context
    except AttributeError:
        pass
    forge.update(config_resource)
    
def set_elastic_view(forge, view):
    views_endpoint = "/".join((
        ENDPOINT,
        "views",
        quote_plus(forge._store.bucket.split("/")[0]),
        quote_plus(forge._store.bucket.split("/")[1])))
    forge._store.service.elastic_endpoint["endpoint"] = "/".join(
        (views_endpoint, quote_plus(view), "_search"))

    
def get_all_vectors(forge, resource_limit):
    all_embeddings = forge.elastic(f"""{{
        "from" : 0,
        "size" : {resource_limit},
        "query": {{
            "term": {{"_deprecated": false}}
        }}
    }}
    """)
    vectors = {
        result._source["@id"]: result._source["embedding"]
        for result in all_embeddings
    }
    return vectors


def get_all_scores(forge, vectors, formula, param_name, resource_limit=200, boosting=None):
    score_values = set()
    for k, vector in vectors.items():
        query = f"""{{
          "size": {len(vectors)},
          "query": {{
            "script_score": {{
                "query": {{
                    "bool" : {{
                      "must_not" : {{
                        "term" : {{ "@id": "{k}" }}
                      }},
                      "must": {{ "exists": {{ "field": "embedding" }} }}
                    }}
                }},
                "script": {{
                    "source": "{formula}",
                    "params": {{
                      "{param_name}": {vector}
                    }}
                }}
            }}
          }}
        }}"""
        res = forge.elastic(query)
        for el in res:
            boost_factor = 1
            if boosting:
                boost_factor = 1 + boosting[el._source["@id"]]
            score_values.add(el._score * boost_factor)
    score_values = np.array(list(score_values))
    return score_values


def get_view_stats(forge, vectors, formula, param_name, resource_limit=200, boosting=None):
    Statistics = namedtuple('Statistics', 'min max mean std')
    score_values = get_all_scores(
        forge, vectors, formula, param_name, resource_limit, boosting)

    return score_values, Statistics(
        score_values.min(),
        score_values.max(),
        score_values.mean(),
        score_values.std())


def register_stats(forge, view_id, sample_size, stats, formula,
                   param_name, tag, boosted=False):
    
    stat_values = [
        {
          "statistic": "min",
          "unitCode": "dimensionless",
          "value": stats.min
        },
        {
          "statistic": "max",
          "unitCode": "dimensionless",
          "value": stats.max
        },
        {
          "statistic": "mean",
          "unitCode": "dimensionless",
          "value": stats.mean
        },
        {
          "statistic": "standard deviation",
          "unitCode": "dimensionless",
          "value": stats.std
        },
        {
          "statistic": "N",
          "unitCode": "dimensionless",
          "value": sample_size
        }
    ]

    # Look for an existing statistics resource derived from this view
    existing_stats = forge.search({
        "type": "ElasticSearchViewStatistics",
        "boosted": boosted,
        "derivation": {
            "entity": {
                "id": view_id
            }
        }
    })
    
    if len(existing_stats) > 0:
        stats_resource = existing_stats[0]
        stats_resource.series = forge.from_json(stat_values)
        forge.update(stats_resource)
    else:    
        json_data = {
            "type": "ElasticSearchViewStatistics",
            "boosted": boosted,
            "scriptScore": formula,
            "vectorParameter": param_name,
            "series": stat_values,
            "derivation": {
                "type": "Derivation",
                "entity": {
                    "id": view_id
                }
            }
        }
        stats_resource = forge.from_json(json_data)
        forge.register(stats_resource)
    forge.tag(stats_resource, tag)
    return stats_resource

def get_score_deviation(forge, point_id, vector, score_min, score_max, k, formula, param_name):
    query = f"""{{
      "size": {k},
      "query": {{
        "script_score": {{
          "query": {{
                "exists": {{
                    "field": "embedding"
                }}
          }},
          "script": {{
            "source": "{formula}",
            "params": {{
              "{param_name}": {vector}
            }}
          }}
        }}
      }}
    }}"""

    result = forge.elastic(query)
    scores = set()
    for el in result:
        if point_id != el._source["@id"]:
            # Min/max normalization of the score
            score = (el._score - score_min) / (score_max - score_min)
            scores.add(score)
    scores = np.array(list(scores))
    return math.sqrt(((1 - scores)**2).mean())


def register_boosting_data(forge, view_id, aggregated_view_id,
                           deviation, formula, param_name, tag):
    generation_resource = forge.from_json({
        "type": "Generation",
        "activity": {
            "type": "Activity",
            "used": {
                "id": aggregated_view_id,
                "type": "AggregateElasticSearchView"
            }
        }
    })
    
    for k, v in deviation.items():
        existing_data = forge.search({
            "type": "SimilarityBoostingFactor",
            "derivation": {
                "entity": {
                    "id": k
                }
            }
        })
        if len(existing_data) > 0:
            boosting_resource = existing_data[0]
            boosting_resource.value = 1 + v
            boosting_resource.generation = generation_resource
            forge.update(boosting_resource)
        else:       
            json_data = {
                "type": "SimilarityBoostingFactor",
                "value": 1 + v,
                "unitCode": "dimensionless",
                "scriptScore": formula,
                "vectorParameter": param_name,
                "derivation": {
                    "type": "Derivation",
                    "entity": {
                        "id": k,
                        "type": "Embedding"
                    }
                },
            }
            boosting_resource = forge.from_json(json_data)
            boosting_resource.generation = generation_resource
            forge.register(boosting_resource)
        forge.tag(boosting_resource, tag)
        

def deprecate_individual_views(agg_view):
    views = agg_view["views"]
    if not isinstance(views, list):
        views = [agg_view["views"]]
    
    for el in views:
        org = el["project"].split("/")[0]
        proj = el["project"].split("/")[1]
        view = el["viewId"]
        es_view = nxs.views.fetch(org, proj, view)
        try:
            nxs.views.deprecate_es(es_view)
        except Exception as e:
            print(f"Deprecation failed with '{e}'")

            
def check_view_readiness(bucket_config, view_id, token):
    """Return True if the view has processed the latest event of its project."""
    view_id = urllib.parse.quote_plus(view_id)
    url = f"{bucket_config.endpoint}/views/{bucket_config.org}/{bucket_config.proj}/{view_id}/statistics"
    r = requests.get(
        url,
        headers={"Authorization": f"Bearer {token}"}
    )
    response = r.json()
    last_event = response["lastEventDateTime"]
    # The field may be absent from the response
    last_processed_event = response.get("lastProcessedEventDateTime")
    return last_event == last_processed_event


def add_views_with_replacement(existing_views, new_views):
    new_views = {el["_project"]: el for el in new_views}
    existing_views = {el["_project"]: el for el in existing_views}
    existing_views.update(new_views)
    return list(existing_views.values())

## User input

In [6]:
ENDPOINT = "https://bbp.epfl.ch/nexus/v1"
DOWNLOAD_DIR = "./data"
TOKEN = getpass.getpass()

········


TODO: We need to fix forge so that ES queries can be run without specifying a limit. For now, we use a very large number.

In [7]:
HARD_RESOURCE_LIMIT = 10000

Bucket where embedding models live

In [8]:
MODEL_CATALOG_ORG = "dke"
MODEL_CATALOG_PROJECT = "embedding-pipelines"

ID of the embedding model to use

In [9]:
# https://bbp.epfl.ch/nexus/v1/resources/dke/embedding-pipelines/_/d0c21fd5-cb9c-445c-b0a4-94847ba61f5a  # neurite features
# https://bbp.epfl.ch/nexus/v1/resources/dke/embedding-pipelines/_/9fe6873b-ef6a-41b5-854a-382bc1be9fff  # dendrite
# https://bbp.epfl.ch/nexus/v1/resources/dke/embedding-pipelines/_/84519407-ad30-4d31-877e-1d6560325393  # axon
# https://bbp.epfl.ch/nexus/v1/resources/dke/embedding-pipelines/_/1c4fcd2e-000f-437b-b65b-844ee211105a  # brain regions
# https://bbp.epfl.ch/nexus/v1/resources/dke/embedding-pipelines/_/608fab85-0cc9-4ff9-a4bd-4249589b5889  # coordinates
# https://bbp.epfl.ch/nexus/v1/resources/dke/embedding-pipelines/_/43965be4-72f9-4901-9a95-d9ca13da8fb4  # TMD
# https://bbp.epfl.ch/nexus/v1/resources/dke/embedding-pipelines/_/7a111efa-7467-42d2-9e0c-c1ca7a883216  # TMD (scaled)

__PROVIDE HERE THE ID OF YOUR MODEL (AND, OPTIONALLY, ITS REVISION)__

In [59]:
MODEL_ID = "https://bbp.epfl.ch/nexus/v1/resources/dke/embedding-pipelines/_/7a111efa-7467-42d2-9e0c-c1ca7a883216"
MODEL_REVISION = None  # Specify a revision, if necessary. If None, the latest revision is used

Atlas configuration project

In [60]:
ATLAS_CONFIG_ORG = "bbp"
ATLAS_CONFIG_PROJECT = "atlas"

ID of the recommender configuration in the atlas config project.

In [61]:
ATLAS_RECOMMENDER_CONFIG = "https://bbp.epfl.ch/neurosciencegraph/data/d9938314-4e27-4c45-8afe-44484b02636d"

Bucket where embedding vectors live

In [62]:
EMBEDDING_BUCKETS = [
     BucketConfiguration(
        "https://bbp.epfl.ch/nexus/v1",
         "dke", "seu-embeddings")
]

Later, we will assume that data and embeddings live in the same bucket

In [63]:
NEIGHBORHOOD_SIZE = 20  # Number of nearest neighbors to consider for local boosting

---

## Forge sessions

### Session for embedding models

In [64]:
forge_models = KnowledgeGraphForge(
    "../../configs/new-forge-config.yaml",
    endpoint=ENDPOINT,
    token=TOKEN, 
    bucket=f"{MODEL_CATALOG_ORG}/{MODEL_CATALOG_PROJECT}")

### Session for embedding resources

In [65]:
FORGE_SESSIONS = {
    el: create_forge_session(el) for el in EMBEDDING_BUCKETS
}

### Session for updating Atlas configs

In [66]:
forge_atlas = KnowledgeGraphForge(
    "../../configs/new-forge-config.yaml",
    endpoint=ENDPOINT,
    token=TOKEN, 
    bucket=f"{ATLAS_CONFIG_ORG}/{ATLAS_CONFIG_PROJECT}")

### Nexussdk session

In [67]:
nxs.config.set_environment(ENDPOINT)
nxs.config.set_token(TOKEN)

---

# I. Create ElasticSearchView


TODO: Adapt the resource_types property to the proper `Embedding` type once it is added to the context

In [68]:
model_resource = forge_models.retrieve(
    f"{MODEL_ID}{'?rev=' + str(MODEL_REVISION) if MODEL_REVISION is not None else ''}")

# If revision is not provided by the user, fetch the latest
if MODEL_REVISION is None:
    MODEL_REVISION = model_resource._store_metadata._rev 

MODEL_TAG = f"{MODEL_ID.split('/')[-1]}?rev={MODEL_REVISION}"

In [69]:
MODEL_TAG

'7a111efa-7467-42d2-9e0c-c1ca7a883216?rev=4'
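
The tag format can be checked in isolation. The function below is a sketch (the name `make_model_tag` is hypothetical) that reproduces the string construction used above: the last path segment of the model ID followed by `?rev=` and the revision. Stripping a trailing slash is an addition not present in the notebook.

```python
def make_model_tag(model_id, revision):
    """Build '<uuid>?rev=<revision>' from a model ID URL and a revision number."""
    uuid = model_id.rstrip("/").split("/")[-1]
    return f"{uuid}?rev={revision}"


tag = make_model_tag(
    "https://bbp.epfl.ch/nexus/v1/resources/dke/embedding-pipelines/_/"
    "7a111efa-7467-42d2-9e0c-c1ca7a883216",
    4,
)
print(tag)  # 7a111efa-7467-42d2-9e0c-c1ca7a883216?rev=4
```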

In [70]:
dimension = model_resource.vectorDimension

In [71]:
SIMILARITY_VIEWS = {}
for bucket_config in EMBEDDING_BUCKETS:
    view = nxs.views.create_es(
        bucket_config.org, bucket_config.proj,
        mapping=get_es_view_mappings(dimension),
        tag=MODEL_TAG,
        resource_types=[
            "https://neuroshapes.org/Embedding"],
        source_as_text=False,
        include_metadata=True,
        include_deprecated=False)
    SIMILARITY_VIEWS[bucket_config] = nxs.views.fetch(
        bucket_config.org, bucket_config.proj,
        view_id=view["@id"])

In [72]:
SIMILARITY_VIEW_IDS = {
    k: v["@id"] for k, v in SIMILARITY_VIEWS.items()
}

__IMPORTANT__: Before we execute the next step, we need to make sure that indexing is over. The following cell polls the new views every 30 seconds and stops once all of them have finished indexing.

In [73]:
start = time.time()
while True:
    all_ready = []
    for k, v in SIMILARITY_VIEW_IDS.items():
        ready = check_view_readiness(k, v, TOKEN) 
        all_ready.append(ready)
    if all(all_ready):
        print(f"Indexing has finished after: {time.time() - start}s")
        break
    time.sleep(30) 

Indexing has finished after: 271.10370111465454s
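
The polling loop above waits indefinitely. A variant with a timeout could look like the following sketch; `wait_until` and its parameters are hypothetical, and the readiness check is passed in as a callable so that the helper does not depend on Nexus.

```python
import time


def wait_until(is_ready, interval=30.0, timeout=3600.0):
    """Call `is_ready()` every `interval` seconds until it returns True.

    Returns the elapsed time in seconds, or raises TimeoutError after `timeout`.
    """
    start = time.monotonic()
    while True:
        if is_ready():
            return time.monotonic() - start
        if time.monotonic() - start >= timeout:
            raise TimeoutError(f"Not ready after {timeout}s")
        time.sleep(interval)


# Toy readiness check that succeeds on the third call
calls = {"n": 0}


def fake_check():
    calls["n"] += 1
    return calls["n"] >= 3


elapsed = wait_until(fake_check, interval=0.01, timeout=1.0)
print(f"Ready after {calls['n']} checks")
```

In this notebook, `is_ready` could wrap the existing helper, for example `lambda: all(check_view_readiness(k, v, TOKEN) for k, v in SIMILARITY_VIEW_IDS.items())`.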


---

# II. Update recommender configurations


## Update the master view and the recommender configs.

Retrieve the recommender configuration resource and the record corresponding to the `MODEL_ID`.

In [74]:
config_resource = forge_atlas.retrieve(ATLAS_RECOMMENDER_CONFIG)
current_config = get_current_config(config_resource, MODEL_ID)

In [75]:
if current_config is None:
    raise ValueError(
        f"Model with ID '{MODEL_ID}' does not exist in Atlas Recommender Configuration, "
        "please, make sure you added the model to the config")

Check if the current record contains a link to the master view. If such a link exists, retrieve all the existing individual views linked to this master view.

In [76]:
existing_individual_views = []
old_number_of_vectors = 0
old_master_view = None

if "similarityView" in forge_atlas.as_json(current_config):
    # Fetch the existing master_view
    old_master_view = nxs.views.fetch(
        ATLAS_CONFIG_ORG,
        ATLAS_CONFIG_PROJECT,
        current_config.similarityView.id)
    if isinstance(old_master_view["views"], OrderedDict):
        existing_individual_views.append(old_master_view["views"])
    else:
        existing_individual_views = old_master_view["views"]
    
    # Get the number of vectors indexed by the existing aggregated view
    set_elastic_view(forge_atlas, old_master_view["@id"])
    old_number_of_vectors = len(
        get_all_vectors(forge_atlas, HARD_RESOURCE_LIMIT))

In [77]:
existing_individual_views = [
    nxs.views.fetch(
        view["project"].split("/")[0],
        view["project"].split("/")[1],
        view["viewId"])
    for view in existing_individual_views
]

Create a new aggregated view to serve as the master view, including all the existing individual views and the new similarity view added by this notebook.

In [78]:
# Create a new aggregated view with the existing and the new similarity views
MASTER_VIEW = nxs.views.create_es_aggregated(
    ATLAS_CONFIG_ORG,
    ATLAS_CONFIG_PROJECT,
    add_views_with_replacement(existing_individual_views, list(SIMILARITY_VIEWS.values())),
    view_id=None
)
MASTER_VIEW_ID = MASTER_VIEW["@id"]
# re-fetch the view
MASTER_VIEW = nxs.views.fetch(
    ATLAS_CONFIG_ORG,
    ATLAS_CONFIG_PROJECT,
    MASTER_VIEW_ID)
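
`add_views_with_replacement` keeps at most one view per project: a new view replaces an existing view from the same project. The toy run below restates the helper in condensed form and applies it to invented project and view identifiers.

```python
def add_views_with_replacement(existing_views, new_views):
    """Merge two lists of views, keyed by project; new views win on conflict."""
    merged = {el["_project"]: el for el in existing_views}
    merged.update({el["_project"]: el for el in new_views})
    return list(merged.values())


# Identifiers below are invented for illustration
existing = [
    {"_project": "org/proj-a", "@id": "view-a-old"},
    {"_project": "org/proj-b", "@id": "view-b"},
]
new = [{"_project": "org/proj-a", "@id": "view-a-new"}]

merged = add_views_with_replacement(existing, new)
print([v["@id"] for v in merged])  # ['view-a-new', 'view-b']
```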

Create a new tag that corresponds to the UUID of the new master view.

In [79]:
MASTER_VIEW_ID

'https://bbp.epfl.ch/neurosciencegraph/data/14d2fddc-a408-4f76-be4b-a4893b8f93df'

In [80]:
MASTER_TAG = MASTER_VIEW_ID.split('/')[-1]
MASTER_TAG

'14d2fddc-a408-4f76-be4b-a4893b8f93df'

## Compute raw (non-boosted) statistics

Compute raw statistics (min/max/mean/std) of similarity values from the master view and push them as an `ElasticSearchViewStatistics` resource (created if it doesn't exist, updated if it exists), tagged with the new revision of the master view (in bbp/atlas).

In [81]:
FORMULAS = {
    "cosine": "doc['embedding'].size() == 0 ? 0 : (cosineSimilarity(params.query_vector, doc['embedding']) + 1.0) / 2",
    "euclidean": "doc['embedding'].size() == 0 ? 0 : (1 / (1 + l2norm(params.query_vector, doc['embedding'])))",
    "poincare": "float[] v = doc['embedding'].vectorValue; if (doc['embedding'].size() == 0) { return 0; } double am = doc['embedding'].magnitude; double bm = 0; double dist = 0; for (int i = 0; i < v.length; i++) { bm += Math.pow(params.query_vector[i], 2); dist += Math.pow(v[i] - params.query_vector[i], 2); } bm = Math.sqrt(bm); dist = Math.sqrt(dist); double x = 1 + (2 * Math.pow(dist, 2)) / ( (1 - Math.pow(bm, 2)) * (1 - Math.pow(am, 2)) );  double d = Math.log(x + Math.sqrt(Math.pow(x, 2) - 1)); return 1 / (1 + d);"
}
VECTOR_PARAMETER = "query_vector"
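
For sanity checks outside Elasticsearch, the `cosine` and `euclidean` scores can be reproduced with NumPy. This is a sketch under the assumption that `cosineSimilarity` and `l2norm` behave as the standard cosine similarity and Euclidean distance; the function names are hypothetical, and the empty-document guard from the scripts is omitted.

```python
import numpy as np


def cosine_score(query, doc):
    """(cos(query, doc) + 1) / 2, which maps the similarity into [0, 1]."""
    query = np.asarray(query, dtype=float)
    doc = np.asarray(doc, dtype=float)
    cos = query @ doc / (np.linalg.norm(query) * np.linalg.norm(doc))
    return (cos + 1.0) / 2


def euclidean_score(query, doc):
    """1 / (1 + ||query - doc||), equal to 1 for identical vectors."""
    query = np.asarray(query, dtype=float)
    doc = np.asarray(doc, dtype=float)
    return 1 / (1 + np.linalg.norm(query - doc))


print(cosine_score([1, 0], [0, 1]))     # orthogonal vectors
print(euclidean_score([0, 0], [3, 4]))  # vectors at distance 5
```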

In [82]:
set_elastic_view(forge_atlas, MASTER_VIEW_ID)
vectors = get_all_vectors(forge_atlas, HARD_RESOURCE_LIMIT)

In [83]:
MASTER_VIEW_ID

'https://bbp.epfl.ch/neurosciencegraph/data/14d2fddc-a408-4f76-be4b-a4893b8f93df'

In [84]:
formula = FORMULAS[model_resource.similarity]

values, stats = get_view_stats(
    forge_atlas, vectors,
    formula,
    VECTOR_PARAMETER,
    HARD_RESOURCE_LIMIT)
stats_resource = register_stats(
    forge_atlas, MASTER_VIEW_ID,
    values.shape[0], stats, formula, VECTOR_PARAMETER,
    tag=MASTER_TAG)

<action> _register_one
<succeeded> True
<action> _tag_one
<succeeded> True


## Compute boosting factors and create necessary resources

Compute boosting factors for all the data points (vectors) indexed by the new master view and push them as separate resources into respective projects. Tag them by the new UUID of the master view.
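
As computed by `get_score_deviation` and `register_boosting_data` in the Helpers section, the boosting factor of a point is one plus the root-mean-square deviation of its neighbours' min/max-normalized scores from 1. The sketch below replays that arithmetic on made-up scores; `boosting_factor` is a hypothetical name.

```python
import numpy as np


def boosting_factor(neighbor_scores, score_min, score_max):
    """Return 1 + RMS deviation of the normalized neighbour scores from 1."""
    scores = np.asarray(neighbor_scores, dtype=float)
    # Min/max normalization using the global statistics of the master view
    normalized = (scores - score_min) / (score_max - score_min)
    deviation = np.sqrt(((1 - normalized) ** 2).mean())
    return 1 + deviation


# Made-up raw scores of three neighbours, with global min 0.0 and max 1.0
print(boosting_factor([0.5, 0.5, 0.5], 0.0, 1.0))  # 1.5
```

Points whose neighbours score close to the global maximum get a factor near 1, while points with distant neighbours get a larger factor.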

In [85]:
all_views = []
if isinstance(MASTER_VIEW["views"], OrderedDict):
    all_views = [(MASTER_VIEW["views"]["project"], MASTER_VIEW["views"]["viewId"])]
else:
    for el in MASTER_VIEW["views"]:
        all_views.append((el["project"], el["viewId"]))

In [86]:
stats_json = [forge_atlas.as_json(el) for el in stats_resource.series]
stats_json = {el["statistic"]: el["value"] for el in stats_json}

In [87]:
# We update all the individual buckets referenced by the new master view
ALL_DEVIATIONS = {}
for bucket, view_id in all_views:
    print(f"(Re-)computing boosting factors in '{bucket}'...")
    deviations = dict()
    bucket_forge = KnowledgeGraphForge(
        "../../configs/new-forge-config.yaml",
        token=TOKEN, 
        endpoint=ENDPOINT,        
        bucket=bucket)
    
    # Compute local similarity deviations for points
    set_elastic_view(bucket_forge, view_id)
    all_vectors = get_all_vectors(bucket_forge, HARD_RESOURCE_LIMIT)
    for point_id, vector in all_vectors.items():
        deviations[point_id] = get_score_deviation(
            bucket_forge, point_id, vector, stats_json["min"], stats_json["max"],
            NEIGHBORHOOD_SIZE, formula, VECTOR_PARAMETER)
    ALL_DEVIATIONS.update(deviations)
    
    print(f"Registering/updating boosting factors in '{bucket}'...")
    # Register boosting factors into the current buckets
    boosting_resources = register_boosting_data(
        bucket_forge, view_id, MASTER_VIEW_ID, deviations, formula,
        VECTOR_PARAMETER, MASTER_TAG)

(Re-)computing boosting factors in 'dke/seu-embeddings'...
Registering/updating boosting factors in 'dke/seu-embeddings'...
<action> _register_one
<succeeded> True
<action> _tag_one
<succeeded> True

In the individual embedding data buckets create a new ES view for boosting factors (tagged by the new master view).

In [88]:
new_boosting_views = []
for bucket, view_id in all_views:
    print(f"Creating a new ES view on boosting factors in '{bucket}'...")
    org = bucket.split("/")[0]
    proj = bucket.split("/")[1]
    boosting_view = nxs.views.create_es(
        org, proj,
        mapping=BOOSTING_VIEW_MAPPING,
        tag=MASTER_TAG,
        resource_types=[
            "https://neuroshapes.org/SimilarityBoostingFactor"],
        source_as_text=False,
        include_metadata=True, 
        include_deprecated=False)
    new_boosting_views.append(boosting_view)

Creating a new ES view on boosting factors in 'dke/seu-embeddings'...


In [89]:
for el in new_boosting_views:
    print("Project: ", el["_project"])
    print("View: ", el["@id"])
    print()

Project:  https://bbp.epfl.ch/nexus/v1/projects/dke/seu-embeddings
View:  https://bbp.epfl.ch/neurosciencegraph/data/b38f285f-688d-453e-8214-04dc9585b59e



Create a new aggregated view for boosting factors targeting all the new boosting ES views (in `{ATLAS_CONFIG_ORG}/{ATLAS_CONFIG_PROJECT}`).

In [90]:
# Create a new aggregated view over the new boosting views
BOOSTING_VIEW = nxs.views.create_es_aggregated(
    ATLAS_CONFIG_ORG,
    ATLAS_CONFIG_PROJECT,
    new_boosting_views,
    view_id=None
)
BOOSTING_VIEW_ID = BOOSTING_VIEW["@id"]

__IMPORTANT__: Before we execute the next step, we need to make sure that indexing of the boosting factors is over. The following cell polls each of the new boosting views every 30 seconds and stops once all of them have finished indexing. After that, we can proceed with the rest of the notebook.

In [92]:
start = time.time()
while True:
    all_ready = []
    for el in new_boosting_views:
        ready = check_view_readiness(
            BucketConfiguration(
                "/".join(el["_project"].split("/")[:-3]),
                el["_project"].split("/")[-2],
                el["_project"].split("/")[-1]),
            el["@id"],
            TOKEN)
        all_ready.append(ready)
    if all(all_ready):
        print(f"Indexing has finished after: {time.time() - start}s")
        break
    else:
        time.sleep(30)

Indexing has finished after: 0.09629678726196289s
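
The readiness check above recovers the endpoint, organization and project by slicing the `_project` URL. A minimal sketch of that slicing as a standalone helper is given below; `ProjectLocation` and `split_project_url` are hypothetical names introduced here for illustration and are not part of the notebook or of any Nexus library.

```python
from typing import NamedTuple


class ProjectLocation(NamedTuple):
    """Hypothetical container (the notebook uses its own BucketConfiguration)."""
    endpoint: str
    org: str
    project: str


def split_project_url(project_url: str) -> ProjectLocation:
    """Split '<endpoint>/projects/<org>/<project>' into its three parts."""
    parts = project_url.rstrip("/").split("/")
    # Expect at least '<endpoint>', 'projects', '<org>', '<project>'
    if len(parts) < 4 or parts[-3] != "projects":
        raise ValueError(f"Unexpected project URL: {project_url!r}")
    return ProjectLocation("/".join(parts[:-3]), parts[-2], parts[-1])


location = split_project_url(
    "https://bbp.epfl.ch/nexus/v1/projects/dke/seu-embeddings")
print(location)
```

Validating the `projects` segment makes a malformed URL fail early instead of silently producing a wrong organization or project name.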


## Compute boosted statistics

Compute statistics (min/max/mean/std) of similarity values after boosting and push them as an `ElasticSearchViewStatistics` resource (created if it doesn't exist, updated if it does), tagged with the new revision of the master view.

In [93]:
set_elastic_view(forge_atlas, MASTER_VIEW_ID)
values, stats = get_view_stats(
    forge_atlas, vectors,
    formula,
    VECTOR_PARAMETER,
    HARD_RESOURCE_LIMIT,
    ALL_DEVIATIONS)
stats_resource = register_stats(
    forge_atlas, MASTER_VIEW_ID,
    values.shape[0], stats, formula, VECTOR_PARAMETER,
    tag=MASTER_TAG, boosted=True)

<action> _register_one
<succeeded> True
<action> _tag_one
<succeeded> True
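
For orientation, the four statistics named above can be sketched with the standard library alone. This is not the implementation of `get_view_stats`, which is not shown in this notebook; `summarize` is a hypothetical helper, the input scores are made up, and the choice of the population standard deviation is an assumption.

```python
import statistics


def summarize(values):
    """Return min/max/mean/std for a non-empty sequence of numbers."""
    values = list(values)
    if not values:
        raise ValueError("values must not be empty")
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.fmean(values),
        # Assumption: population standard deviation (not the sample one)
        "std": statistics.pstdev(values),
    }


# Made-up similarity scores, for illustration only
example_scores = [0.2, 0.4, 0.6, 0.8]
summary = summarize(example_scores)
print(summary)
```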


## Create a new ES view serving statistics

We serve stats tagged with the new uuid of the master view.

In [94]:
STATISTICS_VIEW = nxs.views.create_es(
    ATLAS_CONFIG_ORG, ATLAS_CONFIG_PROJECT,
    mapping=STATS_VIEW_MAPPING,
    tag=MASTER_TAG,
    resource_types=[
        "https://neuroshapes.org/ElasticSearchViewStatistics"],
    source_as_text=False,
    include_metadata=True,
    include_deprecated=False)
STATISTICS_VIEW_ID = STATISTICS_VIEW["@id"]

## Update recommender configuration and clean up old resources

__IMPORTANT__: Check that the stats have finished indexing (the following cell shouldn't throw an assertion error).

In [95]:
start = time.time()
while True:
    ready = check_view_readiness(
        BucketConfiguration(
            ENDPOINT,
            ATLAS_CONFIG_ORG,
            ATLAS_CONFIG_PROJECT),
        STATISTICS_VIEW_ID,
        TOKEN)
    if ready:
        print(f"Indexing has finished after: {time.time() - start}s")
        break
    # Wait between checks instead of polling the server in a tight loop
    time.sleep(30)

Indexing has finished after: 273.1534061431885s
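
Both indexing checks in this notebook follow the same poll-and-wait pattern, and neither has an upper bound on the waiting time. A possible generic helper with a timeout is sketched below. `wait_until_ready` is a hypothetical name, and the stand-in check replaces `check_view_readiness` so that the example runs without a Nexus deployment.

```python
import time


def wait_until_ready(check, interval=30.0, timeout=3600.0, sleep=time.sleep):
    """Call check() until it returns True and return the elapsed seconds.

    Raises TimeoutError if `timeout` seconds pass before check() succeeds.
    """
    start = time.monotonic()
    while True:
        if check():
            return time.monotonic() - start
        if time.monotonic() - start >= timeout:
            raise TimeoutError(f"Not ready after {timeout}s")
        sleep(interval)


# Stand-in for check_view_readiness: reports ready on the third call
answers = iter([False, False, True])
naps = []  # records the requested pauses instead of actually sleeping
elapsed = wait_until_ready(
    lambda: next(answers), interval=30.0, sleep=naps.append)
print(f"Ready after {len(naps)} waits")
```

In the notebook, `check` could wrap the existing call, for example `lambda: check_view_readiness(bucket_config, STATISTICS_VIEW_ID, TOKEN)`, where `bucket_config` stands for the `BucketConfiguration` built in the cell above.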


Fetch old boosting and statistics views

In [96]:
old_boosting_view = None
if "boostingView" in forge_atlas.as_json(current_config):
    old_boosting_view = nxs.views.fetch(
        ATLAS_CONFIG_ORG,
        ATLAS_CONFIG_PROJECT,
        current_config.boostingView.id)
    
old_statistics_view = None
if "statisticsView" in forge_atlas.as_json(current_config):
    old_statistics_view = nxs.views.fetch(
        ATLAS_CONFIG_ORG,
        ATLAS_CONFIG_PROJECT,
        current_config.statisticsView.id)

Update current Atlas recommendation config

In [98]:
current_config.embeddingModel.hasSelector = forge_atlas.from_json(
    {
      "@type": "FragmentSelector",
      "conformsTo": "https://bluebrainnexus.io/docs/delta/api/resources-api.html#fetch",
      "value": f"?rev={MODEL_REVISION}"
    }
)
current_config.similarityView = forge_atlas.from_json({
    "id": MASTER_VIEW_ID,
    "type": "AggregateElasticSearchView"
})
current_config.boostingView = forge_atlas.from_json({
    "id": BOOSTING_VIEW_ID,
    "type": "AggregateElasticSearchView"
})
current_config.statisticsView = forge_atlas.from_json({
    "id": STATISTICS_VIEW_ID,
    "type": "ElasticSearchView"
})
update_current_config(forge_atlas, config_resource, current_config)

<action> _update_one
<succeeded> True


Deprecate the old master, boosting, and statistics views.

In [99]:
MASTER_VIEW_ID

'https://bbp.epfl.ch/neurosciencegraph/data/14d2fddc-a408-4f76-be4b-a4893b8f93df'

In [100]:
if old_master_view is not None:
    deprecate_individual_views(old_master_view)
    nxs.views.deprecate_es(old_master_view)

if old_boosting_view is not None:
    deprecate_individual_views(old_boosting_view)
    nxs.views.deprecate_es(old_boosting_view)

if old_statistics_view is not None:
    nxs.views.deprecate_es(old_statistics_view)