## TODO

- Make sure boosting factors are computed with __normalized__ scores (so we need to pass the new statistics)


## Forge issues to address

- `forge.search({"type": "Embedding"}, limit=200)` throws 'Server disconnected'. Current (very slow!) workaround:

```
query = f"""
    SELECT ?id
    WHERE {{
        ?id a {DATA_TYPE_FILTER} ;
            <https://bluebrain.github.io/nexus/vocabulary/deprecated> false .
    }}
""" 
resources = forge.sparql(query, limit=HARD_RESOURCE_LIMIT)
resources = [forge.retrieve(r.id) for r in resources] 

```


- `forge.elastic` prints the query it executes even when debug is off (with debug on, it prints the query twice).
- `forge.elastic` expects the `limit` parameter (it is not possible to ask for all the documents). Current workaround: set `HARD_RESOURCE_LIMIT=10000`, a number large enough that all the resources can be fetched.
- `forge.update` after `forge.retrieve` adds a full context payload.

## Context issues

Add new types:
- `EmbeddingModel`
- `Embedding`
- `SimilarityBoostingFactor`
- `ElasticSearchViewStatistics`
- `RecommenderConfiguration`
- any other properties to add:
    - from `EmbeddingModel`: `similarity`, `vectorDimension`
    - from `Embedding`: `embedding`
    - from `SimilarityBoostingFactor`: `scriptScore`, `vectorParameter`
    - from `ElasticSearchViewStatistics`: `boosted`, `scriptScore`, `vectorParameter` 
    - from `RecommenderConfiguration`: `embeddingModel`, `boostingViewmodel`, `similarityView`, `statisticsView`
    

# Add/update a set of embedding vectors


When adding or updating a set of embedding vectors, we need to perform the following sequence of steps.

I. Create a new ES view for the new/updated vectors as follows

1. Get dimensions of the embedding vectors
2. Create a Nexus ES View resource with:
- `resourceTypes` being `Embedding`
- mapping that has `"embedding": "dense_vector"` with the right dimensions
- `resourceTag` field corresponds to the model UUID and its revision (e.g. `e2b953b9-6724-4278-a1e5-3472bd63e374?rev=1`)

 
II. Update an existing similarity aspect in the recommender config

__Pre-requisites:__ the `RecommenderConfiguration` resource exists and the aspect has been added to it (see `Add new similarity aspects.ipynb`)

1. Create a new aggregated view including the new similarity view. This view will be the new master view. Make sure all the vectors have been indexed. 
2. Compute raw statistics (min/max/mean/std) of similarity values from the master view and push them as an `ElasticSearchViewStatistics` resource (created if it doesn't exist, updated if it does), tagged with the new revision of the master view.
3. Compute boosting factors for all the data points (vectors) indexed by the master view and push them as separate resources into their respective projects (created if they don't exist, updated if they do). Tag them with the new revision of the master view.
4. In the bucket with embedding data, create a new ES view for boosting factors (tagged with the new master view ID). Make sure that all the boosting factors have finished indexing.
5. Compute statistics (min/max/mean/std) of similarity values from the master view with boosting, then push them and tag them with the new revision of the master view.
6. Create a new ES view serving statistics (both raw and boosted) tagged with the new revision of the master view. Make sure that all the stats have finished indexing.
7. Create a new aggregated view for boosting factors targeting all the new boosting ES views.
8. Update the Recommender Configuration to point to the new revision of the master view, the new ES view with the stats and the new aggregated view with the boosting factors.
9. In each of the individual projects, deprecate the old boosting ES view and the old ES view serving embedding vectors (if one exists).
10. If necessary, deprecate the old stats view and the old aggregated view for boosting.

Related JIRA tickets: 
* https://bbpteam.epfl.ch/project/issues/browse/DKE-718
* https://bbpteam.epfl.ch/project/issues/browse/DKE-715

# Setup

## Imports

In [442]:
import copy
import getpass
import math
from collections import OrderedDict, namedtuple
from urllib.parse import quote_plus

import numpy as np
import nexussdk as nxs
import requests
from kgforge.core import KnowledgeGraphForge

In [161]:
from kgforge.version import __version__
print(__version__)

0.6.3.dev9+gc159ffd


## Helpers

ES view mappings

In [162]:
SIMILARITY_VIEW_MAPPING = {
    "properties": {
      "@id": {
        "type": "keyword"
      },
      "@type": {
        "type": "keyword"
      },
      "derivation": {
        "properties": {
          "entity": {
            "properties": {
              "@id": {
                "type": "keyword"
              }
            },
            "type": "nested"
          }
        },
        "type": "nested"
      },
      "embedding": {
        "type": "dense_vector"
      },
      "generation": {
        "properties": {
          "activity": {
            "properties": {
              "used": {
                "properties": {
                  "@id": {
                    "type": "keyword"
                  }
                },
                "type": "nested"
              }
            },
            "type": "nested"
          }
        },
        "type": "nested"
      }
    }
}


BOOSTING_VIEW_MAPPING = {
    "properties": {
      "@id": {
        "type": "keyword"
      },
      "@type": {
        "type": "keyword"
      },
      "value": {
        "type": "float"
      },
      "scriptScore": {
        "type": "keyword"
      },
      "vectorParameter": {
        "type": "keyword"
      },
      "derivation": {
        "properties": {
          "entity": {
            "properties": {
              "@id": {
                "type": "keyword"
              }
            },
            "type": "nested"
          }
        },
        "type": "nested"
      },
      "generation": {
        "properties": {
          "activity": {
            "properties": {
              "used": {
                "properties": {
                  "@id": {
                    "type": "keyword"
                  }
                },
                "type": "nested"
              }
            },
            "type": "nested"
          }
        },
        "type": "nested"
      }
    }
}

STATS_VIEW_MAPPING = {
    "properties": {
        "@id": {
            "type": "keyword"
        },
        "@type": {
            "type": "keyword"
        }, 
        "boosted": {
            "type": "boolean"
        },
        "scriptScore": {
            "type": "keyword"
        },
        "vectorParameter": {
            "type": "keyword"
        },
        "derivation": {
            "properties": {
              "entity": {
                "properties": {
                  "@id": {
                    "type": "keyword"
                  }
                },
                "type": "nested"
              }
            },
            "type": "nested"
        },
        "series": {
            "properties": {
                "statistic": {
                    "type": "keyword"
                },
                "value": {
                    "type": "float"
                }
            },
            "type": "nested"
        }
    } 
}

In [438]:
BucketConfiguration = namedtuple(
    'BucketConfiguration', 'endpoint org proj')


def create_forge_session(bucket_config):
    return KnowledgeGraphForge(
        "https://raw.githubusercontent.com/BlueBrain/nexus-forge/master/examples/notebooks/use-cases/prod-forge-nexus.yml",
        token=TOKEN, 
        endpoint=bucket_config.endpoint,        
        bucket=f"{bucket_config.org}/{bucket_config.proj}")


def get_es_view_mappings(dimension):
    mapping = copy.deepcopy(SIMILARITY_VIEW_MAPPING)
    mapping["properties"]["embedding"]["dims"] = dimension
    return mapping


def get_current_config(config_resource, model_id):
    """Get the configuration record corresponding to the input model."""
    current_config = None
    if isinstance(config_resource.configuration, list):
        for el in config_resource.configuration:
            if el.model == model_id:
                current_config = el
    else:
        if config_resource.configuration.model == model_id:
            current_config = config_resource.configuration

    return current_config


def update_current_config(forge, config_resource, current_config):
    """Update the configuration record."""
    if isinstance(config_resource.configuration, list):
        new_configs = []
        for el in config_resource.configuration:
            if el is not current_config:
                new_configs.append(el)
        new_configs.append(current_config)
        config_resource.configuration = new_configs
    else:
        config_resource.configuration = [
            current_config
        ]
    forge.update(config_resource)
    
def set_elastic_view(forge, view):
    views_endpoint = "/".join((
        ENDPOINT,
        "views",
        quote_plus(forge._store.bucket.split("/")[0]),
        quote_plus(forge._store.bucket.split("/")[1])))
    forge._store.service.elastic_endpoint["endpoint"] = "/".join(
        (views_endpoint, quote_plus(view), "_search"))

    
def get_all_vectors(forge, resource_limit):
    all_embeddings = forge.elastic(f"""{{
        "from" : 0,
        "size" : {resource_limit},
        "query": {{
            "term": {{"_deprecated": false}}
        }}
    }}
    """)
    vectors = {
        result._source["@id"]: result._source["embedding"]
        for result in all_embeddings
    }
    return vectors


def get_all_scores(forge, vectors, formula, param_name, resource_limit=200, boosting=None):
    score_values = set()
    for k, vector in vectors.items():
        query = f"""{{
          "size": {len(vectors)},
          "query": {{
            "script_score": {{
                "query": {{
                    "bool" : {{
                      "must_not" : {{
                        "term" : {{ "@id": "{k}" }}
                      }},
                      "must": {{ "exists": {{ "field": "embedding" }} }}
                    }}
                }},
                "script": {{
                    "source": "{formula}",
                    "params": {{
                      "{param_name}": {vector}
                    }}
                }}
            }}
          }}
        }}"""

        res = forge.elastic(query)
        for el in res:
            boost_factor = 1
            if boosting:
                boost_factor = 1 + boosting[el._source["@id"]]
            score_values.add(el._score * boost_factor)
    score_values = np.array(list(score_values))
    return score_values


def get_view_stats(forge, vectors, formula, param_name, resource_limit=200, boosting=None):
    Statistics = namedtuple('Statistics', 'min max mean std')
    score_values = get_all_scores(
        forge, vectors, formula, param_name, resource_limit, boosting)
    return score_values, Statistics(
        score_values.min(),
        score_values.max(),
        score_values.mean(),
        score_values.std())


def register_stats(forge, view_id, sample_size, stats, formula,
                   param_name, tag, boosted=False):
    
    stat_values = [
        {
          "statistic": "min",
          "unitCode": "dimensionless",
          "value": stats.min
        },
        {
          "statistic": "max",
          "unitCode": "dimensionless",
          "value": stats.max
        },
        {
          "statistic": "mean",
          "unitCode": "dimensionless",
          "value": stats.mean
        },
        {
          "statistic": "standard deviation",
          "unitCode": "dimensionless",
          "value": stats.std
        },
        {
          "statistic": "N",
          "unitCode": "dimensionless",
          "value": sample_size
        }
    ]

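    # Look for an existing statistics resource derived from this view
    # (a separate name avoids shadowing the `stats` argument)
    existing_stats = forge.search({
        "type": "ElasticSearchViewStatistics",
        "boosted": boosted,
        "derivation": {
            "entity": {
                "id": view_id
            }
        }
    })
    
    if len(existing_stats) > 0:
        stats_resource = existing_stats[0]
        stats_resource.series = forge.from_json(stat_values)
        forge.update(stats_resource)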
    else:    
        json_data = {
            "type": "ElasticSearchViewStatistics",
            "boosted": boosted,
            "scriptScore": formula,
            "vectorParameter": param_name,
            "series": stat_values,
            "derivation": {
                "type": "Derivation",
                "entity": {
                    "id": view_id
                }
            }
        }
        stats_resource = forge.from_json(json_data)
        forge.register(stats_resource)
    forge.tag(stats_resource, tag)
    return stats_resource

def get_score_deviation(forge, point_id, vector, score_min, score_max, k, formula, param_name):
    query = f"""{{
      "size": {k},
      "query": {{
        "script_score": {{
          "query": {{
                "exists": {{
                    "field": "embedding"
                }}
          }},
          "script": {{
            "source": "{formula}",
            "params": {{
              "{param_name}": {vector}
            }}
          }}
        }}
      }}
    }}"""

    result = forge.elastic(query)
    scores = set()
    for el in result:
        if point_id != el._source["@id"]:
            # Min/max normalization of the score
            score = (el._score - score_min) / (score_max - score_min)
            scores.add(score)
    scores = np.array(list(scores))
    return math.sqrt(((1 - scores)**2).mean())


def register_boosting_data(forge, view_id, aggregated_view_id,
                           deviation, formula, param_name, tag):
    generation_resource = forge.from_json({
        "type": "Generation",
        "activity": {
            "type": ["SimilarityBoosting", "Activity"],
            "used": {
                "id": aggregated_view_id,
                "type": "AggregateElasticSearchView"
            }
        }
    })
    
    for k, v in deviation.items():
        existing_data = forge.search({
            "type": "SimilarityBoostingFactor",
            "derivation": {
                "entity": {
                    "id": k
                }
            }
        })
        if len(existing_data) > 0:
            boosting_resource = existing_data[0]
            boosting_resource.value = 1 + v
            boosting_resource.generation = generation_resource
            forge.update(boosting_resource)
        else:       
            json_data = {
                "type": "SimilarityBoostingFactor",
                "value": 1 + v,
                "unitCode": "dimensionless",
                "scriptScore": formula,
                "vectorParameter": param_name,
                "derivation": {
                    "type": "Derivation",
                    "entity": {
                        "id": k,
                        "type": "Embedding"
                    }
                },
            }
            boosting_resource = forge.from_json(json_data)
            boosting_resource.generation = generation_resource
            forge.register(boosting_resource)
        forge.tag(boosting_resource, tag)
        

def deprecate_individual_views(agg_view):
    """Deprecate every ES view referenced by the aggregated view."""
    views = agg_view["views"]
    # A single referenced view comes back as one mapping, several as a list
    if isinstance(views, OrderedDict):
        views = [views]
    for view in views:
        # Each reference carries the "org/project" label and the view ID
        org, proj = view["project"].split("/")[:2]
        es_view = nxs.views.fetch(org, proj, view["viewId"])
        nxs.views.deprecate_es(es_view)
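
The deviation returned by `get_score_deviation` can be checked offline on a handful of numbers. The snippet below is a minimal sketch, not the notebook's implementation: `score_deviation` is a hypothetical pure-Python stand-in that takes the raw scores directly instead of querying Elasticsearch, the input values are made up, and, unlike the helper above, it does not deduplicate equal scores.

```python
import math


def score_deviation(raw_scores, score_min, score_max):
    """Root-mean-square distance of normalized scores from the ideal score 1."""
    # Min/max normalization, as in get_score_deviation
    normalized = [(s - score_min) / (score_max - score_min) for s in raw_scores]
    # RMS of (1 - score): 0 when every neighbor has the maximal score
    return math.sqrt(sum((1 - s) ** 2 for s in normalized) / len(normalized))


# Hypothetical raw scores of three nearest neighbors
neighbor_scores = [0.5, 0.75, 1.0]
deviation = score_deviation(neighbor_scores, score_min=0.0, score_max=1.0)
# register_boosting_data stores 1 + deviation as the boosting value
boosting_value = 1 + deviation
print(deviation, boosting_value)
```

A point whose neighbors all score close to the maximum gets a deviation near 0 and therefore a boosting value near 1.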

## User input

In [419]:
ENDPOINT = "https://staging.nexus.ocp.bbp.epfl.ch/v1"
DOWNLOAD_DIR = "./data"
TOKEN = getpass.getpass()

········


TODO: Fix forge so that ES queries can be run without specifying a limit. For now, we use a 'very large' number.

In [165]:
HARD_RESOURCE_LIMIT = 10000

Bucket where embedding models live

In [166]:
MODEL_CATALOG_ORG = "dke"
MODEL_CATALOG_PROJECT = "embedder_catalog"

ID of the embedding model to use

In [379]:
MODEL_ID = "https://staging.nexus.ocp.bbp.epfl.ch/v1/resources/dke/embedder_catalog/_/14d61701-c4fa-44ea-8139-0e0ed606b4ec"
MODEL_REVISION = None  # Specify a revision, if necessary. If None, the latest revision is used

Atlas configuration project

In [380]:
ATLAS_CONFIG_ORG = "dke"
ATLAS_CONFIG_PROJECT = "fake-atlas"

ID of the recommender configuration in the atlas config project.

In [381]:
ATLAS_RECOMMENDER_CONFIG = "https://bbp.epfl.ch/neurosciencegraph/data/415184d2-0ae5-4af0-89b7-8b00393b033f"

Bucket where embedding vectors live

In [382]:
EMBEDDING_BUCKETS = [
     BucketConfiguration(
        "https://staging.nexus.ocp.bbp.epfl.ch/v1",
         "dke","seu-embeddings"),
     BucketConfiguration(
        "https://staging.nexus.ocp.bbp.epfl.ch/v1",
         "dke", "seu-embeddings-2")
]

Later, we will assume that data and embeddings live in the same bucket

In [383]:
NEIGHBORHOOD_SIZE = 20  # Number of nearest neighbors to consider for local boosting

---

## Forge sessions

### Session for embedding models

In [420]:
forge_models = KnowledgeGraphForge(
    "https://raw.githubusercontent.com/BlueBrain/nexus-forge/master/examples/notebooks/use-cases/prod-forge-nexus.yml",
    endpoint=ENDPOINT,
    token=TOKEN, 
    bucket=f"{MODEL_CATALOG_ORG}/{MODEL_CATALOG_PROJECT}")

### Session for embedding resources

In [421]:
FORGE_SESSIONS = {
    el: create_forge_session(el) for el in EMBEDDING_BUCKETS
}

### Session for updating Atlas configs

In [422]:
forge_atlas = KnowledgeGraphForge(
    "https://raw.githubusercontent.com/BlueBrain/nexus-forge/master/examples/notebooks/use-cases/prod-forge-nexus.yml",
    endpoint=ENDPOINT,
    token=TOKEN, 
    bucket=f"{ATLAS_CONFIG_ORG}/{ATLAS_CONFIG_PROJECT}")

### Nexussdk session

In [423]:
nxs.config.set_environment(ENDPOINT)
nxs.config.set_token(TOKEN)

---

# I. Create ElasticSearchView


TODO: Adapt the resource_types property to the proper `Embedding` type once it is added to the context

In [388]:
model_resource = forge_models.retrieve(
    f"{MODEL_ID}{'?rev=' + str(MODEL_REVISION) if MODEL_REVISION is not None else ''}")

# If revision is not provided by the user, fetch the latest
if MODEL_REVISION is None:
    MODEL_REVISION = model_resource._store_metadata._rev 

MODEL_TAG = f"{MODEL_ID.split('/')[-1]}?rev={MODEL_REVISION}"

In [389]:
MODEL_TAG

'14d61701-c4fa-44ea-8139-0e0ed606b4ec?rev=10'

In [390]:
dimension = model_resource.vectorDimension

In [391]:
SIMILARITY_VIEWS = {}
for bucket_config in EMBEDDING_BUCKETS:
    SIMILARITY_VIEWS[bucket_config] = nxs.views.create_es(
        bucket_config.org, bucket_config.proj,
        mapping=get_es_view_mappings(dimension),
        tag=MODEL_TAG,
        resource_types=[
            "https://staging.nexus.ocp.bbp.epfl.ch/v1/resources/dke/seu-embeddings/_/Embedding"],
        # Here we need to make sure we add Embedding to some global context
        source_as_text=False,
        include_metadata = True, 
        include_deprecated = False)

In [392]:
SIMILARITY_VIEW_IDS = {
    k: v["@id"] for k, v in SIMILARITY_VIEWS.items()
}

In [393]:
SIMILARITY_VIEW_IDS

{BucketConfiguration(endpoint='https://staging.nexus.ocp.bbp.epfl.ch/v1', org='dke', proj='seu-embeddings'): 'https://bbp.epfl.ch/neurosciencegraph/data/00424422-4864-4a11-abb6-63951f191b2a',
 BucketConfiguration(endpoint='https://staging.nexus.ocp.bbp.epfl.ch/v1', org='dke', proj='seu-embeddings-2'): 'https://bbp.epfl.ch/neurosciencegraph/data/1268e8f3-1559-4341-8a65-df66685538cf'}

__IMPORTANT__: Here, before we execute the next step, we need to make sure that the indexing is over. Execute the following cell until it stops throwing an error. If no error is observed, all the resources have been indexed, and we can proceed with the rest of the notebook.

In [443]:
ENDPOINT

'https://staging.nexus.ocp.bbp.epfl.ch/v1'

In [446]:
import urllib.parse

The statistics endpoint of a view reports the following fields:

- `totalEvents`: total number of events in the project
- `processedEvents`: number of events that have been considered by the view
- `remainingEvents`: number of events that remain to be considered by the view
- `discardedEvents`: number of events that have been discarded (were not evaluated due to filters, e.g. did not match the schema, tag or type defined in the view)
- `evaluatedEvents`: number of events that have been used to update an index
- `lastEventDateTime`: timestamp of the last event in the project
- `lastProcessedEventDateTime`: timestamp of the last event processed by the view
- `delayInSeconds`: number of seconds between the last processed event timestamp and the last known event timestamp

In [455]:
for k, v in SIMILARITY_VIEW_IDS.items():
    view_id = urllib.parse.quote_plus(v)
    url = f"{k.endpoint}/views/{k.org}/{k.proj}/{view_id}/statistics"
    r = requests.get(
        url,
        headers={"Authorization": f"Bearer {TOKEN}"}
    )
    for kk, vv in r.json().items():
        print(kk, vv)
        
    print()

@context https://bluebrain.github.io/nexus/contexts/statistics.json
@type ViewStatistics
delayInSeconds 0
discardedEvents 14076
evaluatedEvents -2839
failedEvents 0
lastEventDateTime 2021-09-16T17:12:29.209Z
lastProcessedEventDateTime 2021-09-16T17:12:29.209Z
processedEvents 11237
remainingEvents 3239
totalEvents 14476

@context https://bluebrain.github.io/nexus/contexts/statistics.json
@type ViewStatistics
delayInSeconds 0
discardedEvents 4728
evaluatedEvents -562
failedEvents 0
lastEventDateTime 2021-09-16T17:12:29.543Z
lastProcessedEventDateTime 2021-09-16T17:12:29.543Z
processedEvents 4166
remainingEvents 962
totalEvents 5128
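
Instead of re-running the check by hand, the statistics could be polled. The snippet below is only a sketch under stated assumptions: it treats a view as caught up when `lastProcessedEventDateTime` equals `lastEventDateTime` (in the output above `remainingEvents` is non-zero even though the two timestamps match, so that counter is deliberately not used). `indexing_caught_up`, `wait_until_indexed` and the `fetch_stats` callable are hypothetical names; in the notebook, `fetch_stats` would wrap the `requests.get(...).json()` call on the `/statistics` URL.

```python
import time


def indexing_caught_up(view_stats):
    """True when the view has processed the latest event in the project."""
    return (view_stats["lastProcessedEventDateTime"]
            == view_stats["lastEventDateTime"])


def wait_until_indexed(fetch_stats, poll_seconds=5.0, max_attempts=60):
    """Poll `fetch_stats()` until the view catches up or attempts run out."""
    for _ in range(max_attempts):
        if indexing_caught_up(fetch_stats()):
            return True
        time.sleep(poll_seconds)
    return False


# Canned responses standing in for the statistics endpoint (hypothetical values)
responses = iter([
    {"lastEventDateTime": "2021-09-16T17:12:29.209Z",
     "lastProcessedEventDateTime": "2021-09-16T17:10:00.000Z"},
    {"lastEventDateTime": "2021-09-16T17:12:29.209Z",
     "lastProcessedEventDateTime": "2021-09-16T17:12:29.209Z"},
])
done = wait_until_indexed(lambda: next(responses), poll_seconds=0)
```

Passing the fetcher as a callable keeps the polling logic independent of the HTTP client, so it can be exercised without a Nexus deployment.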



In [439]:
for k, v in SIMILARITY_VIEW_IDS.items():
    print(k)
    print("---------------------------")
    forge = FORGE_SESSIONS[k]
    
    query = f"""
        SELECT ?id 
        WHERE {{
            ?id a <https://staging.nexus.ocp.bbp.epfl.ch/v1/resources/dke/seu-embeddings/_/Embedding> ;
                <https://bluebrain.github.io/nexus/vocabulary/deprecated> false ;
                <https://neuroshapes.org/generation> ?g .
            ?g <http://www.w3.org/ns/prov#activity> ?a .
            ?a <http://www.w3.org/ns/prov#used> <{MODEL_ID}>.

      }}
    """
    resources = forge.sparql(query, limit=HARD_RESOURCE_LIMIT)
    set_elastic_view(forge, v)
    vectors = get_all_vectors(forge, HARD_RESOURCE_LIMIT)

    print("Resources through SPARQL: ", len(resources))
    print("Resources through ES view: ", len(vectors))

    assert(len(vectors) >= len(resources))
    print()

BucketConfiguration(endpoint='https://staging.nexus.ocp.bbp.epfl.ch/v1', org='dke', proj='seu-embeddings')
---------------------------
{
        "from" : 0,
        "size" : 10000,
        "query": {
            "term": {"_deprecated": false}
        }
    }
    
Resources through SPARQL:  200
Resources through ES view:  200

BucketConfiguration(endpoint='https://staging.nexus.ocp.bbp.epfl.ch/v1', org='dke', proj='seu-embeddings-2')
---------------------------
{
        "from" : 0,
        "size" : 10000,
        "query": {
            "term": {"_deprecated": false}
        }
    }
    
Resources through SPARQL:  200
Resources through ES view:  200



---

# II. Update recommender configurations


## Update the master view and the recommender configs.

Retrieve the recommender configuration resource and the record corresponding to the `MODEL_ID`.

In [398]:
config_resource = forge_atlas.retrieve(ATLAS_RECOMMENDER_CONFIG)
current_config = get_current_config(config_resource, MODEL_ID)

if current_config is None:
    raise ValueError(
        f"Model with ID '{MODEL_ID}' does not exist in Atlas Recommender Configuration, "
        "please, make sure you added the model to the config")

Check if the current record contains a link to the master view. If such a link exists, retrieve all the existing individual views linked to this master view.

In [399]:
existing_individual_views = []
old_number_of_vectors = 0
old_master_view = None

if "similarityView" in forge_atlas.as_json(current_config):
    # Fetch the existing master_view
    old_master_view = nxs.views.fetch(
        ATLAS_CONFIG_ORG,
        ATLAS_CONFIG_PROJECT,
        current_config.similarityView)
    if isinstance(old_master_view["views"], OrderedDict):
        existing_individual_views.append(old_master_view["views"])
    else:
        existing_individual_views = old_master_view["views"]
    
    # Get the number of vectors indexed by the existing aggregated view
    set_elastic_view(forge_atlas, old_master_view["@id"])
    old_number_of_vectors = len(
        get_all_vectors(forge_atlas, HARD_RESOURCE_LIMIT))

In [400]:
existing_individual_views = [
    nxs.views.fetch(
        view["project"].split("/")[0],
        view["project"].split("/")[1],
        view["viewId"])
    for view in existing_individual_views
]
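
The two cells above handle a `views` payload that is either a single mapping or a list of mappings. A minimal sketch of that normalization, assuming each entry carries the `project` and `viewId` keys used above; `as_view_list` and `split_project` are hypothetical helper names and the example payloads are made up:

```python
def as_view_list(views):
    """Normalize the `views` payload of an aggregated view to a list."""
    # A single referenced view arrives as one mapping, several as a list
    if isinstance(views, dict):
        return [views]
    return list(views)


def split_project(view_ref):
    """Return (org, project, view_id) for one view reference."""
    org, proj = view_ref["project"].split("/")
    return org, proj, view_ref["viewId"]


# Hypothetical payloads in the two shapes
single = {"project": "dke/seu-embeddings",
          "viewId": "https://example.org/view-1"}
many = [single, {"project": "dke/seu-embeddings-2",
                 "viewId": "https://example.org/view-2"}]
refs = [split_project(v) for v in as_view_list(many)]
```

Because `OrderedDict` is a subclass of `dict`, the `isinstance` check also covers the `OrderedDict` case tested in the notebook.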

Create a new aggregated view to serve as the master view, including all the existing individual views and the new similarity views added by this notebook.

In [401]:
# Create a new aggregated view from the existing and the new individual views
MASTER_VIEW = nxs.views.create_es_aggregated(
    ATLAS_CONFIG_ORG,
    ATLAS_CONFIG_PROJECT,
    existing_individual_views + list(SIMILARITY_VIEWS.values()),
    view_id=None
)
MASTER_VIEW_ID = MASTER_VIEW["@id"]
# re-fetch the view
MASTER_VIEW = nxs.views.fetch(
    ATLAS_CONFIG_ORG,
    ATLAS_CONFIG_PROJECT,
    MASTER_VIEW_ID)

Create a new tag that corresponds to the UUID of the new master view.

In [402]:
MASTER_TAG = MASTER_VIEW_ID.split('/')[-1]
MASTER_TAG

'ef19bdf2-3617-4384-9a30-623ac40fdf39'

__IMPORTANT__: Here, before we execute the next step, we need to make sure that the indexing in the aggregated view is over. Execute the following cell until it stops throwing an error. If no error is observed, all the resources have been indexed, and we can proceed with the rest of the notebook.

In [440]:
all_vectors = 0
for k, v in SIMILARITY_VIEW_IDS.items():
    forge = FORGE_SESSIONS[k]
    set_elastic_view(forge, v)
    vectors = get_all_vectors(forge, HARD_RESOURCE_LIMIT)
    all_vectors += len(vectors)
    
set_elastic_view(forge_atlas, MASTER_VIEW_ID)
agg_vectors = get_all_vectors(forge_atlas, HARD_RESOURCE_LIMIT)
print("Sum of vectors from individual indices: ", all_vectors)
print("Number of vectors from the master view:", len(agg_vectors))
assert(len(agg_vectors) == all_vectors)

{
        "from" : 0,
        "size" : 10000,
        "query": {
            "term": {"_deprecated": false}
        }
    }
    
Sum of vectors from individual indices:  400
Number of vectors from the master view: 400


## Compute raw (non-boosted) statistics

Compute raw statistics (min/max/mean/std) of similarity values from the master view and push them as an `ElasticSearchViewStatistics` resource (created if it doesn't exist, updated if it does), tagged with the new revision of the master view (in bbp/atlas).
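
As an illustration of what ends up in the `series` of the statistics resource, here is a small offline sketch. It assumes the population standard deviation (the default of NumPy's `std`, which `get_view_stats` uses); `summarize` and `to_series` are hypothetical names and the scores are made up.

```python
import statistics
from collections import namedtuple

Statistics = namedtuple("Statistics", "min max mean std")


def summarize(scores):
    """Min, max, mean and population standard deviation of the scores."""
    return Statistics(min(scores), max(scores),
                      statistics.fmean(scores), statistics.pstdev(scores))


def to_series(stats, sample_size):
    """Lay the statistics out as `series` records, as register_stats does."""
    labels = [("min", stats.min), ("max", stats.max), ("mean", stats.mean),
              ("standard deviation", stats.std), ("N", sample_size)]
    return [{"statistic": name, "unitCode": "dimensionless", "value": value}
            for name, value in labels]


scores = [0.2, 0.4, 0.6, 0.8]  # hypothetical similarity scores
series = to_series(summarize(scores), len(scores))
```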

In [404]:
FORMULAS = {
    "cosine": "(cosineSimilarity(params.query_vector, doc['embedding']) + 1.0) / 2",
    "euclidean": "1 / (1 + l2norm(params.query_vector, doc['embedding']))"
}
VECTOR_PARAMETER = "query_vector"
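
Both formulas map a similarity or a distance into the interval [0, 1]. The pure-Python sketch below mirrors them outside Elasticsearch to make the value ranges concrete; `cosine_score` and `euclidean_score` are hypothetical names, and the snippet is meant as a sanity check, not as a replacement for the script scores.

```python
import math


def cosine_score(query, doc):
    """(cosine similarity + 1) / 2, which maps [-1, 1] onto [0, 1]."""
    dot = sum(q * d for q, d in zip(query, doc))
    norm = (math.sqrt(sum(q * q for q in query))
            * math.sqrt(sum(d * d for d in doc)))
    return (dot / norm + 1.0) / 2


def euclidean_score(query, doc):
    """1 / (1 + L2 distance): 1 for identical vectors, decaying toward 0."""
    return 1 / (1 + math.dist(query, doc))


same = cosine_score([1.0, 0.0], [1.0, 0.0])        # identical direction
orthogonal = cosine_score([1.0, 0.0], [0.0, 1.0])  # unrelated direction
far = euclidean_score([0.0, 0.0], [3.0, 4.0])      # L2 distance of 5
```

With these definitions, identical vectors score 1 under both formulas, orthogonal vectors score 0.5 under the cosine formula, and two points at distance 5 score 1/6 under the Euclidean formula.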

In [405]:
formula = FORMULAS[model_resource.similarity]

values, stats = get_view_stats(
    forge_atlas, vectors,
    formula,
    VECTOR_PARAMETER,
    HARD_RESOURCE_LIMIT)
stats_resource = register_stats(
    forge_atlas, MASTER_VIEW_ID,
    values.shape[0], stats, formula, VECTOR_PARAMETER,
    tag=MASTER_TAG)

{
          "size": 200,
          "query": {
            "script_score": {
                "query": {
                    "bool" : {
                      "must_not" : {
                        "term" : { "@id": "https://bbp.epfl.ch/neurosciencegraph/data/embeddings/3acaffd6-4ede-4d30-b29c-f88a9dda2859" }
                      },
                      "must": { "exists": { "field": "embedding" } }
                    }
                },
                "script": {
                    "source": "1 / (1 + l2norm(params.query_vector, doc['embedding']))",
                    "params": {
                      "query_vector": [-0.1416456550359726, 0.09924907982349396, 0.11578177660703659, -0.16871002316474915, -0.07568155974149704, 0.2617238163948059, 0.1812838762998581, -0.04862945154309273, 0.0, 0.0, 0.0, 0.0]
                    }
                }
            }
          }
        }
{
          "size": 200,
          "query": {
            "script_score": {
                "query": {
 

{
          "size": 200,
          "query": {
            "script_score": {
                "query": {
                    "bool" : {
                      "must_not" : {
                        "term" : { "@id": "https://bbp.epfl.ch/neurosciencegraph/data/embeddings/80a9c0da-3e3c-44dd-a5c7-efa2c182137b" }
                      },
                      "must": { "exists": { "field": "embedding" } }
                    }
                },
                "script": {
                    "source": "1 / (1 + l2norm(params.query_vector, doc['embedding']))",
                    "params": {
                      "query_vector": [0.02870853990316391, 0.11050949990749359, 0.054193757474422455, -0.018031230196356773, 0.009494900703430176, 0.02084416337311268, 0.007478217128664255, -0.012473725713789463, 0.0, 0.0, 0.0, 0.0]
                    }
                }
            }
          }
        }
{
          "size": 200,
          "query": {
            "script_score": {
                "query

{
          "size": 200,
          "query": {
            "script_score": {
                "query": {
                    "bool" : {
                      "must_not" : {
                        "term" : { "@id": "https://bbp.epfl.ch/neurosciencegraph/data/embeddings/b49922e2-5adf-4d80-8606-90fb144053fb" }
                      },
                      "must": { "exists": { "field": "embedding" } }
                    }
                },
                "script": {
                    "source": "1 / (1 + l2norm(params.query_vector, doc['embedding']))",
                    "params": {
                      "query_vector": [-0.11504893004894257, -0.00048731331480666995, -0.028334181755781174, -0.1405833214521408, -0.12035557627677917, -0.11000135540962219, -0.1297367811203003, -0.012473725713789463, 0.0, 0.0, 0.0, 0.0]
                    }
                }
            }
          }
        }
{
          "size": 200,
          "query": {
            "script_score": {
                "q

{
          "size": 200,
          "query": {
            "script_score": {
                "query": {
                    "bool" : {
                      "must_not" : {
                        "term" : { "@id": "https://bbp.epfl.ch/neurosciencegraph/data/embeddings/b6d9a7e5-a382-4759-9c17-ebeb478e3118" }
                      },
                      "must": { "exists": { "field": "embedding" } }
                    }
                },
                "script": {
                    "source": "1 / (1 + l2norm(params.query_vector, doc['embedding']))",
                    "params": {
                      "query_vector": [0.11990752816200256, 0.1400831788778305, 0.06072208657860756, 0.01612263172864914, 0.14073938131332397, 0.12482598423957825, 0.03492121398448944, 0.15022704005241394, 0.0, 0.0, 0.0, 0.0]
                    }
                }
            }
          }
        }
{
          "size": 200,
          "query": {
            "script_score": {
                "query": {
   

{
          "size": 200,
          "query": {
            "script_score": {
                "query": {
                    "bool" : {
                      "must_not" : {
                        "term" : { "@id": "https://bbp.epfl.ch/neurosciencegraph/data/embeddings/0e9f8614-dbbd-4b2b-8d51-e47f6ec72cbf" }
                      },
                      "must": { "exists": { "field": "embedding" } }
                    }
                },
                "script": {
                    "source": "1 / (1 + l2norm(params.query_vector, doc['embedding']))",
                    "params": {
                      "query_vector": [-0.13247039914131165, 0.10699722915887833, 0.1580311506986618, -0.25509920716285706, 0.0359426774084568, 0.000355557887814939, 0.06693805009126663, -0.03959051892161369, 0.0, 0.0, 0.0, 0.0]
                    }
                }
            }
          }
        }

{
          "size": 200,
          "query": {
            "script_score": {
                "query": {
                    "bool" : {
                      "must_not" : {
                        "term" : { "@id": "https://bbp.epfl.ch/neurosciencegraph/data/embeddings/a869e7d6-0900-44e9-9df7-78a446bfa231" }
                      },
                      "must": { "exists": { "field": "embedding" } }
                    }
                },
                "script": {
                    "source": "1 / (1 + l2norm(params.query_vector, doc['embedding']))",
                    "params": {
                      "query_vector": [-0.049087826162576675, 0.05230037868022919, 0.01859588548541069, -0.14861951768398285, -0.13551540672779083, -0.07576058059930801, -0.08857227861881256, -0.13901875913143158, 0.0, 0.0, 0.0, 0.0]
                    }
                }
            }
          }
        }

{
          "size": 200,
          "query": {
            "script_score": {
                "query": {
                    "bool" : {
                      "must_not" : {
                        "term" : { "@id": "https://bbp.epfl.ch/neurosciencegraph/data/embeddings/9d73bb33-8b35-40e1-8e21-78c28e2d5e8f" }
                      },
                      "must": { "exists": { "field": "embedding" } }
                    }
                },
                "script": {
                    "source": "1 / (1 + l2norm(params.query_vector, doc['embedding']))",
                    "params": {
                      "query_vector": [0.28055766224861145, 0.06220844388008118, 0.020073996856808662, 0.5465120077133179, 0.2697761654853821, 0.10753434896469116, 0.11725021153688431, 0.168304905295372, 0.0, 0.0, 0.0, 0.0]
                    }
                }
            }
          }
        }

{
          "size": 200,
          "query": {
            "script_score": {
                "query": {
                    "bool" : {
                      "must_not" : {
                        "term" : { "@id": "https://bbp.epfl.ch/neurosciencegraph/data/embeddings/fa6e725e-67a2-416c-80e9-8f5711f44953" }
                      },
                      "must": { "exists": { "field": "embedding" } }
                    }
                },
                "script": {
                    "source": "1 / (1 + l2norm(params.query_vector, doc['embedding']))",
                    "params": {
                      "query_vector": [0.022318048402667046, -0.053338419646024704, -0.00419168034568429, -0.09437515586614609, -0.056438155472278595, -0.11012931913137436, 0.007478217128664255, -0.08478517830371857, 0.0, 0.0, 0.0, 0.0]
                    }
                }
            }
          }
        }

{
          "size": 200,
          "query": {
            "script_score": {
                "query": {
                    "bool" : {
                      "must_not" : {
                        "term" : { "@id": "https://bbp.epfl.ch/neurosciencegraph/data/embeddings/f2b55fda-6ed6-4f89-b9ca-124ec3dcb333" }
                      },
                      "must": { "exists": { "field": "embedding" } }
                    }
                },
                "script": {
                    "source": "1 / (1 + l2norm(params.query_vector, doc['embedding']))",
                    "params": {
                      "query_vector": [0.22774375975131989, 0.08143379539251328, 0.18044918775558472, 0.13063852488994598, 0.21051107347011566, 0.07951835542917252, 0.12182404100894928, 0.19542169570922852, 0.0, 0.0, 0.0, 0.0]
                    }
                }
            }
          }
        }

{
          "size": 200,
          "query": {
            "script_score": {
                "query": {
                    "bool" : {
                      "must_not" : {
                        "term" : { "@id": "https://bbp.epfl.ch/neurosciencegraph/data/embeddings/00ae7c46-9b49-4970-9605-25be4a544550" }
                      },
                      "must": { "exists": { "field": "embedding" } }
                    }
                },
                "script": {
                    "source": "1 / (1 + l2norm(params.query_vector, doc['embedding']))",
                    "params": {
                      "query_vector": [0.2839093804359436, 0.07904792577028275, 0.06675771623849869, 0.01210452988743782, -0.04880731180310249, -0.10260079801082611, -0.0977199450135231, -0.03959051892161369, 0.0, 0.0, 0.0, 0.0]
                    }
                }
            }
          }
        }

{
          "size": 200,
          "query": {
            "script_score": {
                "query": {
                    "bool" : {
                      "must_not" : {
                        "term" : { "@id": "https://bbp.epfl.ch/neurosciencegraph/data/embeddings/c3485a13-499b-4e21-ac22-687defd02584" }
                      },
                      "must": { "exists": { "field": "embedding" } }
                    }
                },
                "script": {
                    "source": "1 / (1 + l2norm(params.query_vector, doc['embedding']))",
                    "params": {
                      "query_vector": [-0.11398883163928986, -0.03006223775446415, -0.07193849980831146, -0.12250186502933502, -0.12339934706687927, -0.10780269652605057, -0.20291811227798462, -0.11190196871757507, 0.0, 0.0, 0.0, 0.0]
                    }
                }
            }
          }
        }

{
          "size": 200,
          "query": {
            "script_score": {
                "query": {
                    "bool" : {
                      "must_not" : {
                        "term" : { "@id": "https://bbp.epfl.ch/neurosciencegraph/data/embeddings/dd8bf126-cdd2-4fef-812e-1fb52ef27317" }
                      },
                      "must": { "exists": { "field": "embedding" } }
                    }
                },
                "script": {
                    "source": "1 / (1 + l2norm(params.query_vector, doc['embedding']))",
                    "params": {
                      "query_vector": [0.049595724791288376, 0.2000579535961151, 0.11245602369308472, 0.044249340891838074, 0.2662685215473175, 0.04833488538861275, -0.07027694582939148, 0.10503238439559937, 0.0, 0.0, 0.0, 0.0]
                    }
                }
            }
          }
        }

{
          "size": 200,
          "query": {
            "script_score": {
                "query": {
                    "bool" : {
                      "must_not" : {
                        "term" : { "@id": "https://bbp.epfl.ch/neurosciencegraph/data/embeddings/cc67e4b0-8a18-4007-a25a-bf19f4997618" }
                      },
                      "must": { "exists": { "field": "embedding" } }
                    }
                },
                "script": {
                    "source": "1 / (1 + l2norm(params.query_vector, doc['embedding']))",
                    "params": {
                      "query_vector": [-0.043112196028232574, 0.0687139555811882, 0.0231533981859684, -0.07026654481887817, 0.08174018561840057, 0.09572230279445648, 0.22244837880134583, 0.050798796117305756, 0.0, 0.0, 0.0, 0.0]
                    }
                }
            }
          }
        }

{
          "size": 200,
          "query": {
            "script_score": {
                "query": {
                    "bool" : {
                      "must_not" : {
                        "term" : { "@id": "https://bbp.epfl.ch/neurosciencegraph/data/embeddings/04062a3c-9ce7-4448-85a2-ac8af4f6c5d8" }
                      },
                      "must": { "exists": { "field": "embedding" } }
                    }
                },
                "script": {
                    "source": "1 / (1 + l2norm(params.query_vector, doc['embedding']))",
                    "params": {
                      "query_vector": [-0.12997779250144958, -0.15828174352645874, -0.1657986342906952, -0.15665572881698608, -0.04524986818432808, -0.09484400600194931, -0.09314610809087753, -0.0938241109251976, 0.0, 0.0, 0.0, 0.0]
                    }
                }
            }
          }
        }

{
          "size": 200,
          "query": {
            "script_score": {
                "query": {
                    "bool" : {
                      "must_not" : {
                        "term" : { "@id": "https://bbp.epfl.ch/neurosciencegraph/data/embeddings/5146322f-3187-4405-b0fe-d1a0f08939b3" }
                      },
                      "must": { "exists": { "field": "embedding" } }
                    }
                },
                "script": {
                    "source": "1 / (1 + l2norm(params.query_vector, doc['embedding']))",
                    "params": {
                      "query_vector": [-0.12327883392572403, -0.17964577674865723, -0.19043384492397308, -0.026067432016134262, -0.09709511697292328, -0.07893049716949463, -0.13888444006443024, -0.15709662437438965, 0.0, 0.0, 0.0, 0.0]
                    }
                }
            }
          }
        }

{
          "size": 200,
          "query": {
            "script_score": {
                "query": {
                    "bool" : {
                      "must_not" : {
                        "term" : { "@id": "https://bbp.epfl.ch/neurosciencegraph/data/embeddings/00ba9894-73e3-4a2c-b606-97b66155fb09" }
                      },
                      "must": { "exists": { "field": "embedding" } }
                    }
                },
                "script": {
                    "source": "1 / (1 + l2norm(params.query_vector, doc['embedding']))",
                    "params": {
                      "query_vector": [-0.022520381957292557, -0.15563595294952393, -0.16703039407730103, 0.10251180827617645, -0.03925660625100136, -0.03345133736729622, -0.04283394664525986, -0.12094090133905411, 0.0, 0.0, 0.0, 0.0]
                    }
                }
            }
          }
        }

{
          "size": 200,
          "query": {
            "script_score": {
                "query": {
                    "bool" : {
                      "must_not" : {
                        "term" : { "@id": "https://bbp.epfl.ch/neurosciencegraph/data/embeddings/b5364d73-917b-40dd-ac84-90f6e74ee819" }
                      },
                      "must": { "exists": { "field": "embedding" } }
                    }
                },
                "script": {
                    "source": "1 / (1 + l2norm(params.query_vector, doc['embedding']))",
                    "params": {
                      "query_vector": [-0.015030517242848873, -0.045835938304662704, -0.08142305165529251, 0.06233079731464386, 0.05429430678486824, 0.01014118455350399, -0.14803211390972137, 0.050798796117305756, 0.0, 0.0, 0.0, 0.0]
                    }
                }
            }
          }
        }

{
          "size": 200,
          "query": {
            "script_score": {
                "query": {
                    "bool" : {
                      "must_not" : {
                        "term" : { "@id": "https://bbp.epfl.ch/neurosciencegraph/data/embeddings/4bff70ab-7a6c-4165-ad13-e3777b20e2d6" }
                      },
                      "must": { "exists": { "field": "embedding" } }
                    }
                },
                "script": {
                    "source": "1 / (1 + l2norm(params.query_vector, doc['embedding']))",
                    "params": {
                      "query_vector": [-0.11916255950927734, -0.15233615040779114, -0.09731276333332062, -0.07830274850130081, -0.1387762725353241, -0.1730019450187683, -0.12516294419765472, -0.1480576992034912, 0.0, 0.0, 0.0, 0.0]
                    }
                }
            }
          }
        }

{
          "size": 200,
          "query": {
            "script_score": {
                "query": {
                    "bool" : {
                      "must_not" : {
                        "term" : { "@id": "https://bbp.epfl.ch/neurosciencegraph/data/embeddings/f7f9b897-b7a2-463b-a4a4-f13f54643b93" }
                      },
                      "must": { "exists": { "field": "embedding" } }
                    }
                },
                "script": {
                    "source": "1 / (1 + l2norm(params.query_vector, doc['embedding']))",
                    "params": {
                      "query_vector": [-0.13525483012199402, -0.19984935224056244, -0.1975780427455902, -0.001958824461326003, -0.14319808781147003, -0.09281966090202332, -0.11601527780294418, -0.13901875913143158, 0.0, 0.0, 0.0, 0.0]
                    }
                }
            }
          }
        }

{
          "size": 200,
          "query": {
            "script_score": {
                "query": {
                    "bool" : {
                      "must_not" : {
                        "term" : { "@id": "https://bbp.epfl.ch/neurosciencegraph/data/embeddings/e5c6ae05-56c2-40f3-aedf-302c31821082" }
                      },
                      "must": { "exists": { "field": "embedding" } }
                    }
                },
                "script": {
                    "source": "1 / (1 + l2norm(params.query_vector, doc['embedding']))",
                    "params": {
                      "query_vector": [-0.19367937743663788, -0.13647881150245667, -0.1241651326417923, -0.14259237051010132, -0.18415918946266174, -0.014978459104895592, -0.006243281997740269, -0.13901875913143158, 0.0, 0.0, 0.0, 0.0]
                    }
                }
            }
          }
        }

## Compute boosting factors and create necessary resources

Compute boosting factors for all the data points (vectors) indexed by the new master view and push them as separate resources into their respective projects. Tag each resource with the new UUID of the master view.
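
The similarity queries printed by this notebook score each document with `1 / (1 + l2norm(params.query_vector, doc['embedding']))`, where Elasticsearch's `l2norm` is the Euclidean distance between the two vectors. As a rough local sanity check, the same score can be reproduced in plain Python. This is only a sketch: the function name `l2_similarity` is hypothetical and is not one of the notebook's helpers.

```python
import math


def l2_similarity(query_vector, doc_vector):
    """Return 1 / (1 + Euclidean distance) between two equal-length vectors.

    The score is 1 for identical vectors and approaches 0 as they move apart.
    """
    if len(query_vector) != len(doc_vector):
        raise ValueError("vectors must have the same dimension")
    distance = math.sqrt(
        sum((q - d) ** 2 for q, d in zip(query_vector, doc_vector))
    )
    return 1.0 / (1.0 + distance)


# Toy vectors for illustration only (not taken from the indexed embeddings).
print(l2_similarity([0.0, 0.0], [0.0, 0.0]))  # identical vectors: distance 0
print(l2_similarity([0.0, 0.0], [3.0, 4.0]))  # distance 5, so the score is 1 / 6
```

Because the score is a monotonically decreasing function of the distance, ranking by this score is the same as ranking by nearest Euclidean neighbour.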

In [406]:
# MASTER_VIEW["views"] is a single mapping when the master view aggregates
# one view and a list of mappings otherwise; normalize both cases to a list
# of (project, view ID) pairs.
views = MASTER_VIEW["views"]
if isinstance(views, dict):  # also covers OrderedDict
    views = [views]
all_views = [(el["project"], el["viewId"]) for el in views]

In [436]:
# Flatten the statistics series into a {statistic: value} lookup (e.g. "min", "max")
stats_json = [forge_atlas.as_json(el) for el in stats_resource.series]
stats_json = {el["statistic"]: el["value"] for el in stats_json}
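
The `min` and `max` statistics are passed to `get_score_deviation` in the next cell, presumably so that raw similarity scores can be rescaled before the boosting factors are derived (the TODO at the top of this document asks for normalized scores). The helper's implementation is not shown here, so the following is only a sketch of min-max normalization under that assumption; `normalize_score` is a hypothetical name.

```python
def normalize_score(score, min_score, max_score):
    """Rescale a raw similarity score to [0, 1] using view-wide min/max statistics."""
    if max_score <= min_score:
        # Degenerate statistics (e.g. a single point): there is no spread to rescale.
        return 0.0
    return (score - min_score) / (max_score - min_score)


# Illustrative values only; real ones would come from stats_json["min"] / ["max"].
example_stats = {"min": 0.25, "max": 0.75}
print(normalize_score(0.5, example_stats["min"], example_stats["max"]))
```

Rescaling with statistics computed over the whole master view, rather than per bucket, would keep scores from different buckets comparable.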

In [407]:
# Update all the individual buckets referenced by the new master view
ALL_DEVIATIONS = {}
for bucket, view_id in all_views:
    print(f"(Re-)computing boosting factors in '{bucket}'...")
    deviations = {}
    bucket_forge = KnowledgeGraphForge(
        "https://raw.githubusercontent.com/BlueBrain/nexus-forge/master/examples/notebooks/use-cases/prod-forge-nexus.yml",
        token=TOKEN,
        endpoint=ENDPOINT,
        bucket=bucket)

    # Compute the local similarity score deviation of every point in this bucket
    set_elastic_view(bucket_forge, view_id)
    all_vectors = get_all_vectors(bucket_forge, HARD_RESOURCE_LIMIT)
    for point_id, vector in all_vectors.items():
        deviations[point_id] = get_score_deviation(
            bucket_forge, point_id, vector, stats_json["min"], stats_json["max"],
            NEIGHBORHOOD_SIZE, formula, VECTOR_PARAMETER)
    ALL_DEVIATIONS.update(deviations)

    print(f"Registering/updating boosting factors in '{bucket}'...")
    # Register the boosting factors in the current bucket
    boosting_resources = register_boosting_data(
        bucket_forge, view_id, MASTER_VIEW_ID, deviations, formula,
        VECTOR_PARAMETER, MASTER_TAG)

(Re-)computing boosting factors in 'dke/seu-embeddings'...
{
        "from" : 0,
        "size" : 10000,
        "query": {
            "term": {"_deprecated": false}
        }
    }
    
{
      "size": 20,
      "query": {
        "script_score": {
          "query": {
                "exists": {
                    "field": "embedding"
                }
          },
          "script": {
            "source": "1 / (1 + l2norm(params.query_vector, doc['embedding']))",
            "params": {
              "query_vector": [0.05515346676111221, -0.008430980145931244, 0.05259247124195099, 0.06433984637260437, 0.21012674272060394, 0.01546294242143631, 0.14469321072101593, 0.22253848612308502, 0.0, 0.0, 0.0, 0.0]
            }
          }
        }
      }
    }

{
      "size": 20,
      "query": {
        "script_score": {
          "query": {
                "exists": {
                    "field": "embedding"
                }
          },
          "script": {
            "source": "1 / (1 + l2norm(params.query_vector, doc['embedding']))",
            "params": {
              "query_vector": [0.21863116323947906, 0.1926262527704239, 0.12612855434417725, -0.04013078659772873, 0.13093514740467072, 0.3098948001861572, 0.2727605402469635, 0.19542169570922852, 0.0, 0.0, 0.0, 0.0]
            }
          }
        }
      }
    }

{
      "size": 20,
      "query": {
        "script_score": {
          "query": {
                "exists": {
                    "field": "embedding"
                }
          },
          "script": {
            "source": "1 / (1 + l2norm(params.query_vector, doc['embedding']))",
            "params": {
              "query_vector": [0.2342330664396286, 0.1927383542060852, 0.1733049750328064, 0.13867472112178802, 0.2981276512145996, 0.11266954243183136, 0.03492121398448944, 0.24965529143810272, 0.0, 0.0, 0.0, 0.0]
            }
          }
        }
      }
    }

{
      "size": 20,
      "query": {
        "script_score": {
          "query": {
                "exists": {
                    "field": "embedding"
                }
          },
          "script": {
            "source": "1 / (1 + l2norm(params.query_vector, doc['embedding']))",
            "params": {
              "query_vector": [-0.04345085099339485, 0.005176150240004063, -0.036833327263593674, -0.016022179275751114, -0.05251597985625267, -0.1477762758731842, -0.11601527780294418, 0.005604137666523457, 0.0, 0.0, 0.0, 0.0]
            }
          }
        }
      }
    }

{
      "size": 20,
      "query": {
        "script_score": {
          "query": {
                "exists": {
                    "field": "embedding"
                }
          },
          "script": {
            "source": "1 / (1 + l2norm(params.query_vector, doc['embedding']))",
            "params": {
              "query_vector": [-0.0014045657590031624, 0.03241165354847908, 0.07439462840557098, 0.09648465365171432, 0.06196695193648338, 0.36055102944374084, -0.0519816130399704, 0.005604137666523457, 0.0, 0.0, 0.0, 0.0]
            }
          }
        }
      }
    }

{
      "size": 20,
      "query": {
        "script_score": {
          "query": {
                "exists": {
                    "field": "embedding"
                }
          },
          "script": {
            "source": "1 / (1 + l2norm(params.query_vector, doc['embedding']))",
            "params": {
              "query_vector": [0.5627924203872681, 0.14857885241508484, 0.08942210674285889, 0.22104579210281372, 0.2552162706851959, 0.0993630513548851, -0.04740777984261513, 0.0236820001155138, 0.0, 0.0, 0.0, 0.0]
            }
          }
        }
      }
    }

{
      "size": 20,
      "query": {
        "script_score": {
          "query": {
                "exists": {
                    "field": "embedding"
                }
          },
          "script": {
            "source": "1 / (1 + l2norm(params.query_vector, doc['embedding']))",
            "params": {
              "query_vector": [-0.11398883163928986, -0.03006223775446415, -0.07193849980831146, -0.12250186502933502, -0.12339934706687927, -0.10780269652605057, -0.20291811227798462, -0.11190196871757507, 0.0, 0.0, 0.0, 0.0]
            }
          }
        }
      }
    }

{
      "size": 20,
      "query": {
        "script_score": {
          "query": {
                "exists": {
                    "field": "embedding"
                }
          },
          "script": {
            "source": "1 / (1 + l2norm(params.query_vector, doc['embedding']))",
            "params": {
              "query_vector": [-0.10846658051013947, 0.004554530140012503, -0.049643635749816895, -0.13254711031913757, 0.2577633857727051, -0.05704067647457123, -0.04283394664525986, 0.24965529143810272, 0.0, 0.0, 0.0, 0.0]
            }
          }
        }
      }
    }

{
      "size": 20,
      "query": {
        "script_score": {
          "query": {
                "exists": {
                    "field": "embedding"
                }
          },
          "script": {
            "source": "1 / (1 + l2norm(params.query_vector, doc['embedding']))",
            "params": {
              "query_vector": [0.25671741366386414, 0.18041640520095825, 0.12637491524219513, 0.35565218329429626, 0.2645739018917084, 0.22951838374137878, 0.18585771322250366, 0.12311024963855743, 0.0, 0.0, 0.0, 0.0]
            }
          }
        }
      }
    }

{
      "size": 20,
      "query": {
        "script_score": {
          "query": {
                "exists": {
                    "field": "embedding"
                }
          },
          "script": {
            "source": "1 / (1 + l2norm(params.query_vector, doc['embedding']))",
            "params": {
              "query_vector": [-0.01800803653895855, -0.16203764081001282, -0.12822994589805603, 0.0924665555357933, 0.01892445981502533, -0.07670256495475769, -0.04283394664525986, -0.030551588162779808, 0.0, 0.0, 0.0, 0.0]
            }
          }
        }
      }
    }

{
      "size": 20,
      "query": {
        "script_score": {
          "query": {
                "exists": {
                    "field": "embedding"
                }
          },
          "script": {
            "source": "1 / (1 + l2norm(params.query_vector, doc['embedding']))",
            "params": {
              "query_vector": [-0.10232459008693695, -0.11990533024072647, -0.04274577647447586, 0.004068327601999044, -0.13244746625423431, 0.02577097900211811, -0.024538613855838776, -0.1932523548603058, 0.0, 0.0, 0.0, 0.0]
            }
          }
        }
      }
    }

{
      "size": 20,
      "query": {
        "script_score": {
          "query": {
                "exists": {
                    "field": "embedding"
                }
          },
          "script": {
            "source": "1 / (1 + l2norm(params.query_vector, doc['embedding']))",
            "params": {
              "query_vector": [-0.10866944491863251, -0.18496832251548767, -0.21359093487262726, -0.14661046862602234, -0.10372734814882278, -0.04164477065205574, -0.06112927943468094, -0.04862945154309273, 0.0, 0.0, 0.0, 0.0]
            }
          }
        }
      }
    }

{
      "size": 20,
      "query": {
        "script_score": {
          "query": {
                "exists": {
                    "field": "embedding"
                }
          },
          "script": {
            "source": "1 / (1 + l2norm(params.query_vector, doc['embedding']))",
            "params": {
              "query_vector": [0.13140982389450073, 0.04605427011847496, -0.008872369304299355, 0.1326475739479065, 0.30046331882476807, 0.16000168025493622, -0.04283394664525986, 0.13214917480945587, 0.0, 0.0, 0.0, 0.0]
            }
          }
        }
      }
    }

{
      "size": 20,
      "query": {
        "script_score": {
          "query": {
                "exists": {
                    "field": "embedding"
                }
          },
          "script": {
            "source": "1 / (1 + l2norm(params.query_vector, doc['embedding']))",
            "params": {
              "query_vector": [0.14639848470687866, 0.21049830317497253, 0.26334667205810547, 0.18689192831516266, 0.11450783908367157, 0.21402956545352936, 0.3505156934261322, 0.2948499321937561, 0.0, 0.0, 0.0, 0.0]
            }
          }
        }
      }
    }

<action> _register_one
<succeeded> True
<action> _tag_one
<succeeded> True
(Re-)computing boosting factors in 'dke/seu-embeddings-2'...
{
        "from" : 0,
        "size" : 10000,
        "query": {
            "term": {"_deprecated": false}
        }
    }
    
{
      "size": 20,
      "query": {
        "script_score": {
          "query": {
                "exists": {
                    "field": "embedding"
                }
          },
          "script": {
            "source": "1 / (1 + l2norm(params.query_vector, doc['embedding']))",
            "params": {
              "query_vector": [0.06146248057484627, 0.16324076056480408, 0.0968126654624939, -0.02807648293673992, -0.014020074158906937, -0.13686126470565796, -0.11144144088029861, 0.06887666136026382, 0.0, 0.0, 0.0, 0.0]
            }
          }
        }
      }
    }
<action> _register_one
<succeeded> True
<action> _tag_one
<succeeded> True

In each individual embedding data bucket, create a new ES view for boosting factors (tagged with the new master view tag).

In [408]:
new_boosting_views = []
for bucket, view_id in all_views:
    print(f"Creating a new ES view on boosting factors in '{bucket}'...")
    org = bucket.split("/")[0]
    proj = bucket.split("/")[1]
    boosting_view = nxs.views.create_es(
        org, proj,
        mapping=BOOSTING_VIEW_MAPPING,
        tag=MASTER_TAG,
        resource_types=[
            "https://staging.nexus.ocp.bbp.epfl.ch/v1/resources/dke/seu-embeddings/_/SimilarityBoostingFactor"],
        # TODO: make sure SimilarityBoostingFactor is added to some global context
        source_as_text=False,
        include_metadata=True, 
        include_deprecated=False)
    new_boosting_views.append(boosting_view)

Creating a new ES view on boosting factors in 'dke/seu-embeddings'...
Creating a new ES view on boosting factors in 'dke/seu-embeddings-2'...
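
For orientation, here is a minimal sketch of the kind of view body the call above presumably results in. It assumes the standard Nexus Delta ElasticSearchView fields (`resourceTypes`, `resourceTag`, `sourceAsText`, `includeMetadata`, `includeDeprecated`). The helper name `es_view_payload`, the placeholder mapping and the tag value are hypothetical; the real `BOOSTING_VIEW_MAPPING` and `MASTER_TAG` are defined elsewhere in the notebook.

```python
def es_view_payload(mapping, resource_type, tag):
    """Build an ElasticSearchView body restricted to one resource type and one tag.

    Hypothetical helper: it only assembles a dict and does not call Nexus.
    """
    return {
        "@type": ["ElasticSearchView"],
        "mapping": mapping,
        "resourceTypes": [resource_type],  # index only resources of this type
        "resourceTag": tag,                # index only the revision carrying this tag
        "sourceAsText": False,
        "includeMetadata": True,
        "includeDeprecated": False,
    }


# Placeholder mapping, NOT the notebook's BOOSTING_VIEW_MAPPING.
placeholder_mapping = {"properties": {"@id": {"type": "keyword"}}}

payload = es_view_payload(
    placeholder_mapping,
    "https://staging.nexus.ocp.bbp.epfl.ch/v1/resources/dke/seu-embeddings/_/SimilarityBoostingFactor",
    "example-master-tag",  # placeholder for MASTER_TAG
)
```

Restricting the view by both type and tag means only the boosting factors tagged for the current master view are indexed.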


In [409]:
for el in new_boosting_views:
    print("Project: ", el["_project"])
    print("View: ", el["@id"])
    print()

Project:  https://staging.nexus.ocp.bbp.epfl.ch/v1/projects/dke/seu-embeddings
View:  https://bbp.epfl.ch/neurosciencegraph/data/6a5b3172-4615-4878-816b-57bec6cb16f6

Project:  https://staging.nexus.ocp.bbp.epfl.ch/v1/projects/dke/seu-embeddings-2
View:  https://bbp.epfl.ch/neurosciencegraph/data/d33d168d-66f0-4cfa-b5c1-c9422a528dfe



Create a new aggregated view for boosting factors targeting all the new boosting ES views (in `{ATLAS_CONFIG_ORG}/{ATLAS_CONFIG_PROJECT}`).

In [410]:
# Create a new aggregated view over the new boosting ES views
BOOSTING_VIEW = nxs.views.create_es_aggregated(
    ATLAS_CONFIG_ORG,
    ATLAS_CONFIG_PROJECT,
    new_boosting_views,
    view_id=None
)
BOOSTING_VIEW_ID = BOOSTING_VIEW["@id"]

In [411]:
BOOSTING_VIEW_ID

'https://bbp.epfl.ch/neurosciencegraph/data/4f328783-d278-4154-bf88-dea4b15ccb8e'
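
As a rough illustration of what an aggregated view points at, the sketch below builds a body listing `project`/`viewId` pairs, which is how the Nexus Delta views API describes an AggregateElasticSearchView. Whether `nxs.views.create_es_aggregated` assembles exactly this body is an assumption; `project_label` and `aggregated_view_payload` are hypothetical helpers.

```python
from urllib.parse import urlparse


def project_label(project_url):
    """Return '<org>/<project>' from a URL assumed to end in /projects/<org>/<project>."""
    parts = urlparse(project_url).path.rstrip("/").split("/")
    return "/".join(parts[-2:])


def aggregated_view_payload(views):
    """Build an AggregateElasticSearchView body from created-view responses."""
    return {
        "@type": ["AggregateElasticSearchView"],
        "views": [
            {"project": project_label(view["_project"]), "viewId": view["@id"]}
            for view in views
        ],
    }


# The two views printed above, reduced to the fields used here.
example_views = [
    {
        "_project": "https://staging.nexus.ocp.bbp.epfl.ch/v1/projects/dke/seu-embeddings",
        "@id": "https://bbp.epfl.ch/neurosciencegraph/data/6a5b3172-4615-4878-816b-57bec6cb16f6",
    },
    {
        "_project": "https://staging.nexus.ocp.bbp.epfl.ch/v1/projects/dke/seu-embeddings-2",
        "@id": "https://bbp.epfl.ch/neurosciencegraph/data/d33d168d-66f0-4cfa-b5c1-c9422a528dfe",
    },
]
agg_payload = aggregated_view_payload(example_views)
```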

__IMPORTANT__: Before executing the next step, make sure that indexing in the aggregated view has finished. Re-run the following cell until it stops throwing an assertion error. Once no error is raised, all the resources have been indexed and we can proceed with the rest of the notebook.

In [441]:
set_elastic_view(forge_atlas, MASTER_VIEW_ID)
all_vectors = forge_atlas.elastic(f"""{{
    "from" : 0,
    "size" : {HARD_RESOURCE_LIMIT},
    "query": {{
        "term": {{"_deprecated": false}}
    }}
}}
""")

set_elastic_view(forge_atlas, BOOSTING_VIEW_ID)
all_factors = forge_atlas.elastic(f"""{{
    "from" : 0,
    "size" : {HARD_RESOURCE_LIMIT},
    "query": {{
        "term": {{"_deprecated": false}}
    }}
}}
""")
print(len(all_vectors))
print(len(all_factors))
assert len(all_vectors) == len(all_factors)

{
    "from" : 0,
    "size" : 10000,
    "query": {
        "term": {"_deprecated": false}
    }
}

{
    "from" : 0,
    "size" : 10000,
    "query": {
        "term": {"_deprecated": false}
    }
}

400
400
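
Re-running the cell by hand could be replaced by a small polling loop. The sketch below is one possible shape, not part of the notebook: `wait_until_indexed` is a hypothetical helper that takes two zero-argument callables returning the current counts, and the interval and timeout defaults are arbitrary.

```python
import time


def wait_until_indexed(count_vectors, count_factors,
                       interval=5.0, timeout=600.0, sleep=time.sleep):
    """Poll both counts until they match; raise TimeoutError if they never do."""
    deadline = time.monotonic() + timeout
    while True:
        n_vectors, n_factors = count_vectors(), count_factors()
        if n_vectors == n_factors:
            return n_vectors
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"indexing incomplete: {n_vectors} vectors vs {n_factors} factors")
        sleep(interval)


# Stand-in counters simulating an index that catches up on the third poll.
_factor_counts = iter([100, 250, 400])
indexed = wait_until_indexed(
    lambda: 400,
    lambda: next(_factor_counts),
    sleep=lambda seconds: None,  # skip real waiting in this demonstration
)
```

In the notebook, each callable would wrap `set_elastic_view` followed by the corresponding `forge_atlas.elastic` query and return the length of the result.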


## Compute boosted statistics

Compute statistics (min/max/mean/std) of similarity values after boosting and push them as an `ElasticSearchViewStatistics` resource (created if it doesn't exist, updated otherwise), tagged with the new revision of the master view.
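
To make the statistics concrete, here is a small local sketch of the summary step. It mirrors the script formula `1 / (1 + l2norm(query_vector, embedding))` and scores every vector against all the others. It does not reproduce `get_view_stats`: how boosting factors enter the score is not shown in this section, so they are left out, and the use of population standard deviation is an assumption.

```python
import math
import statistics


def similarity(query_vector, embedding):
    """Local equivalent of the ES script: 1 / (1 + Euclidean distance)."""
    return 1.0 / (1.0 + math.dist(query_vector, embedding))


def pairwise_stats(vectors):
    """Score each vector against every other vector and summarise the scores."""
    scores = [
        similarity(query, other)
        for i, query in enumerate(vectors)
        for j, other in enumerate(vectors)
        if i != j  # a vector is never scored against itself
    ]
    return {
        "min": min(scores),
        "max": max(scores),
        "mean": statistics.fmean(scores),
        "std": statistics.pstdev(scores),  # population std (assumption)
    }


# Toy vectors: distances are 5, 10 and 5, so scores are 1/6, 1/11 and 1/6.
toy_stats = pairwise_stats([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
```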

In [415]:
set_elastic_view(forge_atlas, MASTER_VIEW_ID)
values, stats = get_view_stats(
    forge_atlas, vectors,
    formula,
    VECTOR_PARAMETER,
    HARD_RESOURCE_LIMIT,
    ALL_DEVIATIONS)
stats_resource = register_stats(
    forge_atlas, MASTER_VIEW_ID,
    values.shape[0], stats, formula, VECTOR_PARAMETER,
    tag=MASTER_TAG, boosted=True)

{
          "size": 200,
          "query": {
            "script_score": {
                "query": {
                    "bool" : {
                      "must_not" : {
                        "term" : { "@id": "https://bbp.epfl.ch/neurosciencegraph/data/embeddings/3acaffd6-4ede-4d30-b29c-f88a9dda2859" }
                      },
                      "must": { "exists": { "field": "embedding" } }
                    }
                },
                "script": {
                    "source": "1 / (1 + l2norm(params.query_vector, doc['embedding']))",
                    "params": {
                      "query_vector": [-0.1416456550359726, 0.09924907982349396, 0.11578177660703659, -0.16871002316474915, -0.07568155974149704, 0.2617238163948059, 0.1812838762998581, -0.04862945154309273, 0.0, 0.0, 0.0, 0.0]
                    }
                }
            }
          }
        }
{
          "size": 200,
          "query": {
            "script_score": {
                "query": {
 

{
          "size": 200,
          "query": {
            "script_score": {
                "query": {
                    "bool" : {
                      "must_not" : {
                        "term" : { "@id": "https://bbp.epfl.ch/neurosciencegraph/data/embeddings/80a9c0da-3e3c-44dd-a5c7-efa2c182137b" }
                      },
                      "must": { "exists": { "field": "embedding" } }
                    }
                },
                "script": {
                    "source": "1 / (1 + l2norm(params.query_vector, doc['embedding']))",
                    "params": {
                      "query_vector": [0.02870853990316391, 0.11050949990749359, 0.054193757474422455, -0.018031230196356773, 0.009494900703430176, 0.02084416337311268, 0.007478217128664255, -0.012473725713789463, 0.0, 0.0, 0.0, 0.0]
                    }
                }
            }
          }
        }
{
          "size": 200,
          "query": {
            "script_score": {
                "query

{
          "size": 200,
          "query": {
            "script_score": {
                "query": {
                    "bool" : {
                      "must_not" : {
                        "term" : { "@id": "https://bbp.epfl.ch/neurosciencegraph/data/embeddings/b49922e2-5adf-4d80-8606-90fb144053fb" }
                      },
                      "must": { "exists": { "field": "embedding" } }
                    }
                },
                "script": {
                    "source": "1 / (1 + l2norm(params.query_vector, doc['embedding']))",
                    "params": {
                      "query_vector": [-0.11504893004894257, -0.00048731331480666995, -0.028334181755781174, -0.1405833214521408, -0.12035557627677917, -0.11000135540962219, -0.1297367811203003, -0.012473725713789463, 0.0, 0.0, 0.0, 0.0]
                    }
                }
            }
          }
        }
{
          "size": 200,
          "query": {
            "script_score": {
                "q

{
          "size": 200,
          "query": {
            "script_score": {
                "query": {
                    "bool" : {
                      "must_not" : {
                        "term" : { "@id": "https://bbp.epfl.ch/neurosciencegraph/data/embeddings/b6d9a7e5-a382-4759-9c17-ebeb478e3118" }
                      },
                      "must": { "exists": { "field": "embedding" } }
                    }
                },
                "script": {
                    "source": "1 / (1 + l2norm(params.query_vector, doc['embedding']))",
                    "params": {
                      "query_vector": [0.11990752816200256, 0.1400831788778305, 0.06072208657860756, 0.01612263172864914, 0.14073938131332397, 0.12482598423957825, 0.03492121398448944, 0.15022704005241394, 0.0, 0.0, 0.0, 0.0]
                    }
                }
            }
          }
        }
{
          "size": 200,
          "query": {
            "script_score": {
                "query": {
   

{
          "size": 200,
          "query": {
            "script_score": {
                "query": {
                    "bool" : {
                      "must_not" : {
                        "term" : { "@id": "https://bbp.epfl.ch/neurosciencegraph/data/embeddings/0e9f8614-dbbd-4b2b-8d51-e47f6ec72cbf" }
                      },
                      "must": { "exists": { "field": "embedding" } }
                    }
                },
                "script": {
                    "source": "1 / (1 + l2norm(params.query_vector, doc['embedding']))",
                    "params": {
                      "query_vector": [-0.13247039914131165, 0.10699722915887833, 0.1580311506986618, -0.25509920716285706, 0.0359426774084568, 0.000355557887814939, 0.06693805009126663, -0.03959051892161369, 0.0, 0.0, 0.0, 0.0]
                    }
                }
            }
          }
        }
{
          "size": 200,
          "query": {
            "script_score": {
                "query": {


{
          "size": 200,
          "query": {
            "script_score": {
                "query": {
                    "bool" : {
                      "must_not" : {
                        "term" : { "@id": "https://bbp.epfl.ch/neurosciencegraph/data/embeddings/a869e7d6-0900-44e9-9df7-78a446bfa231" }
                      },
                      "must": { "exists": { "field": "embedding" } }
                    }
                },
                "script": {
                    "source": "1 / (1 + l2norm(params.query_vector, doc['embedding']))",
                    "params": {
                      "query_vector": [-0.049087826162576675, 0.05230037868022919, 0.01859588548541069, -0.14861951768398285, -0.13551540672779083, -0.07576058059930801, -0.08857227861881256, -0.13901875913143158, 0.0, 0.0, 0.0, 0.0]
                    }
                }
            }
          }
        }
{
          "size": 200,
          "query": {
            "script_score": {
                "query

{
          "size": 200,
          "query": {
            "script_score": {
                "query": {
                    "bool" : {
                      "must_not" : {
                        "term" : { "@id": "https://bbp.epfl.ch/neurosciencegraph/data/embeddings/9d73bb33-8b35-40e1-8e21-78c28e2d5e8f" }
                      },
                      "must": { "exists": { "field": "embedding" } }
                    }
                },
                "script": {
                    "source": "1 / (1 + l2norm(params.query_vector, doc['embedding']))",
                    "params": {
                      "query_vector": [0.28055766224861145, 0.06220844388008118, 0.020073996856808662, 0.5465120077133179, 0.2697761654853821, 0.10753434896469116, 0.11725021153688431, 0.168304905295372, 0.0, 0.0, 0.0, 0.0]
                    }
                }
            }
          }
        }
{
          "size": 200,
          "query": {
            "script_score": {
                "query": {
     

{
          "size": 200,
          "query": {
            "script_score": {
                "query": {
                    "bool" : {
                      "must_not" : {
                        "term" : { "@id": "https://bbp.epfl.ch/neurosciencegraph/data/embeddings/6e89282e-66e0-4da9-8cf1-9d8c5861b68b" }
                      },
                      "must": { "exists": { "field": "embedding" } }
                    }
                },
                "script": {
                    "source": "1 / (1 + l2norm(params.query_vector, doc['embedding']))",
                    "params": {
                      "query_vector": [0.09292639791965485, 0.19100774824619293, 0.1591397374868393, -0.04816699028015137, 0.09499838203191757, 0.07205074280500412, -0.001669449033215642, 0.09599345177412033, 0.0, 0.0, 0.0, 0.0]
                    }
                }
            }
          }
        }
{
          "size": 200,
          "query": {
            "script_score": {
                "query": {


{
          "size": 200,
          "query": {
            "script_score": {
                "query": {
                    "bool" : {
                      "must_not" : {
                        "term" : { "@id": "https://bbp.epfl.ch/neurosciencegraph/data/embeddings/55bd92dc-756f-4e6c-979a-cbc65abc395c" }
                      },
                      "must": { "exists": { "field": "embedding" } }
                    }
                },
                "script": {
                    "source": "1 / (1 + l2norm(params.query_vector, doc['embedding']))",
                    "params": {
                      "query_vector": [-0.06009721755981445, -0.08682508766651154, -0.02463890239596367, 0.04826744273304939, -0.07449735701084137, -0.06135303154587746, 0.02119971625506878, -0.11190196871757507, 0.0, 0.0, 0.0, 0.0]
                    }
                }
            }
          }
        }
{
          "size": 200,
          "query": {
            "script_score": {
                "query"

{
          "size": 200,
          "query": {
            "script_score": {
                "query": {
                    "bool" : {
                      "must_not" : {
                        "term" : { "@id": "https://bbp.epfl.ch/neurosciencegraph/data/embeddings/09e307f6-eba9-43e3-be7d-248764bb5182" }
                      },
                      "must": { "exists": { "field": "embedding" } }
                    }
                },
                "script": {
                    "source": "1 / (1 + l2norm(params.query_vector, doc['embedding']))",
                    "params": {
                      "query_vector": [-0.08896669745445251, -0.120065838098526, -0.15286515653133392, -0.024058381095528603, -0.1269165426492691, -0.0717124491930008, -0.1617536097764969, -0.1932523548603058, 0.0, 0.0, 0.0, 0.0]
                    }
                }
            }
          }
        }
{
          "size": 200,
          "query": {
            "script_score": {
                "query": {

{
          "size": 200,
          "query": {
            "script_score": {
                "query": {
                    "bool" : {
                      "must_not" : {
                        "term" : { "@id": "https://bbp.epfl.ch/neurosciencegraph/data/embeddings/c3485a13-499b-4e21-ac22-687defd02584" }
                      },
                      "must": { "exists": { "field": "embedding" } }
                    }
                },
                "script": {
                    "source": "1 / (1 + l2norm(params.query_vector, doc['embedding']))",
                    "params": {
                      "query_vector": [-0.11398883163928986, -0.03006223775446415, -0.07193849980831146, -0.12250186502933502, -0.12339934706687927, -0.10780269652605057, -0.20291811227798462, -0.11190196871757507, 0.0, 0.0, 0.0, 0.0]
                    }
                }
            }
          }
        }
{
          "size": 200,
          "query": {
            "script_score": {
                "quer

{
          "size": 200,
          "query": {
            "script_score": {
                "query": {
                    "bool" : {
                      "must_not" : {
                        "term" : { "@id": "https://bbp.epfl.ch/neurosciencegraph/data/embeddings/8ba163a7-0030-4c49-8b8b-e3cbaf759f52" }
                      },
                      "must": { "exists": { "field": "embedding" } }
                    }
                },
                "script": {
                    "source": "1 / (1 + l2norm(params.query_vector, doc['embedding']))",
                    "params": {
                      "query_vector": [0.013589251786470413, 0.031079266220331192, 0.04729590192437172, 0.11456611007452011, 0.026384765282273293, -0.0622827410697937, 0.012052049860358238, 0.03272093087434769, 0.0, 0.0, 0.0, 0.0]
                    }
                }
            }
          }
        }
{
          "size": 200,
          "query": {
            "script_score": {
                "query": 

{
          "size": 200,
          "query": {
            "script_score": {
                "query": {
                    "bool" : {
                      "must_not" : {
                        "term" : { "@id": "https://bbp.epfl.ch/neurosciencegraph/data/embeddings/fdd1718a-cc09-4644-824a-50f5d5aef8e6" }
                      },
                      "must": { "exists": { "field": "embedding" } }
                    }
                },
                "script": {
                    "source": "1 / (1 + l2norm(params.query_vector, doc['embedding']))",
                    "params": {
                      "query_vector": [-0.08211369812488556, -0.057635094970464706, -0.07267755270004272, 0.034204088151454926, -0.05799919366836548, -0.06617438793182373, -0.06112927943468094, -0.021512657403945923, 0.0, 0.0, 0.0, 0.0]
                    }
                }
            }
          }
        }

{
          "size": 200,
          "query": {
            "script_score": {
                "query": {
                    "bool" : {
                      "must_not" : {
                        "term" : { "@id": "https://bbp.epfl.ch/neurosciencegraph/data/embeddings/a335d583-9caf-4411-a0f4-d226107e6bf2" }
                      },
                      "must": { "exists": { "field": "embedding" } }
                    }
                },
                "script": {
                    "source": "1 / (1 + l2norm(params.query_vector, doc['embedding']))",
                    "params": {
                      "query_vector": [-0.12530876696109772, -0.14476995170116425, -0.14424282312393188, 0.008086428977549076, -0.15411679446697235, -0.13088354468345642, -0.11144144088029861, -0.12997983396053314, 0.0, 0.0, 0.0, 0.0]
                    }
                }
            }
          }
        }

{
          "size": 200,
          "query": {
            "script_score": {
                "query": {
                    "bool" : {
                      "must_not" : {
                        "term" : { "@id": "https://bbp.epfl.ch/neurosciencegraph/data/embeddings/f22848c8-ff88-4a8e-92ba-105992ad6251" }
                      },
                      "must": { "exists": { "field": "embedding" } }
                    }
                },
                "script": {
                    "source": "1 / (1 + l2norm(params.query_vector, doc['embedding']))",
                    "params": {
                      "query_vector": [-0.024288898333907127, -0.14357905089855194, -0.1579153686761856, 0.06032174453139305, -0.028796866536140442, -0.19545023143291473, -0.13431060314178467, 0.04175986349582672, 0.0, 0.0, 0.0, 0.0]
                    }
                }
            }
          }
        }

{
          "size": 200,
          "query": {
            "script_score": {
                "query": {
                    "bool" : {
                      "must_not" : {
                        "term" : { "@id": "https://bbp.epfl.ch/neurosciencegraph/data/embeddings/161643ba-76b1-46d7-895d-a04fee358ce9" }
                      },
                      "must": { "exists": { "field": "embedding" } }
                    }
                },
                "script": {
                    "source": "1 / (1 + l2norm(params.query_vector, doc['embedding']))",
                    "params": {
                      "query_vector": [-0.10232459008693695, -0.11990533024072647, -0.04274577647447586, 0.004068327601999044, -0.13244746625423431, 0.02577097900211811, -0.024538613855838776, -0.1932523548603058, 0.0, 0.0, 0.0, 0.0]
                    }
                }
            }
          }
        }

{
          "size": 200,
          "query": {
            "script_score": {
                "query": {
                    "bool" : {
                      "must_not" : {
                        "term" : { "@id": "https://bbp.epfl.ch/neurosciencegraph/data/embeddings/29b691ca-82d4-4ba0-901c-ee26147afce7" }
                      },
                      "must": { "exists": { "field": "embedding" } }
                    }
                },
                "script": {
                    "source": "1 / (1 + l2norm(params.query_vector, doc['embedding']))",
                    "params": {
                      "query_vector": [-0.08662490546703339, -0.1449112892150879, -0.16050206124782562, -0.09437515586614609, -0.15244178473949432, -0.10130447149276733, -0.0977199450135231, -0.18421342968940735, 0.0, 0.0, 0.0, 0.0]
                    }
                }
            }
          }
        }

{
          "size": 200,
          "query": {
            "script_score": {
                "query": {
                    "bool" : {
                      "must_not" : {
                        "term" : { "@id": "https://bbp.epfl.ch/neurosciencegraph/data/embeddings/04e3f182-f5fe-42e2-86fa-5508e87d1f4c" }
                      },
                      "must": { "exists": { "field": "embedding" } }
                    }
                },
                "script": {
                    "source": "1 / (1 + l2norm(params.query_vector, doc['embedding']))",
                    "params": {
                      "query_vector": [-0.028496550396084785, 0.13865283131599426, 0.05727316066622734, -0.0762936994433403, 0.14037764072418213, 0.324924498796463, 0.2681867182254791, 0.0236820001155138, 0.0, 0.0, 0.0, 0.0]
                    }
                }
            }
          }
        }

{
          "size": 200,
          "query": {
            "script_score": {
                "query": {
                    "bool" : {
                      "must_not" : {
                        "term" : { "@id": "https://bbp.epfl.ch/neurosciencegraph/data/embeddings/5f8a123a-ddb3-4d3a-988b-a12b1d5c28f5" }
                      },
                      "must": { "exists": { "field": "embedding" } }
                    }
                },
                "script": {
                    "source": "1 / (1 + l2norm(params.query_vector, doc['embedding']))",
                    "params": {
                      "query_vector": [0.13254274427890778, -0.1164899468421936, -0.13192522525787354, 0.2531906068325043, 0.03350342437624931, -0.07395783066749573, -0.0977199450135231, 0.050798796117305756, 0.0, 0.0, 0.0, 0.0]
                    }
                }
            }
          }
        }

{
          "size": 200,
          "query": {
            "script_score": {
                "query": {
                    "bool" : {
                      "must_not" : {
                        "term" : { "@id": "https://bbp.epfl.ch/neurosciencegraph/data/embeddings/19854799-8c7f-4a8a-b5bf-430cd69f256d" }
                      },
                      "must": { "exists": { "field": "embedding" } }
                    }
                },
                "script": {
                    "source": "1 / (1 + l2norm(params.query_vector, doc['embedding']))",
                    "params": {
                      "query_vector": [-0.08049522340297699, -0.025934478268027306, -0.05481702834367752, -0.07026654481887817, -0.1650765985250473, -0.14860062301158905, -0.08857227861881256, -0.12094090133905411, 0.0, 0.0, 0.0, 0.0]
                    }
                }
            }
          }
        }
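
The queries printed above all share one shape and differ only in the excluded `@id` and the query vector. A minimal sketch of how such a body could be assembled is given below; the helper names `build_similarity_query` and `l2_similarity` and the placeholder `@id` are assumptions made for illustration, not functions from the notebook. The second helper reproduces the script's score locally, which can be handy for sanity-checking a ranking.

```python
import json
import math


def build_similarity_query(query_vector, exclude_id, size=200):
    """Return an ES script_score query body that ranks documents by L2 similarity."""
    return {
        "size": size,
        "query": {
            "script_score": {
                "query": {
                    "bool": {
                        # Exclude the embedding we are querying with.
                        "must_not": {"term": {"@id": exclude_id}},
                        # Keep only documents that carry a vector.
                        "must": {"exists": {"field": "embedding"}},
                    }
                },
                "script": {
                    # Same scoring expression as in the printed queries.
                    "source": "1 / (1 + l2norm(params.query_vector, doc['embedding']))",
                    "params": {"query_vector": list(query_vector)},
                },
            }
        },
    }


def l2_similarity(u, v):
    """Local equivalent of the script: 1 / (1 + Euclidean distance)."""
    return 1.0 / (1.0 + math.dist(u, v))


# Placeholder identifier and vector, used only to show the resulting structure.
body = build_similarity_query([0.0, 0.0, 0.0], "https://example.org/embeddings/placeholder")
print(json.dumps(body, indent=2))
print(l2_similarity([0.0, 0.0], [3.0, 4.0]))  # distance 5, so the score is 1/6
```

Because the score is `1 / (1 + distance)`, identical vectors score 1 and the score decays towards 0 as the distance grows.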

## Create a new ES view serving statistics

This view serves the statistics tagged with the new UUID of the master view.

In [416]:
STATISTICS_VIEW = nxs.views.create_es(
    ATLAS_CONFIG_ORG, ATLAS_CONFIG_PROJECT,
    mapping=STATS_VIEW_MAPPING,
    tag=MASTER_TAG,
    resource_types=[
        f"https://staging.nexus.ocp.bbp.epfl.ch/v1/resources/{ATLAS_CONFIG_ORG}/{ATLAS_CONFIG_PROJECT}/_/ElasticSearchViewStatistics"],
    source_as_text=False,
    include_metadata=True,
    include_deprecated=False)
STATISTICS_VIEW_ID = STATISTICS_VIEW["@id"]

## Update recommender configuration and clean up old resources

__IMPORTANT__: Check that the statistics have finished indexing (the following cell should not throw an assertion error).

In [424]:
set_elastic_view(forge_atlas, STATISTICS_VIEW_ID)
all_stats = forge_atlas.elastic(f"""{{
    "from" : 0,
    "size" : {HARD_RESOURCE_LIMIT},
    "query": {{
        "term": {{"_deprecated": false}}
    }}
}}
""")
print(len(all_stats))
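# Fail loudly if the statistics view has not finished indexing yet.
assert len(all_stats) == 2, "statistics view has not finished indexing"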

{
    "from" : 0,
    "size" : 10000,
    "query": {
        "term": {"_deprecated": false}
    }
}

2
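
Since indexing is asynchronous, a single assertion can fail simply because the cell ran too early. One possible alternative is to poll until the expected number of documents appears. The sketch below is self-contained: `wait_for_count`, its parameters and the fake counter are assumptions for illustration. In the notebook, `count_fn` could be something like `lambda: len(forge_atlas.elastic(query))`.

```python
import time


def wait_for_count(count_fn, expected, timeout=60.0, interval=2.0, sleep=time.sleep):
    """Poll count_fn() until it returns `expected` or the timeout elapses."""
    waited = 0.0
    while True:
        found = count_fn()
        if found == expected:
            return found
        if waited >= timeout:
            raise TimeoutError(
                f"expected {expected} documents, found {found} after {waited:.0f}s")
        sleep(interval)
        waited += interval


# Stand-in for a view that finishes indexing on the third poll.
_answers = iter([0, 1, 2])
print(wait_for_count(lambda: next(_answers), expected=2, sleep=lambda s: None))
```

The `sleep` argument is injectable only so that the example (and any test) runs without real delays.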


Fetch old boosting and statistics views

In [425]:
old_boosting_view = None
if "boostingView" in forge_atlas.as_json(current_config):
    old_boosting_view = nxs.views.fetch(
        ATLAS_CONFIG_ORG,
        ATLAS_CONFIG_PROJECT,
        current_config.boostingView)
    
old_statistics_view = None
if "statisticsView" in forge_atlas.as_json(current_config):
    old_statistics_view = nxs.views.fetch(
        ATLAS_CONFIG_ORG,
        ATLAS_CONFIG_PROJECT,
        current_config.statisticsView)
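
The two lookups above follow the same pattern: fetch the view only if the configuration references one. A small helper could express this once. This is a sketch under assumptions: `fetch_if_configured` is a hypothetical name, the configuration is modelled as a plain dict, and `fetch_fn` stands in for a call such as `nxs.views.fetch` with the organisation and project already bound.

```python
def fetch_if_configured(config, key, fetch_fn):
    """Return fetch_fn(config[key]) when the key is set, otherwise None."""
    view_id = config.get(key)
    if view_id is None:
        return None
    return fetch_fn(view_id)


# Placeholder configuration with only one of the two views set.
config = {"boostingView": "https://example.org/views/boosting"}
fetched = {
    key: fetch_if_configured(config, key, lambda view_id: {"@id": view_id})
    for key in ("boostingView", "statisticsView")
}
print(fetched)
```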

Update current Atlas recommendation config

In [426]:
current_config.similarityView = MASTER_VIEW_ID
current_config.boostingView = BOOSTING_VIEW_ID
current_config.statisticsView = STATISTICS_VIEW_ID
update_current_config(forge_atlas, config_resource, current_config)

<action> _update_one
<succeeded> True


Deprecate the old master, boosting and statistics views.

In [427]:
MASTER_VIEW_ID

'https://bbp.epfl.ch/neurosciencegraph/data/ef19bdf2-3617-4384-9a30-623ac40fdf39'

In [428]:
if old_master_view is not None:
    deprecate_individual_views(old_master_view)
    nxs.views.deprecate_es(old_master_view)

if old_boosting_view is not None:
    deprecate_individual_views(old_boosting_view)
    nxs.views.deprecate_es(old_boosting_view)

if old_statistics_view is not None:
    nxs.views.deprecate_es(old_statistics_view)
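
If this cell is re-run after the configuration has been updated, the "old" views may be the ones currently in use. A possible guard is to skip any view whose `@id` is still referenced by the configuration. The sketch below assumes a hypothetical helper `deprecate_stale_views`; `deprecate_fn` stands in for the deprecation calls used above, and the identifiers are placeholders.

```python
def deprecate_stale_views(old_views, active_ids, deprecate_fn):
    """Deprecate each old view unless it is missing or still referenced."""
    deprecated = []
    for view in old_views:
        if view is None:
            continue  # nothing was configured previously
        if view["@id"] in active_ids:
            continue  # still in use by the current configuration
        deprecate_fn(view)
        deprecated.append(view["@id"])
    return deprecated


# Placeholder views: one stale, one missing, one still active.
calls = []
done = deprecate_stale_views(
    [{"@id": "old-master"}, None, {"@id": "current-stats"}],
    active_ids={"current-stats"},
    deprecate_fn=calls.append,
)
print(done)
```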