# Beyond accuracy evaluation (bacceval)

## Notes

### Metrics

 * Diversity at 100 (div@100): we keep only the first 100 items of the ranking and compute their diversity in terms of topic, TFIDF and style, i.e. the mean of the pairwise distances: $$diversity(R) = \frac{\sum_{i=1}^{|R|}\sum_{j=i+1}^{|R|} dist(R_i, R_j)}{\frac{{|R|}^2-{|R|}}{2}}$$
 where $R$ is the list of recommended items. We keep only the first 100 items because with all 1000, every model would have the same diversity.
 We use the TFIDF, style and topic vector representations with the cosine distance. A fourth diversity is based on the mean of the Jaccard distances. Note that this is not [the extension to n sets described on Wikipedia](https://fr.wikipedia.org/wiki/Indice_et_distance_de_Jaccard) (the intersection of many documents would simply be empty or contain only stop words) but the mean of the pairwise distances, as in this [article](https://sci-hub.tw/https://ieeexplore.ieee.org/abstract/document/4812525) and [this one](http://www.l3s.de/~siersdorfer/sources/2012/fp055-deng.pdf) (refined diversity Jaccard):
 $$JD(A, B) = 1 - \frac{|A \cap B|}{|A \cup B|}$$ where $A$ and $B$ are the sets of words of items A and B. TODO: state whether stop words are removed.
 * Novelty at 100 (nov@100): same principle, but the distances are taken between the user history $H$ and $R$.
 $$novelty(R, H) = \frac{\sum_{i=1}^{|R|}\sum_{j=1}^{|H|} dist(R_i, H_j)}{|R| \cdot |H|}$$
 * Strict novelty at 100 (snov@100): same, but for each recommended item we keep only the minimum distance to the history, $mindist(R_i, H) = \min_j dist(R_i, H_j)$.
 $$strictnovelty(R, H) = \frac{\sum_{i=1}^{|R|} mindist(R_i, H)}{|R|}$$
 * Serendipity at 100 (ser@100): the ratio of relevant items that the evaluated model recommended and the primitive model did not recommend. With $R$ the recommendation set of the evaluated model, $P$ the recommendation set of the primitive model, $T$ the set of relevant items, and for cases where $T \setminus P \neq \emptyset$, we define the serendipity as:
 $$serendipity(R, P, T) = \frac{|R \cap (T \setminus P)|}{|T \setminus P|}$$
 Cases where $T \setminus P = \emptyset$ are not relevant because the primitive model already predicted all relevant items, so no model can be serendipitous. These cases are excluded from the average over all users (TODO: give the percentage of cases where $T \setminus P = \emptyset$).
 The primitive models are the TFIDF model with historyRef=1, lowercasing and lemmatization, and a model that takes the set of words of the history (without stop words) and looks for the best Jaccard similarity among the candidates.

## Mongo monitoring

    db.getCollection('scores').find({'metric': 'snov@100'}).count()
    db.getCollection('scores').find({'metric': 'jacc-snov@100'}).count()
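
The same counts can be obtained from Python. The following is a minimal sketch, assuming a pymongo `Collection` that holds the score documents (the notebook itself goes through `getTwinewsScores`, whose interface is not shown here). `count_documents` is the pymongo call that replaces the deprecated `find(...).count()`. The helper name `countScores` and the database name in the comment are hypothetical.

```python
def countScores(collection, metrics):
    """Return {metric: number of score documents} for the given metric names."""
    # `collection` is assumed to expose pymongo's Collection.count_documents(filter).
    return {metric: collection.count_documents({"metric": metric}) for metric in metrics}

# Usage with a real connection (hypothetical database name):
# from pymongo import MongoClient
# scores = MongoClient()["twinews"]["scores"]
# print(countScores(scores, ["snov@100", "jacc-snov@100"]))
```

Comparing these counts with the number of stored rankings gives a rough idea of how many models still lack a given metric.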

## Commands

In [None]:
# Killer unique run:
# oomstopper --no-tail bacceval ; killbill bacceval ; cd ~/twinews-logs ; jupython -o nohup-bacceval-$HOSTNAME-$(date +%Y-%m-%d.%M-%S).out --venv st-venv ~/Workspace/Python/Datasets/Twinews/twinews/evaluation/bacceval.ipynb

In [None]:
# Unique run:
# oomstopper --no-tail bacceval ; cd ~/twinews-logs ; jupython -o nohup-bacceval-$HOSTNAME-$(date +%Y-%m-%d.%M-%S).out --venv st-venv ~/Workspace/Python/Datasets/Twinews/twinews/evaluation/bacceval.ipynb

In [None]:
# Killer triple run:
# oomstopper --no-tail bacceval ; killbill bacceval ; cd ~/twinews-logs ; jupython --no-tail -o nohup-bacceval-$HOSTNAME-$(date +%Y-%m-%d.%M-%S).out --venv st-venv ~/Workspace/Python/Datasets/Twinews/twinews/evaluation/bacceval.ipynb ; sleep 30 ; jupython --no-tail -o nohup-bacceval-$HOSTNAME-$(date +%Y-%m-%d.%M-%S).out --venv st-venv ~/Workspace/Python/Datasets/Twinews/twinews/evaluation/bacceval.ipynb ; sleep 30 ; jupython --no-tail -o nohup-bacceval-$HOSTNAME-$(date +%Y-%m-%d.%M-%S).out --venv st-venv ~/Workspace/Python/Datasets/Twinews/twinews/evaluation/bacceval.ipynb

In [None]:
# Triple run:
# oomstopper --no-tail bacceval ; cd ~/twinews-logs ; jupython --no-tail -o nohup-bacceval-$HOSTNAME-$(date +%Y-%m-%d.%M-%S).out --venv st-venv ~/Workspace/Python/Datasets/Twinews/twinews/evaluation/bacceval.ipynb ; sleep 30 ; jupython --no-tail -o nohup-bacceval-$HOSTNAME-$(date +%Y-%m-%d.%M-%S).out --venv st-venv ~/Workspace/Python/Datasets/Twinews/twinews/evaluation/bacceval.ipynb ; sleep 30 ; jupython --no-tail -o nohup-bacceval-$HOSTNAME-$(date +%Y-%m-%d.%M-%S).out --venv st-venv ~/Workspace/Python/Datasets/Twinews/twinews/evaluation/bacceval.ipynb

## Imports

In [None]:
import os ; os.environ["CUDA_VISIBLE_DEVICES"] = ""
isNotebook = '__file__' not in locals()
TEST = isNotebook

In [None]:
from systemtools.hayj import *
from systemtools.location import *
from systemtools.basics import *
from systemtools.file import *
from systemtools.printer import *
from databasetools.mongo import *
from datastructuretools.cache import *
from newstools.goodarticle.utils import *
from nlptools.preprocessing import *
from nlptools.news import parser as newsParser
from machinelearning.iterator import *
from twinews.utils import *
from twinews.evaluation import metrics
from twinews.evaluation.utils import *
from twinews.models.genericutils import *
from twinews.models.ranking import *
import time
import pymongo

## Init

In [None]:
# Defining logger:
logger = Logger(tmpDir('logs') + "/bacceval.log") if isNotebook else Logger("bacceval-" + getHostname() + "-" + getDateSec() + ".log")
tt = TicToc(logger=logger)
tt.tic()

In [None]:
# Defining the getter of a dict-like cache ((cacheKey, url) --> vector) that keeps data until only 2 GB of RAM remain free:
genericCaches = dict()
newsCollection = getNewsCollection(logger=logger)
def getter(key, logger=None, verbose=True):
    global newsCollection
    global genericCaches
    global genericFields
    if newsCollection is None:
        newsCollection = getNewsCollection(logger=logger, verbose=verbose)
    cacheKey, url = key
    field = genericFields[cacheKey]
    if cacheKey in genericCaches:
        genericCache = genericCaches[cacheKey]
    else:
        genericCache = getGenericCache(cacheKey, logger=logger, verbose=verbose)
        genericCaches[cacheKey] = genericCache
    row = newsCollection.findOne({'url': url}, projection={field: True})
    theHash = objectToHash(row[field])
    return genericCache[theHash]

In [None]:
# We define the minFreeRAM:
minFreeRAM = 4
cleanInterval = 3.0

In [None]:
# We define primitive models and cache keys:
cacheKeys = {"tfidf", "dbert-ft", "nmf"}

In [None]:
# Making the cache instance (don't forget to purge it at the end):
cache = Cache(getter, logger=logger, name="cacheForGenericVectors")

In [None]:
# We get scores collection and the rankings GridFS:
twinewsScores = getTwinewsScores(logger=logger)
twinewsRankings = getTwinewsRankings(logger=logger)

In [None]:
def cacheForceFeeding(model, maxItems=None, logger=None, verbose=True):
    global cache
    global newsCollection
    i = 0
    urls = list(newsCollection.distinct("url"))
    if maxItems is not None:
        urls = urls[:maxItems]
    for url in pb(urls, printRatio=0.01, message="Force-feeding the cache...",
                  logger=logger, verbose=verbose):
        cache[(model, url)]
        if i % 1000 == 0 and freeRAM() < 2:
            logWarning("Stopping because no RAM left.", logger)
            break
        i += 1

## Diversity
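
The diversity formula is the mean of the strict upper triangle of the pairwise distance matrix. Below is a minimal NumPy sketch on toy vectors, independent of the project's `cache` and `getDistances` helpers; `pairwiseDiversity` is a hypothetical name and the vectors are assumed to be non-zero. It also illustrates why the metric is truncated: assuming (as the note on the top 100 suggests) that every model ranks the same candidate set, the diversity of the full list does not depend on the ordering.

```python
import numpy as np

def pairwiseDiversity(vectors, at=None):
    """Mean cosine distance over all unordered pairs of the first `at` rows."""
    vectors = np.asarray(vectors, dtype=float)
    if at is not None:
        vectors = vectors[:at]
    # Normalise the rows so that the dot product is the cosine similarity:
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    distances = 1.0 - unit @ unit.T
    # Keep only the strict upper triangle, i.e. the (n^2 - n) / 2 pairs with i < j:
    i, j = np.triu_indices(len(vectors), k=1)
    return float(distances[i, j].mean())

rng = np.random.default_rng(0)
candidates = rng.random((20, 5))  # 20 toy candidate vectors
rankingA = candidates             # one ordering of the candidates
rankingB = candidates[::-1]       # another ordering of the same candidates

# Over the full candidate list, the ordering does not matter:
print(np.isclose(pairwiseDiversity(rankingA), pairwiseDiversity(rankingB)))
# Over a truncated list, the two orderings select different items:
print(pairwiseDiversity(rankingA, at=5), pairwiseDiversity(rankingB, at=5))
```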

In [None]:
def basicDistPrint(url1, url2, dist, prob=1.0, logger=None, verbose=True):
    if verbose:
        if getRandomFloat() < prob and (dist >= 0.99 or dist < 0.84):
                log(dist, logger)
                # t1 = getNewsField(url1, 'detokText')
                # t2 = getNewsField(url2, 'detokText')
                t1Words = set(flattenLists(getNewsField(url1, 'sentences', verbose=False)))
                t2Words = set(flattenLists(getNewsField(url2, 'sentences', verbose=False)))
                inter = t1Words.intersection(t2Words)
                log("-" * 20, logger)
                bp(t1Words, 4, logger)
                log("-" * 20, logger)
                bp(t2Words, 4, logger)
                log("-" * 20, logger)
                bp(inter, 5, logger)
                log(len(inter), logger)
                log("#" * 20, logger)
                log("#" * 20, logger)

In [None]:
def tfidfDiversityAt100(urls, logger=None, verbose=False):
    return diversity(urls, 'tfidf', at=100, logger=logger, verbose=verbose)
def styleDiversityAt100(urls, logger=None, verbose=False):
    return diversity(urls, 'dbert-ft', at=100, logger=logger, verbose=verbose)
def topicDiversityAt100(urls, logger=None, verbose=False):
    return diversity(urls, 'nmf', at=100, logger=logger, verbose=verbose)
def diversity(urls, model, at=100, distance="cosine", logger=None, verbose=False):
    global cache
    assert isinstance(urls, list)
    urls = urls[:at]
    assert len(urls) == at
    vectors = vstack([cache[(model, url)] for url in urls])
    distances = getDistances(vectors, vectors, metric=distance, verbose=False)
    pairwiseCount = 0
    distSum = 0
    for i in range(at):
        for u in range(i+1, at):
            dist = distances[i][u]
            distSum += dist
            pairwiseCount += 1
            basicDistPrint(urls[i], urls[u], dist, verbose=TEST and verbose, logger=logger)
    assert pairwiseCount == (at**2 - at) / 2 # \frac{{|R|}^2-{|R|}}{2}
    return distSum / pairwiseCount

## Jaccard diversity
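
The Jaccard distance used in this section compares two documents as sets of tokens, with or without stop words. Here is a minimal sketch on two toy sentences; the whitespace tokenizer, the stop word list and the function names are assumptions of this example, not the project's `STOP_WORDS` or `sentences` field. The sketch also guards the empty-union case, which is a choice made here.

```python
def jaccardDistanceOfSets(a, b):
    """JD(A, B) = 1 - |A ∩ B| / |A ∪ B| for two sets of tokens."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Assumption of this sketch: two empty documents are treated as identical.
        return 0.0
    return 1.0 - len(a & b) / len(union)

def tokenSet(text, stopWords=frozenset()):
    """Lowercase, split on whitespace and drop stop words (toy tokenizer)."""
    return {t for t in text.lower().split() if t not in stopWords}

toyStopWords = {"the", "a", "on"}  # hypothetical list, not the project's STOP_WORDS
d1 = "The cat sat on the mat"
d2 = "A cat sat on a sofa"
# With stop words kept, shared function words lower the distance:
print(jaccardDistanceOfSets(tokenSet(d1), tokenSet(d2)))
# With stop words removed, only content words are compared:
print(jaccardDistanceOfSets(tokenSet(d1, toyStopWords), tokenSet(d2, toyStopWords)))
```

The diversity of a ranking is then the mean of this distance over all unordered pairs of items.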

In [None]:
def swJaccardRepr(url, *args, **kwargs):
    return __jaccardRepr(url, 200, *args, **kwargs)
def jaccardRepr(url, *args, **kwargs):
    return __jaccardRepr(url, 0, *args, **kwargs)
def __jaccardRepr\
(
    url,
    stopWordAmount,
    lowercase=True,
    logger=None, verbose=True,
):
    global newsCollection
    global STOP_WORDS
    assert '__int_1__' in STOP_WORDS
    if stopWordAmount is None or stopWordAmount == 0:
        sw = None
    else:
        sw = set(STOP_WORDS[:stopWordAmount])
    sentences = getNewsField(url, 'sentences', verbose=False)
    tokens = flattenLists(sentences)
    if lowercase:
        tokens = [e.lower() for e in tokens]
    tokens = set(tokens)
    if sw is not None and len(sw) > 0:
        tokens = set([e for e in tokens if e not in sw])
    return tokens

In [None]:
if False:
    def jaccardDistance(url1, url2, cache):
        assert isinstance(cache, Cache)
        words1 = cache[url1]
        words2 = cache[url2]
        return 1 - len(words1.intersection(words2)) / len((words1.union(words2)))

In [None]:
def jaccardDistance(key, **kwargs):
    url1, url2, useSW = key
    return __jaccardDistance(url1, url2, useSW)

In [None]:
jaccardDistanceCache = Cache\
(
    jaccardDistance,
    logger=logger,
    minFreeRAM=minFreeRAM + 5,
    name="jaccardDistanceCache",
    indexStrings=True,
    cleanInterval=cleanInterval / 10,
    actionCleanInterval=60 * 10,
    fake=True,
)

In [None]:
def __jaccardDistance(url1, url2, useSW):
    global jaccardCache
    global swJaccardCache
    assert isinstance(useSW, bool)
    cache = swJaccardCache if useSW else jaccardCache
    words1 = cache[url1]
    words2 = cache[url2]
    return 1 - len(words1.intersection(words2)) / len((words1.union(words2)))

In [None]:
def swJaccardDiversityAt100(urls, logger=None, verbose=False):
    return jaccardDiversity(urls, True, at=100, logger=logger, verbose=verbose)
def jaccardDiversityAt100(urls, logger=None, verbose=False):
    return jaccardDiversity(urls, False, at=100, logger=logger, verbose=verbose)
def jaccardDiversity(urls, useSW, at=100, logger=None, verbose=False):
    global swJaccardCache
    global jaccardCache
    assert isinstance(urls, list)
    urls = urls[:at]
    assert len(urls) == at
    pairwiseCount = 0
    distSum = 0
    for i in range(at):
        for u in range(i+1, at):
            dist = jaccardDistanceCache[(urls[i], urls[u], useSW)]
            distSum += dist
            pairwiseCount += 1
            basicDistPrint(urls[i], urls[u], dist, verbose=TEST and verbose, logger=logger)
    assert pairwiseCount == (at**2 - at) / 2 # \frac{{|R|}^2-{|R|}}{2}
    return distSum / pairwiseCount

In [None]:
swJaccardCache = Cache(swJaccardRepr, logger=logger, name="swJaccardCache", minFreeRAM=minFreeRAM + 1, cleanInterval=cleanInterval)
jaccardCache = Cache(jaccardRepr, logger=logger, name="jaccardCache", minFreeRAM=minFreeRAM + 1, cleanInterval=cleanInterval)

## Novelty
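
Novelty averages the distance over every (history item, recommended item) pair, so it is simply the mean of the full $|H| \times |R|$ distance matrix. A minimal NumPy sketch on toy vectors follows; `cosineDistances` and `noveltySketch` are hypothetical names, and `cosineDistances` only stands in for what `getDistances` is assumed to return with `metric="cosine"`.

```python
import numpy as np

def cosineDistances(a, b):
    """Matrix of cosine distances between the rows of `a` and the rows of `b`."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return 1.0 - a @ b.T

def noveltySketch(historyVectors, rankingVectors):
    """Mean distance over all |H| x |R| (history item, recommended item) pairs."""
    return float(cosineDistances(historyVectors, rankingVectors).mean())

history = [[1.0, 0.0], [0.0, 1.0]]
ranking = [[1.0, 0.0], [1.0, 1.0]]
print(noveltySketch(history, ranking))
```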

In [None]:
def tfidfNoveltyAt100(*args, logger=None, verbose=False):
    return novelty(*args, 'tfidf', at=100, logger=logger, verbose=verbose)
def styleNoveltyAt100(*args, logger=None, verbose=False):
    return novelty(*args, 'dbert-ft', at=100, logger=logger, verbose=verbose)
def topicNoveltyAt100(*args, logger=None, verbose=False):
    return novelty(*args, 'nmf', at=100, logger=logger, verbose=verbose)
def novelty(historyUrls, urls, model, at=100, distance="cosine", logger=None, verbose=False):
    global cache
    assert isinstance(urls, list)
    urls = urls[:at]
    assert len(urls) == at
    historyUrls = list(historyUrls)
    assert len(historyUrls) > 0
    historyVectors = vstack([cache[(model, url)] for url in historyUrls])
    vectors = vstack([cache[(model, url)] for url in urls])
    distances = getDistances(historyVectors, vectors, metric=distance, verbose=False)
    pairwiseCount = 0
    distSum = 0
    for i in range(len(historyUrls)):
        for u in range(len(urls)):
            dist = distances[i][u]
            distSum += dist
            pairwiseCount += 1
            basicDistPrint(historyUrls[i], urls[u], dist, verbose=TEST and verbose, logger=logger)
    assert pairwiseCount == len(historyUrls) * len(urls)
    return distSum / pairwiseCount

## Jaccard novelty

In [None]:
def swJaccardNoveltyAt100(*args, logger=None, verbose=False):
    return jaccardNovelty(*args, True, at=100, logger=logger, verbose=verbose)
def jaccardNoveltyAt100(*args, logger=None, verbose=False):
    return jaccardNovelty(*args, False, at=100, logger=logger, verbose=verbose)
def jaccardNovelty(historyUrls, urls, useSW, at=100, logger=None, verbose=False):
    global swJaccardCache
    global jaccardCache
    assert isinstance(urls, list)
    urls = urls[:at]
    assert len(urls) == at
    historyUrls = list(historyUrls)
    assert len(historyUrls) > 0
    pairwiseCount = 0
    distSum = 0
    for i in range(len(historyUrls)):
        for u in range(len(urls)):
            dist = jaccardDistanceCache[(historyUrls[i], urls[u], useSW)]
            distSum += dist
            pairwiseCount += 1
            basicDistPrint(historyUrls[i], urls[u], dist, verbose=TEST and verbose, logger=logger)
    assert pairwiseCount == len(historyUrls) * len(urls)
    return distSum / pairwiseCount

## Strict novelty
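
Strict novelty replaces the average over the history by a minimum: a recommended item is only as novel as its closest history item. The sketch below works directly on a toy distance matrix (rows are history items, columns are recommended items); the function name is hypothetical and the `epsilon` clamp mirrors the `0.00001` threshold used in this section.

```python
import numpy as np

def strictNoveltySketch(distances, epsilon=1e-5):
    """`distances[i][u]` is the distance between history item i and recommended item u."""
    distances = np.array(distances, dtype=float)  # copy, the input is left untouched
    # Tiny positive distances are treated as exact matches (floating-point noise):
    distances[(distances > 0) & (distances < epsilon)] = 0.0
    # For each recommended item (column), keep the distance to its closest history item:
    return float(distances.min(axis=0).mean())

toy = [[0.9, 0.000001, 0.5],
       [0.2, 0.7,      0.6]]
print(strictNoveltySketch(toy))
print(float(np.mean(toy)))  # plain novelty on the same matrix, for comparison
```

On the same matrix the strict score can never exceed the plain novelty, since a minimum is at most the mean.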

In [None]:
def tfidfStrictNoveltyAt100(*args, logger=None, verbose=False):
    return strictNovelty(*args, 'tfidf', at=100, logger=logger, verbose=verbose)
def styleStrictNoveltyAt100(*args, logger=None, verbose=False):
    return strictNovelty(*args, 'dbert-ft', at=100, logger=logger, verbose=verbose)
def topicStrictNoveltyAt100(*args, logger=None, verbose=False):
    return strictNovelty(*args, 'nmf', at=100, logger=logger, verbose=verbose)
def strictNovelty(historyUrls, urls, model, at=100, distance="cosine", logger=None, verbose=False):
    global cache
    assert isinstance(urls, list)
    urls = urls[:at]
    assert len(urls) == at
    historyUrls = list(historyUrls)
    assert len(historyUrls) > 0
    historyVectors = vstack([cache[(model, url)] for url in historyUrls])
    vectors = vstack([cache[(model, url)] for url in urls])
    distances = getDistances(historyVectors, vectors, metric=distance, verbose=False)
    pairwiseCount = 0
    distSum = 0
    for u in range(len(urls)):
        minDist = None
        for i in range(len(historyUrls)):
            dist = distances[i][u]
            if dist > 0 and dist < 0.00001:
                dist = 0.0
            if minDist is None or dist < minDist:
                minDist = dist
            basicDistPrint(historyUrls[i], urls[u], dist, verbose=TEST and verbose, logger=logger)
        pairwiseCount += 1
        distSum += minDist
    assert pairwiseCount == len(urls)
    return distSum / pairwiseCount

## Jaccard strict novelty

In [None]:
def swJaccardStrictNoveltyAt100(*args, logger=None, verbose=False):
    return jaccardStrictNovelty(*args, True, at=100, logger=logger, verbose=verbose)
def jaccardStrictNoveltyAt100(*args, logger=None, verbose=False):
    return jaccardStrictNovelty(*args, False, at=100, logger=logger, verbose=verbose)
def jaccardStrictNovelty(historyUrls, urls, useSW, at=100, logger=None, verbose=False):
    global swJaccardCache
    global jaccardCache
    assert isinstance(urls, list)
    urls = urls[:at]
    assert len(urls) == at
    historyUrls = list(historyUrls)
    assert len(historyUrls) > 0
    pairwiseCount = 0
    distSum = 0
    for u in range(len(urls)):
        minDist = None
        for i in range(len(historyUrls)):
            dist = jaccardDistanceCache[(historyUrls[i], urls[u], useSW)]
            if dist > 0 and dist < 0.00001:
                dist = 0.0
            if minDist is None or dist < minDist:
                minDist = dist
            basicDistPrint(historyUrls[i], urls[u], dist, verbose=TEST and verbose, logger=logger)
        pairwiseCount += 1
        distSum += minDist
    assert pairwiseCount == len(urls)
    return distSum / pairwiseCount

## Serendipity
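
Serendipity is undefined when the primitive model already retrieved every relevant item, and such rankings are left out of the average. A minimal sketch with toy sets illustrates both the ratio and the handling of undefined cases; the function names are hypothetical and the items are plain strings rather than urls.

```python
def serendipitySketch(R, P, T):
    """|R ∩ (T - P)| / |T - P|, or None when the primitive model found every relevant item."""
    R, P, T = set(R), set(P), set(T)
    missedByPrimitive = T - P
    if not missedByPrimitive:
        return None  # undefined: nothing is left for the evaluated model to discover
    return len(R & missedByPrimitive) / len(missedByPrimitive)

def meanIgnoringNone(scores):
    """Average the defined scores and report the share of undefined ones."""
    if not scores:
        return 0.0, 0.0
    defined = [s for s in scores if s is not None]
    undefinedRatio = (len(scores) - len(defined)) / len(scores)
    mean = sum(defined) / len(defined) if defined else 0.0
    return mean, undefinedRatio

T = {"a", "b", "c"}
scores = [
    serendipitySketch(R={"a", "b", "x"}, P={"a", "y", "z"}, T=T),  # T - P = {b, c}, R hits b
    serendipitySketch(R={"x", "y", "z"}, P={"a", "b", "c"}, T=T),  # T - P is empty
]
print(scores)
print(meanIgnoringNone(scores))
```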

 * splitVersion 1, rankings having no serendipity scores for tfidf-ser@100: **24.0%** (meaning the primitive model predicted all relevant items, thus 0-division in the formula...)
 * splitVersion 1, for jacc-ser@100: **2.31%**
 * splitVersion 2, for tfidf-ser@100: **27.9%**
 * splitVersion 2, for jacc-ser@100: **2.88%**
 * For wtfidf-ser@100: **28.78%** (unknown splitVersion)
 * For style-ser@100: **22.15%** (unknown splitVersion)
 * For bm25-ser@100: **28.78%** (unknown splitVersion)

In [None]:
pmodels = {1: {"tfidf": "tfidf-4b89a", "wtfidf": "tfidf-7febb", "jaccard": "jaccard-1d3f1", "bm25": "bm25-933f7", "dbert-ft": "dbert-ft-7847a"}, 2: {"tfidf": "tfidf-71fb5", "wtfidf": "tfidf-7e79d", "jaccard": "jaccard-1499a", "bm25": "bm25-1eb2a", "dbert-ft": "dbert-ft-d1b5f"}}

In [None]:
def tfidfSerendipityAt100(rankings, userId, rankingIndex, splitVersion, logger=None, verbose=False):
    return serendipity(rankings, userId, rankingIndex, splitVersion, 'tfidf', at=100, logger=logger, verbose=verbose)
def wtfidfSerendipityAt100(rankings, userId, rankingIndex, splitVersion, logger=None, verbose=False):
    return serendipity(rankings, userId, rankingIndex, splitVersion, 'wtfidf', at=100, logger=logger, verbose=verbose)
def bm25SerendipityAt100(rankings, userId, rankingIndex, splitVersion, logger=None, verbose=False):
    return serendipity(rankings, userId, rankingIndex, splitVersion, 'bm25', at=100, logger=logger, verbose=verbose)
def styleSerendipityAt100(rankings, userId, rankingIndex, splitVersion, logger=None, verbose=False):
    return serendipity(rankings, userId, rankingIndex, splitVersion, 'dbert-ft', at=100, logger=logger, verbose=verbose)
def jaccardSerendipityAt100(rankings, userId, rankingIndex, splitVersion, logger=None, verbose=False):
    return serendipity(rankings, userId, rankingIndex, splitVersion, 'jaccard', at=100, logger=logger, verbose=verbose)
def serendipity(rankings, userId, rankingIndex, splitVersion, model, at=100, logger=None, verbose=False):
    global pmodels
    global pmodelsRankingsCache
    global evalDataCache
    # Getting T:
    evalData = evalDataCache[splitVersion]
    T = set(evalData['testUsers'][userId].keys())
    # Getting R:
    R = rankings[userId][rankingIndex]
    if isinstance(R[0], tuple):
        R = [e[0] for e in R]
    R = set(R[:at])
    assert len(R) == at
    # Getting P:
    pmodel = pmodels[splitVersion][model]
    prankings = pmodelsRankingsCache[pmodel]
    P = prankings[userId][rankingIndex]
    if isinstance(P[0], tuple):
        P = [e[0] for e in P]
    P = set(P[:at])
    assert len(P) == at
    # Getting T minus P:
    TminusP = set([e for e in T if e not in P])
    # We check if this is relevant:
    if len(TminusP) == 0:
        return None
    # We compute the ratio:
    score = len(R.intersection(TminusP)) / len(TminusP)
    # Printing stuff:
    if TEST:
        allItems = T.union(R).union(P)
        idsMap = dict()
        i = 0
        for url in allItems:
            idsMap[url] = i
            i += 1
        T = set([idsMap[e] for e in T])
        R = set([idsMap[e] for e in R])
        P = set([idsMap[e] for e in P])
        TminusP = set([idsMap[e] for e in TminusP])
        log("userId: " + str(userId), logger)
        log("splitVersion: " + str(splitVersion), logger)
        log("model: " + str(model), logger)
        log("-" * 20, logger)
        log("T: " + str(T), logger)
        log("-" * 20, logger)
        log("R: " + str(R), logger)
        log("-" * 20, logger)
        log("P: " + str(P), logger)
        log("-" * 20, logger)
        log("TminusP: " + str(TminusP), logger)
        log("-" * 20, logger)
        log("score: " + str(score), logger)
        log("#" * 20, logger)
        log("#" * 20, logger)
    return score

In [None]:
def getRankings(key, logger=None, verbose=True, **kwargs):
    log("Downloading rankings of the primitive model " + key + "...", logger, verbose=verbose)
    rk = twinewsRankings[key]
    log("Done.", logger, verbose=verbose)
    return rk
pmodelsRankingsCache = Cache(getRankings, logger=logger, name="pmodelsRankingsCache")

## Continuous evaluation
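
The loop in this section only computes the (model, metric) pairs that have no stored score yet, which is presumably what allows it to be restarted or run on several hosts. The sketch below shows that skip-if-present pattern with an in-memory dict standing in for the scores collection; `evaluationPass`, `store` and `compute` are hypothetical names and the metric functions are toys.

```python
def evaluationPass(modelKeys, metricFuncts, store, compute):
    """One pass: compute every (model, metric) score that is not stored yet."""
    added = []
    for modelKey in modelKeys:
        for metricKey, metricFunct in metricFuncts.items():
            if (modelKey, metricKey) in store:
                continue  # already computed, possibly by another worker
            try:
                store[(modelKey, metricKey)] = compute(modelKey, metricFunct)
                added.append((modelKey, metricKey))
            except Exception as e:
                # A failure only skips this pair; it will be retried on the next pass.
                print("Failed " + metricKey + " for " + modelKey + ": " + str(e))
    return added

store = {("model-a", "div@100"): 0.8}            # toy stand-in for the scores collection
metricFuncts = {"div@100": len, "nov@100": sum}  # toy metric functions
compute = lambda modelKey, funct: float(funct([1, 2, 3]))
print(evaluationPass(["model-a", "model-b"], metricFuncts, store, compute))
print(evaluationPass(["model-a", "model-b"], metricFuncts, store, compute))  # nothing left
```

The second pass returns an empty list because every pair is already in the store, and the pre-existing score of `model-a` is never overwritten.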

In [None]:
# We init an eval data cache:
def evalDataGetter(splitVersion, logger=None, verbose=True):
    log("Downloading eval data version " + str(splitVersion) + "...", logger, verbose=verbose)
    return getEvalData(splitVersion, logger=logger, verbose=verbose, maxExtraNews=0)
evalDataCache = Cache(evalDataGetter, logger=logger, name="evalDataCache", minFreeRAM=minFreeRAM + 3, cleanInterval=cleanInterval)

In [None]:
# Misc params:
iterations = 1 if isNotebook else 10000000
sleep = 0 if isNotebook else 30
exceptionSleep = 10

In [None]:
# To prevent reloading rankings at each test:
testRankings = None

In [None]:
# Metrics for local:
metricFuncts = \
{
    ##### Diversity #####
    'div@100': tfidfDiversityAt100,
    'style-div@100': styleDiversityAt100,
    'topic-div@100': topicDiversityAt100,
    ##### Jaccard diversity #####
    'jacc-div@100': jaccardDiversityAt100,
    'swjacc-div@100': swJaccardDiversityAt100,
    ##### Novelty #####
    'nov@100': tfidfNoveltyAt100,
    'style-nov@100': styleNoveltyAt100,
    'topic-nov@100': topicNoveltyAt100,
    ##### Jaccard novelty #####
    'jacc-nov@100': jaccardNoveltyAt100,
    'swjacc-nov@100': swJaccardNoveltyAt100,
    ##### Strict novelty #####
    'snov@100': tfidfStrictNoveltyAt100,
    'style-snov@100': styleStrictNoveltyAt100,
    'topic-snov@100': topicStrictNoveltyAt100,
    ##### Jaccard strict novelty #####
    'jacc-snov@100': jaccardStrictNoveltyAt100,
    'swjacc-snov@100': swJaccardStrictNoveltyAt100,
    ##### Serendipity #####
    'tfidf-ser@100': tfidfSerendipityAt100,
    'wtfidf-ser@100': wtfidfSerendipityAt100,
    'bm25-ser@100': bm25SerendipityAt100,
    'style-ser@100': styleSerendipityAt100,
    'jacc-ser@100': jaccardSerendipityAt100,
}
tipiNum = lambda: tipiNumber(toInteger=True)
# cacheForceFeeding('tfidf', maxItems=300, logger=logger)
if TEST:
    metricFuncts = dictSelect(metricFuncts, {'jacc-div@100', 'swjacc-div@100', 'jacc-nov@100', 'swjacc-nov@100', 'jacc-snov@100', 'swjacc-snov@100'})
# elif tipiNum() in {60, 61, 62, 63}:
#     metricFuncts = dictSelect(metricFuncts, {'div@100', 'nov@100'})
elif isHostname("titanv"):
    metricFuncts = dictSelect(metricFuncts, {'swjacc-nov@100', 'swjacc-snov@100'})
elif isHostname("kepler"):
    metricFuncts = dictSelect(metricFuncts, {'swjacc-nov@100', 'swjacc-snov@100'})
elif isHostname("tipi"):
    # metricFuncts = dictSelect(metricFuncts, {'jacc-div@100', 'swjacc-div@100', 'jacc-nov@100', 'swjacc-nov@100', 'jacc-snov@100', 'swjacc-snov@100'})
    # metricFuncts = dictSelect(metricFuncts, {'div@100', 'nov@100', 'snov@100'})
    # metricFuncts = dictSelect(metricFuncts, {'topic-div@100', 'topic-nov@100', 'topic-snov@100'})
    # metricFuncts = dictSelect(metricFuncts, {'style-div@100', 'style-nov@100', 'style-snov@100'})
    # metricFuncts = dictSelect(metricFuncts, {'jacc-div@100', 'swjacc-div@100', 'jacc-nov@100', 'swjacc-nov@100', 'jacc-snov@100', 'swjacc-snov@100'})
    metricFuncts = dictSelect(metricFuncts, {'jacc-div@100', 'swjacc-div@100', 'jacc-nov@100', 'swjacc-nov@100', 'jacc-snov@100', 'swjacc-snov@100'})
    if False:
        tipis = "88 81 85 82 92 93 95 58 63 80 57 59 56 04 61 62 90 86 83 94 84 89 01 87 06 02 03".split()
        tipis = [int(e) for e in tipis]
        tipis = shuffle(tipis, seed=0)
        tipis = split(tipis, 4)
        if tipiNum() in tipis[0]:
            metricFuncts = dictSelect(metricFuncts, {'swjacc-div@100'})
        elif tipiNum() in tipis[1]:
            metricFuncts = dictSelect(metricFuncts, {'swjacc-nov@100', 'swjacc-snov@100'})
        elif tipiNum() in tipis[2]:
            metricFuncts = dictSelect(metricFuncts, {'jacc-div@100'})
        elif tipiNum() in tipis[3]:
            metricFuncts = dictSelect(metricFuncts, {'jacc-nov@100', 'jacc-snov@100'})
        else:
            metricFuncts = dictSelect(metricFuncts, {'jacc-nov@100', 'jacc-snov@100'})
elif octods():
    # metricFuncts = dictSelect(metricFuncts, {'jacc-nov@100', 'jacc-snov@100'})
    metricFuncts = dictSelect(metricFuncts, {'jacc-div@100', 'swjacc-div@100', 'jacc-nov@100', 'swjacc-nov@100', 'jacc-snov@100', 'swjacc-snov@100'})
else:
    # metricFuncts = dictSelect(metricFuncts, {'jacc-nov@100', 'jacc-snov@100'})
    metricFuncts = dictSelect(metricFuncts, {'jacc-div@100', 'swjacc-div@100', 'jacc-nov@100', 'swjacc-nov@100', 'jacc-snov@100', 'swjacc-snov@100'})
log("Current metric functions:\n" + b(metricFuncts.keys(), 5), logger)

In [None]:
# We print the state of the caches every 5 minutes (only if the file tmpDir()/cache-infos exists):
def cacheInfos(*args, **kwargs):
    if isFile(tmpDir() + "/cache-infos"):
        for current in [cache, swJaccardCache, jaccardCache, pmodelsRankingsCache, evalDataCache, jaccardDistanceCache]:
            current.printState()
cachePrintTimer = Timer(cacheInfos, 60 * 5)
cachePrintTimer.start()

In [None]:
# For a certain amount of iterations:
for i in range(iterations):
    # We get all model keys (shuffled):
    modelsKeys = shuffle(sorted(list(twinewsRankings.keys())), seed=0 if TEST else None)
    if TEST:
        modelsKeys = [e for e in modelsKeys if "combin" not in e]
        modelsKeys = modelsKeys[:1]
    # For all model instances:
    tt.tic(display=False)
    for modelKey in modelsKeys:
        # We init the eval data to None:
        evalData = None
        rankings = None
        # For all metrics:
        for metricKey, metricFunct in metricFuncts.items():
            # If the score was not added previously:
            if TEST or (not twinewsScores.has({'id': modelKey, 'metric': metricKey})):
                try:
                    # We print infos:
                    log("Computing " + metricKey + " score of " + modelKey + "...", logger)
                    # We get all data:
                    meta = twinewsRankings.getMeta(modelKey)
                    splitVersion = meta['splitVersion']
                    maxUsers = meta['maxUsers']
                    modelName = meta['model']
                    # We get eval data:
                    if evalData is None:
                        evalData = evalDataCache[splitVersion]
                    candidates = evalData['candidates']
                    # We get rankings:
                    if rankings is None:
                        if TEST and testRankings is not None:
                            logWarning("Taking testRankings as rankings !!! " * 20, logger)
                            rankings = testRankings
                        else:
                            localTT = TicToc(logger=logger)
                            localTT.tic("Downloading rankings of " + modelKey + "...")
                            rankings = twinewsRankings[modelKey]
                            if rankings is None or len(rankings) == 0:
                                raise Exception("Rankings of " + modelKey + " don't exist anymore, you need to re-generate them.")
                            else:
                                checkRankings(rankings, candidates, maxUsers=maxUsers)
                            localTT.toc(modelKey + " downloaded.", logger)
                            if TEST:
                                testRankings = rankings
                    # Init scores:
                    scores = []
                    # We get user ids:
                    userIds = shuffle(sorted(list(rankings.keys())), seed=0)
                    if TEST:
                        userIds = userIds[:100]
                    # Diversity:
                    if 'div@' in metricKey:
                        for userId in pb(userIds, logger=logger, message="Computing " + metricKey + " of " + modelKey):
                            for currentRankings in rankings[userId]:
                                assert len(currentRankings) >= 100
                                assert isinstance(currentRankings, list)
                                assert isinstance(currentRankings[0], str) or isinstance(currentRankings[0], tuple)
                                if isinstance(currentRankings[0], tuple):
                                    currentUrls = [e[0] for e in currentRankings]
                                else:
                                    currentUrls = currentRankings
                                score = metricFunct(currentUrls, logger=logger)
                                scores.append(score)
                        if not TEST:
                            assert len(scores) >= len(rankings)
                    # Novelty:
                    elif 'nov@' in metricKey:
                        for userId in pb(userIds, logger=logger, message="Computing " + metricKey + " of " + modelKey):
                            for currentRankings in rankings[userId]:
                                assert len(currentRankings) >= 100
                                assert isinstance(currentRankings, list)
                                assert isinstance(currentRankings[0], str) or isinstance(currentRankings[0], tuple)
                                if isinstance(currentRankings[0], tuple):
                                    currentUrls = [e[0] for e in currentRankings]
                                else:
                                    currentUrls = currentRankings
                                historyUrls = set(evalData['trainUsers'][userId].keys())
                                score = metricFunct(historyUrls, currentUrls, logger=logger)
                                scores.append(score)
                        if not TEST:
                            assert len(scores) >= len(rankings)
                    # Serendipity:
                    elif 'ser@' in metricKey:
                        totalScoresToCompute = 0
                        noneScores = 0
                        for userId in pb(userIds, logger=logger, message="Computing " + metricKey + " of " + modelKey):
                            for rankingIndex in range(len(rankings[userId])):
                                currentRankings = rankings[userId][rankingIndex]
                                assert len(currentRankings) >= 100
                                assert isinstance(currentRankings, list)
                                assert isinstance(currentRankings[0], str) or isinstance(currentRankings[0], tuple)
                                score = metricFunct(rankings, userId, rankingIndex, splitVersion, logger=logger)
                                if score is None:
                                    noneScores += 1
                                else:
                                    scores.append(score)
                                totalScoresToCompute += 1
                        # Guard against a division by zero when there is no ranking to score:
                        log("Rankings having no serendipity scores for " + metricKey + ": " + str(truncateFloat(noneScores / max(totalScoresToCompute, 1) * 100, 2)) + "%", logger)
                    else:
                        logError("The metric key " + metricKey + " is unknown.", logger)
                    # We average all scores:
                    if len(scores) == 0:
                        score = 0.0
                    else:
                        score = float(np.mean(scores))
                    # And finally we add the score in the db:
                    if not TEST:
                        addTwinewsScore(modelKey, metricKey, score, verbose=False)
                    # We print result:
                    log(metricKey + " score of " + modelKey + ": " + str(truncateFloat(score, 3)), logger)
                except AssertionError as e:
                    # The exception must be bound to the name passed to logException:
                    logException(e, logger)
                except Exception as e:
                    if isNotebook:
                        raise e
                    else:
                        logError(str(e), logger)
                        time.sleep(exceptionSleep)
        tt.tic(modelKey + " done.")
    if sleep > 0:
        log("Sleeping " + str(sleep) + " seconds after iteration " + str(i) + " of " + str(iterations) + "...", logger)
        time.sleep(sleep)
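
The serendipity branch above drops every ranking for which the metric function returns `None`, which corresponds to the case $T \setminus P = \emptyset$ in the definition given in the Notes. A minimal sketch of that definition on plain Python sets follows. The names `serendipity` and `mean_score` are hypothetical and are not the notebook's functions, and the cutoff at 100 is left out for brevity:

```python
def serendipity(recommended, primitive, relevant):
    # Relevant items that the primitive model did not recommend (T \ P):
    unexpected = set(relevant) - set(primitive)
    if not unexpected:
        # The metric is undefined here, so the caller should skip this ranking:
        return None
    # |R ∩ (T \ P)| / |T \ P|
    return len(set(recommended) & unexpected) / len(unexpected)


def mean_score(scores):
    # Same convention as the loop above: None scores are ignored
    # and an empty list of scores gives 0.0.
    kept = [s for s in scores if s is not None]
    return sum(kept) / len(kept) if kept else 0.0


if __name__ == "__main__":
    print(serendipity(["a", "b", "c"], ["a"], ["a", "b", "d"]))
    print(mean_score([0.5, None, 1.0]))
```

In the first call, $T \setminus P$ is `{"b", "d"}` and only `"b"` is recommended, so half of the unexpected relevant items are retrieved.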

## Testing

In [None]:
if isNotebook and TEST:
    urls = shuffle(list(newsCollection.distinct("url")))[:100]

In [None]:
if isNotebook and TEST:
    log(tfidfDiversityAt100(urls), logger)
    log(styleDiversityAt100(urls), logger)
    log(topicDiversityAt100(urls), logger)

In [None]:
if isNotebook and TEST:
    urls = shuffle(list(newsCollection.distinct("url")))[:100]
    jaccardDistance(urls[0], urls[1], swJaccardCache)
    jaccardDiversity(urls, False, verbose=True, logger=logger)

In [None]:
if isNotebook and TEST:
    def parallelJaccardDiversity(urls, useSW, at=100, logger=None, verbose=False):
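        """
            Parallel variant of jaccardDiversity (one chunk of pairs per CPU).
            Much slower than the sequential version, see the timing cell below.
        """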
        global swJaccardCache
        global jaccardCache
        assert isinstance(urls, list)
        urls = urls[:at]
        assert len(urls) == at
        # Getting data:
        cache = swJaccardCache if useSW else jaccardCache
        data = dict()
        for url in urls:
            data[url] = cache[url]
        # Getting pairs to compute:
        pairs = []
        for i in range(at):
            for u in range(i+1, at):
                pairs.append((i, u))
        pairsChunks = split(pairs, cpuCount())
        # Defining the generator function:
        def genFunct(pairs, urls, data, *args, **kwargs):
            for i, u in pairs:
                yield ((i, u), jaccardDistance(urls[i], urls[u], data))
        # Defining the MLIterator:
        mli = MLIterator(pairsChunks, genFunct, genArgs=(urls, data), verbose=False, parallelProcesses=cpuCount(), maxParallelProcesses=cpuCount())
        # Iterating all pairs yielded by the mli:
        pairwiseCount = 0
        distSum = 0
        for ((i, u), dist) in mli:
            distSum += dist
            pairwiseCount += 1
        # Checking the size:
        assert pairwiseCount == (at**2 - at) / 2 # \frac{{|R|}^2-{|R|}}{2}
        # Returning the result:
        return distSum / pairwiseCount
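
For comparison, the same mean of pairwise Jaccard distances can be sketched sequentially with `itertools.combinations`, which enumerates exactly the $\frac{|R|^2-|R|}{2}$ pairs checked by the assertion above. This is only an illustration: `word_sets` is a hypothetical dict mapping each url to its set of words and stands in for the notebook's caches, and returning 0.0 for two empty sets or for fewer than two urls is an assumption of this sketch.

```python
from itertools import combinations


def jaccard_distance(a, b):
    # JD(A, B) = 1 - |A ∩ B| / |A ∪ B|
    union = a | b
    if not union:
        # Assumption: two empty word sets are considered identical.
        return 0.0
    return 1.0 - len(a & b) / len(union)


def pairwise_jaccard_diversity(urls, word_sets, at=100):
    urls = urls[:at]
    n = len(urls)
    if n < 2:
        # Assumption: no pair to compare means no diversity.
        return 0.0
    pairs = list(combinations(range(n), 2))
    # Same size check as in the parallel version: (n^2 - n) / 2 pairs.
    assert len(pairs) == (n ** 2 - n) // 2
    total = sum(jaccard_distance(word_sets[urls[i]], word_sets[urls[u]])
                for i, u in pairs)
    return total / len(pairs)


if __name__ == "__main__":
    word_sets = {"u1": {"a", "b"}, "u2": {"b", "c"}, "u3": {"a", "b"}}
    print(pairwise_jaccard_diversity(["u1", "u2", "u3"], word_sets))
```

With only 4950 pairs for 100 urls, each pair is cheap to compute, which is one plausible reason why the parallel version does not pay off.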

In [None]:
if isNotebook and TEST:
    urls = shuffle(list(newsCollection.distinct("url")))[:100]
    tt.tic(display=False)
    for i in range(100):
        print(jaccardDiversity(urls, False, verbose=False, logger=logger))
    tt.tic()
    tt.tic(display=False)
    for i in range(100):
        print(parallelJaccardDiversity(urls, False, verbose=False, logger=logger))
    tt.tic()

In [None]:
if isNotebook and TEST:
    urls = shuffle(list(newsCollection.distinct("url")))[:100]
    at = 50
    historyUrls1 = urls[:10]
    urls1 = urls[:50]
    historyUrls2 = urls[50:60]
    urls2 = urls[:50]
    historyUrls3 = urls[:60]
    urls3 = urls[:50]
    historyUrls4 = urls[:60]
    urls4 = urls[:49] + [urls[62]]
    print(jaccardStrictNovelty(historyUrls1, urls1, True, at=at, logger=logger, verbose=True))
    print(jaccardStrictNovelty(historyUrls2, urls2, True, at=at, logger=logger))
    print(jaccardStrictNovelty(historyUrls3, urls3, True, at=at, logger=logger))
    print(jaccardStrictNovelty(historyUrls4, urls4, True, at=at, logger=logger))
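
The four cases above vary how much the history overlaps the recommendations. The sketch below reproduces the same slicing with a toy 0/1 distance, to show the expected ordering of the scores. The names `strict_novelty` and `toy_dist` are hypothetical, and the toy distance is an assumption that replaces the real Jaccard distance, so only the ordering (and the exact 0 of the third case) carries over to `jaccardStrictNovelty`.

```python
def strict_novelty(history, recs, dist):
    # Mean over the recommendations of the distance to the closest history item.
    history = list(history)
    recs = list(recs)
    return sum(min(dist(r, h) for h in history) for r in recs) / len(recs)


def toy_dist(a, b):
    # Assumption for illustration only: 0 for the same item, 1 otherwise.
    return 0.0 if a == b else 1.0


if __name__ == "__main__":
    urls = ["url%d" % i for i in range(100)]
    # Case 1: the history covers 10 of the 50 recommendations.
    print(strict_novelty(urls[:10], urls[:50], toy_dist))
    # Case 2: the history is disjoint from the recommendations.
    print(strict_novelty(urls[50:60], urls[:50], toy_dist))
    # Case 3: the history covers all the recommendations.
    print(strict_novelty(urls[:60], urls[:50], toy_dist))
    # Case 4: a single recommendation is outside the history.
    print(strict_novelty(urls[:60], urls[:49] + [urls[62]], toy_dist))
```

With this toy distance the four scores are 0.8, 1.0, 0.0 and 0.02.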

## End

In [None]:
cache.purge()
swJaccardCache.purge()
jaccardCache.purge()
pmodelsRankingsCache.purge()
evalDataCache.purge()
jaccardDistanceCache.purge()

In [None]:
tt.toc()