# Mangoes : Evaluate embeddings

This notebook illustrates how to evaluate embeddings created with Mangoes. The examples use an embedding built from a sample of Wikipedia.

In [1]:
import mangoes

## Content of this notebook

1. [Evaluate on analogy tasks](#1.-Evaluate-on-analogy-tasks)
2. [Evaluate on similarity tasks](#2.-Evaluate-on-similarity-tasks)
3. [Evaluate on outlier detection tasks](#3.-Evaluate-on-outlier-detection-tasks)


First, we load a pre-built embedding (see the dedicated notebook to learn how to create your own).

In [2]:
embedding = mangoes.Embeddings.load("data/ppmi_1500words_win2")

Mangoes provides 3 types of tasks: analogy tasks, similarity tasks and outlier detection tasks.


## 1. Evaluate on analogy tasks

In [3]:
import mangoes.evaluation.analogy

Analogy tasks try to resolve questions of the form "a is to b as c is to d".  
Google and MSR datasets are available in Mangoes, but you can use your own.

In [4]:
google_dataset = mangoes.evaluation.analogy.GOOGLE
msr_dataset = mangoes.evaluation.analogy.MSR
analogy_evaluation = mangoes.evaluation.analogy.Evaluation(embedding, google_dataset, msr_dataset)
print(analogy_evaluation.get_report())


                                                            Nb questions      cosadd      cosmul
Google                                                         343/19544      59.18%      56.27%
                                                (including 3 duplicates)
------------------------------------------------------------------------------------------------
MSR                                                             578/8000      55.19%      43.77%
------------------------------------------------------------------------------------------------
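The `cosadd` and `cosmul` columns presumably correspond to the 3CosAdd and 3CosMul objectives commonly used to solve analogy questions. This is an assumption based on the column names, since the notebook does not state how Mangoes computes them. The sketch below shows both objectives in plain Python on hypothetical toy vectors; `solve_analogy`, `cosine` and the vectors are illustrative names and are not part of Mangoes.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def solve_analogy(vectors, a, b, c, method="cosadd", eps=1e-3):
    """Return the word d that best completes 'a is to b as c is to d'."""
    best_word, best_score = None, -math.inf
    for word, vec in vectors.items():
        if word in (a, b, c):  # the question words are excluded from the candidates
            continue
        sim_a = cosine(vec, vectors[a])
        sim_b = cosine(vec, vectors[b])
        sim_c = cosine(vec, vectors[c])
        if method == "cosadd":
            # 3CosAdd: additive combination of the three similarities
            score = sim_b - sim_a + sim_c
        else:
            # 3CosMul: multiplicative combination, with similarities shifted to [0, 1]
            pa, pb, pc = ((s + 1) / 2 for s in (sim_a, sim_b, sim_c))
            score = pb * pc / (pa + eps)
        if score > best_score:
            best_word, best_score = word, score
    return best_word

# Hypothetical toy vectors (axes: royalty, male, female), for illustration only
toy_vectors = {
    "king": [1.0, 1.0, 0.0],
    "queen": [1.0, 0.0, 1.0],
    "man": [0.0, 1.0, 0.0],
    "woman": [0.0, 0.0, 1.0],
    "child": [0.2, 0.5, 0.5],
}

print(solve_analogy(toy_vectors, "man", "king", "woman", method="cosadd"))
print(solve_analogy(toy_vectors, "man", "king", "woman", method="cosmul"))
```

On these toy vectors both objectives pick `queen`. On real embeddings the two can disagree, which would explain why the report shows a separate score for each.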



The result is an object of type `mangoes.evaluation.base.BaseEvaluation`.  
As indicated, some duplicate entries were detected in the Google dataset. You can display the scores for each dataset, keeping or removing these duplicates:

In [5]:
print(analogy_evaluation.get_score("Google", keep_duplicates=True))
print(analogy_evaluation.get_score("Google", keep_duplicates=False))

Score(cosadd=0.5918367346938775, cosmul=0.5626822157434402, nb=343)
Score(cosadd=0.5941176470588235, cosmul=0.5676470588235294, nb=340)
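To see what removing the duplicates changes, the two printed scores can be converted back into counts of correct answers. The sketch below assumes each score is the fraction of correctly answered questions. The `Score` namedtuple is a stand-in that mirrors the printed output, not the Mangoes class itself.

```python
from collections import namedtuple

# Stand-in mirroring the printed repr; not the actual Mangoes Score class
Score = namedtuple("Score", ["cosadd", "cosmul", "nb"])

with_dup = Score(cosadd=0.5918367346938775, cosmul=0.5626822157434402, nb=343)
without_dup = Score(cosadd=0.5941176470588235, cosmul=0.5676470588235294, nb=340)

def correct_answers(score):
    """Recover the number of correct answers, assuming score = correct / nb."""
    return {m: round(getattr(score, m) * score.nb) for m in ("cosadd", "cosmul")}

print(correct_answers(with_dup))
print(correct_answers(without_dup))
```

Under that assumption, `cosadd` goes from 203 correct answers out of 343 to 202 out of 340, while `cosmul` stays at 193 correct answers in both cases.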


You can also display the detail of the scores for each subset of the datasets:

In [6]:
print(analogy_evaluation.get_report(show_subsets=True, keep_duplicates=True))


                                                            Nb questions      cosadd      cosmul
Google                                                         343/19544      59.18%      56.27%
                                                (including 3 duplicates)

    semantic                                                     65/8869      67.69%      46.15%
                                                (including 3 duplicates)

        capital-common-countries                                   6/506      33.33%       0.00%
        capital-world                                             3/4524      33.33%       0.00%
        city-in-state                                             0/2467          NA          NA
        currency                                                   0/866          NA          NA
        family                                                    56/506      73.21%      53.57%

    syntactic                                                  278/10675  

You can also display the questions and the computed answers:

In [7]:
print(analogy_evaluation.get_report(show_questions=True, keep_duplicates=True))


                                                            Nb questions      cosadd      cosmul
Google                                                         343/19544      59.18%      56.27%
                                                (including 3 duplicates)

    semantic                                                     65/8869      67.69%      46.15%
                                                (including 3 duplicates)

        capital-common-countries                                   6/506      33.33%       0.00%

        london england paris france                                           france       italy
        london england rome italy                                             france     kingdom
        paris france rome italy                                              germany     germany
        paris france london england                                          germany      africa
        rome italy london england                                         

## 2. Evaluate on similarity tasks

In [8]:
import mangoes.evaluation.similarity

Similarity tasks evaluate the similarity between word pairs (using cosine similarity) and measure the correlation of these scores with human-assigned scores.  
Again, some datasets are available in Mangoes (defined in the module `mangoes.dataset`), but you can also use your own.
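As a rough illustration of the correlation step, the sketch below computes Pearson and Spearman coefficients in plain Python. The nine (gold, score) pairs are copied from the detailed WS353 report shown later in this notebook. The helper names are illustrative, p-values are left out, and this is not necessarily how Mangoes computes its coefficients.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)

def average_ranks(values):
    """1-based ranks in ascending order; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values equal to values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Human (gold) scores and model scores for nine WS353 pairs from the report
gold = [7.46, 7.58, 6.31, 6.77, 7.42, 6.81, 7.46, 8.12, 8.58]
score = [0.35, 0.42, 0.35, 0.64, 0.37, 0.25, 0.16, 0.14, 0.46]

print("pearson :", pearson(gold, score))
print("spearman:", spearman(gold, score))
```

Because only nine of the 64 evaluated pairs are used here, the printed coefficients will not match the values in the report.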

In [9]:
similarity_dataset = mangoes.evaluation.similarity.ALL_DATASETS
similarity_evaluation = mangoes.evaluation.similarity.Evaluation(embedding, *similarity_dataset)
print(similarity_evaluation.get_report())


                                                                          pearson       spearman
                                                      Nb questions        (p-val)        (p-val)
WS353                                                       64/353   0.495(3e-05)   0.468(1e-04)
------------------------------------------------------------------------------------------------
WS353 relatedness                                           52/252   0.468(5e-04)   0.425(2e-03)
------------------------------------------------------------------------------------------------
WS353 similarity                                            37/203   0.602(8e-05)   0.498(2e-03)
------------------------------------------------------------------------------------------------
MEN                                                       198/3000   0.617(4e-22)   0.636(7e-24)
------------------------------------------------------------------------------------------------
M. Turk                      

The result is an object of type `mangoes.evaluation.base.BaseEvaluation`. You can display the score and reports for each dataset:

In [10]:
print("Score :", similarity_evaluation.get_score('WS353'))
print("Score :", similarity_evaluation.get_score('WS353 relatedness'))

Score : Score(pearson=Coeff(coeff=0.49498919927471885, pvalue=3.210682187295102e-05), spearman=Coeff(coeff=0.46780767225389513, pvalue=9.709660345048611e-05), nb=64)
Score : Score(pearson=Coeff(coeff=0.4680650822053023, pvalue=0.000466764410853007), spearman=Coeff(coeff=0.42504002803749796, pvalue=0.0016842739079171649), nb=52)


You can also display the questions with their gold and computed scores:

In [11]:
print(similarity_evaluation.get_report(show_questions=True))


                                                                          pearson       spearman
                                                      Nb questions        (p-val)        (p-val)
WS353                                                       64/353   0.495(3e-05)   0.468(1e-04)

                                               gold          score                              
book paper                                     7.46           0.35
computer internet                              7.58           0.42
train car                                      6.31           0.35
television radio                               6.77           0.64
media radio                                    7.42           0.37
student professor                              6.81           0.25
book library                                   7.46           0.16
bank money                                     8.12           0.14
king queen                                     8.58           0.46
movie s

## 3. Evaluate on outlier detection tasks

In [12]:
import mangoes.evaluation.outlier

Given a group of words, the goal is to identify the word that does not belong in the group.
Again, some datasets are available in Mangoes (defined in the module `mangoes.dataset`), but you can also use your own.
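The report for this task shows two measures, OPP and accuracy. The sketch below is a simplified illustration of the idea and not the Mangoes implementation. It assumes each word is ranked by its average cosine similarity to the other words of the group, that OPP is the outlier's position in that ranking divided by the cluster size, and that accuracy is the share of groups where the outlier is ranked last. All names and toy vectors are hypothetical, and positions are 0-based, which may differ from the numbering used in the report.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def outlier_position(vectors, cluster, outlier):
    """0-based rank of the outlier when words are sorted by decreasing
    average similarity to the rest of the group (last place = detected)."""
    words = cluster + [outlier]

    def compactness(word):
        others = [w for w in words if w != word]
        return sum(cosine(vectors[word], vectors[w]) for w in others) / len(others)

    ranking = sorted(words, key=compactness, reverse=True)
    return ranking.index(outlier)

def evaluate(vectors, questions):
    """Return (OPP, accuracy) over a list of (cluster, outlier) questions."""
    positions = [outlier_position(vectors, c, o) for c, o in questions]
    opp = sum(p / len(c) for p, (c, _) in zip(positions, questions)) / len(questions)
    accuracy = sum(p == len(c) for p, (c, _) in zip(positions, questions)) / len(questions)
    return opp, accuracy

# Hypothetical 2-dimensional toy vectors, for illustration only
toy_vectors = {
    "january": [1.0, 0.1],
    "march": [0.9, 0.2],
    "may": [1.0, 0.0],
    "winter": [0.1, 1.0],
}
questions = [(["january", "march", "may"], "winter")]

opp, accuracy = evaluate(toy_vectors, questions)
print(f"OPP: {opp:.2%}  accuracy: {accuracy:.2%}")
```

Here `winter` has the lowest average similarity to the other words, so it is ranked last and both measures reach 100%.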

In [13]:
_8_8_8_dataset = mangoes.evaluation.outlier._8_8_8
outlier_evaluation = mangoes.evaluation.outlier.Evaluation(embedding, _8_8_8_dataset)
print(outlier_evaluation.get_report())


                                                            Nb questions         OPP    accuracy
8-8-8                                                               4/64     100.00%     100.00%
------------------------------------------------------------------------------------------------



The result is an object of type `mangoes.evaluation.base.BaseEvaluation`. You can display the score and reports for each dataset:

In [14]:
print(outlier_evaluation.get_score())

Score(opp=1.0, accuracy=1.0, nb=4)


You can display the detail of the scores for each subset of the dataset:

In [15]:
print(outlier_evaluation.get_report(show_subsets=True))


                                                            Nb questions         OPP    accuracy
8-8-8                                                               4/64     100.00%     100.00%

    Apostles_of_Jesus_Christ                                         0/8          NA          NA
    Big_cats                                                         0/8          NA          NA
    European_football_teams                                          0/8          NA          NA
    German_car_manufacturers                                         0/8          NA          NA
    Information_Technology_companies                                 0/8          NA          NA
    Months                                                           4/8     100.00%     100.00%
    Solar_System_planets                                             0/8          NA          NA
    SouthAmerica                                                     0/8          NA          NA

---------------------------

In [16]:
print(outlier_evaluation.get_report(show_questions=True))


                                                            Nb questions         OPP    accuracy
8-8-8                                                               4/64     100.00%     100.00%

    Apostles_of_Jesus_Christ                                         0/8          NA          NA

    Big_cats                                                         0/8          NA          NA

    European_football_teams                                          0/8          NA          NA

    German_car_manufacturers                                         0/8          NA          NA

    Information_Technology_companies                                 0/8          NA          NA

    Months                                                           4/8     100.00%     100.00%

                                                                                outlier position
    january march may july september november february june winter                             9
    january march may 