# Find duplicate questions on StackOverflow by their embeddings

In this assignment you will learn how to calculate a similarity for pieces of text. Using this approach you will know how to find duplicate questions from [StackOverflow](https://stackoverflow.com).

### Libraries

In this task you will you will need the following libraries:
- [StarSpace](https://github.com/facebookresearch/StarSpace) — a general-purpose model for efficient learning of entity embeddings from Facebook
- [Gensim](https://radimrehurek.com/gensim/) — a tool for solving various NLP-related tasks (topic modeling, text representation, ...)
- [Numpy](http://www.numpy.org) — a package for scientific computing.
- [scikit-learn](http://scikit-learn.org/stable/index.html) — a tool for data mining and data analysis.
- [Nltk](http://www.nltk.org) — a platform to work with human language data.

### Data

The following cell will download all data required for this assignment into the folder `week3/data`.

In [1]:
import sys
sys.path.append("..")
from common.download_utils import download_week3_resources

download_week3_resources()

HBox(children=(IntProgress(value=0, max=119127793), HTML(value='')))




HBox(children=(IntProgress(value=0, max=535543630), HTML(value='')))




HBox(children=(IntProgress(value=0, max=46408910), HTML(value='')))




HBox(children=(IntProgress(value=0, max=5333), HTML(value='')))




### Grading
We will create a grader instace below and use it to collect your answers. Note that these outputs will be stored locally inside grader and will be uploaded to platform only after running submiting function in the last part of this assignment. If you want to make partial submission, you can run that cell any time you want.

In [2]:
from grader import Grader

In [3]:
grader = Grader()

## Word embedding

To solve the problem, you will use two different models of embeddings:

 - [Pre-trained word vectors](https://code.google.com/archive/p/word2vec/) from Google which were trained on a part of Google News dataset (about 100 billion words). The model contains 300-dimensional vectors for 3 million words and phrases. You need to download it by following this [link](https://drive.google.com/file/d/0B7XkCwpI5KDYNlNUTTlSS21pQmM/edit?usp=sharing).
 - Representations using StarSpace on StackOverflow data sample. You will need to train them from scratch.

It's always easier to start with pre-trained embeddings. Unpack the pre-trained Goggle's vectors and upload them using the function [KeyedVectors.load_word2vec_format](https://radimrehurek.com/gensim/models/keyedvectors.html) from gensim library with the parameter *binary=True*. If the size of the embeddings is larger than the avaliable memory, you could load only a part of the embeddings by defining the parameter *limit* (recommended: 500000).

In [7]:
import gensim
from gensim.models import KeyedVectors

In [13]:
wv_embeddings = gensim.models.KeyedVectors.load_word2vec_format("GoogleNews-vectors-negative300.bin",binary=True)

In [25]:
#wv_embeddings['big']

most_similar = wv_embeddings.most_similar(positive=['woman', 'king'], negative=['man'])
most_similar_to_given = wv_embeddings.most_similar_to_given('music', ['water', 'sound', 'backpack', 'mouse'])

most_similar
most_similar_to_given

'sound'

### How to work with Google's word2vec embeddings?

Once you have loaded the representations, make sure you can access them. First, you can check if the loaded embeddings contain a word:
    
    'word' in wv_embeddings
    
Second, to get the corresponding embedding you can use the square brackets:

    wv_embeddings['word']
 
### Checking that the embeddings are correct 
 
To prevent any errors during the first stage, we can check that the loaded embeddings are correct. You can call the function *check_embeddings*, implemented below, which runs 3 tests:
1. Find the most similar word for provided "positive" and "negative" words.
2. Find which word from the given list doesn’t go with the others.
3. Find the most similar word for the provided one.

In the right case the function will return the string *These embeddings look good*. Othervise, you need to validate the previous steps.

In [14]:
def check_embeddings(embeddings):
    error_text = "Something wrong with your embeddings ('%s test isn't correct)."
    most_similar = embeddings.most_similar(positive=['woman', 'king'], negative=['man'])
    if len(most_similar) < 1 or most_similar[0][0] != 'queen':
        return error_text % "Most similar"

    doesnt_match = embeddings.doesnt_match(['breakfast', 'cereal', 'dinner', 'lunch'])
    if doesnt_match != 'cereal':
        return error_text % "Doesn't match"
    
    most_similar_to_given = embeddings.most_similar_to_given('music', ['water', 'sound', 'backpack', 'mouse'])
    if most_similar_to_given != 'sound':
        return error_text % "Most similar to given"
    
    return "These embeddings look good."

In [15]:
print(check_embeddings(wv_embeddings))

These embeddings look good.


## From word to text embeddings

**Task 1 (Question2Vec).** Usually, we have word-based embeddings, but for the task we need to create a representation for the whole question. It could be done in different ways. In our case we will use a **mean** of all word vectors in the question. Now you need to implement the function *question_to_vec*, which calculates the question representation described above. This function should work with the input text as is without any preprocessing.

Note that there could be words without the corresponding embeddings. In this case, you can just skip these words and don't take them into account during calculating the result. If the question doesn't contain any known word with embedding, the function should return a zero vector.

In [29]:
import numpy as np

In [150]:
def question_to_vec(question, embeddings, dim=300):
    """
        question: a string
        embeddings: dict where the key is a word and a value is its' embedding
        dim: size of the representation

        result: vector representation for the question
    """
    vect=np.zeros(300)
    count=0
    if question is not '':
        #print("second")
        for word in question.split():
            #print(word,len(question.split()))
            if word in embeddings:
                vect=vect+embeddings[word]
             
            else:
                count=count + 1
                #print(count,len(question))
                if (count==len(question.split())):
                    return np.zeros(300)
        #print(vect)        
        return vect/(len(question.split())-count)       
    else:
        #print("first")
        return np.zeros(300)
   
            
        
        
    
            
    
            
        
        
    ######################################
    ######### YOUR CODE HERE #############
    ######################################

To check the basic correctness of your implementation, run the function *question_to_vec_tests*.

In [84]:
def question_to_vec_tests():
    if (np.zeros(300) != question_to_vec('', wv_embeddings)).any():
        return "You need to return zero vector for empty question."
    if (np.zeros(300) != question_to_vec('thereisnosuchword', wv_embeddings)).any():
        return "You need to return zero vector for the question, which consists only unknown words."
    if (wv_embeddings['word'] != question_to_vec('word', wv_embeddings)).any():
        return "You need to check the corectness of your function."
    if ((wv_embeddings['I'] + wv_embeddings['am']) / 2 != question_to_vec('I am', wv_embeddings)).any():
        return "Your function should calculate a mean of word vectors."
    if (wv_embeddings['word'] != question_to_vec('thereisnosuchword word', wv_embeddings)).any():
        return "You should not consider words which embeddings are unknown."
    return "Basic tests are passed."

In [85]:
print(question_to_vec_tests())

thereisnosuchword 1
word 1
[ 3.59375000e-01  4.15039062e-02  9.03320312e-02  5.46875000e-02
 -1.47460938e-01  4.76074219e-02 -8.49609375e-02 -2.04101562e-01
  3.10546875e-01 -1.05590820e-02 -6.15234375e-02 -1.55273438e-01
 -1.52343750e-01  8.54492188e-02 -2.70996094e-02  3.84765625e-01
  4.78515625e-02  2.58789062e-02  4.49218750e-02 -2.79296875e-01
  9.09423828e-03  4.08203125e-01  2.40234375e-01 -3.06640625e-01
 -1.80664062e-01  4.73632812e-02 -2.63671875e-01  9.08203125e-02
  1.37695312e-01 -7.20977783e-04  2.67333984e-02  1.92382812e-01
 -2.29492188e-02  9.70458984e-03 -7.37304688e-02  4.29687500e-01
 -7.93457031e-03  1.06445312e-01  2.80761719e-02 -2.29492188e-01
 -1.91650391e-02 -2.36816406e-02  3.51562500e-02  1.71875000e-01
 -1.12304688e-01  6.25000000e-02 -1.69921875e-01  1.29882812e-01
 -1.54296875e-01  1.58203125e-01 -7.76367188e-02  1.78710938e-01
 -1.72851562e-01  9.96093750e-02  3.94531250e-01  6.44531250e-02
 -6.83593750e-02 -3.18359375e-01  5.95703125e-02 -1.02539062e-0

 -0.15527344 -0.13879395 -0.20996094 -0.06097412  0.15991211 -0.01953125]
thereisnosuchword 2
word 2
[ 3.59375000e-01  4.15039062e-02  9.03320312e-02  5.46875000e-02
 -1.47460938e-01  4.76074219e-02 -8.49609375e-02 -2.04101562e-01
  3.10546875e-01 -1.05590820e-02 -6.15234375e-02 -1.55273438e-01
 -1.52343750e-01  8.54492188e-02 -2.70996094e-02  3.84765625e-01
  4.78515625e-02  2.58789062e-02  4.49218750e-02 -2.79296875e-01
  9.09423828e-03  4.08203125e-01  2.40234375e-01 -3.06640625e-01
 -1.80664062e-01  4.73632812e-02 -2.63671875e-01  9.08203125e-02
  1.37695312e-01 -7.20977783e-04  2.67333984e-02  1.92382812e-01
 -2.29492188e-02  9.70458984e-03 -7.37304688e-02  4.29687500e-01
 -7.93457031e-03  1.06445312e-01  2.80761719e-02 -2.29492188e-01
 -1.91650391e-02 -2.36816406e-02  3.51562500e-02  1.71875000e-01
 -1.12304688e-01  6.25000000e-02 -1.69921875e-01  1.29882812e-01
 -1.54296875e-01  1.58203125e-01 -7.76367188e-02  1.78710938e-01
 -1.72851562e-01  9.96093750e-02  3.94531250e-01  6.44

You can submit embeddings for the questions from the file *test_embeddings.tsv* to earn the points. In this task you don't need to transform the text of a question somehow.

In [79]:
from util import array_to_string

In [87]:
question2vec_result = []
for question in open('data/test_embeddings.tsv'):
    question = question.strip()
    answer = question_to_vec(question, wv_embeddings)
    question2vec_result = np.append(question2vec_result, answer)

grader.submit_tag('Question2Vec', array_to_string(question2vec_result))

question2vec_result

Play 13
Framework 13
1.2.5 13
: 13
OutOfMemoryError 13
occured 13
: 13
Java 13
heap 13
space 13
in 13
play 13
framework 13
[ 0.17364502 -0.25854492  0.414505    0.76733398  0.21875    -0.65612793
  0.12084961  0.01049805  0.12756348  0.77973175 -0.85957336 -0.11669922
  0.01147461 -0.96142578 -0.83740234 -0.37890625  0.10269165  0.66772461
 -0.02685547 -0.54458618  0.20489502  0.89501953  0.04577637  1.01756287
  0.52752686  0.29602051  0.29310608  1.19165039  0.46745682 -0.76171875
 -1.14050293 -0.4251709   0.06359863 -1.00028992  0.22576904 -0.52246094
  0.05004883  0.30838013  0.72112274  0.95733643  0.7414856  -0.45697021
  0.30206299  0.9163208   0.21269226 -0.96429443 -1.09997559 -1.58740234
 -0.07885742  1.02636719  0.5133667  -0.0090332  -0.24230957 -0.72094727
 -0.36413574  0.24590302  0.02124023 -1.37182617 -0.20483398 -0.83447266
 -0.22924805  0.29272461 -0.28259277 -0.52978516 -0.1685791   0.30883789
  0.20761871  0.7019043  -0.21166992  0.11125946  0.90600586 -1.14526367
 

 -0.07580566 -0.38085938 -0.29931641  0.24462891  0.20623779  0.41607666]
Formula 17
to 17
auto 17
change 17
all 17
calendar 17
hours 17
if 17
I 17
change 17
1 17
to 17
keep 17
the 17
same 17
total 17
hours 17
[ 7.03125000e-02 -6.45904541e-01  3.92089844e-01  1.41479492e+00
 -1.17578125e+00 -5.17028809e-01 -2.56958008e-02 -1.11602783e+00
  1.52185059e+00  9.62348938e-01 -6.28112793e-01 -6.31225586e-01
 -6.47827148e-01  6.92260742e-01 -1.80224609e+00  1.07336426e+00
  8.89831543e-01  7.64244080e-01  1.29229736e+00 -7.01637268e-01
 -7.22900391e-01  1.03836060e+00 -6.45263672e-01 -6.93443298e-01
  6.84053421e-01  2.22116089e+00 -1.30554199e+00  1.12167358e+00
  1.34031677e+00  6.15936279e-01 -1.59887695e+00 -3.59130859e-01
 -4.50859070e-01 -5.74645996e-02  7.66082764e-01 -5.11230469e-01
 -1.60034180e-01  6.22314453e-01 -7.20458984e-01  1.73785400e+00
  1.84814453e+00 -3.56628418e-01  1.33673096e+00  4.53063965e-01
 -1.12536621e+00 -8.10165405e-01 -5.87829590e-01  9.64111328e-01
  5.872802

  0.31445312  0.03405762  0.37445068 -0.36645508 -0.4096756   0.34362793]
How 13
to 13
read 13
response 13
from 13
angular 13
resource 13
$save() 13
and 13
also 13
keeping 13
original 13
data 13
[ 0.26948547 -0.08299923  0.4536438   0.26345444 -0.60827637  0.17456055
  0.29003906 -0.36132812  0.83879089  0.73510742 -0.43356323 -0.97509766
 -0.74960327  0.56864929 -0.89031982 -0.07348633 -0.10662842  0.92456055
 -0.62023926 -1.15203857 -0.11474609  0.57873535 -0.36505127 -0.3059082
 -0.29348755 -0.1149292  -0.63464355  0.96801758  0.02319336 -0.93334961
  0.53167725 -0.75415039 -0.37646484 -0.39361572  0.1060791  -0.05407715
  0.57751465  0.10675812  0.13092041  0.57617188  0.29309082  0.42642212
  0.60412598  0.44812393 -0.33361816 -0.63876343 -0.82434082 -0.00195312
  0.13891602  0.17886353 -0.31958008 -1.08032227 -0.49041748 -1.14941406
  0.93357849  1.10510254 -1.11328125 -0.29699707  0.40330505  0.5078125
 -0.20595551 -0.45184326 -1.3503418  -0.79650879 -0.54205322  0.17982483
 -0.

  0.32537842  0.234375    0.34387207 -0.68060303  0.02941895  0.33740234]
View 10
what 10
methods/functions 10
return 10
a 10
specific 10
object 10
in 10
Visual 10
Studio 10
[ 0.70211792 -0.28009033 -0.29394531  0.13037109 -1.06298828  0.53222656
  0.328125   -0.86135578  0.46020508  0.35778809 -0.20849609 -0.18847656
  0.67480469  0.11083984 -0.94360352  0.18908691  0.48913574  0.71072388
 -0.49298096 -0.69685364 -0.48913574  0.53820801 -1.18786621  0.50189209
 -0.40350342  0.06640625  0.46266174  0.44287109  1.11169434 -0.58679199
 -0.74768066 -0.09863281 -1.2322998  -0.8125     -0.25770569 -0.93896484
  0.60571289 -0.21911621 -0.32299805  0.43408966  0.5680542   0.19830322
  0.43283081  0.26593018 -0.02685547 -0.43579102 -0.30196381 -0.54187012
 -0.07592773  0.68554688 -0.70874023  0.06091309 -0.48364258 -0.37226868
 -0.34796143  0.40511322 -0.3538208  -0.19567871  0.1887207  -0.44335938
  0.33215332  0.25927734 -0.73242188 -0.64364624 -0.76647949 -0.30297852
  0.37158203  0.5994873

 -1.73583984e-01 -1.13134766e+00  1.21582031e-01  3.72528076e-01]
Installing 15
an 15
Aurelia 15
web 15
application 15
on 15
a 15
web 15
server 15
other 15
than 15
ones 15
for 15
the 15
turorial 15
[ 0.46112061 -0.52294922 -0.12553406  0.43896484 -0.67871094  0.95739746
  0.71615601 -0.18676758  1.35679626  0.69372559  0.09667969 -1.05981445
 -0.78900146 -0.17456055  0.49499512  1.08401489  0.08984375  0.83203125
 -0.44940186  0.04345703 -0.05908203  0.52893066 -0.36032104  0.66329193
  0.40234375 -0.56689453 -0.84399414  1.90435791 -0.07478333 -0.86425781
 -0.19525146 -0.49246979 -0.65264893 -1.19851685  0.55609131 -0.72479248
 -0.07843018  0.30535889  1.11291504  0.17150879  0.5871582  -0.03990936
  0.78100586  0.7093811   0.42796326 -0.94352722 -0.25323486  0.39122009
  0.02160645  0.6003418  -0.22328186  0.38299561  0.33355713 -1.50387573
 -0.27978516  0.90045166 -0.34802246  0.38256836  0.27639771 -0.09985352
 -0.66775513 -0.20962524 -1.45556641 -0.75372314 -1.08422852  0.45869446

 -1.22070312e-01 -9.35119629e-01 -4.51660156e-02  1.72851562e-01]
Reserved 5
Class 5
Names 5
in 5
Groovy 5
[ 9.58251953e-01 -3.46923828e-01  8.42895508e-02 -5.88378906e-02
 -5.54687500e-01 -1.05273438e+00 -3.62792969e-01  1.82617188e-01
  1.33483887e-01  3.37829590e-01  4.77050781e-01  6.44775391e-01
  8.42285156e-02  1.50390625e-01 -5.30761719e-01 -1.96289062e-01
  3.34472656e-01  9.18823242e-01 -3.68896484e-01 -8.85009766e-02
  4.00878906e-01  1.36474609e-01 -2.38769531e-01  3.69873047e-01
 -1.08703613e-01  5.89599609e-02 -1.21444702e-01  2.79785156e-01
 -4.42626953e-01 -2.55371094e-01 -4.45312500e-01  2.06054688e-01
 -2.54028320e-01 -3.71582031e-01  5.30090332e-01 -2.45727539e-01
  6.08642578e-01 -3.95507812e-01 -8.71582031e-01  6.73583984e-01
 -3.21289062e-01 -8.12988281e-02  2.81494141e-01 -8.90502930e-02
  1.53930664e-01 -7.03979492e-01 -2.85522461e-01 -3.77319336e-01
  3.43017578e-02  1.91406250e-01 -1.62475586e+00  4.03656006e-01
 -1.74560547e-02 -4.25186157e-02  4.45556641e-01

 -0.21679688  0.08642578 -0.32159424  0.04541016  0.10464478  0.38647461]
Lookup 9
Error 9
ORA-00932: 9
inconsistent 9
datatypes: 9
expected 9
DATE 9
got 9
NUMBER 9
[ 5.23925781e-01 -3.84765625e-01  4.99511719e-01  7.13470459e-01
 -1.28906250e-01  2.76794434e-02 -1.12329102e+00  1.14185333e-01
 -1.28906250e-01  1.39111328e+00 -7.71484375e-02  2.29492188e-01
 -5.59082031e-01 -1.14074707e-01 -6.00250244e-01  2.46826172e-01
  1.42553711e+00  6.58203125e-01 -8.27636719e-01 -5.01220703e-01
 -4.08325195e-01  6.57226562e-01  6.09863281e-01  6.35986328e-02
 -9.87792969e-01 -6.16455078e-01 -6.69494629e-01  1.23864746e+00
  4.26757812e-01  1.30029297e+00  3.58886719e-02 -3.40576172e-01
  4.08325195e-01 -1.07629395e+00 -2.46582031e-02  1.85852051e-01
 -1.61132812e-01  7.24349976e-01  3.03588867e-01 -5.58837891e-01
 -5.83740234e-01 -1.51831055e+00  3.11523438e-01  1.16943359e-01
  3.83422852e-01 -1.41748047e+00 -8.09936523e-01 -1.23672485e-01
 -4.58251953e-01 -1.02905273e-01 -1.78369141e+00  1.921

 -0.36578369  0.0559082  -0.28430176 -0.53622437 -0.1550293  -0.34417725]
node.js 4
form 4
elements 4
array 4
[ 9.32617188e-02  2.73956299e-01 -2.11181641e-02 -9.38720703e-02
 -4.86328125e-01  4.85839844e-02 -4.05273438e-02 -3.99902344e-01
 -4.49707031e-01 -5.33447266e-02 -2.31933594e-02 -3.14697266e-01
 -4.15039062e-01 -1.28662109e-01 -2.67700195e-01  5.36132812e-01
 -1.00097656e-02  6.34765625e-03  2.25891113e-01 -2.29003906e-01
  7.03125000e-02 -9.76562500e-04 -2.86865234e-01 -3.24707031e-02
  1.75231934e-01 -2.96875000e-01 -2.16308594e-01  3.17138672e-01
 -3.13304901e-01 -2.64648438e-01 -2.41210938e-01  2.27539062e-01
  1.19018555e-01  7.78808594e-01  6.83593750e-03  4.29443359e-01
  1.09863281e-01 -1.76269531e-01  1.23962402e-01 -4.98657227e-02
  4.15100098e-01  2.59277344e-01  1.30859375e-01  4.83398438e-01
 -8.54492188e-04 -6.03515625e-01 -2.91259766e-01 -6.51855469e-02
  3.33007812e-01 -6.78710938e-02 -2.00195312e-02  1.80053711e-01
 -2.50213623e-01 -2.03125000e-01  3.42041016e

  3.30200195e-01 -5.78613281e-01 -3.21289062e-01  3.21807861e-01]
fullcalendar 10
removeEventSource 10
not 10
working 10
when 10
event 10
source 10
is 10
object 10
property 10
[ 1.15390015 -0.37414551  0.42547607  0.12339783 -0.95001221  0.55810547
  1.02752686 -0.47506714  0.83544922  0.56774902 -0.45947266 -1.10302734
 -0.54321289 -0.03662109 -0.6105957   0.20019531  0.0456543   0.77679443
  0.25       -0.61694336  0.00341797 -0.04669189 -0.14501953 -0.27950287
  0.33105469 -0.26955986 -0.31787109  1.10021973 -0.35058594 -0.53271484
 -0.01062012 -0.42773438 -0.30462646 -0.10058594  0.4407959   0.01382828
  0.51733398 -0.29968262  0.31433105  0.19885254  0.56140137  0.56469727
  0.74890137  0.28637695 -0.42919159 -0.85845947 -0.30505371 -0.18200684
 -0.03662109  0.65246582  0.51545715 -0.56860352 -0.29092407 -0.57043457
  0.70458984  0.87317657 -0.0880661  -0.40899658  0.21569824 -0.47399902
  0.26757812  0.16101074 -0.96826172 -0.28964233 -1.17382812 -0.02819824
 -0.27355957  0.28918

 -1.08789062e+00 -8.17321777e-01 -1.99707031e-01  6.12792969e-01]
Using 6
facebook 6
logo 6
on 6
ios 6
app 6
[ 5.59753418e-01 -5.85083008e-01  5.62744141e-02  4.28771973e-01
 -6.09741211e-01  3.98925781e-01  1.42089844e-01 -5.98144531e-01
  5.50292969e-01 -8.64257812e-02 -1.14135742e-01 -4.28466797e-01
 -5.51940918e-01 -6.42547607e-02 -4.29687500e-02  8.93554688e-02
  4.40979004e-01  3.24890137e-01 -5.76553345e-01 -7.86132812e-01
 -1.02783203e-01  9.57031250e-01  4.49584961e-01  3.24218750e-01
 -7.03613281e-01 -3.71337891e-01 -8.16406250e-01  1.56396484e+00
 -1.18652344e-01 -1.15417480e+00 -9.55200195e-02 -1.16981506e-01
 -2.56286621e-01 -1.20605469e-01  4.91394043e-01 -3.92578125e-01
 -2.05535889e-01  2.73864746e-01  1.02148438e+00 -2.86743164e-01
  1.44226074e-01 -2.57202148e-01  1.62414551e-01  7.15240479e-01
  6.91772461e-01 -8.38806152e-01 -7.71194458e-01 -3.91723633e-01
  7.55371094e-01  8.66699219e-02 -8.59008789e-01 -2.62451172e-03
  3.60717773e-01 -8.98437500e-01  3.81103516e-

  6.15234375e-02 -7.94921875e-01 -3.05435181e-01  4.19921875e-02]
How 8
to 8
share 8
files 8
and 8
folders 8
with 8
batch 8
[ 4.16259766e-01  8.10546875e-01 -7.32421875e-03  6.32324219e-01
 -3.35449219e-01  5.74951172e-02  1.31835938e-01 -3.93066406e-02
  1.10083008e+00  4.87060547e-01 -4.41894531e-01 -3.15429688e-01
 -7.81250000e-03  2.08007812e-01 -1.00927734e+00  5.83190918e-01
  3.93554688e-01  9.83642578e-01 -2.62207031e-01 -1.14111328e+00
  6.83593750e-02 -3.38378906e-01  2.25402832e-01 -7.25341797e-01
  4.07958984e-01 -3.56201172e-01 -4.14794922e-01  4.89746094e-01
  1.84082031e-01  2.59216309e-01  6.10351562e-04 -3.82080078e-01
 -6.25152588e-02  2.99316406e-01 -5.33142090e-01  1.38427734e-01
  5.88745117e-01 -1.45843506e-01 -3.10791016e-01  3.66271973e-01
  1.43005371e-01  1.57867432e-01  7.86132812e-01  5.16113281e-01
 -1.67770386e-01 -8.92211914e-01 -1.04858398e-01  5.99609375e-01
 -2.74902344e-01  5.14266968e-01 -8.74023438e-01 -9.94140625e-01
  3.44726562e-01 -1.07666016e+0

 -5.67871094e-01 -4.93591309e-01  2.85156250e-01 -2.73117065e-01]
Rails_admin 7
passing 7
a 7
param 7
to 7
List 7
scopes 7
[ 1.84936523e-01  4.33105469e-01  3.85742188e-02 -2.01416016e-02
 -5.15136719e-01  6.10351562e-05 -3.29589844e-01 -2.04345703e-01
  4.24316406e-01  2.22656250e-01  2.88024902e-01 -1.23168945e-01
  4.02770996e-01  7.08984375e-01 -1.13525391e-01  2.95898438e-01
  3.36425781e-01  3.52539062e-01 -6.71508789e-01 -1.08789062e+00
  1.89941406e-01 -1.21826172e-01 -7.65136719e-01  8.77929688e-01
 -2.91015625e-01 -3.97460938e-01 -4.31762695e-01  1.18041992e-01
  1.03271484e-01 -2.55706787e-01 -5.35461426e-01 -5.05371094e-01
 -1.18530273e-01 -9.05761719e-02  8.69140625e-02 -3.99169922e-01
  2.15148926e-01  1.04736328e-01  3.24218750e-01 -4.90051270e-01
  1.17187500e-01  1.38671875e-01  4.70214844e-01  1.73828125e-01
  3.71582031e-01 -7.69531250e-01  3.45764160e-02 -3.74023438e-01
 -3.03115845e-01 -3.59619141e-01 -1.15722656e+00 -3.19717407e-01
  2.71148682e-01  4.24804688e-01

  4.10156250e-02  5.49926758e-02 -1.80175781e-01  1.06262207e-01]
White 6
listing 6
effectiveness 6
against 6
SQL 6
Injection 6
[-0.16333008  0.7644043   0.00177765 -0.37634277  0.4107666   0.68554688
 -0.57666016 -0.58041382  1.21182251  0.96679688 -0.56420898 -0.0135498
  0.23312378  0.14794922 -0.49066162 -0.13330078  0.23272705  0.90081787
 -1.29931641 -0.42404366 -0.23839569  1.04052734 -0.99023438  0.72705078
 -0.28222656  0.32617188 -0.5016861   0.74462891 -0.06591797  0.81884766
  0.08306885 -0.27041626 -0.86425781 -1.23510742 -0.70166016  0.73937988
  0.65753174  0.59680176  0.32226562 -0.49325562  0.45898438 -0.78039551
  0.33178711  0.28637695 -0.71044922 -0.98144531 -0.31715393 -0.6658783
 -0.16223145  0.99890137 -0.47485352  0.14904785 -0.69218445 -1.05639648
  0.2565918   0.25463867 -0.4251709  -0.13369751  0.70117188  0.66259766
  0.13092041 -0.07019043 -0.66650391 -0.33300781 -0.91516113  0.06811523
  0.16784668  0.21810913 -0.44130707 -0.38415527  0.39746094 -0.7005615

  5.48095703e-01  6.25000000e-01 -1.32507324e+00  6.56738281e-01]
Subsonic 7
Three 7
(3) 7
CreatedBy 7
fields 7
not 7
updating 7
[-0.2232132  -0.02294922  0.25605011  0.21081543 -0.69921875 -0.18115234
 -0.05151367 -0.39794922 -0.03515625  0.37207031  0.00701904 -0.52160645
 -0.27209473  0.22802734 -0.20428467 -0.34605408  0.01660156  0.25746155
 -0.00915527 -0.44189453 -0.59082031  0.00466919 -0.15356445  0.00891113
  0.41015625  0.0859375  -0.43212891  0.45715332  0.00842285 -0.63946533
 -0.75390625 -0.38592529  0.06201172 -0.15194702 -0.2958374   0.13574219
  0.26025391 -0.09680176  0.52252197 -0.09301758  0.62426758  0.4005127
  0.30090332  0.27526855  0.34375763 -0.67173004 -0.14981079 -0.10107422
 -0.18920898  0.42565918 -0.87809753  0.29174805  0.32849121 -0.60119629
  0.22875977  0.69970703  0.37207031 -0.50708008  0.36230469 -0.27978516
 -0.71508789 -0.00402832 -0.39257812 -0.53759766 -0.09887695 -0.04675293
  0.07141113  0.29858398 -0.53709412  0.11706543 -0.33511353  0.60839

  0.03417969 -0.13769531  0.38883972 -0.14428711 -0.60668945  0.13916016]
Implementing 4
a 4
2D 4
Map 4
[-0.03027344 -0.75036621  0.15020752  0.28173828 -0.12548828  0.48974609
 -0.22351074  0.28393555 -0.54785156  0.97851562  0.4883728   0.078125
 -0.13427734 -0.24584961 -0.43115234 -0.05957031  0.10498047 -0.01318359
 -0.26757812 -0.14794922  0.10839844  0.06103516 -0.13183594  0.67578125
  0.09851074  0.13183594 -0.12768555  0.09771729  0.46435547 -0.53259277
 -0.29663086 -0.16210938 -0.15722656 -0.59460449 -0.22314453 -0.20410156
  0.6472168   0.33566284  0.04101562  0.09606934  0.30371094  0.09033203
  0.1796875   0.21984863 -0.04626465 -0.80908203 -0.09542847 -0.54833984
 -0.11279297  0.03417969 -1.17773438 -0.30224609  0.0637207  -0.97851562
 -0.02087402  0.4362793  -0.21240234 -0.49511719  0.31494141  0.24511719
 -0.52478027  0.20922852  0.03955078 -0.31884766 -0.64306641  0.1081543
 -0.39920044  0.17626953  0.15039062 -0.55957031 -0.49023438 -0.04528809
  0.45019531 -0.3024902

 -6.63146973e-01  3.82232666e-01 -2.31079102e-01 -4.74441528e-01]
Web 13
Api 13
How 13
to 13
add 13
a 13
Header 13
parameter 13
for 13
all 13
API 13
in 13
Swagger 13
[ 3.73535156e-01 -1.52893066e-01 -8.26934814e-01  6.23413086e-01
  2.91992188e-01 -5.73974609e-01 -2.26165771e-01 -2.20573425e-01
  7.90039062e-01 -3.02734375e-02 -2.70446777e-01 -9.07226562e-01
 -5.40466309e-02  9.90600586e-02 -1.29870605e+00  1.70211792e-02
  7.78808594e-02  6.61254883e-01 -1.37310791e+00 -6.68006897e-01
 -7.79785156e-01  1.01898193e+00  5.51193237e-01  1.18872070e+00
  8.68225098e-02 -2.74902344e-01  3.05175781e-04  7.85186768e-01
  1.28594971e+00 -4.88883972e-01 -1.20152283e+00  8.42285156e-03
 -3.16284180e-01 -8.13796997e-01 -3.13995361e-01 -2.78808594e-01
  2.14599609e-01 -1.61148071e-01 -7.45483398e-01  7.98217773e-01
  1.18756104e+00  3.15948486e-01  1.69958496e+00  1.10772705e+00
 -7.70431519e-01 -6.85119629e-01 -5.81604004e-01 -3.38363647e-01
  5.07896423e-01 -8.29315186e-03 -2.04406738e+00  3.77

  0.07458496 -0.15087891 -0.16882324 -0.48791504  0.25091553 -0.18792725]
Determine 9
which 9
group(s) 9
the 9
current 9
user 9
account 9
belongs 9
to? 9
[ 3.70361328e-01 -2.56195068e-01  2.97119141e-01  4.14062500e-01
 -6.02630615e-01  3.33618164e-01  5.41564941e-01 -2.68981934e-01
  8.19641113e-01 -1.33666992e-01 -7.52929688e-01  5.22338867e-01
  5.78002930e-02  3.40942383e-01 -7.58575439e-01  1.67236328e-01
 -2.87503481e-01  4.40032959e-01 -5.22216797e-01  1.08276367e-01
 -4.19433594e-01  3.93676758e-03 -9.19128418e-01  5.93177795e-01
  7.37152100e-02 -1.74804688e-01 -1.54296875e+00  1.07046509e+00
  4.98825073e-01 -2.06359863e-01 -6.94290161e-01 -8.76586914e-01
 -2.48576164e-01  7.31872559e-01 -1.58935547e-01 -6.98608398e-01
 -4.97558594e-01 -3.57543945e-01  4.62371826e-01 -3.88183594e-02
 -1.42578125e-01  2.05810547e-01  9.42810059e-01 -2.23419189e-01
 -4.29275513e-01 -1.39390564e+00  5.36132812e-01  4.37072754e-01
 -9.32617188e-02 -5.75927734e-01 -5.16479492e-01 -5.23559570e-01
 

 -7.03125000e-02 -5.99975586e-01 -6.46057129e-01  4.15649414e-01]
dozer 4
Boolean 4
property 4
mapping 4
[ 1.19482422 -0.90039062  0.06982422  0.39402771 -0.50750732 -0.52050781
 -0.45068359 -0.18041992  0.04589844 -0.01049805  1.12362671  0.21191406
 -0.08398438  0.34054565  0.12451172  0.34228516 -0.31396484  0.1328125
 -0.56835938 -0.76953125 -0.51977539 -0.54272461 -0.90234375  0.97066903
 -0.34521484 -0.31335449 -0.82910156  0.86523438  0.26245117 -0.37802887
 -0.34661865  0.1081543   0.09521484  0.10229492 -0.48629761  0.04406738
  0.64489746  0.31689453  0.64233398 -0.46496582  0.65002441  0.36645508
  0.61437988  0.09008789 -0.42993164 -1.11132812 -0.34960938 -0.36230469
  0.21606445  0.24951172 -0.37695312 -0.21398926 -0.05386353 -0.56689453
 -0.42431641  1.05273438  0.04833984 -0.56225586  0.21069336  0.54724121
 -0.01318359 -0.0144043   0.09399414  0.07788086 -1.13183594  0.2121582
 -0.18457031 -0.40625    -0.51904297 -0.79833984  0.19213867 -0.77294922
  0.50708008 -0.51611

 -0.6776123  -0.17726898 -0.0982666   0.16783142 -0.5637207   0.15789795]
How 17
to 17
use 17
different 17
module 17
names 17
for 17
same 17
URL. 17
There 17
will 17
be 17
different 17
exports 17
from 17
all 17
modules 17
[-0.01226807 -0.27751827  1.2492218   1.66769409 -0.71295166 -0.21704102
  0.63760376 -0.17218018  0.93444061  1.46704102  0.0703125  -0.60791016
 -1.05883789  0.3850708  -2.2074585   1.81860352  0.4666748   1.58563232
 -0.92364502 -1.13309479 -1.09436035  0.8286438  -0.57089233  0.07455444
 -0.1600647   0.48173523 -1.23272705  0.45932007  1.1192627  -0.09790039
 -0.60671997 -1.11340332 -0.25097656  0.58210373  1.25228882 -0.23986816
 -0.2288208  -0.80908203  0.14892578  0.7427063   2.10498047 -0.07679749
  1.13842773 -0.3871727   0.14239502 -1.66470337 -0.67999268  0.52368164
 -0.61675262  0.48474884 -0.7376709   0.41384888 -0.19683838 -0.94927979
  0.30460739 -0.36797523 -0.10617065 -0.47741699  1.89242554 -0.81843567
 -0.27636719  0.44940186 -1.16711426 -1.86279297

  6.01074219e-01 -4.68627930e-01 -3.72558594e-01 -2.71850586e-01]
Flex++ 3
Bisonc++ 3
parser 3
[ 3.49609375e-01  1.03149414e-02  3.30078125e-01  9.81445312e-02
 -1.64062500e-01  1.25000000e-01 -1.61132812e-01  1.39648438e-01
  1.13769531e-01  2.32421875e-01  2.08984375e-01  1.28173828e-02
  4.24804688e-02 -1.11816406e-01 -1.03759766e-02  8.30078125e-02
  2.83203125e-01  2.71484375e-01 -1.19628906e-01 -4.88281250e-01
 -2.39257812e-01  3.10546875e-01  1.86767578e-02  2.45117188e-01
 -2.58789062e-02 -6.28906250e-01  1.15234375e-01  4.19921875e-01
 -2.19726562e-02  9.37500000e-02 -9.76562500e-03 -3.57421875e-01
 -2.15820312e-01 -1.59179688e-01  1.26953125e-01  2.55859375e-01
 -1.09252930e-02  2.91015625e-01  6.93359375e-02 -2.26562500e-01
  3.63281250e-01  8.83789062e-02  5.78125000e-01  2.81250000e-01
  1.30859375e-01 -3.37890625e-01 -3.84765625e-01 -1.54296875e-01
 -2.60009766e-02  1.01562500e-01 -1.33789062e-01 -5.50781250e-01
 -2.06054688e-01 -4.14062500e-01 -1.75781250e-01  1.99218750

 -1.02416992e-01 -4.96643066e-01 -7.00683594e-02 -1.37695312e-01]
Updating 7
using 7
subquery 7
that 7
returns 7
multiple 7
rows 7
[ 0.32678223  0.71488953 -0.17626953  0.12509155 -0.67333984 -0.21228027
 -0.15435791 -0.62646484  0.20556641  0.75183105 -0.18157196 -0.27227783
  0.07788086  1.28771973 -0.67529297 -0.19033813 -0.03991699  0.3918457
 -0.48362732 -0.35119629 -0.36514854  0.2277298  -0.7409668   0.01501465
 -0.19781494 -0.46862793 -0.44671631  0.95092773 -0.1005249   0.03097534
 -0.00610352  0.05731201 -0.03125    -0.5534668   0.08789062  0.13616943
  0.22070312 -0.0871582   0.02062988 -0.00769043  0.73120117  0.41949463
  0.2865448  -0.39746094  0.55871582 -0.69122314 -0.52392578  0.75332642
 -0.00231934 -0.22595215 -0.44030762 -0.07580566 -0.66711426 -0.34680176
 -0.33331299  0.45031738  0.27587891 -1.03320312  0.60868835  0.25683594
 -0.50378418  0.06103516 -0.20483398 -0.45507812  0.34011841  0.42785645
 -0.1986084   0.31945801 -0.1708374  -0.4206543   0.7487793   0.380

 -0.5229187  -1.00637817  0.18862915  0.02026367 -0.49121094  0.28668213]
Select 17
rows 17
where 17
column 17
1 17
value 17
is 17
the 17
same 17
but 17
column 17
2 17
value 17
is 17
different 17
in 17
PostgreSQL 17
[ 0.68334961  0.18914795  0.1998291   0.97753906 -1.02929688  0.15143585
  1.24121094 -0.88305664  1.06203842  1.70497131 -0.56086922 -0.61663818
  2.43414307  1.39033508 -1.40917969  1.01104736  0.24188232  1.70540619
 -0.30047607 -1.35702515 -0.47167969 -0.03585815 -0.88751221 -0.1546402
 -0.37866211  0.91577148 -1.22618103  2.66412354  0.60490417  0.04315186
 -0.78806996 -1.12487793 -0.4552002  -1.05245972  1.18481445  0.80926514
  0.29486084  0.58746338 -0.47955322  0.6940918   1.99636841 -0.5670929
  0.77703857  0.10479736 -0.45764542 -0.18077087  0.22515869 -0.12774658
  0.51733398  0.1373291  -0.95166016  0.49835205 -1.01080322 -0.81469727
 -0.6003418   0.51665497  0.29656982 -0.18234253  0.01806641 -1.04873657
 -0.70904541  0.43395996 -1.17370605 -1.73236084 -0.0959

  0.24487305 -0.41143799  0.01879883 -0.26141357 -0.01068115  0.20947266]
iOS 10
- 10
ViewController 10
not 10
being 10
released 10
when 10
popped 10
under 10
ARC 10
[ 4.71191406e-02 -8.52050781e-01  2.36816406e-02  2.52441406e-01
 -8.67431641e-01 -1.05224609e+00 -2.13211060e-01  2.18994141e-01
  1.00097656e+00  8.51318359e-01 -1.70867920e-01 -6.63085938e-01
 -4.67361450e-01 -2.91870117e-01 -8.50219727e-01  2.00942993e-01
  7.62512207e-01  4.98107910e-01 -2.25097656e-01 -1.11083984e-01
 -1.46942139e-01  3.89404297e-01  1.05773926e-01 -4.54940796e-02
  1.79138184e-01  1.79964066e-01 -2.07397461e-01  1.65240479e+00
  1.58203125e-01 -8.52294922e-01 -6.27441406e-01  5.77819824e-01
 -3.72985840e-01  9.20410156e-02 -5.06591797e-02 -2.03125000e-01
  6.10351562e-04 -7.09838867e-01  4.45388794e-01  2.91748047e-01
  8.92646790e-01 -5.84487915e-02  7.57507324e-01  1.20849609e-01
  3.54080200e-02 -6.30615234e-01 -1.38339996e-01  4.78515625e-01
  5.14801025e-01  1.25976562e-01 -6.25656128e-01  4.58

  5.11550903e-02 -7.72949219e-01  2.11425781e-01  1.89208984e-01]
html, 5
javascript 5
- 5
popup 5
window 5
[ 1.93847656e-01 -4.76074219e-01 -5.71777344e-01  2.92602539e-01
 -1.38671875e+00  1.06689453e-01  8.54980469e-01 -2.79785156e-01
  6.34765625e-02  1.29150391e-01  1.76269531e-01 -5.86914062e-01
  1.69921875e-01 -4.37011719e-01 -9.07226562e-01 -3.98773193e-01
  1.00878906e+00  6.10839844e-01  3.11523438e-01 -6.50878906e-01
  4.58984375e-01  4.95849609e-01 -3.80859375e-02 -1.24023438e-01
  3.52050781e-01 -2.70874023e-01  6.49414062e-01  1.30664062e+00
  1.08984375e+00 -6.08398438e-01 -1.11132812e+00 -1.00372314e-01
  0.00000000e+00 -2.37243652e-01  1.08642578e-01 -6.02539062e-01
 -8.88671875e-02  4.11621094e-01 -2.05627441e-01  3.66943359e-01
  6.47949219e-01 -1.85302734e-01  8.06396484e-01  8.69140625e-02
 -2.37304688e-01  1.92871094e-01 -3.53027344e-01 -6.20117188e-01
 -4.62280273e-01  1.87255859e-01 -6.30859375e-01 -3.65234375e-01
  1.63574219e-01 -5.42968750e-01 -1.35742188e+0

  0.16064453 -0.82666016 -0.53399658 -0.12255859 -0.00805664  0.38134766]
How 10
to 10
check 10
whether 10
a 10
column-name 10
is 10
"name" 10
or 10
not? 10
[ 2.06268311e-01  1.20239258e-02  4.55322266e-01  6.61743164e-01
 -4.73876953e-01  6.64062500e-01  6.89453125e-01  1.41601562e-02
  5.38085938e-01 -2.58422852e-01 -7.81250000e-03 -6.03027344e-01
 -4.82910156e-01 -1.85791016e-01 -3.18603516e-01  7.50732422e-01
  6.18896484e-02  5.06599426e-01 -1.58695221e-01 -3.32397461e-01
 -2.14355469e-01 -5.87234497e-02  2.45117188e-01 -1.08886719e-01
 -4.56787109e-01  2.51464844e-02 -7.85644531e-01  3.84277344e-01
  1.67846680e-01  1.68945312e-01 -2.98767090e-02 -8.17871094e-02
 -3.23974609e-01 -4.38476562e-01 -2.76000977e-01  3.43017578e-01
 -1.79443359e-02  1.78222656e-02  9.93728638e-02  1.73828125e-01
 -2.36816406e-01 -1.52435303e-01  4.56909180e-01 -6.46728516e-01
  2.50549316e-02  1.47827148e-01  4.04052734e-02  1.47460938e-01
  1.59301758e-01  4.59350586e-01 -2.68554688e-01  7.05810547e-0

 -0.41307068 -0.30787659  0.04724884 -0.79559326 -0.67428589  0.04309082]
Different 8
colors 8
below/above 8
bar 8
of 8
boxplot 8
in 8
ggplot2 8
[ 0.12597656  0.02832031  0.38745117  0.66992188 -0.34716797 -0.15148926
  0.04003906 -0.51367188  0.31213379  0.38204956 -0.02148438 -0.21313477
 -0.05371094 -0.09143066 -0.73291016  0.0670166  -0.45336914  1.12988281
 -0.06323242 -0.83447266 -0.04052734  0.39135742 -0.05371094 -0.31982422
 -0.32049561  0.51733398 -0.35191345  0.51489258  0.32189941 -0.14111328
 -0.6496582   0.27026367  0.91125488  0.48242188 -0.48883057  0.06726074
 -0.14550781  0.00292969 -0.44828796  0.59716797  0.49072266 -0.01196289
  0.05249023 -0.0043335   0.2668457  -0.14648438 -0.64343262 -0.17272949
  0.08795166  0.3671875  -0.60025024  0.47436523  0.07470703 -0.50230408
 -0.04101562  0.11894989  0.13330078 -0.37963867 -0.15087891 -0.46630859
 -0.64160156 -0.01367188 -0.48974609 -0.56323242  0.02099609  0.05102539
  0.16584778  0.5859375   0.42700195  0.14306641  0.

  2.94677734e-01 -5.76171875e-02 -3.02001953e-01 -6.10351562e-02]
WPF: 5
TreeView 5
from 5
KeyValuePair<int, 5
string> 5
[ 0.10571289 -0.18669796 -0.05859375  0.05286789 -0.15356445  0.03649902
  0.05664062 -0.12359619 -0.19238281  0.19995117 -0.09912109  0.00488281
  0.09033203  0.08703613 -0.27685547 -0.06970215  0.28320312  0.30371094
 -0.28479004 -0.22137451 -0.11889648  0.24121094 -0.24023438  0.27392578
 -0.08834839 -0.00488281 -0.02319336  0.47460938  0.18798828 -0.16162109
 -0.04638672 -0.06933594 -0.12963867  0.09228134 -0.12792969 -0.04736328
 -0.04296875 -0.16333008 -0.23828125  0.20263672  0.0769043  -0.07177734
  0.12768555  0.11962891 -0.08023071 -0.3508606  -0.23535156 -0.19189453
 -0.08300781 -0.07958984 -0.49682617 -0.09686279  0.04351807 -0.38330078
 -0.05517578  0.30664062 -0.3125      0.09301758  0.10070801 -0.02001953
 -0.07861328 -0.07788086 -0.15258789 -0.19445801 -0.00488281 -0.18103027
  0.11279297  0.1015625   0.07220459  0.20141602  0.27197266 -0.23266602
  0

 -0.31542969 -0.24770927 -1.11547852  0.02514648 -0.34820557  0.25036621]
how 8
to 8
get 8
letter 8
entered 8
instantly 8
in 8
android? 8
[ 3.80371094e-01 -1.95312500e-03 -1.34277344e-02  5.85937500e-02
 -3.02001953e-01  4.88281250e-04 -3.24462891e-01 -4.34570312e-01
  2.63549805e-01  8.59375000e-02 -1.68457031e-01 -9.64111328e-01
 -3.53393555e-02  5.23147583e-01 -8.29101562e-01 -1.57226562e-01
 -3.72676849e-02  8.12744141e-01  3.26660156e-01  3.29895020e-02
 -9.98535156e-02  3.54370117e-01 -4.07104492e-01 -1.40747070e-01
  3.16097260e-01 -3.18115234e-01  2.54409790e-01  2.14843750e-01
  2.77099609e-01 -8.25500488e-02 -1.26342773e-01 -7.94677734e-02
 -4.30541992e-01 -1.52832031e-01  5.01647949e-01  1.46972656e-01
  7.39501953e-01  3.69628906e-01  4.83398438e-02  3.75488281e-01
 -1.20849609e-02 -4.30908203e-02  7.21191406e-01  5.29174805e-02
 -1.62353516e-01  2.51464844e-02 -1.39770508e-02 -1.62536621e-01
 -4.06250000e-01  5.31250000e-01 -5.41259766e-01  5.23437500e-01
 -1.33178711e-01 

  2.31918335e-01 -2.65197754e-01 -6.61773682e-02  1.80664062e-02]
java 3
join 3
method 3
[ 1.10595703e-01 -1.32080078e-01 -3.17382812e-02  9.29687500e-01
 -4.10644531e-01  4.46899414e-01  2.69531250e-01 -3.79638672e-01
  9.76562500e-04  3.01757812e-01 -1.77001953e-03 -2.53417969e-01
  2.40859985e-01 -2.45040894e-01 -6.00585938e-02  7.14843750e-01
  9.17968750e-02 -1.18652344e-01  1.53808594e-02 -4.13574219e-01
 -5.85937500e-03  3.53820801e-01 -2.63671875e-01  3.12500000e-01
 -3.90998840e-01 -1.56860352e-01 -3.80859375e-01  5.15136719e-01
 -4.53125000e-01 -3.20312500e-01  2.68310547e-01 -1.79443359e-01
 -3.57116699e-01 -3.11279297e-02 -1.70043945e-01  4.11376953e-01
  1.02844238e-01 -2.28576660e-01  6.47949219e-01  1.84570312e-01
  2.33642578e-01 -7.08007812e-03  3.96240234e-01  2.06542969e-01
 -2.40234375e-01  2.01232910e-01 -1.04296875e+00 -2.44247437e-01
 -1.39648438e-01  7.89062500e-01 -2.96386719e-01  3.12377930e-01
 -1.90917969e-01 -5.90087891e-01  1.21826172e-01 -2.05078125e-02
 

  0.79467773 -0.07397461 -0.21801758 -0.45239258 -0.09783936 -0.01696777]
Check 11
if 11
an 11
image 11
is 11
already 11
in 11
gallery 11
and 11
retrieve 11
it 11
[ 2.09381104e-01  1.05149651e+00 -2.50976562e-01  1.86737061e-01
 -6.67236328e-01  4.83276367e-01  5.47790527e-01 -8.33496094e-01
  6.89941406e-01 -1.70341492e-01 -2.42187500e-01 -1.16024780e+00
  3.73046875e-01  7.06176758e-01 -1.08007812e+00  4.27032471e-01
  9.36035156e-01  1.02331543e+00  2.15682983e-01 -2.93823242e-01
  2.94677734e-01  2.32421875e-01  1.15478516e-01 -4.06127930e-01
 -2.06499100e-01 -5.58593750e-01 -1.25572205e+00  1.31274414e+00
  1.13641357e+00  2.91748047e-01 -4.03900146e-01  3.79516602e-01
 -9.32312012e-01  1.85073853e-01 -1.08398438e-01 -1.01782227e+00
 -4.12902832e-02 -8.74938965e-02  5.79742432e-01 -8.66699219e-03
  5.53955078e-01 -2.37060547e-01  2.91015625e-01 -6.40869141e-03
  7.00683594e-02 -6.66809082e-01 -6.16638184e-01  2.17926025e-01
  6.52656555e-01  2.73559570e-01 -2.05886841e-01  1.19598

 -1.04614258e-01 -4.09606934e-01 -2.31140137e-01  3.76464844e-01]
Javascript 4
unload 4
page 4
condition 4
[ 0.50195312  0.3348999  -0.47698975 -0.28564453 -0.91888428  0.42553711
  0.25900269  0.07789612  0.76953125  0.1015625  -0.06445312  0.02636719
 -0.16894531  0.6640625  -0.47265625 -0.10839844 -0.17193604  0.05810547
 -0.32470703 -0.03149414  0.87646484  0.31103516  0.19824219  0.70922852
  0.15161133 -0.48158264 -0.14111328  0.44799805 -0.05541992 -1.00756836
 -0.54199219  0.36230469 -0.11914062  0.2244873   0.09082031  0.37402344
  0.21777344  0.47070312 -0.54931641  0.29296875  0.43701172 -0.20849609
  0.35327148  0.42285156  0.15087891 -0.42504883 -0.18554688 -0.362854
 -0.52575684 -0.15087891 -0.22387695  0.07574463 -0.51782227 -0.81005859
 -0.58813477  0.34082031  0.18054199 -0.07910156 -0.12573242 -0.46069336
  0.20410156 -0.30224609 -0.47753906  0.24023438  0.23712158 -0.07702637
  0.36487579 -0.20544434 -0.18408203  0.79611206  0.83911133  0.24316406
  0.01489258 -0.271

 -2.90893555e-01 -1.17016602e+00 -7.09838867e-01  4.57336426e-01]
embed 5
json 5
file 5
within 5
d3.js 5
[ 8.93554688e-02 -3.17138672e-01 -1.99218750e-01  2.91870117e-01
 -6.23535156e-01 -3.07617188e-02 -2.03613281e-01 -2.16857910e-01
  6.44836426e-01 -2.07153320e-01 -4.65820312e-01 -1.86401367e-01
 -3.34472656e-01  6.54785156e-01 -2.77343750e-01 -4.89013672e-01
  5.02929688e-01  2.63427734e-01 -6.56738281e-01 -5.52734375e-01
 -2.78320312e-01  4.55322266e-02 -3.85742188e-01  2.47737885e-01
  4.13208008e-01  3.12500000e-02  1.92871094e-01  1.21429443e-01
 -7.67822266e-02 -2.12280273e-01 -3.09301376e-01  1.93511963e-01
 -1.18652344e-01  2.92968750e-02 -8.66210938e-01 -6.12304688e-01
  2.27050781e-01  4.24804688e-02  4.44335938e-02  1.48071289e-01
  6.31835938e-01  6.15722656e-01  1.85058594e-01  1.10717773e-01
 -1.55761719e-01 -1.93237305e-01 -3.82202148e-01 -5.77148438e-01
 -2.33276367e-01  3.28857422e-01 -1.10839844e-01 -1.04687500e+00
  9.30175781e-02 -6.18164062e-01 -3.42315674e-01  

  1.29150391e-01 -4.66552734e-01  9.30786133e-02  1.91040039e-01]
routes.MapRoute 2
Confusion 2
[ 3.78906250e-01 -2.02636719e-02 -3.20312500e-01  1.89208984e-02
 -2.11914062e-01 -2.65625000e-01 -2.19726562e-01 -9.39941406e-03
 -2.23632812e-01  3.49609375e-01 -1.77734375e-01  1.77734375e-01
 -1.83593750e-01  1.04980469e-02 -3.33984375e-01 -1.35742188e-01
  4.86328125e-01  1.02050781e-01 -4.82421875e-01 -1.39648438e-01
 -2.02148438e-01  1.89453125e-01  4.10156250e-01  2.81250000e-01
 -1.59179688e-01  2.87109375e-01 -2.29492188e-01 -3.33984375e-01
  4.27734375e-01 -1.44531250e-01 -1.43554688e-01 -2.73437500e-01
  8.10546875e-02 -1.49536133e-02 -1.53320312e-01  1.87500000e-01
 -1.16699219e-01 -2.08007812e-01  4.08203125e-01 -1.31835938e-01
 -1.04980469e-01 -3.45703125e-01 -1.59179688e-01 -3.86718750e-01
 -2.57812500e-01 -2.19726562e-01 -2.51953125e-01  2.28515625e-01
 -1.42578125e-01  1.28906250e-01 -5.66406250e-01  3.51562500e-01
 -3.85742188e-02 -5.95703125e-02  3.27148438e-02  3.4179687

  0.72564697 -0.81754684  0.30029297 -1.01831055 -0.26287842 -0.17492676]
Onvif 8
Simulator 8
For 8
Testing 8
Onvif 8
web 8
service 8
client 8
[-1.56250000e-01 -7.48046875e-01 -5.11169434e-02 -4.21752930e-02
  8.56056213e-02  1.06445312e+00  1.26823425e-01  6.55029297e-01
  3.12110901e-01  6.75537109e-01  1.16271973e-01 -1.45263672e-01
 -8.23974609e-02  2.15942383e-01  4.66064453e-01  4.83093262e-01
  4.80957031e-02  4.24255371e-01 -4.64691162e-01 -7.19482422e-01
  1.10290527e-01  3.13476562e-01 -1.49902344e-01  1.24218750e+00
 -2.78564453e-01 -3.05175781e-03 -1.13587952e+00  7.29736328e-01
  2.75878906e-02 -3.51236343e-01 -7.51953125e-02 -7.50488281e-01
 -6.25488281e-01 -8.66943359e-01  9.53735352e-01 -2.36450195e-01
 -4.34112549e-01 -4.24133301e-01  5.86425781e-01 -7.05444336e-01
 -8.11767578e-02 -2.84774780e-01 -3.85009766e-01 -1.27929688e-01
  2.64892578e-02 -1.08520508e+00 -9.09881592e-01  1.87072754e-01
  2.65930176e-01  1.01782227e+00 -3.18115234e-01  9.59472656e-02
  2.89306641

  1.24389648e-01 -4.57031250e-01 -1.77490234e-01  2.24853516e-01]
Is 9
produces 9
of 9
@RequestMapping 9
sensitive 9
to 9
order 9
of 9
values? 9
[ 0.29394531  0.27587891  0.00341797  0.24462891 -0.35131836  0.57763672
  0.52148438 -0.19329834  0.01391602 -0.03829956 -0.64770508 -0.49365234
  0.14672852 -0.14221191 -0.00390625  0.82177734 -0.72874451  0.70874023
 -0.53100586  0.11376953 -0.13867188 -0.04980469 -0.24511719  0.02392578
 -0.29394531 -0.35943604 -0.4576416   0.51025391 -0.27038574  0.421875
  0.02496338 -0.11035156 -0.32568359  0.43737793 -0.09265137 -0.09375
 -0.1394043   0.20220947 -0.04595947  0.1105957   0.21228027 -0.21697998
  0.28710938 -0.23254395 -0.1663208  -0.18481445 -0.40820312 -0.0050621
 -0.32019043 -0.36437988 -0.70483398 -0.02509308  0.39593506 -0.06103516
  0.24511719  0.40063477 -0.19775391 -0.72949219  0.42651367  0.17407227
  0.38623047  0.11401367 -0.65991211 -0.61865234 -0.07455063  0.11520386
 -0.39477539  0.19628906  0.13232422 -0.25976562  0.241333

 -3.60534668e-01 -5.57128906e-01 -5.99609375e-01  8.19213867e-01]
iTunes 9
Connect 9
submission 9
code 9
signing 9
entitlements 9
error 9
Xcode 9
8 9
[ 3.91357422e-01 -2.27508545e-01 -7.81433105e-01  5.10009766e-01
 -1.53393555e+00 -2.79541016e-01 -4.22180176e-01  2.04345703e-01
  1.10351562e+00  1.14733887e+00 -6.36459351e-01  2.12707520e-02
  8.68530273e-02 -6.03515625e-01 -7.30834961e-01  3.20800781e-01
  1.97265625e+00  6.40747070e-01  1.91528320e-01 -6.39177322e-01
  1.06567383e-01  1.21728516e+00  8.97338867e-01  9.76074219e-01
  9.19311523e-01 -2.76123047e-01 -3.39721680e-01  1.35351562e+00
  5.13793945e-01 -3.86657715e-01 -1.03344727e+00 -4.68750000e-01
 -6.87805176e-01 -8.14521790e-01  1.45263672e-01 -5.56457520e-01
 -3.31787109e-01 -2.67578125e-01  5.22293091e-01  3.41674805e-01
  4.37805176e-01 -5.47027588e-01  3.90747070e-01  3.84887695e-01
 -4.57763672e-02 -1.06787109e+00 -3.26049805e-01 -1.62658691e-02
 -1.07394409e+00 -4.94873047e-01 -9.69238281e-01 -6.77368164e-01
  9.9

 -3.15185547e-01 -1.06640625e+00 -4.56375122e-01  8.58398438e-01]
How 14
to 14
merge 14
column 14
data 14
of 14
the 14
same 14
value 14
and 14
sum 14
its 14
specific 14
data 14
[ 5.44677734e-01 -1.04888916e-01  5.34530640e-01  8.82934570e-01
 -6.59912109e-01  1.94091797e-02  3.24462891e-01  9.47294235e-02
  1.32440186e+00 -2.21481323e-02 -7.50244141e-01 -3.80004883e-01
 -1.14685059e-01  1.04125977e+00 -1.75781250e+00  7.09594727e-01
 -8.13049316e-01  1.21990967e+00  2.24731445e-01 -1.88366699e+00
 -4.00878906e-01 -8.24890137e-02 -5.84320068e-01 -1.02882385e-01
  3.48876953e-01  1.21606445e+00 -5.69778442e-01  1.21099854e+00
 -2.98599243e-01 -2.41333008e-01  4.66064453e-01 -1.24304199e+00
 -1.86279297e-01 -8.69964600e-01 -2.58911133e-01 -2.97790527e-01
  3.99291992e-01  8.11767578e-03  4.69726562e-01  9.94628906e-01
  4.66430664e-01  4.28810120e-01  1.04544067e+00  2.54760742e-01
 -4.49645996e-01 -9.21531677e-01 -3.09467316e-01  7.09472656e-01
  6.78955078e-01  9.90386963e-01  5.7714843

  1.05102539e-01 -5.48889160e-01 -4.41436768e-02  5.12695312e-03]
SlidingTabLayout 3
With 3
NavigationDrawer 3
[ 1.95312500e-02  6.12792969e-02  2.31933594e-03  2.95410156e-02
 -5.02929688e-02  9.13085938e-02  1.08886719e-01 -8.10546875e-02
  7.51953125e-02  1.17675781e-01 -7.53784180e-03  2.09960938e-02
 -1.91406250e-01  3.16406250e-01  1.86523438e-01  2.35595703e-02
  3.37890625e-01  1.98242188e-01 -1.60156250e-01  5.41992188e-02
 -2.45117188e-01  7.03125000e-02  3.16406250e-01 -7.47070312e-02
  6.54296875e-02 -1.18164062e-01 -1.80664062e-02 -6.29882812e-02
  5.68847656e-02  2.14843750e-01  1.77734375e-01 -1.09375000e-01
 -2.23632812e-01  6.25000000e-02 -5.76782227e-03  1.84326172e-02
 -2.37304688e-01  2.92968750e-02  3.61328125e-02  1.13769531e-01
  5.54199219e-02 -4.56542969e-02 -2.30468750e-01  1.20117188e-01
  2.73437500e-02 -4.85839844e-02 -1.84570312e-01 -1.41601562e-01
  2.85156250e-01  3.08593750e-01 -2.87109375e-01  1.03515625e-01
  1.90429688e-02 -2.11914062e-01 -6.80541992

  9.05761719e-02 -7.44384766e-01  4.74365234e-01 -1.44531250e-01]
Using 6
NSPredicate 6
with 6
nested 6
arrays 6
ios 6
[-0.31317139 -0.21057129  0.12670898 -0.11364746 -0.27981567 -0.13635254
  0.09893799 -0.29052734  0.05371094  0.83129883 -0.01489258 -0.30010986
 -0.4730835   0.24403381  0.0246582   0.14404297 -0.01147461  0.92382812
 -0.8215332  -0.50732422 -0.36889648  0.33105469 -0.37176514  0.48046875
 -0.26269531 -0.44506836 -0.58569336  1.05914307 -0.63208008 -0.56433105
  0.12744141  0.28686523 -0.05456543  0.06982422  0.05810547 -0.32978821
 -0.22192383 -0.0027771   0.23657227 -0.02655029  1.19677734  0.57336426
  0.19216919  0.03417969  0.5324707  -1.03155518 -0.19921875 -0.42285156
  0.53540039  0.35449219 -0.7479248  -0.68743896  0.11230469 -0.93164062
  0.0480957   0.14233398 -0.37646484 -0.60192871  0.38842773  0.15917969
 -0.12213135 -0.14678955 -0.57666016 -0.71435547 -0.22692871 -0.50561523
  0.09313965  0.33538818 -0.2668457  -0.44726562  1.01721191  0.02098149
  0.3

 -1.43066406e-01  1.20849609e-02  2.99743652e-01 -2.36450195e-01]
Android 8
TableLayout: 8
preserve 8
column 8
widths 8
between 8
different 8
tables 8
[-0.3046875  -0.2583313  -0.48034668  0.94140625 -0.25195312 -0.05883789
  0.26721191  0.23568726  0.09773064  0.87689972 -0.68566895 -0.18920898
  1.12280273  1.06596756 -0.04371643  0.36865234  0.09985352  1.01196289
 -0.51953125 -1.27587891  0.24560547 -0.22680664 -0.54232788 -0.19448853
 -0.65270996  0.60528564 -1.27856445  1.03588867  0.33984375 -0.66308594
 -0.3157959   0.24176025  0.53320312  0.6295166  -0.20849609  0.06219482
  0.30389404  0.52832031  0.81152344  1.12792969  0.84790039  0.41760254
  0.03991699 -0.48876953 -0.13600159 -0.40227509 -1.13427734 -0.25678253
  0.29711914 -0.89535522 -0.83227539  0.07427979 -1.3034668  -0.47277832
 -0.29318237  0.64770508  0.0592041  -0.74456787 -0.29049683 -0.75518799
 -0.20629883  0.03155518 -0.88378906 -0.93164062  0.01348877 -0.199646
 -0.20550537  0.70178223  0.00297546  0.43395996

 -5.85937500e-03 -2.10937500e-01 -1.69311523e-01  3.25439453e-01]
Is 10
it 10
possible 10
to 10
create 10
a 10
UINavigationController 10
within 10
a 10
ModalPopup? 10
[-0.01513672 -0.17918587  0.23046875  0.37445068 -0.51928711 -0.0496521
  0.28149414 -0.18746948  0.38061523 -0.18429565 -0.82421875 -0.38674927
 -0.32177734 -0.07595825  0.02416992  0.39526367  0.00813293  0.59564209
  0.11610413 -0.1893158  -0.07324219  0.3380127   0.11364746 -0.36090469
 -0.109375    0.26977539 -0.6237793   0.29272461  0.03717041  0.1986084
 -0.26385498  0.33417511 -0.41333008 -0.08789062 -0.40234375 -0.05029297
  0.03839111 -0.08261108  0.37554932  0.71826172  0.50244141 -0.10455322
  0.10028076 -0.39831543 -0.49121094 -0.08569336 -0.21350098  0.05027771
  0.28100586  0.07495117 -0.09277344  0.24569702 -0.06298828 -0.09301758
  0.18261719  0.2935791  -0.15263367 -0.63989258  0.39419556 -0.0316925
  0.30780029  0.25952148 -0.60058594 -0.40405273 -0.3236084   0.36813354
 -0.32782745  0.5737915  -0.60205

 -2.03613281e-01 -1.30926514e+00 -6.59423828e-01  5.22460938e-02]
android: 9
removing 9
item 9
from 9
listview 9
is 9
not 9
working 9
properly 9
[-2.82623291e-01  8.63122940e-02  1.35803223e-01  4.34875488e-01
 -1.02624512e+00  3.25561523e-01  4.84375000e-01 -3.17138672e-01
  1.97998047e-01  9.55810547e-02 -2.08801270e-01 -9.77050781e-01
 -6.03027344e-02  1.69311523e-01 -1.02490234e+00  6.14135742e-01
  1.00097656e-01  5.45104980e-01  2.23388672e-02  2.49633789e-02
  2.17773438e-01 -1.01074219e-01 -4.27490234e-01 -2.32055664e-01
  1.54327393e-01 -5.05371094e-02 -4.77294922e-01  7.18872070e-01
 -1.16271973e-01  6.34765625e-03 -2.54638672e-01  6.65283203e-02
 -1.02172852e-01  1.72080994e-02  5.49560547e-01 -2.91503906e-01
  1.81152344e-01 -2.05688477e-01 -1.79138184e-01 -1.74072266e-01
  4.82788086e-01  7.73193359e-01  7.21313477e-01 -4.64538574e-01
 -6.59660339e-01 -4.85321045e-01 -2.83203125e-02  2.32910156e-01
 -3.32763672e-01  6.89086914e-02  3.69224548e-02 -8.53881836e-02
 -2.246704

 -8.97216797e-01  1.66992188e-01  1.58996582e-01 -2.42919922e-02]
Information 10
to 10
store 10
in 10
exception 10
object: 10
providing 10
strings 10
in 10
exception 10
[ 0.07821655 -0.18530273  0.14111328  0.23950195  0.02793884  0.21630859
  0.33166504 -0.87518311  0.45483398  0.73486328 -1.30078125 -0.41555405
 -0.30847168 -0.14465332 -1.23594666 -0.24487305  0.41241455  0.37261963
 -0.48632812 -0.78662109  0.38922119  0.6953125  -0.61709595 -0.05957031
  0.73620605  0.27685547  0.03091431  0.66015625 -0.08056641  0.33544922
  0.0769043  -0.70861816 -0.3157959   0.41821289 -0.29632568  0.29614258
 -0.22412109 -0.27069092  0.21765137  0.47033691  0.76226807  0.56195068
 -0.51898193 -0.00585938  0.38427734 -1.13818359 -0.40527344  0.31787109
 -0.34716797  0.27978516 -0.60449219  0.04806519 -0.59436035 -0.0769043
  0.13476562  0.31260681 -0.62695312 -0.20043945  0.1105957  -0.54437256
 -0.13195801 -0.15612793 -0.46228027 -1.70507812 -0.68261719  0.50439453
 -0.90625     0.94165039 -0.1

 -4.37988281e-01  3.51623535e-01  1.15966797e-01  3.13476562e-01]
Trouble 11
working 11
on 11
Java 11
app 11
between 11
Windows 11
and 11
Snow 11
Leopard 11
machines 11
[ 1.16394043e+00 -1.06591797e+00 -1.30938721e+00  8.04626465e-01
 -9.53735352e-01  1.11669922e+00  3.44665527e-01  6.50512695e-01
  7.70385742e-01  9.88281250e-01 -7.00271606e-01  5.18310547e-01
  3.52416992e-01 -1.20886230e+00 -5.75820923e-01 -2.27294922e-01
  6.33117676e-01  8.43383789e-01 -3.12568665e-01 -1.21475220e+00
 -7.32421875e-02  1.85302734e-01  1.79199219e-01  5.50872803e-01
  1.06665039e+00  1.00170898e+00 -1.44335938e+00  1.78875732e+00
  5.59326172e-01 -8.71154785e-01 -8.16955566e-01 -6.92665100e-01
  1.33483887e-01 -6.95556641e-01 -5.21240234e-02 -6.04980469e-01
 -7.69775391e-01  1.26159668e-01  7.11509705e-01  7.37915039e-01
  1.09234619e+00 -4.90417480e-01  4.07257080e-02  3.53027344e-01
 -6.25076294e-01 -4.57817078e-01 -1.16650391e+00 -6.55639648e-01
 -4.97314453e-01  7.27905273e-01  9.20104980e-02 -2

 -0.31494141 -0.29602051  0.03649902  0.44946289  0.25920105  0.08325195]
MySql 9
5.7 9
installer 9
fails 9
to 9
detect 9
VS 9
2013 9
redistributable 9
[ 3.97949219e-01 -7.55004883e-02  2.41210938e-01 -2.66357422e-01
 -7.55371094e-01  7.11669922e-01  4.30908203e-01  4.34082031e-01
  3.56140137e-01  4.88281250e-04 -9.88769531e-01 -4.37255859e-01
 -4.97924805e-01 -2.61993408e-01 -4.34326172e-01  8.76953125e-01
  5.22460938e-01  1.08709717e+00 -7.29980469e-01 -6.33728027e-01
 -2.02636719e-01  8.15307617e-01  3.99322510e-01  2.34863281e-01
 -2.49755859e-01 -8.59375000e-01 -6.81884766e-01  1.28515625e+00
 -5.96191406e-01 -7.68554688e-01 -6.32019043e-01 -8.08105469e-02
 -5.67504883e-01 -9.93041992e-01 -5.33691406e-01 -4.96093750e-01
 -4.82055664e-01  2.61474609e-01  6.95800781e-02 -1.88217163e-02
  5.89965820e-01 -1.73492432e-01  1.30102539e+00 -1.07666016e-01
 -5.04028320e-01 -8.61328125e-01 -5.73638916e-01 -6.52832031e-01
 -1.55517578e-01 -9.75341797e-02 -6.57348633e-01 -2.49023438e-01
  4

array([ 0.01929389, -0.02872721,  0.04605611, ..., -0.11884562,
       -0.01780192,  0.01442464])

Now we have a method to create a representation of any sentence and we are ready for the first evaluation. So, let's check how well our solution (Google's vectors + *question_to_vec*) will work.

## Evaluation of text similarity

We can imagine that if we use good embeddings, the cosine similarity between the duplicate sentences should be less than for the random ones. Overall, for each pair of duplicate sentences we can generate *R* random negative examples and find out the position of the correct duplicate.  

For example, we have the question *"Exceptions What really happens"* and we are sure that another question *"How does the catch keyword determine the type of exception that was thrown"* is a duplicate. But our model doesn't know it and tries to find out the best option also among questions like *"How Can I Make These Links Rotate in PHP"*, *"NSLog array description not memory address"* and *"PECL_HTTP not recognised php ubuntu"*. The goal of the model is to rank all these 4 questions (1 *positive* and *R* = 3 *negative*) in the way that the correct one is in the first place.

However, it is unnatural to count on that the best candidate will be always in the first place. So let us consider the place of the best candidate in the sorted list of candidates and formulate a metric based on it. We can fix some *K* — a reasonalble number of top-ranked elements and *N* — a number of queries (size of the sample).

### Hits@K

The first simple metric will be a number of correct hits for some *K*:
$$ \text{Hits@K} = \frac{1}{N}\sum_{i=1}^N \, [dup_i \in topK(q_i)]$$

where $q_i$ is the i-th query, $dup_i$ is its duplicate, $topK(q_i)$ is the top K elements of the ranked sentences provided by our model and the operation $[dup_i \in topK(q_i)]$ equals 1 if the condition is true and 0 otherwise (more details about this operation could be found [here](https://en.wikipedia.org/wiki/Iverson_bracket)).


### DCG@K
The second one is a simplified [DCG metric](https://en.wikipedia.org/wiki/Discounted_cumulative_gain):

$$ \text{DCG@K} = \frac{1}{N} \sum_{i=1}^N\frac{1}{\log_2(1+rank_{dup_i})}\cdot[rank_{dup_i} \le K] $$

where $rank_{dup_i}$ is a position of the duplicate in the sorted list of the nearest sentences for the query $q_i$. According to this metric, the model gets a higher reward for a higher position of the correct answer. If the answer does not appear in topK at all, the reward is zero. 

### Evaluation examples

Let's calculate the described metrics for the toy example introduced above. In this case $N$ = 1 and the correct candidate for $q_1$ is *"How does the catch keyword determine the type of exception that was thrown"*. Consider the following ranking of the candidates:
1. *"How Can I Make These Links Rotate in PHP"*
2. *"How does the catch keyword determine the type of exception that was thrown"*
3. *"NSLog array description not memory address"*
4. *"PECL_HTTP not recognised php ubuntu"*

Using the ranking above, calculate *Hits@K* metric for *K = 1, 2, 4*: 
 
- [K = 1] $\text{Hits@1} = \frac{1}{1}\sum_{i=1}^1 \, [dup_i \in top1(q_i)] = [dup_1 \in top1(q_1)] = 0$ because the correct answer doesn't appear in the *top1* list.
- [K = 2] $\text{Hits@2} = \frac{1}{1}\sum_{i=1}^1 \, [dup_i \in top2(q_i)] = [dup_1 \in top2(q_1)] = 1$ because $rank_{dup_1} = 2$.
- [K = 4] $\text{Hits@4} = \frac{1}{1}\sum_{i=1}^1 \, [dup_i \in top4(q_i)] = [dup_1 \in top4(q_1)] = 1$

Using the ranking above, calculate *DCG@K* metric for *K = 1, 2, 4*:

- [K = 1] $\text{DCG@1} = \frac{1}{1} \sum_{i=1}^1\frac{1}{\log_2(1+rank_{dup_i})}\cdot[rank_{dup_i} \le 1] = \frac{1}{\log_2(1+rank_{dup_i})}\cdot[rank_{dup_i} \le 1] = 0$ because the correct answer doesn't appear in the top1 list.
- [K = 2] $\text{DCG@2} = \frac{1}{1} \sum_{i=1}^1\frac{1}{\log_2(1+rank_{dup_i})}\cdot[rank_{dup_i} \le 2] = \frac{1}{\log_2{3}}$, because $rank_{dup_1} = 2$.
- [K = 4] $\text{DCG@4} = \frac{1}{1} \sum_{i=1}^1\frac{1}{\log_2(1+rank_{dup_i})}\cdot[rank_{dup_i} \le 4] = \frac{1}{\log_2{3}}$.


**Tasks 2 and 3 (HitsCount and DCGScore).** Implement the functions *hits_count* and *dcg_score* as described above. Each function has two arguments: *dup_ranks* and *k*. *dup_ranks* is a list which contains *values of ranks* of duplicates. For example, *dup_ranks* is *[2]* for the example provided above.

In [None]:
def hits_count(dup_ranks, k):
    """
        dup_ranks: list of duplicates' ranks; one rank per question; 
                   length is a number of questions which we are looking for duplicates; 
                   rank is a number from 1 to len(candidates of the question); 
                   e.g. [2, 3] means that the first duplicate has the rank 2, the second one — 3.
        k: number of top-ranked elements (k in Hits@k metric)

        result: return Hits@k value for current ranking
    """
    ######################################
    ######### YOUR CODE HERE #############
    ######################################

In [102]:
answers = ["How does the catch keyword determine the type of exception that was thrown"]
    
    # candidates_ranking — the ranked sentences provided by our model
candidates_ranking = [["How Can I Make These Links Rotate in PHP", "NSLog array description not memory address",
                           "How does the catch keyword determine the type of exception that was thrown",
                           
                           "PECL_HTTP not recognised php ubuntu"]]
    # dup_ranks — position of the dup_i in the list of ranks +1
#dup_ranks = [candidates_ranking[i].index(answers[i]) + 1 for i in range(len(answers))]
#dup_ranks
print(candidates_ranking[0])
candidates_ranking[0].index(answers[0])

['How Can I Make These Links Rotate in PHP', 'NSLog array description not memory address', 'How does the catch keyword determine the type of exception that was thrown', 'PECL_HTTP not recognised php ubuntu']


2

Test your code on the tiny examples:

In [None]:
def test_hits():
    # *Evaluation example*
    # answers — dup_i
    answers = ["How does the catch keyword determine the type of exception that was thrown"]
    
    # candidates_ranking — the ranked sentences provided by our model
    candidates_ranking = [["How Can I Make These Links Rotate in PHP", 
                           "How does the catch keyword determine the type of exception that was thrown",
                           "NSLog array description not memory address",
                           "PECL_HTTP not recognised php ubuntu"]]
    # dup_ranks — position of the dup_i in the list of ranks +1
    dup_ranks = [candidates_ranking[i].index(answers[i]) + 1 for i in range(len(answers))]
    
    # correct_answers — the expected values of the result for each k from 1 to 4
    correct_answers = [0, 1, 1, 1]
    for k, correct in enumerate(correct_answers, 1):
        if not np.isclose(hits_count(dup_ranks, k), correct):
            return "Check the function."
    
    # Other tests
    answers = ["How does the catch keyword determine the type of exception that was thrown", 
               "Convert Google results object (pure js) to Python object"]
    
    # The first test: both duplicates on the first position in ranked list
    candidates_ranking = [["How does the catch keyword determine the type of exception that was thrown",
                           "How Can I Make These Links Rotate in PHP"], 
                          ["Convert Google results object (pure js) to Python object",
                           "WPF- How to update the changes in list item of a list"]]
    dup_ranks = [candidates_ranking[i].index(answers[i]) + 1 for i in range(len(answers))]
    correct_answers = [1, 1]
    for k, correct in enumerate(correct_answers, 1):
        if not np.isclose(hits_count(dup_ranks, k), correct):
            return "Check the function (test: both duplicates on the first position in ranked list)."
        
    # The second test: one candidate on the first position, another — on the second
    candidates_ranking = [["How Can I Make These Links Rotate in PHP", 
                           "How does the catch keyword determine the type of exception that was thrown"], 
                          ["Convert Google results object (pure js) to Python object",
                           "WPF- How to update the changes in list item of a list"]]
    dup_ranks = [candidates_ranking[i].index(answers[i]) + 1 for i in range(len(answers))]
    correct_answers = [0.5, 1]
    for k, correct in enumerate(correct_answers, 1):
        if not np.isclose(hits_count(dup_ranks, k), correct):
            return "Check the function (test: one candidate on the first position, another — on the second)."

    # The third test: both candidates on the second position
    candidates_ranking = [["How Can I Make These Links Rotate in PHP", 
                           "How does the catch keyword determine the type of exception that was thrown"], 
                          ["WPF- How to update the changes in list item of a list",
                           "Convert Google results object (pure js) to Python object"]]
    dup_ranks = [candidates_ranking[i].index(answers[i]) + 1 for i in range(len(answers))]
    correct_answers = [0, 1]
    for k, correct in enumerate(correct_answers, 1):
        if not np.isclose(hits_count(dup_ranks, k), correct):
            return "Check the function (test: both candidates on the second position)."

    return "Basic test are passed."

In [None]:
print(test_hits())

In [None]:
def dcg_score(dup_ranks, k):
    """
        dup_ranks: list of duplicates' ranks; one rank per question; 
                   length is a number of questions which we are looking for duplicates; 
                   rank is a number from 1 to len(candidates of the question); 
                   e.g. [2, 3] means that the first duplicate has the rank 2, the second one — 3.
        k: number of top-ranked elements (k in DCG@k metric)

        result: return DCG@k value for current ranking
    """
    ######################################
    ######### YOUR CODE HERE #############
    ######################################

In [None]:
def test_dcg():
    # *Evaluation example*
    # answers — dup_i
    answers = ["How does the catch keyword determine the type of exception that was thrown"]
    
    # candidates_ranking — the ranked sentences provided by our model
    candidates_ranking = [["How Can I Make These Links Rotate in PHP", 
                           "How does the catch keyword determine the type of exception that was thrown",
                           "NSLog array description not memory address",
                           "PECL_HTTP not recognised php ubuntu"]]
    # dup_ranks — position of the dup_i in the list of ranks +1
    dup_ranks = [candidates_ranking[i].index(answers[i]) + 1 for i in range(len(answers))]
    
    # correct_answers — the expected values of the result for each k from 1 to 4
    correct_answers = [0, 1 / (np.log2(3)), 1 / (np.log2(3)), 1 / (np.log2(3))]
    for k, correct in enumerate(correct_answers, 1):
        if not np.isclose(dcg_score(dup_ranks, k), correct):
            return "Check the function."
    
    # Other tests
    answers = ["How does the catch keyword determine the type of exception that was thrown", 
               "Convert Google results object (pure js) to Python object"]

    # The first test: both duplicates on the first position in ranked list
    candidates_ranking = [["How does the catch keyword determine the type of exception that was thrown",
                           "How Can I Make These Links Rotate in PHP"], 
                          ["Convert Google results object (pure js) to Python object",
                           "WPF- How to update the changes in list item of a list"]]
    dup_ranks = [candidates_ranking[i].index(answers[i]) + 1 for i in range(len(answers))]
    correct_answers = [1, 1]
    for k, correct in enumerate(correct_answers, 1):
        if not np.isclose(dcg_score(dup_ranks, k), correct):
            return "Check the function (test: both duplicates on the first position in ranked list)."
        
    # The second test: one candidate on the first position, another — on the second
    candidates_ranking = [["How Can I Make These Links Rotate in PHP", 
                           "How does the catch keyword determine the type of exception that was thrown"], 
                          ["Convert Google results object (pure js) to Python object",
                           "WPF- How to update the changes in list item of a list"]]
    dup_ranks = [candidates_ranking[i].index(answers[i]) + 1 for i in range(len(answers))]
    correct_answers = [0.5, (1 + (1 / (np.log2(3)))) / 2]
    for k, correct in enumerate(correct_answers, 1):
        if not np.isclose(dcg_score(dup_ranks, k), correct):
            return "Check the function (test: one candidate on the first position, another — on the second)."
        
    # The third test: both candidates on the second position
    candidates_ranking = [["How Can I Make These Links Rotate in PHP",
                           "How does the catch keyword determine the type of exception that was thrown"], 
                          ["WPF- How to update the changes in list item of a list",
                           "Convert Google results object (pure js) to Python object"]]
    dup_ranks = [candidates_ranking[i].index(answers[i]) + 1 for i in range(len(answers))]
    correct_answers = [0, 1 / (np.log2(3))]
    for k, correct in enumerate(correct_answers, 1):
        if not np.isclose(dcg_score(dup_ranks, k), correct):
            return "Check the function (test: both candidates on the second position)."

    return "Basic test are passed."

In [None]:
print(test_dcg())

Submit results of the functions *hits_count* and *dcg_score* for the following examples to earn the points.

In [None]:
test_examples = [
    [1],
    [1, 2],
    [2, 1],
    [1, 2, 3],
    [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    [9, 5, 4, 2, 8, 10, 7, 6, 1, 3],
    [4, 3, 5, 1, 9, 10, 7, 8, 2, 6],
    [5, 1, 7, 6, 2, 3, 8, 9, 10, 4],
    [6, 3, 1, 4, 7, 2, 9, 8, 10, 5],
    [10, 9, 8, 7, 6, 5, 4, 3, 2, 1],
]

In [None]:
hits_results = []
for example in test_examples:
    for k in range(len(example)):
        hits_results.append(hits_count(example, k + 1))
grader.submit_tag('HitsCount', array_to_string(hits_results))

In [None]:
dcg_results = []
for example in test_examples:
    for k in range(len(example)):
        dcg_results.append(dcg_score(example, k + 1))
grader.submit_tag('DCGScore', array_to_string(dcg_results))

##  First solution: pre-trained embeddings

We will work with predefined train, validation and test corpora. All the files are tab-separated, but have a different format:
 - *train* corpus contains similar sentences at the same row.
 - *validation* corpus contains the following columns: *question*, *similar question*, *negative example 1*, *negative example 2*, ... 
 - *test* corpus contains the following columns: *question*, *example 1*, *example 2*, ...

Validation corpus will be used for the intermediate validation of models. The test data will be necessary for submitting the quality of your model in the system.

Now you should upload *validation* corpus to evaluate current solution.

In [105]:
def read_corpus(filename):
    data = []
    for line in open(filename, encoding='utf-8'):
        data.append(line.strip().split('\t'))
    return data

In [126]:
train = read_corpus('data/train.tsv')

In [110]:
validation = read_corpus('data/validation.tsv')

In [139]:
len(validation[1])
#train

1001

In [104]:
from sklearn.metrics.pairwise import cosine_similarity

We will use cosine distance to rank candidate questions which you need to implement in the function *rank_candidates*. The function should return a sorted list of pairs *(initial position in candidates list, candidate)*. Index of some pair corresponds to its rank (the first is the best). For example, if the list of candidates was *[a, b, c]* and the most similar is *c*, then *a* and *b*, the function should return a list *[(2, c), (0, a), (1, b)]*.

Pay attention, if you use the function *cosine_similarity* from *sklearn.metrics.pairwise* to calculate similarity because it works in a different way: most similar objects has greatest similarity. It's preferable to use a vectorized version of *cosine_similarity* function. Try to compute similarity at once and not use list comprehension. It should speed up your computations significantly.

In [427]:
def rank_candidates(question, candidates, embeddings, dim=300):
    """
        question: a string
        candidates: a list of strings (candidates) which we want to rank
        embeddings: some embeddings
        dim: dimension of the current embeddings
        
        result: a list of pairs (initial position in the list, question)
    """
    score=[]
    index=[]
    return_list=[]
    question_vec=question_to_vec(question,embeddings).reshape(1,-1)
    #print(question_vec)
    #print(question_vec.shape)
    for i in range(len(candidates)):
        #print(candidate)
        candidate_vec=question_to_vec(candidates[i],embeddings).reshape(1,-1)
        #print(candidate_vec.shape)
        res=cosine_similarity(question_vec,candidate_vec)
        
        
        
        #print(res[0][0])
        score.append([(res[0][0]),i,candidates[i]])
        #print(score)
        
    sorted_score=sorted(score,reverse=True)
    #sorted_score.remove([0][0])

    #print(sorted_score[0][0])
    #print(len(sorted_score))
    for i in  range (len(sorted_score)):
        index.append(sorted_score[i][1])
        #print(index)   
        
   # print(sorted_score)   
        
    for j in index:
            return_list.append([j,candidates[j]])
        
   
    return return_list  
    
    ######################################
    ######### YOUR CODE HERE #############
    ######################################

Test your code on the tiny examples:

In [428]:
def test_rank_candidates():
    questions = ['converting string to list', 'Sending array via Ajax fails']
    candidates = [['Convert Google results object (pure js) to Python object', 
                   'C# create cookie from string and send it',
                   'How to use jQuery AJAX for an outside domain?'], 
                  ['Getting all list items of an unordered list in PHP', 
                   'WPF- How to update the changes in list item of a list', 
                   'select2 not displaying search results']]
    results = [[(1, 'C# create cookie from string and send it'), 
                (0, 'Convert Google results object (pure js) to Python object'), 
                (2, 'How to use jQuery AJAX for an outside domain?')],
               [(0, 'Getting all list items of an unordered list in PHP'), 
                (2, 'select2 not displaying search results'), 
                (1, 'WPF- How to update the changes in list item of a list')]]
    for question, q_candidates, result in zip(questions, candidates, results):
        ranks = rank_candidates(question, q_candidates, wv_embeddings, 300)
        print(ranks)
        if not np.all(ranks == result):
            return "Check the function."
    return "Basic tests are passed."

In [431]:
print(test_rank_candidates())

[[1, 'C# create cookie from string and send it'], [0, 'Convert Google results object (pure js) to Python object'], [2, 'How to use jQuery AJAX for an outside domain?']]
Check the function.


Now we can test the quality of the current approach. Run the next two cells to get the results. Pay attention that calculation of similarity between vectors takes time and this calculation is computed approximately in 10 minutes.

In [432]:
wv_ranking = []
for line in validation:
    q, *ex = line
    ranks = rank_candidates(q, ex, wv_embeddings)
    wv_ranking.append([r[0] for r in ranks].index(0) + 1)

In [435]:
validation[0]

['How to print a binary heap tree without recursion?',
 'How do you best convert a recursive function to an iterative one?',
 'How can i use ng-model with directive in angular js',
 'flash: drawing and erasing',
 'toggle react component using hide show classname',
 'Use a usercontrol from another project to current webpage',
 '~ Paths resolved differently after upgrading to ASP.NET 4',
 'Materialize datepicker - Rendering when an icon is clicked',
 'Creating PyPi package - Could not find a version that satisfies the requirement iso8601',
 'How can I analyze a confusion matrix?',
 'How do I declare a C array in Swift?',
 'Using rand() when flipping a coin and rolling a die',
 'Handling a JSON field with a special character in its name in Java',
 'React Native select row on ListView when push it',
 "Get 'creation_time' of video using ffmpeg and regex",
 'Does row exist and multiple where',
 "How to specify a classifier in a gradle dependency's dependency?",
 'Using $unwind on multiple do

In [434]:

ranks

[[0,
  'Replace conditional with polymorphism - nice in theory but not practical'],
 [522,
  'Different output for the same input using scipy optimize with L-BFGS-B algorithm'],
 [90,
  'Factor Clojure code setting many different fields in a Java object using a parameter map bound to a var or local'],
 [607,
  'What are the downsides of specifying synonyms directly in config instead of using synonyms_path'],
 [802,
  'Need to create li with list of different links using php explode method'],
 [349,
  'Passing a simple IEnumerable to view and using foreach to loop through returns a blank screen?'],
 [425,
  'Assert that certain parameterized vectors will throw an exception in JUnit?'],
 [900,
  'Which documentation should I use to develop & test makefile based applications with Yocto?'],
 [241,
  'How do you override an existing html form to use jquery to post the form?'],
 [189,
  'Difference between using and not using pipe in Export-Csv in Powershell'],
 [866,
  'I cannot use alphanu

In [None]:
for k in [1, 5, 10, 100, 500, 1000]:
    print("DCG@%4d: %.3f | Hits@%4d: %.3f" % (k, dcg_score(wv_ranking, k), k, hits_count(wv_ranking, k)))

If you did all the steps correctly, you should be frustrated by the received results. Let's try to understand why the quality is so low. First of all, when you work with some data it is necessary to have an idea how the data looks like. Print several questions from the data:

In [134]:
for line in validation[:3]:
    q, *examples = line
    print(q, *examples[:3])
    print('------------')
    print(q)
    print('------------')

How to print a binary heap tree without recursion? How do you best convert a recursive function to an iterative one? How can i use ng-model with directive in angular js flash: drawing and erasing
------------
How to print a binary heap tree without recursion?
------------
How to start PhoneStateListener programmatically? PhoneStateListener and service Java cast object[] to model WCF and What does this mean?
------------
How to start PhoneStateListener programmatically?
------------
jQuery: Show a div2 when mousenter over div1 is over when hover on div1 depenting on if it is on div2 or not it should act differently How to run selenium in google app engine/cloud? Python Comparing two lists of strings for similarities
------------
jQuery: Show a div2 when mousenter over div1 is over
------------


As you can see, we deal with the raw data. It means that we have many punctuation marks, special characters and unlowercased letters. In our case, it could lead to the situation where we can't find some embeddings, e.g. for the word "grid?". 

To solve this problem you should use the functions *text_prepare* from the previous assignments to prepare the data.

In [None]:
from util import text_prepare

Now transform all the questions from the validation set:

In [None]:
prepared_validation = []
for line in validation:
    ######### YOUR CODE HERE #############

Let's evaluate the approach again after the preparation:

In [None]:
wv_prepared_ranking = []
for line in prepared_validation:
    q, *ex = line
    ranks = rank_candidates(q, ex, wv_embeddings)
    wv_prepared_ranking.append([r[0] for r in ranks].index(0) + 1)

In [None]:
for k in [1, 5, 10, 100, 500, 1000]:
    print("DCG@%4d: %.3f | Hits@%4d: %.3f" % (k, dcg_score(wv_prepared_ranking, k), 
                                              k, hits_count(wv_prepared_ranking, k)))

Now, prepare also train and test data, because you will need it in the future:

In [None]:
def prepare_file(in_, out_):
    out = open(out_, 'w')
    for line in open(in_, encoding='utf8'):
        line = line.strip().split('\t')
        new_line = [text_prepare(q) for q in line]
        print(*new_line, sep='\t', file=out)
    out.close()

In [None]:
######################################
######### YOUR CODE HERE #############
######################################

**Task 4 (W2VTokenizedRanks).** For each question from prepared *test.tsv* submit the ranks of the candidates to earn the points. The calculations should take about 3-5 minutes. Pay attention that the function *rank_candidates* returns a ranking, while in this case you should find a position in this ranking. Ranks should start with 1.

In [None]:
from util import matrix_to_string

In [None]:
w2v_ranks_results = []
prepared_test_data = ######### YOUR CODE HERE #############
for line in open(prepared_test_data):
    q, *ex = line.strip().split('\t')
    ranks = rank_candidates(q, ex, wv_embeddings, 300)
    ranked_candidates = [r[0] for r in ranks]
    w2v_ranks_results.append([ranked_candidates.index(i) + 1 for i in range(len(ranked_candidates))])
    
grader.submit_tag('W2VTokenizedRanks', matrix_to_string(w2v_ranks_results))

## Advanced solution: StarSpace embeddings

Now you are ready to train your own word embeddings! In particular, you need to train embeddings specially for our task of duplicates detection. Unfortunately, StarSpace cannot be run on Windows and we recommend to use provided
[docker container](https://github.com/hse-aml/natural-language-processing/blob/master/Docker-tutorial.md) or other alternatives. Don't delete results of this task because you will need it in the final project.

### How it works and what's the main difference with word2vec?
The main point in this section is that StarSpace can be trained specifically for some tasks. In contrast to word2vec model, which tries to train similar embeddings for words in similar contexts, StarSpace uses embeddings for the whole sentence (just as a sum of embeddings of words and phrases). Despite the fact that in both cases we get word embeddings as a result of the training, StarSpace embeddings are trained using some supervised data, e.g. a set of similar sentence pairs, and thus they can better suit the task.

In our case, StarSpace should use two types of sentence pairs for training: "positive" and "negative". "Positive" examples are extracted from the train sample (duplicates, high similarity) and the "negative" examples are generated randomly (low similarity assumed). 

### How to choose the best params for the model?
Normally, you would start with some default choice and then run extensive experiments to compare different strategies. However, we have some recommendations ready for you to save your time:
- Be careful with choosing the suitable training mode. In this task we want to explore texts similarity which corresponds to *trainMode = 3*.
- Use adagrad optimization (parameter *adagrad = true*).
- Set the length of phrase equal to 1 (parameter *ngrams*), because we need embeddings only for words.
- Don't use a large number of *epochs* (we think that 5 should be enough).
- Try dimension *dim* equal to 100.
- To compare embeddings usually *cosine* *similarity* is used.
- Set *minCount* greater than 1 (for example, 2) if you don't want to get embeddings for extremely rare words.
- Parameter *verbose = true* could show you the progress of the training process.
- Set parameter *fileFormat* equals *labelDoc*.
- Parameter *negSearchLimit* is responsible for a number of negative examples which is used during the training. We think that 10 will be enought for this task.
- To increase a speed of training we recommend to set *learning rate* to 0.05.

Train StarSpace embeddings for unigrams on the train dataset. You don't need to change the format of the input data. Just don't forget to use prepared version of the training data. 

If you follow the instruction, the training process will take about 1 hour.

In [None]:
######### TRAINING HAPPENING HERE #############

And now we can compare the new embeddings with the previous ones. You can find trained word vectors in the file *[model_file_name].tsv*. Upload the embeddings from StarSpace into a dict. 

In [None]:
starspace_embeddings = ######### YOUR CODE HERE #############

In [None]:
ss_prepared_ranking = []
for line in prepared_validation:
    q, *ex = line
    ranks = rank_candidates(q, ex, starspace_embeddings, 100)
    ss_prepared_ranking.append([r[0] for r in ranks].index(0) + 1)

In [None]:
for k in [1, 5, 10, 100, 500, 1000]:
    print("DCG@%4d: %.3f | Hits@%4d: %.3f" % (k, dcg_score(ss_prepared_ranking, k), 
                                               k, hits_count(ss_prepared_ranking, k)))

Due to training for the particular task with the supervised data, you should expect to obtain a higher quality than for the previous approach. In additiion, despite the fact that StarSpace's trained vectors have a smaller dimension than word2vec's, it provides better results in this task.

**Task 5 (StarSpaceRanks).** For each question from prepared *test.tsv* submit the ranks of the candidates for trained representation.

In [None]:
starspace_ranks_results = []
prepared_test_data = ######### YOUR CODE HERE #############
for line in open(prepared_test_data):
    q, *ex = line.strip().split('\t')
    ranks = rank_candidates(q, ex, starspace_embeddings, 100)
    ranked_candidates = [r[0] for r in ranks]
    starspace_ranks_results.append([ranked_candidates.index(i) + 1 for i in range(len(ranked_candidates))])
    
grader.submit_tag('StarSpaceRanks', matrix_to_string(starspace_ranks_results))

### Authorization & Submission
To submit assignment parts to Cousera platform, please, enter your e-mail and token into variables below. You can generate token on this programming assignment page. <b>Note:</b> Token expires 30 minutes after generation.

In [None]:
STUDENT_EMAIL = # EMAIL 
STUDENT_TOKEN = # TOKEN 
grader.status()

If you want to submit these answers, run cell below

In [None]:
grader.submit(STUDENT_EMAIL, STUDENT_TOKEN)