# CAI Lab Session 4: Implementing search in the vector space model

In this session you will:

- Continue to work with the `arxiv` repository from last session
- Learn how to do atomic, conjunctive and disjunctive search with ElasticSearch
- Build an inverted index for the `arxiv` repository from last session (it should fit in main memory)
- Implement search in the vector space model and compare it with ElasticSearch's built-in search mechanism
- Compare different implementations of search

## 1. Built-in search in ElasticSearch

ElasticSearch provides a search mechanism to query the documents stored in an index.
The next code snippets show examples of how to do this with an atomic query (single term)
and with conjunctive and disjunctive queries.

In [2]:
from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search
from elasticsearch_dsl.query import Q


client = Elasticsearch("http://localhost:9200", request_timeout=1000)
s = Search(using=client, index='arxiv')

## atomic query
q = Q('query_string',query='computer')  # Feel free to change the word

s = s.query(q)
response = s[:5].execute()
for r in response:  # only returns a specific number of results
    print('ID= %s SCORE=%s' % (r.meta.id,  r.meta.score))
    print('PATH= %s' % r.path)
    print('TEXT: %s' % r.text[:90])
    print()

ID= mlNkL4sBF4_-Pm_zvsNN SCORE=3.2082987
PATH= ../../arxiv\cs.updates.on.arXiv.org/002772
TEXT: Limit computable functions can be characterized by Turing jumps on the input side or limit

ID= clRkL4sBF4_-Pm_z6BJ_ SCORE=3.2082987
PATH= ../../arxiv\math.updates.on.arXiv.org/001904
TEXT: Limit computable functions can be characterized by Turing jumps on the input side or limit

ID= GlNkL4sBF4_-Pm_zxtUl SCORE=3.1941829
PATH= ../../arxiv\cs.updates.on.arXiv.org/007252
TEXT: We study scheduling of computation tasks across $n$ workers in a large scale distributed l

ID= DlRkL4sBF4_-Pm_z6xri SCORE=3.1941829
PATH= ../../arxiv\math.updates.on.arXiv.org/003852
TEXT: We study scheduling of computation tasks across $n$ workers in a large scale distributed l

ID= KFNkL4sBF4_-Pm_zx9YE SCORE=3.1520977
PATH= ../../arxiv\cs.updates.on.arXiv.org/007522
TEXT: Grid Computing is an idea of a new kind of network technology in which research work in pr



In [2]:
## conjunctive query

client = Elasticsearch("http://localhost:9200", request_timeout=1000)
s = Search(using=client, index='arxiv')

q = Q('query_string',query='computer') & Q('query_string',query='magic')

s = s.query(q)
response = s[0:5].execute()
for r in response:  # only returns a specific number of results
    print(f'ID= {r.meta.id} SCORE={r.meta.score}')
    print(f'PATH= {r.path}')
    print(f'TEXT: {r.text[:90]}')
    print()

ID= 1edDGosBAcLkfcUZTnGi SCORE=15.135108
PATH= /tmp/arxiv/quant-ph.updates.on.arXiv.org/000677
TEXT: We give a new algorithm for computing the robustness of magic - a measure of the utility o

ID= c-dDGosBAcLkfcUZT3RR SCORE=15.135108
PATH= /tmp/arxiv/quant-ph.updates.on.arXiv.org/000650
TEXT: We give a new algorithm for computing the robustness of magic - a measure of the utility o

ID= wOdDGosBAcLkfcUZTW_k SCORE=12.778042
PATH= /tmp/arxiv/quant-ph.updates.on.arXiv.org/001652
TEXT: A defining feature in the field of quantum computing is the potential of a quantum device 

ID= QudDGosBAcLkfcUZbeou SCORE=11.47281
PATH= /tmp/arxiv/cond-mat.updates.on.arXiv.org/000521
TEXT: Smale's 7-th problem concerns N-point configurations on the 2-dim sphere which minimize th

ID= TehDGosBAcLkfcUZgUnR SCORE=11.47281
PATH= /tmp/arxiv/math.updates.on.arXiv.org/000731
TEXT: Smale's 7-th problem concerns N-point configurations on the 2-dim sphere which minimize th



In [3]:
## disjunctive query

client = Elasticsearch("http://localhost:9200", request_timeout=1000)
s = Search(using=client, index='arxiv')

q = Q('query_string',query='computer') | Q('query_string',query='magic')

s = s.query(q)
response = s[0:5].execute()
for r in response:  # only returns a specific number of results
    print(f'ID= {r.meta.id} SCORE={r.meta.score}')
    print(f'PATH= {r.path}')
    print(f'TEXT: {r.text[:90]}')
    print()

ID= iFRkL4sBF4_-Pm_z_0ai SCORE=14.512409
PATH= ../../arxiv\quant-ph.updates.on.arXiv.org/000650
TEXT: We give a new algorithm for computing the robustness of magic - a measure of the utility o

ID= o1RkL4sBF4_-Pm_z_0ai SCORE=14.512409
PATH= ../../arxiv\quant-ph.updates.on.arXiv.org/000677
TEXT: We give a new algorithm for computing the robustness of magic - a measure of the utility o

ID= KVNkL4sBF4_-Pm_ztrEM SCORE=12.0883665
PATH= ../../arxiv\cond-mat.updates.on.arXiv.org/003482
TEXT: When two monolayers of graphene are stacked with a small relative twist angle, the resulti

ID= ClRkL4sBF4_-Pm_z4QbH SCORE=11.981175
PATH= ../../arxiv\hep-th.updates.on.arXiv.org/000265
TEXT: We introduce the extended Freudenthal-Rosenfeld-Tits magic square based on six algebras: t

ID= vVRkL4sBF4_-Pm_z5g5U SCORE=11.981175
PATH= ../../arxiv\math.updates.on.arXiv.org/000955
TEXT: We introduce the extended Freudenthal-Rosenfeld-Tits magic square based on six algebras: t
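
As a rough mental model of what the three query types select, the sketch below runs atomic, conjunctive and disjunctive lookups on a tiny in-memory inverted index. It is only an illustration: the toy documents and the helper names (`build_inverted_index`, `atomic`, `conjunctive`, `disjunctive`) are made up for this example, and it says nothing about how ElasticSearch is implemented internally. ElasticSearch also scores and ranks the matches, which this sketch does not do.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the set of ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def atomic(index, term):
    """Documents containing a single term."""
    return set(index.get(term, set()))


def conjunctive(index, terms):
    """Documents containing ALL terms (intersection of posting sets)."""
    postings = [atomic(index, t) for t in terms]
    return set.intersection(*postings) if postings else set()


def disjunctive(index, terms):
    """Documents containing AT LEAST ONE term (union of posting sets)."""
    postings = [atomic(index, t) for t in terms]
    return set.union(*postings) if postings else set()


# hypothetical toy corpus, not taken from the arxiv index
toy_docs = {
    1: "computer magic tricks",
    2: "computer science",
    3: "magic square",
}
toy_index = build_inverted_index(toy_docs)

print(atomic(toy_index, "computer"))                  # documents 1 and 2
print(conjunctive(toy_index, ["computer", "magic"]))  # document 1 only
print(disjunctive(toy_index, ["computer", "magic"]))  # documents 1, 2 and 3
```

A conjunctive query can never return more documents than either of its atomic parts, while a disjunctive query returns at least as many, which is a quick sanity check for the results above.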



## 2. Excruciatingly slow search

In class we presented a _slow_ version of search that, given a search query $q$, loops over every document in the database
and computes the cosine similarity between the document and the query. It then sorts the documents by their similarity to $q$ and returns the top $r$
scoring ones.

```
1. for each d in D:
    sim(d,q) = 0
    get vector representing d
    for each w in q:
        sim(d,q) += tf(d,w) * idf(w)
    normalize sim(d,q) by |d|*|q|
2. sort results by similarity
3. return top r docs
```
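
The pseudocode above can be tried out on a toy corpus held in memory, without ElasticSearch. The following is a minimal sketch under stated assumptions: documents are already tokenized, weights are raw term counts times $\log_2(N/df)$ (your tf-idf variant from previous sessions may differ), and the query is a 0-1 vector over its distinct terms. The function name `slow_search` and the toy data are hypothetical.

```python
import math
from collections import Counter


def slow_search(docs, query, r=10):
    """Rank every document by cosine similarity to the query; return the top r."""
    n_docs = len(docs)
    # document frequency: in how many documents each term appears
    df = Counter(term for tokens in docs.values() for term in set(tokens))
    idf = {term: math.log2(n_docs / count) for term, count in df.items()}

    q_terms = set(query.lower().split())
    q_norm = math.sqrt(len(q_terms))  # |q| for a 0-1 query vector

    sims = {}
    for doc_id, tokens in docs.items():          # 1. for each d in D
        tf = Counter(tokens)
        weights = {t: f * idf[t] for t, f in tf.items()}  # vector representing d
        d_norm = math.sqrt(sum(w * w for w in weights.values()))
        dot = sum(weights.get(t, 0.0) for t in q_terms)   # sum over w in q
        # normalize by |d| * |q|; guard against empty or all-zero vectors
        sims[doc_id] = dot / (d_norm * q_norm) if d_norm > 0 and q_norm > 0 else 0.0

    ranked = sorted(sims.items(), key=lambda kv: kv[1], reverse=True)  # 2. sort
    return ranked[:r]                                                  # 3. top r


# hypothetical toy corpus
toy_docs = {
    "d1": ["computer", "magic", "magic"],
    "d2": ["computer", "science"],
    "d3": ["quantum", "physics"],
}
for doc_id, score in slow_search(toy_docs, "computer magic", r=3):
    print(doc_id, round(score, 3))
```

Every query touches every document, so the cost grows linearly with the size of the collection regardless of how many documents actually contain the query terms.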

A possible implementation can be found below. 

__Remark:__ _Note that some elements of the implementation below refer to my own
implementation, and you should adapt them to your own; in particular, the line_

```    weights = dict(normalize(tf_idf(s['_id'])))   # gets weights as a python dict of term -> weight ```

_obtains tf-idf weights by calling a function `tf_idf` that I have implemented and that, given a docid, returns a list of (term, weight) pairs; `normalize` takes such a list and normalizes the weights so that the corresponding vector has length 1.
You should adapt the code to your own implementations from previous sessions._
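
For reference, one possible shape for `normalize` is sketched below. It assumes the interface described in the remark (a list of (term, weight) pairs in, a list of pairs out) and uses the Euclidean norm; your own version from previous sessions may look different. The example pairs are made up.

```python
import math


def normalize(weights):
    """Scale a list of (term, weight) pairs so that the vector has length 1."""
    norm = math.sqrt(sum(w * w for _, w in weights))
    if norm == 0:
        return list(weights)  # empty or all-zero vector: nothing to scale
    return [(term, w / norm) for term, w in weights]


# hypothetical weights for one document
pairs = [("computer", 3.0), ("magic", 4.0)]
unit = dict(normalize(pairs))  # same dict(...) conversion as in the line above
print(unit)
```

With unit-length document vectors, the inner product with the query only needs to be divided by the query norm to obtain the cosine similarity.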


In [None]:
!pip

In [5]:
from elasticsearch.helpers import scan
from pprint import pprint
from elasticsearch import Elasticsearch
import tqdm
import numpy as np

# get tf-idf vector from doc (internal) id
def tf_idf(doc_id):
    # does nothing, adapt to your needs
    return []

# normalizes weights so that resulting vec has length 1
def normalize(l1):
    # does nothing, adapt to your needs
    return l1

client = Elasticsearch("http://localhost:9200", request_timeout=1000)

r = 10  # only return r top docs
query = 'computer magic'
sims = dict()

l2query  = np.sqrt(len(query.split()))  # l2 of query assuming 0-1 vector representation

# get nr. of docs; just for the progress bar
ndocs = int(client.cat.count(index='arxiv', format = "json")[0]['count'])  # D

# scan through docs, compute cosine sim between query and each doc
for s in tqdm.tqdm(scan(client, index='arxiv', query={"query" : {"match_all": {}}}), total=ndocs):
    docid = s['_source']['path']   # use path as id
    
    sims[docid] = 0.0
    weights = dict(normalize(tf_idf(s['_id'])))

    print(docid, s['_id'])

    for w in query.split():  # gets terms as a list
        if w in weights:    # probably need to do something fancier to make sure that word is in vocabulary etc.
            sims[docid] += weights[w]   # accumulates if w in current doc

  2%|▏         | 1001/58102 [00:00<00:18, 3076.17it/s]

../../arxiv\astro-ph.updates.on.arXiv.org/000000 DlNkL4sBF4_-Pm_zhGg8
../../arxiv\astro-ph.updates.on.arXiv.org/000001 D1NkL4sBF4_-Pm_zhGhD
../../arxiv\astro-ph.updates.on.arXiv.org/000002 EFNkL4sBF4_-Pm_zhGhD
../../arxiv\astro-ph.updates.on.arXiv.org/000003 EVNkL4sBF4_-Pm_zhGhE
../../arxiv\astro-ph.updates.on.arXiv.org/000004 ElNkL4sBF4_-Pm_zhGhE
../../arxiv\astro-ph.updates.on.arXiv.org/000005 E1NkL4sBF4_-Pm_zhGhE
../../arxiv\astro-ph.updates.on.arXiv.org/000006 FFNkL4sBF4_-Pm_zhGhE
../../arxiv\astro-ph.updates.on.arXiv.org/000007 FVNkL4sBF4_-Pm_zhGhE
../../arxiv\astro-ph.updates.on.arXiv.org/000008 FlNkL4sBF4_-Pm_zhGhE
../../arxiv\astro-ph.updates.on.arXiv.org/000009 F1NkL4sBF4_-Pm_zhGhE
../../arxiv\astro-ph.updates.on.arXiv.org/000010 GFNkL4sBF4_-Pm_zhGhE
../../arxiv\astro-ph.updates.on.arXiv.org/000011 GVNkL4sBF4_-Pm_zhGhE
../../arxiv\astro-ph.updates.on.arXiv.org/000012 GlNkL4sBF4_-Pm_zhGhE
../../arxiv\astro-ph.updates.on.arXiv.org/000013 G1NkL4sBF4_-Pm_zhGhE
../../arxiv\astro-ph

../../arxiv\math.updates.on.arXiv.org/007298 hFRkL4sBF4_-Pm_z8Sfa
../../arxi

 93%|█████████▎| 54001/58102 [00:06<00:00, 12387.66it/s]

../../arxiv\physics.updates.on.arXiv.org/002822 LlRkL4sBF4_-Pm_z9zPb
../../arxiv\physics.updates.on.arXiv.org/002823 L1RkL4sBF4_-Pm_z9zPb
../../arxiv\physics.updates.on.arXiv.org/002824 MFRkL4sBF4_-Pm_z9zPb
../../arxiv\physics.updates.on.arXiv.org/002825 MVRkL4sBF4_-Pm_z9zPb
../../arxiv\physics.updates.on.arXiv.org/002826 MlRkL4sBF4_-Pm_z9zPb
../../arxiv\physics.updates.on.arXiv.org/002827 M1RkL4sBF4_-Pm_z9zPb
../../arxiv\physics.updates.on.arXiv.org/002828 NFRkL4sBF4_-Pm_z9zPb
../../arxiv\physics.updates.on.arXiv.org/002829 NVRkL4sBF4_-Pm_z9zPb
../../arxiv\physics.updates.on.arXiv.org/002830 NlRkL4sBF4_-Pm_z9zPb
../../arxiv\physics.updates.on.arXiv.org/002831 N1RkL4sBF4_-Pm_z9zPb
../../arxiv\physics.updates.on.arXiv.org/002832 OFRkL4sBF4_-Pm_z9zPb
../../arxiv\physics.updates.on.arXiv.org/002833 OVRkL4sBF4_-Pm_z9zPb
../../arxiv\physics.updates.on.arXiv.org/002834 OlRkL4sBF4_-Pm_z9zPb
../../arxiv\physics.updates.on.arXiv.org/002835 O1RkL4sBF4_-Pm_z9zPb
../../arxiv\physics.updates.on.arX

 97%|█████████▋| 56560/58102 [00:06<00:00, 12349.06it/s]

../../arxiv\physics.updates.on.arXiv.org/005822 5lRkL4sBF4_-Pm_z_D7e
../../arxiv\physics.updates.on.arXiv.org/005823 51RkL4sBF4_-Pm_z_D7e
../../arxiv\physics.updates.on.arXiv.org/005824 6FRkL4sBF4_-Pm_z_D7e
../../arxiv\physics.updates.on.arXiv.org/005825 6VRkL4sBF4_-Pm_z_D7e
../../arxiv\physics.updates.on.arXiv.org/005826 6lRkL4sBF4_-Pm_z_D7e
../../arxiv\physics.updates.on.arXiv.org/005827 61RkL4sBF4_-Pm_z_D7e
../../arxiv\physics.updates.on.arXiv.org/005828 7FRkL4sBF4_-Pm_z_D7e
../../arxiv\physics.updates.on.arXiv.org/005829 7VRkL4sBF4_-Pm_z_D7e
../../arxiv\physics.updates.on.arXiv.org/005830 7lRkL4sBF4_-Pm_z_D7e
../../arxiv\physics.updates.on.arXiv.org/005831 71RkL4sBF4_-Pm_z_D7e
../../arxiv\physics.updates.on.arXiv.org/005832 8FRkL4sBF4_-Pm_z_D7e
../../arxiv\physics.updates.on.arXiv.org/005833 8VRkL4sBF4_-Pm_z_D7e
../../arxiv\physics.updates.on.arXiv.org/005834 8lRkL4sBF4_-Pm_z_D7e
../../arxiv\physics.updates.on.arXiv.org/005835 81RkL4sBF4_-Pm_z_D7e
../../arxiv\physics.updates.on.arX

100%|██████████| 58102/58102 [00:06<00:00, 8360.09it/s] 

../../arxiv\quant-ph.updates.on.arXiv.org/001501 21RlL4sBF4_-Pm_zAUkv
../../arxiv\quant-ph.updates.on.arXiv.org/001502 3FRlL4sBF4_-Pm_zAUkv
../../arxiv\quant-ph.updates.on.arXiv.org/001503 3VRlL4sBF4_-Pm_zAUkv
../../arxiv\quant-ph.updates.on.arXiv.org/001504 3lRlL4sBF4_-Pm_zAUkv
../../arxiv\quant-ph.updates.on.arXiv.org/001505 31RlL4sBF4_-Pm_zAUkv
../../arxiv\quant-ph.updates.on.arXiv.org/001506 4FRlL4sBF4_-Pm_zAUkv
../../arxiv\quant-ph.updates.on.arXiv.org/001507 4VRlL4sBF4_-Pm_zAUkv
../../arxiv\quant-ph.updates.on.arXiv.org/001508 4lRlL4sBF4_-Pm_zAUkv
../../arxiv\quant-ph.updates.on.arXiv.org/001509 41RlL4sBF4_-Pm_zAUkv
../../arxiv\quant-ph.updates.on.arXiv.org/001510 5FRlL4sBF4_-Pm_zAUkv
../../arxiv\quant-ph.updates.on.arXiv.org/001511 5VRlL4sBF4_-Pm_zAUkv
../../arxiv\quant-ph.updates.on.arXiv.org/001512 5lRlL4sBF4_-Pm_zAUkv
../../arxiv\quant-ph.updates.on.arXiv.org/001513 51RlL4sBF4_-Pm_zAUkv
../../arxiv\quant-ph.updates.on.arXiv.org/001514 6FRlL4sBF4_-Pm_zAUkv
../../arxiv\quant-ph




In [3]:
# from elasticsearch.helpers import scan
# from pprint import pprint
# from elasticsearch import Elasticsearch
# import tqdm
# import numpy as np

# # get tf-idf vector from doc (internal) id
# def tf_idf(doc_id):
#     # does nothing, adapt to your needs
#     return []

# # normalizes weights so that resulting vec has length 1
# def normalize(l1):
#     # does nothing, adapt to your needs
#     return l1

# client = Elasticsearch("http://localhost:9200", request_timeout=1000)

# r = 10  # only return r top docs
# query = 'computer magic'
# sims = dict()

# l2query  = np.sqrt(len(query.split()))  # l2 of query assuming 0-1 vector representation

# # get nr. of docs; just for the progress bar
# ndocs = int(client.cat.count(index='arxiv', format = "json")[0]['count'])

# # scan through docs, compute cosine sim between query and each doc
# for s in tqdm.tqdm(scan(client, index='arxiv', query={"query" : {"match_all": {}}}), total=ndocs):
#     docid = s['_source']['path']   # use path as id
    
#     sims[docid] = 0.0
#     weights = dict(normalize(tf_idf(s['_id'])))   # get tf-idf weights representing doc as dict
#     for w in query.split():  # gets terms as a list
#         if w in weights:    # probably need to do something fancier to make sure that word is in vocabulary etc.
#             sims[docid] += weights[w]   # accumulates if w in current doc
#     # normalize sim
#     sims[docid] /= l2query

# # now sort by cosine similarity
# sorted_answer = sorted(sims.items(), key=lambda kv: kv[1], reverse=True)
# pprint(sorted_answer[:r])


100%|██████████| 58102/58102 [05:49<00:00, 166.07it/s]

[('/tmp/arxiv/quant-ph.updates.on.arXiv.org/000650', 0.46298539019176793),
 ('/tmp/arxiv/quant-ph.updates.on.arXiv.org/000677', 0.46285572520081464),
 ('/tmp/arxiv/cond-mat.updates.on.arXiv.org/003482', 0.41693456487012037),
 ('/tmp/arxiv/quant-ph.updates.on.arXiv.org/001475', 0.3078298379878905),
 ('/tmp/arxiv/astro-ph.updates.on.arXiv.org/002083', 0.26997407750109564),
 ('/tmp/arxiv/math.updates.on.arXiv.org/002825', 0.2637693594252774),
 ('/tmp/arxiv/astro-ph.updates.on.arXiv.org/001294', 0.2583918756706909),
 ('/tmp/arxiv/hep-th.updates.on.arXiv.org/000265', 0.2554940223200789),
 ('/tmp/arxiv/math.updates.on.arXiv.org/000955', 0.2554940223200789),
 ('/tmp/arxiv/hep-th.updates.on.arXiv.org/000255', 0.2505494496917682)]





In [24]:
nz = len([x for x, s in sorted_answer if s>0])
total = len(sorted_answer)
print(f'There are {nz} docs with non-zero similarity out of {total}, i.e. {100.0*nz/total:.1f}%')

There are 1948 docs with non-zero similarity out of 58102, i.e. 3.4%
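
The `normalize` stub in the skeleton above is described as scaling the weights so that the resulting vector has length 1. A minimal sketch of that step is shown below. It assumes that `tf_idf` returns a list of `(term, weight)` pairs, which is consistent with the `dict(normalize(tf_idf(...)))` call in the skeleton but is not confirmed by it. The example weights are made up.

```python
import math


def normalize(weights):
    """Scale (term, weight) pairs so the weight vector has Euclidean length 1."""
    norm = math.sqrt(sum(w * w for _, w in weights))
    if norm == 0.0:
        # Empty or all-zero vector: nothing to scale.
        return list(weights)
    return [(t, w / norm) for t, w in weights]


# Made-up weights, for illustration only.
doc = [("computer", 3.0), ("magic", 4.0)]
print(normalize(doc))  # [('computer', 0.6), ('magic', 0.8)]
```

With unit-length document vectors, the accumulated sum in the scan loop only needs to be divided by the query norm to obtain the cosine similarity, which is what the skeleton does with `l2query`.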


## 3. Your tasks

---

**Exercise 1:**  

Make sure you understand both the slow and the efficient (quick) versions of the search algorithm described in the lecture notes. Then work out how many additions each version needs for the following toy example, which has a vocabulary of four terms and four documents:

- query vector: $q = (0, 1, 1, 0)$

- document-term matrix:
<center>


|        | t1  | t2  | t3  | t4  |
|--------|-----|-----|-----|-----|
| **d1** | 1.2 | 0.0 | 0.0 | 0.0 |
| **d2** | 0.7 | 0.3 | 1.5 | 0.1 |
| **d3** | 0.0 | 0.0 | 0.0 | 0.7 |
| **d4** | 2.0 | 0.0 | 0.0 | 0.0 |

</center>
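
To experiment with the toy example, the matrix can be turned into an inverted index and scored term by term. The sketch below is one possible way to do it, not necessarily the version from the lecture notes. The names `matrix`, `index` and `scores` are illustrative, and the score is the plain (unnormalized) dot product between the query and each document.

```python
from collections import defaultdict

# Toy document-term matrix from the exercise (rows d1..d4, columns t1..t4).
matrix = {
    "d1": [1.2, 0.0, 0.0, 0.0],
    "d2": [0.7, 0.3, 1.5, 0.1],
    "d3": [0.0, 0.0, 0.0, 0.7],
    "d4": [2.0, 0.0, 0.0, 0.0],
}
query = [0, 1, 1, 0]

# Inverted index: term position -> postings list of (doc, weight),
# keeping only the non-zero entries of each column.
index = defaultdict(list)
for doc, row in matrix.items():
    for t, w in enumerate(row):
        if w != 0.0:
            index[t].append((doc, w))

# Term-at-a-time scoring: only the postings of query terms are visited.
scores = defaultdict(float)
for t, q_w in enumerate(query):
    if q_w:
        for doc, w in index.get(t, []):
            scores[doc] += q_w * w

print(dict(scores))
```

Documents that share no term with the query never appear in `scores`, which is where the quick version saves work compared with scanning every row.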

---

**Exercise 2:**

Implement the quick version, run both the slow and the quick versions, and report their running times (as a reference, on my old laptop the slow version in the code above takes around 5m30s). Make sure both versions return the same answer. Note that you will need to build an inverted index to implement the efficient version, as explained in class. Building it may take time, but it is done only once for all queries and can be done "off-line".
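
For timing the two versions and checking that they agree, a small helper along the following lines may be useful. It is only a sketch: `timed` and `same_answer` are hypothetical names, and it assumes each version returns a dict mapping document ids to similarities, like `sims` in the code above. Documents missing from one dict are treated as having similarity 0, since the quick version may never touch them.

```python
import math
import time


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


def same_answer(sims_a, sims_b, tol=1e-9):
    """True if both dicts assign the same similarity to every document."""
    docs = set(sims_a) | set(sims_b)
    return all(
        math.isclose(sims_a.get(d, 0.0), sims_b.get(d, 0.0), abs_tol=tol)
        for d in docs
    )


# Made-up similarities standing in for the output of each version.
slow = {"d1": 0.0, "d2": 0.9, "d3": 0.0}
quick = {"d2": 0.9}
print(same_answer(slow, quick))  # True
```

Comparing with a tolerance rather than `==` avoids false mismatches when the two versions add the same floating-point terms in a different order.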

---

**Exercise 3:**

For a few sample queries, compare the results of your quick version with those of ElasticSearch's built-in search. Do you get similar results? Which one is faster?
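
One way to put a number on "similar results" is the overlap between the two top-$k$ lists. The sketch below uses the Jaccard overlap of the top-$k$ document ids. `overlap_at_k` is a hypothetical helper and the rankings are made up. Since ElasticSearch scores with BM25 by default rather than with tf-idf cosine similarity, it is probably more meaningful to compare rankings than raw scores.

```python
def overlap_at_k(ranking_a, ranking_b, k=10):
    """Jaccard overlap between the top-k ids of two ranked lists."""
    top_a, top_b = set(ranking_a[:k]), set(ranking_b[:k])
    if not top_a and not top_b:
        return 1.0  # two empty result lists agree trivially
    return len(top_a & top_b) / len(top_a | top_b)


# Made-up rankings, for illustration only.
mine = ["d1", "d2", "d3"]
theirs = ["d2", "d3", "d4"]
print(overlap_at_k(mine, theirs, k=3))  # 0.5
```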

---

## 4. Rules of delivery

- To be solved in _pairs_.

- No plagiarism; don't discuss your work with other teams. You can ask others for help with simple things, such as recalling a Python instruction or module, but nothing too specific to the session.

- If you feel you are spending much more time than your classmates, ask us for help. Questions can be asked either in person or by email, and you'll never be penalized for asking questions, no matter how stupid they look in retrospect.

- Write a short report with your solutions to the proposed exercises. Include the important parts of your implementation (data structures used for representing objects, algorithms used, etc.). You are welcome to add conclusions and findings that go beyond what we asked you to do. We encourage you to discuss the difficulties you find; this lets us help you and also improve the lab session for future editions.

- Convert the report to PDF. Make sure it includes your names, the date, and a title. Include your code in your submission.

- Submit your work through the [raco](http://www.fib.upc.edu/en/serveis/raco.html) _before November 6th, 2023_.