# Query examples

## Example 1

Parse the author names for 10 papers, starting from the 14th result of a keyword query.

In [15]:
import pandas as pd
from pprint import pprint
from scrapxiv.shelf import Shelf

Initialise a shelf with a query for deep learning papers, starting from the 14th paper in the query output and retrieving 10 results:

In [17]:
shelf = Shelf()
shelf.query(keywords="deep learning", start_index=14, max_results=10)
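
For orientation, the public arXiv API pages through results with the `start` (zero-based) and `max_results` query parameters. The sketch below builds such a request URL using only the standard library. Whether `Shelf.query` maps `start_index` onto `start` in exactly this way is an assumption, and `build_arxiv_query_url` is a hypothetical helper that is not part of scrapxiv.

```python
from urllib.parse import urlencode

ARXIV_API = "http://export.arxiv.org/api/query"


def build_arxiv_query_url(keywords, start_index=0, max_results=10):
    """Hypothetical helper: build an arXiv API URL for a keyword search."""
    params = {
        "search_query": f"all:{keywords}",  # "all:" searches every field
        "start": start_index,               # index of the first result to return
        "max_results": max_results,         # number of results to return
    }
    return f"{ARXIV_API}?{urlencode(params)}"


url = build_arxiv_query_url("deep learning", start_index=14, max_results=10)
print(url)
```

Opening the printed URL in a browser returns an Atom feed, which is a quick way to compare the raw API response with what the shelf reports.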

Get the basic information of the papers in the shelf:

In [19]:
shelf.get_papers_ids()

['1101.5071v2',
 '1612.05468v1',
 '1205.2046v1',
 '1401.6410v1',
 '1711.00225v2',
 '0407322v3',
 '1610.04315v1',
 '1610.08027v1']

In [21]:
from pprint import pprint
pprint(shelf.get_papers_info())

{'0407322v3': {'authors': ['Boris L. Granovsky', 'Dudley Stark'],
               'date': '2004-07-19T12:11:23Z',
               'title': 'Asymptotic enumeration and logical limit laws for '
                        'expansive multisetsand selections'},
 '1101.5071v2': {'authors': ['Jean-Baptiste Gramain', 'Jorn B. Olsson'],
                 'date': '2011-01-26T14:47:11Z',
                 'title': 'On bar lengths in partitions'},
 '1205.2046v1': {'authors': ['Mark Sh. Levin'],
                 'date': '2012-05-09T17:42:36Z',
                 'title': 'Multiset Estimates and Combinatorial Synthesis'},
 '1401.6410v1': {'authors': ['Christian Steinruecken'],
                 'date': '2014-01-24T17:36:32Z',
                 'title': 'Compressing Sets and Multisets of Sequences'},
 '1610.04315v1': {'authors': ['Renzo Angles', 'Claudio Gutierrez'],
                  'date': '2016-10-14T03:19:54Z',
                  'title': 'The multiset semantics of SPARQL patterns'},
 '1610.08027v1': {'auth
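
The `date` values above are ISO 8601 timestamps, so the returned dictionary can be sorted chronologically once they are parsed. A minimal sketch, assuming the structure printed above; `papers_info` is a hand-copied subset of that output rather than a live call to `shelf.get_papers_info()`.

```python
from datetime import datetime

# Hand-copied subset of the dictionary printed above.
papers_info = {
    "1101.5071v2": {"date": "2011-01-26T14:47:11Z",
                    "title": "On bar lengths in partitions"},
    "1205.2046v1": {"date": "2012-05-09T17:42:36Z",
                    "title": "Multiset Estimates and Combinatorial Synthesis"},
    "1610.04315v1": {"date": "2016-10-14T03:19:54Z",
                     "title": "The multiset semantics of SPARQL patterns"},
}


def parse_date(stamp):
    """Parse a timestamp such as '2011-01-26T14:47:11Z'."""
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ")


# Paper ids ordered from most recent to oldest.
newest_first = sorted(
    papers_info,
    key=lambda paper_id: parse_date(papers_info[paper_id]["date"]),
    reverse=True,
)

for paper_id in newest_first:
    print(paper_id, papers_info[paper_id]["title"])
```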

Download all the papers in the shelf to a local folder:

In [23]:
shelf.clean_download_folder()
shelf.download_papers()

Http can not retrieve the paper http://arxiv.org/pdf/0407322v3.pdf
Download paper http://arxiv.org/pdf/1711.00225v2.pdf finished.
Download paper http://arxiv.org/pdf/1610.04315v1.pdf finished.
Download paper http://arxiv.org/pdf/1101.5071v2.pdf finished.
Download paper http://arxiv.org/pdf/1205.2046v1.pdf finished.
Download paper http://arxiv.org/pdf/1401.6410v1.pdf finished.
Download paper http://arxiv.org/pdf/1610.08027v1.pdf finished.
Download paper http://arxiv.org/pdf/1612.05468v1.pdf finished.
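
The single failure in the log involves `0407322v3`, which looks like a pre-2007 arXiv identifier. Those identifiers normally carry an archive prefix (for example `hep-th/9901001`), so the missing prefix is a plausible, though unconfirmed, cause of the failed request. The sketch below flags such identifiers before downloading; `classify_arxiv_id` is a hypothetical helper and not part of scrapxiv.

```python
import re

# New-style identifiers (since April 2007): YYMM.NNNN or YYMM.NNNNN, optional version.
NEW_STYLE = re.compile(r"^\d{4}\.\d{4,5}(v\d+)?$")
# Old-style identifiers: archive[.SUBJ]/YYMMNNN, optional version, e.g. hep-th/9901001v2.
OLD_STYLE = re.compile(r"^[a-z-]+(\.[A-Z]{2})?/\d{7}(v\d+)?$")
# Seven digits with no archive prefix: looks old-style but is incomplete.
BARE_OLD = re.compile(r"^\d{7}(v\d+)?$")


def classify_arxiv_id(paper_id):
    """Hypothetical helper: label an identifier by its apparent scheme."""
    if NEW_STYLE.match(paper_id):
        return "new-style"
    if OLD_STYLE.match(paper_id):
        return "old-style"
    if BARE_OLD.match(paper_id):
        return "old-style, archive prefix missing"
    return "unknown"


# The identifiers listed by shelf.get_papers_ids() above.
paper_ids = ["1101.5071v2", "1612.05468v1", "1205.2046v1", "1401.6410v1",
             "1711.00225v2", "0407322v3", "1610.04315v1", "1610.08027v1"]

suspicious = [pid for pid in paper_ids if classify_arxiv_id(pid) != "new-style"]
print(suspicious)
```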


Build a dataframe of the authors, also requesting their e-mail addresses:

In [24]:
pd.set_option('display.max_rows', 100)
shelf.get_authors_dataframe(get_emails=True)

Http can not retrieve the paper http://arxiv.org/pdf/0407322v3.pdf
Pdf not found in /Users/sferraris/repos2/ScrapXiv/tmp/0407322v3.pdf.


|  | name | affiliation | email | paper_id | paper_title | paper_published_date | paper_found |
|---|---|---|---|---|---|---|---|
| 0 | Jean-Baptiste Gramain |  |  | 1101.5071v2 | On bar lengths in partitions | 2011-01-26T14:47:11Z | 1 |
| 1 | Jorn B. Olsson |  |  | 1101.5071v2 | On bar lengths in partitions | 2011-01-26T14:47:11Z | 1 |
| 2 | Håkon Robbestad Gylterud |  |  | 1612.05468v1 | From Multisets to Sets in Hotmotopy Type Theory | 2016-12-16T13:52:57Z | 1 |
| 3 | Mark Sh. Levin |  |  | 1205.2046v1 | Multiset Estimates and Combinatorial Synthesis | 2012-05-09T17:42:36Z | 1 |
| 4 | Christian Steinruecken |  |  | 1401.6410v1 | Compressing Sets and Multisets of Sequences | 2014-01-24T17:36:32Z | 1 |
| 5 | Rinovia Simanjuntak |  |  | 1711.00225v2 | The multiset dimension of graphs | 2017-11-01T07:01:49Z | 1 |
| 6 | Presli Siagian |  |  | 1711.00225v2 | The multiset dimension of graphs | 2017-11-01T07:01:49Z | 1 |
| 7 | Tomas Vetrik |  |  | 1711.00225v2 | The multiset dimension of graphs | 2017-11-01T07:01:49Z | 1 |
| 8 | Boris L. Granovsky |  |  | 0407322v3 | Asymptotic enumeration and logical limit laws ... | 2004-07-19T12:11:23Z | 1 |
| 9 | Dudley Stark |  |  | 0407322v3 | Asymptotic enumeration and logical limit laws ... | 2004-07-19T12:11:23Z | 1 |
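
The output holds one row per author, so papers with several authors span several rows. To get one entry per paper, the rows can be grouped on `paper_id`. The sketch below does this with the standard library on a hand-copied subset of the rows, so it runs without a shelf. With the dataframe itself, `df.groupby("paper_id")["name"].apply(list)` should give a comparable result.

```python
import csv
import io
from collections import defaultdict

# Hand-copied subset of the rows shown above, in CSV form.
CSV_TEXT = """name,paper_id,paper_title
Jean-Baptiste Gramain,1101.5071v2,On bar lengths in partitions
Jorn B. Olsson,1101.5071v2,On bar lengths in partitions
Mark Sh. Levin,1205.2046v1,Multiset Estimates and Combinatorial Synthesis
Rinovia Simanjuntak,1711.00225v2,The multiset dimension of graphs
Presli Siagian,1711.00225v2,The multiset dimension of graphs
Tomas Vetrik,1711.00225v2,The multiset dimension of graphs
"""

# Map each paper id to the list of its authors, in row order.
authors_by_paper = defaultdict(list)
for row in csv.DictReader(io.StringIO(CSV_TEXT)):
    authors_by_paper[row["paper_id"]].append(row["name"])

for paper_id, names in authors_by_paper.items():
    print(paper_id, len(names), ", ".join(names))
```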


## Example 2

Query author data for the first 8 papers matching the keywords "sub multisets".

In [25]:
shelf = Shelf()
shelf.query(keywords="sub multisets", start_index=1, max_results=8)


In [26]:
pd.set_option('display.max_rows', 78)
df_multiset = shelf.get_authors_dataframe()
df_multiset

|  | name | affiliation | email | paper_id | paper_title | paper_published_date | paper_found |
|---|---|---|---|---|---|---|---|
| 0 | Jean-Baptiste Gramain |  |  | 1101.5071v2 | On bar lengths in partitions | 2011-01-26T14:47:11Z | 1 |
| 1 | Jorn B. Olsson |  |  | 1101.5071v2 | On bar lengths in partitions | 2011-01-26T14:47:11Z | 1 |
| 2 | Håkon Robbestad Gylterud |  |  | 1612.05468v1 | From Multisets to Sets in Hotmotopy Type Theory | 2016-12-16T13:52:57Z | 1 |
| 3 | Mark Sh. Levin |  |  | 1205.2046v1 | Multiset Estimates and Combinatorial Synthesis | 2012-05-09T17:42:36Z | 1 |
| 4 | Christian Steinruecken |  |  | 1401.6410v1 | Compressing Sets and Multisets of Sequences | 2014-01-24T17:36:32Z | 1 |
| 5 | Rinovia Simanjuntak |  |  | 1711.00225v2 | The multiset dimension of graphs | 2017-11-01T07:01:49Z | 1 |
| 6 | Presli Siagian |  |  | 1711.00225v2 | The multiset dimension of graphs | 2017-11-01T07:01:49Z | 1 |
| 7 | Tomas Vetrik |  |  | 1711.00225v2 | The multiset dimension of graphs | 2017-11-01T07:01:49Z | 1 |
| 8 | Boris L. Granovsky |  |  | 0407322v3 | Asymptotic enumeration and logical limit laws ... | 2004-07-19T12:11:23Z | 1 |
| 9 | Dudley Stark |  |  | 0407322v3 | Asymptotic enumeration and logical limit laws ... | 2004-07-19T12:11:23Z | 1 |


Fetch the text of each paper in the shelf:

In [27]:
dict_texts = shelf.fetch_texts()
pprint(dict_texts)

Http can not retrieve the paper http://arxiv.org/pdf/0407322v3.pdf
{'1101.5071v2': <pdftotext.PDF object at 0x119a2d1e0>,
 '1205.2046v1': <pdftotext.PDF object at 0x119a2d180>,
 '1401.6410v1': <pdftotext.PDF object at 0x119573c30>,
 '1610.04315v1': <pdftotext.PDF object at 0x119a2db10>,
 '1610.08027v1': <pdftotext.PDF object at 0x119573b40>,
 '1612.05468v1': <pdftotext.PDF object at 0x119a2d270>,
 '1711.00225v2': <pdftotext.PDF object at 0x119a2d210>}
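
A `pdftotext.PDF` object can be iterated page by page, with each page given as a string, so the values in `dict_texts` can be searched like ordinary text. As an illustration, the sketch below collects e-mail-like strings from a sequence of page strings. It is not how scrapxiv extracts e-mails (that is not shown here); `find_emails` and the regular expression are assumptions, and plain lists stand in for PDF objects so the example runs without any downloads.

```python
import re

# Deliberately simple pattern; real-world addresses can be more varied.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_emails(pages):
    """Hypothetical helper: return unique e-mail-like strings, in order of appearance.

    `pages` is any iterable of page strings, e.g. a pdftotext.PDF object.
    """
    found = []
    for page in pages:
        for address in EMAIL_RE.findall(page):
            if address not in found:
                found.append(address)
    return found


# Stand-in for one paper: a list of page texts with made-up addresses.
fake_pages = [
    "A Title\nA. Author (a.author@example.org)",
    "Contact: a.author@example.org, b.writer@example.com.",
]

emails = find_emails(fake_pages)
print(emails)
```

Applied to the real data, this would be called as `find_emails(dict_texts[paper_id])` for each paper id in the dictionary.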
