# Weekly ArXiv Harvesting

This notebook is used to keep track of new arXiv publications in machine learning & physics and to update the list on the physicsml website.

### The workflow:
- 1) Get all recent publications from the arXiv API; list titles and tags.
- 2) Review and select publications to be added to physicsml; collect the indices into one array for each category:
    - Applying Machine Learning to Physics
    - Physics-Inspired Ideas Applied to Machine Learning
    - Quantum Computation and Quantum Algorithms for Machine Learning
- 3) Create formatted entries for the selected publications and add them to the file papers.md.
- 4) Add basic data (id, title, sublist) to the file papers_ids_titles_sublists.csv.

All functions are defined in arxiv_papers.py, located in the same folder as this notebook. The functions are:
- `harvest_arxiv()`: calls the arXiv API and fetches the recent publications specified in the query (or, if an `id_list` is passed, the specified ids); returns the parsed arXiv feed (afp)
- `list_all(afp)`: lists all entries with index, title and tags for review
- `show_info(afp, index)`: displays more information on the entry at the given index in afp
- `formatted_block(afp, indices)`: formats the selected entries and concatenates them into one string, as required for physicsml
- `add_papers_to_file(new_papers, filename='papers.md')`: adds the new entries to the file papers.md *
- `add_to_csv_list(afp, indices_selected, filename='papers_ids_titles_sublists.csv')`: adds the new entries to the csv file *

\* Note: the path to the file is hard-coded inside the function! Make sure to adjust it to your folder structure.
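For orientation, here is a minimal sketch of the kind of request `harvest_arxiv()` presumably sends. It assumes the public arXiv API endpoint `http://export.arxiv.org/api/query` and its documented query parameters (`search_query`, `id_list`, `start`, `max_results`, `sortBy`, `sortOrder`); the helper name `build_arxiv_query` and its default values are hypothetical and are not taken from arxiv_papers.py.

```python
from urllib.parse import urlencode

# Public arXiv API endpoint (assumed to be what harvest_arxiv() queries)
ARXIV_API = "http://export.arxiv.org/api/query"


def build_arxiv_query(search_query=None, id_list=None, start=0, max_results=100):
    """Build an arXiv API query URL (hypothetical helper for illustration)."""
    params = {
        "start": start,
        "max_results": max_results,
        # newest submissions first, which suits a weekly harvest
        "sortBy": "submittedDate",
        "sortOrder": "descending",
    }
    if search_query:
        params["search_query"] = search_query
    if id_list:
        # the API expects the ids as one comma-separated string
        if not isinstance(id_list, str):
            id_list = ",".join(id_list)
        params["id_list"] = id_list
    return ARXIV_API + "?" + urlencode(params)


# Example: look up a single id (the one used in the first cell below)
url = build_arxiv_query(id_list="1710.09842")
print(url)
```

If afp is a `feedparser` result (an assumption based on the name "parsed arxiv feed"), such a URL could be passed directly to `feedparser.parse(url)`, whose `entries` list holds one item per publication.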

The following cells guide you through the workflow step by step.

In [None]:
from arxiv_papers import *

# get recently published papers from arXiv by calling the arXiv API;
# the info for each publication is stored in afp.entries[i].KEY for KEY in (title, authors, published, tags, ...)
# alternatively, get the info for a specific list of arXiv ids:
# uncomment the following line and pass id_list as an argument to harvest_arxiv()
#id_list = '1710.09842'

afp = harvest_arxiv()

# list all entries with title, tags and publication date
list_all(afp)

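To see what the review listing boils down to, the following sketch reproduces a `list_all`-style overview from a raw arXiv Atom response using only the standard library. It is not the code in arxiv_papers.py: the helpers `parse_entries` and `listing_lines` are hypothetical, and the sample feed is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Atom namespace used by arXiv API responses
ATOM = "{http://www.w3.org/2005/Atom}"


def parse_entries(feed_xml):
    """Extract id, title, publication date and tags from an Atom feed string."""
    root = ET.fromstring(feed_xml)
    entries = []
    for entry in root.findall(ATOM + "entry"):
        entries.append({
            "id": entry.findtext(ATOM + "id", "").strip(),
            # arXiv titles often contain line breaks; collapse all whitespace
            "title": " ".join(entry.findtext(ATOM + "title", "").split()),
            "published": entry.findtext(ATOM + "published", "").strip(),
            "tags": [c.get("term") for c in entry.findall(ATOM + "category")],
        })
    return entries


def listing_lines(entries):
    """One review line per entry: index, title, tags, publication date."""
    return [
        f"{i}: {e['title']} [{', '.join(e['tags'])}] ({e['published'][:10]})"
        for i, e in enumerate(entries)
    ]


# Made-up sample feed with a single entry, for illustration only
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>http://arxiv.org/abs/0000.00000v1</id>
    <published>2017-01-01T00:00:00Z</published>
    <title>Placeholder
      title</title>
    <category term="cond-mat.dis-nn"/>
    <category term="stat.ML"/>
  </entry>
</feed>"""

for line in listing_lines(parse_entries(SAMPLE_FEED)):
    print(line)
```

The index printed at the start of each line is the number that goes into the `selected_*` lists further down.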
In [None]:
# to review a single entry in detail, plug in its afp index
show_info(afp, index=0, pubdate=True, tags=True)

In [None]:
# indices of the publications selected to be added to the list on physicsml

# Applying Machine Learning to Physics
selected_L1 = [0]

# Physics-Inspired Ideas Applied to Machine Learning
selected_L2 = []

# Quantum Computation and Quantum Algorithms for Machine Learning
selected_L3 = []

# other interesting (personal note)
selected_O = []

In [None]:
# one formatted block per sublist, in the order L1, L2, L3
indices_selected = [selected_L1, selected_L2, selected_L3]
new_papers = [formatted_block(afp, indices) for indices in indices_selected]

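The cell above produces one string per sublist. A minimal sketch of that idea follows; the markdown layout in `format_entry` is HYPOTHETICAL (the real physicsml entry format is defined in `formatted_block()`), and the helper names and entry dicts are made up for illustration.

```python
def format_entry(entry):
    """Render one paper as a markdown list item (HYPOTHETICAL layout, not physicsml's)."""
    arxiv_id = entry["arxiv_id"]
    authors = ", ".join(entry["authors"])
    return (f"- **{entry['title']}**, {authors}, "
            f"[arXiv:{arxiv_id}](https://arxiv.org/abs/{arxiv_id})")


def formatted_block_sketch(entries, indices):
    """Concatenate the selected entries into one string, one entry per line."""
    return "\n".join(format_entry(entries[i]) for i in indices)


# Made-up entries for illustration only
entries = [
    {"arxiv_id": "0000.00000", "title": "Placeholder title A", "authors": ["A. Author", "B. Author"]},
    {"arxiv_id": "0000.00001", "title": "Placeholder title B", "authors": ["C. Author"]},
]
selected = [[0], [], [1]]  # one index list per sublist, as in the cells above
blocks = [formatted_block_sketch(entries, idx) for idx in selected]
print(blocks[0])
```

An empty selection yields an empty string rather than being skipped, so position i of the result always corresponds to sublist i.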
In [None]:
# add selected publications to the list of papers in papers.md
add_papers_to_file(new_papers)

# add selected publications to the list in papers_ids_titles_sublists.csv
add_to_csv_list(afp, indices_selected)

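Step 4 appends one row of basic data (id, title, sublist) per selected paper. A minimal sketch of that bookkeeping with Python's `csv` module, assuming the sublist is recorded by a label such as `L1`; the labels, the helper names `csv_rows` and `append_rows`, and the entry dicts are hypothetical.

```python
import csv
import io


def csv_rows(entries, indices_per_sublist, labels=("L1", "L2", "L3")):
    """Collect (id, title, sublist) rows; the sublist labels are hypothetical."""
    rows = []
    for label, indices in zip(labels, indices_per_sublist):
        for i in indices:
            rows.append((entries[i]["arxiv_id"], entries[i]["title"], label))
    return rows


def append_rows(rows, fileobj):
    """Write rows to an open text file; csv.writer quotes titles containing commas."""
    csv.writer(fileobj).writerows(rows)


# Made-up entries for illustration only
entries = [
    {"arxiv_id": "0000.00000", "title": "Placeholder title A"},
    {"arxiv_id": "0000.00001", "title": "Placeholder, with a comma"},
]
buf = io.StringIO()  # stands in for open(filename, "a", newline="")
append_rows(csv_rows(entries, [[0], [], [1]]), buf)
print(buf.getvalue())
```

Opening the real file in append mode (`"a"`) keeps earlier weekly runs intact, and letting `csv.writer` handle quoting avoids corrupting rows when a title contains a comma.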
In [None]:
# add selected "other interesting" papers to the separate csv file Other_interesting.csv
add_to_csv_list(afp, selected_O, filename='Other_interesting.csv')

In [None]:
# preview the formatted block for the first sublist (Applying Machine Learning to Physics)
print(new_papers[0])