## Quick ALMA-centric arXiv filter

The goal is to find papers in the arXiv astro-ph section that use ALMA data.

For now, we search for keywords:
* ALMA
* mm
* millimeter
* millimetre

Results are returned in a table for a quick look and are also exported to an Excel spreadsheet.

### Suggestions always welcome! 

In [1]:
%matplotlib inline

In [2]:
import urllib
try:
    # Python 2
    from urllib import quote_plus
    from urllib import urlencode
    from urllib import urlretrieve
except ImportError:
    # Python 3
    from urllib.parse import quote_plus
    from urllib.parse import urlencode
    from urllib.request import urlretrieve


In [4]:
import feedparser # (you may need to install this: conda install feedparser)
import pandas as pd
import numpy as np
import datetime
import time

In [None]:
#OLDER VERSION:
#url = 'http://export.arxiv.org/api/query?search_query=cat:%s+AND+%%28+all:%s+OR+all:%s+OR+all:%s+OR+all:%s+%%29&start=0&sortBy=submittedDate&sortOrder=descending'%(cat,keywords[0],keywords[1],keywords[2],keywords[3])
#data = urllib.urlopen(url).read()

In [21]:
## Choose the date range to search (enter dates as yyyymmddhhmm)
enddate = '201901061159'    # or 'today' for the current date and time
startdate = '201812011159'  # if empty, search from 30 days (about one month) before enddate
if enddate == 'today':
    # strftime zero-pads month, day, hour and minute, so single-digit values stay unambiguous
    enddate = datetime.datetime.now().strftime('%Y%m%d%H%M')
enddate_as_date = datetime.datetime.strptime(enddate, '%Y%m%d%H%M')
if startdate == '':
    startdate_as_date = enddate_as_date - datetime.timedelta(days=30)
    startdate = startdate_as_date.strftime('%Y%m%d%H%M')
print('Search dates: {}:{}'.format(startdate, enddate))


Search dates: 201812011159:201901061159

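Building the date string by concatenating `str()` of each field drops leading zeros: `str(1)` is `'1'`, not `'01'`, so the result no longer has the fixed `yyyymmddhhmm` width the arXiv `submittedDate` range expects. This matters for the `'today'` option and for a computed default start date. A minimal sketch of the difference; `naive_date_string`, `to_arxiv_date` and `default_range` are hypothetical helpers, and the 30-day default is an assumption standing in for "one month":

```python
import datetime

def naive_date_string(dt):
    # Plain concatenation of each field: no zero-padding
    return str(dt.year) + str(dt.month) + str(dt.day) + str(dt.hour) + str(dt.minute)

def to_arxiv_date(dt):
    # yyyymmddhhmm with every field zero-padded to a fixed width
    return dt.strftime('%Y%m%d%H%M')

def default_range(enddate, days=30):
    # Parse an end date and return (start, end) strings, start being `days` earlier
    end = datetime.datetime.strptime(enddate, '%Y%m%d%H%M')
    start = end - datetime.timedelta(days=days)
    return to_arxiv_date(start), to_arxiv_date(end)

dt = datetime.datetime(2019, 1, 6, 12, 5)
print(naive_date_string(dt))            # 201916125 (9 characters, ambiguous)
print(to_arxiv_date(dt))                # 201901061205
print(default_range('201901061159'))    # ('201812071159', '201901061159')
```

Parsing with `strptime` and formatting with `strftime` using the same format string keeps the round trip lossless, which the slicing approach does not guarantee.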

In [14]:
## based on code from https://github.com/lukasschwab/arxiv.py/blob/master/arxiv/arxiv.py
root_url = 'http://export.arxiv.org/api/'

def query(search_query="",
          date_from=None,   # unused: the date range is part of search_query
          date_until=None,  # unused
          id_list=None,
          prune=True,       # unused
          start=0,
          max_results=10,
          sort_by="relevance",
          sort_order="descending"):
    """Query the arXiv API and return the list of feed entries."""
    # search_query is already URL-encoded by hand, so it is prepended as-is;
    # the remaining arguments go through urlencode
    url_args = urlencode({"id_list": ','.join(id_list or []),
                          "start": start,
                          "max_results": max_results,
                          "sortBy": sort_by,
                          "sortOrder": sort_order})
    # strip any trailing '&' from search_query and add exactly one separator
    url = root_url + 'query?search_query=' + search_query.rstrip('&') + '&' + url_args
    results = feedparser.parse(url)
    if results.get('status') != 200:
        # TODO: better error reporting
        raise Exception("HTTP Error " + str(results.get('status', 'no status')) + " in query")
    return results['entries']
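
The `start` and `max_results` arguments of `query` allow paging through a larger result set. The arXiv API terms ask clients to leave about three seconds between successive requests, and the API limits how many results a single request returns. A minimal paging sketch, assuming a function with the same keyword arguments as `query` above; `fetch_all`, `page_size` and `delay` are hypothetical names, and the query function is passed in so the loop can be tried offline with a fake:

```python
import time

def fetch_all(query_fn, search_query, total, page_size=100, delay=3.0, **kwargs):
    """Collect up to `total` entries by calling query_fn repeatedly.

    query_fn is assumed to accept search_query, start and max_results keyword
    arguments and to return a list of entries, like query() above.
    """
    entries = []
    start = 0
    while start < total:
        batch = query_fn(search_query=search_query, start=start,
                         max_results=min(page_size, total - start), **kwargs)
        entries.extend(batch)
        if len(batch) < page_size:
            break  # a short page means there are no more results
        start += len(batch)
        time.sleep(delay)  # pause between requests to the arXiv servers
    return entries

# Offline usage with a fake query function standing in for query()
fake_db = ['paper%d' % i for i in range(250)]

def fake_query(search_query='', start=0, max_results=10, **kwargs):
    return fake_db[start:start + max_results]

papers = fetch_all(fake_query, 'cat:astro-ph*', total=200, page_size=100, delay=0)
print(len(papers))  # 200
```

With the real function, the call would look like `fetch_all(query, sq, total=num_results, sort_by='submittedDate')`, keeping the default delay.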

In [15]:
## Make a query
cat = 'astro-ph*'
## Keywords are matched with all:, which searches the title, abstract, authors,
## comments, journal reference, category and report number, not the full text
keywords = ['ALMA', 'millimeter', 'millimetre', 'mm']
## You could also query authors using au:authorname
## Join the keywords with OR so the list can have any length
keyword_query = '+OR+'.join('all:' + kw for kw in keywords)
sq = 'cat:%s+AND+%%28+%s+%%29+AND+submittedDate:[%s+TO+%s]&' % (cat, keyword_query, startdate, enddate)
num_results = 200
results = query(search_query=sq, sort_by='submittedDate', sort_order='descending', max_results=num_results)
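
The search string above is URL-encoded by hand (`+` for spaces, `%28`/`%29` for parentheses). `quote_plus`, imported in the second cell, can produce the same encoding from a human-readable query. A sketch with a hypothetical `build_search_query` helper (Python 3 import shown); passing `safe=':*[]'` keeps those characters literal so the output matches the hand-built string, minus the trailing `&` that the cell above appends as a separator:

```python
from urllib.parse import quote_plus

def build_search_query(cat, keywords, startdate, enddate):
    # Readable query: category AND (any keyword) AND submission-date range
    raw = 'cat:{} AND ( {} ) AND submittedDate:[{} TO {}]'.format(
        cat, ' OR '.join('all:' + kw for kw in keywords), startdate, enddate)
    # quote_plus turns spaces into '+' and parentheses into %28/%29;
    # ':', '*', '[' and ']' are kept literal to match the hand-encoded version
    return quote_plus(raw, safe=':*[]')

sq_readable = build_search_query('astro-ph*', ['ALMA', 'millimeter', 'millimetre', 'mm'],
                                 '201812011159', '201901061159')
print(sq_readable)
```

Writing the query in plain text first makes it easier to add terms such as `au:authorname` without tracking the escaping by hand.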

**IMPORTANT NOTE**

These results are not curated: a paper may have *nothing* to do with ALMA and simply match the keyword 'mm'.

I plan to curate them myself as we go and save them to a Google spreadsheet (or elsewhere).
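
As a first pass at that curation, one option is to flag entries whose title or abstract names ALMA explicitly, separately from generic 'mm' matches. A minimal sketch with a hypothetical `flag_alma` helper and `ALMA_PATTERN` regex, assuming each entry is a dict with `'title'` and `'summary'` keys (feedparser entries are dict-like with these keys). The `\b` word boundaries stop 'ALMA' matching inside longer words; the flag is only a hint and does not replace reading the abstract:

```python
import re

# 'ALMA' as a whole word (case-sensitive), or the start of the observatory's full name
ALMA_PATTERN = re.compile(r'\bALMA\b|\bAtacama Large Millimet')

def flag_alma(entry):
    """Return True if the title or abstract names ALMA explicitly."""
    text = entry.get('title', '') + ' ' + entry.get('summary', '')
    return ALMA_PATTERN.search(text) is not None

# Made-up entries for illustration only
examples = [
    {'title': 'Dust rings seen with ALMA', 'summary': 'We present 1.3 mm images.'},
    {'title': 'A stellar population study', 'summary': 'We model 2 mm emission.'},
    {'title': 'The PALMA survey', 'summary': 'Observations at 3 mm.'},
    {'title': 'A disk survey', 'summary': 'Data from the Atacama Large Millimeter/submillimeter Array.'},
]
print([flag_alma(e) for e in examples])  # [True, False, False, True]
```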

In [16]:
## NOW convert to a DataFrame for easier table processing
posts = []
columns = ['published', 'title', 'authors', 'summary', 'link', 'arxiv_primary_category', 'arxiv_comment']

for entry in results:
    # arxiv_comment is optional, so fall back to an empty string when it is missing
    comment = entry.get('arxiv_comment', '')
    authorlist = [author['name'] for author in entry['authors']]
    posts.append((entry['published'], entry['title'],
                  authorlist[0],  # first author only
                  authorlist,     # full author list
                  entry['summary'], entry['link'],
                  entry['arxiv_primary_category']['term'], comment))

# same columns as above, with 'author list' inserted after the first author
df = pd.DataFrame(posts, columns=columns[:3] + ['author list'] + columns[3:])
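
The `link` column holds the versioned abstract URL (e.g. `http://arxiv.org/abs/1901.00778v1`). When merging results from repeated runs into a curated sheet, it may help to key rows on the ID without the version, so a revised paper (v2) is not counted twice. A sketch with hypothetical `split_arxiv_id` and `latest_versions` helpers; it assumes new-style `yymm.nnnnn` identifiers and skips anything else (old-style IDs such as `astro-ph/0601001` are not handled):

```python
import re

# New-style arXiv IDs: 4-digit yymm, a dot, 4 or 5 digits, then an optional version
ARXIV_ID = re.compile(r'(\d{4}\.\d{4,5})(?:v(\d+))?$')

def split_arxiv_id(link):
    """Return (id, version) from an abs URL, or None if it does not match."""
    match = ARXIV_ID.search(link)
    if match is None:
        return None
    version = int(match.group(2)) if match.group(2) else None
    return match.group(1), version

def latest_versions(links):
    # Keep only the highest version seen for each arXiv ID
    best = {}
    for link in links:
        parsed = split_arxiv_id(link)
        if parsed is None:
            continue
        arxiv_id, version = parsed
        if arxiv_id not in best or (version or 0) > (best[arxiv_id] or 0):
            best[arxiv_id] = version
    return best

print(split_arxiv_id('http://arxiv.org/abs/1901.00778v1'))  # ('1901.00778', 1)
print(latest_versions(['http://arxiv.org/abs/1901.00778v1',
                       'http://arxiv.org/abs/1901.00778v2',
                       'http://arxiv.org/abs/1812.09779v1']))
# {'1901.00778': 2, '1812.09779': 1}
```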



In [17]:
## LOOK at the table
df

| | published | title | authors | author list | summary | link | arxiv_primary_category | arxiv_comment |
|---|---|---|---|---|---|---|---|---|
| 0 | 2019-01-03T14:55:53Z | Physics of Planet Trapping with Applications t... | Alex J. Cridland | [Alex J. Cridland, Ralph E. Pudritz, Matthew A... | We explore planet formation in the HL Tau disk... | http://arxiv.org/abs/1901.00778v1 | astro-ph.EP | Accepted for publication in MNRAS |
| 1 | 2019-01-03T09:31:28Z | Can Injection Model Replenish the Filaments in... | Peng Zou | [Peng Zou, Chaowei Jiang, Fengsi Wei, Wenda Cao] | We observed an H$\alpha$ surge occurred in the... | http://arxiv.org/abs/1901.00659v1 | astro-ph.SR | 12 pages, 7 figures, Accepted by RAA |
| 2 | 2019-01-02T15:41:21Z | Laboratory spectroscopic study of the $^{15}$N... | A. Coutens | [A. Coutens, O. Zakharenko, F. Lewen, J. K. Jø... | Cyanamide is one of the few interstellar molec... | http://arxiv.org/abs/1901.00421v1 | astro-ph.SR | Astron. Astrophys., accepted; 8 pages |
| 3 | 2019-01-02T02:33:17Z | JCMT POL-2 and ALMA polarimetric observations ... | Hsi-Wei Yen | [Hsi-Wei Yen, Bo Zhao, I-Ta Hsieh, Patrick Koc... | We present our analysis of the magnetic field ... | http://arxiv.org/abs/1901.00242v1 | astro-ph.SR | Accepted by ApJ |
| 4 | 2019-01-01T01:12:29Z | An ALMA view of CS and SiS around oxygen-rich ... | T. Danilovich | [T. Danilovich, A. M. S. Richards, A. I. Karak... | We aim to determine the distributions of molec... | http://arxiv.org/abs/1901.00070v1 | astro-ph.SR | 16 pages |
| 5 | 2018-12-31T08:32:21Z | Geologic Constraints on Early Mars Climate | Edwin S. Kite | [Edwin S. Kite] | Early Mars climate research has well-defined g... | http://arxiv.org/abs/1812.11722v1 | astro-ph.EP | Accepted by Space Science Reviews |
| 6 | 2018-12-30T05:18:23Z | Time Evolution of 3D Disk Formation with Misal... | Miikka S. Väisälä | [Miikka S. Väisälä, Hsien Shang, Ruben Krasnop... | Distinguishing diagnostic observational signat... | http://arxiv.org/abs/1812.11471v1 | astro-ph.GA | 28 pages, 19 Figures and 3 Tables. Accepted fo... |
| 7 | 2018-12-27T14:41:57Z | Indication of Another Intermediate-mass Black ... | Shunya Takekawa | [Shunya Takekawa, Tomoharu Oka, Yuhei Iwata, S... | We report the discovery of molecular gas strea... | http://arxiv.org/abs/1812.10733v1 | astro-ph.GA | Accepted for publication on the Astrophysical ... |
| 8 | 2018-12-25T00:00:51Z | Molecular gas in radio galaxies in dense Mpc-s... | G. Castignani | [G. Castignani, F. Combes, P. Salomé, C. Benoi... | We investigate the role of dense Mpc-scale env... | http://arxiv.org/abs/1812.09997v1 | astro-ph.GA | 24 pages, 9 Figures, 8 Tables, accepted for pu... |
| 9 | 2018-12-23T21:30:08Z | ALMA Observations of the massive molecular out... | Carlos Hervías-Caimapo | [Carlos Hervías-Caimapo, Manuel Merello, Leona... | We present observations and analysis of the ma... | http://arxiv.org/abs/1812.09779v1 | astro-ph.GA | 19 pages, 17 figures, 2 appendices. Accepted f... |


In [None]:
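## WRITE to Excel (needs an Excel engine such as openpyxl installed)
## the context manager saves and closes the file; ExcelWriter.save() was removed in pandas 2.0
with pd.ExcelWriter('testoutput.xlsx') as writer:
    df.to_excel(writer, sheet_name='Sheet1')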

**TO DO**

Better selection of the date range, e.g. following https://github.com/Mahdisadjadi/arxivscraper/blob/master/arxivscraper/arxivscraper.py
--> Works, but gets confused between single- and double-digit dates.

Then sync directly with Google Sheets (or GitHub?):
* https://github.com/burnash/gspread
* https://github.com/robin900/gspread-dataframe