## Checking versions
Please do not run this code on your computer if you don't understand what it does.

In [24]:
%load_ext version_information
import time
now = time.strftime("%Y-%m-%d %H:%M:%S (%Z = GMT%z)")
print(f"This notebook was generated at {now} ")

vv = %version_information requests, tqdm, pandas, astroquery, version_information
for i, pkg in enumerate(vv.packages):
    print(f"{i} {pkg[0]:10s} {pkg[1]:s}")

This notebook was generated at 2019-07-02 11:12:32 (KST = GMT+0900) 
0 Python     3.7.3 64bit [Clang 4.0.1 (tags/RELEASE_401/final)]
1 IPython    6.5.0
2 OS         Darwin 18.6.0 x86_64 i386 64bit
3 requests   2.22.0
4 tqdm       4.32.1
5 pandas     0.24.2
6 astroquery 0.3.10.dev5533
7 version_information 1.0.3
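If the `version_information` extension is not installed, the standard library can produce a similar listing. This is a minimal sketch that assumes Python 3.8 or later, where `importlib.metadata` is available. `package_versions` is a hypothetical helper name, not part of any package.

```python
import platform
import sys
from importlib.metadata import PackageNotFoundError, version


def package_versions(names):
    """Return {package name: version string}, marking missing packages."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)  # looks up the installed distribution
        except PackageNotFoundError:
            found[name] = "not installed"
    return found


if __name__ == "__main__":
    print(f"Python     {sys.version.split()[0]} ({platform.platform()})")
    pkgs = ["requests", "tqdm", "pandas", "astroquery", "version_information"]
    for i, (name, ver) in enumerate(package_versions(pkgs).items()):
        print(f"{i} {name:10s} {ver}")
```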


## Importing and Setting Up

In [1]:
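from astroquery import nasa_ads as na
import requests
import time
from pathlib import Path
from tqdm import tqdm

# helped from https://stackoverflow.com/questions/37573483/progress-bar-while-download-file-over-http-with-requests
def download_pdf(response, fpath, block_size=1024):
    """Stream the body of `response` into `fpath` with a progress bar."""
    total_size = int(response.headers.get('content-length', 0))
    wrote = 0
    # Count raw bytes (unit='B') and let tqdm scale them to kB/MB;
    # total=None shows a plain counter when the size is unknown
    with open(fpath, 'wb') as f, tqdm(total=total_size or None, unit='B',
                                      unit_scale=True, unit_divisor=1024) as pbar:
        for data in response.iter_content(block_size):
            f.write(data)
            wrote += len(data)
            pbar.update(len(data))
    # Content-Length counts the (possibly compressed) bytes on the wire, so a
    # mismatch is a warning sign rather than proof of a broken file
    if total_size != 0 and wrote != total_size:
        print(f"WARNING: expected {total_size} bytes but wrote {wrote} to {fpath}")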
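A cheap way to catch truncated downloads is to compare the number of bytes written with the `Content-Length` header. The sketch below is decoupled from `requests`, so it works on any iterable of byte chunks, for example `response.iter_content(1024)`. `save_chunks` is a hypothetical name. A missing or zero size skips the check. If the server compresses the response, `iter_content` yields decoded bytes, so the counts can legitimately differ.

```python
import tempfile
from pathlib import Path


def save_chunks(chunks, fpath, expected_size=0):
    """Write byte chunks to fpath and return the number of bytes written.

    If expected_size is positive (e.g. from a Content-Length header) and the
    byte count differs, the partial file is deleted and OSError is raised.
    """
    fpath = Path(fpath)
    written = 0
    with open(fpath, "wb") as f:
        for chunk in chunks:
            if chunk:  # skip empty keep-alive chunks
                f.write(chunk)
                written += len(chunk)
    if expected_size > 0 and written != expected_size:
        fpath.unlink()  # do not leave a truncated file behind
        raise OSError(f"{fpath}: expected {expected_size} bytes, got {written}")
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp) / "demo.pdf"
        n = save_chunks([b"%PDF-", b"1.4\n", b""], out, expected_size=9)
        print(n, out.read_bytes())  # 9 b'%PDF-1.4\n'
```

Deleting the partial file matters here. The notebook skips any bibcode whose `.pdf` already exists, so a truncated file would otherwise never be re-downloaded.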

## Query to ADS

In [27]:
# if you don't store your token as an environment variable
# or in a file, uncomment the line below and give it here.
# Never share or publish a real token.
# na.ADS.TOKEN = 'YOUR_ADS_API_TOKEN'

# by default, the top 10 records are returned, sorted in
# reverse chronological order. This can be changed

# change the number of rows returned
na.ADS.NROWS = 9999

# change the fields that are returned (enter as strings in a list)
na.ADS.ADS_FIELDS = ["title", "bibcode", "author", "pubdate", "property", "esources",
                     "pub", "issn", "volume", "issue", "page", "doi", "arxiv", "bibstem"]

author = "Ishiguro, Masateru"
year = "2000-2019"
query_str = f'author:"={author}" year:{year}'
print(f"Query with: \n\t {query_str}")
results = na.ADS.query_simple(query_str)

results.sort(['pubdate', "title"])

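# Astropy's to_pandas() refuses multidimensional (list-valued) columns, so
# keep only the first element of each such column; any further elements
# are discarded. This was fine for my personal purposes, but check the
# result if you rely on those columns.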
for c in results.colnames:
    if len(results[c].shape) > 1:
        results[c] = results[c][:, 0]

results = results.to_pandas()

results["N_author"] = results["author"].str.len()
results["YYYYMM"] = results["pubdate"].str[:-3].str.replace("-", "").astype(int)
results["refereed"] = results["property"].apply(lambda props: "REFEREED" in props)

results_ref = results[results["refereed"]]

print(f"ADS contains {len(results)} matches with <{author}> (refereed: {len(results_ref)}) in {year}.")
if len(results_ref) > 100:
    print(f"Hey {author}, you are awesome.")

Query with:
	 author:"=Ishiguro, Masateru" year:2000-2019
ADS contains 305 matches with <Ishiguro, Masateru> (refereed: 129) in 2000-2019.
Hey Ishiguro, Masateru, you are awesome.
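The query string combines two ADS fields. The first is `author:` with a quoted name; the leading `=` inside the quotes is used here to ask ADS for the exact name form. The second is `year:` with an inclusive range. The helper below is hypothetical. It only rebuilds that string from its parts, so another author or period can be swapped in, and it does not contact ADS.

```python
def build_ads_query(author, first_year, last_year, exact=True):
    """Build an ADS query string like 'author:"=Last, First" year:2000-2019'.

    exact=True prefixes the name with '=', mirroring the query used above.
    """
    if first_year > last_year:
        raise ValueError("first_year must not be after last_year")
    name = f"={author}" if exact else author
    return f'author:"{name}" year:{first_year}-{last_year}'


print(build_ads_query("Ishiguro, Masateru", 2000, 2019))
# author:"=Ishiguro, Masateru" year:2000-2019
```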


See http://adsabs.github.io/help/search/comprehensive-solr-term-list for the complete list of ADS fields (the names usable in ``ADS_FIELDS``).
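The derived columns follow three simple rules:

* `N_author` is the length of the author list.
* `YYYYMM` drops the day part of the ADS `pubdate` and removes the hyphen. A value such as `2018-03-00` uses `00` for an unknown day.
* `refereed` checks whether `REFEREED` appears in the `property` list.

The sketch below applies the same rules to a single record in plain Python. The record is made up for illustration.

```python
def derive_fields(record):
    """Return (N_author, YYYYMM, refereed) for one ADS-like record dict."""
    n_author = len(record["author"])
    # "2018-03-00"[:-3] -> "2018-03" -> "201803" -> 201803
    yyyymm = int(record["pubdate"][:-3].replace("-", ""))
    refereed = "REFEREED" in record["property"]
    return n_author, yyyymm, refereed


# Hypothetical record; real ADS rows carry many more fields.
example = {
    "author": ["Doe, J.", "Roe, R."],
    "pubdate": "2018-03-00",
    "property": ["ARTICLE", "REFEREED"],
}
print(derive_fields(example))  # (2, 201803, True)
```

Storing the date as an integer such as `201803` is what makes the range selection `201803 <= YYYYMM <= 201908` a plain numeric comparison.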

## Select Rows for This BK Survey
I will select the papers with ``201803 <= YYYYMM <= 201908``. Following the columns of the ``2019보고서요청자료(연구실) - 논문`` Excel file ("2019 report request materials (lab) - papers"), I will keep only the following columns:

1. title
2. journal (full name)
3. issn
4. volume
5. issue
6. page
7. YYYYMM
8. number of authors 

in this order. The result is saved as ``BK2019_ishiguro.csv``, which you can open in Excel and copy-and-paste into the original Excel file.
* **WARNING**: The original Excel file has irregular formatting, so you will need to fix the formatting yourself after pasting.
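Excel may not detect UTF-8 in a plain CSV file, which can garble non-ASCII characters in titles. Writing with the `utf-8-sig` encoding adds a byte-order mark that Excel generally recognizes. With pandas this is `to_csv(..., encoding="utf-8-sig")`.

The sketch below does the same selection and export with the standard library. `export_bk_rows` is a hypothetical helper and the rows are made up.

```python
import csv
import tempfile
from pathlib import Path

COLUMNS = ["title", "pub", "issn", "volume", "issue", "page", "YYYYMM", "N_author"]


def export_bk_rows(rows, fpath, start=201803, end=201908):
    """Write rows with start <= YYYYMM <= end to fpath; return how many were kept."""
    kept = [r for r in rows if start <= r["YYYYMM"] <= end]
    # utf-8-sig prepends a byte-order mark so that Excel reads the file as UTF-8
    with open(fpath, "w", newline="", encoding="utf-8-sig") as f:
        # missing keys become empty cells; extra keys are ignored
        writer = csv.DictWriter(f, fieldnames=COLUMNS, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(kept)
    return len(kept)


if __name__ == "__main__":
    rows = [  # hypothetical rows
        {"title": "Paper A", "pub": "Journal X", "volume": "1", "page": "10",
         "YYYYMM": 201802, "N_author": 3},
        {"title": "Paper B", "pub": "Journal Y", "volume": "2", "issue": "4",
         "page": "A5", "YYYYMM": 201812, "N_author": 5},
    ]
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp) / "BK2019_demo.csv"
        print(export_bk_rows(rows, out))  # 1
```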

In [2]:
results_ref_2019 = results_ref[(results_ref["YYYYMM"] >= 201803) & (results_ref["YYYYMM"] <= 201908)]
results_ref_BK2019 = results_ref_2019[["title", "pub", "issn", "volume", "issue", "page", "YYYYMM", "N_author"]]
results_ref_BK2019.to_csv("BK2019_ishiguro.csv", index=False)

In [3]:
results_ref_BK2019

| | title | pub | issn | volume | issue | page | YYYYMM | N_author |
|---|---|---|---|---|---|---|---|---|
| 285 | Significantly high polarization degree of the ... | Astronomy and Astrophysics | | 611 | [None] | A31 | 201803 | 10 |
| 286 | Extremely strong polarization of an active ast... | Nature Communications | | 9 | [None] | 2486 | 201806 | 12 |
| 287 | The Reactivation and Nucleus Characterization ... | The Astronomical Journal | | 156 | 1 | 39 | 201807 | 7 |
| 288 | Opposition effect on S-type asteroid (25143) I... | Astronomy and Astrophysics | | 616 | [None] | A178 | 201809 | 2 |
| 292 | Optical observations of NEA 3200 Phaethon (198... | Astronomy and Astrophysics | | 619 | [None] | A123 | 201811 | 27 |
| 293 | The 2016 Reactivations of the Main-belt Comets... | The Astronomical Journal | | 156 | 5 | 223 | 201811 | 10 |
| 294 | High polarization degree of the continuum of c... | Astronomy and Astrophysics | | 620 | [None] | A161 | 201812 | 19 |
| 295 | Physical properties of near-Earth asteroids wi... | Publications of the Astronomical Society of Japan | | 70 | 6 | 114 | 201812 | 43 |
| 300 | Hayabusa2 arrives at the carbonaceous asteroid... | Science | | 364 | 6437 | 268 | 201904 | 88 |
| 301 | Shape and Rotational Motion Models for Tumblin... | The Astronomical Journal | | 157 | 4 | 155 | 201904 | 16 |


## Download the PDF Files of the Papers
I will use the ADS link gateway and, for each paper, try:
1. the publisher's PDF (``PUB_PDF``), if ADS lists one.
   - For Science, this link does not lead directly to the full PDF, so the code rewrites the URL.
2. otherwise, a guess based on the publisher's HTML page (``PUB_HTML``).
   - For Nature, for example, appending ``.pdf`` to the article URL seems to lead to the PDF.

I will add more exceptions over time so that it works for as many journals as possible.
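The URL rules above can live in small pure functions. These are easy to check without network access, and each new publisher exception becomes one more branch. `gateway_url` and `candidate_pdf_url` are hypothetical names that mirror the notebook's logic. The Science example URL in the test is invented for illustration.

```python
GATEWAY = "https://ui.adsabs.harvard.edu/link_gateway/"


def gateway_url(bibcode, source):
    """ADS link-gateway URL for a bibcode, e.g. source="PUB_PDF" or "PUB_HTML"."""
    return f"{GATEWAY}{bibcode}/{source}"


def candidate_pdf_url(resolved_url, pub, source):
    """Guess the PDF URL from the URL that the gateway redirected to.

    resolved_url: final URL after following the gateway link (response.url)
    pub:          journal name from the ADS "pub" field
    source:       which gateway link was followed, "PUB_PDF" or "PUB_HTML"
    """
    if source == "PUB_PDF":
        if "Science" not in pub:
            return resolved_url  # the publisher's PDF link is used as-is
        suffix = "/tab-pdf"
        if resolved_url.endswith(suffix):
            return resolved_url[: -len(suffix)] + ".full.pdf"
        return resolved_url + ".full.pdf"
    if source == "PUB_HTML":
        # seems to work for Nature article pages; other publishers may differ
        return resolved_url + ".pdf"
    raise ValueError(f"unsupported source: {source!r}")


print(candidate_pdf_url("https://www.nature.com/articles/s41467-018-04727-2",
                        "Nature Communications", "PUB_HTML"))
# https://www.nature.com/articles/s41467-018-04727-2.pdf
```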

In [25]:
BASE = "https://ui.adsabs.harvard.edu/link_gateway/"
# helped from https://stackoverflow.com/questions/43165341/python3-requests-connectionerror-connection-aborted-oserror104-econnr/43167631
headers = requests.utils.default_headers()
headers['User-Agent'] = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36'

for _, row in results_ref_2019.iterrows():
    bib = row["bibcode"]
    fpath = Path(f'{bib}.pdf')
    print(fpath, end=' ')

    if fpath.exists():
        print('already exists!')
        continue

    if "PUB_PDF" in row["esources"]:
        url = BASE + bib + "/PUB_PDF"
        print('is freely available from ADS: Downloading...', end=' ')

        response = requests.get(url, headers=headers, stream=True)

        # Science's PUB_PDF link does not point to the full PDF itself
        if "Science" in row["pub"]:
            if response.url.endswith("/tab-pdf"):
                url = response.url.replace("/tab-pdf", ".full.pdf")
            else:
                url = response.url + ".full.pdf"
            response = requests.get(url, headers=headers, stream=True)

        print("\n\t" + response.url)
        # do not save an error page as a .pdf (it would be skipped on re-runs)
        if not response.ok:
            print(f"\t!!! HTTP {response.status_code}. Download from above manually.")
            continue
        time.sleep(1)  # be polite to the publisher's server

        download_pdf(response, fpath)

    else:
        try:
            print("trying to find pdf...", end=' ')
            url = BASE + bib + "/PUB_HTML"
            response = requests.get(url, headers=headers, stream=True)
            # e.g., Nature: appending ".pdf" to the article URL gives the PDF
            url = response.url + ".pdf"
            response = requests.get(url, headers=headers, stream=True)
            response.raise_for_status()  # any 4xx/5xx, not only 404
            print('I found it! Downloading...', end=' ')
            print("\n\t" + response.url)
            time.sleep(1)

            download_pdf(response, fpath)

        # requests' own ConnectionError is not the built-in ConnectionError,
        # so catch the base class of all requests exceptions instead
        except requests.exceptions.RequestException:
            print("!!! I couldn't find a valid link. Download from below:")
            print("\t" + BASE + bib + "/PUB_HTML")


2018A&A...611A..31K.pdf is freely available from ADS: Downloading... 
	https://www.aanda.org/articles/aa/pdf/2018/03/aa32086-17.pdf


1.70kkB [00:05, 315kB/s]                           


2018NatCo...9.2486I.pdf trying to find pdf... I found it! Downloading... 
	https://www.nature.com/articles/s41467-018-04727-2.pdf


749kB [00:00, 4.45kkB/s]                         


2018AJ....156...39H.pdf is freely available from ADS: Downloading... 
	https://iopscience.iop.org/article/10.3847/1538-3881/aac81c/pdf


1.84kkB [00:06, 273kB/s]


2018A&A...616A.178L.pdf is freely available from ADS: Downloading... 
	https://www.aanda.org/articles/aa/pdf/2018/08/aa32721-18.pdf


6.52kkB [00:08, 774kB/s]                           


2018A&A...619A.123K.pdf is freely available from ADS: Downloading... 
	https://www.aanda.org/articles/aa/pdf/2018/11/aa33593-18.pdf


1.77kkB [00:02, 732kB/s]                           


2018AJ....156..223H.pdf is freely available from ADS: Downloading... 
	https://iopscience.iop.org/article/10.3847/1538-3881/aae528/pdf


2.29kkB [00:06, 373kB/s]


2018A&A...620A.161K.pdf is freely available from ADS: Downloading... 
	https://www.aanda.org/articles/aa/pdf/2018/12/aa33968-18.pdf


1.87kkB [00:02, 653kB/s]                           


2018PASJ...70..114H.pdf is freely available from ADS: Downloading... 
	https://watermark.silverchair.com/psy119.pdf?token=AQECAHi208BE49Ooan9kkhW_Ercy7Dm3ZL_9Cf3qfKAc485ysgAAAlwwggJYBgkqhkiG9w0BBwagggJJMIICRQIBADCCAj4GCSqGSIb3DQEHATAeBglghkgBZQMEAS4wEQQMRcQopE7tgUDumUR3AgEQgIICD83SiNFFndqDC-xSosEzxPKdDt3Mm5ytHI81USEkS5dE_sciSOqUIZp98kFqbPqY6DfeYixEcpjdGX0VWhsjriUadWB5-j87wybDotKeZD4d9w_8f7akDL2PHvZyLt-FWuvAOCR54eF6c1poYe7LUhUm9ybRgSgrKayk4rVJm5Mql7-HLBp24bWlssa5kBZjMn3OFDagKDty8UjdRzev7AbysyIPGOi6Hbshkbh_9OPI_sZsplsNyJvFECZwcaJLBz9ENo3dLQmi8KxDacS0OWiUSjzJEH_0aRsIMJsCgHQOP3uIyD8VeDHNt_B53X_3QJuXFL-zo-GkSk1D1QLpWqBrpM8TQZ8IbY-F3oW1gSj78BhLaO--ffbenF37U5FhcqqKiUtK2CeXSYEhF8YkBVXI-EroNvpOcm2OwdF47O_FFJeiiaG87e6XBdDtYxq-dR3kHpCTAywP_CMTYG-lrvPp5QZNLQIZDAhF4_MajNDcM3FpoDtR5yZqwVjdva9W1kUlr2ldwlCe10-4mEormfZcXey9hmtMuJh2bIle51l_wGYGBtJoAGyWSD_SCZ3NFKuadYdzoR6nHZlAcTz35th2waVmydEFGd3xwvKy2vYhLC4CV5xCSYmwVp15JrpZg0syutlCTn3ydvi31yNS1A9wiCLZoU2VrVnfuPZC1MJj79gvo3yF3Pe6KB2ogbSH


3.42kkB [00:21, 161kB/s]                            


2019Sci...364..268W.pdf is freely available from ADS: Downloading... 
	https://science.sciencemag.org/content/sci/364/6437/268.full.pdf


839kB [00:00, 1.46kkB/s]                         


2019AJ....157..155U.pdf is freely available from ADS: Downloading... 
	https://iopscience.iop.org/article/10.3847/1538-3881/ab09f0/pdf


1.94kkB [00:02, 926kB/s]


2019Sci...364..252S.pdf is freely available from ADS: Downloading... 
	https://science.sciencemag.org/content/sci/364/6437/eaaw0422.full.pdf


12.5kkB [00:24, 519kB/s]                           
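Because the fallback branch guesses URLs, a request can succeed yet return an HTML page, such as a login or paywall page, which is then saved under a `.pdf` name. The sanity check below assumes only that genuine PDF files begin with the `%PDF-` header. Some viewers tolerate a few leading bytes before it, so treat a hit as a prompt to look rather than proof. `looks_like_pdf` and `find_non_pdfs` are hypothetical helpers.

```python
import tempfile
from pathlib import Path


def looks_like_pdf(fpath):
    """True if the file starts with the %PDF- signature."""
    with open(fpath, "rb") as f:
        return f.read(5) == b"%PDF-"


def find_non_pdfs(folder="."):
    """Return sorted names of *.pdf files in folder that lack the PDF signature."""
    return sorted(p.name for p in Path(folder).glob("*.pdf") if not looks_like_pdf(p))


if __name__ == "__main__":
    for name in find_non_pdfs("."):
        print("Not a real PDF, re-download manually:", name)
```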


* **WARNING**: Papers that have been accepted but are not yet on ADS will not appear here. You **MUST** find those yourself!