## Checking versions
Please do not run this code on your computer if you don't understand what it is.

In [24]:
%load_ext version_information
import time
now = time.strftime("%Y-%m-%d %H:%M:%S (%Z = GMT%z)")
print(f"This notebook was generated at {now} ")

vv = %version_information requests, tqdm, pandas, astroquery, version_information
for i, pkg in enumerate(vv.packages):
    print(f"{i} {pkg[0]:10s} {pkg[1]:s}")

The version_information extension is already loaded. To reload it, use:
  %reload_ext version_information
This notebook was generated at 2019-07-02 11:12:32 (KST = GMT+0900) 
0 Python     3.7.3 64bit [Clang 4.0.1 (tags/RELEASE_401/final)]
1 IPython    6.5.0
2 OS         Darwin 18.6.0 x86_64 i386 64bit
3 requests   2.22.0
4 tqdm       4.32.1
5 pandas     0.24.2
6 astroquery 0.3.10.dev5533
7 version_information 1.0.3


## Importing and Setting Up

In [2]:
from astroquery import nasa_ads as na
import requests
import time
from pathlib import Path
import math
from tqdm import tqdm

# helped from https://stackoverflow.com/questions/37573483/progress-bar-while-download-file-over-http-with-requests
def download_pdf(response, fpath):
    total_size = int(response.headers.get('content-length', 0)); 
    block_size = 1024
    wrote = 0 
    with open(fpath, 'wb') as f:
        for data in tqdm(response.iter_content(block_size), total=math.ceil(total_size//block_size), unit='kB', unit_scale=True):
            wrote = wrote + len(data)
            f.write(data)
#     if total_size != 0 and wrote != total_size:
#         print("ERROR, something went wrong")  

## Query to ADS

1. Go to [ADS](https://ui.adsabs.harvard.edu/), log in. 
2. Then go to [Account - Settings - API Token](https://ui.adsabs.harvard.edu/user/settings/token). 
3. Generate your token.
4. Copy and paste it to ``na.ADS.TOKEN`` below:

In [21]:
na.ADS.TOKEN = ''

# by default, the top 10 records are returned, sorted in
# reverse chronological order. This can be changed

# change the number of rows returned
na.ADS.NROWS = 9999

# change the fields that are returned (enter as strings in a list)
na.ADS.ADS_FIELDS = ["title", "bibcode", "author", "pubdate", "property", "esources",
                     "pub", "issn", "volume", "issue", "page", "doi", "arxiv", "bibstem", "database"]

author = "Ishiguro, Masateru"
year = "2000-2019"
query_str = f'author:"={author}" year:{year}'
print(f"Query with: \n\t {query_str}")
results = na.ADS.query_simple(query_str)

results.sort(['pubdate', "title"])

# flatten the shape to convert to pandas... 
# I currently don't know what bad thing will happen.
# It was OK when I tested for my personal purposes.
for c in results.colnames:
    if len(results[c].shape) > 1:
        results[c] = results[c][:, 0]

results = results.to_pandas()

results["N_author"] = results["author"].str.len()
results["YYYYMM"] = results["pubdate"].str[:-3].str.replace("-", "").astype(int)
results["refereed"] = [True if "REFEREED" in row["property"] else False for i, row in results.iterrows()]
results["astronomy"] = [True if "astronomy" in row["database"] else False for i, row in results.iterrows()]
results = results[results["astronomy"]==True]

results_ref = results[results["refereed"]==True]

print(f"ADS contains {len(results)} match with <{author}> (refreed: {len(results_ref)}) in {year}.")
if len(results_ref) > 100:
    print(f"Hey {author}, you are awesome.")

Query with: 
	 author:"=Ishiguro, Masateru" year:2000-2019
ADS contains 285 match with <Ishiguro, Masateru> (refreed: 111) in 2000-2019.
Hey Ishiguro, Masateru, you are awesome.


See http://adsabs.github.io/help/search/comprehensive-solr-term-list for the complete list of columns

## Select Rows for This BK Survey
I will select those with ``201803 <= YYYYMM <= 201908``. Also, based on the columns of ``2019보고서요청자료(연구실) - 논문`` Excel file, I will only select the 

1. title
2. journal (full name)
3. issn
4. volume
5. issue
6. page
7. YYYYMM
8. number of authors 

in this order. It will be saved as ``BK2019_ishiguro.csv`` and you can open it with Excel, copy-and-paste to the original Excel file.
* **WARNING**: The formatting is crazy in the original Excel, so you should do it by yourself.

In [23]:
results_ref_2019 = results_ref[(results_ref["YYYYMM"] >= 201003) & (results_ref["YYYYMM"] <= 201908)]
results_ref_BK2019 = results_ref_2019[["title", "pub", "issn", "volume", "issue", "page", "YYYYMM", "N_author"]]
results_ref_BK2019.to_csv("BK2019_ishiguro.csv", index=False)

In [24]:
results_ref_BK2019

Unnamed: 0,title,pub,issn,volume,issue,page,YYYYMM,N_author
164,Hayabusa AMICA V1.0,NASA Planetary Data System,,[None],[None],HAY-A-AMICA-3-HAYAMICA-V1.0,201003,34
165,Surface morphological features of boulders on ...,Icarus,,206,1,319,201003,14
166,2007 Outburst of 17P/Holmes: The Albedo and th...,The Astrophysical Journal,,714,2,1324,201005,14
167,The Hayabusa Spacecraft Asteroid Multi-band Im...,Icarus,,207,2,714,201006,15
168,Detection of Parent H<SUB>2</SUB>O and CO<SUB>...,The Astrophysical Journal,,717,1,L66,201007,16
172,Brightness map of the zodiacal emission from t...,Astronomy and Astrophysics,,523,[None],A53,201011,9
173,Comet 67P/Churyumov-Gerasimenko: the GIADA dus...,Astronomy and Astrophysics,,522,[None],A63,201011,16
174,Outburst of Comet 217P/LINEAR,The Astrophysical Journal,,724,1,L118,201011,5
176,Search for the Comet Activity of 107P/(4015) W...,The Astrophysical Journal,,726,2,101,201101,17
184,Physical Properties of Main-belt Comet 176P/LI...,The Astronomical Journal,,142,1,29,201107,4


## Download the PDF Files of the Papers
I will use the ADS web link and try
1. to access to the publisher's PDF if available
  - For Science, the publisher's PDF link is not directed to the full pdf, so I added some conditional clause.
2. if unavailable, I tried something
  - Nature, for example, adding ``.pdf`` seem to direct you to the pdf.
  
As time goes, I will add more exceptions so that it works as perfect as possible.

In [32]:
BASE = "https://ui.adsabs.harvard.edu/link_gateway/"
# helped from https://stackoverflow.com/questions/43165341/python3-requests-connectionerror-connection-aborted-oserror104-econnr/43167631
headers = requests.utils.default_headers()
headers['User-Agent'] = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36'

for i, row in results_ref_2019.iterrows():
    bib = row["bibcode"]
    fpath = Path('{}.pdf'.format(bib))
    print(fpath, end=' ')
    
    if fpath.exists():
        print('already exists!'.format(bib))
        continue
        
    if "PUB_PDF" in row["esources"]:
        url = BASE + row["bibcode"] + "/PUB_PDF"
        print('is freely available from ADS: Downloading...'.format(bib), end=' ')

        response = requests.get(url, headers=headers, stream=True)
        
        if "Science" in row["pub"]:
            if response.url.endswith("/tab-pdf"):
                url = response.url.replace("/tab-pdf", ".full.pdf")
            else:
                url = response.url + ".full.pdf"
            response = requests.get(url, headers=headers, stream=True)

        print("\n\t" + response.url)
        time.sleep(1)
        
        download_pdf(response, fpath)

    else:
        try:
            print("trying to find pdf...", end=' ')
            url = BASE + row["bibcode"] + "/PUB_HTML"
            response = requests.get(url, headers=headers, stream=True)
            if "elsevier.com" in response.url:
                raise ConnectionError()
            elif "nature.com" in response.url:
                url = response.url + ".pdf"
            response = requests.get(url, headers=headers, stream=True)
            if response.status_code == 404:
                raise ConnectionError()
            print('I found it! Downloading...'.format(bib), end=' ')
            print("\n\t" + response.url)
            time.sleep(1)
        
            download_pdf(response, fpath)
            
        except ConnectionError:            
            print("!!! I couldn't find a valid link. Download from below:".format(bib))
            print("\t" + BASE + bib + "/PUB_HTML")


2010PDSS...90.....S.pdf already exists!
2010Icar..206..319N.pdf already exists!
2010ApJ...714.1324I.pdf already exists!
2010Icar..207..714I.pdf already exists!
2010ApJ...717L..66O.pdf already exists!
2010AcCrA..66..484I.pdf already exists!
2010A&A...523A..53P.pdf already exists!
2010A&A...522A..63F.pdf already exists!
2010ApJ...724L.118S.pdf already exists!
2011ApJ...726..101I.pdf already exists!
2011PlST...13..307H.pdf already exists!
2011AJ....142...29H.pdf already exists!
2011JNuM..415S.620Q.pdf trying to find pdf... !!! I couldn't find a valid link. Download from below:
	https://ui.adsabs.harvard.edu/link_gateway/2011JNuM..415S.620Q/PUB_HTML
2011PhPl...18i2306O.pdf trying to find pdf... I found it! Downloading... 
	https://aip.scitation.org/doi/10.1063/1.3640494


59.0kB [00:00, 6.93kkB/s]


2011Icar..215...17U.pdf trying to find pdf... !!! I couldn't find a valid link. Download from below:
	https://ui.adsabs.harvard.edu/link_gateway/2011Icar..215...17U/PUB_HTML
2011PASJ...63.1117U.pdf is freely available from ADS: Downloading... 
	https://watermark.silverchair.com/pasj63-1117.pdf?token=AQECAHi208BE49Ooan9kkhW_Ercy7Dm3ZL_9Cf3qfKAc485ysgAAAmUwggJhBgkqhkiG9w0BBwagggJSMIICTgIBADCCAkcGCSqGSIb3DQEHATAeBglghkgBZQMEAS4wEQQM16FfVBrzKMZs_JsJAgEQgIICGNM4cKmcVYhPy_YnX6JY9WVH-YFpyj5TCBKJavmGa-3zc2PkLDJua9lKMrXLDZQrMuIA01-4t4Z864g3w-2cd-OuM3MNAcausjHaORBVo9OQeB06kI5_YRBdh8-5dL0nIm0lKdGOmD-yCcChQi5S9Xf_pZXcc0waX2V6WrMzyOg8DA_9JWteXWL8jPaLKueX1UBpEz8Kd3hgNNm0aZAUVDhRLcEl49lYAaWHGuUhyVWvC1x_nyZjWqZ_sr1Oolsm1jz2OogogmH7d5tTl31sxl6Enrq9-wGvSrS9QyDj4untyfpbl34NVaOmAdMXiLcBREaZa-_SHAFY4QWWQCrd_HEfvoUKxd5GoCYv0OJAdF5WOh0TrHiGV8v8Ct5Ad4lQ6FBhy8Rjgr7QEWbnvP3iW0BpDedLMYyCkSICaFM9SuLhJtfKCdnhvEm8GRFAoOLJ_z-jVPbVBHwOrOaOsHA4Qz65bRY0q5fRcAwc6HbhwbeJpFh0fr2LNqG6j6jOMuUaxrNZMMHc5Hi0aXiSiS8lI7E1DfYJDYD

6.38kkB [00:50, 126kB/s]                           


2011ApJ...740L..11I.pdf is freely available from ADS: Downloading... 
	https://iopscience.iop.org/article/10.1088/2041-8205/740/1/L11/pdf


1.03kkB [00:04, 220kB/s]


2011ApJ...741L..24I.pdf is freely available from ADS: Downloading... 
	https://iopscience.iop.org/article/10.1088/2041-8205/741/1/L24/pdf


503kB [00:05, 89.1kB/s]


2011ITPS...39.3006B.pdf trying to find pdf... I found it! Downloading... 
	https://ieeexplore.ieee.org/document/6008663/


15.0kB [00:00, 2.76kkB/s]                  


2012ApJ...746L..11K.pdf is freely available from ADS: Downloading... 
	https://iopscience.iop.org/article/10.1088/2041-8205/746/1/L11/pdf


1.08kkB [00:04, 232kB/s]


2012ApJ...752...15O.pdf is freely available from ADS: Downloading... 
	https://iopscience.iop.org/article/10.1088/0004-637X/752/1/15/pdf


1.11kkB [00:05, 218kB/s]


2012AJ....143..141K.pdf is freely available from ADS: Downloading... 

KeyboardInterrupt: 

* **WARNING**: You may have some papers that are accepted but not on ADS yet. You **MUST** find those by yourself!!!