This notebook identifies and gathers the metadata for Heliophysics review papers whose references serve as ground truth for various topics.

This approach of using review papers' references as ground truth follows [Belter, C. W. (2016)](https://doi.org/10.1002/asi.23605):
- Belter, C. W. (2016). Citation analysis as a literature search method for systematic reviews. Journal of the Association for Information Science and Technology, 67(11), 2766–2777. https://doi.org/10.1002/asi.23605
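
One way such a ground-truth set might be used is to score a candidate search against it. The sketch below is an illustration under assumptions, not the notebook's own evaluation code: `recall_precision` is a hypothetical helper, the ground-truth bibcodes are a handful taken from the reference list printed later in this notebook, and the last retrieved entry is a deliberate placeholder rather than a real bibcode.

```python
def recall_precision(retrieved, ground_truth):
    """Return (recall, precision) of retrieved bibcodes against a ground-truth set."""
    retrieved = set(retrieved)
    ground_truth = set(ground_truth)
    hits = retrieved & ground_truth
    # Guard against empty inputs so the ratios are always defined
    recall = len(hits) / len(ground_truth) if ground_truth else 0.0
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    return recall, precision


# Illustrative values only
ground_truth = ['1958ApJ...128..664P', '1963JGR....68.6361S',
                '1974GeoRL...1...11R', '1980ApJ...236..696K']
retrieved = ['1958ApJ...128..664P', '1974GeoRL...1...11R',
             'PLACEHOLDER-NOT-A-REAL-BIBCODE']

recall, precision = recall_precision(retrieved, ground_truth)
print('recall = {:.2f}, precision = {:.2f}'.format(recall, precision))
```

Here two of the four ground-truth papers are retrieved (recall 0.50) and two of the three retrieved items are in the ground truth (precision 0.67).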


In [4]:
# Import standard Python libraries
import urllib
import urllib3
from urllib.parse import urlencode
import requests
import json
import sys
import math
import csv
from datetime import datetime

import pandas as pd

#### Utility Functions

In [10]:
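# Execute a search query (requests-based version, "Ryan method")
def do_query(URL, params):
    qparams = urlencode(params)
    # APItoken is a module-level variable set in the query-parameters cell
    response = requests.get("{}?{}".format(URL, qparams),
                            headers={'Content-type': 'application/json',
                                     'Accept': 'text/plain',
                                     'Authorization': 'Bearer ' + APItoken})
    # Fail early on HTTP errors (e.g. a rejected token) rather than parsing an error body
    response.raise_for_status()
    return response.json()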

#     # Edwin method
# def do_query(URL, params):
#     qparams = urllib.parse.urlencode(params)
#     req = urllib.request.Request("%s?%s"%(URL, qparams))
#     # and add the correct header information
#     req.add_header('Content-type', 'application/json')
#     req.add_header('Accept', 'text/plain')
#     req.add_header('Authorization', 'Bearer %s' % APItoken)
#     # do the actual request
#     resp = urllib.request.urlopen(req)
#     # and retrieve the data to work with
#     data = json.load(resp)
#     return data

# Get records from Solr, paging through the results `rows` documents at a time
def get_records(token, query_string, return_fields):
    # NOTE: `token` is currently unused; do_query reads the global APItoken
    start = 0
    results = []
    params = {
        'q': query_string,
        'fl': return_fields,
        'rows': rows,
        'start': start
    }
    data = do_query(QUERY_URL, params)
    try:
        results = data['response']['docs']
    except (KeyError, TypeError):
        raise Exception('Solr returned unexpected data!')
    num_documents = int(data['response']['numFound'])
    # Number of additional requests needed after the first one
    num_paginates = int(math.ceil(num_documents / (1.0 * rows))) - 1
    start += rows
    for i in range(num_paginates):
        params['start'] = start
        data = do_query(QUERY_URL, params)
        try:
            results += data['response']['docs']
        except (KeyError, TypeError):
            raise Exception('Solr returned unexpected data!')
        start += rows
    return results
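
The paging arithmetic in `get_records` can be checked without calling the API. The following is a minimal sketch, assuming the same `rows` / `start` / `numFound` semantics as above; `page_starts` is a hypothetical helper that is not part of the notebook.

```python
import math


def page_starts(num_found, rows):
    """Return the `start` offsets needed to fetch num_found documents, rows at a time."""
    if rows <= 0:
        raise ValueError('rows must be positive')
    num_pages = math.ceil(num_found / rows)
    return [page * rows for page in range(num_pages)]


print(page_starts(5, 2))   # [0, 2, 4]
print(page_starts(1, 1))   # [0]
print(page_starts(0, 10))  # []
```

With `rows = 1` and a single-bibcode query, only the first offset is needed, which matches `num_paginates` evaluating to 0 in `get_records`.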

In [25]:
# Set query parameters for all queries

APItoken = 'Qe9f4qU8ITbD0rP7rIRNvlgxZ5vYfahY8PW05en0'

rows = 1

# Address of API
API_URL = 'https://api.adsabs.harvard.edu/v1'
QUERY_URL = "{}/search/query".format(API_URL)

# What data do we need back from Solr
fields = "bibcode,doi,first_author_norm,title,property,citation_count,citation,keyword,reference"
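
As a hedged alternative to writing the token into the notebook, it could be read from an environment variable. The sketch below uses only the standard library; the variable name `ADS_API_TOKEN` and the helpers `load_token` and `build_request` are assumptions made for illustration, not part of the notebook or of the ADS API.

```python
import os
from urllib.parse import urlencode, urlsplit, parse_qs

API_URL = 'https://api.adsabs.harvard.edu/v1'
QUERY_URL = "{}/search/query".format(API_URL)


def load_token(env_var='ADS_API_TOKEN'):
    """Read the API token from an environment variable (the variable name is an assumption)."""
    token = os.environ.get(env_var)
    if not token:
        raise RuntimeError('Set the {} environment variable first'.format(env_var))
    return token


def build_request(token, query_string, return_fields, rows=1, start=0):
    """Return the request URL and headers without sending anything."""
    params = {'q': query_string, 'fl': return_fields, 'rows': rows, 'start': start}
    url = "{}?{}".format(QUERY_URL, urlencode(params))
    headers = {'Content-type': 'application/json',
               'Accept': 'text/plain',
               'Authorization': 'Bearer ' + token}
    return url, headers


# 'dummy-token' stands in for load_token() so the example runs without credentials
url, headers = build_request('dummy-token', 'bibcode:2017LRSP...14....5K', 'bibcode,reference')
print(url)
print(parse_qs(urlsplit(url).query))
```

Decoding the query string with `parse_qs` is a quick way to confirm that `urlencode` has escaped the colon and commas as expected.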



#### Topic: Coronal Mass Ejections

Review paper: 
- Kilpua, E., Koskinen, H.E.J. & Pulkkinen, T.I., "Coronal mass ejections and their sheath regions in interplanetary space", Living Rev Sol Phys (2017) 14: 5. https://doi.org/10.1007/s41116-017-0009-6
    
"This review focuses on the current understanding of observational signatures and properties of ICMEs and the associated sheath regions based on five decades of studies"

ADS Bibcode (ADS-assigned unique identifier): 2017LRSP...14....5K
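
An ADS bibcode is a fixed-width, 19-character string of the form `YYYYJJJJJVVVVMPPPPA` (year, journal abbreviation, volume, qualifier, page, first author's initial), padded with dots. The sketch below splits one into its fields; `parse_bibcode` is a hypothetical helper, and the field boundaries reflect the standard layout as understood here rather than anything defined in this notebook.

```python
def parse_bibcode(bibcode):
    """Split a 19-character ADS bibcode into its fixed-width fields."""
    if len(bibcode) != 19:
        raise ValueError('expected a 19-character bibcode, got {!r}'.format(bibcode))
    return {
        'year': int(bibcode[0:4]),
        'journal': bibcode[4:9].strip('.'),      # journal abbreviation, dot-padded
        'volume': bibcode[9:13].strip('.'),      # volume, dot-padded
        'qualifier': bibcode[13].strip('.'),     # e.g. 'L' for a letters section
        'page': bibcode[14:18].strip('.'),       # page or article number, dot-padded
        'author_initial': bibcode[18],           # first author's surname initial
    }


parsed = parse_bibcode('2017LRSP...14....5K')
print(parsed)
```

For the review above this yields year 2017, journal `LRSP`, volume `14`, page `5`, and author initial `K`.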


**Alternatives/additions**
- https://ui.adsabs.harvard.edu/abs/2012LRSP....9....3W/abstract


In [29]:
# Gathering the references for this article
query = 'bibcode: 2017LRSP...14....5K'

try:
    pubdata = get_records(APItoken, query, fields)
except Exception:
    sys.exit('Failed to get results for query provided')

    
print('Number of ground-truth papers from this review article = {}'.format(len(pubdata[0]['reference'])))

Number of ground-truth papers from this review article = 310


#### Topic: Solar Wind

Review paper: 
- Vidotto, A.A. The evolution of the solar wind. Living Rev Sol Phys 18, 3 (2021). https://doi.org/10.1007/s41116-021-00029-w
    
"discuss the long-term evolution of the solar wind, including the evolution of observed properties that are intimately linked to the solar wind: rotation, magnetism and activity"

ADS Bibcode (ADS-assigned unique identifier): 2021LRSP...18....3V

**Alternatives/additions**
- ...


In [38]:
# Gathering the references for this article
query = 'bibcode: 2021LRSP...18....3V'

try:
    pubdata = get_records(APItoken, query, fields)
except Exception:
    sys.exit('Failed to get results for query provided')

    
print('Number of ground-truth papers from this review article = {}'.format(len(pubdata[0]['reference'])))

Number of ground-truth papers from this review article = 279


#### Topic: Space Weather

Review papers (x3 to cover perspectives on SpWx): 
- Schwenn R (2006) Space weather: the solar perspective. Living Rev Sol Phys 3:2. https://doi.org/10.12942/lrsp-2006-2 
- (an update to Schwenn 2006, which could be combined with it): Temmer, M. Space weather: the solar perspective. Living Rev Sol Phys 18, 4 (2021). https://doi.org/10.1007/s41116-021-00030-3
- Pulkkinen, T. Space Weather: Terrestrial Perspective. Living Rev. Sol. Phys. 4, 1 (2007). https://doi.org/10.12942/lrsp-2007-1


ADS Bibcodes (ADS-assigned unique identifier): 
- 2006LRSP....3....2S
- 2021LRSP...18....4T
- 2007LRSP....4....1P


**Alternatives/additions**

- https://www.sciencedirect.com/science/article/pii/S0273117715002252?via%3Dihub ("This roadmap prioritizes the scientific focus areas and research infrastructure that are needed to significantly advance our understanding of space weather of all intensities and of its implications for society")
    - NOTE: only 18 references (space weather is a new field!)
    - 2015AdSpR..55.2745S
- https://www.frontiersin.org/articles/10.3389/fspas.2021.786308/full#B110
    - more recent, 127 references, including many of the review articles we have selected in this notebook
    - 2022FrASS...8..253B

In [36]:
# Gathering the references for these articles
queries = ['bibcode: 2006LRSP....3....2S',
           'bibcode: 2021LRSP...18....4T',
           'bibcode: 2007LRSP....4....1P']

references_running = []

for query in queries:
    try:
        pubdata = get_records(APItoken, query, fields)
    except Exception:
        sys.exit('Failed to get results for query provided')

    # Append this review's reference list to the running list (duplicates are kept)
    references_running = references_running + pubdata[0]['reference']

    print('Number of total ground-truth papers from these review articles = {}'.format(len(references_running)))



Number of total ground-truth papers from these review articles = 234
Number of total ground-truth papers from these review articles = 668
Number of total ground-truth papers from these review articles = 799
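
The running total above is a simple concatenation, so any paper cited by more than one of the reviews would be counted once per review. Whether these reviews actually overlap is not shown here; the sketch below only illustrates how duplicates could be dropped while keeping first-seen order. `unique_in_order` is a hypothetical helper, and the repeated entry in the example list is artificial.

```python
def unique_in_order(bibcodes):
    """Remove duplicate bibcodes while preserving the order of first appearance."""
    # dict keys are unique and keep insertion order
    return list(dict.fromkeys(bibcodes))


# Illustrative list with one artificial repeat
combined = ['1958ApJ...128..664P', '1963JGR....68.6361S', '1958ApJ...128..664P']
unique = unique_in_order(combined)
print(len(combined), len(unique))  # 3 2
```

Applied to `references_running`, the same call would give the number of distinct ground-truth papers rather than the number of citations.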


In [37]:
references_running

['1932TeMAE..37....1B',
 '1944PASP...56..156R',
 '1950AuSRA...3..541W',
 '1958ApJ...128..664P',
 '1959IAUS....9..194R',
 '1960AJ.....65U.494M',
 '1963JGR....68.6361S',
 '1965AnAp...28..125D',
 '1966JGR....71..965C',
 '1969SoPh....9..452S',
 '1970esp..book.....M',
 '1971JGR....76.3534B',
 '1971SoPh...18..313H',
 '1972SoPh...26..468A',
 '1972cesw.book.....H',
 '1973A&A....28..131D',
 '1973Ap&SS..24..371S',
 '1973JGR....78.2001G',
 '1973SoPh...29..505K',
 '1974GeoRL...1...11R',
 '1974IAUS...57..333B',
 '1974JGR....79.3103M',
 '1974JGR....79.4581G',
 '1974SSRv...16..257S',
 '1975JGR....80.4204B',
 '1976JGR....81.2111G',
 '1976JGR....81.5054F',
 '1976SoPh...48..361B',
 '1976sofl.book.....S',
 '1977ASSL...71....3H',
 '1977JGR....82.1487B',
 '1977RvGSP..15..271A',
 '1977chhs.conf.....Z',
 '1978JGR....83...75P',
 '1978JGR....83..616G',
 '1978JGR....83.1011S',
 '1979SoPh...62..179B',
 '1980ApJ...236..696K',
 '1980ApJ...236L..97P',
 '1980GeoRL...7..201S',
 '1980IAUS...86..369G',
 '1980JGR....85.

#### Topic: Solar Wind + Magnetosphere + Coupling

Combine two review papers' references: 
- Solar Wind: 2021LRSP...18....3V
- Magnetosphere and Coupling: 2007LRSP....4....1P


**Alternatives/additions**
- [The Need for a System Science Approach to Global Magnetospheric Models](https://ui.adsabs.harvard.edu/abs/2022FrASS...908629D/abstract)
    - 2022FrASS...908629D

In [40]:
# Gathering the references for these articles
queries = ['bibcode: 2021LRSP...18....3V',
           'bibcode: 2007LRSP....4....1P']

references_running = []

for query in queries:
    try:
        pubdata = get_records(APItoken, query, fields)
    except Exception:
        sys.exit('Failed to get results for query provided')

    # Append this review's reference list to the running list (duplicates are kept)
    references_running = references_running + pubdata[0]['reference']

    print('Number of total ground-truth papers from these review articles = {}'.format(len(references_running)))


Number of total ground-truth papers from these review articles = 279
Number of total ground-truth papers from these review articles = 410
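
Once a reference list has been gathered for each topic, it could be written out for later use. The following is a minimal sketch using the standard `csv` module; the helper `write_ground_truth`, the two-column layout (`topic`, `bibcode`), and the in-memory buffer are assumptions for illustration, and the two bibcodes are simply the first entries of the Space Weather list printed earlier.

```python
import csv
import io


def write_ground_truth(refs_by_topic, handle):
    """Write one (topic, bibcode) row per ground-truth paper to an open text handle."""
    writer = csv.writer(handle)
    writer.writerow(['topic', 'bibcode'])
    for topic, bibcodes in refs_by_topic.items():
        for bibcode in bibcodes:
            writer.writerow([topic, bibcode])


# Illustrative subset only
refs_by_topic = {'Space Weather': ['1932TeMAE..37....1B', '1944PASP...56..156R']}

buffer = io.StringIO()
write_ground_truth(refs_by_topic, buffer)
print(buffer.getvalue())
```

To write to disk instead, the same function can be given a file opened with `open(path, 'w', newline='')`.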
