# Download raw bibliographic metadata and save it locally

We keep the code that downloads the raw data separate from the rest, because we do not expect to run it very often.
There are two good reasons for this:
1. This is the code that takes the most time, mainly because we have to wait for the API to serve the data.
2. We don't want to exceed our [weekly quota](https://dev.elsevier.com/api_key_settings.html) by performing repeated queries.

## Compose the search query

We need to put together a query string.
It specifies a search phrase and, optionally, date filters, sorting and the number of results to return.
A query is the *question* you ask the database; [Elsevier's instructions on how to write a query](https://dev.elsevier.com/tecdoc_federated_search.html) describe the syntax.
The following example retrieves records that mention `Q Fever` in their title, abstract or keywords and were published from 2006 onwards.

In [1]:
query = ['PUBYEAR > 2005 AND TITLE-ABS-KEY(Q Fever)']
query.append("view=COMPLETE")
query.append("count=25")  # Our subscription allows up to 25 results at a time
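The list above is later joined with `&` and passed to `ElsSearch`. If you plan to run several searches, a small helper that composes the same string from its parts may save some typing. This is only a sketch: `build_query` and its parameter names are hypothetical and not part of elsapy.

```python
def build_query(phrase, after_year=None, view="COMPLETE", count=25):
    """Compose a Scopus query string like the one built in the cell above.

    Hypothetical helper: the parameter names are illustrative, not elsapy API.
    """
    search = "TITLE-ABS-KEY({0})".format(phrase)
    if after_year is not None:
        # PUBYEAR > 2005 keeps publications from 2006 onwards
        search = "PUBYEAR > {0} AND {1}".format(after_year, search)
    parts = [search, "view={0}".format(view), "count={0}".format(count)]
    return "&".join(parts)


print(build_query("Q Fever", after_year=2005))
# PUBYEAR > 2005 AND TITLE-ABS-KEY(Q Fever)&view=COMPLETE&count=25
```

The returned string is identical to `'&'.join(query)` for the example above, so it could be passed to `ElsSearch` in its place.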

Note: The maximum number of results that can be retrieved through the Scopus Search Query API is 5,000.
This limit is not specific to any API key or client; it is an artifact of the back-end search technology used by these APIs, as explained [*here*](https://dev.elsevier.com/tecdoc_developer_faq.html).

It can be circumvented by splitting the search into several narrower queries (for example, by adding filters on publication year), each returning fewer than 5,000 results, and submitting each one as a separate request. In other words, not a problem, just annoying.
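One way to do the splitting is to issue one query per publication year. The sketch below assumes that a single year of results stays under the 5,000 limit and that Scopus accepts the `PUBYEAR =` operator (check this against Elsevier's query instructions linked above); `yearly_queries` is a hypothetical helper. Each returned string could be passed to its own `ElsSearch` and saved to its own file.

```python
def yearly_queries(phrase, first_year, last_year,
                   extra_params=("view=COMPLETE", "count=25")):
    """Split one broad search into one query string per publication year.

    Assumes each single-year search stays under the 5,000-result ceiling;
    years that still exceed it would need to be split further.
    """
    queries = []
    for year in range(first_year, last_year + 1):
        search = "PUBYEAR = {0} AND TITLE-ABS-KEY({1})".format(year, phrase)
        queries.append("&".join([search] + list(extra_params)))
    return queries


for q in yearly_queries("Q Fever", 2006, 2008):
    print(q)
```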

## Execute the search and save the results to a local file

The only variable that needs to be modified here is `filename`, the base name shared by all output files.
Note that if files with this name already exist, they will be overwritten.

In [2]:
filename = "my_results"
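Because existing files are overwritten without warning, you may prefer to let the notebook pick a fresh name when needed. A minimal sketch, where `unique_basename` is a hypothetical helper and the timestamp suffix is an arbitrary choice:

```python
import os
from datetime import datetime


def unique_basename(basename, extensions=(".json", ".xlsx", ".csv")):
    """Return basename unchanged if none of its output files exist yet;
    otherwise append a timestamp so earlier downloads are not overwritten."""
    if not any(os.path.exists(basename + ext) for ext in extensions):
        return basename
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    return "{0}_{1}".format(basename, stamp)


filename = unique_basename("my_results")
```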

The following code initializes the `ElsClient` client and the `ElsSearch` object (`doc_srch`), and then executes the search.
To run it, you need to request a private API key and create your own [configuration file as described here](https://github.com/ElsevierDev/elsapy/blob/master/CONFIG.md).
**Note:** You should keep the file `config.json` private (e.g. list it in `.gitignore` so that it never ends up in a shared Git repository).
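The code below reads the key with `config['apikey']`. A slightly more defensive version of that step, which fails with a readable message when the file or the key is missing, could look like the following sketch. `load_api_key` is a hypothetical helper; only the `apikey` field name is taken from the notebook code.

```python
import json


def load_api_key(path="config.json"):
    """Read the Elsevier API key from a local JSON config file."""
    try:
        with open(path) as con_file:
            config = json.load(con_file)
    except FileNotFoundError:
        raise FileNotFoundError(
            "{0} not found; create it as described in elsapy's CONFIG.md".format(path)
        ) from None
    if "apikey" not in config:
        raise KeyError("'apikey' is missing from {0}".format(path))
    return config["apikey"]
```

With this helper, the client could be created as `client = ElsClient(load_api_key())`.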

The same results are saved in three different formats:
* `.json`: the original format, as received from the API
* `.xlsx`: an Excel file, convenient for quick examination
* `.csv`: "comma-separated values", which might be easier to import into **R**

You can comment out the code for the `.xlsx` and `.csv` formats if you don't need these files.
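Instead of commenting code in and out, the choice of formats can be made a parameter. The sketch below does this with the standard-library `json` and `csv` modules rather than pandas, so it only covers `.json` and `.csv` (Excel output would still need `DataFrame.to_excel`). `save_records` is a hypothetical helper that expects already-flattened records.

```python
import csv
import json


def save_records(records, basename, formats=("json", "csv")):
    """Save a list of flat dicts in the requested formats (sketch, no pandas).

    Returns the list of paths that were written.
    """
    written = []
    if "json" in formats:
        path = basename + ".json"
        with open(path, "w") as f:
            json.dump(records, f)
        written.append(path)
    if "csv" in formats:
        path = basename + ".csv"
        # Records can have different keys, so use the union of all keys as header
        fieldnames = sorted({key for rec in records for key in rec})
        with open(path, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(records)  # missing keys become empty cells
        written.append(path)
    return written
```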

Note: While you are testing a query, you can limit the download to the first 25 results by changing the `doc_srch.execute(...)` call below to
`doc_srch.execute(client, get_all=False)`.
Set the boolean parameter back to `True` once you have confirmed that your query text is correct.
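Before switching back to `get_all=True`, it can help to estimate what the full download will cost against the weekly quota. The sketch below assumes that each page of `count` results is one request, and that the total number of hits is known from a first small search (the Scopus Search API reports it in the `opensearch:totalResults` field of its response). `requests_needed` is a hypothetical helper.

```python
import math

MAX_RESULTS = 5000  # ceiling of the Scopus Search API mentioned above


def requests_needed(total_results, count=25):
    """Rough number of paged requests a full download would make.

    Assumes one request per page of `count` results and that anything
    beyond MAX_RESULTS cannot be retrieved anyway.
    """
    retrievable = min(total_results, MAX_RESULTS)
    return math.ceil(retrievable / count)


print(requests_needed(1234))  # 50
```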

In [3]:
import json
from elsapy.elsclient import ElsClient
from elsapy.elssearch import ElsSearch
from pandas import json_normalize  # pandas >= 1.0; pandas.io.json.json_normalize is deprecated

import helperfuncs

with open("config.json") as con_file:
    config = json.load(con_file)
client = ElsClient(config['apikey'])
doc_srch = ElsSearch('&'.join(query), 'scopus')
doc_srch.execute(client, get_all=True)

# .json
json_file = filename + ".json"
with open(json_file, 'w') as f:
    json.dump(doc_srch.results, f)
print('Saved {0} records to file "{1}".'.format(len(doc_srch.results), json_file))

# Reload the saved JSON, flatten each record and build a DataFrame
with open(json_file, 'r') as local_file:
    results = json.load(local_file)
flat_results = [helperfuncs.flatten_json(record) for record in results]
df_flat_results = json_normalize(flat_results)

# .xlsx and .csv
xlsx_file = filename + ".xlsx"
df_flat_results.to_excel(xlsx_file)
print('Saved {0} records to file "{1}".'.format(len(results), xlsx_file))
csv_file = filename + ".csv"
df_flat_results.to_csv(csv_file)
print('Saved {0} records to file "{1}".'.format(len(results), csv_file))

Saved 25 records to file "my_results.json".
Saved 25 records to file "my_results.xlsx".
Saved 25 records to file "my_results.csv".
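The local module `helperfuncs` is not shown in this notebook. The column names in the preview (e.g. `affiliation_0_affiliation-city`) suggest that its `flatten_json` joins nested keys and list indices with `_`. The following is a sketch of such a function under that assumption, not the author's actual code.

```python
def flatten_json(record):
    """Flatten nested dicts/lists into a single level, joining keys and
    list indices with '_' (e.g. affiliation -> 0 -> affilname becomes
    'affiliation_0_affilname'). Sketch only; helperfuncs may differ."""
    flat = {}

    def _walk(value, prefix):
        if isinstance(value, dict):
            for key, item in value.items():
                _walk(item, prefix + key + "_")
        elif isinstance(value, list):
            for idx, item in enumerate(value):
                _walk(item, prefix + str(idx) + "_")
        else:
            # Drop the trailing '_' added by the innermost level
            flat[prefix[:-1]] = value

    _walk(record, "")
    return flat
```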


## Preview

This is a quick and ugly preview of the downloaded data, just to check whether it makes any sense at all.

In [4]:
df_flat_results.head(5)

| | @_fa | affiliation_0_@_fa | affiliation_0_affiliation-city | affiliation_0_affiliation-country | affiliation_0_affiliation-url | affiliation_0_affilname | affiliation_0_afid | affiliation_1_@_fa | affiliation_1_affiliation-city | affiliation_1_affiliation-country | ... | prism:issn | prism:issueIdentifier | prism:pageRange | prism:publicationName | prism:url | prism:volume | pubmed-id | source-id | subtype | subtypeDescription |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | True | True | Dresden | Germany | https://api.elsevier.com/content/affiliation/a... | Saxon State Laboratory of Health and Veterinar... | 112284194 | True | Jena | Germany | ... | 345288 | | 91-96 | Research in Veterinary Science | https://api.elsevier.com/content/abstract/scop... | 118.0 | | 18851 | ar | Article |
| 1 | True | True | Liege | Belgium | https://api.elsevier.com/content/affiliation/a... | Centre Hospitalier Universitaire de Liege | 60016829 | True | | Belgium | ... | 12019712 | | 50-54 | International Journal of Infectious Diseases | https://api.elsevier.com/content/abstract/scop... | 69.0 | | 22380 | ar | Article |
| 2 | True | True | Shanghai | China | https://api.elsevier.com/content/affiliation/a... | Tongji University | 60073652 | | | | ... | 17439191 | | 40-43 | International Journal of Surgery | https://api.elsevier.com/content/abstract/scop... | 52.0 | | 130156 | ar | Article |
| 3 | True | True | London | United Kingdom | https://api.elsevier.com/content/affiliation/a... | National Aids Trust | 60109025 | True | Nottingham | United Kingdom | ... | 9502688 | | 1-8 | Epidemiology and Infection | https://api.elsevier.com/content/abstract/scop... | | | 19686 | ip | Article in Press |
| 4 | True | True | Daegu | South Korea | https://api.elsevier.com/content/affiliation/a... | Kyungpook National University | 60012704 | True | Anyang | South Korea | ... | 494747 | | 1-6 | Tropical Animal Health and Production | https://api.elsevier.com/content/abstract/scop... | | | 18937 | ip | Article in Press |
