# Snapshots
Snapshots is one of the two main article delivery channels. It can be used as a one-time collection, or as an initial execution followed by further updates (as often as weekly).

## Notebook Initialization

In [70]:
import os
from IPython.display import Markdown as md
from dotenv import load_dotenv
import requests
import json
load_dotenv()
user_key = os.getenv("DJDNA_USERKEY")
api_server = os.getenv("DJDNA_APISERVER")
api_url = "https://{0}".format(api_server)
explain_api_path = "/alpha/extractions/documents/_explain"
analytics_api_path = "/alpha/analytics"
extract_api_path = "/alpha/extractions/documents"
api_headers = { "Content-Type": "application/json" , "user-key": user_key }
print('Initialization done!')

Initialization done!
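
If either environment variable is missing, `os.getenv` returns `None` and `api_url` silently becomes `https://None`, so the failure only shows up later as a request error. A minimal sketch of an early check follows. The helper name `require_env` and the demo values are hypothetical and not part of the original notebook; in the notebook itself the `env` argument would be omitted so that the real environment is read.

```python
import os

def require_env(names, env=None):
    """Return a dict with the requested settings, or raise if any is missing or empty.

    `env` defaults to os.environ; passing a plain dict makes the check easy to try out.
    """
    env = os.environ if env is None else env
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in names}

# Demo with made-up values (assumption: real values come from the .env file)
demo_env = {"DJDNA_USERKEY": "example-key", "DJDNA_APISERVER": "api.example.com"}
settings = require_env(["DJDNA_USERKEY", "DJDNA_APISERVER"], env=demo_env)
print("https://{0}".format(settings["DJDNA_APISERVER"]))
```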


## Notebook Variables
Ensure that the values of the following variables are set according to the provided credentials and the local environment.

In [71]:
snapshot_download_dir = "data-" + "cse-NewsCO"  # Keep the "data-" prefix, and update the suffix for each new snapshot collection

snapshot_query = { 'query': {
                        'where': 'publication_datetime >= "2019-01-01 00:00:00" AND publication_datetime < "2019-08-29 00:00:00" AND REGEXP_CONTAINS(UPPER(region_of_origin), r"(COL)")'
                     } }
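
When the date range or region changes between collections, the `where` clause can be assembled from parameters instead of being edited by hand. The sketch below is only an illustration: `build_snapshot_query` is a hypothetical helper, not part of the API or of the original notebook, and it reproduces the same three conditions used in `snapshot_query` above.

```python
def build_snapshot_query(start, end, region_code):
    """Return a request body shaped like snapshot_query.

    start and end are "YYYY-MM-DD HH:MM:SS" strings; end is exclusive.
    region_code is upper-cased to match UPPER(region_of_origin).
    """
    where = (
        'publication_datetime >= "{0}" AND publication_datetime < "{1}" '
        'AND REGEXP_CONTAINS(UPPER(region_of_origin), r"({2})")'
    ).format(start, end, region_code.upper())
    return {'query': {'where': where}}

# Rebuilds the query used in this notebook
example_query = build_snapshot_query("2019-01-01 00:00:00", "2019-08-29 00:00:00", "col")
print(example_query['query']['where'])
```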

## Explain
The Explain API returns a count of the articles matching a query, which gives an idea of the volume that the Extract (Snapshots) API would return. The query used in this section is exactly the same one that will be used for **Analytics** and **Extractions**.

### Create the Explain Job
Explain, like the other operations in the Snapshots category, works in two steps. First, a POST request with a query and an action creates the job; the response indicates whether the job was created successfully and includes a Job ID. Then, the Job ID is used as a parameter in a second request to obtain the operation output.

In [73]:
expl_req_url = api_url + explain_api_path
expl_resp = requests.post(expl_req_url, headers = api_headers, json = snapshot_query)
if expl_resp.status_code < 400:
    expl_id = expl_resp.json()['data']['id']
    print("Successful Explain Request\n{0}".format(json.dumps(expl_resp.json()['data'])))
else:
    print("Failed Explain Request with code {0} and message {1}".format(expl_resp.status_code, json.dumps(expl_resp.json()['errors'])))

Successful Explain Request
{"attributes": {"current_state": "JOB_CREATED"}, "id": "7d43507c-3e68-4018-91a2-9d8c342a5392", "type": "explain"}


### Check the Explain Job Results
This operation has to be repeated until the current_state field shows "JOB_STATE_DONE".

In [74]:
explstatus_req_url = api_url + extract_api_path + "/{0}".format(expl_id) + "/_explain"
explstatus_resp = requests.get(explstatus_req_url, headers = api_headers)

In [77]:
if explstatus_resp.json()['data']['attributes']['current_state'] == "JOB_STATE_DONE":
    print("Estimated article volume is {0}".format(explstatus_resp.json()['data']['attributes']['counts']))
else:
    print("Job is still running, please try again in a few seconds.")

Job is still running, please try again in a few seconds.
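
Rather than re-running the two cells above by hand, the status check can be wrapped in a small polling loop. This is a sketch under stated assumptions, not the API's prescribed method: `wait_for_job` is a hypothetical helper, and the 5-second interval and 60-attempt limit are arbitrary defaults. The state is fetched through a callable so the loop can be tried without network access; the demo replays only the state names that appear in this notebook.

```python
import time

def wait_for_job(fetch_state, done_state="JOB_STATE_DONE",
                 interval_s=5.0, max_attempts=60, sleep=time.sleep):
    """Call fetch_state() until it returns done_state.

    Returns True when the job is done, False if max_attempts is exhausted.
    """
    for attempt in range(max_attempts):
        if fetch_state() == done_state:
            return True
        if attempt < max_attempts - 1:
            sleep(interval_s)  # wait before asking again
    return False

# Against the API, fetch_state could look like this (uses the notebook's variables):
# def fetch_explain_state():
#     resp = requests.get(explstatus_req_url, headers=api_headers)
#     return resp.json()['data']['attributes']['current_state']

# Offline demo: replay a fixed sequence of states without sleeping
states = iter(["JOB_CREATED", "JOB_CREATED", "JOB_STATE_DONE"])
demo_done = wait_for_job(lambda: next(states), sleep=lambda seconds: None)
print("Job finished:", demo_done)
```

Capping the number of attempts keeps the notebook from hanging indefinitely if a job never reaches the done state.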
