# Snapshot Explain

This notebook shows how to run a Snapshot Explain operation with minimal steps and a simple query.

In this notebook...
* [Dependencies and Initialisation](#Dependencies-and-Initialisation)
* [The Where Statement](#The-Where-Statement)
* [Load a Saved Query](#Load-a-Saved-Query)
* [Save the query for other operations](#Save-the-query-for-other-operations)
* [Running the Explain Operation](#Running-the-Explain-Operation)
* [Getting Explain Samples](#Getting-Explain-Samples)

## Dependencies and Initialisation
Import the required classes and initialise the environment with the `dotenv` package. More details in the [Configuration notebook](0.2_configuration.ipynb).

In [1]:
from factiva.news import Snapshot
from dotenv import load_dotenv
load_dotenv()
print('Done!')

Done!


## The Where Statement

This notebook uses a simple query for illustration purposes. For more tips about queries, or guidance on how to build complex or large queries, check out the [Complex and Large Queries](2.1_complex_large_queries.ipynb) notebook.

In [2]:
# Industry i3432 is for Batteries
where_statement = (
    r" publication_datetime >= '2016-01-01 00:00:00' "
    r" AND LOWER(language_code) IN ('en', 'de', 'fr') "
    r" AND REGEXP_CONTAINS(industry_codes, r'(?i)(^|,)(i3432)($|,)') "
)

s = Snapshot(query=where_statement)
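
When the same query is reused with different dates, languages or industry codes, it can help to assemble the where statement from its parts. The following is a minimal sketch that uses only the Python standard library. The helpers `industry_clause` and `build_where` are hypothetical names introduced for illustration and are not part of the `factiva-news` package. The sketch assumes its inputs are trusted values, since it does no sanitisation beyond escaping the industry codes for the regular expression.

```python
import re

def industry_clause(codes):
    """Build a REGEXP_CONTAINS clause matching any of the given industry codes."""
    # Escape each code so it is treated literally inside the regular expression.
    alternation = "|".join(re.escape(code.lower()) for code in codes)
    return f"REGEXP_CONTAINS(industry_codes, r'(?i)(^|,)({alternation})($|,)')"

def build_where(start_date, languages, industry_codes):
    """Join the three conditions used in this notebook with AND."""
    language_list = ", ".join(f"'{lang.lower()}'" for lang in languages)
    conditions = [
        f"publication_datetime >= '{start_date} 00:00:00'",
        f"LOWER(language_code) IN ({language_list})",
        industry_clause(industry_codes),
    ]
    return " AND ".join(conditions)

# Same criteria as the cell above: batteries industry, three languages, since 2016.
where_from_parts = build_where("2016-01-01", ["en", "de", "fr"], ["i3432"])
print(where_from_parts)
```

The resulting string can be passed to `Snapshot(query=...)` in the same way as `where_statement`.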

## Load a Saved Query

Loads a query that was saved to a JSON file. **To be implemented!**

In [3]:
# Load saved query code

## Save the query for other operations

**To be implemented!**

In [4]:
# Save the query to a JSON file
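
Until saving and loading are available in the package, a where statement can be round-tripped through a JSON file with the standard library alone. The sketch below is an assumption about how this could be done, not the package's eventual implementation: the functions `save_query` and `load_query`, the `where` key and the file name are all hypothetical.

```python
import json
import tempfile
from pathlib import Path

def save_query(where, path):
    """Write the where statement to a JSON file under a 'where' key (assumed layout)."""
    Path(path).write_text(json.dumps({"where": where}, indent=2), encoding="utf-8")

def load_query(path):
    """Read the where statement back from a file written by save_query."""
    return json.loads(Path(path).read_text(encoding="utf-8"))["where"]

example_where = (
    r" publication_datetime >= '2016-01-01 00:00:00' "
    r" AND LOWER(language_code) IN ('en', 'de', 'fr') "
    r" AND REGEXP_CONTAINS(industry_codes, r'(?i)(^|,)(i3432)($|,)') "
)

# A temporary directory keeps the example free of side effects.
with tempfile.TemporaryDirectory() as tmp_dir:
    query_file = Path(tmp_dir) / "batteries_query.json"
    save_query(example_where, query_file)
    restored_where = load_query(query_file)

print(restored_where == example_where)
```

JSON escaping preserves the quotes and regular expression characters, so the restored string can be passed to `Snapshot(query=...)` unchanged.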

## Running the Explain Operation

This operation returns the number of documents matching the provided query in the Factiva Analytics archive.

The goal of this operation is to get a rough idea of the document volume. When used iteratively, it helps refine the query: add, delete or modify a criterion, then run Explain again to see the impact on the number of matching documents.


The `<Snapshot>.process_explain()` method runs the whole operation and stores the resulting value in `last_explain_job.document_volume`. If a more manual process is required (send job, monitor job, get results), please see the [detailed package documentation](https://factiva-news-python.readthedocs.io/).

In [5]:
%%time
s.process_explain()
print(f'Estimated document volume: {s.last_explain_job.document_volume}')

Estimated document volume: 165633
CPU times: user 116 ms, sys: 10.8 ms, total: 127 ms
Wall time: 38.3 s
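
The iterative use described above can be sketched as a small loop over candidate where statements. This is an illustrative sketch only: `explain_volume` and `compare_volumes` are hypothetical helpers, and they rely solely on the calls shown in this notebook (`Snapshot(query=...)`, `process_explain()` and `last_explain_job.document_volume`). The example passes a stand-in function so that it runs without credentials. With the default, each candidate triggers a real Explain job, which took around 38 seconds in the run above.

```python
def explain_volume(where):
    """Run an Explain for one where statement and return the document volume."""
    # Imported lazily so compare_volumes can also be used with a stand-in function.
    from factiva.news import Snapshot
    snapshot = Snapshot(query=where)
    snapshot.process_explain()
    return snapshot.last_explain_job.document_volume

def compare_volumes(candidates, explain=explain_volume):
    """Return {label: volume} for each candidate where statement."""
    return {label: explain(where) for label, where in candidates.items()}

base = "publication_datetime >= '2016-01-01 00:00:00'"
candidates = {
    "english_only": base + " AND LOWER(language_code) = 'en'",
    "three_languages": base + " AND LOWER(language_code) IN ('en', 'de', 'fr')",
}

# Stand-in that returns the statement length instead of calling the service.
# Drop the `explain` argument to run real Explain jobs.
dry_run = compare_volumes(candidates, explain=len)
print(dry_run)
```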


## Getting Explain Samples
In December 2021, the product team released a new endpoint that returns a set of article metadata samples matching the Explain criteria. The only requirement is the ID of a previously run Explain job.

This operation is not yet implemented in the factiva-news package but, as a quick reference, the following code retrieves those samples.

In [6]:
from factiva.core import req
import pandas as pd

samples_url = f'https://api.dowjones.com/alpha/extractions/samples/{s.last_explain_job.job_id}'
s_param = {'num_samples': 100}  # Number of samples to retrieve. The endpoint accepts a maximum value of 100.
s_head = {'user-key': s.user_key.key}

resp = req.api_send_request(endpoint_url=samples_url, headers=s_head, qs_params=s_param)
if resp.status_code == 200:
    resp_json = resp.json()['data']['attributes']['sample']
    samples = pd.DataFrame(resp_json)
    print(f'DataFrame size: {samples.shape}')
    print(f'Columns: {samples.columns}')
else:
    print(f'Unexpected Response: {resp.text}')

DataFrame size: (100, 16)
Columns: Index(['an', 'company_codes', 'company_codes_about', 'company_codes_occur',
       'industry_codes', 'ingestion_datetime', 'modification_datetime',
       'publication_datetime', 'publisher_name', 'region_codes',
       'region_of_origin', 'source_code', 'source_name', 'subject_codes',
       'title', 'word_count'],
      dtype='object')


In [7]:
samples[['an', 'source_name', 'title', 'word_count']]

|  | an | source_name | title | word_count |
|---|---|---|---|---|
| 0 | NYTF000020180927ee9r0003p | The New York Times | New Battery May Hold Promise To Create a Carbo... | 1549 |
| 1 | SMHHOL0020180904ee940008e | The Sydney Morning Herald - Online | Syrah in trading halt, unveils $94 million cap... | 645 |
| 2 | ETFEXP0020180914ee9d00004 | ETF Express | ETF Securities Australia launches battery tech... | 314 |
| 3 | ELEKTW0020180924ee8m00005 | Elektronikpraxis Online | Lithium-Ionen-Batterien unter der Lupe | 1327 |
| 4 | ENGLIND020180924ee9m00003 | Englewood Independent | Northmont area police reports | 1435 |
| ... | ... | ... | ... | ... |
| 95 | FNDW000020180816ee8f001xi | CQ FD Disclosure | Q3 2018 Electrovaya Inc Earnings Call - Final | 2961 |
| 96 | UNNIND0020180906ee96000xk | UNI (United News of India) | IIT-H develops 'new electrodes' for producing ... | 493 |
| 97 | MMAC000020180915ee910000n | Modern Machine Shop | Amada Miyachi America Expands Detroit Technica... | 200 |
| 98 | SCHGAZ0020180930ee9u0002v | The Daily Gazette | A deep Reservoir? | 1153 |
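
A quick summary of the samples can show whether the query returns the expected mix of sources and article lengths. The sketch below uses only the standard library and a few rows copied from the output above. The `summarise` helper is a hypothetical name, and the `int()` conversion is a precaution in case the endpoint returns `word_count` as text. With the real data, the records could be obtained with `samples.to_dict('records')`.

```python
from collections import Counter
from statistics import mean

# Rows copied from the sample output; titles are omitted for brevity.
sample_records = [
    {"an": "NYTF000020180927ee9r0003p", "source_name": "The New York Times", "word_count": 1549},
    {"an": "SMHHOL0020180904ee940008e", "source_name": "The Sydney Morning Herald - Online", "word_count": 645},
    {"an": "ETFEXP0020180914ee9d00004", "source_name": "ETF Express", "word_count": 314},
    {"an": "ELEKTW0020180924ee8m00005", "source_name": "Elektronikpraxis Online", "word_count": 1327},
]

def summarise(records):
    """Count documents and distinct sources, and average the word counts."""
    sources = Counter(record["source_name"] for record in records)
    word_counts = [int(record["word_count"]) for record in records]
    return {
        "documents": len(records),
        "sources": len(sources),
        "mean_word_count": mean(word_counts),
    }

summary = summarise(sample_records)
print(summary)
```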
