# Snapshot Update

This notebook shows how to run a Snapshot Extraction Update operation in the minimum number of steps.

In this notebook...
* [Dependencies and Initialisation](#Dependencies-and-Initialisation)
* [Snapshot instance](#Snapshot-Instance)
* [Running the Update Operation](#Running-the-Update-Operation)
* [Load the downloaded AVRO files to a Pandas DataFrame](#Load-the-downloaded-AVRO-files-to-a-Pandas-DataFrame)

## Dependencies and Initialisation
Import statements and environment initialisation using the `dotenv` package. See the [Configuration notebook](0.2_configuration.ipynb) for more details.

In [1]:
from factiva.news import Snapshot
from dotenv import load_dotenv
load_dotenv()
print('Done!')

Done!
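`load_dotenv()` copies the key/value pairs from a local `.env` file into the process environment. A minimal sketch of a fail-fast check that the credential is actually present before any API call is made. It assumes the user key is read from a variable named `FACTIVA_USERKEY`; confirm the exact name in the Configuration notebook. `require_env` is a hypothetical helper, not part of the package.

```python
import os

# Assumption: the factiva packages read the user key from this variable.
USER_KEY_VAR = "FACTIVA_USERKEY"


def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail with a clear message."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name!r} is not set; check your .env file.")
    return value


# Example usage, after load_dotenv() has run:
# require_env(USER_KEY_VAR)
```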


## Snapshot Instance

Creates a Snapshot object based on a short Snapshot ID:

* **`snapshot_id`**: _Optional_. Short ID of the Snapshot to update, e.g. `'ztj2gkbldt'`. Although optional when creating a `Snapshot` object, it is required for update jobs.

In [2]:
s = Snapshot(snapshot_id='ztj2gkbldt')

## Running the Update Operation

This operation builds a collection of files containing the delta since the previously executed operation (either a Snapshot or an Update).

The `<Snapshot>.process_update()` method submits the job, monitors it and downloads the content in a single call. If you need a more manual process (submit the job, monitor it, get the results), see the [detailed package documentation](https://factiva-news-python.readthedocs.io/).
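For reference, the submit / monitor / download sequence that `process_update()` performs can be sketched generically as a polling loop with a timeout. `submit`, `get_state` and `download` are hypothetical placeholders for whatever calls the package actually exposes, and the state strings `JOB_STATE_DONE` / `JOB_STATE_FAILED` are assumptions; check the package documentation for the real names.

```python
import time
from typing import Callable


def run_job(
    submit: Callable[[], str],
    get_state: Callable[[str], str],
    download: Callable[[str], None],
    poll_seconds: float = 10.0,
    timeout_seconds: float = 3600.0,
) -> str:
    """Submit a job, poll until it finishes, then download the results.

    All three callables are hypothetical stand-ins, not factiva-news functions.
    """
    job_id = submit()
    deadline = time.monotonic() + timeout_seconds
    while True:
        state = get_state(job_id)
        if state == "JOB_STATE_DONE":  # assumed terminal success state
            break
        if state == "JOB_STATE_FAILED":  # assumed terminal failure state
            raise RuntimeError(f"Job {job_id} failed")
        if time.monotonic() > deadline:
            raise TimeoutError(f"Job {job_id} did not finish within {timeout_seconds} s")
        time.sleep(poll_seconds)
    # Only download once the job has reached its success state.
    download(job_id)
    return job_id
```

Polling with a deadline keeps a stuck job from blocking the notebook forever; the one-call `process_update()` is simpler when that level of control is not needed.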

* **`update_type`**: _Optional_. Default: `additions`. Possible values are: `additions`, `replacements` or `deletes` 
* **`download_path`**: _Optional_. Location where the content is downloaded. If not provided, a new folder named after the `update_id` is created in the same folder as this notebook.
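A small sketch that validates the parameters listed above before they are passed to `process_update()`, so a typo such as `'delete'` is caught locally. `update_kwargs` is a hypothetical convenience helper, not part of `factiva-news`.

```python
from typing import Optional

# The three update types accepted by process_update(), as listed above.
VALID_UPDATE_TYPES = ("additions", "replacements", "deletes")


def update_kwargs(update_type: str = "additions", download_path: Optional[str] = None) -> dict:
    """Build validated keyword arguments for process_update()."""
    if update_type not in VALID_UPDATE_TYPES:
        raise ValueError(f"update_type must be one of {VALID_UPDATE_TYPES}, got {update_type!r}")
    kwargs = {"update_type": update_type}
    # Only pass download_path when given, so the package default still applies.
    if download_path is not None:
        kwargs["download_path"] = download_path
    return kwargs
```

Usage: `s.process_update(**update_kwargs('deletes'))`.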

> **Note:** This job is likely to fail if the same combination of `snapshot_id` and `update_type` is sent **within 24 hours**. This update mechanism is designed for low-frequency scenarios. If a higher frequency is needed (hourly or near-real-time), please consider using the Streams service.
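Because resubmitting the same `snapshot_id` / `update_type` pair within 24 hours is likely to fail, a client-side guard can stop the call before it is sent. A minimal sketch; `UpdateThrottle` is hypothetical and only remembers submissions made in the current Python session. It does not query the Snapshot History.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional, Tuple

MIN_INTERVAL = timedelta(hours=24)


class UpdateThrottle:
    """Remember when each (snapshot_id, update_type) pair was last submitted
    and refuse a resubmission within MIN_INTERVAL."""

    def __init__(self) -> None:
        self._last: Dict[Tuple[str, str], datetime] = {}

    def can_submit(self, snapshot_id: str, update_type: str, now: Optional[datetime] = None) -> bool:
        now = now or datetime.now(timezone.utc)
        last = self._last.get((snapshot_id, update_type))
        return last is None or now - last >= MIN_INTERVAL

    def record(self, snapshot_id: str, update_type: str, now: Optional[datetime] = None) -> None:
        self._last[(snapshot_id, update_type)] = now or datetime.now(timezone.utc)
```

Usage: call `can_submit(...)` before `s.process_update(...)` and `record(...)` right after a successful submission.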

To review the **Snapshot History**, please see the notebook [User Statistics](1.1_user_statistics.ipynb).

In [4]:
%%time
s.process_update(update_type='deletes')
print('Done!')

Done!
CPU times: user 340 ms, sys: 42.4 ms, total: 382 ms
Wall time: 1min 59s


## Load the downloaded AVRO files to a Pandas DataFrame
Results are stored in a folder named after the job ID (`<Snapshot>.last_update_job.job_id`). The `SnapshotFiles` helper class loads the folder's contents into a DataFrame.

In [6]:
from factiva.news import SnapshotFiles
sf = SnapshotFiles()
articles = sf.read_folder(s.last_update_job.job_id)
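Before loading, it can help to check that the job folder exists and actually contains AVRO files. A minimal sketch using only `pathlib`; `list_avro_files` is a hypothetical helper and assumes the downloaded files use the `.avro` extension.

```python
from pathlib import Path
from typing import List


def list_avro_files(job_folder: str) -> List[Path]:
    """Return the .avro files in a downloaded job folder, sorted by name."""
    folder = Path(job_folder)
    if not folder.is_dir():
        raise FileNotFoundError(f"No folder named {job_folder!r}; was the update downloaded here?")
    return sorted(folder.glob("*.avro"))
```

Usage: `list_avro_files(s.last_update_job.job_id)`, which resolves the folder relative to the notebook's working directory, matching the default download location.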

In [7]:
articles.columns

Index(['copyright', 'subject_codes', 'modification_datetime', 'body',
       'company_codes_occur_ticker_exchange', 'company_codes_occur',
       'company_codes_about', 'company_codes_lineage',
       'company_codes_ticker_exchange', 'snippet',
       'company_codes_relevance_ticker_exchange', 'market_index_codes',
       'section', 'company_codes_association_ticker_exchange',
       'currency_codes', 'company_codes_about_ticker_exchange',
       'region_of_origin', 'company_codes_lineage_ticker_exchange',
       'ingestion_datetime', 'modification_date', 'source_name',
       'language_code', 'region_codes', 'company_codes_association',
       'person_codes', 'byline', 'dateline', 'company_codes_relevance',
       'source_code', 'an', 'word_count', 'company_codes', 'industry_codes',
       'title', 'publication_datetime', 'publisher_name', 'action'],
      dtype='object')

In [8]:
articles[['an', 'publication_datetime', 'title', 'industry_codes', 'language_code']].head()

| | an | publication_datetime | title | industry_codes | language_code |
|---|---|---|---|---|---|
| 0 | OFBOAR0020210118eh1f00060 | 2021-01-15 | GS Yuasa - The organizational chart displays i... | | en |
| 1 | OFBOAR0020210125eh1k0010g | 2021-01-20 | Exide Industries - The organizational chart di... | | en |
| 2 | OFBOAR0020210208eh21002mk | 2021-02-01 | Sion Power - The organizational chart displays... | | en |
| 3 | OFBOAR0020201221egcg001ze | 2020-12-16 | NEC Energy Solutions - The organizational char... | | en |
| 4 | JKPOST0020210729eh7t00004 | 2021-07-29 | Indonesian government inks MoU with Korean fir... | ,i3432,i35104,i351,iaut,iindele,iindstrls,itec... | en |
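As row 4 shows, `industry_codes` is stored as a single comma-delimited string with a leading comma, and empty for some articles. A minimal sketch that turns such a field into a Python list; `split_codes` is a hypothetical helper, and it assumes the other `*_codes` columns use the same delimiter.

```python
from typing import List, Optional


def split_codes(value: Optional[str]) -> List[str]:
    """Split a comma-delimited code field such as ',i3432,i35104,' into a list.

    Missing values (None or NaN, which pandas stores as a float) and empty
    segments from leading or trailing commas are dropped.
    """
    if not isinstance(value, str):
        return []
    return [code.strip() for code in value.split(",") if code.strip()]
```

Usage: `articles['industry_codes'].apply(split_codes)` produces a column of lists that can be exploded or counted per code.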
