# Snapshot Download

This notebook shows how to download the files for an existing Snapshot.

In this notebook...
* [Dependencies and Initialisation](#Dependencies-and-Initialisation)
* [Snapshot Download](#Snapshot-Download)
* [Load the downloaded AVRO files to a Pandas DataFrame](#Load-the-downloaded-AVRO-files-to-a-Pandas-DataFrame)


## Dependencies and Initialisation
Import statements and environment initialisation using the `dotenv` package. More details are available in the [Configuration notebook](0.2_configuration.ipynb).
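The sketch below is a simplified, stdlib-only illustration of what `load_dotenv()` does in the cell that follows. It reads `KEY=VALUE` lines from a `.env` file into `os.environ` and skips variables that are already set, which matches `load_dotenv()`'s default `override=False`. It is not the `python-dotenv` implementation, which supports more syntax (for example richer quoting rules and variable expansion). `load_env_file` and any variable names used with it are hypothetical; the variables `factiva.analytics` actually expects are described in the Configuration notebook.

```python
import os


def load_env_file(path: str = ".env", override: bool = False) -> dict:
    """Read KEY=VALUE lines from `path` into os.environ (simplified sketch).

    Returns every parsed pair, whether or not it was written to os.environ.
    """
    parsed = {}
    with open(path, encoding="utf-8") as fh:
        for raw in fh:
            line = raw.strip()
            # Skip blank lines, comments and lines without an assignment
            if not line or line.startswith("#") or "=" not in line:
                continue
            # Split on the first '=' only, so values may contain '='
            key, _, value = line.partition("=")
            key = key.strip()
            # Drop optional surrounding quotes around the value
            value = value.strip().strip('"').strip("'")
            # Like load_dotenv()'s default, keep variables that are already set
            if override or key not in os.environ:
                os.environ[key] = value
            parsed[key] = value
    return parsed
```

Returning the parsed pairs makes it easy to check which keys the file defined without printing secret values from `os.environ`.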

In [1]:
from factiva.analytics import SnapshotExtraction
from dotenv import load_dotenv
load_dotenv()
print('Done!')

Done!


## Snapshot Download

This operation requires the ID of an existing Snapshot. See the [User Statistics notebook](1.1_user_statistics.ipynb) to list the available Snapshots.

* **`download_path`**: _Optional_. Location where the content is downloaded. If not provided, a new folder named after the `snapshot_id` (short ID) is created in the same folder as this notebook.
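The documented default for `download_path` can be sketched as follows. `resolve_download_path` is a hypothetical helper, not part of `factiva.analytics`; it only mirrors the rule described above. The notebook's own folder is approximated by the current working directory, which is what a Jupyter kernel typically starts in (an assumption).

```python
from pathlib import Path
from typing import Optional


def resolve_download_path(short_id: str, download_path: Optional[str] = None,
                          base_dir: Optional[str] = None) -> Path:
    """Hypothetical helper mirroring the documented default for download_path."""
    if download_path:
        # An explicit location wins
        target = Path(download_path)
    else:
        # Default: a folder named after the Snapshot short ID, next to the
        # notebook (approximated here by base_dir or the working directory)
        target = Path(base_dir or Path.cwd()) / short_id
    # Create the folder (and any missing parents) if it does not exist yet
    target.mkdir(parents=True, exist_ok=True)
    return target
```

With the Snapshot used below, `resolve_download_path('xfy5fitmqc')` would point at a `xfy5fitmqc` folder in the working directory.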

In [2]:
sid = 'xfy5fitmqc'
s = SnapshotExtraction(sid)

In [3]:
print(s)

<'factiva.analytics.SnapshotExtraction'>
  ├─user_key: <'factiva.analytics.UserKey'>
  │  ├─key: ****************************CBZH
  │  └─cloud_token: **********************vuMm0G1dLQZl
  ├─query: <NotRetrieved>
  └─job_response: <'factiva.analytics.SnapshotExtractionJobReponse'>
     ├─job_id: ******************************************************xfy5fitmqc
     ├─job_link: https://api.dowjones...xfy5fitmqc
     ├─job_state: JOB_STATE_DONE
     ├─short_id: xfy5fitmqc
     ├─files: <list> - [171] elements
     └─errors: <NoErrors>


In [7]:
s.download_files()
print('Done!')

Done!


## Load the downloaded AVRO files to a Pandas DataFrame
Results are stored in a folder named after the Snapshot short ID (`<SnapshotExtraction>.job_response.short_id`). The `SnapshotFiles` class provides `read_avro_folder()` to load the folder's contents into a DataFrame.
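Before loading, it can help to confirm that the download produced files in the expected folder. This stdlib-only sketch assumes the downloaded files carry an `.avro` extension and sit directly inside the folder rather than in subfolders; `list_avro_files` is a hypothetical helper, not part of `factiva.analytics`.

```python
from pathlib import Path
from typing import List


def list_avro_files(folder: str) -> List[Path]:
    """Return the .avro files directly inside `folder`, sorted by name."""
    path = Path(folder)
    if not path.is_dir():
        # Fail early with a clear message instead of loading an empty result
        raise FileNotFoundError(f"No download folder found at {path.resolve()}")
    return sorted(p for p in path.iterdir()
                  if p.is_file() and p.suffix.lower() == ".avro")
```

In this notebook, `len(list_avro_files(sid))` could then be compared with `len(s.job_response.files)`, which the Snapshot summary above reports as a list of 171 elements, assuming each listed file maps to one downloaded file.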

In [4]:
print(s.job_response.short_id)

xfy5fitmqc


In [5]:
from factiva.analytics import SnapshotFiles
sf = SnapshotFiles()
# Read every AVRO file in the folder named after the Snapshot short ID
articles = sf.read_avro_folder(sid)

In [6]:
articles.columns

Index(['copyright', 'subject_codes', 'modification_datetime', 'body',
       'company_codes_occur_ticker_exchange', 'company_codes_occur',
       'company_codes_about', 'company_codes_lineage',
       'company_codes_ticker_exchange', 'snippet',
       'company_codes_relevance_ticker_exchange', 'market_index_codes',
       'section', 'company_codes_association_ticker_exchange',
       'currency_codes', 'company_codes_about_ticker_exchange',
       'region_of_origin', 'company_codes_lineage_ticker_exchange',
       'ingestion_datetime', 'availability_datetime', 'modification_date',
       'source_name', 'language_code', 'region_codes',
       'company_codes_association', 'person_codes', 'byline', 'dateline',
       'company_codes_relevance', 'source_code', 'an', 'word_count',
       'company_codes', 'industry_codes', 'title', 'publication_datetime',
       'publisher_name', 'action'],
      dtype='object')

In [30]:
# Keep only articles tagged with at least one company code
articles.loc[articles['company_codes'] != '', ['an', 'publication_datetime', 'title', 'company_codes', 'language_code']]

| | an | publication_datetime | title | company_codes | language_code |
|---|---|---|---|---|---|
| 8 | GLOBO00020111127e7br000eq | 2011-11-27 00:00:00 | Mais pontos nos próximos cinco meses | ,petbrs, | pt |
| 13 | ELESPT0020141013ea9c000um | 2014-09-12 00:00:00 | Líderes británicos hacen llamado vehemente al ... | ,uknhs, | es |
| 14 | VALOFUR020140605ea65008y9 | 2014-06-05 23:18:00 | Terpel reprograma para 19 de junio asamblea de... | ,orgats,orgats, | es |
| 15 | VALOFUR020140326ea3q003bh | 2014-03-26 16:52:00 | Asamblea ETB aprueba dividendo 2013 de COP 17,... | ,etbco,etbco, | es |
| 17 | VALOFUR020141107eab7003bh | 2014-11-07 16:54:00 | BTG Pactual recorta PO de Petrobras desde USD ... | ,banpac,banpac,petbrs,petbrs,btgbrz, | es |
| ... | ... | ... | ... | ... | ... |
| 17152 | ELFINA0020190117ef1g0008l | 2019-01-16 00:00:00 | Esta firma mexicana 'en apuros' analiza partic... | ,pemeks,pemeks, | es |
| 17153 | ELFINA0020190627ef6r000ux | 2019-06-27 00:00:00 | Él quiere limpiar Pemex; los inversionistas, m... | ,dunbst,fisi,flurcp,kmbcl,pemeks,twnit,pemeks, | es |
| 17154 | ELFINA0020201207egc7000s2 | 2020-12-07 00:00:00 | Biden era lo que los vehículos eléctricos nece... | ,atievi,bmw,frdmo,gnmoc,jfoskv,vlkwag,gnmoc,te... | es |
| 17155 | ELFINA0020201013egad000p3 | 2020-10-13 00:00:00 | Oro, bitcoin y whisky: las apuestas favoritas ... | ,fed,fed, | es |
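The `company_codes` values shown above are comma-delimited strings with leading and trailing commas, and some repeat a code (for example `,banpac,banpac,petbrs,petbrs,btgbrz,`). A small helper can turn each value into a list of unique codes. This is a minimal sketch that assumes the format seen in these rows holds for the whole column; `parse_codes` is hypothetical and not part of `factiva.analytics`.

```python
from typing import List


def parse_codes(raw: str) -> List[str]:
    """Split a field such as ',banpac,banpac,petbrs,' into unique codes,
    keeping the order in which each code first appears."""
    unique: List[str] = []
    for code in raw.split(","):
        code = code.strip()
        # Empty pieces come from the leading/trailing commas
        if code and code not in unique:
            unique.append(code)
    return unique
```

Applied to the DataFrame, `articles['company_codes'].map(parse_codes)` gives one list per article, and chaining `.explode().value_counts()` counts how many articles mention each company code (empty lists become `NaN` after `explode()` and are dropped by `value_counts()`).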
