# Streams - Consuming data via REST API

Where environment restrictions or IT policies rule out the default streaming protocols, this notebook shows an alternative way to consume content from Streams: pulling messages from Google Cloud Pub/Sub subscriptions through a conventional REST API.

The notebook walks through authenticating against Google Cloud, pulling messages from the subscription, processing them and acknowledging them.

In this notebook...
* [Dependencies and Initialisation](#dependencies-and-initialisation)
* [Authentication](#authentication)
* [Pulling and Processing Messages](#pulling-and-processing-messages)
* [Next Steps](#next-steps)

## Dependencies and Initialisation

This notebook expects `FACTIVA_USERKEY` and `FACTIVA_SUBSCRIPTIONID` to be set in the `.env` file that `dotenv.load_dotenv()` reads. More details are in the [Configuration notebook](0.2_configuration.ipynb).

This notebook also needs the following packages; `pyjwt[crypto]` adds the `cryptography` backend that PyJWT requires for RS256 signing:

```
pip install PyJWT
pip install "pyjwt[crypto]"
```

In [1]:
import json
import os
import requests
import datetime
import jwt          # pip install PyJWT
import base64

from dotenv import load_dotenv
load_dotenv()

STREAM_CRED_URL = 'https://api.dowjones.com/sns-accounts/streaming-credentials'

USERKEY = os.environ['FACTIVA_USERKEY']
REQ_DEFAULT_HEADERS = {
    'user-key': USERKEY,
    'content-type': "application/json",
    'cache-control': "no-cache",
    'X-API-VERSION': "3.0"
}
SUBSCRIPTIONID = os.environ['FACTIVA_SUBSCRIPTIONID']
AUTHZ_URL = 'https://oauth2.googleapis.com/token'
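
If either variable is missing, the cell above stops with a bare `KeyError`. A minimal sketch of an earlier and more explicit check follows; `require_env` is a hypothetical helper written for this illustration, not part of any Factiva package, and the values passed in are placeholders.

```python
import os


def require_env(names, environ=os.environ):
    """Return the requested variables, failing with one clear message if any is missing."""
    missing = [name for name in names if not environ.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: environ[name] for name in names}


# Example with an explicit mapping instead of the real environment.
settings = require_env(
    ['FACTIVA_USERKEY', 'FACTIVA_SUBSCRIPTIONID'],
    environ={'FACTIVA_USERKEY': 'demo-key', 'FACTIVA_SUBSCRIPTIONID': 'demo-subscription'},
)
print(settings['FACTIVA_SUBSCRIPTIONID'])
```

In the notebook itself the call would omit `environ` so that the values loaded by `load_dotenv()` are checked.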

## Authentication

Loads the streaming credentials from the API and extracts details needed for Google Cloud Pub/Sub requests.

In [2]:
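resp = requests.get(STREAM_CRED_URL, headers=REQ_DEFAULT_HEADERS)
resp.raise_for_status()  # Fail early on an invalid user key or an unavailable service
# The credentials arrive as a JSON string nested inside the JSON response, hence the second decode
streaming_credentials = json.loads(resp.json()['data']['attributes']['streaming_credentials'])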

private_key_id = streaming_credentials['private_key_id']
private_key = streaming_credentials['private_key']
client_email = streaming_credentials['client_email']
project_id = streaming_credentials['project_id']
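
The `streaming_credentials` attribute is a JSON string nested inside the JSON response, which is why the cell above decodes twice. The sketch below shows that double decode on a synthetic response body and checks for the four fields this notebook reads. `parse_streaming_credentials` is a hypothetical helper, and every value in `sample_body` is invented.

```python
import json

REQUIRED_FIELDS = ('private_key_id', 'private_key', 'client_email', 'project_id')


def parse_streaming_credentials(response_body: dict) -> dict:
    """Decode the nested JSON string and check the fields this notebook relies on."""
    raw = response_body['data']['attributes']['streaming_credentials']
    credentials = json.loads(raw)
    missing = [field for field in REQUIRED_FIELDS if field not in credentials]
    if missing:
        raise ValueError(f"Streaming credentials lack fields: {missing}")
    return credentials


# Synthetic response body; every value is a placeholder.
sample_body = {
    'data': {'attributes': {'streaming_credentials': json.dumps({
        'private_key_id': 'abc123',
        'private_key': 'PLACEHOLDER-PRIVATE-KEY',
        'client_email': 'reader@example-project.iam.gserviceaccount.com',
        'project_id': 'example-project',
    })}}
}
print(parse_streaming_credentials(sample_body)['project_id'])
```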

Using the streaming credentials, builds a signed JWT that serves as the assertion in the token request.

In [3]:
iat_dt = datetime.datetime.now()
iat = int(iat_dt.timestamp())
exp_dt = iat_dt + datetime.timedelta(seconds=3600)
exp = int(exp_dt.timestamp())

payload = {
    'iss': client_email,
    'scope': "https://www.googleapis.com/auth/cloud-platform https://www.googleapis.com/auth/pubsub",
    'aud': "https://oauth2.googleapis.com/token",
    'iat': iat,
    'exp': exp
}

additional_headers = {
    'kid': private_key_id
}

# Sign the claims with the service account's private key (RS256 requires the `cryptography` package)
authn_token = jwt.encode(payload, private_key, headers=additional_headers, algorithm="RS256")
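
The `iat` and `exp` claims are Unix timestamps in seconds, and Google's token endpoint accepts an assertion lifetime of at most one hour. A minimal sketch of building the same claim set from an explicit, timezone-aware clock, which makes the values easy to verify, could look like this. `build_claims` is a hypothetical helper and the email address is a placeholder.

```python
import datetime

TOKEN_AUDIENCE = 'https://oauth2.googleapis.com/token'
SCOPES = 'https://www.googleapis.com/auth/cloud-platform https://www.googleapis.com/auth/pubsub'


def build_claims(client_email, now=None, lifetime_seconds=3600):
    """Build the JWT claim set for the token request, with iat/exp as Unix seconds."""
    if not 0 < lifetime_seconds <= 3600:
        raise ValueError("lifetime_seconds must be between 1 and 3600")
    now = now or datetime.datetime.now(datetime.timezone.utc)
    issued_at = int(now.timestamp())
    return {
        'iss': client_email,
        'scope': SCOPES,
        'aud': TOKEN_AUDIENCE,
        'iat': issued_at,
        'exp': issued_at + lifetime_seconds,
    }


fixed_now = datetime.datetime(2025, 6, 17, 13, 0, 0, tzinfo=datetime.timezone.utc)
claims = build_claims('reader@example-project.iam.gserviceaccount.com', now=fixed_now)
print(claims['exp'] - claims['iat'])
```

The resulting dictionary has the same keys as `payload` and could be passed to `jwt.encode` in its place.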

Exchanges the signed JWT at Google's OAuth 2.0 token endpoint for a token response (`jwt_token`), whose `access_token` field is the bearer token used in the Pub/Sub requests.

In [4]:
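authz_payload = {
    'grant_type': 'urn:ietf:params:oauth:grant-type:jwt-bearer',
    'assertion': authn_token
}

resp = requests.post(AUTHZ_URL, data=authz_payload)
resp.raise_for_status()  # Fail early if the assertion is rejected
jwt_token = resp.json()  # Dict with 'access_token', 'expires_in' and 'token_type'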
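
The token response includes an `expires_in` field (in seconds), so a long-running consumer has to request a new token before the current one lapses. The following is a small sketch of that check, not part of the notebook's flow. `token_needs_refresh`, the safety margin and the sample values are assumptions made for illustration.

```python
import time


def token_needs_refresh(token_response, obtained_at, now=None, margin_seconds=60):
    """Return True when the access token is within margin_seconds of expiring."""
    now = time.time() if now is None else now
    expires_at = obtained_at + int(token_response.get('expires_in', 0))
    return now >= expires_at - margin_seconds


# Placeholder token response obtained at t=1000 seconds.
sample_token = {'access_token': 'placeholder', 'expires_in': 3599, 'token_type': 'Bearer'}
print(token_needs_refresh(sample_token, obtained_at=1000.0, now=1100.0))  # False
print(token_needs_refresh(sample_token, obtained_at=1000.0, now=4590.0))  # True
```

A consumer would record `time.time()` when the token is obtained and repeat the two authentication steps whenever the check returns `True`.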

## Pulling and Processing Messages

Constants for the message consumption section.

In [5]:
PUBSUB_HEADERS = { 'Authorization': f"Bearer {jwt_token['access_token']}" }
PULL_BODY = { 'maxMessages': 10 }
PULL_URL = f"https://pubsub.googleapis.com/v1/projects/{project_id}/subscriptions/{SUBSCRIPTIONID}:pull"
ACK_URL = f"https://pubsub.googleapis.com/v1/projects/{project_id}/subscriptions/{SUBSCRIPTIONID}:acknowledge"
INFO_URL = f"https://pubsub.googleapis.com/v1/projects/{project_id}/subscriptions/{SUBSCRIPTIONID}"

Defines a custom message-handling function and a function that acknowledges a list of messages.

`process_message`: Processes a Factiva message, which can have the structure of an article message, a bulk event or another type of event. See the [official documentation](https://developer.dowjones.com/documents/site-docs-factiva_apis-factiva_analytics_apis-factiva_streams_api#events).
- Article-Specific Events (possible values for the `action` property):
    - **ADD**: The message is delivering the first version of an article. At the application’s database level, this action is equivalent to an INSERT or UPSERT (an INSERT after checking the article is not in the database in case of retransmitted messages).
    - **REP**: Short for replace. The message is delivering an updated version of an article. At the database level, this action is equivalent to an UPSERT or an UPDATE, depending on whether your application’s database has a previous version of the article; if not, process the message as an add. Identify whether the article exists using its ID in the `an` attribute. Some use cases might require storing the article update as an additional version with timestamps, so that point-in-time logic can determine the latest version to display.
    - **DEL**: Short for delete. The message is requesting to delete an article based on its ID (`an`). Your application can ignore the message if the article doesn’t exist in the database. Depending on the use case, this action is equivalent to a DELETE operation at the database level when hard-delete is required. If soft-delete is required, your application must modify a custom availability flag with an UPDATE database operation. The latter option requires the application logic to hide unavailable articles from end users and keep the record only for audit and traceability purposes.
- Other Events (possible values from the `event_type` property):
    - **SOURCE_DELETE**: When receiving a source_delete event, you must remove all articles associated with the specified source from your database. Your application can ignore the message if no content exists for the specified source.
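
The database semantics described above can be sketched with a toy in-memory repository. This is an illustration under stated assumptions, not a recommended storage design: `ArticleStore` and its `available` flag are hypothetical, article messages are assumed to carry a `source_code` field, and DEL is implemented as a soft-delete while SOURCE_DELETE removes records outright.

```python
class ArticleStore:
    """Toy in-memory repository keyed by article ID (`an`)."""

    def __init__(self):
        self.articles = {}

    def apply(self, message):
        if 'action' in message:
            action = message['action'].lower()
            an = message['an']
            if action in ('add', 'rep'):
                # Upsert: covers retransmitted adds and replaces with no prior version.
                self.articles[an] = {**message, 'available': True}
            elif action == 'del' and an in self.articles:
                # Soft-delete; unknown IDs are ignored.
                self.articles[an]['available'] = False
        elif message.get('event_type') == 'source_delete':
            source = message['source_code'].lower()
            # Hard-delete every article from the source; a no-op if none exist.
            self.articles = {
                an: article for an, article in self.articles.items()
                if article.get('source_code', '').lower() != source
            }

    def visible(self):
        """IDs of the articles that end users should still see."""
        return sorted(an for an, a in self.articles.items() if a['available'])


store = ArticleStore()
store.apply({'action': 'add', 'an': 'A1', 'title': 'First', 'source_code': 'DJDN'})
store.apply({'action': 'add', 'an': 'A1', 'title': 'First', 'source_code': 'DJDN'})   # retransmission
store.apply({'action': 'rep', 'an': 'A2', 'title': 'Second', 'source_code': 'WSJO'})  # no prior add
store.apply({'action': 'del', 'an': 'A1'})
store.apply({'action': 'del', 'an': 'A9'})                                            # unknown ID
print(store.visible())  # ['A2']
store.apply({'event_type': 'source_delete', 'source_code': 'wsjo'})
print(store.visible())  # []
```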

In [8]:
def print_message(message):
    print(f"[{datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')}] {message}")


def process_message(factiva_message):
    try:
        if 'action' in factiva_message:
            # Message is an article event
            # Implement your logic according to the documentation:
            # https://developer.dowjones.com/documents/site-docs-factiva_apis-factiva_analytics_apis-factiva_streams_api#article-specific-events
            if factiva_message['action'] == 'add':
                # Insert the article to the repository as new. Handle repeated messages.
                print_message(f"[ARTICLE] [ADD] AN: {factiva_message['an']} - {factiva_message['title']}")
            elif factiva_message['action'] == 'rep':
                # Upsert/Update/AddNewVersion the article in the repository. Handle repeated messages.
                print_message(f"[ARTICLE] [REP] AN: {factiva_message['an']} - {factiva_message['title']}")
            elif factiva_message['action'] == 'del':
                # Delete or mark as deleted the article in the repository. Handle a nonexistent article AN.
                print_message(f"[ARTICLE] [DEL] AN: {factiva_message['an']} - *** DELETE ***")
            else:
                print_message(f"[ERROR] Factiva Action Not Handled: {factiva_message['action']}")

        elif 'event_type' in factiva_message:
            # Message is a bulk action or service event
            if factiva_message['event_type'] == 'source_delete':
                # Delete all articles from the repository matching the source_code. Handle a nonexistent source_code and repeated messages.
                print_message(f"[EVENT] [SOURCE_DELETE] Source: {factiva_message['source_code'].upper()} - {factiva_message['description']}")
            else:
                print_message(f"[ERROR] Factiva Event Type Not Handled: {factiva_message['event_type']}")

        else:
            print_message(f"[ERROR] Unexpected Message Format:[{factiva_message}]")
   
    except Exception as e:
        print_message(f"[ERROR] Error processing Factiva message: {e}")



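def acknowledge_messages(ack_ids: list):
    # Acknowledged messages are not delivered again for this subscription
    ack_payload = {'ackIds': list(ack_ids)}
    resp = requests.post(ACK_URL, headers=PUBSUB_HEADERS, json=ack_payload)
    if resp.status_code == 200 and resp.json() == {}:
        print("--- ACK Success ---")
    else:
        print_message(f"[ERROR] ACK failed with status {resp.status_code}: {resp.text}")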
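
Pub/Sub redelivers any message that is not acknowledged before its ack deadline, so one option is to acknowledge only the messages whose processing succeeded. The sketch below illustrates that idea on synthetic data. `collect_ack_ids` and `demo_handler` are hypothetical, and the approach assumes a handler that raises on failure rather than catching its own exceptions.

```python
def collect_ack_ids(received_messages, handler):
    """Run handler on each message and return (ack_ids, failed_ack_ids)."""
    ack_ids, failed = [], []
    for received in received_messages:
        try:
            handler(received['message'])
            ack_ids.append(received['ackId'])
        except Exception:
            # Left unacknowledged so Pub/Sub redelivers it after the ack deadline.
            failed.append(received['ackId'])
    return ack_ids, failed


def demo_handler(message):
    """Stand-in handler that rejects messages with no payload."""
    if not message.get('data'):
        raise ValueError("empty message")


batch = [
    {'ackId': 'ack-1', 'message': {'data': 'e30='}},
    {'ackId': 'ack-2', 'message': {'data': ''}},
]
print(collect_ack_ids(batch, demo_handler))  # (['ack-1'], ['ack-2'])
```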

Consumes messages in batches until a pull returns none, acknowledging each batch after it is processed.

In [10]:
while True:
    pull_response = requests.post(PULL_URL, headers=PUBSUB_HEADERS, json=PULL_BODY)
    # A response without 'receivedMessages' means nothing was delivered in this pull
    encoded_messages = pull_response.json().get('receivedMessages', [])
    if not encoded_messages:
        break
    ack_ids = []
    for encoded_message in encoded_messages:
        # Pub/Sub delivers the payload base64-encoded; the decoded bytes are a JSON document
        encoded_data = encoded_message['message']['data']
        pubsub_message = base64.b64decode(encoded_data)
        pubsub_dict = json.loads(pubsub_message)
        news_message = pubsub_dict['data'][0]['attributes']
        process_message(news_message)
        ack_ids.append(encoded_message['ackId'])
    acknowledge_messages(ack_ids)
print("*** No more messages to process ***")

[2025-06-17 13:04:39] [ARTICLE] [ADD] AN: DJDN000020250611el6b002l0 - *S&PGR Rates Citadel Securities L.P.'s Sr Secured Notes 'BBB-'
[2025-06-17 13:04:39] [ARTICLE] [ADD] AN: DJDN000020250611el6b002if - Xunlei Down Over 18%, On Track for Largest Percent Decrease Since January 2018 -- Data Talk
[2025-06-17 13:04:39] [ARTICLE] [ADD] AN: DJDN000020250611el6b002l1 - *Kazia Therapeutics Ltd. ADR (KZIA) Resumed Trading
[2025-06-17 13:04:39] [ARTICLE] [ADD] AN: DJDN000020250611el6b002ml - GitLab Price Target Maintained With a $85.00/Share by Piper Sandler
[2025-06-17 13:04:39] [ARTICLE] [ADD] AN: DJDN000020250611el6b002l2 - Carnival Is Maintained at Buy by Stifel
[2025-06-17 13:04:39] [ARTICLE] [ADD] AN: DJDN000020250611el6b002l3 - Dow Jones 2:00 PM Averages: DJIA 42,985.10 UP 118.23
[2025-06-17 13:04:39] [ARTICLE] [ADD] AN: DJDN000020250611el6b002ig - Press Release: DoubleVerify Launches DV Authentic AdVantage, an Industry-First, AI-Powered Solution to Drive Superior Performance Across Propr

KeyboardInterrupt: 
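
Each entry in `receivedMessages` carries its payload base64-encoded, and the decoded JSON holds the Factiva message under `data[0].attributes`. A minimal sketch of that unwrapping on a synthetic entry follows. `decode_pubsub_message` is a hypothetical helper, and the sample only mirrors the shape read by the loop above, with invented values.

```python
import base64
import json


def decode_pubsub_message(received_message):
    """Return the Factiva attributes dict carried by one receivedMessages entry."""
    raw = base64.b64decode(received_message['message']['data'])
    envelope = json.loads(raw)
    return envelope['data'][0]['attributes']


# Synthetic entry; the article fields are placeholders.
article = {'action': 'add', 'an': 'EXAMPLE0001', 'title': 'Sample headline'}
envelope = {'data': [{'attributes': article}]}
received = {
    'ackId': 'ack-1',
    'message': {'data': base64.b64encode(json.dumps(envelope).encode('utf-8')).decode('ascii')},
}
print(decode_pubsub_message(received))
```

The returned dictionary is what `process_message` expects as its argument.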

## Next Steps

* Create a [Snapshot Extraction](1.6_snapshot_extraction.ipynb)
* Check out [Account Statistics](1.1_account_statistics.ipynb)