# How to get MRN News Analytics via Elektron WebSocket API with Python

## Refinitiv News Analytics Overview

Refinitiv News Analytics (TRNA) provides real-time numerical insight into the events on multiple news sources, in a format that can be directly consumed by algorithmic trading systems. TRNA enables algorithms to exploit the power of news to seize opportunities, capitalize on market inefficiencies and manage event risk.

## Machine Readable News Overview

Refinitiv delivers TRNA to consumers via Elektron using the Refinitiv Machine Readable News (MRN) data model. MRN is an advanced service for automating the consumption and systematic analysis of news. It delivers deep historical news archives, ultra-low latency structured news and news analytics directly to your applications.

### MRN Data behavior

The MRN data is published over Elektron using an Open Message Model (OMM) envelope in News Text Analytics domain messages. The News Analytics content set is made available over the MRN_TRNA RIC. The content data is contained in a FRAGMENT field that has been compressed, and potentially fragmented across multiple messages, in order to reduce bandwidth and message size.

A FRAGMENT field has a different data type based on a connection type:
* RSSL connection (ESDK [C++](https://developers.refinitiv.com/elektron/elektron-sdk-cc)/[Java](https://developers.refinitiv.com/elektron/elektron-sdk-java)): BUFFER type
* WebSocket connection: Base64 ASCII string

The data goes through the following series of transformations:

1. The core content data is a UTF-8 JSON string
2. This JSON string is compressed using gzip
3. The compressed JSON is split into a number of fragments (BUFFER or Base64 ASCII string), each of which fits into a single update message
4. The data fragments are added to an update message as the FRAGMENT field value in a FieldList envelope


<img src="images/trna_process.png"/>

Therefore, in order to parse the core content data, the application needs to reverse this process. A WebSocket application also needs to convert the Base64 string received in the FRAGMENT field to bytes before processing the field further.
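The transformation steps above, and their reversal, can be sketched with the Python standard library alone. This is a minimal illustration rather than the publisher's actual implementation: the `publish` and `consume` helpers, the sample payload and the 16-byte fragment size are all assumptions chosen to keep the example small.

```python
import base64
import gzip
import json
import zlib


def publish(item, max_fragment_bytes):
    """Forward path: JSON -> UTF-8 -> gzip -> split -> Base64 strings."""
    compressed = gzip.compress(json.dumps(item).encode("utf-8"))
    chunks = [compressed[i:i + max_fragment_bytes]
              for i in range(0, len(compressed), max_fragment_bytes)]
    # Return the total compressed size (the role TOT_SIZE plays) and the fragments
    return len(compressed), [base64.b64encode(c).decode("ascii") for c in chunks]


def consume(fragments_b64):
    """Reverse path: Base64 -> bytes -> concatenate -> gunzip -> JSON."""
    compressed = b"".join(base64.b64decode(f) for f in fragments_b64)
    # MAX_WBITS | 32 asks zlib to auto-detect the gzip header
    raw = zlib.decompress(compressed, zlib.MAX_WBITS | 32)
    return json.loads(raw.decode("utf-8"))


# Hypothetical payload, only used to exercise the round trip
sample = {"id": "example", "text": "hypothetical payload " * 20}
tot_size, fragments = publish(sample, max_fragment_bytes=16)
restored = consume(fragments)
print(len(fragments), "fragments,", tot_size, "compressed bytes")
```

The key point is the order of operations on the consumer side: decode each fragment from Base64 first, concatenate the bytes, and only then decompress.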

### MRN Data model

Five fields, as well as the RIC itself, are necessary to determine whether the entire item has been received in its various fragments and how to concatenate the fragments to reconstruct the item:
* MRN_SRC: identifier of the scoring/processing system that published the FRAGMENT
* GUID: a globally unique identifier for the data item. All messages for this data item will have the same GUID values.
* FRAGMENT: compressed data item fragment, itself
* TOT_SIZE: total size in bytes of the fragmented data
* FRAG_NUM: sequence number of fragments within a data item. This is set to 1 for the first fragment of each item published and is incremented for each subsequent fragment for the same item.

A single MRN data item publication is uniquely identified by the combination of RIC, MRN_SRC, and GUID.

#### Fragmentation
For a given RIC-MRN_SRC-GUID combination, when a data item requires only a single message, then TOT_SIZE will equal the number of bytes in the FRAGMENT and FRAG_NUM will be 1.

When multiple messages are required, the data item can be deemed fully received once the sum of the number of bytes of each FRAGMENT equals TOT_SIZE. The consumer will also observe that the FRAG_NUM values range from 1 to the number of fragments, with no intermediate integers skipped. In other words, a data item transmitted over three messages will contain FRAG_NUM values of 1, 2 and 3.
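As a rough sketch of the two completeness conditions described here (byte count equal to TOT_SIZE, and FRAG_NUM running from 1 upward with no gaps), the check could look like the following. The `is_complete` helper and its `(frag_num, bytes)` tuple layout are illustrative assumptions, not part of the MRN API.

```python
def is_complete(fragments, tot_size):
    """Check whether one data item has been fully received.

    fragments: list of (frag_num, fragment_bytes) tuples received so far
               for a single RIC-MRN_SRC-GUID combination.
    tot_size:  the TOT_SIZE value from the first fragment.
    """
    frag_nums = sorted(num for num, _ in fragments)
    # FRAG_NUM must run 1, 2, ..., n with no intermediate integer skipped
    contiguous = frag_nums == list(range(1, len(fragments) + 1))
    # The decoded fragment bytes must add up to TOT_SIZE
    received = sum(len(data) for _, data in fragments)
    return contiguous and received == tot_size


print(is_complete([(1, b"abcd")], 4))                            # single message
print(is_complete([(1, b"ab"), (2, b"c"), (3, b"d")], 4))        # three messages
print(is_complete([(1, b"ab"), (3, b"cd")], 4))                  # fragment 2 missing
```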

#### Compression
The FRAGMENT field is compressed with gzip compression, thus requiring the consumer to decompress to reveal the JSON plain-text data in that FID.

When an MRN data item is sent in multiple messages, all the messages must be received and their FRAGMENTs concatenated before being decompressed. In other words, the FRAGMENTs should not be decompressed independently of each other.

The decompressed output is encoded in UTF-8 and formatted as JSON.
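To illustrate why fragments should not be decompressed independently, the short sketch below compresses a made-up payload, cuts it in half, and tries to decompress the first half on its own. The payload is hypothetical; the behavior shown is that of Python's `zlib` module on a truncated gzip stream.

```python
import gzip
import zlib

original = b'{"headline": "hypothetical"}' * 10
compressed = gzip.compress(original)
first_half = compressed[:len(compressed) // 2]

try:
    zlib.decompress(first_half, zlib.MAX_WBITS | 32)
    partial_ok = True
except zlib.error as error:
    # A truncated gzip stream cannot be decompressed on its own
    partial_ok = False
    print("partial fragment failed:", error)

# Decompressing the complete, concatenated stream succeeds
whole = zlib.decompress(compressed, zlib.MAX_WBITS | 32)
print(whole == original)
```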

Please see the full documentation of this example application in [this article](https://developers.refinitiv.com/article/introduction-machine-readable-news-elektron-websocket-api-refinitiv).

If you are not familiar with the MRN concept, please visit the following resources, which give a full explanation of the MRN data model and implementation logic:
* [Webinar Recording: Introduction to Machine Readable News](https://developers.refinitiv.com/news#news-accordion-nid-12045)
* [Introduction to Machine Readable News (MRN) with Elektron Message API (EMA)](https://developers.refinitiv.com/article/introduction-machine-readable-news-mrn-elektron-message-api-ema).
* [MRN Data Models and Elektron Implementation Guide](https://developers.refinitiv.com/elektron/elektron-sdk-java/docs?content=8736&type=documentation_item).

In [1]:
# Uncomment if you do not have requests and websocket-client (version 0.49 and above) installed
# Install the requests and websocket-client packages in the current Jupyter kernel

import sys

# !{sys.executable} -m pip install requests
# !{sys.executable} -m pip install websocket-client

In [2]:

import time
import getopt
import socket
import json
import websocket
import threading
from threading import Thread, Event
import base64
import binascii  # required for the binascii.Error handler in processMRNUpdate
import zlib

In [3]:
# TREP connection variables

hostname = '127.0.0.1'
port = '15000'
user = 'user'
app_id = '256'
position = socket.gethostbyname(socket.gethostname())
login_id = 1

In [4]:
# WebSocket connections Variables

web_socket_app = None
web_socket_open = False
_news_envelopes = []

# keeps decompressed news JSON messages
_trna_messages = []

### MRN Process Code

The MRN data can be subscribed to with the *NewsTextAnalytics* domain and one of the following MRN-specific RIC names:
- *MRN_TRNA*: News Analytics: Company and C&E assets
- *MRN_TRNA_DOC*: News Analytics: Macroeconomic News & events
- *MRN_STORY*: Real-time News
- *MRN_TRSI*: News Sentiment Indices

In [5]:
# MRN variables

mrn_domain = 'NewsTextAnalytics'
mrn_item = 'MRN_TRNA'

def send_mrn_request(ws):
    """ Create and send MRN request """
    mrn_req_json = {
        'ID': 2,
        "Domain": mrn_domain,
        'Key': {
            'Name': mrn_item
        }
    }

    ws.send(json.dumps(mrn_req_json))
    print("SENT:")
    print(json.dumps(mrn_req_json, sort_keys=True, indent=2, separators=(',', ':')))
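If an application wants more than one of the MRN RICs listed above, one possible approach is to build one request per RIC. The sketch below only constructs the JSON payloads and does not send anything. It assumes that each item stream needs its own `ID` value and that ID 1 stays reserved for the Login stream, as in this notebook; the `build_mrn_requests` helper is hypothetical.

```python
import json

MRN_DOMAIN = "NewsTextAnalytics"
MRN_RICS = ["MRN_TRNA", "MRN_TRNA_DOC", "MRN_STORY", "MRN_TRSI"]


def build_mrn_requests(rics, first_id=2):
    """Return one request dict per RIC, each with a distinct stream ID."""
    return [
        {"ID": first_id + offset, "Domain": MRN_DOMAIN, "Key": {"Name": ric}}
        for offset, ric in enumerate(rics)
    ]


mrn_requests = build_mrn_requests(MRN_RICS)
for request in mrn_requests:
    # In the notebook, each payload would be passed to ws.send(json.dumps(request))
    print(json.dumps(request, sort_keys=True))
```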

### Initial Refresh Message
The Initial Refresh response does not contain any NTA data: all the fields related to the news item and fragment are empty or 0. It contains only the relevant feed-related or other static fields.

The application can print each incoming field to the console for informational purposes, or simply ignore the message.

In [6]:
# Process FieldList, Refresh and Status messages.

def decodeFieldList(fieldList_dict):
    for key, value in fieldList_dict.items():
        print("Name = %s: Value = %s" % (key, value))

def processRefresh(ws, message_json):

    print("RECEIVED: Refresh Message")
    decodeFieldList(message_json["Fields"])

def processStatus(ws, message_json):  # process incoming status message
    print("RECEIVED: Status Message")
    print(json.dumps(message_json, sort_keys=True, indent=2, separators=(',', ':')))

### MRN News Update messages Process Code

The updates contain only fields related to the item and the fragment. They do not contain any of the static or per-feed fields. The updates are not cached or conflated.

#### First Update
The first update contains all the fields related to the item and the first fragment; subsequent updates contain only the fields relating to the fragment they carry. The FRAG_NUM FID is set to 1 for the first Update of each item and is incremented in each subsequent Update for that item. This allows you to detect a missing fragment and to ensure the correct order of the fragments for reassembly.


#### Subsequent Update and Multi Fragment Items
A subsequent update contains the fields necessary to identify the MRN data item, the order of this fragment among all the fragments for this item, and the fragment itself. Note that for a multi-fragment item, Update messages with FRAG_NUM > 1 have fewer FIDs, because the metadata is included only in the first Update message (FRAG_NUM = 1) for that item.
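The first-update and subsequent-update rules can also be expressed with a dictionary keyed by the (RIC, MRN_SRC, GUID) combination. The following is an alternative sketch, not the logic used in the rest of this notebook: the `FragmentAssembler` class is hypothetical, the RIC is assumed to be supplied by the caller, and the two test messages are fabricated locally with `gzip` purely to exercise the code.

```python
import base64
import gzip
import json
import zlib


class FragmentAssembler:
    """Reassemble MRN items from the "Fields" dict of Update messages."""

    def __init__(self):
        self._pending = {}  # (ric, mrn_src, guid) -> partially received item

    def add(self, ric, fields):
        """Return the decoded JSON item once complete, otherwise None."""
        key = (ric, fields["MRN_SRC"], fields["GUID"])
        fragment = base64.b64decode(fields["FRAGMENT"])
        frag_num = int(fields["FRAG_NUM"])

        if frag_num == 1:
            # Only the first fragment carries TOT_SIZE
            entry = {"data": fragment, "frag_num": 1,
                     "tot_size": int(fields["TOT_SIZE"])}
        else:
            entry = self._pending.get(key)
            if entry is None or frag_num != entry["frag_num"] + 1:
                # Missing or out-of-order fragment: drop the partial item
                self._pending.pop(key, None)
                return None
            entry["data"] += fragment
            entry["frag_num"] = frag_num

        if len(entry["data"]) < entry["tot_size"]:
            self._pending[key] = entry  # still waiting for more fragments
            return None

        self._pending.pop(key, None)
        raw = zlib.decompress(entry["data"], zlib.MAX_WBITS | 32)
        return json.loads(raw.decode("utf-8"))


# Fabricated two-fragment item (hypothetical values throughout)
payload = gzip.compress(json.dumps({"id": "hypothetical"}).encode("utf-8"))
half = len(payload) // 2
first = {"MRN_SRC": "SRC_A", "GUID": "guid-1", "FRAG_NUM": 1,
         "TOT_SIZE": len(payload),
         "FRAGMENT": base64.b64encode(payload[:half]).decode("ascii")}
second = {"MRN_SRC": "SRC_A", "GUID": "guid-1", "FRAG_NUM": 2,
          "FRAGMENT": base64.b64encode(payload[half:]).decode("ascii")}

assembler = FragmentAssembler()
result_1 = assembler.add("MRN_TRNA", first)
result_2 = assembler.add("MRN_TRNA", second)
print(result_1, result_2)
```

A dictionary lookup avoids scanning a list for the GUID on every update, and the composite key follows the uniqueness rule stated in the data model section.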

#### Simple handling logic for News Fragments


<img src="images/mrn_flow_reconstruct.png"/>


In [7]:
def processMRNUpdate(ws, message_json):  # process incoming News Update messages

    fields_data = message_json["Fields"]
    # Dump the FieldList first (for informational purposes)
    # decodeFieldList(message_json["Fields"])

    # declare variables
    tot_size = 0
    guid = None

    try:
        # Get data for all required fields
        fragment = base64.b64decode(fields_data["FRAGMENT"])
        frag_num = int(fields_data["FRAG_NUM"])
        guid = fields_data["GUID"]
        mrn_src = fields_data["MRN_SRC"]

        #print("GUID  = %s" % guid)
        #print("FRAG_NUM = %d" % frag_num)
        #print("MRN_SRC = %s" % mrn_src)

        if frag_num > 1:  # A later fragment of a multi-part item: retrieve the stored envelope
            guid_index = next((index for (index, d) in enumerate(
                _news_envelopes) if d["guid"] == guid), None)
            # guid_index is None when no earlier fragment is stored for this GUID
            envelop = _news_envelopes[guid_index] if guid_index is not None else None
            if envelop and envelop["data"]["mrn_src"] == mrn_src and frag_num == envelop["data"]["frag_num"] + 1:
                print("process multiple fragments for guid %s" %
                      envelop["guid"])

                #print("fragment before merge = %d" % len(envelop["data"]["fragment"]))

                # Merge incoming data to existing news envelop and getting FRAGMENT and TOT_SIZE data to local variables
                fragment = envelop["data"]["fragment"] = envelop["data"]["fragment"] + fragment
                envelop["data"]["frag_num"] = frag_num
                tot_size = envelop["data"]["tot_size"]
                print("TOT_SIZE = %d" % tot_size)
                print("Current FRAGMENT length = %d" % len(fragment))

                # The multi-fragment item is not complete yet: wait for more fragments
                if tot_size != len(fragment):
                    return None
                # The multi-fragment item is complete: delete the associated GUID envelope
                del _news_envelopes[guid_index]
            else:
                print("Error: Cannot find fragment for GUID %s with matching FRAG_NUM or MRN_SRC %s" % (
                    guid, mrn_src))
                return None
        else:  # FRAG_NUM = 1 The first fragment
            tot_size = int(fields_data["TOT_SIZE"])
            print("FRAGMENT length = %d" % len(fragment))
            # The fragment news is not completed, waiting and add this news data to envelop object.
            if tot_size != len(fragment):
                print("Add new fragments to news envelop for guid %s" % guid)
                _news_envelopes.append({  # the envelop object is a Python dictionary with GUID as a key and other fields are data
                    "guid": guid,
                    "data": {
                        "fragment": fragment,
                        "mrn_src": mrn_src,
                        "frag_num": frag_num,
                        "tot_size": tot_size
                    }
                })
                return None

        # News Fragment(s) completed, decompress and print data as JSON to console
        if tot_size == len(fragment):
            print("decompress News FRAGMENT(s) for GUID  %s" % guid)
            decompressed_data = zlib.decompress(fragment, zlib.MAX_WBITS | 32)
            
            json_news = json.loads(decompressed_data)
            _trna_messages.append(json_news)
            print("News = %s" % json_news)

    except KeyError as keyerror:
        print('KeyError exception: ', keyerror)
    except IndexError as indexerror:
        print('IndexError exception: ', indexerror)
    except binascii.Error as b64error:
        print('base64 decoding exception:', b64error)
    except zlib.error as error:
        print('zlib decompressing exception: ', error)
    # Some consoles (for example on Windows) cannot display certain Unicode characters due to an OS limitation
    except UnicodeEncodeError as encodeerror:
        print("UnicodeEncodeError exception. Cannot encode unicode character for %s in this environment: " %
              guid, encodeerror)
    except Exception as e:
        print('exception: ', sys.exc_info()[0])

### JSON-OMM Process functions

In [8]:
def process_message(ws, message_json):
    """ Parse at high level and output JSON of message """
    message_type = message_json['Type']

    if message_type == "Refresh":
        if "Domain" in message_json:
            message_domain = message_json["Domain"]
            if message_domain == "Login":
                process_login_response(ws, message_json)
            elif message_domain:
                processRefresh(ws, message_json)
    elif message_type == "Update":
        if "Domain" in message_json and message_json["Domain"] == mrn_domain:
            processMRNUpdate(ws, message_json)
    elif message_type == "Status":
        processStatus(ws, message_json)
    elif message_type == "Ping":
        pong_json = {'Type': 'Pong'}
        ws.send(json.dumps(pong_json))
        print("SENT:")
        print(json.dumps(pong_json, sort_keys=True,
                         indent=2, separators=(',', ':')))


def process_login_response(ws, message_json):
    """ Send item request """
    send_mrn_request(ws)


def send_login_request(ws):
    """ Generate a login request from command line data (or defaults) and send """
    login_json = {
        'ID': 1,
        "Domain": 'Login',
        'Key': {
            'Name': '',
            'Elements': {
                'ApplicationId': '',
                'Position': ''
            }
        }
    }

    login_json['Key']['Name'] = user
    login_json['Key']['Elements']['ApplicationId'] = app_id
    login_json['Key']['Elements']['Position'] = position

    ws.send(json.dumps(login_json))
    print("SENT:")
    print(json.dumps(login_json, sort_keys=True, indent=2, separators=(',', ':')))

### WebSocket Process functions

In [9]:
def on_message(ws, message):
    """ Called when message received, parse message into JSON for processing """
    print("RECEIVED: ")
    message_json = json.loads(message)
    print(json.dumps(message_json, sort_keys=True, indent=2, separators=(',', ':')))

    for singleMsg in message_json:
        process_message(ws, singleMsg)
        
def on_error(ws, error):
    """ Called when websocket error has occurred """
    print(error)
    
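def on_close(ws, close_status_code=None, close_msg=None):
    """ Called when websocket is closed (newer websocket-client releases also pass a status code and message) """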
    global web_socket_open
    print("WebSocket Closed")
    web_socket_open = False
    
def on_open(ws):
    """ Called when handshake is complete and websocket is open, send login """

    print("WebSocket successfully connected!")
    global web_socket_open
    web_socket_open = True
    send_login_request(ws)

if __name__ == "__main__":
    # Start websocket handshake
    ws_address = "ws://{}:{}/WebSocket".format(hostname, port)
    print("Connecting to WebSocket " + ws_address + " ...")
    web_socket_app = websocket.WebSocketApp(ws_address, header=['User-Agent: Python'],
                                            on_message=on_message,
                                            on_error=on_error,
                                            on_close=on_close,
                                            subprotocols=['tr_json2'])
    web_socket_app.on_open = on_open

    # Event loop
    #wst = threading.Thread(target=web_socket_app.run_forever)
    wst = threading.Thread(target=web_socket_app.run_forever, kwargs={'sslopt': {'check_hostname': False}})
    wst.start()

    time.sleep(90)
    web_socket_app.close()

Connecting to WebSocket ws://172.20.33.30:15000/WebSocket ...
WebSocket successfully connected!
SENT:
{
  "Domain":"Login",
  "ID":1,
  "Key":{
    "Elements":{
      "ApplicationId":"256",
      "Position":"10.42.85.159"
    },
    "Name":"root"
  }
}
RECEIVED: 
[
  {
    "Domain":"Login",
    "Elements":{
      "MaxMsgSize":61430,
      "PingTimeout":30
    },
    "ID":1,
    "Key":{
      "Elements":{
        "AllowSuspectData":1,
        "ApplicationId":"256",
        "ApplicationName":"ADS",
        "Position":"10.42.85.159",
        "ProvidePermissionExpressions":1,
        "ProvidePermissionProfile":0,
        "SingleOpen":1,
        "SupportBatchRequests":7,
        "SupportEnhancedSymbolList":1,
        "SupportOMMPost":1,
        "SupportOptimizedPauseResume":1,
        "SupportPauseResume":1,
        "SupportStandby":1,
        "SupportViewRequests":1
      },
      "Name":"root"
    },
    "State":{
      "Data":"Ok",
      "Stream":"Open",
      "Text":"Login accepted by h

RECEIVED: 
[
  {
    "Type":"Ping"
  }
]
SENT:
{
  "Type":"Pong"
}
RECEIVED: 
[
  {
    "DoNotCache":true,
    "DoNotConflate":true,
    "Domain":"NewsTextAnalytics",
    "Fields":{
      "ACTIV_DATE":"2020-02-27",
      "FRAGMENT":"H4sIAAAAAAAA/51W2ZLqOAz9FSpPM0UvtrPzBiFAWEJuFrapqSmTuCFNts5CQ9/qfx8noS9woXqqhgek6CjHsixZ+cngCAfH3HczpvXz/GC5cUqo6S9qyzKSKwEVTItRJoa+ZB5OxtirfBijxQNOQKIsAZGCZsuaGoOnMfP3yVHz6KtXLpVZxyGhgBUn2zhr9NO4SBrGWKHwOo13JG27uR9H1MPRu2pP09UuhV78NMsnJCohiwoSuZQEPjCBH+2Ip3l11L5nxJlfvw/OICXL05ZpYZbnRBX/gwBASEQ7b75ow6IZeUPNQjgX92PFVPDCf+m723jSN/ovlsF8PlzzwlveFeIh/OIVEDtdovXK0R08spy3QvM3esYvxpvt3FzsByB+Vu3MueFFd3ihLHJnXp8dFQsFgnS+0g8b45BveASH++5BLWYD83UxTWzI3/Cy9+JlWenMm2vLnrdJw5m6nmyVjqlNh17n6Lzz4lwdhrqo9aGb3vBy93g5xJ95dyunTxbrw2vk2r4x1FazpbpzoQuMt2cug4o1i9zkhpeXb4ltAQls90QMUFdp6k4P28mor+52/f57N+FXCR6MV2/H4Sgi8Nke/LghFu5UBCXm5F/E/EuoKxNvGMHBR9DleMdIe/4hJKvdsCfMfB5q01dndUt8pyQoMeLOxLmQQxQcfmTbYt2BeHIMm0QS5i+yrz7Pkqw9/CDT5i3xnZq4JkaenrCjAGZKdz1IjyPZF2bFsCfniSd0RrhTyNJgJt0S3ym

RECEIVED: 
[
  {
    "DoNotCache":true,
    "DoNotConflate":true,
    "Domain":"NewsTextAnalytics",
    "Fields":{
      "ACTIV_DATE":"2020-02-27",
      "FRAGMENT":"H4sIAAAAAAAA/51Vy27bRhT9FYNr2SCHlGhyR1OyXhZFi7JjuyiMMTmWafPhkkM5SiAgD/QHigBpgSyCFgG66a5A+z1Nrf5G7zwUSUGiAuVizp37mJl7Zu7lcwVnOJnROCwV+/lqEoR5QUD1DejKklA3AVBsxR343rlSk8o84j6KbxvIqu+bpm5oYBzZJlK1vbHyrXTsRhC64cLVHk4JGLy4LHG2M8hpXuy4+c4RjcDjqsjvSOGENM4zcDrxmq3Drtdqguk6Lko6IBkzBQAkC2EdraYkcXZHom7EDgV7Z/mUJHTm5lVGRS4xJSmfKrZaUx7iLMofYHENdZR57atmZGw1681t1vpWqwlWOOl9EYdkjIsJcJVFcYiBis+SLkhCplhkuqeyr6aUjIMUBnk9u9qaziMTTOMp+Kt7hrVvWOqGsaIFTpgN1c26Zq3Z/LyMl4HI0K2NwCd5Eck8GqCf5kmVki8zbGxn2NjOsLWVYWsrw5r1OcWM5Iw8lF1wYg/9Ko9mQfwMcrSAszBP73E2W0bXFPI0vMHZhHzx+d0QHMFLg/uagAUUKS7uWDWkjCBczBT7GiclEayxxykX1uFU6/TNwWNWwolOSVGKfcaBbahqQ9U0BEkpMSscWtj928I5Gd5eXl4iVUXIRBf+07P4up1dPUyOE5Rk0cWxQw/dXvPZTUKrs2EQnZ+eKps5R5ji8eyelxyoWSEREh3iNE5mrJT5Vmv5wfyvH79fvP3wz5tfPr54uXj/evHu18c/Pjy+//3jqx/+/vPF4qdXjy/fLX77GYISoKvCExZ0ixknhGK2Ie8qiegAmUyD1X8VxYwZ3j

RECEIVED: 
[
  {
    "DoNotCache":true,
    "DoNotConflate":true,
    "Domain":"NewsTextAnalytics",
    "Fields":{
      "ACTIV_DATE":"2020-02-27",
      "FRAGMENT":"H4sIAAAAAAAA/51WbXOqOhD+Kw5f23ogKCjfENH6hghYq3fu3IkQlYqgAbV6pv/9bgieatvbzlw/JI/ZzZPdzW6W3wKOcXTKQj8VtN/vf1w/oQSW/oK1NCWZEcEkaIIxsK2pcF8sJkGuI9haBdWrtXpFVVUQOlpD99xy6x32OfQ6Zf2CuuWu8HdB1AmA+oYiX7bwhoCgQcMsTFclfUNo6OO45CVz7PtJye4boDqnyZpQ3c/CJAbtsdU0Wx3LbIJoEdI0G5CYiVyYSOwDoXQvRGG8JkEn4B6GgZ2kId8vvguBLKNaX7aQ7qCB/A8SRYRUpHTXT/KULhq/xsdKp9mn5uvZn7SD3eRu0T50kuN51qgLb/e3vNJ/8HYnF96tLHXPWXNuP67lZE2tE6bR8/jumYbyZuuTPt2PF8dPvOgjr+5aoijKHaXglRKczpMna9ojXvuu/1odh3LWGs37duCKptVqGVkvHn/ilX/kDVfN7nSiro34PFJrL16wbrami0Pd9ya90/DUQZH84n7irfzIm/Yq6vi8a+9mj71hcv7VfBKXPq11YhW5g+2k67dX25dPvNUfealCI7fp7OfhbNo4N5fBvrasPI0cGrWrcnddW2zRXSi8QU7GyYFE2clI9nFWZEhGNvlfQVPuhWMYB8kRjpHQIzfkazGqfCuWm99Jq99K1WZu6RYKgniYLqGG4gCKI0vohxqgJCIHzBO/LLLfvZCyktjAUJT1g3S1ZpElzsID6Ivlar0u1qs3wn1GccRkklxBUuVKxi+Db0SKotSvWScJDQo/qlVIrkMS7Tfk6xAj9H2Mb+RfBPlG/jnKSP42zE

RECEIVED: 
[
  {
    "Type":"Ping"
  }
]
SENT:
{
  "Type":"Pong"
}
RECEIVED: 
[
  {
    "DoNotCache":true,
    "DoNotConflate":true,
    "Domain":"NewsTextAnalytics",
    "Fields":{
      "ACTIV_DATE":"2020-02-27",
      "FRAGMENT":"H4sIAAAAAAAA/51VUZOaSBD+KxZPd1XuHjOgKG+IaFgjZYS4m1ylUiOM7pwwmAHcIyn/e3pgiLtbG6/qfLA/unua7m+6mx8a4SStSxYXmv3j8hDGuaCg+ht0RUFLNwWh2Zq7XAWftL5S5knjo61sE48Ho7GhmwiMa3sxcYLF7WRxedjJpy/qoJ9AqBdHGnVAMgqGBSnYIRd8S/iht3Lfg3kr8gMVTlyynIPHx2DqzfzAm4Jpx0RRLimXphAE5TEEQX0tZfxAEz+RGcKLeX6iaVm7ecXLtjBW0qx51Gy9rz0xnuRPEBzhd9q5/1szNq+ajek16+Cq1QIrZHoULKYREXsgiicsJmUuXhUtaEpPpK30Vpe/vlZIDjL4U3d1g57pAronJTuBv347HA5MZL0wVqUgqbRh3Rob5jPbKi9YdxDh4dAcPzPe5yJRdQzg0ClPq4y+zbBxnWHzOsOjqwyPrjKM0GuKJcmcPhU+OMmu3+ZJHbLvUCPGI9zX4jw7El535/sa/Td+JHxP32zAR0oS6DW4sT1YQJERcZDDkUmKiKg1e0fSgra8yfZUgbG8+wuD2LLO4FMXkNWGiqJ9UxTapq4PdYQwFKYxOTmlsEMvqteTbwvyFes6xhY++H89DO7QgD3MR0+zAQond3Nm3G3zZVlTbs12jnWYaC/rTkhJovooZy4AtRwmSpMZyVhay9luXvWswjdm03aqBLhOep8oEWnd2zEObclI2jtSsctFJpu098fMHPwJgVIgsSJ7GYh

RECEIVED: 
[
  {
    "Type":"Ping"
  }
]
SENT:
{
  "Type":"Pong"
}
RECEIVED: 
[
  {
    "DoNotCache":true,
    "DoNotConflate":true,
    "Domain":"NewsTextAnalytics",
    "Fields":{
      "ACTIV_DATE":"2020-02-27",
      "FRAGMENT":"H4sIAAAAAAAA/81VUXOjNhD+KwyvdTwI22db94QxtpkYhQNyqa/T6SigOGpA+ATYJTf5710JfLU7vXSmT/WD9vN+K2l3tbt8M6mgeVvztDLxt7/+xGkpGah+AV1VsdrNQZjYdIOQ7MxBrywzbWOGeGzPJ7PZCI0QkBHeeGQ3vN2av/aWfgZ7r2y0mtCCAbFhYt82VBgRe+KCi73hlsWBitZYPGdg+yjLFyadtOalAPN7svRWPvGWQD1xWdUBE4qKQTCRwoloYOZcvLDMz5R/4IUojyyvW7dsRN2FxWtW6L8mtgbmiYusPMHhyN6Yb4Mf0vb4XXq0fI+dvMtOgQVPD5KnLKFyD1kTGU9pXcq/BS1Zzo60i3Roqd/ArFQOClj6l0IXKsL2tOZHMLeG1ghN0Ni+Ypta0lyRCM2tyeSCC8uKn3fOxvPx6PKmh1JmfRxoNjCPZd4U7H+fYZVjwU6VDzaq5B/LrI35K9OmaVd356gGJvsjfaZiz/6x+J4ZzaDO4LX2wICioPJFtUWh0kNla+Inmlesy5kqzf5gSP/pMnlvYNFW4NBnJqvuniTGY8v6YCFkQ0gmVw1US7x6ILYTWbfRb7Zl2fYU+dOHW0dWRbVZLuz18rUMTmm9OhzoVxY6r4tD+Dr/KTWvQ85oTZP2oFqPgFq1EWPZihY8b1VP66su4lMt6pH17t4hRqSi98nacO+C0CE7Y7FZ3hifkmi7M4iXGGF0t/ITw7DR8IMRbGHDOvnY85H32SP3nmGMhmh

RECEIVED: 
[
  {
    "DoNotCache":true,
    "DoNotConflate":true,
    "Domain":"NewsTextAnalytics",
    "Fields":{
      "ACTIV_DATE":"2020-02-27",
      "FRAGMENT":"H4sIAAAAAAAA/51VbXPaOBD+K4w/3c0lVJYNJv4GBhIuseLYLlxyc9MRtgAHv4BfoKTDf+9KMgXSlJspH9j1PtJq91nt6ptCUxrvyigoFPPb8cMLspyB6V+wFQUrrRiEYiqW7ZBn5ao2ZqFYozimjm9aHUNrtToAuuZk5LjN3uNRJ16t+02i/FfvH4Xg8WynMBOaMAAm0SrPGg9lCPZpni1Z3g3KKEsB+kz6g+GIDPoAzaK8KG2WcsgDwdIAdqtXShylSxaOQplFFDpZEcn96AiCszI3e96gPZvZuv0FI4RxG4+fb/PWeJONstfnTjfJyNxd3DvOUxr9TVZz45EObWuq7K/O/aof+N2V25X6RfptYaTNrLXVmsVv26Vu/YXazlvytTe1yRBt0HQ8fjPcHlL2QFCabVhc7qysSss6g5Il4lMksI3SMNvCMSq+k4F8DGP9JxifwFr/Etq6iBp9EekqjwLm03wOBU3DKKBllr+rUc5itqGyME3Ef1dKwUuWwF99tU5NhM1pGW1gOWqqhq6p52BV5jTmWFtFumGcYLIWciNWW1jFJ+Aky8M6DQ1KtcniKmG/RbB6mWD9IsH6RYJv3hMsLgPbFiNYw1t0moU7L3rjVHZgcZAlK5rujoGxr8GCpnP2YbMsGA3hikK55oCAIaH5kjdywgmi+U4xZzQumGSNt9KBMAjrlL49rNgVENKY5YU8x/dMHaE2UlUMSUFjHHtLe10Wa9kDBl57Lwv9GXfvZ+RTAI32vLkrfRQ8LbT1Mtrek9FNQjLlPOmQltTfrfhYIGDmbc9YOKRJFO/4FBJHneTHxweMncfGg99vNq

RECEIVED: 
[
  {
    "DoNotCache":true,
    "DoNotConflate":true,
    "Domain":"NewsTextAnalytics",
    "Fields":{
      "ACTIV_DATE":"2020-02-27",
      "FRAGMENT":"H4sIAAAAAAAA/51VbXOiSBD+KxZfz7VmRnzjG0FiLFfKCGuyu3V1NULHzApDbgbMclv+9+sBjGYr54fzg0/TL0P3M93NL4tLnlaFiLXl/Do/hHGuAFXfUac1FF6KYDmWt1wFX61uq8yT2sdaOTabDMaTPrEpGtfO4sYNFr2bxfnhyTz92QbOEzzqXUitDngGaFhwLfa5klsu952V9xnNW5XvQblxIXKJHl+CqX87D/wpmp6E0sUSpDGFCCBjPIR2rVTIPSTzxGSIL5b5AdKi8vJSFk1hooCsfrQc0rVehUzyVzycsjvr2P1PM7OvmvvTa9bBVesIrZjpixIxRFztkCiZiJgXufqtaAUpHHhTaY+YX9fShoMM/9q7ulQFsOOFOKA76dHRyLZH74xloXhqbEMyGdj0wrbKtTgFMtonlF0YH3KVtGWMxl3rkKdlBh8TbF8neHCd4MlVgidXCTYZv2fYcCzhVc/RyTT9Nk+qUPxjuGQDJCbOsxcuq1N814Kf8TOXO/iw/56BJ9hqeGE7tKAi42pvZiMzFHFVWc4TTzU0vJnuvEj7TCAdTY7oUmlMagNKNy+KQscmZEgoZViXJczcFMoJ/UiPGbvlfzFCGBux6f1+siioWP2Y/TGLXcnVHB7vAo/2b59+RhLKcrsyc3RZdsILHlUvZuICVJtRAkhueSbSykx2/aqLAj+YTGeJC2MHptDOVOi41CbvDpdJxzWbRAvd+QpcpVXHx06Wu06fdqYQQ7YF1WGETvD8FKkt8RQ8H6ThDwpucqv3UdrsCvlWsdkVZSIMj/XqCVaOF6xR2whuI6G7WTamoAhbVRc8ez

RECEIVED: 
[
  {
    "Type":"Ping"
  }
]
SENT:
{
  "Type":"Pong"
}
WebSocket Closed


In [10]:
_trna_messages

[{'analytics': {'analyticsScores': [{'assetClass': 'CMPNY',
     'assetCodes': ['P:4295914598', 'R:BKNG.O', 'R:BKNG.OQ'],
     'assetId': '4295914598',
     'assetName': 'Booking Holdings Inc',
     'brokerAction': 'UNDEFINED',
     'firstMentionSentence': 1,
     'linkedIds': [{'idPosition': 0,
       'linkedId': 'tr:FWN2AR0IA_2002271+06pdY+WnA0y+HSEUe26D6E8l3UlVm9YKTLsI'},
      {'idPosition': 1,
       'linkedId': 'tr:FWN2AR0FO_2002271+hvPE2Y5CTG2PEN9SBuoAlj4UyOZZueJ4HN2Z'},
      {'idPosition': 2,
       'linkedId': 'tr:FWN2AR062_2002271qQzGXSJYO0rR+++yGj2cQSqZVbmtkXtDraIHD'},
      {'idPosition': 3,
       'linkedId': 'tr:FWN2AR03D_2002271VBmukMVfMMB1MVjlVMdZDcMVA6G+F7VqstEZq'},
      {'idPosition': 4,
       'linkedId': 'tr:FWN2AR08F_2002271uzFI+Ho0qbFRHm03LQJ8CE8vUb7VjIlltzBPB'},
      {'idPosition': 5,
       'linkedId': 'tr:FWN2AQ0YP_2002271LwTyuAbb1wfBx2Og0V4DYjxuQi2nMJNzU7Z2J'},
      {'idPosition': 6,
       'linkedId': 'tr:FWN2AQ11T_2002261hRRrj5Wxd0wadzn/Zqe9SXHYDboCO5fmD

In [11]:
print("first 10 analytics\n")
for analytic in _trna_messages[:10]:
    if analytic["analytics"]:
        print(analytic["analytics"])

first 10 analytics

{'analyticsScores': [{'assetClass': 'CMPNY', 'assetCodes': ['P:4295914598', 'R:BKNG.O', 'R:BKNG.OQ'], 'assetId': '4295914598', 'assetName': 'Booking Holdings Inc', 'brokerAction': 'UNDEFINED', 'firstMentionSentence': 1, 'linkedIds': [{'idPosition': 0, 'linkedId': 'tr:FWN2AR0IA_2002271+06pdY+WnA0y+HSEUe26D6E8l3UlVm9YKTLsI'}, {'idPosition': 1, 'linkedId': 'tr:FWN2AR0FO_2002271+hvPE2Y5CTG2PEN9SBuoAlj4UyOZZueJ4HN2Z'}, {'idPosition': 2, 'linkedId': 'tr:FWN2AR062_2002271qQzGXSJYO0rR+++yGj2cQSqZVbmtkXtDraIHD'}, {'idPosition': 3, 'linkedId': 'tr:FWN2AR03D_2002271VBmukMVfMMB1MVjlVMdZDcMVA6G+F7VqstEZq'}, {'idPosition': 4, 'linkedId': 'tr:FWN2AR08F_2002271uzFI+Ho0qbFRHm03LQJ8CE8vUb7VjIlltzBPB'}, {'idPosition': 5, 'linkedId': 'tr:FWN2AQ0YP_2002271LwTyuAbb1wfBx2Og0V4DYjxuQi2nMJNzU7Z2J'}, {'idPosition': 6, 'linkedId': 'tr:FWN2AQ11T_2002261hRRrj5Wxd0wadzn/Zqe9SXHYDboCO5fmD+Jaq'}, {'idPosition': 7, 'linkedId': 'tr:FWN2AP0R0_20022519QTF5Y20RDJhAta49u7vMY6kwn6FmGrxTq35g'}], 'noveltyC

## News Analytics Data Model Overview

The structure of the data within each data feed is defined in the following sections. After assembly and decompression, the data appears as JSON in UTF-8.

The News Analytics feed has three top-level items:
- *id*: The value of this field is in ```[feedFamilyCode]:[sourceId]``` format.
- *analytics*: Analytics Groups sub-group containing the analytics scores
- *newsItem*: This group contains metadata sourced directly from the STORY item, in contrast to the newsItem group also inside the analytics group that contains data derived from the TRNA scoring.
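As a small illustration of the three top-level items, the sketch below summarizes one decoded message. The `summarize_message` helper is hypothetical, the sample message is made up, and splitting the `id` on the first colon is an assumption based on the `[feedFamilyCode]:[sourceId]` format given above.

```python
def summarize_message(message):
    """Return a short summary of the top-level items of a decoded TRNA message."""
    # Split "[feedFamilyCode]:[sourceId]" on the first colon only
    feed_family_code, _, source_id = message.get("id", "").partition(":")
    scores = message.get("analytics", {}).get("analyticsScores", [])
    return {
        "feedFamilyCode": feed_family_code,
        "sourceId": source_id,
        "scoreCount": len(scores),
        "hasNewsItem": "newsItem" in message,
    }


# Made-up message with placeholder values
example_message = {
    "id": "FAMILY:source-123",
    "analytics": {"analyticsScores": [{"assetClass": "CMPNY"}]},
    "newsItem": {},
}
summary = summarize_message(example_message)
print(summary)
```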

In [12]:
analytic_scores_group = _trna_messages[0]["analytics"]["analyticsScores"]

### Analytics Score Group

Each analytics score group contains all the analytics information derived from the news item for a specific asset as a simple group of named values.

Example Fields:
- ```assetClass```: The broad class that the asset belongs to. Also describes the type of TRTS sentiment engine used in the scoring.
    * Either "CMPNY" for a company or "COM" for a commodity.
    * Set to “CMPNY” for document-level scores because of use of the same scoring engine as used for company-level scores.
- ```assetCodes```: List of prefixed codes that, together with the assetId field below, identify the asset within various symbologies. By assetClass value:
    * “CMPNY”: “P:” prefix for PermID and “R:” for RIC. Can contain multiple RICs for a single company, including the primary one and those tagged to the news item.
    * “COM”: “N2:” prefix for topic code.
- ```assetId```: Primary identifier for the asset. PermID for company and topic code for commodity.
- ```assetName```: A human readable name for the asset, used as an identifier for unknown entity scoring.
- ```brokerAction```: Denotes whether the news item is reporting the action of a broker recommendation for a security issued by the company.

    * One of "UPGRADE", "DOWNGRADE", "MAINTAIN", "BROKER", "INITIATE", "UNDEFINED"
- ```firstMentionSentence```: The first sentence, starting with the headline, in which the scored asset is mentioned. Thus, a value of 1 denotes the headline, 2 the first sentence of the story body, 3 the second sentence, etc.
- ```priceTargetIndicator```: When the news item is a price target indicator for the asset.

    * One of "INCREASE", "DECREASE", "MAINTAIN", "BROKER", "INITIATE", "UNDEFINED"
    * Set to “UNDEFINED” for all Japanese-language and document-level scores.
- ```relevance```: A decimal number indicating the relevance of the news item to the asset. It ranges from 0 to 1.
- ```sentimentClass```: This field indicates the predominant sentiment class for this news item with respect to this asset. The indicated class is the one with the highest probability.
    * 1: Positive
    * 0: Neutral 
    * -1: Negative
- ```sentimentNegative```: The probability that the sentiment of the news item was negative for the asset.
- ```sentimentNeutral```: The probability that the sentiment of the news item was neutral for the asset.
- ```sentimentPositive```: The probability that the sentiment of the news item was positive for the asset.
- ```sentimentWordCount```: The number of lexical tokens (words and punctuation) in the sections of the item text that are deemed relevant to the asset.
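The relationship between ```sentimentClass``` and the three probability fields can be checked with a few lines of code. The `predominant_sentiment` helper is an illustrative assumption; the probabilities are the ones from the Booking Holdings score shown in this notebook's output.

```python
def predominant_sentiment(score):
    """Return 1, 0 or -1 for the sentiment class with the highest probability."""
    probabilities = {
        1: score["sentimentPositive"],
        0: score["sentimentNeutral"],
        -1: score["sentimentNegative"],
    }
    return max(probabilities, key=probabilities.get)


example_score = {
    "sentimentPositive": 0.0556656,
    "sentimentNeutral": 0.125282,
    "sentimentNegative": 0.819053,
}
derived_class = predominant_sentiment(example_score)
print(derived_class)  # -1, matching the sentimentClass reported for this score
```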

#### TRNA Analytics Group Processing functions

In [13]:
def get_permid(asset_codes):
    for code in asset_codes:
        if code[:2] == "P:":
            return code[2:]

def get_company(asset_codes):
    company = [code[2:] for code in asset_codes if code[:2] == "R:"]
    return " ".join(company)

def get_topic_code(asset_codes):
    topic = [code[3:] for code in asset_codes if code[:3] == "N2:"]
    return " ".join(topic)
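As a possible generalization of the three helpers above, the codes can be grouped by prefix in a single pass. The `group_asset_codes` function is a hypothetical sketch; the input is the `assetCodes` list from this notebook's sample output.

```python
from collections import defaultdict


def group_asset_codes(asset_codes):
    """Group prefixed codes such as 'P:...' or 'R:...' by their prefix."""
    grouped = defaultdict(list)
    for code in asset_codes:
        prefix, separator, value = code.partition(":")
        if separator:  # skip anything that does not carry a prefix
            grouped[prefix].append(value)
    return dict(grouped)


codes = ['P:4295914598', 'R:BKNG.O', 'R:BKNG.OQ']
grouped_codes = group_asset_codes(codes)
print(grouped_codes)
```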

In [14]:
# Analytics Group Fields

asset_class = None
asset_codes = None

for analytic_score in analytic_scores_group:
    if analytic_score["assetClass"]:
        asset_class = analytic_score["assetClass"]
        asset_codes = analytic_score["assetCodes"]
        print("assetClass: ", asset_class)
        print("assetCodes: ", asset_codes)
        if asset_class == "CMPNY":
            print("PermID: ", get_permid(asset_codes))
            print("Co: ", get_company(asset_codes))
        elif asset_class == "COM":
            print("Topic Codes: ", get_topic_code(asset_codes))
        print("assetId: ", analytic_score["assetId"])
        print("assetName: ", analytic_score["assetName"])
        print("brokerAction: ", analytic_score["brokerAction"])
        print("relevance: ", analytic_score["relevance"])
        print("sentimentClass: ", analytic_score["sentimentClass"])
        print("sentimentPositive: ", analytic_score["sentimentPositive"])
        print("sentimentNeutral: ", analytic_score["sentimentNeutral"])
        print("sentimentNegative: ", analytic_score["sentimentNegative"])
        print("priceTargetIndicator: ", analytic_score["priceTargetIndicator"])
        print("firstMentionSentence: ", analytic_score["firstMentionSentence"])
        print("--------------------------------------------------------")

assetClass:  CMPNY
assetCodes:  ['P:4295914598', 'R:BKNG.O', 'R:BKNG.OQ']
PermID:  4295914598
Co:  BKNG.O BKNG.OQ
assetId:  4295914598
assetName:  Booking Holdings Inc
brokerAction:  UNDEFINED
relevance:  1.0
sentimentClass:  -1
sentimentPositive:  0.0556656
sentimentNeutral:  0.125282
sentimentNegative:  0.819053
priceTargetIndicator:  DECREASE
firstMentionSentence:  1
--------------------------------------------------------


### Windowed Count Group

The windowed count group is used to associate a count with the window of time it relates to. It is used for the noveltyCounts and volumeCounts.

#### noveltyCounts 

The novelty of the content within a news item on a particular asset is calculated by comparing it with the asset-specific text over a cache of previous news items that contain the asset.

The comparison between items is done using a linguistic fingerprint. If two news items are similar, they are said to be “linked”. Because the fingerprint is linguistic, a content item can “link” only to an item in the same language.

There are five historical periods that are used in the comparison. The default periods are 12 hours, 24 hours, 3 days, 5 days and 7 days prior to the news item’s timestamp.

#### volumeCounts

The volume of news for each asset is calculated. A cache of previous news items is maintained and the number of news items that mention the asset within each of five historical periods is calculated. The cache is language-specific, e.g., a volumeCount on an English-language item measures the number of other English-language items in that historical period.

By default, the historical periods are 12 hours, 24 hours, 3 days, 5 days and 7 days prior to the news item’s timestamp, the same periods used in the novelty calculations. This allows a direct comparison between the number of similar items and the total number of items within each historical period.
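
One way to make that comparison concrete is to divide each noveltyCounts entry by the volumeCounts entry for the same window. The sketch below is only an illustration: `linked_share` is a hypothetical helper name, the sample lists are made-up values shaped like the windowed count group, and reading the ratio as "share of items in the window that are linked" is an assumption rather than a definition from the feed.

```python
def linked_share(novelty_counts, volume_counts):
    """Return {window: novelty itemCount / volume itemCount} for shared windows."""
    volume_by_window = {item["window"]: item["itemCount"] for item in volume_counts}
    shares = {}
    for item in novelty_counts:
        total = volume_by_window.get(item["window"])
        if total:  # skip windows that are missing or have a zero total
            shares[item["window"]] = item["itemCount"] / total
    return shares


# Made-up sample values, not real feed data
sample_novelty = [{"itemCount": 3, "window": "12H"}, {"itemCount": 5, "window": "24H"}]
sample_volume = [{"itemCount": 6, "window": "12H"}, {"itemCount": 20, "window": "24H"}]

print(linked_share(sample_novelty, sample_volume))
# prints {'12H': 0.5, '24H': 0.25}
```

In the notebook, the same helper could be called with `analytic_score["noveltyCounts"]` and `analytic_score["volumeCounts"]`.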

Example Fields:
- ```itemCount```: Number of items
- ```window```: Length of time the count covers nH (for hours) or nD (for days). Default values are “12H”, “24H”, “3D”, “5D”, and “7D”.
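
If the window needs to be used in date arithmetic, the nH / nD string can be converted to a `datetime.timedelta`. This is a minimal sketch that assumes the value always follows the pattern described above (an integer followed by `H` or `D`); `window_to_timedelta` is a hypothetical helper, not part of any Refinitiv API.

```python
from datetime import timedelta


def window_to_timedelta(window):
    """Convert a window string such as "12H" or "3D" to a timedelta."""
    value = int(window[:-1])   # numeric part, e.g. 12
    unit = window[-1].upper()  # unit suffix, "H" or "D"
    if unit == "H":
        return timedelta(hours=value)
    if unit == "D":
        return timedelta(days=value)
    raise ValueError("Unsupported window: " + window)


for window in ("12H", "24H", "3D", "5D", "7D"):
    print(window, "->", window_to_timedelta(window))
```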

#### TRNA Windowed Count Processing functions

In [15]:
def windowed_count_group(group):
    for item in group:
        print("itemCount: ", item["itemCount"])
        print("window: ", item["window"])

In [16]:
# Windowed Count - Analytics Group Fields

for analytic_score in analytic_scores_group:
    print("Novelty Counts:\n")
    windowed_count_group(analytic_score["noveltyCounts"])
    print("--------------------------------------------------------")
    print("Volume Counts:\n")
    windowed_count_group(analytic_score["volumeCounts"])
    print("--------------------------------------------------------")

Novelty Counts:

itemCount:  6
window:  12H
itemCount:  7
window:  24H
itemCount:  8
window:  3D
itemCount:  8
window:  5D
itemCount:  8
window:  7D
--------------------------------------------------------
Volume Counts:

itemCount:  7
window:  12H
itemCount:  20
window:  24H
itemCount:  26
window:  3D
itemCount:  26
window:  5D
itemCount:  26
window:  7D
--------------------------------------------------------


### Linked Id Group

The linked id group is used to associate an id with its position in a longer list of ids. It is used for the linkedIds.
This group is not populated for document-level scores, since novelty is not calculated.

Example Fields:
- ```idPosition```: Position of the linkedId in the complete list of linked Ids. 0 is the first/oldest, and the largest/most recent is the 7-day itemCount minus 1.
- ```linkedId```: id of the item at this position
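
Because `idPosition` encodes age (0 is the oldest), the group can be sorted on it to walk the linked items in chronological order. The sketch below assumes each entry is a dict with the two fields listed above; `order_linked_ids` and the sample ids are hypothetical and used only for illustration.

```python
def order_linked_ids(linked_ids):
    """Return the linkedId values ordered from oldest to most recent."""
    ordered = sorted(linked_ids, key=lambda entry: entry["idPosition"])
    return [entry["linkedId"] for entry in ordered]


# Hypothetical entries, deliberately out of order
sample_linked_ids = [
    {"idPosition": 2, "linkedId": "id-c"},
    {"idPosition": 0, "linkedId": "id-a"},
    {"idPosition": 1, "linkedId": "id-b"},
]

ordered_ids = order_linked_ids(sample_linked_ids)
print("oldest: ", ordered_ids[0])
print("most recent: ", ordered_ids[-1])
```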

In [17]:
linked_id_group = None

for analytic_score in analytic_scores_group:
    print("Linked Id Group: ")
    linked_id_group = analytic_score["linkedIds"]
    if linked_id_group:
        for linked_id in linked_id_group:
            print("idPosition: ", linked_id["idPosition"])
            print("linkedId: ", linked_id["linkedId"])

Linked Id Group: 
idPosition:  0
linkedId:  tr:FWN2AR0IA_2002271+06pdY+WnA0y+HSEUe26D6E8l3UlVm9YKTLsI
idPosition:  1
linkedId:  tr:FWN2AR0FO_2002271+hvPE2Y5CTG2PEN9SBuoAlj4UyOZZueJ4HN2Z
idPosition:  2
linkedId:  tr:FWN2AR062_2002271qQzGXSJYO0rR+++yGj2cQSqZVbmtkXtDraIHD
idPosition:  3
linkedId:  tr:FWN2AR03D_2002271VBmukMVfMMB1MVjlVMdZDcMVA6G+F7VqstEZq
idPosition:  4
linkedId:  tr:FWN2AR08F_2002271uzFI+Ho0qbFRHm03LQJ8CE8vUb7VjIlltzBPB
idPosition:  5
linkedId:  tr:FWN2AQ0YP_2002271LwTyuAbb1wfBx2Og0V4DYjxuQi2nMJNzU7Z2J
idPosition:  6
linkedId:  tr:FWN2AQ11T_2002261hRRrj5Wxd0wadzn/Zqe9SXHYDboCO5fmD+Jaq
idPosition:  7
linkedId:  tr:FWN2AP0R0_20022519QTF5Y20RDJhAta49u7vMY6kwn6FmGrxTq35g


### News Item Group (Analytics Sub-group)

The TRNA feed contains two news item groups. This group, within the analytics group, contains values derived from the news item by the analytics system.

Example Fields:
- ```companyCount```: The number of companies explicitly listed in the news item in the subjects field
- ```exchangeAction```: One of "IMBALANCE", "HALT", "RESUME", "BLOCK TRADE", "INDICATION", "UNDEFINED".
    * Set to “UNDEFINED” for all Japanese-language scores.
- ```marketCommentary```: Indicator that the item is discussing general market conditions, such as “After the Bell” summaries.
- ```sentenceCount```: The total number of sentences in the news item.
- ```wordCount```: The total number of lexical tokens (words and punctuation) in the news item.

In [18]:
news_item_groups = _trna_messages[0]["analytics"]["newsItem"]

In [19]:
print("companyCount: ", news_item_groups["companyCount"])
print("exchangeAction: ", news_item_groups["exchangeAction"])
print("marketCommentary: ", news_item_groups["marketCommentary"])
print("sentenceCount: ", news_item_groups["sentenceCount"])
print("wordCount: ", news_item_groups["wordCount"])

companyCount:  1
exchangeAction:  UNDEFINED
marketCommentary:  False
sentenceCount:  1
wordCount:  16


## News Item Group (Top-Level Group)

The News Analytics feed contains two news item groups. This top-level group contains values taken directly from the news item being processed; the other group, described in the section above, sits within the analytics group and contains values derived from the news item by the analytics system.

Because the fields below are mapped from the incoming news item data, the mappings can vary by the feedFamilyCode value. Where they differ, the difference is noted under the relevant field.

Example Fields:
- ```dataType```: The broad type of data the news item belongs to. One of "News", "Social"
- ```feedFamilyCode```: A code that identifies the family of feeds the news item came from. Thomson Reuters feeds = "tr"
- ```headline```: The headline text of the news item.
- ```sourceTimestamp```: UTC timestamp of this news item. Millisecond precision. The source of this data varies by the feedFamilyCode value.
- ```provider```: Identifier for the organization which provided the news item. The source of this data varies by the feedFamilyCode value.
    * "tr": from provider field
    * "mrvr": from sourceName or publisher field
- ```urgency```: Differentiates story types. 1: alert, 3: article
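
The `sourceTimestamp` and `urgency` fields are often easier to work with once converted to Python types. The sketch below assumes the timestamp always has the millisecond UTC form described above (for example `2020-02-27T10:42:54.662Z`); `parse_source_timestamp`, `urgency_label`, and the `"unknown"` fallback are hypothetical names and choices, not part of the feed.

```python
from datetime import datetime, timezone

# Urgency codes as listed in the field description above
URGENCY_LABELS = {1: "alert", 3: "article"}


def parse_source_timestamp(value):
    """Parse a UTC timestamp such as 2020-02-27T10:42:54.662Z."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)  # mark the result as UTC


def urgency_label(urgency):
    """Map an urgency code to its story type; "unknown" is an assumed fallback."""
    return URGENCY_LABELS.get(urgency, "unknown")


print(parse_source_timestamp("2020-02-27T10:42:54.662Z").isoformat())
print(urgency_label(1), urgency_label(3))
```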
    

In [20]:
news_item  = _trna_messages[0]["newsItem"]

In [22]:
print("dataType: ", news_item["dataType"])
print("Headline: ", news_item["headline"])
print("sourceTimestamp: ", news_item["sourceTimestamp"])
print("feedFamilyCode: ", news_item["feedFamilyCode"])
print("provider: ", news_item["provider"][3:]) # strip the "NS:" prefix; news_item["provider"] == NS:RTRS
print("urgency: ", news_item["urgency"], " : ", 
      (lambda item_type: "alert" if item_type == 1 else "article")(news_item["urgency"]))

dataType:  News
Headline:  BOOKING HOLDINGS INC <BKNG.O>: EVERCORE ISI CUTS PRICE TARGET TO $1750 FROM $1970
sourceTimestamp:  2020-02-27T10:42:54.662Z
feedFamilyCode:  tr
provider:  RTRS
urgency:  1  :  alert


## References
* [Refinitiv Elektron SDK Family page](https://developers.refinitiv.com/elektron) on the [Refinitiv Developer Community](https://developers.thomsonreuters.com/) web site.
* [Refinitiv Elektron WebSocket API page](https://developers.refinitiv.com/websocket-api).
* [Developer Webinar Recording: Introduction to Elektron WebSocket API](https://www.youtube.com/watch?v=CDKWMsIQfaw).
* [Introduction to Machine Readable News with Elektron WebSocket API](https://developers.refinitiv.com/article/introduction-machine-readable-news-elektron-websocket-api-refinitiv).
* [Introduction to Machine Readable News (MRN) with Elektron Message API (EMA)](https://developers.refinitiv.com/article/introduction-machine-readable-news-mrn-elektron-message-api-ema).
* [MRN Data Models and Elektron Implementation Guide](https://developers.refinitiv.com/elektron/elektron-sdk-java/docs?content=8736&type=documentation_item).
* [Refinitiv-API-Samples/Example.WebSocketAPI.Javascript.NewsMonitor](https://github.com/Refinitiv-API-Samples/Example.WebSocketAPI.Javascript.NewsMonitor).

For any questions about this example or the Elektron WebSocket API, please use the Developer Community [Q&A Forum](https://community.developers.refinitiv.com/spaces/152/websocket-api.html).