# ESG Bulk Introduction With Python

ESG stands for Environmental, Social and Governance.  

Refinitiv Data Platform (RDP) provides simple web-based API access to a broad range of content.

Next, we look at the ESG Bulk service, which provides ESG Measures data with history for each company in the ESG universe. The service is part of the RDP family of services and is a RESTful web service, accessed via HTTP requests.

Learn more about ESG Bulk service via Refinitiv Data Platform APIs -> Documentation at:

https://developers.refinitiv.com/refinitiv-data-platform/refinitiv-data-platform-apis/docs

Let us now move on to interacting with the ESG Bulk RDP service.

### Valid Credentials - Read From File

For this example, we have pre-stored a set of valid RDP credentials in a file and are going to retrieve them now.

In [1]:
import requests, json, time, getopt, sys

# The credentials file holds one value per line:
#   RDP machine ID
#   long password
#   generated client ID
# Forward slashes avoid the invalid "\c" escape and also work on Windows.
with open("../creds/credFileHuman.txt", "r") as credFile:
    USERNAME = credFile.readline().rstrip('\n')
    PASSWORD = credFile.readline().rstrip('\n')
    CLIENT_ID = credFile.readline().rstrip('\n')

# Make sure that creds are read in correctly
#print("USERNAME="+str(USERNAME))
#print("PASSWORD="+str(PASSWORD))
#print("CLIENT_ID="+str(CLIENT_ID))

# Set Application Constants
RDP_AUTH_VERSION = "/v1"
RDP_ESG_BULK_VERSION = "/v1"
RDP_BASE_URL = "https://api.refinitiv.com"
RDP_ESG_BUCKET = "ESG"
CATEGORY_URL = "/auth/oauth2"
ENDPOINT_URL = "/token"
CLIENT_SECRET = ""
TOKEN_FILE = "token.txt"
SCOPE = "trapi"
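
Because the credentials are read positionally, a missing line silently yields an empty string. As a minimal sketch, the three values could be read in one call and checked before use. The helper name `read_credentials` is hypothetical and not part of RDP; it assumes one value per line in the order shown above.

```python
from pathlib import Path

def read_credentials(path):
    """Return (username, password, client_id) from a three-line text file.

    Hypothetical helper: assumes one value per line, in that order.
    """
    # splitlines() drops the line endings but keeps any other whitespace
    values = Path(path).read_text().splitlines()[:3]
    if len(values) < 3 or not all(values):
        raise ValueError("expected three non-empty lines in " + str(path))
    return tuple(values)
```

A call such as `USERNAME, PASSWORD, CLIENT_ID = read_credentials("../creds/credFileHuman.txt")` would then fail early with a clear message instead of sending empty credentials.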

### Define Token Handling and Obtain a Valid Token

A valid token is a prerequisite for requesting any RDP content and is passed into each of the following steps.

In [51]:
TOKEN_ENDPOINT = RDP_BASE_URL + CATEGORY_URL + RDP_AUTH_VERSION + ENDPOINT_URL

def _requestNewToken(refreshToken):
    if refreshToken is None:
        tData = {
            "username": USERNAME,
            "password": PASSWORD,
            "grant_type": "password",
            "scope": SCOPE,
            "takeExclusiveSignOnControl": "true"
        };
    else:
        tData = {
            "refresh_token": refreshToken,
            "grant_type": "refresh_token",
        };

    # Make a REST call to get latest access token
    response = requests.post(
        TOKEN_ENDPOINT,
        headers = {
            "Accept": "application/json"
        },
        data = tData,
        auth = (
            CLIENT_ID,
            CLIENT_SECRET
        )
    )
    
    if response.status_code != 200:
        raise Exception("Failed to get access token {0} - {1}".format(response.status_code, response.text));

    # Return the new token
    return json.loads(response.text);

def saveToken(tknObject):
    print("Saving the new token")
    # Append the expiry time to token
    tknObject["expiry_tm"] = time.time() + int(tknObject["expires_in"]) - 10
    # Store it in the file; the with-block closes the file so the data is flushed
    with open(TOKEN_FILE, "w+") as tf:
        json.dump(tknObject, tf, indent=4)
    
def getToken():
    try:
        print("Reading the token from: " + TOKEN_FILE)
        # Read the token from a file; the with-block closes it on every path
        with open(TOKEN_FILE, "r") as tf:
            tknObject = json.load(tf)

        # Is access token valid
        if tknObject["expiry_tm"] > time.time():
            # return access token
            return tknObject["access_token"]

        print("Token expired, refreshing a new one...")
        # Get a new token from refresh token
        tknObject = _requestNewToken(tknObject["refresh_token"])

    except Exception as exp:
        print("Caught exception: " + str(exp))
        print("Getting a new token using Password Grant...");
        tknObject = _requestNewToken(None);

    # Persist this token for future queries
    saveToken(tknObject)
    print("Token is: " + tknObject["access_token"])
    # Return access token
    return tknObject["access_token"];

accessToken = getToken();
print("Have token now");

Reading the token from: token.txt
Token expired, refreshing a new one...
Saving the new token
Token is: eyJ0eXAiOiJhdCtqd3QiLCJhbGciOiJSUzI1NiIsImtpZCI6ImRMdFd2Q0tCSC1NclVyWm9YMXFod2pZQ2t1eDV0V2ZSS2o4ME9vcjdUY28ifQ.eyJkYXRhIjoie1wiY2lwaGVydGV4dFwiOlwiSkVqUVRTeHhnV2FIY2l4UWxub1lpVFJ3ZzRfNUY4VTdINWJHZ3pwcHJGMFFINVl3RWFmNDA3UjRoaWc3QmlyM2t0aVNYVFMxXzlObGpCcE9iVWsydXhqOUROOE1COXJmOEl2RzdTWjZwb095OFhVaS1UWmx5ekR4UkUxUG9OZGJNc3BuMzNsa2ktU00zSFlRVzlIMWJfSjM5R0FJY1BCMGJRcFoxOXVzZ2p4S3J0Xzc1eC1IZ3FTMDZuVGJFcWt1UWxNNE54S0xDYjlCMy1PQ0tkQnA3aV81cVVPaUJpYXI1djVzeXVBNWlpQVJKQlp6OXo3RGFFanFhdG1IZS1UZmJuaXNseU8tRGVDUEFQa1M4MTVNd1lEUURBaTc0QXU0b18yNm42eXAyN3NNZmpWOU1nTmZvempyTGR2bWl6RDFYbHFGSnRNaHV5QjBpVHQzdnM4a2xYWS1nYldGZWtQOGtWVEhVV1lockU5MG5RM3dIMjJ6MDd4WE1tR1llLU55Z1NvbmJpVzZkWnRvSTFITTFad2ZoY0l0WGtVc1lCZ2hyR1MzNENtb2U4d04zcDRFaUdUOXlLWTM4SEpRbHdzdWcxUFNEalZPWlJNdzl2U0ZBM2VfVTNsNExiOGJRbmtvVktmYXFNc1VhLW9KNHlfRk1Ec0huZVZGaHZpSUl1bF9kZ29UNWVBU2Mzc0pkSGlUQ05EbGUwUjBfMm5LckhvLXNtUnI0dko1RHB2YnE5WVhYZ
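
The expiry check in `getToken` depends on the `expiry_tm` field that `saveToken` adds. The sketch below isolates that decision from file and network access so it can be exercised on its own. The function name `token_action` and the returned labels are illustrative assumptions, not part of the RDP API.

```python
import time

def token_action(tkn, now=None):
    """Decide what to do with a cached token object.

    Returns "use" if the access token is still valid, "refresh" if it has
    expired but a refresh token is present, and "password" otherwise.
    """
    now = time.time() if now is None else now
    if not tkn:
        return "password"
    if tkn.get("expiry_tm", 0) > now:
        return "use"
    if tkn.get("refresh_token"):
        return "refresh"
    return "password"
```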

## Request Available File Sets

The purpose of the ESG Bulk service is to deliver ESG content in bulk. The content is available as:

•	A full JSON data file containing history for all measures and all organizations.
•	A delta JSON data file that contains only incremental changes to the universe since last week. 

A customer can examine the available File Sets and is expected to:

•	Build the initial ESG representation with the full files
•	Apply delta changes as they become available
•	Fill a gap if a retrieval was not completed and the missed content remains available

This step serves to verify the type of the file, for example:
•	ESGRawFullScheme
•	ESGScoresFull
•	ESGScoresWealthFull
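
To decide which File Sets to load, it can help to separate initial (full) sets from deltas. The sketch below assumes, based only on File Set names of the form seen in the responses in this example (such as `RFT-ESG-Raw-Full-SchemeA-Init-2020-10-04`), that a name carries an `Init` or `Delta` marker followed by a date. This naming is not a documented contract, and `classify_fileset` and `deltas_since` are hypothetical helpers.

```python
import re
from datetime import date

# Assumed pattern: "-Init-YYYY-MM-DD" or "-Delta-YYYY-MM-DD" somewhere in the name
_NAME_RE = re.compile(r"-(Init|Delta)-(\d{4})-(\d{2})-(\d{2})")

def classify_fileset(name):
    """Return (kind, date) for a File Set name, or None if it does not match."""
    m = _NAME_RE.search(name)
    if m is None:
        return None
    kind = "full" if m.group(1) == "Init" else "delta"
    return kind, date(int(m.group(2)), int(m.group(3)), int(m.group(4)))

def deltas_since(names, last_loaded):
    """Names of delta File Sets dated after last_loaded, oldest first."""
    picked = []
    for name in names:
        info = classify_fileset(name)
        if info and info[0] == "delta" and info[1] > last_loaded:
            picked.append((info[1], name))
    return [name for _, name in sorted(picked)]
```

For instance, `deltas_since(names, date(2020, 10, 5))` would return only the delta sets dated after 5 October 2020, in the order they should be applied.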

In [3]:
FILESET_ENDPOINT = RDP_BASE_URL+'/file-store'+RDP_ESG_BULK_VERSION + '/file-sets?bucket='+ RDP_ESG_BUCKET
FILESET_ID = ''
WHICH_FILESET_ID_TO_STORE = 2   # 10 are returned per page

def requestFileSets(token, withNext, skipToken):   
    global FILESET_ENDPOINT
    global FILESET_ID
    global WHICH_FILESET_ID_TO_STORE   
    
    print("Obtaining FileSets in ESG Bucket...")
  
    querystring = {}
    payload = ""
    jsonfull = ""
    jsonpartial = ""
    
    headers = {
            'Content-Type': "application/json",
            'Authorization': "Bearer " + token,
            'cache-control': "no-cache"
    }

    # Build the request URL locally so that repeated calls do not keep
    # appending skipToken parameters to the global endpoint
    url = FILESET_ENDPOINT
    if withNext:
        url = url + '&skipToken=' + skipToken

    response = requests.request("GET", url, data=payload, headers=headers, params=querystring)

    if response.status_code != 200:
        if response.status_code == 401:   # error when token expired
                accessToken = getToken();     # token refresh on token expired
                headers['Authorization'] = "Bearer " + accessToken
                response = requests.request("GET", url, data=payload, headers=headers, params=querystring)
         
    print('Raw response=');
    print(response);
    
    if response.status_code == 200:
        jsonFullResp = json.loads(response.text)
        print('Parsed json response=');
        print(json.dumps(jsonFullResp, indent=2));
        # Store the FileSet ID at index WHICH_FILESET_ID_TO_STORE (0-based) for future reference
        FILESET_ID = jsonFullResp['value'][WHICH_FILESET_ID_TO_STORE]['id']
        print('FILESET_ID stored is: '+str(FILESET_ID))
        return jsonFullResp; 
    else:
        return '';

jsonFullResp = requestFileSets(accessToken, False, '');

Obtaining FileSets in ESG Bucket...
Raw response=
<Response [200]>
Parsed json response=
{
  "value": [
    {
      "id": "404c-5175-8d907fa0-8597-d822c04dad3c",
      "name": "RFT-ESG-Sources-Full-Delta-2020-10-11",
      "bucketName": "ESG",
      "packageId": "4308-bc80-2054dc20-83db-6224911311d0",
      "attributes": [
        {
          "name": "ContentType",
          "value": "ESG Sources"
        },
        {
          "name": "ResultCount",
          "value": "35207"
        }
      ],
      "files": [
        "42ee-ba35-123eeb53-9182-df324471a316"
      ],
      "numFiles": 1,
      "availableFrom": "2020-10-11T14:17:42Z",
      "availableTo": "2020-10-25T14:17:41Z",
      "status": "READY",
      "created": "2020-10-11T14:17:42Z",
      "modified": "2020-10-11T14:17:46Z"
    },
    {
      "id": "415e-343d-38921088-8c83-ea2187158f2b",
      "name": "RFT-ESG-Raw-Full-SchemeA-Init-2020-10-04",
      "bucketName": "ESG",
      "packageId": "4e6a-ca79-af368ff5-931d-d2781b1cdb85

### Paginate Through the Available FileSets
(interrupt at any point)
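
Paging follows the `@nextLink` value of each response until it is absent. As a sketch, this can be expressed as a generator. Here `fetch_page` is a hypothetical callable that takes a URL and returns the parsed JSON body, standing in for the authenticated request used in this example, and treating `@nextLink` as a path relative to the base URL is an assumption drawn from the sample output.

```python
def iter_filesets(fetch_page, base_url, first_path):
    """Yield every File Set entry across all pages.

    fetch_page(url) must return a dict with a "value" list and,
    when more pages exist, an "@nextLink" path.
    """
    path = first_path
    while path:
        page = fetch_page(base_url + path)
        for entry in page.get("value", []):
            yield entry
        # A missing "@nextLink" yields None, which ends the loop
        path = page.get("@nextLink")
```

Because the generator is lazy, the caller can stop iterating at any point without requesting the remaining pages.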

In [4]:
i = 1
while "@nextLink" in jsonFullResp:
    nextLink = jsonFullResp['@nextLink']
    # Take everything after 'skipToken=' instead of assuming a fixed token length
    skipToken = nextLink.split('skipToken=', 1)[1]
    print('<<< Iteration: '+str(i)+' >>>  More exists: '+ nextLink + ', skipToken is: ' + skipToken+'\n')
    jsonFullResp = requestFileSets(accessToken, True, skipToken)
    i+=1
print('Last response without next=')
print(jsonFullResp)

<<< Iteration: 1 >>>  More exists: /file-store/v1/file-sets?bucket=ESG&skipToken=ZmlsZXNldElkPTQ3YzMtZmI2My01ZjNhNTBjZi04ZWRlLWZhZWEzNTRkODg5Nw, skipToken is: ZmlsZXNldElkPTQ3YzMtZmI2My01ZjNhNTBjZi04ZWRlLWZhZWEzNTRkODg5Nw

Obtaining FileSets in ESG Bucket...
Raw response=
<Response [200]>
Parsed json response=
{
  "value": [
    {
      "id": "47c4-8070-238f1417-9729-17fd5f6a261d",
      "name": "RFT-ESG-Symbology_Organization-Delta-2020-10-06T09:52:37.976Z",
      "bucketName": "ESG",
      "packageId": "4961-f8c5-20beccf6-850e-fd549b242fb5",
      "attributes": [
        {
          "name": "ContentType",
          "value": "Symbology"
        }
      ],
      "files": [
        "4aaa-0669-d76010ff-990a-b5d82c16d86e"
      ],
      "numFiles": 1,
      "availableFrom": "2020-10-06T10:01:33Z",
      "availableTo": "2020-11-06T10:01:33Z",
      "status": "READY",
      "created": "2020-10-06T10:01:33Z",
      "modified": "2020-10-06T10:01:35Z"
    },
    {
      "id": "4824-d06e-f1734

### Retrieve Complete File Details of FileSet ID

In a previous step we stored a FileSet ID, which we now use to demonstrate this feature.
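
A File Set can consist of several files; the sample output for this step shows file names ending in `-part03` and `-part04`. It may therefore be useful to summarize the response before downloading. A minimal sketch, assuming the response shape seen in that output (a `value` list whose entries carry `id`, `filename` and `fileSizeInBytes`); `summarize_files` is a hypothetical helper.

```python
def summarize_files(files_response):
    """Return (entries, total_bytes).

    entries is a list of (filename, id) tuples sorted by filename;
    total_bytes sums fileSizeInBytes, counting a missing size as 0.
    """
    files = files_response.get("value", [])
    entries = sorted((f["filename"], f["id"]) for f in files)
    total_bytes = sum(f.get("fileSizeInBytes", 0) for f in files)
    return entries, total_bytes
```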

In [5]:
FILES_ENDPOINT = RDP_BASE_URL+'/file-store'+RDP_ESG_BULK_VERSION + '/files?filesetId='+ FILESET_ID
WHICH_FILE_ID_TO_STORE = 0
FILE_ID = ''
 
def requestFileDetails(token):   
    global FILES_ENDPOINT
    global FILE_ID
    print("Obtaining File details..." )
  
    querystring = {}
    payload = ""
    jsonfull = ""
    jsonpartial = ""
    
    headers = {
            'Content-Type': "application/json",
            'Authorization': "Bearer " + token,
            'cache-control': "no-cache"
    }
        
    response = requests.request("GET", FILES_ENDPOINT, data=payload, headers=headers, params=querystring)
    
    if response.status_code != 200:
        if response.status_code == 401:   # error when token expired
                accessToken = getToken();     # token refresh on token expired
                headers['Authorization'] = "Bearer " + accessToken
                response = requests.request("GET", FILES_ENDPOINT, data=payload, headers=headers, params=querystring)
         
    print('Raw response=');
    print(response);
    
    if response.status_code == 200:
        jsonFullResp = json.loads(response.text)
        print('Parsed json response=');
        print(json.dumps(jsonFullResp, indent=2));
        FILE_ID = jsonFullResp['value'][WHICH_FILE_ID_TO_STORE]['id']
        print('FILE_ID stored is: '+str(FILE_ID))
        return jsonFullResp; 
    else:
        return '';

jsonFullResp = requestFileDetails(accessToken);

Obtaining File details...
Raw response=
<Response [200]>
Parsed json response=
{
  "value": [
    {
      "id": "40b6-66a9-b2e24264-951a-b9c2df0a8ef5",
      "filename": "RFT-ESG-Sources-Full-Init-2020-10-11-part03.jsonl.gz",
      "filesetId": "484d-58cc-5ded2cac-b4d0-78f478570724",
      "storageLocation": {
        "url": "https://a206464-prod-esg.s3.amazonaws.com/ESG_Sources/2020/10/11/RFT-ESG-Sources-Full-Init-2020-10-11-part03.jsonl.gz",
        "@type": "s3"
      },
      "created": "2020-10-11T18:00:23Z",
      "modified": "2020-10-11T18:00:23Z",
      "href": "https://api.refinitiv.com/file-store/v1/files/40b6-66a9-b2e24264-951a-b9c2df0a8ef5/stream",
      "fileSizeInBytes": 101548304
    },
    {
      "id": "4177-abba-f3666994-9695-0680773c26ef",
      "filename": "RFT-ESG-Sources-Full-Init-2020-10-11-part04.jsonl.gz",
      "filesetId": "484d-58cc-5ded2cac-b4d0-78f478570724",
      "storageLocation": {
        "url": "https://a206464-prod-esg.s3.amazonaws.com/ESG_Sources/2

### Stream File via File Id using Redirect

In [6]:
import shutil

FILES_STREAM_ENDPOINT = RDP_BASE_URL+'/file-store'+RDP_ESG_BULK_VERSION + '/files/'+ FILE_ID+ '/stream'
 
def requestFileDownload(token):   
    global FILES_STREAM_ENDPOINT
    print("Obtaining File ... " + FILES_STREAM_ENDPOINT)
  
    filename = FILE_ID + '.gz'
    chunk_size = 1000
    
    headers = {
            'Authorization': 'Bearer ' + token,
            'cache-control': "no-cache",
            'Accept': '*/*'
    }
        
    response = requests.request("GET", FILES_STREAM_ENDPOINT, headers=headers, stream=True, allow_redirects=True)
    
    if response.status_code != 200:
        if response.status_code == 401:   # error when token expired
                accessToken = getToken();     # token refresh on token expired
                headers['Authorization'] = "Bearer " + accessToken
                response = requests.request("GET",FILES_STREAM_ENDPOINT, headers=headers, stream=True, allow_redirects=True)

         
    print('Response code=' + str(response.status_code));
    
    if response.status_code == 200:
        print('Processing...')
        with open(filename, 'wb') as fd:
            shutil.copyfileobj(response.raw, fd) 
#            for chunk in response.raw:
#                fd.write(chunk)
        print('Look for gzipped file named: '+ filename + ' in current directory')
        response.connection.close()
        
    return; 


requestFileDownload(accessToken);

Obtaining File ... https://api.refinitiv.com/file-store/v1/files/40b6-66a9-b2e24264-951a-b9c2df0a8ef5/stream
Response code=200
Processing...
Look for gzipped file named: 40b6-66a9-b2e24264-951a-b9c2df0a8ef5.gz in current directory
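
The downloaded file can then be read without unpacking it to disk first. The sketch below assumes the content is gzip-compressed JSON Lines (one JSON object per line), which the `.jsonl.gz` suffix of the file names suggests but which this example does not verify. `iter_jsonl_gz` is a hypothetical helper.

```python
import gzip
import json

def iter_jsonl_gz(path):
    """Yield one parsed JSON object per non-empty line of a gzipped JSONL file."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                yield json.loads(line)
```

Reading line by line keeps memory use roughly constant, which matters for files of the size reported in `fileSizeInBytes`.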


### Get File Location (Step 1 of 2)

In [42]:
import shutil

FILES_STREAM_ENDPOINT = RDP_BASE_URL+'/file-store'+RDP_ESG_BULK_VERSION + '/files/'+ FILE_ID+ '/stream?doNotRedirect=true'
DIRECT_URL = ''
 
def requestFileLocation(token):   
    
    global FILES_STREAM_ENDPOINT
    global DIRECT_URL
    
    print("Obtaining File ... " + FILES_STREAM_ENDPOINT)
  
    filename = FILE_ID + '.gz'
    chunk_size = 1000
    
    headers = {
            'Authorization': 'Bearer ' + token,
            'cache-control': "no-cache",
            'Accept': '*/*'
    }
        
    response = requests.request("GET", FILES_STREAM_ENDPOINT, headers=headers, stream=False, allow_redirects=False)
    
    if response.status_code != 200:
        if response.status_code == 401:   # error when token expired
                accessToken = getToken();     # token refresh on token expired
                headers['Authorization'] = "Bearer " + accessToken
                response = requests.request("GET",FILES_STREAM_ENDPOINT, headers=headers, stream=False, allow_redirects=False)

         
    print('Response code=' + str(response.status_code));
    
    if response.status_code == 200:
        jsonFullResp = json.loads(response.text)
        print('Parsed json response=');
        print(json.dumps(jsonFullResp, indent=2));
        DIRECT_URL = jsonFullResp['url'];
        print('File Direct URL is: '  +str(DIRECT_URL)+ '|||');
        
    return; 


requestFileLocation(accessToken);

Obtaining File ... https://api.refinitiv.com/file-store/v1/files/40b6-66a9-b2e24264-951a-b9c2df0a8ef5/stream?doNotRedirect=true
Reading the token from: token.txt
Token expired, refreshing a new one...
Saving the new token
Token is: eyJ0eXAiOiJhdCtqd3QiLCJhbGciOiJSUzI1NiIsImtpZCI6ImRMdFd2Q0tCSC1NclVyWm9YMXFod2pZQ2t1eDV0V2ZSS2o4ME9vcjdUY28ifQ.eyJkYXRhIjoie1wiY2lwaGVydGV4dFwiOlwiTUZHTlNqS1VUUUJ4WkE5czJoYjF2ZXFvcGswaHR1dlMwQkpNdlhPVlo2dVlwUE51ODhUTEZGMUViYV9SUUVVTFc3QmFrNHRTNWRHTzBWU0FLMk9HaTJSZ0pKYlAxZUZhMXhWSlFnRkxjdFpWTTF4X1gtLV9OS042NHpuemhNR3JNd09xU042dGpxWXB5RXRxSUo4NVdHdnVlSG8zYURLXzBSNTZqWHlSSnk0ME1QZThGRW43YkZXSVBoWnhPX01qcFJpaklHQ0lqam1lVjRuOVdlaUxadHNNZ1hHbUF5ZkpfbnVQLVNpTklLNkt6ZkNLMlR5UFVCNnhaS1FWV2xDaTN4SWRzZ2xOZ2hjMGlzTHRBck9VWFpua3hpVmxxU2dIemNJZ0NBbG1VOWE2MHZvTHRHUzlzcmpLNWhDYTkyM3BRV3VHdVd2RC1yVDNsVEU0bEhQR1FoVVYyM3FIc0dDRUt1aGd6N2FKajZpN1F1dW13cnFaYzEtVWdyRWdhOUpzLWlZQ2dUSG5sOFN3ejhobUdvOHJJcTlrcmNkckxRMGVJLU5VMkVxRDBlOG1HektlZFptTHM3QkplbXpQYXBQbms0WjFaZkMtV0NMNDBIOWY2T
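
The output shows the token being refreshed in the middle of a request. Each request function in this example repeats the same retry-on-401 logic; as a sketch, it could be factored into one helper. Here `send` and `get_token` are hypothetical callables (in this notebook they would wrap `requests.request` and `getToken`), injected so that the helper itself has no network dependency.

```python
def send_with_refresh(send, get_token, token):
    """Call send(token); on HTTP 401, obtain a new token and retry once.

    send(token) must return an object with a status_code attribute.
    Returns (response, token_in_use) so the caller can keep the new token.
    """
    response = send(token)
    if response.status_code == 401:
        token = get_token()
        response = send(token)
    return response, token
```

Returning the token alongside the response lets the caller reuse the refreshed token for later requests instead of triggering another 401.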

### Download File From File Location (Step 2 of 2)

In [50]:
from urllib.parse import urlparse, parse_qs
def requestDirectFileDownload(token):   
    
    global DIRECT_URL
    print("Obtaining File from URL... " + DIRECT_URL)
    
    #Parse out URL parameters for submission into requests
    url_obj = urlparse(DIRECT_URL)
    parsed_params = parse_qs(url_obj.query)
    # extract the URL without query parameters
    parsed_url = url_obj._replace(query='').geturl()

    response = requests.get(parsed_url, params=parsed_params,stream=True)

    if response.status_code != 200:
        # No bearer token is sent to the storage URL, so refreshing the RDP
        # token cannot help here. The signed URL has most likely expired:
        # re-run Step 1 to obtain a fresh location, then retry.
        print('Download failed; re-run Step 1 to obtain a fresh file location')

         
    print('Response code=' + str(response.status_code));        
  
    filename = FILE_ID + 'DIRECT.gz'    
    
    if response.status_code == 200:
        print('Processing...')
        with open(filename, 'wb') as fd:
            shutil.copyfileobj(response.raw, fd) 

        print('Look for gzipped file named: '+ filename + ' in current directory')
        response.connection.close()
        
    return; 


requestDirectFileDownload(accessToken);

Obtaining File from URL... https://a206464-prod-esg.s3.amazonaws.com/ESG_Sources/2020/10/11/RFT-ESG-Sources-Full-Init-2020-10-11-part03.jsonl.gz?x-eds-uuid=GENTC-25929&x-cfs-claimCheckId=40b6-66a9-b2e24264-951a-b9c2df0a8ef5&x-eds-requestId=897ed315-8ccb-46ef-ba63-8e4c29508bb6&x-eds-appId=GE-A-01103867-3-603&x-event-external-name=cfs-claimCheck-download&X-Amz-Security-Token=IQoJb3JpZ2luX2VjEIv%2F%2F%2F%2F%2F%2F%2F%2F%2F%2FwEaCXVzLWVhc3QtMSJHMEUCIQDdxWsmFupAiOp1RJG6zkxy4m8QtskFtCEKvlD2rKEgwwIgIil2O7hP3YTJaAVQ7ipZ1%2FrePcMz10iop5stbK5H2soqwQMIw%2F%2F%2F%2F%2F%2F%2F%2F%2F%2F%2FARABGgw2NDIxNTcxODEzMjYiDIlbuRd4AWzD7cqtySqVA1zUcf7weM52tOAyZn3frYJJsto6DdRPZpF8ror67ChNEjH1x4o78yI864VBIPpr67HFF1XVfDyDqJNx3ry6ZEs2hgVD1P9bCKrYe1GoC8Uy25LIor1h4WeqC2nnw5h1Lmc2Cig8IIAZVn6c0WrBijjIpyXKIGegq6ub74aXB%2Fl9RpmmMJF1n9BILVaZ9z0d20a8FqSiElfUmVF3qhoHaqbbgqEgI%2FHi43HsY0UaVHmEflSmA3GTzMCzHyqJiifWdJfWnbqnCz0tD%2FcfHYT1RMGHfNo%2BCS2WCmeacDfiUOl5tsn2%2FSjU1H8QstjHFThiR9pxs3DhUflWseOICOC654xqsgEy%2B484F9is17JT7VLF
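
The direct URL appears to carry its authorization in the query string, which is why the download request sends no bearer header. A small sketch of how `urlparse` and `parse_qs` split such a URL follows; the URL in it is a shortened, made-up stand-in rather than a real file location, and `split_url` is a hypothetical helper.

```python
from urllib.parse import urlparse, parse_qs

def split_url(url):
    """Return (url_without_query, params); percent-encoding is decoded in params."""
    parts = urlparse(url)
    params = parse_qs(parts.query)
    return parts._replace(query="").geturl(), params

base, params = split_url(
    "https://example.com/path/file.jsonl.gz?x-id=abc&token=a%2Fb%2Bc"
)
print(base)             # https://example.com/path/file.jsonl.gz
print(params["token"])  # ['a/b+c']
```

Note that `parse_qs` decodes the values and `requests` re-encodes them when they are passed as `params`, so the request URL is rebuilt rather than sent verbatim.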