# Introduction to ESG Bulk - Python

ESG stands for Environmental, Social and (Corporate) Governance data.

Refinitiv Data Platform (RDP) provides simple web-based API access to a broad range of content, including ESG content and ESG content in bulk.

With the growing popularity of socially conscious investing, Refinitiv offers one of the most comprehensive Environmental, Social and Governance (ESG) databases in the industry, covering over 80% of global market cap across more than 450 different ESG metrics, with history going back to 2002. Customers looking to download our ESG content can do so through our bulk API service in Refinitiv Data Platform (RDP). RDP is a cloud-based API that provides a single access point to all Refinitiv content.

ESG data is the first content made available in our bulk API service, known as Client File Store (CFS). This capability allows our customers to download our entire history of ESG coverage. To learn more about how the ESG Bulk Service works in Refinitiv Data Platform, please visit:

https://developers.refinitiv.com/refinitiv-data-platform/refinitiv-data-platform-apis/docs

Within the RDP family of services, ESG Bulk is part of the Client File Store (CFS) based section of the service. Find out more at:

https://developers.refinitiv.com/en/api-catalog/refinitiv-data-platform/refinitiv-data-platform-apis

Let us now focus on the programmatic interaction with the ESG Bulk RDP service.

In [19]:
import requests, json, time, getopt, sys
import pandas as pd

### Set Valid Credentials 

Valid RDP credentials are required to proceed:
* USERNAME
* PASSWORD
* CLIENT_ID

To read your valid credentials from a file (which can be shared by many code examples), leave the code below as is.

To provide credentials in place:
* replace the placeholder credentials with your valid assigned credentials
* comment out the read-from-file step, that is, the call to readCredsFromFile

In [20]:
USERNAME = "VALIDUSER"
PASSWORD = "VALIDPASSWORD"
CLIENT_ID = "SELFGENERATEDCLIENTID"

def readCredsFromFile(filePathName):
    """Read valid credentials from a file, one per line."""
    global USERNAME, PASSWORD, CLIENT_ID
    # The with-block closes the file even if a read fails
    with open(filePathName, "r") as credFile:
        USERNAME = credFile.readline().rstrip('\n')    # RDP machine ID
        PASSWORD = credFile.readline().rstrip('\n')    # long password
        CLIENT_ID = credFile.readline().rstrip('\n')   # generated client ID

# Raw string, so the backslashes in the Windows-style path are taken literally
readCredsFromFile(r"..\creds\credFileHuman.txt")

# Uncomment - to make sure that creds are either set in code or read in correctly
#print("USERNAME="+str(USERNAME))
#print("PASSWORD="+str(PASSWORD))
#print("CLIENT_ID="+str(CLIENT_ID))
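
As an alternative to a shared credentials file, the same three values could be taken from environment variables. The sketch below is only an illustration: the function `read_creds_from_env` and the variable names `RDP_USERNAME`, `RDP_PASSWORD` and `RDP_CLIENT_ID` are hypothetical and not defined by RDP.

```python
import os

# Hypothetical variable names; pick whatever suits your environment.
_ENV_NAMES = ("RDP_USERNAME", "RDP_PASSWORD", "RDP_CLIENT_ID")

def read_creds_from_env(env=os.environ):
    """Return (username, password, client_id) read from a mapping of environment variables."""
    missing = [name for name in _ENV_NAMES if not env.get(name)]
    if missing:
        # Fail early with a clear message instead of sending empty credentials
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return tuple(env[name] for name in _ENV_NAMES)

# Usage (assumes the three variables have been exported beforehand):
# USERNAME, PASSWORD, CLIENT_ID = read_creds_from_env()
```

Passing the mapping as a parameter keeps the helper easy to try out with a plain dictionary before relying on the real environment.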

### Set Application Constants

In [41]:
# Set Application Constants
RDP_AUTH_VERSION = "/v1"
RDP_ESG_BULK_VERSION = "/v1"
RDP_BASE_URL = "https://api.refinitiv.com"
RDP_ESG_BUCKET = "ESG"
CATEGORY_URL = "/auth/oauth2"
ENDPOINT_URL = "/token"
CLIENT_SECRET = ""
TOKEN_FILE = "token.txt"
SCOPE = "trapi"
FILESET_ID = ''
PACKAGE_ID = ''

### Define Token Handling and Obtain a Valid Token

A valid token is a prerequisite for requesting any RDP content; it is passed into each of the following steps.

In [42]:
TOKEN_ENDPOINT = RDP_BASE_URL + CATEGORY_URL + RDP_AUTH_VERSION + ENDPOINT_URL

def _requestNewToken(refreshToken):
    if refreshToken is None:
        tData = {
            "username": USERNAME,
            "password": PASSWORD,
            "grant_type": "password",
            "scope": SCOPE,
            "takeExclusiveSignOnControl": "true"
        };
    else:
        tData = {
            "refresh_token": refreshToken,
            "grant_type": "refresh_token",
        };

    # Make a REST call to get latest access token
    response = requests.post(
        TOKEN_ENDPOINT,
        headers = {
            "Accept": "application/json"
        },
        data = tData,
        auth = (
            CLIENT_ID,
            CLIENT_SECRET
        )
    )
    
    if response.status_code != 200:
        raise Exception("Failed to get access token {0} - {1}".format(response.status_code, response.text));

    # Return the new token
    return json.loads(response.text);

def saveToken(tknObject):
    print("Saving the new token")
    # Append the expiry time to token
    tknObject["expiry_tm"] = time.time() + int(tknObject["expires_in"]) - 10
    # Store it in the file; the with-block flushes and closes the file
    with open(TOKEN_FILE, "w+") as tf:
        json.dump(tknObject, tf, indent=4)
    
def getToken():
    try:
        print("Reading the token from: " + TOKEN_FILE)
        # Read the token from a file; the with-block closes it on every path
        with open(TOKEN_FILE, "r") as tf:
            tknObject = json.load(tf)

        # Is access token valid
        if tknObject["expiry_tm"] > time.time():
            # return access token
            return tknObject["access_token"]

        print("Token expired, refreshing a new one...")
        # Get a new token from refresh token
        tknObject = _requestNewToken(tknObject["refresh_token"])

    except Exception as exp:
        print("Caught exception: " + str(exp))
        print("Getting a new token using Password Grant...");
        tknObject = _requestNewToken(None);

    # Persist this token for future queries
    saveToken(tknObject)
#    print("Token is: " + tknObject["access_token"])
    # Return access token
    return tknObject["access_token"];

accessToken = getToken();
print("Have token now");
print("Token is: " + accessToken)

Reading the token from: token.txt
Have token now
Token is: eyJ0eXAiOiJhdCtqd3QiLCJhbGciOiJSUzI1NiIsImtpZCI6ImRMdFd2Q0tCSC1NclVyWm9YMXFod2pZQ2t1eDV0V2ZSS2o4ME9vcjdUY28ifQ.eyJkYXRhIjoie1wiY2lwaGVydGV4dFwiOlwiYmdpY3BYLWp2T2tsSnR5ZlloSTVnVHZKY0pNbDF6TnVCUTlQcmFzLWgtQTd3U185akUwelVKYWxza0pSR3NPRGx6SXdUSnBzUlFyemFEd2RvSE54anJpYklYT2U2R3BGbXg1czlHWlJZWVZLaUJrN2M5SjQ3OExlQ1haM1pWTEVZdGhSVWRJUlBoSkY5dzRKYS04MW41YXdYQWViUktHbktyQ0VkWThUTHk5Q3JmeVpWTVdCN1RHakctZ0N3Z0dFelRuZVFmcm51Z0VTZ1pxWjFoZU5hek1FVlpFcnFTNmNnR3oxaUI5Z05lUV9LWjY4aVo2X0FMd0J4c1hocDJNdTFDeExYRGp2bUZidDRDMFE4aHBveGVFSzRQRXI5al9RSnpxZktfczFWaW1EMHhtdDZ0UWhsMkpRdlMzMmNkV05kM2twaW85ZldMS1pGWjB2NklZQjQydmhhOUhYUzZoaHVIOGRyQ3Y4QkJ4MDdpemk2LTcxR2lpdGFVOWFDLXoxREdjQmFZSGZDTE4tZkdDMGV1LUJnRW5VRWxjRlI1ZFR1UnBUUG9KNG1aR3hCLUFNdEt0MnVLMUVMVVdRR0l2X3diNFNHTDRkWHpBWDVDd3hUeERDQjN3SDlvUjdOdWQ4T0doczVzaE5Ma0FqUUplcU81Nzh4dkgwSkZubm5yYzZnTEFhZWxibUFTb29hZ0djQ2xsY2pwdFVGMkp2SkV3R1hvRDZoMlQyZHNrdGk3TXhfeDMxN0QwUHJISlA0VWVpcm5tN0hmUDdfeEc3bEY4RE5vUE
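
The expiry bookkeeping used by the token handling above can be looked at in isolation. The following is a minimal sketch, not part of the original notebook: the helper names `with_expiry` and `is_still_valid` are hypothetical, and the sample token response is made up. It stamps a token response with an absolute expiry time, less the same 10-second safety margin, and then checks it against a given clock value.

```python
import time

EXPIRY_MARGIN_SECONDS = 10  # same safety margin that saveToken subtracts

def with_expiry(token_obj, now=None):
    """Return a copy of the token response with an absolute 'expiry_tm' field added."""
    now = time.time() if now is None else now
    stamped = dict(token_obj)
    stamped["expiry_tm"] = now + int(token_obj["expires_in"]) - EXPIRY_MARGIN_SECONDS
    return stamped

def is_still_valid(token_obj, now=None):
    """True while the stamped expiry time lies in the future."""
    now = time.time() if now is None else now
    return token_obj.get("expiry_tm", 0) > now

# Made-up token response, for illustration only
sample_token = {"access_token": "abc", "refresh_token": "def", "expires_in": "300"}
stamped_token = with_expiry(sample_token, now=1000.0)
print(stamped_token["expiry_tm"])  # 1000 + 300 - 10
```

Accepting the current time as a parameter makes the expiry rule simple to check without waiting for a real token to expire.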

## Request Available ESG Bulk File Sets

The purpose of the ESG bulk service is to deliver ESG content in bulk. The content is available as:

* A full JSON data file containing history for all measures and all organizations.
* A delta JSON data file that contains only incremental changes to the universe since last week.

A customer can examine the available File Sets and is expected to:

* Build the initial ESG representation with the full files
* Apply deltas (changes) as they become available
* Fill a gap if a retrieval was not completed and the missed content remains available

This step can additionally be used to verify the availability and type of the file, for example:
* ESGRawFullScheme
* ESGScoresFull
* ESGScoresWealthFull

In [43]:
FILESET_ENDPOINT = RDP_BASE_URL+'/file-store'+RDP_ESG_BULK_VERSION + '/file-sets?bucket='+ RDP_ESG_BUCKET

def requestFileSets(token, withNext, skipToken, attributes):   
    global FILESET_ENDPOINT
     
    
    print("Obtaining FileSets in ESG Bucket...")
  
    FILESET_ENDPOINT = RDP_BASE_URL+'/file-store'+RDP_ESG_BULK_VERSION + '/file-sets?bucket='+ RDP_ESG_BUCKET
    
    querystring = {}
    payload = ""
    jsonfull = ""
    jsonpartial = ""
    
    headers = {
            'Content-Type': "application/json",
            'Authorization': "Bearer " + token,
            'cache-control': "no-cache"
    }

    if attributes:
        FILESET_ENDPOINT = FILESET_ENDPOINT + attributes
    if withNext:
        FILESET_ENDPOINT = FILESET_ENDPOINT + '&skipToken=' +skipToken
    
    print('GET '+FILESET_ENDPOINT )    
    response = requests.request("GET", FILESET_ENDPOINT, data=payload, headers=headers, params=querystring)
    
    if response.status_code != 200:
        if response.status_code == 401:   # error when token expired
                accessToken = getToken();     # token refresh on token expired
                headers['Authorization'] = "Bearer " + accessToken
                response = requests.request("GET", FILESET_ENDPOINT, data=payload, headers=headers, params=querystring)
         
    print('Raw response=');
    print(response);
    
    if response.status_code == 200:
        jsonFullResp = json.loads(response.text)        
        return jsonFullResp; 
    else:
        return '';

jsonFullResp = requestFileSets(accessToken, False, '','');

print('Parsed json response=');
print(json.dumps(jsonFullResp, indent=2));
print('Same response, tabular view');
df = pd.json_normalize(jsonFullResp['value'])
df

Obtaining FileSets in ESG Bucket...
GET https://api.refinitiv.com/file-store/v1/file-sets?bucket=ESG
Raw response=
<Response [200]>
Parsed json response=
{
  "value": [
    {
      "id": "4022-f0c0-e3968404-9146-1b496d4e11b4",
      "name": "RFT-ESG-Sources-Current-Delta-2021-01-10",
      "bucketName": "ESG",
      "packageId": "4867-9a46-216e838a-9241-8fc3561b51ef",
      "attributes": [
        {
          "name": "ContentType",
          "value": "ESG Sources"
        },
        {
          "name": "ResultCount",
          "value": "6888"
        }
      ],
      "files": [
        "4f94-ff55-90b727d0-b896-3ecd59dd8855"
      ],
      "numFiles": 1,
      "availableFrom": "2021-01-10T14:38:04Z",
      "availableTo": "2021-02-10T14:38:03Z",
      "status": "READY",
      "created": "2021-01-10T14:38:04Z",
      "modified": "2021-01-10T14:38:06Z"
    },
    {
      "id": "4053-80f9-53e95bc3-8d9f-5df48df990ad",
      "name": "RFT-ESG-Sources-Current-Init-2021-01-24",
      "bucketName

Unnamed: 0,id,name,bucketName,packageId,attributes,files,numFiles,availableFrom,availableTo,status,created,modified,contentFrom,contentTo
0,4022-f0c0-e3968404-9146-1b496d4e11b4,RFT-ESG-Sources-Current-Delta-2021-01-10,ESG,4867-9a46-216e838a-9241-8fc3561b51ef,"[{'name': 'ContentType', 'value': 'ESG Sources...",[4f94-ff55-90b727d0-b896-3ecd59dd8855],1,2021-01-10T14:38:04Z,2021-02-10T14:38:03Z,READY,2021-01-10T14:38:04Z,2021-01-10T14:38:06Z,,
1,4053-80f9-53e95bc3-8d9f-5df48df990ad,RFT-ESG-Sources-Current-Init-2021-01-24,ESG,4867-9a46-216e838a-9241-8fc3561b51ef,"[{'name': 'ResultCount', 'value': '3661884'}, ...","[4151-d940-34a4bd67-963a-83f5be5e7fea, 4180-65...",18,2021-01-24T16:07:34Z,2021-02-24T16:07:34Z,READY,2021-01-24T16:07:34Z,2021-01-24T16:09:28Z,1970-01-01T00:00:00Z,2021-01-24T14:30:00Z
2,40ad-6a3a-da6a7e51-b267-f0ee50973176,RFT-ESG-Raw-Current-SchemeA-init-2021-01-10,ESG,4c62-b05c-2a529a9d-81b3-224eacd50379,"[{'name': 'ContentType', 'value': 'ESG Raw Cur...",[499b-48d1-6b36b902-96a1-7fe9f241328e],1,2021-01-10T16:24:54Z,2021-02-10T16:24:53Z,READY,2021-01-10T16:24:54Z,2021-01-10T16:25:10Z,,
3,4125-f4ac-2a33f80c-a165-ca4355ddfcd9,RFT-ESG-Raw-Current-SchemeB-Env-Init-2021-01-10,ESG,4cbb-e27e-318835e3-bad7-dee7a0ebc3b0,"[{'name': 'ContentType', 'value': 'ESG Raw Cur...",[4909-fc14-65b8374a-b735-4ed171dd5609],1,2021-01-10T16:32:59Z,2021-02-10T16:32:59Z,READY,2021-01-10T16:32:59Z,2021-01-10T16:33:07Z,,
4,4126-fa19-763b8cb5-8ee9-97d59cec68b2,RFT-ESG-Scores-Full-Init-2021-01-24,ESG,42de-14b7-37470ec8-9087-ccd1a1bae75d,"[{'name': 'ContentType', 'value': 'ESG Scores'...",[42a0-9ec5-c0994aa3-b2cf-5b1666f56cbd],1,2021-01-24T13:37:14Z,2021-02-07T13:37:13Z,READY,2021-01-24T13:37:14Z,2021-01-24T13:37:19Z,1970-01-01T00:00:00Z,2021-01-24T13:25:00Z
5,4136-38a5-e4558369-9268-1f316425c539,RFT-ESG-Sources-Current-Init-2021-01-10,ESG,4867-9a46-216e838a-9241-8fc3561b51ef,"[{'name': 'ContentType', 'value': 'ESG Sources...","[4163-6eda-63cab95a-af2d-a2e3186e545c, 4224-cd...",17,2021-01-10T15:55:45Z,2021-02-10T15:55:44Z,READY,2021-01-10T15:55:45Z,2021-01-10T15:57:38Z,,
6,413f-8ba8-5f24ed54-90b5-7a32b3df34c6,RFT-ESG-Raw-Full-SchemeB-Social-Init-2021-01-24,ESG,4b6c-def9-dd1c991c-8535-6f9a61df9fc8,"[{'name': 'ContentType', 'value': 'ESG Raw Ful...",[4f0b-9f56-d3d8090f-bc07-8891c383ebd8],1,2021-01-24T13:05:48Z,2021-02-07T13:05:48Z,READY,2021-01-24T13:05:48Z,2021-01-24T13:06:24Z,1970-01-01T00:00:00Z,2021-01-24T12:45:00Z
7,419d-1b20-fc0da0f9-b878-1af4e72031dc,RFT-ESG-Sources-Current-Delta-2021-01-24,ESG,4867-9a46-216e838a-9241-8fc3561b51ef,"[{'name': 'ResultCount', 'value': '11552'}, {'...",[4fc6-7291-3872960d-8c2d-bc3913220d21],1,2021-01-24T14:37:48Z,2021-02-24T14:37:47Z,READY,2021-01-24T14:37:48Z,2021-01-24T14:37:50Z,2021-01-17T14:30:00Z,2021-01-24T14:30:00Z
8,41dd-35b3-6eeaa7c1-83d1-552caeadabd6,RFT-ESG-Sources-Current-Delta-2021-01-03,ESG,4867-9a46-216e838a-9241-8fc3561b51ef,"[{'name': 'ResultCount', 'value': '5337'}, {'n...",[4040-cda0-05ac7d8a-88ef-2507911da09b],1,2021-01-03T14:37:22Z,2021-02-03T14:37:22Z,READY,2021-01-03T14:37:22Z,2021-01-03T14:37:24Z,,
9,4207-a09c-d96045c2-b271-5f9e57fc3938,RFT-ESG-Raw-Current-SchemeA-init-2021-01-31,ESG,4c62-b05c-2a529a9d-81b3-224eacd50379,"[{'name': 'ContentType', 'value': 'ESG Raw Cur...",[4e95-71ae-a537afc9-b125-b8dc157ef5ad],1,2021-01-31T16:24:18Z,2021-02-28T16:24:17Z,READY,2021-01-31T16:24:18Z,2021-01-31T16:24:34Z,1970-01-01T00:00:00Z,2021-01-31T16:15:00Z
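
The workflow described earlier in this section (build from the full files, then apply deltas) can be sketched as a small planning step over the `value` list of one package. This is an illustration under assumptions: the `-Init-YYYY-MM-DD` and `-Delta-YYYY-MM-DD` naming convention is inferred from the sample output and is not a documented contract, and `plan_downloads` is a hypothetical helper.

```python
import re
from datetime import date

# Assumed naming convention, inferred from the sample file set names
_NAME_PATTERN = re.compile(r"-(init|delta)-(\d{4})-(\d{2})-(\d{2})$", re.IGNORECASE)

def plan_downloads(file_sets):
    """Return (latest Init file set id, ids of Delta file sets dated after it)."""
    inits, deltas = [], []
    for fs in file_sets:
        match = _NAME_PATTERN.search(fs["name"])
        if not match:
            continue  # name does not follow the assumed convention
        published = date(int(match.group(2)), int(match.group(3)), int(match.group(4)))
        target = inits if match.group(1).lower() == "init" else deltas
        target.append((published, fs["id"]))
    if not inits:
        return None, []
    latest_init = max(inits)
    later = sorted(d for d in deltas if d[0] > latest_init[0])
    return latest_init[1], [fs_id for _, fs_id in later]

# Ids and names taken from the sample outputs in this notebook (ESG Sources package)
sample_file_sets = [
    {"id": "4022-f0c0-e3968404-9146-1b496d4e11b4", "name": "RFT-ESG-Sources-Current-Delta-2021-01-10"},
    {"id": "4053-80f9-53e95bc3-8d9f-5df48df990ad", "name": "RFT-ESG-Sources-Current-Init-2021-01-24"},
    {"id": "419d-1b20-fc0da0f9-b878-1af4e72031dc", "name": "RFT-ESG-Sources-Current-Delta-2021-01-24"},
    {"id": "4c3e-1c06-d7b142e6-bec6-ede8be8ba70f", "name": "RFT-ESG-Sources-Current-Delta-2021-01-31"},
]
init_id, delta_ids = plan_downloads(sample_file_sets)
print(init_id, delta_ids)
```

The sketch skips a delta carrying the same date as the chosen Init file set, on the assumption that the Init already covers that period. Where present, the `contentFrom` and `contentTo` fields would be a more reliable basis for that decision than the name.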


### Select ESG FileSet Id

In [25]:
# Optionally, copy from result of "Present FileSet Results in Tabular View"
FILESET_ID = input()
print('FILESET_ID selected is: ' + FILESET_ID)

 4022-f0c0-e3968404-9146-1b496d4e11b4


FILESET_ID selected is: 4022-f0c0-e3968404-9146-1b496d4e11b4


### Select ESG PackageId

In [44]:
# Optionally, copy from result of "Present FileSet Results in Tabular View"
PACKAGE_ID = input()
print('PACKAGE_ID selected is: ' + PACKAGE_ID)

 4867-9a46-216e838a-9241-8fc3561b51ef


PACKAGE_ID selected is: 4867-9a46-216e838a-9241-8fc3561b51ef


### Paginate Through the Available FileSets
(interrupt at any point)

In [None]:
i = 1
while "@nextLink" in jsonFullResp: 
    print('<<< Iteration: '+str(i)+' >>>  More exists: '+ jsonFullResp['@nextLink'] + ', skipToken is: ' + jsonFullResp['@nextLink'][-62:]+'\n')
    jsonFullResp = requestFileSets(accessToken, True, jsonFullResp['@nextLink'][-62:],'');
    print(json.dumps(jsonFullResp, indent=2));
    i+=1;
print('Last response without next=');
print(json.dumps(jsonFullResp, indent=2));
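
The loop above takes the skipToken as the last 62 characters of `@nextLink`, which depends on the token keeping that exact length. A possibly more robust variant, sketched here as an assumption rather than as the documented approach, parses the query string instead. `extract_skip_token` is a hypothetical helper, and the sample link is made up in the general shape of the endpoint used in this notebook.

```python
from urllib.parse import urlparse, parse_qs

def extract_skip_token(next_link):
    """Return the skipToken query parameter of an @nextLink value, or None if absent."""
    query = parse_qs(urlparse(next_link).query)
    values = query.get("skipToken")
    return values[0] if values else None

# Made-up link for illustration; real links carry a much longer token
sample_link = "/file-store/v1/file-sets?bucket=ESG&skipToken=abc123"
print(extract_skip_token(sample_link))
```

Note that `parse_qs` percent-decodes the value and treats `+` as a space. If real tokens contain such characters, they would need to be re-encoded before being appended to the next request.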

### Retrieve FileSets of Specific File Type (Filter By Attribute and By PackageId)
The file types may change over time. At the time of this writing, the available FileSets are of types:

* ESG Raw Full A
* ESG Raw Full B
* ESG Raw Current A
* ESG Raw Current B
* ESG Sources
* ESG Raw Wealth Standard

* Symbology Cusip
* Symbology SEDOL
* Symbology Organization
* Symbology Instrument Quote

Further, when also filtering by packageId, the selected package has to contain files that match the filtering attributes for the listing request to succeed;
otherwise the result will be empty.


In [27]:
jsonFullResp = requestFileSets(accessToken, False, '','&attributes=ContentType:Symbology Cusip');
print('Parsed json response=');
print(json.dumps(jsonFullResp, indent=2));
print('Same response, tabular view');
df = pd.json_normalize(jsonFullResp['value'])
df

Obtaining FileSets in ESG Bucket...
FILESET_ENDPOINT=https://api.refinitiv.com/file-store/v1/file-sets?bucket=ESG&attributes=ContentType:Symbology Cusip
Raw response=
<Response [200]>
Parsed json response=
{
  "value": [
    {
      "id": "4022-f0c0-e3968404-9146-1b496d4e11b4",
      "name": "RFT-ESG-Sources-Current-Delta-2021-01-10",
      "bucketName": "ESG",
      "packageId": "4867-9a46-216e838a-9241-8fc3561b51ef",
      "attributes": [
        {
          "name": "ContentType",
          "value": "ESG Sources"
        },
        {
          "name": "ResultCount",
          "value": "6888"
        }
      ],
      "files": [
        "4f94-ff55-90b727d0-b896-3ecd59dd8855"
      ],
      "numFiles": 1,
      "availableFrom": "2021-01-10T14:38:04Z",
      "availableTo": "2021-02-10T14:38:03Z",
      "status": "READY",
      "created": "2021-01-10T14:38:04Z",
      "modified": "2021-01-10T14:38:06Z"
    },
    {
      "id": "4053-80f9-53e95bc3-8d9f-5df48df990ad",
      "name": "RFT-ESG

Unnamed: 0,id,name,bucketName,packageId,attributes,files,numFiles,availableFrom,availableTo,status,created,modified,contentFrom,contentTo
0,4022-f0c0-e3968404-9146-1b496d4e11b4,RFT-ESG-Sources-Current-Delta-2021-01-10,ESG,4867-9a46-216e838a-9241-8fc3561b51ef,"[{'name': 'ContentType', 'value': 'ESG Sources...",[4f94-ff55-90b727d0-b896-3ecd59dd8855],1,2021-01-10T14:38:04Z,2021-02-10T14:38:03Z,READY,2021-01-10T14:38:04Z,2021-01-10T14:38:06Z,,
1,4053-80f9-53e95bc3-8d9f-5df48df990ad,RFT-ESG-Sources-Current-Init-2021-01-24,ESG,4867-9a46-216e838a-9241-8fc3561b51ef,"[{'name': 'ResultCount', 'value': '3661884'}, ...","[4151-d940-34a4bd67-963a-83f5be5e7fea, 4180-65...",18,2021-01-24T16:07:34Z,2021-02-24T16:07:34Z,READY,2021-01-24T16:07:34Z,2021-01-24T16:09:28Z,1970-01-01T00:00:00Z,2021-01-24T14:30:00Z
2,40ad-6a3a-da6a7e51-b267-f0ee50973176,RFT-ESG-Raw-Current-SchemeA-init-2021-01-10,ESG,4c62-b05c-2a529a9d-81b3-224eacd50379,"[{'name': 'ContentType', 'value': 'ESG Raw Cur...",[499b-48d1-6b36b902-96a1-7fe9f241328e],1,2021-01-10T16:24:54Z,2021-02-10T16:24:53Z,READY,2021-01-10T16:24:54Z,2021-01-10T16:25:10Z,,
3,4125-f4ac-2a33f80c-a165-ca4355ddfcd9,RFT-ESG-Raw-Current-SchemeB-Env-Init-2021-01-10,ESG,4cbb-e27e-318835e3-bad7-dee7a0ebc3b0,"[{'name': 'ContentType', 'value': 'ESG Raw Cur...",[4909-fc14-65b8374a-b735-4ed171dd5609],1,2021-01-10T16:32:59Z,2021-02-10T16:32:59Z,READY,2021-01-10T16:32:59Z,2021-01-10T16:33:07Z,,
4,4126-fa19-763b8cb5-8ee9-97d59cec68b2,RFT-ESG-Scores-Full-Init-2021-01-24,ESG,42de-14b7-37470ec8-9087-ccd1a1bae75d,"[{'name': 'ContentType', 'value': 'ESG Scores'...",[42a0-9ec5-c0994aa3-b2cf-5b1666f56cbd],1,2021-01-24T13:37:14Z,2021-02-07T13:37:13Z,READY,2021-01-24T13:37:14Z,2021-01-24T13:37:19Z,1970-01-01T00:00:00Z,2021-01-24T13:25:00Z
5,4136-38a5-e4558369-9268-1f316425c539,RFT-ESG-Sources-Current-Init-2021-01-10,ESG,4867-9a46-216e838a-9241-8fc3561b51ef,"[{'name': 'ContentType', 'value': 'ESG Sources...","[4163-6eda-63cab95a-af2d-a2e3186e545c, 4224-cd...",17,2021-01-10T15:55:45Z,2021-02-10T15:55:44Z,READY,2021-01-10T15:55:45Z,2021-01-10T15:57:38Z,,
6,413f-8ba8-5f24ed54-90b5-7a32b3df34c6,RFT-ESG-Raw-Full-SchemeB-Social-Init-2021-01-24,ESG,4b6c-def9-dd1c991c-8535-6f9a61df9fc8,"[{'name': 'ContentType', 'value': 'ESG Raw Ful...",[4f0b-9f56-d3d8090f-bc07-8891c383ebd8],1,2021-01-24T13:05:48Z,2021-02-07T13:05:48Z,READY,2021-01-24T13:05:48Z,2021-01-24T13:06:24Z,1970-01-01T00:00:00Z,2021-01-24T12:45:00Z
7,419d-1b20-fc0da0f9-b878-1af4e72031dc,RFT-ESG-Sources-Current-Delta-2021-01-24,ESG,4867-9a46-216e838a-9241-8fc3561b51ef,"[{'name': 'ResultCount', 'value': '11552'}, {'...",[4fc6-7291-3872960d-8c2d-bc3913220d21],1,2021-01-24T14:37:48Z,2021-02-24T14:37:47Z,READY,2021-01-24T14:37:48Z,2021-01-24T14:37:50Z,2021-01-17T14:30:00Z,2021-01-24T14:30:00Z
8,41dd-35b3-6eeaa7c1-83d1-552caeadabd6,RFT-ESG-Sources-Current-Delta-2021-01-03,ESG,4867-9a46-216e838a-9241-8fc3561b51ef,"[{'name': 'ResultCount', 'value': '5337'}, {'n...",[4040-cda0-05ac7d8a-88ef-2507911da09b],1,2021-01-03T14:37:22Z,2021-02-03T14:37:22Z,READY,2021-01-03T14:37:22Z,2021-01-03T14:37:24Z,,
9,4207-a09c-d96045c2-b271-5f9e57fc3938,RFT-ESG-Raw-Current-SchemeA-init-2021-01-31,ESG,4c62-b05c-2a529a9d-81b3-224eacd50379,"[{'name': 'ContentType', 'value': 'ESG Raw Cur...",[4e95-71ae-a537afc9-b125-b8dc157ef5ad],1,2021-01-31T16:24:18Z,2021-02-28T16:24:17Z,READY,2021-01-31T16:24:18Z,2021-01-31T16:24:34Z,1970-01-01T00:00:00Z,2021-01-31T16:15:00Z


In [46]:
print('Requesting for packageId= '+PACKAGE_ID)
jsonFullResp = requestFileSets(accessToken, False, '','&packageId='+PACKAGE_ID); #+'&attributes=ContentType:ESG Sources');
print('Parsed json response=');
print(json.dumps(jsonFullResp, indent=2));
print('Same response, tabular view');
df = pd.json_normalize(jsonFullResp['value'])
df

Requesting for packageId= 4867-9a46-216e838a-9241-8fc3561b51ef
Obtaining FileSets in ESG Bucket...
GET https://api.refinitiv.com/file-store/v1/file-sets?bucket=ESG&packageId=4867-9a46-216e838a-9241-8fc3561b51ef
Reading the token from: token.txt
Token expired, refreshing a new one...
Saving the new token
Raw response=
<Response [200]>
Parsed json response=
{
  "value": [
    {
      "id": "4022-f0c0-e3968404-9146-1b496d4e11b4",
      "name": "RFT-ESG-Sources-Current-Delta-2021-01-10",
      "bucketName": "ESG",
      "packageId": "4867-9a46-216e838a-9241-8fc3561b51ef",
      "attributes": [
        {
          "name": "ContentType",
          "value": "ESG Sources"
        },
        {
          "name": "ResultCount",
          "value": "6888"
        }
      ],
      "files": [
        "4f94-ff55-90b727d0-b896-3ecd59dd8855"
      ],
      "numFiles": 1,
      "availableFrom": "2021-01-10T14:38:04Z",
      "availableTo": "2021-02-10T14:38:03Z",
      "status": "READY",
      "created": 

Unnamed: 0,id,name,bucketName,packageId,attributes,files,numFiles,availableFrom,availableTo,status,created,modified,contentFrom,contentTo
0,4022-f0c0-e3968404-9146-1b496d4e11b4,RFT-ESG-Sources-Current-Delta-2021-01-10,ESG,4867-9a46-216e838a-9241-8fc3561b51ef,"[{'name': 'ContentType', 'value': 'ESG Sources...",[4f94-ff55-90b727d0-b896-3ecd59dd8855],1,2021-01-10T14:38:04Z,2021-02-10T14:38:03Z,READY,2021-01-10T14:38:04Z,2021-01-10T14:38:06Z,,
1,4053-80f9-53e95bc3-8d9f-5df48df990ad,RFT-ESG-Sources-Current-Init-2021-01-24,ESG,4867-9a46-216e838a-9241-8fc3561b51ef,"[{'name': 'ResultCount', 'value': '3661884'}, ...","[4151-d940-34a4bd67-963a-83f5be5e7fea, 4180-65...",18,2021-01-24T16:07:34Z,2021-02-24T16:07:34Z,READY,2021-01-24T16:07:34Z,2021-01-24T16:09:28Z,1970-01-01T00:00:00Z,2021-01-24T14:30:00Z
2,4136-38a5-e4558369-9268-1f316425c539,RFT-ESG-Sources-Current-Init-2021-01-10,ESG,4867-9a46-216e838a-9241-8fc3561b51ef,"[{'name': 'ContentType', 'value': 'ESG Sources...","[4163-6eda-63cab95a-af2d-a2e3186e545c, 4224-cd...",17,2021-01-10T15:55:45Z,2021-02-10T15:55:44Z,READY,2021-01-10T15:55:45Z,2021-01-10T15:57:38Z,,
3,419d-1b20-fc0da0f9-b878-1af4e72031dc,RFT-ESG-Sources-Current-Delta-2021-01-24,ESG,4867-9a46-216e838a-9241-8fc3561b51ef,"[{'name': 'ResultCount', 'value': '11552'}, {'...",[4fc6-7291-3872960d-8c2d-bc3913220d21],1,2021-01-24T14:37:48Z,2021-02-24T14:37:47Z,READY,2021-01-24T14:37:48Z,2021-01-24T14:37:50Z,2021-01-17T14:30:00Z,2021-01-24T14:30:00Z
4,41dd-35b3-6eeaa7c1-83d1-552caeadabd6,RFT-ESG-Sources-Current-Delta-2021-01-03,ESG,4867-9a46-216e838a-9241-8fc3561b51ef,"[{'name': 'ResultCount', 'value': '5337'}, {'n...",[4040-cda0-05ac7d8a-88ef-2507911da09b],1,2021-01-03T14:37:22Z,2021-02-03T14:37:22Z,READY,2021-01-03T14:37:22Z,2021-01-03T14:37:24Z,,
5,4c3e-1c06-d7b142e6-bec6-ede8be8ba70f,RFT-ESG-Sources-Current-Delta-2021-01-31,ESG,4867-9a46-216e838a-9241-8fc3561b51ef,"[{'name': 'ContentType', 'value': 'ESG Sources...",[458e-bd1e-79214be1-a7bd-ddb4ae8af362],1,2021-01-31T14:37:26Z,2021-02-28T14:37:26Z,READY,2021-01-31T14:37:26Z,2021-01-31T14:37:28Z,2021-01-24T14:30:00Z,2021-01-31T14:30:00Z
6,4d8c-562e-68011f72-ad21-39ddbd8f0bf9,RFT-ESG-Sources-Current-Delta-2021-01-17,ESG,4867-9a46-216e838a-9241-8fc3561b51ef,"[{'name': 'ContentType', 'value': 'ESG Sources...",[46b0-af7d-344e487c-bbca-b2ad1d0469d8],1,2021-01-18T08:13:46Z,2021-02-18T08:13:45Z,READY,2021-01-18T08:13:46Z,2021-01-18T08:13:48Z,,
7,4dcd-546e-12f8a9f2-bab4-5b7ab085d4a7,RFT-ESG-Sources-Current-Init-2021-01-17,ESG,4867-9a46-216e838a-9241-8fc3561b51ef,"[{'name': 'ContentType', 'value': 'ESG Sources...","[40bd-7368-461b3ddb-bf85-2e24f5513747, 4299-6a...",17,2021-01-18T09:42:15Z,2021-02-18T09:42:15Z,READY,2021-01-18T09:42:15Z,2021-01-18T09:44:13Z,,
8,4ea9-14e6-45609bf8-8046-2ffbeb318621,RFT-ESG-Sources-Current-Init-2021-01-03,ESG,4867-9a46-216e838a-9241-8fc3561b51ef,"[{'name': 'ResultCount', 'value': '3646464'}, ...","[411b-50f1-fbd663eb-a1f1-d81548fdfe07, 41b5-61...",18,2021-01-03T16:04:15Z,2021-02-03T16:04:15Z,READY,2021-01-03T16:04:15Z,2021-01-03T16:06:03Z,,
9,4fb3-e818-2f90ab44-b4c7-810ef3052bb5,RFT-ESG-Sources-Current-Init-2021-01-31,ESG,4867-9a46-216e838a-9241-8fc3561b51ef,"[{'name': 'ContentType', 'value': 'ESG Sources...","[4062-9f43-c6a1f1be-8d83-42dfb227d870, 4147-63...",17,2021-01-31T15:58:01Z,2021-02-28T15:58:01Z,READY,2021-01-31T15:58:01Z,2021-01-31T15:59:49Z,1970-01-01T00:00:00Z,2021-01-31T14:30:00Z
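
The filter cells above append raw text such as `&attributes=ContentType:Symbology Cusip` to the URL, leaving the space for the HTTP library to encode. A sketch of building the same query explicitly is shown below. `build_file_sets_url` is a hypothetical helper, and the choice of `%20` for spaces with the colon kept literal is an assumption, not something confirmed by the service documentation.

```python
from urllib.parse import urlencode, quote

FILE_SETS_URL = "https://api.refinitiv.com/file-store/v1/file-sets"

def build_file_sets_url(bucket, package_id=None, content_type=None):
    """Compose the file-sets URL with optional packageId and ContentType filters."""
    params = {"bucket": bucket}
    if package_id:
        params["packageId"] = package_id
    if content_type:
        params["attributes"] = "ContentType:" + content_type
    # quote_via=quote encodes spaces as %20; safe=":" keeps the colon readable
    return FILE_SETS_URL + "?" + urlencode(params, quote_via=quote, safe=":")

print(build_file_sets_url("ESG", content_type="Symbology Cusip"))
```

Keeping the filters in a dictionary also makes it easy to combine packageId and attribute filters without hand-assembling `&` separators.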


### Retrieve Complete File Details of FileSet ID

In [10]:
FILES_ENDPOINT_START = RDP_BASE_URL+'/file-store'+RDP_ESG_BULK_VERSION + '/files?filesetId='
 
def requestFileDetails(token, fileSetId):   

    print("Obtaining File details for FileSet= "+ fileSetId + " ...")
    print("(If result is Response=400, make sure that fileSetId is set with a valid value...)")
    FILES_ENDPOINT = FILES_ENDPOINT_START + fileSetId
  
    querystring = {}
    payload = ""
    jsonfull = ""
    jsonpartial = ""
    
    headers = {
            'Content-Type': "application/json",
            'Authorization': "Bearer " + token,
            'cache-control': "no-cache"
    }
        
    response = requests.request("GET", FILES_ENDPOINT, data=payload, headers=headers, params=querystring)
    
    if response.status_code != 200:
        if response.status_code == 401:   # error when token expired
                accessToken = getToken();     # token refresh on token expired
                headers['Authorization'] = "Bearer " + accessToken
                response = requests.request("GET", FILES_ENDPOINT, data=payload, headers=headers, params=querystring)
         
    print('Raw response=');
    print(response);
    
    if response.status_code == 200:
        jsonFullResp = json.loads(response.text)        
        return jsonFullResp; 
    else:
        return '';

jsonFullResp = requestFileDetails(accessToken, FILESET_ID);

print('Parsed json response=');
print(json.dumps(jsonFullResp, indent=2));
df = pd.json_normalize(jsonFullResp['value'])
df

Obtaining File details for FileSet= 4022-f0c0-e3968404-9146-1b496d4e11b4 ...
(If result is Response=400, make sure that fileSetId is set with a valid value...)
Raw response=
<Response [200]>
Parsed json response=
{
  "value": [
    {
      "id": "4f94-ff55-90b727d0-b896-3ecd59dd8855",
      "filename": "RFT-ESG-Sources-Current-Delta-2021-01-10.jsonl.gz",
      "filesetId": "4022-f0c0-e3968404-9146-1b496d4e11b4",
      "storageLocation": {
        "url": "https://a206464-prod-esg.s3.amazonaws.com/ESG_Sources/2021/01/10/RFT-ESG-Sources-Current-Delta-2021-01-10.jsonl.gz",
        "@type": "s3"
      },
      "created": "2021-01-10T14:38:05Z",
      "modified": "2021-01-10T14:38:05Z",
      "href": "https://api.refinitiv.com/file-store/v1/files/4f94-ff55-90b727d0-b896-3ecd59dd8855/stream",
      "fileSizeInBytes": 2362679
    }
  ]
}


Unnamed: 0,id,filename,filesetId,created,modified,href,fileSizeInBytes,storageLocation.url,storageLocation.@type
0,4f94-ff55-90b727d0-b896-3ecd59dd8855,RFT-ESG-Sources-Current-Delta-2021-01-10.jsonl.gz,4022-f0c0-e3968404-9146-1b496d4e11b4,2021-01-10T14:38:05Z,2021-01-10T14:38:05Z,https://api.refinitiv.com/file-store/v1/files/...,2362679,https://a206464-prod-esg.s3.amazonaws.com/ESG_...,s3


### Select ESG File Id and File Name

In [11]:
# Optionally, copy from result of "Retrieve Complete File Details of FileSet ID"
FILE_ID = input()
print('FILE_ID selected is: ' + FILE_ID)
FILE_NAME = input()
print('FILE_NAME selected is: ' + FILE_NAME)

 4f94-ff55-90b727d0-b896-3ecd59dd8855


FILE_ID selected is: 4f94-ff55-90b727d0-b896-3ecd59dd8855


 RFT-ESG-Sources-Current-Delta-2021-01-10.jsonl.gz


FILE_NAME selected is: RFT-ESG-Sources-Current-Delta-2021-01-10.jsonl.gz
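
Instead of copying the file id and file name by hand, they could be read from the parsed file details response. This is a minimal sketch: `list_files` is a hypothetical helper, and the sample dictionary repeats only a subset of the fields from the response shown in the previous step.

```python
def list_files(file_details_resp):
    """Return (id, filename, size in bytes) for every file in a file details response."""
    return [
        (f["id"], f["filename"], f.get("fileSizeInBytes"))
        for f in file_details_resp.get("value", [])
    ]

# Subset of the response printed by the file details step
sample_details = {
    "value": [
        {
            "id": "4f94-ff55-90b727d0-b896-3ecd59dd8855",
            "filename": "RFT-ESG-Sources-Current-Delta-2021-01-10.jsonl.gz",
            "filesetId": "4022-f0c0-e3968404-9146-1b496d4e11b4",
            "fileSizeInBytes": 2362679,
        }
    ]
}
files = list_files(sample_details)
selected_id, selected_name, _ = files[0]
print(selected_id, selected_name)
```

For file sets with several parts, iterating over the whole list avoids selecting each part manually.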


### Stream File via File Id using Redirect

In [None]:
import shutil

FILES_STREAM_ENDPOINT_START = RDP_BASE_URL+'/file-store'+RDP_ESG_BULK_VERSION + '/files/'

# use valid values, obtained from the previous step
exampleFileId = '4edd-99af-da829f42-8ddd-07fabfcddca9'  
exampleFileName = 'RFT-ESG-Sources-Full-Init-2021-01-17-part07.jsonl.gz'

def requestFileDownload(token, fileId, fileName):   
    FILES_STREAM_ENDPOINT = FILES_STREAM_ENDPOINT_START + fileId+ '/stream'
    print("Obtaining File ... " + FILES_STREAM_ENDPOINT)
  
    chunk_size = 1000
    
    headers = {
            'Authorization': 'Bearer ' + token,
            'cache-control': "no-cache",
            'Accept': '*/*'
    }
        
    response = requests.request("GET", FILES_STREAM_ENDPOINT, headers=headers, stream=True, allow_redirects=True)
    
    if response.status_code != 200:
        if response.status_code == 401:   # error when token expired
                accessToken = getToken();     # token refresh on token expired
                headers['Authorization'] = "Bearer " + accessToken
                response = requests.request("GET",FILES_STREAM_ENDPOINT, headers=headers, stream=True, allow_redirects=True)

         
    print('Response code=' + str(response.status_code));
    
    if response.status_code == 200:
        print('Processing...')
        with open(fileName, 'wb') as fd:
            shutil.copyfileobj(response.raw, fd) 
        print('Look for gzipped file named: '+ fileName + ' in current directory')
        response.connection.close()
        
    return; 

# consider below an example only
requestFileDownload(accessToken, exampleFileId, exampleFileName);
#requestFileDownload(accessToken, FILE_ID, FILE_NAME);
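
Once a `.jsonl.gz` file has been saved, it can be read back one record at a time without unpacking it to disk first. The sketch below assumes the file holds one JSON object per line, as the `.jsonl` extension suggests. `iter_jsonl_gz` is a hypothetical helper, and the sample records are made up and do not reflect the real ESG schema.

```python
import gzip
import json
import os
import tempfile

def iter_jsonl_gz(path):
    """Yield one parsed JSON object per non-empty line of a gzipped JSON Lines file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)

# Write a tiny stand-in file so the reader can be tried without a download
sample_records = [{"OrgId": 1, "Measure": "A"}, {"OrgId": 2, "Measure": "B"}]
sample_path = os.path.join(tempfile.mkdtemp(), "sample.jsonl.gz")
with gzip.open(sample_path, "wt", encoding="utf-8") as out:
    for record in sample_records:
        out.write(json.dumps(record) + "\n")

loaded = list(iter_jsonl_gz(sample_path))
print(len(loaded), "records read")
```

Because the function is a generator, a large Init file can be processed record by record rather than loaded into memory at once.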

### Select the Latest ESG FileSets (Init and Delta) as of Last Sunday per PackageId

In [None]:
import datetime

# determine what date last Sunday was
d = datetime.datetime.today()
#print(d)
sun_offset = (d.weekday() - 6) % 7
sunday = d - datetime.timedelta(days=sun_offset)

# format Sunday date to ESG bulk current requirements
sunday = sunday.replace(hour=0, minute=0, second=0, microsecond=0)
sunday = str(sunday).replace(' 00:00:00', 'T00:00:00Z')
print("Last Sunday was on", sunday)

PACKAGE_ID = '4867-9a46-216e838a-9241-8fc3561b51ef'
ESG_FILESET_RESP = requestFileSets(accessToken, False, '','&packageId='+PACKAGE_ID+'&availableFrom='+ sunday);
print('Parsed json response=');
print(json.dumps(ESG_FILESET_RESP, indent=2));
# now ESG_FILESET_RESP contains the requisite FileSetIds
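
The "last Sunday" calculation can be wrapped in a small function so that its boundary cases are easy to check. This is a sketch only: `last_sunday_stamp` is a hypothetical helper that follows the same weekday arithmetic as the cell above and formats the result with `strftime` instead of string replacement.

```python
import datetime

def last_sunday_stamp(today):
    """Return the most recent Sunday (today included) as 'YYYY-MM-DDT00:00:00Z'."""
    # weekday(): Monday is 0 and Sunday is 6, so the offset is 0 on a Sunday
    offset = (today.weekday() - 6) % 7
    sunday = today - datetime.timedelta(days=offset)
    return sunday.strftime("%Y-%m-%dT00:00:00Z")

print(last_sunday_stamp(datetime.date(2021, 1, 27)))  # a Wednesday
print(last_sunday_stamp(datetime.date(2021, 1, 24)))  # a Sunday
```

Like the original cell, this works from the local calendar date. Whether the service expects the date in UTC is not stated here and is left as an open assumption.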

### Iterate over Latest ESG FileSets and Request the Latest ESG Files (Init and Delta)

In [None]:
print("List of FileSet Ids to be streamed by this step:")
for item in ESG_FILESET_RESP['value']:
    print ('\t'+item['id'])
    # Request File Details for the FileSets of interest
    jsonFullRespFile = requestFileDetails(accessToken, item['id']);
    print('\t\tList of Files:')
    for item2 in jsonFullRespFile['value']:
        print ('File name: ' +item2['filename'])
    # Request download per file Id and into fileName
    print('Starting download ... ')
    for item2 in jsonFullRespFile['value']:
        print ('Streaming File: ' +item2['filename'])
        requestFileDownload(accessToken, item2['id'],item2['filename']);

### Get File Location (Step 1 of 2)

In [47]:
import shutil

FILES_STREAM_ENDPOINT_START = RDP_BASE_URL+'/file-store'+RDP_ESG_BULK_VERSION + '/files/'
DIRECT_URL = ''
 
def requestFileLocation(token, fileId):
    # Ask for the file location only: doNotRedirect=true returns the direct URL as JSON
    FILES_STREAM_ENDPOINT = FILES_STREAM_ENDPOINT_START + fileId + '/stream?doNotRedirect=true'
    print("Obtaining File ... " + FILES_STREAM_ENDPOINT)

    headers = {
            'Authorization': 'Bearer ' + token,
            'cache-control': "no-cache",
            'Accept': '*/*'
    }

    response = requests.request("GET", FILES_STREAM_ENDPOINT, headers=headers, stream=False, allow_redirects=False)

    if response.status_code == 401:   # error when token expired
        accessToken = getToken();     # token refresh on token expired
        headers['Authorization'] = "Bearer " + accessToken
        response = requests.request("GET", FILES_STREAM_ENDPOINT, headers=headers, stream=False, allow_redirects=False)

    print('Response code=' + str(response.status_code));

    if response.status_code == 200:
        jsonFullResp = json.loads(response.text)
        print('Parsed json response=');
        print(json.dumps(jsonFullResp, indent=2));
        print('File Direct URL is: ' + str(jsonFullResp['url']) + '|||');
        # The caller stores the returned URL; no module-level variable is modified here
        return jsonFullResp['url'];
    else:
        return 'Error response: ' + response.text


DIRECT_URL = requestFileLocation(accessToken, FILE_ID);

Obtaining File ... https://api.refinitiv.com/file-store/v1/files/4f94-ff55-90b727d0-b896-3ecd59dd8855/stream?doNotRedirect=true
Reading the token from: token.txt
Token expired, refreshing a new one...
Caught exception: Failed to get access token 400 - {"error":"invalid_grant"   } 
Getting a new token using Password Grant...
Saving the new token
Response code=200
Parsed json response=
{
  "url": "https://a206464-prod-esg.s3.amazonaws.com/ESG_Sources/2021/01/10/RFT-ESG-Sources-Current-Delta-2021-01-10.jsonl.gz?x-request-Id=eb92c291-b7ed-45f5-b9b1-7426c1eeb5de&x-package-id=4867-9a46-216e838a-9241-8fc3561b51ef&x-client-app-id=GE-A-01103867-3-603&x-file-name=RFT-ESG-Sources-Current-Delta-2021-01-10.jsonl.gz&x-fileset-id=4022-f0c0-e3968404-9146-1b496d4e11b4&x-bucket-name=ESG&x-uuid=GENTC-25929&x-file-Id=4f94-ff55-90b727d0-b896-3ecd59dd8855&x-fileset-name=RFT-ESG-Sources-Current-Delta-2021-01-10&x-event-external-name=cfs-claimCheck-download&X-Amz-Security-Token=IQoJb3JpZ2luX2VjEB8aCXVzLWVhc3Q

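The direct URL returned in step 1 carries the file name both in its path and in an `x-file-name` query parameter, as the output above shows. As a hedged sketch, the name could be recovered from the URL itself rather than taken from a separate variable. The helper name `file_name_from_url` and the shortened `example.com` URL are hypothetical.

```python
import posixpath
from urllib.parse import urlparse, parse_qs

def file_name_from_url(direct_url):
    """Return the file name from 'x-file-name', else the last path segment."""
    parts = urlparse(direct_url)
    values = parse_qs(parts.query).get('x-file-name')
    if values:
        return values[0]
    # Fall back to the final segment of the URL path
    return posixpath.basename(parts.path)

# Shortened, made-up URL with the same layout as the one printed above
sample_url = ('https://example.com/ESG_Sources/2021/01/10/'
              'RFT-ESG-Sources-Current-Delta-2021-01-10.jsonl.gz'
              '?x-file-name=RFT-ESG-Sources-Current-Delta-2021-01-10.jsonl.gz'
              '&x-bucket-name=ESG')
print(file_name_from_url(sample_url))
```
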
### Download File From File Location (Step 2 of 2)

In [48]:
from urllib.parse import urlparse, parse_qs
def requestDirectFileDownload(token, directUrl, fileName):

    print("Obtaining File from URL... " + directUrl)

    # Parse out URL parameters for submission into requests
    # (use the directUrl argument rather than a module-level variable)
    url_obj = urlparse(directUrl)
    parsed_params = parse_qs(url_obj.query)
    # extract the URL without query parameters
    parsed_url = url_obj._replace(query='').geturl()

    # The direct URL carries its own security token in the query string,
    # so no Authorization header is sent with this request
    response = requests.get(parsed_url, params=parsed_params, stream=True)

    if response.status_code == 401:   # error when token expired
        accessToken = getToken();     # token refresh on token expired
        response = requests.get(parsed_url, params=parsed_params, stream=True)

    print('Response code=' + str(response.status_code));

    filename = 'another_' + fileName

    if response.status_code == 200:
        print('Processing...')
        # Copy the raw (still gzipped) byte stream straight to disk
        with open(filename, 'wb') as fd:
            shutil.copyfileobj(response.raw, fd)

        print('Look for gzipped file named: ' + filename + ' in current directory')

    response.close()
    return;


requestDirectFileDownload(accessToken, DIRECT_URL, FILE_NAME);

Obtaining File from URL... https://a206464-prod-esg.s3.amazonaws.com/ESG_Sources/2021/01/10/RFT-ESG-Sources-Current-Delta-2021-01-10.jsonl.gz?x-request-Id=eb92c291-b7ed-45f5-b9b1-7426c1eeb5de&x-package-id=4867-9a46-216e838a-9241-8fc3561b51ef&x-client-app-id=GE-A-01103867-3-603&x-file-name=RFT-ESG-Sources-Current-Delta-2021-01-10.jsonl.gz&x-fileset-id=4022-f0c0-e3968404-9146-1b496d4e11b4&x-bucket-name=ESG&x-uuid=GENTC-25929&x-file-Id=4f94-ff55-90b727d0-b896-3ecd59dd8855&x-fileset-name=RFT-ESG-Sources-Current-Delta-2021-01-10&x-event-external-name=cfs-claimCheck-download&X-Amz-Security-Token=IQoJb3JpZ2luX2VjEB8aCXVzLWVhc3QtMSJHMEUCIB%2FU%2BDwNMBdiyBPP3VRMHSwOIYPZvjx9ER0ngyT2aRIwAiEAutDJejT0pDIo5mcs28WLDxhtlIa1zCEbalRshMfPkd0qyQMIFxACGgw2NDIxNTcxODEzMjYiDKSUphGcBSnz%2Ff1GkiqmA6wj5QKW2GDND4oMFe3cNYae8Ka1LliYadEM1Dcy9kuxemBBkQ6AZmki8OUFlSoz0hTKbLtnYQYqRVSuAVGqZLOCIpzw4qQo2QLnd6f7F24nsZxjKpH2B1Vlk8dQr5nWvdWDHN4xNvaH0ZQKj%2BdW58rNC4Hr88Q8jn01FLqIyLbo510S32u45kMtglYpTwia9%2FSnMqxHhapl4FI4Pkdq5
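
The downloaded file has a `.jsonl.gz` extension, which suggests gzip-compressed JSON Lines (one JSON object per line). As a minimal sketch under that assumption, the file could be read with the standard library as shown below. The helper name `read_jsonl_gz`, the `limit` parameter, and the sample records are hypothetical; the real ESG record layout is not shown in this section.

```python
import gzip
import json
import os
import tempfile

def read_jsonl_gz(path, limit=None):
    """Read up to `limit` JSON objects from a gzip-compressed JSON Lines file."""
    records = []
    with gzip.open(path, 'rt', encoding='utf-8') as fh:
        for line in fh:
            line = line.strip()
            if not line:            # skip blank lines
                continue
            records.append(json.loads(line))
            if limit is not None and len(records) >= limit:
                break
    return records

# Self-contained demonstration on a small made-up file
demo_path = os.path.join(tempfile.mkdtemp(), 'demo.jsonl.gz')
with gzip.open(demo_path, 'wt', encoding='utf-8') as fh:
    fh.write(json.dumps({'id': 1}) + '\n')
    fh.write(json.dumps({'id': 2}) + '\n')
    fh.write(json.dumps({'id': 3}) + '\n')

demo_records = read_jsonl_gz(demo_path, limit=2)
print(demo_records)
```

For the file written by the step above, the path would be `'another_' + FILE_NAME`; a small `limit` keeps memory use low when inspecting a large bulk file.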