# RDP ESG CFS Point in Time File Workflow

Example Code Disclaimer: ALL EXAMPLE CODE IS PROVIDED ON AN “AS IS” AND “AS AVAILABLE” BASIS FOR ILLUSTRATIVE PURPOSES ONLY. LSEG MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND, EXPRESS OR IMPLIED, AS TO THE OPERATION OF THE EXAMPLE CODE, OR THE INFORMATION, CONTENT, OR MATERIALS USED IN CONNECTION WITH THE EXAMPLE CODE. YOU EXPRESSLY AGREE THAT YOUR USE OF THE EXAMPLE CODE IS AT YOUR SOLE RISK.

### Importing libraries

In [1]:
import os
import sys
import requests 
import json
from dotenv import dotenv_values
config = dotenv_values(".env")

### Set RDP credentials and Initial Parameters

In [2]:
username = config['RDP_USERNAME'] #Or replace with your RDP Machine-ID
password = config['RDP_PASSWORD'] #Or replace with your RDP Password
clientId = config['RDP_APP_KEY'] #Or replace with your RDP APP Key

RDP_HOST= 'https://api.refinitiv.com'
access_token = None
refresh_token = None
expires_in = 0

## <a id="rdp_workflow"></a>RDP APIs Application Workflow

### Step 1: Authentication with RDP APIs

Refinitiv Data Platform entitlement check is based on OAuth 2.0 specification. The first step of an application workflow is to get a token from RDP Auth Service, which will allow access to the protected resource, i.e. data REST API. 

The API requires the following access credential information:
- Username: The username. 
- Password: Password associated with the username. 
- Client ID: This is also known as ```AppKey```, and it is generated using an App key Generator. This unique identifier is defined for the user or application and is deemed confidential (not shared between users). The client_id parameter can be passed in the request body or as an “Authorization” request header that is encoded as base64.

The HTTP request for the RDP APIs Authentication service is as follows:

``` HTTP
POST /auth/oauth2/v1/token HTTP/1.1
Accept: */*
Content-Type: application/x-www-form-urlencoded
Host: api.refinitiv.com:443
Content-Length: XXX

username=RDP_USERNAME
&password=RDP_PASSWORD
&client_id=RDP_APP_KEY
&grant_type=password
&takeExclusiveSignOnControl=true
&scope=trapi
```

Once authentication succeeds, the application reads the RDP Auth service response message and keeps the following RDP token information in variables:
- **access_token**: The token used to invoke REST data API calls as described above. The application must keep this credential for further RDP APIs requests.
- **refresh_token**: Refresh token to be used for obtaining an updated access token before expiration. The application must keep this credential for access token renewal.
- **expires_in**: Access token validity time in seconds.

After the application has received the Access Token (and authorization token) from the RDP Auth Service, all subsequent REST API calls use this token to get the data. For more detail on the RDP APIs workflow, see the following resources:
- [RDP APIs: Introduction to the Request-Response API](https://developers.refinitiv.com/en/api-catalog/refinitiv-data-platform/refinitiv-data-platform-apis/tutorials#introduction-to-the-request-response-api) page.
- [RDP APIs: Authorization - All about tokens](https://developers.refinitiv.com/en/api-catalog/refinitiv-data-platform/refinitiv-data-platform-apis/tutorials#authorization-all-about-tokens) page.
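The form body of the token request shown above can be sketched with the standard library as follows. This is a minimal illustration, not the official client: the helper name `build_password_grant_body` and the placeholder credentials are assumptions made for this example. Using `urlencode` means that characters such as `&` or `=` in a password are escaped rather than breaking the form body.

```python
from urllib.parse import urlencode, parse_qs


def build_password_grant_body(username, password, client_id):
    """Return a URL-encoded form body for the password grant request."""
    fields = {
        'grant_type': 'password',
        'username': username,
        'password': password,
        'client_id': client_id,
        'takeExclusiveSignOnControl': 'true',
        'scope': 'trapi',
    }
    # urlencode escapes reserved characters in every value
    return urlencode(fields)


# Placeholder credentials for illustration only
body = build_password_grant_body('user@example.com', 'p&ss=word', 'APP_KEY')
print(body)

# Decoding the body gives back the original password unchanged
print(parse_qs(body)['password'][0])
```

The resulting string can be passed as the `data` argument of `requests.post` together with the `application/x-www-form-urlencoded` content type.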

In [3]:
#step 1 - get RDP Access Token from RDP

# Send HTTP Request
auth_url = f'{RDP_HOST}/auth/oauth2/v1/token'
payload = f'grant_type=password&username={username}&client_id={clientId}&password={password}&takeExclusiveSignOnControl=True&scope=trapi'
try:
    response = requests.post(auth_url, 
                             headers = {'Content-Type':'application/x-www-form-urlencoded'}, 
                             data = payload, 
                             auth = (clientId, '')
                )
except requests.exceptions.RequestException as exp:
    print(f'Caught exception: {exp}')
    raise  # stop here: 'response' is not valid if the request itself failed


if response.status_code == 200:  # HTTP Status 'OK'
    print('Authentication success')
    access_token = response.json()['access_token']
    refresh_token = response.json()['refresh_token']
    expires_in = int(response.json()['expires_in'])

if response.status_code != 200:
    print(f'RDP authentication failure: {response.status_code} {response.reason}')
    print(f'Text: {response.text}')

Authentication success


## <a id="rdp_get_data"></a>Requesting Data from RDP APIs

That brings us to requesting the RDP APIs data. All subsequent REST API calls use the Access Token via the *Authorization* HTTP request message header as shown below to get the data. 
- Header: 
    * Authorization = ```Bearer <RDP Access Token>```

Please notice *the space* between the ```Bearer``` and ```RDP Access Token``` values.
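As a small sketch of the header format described above, the helper below builds the *Authorization* header from an access token. The function name `bearer_headers` and the placeholder token are assumptions for illustration only.

```python
def bearer_headers(token):
    """Build the Authorization header for an RDP data request."""
    if not token:
        raise ValueError('An access token is required')
    # Note the single space between 'Bearer' and the token value
    return {'Authorization': f'Bearer {token}'}


headers = bearer_headers('example-token')  # placeholder token
print(headers)
```

The returned dictionary can be passed as the `headers` argument of `requests.get`.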

The application then creates a request message in JSON format or as URL query parameters, depending on the service of interest, and sends it as an HTTP request to the Service Endpoint. Developers can find the RDP APIs Service Endpoints, HTTP operations, and parameters on the Refinitiv Data Platform's [API Playground page](https://api.refinitiv.com/), an interactive documentation site that developers can access once they have a valid Refinitiv Data Platform account.

## <a id="rdp_cfs_data"></a>Requesting CFS Data

### Step 2: Listing the Package Ids using the Bucket Name

**Note**: **If you already know your package Ids, you can skip to Step 3.**

To request the CFS data, the first step is to send an HTTP ```GET``` request to the RDP ```/file-store/v1/packages?bucketName={bucket-name}``` endpoint to list all Package Ids under the input ```bucket-name```.

The HTTP Request structure is as follows:

``` HTTP
GET /file-store/v1/packages?bucketName={bucket-name} HTTP/1.1
Host: api.refinitiv.com
Authorization: Bearer <Access Token>
```
**Note**: The bucket name is *case-insensitive*.

The ESG bucket name is **bulk-ESG**.

In [4]:
#step 2 - list Package IDs from bucket name

CFS_url = f'{RDP_HOST}/file-store/v1/packages?bucketName=bulk-ESG'

try:
    response = requests.get(CFS_url, headers={'Authorization': f'Bearer {access_token}'})
except requests.exceptions.RequestException as exp:
    print(f'Caught exception: {exp}')
    raise  # stop here: otherwise 'response' would still hold the previous cell's result

if response.status_code == 200:  # HTTP Status 'OK'
    print('Receive list Package IDs from RDP APIs')
else:
    print(f'RDP APIs: CFS request failure: {response.status_code} {response.reason}')
    print(f'Text: {response.text}')

Receive list Package IDs from RDP APIs


The following is an example of the first entry of the package list. The package Id is in the ```packageId``` field.

In [5]:
print(json.dumps(response.json()['value'][0], sort_keys=True, indent=2, separators=(',', ':')))

{
  "bucketNames":[
    "bulk-ESG"
  ],
  "contactEmail":"robin.fielder@refinitiv.com",
  "created":"2021-11-11T07:54:04Z",
  "modified":"2023-02-10T09:10:16Z",
  "packageId":"4037-e79c-96b73648-a42a-6b65ef8ccbd1",
  "packageName":"Bulk-ESG-Global-Symbology-Organization-v1",
  "packageType":"bulk"
}


The next step is choosing the package Id.

The package name of the **ESG - Point in Time** is in the **RFT-ESG-PIT-SDI-yyyy-mm-dd** format and the packageId is currently *4173-aec7-8a0b0ac9-96f9-48e83ddbd2ad* (as of *Oct 2023*).

In [6]:
#packageId = response.json()['value'][0]['packageId']
packageId = '4173-aec7-8a0b0ac9-96f9-48e83ddbd2ad'
packageId

'4173-aec7-8a0b0ac9-96f9-48e83ddbd2ad'
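Instead of hard-coding the package Id, it can be looked up by name in the list returned by step 2. The sketch below is one possible way to do that; the helper name `find_package_ids` is hypothetical, and the sample entry only mirrors the fields shown in the example output above.

```python
def find_package_ids(packages, keyword):
    """Return (packageName, packageId) pairs whose name contains keyword."""
    keyword = keyword.lower()
    return [
        (p['packageName'], p['packageId'])
        for p in packages
        if keyword in p.get('packageName', '').lower()
    ]


# Sample entry in the shape of the step 2 response's 'value' list
sample_packages = [
    {
        'packageId': '4037-e79c-96b73648-a42a-6b65ef8ccbd1',
        'packageName': 'Bulk-ESG-Global-Symbology-Organization-v1',
    },
]
print(find_package_ids(sample_packages, 'symbology'))
```

With live data, pass the `value` list of the step 2 response and a keyword taken from the package name you are looking for.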

### Step 3: Listing the Filesets of the Bulk ESG Data with the packageId

The next step is calling the CFS API with the ```bucket-name``` and ```packageId``` to list all FileSets.

The API endpoint is ```/file-store/v1/file-sets?bucket=bulk-ESG&packageId={packageId}```

The HTTP Request structure is as follows:

``` HTTP
GET /file-store/v1/file-sets?bucket=bulk-ESG&packageId={packageId} HTTP/1.1
Host: api.refinitiv.com
Authorization: Bearer <Access Token>
```

In [7]:
#step 3 - get file id from bucket name

CFS_url = f'{RDP_HOST}/file-store/v1/file-sets?bucket=bulk-ESG&packageId={packageId}'

try:
    response = requests.get(CFS_url, headers={'Authorization': f'Bearer {access_token}'})
except requests.exceptions.RequestException as exp:
    print(f'Caught exception: {exp}')
    raise  # stop here: otherwise 'response' would still hold the previous cell's result


if response.status_code == 200:  # HTTP Status 'OK'
    print('Receive FileSets list from RDP APIs')
else:
    print(f'RDP APIs: CFS request failure: {response.status_code} {response.reason}')
    print(f'Text: {response.text}')

Receive FileSets list from RDP APIs


In [8]:
print(json.dumps(response.json()['value'][0], sort_keys=True, indent=2, separators=(',', ':')))

{
  "attributes":[
    {
      "name":"ContentType",
      "value":"ESG PIT SDI Full"
    }
  ],
  "availableFrom":"2023-12-11T03:14:53Z",
  "availableTo":"2023-12-24T02:00:00Z",
  "bucketName":"bulk-ESG",
  "created":"2023-12-11T03:14:53Z",
  "files":[
    "402d-f0af-be521f70-8bff-7dc9d6a7cb38",
    "406c-a513-aedc4978-8e8b-07ee5f25686b",
    "40ed-8514-d64748ab-a275-24f882e23506",
    "40f0-54ad-5bfa4e1c-b108-547424e5484c",
    "40f3-b268-cad9ec26-afe9-b7f21614defa",
    "4100-5a85-4a6d9c68-b6a8-5239903d5135",
    "412a-22d7-67b48521-8542-42f250b1e590",
    "4146-106f-9d9230e5-a8fc-a80a5173859c",
    "4147-f08a-56d70a1a-9ed9-d5d6d0876ef8",
    "4187-822b-9107e62e-812e-bcdea62c05f4",
    "41ae-d468-b028e3c3-b366-6ead072dd74c",
    "41b8-3368-3116d29b-ad0c-a9db5938b3ec",
    "41c3-c0f2-1d61c724-8399-0b08f2d0054b",
    "420b-1d64-56219b3e-99a1-140631069e02",
    "4222-01d3-7dc8f906-aea1-41f200e8368d",
    "4228-fb1b-d6936f17-a3bc-abf3ff49fbef",
    "4253-12cb-bf53fea4-ae44-366c48acc505"

The File ID is in the ```files``` array

In [9]:
# try just one file
file_id = response.json()['value'][0]['files'][1]
file_id

'406c-a513-aedc4978-8e8b-07ee5f25686b'
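When a response contains several FileSets, the file IDs can be collected by the ```ContentType``` attribute shown in the output above. The following is a minimal sketch under that assumption; the helper name `files_by_content_type` is hypothetical, and the sample data copies only two of the file IDs from the example output.

```python
def files_by_content_type(filesets, content_type):
    """Collect file IDs from FileSets whose ContentType attribute matches."""
    file_ids = []
    for fileset in filesets:
        # Turn the list of {'name': ..., 'value': ...} pairs into a dict
        attrs = {a['name']: a['value'] for a in fileset.get('attributes', [])}
        if attrs.get('ContentType') == content_type:
            file_ids.extend(fileset.get('files', []))
    return file_ids


# Sample FileSet in the shape of the step 3 response's 'value' list
sample_filesets = [
    {
        'attributes': [{'name': 'ContentType', 'value': 'ESG PIT SDI Full'}],
        'files': [
            '402d-f0af-be521f70-8bff-7dc9d6a7cb38',
            '406c-a513-aedc4978-8e8b-07ee5f25686b',
        ],
    },
]
print(files_by_content_type(sample_filesets, 'ESG PIT SDI Full'))
```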

#### Step 3.1: Paging

By default, the ```/file-store/v1/file-sets``` and ```/file-store/v1/packages?bucketName={bucket-name}``` endpoints return 25 results per request. You can adjust the number of returned results via the ```pageSize``` query parameter; the maximum is **100**.

The result also has an ```@nextLink``` node that contains the URL for requesting the next page of results.

You can find more detail about the Paging and @nextLink node on the step 3.1 of the [A Step-By-Step Workflow Guide for RDP Client File Store (CFS) API](https://developers.lseg.com/en/article-catalog/article/a-step-by-step-workflow-guide-for-rdp-client-file-store--cfs--ap) article and [GitHub](https://github.com/LSEG-API-Samples/LSEG-API-Samples-Example.RDP.Python.GenericBulkFile.Workflow).
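The paging behaviour described above can be sketched as a loop that keeps following ```@nextLink``` until it is absent. This is an illustration under assumptions, not the official client: `iter_all_values` and the `fetch` callable are hypothetical names, the fake pages and their `id` fields are invented for an offline demonstration, and the query string of the fake ```@nextLink``` is a made-up placeholder. `urljoin` is used so that the loop works whether the link is a relative path or an absolute URL.

```python
from urllib.parse import urljoin

RDP_HOST = 'https://api.refinitiv.com'


def iter_all_values(first_url, fetch):
    """Yield every entry of 'value' across pages by following '@nextLink'.

    'fetch' is any callable that takes a URL and returns the decoded JSON body.
    """
    url = first_url
    while url:
        page = fetch(url)
        yield from page.get('value', [])
        next_link = page.get('@nextLink')
        # Stop when the service no longer returns a next page
        url = urljoin(RDP_HOST, next_link) if next_link else None


# Offline demonstration with two fake pages
first = f'{RDP_HOST}/file-store/v1/file-sets?bucket=bulk-ESG&pageSize=2'
second = f'{RDP_HOST}/file-store/v1/file-sets?bucket=bulk-ESG&pageSize=2&cursor=EXAMPLE'
fake_pages = {
    first: {
        'value': [{'id': 'a'}, {'id': 'b'}],
        '@nextLink': '/file-store/v1/file-sets?bucket=bulk-ESG&pageSize=2&cursor=EXAMPLE',
    },
    second: {'value': [{'id': 'c'}]},
}
all_items = list(iter_all_values(first, fake_pages.__getitem__))
print(len(all_items))
```

Against the live service, `fetch` could be a small function that calls `requests.get` with the Bearer header and returns `response.json()`.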

### Step 4: Get the file URL on AWS S3

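The next step is getting the download URL of the file using the File ID with the RDP ```/file-store/v1/files/{fileId}/stream``` endpoint. The ```doNotRedirect=true``` query parameter makes the endpoint return the file URL in the response instead of redirecting to it.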

The HTTP Request structure is as follows:

``` HTTP
GET /file-store/v1/files/{fileId}/stream?doNotRedirect=true HTTP/1.1
Host: api.refinitiv.com
Authorization: Bearer <Access Token>
```

In [12]:
#step 4 - get file stream (content) from file id

FileID_url = f'{RDP_HOST}/file-store/v1/files/{file_id}/stream?doNotRedirect=true'

try:
    response = requests.get(FileID_url, headers={'Authorization': f'Bearer {access_token}'})
except requests.exceptions.RequestException as exp:
    print(f'Caught exception: {exp}')
    raise  # stop here: otherwise 'response' would still hold the previous cell's result


if response.status_code == 200:  # HTTP Status 'OK'
    print('Receive File URL from RDP APIs')
else:
    print(f'RDP APIs: CFS request failure: {response.status_code} {response.reason}')
    print(f'Text: {response.text}')

Receive File URL from RDP APIs


The File URL is in the ```url``` attribute.

In [13]:
file_url = response.json()['url']
file_url

'https://a206464-prod-esg.s3.amazonaws.com/ESGPIT/2023/12/10/EsgPITAnalyticValueScore.2006.F.2023-12-10-0857.zip?x-request-Id=8d762e74-6ca6-4aaf-aa38-89add3bafe38&x-package-id=4173-aec7-8a0b0ac9-96f9-48e83ddbd2ad&x-client-app-id=b4842f3904fb4a1fa18234796368799086c63541&x-file-name=EsgPITAnalyticValueScore.2006.F.2023-12-10-0857.zip&x-fileset-id=4476-df94-84b63642-9f49-5e7725d0925c&x-bucket-name=bulk-ESG&x-uuid=GESG1-178570&x-file-Id=406c-a513-aedc4978-8e8b-07ee5f25686b&x-fileset-name=RFT-ESG-PIT-SDI-2023-12-10&x-event-external-name=cfs-claimCheck-download&X-Amz-Security-Token=IQoJb3JpZ2luX2VjEPj%2F%2F%2F%2F%2F%2F%2F%2F%2F%2FwEaCXVzLWVhc3QtMSJHMEUCIEsVjHA5HYGMLwHM9vpczI1TBvzui0aG870EZZdIoA0AAiEAw7hJZAUiFj04b3PpUCdZb4wHLuYj7%2FvRgtKgIN9bS4MqmgIIcBAEGgw2NDIxNTcxODEzMjYiDA3O0Vq2Co7EoIN8ESr3AaIsSphKdBlnedj5UW2J2Lj54w%2Fm%2FfvlCE8GJO3p6iabrfK4cBJtjrJY4fwJWtn9okQi1fvTIJKJ%2BGfk0WF1cAVHZQAqnemGy9fmFHLGC2ArhypZk7B24IgQkYaiMg3LGFUe3XC8GSy8Lfj8MavuG82Qn%2B79p6k9KX5tnT0EKdRJgz3mPtZKc4E7o%2F%2Fpiab

### Step 5: Downloading the file

Based on the S3 ```file_url``` above, the actual file name is *EsgPITAnalyticValueScore.2006.F.2023-12-10-0857.zip*. If the file name in the URL contains the escape sequence ```%3A``` (a URL-encoded colon), replace it with the ```_``` (underscore) character.

**Note**: If you cannot download the file, please wait for a while and then retry the download from the URL. Please do not flood the service with download requests.

In [14]:
#step 5 - Download file

import polling2

zipfilename = file_url.split("?")[0].split("/")[-1].replace("%3A","_")
print(f'Downloading File {zipfilename} ...')

def test_result(response):
    return response.status_code == 200

try:
    response = polling2.poll(lambda: requests.get(file_url), 
                            step = 10,
                            poll_forever = True,
                            check_success= test_result)
except requests.exceptions.RequestException as exp:
    print(f'Caught exception: {exp}')
    raise  # stop here: otherwise 'response' would still hold the previous cell's result

if response.status_code == 200:  # HTTP Status 'OK'
    print('Receive File Successfully')
    # Use a context manager so the file is flushed and closed after writing
    with open(zipfilename, 'wb') as zip_file:
        zip_file.write(response.content)
    print(f'{zipfilename} Saved')
else:
    print(f'RDP APIs: Request file failure: {response.status_code} {response.reason}')
    print(f'Text: {response.text}')

Downloading File EsgPITAnalyticValueScore.2006.F.2023-12-10-0857.zip ...
Receive File Successfully
EsgPITAnalyticValueScore.2006.F.2023-12-10-0857.zip Saved
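As an alternative to splitting the URL string by hand, the file name can be derived with `urllib.parse`. The sketch below is an assumption-labelled variant rather than the notebook's own method: the helper name `filename_from_url` is hypothetical, and unlike the cell above it decodes every percent-escape before replacing the colon. The example URL is a shortened form of the `file_url` shown earlier.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def filename_from_url(url):
    """Return a local file name from the path part of a download URL."""
    path = urlsplit(url).path        # drops the query string
    name = posixpath.basename(path)  # last path segment
    # Decode percent-escapes such as %3A, then replace ':' with '_'
    return unquote(name).replace(':', '_')


example_url = (
    'https://a206464-prod-esg.s3.amazonaws.com/ESGPIT/2023/12/10/'
    'EsgPITAnalyticValueScore.2006.F.2023-12-10-0857.zip?x-bucket-name=bulk-ESG'
)
print(filename_from_url(example_url))
```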


The final part of this step is unzipping the file.

The ESG - PIT service provides data in a **.zip** file that contains multiple CSV files. 
The other ESG Bulk file types provide data in a **.gz** file.

In [13]:
#unzip file
import zipfile
import os
import gzip
import shutil
zipExtension = zipfilename.split('.')[-1]
try:
    if zipExtension == 'zip':
        unzipfilename = zipfilename.split('.zip')[0]
        file_name = os.path.abspath(zipfilename) # get full path of files
        dir_name = os.getcwd()
        print(f'Unzipping {zipfilename}')
        # the context manager closes the archive even if extraction fails
        with zipfile.ZipFile(file_name) as zip_ref:
            zip_ref.extractall(dir_name) # extract files to dir
        print('Done')
    else:
        unzipfilename = zipfilename.split('.gz')[0]
        print(f'Unzipping to {unzipfilename} ...')
        with gzip.open(zipfilename, 'rb') as f_in:
            with open(unzipfilename, 'wb') as f_out:
                shutil.copyfileobj(f_in, f_out)
        print('Done')
except Exception as e:
    print('The error is: ',e)


Unzipping EsgPITAnalyticValue.2007.F.2023-10-22-0829.zip
Done
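To inspect the CSV files inside a **.zip** archive without extracting them, the standard library is enough. The following is a minimal sketch: the helper name `preview_zip_csvs` is hypothetical, UTF-8 encoding is an assumption about the files, and the demonstration builds a tiny made-up archive rather than using real ESG data.

```python
import csv
import io
import os
import tempfile
import zipfile


def preview_zip_csvs(zip_path, max_rows=3):
    """Return {member name: first rows} for each CSV inside a zip archive."""
    previews = {}
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if not name.lower().endswith('.csv'):
                continue
            with archive.open(name) as raw:
                # Assumes UTF-8 text; adjust the encoding if needed
                text = io.TextIOWrapper(raw, encoding='utf-8', newline='')
                reader = csv.reader(text)
                # zip() stops after max_rows without reading the whole file
                previews[name] = [row for _, row in zip(range(max_rows), reader)]
    return previews


# Demonstration with a tiny made-up archive
with tempfile.TemporaryDirectory() as tmp:
    demo_zip = os.path.join(tmp, 'demo.zip')
    with zipfile.ZipFile(demo_zip, 'w') as archive:
        archive.writestr('scores.csv', 'id,score\n1,0.5\n2,0.7\n')
        archive.writestr('readme.txt', 'not a csv')
    demo_preview = preview_zip_csvs(demo_zip, max_rows=2)

print(demo_preview)
```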


### Step 6: Refresh Token with RDP APIs

Before the session expires (based on the ```expires_in``` parameter, in seconds), an application needs to send a Refresh Grant request message to the RDP Authentication service to get a new access token before requesting further data from the platform.

The API requires the following access credential information:
- Refresh Token: The current Refresh Token value from the previous RDP Authentication call
- Client ID: This is also known as ```AppKey```, and it is generated using an App key Generator. This unique identifier is defined for the user or application and is deemed confidential (not shared between users). The client_id parameter can be passed in the request body or as an “Authorization” request header that is encoded as base64.
- Grant Type ```refresh_token```: This is for getting a new Access Token. 

The HTTP request for the RDP APIs Authentication service is as follows:

``` HTTP
POST /auth/oauth2/v1/token HTTP/1.1
Accept: */*
Content-Type: application/x-www-form-urlencoded
Host: api.refinitiv.com:443
Content-Length: XXX

refresh_token={current_refresh_token}
&grant_type=refresh_token
&client_id=RDP_APP_KEY
```

Once the refresh succeeds, the application gets **access_token**, **refresh_token**, and **expires_in** from the RDP Auth service response message, the same as in the previous RDP Authentication call. An application must keep those values for the next Refresh Token call.
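One way to decide when to send the refresh request is to record when the token was issued and compare it against ```expires_in```. The sketch below illustrates that idea only; the class name `TokenState`, the 30-second safety margin, and the numeric values are assumptions chosen for this example.

```python
import time


class TokenState:
    """Track when the current access token was issued and when it expires."""

    def __init__(self, expires_in, issued_at=None):
        self.expires_in = int(expires_in)
        self.issued_at = time.time() if issued_at is None else issued_at

    def needs_refresh(self, margin=30, now=None):
        """Return True once the token is within 'margin' seconds of expiry."""
        now = time.time() if now is None else now
        return now >= self.issued_at + self.expires_in - margin


# Illustrative values: a token issued at t=1000 that is valid for 600 seconds
state = TokenState(expires_in=600, issued_at=1000.0)
print(state.needs_refresh(now=1500.0))  # well before expiry
print(state.needs_refresh(now=1575.0))  # inside the 30-second margin
```

Checking this state before each data request, rather than refreshing on a tight timer, keeps the number of authentication calls low.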

#### Caution: API Limit

The RDP Authentication service has the API limit described in the [RDP APIs: Limitations and Guidelines for the RDP Authentication Service](https://developers.lseg.com/en/article-catalog/article/limitations-and-guidelines-for-the-rdp-authentication-service) article. If the application floods the service with authentication request messages (both ```password``` and ```refresh_token``` grant_type) beyond the limit, the account will be blocked by the API Gateway. 

In [15]:
#step 6 - Refreshing Token

# Send HTTP Request
auth_url = f'{RDP_HOST}/auth/oauth2/v1/token'
payload = f'grant_type=refresh_token&client_id={clientId}&refresh_token={refresh_token}'
try:
    response = requests.post(auth_url, 
                             headers = {'Content-Type':'application/x-www-form-urlencoded'}, 
                             data = payload, 
                             auth = (clientId, '')
                )
except requests.exceptions.RequestException as exp:
    print(f'Caught exception: {exp}')
    raise  # stop here: otherwise 'response' would still hold the previous cell's result

if response.status_code == 200:  # HTTP Status 'OK'
    print('Refresh Token success')
    access_token = response.json()['access_token']
    refresh_token = response.json()['refresh_token']
    expires_in = int(response.json()['expires_in'])

if response.status_code != 200:
    print(f'RDP authentication failure: {response.status_code} {response.reason}')
    print(f'Text: {response.text}')

Refresh Token success


### Step 7: Revoke the Token to End the Session

This revocation mechanism allows an application to invalidate its tokens if the end-user logs out, changes identity, or exits the respective application. Notifying the authorization server that the token is no longer needed allows the authorization server to clean up data associated with that token (e.g., session data) and the underlying authorization grant.

The API requires the following HTTP Header and Credential parameter information:
- Header: 
    * Authorization = ```Basic <App Key in Base64 format>``` (please notice *the space* between the ```Basic``` and ```App Key in Base64 format``` values)
- Body parameter
    * token: The current ```Access Token``` value from the previous RDP Authentication call

The HTTP request for the RDP APIs Authentication service is as follows:

``` HTTP
POST /auth/oauth2/v1/revoke HTTP/1.1
Accept: */*
Content-Type: application/x-www-form-urlencoded
Host: api.refinitiv.com:443
Authorization: Basic <App Key in Base64>
Content-Length: XXX

token={current_Access_token}
```

In [16]:
#step 7 - Revoking Token

import base64

clientId_bytes = clientId.encode('ascii')
base64_bytes = base64.b64encode(clientId_bytes)
clientId_base64 = base64_bytes.decode('ascii')

# Send HTTP Request
auth_url = f'{RDP_HOST}/auth/oauth2/v1/revoke'
payload = f'token={access_token}'
try:
    response = requests.post(auth_url, 
                             headers = {
                                 'Content-Type':'application/x-www-form-urlencoded',
                                 'Authorization': f'Basic {clientId_base64}'
                             }, 
                             data = payload, 
                             auth = (clientId, '')
                )
except requests.exceptions.RequestException as exp:
    print(f'Caught exception: {exp}')
    raise  # stop here: otherwise 'response' would still hold the previous cell's result

if response.status_code == 200:  # HTTP Status 'OK'
    print('Revoke Token success')
if response.status_code != 200:
    print(f'RDP authentication failure: {response.status_code} {response.reason}')
    print(f'Text: {response.text}')

Revoke Token success


## <a id="references"></a>References

That brings me to the end of my ESG CFS API workflow project. For further details, please check out the following resources:
* [Refinitiv Data Platform APIs page](https://developers.lseg.com/en/api-catalog/refinitiv-data-platform/refinitiv-data-platform-apis) on the [Refinitiv Developer Community](https://developers.lseg.com/) website.
* [Refinitiv Data Platform APIs Playground page](https://api.refinitiv.com).
* [Refinitiv Data Platform APIs: Introduction to the Request-Response API](https://developers.lseg.com/en/api-catalog/refinitiv-data-platform/refinitiv-data-platform-apis/tutorials#introduction-to-the-request-response-api).
* [Refinitiv Data Platform APIs: Authorization - All about tokens](https://developers.lseg.com/en/api-catalog/refinitiv-data-platform/refinitiv-data-platform-apis/tutorials#authorization-all-about-tokens).
* [Limitations and Guidelines for the RDP Authentication Service](https://developers.lseg.com/en/article-catalog/article/limitations-and-guidelines-for-the-rdp-authentication-service) article.
* [Getting Started with Refinitiv Data Platform](https://developers.lseg.com/en/article-catalog/article/getting-start-with-refinitiv-data-platform) article.
* [ESG Data Guide](https://developers.lseg.com/en/api-catalog/refinitiv-data-platform/refinitiv-data-platform-apis/documentation#esg-data-guide)
* [ESG-Bulk CFS API User Guide](https://developers.lseg.com/en/api-catalog/refinitiv-data-platform/refinitiv-data-platform-apis/documentation#esg-bulk-cfs-api-user-guide)
* [ESG Bulk - Point in Time User Guide](https://developers.lseg.com/en/api-catalog/refinitiv-data-platform/refinitiv-data-platform-apis/documentation#esg-bulk-point-in-time-user-guide)

For any questions related to Refinitiv Data Platform APIs, please use the [RDP APIs Forum](https://community.developers.refinitiv.com/spaces/231/index.html) on the [Developers Community Q&A page](https://community.developers.refinitiv.com/).