# Green Revenue CFS File Workflow

### Importing libraries

In [8]:
import os
import sys
from dotenv import dotenv_values
config = dotenv_values(".env")

### Set RDP credentials and Initial Parameters

In [9]:
username = config['RDP_USERNAME']
password = config['RDP_PASSWORD']
clientId = config['RDP_APP_KEY']

RDP_HOST= 'https://api.refinitiv.com'
access_token = None
refresh_token = None
expires_in = 0

## <a id="rdp_workflow"></a>RDP APIs Application Workflow

The Refinitiv Data Platform entitlement check is based on the OAuth 2.0 specification. The first step of an application workflow is to get a token from the RDP Auth Service, which allows access to the protected resources, i.e. the data REST APIs.

The API requires the following access credential information:
- Username: The username. 
- Password: Password associated with the username. 
- Client ID: Also known as the ```AppKey```, generated with the App Key Generator. This unique identifier is defined for the user or application and is confidential (it must not be shared between users). The client_id parameter can be passed in the request body or as a Base64-encoded “Authorization” request header.

The HTTP request for the RDP APIs Authentication service is as follows:

``` HTTP
POST /auth/oauth2/v1/token HTTP/1.1
Accept: */*
Content-Type: application/x-www-form-urlencoded
Host: api.refinitiv.com:443
Content-Length: XXX

username=RDP_USERNAME
&password=RDP_PASSWORD
&client_id=RDP_APP_KEY
&grant_type=password
&takeExclusiveSignOnControl=true
&scope=trapi
```

Once authentication succeeds, the application reads the RDP Auth Service response message and keeps the following RDP token information in variables:
- **access_token**: The token used to invoke REST data API calls as described above. The application must keep this credential for further RDP APIs requests.
- **refresh_token**: Refresh token to be used for obtaining an updated access token before expiration. The application must keep this credential for access token renewal.
- **expires_in**: Access token validity time in seconds.

After the application receives the access token (the authorization token) from the RDP Auth Service, all subsequent REST API calls use this token to get data. For more detail on the RDP APIs workflow, see the following resources:
- [RDP APIs: Introduction to the Request-Response API](https://developers.refinitiv.com/en/api-catalog/refinitiv-data-platform/refinitiv-data-platform-apis/tutorials#introduction-to-the-request-response-api) page.
- [RDP APIs: Authorization - All about tokens](https://developers.refinitiv.com/en/api-catalog/refinitiv-data-platform/refinitiv-data-platform-apis/tutorials#authorization-all-about-tokens) page.
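
The three token fields described above can be stored together with the time they were received, so the application can tell when a renewal is due. The following is a minimal sketch and not part of the original workflow: the `TokenState` class, its `is_expiring` method, and the 60-second safety margin are hypothetical names and values chosen for illustration.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TokenState:
    """Holds the token fields plus the time the token was received."""
    access_token: str
    refresh_token: str
    expires_in: int  # validity in seconds, as returned by the Auth Service
    received_at: float = field(default_factory=time.time)

    def is_expiring(self, margin: int = 60, now: Optional[float] = None) -> bool:
        """Return True when fewer than `margin` seconds of validity remain."""
        current = time.time() if now is None else now
        return current >= self.received_at + self.expires_in - margin


# Fixed timestamps are used here so the result does not depend on the clock.
state = TokenState('example-access', 'example-refresh', 600, received_at=1000.0)
print(state.is_expiring(now=1100.0))  # False: 500 seconds of validity remain
print(state.is_expiring(now=1545.0))  # True: inside the 60-second margin
```

Checking `is_expiring()` before each data request is one way to decide when to use the refresh token; the margin value is a design choice, not an RDP requirement.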

In [10]:
# Step 1 - get an RDP access token from the RDP Auth Service

import requests
import json

# Passing a dict as `data` lets requests URL-encode each value,
# so special characters in the password are sent correctly.
auth_url = f'{RDP_HOST}/auth/oauth2/v1/token'
payload = {
    'grant_type': 'password',
    'username': username,
    'password': password,
    'client_id': clientId,
    'takeExclusiveSignOnControl': 'true',
    'scope': 'trapi'
}

response = None  # stays None if the request itself fails
try:
    response = requests.post(auth_url,
                             headers = {'Content-Type':'application/x-www-form-urlencoded'},
                             data = payload,
                             auth = (clientId, ''))
except requests.exceptions.RequestException as exp:
    print(f'Caught exception: {exp}')

if response is not None and response.status_code == 200:  # HTTP Status 'OK'
    print('Authentication success')
    token_data = response.json()
    access_token = token_data['access_token']
    refresh_token = token_data['refresh_token']
    expires_in = int(token_data['expires_in'])
elif response is not None:
    print(f'RDP authentication failure: {response.status_code} {response.reason}')
    print(f'Text: {response.text}')

Authentication success


## <a id="rdp_get_data"></a>Requesting Data from RDP APIs

That brings us to requesting the RDP APIs data. All subsequent REST API calls use the Access Token via the *Authorization* HTTP request message header as shown below to get the data. 
- Header: 
    * Authorization = ```Bearer <RDP Access Token>```

Please notice *the space* between the ```Bearer``` and ```RDP Access Token``` values.
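
As a small illustration of the header format above, the helper below builds the *Authorization* header in one place so that the space after ```Bearer``` cannot be forgotten. The function name `bearer_headers` is hypothetical and the token value is a placeholder.

```python
def bearer_headers(token: str) -> dict:
    """Build the Authorization header; note the single space after 'Bearer'."""
    if not token:
        raise ValueError('access token is empty; authenticate first')
    return {'Authorization': f'Bearer {token}'}


headers = bearer_headers('EXAMPLE_TOKEN')
print(headers)
```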

The application then creates a request message, either as a JSON body or as URL query parameters depending on the service, and sends it as an HTTP request to the Service Endpoint. Developers can look up the Service Endpoints, HTTP operations, and parameters of the RDP APIs on the Refinitiv Data Platform [API Playground page](https://api.refinitiv.com/), an interactive documentation site that is available once they have a valid Refinitiv Data Platform account.

## <a id="rdp_get_green_bulk"></a>Requesting Bulk Green Revenues Data

To request the Green Revenues Bulk data, the first step is to send an HTTP ```GET``` request to the RDP ```/file-store/v1/file-sets?bucket=bulk-greenrevenue``` endpoint to list all FileSets.

The HTTP Request structure is as follows:

```HTTP
GET /file-store/v1/file-sets?bucket=bulk-greenrevenue HTTP/1.1
Host: api.refinitiv.com
Authorization: Bearer <Access Token>
```
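
The query string of this endpoint can also be assembled from a dictionary rather than by hand, which keeps the values URL-encoded. This is a sketch under assumptions: the helper name `file_sets_url` is hypothetical, and only the ```bucket``` and ```packageId``` parameters that appear in this document are handled.

```python
from urllib.parse import urlencode

RDP_HOST = 'https://api.refinitiv.com'


def file_sets_url(bucket, package_id=None):
    """Return the CFS file-sets URL for a bucket, optionally narrowed to one package."""
    params = {'bucket': bucket}
    if package_id:
        params['packageId'] = package_id
    return f'{RDP_HOST}/file-store/v1/file-sets?{urlencode(params)}'


print(file_sets_url('bulk-greenrevenue'))
```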

In [16]:
#step 2 - list Package IDs from bucket name

CFS_url = f'{RDP_HOST}/file-store/v1/file-sets?bucket=bulk-greenrevenue'

try:
    response = requests.get(CFS_url, headers={'Authorization': f'Bearer {access_token}'})
except requests.exceptions.RequestException as exp:
    print(f'Caught exception: {exp}')


if response.status_code == 200:  # HTTP Status 'OK'
    print('Receive list Package IDs from RDP APIs')
else:
    print(f'RDP APIs: CFS request failure: {response.status_code} {response.reason}')
    print(f'Text: {response.text}')

Receive list Package IDs from RDP APIs


The following is an example of the first FileSet entry. The package ID is in the ```packageId``` field.

In [18]:
print(json.dumps(response.json()['value'][0], sort_keys=True, indent=2, separators=(',', ':')))

{
  "attributes":[
    {
      "name":"ContentType",
      "value":"GR Global Standard Full"
    }
  ],
  "availableFrom":"2023-07-16T21:04:32Z",
  "availableTo":"2023-08-16T21:04:32Z",
  "bucketName":"bulk-GreenRevenue",
  "contentFrom":"2023-07-09T20:55:00Z",
  "contentTo":"2023-07-16T20:55:00Z",
  "created":"2023-07-16T21:04:32Z",
  "files":[
    "4843-2847-82acac37-a194-89fd4c6c8521"
  ],
  "id":"402c-fd26-cdaabd41-8f36-368805d3fafc",
  "modified":"2023-07-16T21:04:35Z",
  "name":"Bulk-GR-Global-Standard-Full-v1-Jsonl-Delta-2023-07-16T21:01:26.149Z",
  "numFiles":1,
  "packageId":"4316-d43b-81c40763-8e6a-0dbec8162ab1",
  "status":"READY"
}
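
When the response holds more than one FileSet, the entries can be filtered on the fields shown above instead of always taking the first one. The sketch below works on a hand-made list: the first item is a trimmed copy of the entry above, and the second item is invented purely for illustration. The helper names `content_type` and `ready_file_sets` are hypothetical.

```python
def content_type(file_set):
    """Return the ContentType attribute value of a FileSet entry, or None."""
    for attr in file_set.get('attributes', []):
        if attr.get('name') == 'ContentType':
            return attr.get('value')
    return None


def ready_file_sets(file_sets, wanted_type):
    """Keep FileSets whose status is READY and whose ContentType matches."""
    return [fs for fs in file_sets
            if fs.get('status') == 'READY' and content_type(fs) == wanted_type]


sample = [
    {   # trimmed copy of the entry shown above
        'attributes': [{'name': 'ContentType', 'value': 'GR Global Standard Full'}],
        'files': ['4843-2847-82acac37-a194-89fd4c6c8521'],
        'packageId': '4316-d43b-81c40763-8e6a-0dbec8162ab1',
        'status': 'READY',
    },
    {   # invented entry, for illustration only
        'attributes': [{'name': 'ContentType', 'value': 'Hypothetical Other Content'}],
        'files': ['hypothetical-file-id'],
        'packageId': 'hypothetical-package-id',
        'status': 'READY',
    },
]

matches = ready_file_sets(sample, 'GR Global Standard Full')
print([fs['packageId'] for fs in matches])
```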


The next step is to choose a package ID. This example uses the package ID of the first entry.

In [19]:
packageId = response.json()['value'][0]['packageId']
packageId

'4316-d43b-81c40763-8e6a-0dbec8162ab1'

The next step is to call the CFS API with the bucket name and package ID to list all FileSets that belong to **the package ID**.

The API endpoint is ```/file-store/v1/file-sets?bucket=bulk-greenrevenue&packageId={packageId}```

The HTTP Request structure is as follows:

``` HTTP
GET /file-store/v1/file-sets?bucket=bulk-greenrevenue&packageId={packageId} HTTP/1.1
Host: api.refinitiv.com
Authorization: Bearer <Access Token>
```

In [20]:
#step 3 - get file id from bucket name and package Id

CFS_url = f'{RDP_HOST}/file-store/v1/file-sets?bucket=bulk-greenrevenue&packageId={packageId}'

try:
    response = requests.get(CFS_url, headers={'Authorization': f'Bearer {access_token}'})
except requests.exceptions.RequestException as exp:
    print(f'Caught exception: {exp}')


if response.status_code == 200:  # HTTP Status 'OK'
    print('Receive FileSets list from RDP APIs')
else:
    print(f'RDP APIs: CFS request failure: {response.status_code} {response.reason}')
    print(f'Text: {response.text}')

Receive FileSets list from RDP APIs


In [22]:
print(json.dumps(response.json()['value'][0], sort_keys=True, indent=2, separators=(',', ':')))

{
  "attributes":[
    {
      "name":"ContentType",
      "value":"GR Global Standard Full"
    }
  ],
  "availableFrom":"2023-07-16T21:04:32Z",
  "availableTo":"2023-08-16T21:04:32Z",
  "bucketName":"bulk-GreenRevenue",
  "contentFrom":"2023-07-09T20:55:00Z",
  "contentTo":"2023-07-16T20:55:00Z",
  "created":"2023-07-16T21:04:32Z",
  "files":[
    "4843-2847-82acac37-a194-89fd4c6c8521"
  ],
  "id":"402c-fd26-cdaabd41-8f36-368805d3fafc",
  "modified":"2023-07-16T21:04:35Z",
  "name":"Bulk-GR-Global-Standard-Full-v1-Jsonl-Delta-2023-07-16T21:01:26.149Z",
  "numFiles":1,
  "packageId":"4316-d43b-81c40763-8e6a-0dbec8162ab1",
  "status":"READY"
}


The File ID is in the ```files``` array.

In [23]:
file_id = response.json()['value'][0]['files'][0]
file_id

'4843-2847-82acac37-a194-89fd4c6c8521'

The last step is to download the file. Send the File ID to the RDP ```/file-store/v1/files/{file ID}/stream``` endpoint with the ```doNotRedirect=true``` query parameter; the response body then contains the file URL.

The HTTP Request structure is as follows:

``` HTTP
GET /file-store/v1/files/{fileId}/stream?doNotRedirect=true HTTP/1.1
Host: api.refinitiv.com
Authorization: Bearer <Access Token>
```

In [24]:
#step 4 - get file stream (content) URL from file id

FileID_url = f'{RDP_HOST}/file-store/v1/files/{file_id}/stream?doNotRedirect=true'

try:
    response = requests.get(FileID_url, headers={'Authorization': f'Bearer {access_token}'})
except requests.exceptions.RequestException as exp:
    print(f'Caught exception: {exp}')


if response.status_code == 200:  # HTTP Status 'OK'
    print('Receive File URL from RDP APIs')
else:
    print(f'RDP APIs: CFS request failure: {response.status_code} {response.reason}')
    print(f'Text: {response.text}')

Receive File URL from RDP APIs


The File URL is in the ```url``` attribute.

In [25]:
file_url = response.json()['url']
file_url

'https://a206464-bulk-greenrevenue.s3.amazonaws.com/Bulk-GR-Global-Standard-Full-v1/2023/07/16/Bulk-GR-Global-Standard-Full-v1-Delta-2023-07-16T21%3A01%3A26.149Z.jsonl.gz?x-request-Id=baf2e03c-eb24-491f-a8d8-d1f314617cd9&x-package-id=4316-d43b-81c40763-8e6a-0dbec8162ab1&x-client-app-id=b4842f3904fb4a1fa18234796368799086c63541&x-file-name=Bulk-GR-Global-Standard-Full-v1-Delta-2023-07-16T21%3A01%3A26.149Z.jsonl.gz&x-fileset-id=402c-fd26-cdaabd41-8f36-368805d3fafc&x-bucket-name=bulk-GreenRevenue&x-uuid=GESG1-103676&x-file-Id=4843-2847-82acac37-a194-89fd4c6c8521&x-fileset-name=Bulk-GR-Global-Standard-Full-v1-Jsonl-Delta-2023-07-16T21%3A01%3A26.149Z&x-event-external-name=cfs-claimCheck-download&X-Amz-Security-Token=IQoJb3JpZ2luX2VjEO3%2F%2F%2F%2F%2F%2F%2F%2F%2F%2FwEaCXVzLWVhc3QtMSJIMEYCIQCLkesGrfjhELLS8iDcQbrSPBMJpdBvr2WL824DpiztQAIhAMQ1ZjkJ7UPeKNNShb8fT8XPLr6%2FqjRafFF%2F%2FP1SV9%2B%2BKpoCCHUQBBoMNjQyMTU3MTgxMzI2IgxH5z9VCr1o0OiEjkAq9wEwyzK3m2J6HM5A7PT%2B%2FLHq7d%2B8EEThLK44fuVQls5iOdMlxumH

Based on the S3 ```file_url``` above, the actual file name is *Bulk-GR-Global-Standard-Full-v1-Delta-2023-07-16T21_01_26.149Z.jsonl.gz*. The ```%3A``` sequence in the URL is the URL-encoded colon (```:```), so replace each ```%3A``` with the ```_``` (underscore) character to get the file name.
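
The file name can also be derived with the standard library's URL tools rather than by splitting the string by hand. This is an alternative sketch, not the notebook's own method: the helper name `local_filename` is hypothetical, and the URL below is a shortened form of the one above with a placeholder query string.

```python
from pathlib import PurePosixPath
from urllib.parse import unquote, urlsplit


def local_filename(file_url):
    """Take the last path segment of the URL, decode it, and replace ':' with '_'."""
    path = unquote(urlsplit(file_url).path)  # '%3A' is decoded to ':'
    return PurePosixPath(path).name.replace(':', '_')


short_url = ('https://a206464-bulk-greenrevenue.s3.amazonaws.com/'
             'Bulk-GR-Global-Standard-Full-v1/2023/07/16/'
             'Bulk-GR-Global-Standard-Full-v1-Delta-2023-07-16T21%3A01%3A26.149Z.jsonl.gz'
             '?x-request-Id=placeholder')
print(local_filename(short_url))
```

Using `urlsplit` drops the query string (including the security token) before the name is extracted, so nothing from the signed URL leaks into the local file name.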

Downloading a file

In [26]:
#Download file
zipfilename = file_url.split("?")[0].split("/")[-1].replace("%3A","_")
print(f'Downloading File {zipfilename} ...')

try:
    response = requests.get(file_url)
except requests.exceptions.RequestException as exp:
    print(f'Caught exception: {exp}')

if response.status_code == 200:  # HTTP Status 'OK'
    print('Receive File Successfully')
    # Use a context manager so the file handle is closed after writing
    with open(zipfilename, 'wb') as f:
        f.write(response.content)
    print(f'{zipfilename} Saved')
else:
    print(f'RDP APIs: Request file failure: {response.status_code} {response.reason}')
    print(f'Text: {response.text}')

Downloading File Bulk-GR-Global-Standard-Full-v1-Delta-2023-07-16T21_01_26.149Z.jsonl.gz ...
Receive File Successfully
Bulk-GR-Global-Standard-Full-v1-Delta-2023-07-16T21_01_26.149Z.jsonl.gz Saved


Then unzip the file.

In [27]:
#unzip file
import gzip
import shutil
unzipfilename = zipfilename.split('.gz')[0]
print(f'Unzip to {unzipfilename} ...')
with gzip.open(zipfilename, 'rb') as f_in:
    with open(unzipfilename, 'wb') as f_out:
        shutil.copyfileobj(f_in, f_out)
print('Done')

Unzip to Bulk-GR-Global-Standard-Full-v1-Delta-2023-07-16T21_01_26.149Z.jsonl ...
Done
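
If the uncompressed copy is not needed on disk, the ```.jsonl.gz``` file can be read directly, one record at a time. The sketch below assumes the file is UTF-8 JSON Lines (one JSON object per line); the function name `iter_jsonl_gz` is hypothetical, and the demo file it reads is a tiny stand-in created on the spot, not the real bulk file.

```python
import gzip
import json
import os
import tempfile


def iter_jsonl_gz(path):
    """Yield one parsed JSON object per non-empty line of a gzip-compressed JSON Lines file."""
    with gzip.open(path, 'rt', encoding='utf-8') as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


# Create a small stand-in file so the example runs without network access.
demo_path = os.path.join(tempfile.mkdtemp(), 'demo.jsonl.gz')
with gzip.open(demo_path, 'wt', encoding='utf-8') as f:
    f.write('{"ObjectId":"1;1"}\n\n{"ObjectId":"2;2"}\n')

records = list(iter_jsonl_gz(demo_path))
print(len(records))
```

Because the function is a generator, only one line is held in memory at a time, which may matter for large bulk files.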


In [28]:
# View some data
with open(unzipfilename) as f:
    data = f.read(2000)
    print(data)

{"ObjectId":"4295863939;238","GreenRevenue":{"SectorRevenues":[{"ParentSectorCode":"ER","ParentSectorCodeDesription":"Environmental Resources","SectorCode":"ER.02","SectorCodeDescription":"Environmental Resources Key Raw Minerals & Metals","SectorRevenuePercent":"19.794","SectorTypeCode":"Subsector"},{"ParentSectorCode":"ER.02","ParentSectorCodeDesription":"Environmental Resources Key Raw Minerals & Metals","SectorCode":"ER.02.4","SectorCodeDescription":"Environmental Resources Rare Earths","SectorRevenuePercent":"19.794","SectorTypeCode":"Microsector"},{"ParentSectorCode":null,"ParentSectorCodeDesription":null,"SectorCode":"ER","SectorCodeDescription":"Environmental Resources","SectorRevenuePercent":"19.794","SectorTypeCode":"Sector"}]}}
{"ObjectId":"4295878481;294","GreenRevenue":{"SectorRevenues":[{"ParentSectorCode":"TE.03","ParentSectorCodeDesription":"Transport Equipment Road Vehicles","SectorCode":"TE.03.4","SectorCodeDescription":"Transport Equipment Electrified Road Vehicles &
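
Each line of the file is one JSON object with a nested ```SectorRevenues``` list, so a common next step is to flatten it into one row per sector. The following is a sketch that assumes every record follows the layout visible in the output above and that ```SectorRevenuePercent``` is always a numeric string. The function name `sector_rows` is hypothetical, and the sample line is a shortened copy of the first record.

```python
import json


def sector_rows(record):
    """Flatten one record into a list of per-sector rows."""
    rows = []
    for sector in record.get('GreenRevenue', {}).get('SectorRevenues', []):
        rows.append({
            'ObjectId': record['ObjectId'],
            'SectorCode': sector['SectorCode'],
            'SectorTypeCode': sector['SectorTypeCode'],
            # the percentage is stored as a string in the file
            'SectorRevenuePercent': float(sector['SectorRevenuePercent']),
        })
    return rows


line = ('{"ObjectId":"4295863939;238","GreenRevenue":{"SectorRevenues":['
        '{"ParentSectorCode":null,"ParentSectorCodeDesription":null,'
        '"SectorCode":"ER","SectorCodeDescription":"Environmental Resources",'
        '"SectorRevenuePercent":"19.794","SectorTypeCode":"Sector"}]}}')

rows = sector_rows(json.loads(line))
print(rows)
```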