This notebook transfers ERA5 training data through Globus using the globus-sdk library.

I set this up by following the minimal-script tutorial in the globus-sdk documentation: https://globus-sdk-python.readthedocs.io/en/3.x-line/user_guide/getting_started/minimal_script.html


Annabel Wade 10-22-25


*Before running this notebook, ensure you have the globus-sdk library installed. You can install it using pip:*

```python
!pip install globus-sdk
```

*Also, make sure to create a project and client for authenticating with Globus as described in the Globus SDK documentation.*

In [None]:
import globus_sdk

In [8]:
# # client UUID for the ERA5-training-transfer-app
# CLIENT_ID = "b2919900-2241-4a74-b8dc-6e2bd4e8f39a"

# # create the app object for interacting with globus_sdk
# my_app = globus_sdk.UserApp("ERA5-training-transfer-app", client_id=CLIENT_ID)

We will transfer data from the source endpoint to the destination endpoint.


source endpoint ID: c4e40965-a024-43d7-bef4-6010f3731b61

source path: /d633000/e5.oper.an.sfc/YEARMO/e5*2t*


destination endpoint ID: 79b9c32e-8780-4f0b-b809-40e45e88511e

destination path: /projectnb/eb-general/shared_data/data/processed/sfno/ERA5_SFNO/training_years/YEARMO/e5*2t* (this should mirror the source path)

*YEARMO will range from 197901 to 201512*

example of a source file name (for YEARMO 197901): e5.oper.an.sfc.128_167_2t.ll025sc.1979010100_1979013123.nc

In [12]:
# Define the YEARMO range
start_yearmo = 197901
end_yearmo = 201512
# Derive the year bounds from start_yearmo/end_yearmo instead of hard-coding them
yearmo_strings = [
    f"{year}{month:02d}"
    for year in range(start_yearmo // 100, end_yearmo // 100 + 1)
    for month in range(1, 13)
    if start_yearmo <= year * 100 + month <= end_yearmo
]
print(yearmo_strings)


['197901', '197902', '197903', '197904', '197905', '197906', '197907', '197908', '197909', '197910', '197911', '197912', '198001', '198002', '198003', '198004', '198005', '198006', '198007', '198008', '198009', '198010', '198011', '198012', '198101', '198102', '198103', '198104', '198105', '198106', '198107', '198108', '198109', '198110', '198111', '198112', '198201', '198202', '198203', '198204', '198205', '198206', '198207', '198208', '198209', '198210', '198211', '198212', '198301', '198302', '198303', '198304', '198305', '198306', '198307', '198308', '198309', '198310', '198311', '198312', '198401', '198402', '198403', '198404', '198405', '198406', '198407', '198408', '198409', '198410', '198411', '198412', '198501', '198502', '198503', '198504', '198505', '198506', '198507', '198508', '198509', '198510', '198511', '198512', '198601', '198602', '198603', '198604', '198605', '198606', '198607', '198608', '198609', '198610', '198611', '198612', '198701', '198702', '198703', '198704',

In [22]:
source_endpoint_id = "c4e40965-a024-43d7-bef4-6010f3731b61"
destination_endpoint_id = "79b9c32e-8780-4f0b-b809-40e45e88511e"

In [26]:
# loop through the YEARMO directories and transfer the variable's file
var="2t"

# Authenticate with Globus via the native app flow (replace CLIENT_ID with your
# own client UUID; a native app client has no client secret)
CLIENT_ID = "b2919900-2241-4a74-b8dc-6e2bd4e8f39a"
client = globus_sdk.NativeAppAuthClient(CLIENT_ID)
client.oauth2_start_flow(refresh_tokens=True, requested_scopes=[
        "urn:globus:auth:scope:transfer.api.globus.org:all",
        # "urn:globus:auth:scope:transfer.api.globus.org:data_access" is rejected because
        # data_access is not a Transfer scope. It is a per-collection scope,
        # https://auth.globus.org/scopes/<collection_id>/data_access, requested as a
        # dependency of the transfer "all" scope.
    ])

# Display the authorization URL
from IPython.display import display, HTML
print("Go to this URL and log in:")
display(HTML(f'<a href="{client.oauth2_get_authorize_url()}" target="_blank">Authenticate with Globus</a>'))

# Enter the authorization code
auth_code = input("Enter the authorization code here: ")

# Exchange the authorization code for tokens
token_response = client.oauth2_exchange_code_for_tokens(auth_code)

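# Extract tokens
transfer_tokens = token_response.by_resource_server["transfer.api.globus.org"]
ACCESS_TOKEN = transfer_tokens["access_token"]
REFRESH_TOKEN = transfer_tokens["refresh_token"]

# Create a TransferClient. The refresh token lets the SDK renew the access token
# when it expires, which matters for a long-running notebook session.
authorizer = globus_sdk.RefreshTokenAuthorizer(
    REFRESH_TOKEN,
    client,
    access_token=ACCESS_TOKEN,
    expires_at=transfer_tokens["expires_at_seconds"],
)
transfer_client = globus_sdk.TransferClient(authorizer=authorizer)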

print("Authentication successful!")

Go to this URL and log in:


Authentication successful!


In [27]:
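# Activate the source endpoint.
# Note: activation is a legacy step for older endpoints. Newer globus-sdk releases
# deprecate endpoint_autoactivate, so this cell may be unnecessary there.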
source_activation = transfer_client.endpoint_autoactivate(source_endpoint_id)
print(f"Source endpoint activation status: {source_activation['code']}")

# Activate the destination endpoint
dest_activation = transfer_client.endpoint_autoactivate(destination_endpoint_id)
print(f"Destination endpoint activation status: {dest_activation['code']}")


Source endpoint activation status: AutoActivated.GlobusOnlineCredential
Destination endpoint activation status: AutoActivated.GlobusOnlineCredential


Let's test the transfer with one file first, then we can scale up to all files.

In [28]:
# Define the test source and destination paths
test_source_path = "/d633000/e5.oper.an.sfc/197901/e5.oper.an.sfc.128_167_2t.ll025sc.1979010100_1979013123.nc"
test_destination_path = "/projectnb/eb-general/shared_data/data/processed/sfno/test_files/e5_2t_197901.nc"

# Create a TransferData object for the test transfer
test_transfer_data = globus_sdk.TransferData(
    transfer_client,
    source_endpoint_id,
    destination_endpoint_id,
    label="Test Transfer",
    sync_level="checksum",  # Copy a file only if its checksum differs at the destination
)

# Add the test file to the transfer
test_transfer_data.add_item(test_source_path, test_destination_path)

# Submit the test transfer
test_transfer_result = transfer_client.submit_transfer(test_transfer_data)
print(f"Test transfer submitted! Task ID: {test_transfer_result['task_id']}")

# Monitor the test transfer status, pausing between polls
import time

test_task_id = test_transfer_result["task_id"]
while True:
    test_task = transfer_client.get_task(test_task_id)
    print(f"Test Task {test_task_id} status: {test_task['status']}")
    # INACTIVE means the task is paused and will not finish on its own
    if test_task["status"] in ["SUCCEEDED", "FAILED", "INACTIVE"]:
        break
    time.sleep(15)  # Avoid polling the API in a tight loop

# Check the final status
if test_task["status"] == "SUCCEEDED":
    print("Test transfer completed successfully!")
else:
    print("Test transfer did not succeed. Check the task details for more information.")

TransferAPIError: ('POST', 'https://transfer.api.globus.org/v0.10/transfer', 'Bearer', 403, 'ConsentRequired', 'Missing required data_access consent', 'Dba0Mxufe')
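
The `ConsentRequired` error above says the token lacks a `data_access` consent for at least one of the two collections. The error text does not say which one, so the sketch below assumes it is the destination collection; that choice is an assumption, not something confirmed here. The helper name `transfer_scope_with_data_access` is hypothetical. It only builds the scope string, in which the per-collection `data_access` scope is written as an optional (`*`) dependency of the Transfer `all` scope.

```python
TRANSFER_ALL = "urn:globus:auth:scope:transfer.api.globus.org:all"


def transfer_scope_with_data_access(collection_ids):
    """Return the Transfer 'all' scope with one data_access dependency per collection."""
    deps = " ".join(
        f"*https://auth.globus.org/scopes/{cid}/data_access" for cid in collection_ids
    )
    return f"{TRANSFER_ALL}[{deps}]" if deps else TRANSFER_ALL


# Assumption: only the destination collection needs the data_access consent
scope = transfer_scope_with_data_access(["79b9c32e-8780-4f0b-b809-40e45e88511e"])
print(scope)
```

To use it, pass the string to `client.oauth2_start_flow(refresh_tokens=True, requested_scopes=scope)`, repeat the login step, and rebuild the `TransferClient` with the new tokens. If the error persists, the source collection ID may need to be added to the list as well.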

In [None]:
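import calendar

# Transfer items take exact paths, not wildcards, so build each file name explicitly.
# ASSUMPTION: every month follows the pattern of the 197901 example file; the
# 128_167 parameter code is specific to var="2t".
# All months go into a single task rather than one task per month.
transfer_data = globus_sdk.TransferData(
    transfer_client,
    source_endpoint_id,
    destination_endpoint_id,
    label=f"ERA5 {var} {start_yearmo}-{end_yearmo}",
    sync_level="checksum",  # Copy a file only if its checksum differs at the destination
)

for yearmo_str in yearmo_strings:
    # The last day of the month sets the end timestamp in the file name
    last_day = calendar.monthrange(int(yearmo_str[:4]), int(yearmo_str[4:]))[1]
    filename = f"e5.oper.an.sfc.128_167_{var}.ll025sc.{yearmo_str}0100_{yearmo_str}{last_day:02d}23.nc"
    source_path = f"/d633000/e5.oper.an.sfc/{yearmo_str}/{filename}"
    destination_path = f"/projectnb/eb-general/shared_data/data/processed/sfno/ERA5_SFNO/training_years/{yearmo_str}/e5_{var}_{yearmo_str}.nc"
    transfer_data.add_item(source_path, destination_path)

# Submit the transfer
transfer_result = transfer_client.submit_transfer(transfer_data)
print(f"Transfer submitted for {len(yearmo_strings)} months! Task ID: {transfer_result['task_id']}")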
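
The source path is described above with the pattern `e5*2t*`, but a transfer item needs an exact file path. One way to bridge the two is to list each YEARMO directory and filter the names locally. The following is a minimal sketch under stated assumptions: `matching_files` and `resolve_month` are hypothetical helper names, and the sample listing is hand-written for illustration, not real endpoint output. `resolve_month` relies on `TransferClient.operation_ls`, whose entries carry `name` and `type` fields.

```python
from fnmatch import fnmatchcase


def matching_files(entries, pattern):
    """Return the names of file entries that match a shell-style pattern."""
    return [
        entry["name"]
        for entry in entries
        if entry["type"] == "file" and fnmatchcase(entry["name"], pattern)
    ]


def resolve_month(transfer_client, endpoint_id, yearmo, var="2t"):
    """List one YEARMO directory and return the full paths matching e5*<var>*."""
    directory = f"/d633000/e5.oper.an.sfc/{yearmo}/"
    listing = transfer_client.operation_ls(endpoint_id, path=directory)
    return [directory + name for name in matching_files(listing, f"e5*{var}*")]


# Offline check with a hand-written listing (illustrative entries only)
sample_listing = [
    {"name": "e5.oper.an.sfc.128_167_2t.ll025sc.1979010100_1979013123.nc", "type": "file"},
    {"name": "e5.example_other.nc", "type": "file"},
    {"name": "e5_2t_dir", "type": "dir"},
]
matched = matching_files(sample_listing, "e5*2t*")
print(matched)
```

Calling `resolve_month(transfer_client, source_endpoint_id, "197901")` would then return the exact source paths for that month. It is worth checking that exactly one path comes back before adding it to a transfer.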