## Programmatic Access: Synchronous and Asynchronous Data Validation Using the SDMX Lab

This notebook demonstrates how to validate datasets against data structure definitions (DSDs), both synchronously and asynchronously, using REST API calls to a Fusion Metadata Registry (FMR) running inside an SDMX Lab instance.

In [1]:
import utils
import requests
import zipfile
import io
import json
import pprint
from requests.auth import HTTPBasicAuth
from getpass import getpass
from pathlib import Path

Set up your credentials for basic authorization. These will be used to authenticate REST-API calls to the Lab instance.

In [2]:
user = input("Username: ")
password = getpass("Password: ")
auth = HTTPBasicAuth(username=user, password=password)

Specify the URL endpoint for the Lab instance. This is where the REST-API calls will be sent.

In [3]:
# Define Lab instance endpoint
# This is constructed automatically from the provided user name.
fmr_url = f"https://{user}.sdmx.solutions/fmr"

Submit the provision agreement referenced by the datasets in these examples to FMR.

In [4]:
# Prepare provision agreement
endpoint = f"{fmr_url}/ws/secure/sdmxapi/rest"
path = 'provision_agreement.xml'
data = Path(path).read_text()
headers = {"Content-Type": "application/xml"}

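# Post provision agreement, reusing the HTTPBasicAuth object created above
# Note: verify=False disables TLS certificate verification; use it only against test instances
response = requests.post(
    endpoint,
    data=data,
    headers=headers,
    auth=auth,
    verify=False
)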

# Check success
if response.status_code in [200, 201]:
    print("Provision agreement successfully submitted to FMR.")
else:
    print(f"Failed to submit provision agreement. Status code: {response.status_code}")



Provision agreement successfully submitted to FMR.


Validate an SDMX-ML data file against the respective DSD using the synchronous FMR validation service. Here, the validation should pass because the data file conforms to the DSD.

In [5]:
### EXAMPLE #1 - Validate
### Input file: XML
### Scenario: PASS

# The URL of the synchronous validation web service
weservice_url = f"{fmr_url}/ws/public/data/validate"

# Path to the file you want to post
data_file_path = 'example_data_pass.xml' # validation pass test
input_type = "application/xml" # the data for validation is in XML
output_type = "application/vnd.sdmx.structurespecificdata+xml;version=2.1" # return valid and invalid output datasets as SDMX-ML 2.1 structure specific

# Define the headers
headers = {
    "Content-Type": input_type, # the data for validation is in XML
    "Inc-Metrics": "true", # include metrics in the validation report on the number of valid and invalid series and observations
    "Inc-Valid": "true", # output the valid data as an SDMX dataset in a file called 'ValidData'
    "Inc-Invalid": "true", # output the invalid data as an SDMX dataset in a file called 'InvalidData'
    "Zip": "true", # return the output packaged as a ZIP file
    "Accept": output_type # return valid and invalid output datasets as SDMX-ML 2.1 structure specific
}

utils.sync_validate(weservice_url, data_file_path, headers, auth)



File was processed successfully, continue doing useful things ...
Summary:
  1 valid series + 0 series with errors

Invalid Data:
  Structure: urn:sdmx:org.sdmx.infomodel.registry.ProvisionAgreement=BIS_EXPERTS_CAPBLDG:DATAFLOW2_BIS_EXPERTS_CAPBLDG_MY_DP(1.0)
  Series: 0
  Observations: 0
  Groups: 0

Valid Data:
  Structure: urn:sdmx:org.sdmx.infomodel.registry.ProvisionAgreement=BIS_EXPERTS_CAPBLDG:DATAFLOW2_BIS_EXPERTS_CAPBLDG_MY_DP(1.0)
  Series: 1
  Observations: 2
  Groups: 0

Datasets Summary:
  DSD: urn:sdmx:org.sdmx.infomodel.datastructure.DataStructure=BIS_EXPERTS_CAPBLDG:DS2(1.0)
  Dataflow: urn:sdmx:org.sdmx.infomodel.datastructure.Dataflow=BIS_EXPERTS_CAPBLDG:DATAFLOW2(1.0)
  Series Count: 1
  Observations Count: 2
  Groups Count: 0
    Period A: Annual (2019 to 2020)
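
Because the request sets `Zip` to `true`, the service returns its output packaged as a ZIP file. The `utils.sync_validate` helper is not shown in this notebook, so the following is only a minimal sketch of how such a payload could be unpacked in memory. The function name `unpack_zip_response` and the member name `report.json` are hypothetical; only `ValidData` and `InvalidData` are named in the header comments above.

```python
import io
import zipfile


def unpack_zip_response(payload: bytes) -> dict[str, str]:
    """Map each ZIP member name to its UTF-8 decoded content."""
    with zipfile.ZipFile(io.BytesIO(payload)) as archive:
        return {
            name: archive.read(name).decode("utf-8")
            for name in archive.namelist()
        }


# Build a small in-memory ZIP that stands in for a real response body.
# "report.json" is a hypothetical member name used only for illustration.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("report.json", '{"Errors": false}')
    archive.writestr("ValidData", "<valid/>")

files = unpack_zip_response(buffer.getvalue())
print(sorted(files))
```

With a real call, `payload` would be `response.content` from the `requests.post` call to the validation endpoint.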



Validate a CSV data file against the respective DSD using the synchronous FMR validation service. Here, the validation should fail because the data file does not conform to the DSD. This example assumes that the DSD has already been submitted to the FMR instance.

In [6]:
### EXAMPLE #2 - Validate
### Input file: CSV
### Scenario: FAIL

# The URL of the synchronous validation web service
weservice_url = f"{fmr_url}/ws/public/data/validate"

# Path to the file you want to post
data_file_path = 'example_data_fail.csv' # validation failure test
input_type = "application/text" # the data for validation is in CSV
output_type = "application/vnd.sdmx.data+csv;version=2.0.0" # return valid and invalid output datasets as SDMX-CSV

# Define the headers
headers = {
    "Content-Type": input_type, # the data for validation is in CSV
    "Inc-Metrics": "true", # include metrics in the validation report on the number of valid and invalid series and observations
    "Inc-Valid": "true", # output the valid data as an SDMX dataset in a file called 'ValidData'
    "Inc-Invalid": "true", # output the invalid data as an SDMX dataset in a file called 'InvalidData'
    "Zip": "true", # return the output packaged as a ZIP file
    "Accept": output_type # return valid and invalid output datasets as SDMX-CSV
}

utils.sync_validate(weservice_url, data_file_path, headers, auth)



Errors: True
Errors - Stop processing this file and action exception tasks ...
Summary:
  1 valid series + 4 series with errors

Invalid Data:
  Structure: urn:sdmx:org.sdmx.infomodel.registry.ProvisionAgreement=BIS_EXPERTS_CAPBLDG:DATAFLOW2_BIS_EXPERTS_CAPBLDG_MY_DP(1.0)
  Series: 4
  Observations: 5
  Groups: 0

Valid Data:
  Structure: urn:sdmx:org.sdmx.infomodel.registry.ProvisionAgreement=BIS_EXPERTS_CAPBLDG:DATAFLOW2_BIS_EXPERTS_CAPBLDG_MY_DP(1.0)
  Series: 1
  Observations: 1
  Groups: 0

Datasets Summary:
  DSD: urn:sdmx:org.sdmx.infomodel.datastructure.DataStructure=BIS_EXPERTS_CAPBLDG:DS2(1.0)
  Dataflow: urn:sdmx:org.sdmx.infomodel.datastructure.Dataflow=BIS_EXPERTS_CAPBLDG:DATAFLOW2(1.0)
  Series Count: 5
  Observations Count: 6
  Groups Count: 0
    Period A: Annual (2019 to 2023)
    Period M: Monthly (2021-12 to 2021-12)

    Validation Rule: Representation
      Error Code: REG-201-200
      Message: Dimension 'COUNTRY' is reporting value 'CA' which is not a valid repre

Validate an SDMX-ML data file against the respective DSD using the synchronous FMR validation service. Here, the validation should pass because the data file conforms to the DSD. The valid and invalid output datasets are requested as SDMX-JSON. This example assumes that the DSD has already been submitted to the FMR instance.

In [7]:
### EXAMPLE #3 - CONVERT
### Input file: XML
### OUTPUT file: JSON
### Scenario: PASS

# The URL of the synchronous validation web service
weservice_url = f"{fmr_url}/ws/public/data/validate"

# Path to the file you want to post
data_file_path = 'example_data_pass.xml' # validation pass test
input_type = "application/xml" # the data for validation is in XML
output_type = "application/vnd.sdmx.data+json;version=2.0.0" # return JSON file

# Define the headers
headers = {
    "Content-Type": input_type, # the data for validation is in XML
    "Inc-Metrics": "true", # include metrics in the validation report on the number of valid and invalid series and observations
    "Inc-Valid": "true", # output the valid data as an SDMX dataset in a file called 'ValidData'
    "Inc-Invalid": "true", # output the invalid data as an SDMX dataset in a file called 'InvalidData'
    "Zip": "true", # return the output packaged as a ZIP file
    "Accept": output_type # return valid and invalid output datasets as SDMX-JSON
}


utils.sync_validate(weservice_url, data_file_path, headers, auth)



File was processed successfully, continue doing useful things ...
Summary:
  1 valid series + 0 series with errors

Invalid Data:
  Structure: urn:sdmx:org.sdmx.infomodel.registry.ProvisionAgreement=BIS_EXPERTS_CAPBLDG:DATAFLOW2_BIS_EXPERTS_CAPBLDG_MY_DP(1.0)
  Series: 0
  Observations: 0
  Groups: 0

Valid Data:
  Structure: urn:sdmx:org.sdmx.infomodel.registry.ProvisionAgreement=BIS_EXPERTS_CAPBLDG:DATAFLOW2_BIS_EXPERTS_CAPBLDG_MY_DP(1.0)
  Series: 1
  Observations: 2
  Groups: 0

Datasets Summary:
  DSD: urn:sdmx:org.sdmx.infomodel.datastructure.DataStructure=BIS_EXPERTS_CAPBLDG:DS2(1.0)
  Dataflow: urn:sdmx:org.sdmx.infomodel.datastructure.Dataflow=BIS_EXPERTS_CAPBLDG:DATAFLOW2(1.0)
  Series Count: 1
  Observations Count: 2
  Groups Count: 0
    Period A: Annual (2019 to 2020)
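
The three synchronous examples above differ only in their `Content-Type` and `Accept` values; the remaining headers are identical. As a sketch, a small helper could assemble the header set and keep the cells shorter. The name `build_validation_headers` is an assumption made for illustration and is not part of `utils` or of FMR.

```python
def build_validation_headers(input_type: str, output_type: str) -> dict[str, str]:
    """Assemble the header set used by the synchronous examples."""
    return {
        "Content-Type": input_type,  # format of the submitted data
        "Inc-Metrics": "true",       # include series and observation counts
        "Inc-Valid": "true",         # return the valid data as 'ValidData'
        "Inc-Invalid": "true",       # return the invalid data as 'InvalidData'
        "Zip": "true",               # package the output as a ZIP file
        "Accept": output_type,       # format of the returned datasets
    }


# Same values as Example #3: XML in, SDMX-JSON out.
headers = build_validation_headers(
    "application/xml",
    "application/vnd.sdmx.data+json;version=2.0.0",
)
print(headers["Accept"])
```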



This example validates a dataset against the respective DSD using the asynchronous web service. This service is suitable for larger datasets and heavier workloads whose processing time would otherwise cause HTTP timeouts with the synchronous web service. For further information, see: https://fmrwiki.sdmxcloud.org/Asynchronous_Data_Validation_and_Transformation_Web_Service.

In [8]:
### EXAMPLE #4 - ASYNC - step 1 - load data
### Input file: XML
### Scenario: PASS

weservice_url = f"{fmr_url}/ws/public/data/load"
data_file_path = 'example_data_pass.xml' # Path to the file you want to post
headers = {'Content-Type': 'application/xml'} # the data for validation is in XML
job_status, job_id = utils.async_load(weservice_url, data_file_path, headers, auth)
print(f"Job Status: {job_status}")
print(f"Job ID: {job_id}")

Job Status: True
Job ID: fdc730c9-aceb-482d-8065-7331ea6bc4b7




In [9]:
### EXAMPLE #4 - ASYNC - step 2 - test status

weservice_url = f"{fmr_url}/ws/public/data/loadStatus"
utils.async_check_load_status(weservice_url, job_id, CHECK_INTERVAL=5, auth=auth)



False
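
The status check in step 2 takes a `CHECK_INTERVAL`, which suggests a polling loop. The implementation of `utils.async_check_load_status` is not shown here, and neither is the meaning of its return value, so the following is only a generic sketch of the polling pattern. The names `poll_until_done` and `fake_check` are hypothetical, and the status check is injected as a callable so that the loop can be exercised without a running FMR instance.

```python
import time
from typing import Callable


def poll_until_done(
    check: Callable[[], bool],
    interval: float = 5.0,
    max_attempts: int = 10,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call check() until it returns True or the attempts run out."""
    for attempt in range(max_attempts):
        if check():
            return True
        # Wait between attempts, but not after the final one.
        if attempt < max_attempts - 1:
            sleep(interval)
    return False


# Stand-in status check that reports completion on its third call.
calls = {"n": 0}


def fake_check() -> bool:
    calls["n"] += 1
    return calls["n"] >= 3


finished = poll_until_done(fake_check, interval=0.0)
print(finished, calls["n"])
```

In real use, `check` would wrap a request to the `loadStatus` endpoint for the given job ID and interpret the response; capping the number of attempts keeps a stalled job from blocking the notebook indefinitely.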

In [10]:
### EXAMPLE #4 - ASYNC - step 3 - download results

weservice_url = f"{fmr_url}/ws/public/data/download"
headers = {'Accept': 'application/vnd.sdmx.data+json'}
response = utils.async_map_and_download(weservice_url, job_id, headers, auth)
pprint.pp(response)



('{"meta":{"id":"IREF778166","prepared":"2024-10-09T12:43:38","test":false,"datasetId":"a6d5d53ba-7860-4a60-b811-a7020a61f635","sender":{"id":"Unknown"},"links":[{"rel":"self","href":"/data/download?uid=fdc730c9-aceb-482d-8065-7331ea6bc4b7","uri":"https://raw.githubusercontent.com/sdmx-twg/sdmx-json/develop/data-message/tools/schemas/1.0/sdmx-json-data-schema.json"}]},"data":{"dataSets":[{"links":[{"rel":"dataflow","urn":"urn:sdmx:org.sdmx.infomodel.datastructure.Dataflow=BIS_EXPERTS_CAPBLDG:DATAFLOW2(1.0)"}],"action":"Information","series":{"0:0:0":{"attributes":[],"observations":{"0":["2019"],"1":["2020"]}}}}],"structure":{"links":[{"rel":"dataflow","urn":"urn:sdmx:org.sdmx.infomodel.datastructure.Dataflow=BIS_EXPERTS_CAPBLDG:DATAFLOW2(1.0)"},{"rel":"provisionagreement","urn":"urn:sdmx:org.sdmx.infomodel.registry.ProvisionAgreement=BIS_EXPERTS_CAPBLDG:DATAFLOW2_BIS_EXPERTS_CAPBLDG_MY_DP(1.0)"},{"rel":"datastructure","urn":"urn:sdmx:org.sdmx.infomodel.datastructure.DataStructure=BIS_E
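
The printed output suggests that the download step returns the SDMX-JSON message as text. As a sketch under that assumption, the message can be parsed with `json.loads` and the series and observations counted. The `sample` below is a hand-written fragment that mirrors only the `data.dataSets[].series` shape visible in the output above; it is not a complete SDMX-JSON message, and `count_series_and_observations` is a hypothetical helper name.

```python
import json

# Hand-written fragment mirroring the shape of the downloaded message.
sample = json.dumps({
    "data": {
        "dataSets": [
            {
                "action": "Information",
                "series": {
                    "0:0:0": {
                        "attributes": [],
                        "observations": {"0": ["2019"], "1": ["2020"]},
                    }
                },
            }
        ]
    }
})


def count_series_and_observations(message_text: str) -> tuple[int, int]:
    """Count series and observations across all datasets in the message."""
    message = json.loads(message_text)
    n_series = 0
    n_obs = 0
    for dataset in message.get("data", {}).get("dataSets", []):
        for series in dataset.get("series", {}).values():
            n_series += 1
            n_obs += len(series.get("observations", {}))
    return n_series, n_obs


print(count_series_and_observations(sample))
```

For this fragment the counts are one series and two observations, which matches the metrics reported by the synchronous validation of the same input file.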