Let's suppose that we have some data stored in a CSV file, which corresponds to a dataflow following the BIS_DER data structure definition from the BIS.

We can create a Dataset object in SDMXThon, and load this CSV data and the related metadata:

In [34]:
import sdmxthon
from sdmxthon.model.dataset import Dataset

data_instance = Dataset()
data_instance.read_csv('input_data.csv')

metadata = sdmxthon.read_sdmx('https://stats.bis.org/api/v1/datastructure/BIS/BIS_DER/1.0?references=all&detail=full')
data_instance.structure = metadata.content['DataStructures']['BIS:BIS_DER(1.0)']

SDMXThon provides a method to do a structural validation of the data against the metadata:

In [35]:
validation_results = data_instance.semantic_validation()

print(f'The dataset has {len(validation_results)} errors:\n {[error["Message"] for error in validation_results]}')

The dataset has 3 errors:
 ['Missing value in measure OBS_VALUE', 'Missing FREQ', 'Missing OBS_STATUS']


The dataset is therefore invalid: some OBS_VALUE entries are empty, and both the dimension 'FREQ' and the mandatory attribute 'OBS_STATUS' are missing.
We can use pandas to correct the dataset:

In [36]:
data_instance.data['OBS_VALUE'] = data_instance.data['OBS_VALUE'].fillna(0)
data_instance.data['FREQ'] = 'H'
data_instance.data['OBS_STATUS'] = 'A'

validation_results = data_instance.semantic_validation()
print(f'The dataset has {len(validation_results)} errors:\n {[error["Message"] for error in validation_results]}')

The dataset has 0 errors:
 []
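
The same kinds of gaps (absent columns, empty values) can also be inspected directly with pandas. The following is only a minimal sketch on a toy DataFrame, not SDMXThon's validation logic: the values are invented, and `required_columns` is a hypothetical list standing in for the components defined in the data structure.

```python
import pandas as pd

# Toy stand-in for data_instance.data; the values are invented for illustration.
toy = pd.DataFrame({
    "TIME_PERIOD": ["2017-S1", "2017-S2", "2018-S1"],
    "OBS_VALUE": [10.0, None, 12.5],
})

# Hypothetical list of components we expect to find as columns.
required_columns = ["FREQ", "TIME_PERIOD", "OBS_VALUE", "OBS_STATUS"]

# Columns that are required but not present in the data.
missing_columns = [col for col in required_columns if col not in toy.columns]

# Number of empty (NaN) cells per column that is present.
empty_counts = toy.isna().sum()

print(missing_columns)
print(int(empty_counts["OBS_VALUE"]))
```

A check like this only tells us where the gaps are; whether filling them with a constant is appropriate depends on the data.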


Let's now suppose that we want to validate that each observation is close to the observation for the previous period, meaning that the ratio of the previous value to the current value lies between 0.8 and 1.2. Again, we can use pandas' capabilities to perform these validations:

In [60]:
# Get the list of dimensions, excluding TIME_PERIOD:
dimension_descriptor = data_instance.structure.dimension_descriptor.components
dimension_list = [key for key in dimension_descriptor]
dimension_list.remove('TIME_PERIOD')


# Add a field with the previous value of each series:
data_instance.data["previous_value"] = \
    data_instance.data.sort_values("TIME_PERIOD").groupby(dimension_list)\
            ["OBS_VALUE"].shift(1)


# Ratio of the previous value to the current one; flag rows outside [0.8, 1.2]:
data_instance.data["val_result"] = data_instance.data["previous_value"] / data_instance.data["OBS_VALUE"]
errors = data_instance.data[~data_instance.data["val_result"].between(0.8, 1.2)].dropna()

# Drop immaterial observations (keep rows where the previous or current value is above 1000):
errors = errors[(errors['previous_value'] > 1000) | (errors['OBS_VALUE'] > 1000)]

print(len(data_instance.data))
print(len(errors))

errors.to_csv('error.csv')



9914
1818
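
To see how the sort, group and shift combination behaves, here is a small sketch on a toy DataFrame. The data are invented, and the single dimension `REF_AREA` is a hypothetical stand-in for the full list of dimensions. The rows are deliberately out of order to show that the shifted values are aligned back to the original rows by index.

```python
import pandas as pd

# Invented data: two series (A and B), rows not sorted by period.
toy = pd.DataFrame({
    "REF_AREA": ["A", "A", "A", "B", "B"],
    "TIME_PERIOD": ["2017-S2", "2017-S1", "2018-S1", "2017-S1", "2017-S2"],
    "OBS_VALUE": [1500.0, 1000.0, 1600.0, 2000.0, 2100.0],
})

# Sort by period, then take the previous observation within each series.
# The result keeps the original index, so the assignment lines up row by row.
toy["previous_value"] = (
    toy.sort_values("TIME_PERIOD").groupby(["REF_AREA"])["OBS_VALUE"].shift(1)
)

# Ratio of previous to current value; the first period of each series has no
# previous value and is excluded explicitly instead of relying on dropna().
toy["ratio"] = toy["previous_value"] / toy["OBS_VALUE"]
has_previous = toy["previous_value"].notna()
flagged = toy[has_previous & ~toy["ratio"].between(0.8, 1.2)]

print(flagged[["REF_AREA", "TIME_PERIOD", "OBS_VALUE", "previous_value"]])
```

Only the 2017-S2 observation of series A is flagged here (1000 / 1500 is below 0.8). Sorting the period strings lexicographically works for this toy format, but this is an assumption that may not hold for every TIME_PERIOD representation.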


SDMXThon provides a method to generate an SDMX-ML message from a Dataset object.
The message is generated as a StringIO object, but it is also possible to set a path to save the message as a file.

In [18]:
data_instance.to_xml(outputPath='output_data.xml')

We can also use the web service of a Fusion Metadata Registry (FMR) instance, here running locally, to validate the generated data:

In [19]:
import requests

url = "http://127.0.0.1:8080/ws/public/data/load"

# Open the file in a context manager so that the handle is closed after the upload:
with open('output_data.xml', 'rb') as xml_file:
    validate_request = requests.post(url, files={'uploadFile': xml_file})

print(validate_request.text)

{"Success":true,"uid":"6235fd89-6e08-48c6-a0c9-e870f734fde1"}
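
The response carries a `Success` flag and a `uid` identifying the load request. As a minimal sketch, assuming the response always has the JSON shape printed above, a small helper can fail early when the load was not accepted and otherwise build the URL used to query the load status. The function name `status_url_from_load_response` is hypothetical and not part of any library.

```python
import json


def status_url_from_load_response(response_text, status_endpoint):
    """Return the status URL for a load request, or raise if it was rejected.

    Assumes the response body is JSON with the keys "Success" and "uid",
    as in the output printed above.
    """
    payload = json.loads(response_text)
    if not payload.get("Success"):
        raise RuntimeError(f"Load request was not accepted: {payload}")
    return f"{status_endpoint}?uid={payload['uid']}"


# Usage with the response text shown above:
sample = '{"Success":true,"uid":"6235fd89-6e08-48c6-a0c9-e870f734fde1"}'
status_url = status_url_from_load_response(
    sample, "http://127.0.0.1:8080/ws/public/data/loadStatus"
)
print(status_url)
```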


In [21]:
import json

url = "http://127.0.0.1:8080/ws/public/data/loadStatus"
uid = json.loads(validate_request.text)['uid']

result_request = requests.get(f'{url}?uid={uid}')

result = json.loads(result_request.text)

print(result['Datasets'][0]['ValidationReport'])


[{'Type': 'MandatoryAttributes', 'Errors': [{'ErrorCode': 'REG-201-051', 'Message': "Missing mandatory attribute 'DECIMALS'", 'Dataset': 0, 'ComponentId': 'DECIMALS', 'Position': 'Dataset'}, {'ErrorCode': 'REG-201-051', 'Message': "Missing mandatory attribute 'UNIT_MEASURE'", 'Dataset': 0, 'ComponentId': 'UNIT_MEASURE', 'Position': 'Dataset'}, {'ErrorCode': 'REG-201-051', 'Message': "Missing mandatory attribute 'UNIT_MULT'", 'Dataset': 0, 'ComponentId': 'UNIT_MULT', 'Position': 'Dataset'}]}, {'Type': 'TimePeriodFormat', 'Errors': [{'ErrorCode': 'REG-201-151', 'Message': "The Observation Date does not match the expected format for the frequency Dimension: 'FREQ'. Expected frequency format: 'Hourly'. Observation Date is: '2017-S1'.", 'Dataset': 0, 'ComponentId': 'TIME_PERIOD', 'ReportedValue': '2017-S1', 'Position': 'Observation', 'Keys': ['H:A:R:D:5J:K:5J:A:TO1:GBP:A:A:3:C:2017-S1', 'H:A:S:D:5J:K:5J:A:GBP:TO1:A:A:3:A:2017-S1', 'H:A:S:U:5J:K:5J:A:TO1:TO1:A:A:3:A:2017-S1', 'H:A:T:D:5J:K:5