# Comprehensive example

This example demonstrates how SDMXThon can be used to produce SDMX data from a CSV file.

Let's suppose that we have some data stored in a CSV file, which correspond to a dataflow following the BIS_DER datastructure from the BIS.

We can create a Dataset object in SDMXThon, and load this CSV data and the related metadata:

In [2]:
import sdmxthon
from sdmxthon.model.dataset import Dataset

data_instance = Dataset()
data_instance.read_csv('input_data/input_data.csv')

metadata = sdmxthon.read_sdmx('https://stats.bis.org/api/v1/datastructure/BIS/BIS_DER/1.0?references=all&detail=full')
data_instance.structure = metadata.content['DataStructures']['BIS:BIS_DER(1.0)']

SDMXThon provides a method to do a structural validation of the data against the metadata:

In [3]:
import json

validation_results = data_instance.semantic_validation()

print(f'The dataset has {len(validation_results)} errors:\n {[error["Message"] for error in validation_results]}')

The dataset has 3 errors:
 ['Missing value in measure OBS_VALUE', 'Missing FREQ', 'Missing OBS_STATUS']


The dataset is therefore invalid: OBS_VALUE contains empty values, and both the 'FREQ' dimension and the mandatory 'OBS_STATUS' attribute are missing.
We can use pandas to correct the dataset:

In [4]:
data_instance.data['OBS_VALUE'] = data_instance.data['OBS_VALUE'].fillna(0)
data_instance.data['FREQ'] = 'H'
data_instance.data['OBS_STATUS'] = 'A'

validation_results = data_instance.semantic_validation()
print(f'The dataset has {len(validation_results)} errors:\n {[error["Message"] for error in validation_results]}')

The dataset has 0 errors:
 []
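
To make the three messages above more concrete, here is a minimal pandas-only sketch of the kind of checks involved, run on a toy DataFrame. It is an illustration under assumptions, not SDMXThon's implementation: the helper `basic_structure_check`, the `required` column list and the toy data are all hypothetical.

```python
import pandas as pd


def basic_structure_check(df, required_columns, measure="OBS_VALUE"):
    """Return a list of human-readable problems found in df (hypothetical helper)."""
    problems = []
    # Columns that the structure requires but the data does not carry at all.
    for column in required_columns:
        if column not in df.columns:
            problems.append(f"Missing {column}")
    # Empty cells in the measure column.
    if measure in df.columns and df[measure].isna().any():
        problems.append(f"Missing value in measure {measure}")
    return problems


# Toy data (assumption): two observations, one of them without a value.
toy = pd.DataFrame({
    "TIME_PERIOD": ["2020-S1", "2020-S2"],
    "OBS_VALUE": [10.0, None],
})
required = ["FREQ", "TIME_PERIOD", "OBS_VALUE", "OBS_STATUS"]

before = basic_structure_check(toy, required)

# Apply the same three corrections as in the cell above.
toy["OBS_VALUE"] = toy["OBS_VALUE"].fillna(0)
toy["FREQ"] = "H"
toy["OBS_STATUS"] = "A"

after = basic_structure_check(toy, required)

print(before)
print(after)
```

The first list reports the two missing columns and the empty measure value; after the corrections the list is empty, mirroring the behaviour seen with the real dataset.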


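Let's now suppose that we want to check that each observation is within roughly 20% of the observation for the previous period: the code below flags every observation for which the ratio of the previous value to the current value falls outside the range 0.8 to 1.2. Again, we can use pandas to perform this validation: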

In [5]:
# Get the list of dimensions, excluding TIME_PERIOD:
dimension_descriptor = data_instance.structure.dimension_descriptor.components
dimension_list = list(dimension_descriptor)
dimension_list.remove('TIME_PERIOD')

# Add a field with the previous value of each series:
data_instance.data["previous_value"] = (
    data_instance.data.sort_values("TIME_PERIOD")
    .groupby(dimension_list)["OBS_VALUE"]
    .shift(1)
)

# Flag observations whose previous/current ratio falls outside [0.8, 1.2]:
data_instance.data["val_result"] = data_instance.data["previous_value"] / data_instance.data["OBS_VALUE"]
errors = data_instance.data[~data_instance.data["val_result"].between(0.8, 1.2)].dropna()

# Keep only material observations (previous or current value above 1000):
errors = errors[(errors['previous_value'] > 1000) | (errors['OBS_VALUE'] > 1000)]

print(len(data_instance.data))
print(len(errors))

errors.to_csv('error.csv')



9914
1818
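
The same steps can be followed on a toy DataFrame, which makes it easier to see what each one does. This is only a sketch: the single `SERIES` column stands in for the full list of dimensions, and the values are invented for illustration.

```python
import pandas as pd

# Toy data (assumption): two series, rows deliberately out of time order.
toy = pd.DataFrame({
    "SERIES": ["A", "A", "A", "B", "B"],
    "TIME_PERIOD": ["2020-S2", "2020-S1", "2021-S1", "2020-S1", "2020-S2"],
    "OBS_VALUE": [2000.0, 1900.0, 5000.0, 10.0, 50.0],
})

# Previous value within each series, in time order. The result keeps the
# original index, so the assignment lines up with the unsorted rows.
toy["previous_value"] = (
    toy.sort_values("TIME_PERIOD").groupby(["SERIES"])["OBS_VALUE"].shift(1)
)

# Ratio of the previous value to the current one.
toy["val_result"] = toy["previous_value"] / toy["OBS_VALUE"]

# Rows outside [0.8, 1.2]; dropna() removes the first period of each series,
# which has no previous value to compare against.
flagged = toy[~toy["val_result"].between(0.8, 1.2)].dropna()

# Keep only material rows (previous or current value above 1000).
flagged = flagged[(flagged["previous_value"] > 1000) | (flagged["OBS_VALUE"] > 1000)]

print(toy)
print(flagged)
```

Only the 2021-S1 observation of series A is flagged: its value jumps from 2000 to 5000. Series B also changes sharply (10 to 50), but both values are below the materiality threshold, so it is filtered out.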


SDMXThon provides a simple method to generate an SDMX-ML message from a Dataset object.
The message is generated as a StringIO object, but it is also possible to set a path to save the data as a file.

In [11]:
data_instance.to_xml(outputPath='output_data/output_data.xml')
print(data_instance.to_xml().getvalue()[:3000])

<?xml version="1.0" encoding="UTF-8"?>
<mes:StructureSpecificData xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:mes="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/message" xmlns:ss="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/data/structurespecific" xmlns:ns1="urn:sdmx:org.sdmx.infomodel.datastructure.DataStructure=BIS:BIS_DER(1.0):ObsLevelDim:AllDimensions" xmlns:com="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/common" xsi:schemaLocation="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/message https://registry.sdmx.org/schemas/v2_1/SDMXMessage.xsd">
	<mes:Header>
		<mes:ID>test</mes:ID>
		<mes:Test>true</mes:Test>
		<mes:Prepared>2023-03-24T09:28:03</mes:Prepared>
		<mes:Sender id="Unknown"/>
		<mes:Receiver id="Not_supplied"/>
		<mes:Structure structureID="BIS_DER" namespace="urn:sdmx:org.sdmx.infomodel.datastructure.DataStructure=BIS:BIS_DER(1.0)" dimensionAtObservation="AllDimensions">
			<com:Structure>
				<Ref agencyID="BIS" id="BIS_DER" version="
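
Since the message comes back as a StringIO object, it can be inspected with the standard library before sending it anywhere. The sketch below checks that a message is well-formed XML and reads the header ID. The `buffer` here is a hand-written stand-in, not real SDMXThon output; only the `mes` namespace URI is taken from the output above.

```python
import io
import xml.etree.ElementTree as ET

# Namespace of the SDMX-ML 2.1 message elements, as seen in the output above.
MES = "http://www.sdmx.org/resources/sdmxml/schemas/v2_1/message"

# Hand-written stand-in (assumption) for the StringIO returned by to_xml().
buffer = io.StringIO(
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    f'<mes:StructureSpecificData xmlns:mes="{MES}">'
    '<mes:Header><mes:ID>test</mes:ID><mes:Test>true</mes:Test></mes:Header>'
    '</mes:StructureSpecificData>'
)

# Parsing raises xml.etree.ElementTree.ParseError if the text is not well-formed.
root = ET.fromstring(buffer.getvalue().encode("utf-8"))

# ElementTree addresses namespaced elements as {namespace-uri}LocalName.
header_id = root.find(f"{{{MES}}}Header/{{{MES}}}ID").text

print(root.tag)
print(header_id)
```

This only confirms that the XML is well-formed; it says nothing about whether the data complies with the data structure definition.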

We can also use the web service of the FMR (Fusion Metadata Registry), here a local instance, to validate the generated data:

In [13]:
import requests

url = "http://127.0.0.1:8080/ws/public/data/load"

# Upload the generated file; the context manager closes it after the request.
with open('output_data/output_data.xml', 'rb') as xml_file:
    validate_request = requests.post(url, files={'uploadFile': xml_file})

print(validate_request.text)

{"Success":true,"uid":"913ca9c2-2598-4d98-9726-1409aea9ab5b"}


In [17]:
import time

url = "http://127.0.0.1:8080/ws/public/data/loadStatus"
uid = json.loads(validate_request.text)['uid']

time.sleep(3) # Wait for the validation to finish

result_request = requests.get(f'{url}?uid={uid}')

result = json.loads(result_request.text)

print(result['Datasets'][0]['ValidationReport'][:2])


[{'Type': 'MandatoryAttributes', 'Errors': [{'ErrorCode': 'REG-201-051', 'Message': "Missing mandatory attribute 'DECIMALS'", 'Dataset': 0, 'ComponentId': 'DECIMALS', 'Position': 'Dataset'}, {'ErrorCode': 'REG-201-051', 'Message': "Missing mandatory attribute 'UNIT_MEASURE'", 'Dataset': 0, 'ComponentId': 'UNIT_MEASURE', 'Position': 'Dataset'}, {'ErrorCode': 'REG-201-051', 'Message': "Missing mandatory attribute 'UNIT_MULT'", 'Dataset': 0, 'ComponentId': 'UNIT_MULT', 'Position': 'Dataset'}]}, {'Type': 'FormatSpecific', 'Errors': [{'ErrorCode': '-', 'Message': "Unexpected attribute 'previous_value' for element 'StructureSpecificData/DataSet/Obs'", 'Dataset': 0, 'Position': 'Dataset'}, {'ErrorCode': '-', 'Message': "Unexpected attribute 'val_result' for element 'StructureSpecificData/DataSet/Obs'", 'Dataset': 0, 'Position': 'Dataset'}]}]
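
The raw report is hard to read as a nested list. A small helper can flatten it into one line per error. This is a sketch that assumes the report always has the shape printed above (a list of sections, each with a 'Type' and a list of 'Errors'); `summarize_report` is a hypothetical helper, and `sample_report` is an abridged copy of the output above.

```python
def summarize_report(validation_report):
    """Flatten a validation report into (type, component, message) tuples."""
    rows = []
    for section in validation_report:
        for error in section.get("Errors", []):
            # 'ComponentId' is absent for some error types, so default to None.
            rows.append(
                (section.get("Type"), error.get("ComponentId"), error.get("Message"))
            )
    return rows


# Abridged copy of the report printed above (one error per section).
sample_report = [
    {"Type": "MandatoryAttributes", "Errors": [
        {"ErrorCode": "REG-201-051",
         "Message": "Missing mandatory attribute 'DECIMALS'",
         "Dataset": 0, "ComponentId": "DECIMALS", "Position": "Dataset"},
    ]},
    {"Type": "FormatSpecific", "Errors": [
        {"ErrorCode": "-",
         "Message": "Unexpected attribute 'previous_value' for element "
                    "'StructureSpecificData/DataSet/Obs'",
         "Dataset": 0, "Position": "Dataset"},
    ]},
]

rows = summarize_report(sample_report)
for error_type, component, message in rows:
    print(f"{error_type}: {message}")
```

In this run, the two 'FormatSpecific' messages refer to 'previous_value' and 'val_result', the helper columns added earlier for the pandas check. Dropping those columns from `data_instance.data` before calling `to_xml` would presumably avoid them, while the 'MandatoryAttributes' messages point to attributes that still need to be supplied.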
