Let's suppose that we have some data stored in a CSV file, which corresponds to a dataflow that follows the BIS_DER data structure from the BIS.

We can create a Dataset object in SDMXthon and load both the CSV data (make sure the input_data.csv file is in the working directory) and the related metadata:

In [1]:
import copy

import sdmxthon
from sdmxthon.model.dataset import Dataset

data_instance = Dataset(unique_id='BIS:BIS_DER(1.0)', structure_type='datastructure')
# Load the data from a CSV file:
data_instance.read_csv('input_data.csv')

metadata = sdmxthon.read_sdmx('https://stats.bis.org/api/v1/datastructure/BIS/BIS_DER/1.0?references=all&detail=full')
data_instance.structure = metadata.content['DataStructures']['BIS:BIS_DER(1.0)']

SDMXthon provides a method to structurally validate the data against the metadata:

In [2]:
validation_results = data_instance.structural_validation()

print(f'The dataset has {len(validation_results)} errors:\n')
validation_results

The dataset has 3 errors:


[{'Code': 'SS01',
  'ErrorLevel': 'CRITICAL',
  'Component': 'FREQ',
  'Type': 'Dimension',
  'Rows': None,
  'Message': 'Missing FREQ'},
 {'Code': 'SS03',
  'ErrorLevel': 'CRITICAL',
  'Component': 'OBS_STATUS',
  'Type': 'Attribute',
  'Rows': None,
  'Message': 'Missing OBS_STATUS'},
 {'Code': 'SS07',
  'Component': 'Duplicated',
  'Type': 'Datapoint',
  'Rows': [{'DER_TYPE': 'U',
    'DER_INSTR': 8,
    'DER_RISK': 'D',
    'DER_REP_CTY': '5J',
    'DER_SECTOR_CPY': 'A',
    'DER_CPC': '1E',
    'DER_SECTOR_UDL': 'A',
    'DER_CURR_LEG1': 'AUD',
    'DER_CURR_LEG2': 'TO1',
    'DER_ISSUE_MAT': 'A',
    'DER_RATING': 'A',
    'DER_EX_METHOD': 3,
    'DER_BASIS': 'C',
    'TIME_PERIOD': 2019,
    'OBS_VALUE': 1221,
    'DECIMALS': 3,
    'UNIT_MEASURE': 'USD',
    'UNIT_MULT': 6,
    'TIME_FORMAT': nan,
    'AVAILABILITY': 'K',
    'COLLECTION': 'S',
    'TITLE_TS': nan,
    'OBS_CONF': 'F',
    'OBS_PRE_BREAK': nan},
   {'DER_TYPE': 'U',
    'DER_INSTR': 8,
    'DER_RISK': 'D',
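
Because the validation results are plain dictionaries, they are easy to condense into a short report. The following is a minimal sketch that only assumes the keys visible in the output above ('Code', 'ErrorLevel', 'Component', 'Type', 'Rows'); the sample list is a hypothetical, abbreviated copy of that output, and `summarize` is an illustrative helper, not part of SDMXthon:

```python
from collections import Counter

# Hypothetical, abbreviated sample shaped like the validation output above.
validation_results = [
    {"Code": "SS01", "ErrorLevel": "CRITICAL", "Component": "FREQ",
     "Type": "Dimension", "Rows": None},
    {"Code": "SS03", "ErrorLevel": "CRITICAL", "Component": "OBS_STATUS",
     "Type": "Attribute", "Rows": None},
    {"Code": "SS07", "Component": "Duplicated", "Type": "Datapoint",
     "Rows": [{"DER_CURR_LEG1": "AUD", "TIME_PERIOD": 2019}]},
]


def summarize(results):
    """Return one short line per validation error (illustrative helper)."""
    lines = []
    for error in results:
        rows = error.get("Rows") or []             # 'Rows' may be None
        level = error.get("ErrorLevel", "n/a")     # not every entry has a level
        lines.append(
            f"{error['Code']} [{level}] {error['Component']} "
            f"({error['Type']}): {len(rows)} row(s)"
        )
    return lines


summary = summarize(validation_results)
by_type = Counter(error["Type"] for error in validation_results)

print("\n".join(summary))
print(dict(by_type))
```

Using `.get()` for 'ErrorLevel' and 'Rows' keeps the helper tolerant of entries that omit a key or carry `None`.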
    

We can also use the FMR web service to validate the generated data:

In [3]:
data_instance.fmr_validation('fmr.meaningfuldata.eu', 443, use_https=True)

[{'Type': 'MandatoryComponents',
  'Errors': [{'ErrorCode': 'REG-201-051',
    'Message': "Missing mandatory attribute 'OBS_STATUS'",
    'Dataset': 0,
    'ComponentId': 'OBS_STATUS',
    'Position': 'Observation',
    'Keys': [':U:8:D:5J:A:1E:A:AED:TO1:A:A:3:C:2019',
     ':U:8:D:5J:A:1E:A:AED:TO1:A:A:3:C:2020',
     ':U:8:D:5J:A:1E:A:ARS:TO1:A:A:3:C:2019',
     ':U:8:D:5J:A:1E:A:ARS:TO1:A:A:3:C:2020',
     ':U:8:D:5J:A:1E:A:AUD:TO1:A:A:3:C:2019',
     ':U:8:D:5J:A:1E:A:AUD:TO1:A:A:3:C:2019',
     ':U:8:D:5J:A:1E:A:BGN:TO1:A:A:3:C:2019',
     ':U:8:D:5J:A:1E:A:BHD:TO1:A:A:3:C:2019',
     ':U:8:D:5J:A:1E:A:BRL:TO1:A:A:3:C:2019',
     ':U:8:D:5J:A:1E:A:CAD:TO1:A:A:3:C:2019',
     ':U:8:D:5J:A:1E:A:CHF:TO1:A:A:3:C:2019',
     ':U:8:D:5J:A:1E:A:CLP:TO1:A:A:3:C:2019',
     ':U:8:D:5J:A:1E:A:CLP:TO1:A:A:3:C:2020']}]},
 {'Type': 'Structure',
  'Errors': [{'ErrorCode': 'REG-201-186',
    'Message': 'Missing value for Dimension FREQ',
    'Dataset': 0,
    'ComponentId': 'FREQ',
    'Position
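
The 'Keys' list returned by the FMR can also help to locate duplicated datapoints, since a duplicated observation shows up as a repeated key. As a minimal sketch using only the standard library, with the keys copied from the output above:

```python
from collections import Counter

# Series keys copied from the 'Keys' list of the FMR output above.
keys = [
    ":U:8:D:5J:A:1E:A:AED:TO1:A:A:3:C:2019",
    ":U:8:D:5J:A:1E:A:AED:TO1:A:A:3:C:2020",
    ":U:8:D:5J:A:1E:A:ARS:TO1:A:A:3:C:2019",
    ":U:8:D:5J:A:1E:A:ARS:TO1:A:A:3:C:2020",
    ":U:8:D:5J:A:1E:A:AUD:TO1:A:A:3:C:2019",
    ":U:8:D:5J:A:1E:A:AUD:TO1:A:A:3:C:2019",
    ":U:8:D:5J:A:1E:A:BGN:TO1:A:A:3:C:2019",
    ":U:8:D:5J:A:1E:A:BHD:TO1:A:A:3:C:2019",
    ":U:8:D:5J:A:1E:A:BRL:TO1:A:A:3:C:2019",
    ":U:8:D:5J:A:1E:A:CAD:TO1:A:A:3:C:2019",
    ":U:8:D:5J:A:1E:A:CHF:TO1:A:A:3:C:2019",
    ":U:8:D:5J:A:1E:A:CLP:TO1:A:A:3:C:2019",
    ":U:8:D:5J:A:1E:A:CLP:TO1:A:A:3:C:2020",
]

# A key that appears more than once points to a duplicated datapoint.
duplicated_keys = [key for key, count in Counter(keys).items() if count > 1]

print(f"{len(keys)} keys, {len(set(keys))} distinct")
print(duplicated_keys)
```

Here the only repeated key is the AUD one for 2019, which matches the duplicated datapoint reported by the structural validation.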

Thus, the dataset is invalid: it contains duplicated datapoints, and both the dimension 'FREQ' and the mandatory attribute 'OBS_STATUS' are missing.
We can use pandas to correct the dataset:

In [4]:
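# Keep a copy of the original (invalid) data:
wrong_data = copy.deepcopy(data_instance.data)
# Add the missing dimension and the missing mandatory attribute as constants:
data_instance.data['FREQ'] = 'A'
data_instance.data['OBS_STATUS'] = 'A'
# Keep only the first row for each combination of dimension values:
data_instance.data.drop_duplicates(inplace=True, keep='first', subset=data_instance.structure.dimension_codes)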

validation_results = data_instance.structural_validation()
print(f'The dataset has {len(validation_results)} errors:\n {[error["Message"] for error in validation_results]}')

The dataset has 0 errors:
 []
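
To see what the two pandas operations do in isolation, here is a small sketch on hypothetical toy data (the column values are made up for illustration and only a subset of the BIS_DER dimensions is used):

```python
import pandas as pd

# Hypothetical toy data: the first two rows share the same key.
df = pd.DataFrame({
    "DER_CURR_LEG1": ["AUD", "AUD", "CAD"],
    "TIME_PERIOD": [2019, 2019, 2019],
    "OBS_VALUE": [1221, 1300, 950],
})

# Assigning a scalar to a new column repeats it on every row.
df["FREQ"] = "A"

dimensions = ["FREQ", "DER_CURR_LEG1", "TIME_PERIOD"]

# Rows with the same dimension values count as duplicates,
# even when their OBS_VALUE differs.
duplicated_mask = df.duplicated(subset=dimensions, keep="first")
deduplicated = df.drop_duplicates(subset=dimensions, keep="first")

print(duplicated_mask.tolist())
print(deduplicated)
```

Passing the dimensions as `subset` matters: without it, two rows with the same key but different values would both be kept, and the dataset would still contain duplicated datapoints.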


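Let's now suppose that we want to validate that each observation stays close to the observation for the previous period, by checking that the ratio between the previous and the current value lies between 0.8 and 1.2 (roughly a 20% tolerance). Again, we can use pandas to perform this validation: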

In [5]:
# Get the list of dimensions, excluding TIME_PERIOD:
dimension_descriptor = data_instance.structure.dimension_descriptor.components
dimension_list = [key for key in dimension_descriptor if key != 'TIME_PERIOD']


# Add a field with the previous value of the series:
data_instance.data["previous_value"] = \
    data_instance.data.sort_values("TIME_PERIOD").groupby(dimension_list)\
            ["OBS_VALUE"].shift(1)


# Flag observations whose previous/current ratio falls outside the 0.8-1.2 range.
# Note that dropna() without a subset discards rows with NaN in any column:
data_instance.data["val_result"] = data_instance.data["previous_value"] / data_instance.data["OBS_VALUE"]
errors = data_instance.data[~data_instance.data["val_result"].between(0.8, 1.2)].dropna()

# Keep only material observations (previous or current value above 1000):
errors = errors[(errors['previous_value'] > 1000) | (errors['OBS_VALUE'] > 1000)]

print(f"Data length: {len(data_instance.data)}")
print(f"Number of errors: {len(errors)}")

errors.to_dict(orient="records")

# Delete the added fields:
data_instance.data.drop(columns=["previous_value", "val_result"], inplace=True)



Data length: 12
Number of errors: 0
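
The same previous-period check can be tried on a small, self-contained example. This is only a sketch on hypothetical data, with a single series dimension instead of the full BIS_DER key; it filters on `previous_value` explicitly, so that missing values in unrelated columns do not affect the result:

```python
import pandas as pd

# Hypothetical toy data: three series identified by DER_CURR_LEG1.
df = pd.DataFrame({
    "DER_CURR_LEG1": ["AED", "AED", "CLP", "CLP", "BGN"],
    "TIME_PERIOD": [2019, 2020, 2019, 2020, 2019],
    "OBS_VALUE": [2000.0, 2100.0, 1500.0, 3000.0, 500.0],
})

# Within each series, shift(1) brings the value of the previous period.
df = df.sort_values(["DER_CURR_LEG1", "TIME_PERIOD"])
df["previous_value"] = df.groupby("DER_CURR_LEG1")["OBS_VALUE"].shift(1)
df["ratio"] = df["previous_value"] / df["OBS_VALUE"]

# The first period of each series has no previous value, so it is skipped.
has_previous = df["previous_value"].notna()
out_of_range = ~df["ratio"].between(0.8, 1.2)
flagged = df[has_previous & out_of_range]

print(flagged[["DER_CURR_LEG1", "TIME_PERIOD", "OBS_VALUE", "previous_value"]])
```

Only the CLP observation for 2020 is flagged: its value doubled with respect to 2019, so the ratio is 0.5, whereas the AED ratio (about 0.95) stays inside the range.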


SDMXthon provides a method to generate an SDMX-ML message from a Dataset object.
The message is returned as a string, but it can also be saved to a file by setting the 'output_path' parameter.
You can use the prettyprint parameter to generate a more readable XML.

In [6]:
print(data_instance.to_xml(prettyprint=True))

<?xml version="1.0" encoding="UTF-8"?>
<mes:StructureSpecificData xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:mes="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/message" xmlns:ss="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/data/structurespecific" xmlns:ns1="urn:sdmx:org.sdmx.infomodel.datastructure.DataStructure=BIS:BIS_DER(1.0):ObsLevelDim:AllDimensions" xmlns:com="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/common" xsi:schemaLocation="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/message https://registry.sdmx.org/schemas/v2_1/SDMXMessage.xsd">
	<mes:Header>
		<mes:ID>test</mes:ID>
		<mes:Test>true</mes:Test>
		<mes:Prepared>2024-03-05T17:49:30</mes:Prepared>
		<mes:Sender id="Unknown"/>
		<mes:Receiver id="Not_supplied"/>
		<mes:Structure structureID="BIS_DER" namespace="urn:sdmx:org.sdmx.infomodel.datastructure.DataStructure=BIS:BIS_DER(1.0)" dimensionAtObservation="AllDimensions">
			<com:Structure>
				<Ref agencyID="BIS" id="BIS_DER" version="