# The SAFE Specification

Sentinel data products use a variation of the Standard Archive Format for Europe (SAFE).

The SENTINEL-SAFE format wraps a folder containing image data and metadata in XML.

A SENTINEL Product then refers to a folder containing:
- a 'manifest.safe' file which holds the general product information in XML
    - This includes file information and observation metadata (e.g. the observation mode)
- subfolders for measurement datasets containing image data in various binary formats
    - .dat for raw data, .tiff and .jp2 for higher level data
- a preview folder containing 'quicklooks' in PNG format, Google Earth overlays in KML format and HTML preview files
- an annotation folder containing the product metadata in XML as well as calibration data
    - The top-level XML files contain information such as average noise, orbital data, beam data, etc.
    - The XML files within the calibration subfolder contain calibration vectors and information about the errors
        - For how exactly these are applied, consult the technical guide
- a support folder containing the XML schemas (XSD files) describing the product XML.
    - The text within it can be used to understand the calibration data
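
The folder layout described above can be inspected with a few lines of standard-library Python. This is a minimal sketch: the function name `summarize_safe_product` is hypothetical, and it makes no assumption about which subfolders a given product actually contains.

```python
from pathlib import Path


def summarize_safe_product(safe_dir):
    """Summarize the top level of a SAFE product folder.

    Returns a dict with the names of the top-level files and, for each
    top-level subfolder, the number of files found anywhere below it.
    """
    safe_dir = Path(safe_dir)
    summary = {'files': [], 'folders': {}}
    for entry in sorted(safe_dir.iterdir()):
        if entry.is_dir():
            # Count every regular file below this subfolder, at any depth
            n_files = sum(1 for p in entry.rglob('*') if p.is_file())
            summary['folders'][entry.name] = n_files
        else:
            summary['files'].append(entry.name)
    return summary
```

Calling `summarize_safe_product` on the path of an unpacked `.SAFE` folder should list `manifest.safe` under `'files'` and the measurement, preview, annotation and support subfolders (where present) under `'folders'`.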


## Sentinel-1

### Level-0

The data files contain the raw measurement data in the form of a stream of downlinked Instrument Source Packets (ISPs).

The binary data is stored in big-endian format.
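
To illustrate what big-endian storage means in practice, the sketch below decodes a six-byte packet header with `struct`. It assumes the ISP primary header follows the CCSDS space packet layout (three 16-bit big-endian words); that layout, the function name `parse_primary_header` and the example bytes are assumptions for illustration and should be checked against the Level-0 product specification.

```python
import struct


def parse_primary_header(raw):
    """Decode a 6-byte packet primary header stored in big-endian order.

    Assumes the CCSDS space packet layout; verify against the Level-0
    product specification before relying on the field boundaries.
    """
    # '>' selects big-endian byte order, 'H' is an unsigned 16-bit integer
    word1, word2, data_length = struct.unpack('>HHH', raw[:6])
    return {
        'version': word1 >> 13,
        'packet_type': (word1 >> 12) & 0x1,
        'secondary_header_flag': (word1 >> 11) & 0x1,
        'apid': word1 & 0x07FF,
        'sequence_flags': word2 >> 14,
        'sequence_count': word2 & 0x3FFF,
        'data_length': data_length,
    }


# Made-up example bytes, not taken from a real product
example = bytes([0x0C, 0x1C, 0xC0, 0x05, 0x00, 0x3D])
header = parse_primary_header(example)
```

Reading the same bytes with little-endian order (`'<HHH'`) would swap the two bytes of each word and give different field values, which is why the byte order has to be stated explicitly.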

### Level-1




# Opening the manifest, support and annotation files

## XML

XML is the Extensible Markup Language.

## XSD

XSD is the XML Schema Definition. An XSD document expresses the constraints that the XML documents must satisfy.

## Some example files

## manifest.safe

In [2]:
# To open XML files we will be using BeautifulSoup (not a requirement, but it makes file handling easier)
# Note that the 'xml' parser used below requires the lxml package to be installed

from bs4 import BeautifulSoup


# A helper function to print truncated output
# (the parameter is named length so that it does not shadow the built-in len)
def reduce_out(obj, length=500):
    print(str(obj)[:length] + '...')


with open('S1B_IW_SLC__1SSH_20210717T014251_20210717T014306_027827_03520D_4B30.SAFE/manifest.safe') as f:
    manifest = f.read()

manifest = BeautifulSoup(manifest,'xml')

In [165]:
# The manifest is divided into several sections;
# the first is the informationPackageMap

reduce_out(manifest.informationPackageMap)

# This contains general file information

<informationPackageMap>
<xfdu:contentUnit dmdID="acquisitionPeriod platform generalProductInformation measurementOrbitReference measurementFrameSet" pdiID="processing" textInfo="Sentinel-1 IW Level-1 SLC Product" unitType="SAFE Archive Information Package">
<xfdu:contentUnit repID="s1Level1ProductSchema" unitType="Metadata Unit">
<dataObjectPointer dataObjectID="products1biw1slchh20210717t01425320210717t01430402782703520d001"/>
</xfdu:contentUnit>
<xfdu:contentUnit repID="s1Level1NoiseSchema" un...


In [160]:
# Then there is the metadataSection

reduce_out(manifest.metadataSection)

# This contains metadata object IDs, post-processing information (location, time, facilities), and much more

<metadataSection>
<metadataObject ID="products1biw1slchh20210717t01425320210717t01430402782703520d001Annotation" category="DMD" classification="DESCRIPTION">
<dataObjectPointer dataObjectID="products1biw1slchh20210717t01425320210717t01430402782703520d001"/>
</metadataObject>
<metadataObject ID="noises1biw1slchh20210717t01425320210717t01430402782703520d001Annotation" category="DMD" classification="DESCRIPTION">
<dataObjectPointer dataObjectID="noises1biw1slchh20210717t01425320210717t0143040278270...


In [162]:
# And lastly we have the dataObjectSection
reduce_out(manifest.dataObjectSection)

# This contains, for each individual xml and xsd file, the data object ID, the repID (an identifier such as s1Level1CalibrationSchema),
# the relative file path, and the MD5 checksum (used to verify downloads)

<dataObjectSection>
<dataObject ID="products1biw1slchh20210717t01425320210717t01430402782703520d001" repID="s1Level1ProductSchema">
<byteStream mimeType="text/xml" size="428278">
<fileLocation href="./annotation/s1b-iw1-slc-hh-20210717t014253-20210717t014304-027827-03520d-001.xml" locatorType="URL"/>
<checksum checksumName="MD5">4e1e15bc8e760f1424ce80eb13250e8b</checksum>
</byteStream>
</dataObject>
<dataObject ID="noises1biw1slchh20210717t01425320210717t01430402782703520d001" repID="s1Level1Noi...
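
Since each `dataObject` carries a relative path and an MD5 checksum, a download can be verified by recomputing the checksums. The following is a minimal sketch using only the standard library (`xml.etree.ElementTree` instead of BeautifulSoup); the function names `md5_of_file` and `verify_checksums` are hypothetical, and it assumes every checksum in the manifest is an MD5 hex digest, as in the output above.

```python
import hashlib
import xml.etree.ElementTree as ET
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksums(safe_dir):
    """Return {href: bool} telling whether each listed file matches its checksum."""
    safe_dir = Path(safe_dir)
    root = ET.parse(safe_dir / 'manifest.safe').getroot()
    results = {}
    for data_object in root.iter('dataObject'):
        location = data_object.find('./byteStream/fileLocation')
        checksum = data_object.find('./byteStream/checksum')
        if location is None or checksum is None:
            continue  # skip entries without a file location or checksum
        href = location.get('href')
        expected = checksum.text.strip().lower()
        results[href] = md5_of_file(safe_dir / href) == expected
    return results
```

Any entry mapped to `False` points to a file that was corrupted or only partially downloaded.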


In [178]:
# Say we want to get the footprint: we pass find() a dictionary with the ID key and the relevant identifier for the footprint
manifest.metadataSection.find('metadataObject',{'ID':'measurementFrameSet'})

<metadataObject ID="measurementFrameSet" category="DMD" classification="DESCRIPTION">
<metadataWrap mimeType="text/xml" textInfo="Frame Set" vocabularyName="SAFE">
<xmlData>
<safe:frameSet>
<safe:frame>
<safe:footPrint srsName="http://www.opengis.net/gml/srs/epsg.xml#4326">
<gml:coordinates>-8.098805,59.837269 -7.596385,57.602821 -6.688732,57.807568 -7.187776,60.036827</gml:coordinates>
</safe:footPrint>
</safe:frame>
</safe:frameSet>
</xmlData>
</metadataWrap>
</metadataObject>

In [33]:
# We now have the correct metadataObject group, and extract the coordinates

coord = manifest.metadataSection.find('metadataObject',{'ID':'measurementFrameSet'}).find('gml:coordinates')

print('Coord group:')
print(coord)

# The content of this element is plain text rather than nested xml, so we need to do the rest manually
# To get only the text data we can do:

coord = coord.getText()
print('Coord data:')
print(coord)

# And lastly we place it in a list of lists, one coordinate pair per footprint corner
# (read here as long, lat; check the axis order against the product specification)

coord = [[float(j) for j in i.split(',')] for i in coord.split(' ')]

print('Coord formatted:')
print(coord)

Coord group:
<gml:coordinates>-8.098805,59.837269 -7.596385,57.602821 -6.688732,57.807568 -7.187776,60.036827</gml:coordinates>
Coord data:
-8.098805,59.837269 -7.596385,57.602821 -6.688732,57.807568 -7.187776,60.036827
Coord formatted:
[[-8.098805, 59.837269], [-7.596385, 57.602821], [-6.688732, 57.807568], [-7.187776, 60.036827]]
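
Once the corners are parsed, a common next step is to derive a bounding box from them. The sketch below is one way to do this; the helper names `parse_gml_coordinates` and `bounding_box` are hypothetical, and the code deliberately speaks of a first and second axis because the axis order should be confirmed against the product specification.

```python
def parse_gml_coordinates(text):
    """Turn 'a,b c,d ...' into a list of (float, float) tuples."""
    return [tuple(float(v) for v in pair.split(',')) for pair in text.split()]


def bounding_box(points):
    """Return (min_first, min_second, max_first, max_second) over all points."""
    first = [p[0] for p in points]
    second = [p[1] for p in points]
    return (min(first), min(second), max(first), max(second))


footprint_text = ('-8.098805,59.837269 -7.596385,57.602821 '
                  '-6.688732,57.807568 -7.187776,60.036827')
corners = parse_gml_coordinates(footprint_text)
bbox = bounding_box(corners)
```

Using `text.split()` without an argument also tolerates repeated spaces or newlines between pairs, which `split(' ')` does not.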


## Calibration Files

Here we will extract some calibration vectors

While this will always work for the example product type, it may not work directly for others. Always double-check the input and output.

In [23]:
from bs4 import BeautifulSoup

with open('S1B_IW_SLC__1SSH_20210717T014251_20210717T014306_027827_03520D_4B30.SAFE/annotation/calibration/calibration-s1b-iw1-slc-hh-20210717t014253-20210717t014304-027827-03520d-001.xml') as f:
    iw1_calibration = f.read()

iw1_calibration = BeautifulSoup(iw1_calibration,'xml')

In [40]:
# If we want the first entry of productType we can do:
iw1_calibration.productType
# However, this returns only the first matching entry in the document, which can cause unexpected behavior.
# Use the full path unless you are sure the first match is the entry you want.
# productType, for example, is not a top-level entry; the full path to it is
iw1_calibration.calibration.adsHeader.productType

<productType>SLC</productType>

In [37]:
# Let's get all header data, using .children to iterate over all child elements
header = {}
for i in iw1_calibration.adsHeader.children:
    #We filter out all children without content
    if len(i.getText()) != 0 and not i.getText().isspace():
        header[i.name] = i.getText()
header

{'missionId': 'S1B',
 'productType': 'SLC',
 'polarisation': 'HH',
 'mode': 'IW',
 'swath': 'IW1',
 'startTime': '2021-07-17T01:42:53.624770',
 'stopTime': '2021-07-17T01:43:04.954997',
 'absoluteOrbitNumber': '27827',
 'missionDataTakeId': '217613',
 'imageNumber': '001'}
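
All header values come back as strings, so they usually need converting before use. As a small sketch (the function name `acquisition_duration` is hypothetical), the start and stop times can be parsed with `datetime.fromisoformat` to obtain the duration covered by this annotation file:

```python
from datetime import datetime


def acquisition_duration(header):
    """Return the time between startTime and stopTime in seconds."""
    start = datetime.fromisoformat(header['startTime'])
    stop = datetime.fromisoformat(header['stopTime'])
    return (stop - start).total_seconds()


# Values copied from the header dictionary shown above
example_header = {
    'startTime': '2021-07-17T01:42:53.624770',
    'stopTime': '2021-07-17T01:43:04.954997',
}
duration = acquisition_duration(example_header)
```

For the header above this gives a little over 11 seconds.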

In [41]:
# The next section is calibrationInformation, it only has one entry:
iw1_calibration.calibration.calibrationInformation

<calibrationInformation>
<absoluteCalibrationConstant>1.393000e+00</absoluteCalibrationConstant>
</calibrationInformation>

In [43]:
# To retrieve the last entry group, the calibration vectors, we can do
reduce_out(iw1_calibration.calibration.calibrationVectorList)

<calibrationVectorList count="18">
<calibrationVector>
<azimuthTime>2021-07-17T01:42:51.809714</azimuthTime>
<line>-1029</line>
<pixel count="518">0 40 80 120 160 200 240 280 320 360 400 440 480 520 560 600 640 680 720 760 800 840 880 920 960 1000 1040 1080 1120 1160 1200 1240 1280 1320 1360 1400 1440 1480 1520 1560 1600 1640 1680 1720 1760 1800 1840 1880 1920 1960 2000 2040 2080 2120 2160 2200 2240 2280 2320 2360 2400 2440 2480 2520 2560 2600 2640 2680 2720 2760 2800 2840 2880 2920 2960 3000 30...


In [48]:
import numpy as np
import datetime as dt

# Below we define a function that extracts all elements of a calibration vector and converts dates into datetime objects, single numbers into floats, and lists into numpy arrays of floats
def extract_cal_vector(cal_vector):
    # The """ text """ is a docstring; always add docstrings
    # They should include the function's purpose and a description of its input and output
    # How you format this is up to you, but it should be pythonic: readable and understandable above all else
    """ Extracts data from an xml calibration vector entry
    ----------------------------------------------------
        input:
    param: cal_vector (bs4.element.Tag) ---> xml vector entry as returned by bs4.BeautifulSoup
        
        output:
    param: calibration (dict) ---> dictionary containing xml name fields with corresponding formatted text
    """
    calibration = {}
    # We will use the .children attribute to extract each individual element in the form <identifier>content</identifier>
    # Then we will add all calibration vectors to a dictionary object for easier use
    for i  in cal_vector.children:
        # First we check if the child has content or is just newlines and whitespaces, 
        # note that i is not a string object but a bs4 Tag object
        if len(i.getText()) != 0 and not i.getText().isspace(): 
            # We get the name of the identifier
            identifier = i.name
            # Next we get the associated text
            cont = i.getText()
            # Now we will format the loaded data and assign it to our dictionary object
            if 'T' in cont:
                # If the object is a time object we load it as a datetime object
                calibration[identifier] = dt.datetime.strptime(cont, '%Y-%m-%dT%H:%M:%S.%f')
            elif len(cont.split(' ')) == 1:
                # If it is a singular value
                calibration[identifier] = float(cont)
            else:
                # Otherwise we place the items in a numpy array for further use (all data is separated by a single whitespace)
                calibration[identifier] = np.fromstring(cont, sep=' ',dtype=float)
    return calibration

In [49]:
# We can extract all calibration vectors using:
cal_vectors = iw1_calibration.calibration.calibrationVectorList.find_all('calibrationVector')
# Now we can iterate over the calibration vectors to extract their data
# We index the calibration vectors with 1-based integers; what each vector represents must be looked up in the technical and user guides
cal_vectors_dict = {}
for count, vector in enumerate(cal_vectors, start=1):
    # Extract all attributes
    cal_vectors_dict[count] = extract_cal_vector(vector)

del cal_vectors

In [50]:
# We can then check how many calibration vectors we have
cal_vectors_dict.keys()
# As we can see the max index 18 matches the count as returned to us above

dict_keys([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18])

In [51]:
# Each cal vector then has the attributes
cal_vectors_dict[1].keys()

dict_keys(['azimuthTime', 'line', 'pixel', 'sigmaNought', 'betaNought', 'gamma', 'dn'])
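
How these fields are applied is defined in the technical guide. As a rough sketch only, the code below assumes the commonly documented relation in which a calibrated value is `|DN|^2 / A^2`, where `A` is the look-up value (for example `sigmaNought`) interpolated to each pixel. The function name `calibrate_line` is hypothetical, and the sketch interpolates along range within a single vector; interpolation in azimuth between neighbouring vectors is left out.

```python
import numpy as np


def calibrate_line(dn, lut_pixel, lut_values):
    """Apply one calibration vector to one image line.

    dn         : digital numbers of the line (real or complex)
    lut_pixel  : pixel positions at which the look-up values are given
    lut_values : look-up values, e.g. the 'sigmaNought' array
    Assumes calibrated = |dn|**2 / A**2; check this against the guide.
    """
    dn = np.asarray(dn)
    columns = np.arange(dn.shape[0])
    # Linearly interpolate the subsampled look-up values to every column
    lut_full = np.interp(columns, lut_pixel, lut_values)
    return np.abs(dn) ** 2 / lut_full ** 2
```

With the dictionary built above, a call could look like `calibrate_line(line, cal_vectors_dict[1]['pixel'], cal_vectors_dict[1]['sigmaNought'])`, where `line` is one row of the measurement data loaded separately.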

## Support files

In [52]:
from bs4 import BeautifulSoup


# A helper function to print truncated output
# (the parameter is named length so that it does not shadow the built-in len)
def reduce_out(obj, length=500):
    print(str(obj)[:length] + '...')


with open('S1B_IW_SLC__1SSH_20210717T014251_20210717T014306_027827_03520D_4B30.SAFE/support/s1-level-1-noise.xsd') as f:
    support_noise = f.read()

support_noise = BeautifulSoup(support_noise,'xml')

In [111]:
# Now we create a nested dictionary object to hold all the data
support_noise_dict = {}
# We will extract all complexType elements
for i in support_noise.schema.find_all('complexType'):
    # Then we get the children of each complexType and define a temporary dict to assign each parameter to 
    tmp_dict = {}
    for j in i.children:
        # Filter out empty entries
        if len(j.getText()) != 0 and not j.getText().isspace():
            # There are a lot of newlines in here, so we do some formatting
            # Split along newlines
            text = j.getText().split('\n')
            # Filter out empty lines
            text = [k for k in text if len(k)>0]
            # Recombine the non-empty strings
            text = '\n'.join(text)
            # Assign to the temporary dict
            tmp_dict[j.name] = text
            tmp_dict[j.name] = text
    # Now we assign the tmp_dict to our final dictionary, keyed by the name attribute of the complexType
    support_noise_dict[i.get('name')]= tmp_dict


In [112]:
# Now we have our dict
support_noise_dict.keys()

dict_keys(['noiseRangeVectorType', 'noiseRangeVectorListType', 'noiseAzimuthVectorType', 'noiseAzimuthVectorListType', 'l1NoiseVectorType'])

In [113]:
support_noise_dict['noiseRangeVectorType'].keys()

dict_keys(['annotation', 'sequence'])

In [114]:
print(support_noise_dict['noiseRangeVectorType']['annotation'])

Annotation record for range noise vectors.


In [115]:
print(support_noise_dict['noiseRangeVectorType']['sequence'])

Zero Doppler azimuth time at which noise vector applies [UTC].
Image line at which the noise vector applies.
Image pixel at which the noise vector applies. This array contains the count attribute number of integer values (i.e. one value per point in the noise vector), separated by spaces. The maximum length of this array will be one value for every pixel in an image line, however in general the vectors will be subsampled.
Range thermal noise correction vector power values. This array contains the count attribute number of floating point values separated by spaces. 
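
The schema text states that each array holds as many values as its `count` attribute declares. That can be checked directly when reading an annotation file. A minimal sketch with the standard library follows; the function name `count_matches` and the sample element are made up for illustration.

```python
import xml.etree.ElementTree as ET


def count_matches(element):
    """Check that the number of space-separated values equals the count attribute."""
    declared = int(element.get('count'))
    values = element.text.split()
    return declared == len(values)


# A made-up element in the same shape as the <pixel> arrays shown earlier
sample = ET.fromstring('<pixel count="4">0 40 80 120</pixel>')
sample_ok = count_matches(sample)
```

A mismatch would suggest a truncated file or a parsing problem, so it is a cheap sanity check before converting the values to arrays.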


This concludes the introduction to XML. All methods shown above are generally applicable, but they will require some modification depending on the data you're using.


In [3]:
import os 
for subdir, dirs, files in os.walk('S2A_MSIL1C_20210801T182921_N0301_R027_T11RNQ_20210801T220817.SAFE/DATASTRIP'):
    for i in files:
        print(subdir, i)

S2A_MSIL1C_20210801T182921_N0301_R027_T11RNQ_20210801T220817.SAFE/DATASTRIP/DS_VGS2_20210801T220817_S20210801T184255 MTD_DS.xml
S2A_MSIL1C_20210801T182921_N0301_R027_T11RNQ_20210801T220817.SAFE/DATASTRIP/DS_VGS2_20210801T220817_S20210801T184255/QI_DATA SENSOR_QUALITY.xml
S2A_MSIL1C_20210801T182921_N0301_R027_T11RNQ_20210801T220817.SAFE/DATASTRIP/DS_VGS2_20210801T220817_S20210801T184255/QI_DATA FORMAT_CORRECTNESS.xml
S2A_MSIL1C_20210801T182921_N0301_R027_T11RNQ_20210801T220817.SAFE/DATASTRIP/DS_VGS2_20210801T220817_S20210801T184255/QI_DATA RADIOMETRIC_QUALITY.xml
S2A_MSIL1C_20210801T182921_N0301_R027_T11RNQ_20210801T220817.SAFE/DATASTRIP/DS_VGS2_20210801T220817_S20210801T184255/QI_DATA GEOMETRIC_QUALITY.xml
S2A_MSIL1C_20210801T182921_N0301_R027_T11RNQ_20210801T220817.SAFE/DATASTRIP/DS_VGS2_20210801T220817_S20210801T184255/QI_DATA GENERAL_QUALITY.xml


In [9]:
import os
from stat import S_ISDIR  # needed for the directory check below

def PathToDict(path):
    """Creates a nested tree structure from a file path.
    Directly taken from https://stackoverflow.com/questions/19522004/building-a-dictionary-from-directory-structure"""
    st = os.stat(path)
    result = {}
    result['active'] = True
    #result['stat'] = st
    result['full_path'] = path
    if S_ISDIR(st.st_mode):
        result['type'] = 'd'
        result['items'] = {
            name : PathToDict(path+'/'+name)
            for name in os.listdir(path)}
    else:
        result['type'] = 'f'
    return result

In [11]:
files = PathToDict('S2A_MSIL1C_20210801T182921_N0301_R027_T11RNQ_20210801T220817.SAFE')

In [35]:
for i in files['items']:
    print(type(i))

<class 'str'>
<class 'str'>
<class 'str'>
<class 'str'>
<class 'str'>
<class 'str'>
<class 'str'>
<class 'str'>


In [15]:
with open('S2A_MSIL1C_20210801T182921_N0301_R027_T11RNQ_20210801T220817.SAFE/manifest.safe','r') as f:
    file_cont = f.read()

In [24]:
import xml.etree.ElementTree as ET
import re

tree = ET.fromstring(file_cont)
vehicle = {re.sub(r'{.*}', '', node.tag): node.text for node in tree}


In [29]:
vehicle

{'informationPackageMap': '\t\t',
 'metadataSection': '\t\t\t\t\t\t',
 'dataObjectSection': '\n        '}
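
The dictionary above holds only whitespace because `node.text` is just the text directly inside each top-level element, before its first child; the actual content sits in nested elements. One way around this is to walk the tree recursively. The sketch below is a simplified illustration: the names `strip_ns` and `element_to_dict` are hypothetical, and attributes are ignored to keep it short.

```python
import re
import xml.etree.ElementTree as ET


def strip_ns(tag):
    """Remove a leading '{namespace}' from an ElementTree tag."""
    return re.sub(r'^\{.*?\}', '', tag)


def element_to_dict(node):
    """Recursively convert an element into nested dicts, lists and strings.

    Leaf elements become their stripped text. Other elements become a dict
    mapping each child tag to a list of converted children. Attributes are
    not kept in this sketch.
    """
    children = list(node)
    if not children:
        return (node.text or '').strip()
    result = {}
    for child in children:
        result.setdefault(strip_ns(child.tag), []).append(element_to_dict(child))
    return result


sample_xml = ('<root xmlns:a="urn:x"><a:item><name> n1 </name></a:item>'
              '<a:item><name>n2</name></a:item></root>')
converted = element_to_dict(ET.fromstring(sample_xml))
```

Because much of the manifest's information is stored in attributes, a complete conversion would also have to keep `node.attrib`.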

In [36]:
import pprint
import xmltodict
pprint.pprint(xmltodict.parse(file_cont))

OrderedDict([('xfdu:XFDU',
              OrderedDict([('@xmlns:gml', 'http://www.opengis.net/gml'),
                           ('@xmlns:safe',
                            'http://www.esa.int/safe/sentinel/1.1'),
                           ('@xmlns:xfdu', 'urn:ccsds:schema:xfdu:1'),
                           ('@version',
                            'esa/safe/sentinel/1.1/sentinel-2/msi/archive_l1c_user_product'),
                           ('informationPackageMap',
                            OrderedDict([('xfdu:contentUnit',
                                          OrderedDict([('@dmdID',
                                                        'acquisitionPeriod '
                                                        'platform'),
                                                       ('@pdiID', 'processing'),
                                                       ('@textInfo',
                                                        'SENTINEL-2 MSI '
                                
