Philip Crotwell edited this page Jun 29, 2023 · 34 revisions

This is a collection of Python scripts for working with responses in StationXML and ExtendedStationXML.

The basic idea is to convert a StationXML file with full responses for all channels into ExtendedStationXML, for initial loading of existing metadata for seismic stations into SIS. Generally, you want to preserve the older metadata and have it in SIS, but do not expect to make changes and do not want to track hardware inventory for these older epochs. StationXML has limited hardware information, especially when it was generated by converting existing dataless SEED or retrieved from an FDSN Station web service, which generally has the same limitations as dataless.

To reduce the volume of metadata input into SIS during this process, the script tries both to insert each unique response only once, and to skip inserting it entirely if it matches an existing NRL response. This matching is conservative: responses match only if they are exactly the same. Often, even a response that originally came from the NRL will not match if the normalization frequency has been changed, for example because it is above the Nyquist frequency for the channel. In these cases the choices are either manual intervention or simply accepting that the NRL response will not be used, because comparing responses with different reference frequencies is too computationally expensive.
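To make the "exact or nothing" idea concrete, here is a minimal sketch (not the actual sta2extsta.py logic) of a conservative stage comparison. The field names and values are simplified stand-ins for the full StationXML response.

```python
# Sketch of a conservative "exact or nothing" response-stage comparison.
# The fields (poles, zeros, normalization frequency, gain) are simplified
# stand-ins for the full StationXML response structure.

def stages_match(stage_a, stage_b):
    """A match only if every field is exactly equal -- no tolerance."""
    return stage_a == stage_b

nrl_stage = {
    "poles": [(-0.037, 0.037), (-0.037, -0.037)],
    "zeros": [(0.0, 0.0), (0.0, 0.0)],
    "normalization_frequency": 1.0,   # Hz, as shipped in the NRL
    "gain": 1500.0,
}

# Same instrument, but the normalization frequency was moved, e.g. to get
# below the channel's Nyquist -- enough to break the exact match.
station_stage = dict(nrl_stage, normalization_frequency=0.2)

print(stages_match(nrl_stage, nrl_stage))      # exact copy: matches
print(stages_match(nrl_stage, station_stage))  # changed frequency: no match
```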

Requirements

These scripts now require Python 3.

Clone this repo with:

git clone https://github.com/crotwell/2extStationXML.git

You need sisxmlparser3_0.py in the same directory; it is available here: http://wiki.anss-sis.scsn.org/SIStrac/wiki/SIS/Code or from GitHub.

You also need the NRL checked out in the same directory and named nrl:

svn checkout http://seiscode.iris.washington.edu/svn/nrl/trunk nrl

On macOS, and possibly other systems, the sisxmlparser needs the lxml Python library. See the install instructions here: http://lxml.de/installation.html#installation

You might also need dateutil: pip3 install python-dateutil

sta2extsta.py now validates the input StationXML before trying to convert it, to prevent garbage-in-garbage-out issues. To validate you need the SIS validator and Xerces in the same directory. To get them:

wget http://ftp.wayne.edu/apache/xerces/j/binaries/Xerces-J-bin.2.12.2-xml-schema-1.1.tar.gz
tar zxf Xerces-J-bin.2.12.2-xml-schema-1.1.tar.gz
wget http://wiki.anss-sis.scsn.org/SIStrac/raw-attachment/wiki/SIS/Code/validator.tar.gz
tar zxf validator.tar.gz

You also need the sis 3.0 schema file locally, which you can get with:

wget -O sis_extension_3.0.xsd https://raw.githubusercontent.com/anss-sis/extstationxml/master/sis_extension.xsd

or by cloning the sis git repo here: https://github.com/anss-sis/extstationxml

Things that might be helpful

Validation

If you want to run the validator manually, use the validate.sh shell script. This ensures that the XML matches the XML Schema definition of StationXML.

validate.sh <stationxmlfile>

You may then want to run the IRIS StationXML Validator. Unlike the previous validator, which checks the structure of the file, the IRIS StationXML Validator has rules that try to find places where the data doesn't make sense seismologically. These are things too complex to capture with the schema file, which is mainly about file organization. You can get it from GitHub and run it like:

java -jar stationxml-validator-1.7.1.jar --input <stationxmlFile>

You should probably fix any errors from the validators before attempting the conversion from StationXML to ExtendedStationXML.

Units

SIS has limitations on the units allowed for import via ExtendedStationXML. The cleanUnitNames.py script attempts to fix unit names to reflect this, or at least warn you about them. It can be run like:

python3 cleanUnitNames.py -s dummy.xml -o cleandummy.xml -v

and will output a summary of how many changes were made and which names were replaced:

Clean unit names
ok (90 changes)
    M/S => m/s
    COUNTS => counts
    M/S**2 => m/s**2

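The substitution is essentially a lookup table from SEED-style uppercase unit names to the lowercase forms SIS accepts. A minimal sketch of that idea (the actual mapping table in cleanUnitNames.py may differ):

```python
# Sketch of the kind of unit-name substitution cleanUnitNames.py performs;
# the real script's mapping table may be larger or differ in detail.
UNIT_FIXES = {
    "M/S": "m/s",
    "M/S**2": "m/s**2",
    "COUNTS": "counts",
}

def clean_unit(name, changes):
    """Replace a unit name if it is in the fix table, counting changes."""
    fixed = UNIT_FIXES.get(name, name)
    if fixed != name:
        changes[name] = changes.get(name, 0) + 1
    return fixed

changes = {}
units = ["M/S", "COUNTS", "M/S", "m/s"]
cleaned = [clean_unit(u, changes) for u in units]
print(cleaned)   # ['m/s', 'counts', 'm/s', 'm/s']
print(changes)   # {'M/S': 2, 'COUNTS': 1}
```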
Sample rate index file

To speed up the search for loggers, a file with the final sample rate of each logger RESP file is read. This way only RESP files that match the sample rate of the channel need be checked. To generate it:

python3 checkNRL.py --samplerate --nrl <path to nrl>

which will put logger_samp_rate.sort inside the nrl directory. This needs to be rerun after updating the NRL.
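The speedup comes from filtering candidate RESP files by sample rate before any parsing. A sketch of that filtering step; the actual layout of logger_samp_rate.sort is not documented here, so this assumes a simple "rate path" line format purely for illustration:

```python
# Sketch of filtering logger RESP files by final sample rate using an
# index file.  The real index is logger_samp_rate.sort; the "rate path"
# line layout below is an assumption made for this example.
INDEX_TEXT = """\
100.0 dataloggers/quanterra/RESP.q330.100
200.0 dataloggers/quanterra/RESP.q330.200
100.0 dataloggers/reftek/RESP.rt130.100
"""

def resp_files_for_rate(index_text, sample_rate):
    """Return only RESP files whose final sample rate matches the channel."""
    matches = []
    for line in index_text.splitlines():
        rate, path = line.split(None, 1)
        if float(rate) == sample_rate:
            matches.append(path)
    return matches

# Only these two files would need full parsing for a 100 sps channel.
print(resp_files_for_rate(INDEX_TEXT, 100.0))
```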

Debugging/Playing Scripts

Generally you will not run these, but they can be useful for debugging.

To match sensors and loggers in StationXML with the NRL (mainly for my own testing) do:

python3 checkNRL.py --nrl <nrlDir> -s <stationxmlFile>

To find unique responses (mainly for my own testing) do:

python3 uniqResponses.py <stationxmlFile>

The output will be a list of channels, where all channels on a line are more or less identical.

To compare responses from two channels, to see what caused them to not match, do:

python3 compare2chans.py <stationxmlFile> <chanA> <chanB>

where chanA is of the form output by uniqResponses.py, like CO.BIRD.00.HHE_2012-09-20T14:00:00
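That channel id is just the FDSN network, station, location, and channel codes joined with dots, followed by an underscore and the epoch start time. A small sketch of building one:

```python
# Sketch of the channel-id format used by uniqResponses.py and
# compare2chans.py: NET.STA.LOC.CHAN_<epoch start time>.
def channel_id(net, sta, loc, chan, start):
    """Build an id like CO.BIRD.00.HHE_2012-09-20T14:00:00."""
    return "{}.{}.{}.{}_{}".format(net, sta, loc, chan, start)

cid = channel_id("CO", "BIRD", "00", "HHE", "2012-09-20T14:00:00")
print(cid)   # CO.BIRD.00.HHE_2012-09-20T14:00:00
```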

To get a list of channels in a StationXML file do:

python3 compare2chans.py <stationxmlFile> --list

And now the main thing...

To convert StationXML to ExtendedStationXML, replacing NRL responses with nrl-subResponse, replacing all other responses with named responses using ResponseDictLink, and possibly deleting currently active epochs, do:

python3 sta2extsta.py -h

usage: sta2extsta.py [-h] -s STATIONXML [--nrl NRL] [--namespace NAMESPACE]
                     [--operator OPERATOR] [--delcurrent] [-o [OUTFILE]]

Convert StationXML to ExtendedStationXML.

optional arguments:
  -h, --help            show this help message and exit
  -s STATIONXML, --stationxml STATIONXML
                        input FDSN StationXML file, often retrieved from http://service.iris.edu/fdsnws/station/1/
  --nrl NRL             replace matching responses with links to NRL
  --namespace NAMESPACE
                        SIS namespace to use for named responses, see http://anss-sis.scsn.org/sis/master/namespace/
  --operator OPERATOR   SIS operator to use for stations, see http://anss-sis.scsn.org/sis/master/org/
  --delcurrent          remove channels that are currently operating. Only do this if you want to go back and manually via the web interface add hardware for current epochs.
  --onlychan ONLYCHAN   only channels with codes matching regular expression,
                        ie BH. for all broadband. Can also match locid like
                        '00\.HH.' Empty loc ids for filtering as '--'
  -o [OUTFILE], --outfile [OUTFILE]

For example:

curl -o co_teeba.staxml 'http://service.iris.edu/fdsnws/station/1/query?net=CO&sta=TEEBA&level=response&format=xml&includecomments=true&nodata=404'
python3 cleanUnitNames.py -s co_teeba.staxml -o co_teeba_clean.staxml --verbose
python3 sta2extsta.py -s co_teeba_clean.staxml --nrl nrl --namespace SCSN-SC --operator SCSN-SC --delcurrent --onlychan HH. -o co_teeba.extstaxml

will output ExtendedStationXML for only channels like HHE, HHN, HHZ, HH1, HH2 from TEEBA, deleting the currently active epoch, using NRL responses where possible, and outputting things in the SCSN-SC namespace and operator.

Because SIS currently does not overwrite preexisting named responses, and because it is nice to keep the number of them to a minimum, it is probably best to do one mega-run over all of your stations that do not have NRL responses. Currently the script names the named responses after the first channel it finds that contains those responses. Stations that only have NRL responses are fine to do on a piecemeal basis.
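The naming behavior described above can be sketched as a simple dedup-by-content pass: each distinct response gets one name, taken from the first channel that carries it. Real responses are full StationXML stage trees; strings stand in for them here, and the "resp_" name prefix is illustrative.

```python
# Sketch of "insert each unique response once, named after the first
# channel that carries it".  Strings stand in for full response trees,
# and the "resp_" prefix is an illustrative naming choice.
def name_unique_responses(chan_to_resp):
    names = {}       # response content -> assigned name
    chan_name = {}   # channel id -> name of its shared response
    for chan, resp in chan_to_resp.items():
        if resp not in names:
            names[resp] = "resp_" + chan   # named after first channel seen
        chan_name[chan] = names[resp]
    return chan_name

resp_map = {
    "CO.BIRD.00.HHE": "poleszeros-A",
    "CO.BIRD.00.HHN": "poleszeros-A",   # identical: reuses the HHE name
    "CO.JSC.00.HHZ": "poleszeros-B",
}
print(name_unique_responses(resp_map))
```

Two identical responses yield a single named response, which is why one mega-run over all stations keeps the named-response count minimal.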

Be aware that the NRL matching is more or less exact-or-nothing. We assume it is better to not match something in the NRL than to match something that is not right. In particular, if everything is the same but the frequency for the stage gain has been changed to match the passband of the channel, for example because the NRL frequency is above the Nyquist, it will not match.

To see why a given channel did not match, you can do:

python3 isnrl.py -s <stationxml> -c <channelid> --nrl NRLdir --sensordir kinemetrics

to compare the sensor to the responses in the kinemetrics NRL subdir, or

python3 isnrl.py -s <stationxml> -c <channelid> --nrl NRLdir --loggerdir quanterra

to compare the logger stages to the responses in the quanterra NRL subdir. A FAIL or MATCH line will be printed for each RESP file in the given subdir along with a reason in the case of failures.

Speed

The slowest part of this process is parsing all of the NRL RESP files to check whether they match, and in particular the datalogger parsing, as there are simply so many of them. Doing the "sample rate" check above helps, but it can still be very slow. However, if you know you have a limited variety of dataloggers, you can manually delete subdirectories within the NRL datalogger directory. For example, if you know that you never operated any Guralp dataloggers, ever, and you are really, really positive about this and have triple-checked, then deleting the dataloggers/guralp subdirectory means the script will not have to read all of those RESP files and so will run faster. This is especially helpful when testing on a small number of stations/channels where you know the types of hardware that might be there.
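If you would rather not delete anything from your NRL checkout, the same effect can be sketched as a keep-list filter over the datalogger subdirectories: only makes you actually operated get scanned. The directory names below are illustrative.

```python
# Sketch of restricting the NRL datalogger scan to known hardware makes
# instead of deleting subdirectories.  Names here are illustrative.
all_subdirs = ["guralp", "kinemetrics", "nanometrics", "quanterra", "reftek"]
operated_makes = {"quanterra", "reftek"}   # your known hardware only

# Only these subdirectories' RESP trees would need parsing.
to_scan = [d for d in all_subdirs if d in operated_makes]
print(to_scan)   # ['quanterra', 'reftek']
```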