Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New Spectrum() and SpectrumStream() classes #50

Merged
merged 18 commits into from
Apr 6, 2024
Merged

Conversation

claudiodsf
Copy link
Member

@claudiodsf claudiodsf commented Mar 28, 2024

For an ongoing project, I need to save spectra to the output directory.

This motivated me to tackle the technical debt which was accumulating in the Spectrum() class.

This is the first class I wrote 12 years ago (!) and at that time I decided that a quick and dirty –yet functional– solution was to make it inherit from ObsPy Trace() object.
One of the limits of this approach was that there wasn't any easy option (except of using pickle) to save spectra to disk.

So, I rewrote the Spectrum() class from scratch, while trying to keep the same interface of the previous one, with one difference:

  • The method Spectrum().get_freq() does not exist anymore: it has been replaced by the class attribute Spectrum().freq. I chose indeed to store all the frequencies, and not only the starting frequency and the frequency delta. In this way, we have the same interface for Spectrum().freq and Spectrum().freq_logspaced.

The main improvements are the following:

  • The new Spectrum() class, of course 😄

  • A new SpectrumStream() class, similar to ObsPy Stream(), to store Spectrum() objects. (Note that ObsPy Stream() will no longer accept Spectrum() objects, since they do not inherit anymore from Trace()).

  • Both Spectrum() and SpectrumStream() have a .write() method to save the spectra to disk in HDF5 or TEXT format. Both formats use YAML for serializing the metadata in Spectrum().stats.

  • The new HDF5 format is used for storing spectral residuals (replacing the pickle format).

  • A new config option save_spectra (default: False) is introduced, to save spectra to the output dir.

  • Finally, I added the function read_spectra() (similar to ObsPy read()) which will read an HDF5 or TEXT file and return a SpectrumStream() with one or more Spectrum() objects.

I thoroughly tested these new classes with my default examples and with many other events: the results are consistent, with some small (~1%) differences arising sometimes, probably because of the different way frequencies are stored.

However, before merging this PR, I'm calling @krisvanneste and @rcabdia to kindly check that the new code does not introduce any regression in your use cases, since you are both using the SourceSpec API.

Thanks! ☺️
Claudio

@krisvanneste
Copy link
Collaborator

Claudio,

I have an option to write the processed traces and the spectra (as "auxiliary" data) to an ASDF file (which is based on HDF5). This will need to be revised, but I should not otherwise be affected. I will test this tomorrow or beginning of next week.

Actually, it would be nice if obspy itself would support writing to HDF5, including the header information. None of the currently supported formats support preserving the metadata.

@claudiodsf
Copy link
Member Author

I have an option to write the processed traces and the spectra (as "auxiliary" data) to an ASDF file (which is based on HDF5). This will need to be revised, but I should not otherwise be affected. I will test this tomorrow or beginning of next week.

Thanks Kris!

Actually, it would be nice if obspy itself would support writing to HDF5, including the header information. None of the currently supported formats support preserving the metadata.

ObsPy Stream().write() and Trace().write() have a PICKLE option which should preserve everything. But that comes with all the limitations of pickle, namely dependency on the library versions.

A more portable HDF5 format (lighter than ASDF and built-in) would be definitively nice. Maybe you can take some inspiration from this work and propose a PR to ObsPy 😉?

@claudiodsf claudiodsf force-pushed the new_spectrum_class branch 2 times, most recently from bdd5bfe to 02e346a Compare March 29, 2024 14:53
@krisvanneste
Copy link
Collaborator

After a clean switch to the new_spectrum_class branch, I run into this error:

  File "C:\Users\kris\Documents\Python\cloned_repos\sourcespec\sourcespec\ssp_build_spectra.py", line 740, in build_spectra
    weight_st = _build_weight_spectral_stream(config, spec_st, specnoise_st)

  File "C:\Users\kris\Documents\Python\cloned_repos\sourcespec\sourcespec\ssp_build_spectra.py", line 520, in _build_weight_spectral_stream
    weight = _build_weight_from_inv_frequency(spec_h)

  File "C:\Users\kris\Documents\Python\cloned_repos\sourcespec\sourcespec\ssp_build_spectra.py", line 443, in _build_weight_from_inv_frequency
    snr_fmin = getattr(spec.stats, 'spectral_snratio_fmin', None)

  File "C:\Users\kris\Documents\Python\cloned_repos\sourcespec\sourcespec\spectrum.py", line 54, in __getattr__
    return self[name]

KeyError: 'spectral_snratio_fmin'

This is very strange, as getattr should not throw an error if a default value (in this case None) is provided...
No hurry, as my weekend starts in a few minutes!

@claudiodsf
Copy link
Member Author

No hurry, as my weekend starts in a few minutes!

Will look into that on Tuesday!

Happy Easter! 🐣

@krisvanneste
Copy link
Collaborator

I investigated this problem a bit further. It happens for traces with no or too short noise window:

GR.GRA1..BH: No available noise window: a uniform weight will be applied

In that case, the spectrum header fields spectral_snratio_fmin and spectral_snratio_fmax are not set in the _check_spectral_sn_ratio function, and due to the way the __getattr__ method of AttribDict is implemented, getattr fails even if a default value is specified. The AttribDict in obspy does work with a default value.

The solution is to either set spectral_snratio_fmin and spectral_snratio_fmax also if there is no noise window, or perhaps revert to using obspy's AttribDict.

Happy Easter to you as well!

@krisvanneste
Copy link
Collaborator

A suggestion for writing to HDF5:

  • maybe all spectra could be grouped in the group 'Spectra'
  • maybe name the spectra by nslc-code (network-station-location-channel) instead of just numbering them
    ?

@claudiodsf
Copy link
Member Author

I investigated this problem a bit further. It happens for traces with no or too short noise window:

GR.GRA1..BH: No available noise window: a uniform weight will be applied

In that case, the spectrum header fields spectral_snratio_fmin and spectral_snratio_fmax are not set in the _check_spectral_sn_ratio function, and due to the way the __getattr__ method of AttribDict is implemented, getattr fails even if a default value is specified. The AttribDict in obspy does work with a default value.

The solution is to either set spectral_snratio_fmin and spectral_snratio_fmax also if there is no noise window, or perhaps revert to using obspy's AttribDict.

Happy Easter to you as well!

Hello, this should be fixed with the latest commit.

I prefer to use my implementation of AttributeDict because I would like the Spectrum() class to be independent of ObsPy (except for the Spectrum().from_obspy_trace() method, which lazy-imports obspy.core.trace.Trace).

Maybe, in the future, I'll use this AttributeDict class all along the code 😉

@claudiodsf
Copy link
Member Author

A suggestion for writing to HDF5:

  • maybe all spectra could be grouped in the group 'Spectra'

You mean something like:

HDF5_file
└──Spectra
    ├──spectrum_0000
    ├──spectrum_0001

?

What would be the advantage to have this top-level group? Would there be any other group than Spectra in the spectrum HDF5 file?

  • maybe name the spectra by nslc-code (network-station-location-channel) instead of just numbering them
    ?

The numbering is meant to reflect the order in the SpectrumStream object, which behaves like a list: Spectrum objects are accessed using an index.
In your example, what should happen if two spectra have the same nslc-code?

@claudiodsf
Copy link
Member Author

Force-pushed a better solution. Please verify again 😉 🙏

@krisvanneste
Copy link
Collaborator

Claudio,

The problem with the getattr call in the _check_spectral_sn_ratio function is solved now.
However, there is another problem. Actually, the traces that triggered the first problem because there was no noise window, do have a noise window. When I run them with the main branch and a fixed window length, the trace plot looks like this:
651 traces 00
When I switch to the new_spectral_class branch without any modification in my script, the trace plot is very different:
651 traces 00
Apparently, something goes wrong with the window definitions.

@krisvanneste
Copy link
Collaborator

Concerning the suggestions for the HDF5 format:

  • The idea for the "Spectrum" group is that maybe at some point you will want to write your processed traces to HDF5 as well. This would allow them to be written to the same file, but also if you don't it would still be helpful to know which kind of data is stored in the HDF5 file.
  • The advantage of using nslc codes as keys in the HDF5 file is that this makes it easy to extract the spectrum of a particular station. Without this, you either need to know the original index number of the spectrum or you need to read all spectra and check their nslc codes. As you point out, this does not allow duplicate nslc codes, but I think this is a very rare situation.

These are just suggestions, based on my experience with ASDF files. It's of course your decision, but it could promote this HDF5 format as a more general format for seismic data...

@claudiodsf
Copy link
Member Author

That's nasty! The time window has totally changed!

Let's try something first: I'll rebase this branch on main so that it will include the latest modificaitons on time window.

@claudiodsf
Copy link
Member Author

Rebased and force-pushed.

Could you please force-pull and try again?

@krisvanneste
Copy link
Collaborator

The problem is gone after force-pushing.
I still get empty noise windows for some stations, but that was also the case with the main branch, so nothing to do with the new spectrum class. I will investigate this later.

@claudiodsf
Copy link
Member Author

These are just suggestions, based on my experience with ASDF files. It's of course your decision, but it could promote this HDF5 format as a more general format for seismic data...

I'm convinced! 👍

See the last commit, where I implemented your idea, with the only difference of using spectrum_NNNNN_NET.STA.LOC.CHAN for keys 😉

@claudiodsf
Copy link
Member Author

@krisvanneste, ok for you to merge?

@krisvanneste
Copy link
Collaborator

Claudio,
I still have a question. I ran my test case again with the option to save the spectra. I then read the saved spectra and compared them to those I collect in my wrapper function (and which come directly from the build_spectra function. Strangely, they are not the same...
This is the spectrum for station GR.GRA1 written to the HDF5 file:
Spectrum GR.GRA1..BHH | 225 samples, 0.23-5.00 Hz | 0.02 Hz sample interval | 68 samples logspaced, 0.23-5.12 Hz | 0.02 log10([Hz]) sample interval logspaced
image
This is the one I collect in my wrapper function:
Spectrum GR.GRA1..BHH | 225 samples, 0.23-5.00 Hz | 0.02 Hz sample interval | 68 samples logspaced, 0.23-5.12 Hz | 0.02 log10([Hz]) sample interval logspaced
image
And this is what it looks like in the sourcespec plot:
image
I have the impression that the saved spectra have not been scaled yet. Is this the intention?

@claudiodsf
Copy link
Member Author

Very strange!

Spectra are written after all the manipulations and after the inversion (see here).

Looking at your figure, I have the impression that you read the .residuals.hdf5 file (and not the .spectra.hdf5 file).
Is that possible?

@krisvanneste
Copy link
Collaborator

Claudio,
Yes, sorry, a question I forgot to ask was: why is the output file called .residuals.hdf5?
That explains it then. However, I do not see a .spectra.hdf5 file in the output folder...

@krisvanneste
Copy link
Collaborator

OK, got it sorted now. I had to add a line to save the spectra in my wrapper function.
I see the HDF5 file contains both the actual spectra and the fitted spectra.

@claudiodsf
Copy link
Member Author

Sorry for late reply.

Yes, there is a new config option called save_spectra 😉

@krisvanneste
Copy link
Collaborator

Yes, but I also had to call the save_spectra function in my wrapper.

@claudiodsf
Copy link
Member Author

So, good to merge? 🙃

@krisvanneste
Copy link
Collaborator

Yes, go ahead!

Class is redesigned so that it does not inherit anymore from ObsPy's
Trace class. This allows to have a more flexible class that can be
expanded in the future.

Also add a SpectrumStream class to handle a collection of
Spectrum objects.
@claudiodsf claudiodsf merged commit 5c233b7 into main Apr 6, 2024
4 checks passed
@claudiodsf claudiodsf deleted the new_spectrum_class branch April 6, 2024 18:06
@claudiodsf
Copy link
Member Author

Merged!!! 🚀

@claudiodsf
Copy link
Member Author

Thanks Kris for the review and the suggestions 🙏

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants