CWB data ingestion into seiscomp3 #8

Closed
basaks opened this issue Aug 30, 2017 · 8 comments
@basaks
Contributor

basaks commented Aug 30, 2017

This is how this can proceed:

  1. Use cwb query to download gap-filled and overlap-corrected miniseed from the CWB server.
  2. Push the miniseed files into the slarchive archive (scart can be used for this).
  3. Push events (from antelope for now) into sc3. Make sure the events overlap the timeseries data. This will help: https://github.com/GeoscienceAustralia/passive-seismic/tree/master/antelope.
  4. Test that event-based miniseed queries work via the sc3 utilities (a combination of scevtstreams and scart).
  5. Automate the whole process.

POC target: 1 month's worth of historical data for all primary stations in aws.

Example of cwb query: query -h localhost -t ms -s ".*" -b "2005-02-08 05:37:52.98" -d 1000
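A minimal sketch of how the query above could be scripted for the one-month POC target, one call per day. The flags are copied from the example; the helper names, the one-day window, and the reading of `-d` as a duration in seconds are assumptions, not something confirmed in this thread.

```python
import shlex
from datetime import datetime, timedelta


def build_cwb_query(start, duration, host="localhost", seed_pattern=".*", out_type="ms"):
    """Return the argv list for one CWB `query` call, mirroring the example above."""
    return [
        "query",
        "-h", host,
        "-t", out_type,
        "-s", seed_pattern,
        "-b", start,
        "-d", str(duration),
    ]


def daily_queries(first_day, n_days, seconds_per_day=86400):
    """Yield one query per day (assumes -d is a duration in seconds)."""
    for i in range(n_days):
        start = first_day + timedelta(days=i)
        yield build_cwb_query(start.strftime("%Y-%m-%d %H:%M:%S"), seconds_per_day)


# Hypothetical month; each command could be passed to subprocess.run(cmd, check=True).
commands = list(daily_queries(datetime(2015, 3, 1), 31))
print(shlex.join(commands[0]))
```

Building the command as a list avoids shell quoting problems with the `".*"` pattern and the space inside the start time.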

Clarification on 3:

  • You cannot create the antelope virtualenv inside our sc3 image in aws, since antelope uses proprietary python libraries. Instead, you will have to export the events in seiscomp3 xml format and copy them across to the sc3 image. This part will require some ingenuity to automate.
  • You will also need to make sure the CWB instance contains waveform data that overlap the events that you are exporting from antelope and importing into sc3 in the previous step. @zhang01GA will help with additional waveform data if you need it.
@basaks
Contributor Author

basaks commented Sep 5, 2017

You might also have to use antelope sql to filter events by date for this POC.
@gasuperdev can help.

@basaks
Contributor Author

basaks commented Sep 6, 2017

This is the POC we want: events captured in antelope via primary station data that also overlap some temporary station data.

Therefore, for events from antelope and timeseries data from CWB, make sure that we have at least one temporary survey that overlaps the events/primary station waveforms.

@niketchhajed
Contributor

We are ingesting the waveform and events.xml data for the month of March 2017 into sc3 for this POC. So it would be good to have temporary waveform data for the same period as per Sudipta's comment above.

@basaks
Contributor Author

basaks commented Sep 6, 2017

We cannot choose when temporary station data are available, and because we always have data from the primary stations, the strategy should be to select events/primary station data that overlap with some available temporary station data.
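The selection strategy could be sketched roughly as below: keep only the events whose origin time falls inside at least one temporary deployment window. The event ids, deployment dates, and function name are made up for illustration.

```python
from datetime import datetime


def select_events(events, deployments):
    """Keep events whose origin time lies inside at least one deployment window.

    events: iterable of (event_id, origin_time)
    deployments: iterable of (start, end) datetimes for temporary surveys
    """
    return [
        (event_id, origin)
        for event_id, origin in events
        if any(start <= origin <= end for start, end in deployments)
    ]


# Hypothetical inputs, for illustration only.
deployments = [(datetime(2015, 3, 1), datetime(2015, 3, 15))]
events = [
    ("evt-a", datetime(2015, 3, 2, 19, 6)),
    ("evt-b", datetime(2015, 3, 20, 4, 0)),
]
selected = select_events(events, deployments)
print(selected)
```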

@niketchhajed
Contributor

The antelope events for the month of March 2015 (695 events) have been ingested into the seiscomp3 instance "niket_pst_poc_latest". FYI: of the 780 events registered in the antelope DB, some raised fatal errors while being parsed by the obspy library, so about 85 such events were filtered out during the event extraction process.
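For reference, the filtering could look something like this sketch, which keeps a record of every rejected event so the failures can be reviewed later. This is not the extraction script that was actually used; `json.loads` is only a stand-in parser so the snippet runs without obspy, and the records are invented.

```python
import json


def split_parsable(raw_events, parse):
    """Parse each raw event; collect failures instead of aborting the run."""
    parsed, rejected = [], []
    for raw in raw_events:
        try:
            parsed.append(parse(raw))
        except Exception as exc:  # catch broadly so one bad record does not stop the loop
            rejected.append((raw, repr(exc)))
    return parsed, rejected


# Stand-in data and parser; in practice `parse` would wrap the obspy event parsing.
raw_events = ['{"id": 1}', "not valid", '{"id": 2}']
parsed, rejected = split_parsable(raw_events, json.loads)
print(len(parsed), "parsed,", len(rejected), "rejected")
```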

@basaks
Contributor Author

basaks commented Sep 15, 2017

These are the tests we need in order to establish the procedure and gain confidence in the CWB -> miniseed -> seiscomp3 miniseed ingestion -> seiscomp3 miniseed dump process:

  1. Use cwbquery to download miniseed.
  2. Use scart to ingest the data into seiscomp3.
  3. Use scart to download the data from seiscomp3.
  4. Read the miniseed downloaded in step 1 into python (can use obspy) and compare it with the data dumped in step 3. Use numpy.testing.assert_array_almost_equal for the comparison.
  5. We also need tests that split the step 1 download into multiple parts (say 30% and 70%, or any other combination) and then compare the result with 100% of the data in step 4. This will reveal any limitations of cwb/query for data delivery.

Step 4 is critical and should be done over a wide range of time windows and for many stations. We probably need to make this comparison for every miniseed file we ingest into the production seiscomp3.

These steps will ensure a robust data migration from CWB to seiscomp3.
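A rough sketch of the comparison in steps 4 and 5, using synthetic samples so it runs on its own. In the real test the arrays would be the trace data read from the miniseed files (for example via obspy), which is assumed here rather than shown; the function name and the 30/70 split are illustrative.

```python
import numpy as np
from numpy.testing import assert_array_almost_equal


def assert_split_matches_whole(whole, parts, decimal=6):
    """Fail if the rejoined partial downloads differ from the full download."""
    rejoined = np.concatenate(parts)
    assert rejoined.shape == whole.shape, "sample count differs"
    assert_array_almost_equal(rejoined, whole, decimal=decimal)


# Synthetic stand-in for the samples of one trace.
whole = np.arange(1000, dtype=np.int32)
cut = len(whole) * 3 // 10  # 30% / 70% split
assert_split_matches_whole(whole, [whole[:cut], whole[cut:]])
print("split download matches full download")
```

The same helper covers step 4 directly: pass the CWB samples as `whole` and the seiscomp3 dump as a single-element `parts` list.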

@niketchhajed
Contributor

The test of using scevtstreams in conjunction with scart to export the waveform timeseries data for a historical event from March 2015 to a miniseed file was successful. Attached is the waveform from that miniseed file for a magnitude 7.4 event on 2nd March 2015, as seen in scrttv.

[image: waveform of the exported miniseed file as seen in scrttv]

There is a small issue with the event data imported from antelope, though. scevtstreams doesn't prepend the network code to the output for the antelope events, while it does for the native seiscomp3 events. For example, the output for native seiscomp3 events looks like:

centos@ip-172-31-26-134:/opt/seiscomp3/var/lib/archive/2015/IU/GNI/BHZ.D$ scevtstreams -E ga2017skmtsa -d mysql://sysop:sysop@localhost/seiscomp3 -L 0 -m 300
2017-09-18 15:59:49;2017-09-18 16:10:22;AU.GLAD.00.SH?
2017-09-18 15:59:49;2017-09-18 16:10:22;AU.TOO..BH?
2017-09-18 15:59:49;2017-09-18 16:10:22;AU.BRAT.00.SH?

but the output for imported events from antelope looks like:

centos@ip-172-31-26-134:~$ scevtstreams -E "quakeml:ga.ga.gov.au/event/00967663" -d mysql://sysop:sysop@localhost/seiscomp3 -L 0 -m 300
2015-03-02 19:06:15;2015-03-02 19:22:33;.GNI.00.BH?
2015-03-02 19:06:15;2015-03-02 19:22:33;.SBA.00.BH?
2015-03-02 19:06:15;2015-03-02 19:22:33;.MOO..BH?
2015-03-02 19:06:15;2015-03-02 19:22:33;.PMSA.00.BH?

with the network code "IU" missing in this case. I am just noting this for the record.
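One possible workaround, sketched under the assumption that a station-to-network lookup is available, is to patch the stream list before it is handed to scart. The lookup contains only GNI -> IU, taken from the archive path shown above; the function name is made up, and stations missing from the lookup are left unchanged.

```python
def fill_network(line, station_to_network):
    """Fill in an empty network code in one 'start;end;NET.STA.LOC.CHA' line."""
    start, end, stream = line.strip().split(";")
    net, sta, loc, cha = stream.split(".")
    if not net:
        # Leave the code empty when the station is not in the lookup.
        net = station_to_network.get(sta, "")
    return ";".join([start, end, ".".join([net, sta, loc, cha])])


lookup = {"GNI": "IU"}  # assumed mapping, based on the archive path above
lines = [
    "2015-03-02 19:06:15;2015-03-02 19:22:33;.GNI.00.BH?",
    "2015-03-02 19:06:15;2015-03-02 19:22:33;.MOO..BH?",
    "2017-09-18 15:59:49;2017-09-18 16:10:22;AU.TOO..BH?",
]
for line in lines:
    print(fill_network(line, lookup))
```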

@basaks
Contributor Author

basaks commented Sep 20, 2017

Our POC seiscomp3 machine should have all the other utilities installed; specifically, it needs to be able to run the following bash scripts:

Make sure you use the PST-157 branch for now.
