This is the most up-to-date version of this file format, produced by Borealis version 0.7, the current version.
For data files from previous Borealis software versions, see here.
The pyDARNio format class for this format is BorealisRawacf, found in borealis_formats.
The rawacf format is intended to hold beamformed, averaged, correlated data.
Both site files and array-restructured files exist for this file type. Both are described below.
Array restructured files are produced after the radar has finished writing a file and contain record data in multi-dimensional arrays so as to avoid repeated values, shorten the read time, and improve human readability. Fields that are unique to the record are written as arrays where the first dimension is equal to the number of records recorded. Other fields that are unique to the slice or experiment (and are therefore repeated for all records) are written only once.
The group names in these files are the field names themselves, greatly reducing the number of group names in the file when compared to site files and making the file much more human readable.
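The restructuring idea described above can be sketched in plain Python. This is only an illustration of the layout, not the pyDARNio implementation: the function name `restructure_records` and the sample values are hypothetical, and real files hold many more fields.

```python
def restructure_records(records, per_record_fields, shared_fields):
    """Combine site-style records into array-style fields (illustrative only)."""
    arrays = {}
    # Fields shared by every record are written once, taken from the first record.
    for name in shared_fields:
        arrays[name] = records[0][name]
    # Per-record fields gain a leading num_records dimension.
    for name in per_record_fields:
        values = [rec[name] for rec in records]
        if isinstance(values[0], list):
            # Variable-length lists are zero-padded to the longest one.
            width = max(len(v) for v in values)
            values = [v + [0] * (width - len(v)) for v in values]
        arrays[name] = values
    return arrays


# Made-up records for illustration.
records = [
    {"station": "sas", "int_time": 3.5, "beam_nums": [0, 1]},
    {"station": "sas", "int_time": 3.4, "beam_nums": [2]},
]
arrays = restructure_records(records, ["int_time", "beam_nums"], ["station"])
print(arrays)
# {'station': 'sas', 'int_time': [3.5, 3.4], 'beam_nums': [[0, 1], [2, 0]]}
```

The second record's `beam_nums` is padded with a zero, which is why a per-record count is needed to read the padded arrays correctly.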
The naming convention of the rawacf array-structured files is:
[YYYYmmDD].[HHMM].[SS].[station_id].[slice_id].rawacf.hdf5
For example:
20191105.1400.02.sas.0.rawacf.hdf5
This is the file that began writing at 14:00:02 UT on November 5 2019 at the Saskatoon site, and it provides data for slice 0 of the experiment that ran at that time. It has been array restructured because it does not have a .site designation at the end of the filename.
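As a minimal sketch, the naming convention can be parsed with the standard library. The helper name `parse_rawacf_filename` and the returned keys are hypothetical; the pattern assumes a lowercase three-letter station identifier and treats a trailing `.site` as the marker of a site-structured file.

```python
import re
from datetime import datetime, timezone

# Pattern for [YYYYmmDD].[HHMM].[SS].[station_id].[slice_id].rawacf.hdf5[.site]
_RAWACF_NAME = re.compile(
    r"^(?P<date>\d{8})\.(?P<hhmm>\d{4})\.(?P<ss>\d{2})\."
    r"(?P<station>[a-z]{3})\.(?P<slice_id>\d+)\.rawacf\.hdf5(?P<site>\.site)?$"
)


def parse_rawacf_filename(name):
    """Split a rawacf filename into its components (hypothetical helper)."""
    match = _RAWACF_NAME.match(name)
    if match is None:
        raise ValueError(f"not a rawacf filename: {name!r}")
    # The timestamp in the filename is in UT.
    start = datetime.strptime(
        match["date"] + match["hhmm"] + match["ss"], "%Y%m%d%H%M%S"
    ).replace(tzinfo=timezone.utc)
    return {
        "start_time": start,
        "station": match["station"],
        "slice_id": int(match["slice_id"]),
        "structure": "site" if match["site"] else "array",
    }


info = parse_rawacf_filename("20191105.1400.02.sas.0.rawacf.hdf5")
print(info["start_time"].isoformat(), info["station"], info["slice_id"], info["structure"])
# 2019-11-05T14:00:02+00:00 sas 0 array
```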
These files are zlib compressed, which HDF5 supports natively, so no decompression is necessary before reading them with your HDF5 library.
The file fields in the rawacf array files are:
Field name, type, [dimensions] |
Description |
---|---|
agc_status_word uint32 [num_records] |
AGC status word. Bit position corresponds to the USRP motherboard/transmitter. A '1' indicates an AGC fault occurred at least once during integration. |
averaging_method unicode |
A string describing the averaging method. Default is 'mean' but an experiment can set this to 'median' to get the median of all sequences in an integration period, and other methods to combine all sequences in an integration period could be added in the future. |
beam_azms float64 [num_records x max_num_beams] |
A list of the beam azimuths for each beam in degrees off boresight. Note that this is padded with zeroes for any record which has num_beams less than the max_num_beams. The num_beams field should be used to read the correct number of beams for each record. |
beam_nums uint32 [num_records x max_num_beams] |
A list of beam numbers used in this slice in this record. Note that this is padded with zeroes for any record which has num_beams less than the max_num_beams. The num_beams field should be used to read the correct number of beams for each record. |
blanked_samples uint32 [num_records x max_num_blanked_samples] |
Samples that should be blanked because they occurred during transmission times, given by sample number (index into decimated data). Can differ from the pulses array due to multiple slices in a single sequence and can differ from record to record if a new slice is added. |
borealis_git_hash unicode |
Identifies the version of Borealis that made this data. Contains git commit hash characters. Typically begins with the latest git tag of the software. |
data_descriptors bytes [4] |
Denotes what each data dimension (in main_acfs, intf_acfs, xcfs) represents. = 'num_records', 'max_num_beams', 'num_ranges', 'num_lags' |
data_normalization_factor float32 |
The product of the scaling factors of all filters used, giving the total scale by which to normalize the data. |
experiment_comment unicode |
Comment provided in experiment about the experiment as a whole. |
experiment_id int16 |
Number used to identify the experiment. |
experiment_name unicode |
Name of the experiment file. |
first_range float32 |
Distance to use for first range in km. |
first_range_rtt float32 |
Round trip time of flight to first range in microseconds. |
freq uint32 |
The frequency used for this experiment, in kHz. This is the frequency the data has been filtered to. |
gps_locked bool [num_records] |
Designates if the local GPS had a lock during the entire integration period. False if it unlocked at least once. |
gps_to_system_time_diff float32 [num_records] |
The max time difference between box_time (GPS time) and system time (NTP) during the integration. Negative when GPS time is ahead of system time. |
int_time float32 [num_records] |
Integration time in seconds. |
intf_acfs complex64 [num_records x max_num_beams x num_ranges x num_lags] |
Interferometer array correlations. Note that records that do not have num_beams = max_num_beams will have padded zeros. The num_beams array should be used to determine the correct number of beams to read for the record. |
intf_antenna_count uint32 |
Number of interferometer array antennas |
lags uint32 [number of lags, 2] |
The lags created from two pulses in the pulses array. Values have to be from pulses array. The lag number is lag[1] - lag[0] for each lag pair. |
lp_status_word uint32 [num_records] |
Low power status word. Bit position corresponds to the USRP motherboard/transmitter. A '1' indicates low power occurred at least once during integration. |
main_acfs complex64 [num_records x max_num_beams x num_ranges x num_lags] |
Main array correlations. Note that records that do not have num_beams = max_num_beams will have padded zeros. The num_beams array should be used to determine the correct number of beams to read for the record. |
main_antenna_count uint32 |
Number of main array antennas |
noise_at_freq float64 [num_records x max_num_sequences] |
Noise at the receive frequency, with dimension = number of sequences. As of 2019-11-14 this is not implemented and is filled with zeros. Note that records that do not have num_sequences = max_num_sequences will have padded zeros. The num_sequences array should be used to determine the correct number of sequences to read for the record. |
num_beams uint32 [num_records] |
The number of beams calculated for each record. Allows the user to correctly read the data up to the correct number and remove the padded zeros in the data array. |
num_blanked_samples uint32 [num_records] |
The number of blanked samples for each record. |
num_sequences int64 [num_records] |
Number of sampling periods (equivalent to the number of sequences transmitted) in the integration time for each record. Allows the user to correctly read the data up to the correct number and remove the padded zeros in the data array. |
num_slices int64 [num_records] |
Number of slices used simultaneously in the record by the experiment. If more than 1, data should exist in another file for the same time period as that record for the other slice. |
pulses uint32 [number of pulses] |
The pulse sequence in units of the tau_spacing. |
range_sep float32 |
Range gate separation (conversion from time (1/rx_sample_rate) to equivalent distance between samples), in km. |
rx_sample_rate float64 |
Sampling rate of the samples in this file's data in Hz. |
samples_data_type unicode |
C data type of the samples, provided for user friendliness. = 'complex float' |
scan_start_marker bool [num_records] |
Designates if the record is the first in a scan (scan is defined by the experiment). |
scheduling_mode unicode |
The mode being run during this time period (ex. 'common', 'special', 'discretionary'). |
slice_comment unicode |
Additional text comment that describes the slice written in this file. |
slice_id uint32 |
The slice id of this file. |
slice_interfacing unicode [num_records] |
The interfacing of this slice to other slices for each record. String representation of the Python dictionary of {slice : interface_type, ... }. Can differ between records if slices are updated. |
sqn_timestamps float64 [num_records x max_num_sequences] |
A list of GPS timestamps corresponding to the beginning of transmission for each sampling period in the integration time. These timestamps come back from the USRP driver and the USRPs are GPS disciplined and synchronized using the Octoclock. Provided in seconds since epoch. Note that records that do not have num_sequences = max_num_sequences will have padded zeros. The num_sequences array should be used to determine the correct number of sequences to read for the record. |
station unicode |
Three-letter radar identifier. |
tau_spacing uint32 |
The minimum spacing between pulses in microseconds. Spacing between pulses is always a multiple of this. |
tx_antenna_phases complex64 [num_records x num_main_antennas] |
The complex phase for each antenna for transmission, normalized such that full power has magnitude 1. |
tx_pulse_len uint32 |
Length of the transmit pulse in microseconds. |
xcfs complex64 [num_records x max_num_beams x num_ranges x num_lags] |
Cross correlations of interferometer to main array. Note that records that do not have num_beams = max_num_beams will have padded zeros. The num_beams array should be used to determine the correct number of beams to read for the record. |
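The padded fields above are meant to be read together with num_beams and num_sequences. The following is a minimal sketch of that trimming, assuming `fields` is any mapping from field name to an indexable array; an open `h5py.File` on an array-structured file should behave this way, but that is an assumption and the demonstration uses nested lists with made-up values. The helper name `unpadded_record` is hypothetical.

```python
def unpadded_record(fields, index):
    """Return one record's data with the zero padding removed.

    `fields` maps field names to indexable arrays whose first dimension
    is num_records (nested lists, NumPy arrays, or HDF5 datasets).
    """
    num_beams = int(fields["num_beams"][index])
    num_sequences = int(fields["num_sequences"][index])
    return {
        # Beam-dimensioned fields are valid only up to num_beams.
        "beam_nums": fields["beam_nums"][index][:num_beams],
        "beam_azms": fields["beam_azms"][index][:num_beams],
        "main_acfs": fields["main_acfs"][index][:num_beams],
        # Sequence-dimensioned fields are valid only up to num_sequences.
        "sqn_timestamps": fields["sqn_timestamps"][index][:num_sequences],
    }


# Made-up values: two records, max_num_beams = 2, one range, one lag.
fields = {
    "num_beams": [2, 1],
    "num_sequences": [3, 2],
    "beam_nums": [[0, 1], [2, 0]],
    "beam_azms": [[-24.3, -21.06], [-17.82, 0.0]],
    "main_acfs": [[[[1 + 1j]], [[2 + 0j]]], [[[3 - 1j]], [[0j]]]],
    "sqn_timestamps": [[1.0, 2.0, 3.0], [4.0, 5.0, 0.0]],
}
record = unpadded_record(fields, 1)
print(record["beam_nums"], record["sqn_timestamps"])
# [2] [4.0, 5.0]
```

Because 0 is itself a valid beam number, the padding cannot be detected from the values alone; the counts have to be used.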
Site files are produced by the Borealis code package and store the data in a record-by-record format. In site files, the hdf5 group names (i.e. record names) are given as the timestamp in ms past epoch of the first sequence or sampling period recorded in the record.
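A record name can be turned back into a time with the standard library. This is a small sketch; the helper name `record_name_to_datetime` is hypothetical and the record name used is a made-up value chosen for illustration.

```python
from datetime import datetime, timedelta, timezone


def record_name_to_datetime(record_name):
    """Convert a site-file record name (ms past epoch) to a UTC datetime."""
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    return epoch + timedelta(milliseconds=int(record_name))


print(record_name_to_datetime("1572962402000").isoformat())
# 2019-11-05T14:00:02+00:00
```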
The naming convention of the rawacf site-structured files is:
[YYYYmmDD].[HHMM].[SS].[station_id].[slice_id].rawacf.hdf5.site
For example:
20191105.1400.02.sas.0.rawacf.hdf5.site
This is the file that began writing at 14:00:02 UT on November 5 2019 at the Saskatoon site, and it provides data for slice 0 of the experiment that ran at that time.
These files are often bzipped after they are produced.
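A bzipped site file has to be decompressed before an HDF5 library can open it. One way to do that, sketched with Python's standard `bz2` module, is shown below; the helper name `decompress_bz2` is hypothetical, and the paths are supplied by the caller because the compressed file's suffix is not specified here.

```python
import bz2
import shutil


def decompress_bz2(src_path, dest_path):
    """Write the decompressed contents of a bzip2 file to dest_path."""
    # Stream the data in chunks so large files are not loaded into memory.
    with bz2.open(src_path, "rb") as src, open(dest_path, "wb") as dest:
        shutil.copyfileobj(src, dest)
```

For example, `decompress_bz2("20191105.1400.02.sas.0.rawacf.hdf5.site.bz2", "20191105.1400.02.sas.0.rawacf.hdf5.site")` would restore the site file, where the `.bz2` suffix is an assumption.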
The file fields under the record name in rawacf site files are:
Field name, type |
Description |
---|---|
agc_status_word uint32 |
AGC status word. Bit position corresponds to the USRP motherboard/transmitter. A '1' indicates an AGC fault occurred at least once during integration. |
averaging_method unicode |
A string describing the averaging method. Default is 'mean' but an experiment can set this to 'median' to get the median of all sequences in an integration period, and other methods to combine all sequences in an integration period could be added in the future. |
beam_azms [float64, ] |
A list of the beam azimuths for each beam in degrees off boresight. |
beam_nums [uint32, ] |
A list of beam numbers used in this slice in this record. |
blanked_samples [uint32, ] |
Samples that should be blanked because they occurred during transmission times, given by sample number (index into decimated data). Can differ from the pulses array due to multiple slices in a single sequence. |
borealis_git_hash unicode |
Identifies the version of Borealis that made this data. Contains git commit hash characters. Typically begins with the latest git tag of the software. |
data_descriptors [bytes, ] |
Denotes what each data dimension (in main_acfs, intf_acfs, xcfs) represents. ('num_beams', 'num_ranges', 'num_lags') |
data_dimensions [uint32, ] |
The dimensions of the acf or xcf datasets. Dimensions correspond to data_descriptors. |
data_normalization_factor float32 |
The product of the scaling factors of all filters used, giving the total scale by which to normalize the data. |
experiment_comment unicode |
Comment provided in experiment about the experiment as a whole. |
experiment_id int16 |
Number used to identify the experiment. |
experiment_name unicode |
Name of the experiment file. |
first_range float32 |
Distance to use for first range in km. |
first_range_rtt float32 |
Round trip time of flight to first range in microseconds. |
freq uint32 |
The frequency used for this experiment, in kHz. This is the frequency the data has been filtered to. |
gps_locked bool |
Designates if the local GPS had a lock during the entire integration period. |
gps_to_system_time_diff float32 |
The max time difference between box_time (GPS time) and system time (NTP) during the integration. Negative when GPS time is ahead of system time. |
int_time float32 |
Integration time in seconds. |
intf_acfs [complex64, ] |
Interferometer array correlations. |
intf_antenna_count uint32 |
Number of interferometer array antennas |
lags [[uint32, ], ] |
The lags created from two pulses in the pulses array. Dimensions are number of lags x 2. Values have to be from pulses array. The lag number is lag[1] - lag[0] for each lag pair. |
lp_status_word uint32 |
Low power status word. Bit position corresponds to the USRP motherboard/transmitter. A '1' indicates low power occurred at least once during integration. |
main_acfs [complex64, ] |
Main array correlations. |
main_antenna_count uint32 |
Number of main array antennas |
noise_at_freq [float64, ] |
Noise at the receive frequency, with dimension = number of sequences. As of 2019-11-14 this is not implemented and is filled with zeros. |
num_sequences int64 |
Number of sampling periods (equivalent to the number of sequences transmitted) in the integration time. |
num_slices int64 |
Number of slices used simultaneously in this record by the experiment. If more than 1, data should exist in another file for this time period for the other slice. |
pulses [uint32, ] |
The pulse sequence in units of the tau_spacing. |
range_sep float32 |
Range gate separation (conversion from time (1/rx_sample_rate) to equivalent distance between samples), in km. |
rx_sample_rate float64 |
Sampling rate of the samples in this file's data in Hz. |
samples_data_type unicode |
C data type of the samples, provided for user friendliness. = 'complex float' |
scan_start_marker bool |
Designates if the record is the first in a scan (scan is defined by the experiment). |
scheduling_mode unicode |
The mode being run during this time period (ex. 'common', 'special', 'discretionary'). |
slice_comment unicode |
Additional text comment that describes the slice written in this file. |
slice_id uint32 |
The slice id of this file. |
slice_interfacing unicode |
The interfacing of this slice to other slices. String representation of the Python dictionary of {slice : interface_type, ... }. |
sqn_timestamps [float64, ] |
A list of GPS timestamps corresponding to the beginning of transmission for each sampling period in the integration time. These timestamps come from the USRP driver and the USRPs are GPS disciplined and synchronized using the Octoclock. Provided in seconds since epoch. |
station unicode |
Three-letter radar identifier. |
tau_spacing uint32 |
The minimum spacing between pulses in microseconds. Spacing between pulses is always a multiple of this. |
tx_antenna_phases [complex64, ] |
The complex phase for each antenna for transmission, normalized such that full power has magnitude 1. |
tx_pulse_len uint32 |
Length of the transmit pulse in microseconds. |
xcfs [complex64, ] |
Cross correlations of interferometer to main array. |
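The relationship between the lags, pulses and tau_spacing fields can be illustrated with a short sketch. Since pulses are given in units of tau_spacing and tau_spacing is in microseconds, each lag number multiplied by tau_spacing gives the lag time in microseconds. The helper name `lag_table` and the sample values are hypothetical and not taken from a real experiment.

```python
def lag_table(lags, tau_spacing):
    """Return (lag_number, lag_time_us) for each pulse pair in `lags`."""
    table = []
    for first, second in lags:
        # Convert to int first so unsigned values cannot wrap on subtraction.
        lag_number = int(second) - int(first)
        table.append((lag_number, lag_number * tau_spacing))
    return table


# Made-up values for illustration.
pulses = [0, 2, 3]
lags = [[0, 0], [2, 3], [0, 2], [0, 3]]  # every value comes from `pulses`
tau_spacing = 2400  # microseconds

print(lag_table(lags, tau_spacing))
# [(0, 0), (1, 2400), (2, 4800), (3, 7200)]
```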
File restructuring to and from array files is done using an additional code package. Currently, this code is housed within pyDARNio.
Restructuring between site and array formats occurs within the BorealisRestructure class, found here.
Conversion to SDARN IO (DMap rawacf) is available but can fail depending on experiment complexity. The conversion also reduces the precision of the data, because all samples are converted from complex floats to integers. Timestamps lose precision in a similar way.
HDF5 is a much more user-friendly format and we encourage the use of this data if possible. Please reach out if you have questions on how to use the Borealis rawacf files.
The mapping to rawacf dmap files is completed as follows:
[Figure: rawacf_mapping]