% Commit 83bd004 (parent fd2e8fa): "Start writing UVH5 memo"
% plaplant committed May 18, 2018
\documentclass[11pt, oneside]{article}
\usepackage{geometry}
\geometry{letterpaper}
%\geometry{landscape}
%\usepackage[parfill]{parskip}
\usepackage{graphicx}

\usepackage{amssymb}

\usepackage{hyperref}
\hypersetup{
colorlinks = true
}

\usepackage{cleveref}
\crefformat{footnote}{#2\footnotemark[#1]#3}

\title{Memo: UVH5 file format}
\author{Paul La Plante, and the pyuvdata team}
\date{\today}

\begin{document}
\maketitle
\section{Introduction}
This memo introduces a new HDF5\footnote{\url{https://www.hdfgroup.org/}}-based
file format for UVData objects in
pyuvdata\footnote{\url{https://github.com/HERA-Team/pyuvdata}}, a Python package
that provides an interface to interferometric data. Here, we describe the
structure of this file format, called \textit{UVH5}, along with its required and
optional elements.

We assume that the reader has a working knowledge of HDF5, as well as of UVData
objects in pyuvdata. For more information about HDF5, please visit
\url{https://portal.hdfgroup.org/display/HDF5/HDF5}. For more information about
the parameters present in a UVData object, please visit
\url{http://pyuvdata.readthedocs.io/en/latest/uvdata_parameters.html}. An
example of how to interact with UVData objects in pyuvdata is available at
\url{http://pyuvdata.readthedocs.io/en/latest/tutorial.html}.

\section{Overview}
A UVH5 file contains the interferometric data from a radio telescope, as well
as the associated metadata necessary to interpret it. It is organized into two
primary HDF5 groups: the \verb+Header+ group, which contains the metadata, and
the \verb+Data+ group, which contains the data itself, the flags, and
information about the number of samples corresponding to the data. Datasets in
the \verb+Data+ group are also typically passed through HDF5's compression
pipeline to reduce the on-disk space required to store the data.
Because HDF5 records any compression applied to a dataset, decompression happens
transparently, and there is little that the user has to do explicitly when
reading data. For the same reason, the UVH5 format does not strictly require
users creating new files to use compression: the HDF5 file is self-documenting
in this regard. Be warned, however, that most UVH5 files ``in the wild'' do
compress the datasets in the \verb+Data+ group.

In the discussion below, we describe the required and optional datasets in the
various groups. We note in parentheses the corresponding attribute of a UVData
object. In nearly all cases the names coincide, to make things as transparent
as possible to the user.

\section{Header}
The \verb+Header+ group of the file contains the metadata necessary to interpret
the data. We begin with the required parameters, then continue to optional
ones. Unless otherwise noted, all datasets are scalars (i.e., not arrays). The
precision of the data type is also not specified as part of the format, because
in general the user is free to set it according to the desired use case (and
HDF5 records the precision and endianness when creating datasets). When using
the standard \verb+h5py+-based implementation in pyuvdata, this typically
results in 32-bit integers and double-precision floating-point numbers.

\subsection{Required Parameters}
\begin{itemize}
\item \textbf{latitude}: \textit{float} the latitude of the telescope site, in
radians. (\textit{latitude})
\item \textbf{longitude}: \textit{float} the longitude of the telescope site, in
radians. (\textit{longitude})
\item \textbf{altitude}: \textit{float} the altitude of the telescope site, in
meters. (\textit{altitude})
\item \textbf{instrument}: \textit{string} the name of the instrument, typically
the telescope name. (\textit{instrument})
\item \textbf{object\_name}: \textit{string} the name of the object tracked by
the telescope. For a drift-scan telescope, this is typically
``zenith''. (\textit{object\_name})
\item \textbf{history}: \textit{string} the history of the data
file. (\textit{history})
\item \textbf{phase\_type}: \textit{string} the phase type of the
observation. Should be ``phased'' or ``drift''; any other value is treated as
an unrecognized type. (\textit{phase\_type})
\item \textbf{Nants\_data}: \textit{int} the number of antennas for which the
file contains data. May be smaller than the number of antennas in the
array. (\textit{Nants\_data})
\item \textbf{Nants\_telescope}: \textit{int} the number of antennas in the
array. May be larger than the number of antennas for which the file contains
data. (\textit{Nants\_telescope})
\item \textbf{ant\_1\_array}: \textit{int} an array of the first antenna indices
corresponding to baselines present in the data. This is a one-dimensional
array of size Nblts. (\textit{ant\_1\_array})
\item \textbf{ant\_2\_array}: \textit{int} an array of the second antenna indices
corresponding to baselines present in the data. This is a one-dimensional
array of size Nblts. (\textit{ant\_2\_array})
\item \textbf{antenna\_names}: \textit{string} an array of the names of antennas
present in the array. This is a one-dimensional array of size
Nants\_telescope. Note that there must be one entry for every unique antenna in
ant\_1\_array and ant\_2\_array, but there may be additional
entries. (\textit{antenna\_names})
\item \textbf{baseline\_array}: \textit{int} an array of baseline indices
corresponding to the data. This is a one-dimensional array of size Nblts. The
baseline index is calculated as:
\[
\mathtt{baseline} = 2048 \times (\mathtt{ant1}+1) + (\mathtt{ant2}+1) + 2^{16},
\]
where ant1 and ant2 are the corresponding entries of ant\_1\_array and
ant\_2\_array. For current implementations, 32-bit integers (an ``int'' type in
C) are sufficient. However, for arrays with a very large number of antennas
($\mathcal{O}(10^6)$), 64-bit integers (a ``long'' type in C) may be
required. (\textit{baseline\_array})
\item \textbf{Nbls}: \textit{int} the number of baselines present in the
data. For full cross-correlation data (including auto-correlations), this
should be Nants\_data $\times$ (Nants\_data $+$ 1)/2. (\textit{Nbls})
\item \textbf{Nblts}: \textit{int} the number of baseline-times (i.e., the
number of spectra) present in the data. Note that this value need not equal
Nbls $\times$ Ntimes, for example when some baselines are missing at some
times. (\textit{Nblts})
\item \textbf{Nfreqs}: \textit{int} the number of frequency channels in the
data. (\textit{Nfreqs})
\item \textbf{Npols}: \textit{int} the number of polarization products in the
data. (\textit{Npols})
\item \textbf{Ntimes}: \textit{int} the number of time samples present in the
data. (\textit{Ntimes})
\item \textbf{Nspws}: \textit{int} the number of spectral windows present in the
data. (\textit{Nspws})
\item \textbf{}
\end{itemize}
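
To illustrate the baseline index, the following Python sketch encodes and
decodes baseline indices. It is a minimal example, not pyuvdata's own code. It
assumes the convention used by pyuvdata, in which the first antenna occupies
the $2048\times$ slot. It also assumes that antenna values are non-negative
integers below 2047, so that $\mathtt{ant}+1$ stays below the multiplier.
\verb+encode_baseline+ and \verb+decode_baseline+ are hypothetical helper names
chosen for this example. The sketch also computes the largest index this
encoding can produce, which fits easily in a 32-bit signed integer.

\begin{verbatim}
```python
from typing import Tuple

# Assumed convention (pyuvdata's): the first antenna goes in the 2048x slot.
#     baseline = 2048 * (ant1 + 1) + (ant2 + 1) + 2**16
MULT = 2048
OFFSET = 2**16


def encode_baseline(ant1: int, ant2: int) -> int:
    """Map an antenna pair to a single baseline index (hypothetical helper)."""
    # Both values must satisfy ant + 1 < 2048, so either antenna can sit in
    # the low-order slot without spilling into the high-order one.
    for ant in (ant1, ant2):
        if not 0 <= ant < MULT - 1:
            raise ValueError(f"antenna {ant} is outside the range [0, {MULT - 2}]")
    return MULT * (ant1 + 1) + (ant2 + 1) + OFFSET


def decode_baseline(baseline: int) -> Tuple[int, int]:
    """Recover (ant1, ant2) from an index made by encode_baseline."""
    rest = baseline - OFFSET
    ant2 = rest % MULT - 1   # low-order slot
    ant1 = rest // MULT - 1  # high-order slot
    return ant1, ant2


bl = encode_baseline(0, 1)
print(bl, decode_baseline(bl))       # 67586 (0, 1)

# Largest index allowed here, compared with the 32-bit signed limit.
largest = encode_baseline(MULT - 2, MULT - 2)
print(largest, largest < 2**31)      # 4259839 True
```
\end{verbatim}

Decoding depends on $\mathtt{ant2}+1$ being smaller than 2048. For that reason,
the sketch rejects larger antenna values. Otherwise it would silently produce
an index that decodes to a different antenna pair.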


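A small example makes the relationship between Nbls, Ntimes, and Nblts
concrete. The Python sketch below uses a hypothetical four-antenna array. It
enumerates every antenna pair, including auto-correlations, which gives
$N(N+1)/2$ baselines for $N$ antennas. It then builds a baseline-time axis for
two time samples, with one baseline missing at the second time. The antenna
indices, time labels, and dropped baseline are all invented for illustration.

\begin{verbatim}
```python
from itertools import combinations_with_replacement

# Hypothetical array: four antennas with zero-based indices.
ants = [0, 1, 2, 3]
n_ants = len(ants)

# Every pair (i, j) with i <= j: all cross-correlations plus the autos.
baselines = list(combinations_with_replacement(ants, 2))
print(len(baselines), n_ants * (n_ants + 1) // 2)  # 10 10

# Baseline-time axis: one entry per (time, baseline) spectrum. Two hypothetical
# time samples, with baseline (2, 3) missing at the second one.
blts = [(t, bl) for t in (0, 1) for bl in baselines]
blts.remove((1, (2, 3)))

# Per-spectrum antenna arrays, each of length Nblts.
ant_1_array = [bl[0] for _, bl in blts]
ant_2_array = [bl[1] for _, bl in blts]

Nblts = len(blts)
Nbls = len({bl for _, bl in blts})
Ntimes = len({t for t, _ in blts})
print(Nblts, Nbls, Ntimes, Nbls * Ntimes)  # 19 10 2 20
```
\end{verbatim}

Here Nblts is 19, while Nbls $\times$ Ntimes is 20. A reader should therefore
size the baseline-time axis using Nblts, not the product.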


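Several of the required parameters constrain one another, so a reader or writer
may want to check them for consistency. The sketch below shows one possible set
of checks. It runs against a plain Python dictionary that stands in for values
already read from the \verb+Header+ group. Reading the values from an actual
HDF5 file, for example with \verb+h5py+, is omitted. The function name
\verb+check_header+ and all placeholder values are hypothetical. The list of
required names mirrors only the parameters described so far in this memo.

\begin{verbatim}
```python
from typing import Any, Dict, List

# Required Header entries described so far in this memo (not a complete list).
REQUIRED = (
    "latitude", "longitude", "altitude", "instrument", "object_name",
    "history", "phase_type", "Nants_data", "Nants_telescope",
    "ant_1_array", "ant_2_array", "antenna_names", "baseline_array",
    "Nbls", "Nblts", "Nfreqs", "Npols", "Ntimes", "Nspws",
)


def check_header(header: Dict[str, Any]) -> List[str]:
    """Return human-readable problems found in a Header-like mapping."""
    missing = [key for key in REQUIRED if key not in header]
    if missing:
        return ["missing required entries: " + ", ".join(missing)]

    problems = []
    if header["phase_type"] not in ("phased", "drift"):
        problems.append(f"unrecognized phase_type {header['phase_type']!r}")

    # The per-spectrum arrays all run along the baseline-time axis.
    nblts = header["Nblts"]
    for key in ("ant_1_array", "ant_2_array", "baseline_array"):
        if len(header[key]) != nblts:
            problems.append(f"{key} has length {len(header[key])}, expected Nblts={nblts}")

    if len(header["antenna_names"]) != header["Nants_telescope"]:
        problems.append("antenna_names length differs from Nants_telescope")

    # Antennas that actually appear in the data.
    used = set(header["ant_1_array"]) | set(header["ant_2_array"])
    if len(used) != header["Nants_data"]:
        problems.append(f"{len(used)} antennas in the data, Nants_data={header['Nants_data']}")
    if len(used) > len(header["antenna_names"]):
        problems.append("fewer antenna_names than distinct antennas in the data")

    if len(set(header["baseline_array"])) != header["Nbls"]:
        problems.append("number of distinct baselines differs from Nbls")
    return problems


# Placeholder header: 3-antenna array, data from antennas 0 and 1 at 2 times.
ant1 = [0, 0, 1] * 2
ant2 = [0, 1, 1] * 2
header = {
    "latitude": 0.0, "longitude": 0.0, "altitude": 0.0,
    "instrument": "example", "object_name": "zenith",
    "history": "created for illustration", "phase_type": "drift",
    "Nants_data": 2, "Nants_telescope": 3,
    "ant_1_array": ant1, "ant_2_array": ant2,
    "antenna_names": ["ant0", "ant1", "ant2"],
    # Baseline indices computed with pyuvdata's convention (assumed).
    "baseline_array": [2048 * (a + 1) + (b + 1) + 2**16 for a, b in zip(ant1, ant2)],
    "Nbls": 3, "Nblts": 6, "Nfreqs": 1024, "Npols": 4, "Ntimes": 2, "Nspws": 1,
}

print(check_header(header))  # []
# Two problems: unrecognized phase_type, and Nbls disagreeing with the data.
print(check_header(dict(header, phase_type="tracking", Nbls=4)))
```
\end{verbatim}

The checker returns a list of problems instead of raising on the first one. A
single pass therefore reports every inconsistency in a header.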

\end{document}
