power spectrum pipeline v1 #151

Merged
merged 67 commits into from
Jul 18, 2018
92d405d
modified: hera_pspec/utils.py
nkern Jun 11, 2018
f9a3221
modified: hera_pspec/utils.py
nkern Jun 11, 2018
7b25135
added config_pspec_pipe function and tests
nkern Jul 1, 2018
c49b815
updated pspec_run for better handling of dset loading
nkern Jul 1, 2018
21e5662
updated preprocess_data.py given new read_miriad_metadata func
nkern Jul 1, 2018
f2580f0
made argparser in pspec_run handle lists of tuples
nkern Jul 2, 2018
b506dc7
propagated pspec_run verbose to pspec
nkern Jul 2, 2018
9f551d9
increased pspecdata test coverage
nkern Jul 2, 2018
16e11c3
created container.merge_spectra func, and added pspec_type to uvpspec
nkern Jul 4, 2018
de17dbd
first round of code edits for grouping.bootstrap_run function
nkern Jul 4, 2018
7b72dab
modified: hera_pspec/utils.py
nkern Jun 11, 2018
f92afaa
modified: hera_pspec/utils.py
nkern Jun 11, 2018
7cc80b5
added config_pspec_pipe function and tests
nkern Jul 1, 2018
0bfdb3f
updated pspec_run for better handling of dset loading
nkern Jul 1, 2018
1481fea
propagated pspec_run verbose to pspec
nkern Jul 2, 2018
0bfa626
addressed pspec_run_config PR comments: added test for repeated dset …
nkern Jul 4, 2018
bd271f0
fixed lots of PEP8 from analytic error PR. also fixed bug from analyt…
nkern Jul 4, 2018
cb65021
added store_cov and dsets_std to pspec_run_argparser [skip ci]
nkern Jul 4, 2018
83b7354
fixed how store_cov is assigned in uvpspec_utils._select
nkern Jul 4, 2018
d7dcd25
increased test_pspecdata coverage
nkern Jul 4, 2018
eee28fe
added tests for container.merge_spectra
nkern Jul 5, 2018
c594113
added tests for utils.get_blvec_reds
nkern Jul 5, 2018
1a6599f
enabled testing.uvpspec_from_data to take bl_groups
nkern Jul 5, 2018
9dcfac3
added tests for grouping.bootstrap_run and bootstrap_resampled_error
nkern Jul 5, 2018
dfee5b4
updated config_pspec_blpairs output format
nkern Jul 6, 2018
0fee18e
added glob-parsing to pspecdata._load_dsets
nkern Jul 6, 2018
d61ed7c
modified: hera_pspec/utils.py
nkern Jul 6, 2018
18f771e
updated label handling in PSpecData.add()
nkern Jul 7, 2018
2d5025e
updated utils.log handling of traceback and updated load_config
nkern Jul 7, 2018
5532b4b
added .coveragerc omitting tests directory from coveralls
nkern Jul 7, 2018
80a39d1
another update to dset label handling in pspecdata
nkern Jul 7, 2018
2637986
added pspec_pipe.py and pspec_pipe.yaml
nkern Jul 7, 2018
522ccf3
added skeleton jacknife and bootstrap stages in pspec_pipe.py
nkern Jul 8, 2018
ede6776
added stats pipe to pspec_pipe.py and hera_stats dependency
nkern Jul 8, 2018
c71488f
added pspec_pipe.py to scripts in setup.py [skip ci]
nkern Jul 8, 2018
678bf45
condensed job monitoring to a single function in pspec_pipe.py
nkern Jul 8, 2018
0c06c85
removed stats pipeline due to circular dependency: this should go in …
nkern Jul 8, 2018
d173abb
added pspec_batch.sh for PBS run on NRAO [skip ci]
nkern Jul 8, 2018
d00a02a
modified: pspec_batch.sh
nkern Jul 8, 2018
b773cd3
removed hera_stats as optional dependency
nkern Jul 9, 2018
021d4d1
added utils.job_monitor
nkern Jul 9, 2018
fe8e80a
updated preprocess_data.py for new utils.job_monitor
nkern Jul 9, 2018
8df153c
updated pspec_pipe.py for new utils.job_monitor
nkern Jul 9, 2018
950b9c6
fixed line in testing.py after merge commit
nkern Jul 9, 2018
13e1e74
modified: ../utils.py
nkern Jul 9, 2018
f5fabaf
added utils.job_monitor tests
nkern Jul 9, 2018
e146994
split uvpspec.spw_array into spw_dly_array and spw_freq_array
nkern Jul 9, 2018
907ad36
updates to container.merge_spectra
nkern Jul 10, 2018
19eb814
fixed bug in uvpspec.combine_uvpspec across blpts for scalar_array
nkern Jul 10, 2018
7b0c88e
created utils.get_reds and modified utils.calc_reds to use it
nkern Jul 11, 2018
cd9f616
modified: pipelines/pspec_pipeline/pspec_batch.sh
nkern Jul 11, 2018
1d6e89a
addressed power spectrum pipe v1 PR comments
nkern Jul 15, 2018
4e937f6
rebase leftovers
nkern Jul 15, 2018
45029b7
added stats_array to bootstrap_resample_error and fixed bug
nkern Jul 16, 2018
46713e1
pep8 and docstrings
nkern Jul 16, 2018
ae909de
modified: pipelines/pspec_pipeline/pspec_pipe.py
nkern Jul 16, 2018
6a61a22
modified: pipelines/pspec_pipeline/pspec_pipe.py
nkern Jul 16, 2018
e959298
added omit_flags kwarg to UVPSpec.get_* funcs to omit flagged data
nkern Jul 17, 2018
0803aef
modified: pipelines/pspec_pipeline/pspec_batch.sh
nkern Jul 17, 2018
d23356b
added basic unit tests for pipeline scripts
nkern Jul 17, 2018
a9a6907
fixed tests in test_pspecdata
nkern Jul 18, 2018
1ac9da4
enforced numpy>=1.14 in travis
nkern Jul 18, 2018
4379c81
specified numpy=1.14 in travis
nkern Jul 18, 2018
0445648
added conda update conda in .travis.yml
nkern Jul 18, 2018
261c986
added multiprocess installation to travis
nkern Jul 18, 2018
7ea5f11
moved multiprocess install from conda to pip
nkern Jul 18, 2018
26efa40
modified: .travis.yml
nkern Jul 18, 2018
6 changes: 6 additions & 0 deletions .coveragerc
@@ -0,0 +1,6 @@
[run]
omit = */tests/*

[report]
omit = */tests/*

5 changes: 3 additions & 2 deletions .travis.yml
@@ -22,12 +22,12 @@ install:
- export PATH="$HOME/miniconda/bin:$PATH"
- hash -r
- conda config --set always_yes yes --set changeps1 no
- conda install -q conda=4.3.25
- conda update -q conda
# Useful for debugging any issues with conda
- conda info -a

# create environment and install dependencies
- conda create -q -n test-environment python=$TRAVIS_PYTHON_VERSION numpy scipy nose pip matplotlib coverage h5py
- conda create -q -n test-environment python=$TRAVIS_PYTHON_VERSION numpy scipy nose pip matplotlib coverage
- source activate test-environment
- conda install -c conda-forge aipy
- pip install coveralls
@@ -41,6 +41,7 @@ install:
- pip install scikit-learn
- pip install h5py
- pip install pyyaml
- pip install multiprocess
- python setup.py install

before_script:
1 change: 1 addition & 0 deletions MANIFEST.in
@@ -2,4 +2,5 @@ include *.md
include LICENSE
include hera_pspec/VERSION
include hera_pspec/GIT_INFO
include pipelines/*/*.yaml

153 changes: 136 additions & 17 deletions hera_pspec/container.py
@@ -1,7 +1,8 @@
import numpy as np
import h5py
from hera_pspec.uvpspec import UVPSpec
import hera_pspec.version as version
from hera_pspec import uvpspec, version, utils
import argparse


class PSpecContainer(object):
"""
@@ -31,8 +32,7 @@ def __init__(self, filename, mode='r'):
# Open file ready for reading and/or writing
self.data = None
self._open()



def _open(self):
"""
Open HDF5 file ready for reading/writing.
@@ -41,10 +41,16 @@ def _open(self):
# allow non-destructive operations!
mode = 'a' if self.mode == 'rw' else 'r'
self.data = h5py.File(self.filename, mode)

# Update header info
if self.mode == 'rw':
# Update header
self._update_header()



# Denote as Container
if 'pspec_type' not in self.data.attrs.keys():
self.data.attrs['pspec_type'] = self.__class__.__name__

def _store_pspec(self, pspec_group, uvp):
"""
Store a UVPSpec object as group of datasets within the HDF5 file.
@@ -61,12 +67,11 @@ def _store_pspec(self, pspec_group, uvp):
raise IOError("HDF5 file was opened read-only; cannot write to file.")

# Get data and attributes from UVPSpec object (stored in dicts)
assert isinstance(uvp, UVPSpec)
assert isinstance(uvp, uvpspec.UVPSpec)

# Write UVPSpec to group
uvp.write_to_group(pspec_group, run_check=True)



def _load_pspec(self, pspec_group):
"""
Load a new UVPSpec object from a HDF5 group.
@@ -84,17 +89,16 @@ def _load_pspec(self, pspec_group):
"""
# Check that group is tagged as containing UVPSpec (pspec_type attribute)
if 'pspec_type' in pspec_group.attrs.keys():
if pspec_group.attrs['pspec_type'] != UVPSpec.__name__:
if pspec_group.attrs['pspec_type'] != uvpspec.UVPSpec.__name__:
raise TypeError("HDF5 group is not tagged as a UVPSpec object.")
else:
raise TypeError("HDF5 group is not tagged as a UVPSpec object.")

# Create new UVPSpec object and fill with data from this group
uvp = UVPSpec()
uvp = uvpspec.UVPSpec()
uvp.read_from_group(pspec_group)
return uvp


def _update_header(self):
"""
Update the header in the HDF5 file with useful metadata, including the
@@ -110,9 +114,9 @@ def _update_header(self):
if hdr.attrs['hera_pspec.git_hash'] != version.git_hash:
print("WARNING: HDF5 file was created by a different version "
"of hera_pspec.")
hdr.attrs['hera_pspec.git_hash'] = version.git_hash

else:
hdr.attrs['hera_pspec.git_hash'] = version.git_hash

def set_pspec(self, group, psname, pspec, overwrite=False):
"""
Store a delay power spectrum in the container.
@@ -143,7 +147,7 @@ def set_pspec(self, group, psname, pspec, overwrite=False):
if getattr(pspec, '__iter__', False) and len(pspec) == len(psname):
# Recursively call set_pspec() on each item of the list
for _psname, _pspec in zip(psname, pspec):
if not isinstance(_pspec, UVPSpec):
if not isinstance(_pspec, uvpspec.UVPSpec):
raise TypeError("pspec lists must only contain UVPSpec "
"objects.")
self.set_pspec(group, _psname, _pspec, overwrite=overwrite)
@@ -158,7 +162,7 @@ def set_pspec(self, group, psname, pspec, overwrite=False):
# No lists should pass beyond this point

# Check that input is of the correct type
if not isinstance(pspec, UVPSpec):
if not isinstance(pspec, uvpspec.UVPSpec):
raise TypeError("pspec must be a UVPSpec object.")

key1 = "%s" % group
@@ -309,3 +313,118 @@ def __del__(self):
self.data.close()
except:
pass


def combine_psc_spectra(psc, groups=None, dset_split_str='_x_', ext_split_str='_',
                        verbose=True, overwrite=False):
    """
    Iterate through a PSpecContainer and, within each specified group,
    combine UVPSpec objects (i.e. spectra) that share a base name but differ
    in their psname extension.

    Power spectra to be merged are assumed to follow the naming convention

    dset1_x_dset2_ext1, dset1_x_dset2_ext2, ...

    where _x_ is the default dset_split_str, and _ is the default ext_split_str.
    The spectra names are first split by dset_split_str, and then by ext_split_str. In
    this particular case, all instances of dset1_x_dset2* will be merged together.

    To merge spectra whose names have no dset distinction and only an extension,
    set dset_split_str to '' or None. For example, to merge uvp_1, uvp_2 and uvp_3,
    pass dset_split_str=None and ext_split_str='_'.

    Note that this is a destructive, in-place operation: after a successful merge,
    the individual *_ext1, *_ext2, ... objects are removed.

    Parameters
    ----------
    psc : PSpecContainer object or str
        A PSpecContainer object with one or more groups and spectra, or the
        filename of one (opened in 'rw' mode).

    groups : list
        A list of groupnames to operate on. Default is all groups.

    dset_split_str : str
        The pattern used to split dset1 from dset2 in the psname.

    ext_split_str : str
        The pattern used to split the dset name from its extension in the psname.

    verbose : bool
        If True, report feedback to stdout.

    overwrite : bool
        If True, overwrite output spectra if they exist.
    """
    # load container (np.str was only an alias of the builtin str)
    if isinstance(psc, str):
        psc = PSpecContainer(psc, mode='rw')
    else:
        assert isinstance(psc, PSpecContainer)

    # get groups
    _groups = psc.groups()
    if groups is None:
        groups = _groups
    else:
        groups = [grp for grp in groups if grp in _groups]
    assert len(groups) > 0, "no specified groups exist in this Container object"

    # Iterate over groups
    for grp in groups:
        # Get spectra in this group (as a list, so it can be reused below)
        spectra = list(psc.data[grp].keys())

        # Get unique spectra by splitting and then re-joining
        unique_spectra = []
        for spc in spectra:
            if dset_split_str == '' or dset_split_str is None:
                sp = spc.split(ext_split_str)[0]
            else:
                sp = utils.flatten([s.split(ext_split_str) for s in spc.split(dset_split_str)])[:2]
                sp = dset_split_str.join(sp)
            if sp not in unique_spectra:
                unique_spectra.append(sp)

        # Iterate over each unique spectra, and merge all spectra extensions
        for spc in unique_spectra:
            # check for overwrite
            if spc in spectra and not overwrite:
                if verbose:
                    print("spectra {}/{} already exists and overwrite == False, skipping...".format(grp, spc))
                continue

            # get merge list: every spectrum whose name contains the base name
            to_merge = [_sp for _sp in spectra if spc in _sp]
            try:
                # merge
                uvps = [psc.get_pspec(grp, uvp) for uvp in to_merge]
                merged_uvp = uvpspec.combine_uvpspec(uvps, verbose=verbose)
                # write to file
                psc.set_pspec(grp, spc, merged_uvp, overwrite=True)
                # if successful merge, remove uvps
                for uvp in to_merge:
                    if uvp != spc:
                        del psc.data[grp][uvp]
            except Exception:
                # merge failed, so continue
                if verbose:
                    print("uvp merge failed for spectra {}/{}".format(grp, spc))


def get_combine_psc_spectra_argparser():
    a = argparse.ArgumentParser(
        description="argument parser for hera_pspec.container.combine_psc_spectra")

    # Add list of arguments
    a.add_argument("filename", type=str,
                   help="Filename of HDF5 container (PSpecContainer) containing "
                        "groups / input power spectra.")

    a.add_argument("--dset_split_str", default='_x_', type=str,
                   help='The pattern used to split dset1 from dset2 in the psname.')
    a.add_argument("--ext_split_str", default='_', type=str,
                   help='The pattern used to split the dset names from their '
                        'extension in the psname (if it exists).')
    a.add_argument("--verbose", default=False, action='store_true',
                   help='Report feedback to stdout.')

    return a
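
The naming convention that combine_psc_spectra relies on can be tried out in isolation. The block below is a minimal sketch of the split-and-rejoin step only, not the library code: `flatten`, `base_name` and `group_by_base` are hypothetical helpers written for this illustration, and `flatten` is assumed to behave like `utils.flatten` (a one-level flatten of a list of lists), which this diff does not show.

```python
def flatten(nested):
    """One-level flatten of a list of lists (assumed stand-in for utils.flatten)."""
    return [item for sub in nested for item in sub]


def base_name(psname, dset_split_str='_x_', ext_split_str='_'):
    """Strip the extension from a psname, mirroring the logic in the diff."""
    if dset_split_str == '' or dset_split_str is None:
        # no dset distinction: keep everything before the first ext_split_str
        return psname.split(ext_split_str)[0]
    # split into dset halves, split each half on ext_split_str, keep the first two pieces
    parts = flatten([s.split(ext_split_str) for s in psname.split(dset_split_str)])[:2]
    return dset_split_str.join(parts)


def group_by_base(psnames, dset_split_str='_x_', ext_split_str='_'):
    """Map each base name to the psnames that would be merged under it."""
    groups = {}
    for name in psnames:
        base = base_name(name, dset_split_str, ext_split_str)
        if base not in groups:
            # as in the diff, membership is a substring test on the full name
            groups[base] = [n for n in psnames if base in n]
    return groups


names = ['dset1_x_dset2_ext1', 'dset1_x_dset2_ext2', 'dset3_x_dset4_ext1']
print(group_by_base(names))
print(group_by_base(['uvp_1', 'uvp_2', 'uvp_3'], dset_split_str=None))
```

Under this logic a dataset name that itself contains ext_split_str is cut at its first occurrence: `dset_a_x_dset_b_ext1` reduces to `dset_x_a`, so names of that form would probably need a different ext_split_str.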
7 changes: 5 additions & 2 deletions hera_pspec/data/_test_utils.yaml
@@ -3,6 +3,7 @@ data:
root: ../hera_pspec/
subdirs:
- data
pairs: [['xx', 'xx'], ['yy', 'yy']]
template: zen.*.xx.HH.uvXA
beam: data/NF_HERA_Beams.beamfits
flags: zen.2458098.66239.yy.HH.uv.vis.uvfits.flags.npz
@@ -13,9 +14,11 @@ pspec:
weight: iC
norm: I
taper: none
groupname: test
groupname: None
little_h: True
avg_group: False
exclude_auto_bls: False
exclude_permutations: False

options:
foo: None
bar: [['foo', 'bar']]
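
One detail of this test file: YAML's null literals are `null`, `~`, or an empty value, so a parser such as PyYAML reads `groupname: None` and `foo: None` as the string `'None'`, not Python's `None`. The sketch below shows one way such strings could be converted after loading. `none_strings_to_none` is a hypothetical helper, the `parsed` dict is a hand-written stand-in for a loaded config, and whether `utils.load_config` performs this conversion is an assumption this diff does not confirm.

```python
def none_strings_to_none(obj):
    """Recursively replace the string 'None' with Python's None (hypothetical helper)."""
    if isinstance(obj, dict):
        return {key: none_strings_to_none(val) for key, val in obj.items()}
    if isinstance(obj, list):
        return [none_strings_to_none(val) for val in obj]
    return None if obj == 'None' else obj


# Hand-written stand-in for what a YAML parser would return for part of the
# file above; the nesting is assumed, since the diff view lost its indentation.
parsed = {
    'pspec': {'groupname': 'None', 'little_h': True, 'avg_group': False},
    'options': {'foo': 'None', 'bar': [['foo', 'bar']]},
}

cleaned = none_strings_to_none(parsed)
print(cleaned['pspec']['groupname'] is None)
```

Booleans such as `little_h: True` need no such treatment, since YAML 1.1 parsers already read `True` and `False` as booleans.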