power spectrum pipeline v1 #151

Merged
merged 67 commits into master from pspec_pipe
Jul 18, 2018

@nkern nkern commented Jul 8, 2018

This adds the
• pipelines/pspec_pipeline/pspec_pipe.py
• pipelines/pspec_pipeline/pspec_pipe.yaml
• pipelines/pspec_pipeline/pspec_batch.sh
scripts as version 1 of the power spectrum pipeline, which goes through the analysis blocks in the following order:

  1. Visibility data difference (e.g. for jackknives) [optional]
  2. OQE pipeline
  3. Bootstrap error pipeline
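
The block order above can be sketched as a small driver. This is only an illustration of the ordering and of the optional first step: the three block functions are hypothetical placeholders, and only the `run_diff` flag name is taken from the script excerpt quoted later in this review.

```python
# Hypothetical sketch of the block ordering; not the actual pspec_pipe.py.

def run_diff_block(done):
    # Placeholder for the optional visibility-differencing step.
    return done + ["diff"]

def run_oqe_block(done):
    # Placeholder for the OQE power spectrum step.
    return done + ["oqe"]

def run_bootstrap_block(done):
    # Placeholder for the bootstrap error step.
    return done + ["bootstrap"]

def run_pipeline(run_diff=False):
    """Run the blocks in order, skipping the optional diff block if not requested."""
    done = []
    if run_diff:
        done = run_diff_block(done)
    done = run_oqe_block(done)
    done = run_bootstrap_block(done)
    return done

print(run_pipeline(run_diff=True))
print(run_pipeline())
```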

This branch should be merged after the stats_array branch is merged in.

A statistical evaluation step seems appropriate for this script (after 3.), but due to a circular dependency with hera_stats it's not possible to add it here. We should make a stats_pipe.py script and a stats_pipe.yaml config in hera_stats/pipelines/stats_pipe, in a similar format, to perform this last step of the full power spectrum pipeline.

[UPDATE]
The #145 PR and #148 PR were merged into this PR because they were all inter-dependent.

@ghost ghost assigned nkern Jul 8, 2018
@ghost ghost added the in progress label Jul 8, 2018
@coveralls

coveralls commented Jul 8, 2018

Coverage decreased (-0.6%) to 96.552% when pulling ccff2e4 on pspec_pipe into b97441c on master.

Collaborator

@philbull philbull left a comment

Looks good! Just a few docstring changes needed, and a couple of minor questions. Nothing that should get in the way of doing an initial end-to-end run.

def merge_spectra(psc, groups=None, dset_split_str='_x_', ext_split_str='_', verbose=True):
"""
Iterate through a PSpecContainer and, within each specified group,
merge spectra of similar name but different psname extension.
Collaborator

Does merge mean average them together, or just combine them into a single object? Should be a bit clearer in the docstring.

Collaborator

Also worth mentioning that this is a destructive operation, i.e. it removes the old unmerged spectra.

Member Author

Okay, I changed the name to combine_psc_spectra to be more suggestive of what it's actually doing, which is just setting up a combine_uvpspec call.
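
To make the three points in this thread concrete (combining rather than averaging, operating in place, and removing the unmerged entries), here is a minimal sketch on a plain dict. It is not the hera_pspec implementation: the function `combine_by_extension`, the dict standing in for one container group, and the naming scheme `<name><ext_split_str><extension>` are all assumptions made for illustration.

```python
from collections import defaultdict

def combine_by_extension(spectra, ext_split_str="_"):
    """Combine entries whose names differ only in their trailing extension.

    `spectra` is a plain dict {name: list_of_values} standing in for one
    group of a container (an assumption for this sketch). The dict is
    modified in place: per-extension entries are removed and replaced by
    a single combined entry keyed by the shared base name.
    """
    grouped = defaultdict(list)
    for name in sorted(spectra):
        base, sep, _ext = name.rpartition(ext_split_str)
        if not sep:
            continue  # no extension found, leave this entry untouched
        grouped[base].append(name)

    for base, names in grouped.items():
        combined = []
        for name in names:
            # pop() removes the old entry: this is the destructive part
            combined.extend(spectra.pop(name))
        # values are concatenated into one object, not averaged
        spectra[base] = combined
    return spectra

spectra = {
    "d1_x_d2_0": [1.0, 2.0],
    "d1_x_d2_1": [3.0],
    "d3_x_d4_0": [4.0],
}
combine_by_extension(spectra)
print(spectra)
```

After the call, only the combined names remain in `spectra`, so a caller who still needs the per-extension entries has to copy them beforehand.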


Parameters
----------
groups : list
Collaborator

Missing docstring for psc and verbose kwargs.

Collaborator

Also worth mentioning that this is an in-place operation?

Member Author

yup, thanks

ext_split_str : str
The pattern used to split the dset name from its extension in the psname.
"""
from hera_pspec import uvpspec
Collaborator

Why is this import statement here?

Member Author

Ah, I must have thought it was a circular dependency, but obviously it's not. Thanks.

from hera_pspec import uvpspec
# load container
if isinstance(psc, (str, np.str)):
psc = PSpecContainer(psc, mode='rw')
Collaborator

This should maybe pass an overwrite kwarg as well.

Member Author

yeah

seed : int
Random seed to use in bootstrap resampling.

normal_std : bool
Collaborator

Can you mention the name of the keys in the stats_array of the output UVPSpec where these error estimates can be found?

Member Author

yeah, good point

To form cross spectra between these two files, one would feed a group_pair
of: group_pairs = [('even', 'odd'), ...].

A baseline-pair is formed by self-matching unique-files in the
Collaborator

I'm a bit confused by this first line. Does it mean it only allows baseline pairs to be formed between files that have the same identifier, e.g. for group_pairs = [('even', 'odd'),] it would do even.1234 x odd.1234, but not even.1234 x odd.1235?

Member Author

So it will do the first, but not the second, based on how you had it; i.e. it doesn't permute the keys for you. If you wanted to do both, you should feed:
group_pairs = [('even', 'odd'), ('odd', 'even')]
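
The matching behaviour discussed here can be sketched as follows. This is an illustration of the rule as described in the thread (files are only paired when they share an identifier, and group order is not permuted automatically), not the pipeline's code: `match_files`, the nested-dict layout, and the file names are hypothetical.

```python
def match_files(groups, group_pairs):
    """Pair files across groups only when they share the same identifier.

    groups      : {group_name: {identifier: filename}} (assumed layout)
    group_pairs : list of (group1, group2) tuples, used in the order given
    """
    pairs = []
    for g1, g2 in group_pairs:
        # identifiers present in both groups, in a reproducible order
        common = sorted(set(groups[g1]) & set(groups[g2]))
        for ident in common:
            pairs.append((groups[g1][ident], groups[g2][ident]))
    return pairs

groups = {
    "even": {"1234": "even.1234", "1235": "even.1235"},
    "odd": {"1234": "odd.1234", "1235": "odd.1235"},
}

one_way = match_files(groups, [("even", "odd")])
both_ways = match_files(groups, [("even", "odd"), ("odd", "even")])
print(one_way)
print(both_ways)
```

In this sketch `even.1234` is never paired with `odd.1235`, and the reversed pairs such as `odd.1234` x `even.1234` only appear when `('odd', 'even')` is listed explicitly.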


action_name : str
The name of the block in the pipeline

Collaborator

M kwarg is not documented.


def job_monitor(run_func, iterator, action_name, M=map, lf=None, maxiter=1, verbose=True):
"""
Job monitoring function.
Collaborator

Needs a bit more explanation.
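
For context, one way a monitor with this kind of signature could behave is sketched below. It is a guess at the general pattern, not the hera_pspec function: it assumes `run_func` returns 0 on success and nonzero on failure, that `M` is any map-like callable taking `(func, iterable)`, and it leaves out the `lf` log-file argument. The name `job_monitor_sketch` is deliberately different from the real one.

```python
def job_monitor_sketch(run_func, iterator, action_name, M=map, maxiter=1, verbose=True):
    """Run run_func over iterator via the mapper M, retrying failed jobs.

    Assumption: run_func(item) returns 0 on success, nonzero on failure.
    Returns the list of items that still failed after maxiter attempts.
    """
    pending = list(iterator)
    for attempt in range(1, maxiter + 1):
        if not pending:
            break
        # M could be the builtin map or, e.g., a process pool's map method
        exit_codes = list(M(run_func, pending))
        pending = [item for item, code in zip(pending, exit_codes) if code != 0]
        if verbose:
            print("{}: attempt {}, {} job(s) failed".format(action_name, attempt, len(pending)))
    return pending

attempts = {}

def flaky_job(i):
    # Demo job: item 2 fails on its first attempt only, item 3 always fails.
    attempts[i] = attempts.get(i, 0) + 1
    if i == 3 or (i == 2 and attempts[i] == 1):
        return 1
    return 0

failures = job_monitor_sketch(flaky_job, range(5), "DEMO", maxiter=2, verbose=False)
print(failures)
```

Only the jobs that failed are re-run on later attempts, so a transient failure costs one extra call rather than a rerun of the whole batch.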

#-------------------------------------------------------------------------------
if run_diff:
# get algorithm parameters
globals().update(cf['algorithm']['diff'])
Collaborator

I'm unsure about this strategy of updating globals. It seems a bit opaque, and I can imagine things going wrong. Perhaps we can discuss this, but probably fine to leave it for now.

Member Author

Okay, we can change to a more explicit call to the input parameter attributes
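
One possible shape for the more explicit approach is to load each block's parameters into a namespace object instead of the module globals. This is a sketch only: `load_params` is a hypothetical helper, and the keys `threshold` and `n_iter` are made-up placeholders rather than real pipeline parameters.

```python
from types import SimpleNamespace

def load_params(cf, block):
    """Return the parameters of one algorithm block as an attribute namespace."""
    return SimpleNamespace(**cf["algorithm"][block])

# Stand-in for the parsed YAML config; the keys are placeholders.
cf = {"algorithm": {"diff": {"threshold": 0.5, "n_iter": 3}}}

diff_params = load_params(cf, "diff")
print(diff_params.threshold, diff_params.n_iter)
```

Every parameter access then names its block (`diff_params.threshold`), so two blocks with a same-named key cannot silently overwrite each other, and a missing key raises an `AttributeError` at the point of use.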

return 0

# launch pspec jobs
failures = hp.utils.job_monitor(pspec, range(len(jobs)), "PSPEC", lf=lf, maxiter=maxiter, verbose=verbose)
Collaborator

Does this do any timing? If not, it might be neat to print some timing info so we can check on progress and estimate how long things are likely to run for.

Member Author

Sure I can add some basic timing into it

Member Author

So each block in the pspec pipeline already has a timer, so we can do small runs and get a sense for how long each block takes.
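
A per-block timer of the kind mentioned could look roughly like this. It is a generic sketch built on the standard library, not the timer actually used in the pipeline script; `block_timer` and its `log` argument are hypothetical names.

```python
import time
from contextlib import contextmanager

@contextmanager
def block_timer(name, log=print):
    """Report the wall-clock time spent inside the with-block."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        log("{} block finished in {:.2f} s".format(name, elapsed))

messages = []
with block_timer("PSPEC", log=messages.append):
    total = sum(range(1000))  # stand-in for the real work of a block

print(messages[0])
```

Because the report is written in the `finally` clause, the elapsed time is logged even if the block raises, which helps when estimating how long a failed run got before stopping.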

nkern added 23 commits July 16, 2018 00:06
…ic err PR

where cov_array was not being propagated to pspec_run, and also wasn't being
loaded when read_from_group was called. added tests to check for these issues.
@philbull
Collaborator

OK, @nkern feel free to merge when you're ready.

@nkern
Member Author

nkern commented Jul 17, 2018

@philbull Okay, I'm going to add some unit tests for the pipeline scripts, now that we've settled on keeping the pspec pipeline in hera_pspec and creating a secondary stats_pipe in hera_stats. Should be done in a few hours...

@nkern
Member Author

nkern commented Jul 18, 2018

...and we are there..

@nkern nkern merged commit 20e7efc into master Jul 18, 2018
@ghost ghost removed the in progress label Jul 18, 2018
@nkern nkern deleted the pspec_pipe branch July 18, 2018 14:47