Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added option for z_truth to be inside an hdf5 group #94

Merged
merged 6 commits into from
Apr 3, 2024

Conversation

joselotl
Copy link
Contributor

Problem & Solution Description (including issue #)

If the spec-z file has truth-z inside an h5 group (for example ['photometry']['z_true']) the evaluator needs to use hdf5_groupname to access this row.

Code Quality

  • My code follows the code style of this project
  • I have written unit tests or justified all instances of #pragma: no cover; in the case of a bugfix, a new test that breaks as a result of the bug has been added
  • My code contains relevant comments and necessary documentation for future maintainers; the change is reflected in applicable demos/tutorials (with output cleared!) and added/updated docstrings use the NumPy docstring format
  • Any breaking changes, described above, are accompanied by backwards compatibility and deprecation warnings

@joselotl joselotl linked an issue Mar 13, 2024 that may be closed by this pull request
@joselotl joselotl changed the title I added the option for z_truth to be inside an hdf5 group Added option for z_truth to be inside an hdf5 group Mar 22, 2024
Copy link

codecov bot commented Mar 22, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 96.78%. Comparing base (0b731ec) to head (f374463).

Additional details and impacted files
@@           Coverage Diff           @@
##             main      #94   +/-   ##
=======================================
  Coverage   96.78%   96.78%           
=======================================
  Files          31       31           
  Lines        1584     1585    +1     
=======================================
+ Hits         1533     1534    +1     
  Misses         51       51           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@joselotl
Copy link
Contributor Author

I did a small change to the evaluator.py so that we would use catalogs where z_truth is inside an hdf5 group like ['photometry']

@sschmidt23
Copy link
Collaborator

I think what most of the estimator codes do is read in the training data as a whole and then grab the redshift from that data in a separate line. But, this doesn't cover the case where the redshift column might be in a different hdf5 group, should we be worried about that at all?
The other thing is, with the code in this PR, if there happened to be a column with name self.config.redshift_col in the top level hdf5, then the try statement would grab that even if the intended redshift_col was in the sub-group. I think maybe the better thing to do would be what we do in several of the estimator codes, e.g. this from rail_flexzboost:

        if self.config.hdf5_groupname:
            training_data = self.get_data('input')[self.config.hdf5_groupname]
        else:  # pragma: no cover
            training_data = self.get_data('input')
        speczs = np.array(training_data[self.config['redshift_col']])

@joselotl
Copy link
Contributor Author

@sschmidt23 If I try to use self.config.hdf5_groupname I get an error when it was not previously defined. Or if I add
hdf5_groupname=SHARED_PARAMS['hdf5_groupname']
to the config_options, the test asigns the value "photometry" when I want it to have a default value of None

@sschmidt23
Copy link
Collaborator

I just pushed a change where to get the test to pass I set an hdf5_groupname="" in the test's make_stage setup. If you want a default value of hdf5_groupname of None or "", then maybe we could define a config parameter with a different name from hdf5_groupname, maybe truth_groupname or something like that to distinguish it from the hdf5_groupname parameter that we use for the estimators, and then we can set that config parameter to have whatever default we want. I guess it's really a question of whether we think that the truth data is always going to be read from files that follow the same patterns of having and hdf5_groupname that is the same as the data from the estimators. If so, hdf5_groupname is probably fine, if we think it will occasionally be different, then maybe a unique config param is needed.

@eacharles eacharles force-pushed the 93-allow-evaluator-to-read-z_true branch from 6345890 to 6825bbe Compare April 2, 2024 23:11
@sschmidt23
Copy link
Collaborator

Ok, I'm fine with doing it this way, I'll approve the PR, if you think about things further and think we need a "truth_groupname" we can add that in a future PR.

Copy link
Collaborator

@sschmidt23 sschmidt23 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me

@eacharles eacharles merged commit 9dd9522 into main Apr 3, 2024
6 checks passed
@joselotl joselotl deleted the 93-allow-evaluator-to-read-z_true branch May 1, 2024 21:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow evaluator to read z_true
3 participants