Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue/10/tomo bins #18

Merged
merged 27 commits into from
Aug 3, 2023
Merged

Issue/10/tomo bins #18

merged 27 commits into from
Aug 3, 2023

Conversation

hangqianjun
Copy link
Contributor

@hangqianjun hangqianjun commented Jun 19, 2023

Change Description

Solution Description

In this PR I have added a tomographer module under src/rail/estimation. It contains two generic tomographer classes:

  • PZTomographer, which takes per-galaxy n(z) from a qp.Ensemble object and output a tabular object with tomographic binning;
  • CatTomographer, which takes catalogue-like data and output a tabular object with tomographic binning;
    The second type will be compatible with the classifiers in tomo-challenge, where features in the catalogue are used to assign tomographic bins.

For each of these types, I've added an example classifier in algos/. naiveClassifierSRD is a PZtomographer that uses simple point estimate SRD binning; randomForestClassifier is a CatTomographer which is adapted from TXPipe.

Code Quality

  • I have read the Contribution Guide
  • My code follows the code style of this project
  • My code builds (or compiles) cleanly without any errors or warnings
  • My code contains relevant comments and necessary documentation

Project-Specific Pull Request Checklists

Bug Fix Checklist

  • My fix includes a new test that breaks as a result of the bug (if possible)
  • My change includes a breaking change
    • My change includes backwards compatibility and deprecation warnings (if possible)

New Feature Checklist

  • I have added or updated the docstrings associated with my feature using the NumPy docstring format
  • I have updated the tutorial to highlight my new feature (if appropriate)
  • I have added unit/End-to-End (E2E) test cases to cover my new feature
  • My change includes a breaking change
    • My change includes backwards compatibility and deprecation warnings (if possible)

Documentation Change Checklist

Build/CI Change Checklist

  • If required or optional dependencies have changed (including version numbers), I have updated the README to reflect this
  • If this is a new CI setup, I have added the associated badge to the README

Other Change Checklist

  • Any new or updated docstrings use the NumPy docstring format.
  • I have updated the tutorial to highlight my new feature (if appropriate)
  • I have added unit/End-to-End (E2E) test cases to cover any changes
  • My change includes a breaking change
    • My change includes backwards compatibility and deprecation warnings (if possible)

@codecov
Copy link

codecov bot commented Jun 19, 2023

Codecov Report

Patch coverage: 100.00% and project coverage change: +0.19% 🎉

Comparison is base (c84ab6a) 95.86% compared to head (727298d) 96.05%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #18      +/-   ##
==========================================
+ Coverage   95.86%   96.05%   +0.19%     
==========================================
  Files          29       32       +3     
  Lines        1643     1724      +81     
==========================================
+ Hits         1575     1656      +81     
  Misses         68       68              
Files Changed Coverage Δ
src/rail/estimation/algos/equal_count.py 100.00% <100.00%> (ø)
src/rail/estimation/algos/uniform_binning.py 100.00% <100.00%> (ø)
src/rail/estimation/classifier.py 100.00% <100.00%> (ø)

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@aimalz aimalz self-requested a review June 26, 2023 15:33
@hangqianjun hangqianjun self-assigned this Jul 4, 2023
@hangqianjun
Copy link
Contributor Author

While trying to writing a unit test for the PZTomographer algorithm, I realised that there doesn't seem to be test qp.Ensemble data available in src/rail/examples_data/testdata/. Could we add a small test qp file like test_dc2_training_9816.hdf5?

@eacharles
Copy link
Collaborator

eacharles commented Jul 4, 2023 via email

@hangqianjun
Copy link
Contributor Author

hangqianjun commented Jul 6, 2023

Depends on the size. If it is more that a few 10’s of galaxies it would be better to download the data than to include it in the repo.

Are these data available somewhere already?

@eacharles
Copy link
Collaborator

eacharles commented Jul 6, 2023

Does src/rail/examples_data/testdata/output_BPZ_lite.fits work? it is already in the repo.

@sschmidt23
Copy link
Collaborator

sschmidt23 commented Jul 6, 2023

@hangqianjun I was looking at this PR earlier, I don't think that we want to have input parameters in a .ini file unless there is a strong reason to do so, as that file could change between runs if someone pushes a change and updated values could lead to non-reproducibility on subsequent runs. Having all of the parameters as config params seems like a better way of tracking things in terms of reproducibility. Was there a reason to do this with a .ini file, e.g. maybe TXPipe does something like that?

Also, output_BPZ_lite.fits may not be the best test file, as the mode values for the first 10 galaxies are all either at z<0.2 or z>2.8, and thus the default tomographer value (for SRD binning) for all 10 is -99. We may need to include another small test file here with more appropriate mode values.

Other than that, this looks very nice!

@eacharles eacharles self-requested a review July 11, 2023 18:41
Copy link
Collaborator

@eacharles eacharles left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, thanks Qianjun!

@hangqianjun
Copy link
Contributor Author

Hi @aimalz, are you happy with the module/algorithm names? Let me know if you have suggestions!

Copy link
Collaborator

@aimalz aimalz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a couple comments on naming and documentation but my major request is to split this first classifier into two because the algorithms they're implementing are different enough that they should be distinguished by more than just a config keyword.

(Not a barrier to this PR, but just so we don't forget, a final follow-up to this would be to add a cell or two to the Golden Spike notebook in the vanilla rail repo demonstrating usage and plotting the results.)

src/rail/estimation/tomographer.py Outdated Show resolved Hide resolved
src/rail/estimation/algos/naiveClassifierSRD.py Outdated Show resolved Hide resolved
src/rail/estimation/algos/naiveClassifierSRD.py Outdated Show resolved Hide resolved
src/rail/estimation/algos/naiveClassifierSRD.py Outdated Show resolved Hide resolved
src/rail/estimation/tomographer.py Outdated Show resolved Hide resolved
tests/estimation/test_tomographer.py Outdated Show resolved Hide resolved
src/rail/estimation/algos/naiveClassifierSRD.py Outdated Show resolved Hide resolved
@aimalz aimalz requested a review from joezuntz July 13, 2023 17:23
Copy link
Collaborator

@aimalz aimalz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💯

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Simple class for tomographic binning
5 participants