Merge branch 'main' into awo/update-readme

drewoldag committed Jul 25, 2023
2 parents d98c11b + b1b2738 commit 414c59d
Showing 28 changed files with 370 additions and 337 deletions.
12 changes: 8 additions & 4 deletions docs/index.rst
@@ -33,12 +33,16 @@ The RAIL source code is publicly available at https://github.com/LSSTDESC/RAIL

.. toctree::
   :maxdepth: 1
-   :caption: Usage Demos
+   :caption: API

-   demos
+   api

.. toctree::
   :maxdepth: 1
-   :caption: API
+   :caption: Usage Demos

-   api
+   Core Notebooks <https://rail-hub.readthedocs.io/projects/rail-notebooks/en/latest/core_notebooks.html>
+   Creation Notebooks <https://rail-hub.readthedocs.io/projects/rail-notebooks/en/latest/creation_notebooks.html>
+   Estimation Notebooks <https://rail-hub.readthedocs.io/projects/rail-notebooks/en/latest/estimation_notebooks.html>
+   Evaluation Notebooks <https://rail-hub.readthedocs.io/projects/rail-notebooks/en/latest/evaluation_notebooks.html>
+   Goldenspike <https://rail-hub.readthedocs.io/projects/rail-notebooks/en/latest/goldenspike.html>
4 changes: 2 additions & 2 deletions docs/source/citing.rst
@@ -27,7 +27,7 @@ The following list provides the necessary references for external codes accessible
| GPz:
-| PZFlowPDF:
+| PZFlowEstimator:
| J. F. Crenshaw et al (in prep)
| `Zenodo link <https://zenodo.org/record/6369625#.Ylcpjy-cYW8>`_
@@ -38,4 +38,4 @@ The following list provides the necessary references for external codes accessible
| trainZ:
| `Schmidt, Malz et al (2020) <https://ui.adsabs.harvard.edu/abs/2020MNRAS.499.1587S/abstract>`_
-| varInference:
+| VarInfStackSummarizer:
43 changes: 29 additions & 14 deletions docs/source/contributing.rst
@@ -1,6 +1,6 @@
-************
-Overview
-************
+**********************
+Contribution Overview
+**********************

RAIL is a constellation of multiple packages developed publicly on GitHub and
welcomes all interested developers, regardless of DESC membership or LSST data rights.
@@ -83,6 +83,8 @@ Once you are satisfied with your PR, request that other team members review and
approve it. You could send the request to someone whom you've worked with on the
topic, or one of the core maintainers of rail.

+**TODO what to call branches goes here**


Merge
-----
@@ -93,6 +95,15 @@ Once the changes in your PR have been approved, these are your next steps:
2. enter ``closes #[#]`` in the comment field to close the resolved issue
3. delete your branch using the button on the merged pull request.

+If you are making changes that affect multiple repositories, make a branch and PR on each one.
+The PRs should be merged and new releases made in the following order, without long delays between steps:
+1. `rail_base`
+2. all per-algorithm repositories, in any order
+3. `rail`
+4. `rail_pipelines`
+This will minimize the time during which new installations from PyPI could be broken by conflicts.


Reviewing a PR
--------------

@@ -118,36 +129,39 @@ Naming conventions
We follow the `pep8 <https://peps.python.org/pep-0008/#descriptive-naming-styles>`_
recommendations for naming new modules and ``RailStage`` classes within them.


Modules
-------

Modules should use all lowercase, with underscores where it aids the readability
-of the module name. If the module performs only one of p(z) or n(z) calculations,
-it is convenient to include that in the module name.
+of the module name.

-e.g.
+For example:

-* ``simple_neurnet`` is a module name for algorithms that use simple neural networks from sklearn to compute p(z) or n(z)
-* ``random_pz`` is an algorithm that computes p(z)
+* ``skl_neurnet`` is a module name for algorithms that use scikit-learn's simple neural network implementation to estimate p(z)
+* ``random_gauss`` is a module name for a p(z) estimation algorithm that assigns each galaxy a random Gaussian distribution

+It's good for the module name to specify the source of the implementation of a particularly common algorithm, e.g. ``minisom_som`` and ``somoclu_som`` are distinct.
+Note that these names should not be identical to the name of the package the algorithm came from, to avoid introducing namespace collisions for users who have imported the original package as well, i.e. ``pzflow_nf`` is a safer name than ``pzflow``.
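
As a concrete illustration of the collision concern, a minimal sketch (the RAIL import path shown here is illustrative, not guaranteed):

.. code-block:: python

    import pzflow                                 # the upstream package
    from rail.estimation.algos import pzflow_nf   # the RAIL wrapper module

    # Because the wrapper module is named pzflow_nf rather than pzflow,
    # both imports can coexist in one session without shadowing each other.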


Stages
------

-RailStages are python classes and so should use CapWords convention. All rail
-stages using the same algorithm should use the same short, descriptive prefix,
-and the suffix is the type of stage.
+RailStages are Python classes and so should use the CapWords convention. All
+rail stages using the same algorithm should use the same short, descriptive
+prefix, and the suffix is the type of stage.

e.g.

-* ``SimpleNNInformer`` is an informer using a simple neural network
-* ``SimpleNNEstimator`` is an estimator using a simple neural network
+* ``KNearNeighInformer`` is an informer using the k-nearest neighbors algorithm
+* ``KNearNeighEstimator`` is an estimator using the k-nearest neighbors algorithm

Possible suffixes include:

-* Summarizer
* Informer
* Estimator
+* Summarizer
* Classifier
* Creator
* Degrader
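
A schematic sketch of the prefix/suffix convention (illustrative only: the base-class import path is assumed, and the real classes in ``rail.estimation`` carry configuration and run logic):

.. code-block:: python

    from rail.estimation.estimator import CatEstimator, CatInformer  # assumed import path

    class KNearNeighInformer(CatInformer):
        """Inform (train) stage for the k-nearest neighbors algorithm."""
        name = "KNearNeighInformer"

    class KNearNeighEstimator(CatEstimator):
        """Estimation stage producing per-galaxy p(z) from the trained model."""
        name = "KNearNeighEstimator"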
@@ -164,3 +178,4 @@ for those workflows:
* :ref:`Adding a new Rail Stage` without new dependencies
* :ref:`Adding a new algorithm` (new engine or package)
* :ref:`Sharing a Rail Pipeline`

10 changes: 5 additions & 5 deletions docs/source/installation.rst
@@ -187,15 +187,15 @@ For Delight you should be able to just do:
pip install pz-rail-delight
-However, the particular estimator `Delight` is built with `Cython` and uses `openmp`. Mac has dropped native support for `openmp`, which will likely cause problems when trying to run the `delightPZ` estimation code in RAIL. See the notes below for instructions on installing Delight if you wish to use this particular estimator.
+However, the particular estimator `Delight` is built with `Cython` and uses `openmp`. Mac has dropped native support for `openmp`, which will likely cause problems when trying to run the `DelightEstimator` estimation code in RAIL. See the notes below for instructions on installing Delight if you wish to use this particular estimator.

-If you are installing RAIL on a Mac, as noted above the `delightPZ` estimator requires that your machine's `gcc` be set up to work with `openmp`. If you are installing on a Mac and do not plan on using `delightPZ`, then you can simply install RAIL with `pip install .[base]` rather than `pip install .[all]`, which will skip the Delight package. If you are on a Mac and *do* expect to run `delightPZ`, then follow the instructions `here <https://github.com/LSSTDESC/Delight/blob/master/Mac_installation.md>`_ to install Delight before running `pip install .[all]`.
+If you are installing RAIL on a Mac, as noted above the `DelightEstimator` estimator requires that your machine's `gcc` be set up to work with `openmp`. If you are installing on a Mac and do not plan on using `DelightEstimator`, then you can simply install RAIL with `pip install .[base]` rather than `pip install .[all]`, which will skip the Delight package. If you are on a Mac and *do* expect to run `DelightEstimator`, then follow the instructions `here <https://github.com/LSSTDESC/Delight/blob/master/Mac_installation.md>`_ to install Delight before running `pip install .[all]`.


-Installing FZBoost
+Installing FlexZBoost
------------------

-For FZBoost, you should be able to just do
+For FlexZBoost, you should be able to just do

.. code-block:: bash
@@ -229,7 +229,7 @@ Using GPU-optimization for pzflow
Note that the Creation Module depends on pzflow, which has an optional GPU-compatible installation.
For instructions, see the `pzflow Github repo <https://github.com/jfcrenshaw/pzflow/>`_.

-On some systems that are slightly out of date, e.g. an older version of python's `setuptools`, there can be some problems installing packages hosted on GitHub rather than PyPi. We recommend that you update your system; however, some users have still reported problems with installation of subpackages necessary for `FZBoost` and `bpz_lite`. If this occurs, try the following procedure:
+On some systems that are slightly out of date, e.g. an older version of Python's `setuptools`, there can be some problems installing packages hosted on GitHub rather than PyPI. We recommend that you update your system; however, some users have still reported problems with installation of subpackages necessary for `flexzboost` and `bpz_lite`. If this occurs, try the following procedure:

Once you have installed RAIL, you can import the package (via `import rail`) in any of your scripts and notebooks.
For examples demonstrating how to use the different pieces, see the notebooks in the `examples/` directory.
4 changes: 2 additions & 2 deletions docs/source/overview.rst
@@ -72,8 +72,8 @@ Methods that estimate per-galaxy PDFs directly from photometry are referred to as
Individual estimation and summarization codes are "wrapped" as RAIL stages so that they can be run in a controlled way.

**base design**:
-Estimators for several popular codes `BPZ_lite` (a slimmed down version of the popular template-based BPZ code), `FlexZBoost`, and delight `Delight` are included in rail/estimation, as are an estimator `PZFlowPDF` that uses the same normalizing flow employed in the creation module, and `KNearNeighPDF` for a simple color-based nearest neighbor estimator.
-The pathological `trainZ` estimator is also implemented.
+Estimators for several popular codes `BPZliteEstimator` (a slimmed-down version of the popular template-based BPZ code), `FlexZBoostEstimator`, and `DelightEstimator` are included in rail/estimation, as are `PZFlowEstimator`, which uses the same normalizing flow employed in the creation module, and `KNearNeighEstimator`, a simple color-based nearest neighbor estimator.
+The pathological `TrainZEstimator` is also implemented.
Several very basic summarizers such as a histogram of point source estimates, the naive "stacking"/summing of PDFs, and a variational inference-based summarizer are also included in RAIL.

**Usage**:
41 changes: 17 additions & 24 deletions examples/core_examples/FileIO_DataStore.ipynb
@@ -4,10 +4,17 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Data, files, IO, and RAIL\n",
"# Data, Files, IO, and RAIL\n",
"\n",
"author: Sam Schmidt<br>\n",
"Last successfully run: Apr 26, 2023<br>\n",
"author: Sam Schmidt\n",
"\n",
"last successfully run: Apr 26, 2023"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"The switchover to a `ceci`-based backend has increased the complexity of methods of data access and IO, this notebook will demonstrate a variety of ways that users may interact with data in RAIL<br>\n",
"\n",
Expand All @@ -17,7 +24,7 @@
"\n",
"In short, `tables_io` aims to simplify fileIO, and much of the io is automatically sorted out for you if your files have the appriorate extensions: that is, you can simply do a tables_io.read(\"file.fits\") to read in a fits file or tables_io.read(\"newfile.pq\") to read in a dataframe in parquet format. Similarly, you can specify the output format via the extension as well. This functionality is extended to `qp` and `RAIL` through their use of `tables_io`, and file extensions will control how files are read and written unless explicitly overridden.\n",
"\n",
"Another concept used in the `ceci`-based RAIL when used in a Jupyter Notebook is the DataStore and DataHandle file specifications (see [RAIL/rail/core/data.py](https://github.com/LSSTDESC/RAIL/blob/main/rail/core/data.py) for the actual code implementing these). `ceci` requires that each pipeline stage have defined `input` and `output` files, and is primarily geared toward pipelines rather than interactive runs with a jupyter notebook. The DataStore enables interactive use of files in Jupyter. We will demonstrate some useful features of the DataStore below.\n",
"Another concept used in the `ceci`-based RAIL when used in a Jupyter Notebook is the DataStore and DataHandle file specifications (see [rail_base/src/rail/core/data.py](https://github.com/LSSTDESC/rail_base/blob/main/src/rail/core/data.py) for the actual code implementing these). `ceci` requires that each pipeline stage have defined `input` and `output` files, and is primarily geared toward pipelines rather than interactive runs with a jupyter notebook. The DataStore enables interactive use of files in Jupyter. We will demonstrate some useful features of the DataStore below.\n",
"\n",
"Let's start out with some imports:"
]
@@ -42,7 +49,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"First, let's use tables_io to read in some example data. There are two example files that ship with RAIL containing a small amount of cosmoDC2 data from healpix pixel `9816`, it is located in the `RAIL/tests/data/` directory in the RAIL repository, one for \"training\" and one for \"validation\". Let's read in one of those data files with tables_io:\n",
"First, let's use tables_io to read in some example data. There are two example files that ship with RAIL containing a small amount of cosmoDC2 data from healpix pixel `9816`, it is located in the `rail_base/src/rail/examples_data/testdata/` directory in the rail_base repository, one for \"training\" and one for \"validation\". Let's read in one of those data files with tables_io:\n",
"\n",
"(NOTE: for historical reasons, our examples files have data that is in hdf5 format where all of the data arrays are actually in a single hdf5 group named \"photometry\". We will grab the data specifically from that hdf5 group by reading in the file and specifying [\"photometry\"] as the group in the cell below. We'll call our dataset \"traindata_io\" to indicate that we've read it in via tables_io, and distinguish it from the data that we'll place in the DataStore in later steps:"
]
Expand Down Expand Up @@ -219,9 +226,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Using the data in a pipeline stage: photo-z estimation example\n",
"## Using the data in a pipeline stage: photo-z estimation example\n",
"\n",
"Now that we have our data in place, we can use it in a RAIL stage. As an example, we'll estimate photo-z's for our data. Let's train the `KNearNeighPDF` algorithm with our train_data, and then estimate photo-z's for the test_data. We need to make the RAIL stages for each of these steps, first we need to train/inform our nearest neighbor algorithm with the train_data:"
"Now that we have our data in place, we can use it in a RAIL stage. As an example, we'll estimate photo-z's for our data. Let's train the `KNearNeighEstimator` algorithm with our train_data, and then estimate photo-z's for the test_data. We need to make the RAIL stages for each of these steps, first we need to train/inform our nearest neighbor algorithm with the train_data:"
]
},
{
@@ -230,7 +237,7 @@
"metadata": {},
"outputs": [],
"source": [
"from rail.estimation.algos.knnpz import Inform_KNearNeighPDF, KNearNeighPDF"
"from rail.estimation.algos.k_nearneigh import KNearNeighInformer, KNearNeighEstimator"
]
},
{
@@ -239,7 +246,7 @@
"metadata": {},
"outputs": [],
"source": [
"inform_knn = Inform_KNearNeighPDF.make_stage(name='inform_knn', input='train_data', \n",
"inform_knn = KNearNeighInformer.make_stage(name='inform_knn', input='train_data', \n",
" nondetect_val=99.0, model='knnpz.pkl',\n",
" hdf5_groupname='')\n"
]
@@ -268,7 +275,7 @@
"metadata": {},
"outputs": [],
"source": [
"estimate_knn = KNearNeighPDF.make_stage(name='estimate_knn', hdf5_groupname='photometry', nondetect_val=99.0,\n",
"estimate_knn = KNearNeighEstimator.make_stage(name='estimate_knn', hdf5_groupname='photometry', nondetect_val=99.0,\n",
" model='knnpz.pkl', output=\"KNNPZ_estimates.hdf5\")"
]
},
@@ -361,20 +368,6 @@
"source": [
"That's about it. For more usages, including how to chain together multiple stages, feeding results one into the other with the DataStore names, see goldenspike.ipynb in the examples/goldenspike directory."
]
-},
-{
-"cell_type": "code",
-"execution_count": null,
-"metadata": {},
-"outputs": [],
-"source": []
-},
-{
-"cell_type": "code",
-"execution_count": null,
-"metadata": {},
-"outputs": [],
-"source": []
}
],
"metadata": {
45 changes: 24 additions & 21 deletions examples/core_examples/FluxtoMag_and_Deredden_example.ipynb
@@ -5,8 +5,11 @@
"id": "21af28b2",
"metadata": {},
"source": [
"author: Sam Schmidt<br>\n",
"Last successfully run: Apr 26, 2023<br>"
"# Flux to Mag And Deredden\n",
"\n",
"author: Sam Schmidt\n",
"\n",
"last successfully run: Apr 26, 2023"
]
},
{
@@ -54,12 +57,22 @@
"test_data().info()"
]
},
+{
+"cell_type": "markdown",
+"id": "53bbeef5",
+"metadata": {},
+"source": [
+"### Fluxes to Mags"
+]
+},
{
"cell_type": "markdown",
"id": "6c235d20-74bd-4acb-957a-d2f24faa5827",
"metadata": {},
"source": [
"To convert mags to fluxes, we need to specify patterns for the `flux_name` and `flux_err_name` columns to be converted, and the `mag_name` and `mag_err_name` columsn that will store the newly created magnitudes. This is done as below, by specifying a string listing the bands, and `{band}` in the patterns where the individual bands will go. The dictionary below duplicates the default behavior of the converter, but is written out explicitly as an example:"
"To convert fluxes to mags, we need to specify patterns for the `flux_name` and `flux_err_name` columns to be converted, and the `mag_name` and `mag_err_name` columns that will store the newly created magnitudes.\n",
"\n",
"This is done as below, by specifying a string listing the bands, and `{band}` in the patterns where the individual bands will go. The dictionary below duplicates the default behavior of the converter, but is written out explicitly as an example:"
]
},
{
@@ -110,15 +123,11 @@
"id": "7214fc16-3555-4553-9a0f-b145c21d86d0",
"metadata": {},
"source": [
"To deredden magnitudes we need to grab one of the dust maps used by the `dustmaps` package. We'll grab the default Schlegel-Finkbeiner-Davis \"SFD\" map. NOTE: This will download a file to your machine containing the SFD data!"
]
},
{
"cell_type": "markdown",
"id": "6394d1ce-df67-4da3-a64e-e21a3cbfffd3",
"metadata": {},
"source": [
"We need to feed the location of the directory containing the newly created \"sfd\" maps to the stage, as we downloaded the data to the present working directory with the command above, that directory is just `\"./\"`"
"### Deredden Mags\n",
"\n",
"To deredden magnitudes we need to grab one of the dust maps used by the `dustmaps` package. We'll grab the default Schlegel-Finkbeiner-Davis \"SFD\" map. NOTE: This will download a file to your machine containing the SFD data!\n",
"\n",
"We need to feed the location of the directory containing the newly created \"sfd\" maps to the stage. As we downloaded the data to the present working directory with the command above, that directory is just `\"./\"`"
]
},
{
@@ -189,7 +198,9 @@
"tags": []
},
"source": [
"## for cleanup, uncomment the line below to delete that SFD map directory downloaded in this example\n"
"### Clean up\n",
"\n",
"For cleanup, uncomment the line below to delete that SFD map directory downloaded in this example:"
]
},
{
@@ -203,14 +214,6 @@
"source": [
"#! rm -rf sfd/"
]
-},
-{
-"cell_type": "code",
-"execution_count": null,
-"id": "028193ea",
-"metadata": {},
-"outputs": [],
-"source": []
}
],
"metadata": {