Emissivity compiling and loading. (#62)
* Created CSP architecture and basic SFH class
* Added variables to constants
* Implements utility functions & new test files
* Implements event rate calculation over time
* BPASS data_files for testing file_loading

* Adds `at_time` function to the event rate class

This method allows the event rate to be calculated at a given lookback time, using the BPASS binning. It is much faster, but suffers from a lack of accuracy.
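A minimal sketch of the nearest-bin idea described above (the log-spaced BPASS time bins are real BPASS conventions, but `rate_at_time` and the toy rates are illustrative assumptions, not hoki's API):

```python
import numpy as np

# BPASS log-age bins: log10(age/yr) from 6.0 to 11.0 in 0.1 dex steps (51 bins)
BPASS_TIME_BINS = np.arange(6.0, 11.1, 0.1)

def rate_at_time(rates, lookback_time):
    """Look up a precomputed per-bin rate at the BPASS bin closest to the
    requested lookback time (in years) - fast, but only bin-accurate."""
    idx = np.argmin(np.abs(10**BPASS_TIME_BINS - lookback_time))
    return rates[idx]

rates = np.linspace(1.0, 0.0, len(BPASS_TIME_BINS))  # toy per-bin rates
value = rate_at_time(rates, 1e7)  # picks the bin nearest to 10 Myr
```

The speed comes from replacing an integral with a single array lookup; the accuracy loss is exactly the width of the chosen bin.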

* Adds docstrings and extra tests

* Implements calculate spectrum at_time and over_time functions

Both functions are implemented for the BPASS spectra.
However, the method is similar to the BPASS event rate calculation, which is suboptimal for the spectra calculation: it takes an extremely long time, because a for loop of 100000 iterations treats each wavelength as its own "event type". It should be possible to remove this loop by handling the separate wavelengths as a single unit. It would be interesting to see the influence of putting the for loop inside a numba function, but restructuring the function is probably the better option.

This also includes a "caching" of the BPASS spectra into a pickled DataFrame.



Additional test files are required for this to run.
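A toy sketch of the restructuring suggested above: the per-wavelength loop collapses into a single matrix-vector product (array sizes reduced for the demo; all names are illustrative, not hoki's):

```python
import numpy as np

rng = np.random.default_rng(0)
n_bins, n_wl = 51, 1_000               # the real spectra have 100000 wavelengths
spectra = rng.random((n_bins, n_wl))   # luminosity per time bin and wavelength
mass_per_bin = rng.random(n_bins)      # solar masses formed in each time bin

# Loop version: treats every wavelength as its own "event type"
def spectrum_loop():
    out = np.zeros(n_wl)
    for i in range(n_wl):
        out[i] = np.sum(mass_per_bin * spectra[:, i])
    return out

# Vectorised version: all wavelengths handled as a single unit
def spectrum_vectorised():
    return mass_per_bin @ spectra
```

Both return the same spectrum, but the matrix product delegates the whole loop to optimised BLAS code.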

* Add normalisation function for the BPASS spectrum

* fixed hrd plot

* adding csp folder

* minor changes

* Moved tests to CSP folder + fix in function name _type_check

* added a hokitype error

* Added some code review for Max and minor stylistic changes

* Added some code review for Max and minor stylistic changes

* Updated the tests

* bla

* Adjust tests and docstrings

* Adjust eventrate functions to take lists of functions

Instead of requiring a scipy interpolated spline as input, these functions now take a list of functions as input for the star formation and metallicity history.

This adjustment has been made by using `numpy.interp` instead of scipy.interpolate. Furthermore, instead of scipy's spline integration, `numpy.trapz` is now used to calculate the mass per bin.
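The mass-per-bin idea can be sketched as follows (the function name, signature, and sample rate are illustrative assumptions, not hoki's exact API):

```python
import numpy as np

# np.trapz was renamed to np.trapezoid in NumPy 2.0
trapz = np.trapz if hasattr(np, "trapz") else np.trapezoid

def mass_per_bin(sfh_fn, time_edges, sample_rate=25):
    """Integrate a star-formation-history callable over each time bin with
    the trapezoidal rule (replacing scipy's spline integration)."""
    masses = np.zeros(len(time_edges) - 1)
    for i, (t0, t1) in enumerate(zip(time_edges[:-1], time_edges[1:])):
        t = np.linspace(t0, t1, sample_rate)   # sample the SFH inside the bin
        masses[i] = trapz(sfh_fn(t), t)
    return masses

edges = np.array([0.0, 1.0, 2.0, 3.0])
# constant SFR of 2 M_sun/yr: each bin of width 1 yr holds ~2 M_sun
m = mass_per_bin(lambda t: np.full_like(t, 2.0), edges)
```

Accepting a plain callable keeps the interface simple, while trapezoidal integration avoids building a spline object at all.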

* Disabled spectra test

* Added parametric star formation histories to SFH and refactored some variable names

* fixing CI

* Auto stash before merge of "csp" and "official_hoki/csp"

* upload files for remote work

* Improve testing and functions

- Adds BPASS IMFS to the constants
- Adds check to BPASS IMFS
- Adds test for `data_compiler.SpectraData`
- Reduces data usage for rate tests

* Add SpectraCompiler test

Uses `unittest.mock` to allow for a smaller DataFrame to be used.
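The mocking approach might look roughly like this (the loader and file name below are stand-ins for illustration, not hoki's actual code):

```python
import unittest.mock as mock
import numpy as np

# Hypothetical loader that normally reads a huge BPASS spectra file
def load_spectra(path):
    return np.loadtxt(path)   # real file: 51 rows x 100001 columns

small = np.ones((51, 11))     # tiny stand-in for the full array

# Patch numpy.loadtxt so the test never touches the real (large) data file
with mock.patch("numpy.loadtxt", return_value=small) as m:
    spectra = load_spectra("spectra-bin-imf135_300.z020.dat")
```

The test then asserts on the small array's shape and on the mock's call record instead of shipping gigabytes of test data.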

* Adds extensive tests for the CSP package

* Update SpectraCompiler

The compiler now outputs the BPASS spectra as a 3D `numpy.ndarray`. This minimises storage and allows for faster, less memory-hungry operations than the MultiIndex pandas DataFrame.

The tests are updated accordingly, and using `unittest.mock` a single spectra file is used. (This could be simulated too.)
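Indexing the compiled 3D array is then a plain contiguous lookup (wavelength axis shrunk for the demo; the shape convention matches the compiler's `(metallicity, log_age, wavelength)`):

```python
import numpy as np

# Shape convention: (metallicity, log_age, wavelength) = (13, 51, 100000)
all_spectra = np.zeros((13, 51, 1_000))   # reduced wavelength axis for the demo
all_spectra[1, 2, 999] = 42.0

# A single contiguous-array lookup replaces a MultiIndex DataFrame query
lum = all_spectra[1][2][999]
```

Because the array is one contiguous float64 block, slicing along any axis is a view rather than a copy, which is where the memory savings come from.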

* Documentation update + SFH input

The documentation now shows the size and shape of each input and output variable. 

The `_over_time` and `_at_time` for `CSPSpectra` and `CSPEventRate` are now able to take the `SFH` object as an input.

* Removed CSP from utils due to cyclical import

The CSP class in `utils` was importing the SFH class, which in turn imported the `utils` module, causing the import to fail. The CSP class has therefore been moved to a separate file.

* Updated setup.cfg to include csp_test_data subfolder

* Removes leftover print statement + prints loading precompiled failed

* Correct metallicity mid-points

* Added the parametric star formation histories as public functions

* at_time new default sample rate + load functions moved to hoki.load

at_time now uses 1000 as the default sample rate.
The BPASS bins can still be used if a negative number is given.
The `load_spectra` and `load_rates` functions have been moved to `hoki.load` and renamed to `all_spectra` and `all_rates`, respectively.
Tests have been adjusted accordingly.

* Fixed typo

* Removes cyclical import

* Adjusted tests that use SpectraCompiler

* Getting tox to work

* private optimised mass_per_time_bin calculation

Implements 2 new functions:
- A custom numba trapezoidal function
- An optimised mass_per_bin function for non-function-based data
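A sketch of such a numba trapezoidal helper (assuming the standard trapezoidal rule; the fallback decorator only exists so the snippet runs where numba is unavailable):

```python
import numpy as np

try:
    from numba import njit
except ImportError:                 # degrade to plain Python without numba
    def njit(*args, **kwargs):
        if args and callable(args[0]):
            return args[0]
        return lambda f: f

@njit
def _trapz(y, x):
    """Trapezoidal integration, jit-compiled so it can be called from other
    numba functions without dropping into object mode."""
    total = 0.0
    for i in range(len(x) - 1):
        total += 0.5 * (x[i + 1] - x[i]) * (y[i + 1] + y[i])
    return total

x = np.linspace(0.0, 1.0, 101)
area = _trapz(x**2, x)   # close to 1/3
```

Keeping the integrator jitted lets the surrounding mass-per-bin loop stay entirely inside compiled code.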

* Refactored Event Rate calculation from array data

The original event rate calculation took functions as input, which becomes cumbersome when you have data at specific time values. The new calculation takes a 2D matrix with time points on one axis and BPASS metallicities on the other; the values of the matrix describe the SFR at that moment in time.

The calculate_rate_over_time function has been adjusted to use this new function.
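The grid calculation can be sketched as a per-metallicity convolution of the SFH matrix with a delay-time distribution (the names, the DTD convention, and the uniform bin widths are all illustrative assumptions):

```python
import numpy as np

n_Z, n_t = 13, 51                     # BPASS metallicities x time points
rng = np.random.default_rng(1)
sfh_grid = rng.random((n_Z, n_t))     # SFR at each (metallicity, time) point
dtd = rng.random(n_t)                 # events per unit mass, per delay bin
dt = np.full(n_t, 0.1e9)              # time bin widths [yr]

# Event rate per metallicity: mass formed per bin convolved with the DTD,
# truncated back to the original number of time bins
rates = np.array([np.convolve(row * dt, dtd)[:n_t] for row in sfh_grid])
```

Working row-by-row on the 2D grid avoids constructing and evaluating 13 separate interpolating functions.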

* Removed dependency and added more doc information

* BUG: pass on kwarg to spectra compiler.

* Fix SpectraCompiler

* BUG: use np instead of pd in SpectraCompiler.

* BUG: fix index and make test pass.

* BUG: fix another regression bug.

* Fix further regression bug.

* Clean up loading module

* Updates mass per bin calculation for vectorized function input

mass_per_bin now takes either a normal Python callable or a vectorised function as input.
The latter boosts performance significantly.
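The difference can be sketched like this (note that `np.vectorize` is only a convenience wrapper around a Python loop; a genuinely vectorised callable evaluates one numpy expression over the whole array):

```python
import numpy as np

def sfh_scalar(t):                  # plain Python callable, one value at a time
    return 2.0 if t < 5e9 else 0.5

sfh_vec = np.vectorize(sfh_scalar)  # convenient, but still loops in Python

def sfh_true_vec(t):                # genuinely vectorised: one array expression
    return np.where(t < 5e9, 2.0, 0.5)

t = np.linspace(0.0, 1e10, 11)
a = sfh_vec(t)
b = sfh_true_vec(t)
```

Both give identical values; the performance gain comes solely from evaluating the whole time axis in compiled numpy code.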

* Renames CSPEventRate functions and adds grid functions

- CSPEventRate functions are now called:
    - calculate_rate_over_time -> over_time
    - calculate_rate_at_time -> at_time
- New functions:
    - grid_over_time: calculates the rates from a 2D SFH (per BPASS metallicity) from time_points to time bins
    - grid_at_time: calculates the event rate at a specific moment in lookback time from a 2D SFH grid

* np.zeros to np.empty where possible

* Fixed incorrect test of at_time and over_time

* Set default sample rate for at_time

Also includes some docstring updates

* Adds `cache` parameter to over_time_spectrum

* Adds grid functions for the spectra calculation

CSPSpectra now has the following functions: 
Public: 
- at_time: at time with function input
- over_time: over lb time with function input
- grid_at_time: at time with SFH grid input
- grid_over_time: over lb time with SFH grid input
Private: 
- grid_rate_calculator_at_time
- grid_rate_calculator_over_time

The grid_rate_calculator_* functions are numba functions, with the grid_* functions around them as wrappers that handle the input and output.
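The wrapper pattern might look roughly like this (the function bodies and the DTD-sum convention are illustrative, not hoki's actual implementation; the try/except fallback only covers environments without numba):

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    njit = lambda f: f   # plain Python fallback when numba is absent

@njit
def _grid_rate_calculator_at_time(sfh_grid, dtd, t_index):
    # Hot inner loop: pure numerics only, safe to jit-compile
    n_Z = sfh_grid.shape[0]
    out = np.zeros(n_Z)
    for z in range(n_Z):
        for j in range(t_index + 1):
            out[z] += sfh_grid[z, j] * dtd[t_index - j]
    return out

def grid_at_time(sfh_grid, dtd, t_index):
    """Public wrapper: coerces inputs to contiguous float64 arrays,
    then delegates to the jitted core."""
    sfh_grid = np.ascontiguousarray(sfh_grid, dtype=np.float64)
    dtd = np.ascontiguousarray(dtd, dtype=np.float64)
    return _grid_rate_calculator_at_time(sfh_grid, dtd, int(t_index))
```

Splitting validation from computation keeps the jitted function simple enough for numba's nopython mode while the wrapper stays user-friendly.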

* Adds return_time_edges to grid_over_time & bug fix sfh vectorisation

* Emissivity compiling and loading.

* Rename func, add existence check for metallicity files, remove try/except

Try/except has been replaced by an if/else with a file check.
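The pattern (with a hypothetical loader name) is simply:

```python
from os.path import isfile

def load_data(path):
    """Explicit existence check instead of a blanket try/except,
    so the error message names the actual problem."""
    if isfile(path):
        with open(path) as f:
            return f.read()
    raise FileNotFoundError(f"{path} does not exist or its path is incorrect.")
```

Checking `isfile` up front distinguishes a missing file from a genuinely malformed one, which a broad try/except would conflate.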

* Finalise merge of `origin/dev1.6` into `data_loading`

Co-authored-by: Max Briel <max.briel@auckland.ac.nz>
Co-authored-by: heloises <heloisefw@gmail.com>
Co-authored-by: Max Briel <max.briel@gmail.com>
Co-authored-by: Heloise <HeloiseS@users.noreply.github.com>
Co-authored-by: Max <14039563+maxbriel@users.noreply.github.com>
Co-authored-by: Martin Glatzle <mglatzle@mpa-garching.mpg.de>
7 people committed Oct 6, 2020
1 parent 920edef commit 8bbe799
Showing 8 changed files with 278 additions and 95 deletions.
2 changes: 2 additions & 0 deletions hoki/constants.py
@@ -40,6 +40,8 @@
"imf135_100", "imf135_300", "imf135all_100", "imf170_100",
"imf170_300"]

# wavelengths at which spectra are given [angstrom]
BPASS_WAVELENGTHS = np.arange(1, 100001)

# Create a deprecation warning when using dummy_dict
dummy_dict = {'timestep': 0, 'age': 1, 'log(R1)': 2, 'log(T1)': 3, 'log(L1)': 4, 'M1': 5, 'He_core1': 6, 'CO_core1': 7,
1 change: 0 additions & 1 deletion hoki/csp/tests/test_spectra.py
@@ -13,7 +13,6 @@
from hoki.csp.spectra import CSPSpectra
from hoki.load import model_output
from hoki.constants import BPASS_NUM_METALLICITIES
import itertools
data_path = pkg_resources.resource_filename('hoki', 'data')

# Load Test spectra. Not an actual spectra.
200 changes: 119 additions & 81 deletions hoki/data_compilers.py
@@ -1,14 +1,14 @@
"""
Author: Max Briel & Heloise Stevance
Author: Heloise Stevance, Max Briel, & Martin Glatzle
Objects and pipelines that compile BPASS data files into more convenient, more pythonic data types
"""

import abc
import numpy as np

from hoki.constants import (DEFAULT_BPASS_VERSION, MODELS_PATH,
OUTPUTS_PATH, dummy_dicts,
BPASS_IMFS, BPASS_METALLICITIES)
BPASS_IMFS, BPASS_METALLICITIES,
BPASS_WAVELENGTHS, BPASS_TIME_BINS)
import hoki.load
import pandas as pd
import re
@@ -17,6 +17,120 @@
from hoki.utils.exceptions import HokiFatalError, HokiUserWarning, HokiFormatError, HokiKeyError
from hoki.utils.hoki_object import HokiObject

class _CompilerBase(abc.ABC):
def __init__(self, input_folder, output_folder, imf, binary=True,
verbose=False):
if verbose:
_print_welcome()

# Check population type
star = "bin" if binary else "sin"

# check IMF key
if imf not in BPASS_IMFS:
raise HokiKeyError(
f"{imf} is not a BPASS IMF. Please select a correct IMF.")

# Setup the numpy output
output = np.empty(self._shape(), dtype=np.float64)

# loop over all the metallicities and load all the spectra
for num, metallicity in enumerate(BPASS_METALLICITIES):
print_progress_bar(num, 12)
# Check if the data file for this metallicity exists
path = f"{input_folder}/{self._input_name()}-{star}-{imf}.{metallicity}.dat"
assert isfile(path),\
"HOKI ERROR: This file does not exist, or its path is incorrect."
output[num] = self._load_single(path)

np.save(f"{output_folder}/{self._output_name()}-{star}-{imf}", output)
# pickle the datafile
self.output = output
print(
f"Compiled data stored in {output_folder} as '{self._output_name()}-{star}-{imf}.npy'")
if verbose:
_print_exit_message()
return

@abc.abstractmethod
def _input_name(self):
return

@abc.abstractmethod
def _output_name(self):
return

@abc.abstractmethod
def _shape(self):
return

@abc.abstractmethod
def _load_single(self, path):
return

class SpectraCompiler(_CompilerBase):
"""
Pipeline to load the BPASS spectra txt files and save them as a 3D
`numpy.ndarray` binary file.
Attributes
----------
output : `numpy.ndarray` (N_Z, N_age, N_lam) [(metallicity, log_age, wavelength)]
A 3D numpy array containing all the BPASS spectra for a specific imf
and binary or single star population.
Usage: output[1][2][999]
(gives L_\\odot for Z=0.0001 and log_age=6.2 at 1000 Angstrom)
"""

def _input_name(self):
return "spectra"

def _output_name(self):
return "all_spectra"

def _shape(self):
return (
len(BPASS_METALLICITIES),
len(BPASS_TIME_BINS),
len(BPASS_WAVELENGTHS)
)

def _load_single(self, path):
return np.loadtxt(path).T[1:, :]


class EmissivityCompiler(_CompilerBase):
"""
Pipeline to load the BPASS ionizing txt files and save them as a 3D
`numpy.ndarray` binary file.
Attributes
----------
output : `numpy.ndarray` (N_Z, N_age, 4) [(metallicity, log_age, band)]
A 3D numpy array containing all the BPASS emissivities (Nion [1/s],
L_Halpha [ergs/s], L_FUV [ergs/s/A], L_NUV [ergs/s/A]) for a specific
imf and binary or single star population.
Usage: output[1][2][0]
(gives Nion for Z=0.0001 and log_age=6.2)
"""
def _input_name(self):
return "ionizing"

def _output_name(self):
return "all_ionizing"

def _shape(self):
return (
len(BPASS_METALLICITIES),
len(BPASS_TIME_BINS),
4
)

def _load_single(self, path):
return np.loadtxt(path)[:, 1:]


class ModelDataCompiler(HokiObject):
"""
Given a list of metalicities, a list of valid BPASS model attributes (in the dummy array), chosen types of model
@@ -202,81 +316,6 @@ def _select_input_files(z_list, directory=OUTPUTS_PATH,

return input_file_list

class SpectraCompiler():
"""
Pipeline to load the BPASS spectra txt files and save them as a 3D
`numpy.ndarray` binary file.
Attributes
----------
spectra : `numpy.ndarray` (13, 51, 100000) [(metallicity, log_age, wavelength)]
A 3D numpy array containing all the BPASS spectra for a specific imf
and binary or single star population.
Usage: spectra[1][2][1000]
(gives L_\\odot for Z=0.0001 and log_age=6.2 at 999 Angstrom)
"""

def __init__(self, spectra_folder, output_folder, imf, binary=True, verbose=False):
"""
Input
-----
spectra_folder : `str`
Path to the folder containing the spectra of the given imf.
output_folder : `str`
Path to the folder, where to output the pickled pandas.DataFrame
imf : `str`
BPASS IMF Identifiers
The accepted IMF identifiers are:
- `"imf_chab100"`
- `"imf_chab300"`
- `"imf100_100"`
- `"imf100_300"`
- `"imf135_100"`
- `"imf135_300"`
- `"imfall_300"`
- `"imf170_100"`
- `"imf170_300"`
binary : `bool`
If `True`, loads the binary files. Otherwise, just loads single stars.
Default=True
verbose : `bool`
If `True` prints out extra information for the user.
Default=False
"""
if verbose:
_print_welcome()

# Check population type
star = "bin" if binary else "sin"

# check IMF key
if imf not in BPASS_IMFS:
raise HokiKeyError(
f"{imf} is not a BPASS IMF. Please select a correct IMF.")

# Setup the numpy output
spectra = np.empty((13, 51, 100000), dtype=np.float64)

# loop over all the metallicities and load all the spectra
for num, metallicity in enumerate(BPASS_METALLICITIES):
print_progress_bar(num, 12)
assert isfile(f"{spectra_folder}/spectra-{star}-{imf}.{metallicity}.dat"),\
"HOKI ERROR: This file does not exist, or its path is incorrect."
spectra[num] = np.loadtxt(f"{spectra_folder}/spectra-{star}-{imf}.{metallicity}.dat").T[1:, :]

# pickle the datafile
np.save(f"{output_folder}/all_spectra-{star}-{imf}", spectra)
self.spectra = spectra
print(
f"Spectra file stored in {output_folder} as 'all_spectra-{star}-{imf}.npy'")
if verbose:
_print_exit_message()



def _print_welcome():
print('*************************************************')
Expand All @@ -285,11 +324,10 @@ def _print_welcome():
print("\n\nThis may take a while ;)\nGo get yourself a cup of tea, sit back and relax\nI'm working for you boo!")

print(
"\nNOTE: The progress bar doesn't move smoothly - it might accelerate or slow down - it's perfectly normal :D")
"\nNOTE: The progress bar doesn't move smoothly - it might accelerate or slow down - it's perfectly normal :D")


def _print_exit_message():

print('\n\n\n*************************************************')
print('******* JOB DONE! HAPPY SCIENCING! ******')
print('*************************************************')
74 changes: 73 additions & 1 deletion hoki/load.py
@@ -571,10 +571,82 @@ def spectra_all_z(data_path, imf, binary=True):
spec = hoki.data_compilers.SpectraCompiler(
data_path, data_path, imf, binary=binary
)
spectra = spec.spectra
spectra = spec.output
return spectra


def emissivities_all_z(data_path, imf, binary=True):
"""
Load all BPASS emissivities from files.
Notes
-----
The first time this function is run on a folder it will generate an npy
file containing all the BPASS emissivities for faster loading in the
future. It stores the file in the same folder with the name:
`all_ionizing-[bin/sin]-[imf].npy`.
The emissivities are just read from file and not normalised.
Input
-----
data_path : `str`
The path to the folder containing the BPASS files.
binary : `bool`
Use the binary files or just the single stars. Default=True
imf : `str`
BPASS Identifier of the IMF to be used.
The accepted IMF identifiers are:
- `"imf_chab100"`
- `"imf_chab300"`
- `"imf100_100"`
- `"imf100_300"`
- `"imf135_100"`
- `"imf135_300"`
- `"imfall_300"`
- `"imf170_100"`
- `"imf170_300"`
Returns
-------
emissivities : `numpy.ndarray` (N_Z, N_age, 4) [(metallicity, log_age, band)]
A 3D numpy array containing all the BPASS emissivities (Nion [1/s],
L_Halpha [ergs/s], L_FUV [ergs/s/A], L_NUV [ergs/s/A]) for a specific
imf and binary or single star population.
Usage: emissivities[1][2][0]
(gives Nion for Z=0.0001 and log_age=6.2)
"""
# Check population type
star = "bin" if binary else "sin"

# check IMF key
if imf not in BPASS_IMFS:
raise HokiKeyError(
f"{imf} is not a BPASS IMF. Please select a correct IMF.")

# check if data_path is a string
if not isinstance(data_path, str):
raise HokiTypeError("The folder location is expected to be a string.")

# Check if compiled spectra are already present in data folder
if os.path.isfile(f"{data_path}/all_ionizing-{star}-{imf}.npy"):
print("Load precompiled file.")
emissivities = np.load(f"{data_path}/all_ionizing-{star}-{imf}.npy")
print("Done Loading.")

# Compile the spectra for faster reading next time otherwise
else:
print("Compiled file not found. Data will be compiled.")
res = hoki.data_compilers.EmissivityCompiler(
data_path, data_path, imf, binary=binary
)
emissivities = res.output
return emissivities


#################
# #
#################
Binary file added hoki/tests/corner_plot.png
