Emissivity compiling and loading. (#62)
* Created CSP architecture and basic SFH class
* Added variables to constants
* Implements utility functions & new test files
* Implements event rate calculation over time
* BPASS data_files for testing file_loading

* Adds `at_time` function to the event rate class

This method allows the event rate to be calculated at a given lookback time, using the BPASS binning. It is much faster, but suffers from a lack of accuracy.
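A minimal sketch of the nearest-bin idea described above (the log-spaced BPASS time bins are real BPASS conventions, but `rate_at_time` and the toy rates are illustrative assumptions, not hoki's API):

```python
import numpy as np

# BPASS log-age bins: log10(age/yr) from 6.0 to 11.0 in 0.1 dex steps (51 bins)
BPASS_TIME_BINS = np.arange(6.0, 11.1, 0.1)

def rate_at_time(rates, lookback_time):
    """Look up a precomputed per-bin rate at the BPASS bin closest to the
    requested lookback time (in years) - fast, but only bin-accurate."""
    idx = np.argmin(np.abs(10**BPASS_TIME_BINS - lookback_time))
    return rates[idx]

rates = np.linspace(1.0, 0.0, len(BPASS_TIME_BINS))  # toy per-bin rates
value = rate_at_time(rates, 1e7)  # picks the bin nearest to 10 Myr
```

The speed comes from replacing an integral with a single array lookup; the accuracy loss is exactly the width of the chosen bin.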

* Adds docstrings and extra tests

* Implements calculate spectrum at_time and over_time functions

Both functions are implemented for the BPASS spectra.
However, the method is similar to the BPASS event rate calculation, which is suboptimal for the spectra calculation: it takes an extremely long time, because a for loop of 100000 iterations treats each wavelength as its own "event type". It should be possible to remove this loop by handling the separate wavelengths as a single unit. It would be interesting to see the influence of putting the for loop inside a numba function, but restructuring the function is probably the better option.

This also includes a "caching" of the BPASS spectra into a pickled DataFrame.



Additional test files are required for this to run.
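A toy sketch of the restructuring suggested above: the per-wavelength loop collapses into a single matrix-vector product (array sizes reduced for the demo; all names are illustrative, not hoki's):

```python
import numpy as np

rng = np.random.default_rng(0)
n_bins, n_wl = 51, 1_000               # the real spectra have 100000 wavelengths
spectra = rng.random((n_bins, n_wl))   # luminosity per time bin and wavelength
mass_per_bin = rng.random(n_bins)      # solar masses formed in each time bin

# Loop version: treats every wavelength as its own "event type"
def spectrum_loop():
    out = np.zeros(n_wl)
    for i in range(n_wl):
        out[i] = np.sum(mass_per_bin * spectra[:, i])
    return out

# Vectorised version: all wavelengths handled as a single unit
def spectrum_vectorised():
    return mass_per_bin @ spectra
```

Both return the same spectrum, but the matrix product delegates the whole loop to optimised BLAS code.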

* Add normalisation function for the BPASS spectrum

* fixed hrd plot

* adding csp folder

* minor changes

* Moved tests to CSP folder + fix in function name _type_check

* added a hokitype error

* Added some code review for Max and minor stylistic changes

* Added some code review for Max and minor stylistic changes

* Updated the tests

* bla

* Adjust tests and docstrings

* Adjust eventrate functions to take lists of functions

Instead of requiring a scipy interpolated spline as input, these functions now take a list of functions as input for the star formation and metallicity history.

This adjustment has been made by using `numpy.interp` instead of scipy.interpolate. Furthermore, instead of scipy's spline integration, `numpy.trapz` is now used to calculate the mass per bin.
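The mass-per-bin idea can be sketched as follows (the function name, signature, and sample rate are illustrative assumptions, not hoki's exact API):

```python
import numpy as np

# np.trapz was renamed to np.trapezoid in NumPy 2.0
trapz = np.trapz if hasattr(np, "trapz") else np.trapezoid

def mass_per_bin(sfh_fn, time_edges, sample_rate=25):
    """Integrate a star-formation-history callable over each time bin with
    the trapezoidal rule (replacing scipy's spline integration)."""
    masses = np.zeros(len(time_edges) - 1)
    for i, (t0, t1) in enumerate(zip(time_edges[:-1], time_edges[1:])):
        t = np.linspace(t0, t1, sample_rate)   # sample the SFH inside the bin
        masses[i] = trapz(sfh_fn(t), t)
    return masses

edges = np.array([0.0, 1.0, 2.0, 3.0])
# constant SFR of 2 M_sun/yr: each bin of width 1 yr holds ~2 M_sun
m = mass_per_bin(lambda t: np.full_like(t, 2.0), edges)
```

Accepting a plain callable keeps the interface simple, while trapezoidal integration avoids building a spline object at all.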

* Disabled spectra test

* Added parametric star formation histories to SFH and refactored some variable names

* fixing CI

* Auto stash before merge of "csp" and "official_hoki/csp"

* upload files for remote work

* Improve testing and functions

- Adds BPASS IMFS to the constants
- Adds check to BPASS IMFS
- Adds test for `data_compiler.SpectraData`
- Reduces data usage for rate tests

* Add SpectraCompiler test

Uses `unittest.mock` to allow for a smaller DataFrame to be used.
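The mocking approach might look roughly like this (the loader and file name below are stand-ins for illustration, not hoki's actual code):

```python
import unittest.mock as mock
import numpy as np

# Hypothetical loader that normally reads a huge BPASS spectra file
def load_spectra(path):
    return np.loadtxt(path)   # real file: 51 rows x 100001 columns

small = np.ones((51, 11))     # tiny stand-in for the full array

# Patch numpy.loadtxt so the test never touches the real (large) data file
with mock.patch("numpy.loadtxt", return_value=small) as m:
    spectra = load_spectra("spectra-bin-imf135_300.z020.dat")
```

The test then asserts on the small array's shape and on the mock's call record instead of shipping gigabytes of test data.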

* Adds extensive tests for the CSP package

* Update SpectraCompiler

The compiler now outputs the BPASS spectra as a 3D `numpy.ndarray`. This minimises storage and allows for faster, less memory-hungry operations than the MultiIndex pandas DataFrame.

The tests are updated accordingly, and using `unittest.mock` a single spectra file is used. (This could be simulated too.)
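Indexing the compiled 3D array is then a plain contiguous lookup (wavelength axis shrunk for the demo; the shape convention matches the compiler's `(metallicity, log_age, wavelength)`):

```python
import numpy as np

# Shape convention: (metallicity, log_age, wavelength) = (13, 51, 100000)
all_spectra = np.zeros((13, 51, 1_000))   # reduced wavelength axis for the demo
all_spectra[1, 2, 999] = 42.0

# A single contiguous-array lookup replaces a MultiIndex DataFrame query
lum = all_spectra[1][2][999]
```

Because the array is one contiguous float64 block, slicing along any axis is a view rather than a copy, which is where the memory savings come from.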

* Documentation update + SFH input

The documentation now shows the size and shape of each input and output variable. 

The `_over_time` and `_at_time` for `CSPSpectra` and `CSPEventRate` are now able to take the `SFH` object as an input.

* Removed CSP from utils due to cyclical import

The CSP class in `utils` was importing the SFH class, which in turn imported the `utils` module, causing the import to fail. The CSP class has therefore been moved to a separate file.

* Updated setup.cfg to include csp_test_data subfolder

* Removes leftover print statement + prints loading precompiled failed

* Correct metallicity mid-points

* Added the parametric star formation histories as public functions

* at_time new default sample rate + load functions moved to hoki.load

at_time now uses 1000 as the default sample rate.
The BPASS bins can still be used if a negative number is given.
The `load_spectra` and `load_rates` functions have been moved to `hoki.load` and renamed to `all_spectra` and `all_rates`, respectively.
Tests have been adjusted accordingly.

* Fixed typo

* Removes cyclical import

* Adjusted tests that use SpectraCompiler

* Getting tox to work

* private optimised mass_per_time_bin calculation

Implements 2 new functions:
- A custom numba trapezoidal function
- An optimised mass_per_bin function for non-function-based data
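A sketch of such a numba trapezoidal helper (assuming the standard trapezoidal rule; the fallback decorator only exists so the snippet runs where numba is unavailable):

```python
import numpy as np

try:
    from numba import njit
except ImportError:                 # degrade to plain Python without numba
    def njit(*args, **kwargs):
        if args and callable(args[0]):
            return args[0]
        return lambda f: f

@njit
def _trapz(y, x):
    """Trapezoidal integration, jit-compiled so it can be called from other
    numba functions without dropping into object mode."""
    total = 0.0
    for i in range(len(x) - 1):
        total += 0.5 * (x[i + 1] - x[i]) * (y[i + 1] + y[i])
    return total

x = np.linspace(0.0, 1.0, 101)
area = _trapz(x**2, x)   # close to 1/3
```

Keeping the integrator jitted lets the surrounding mass-per-bin loop stay entirely inside compiled code.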

* Refactored Event Rate calculation from array data

The original event rate calculation took functions as input, which becomes cumbersome when you have data at specific time values. The new calculation takes a 2D matrix with time points on one axis and BPASS metallicities on the other; the values of the matrix describe the SFR at that moment in time.

The calculate_rate_over_time function has been adjusted to use this new function.
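The grid calculation can be sketched as a per-metallicity convolution of the SFH matrix with a delay-time distribution (the names, the DTD convention, and the uniform bin widths are all illustrative assumptions):

```python
import numpy as np

n_Z, n_t = 13, 51                     # BPASS metallicities x time points
rng = np.random.default_rng(1)
sfh_grid = rng.random((n_Z, n_t))     # SFR at each (metallicity, time) point
dtd = rng.random(n_t)                 # events per unit mass, per delay bin
dt = np.full(n_t, 0.1e9)              # time bin widths [yr]

# Event rate per metallicity: mass formed per bin convolved with the DTD,
# truncated back to the original number of time bins
rates = np.array([np.convolve(row * dt, dtd)[:n_t] for row in sfh_grid])
```

Working row-by-row on the 2D grid avoids constructing and evaluating 13 separate interpolating functions.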

* Removed dependency and added more doc information

* BUG: pass on kwarg to spectra compiler.

* Fix SpectraCompiler

* BUG: use np instead of pd in SpectraCompiler.

* BUG: fix index and make test pass.

* BUG: fix another regression bug.

* Fix further regression bug.

* Clean up loading module

* Updates mass per bin calculation for vectorized function input

mass_per_bin now takes either a normal Python callable or a vectorised function as input.
The latter boosts performance significantly.
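The difference can be sketched like this (note that `np.vectorize` is only a convenience wrapper around a Python loop; a genuinely vectorised callable evaluates one numpy expression over the whole array):

```python
import numpy as np

def sfh_scalar(t):                  # plain Python callable, one value at a time
    return 2.0 if t < 5e9 else 0.5

sfh_vec = np.vectorize(sfh_scalar)  # convenient, but still loops in Python

def sfh_true_vec(t):                # genuinely vectorised: one array expression
    return np.where(t < 5e9, 2.0, 0.5)

t = np.linspace(0.0, 1e10, 11)
a = sfh_vec(t)
b = sfh_true_vec(t)
```

Both give identical values; the performance gain comes solely from evaluating the whole time axis in compiled numpy code.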

* Renames CSPEventRate functions and adds grid functions

- CSPEventRate functions are now called:
    - calculate_rate_over_time -> over_time
    - calculate_rate_at_time -> at_time
- New functions:
    - grid_over_time: calculates the rates from a 2D SFH (per BPASS metallicity) from time_points to time bins
    - grid_at_time: calculates the event rate at a specific moment in lookback time from a 2D SFH grid

* np.zeros to np.empty where possible

* Fixed incorrect test of at_time and over_time

* Set default sample rate for at_time

Also includes some docstring updates

* Adds `cache` parameter to over_time_spectrum

* Adds grid functions for the spectra calculation

CSPSpectra now has the following functions: 
Public: 
- at_time: at time with function input
- over_time: over lb time with function input
- grid_at_time: at time with SFH grid input
- grid_over_time: over lb time with SFH grid input
Private: 
- grid_rate_calculator_at_time
- grid_rate_calculator_over_time

The grid_rate_calculator_* functions are numba functions, with the grid_* functions around them as wrappers that handle the input and output.
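The wrapper pattern might look roughly like this (the function bodies and the DTD-sum convention are illustrative, not hoki's actual implementation; the try/except fallback only covers environments without numba):

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    njit = lambda f: f   # plain Python fallback when numba is absent

@njit
def _grid_rate_calculator_at_time(sfh_grid, dtd, t_index):
    # Hot inner loop: pure numerics only, safe to jit-compile
    n_Z = sfh_grid.shape[0]
    out = np.zeros(n_Z)
    for z in range(n_Z):
        for j in range(t_index + 1):
            out[z] += sfh_grid[z, j] * dtd[t_index - j]
    return out

def grid_at_time(sfh_grid, dtd, t_index):
    """Public wrapper: coerces inputs to contiguous float64 arrays,
    then delegates to the jitted core."""
    sfh_grid = np.ascontiguousarray(sfh_grid, dtype=np.float64)
    dtd = np.ascontiguousarray(dtd, dtype=np.float64)
    return _grid_rate_calculator_at_time(sfh_grid, dtd, int(t_index))
```

Splitting validation from computation keeps the jitted function simple enough for numba's nopython mode while the wrapper stays user-friendly.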

* Adds return_time_edges to grid_over_time & bug fix sfh vectorisation

* Emissivity compiling and loading.

* Rename func, add existence check for metallicity files, remove try/except

Try/except has been replaced by an if/else with a file check.
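The pattern (with a hypothetical loader name) is simply:

```python
from os.path import isfile

def load_data(path):
    """Explicit existence check instead of a blanket try/except,
    so the error message names the actual problem."""
    if isfile(path):
        with open(path) as f:
            return f.read()
    raise FileNotFoundError(f"{path} does not exist or its path is incorrect.")
```

Checking `isfile` up front distinguishes a missing file from a genuinely malformed one, which a broad try/except would conflate.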

* Finalise merge of `origin/dev1.6` into `data_loading`

Co-authored-by: Max Briel <max.briel@auckland.ac.nz>
Co-authored-by: heloises <heloisefw@gmail.com>
Co-authored-by: Max Briel <max.briel@gmail.com>
Co-authored-by: Heloise <HeloiseS@users.noreply.github.com>
Co-authored-by: Max <14039563+maxbriel@users.noreply.github.com>
Co-authored-by: Martin Glatzle <mglatzle@mpa-garching.mpg.de>
7 people committed Oct 6, 2020
1 parent 920edef commit 8bbe799
Showing 8 changed files with 278 additions and 95 deletions.
2 changes: 2 additions & 0 deletions hoki/constants.py
@@ -40,6 +40,8 @@
"imf135_100", "imf135_300", "imf135all_100", "imf170_100",
"imf170_300"]

# wavelengths at which spectra are given [angstrom]
BPASS_WAVELENGTHS = np.arange(1, 100001)

# Create a deprecation warning when using dummy_dict
dummy_dict = {'timestep': 0, 'age': 1, 'log(R1)': 2, 'log(T1)': 3, 'log(L1)': 4, 'M1': 5, 'He_core1': 6, 'CO_core1': 7,
1 change: 0 additions & 1 deletion hoki/csp/tests/test_spectra.py
@@ -13,7 +13,6 @@
from hoki.csp.spectra import CSPSpectra
from hoki.load import model_output
from hoki.constants import BPASS_NUM_METALLICITIES
import itertools
data_path = pkg_resources.resource_filename('hoki', 'data')

# Load Test spectra. Not an actual spectra.
200 changes: 119 additions & 81 deletions hoki/data_compilers.py
@@ -1,14 +1,14 @@
"""
Author: Max Briel & Heloise Stevance
Author: Heloise Stevance, Max Briel, & Martin Glatzle
Objects and pipelines that compile BPASS data files into more convenient, more pythonic data types
"""

import abc
import numpy as np

from hoki.constants import (DEFAULT_BPASS_VERSION, MODELS_PATH,
OUTPUTS_PATH, dummy_dicts,
BPASS_IMFS, BPASS_METALLICITIES)
BPASS_IMFS, BPASS_METALLICITIES,
BPASS_WAVELENGTHS, BPASS_TIME_BINS)
import hoki.load
import pandas as pd
import re
@@ -17,6 +17,120 @@
from hoki.utils.exceptions import HokiFatalError, HokiUserWarning, HokiFormatError, HokiKeyError
from hoki.utils.hoki_object import HokiObject

class _CompilerBase(abc.ABC):
def __init__(self, input_folder, output_folder, imf, binary=True,
verbose=False):
if verbose:
_print_welcome()

# Check population type
star = "bin" if binary else "sin"

# check IMF key
if imf not in BPASS_IMFS:
raise HokiKeyError(
f"{imf} is not a BPASS IMF. Please select a correct IMF.")

# Setup the numpy output
output = np.empty(self._shape(), dtype=np.float64)

# loop over all the metallicities and load all the spectra
for num, metallicity in enumerate(BPASS_METALLICITIES):
print_progress_bar(num, 12)
# Check if the data file for this metallicity exists
path = f"{input_folder}/{self._input_name()}-{star}-{imf}.{metallicity}.dat"
assert isfile(path),\
"HOKI ERROR: This file does not exist, or its path is incorrect."
output[num] = self._load_single(path)

np.save(f"{output_folder}/{self._output_name()}-{star}-{imf}", output)
# pickle the datafile
self.output = output
print(
f"Compiled data stored in {output_folder} as '{self._output_name()}-{star}-{imf}.npy'")
if verbose:
_print_exit_message()
return

@abc.abstractmethod
def _input_name(self):
return

@abc.abstractmethod
def _output_name(self):
return

@abc.abstractmethod
def _shape(self):
return

@abc.abstractmethod
def _load_single(self, path):
return

class SpectraCompiler(_CompilerBase):
"""
Pipeline to load the BPASS spectra txt files and save them as a 3D
`numpy.ndarray` binary file.
Attributes
----------
output : `numpy.ndarray` (N_Z, N_age, N_lam) [(metallicity, log_age, wavelength)]
A 3D numpy array containing all the BPASS spectra for a specific imf
and binary or single star population.
Usage: output[1][2][999]
(gives L_\\odot for Z=0.0001 and log_age=6.2 at 1000 Angstrom)
"""

def _input_name(self):
return "spectra"

def _output_name(self):
return "all_spectra"

def _shape(self):
return (
len(BPASS_METALLICITIES),
len(BPASS_TIME_BINS),
len(BPASS_WAVELENGTHS)
)

def _load_single(self, path):
return np.loadtxt(path).T[1:, :]


class EmissivityCompiler(_CompilerBase):
"""
Pipeline to load the BPASS ionizing txt files and save them as a 3D
`numpy.ndarray` binary file.
Attributes
----------
output : `numpy.ndarray` (N_Z, N_age, 4) [(metallicity, log_age, band)]
A 3D numpy array containing all the BPASS emissivities (Nion [1/s],
L_Halpha [ergs/s], L_FUV [ergs/s/A], L_NUV [ergs/s/A]) for a specific
imf and binary or single star population.
Usage: output[1][2][0]
(gives Nion for Z=0.0001 and log_age=6.2)
"""
def _input_name(self):
return "ionizing"

def _output_name(self):
return "all_ionizing"

def _shape(self):
return (
len(BPASS_METALLICITIES),
len(BPASS_TIME_BINS),
4
)

def _load_single(self, path):
return np.loadtxt(path)[:, 1:]


class ModelDataCompiler(HokiObject):
"""
Given a list of metalicities, a list of valid BPASS model attributes (in the dummy array), chosen types of model
@@ -202,81 +316,6 @@ def _select_input_files(z_list, directory=OUTPUTS_PATH,

return input_file_list

class SpectraCompiler():
"""
Pipeline to load the BPASS spectra txt files and save them as a 3D
`numpy.ndarray` binary file.
Attributes
----------
spectra : `numpy.ndarray` (13, 51, 100000) [(metallicity, log_age, wavelength)]
A 3D numpy array containing all the BPASS spectra for a specific imf
and binary or single star population.
Usage: spectra[1][2][1000]
(gives L_\\odot for Z=0.0001 and log_age=6.2 at 999 Angstrom)
"""

def __init__(self, spectra_folder, output_folder, imf, binary=True, verbose=False):
"""
Input
-----
spectra_folder : `str`
Path to the folder containing the spectra of the given imf.
output_folder : `str`
Path to the folder, where to output the pickled pandas.DataFrame
imf : `str`
BPASS IMF Identifiers
The accepted IMF identifiers are:
- `"imf_chab100"`
- `"imf_chab300"`
- `"imf100_100"`
- `"imf100_300"`
- `"imf135_100"`
- `"imf135_300"`
- `"imfall_300"`
- `"imf170_100"`
- `"imf170_300"`
binary : `bool`
If `True`, loads the binary files. Otherwise, just loads single stars.
Default=True
verbose : `bool`
If `True` prints out extra information for the user.
Default=False
"""
if verbose:
_print_welcome()

# Check population type
star = "bin" if binary else "sin"

# check IMF key
if imf not in BPASS_IMFS:
raise HokiKeyError(
f"{imf} is not a BPASS IMF. Please select a correct IMF.")

# Setup the numpy output
spectra = np.empty((13, 51, 100000), dtype=np.float64)

# loop over all the metallicities and load all the spectra
for num, metallicity in enumerate(BPASS_METALLICITIES):
print_progress_bar(num, 12)
assert isfile(f"{spectra_folder}/spectra-{star}-{imf}.{metallicity}.dat"),\
"HOKI ERROR: This file does not exist, or its path is incorrect."
spectra[num] = np.loadtxt(f"{spectra_folder}/spectra-{star}-{imf}.{metallicity}.dat").T[1:, :]

# pickle the datafile
np.save(f"{output_folder}/all_spectra-{star}-{imf}", spectra)
self.spectra = spectra
print(
f"Spectra file stored in {output_folder} as 'all_spectra-{star}-{imf}.npy'")
if verbose:
_print_exit_message()



def _print_welcome():
print('*************************************************')
Expand All @@ -285,11 +324,10 @@ def _print_welcome():
print("\n\nThis may take a while ;)\nGo get yourself a cup of tea, sit back and relax\nI'm working for you boo!")

print(
"\nNOTE: The progress bar doesn't move smoothly - it might accelerate or slow down - it's perfectly normal :D")
"\nNOTE: The progress bar doesn't move smoothly - it might accelerate or slow down - it's perfectly normal :D")


def _print_exit_message():

print('\n\n\n*************************************************')
print('******* JOB DONE! HAPPY SCIENCING! ******')
print('*************************************************')
74 changes: 73 additions & 1 deletion hoki/load.py
@@ -571,10 +571,82 @@ def spectra_all_z(data_path, imf, binary=True):
spec = hoki.data_compilers.SpectraCompiler(
data_path, data_path, imf, binary=binary
)
spectra = spec.spectra
spectra = spec.output
return spectra


def emissivities_all_z(data_path, imf, binary=True):
"""
Load all BPASS emissivities from files.
Notes
-----
The first time this function is run on a folder it will generate an npy
file containing all the BPASS emissivities for faster loading in the
future. It stores the file in the same folder with the name:
`all_ionizing-[bin/sin]-[imf].npy`.
The emissivities are just read from file and not normalised.
Input
-----
data_path : `str`
The path to the folder containing the BPASS files.
binary : `bool`
Use the binary files or just the single stars. Default=True
imf : `str`
BPASS Identifier of the IMF to be used.
The accepted IMF identifiers are:
- `"imf_chab100"`
- `"imf_chab300"`
- `"imf100_100"`
- `"imf100_300"`
- `"imf135_100"`
- `"imf135_300"`
- `"imfall_300"`
- `"imf170_100"`
- `"imf170_300"`
Returns
-------
emissivities : `numpy.ndarray` (N_Z, N_age, 4) [(metallicity, log_age, band)]
A 3D numpy array containing all the BPASS emissivities (Nion [1/s],
L_Halpha [ergs/s], L_FUV [ergs/s/A], L_NUV [ergs/s/A]) for a specific
imf and binary or single star population.
Usage: emissivities[1][2][0]
(gives Nion for Z=0.0001 and log_age=6.2)
"""
# Check population type
star = "bin" if binary else "sin"

# check IMF key
if imf not in BPASS_IMFS:
raise HokiKeyError(
f"{imf} is not a BPASS IMF. Please select a correct IMF.")

# check if data_path is a string
if not isinstance(data_path, str):
raise HokiTypeError("The folder location is expected to be a string.")

# Check if compiled spectra are already present in data folder
if os.path.isfile(f"{data_path}/all_ionizing-{star}-{imf}.npy"):
print("Load precompiled file.")
emissivities = np.load(f"{data_path}/all_ionizing-{star}-{imf}.npy")
print("Done Loading.")

# Compile the spectra for faster reading next time otherwise
else:
print("Compiled file not found. Data will be compiled.")
res = hoki.data_compilers.EmissivityCompiler(
data_path, data_path, imf, binary=binary
)
emissivities = res.output
return emissivities


#################
# #
#################
Binary file added hoki/tests/corner_plot.png
