Skip to content

afarahi/kllr

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

GitHub PyPI PyPI - Python Version ascl:2008.003

Introduction

Linear regression of the simple least-squares variety has been a canonical method used to characterize the relation between two variables, but its utility is limited by the fact that it reduces full population statistics down to three numbers: a slope, normalization and variance/standard deviation. With large empirical or simulated samples we can perform a more sensitive analysis using a localized linear regression method (see, Farahi et al. 2018 and Anbajagane et al. 2020). The KLLR method generates estimates of conditional statistics in terms of the local the slope, normalization, and covariance. Such a method provides a more nuanced description of population statistics appropriate for the very large samples with non-linear trends.

This code is an implementation of the Kernel Localized Linear Regression (KLLR) method that performs a localized Linear regression described in Farahi et al. (2022). It employs bootstrap re-sampling technique to estimate the uncertainties. We also provide a set of visualization tools so practitioners can seamlessly generate visualization of the model parameters.

If you use KLLR or derivates based on it, please cite the following papers which introduced the tool:

KLLR: A scale-dependent, multivariate model class for regression analysis

Localized massive halo properties in BAHAMAS and MACSIS simulations: scalings, lognormality, and covariance.

A list of other publications that employed KLLR in their data analysis.

Wu et al., Optical selection bias and projection effects in stacked galaxy cluster weak lensing, MNRAS (2022).

W. K. Black, A. E. Evrard, Red Dragon: a redshift-evolving Gaussian mixture model for galaxies, MNRAS (2022).

D. Anbajagane, A. Evrard, A. Farahi, Baryonic Imprints on DM Halos: Population Statistics from Dwarf Galaxies to Galaxy Clusters, MNRAS (2022).

D. Anbajagane et al., Galaxy Velocity Bias in Cosmological Simulations: Towards Percent-level Calibration, MNRAS (2022).

A. Nachmann, W. K. Black, Intra-cluster Summed Galaxy Colors, arXiv preprint (2021).

D. Anbajagane et al., Stellar Property Statistics of Massive Halos from Cosmological Hydrodynamics Simulations: Common Kernel Shapes, MNRAS (2020).

Dependencies

numpy, scipy, matplotlib, pandas, sklearn, tqdm

References

[1]. A. Farahi, et al. "KLLR: A scale-dependent, multivariate model class for regression analysis." (2022, arXiv: 2202.09903)

[2]. A. Farahi, et al. "Localized massive halo properties in BAHAMAS and MACSIS simulations: scalings, lognormality, and covariance." Monthly Notices of the Royal Astronomical Society 478.2 (2018): 2618-2632.

[3]. D. Anbajagane, et al. Stellar Property Statistics of Massive Halos from Cosmological Hydrodynamics Simulations: Common Kernel Shapes. No. arXiv: 2001.02283. 2020.

Acknowledgment

A.F. is supported by the University of Texas at Austin. D.A. is supported by the National Science Foundation Graduate Research Fellowship under Grant No. DGE 1746045.

Installation

Run the following to install:

pip install kllr

Quickstart

To start using KLLR, simply use from KLLR import kllr_model to access the primary functions and class. The exact requirements for the inputs are listed in the docstring of the kllr_model() class further below. An example for using KLLR looks like this:

    from kllr import kllr_model                                       
                                                                      
    lm = kllr_model(kernel_type = 'gaussian', kernel_width = 0.2)     
    xrange, yrange_mean, intercept, slope, scatter, skew, kurt = lm.fit(x, y, bins=11)                                   

Illustration

Illustration of the KLLR fit with varying kernel size.