Expand Modelling to Include Foregrounds #15

Merged 69 commits on Apr 19, 2024
Commits
8d9935b
Add foreground model from Bevins et al. 2022
ThomasGesseyJones Feb 10, 2024
836d54f
Add simulator for the foreground model
ThomasGesseyJones Feb 10, 2024
85b3e72
Add foreground parameter to configuration file
ThomasGesseyJones Feb 13, 2024
4ae9b05
Add foreground prior sampler generator code
ThomasGesseyJones Feb 13, 2024
b291cea
Add the foreground model to simulators
ThomasGesseyJones Feb 13, 2024
811449e
Extend additive combiner to support an arbitrary number of simulators
ThomasGesseyJones Feb 13, 2024
4c2feca
Update meta data
ThomasGesseyJones Feb 13, 2024
7094e0d
Correct wording in train evidence network
ThomasGesseyJones Feb 13, 2024
d0f2dda
Correct wording in visualize_forecast.py
ThomasGesseyJones Feb 13, 2024
f8152bb
Add option for preprocessing simulated data in Evidence Network
ThomasGesseyJones Feb 13, 2024
e550977
Add log preprocessing to data now we have foregrounds and a large dyn…
ThomasGesseyJones Feb 13, 2024
8291f86
Add the foreground model to Polychord verification
ThomasGesseyJones Feb 13, 2024
f59f4ab
Reduce verification data sets per model to 5 for testing
ThomasGesseyJones Feb 13, 2024
4179b64
Use SARAS 3 foreground posteriors as our priors (std increased by fac…
ThomasGesseyJones Feb 13, 2024
ca37c9b
Fix incorrect casting to list
ThomasGesseyJones Feb 14, 2024
62de3f4
Decrease std enhancment factor to 100
ThomasGesseyJones Feb 17, 2024
9110ead
Typo
ThomasGesseyJones Feb 19, 2024
861a6fb
Change to a physically motivated foreground model
ThomasGesseyJones Feb 22, 2024
3d609f8
Correct wrong ndims being used
ThomasGesseyJones Feb 22, 2024
6758c14
Typo
ThomasGesseyJones Feb 22, 2024
d43c5e1
Correct bug where prior sampler still sampling over components not be…
ThomasGesseyJones Feb 22, 2024
30d830b
Slicing more intelligently for greater reusability
ThomasGesseyJones Feb 22, 2024
883a764
Add noise fitting for realism
ThomasGesseyJones Feb 22, 2024
6d043e3
Typo in printed output
ThomasGesseyJones Feb 23, 2024
66c8f62
Remove old figures
ThomasGesseyJones Feb 23, 2024
53f532d
Increase verification simulation number
ThomasGesseyJones Feb 23, 2024
caf6d24
Add missing docstring
ThomasGesseyJones Feb 23, 2024
31be166
Add scikit-learn to requirements
ThomasGesseyJones Feb 23, 2024
e0980f2
Preprocessing bugfixes
ThomasGesseyJones Feb 25, 2024
db068ac
Add anesthetic as a requirement
ThomasGesseyJones Feb 25, 2024
66247ad
Return to fixed noise, using high noise run to aid polychord convergence
ThomasGesseyJones Feb 25, 2024
62665dd
Use PCA preprocessing function now using fixed noise
ThomasGesseyJones Feb 25, 2024
a1cf039
Modify training and testing to account for fixed noise
ThomasGesseyJones Feb 25, 2024
fe7fd48
Fix non-assignment issue in MPI broadcasting
ThomasGesseyJones Feb 26, 2024
f4de187
Flush prints
ThomasGesseyJones Feb 26, 2024
d8d0f6b
Correct all batches trying to run full set of verification models
ThomasGesseyJones Feb 26, 2024
9f7764f
Further correction to batching of verification simulations
ThomasGesseyJones Feb 28, 2024
439ce0a
clipnorm to improve stability
ThomasGesseyJones Feb 28, 2024
75fc325
Testing settings for training
ThomasGesseyJones Feb 28, 2024
92c97a9
Change default noise level to 15 mK
ThomasGesseyJones Feb 28, 2024
8725ccd
Update to latest polychord
ThomasGesseyJones Feb 28, 2024
f15eaa4
Remove high noise pre-run stage
ThomasGesseyJones Feb 28, 2024
9be7177
Remove fixed seed
ThomasGesseyJones Feb 28, 2024
0fe9496
Increase to pick up clusters better
ThomasGesseyJones Feb 28, 2024
7c4a6cf
Adjust batch size for new nlive
ThomasGesseyJones Feb 28, 2024
798b09e
Remove nlike tracking as not used
ThomasGesseyJones Feb 29, 2024
c670f27
Full verification set
ThomasGesseyJones Feb 29, 2024
d085b9f
Add options for the whitening transform we use on the data
ThomasGesseyJones Mar 1, 2024
ec0f7db
Merge branch 'foregrounds' of github.com:ThomasGesseyJones/FullyBayes…
ThomasGesseyJones Mar 1, 2024
15bea13
Add new verification data
ThomasGesseyJones Mar 4, 2024
ec18896
Fixes to avoid overflow issues in blind coverage test
ThomasGesseyJones Mar 4, 2024
fb68756
Expand network and training set size as the classifying problem is no…
ThomasGesseyJones Mar 26, 2024
8cb8183
Restore outputs from earlier version (without foregrounds)
ThomasGesseyJones Mar 26, 2024
97b33e1
Rename old outputs to new file naming convention
ThomasGesseyJones Mar 26, 2024
e8909bf
Add old outputs back to repository for comparison
ThomasGesseyJones Mar 26, 2024
55337f7
Remove file
ThomasGesseyJones Mar 26, 2024
fc504c0
Store without foreground outputs in dedicated folders
ThomasGesseyJones Mar 26, 2024
17b227c
Add Polychord verification data for with foregrounds model
ThomasGesseyJones Mar 27, 2024
2448dab
Limit predict batch size to avoid OOM errors
ThomasGesseyJones Apr 8, 2024
f7a10d2
Merge remote-tracking branch 'origin/foregrounds' into foregrounds
ThomasGesseyJones Apr 8, 2024
f38147f
Add timing data from polychord verification
ThomasGesseyJones Apr 11, 2024
db693e1
Add results for analysis with foregrounds
ThomasGesseyJones Apr 12, 2024
138aff8
Updated README.rst
ThomasGesseyJones Apr 12, 2024
98958cc
Add EN Bayes ratio estimates on verification data
ThomasGesseyJones Apr 12, 2024
90e849c
Add network
ThomasGesseyJones Apr 12, 2024
05d902f
Update comments and docstrings
ThomasGesseyJones Apr 12, 2024
fa475ff
Update comments and docstrings
ThomasGesseyJones Apr 12, 2024
684d6c5
Reduce excessive number of blind coverage test samples
ThomasGesseyJones Apr 12, 2024
4290438
Add resubmission draft outputs and network
ThomasGesseyJones Apr 19, 2024
2 changes: 1 addition & 1 deletion LICENSE
@@ -1,6 +1,6 @@
MIT License

-Copyright (c) 2023 Thomas Gessey-Jones
+Copyright (c) 2024 Thomas Gessey-Jones

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
35 changes: 20 additions & 15 deletions README.rst
@@ -7,7 +7,7 @@ Overview

:Name: Fully Bayesian Forecast Example
:Author: Thomas Gessey-Jones
-:Version: 0.1.3
+:Version: 0.2.0
:Homepage: https://github.com/ThomasGesseyJones/FullyBayesianForecastsExample
:Letter: https://ui.adsabs.harvard.edu/abs/2023arXiv230906942G

@@ -31,7 +31,7 @@ reproducible analysis pipeline for the letter.

The overall goal of the code is to produce a fully Bayesian forecast of
the chance of a `REACH <https://ui.adsabs.harvard.edu/abs/2022NatAs...6..984D/abstract>`__-like experiment
-making a significant detection of the 21-cm global signal, given a noise level. It also produces
+making a significant detection of the 21-cm global signal from within foregrounds and noise. It also produces
figures showing how this conclusion changes with different astrophysical parameter values
and validates the forecast through blind coverage
tests and comparison to `PolyChord <https://ui.adsabs.harvard.edu/abs/2015MNRAS.453.4384H/abstract>`__.
@@ -74,17 +74,19 @@ There are three modules included in the repository:
that take a number of data simulations to run and return that number of mock data
simulations alongside the values of any parameters that were used in the
simulations. Submodules of this module define functions to generate specific
-simulators for models with noise only and models with a noisy 21-cm global signal.
+simulators for noise, foregrounds, and the 21-cm signal.

These three modules are used in the three analysis scripts:

- verification_with_polychord.py: This script generates a range of mock data
-sets from both the noise-only model and the noisy-signal model, and then
+sets from both the no-signal model and the with-signal model, and then
performs a Bayesian analysis on each of them.
Evaluating the Bayes ratio between the two models of the data
using Polychord. These results are then stored in the verification_data directory
for later comparison with the results from the evidence network to
-verify its accuracy. It should be run first, ideally in parallel.
+verify its accuracy. It should be run first, ideally with a large number of
+versions in parallel as it is very computationally expensive but
+splits simply into one task per data set.
- train_evidence_network.py: This script builds the evidence network object and
the data simulator functions, then trains the evidence network. Once trained
it stores the evidence network in the models directory, then runs a blind
@@ -106,40 +108,41 @@ scripts can be run from the terminal using the following commands:

.. code:: bash

-   python verification_with_polychord.py
+   python verification_with_polychord.py 0
python train_evidence_network.py
python visualize_forecasts.py

-to run with the default noise level of 79 mK and replicate the
+to run with the default noise level of 15 mK and replicate the
analysis from `Gessey-Jones et al. (2023) <https://ui.adsabs.harvard.edu/abs/2023arXiv230906942G>`__.
Alternatively you can pass
the scripts a command line argument to specify the experiment's noise level in K. For example
to run with a noise level of 100 mK you would run the following commands:

.. code:: bash

-   python verification_with_polychord.py 0.1
+   python verification_with_polychord.py 0 0.1
python train_evidence_network.py 0.1
python visualize_forecasts.py 0.1
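The argument convention described above (a single optional noise level in K, defaulting to 15 mK) can be sketched in a few lines. This is an illustrative helper for scripts like train_evidence_network.py, not the repository's actual argument handling:

```python
import sys


def parse_noise_level(argv, default_k=0.015):
    """Return the experiment noise level in K from an optional trailing CLI
    argument, defaulting to 15 mK as described above.

    Hypothetical helper for illustration; the scripts' real parsing may differ.
    """
    if len(argv) > 1:
        return float(argv[-1])  # e.g. "0.1" means 100 mK
    return default_k


if __name__ == "__main__":
    print(f"Using noise level {parse_noise_level(sys.argv) * 1000:.0f} mK")
```

Note that verification_with_polychord.py additionally takes a leading batch-index argument, so its parsing would need an extra positional argument before the noise level.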

Two other files of interest are:

- fbf_utilities.py: which defines IO functions
-  needed by the three scripts and a utility function to assemble the data
-  simulators for the noise-only and noisy-signal model.
+  needed by the three scripts, utility functions to assemble the data
+  simulators for the noise-only and noisy-signal model, and standard
+  whitening transforms.
- configuration.yaml: which defines several parameters used in the code
including the experimental frequency resolution, the priors on the
-  astrophysical parameters of the global 21-cm signal model, and parameters
-  that control which astrophysical parameters are plotted in the forecast
-  figures. If you change the priors or resolution the entire pipeline
-  needs to be rerun to get accurate results.
+  astrophysical and foreground parameters, and the astrophysical parameters
+  which are plotted in the forecast figures. If you change the priors or
+  resolution the entire pipeline needs to be rerun to get accurate results.

The various figures produced in the analysis are stored in the
figures_and_results directory alongside the timing_data to assess the
performance of the methodology and some summary statistics of the evidence
networks performance. The figures and data generated in the
analysis for `Gessey-Jones et al. (2023) <https://ui.adsabs.harvard.edu/abs/2023arXiv230906942G>`__ are provided in this
-repository for reference.
+repository for reference, alongside the figures generated for an earlier
+version of the letter which did not model foregrounds.

Licence and Citation
--------------------
@@ -194,6 +197,8 @@ To run the code you will need the following additional packages:
- `pypolychord <https://github.com/PolyChord/PolyChordLite>`__
- `scipy <https://pypi.org/project/scipy/>`__
- `mpi4py <https://pypi.org/project/mpi4py/>`__
+- `scikit-learn <https://pypi.org/project/scikit-learn/>`__
+- `anesthetic <https://pypi.org/project/anesthetic/>`__

The code was developed using python 3.8. It has not been tested on other versions
of python. Exact versions of the packages used in our analysis
100 changes: 66 additions & 34 deletions configuration.yaml
@@ -21,49 +21,81 @@ frequency_resolution: 1.0
# parameters of the prior. low and high are used in place of min and
# max to avoid clashing with python keywords.
priors:
-  f_star:
-    type: log_uniform
-    low: emu_min # If given emu_min or emu_max, the value is taken from the
-    high: emu_max # minimum or maximum value of GlobalEmu was trained on.
-  v_c:
-    type: log_uniform
-    low: emu_min
-    high: 30.0
-  f_x:
-    type: log_uniform
-    low: 0.001
-    high: emu_max
-  tau:
-    type: truncated_gaussian
-    mean: 0.054
-    std: 0.007
-    low: emu_min
-    high: emu_max
-  alpha:
-    type: uniform
-    low: emu_min
-    high: emu_max
-  nu_min:
-    type: log_uniform
-    low: emu_min
-    high: emu_max
-  R_mfp:
-    type: uniform
-    low: emu_min
-    high: emu_max
+  global_signal:
+    f_star:
+      type: log_uniform
+      low: emu_min # If given emu_min or emu_max, the value is taken from the
+      high: emu_max # minimum or maximum value of GlobalEmu was trained on.
+    v_c:
+      type: log_uniform
+      low: emu_min
+      high: 30.0
+    f_x:
+      type: log_uniform
+      low: 0.001
+      high: emu_max
+    tau:
+      type: truncated_gaussian
+      mean: 0.054
+      std: 0.007
+      low: emu_min
+      high: emu_max
+    alpha:
+      type: uniform
+      low: emu_min
+      high: emu_max
+    nu_min:
+      type: log_uniform
+      low: emu_min
+      high: emu_max
+    R_mfp:
+      type: uniform
+      low: emu_min
+      high: emu_max
+  foregrounds:
+    d0:
+      type: uniform
+      low: 1500.0 # K
+      high: 2000.0 # K
+    d1:
+      type: uniform
+      low: -1.0
+      high: 1.0
+    d2:
+      type: uniform
+      low: -0.05
+      high: 0.05
+    tau_e:
+      type: uniform
+      low: 0.005
+      high: 0.200
+    t_e:
+      type: uniform
+      low: 200.0 # K
+      high: 2000.0 # K
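The prior specifications above map onto standard distributions. Below is a minimal sketch of a sampler for the three prior types (uniform, log_uniform, truncated_gaussian), assuming emu_min/emu_max placeholders have already been resolved to numbers from GlobalEmu's training range; `sample_prior` is an illustrative helper, not the repository's actual API:

```python
import numpy as np
from scipy import stats


def sample_prior(spec, size, rng):
    """Draw samples for one parameter from a configuration.yaml-style prior
    specification. Illustrative only; the repository's sampler may differ."""
    if spec["type"] == "uniform":
        return rng.uniform(spec["low"], spec["high"], size)
    if spec["type"] == "log_uniform":
        # Uniform in log-space between the bounds
        return np.exp(rng.uniform(np.log(spec["low"]), np.log(spec["high"]), size))
    if spec["type"] == "truncated_gaussian":
        # scipy's truncnorm takes bounds standardized by mean and std
        a = (spec["low"] - spec["mean"]) / spec["std"]
        b = (spec["high"] - spec["mean"]) / spec["std"]
        return stats.truncnorm.rvs(a, b, loc=spec["mean"], scale=spec["std"],
                                   size=size, random_state=rng)
    raise ValueError(f"unknown prior type: {spec['type']!r}")


rng = np.random.default_rng(0)
d0 = sample_prior({"type": "uniform", "low": 1500.0, "high": 2000.0}, 1000, rng)
tau = sample_prior({"type": "truncated_gaussian", "mean": 0.054, "std": 0.007,
                    "low": 0.01, "high": 0.2}, 1000, rng)
```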
#
#
# PREPROCESSING
# =============
# Settings to control the preprocessing of the data before being fed into the
# neural network.
whitening_transform: 'Cholesky' # None, ZCA, PCA, Cholesky, ZCA-cor or PCA-cor
covariance_samples: 100_000 # Number of samples to use when calculating the
# covariance matrix for the whitening transform.
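For reference, Cholesky whitening (one of the options listed above) factorizes the estimated data covariance as sigma = L L^T and applies L^-1 to mean-subtracted data, yielding unit covariance. A minimal numpy sketch under mock data, independent of the repository's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
# Mock "simulated data": correlated 3-dimensional samples
cov_true = np.array([[2.0, 0.8, 0.1],
                     [0.8, 1.0, 0.3],
                     [0.1, 0.3, 0.5]])
data = rng.multivariate_normal(mean=[1.0, -2.0, 0.5], cov=cov_true, size=100_000)

# Estimate mean and covariance from samples, analogous to the
# covariance_samples setting above
mu = data.mean(axis=0)
sigma = np.cov(data, rowvar=False)

# Cholesky whitening: sigma = L @ L.T, so W = inv(L) gives W @ sigma @ W.T = I
L = np.linalg.cholesky(sigma)
W = np.linalg.inv(L)
whitened = (data - mu) @ W.T

# Whitened data now has zero mean and (approximately) identity covariance
```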
#
#
# VERIFICATION
# ============
-# Number of data sets generate from each model to use when verifying the
-# network against PolyChord. Each method is used to evaluate log K and then
-# the results are compared.
+# Number of data sets generated from each model to use when verifying the
+# network against PolyChord. Evaluated in batches of fixed size due to
+# HPC scheduling limitations.
verification_data_sets_per_model: 1000
verification_data_set_batch_size: 5
#
#
# PLOTTING
# ========
-# Parameters that control details of the plots used to visualise the results.
+# Parameters that control details of the plots used to visualize the results.
br_evaluations_for_forecast: 1000000
detection_thresholds: ["2 sigma", "3 sigma", "5 sigma"]
parameters_to_plot: ["f_star", "f_x", "tau"]