
Feature/profiling lh #110

Merged: 8 commits merged into main on Jun 11, 2023
Conversation

Jammy2211 (Owner)

Improve tools for profiling the likelihood function of an analysis class, including making profiling work on interferometer datasets.

All example scripts on the workspace now include the following run-times section (or a much shorter version of it):

"""
__Run Times__

Modeling can be a computationally expensive process. When fitting complex models to high resolution datasets,
run times can be of order hours, days, weeks or even months.

Run times are dictated by two factors:

 - The log likelihood evaluation time: the time it takes for a single `instance` of the model to be fitted to 
   the dataset such that a log likelihood is returned.

 - The number of iterations (e.g. log likelihood evaluations) performed by the non-linear search: more complex lens
   models require more iterations to converge to a solution.

The log likelihood evaluation time can be estimated before a fit using the `profile_log_likelihood_function` method,
which returns two dictionaries containing the run-times and information about the fit.
"""
run_time_dict, info_dict = analysis.profile_log_likelihood_function(
    instance=model.random_instance()
)

"""
The overall log likelihood evaluation time is given by the `fit_time` key.

For this example, it is ~0.01 seconds, which is extremely fast for modeling. More advanced lens
modeling features (e.g. shapelets, multi Gaussian expansions, pixelizations) have slower log likelihood evaluation
times (1-3 seconds), and you should be wary of this when using these features.

Feel free to go ahead and print the full `run_time_dict` and `info_dict` to see the other information they contain. The
former has a break-down of the run-time of every individual function call in the log likelihood function, whereas the 
latter stores information about the data which drives the run-time (e.g. number of image-pixels in the mask, the
shape of the PSF, etc.).
"""
print(f"Log Likelihood Evaluation Time (second) = {run_time_dict['fit_time']}")

"""
To estimate the expected overall run time of the model-fit we multiply the log likelihood evaluation time by an 
estimate of the number of iterations the non-linear search will perform. 

Estimating this quantity is trickier, as it varies depending on the model complexity (e.g. the number of parameters)
and the properties of the dataset and model being fitted.

For this example, we conservatively estimate that the non-linear search will perform ~10000 iterations per free 
parameter in the model. This is an upper limit, with models typically converging in far fewer iterations.

If you perform the fit over multiple CPUs, you can divide the run time by the number of cores to get an estimate of
the time it will take to fit the model. However, above ~6 cores the speed-up from parallelization is less efficient and
does not scale linearly with the number of cores.
"""
print(
    "Estimated Run Time Upper Limit (seconds) = ",
    (run_time_dict["fit_time"] * model.total_free_parameters * 10000)
    / search.number_of_cores,
)
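
"""
As a hypothetical worked example: with a ~0.01 second evaluation time and a model with
10 free parameters, the upper limit is 0.01 x 10 x 10000 = 1000 seconds (~17 minutes)
on one core, or 250 seconds across 4 cores (under the idealized linear speed-up).
"""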

Jammy2211 merged commit 586ab2a into main on Jun 11, 2023
1 of 7 checks passed

start = time.time()

for i in range(repeats):
Collaborator:

`i` is never used:

for _ in range(repeats):
    ...

run_time_dict["fit_time"] = fit_time

fit = self.fit_func(instance=instance, run_time_dict=run_time_dict)
fit.figure_of_merit
Collaborator:

Does this do anything?

Owner Author:

This ensures the numba functions are compiled before profiling begins; a comment has been added to explain this.
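
For context, a minimal sketch of that warm-up-then-time pattern, assembled from the diff fragments quoted above (the `profile_fit` helper name and signature are assumptions for illustration, not this PR's API):

import time

def profile_fit(fit_func, instance, run_time_dict, repeats=3):
    # Warm-up call: accessing figure_of_merit forces numba to JIT-compile the
    # likelihood functions, so compilation time is excluded from the timings.
    fit = fit_func(instance=instance, run_time_dict=run_time_dict)
    fit.figure_of_merit

    # Timed loop, matching the `for _ in range(repeats)` suggestion above.
    start = time.time()
    for _ in range(repeats):
        fit = fit_func(instance=instance, run_time_dict=run_time_dict)
        fit.figure_of_merit

    run_time_dict["fit_time"] = (time.time() - start) / repeats
    return run_time_dict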

@@ -304,7 +305,7 @@ def refit_with_new_preloads(
-------
A new fit which has used new preloads input into this function but the same dataset, plane and other settings.
"""
- profiling_dict = {} if self.profiling_dict is not None else None
+ run_time_dict = {} if self.run_time_dict is not None else None
Collaborator:

Kind of weird

@@ -400,3 +402,44 @@ def save_attributes_for_aggregator(self, paths: af.DirectoryPaths):

paths.save_object("psf", self.dataset.psf)
paths.save_object("mask", self.dataset.mask)

def profile_log_likelihood_function(
Collaborator:

I guess you can't run cProfile on the supercomputer?

Owner Author:

This specifically profiles every step of the likelihood function, so it can produce output like the following during an HPC run:

{
    "image_2d_from_0": 0.0018813610076904297,
    "image_2d_from_1": 0.0012469291687011719,
    "relocated_grid_from_0": 7.867813110351562e-06,
    "relocated_mesh_grid_from_0": 0.0001354217529296875,
    "mesh_grid_from_0": 2.8848648071289062e-05,
    "image_2d_from_2": 0.0018084049224853516,
    "image_2d_from_3": 0.0011234283447265625,
    "_curvature_matrix_mapper_diag_0": 2.4543256759643555,
    "_curvature_matrix_multi_mapper_0": 2.9087066650390625e-05,
    "linear_func_operated_mapping_matrix_dict_0": 2.4318695068359375e-05,
    "_curvature_matrix_func_list_and_mapper_0": 0.018375158309936523,
    "curvature_matrix_0": 0.01932501792907715,
    "regularization_matrix_0": 0.24328160285949707,
    "curvature_reg_matrix_0": 0.0036950111389160156,
    "w_tilde_data_0": 0.01537179946899414,
    "_data_vector_mapper_0": 0.0012423992156982422,
    "_data_vector_func_list_and_mapper_0": 0.0017066001892089844,
    "data_vector_0": 3.123283386230469e-05,
    "reconstruction_0": 3.636324882507324,
    "mapped_reconstructed_data_dict_0": 0.014848470687866211,
    "mapped_reconstructed_data_0": 0.00013518333435058594,
    "reconstruction_reduced_0": 7.200241088867188e-05,
    "regularization_matrix_reduced_0": 0.013326644897460938,
    "regularization_term_0": 0.0009260177612304688,
    "curvature_reg_matrix_reduced_0": 0.011399507522583008,
    "log_det_curvature_reg_matrix_term_0": 0.03193974494934082,
    "log_det_regularization_matrix_term_0": 0.03502917289733887
}

cProfile is useful, but dedicated profiling functionality became key a few years ago (this PR is updating old code).
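
For readers unfamiliar with the pattern, here is a rough sketch of how per-step timings like those above could be collected. This decorator is illustrative, not the PR's actual implementation; only the `run_time_dict` attribute name is taken from the diff:

import time
from functools import wraps

def profile_func(func):
    # Illustrative decorator: records each call's run time under a key such as
    # "image_2d_from_0", where the suffix counts repeat calls of the same step.
    @wraps(func)
    def wrapper(obj, *args, **kwargs):
        if getattr(obj, "run_time_dict", None) is None:
            return func(obj, *args, **kwargs)
        start = time.time()
        result = func(obj, *args, **kwargs)
        repeat_index = sum(
            key.rsplit("_", 1)[0] == func.__name__ for key in obj.run_time_dict
        )
        obj.run_time_dict[f"{func.__name__}_{repeat_index}"] = time.time() - start
        return result
    return wrapper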

Comment on lines +48 to +50
galaxy = ag.Galaxy(redshift=0.5, pixelization=pixelization)

model = af.Collection(galaxies=af.Collection(galaxy=galaxy))
Collaborator:

The `galaxy` and `model` should also be fixtures.
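
A sketch of that suggestion (it assumes an existing `pixelization` fixture, and the `autogalaxy`/`autofit` imports are inferred from the `ag`/`af` aliases in the snippet):

import pytest

import autofit as af
import autogalaxy as ag

@pytest.fixture
def galaxy(pixelization):
    # Reuses the test suite's assumed `pixelization` fixture.
    return ag.Galaxy(redshift=0.5, pixelization=pixelization)

@pytest.fixture
def model(galaxy):
    return af.Collection(galaxies=af.Collection(galaxy=galaxy))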

Jammy2211 deleted the feature/profiling_lh branch on June 12, 2023.