You should first read the AggregatorPng issue, as the general design and idea is the same:
#1086
Use Case:
Similar to .png splicing, it was common for me to want to view the numerical results of lens modeling in a single
.csv file, rather than navigating the output folder to find the information I needed.
Implementation:
Here is an example of the .csv file for the 2 example images in the agg_png_csv folder linked above:
https://github.com/Jammy2211/autolens_workspace_test/blob/main/agg_png_csv/result.csv
This .csv file contains headers which come from different parts of the dataset and output folders, so building
it by hand requires navigating those folders and combining information from multiple files.
Here is the example Python script I was using to build the .csv, which is a bit of a mess, but works:
https://github.com/Jammy2211/autolens_workspace_test/blob/main/agg_png_csv/csv_make.py
AggregatorCSV
I am picturing an AggregatorCSV class that would take the output folder and help build the .csv file.
Using Output Folder
For the example on the workspace, all information used to build the .csv file comes from the info.json and
result.json files in the dataset folder.
So you would just need to write an AggregatorCSV that navigates the dataset folder and loads the info.json and
result.json files, and produces the .csv file.
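A minimal sketch of that navigation step, assuming each dataset subfolder contains an info.json and a result.json whose top-level keys become the .csv columns (the folder layout, key names, and the `build_csv` helper are illustrative, not part of any existing API):

```python
import csv
import json
from pathlib import Path


def build_csv(dataset_dir: str, csv_path: str) -> None:
    """Combine info.json and result.json from every dataset subfolder
    into one row per dataset of a single .csv file."""
    rows = []
    for folder in sorted(Path(dataset_dir).iterdir()):
        if not folder.is_dir():
            continue
        row = {"dataset": folder.name}
        for json_name in ("info.json", "result.json"):
            json_file = folder / json_name
            if json_file.exists():
                # Top-level keys of each file become .csv headers.
                row.update(json.loads(json_file.read_text()))
        rows.append(row)

    # The union of keys across datasets gives the full header list;
    # datasets missing a key get a blank cell.
    headers = sorted({key for row in rows for key in row})
    with open(csv_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=headers)
        writer.writeheader()
        writer.writerows(rows)
```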
My .csv building never used the output folder, but this was because I wrote a quite complicated pipeline (in a hurry)
which output all results to the dataset folder.
I think the general use case would be that the output folder is used to extract all information whenever possible.
For example, the einstein_radius_max_lh is stored here:

[screenshot of the samples_summary.json location in the output folder omitted]
Where samples_summary.json has all the information on the model and instance, with an einstein radius at this part:
"median_pdf_sample": {
"type": "instance",
"class_path": "autofit.non_linear.samples.sample.Sample",
"arguments": {
"log_likelihood": 21775.911731379132,
"log_prior": 1.4651345875102018,
"weight": 6.53867998265804e-05,
"kwargs": {
"type": "dict",
"arguments": {
"galaxies.lens.bulge.profile_list.59.centre.centre_0": -0.07025111305343998,
"galaxies.lens.bulge.profile_list.59.centre.centre_1": -0.02258690706633907,
"galaxies.lens.bulge.profile_list.29.ell_comps.ell_comps_0": 0.250115244578848,
"galaxies.lens.bulge.profile_list.29.ell_comps.ell_comps_1": -0.18539079820818452,
"galaxies.lens.bulge.profile_list.59.ell_comps.ell_comps_0": 0.05644153124215467,
"galaxies.lens.bulge.profile_list.59.ell_comps.ell_comps_1": -0.16025101750452186,
"galaxies.source.bulge.profile_list.19.centre.centre_0": -0.02911164136961475,
"galaxies.source.bulge.profile_list.19.centre.centre_1": 0.16875116292548728,
"galaxies.source.bulge.profile_list.19.ell_comps.ell_comps_0": -0.3747737836275342,
"galaxies.source.bulge.profile_list.19.ell_comps.ell_comps_1": -0.191211048679011,
"galaxies.lens.mass.ell_comps.ell_comps_0": 0.12954201123863499,
"galaxies.lens.mass.ell_comps.ell_comps_1": -0.09446519305629113,
"galaxies.lens.mass.einstein_radius": 0.8099997493702147,
"galaxies.lens.shear.gamma_1": -0.04293296870204821,
"galaxies.lens.shear.gamma_2": 0.10475246785931866
}
}
}
},
Therefore an API which allows the user to use the model_path to choose what the .csv headers are would be ideal, something like:

```python
agg = AggregatorCSV.from_directory(
    directory=path.join("output"),
)

agg.add_column(
    folder=source_lp[1],
    name="einstein_radius_max_lh",
    model_path="galaxies.lens.mass.einstein_radius",
)
```
Errors
Note that samples_summary.json also stores errors on parameters, so the API above should be extended to specify the
error on the parameter, e.g.:
```python
agg.add_column(
    folder=source_lp[1],
    name="einstein_radius_max_lh",
    model_path="galaxies.lens.mass.einstein_radius",
    error="errors_at_sigma_3",  # this string is in the `samples_summary.json` file
)
```
Latent Variable API
The AggregatorCSV should also support the latent variable API, so that the user can use the model_path to access the
latent variables of the model.
The example GitHub repo does not have latent variables, but the AggregatorCSV should be designed to support them,
as they will just be in a latent_summary.json file analogous to the samples_summary.json file.
Errors should also be supported for latent variables.
Manual Function API
From samples_summary.json the AggregatorCSV can create an instance of the maximum likelihood or median PDF model.
A user may want to compute a quantity from this instance and add it to the .csv file. This value may be something
you could add as a latent variable, but let's pretend the user forgot to add it before running the pipeline or doesn't
want loads of latent variables in the code.
The following API could allow the user to do this:
```python
agg = AggregatorCSV.from_directory(
    directory=path.join("output"),
)


def einstein_radius_x2_from(instance):
    einstein_radius = instance.galaxies.lens.mass.einstein_radius
    return einstein_radius * 2.0


agg.add_column(
    folder=source_lp[1],
    name="einstein_radius_x2_max_lh",
    latent_func=einstein_radius_x2_from,
    use_max_lh_instance=True,  # as opposed to the median PDF instance
)
```
Manual Function with Samples API
The example above uses samples_summary.json to create an instance of the maximum likelihood or median PDF model.
It does not use the full set of non-linear search samples and therefore cannot provide an error estimate on the
quantity computed.
The samples are fully included in samples.json and the AggregatorCSV should support the following API to compute
a quantity from the samples and add it to the .csv file, which can then include an error estimate:
```python
agg = AggregatorCSV.from_directory(
    directory=path.join("output"),
)


def einstein_radius_x2_via_samples_from(samples):
    random_draws = 50

    einstein_radius_x2_list = []

    for i in range(random_draws):
        instance = samples.draw_randomly_via_pdf()
        ell_comps = instance.galaxies.lens.mass.ell_comps
        einstein_radius_x2 = al.convert.einstein_radius_x2_from(ell_comps=ell_comps)
        einstein_radius_x2_list.append(einstein_radius_x2)

    median_einstein_radius_x2, lower_einstein_radius_x2, upper_einstein_radius_x2 = af.marginalize(
        parameter_list=einstein_radius_x2_list,
        sigma=3.0,
    )

    return median_einstein_radius_x2, lower_einstein_radius_x2, upper_einstein_radius_x2


agg.add_x3_columns_with_errors(
    folder=source_lp[1],
    name="einstein_radius_x2",
    samples_func=einstein_radius_x2_via_samples_from,
)
```
Missing .json Files
A user can disable the output of samples.json, so the AggregatorCSV should raise a warning if the user tries to use
the samples_func API and the samples.json file is not present, and leave the column blank in the .csv file.
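A sketch of that fallback behaviour, assuming a blank cell is represented by an empty string (`samples_column_value` and the raw-dict samples loading are illustrative, not the real API):

```python
import json
import warnings
from pathlib import Path


def samples_column_value(folder: str, samples_func):
    """Apply samples_func to the samples loaded from samples.json,
    or warn and return a blank cell if the file was not output."""
    samples_file = Path(folder) / "samples.json"
    if not samples_file.exists():
        warnings.warn(
            f"samples.json not found in {folder}; leaving the column blank."
        )
        return ""
    samples = json.loads(samples_file.read_text())
    return samples_func(samples)
```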