
Aggregator for csv building #1087

@Jammy2211

Description


You should first read the AggregatorPng issue, as the general design and idea are the same:

#1086

Use Case:

Similar to .png splicing, it was common for me to want to view the numerical results of lens modeling in a single
.csv file, rather than navigating the output folder to find the information I needed.

Implementation:

Here is an example of the .csv file for the 2 example images on the agg_png_csv URL above:

https://github.com/Jammy2211/autolens_workspace_test/blob/main/agg_png_csv/result.csv

The headers in this .csv file come from different parts of the dataset and output folders, so finding the same
information by hand requires navigating the folders and combining data from multiple files.

Here is the example .csv-maker Python script I was using, which is a bit of a mess but works:

https://github.com/Jammy2211/autolens_workspace_test/blob/main/agg_png_csv/csv_make.py

AggregatorCSV

I am picturing an AggregatorCSV class that would take the output folder and help build the .csv file.

Using Output Folder

For the example on the workspace, all information used to build the .csv file comes from the info.json and
result.json files in the dataset folder.

So you would just need to write an AggregatorCSV that navigates the dataset folder and loads the info.json and
result.json files, and produces the .csv file.
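
The folder-walking and merging logic could look something like the minimal sketch below. This is not the proposed AggregatorCSV API, just an illustration of the mechanics: the function name `build_csv` and its parameters are hypothetical, and it assumes one sub-folder per dataset, each containing info.json and result.json with flat key/value pairs.

```python
import csv
import json
from pathlib import Path


def build_csv(output_path, csv_path, keys):
    """Hypothetical sketch: walk each dataset folder under `output_path`,
    merge its info.json and result.json, and write one CSV row per folder.
    `keys` selects which merged entries become CSV columns; missing keys
    produce a blank cell."""
    rows = []
    for folder in sorted(Path(output_path).iterdir()):
        if not folder.is_dir():
            continue
        merged = {}
        for name in ("info.json", "result.json"):
            file = folder / name
            if file.exists():
                merged.update(json.loads(file.read_text()))
        rows.append({key: merged.get(key, "") for key in keys})

    with open(csv_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=keys)
        writer.writeheader()
        writer.writerows(rows)
```

The real class would presumably layer the `add_column` API (below) on top of a loop like this, rather than taking a fixed list of keys.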

My .csv building never used the output folder, but only because I wrote a rather complicated pipeline (in a hurry)
which output all results to the dataset folder.

I think the general use case would be that the output folder is used to extract all information whenever possible.

For example, the einstein_radius_max_lh is stored here:

(Screenshot: the output folder, showing the samples_summary.json file.)

Where samples_summary.json has all the information on the model and instance, with an einstein radius at this part:

```json
"median_pdf_sample": {
    "type": "instance",
    "class_path": "autofit.non_linear.samples.sample.Sample",
    "arguments": {
        "log_likelihood": 21775.911731379132,
        "log_prior": 1.4651345875102018,
        "weight": 6.53867998265804e-05,
        "kwargs": {
            "type": "dict",
            "arguments": {
                "galaxies.lens.bulge.profile_list.59.centre.centre_0": -0.07025111305343998,
                "galaxies.lens.bulge.profile_list.59.centre.centre_1": -0.02258690706633907,
                "galaxies.lens.bulge.profile_list.29.ell_comps.ell_comps_0": 0.250115244578848,
                "galaxies.lens.bulge.profile_list.29.ell_comps.ell_comps_1": -0.18539079820818452,
                "galaxies.lens.bulge.profile_list.59.ell_comps.ell_comps_0": 0.05644153124215467,
                "galaxies.lens.bulge.profile_list.59.ell_comps.ell_comps_1": -0.16025101750452186,
                "galaxies.source.bulge.profile_list.19.centre.centre_0": -0.02911164136961475,
                "galaxies.source.bulge.profile_list.19.centre.centre_1": 0.16875116292548728,
                "galaxies.source.bulge.profile_list.19.ell_comps.ell_comps_0": -0.3747737836275342,
                "galaxies.source.bulge.profile_list.19.ell_comps.ell_comps_1": -0.191211048679011,
                "galaxies.lens.mass.ell_comps.ell_comps_0": 0.12954201123863499,
                "galaxies.lens.mass.ell_comps.ell_comps_1": -0.09446519305629113,
                "galaxies.lens.mass.einstein_radius": 0.8099997493702147,
                "galaxies.lens.shear.gamma_1": -0.04293296870204821,
                "galaxies.lens.shear.gamma_2": 0.10475246785931866
            }
        }
    }
},
```
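
Because the kwargs in samples_summary.json are keyed by their full dotted model paths, resolving a `model_path` is a flat dictionary lookup rather than a recursive descent. A sketch of such a lookup helper (the function name `value_at_model_path` is hypothetical; the nesting follows the JSON fragment above):

```python
def value_at_model_path(samples_summary, model_path, sample="median_pdf_sample"):
    """Hypothetical helper: look up a dotted model_path (e.g.
    'galaxies.lens.mass.einstein_radius') in the flat kwargs dict of a
    loaded samples_summary.json, for the chosen sample entry."""
    return samples_summary[sample]["arguments"]["kwargs"]["arguments"][model_path]


# Minimal fragment mirroring the structure shown above.
summary = {
    "median_pdf_sample": {
        "type": "instance",
        "arguments": {
            "kwargs": {
                "type": "dict",
                "arguments": {
                    "galaxies.lens.mass.einstein_radius": 0.8099997493702147,
                },
            },
        },
    },
}

value_at_model_path(summary, "galaxies.lens.mass.einstein_radius")
```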

Therefore an API which lets the user choose the .csv headers via the model_path would be ideal, something like:

```python
agg = AggregatorCSV.from_directory(
    directory=path.join("output"),
)

agg.add_column(
    folder=source_lp[1],
    name="einstein_radius_max_lh",
    model_path="galaxies.lens.mass.einstein_radius",
)
```

Errors

Note that samples_summary.json also stores errors on parameters, so the API above should be extended to specify the
error on the parameter, e.g.:

```python
agg.add_column(
    folder=source_lp[1],
    name="einstein_radius_max_lh",
    model_path="galaxies.lens.mass.einstein_radius",
    error="errors_at_sigma_3",  # this string is in the `samples_summary.json` file
)
```
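
Internally, the error lookup could mirror the value lookup. The layout of the error entries is not shown above, so the sketch below assumes an `errors_at_sigma_3` entry keyed by the same dotted model paths as the median-PDF kwargs; the function name `value_and_error` and the example data are illustrative only and should be checked against a real samples_summary.json file.

```python
def value_and_error(samples_summary, model_path, error="errors_at_sigma_3"):
    """Hypothetical sketch: return (value, error) for one parameter.

    ASSUMPTION: the error entry is a flat dict keyed by the same dotted
    model paths as the median_pdf_sample kwargs."""
    args = samples_summary["median_pdf_sample"]["arguments"]["kwargs"]["arguments"]
    return args[model_path], samples_summary[error][model_path]


# Illustrative data only (not copied from a real file).
summary = {
    "median_pdf_sample": {
        "arguments": {
            "kwargs": {
                "arguments": {"galaxies.lens.mass.einstein_radius": 0.81},
            },
        },
    },
    "errors_at_sigma_3": {"galaxies.lens.mass.einstein_radius": 0.05},
}
```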

Latent Variable API

The AggregatorCSV should also support the latent variable API, so that the user can use the model_path to access the
latent variables of the model.

The example GitHub repo does not have latent variables, but the AggregatorCSV should be designed to support them,
as they will just be in a latent_summary.json file analogous to the samples_summary.json file.

Errors should also be supported for latent variables.

Manual Function API

From samples_summary.json the AggregatorCSV can create an instance of the maximum likelihood or median PDF model.

A user may want to compute a quantity from this instance and add it to the .csv file. This value may be something
you could add as a latent variable, but let's pretend the user forgot to add it before running the pipeline or doesn't
want loads of latent variables in the code.

The following API could allow the user to do this:

```python
agg = AggregatorCSV.from_directory(
    directory=path.join("output"),
)


def einstein_radius_x2_from(instance):
    einstein_radius = instance.galaxies.lens.mass.einstein_radius
    return einstein_radius * 2.0


agg.add_column(
    folder=source_lp[1],
    name="einstein_radius_x2_max_lh",
    latent_func=einstein_radius_x2_from,
    use_max_lh_instance=True,  # as opposed to the median PDF instance
)
```

Manual Function with Samples API

The example above uses samples_summary.json to create an instance of the maximum likelihood or median PDF model.
It does not use the full set of non-linear search samples and therefore cannot provide an error estimate on the
quantity computed.

The samples are fully included in samples.json and the AggregatorCSV should support the following API to compute
a quantity from the samples and add it to the .csv file, which can then include an error estimate:

```python
agg = AggregatorCSV.from_directory(
    directory=path.join("output"),
)


def einstein_radius_x2_via_samples_from(samples):

    random_draws = 50

    einstein_radius_x2_list = []

    for i in range(random_draws):

        instance = samples.draw_randomly_via_pdf()

        ell_comps = instance.galaxies.lens.mass.ell_comps

        einstein_radius_x2 = al.convert.einstein_radius_x2_from(ell_comps=ell_comps)

        einstein_radius_x2_list.append(einstein_radius_x2)

    median_einstein_radius_x2, lower_einstein_radius_x2, upper_einstein_radius_x2 = af.marginalize(
        parameter_list=einstein_radius_x2_list,
        sigma=3.0,
    )

    return median_einstein_radius_x2, lower_einstein_radius_x2, upper_einstein_radius_x2


agg.add_x3_columns_with_errors(
    folder=source_lp[1],
    name="einstein_radius_x2",
    samples_func=einstein_radius_x2_via_samples_from,
)
```

Missing .json Files

A user can disable the output of samples.json, so the AggregatorCSV should raise a warning if the user tries to use
the samples_func API when the samples.json file is not present, and leave the column in the .csv file blank.
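
The warn-and-blank behaviour could be as simple as the sketch below. The function name `samples_column_value` is hypothetical, and it stands in for whatever per-folder loading the real AggregatorCSV would do: if samples.json is absent it warns and returns an empty string so the CSV cell stays blank, otherwise it loads the file and applies the user's function.

```python
import json
import warnings
from pathlib import Path


def samples_column_value(folder, samples_func):
    """Hypothetical sketch: compute a column value from samples.json,
    or warn and return "" (a blank CSV cell) if the file is missing."""
    samples_path = Path(folder) / "samples.json"
    if not samples_path.exists():
        warnings.warn(
            f"samples.json not found in {folder}; leaving the column blank."
        )
        return ""
    samples = json.loads(samples_path.read_text())
    return samples_func(samples)
```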
