# KBMOD Results and Filtering  
  
This notebook demonstrates the basic functionality for loading and filtering results. KBMOD can load results into a ``Results`` data structure and then apply a sequence of filters to them. The ``Results`` structure also contains special columns (`psi_curve`, `phi_curve`, and `obs_valid`) and helper functions that automatically update scoring metrics such as `likelihood`, `flux`, and `obs_count`.

# Setup
Before importing, make sure you have installed kbmod using `pip install .` in the root directory.  Also be sure you are running with python3 and using the correct notebook kernel.

In [None]:
# everything we will need for this demo
from kbmod.results import Results
import matplotlib.pyplot as plt
import numpy as np

# Load the results

We use the fake result data provided in ``data/fake_results_noisy`` which is generated from 256 x 256 images with multiple fake objects inserted. KBMOD is run with wider than normal filter parameters so as to produce a noisy set of results.

The `Results` object behaves like an astropy Table with some additional bookkeeping to help with filtering.

In [None]:
results = Results.from_trajectory_file("../data/fake_results_noisy/results_DEMO.txt")
print(f"Loaded {len(results)} results with columns {results.colnames}")

# Turn on filtered result tracking.
results.track_filtered = True

# Show the top 5 rows
print(results[0:5])

We can access individual rows and columns using the `[]` notation:

In [None]:
results["likelihood"][0]

# Sorting Results

We can sort a ``Results`` object by any of its columns, in either increasing or decreasing order, by operating directly on its table object. By default the rows are sorted in increasing order, so we will often want to use `reverse=True` to get the results in decreasing order.

In [None]:
results.table.sort(keys="obs_count", reverse=True)
print("Top 5 by observation count:")
print(results[0:5])

print("\nBottom 5 by Flux:")
results.table.sort(keys=["flux"], reverse=False)
print(results[0:5])

# Return to sorted by decreasing likelihood.
results.table.sort(keys=["likelihood"], reverse=True)
print(results[0:5])
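
To make the effect of `reverse=True` concrete, here is a small NumPy-only sketch. It is an illustration of increasing versus decreasing order, not how the table sort is implemented, and the `obs_count` values are made up for the example.

```python
import numpy as np

# Hypothetical observation counts standing in for results["obs_count"].
obs_count = np.array([12, 7, 20, 15])

# Row order for an increasing sort (the default behavior described above).
order_increasing = np.argsort(obs_count)

# Reversing that order gives the decreasing sort, analogous to reverse=True.
order_decreasing = order_increasing[::-1]

print("Increasing:", obs_count[order_increasing])
print("Decreasing:", obs_count[order_decreasing])
```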

# Extracting Individual Attributes

Since the `Results` class stores data as a table, the user can easily extract all of the values for a given attribute of the results. For example, we could extract all of the flux values and create a histogram.

In [None]:
plt.hist(results["flux"])

# Filtering

Using the `filter_rows()` method, you can filter out individual rows based on either their indices or a Boolean mask. In addition to the indices or mask, the `filter_rows()` method allows you to specify an optional label for later analysis.

In [None]:
# Keep only results with likelihood > 40.0; rows where the mask is False are filtered out.
mask = results["likelihood"] > 40.0
results.filter_rows(mask, "likelihood")
print(f"{len(results)} results remaining.")
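
The text above mentions that rows can be selected by indices or by a Boolean mask. The NumPy-only sketch below shows how the two forms relate. The likelihood values are invented for illustration, and the sketch does not call any KBMOD function.

```python
import numpy as np

# Hypothetical likelihood values standing in for results["likelihood"].
likelihood = np.array([55.0, 12.5, 40.0, 71.2, 39.9])

# Boolean mask: True marks the rows to keep.
keep_mask = likelihood > 40.0

# Equivalent index form: the positions of the True entries.
keep_indices = np.flatnonzero(keep_mask)

# Both forms select the same rows.
kept_by_mask = likelihood[keep_mask]
kept_by_index = likelihood[keep_indices]

print("Indices kept:", keep_indices)
print("Values kept:", kept_by_mask)
```

Note that the row with a likelihood of exactly 40.0 is not kept, because the comparison is strict.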

We can look at the rows that passed the filter.

In [None]:
print(results)

The `Results` object always keeps a count of how many results were filtered at each stage in a dictionary `filtered_stats`.

In [None]:
print(results.filtered_stats)

Because we set ``results.track_filtered = True`` above, the ``Results`` object also keeps each row that was rejected by one of the filters. These rows are indexed by the filter name, allowing the user to determine which rows were removed during which filtering stage. 

We can use the ``get_filtered`` function to retrieve all the filtered rows for a given filter name:

In [None]:
# Extract the rows that did not pass the "likelihood" filter.
filtered = results.get_filtered("likelihood")
print(filtered)

We can apply multiple filters to the ``Results`` object to progressively rule out more and more candidate trajectories. We can even apply the same filter with different parameters.

Next we filter out anything with fewer than 10 observations:

In [None]:
# Filter out all results with fewer than 10 observations.
results.filter_rows(results["obs_count"] >= 10, "obscount=10")
print(f"{len(results)} results remaining.")

### Reverting filters

As long as we have ``track_filtered`` turned on, we can undo any of the filtering steps. Reverting appends the previously filtered rows to the end of the results, so the original ordering is not preserved. However, we can always re-sort if needed.

In [None]:
results.revert_filter("likelihood")
print(f"{len(results)} results remaining.")
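
The bookkeeping described in this section can be pictured with a toy model. The `ToyResults` class below is hypothetical and written only for this illustration; it is not KBMOD code, and details such as clearing the stats on revert are choices made for the sketch. It shows why reverted rows end up at the end of the list.

```python
import numpy as np


class ToyResults:
    """Hypothetical stand-in for labeled filter tracking (not KBMOD code)."""

    def __init__(self, values):
        self.values = np.asarray(values, dtype=float)
        self.filtered = {}        # label -> rejected values
        self.filtered_stats = {}  # label -> number of rejected rows

    def filter_rows(self, mask, label):
        """Keep rows where mask is True and store the rest under label."""
        mask = np.asarray(mask, dtype=bool)
        rejected = self.values[~mask]
        previous = self.filtered.get(label, np.empty(0))
        self.filtered[label] = np.concatenate([previous, rejected])
        self.filtered_stats[label] = self.filtered_stats.get(label, 0) + len(rejected)
        self.values = self.values[mask]

    def revert_filter(self, label):
        """Append the rows rejected under label back onto the end."""
        self.values = np.concatenate([self.values, self.filtered.pop(label)])
        # Toy choice: forget the stats for a reverted filter.
        self.filtered_stats.pop(label)


toy = ToyResults([50.0, 10.0, 45.0, 30.0])
toy.filter_rows(toy.values > 40.0, "likelihood")
after_filter = toy.values.copy()
stats_after_filter = dict(toy.filtered_stats)

toy.revert_filter("likelihood")
print("After filter:", after_filter)
print("After revert:", toy.values)
```

In this toy run the reverted values appear after the rows that survived the filter, which is why a re-sort may be needed.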

# Likelihood columns

The default likelihoods are taken from the `lh` field of the `Trajectory` object. However, we may want to update these by filtering individual time steps, such as when applying clipped sigmaG. `Results` provides the ability to append columns with the psi and phi curves and to update the likelihoods directly from those curves using the `add_psi_phi_data()` function.

**NOTE:** It is important to use the `add_psi_phi_data()` function to add or update psi and phi information, as it will automatically propagate the changes to other columns.

Here we start by creating a random psi curve and a constant phi curve. There must be one curve for each result in the data set and all curves must be the same length.

In [None]:
num_times = 20
num_results = len(results)

rng = np.random.default_rng()
psi_curves = 10.0 + rng.standard_normal((num_results, num_times))
phi_curves = np.full((num_results, num_times), 0.1)

results = results.add_psi_phi_data(psi_curves, phi_curves)
results[0:3]

The power of psi and phi curves comes from the ability to specify a column `obs_valid` that indicates which time steps of the curves are valid. Each `obs_valid` entry is a Boolean mask with the same length as the psi and phi curves. Only the valid entries are used in the computation of `flux` and `likelihood`. Notice how marking some of the first result's entries as invalid changes the `likelihood`, `flux`, and `obs_count` columns.

In [None]:
obs_valid = np.full((num_results, num_times), True)
obs_valid[0, 0:3] = False

results = results.add_psi_phi_data(psi_curves, phi_curves, obs_valid)
results[0:3]
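
To see why masking time steps changes the derived columns, the NumPy-only sketch below recomputes the three metrics by hand. It assumes the convention likelihood = sum(psi) / sqrt(sum(phi)) and flux = sum(psi) / sum(phi) over the valid steps. That convention is an assumption of this sketch and is not stated in this notebook, so treat the numbers as illustrative rather than as what `Results` computes.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable
num_results, num_times = 3, 20

psi = 10.0 + rng.standard_normal((num_results, num_times))
phi = np.full((num_results, num_times), 0.1)

# Mark the first three time steps of the first result as invalid.
valid = np.full((num_results, num_times), True)
valid[0, 0:3] = False

# Sum psi and phi over the valid time steps only.
psi_sum = np.where(valid, psi, 0.0).sum(axis=1)
phi_sum = np.where(valid, phi, 0.0).sum(axis=1)

# Assumed scoring convention (see the note above).
likelihood = psi_sum / np.sqrt(phi_sum)
flux = psi_sum / phi_sum
obs_count = valid.sum(axis=1)

print("obs_count:", obs_count)
print("likelihood:", likelihood)
print("flux:", flux)
```

Under this assumption, the first result has fewer contributing time steps, so its `obs_count` drops and its summed psi and phi shrink accordingly.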

You can also update the valid observations at a later time using the `update_obs_valid()` function.

**NOTE:** It is important to use either the `add_psi_phi_data()` or `update_obs_valid()` function to change the `obs_valid` data, as they will automatically propagate the changes to other columns.

In [None]:
obs_valid[0, 0:3] = True
results = results.update_obs_valid(obs_valid)

results[0:3]