# Example: selection

This example shows the methods that you can use to select variables when iteratively analysing data with setvis. It shows how to select variables programmatically via a PlotSession (Method 1) or a Membership object (Method 2). It also shows how to select variables interactively from a plot (Method 3, the interactive version of Method 1).

There are similar methods for selecting intersections (`Membership.select_intersections()`) and records (`Membership.select_records()`).

There are also "drop" functions: `Membership.drop_columns()`, `Membership.drop_intersections()` and `Membership.drop_records()`.

And "invert" functions: `Membership.invert_columns()`, `Membership.invert_intersections()` and `Membership.invert_records()`.

Alternatively, `PlotSession.add_selection()` can be used with its `columns`, `intersections`, `records` and/or `invert` arguments.
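The exact setvis signatures of these functions are not shown here, so, as a rough analogy only, the sketch below uses plain pandas (not setvis) on a hypothetical toy table. It illustrates the relationship that the "select" and "drop" names suggest: keeping a set of columns and records leaves the same data as dropping everything else.

```python
import pandas as pd

# Hypothetical toy table (not the real dataset); index values play the role of record IDs.
df = pd.DataFrame(
    {
        "DIAG_01": ["A01", "B02", "C03"],
        "DIAG_02": [None, "X2", None],
        "DIAG_03": ["Y1", None, "Y3"],
    }
)

keep_columns = ["DIAG_02", "DIAG_03"]
keep_records = [0, 2]

# "Select": keep only the chosen records and columns.
selected = df.loc[keep_records, keep_columns]

# "Drop": remove the complement of the chosen records and columns.
dropped = df.drop(
    index=[r for r in df.index if r not in keep_records],
    columns=[c for c in df.columns if c not in keep_columns],
)

print(selected.equals(dropped))  # True
```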

## Preamble

### Includes: setvis and other libraries

```python
import pandas as pd
import setvis
import setvis.plots
```

### Loading the data

```python
input_file = "../examples/datasets/Synthetic_APC_DIAG_Fields.csv"
```

## Whole dataset - visualize patterns of missing values

Create a Membership object and setvis plots for the whole dataset. The printed output lists the intersection IDs (0 - 16) and the 200 records (IDs 0 - 199).

```python
m = setvis.Membership.from_csv(input_file)
session1 = setvis.plots.PlotSession(m)
session1.add_plot(name="all")
print(session1.membership().intersections().index.tolist())
print(session1.membership().records())
```
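To make the terms concrete: in this example, an intersection groups the records that share the same pattern of missing values. The plain-pandas sketch below illustrates that idea only (it is not how setvis computes intersections). It derives the patterns for a hypothetical toy table and numbers them in order of first appearance; setvis's own numbering may differ.

```python
import pandas as pd

# Hypothetical toy data standing in for a few DIAG_* columns (not the real CSV).
df = pd.DataFrame(
    {
        "DIAG_01": ["A01", "B02", "C03", "D04", "E05"],
        "DIAG_02": ["X1", None, "X3", None, None],
        "DIAG_03": [None, None, "Y3", None, "Y5"],
    }
)


def missingness_patterns(frame: pd.DataFrame) -> list:
    """Return, for each record, the tuple of column names whose value is missing."""
    missing = frame.isna().to_numpy()
    return [tuple(frame.columns[row]) for row in missing]


def intersection_ids(patterns: list) -> dict:
    """Number the distinct patterns 0..k-1 in order of first appearance."""
    return {p: i for i, p in enumerate(dict.fromkeys(patterns))}


patterns = missingness_patterns(df)
ids = intersection_ids(patterns)
record_to_intersection = [ids[p] for p in patterns]
print(record_to_intersection)  # [0, 1, 2, 1, 3]
```

Records 1 and 3 are both missing DIAG_02 and DIAG_03, so they share an intersection; record 2 has no missing values and forms its own (empty) pattern.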

## Method 1: Select and plot missingness for specific columns

### Add a selection to a PlotSession

Notes:
  - Use `add_selection()` to select the columns DIAG_02 - DIAG_05 in the plot session.
  - Then add a new plot to visualize the missingness. The argument `name` is the name of the new plot (and can be used to refer to a more refined selection that we make interactively within it). The argument `based_on` is the name of the selection from which the plotted data is taken. Notice that the plot below shows only the columns that we selected in "selected columns".
  - The intersection and record IDs are unchanged.
  - However, the Combination Heatmap contains only 9 intersections, because intersections in the "all" plot that differed only in their missing values for DIAG_06 - DIAG_10 have been merged.

```python
cols = ['DIAG_02','DIAG_03','DIAG_04','DIAG_05']
session1.add_selection(name="selected columns", columns=cols)
session1.add_plot(name="selected columns plot", based_on="selected columns")
print(session1.membership().intersections().index.tolist())
print(session1.membership().records())
```
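The merging described in the notes can be reproduced in plain pandas. This is again a sketch on hypothetical toy data, not setvis internals: when the missingness patterns are recomputed over fewer columns, records whose patterns differed only in the excluded columns end up with the same pattern, while the record IDs stay the same.

```python
import pandas as pd

# Hypothetical toy data: DIAG_01..DIAG_03 stand in for the real DIAG_* columns.
df = pd.DataFrame(
    {
        "DIAG_01": ["A01", "B02", "C03", "D04"],
        "DIAG_02": [None, "X2", None, None],
        "DIAG_03": ["Y1", None, None, "Y4"],
    },
    index=[0, 1, 2, 3],  # record IDs
)


def patterns(frame: pd.DataFrame) -> dict:
    """Map each record ID to the tuple of columns that are missing in it."""
    missing = frame.isna().to_numpy()
    return {rid: tuple(frame.columns[row]) for rid, row in zip(frame.index, missing)}


all_patterns = patterns(df)
subset_patterns = patterns(df[["DIAG_01", "DIAG_02"]])

print(len(set(all_patterns.values())))     # 3 distinct patterns over all columns
print(len(set(subset_patterns.values())))  # 2 distinct patterns after selecting columns
```

Records 0 and 2 differ only in DIAG_03, so once DIAG_03 is excluded their patterns merge.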

## Method 2: Create a new Membership object from a subset of columns

This is equivalent to importing only the subset of columns from the dataset. Note that:
  - The intersections have been recalculated, so the intersection IDs (0-9) differ from those in session1.
  - But the record IDs (0-199) are the same as in session1.
  - The visualizations are identical to the Method 1 visualizations.

```python
m2 = m.select_columns(cols)
session2 = setvis.plots.PlotSession(m2)
session2.add_plot(name="subset of columns")
print(m2.intersections().index.tolist())
print(m2.records())
```
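The equivalence with importing only a subset of columns can be illustrated in plain pandas (a sketch using a hypothetical toy CSV, not the real file and not setvis): reading every column and then selecting some gives the same values, missingness and row (record) index as reading only those columns via `read_csv`'s `usecols` argument.

```python
import io

import pandas as pd

# Hypothetical toy CSV standing in for Synthetic_APC_DIAG_Fields.csv.
csv_text = """DIAG_01,DIAG_02,DIAG_03,DIAG_04
A01,X1,,Z1
B02,,Y2,
C03,,,Z3
"""

cols = ["DIAG_02", "DIAG_03"]

# Option 1: read everything, then keep only the chosen columns.
full = pd.read_csv(io.StringIO(csv_text))
selected_after = full[cols]

# Option 2: import only the chosen columns in the first place.
selected_on_read = pd.read_csv(io.StringIO(csv_text), usecols=cols)

print(selected_after.equals(selected_on_read))  # True
print(selected_on_read.isna())
```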

## Method 3: Interactively select columns and visualize the missingness

### First, create a new PlotSession that again visualizes patterns of missing values in the whole dataset

```python
session3 = setvis.plots.PlotSession(m)
session3.add_plot(name="session 3 all")
```

### Use SHIFT-click to select the DIAG_02 - DIAG_05 columns in the Value Bar Chart plot above

If you have made the selection correctly, the "selected columns plot" visualizations below will be identical to the Method 1 and Method 2 visualizations.

```python
# Plot the data from the selection made interactively in the "session 3 all" plot
session3.add_plot(name="selected columns plot", based_on="session 3 all")
```