**Using Evaluation Utils**:

In this notebook, we show how to use the provided utilities file to compare several models on multiple metrics across several problems at once.

In [1]:
import evaluation
import load_data
import Padgan_variants
import utils
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
import VAEs



**Setting up DGMs**:

Let's create a pandas Series holding several deep generative models (DGMs), keyed by name.

In [2]:
reg_clf_params = None
config_params = [False, False, False, None, None, False]
train_params = [1, 0, 4, 5] #Setting DPP weight to 0 for normal GAN
DTAI_params= [None, None, None]

methods=pd.Series(dtype=object) #Explicit dtype avoids pandas' empty-Series dtype warning
methods["GAN"] = Padgan_variants.padgan_wrapper(config_params, train_params, DTAI_params, reg_clf_params, reg_clf_params)
methods["VAE"] = VAEs.VAE_wrapper([1, 128, 1e-3, 4, .05, False])



**Setting up Problems**

The utilities expect each problem (called a "function" in the code) to be specified as a list of the following six components, in this order:
- Sampling function
- Validity test
- Objectives
- Plotting range
- Conditioning function
- Condition value

Unused components can be left as None.
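To make the six-slot layout concrete, here is a minimal sketch built from toy stand-ins. toy_sampler and toy_validity are hypothetical helpers written only for this illustration; the call signatures that utils actually expects from the load_data sampling and validity wrappers are not shown in this notebook, so the array shapes here are assumptions.

```python
import numpy as np

# Hypothetical stand-ins for the load_data helpers; they only illustrate
# the six-slot layout, not the real signatures expected by utils.
def toy_sampler(n=1000, seed=0):
    """Draw n points uniformly from the square [-2, 2) x [-2, 2)."""
    rng = np.random.default_rng(seed)
    return rng.uniform(-2, 2, size=(n, 2))

def toy_validity(x):
    """Trivially permissive validity test: every sample is valid."""
    return np.ones(len(x), dtype=bool)

rangearr = np.array([[-2, 2], [-2, 2]])

# Slots: sampling function, validity test, objectives,
#        plotting range, conditioning function, condition value
toy_problem = [toy_sampler, toy_validity, None, rangearr, None, None]

sampler, validity, objectives, plot_range, cond_fn, cond_val = toy_problem
x = sampler(500)
print(x.shape, validity(x).all(), plot_range.tolist())
```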

In [3]:
functions=[]

DM_val = load_data.all_val_wrapper()

pareto = np.stack([0.4705*np.linspace(0,1,1000), 0.4705*np.linspace(1,0,1000)], axis=1)
sampling_func_1 = load_data.sample_circle_blobs_wrapper(10000, 6, 1.3, 0.22) #Uniform Sampling with Number of positive samples & Negative Samples
sampling_func_2 = load_data.sample_circle_blobs_wrapper(10000, 2, 1.3, 0.22) #Uniform Sampling with Number of positive samples & Negative Samples

rangearr = np.array([[-2,2], [-2,2]])

functions.append([sampling_func_1, DM_val, None, rangearr, None, None])
functions.append([sampling_func_2, DM_val, None, rangearr, None, None])

**Setting Up Metrics**:
We set up the metrics we want to evaluate in a pandas Series.
Each entry has the form:
metrics["name"] = ["direction", metric wrapper]

- name is the label you assign to the metric
- direction is either "minimize" or "maximize"
- metric wrapper is a wrapper function for the desired metric, with any hyperparameters specified
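The wrapper pattern fixes a metric's hyperparameters up front and returns a function to be called later. The sketch below assumes a metric is called as metric(generated, data) on 2-D arrays; that signature, and nearest_distance_wrapper itself, are hypothetical and not the evaluation module's API. It computes, for each generated sample, the distance to its nearest dataset sample and then reduces those distances, which is one plausible reading of a "Nearest Dataset Sample" metric.

```python
import numpy as np
import pandas as pd

def nearest_distance_wrapper(reduction="mean"):
    """Hypothetical wrapper: fixes the reduction and returns a metric that,
    for each generated sample, finds the distance to its nearest dataset
    sample and then reduces those distances to one number."""
    reduce_fn = {"mean": np.mean, "min": np.min, "max": np.max}[reduction]

    def metric(generated, data):
        # Pairwise Euclidean distances, shape (n_generated, n_data)
        diffs = generated[:, None, :] - data[None, :, :]
        dists = np.sqrt((diffs ** 2).sum(axis=-1))
        return float(reduce_fn(dists.min(axis=1)))

    return metric

# Same ["direction", metric wrapper] layout as the metrics Series below
toy_metrics = pd.Series(dtype=object)
toy_metrics["Nearest Dataset Sample (toy)"] = ["minimize", nearest_distance_wrapper("mean")]

data = np.array([[0.0, 0.0], [1.0, 0.0]])
generated = np.array([[0.0, 0.5], [1.0, 0.0]])
direction, metric = toy_metrics["Nearest Dataset Sample (toy)"]
print(direction, metric(generated, data))  # prints: minimize 0.25
```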

In [4]:
metrics=pd.Series(dtype=object) #Explicit dtype avoids pandas' empty-Series dtype warning
metrics["Nearest Dataset Sample"] = ["minimize", evaluation.gen_data_distance_wrapper("x", "min")]
metrics["Nearest Generated Sample"] = ["minimize", evaluation.data_gen_distance_wrapper("x", "min")]
metrics["F1"] = ["maximize", evaluation.F_wrapper("x", 1)]
metrics["F10"] = ["maximize", evaluation.F_wrapper("x", 10)]
metrics["F0.1"] = ["maximize", evaluation.F_wrapper("x", 0.1)]
metrics["AUC-PR"] = ["maximize", evaluation.AUC_wrapper("x")]
metrics["MMD"] = ["minimize", evaluation.MMD_wrapper()]
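MMD_wrapper() is called without arguments, so its kernel and bandwidth are not visible here. As a hedged illustration, the sketch below computes the standard biased estimate of squared maximum mean discrepancy with a Gaussian (RBF) kernel, MMD² = mean k(x, x') + mean k(y, y') - 2 mean k(x, y); the bandwidth sigma = 1 is an arbitrary assumption. Lower values mean the two sample sets are closer in distribution, which matches the "minimize" direction used for MMD above.

```python
import numpy as np

def rbf_kernel(a, b, sigma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of a and the rows of b."""
    sq_dists = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

def mmd_squared(x, y, sigma=1.0):
    """Biased (V-statistic) estimate of squared MMD between samples x and y."""
    k_xx = rbf_kernel(x, x, sigma).mean()
    k_yy = rbf_kernel(y, y, sigma).mean()
    k_xy = rbf_kernel(x, y, sigma).mean()
    return float(k_xx + k_yy - 2.0 * k_xy)

rng = np.random.default_rng(0)
data = rng.normal(0.0, 1.0, size=(300, 2))
close = rng.normal(0.0, 1.0, size=(300, 2))    # drawn from the same distribution
shifted = rng.normal(2.0, 1.0, size=(300, 2))  # drawn from a shifted distribution
print(mmd_squared(data, close), mmd_squared(data, shifted))
```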



**General Parameters**

We set up some flags and general settings:

In [5]:
numgen = 1000 #Number of samples to generate
numinst = 3 #Number of instantiations to test
scaling = True #Whether to scale the data before training
scorebars = True #Print progress bars for scoring functions

np.random.seed(0)

validity_status = 0 #whether we are considering constraints
obj_status = 0 #whether we are considering functional performance
conditional_status = 0 #whether we are considering conditioning
cond_dist=False #Whether conditional metrics are compared against conditional or marginal distribution

**fit_and_generate**
We call fit_and_generate from the utilities file to generate the datasets and train the models.
fit_and_generate takes:
- functions: our list of functions (problems) defined earlier
- methods: our Series of methods defined earlier
- numinst: how many model instantiations to train
- numgen: how many points to sample from each trained model
- scaling: whether to scale the datasets before training
- obj_status: whether we are considering functional performance
- conditional_status: whether we are considering conditioning
- holdout: fraction of the dataset to hold out during training (used for rediscovery)

fit_and_generate returns a timestamp string naming the folder in which the results are saved.
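Assuming fit_and_generate trains one model per combination of problem, method, and instantiation, the experiment grid here has len(functions) × len(methods) × numinst = 2 × 2 × 3 = 12 training runs. The labels below are hypothetical placeholders; the sketch only enumerates that grid with itertools.product.

```python
from itertools import product

# Hypothetical labels standing in for the functions and methods defined above
problem_names = ["problem 1", "problem 2"]
method_names = ["GAN", "VAE"]
numinst = 3

# One training run per (problem, method, instantiation) combination
runs = list(product(problem_names, method_names, range(numinst)))
print(len(runs))  # 12
for problem, method, inst in runs[:3]:
    print(problem, method, inst)
```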

In [6]:
timestr = utils.fit_and_generate(functions, methods, numinst, numgen, scaling, obj_status, conditional_status, 0)

0
Lambda1 set to 0, DPP loss disabled; Ignoring CLF and REG...



**score**:
Next, we score the trained models. The score utility function takes:
- timestr: the timestamp string corresponding to the results we want to evaluate
- functions: our list of functions (problems) defined earlier
- methods: our Series of methods defined earlier
- metrics: the metrics to evaluate
- numinst: how many model instantiations to evaluate
- scaling: whether to scale the datasets before training
- cond_dist: whether conditional metrics are compared against the conditional or the marginal distribution
- scorebars: whether to print progress bars and evaluation status

score saves the scores in the folder indicated by timestr. Saving appears to require the openpyxl package: the run below ends with "ModuleNotFoundError: No module named 'openpyxl'" because it was not installed, so install it first (for example, with pip install openpyxl).
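The direction stored with each metric tells you how to read the scores. The layout of the files written by score is not shown here, so this sketch assumes a hypothetical table with one row per model and one column per metric; the numbers are illustrative placeholders for two hypothetical models "A" and "B", not results from this notebook.

```python
import pandas as pd

# Illustrative placeholder scores; not results from this notebook.
scores = pd.DataFrame(
    {"MMD": [0.02, 0.05], "F1": [0.80, 0.90]},
    index=["A", "B"],
)
directions = pd.Series({"MMD": "minimize", "F1": "maximize"})

# Pick the best model for each metric according to its direction
best = {
    name: (scores[name].idxmin() if directions[name] == "minimize" else scores[name].idxmax())
    for name in scores.columns
}
print(best)  # {'MMD': 'A', 'F1': 'B'}
```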

In [8]:
utils.score(timestr, functions, methods, metrics, numinst, scaling, cond_dist, scorebars)

Calculating Gen-Data Distance
Calculating Data-Gen Distance
Calculating F1
Calculating F10
Calculating F0.1
Calculating AUC
Calculating Maximum Mean Discrepancy
ModuleNotFoundError: No module named 'openpyxl'

**plot_all**:
Next, we plot the generated distributions. The plotting function takes:

- timestr: the timestamp string corresponding to the results we want to evaluate
- functions: our list of functions (problems) defined earlier
- methods: our Series of methods defined earlier
- numinst: how many model instantiations to plot
- scaling: whether to scale the datasets before training
- validity_status: whether we are considering constraints
- obj_status: whether we are considering functional performance
- conditional_status: whether we are considering conditioning
- cond_dist: whether conditional metrics are compared against the conditional or the marginal distribution

plot_all saves the plots in the folder indicated by timestr. If numinst is greater than 1, it saves an animation of the plots.
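plot_all handles plotting internally. For a quick manual check, here is a hedged sketch of the kind of figure involved: dataset samples and generated samples overlaid within the plotting range. plot_samples and its color parameter are hypothetical; the final "red" argument passed to plot_all is not documented in this notebook, so this sketch does not claim to reproduce it.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import numpy as np
from matplotlib import pyplot as plt

def plot_samples(data, generated, rangearr, path, color="red"):
    """Overlay dataset samples (grey) and generated samples (color) within rangearr."""
    fig, ax = plt.subplots(figsize=(4, 4))
    ax.scatter(data[:, 0], data[:, 1], s=2, c="lightgrey", label="dataset")
    ax.scatter(generated[:, 0], generated[:, 1], s=2, c=color, label="generated")
    ax.set_xlim(*rangearr[0])
    ax.set_ylim(*rangearr[1])
    ax.legend(loc="upper right")
    fig.savefig(path)
    plt.close(fig)

rng = np.random.default_rng(0)
rangearr = np.array([[-2, 2], [-2, 2]])
out_path = os.path.join(tempfile.mkdtemp(), "toy_plot.png")
plot_samples(rng.normal(size=(500, 2)), rng.normal(0.5, 0.5, size=(500, 2)), rangearr, out_path)
print(os.path.exists(out_path))
```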

In [None]:
utils.plot_all(timestr, functions, methods, numinst, scaling, validity_status, obj_status, conditional_status, cond_dist, "red")

**Special Metrics**

A few metrics require extra setup. For rediscovery, we designate a holdout fraction, which we pass to fit_and_generate. For ML efficacy, we need an auxiliary predictive task, which we encode here as objective functions.

In [None]:
from sklearn.metrics import r2_score
from sklearn.neighbors import KNeighborsRegressor

metrics["Rediscovery"] = ["minimize", evaluation.data_gen_distance_wrapper("x", "min")]
holdout = 0.05 #If using rediscovery, we need to hold out a portion of the data during training

metrics["ML Efficacy"] = ["maximize", evaluation.ML_efficacy_wrapper(KNeighborsRegressor(n_neighbors=5), r2_score)]
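In the cell above, Rediscovery uses data_gen_distance_wrapper, the same wrapper as "Nearest Generated Sample", which suggests it measures how closely the generated samples approach data the model never saw during training. Under that assumption, the sketch below holds out a fraction of the data (as the holdout argument does) and averages, over the held-out points, the distance to the nearest generated sample. split_holdout and rediscovery are hypothetical helpers, and the wrapper's actual reduction may differ.

```python
import numpy as np

def split_holdout(data, holdout, seed=0):
    """Randomly hold out a fraction of the rows; return (train, heldout)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(data))
    n_hold = int(round(holdout * len(data)))
    return data[idx[n_hold:]], data[idx[:n_hold]]

def rediscovery(heldout, generated):
    """Mean distance from each held-out point to its nearest generated sample."""
    dists = np.linalg.norm(heldout[:, None, :] - generated[None, :, :], axis=-1)
    return float(dists.min(axis=1).mean())

rng = np.random.default_rng(1)
data = rng.uniform(-2, 2, size=(1000, 2))
train, heldout = split_holdout(data, holdout=0.05)
print(train.shape, heldout.shape)  # (950, 2) (50, 2)

# A "generator" that only reproduces the training set vs. one that also covers held-out points
print(rediscovery(heldout, train), rediscovery(heldout, data))  # second value is 0.0
```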

In [None]:
functions=[]

DM_val = load_data.all_val_wrapper()

#In this case, we include objectives specifically for ML efficacy
DM_objs = [load_data.KNO1_a_wrapper(4,4), load_data.KNO1_b_wrapper(4,4)] 

pareto = np.stack([0.4705*np.linspace(0,1,1000), 0.4705*np.linspace(1,0,1000)], axis=1)
sampling_func_1 = load_data.sample_circle_blobs_wrapper(10000, 6, 1.3, 0.22) #Uniform Sampling with Number of positive samples & Negative Samples
sampling_func_2 = load_data.sample_circle_blobs_wrapper(10000, 2, 1.3, 0.22) #Uniform Sampling with Number of positive samples & Negative Samples

rangearr = np.array([[-2,2], [-2,2]])

functions.append([sampling_func_1, DM_val, DM_objs, rangearr, None, None])
functions.append([sampling_func_2, DM_val, DM_objs, rangearr, None, None])
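ML efficacy is commonly measured by training a predictive model on generated samples, labeled with the objective functions, and scoring its predictions on real data. Assuming ML_efficacy_wrapper follows this pattern, the sketch below reproduces it with the KNeighborsRegressor and r2_score used for the ML Efficacy metric. ml_efficacy and toy_objective are hypothetical, and the uniform generated_x array merely stands in for samples from a trained model.

```python
import numpy as np
from sklearn.metrics import r2_score
from sklearn.neighbors import KNeighborsRegressor

def ml_efficacy(generated_x, generated_y, real_x, real_y, model, score_fn):
    """Fit the model on generated samples, then score its predictions on real samples."""
    model.fit(generated_x, generated_y)
    return float(score_fn(real_y, model.predict(real_x)))

def toy_objective(x):
    """Hypothetical smooth objective used only for this illustration."""
    return np.sin(x[:, 0]) + x[:, 1] ** 2

rng = np.random.default_rng(0)
real_x = rng.uniform(-2, 2, size=(500, 2))
generated_x = rng.uniform(-2, 2, size=(500, 2))  # stand-in for samples from a trained model
score = ml_efficacy(generated_x, toy_objective(generated_x), real_x, toy_objective(real_x),
                    KNeighborsRegressor(n_neighbors=5), r2_score)
print(round(score, 3))
```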

In [None]:
timestr = utils.fit_and_generate(functions, methods, numinst, numgen, scaling, obj_status, conditional_status, holdout)

**Other Use Cases**

In this notebook, we have demonstrated how to evaluate multiple models on multiple problems in a distribution-matching setting. To evaluate models on other criteria, such as diversity, constraint satisfaction, performance, and conditioning, please refer to Notebook 3.