# Problem Definition Examples

This Jupyter Notebook demonstrates the usage of the ProblemDefinition class for defining machine learning problems using the PLAID library. It includes examples of:

1. Initializing an empty ProblemDefinition
2. Configuring problem characteristics and retrieving data
3. Saving and loading problem definitions

**Each section is documented and explained.**

In [2]:
# Import required libraries
import numpy as np
import os

In [3]:
# Import the PLAID classes and functions used below
from plaid.containers.dataset import Dataset, Sample
from plaid.problem_definition import ProblemDefinition
from plaid.utils.split import split_dataset

## Section 1: Initializing an Empty ProblemDefinition

This section demonstrates how to initialize a Problem Definition and add inputs / outputs.

### Initialize and print ProblemDefinition

In [4]:
print("#---# Empty ProblemDefinition")
problem = ProblemDefinition()
print(f"{problem = }")

#---# Empty ProblemDefinition
problem = ProblemDefinition()


### Add inputs / outputs to a Problem Definition

In [5]:
# Add unique input and output variables
problem.add_input('in')
problem.add_output('out')

# Add list of input and output variables
problem.add_inputs(['in2', 'in3'])
problem.add_outputs(['out2'])

print(f"{problem.get_inputs() = }")
print(f"{problem.get_outputs() = }")

problem.get_inputs() = ['in', 'in2', 'in3']
problem.get_outputs() = ['out', 'out2']


## Section 2: Configuring Problem Characteristics and Retrieving Data

This section demonstrates how to handle and configure ProblemDefinition objects and access data.

### Set Problem Definition task

In [6]:
# Set the task type (e.g., regression)
problem.set_task('regression')
print(f"{problem.get_task() = }")

problem.get_task() = 'regression'


### Set Problem Definition split

In [7]:
# Init an empty Dataset
dataset = Dataset()
print(f"{dataset = }")

# Add Samples
dataset.add_samples([Sample(), Sample(), Sample(), Sample()])
print(f"{dataset = }")

dataset = Dataset(0 samples, 0 scalars, 0 fields)
dataset = Dataset(4 samples, 0 scalars, 0 fields)


In [8]:
# Set strategy options for the split
options = {
    'shuffle': False,
    'split_sizes': {
        'train': 2,
        'val': 1,
    },
}

split = split_dataset(dataset, options)
print(f"{split = }")

split = {'train': [0, 1], 'val': [2], 'other': [3]}
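
The result above can be reproduced with a few lines of plain Python. The sketch below is not PLAID's implementation: `sequential_split` is a hypothetical helper, and the rule that leftover indices land in an `'other'` subset is an assumption inferred from the printed output for the unshuffled case.

```python
def sequential_split(n_samples, split_sizes):
    """Assign consecutive indices to each named subset, in dict order.

    Hypothetical helper, not part of PLAID. Indices left over once every
    requested subset is filled are collected under the key 'other'
    (an assumption based on the output shown above).
    """
    total = sum(split_sizes.values())
    if total > n_samples:
        raise ValueError(
            f"requested {total} samples but only {n_samples} are available"
        )

    split = {}
    start = 0
    for name, size in split_sizes.items():
        # Each subset takes the next `size` consecutive indices.
        split[name] = list(range(start, start + size))
        start += size

    # Whatever remains goes into a catch-all subset.
    if start < n_samples:
        split["other"] = list(range(start, n_samples))
    return split


split = sequential_split(4, {"train": 2, "val": 1})
print(f"{split = }")
# split = {'train': [0, 1], 'val': [2], 'other': [3]}
```

With `'shuffle': False` the subsets are contiguous index ranges, which makes the split easy to check by hand on a small dataset.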


In [9]:
problem.set_split(split)
print(f"{problem.get_split() = }")

problem.get_split() = {'train': [0, 1], 'val': [2], 'other': [3]}


### Retrieve Problem Definition split indices

In [10]:
# Get all split indices
print(f"{problem.get_all_indices() = }")

problem.get_all_indices() = [0, 1, 2, 3]
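
Conceptually, this call gathers the indices of every subset into one list. A minimal sketch of that idea follows. `all_indices` is a hypothetical helper, and sorting plus deduplicating the result is an assumption rather than documented PLAID behavior.

```python
def all_indices(split):
    """Merge the index lists of every subset into one sorted list.

    Hypothetical helper, not part of PLAID. Deduplication and sorting
    are assumptions made for this sketch.
    """
    merged = set()
    for indices in split.values():
        merged.update(indices)
    return sorted(merged)


split = {"train": [0, 1], "val": [2], "other": [3]}
print(f"{all_indices(split) = }")
# all_indices(split) = [0, 1, 2, 3]
```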


### Filter Problem Definition inputs / outputs by name

In [11]:
print(f"{problem.filter_input_names(['in', 'in3', 'in5']) = }")
print(f"{problem.filter_output_names(['out', 'out3', 'out5']) = }")

problem.filter_input_names(['in', 'in3', 'in5']) = ['in', 'in3']
problem.filter_output_names(['out', 'out3', 'out5']) = ['out']
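
The filtering shown above keeps only the requested names that are registered on the problem and silently drops the rest. A plain-Python sketch of that behavior, assuming the result follows the order of the requested names (`filter_names` is a hypothetical helper, not PLAID's code):

```python
def filter_names(known, requested):
    """Return the requested names that also appear in `known`.

    Hypothetical helper, not part of PLAID. Keeping the order of
    `requested` is an assumption made for this sketch.
    """
    known_set = set(known)  # set membership tests are O(1)
    return [name for name in requested if name in known_set]


inputs = ["in", "in2", "in3"]
outputs = ["out", "out2"]

print(filter_names(inputs, ["in", "in3", "in5"]))
# ['in', 'in3']
print(filter_names(outputs, ["out", "out3", "out5"]))
# ['out']
```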


## Section 3: Saving and Loading Problem Definitions

This section demonstrates how to save and load a Problem Definition from a directory.

### Save a Problem Definition to a directory

In [12]:
# Build a throwaway directory name with a random suffix
test_pth = f"/tmp/test_safe_to_delete_{np.random.randint(1e10, 1e12)}"
pb_def_save_fname = os.path.join(test_pth, 'test')
os.makedirs(test_pth)
print(f"saving path: {pb_def_save_fname}")

# Write the problem definition to disk
problem._save_to_dir_(pb_def_save_fname)

saving path: /tmp/test_safe_to_delete_195515190983/test
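
The hard-coded `/tmp` prefix and random suffix work on most Unix systems. The standard library's `tempfile` module is a portable alternative that also cleans up after itself. The sketch below uses only the standard library and marks where the save call would go. Reusing the names `test_pth` and `pb_def_save_fname` is purely illustrative.

```python
import os
import tempfile

# TemporaryDirectory creates a uniquely named directory and removes it,
# with all of its contents, when the `with` block exits.
with tempfile.TemporaryDirectory(prefix="test_safe_to_delete_") as test_pth:
    pb_def_save_fname = os.path.join(test_pth, "test")
    print(f"saving path: {pb_def_save_fname}")

    # The save call from the cell above would go here, e.g.
    # problem._save_to_dir_(pb_def_save_fname)
    existed_inside = os.path.isdir(test_pth)

# The directory no longer exists once the block has exited.
still_exists = os.path.exists(test_pth)
print(f"{existed_inside = }, {still_exists = }")
```

Any loading step that reads the saved files has to run inside the `with` block, since the directory is gone afterwards.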


### Load a ProblemDefinition from a directory via initialization

In [13]:
problem = ProblemDefinition(pb_def_save_fname)
print(problem)

ProblemDefinition(input_names=['in', 'in2', 'in3'], output_names=['out', 'out2'], task='regression', split_names=['train', 'val', 'other'])


### Load from a directory via the ProblemDefinition class

In [14]:
problem = ProblemDefinition.load(pb_def_save_fname)
print(problem)

ProblemDefinition(input_names=['in', 'in2', 'in3'], output_names=['out', 'out2'], task='regression', split_names=['train', 'val', 'other'])


### Load from a directory via a ProblemDefinition instance

In [15]:
problem = ProblemDefinition()
problem._load_from_dir_(pb_def_save_fname)
print(problem)

ProblemDefinition(input_names=['in', 'in2', 'in3'], output_names=['out', 'out2'], task='regression', split_names=['train', 'val', 'other'])
