# Daptics Terminology and Concepts

This notebook provides definitions and basic explanations for some of the data objects
used by the Daptics GraphQL API. For additional help or information, 
please visit or contact Daptics.

On the web at https://daptics.ai  
By email at support@daptics.ai

Daptics API Version 0.7.2  
Copyright (c) 2019 Daptics Inc.

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), the rights to use, copy, modify, merge, publish, and/or distribute, copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

You do not have the right to sub-license or sell copies of the Software.

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

## Experimental Spaces

<span class="brand">daptics</span> handles two types of experimental space:

* **Factorial**, for the exploration of the set of possible combinations of specified *values* for the experimental parameters.
* **Mixture**, for the exploration of the set of possible *proportions* of each experimental parameter, chosen within a specified minimum and maximum.

---
##### Factorial experimental spaces
The experimental space is conceptually defined by a list of experimental parameters, with a set of numerical or categorical _values_ that may be taken by each parameter in an experiment. For example:

```
Param1: 1.1, 2, 3
Param2: 2, 3.56, 4
Param3: A, B, C
```

An experiment on *n* parameters lying in a factorial experimental space consists of a vector (_v_<sub>1</sub>, _v_<sub>2</sub>, … , _v_<sub>n</sub>), with element _v_<sub>i</sub> chosen from the possible values of the *i*-th parameter. For example, one experiment that lies in the space defined above is (1.1, 3.56, A). The experiment (5, 2, C), on the other hand, _does not_ lie in the space, because Param1 cannot take the value 5.
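
The membership rule above can be sketched in a few lines of Python. This is only an illustration: the dictionary layout and the function names `in_factorial_space` and `factorial_space_size` are assumptions made for this example and are not part of the Daptics API.

```python
from math import prod

# Hypothetical in-memory form of the example space above:
# parameter name -> allowed values (not a Daptics data structure).
FACTORIAL_SPACE = {
    "Param1": [1.1, 2, 3],
    "Param2": [2, 3.56, 4],
    "Param3": ["A", "B", "C"],
}


def in_factorial_space(experiment, space):
    """Return True if each element is an allowed value of its parameter."""
    allowed = list(space.values())
    if len(experiment) != len(allowed):
        return False
    return all(value in values for value, values in zip(experiment, allowed))


def factorial_space_size(space):
    """Number of distinct experiments: the product of the value counts."""
    return prod(len(values) for values in space.values())


print(in_factorial_space((1.1, 3.56, "A"), FACTORIAL_SPACE))  # True
print(in_factorial_space((5, 2, "C"), FACTORIAL_SPACE))       # False
print(factorial_space_size(FACTORIAL_SPACE))                  # 27
```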

---
##### Mixture experimental spaces
The experimental space is conceptually defined by a list of experimental parameters (as for factorial experimental spaces), along with two integer values for each parameter: the minimum and maximum number of _units_ (or "levels") that the parameter may take in an experiment. The total number of units in an experiment is constrained to equal *u<sub>sum</sub>*. For example:

```
Param1: 0, 4
Param2: 2, 6
Param3: 0, 7
```

with *u<sub>sum</sub>* set to 10.

For a given *u<sub>sum</sub>*, an experiment on *n* parameters lying in a mixture experimental space consists of an integer vector (*u*<sub>1</sub>, *u*<sub>2</sub>, ... , *u<sub>n</sub>*), with element *u<sub>i</sub>* chosen within the minimum and maximum number of units for the *i*-th parameter, and with all elements summing to *u<sub>sum</sub>*. For example, one experiment that lies in the space defined above is (0, 4, 6). Experiment (3, 1, 6), on the other hand, _does not_ lie in the space, because the minimum number of units for Param2 is 2. Experiment (2, 5, 6) also _does not_ lie in the space, because the total number of units is not equal to 10.

**N.B.**: Specifying mixture experiments with _units_ is equivalent to specifying mixture experiments with _proportions_. In fact, for a given *u<sub>sum</sub>* (fixed for all experiments in a mixture space), if *u<sub>i</sub>* is the number of units of the *i*-th parameter in a mixture experiment, *u<sub>i</sub>/u<sub>sum</sub>* is the *proportion* of the *i*-th parameter in that experiment.
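
As a minimal sketch of the two rules above (per-parameter bounds and the fixed total), together with the conversion from units to proportions, consider the following Python. The names `MIXTURE_SPACE`, `in_mixture_space` and `proportions` are hypothetical and chosen only for this example.

```python
# Hypothetical in-memory form of the example space above:
# parameter name -> (minimum units, maximum units).
MIXTURE_SPACE = {"Param1": (0, 4), "Param2": (2, 6), "Param3": (0, 7)}
TOTAL_UNITS = 10  # u_sum


def in_mixture_space(experiment, space, total_units):
    """Return True if every unit count is an integer within its bounds
    and the counts sum to total_units."""
    bounds = list(space.values())
    if len(experiment) != len(bounds):
        return False
    if not all(isinstance(units, int) for units in experiment):
        return False
    within = all(lo <= units <= hi for units, (lo, hi) in zip(experiment, bounds))
    return within and sum(experiment) == total_units


def proportions(experiment, total_units):
    """Convert unit counts u_i to proportions u_i / u_sum."""
    return [units / total_units for units in experiment]


print(in_mixture_space((0, 4, 6), MIXTURE_SPACE, TOTAL_UNITS))  # True
print(in_mixture_space((3, 1, 6), MIXTURE_SPACE, TOTAL_UNITS))  # False: Param2 below its minimum
print(in_mixture_space((2, 5, 6), MIXTURE_SPACE, TOTAL_UNITS))  # False: sums to 13
print(proportions((0, 4, 6), TOTAL_UNITS))                      # [0.0, 0.4, 0.6]
```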


## Generation Parameters

<span class="brand">daptics</span> iteratively designs experiments in batches, called *generations*. Each generation contains a sample of experiments intelligently selected from your experimental space. The total number of experiments *N<sub>exp</sub>* in each generation, or "generation size", depends on two parameters:

* Population size *N<sub>p</sub>*, which defines the number of _distinct_ experiments.
* Number of replicates *N<sub>r</sub>*, which defines the number of replicates for each of the distinct experiments.

The relationship between generation size and the above parameters is expressed by this formula: *N<sub>exp</sub>* = *N<sub>p</sub>* (*N<sub>r</sub>* + 1).

##### Requirements:
1. Both population size and number of replicates must be specified as integers.
2. The population size must be at least the larger of 10 and the number of experimental parameters.
3. The population size must be at most the smaller of 2500 and 1/3 of the total number of experiments in the experimental space.
4. The number of replicates must be between 0 and 10.

In the Python DapticsClient class these parameters are named "populationSize" and "replicates".
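
The formula and the requirements above can be checked before a campaign is set up. The sketch below is not part of the DapticsClient class; the function names are hypothetical, and it assumes that the bounds on the number of replicates (0 and 10) are inclusive, which the text does not state explicitly.

```python
def generation_size(population_size, replicates):
    """N_exp = N_p * (N_r + 1)."""
    return population_size * (replicates + 1)


def check_generation_parameters(population_size, replicates, n_parameters, space_size):
    """Return a list of violated requirements (empty if all are met).

    space_size is the total number of experiments in the experimental space.
    """
    problems = []
    if population_size < max(10, n_parameters):
        problems.append("population size is too small")
    # "At most 1/3 of the space" is tested without division to avoid rounding.
    if population_size > 2500 or 3 * population_size > space_size:
        problems.append("population size is too large")
    if not 0 <= replicates <= 10:  # assumption: both bounds are inclusive
        problems.append("replicates must be between 0 and 10")
    return problems


print(generation_size(30, 2))                      # 90
print(check_generation_parameters(30, 2, 3, 300))  # []
print(check_generation_parameters(5, 11, 3, 300))
```

Read this way, requirements 2 and 3 together imply that a space must contain at least 30 experiments: a space of 27 experiments, for instance, would cap the population size at 9, below the minimum of 10.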

## Mixture Volume Constraint Parameter

For all experiments in a mixture experimental space, the total number of units, summed over all experimental parameters, is constrained to be equal to a constant _u<sub>sum</sub>_. This constant must be specified as a positive integer.

In the Python DapticsClient class this constraint parameter is named "totalUnits".

## Experimental Space Definition
The experimental space definition is specified in a CSV file. The table consists of a row for each parameter, containing the parameter name, the parameter type, and two or more values, depending on whether the experimental space is factorial or mixture.

#### No header row:
The file is supplied without a header row. 

##### Parameter row requirements:
The CSV file will contain one row for each parameter defined. For each row:

1. Entries in each row must be separated by a comma.
2. The name of the parameter, in the first column, must start with a letter, followed by other letters and/or digits, and must contain no spaces.
3. The file must not contain empty rows between non-empty rows, nor empty columns between non-empty columns.

##### Additional requirements for factorial experimental spaces only:
For each row in a factorial experimental space:

1. The type of the parameter, in the second column, should be either 'numerical' or 'categorical'.
2. Each parameter must have at least two values, indicating the values that such parameter can take in an experiment. For example:

     ```
     Param1_name, numerical, 1.1, 2, 3
     Param2_name, numerical, 2, 3.56, 4
     Param3_name, categorical, A, B, C
     ```

3. Categorical parameters whose values can be conceptually _ordered_ should be specified as if they were numerical. For example, a categorical parameter with 3 ordered values (such as 'small', 'medium', 'large'), should be specified as follows:

```
Param_name, numerical, 1, 2, 3
```
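
Before uploading, a factorial definition file can be checked locally. The following is a minimal sketch using Python's standard `csv` module; `parse_factorial_space` is a hypothetical helper rather than a Daptics function, and it checks only the row-level rules listed above (type keyword, at least two values, no empty rows).

```python
import csv
import io

FACTORIAL_CSV = """\
Param1_name, numerical, 1.1, 2, 3
Param2_name, numerical, 2, 3.56, 4
Param3_name, categorical, A, B, C
"""


def parse_factorial_space(text):
    """Parse a header-less definition into {name: (type, values)}."""
    space = {}
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    for row_number, row in enumerate(reader, start=1):
        row = [cell.strip() for cell in row]
        # An empty row yields an empty list and is rejected here as well.
        if len(row) < 4 or "" in row:
            raise ValueError(f"row {row_number}: need a name, a type and at least two values")
        name, kind, *values = row
        if kind == "numerical":
            values = [float(value) for value in values]  # ValueError if not numeric
        elif kind != "categorical":
            raise ValueError(f"row {row_number}: type must be 'numerical' or 'categorical'")
        space[name] = (kind, values)
    return space


for name, (kind, values) in parse_factorial_space(FACTORIAL_CSV).items():
    print(name, kind, values)
```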

##### Additional requirements for mixture experimental spaces only:
For each row in a mixture experimental space:

1. The type of the parameter, in the second column, must be 'unit'.
2. Each parameter must have exactly two _non-negative integer_ values, indicating the minimum and maximum number of units that such parameter can take in an experiment. For example:

     ```
     Param1_name, unit, 0, 4
     Param2_name, unit, 2, 6
     Param3_name, unit, 0, 7
     ```

3. The minimum number of units for each parameter, summed over all parameters, must be smaller than the total number of units specified in the "totalUnits" constraint parameter.
4. The maximum number of units for a parameter must be greater than the minimum number of units for that parameter, and must be smaller than or equal to the total number of units specified in the "totalUnits" constraint parameter.
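
Requirements 2 to 4 can be expressed as a short consistency check. This is a sketch under the assumption that the rows have already been parsed into `(name, minimum, maximum)` tuples of integers; `check_mixture_space` is a hypothetical name, not a Daptics function.

```python
def check_mixture_space(rows, total_units):
    """Return a list of violated requirements (empty if all are met).

    rows is a list of (name, minimum, maximum) tuples with integer bounds.
    """
    problems = []
    for name, lo, hi in rows:
        if lo < 0 or hi < 0:
            problems.append(f"{name}: bounds must be non-negative")
        if not lo < hi:
            problems.append(f"{name}: maximum must be greater than minimum")
        if hi > total_units:
            problems.append(f"{name}: maximum exceeds totalUnits")
    if not sum(lo for _, lo, _ in rows) < total_units:
        problems.append("sum of minimums must be smaller than totalUnits")
    return problems


ROWS = [("Param1_name", 0, 4), ("Param2_name", 2, 6), ("Param3_name", 0, 7)]
print(check_mixture_space(ROWS, total_units=10))  # []
print(check_mixture_space(ROWS, total_units=2))   # four problems reported
```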

## Initial Experimental Results
At the beginning of a daptics campaign, you may supply, in a CSV file, the parameter values and response measurements for any initial experiments that have already been performed. The file consists of a header row, followed by a row for each experiment, containing the values taken by each parameter and the response measurement for that experiment.

#### Header row:
The file must contain a header row of parameter names that match exactly the names of the parameters
specified in the Experimental Space Definition table, and in the same order, separated by commas. 
After the columns for the parameter names, the header row must contain a final column named "Response".

##### Experiment row requirements:
For each experiment row, following the header row:

1. Entries in the row must be separated by a comma.
2. A value must be specified for each of the parameters and for the response of each experiment.
3. For **factorial** experimental spaces only, each _categorical_ parameter value must correspond to one of those specified in the experimental space definition, whereas _numerical_ parameters can take any numerical value.
4. For **mixture** experimental spaces only, the parameter values of each experiment must be non-negative integers and must sum to the "totalUnits" constraint parameter specified for the space.
5. The value in the final "Response" column must be numerical.
6. <span class="brand">daptics</span> assumes your goal is to find experiments that will _maximize_ the response value. If your goal is that of _minimizing_ the response, the response measurements must be multiplied by -1 before being specified in the "Response" column.
7. The file must not contain empty rows between non-empty rows, nor empty columns between non-empty columns.


Example:
```
Param1_name, Param2_name, Param3_name, Response
1.1, 3.56, A, 10
2, 5, C, 12.35
0, 2, A, 3.23
```
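
The header and row rules for a **factorial** space can be checked with a small reader such as the one below. It is a sketch, not Daptics code: `read_factorial_results` and its arguments are hypothetical, and the `minimize` flag simply applies the multiplication by -1 described in requirement 6.

```python
import csv
import io

RESULTS_CSV = """\
Param1_name, Param2_name, Param3_name, Response
1.1, 3.56, A, 10
2, 5, C, 12.35
0, 2, A, 3.23
"""


def read_factorial_results(text, parameter_names, categorical_values, minimize=False):
    """Validate a results file and return a list of (values, response) pairs.

    categorical_values maps each categorical parameter name to its allowed
    values; every other parameter is treated as numerical.
    """
    rows = [[cell.strip() for cell in row]
            for row in csv.reader(io.StringIO(text), skipinitialspace=True)]
    if not rows or rows[0] != [*parameter_names, "Response"]:
        raise ValueError("header must list the parameter names in order, then 'Response'")
    experiments = []
    for line_number, row in enumerate(rows[1:], start=2):
        if len(row) != len(rows[0]) or "" in row:
            raise ValueError(f"line {line_number}: every column needs a value")
        *values, response = row
        parsed = []
        for name, value in zip(parameter_names, values):
            if name in categorical_values:
                if value not in categorical_values[name]:
                    raise ValueError(f"line {line_number}: {value!r} is not a value of {name}")
                parsed.append(value)
            else:
                parsed.append(float(value))  # any numerical value is accepted
        response = float(response)
        experiments.append((parsed, -response if minimize else response))
    return experiments


NAMES = ["Param1_name", "Param2_name", "Param3_name"]
CATEGORICAL = {"Param3_name": ["A", "B", "C"]}
for values, response in read_factorial_results(RESULTS_CSV, NAMES, CATEGORICAL):
    print(values, response)
```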

## Design Generation

A generation consists of a "design" of experiments, corresponding to a sample of experiments intelligently selected from your experimental space.
If your choice of generation parameters includes 1 or more replicates, the generation will also include replicates of each experiment in the design.
The order in which experiments and replicates are listed in the generation is random. Performing experiments in this non-systematic order helps to reduce bias in the response measurements in case of systematic experimental error (for example, dependent on the specific location where an experiment is performed on a microwell plate).

After you specify the experimental results of the current generation, <span class="brand">daptics</span> will use those results, along with those from all previous generations, to select the best design for the next generation.
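
To make the layout of a generation concrete, the sketch below expands a design with replicates and shuffles the rows. It only illustrates the structure described above (each distinct experiment appearing *N<sub>r</sub>* + 1 times, in random order); it is not the procedure daptics uses to select or order experiments, and `lay_out_generation` is a hypothetical name.

```python
import random
from collections import Counter


def lay_out_generation(design, replicates, seed=None):
    """Repeat each distinct experiment (replicates + 1) times, then shuffle."""
    rng = random.Random(seed)
    rows = [experiment for experiment in design for _ in range(replicates + 1)]
    rng.shuffle(rows)
    return rows


design = [(1.1, 2, "C"), (2, 4, "C"), (2, 2, "B")]  # illustrative values only
generation = lay_out_generation(design, replicates=1, seed=0)
print(len(generation))      # 6 = 3 * (1 + 1)
print(Counter(generation))  # each distinct experiment appears twice
```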

## Generated Experimental Results

Responses for <span class="brand">daptics</span>-designed experiments, together with parameter and response values for any extra experiments that have been performed, may be uploaded using a file in <span class="brand">daptics</span>'s standard CSV format. The file consists of a header row, and a row for each experiment, containing the values taken by each parameter and the response measurement for that experiment.

#### Header row:
The file must contain a header row of parameter names that match exactly the names of the parameters
specified in the Experimental Space Definition table, and in the same order, separated by commas. 
After the columns for the parameter names, the header row must contain a final column named "Response".

##### Experiment row requirements:
For each experiment row, following the header row:

1. Entries in the row must be separated by a comma.
2. Each of the rows below the header row must contain parameter and response values for a different <span class="brand">daptics</span>-designed experiment. The parameter values for these rows must match exactly those given in the generated design.
3. After the <span class="brand">daptics</span>-designed experiment rows, you may _optionally_ add more rows with parameter and response values for any extra experiments you wish to include.
4. For **factorial** experimental spaces only, for each extra experiment, the value of each _categorical_ parameter must correspond to one of those specified in the experimental space definition, whereas _numerical_ parameters can take any numerical value.
5. For **mixture** experimental spaces only, for each extra experiment, the parameter values must be non-negative integers and must sum to the "totalUnits" constraint parameter specified for the space.
6. The value in the final "Response" column, for the <span class="brand">daptics</span>-designed experiments and for any extra experiments, must be numerical.
7. <span class="brand">daptics</span> assumes your goal is to find experiments that will _maximize_ the response value. If your goal is that of _minimizing_ the response, the response measurements must be multiplied by -1 before being specified in the "Response" column.
8. Missing response measurements for the <span class="brand">daptics</span>-designed experiments must be specified as "-" (minus sign).
9. Extra experiments, if specified, must not contain any missing response measurements.
10. The file must not contain empty rows between non-empty rows, nor empty columns between non-empty columns.

Example:


```
Param1_name, Param2_name, Param3_name, Response
1.1, 2, C, 11.5
2, 4, C, 9.07
2, 2, B, 4.35
```
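
A results file that follows these rules, including the "-" marker for missing responses and optional extra experiments at the end, could be assembled as in the sketch below. The function `generation_results_csv` and its arguments are hypothetical; a missing measurement is represented here by `None`, and the extra experiment shown uses made-up values.

```python
import csv
import io


def generation_results_csv(parameter_names, designed, responses, extras=()):
    """Build the CSV text for one generation.

    designed  : parameter-value tuples, in the order of the generated design
    responses : one response per designed experiment, or None if missing
    extras    : optional (values, response) pairs for extra experiments
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([*parameter_names, "Response"])
    for values, response in zip(designed, responses):
        writer.writerow([*values, "-" if response is None else response])
    for values, response in extras:
        if response is None:
            raise ValueError("extra experiments must not have missing responses")
        writer.writerow([*values, response])
    return buffer.getvalue()


text = generation_results_csv(
    ["Param1_name", "Param2_name", "Param3_name"],
    designed=[(1.1, 2, "C"), (2, 4, "C"), (2, 2, "B")],
    responses=[11.5, None, 4.35],       # second measurement is missing
    extras=[((3, 3.56, "A"), 7.2)],     # illustrative extra experiment
)
print(text)
```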
