# Creating a `Dataset` and populating it with data

In [19]:
# Uninstall old version and reinstall in editable mode
import sys
import subprocess

# Uninstall the currently installed version
subprocess.check_call([sys.executable, "-m", "pip", "uninstall", "-y", "mempyfit"])
# Reinstall in editable mode from the local checkout (adjust the path to your own clone)
subprocess.check_call([sys.executable, "-m", "pip", "install", "-e", r"c:\Users\simon\Documents\mempyfit"])

0

In [20]:
import numpy as np  # used explicitly for the array-valued entries below
from mempyfit import *

We can start by instantiating an empty `Dataset`:

In [21]:
ds = Dataset()

Now we successively add entries.
The process is somewhat similar to what is done in AmPtool, with some (intended) syntactical differences.

Below is the minimum amount of information that we need to enter:

In [22]:
ds.add(
    name = 'N_max',
    value = 100,
    units = '#',
    labels = 'maximum cell abundance',
)

[INFO] Assuming N_max to be zerovariate


This adds a single data point, which we refer to as "zerovariate".

The entered `value` does not have to be a scalar, though; it can also be an array.

In [23]:
ds.add(
    name = 'tN', 
    value = np.array([
        [0, 0.1], 
        [1, 0.2], 
        [3, 0.4],
        [4, 0.45], 
        [5, 0.475]
    ]), 
    units = ['d', '#'], 
    labels = ['time', 'cell count']
)

[INFO] Assuming tN to be univariate


By default, `mempyfit` assumes that matrices are column-oriented. That is, each column is a different variable, and each row is a different observation.

It is possible to override these defaults, but doing so means more work.

We also assume that the last column is the dependent variable, and (for now) that each entry has exactly one dependent variable. This follows the Add-my-Pet style of entering data.
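The column-oriented convention can be illustrated with plain NumPy. This is only a sketch of the layout described above, not `mempyfit` code; the variable names (`independent`, `dependent`, `row_oriented`) are made up for the example.

```python
import numpy as np

# Same layout as the 'tN' entry: one row per observation, one column per variable.
tN = np.array([
    [0, 0.1],
    [1, 0.2],
    [3, 0.4],
    [4, 0.45],
    [5, 0.475],
])

# All columns except the last are independent variables; the last is the dependent one.
independent = tN[:, :-1]  # shape (5, 1): time
dependent = tN[:, -1]     # shape (5,): cell count

# Data stored row-wise (one row per variable) can be transposed before it is entered.
row_oriented = np.array([
    [0, 1, 3, 4, 5],
    [0.1, 0.2, 0.4, 0.45, 0.475],
])
column_oriented = row_oriented.T  # shape (5, 2), same layout as tN
```

Transposing once at entry time is probably the simplest way to stay within the default convention.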

In [24]:
ds.add(
    name = 't-TN', 
    value = np.array([
        [0, 25, 0.1], 
        [1, 25, 0.2], 
        [3, 25, 0.4],
        [4, 25, 0.45], 
        [5, 25, 0.475], 
        [0, 27.5, 0.1], 
        [1, 27.5, 0.25], 
        [3, 27.5, 0.45],
        [4, 27.5, 0.475], 
        [5, 27.5, 0.48]
    ]), 
    units = ['d', '°C', '#'], 
    labels = ['time', 'temperature', 'cell count']
)

[INFO] Assuming t-TN to be multivariate
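
The `[INFO]` messages above suggest that the dimensionality is inferred from the shape of `value`. The following is a minimal sketch of how such a rule could look, assuming that scalars are zerovariate, two-column matrices are univariate, and matrices with more than two columns are multivariate. The function `infer_dimensionality` is hypothetical and is not `mempyfit`'s actual implementation.

```python
import numpy as np

def infer_dimensionality(value):
    """Hypothetical helper: guess a dimensionality label from the shape of `value`."""
    arr = np.asarray(value, dtype=float)
    if arr.ndim == 0 or arr.size == 1:
        return "ZEROVARIATE"   # a single data point
    if arr.ndim == 2 and arr.shape[1] == 2:
        return "UNIVARIATE"    # one independent and one dependent column
    if arr.ndim == 2 and arr.shape[1] > 2:
        return "MULTIVARIATE"  # several independent columns, one dependent column
    raise ValueError(f"cannot infer dimensionality for shape {arr.shape}")

print(infer_dimensionality(100))               # ZEROVARIATE
print(infer_dimensionality(np.zeros((5, 2))))  # UNIVARIATE
print(infer_dimensionality(np.zeros((10, 3)))) # MULTIVARIATE
```

When the shape alone is ambiguous, the `dimensionality_type` argument listed in the signature of `add` presumably allows setting the type explicitly.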


Printing the dataset gives us an overview of what we have added:

In [25]:
ds

<Dataset with 3 entries>
  1. N_max (ZEROVARIATE) [['#']] @ nan K
  2. tN (UNIVARIATE) [['d', '#']] @ nan K
  3. t-TN (MULTIVARIATE) [['d', '°C', '#']] @ nan K

We can also retrieve all the information on a specific entry:

In [26]:
ds.getinfo('t-TN')

OrderedDict([('name', 't-TN'),
             ('value', array([[ 0.   , 25.   ,  0.1  ],
                     [ 1.   , 25.   ,  0.2  ],
                     [ 3.   , 25.   ,  0.4  ],
                     [ 4.   , 25.   ,  0.45 ],
                     [ 5.   , 25.   ,  0.475],
                     [ 0.   , 27.5  ,  0.1  ],
                     [ 1.   , 27.5  ,  0.25 ],
                     [ 3.   , 27.5  ,  0.45 ],
                     [ 4.   , 27.5  ,  0.475],
                     [ 5.   , 27.5  ,  0.48 ]])),
             ('units', ['d', '°C', '#']),
             ('labels', ['time', 'temperature', 'cell count']),
             ('temperature', nan),
             ('temperature_units', 'K'),
             ('dimensionality_type', 'MULTIVARIATE'),
             ('bibkey', ''),
             ('comment', '')])

Alternatively, indexing the dataset by name returns just the value:

In [27]:
ds['t-TN']

array([[ 0.   , 25.   ,  0.1  ],
       [ 1.   , 25.   ,  0.2  ],
       [ 3.   , 25.   ,  0.4  ],
       [ 4.   , 25.   ,  0.45 ],
       [ 5.   , 25.   ,  0.475],
       [ 0.   , 27.5  ,  0.1  ],
       [ 1.   , 27.5  ,  0.25 ],
       [ 3.   , 27.5  ,  0.45 ],
       [ 4.   , 27.5  ,  0.475],
       [ 5.   , 27.5  ,  0.48 ]])
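
Since the returned value is an ordinary NumPy array, a multivariate entry can be split with standard indexing. The sketch below copies the array literally so that it runs without `mempyfit`; the names `t_TN` and `series` are chosen for illustration only.

```python
import numpy as np

# Same values as ds['t-TN']: columns are time (d), temperature (°C), cell count (#).
t_TN = np.array([
    [0, 25, 0.1],
    [1, 25, 0.2],
    [3, 25, 0.4],
    [4, 25, 0.45],
    [5, 25, 0.475],
    [0, 27.5, 0.1],
    [1, 27.5, 0.25],
    [3, 27.5, 0.45],
    [4, 27.5, 0.475],
    [5, 27.5, 0.48],
])

# One (time, cell count) time series per temperature level.
temperatures = np.unique(t_TN[:, 1])
series = {
    float(T): t_TN[t_TN[:, 1] == T][:, [0, 2]]
    for T in temperatures
}

for T, data in series.items():
    print(T, data.shape)
```

Exact comparison with `==` works here because the temperature levels are entered as literals; for measured or computed levels, `np.isclose` would be the safer choice.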

See below for all arguments that can be passed to the `add` method.

In [28]:
help(ds.add)

Help on method add in module mempyfit.dataset:

add(name: str, value, units, labels, temperature: float = nan, temperature_unit: str = 'K', dimensionality_type: mempyfit.dataset.DimensionalityType | None = None, bibkey: str = '', comment: str = '') -> None method of mempyfit.dataset.Dataset instance

