# Managing annotations

`MuData` objects store multimodal annotations in the same way as `AnnData` objects do. For instance, observations are annotated using the `.obs` table, and variables are annotated using the `.var` table.

As the observations and variables of a `MuData` object are derived from the observations and variables of its individual modalities, it can be useful to copy or move annotations between the global table and the tables of the individual modalities.

For this, `mudata` offers `.pull_obs()` / `.pull_var()` methods to copy metadata from individual modalities to the global annotation (`.obs` or `.var`). The opposite flow of metadata — from global metadata to individual modalities — can be achieved with `.push_obs()` / `.push_var()` methods.

In [1]:
import numpy as np
import pandas as pd
from anndata import AnnData
from mudata import MuData

## Annotations in multimodal objects

### Pulling annotations

There are a few parameters that help specify which annotations are pulled. Generally, there are two ways of selecting the annotation columns: providing them explicitly with `columns=[...]`, or specifying the types of columns to be pulled (e.g. `common` or `unique`).
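The column types can be told apart purely from which modalities contain a given column name: *common* columns occur in all modalities, *non-unique* ones in more than one but not all, and *unique* ones in exactly one. The helper below is a hypothetical sketch that applies these definitions; it is not part of `mudata`, and the library's own bookkeeping may differ.

```python
from collections import Counter


def classify_columns(columns_by_mod: dict[str, list[str]]) -> dict[str, str]:
    """Label each column name as 'common', 'nonunique' or 'unique'.

    Hypothetical helper, not part of mudata. Under these definitions,
    with a single modality every column counts as 'common'.
    """
    n_mods = len(columns_by_mod)
    # In how many modalities does each column name occur?
    counts = Counter(col for cols in columns_by_mod.values() for col in set(cols))
    kinds = {}
    for col, n in counts.items():
        if n == n_mods:
            kinds[col] = "common"
        elif n > 1:
            kinds[col] = "nonunique"
        else:
            kinds[col] = "unique"
    return kinds


# Column names of mod1/mod2/mod3 .var in the example that follows
var_columns = {
    "mod1": ["highly_variable"],
    "mod2": ["highly_variable", "arange"],
    "mod3": ["highly_variable", "arange", "is_region"],
}
print(classify_columns(var_columns))
# {'highly_variable': 'common', 'arange': 'nonunique', 'is_region': 'unique'}
```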

#### Pulling feature annotations with `.pull_var()`

For demonstration purposes, we will use a simple `MuData` object with some annotations for the features:

In [2]:
def make_mdata():
    N = 100
    D1, D2, D3 = 10, 20, 30
    D = D1 + D2 + D3

    mod1 = AnnData(np.arange(0, 100, 0.1).reshape(-1, D1))
    mod1.obs_names = [f"obs{i}" for i in range(mod1.n_obs)]
    mod1.var_names = [f"var{i}" for i in range(D1)]

    mod2 = AnnData(np.arange(3101, 5101, 1).reshape(-1, D2))
    mod2.obs_names = mod1.obs_names.copy()
    mod2.var_names = [f"var{i}" for i in range(D1, D1 + D2)]

    mod3 = AnnData(np.arange(5101, 8101, 1).reshape(-1, D3))
    mod3.obs_names = mod1.obs_names.copy()
    mod3.var_names = [f"var{i}" for i in range(D1 + D2, D)]

    # common column already present in all modalities
    mod1.var["highly_variable"] = True
    mod2.var["highly_variable"] = np.tile([False, True], D2 // 2)
    mod3.var["highly_variable"] = np.tile([True, False], D3 // 2)

    # column present in some (2 out of 3) modalities (non-unique)
    mod2.var["arange"] = np.arange(D2)
    mod3.var["arange"] = np.arange(D3)

    # column present in one modality (unique)
    mod3.var["is_region"] = True

    mdata = MuData({"mod1": mod1, "mod2": mod2, "mod3": mod3})
    return mdata

In [3]:
mdata = make_mdata()
# TODO: shouldn't be needed from 0.4
# mdata.update(pull=False)
mdata.var = mdata.var.loc[:,[]]

In [4]:
mdata

**All columns.** By default, all columns will be pulled:

In [5]:
mdata.pull_var(join_nonunique=True)
mdata.var.dtypes

highly_variable    boolean
arange             float64
mod3:is_region     boolean
dtype: object

In [6]:
# Clean up
mdata.var = mdata.var.loc[:,[]]

**`columns=...`** Individual columns can be specified to be used in this operation. Both `colname` and `modname:colname` formats are supported.

A column that is present across modalities will be pulled from all the modalities:

In [7]:
mdata.pull_var(columns=["highly_variable"])
print(f"{(~pd.isnull(mdata.var.highly_variable)).sum()} values in mdata.var.highly_variable")
mdata.var.dtypes

60 values in mdata.var.highly_variable


highly_variable    boolean
dtype: object

In [8]:
mdata.var = mdata.var.loc[:,[]]

Pull particular columns, e.g. a single column from a specified modality:

In [9]:
mdata.pull_var(columns=["mod2:highly_variable"])
print(f"{(~pd.isnull(mdata.var['mod2:highly_variable'])).sum()} values in mdata.var['mod2:highly_variable']")
mdata.var.dtypes

20 values in mdata.var['mod2:highly_variable']


mod2:highly_variable    boolean
dtype: object

In [10]:
mdata.var = mdata.var.loc[:,[]]

As a result, `mdata.var['mod2:highly_variable']` will be [a nullable boolean array](https://pandas.pydata.org/docs/user_guide/boolean.html) with the corresponding values from `mdata['mod2'].var.highly_variable`.
For features from the other modalities, the value of `mod2:highly_variable` is `NA`.
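Why a nullable dtype? Reindexing a per-modality column onto the global feature index leaves gaps for the features of the other modalities. Reindexing a plain `bool` Series with missing labels would fall back to an `object` column holding `NaN`, whereas pandas' nullable `"boolean"` dtype stores the gaps as `pd.NA`. The pandas-only sketch below reproduces the effect on toy data shaped like `make_mdata()`; it illustrates the resulting column, not necessarily how `mudata` builds it.

```python
import pandas as pd

# Global feature index var0..var59, as in make_mdata() (D1 + D2 + D3 = 60)
global_index = pd.Index([f"var{i}" for i in range(60)])

# mod2 covers var10..var29; its highly_variable alternates False/True
mod2_hv = pd.Series([False, True] * 10, index=[f"var{i}" for i in range(10, 30)])

# Converting to the nullable "boolean" dtype before reindexing keeps the dtype;
# features of the other modalities become pd.NA
pulled = mod2_hv.astype("boolean").reindex(global_index)

print(pulled.dtype)          # boolean
print(pulled.notna().sum())  # 20
```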

**`common`, `nonunique`, `unique`** Note that the *common* annotation `highly_variable` is prefixed with the modality name above, as it has been requested from a limited set of modalities. In this case it behaves similarly to a *unique* column such as `is_region` in `mod3`. The third type of annotations is *non-unique*: those are present in some but not all modalities.

In [11]:
mdata.pull_var(common=True, nonunique=True, unique=False)
mdata.var.dtypes

highly_variable    boolean
mod2:arange        float64
mod3:arange        float64
dtype: object

In [12]:
mdata.var = mdata.var.loc[:,[]]

This makes it possible to pull only unique, i.e. modality-specific, columns:

In [13]:
# unique column
mdata.pull_var(unique=True, common=False, nonunique=False)
mdata.var.dtypes

mod3:is_region    boolean
dtype: object

In [14]:
mdata.var = mdata.var.loc[:,[]]

... just as it is possible to pull a specific *unique* column without specifying the modality name:

In [15]:
# unique column
mdata.pull_var(columns=["is_region"])
mdata.var.dtypes

mod3:is_region    boolean
dtype: object

In [16]:
mdata.var = mdata.var.loc[:,[]]

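**`join_common=..., join_nonunique=...`** Use `join_common=` and `join_nonunique=` to control whether common and non-unique annotations are joined into a single column across modalities or kept as separate columns prefixed with the modality name. As the outputs above show, common columns are joined by default while non-unique ones are not. Unique columns are not affected by these options and are prefixed with the modality name by default.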
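The difference between joining and prefixing can be illustrated with plain pandas. In this sketch, `pull_column()` is a hypothetical helper rather than a `mudata` function, and it relies on the feature sets being disjoint across modalities, which holds for the multimodal object used here. Integer values that gain missing entries are upcast to `float64`, which is likely why the `arange` columns above are reported as `float64`.

```python
import numpy as np
import pandas as pd

# Per-modality feature tables sharing the non-unique column "arange" (cf. make_mdata())
var2 = pd.DataFrame({"arange": np.arange(20)}, index=[f"var{i}" for i in range(10, 30)])
var3 = pd.DataFrame({"arange": np.arange(30)}, index=[f"var{i}" for i in range(30, 60)])
global_index = pd.Index([f"var{i}" for i in range(60)])


def pull_column(tables: dict, column: str, join: bool) -> pd.DataFrame:
    """Hypothetical illustration of joined vs prefixed pulling, not mudata code."""
    if join:
        # Feature sets are disjoint, so the pieces stack into one column;
        # features of modalities without the column become NaN
        stacked = pd.concat([t[column] for t in tables.values()])
        return stacked.reindex(global_index).to_frame(column)
    # Otherwise keep one column per modality, prefixed with its name
    return pd.DataFrame(
        {f"{mod}:{column}": t[column].reindex(global_index) for mod, t in tables.items()}
    )


tables = {"mod2": var2, "mod3": var3}
print(pull_column(tables, "arange", join=True).columns.tolist())   # ['arange']
print(pull_column(tables, "arange", join=False).columns.tolist())  # ['mod2:arange', 'mod3:arange']
```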

Compare `join_nonunique=False`:

In [17]:
mdata.pull_var(columns=["arange"], join_nonunique=False)
mdata.var.dtypes

mod2:arange    float64
mod3:arange    float64
dtype: object

In [18]:
mdata.var = mdata.var.loc[:,[]]

— with `join_nonunique=True`:

In [19]:
mdata.pull_var(columns=["arange"], join_nonunique=True)
mdata.var.dtypes

arange    float64
dtype: object

In [20]:
mdata.var = mdata.var.loc[:,[]]

**`mods=...`** It is also possible to limit the number of modalities to pull columns from. For example, `columns=["mod1:highly_variable", "mod3:highly_variable"]` can also be expressed as

In [21]:
mdata.pull_var(columns=["highly_variable"], mods=["mod1", "mod3"])
mdata.var.dtypes

mod1:highly_variable    boolean
mod3:highly_variable    boolean
dtype: object

In [22]:
mdata.var = mdata.var.loc[:,[]]
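Conceptually, `columns=` and `mods=` together resolve to a list of (modality, column) pairs. The hypothetical helper below sketches that resolution, assuming that a bare name matches every selected modality that has such a column while a `modname:colname` spec names its modality explicitly; `mudata`'s own resolution may differ in details such as error handling.

```python
def expand_columns(columns, mods, available):
    """Resolve column specs into explicit (modality, column) pairs.

    Hypothetical helper: `available` maps modality name -> its column names,
    and mods=None means "all modalities".
    """
    mods = list(available) if mods is None else list(mods)
    pairs = []
    for spec in columns:
        if ":" in spec:
            # "modname:colname" names the modality explicitly
            mod, col = spec.split(":", 1)
            if mod in mods:
                pairs.append((mod, col))
        else:
            # A bare name matches every selected modality that has such a column
            pairs.extend((mod, spec) for mod in mods if spec in available[mod])
    return pairs


available = {
    "mod1": ["highly_variable"],
    "mod2": ["highly_variable", "arange"],
    "mod3": ["highly_variable", "arange", "is_region"],
}
print(expand_columns(["highly_variable"], ["mod1", "mod3"], available))
# [('mod1', 'highly_variable'), ('mod3', 'highly_variable')]
```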

Last but not least, columns can be automatically dropped from their source modalities with `drop=True`.

In [23]:
mdata.pull_var(nonunique=False, unique=False, drop=True)
mdata.var.dtypes

highly_variable    boolean
dtype: object

The `highly_variable` label has thus been effectively moved from the individual modalities to the global annotation:

In [24]:
for mod in mdata.mod.values():
    print("highly_variable" in mod.var.columns)

False
False
False
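In other words, `drop=True` turns a copy into a move. A minimal pandas sketch of that idea for a common column, assuming disjoint feature sets across modalities; the helper `pull_and_drop()` is hypothetical and plain DataFrames stand in for the modality tables.

```python
import pandas as pd


def pull_and_drop(global_df: pd.DataFrame, mod_tables: dict, column: str) -> None:
    """Hypothetical sketch of pulling a common column with drop=True."""
    # Copy: stack per-modality values (feature sets are disjoint) onto the global index
    values = pd.concat([t[column] for t in mod_tables.values()])
    global_df[column] = values.reindex(global_df.index)
    # Drop: remove the column from every source table, making the copy a move
    for t in mod_tables.values():
        t.drop(columns=column, inplace=True)


mod_tables = {
    "mod1": pd.DataFrame({"highly_variable": [True, False]}, index=["var0", "var1"]),
    "mod2": pd.DataFrame({"highly_variable": [True]}, index=["var2"]),
}
global_var = pd.DataFrame(index=["var0", "var1", "var2"])
pull_and_drop(global_var, mod_tables, "highly_variable")
print(global_var["highly_variable"].tolist())                        # [True, False, True]
print(["highly_variable" in t.columns for t in mod_tables.values()])  # [False, False]
```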


#### Pulling sample annotations with `.pull_obs()`

Annotating individual observations is one of the key steps of analytical workflows. For instance, in single-cell sequencing datasets, observations are individual cells, and annotating their identity (cell type, cell state, etc.) as well as keeping track of their source (tissue, organ, donor identity, species) is pivotal for understanding the underlying biology. These operations are further complicated by the multi-layered structure of multimodal datasets.

The `.pull_obs()` method of `MuData` aims to abstract this complexity away.

For demonstration purposes, we will use a simple `MuData` object with some annotations for the observations:

In [25]:
def make_mdata():
    N = 100
    D1, D2, D3 = 10, 20, 30
    D = D1 + D2 + D3

    mod1 = AnnData(np.arange(0, 100, 0.1).reshape(-1, D1))
    mod1.obs_names = [f"obs{i}" for i in range(mod1.n_obs)]
    mod1.var_names = [f"var{i}" for i in range(D1)]

    mod2 = AnnData(np.arange(3101, 5101, 1).reshape(-1, D2))
    mod2.obs_names = mod1.obs_names.copy()
    mod2.var_names = [f"var{i}" for i in range(D1, D1 + D2)]

    mod3 = AnnData(np.arange(5101, 8101, 1).reshape(-1, D3))
    mod3.obs_names = mod1.obs_names.copy()
    mod3.var_names = [f"var{i}" for i in range(D1 + D2, D)]

    # common column already present in all modalities
    mod1.obs["qc"] = True
    mod2.obs["qc"] = True
    mod3.obs["qc"] = np.tile([True, False], N // 2)

    # column present in some (2 out of 3) modalities (non-unique)
    mod2.obs["arange"] = np.arange(N)
    mod3.obs["arange"] = np.arange(N, 2*N)

    # column present in one modality (unique)
    mod3.obs["mod3_cell"] = True

    mdata = MuData({"mod1": mod1, "mod2": mod2, "mod3": mod3})
    return mdata

In [26]:
mdata = make_mdata()
# TODO: shouldn't be needed from 0.4
# mdata.update(pull=False)
mdata.obs = mdata.obs.loc[:,[]]

In [27]:
mdata

In a multimodal object, observations are shared across modalities. For this reason, `join_*` arguments cannot be set to `True`, and the annotations will always be prefixed with a modality name. Apart from this, the underlying implementation as well as the available parameters are the same as demonstrated above for `.var`.
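A short pandas illustration of why a joined column would be ill-defined when observations are shared: stacking two modalities' values for the same observations duplicates every label, whereas prefixed columns keep one value per observation and modality. The toy values below are made up for illustration and this is not `mudata`'s implementation.

```python
import pandas as pd

obs_names = [f"obs{i}" for i in range(4)]
# Two modalities annotate the *same* observations with a "qc" column
qc1 = pd.Series([True, True, True, True], index=obs_names)
qc2 = pd.Series([True, False, True, False], index=obs_names)

# Stacking, as for disjoint features, duplicates every observation label...
stacked = pd.concat([qc1, qc2])
print(stacked.index.has_duplicates)  # True

# ...so each modality keeps its own prefixed column instead
pulled = pd.DataFrame({"mod1:qc": qc1, "mod2:qc": qc2})
print(pulled.shape)  # (4, 2)
```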

**All columns.** By default, all columns will be pulled:

In [28]:
mdata.pull_obs()
mdata.obs.dtypes

mod1:qc           boolean
mod2:arange         int64
mod2:qc           boolean
mod3:arange         int64
mod3:mod3_cell    boolean
mod3:qc           boolean
dtype: object

In [29]:
# Clean up
mdata.obs = mdata.obs.loc[:,[]]

**`columns=...`** Individual columns can be specified to be used in this operation. Both `colname` and `modname:colname` formats are supported.

A column that is present across modalities will be pulled from all the modalities:

In [30]:
mdata.pull_obs(columns=["qc"])
mdata.obs.dtypes

mod1:qc    boolean
mod2:qc    boolean
mod3:qc    boolean
dtype: object

In [31]:
mdata.obs = mdata.obs.loc[:,[]]

Pull particular columns, e.g. a single column from a specified modality:

In [32]:
mdata.pull_obs(columns=["mod2:qc"])
mdata.obs.dtypes

mod2:qc    boolean
dtype: object

In [33]:
mdata.obs = mdata.obs.loc[:,[]]

**`common`, `nonunique`, `unique`** Column types are deduced from whether a column is present in all, some, or a single modality. Because observations are shared across modalities, all pulled columns are prefixed with a modality name:

In [34]:
mdata.pull_obs(common=True, nonunique=True, unique=False)
mdata.obs.dtypes

mod1:qc        boolean
mod2:arange      int64
mod2:qc        boolean
mod3:arange      int64
mod3:qc        boolean
dtype: object

In [35]:
mdata.obs = mdata.obs.loc[:,[]]

So it is possible to pull only unique, i.e. modality-specific, columns:

In [36]:
# unique column
mdata.pull_obs(unique=True, common=False, nonunique=False)
mdata.obs.dtypes

mod3:mod3_cell    boolean
dtype: object

In [37]:
mdata.obs = mdata.obs.loc[:,[]]

... just as it is possible to pull a specific *unique* column without specifying the modality name:

In [38]:
# unique column
mdata.pull_obs(columns=["mod3_cell"])
mdata.obs.dtypes

mod3:mod3_cell    boolean
dtype: object

In [39]:
mdata.obs = mdata.obs.loc[:,[]]

**`mods=...`** It is also possible to limit the number of modalities to pull columns from. For example, `columns=["mod1:qc", "mod3:qc"]` can also be expressed as

In [40]:
mdata.pull_obs(columns=["qc"], mods=["mod1", "mod3"])
mdata.obs.dtypes

mod1:qc    boolean
mod3:qc    boolean
dtype: object

In [41]:
mdata.obs = mdata.obs.loc[:,[]]

Last but not least, columns can be automatically dropped from their source modalities with `drop=True`.

In [42]:
mdata.pull_obs(nonunique=False, unique=False, drop=True)
mdata.obs.dtypes

mod1:qc    boolean
mod2:qc    boolean
mod3:qc    boolean
dtype: object

The `qc` label has thus been effectively moved from the individual modalities to the global annotation:

In [43]:
for mod in mdata.mod.values():
    print("qc" in mod.obs.columns)

False
False
False


### Pushing annotations

Annotations can also be *pushed* from the global `.var` or `.obs` table to the individual modalities.

#### Pushing feature annotations with `.push_var()`

For demonstration purposes, we will use a simple `MuData` object with some global annotations for the features:

In [44]:
def make_mdata():
    N = 100
    D1, D2, D3 = 10, 20, 30
    D = D1 + D2 + D3

    mod1 = AnnData(np.arange(0, 100, 0.1).reshape(-1, D1))
    mod1.obs_names = [f"obs{i}" for i in range(mod1.n_obs)]
    mod1.var_names = [f"var{i}" for i in range(D1)]

    mod2 = AnnData(np.arange(3101, 5101, 1).reshape(-1, D2))
    mod2.obs_names = mod1.obs_names.copy()
    mod2.var_names = [f"var{i}" for i in range(D1, D1 + D2)]

    mod3 = AnnData(np.arange(5101, 8101, 1).reshape(-1, D3))
    mod3.obs_names = mod1.obs_names.copy()
    mod3.var_names = [f"var{i}" for i in range(D1 + D2, D)]
    
    mdata = MuData({"mod1": mod1, "mod2": mod2, "mod3": mod3})

    # common column to be propagated to all modalities
    mdata.var["highly_variable"] = True

    # prefix column to be propagated to the respective modalities
    mdata.var["mod2:if_mod2"] = np.concatenate([
        np.repeat(pd.NA, D1), 
        np.repeat(True, D2),
        np.repeat(pd.NA, D3),
    ])

    return mdata

In [45]:
mdata = make_mdata()

In [46]:
mdata

**`push_var()`** will add a `highly_variable` column to each modality and an `if_mod2` column to the `mod2` modality:

In [47]:
mdata.push_var()

In [48]:
for m in mdata.mod.keys():
    print(mdata[m].var.dtypes)

highly_variable    bool
dtype: object
highly_variable      bool
if_mod2            object
dtype: object
highly_variable    bool
dtype: object


In [49]:
# Clean up
for m in mdata.mod.keys():
    mdata[m].var = mdata[m].var.loc[:,[]]

**`common=`, `prefixed=`** options can be used to adjust the selection of columns to be pushed: non-prefixed (*common*) ones and/or the ones *prefixed* with a modality name.

Only common:

In [50]:
mdata.push_var(common=True, prefixed=False)

In [51]:
for m in mdata.mod.keys():
    print(mdata[m].var.dtypes)

highly_variable    bool
dtype: object
highly_variable    bool
dtype: object
highly_variable    bool
dtype: object


In [52]:
# Clean up
for m in mdata.mod.keys():
    mdata[m].var = mdata[m].var.loc[:,[]]

... or only prefixed columns can be pushed:

In [53]:
mdata.push_var(common=False, prefixed=True)

In [54]:
for m in mdata.mod.keys():
    print(mdata[m].var.dtypes)

Series([], dtype: object)
if_mod2    object
dtype: object
Series([], dtype: object)


In [55]:
# Clean up
for m in mdata.mod.keys():
    mdata[m].var = mdata[m].var.loc[:,[]]

Prefixed columns are pushed to the respective modalities.
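The selection rules described above can be sketched in pandas as follows. `push_columns()` is a hypothetical helper, not `mudata` code; it assumes that a column named `modname:colname` is meant for modality `modname` only (and is pushed under the bare name `colname`), and that every other column is common.

```python
import pandas as pd


def push_columns(global_var, mod_var_names, common=True, prefixed=True):
    """Hypothetical sketch of how columns could be selected and distributed on push.

    global_var: global feature table; mod_var_names: modality -> its feature names.
    Returns modality -> table of pushed columns.
    """
    out = {}
    for mod, names in mod_var_names.items():
        cols = {}
        for col in global_var.columns:
            if ":" in col:
                prefix, bare = col.split(":", 1)
                # Prefixed columns only go to the modality named in the prefix
                if prefixed and prefix == mod:
                    cols[bare] = global_var.loc[names, col]
            elif common:
                # Non-prefixed (common) columns go to every modality
                cols[col] = global_var.loc[names, col]
        out[mod] = pd.DataFrame(cols, index=names)
    return out


global_var = pd.DataFrame(
    {"highly_variable": True, "mod2:if_mod2": [pd.NA, pd.NA, True, True]},
    index=["var0", "var1", "var2", "var3"],
)
mod_var_names = {"mod1": ["var0", "var1"], "mod2": ["var2", "var3"]}
for mod, table in push_columns(global_var, mod_var_names).items():
    print(mod, table.columns.tolist())
# mod1 ['highly_variable']
# mod2 ['highly_variable', 'if_mod2']
```

On this toy table the selection mirrors the outputs shown above: the common column reaches every modality, while the prefixed one only reaches `mod2`.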

Alternatively, **`columns=`** can be used to provide an explicit list of columns to be propagated to modalities, and **`mods=`** to limit the modalities that receive the annotations:

In [56]:
mdata.push_var(columns=["highly_variable"], mods=["mod3"])

In [57]:
for m in mdata.mod.keys():
    print(mdata[m].var.dtypes)

Series([], dtype: object)
Series([], dtype: object)
highly_variable    bool
dtype: object


In [58]:
# Clean up
for m in mdata.mod.keys():
    mdata[m].var = mdata[m].var.loc[:,[]]

Annotations can also be dropped from `mdata.var` after pushing them to individual modalities with **`drop=True`**, or just dropped without propagation with **`only_drop=True`**:

In [59]:
mdata.push_var(prefixed=False, drop=True)
mdata.push_var(columns=["if_mod2"], only_drop=True)

This propagates the `highly_variable` column to all modalities and drops it from `mdata.var`; it also drops the `mod2:if_mod2` column from `mdata.var` without pushing it:

In [60]:
print(f"mdata.var columns:\n{mdata.var.dtypes}")

mdata.var columns:
Series([], dtype: object)


In [61]:
print(f"mdata['mod2'].var columns:\n{mdata['mod2'].var.dtypes}")

mdata['mod2'].var columns:
highly_variable    bool
dtype: object
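The two options differ only in whether values are copied before the column is removed from the global table. A minimal sketch for a single common column, using a hypothetical `push_column()` helper and plain pandas tables in place of `MuData`:

```python
import pandas as pd


def push_column(global_df, mod_tables, column, drop=False, only_drop=False):
    """Hypothetical sketch of drop= / only_drop= for a single common column."""
    if not only_drop:
        for t in mod_tables.values():
            # Copy the values for the rows (features/observations) of this modality
            t[column] = global_df.loc[t.index, column]
    if drop or only_drop:
        # Remove the column from the global table after (or instead of) propagation
        global_df.drop(columns=column, inplace=True)


global_var = pd.DataFrame({"highly_variable": [True, False, True]}, index=["var0", "var1", "var2"])
mod_tables = {"mod1": pd.DataFrame(index=["var0", "var1"]), "mod2": pd.DataFrame(index=["var2"])}
push_column(global_var, mod_tables, "highly_variable", drop=True)
print(global_var.columns.tolist())                     # []
print(mod_tables["mod1"]["highly_variable"].tolist())  # [True, False]
```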


#### Pushing sample annotations with `.push_obs()`

For demonstration purposes, we will use a simple `MuData` object with some global annotations for the observations:

In [62]:
def make_mdata():
    N = 100
    D1, D2 = 10, 20
    D = D1 + D2

    mod1 = AnnData(np.arange(0, 100, 0.1).reshape(-1, D1))
    mod1.obs_names = [f"obs{i}" for i in range(mod1.n_obs)]
    mod1.var_names = [f"var{i}" for i in range(D1)]

    mod2 = AnnData(np.arange(3101, 5101, 1).reshape(-1, D2))
    mod2.obs_names = mod1.obs_names.copy()
    mod2.var_names = [f"var{i}" for i in range(D1, D1 + D2)]

    mdata = MuData({"mod1": mod1, "mod2": mod2})

    # common column to be propagated to all modalities
    mdata.obs["true"] = True

    return mdata

In [63]:
mdata = make_mdata()

In [64]:
mdata

**`push_obs()`** will add a `true` column to each modality:

In [65]:
mdata.push_obs()

In [66]:
for m in mdata.mod.keys():
    print(mdata[m].obs.dtypes)

true    bool
dtype: object
true    bool
dtype: object


In [67]:
# Clean up
for m in mdata.mod.keys():
    mdata[m].obs = mdata[m].obs.loc[:,[]]

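**`common=`, `prefixed=`** options can be used to adjust the selection of columns to be pushed: non-prefixed (*common*) ones and/or the ones *prefixed* with a modality name. As `mdata.obs` contains no prefixed columns here, excluding the common ones leaves nothing to push: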

In [68]:
mdata.push_obs(common=False)

In [69]:
for m in mdata.mod.keys():
    print(mdata[m].obs.dtypes)

Series([], dtype: object)
Series([], dtype: object)


In [70]:
# Clean up
for m in mdata.mod.keys():
    mdata[m].obs = mdata[m].obs.loc[:,[]]

Alternatively, **`columns=`** can be used to provide an explicit list of columns to be propagated to modalities, and **`mods=`** to limit the modalities that receive the annotations:

In [71]:
mdata.push_obs(columns=["true"], mods=["mod2"])

In [72]:
for m in mdata.mod.keys():
    print(f"modality {m}:")
    print(mdata[m].obs.dtypes)
    print()

modality mod1:
Series([], dtype: object)

modality mod2:
true    bool
dtype: object



In [73]:
# Clean up
for m in mdata.mod.keys():
    mdata[m].obs = mdata[m].obs.loc[:,[]]

Annotations can also be dropped from `mdata.obs` after pushing them to individual modalities with **`drop=True`**, or just dropped without propagation with **`only_drop=True`**:

In [74]:
mdata.push_obs(only_drop=True)

This will just drop the `mdata.obs.true` column without pushing it to the modalities:

In [75]:
print(f"mdata.obs columns:\n{mdata.obs.dtypes}")

mdata.obs columns:
Series([], dtype: object)


In [76]:
print(f"mdata['mod2'].obs columns:\n{mdata['mod2'].obs.dtypes}")

mdata['mod2'].obs columns:
Series([], dtype: object)


## Multi-dataset annotations

[The axes interface](https://mudata.readthedocs.io/en/latest/notebooks/axes.html) enables `MuData` to be used beyond multimodal data. This includes multi-dataset containers with `axis=1` (shared features) and data subsets with `axis=-1` (shared observations and features).
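With `axis=1`, observations are no longer shared: each dataset contributes its own rows. Joining annotations across datasets then amounts to stacking the per-dataset tables, which plain pandas can illustrate on toy tables mirroring the `make_mdata()` helper of this section. This is a sketch of the idea, not `mudata`'s implementation.

```python
import pandas as pd

# Toy per-dataset .obs tables with disjoint observations (cf. make_mdata() in this section)
obs1 = pd.DataFrame({"dataset": "dataset1"}, index=["obs0", "obs1"])
obs2 = pd.DataFrame({"dataset": "dataset2", "species": "human"}, index=["obs2", "obs3"])
obs3 = pd.DataFrame(
    {"dataset": "dataset3", "species": "mouse", "reference": True}, index=["obs4", "obs5"]
)

# No observation belongs to more than one dataset, so stacking rows is well-defined:
# shared columns are joined and datasets lacking a column contribute missing values
global_obs = pd.concat([obs1, obs2, obs3])
print(global_obs.columns.tolist())         # ['dataset', 'species', 'reference']
print(global_obs["species"].isna().sum())  # 2
```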

In [77]:
def make_mdata():
    N1, N2, N3 = 10, 20, 30
    N = N1 + N2 + N3
    D = 100

    mod1 = AnnData(np.arange(0, 100, 0.1).reshape(N1, -1))
    mod1.obs_names = [f"obs{i}" for i in range(N1)]
    mod1.var_names = [f"var{i}" for i in range(D)]

    mod2 = AnnData(np.arange(3101, 5101, 1).reshape(N2, -1))
    mod2.obs_names = [f"obs{i}" for i in range(N1, N1 + N2)]
    mod2.var_names = mod1.var_names.copy()

    mod3 = AnnData(np.arange(5101, 8101, 1).reshape(N3, -1))
    mod3.obs_names = [f"obs{i}" for i in range(N1 + N2, N)]
    mod3.var_names = mod1.var_names.copy()

    # common column already present in all modalities
    mod1.obs["dataset"] = "dataset1"
    mod2.obs["dataset"] = "dataset2"
    mod3.obs["dataset"] = "dataset3"

    # column present in some (2 out of 3) modalities (non-unique)
    mod2.obs["species"] = "human"
    mod3.obs["species"] = "mouse"

    # column present in one modality (unique)
    mod3.obs["reference"] = True

    mdata = MuData({"mod1": mod1, "mod2": mod2, "mod3": mod3}, axis=1)
    return mdata

In [78]:
mdata = make_mdata()
# TODO: shouldn't be needed from 0.4
# mdata.update(pull=False)
mdata.obs = mdata.obs.loc[:,[]]
mdata.var = mdata.var.loc[:,[]]

In [79]:
mdata

In [80]:
mdata.pull_obs(join_nonunique=True, prefix_unique=False)
mdata.obs.dtypes

dataset       object
species       object
reference    boolean
dtype: object

In [81]:
mdata.pull_var()
mdata.var.dtypes

Series([], dtype: object)

## Stage annotations

`MuData` objects with `mdata.axis == -1` can contain "modalities" that share both samples and features. This can be useful, for example, for storing different processing stages, where both samples and features are filtered out by some quality control (QC) procedure.

As with other axes, `.pull_obs()`/`.pull_var()` and `.push_obs()`/`.push_var()` work here as well.
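For a QC stage that keeps only a subset of observations, pulled annotations will be missing for the observations that were filtered out. The pandas sketch below mimics pulling the unique `filtered` column of the `qced` stage without a prefix and then derives a plain boolean mask from it; the mask `in_qced` is a hypothetical downstream use, not something `mudata` creates.

```python
import pandas as pd

# Global observations come from the "raw" stage: obs0..obs9 (cf. make_staged_mdata())
global_obs = pd.DataFrame(index=[f"obs{i}" for i in range(10)])
# The "qced" stage keeps only obs0..obs7 and carries a unique "filtered" column
qced_obs = pd.DataFrame({"filtered": True}, index=[f"obs{i}" for i in range(8)])

# Pull without a prefix (cf. prefix_unique=False): observations missing from the
# stage get <NA> in a nullable boolean column
global_obs["filtered"] = qced_obs["filtered"].astype("boolean").reindex(global_obs.index)

# Hypothetical downstream use: a plain bool mask of observations kept by the stage
in_qced = global_obs["filtered"].fillna(False).astype(bool)
print(in_qced.sum())  # 8
```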

In [82]:
def make_staged_mdata():
    N, D = 10, 100
    Nsub, Dsub = 8, 50

    mod1 = AnnData(np.arange(0, 100, 0.1).reshape(N, D))
    mod1.obs_names = [f"obs{i}" for i in range(N)]
    mod1.var_names = [f"var{i}" for i in range(D)]

    mod2 = AnnData(np.arange(3101, 3501, 1).reshape(Nsub, Dsub))
    mod2.obs_names = [f"obs{i}" for i in range(Nsub)]
    mod2.var_names = [f"var{i}" for i in range(Dsub)]

    # common column already present in all modalities
    mod1.obs["status"] = True
    mod2.obs["status"] = True

    # column present in one modality (unique)
    mod2.obs["filtered"] = True
    mod2.var["filtered"] = True

    mdata = MuData({"raw": mod1, "qced": mod2}, axis=-1)
    return mdata

In [83]:
mdata = make_staged_mdata()
# TODO: shouldn't be needed from 0.4
# mdata.update(pull=False)
mdata.obs = mdata.obs.loc[:,[]]
mdata.var = mdata.var.loc[:,[]]

In [84]:
mdata

In [85]:
mdata.pull_obs(prefix_unique=False)
mdata.obs.dtypes

raw:status     boolean
filtered       boolean
qced:status    boolean
dtype: object

In [86]:
mdata.pull_var(prefix_unique=False)
mdata.var.dtypes

filtered    boolean
dtype: object

## Nested `MuData` objects

Annotations can also be managed for nested `MuData` objects:

In [87]:
def make_nested_mdata():
    stages = make_staged_mdata()
    stages.obs = stages.obs.loc[:,[]]  # pre-0.3
    
    mod2 = AnnData(np.arange(10000, 12000, 1).reshape(10, -1))
    mod2.obs_names = [f"obs{i}" for i in range(mod2.n_obs)]
    mod2.var_names = [f"mod2:var{i}" for i in range(mod2.n_vars)]

    mdata = MuData({"mod1": stages, "mod2": mod2}, axis=-1)
    
    mdata.obs["dataset"] = "ref"

    return mdata

In [88]:
mdata = make_nested_mdata()
mdata

In [89]:
print(mdata.mod)

{'mod1': MuData object with n_obs × n_vars = 10 × 100
  2 modalities
    raw:	10 x 100
      obs:	'status'
    qced:	8 x 50
      obs:	'status', 'filtered'
      var:	'filtered', 'mod2': AnnData object with n_obs × n_vars = 10 × 200}


Propagation is intentionally not recursive: annotations in the inner `mod1` have to be explicitly pushed down to its individual `AnnData` objects when desired:

In [90]:
mdata.push_obs()

In [91]:
for m, mod in mdata.mod.items():
    print(mod.obs.dtypes)

dataset    object
dtype: object
dataset    object
dtype: object


In [92]:
for m, mod in mdata['mod1'].mod.items():
    print(mod.obs.dtypes)

status    bool
dtype: object
status      bool
filtered    bool
dtype: object


An example of the recursive `push_obs()` operation:

In [93]:
def push_obs_rec(mdata: MuData):
    mdata.push_obs()
    for m, mod in mdata.mod.items():
        if isinstance(mod, MuData):
            push_obs_rec(mod)

In [94]:
push_obs_rec(mdata)

In [95]:
for m, mod in mdata['mod1'].mod.items():
    assert "dataset" in mod.obs