# Learning with partial data

In this tutorial, we show how to define a partially observed dataset that is compatible with MultiVae models.

## Incomplete dataset

The MultiVae library provides a simple class to handle incomplete datasets: the `IncompleteDataset` class.

Below, we demonstrate how to initialize such a dataset from tensors.

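```python
from multivae.data.datasets import IncompleteDataset
import torch

# Define random data samples: 100 samples for each of two image modalities
data = dict(
    modality_1=torch.randn((100, 3, 32, 32)),
    modality_2=torch.randn((100, 1, 28, 28)),
)

# Define random masks: one entry per sample, 1 if the modality is observed, 0 if missing.
# Here modality_1 is observed for roughly 70% of the samples; modality_2 is always observed.
masks = dict(
    modality_1=torch.bernoulli(0.7 * torch.ones((100,))),
    modality_2=torch.ones((100,)),
)

# Optional labels, one per sample
labels = torch.bernoulli(0.5 * torch.ones((100,)))

dataset = IncompleteDataset(data, masks, labels)
```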
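Before training, it can help to sanity-check the masks. The sketch below uses a hypothetical helper, `summarize_masks` (not part of MultiVae), that counts the observed samples per modality and the samples with no observed modality at all. It assumes a mask value of 1 means "observed", as in the example above. With those masks, `modality_2` is always observed, so no sample ends up empty.

```python
import torch


def summarize_masks(masks: dict) -> dict:
    """Hypothetical helper (not part of MultiVae): summarize per-modality masks."""
    # Number of observed samples per modality (mask value 1 = observed)
    observed = {name: int(m.sum().item()) for name, m in masks.items()}
    # Stack masks into shape (n_modalities, n_samples) and count,
    # for each sample, how many modalities are observed
    stacked = torch.stack([m.float() for m in masks.values()])
    n_observed_per_sample = stacked.sum(dim=0)
    # Samples where every modality is missing carry no data at all
    n_empty = int((n_observed_per_sample == 0).sum().item())
    return {"observed": observed, "empty_samples": n_empty}


torch.manual_seed(0)
masks = dict(
    modality_1=torch.bernoulli(0.7 * torch.ones((100,))),
    modality_2=torch.ones((100,)),
)
summary = summarize_masks(masks)
print(summary)
```

If `empty_samples` is non-zero, some samples have no observed modality; you may want to drop them or redraw the masks.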


You can also define a completely custom dataset, as long as it has the same structure as the `IncompleteDataset` class:
- the `__getitem__` method must return a `DatasetOutput` instance with a field `data` containing a dictionary (keyed by modality name), a field `masks` also containing a dictionary, and an optional field `labels` containing a tensor.

As an example, check out the MMNIST dataset class in `multivae.data.datasets.mmnist`. That dataset has five image modalities and can be initialized with partially missing data.


The following MultiVae models can be trained on partially observed data, using exactly the same training process as with a complete dataset:
- MMVAE
- MVAE
- MoPoE
- MVTCAE
- MMVAE+

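For example, the MMNIST dataset mentioned above can be loaded with partially missing modalities through the `missing_ratio` argument:

```python
from multivae.data.datasets.mmnist import MMNISTDataset

# Download MMNIST if needed and mark part of the modalities as missing,
# as controlled by missing_ratio
dataset = MMNISTDataset(
    data_path='../../../data/',
    download=True,
    missing_ratio=0.2,
)
```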