In [None]:
%matplotlib inline


# A quick example

Amputation is the opposite of imputation: the generation of missing values in complete datasets. This is useful in an experimental setting where you want to evaluate the effect of missing values on the outcome of a model.

:class:`~pyampute.ampute.MultivariateAmputation` is designed following scikit-learn's ``fit`` and ``transform`` paradigm, and can therefore be integrated seamlessly into a larger data processing pipeline.

Here, we give a short demonstration. A more extensive example can be found in `this example`_. For people who are familiar with the implementation of multivariate amputation in R-function `ampute`_, `this blogpost`_ gives an overview of the similarities and differences with :class:`~pyampute.ampute.MultivariateAmputation`. Inspection of an incomplete dataset can be done with :class:`~pyampute.exploration.md_patterns.mdPatterns`.

Note that the amputation methodology itself is proposed in `Generating missing values for simulation purposes`_ and in `The dance of the mechanisms`_.



In [None]:
# Author: Rianne Schouten <https://rianneschouten.github.io/>
# Co-Author: Davina Zamanzadeh <https://davinaz.me/>

## Transforming one dataset

Multivariate amputation of one dataset can be performed directly with ``fit_transform``. By default, :class:`~pyampute.ampute.MultivariateAmputation` generates 1 pattern with MAR missingness in 50% of the data rows for 50% of the variables. The resulting incomplete dataset can then be inspected with :class:`~pyampute.exploration.md_patterns.mdPatterns`.




In [None]:
import numpy as np

from pyampute.ampute import MultivariateAmputation
from pyampute.exploration.md_patterns import mdPatterns

rng = np.random.RandomState(2022)

m = 1000
n = 10
# Draw from the seeded generator defined above so the data are reproducible
X_compl = rng.randn(m, n)

ma = MultivariateAmputation()
X_incompl = ma.fit_transform(X_compl)

mdp = mdPatterns()
patterns = mdp.get_patterns(X_incompl)
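
The default proportions can also be checked numerically. The following is a minimal sketch that uses only NumPy; ``missingness_summary`` is a hypothetical helper written for this illustration and is not part of ``pyampute``. A small hand-made array stands in for an amputed dataset.

```python
import numpy as np


def missingness_summary(X):
    """Return the fraction of incomplete rows and the per-column fraction of NaNs.

    Hypothetical helper for illustration; it is not part of pyampute.
    """
    mask = np.isnan(X)
    frac_incomplete_rows = mask.any(axis=1).mean()
    frac_missing_per_column = mask.mean(axis=0)
    return frac_incomplete_rows, frac_missing_per_column


# Toy stand-in for an amputed array: 4 rows, 2 columns,
# with the first column missing in half of the rows.
X_toy = np.array(
    [
        [np.nan, 1.0],
        [0.5, 2.0],
        [np.nan, 3.0],
        [1.5, 4.0],
    ]
)

row_frac, col_frac = missingness_summary(X_toy)
print(row_frac)
print(col_frac)
```

Applied to ``X_incompl`` from the cell above, the first value should be close to 0.5 with the default settings, and roughly half of the columns should contain missing values.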

## A separate fit and transform

Integration in a larger pipeline requires separate ``fit`` and ``transform`` calls: the amputer is fitted on the training data and then applied to the test data.




In [None]:
from sklearn.model_selection import train_test_split

X_compl_train, X_compl_test = train_test_split(X_compl, random_state=2022)
ma = MultivariateAmputation()
ma.fit(X_compl_train)
X_incompl_test = ma.transform(X_compl_test)

## Integration in a pipeline

 A short pipeline may look as follows. 





In [None]:
from sklearn.pipeline import make_pipeline
from sklearn.impute import SimpleImputer
import matplotlib.pyplot as plt

pipe = make_pipeline(MultivariateAmputation(), SimpleImputer())
pipe.fit(X_compl_train)

X_imp_test = pipe.transform(X_compl_test)

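By default, ``SimpleImputer`` imputes with the mean of the observed data. The imputed value is therefore likely to appear in about 50% of the rows (of the test set, which contains 25% of $m$) for 50% of the variables. Because that value then makes up roughly half of such a column, it is also likely to be the column median, which is what the check below compares against.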



In [None]:
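# Column medians of the imputed test set; for amputed columns these are
# likely to coincide with the imputed value
medians = np.nanmedian(X_imp_test, axis=0)
# Per column, count the entries that are equal to the column median
print(np.sum(X_imp_test == medians[None, :], axis=0))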
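
Why comparing against the median counts imputed entries can be seen in a toy example. The sketch below uses only NumPy and performs mean imputation by hand on a single column; the variable names and the choice of 60 missing values out of 100 are assumptions made for this illustration. With more than half of the entries imputed, the median is guaranteed to equal the imputed value; at around 50%, as in the pipeline above, this is only likely.

```python
import numpy as np

rng = np.random.RandomState(0)

# Toy column: 100 standard-normal values, the first 60 of which are set to NaN.
x = rng.randn(100)
x[:60] = np.nan
missing = np.isnan(x)

# Mean imputation, done by hand for a single column.
fill_value = np.nanmean(x)
x_imp = np.where(missing, fill_value, x)

# Count imputed entries directly from the missingness mask.
n_imputed = int(missing.sum())

# More than half of the entries equal fill_value, so the median equals it too.
median_equals_fill = bool(np.median(x_imp) == fill_value)

# Counting entries equal to the median therefore recovers the imputed count.
n_equal_to_median = int(np.sum(x_imp == np.median(x_imp)))

print(n_imputed, median_equals_fill, n_equal_to_median)
```

In the pipeline above, the fitted fill values are also available as ``pipe[-1].statistics_``, so a comparison against those values would not depend on the median coinciding with the mean.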

For more information about ``pyampute``'s parameters, see `A mapping from R-function ampute to pyampute`_. To learn how to design a more thorough experiment, see `Evaluating missing values with a simulation pipeline`_.


