# Simulating Missing Data with `MissMechaGenerator`

This notebook demonstrates how to simulate missing data using the `MissMechaGenerator` from the `missmecha` package.

We use:
- A complete synthetic dataset (`data_num`)
- A single missingness mechanism applied globally across all features
- No per-column customization

For each configuration, we display:
- The generated missing data
- The overall missingness rate
- The result of Little's MCAR test (failing to reject the null is consistent with MCAR but does not confirm it)
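
As a point of reference for the reported rates, the overall and per-column missing rates can be computed directly from a NaN mask. The snippet below is a minimal NumPy sketch on a hypothetical toy array; it is not the implementation of `compute_missing_rate`, only an illustration of the quantities that function reports.

```python
import numpy as np

# Hypothetical toy array; NaN marks a missing entry.
X = np.array([
    [1.0, np.nan, 3.0],
    [np.nan, np.nan, 6.0],
    [7.0, 8.0, 9.0],
    [10.0, 11.0, np.nan],
])

mask = np.isnan(X)                    # True where a value is missing
n_missing = int(mask.sum())           # total number of missing cells
overall_rate = mask.mean()            # fraction of all cells that are missing
per_column_rate = mask.mean(axis=0)   # fraction missing within each column

print(f"Overall missing rate: {overall_rate:.2%}")
print(f"{n_missing} / {X.size} total values are missing.")
print("Per-column missing rate:", per_column_rate)
```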

## Setup
Import the required packages.


In [1]:
import numpy as np
from sklearn.model_selection import train_test_split
from missmecha import MissMechaGenerator
from missmecha.analysis import compute_missing_rate, MCARTest

### Generate Complete Data


In [2]:
# Create a synthetic numeric dataset: 1000 rows x 10 standard-normal features (seeded for reproducibility)
data_num = np.random.default_rng(1).normal(loc=0.0, scale=1.0, size=(1000, 10))
data_num[:5]

array([[ 0.34558419,  0.82161814,  0.33043708, -1.30315723,  0.90535587,
         0.44637457, -0.53695324,  0.5811181 ,  0.3645724 ,  0.2941325 ],
       [ 0.02842224,  0.54671299, -0.73645409, -0.16290995, -0.48211931,
         0.59884621,  0.03972211, -0.29245675, -0.78190846, -0.25719224],
       [ 0.00814218, -0.27560291,  1.29406381,  1.00672432, -2.71116248,
        -1.88901325, -0.17477209, -0.42219041,  0.213643  ,  0.21732193],
       [ 2.11783876, -1.11202076, -0.37760501,  2.04277161,  0.646703  ,
         0.66306337, -0.51400637, -1.64807517,  0.16746474,  0.10901409],
       [-1.22735205, -0.68322666, -0.07204368, -0.94475162, -0.09826997,
         0.09548303,  0.03558624, -0.50629166,  0.59374807,  0.89116695]])

### Train/Test Split

In [3]:
# Hold out 30% of the rows; the simulations below are applied to X_train only
X_train, X_test = train_test_split(data_num, test_size=0.3, random_state=42)
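
The outputs later in this notebook report 700 rows per column and 7000 total values, which follows from the split sizes. The arithmetic below is a minimal sketch; the rounding rule (test rows rounded up, training rows taking the remainder) is an assumption about how `train_test_split` resolves a fractional `test_size`.

```python
import math

n_rows, n_cols = 1000, 10   # shape of data_num
test_size = 0.3

# Assumption: the test set takes ceil(test_size * n_rows) rows and the
# training set receives whatever is left.
n_test = math.ceil(test_size * n_rows)
n_train = n_rows - n_test
n_train_values = n_train * n_cols   # cells available to be masked in X_train

print(n_train, n_test, n_train_values)
```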

### Run Simulations Across Mechanism Types

In [6]:
missing_type = "mcar"
mechanism_type_list = [1, 2, 3]
missing_rate_list = [0.3, 0.7]

for mechanism_type in mechanism_type_list:
    for missing_rate in missing_rate_list:
        print(f"Mechanism: {missing_type.upper()}-{mechanism_type} | Missing rate: {missing_rate}")
        
        # Initialize generator
        mecha = MissMechaGenerator(
            mechanism=missing_type,
            mechanism_type=mechanism_type,
            missing_rate=missing_rate,
            seed=42
        )

        # Fit and apply
        X_missing = mecha.fit_transform(X_train)

        # Report missing rate
        compute_missing_rate(X_missing)

        # Run Little's test
        pval = MCARTest(method="little")(X_missing)
        print("-----------------------------------------------------------")

Mechanism: MCAR-1 | Missing rate: 0.3
Overall missing rate: 30.73%
2151 / 7000 total values are missing.

Top variables by missing rate:


column,n_missing,missing_rate (%),n_unique,dtype,n_total
col7,232,33.14,468,float64,700
col4,223,31.86,477,float64,700
col0,221,31.57,479,float64,700
col3,220,31.43,480,float64,700
col5,218,31.14,482,float64,700


Method: Little's MCAR Test
Test Statistic p-value: 0.105967
Decision: Fail to reject the null hypothesis at significance level α = 0.05
Interpretation: There is insufficient evidence to suggest the data deviates from MCAR.
-----------------------------------------------------------
Mechanism: MCAR-1 | Missing rate: 0.7
Overall missing rate: 70.74%
4952 / 7000 total values are missing.

Top variables by missing rate:


column,n_missing,missing_rate (%),n_unique,dtype,n_total
col7,514,73.43,186,float64,700
col4,501,71.57,199,float64,700
col3,499,71.29,201,float64,700
col1,497,71.0,203,float64,700
col8,494,70.57,206,float64,700


Method: Little's MCAR Test
Test Statistic p-value: 0.763772
Decision: Fail to reject the null hypothesis at significance level α = 0.05
Interpretation: There is insufficient evidence to suggest the data deviates from MCAR.
-----------------------------------------------------------
Mechanism: MCAR-2 | Missing rate: 0.3
Overall missing rate: 30.00%
2100 / 7000 total values are missing.

Top variables by missing rate:


column,n_missing,missing_rate (%),n_unique,dtype,n_total
col6,222,31.71,478,float64,700
col8,221,31.57,479,float64,700
col3,217,31.0,483,float64,700
col2,216,30.86,484,float64,700
col7,215,30.71,485,float64,700


Method: Little's MCAR Test
Test Statistic p-value: 0.329909
Decision: Fail to reject the null hypothesis at significance level α = 0.05
Interpretation: There is insufficient evidence to suggest the data deviates from MCAR.
-----------------------------------------------------------
Mechanism: MCAR-2 | Missing rate: 0.7
Overall missing rate: 70.00%
4900 / 7000 total values are missing.

Top variables by missing rate:


column,n_missing,missing_rate (%),n_unique,dtype,n_total
col5,500,71.43,200,float64,700
col9,499,71.29,201,float64,700
col7,494,70.57,206,float64,700
col3,491,70.14,209,float64,700
col8,490,70.0,210,float64,700


Method: Little's MCAR Test
Test Statistic p-value: 0.874607
Decision: Fail to reject the null hypothesis at significance level α = 0.05
Interpretation: There is insufficient evidence to suggest the data deviates from MCAR.
-----------------------------------------------------------
Mechanism: MCAR-3 | Missing rate: 0.3
Overall missing rate: 30.00%
2100 / 7000 total values are missing.

Top variables by missing rate:


column,n_missing,missing_rate (%),n_unique,dtype,n_total
col0,210,30.0,490,float64,700
col1,210,30.0,490,float64,700
col2,210,30.0,490,float64,700
col3,210,30.0,490,float64,700
col4,210,30.0,490,float64,700


Method: Little's MCAR Test
Test Statistic p-value: 0.304731
Decision: Fail to reject the null hypothesis at significance level α = 0.05
Interpretation: There is insufficient evidence to suggest the data deviates from MCAR.
-----------------------------------------------------------
Mechanism: MCAR-3 | Missing rate: 0.7
Overall missing rate: 70.00%
4900 / 7000 total values are missing.

Top variables by missing rate:


column,n_missing,missing_rate (%),n_unique,dtype,n_total
col0,490,70.0,210,float64,700
col1,490,70.0,210,float64,700
col2,490,70.0,210,float64,700
col3,490,70.0,210,float64,700
col4,490,70.0,210,float64,700


Method: Little's MCAR Test
Test Statistic p-value: 0.572157
Decision: Fail to reject the null hypothesis at significance level α = 0.05
Interpretation: There is insufficient evidence to suggest the data deviates from MCAR.
-----------------------------------------------------------
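
In the output above, MCAR-1 lands near the target rate (30.73%, 70.74%), MCAR-2 hits the overall rate exactly while per-column rates vary, and MCAR-3 hits the rate exactly in every listed column. The sketch below reproduces those three patterns with plain NumPy. It is an illustration under our own assumptions, not necessarily how `MissMechaGenerator` implements these variants; the names `mask_a`, `mask_b`, `mask_c`, and `apply_mask` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(700, 10))   # stand-in for X_train
rate = 0.3

# (a) Independent Bernoulli draw per cell: the rate is only hit approximately.
mask_a = rng.random(X.shape) < rate

# (b) Exact number of cells chosen uniformly over the whole matrix:
#     the overall rate is exact, per-column rates fluctuate.
n_total = X.size
n_mask = int(round(rate * n_total))
flat_idx = rng.choice(n_total, size=n_mask, replace=False)
mask_b = np.zeros(n_total, dtype=bool)
mask_b[flat_idx] = True
mask_b = mask_b.reshape(X.shape)

# (c) Exact number of rows chosen independently within each column:
#     every column has exactly the same rate.
n_per_col = int(round(rate * X.shape[0]))
mask_c = np.zeros(X.shape, dtype=bool)
for j in range(X.shape[1]):
    rows = rng.choice(X.shape[0], size=n_per_col, replace=False)
    mask_c[rows, j] = True

def apply_mask(X, mask):
    """Return a copy of X with masked entries set to NaN."""
    X_missing = X.copy()
    X_missing[mask] = np.nan
    return X_missing

for name, mask in [("a", mask_a), ("b", mask_b), ("c", mask_c)]:
    X_missing = apply_mask(X, mask)
    col_rates = np.isnan(X_missing).mean(axis=0)
    print(name, f"overall={np.isnan(X_missing).mean():.4f}",
          f"col min={col_rates.min():.4f}", f"col max={col_rates.max():.4f}")
```

In all three variants the mask is drawn without looking at the values of `X`, which is what makes the missingness completely at random.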


In [7]:
missing_type = "mar"
mechanism_type_list = [1, 2, 3]
missing_rate_list = [0.3, 0.7]

for mechanism_type in mechanism_type_list:
    for missing_rate in missing_rate_list:
        print(f"Mechanism: {missing_type.upper()}-{mechanism_type} | Missing rate: {missing_rate}")
        
        # Initialize generator
        mecha = MissMechaGenerator(
            mechanism=missing_type,
            mechanism_type=mechanism_type,
            missing_rate=missing_rate,
            seed=42
        )

        # Fit and apply
        X_missing = mecha.fit_transform(X_train)

        # Report missing rate
        compute_missing_rate(X_missing)

        # Run Little's test
        pval = MCARTest(method="little")(X_missing)
        print("-----------------------------------------------------------")

Mechanism: MAR-1 | Missing rate: 0.3
Overall missing rate: 21.19%
1483 / 7000 total values are missing.

Top variables by missing rate:


column,n_missing,missing_rate (%),n_unique,dtype,n_total
col8,225,32.14,475,float64,700
col1,218,31.14,482,float64,700
col7,213,30.43,487,float64,700
col2,213,30.43,487,float64,700
col5,212,30.29,488,float64,700


Method: Little's MCAR Test
Test Statistic p-value: 0.000000
Decision: Reject the null hypothesis at significance level α = 0.05
The data is unlikely to be Missing Completely At Random (MCAR).
-----------------------------------------------------------
Mechanism: MAR-1 | Missing rate: 0.7
Overall missing rate: 48.80%
3416 / 7000 total values are missing.

Top variables by missing rate:


column,n_missing,missing_rate (%),n_unique,dtype,n_total
col8,505,72.14,195,float64,700
col2,496,70.86,204,float64,700
col5,488,69.71,212,float64,700
col3,484,69.14,216,float64,700
col1,483,69.0,217,float64,700


Method: Little's MCAR Test
Test Statistic p-value: 0.000000
Decision: Reject the null hypothesis at significance level α = 0.05
The data is unlikely to be Missing Completely At Random (MCAR).
-----------------------------------------------------------
Mechanism: MAR-2 | Missing rate: 0.3
Overall missing rate: 27.51%
1926 / 7000 total values are missing.

Top variables by missing rate:


column,n_missing,missing_rate (%),n_unique,dtype,n_total
col4,700,100.0,0,float64,700
col5,348,49.71,352,float64,700
col3,337,48.14,363,float64,700
col1,202,28.86,498,float64,700
col2,118,16.86,582,float64,700


Method: Little's MCAR Test
Test Statistic p-value: 0.943256
Decision: Fail to reject the null hypothesis at significance level α = 0.05
Interpretation: There is insufficient evidence to suggest the data deviates from MCAR.
-----------------------------------------------------------
Mechanism: MAR-2 | Missing rate: 0.7
Overall missing rate: 48.06%
3364 / 7000 total values are missing.

Top variables by missing rate:


column,n_missing,missing_rate (%),n_unique,dtype,n_total
col4,700,100.0,0,float64,700
col3,700,100.0,0,float64,700
col5,700,100.0,0,float64,700
col1,473,67.57,227,float64,700
col2,276,39.43,424,float64,700


Method: Little's MCAR Test
Test Statistic p-value: 0.427275
Decision: Fail to reject the null hypothesis at significance level α = 0.05
Interpretation: There is insufficient evidence to suggest the data deviates from MCAR.
-----------------------------------------------------------
Mechanism: MAR-3 | Missing rate: 0.3
Overall missing rate: 29.93%
2095 / 7000 total values are missing.

Top variables by missing rate:


column,n_missing,missing_rate (%),n_unique,dtype,n_total
col4,542,77.43,158,float64,700
col1,318,45.43,382,float64,700
col5,298,42.57,402,float64,700
col3,277,39.57,423,float64,700
col2,247,35.29,453,float64,700


Method: Little's MCAR Test
Test Statistic p-value: 0.307300
Decision: Fail to reject the null hypothesis at significance level α = 0.05
Interpretation: There is insufficient evidence to suggest the data deviates from MCAR.
-----------------------------------------------------------
Mechanism: MAR-3 | Missing rate: 0.7
Overall missing rate: 61.29%
4290 / 7000 total values are missing.

Top variables by missing rate:


column,n_missing,missing_rate (%),n_unique,dtype,n_total
col1,700,100.0,0,float64,700
col4,700,100.0,0,float64,700
col5,696,99.43,4,float64,700
col3,647,92.43,53,float64,700
col2,578,82.57,122,float64,700


Method: Little's MCAR Test
Test Statistic p-value: 0.599871
Decision: Fail to reject the null hypothesis at significance level α = 0.05
Interpretation: There is insufficient evidence to suggest the data deviates from MCAR.
-----------------------------------------------------------
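
In the output above, the overall rate for MAR-1 (21.19%, 48.80%) falls below the requested rate even though the top columns sit near it. That pattern is what one would see if some columns are kept fully observed to drive the missingness, though the notebook does not confirm that this is what happens internally. The sketch below shows the general idea with one hypothetical predictor column and one hypothetical target column; it is a deliberately simple threshold rule, not the library's mechanism.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(700, 10))   # stand-in for X_train
rate = 0.3

predictor, target = 0, 1         # hypothetical column roles

# Mask the target wherever the (always observed) predictor falls in its
# top `rate` fraction, so missingness depends only on observed data.
threshold = np.quantile(X[:, predictor], 1 - rate)
mask = X[:, predictor] > threshold

X_mar = X.copy()
X_mar[mask, target] = np.nan

# The predictor's mean differs sharply between rows where the target is
# missing and rows where it is observed; Little's test looks for this kind
# of difference in observed means across missingness patterns.
mean_if_missing = X_mar[mask, predictor].mean()
mean_if_observed = X_mar[~mask, predictor].mean()

print(f"target column missing rate: {np.isnan(X_mar[:, target]).mean():.2%}")
print(f"overall missing rate:       {np.isnan(X_mar).mean():.2%}")
print(f"predictor mean | target missing:  {mean_if_missing:.3f}")
print(f"predictor mean | target observed: {mean_if_observed:.3f}")
```

Note that Little's test rejects MCAR only for MAR-1 in the runs above. For MAR-2 and MAR-3 it fails to reject, which means the test found insufficient evidence against MCAR, not that the data are MCAR.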


In [8]:
missing_type = "mnar"
mechanism_type_list = [1, 2, 3]
missing_rate_list = [0.3, 0.7]

for mechanism_type in mechanism_type_list:
    for missing_rate in missing_rate_list:
        print(f"Mechanism: {missing_type.upper()}-{mechanism_type} | Missing rate: {missing_rate}")
        
        # Initialize generator
        mecha = MissMechaGenerator(
            mechanism=missing_type,
            mechanism_type=mechanism_type,
            missing_rate=missing_rate,
            seed=42
        )

        # Fit and apply
        X_missing = mecha.fit_transform(X_train)

        # Report missing rate
        compute_missing_rate(X_missing)

        # Run Little's test
        pval = MCARTest(method="little")(X_missing)
        print("-----------------------------------------------------------")

Mechanism: MNAR-1 | Missing rate: 0.3
Overall missing rate: 18.51%
1296 / 7000 total values are missing.

Top variables by missing rate:


column,n_missing,missing_rate (%),n_unique,dtype,n_total
col0,448,64.0,252,float64,700
col9,424,60.57,276,float64,700
col6,424,60.57,276,float64,700
col1,0,0.0,700,float64,700
col2,0,0.0,700,float64,700


Method: Little's MCAR Test
Test Statistic p-value: 0.962032
Decision: Fail to reject the null hypothesis at significance level α = 0.05
Interpretation: There is insufficient evidence to suggest the data deviates from MCAR.
-----------------------------------------------------------
Mechanism: MNAR-1 | Missing rate: 0.7
Overall missing rate: 38.79%
2715 / 7000 total values are missing.

Top variables by missing rate:


column,n_missing,missing_rate (%),n_unique,dtype,n_total
col0,453,64.71,247,float64,700
col9,417,59.57,283,float64,700
col6,414,59.14,286,float64,700
col8,367,52.43,333,float64,700
col7,364,52.0,336,float64,700


Method: Little's MCAR Test
Test Statistic p-value: 0.755542
Decision: Fail to reject the null hypothesis at significance level α = 0.05
Interpretation: There is insufficient evidence to suggest the data deviates from MCAR.
-----------------------------------------------------------
Mechanism: MNAR-2 | Missing rate: 0.3
Overall missing rate: 30.90%
2163 / 7000 total values are missing.

Top variables by missing rate:


column,n_missing,missing_rate (%),n_unique,dtype,n_total
col2,235,33.57,465,float64,700
col8,228,32.57,472,float64,700
col4,226,32.29,474,float64,700
col6,225,32.14,475,float64,700
col0,216,30.86,484,float64,700


Method: Little's MCAR Test
Test Statistic p-value: 0.001424
Decision: Reject the null hypothesis at significance level α = 0.05
The data is unlikely to be Missing Completely At Random (MCAR).
-----------------------------------------------------------
Mechanism: MNAR-2 | Missing rate: 0.7
Overall missing rate: 70.07%
4905 / 7000 total values are missing.

Top variables by missing rate:


column,n_missing,missing_rate (%),n_unique,dtype,n_total
col8,506,72.29,194,float64,700
col0,502,71.71,198,float64,700
col4,501,71.57,199,float64,700
col2,487,69.57,213,float64,700
col7,486,69.43,214,float64,700


Method: Little's MCAR Test
Test Statistic p-value: 0.374838
Decision: Fail to reject the null hypothesis at significance level α = 0.05
Interpretation: There is insufficient evidence to suggest the data deviates from MCAR.
-----------------------------------------------------------
Mechanism: MNAR-3 | Missing rate: 0.3
Overall missing rate: 30.33%
2123 / 7000 total values are missing.

Top variables by missing rate:


column,n_missing,missing_rate (%),n_unique,dtype,n_total
col0,229,32.71,471,float64,700
col4,225,32.14,475,float64,700
col1,217,31.0,483,float64,700
col9,215,30.71,485,float64,700
col8,211,30.14,489,float64,700


Method: Little's MCAR Test
Test Statistic p-value: 0.708031
Decision: Fail to reject the null hypothesis at significance level α = 0.05
Interpretation: There is insufficient evidence to suggest the data deviates from MCAR.
-----------------------------------------------------------
Mechanism: MNAR-3 | Missing rate: 0.7
Overall missing rate: 69.57%
4870 / 7000 total values are missing.

Top variables by missing rate:


column,n_missing,missing_rate (%),n_unique,dtype,n_total
col0,493,70.43,207,float64,700
col3,491,70.14,209,float64,700
col7,491,70.14,209,float64,700
col8,491,70.14,209,float64,700
col6,491,70.14,209,float64,700


Method: Little's MCAR Test
Test Statistic p-value: 0.776942
Decision: Fail to reject the null hypothesis at significance level α = 0.05
Interpretation: There is insufficient evidence to suggest the data deviates from MCAR.
-----------------------------------------------------------
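
In the MNAR runs above, Little's test fails to reject MCAR in five of the six configurations (only MNAR-2 at rate 0.3 rejects, with p = 0.001424). One plausible reason is that MNAR missingness depends on the values that end up hidden, which a test based on observed data has little power to detect. The sketch below illustrates a simple self-masking rule in NumPy, where each column hides its own largest values. This is an assumed toy mechanism for illustration, not a description of the MNAR variants in `missmecha`.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(700, 10))   # stand-in for X_train
rate = 0.3

# Self-masking: each column hides its own top `rate` fraction of values.
thresholds = np.quantile(X, 1 - rate, axis=0)   # one cutoff per column
mask = X > thresholds                           # broadcasts across rows

X_mnar = X.copy()
X_mnar[mask] = np.nan

# Because the largest values are the ones removed, the observed mean of
# every column is biased downward relative to the complete data.
true_means = X.mean(axis=0)
observed_means = np.nanmean(X_mnar, axis=0)

print(f"overall missing rate: {np.isnan(X_mnar).mean():.2%}")
print("mean shift per column:", np.round(observed_means - true_means, 3))
```

The downward shift in the observed means shows why MNAR data can bias downstream estimates even when a test such as Little's does not flag a departure from MCAR.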
