<div class="alert alert-warning" role="alert">
    <b style="font-size: 1.5em;">🚧 Warning</b>
    <br>
    <br>
    <p>
        <i>
        "<b>Implicit</b> refers to anything that is understood to be included  
        without being directly or explicitly stated."
        </i>
    </p>
    <p>
    An <code>implicit missing value</code> is a value that  
    <b>should be included</b> in the dataset under analysis but is absent  
    <b>without that absence being explicitly stated</b> or <b>specified</b>:  
    no row or cell marks it as missing.  
    Such values usually surface when pivoting the data  
    or counting the occurrences of variable combinations in the study.  
    </p>
</div>

In [18]:
import pandas as pd
import sys
import pyprojroot
import numpy as np
import janitor
sys.path.append(str(pyprojroot.here()))
from src.utils import make_dir_function
from src.pandas_missing_extension import MissingMethods

In [19]:
implicit_to_explicit_df = pd.DataFrame.from_dict(
    data={
        "name": ["lynn", "lynn", "lynn", "zelda"],
        "time": ["morning", "afternoon", "night", "morning"],
        "value": [350, 310, np.nan, 320]
    }
)

implicit_to_explicit_df

Unnamed: 0,name,time,value
0,lynn,morning,350.0
1,lynn,afternoon,310.0
2,lynn,night,
3,zelda,morning,320.0


### Strategies for Identifying Implicit Missing Values

Pivot the data table

In [20]:
pivoted_data = implicit_to_explicit_df.pivot(index="name", columns="time", values="value").reset_index()
pivoted_data

time,name,afternoon,morning,night
0,lynn,310.0,350.0,
1,zelda,,320.0,


Quantify occurrences of n-tuples

In [21]:
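(
    implicit_to_explicit_df.value_counts(
        subset=['name']  # count the rows recorded for each name
    ).reset_index(
        name="n"
    ).query(
        "n < 3"  # 3 = number of distinct `time` values; fewer rows means missing combinations
    )
)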

Unnamed: 0,name,n
1,zelda,1
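
Both strategies above show that something is missing, but neither lists exactly which combinations are absent. The following is a minimal sketch in plain pandas, assuming the full set of expected combinations is the Cartesian product of the observed `name` and `time` values. The variable names `all_pairs`, `observed_pairs`, and `implicit_pairs` are illustrative and do not appear in the original notebook.

```python
import numpy as np
import pandas as pd

# Same toy data as in the notebook.
df = pd.DataFrame(
    {
        "name": ["lynn", "lynn", "lynn", "zelda"],
        "time": ["morning", "afternoon", "night", "morning"],
        "value": [350, 310, np.nan, 320],
    }
)

# Every (name, time) pair that could exist, built from the observed values.
all_pairs = pd.MultiIndex.from_product(
    [df["name"].unique(), df["time"].unique()],
    names=["name", "time"],
)

# Pairs that actually appear as rows in the data.
observed_pairs = pd.MultiIndex.from_frame(df[["name", "time"]])

# Pairs with no row at all are the implicit missing values.
implicit_pairs = all_pairs.difference(observed_pairs).to_frame(index=False)
print(implicit_pairs)
```

Note that `lynn`/`night` is not reported: that row exists with an explicit `NaN`, so it is an explicit missing value rather than an implicit one.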


### Expose implicit missing rows as explicit

<div class="alert alert-info">
    <b style="font-size: 1.5em;">📘 Information</b>
    <p>
       <a href="https://pyjanitor-devs.github.io/pyjanitor/api/functions/#janitor.functions.complete.complete" class="alert-link"><code>janitor.complete()</code></a> is modeled after the <a href="https://tidyr.tidyverse.org/reference/complete.html" class="alert-link"><code>complete()</code></a> function from the <a href="https://tidyr.tidyverse.org/index.html" class="alert-link"><code>tidyr</code></a> package and serves as a <i>wrapper</i> around <a href="https://pyjanitor-devs.github.io/pyjanitor/api/functions/#janitor.functions.expand_grid.expand_grid" class="alert-link"><code>janitor.expand_grid()</code></a>, <a href="https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.merge.html" class="alert-link"><code>pd.merge()</code></a>, and <a href="https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.fillna.html" class="alert-link"><code>pd.fillna()</code></a>. In a way, it is the opposite of <a href="https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.dropna.html" class="alert-link"><code>pd.dropna()</code></a>, as it exposes implicitly missing rows, making them explicit.
    </p>
    <p>
    Possible inputs include column name combinations, a list/tuple of column names, or even a dictionary of column names and new values.
    </p>
    <p>
    <a href="https://pandas.pydata.org/docs/user_guide/advanced.html"><code>MultiIndex</code></a> columns are not supported.
    </p>
</div>
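
To make the "expand the grid, merge, fill" idea concrete, here is a rough sketch of the same three steps in plain pandas. It is not pyjanitor's actual implementation; it only assumes the behavior described above. The names `grid`, `completed`, and `filled` are illustrative.

```python
import numpy as np
import pandas as pd

# Same toy data as in the notebook.
df = pd.DataFrame(
    {
        "name": ["lynn", "lynn", "lynn", "zelda"],
        "time": ["morning", "afternoon", "night", "morning"],
        "value": [350, 310, np.nan, 320],
    }
)

# Step 1: expand the grid of all name/time combinations (a cross join).
grid = pd.merge(
    df[["name"]].drop_duplicates(),
    df[["time"]].drop_duplicates(),
    how="cross",
)

# Step 2: left-merge the original rows onto the grid.
# Combinations without a matching row get NaN in "value".
completed = grid.merge(df, on=["name", "time"], how="left")

# Step 3 (optional): replace every NaN in "value" with a fill value.
filled = completed.fillna({"value": 0})

print(completed)
print(filled)
```

The cross join requires pandas 1.2 or later. The left merge keeps the grid's row order, so rows appear grouped by `name` in order of first appearance.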


In [24]:
implicit_to_explicit_df.complete(
    "name",
    "time"
)

Unnamed: 0,name,time,value
0,lynn,morning,350.0
1,lynn,afternoon,310.0
2,lynn,night,
3,zelda,morning,320.0
4,zelda,afternoon,
5,zelda,night,


Limit the exposure of n-tuples of missing values

In [27]:
implicit_to_explicit_df.complete(
    {"name": ["lynn", "zelda"]}, 
    {"time": ["morning", "afternoon"]},
    sort=True
)

Unnamed: 0,name,time,value
0,lynn,afternoon,310.0
1,lynn,morning,350.0
2,zelda,afternoon,
3,zelda,morning,320.0
4,lynn,night,


Fill missing values with a given value

In [28]:
implicit_to_explicit_df.complete(
    "name",
    "time",
    fill_value=0
)

Unnamed: 0,name,time,value
0,lynn,morning,350.0
1,lynn,afternoon,310.0
2,lynn,night,0.0
3,zelda,morning,320.0
4,zelda,afternoon,0.0
5,zelda,night,0.0


Fill only the newly exposed (implicit) missing values

In [29]:
implicit_to_explicit_df.complete(
    "name",
    "time",
    fill_value=0,
    explicit=False
)

Unnamed: 0,name,time,value
0,lynn,morning,350.0
1,lynn,afternoon,310.0
2,lynn,night,
3,zelda,morning,320.0
4,zelda,afternoon,0.0
5,zelda,night,0.0
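
With `explicit=False`, only the rows created by the completion are filled, while the `NaN` that was already recorded for `lynn`/`night` is left alone. A similar result can be sketched in plain pandas with the `indicator` option of `merge`. This is an illustrative approximation, not pyjanitor's internals, and the names `grid`, `completed`, and `newly_exposed` are assumptions for the example.

```python
import numpy as np
import pandas as pd

# Same toy data as in the notebook.
df = pd.DataFrame(
    {
        "name": ["lynn", "lynn", "lynn", "zelda"],
        "time": ["morning", "afternoon", "night", "morning"],
        "value": [350, 310, np.nan, 320],
    }
)

# All name/time combinations.
grid = pd.merge(
    df[["name"]].drop_duplicates(),
    df[["time"]].drop_duplicates(),
    how="cross",
)

# indicator=True adds a "_merge" column recording where each row came from.
completed = grid.merge(df, on=["name", "time"], how="left", indicator=True)

# "left_only" rows exist only in the grid: these were implicitly missing.
newly_exposed = completed["_merge"] == "left_only"

# Fill just those rows; the pre-existing NaN for lynn/night stays untouched.
completed.loc[newly_exposed, "value"] = 0
completed = completed.drop(columns="_merge")

print(completed)
```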
