In [1]:
import numpy as np
import pandas as pd
import blindat as bd

Create a `pandas.DataFrame()` with four columns of random data:


In [None]:
# data params
COLUMNS = ["A", "B", "C", "D"]
NUM_ROWS = int(1e7)
DATA_SEED = 19421127

# generate data
np.random.seed(DATA_SEED)
data = np.random.rand(NUM_ROWS, len(COLUMNS))
df = pd.DataFrame(data, columns=COLUMNS)

df.head()

### `generate_rules()`

Rules are defined by a specification that describes which columns to blind using a linear transform: `blinded_data = scale * data + offset`.  

The `offset` and `scale` parameters can be fixed or randomly sampled from a given range.  Randomness ensures the transform is not known to the user.
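To make the transform concrete, here is a minimal sketch of the idea in plain NumPy. It is not `blindat`'s implementation: `sample_param` and `linear_blind` are hypothetical helpers, and uniform sampling over the range is an assumption made for illustration.

```python
import numpy as np

def sample_param(rng, spec):
    """Return a fixed number as-is, or draw once from a (low, high) range.

    Uniform sampling is an assumption for this sketch.
    """
    if isinstance(spec, tuple):
        low, high = spec
        return rng.uniform(low, high)
    return float(spec)

def linear_blind(values, offset=0.0, scale=1.0, seed=None):
    """Hypothetical helper: apply blinded = scale * values + offset."""
    rng = np.random.default_rng(seed)
    offset_value = sample_param(rng, offset)
    scale_value = sample_param(rng, scale)
    return scale_value * np.asarray(values, dtype=float) + offset_value

values = np.array([0.1, 0.5, 0.9])
blinded = linear_blind(values, offset=(10.0, 20.0), seed=42)

# With an offset-only transform every value shifts by the same hidden amount,
# so differences between values are unchanged.
print(np.diff(values), np.diff(blinded))
```

With only an offset, the absolute values are hidden but relative differences survive, which is often what you want when checking an analysis without seeing the final numbers.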

The simplest way to specify rules is with a column name (or a list of names) and global ranges for `offset` and/or `scale`.

In [3]:
# single column with a global offset range
rules = bd.generate_rules("A", offset=(10.0, 20.0), random_seed=42)

### `inspect()`

You shouldn't be looking at the rule parameters, since that defeats the purpose of blinding.  But if you have a legitimate reason, use `inspect()`.

In [None]:
bd.inspect(rules)

You don't need to save the rules, because fixing the `random_seed` recreates them.  If you really want to store them, consider using `dill` (the standard `pickle` module cannot serialize lambda functions).  Alternatively, save the output of `inspect()`.
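Both points can be illustrated with the standard library alone. This sketch is not `blindat` code: `draw_offset` is a hypothetical stand-in for a seeded rule parameter, and the lambda stands in for whatever callable a rule might hold.

```python
import pickle
import random

def draw_offset(seed, low=10.0, high=20.0):
    """Hypothetical stand-in for a seeded rule parameter (not blindat code)."""
    return random.Random(seed).uniform(low, high)

# The same seed always yields the same parameter, so nothing needs to be saved.
same_seed_matches = draw_offset(42) == draw_offset(42)

# A lambda has no importable name, so the standard pickle module rejects it.
transform = lambda x: x + draw_offset(42)
try:
    pickle.dumps(transform)
    lambda_pickled = True
except (pickle.PicklingError, AttributeError):
    lambda_pickled = False

print(same_seed_matches, lambda_pickled)
```

Since the seed alone reproduces the parameters, recording the seed and the rule specification is usually enough, and serialization becomes a convenience rather than a requirement.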


### `blind()`

In [None]:
# blind data
df1 = bd.blind(df, rules)
df1.head()