# KnockoffTable
(`knockoff.sdk.table:KnockoffTable`)

This is the primary class for configuring how knockoff populates a table. 

* [Factories](#Factories)
    * [ColumnFactory](#ColumnFactory) (`knockoff.sdk.factory.column:ColumnFactory`)
    * [CollectionsFactory](#CollectionsFactory) (`knockoff.sdk.factory.collections:CollectionsFactory`)
    * [KnockoffDataFrameFactory](#KnockoffDataFrameFactory) (`knockoff.sdk.factory.collections:KnockoffDataFrameFactory`) 
        * randomly sampling input DataFrame rows
        * cycling through input DataFrame rows
    * [KnockoffTableFactory](#KnockoffTableFactory) (`knockoff.sdk.factory.collections:KnockoffTableFactory`)
* [KnockoffUniqueConstraint](#KnockoffUniqueConstraint)
* [Autoloading](#Autoloading)


### <a name="Factories"></a> Factories

A list of `factories` is passed to the `__init__` of KnockoffTable and used at build time to generate rows. The `size` parameter declares how many rows are generated, and the factories are called to generate the data for each row.
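
As a rough mental model (a plain-Python sketch, not knockoff's actual implementation), each factory can be thought of as a callable that returns a dictionary of column values, and building a table amounts to calling every factory for every row. The `build_rows` helper and the lambda factories below are hypothetical names used only for illustration.

```python
import itertools
import random


def build_rows(factories, size):
    """Call every factory once per row and merge the dicts they return."""
    rows = []
    for _ in range(size):
        row = {}
        for factory in factories:
            row.update(factory())
        rows.append(row)
    return rows


counter = itertools.count(1)
factories = [
    lambda: {"id": next(counter)},       # stand-in for a ColumnFactory
    lambda: {"score": random.random()},  # stand-in for another ColumnFactory
]

rows = build_rows(factories, size=3)
for row in rows:
    print(row)
```

Each printed row contains both an `id` and a `score` key, because the dictionaries returned by the individual factories are merged into one row.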



#### <a name="ColumnFactory"></a>ColumnFactory

In [1]:
import random
from knockoff.sdk.factory.column import ColumnFactory

# create a factory that generates a random value for "some_column" using the random.random function
factory = ColumnFactory('some_column', random.random)
for i in range(5):
    print(f"call[{i}]: {factory()}")

call[0]: {'some_column': 0.5000619168058782}
call[1]: {'some_column': 0.9885946890702473}
call[2]: {'some_column': 0.41250444122739394}
call[3]: {'some_column': 0.4462999921807126}
call[4]: {'some_column': 0.06017712225706928}


#### <a name="CollectionsFactory"></a>CollectionsFactory

In [2]:
from knockoff.sdk.factory.collections import CollectionsFactory

# this can be any callable that returns a dictionary
def func():
    return {'col1': random.randint(0,10),
            'col2': random.random()}

factory = CollectionsFactory(func)
for i in range(5):
    print(f"call[{i}]: {factory()}")

call[0]: {'col1': 4, 'col2': 0.5771082134122193}
call[1]: {'col1': 10, 'col2': 0.9443929581234238}
call[2]: {'col1': 9, 'col2': 0.5833487911234037}
call[3]: {'col1': 10, 'col2': 0.16599398285529154}
call[4]: {'col1': 5, 'col2': 0.3534517643813895}


#### <a name="KnockoffDataFrameFactory"></a>KnockoffDataFrameFactory

This class takes a DataFrame as input and uses it to generate rows. By default, each call to the factory **randomly samples a row from the input DataFrame**.

In [3]:
import pandas as pd
from knockoff.sdk.factory.collections import KnockoffDataFrameFactory

# Create an input DataFrame
df = pd.DataFrame({letter: [f"{letter}{i}" for i in range(5)] for letter in ['a', 'b', 'c']})
display(df)

factory = KnockoffDataFrameFactory(df)
for i in range(5):
    print(f"call[{i}]: {factory()}")

|   | a  | b  | c  |
|---|----|----|----|
| 0 | a0 | b0 | c0 |
| 1 | a1 | b1 | c1 |
| 2 | a2 | b2 | c2 |
| 3 | a3 | b3 | c3 |
| 4 | a4 | b4 | c4 |


call[0]: {'a': 'a1', 'b': 'b1', 'c': 'c1'}
call[1]: {'a': 'a2', 'b': 'b2', 'c': 'c2'}
call[2]: {'a': 'a2', 'b': 'b2', 'c': 'c2'}
call[3]: {'a': 'a1', 'b': 'b1', 'c': 'c1'}
call[4]: {'a': 'a2', 'b': 'b2', 'c': 'c2'}


The `next_strategy_factory` or `next_strategy_callable` parameter can be used to change the sampling behavior. For example, `cycle_df_factory` can be used to **cycle through input DataFrame rows** instead of sampling them randomly.

In [4]:
from knockoff.sdk.factory.next_strategy.df import cycle_df_factory
factory = KnockoffDataFrameFactory(df, next_strategy_factory=cycle_df_factory)
for i in range(10):
    print(f"call[{i}]: {factory()}")

call[0]: {'a': 'a0', 'b': 'b0', 'c': 'c0'}
call[1]: {'a': 'a1', 'b': 'b1', 'c': 'c1'}
call[2]: {'a': 'a2', 'b': 'b2', 'c': 'c2'}
call[3]: {'a': 'a3', 'b': 'b3', 'c': 'c3'}
call[4]: {'a': 'a4', 'b': 'b4', 'c': 'c4'}
call[5]: {'a': 'a0', 'b': 'b0', 'c': 'c0'}
call[6]: {'a': 'a1', 'b': 'b1', 'c': 'c1'}
call[7]: {'a': 'a2', 'b': 'b2', 'c': 'c2'}
call[8]: {'a': 'a3', 'b': 'b3', 'c': 'c3'}
call[9]: {'a': 'a4', 'b': 'b4', 'c': 'c4'}
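
The difference between the two strategies can be sketched without knockoff at all. The snippet below is only an illustration under assumed names (`make_sampler`, `make_cycler`, and `records` are hypothetical, and a list of dictionaries stands in for the DataFrame). It is not how `KnockoffDataFrameFactory` or `cycle_df_factory` are implemented.

```python
import itertools
import random

# Stand-in for the rows of the input DataFrame used above.
records = [{"a": f"a{i}", "b": f"b{i}", "c": f"c{i}"} for i in range(5)]


def make_sampler(rows, rng=random):
    """Random strategy: each call returns a randomly chosen row."""
    return lambda: dict(rng.choice(rows))


def make_cycler(rows):
    """Cycle strategy: each call returns the next row, wrapping around."""
    iterator = itertools.cycle(rows)
    return lambda: dict(next(iterator))


sample = make_sampler(records)
cycle = make_cycler(records)

sampled = [sample() for _ in range(3)]
cycled = [cycle() for _ in range(7)]

print([row["a"] for row in sampled])  # varies from run to run
print([row["a"] for row in cycled])   # ['a0', 'a1', 'a2', 'a3', 'a4', 'a0', 'a1']
```

Random sampling can return the same row several times before every row has been seen, whereas cycling visits each row once before repeating.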


#### <a name="KnockoffTableFactory"></a>KnockoffTableFactory
The KnockoffTableFactory behaves very similarly to the KnockoffDataFrameFactory, except that it takes another KnockoffTable as input instead of a DataFrame. When this factory is used, the dependency on that table must be declared when the KnockoffTable is provided to the KnockoffDB.
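
The dependency has to be declared because the source table must be built before any factory can draw rows from it. The following is a conceptual sketch in plain Python, not knockoff's API: the `builders` and `dependencies` dictionaries and both builder functions are hypothetical, and the standard library's `graphlib.TopologicalSorter` stands in for whatever ordering KnockoffDB applies internally.

```python
import random
from graphlib import TopologicalSorter  # available in Python 3.9+

built = {}  # table name -> list of row dicts


def build_person():
    return [{"person_id": i, "name": f"person-{i}"} for i in range(1, 4)]


def build_pet():
    # Each pet row draws its owner from the already-built "person" table,
    # which is the role a table-backed factory plays.
    rows = []
    for i in range(1, 6):
        owner = random.choice(built["person"])
        rows.append({"pet_id": i, "owner_id": owner["person_id"]})
    return rows


builders = {"pet": build_pet, "person": build_person}
# Maps each table to the set of tables that must be built before it.
dependencies = {"pet": {"person"}, "person": set()}

build_order = list(TopologicalSorter(dependencies).static_order())
for name in build_order:
    built[name] = builders[name]()

print(build_order)  # ['person', 'pet']
```

Without the declared dependency, nothing would prevent `pet` from being built first, and `build_pet` would fail because `built["person"]` would not exist yet.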


#### <a name="KnockoffTable-Example"></a> KnockoffTable Example

`columns` must be provided to the `__init__` to determine which generated fields are used as columns (factories may generate fields that go unused). `columns` can be omitted if the `autoload` flag is set to `True`, in which case the columns are reflected from the database table.

In [5]:
from knockoff.sdk.table import KnockoffTable
from knockoff.sdk.factory.column import ColumnFactory, ChoiceFactory, FakerFactory


table = KnockoffTable(
    "person",
    columns=["name", "address", "gender", "age"],
    factories=[
        ColumnFactory("name", FakerFactory("name")),
        ColumnFactory("address", FakerFactory("address")),
        ColumnFactory("gender", ChoiceFactory(["male", "female"])),
        ColumnFactory("age", FakerFactory("pyint", min_value=0, max_value=100)),
    ],
    size=10,
)

display(table.build())

|   | name | address | gender | age |
|---|------|---------|--------|-----|
| 0 | Joshua Taylor | 935 Anderson Lane\nAguilarmouth, NH 83805 | male | 62 |
| 1 | Andrew Smith | PSC 3613, Box 7286\nAPO AP 82138 | female | 47 |
| 2 | Melissa Schroeder | 23917 Mcdonald Path Apt. 242\nWest Mark, ID 73045 | male | 6 |
| 3 | Michael Anderson | 8132 Horton Avenue Suite 634\nLake Melissaland... | male | 75 |
| 4 | Nancy Johnson | 207 Juan Islands Suite 189\nJaniceside, CT 33016 | female | 45 |
| 5 | Amy Rosales | 15030 Beard Club Apt. 873\nPort Matthewbury, O... | male | 24 |
| 6 | Dawn Lynch | 201 Chad Valleys Suite 416\nEast Dawn, WY 39184 | female | 92 |
| 7 | Jenna Davis | 62813 Kimberly Meadows\nWest Heatherchester, T... | female | 35 |
| 8 | Bobby Diaz | 8555 Watkins Brooks Apt. 607\nSharonmouth, OH ... | male | 65 |
| 9 | Matthew Jones MD | 57744 Angelica Ramp\nBrianashire, WV 41784 | female | 66 |


If multiple factories provide the same column, the KnockoffTable applies them in the order given in the `factories` parameter, i.e. factories towards the end of the list take precedence. A ColumnFactory or CollectionsFactory can declare a dependency on another column, whose value is then passed to the factory as a kwarg. To do so, the dependent factory must come after the factory that generates the column it depends on. See the example below for reference.

In [6]:

def split_address(address):
    street_address, other = address.split('\n')
    return {
        "address": street_address,
        "other_address": other
    }
    
table = KnockoffTable(
    "person",
    columns=["name", "address", "gender", "age", "other_address"],
    factories=[
        ColumnFactory("name", FakerFactory("name")),
        ColumnFactory("address", FakerFactory("address")),
        ColumnFactory("gender", ChoiceFactory(["male", "female"])),
        ColumnFactory("age", FakerFactory("pyint", min_value=0, max_value=100)),
        # this factory will take the address generated and split into columns
        # including a column that will replace the original address with just
        # the street address
        CollectionsFactory(split_address, depends_on=["address"])
    ],
    size=10,
)

display(table.build())

|   | name | address | gender | age | other_address |
|---|------|---------|--------|-----|---------------|
| 0 | Heather Chung | 9509 James Rapids | female | 74 | Rebeccaborough, OR 73595 |
| 1 | Stephen Hayes | 188 David Run Suite 795 | male | 49 | New Linda, WA 05108 |
| 2 | Thomas Huerta | 481 Rivera Ford | female | 26 | Lake Bradview, FL 11180 |
| 3 | Samuel Lawson | 61652 Amy Road Suite 256 | female | 39 | Markview, WV 39420 |
| 4 | Richard Cummings | 468 Michael Skyway | male | 81 | Kaylafort, OR 85551 |
| 5 | Stephanie Clark | 0885 Reeves Camp Suite 040 | male | 86 | Dianaland, CO 88542 |
| 6 | David Hahn | 1123 Hernandez Corner | male | 61 | South Toddport, KS 11955 |
| 7 | James Davis | 674 Monica Dam | male | 84 | Meganport, IA 84426 |
| 8 | Laura Hunter | 2304 Marvin Inlet Suite 853 | female | 4 | Olsonchester, TX 69125 |
| 9 | Stephanie Gamble | 129 Howard Knolls Suite 070 | female | 72 | Gonzalezfort, CT 76785 |
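
The ordering rules can be made concrete with a small stand-alone model. This is a simplified sketch under assumptions, not knockoff's code: `build_row` is a hypothetical helper, and each factory is represented as a `(function, depends_on)` pair rather than a ColumnFactory or CollectionsFactory instance.

```python
def build_row(factories, columns):
    """Apply (func, depends_on) pairs in order; later results overwrite earlier ones."""
    row = {}
    for func, depends_on in factories:
        # Values of the declared dependencies are passed as keyword arguments,
        # so they must already have been generated by an earlier factory.
        kwargs = {name: row[name] for name in depends_on}
        row.update(func(**kwargs))
    # Keep only the declared columns; other generated fields are dropped.
    return {name: row[name] for name in columns}


def make_address():
    return {"address": "9509 James Rapids\nRebeccaborough, OR 73595"}


def split_address(address):
    street_address, other = address.split("\n")
    return {"address": street_address, "other_address": other, "unused": True}


row = build_row(
    factories=[(make_address, []), (split_address, ["address"])],
    columns=["address", "other_address"],
)
print(row)
# {'address': '9509 James Rapids', 'other_address': 'Rebeccaborough, OR 73595'}
```

In this model, listing `split_address` before `make_address` raises a `KeyError`, because the `address` value it depends on has not been generated yet. The `unused` field shows that a factory can produce fields that are not part of `columns`.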


### <a name="KnockoffUniqueConstraint"></a> KnockoffUniqueConstraint

Constraints such as the KnockoffUniqueConstraint can be provided to the KnockoffTable and are enforced whenever a row is generated. A generated row must satisfy the unique constraint; otherwise it is rejected. If `attempt_limit` is reached while trying to generate a row that satisfies all constraints, an `AttemptLimitReached` error is raised. If `None` is passed for `attempt_limit` in the `__init__` of KnockoffTable, the default of 1000000 is used.

In [7]:
from knockoff.sdk.constraints import KnockoffUniqueConstraint

table = KnockoffTable(
    "person",
    columns=["id", "name"],
    factories=[
        ColumnFactory("id", FakerFactory("pyint", min_value=1, max_value=5)),
        ColumnFactory("name", FakerFactory("name"))
    ],
    size=5,
    constraints=[KnockoffUniqueConstraint(['id'])]
)
display(table.build())

|   | id | name |
|---|----|------|
| 0 | 4 | Debbie Hill |
| 1 | 3 | Rebecca Bell |
| 2 | 5 | Michael Kelly |
| 3 | 2 | Christine Hernandez |
| 4 | 1 | Stephanie Wright |
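
The reject-and-retry behavior described above can be sketched in plain Python. This is an illustrative model only: `build_unique_rows` is a hypothetical helper, the `AttemptLimitReached` class defined here is a local stand-in rather than knockoff's exception, and applying the limit per row is an assumption about how attempts are counted.

```python
import random


class AttemptLimitReached(Exception):
    """Local stand-in for the error described above."""


def build_unique_rows(make_row, unique_on, size, attempt_limit=1000000):
    seen = set()  # keys of the rows accepted so far
    rows = []
    for _ in range(size):
        for _attempt in range(attempt_limit):
            row = make_row()
            key = tuple(row[name] for name in unique_on)
            if key not in seen:  # accept the row only if its key is new
                seen.add(key)
                rows.append(row)
                break
        else:
            # Assumption: the limit applies to each row separately.
            raise AttemptLimitReached(f"no unique row after {attempt_limit} attempts")
    return rows


def make_row():
    return {"id": random.randint(1, 5)}


rows = build_unique_rows(make_row, unique_on=["id"], size=5)
print(sorted(row["id"] for row in rows))  # [1, 2, 3, 4, 5]

# Six unique ids cannot be drawn from five possible values, so this fails.
error_message = None
try:
    build_unique_rows(make_row, unique_on=["id"], size=6, attempt_limit=1000)
except AttemptLimitReached as error:
    error_message = str(error)
print(error_message)
```

The second call shows why a limit is needed: when the factories cannot produce enough distinct values for the requested `size`, the build would otherwise retry forever.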


### <a name="Autoloading"></a> Autoloading

Using `autoload=True` in the `__init__` of KnockoffTable allows the schema, including any unique constraints, to be reflected automatically from the database. This setting requires the KnockoffTable instance to be prepared with a `KnockoffDatabaseService`, which provides the database operations that enable autoloading. Preparation is done via the KnockoffTable's `prepare` method, which takes an optional database service and must be called before the `build` method. In most cases, the `prepare` and `build` methods of the KnockoffTable are called indirectly by a `KnockoffDB` instance.
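
To illustrate what reflecting a schema means, the sketch below reads column names and unique constraints from an in-memory SQLite table using only the standard library. It is not how knockoff or its `KnockoffDatabaseService` performs autoloading. The `reflect_table` helper and the `person` table definition are hypothetical and serve only to show the kind of information that reflection recovers.

```python
import sqlite3


def reflect_table(connection, table_name):
    """Return (column names, unique constraints) for a SQLite table."""
    # table_name is interpolated directly, so it must come from trusted code.
    table_info = connection.execute(f'PRAGMA table_info("{table_name}")')
    columns = [row[1] for row in table_info]  # row = (cid, name, type, ...)

    unique_constraints = []
    index_list = connection.execute(f'PRAGMA index_list("{table_name}")').fetchall()
    for _seq, index_name, is_unique, *_rest in index_list:
        if is_unique:
            index_info = connection.execute(f'PRAGMA index_info("{index_name}")')
            unique_constraints.append([row[2] for row in index_info])
    return columns, unique_constraints


connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE person (id INTEGER PRIMARY KEY, email TEXT UNIQUE, name TEXT)"
)

columns, unique_constraints = reflect_table(connection, "person")
print(columns)             # ['id', 'email', 'name']
print(unique_constraints)  # [['email']]
```

With this information available, neither the column list nor the unique constraints need to be repeated in code, which is the convenience that `autoload=True` offers.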