# Usage Example

In this notebook we will show the most basic usage of **TGAN** in order to generate samples from a
given dataset.

### 1. Load the data

The first step is to load the data which we will use to fit TGAN. In order to do so, we will first
import the function `tgan.data.load_demo_data` and call it with the name of the dataset that we want to load.

In this case, we will load the `census` dataset, which we will use during the subsequent steps, and obtain two objects:

1. `data` will contain a `pandas.DataFrame` with the table of data from the `census` dataset ready to be used to fit the model.

2. `continuous_columns` will contain a `list` with the indices of continuous columns.

from tgan.data import load_demo_data

data, continuous_columns = load_demo_data('census')

data.head(3).T

In [1]:
import pandas as pd

In [2]:
Data = pd.read_csv('./Data/train.csv')


In [3]:
X = Data.iloc[:, :-1]
y = Data.iloc[:, -1]

In [4]:
print(X.columns.values)

['No of Days Stayed' 'Net Amt' 'settlement_delay' 'age' 'Start_year'
 'Start_month' 'Start_week' 'Start_day' 'Start_dayofweek' 'Reported_year'
 'Reported_month' 'Reported_week' 'Reported_day' 'Reported_dayofweek'
 'Commencement_year' 'Commencement_month' 'Commencement_week'
 'Commencement_day' 'Commencement_dayofweek' 'Termination_year'
 'Termination_month' 'Termination_week' 'Termination_day'
 'Termination_dayofweek' 'Benefit Type_MEDICAL' 'Benefit Type_SURGICAL'
 'Claim Status_Cancelled / Rejected' 'Claim Status_Outstanding'
 'Claim Status_Paid' 'Claim Status_Pre-Auth Approved'
 'ClaimEvent_TPA_ID_51381' 'ClaimEvent_TPA_ID_59682'
 'Primary Diagnosis Code_-N' 'Primary Diagnosis Code_M1'
 'Primary Diagnosis Code_M2' 'Primary Diagnosis Code_M3'
 'Primary Diagnosis Code_M4' 'Primary Diagnosis Code_M5'
 'Primary Diagnosis Code_M6' 'Primary Diagnosis Code_M7'
 'Primary Diagnosis Code_M8' 'Primary Diagnosis Code_MC'
 'Primary Diagnosis Code_MG' 'Primary Diagnosis Code_MM'
 'Primary Diagnosi

In [5]:
continuous_columns = [0,1,2,3]

In [6]:
Data = X
Data['FraudFound'] = y

In [7]:
Data.head()

Unnamed: 0,No of Days Stayed,Net Amt,settlement_delay,age,Start_year,Start_month,Start_week,Start_day,Start_dayofweek,Reported_year,...,Residence Location_PATIALA,Residence Location_RUPNAGAR,Residence Location_S.A.S Nagar,Residence Location_SANGRUR,Residence Location_SRI MUKTSAR SAHIB,Residence Location_Shahid Bhagat Singh Nagar,Residence Location_Tarn Taran,Reject status more than 3 months_0,Reject status more than 3 months_1,FraudFound
0,0.012346,0.089984,0.366211,1.0,1.0,0.272727,0.27451,0.233333,0.333333,1.0,...,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,1
1,0.00823,0.02134,0.445312,1.0,1.0,0.090909,0.098039,0.2,0.666667,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0
2,0.024691,0.03073,0.396484,1.0,1.0,0.0,0.039216,0.433333,0.166667,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0
3,0.012346,0.04268,0.359375,1.0,1.0,0.0,0.039216,0.566667,0.833333,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0
4,0.0,0.005691,0.456055,1.0,0.0,1.0,0.0,0.966667,0.0,1.0,...,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0


### 2. Create a TGAN instance

The next step is to import TGAN and create an instance of the model.

To do so, we need to import the `tgan.model.TGANModel` class and call it.

Called with only `continuous_columns`, this creates a TGAN instance with the default parameters; the cell below sets every parameter explicitly instead.


In [None]:
from tgan.model import TGANModel


In [None]:

tgan = TGANModel(
    continuous_columns,
    output='output',
    gpu=None,
    max_epoch=5,
    steps_per_epoch=100,
    save_checkpoints=True,
    restore_session=True,
    batch_size=100,
    z_dim=200,
    noise=0.2,
    l2norm=0.00001,
    learning_rate=0.001,
    num_gen_rnn=100,
    num_gen_feature=100,
    num_dis_layers=1,
    num_dis_hidden=100,
    optimizer='AdamOptimizer'
)

### 3. Fit the model

The third step is to pass the data that we have loaded previously to the `TGANModel.fit` method to
start the fitting.

This process does not return anything; however, the progress of the fitting will be printed to the screen.

**NOTE** Depending on the performance of the system you are running, and the parameters selected
for the model, this step can take up to a few hours.


In [9]:
fraudulent = Data.loc[Data['FraudFound'] == 1]
non_fraudulent = Data.loc[Data['FraudFound'] == 0]

In [10]:
fraudulent.head()

Unnamed: 0,No of Days Stayed,Net Amt,settlement_delay,age,Start_year,Start_month,Start_week,Start_day,Start_dayofweek,Reported_year,...,Residence Location_PATIALA,Residence Location_RUPNAGAR,Residence Location_S.A.S Nagar,Residence Location_SANGRUR,Residence Location_SRI MUKTSAR SAHIB,Residence Location_Shahid Bhagat Singh Nagar,Residence Location_Tarn Taran,Reject status more than 3 months_0,Reject status more than 3 months_1,FraudFound
0,0.012346,0.089984,0.366211,1.0,1.0,0.272727,0.27451,0.233333,0.333333,1.0,...,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,1
8,0.00823,0.04268,0.34375,0.0,1.0,0.272727,0.313725,0.666667,0.166667,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1
15,0.0,0.00626,0.327148,1.0,1.0,0.545455,0.568627,0.733333,0.5,1.0,...,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,1
18,0.00823,0.0,0.34375,0.0,1.0,0.0,0.019608,0.2,0.166667,1.0,...,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1
19,0.00823,0.0,0.34375,0.0,1.0,1.0,0.960784,0.233333,0.166667,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1


In [11]:
non_fraudulent.shape

(252147, 99)

In [12]:
non_fraudulent.head()

Unnamed: 0,No of Days Stayed,Net Amt,settlement_delay,age,Start_year,Start_month,Start_week,Start_day,Start_dayofweek,Reported_year,...,Residence Location_PATIALA,Residence Location_RUPNAGAR,Residence Location_S.A.S Nagar,Residence Location_SANGRUR,Residence Location_SRI MUKTSAR SAHIB,Residence Location_Shahid Bhagat Singh Nagar,Residence Location_Tarn Taran,Reject status more than 3 months_0,Reject status more than 3 months_1,FraudFound
1,0.00823,0.02134,0.445312,1.0,1.0,0.090909,0.098039,0.2,0.666667,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0
2,0.024691,0.03073,0.396484,1.0,1.0,0.0,0.039216,0.433333,0.166667,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0
3,0.012346,0.04268,0.359375,1.0,1.0,0.0,0.039216,0.566667,0.833333,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0
4,0.0,0.005691,0.456055,1.0,0.0,1.0,0.0,0.966667,0.0,1.0,...,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0
5,0.020576,0.025608,0.476562,0.0,0.0,0.909091,0.921569,0.866667,0.333333,0.0,...,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0


In [13]:
fraudulent.shape

(34793, 99)
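The class split above, and the size gap between the two classes, can be reproduced on a toy frame. This is only a sketch: the `amount` column and its values are made up for illustration, and only the `FraudFound` label name comes from the notebook.

```python
import pandas as pd

# Toy stand-in for the claims table; "amount" and all values are hypothetical.
toy = pd.DataFrame({
    "amount": [0.10, 0.25, 0.05, 0.40, 0.30, 0.15],
    "FraudFound": [1, 0, 0, 0, 1, 0],
})

# Boolean masks select the rows of each class; .copy() gives independent frames.
fraud = toy.loc[toy["FraudFound"] == 1].copy()
legit = toy.loc[toy["FraudFound"] == 0].copy()

# Number of extra fraud rows needed for both classes to have the same size.
deficit = len(legit) - len(fraud)
print(fraud.shape, legit.shape, deficit)
```

The same `deficit` calculation, applied to the real shapes, gives the number of synthetic rows to request from the model.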

In [None]:
tgan.fit(fraudulent)

### 4. Sample new data

After the model has been fit, we are ready to generate new samples by calling the `TGANModel.sample`
method and passing it the desired number of samples.

The returned object, `samples`, is a `pandas.DataFrame` containing a table of synthetic data with
the same format as the input data and 1000 rows as we requested.

num_samples = 1000

samples = tgan.sample(num_samples)

samples.head(3)

### 5. Save and Load a model

In the steps above we saw that the fitting process is slow, so we would probably like to avoid having to fit every time we want to generate samples. Instead, we can fit a model once, save it, and load it every time we want to sample new data.

If we have a fitted model, we can save it by calling the `TGANModel.save` method, which takes
as its only argument the path to store the model in. Similarly, `TGANModel.load` loads a model stored on disk, given the path where the model is stored.

At this point we could use this model instance to generate more samples.

In [2]:
model_path = 'demo/my_model'

In [18]:
new_tgan = TGANModel.load(model_path)

[0214 15:29:32 @collection.py:146] New collections created in tower : tf.GraphKeys.REGULARIZATION_LOSSES
[0214 15:29:32 @collection.py:165] These collections were modified but restored in : (tf.GraphKeys.SUMMARIES: 0->2)
[0214 15:29:32 @sessinit.py:87] WRN The following variables are in the checkpoint, but not found in the graph: global_step, optimize/beta1_power, optimize/beta2_power
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
[0214 15:29:47 @sessinit.py:114] Restoring checkpoint from output\model\model-500 ...
INFO:tensorflow:Restoring parameters from output\model\model-500


In [19]:
fraudulent.shape[0]

34793

In [20]:
non_fraudulent.shape[0]

252147

In [21]:
num_samples = non_fraudulent.shape[0]-fraudulent.shape[0]
num_samples

217354

In [22]:
new_samples = new_tgan.sample(num_samples)
new_samples.head(3)

|                                                                                        |2172/?[29:50<00:00, 1.21it/s]


Unnamed: 0,No of Days Stayed,Net Amt,settlement_delay,age,Start_year,Start_month,Start_week,Start_day,Start_dayofweek,Reported_year,...,Residence Location_PATIALA,Residence Location_RUPNAGAR,Residence Location_S.A.S Nagar,Residence Location_SANGRUR,Residence Location_SRI MUKTSAR SAHIB,Residence Location_Shahid Bhagat Singh Nagar,Residence Location_Tarn Taran,Reject status more than 3 months_0,Reject status more than 3 months_1,FraudFound
0,0.007231,0.00714,0.331078,-0.00106,1.0,0.1818181818181818,0.9803921568627452,0.7,0.1666666666666666,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1
1,0.007554,-0.001812,0.339094,0.998393,1.0,1.0,0.9215686274509804,0.2,0.5,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1
2,0.006446,-0.002573,0.328494,1.001566,1.0,0.4545454545454545,0.96078431372549,0.8333333333333334,0.6666666666666666,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1


In [23]:
new_samples.head()

Unnamed: 0,No of Days Stayed,Net Amt,settlement_delay,age,Start_year,Start_month,Start_week,Start_day,Start_dayofweek,Reported_year,...,Residence Location_PATIALA,Residence Location_RUPNAGAR,Residence Location_S.A.S Nagar,Residence Location_SANGRUR,Residence Location_SRI MUKTSAR SAHIB,Residence Location_Shahid Bhagat Singh Nagar,Residence Location_Tarn Taran,Reject status more than 3 months_0,Reject status more than 3 months_1,FraudFound
0,0.007231,0.00714,0.331078,-0.00106,1.0,0.1818181818181818,0.9803921568627452,0.7,0.1666666666666666,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1
1,0.007554,-0.001812,0.339094,0.998393,1.0,1.0,0.9215686274509804,0.2,0.5,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1
2,0.006446,-0.002573,0.328494,1.001566,1.0,0.4545454545454545,0.96078431372549,0.8333333333333334,0.6666666666666666,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1
3,0.023636,0.03929,0.328152,0.999979,1.0,0.0,0.7058823529411764,0.2333333333333333,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1
4,0.008888,-0.002754,0.338464,0.001374,1.0,0.6363636363636364,0.392156862745098,0.0,0.6666666666666666,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1


In [24]:
fraudulent.columns = non_fraudulent.columns

In [25]:
fraudulent.head()

Unnamed: 0,No of Days Stayed,Net Amt,settlement_delay,age,Start_year,Start_month,Start_week,Start_day,Start_dayofweek,Reported_year,...,Residence Location_PATIALA,Residence Location_RUPNAGAR,Residence Location_S.A.S Nagar,Residence Location_SANGRUR,Residence Location_SRI MUKTSAR SAHIB,Residence Location_Shahid Bhagat Singh Nagar,Residence Location_Tarn Taran,Reject status more than 3 months_0,Reject status more than 3 months_1,FraudFound
0,0.012346,0.089984,0.366211,1.0,1.0,0.272727,0.27451,0.233333,0.333333,1.0,...,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,1
8,0.00823,0.04268,0.34375,0.0,1.0,0.272727,0.313725,0.666667,0.166667,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1
15,0.0,0.00626,0.327148,1.0,1.0,0.545455,0.568627,0.733333,0.5,1.0,...,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,1
18,0.00823,0.0,0.34375,0.0,1.0,0.0,0.019608,0.2,0.166667,1.0,...,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1
19,0.00823,0.0,0.34375,0.0,1.0,1.0,0.960784,0.233333,0.166667,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1


In [26]:
non_fraudulent.head()

Unnamed: 0,No of Days Stayed,Net Amt,settlement_delay,age,Start_year,Start_month,Start_week,Start_day,Start_dayofweek,Reported_year,...,Residence Location_PATIALA,Residence Location_RUPNAGAR,Residence Location_S.A.S Nagar,Residence Location_SANGRUR,Residence Location_SRI MUKTSAR SAHIB,Residence Location_Shahid Bhagat Singh Nagar,Residence Location_Tarn Taran,Reject status more than 3 months_0,Reject status more than 3 months_1,FraudFound
1,0.00823,0.02134,0.445312,1.0,1.0,0.090909,0.098039,0.2,0.666667,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0
2,0.024691,0.03073,0.396484,1.0,1.0,0.0,0.039216,0.433333,0.166667,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0
3,0.012346,0.04268,0.359375,1.0,1.0,0.0,0.039216,0.566667,0.833333,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0
4,0.0,0.005691,0.456055,1.0,0.0,1.0,0.0,0.966667,0.0,1.0,...,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0
5,0.020576,0.025608,0.476562,0.0,0.0,0.909091,0.921569,0.866667,0.333333,0.0,...,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0


In [27]:
# DataFrame.append was removed in pandas 2.0; pd.concat stacks the three frames in the same order.
TGAN_samples = pd.concat([new_samples, fraudulent, non_fraudulent])
TGAN_samples.head()

Unnamed: 0,No of Days Stayed,Net Amt,settlement_delay,age,Start_year,Start_month,Start_week,Start_day,Start_dayofweek,Reported_year,...,Residence Location_PATIALA,Residence Location_RUPNAGAR,Residence Location_S.A.S Nagar,Residence Location_SANGRUR,Residence Location_SRI MUKTSAR SAHIB,Residence Location_Shahid Bhagat Singh Nagar,Residence Location_Tarn Taran,Reject status more than 3 months_0,Reject status more than 3 months_1,FraudFound
0,0.007231,0.00714,0.331078,-0.00106,1.0,0.1818181818181818,0.9803921568627452,0.7,0.1666666666666666,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1
1,0.007554,-0.001812,0.339094,0.998393,1.0,1.0,0.9215686274509804,0.2,0.5,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1
2,0.006446,-0.002573,0.328494,1.001566,1.0,0.4545454545454545,0.96078431372549,0.8333333333333334,0.6666666666666666,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1
3,0.023636,0.03929,0.328152,0.999979,1.0,0.0,0.7058823529411764,0.2333333333333333,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1
4,0.008888,-0.002754,0.338464,0.001374,1.0,0.6363636363636364,0.392156862745098,0.0,0.6666666666666666,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1


In [28]:
TGAN_samples.shape

(504240, 99)
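As a quick sanity check on the numbers printed above, the arithmetic below works back from the combined shape to the number of synthetic rows that were actually appended. It uses only the counts shown in the cell outputs; the reading that the sampler works in whole batches of 100 is an inference from those counts, not something the notebook states.

```python
# Row counts copied from the cell outputs above.
n_fraud = 34793
n_non_fraud = 252147
n_combined = 504240

# Number of samples requested from the model to close the gap between classes.
requested = n_non_fraud - n_fraud

# Synthetic rows actually present in the combined table.
synthetic = n_combined - n_fraud - n_non_fraud

print("requested:", requested)
print("synthetic rows:", synthetic)
print("shortfall:", requested - synthetic)
print("fraud rows after augmentation:", synthetic + n_fraud)
```

The combined table holds 217300 synthetic rows, 54 fewer than the 217354 requested, so the fraud class ends up with 252093 rows against 252147 non-fraud rows.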

In [29]:
X_TGAN = TGAN_samples.iloc[:, :-1]
y_TGAN = pd.DataFrame(TGAN_samples.iloc[:, -1])

In [30]:
print(X_TGAN.shape,y_TGAN.shape)

(504240, 98) (504240, 1)


In [31]:
X_TGAN.head()

Unnamed: 0,No of Days Stayed,Net Amt,settlement_delay,age,Start_year,Start_month,Start_week,Start_day,Start_dayofweek,Reported_year,...,Residence Location_PATHANKOT,Residence Location_PATIALA,Residence Location_RUPNAGAR,Residence Location_S.A.S Nagar,Residence Location_SANGRUR,Residence Location_SRI MUKTSAR SAHIB,Residence Location_Shahid Bhagat Singh Nagar,Residence Location_Tarn Taran,Reject status more than 3 months_0,Reject status more than 3 months_1
0,0.007231,0.00714,0.331078,-0.00106,1.0,0.1818181818181818,0.9803921568627452,0.7,0.1666666666666666,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
1,0.007554,-0.001812,0.339094,0.998393,1.0,1.0,0.9215686274509804,0.2,0.5,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.006446,-0.002573,0.328494,1.001566,1.0,0.4545454545454545,0.96078431372549,0.8333333333333334,0.6666666666666666,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
3,0.023636,0.03929,0.328152,0.999979,1.0,0.0,0.7058823529411764,0.2333333333333333,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0
4,0.008888,-0.002754,0.338464,0.001374,1.0,0.6363636363636364,0.392156862745098,0.0,0.6666666666666666,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0


In [32]:
y_TGAN.head()

Unnamed: 0,FraudFound
0,1
1,1
2,1
3,1
4,1


In [33]:
X_TGAN.to_csv(r"./Data/TGAN/X_TGAN.csv", index=False)
y_TGAN.to_csv(r"./Data/TGAN/y_TGAN.csv", index=False)
TGAN_samples.to_csv(r"./Data/TGAN/TGAN.csv", index=False)

## Loading custom datasets

In the previous steps we used some demonstration data but we did not show how to load your own dataset.

In order to do so you can use `pandas.read_csv` by passing it the path to the CSV file that you want to load.

Additionally, you will need to create a 0-indexed list of the column indices to be considered continuous.

For example, if we want to load a local CSV file, `data/census.csv`, whose first 4 columns are continuous, that is, indices `[0,1,2,3]`, we would do it like this:

In [None]:
import pandas as pd

data = pd.read_csv('data/census.csv')

continuous_columns = [0,1,2,3]
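When the continuous columns are easier to identify by name than by position, the index list can be derived instead of typed by hand. The sketch below is one way to do that with pandas; the inline CSV and its column names are hypothetical stand-ins for your own file.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a file on disk.
csv_text = "age,income,city,label\n31,52000.0,Oslo,0\n45,61000.5,Lima,1\n"
data = pd.read_csv(io.StringIO(csv_text))

# Names of the columns we consider continuous (an assumption for this toy table).
continuous_names = ["age", "income"]

# Index.get_loc maps a unique column label to its 0-based position.
continuous_columns = [data.columns.get_loc(name) for name in continuous_names]
print(continuous_columns)
```

Looking positions up by name keeps the list correct if columns are later reordered or new ones are inserted before them.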

## Model Parameters

If you want to change the default behavior of `TGANModel`, such as using a different `batch_size` or
`max_epoch`, you can do so by passing different arguments when creating the instance. The available arguments are listed below.

### Model general behavior

* continuous_columns (`list[int]`, required): List of columns to be considered continuous.
* output (`str`, default=`output`): Path to store the model and its artifacts.
* gpu (`list[str]`, default=`[]`): Comma separated list of GPU(s) to use.

### Neural network definition and fitting

* max_epoch (`int`, default=`100`): Number of epochs to use during training.
* steps_per_epoch (`int`, default=`10000`): Number of steps to run on each epoch.
* save_checkpoints (`bool`, default=`True`): Whether or not to store checkpoints of the model after each training epoch.
* restore_session (`bool`, default=`True`): Whether or not to continue training from the last checkpoint.
* batch_size (`int`, default=`200`): Size of the batch to feed the model at each step.
* z_dim (`int`, default=`100`): Number of dimensions in the noise input for the generator.
* noise (`float`, default=`0.2`): Upper bound to the Gaussian noise added to categorical columns.
* l2norm (`float`, default=`0.00001`): L2 regularization coefficient when computing losses.
* learning_rate (`float`, default=`0.001`): Learning rate for the optimizer.
* num_gen_rnn (`int`, default=`400`):
* num_gen_feature (`int`, default=`100`): Number of features in the generator.
* num_dis_layers (`int`, default=`2`):
* num_dis_hidden (`int`, default=`200`):
* optimizer (`str`, default=`AdamOptimizer`): Name of the optimizer to use during `fit`, possible
  values are: [`GradientDescentOptimizer`, `AdamOptimizer`, `AdadeltaOptimizer`].
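One way to keep overrides manageable is to hold the documented defaults in a dictionary and apply changes on top. The sketch below does not import TGAN; `TGAN_DEFAULTS` and `build_params` are hypothetical helpers, and the values are copied from the list above (`continuous_columns` and `gpu` are left out).

```python
# Defaults copied from the parameter list above.
TGAN_DEFAULTS = {
    "output": "output",
    "max_epoch": 100,
    "steps_per_epoch": 10000,
    "save_checkpoints": True,
    "restore_session": True,
    "batch_size": 200,
    "z_dim": 100,
    "noise": 0.2,
    "l2norm": 0.00001,
    "learning_rate": 0.001,
    "num_gen_rnn": 400,
    "num_gen_feature": 100,
    "num_dis_layers": 2,
    "num_dis_hidden": 200,
    "optimizer": "AdamOptimizer",
}


def build_params(**overrides):
    """Return the documented defaults with the given overrides applied."""
    unknown = set(overrides) - set(TGAN_DEFAULTS)
    if unknown:
        # Catch misspelled names early instead of passing them on.
        raise KeyError(f"unknown parameter(s): {sorted(unknown)}")
    return {**TGAN_DEFAULTS, **overrides}


params = build_params(max_epoch=5, batch_size=100)
print(params["max_epoch"], params["batch_size"], params["z_dim"])
```

The resulting dictionary could then be unpacked into the constructor, for example `TGANModel(continuous_columns, **params)`, assuming the constructor accepts these names as keyword arguments, as the cells in this notebook do.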

To create an instance like the one in step 2, but with every argument passed explicitly, we would do something like this (note that `steps_per_epoch` and `batch_size` differ from the values used in step 2):

In [None]:
tgan = TGANModel(
    continuous_columns,
    output='output',
    gpu=None,
    max_epoch=5,
    steps_per_epoch=10000,
    save_checkpoints=True,
    restore_session=True,
    batch_size=200,
    z_dim=200,
    noise=0.2,
    l2norm=0.00001,
    learning_rate=0.001,
    num_gen_rnn=100,
    num_gen_feature=100,
    num_dis_layers=1,
    num_dis_hidden=100,
    optimizer='AdamOptimizer'
)

## Command-line interface

We include a command-line interface that allows users to access TGAN functionality. Currently only one action is supported.

### Random hyperparameter search

#### Input

To run random searches for the best model hyperparameters for a given dataset, we will need:

* A dataset, in a CSV file, without any missing values, containing only columns of type `bool`, `str`, `int` or
  `float`, with only one type per column, as specified in [Data Format Input](#data-format-input).

* A JSON file containing the configuration for the search. This configuration shall contain:

  * `name`: Name of the experiment. A folder with this name will be created.
  * `num_random_search`: Number of iterations in the hyperparameter search.
  * `train_csv`: Path to the csv file containing the dataset.
  * `continuous_cols`: List of column indices, starting at 0, to be considered continuous.
  * `epoch`: Number of epochs to train the model.
  * `steps_per_epoch`: Number of optimization steps in each epoch.
  * `sample_rows`: Number of rows to sample when evaluating the model.

You can see an example of such a json file in [examples/config.json](examples/config.json), which you
can download and use as a template.
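A configuration with the keys listed above can also be generated from Python. The sketch below is an illustration only: every value is hypothetical, and whether the real file holds a single object or a list of such objects is not stated here, so compare against `examples/config.json` before using it.

```python
import json
import os
import tempfile

# Keys taken from the list above; all values are hypothetical.
config = {
    "name": "census",
    "num_random_search": 10,
    "train_csv": "data/census.csv",
    "continuous_cols": [0, 1, 2, 3],
    "epoch": 5,
    "steps_per_epoch": 10000,
    "sample_rows": 10000,
}

# Check that no documented key was forgotten before writing the file.
required = {
    "name", "num_random_search", "train_csv", "continuous_cols",
    "epoch", "steps_per_epoch", "sample_rows",
}
missing = required - set(config)
if missing:
    raise ValueError(f"missing configuration keys: {sorted(missing)}")

# Write to a temporary directory and read it back to confirm it is valid JSON.
path = os.path.join(tempfile.mkdtemp(), "config.json")
with open(path, "w") as handle:
    json.dump(config, handle, indent=4)
with open(path) as handle:
    loaded = json.load(handle)
print(loaded["name"], loaded["continuous_cols"])
```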

#### Execution

Once we have prepared everything we can launch the random hyperparameter search with this command:

``` bash
tgan experiments config.json results.json
```

Where the first argument, `config.json`, is the path to your configuration JSON, and the second,
`results.json`, is the path to store the summary of the execution.

This will run the random search, which basically consists of the following steps:

1. We fetch and split our data between test and train.
2. We randomly select the hyperparameters to test.
3. Then, for each hyperparameter combination, we train a TGAN model using the real training data T
   and generate a synthetic training dataset Tsynth.
4. We then train machine learning models on both the real and synthetic datasets.
5. We use these trained models on real test data and see how well they perform.
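The control flow of the steps above can be sketched in a few lines of plain Python. This is not TGAN's implementation: the candidate values in `SEARCH_SPACE` are hypothetical, and `placeholder_score` is a stand-in for the expensive part (training TGAN, training classifiers on real and synthetic data, and scoring them on real test data).

```python
import random

# Hypothetical candidate values; the real search space is not given in this document.
SEARCH_SPACE = {
    "batch_size": [50, 100, 200],
    "z_dim": [50, 100, 200, 400],
    "num_gen_rnn": [100, 200, 300, 400],
    "num_dis_layers": [1, 2, 3, 4],
    "learning_rate": [0.0002, 0.0005, 0.001],
}


def draw_hyperparameters(rng):
    """Step 2: pick one value per hyperparameter, uniformly at random."""
    return {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}


def placeholder_score(params):
    """Stand-in for steps 3 to 5; returns an arbitrary number, not a real metric."""
    return 1.0 / (1.0 + params["num_dis_layers"] + 1000 * params["learning_rate"])


def random_search(evaluate, num_random_search, seed=0):
    """Evaluate `num_random_search` random configurations and record each score."""
    rng = random.Random(seed)
    results = []
    for _ in range(num_random_search):
        params = draw_hyperparameters(rng)
        results.append({**params, "score": evaluate(params)})
    return results


results = random_search(placeholder_score, num_random_search=10)
best = max(results, key=lambda run: run["score"])
print(best)
```

Each entry in `results` pairs the drawn hyperparameters with a score, which mirrors the shape of the summary that the command writes.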

#### Output

Once the experiment has finished, the following can be found:

* A JSON file, in the example above called `results.json`, containing a summary of the experiments.
  This JSON will contain a key for each experiment `name`, and on it, an array of length
  `num_random_search`, with the selected parameters and its evaluation score. For a configuration
  like the example, the summary will look like this:

``` python
{
    "census": [
        {
            "steps_per_epoch" : 10000,
            "num_gen_feature" : 300,
            "num_dis_hidden" : 300,
            "batch_size" : 100,
            "num_gen_rnn" : 400,
            "score" : 0.937802280415988,
            "max_epoch" : 5,
            "num_dis_layers" : 4,
            "learning_rate" : 0.0002,
            "z_dim" : 100,
            "noise" : 0.2
        },
        ... # 9 more nodes
    ]
}
```

* A set of folders, each one named after the `name` specified in the JSON configuration, contained
in the `experiments` folder. In each folder, the sampled data and the models can be found. For a configuration
like the example, this will look like this:

```
experiments/
  census/
    data/       # Sampled data with each of the models in the random search.
    model_0/
      logs/     # Training logs
      model/    # Tensorflow model checkpoints
    model_1/    # 9 more folders, one for each model in the random search
    ...
```
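Once `results.json` has been written, picking the best run per experiment is a short script. The sketch below parses an inline string rather than a file; its first entry repeats the example summary above, and the second entry is invented purely so there is something to compare against.

```python
import json

# First run copied from the example summary; the second run is hypothetical.
results_text = """
{
    "census": [
        {"steps_per_epoch": 10000, "num_gen_feature": 300, "num_dis_hidden": 300,
         "batch_size": 100, "num_gen_rnn": 400, "score": 0.937802280415988,
         "max_epoch": 5, "num_dis_layers": 4, "learning_rate": 0.0002,
         "z_dim": 100, "noise": 0.2},
        {"steps_per_epoch": 10000, "num_gen_feature": 100, "num_dis_hidden": 200,
         "batch_size": 200, "num_gen_rnn": 100, "score": 0.91,
         "max_epoch": 5, "num_dis_layers": 2, "learning_rate": 0.001,
         "z_dim": 200, "noise": 0.2}
    ]
}
"""

results = json.loads(results_text)

# For each experiment name, keep the run with the highest evaluation score.
best_runs = {
    name: max(runs, key=lambda run: run["score"])
    for name, runs in results.items()
}

for name, run in best_runs.items():
    print(name, run["score"], run["batch_size"], run["learning_rate"])
```

To use it on a real run, replace `json.loads(results_text)` with `json.load` on the opened `results.json` file.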

## Citation

If you use TGAN, please cite the following work:

> Lei Xu, Kalyan Veeramachaneni. 2018. Synthesizing Tabular Data using Generative Adversarial Networks.

```LaTeX
@article{xu2018synthesizing,
  title={Synthesizing Tabular Data using Generative Adversarial Networks},
  author={Xu, Lei and Veeramachaneni, Kalyan},
  journal={arXiv preprint arXiv:1811.11264},
  year={2018}
}
```
You can find the original paper [here](https://arxiv.org/pdf/1811.11264.pdf).