# Elliptic TGAN

Generation of synthetic samples from the Elliptic dataset with **TGAN**.

### 1. Load the data

The first step is to load the data which we will use to fit TGAN. To do so, we read the illicit transactions of the Elliptic dataset from `illicit.csv` with `pandas.read_csv`, and then build the list of continuous columns that TGAN expects.

1. `illicit` will contain a `pandas.DataFrame` with the table of illicit transactions from the `Elliptic` dataset, ready to be used to fit the model.

2. `continuous_columns` will contain a `list` with the indices of the continuous columns; here every column of `illicit` is treated as continuous.

In [1]:
import pandas as pd

illicit = pd.read_csv("illicit.csv")

In [2]:
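# TGAN expects the 0-based positions of the continuous columns, not their names;
# here every column is treated as continuous.
continuous_columns = list(range(illicit.shape[1]))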
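TGAN's documentation describes `continuous_columns` as a list of 0-based column indices, which is also how it is described above. When not every column should be modeled as continuous, the list can be derived from the CSV header instead of being hard-coded. The following is a minimal, stdlib-only sketch under that assumption: `read_header` and `continuous_column_indices` are hypothetical helpers (not part of TGAN), and the column names in the demo are made up because the real columns of `illicit.csv` are not shown in this notebook.

```python
import csv
import io
from typing import Iterable, List, TextIO


def read_header(stream: TextIO) -> List[str]:
    """Read only the first row (the column names) of a CSV stream."""
    return next(csv.reader(stream))


def continuous_column_indices(header: List[str], categorical: Iterable[str] = ()) -> List[int]:
    """Return the 0-based positions of every column not listed as categorical.

    Hypothetical helper: TGAN itself does not provide this function.
    """
    skip = set(categorical)
    unknown = skip - set(header)
    if unknown:
        # Catch typos early instead of silently treating the column as continuous.
        raise ValueError(f"unknown columns: {sorted(unknown)}")
    return [i for i, name in enumerate(header) if name not in skip]


if __name__ == "__main__":
    # Made-up header standing in for illicit.csv.
    demo = io.StringIO("f1,f2,label,f3\n0.1,0.2,1,0.3\n")
    header = read_header(demo)
    print(continuous_column_indices(header, categorical=["label"]))  # [0, 1, 3]
```

On the real file, open `illicit.csv` with `newline=""`, pass the handle to `read_header`, and list whichever columns should stay categorical.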

### 2. Create a TGAN instance

The next step is to import TGAN and create an instance of the model.

To do so, we need to import the `tgan.model.TGANModel` class and call it.

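This will create a TGAN instance with the hyperparameters passed explicitly below. With `max_epoch=3` and `steps_per_epoch=1000`, the model is trained for 3,000 steps in total, and checkpoints are written to the `output` directory.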


In [3]:
from tgan.model import TGANModel

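tgan = TGANModel(
    continuous_columns,           # 0-based indices of the continuous columns
    output='output',              # directory for checkpoints and logs
    gpu=None,                     # no specific GPU selected
    max_epoch=3,                  # number of training epochs
    steps_per_epoch=1000,         # training steps per epoch
    save_checkpoints=True,        # save a checkpoint after each epoch
    restore_session=True,         # continue from the last checkpoint, if one exists
    batch_size=200,
    z_dim=200,                    # input (noise) dimension of the generator
    noise=0.2,
    l2norm=0.00001,               # L2 regularization coefficient
    learning_rate=0.001,
    num_gen_rnn=100,              # units in the generator's LSTM cell
    num_gen_feature=100,          # units in the generator's fully connected layers
    num_dis_layers=1,             # number of discriminator layers
    num_dis_hidden=100,           # units per discriminator layer
    optimizer='AdamOptimizer'
)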
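The hyperparameters above fix how much training the model gets. As a rough sketch, assuming each training step draws exactly one batch of `batch_size` rows, the budget can be worked out as below. `training_budget` and `estimated_hours` are hypothetical helpers, and `n_rows=24947` is taken from the sampling step later in this notebook, whose comment says it matches the number of real illicit transactions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingBudget:
    total_steps: int          # max_epoch * steps_per_epoch
    rows_seen: int            # total_steps * batch_size (a row can be drawn many times)
    passes_over_data: float   # rows_seen / n_rows


def training_budget(max_epoch: int, steps_per_epoch: int,
                    batch_size: int, n_rows: int) -> TrainingBudget:
    # Assumption: one training step consumes exactly one batch of batch_size rows.
    steps = max_epoch * steps_per_epoch
    rows = steps * batch_size
    return TrainingBudget(steps, rows, rows / n_rows)


def estimated_hours(total_steps: int, steps_per_second: float) -> float:
    """Wall-clock estimate from a throughput such as the it/s shown by the progress bar."""
    return total_steps / steps_per_second / 3600


if __name__ == "__main__":
    budget = training_budget(max_epoch=3, steps_per_epoch=1000, batch_size=200, n_rows=24947)
    print(budget.total_steps, budget.rows_seen)  # 3000 600000
    print(round(budget.passes_over_data, 1))      # 24.1
    print(round(estimated_hours(budget.total_steps, 0.14), 1))  # 6.0
```

At the 0.11 to 0.14 it/s reported by the progress bars during fitting, this estimate gives roughly 6 to 7.6 hours for 3,000 steps.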






### 3. Fit the model

The third step is to pass the data that we have loaded previously to the `TGANModel.fit` method to
start the fitting.

This method does not return anything; however, the progress of the fitting is printed to the screen.

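**NOTE**. Depending on the performance of the system you are running and the parameters selected for the model, this step can take several hours. With the parameters above, each of the three epochs in the log below took between about 2 and 2.5 hours, roughly 7 hours in total.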
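The epoch times and GAN metrics printed by `fit` can be collected to compare runs. The sketch below is a minimal parser, assuming the tensorpack-style line format seen in the training log of this notebook (for example `Epoch 1 (global_step 1000) finished, time:1 hour 59 minutes 19 seconds.` and `GAN_loss/discrim/accuracy_fake: 0.85`). That format is not a documented interface, and `parse_training_log` is a hypothetical helper that takes the log as a string, e.g. copied from the cell output.

```python
import re
from typing import Dict, List

# Colour codes such as "\x1b[32m"; the leading escape byte is often lost when copied.
ANSI = re.compile(r"(?:\x1b)?\[\d+(?:;\d+)*m")
EPOCH = re.compile(r"Epoch (\d+) \(global_step (\d+)\) finished, time:(.*)$")
METRIC = re.compile(r"@monitor\.py:\d+\]\s*(\S+): ([-+0-9.eE]+)\s*$")
UNIT = re.compile(r"(\d+(?:\.\d+)?)\s*(hour|minute|second)s?")
SECONDS = {"hour": 3600, "minute": 60, "second": 1}


def to_seconds(text: str) -> float:
    """Convert e.g. '1 hour 59 minutes 19 seconds' to 7159.0."""
    return sum(float(n) * SECONDS[unit] for n, unit in UNIT.findall(text))


def parse_training_log(text: str) -> List[Dict]:
    """Return one dict per finished epoch with its step, duration and monitor metrics."""
    epochs: List[Dict] = []
    for raw in text.splitlines():
        line = ANSI.sub("", raw).strip()
        m = EPOCH.search(line)
        if m:
            epochs.append({"epoch": int(m.group(1)), "global_step": int(m.group(2)),
                           "seconds": to_seconds(m.group(3)), "metrics": {}})
            continue
        m = METRIC.search(line)
        if m and epochs:
            # Metrics are printed right after the epoch they summarize.
            epochs[-1]["metrics"][m.group(1)] = float(m.group(2))
    return epochs


if __name__ == "__main__":
    log = (
        "[32m[0612 03:35:35 @base.py:285][0m Epoch 1 (global_step 1000) finished, "
        "time:1 hour 59 minutes 19 seconds.\n"
        "[32m[0612 03:35:41 @monitor.py:467][0m GAN_loss/discrim/accuracy_fake: 0.85\n"
    )
    print(parse_training_log(log))
```

For this run, the three epoch durations (7159 s, 9136 s and 9169 s) add up to 25,464 seconds, about 7.1 hours.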


In [4]:
tgan.fit(illicit)




[32m[0612 01:34:20 @input_source.py:222][0m Setting up the queue 'QueueInput/input_queue' for CPU prefetching ...








Instructions for updating:
This class is equivalent as tf.keras.layers.LSTMCell, and will be replaced by that in Tensorflow 2.0.
Instructions for updating:
Please use `layer.add_weight` method instead.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
[32m[0612 01:34:20 @registry.py:126][0m gen/LSTM/00/FC input: [200, 100]

Instructions for updating:
Please use `layer.__call__` method instead.
[32m[0612 01:34:20 @registry.py:134][0m gen/LSTM/00/FC output: [200, 100]
[32m[0612 01:34:20 @registry.py:126][0m gen/LSTM/00/FC2 input: [200, 100]
[32m[0612 01:34:20 @registry.py:134][0m gen/LSTM/00/FC2 output: [200, 4470]
[32m[0612 01:34:20 @registry.py:126][0m gen/LSTM/00/FC3 input: [200, 4470]
[32m[0612 01:34:20 @registry.py:134][0m gen/LSTM/00/FC3 output: [200, 100]
[32m[0612 01:34:20 @r

100%|#########9|999/1000[1:58:32<00:07, 0.14it/s]
100%|##########|1000/1000[1:59:19<00:00, 0.14it/s]

[0612 03:35:35 @base.py:285] Epoch 1 (global_step 1000) finished, time:1 hour 59 minutes 19 seconds.

[0612 03:35:41 @saver.py:79] Model saved to output\model\model-1000.
[0612 03:35:41 @monitor.py:467] GAN_loss/discrim/accuracy_fake: 0.85
[0612 03:35:41 @monitor.py:467] GAN_loss/discrim/accuracy_real: 0.305
[0612 03:35:41 @monitor.py:467] GAN_loss/discrim/loss: 0.64566
[0612 03:35:41 @monitor.py:467] GAN_loss/gen/final-g-loss: 1.4156
[0612 03:35:41 @monitor.py:467] GAN_loss/gen/klloss: 0.22883
[0612 03:35:41 @monitor.py:467] GAN_loss/gen/loss: 1.1867
[0612 03:35:41 @monitor.py:467] QueueInput/queue_size: 50
[0612 03:35:41 @group.py:48] Callbacks took 5.635 sec in total. ModelSaver: 5.62 seconds
[0612 03:35:41 @base.py:275] Start Epoch 2 ...


100%|##########|1000/1000[2:32:16<00:00, 0.11it/s]

[0612 06:07:58 @base.py:285] Epoch 2 (global_step 2000) finished, time:2 hours 32 minutes 16 seconds.

[0612 06:07:59 @saver.py:79] Model saved to output\model\model-2000.
[0612 06:07:59 @monitor.py:467] GAN_loss/discrim/accuracy_fake: 0.87
[0612 06:07:59 @monitor.py:467] GAN_loss/discrim/accuracy_real: 0.34
[0612 06:07:59 @monitor.py:467] GAN_loss/discrim/loss: 0.62551
[0612 06:07:59 @monitor.py:467] GAN_loss/gen/final-g-loss: 1.4286
[0612 06:07:59 @monitor.py:467] GAN_loss/gen/klloss: 0.27676
[0612 06:07:59 @monitor.py:467] GAN_loss/gen/loss: 1.1519
[0612 06:07:59 @monitor.py:467] QueueInput/queue_size: 50
[0612 06:07:59 @base.py:275] Start Epoch 3 ...


100%|##########|1000/1000[2:32:49<00:00, 0.11it/s]

[0612 08:40:48 @base.py:285] Epoch 3 (global_step 3000) finished, time:2 hours 32 minutes 49 seconds.

[0612 08:40:49 @saver.py:79] Model saved to output\model\model-3000.
[0612 08:40:49 @monitor.py:467] GAN_loss/discrim/accuracy_fake: 0.845
[0612 08:40:49 @monitor.py:467] GAN_loss/discrim/accuracy_real: 0.37
[0612 08:40:49 @monitor.py:467] GAN_loss/discrim/loss: 0.63806
[0612 08:40:49 @monitor.py:467] GAN_loss/gen/final-g-loss: 1.3634
[0612 08:40:49 @monitor.py:467] GAN_loss/gen/klloss: 0.25768
[0612 08:40:49 @monitor.py:467] GAN_loss/gen/loss: 1.1058
[0612 08:40:49 @monitor.py:467] QueueInput/queue_size: 50
[0612 08:40:49 @base.py:289] Training has finished!

[0612 08:40:57 @input_source.py:178] EnqueueThread QueueInput/input_queue Exited.

[0612 08:41:06 @collection.py:146] New collections created in tower : tf.GraphKeys.REGULARIZATION_LOSSES
[0612 08:41:06 @collection.py:165] These collections were modified but restored in : (tf.GraphKeys.SUMMARIES: 0->2)

[0612 08:41:06
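Since this fit took several hours, it pays to fit only when no saved model exists yet. A minimal sketch of that pattern follows. `load_or_fit`, `pickle_save` and `pickle_load` are hypothetical helpers, and pickle is used only so the demo runs without TGAN; with TGAN, the `load` and `save` callables would wrap `TGANModel.load` and `TGANModel.save`, which this notebook uses in its save-and-load step.

```python
import os
import pickle
import tempfile
from typing import Callable, TypeVar

M = TypeVar("M")


def load_or_fit(path: str,
                fit: Callable[[], M],
                load: Callable[[str], M],
                save: Callable[[M, str], None]) -> M:
    """Return the model stored at `path`; fit and save a new one only if it is missing."""
    if os.path.exists(path):
        return load(path)
    model = fit()
    save(model, path)
    return model


def pickle_save(obj, path: str) -> None:
    with open(path, "wb") as f:
        pickle.dump(obj, f)


def pickle_load(path: str):
    with open(path, "rb") as f:
        return pickle.load(f)


# With TGAN the wiring would look like this (untested assumption; fit_new_model is a
# hypothetical function that builds a TGANModel, calls .fit(illicit) and returns it):
#   tgan = load_or_fit(model_path, fit=fit_new_model,
#                      load=TGANModel.load, save=lambda m, p: m.save(p))

if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "model.pkl")
    first = load_or_fit(path, lambda: {"weights": [1, 2, 3]}, pickle_load, pickle_save)
    second = load_or_fit(path, lambda: {"weights": []}, pickle_load, pickle_save)
    print(second == first)  # True: the second call loads instead of refitting
```

The existence check is deliberately simple: it does not verify that the stored model was fitted with the same data or hyperparameters, so a new `path` should be used whenever those change.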

### 4. Save and Load a model

As the fitting step above shows, fitting is slow, so we would rather not refit the model every time we want to generate samples. Instead, we can fit a model once, save it, and load it whenever we want to sample new data.

If we have a fitted model, we can save it by calling the `TGANModel.save` method, which takes as argument the path where the model will be stored. Similarly, `TGANModel.load` loads a model stored on disk when passed the path where the model was saved.

The loaded model instance can then be used to generate more samples.

In [5]:
model_path = 'TGAN_model/my_model_3k_steps'  # 3 epochs x 1000 steps

tgan.save(model_path)

[0612 09:08:54 @model.py:813] Model saved successfully.


In [None]:
# Uncomment to load the saved model in a new session instead of refitting:
# new_tgan = TGANModel.load(model_path)

### 5. Sample new data

After the model has been fit, we are ready to generate new samples by calling the `TGANModel.sample`
method, passing it the desired number of samples.

The returned object, `samples`, is a `pandas.DataFrame` containing a table of synthetic data with
the same format as the input data.

In [6]:
num_samples = 24947  # Same number of rows as the real illicit transactions

samples = tgan.sample(num_samples)

 62%|######1   |123/200[04:15<02:39, 0.48it/s]


In [35]:
samples.to_csv("synthetic_illicit_tx.csv", index=False)
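The description of `TGANModel.sample` above says the samples have the same format as the input data. A quick, stdlib-only way to check that on the files written to disk is sketched below. `csv_shape` and `check_synthetic` are hypothetical helpers, and the check only compares column names, column order and row count, not the distributions of the values.

```python
import csv
import pathlib
import tempfile
from typing import List, Tuple


def csv_shape(path: str) -> Tuple[List[str], int]:
    """Return (header, number of data rows) of a CSV file."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        n_rows = sum(1 for _ in reader)
    return header, n_rows


def check_synthetic(real_path: str, synthetic_path: str, expected_rows: int) -> List[str]:
    """Return a list of problems found; an empty list means the checks passed."""
    real_header, _ = csv_shape(real_path)
    synth_header, synth_rows = csv_shape(synthetic_path)
    problems = []
    if synth_header != real_header:
        missing = [c for c in real_header if c not in synth_header]
        extra = [c for c in synth_header if c not in real_header]
        problems.append(f"header differs (missing={missing}, extra={extra}, or reordered)")
    if synth_rows != expected_rows:
        problems.append(f"expected {expected_rows} rows, found {synth_rows}")
    return problems


if __name__ == "__main__":
    tmp = pathlib.Path(tempfile.mkdtemp())
    (tmp / "real.csv").write_text("a,b\n1,2\n3,4\n")
    (tmp / "synthetic.csv").write_text("a,b\n5,6\n")
    print(check_synthetic(str(tmp / "real.csv"), str(tmp / "synthetic.csv"), expected_rows=2))
    # ['expected 2 rows, found 1']
```

For this notebook the call would be `check_synthetic("illicit.csv", "synthetic_illicit_tx.csv", expected_rows=num_samples)`, where an empty list means both checks passed.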