# Elliptic TGAN

Generating synthetic samples with **TGAN** from the Elliptic dataset.

### 1. Load the data

The first step is to load the data which we will use to fit TGAN. Here we read the illicit
transactions of the Elliptic dataset from a CSV file with `pandas.read_csv`.

1. `illicit` will contain a `pandas.DataFrame` with the table of data from the `Elliptic` dataset ready to be used to fit the model.

2. `continuous_columns` will contain a `list` identifying the continuous columns. Here every column of `illicit` is treated as continuous.

In [1]:
import pandas as pd

# CSV that contains the illicit transactions from the Elliptic dataset (4545 samples)
illicit = pd.read_csv("illicit.csv")

In [2]:
continuous_columns = illicit.columns.tolist()
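
When only some columns are numeric, the list of continuous columns could be built from the column dtypes instead of taking every column. The following is a minimal sketch on a toy table; the helper name `continuous_column_info` and the toy column names are hypothetical, and it returns both names and positional indices because which of the two TGAN needs depends on how the columns of the DataFrame are labelled.

```python
import pandas as pd


def continuous_column_info(df):
    """Return (names, positions) of the numeric columns of df."""
    # include="number" keeps integer and float columns and skips text columns.
    numeric = df.select_dtypes(include="number").columns
    names = numeric.tolist()
    # Positional index of each numeric column within the original table.
    positions = [df.columns.get_loc(name) for name in names]
    return names, positions


# Toy table standing in for the real data (hypothetical columns).
toy = pd.DataFrame({
    "amount": [0.5, 1.2, 3.4],
    "label": ["a", "b", "a"],
    "n_inputs": [1, 2, 3],
})

names, positions = continuous_column_info(toy)
print(names, positions)
```

For a table where every column is numeric, as assumed for `illicit`, the names returned here match `illicit.columns.tolist()`.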

### 2. Create a TGAN instance

The next step is to import TGAN and create an instance of the model.

To do so, we need to import the `tgan.model.TGANModel` class and call it.

This will create a TGAN instance with the parameters passed explicitly below, including a short training run of `max_epoch=3` and `steps_per_epoch=1000`.


In [None]:
from tgan.model import TGANModel

tgan = TGANModel(
    continuous_columns,
    output='output',
    gpu=None,
    max_epoch=3,
    steps_per_epoch=1000,
    save_checkpoints=True,
    restore_session=True,
    batch_size=200,
    z_dim=200,
    noise=0.2,
    l2norm=0.00001,
    learning_rate=0.001,
    num_gen_rnn=100,
    num_gen_feature=100,
    num_dis_layers=1,
    num_dis_hidden=100,
    optimizer='AdamOptimizer'
)
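
To get a feel for how much training these settings imply, the arithmetic can be sketched as below. It assumes that each training step draws one batch of `batch_size` rows, which this notebook does not verify, and it takes the row count of 4545 from the comment in the loading cell.

```python
# Values copied from the TGANModel call above.
params = {"max_epoch": 3, "steps_per_epoch": 1000, "batch_size": 200}
n_rows = 4545  # illicit transactions, per the loading cell

# Assumption: one step consumes exactly one batch of rows.
total_steps = params["max_epoch"] * params["steps_per_epoch"]
rows_drawn = total_steps * params["batch_size"]
passes = rows_drawn / n_rows

print(f"{total_steps} steps, {rows_drawn} rows drawn, about {passes:.0f} passes over the data")
```

Under that assumption, raising `max_epoch` or `steps_per_epoch` scales the training time roughly in proportion.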

### 3. Fit the model

The third step is to pass the data that we loaded previously to the `TGANModel.fit` method to
start the fitting.

This call does not return anything; the progress of the fitting is printed to the screen.

**NOTE**. Depending on the performance of the system you are running, and the parameters selected
for the model, this step can take up to a few hours.
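
Since a fit can run for hours, it may be worth checking the input table first. The sketch below flags columns declared continuous that are not numeric or that contain missing values. The helper name `find_column_problems` and the toy data are hypothetical, and whether TGAN itself accepts such columns is not something this notebook tests.

```python
import pandas as pd


def find_column_problems(df, columns):
    """Map each problematic column name to a short description."""
    problems = {}
    for col in columns:
        series = df[col]
        if not pd.api.types.is_numeric_dtype(series):
            problems[col] = "not numeric"
        elif series.isna().any():
            problems[col] = "has missing values"
    return problems


# Toy table with one missing value and one text column (hypothetical).
toy = pd.DataFrame({
    "amount": [0.5, None, 3.4],
    "label": ["a", "b", "a"],
    "n_inputs": [1, 2, 3],
})

problems = find_column_problems(toy, toy.columns.tolist())
print(problems)
```

Applied to this notebook, the call would be `find_column_problems(illicit, continuous_columns)`, with an empty result meaning no issue was found by these two checks.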


In [None]:
tgan.fit(illicit)

### 4. Save and Load a model

In the steps above we saw that the fitting process is slow, so we would probably like to avoid having to fit every time we want to generate samples. Instead, we can fit a model once, save it, and load it every time we want to sample new data.

If we have a fitted model, we can save it by calling the `TGANModel.save` method, which takes
as argument the path to store the model in. Similarly, `TGANModel.load` loads a model stored on disk, given the path where the model is stored.

A model loaded this way can be used to generate more samples, just like the fitted instance.

In [None]:
model_path = 'TGAN_model/my_model_5k_steps'

tgan.save(model_path)

In [None]:
# new_tgan = TGANModel.load(model_path)

### 5. Sample new data

After the model has been fit, we are ready to generate new samples by calling the `TGANModel.sample`
method, passing it the desired number of samples.

The returned object, `samples`, is a `pandas.DataFrame` containing a table of synthetic data with
the same format as the input data.

In [None]:
num_samples = 24947  # Same amount as natural illicit transactions

samples = tgan.sample(num_samples)
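
A quick sanity check on the synthetic table is to compare per-column means and standard deviations with the real data. The following is a rough sketch on toy tables; the helper name `compare_summary` is hypothetical, and it assumes both tables share the same numeric columns. Matching summary statistics alone do not show that the synthetic data is realistic.

```python
import pandas as pd


def compare_summary(real, synthetic):
    """Return per-column mean and std of both tables, side by side."""
    summary = pd.DataFrame({
        "real_mean": real.mean(),
        "synthetic_mean": synthetic.mean(),
        "real_std": real.std(),
        "synthetic_std": synthetic.std(),
    })
    # Absolute gap between the means, one row per column.
    summary["mean_abs_diff"] = (summary["real_mean"] - summary["synthetic_mean"]).abs()
    return summary


# Toy tables standing in for `illicit` and `samples` (hypothetical values).
real = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [10.0, 20.0, 30.0]})
synthetic = pd.DataFrame({"a": [2.0, 3.0, 4.0], "b": [10.0, 20.0, 30.0]})

summary = compare_summary(real, synthetic)
print(summary)
```

In this notebook the equivalent call would be `compare_summary(illicit, samples)`.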

In [35]:
samples.to_csv("synthetic_illicit_tx.csv", index=False)
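
To avoid refitting in later sessions, the save and load step can be wrapped so that fitting only happens when no stored model is found. The sketch below shows the pattern with stand-in callables rather than TGAN itself; `load_or_fit` and the `fake_*` functions are hypothetical, and it assumes the saved model appears on disk at exactly `model_path`.

```python
import os
import tempfile


def load_or_fit(model_path, fit_and_save, load):
    """Return a model, fitting only when nothing is stored at model_path."""
    if os.path.exists(model_path):
        return load(model_path)
    return fit_and_save(model_path)


calls = []  # records which branch ran, for illustration only


def fake_fit_and_save(path):
    # Stand-in for fitting a model and saving it to `path`.
    calls.append("fit")
    with open(path, "w") as handle:
        handle.write("fitted")
    return "fitted"


def fake_load(path):
    # Stand-in for loading a stored model from `path`.
    calls.append("load")
    with open(path) as handle:
        return handle.read()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.bin")
    first = load_or_fit(path, fake_fit_and_save, fake_load)   # fits and saves
    second = load_or_fit(path, fake_fit_and_save, fake_load)  # loads from disk

print(calls)
```

With TGAN, `fit_and_save` could be a small function that calls `tgan.fit(illicit)` followed by `tgan.save(path)` and returns `tgan`, and `load` could be `TGANModel.load`.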