**Note:**
<br>Set **SkipTraining = 1** to skip the GAN training code in this notebook and instead import images of training results.

<br>You can click on a code cell and hit **ctrl+enter** to run just that cell. Alternatively, you can run all code cells by hitting **ctrl+F9**.

In [0]:
SkipTraining = 0 # 1 = skip code, 0 = run code

# Generative Adversarial Networks: Example Implementation (Vanilla Structure)

##Import Required Python Libraries

The code cell below imports modules from several Python libraries that will be used for our analysis.

**Numpy** - an efficient library for array operations
<br>**Keras** - a high-level machine learning library that uses the TensorFlow backend (computational advantages)
<br>**Matplotlib** - a plotting library to help with visualisations
<br>**SciPy** - a library for scientific computing. In our case we are just using it to quickly normalise some arrays.
<br>

In [0]:
from numpy import loadtxt
from numpy import hstack
from numpy import zeros
from numpy import ones
from numpy import append
import numpy as np
from numpy.random import rand
from numpy.random import randn
from numpy.random import seed
from numpy.random import shuffle
from keras.models import Sequential 
from keras.layers import Dense
from keras.optimizers import Adam
from keras.layers import LeakyReLU
from matplotlib import pyplot
from scipy import stats


##Create Simulated "Real" Data for our Example

For our analysis of insider trading data, we want to demonstrate that a GAN is an appropriate ML technique for anomaly detection in this setting.

<br>To show that GANs can be trained to learn representations of normal data and subsequently flag anomalies, we will create some clean data for model training. Once the model is trained, we take another set of data from the same distribution and randomly inject some anomalies. If the GAN performs as expected, it will be able to identify the anomalous data.

<br>For our simulated data, let's assume that there are three different variables of interest. We could imagine that they represent features such as Price Impulse, Trade Size and Relative Trade Volume (10 prior vs. 100 prior orders).

<br>We will also add a final column vector of ones which indicates that the observations represent "real" data

In [19]:
# Let's first set a fixed random seed so that our work is reproducible
seed(0)

# Simulating some clean data

A = 0.95 + 6*rand(100000,1)/100 # Let's set the first variable as random uniformly distributed data between 0.95 and 1.01. This reflects that bids/asks are often placed close to the best price.
B = 100 + 20*randn(100000,1) # Let's set trade sizes as normally distributed with a mean trade size of 100 parcels and a high variability
C = 1 + randn(100000,1)/100 # Let's set relative trade volume as normally distributed around one. This reflects that there is no clear pattern in relative trade volume during normal trading.
D = ones((100000,1)) # 1 values to indicate real samples

SimData=np.concatenate((A,B,C,D), axis=1)

# Let's also inspect some of the simulated data
np.set_printoptions(precision=4)
np.set_printoptions(suppress=True)
print(SimData[0:10,:])

# Let's use the first 70% of our simulated data for GAN training and save the remainder for mixing with anomalies

MixedData = SimData[70000:,:].copy() # Separate the last 30k observations for later use (copied so the label edit below does not write back into SimData)
MixedData[:,3]=0 # The Mixed Data is only used in the deployment phase. In the deployment phase 0 will indicate normal, 1 will indicate anomaly
SimData = SimData[0:70000,:] # Separate the first 70k observations for model training

SimData.shape

[[  0.9829 118.2301   0.9933   1.    ]
 [  0.9929 101.5908   0.9863   1.    ]
 [  0.9862  81.0139   0.985    1.    ]
 [  0.9827 100.114    0.9986   1.    ]
 [  0.9754 136.6368   0.9931   1.    ]
 [  0.9888 104.784    0.9998   1.    ]
 [  0.9763  94.0114   0.9993   1.    ]
 [  1.0035  97.509    1.006    1.    ]
 [  1.0078 106.0129   1.0013   1.    ]
 [  0.973   91.9012   1.0137   1.    ]]


(70000, 4)

##Inserting Anomalies into the Data Set

In [20]:

# Let's generate 20 anomalies to represent insider trading


E = 0.99 + 3*rand(20,1)/100 # Let's set the first variable as random uniformly distributed data between 0.99 and 1.02. This reflects that the insider trader has a sense of urgency in placing their orders.
F = 30 + 30*rand(20,1) # Let's set the second variable as random uniformly distributed data between 30 and 60. This reflects that the insider trader is attempting to place multiple smaller orders to avoid detection.
G = 0.95 + randn(20,1)/100 # Let's set relative trade volume as normally distributed around 0.95. This reflects that the insider's orders are impacting relative trade volume as they attempt to disguise their orders across multiple smaller trades.
H = ones((20,1)) # Label these data points as anomalies (1) so we can check whether the model identifies them correctly

ITData=np.concatenate((E,F,G,H), axis=1)

# Let's also inspect some of the simulated anomaly data
np.set_printoptions(precision=4)
np.set_printoptions(suppress=True)
print("Sample IT Data")
print(ITData[0:10,:])

# Let's insert the anomalies into our mixed data and then shuffle the data
MixedData=np.concatenate((MixedData,ITData), axis=0)
shuffle(MixedData)

print("Sample Mixed Data")
print(MixedData[0:10,:])


Sample IT Data
[[ 1.0126 37.3985  0.9592  1.    ]
 [ 1.0101 59.5441  0.9414  1.    ]
 [ 1.0129 59.9846  0.9358  1.    ]
 [ 1.0088 53.5655  0.9614  1.    ]
 [ 0.9933 34.3073  0.9403  1.    ]
 [ 0.9978 32.3941  0.9579  1.    ]
 [ 0.9939 53.5999  0.9669  1.    ]
 [ 0.9943 38.9223  0.9503  1.    ]
 [ 1.0106 35.5868  0.9628  1.    ]
 [ 1.0076 57.9353  0.9439  1.    ]]
Sample Mixed Data
[[  0.9614  99.1899   1.011    0.    ]
 [  0.968  118.891    0.9908   0.    ]
 [  0.9961  71.2588   1.0127   0.    ]
 [  1.0062 130.8594   1.0148   0.    ]
 [  1.0087  97.3095   1.0147   0.    ]
 [  0.9708  61.2612   0.9951   0.    ]
 [  0.9655 123.899    0.9948   0.    ]
 [  1.0085 101.5095   0.9879   0.    ]
 [  0.9784  74.8265   0.9981   0.    ]
 [  0.9959  77.0031   1.0093   0.    ]]


## Normalise Data

Normalising each of the variables will facilitate model training.
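
As a minimal sketch of what z-score normalisation does, the snippet below uses a small made-up array rather than the notebook's data. Each column is shifted by its mean and divided by its population standard deviation, which is what `scipy.stats.zscore` computes with its default `ddof=0`. The helper name `zscore_columns` is hypothetical.

```python
import numpy as np

def zscore_columns(data):
    """Return column-wise z-scores: (x - column mean) / column std (ddof=0)."""
    mean = data.mean(axis=0)
    std = data.std(axis=0)  # population standard deviation, as in zscore's default
    return (data - mean) / std

# Small illustrative array: 4 observations of 2 variables (made-up values)
toy = np.array([[0.98, 118.0],
                [0.99, 101.0],
                [1.00,  81.0],
                [0.97, 100.0]])

toy_norm = zscore_columns(toy)
print(toy_norm.mean(axis=0))  # close to 0 for every column
print(toy_norm.std(axis=0))   # close to 1 for every column
```

After this transformation every variable is on a comparable scale, so a variable with large raw values such as trade size does not dominate the others during training.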

In [21]:
# Temporarily combine the training and mixed data so that z-scores are calculated across both sets
AllData = np.concatenate((SimData,MixedData),axis=0)
NormAllData = stats.zscore(AllData[:,0:3], axis=0)
NormAllData = np.concatenate((NormAllData,AllData[:,3].reshape(-1,1)), axis=1) # leave final column as 1/0 values


#NormSimData = stats.zscore(SimData[:,0:3], axis=0) # calculate z-scores of each observation relative to the rest of the column
#NormSimData=np.concatenate((NormSimData,SimData[:,3].reshape(-1,1)), axis=1) # leave final column as 1/0 values
NormSimData = NormAllData[0:70000,:]

#NormMixedData = stats.zscore(MixedData[:,0:3], axis=0) # calculate z-scores of each observation relative to the rest of the column
#NormMixedData=np.concatenate((NormMixedData,MixedData[:,3].reshape(-1,1)), axis=1) # leave final column as 1/0 values
NormMixedData = NormAllData[70000:,:]


# Let's check a sample of the normalised data
np.set_printoptions(precision=4)
np.set_printoptions(suppress=True)
print("Normalised Simulated 'Real' Data")
print(NormSimData[0:10,:])
print("")
print("Normalised Mixed Data")
print(NormMixedData[0:10,:])
print("")
print("ITs in the Mixed Data")
print(NormMixedData[NormMixedData[:,3]==1])


Normalised Simulated 'Real' Data
[[ 0.1703  0.9028 -0.6624  1.    ]
 [ 0.7458  0.0697 -1.3662  1.    ]
 [ 0.3569 -0.9604 -1.4951  1.    ]
 [ 0.1567 -0.0042 -0.1335  1.    ]
 [-0.2627  1.8243 -0.6815  1.    ]
 [ 0.5061  0.2296 -0.0135  1.    ]
 [-0.2145 -0.3097 -0.0636  1.    ]
 [ 1.3567 -0.1346  0.6014  1.    ]
 [ 1.6054  0.2911  0.1351  1.    ]
 [-0.4018 -0.4154  1.373   1.    ]]

Normalised Mixed Data
[[-1.0696 -0.0505  1.1075  0.    ]
 [-0.69    0.9359 -0.9113  0.    ]
 [ 0.931  -1.4488  1.268   0.    ]
 [ 1.5132  1.5351  1.4819  0.    ]
 [ 1.6541 -0.1446  1.47    0.    ]
 [-0.5298 -1.9494 -0.4865  0.    ]
 [-0.8342  1.1866 -0.5099  0.    ]
 [ 1.6437  0.0657 -1.2038  0.    ]
 [-0.091  -1.2702 -0.184   0.    ]
 [ 0.9185 -1.1612  0.9327  0.    ]]

ITs in the Mixed Data
[[ 1.8988 -2.0133 -6.4075  1.    ]
 [ 0.8004 -2.3329 -3.3019  1.    ]
 [ 1.8345 -2.7904 -3.3928  1.    ]
 [ 0.8285 -3.0678 -4.9564  1.    ]
 [ 1.6873 -3.1733 -4.9445  1.    ]
 [ 1.7389 -2.0353 -5.8463  1.    ]
 [ 1.586 

##Define our Discriminator Model

The code below defines a function for our discriminative model within the GAN framework. For this example we are using a standard neural network **(NN)** with `n_inputs` input nodes (set from the number of data columns when the model is created), two hidden layers of 50 nodes each, and a single-node output layer (real/fake predictions).

<br>The code has been implemented within Keras, which enables a high level of customisation for the NN layers. As a result, we could choose to have non-uniform hidden layers and develop far more sophisticated versions of the NN used for the discriminative model.

<br>***Typical Keras ML Model Structure***
<br>1. Define the model as sequential or ___
<br>2. 

In [0]:
def define_discriminator(n_inputs,Learn_Rate):
    model = Sequential()
    model.add(Dense(50, activation='relu', kernel_initializer='he_uniform', input_dim=n_inputs))
    model.add(LeakyReLU(0.2))
    model.add(Dense(50, activation='relu'))
    model.add(LeakyReLU(0.2))
    model.add(Dense(1, activation='sigmoid'))
    # compile model
    opt = Adam(lr=Learn_Rate) # I reduced the learning rate to improve training
    model.compile(loss='binary_crossentropy', optimizer=opt, metrics=['accuracy'])
    return model
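
One detail of the layer stack above may be worth checking: each hidden `Dense` layer already applies `activation='relu'`, so the `LeakyReLU(0.2)` that follows only ever receives non-negative values and passes them through unchanged. The NumPy sketch below illustrates this with hand-written versions of the two activations; it is an illustration of the maths, not Keras code.

```python
import numpy as np

def relu(x):
    # zero out negative values
    return np.maximum(x, 0.0)

def leaky_relu(x, alpha=0.2):
    # pass positive values through, scale negative values by alpha
    return np.where(x > 0, x, alpha * x)

pre_activations = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
after_relu = relu(pre_activations)
after_both = leaky_relu(after_relu)

print(after_relu)
print(after_both)
print(np.array_equal(after_relu, after_both))  # the second activation changes nothing
```

If the leaky behaviour is what is wanted (an assumption about intent, not something the notebook states), one option is to create the hidden `Dense` layers without `activation='relu'` so that `LeakyReLU` is the only nonlinearity.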

##Define our Generative Model

In [0]:
def define_generator(latent_dim, n_outputs):
    model = Sequential()
    model.add(Dense(50, activation='relu', kernel_initializer='he_uniform', input_dim=latent_dim))
    model.add(LeakyReLU(0.2))
    model.add(Dense(50, activation='relu'))
    model.add(LeakyReLU(0.2))
    model.add(Dense(n_outputs, activation='linear'))
    return model

##Define our Consolidated GAN Structure

In [0]:
def define_gan(generator, discriminator,Learn_Rate):
  # make weights in the discriminator not trainable
  discriminator.trainable = False
  # connect them
  model = Sequential()
  # add generator
  model.add(generator)
  # add the discriminator
  model.add(discriminator)
  # compile model
  opt = Adam(lr=Learn_Rate) # I reduced the learning rate to improve training
  model.compile(loss='binary_crossentropy', optimizer=opt)
  return model

##Define Function to Generate Latent Data Points

In [0]:
# generate points in latent space as input for the generator
def generate_latent_points(latent_dim, n=10000):
	# generate points in the latent space
	x_input = rand(latent_dim * n)# testing uniform instead. random normal is typically advised
	# reshape into a batch of inputs for the network
	x_input = x_input.reshape(n, latent_dim)
	return x_input
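
The comment in the function above notes that random normal latent points are typically advised, while the notebook tests uniform values in [0, 1). For comparison, a minimal sketch of the normal variant is shown below. The function name `generate_latent_points_normal` and the `rng` argument are hypothetical additions.

```python
import numpy as np

def generate_latent_points_normal(latent_dim, n=10000, rng=None):
    """Draw n latent points from a standard normal distribution."""
    rng = np.random.default_rng() if rng is None else rng
    # shape (n, latent_dim): one row per generator input
    return rng.standard_normal((n, latent_dim))

rng = np.random.default_rng(0)
z = generate_latent_points_normal(latent_dim=3, n=5, rng=rng)
print(z.shape)  # (5, 3)
```

Unlike the uniform version, these inputs are centred on zero and take both positive and negative values.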

##Instruct Generator to Generate New Samples Based on Latent Data Feed

In [0]:
# use the generator to generate n fake examples, with class labels
def generate_fake_samples(generator, latent_dim, n=10000):
	# generate points in latent space
	x_input = generate_latent_points(latent_dim, n)
	# predict outputs
	X = generator.predict(x_input)
	# create class labels
	y = zeros((n, 1))
	return X, y

##Define Performance Evaluation Function

In [0]:
def summarize_performance(epoch, generator, discriminator, real_data, latent_dim, n=100):
    # prepare real samples
    NumVars = real_data.shape[1]
    x_real, y_real = real_data[:,0:NumVars], real_data[:,-1] 
    # evaluate discriminator on real examples
    _, acc_real = discriminator.evaluate(x_real, y_real, verbose=0)
    # prepare fake examples
    x_fake, y_fake = generate_fake_samples(generator, latent_dim, n)
    # evaluate discriminator on fake examples
    _, acc_fake = discriminator.evaluate(x_fake, y_fake, verbose=0)
    # summarize discriminator performance
    print("Epoch: ", epoch," D_real_acc: ", acc_real, " D_fake_acc: ", acc_fake)
    # scatter plot real and fake data points
    pyplot.figure()
    pyplot.title("Var1 vs. Var2")
    pyplot.scatter(x_real[:, 0], x_real[:, 1], color='red')
    pyplot.scatter(x_fake[:, 0], x_fake[:, 1], color='blue')
    pyplot.show()
    
    pyplot.figure()
    pyplot.title("Var1 vs. Var3")
    pyplot.scatter(x_real[:, 0], x_real[:, 2], color='red')
    pyplot.scatter(x_fake[:, 0], x_fake[:, 2], color='blue')
    pyplot.show()
    
    pyplot.figure()
    pyplot.title("Var2 vs. Var3")
    pyplot.scatter(x_real[:, 1], x_real[:, 2], color='red')
    pyplot.scatter(x_fake[:, 1], x_fake[:, 2], color='blue')
    pyplot.show()

    print("")
    print("")
    print("")
    print("")
    


##Define GAN Training Function

In [0]:
def train(g_model, d_model, gan_model, real_data, latent_dim, n_epochs=10000, n_batch=32, n_eval=2000):
  # determine half the size of one batch, for updating the discriminator
  half_batch = int(n_batch / 2) 
  Cum_real = []
  Cum_fake = []
  Cum_epoch = []
  # manually enumerate epochs
  for i in range(n_epochs):
    # prepare real samples
    NumVars = real_data.shape[1]
    x_real, y_real = real_data[:,0:NumVars], real_data[:,-1]
    # prepare fake examples
    x_fake, y_fake = generate_fake_samples(g_model, latent_dim, half_batch)
    # update discriminator
    d_model.train_on_batch(x_real, y_real)
    d_model.train_on_batch(x_fake, y_fake)
    # prepare points in latent space as input for the generator
    x_gan = generate_latent_points(latent_dim, n_batch)
    # create inverted labels for the fake samples
    y_gan = ones((n_batch, 1))
    # update the generator via the discriminator's error
    gan_model.train_on_batch(x_gan, y_gan)
    # evaluate the model every n_eval epochs
    if (i+1) % n_eval == 0 or i==0:
      summarize_performance(i+1, g_model, d_model, real_data, latent_dim)
      # use the d_model argument rather than relying on the global discriminator variable
      _, acc_real = d_model.evaluate(x_real, y_real, verbose=0)
      _, acc_fake = d_model.evaluate(x_fake, y_fake, verbose=0)
      Cum_real = append(Cum_real,acc_real)
      Cum_fake = append(Cum_fake,acc_fake)
      Cum_epoch = append(Cum_epoch,(i+1))
      # Plot historical training performance
      pyplot.figure()
      pyplot.title("Historical Training Performance")
      pyplot.plot(Cum_epoch, Cum_real,'r')
      pyplot.plot(Cum_epoch, Cum_fake,'b')
      pyplot.show()
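
The training loop above passes the full `real_data` array to `train_on_batch` at every step, while `half_batch` is only used for the fake samples. A common alternative, sketched here as an assumption rather than as the notebook's method, is to draw a random half batch of real rows each iteration so that the real and fake updates use the same number of samples. The helper name `sample_real_batch` is hypothetical.

```python
import numpy as np

def sample_real_batch(real_data, n, rng):
    """Draw n random rows (with replacement) from real_data."""
    idx = rng.integers(0, real_data.shape[0], size=n)
    return real_data[idx]

rng = np.random.default_rng(0)
toy_real = np.arange(40, dtype=float).reshape(10, 4)  # 10 made-up rows, 4 columns
half_batch = 4

batch = sample_real_batch(toy_real, half_batch, rng)
print(batch.shape)  # (4, 4)
```

Each call returns a different random subset, so over many iterations the discriminator still sees the whole training set, at a much lower cost per update.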



##Run our Functions and Evaluate Performance

A well-trained GAN would show a discriminator accuracy of approximately 50% on both real and fake samples during training. This indicates that the generative model is producing synthetic observations that the discriminator cannot tell apart from real ones. At that point neither model can improve further against the other: the generative model and discriminative model have arrived at a Nash equilibrium.
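
To make the equilibrium picture concrete: if the discriminator cannot separate real from fake, its best output is 0.5 for every sample, and the binary cross-entropy loss used in this notebook then equals ln 2 (about 0.693). The sketch below checks this with a hand-written loss on made-up labels; the function and arrays are illustrative only.

```python
import numpy as np

def binary_crossentropy(y_true, y_pred):
    """Mean binary cross-entropy between labels and predicted probabilities."""
    losses = -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
    return float(np.mean(losses))

y_true = np.array([1.0, 1.0, 0.0, 0.0])     # two real, two fake samples
undecided = np.full(4, 0.5)                 # discriminator cannot tell them apart
confident = np.array([0.9, 0.9, 0.1, 0.1])  # discriminator separates them well

loss_undecided = binary_crossentropy(y_true, undecided)
loss_confident = binary_crossentropy(y_true, confident)
print(loss_undecided, np.log(2))
print(loss_confident)
```

A discriminator loss that settles near ln 2 during training is therefore another sign, alongside accuracies near 50%, that the two models are close to balance.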

<br>There are a few parameters that we can play around with for the training process, such as the *learning rate*, *epochs* and *batch size*.

<br>**learning rate** - The learning rate controls how quickly the model is adapted to the problem. 
<br>**epochs** - the number of times that the learning algorithm will work through the entire training dataset.
<br>**batch size** - refers to the number of samples processed before the model is updated.

<br>***Choosing the learning rate***
<br>Smaller learning rates require more training epochs given the smaller changes made to the weights each update, whereas larger learning rates result in rapid changes and require fewer training epochs.

<br>A learning rate that is too large can cause the model to converge too quickly to a suboptimal solution, whereas a learning rate that is too small can cause the process to get stuck. Setting an appropriate learning rate for a GAN is typically a process of trial and error. 
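
The effect of the learning rate can be seen on a toy problem. The sketch below runs plain gradient descent on f(x) = x², which is far simpler than a GAN and is used here only as an illustration. In this simple case a rate that is too small barely moves, a moderate rate converges, and a rate that is too large overshoots on every step and diverges.

```python
def gradient_descent(learning_rate, start=5.0, steps=50):
    """Minimise f(x) = x**2 (gradient 2*x) and return the final x."""
    x = start
    for _ in range(steps):
        x = x - learning_rate * 2.0 * x
    return x

small = gradient_descent(0.001)    # tiny steps: still far from the minimum at 0
moderate = gradient_descent(0.1)   # converges close to 0
too_large = gradient_descent(1.1)  # overshoots each step and diverges

print(abs(small), abs(moderate), abs(too_large))
```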

<br>***Background on setting batch sizes***
<br>The three common approaches are outlined below:
<br>**Batch Gradient Descent**: Batch Size = Size of Training Set
<br>**Stochastic Gradient Descent**: Batch Size = 1
<br>**Mini-Batch Gradient Descent**: 1 < Batch Size < Size of Training Set
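
The three approaches differ only in how many rows go into each update. The sketch below, which uses a made-up array and a hypothetical helper `iterate_minibatches`, counts the number of weight updates that one pass over 100 rows would produce for each batch size.

```python
import numpy as np

def iterate_minibatches(data, batch_size, rng):
    """Yield shuffled batches of at most batch_size rows, covering data once."""
    order = rng.permutation(data.shape[0])
    for start in range(0, data.shape[0], batch_size):
        yield data[order[start:start + batch_size]]

rng = np.random.default_rng(0)
toy = np.arange(200, dtype=float).reshape(100, 2)  # 100 made-up rows

updates_per_epoch = {}
for name, batch_size in [("batch", 100), ("stochastic", 1), ("mini-batch", 32)]:
    # each yielded batch corresponds to one model update
    updates_per_epoch[name] = sum(1 for _ in iterate_minibatches(toy, batch_size, rng))

print(updates_per_epoch)  # {'batch': 1, 'stochastic': 100, 'mini-batch': 4}
```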

<br>Mini-Batch Gradient Descent is the most common approach in deep learning and is therefore what we have used for our example. The key benefits and downsides of this approach are summarised below.

<br>**Benefits**
* The model update frequency is higher than in batch gradient descent, which allows for more robust convergence and helps to avoid local minima.
* The batched updates are computationally more efficient than stochastic gradient descent.
* Batching means that the full training set does not need to be held in memory at once, and it lends itself to efficient algorithm implementations.

<br>**Downside**
* Error information must be accumulated across mini-batches of training examples like batch gradient descent.

<br>***So what is the optimal batch size?***
<br>Batch sizes are often tuned to the computational architecture on which the implementation is executed, for example a power of two that fits the memory of the GPU or CPU hardware, such as 32, 64, 128 or 256.
<br>A batch size of 32 appears to be most strongly supported by machine learning literature:
* "Practical recommendations for gradient-based training of deep architectures", 2012
* "Revisiting Small Batch Training for Deep Neural Networks", 2018
<br>

In [29]:
epochs = 30000
batches = 64
AdamlearnRate = 0.001 # default is 0.001, 0.0005 works ok
eval_increments = 100
real_data = NormSimData


# size of the latent space
latent_dim = 3 # doesn't need to be 3. could be any size. chose 3 to match output dimensions

Inp_Outs = NormSimData.shape[1]

# create the discriminator
discriminator = define_discriminator(Inp_Outs,AdamlearnRate)

# create the generator
generator = define_generator(latent_dim,Inp_Outs)

# create the gan
gan_model = define_gan(generator, discriminator,AdamlearnRate)

# train model
train(generator, discriminator, gan_model, real_data, latent_dim,epochs,batches,eval_increments)




In [30]:

NumVars = NormMixedData.shape[1]
NormMixedData_Norm = NormMixedData[NormMixedData[:,3]==0]
NormMixedData_Anom = NormMixedData[NormMixedData[:,3]==1]

x_mixed_Norm, y_mixed_Norm = NormMixedData_Norm[:,0:NumVars], NormMixedData_Norm[:,-1] 
x_mixed_Anom, y_mixed_Anom = NormMixedData_Anom[:,0:NumVars], NormMixedData_Anom[:,-1] 

# evaluate discriminator on mixed (normal/anomaly) data sample

_, acc_normal = discriminator.evaluate(x_mixed_Norm, y_mixed_Norm)
_, acc_anomaly = discriminator.evaluate(x_mixed_Anom, y_mixed_Anom) 
print("D_Normal_acc: ", acc_normal)
print("D_IT_acc: ", acc_anomaly)


D_Normal_acc:  1.0
D_IT_acc:  0.949999988079071
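
The two accuracies above are reported per class. As a rough sketch of how per-class results can be combined into detection metrics, the snippet below computes precision and recall for the anomaly class. The flags and labels are made up for illustration and are not the notebook's outputs; the helper name `precision_recall` is hypothetical.

```python
import numpy as np

def precision_recall(is_anomaly, flagged):
    """Precision and recall of the flags with respect to the true anomalies."""
    tp = np.sum(flagged & is_anomaly)   # anomalies that were flagged
    fp = np.sum(flagged & ~is_anomaly)  # normal rows flagged by mistake
    fn = np.sum(~flagged & is_anomaly)  # anomalies that were missed
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return float(precision), float(recall)

# Made-up example: 6 normal rows and 2 anomalies
is_anomaly = np.array([0, 0, 0, 0, 0, 0, 1, 1], dtype=bool)
flagged = np.array([0, 1, 0, 0, 0, 0, 1, 0], dtype=bool)

precision, recall = precision_recall(is_anomaly, flagged)
print(precision, recall)  # 0.5 0.5
```

Because the mixed data contains only 20 anomalies among 30,000 normal observations, even a small false-positive rate on the normal rows can outnumber the true detections, which is why precision is worth tracking alongside per-class accuracy.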


##Concluding Comments

This notebook has provided an example of how to implement a vanilla GAN structure. The GAN structure has been able to identify anomalous patterns in previously unseen data. 

<br>Insider trading is notoriously challenging to identify. Whilst a vanilla GAN of the form implemented in this notebook may have some success at identifying insider trading, it is likely that a more sophisticated GAN capable of analysing sequences will be more effective. Therefore, we can extend the framework above to an **LSTM-GAN** framework. By replacing the NNs in the GAN with LSTM networks, the model will be able to look at sequences of trades rather than stand-alone observations. This will be valuable in identifying situations where insider traders attempt to obscure their activity by placing sequences of smaller trades.
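
As a small sketch of the data preparation such an extension would need, the snippet below turns a table of stand-alone observations into overlapping windows of consecutive rows, in the `(samples, timesteps, features)` layout that Keras LSTM layers expect. The window length, the toy array and the helper name `make_windows` are assumptions for illustration.

```python
import numpy as np

def make_windows(observations, window):
    """Stack overlapping windows of `window` consecutive rows."""
    n_windows = observations.shape[0] - window + 1
    return np.stack([observations[i:i + window] for i in range(n_windows)])

# 6 made-up observations of 3 variables, in time order
toy = np.arange(18, dtype=float).reshape(6, 3)

windows = make_windows(toy, window=4)
print(windows.shape)  # (3, 4, 3): 3 sequences, 4 timesteps, 3 features
```

Each window would then be treated as one sample, so a run of several small trades appears to the model as a single sequence rather than as unrelated rows.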