Signet Detector


The SigNet Detector artificial neural network (ANN) is in charge of distinguishing realistic-looking samples from unfamiliar ones. Essentially, it is an out-of-distribution (OOD) detector that prevents the refitter from being run on data too different from the data used to train it. We therefore perform a binary classification, labeling samples that look familiar as 1 and those that do not as 0, and train the network with a binary cross-entropy loss.
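For reference, the binary cross-entropy loss for one sample with predicted familiarity probability $p$ and label $y \in \{0, 1\}$ is $-[y \log p + (1 - y)\log(1 - p)]$. The sketch below writes it out in plain Python; the function names and the clipping constant `eps` are illustrative assumptions, and a real training loop would more likely use a framework's built-in loss (for example PyTorch's `torch.nn.BCELoss`).

```python
import math

def binary_cross_entropy(p_familiar, label, eps=1e-7):
    """BCE for one sample; label is 1 (familiar) or 0 (unfamiliar)."""
    # Clip the predicted probability so log() never receives exactly 0.
    p = min(max(p_familiar, eps), 1.0 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))

def mean_bce(probs, labels):
    """Average BCE over a batch, the quantity minimized during training."""
    return sum(binary_cross_entropy(p, y) for p, y in zip(probs, labels)) / len(probs)

# A confident, correct prediction costs little; a confident, wrong one costs a lot.
print(round(binary_cross_entropy(0.9, 1), 4))  # 0.1054
print(round(binary_cross_entropy(0.9, 0), 4))  # 2.3026
```

The asymmetry in the two printed values is what pushes the detector to output values near 1 only for samples that resemble the training distribution.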

The network's architecture is a feed-forward ANN composed of standard linear layers with leaky ReLU activations, trained with the Adam optimizer. The input is the 96-element mutation vector, which contains the number of mutations in each mutation category. We decompose it into two inputs: the total number of mutations (the sum of all elements of the mutation vector) and the normalized mutation vector (the same vector divided by the total number of mutations). Normalizing the vector puts the inputs from different samples on the same order of magnitude, which makes the network easier to train. At the same time, it is important to keep the information about the number of mutations, since samples with few mutations are very noisy and the model needs to learn to cope with this extra layer of complexity. SigNet Detector's output is a value between 0 and 1 that represents the probability that the input sample belongs to the familiar class.
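The input decomposition and the forward pass described above can be sketched with NumPy as follows. This is a minimal illustration, not SigNet's actual model: the hidden widths (64 and 32), the leaky ReLU slope of 0.01, the weight initialization, and passing the total as `log10(total)` are all assumptions, since the text does not specify them. The weights here are untrained; in practice they would be fitted with Adam against the binary cross-entropy loss.

```python
import numpy as np

N_CATEGORIES = 96  # mutation categories in the input vector

def split_mutation_vector(counts):
    """Decompose a raw 96-element count vector into (normalized vector, total)."""
    counts = np.asarray(counts, dtype=float)
    total = counts.sum()
    if total <= 0:
        raise ValueError("sample has no mutations")
    return counts / total, total

def leaky_relu(x, slope=0.01):
    """Leaky ReLU; the slope value is an assumption."""
    return np.where(x > 0, x, slope * x)

def init_layers(sizes, rng):
    """Random (weight, bias) pairs for a stack of linear layers (hypothetical sizes)."""
    return [(rng.normal(0.0, np.sqrt(2.0 / n_in), (n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def detector_forward(layers, normalized, total):
    """Linear + leaky ReLU hidden layers, sigmoid output in (0, 1)."""
    # Assumption: the total is fed as log10(total) so its scale is comparable
    # to the normalized entries; the source does not say how it is scaled.
    x = np.concatenate([normalized, [np.log10(total)]])
    for i, (w, b) in enumerate(layers):
        x = x @ w + b
        if i < len(layers) - 1:  # no activation before the output unit
            x = leaky_relu(x)
    return float(1.0 / (1.0 + np.exp(-x[0])))  # P(sample is "familiar")

rng = np.random.default_rng(0)
layers = init_layers([N_CATEGORIES + 1, 64, 32, 1], rng)  # hypothetical widths
counts = rng.integers(0, 20, size=N_CATEGORIES)           # toy sample
normalized, total = split_mutation_vector(counts)
print(detector_forward(layers, normalized, total))
```

Keeping the total as a separate input, rather than discarding it after normalization, is what lets the network treat a 30-mutation profile differently from a 30,000-mutation profile with the same shape.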

The training set contains equal numbers of realistic-looking samples and random samples. The realistic-looking samples were generated by sampling from the real linear combinations of signatures provided by SigProfiler for the PCAWG dataset. The random samples were generated by (1) selecting a random number of signatures between 1 and 10, each signature having the same probability of being chosen, and (2) assigning uniform random weights to the selected signatures. We generated samples ranging from 25 mutations up to the order of $10^5$ mutations so that the detector is robust to any real sample the user may provide.
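The generation procedure above can be sketched as follows. Everything concrete here is a hypothetical stand-in: the signature matrix and the "real" weights are random Dirichlet draws rather than the reference signatures and SigProfiler PCAWG combinations, mutation counts are drawn with a multinomial, and the total number of mutations is drawn log-uniformly between 25 and $10^5$, which is an assumption since the text only gives the range.

```python
import numpy as np

N_CATEGORIES = 96

def sample_counts(signatures, weights, n_mutations, rng):
    """Draw a mutation-count vector from a linear combination of signatures."""
    probs = signatures @ weights
    probs = probs / probs.sum()  # mixture distribution over the 96 categories
    return rng.multinomial(n_mutations, probs)

def random_weights(n_signatures, rng, max_active=10):
    """Unfamiliar-class weights: 1-10 signatures, chosen uniformly, uniform weights."""
    k = rng.integers(1, max_active + 1)                      # 1..10 inclusive
    active = rng.choice(n_signatures, size=k, replace=False)  # each equally likely
    weights = np.zeros(n_signatures)
    weights[active] = rng.uniform(size=k)
    return weights / weights.sum()

def random_mutation_count(rng, low=25, high=1e5):
    """Total mutations drawn log-uniformly in [low, high] (assumed distribution)."""
    return int(round(10 ** rng.uniform(np.log10(low), np.log10(high))))

rng = np.random.default_rng(42)
# Hypothetical stand-ins for the reference signatures and a real PCAWG combination.
signatures = rng.dirichlet(np.ones(N_CATEGORIES), size=40).T  # shape (96, 40)
real_weights = rng.dirichlet(np.ones(40))

familiar = sample_counts(signatures, real_weights, random_mutation_count(rng), rng)
unfamiliar = sample_counts(signatures, random_weights(40, rng),
                           random_mutation_count(rng), rng)
# In a full pipeline, equal numbers of each would be generated and labeled 1 and 0.
```

Sampling counts from the mixture, instead of using the mixture itself, reproduces the sampling noise that makes low-mutation samples hard, which is why the mutation-count range matters for robustness.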