
4. Interpreting results

Breon Schmidt edited this page Jan 21, 2021 · 1 revision

ALLSorts currently outputs four key results:

Visual Outputs

distributions.png

Probability distribution of samples for each subtype. Black dots are samples with a negative label for that subtype; red dots are samples with a positive label. The green lines are probability thresholds, calculated through cross-validation using either the F1 score or the maximal distance between the highest-scoring negative sample and the lowest-scoring positive sample.
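To make the two threshold criteria concrete, here is a minimal sketch of how a single threshold could be chosen from one set of probabilities and labels (for example, one cross-validation fold). This is not ALLSorts' own code: the function names are hypothetical, candidate thresholds are assumed to be the observed probabilities, and placing the gap threshold at the midpoint between the highest negative and lowest positive sample is our assumption.

```python
from typing import Sequence


def f1_threshold(probs: Sequence[float], labels: Sequence[bool]) -> float:
    """Hypothetical helper: return the candidate threshold with the best F1.

    Candidates are the observed probabilities; a sample is called positive
    when its probability is >= the threshold. Ties keep the lowest threshold.
    """
    best_t, best_f1 = 0.5, -1.0
    for t in sorted(set(probs)):
        tp = sum(p >= t and y for p, y in zip(probs, labels))
        fp = sum(p >= t and not y for p, y in zip(probs, labels))
        fn = sum(p < t and y for p, y in zip(probs, labels))
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t


def gap_threshold(probs: Sequence[float], labels: Sequence[bool]) -> float:
    """Hypothetical helper: midpoint of the gap between the highest-scoring
    negative sample and the lowest-scoring positive sample (assumption)."""
    highest_neg = max(p for p, y in zip(probs, labels) if not y)
    lowest_pos = min(p for p, y in zip(probs, labels) if y)
    return (highest_neg + lowest_pos) / 2


# Illustrative values only, not real ALLSorts output.
probs = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
labels = [False, False, False, True, True, True]
print(f1_threshold(probs, labels), gap_threshold(probs, labels))
```

When positives and negatives are perfectly separated, as in this toy example, both criteria put the threshold inside the same gap; they differ when the classes overlap, where only the F1 criterion is well defined.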

waterfalls.png

Waterfall plot (a name we made up!) of samples. The x-axis shows the predicted class and the y-axis shows the probability of belonging to that subtype. Colours within the plot represent the true label; white represents samples with no previously known label. Note: subtypes that have multiple associated labels are not shown.
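One plausible way to lay out samples for a plot like this is to group them by predicted subtype and then sort each group by descending probability, which produces the "waterfall" shape. The sketch below only computes that ordering; it is our assumption about the layout, not ALLSorts' plotting code, and the function and input names are hypothetical.

```python
from collections import defaultdict


def waterfall_order(predictions, probabilities):
    """Hypothetical helper: order samples for a waterfall-style plot.

    predictions:   dict of sample -> predicted subtype
    probabilities: dict of sample -> probability of its predicted subtype
    Returns (subtype, sample) pairs: subtypes alphabetically, samples
    within each subtype from highest to lowest probability.
    """
    groups = defaultdict(list)
    for sample, subtype in predictions.items():
        groups[subtype].append(sample)

    order = []
    for subtype in sorted(groups):
        ranked = sorted(groups[subtype], key=lambda s: probabilities[s], reverse=True)
        order.extend((subtype, s) for s in ranked)
    return order


# Illustrative values only.
predictions = {"s1": "Ph-like", "s2": "ETV6-RUNX1", "s3": "Ph-like", "s4": "ETV6-RUNX1"}
probabilities = {"s1": 0.6, "s2": 0.95, "s3": 0.9, "s4": 0.7}
print(waterfall_order(predictions, probabilities))
```

Each pair's position in the list would then map to an x position, its probability to the bar height, and the sample's true label (or white, if unknown) to its colour.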

Text Outputs

predictions.csv

Simply a list of predictions made by ALLSorts.

probabilities.csv

A matrix of samples (rows) by subtypes (columns), in which each value is the probability of that subtype. The final column contains the prediction (and the true label, if labels were supplied). There are two things to note here.

  1. The probabilities need not sum to 1. This will be explained in a methods section (to-do).
  2. The probabilities of child subtypes have been multiplied by the probability of their parent (Ph and Ph-like are children of Ph Group), i.e. Ph Group probability × original Ph-like/Ph probability.
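The parent/child adjustment in point 2 can be sketched for a single sample as follows. The function name, the dictionary layout and the probability values are hypothetical and for illustration only. The sketch assumes a single level of hierarchy, as in the Ph Group example, so each parent's own probability is left unchanged.

```python
def apply_hierarchy(probs, parents):
    """Hypothetical helper: scale each child subtype by its parent's probability.

    probs:   dict of subtype -> original probability for one sample
    parents: dict of child subtype -> parent group
    Assumes one level of hierarchy (parents are not themselves children).
    """
    adjusted = dict(probs)
    for child, parent in parents.items():
        adjusted[child] = probs[parent] * probs[child]
    return adjusted


# Illustrative values only, not real ALLSorts output.
probs = {"Ph Group": 0.8, "Ph": 0.25, "Ph-like": 0.75, "ETV6-RUNX1": 0.05}
parents = {"Ph": "Ph Group", "Ph-like": "Ph Group"}
print(apply_hierarchy(probs, parents))
```

Here Ph-like becomes 0.8 × 0.75 = 0.6 and Ph becomes 0.8 × 0.25 = 0.2, while subtypes without a parent are unchanged. As point 1 notes, the resulting row does not have to sum to 1.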