# Graboid documentation
## Classification
This module contains the functions used for KNN classification of sequence data.

### Functions
* **calc_distance(seq1, seq2, dist_mat)** Returns the distance between *seq1* and *seq2* using *dist_mat*. <ins>NOTE: moved from *distance*; delete that one</ins>
* **get_dists(query, data, dist_mat)** Returns the distance between *query* and the sequences in the *data* matrix using *dist_mat*
* **get_neighs(q, data, dist_mat)** Get *q*'s neighbours in *data* ordered by their proximity to *q*. Also returns the sorted distances, calculated using *dist_mat*
* **wknn(dists)** Returns the weighted support for the given distances (*dists*) using the WKNN equation
* **dwknn(dists)** Returns the weighted support for the given distances (*dists*) using the DWKNN equation
* **softmax(supports)** Calculates the softmax activation given an array of supports. <ins>NOTE: could implement this in the weighted classification</ins>
* **classify**
* **calibration_classify(q, k_range, data, tax_tab, dist_mat, q_name = 0)** Calibration of a single instance using a range of neighbours and all classification methods. Argument *q* is the query instance, *k_range* is the range of neighbours to utilize, *data* is the reference sequence matrix, *tax_tab* is the taxonomy table for the reference data, *dist_mat* is the distance matrix to be used in the classification, and *q_name* is the numeric value of the query, used to organize results. Returns three arrays, *maj_results*, *wknn_results*, and *dwknn_results*, containing the classification results generated for each method
* **classify_majority(neighs, tax_tab, q_name = 0, total_k = 1)** Classify a query instance selecting for each rank the most represented taxon amongst the given *neighs*. Returns an array with columns *q_name*, *rank*, *taxon*, *max value*, *total_k*
* **classify_weighted(neighs, supports, tax_tab, q_name = 0, total_k = 1)** Classify a query instance using the weighted *supports* of the given *neighs*. Taxonomy for the provided neighbours is given in *tax_tab*. Argument *q_name* is used to name the query in the result table. Argument *total_k* indicates the number of neighbours considered. Returns an array with columns *query name*, *rank*, *taxon*, *representative count*, *total_k*, *total tax support*, *mean taxon support*, *std taxon supports*, detailing the support for each rank in each taxon
* **get_classif(results, mode = 'majority')** Gets a classification from the given result table. Argument *mode* specifies the classification method used to generate the result, values are *majority* and *weighted*
* **get_classif_majority(results, n_ranks = 6)** Get the classification from a majority vote result table. Assigns the most represented taxon for each rank; if there is a draw, the classification is left ambiguous. Returns an array with the assigned taxon for each rank
* **get_classif_weighted(results, n_ranks = 6)** Get the classification from a weighted vote result table. Assigns the most supported taxon for each rank; if there is a draw, the classification is left ambiguous. Returns an array with the assigned taxon for each rank. <ins>NOTE: could add the softmax support for the assigned classification</ins>
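
The following is a minimal sketch of two ideas described in the list above: the softmax activation over an array of supports, and a majority vote that leaves the result ambiguous when there is a draw. The names `softmax_sketch` and `majority_vote_sketch` are hypothetical and are not part of Graboid; using `None` to signal an ambiguous result is also an assumption, since the document does not say how a draw is encoded.

```python
import numpy as np

def softmax_sketch(supports):
    """Softmax over a 1-D array of supports (hypothetical helper, not Graboid's)."""
    supports = np.asarray(supports, dtype=float)
    # Subtracting the maximum avoids overflow and does not change the result
    exps = np.exp(supports - supports.max())
    return exps / exps.sum()

def majority_vote_sketch(taxa):
    """Return the most represented taxon, or None if there is a draw.

    Assumption: None stands in for an ambiguous classification.
    """
    values, counts = np.unique(taxa, return_counts=True)
    winners = values[counts == counts.max()]
    return winners[0] if len(winners) == 1 else None
```

For instance, `majority_vote_sketch([3, 3, 7])` picks taxon `3`, while `majority_vote_sketch([3, 7])` is a draw and yields `None`.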

## Cost matrix
This module contains the functions used to generate the cost matrices used in distance calculations.
### Functions
* **pair_idxs(bases0, bases1)** Called by *cost_matrix*, used to calculate distances between ambiguous bases
* **cost_matrix(transition = 1, transversion = 2)** Generates a distance matrix based on the K2P substitution model. Arguments *transition* and *transversion* determine how these substitutions are penalized
* **id_matrix()** Generates an ID matrix with diagonal values of 0 (except cell 0,0) and 1 everywhere else
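
As an illustration of how transition and transversion penalties could be laid out, here is a simplified sketch restricted to the four unambiguous bases. The base order in `BASES`, the function names `k2p_cost_sketch` and `distance_sketch`, and the idea of summing per-site costs are all assumptions made for this example. Graboid's own *cost_matrix* also handles ambiguous bases and may use a different encoding.

```python
import numpy as np

BASES = "ACGT"          # hypothetical index order, not Graboid's encoding
PURINES = {"A", "G"}    # C and T are the pyrimidines

def k2p_cost_sketch(transition=1, transversion=2):
    """Build a 4x4 substitution cost matrix for unambiguous bases."""
    n = len(BASES)
    mat = np.zeros((n, n))
    for i, b0 in enumerate(BASES):
        for j, b1 in enumerate(BASES):
            if i == j:
                continue  # identical bases cost nothing
            # Purine<->purine or pyrimidine<->pyrimidine is a transition
            same_class = (b0 in PURINES) == (b1 in PURINES)
            mat[i, j] = transition if same_class else transversion
    return mat

def distance_sketch(seq1, seq2, dist_mat):
    """Sum the per-site costs of two equal-length sequences (assumed scheme)."""
    idx1 = [BASES.index(b) for b in seq1]
    idx2 = [BASES.index(b) for b in seq2]
    return dist_mat[idx1, idx2].sum()
```

With the default penalties, `distance_sketch("ACGT", "GCGA", k2p_cost_sketch())` adds one transition (A to G) and one transversion (T to A), giving a total cost of 3.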