# Graboid documentation
## Mapping

### BLAST
This module handles the alignment of the database or query sequences to a single reference sequence of the selected molecular marker.

#### Functions
* **blast(query, ref, out_dir, threads = 1)** Performs an ungapped blastn search of the given *query* against the given *ref*. The result is stored to *out_file*.
* **makeblastdb(ref_file, db_prefix)** Makes a blast database for the given *ref_file*. Generated files use the given *db_prefix*.
* **check_db_dir(db_dir)** Counts possible blastdb files in the directory *db_dir*. Returns True if exactly 6 files are found, False otherwise. Additionally returns a list of the found files.
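
The two command-line calls these functions wrap can be sketched as below. This is a minimal illustration and not the module's actual code: the helper names `blastn_cmd`, `makeblastdb_cmd` and `run` are hypothetical, and the tabular output format string is an assumption based on the columns that *read_blast* expects. The flags themselves (`-ungapped`, `-num_threads`, `-dbtype`, and so on) are standard BLAST+ options.

```python
import subprocess

# Assumed output layout: tabular (format 6) with the columns read_blast expects.
OUTFMT = "6 qseqid pident length qstart qend sstart send evalue"


def blastn_cmd(query, db_prefix, out_file, threads=1):
    """Build an ungapped blastn command line (hypothetical helper)."""
    return [
        "blastn",
        "-query", str(query),
        "-db", str(db_prefix),
        "-out", str(out_file),
        "-outfmt", OUTFMT,
        "-ungapped",
        "-num_threads", str(threads),
    ]


def makeblastdb_cmd(ref_file, db_prefix):
    """Build a makeblastdb command line for a nucleotide reference (hypothetical helper)."""
    return [
        "makeblastdb",
        "-in", str(ref_file),
        "-dbtype", "nucl",
        "-out", str(db_prefix),
    ]


def run(cmd):
    """Run a command and raise CalledProcessError if it exits with an error."""
    subprocess.run(cmd, check=True)
```

Building the argument list separately from running it keeps the command easy to inspect or log before BLAST+ is invoked.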

#### Blaster
**class Blaster()**

This class handles all BLAST-related tasks.

##### Attributes
* **db_dir** Path to the blast database directory, also contains the database prefix

##### Methods
* **make_ref_db(ref_file, db_dir, clear = False)** Generates a blast database for the given *ref_file*. Generated files are stored in the directory *db_dir*. If the specified directory already contains database files and the argument *clear* is set to true, delete existing files and replace them with new ones.
* **blast(fasta_file, out_file, threads = 1)** Blasts the given query against the database files generated by *make_ref_db*. Returns *out_file* if successful, None otherwise.
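
The *clear* behaviour of *make_ref_db* can be illustrated with the sketch below. It is an assumption-laden outline, not the class's implementation: the function name `prepare_db_dir` is hypothetical, every file in the directory is treated as a database file, and returning False when files exist and *clear* is not set is a guess, since the documentation only describes the case where *clear* is true.

```python
from pathlib import Path


def prepare_db_dir(db_dir, clear=False):
    """Return True if a new database may be written into db_dir (hypothetical helper)."""
    db_dir = Path(db_dir)
    db_dir.mkdir(parents=True, exist_ok=True)
    # Assumption: any regular file in the directory counts as a database file.
    existing = [p for p in db_dir.iterdir() if p.is_file()]
    if existing and not clear:
        # Assumed behaviour: leave the existing database untouched.
        return False
    for path in existing:
        path.unlink()
    return True
```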

### Matrix2 (rename to matrix_builder)
This module builds an alignment matrix from the blast report generated by a *blaster* instance.

#### Functions
* **read_blast(blast_file, evalue = 0.005)** Loads the blast report stored in *blast_file*. Assumes the columns generated are *qseqid*, *pident*, *length*, *qstart*, *qend*, *sstart*, *send* and *evalue*. The maximum e-value allowed is set by the argument *evalue*. Returns the loaded table as a pandas dataframe.
* **read_seqfile(seq_file)** Loads the fasta file specified by *seq_file*. Returns a dictionary with accession:sequence as key:value pairs.
* **get_seqs(seq_file, blast_tab)** Returns a dictionary containing the sequences from *seq_file* present in *blast_tab*. Key:value pairs are accession:sequence
* **get_mat_dims(blast_tab)** Infers the alignment matrix dimensions from the values in *blast_tab*. Returns *nrows*, the number of rows in *blast_tab*; *ncols*, the maximum extent of matches over the reference sequence; and *offset*, the minimum value of match starts in *blast_tab*.
* **build_coords0(coord_mat)** Sorts coordinates for the given match. Flips reverse matches. Returns *seq_coords*, the portions of each sequence to take, and *mat_coords*, the location of each match in the alignment matrix.
* **build_row0(seq, seq_coords, mat_coords, rowlen)** Uses *seq_coords* and *mat_coords* to incorporate *seq* into a row of the alignment matrix. Returns *row*, a numpy array of length *rowlen* containing the aligned *seq* turned to numeric values.
* **plot_coverage_data(blast_file, evalue = 0.005, figzise = (12, 7))** WIP
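
The path from blast report to alignment matrix row can be sketched as follows. This is a simplified illustration under stated assumptions, not the module's code: the `*_sketch` function names are hypothetical, the report is assumed to be tab-separated, *ncols* is assumed to be `max - offset + 1`, the numeric base encoding is invented for the example, and reverse matches are not handled.

```python
import io

import numpy as np
import pandas as pd

COLUMNS = ["qseqid", "pident", "length", "qstart", "qend", "sstart", "send", "evalue"]
# Assumed encoding: 0 marks positions with no data or an unknown base.
BASE_CODES = {"A": 1, "C": 2, "G": 3, "T": 4}


def read_blast_sketch(handle, evalue=0.005):
    """Load a tab-separated report and keep matches at or below the e-value cutoff."""
    tab = pd.read_csv(handle, sep="\t", header=None, names=COLUMNS)
    return tab.loc[tab["evalue"] <= evalue].reset_index(drop=True)


def mat_dims_sketch(tab):
    """Return (nrows, ncols, offset) inferred from the subject coordinates."""
    subject = tab[["sstart", "send"]].to_numpy()
    offset = int(subject.min())
    ncols = int(subject.max()) - offset + 1
    return len(tab), ncols, offset


def build_row_sketch(seq, qstart, qend, sstart, offset, rowlen):
    """Place the matched part of seq into a numeric row (forward matches only)."""
    row = np.zeros(rowlen, dtype=np.int8)
    segment = seq[qstart - 1:qend]  # BLAST coordinates are 1-based and inclusive
    codes = [BASE_CODES.get(base, 0) for base in segment.upper()]
    start = sstart - offset
    row[start:start + len(codes)] = codes
    return row


# Made-up report with three matches; the second fails the e-value cutoff.
report = io.StringIO(
    "seq1\t100.0\t4\t1\t4\t3\t6\t1e-10\n"
    "seq2\t95.0\t5\t2\t6\t1\t5\t0.5\n"
    "seq3\t90.0\t3\t1\t3\t5\t7\t1e-4\n"
)
tab = read_blast_sketch(report)
nrows, ncols, offset = mat_dims_sketch(tab)
first = tab.iloc[0]
row = build_row_sketch("ACGT", int(first.qstart), int(first.qend), int(first.sstart), offset, ncols)
```

Because the search is ungapped, the matched query segment has the same length as the region it covers on the reference, so it can be copied into the row without inserting gaps.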