# Graboid documentation
## Mapping

### BLAST
This module handles the alignment of the database or query sequences to a single reference sequence of the selected molecular marker.

#### Functions
* **blast(query, ref, out_dir, threads = 1)** Performs an ungapped blastn search of the given *query* against the given *ref*. The result is stored to *out_file*.
* **makeblastdb(ref_file, db_prefix)** Make a blast database for the given *ref_file*. Generated files use the given *db_prefix*
* **check_db_dir(db_dir)** Counts possible blastdb files in the directory *db_dir*. Returns True if exactly 6 files are found, False otherwise. Additionally returns a list of the found files
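
As a rough illustration of what the two wrappers above might run, the sketch below assembles the corresponding BLAST+ command lines. The helper names `makeblastdb_cmd` and `blastn_cmd` are hypothetical, and the exact flags Graboid passes are an assumption; the output columns are the ones `read_blast` expects.

```python
def makeblastdb_cmd(ref_file, db_prefix):
    """Command line for building a nucleotide BLAST database (assumed flags)."""
    return ["makeblastdb", "-in", ref_file, "-dbtype", "nucl", "-out", db_prefix]


def blastn_cmd(query, db_prefix, out_file, threads=1):
    """Command line for an ungapped blastn search with a tabular report."""
    # Custom tabular format (outfmt 6) restricted to the columns used downstream
    outfmt = "6 qseqid pident length qstart qend sstart send evalue"
    return [
        "blastn",
        "-query", query,
        "-db", db_prefix,
        "-out", out_file,
        "-outfmt", outfmt,
        "-ungapped",
        "-num_threads", str(threads),
    ]
```

Either list can be handed to `subprocess.run(cmd, check=True)` once BLAST+ is installed and on the `PATH`.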

#### Blaster
**class Blaster()**

This class handles all BLAST-related tasks.

##### Attributes
* **report** Path to the generated blast report

##### Methods
* **make_ref_db(ref_file, db_dir, clear = False)** Generates a blast database for the given *ref_file*. Generated files are stored in the directory *db_dir*. If the specified directory already contains database files and the argument *clear* is set to True, the existing files are deleted and replaced with new ones.
* **blast(fasta_file, out_file, db_dir, threads = 1)** Blasts the sequences of *fasta_file* against the database given by *db_dir*. If BLAST is successful, the attribute *report* is updated with the path to the generated file.
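
The *clear* logic of `make_ref_db` could look roughly like the sketch below. The helpers `find_db_files` and `prepare_db_dir` are hypothetical, the file-name pattern is an assumption, and so is the behaviour when files exist and *clear* is False (here the existing database is kept and no new one is requested).

```python
import os
import re

# Assumption: nucleotide BLAST database files end in ".n" plus two letters
# (for example .nhr, .nin, .nsq); the real check may use a different rule.
DB_FILE_PATTERN = re.compile(r"\.n[a-z]{2}$")


def find_db_files(db_dir):
    """Return the sorted names of files in db_dir that look like BLAST db files."""
    return sorted(f for f in os.listdir(db_dir) if DB_FILE_PATTERN.search(f))


def prepare_db_dir(db_dir, clear=False):
    """Return True if a new database should be generated in db_dir."""
    os.makedirs(db_dir, exist_ok=True)
    existing = find_db_files(db_dir)
    if not existing:
        return True
    if not clear:
        # Assumed behaviour: keep the existing database untouched
        return False
    for name in existing:
        os.remove(os.path.join(db_dir, name))
    return True
```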

### Matrix2 (rename to matrix_builder)
This module builds an alignment matrix from the blast report generated by a *blaster* instance.

#### Variables
* **bases**
* **tr_dict**

#### Functions
* **read_blast(blast_file, evalue = 0.005)** Loads the blast report stored in *blast_file*. Assumes the columns generated are *qseqid*, *pident*, *length*, *qstart*, *qend*, *sstart*, *send* and *evalue*. The maximum e-value allowed is determined by the argument *evalue*. Returns the loaded table as a pandas dataframe.
* **read_seqfile(seq_file)** Loads the fasta file specified by *seq_file*. Returns a dictionary with accession:sequence as key:value pairs
* **get_seqs(seq_file, blast_tab)** Returns a dictionary containing the sequences from *seq_file* present in *blast_tab*. Key:value pairs are accession:sequence
* **get_mat_dims(blast_tab)** Infers the alignment matrix dimensions from the values in *blast_tab*. Returns *nrows*, taken as the number of rows in *blast_tab*, *ncols*, as the maximum extent of matches over the reference sequence, and *offset* as the minimum value of match starts in *blast_tab*
* **build_coords0(coord_mat)** Sorts coordinates for the given match. Flips reverse matches. Returns *seq_coords*, the portions of each sequence to take, and *mat_coords*, the location of each match in the alignment matrix.
* **build_row0(seq, seq_coords, mat_coords, rowlen)** Uses *seq_coords* and *mat_coords* to incorporate *seq* into a row of the alignment matrix. Returns *row*, a numpy array of length *rowlen* containing the aligned *seq* turned to numeric values.
* **plot_coverage_data(blast_file, evalue = 0.005, figzise = (12, 7))** Plots sequence coverage taken from the blast report. X axis shows alignment coordinates.
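
To make the flow from report to matrix row concrete, here is a minimal, dependency-free sketch. It is not the module's code: it uses plain lists instead of pandas and numpy, handles forward-strand matches only (no flipping of reverse matches), and the names `read_report`, `mat_dims`, `build_row` and the `BASE_CODES` encoding are all assumptions made for illustration.

```python
import csv
import io

# Assumed numeric encoding; the module's own tr_dict may differ
BASE_CODES = {"A": 1, "C": 2, "G": 3, "T": 4}
COLUMNS = ["qseqid", "pident", "length", "qstart", "qend", "sstart", "send", "evalue"]


def read_report(handle, evalue=0.005):
    """Parse a tab-separated BLAST report, keeping matches with e-value <= evalue."""
    matches = []
    for rec in csv.reader(handle, delimiter="\t"):
        row = dict(zip(COLUMNS, rec))
        if float(row["evalue"]) > evalue:
            continue
        for col in ("qstart", "qend", "sstart", "send"):
            row[col] = int(row[col])
        matches.append(row)
    return matches


def mat_dims(matches):
    """Return (nrows, ncols, offset) from the 1-based reference coordinates."""
    starts = [min(m["sstart"], m["send"]) for m in matches]
    ends = [max(m["sstart"], m["send"]) for m in matches]
    offset = min(starts)
    return len(matches), max(ends) - offset + 1, offset


def build_row(seq, match, ncols, offset):
    """Place the matched part of seq into a zero-filled row of length ncols."""
    row = [0] * ncols
    fragment = seq[match["qstart"] - 1:match["qend"]]
    start = match["sstart"] - offset  # 0-based column of the first matched base
    for i, base in enumerate(fragment):
        row[start + i] = BASE_CODES.get(base.upper(), 0)
    return row
```

Because the search is ungapped, the matched fragment has the same length on the query and on the reference, so each base maps to exactly one column.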

#### MatBuilder
**class MatBuilder(out_dir)**

##### Parameters
* **out_dir** Output directory for the generated alignment file

##### Attributes
* **out_dir**
* **dims** Dimensions of the generated alignment matrix (*cols*, *rows*)
* **acclist** List of accessions included in the alignment (used to locate a given sequence in the matrix)
* **mat_file** File name of the generated alignment matrix. Suffix *.npy*
* **acc_file** File name of the generated accession list. Suffix *.acclist*

##### Methods
* **generate_outnames(seq_file, out_name = None)** Builds the file names for the generated files. If *out_name* is provided, generated file names are *\<out_name>.npy* and *\<out_name>.acclist*
* **build(blast_file, seq_file, out_name = None, evalue = 0.005)** Uses the blast report given in *blast_file* to generate an alignment matrix from the data contained in *seq_file*. Only the matches below the *evalue* threshold are included in the alignment matrix. The method generates the output file names at runtime (names can be specified with the argument *out_name*)
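
The naming scheme can be sketched as follows. `make_outnames` is a hypothetical standalone helper, and the fallback used when *out_name* is missing (the sequence file name without its extension) is an assumption, since only the *out_name* case is described above.

```python
import os


def make_outnames(out_dir, seq_file, out_name=None):
    """Return (mat_file, acc_file) paths for the alignment outputs."""
    if out_name is None:
        # Assumption: fall back to the sequence file name without its extension
        out_name = os.path.splitext(os.path.basename(seq_file))[0]
    prefix = os.path.join(out_dir, out_name)
    return prefix + ".npy", prefix + ".acclist"
```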

### Director
This module handles the blast search and alignment matrix construction

#### Functions
* **make_dirs(base_dir)** Generates the necessary subdirectories in *base_dir* to contain the generated files. Subdirectory names are *data* and *warnings*.
* **check_fasta(fasta_file)** Returns the count of recognized fasta sequences in *fasta_file*. Used to verify format of input sequence files.
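
A minimal sketch of both helpers, assuming that a fasta record is recognized by a header line starting with `>` and that the directory helper returns the created paths (neither detail is confirmed here). The names `make_subdirs` and `count_fasta_records` are hypothetical.

```python
import os


def make_subdirs(base_dir):
    """Create the data and warnings subdirectories and return their paths."""
    data_dir = os.path.join(base_dir, "data")
    warn_dir = os.path.join(base_dir, "warnings")
    os.makedirs(data_dir, exist_ok=True)
    os.makedirs(warn_dir, exist_ok=True)
    return data_dir, warn_dir


def count_fasta_records(fasta_file):
    """Count header lines (starting with '>') in fasta_file."""
    with open(fasta_file) as handle:
        return sum(1 for line in handle if line.startswith(">"))
```

A count of zero would indicate that the input is not a usable fasta file.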

#### Director
**class Director(out_dir, warn_dir)**

This class handles the process of the blast search and matrix construction. Stores paths to generated files and matrix metadata.

##### Parameters
* **out_dir** Directory for the output sequence, accession, taxonomy and taxonomy guide files
* **warn_dir** Directory for the warning files generated along the process

##### Attributes
* **out_dir**
* **warn_dir**
* **warn_handler**
* **log_handler**
* **db_dir** Directory containing the BLAST database files
* **blast_report** Path to the generated BLAST report
* **mat_file** Path to the generated alignment matrix
* **acc_file** Path to the generated accession list
* **dims** Dimensions of the generated alignment matrix
* **workers** The workers are instances of the classes used to construct the sequence map
  * **blaster**
  * **mapper**

##### Methods
* **set_blastdb(db_dir)** Establishes the given *db_dir* as the BLAST database directory. NOTE: the specified directory must contain the database files.
* **build_blastdb(ref_seq, ref_name = None)** Creates a blast database using the given *ref_seq* (NOTE: *ref_seq* must contain a single sequence). If no *ref_name* is given, the output directory is named based on the *ref_seq* file.
* **direct(fasta_file, out_name = None, evalue = 0.005, threads = 1)** Performs the BLAST alignment of the given *fasta_file* and builds the corresponding matrix. NOTE: a BLAST database must be set beforehand. If *out_name* is provided, the generated report is saved in *\<out_name>.BLAST*; otherwise, the report is named after *fasta_file*. After the BLAST search, the report is used to generate the alignment matrix and accession list, which are stored to the files *\<out_name>.npy* and *\<out_name>.acclist* respectively.
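
The expected call order can be sketched as below. `run_mapping` is a hypothetical helper, not part of the module; it receives an already constructed `Director` instance because the import path is not stated in this document, and it only uses the methods and attributes listed above.

```python
def run_mapping(director, ref_seq, fasta_file, out_name=None, evalue=0.005, threads=1):
    """Build the reference database, map fasta_file and return the output paths."""
    # A BLAST database must exist before direct() is called
    director.build_blastdb(ref_seq)
    director.direct(fasta_file, out_name=out_name, evalue=evalue, threads=threads)
    return {
        "blast_report": director.blast_report,
        "mat_file": director.mat_file,
        "acc_file": director.acc_file,
        "dims": director.dims,
    }
```

When a database already exists, `director.set_blastdb(db_dir)` would take the place of the `build_blastdb` call.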