# Toolkit
In this example, we show a few utilities that you might find useful. We will cover:
- How to compute the global pairwise alignment matrix
- How to build the graph and extract clusters
- How to compute the maximum identity between two sets of sequences
- How to find and clear the cache

## Compute the global pairwise alignment matrix
`compute_global_identity` is accelerated by our pwiden Rust engine and returns a NumPy array.

```python
from qmap.toolkit.aligner import compute_global_identity

# The dataset of sequences
sequences: list[str] = [...]
pairwise_identity_matrix = compute_global_identity(sequences)
```
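To make the returned matrix concrete, here is a pure-Python sketch of what a global pairwise identity matrix can contain. It uses a textbook Needleman-Wunsch global alignment with assumed scores (match +1, mismatch -1, gap -1) and assumes identity = matches / alignment length; pwiden's actual scoring scheme and identity definition may differ. `global_identity` and `identity_matrix` are hypothetical helpers, and the quadratic per-pair cost makes this far slower than the Rust engine.

```python
import numpy as np


def global_identity(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> float:
    """Identity of an optimal global (Needleman-Wunsch) alignment of a and b.

    Assumption: identity = matching columns / alignment length, with a simple
    linear gap penalty. This is an illustration, not pwiden's definition.
    """
    n, m = len(a), len(b)
    # score[i, j] = best score for aligning a[:i] with b[:j]
    score = np.zeros((n + 1, m + 1), dtype=np.int64)
    score[:, 0] = np.arange(n + 1) * gap
    score[0, :] = np.arange(m + 1) * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1, j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i, j] = max(diag, score[i - 1, j] + gap, score[i, j - 1] + gap)

    # Trace back from the bottom-right cell, counting matches and alignment columns
    i, j, matches, length = n, m, 0, 0
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i, j] == score[i - 1, j - 1] + sub:
            matches += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and score[i, j] == score[i - 1, j] + gap:
            i -= 1  # gap in b
        else:
            j -= 1  # gap in a
        length += 1
    return matches / length if length else 1.0


def identity_matrix(sequences: list[str]) -> np.ndarray:
    """Symmetric (n, n) matrix of global identities; the diagonal is 1.0."""
    n = len(sequences)
    mat = np.eye(n)
    for i in range(n):
        for j in range(i + 1, n):
            mat[i, j] = mat[j, i] = global_identity(sequences[i], sequences[j])
    return mat
```

For example, `identity_matrix(["ACGT", "ACGT", "ACGA"])` gives 1.0 for the identical pair and 0.75 for `ACGT` versus `ACGA` (three matches over four columns).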

## Build the graph and extract clusters
The `build_graph` utility builds a graph of the sequences, keeping only the edges (pairs of sequences) whose identity passes the given threshold. It is also accelerated by the pwiden engine. Once the edges are computed, they are cached by default, so they are not computed again.

```python
from qmap.toolkit.clustering import build_graph

graph = build_graph(sequences, threshold=0.6)
```
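Conceptually, a threshold graph connects two sequences when their identity reaches the threshold. The sketch below derives such an edge list from a precomputed identity matrix with NumPy. The `>=` comparison, the `(i, j)` edge format and the helper name `threshold_edges` are assumptions for illustration; `build_graph` may compare differently and does not necessarily build a full matrix.

```python
import numpy as np


def threshold_edges(identity: np.ndarray, threshold: float) -> list[tuple[int, int]]:
    """Return undirected edges (i, j) with i < j whose identity is >= threshold.

    Hypothetical helper: whether build_graph uses >= or > is an assumption.
    """
    # Keep only the strict upper triangle: each pair appears once, no self-loops
    mask = np.triu(identity >= threshold, k=1)
    return [(int(i), int(j)) for i, j in np.argwhere(mask)]
```

With a 3×3 matrix whose off-diagonal identities are 0.7 (0–1), 0.6 (1–2) and 0.2 (0–2), a threshold of 0.6 keeps the edges `(0, 1)` and `(1, 2)`.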

Then, we can find communities with the Leiden algorithm. The clusters are returned as a dataframe with two columns, `cluster_id` and `sequence_id`, where each sequence id is the index of the sequence in the original `sequences` list.

```python
from qmap.toolkit.clustering import leiden_community_detection

clusters = leiden_community_detection(graph)
```
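Because each `sequence_id` is an index into the original list, mapping clusters back to their sequences is a simple grouping step. This sketch works on plain `(cluster_id, sequence_id)` pairs so it does not depend on a particular dataframe library; `sequences_by_cluster` is a hypothetical helper.

```python
from __future__ import annotations

from collections import defaultdict
from collections.abc import Iterable, Sequence


def sequences_by_cluster(
    rows: Iterable[tuple[int, int]], sequences: Sequence[str]
) -> dict[int, list[str]]:
    """Map each cluster_id to the list of its member sequences."""
    groups: dict[int, list[str]] = defaultdict(list)
    for cluster_id, sequence_id in rows:
        # sequence_id indexes into the original sequences list
        groups[cluster_id].append(sequences[sequence_id])
    return dict(groups)
```

Assuming `clusters` supports column access like a pandas DataFrame, it could be called as `sequences_by_cluster(zip(clusters["cluster_id"], clusters["sequence_id"]), sequences)`.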

## Compute the maximum identity between two sets of sequences
We can also compute the maximum identity between two sets of sequences. This is also accelerated by the pwiden engine and supports caching, which means that for the same sets of sequences, the pairwise identities are computed only once. The returned value is a NumPy array of shape `(len(sequences_test),)`, where each value is the maximum identity between the corresponding sequence in `sequences_test` and all sequences in `sequences_train`.

```python
from qmap.toolkit import compute_maximum_identity

sequences_train: list[str] = [...]
sequences_test: list[str] = [...]
max_identities = compute_maximum_identity(sequences_train, sequences_test)
```
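The behaviour described above amounts to a row-wise maximum over a `(len(test), len(train))` identity matrix, plus a cache keyed on the two sequence sets. The sketch below reproduces that with an in-memory dictionary and a caller-supplied pairwise identity function. The factory name `make_cached_max_identity` and the in-memory design are hypothetical; the toolkit persists its cache on disk instead.

```python
from collections.abc import Callable

import numpy as np

# Any function returning the identity between two sequences (assumed signature)
PairIdentity = Callable[[str, str], float]


def make_cached_max_identity(pair_identity: PairIdentity):
    """Return a max-identity function that memoizes results per (train, test) pair of lists."""
    cache: dict[tuple[tuple[str, ...], tuple[str, ...]], np.ndarray] = {}

    def max_identity(train: list[str], test: list[str]) -> np.ndarray:
        key = (tuple(train), tuple(test))
        if key not in cache:
            # identity[i, j] = identity between test[i] and train[j]
            identity = np.array(
                [[pair_identity(t, r) for r in train] for t in test], dtype=float
            ).reshape(len(test), len(train))
            # Best match in the training set for each test sequence: shape (len(test),).
            # Raises ValueError if train is empty and test is not.
            cache[key] = identity.max(axis=1)
        return cache[key]

    return max_identity
```

Calling the returned function twice with the same two lists evaluates `pair_identity` only on the first call.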

## Find and clear the cache
The toolkit caches results to minimize latency. The cache is stored in your system's default cache directory, which you can find with the `get_cache_dir` function. To clear the cache, manually delete the files in this directory, e.g. `rm <cache_dir>/*`.

```python
from qmap.toolkit import get_cache_dir

print(get_cache_dir())
```
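If you prefer to clear the cache from Python rather than the shell, a small helper built on `pathlib` and `shutil` can empty the directory while keeping the directory itself. `clear_cache` is a hypothetical helper, not part of qmap. Unlike `rm <cache_dir>/*`, it also removes hidden files and subdirectories, so double-check the path before running it.

```python
from __future__ import annotations

import shutil
from pathlib import Path


def clear_cache(cache_dir: str | Path) -> int:
    """Delete every entry inside cache_dir (but not cache_dir itself).

    Returns the number of top-level entries removed.
    """
    removed = 0
    for entry in Path(cache_dir).iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)  # remove a subdirectory and its contents
        else:
            entry.unlink()  # remove a file or symlink
        removed += 1
    return removed
```

Assuming `get_cache_dir()` returns a path or path string, usage would be `clear_cache(get_cache_dir())`.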