# Toolkit
In this example, we show a few toolkit utilities that you might find useful. We will cover:
- How to get the embeddings
- How to align multiple sequences
- How to extract the clusters

## Get the embeddings

In [None]:
import json

from qmap.toolkit.aligner import Encoder

# Run the encoder on the CPU
encoder = Encoder(force_cpu=True)

with open('../../../data/build/dataset.json', 'r') as f:
    dataset = json.load(f)

# The aligner supports sequences up to 100 amino acids long,
# so keep only the sequences shorter than 100 residues
dataset = [sample for sample in dataset if len(sample["Sequence"]) < 100]

# Encode every remaining sequence
sequences = [sample['Sequence'] for sample in dataset]
embeddings = encoder.encode(sequences)
embeddings
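
The length filter above can also be written as a small reusable helper that reports how many records were dropped, which is handy when checking how much of a dataset the 100-residue limit removes. This is a minimal sketch: `filter_by_length`, its `max_len` default and the toy records are hypothetical, and the strict `<` comparison simply mirrors the cell above rather than a documented limit of the aligner.

```python
def filter_by_length(records, max_len=100, key="Sequence"):
    """Split records into (kept, dropped) by sequence length.

    Hypothetical helper: keeps records whose sequence is strictly shorter
    than max_len, matching the `< 100` filter used in the cell above.
    """
    kept, dropped = [], []
    for record in records:
        (kept if len(record[key]) < max_len else dropped).append(record)
    return kept, dropped


# Toy records standing in for dataset.json (illustrative data only)
toy = [{"Sequence": "GIGKFLHSAK"}, {"Sequence": "K" * 100}, {"Sequence": "FLPLIAGLLSKLF"}]
kept, dropped = filter_by_length(toy)
print(f"kept {len(kept)}, dropped {len(dropped)}")  # kept 2, dropped 1
```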

## Align multiple sequences
For this example, we compute the pairwise alignments between all sequences in the dataset by passing `embeddings` as both inputs. You can instead pass any other sequence database, encoded with `encoder.encode`, as the second input: `align_db` aligns every sequence of the first input against every sequence of the second.

In [None]:
from qmap.toolkit.aligner import align_db

alignment = align_db(embeddings, embeddings)
alignment
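
To make the all-against-all behavior concrete, the sketch below scores every sequence of a query list against every sequence of a database list and returns a `len(queries) x len(db)` matrix. It uses `difflib.SequenceMatcher.ratio()` from the Python standard library as a stand-in similarity score; it is **not** the embedding-based alignment that `align_db` performs, and `all_vs_all` is a hypothetical name. The structure of `align_db`'s actual return value is not assumed here.

```python
from difflib import SequenceMatcher


def all_vs_all(queries, db):
    """Return a matrix where entry [i][j] scores queries[i] against db[j].

    Stand-in scorer: difflib's ratio (0.0 to 1.0), not qmap's aligner.
    """
    return [[SequenceMatcher(None, q, d).ratio() for d in db] for q in queries]


seqs = ["GIGKFLHSAK", "GIGKFLHSAKKFGKAF", "FLPLIAGLLSKLF"]
scores = all_vs_all(seqs, seqs)
for row in scores:
    print(" ".join(f"{s:.2f}" for s in row))
```

When both arguments are the same list, as in the `align_db(embeddings, embeddings)` call, each diagonal entry compares a sequence with itself, so with this stand-in scorer it equals 1.0.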

## Extract the clusters
To extract the clusters, first build a graph from a `VectorizedDB` (such as the `embeddings` computed above), then find the communities within this graph.
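
One way to picture the graph-building step: treat each sequence as a node and connect two nodes when their pairwise score reaches a threshold. The sketch below does this for a toy score matrix and writes the edges to a temporary file in the whitespace-separated `source target` format that igraph's `Graph.Read_Edgelist` reads. The rule `score >= threshold` and the helper name `write_edgelist` are assumptions for illustration; this section does not document how `build_graph` applies its `threshold` internally.

```python
import os
import tempfile


def write_edgelist(scores, threshold):
    """Write an undirected edge list (i < j, score >= threshold) to a temp file.

    Hypothetical helper; returns the file path. The caller removes the file.
    """
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w") as f:
        n = len(scores)
        for i in range(n):
            for j in range(i + 1, n):  # i < j: write each undirected edge once
                if scores[i][j] >= threshold:
                    f.write(f"{i} {j}\n")
    return path


toy_scores = [
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.2],
    [0.1, 0.2, 1.0],
]
path = write_edgelist(toy_scores, threshold=0.6)
try:
    with open(path) as f:
        edges = [tuple(int(tok) for tok in line.split()) for line in f]
finally:
    os.remove(path)  # clean up the temporary file even if reading fails
print(edges)  # [(0, 1)]
```

Note that a plain edge list carries no record of isolated nodes: node 2 has no edge above the threshold, so it never appears in the file.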

In [None]:
import os

import igraph as ig
from qmap.toolkit.clustering import build_graph, leiden_community_detection

# Step 1: Build the graph.
# build_graph writes the graph's edge list to a file in a temporary
# directory (managed by the OS) and returns its path.
graph_path, node2id = build_graph(embeddings, threshold=0.6)

try:
    # Step 2: Load the undirected graph from the edge-list file.
    graph = ig.Graph.Read_Edgelist(graph_path, directed=False)

    # Step 3: Find the communities using the Leiden algorithm.
    # (In igraph's Leiden implementation, a negative n_iterations means
    # iterating until the partition is stable.)
    clusters = leiden_community_detection(graph, n_iterations=-1)
finally:
    # Remove the graph file after use, even if a step above fails.
    os.remove(graph_path)

clusters  # A DataFrame with the node_id (sequence index) and the community (cluster index)
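
For intuition about the output format (one cluster index per node), the sketch below labels each node using connected components computed with union-find over an edge list. This is a deliberately simpler stand-in, **not** the Leiden algorithm: Leiden optimizes a quality function and can split one connected component into several communities, whereas connected components never do. The helper name `connected_components` and the toy edges are illustrative.

```python
def connected_components(n_nodes, edges):
    """Return a list where entry i is the cluster index of node i.

    Union-find with path halving; cluster indices are numbered
    in order of first appearance (0, 1, 2, ...).
    """
    parent = list(range(n_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[rv] = ru  # merge the two components

    labels, cluster_of_root = [], {}
    for node in range(n_nodes):
        root = find(node)
        if root not in cluster_of_root:
            cluster_of_root[root] = len(cluster_of_root)
        labels.append(cluster_of_root[root])
    return labels


# Nodes 0-1-2 are linked, 3-4 are linked, 5 is isolated.
labels = connected_components(6, [(0, 1), (1, 2), (3, 4)])
for node_id, community in enumerate(labels):
    print(node_id, community)
```

Passing `n_nodes` explicitly keeps isolated nodes, such as node 5, as singleton clusters instead of silently dropping them.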