# LIT-AF_Cluster

> **_NOTE:_** This method has not been fully tested, so proceed with caution. Please report any issues with the code to the developers.

This notebook is an implementation of AF_Cluster for the LIT-AlphaFold pipeline.

The MSA is clustered, and a new MonomericObject is generated for each cluster. The newly generated objects can then be used as input for LIT-AlphaFold calculations.

This notebook was developed for local use. Before running it, generate a *.pkl* MonomericObject file with the script *create_individual_features.py*.

## Input

In [1]:
pkl_folder = '' #Path to the folder containing the pregenerated pkl file for the target protein
pkl_file = '' #Pickle file of the target protein
output_dir = '.' #Output directory where the generated MonomericObjects are saved

projection_method = 'PCA' #Choose projection method between PCA and TSNE
show_msa = True #Show the MSA of the input monomer and the clustered MSAs
show_proj = True #Show the projection of the MSA for the input monomer

In [2]:
from litaf.objects import MonomericObjectMmseqs2, MonomericObject, load_monomer_objects
from colabfold.plot import plot_msa_v2
import matplotlib.pyplot as plt
import os
import copy
import numpy as np
import pickle
%matplotlib inline

monomer = load_monomer_objects({pkl_file.split('.')[0]: pkl_folder}, pkl_file.split('.')[0])
print(f"Monomer unit {monomer.description} created")
if show_msa:
    plot_msa_v2(monomer.feature_dict)
    plt.show()
    plt.close()
if show_proj:
    monomer.plot_msa_proj(method = projection_method)
    plt.show()
    plt.close()


FileNotFoundError: [Errno 2] No such file or directory: 'CXCL12.pkl'

## Cluster generation
AF_Cluster uses DBSCAN to cluster the MSA. To perform the clustering, set eps (the maximum distance between two samples for one to be considered in the neighborhood of the other) and min_samples (the number of samples in a neighborhood, including the point itself, for a point to be considered a core point).
For more information about DBSCAN, refer to the scikit-learn documentation (https://scikit-learn.org/stable/modules/generated/sklearn.cluster.DBSCAN.html).
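
To build some intuition for the two parameters, the sketch below counts neighbours on a toy alignment and reports which sequences would qualify as core points. It is only an illustration, not the AF_Cluster or LIT-AlphaFold implementation: the helper names `hamming` and `core_points`, the toy sequences, and the choice of Hamming distance are all assumptions made for this example.

```python
# Toy illustration of eps and min_samples; not the AF_Cluster code.
# Assumption: distance = Hamming distance between equal-length aligned sequences.

def hamming(a, b):
    """Number of aligned positions at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def core_points(seqs, eps, min_samples):
    """Return indices of sequences with at least min_samples neighbours within eps.

    As in scikit-learn's DBSCAN, a point counts as its own neighbour.
    """
    cores = []
    for i, s in enumerate(seqs):
        n_neighbours = sum(hamming(s, t) <= eps for t in seqs)
        if n_neighbours >= min_samples:
            cores.append(i)
    return cores


# Hypothetical toy alignment: three similar sequences and one outlier.
toy_msa = [
    "ACDEFG",
    "ACDEFA",
    "ACDEAG",
    "WWWWWW",
]

print(core_points(toy_msa, eps=1, min_samples=2))  # the three similar sequences
print(core_points(toy_msa, eps=1, min_samples=3))  # stricter: only the first sequence
```

Raising min_samples or lowering eps leaves fewer core points, so more sequences end up unclustered; on a real MSA it may take a few trials to find values that give a useful number of clusters.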

In [None]:
eps = 5
min_samples = 50

In [None]:
new_monomers = {}
new_feature_dicts = monomer.cluster_msa(eps, min_samples)
for i, feat_dict in new_feature_dicts.items():
    new_monomers[f'{monomer.description}_{i}'] = copy.deepcopy(monomer)
    new_monomers[f'{monomer.description}_{i}'].feature_dict = feat_dict
    new_monomers[f'{monomer.description}_{i}'].description = f'{monomer.description}_cluster_{i}'
    print(f"Monomer unit {monomer.description}_cluster_{i} created")
    if show_msa:
        plot_features = new_monomers[f'{monomer.description}_{i}'].feature_dict.copy()
        plot_features['msa'] = np.concatenate([[monomer.feature_dict['msa'][0]],
                                    plot_features['msa']])
        plot_msa_v2(plot_features)
        plt.show()
        plt.close()
monomer.plot_msa_proj(method = projection_method)

## Save results

In [None]:
for nmonomer in new_monomers.values():
    output_file = os.path.join(output_dir, nmonomer.description)
    # Use a context manager so the file is closed even if pickling fails
    with open(f"{output_file}.pkl", "wb") as handle:
        pickle.dump(nmonomer, handle)
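
As a quick sanity check, a saved file can be read back with `pickle.load` and compared with the object that was written. The sketch below shows that round trip with a hypothetical stand-in object (a `SimpleNamespace` with `description` and `feature_dict` attributes) rather than a real MonomericObject; the helper name `save_and_reload` and the toy data are assumptions for illustration. Loading a real MonomericObject pickle additionally requires `litaf` to be importable in the session.

```python
import os
import pickle
import tempfile
from types import SimpleNamespace


def save_and_reload(obj, output_dir):
    """Pickle obj to <output_dir>/<description>.pkl and load it back."""
    output_file = os.path.join(output_dir, f"{obj.description}.pkl")
    with open(output_file, "wb") as handle:
        pickle.dump(obj, handle)
    with open(output_file, "rb") as handle:
        return pickle.load(handle)


# Hypothetical stand-in for a clustered monomer (not the litaf class).
original = SimpleNamespace(
    description="toy_cluster_0",
    feature_dict={"msa": [[0, 1, 2]]},
)

with tempfile.TemporaryDirectory() as tmp_dir:
    reloaded = save_and_reload(original, tmp_dir)

print(reloaded.description == original.description)
print(reloaded.feature_dict == original.feature_dict)
```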