## **Localized Sequence Logo Generator**

**What is this tool?**
This online tool, hosted on Google Colab, generates sequence logos for the residues that lie within a given distance of a target residue or ligand, across a dataset of PDB files. For protein-protein interfaces the distance is measured between alpha carbon positions; for ligands it is measured using all non-hydrogen atoms. The tool also includes a 3D visualizer that plots the entire target structure and highlights the residues that satisfy the distance constraint. The interacting residues can be colored with either the [Amino colour](http://acces.ens-lyon.fr/biotic/rastop/help/colour.htm#aminocolours) or the [Shapely colour](http://acces.ens-lyon.fr/biotic/rastop/help/colour.htm#shapelycolours) scheme.
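The distance criterion described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the notebook's own implementation: the helper name `residues_within_cutoff`, the dictionary layout, and the cutoff value are hypothetical. For a protein target, each residue would contribute only its alpha carbon coordinate; for a ligand target, `target_points` would hold the coordinates of the ligand's non-hydrogen atoms (hydrogens filtered out beforehand).

```python
import math


def residues_within_cutoff(target_points, binder_residues, cutoff):
    """Return IDs of binder residues lying within `cutoff` of any target point.

    Hypothetical data layout (not taken from the notebook):
      target_points   -- list of (x, y, z) tuples
      binder_residues -- dict mapping residue ID -> non-empty list of (x, y, z) tuples
    `cutoff` is in the same units as the coordinates (angstroms for PDB files).
    """
    selected = []
    for res_id, points in binder_residues.items():
        # Closest approach between any point of this residue and any target point
        closest = min(math.dist(p, t) for p in points for t in target_points)
        if closest <= cutoff:
            selected.append(res_id)
    return selected


# Toy example: two ligand heavy atoms and two binder residues (alpha carbon only)
ligand_heavy_atoms = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
binder = {10: [(4.0, 0.0, 0.0)], 11: [(12.0, 0.0, 0.0)]}
print(residues_within_cutoff(ligand_heavy_atoms, binder, 5.0))  # [10]
```

Using the minimum pairwise distance means a residue is selected as soon as any of its points comes close enough to any target point.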

**Required Inputs:**
- PDB Files: A dataset of PDB files. Two example datasets are provided on GitHub: one for protein-protein interactions and one for ligand-protein interactions.
- Target Chain from PDB: The chain identifier of the target. The program loads the target structure from the first file in the directory.
- Interacting Chain from PDB: The chain identifier of the interacting partner. This identifier must be the same across all PDB files.
- Specify Target Type: Indicate whether the target is a ligand or a protein.
- Target Residue Index (for proteins) or Unique Atom Names (for ligands): Provide comma-separated residue indices for a protein target, or unique atom names for a ligand target. Alternatively, enter 'all' to consider all residues/atoms.
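Because the interacting chain identifier must match across every file, it can help to check a dataset before running the notebook. The sketch below uses hypothetical helpers (`chains_in_pdb_lines`, `missing_chains`) that are not part of `helper_functions` or `sequence_logo_main`; it relies only on the fixed-column PDB format, in which the chain identifier occupies column 22 of ATOM and HETATM records.

```python
def chains_in_pdb_lines(lines):
    """Collect chain identifiers from ATOM/HETATM records (PDB column 22, index 21)."""
    chains = set()
    for line in lines:
        if line.startswith(("ATOM", "HETATM")) and len(line) > 21:
            chains.add(line[21])
    return chains


def missing_chains(pdb_paths, required_chains):
    """Return {path: sorted missing chain IDs} for files lacking any required chain."""
    missing = {}
    for path in pdb_paths:
        with open(path) as handle:
            present = chains_in_pdb_lines(handle)
        absent = sorted(set(required_chains) - present)
        if absent:
            missing[path] = absent
    return missing
```

After step 3, `missing_chains(pdb_files, [binding_chain])` would return an empty dictionary if every file contains the interacting chain; add `target_chain` to the list if each file is also expected to contain the target.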

**Usage:**
Please execute the cells in numerical order. To load a different dataset, rerun the cells starting from the upload step (step 2).


In [2]:
#@title ##1. Install and import required packages
%%capture

from google.colab import drive
drive.mount('/content/drive')
!git clone https://github.com/DanielP520/sequence_logo_project.git
%cd sequence_logo_project
!pip install logomaker
!pip install rarfile
import pandas as pd
import helper_functions
import sequence_logo_main
import os
import glob
import zipfile
import tarfile
import rarfile
%matplotlib notebook
def configure_plotly_browser_state():
  from IPython.display import HTML, display
  display(HTML('''
        <script src="/static/components/requirejs/require.js"></script>
        <script>
          requirejs.config({
            paths: {
              base: '/static/base',
              plotly: 'https://cdn.plot.ly/plotly-latest.min.js?noext',
            },
          });
        </script>
        '''))
os.makedirs('temp', exist_ok=True)  # does not fail if the cell is rerun




In [14]:
#@title ##2. Upload data, and process
#@markdown Open the file browser (top left of the screen) and upload the PDB dataset into the **temp** directory as a zip file. PDB files can also be uploaded directly into the temp folder, but this may take a long time for large datasets.
#@markdown There are two example datasets to test the tool with: a ligand-protein example and a protein-protein example.
#@markdown Select which dataset you would like to use:
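import glob
import os
import tarfile
import zipfile

import rarfile

def extract_file(archive_path, output_folder):
    # Unpack a zip, tar (optionally compressed) or rar archive into output_folder
    if zipfile.is_zipfile(archive_path):
        with zipfile.ZipFile(archive_path, 'r') as archive:
            archive.extractall(output_folder)
    elif tarfile.is_tarfile(archive_path):
        with tarfile.open(archive_path) as archive:
            archive.extractall(output_folder)
    elif rarfile.is_rarfile(archive_path):
        with rarfile.RarFile(archive_path) as archive:
            archive.extractall(output_folder)
    else:
        print(f"{archive_path} is not a supported archive")

uploaded_data_set = False #@param {type:"boolean"}
protein_protein_example_data = False  #@param {type:"boolean"}
protein_ligand_example_data = True  #@param {type:"boolean"}

if uploaded_data_set:
    archives = []
    for pattern in ("*.zip", "*.tar", "*.tar.gz", "*.tgz", "*.rar"):
        archives += glob.glob(os.path.join("temp", pattern))
    for archive in archives:
        extract_file(archive, "temp")
    # PDB files extracted into temp/ (including sub-folders) or uploaded directly;
    # sorted so that the "first file" used for the target structure is predictable
    pdb_files = sorted(glob.glob("temp/**/*.pdb", recursive=True))
elif protein_ligand_example_data:
    pdb_files = sorted(glob.glob("Ligand_Example/*.pdb"))
elif protein_protein_example_data:
    pdb_files = sorted(glob.glob("Protein_Example/*.pdb"))
else:
    raise ValueError("Select one of the dataset options above.")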


In [15]:
#@title ##3. Select target and binding chains.

#@markdown Select the target chain and the binding chain, indicate whether the target is a ligand or a protein, and list the atom names (ligand) or residue indices (protein) to visualize, separated by commas.

#@markdown You can also type 'all' to plot all residues or atoms.

#@markdown **Example usage:**
#@markdown - For the protein example, use B for the target chain and A for the binding chain.
#@markdown - For the ligand example, use Z for the target chain and B for the binding chain, and tick is_ligand.
target_chain = "Z"#@param {type:"string"}
binding_chain = "B"#@param {type:"string"}
to_plot = "all"#@param {type:"string"}
is_ligand = True #@param {type:"boolean"}

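# 'all' (any capitalization) selects every residue/atom; otherwise parse the comma-separated list
if to_plot.strip().lower() == "all":
    plot_list = "all"
elif is_ligand:
    # Ligand targets are given as atom names, e.g. "C7,C8"
    plot_list = [x.strip() for x in to_plot.split(",") if x.strip()]
else:
    # Protein targets are given as residue indices, e.g. "12,15"
    plot_list = [int(x) for x in to_plot.split(",") if x.strip()]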


In [16]:
# configure_plotly_browser_state()
df_target, df_binder = sequence_logo_main.plot(pdb_files, target_chain, binding_chain, is_ligand, plot_list)


In [None]:
# sequence_logo_input = input("Enter residues index to generate sequence logos: separated by comas: ")
# sequence_logo_residues =[int(x) for x in sequence_logo_input.split(",")]
# print(sequence_logo_residues)

In [None]:
%matplotlib inline
sequence_logo_main.sequence_logos(df_target, df_binder, ["C7", "C8"], True)

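A sequence logo is drawn from per-position residue frequencies. As a rough sketch of that step, assuming the residues at the selected positions have already been collected into equal-length strings (one per PDB file), the hypothetical helper `position_frequencies` below computes the frequency table. How `sequence_logo_main.sequence_logos` builds its own table is not shown in this notebook, so treat this only as an illustration.

```python
from collections import Counter


def position_frequencies(sequences):
    """Per-position residue frequencies for equal-length sequences.

    Returns one dict per position, mapping residue letter -> fraction of sequences.
    """
    if not sequences:
        return []
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all sequences must have the same length")
    matrix = []
    for i in range(length):
        # Count which residue each sequence has at position i
        counts = Counter(seq[i] for seq in sequences)
        total = sum(counts.values())
        matrix.append({aa: n / total for aa, n in sorted(counts.items())})
    return matrix


print(position_frequencies(["ACD", "ACE", "GCD", "ACD"]))
# [{'A': 0.75, 'G': 0.25}, {'C': 1.0}, {'D': 0.75, 'E': 0.25}]
```

To draw the logo with logomaker (installed in step 1), the result can be converted with `pd.DataFrame(matrix).fillna(0)` (rows are positions, columns are residues) and passed to `logomaker.Logo`; logomaker's `alignment_to_matrix` function can also do the counting directly from a list of aligned sequences.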