<a href="https://colab.research.google.com/github/ThomasCMcLean/Lazy_AF/blob/main/Lazy_AF_Part_2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### **Lazy_AF Workflow Part 2**

This part of the workflow copies the highest-ranked .json file for each AlphaFold prediction from the designated Google Drive folder into the destination directory.

It then writes a .csv table listing each interaction with its pTM, ipTM and ranking_confidence (0.2 × pTM + 0.8 × ipTM) for downstream analysis of your choosing. The protein names are parsed from the file name, where the two names must be separated by "-", so the workflow currently only works on two-protein interactions.
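
As a rough illustration of the two ideas above (reading the protein pair from the file name and computing the ranking confidence), here is a minimal sketch. The helper names `parse_pair` and `ranking_confidence` are hypothetical and not part of the notebook, and the example file name assumes a ColabFold-style suffix after the job name; your own file names may differ.

```python
import os

def parse_pair(file_name):
    """Return (protein_1, protein_2) from a name like 'A-B_scores_rank_001_....json'.

    Assumption: the job name comes before the first '_scores_rank_' or '_rank_'
    marker, and the two protein names are joined by a single '-'.
    """
    base = os.path.splitext(os.path.basename(file_name))[0]
    # Drop whatever follows the job name (assumed ColabFold-style naming)
    for marker in ('_scores_rank_', '_rank_'):
        if marker in base:
            base = base.split(marker)[0]
            break
    parts = base.split('-')
    if len(parts) != 2:
        raise ValueError(f"expected 'Protein1-Protein2', got {base!r}")
    return parts[0], parts[1]

def ranking_confidence(ptm, iptm):
    """Weighted score reported in the output table: 0.2 * pTM + 0.8 * ipTM."""
    return 0.2 * ptm + 0.8 * iptm

# Hypothetical file name and scores, for illustration only
name = 'ProteinA-ProteinB_scores_rank_001_alphafold2_multimer_v3_model_1_seed_000.json'
print(parse_pair(name))
print(round(ranking_confidence(0.5, 0.75), 3))
```

Unlike the notebook cell further down, this sketch raises an error when a name does not split into exactly two parts, which makes protein names that themselves contain "-" easy to spot.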

**For details, refer to our manuscript:** *in prep*

**Lazy_AF Part 1** can be found here: https://colab.research.google.com/drive/1a5d7xraEK4Iv3Ecmmjb1opnU5jwXRW_a#scrollTo=M_CoJNzed49A

For more details, check out the [Lazy_AF GitHub](https://github.com/ThomasCMcLean/Lazy_AF).

In [14]:
#@title Mount google drive
from google.colab import drive
drive.mount('/content/drive')
from sys import version_info
python_version = f"{version_info.major}.{version_info.minor}"

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [15]:
#@title Input directories and copy top ranked JSON files to this location
from google.colab import drive
drive.mount('/content/drive')

import shutil
import glob
import os

# Source directory containing the JSON files
source_directory = '/content/drive/MyDrive/RK2_results' #@param {type:"string"}

# Destination directory for .json files and analysis
destination_directory = '/content/drive/MyDrive/analysis' #@param {type:"string"}

# Create the destination directory if it doesn't exist
os.makedirs(destination_directory, exist_ok=True)

# Define the pattern to match files
pattern = '*_rank_001*.json'

# Find files matching the pattern in the source directory
matching_files = glob.glob(os.path.join(source_directory, pattern))

# Copy matching files to the destination directory
for file_path in matching_files:
    shutil.copy(file_path, destination_directory)



Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
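
The copy step above runs silently, so an empty or mistyped source folder goes unnoticed. A minimal sketch of the same logic wrapped in a function that returns the copied paths and warns when nothing matched is shown below. The function name `copy_top_ranked` and the placeholder file names are hypothetical, and temporary folders stand in for the Google Drive paths so the example runs anywhere.

```python
import glob
import os
import shutil
import tempfile

def copy_top_ranked(source_dir, dest_dir, pattern='*_rank_001*.json'):
    """Copy files matching `pattern` from source_dir to dest_dir; return the new paths."""
    os.makedirs(dest_dir, exist_ok=True)
    matches = sorted(glob.glob(os.path.join(source_dir, pattern)))
    if not matches:
        print(f"No files matching {pattern!r} in {source_dir}")
    # shutil.copy returns the path of the newly created file
    return [shutil.copy(path, dest_dir) for path in matches]

# Demonstration on temporary folders with empty placeholder files
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, 'results')
    dst = os.path.join(tmp, 'analysis')
    os.makedirs(src)
    for name in ('A-B_scores_rank_001_model_1.json', 'A-B_scores_rank_002_model_2.json'):
        open(os.path.join(src, name), 'w').close()
    copied = copy_top_ranked(src, dst)
    copied_names = [os.path.basename(p) for p in copied]
    print(copied_names)
```

Only the rank_001 placeholder is copied; the rank_002 file stays in the source folder.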


In [28]:
#@title Name and produce your output .csv file

# final file name
csv_name = 'Output' #@param {type:"string"}

import json
import os
import pandas as pd

# Function to extract pTM and ipTM values from a JSON file
def extract_ptm_iptm(json_file):
    with open(json_file, 'r') as file:
        data = json.load(file)

    ptm = data['ptm']
    iptm = data['iptm']

    return {'ptm': ptm, 'iptm': iptm}

# Directory where the copied JSON files are located (set in the previous cell)
json_folder = destination_directory
csv_file_path = os.path.join(json_folder, csv_name + '.csv')

# List all JSON files in the folder
json_files = [os.path.join(json_folder, file) for file in os.listdir(json_folder) if file.endswith('.json')]

# Collect one row per JSON file; the data frame is built once at the end
rows = []

# Iterate through the list of JSON files
for json_file_path in json_files:
    # Extract the base name without extension
    file_name = os.path.splitext(os.path.basename(json_file_path))[0]

    # Keep everything before the first asterisk (the whole name if there is none)
    file_name_short = file_name.split('*')[0]

    # Split the name on "-" to get the two protein names
    protein_names = file_name_short.split('-')

    # Skip files whose names do not contain at least two "-"-separated parts
    if len(protein_names) < 2:
        continue

    # Extract the pTM and ipTM values from the JSON file
    ptm_iptm_values = extract_ptm_iptm(json_file_path)
    ptm_value = ptm_iptm_values['ptm']
    iptm_value = ptm_iptm_values['iptm']

    # Store the row for the current JSON file
    rows.append({
        'Protein_1': protein_names[0],
        'Protein_2': protein_names[1],
        'pTM': ptm_value,
        'ipTM': iptm_value,
        'Ranking_confidence': 0.2 * ptm_value + 0.8 * iptm_value
    })

# Build the result data frame from the collected rows
result_data = pd.DataFrame(rows, columns=['Protein_1', 'Protein_2', 'pTM', 'ipTM', 'Ranking_confidence'])

# Save the data to a CSV file
result_data.to_csv(csv_file_path, index=False)
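
One possible downstream step is to rank the interactions in the table by Ranking_confidence. The sketch below does this with the standard-library `csv` module on a small in-memory table. The rows, the helper name `rank_interactions` and the `min_iptm` value are made up for illustration; the threshold in particular is not a recommended cut-off.

```python
import csv
import io

# Hypothetical rows in the same column layout as the output table
example_csv = """Protein_1,Protein_2,pTM,ipTM,Ranking_confidence
ProteinA,ProteinB,0.60,0.30,0.36
ProteinA,ProteinC,0.70,0.80,0.78
ProteinB,ProteinC,0.50,0.55,0.54
"""

def rank_interactions(handle, min_iptm=0.0):
    """Return rows with ipTM >= min_iptm, sorted by Ranking_confidence (highest first)."""
    rows = [row for row in csv.DictReader(handle) if float(row['ipTM']) >= min_iptm]
    return sorted(rows, key=lambda row: float(row['Ranking_confidence']), reverse=True)

ranked = rank_interactions(io.StringIO(example_csv), min_iptm=0.5)
for row in ranked:
    print(row['Protein_1'], row['Protein_2'], row['Ranking_confidence'])
```

To run it on the real table, pass an open file handle for the .csv produced above instead of the `io.StringIO` example.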