<a href="https://colab.research.google.com/github/Mr-Dino-DNA/Collabs/blob/main/SwissTarget_and_STRING_db_Integration_Analysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Upload the SwissTarget CSV File

Upload the CSV file exported from SwissTargetPrediction. The file lists potential drug targets and their probabilities, which the later steps use to query STRING-db and annotate the resulting network.



In [None]:
# @title
import ipywidgets as widgets
from IPython.display import display

uploader = widgets.FileUpload(
    accept='.csv',  # Only accept CSV files
    multiple=False,  # Allow only one file to be uploaded
    description='Upload SwissTarget CSV'
)

# Display the uploader widget
display(uploader)

FileUpload(value={}, accept='.csv', description='Upload SwissTarget CSV')

# Process the Uploaded File

After uploading the CSV file, click the 'Process File' button below to load it. The first rows and the column names are displayed so you can confirm that the file was read correctly before moving on.
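
As a rough sketch of what this step does with the upload: the widget hands back the raw file content, which is then parsed by pandas. The helper name `load_uploaded_csv` is hypothetical, and accepting both a dict (ipywidgets 7) and a tuple (ipywidgets 8) for `uploader.value` is an assumption about which widget version is installed.

```python
from io import BytesIO

import pandas as pd


def load_uploaded_csv(upload_value):
    """Return a DataFrame from the value of a FileUpload widget.

    Hypothetical helper: accepts the dict form (ipywidgets 7) or the
    tuple/list form (ipywidgets 8) of `uploader.value`.
    """
    if not upload_value:
        raise ValueError("No file uploaded.")
    if isinstance(upload_value, dict):
        entry = next(iter(upload_value.values()))
    else:
        entry = upload_value[0]
    # `content` is bytes in ipywidgets 7 and a memoryview in ipywidgets 8.
    raw = bytes(entry["content"])
    return pd.read_csv(BytesIO(raw))


# Simulated upload, shaped like the ipywidgets 7 value.
fake_upload = {
    "targets.csv": {
        "content": b"Target,Common name,Probability*\nCarbonic anhydrase XII,CA12,0.1\n"
    }
}
example_df = load_uploaded_csv(fake_upload)
print(list(example_df.columns))
```

Parsing from bytes lets pandas handle the decoding, so the same helper works for either widget version.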


In [None]:
# @title
from io import StringIO

import pandas as pd

process_button = widgets.Button(description="Process File")
output = widgets.Output()

def process_file(b):
    global df  # Later cells read the DataFrame from the global namespace
    with output:
        output.clear_output()  # Clear previous outputs
        if uploader.value:
            # Extract the content of the uploaded file (a dict keyed by file name)
            uploaded_file = next(iter(uploader.value.values()))
            content = uploaded_file['content']
            df = pd.read_csv(StringIO(content.decode('utf-8')))
            print("File processed successfully! Here are the first few rows:")
            display(df.head())  # Display the first few rows to confirm
            print("Column names in the CSV file:", df.columns)  # Print column names
        else:
            print("No file uploaded. Please upload a CSV file.")

process_button.on_click(process_file)
display(process_button, output)


Button(description='Process File', style=ButtonStyle())

Output()

Unnamed: 0,Target,Common name,Uniprot ID,ChEMBL ID,Target Class,Probability*,Known actives (3D/2D)
0,Integrin alpha-IIb/beta-3,ITGA2B ITGB3,P08514 P05106,CHEMBL2093869,Membrane receptor,0.198886,88 / 163
1,Integrin alpha-4/beta-7,ITGB7 ITGA4,P26010 P13612,CHEMBL2095184,Membrane receptor,0.169889,17 / 21
2,Integrin alpha-4/beta-1,ITGB1 ITGA4,P05556 P13612,CHEMBL1907599,Membrane receptor,0.150578,34 / 78
3,Integrin alpha-V/beta-3,ITGAV ITGB3,P06756 P05106,CHEMBL1907598,Membrane receptor,0.122017,78 / 86
4,Integrin alpha-5/beta-1,ITGB1 ITGA5,P05556 P08648,CHEMBL2095226,Membrane receptor,0.122017,15 / 17


Column names in the CSV file: Index(['Target', 'Common name', 'Uniprot ID', 'ChEMBL ID', 'Target Class',
       'Probability*', 'Known actives (3D/2D)'],
      dtype='object')


# Select and Prepare Data for STRING-db

With the SwissTarget CSV file loaded, the next step is to prepare the data for STRING-db. Select the 'Common name' column, which holds the gene symbols of the potential drug targets, and the 'Probability*' column; the gene symbols become the list of identifiers used to query STRING-db.
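
Some 'Common name' entries in the sample data, such as `ITGA2B ITGB3`, hold more than one gene symbol. A minimal sketch of turning such a column into a flat list of unique symbols follows; the helper name `extract_gene_symbols` is hypothetical, and splitting on whitespace assumes every entry is a space-separated list of symbols.

```python
import pandas as pd


def extract_gene_symbols(names):
    """Split space-separated entries and return unique symbols in first-seen order."""
    symbols = []
    for entry in names.dropna().astype(str):
        symbols.extend(entry.split())
    # dict.fromkeys removes duplicates while preserving order
    return list(dict.fromkeys(symbols))


sample = pd.Series(["ITGA2B ITGB3", "ITGB7 ITGA4", "ITGB1 ITGA4", None])
print(extract_gene_symbols(sample))
# → ['ITGA2B', 'ITGB3', 'ITGB7', 'ITGA4', 'ITGB1']
```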



In [None]:
# @title
# Requires df from the "Process File" step
if 'df' in globals():
    # Only create widgets if df is defined
    column_selector = widgets.Dropdown(
        options=list(df.columns),
        value='Common name',  # Must match the exact column name
        description='Select Target Column:',
    )

    probability_selector = widgets.Dropdown(
        options=list(df.columns),
        value='Probability*',  # Exact column name, including the trailing asterisk
        description='Select Probability Column:',
    )

    display(column_selector, probability_selector)
else:
    print("DataFrame not loaded. Please upload and process a CSV file to activate this step.")

Dropdown(description='Select Target Column:', index=1, options=('Target', 'Common name', 'Uniprot ID', 'ChEMBL…

Dropdown(description='Select Probability Column:', index=5, options=('Target', 'Common name', 'Uniprot ID', 'C…

# Query STRING-db for Interaction Data

Next, the selected targets are used to query STRING-db for interaction data. The response lists the interaction partners for each target together with their confidence scores, which can then be used for network analysis.
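
For orientation, here is a sketch of how a STRING `network` request URL can be assembled from a list of gene symbols. The function name `build_network_url` is hypothetical; the endpoint, `species=9606` and `required_score=400` mirror the values used in this notebook. `urlencode` encodes the carriage-return separator as `%0D`, which is the same separator as the `%0d` used when joining identifiers by hand.

```python
from urllib.parse import urlencode

STRING_API_URL = "https://string-db.org/api"


def build_network_url(gene_symbols, species=9606, required_score=400):
    """Return a STRING `network` request URL in TSV format."""
    params = {
        "identifiers": "\r".join(gene_symbols),  # carriage return, encoded as %0D
        "species": species,  # NCBI taxonomy ID; 9606 is human
        "required_score": required_score,  # scale 0-1000; 400 is medium confidence
    }
    return f"{STRING_API_URL}/tsv/network?{urlencode(params)}"


print(build_network_url(["CD22", "PTPRC"]))
# → https://string-db.org/api/tsv/network?identifiers=CD22%0DPTPRC&species=9606&required_score=400
```

The sketch only builds the URL and does not contact the server.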



In [None]:
# @title
import ipywidgets as widgets
from IPython.display import display
import pandas as pd

prepare_query_button = widgets.Button(description="Prepare Query Data")
query_output = widgets.Output()

def prepare_query_data(b):
    global query_string  # Used by the fetch step
    with query_output:
        query_output.clear_output()
        try:
            # Entries such as "ITGA2B ITGB3" list several gene symbols, so split on
            # whitespace (assumes the selected column holds gene symbols)
            names = df[column_selector.value].dropna().astype(str)
            unique_names = list(dict.fromkeys(s for entry in names for s in entry.split()))
            query_string = '%0d'.join(unique_names)  # STRING separates identifiers with %0d
            print("Query data prepared successfully. Number of unique targets:", len(unique_names))
            print("Targets to be queried:")
            print('\n'.join(unique_names))  # Display all unique names
        except Exception as e:
            print("Error preparing query data:", str(e))

prepare_query_button.on_click(prepare_query_data)
display(prepare_query_button, query_output)

Button(description='Prepare Query Data', style=ButtonStyle())

Output()

In [None]:
# @title
import pandas as pd
from io import StringIO
import requests

fetch_data_button = widgets.Button(description="Fetch Data from STRING-db")
fetch_output = widgets.Output()

def fetch_data(b):
    global interaction_data  # The merge step reads this from the global namespace
    with fetch_output:
        fetch_output.clear_output()
        string_api_url = "https://string-db.org/api"
        output_format = "tsv"
        method = "network"
        if not globals().get('query_string'):
            print("No query data found. Please run 'Prepare Query Data' first.")
            return
        # species=9606 is human; required_score=400 keeps medium-confidence interactions and above
        request_url = f"{string_api_url}/{output_format}/{method}?identifiers={query_string}&species=9606&required_score=400"
        response = requests.get(request_url, timeout=60)
        if response.status_code == 200:
            # Parse the tab-separated response into interaction_data
            interaction_data = pd.read_csv(StringIO(response.text), sep="\t")
            print("Data fetched successfully from STRING-db:")
            display(interaction_data)
        else:
            print("Failed to fetch data:", response.status_code, response.text)

fetch_data_button.on_click(fetch_data)
display(fetch_data_button, fetch_output)

Button(description='Fetch Data from STRING-db', style=ButtonStyle())

Output()

Unnamed: 0,stringId_A,stringId_B,preferredName_A,preferredName_B,ncbiTaxonId,score,nscore,fscore,pscore,ascore,escore,dscore,tscore
0,9606.ENSP00000085219,9606.ENSP00000286301,CD22,CSF1R,9606,0.407,0,0,0.000,0.088,0.045,0.00,0.373
1,9606.ENSP00000085219,9606.ENSP00000216341,CD22,GZMB,9606,0.416,0,0,0.000,0.058,0.125,0.00,0.349
2,9606.ENSP00000085219,9606.ENSP00000380227,CD22,ITGA4,9606,0.567,0,0,0.000,0.048,0.071,0.00,0.549
3,9606.ENSP00000085219,9606.ENSP00000411355,CD22,PTPRC,9606,0.999,0,0,0.000,0.153,0.510,0.90,0.995
4,9606.ENSP00000178638,9606.ENSP00000430656,CA12,CA1,9606,0.488,0,0,0.118,0.000,0.000,0.00,0.444
...,...,...,...,...,...,...,...,...,...,...,...,...,...
267,9606.ENSP00000416561,9606.ENSP00000419260,CFB,PIK3CG,9606,0.418,0,0,0.000,0.000,0.000,0.00,0.419
268,9606.ENSP00000419260,9606.ENSP00000466090,PIK3CG,ELANE,9606,0.416,0,0,0.000,0.072,0.000,0.00,0.397
269,9606.ENSP00000419260,9606.ENSP00000496129,PIK3CG,PRKCB,9606,0.613,0,0,0.000,0.181,0.292,0.00,0.387
270,9606.ENSP00000419260,9606.ENSP00000501150,PIK3CG,PIK3CB,9606,0.997,0,0,0.000,0.000,0.045,0.90,0.976


In [None]:
# @title
def merge_data(interaction_df, target_df):
    # Rename on a copy so the uploaded DataFrame keeps its original column names
    targets = target_df.rename(columns=lambda col: col.strip())
    targets = targets.rename(columns={'Common name': 'CommonName', 'Probability*': 'Probability'})

    # Attach each target's probability to the first node of every interaction
    merged_df = interaction_df.merge(targets, left_on='preferredName_A', right_on='CommonName', how='left')
    # Select and rename columns for clarity
    merged_df = merged_df[['preferredName_A', 'preferredName_B', 'score', 'Probability']]
    merged_df = merged_df.rename(columns={
        'preferredName_A': 'Node1',
        'preferredName_B': 'Node2',
        'score': 'Score',
        'Probability': 'NodeAttribute'
    })

    return merged_df

# Example of using the merge function, ensuring both data are loaded
if 'interaction_data' in globals() and 'df' in globals():
    final_data = merge_data(interaction_data, df)
    print("Merged Data:")
    print(final_data.head())
else:
    print("Please ensure both interaction data and SwissTarget data are loaded before merging.")

Merged Data:
  Node1  Node2  Score  NodeAttribute
0  CD22  CSF1R  0.407            0.0
1  CD22   GZMB  0.416            0.0
2  CD22  ITGA4  0.567            0.0
3  CD22  PTPRC  0.999            0.0
4  CA12    CA1  0.488            0.0
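
The merged table attaches a probability only to `Node1`. One possible extension, sketched here under stated assumptions, looks up the probability for both ends of each edge. The column names `Node1Probability` and `Node2Probability` are hypothetical, and the probability values in the example are illustrative, not taken from a real prediction.

```python
import pandas as pd

# Illustrative values only; not taken from a real prediction.
targets = pd.DataFrame({"Common name": ["CD22", "PTPRC"], "Probability*": [0.10, 0.05]})
edges = pd.DataFrame({
    "preferredName_A": ["CD22"],
    "preferredName_B": ["PTPRC"],
    "score": [0.999],
})

# One probability per gene symbol, usable as a lookup table
prob_by_symbol = targets.drop_duplicates("Common name").set_index("Common name")["Probability*"]

annotated = edges.rename(columns={
    "preferredName_A": "Node1",
    "preferredName_B": "Node2",
    "score": "Score",
})
# Symbols missing from the lookup table become NaN
annotated["Node1Probability"] = annotated["Node1"].map(prob_by_symbol)
annotated["Node2Probability"] = annotated["Node2"].map(prob_by_symbol)
print(annotated)
```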


In [None]:
# @title
import base64
from IPython.display import HTML

def create_download_link(df, filename="final_data.csv"):
    # Generate a download link that can download the data from a DataFrame as a CSV
    csv = df.to_csv(index=False)  # Convert DataFrame to CSV string
    b64 = base64.b64encode(csv.encode()).decode()  # Encode the bytes to base64 (necessary for HTML link)

    # Return an HTML link that downloads the CSV file when clicked
    return HTML(f'<a href="data:text/csv;base64,{b64}" download="{filename}">Download CSV file</a>')

download_button = widgets.Button(description="Download Merged Data")
# Use a distinct name so the 'Process File' callback keeps writing to its own `output` widget
download_output = widgets.Output()

def download_file(b):
    with download_output:
        download_output.clear_output()
        if 'final_data' in globals():
            display(create_download_link(final_data))  # Call the function and display the result
        else:
            print("Data not available. Please ensure the data is processed first.")

download_button.on_click(download_file)
display(download_button, download_output)

Button(description='Download Merged Data', style=ButtonStyle())

Output()