# **DnldTool4RCSB: Download Tool for RCSB**

This Jupyter Notebook downloads files (PDB and SDF) with atomic coordinates from the Protein Data Bank. It reads the HTML page of each structure on the RCSB site to scrape data identifying the active ligand, and it focuses on structures for which binding affinity data is available. The active ligand is a small molecule bound to a protein target for which binding affinity data is available ([de Azevedo et al., 2024](https://doi.org/10.1002/jcc.27449)). It employs the requests library to download the atomic coordinates from the RCSB ([Veit-Acosta & de Azevedo, 2021](https://doi.org/10.2174/0929867328666210210121320)).
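
As a rough illustration of the scraping step, the sketch below pulls the ligand identifiers out of an SDF download link of the kind the notebook looks for on a structure page. The link layout (an `auth_seq_id` query key, `encoding=sdf`, and a `filename` of the form `<pdb>_<chain>_<ligand>.sdf`) is inferred from the string handling in `scrape_data_rcsb`. The sample URL and the helper name `parse_ligand_link` are hypothetical, and `urllib.parse` is swapped in for the notebook's index-based slicing.

```python
from urllib.parse import urlparse, parse_qs

def parse_ligand_link(href):
    """Return (ligand_id, chain_id, residue_number), or None if href does not match."""
    query = parse_qs(urlparse(href).query)
    if query.get("encoding") != ["sdf"] or "filename" not in query:
        return None
    if "auth_seq_id" not in query:
        return None
    # Assumed filename layout: <pdb>_<chain>_<ligand>.sdf
    stem = query["filename"][0].rsplit(".sdf", 1)[0]
    parts = stem.split("_")
    if len(parts) != 3:
        return None
    _, chain_id, ligand_id = parts
    return ligand_id, chain_id, query["auth_seq_id"][0]

# Hypothetical link, for illustration only
sample = ("https://models.rcsb.org/v1/1abc/ligand?auth_seq_id=301"
          "&label_asym_id=B&encoding=sdf&filename=1abc_B_XYZ.sdf")
print(parse_ligand_link(sample))
```

When the link is read from a parsed tag attribute (for example `tag["href"]` in BeautifulSoup), the `&amp;` entities are already decoded to `&`, so the query string can be parsed directly.
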
<br> </br>
<img src="https://drive.usercontent.google.com/download?id=1SpdLWV1K6Qtv0kvnRqdc3jc25mL0Vol_&export=view&authuser=0" width=400 alt="DnldTool4RCSB Flowchart">
<br><i>Schematic flowchart for DnldTool4RCSB. It reads the input files (lig.in, par.in, and pdb_codes.in) and downloads PDB and SDF files from the Protein Data Bank for structures with available binding affinity data (e.g., K<sub>i</sub>). DnldTool4RCSB reads pdb_codes.in to determine which PDB files to download. The input files lig.in and par.in define the folders used for downloading and the binding affinity information. DnldTool4RCSB also downloads the SDF for the active ligand in each structure.</i></br>
<br> </br>
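
The input files can be pictured with the minimal sketch below, which parses dictionary-style parameter text with `ast.literal_eval` (the approach `read_dictionary` takes) and reads a comma-separated list of PDB access codes. Only the key names (`miscellaneous`, `project_dir`, `file4pdb_codes`, `binfo`, `ligand_datafile`, `ligand_labels`) come from the notebook. All values, including the paths and the PDB codes, are hypothetical placeholders.

```python
import ast
import csv
import io

# Hypothetical file contents; only the key names come from the notebook
par_text = """{"miscellaneous": {"project_dir": "/content/datasets/",
                                 "file4pdb_codes": "pdb_codes.in"}}"""
lig_text = """{"binfo": {"ligand_datafile": "ligdata.csv",
                         "ligand_labels": ["PDB", "Ligand", "Chain", "Number"]}}"""
codes_text = "1ABC, 2DEF, 3GHI\n"

# literal_eval only accepts Python literals, so no code in the file is executed
par = ast.literal_eval(par_text)
lig = ast.literal_eval(lig_text)

# Flatten every CSV row into one list of non-empty, stripped codes
codes = [field.strip()
         for row in csv.reader(io.StringIO(codes_text))
         for field in row if field.strip()]

codes_path = par["miscellaneous"]["project_dir"] + par["miscellaneous"]["file4pdb_codes"]
header = ",".join(lig["binfo"]["ligand_labels"])

print(codes_path)
print(codes)
print(header)
```
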
This code has the following functions:

**ISO80000**: This function determines the file size in kibibytes (KiB), mebibytes (MiB), gibibytes (GiB), etc. It follows the ISO/IEC 80000 standard
(https://www.iso.org/standard/87648.html).
<br> </br>
**read_dictionary**: This function reads a file with parameters stored as Python dictionaries.
<br> </br>
**read_pdb_codes**: This function reads a CSV file with PDB access codes and returns a list with them. It shows a summary.
<br> </br>
**read_pdb_codes_no_summary**: This function reads a CSV file with PDB access codes and returns a list with them.
<br> </br>
**show_line**: This function shows a formatted line.
<br> </br>
**show_references**: This function shows references given as a list.
<br> </br>
**show_title**: This function shows a formatted line as a title.
<br> </br>
**rcsb_download_sdf**: This function downloads a specific ligand from the PDB using
the RCSB Model Server API and saves it as an SD File.
<br> </br>
**scrape_data_rcsb**: This function scrapes data from the RCSB page. It saves it to a file with the identification of the active ligand found in a given structure.
<br> </br>
**extract_pdb_coordinates**: This function extracts coordinates from a downloaded
PDB file. It intends to select target coordinates for docking simulations.
<br> </br>
**rcsb_download_pdb**: This function downloads a PDB file from the RCSB and saves it to a target directory.
<br> </br>
**zip_content_folders**: This function zips the datasets folder in the content directory.
<br> </br>
**download_file_from_google_drive**: This function downloads a file from Google Drive.
<br> </br>
**get_confirm_token**: This function gets the confirmation token.
<br> </br>
**save_response_content**: This function saves the response content.
<br> </br>
**unzip_a_folder**: This function unzips a previously zipped folder.
<br> </br>
**make_a_dir**: This function makes a directory in the content folder.
<br> </br>
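
The difference between the binary (IEC) and decimal (SI) prefixes reported by `ISO80000` can be sketched as follows. This is an illustrative reimplementation, not the notebook's code: the helper name `describe_size` is hypothetical, it takes a byte count rather than a file path, and it replaces the chain of `elif` branches with a loop. Like the notebook, it picks the prefix from the binary thresholds.

```python
IEC = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB", "ZiB", "YiB", "RiB", "QiB"]
SI = ["B", "kB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB", "RB", "QB"]

def describe_size(n_bytes):
    """Return (value_SI, unit_SI, value_IEC, unit_IEC) for a byte count."""
    # Find the largest power of 1024 that does not exceed n_bytes
    power = 0
    while power < len(IEC) - 1 and n_bytes >= 1024 ** (power + 1):
        power += 1
    # The same power index is used for both unit systems
    return n_bytes / 1000 ** power, SI[power], n_bytes / 1024 ** power, IEC[power]

print(describe_size(1536))  # (1.536, 'kB', 1.5, 'KiB')
```

For a file on disk, the byte count could be obtained with `os.path.getsize(path)` and passed to the helper.
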
<br> </br>
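
The record selection performed by `extract_pdb_coordinates` relies on the fixed-column PDB layout: the record name occupies the first six characters and the residue name sits in columns 18-20 (the slice `[17:20]`). The sketch below condenses the notebook's scheme names into three boolean flags. The function `select_records`, the helper `fake_record`, and the residue names in the sample lines are hypothetical and serve only as a demonstration.

```python
def select_records(lines, ligand_id, keep_active=False,
                   keep_cofactor=False, keep_water=False):
    """Filter PDB lines by record type and residue name (columns 18-20)."""
    selected = []
    for line in lines:
        record, res_name = line[:6], line[17:20]
        if record == "ATOM  " or line[:3] in ("TER", "END"):
            selected.append(line)
        elif record == "HETATM":
            if res_name == "HOH":
                keep = keep_water          # water molecule
            elif res_name == ligand_id:
                keep = keep_active         # active ligand
            else:
                keep = keep_cofactor       # any other hetero group
            if keep:
                selected.append(line)
    return selected

def fake_record(record, serial, atom, res_name, chain, res_seq):
    """Build a simplified fixed-column line (hypothetical helper for this demo)."""
    return f"{record:<6}{serial:>5} {atom:<4} {res_name:>3} {chain}{res_seq:>4}\n"

lines = [
    fake_record("ATOM", 1, "N", "MET", "A", 1),
    fake_record("HETATM", 2, "C1", "XYZ", "A", 301),  # active ligand (made up)
    fake_record("HETATM", 3, "PA", "NAP", "A", 302),  # treated as a cofactor
    fake_record("HETATM", 4, "O", "HOH", "A", 401),   # water
    "END\n",
]

# Roughly equivalent to the "receptor+cofactor" scheme
for line in select_records(lines, "XYZ", keep_cofactor=True):
    print(line, end="")
```
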
The requests library documentation is available at https://requests.readthedocs.io/en/latest/.

To install the requests library, type the following command.

python -m pip install requests
<br> </br>
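
Once requests is installed, a single download can be sketched as shown below. The URL pattern and the `pdb<code>.ent` naming scheme are the ones used in `rcsb_download_pdb`. The function name `fetch_pdb`, the `timeout` value, and the injectable `get` argument (which allows the sketch to be exercised without network access) are assumptions added for illustration.

```python
import os

def fetch_pdb(pdb_id, out_dir, get=None, timeout=30):
    """Download one PDB entry and return the path written, or None on failure."""
    if get is None:
        import requests  # imported lazily so the sketch can be tested offline
        get = requests.get
    url = f"https://files.rcsb.org/download/{pdb_id}.pdb"
    response = get(url, timeout=timeout)
    # Anything other than HTTP 200 is treated as a failed download
    if response.status_code != 200:
        return None
    os.makedirs(out_dir, exist_ok=True)
    # Same naming scheme as the notebook: pdb<code>.ent
    out_path = os.path.join(out_dir, f"pdb{pdb_id.lower()}.ent")
    with open(out_path, "wb") as handle:
        handle.write(response.content)
    return out_path
```

A call such as `fetch_pdb("4HHB", "pdb")` would attempt to save the entry as `pdb/pdb4hhb.ent`. Passing a `timeout` keeps a stalled connection from blocking the notebook indefinitely.
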

**References**
<br> </br>
de Azevedo WF Jr, Quiroga R, Villarreal MA, da Silveira NJF, Bitencourt-Ferreira
G, da Silva AD, Veit-Acosta M, Oliveira PR, Tutone M, Biziukova N, Poroikov V, Tarasova O, Baud S. SAnDReS 2.0: Development of machine-learning models to explore the scoring function space. J Comput Chem. 2024;45(27):2333-2346.
[doi:10.1002/jcc.27449](https://doi.org/10.1002/jcc.27449)
<br> </br>
Veit-Acosta M, de Azevedo Júnior WF. The Impact of Crystallographic Data for the Development of Machine Learning Models to Predict Protein-Ligand Binding Affinity. Curr Med Chem. 2021;28(34):7006-7022.
[doi:10.2174/0929867328666210210121320](https://doi.org/10.2174/0929867328666210210121320)

In [2]:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Tue Jan 20 13:53:27 2026

@author: walter

DnldTool4RCSB: Download Tool for RCSB

This Jupyter Notebook downloads files (PDB and SDF) with atomic coordinates from
the Protein Data Bank. It reads the HTML RCSB page to scrape data related to the
identification of the active ligand. It focuses on structures for which binding
affinity data is available. The active ligand is a small molecule bound to a
protein target for which binding affinity data is available (de Azevedo et al.,
2024). It employs the requests library for downloading the atomic coordinates
from the RCSB (Veit-Acosta & de Azevedo, 2021).

This code has the following functions:

ISO80000: This function determines the file size in kibibytes (KiB), mebibytes
(MiB), gibibytes (GiB), etc. It follows the ISO/IEC 80000 standard
(https://www.iso.org/standard/87648.html).

read_dictionary: This function reads a file with parameters stored as Python
dictionaries.

read_pdb_codes: This function reads a CSV file with PDB access codes and returns
a list with them. It shows a summary.

read_pdb_codes_no_summary: This function reads a CSV file with PDB access codes
and returns a list with them.

show_line: This function shows a formatted line.

show_references: This function shows references given as a list.

show_title: This function shows a formatted line as a title.

rcsb_download_sdf: This function downloads a specific ligand from the PDB using
the RCSB Model Server API and saves it as an SD File.

scrape_data_rcsb: This function scrapes data from the RCSB page. It saves it to
a file with the identification of the active ligand found in a given structure.

extract_pdb_coordinates: This function extracts coordinates from a downloaded
PDB file. It intends to select target coordinates for docking simulations.

rcsb_download_pdb: This function downloads a PDB file from the RCSB and saves it
to a target directory.

zip_content_folders: This function zips the datasets folder in the content
directory.

download_file_from_google_drive: This function downloads a file from Google
Drive.

get_confirm_token: This function gets the confirmation token.

save_response_content: This function saves the response content.

unzip_a_folder: This function unzips a previously zipped folder.

make_a_dir: This function makes a directory in the content folder.


The requests library documentation is available at
https://requests.readthedocs.io/en/latest/.

To install the requests library, type the following command.

python -m pip install requests



References

de Azevedo WF Jr, Quiroga R, Villarreal MA, da Silveira NJF, Bitencourt-Ferreira
G, da Silva AD, Veit-Acosta M, Oliveira PR, Tutone M, Biziukova N, Poroikov V,
Tarasova O, Baud S. SAnDReS 2.0: Development of machine-learning models to
explore the scoring function space. J Comput Chem. 2024;45(27):2333-2346.
doi: https://doi.org/10.1002/jcc.27449

Veit-Acosta M, de Azevedo Júnior WF. The Impact of Crystallographic Data for the
Development of Machine Learning Models to Predict Protein-Ligand Binding
Affinity. Curr Med Chem. 2021;28(34):7006-7022.
doi: https://doi.org/10.2174/0929867328666210210121320

"""
################################################################################
# Define variables for references related to this program                      #
################################################################################
doistr = "DOI: https://doi.org/"

sandres2doi = "10.1002/jcc.27449"

pdb1doi = "10.2174/0929867328666210210121320"

################################################################################
# Define ISO80000() function                                                   #
################################################################################
def ISO80000(file2measure):
    """
    This function determines the file size in kibibytes (KiB), mebibytes (MiB),
    gibibytes (GiB), etc. It follows the ISO/IEC 80000 standard
    (https://www.iso.org/standard/87648.html).
    The standard defines quantities and units used in information science and
    information technology and specifies names and symbols for these quantities
    and units. It covers scope; normative references; names, definitions, and
    symbols; and prefixes for binary multiples.

    Value       IEC                     SI
    1024        2**10    Ki  kibi       10**3    k  kilo
    1024**2     2**20    Mi  mebi       10**6    M  mega
    1024**3     2**30    Gi  gibi       10**9    G  giga
    1024**4     2**40    Ti  tebi       10**12   T  tera
    1024**5     2**50    Pi  pebi       10**15   P  peta
    1024**6     2**60    Ei  exbi       10**18   E  exa
    1024**7     2**70    Zi  zebi       10**21   Z  zetta
    1024**8     2**80    Yi  yobi       10**24   Y  yotta
    1024**9     2**90    Ri  robi       10**27   R  ronna
    1024**10    2**100   Qi  quebi      10**30   Q  quetta

    IEC: International Electrotechnical Commission
    SI: Système International d'Unités

    Source:
    https://en.wikipedia.org/wiki/Binary_prefix

    """
    # Import section
    import os, warnings

    # Ignore warnings
    warnings.filterwarnings("ignore")

    # Get file size
    size = os.path.getsize(file2measure)

    # Check size
    if size >= 1024 and size < 1024**2:
        size_SI = size/1000
        size_IEC = size/1024
        unit_SI = "kilobytes (kB)"
        unit_IEC = "kibibytes (KiB)"
    elif size >= 1024**2 and size < 1024**3:
        size_SI = size/(1000**2)
        size_IEC = size/(1024**2)
        unit_SI = "megabytes (MB)"
        unit_IEC = "mebibytes (MiB)"
    elif size >= 1024**3 and size < 1024**4:
        size_SI = size/(1000**3)
        size_IEC = size/(1024**3)
        unit_SI = "gigabytes (GB)"
        unit_IEC = "gibibytes (GiB)"
    elif size >= 1024**4 and size < 1024**5:
        size_SI = size/(1000**4)
        size_IEC = size/(1024**4)
        unit_SI = "terabytes (TB)"
        unit_IEC = "tebibytes (TiB)"
    elif size >= 1024**5 and size < 1024**6:
        size_SI = size/(1000**5)
        size_IEC = size/(1024**5)
        unit_SI = "petabytes (PB)"
        unit_IEC = "pebibytes (PiB)"
    elif size >= 1024**6 and size < 1024**7:
        size_SI = size/(1000**6)
        size_IEC = size/(1024**6)
        unit_SI = "exabytes (EB)"
        unit_IEC = "exbibytes (EiB)"
    elif size >= 1024**7 and size < 1024**8:
        size_SI = size/(1000**7)
        size_IEC = size/(1024**7)
        unit_SI = "zettabytes (ZB)"
        unit_IEC = "zebibytes (ZiB)"
    elif size >= 1024**8 and size < 1024**9:
        size_SI = size/(1000**8)
        size_IEC = size/(1024**8)
        unit_SI = "yottabytes (YB)"
        unit_IEC = "yobibytes (YiB)"
    elif size >= 1024**9 and size < 1024**10:
        size_SI = size/(1000**9)
        size_IEC = size/(1024**9)
        unit_SI = "ronnabytes (RB)"
        unit_IEC = "robibytes (RiB)"
    elif size >= 1024**10:
        size_SI = size/(1000**10)
        size_IEC = size/(1024**10)
        unit_SI = "quettabytes (QB)"
        unit_IEC = "quebibytes (QiB)"
    else:
        size_IEC = size
        size_SI = size
        unit_SI = "bytes (B)"
        unit_IEC = "bytes (B)"

    # Return size_SI, size_IEC, unit_SI, unit_IEC
    return size_SI, size_IEC, unit_SI, unit_IEC

################################################################################
# Define read_dictionary() function                                            #
################################################################################
def read_dictionary(dict_file):
    """
    This function reads a file with parameters stored as Python dictionaries.
    """
    # Import section
    import ast

    # Open the file and read its content
    with open(dict_file) as f:
        data_in = f.read()

    # Reconstruct data_in as a dictionary (avoid shadowing the built-in dict)
    params = ast.literal_eval(data_in)

    # Return the dictionary
    return params

################################################################################
# Define read_pdb_codes() function                                             #
################################################################################
def read_pdb_codes(pdbs_in):
    """
    This function reads a CSV file with PDB access codes and returns a list
    with them. It shows a summary.
    """
    # Import section
    import csv, sys

    # Show message
    evo_msg = f"\nReading PDBs from CSV file: {pdbs_in}..."
    print(f"{evo_msg}",end="")

    # Read PDB access codes
    # Try to read a csv file
    try:
        fo_csv_in = open(pdbs_in,"r")
        csv_in = csv.reader(fo_csv_in)
    except IOError:
        err_msg = f"\nIOError! I can't find file {pdbs_in}"
        sys.exit(f"{err_msg}")

    # Set up a list
    pdb_list = []

    # Looping through csv_in (each row is a list of fields)
    for list_in in csv_in:
        # Looping through the fields of the current row
        for field in list_in:
            # Remove blanks and skip empty fields, so that codes from
            # consecutive rows are never merged
            aux_pdb = field.replace(" ","")
            if aux_pdb:
                pdb_list.append(aux_pdb)

    # Close the CSV file
    fo_csv_in.close()

    # Show message
    print("done!")

    # Call show_title() function
    show_title(31," PDB Data Summary ",31)

    # Call show_line() function twice and show message
    n_pdb = len(pdb_list)
    show_line(f"CSV file: {pdbs_in}")
    show_line(f"Total number of PDB access codes: {n_pdb}")
    print(80*"#")

    # References for PDB
    # Define c_line_list with bare DOIs; show_references() adds doistr
    c_line_list = [pdb1doi,sandres2doi]

    # Call show_references() function
    show_references(c_line_list)

    # Return pdb_list
    return pdb_list

################################################################################
# Define read_pdb_codes_no_summary() function                                  #
################################################################################
def read_pdb_codes_no_summary(pdbs_in):
    """
    This function reads a CSV file with PDB access codes and returns a list
    with them.
    """
    # Import section
    import csv, sys

    # Show message
    evo_msg = f"\nReading PDBs from CSV file: {pdbs_in}..."
    print(f"{evo_msg}",end="")

    # Read PDB access codes
    # Try to read a csv file
    try:
        fo_csv_in = open(pdbs_in,"r")
        csv_in = csv.reader(fo_csv_in)
    except IOError:
        err_msg = f"\nIOError! I can't find file {pdbs_in}"
        sys.exit(f"{err_msg}")

    # Set up a list
    pdb_list = []

    # Looping through csv_in (each row is a list of fields)
    for list_in in csv_in:
        # Looping through the fields of the current row
        for field in list_in:
            # Remove blanks and skip empty fields, so that codes from
            # consecutive rows are never merged
            aux_pdb = field.replace(" ","")
            if aux_pdb:
                pdb_list.append(aux_pdb)

    # Close the CSV file
    fo_csv_in.close()

    # Show message
    print("done!")

    # Return pdb_list
    return pdb_list

################################################################################
# Define show_line() function                                                  #
################################################################################
def show_line(c_line):
    """
    This function shows a formatted line.
    """
    # Define auxiliary variables
    s1 = " "
    h1 = "#"

    # Prepare and show line
    n_line = len(c_line)
    comp = f"{(77 - n_line)*s1}{h1}"
    print(f"{h1} {c_line}{comp}")

################################################################################
# Define show_references()                                                     #
################################################################################
def show_references(c_line_list):
    """
    This function shows references given as a list.
    """
    # Call show_title() function
    show_title(34," References ",34)

    # Looping through c_line_list
    for c_line in c_line_list:
        # Add doistr to c_line
        c_line = doistr+c_line

        # Call show_line() function
        show_line(f"{c_line}")
    print(80*"#")

################################################################################
# Define show_title() function                                                 #
################################################################################
def show_title(n1,c_title,n2):
    """
    This function shows a formatted line as a title.
    """
    # Define auxiliary variable
    h1 = "#"

    # Prepare and show line
    print(f"\n{n1*h1}{c_title}{n2*h1}")

################################################################################
# Define rcsb_download_sdf() function                                          #
################################################################################
def rcsb_download_sdf():
    """
    This function downloads a specific ligand from the PDB using the RCSB Model
    Server API and saves it as an SD File.
    """
    # Import section
    import requests
    import pandas as pd

    # Call read_dictionary("/content/misc/par/par.in") function
    dict = read_dictionary("/content/misc/par/par.in")
    miscellaneous = dict.get("miscellaneous")
    project_dir = miscellaneous["project_dir"]

    # Call read_dictionary("/content/misc/par/lig.in") function
    dict = read_dictionary("/content/misc/par/lig.in")
    binfo = dict.get("binfo")
    ligand_datafile = binfo["ligand_datafile"]

    # Call show_title() function
    show_title(33," SDF Summary ",34)

    # Read the CSV file and get only specified columns
    df = pd.read_csv(project_dir+ligand_datafile)
    pdb_col = df["PDB"]
    lig_col = df["Ligand"]
    cha_col = df["Chain"]
    num_col = df["Number"]

    # Looping through pdb_col
    n_sdf = len(lig_col)
    for i,pdb_id in enumerate(pdb_col):
        # Define parameters for downloading
        asym_id = cha_col[i]
        auth_seq_id = num_col[i]
        curr_sdf = pdb_id+f"_{lig_col[i]}.sdf"
        #output_sdf = project_dir+"pdb/"+pdb_id+f"_{lig_col[i]}.sdf"
        output_sdf = project_dir+"pdb/"+curr_sdf
        url = f"https://models.rcsb.org/v1/{pdb_id}/ligand?auth_seq_id={auth_seq_id}&label_asym_id={asym_id}&encoding=sdf"

        # Download
        response = requests.get(url, allow_redirects=True)

        # Check response.status_code
        if response.status_code == 200:
            with open(output_sdf, 'wb') as f:
                f.write(response.content)

            # Call show_line() function
            c_line = f"Successfully downloaded ligand to {curr_sdf} "
            c_line += f"({i+1}/{n_sdf})"
            show_line(f"{c_line}")
        else:
            c_line = "Failed to download ligand. "
            c_line += f"Status code: {response.status_code}"
            show_line(f"{c_line}")
            c_line="Check if the PDB ID, asym_id, and auth_seq_id are correct."
            show_line(f"{c_line}")

    # Show message
    h1 = "#"
    print(f"{80*h1}")

################################################################################
# Define scrape_data_rcsb() function                                           #
################################################################################
def scrape_data_rcsb():
    """
    This function scrapes data from the RCSB page. It saves it to a file with
    the identification of the active ligand found in a given structure.
    """
    # Import section
    import requests
    from bs4 import BeautifulSoup

    # Call read_dictionary("/content/misc/par/par.in") function
    dict = read_dictionary("/content/misc/par/par.in")
    miscellaneous = dict.get("miscellaneous")
    project_dir = miscellaneous["project_dir"] # Project directory
    file4pdb_codes = project_dir+miscellaneous["file4pdb_codes"]

    # Call read_dictionary("/content/misc/par/lig.in") function
    dict = read_dictionary("/content/misc/par/lig.in")
    binfo = dict.get("binfo")
    ligand_datafile = binfo["ligand_datafile"]

    # Call read_pdb_codes_no_summary() function
    pdb_list =read_pdb_codes_no_summary(file4pdb_codes)

    # Set up header for binding_data
    binding_data = str(binfo["ligand_labels"]).replace("[","").\
     replace("]","").replace(" ","").replace("\"","").replace("\'","")+"\n"

    # Define auxiliary variables
    s1 = " "
    h1 = "#"

    # Call show_title() function
    show_title(31," PDB Data Summary ",31)

    # Get the number of PDBs
    n_pdb = len(pdb_list)

    # Looping through pdb_list
    for i,pdb in enumerate(pdb_list):
        # Show message
        c_line = f"Scraping data for structure {pdb} ({i+1}/{n_pdb})..."
        n_line = len(c_line)
        comp = f"{(72 - n_line)*s1}{h1}"
        print(f"{h1} {c_line}",end="")

        # Connect to the target URL
        url = 'https://www.rcsb.org/structure/'+str(pdb).strip().upper()
        page = requests.get(url)
        html_data = page.text

        # Parse the HTML content
        alternative_soup = BeautifulSoup(html_data, 'html.parser')

        # Assign None to label_asym_id
        label_asym_id = None

        # Try to scrape the RCSB page
        try:
            # Get a string from page
            results = alternative_soup.find_all(name="a")
            for line in results:
                str_line = str(line)
                if "encoding=sdf&amp;filename=" in str_line:
                    i_sdf = str_line.index(".sdf")
                    i_ref = str_line.index("encoding=sdf&amp;filename=")
                    l_str = len("encoding=sdf&amp;filename=")
                    sd_f = str_line[i_ref+l_str:i_sdf+4]
                    i_chain1 = sd_f.index("_")+1
                    i_chain2 = sd_f[i_chain1:].index("_")+i_chain1
                    i_lig_end = sd_f.index(".sdf")
                    i_lig_start = i_chain2 + 1
                    lig_id = sd_f[i_lig_start:i_lig_end]
                    i_number1 = str_line.index("ligand?auth_seq_id=")
                    i_number1 += len("ligand?auth_seq_id=")
                    i_number2 = str_line[i_number1:].index("&amp;")+i_number1
                    ligand_number = str_line[i_number1:i_number2]
                    label_asym_id = sd_f[i_chain1:i_chain2]
                    auth_seq_id = ligand_number
                    break

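        except Exception:
            # Parsing failed, so discard any partial result
            label_asym_id = None

        # Skip structures for which no ligand data was found
        if label_asym_id is None:
            # Call show_line() function
            print()
            c_line = f"I can't find ligand data for structure {pdb}"
            show_line(f"{c_line}")
            continue

        # Update line
        binding_data += pdb+","+lig_id+","+label_asym_id
        binding_data += ","+auth_seq_id+"\n"
        print(f"done!{comp}")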
    print(f"{80*h1}")

    # Open a new file and write content
    lig_file = project_dir+ligand_datafile
    fo_lig = open(lig_file,"w")
    fo_lig.write(binding_data)
    fo_lig.close()

    # Call ISO80000() function
    size_SI,size_IEC,unit_SI,unit_IEC = ISO80000(lig_file)

    # Call show_title() function
    show_title(29," Ligand Data Summary ",30)

    # Call show_line() function
    c_line = "Ligand data written to a "
    c_line += f"csv file: {ligand_datafile}."
    show_line(f"{c_line}")

    # Call show_line() function
    c_line = f"File size: {size_IEC:.2f} "
    c_line += f"{unit_IEC} {size_SI:.2f} {unit_SI}"
    show_line(f"{c_line}")

    print(f"{80*h1}")

################################################################################
# Define extract_pdb_coordinates() function                                    #
################################################################################
def extract_pdb_coordinates(receptor_xyz_scheme):
    """
    This function extracts coordinates from a downloaded PDB file. It intends to
    select target coordinates for docking simulations.
    """
    # Import section
    import pandas as pd

    # Call read_dictionary("/content/misc/par/par.in") function
    dict = read_dictionary("/content/misc/par/par.in")
    miscellaneous = dict.get("miscellaneous")
    project_dir = miscellaneous["project_dir"]
    pdb_in = project_dir+miscellaneous["file4pdb_codes"]
    curr_dir = project_dir+"pdb/"

    # Read a CSV file
    df = pd.read_csv(project_dir+"ligdata.csv")

    # Call read_pdb_codes() function
    pdbs = read_pdb_codes(pdb_in)

    # Call show_title() function
    show_title(31," PDB Data Summary ",31)

    # Looping through pdbs
    n_pdb = len(pdbs)
    for i,pdb in enumerate(pdbs):
        # Define list
        receptor_out = []

        # Try to open a pdb file
        try:
            curr_file = curr_dir+"pdb"+pdb.lower()+".ent"
            fo_pdb = open(curr_file,"r")
            pdb_lines = fo_pdb.readlines()
            fo_pdb.close()

        except IOError:
            # Call show_line() function
            c_line = f"IOError! I can't find {curr_file}"
            show_line(f"{c_line}")
            return

        # Select the scheme and write related content
        if receptor_xyz_scheme.lower() == "receptor":
            # Looping through pdb_lines
            for line in pdb_lines:
                str_line = str(line)
                if str_line[:6] == "ATOM  ":
                    receptor_out.append(line)
                elif str_line[:3] == "TER" or str_line[:3] == "END":
                    receptor_out.append(line)

        elif receptor_xyz_scheme.lower() == "receptor+water":
            # Looping through pdb_lines
            for line in pdb_lines:
                str_line = str(line)
                if str_line[:6] == "ATOM  ":
                    receptor_out.append(line)
                elif str_line[:6] == "HETATM" and str_line[17:20] == "HOH":
                    receptor_out.append(line)
                elif str_line[:3] == "TER" or str_line[:3] == "END":
                    receptor_out.append(line)

        elif receptor_xyz_scheme.lower() == "receptor+cofactor":
            # Get index
            curr_i = df.index[df["PDB"] == pdb].tolist()
            lig_list = df["Ligand"].tolist()
            curr_lig = lig_list[curr_i[0]]

            # Looping through pdb_lines
            for line in pdb_lines:
                str_line = str(line)
                if str_line[:6] == "ATOM  ":
                    receptor_out.append(line)
                elif str_line[:6] == "HETATM" and str_line[17:20] != "HOH"\
                    and str_line[17:20] != curr_lig:
                    receptor_out.append(line)
                elif str_line[:3] == "TER" or str_line[:3] == "END":
                    receptor_out.append(line)

        elif receptor_xyz_scheme.lower() == "receptor+cofactor+water":
            # Get index
            curr_i = df.index[df["PDB"] == pdb].tolist()
            lig_list = df["Ligand"].tolist()
            curr_lig = lig_list[curr_i[0]]

            # Looping through pdb_lines
            for line in pdb_lines:
                str_line = str(line)
                if str_line[:6] == "ATOM  ":
                    receptor_out.append(line)
                elif str_line[:6] == "HETATM" and str_line[17:20] != curr_lig:
                    receptor_out.append(line)
                elif str_line[:3] == "TER" or str_line[:3] == "END":
                    receptor_out.append(line)

        elif receptor_xyz_scheme.lower() == "receptor+active":
            # Get index
            curr_i = df.index[df["PDB"] == pdb].tolist()
            lig_list = df["Ligand"].tolist()
            curr_lig = lig_list[curr_i[0]]

            # Looping through pdb_lines
            for line in pdb_lines:
                str_line = str(line)
                if str_line[:6] == "ATOM  ":
                    receptor_out.append(line)
                elif str_line[:6] == "HETATM" and str_line[17:20] != "HOH"\
                    and str_line[17:20] == curr_lig:
                    receptor_out.append(line)
                elif str_line[:3] == "TER" or str_line[:3] == "END":
                    receptor_out.append(line)

        elif receptor_xyz_scheme.lower() == "receptor+active+water":
            # Get index
            curr_i = df.index[df["PDB"] == pdb].tolist()
            lig_list = df["Ligand"].tolist()
            curr_lig = lig_list[curr_i[0]]

            # Looping through pdb_lines
            for line in pdb_lines:
                str_line = str(line)
                if str_line[:6] == "ATOM  ":
                    receptor_out.append(line)
                elif str_line[:6] == "HETATM" and str_line[17:20] == curr_lig:
                    receptor_out.append(line)
                elif str_line[:6] == "HETATM" and str_line[17:20] == "HOH":
                    receptor_out.append(line)
                elif str_line[:3] == "TER" or str_line[:3] == "END":
                    receptor_out.append(line)

        elif receptor_xyz_scheme.lower() == "receptor+active+cofactor":
            # Get index
            curr_i = df.index[df["PDB"] == pdb].tolist()
            lig_list = df["Ligand"].tolist()
            curr_lig = lig_list[curr_i[0]]

            # Looping through pdb_lines
            for line in pdb_lines:
                str_line = str(line)
                if str_line[:6] == "ATOM  ":
                    receptor_out.append(line)
                elif str_line[:6] == "HETATM" and str_line[17:20] != "HOH":
                    receptor_out.append(line)
                elif str_line[:3] == "TER" or str_line[:3] == "END":
                    receptor_out.append(line)

        elif receptor_xyz_scheme.lower() == "receptor+active+cofactor+water":
            # Get index
            curr_i = df.index[df["PDB"] == pdb].tolist()
            lig_list = df["Ligand"].tolist()
            curr_lig = lig_list[curr_i[0]]

            # Looping through pdb_lines
            for line in pdb_lines:
                str_line = str(line)
                if str_line[:6] == "ATOM  " or str_line[:6] == "HETATM":
                    receptor_out.append(line)
                elif str_line[:3] == "TER" or str_line[:3] == "END":
                    receptor_out.append(line)

        # Write selected coordinates
        curr_pdb = pdb.upper()+".pdb"
        output_file = curr_dir+curr_pdb
        fo_out = open(output_file,"w")
        fo_out.writelines(receptor_out)
        fo_out.close()

        # Call show_line() function
        c_line = f"Selected coordinates written to {curr_pdb} "
        c_line += f"({i+1}/{n_pdb})"
        show_line(f"{c_line}")

    # Show message
    h1 = "#"
    print(f"{80*h1}")

################################################################################
# Define rcsb_download_pdb() function                                          #
################################################################################
def rcsb_download_pdb():
    """
    This function downloads a PDB file from the RCSB and saves it to a target
    directory.
    """
    # Import section
    import requests
    import pandas as pd
    import os

    # Call read_dictionary("/content/misc/par/par.in") function
    dict = read_dictionary("/content/misc/par/par.in")
    miscellaneous = dict.get("miscellaneous")
    target_dir = miscellaneous["project_dir"]+"pdb/"

    # Call read_dictionary("/content/misc/par/lig.in") function
    dict = read_dictionary("/content/misc/par/lig.in")
    binfo = dict.get("binfo")
    ligand_datafile = binfo["ligand_datafile"]

    # Call scrape_data_rcsb() function
    scrape_data_rcsb()

    # Call show_title() function
    show_title(29," PDB Download Summary ",29)

    # Check whether a directory exists
    if os.path.isdir(target_dir):
        # Call show_line() function
        c_line = f"The directory '{target_dir}' exists."
        show_line(f"{c_line}")

    else:
        # Call show_line() function
        c_line = f"The directory '{target_dir}' does not exist or is a file."
        show_line(f"{c_line}")

        # Try to make PDB directory (and any missing parent directories)
        try:
            os.makedirs(target_dir, exist_ok=True)
        except Exception as e:
            # Call show_line() function
            c_line = f"An error occurred: {e}"
            show_line(f"{c_line}")

    # Read the CSV file and select the column with the PDB access codes
    df = pd.read_csv(miscellaneous["project_dir"]+ligand_datafile)
    pdb_col = df["PDB"]

    # Looping through pdb_col
    n_pdb = len(pdb_col)
    h1 = "#"
    for i,pdb_id in enumerate(pdb_col):
        # Define url and output_path
        url = f"https://files.rcsb.org/download/{pdb_id}.pdb"
        curr_pdb = "pdb"+pdb_id.lower()+".ent"
        output_path = target_dir+curr_pdb

        # Download (the timeout avoids hanging on a stalled connection)
        response = requests.get(url, timeout=60)

        # Check response.status_code
        if response.status_code == 200:
            with open(output_path, 'wb') as f:
                f.write(response.content)

            # Call show_line() function
            c_line = f"Successfully downloaded to {curr_pdb} "
            c_line += f"({i+1}/{n_pdb})"
            show_line(f"{c_line}")

        else:
            # Call show_line() function
            c_line = f"Failed to download {pdb_id}: "
            c_line += f"status code {response.status_code} "
            c_line += f"({i+1}/{n_pdb})"
            show_line(f"{c_line}")

    # Show message
    h1 = "#"
    print(f"{80*h1}")
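
################################################################################
# Sketch: rcsb_pdb_target() (hypothetical helper)                              #
################################################################################
# A minimal sketch of how the download loop above maps a PDB access code to
# its RCSB URL and local file name, followed by a fetch that reports network
# failures instead of raising. Both function names and the 60 s timeout are
# assumptions made for illustration; they are not part of the original code.
def rcsb_pdb_target(pdb_id, target_dir):
    """
    This function returns the download URL and the output path for a PDB code.
    It assumes that target_dir ends with a slash, as in rcsb_download_pdb().
    """
    url = f"https://files.rcsb.org/download/{pdb_id}.pdb"
    output_path = target_dir+"pdb"+str(pdb_id).lower()+".ent"
    return url, output_path

def fetch_pdb_entry(pdb_id, target_dir, timeout=60):
    """
    This function downloads one PDB file and returns (output_path, error).
    """
    # Import section
    import requests

    # Request the file and treat any network or HTTP error as a failure
    url, output_path = rcsb_pdb_target(pdb_id, target_dir)
    try:
        response = requests.get(url, timeout=timeout)
        response.raise_for_status()
    except requests.exceptions.RequestException as e:
        return None, str(e)

    # Write the downloaded content
    with open(output_path, "wb") as f:
        f.write(response.content)
    return output_path, None

# Usage example ("1ABC" is a placeholder access code):
# url, output_path = rcsb_pdb_target("1ABC", "/content/pdb/")
# output_path, error = fetch_pdb_entry("1ABC", "/content/pdb/")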

################################################################################
# Define zip_content_folders() function                                        #
################################################################################
def zip_content_folders():
  """
  This function zips the datasets and misc folders in the content directory.
  """

  # Try to zip the datasets folder
  try:
    !zip -r /content/datasets.zip /content/datasets
    print('Folder zipped successfully to /content/datasets.zip')

  except Exception as e:
    # Call show_line() function
    c_line = f"An error occurred: {e}"
    show_line(f"{c_line}")

  # Try to zip the misc folder
  try:
    !zip -r /content/misc.zip /content/misc
    print('Folder zipped successfully to /content/misc.zip')

  except Exception as e:
    # Call show_line() function
    c_line = f"An error occurred: {e}"
    show_line(f"{c_line}")

  # Optional: List the content of /content to show the zipped file
  print('\nContents of /content:')
  !ls -lh /content
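
################################################################################
# Sketch: zip_folder_portable() (hypothetical helper)                          #
################################################################################
# A minimal sketch of the same zipping step without the notebook-only "!zip"
# shell command, assuming the standard-library shutil.make_archive() is an
# acceptable substitute. The function name is an assumption made for
# illustration. Unlike a shell command, make_archive() raises a Python
# exception on failure, so a try/except around it can catch the error.
def zip_folder_portable(folder_path, zip_base):
  """
  This function zips folder_path to zip_base + ".zip" and returns that path.
  """
  # Import section
  import os
  import shutil

  # Split the folder into its parent and its own name so that the archive
  # stores entries under the folder name (e.g., datasets/...)
  parent, name = os.path.split(os.path.abspath(folder_path))
  return shutil.make_archive(zip_base, "zip", root_dir=parent, base_dir=name)

# Usage example:
# zip_folder_portable("/content/datasets", "/content/datasets")
# zip_folder_portable("/content/misc", "/content/misc")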

################################################################################
# Define download_file_from_google_drive() function                            #
################################################################################
def download_file_from_google_drive(file_id, destination):
    """
    This function downloads a file from Google Drive and saves it under the
    /content directory (destination is given relative to /content).

    Usage example:
    # https://drive.google.com/file/d/1nt_SoOYmm9uz8J8Bm7A4VKxe4woGXvpo/view?usp=drive_link
    file_id = "1nt_SoOYmm9uz8J8Bm7A4VKxe4woGXvpo"
    destination = "/misc/par/lig.in"
    download_file_from_google_drive(file_id, destination)

    """
    # Import section
    import requests

    # Define variables
    s1 = " "
    h1 = "#"

    # Show message
    print(f"{80*h1}")
    c_line = f"Downloading {destination}..."
    n_line = len(c_line)
    comp = f"{(72 - n_line)*s1}{h1}"
    print(f"{h1} {c_line}",end = "")

    # Define url
    URL = "https://docs.google.com/uc?export=download"

    # Download with requests
    session = requests.Session()
    response = session.get(URL, params={'id': file_id}, stream=True)
    token = get_confirm_token(response)

    # Check token
    if token:
        params = {'id': file_id, 'confirm': token}
        response = session.get(URL, params=params, stream=True)

    # Call save_response_content() function
    # (strip the leading slash to avoid a double slash in the final path)
    destination = "/content/"+destination.lstrip("/")
    save_response_content(response, destination)

    # Show message
    print(f"done!{comp}")
    print(f"{80*h1}")

################################################################################
# Define get_confirm_token() function                                          #
################################################################################
def get_confirm_token(response):
  """
  This function gets the confirmation token from the response cookies.
  It returns None when no download warning cookie is present.
  """
  for key, value in response.cookies.items():
    if key.startswith('download_warning'):
      return value
  return None

################################################################################
# Define save_response_content() function                                      #
################################################################################
def save_response_content(response, destination):
  """
  This function saves the response content.
  """
  CHUNK_SIZE = 32768
  with open(destination, "wb") as f:
    for chunk in response.iter_content(CHUNK_SIZE):
      if chunk: # filter out keep-alive new chunks
        f.write(chunk)
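
################################################################################
# Sketch: save_stream_checked() (hypothetical helper)                          #
################################################################################
# A minimal sketch of the chunked write above with a status check added, so
# that an HTTP error page is not saved as if it were the requested file. The
# function name and the returned byte count are assumptions made for
# illustration; raise_for_status() and iter_content() are methods of
# requests.Response.
def save_stream_checked(response, destination, chunk_size=32768):
  """
  This function saves a streamed response and returns the bytes written.
  """
  # Raise an exception for 4xx and 5xx responses before opening the file
  response.raise_for_status()

  n_bytes = 0
  with open(destination, "wb") as f:
    for chunk in response.iter_content(chunk_size):
      if chunk: # filter out keep-alive new chunks
        f.write(chunk)
        n_bytes += len(chunk)
  return n_bytes

# Usage example:
# response = session.get(URL, params={'id': file_id}, stream=True)
# n_bytes = save_stream_checked(response, "/content/misc/par/lig.in")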

################################################################################
# Define unzip_a_folder() function                                             #
################################################################################
def unzip_a_folder(destination_path):
  """
  This function unzips a previously zipped folder.
  """
  # Import section
  import os

  # Try to unzip a folder
  try:
    curr_dir = "/content/"+destination_path.replace(".zip","")
    !unzip "/content/{destination_path}" -d "{curr_dir}"
    print(f'Folder unzipped successfully to {curr_dir}')

    # Attempt to remove the file
    file_path = "/content/"+destination_path
    os.remove(file_path)
    print(f"File '{file_path}' deleted successfully.")

  except Exception as e:
    # Show error message
    c_line = f"An error occurred: {e}"
    print(f"{c_line}")

################################################################################
# Define make_a_dir() function                                                 #
################################################################################
def make_a_dir(dir_path):
  """
  This function makes a directory (and any missing parent directories) at
  dir_path, typically inside the content folder.
  """
  # Import section
  import os

  # Check if the directory already exists to prevent errors
  if not os.path.exists(dir_path):
    # Create the directory
    os.makedirs(dir_path)
    print(f"Directory '{dir_path}' created.")
  else:
    print(f"Directory '{dir_path}' already exists.")

################################################################################
# Define main() function                                                       #
################################################################################
def main():
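    """
    This function creates the folders, downloads the input files, and runs
    the PDB and SDF downloads before zipping the results.
    """
    # Define receptor_xyz_scheme (uncomment exactly one option)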
    #receptor_xyz_scheme = "receptor"
    #receptor_xyz_scheme = "receptor+water"
    #receptor_xyz_scheme = "receptor+cofactor"
    receptor_xyz_scheme = "receptor+cofactor+water"
    #receptor_xyz_scheme = "receptor+active"
    #receptor_xyz_scheme = "receptor+active+water"
    #receptor_xyz_scheme = "receptor+active+cofactor"
    #receptor_xyz_scheme = "receptor+active+cofactor+water"

    # Call make_a_dir() function
    make_a_dir("/content/datasets")
    make_a_dir("/content/datasets/Test")
    make_a_dir("/content/misc")
    make_a_dir("/content/misc/par")

    # Call download_file_from_google_drive() function
    file_id = "1nt_SoOYmm9uz8J8Bm7A4VKxe4woGXvpo"
    destination = "/misc/par/lig.in"
    download_file_from_google_drive(file_id, destination)

    # Call download_file_from_google_drive() function
    file_id = "1w14sifC2nrMZDInRKf6pqx3a2ihJL6dK"
    destination = "/misc/par/par.in"
    download_file_from_google_drive(file_id, destination)

    # Call download_file_from_google_drive() function
    file_id = "1haLRCndmq58KmKyWQLw8vsMnHimYih78"
    destination = "/datasets/Test/pdb_codes.in"
    download_file_from_google_drive(file_id, destination)

    # Call rcsb_download_pdb() function
    rcsb_download_pdb()

    # Call extract_pdb_coordinates() function
    extract_pdb_coordinates(receptor_xyz_scheme)

    # Call rcsb_download_sdf() function
    rcsb_download_sdf()

    # Call zip_content_folders() function
    zip_content_folders()

# Call main() function
main()

Directory '/content/datasets' already exists.
Directory '/content/datasets/Test' already exists.
Directory '/content/misc' already exists.
Directory '/content/misc/par' already exists.
################################################################################
# Downloading /misc/par/lig.in...done!                                         #
################################################################################
################################################################################
# Downloading /misc/par/par.in...done!                                         #
################################################################################
################################################################################
# Downloading /datasets/Test/pdb_codes.in...done!                              #
################################################################################

Reading PDBs from CSV file: ./datasets/Test/pdb_codes.in...done!

###################