# PaDEL - PaDEL-Descriptor for molecular fingerprints


## Introduction


Description of the SDF file layout:
- Header: The first three lines are the header, containing the molecule name or ID, program information, and comments.
- Counts Line: Gives the number of atoms, bonds, and other structural features.
- Atom Block: This is a list of all the atoms in the molecule. Each line represents one atom and specifies its properties in columns. An atom's index is its position in this block, counting from 1:
    - The atom's x, y, and z coordinates.
    - The element symbol (e.g., C for carbon, O for oxygen).
- Bond Block: This block follows the atoms and defines how they are connected. Each line represents one bond:
    - Column 1: The index number of the first atom in the bond (from the Atom Block).
    - Column 2: The index number of the second atom.
    - Column 3: The bond type (e.g., 1 = single, 2 = double).

After the bond block, additional sections give more information about the molecule.
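
The layout described above can be sketched as a small parser. This is a minimal illustration, assuming the files use the fixed-width V2000 molfile layout (the notebook does not state the version); `parse_molblock` and the two-atom sample record are hypothetical and are not part of PaDEL-Descriptor.

```python
def parse_molblock(text):
    """Parse the header, counts line, atom block and bond block of one record.

    Assumes the fixed-width V2000 layout; other variants are not handled.
    """
    lines = text.splitlines()
    header = lines[:3]                 # name/ID, program information, comments
    counts = lines[3]
    n_atoms = int(counts[0:3])         # first 3 characters: number of atoms
    n_bonds = int(counts[3:6])         # next 3 characters: number of bonds

    atoms = []
    for line in lines[4:4 + n_atoms]:
        # Three 10-character coordinate fields, then the element symbol.
        x, y, z = float(line[0:10]), float(line[10:20]), float(line[20:30])
        atoms.append((line[31:34].strip(), x, y, z))

    bonds = []
    start = 4 + n_atoms
    for line in lines[start:start + n_bonds]:
        # First atom index, second atom index, bond type (1 = single, 2 = double).
        bonds.append((int(line[0:3]), int(line[3:6]), int(line[6:9])))

    return {"name": header[0].strip(), "atoms": atoms, "bonds": bonds}


# Hypothetical sample record: formaldehyde without its hydrogens.
SAMPLE = "\n".join([
    "formaldehyde",
    "  example-program",
    "",
    "  2  1  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    1.2000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0",
    "  1  2  2  0",
    "M  END",
])

mol = parse_molblock(SAMPLE)
print(mol["name"], mol["atoms"], mol["bonds"])
```

For real datasets a cheminformatics toolkit is usually a safer choice than hand-written slicing; the sketch is only meant to make the column layout concrete.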

## Extraction of Gzip files


In [None]:
import os
import gzip
import shutil

# Define paths
source_dir = 'datasets/SDFssmallset/'
dest_dir = 'datasets/SDFssmallset_extracted/'

# Create folder
os.makedirs(dest_dir, exist_ok=True)

# Loop through each file in the source directory
for filename in os.listdir(source_dir):
    # Check if the file is a gzip-compressed SDF file (.sdf.gz)
    if filename.endswith('.sdf.gz'):
        # Construct the full file paths
        source_path = os.path.join(source_dir, filename)
        # Create the new filename by removing the trailing '.gz'
        output_filename = filename[:-3]
        dest_path = os.path.join(dest_dir, output_filename)

        print(f"Extracting {filename} to {dest_dir}...")

        # Open the compressed file and stream its contents to the destination
        with gzip.open(source_path, 'rb') as f_in, open(dest_path, 'wb') as f_out:
            shutil.copyfileobj(f_in, f_out)

print("\nExtraction complete!")
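
As a quick sanity check after extraction, one could count the records in each extracted file. The sketch below assumes the files follow the usual SDF convention of closing every record with a `$$$$` line. `count_sdf_records` is a hypothetical helper, not part of PaDEL-Descriptor, and the demo writes a tiny placeholder file to a temporary folder instead of touching the dataset.

```python
import os
import tempfile


def count_sdf_records(path):
    """Count records in an SDF file by counting '$$$$' delimiter lines."""
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        return sum(1 for line in handle if line.strip() == "$$$$")


# Demo on a placeholder file with two (empty) records.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.sdf")
    with open(demo_path, "w", encoding="utf-8") as handle:
        handle.write("mol_1\n\n\n$$$$\nmol_2\n\n\n$$$$\n")
    demo_count = count_sdf_records(demo_path)

print(f"Records found: {demo_count}")
```

Pointing the same helper at the files in `dest_dir` would show whether any archive produced an empty or truncated output.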

## Pre-Processing