<a href="https://colab.research.google.com/github/Nekostudy88/bio-chem-cv-projects/blob/main/notebooks/3d_voxel_exploration.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
!nvidia-smi

Wed Feb 18 22:05:50 2026       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.82.07              Driver Version: 580.82.07      CUDA Version: 13.0     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  Tesla T4                       Off |   00000000:00:04.0 Off |                    0 |
| N/A   53C    P8             10W /   70W |       0MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+


In [1]:
!pip install --pre deepchem
import deepchem as dc
import rdkit
print(f"DeepChem version: {dc.__version__}")



Instructions for updating:
experimental_relax_shapes is deprecated, use reduce_retracing instead


DeepChem version: 2.8.1.dev


In [2]:
from deepchem.feat import RdkitGridFeaturizer

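# 1. Initialize the featurizer
# box_width / voxel_width = 16 voxels per edge, i.e. a 16x16x16 grid
# around the binding site (both widths are in angstroms)
featurizer = RdkitGridFeaturizer(
    voxel_width=1.0,
    box_width=16.0,
    # NOTE: confirm these names against the RdkitGridFeaturizer docs;
    # 'element' and 'aromatic' may not be supported feature types
    feature_types=['element', 'hbond', 'aromatic']
)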

# 2. Get sample data (PDBbind fragment)
# In a real lab, these would be your .pdb and .sdf files
tasks, datasets, transformers = dc.molnet.load_pdbbind(
    featurizer=featurizer,
    splitter='random',
    subset='mini'
)

train_dataset, valid_dataset, test_dataset = datasets

# 3. Inspect the "3D image"
# One sample should have shape (X, Y, Z, Features); train_dataset.X adds a leading sample axis
X_sample = train_dataset.X[0]
print(f"3D Voxel Grid Shape: {X_sample.shape}")



3D Voxel Grid Shape: (1,)
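
The printed shape `(1,)` is not what the box settings imply. As a rough sanity check, the sketch below computes the grid shape one would expect from `box_width` and `voxel_width`. The helper `expected_grid_shape` and the channel count of 3 are illustrative assumptions, not part of DeepChem, and the real channel count depends on which feature types the featurizer actually emits.

```python
def expected_grid_shape(box_width, voxel_width, n_channels):
    """Return (X, Y, Z, C) for a cubic box split into cubic voxels."""
    voxels_per_edge = int(round(box_width / voxel_width))
    return (voxels_per_edge,) * 3 + (n_channels,)


# n_channels=3 is a placeholder; the true value depends on the featurizer
shape = expected_grid_shape(box_width=16.0, voxel_width=1.0, n_channels=3)
print(shape)  # (16, 16, 16, 3)
```

Any per-sample shape with fewer than four axes, such as `(1,)`, therefore points to a featurization problem rather than a plotting problem.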


In [5]:
import matplotlib.pyplot as plt

# 1. Inspect the full feature array before indexing into it
print(f"Type: {type(train_dataset.X)}")
print(f"Shape of the whole X array: {train_dataset.X.shape}")

# 2. Check whether the first entry is a real voxel grid
if train_dataset.X.ndim > 0:
    X_sample = train_dataset.X[0]

    # DeepChem sometimes stores samples in an 'object' array,
    # so verify the per-sample shape explicitly
    print(f"Actual shape of X_sample: {X_sample.shape}")

    if X_sample.ndim == 4:
        # Dimensions verified: plot the middle X-slice of channel 0
        plt.imshow(X_sample[8, :, :, 0], cmap='viridis')
        plt.title("Middle Slice of Protein-Ligand Voxel Grid")
        plt.colorbar(label="Atomic Density")
        plt.show()
    else:
        print("Error: X_sample is not 4D. It might have failed to featurize.")

Type: <class 'numpy.ndarray'>
Shape of the whole X array: (154, 1)
Actual shape of X_sample: (1,)
Error: X_sample is not 4D. It might have failed to featurize.
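
Checking only the first entry can hide how widespread the problem is. A minimal sketch of a dataset-wide check follows: it tallies the shape of every sample so that a uniform `(1,)` result is obvious at a glance. The helper name `summarize_features` and the synthetic `fake_X` array are hypothetical stand-ins; in the notebook one would pass `train_dataset.X` instead.

```python
from collections import Counter

import numpy as np


def summarize_features(X):
    """Tally per-sample shapes in a feature array (object arrays included)."""
    X = np.asarray(X)
    shapes = Counter(np.shape(x) for x in X)
    return {"dtype": str(X.dtype), "shape": X.shape, "sample_shapes": dict(shapes)}


# Synthetic stand-in that mimics the (154, 1) array printed above
fake_X = np.zeros((154, 1))
summary = summarize_features(fake_X)
print(summary)
```

If every sample reports the same non-4D shape, the featurizer configuration is a more likely suspect than any individual complex.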


In [6]:
import numpy as np
import matplotlib.pyplot as plt

# 1. Find the first entry that is a full 4D grid (X, Y, Z, Channels)
#    rather than a length-1 placeholder
valid_index = -1
for i, entry in enumerate(train_dataset.X):
    if isinstance(entry, np.ndarray) and entry.ndim == 4:
        valid_index = i
        break

if valid_index != -1:
    X_sample = train_dataset.X[valid_index]
    print(f"Success! Found a valid 4D voxel at index {valid_index}")
    print(f"Voxel Shape: {X_sample.shape}")

    # 2. Plot the middle X-slice of channel 0 (magma for better contrast)
    plt.imshow(X_sample[8, :, :, 0], cmap='magma')
    plt.title(f"Protein-Ligand Voxel (Index {valid_index})")
    plt.show()
else:
    # X has shape (154, 1), so every X[i] has shape (1,) and this loop cannot match
    print("Still no 4D data. Let's try loading more data.")

Still no 4D data. Let's try loading more data.
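
For reference, here is a toy illustration of what a valid 4D voxel grid looks like and how the middle-slice indexing used in the cells above behaves on one. It is a NumPy-only sketch with random coordinates, not DeepChem's implementation: the function `voxelize`, the origin-centered box, and the three integer channel ids are all assumptions made for the example.

```python
import numpy as np


def voxelize(coords, channels, n_channels, box_width=16.0, voxel_width=1.0):
    """Count atoms per voxel in a cube centered on the origin.

    coords: (N, 3) positions in angstroms; channels: (N,) integer channel ids.
    """
    n = int(round(box_width / voxel_width))
    grid = np.zeros((n, n, n, n_channels), dtype=np.float32)
    # Shift so the box spans [0, box_width), then convert to voxel indices
    idx = np.floor((np.asarray(coords) + box_width / 2.0) / voxel_width).astype(int)
    # Drop atoms that fall outside the box
    inside = np.all((idx >= 0) & (idx < n), axis=1)
    for (i, j, k), c in zip(idx[inside], np.asarray(channels)[inside]):
        grid[i, j, k, c] += 1.0
    return grid


rng = np.random.default_rng(0)
coords = rng.normal(scale=3.0, size=(200, 3))   # fake atom positions
channels = rng.integers(0, 3, size=200)         # fake channel assignments
grid = voxelize(coords, channels, n_channels=3)
print(grid.shape)    # (16, 16, 16, 3)

middle = grid[grid.shape[0] // 2, :, :, 0]      # same indexing as X_sample[8, :, :, 0]
print(middle.shape)  # (16, 16)
```

An array like `grid` passes the `ndim == 4` check, and `middle` is the 2D slice that `plt.imshow` expects.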
