# 🧪 DeepChem Dataloaders Tutorial

This tutorial demonstrates how to use DeepChem with PyTorch-style `DataLoader`s.  
We'll load a molecular dataset, featurize it using circular fingerprints, and process it in batches for training using `torch.utils.data.DataLoader`.


In [None]:
import deepchem as dc
import pandas as pd
import numpy as np
import torch
from torch.utils.data import DataLoader


## 📥 Step 1: Load the Delaney Dataset

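The Delaney (ESOL) dataset contains measured aqueous solubility values for small molecules. We'll load it with DeepChem's `dc.molnet.load_delaney` utility, which returns the task names, the train/validation/test splits, and the transformers applied to the data.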


In [None]:
# Load Delaney dataset with CircularFingerprint featurizer
featurizer = dc.feat.CircularFingerprint(radius=2, size=1024)
tasks, datasets, transformers = dc.molnet.load_delaney(featurizer=featurizer, splitter='random')

train_dataset, valid_dataset, test_dataset = datasets
print("Number of training examples:", len(train_dataset))
print("Feature shape:", train_dataset.X.shape)


## 🧰 Step 2: Create a Custom Collate Function

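We'll define a simple collate function that takes a list of DeepChem samples, each an `(X, y, w, id)` tuple of features, label, weight, and identifier, and stacks the numeric fields into PyTorch tensors. The identifiers (SMILES strings here) stay in a plain Python list.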


In [None]:
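def collate_fn(batch):
    # batch is a list of (X, y, w, id) samples; stack each field along a new batch axis.
    # Stacking with NumPy first avoids the slow list-of-arrays path in torch.tensor.
    X = torch.as_tensor(np.stack([example[0] for example in batch]), dtype=torch.float32)
    y = torch.as_tensor(np.stack([example[1] for example in batch]), dtype=torch.float32)
    w = torch.as_tensor(np.stack([example[2] for example in batch]), dtype=torch.float32)
    ids = [example[3] for example in batch]  # identifiers stay a Python list
    return X, y, w, ids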
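As a quick sanity check, a collate function like the one above can be exercised on a few synthetic samples before touching real data. The sketch below assumes each sample is an `(X, y, w, id)` tuple with a 1024-bit feature vector and single-task label and weight arrays. The names `make_fake_sample` and `collate_samples` are hypothetical, and the random values are stand-ins rather than DeepChem output.

```python
import numpy as np
import torch

def make_fake_sample(rng, n_bits=1024):
    """Hypothetical stand-in for one DeepChem sample: (X, y, w, id)."""
    X = rng.integers(0, 2, size=n_bits).astype(np.float64)  # binary fingerprint
    y = rng.normal(size=1)   # one regression label
    w = np.ones(1)           # one per-task weight
    sample_id = "C" * int(rng.integers(1, 6))  # placeholder SMILES-like string
    return X, y, w, sample_id

def collate_samples(batch):
    # Same idea as collate_fn: stack each numeric field, keep ids as a list
    X = torch.as_tensor(np.stack([s[0] for s in batch]), dtype=torch.float32)
    y = torch.as_tensor(np.stack([s[1] for s in batch]), dtype=torch.float32)
    w = torch.as_tensor(np.stack([s[2] for s in batch]), dtype=torch.float32)
    ids = [s[3] for s in batch]
    return X, y, w, ids

rng = np.random.default_rng(0)
batch = [make_fake_sample(rng) for _ in range(4)]
X, y, w, ids = collate_samples(batch)
print(X.shape, y.shape, w.shape, len(ids))
```

With four samples, the feature tensor should have shape `(4, 1024)`, the label and weight tensors shape `(4, 1)`, and `ids` should hold four strings. If the shapes differ on real data, the sample layout is probably not what the collate function expects.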


## 📦 Step 3: Load the Dataset in Batches

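We'll now wrap `train_dataset` with `make_pytorch_dataset`, which exposes it as a PyTorch iterable dataset yielding one `(X, y, w, id)` sample at a time. Passing that to PyTorch's `DataLoader` together with our custom collate function lets us iterate over the training data in batches.


In [None]:
# Expose the DeepChem dataset as a PyTorch iterable dataset (one sample per item)
torch_dataset = train_dataset.make_pytorch_dataset(epochs=1, deterministic=True)
train_loader = DataLoader(torch_dataset, batch_size=4, collate_fn=collate_fn)

for X_batch, y_batch, w_batch, ids_batch in train_loader:
    print("Batch X shape:", X_batch.shape)
    print("Batch y shape:", y_batch.shape)
    break  # just print one batch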
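
To show where such batches go next, here is a minimal training-loop sketch. It assumes batches of features, labels, and weights shaped like the ones printed above, and it runs on synthetic data so that it is self-contained. The single `torch.nn.Linear` layer, the learning rate, and the weight-masked mean squared error are illustrative choices, not part of the original tutorial or a DeepChem API.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic stand-in for featurized molecules: 64 samples, 1024 binary features
X = torch.randint(0, 2, (64, 1024)).float()
y = torch.randn(64, 1)   # placeholder regression labels
w = torch.ones(64, 1)    # per-sample weights (1 = label present)

loader = DataLoader(TensorDataset(X, y, w), batch_size=16, shuffle=True)
model = torch.nn.Linear(1024, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

losses = []
for epoch in range(5):
    total = 0.0
    for X_batch, y_batch, w_batch in loader:
        optimizer.zero_grad()
        pred = model(X_batch)
        # Weighted MSE: samples with zero weight do not contribute to the loss
        loss = (w_batch * (pred - y_batch) ** 2).sum() / w_batch.sum()
        loss.backward()
        optimizer.step()
        total += loss.item() * len(X_batch)
    losses.append(total / len(X))
    print(f"epoch {epoch}: mean loss {losses[-1]:.4f}")
```

Multiplying the squared error by the weights mirrors how DeepChem uses `w` to mask missing labels in multitask datasets. For the single-task Delaney data the weights are typically all ones, so this reduces to an ordinary mean squared error.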

## ✅ Tutorial Complete

You now know how to:
- Load and featurize a dataset using DeepChem
- Wrap the dataset in a `DataLoader`
- Use a custom collate function to work with batches

From here, each batch can be fed to a PyTorch model inside a standard training loop.
