# Utility Functions for Deep Learning Workflows

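This notebook contains utility functions and examples for common deep learning tasks in PyTorch. It demonstrates:

- Adding the project root to the Python path
- Inspecting the current Jupyter kernel connection
- Enabling module auto-reloading during development
- Loading and preprocessing data with `get_preprocessed_dataloaders`
- Logging images, a model graph, and metrics to TensorBoard

## Table of Contents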
1. [Kernel Connection Information](#kernel-connection-information)
2. [Module Auto-reloading Setup](#module-auto-reloading-setup)  
3. [Example Usage: Data Loading and Preprocessing](#example-usage-data-loading-and-preprocessing)


The following cells provide practical examples of these utilities in action.


In [1]:
import sys
import pathlib

# Only needed for this utility notebook, because it lives in the utils folder:
# add the project root directory to the Python path so that modules under the
# project root can be imported.
# In Jupyter notebooks __file__ is not defined, so use the current working
# directory (normally the notebook's own directory) instead.
current_dir = pathlib.Path.cwd()

# The parent directory is the project root
project_root = str(current_dir.parent)

# Add it to the Python path if it is not already there
if project_root not in sys.path:
    sys.path.append(project_root)
    print(f"Added {project_root} to Python path")

Added d:\Development\Python\PyTorch to Python path
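Taking the parent of the working directory relies on the notebook running exactly one level below the project root. A sketch of a more robust alternative walks upward until it finds a directory that looks like the root. It assumes the root can be recognized by a marker such as `.git` or `pyproject.toml`; both the markers and the helper name `find_project_root` are hypothetical and not part of `ml_utils`.

```python
import tempfile
from pathlib import Path


def find_project_root(start, markers=(".git", "pyproject.toml")):
    """Return the first directory at or above `start` that contains a marker.

    The default markers are an assumption; use whatever file reliably
    identifies the project root in your repository.
    """
    start = Path(start).resolve()
    for candidate in (start, *start.parents):
        if any((candidate / marker).exists() for marker in markers):
            return candidate
    raise FileNotFoundError(f"No project marker {markers} found above {start}")


# Demo on a throwaway directory tree: <tmp>/pyproject.toml and <tmp>/utils/
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "pyproject.toml").touch()
    (root / "utils").mkdir()

    found = find_project_root(root / "utils")
    matches = found == root.resolve()
    print(matches)  # True

# In the notebook you would instead call something like:
#   project_root = str(find_project_root(pathlib.Path.cwd()))
# and append it to sys.path as in the cell above.
```

Searching for a marker keeps working if the notebook is moved deeper into the tree or Jupyter is started from another directory, at the cost of requiring a file that reliably identifies the root.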


## Kernel Connection Information

The following code cell retrieves and displays information about the current Jupyter kernel connection. It:

1. Accesses the IPython kernel configuration
2. Reads the connection file to extract port information
3. Prints the port number where the kernel is running

This is useful for debugging connection issues or when you need to connect to the kernel programmatically.


In [2]:
import json
from IPython.core.getipython import get_ipython

ipython = get_ipython()
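# Check membership with `in`: traitlets' Config creates empty sections on attribute
# access, so hasattr() can report True even when no connection file is configured.
if (ipython is not None
        and 'IPKernelApp' in ipython.config
        and 'connection_file' in ipython.config['IPKernelApp']):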
    connection_file = ipython.config['IPKernelApp']['connection_file']
    with open(connection_file) as f:
        config = json.load(f)
    print(f"Kernel is running on port: {config['shell_port']}")
else:
    print("Not running in an IPython kernel or kernel information not available")

Kernel is running on port: 57954
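The connection file read above is plain JSON. It lists the IP address, the transport, and one port per kernel channel, and a client such as `jupyter console --existing <connection-file>` uses it to attach to the running kernel. As a sketch of what "connecting programmatically" starts from, the helper below turns such a file into ZeroMQ addresses. The name `kernel_addresses` is hypothetical, it only handles the `tcp` transport, and the sample values are made up for illustration (the `key` field a real file contains is omitted).

```python
import json
import tempfile
from pathlib import Path

# Channels described in a Jupyter kernel connection file
CHANNELS = ("shell", "iopub", "stdin", "control", "hb")


def kernel_addresses(connection_file):
    """Return a {channel: ZeroMQ address} mapping for a TCP kernel connection file."""
    with open(connection_file) as f:
        info = json.load(f)
    if info.get("transport", "tcp") != "tcp":
        raise ValueError("this sketch only handles the 'tcp' transport")
    return {name: f"tcp://{info['ip']}:{info[name + '_port']}" for name in CHANNELS}


# Illustrative connection-file contents (made-up values)
sample = {
    "transport": "tcp",
    "ip": "127.0.0.1",
    "shell_port": 57954,
    "iopub_port": 57955,
    "stdin_port": 57956,
    "control_port": 57957,
    "hb_port": 57958,
}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "kernel-demo.json"
    path.write_text(json.dumps(sample))
    addresses = kernel_addresses(path)

print(addresses["shell"])  # tcp://127.0.0.1:57954
```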


## Module Auto-reloading Setup

The following code cell demonstrates how to set up module auto-reloading in Jupyter. It:

1. Imports `get_dataloaders`, `torch`, and `TensorDataset` from the project's `ml_utils` module
2. Enables the autoreload extension with `%load_ext autoreload`
3. Configures it to automatically reload all modules before executing code with `%autoreload 2`

This is particularly useful during development when you're frequently making changes to imported modules and want those changes to be reflected without restarting the kernel.



In [2]:
from ml_utils import get_dataloaders, torch, TensorDataset

# Enable autoreload extension
%load_ext autoreload
%autoreload 2

# This will automatically reload modules before executing code
# Useful during development when you're frequently changing code in imported modules
print("Autoreload enabled: Changes to imported modules will be reloaded automatically")


Autoreload enabled: Changes to imported modules will be reloaded automatically
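`%autoreload 2` is IPython magic, so it does not run in a plain Python script. To see what it automates, the standard-library sketch below edits a throwaway module (`demo_helpers` is hypothetical) and reloads it by hand with `importlib.reload`. It also shows the catch that autoreload works around: a reference taken before the reload, as with `from module import name`, still points at the old function. According to IPython's autoreload documentation, the extension also upgrades such references in place, which plain `reload` does not do.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Write a hypothetical helper module to a temporary directory
    module_path = Path(tmp) / "demo_helpers.py"
    module_path.write_text("def greet():\n    return 'v1'\n")
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()

    import demo_helpers
    old_greet = demo_helpers.greet  # like `from demo_helpers import greet`
    before = demo_helpers.greet()

    # "Edit" the module on disk. The new source has a different size, so the
    # cached bytecode is detected as stale and the file is recompiled.
    module_path.write_text("def greet():\n    return 'version 2'\n")
    importlib.reload(demo_helpers)
    after = demo_helpers.greet()
    stale = old_greet()  # the old reference was not updated by reload()

    # Clean up so the temporary module does not linger in this session
    sys.path.remove(tmp)
    del sys.modules["demo_helpers"]

print(before, after, stale)  # v1 version 2 v1
```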


## Example Usage: Data Loading and Preprocessing

This example demonstrates how to use the `get_preprocessed_dataloaders` function to create data loaders with automatic preprocessing. It shows:

- Creating synthetic training and validation datasets
- Defining a preprocessing function for data normalization 
- Setting up DataLoaders with automatic preprocessing
- Inspecting the resulting batch sizes and data shapes

The code illustrates proper usage of the data loading utilities with preprocessing capabilities.




In [7]:
"""
Example script demonstrating how to use get_preprocessed_dataloaders
"""
from ml_utils import get_preprocessed_dataloaders, torch, TensorDataset

# Create some sample data:
# 100 training and 20 validation samples, each with 5 features and a binary label
x_train = torch.randn(100, 5)
y_train = torch.randint(0, 2, (100,))

x_valid = torch.randn(20, 5)
y_valid = torch.randint(0, 2, (20,))

# Create TensorDataset objects
train_ds = TensorDataset(x_train, y_train)
valid_ds = TensorDataset(x_valid, y_valid)

# Define a preprocessing function that standardizes the inputs
def preprocess(x, y):
    # Scale each feature to zero mean and unit variance, using the statistics of
    # the tensor passed in (if the loader applies this per batch, the current batch)
    mean = x.mean(0, keepdim=True)
    std = x.std(0, keepdim=True) + 1e-7  # small epsilon avoids division by zero
    x_normalized = (x - mean) / std
    return x_normalized, y

# Create preprocessed dataloaders with a batch size of 32
train_loader, valid_loader = get_preprocessed_dataloaders(
    train_ds=train_ds,
    valid_ds=valid_ds,
    bs=32,
    preprocess=preprocess
)

print(f"Number of batches in training loader: {len(train_loader)}")
print(f"Number of batches in validation loader: {len(valid_loader)}")

# Get one batch from the training loader
for X_batch, y_batch in train_loader:
    print(f"Batch shape: X={X_batch.shape}, y={y_batch.shape}")
    break

Number of batches in training loader: 4
Number of batches in validation loader: 1
Batch shape: X=torch.Size([32, 5]), y=torch.Size([32])
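The implementation of `get_preprocessed_dataloaders` lives in `ml_utils` and is not shown here. One common way to build such a helper, similar to the `WrappedDataLoader` pattern in PyTorch's "What is torch.nn really?" tutorial, is to wrap a `DataLoader` and call `preprocess` on every batch it yields. The sketch below assumes that design; `PreprocessedLoader` and `make_preprocessed_loaders` are hypothetical names. It also computes the normalization statistics once from the training set, so every batch and the validation data are scaled the same way instead of each batch using its own mean and standard deviation.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


class PreprocessedLoader:
    """Hypothetical wrapper: applies `func` to every (x, y) batch a DataLoader yields."""

    def __init__(self, dl, func):
        self.dl = dl
        self.func = func

    def __len__(self):
        return len(self.dl)

    def __iter__(self):
        for xb, yb in self.dl:
            yield self.func(xb, yb)


def make_preprocessed_loaders(train_ds, valid_ds, bs, preprocess):
    # Hypothetical stand-in for ml_utils.get_preprocessed_dataloaders
    train_dl = DataLoader(train_ds, batch_size=bs, shuffle=True)
    valid_dl = DataLoader(valid_ds, batch_size=bs)
    return PreprocessedLoader(train_dl, preprocess), PreprocessedLoader(valid_dl, preprocess)


x_train, y_train = torch.randn(100, 5), torch.randint(0, 2, (100,))
x_valid, y_valid = torch.randn(20, 5), torch.randint(0, 2, (20,))

# Statistics come from the training set only and are reused for every batch
mean = x_train.mean(0, keepdim=True)
std = x_train.std(0, keepdim=True) + 1e-7


def normalize(x, y):
    return (x - mean) / std, y


train_loader, valid_loader = make_preprocessed_loaders(
    TensorDataset(x_train, y_train), TensorDataset(x_valid, y_valid),
    bs=32, preprocess=normalize,
)

print(len(train_loader), len(valid_loader))  # 4 1
xb, yb = next(iter(train_loader))
print(xb.shape, yb.shape)  # torch.Size([32, 5]) torch.Size([32])
```

With 100 training samples and a batch size of 32, the last training batch holds only 4 samples; using dataset-level statistics avoids normalizing that small batch with noisy estimates of its own.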


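## Example Usage: TensorBoard Visualization

This example uses the TensorBoard helpers from `ml_utils` to log a batch of Fashion-MNIST images, the graph of a small CNN, and simulated training and validation losses to `runs/visualization_example`.


In [8]: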
from ml_utils import (
    torch, 
    torchvision, 
    transforms,     
    get_tensorboard_writer,
    add_images_to_tensorboard,
    matplotlib_imshow,
    log_model_graph,
    log_scalars,
    close_writer
)

# Sample model for demonstration
class SimpleModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = torch.nn.Conv2d(1, 6, 5)
        self.pool = torch.nn.MaxPool2d(2, 2)
        self.conv2 = torch.nn.Conv2d(6, 16, 5)
        self.fc1 = torch.nn.Linear(16 * 4 * 4, 120)
        self.fc2 = torch.nn.Linear(120, 84)
        self.fc3 = torch.nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(torch.nn.functional.relu(self.conv1(x)))
        x = self.pool(torch.nn.functional.relu(self.conv2(x)))
        x = x.view(-1, 16 * 4 * 4)
        x = torch.nn.functional.relu(self.fc1(x))
        x = torch.nn.functional.relu(self.fc2(x))
        x = self.fc3(x)
        return x


# Create a TensorBoard writer
writer = get_tensorboard_writer('runs/visualization_example')

# Load some example data (Fashion-MNIST), scaled to [-1, 1]
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))
])

# Download the training data
training_set = torchvision.datasets.FashionMNIST(
    '../data', download=True, train=True, transform=transform
)

# Create data loader
training_loader = torch.utils.data.DataLoader(
    training_set, batch_size=4, shuffle=True, num_workers=2
)

# Get a batch of training data
dataiter = iter(training_loader)
images, labels = next(dataiter)

# Add images to TensorBoard
add_images_to_tensorboard(writer, images, 'Fashion-MNIST Samples')

# Initialize model
model = SimpleModel()

# Log model graph
log_model_graph(writer, model, images)
    
# Log some dummy training metrics
for epoch in range(5):
    # Simulate training and validation losses
    train_loss = 1.0 / (epoch + 1)
    val_loss = 1.2 / (epoch + 1)
    
    # Log metrics
    log_scalars(
        writer, 
        'Training vs. Validation Loss',
        {'Training': train_loss, 'Validation': val_loss},
        epoch
    )

# Close the writer
close_writer(writer)

print("TensorBoard data has been logged to 'runs/visualization_example'")
print("To view it, run: tensorboard --logdir=runs")
print("Then open http://localhost:6006/ in your browser")



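The `ml_utils` logging helpers are not shown in this notebook. Assuming they wrap PyTorch's `torch.utils.tensorboard.SummaryWriter` (an assumption, not confirmed by the source), the same three kinds of records can be written directly with `add_images`, `add_graph`, and `add_scalars`. This sketch uses random tensors in place of Fashion-MNIST and a tiny hypothetical model so it runs without downloading data; it needs the `tensorboard` package installed.

```python
import tempfile
from pathlib import Path

import torch
from torch import nn
from torch.utils.tensorboard import SummaryWriter


def log_demo(log_dir):
    writer = SummaryWriter(log_dir)

    # Stand-in for a Fashion-MNIST batch: 4 grayscale 28x28 images in [-1, 1]
    images = torch.rand(4, 1, 28, 28) * 2 - 1
    # Undo Normalize((0.5,), (0.5,)) so pixel values are back in [0, 1]
    writer.add_images("Fashion-MNIST Samples", images / 2 + 0.5, global_step=0)

    # Trace a tiny (hypothetical) model on the batch and record its graph
    model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
    writer.add_graph(model, images)

    # add_scalars draws both curves on one chart; each key gets its own sub-run
    for epoch in range(5):
        writer.add_scalars(
            "Loss",
            {"Training": 1.0 / (epoch + 1), "Validation": 1.2 / (epoch + 1)},
            epoch,
        )

    writer.close()


with tempfile.TemporaryDirectory() as log_dir:
    log_demo(log_dir)
    event_files = sorted(
        p.relative_to(log_dir).as_posix()
        for p in Path(log_dir).rglob("events.out.tfevents*")
    )

print(event_files)
```

Because `add_scalars` stores each curve as a separate sub-run (for example `Loss_Training`), pointing `tensorboard --logdir` at the parent directory shows both curves on the same chart.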
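The `16 * 4 * 4` input size of `fc1` in `SimpleModel` follows from the layer arithmetic: a 5x5 convolution with stride 1 and no padding shrinks each spatial side by 4, and each 2x2 max-pool with stride 2 halves it, so a 28x28 Fashion-MNIST image goes 28 → 24 → 12 → 8 → 4, with 16 channels after `conv2`. A quick check that re-declares only the convolutional stages (a sketch; it does not import `SimpleModel` itself):

```python
import torch
from torch import nn

# The two conv -> ReLU -> pool stages of SimpleModel, re-declared for the check
features = nn.Sequential(
    nn.Conv2d(1, 6, 5), nn.ReLU(), nn.MaxPool2d(2, 2),
    nn.Conv2d(6, 16, 5), nn.ReLU(), nn.MaxPool2d(2, 2),
)

# One dummy 28x28 grayscale image (batch size 1)
out = features(torch.zeros(1, 1, 28, 28))
print(out.shape)       # torch.Size([1, 16, 4, 4])
print(out[0].numel())  # 256, i.e. 16 * 4 * 4 = fc1's in_features
```

If the input size ever changes, `x.view(-1, 16 * 4 * 4)` can silently fold spatial positions into the batch dimension; `torch.flatten(x, 1)` keeps the batch dimension fixed, so a mismatch surfaces as a shape error in `fc1` instead.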