# PCAM Extraction Notebook - Quick Start Guide

This Colab notebook extracts the preprocessed PCAM dataset after it has been zipped (by Preprocessing_PCAM.ipynb) and uploaded to Google Drive. It first mounts your Google Drive so the notebook can read files from it. It then unzips the dataset either to local Colab storage (fast but temporary) or back to Google Drive (slower but persistent).
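
Because the two destinations trade speed against persistence, it can help to know how large the extracted data will be before choosing one. The sketch below is an assumption, not part of the original notebook: the helper name `summarize_zip` is hypothetical. It reads only the archive's table of contents with the standard `zipfile` module, so nothing is extracted.

```python
import zipfile


def summarize_zip(zip_path: str) -> tuple[int, int]:
    """Return (number_of_files, total_uncompressed_bytes) for a ZIP archive.

    Hypothetical helper: it reads the archive's member list only,
    so no files are written to disk.
    """
    with zipfile.ZipFile(zip_path, "r") as zip_ref:
        # Skip directory entries; they hold no file data
        files = [info for info in zip_ref.infolist() if not info.is_dir()]
        # file_size is the uncompressed size of each member in bytes
        return len(files), sum(info.file_size for info in files)
```

Calling it on `zip_path` before running the extraction cell gives a rough idea of how much Colab disk space or Drive quota the extracted dataset will occupy.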

To use this notebook with any ZIP file, change the `zip_path` variable to point to your ZIP file's location in Google Drive, and set `extract_path` to your desired extraction location (it is assigned separately in each branch of the `if EXTRACT_TO_LOCAL` block). The `EXTRACT_TO_LOCAL` flag controls where the data goes: set it to `True` to extract to temporary Colab storage (faster, but deleted when the session ends) or to `False` to extract to Google Drive (slower, but permanent).
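
The same steps can be wrapped in a reusable function so that only the two paths change between datasets. This is a minimal sketch, assuming any standard ZIP archive; the name `extract_zip` is hypothetical and is not defined in the notebook.

```python
import os
import zipfile


def extract_zip(zip_path: str, extract_path: str) -> list[str]:
    """Extract every member of zip_path into extract_path.

    Hypothetical helper: returns the archive's member names so the
    caller can sanity-check what was extracted.
    """
    # Fail early with a clear message if the ZIP path is wrong
    if not os.path.isfile(zip_path):
        raise FileNotFoundError(f"ZIP file not found: {zip_path}")
    # Create the destination folder (and any missing parents) if needed
    os.makedirs(extract_path, exist_ok=True)
    with zipfile.ZipFile(zip_path, "r") as zip_ref:
        zip_ref.extractall(extract_path)
        return zip_ref.namelist()
```

Checking for the file up front turns a mistyped Drive path into an explicit error message before any directories are created.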

This workflow is particularly useful when you have preprocessed a large dataset locally (using the Preprocessing_PCAM notebook), zipped the results to save space, uploaded the ZIP file to Google Drive for cloud access, and now need to extract it in Colab for model training or analysis.
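
How Preprocessing_PCAM.ipynb zips its output is not shown here. As an assumption, one way to produce such an archive is Python's standard `shutil.make_archive`. The helper name `zip_directory` and any folder names you pass to it are hypothetical placeholders.

```python
import shutil
from pathlib import Path


def zip_directory(src_dir: str, zip_base: str) -> str:
    """Zip the contents of src_dir into <zip_base>.zip and return its path.

    Hypothetical helper: shutil.make_archive appends the ".zip" suffix itself.
    """
    src = Path(src_dir)
    if not src.is_dir():
        raise NotADirectoryError(f"Not a directory: {src_dir}")
    # root_dir=src makes member paths relative to src_dir, so the archive
    # unpacks into the same folder layout under any extraction path
    return shutil.make_archive(zip_base, "zip", root_dir=str(src))
```

Write the archive outside `src_dir` (for example, `zip_directory("pcam_processed", "PCam_Extracted")` from the parent folder) so the ZIP file does not end up inside the folder being zipped.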

```python
# Import necessary libraries
# drive (from google.colab) provides the Drive mounting function
from google.colab import drive
# os provides operating-system and file-system operations
import os
# Path (from pathlib) provides object-oriented file path handling
from pathlib import Path
# zipfile provides reading and writing of ZIP archives
import zipfile

# Mount Google Drive to the Colab environment at /content/drive/
drive.mount('/content/drive')

print("✓ Google Drive mounted successfully!")  # Print success message
print("Available at: /content/drive/MyDrive/")  # Default path where Drive files appear in Colab
```

```text
Mounted at /content/drive
✓ Google Drive mounted successfully!
Available at: /content/drive/MyDrive/
```


```python
# Define the full path to the ZIP file containing the PCAM dataset in Google Drive
### YOU WILL NEED TO CHANGE THIS ###
zip_path = "/content/drive/MyDrive/Sekeh_Lab/Sara_Project/PCam_Extracted_100k.zip"

# Choose where to extract: local Colab storage (fast, but deleted after the session) or Drive (persistent)
EXTRACT_TO_LOCAL = False  # True = local Colab storage, False = Google Drive

# Check whether the user wants to extract to local Colab storage
if EXTRACT_TO_LOCAL:
    # Set the extraction path to local, temporary Colab storage
    extract_path = "/content/pcam_data"
    # Inform the user that data is being extracted to local storage
    print("Extracting to LOCAL Colab storage (faster, temporary)")
    # Warn the user that local storage is deleted when the Colab session ends
    print("⚠️  WARNING: Data will be deleted when session ends")
else:  # User wants to extract to Google Drive (persistent storage)
    # Set the extraction path to a Google Drive location (slower but persistent)
    # Change this to your own Drive folder if needed
    extract_path = "/content/drive/MyDrive/Sekeh_Lab/Sara_Project/Datasets/PCAM"
    # Create the extraction directory and any missing parent directories
    os.makedirs(extract_path, exist_ok=True)

# Open the ZIP file in read mode as a context manager
with zipfile.ZipFile(zip_path, 'r') as zip_ref:
    # Extract all contents of the ZIP file to the chosen extraction path
    zip_ref.extractall(extract_path)

# Print a confirmation message showing where the files were extracted
print(f"Extraction completed to: {extract_path}")
```

```text
Extraction completed to: /content/drive/MyDrive/Sekeh_Lab/Sara_Project/Datasets/PCAM
```
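
Extraction to Drive is slow, and a Colab session that times out partway through can leave an incomplete copy behind. A quick post-extraction check compares every file in the archive with what landed on disk. This is a minimal sketch, assuming the archive's member names are plain relative paths (which `extractall` writes unchanged); the name `verify_extraction` is hypothetical.

```python
import os
import zipfile


def verify_extraction(zip_path: str, extract_path: str) -> list[str]:
    """Return archive members that are missing or have the wrong size on disk.

    Hypothetical helper: an empty list means every file in the ZIP was
    found under extract_path with its expected uncompressed size.
    """
    problems = []
    with zipfile.ZipFile(zip_path, "r") as zip_ref:
        for info in zip_ref.infolist():
            if info.is_dir():
                continue  # directory entries carry no data to compare
            target = os.path.join(extract_path, info.filename)
            # Flag files that are absent or whose size differs from the archive
            if not os.path.isfile(target) or os.path.getsize(target) != info.file_size:
                problems.append(info.filename)
    return problems
```

Running `verify_extraction(zip_path, extract_path)` after the extraction cell and getting an empty list is a reasonable sign that the dataset is complete; any names it returns can be re-extracted individually with `ZipFile.extract`.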
