📓 Notebooked

Transform Jupyter Notebooks into Production-Ready ML Pipelines/Scripts

Notebooked is a Python CLI tool that converts tagged Jupyter notebooks into production-ready training, inference, and deployment scripts for AWS SageMaker, Azure ML, and Google Cloud Vertex AI.

Python 3.8+ · License: MIT


🌟 Features

  • Multi-Cloud Support: Deploy to AWS SageMaker, Azure ML, or Google Cloud Vertex AI
  • Automated Code Generation: Convert notebooks to train.py, inference.py, and preprocess.py
  • Intelligent Parsing: Extracts code using cell tags or auto-detection
  • MLflow Integration: Built-in experiment tracking and model versioning
  • CI/CD Ready: Generate GitHub Actions workflows for automated deployment
  • Provider Agnostic: Abstract interface for easy cloud provider switching
  • Auto-Dependency Detection: Uses pipreqs to generate requirements.txt

🏗️ Architecture

High-Level Overview

graph TB
    subgraph Input
        NB[Jupyter Notebook<br/>with Tagged Cells]
    end
    
    subgraph "Notebooked CLI"
        PARSE[Parser<br/>Extract Tagged Code]
        GEN[Generator<br/>Create Scripts]
        PROV[Provider<br/>Train/Deploy]
    end
    
    subgraph Output
        TRAIN[train.py]
        INFER[inference.py]
        PREP[preprocess.py]
        REQ[requirements.txt]
    end
    
    subgraph "Cloud Providers"
        SM[AWS SageMaker]
        AZ[Azure ML]
        GCP[Vertex AI]
        LOCAL[Local Execution]
    end
    
    NB --> PARSE
    PARSE --> GEN
    GEN --> TRAIN
    GEN --> INFER
    GEN --> PREP
    GEN --> REQ
    
    TRAIN --> PROV
    PROV --> SM
    PROV --> AZ
    PROV --> GCP
    PROV --> LOCAL

Code Flow

sequenceDiagram
    participant User
    participant CLI
    participant Parser
    participant Generator
    participant Provider
    participant Cloud

    User->>CLI: notebooked convert my-experiment
    CLI->>Parser: parse_notebook()
    Parser->>Parser: extract_tagged_code()
    Parser-->>CLI: ExtractedCode
    CLI->>Generator: generate_all()
    Generator->>Generator: create train.py
    Generator->>Generator: create inference.py
    Generator->>Generator: create preprocess.py
    Generator-->>CLI: generated_files
    CLI-->>User: ✓ Scripts generated
    
    User->>CLI: notebooked train my-experiment --provider sagemaker
    CLI->>Provider: train()
    Provider->>Cloud: submit_training_job()
    Cloud-->>Provider: job_status
    Provider-->>User: ✓ Training complete

Directory Structure

notebooked/
├── src/notebooked/
│   ├── core/
│   │   ├── parser.py          # Notebook parsing & tag extraction
│   │   ├── generator.py       # Script generation logic
│   │   ├── models.py          # Pydantic config models
│   │   ├── config.py          # Configuration management
│   │   └── workflow.py        # CI/CD workflow generation
│   ├── providers/
│   │   ├── base.py            # Abstract base provider
│   │   ├── local.py           # Local execution provider
│   │   ├── sagemaker.py       # AWS SageMaker provider
│   │   ├── azure_ml.py        # Azure ML provider
│   │   └── vertex_ai.py       # Google Cloud Vertex AI provider
│   └── cli.py                 # Click-based CLI
├── notebooks/                 # Example notebooks
├── tests/                     # Unit and integration tests
├── config.yaml                # Project configuration
└── pyproject.toml             # Project metadata & dependencies

🚀 Quick Start

Installation

# Clone the repository
git clone https://github.com/patrickprakash/notebooked.git
cd notebooked

# Install in editable mode
pip install -e .

# For Azure support
pip install -e .[azure]

# For GCP support
pip install -e .[gcp]

# For development
pip install -e .[dev]

Basic Usage

1. Tag Your Notebook Cells

Add tags to your Jupyter notebook cells:

  • imports - Import statements
  • preprocess - Data preprocessing functions
  • model - Model architecture classes
  • train - Training functions and loops
  • inference - Inference/prediction functions
  • utils - Utility functions
  • requirements - Manual requirements (optional)

How to add tags in the Jupyter UI (a programmatic alternative is sketched after these steps):

  1. Click on a code cell
  2. View → Cell Toolbar → Tags
  3. Add appropriate tag(s)
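
Tags are stored as plain cell metadata, so you can also add them in code. A minimal sketch using nbformat; the notebook path and the rule for picking cells are illustrative:

import nbformat

# Load the notebook (path is illustrative).
nb = nbformat.read("notebooks/anomaly_detection.ipynb", as_version=4)

# Tags live in each cell's metadata under the "tags" key.
for cell in nb.cells:
    if cell.cell_type == "code" and "def train_model" in cell.source:
        cell.metadata.setdefault("tags", []).append("train")

nbformat.write(nb, "notebooks/anomaly_detection.ipynb")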

2. Initialize Configuration

notebooked init

This creates a config.yaml:

mlflow:
  tracking_uri: "http://localhost:5000"
  experiment_name: "my-experiment"

aws:
  region: "us-east-1"
  role: "arn:aws:iam::123456789012:role/SageMakerRole"

azure:
  subscription_id: "your-subscription-id"
  resource_group: "your-resource-group"
  workspace_name: "your-workspace"

gcp:
  project_id: "your-project-id"
  location: "us-central1"
  staging_bucket: "gs://your-bucket"

experiments:
  - name: "anomaly-detection"
    notebook: "notebooks/anomaly_detection.ipynb"
    data_path: "s3://my-bucket/data"
    hyperparameters:
      epochs: 10
      learning_rate: 0.001
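
Internally, this file is validated with Pydantic (see src/notebooked/core/models.py in the directory structure above). A rough sketch of what that validation could look like; the actual models may differ:

from typing import Dict, List, Optional

import yaml
from pydantic import BaseModel

class MLflowConfig(BaseModel):
    tracking_uri: str
    experiment_name: str

class ExperimentConfig(BaseModel):
    name: str
    notebook: str
    data_path: Optional[str] = None
    hyperparameters: Dict[str, float] = {}

class ProjectConfig(BaseModel):
    mlflow: MLflowConfig
    experiments: List[ExperimentConfig]

# Fails fast with a readable error if a required field is missing.
with open("config.yaml") as f:
    config = ProjectConfig(**yaml.safe_load(f))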

3. Convert Notebook to Scripts

notebooked convert anomaly-detection

Generates:

  • generated/anomaly-detection/train.py
  • generated/anomaly-detection/inference.py
  • generated/anomaly-detection/preprocess.py
  • generated/anomaly-detection/requirements.txt
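
The exact contents depend on your tagged cells, but a generated train.py generally stitches the tagged sections into a script with a command-line entry point and MLflow tracking. A rough sketch of the shape (illustrative only, not the generator's literal output):

import argparse

import mlflow

def train_model(epochs, learning_rate):
    """Placeholder: the body comes from your cells tagged `train`."""
    ...

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--epochs", type=int, default=10)
    parser.add_argument("--learning-rate", type=float, default=0.001)
    args = parser.parse_args()

    # Hyperparameters are logged to the tracking server from config.yaml.
    with mlflow.start_run():
        mlflow.log_params(vars(args))
        train_model(args.epochs, args.learning_rate)

if __name__ == "__main__":
    main()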

4. Train Your Model

Local:

notebooked train anomaly-detection --provider local

AWS SageMaker:

notebooked train anomaly-detection --provider sagemaker

Azure ML:

notebooked train anomaly-detection --provider azure

Google Cloud Vertex AI:

notebooked train anomaly-detection --provider gcp

5. Deploy Model

notebooked deploy anomaly-detection \
  --model-uri s3://my-bucket/model.tar.gz \
  --endpoint-name my-endpoint \
  --provider sagemaker
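
For SageMaker, this roughly corresponds to the following SDK calls (a sketch of the idea, not Notebooked's exact implementation; the container image URI is a placeholder you would fill in):

import sagemaker
from sagemaker.model import Model

model = Model(
    model_data="s3://my-bucket/model.tar.gz",
    role="arn:aws:iam::123456789012:role/SageMakerRole",
    image_uri="<inference-container-image>",  # placeholder
    sagemaker_session=sagemaker.Session(),
)

# Creates the endpoint and waits until it is in service.
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.xlarge",
    endpoint_name="my-endpoint",
)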

6. Generate CI/CD Workflow

notebooked generate-workflow --provider sagemaker

Creates .github/workflows/notebooked-pipeline.yml for automated training and deployment.


📚 Detailed Documentation

Cell Tags Reference

| Tag          | Purpose                      | Required    | Example                    |
|--------------|------------------------------|-------------|----------------------------|
| imports      | Import statements            | No*         | import torch               |
| preprocess   | Data loading & preprocessing | No          | def load_data():           |
| model        | Model architecture classes   | No          | class MyModel(nn.Module):  |
| train        | Training loop & logic        | Yes         | def train_model():         |
| inference    | Inference functions          | Recommended | def predict():             |
| utils        | Helper functions             | No          | def calculate_metric():    |
| requirements | Manual dependencies          | No          | torch>=2.0                 |

* If not tagged, imports are auto-detected from the first few cells.
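
Conceptually, extraction amounts to grouping code-cell sources by tag. A simplified sketch in the spirit of src/notebooked/core/parser.py (the real parser also handles auto-detection and validation, and its internals may differ):

from collections import defaultdict

import nbformat

def extract_tagged_code(path):
    """Map each tag to the concatenated source of its cells."""
    nb = nbformat.read(path, as_version=4)
    sections = defaultdict(list)
    for cell in nb.cells:
        if cell.cell_type != "code":
            continue
        for tag in cell.metadata.get("tags", []):
            sections[tag].append(cell.source)
    return {tag: "\n\n".join(chunks) for tag, chunks in sections.items()}

sections = extract_tagged_code("notebooks/anomaly_detection.ipynb")
print(sorted(sections))  # e.g. ['imports', 'model', 'train']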

Provider Configuration

Each provider requires specific configuration in config.yaml (a sketch of the shared provider interface follows these settings):

AWS SageMaker

  • region - AWS region
  • role - IAM role ARN with SageMaker permissions
  • instance_type - Training instance type (default: ml.m5.xlarge)

Azure ML

  • subscription_id - Azure subscription ID
  • resource_group - Azure resource group name
  • workspace_name - Azure ML workspace name

Google Cloud Vertex AI

  • project_id - GCP project ID
  • location - GCP region (e.g., us-central1)
  • staging_bucket - GCS bucket for artifacts
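
All backends sit behind the same abstract interface (src/notebooked/providers/base.py), which is what makes --provider a one-flag switch. A sketch of that contract; the method names and signatures here are illustrative, not the actual base class:

from abc import ABC, abstractmethod

class BaseProvider(ABC):
    """Contract each cloud backend implements."""

    @abstractmethod
    def train(self, experiment_name: str, **kwargs) -> str:
        """Submit a training job and return a job identifier."""

    @abstractmethod
    def deploy(self, model_uri: str, endpoint_name: str, **kwargs) -> str:
        """Deploy a trained model and return the endpoint name."""

class LocalProvider(BaseProvider):
    def train(self, experiment_name, **kwargs):
        import subprocess
        subprocess.run(
            ["python", f"generated/{experiment_name}/train.py"], check=True
        )
        return "local-run"

    def deploy(self, model_uri, endpoint_name, **kwargs):
        raise NotImplementedError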

CLI Commands

# Initialize project
notebooked init

# Convert notebook to scripts
notebooked convert <experiment-name>

# Train model
notebooked train <experiment-name> [OPTIONS]
  --provider [local|sagemaker|azure|gcp]
  --wait/--no-wait

# Deploy model
notebooked deploy <experiment-name> [OPTIONS]
  --model-uri <uri>
  --endpoint-name <name>
  --serverless  # For serverless endpoints
  --provider [local|sagemaker|azure|gcp]

# Generate CI/CD workflow
notebooked generate-workflow [OPTIONS]
  --provider [local|sagemaker|azure|gcp]
  --branch <branch-name>
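
These commands are wired up with Click. A stripped-down sketch of how the train command could map to the options above (not the literal cli.py):

import click

@click.group()
def cli():
    """Notebooked command-line interface."""

@cli.command()
@click.argument("experiment_name")
@click.option("--provider",
              type=click.Choice(["local", "sagemaker", "azure", "gcp"]),
              default="local")
@click.option("--wait/--no-wait", default=True)
def train(experiment_name, provider, wait):
    click.echo(f"Training {experiment_name} on {provider} (wait={wait})")

if __name__ == "__main__":
    cli()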

🔄 Workflow Examples

End-to-End ML Pipeline

graph LR
    A[Tag Notebook] --> B[notebooked convert]
    B --> C[Review Generated Scripts]
    C --> D[notebooked train]
    D --> E{Training Success?}
    E -->|Yes| F[notebooked deploy]
    E -->|No| G[Debug & Iterate]
    G --> A
    F --> H[Endpoint Ready]

CI/CD Integration

graph TB
    A[Push to main] --> B[GitHub Actions Triggered]
    B --> C[Install Dependencies]
    C --> D[notebooked convert]
    D --> E[notebooked train --provider sagemaker]
    E --> F{Tests Pass?}
    F -->|Yes| G[notebooked deploy]
    F -->|No| H[Notify Team]
    G --> I[Production Endpoint]

🤝 Contributing

See CONTRIBUTING.md for detailed guidelines.

Quick Start:

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature/amazing-feature
  3. Make your changes
  4. Run tests: pytest tests/
  5. Run linting: ruff check src/
  6. Commit: git commit -m 'Add amazing feature'
  7. Push: git push origin feature/amazing-feature
  8. Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.


🙏 Acknowledgments

  • Inspired by MLOps best practices and the need for simpler notebook-to-production workflows
  • Built with Click for CLI
  • Uses Pydantic for configuration validation
  • Powered by MLflow for experiment tracking
  • Uses pipreqs for automatic dependency detection and requirements.txt generation

📞 Support


Made with ❤️ for the ML community
