# Notebooked

Transform Jupyter notebooks into production-ready ML pipelines and scripts.

Notebooked is a Python CLI tool that converts tagged Jupyter notebooks into production-ready training, inference, and deployment scripts for AWS SageMaker, Azure ML, and Google Cloud Vertex AI.
## Features

- Multi-Cloud Support: Deploy to AWS SageMaker, Azure ML, or Google Cloud Vertex AI
- Automated Code Generation: Convert notebooks to `train.py`, `inference.py`, and `preprocess.py`
- Intelligent Parsing: Extracts code using cell tags or auto-detection
- MLflow Integration: Built-in experiment tracking and model versioning
- CI/CD Ready: Generate GitHub Actions workflows for automated deployment
- Provider Agnostic: Abstract interface for easy cloud provider switching
- Auto-Dependency Detection: Uses `pipreqs` to generate `requirements.txt`
## Architecture

```mermaid
graph TB
subgraph Input
NB[Jupyter Notebook<br/>with Tagged Cells]
end
subgraph "Notebooked CLI"
PARSE[Parser<br/>Extract Tagged Code]
GEN[Generator<br/>Create Scripts]
PROV[Provider<br/>Train/Deploy]
end
subgraph Output
TRAIN[train.py]
INFER[inference.py]
PREP[preprocess.py]
REQ[requirements.txt]
end
subgraph "Cloud Providers"
SM[AWS SageMaker]
AZ[Azure ML]
GCP[Vertex AI]
LOCAL[Local Execution]
end
NB --> PARSE
PARSE --> GEN
GEN --> TRAIN
GEN --> INFER
GEN --> PREP
GEN --> REQ
TRAIN --> PROV
PROV --> SM
PROV --> AZ
PROV --> GCP
PROV --> LOCAL
```
```mermaid
sequenceDiagram
participant User
participant CLI
participant Parser
participant Generator
participant Provider
participant Cloud
User->>CLI: notebooked convert my-experiment
CLI->>Parser: parse_notebook()
Parser->>Parser: extract_tagged_code()
Parser-->>CLI: ExtractedCode
CLI->>Generator: generate_all()
Generator->>Generator: create train.py
Generator->>Generator: create inference.py
Generator->>Generator: create preprocess.py
Generator-->>CLI: generated_files
CLI-->>User: ✓ Scripts generated
User->>CLI: notebooked train my-experiment --provider sagemaker
CLI->>Provider: train()
Provider->>Cloud: submit_training_job()
Cloud-->>Provider: job_status
Provider-->>User: ✓ Training complete
```
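The CLI itself is built with Click. A hypothetical skeleton of how the commands in the diagram could be wired up (not the actual contents of `cli.py`):

```python
# Hypothetical Click skeleton; the real cli.py will differ.
import click

PROVIDERS = ["local", "sagemaker", "azure", "gcp"]


@click.group()
def cli():
    """Notebooked: convert tagged notebooks into production scripts."""


@cli.command()
@click.argument("experiment_name")
def convert(experiment_name: str):
    """Parse the notebook and generate scripts for EXPERIMENT_NAME."""
    click.echo(f"✓ Scripts generated for {experiment_name}")


@cli.command()
@click.argument("experiment_name")
@click.option("--provider", type=click.Choice(PROVIDERS), default="local",
              show_default=True)
def train(experiment_name: str, provider: str):
    """Submit a training job on the chosen provider."""
    click.echo(f"Training {experiment_name} on {provider}...")


if __name__ == "__main__":
    cli()
```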
## Project Structure

```
notebooked/
├── src/notebooked/
│   ├── core/
│   │   ├── parser.py        # Notebook parsing & tag extraction
│   │   ├── generator.py     # Script generation logic
│   │   ├── models.py        # Pydantic config models
│   │   ├── config.py        # Configuration management
│   │   └── workflow.py      # CI/CD workflow generation
│   ├── providers/
│   │   ├── base.py          # Abstract base provider
│   │   ├── local.py         # Local execution provider
│   │   ├── sagemaker.py     # AWS SageMaker provider
│   │   ├── azure_ml.py      # Azure ML provider
│   │   └── vertex_ai.py     # Google Cloud Vertex AI provider
│   └── cli.py               # Click-based CLI
├── notebooks/               # Example notebooks
├── tests/                   # Unit and integration tests
├── config.yaml              # Project configuration
└── pyproject.toml           # Project metadata & dependencies
```
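The `providers/` package is what keeps the tool cloud-agnostic: every backend implements the same abstract interface defined in `base.py`. A rough sketch of the idea, with hypothetical names and signatures rather than the actual contents of `base.py`:

```python
# Hypothetical sketch of the abstract provider interface; the real
# providers/base.py may use different names and signatures.
from abc import ABC, abstractmethod


class BaseProvider(ABC):
    """Shared contract for the local, SageMaker, Azure ML, and Vertex AI backends."""

    @abstractmethod
    def train(self, script_path: str, hyperparameters: dict) -> str:
        """Submit a training job and return a job identifier."""

    @abstractmethod
    def deploy(self, model_uri: str, endpoint_name: str) -> str:
        """Deploy a trained model and return the endpoint URI."""
```

Swapping clouds then amounts to selecting a different subclass via the CLI's `--provider` flag.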
## Installation

```bash
# Clone the repository
git clone https://github.com/patrickprakash/notebooked.git
cd notebooked
# Install in editable mode
pip install -e .
# For Azure support
pip install -e .[azure]
# For GCP support
pip install -e .[gcp]
# For development
pip install -e .[dev]
```

## Usage

### 1. Tag Your Notebook Cells

Add tags to your Jupyter notebook cells:
- `imports` - Import statements
- `preprocess` - Data preprocessing functions
- `model` - Model architecture classes
- `train` - Training functions and loops
- `inference` - Inference/prediction functions
- `utils` - Utility functions
- `requirements` - Manual requirements (optional)
How to add tags in Jupyter:

1. Click on a code cell
2. Open View → Cell Toolbar → Tags
3. Add the appropriate tag(s)
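Tags live in each cell's metadata (`cell.metadata.tags`), so they are easy to inspect programmatically. Here is a minimal sketch of the idea using `nbformat`; the actual logic lives in `core/parser.py` and may differ:

```python
# Illustrative sketch: collect code-cell sources by tag using nbformat.
# This is not Notebooked's actual parser, just the underlying idea.
import nbformat


def extract_tagged_code(path: str) -> dict[str, list[str]]:
    nb = nbformat.read(path, as_version=4)
    code_by_tag: dict[str, list[str]] = {}
    for cell in nb.cells:
        if cell.cell_type != "code":
            continue
        for tag in cell.metadata.get("tags", []):
            code_by_tag.setdefault(tag, []).append(cell.source)
    return code_by_tag


# Example: inspect which cells were tagged `train`
tagged = extract_tagged_code("notebooks/anomaly_detection.ipynb")
print(tagged.get("train", []))
```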
### 2. Initialize the Project

```bash
notebooked init
```

This creates a `config.yaml`:

```yaml
mlflow:
  tracking_uri: "http://localhost:5000"
  experiment_name: "my-experiment"

aws:
  region: "us-east-1"
  role: "arn:aws:iam::123456789012:role/SageMakerRole"

azure:
  subscription_id: "your-subscription-id"
  resource_group: "your-resource-group"
  workspace_name: "your-workspace"

gcp:
  project_id: "your-project-id"
  location: "us-central1"
  staging_bucket: "gs://your-bucket"

experiments:
  - name: "anomaly-detection"
    notebook: "notebooks/anomaly_detection.ipynb"
    data_path: "s3://my-bucket/data"
    hyperparameters:
      epochs: 10
      learning_rate: 0.001
```

### 3. Convert the Notebook

```bash
notebooked convert anomaly-detection
```

This generates:
- `generated/anomaly-detection/train.py`
- `generated/anomaly-detection/inference.py`
- `generated/anomaly-detection/preprocess.py`
- `generated/anomaly-detection/requirements.txt`
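To give a feel for the output, a generated `train.py` broadly stitches the tagged cells together and wraps the training code in an MLflow run. The following is a hand-written approximation with placeholder names and values, not verbatim generator output:

```python
# Approximate shape of a generated train.py (illustrative only).
# The commented markers show where tagged cell code would be inlined.
import mlflow

# --- cells tagged `imports` inlined here ---
# --- cells tagged `model` (e.g. class MyModel) inlined here ---


def main():
    mlflow.set_tracking_uri("http://localhost:5000")  # from config.yaml
    mlflow.set_experiment("anomaly-detection")
    with mlflow.start_run():
        mlflow.log_params({"epochs": 10, "learning_rate": 0.001})
        # --- cells tagged `train` inlined here, e.g.:
        # model = train_model(epochs=10, learning_rate=0.001)
        mlflow.log_metric("final_loss", 0.0)  # placeholder metric


if __name__ == "__main__":
    main()
```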
### 4. Train

Local:

```bash
notebooked train anomaly-detection --provider local
```

AWS SageMaker:

```bash
notebooked train anomaly-detection --provider sagemaker
```

Azure ML:

```bash
notebooked train anomaly-detection --provider azure
```

Google Cloud Vertex AI:

```bash
notebooked train anomaly-detection --provider gcp
```

### 5. Deploy

```bash
notebooked deploy anomaly-detection \
  --model-uri s3://my-bucket/model.tar.gz \
  --endpoint-name my-endpoint \
  --provider sagemaker
```

### 6. Generate a CI/CD Workflow

```bash
notebooked generate-workflow --provider sagemaker
```

This creates `.github/workflows/notebooked-pipeline.yml` for automated training and deployment.
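Once the deploy step above has created a SageMaker endpoint, you can smoke-test it directly with plain `boto3`, independent of Notebooked. The endpoint name, region, and payload shape below are assumptions; adjust them to your model:

```python
# Smoke-test a deployed SageMaker endpoint with boto3 (independent of
# Notebooked; endpoint name, region, and payload format are assumptions).
import json

import boto3

runtime = boto3.client("sagemaker-runtime", region_name="us-east-1")
response = runtime.invoke_endpoint(
    EndpointName="my-endpoint",
    ContentType="application/json",
    Body=json.dumps({"instances": [[0.1, 0.2, 0.3]]}),
)
print(json.loads(response["Body"].read()))
```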
## Cell Tag Reference

| Tag | Purpose | Required | Example |
|-----|---------|----------|---------|
| `imports` | Import statements | No* | `import torch` |
| `preprocess` | Data loading & preprocessing | No | `def load_data():` |
| `model` | Model architecture classes | No | `class MyModel(nn.Module):` |
| `train` | Training loop & logic | Yes | `def train_model():` |
| `inference` | Inference functions | Recommended | `def predict():` |
| `utils` | Helper functions | No | `def calculate_metric():` |
| `requirements` | Manual dependencies | No | `torch>=2.0` |
\* If not tagged, imports are auto-detected from the first few cells.
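One plausible way such auto-detection can work is to parse each leading cell with Python's `ast` module and keep those that consist only of import statements. This is a sketch of the idea, not necessarily Notebooked's implementation:

```python
# Sketch of import auto-detection via ast (illustrative; not
# necessarily how Notebooked implements it).
import ast


def looks_like_imports(source: str) -> bool:
    """Return True if a cell's source consists only of import statements."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return False
    return bool(tree.body) and all(
        isinstance(node, (ast.Import, ast.ImportFrom)) for node in tree.body
    )


print(looks_like_imports("import torch\nfrom torch import nn"))  # True
print(looks_like_imports("x = 1"))                               # False
```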
## Provider Configuration

Each provider requires specific configuration in `config.yaml`:

### AWS SageMaker

- `region` - AWS region
- `role` - IAM role ARN with SageMaker permissions
- `instance_type` - Training instance type (default: `ml.m5.xlarge`)

### Azure ML

- `subscription_id` - Azure subscription ID
- `resource_group` - Azure resource group name
- `workspace_name` - Azure ML workspace name

### Google Cloud Vertex AI

- `project_id` - GCP project ID
- `location` - GCP region (e.g., `us-central1`)
- `staging_bucket` - GCS bucket for artifacts
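Because configuration is validated with Pydantic (see `core/models.py`), each provider section above maps naturally onto a typed model. A hedged sketch for the AWS section, with field names mirroring the documented keys and assumed defaults:

```python
# Hypothetical Pydantic model for the `aws` section of config.yaml.
# Field names mirror the documented keys; defaults are assumptions.
from pydantic import BaseModel


class AWSConfig(BaseModel):
    region: str = "us-east-1"
    role: str                              # required: IAM role ARN
    instance_type: str = "ml.m5.xlarge"    # documented default


cfg = AWSConfig(role="arn:aws:iam::123456789012:role/SageMakerRole")
print(cfg.instance_type)  # -> ml.m5.xlarge
```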
## CLI Reference

```bash
# Initialize project
notebooked init

# Convert notebook to scripts
notebooked convert <experiment-name>

# Train model
notebooked train <experiment-name> [OPTIONS]
  --provider [local|sagemaker|azure|gcp]
  --wait/--no-wait

# Deploy model
notebooked deploy <experiment-name> [OPTIONS]
  --model-uri <uri>
  --endpoint-name <name>
  --serverless                # For serverless endpoints
  --provider [local|sagemaker|azure|gcp]

# Generate CI/CD workflow
notebooked generate-workflow [OPTIONS]
  --provider [local|sagemaker|azure|gcp]
  --branch <branch-name>
```

## Workflows

### Development Workflow

```mermaid
graph LR
A[Tag Notebook] --> B[notebooked convert]
B --> C[Review Generated Scripts]
C --> D[notebooked train]
D --> E{Training Success?}
E -->|Yes| F[notebooked deploy]
E -->|No| G[Debug & Iterate]
G --> A
F --> H[Endpoint Ready]
```
### CI/CD Pipeline

```mermaid
graph TB
A[Push to main] --> B[GitHub Actions Triggered]
B --> C[Install Dependencies]
C --> D[notebooked convert]
D --> E[notebooked train --provider sagemaker]
E --> F{Tests Pass?}
F -->|Yes| G[notebooked deploy]
F -->|No| H[Notify Team]
G --> I[Production Endpoint]
```
## Contributing

See CONTRIBUTING.md for detailed guidelines.

Quick Start:

1. Fork the repository
2. Create a feature branch: `git checkout -b feature/amazing-feature`
3. Make your changes
4. Run the tests: `pytest tests/`
5. Run linting: `ruff check src/`
6. Commit: `git commit -m 'Add amazing feature'`
7. Push: `git push origin feature/amazing-feature`
8. Open a Pull Request
## License

This project is licensed under the MIT License - see the LICENSE file for details.
## Acknowledgments

- Inspired by MLOps best practices and the need for simpler notebook-to-production workflows
- Built with Click for the CLI
- Uses Pydantic for configuration validation
- Powered by MLflow for experiment tracking
- Uses pipreqs to generate the `requirements.txt` file
## Support

- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Email: patrickprakash8@gmail.com
Made with ❤️ for the ML community
