Merged
2 changes: 2 additions & 0 deletions .gitignore
@@ -160,3 +160,5 @@ cython_debug/
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/
.DS_Store

136 changes: 136 additions & 0 deletions CLAUDE.md
@@ -0,0 +1,136 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

**cube_dbt** is a Python package that converts dbt models and columns into Cube semantic layer definitions. It parses dbt manifest files and provides Jinja-compatible YAML output for integrating data models with Cube's semantic layer.

## Common Development Commands

```bash
# Testing
pdm run test # Run all tests (34 unit tests)
pytest tests/ -v # Run tests with verbose output
pytest tests/test_dbt.py # Run specific test file
pytest -k "test_model" # Run tests matching pattern

# Development Setup
pdm install # Install project with dev dependencies
pdm install --prod # Install production dependencies only
pdm lock # Update pdm.lock file
pdm update # Update all dependencies

# Building & Publishing
pdm build # Build distribution packages
pdm publish # Publish to PyPI (requires credentials)

# Development Workflow
pdm run python -m cube_dbt # Run the module directly
python -c "from cube_dbt import Dbt; print(Dbt.version())" # Check version
```

## High-Level Architecture

The package consists of 4 core classes that work together:

### Core Classes

**Dbt (src/cube_dbt/dbt.py)**
- Entry point for loading dbt manifest files
- Supports file paths and URLs via `from_file()` and `from_url()` class methods
- Implements chainable filtering API: `filter(paths=[], tags=[], names=[])`
- Lazy initialization - models are only loaded when accessed
- Handles manifest v1-v12 formats

**Model (src/cube_dbt/model.py)**
- Represents a single dbt model from the manifest
- Key method: `as_cube()` - exports model as Cube-compatible YAML
- Supports multiple primary keys via column tags
- Provides access to columns, description, database, schema, and alias
- Handles special characters in model names (spaces, dots, dashes)

**Column (src/cube_dbt/column.py)**
- Represents dbt columns with comprehensive type mapping
- Maps 130+ database-specific types to 5 Cube dimension types: string, number, time, boolean, geo
- Database support: BigQuery, Snowflake, Redshift, generic SQL
- Primary key detection via `primary_key` tag in column metadata
- Raises RuntimeError for unknown column types (fail-fast approach)

**Dump (src/cube_dbt/dump.py)**
- Custom YAML serialization utilities
- Returns Jinja SafeString for template compatibility
- Handles proper indentation for nested structures
- Used internally by Model.as_cube() for output formatting
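
As a rough illustration of the two Dump responsibilities listed above (indenting nested output and marking it as template-safe), here is a minimal sketch. `SafeString` and `indent_yaml` are hypothetical names defined only for this example; the real helpers in `src/cube_dbt/dump.py` and the safe-string type of the template engine in use may look different.

```python
import textwrap


class SafeString(str):
    """Hypothetical marker type: text a template engine should not escape again."""

    def __html__(self):
        # MarkupSafe-style hook; Jinja2 treats objects that define it as already escaped.
        return self


def indent_yaml(text, indent=0):
    """Indent every non-empty line by `indent` spaces and mark the result as safe."""
    return SafeString(textwrap.indent(text, " " * indent))


# A YAML fragment that needs to sit four spaces deep inside a larger document.
fragment = "name: id\nsql: id\ntype: number"
nested = indent_yaml(fragment, indent=4)
print(nested)
```

Subclassing `str` keeps the value usable anywhere a plain string is expected, while the marker lets a template engine skip a second round of escaping.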

### Key Design Patterns

1. **Lazy Loading**: Models are loaded only when first accessed via `dbt.models` property
2. **Builder Pattern**: Filter methods return self for chaining: `dbt.filter(tags=['tag1']).filter(paths=['path1'])`
3. **Factory Methods**: `Dbt.from_file()` and `Dbt.from_url()` for different data sources
4. **Type Mapping Strategy**: Centralized database type to Cube type conversion in Column class
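
The lazy-loading and chaining patterns above can be sketched as follows. This is not the package's code: `ManifestView` is a hypothetical stand-in for `Dbt`, the in-memory manifest is invented sample data, and the matching rules (criteria accumulate across calls, any-of within one criterion, all-of across criteria) are assumptions made for the example.

```python
class ManifestView:
    """Hypothetical stand-in for the Dbt class, for illustration only."""

    def __init__(self, manifest):
        self._manifest = manifest
        self._paths, self._tags, self._names = [], [], []
        self._models = None  # nothing is built until .models is read

    def filter(self, paths=(), tags=(), names=()):
        # Record the criteria and return self so calls can be chained.
        self._paths.extend(paths)
        self._tags.extend(tags)
        self._names.extend(names)
        self._models = None  # drop any cached result
        return self

    def _matches(self, node):
        if node.get("resource_type") != "model":
            return False
        if self._paths and not any(node["path"].startswith(p) for p in self._paths):
            return False
        if self._tags and not any(t in node["tags"] for t in self._tags):
            return False
        if self._names and node["name"] not in self._names:
            return False
        return True

    @property
    def models(self):
        # Lazy loading: scan the manifest on first access, then reuse the list.
        if self._models is None:
            nodes = self._manifest["nodes"].values()
            self._models = [n for n in nodes if self._matches(n)]
        return self._models


# Invented sample data shaped like the "nodes" section of a dbt manifest.
manifest = {
    "nodes": {
        "model.demo.orders": {
            "resource_type": "model", "name": "orders",
            "path": "marts/orders.sql", "tags": ["cube"],
        },
        "model.demo.stg_orders": {
            "resource_type": "model", "name": "stg_orders",
            "path": "staging/stg_orders.sql", "tags": [],
        },
        "test.demo.not_null": {
            "resource_type": "test", "name": "not_null",
            "path": "marts/not_null.sql", "tags": ["cube"],
        },
    }
}

view = ManifestView(manifest).filter(tags=["cube"]).filter(paths=["marts/"])
selected = [m["name"] for m in view.models]
print(selected)
```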

### Data Flow

```
manifest.json → Dbt.from_file() → filter() → models → Model.as_cube() → YAML output
columns → Column.dimension_type()
```

## Testing Structure

Tests use a real dbt manifest fixture (tests/manifest.json, ~397KB) with example models:

- **test_dbt.py**: Tests manifest loading, filtering by paths/tags/names, version checking
- **test_model.py**: Tests YAML export, primary key handling, special character escaping
- **test_column.py**: Tests type mapping for different databases, primary key detection
- **test_dump.py**: Tests YAML formatting and Jinja compatibility

Run specific test scenarios:
```bash
pytest tests/test_column.py::TestColumn::test_bigquery_types -v
pytest tests/test_model.py::TestModel::test_multiple_primary_keys -v
```

## Important Implementation Details

### Primary Key Configuration
Primary keys are defined using tags in dbt column metadata:
```yaml
# In dbt schema.yml
columns:
  - name: id
    meta:
      tags: ['primary_key']
```

### Type Mapping Behavior
- Unknown types raise RuntimeError immediately (fail-fast)
- Database-specific types are checked first, then generic SQL types
- Default mappings can be found in `src/cube_dbt/column.py` TYPE_MAP dictionaries
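
A minimal sketch of the lookup order described above, assuming two tiers of tables. The dictionaries, their entries, and the `dimension_type` function signature are hypothetical and much smaller than the real mappings in `src/cube_dbt/column.py`.

```python
# Hypothetical excerpts; the real tables live in src/cube_dbt/column.py.
DATABASE_TYPES = {
    "bigquery": {"int64": "number", "geography": "geo", "timestamp": "time"},
    "snowflake": {"number": "number", "timestamp_ntz": "time"},
}
GENERIC_TYPES = {
    "integer": "number",
    "varchar": "string",
    "boolean": "boolean",
    "date": "time",
}


def dimension_type(data_type, database=None):
    """Resolve a database column type to a Cube dimension type."""
    # Normalize case and drop any precision suffix such as "(255)".
    key = data_type.strip().lower().split("(")[0]
    # 1. Database-specific types are checked first.
    specific = DATABASE_TYPES.get(database, {})
    if key in specific:
        return specific[key]
    # 2. Generic SQL types come next.
    if key in GENERIC_TYPES:
        return GENERIC_TYPES[key]
    # 3. Fail fast instead of guessing a type.
    raise RuntimeError(f"Unknown column type: {data_type!r}")


print(dimension_type("INT64", "bigquery"))
print(dimension_type("VARCHAR(255)", "snowflake"))
```

Raising on unknown types surfaces a missing mapping at load time rather than producing a Cube definition with a silently wrong dimension type.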

### Jinja Template Integration
All output from `as_cube()` is wrapped in Jinja SafeString to prevent double-escaping in templates. Use the `safe` filter if needed in templates.

### URL Loading Authentication
When using `Dbt.from_url()`, basic authentication is supported:
```python
dbt = Dbt.from_url("https://user:pass@example.com/manifest.json")
```
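
For context, credentials embedded in a URL are usually turned into an `Authorization` header before the request is sent. The sketch below shows one way to do that with the standard library; `build_request` is a hypothetical helper, and this is not necessarily how `Dbt.from_url()` handles it internally.

```python
import base64
from urllib.parse import unquote, urlsplit, urlunsplit
from urllib.request import Request


def build_request(url):
    """Move user:pass from the URL into a Basic Authorization header."""
    parts = urlsplit(url)
    if parts.username is None:
        return Request(url)  # no credentials in the URL

    # Rebuild the URL without the userinfo part.
    host = parts.hostname
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    clean_url = urlunsplit(
        (parts.scheme, host, parts.path, parts.query, parts.fragment)
    )

    # Basic auth is base64("username:password").
    credentials = f"{unquote(parts.username)}:{unquote(parts.password or '')}"
    token = base64.b64encode(credentials.encode("utf-8")).decode("ascii")

    request = Request(clean_url)
    request.add_header("Authorization", f"Basic {token}")
    return request


# Building the request does not touch the network.
request = build_request("https://user:pass@example.com/manifest.json")
print(request.full_url)
```

Keeping credentials out of the final URL avoids leaking them into logs that record request URLs.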

## Recent Changes (from git history)

- Multiple primary key support (#15)
- Documentation of package properties (#14)
- Extended dbt contract data type support (#10)
- Jinja escaping protection for as_cube() (#2)

## Package Metadata

- **Version**: Defined in `src/cube_dbt/__init__.py`
- **Python Requirement**: >= 3.8
- **Production Dependency**: PyYAML >= 6.0.1
- **License**: MIT
- **Build System**: PDM with PEP 517/518 compliance
90 changes: 90 additions & 0 deletions QUICK_REFERENCE.md
@@ -0,0 +1,90 @@
# cube_dbt Quick Reference

## What is cube_dbt?
A Python package that converts dbt models and columns into Cube semantic layer definitions. It parses dbt manifests and provides Jinja-compatible YAML output.

## Install & Run Tests
```bash
pdm install # Set up environment
pdm run test # Run all tests
```

## Basic Usage
```python
from cube_dbt import Dbt

# Load and filter
dbt = Dbt.from_file('manifest.json').filter(
    paths=['marts/'],
    tags=['cube'],
    names=['model_name']
)

# Access models
model = dbt.model('my_model')
print(model.name)
print(model.sql_table)
print(model.columns)

# Export to Cube (YAML)
print(model.as_cube())
print(model.as_dimensions())
```

## Project Structure
```
src/cube_dbt/
├── dbt.py - Dbt class (manifest loading & filtering)
├── model.py - Model class (cube export)
├── column.py - Column class (type mapping)
├── dump.py - YAML utilities (Jinja-safe)
└── __init__.py - Public exports

tests/ - 34 unit tests, all passing
```

## Key Classes

### Dbt
- `from_file(path)` - Load from JSON
- `from_url(url)` - Load from remote URL
- `filter(paths=[], tags=[], names=[])` - Chainable filtering
- `.models` - Get all filtered models
- `.model(name)` - Get single model

### Model
- `.name`, `.description`, `.sql_table` - Properties
- `.columns` - List of Column objects
- `.primary_key` - List of primary key columns
- `.as_cube()` - Export as Cube definition (YAML)
- `.as_dimensions()` - Export dimensions (YAML)

### Column
- `.name`, `.description`, `.type`, `.meta` - Properties
- `.primary_key` - Boolean
- `.as_dimension()` - Export dimension (YAML)

Type mapping: BigQuery, Snowflake, Redshift → Cube types (number, string, time, boolean, geo)

## Dependencies
- Production: PyYAML >= 6.0.1, orjson >= 3.10.15
- Note: orjson is used for fast JSON parsing. If unavailable, the package may fall back to standard libraries.
- Development: pytest >= 7.4.2
- Python: >= 3.8

## Common Tasks
| Task | Command |
|------|---------|
| Run tests | `pdm run test` |
| Run specific test | `pytest tests/test_dbt.py -v` |
| Install deps | `pdm install` |
| Lock deps | `pdm lock` |
| Build package | `pdm build` |

## Recent Changes
- v0.6.2: Multiple primary keys support
- Type support for dbt contracts
- Jinja template safe rendering

## Publishing
GitHub Actions auto-publishes to PyPI on release.