# pyhdb-rs Quick Start Guide

High-performance Python driver for SAP HANA with native Arrow support.

## Key Features

- **Zero-copy Arrow transfer** via PyCapsule interface
- **Native Polars/pandas integration** without serialization overhead
- **More than 2x faster** than hdbcli for bulk reads (see the benchmark notebook under Next Steps)
- **DB-API 2.0 compliant** (PEP 249)

## Installation

```bash
uv pip install pyhdb_rs polars pyarrow
```

In [None]:
import os

from pyhdb_rs import connect

# Connection URL: read from the HANA_TEST_URI environment variable so real
# credentials stay out of the notebook; the fallback is only a placeholder
HANA_URL = os.environ.get("HANA_TEST_URI", "hdbsql://user:password@host:39017")
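If the credentials live in separate environment variables rather than in one URL, the URL can be assembled with the standard library. The sketch below is an illustration only: `build_hana_url` and the `HANA_USER`, `HANA_PASSWORD`, `HANA_HOST`, and `HANA_PORT` variable names are hypothetical, and it assumes the driver decodes percent-encoded credentials in an `hdbsql://` URL, which this guide does not confirm.

```python
import os
from urllib.parse import quote


def build_hana_url(user: str, password: str, host: str, port: int = 39017) -> str:
    """Assemble an hdbsql:// URL, percent-encoding the credentials.

    Hypothetical helper: assumes the driver decodes percent-encoded
    userinfo, as URL parsers conventionally do.
    """
    # safe="" also encodes "/", ":" and "@", which would otherwise break the URL
    return f"hdbsql://{quote(user, safe='')}:{quote(password, safe='')}@{host}:{port}"


# Hypothetical variable names; the fallbacks are placeholders, not real credentials
url = build_hana_url(
    os.environ.get("HANA_USER", "user"),
    os.environ.get("HANA_PASSWORD", "p@ss:word/1"),
    os.environ.get("HANA_HOST", "host"),
    int(os.environ.get("HANA_PORT", "39017")),
)
```

Encoding matters mainly for passwords that contain `@`, `:`, or `/`, which would otherwise be read as URL delimiters.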

## Basic Connection

pyhdb-rs follows the DB-API 2.0 standard, so it works like any Python database driver.

In [None]:
# Context manager ensures proper cleanup
with connect(HANA_URL) as conn, conn.cursor() as cursor:
    cursor.execute("SELECT 'Hello from HANA!' AS greeting")
    row = cursor.fetchone()
    print(row[0])
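Because the pattern above is plain DB-API 2.0, the same code shape can be tried without a HANA instance. The following sketch uses the standard library's `sqlite3` module as a stand-in; `fetch_scalar` is a hypothetical helper, not part of pyhdb-rs, and it relies only on `cursor()`, `execute()`, and `fetchone()`.

```python
import sqlite3
from contextlib import closing


def fetch_scalar(conn, sql, params=()):
    """Return the first column of the first row, or None if there are no rows.

    Hypothetical helper that uses only DB-API 2.0 calls.
    """
    # closing() works for any cursor with close(), including cursors
    # that are not context managers themselves (sqlite3's are not)
    with closing(conn.cursor()) as cursor:
        cursor.execute(sql, params)
        row = cursor.fetchone()
        return None if row is None else row[0]


# sqlite3 stands in for HANA here so the sketch runs anywhere
with closing(sqlite3.connect(":memory:")) as conn:
    greeting = fetch_scalar(conn, "SELECT 'Hello from SQLite!' AS greeting")
    print(greeting)
```

Under the assumption that a pyhdb-rs connection exposes the same DB-API calls, the helper should accept it unchanged.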

## Direct Polars Integration

The `execute_polars()` method returns a Polars DataFrame with **zero-copy** data transfer.
Data flows directly from HANA → Rust → Arrow → Polars without creating intermediate Python objects for the result values.

In [None]:
with connect(HANA_URL) as conn, conn.cursor() as cursor:
    # Returns pl.DataFrame directly - no intermediate conversion!
    df = cursor.execute_polars("""
        SELECT
            PRODUCT_ID,
            PRODUCT_NAME,
            CATEGORY,
            PRICE,
            STOCK_QUANTITY
        FROM PRODUCTS
        WHERE PRICE > 100
    """)

    print(f"Loaded {len(df):,} rows")
    print(df.head())

## Arrow Integration

For maximum flexibility, use `execute_arrow()` and load the result into a PyArrow Table.
The table can then be converted to pandas or Polars, or queried with DuckDB and other Arrow-compatible tools.

In [None]:
import pyarrow as pa

with connect(HANA_URL) as conn, conn.cursor() as cursor:
    # execute_arrow() returns a RecordBatchReader; read it into a PyArrow Table
    reader = cursor.execute_arrow("""
        SELECT * FROM SALES_DATA
        WHERE SALE_DATE >= '2024-01-01'
    """)
    arrow_table = pa.RecordBatchReader.from_stream(reader).read_all()

    print(f"Schema: {arrow_table.schema}")
    print(f"Rows: {arrow_table.num_rows:,}")
    print(f"Memory: {arrow_table.nbytes / 1024 / 1024:.2f} MB")

    # Convert to pandas if needed
    # pandas_df = arrow_table.to_pandas()

    # Or to Polars (requires `import polars as pl`)
    # polars_df = pl.from_arrow(arrow_table)

## Arrow PyCapsule Interface

The `RecordBatchReader` implements the Arrow PyCapsule Interface (`__arrow_c_stream__` protocol),
enabling zero-copy data transfer to any Arrow-compatible library.

This protocol is automatically used by Polars, PyArrow, pandas, and other libraries when you pass
the reader directly.

In [None]:
import pyarrow as pa

with connect(HANA_URL) as conn, conn.cursor() as cursor:
    # execute_arrow() returns a RecordBatchReader
    reader = cursor.execute_arrow("SELECT * FROM SALES")

    # PyArrow uses __arrow_c_stream__ protocol automatically
    pa_reader = pa.RecordBatchReader.from_stream(reader)
    table = pa_reader.read_all()

    print(f"Loaded {table.num_rows:,} rows via PyCapsule interface")

    # Note: The reader is consumed after use (single-use pattern)
    # You cannot call read_all() again on the same reader

**Important:** Calling `__arrow_c_stream__` consumes the reader, so it cannot be
used again afterwards. This single-use behavior is a deliberate design choice to
ensure memory safety in the Arrow C Data Interface.

Most libraries (Polars, PyArrow) handle this automatically when you pass the reader to them.
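The single-use rule can be pictured with a plain-Python analogy. The class below is a toy stand-in, not the library's implementation: `SingleUseStream` and its `export()` method are hypothetical names, and ordinary lists play the role of record batches.

```python
class SingleUseStream:
    """Toy stand-in for a stream that may be handed to one consumer only."""

    def __init__(self, batches):
        self._batches = iter(batches)
        self._consumed = False

    def export(self):
        """Hand the underlying iterator to a consumer, exactly once."""
        if self._consumed:
            raise RuntimeError("stream already consumed")
        self._consumed = True
        return self._batches


stream = SingleUseStream([[1, 2], [3, 4]])
first = list(stream.export())  # the consumer drains every batch

try:
    stream.export()  # a second consumer is refused
    second_error = None
except RuntimeError as exc:
    second_error = str(exc)
```

In practice this means a result that is needed twice should be materialized once (for example into a table or DataFrame) and reused from there.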

## Parameterized Queries

Always use parameters for user input to prevent SQL injection.

In [None]:
with connect(HANA_URL) as conn, conn.cursor() as cursor:
    # Positional parameters
    df = cursor.execute_polars(
        "SELECT * FROM ORDERS WHERE STATUS = ? AND TOTAL > ?", ["PENDING", 1000.0]
    )
    print(f"Pending orders > $1000: {len(df)}")
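To see why binding matters, the sketch below contrasts string interpolation with a `?` placeholder. It runs against an in-memory `sqlite3` database as a stand-in for HANA (both accept `?` placeholders in this style); the table contents and the hostile input are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ORDERS (ID INTEGER, STATUS TEXT, TOTAL REAL)")
conn.executemany(
    "INSERT INTO ORDERS VALUES (?, ?, ?)",
    [(1, "PENDING", 1500.0), (2, "PENDING", 200.0), (3, "SHIPPED", 3000.0)],
)

# Hostile input that tries to widen the WHERE clause
user_input = "PENDING' OR '1'='1"

# Unsafe: the input becomes part of the SQL text and changes the query
unsafe_sql = f"SELECT ID FROM ORDERS WHERE STATUS = '{user_input}'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Safe: the input is bound as a value and only compared against STATUS
safe_rows = conn.execute(
    "SELECT ID FROM ORDERS WHERE STATUS = ?", [user_input]
).fetchall()

print(len(unsafe_rows), len(safe_rows))  # prints "3 0"
conn.close()
```

The interpolated query returns every row because the input rewrote the condition, while the bound query matches nothing because no order has that literal status.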

## Batch Insert

Use `executemany()` for efficient bulk inserts.

In [None]:
# Sample data to insert
products = [
    ("PROD001", "Widget A", 29.99),
    ("PROD002", "Widget B", 39.99),
    ("PROD003", "Widget C", 49.99),
]

with connect(HANA_URL) as conn, conn.cursor() as cursor:
    cursor.executemany("INSERT INTO PRODUCTS (ID, NAME, PRICE) VALUES (?, ?, ?)", products)
    conn.commit()
    print(f"Inserted {len(products)} products")
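When the rows come from a generator or are too numerous to pass in one call, `executemany()` can be fed in fixed-size chunks. This is a minimal sketch under stated assumptions: `insert_in_chunks` is a hypothetical helper, the chunk size of 1000 is arbitrary rather than tuned for HANA, and `sqlite3` stands in for the database so the example runs anywhere.

```python
import sqlite3
from itertools import islice


def insert_in_chunks(conn, sql, rows, chunk_size=1000):
    """Insert rows via executemany() in fixed-size chunks; return the row count.

    Hypothetical helper; chunk_size=1000 is an arbitrary default.
    """
    iterator = iter(rows)
    total = 0
    cursor = conn.cursor()
    try:
        while True:
            chunk = list(islice(iterator, chunk_size))
            if not chunk:
                break
            cursor.executemany(sql, chunk)
            total += len(chunk)
        conn.commit()  # one commit after all chunks succeed
    except Exception:
        conn.rollback()  # discard partial work on failure
        raise
    finally:
        cursor.close()
    return total


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PRODUCTS (ID TEXT, NAME TEXT, PRICE REAL)")
rows = ((f"PROD{i:03d}", f"Widget {i}", 9.99 + i) for i in range(2500))
inserted = insert_in_chunks(
    conn, "INSERT INTO PRODUCTS (ID, NAME, PRICE) VALUES (?, ?, ?)", rows
)
print(f"Inserted {inserted} products")
```

Committing once at the end keeps the load atomic; committing per chunk would trade that for smaller transactions.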

## Transaction Control

Transactions are controlled via `commit()` and `rollback()`.

In [None]:
with connect(HANA_URL) as conn:
    try:
        with conn.cursor() as cursor:
            cursor.execute("UPDATE ACCOUNTS SET BALANCE = BALANCE - 100 WHERE ID = 1")
            cursor.execute("UPDATE ACCOUNTS SET BALANCE = BALANCE + 100 WHERE ID = 2")
            conn.commit()  # Both updates succeed
            print("Transfer completed")
    except Exception as e:
        conn.rollback()  # Undo all changes
        print(f"Transfer failed: {e}")
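The commit-or-rollback pattern can be wrapped in a small context manager so it is not repeated at every call site. The sketch below is one possible shape, not part of pyhdb-rs: `transaction` is a hypothetical helper that relies only on the DB-API `commit()` and `rollback()` methods, and `sqlite3` with a `CHECK` constraint stands in for HANA to force a failure.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def transaction(conn):
    """Commit if the block succeeds; roll back and re-raise if it fails."""
    try:
        yield conn
    except Exception:
        conn.rollback()
        raise
    else:
        conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ACCOUNTS (ID INTEGER PRIMARY KEY, BALANCE REAL CHECK (BALANCE >= 0))"
)
conn.executemany("INSERT INTO ACCOUNTS VALUES (?, ?)", [(1, 50.0), (2, 0.0)])
conn.commit()

try:
    with transaction(conn):
        # The credit succeeds first, then the debit violates the CHECK constraint
        conn.execute("UPDATE ACCOUNTS SET BALANCE = BALANCE + 100 WHERE ID = 2")
        conn.execute("UPDATE ACCOUNTS SET BALANCE = BALANCE - 100 WHERE ID = 1")
except sqlite3.IntegrityError as exc:
    print(f"Transfer failed: {exc}")

# Both balances are unchanged because the credit was rolled back too
balances = conn.execute("SELECT ID, BALANCE FROM ACCOUNTS ORDER BY ID").fetchall()
```

Re-raising after the rollback lets the caller decide how to report the failure instead of hiding it inside the helper.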

## Next Steps

- **[02_polars_analytics.ipynb](./02_polars_analytics.ipynb)** - Advanced Polars analytics
- **[03_streaming_large_data.ipynb](./03_streaming_large_data.ipynb)** - Processing large datasets
- **[04_performance_comparison.ipynb](./04_performance_comparison.ipynb)** - Benchmarks vs hdbcli