> 📦 **Note:** Before running this notebook, make sure **BodoSQL and its related dependencies** are installed.

You can install them with one of the following commands:

- If you're working inside the repo:
  ```bash
  pip install -e ".[bodosql]"
  ```

- Or install BodoSQL directly:
  ```bash
  pip install bodosql
  ```

### 📦 Importing Required Libraries

In [5]:
import pydough
import bodosql
import datetime
import pandas as pd

### 🔐 BodoSQLContext

BodoSQL is a SQL compute engine that can work with data from various sources.
The [BodoSQL API](https://docs.bodo.ai/latest/api_docs/sql/database_catalogs/)
defines several ways to connect a BodoSQL context object to a data source, such
as a Snowflake account. This demo shows how to run PyDough with a BodoSQL context
that is connected to a Snowflake account.

1. **Load credentials from a local JSON file**:
   - The `creds.json` file contains your Snowflake login details: username, password, account name, database, and warehouse.
   - These are read using Python's built-in `json` module and stored in variables.


2. **Build the BodoSQLContext connected to the catalog**:
   - `SnowflakeCatalog` establishes the connection to the Snowflake account.
   - `BodoSQLContext` is the wrapper API used to access BodoSQL; it connects
   to the catalog.

3. **Connect PyDough to BodoSQL**:
   - `pydough.active_session.load_metadata_graph(...)` loads the named metadata graph (here `TPCH`) from a JSON file. The graph describes your Snowflake schema so that PyDough can translate queries against it.
   - `connect_database("bodosql", context=bc)` registers the BodoSQL context built in step 2 as the database that PyDough sends queries to.

📌 Make sure:
- The `creds.json` file exists and contains all the required keys.
- The metadata graph path points to a valid JSON file that represents your schema.
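
Both checks can be automated before any connection is attempted. The following is a minimal sketch, not part of PyDough or BodoSQL: the helper names `load_snowflake_creds` and `is_valid_json_file` are hypothetical, and the key list is assumed from the keys the loading cell in this section reads.

```python
import json
from pathlib import Path

# Assumption: these are the keys the loading cell reads from creds.json.
REQUIRED_KEYS = ("SF_USERNAME", "SF_PASSWORD", "SF_ACCOUNT", "SF_DATABASE", "SF_WH")


def load_snowflake_creds(path):
    """Hypothetical helper: load creds.json and fail early if a key is missing."""
    creds_path = Path(path)
    if not creds_path.is_file():
        raise FileNotFoundError(f"Credentials file not found: {creds_path}")
    with creds_path.open() as f:
        creds = json.load(f)
    # Report every missing key at once instead of failing on the first lookup.
    missing = [key for key in REQUIRED_KEYS if key not in creds]
    if missing:
        raise KeyError(f"Missing keys in {creds_path}: {', '.join(missing)}")
    return creds


def is_valid_json_file(path):
    """Hypothetical helper: return True if `path` exists and parses as JSON."""
    try:
        with open(path) as f:
            json.load(f)
    except (OSError, ValueError):
        # OSError covers a missing or unreadable file; ValueError covers bad JSON.
        return False
    return True
```

For example, `creds = load_snowflake_creds("./creds.json")` could stand in for the plain `json.load` call, and `is_valid_json_file(...)` could be run on the metadata graph path before it is passed to `load_metadata_graph`.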


In [None]:
import json

# Step 1: Load credentials from a JSON file
path_to_creds = "./creds.json"
with open(path_to_creds) as f:
    creds = json.load(f)

sf_username = creds["SF_USERNAME"]
sf_password = creds["SF_PASSWORD"]
sf_account = creds["SF_ACCOUNT"]
sf_tpch_db = creds["SF_DATABASE"]
sf_warehouse = creds["SF_WH"]

# Step 2: Build the BodoSQLContext connected to the Snowflake catalog
catalog = bodosql.SnowflakeCatalog(
    sf_username,
    sf_password,
    sf_account,
    sf_warehouse,
    sf_tpch_db
)
bc = bodosql.BodoSQLContext(catalog=catalog)

# Step 3: Load a sample metadata graph and connect PyDough to the BodoSQL context
pydough.active_session.load_metadata_graph("../../tests/test_metadata/snowflake_sample_graphs.json", "TPCH")
pydough.active_session.connect_database("bodosql", context=bc)

DatabaseContext(connection=<bodosql.context.BodoSQLContext object at 0x11f9a0c50>, dialect=<DatabaseDialect.SNOWFLAKE: 'snowflake'>)

### 🔌 Enabling PyDough's Jupyter Magic Commands

The cell below loads the `pydough.jupyter_extensions` module, which adds **custom magic commands** (like `%%pydough`) to the notebook.

These magic commands allow you to:
- Write PyDough directly in notebook cells using `%%pydough`
- Automatically render results

This is a Jupyter-specific feature: the `%load_ext` command dynamically loads these extensions into your current notebook session.


In [3]:
%load_ext pydough.jupyter_extensions

### 📊 TPC-H Query 1 Using PyDough DSL

This cell runs **TPC-H Query 1** using PyDough's Python-style DSL instead of raw SQL.

The query computes summary statistics (sums, averages, and counts) over line items, grouped by return flag and line status, keeping only line items shipped on or before a cutoff date.

Finally, `pydough.to_df(output)` executes the query and returns the result as a pandas DataFrame for inspection and further analysis in Python.
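
To make the aggregation concrete, here is a minimal plain-Python sketch of the same computation on a handful of made-up rows. It does not use PyDough or BodoSQL; the sample data and variable names are illustrative assumptions, and only the output column names mirror those used in the Q1 cell in this section.

```python
import datetime
from collections import defaultdict

# Made-up sample line items (not real TPC-H data):
# (return_flag, status, quantity, extended_price, discount, tax, ship_date)
rows = [
    ("A", "F", 10, 100.0, 0.10, 0.05, datetime.date(1994, 3, 1)),
    ("A", "F", 20, 300.0, 0.00, 0.10, datetime.date(1995, 6, 15)),
    ("N", "O", 5, 50.0, 0.20, 0.00, datetime.date(1998, 11, 30)),
    ("N", "O", 7, 70.0, 0.00, 0.00, datetime.date(1998, 12, 25)),  # after the cutoff
]
cutoff = datetime.date(1998, 12, 1)

# Filter by ship date, then bucket the remaining rows by (return_flag, status).
groups = defaultdict(list)
for flag, status, qty, price, disc, tax, ship_date in rows:
    if ship_date <= cutoff:
        groups[(flag, status)].append((qty, price, disc, tax))

# Compute one summary record per group, ordered by flag and then status.
summary = []
for (flag, status), items in sorted(groups.items()):
    count = len(items)
    summary.append(
        {
            "L_RETURNFLAG": flag,
            "L_LINESTATUS": status,
            "SUM_QTY": sum(q for q, _, _, _ in items),
            "SUM_BASE_PRICE": sum(p for _, p, _, _ in items),
            "SUM_DISC_PRICE": sum(p * (1 - d) for _, p, d, _ in items),
            "SUM_CHARGE": sum(p * (1 - d) * (1 + t) for _, p, d, t in items),
            "AVG_QTY": sum(q for q, _, _, _ in items) / count,
            "AVG_PRICE": sum(p for _, p, _, _ in items) / count,
            "AVG_DISC": sum(d for _, _, d, _ in items) / count,
            "COUNT_ORDER": count,
        }
    )

for record in summary:
    print(record)
```

The PyDough version expresses the same filter, grouping, and aggregates declaratively and leaves the execution to the database.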


In [6]:
%%pydough
# TPCH Q1
output = (
        lines.WHERE((ship_date <= datetime.date(1998, 12, 1)))
        .PARTITION(name="groups", by=(return_flag, status))
        .CALCULATE(
            L_RETURNFLAG=return_flag,
            L_LINESTATUS=status,
            SUM_QTY=SUM(lines.quantity),
            SUM_BASE_PRICE=SUM(lines.extended_price),
            SUM_DISC_PRICE=SUM(lines.extended_price * (1 - lines.discount)),
            SUM_CHARGE=SUM(
                lines.extended_price * (1 - lines.discount) * (1 + lines.tax)
            ),
            AVG_QTY=AVG(lines.quantity),
            AVG_PRICE=AVG(lines.extended_price),
            AVG_DISC=AVG(lines.discount),
            COUNT_ORDER=COUNT(lines),
        )
        .ORDER_BY(L_RETURNFLAG.ASC(), L_LINESTATUS.ASC())
)
# Execute the query and return the result as a pandas DataFrame
pydough.to_df(output)

Py4JJavaError: An error occurred while calling z:com.bodosql.calcite.application.PythonEntryPoint.buildRelationalAlgebraGenerator.
: java.lang.RuntimeException: Unable to load default schema from snowflake. Error message: net.snowflake.client.jdbc.SnowflakeSQLLoggedException: JDBC driver internal error: Fail to retrieve row count for first arrow chunk: sun.misc.Unsafe or java.nio.DirectByteBuffer.<init>(long, int) not available.
	at com.bodosql.calcite.catalog.SnowflakeCatalog.getDefaultSchemaImpl(SnowflakeCatalog.java:849)
	at com.bodosql.calcite.catalog.SnowflakeCatalog.getDefaultSchemaImpl(SnowflakeCatalog.java:847)
	at com.bodosql.calcite.catalog.SnowflakeCatalog.getDefaultSchema(SnowflakeCatalog.java:800)
	at com.bodosql.calcite.application.RelationalAlgebraGenerator.lambda$new$0(RelationalAlgebraGenerator.java:233)
	at com.bodosql.calcite.application.RelationalAlgebraGenerator.setupSchema(RelationalAlgebraGenerator.java:154)
	at com.bodosql.calcite.application.RelationalAlgebraGenerator.<init>(RelationalAlgebraGenerator.java:214)
	at com.bodosql.calcite.application.PythonEntryPoint$Companion.buildRelationalAlgebraGenerator(PythonEntryPoint.kt:559)
	at com.bodosql.calcite.application.PythonEntryPoint.buildRelationalAlgebraGenerator(PythonEntryPoint.kt)
	at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
	at java.base/java.lang.reflect.Method.invoke(Method.java:580)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.base/java.lang.Thread.run(Thread.java:1583)


Next, use the same context to execute SQL generated from PyDough, and also to
see the execution plan generated by BodoSQL.

In [None]:
%%pydough

# Count how many African customers are in the building market segment.
result = (
   TPCH
   .CALCULATE(n=COUNT(customers.WHERE((nation.region.name == "AFRICA") & (market_segment == "BUILDING"))))
)

as_sql = pydough.to_sql(result)
print("Generated SQL:\n", as_sql)

plan = bc.generate_plan(as_sql)
print("BodoSQL Query plan:\n", plan)

bc.sql(as_sql)