# MySQL PyDough Database Connector

## Initial Setup

### 1. MySQL Database

You can connect to your own MySQL database using your credentials (for example, a local MySQL server that you manage with MySQL Workbench).

### 2. Docker Image (TPC-H Database)

You can also test with our pre-built MySQL TPC-H database available on Docker Hub:

- Make sure you have Docker installed.
- Pull and run the image with the following command:
    ```shell
    docker run -d \
      --name [CONTAINER_NAME] \
      -p 3306:3306 \
      johnbodoai/pydough-mysql-tpch:latest
    ```
    Replace `[CONTAINER_NAME]` with your preferred container name.

The required environment variables for connecting are:
* `MYSQL_USERNAME=root`
* `MYSQL_PASSWORD=[PASSWORD]`

**Tip:**  
Store these credentials in a `.env` file in your project directory so that the notebook does not hard-code them.

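As a minimal sketch of how such a `.env` file could be read, the helper below parses simple `KEY=VALUE` lines and copies them into the process environment. The name `load_env_file` is hypothetical and not part of PyDough, and the parser assumes a plain file without multi-line values or variable expansion; a dedicated package such as `python-dotenv` is a common alternative.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Hypothetical helper: copy KEY=VALUE pairs from a .env file into os.environ.

    Returns a dict of the keys found in the file and their effective values.
    """
    loaded = {}
    env_path = Path(path)
    if not env_path.is_file():
        return loaded  # nothing to load; the caller decides how to react
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip("'\"")  # drop optional surrounding quotes
        # Variables already set in the environment take precedence.
        os.environ.setdefault(key, value)
        loaded[key] = os.environ[key]
    return loaded
```

Calling `load_env_file()` once at the top of the notebook would make `MYSQL_USERNAME` and `MYSQL_PASSWORD` visible to `os.getenv()`.
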
## Installing MySQL Connector

Make sure the `mysql-connector-python` package is installed:

- If you're working inside the repo:
    ```shell
    pip install -e ".[mysql]"
    ```

- Or install the connector directly with:
    ```shell
    pip install "mysql-connector-python"
    ```

## Importing Required Libraries

In [None]:
import pydough
import datetime
import os

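## Loading Credentials and Connecting to MySQL

1. Load credentials from a local `.env` file
    * The `.env` file contains your MySQL login details, such as `MYSQL_USERNAME` and `MYSQL_PASSWORD`.
    * These are read with `os.getenv()`, which only sees variables already present in the process environment. Load the `.env` file first (for example with a tool such as `python-dotenv`) or export the variables in your shell before starting Jupyter.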

2. MySQL-PyDough `connect_database()` parameters:
    * `user` (required): Username for the MySQL connection.
    * `password` (required): Password for the MySQL connection.
    * `database` (required): Name of the MySQL database.
    * `host` (optional): Hostname or IP address of the MySQL server. Default is "localhost" or "127.0.0.1".
    * `port` (optional): Port number used to access the MySQL server. Default is 3306.
    * `connection_timeout` (optional): Timeout, in seconds, for the MySQL connection. Default is 3.
    * `attempts` (optional): Number of times the connection is attempted. Default is 1.
    * `delay` (optional): Number of seconds to wait before another connection attempt. Default is 2.

3. Connect to MySQL using PyDough
    * `pydough.active_session.load_metadata_graph(...)` loads a metadata graph that maps your MySQL schema (used for query planning or optimizations).
    * `connect_database(...)` uses the loaded credentials to establish a live connection to your MySQL database.

Note:
- Make sure the `.env` file exists and contains all the required keys.
- Make sure the metadata graph path points to a valid JSON file that represents your schema.

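The two checks in the note can be sketched as a small pre-flight step that runs before connecting. The helper names `missing_env_keys` and `is_valid_json_file` are hypothetical and not part of PyDough; the sketch only verifies that the variables are set and that the file parses as JSON, not that its contents form a valid PyDough metadata graph.

```python
import json
import os


def missing_env_keys(required=("MYSQL_USERNAME", "MYSQL_PASSWORD")):
    """Hypothetical helper: return the required variables that are unset or empty."""
    return [key for key in required if not os.getenv(key)]


def is_valid_json_file(path):
    """Hypothetical helper: True if `path` can be opened and parsed as JSON."""
    try:
        with open(path, encoding="utf-8") as handle:
            json.load(handle)
    except (OSError, ValueError):
        # OSError: missing or unreadable file; ValueError: malformed JSON or bad encoding.
        return False
    return True
```

For example, `missing_env_keys()` returning a non-empty list, or `is_valid_json_file(...)` returning `False` for the metadata path, would be a reason to stop before calling `connect_database(...)`.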

In [None]:
mysql_username = os.getenv("MYSQL_USERNAME")
mysql_password = os.getenv("MYSQL_PASSWORD")
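mysql_tpch_db = "tpch"      # database name
mysql_host = "127.0.0.1"    # MySQL server address
connection_timeout = 2      # seconds
attempts = 2                # total connection attempts
delay = 5.0                 # seconds to wait between attempts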

pydough.active_session.load_metadata_graph("../../tests/test_metadata/sample_graphs.json", "TPCH")
pydough.active_session.connect_database(
    "mysql",
    user=mysql_username,
    password=mysql_password,
    database=mysql_tpch_db,
    host=mysql_host,
    connection_timeout=connection_timeout,
    attempts=attempts,
    delay=delay,
)

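The `attempts` and `delay` arguments can be read as a plain retry loop. The sketch below illustrates that reading only: `connect_with_retries` and the `connect` callable are hypothetical names, and this is not a description of how PyDough or `mysql-connector-python` implement retries internally.

```python
import time


def connect_with_retries(connect, attempts=1, delay=2.0):
    """Hypothetical helper: call `connect()` up to `attempts` times.

    Sleeps `delay` seconds after each failed try except the last one,
    then re-raises the last error if every attempt failed.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except ConnectionError as error:
            last_error = error
            if attempt < attempts:
                time.sleep(delay)  # wait before the next attempt
    raise last_error
```

Under this reading, `attempts=2` and `delay=5.0` mean that a failed first try is followed by a 5-second pause and one more try.
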
## Enabling PyDough's Jupyter Magic Commands

The cell below loads the `pydough.jupyter_extensions` module, which adds custom magic commands (like `%%pydough`) to the notebook.

These magic commands allow you to:

- Write PyDough directly in notebook cells using `%%pydough`
- Automatically render results

This is a Jupyter-specific feature: the `%load_ext` command dynamically loads these extensions into your current notebook session.

In [None]:
%load_ext pydough.jupyter_extensions

## TPC-H Query 1 with PyDough and MySQL

The cell below runs TPC-H Query 1 using PyDough's Python-style DSL instead of raw SQL.

The query computes summary statistics (sums, averages, and counts) over line items, grouped by return flag and line status, and filtered to rows whose ship date is on or before December 1, 1998.

Finally, `pydough.to_df(output)` converts the result to a Pandas DataFrame for easy inspection and analysis in Python.

In [None]:
%%pydough
# TPCH Q1
output = (lines.WHERE((ship_date <= datetime.date(1998, 12, 1)))
        .PARTITION(name="groups", by=(return_flag, status))
        .CALCULATE(
            L_RETURNFLAG=return_flag,
            L_LINESTATUS=status,
            SUM_QTY=SUM(lines.quantity),
            SUM_BASE_PRICE=SUM(lines.extended_price),
            SUM_DISC_PRICE=SUM(lines.extended_price * (1 - lines.discount)),
            SUM_CHARGE=SUM(
                lines.extended_price * (1 - lines.discount) * (1 + lines.tax)
            ),
            AVG_QTY=AVG(lines.quantity),
            AVG_PRICE=AVG(lines.extended_price),
            AVG_DISC=AVG(lines.discount),
            COUNT_ORDER=COUNT(lines),
        )
        .ORDER_BY(L_RETURNFLAG.ASC(), L_LINESTATUS.ASC())
)

pydough.to_df(output)
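
To make the aggregates concrete, the sketch below recomputes the same columns in plain Python over a few made-up rows. The rows are illustrative only and are not real TPC-H data, and the `summarize` function is a hypothetical stand-in that mirrors the filter, grouping, and ordering of the query; it does not use PyDough or MySQL.

```python
from collections import defaultdict
from datetime import date

# Made-up sample rows for illustration only (not real TPC-H data).
rows = [
    {"return_flag": "A", "status": "F", "quantity": 10, "extended_price": 100.0,
     "discount": 0.5, "tax": 0.25, "ship_date": date(1995, 1, 1)},
    {"return_flag": "A", "status": "F", "quantity": 30, "extended_price": 200.0,
     "discount": 0.25, "tax": 0.5, "ship_date": date(1996, 6, 15)},
    {"return_flag": "N", "status": "O", "quantity": 5, "extended_price": 40.0,
     "discount": 0.0, "tax": 0.5, "ship_date": date(1998, 11, 30)},
    {"return_flag": "N", "status": "O", "quantity": 7, "extended_price": 70.0,
     "discount": 0.5, "tax": 0.0, "ship_date": date(1998, 12, 2)},  # after the cutoff
]


def summarize(rows, cutoff=date(1998, 12, 1)):
    """Group rows shipped on or before `cutoff` by (return_flag, status) and aggregate."""
    groups = defaultdict(list)
    for row in rows:
        if row["ship_date"] <= cutoff:
            groups[(row["return_flag"], row["status"])].append(row)

    result = []
    for flag, status in sorted(groups):  # ascending by flag, then status
        members = groups[(flag, status)]
        n = len(members)
        disc_prices = [m["extended_price"] * (1 - m["discount"]) for m in members]
        charges = [dp * (1 + m["tax"]) for dp, m in zip(disc_prices, members)]
        result.append({
            "L_RETURNFLAG": flag,
            "L_LINESTATUS": status,
            "SUM_QTY": sum(m["quantity"] for m in members),
            "SUM_BASE_PRICE": sum(m["extended_price"] for m in members),
            "SUM_DISC_PRICE": sum(disc_prices),
            "SUM_CHARGE": sum(charges),
            "AVG_QTY": sum(m["quantity"] for m in members) / n,
            "AVG_PRICE": sum(m["extended_price"] for m in members) / n,
            "AVG_DISC": sum(m["discount"] for m in members) / n,
            "COUNT_ORDER": n,
        })
    return result
```

With these sample rows, the fourth row is dropped by the date filter, so `summarize(rows)` yields one group for `("A", "F")` with two rows and one for `("N", "O")` with a single row.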