# OKDP Sandbox Jupyter Notebooks

Welcome to the OKDP Sandbox Workspace!  
This environment contains a collection of notebooks designed to help you **explore, test, and validate OKDP services** through practical, real-world scenarios.

These notebooks cover a wide range of topics, including:

- Data exploration and sampling strategies  
- Trino SQL queries and performance checks  
- Python data workflows with Pandas and Altair  
- Interactive analytics and visualization  
- End-to-end demonstrations of OKDP capabilities  

Among these resources, you will also find examples of two complementary approaches to interacting with Trino:

- **Python + SQLAlchemy workflows**  
- **SQL Magic (`%sql`, `%%sql`) workflows**  

The workspace is **not limited to these two approaches**: more notebooks, tests, and examples will be added as it grows.

Use this hub as your starting point to navigate the notebook ecosystem and explore OKDP in a sandbox environment.

---

## Trino-Backed Analysis Notebooks

The following notebooks analyze the NYC Yellow Taxi dataset **using Trino as the compute engine**, focusing on:

- Random sampling strategies executed directly on Trino  
- Time-based feature engineering powered by Trino SQL functions  
- Interactive Altair visualizations built on top of Trino query results  
- Data quality checks performed over Trino-managed tables  
- Explorations of distance, fare, and passenger behavior using Trino-backed datasets  
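As a rough illustration of the first two points, the sketch below builds a Trino query that samples rows with `TABLESAMPLE BERNOULLI` and derives time features with the `hour` and `day_of_week` functions. The helper name `build_sample_query`, the table name in the usage line, and the column names (taken from the public Yellow Taxi schema) are assumptions for illustration; the notebooks may use different names and queries.

```python
def build_sample_query(table: str, percent: float) -> str:
    """Return a Trino query that samples rows and adds time-based features.

    `table` is interpolated directly into the SQL text, so it must come
    from trusted code and never from user input.
    """
    if not 0 < percent <= 100:
        raise ValueError("percent must be in the range (0, 100]")
    return (
        "SELECT\n"
        "  tpep_pickup_datetime,\n"
        # hour() and day_of_week() are built-in Trino date/time functions.
        "  hour(tpep_pickup_datetime) AS pickup_hour,\n"
        "  day_of_week(tpep_pickup_datetime) AS pickup_dow,\n"
        "  trip_distance,\n"
        "  fare_amount,\n"
        "  passenger_count\n"
        # BERNOULLI keeps each row independently with the given probability.
        f"FROM {table} TABLESAMPLE BERNOULLI ({percent:g})"
    )


# Hypothetical fully qualified table name (catalog.schema.table).
query = build_sample_query("lakehouse.nyc.yellow_taxi", 1)
print(query)
```

Sampling inside Trino keeps the data transferred to the notebook small, which helps when the result feeds Pandas and Altair.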

### 1. Using Python and SQLAlchemy
- **Notebook:** `nyc_trip/raw_trino_python.ipynb`
- **Description:** A full workflow using Python, SQLAlchemy, and Pandas to connect to Trino, query the NYC Yellow Taxi dataset, and build visualizations with Altair.
- ➡️ **Open:** [NYC Taxi Analysis — Python Version](./notebooks/nyc_trip/raw_trino_python.ipynb)
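For orientation, here is a minimal sketch of how such a workflow might connect to Trino. It assumes the `trino` Python client with its SQLAlchemy dialect and `pandas` are installed. The helper names `trino_url` and `query_to_dataframe`, and the host, user, catalog, and schema values, are hypothetical; the notebook itself may configure the connection differently.

```python
from urllib.parse import quote


def trino_url(user, host, port=8080, catalog=None, schema=None):
    """Build a SQLAlchemy URL of the form trino://user@host:port/catalog/schema."""
    url = f"trino://{quote(user, safe='')}@{host}:{port}"
    if catalog:
        url += f"/{catalog}"
        if schema:
            url += f"/{schema}"
    return url


def query_to_dataframe(url, sql):
    """Run a query against Trino and return the result as a Pandas DataFrame."""
    # Imported lazily so the URL helper works without these packages installed.
    import pandas as pd
    from sqlalchemy import create_engine, text

    engine = create_engine(url)
    with engine.connect() as conn:
        return pd.read_sql(text(sql), conn)


# Hypothetical connection values; replace them with those of your sandbox.
url = trino_url("analyst", "trino.example.internal", 8080, "lakehouse", "nyc")
print(url)
# df = query_to_dataframe(url, "SELECT 1 AS ok")  # requires a reachable Trino server
```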


### 2. Using SQL
- **Notebook:** `nyc_trip/raw_trino_sql.ipynb`
- **Description:** The same analysis, implemented with [JupySQL](https://github.com/ploomber/jupysql) (`%sql`, `%%sql`) to run SQL queries directly against Trino, while still using Pandas and Altair for analysis and visualization.
- ➡️ **Open:** [NYC Taxi Analysis — SQL Magic Version](./notebooks/nyc_trip/raw_trino_sql.ipynb)

---

## Contribute

- Explore related examples in the official [OKDP examples](https://github.com/okdp/okdp-examples) repository.
- Have ideas, improvements, or new examples to share?  
  **Contributions are welcome!** Feel free to open issues or submit pull requests.  
