# Introduction and notebook template

This notebook serves as a template: it sets up a common structure and a few code idioms that are useful across the notebooks of this dataset exploration.
First, we load the following IPython extensions.

1. `autoreload` facilitates ongoing development of module `acme4_explore`.
2. `dotenv` provides configuration facilities.
3. `quak` is a gorgeous table browsing extension for data frames.
4. `sql` makes it much easier to wrangle DuckDB queries.

In [1]:
%load_ext autoreload
%load_ext dotenv
%load_ext quak
%load_ext sql

We configure `autoreload` so that edits to `acme4_explore` take effect without restarting the kernel.
Mode `1` reloads only the modules marked with `%aimport`, so as new submodules get added, add an `%aimport` statement for each of them.

In [2]:
%autoreload 1
%aimport acme4_explore

Load configuration settings stored in `.env`.

In [3]:
%dotenv
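
For readers unfamiliar with `.env` files, the sketch below illustrates the effect of this step: `KEY=VALUE` lines become environment variables. It is a simplified stand-in written for this explanation, not the extension's actual implementation, and the helper name `load_env_text` is hypothetical.

```python
import os


def load_env_text(text, environ=None):
    """Parse KEY=VALUE lines into `environ` (hypothetical, simplified helper)."""
    environ = os.environ if environ is None else environ
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        environ[key.strip()] = value
    return environ


# Example content only; the real keys depend on your setup.
settings = load_env_text("""
# Per-logger levels, stored as a JSON object
LOG_LEVEL = '{"acme4_explore": "DEBUG"}'
""", environ={})
print(settings["LOG_LEVEL"])
```

Passing a plain dictionary as `environ` keeps the example from touching the real process environment.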

With the extensions configured, we can now run the imports.

In [4]:
import acme4_explore
import logging
import os

Logging is worth configuring up front, since `acme4_explore` reports its progress through log messages.
Function `acme4_explore.logging_config` assists with doing so uniformly across notebooks.

In [5]:
logging.basicConfig(**acme4_explore.logging_config())
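
The exact dictionary returned by `acme4_explore.logging_config` is not reproduced in this notebook. As a rough sketch, a configuration like the following would produce log lines in the `time | level | logger | message` layout seen in the outputs further down. The function name `sketch_logging_config` and the column widths are assumptions.

```python
import logging


def sketch_logging_config():
    """Hypothetical stand-in for acme4_explore.logging_config()."""
    return {
        "level": logging.INFO,
        # Pad the level and logger name so that the columns line up.
        "format": "%(asctime)s | %(levelname)-8s | %(name)-24s | %(message)s",
        "datefmt": "%H:%M:%S",
    }


# Format one record by hand to show the resulting layout.
config = sketch_logging_config()
formatter = logging.Formatter(config["format"], datefmt=config["datefmt"])
record = logging.LogRecord(
    "acme4_explore", logging.INFO, "notebook", 0,
    "Access dataset over httpfs", None, None,
)
line = formatter.format(record)
print(line)
```

Since the function returns keyword arguments, it can be passed along in the same way as above: `logging.basicConfig(**sketch_logging_config())`.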

A work directory is available for storing intermediate artifacts,
as well as labels or other data we find useful.
Its location is set in `.env`.

In [6]:
with acme4_explore.dir_work() as work_dir:  # avoid shadowing the built-in dir()
    print(work_dir)
    print(work_dir.resolve())

.work
/work/home/hamelin/Wintap-Analytics/2025-acme4-explore/.work
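
How `dir_work` is implemented is not shown here. A minimal sketch of the idea, assuming the location comes from an environment variable, could look like the following. The variable name `DIR_WORK`, the `.work` default (a guess based on the output above) and the choice to create the directory on demand are all assumptions.

```python
import os
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def sketch_dir_work(environ=None):
    """Yield the work directory as a Path, creating it if needed (hypothetical)."""
    environ = os.environ if environ is None else environ
    # DIR_WORK is an assumed variable name; fall back to a relative `.work`.
    path = Path(environ.get("DIR_WORK", ".work"))
    path.mkdir(parents=True, exist_ok=True)
    yield path


# Try it in a throwaway location so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    with sketch_dir_work({"DIR_WORK": os.path.join(tmp, "work")}) as work_dir:
        (work_dir / "notes.txt").write_text("labels go here\n")
        created = work_dir.is_dir()
        note = (work_dir / "notes.txt").read_text()
print(created, repr(note))
```

Yielding a `Path` from a context manager matches how the directory is used in the cell above, where the result supports `resolve()`.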


Connect to the standard view of the dataset using DuckDB, with the files generated in the work directory appended to it.
The cell below also registers the connection with the `%sql` magic, so that we may query the database with it (either as line or cell magic).
Note that this DuckDB instance is read-only:
to add new data, we write new Parquet files in the work directory.

> Hint: if you want to see the progress of the construction of the database, run in a cell: `logging.getLogger("acme4_explore").setLevel("DEBUG")`.
> My own `.env` is set up to always set the logging level of this module by having: `LOG_LEVEL = '{"acme4_explore": "DEBUG"}'`.
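
The `LOG_LEVEL` value in the hint is a JSON object mapping logger names to levels. How `acme4_explore` consumes it is not shown in this notebook; the sketch below is one plausible way to apply such a setting, with `apply_log_levels` a hypothetical helper.

```python
import json
import logging
import os


def apply_log_levels(environ=None):
    """Set per-logger levels from a JSON object stored in LOG_LEVEL (hypothetical)."""
    environ = os.environ if environ is None else environ
    levels = json.loads(environ.get("LOG_LEVEL", "{}"))
    for name, level in levels.items():
        # setLevel accepts level names such as "DEBUG" as well as integers.
        logging.getLogger(name).setLevel(level)
    return levels


applied = apply_log_levels({"LOG_LEVEL": '{"acme4_explore": "DEBUG"}'})
is_debug = logging.getLogger("acme4_explore").level == logging.DEBUG
print(applied, is_debug)
```

Keeping the mapping in JSON lets a single variable carry levels for several loggers at once.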

In [7]:
db = acme4_explore.connect_db()
%sql db --alias duckdb
%config SqlMagic.displaycon=False
%config SqlMagic.autopandas=True

15:49:20 | INFO     | acme4_explore            | Access dataset over httpfs
15:49:20 | DEBUG    | acme4_explore            | Add standard view to all_files over https://gdo168.llnl.gov/data/ACME4/stdview-20240819-20240923/all_files.parquet
15:49:21 | DEBUG    | acme4_explore            | Add standard view to files over https://gdo168.llnl.gov/data/ACME4/stdview-20240819-20240923/files.parquet
15:49:21 | DEBUG    | acme4_explore            | Add standard view to host over https://gdo168.llnl.gov/data/ACME4/stdview-20240819-20240923/host.parquet
15:49:22 | DEBUG    | acme4_explore            | Add standard view to host_ip over https://gdo168.llnl.gov/data/ACME4/stdview-20240819-20240923/host_ip.parquet
15:49:22 | DEBUG    | acme4_explore            | Add standard view to labels_graph_net_conn over https://gdo168.llnl.gov/data/ACME4/stdview-20240819-20240923/labels_graph_net_conn.parquet
15:49:23 | DEBUG    | acme4_explore            | Add standard view to labels_graph_nodes over https://

So what views can we access in this database?

In [8]:
%%sql
select table_name, table_type, is_insertable_into
from information_schema.tables

Widget(sql='SELECT * FROM "df"')