# Simple Process Tree Builder

This notebook reads one or more Wintap process parquet files with DuckDB and builds a NetworkX process tree graph from them.

Workflow is:

1. Map process parquet into a duckdb view
2. Create an iterator over all rows in the view using the `wg.add_all()` function.
    Note: the current example function is deliberately simple, but it could be extended with more complex filtering, joins, etc.
3. Pass the iterator to the build graph function which adds nodes with properties and parent->child relationships
4. Display a few simple metrics about the graph
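
Steps 3 and 4 can be sketched with plain NetworkX as follows. This is a minimal illustration, not the `wintapgraph` implementation: the `build_tree` helper, the sample rows, and the column names (`pid_hash`, `parent_pid_hash`, `process_name`) are assumptions made for the example, and the graph is assumed to be directed.

```python
import networkx as nx

# Hypothetical rows standing in for the result of the process query.
# Column names are assumptions for this sketch, not the confirmed schema.
rows = [
    {"pid_hash": "A", "parent_pid_hash": None, "process_name": "wininit.exe"},
    {"pid_hash": "B", "parent_pid_hash": "A", "process_name": "services.exe"},
    {"pid_hash": "C", "parent_pid_hash": "B", "process_name": "svchost.exe"},
    {"pid_hash": "D", "parent_pid_hash": None, "process_name": "explorer.exe"},
    {"pid_hash": "E", "parent_pid_hash": "D", "process_name": "cmd.exe"},
]


def build_tree(rows):
    """Add one node per process and one parent->child edge per known parent."""
    rows = list(rows)  # materialize so an iterator can be walked twice
    graph = nx.DiGraph()
    for row in rows:
        graph.add_node(row["pid_hash"], process_name=row["process_name"])
    for row in rows:
        parent = row["parent_pid_hash"]
        # Skip root processes and parents that are absent from the data set
        if parent is not None and parent in graph:
            graph.add_edge(parent, row["pid_hash"])
    return graph


tree = build_tree(iter(rows))

# Simple metrics: each weakly connected component is one process tree
component_sizes = sorted(
    (len(c) for c in nx.weakly_connected_components(tree)), reverse=True
)
print(tree.number_of_nodes(), tree.number_of_edges(), component_sizes)
```

Materializing the rows costs memory on large data sets; a single-pass variant could add edges directly and let NetworkX create placeholder nodes for parents it has not seen yet.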

## Setup
Download data from https://gdo-wintap.llnl.gov

A good starting set is: https://gdo168.llnl.gov/data/ACME-2023/stdview-20231109-20231111/process_summary.parquet

If you'd like more data, look into the longer date ranges.

Modify the path in the "create view" statement below to point to where you downloaded the `process_summary.parquet` file.

In [None]:
# Import the packages used in this notebook
import duckdb
import networkx as nx
import wintapgraph as wg
import pandas as pd
%load_ext magic_duckdb
pd.set_option('display.max_rows', 200)

In [None]:
# Initialize an in-memory database, keep a reference to the connection, and
# register it with magic-duckdb. The same DB instance is then usable from both
# Python code and the %dql/%%dql magics.
con = duckdb.connect()
%dql -co con
# This notebook only needs the process view
%dql create view process as from '~/data/wintapv6/ACME-Redo/stdview-20231109-20231111/process_summary.parquet'
# Display a simple summary of the process view
%dql summarize process


In [3]:
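# Create an iterator over all rows in the process view
processes = wg.add_all(con)
# Build the graph: nodes with properties plus parent->child relationships
netg = wg.build_process_tree_graph(con, processes)
netg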

In [None]:
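# Component sizes, largest first (if netg is a DiGraph, use nx.weakly_connected_components instead)
[len(c) for c in sorted(nx.connected_components(netg), key=len, reverse=True)]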

In [None]:
# Total number of connected components in the graph
nx.number_connected_components(netg)

In [None]:
# Row count of the process view, for comparison with the graph size
%dql select count(*) from process