# BI 4


In [1]:
from dg_builder import *

- original query path

> forum -> person -> city -> country

- indexed query path (the three hops replaced by one precomputed index edge)

> forum -> country
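The indexed variant collapses the three-hop path into a single `forum -> country` edge. A minimal sketch of how such index edges could be derived by composing per-hop adjacency maps, assuming each hop is available as a `dict` from a vertex to its neighbours of the next label; `compose_path` and the toy IDs are hypothetical and not part of `dg_builder`:

```python
from collections import defaultdict


def compose_path(hops: list[dict[int, set[int]]]) -> set[tuple[int, int]]:
    """Compose per-hop adjacency maps into (start, end) index edges.

    hops[i] maps a vertex of the i-th label to its neighbours of the (i+1)-th label
    (hypothetical layout, chosen for illustration).
    """
    if not hops:
        return set()
    # frontier[start] = vertices reachable from `start` after the hops processed so far
    frontier: dict[int, set[int]] = {src: set(dsts) for src, dsts in hops[0].items()}
    for hop in hops[1:]:
        next_frontier: dict[int, set[int]] = defaultdict(set)
        for start, mids in frontier.items():
            for mid in mids:
                next_frontier[start] |= hop.get(mid, set())
        frontier = next_frontier
    # Flatten to a set, so several paths between the same endpoints yield one edge
    return {(start, end) for start, ends in frontier.items() for end in ends}


# Toy data: forums 0 and 1, persons 10 and 11, cities 20 and 21, country 30
forum_person = {0: {10, 11}, 1: {11}}
person_city = {10: {20}, 11: {21}}
city_country = {20: {30}, 21: {30}}

print(sorted(compose_path([forum_person, person_city, city_country])))  # [(0, 30), (1, 30)]
```

Forum 0 reaches country 30 through two different persons, yet the sketch emits a single index edge for it.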


## Original


In [2]:
""" build `original data graph` """

build_original_dg(original_dg_filepath=BI_4_DG, optimized_dg_filepath=BI_4_DG_OPTIMIZED)

Mapping `origin_id` to `uni_id`: 100%|██████████| 3181724/3181724 [00:02<00:00, 1107599.15it/s]
Build map of `vertex.uni_id -> label`: 100%|██████████| 3181724/3181724 [00:01<00:00, 1636471.53it/s]
Build edges in format: `(src_id, dst_id)`: 100%|██████████| 17256038/17256038 [00:13<00:00, 1298749.36it/s]

File `./out/original/data_graph.txt` already exists
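The progress bars above name three steps: mapping `origin_id` to `uni_id`, building a `uni_id -> label` map, and rewriting edges as `(src_id, dst_id)`. A minimal sketch of those steps, assuming vertices are keyed by `(label, origin_id)` because origin IDs of different entity types may overlap; `build_uni_ids` and `remap_edges` are hypothetical helpers, not the `dg_builder` implementation:

```python
from collections.abc import Iterable

Vertex = tuple[str, int]  # (label, origin_id); keying by label is an assumption


def build_uni_ids(vertices: Iterable[Vertex]) -> tuple[dict[Vertex, int], list[str]]:
    """Assign dense `uni_id`s in first-seen order; labels[uni_id] is the vertex label."""
    uni_id: dict[Vertex, int] = {}
    labels: list[str] = []
    for vertex in vertices:
        if vertex not in uni_id:
            uni_id[vertex] = len(labels)
            labels.append(vertex[0])
    return uni_id, labels


def remap_edges(
    raw_edges: Iterable[tuple[Vertex, Vertex]],
    uni_id: dict[Vertex, int],
) -> list[tuple[int, int]]:
    """Rewrite edges from (label, origin_id) endpoints to `(src_id, dst_id)` uni_ids."""
    return [(uni_id[src], uni_id[dst]) for src, dst in raw_edges]


# forum 7 and person 7 share an origin_id but must stay distinct vertices
uni_id, labels = build_uni_ids([("forum", 7), ("person", 7), ("city", 3)])
edges = remap_edges([(("forum", 7), ("person", 7)), (("person", 7), ("city", 3))], uni_id)
print(labels, edges)  # ['forum', 'person', 'city'] [(0, 1), (1, 2)]
```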

In [3]:
""" labels & edges """

edges = [(0, 1), (1, 2), (2, 3)]
labels = ["forum", "person", "city", "country"]
task_names = [["bi_4_query_graph"]]

original_builder = QueryBuilder(
    edges=edges,
    labels=labels,
    raw_task_names=task_names,
    QG_PRE=BI_4_ORIGINAL_Q_PRE,
    LOG_PRE=BI_4_ORIGINAL_L_PRE,
    args_starting=bi_4_original_args_starting,
    kwargs={},
)

original_builder.build()

QueryBuilder {
    edges: [(0, 1), (1, 2), (2, 3)],
    labels: ['forum', 'person', 'city', 'country'],
    raw_task_names: [['bi_4_query_graph']],
    QG_PRE: ./out/original/BI_4,
    LOG_PRE: ./log/original/BI_4,
    args_starting: ['wsl', './VEQ_M_100k', '-dg', './out/original/data_graph.txt', '-qg'],
    replace_indices: [],
    replace_wrapper: <function QueryBuilder.<lambda> at 0x000001E31839D080>,
    kwargs: {'bi_4_query_graph': QGMetaRecord(labels=['forum', 'person', 'city', 'country'], edges=[(0, 1), (1, 2), (2, 3)])},
}
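Judging from the run log (`Query file: ./out/original/BI_4/bi_4_query_graph.txt`), each task appears to be executed as `args_starting` followed by `<QG_PRE>/<task_name>.txt`. A minimal sketch of that command assembly, under that assumption; `build_command` is hypothetical and the actual `QueryBuilder.run` implementation is not shown here:

```python
import posixpath


def build_command(args_starting: list[str], qg_pre: str, task_name: str) -> list[str]:
    """Append the query-graph file `<QG_PRE>/<task_name>.txt` after the trailing `-qg` flag."""
    # posixpath keeps the leading "./" (pathlib would normalize it away)
    query_file = posixpath.join(qg_pre, f"{task_name}.txt")
    return [*args_starting, query_file]


cmd = build_command(
    ["wsl", "./VEQ_M_100k", "-dg", "./out/original/data_graph.txt", "-qg"],
    "./out/original/BI_4",
    "bi_4_query_graph",
)
print(cmd)
```

Passing `cmd` to `subprocess.run(cmd, capture_output=True, text=True)` would then capture the matcher's stdout for timing extraction.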

In [4]:
""" exec """

time_table = original_builder.run()
assert len(time_table) == 1

>>> Running: bi_4_query_graph...
    Data file: ./out/original/data_graph.txt
    Query file: ./out/original/BI_4/bi_4_query_graph.txt
    Output file: 
    Sum of |C(u)|: 89019
    Total Recursive Call Count: 160
    Number of Matches: 100120
    Filtering Time (ms): 121.951
    Verification Time (ms): 238.711
    Processing Time (ms): 360.662
<<< Done! (Outer Elapsed Time: 19839.9591 ms)


In [5]:
time_table

[360.662]
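The single entry in `time_table` equals the logged `Processing Time (ms)`, which in turn is the sum of the two phases: 121.951 + 238.711 = 360.662 ms. The optimized run below prints no such line and ends up as `nan`. A minimal sketch of extracting that value from the matcher's stdout with a NaN fallback, assuming this is roughly how `time_table` is filled (an assumption; `parse_processing_time` is hypothetical):

```python
import math
import re

# Matches e.g. "Processing Time (ms): 360.662"
_PROCESSING_RE = re.compile(r"Processing Time \(ms\):\s*(\d+(?:\.\d+)?)")


def parse_processing_time(stdout: str) -> float:
    """Return the reported processing time in ms, or NaN when the line is missing."""
    match = _PROCESSING_RE.search(stdout)
    return float(match.group(1)) if match else math.nan


original_log = """\
    Filtering Time (ms): 121.951
    Verification Time (ms): 238.711
    Processing Time (ms): 360.662
"""
optimized_log = "    Sum of |C(u)|: 53961\n"

print(parse_processing_time(original_log))   # 360.662
print(parse_processing_time(optimized_log))  # nan
```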

## Optimized


In [6]:
""" build `optimized data graph` """

index_csv_filenames = ["forum_person_city_country"]

build_optimized_dg(
    optimized_dg_filepath=BI_4_DG_OPTIMIZED,
    index_csv_filenames=index_csv_filenames,
)

Adding `index edge` into `edges`: 100%|██████████| 90492/90492 [00:00<00:00, 1129592.41it/s]
Writing `labels` into `./out/optimized/BI_4/data_graph.txt`: 100%|██████████| 3181724/3181724 [00:01<00:00, 1916326.24it/s]
Writing `edges` into `./out/optimized/BI_4/data_graph.txt`: 100%|██████████| 17346525/17346525 [00:16<00:00, 1043856.72it/s]
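The vertex count is unchanged (3,181,724), while the edge count grows from 17,256,038 to 17,346,525 after processing 90,492 index edges. Since 17,256,038 + 90,492 = 17,346,530, five index edges were not written; the notebook does not say why, and duplicate index edges are only one possible explanation. A minimal sketch of merging index edges while skipping duplicates, assuming edges are `(src_id, dst_id)` tuples of `uni_id`s; `add_index_edges` is hypothetical:

```python
def add_index_edges(
    edges: list[tuple[int, int]],
    index_edges: list[tuple[int, int]],
) -> tuple[list[tuple[int, int]], int]:
    """Append index edges that are not already present; return (merged, number_added)."""
    seen = set(edges)
    merged = list(edges)
    added = 0
    for edge in index_edges:
        # Skip edges already in the graph or repeated within `index_edges`
        if edge not in seen:
            seen.add(edge)
            merged.append(edge)
            added += 1
    return merged, added


merged, added = add_index_edges([(0, 1), (1, 2)], [(0, 3), (0, 3), (0, 1)])
print(merged, added)  # [(0, 1), (1, 2), (0, 3)] 1
```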


In [7]:
""" labels & edges """

edges_optimized: list[tuple[int, int]] = [(0, 1)]
labels_optimized = ["forum", "country"]

optimized_builder = QueryBuilder(
    edges=edges_optimized,
    labels=labels_optimized,
    raw_task_names=task_names,
    QG_PRE=BI_4_OPTIMIZED_Q_PRE,
    LOG_PRE=BI_4_OPTIMIZED_L_PRE,
    args_starting=bi_4_optimized_args_starting,
    kwargs={},
)

optimized_builder.build()

QueryBuilder {
    edges: [(0, 1)],
    labels: ['forum', 'country'],
    raw_task_names: [['bi_4_query_graph']],
    QG_PRE: ./out/optimized/BI_4,
    LOG_PRE: ./log/optimized/BI_4,
    args_starting: ['wsl', './VEQ_M_100k', '-dg', './out/optimized/BI_4/data_graph.txt', '-qg'],
    replace_indices: [],
    replace_wrapper: <function QueryBuilder.<lambda> at 0x000001E31839D080>,
    kwargs: {'bi_4_query_graph': QGMetaRecord(labels=['forum', 'country'], edges=[(0, 1)])},
}

In [8]:
""" exec """

time_table_optimized = optimized_builder.run()
assert len(time_table_optimized) == 1

>>> Running: bi_4_query_graph...
    Data file: ./out/optimized/BI_4/data_graph.txt
    Query file: ./out/optimized/BI_4/bi_4_query_graph.txt
    Output file: 
    Sum of |C(u)|: 53961
<<< Done! (Outer Elapsed Time: 16978.2335 ms)
--- ^^^^^^^^ `time_table` will be filled with `float("NaN")` only for marking. ---


In [9]:
time_table_optimized

[nan]

In [10]:
""" Show BI-4 `comparison data-frame` """
import polars as pl  # explicit import; otherwise `pl` must come from `from dg_builder import *`

print("Comparison between: `original_match` & `optimized_match`")

df = pl.DataFrame(
    {
        "task": task_names,
        "original (ms)": time_table,
        "optimized (ms)": time_table_optimized,
    }
)
df

Comparison between: `original_match` & `optimized_match`

| task | original (ms) | optimized (ms) |
| --- | --- | --- |
| *list[str]* | *f64* | *f64* |
| ["bi_4_query_graph"] | 360.662 | NaN |
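Because the optimized run reported no processing time, no speedup can be computed for BI 4 from this notebook. A minimal sketch of a speedup helper that propagates the NaN marker instead of producing a misleading number; `speedup` is hypothetical:

```python
import math


def speedup(original_ms: float, optimized_ms: float) -> float:
    """Ratio original/optimized; NaN when either timing is missing or the optimized time is 0."""
    if math.isnan(original_ms) or math.isnan(optimized_ms) or optimized_ms == 0:
        return math.nan
    return original_ms / optimized_ms


print(speedup(360.662, math.nan))  # nan
```

In polars the same column could be derived with `df.with_columns((pl.col("original (ms)") / pl.col("optimized (ms)")).alias("speedup"))`; dividing by a NaN entry yields NaN for that row.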
