# Step 3: Virtualize bulk data through an external service

In this final version of our story, we will again write a Table with manually created bulk data URLs.
But instead of writing the bulk data to files, we will only write a lookup table, which an
external HTTP server then uses to serve chunks dynamically.

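The trick: we need to know how a 3LC bulk data URL resolves to a slice of our source data. Each URL we create has the form `<server_root>/chunk-<n>:<offset>-<length>`, where `<offset>` and `<length>` are measured in bytes within chunk `<n>`.

For example, when receiving a request for "http://localhost:2233/chunk-0:100-50", we must check a lookup table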

```
{
    "chunk-0": {
        "0-100": {"sample": 0, "attribute": "vertices_3d"},
        "100-50": {"sample": 0, "attribute": "intensities"},
        "150-100": {"sample": 1, "attribute": "vertices_3d"},
        "250-50": {"sample": 1, "attribute": "intensities"}
    },
    "chunk-1": ...
    ...
}
```

to find the sample and attribute that the requested byte range belongs to.
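As a sketch of that lookup step, the helpers below split a URL into chunk id, offset and length, then index the table with the same `<offset>-<length>` key. The names `parse_bulk_url` and `resolve` are hypothetical (not part of 3LC), and the toy table only assumes the shape shown above.

```python
from urllib.parse import urlparse


def parse_bulk_url(url: str) -> tuple[str, int, int]:
    """Split a bulk data URL into (chunk id, byte offset, byte length)."""
    path = urlparse(url).path.lstrip("/")       # e.g. "chunk-0:100-50"
    chunk_id, byte_range = path.rsplit(":", 1)  # "chunk-0", "100-50"
    offset, length = (int(part) for part in byte_range.split("-"))
    return chunk_id, offset, length


def resolve(lookup_table: dict, url: str) -> dict:
    """Find the sample/attribute entry that a bulk data URL points at."""
    chunk_id, offset, length = parse_bulk_url(url)
    # Lookup keys use the same "<offset>-<length>" format as the URLs.
    return lookup_table[chunk_id][f"{offset}-{length}"]


# Toy lookup table in the same shape as the one above (hypothetical values).
lookup_table = {
    "chunk-0": {
        "0-100": {"sample": 0, "attribute": "vertices_3d"},
        "100-50": {"sample": 0, "attribute": "intensities"},
    }
}

print(resolve(lookup_table, "http://localhost:2233/chunk-0:100-50"))
# {'sample': 0, 'attribute': 'intensities'}
```

Because keys are exact strings, a request only resolves if its offset and length match an entry exactly; a server that had to serve arbitrary sub-ranges would need an interval search instead.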

In [1]:
import tlc
from pathlib import Path
from data_sources import Deterministic3DPointCloudDataset
import numpy as np

[90m3lc: [0mUsing API key resolved through tlc config system.
[90m3lc: [0mAPI key resolved
[90m3lc: [0muser_id: disabled_saas_services_test_user tenant_id: disabled_saas_services_test_tenant
[90m3lc: [0mUMAPTable imported; umap-learn is installed
[90m3lc: [0mPaCMAPTable imported; pacmap is installed
[90m3lc: [0mdetectron2 not installed, disabling detectron2 integration.


In [2]:
bulk_data_path = Path("bulk_data/3").absolute()
lookup_table_path = bulk_data_path / "lookup_table.json"

server_root = "http://localhost:2233"

In [3]:
dataset = Deterministic3DPointCloudDataset(10)

In [4]:
from collections import defaultdict

rows = []
chunk_offsets = defaultdict(int)  # next free byte offset within each chunk
lookup_table = defaultdict(dict)  # "chunk-<n>" -> {"<offset>-<length>": {"sample", "attribute"}}

for i in range(len(dataset)):
    chunk = i // 3  # three samples per chunk
    data = dataset[i]
    intensities: np.ndarray = data["intensities"]
    vertices_3d: np.ndarray = data["vertices_3d"]

    # Size of each array in bytes (same as prod(shape) * itemsize)
    vertices_3d_length = vertices_3d.nbytes
    intensities_length = intensities.nbytes

    # Ranges are "<offset>-<length>": the vertices first, the intensities right after them
    vertices_range = f"{chunk_offsets[chunk]}-{vertices_3d_length}"
    intensities_range = f"{chunk_offsets[chunk] + vertices_3d_length}-{intensities_length}"

    row = {
        # Fixed unit-cube bounds for this example
        "x_min": 0,
        "y_min": 0,
        "z_min": 0,
        "x_max": 1,
        "y_max": 1,
        "z_max": 1,
        "instances": [
            {
                # Only URLs are stored; the server resolves them through the lookup table
                "vertices_3d_binary_property_url": f"{server_root}/chunk-{chunk}:{vertices_range}",
                "vertices_3d_additional_data": {
                    "intensity_binary_property_url": f"{server_root}/chunk-{chunk}:{intensities_range}"
                }
            }
        ],
    }

    # Advance past both arrays so the next sample starts where this one ends
    chunk_offsets[chunk] += vertices_3d_length + intensities_length
    rows.append(row)
    lookup_table[f"chunk-{chunk}"][vertices_range] = {"sample": i, "attribute": "vertices_3d"}
    lookup_table[f"chunk-{chunk}"][intensities_range] = {"sample": i, "attribute": "intensities"}
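These offsets only resolve if whatever serves `chunk-<n>` lays out the bytes in exactly the order the loop above assigned them: for each sample, the vertex bytes followed by the intensity bytes. The notebook does not specify a chunk format, so the sketch below simply assumes a chunk is the plain concatenation of the arrays' raw bytes. The arrays are hypothetical random stand-ins (not the real dataset), sized so the first two ranges come out as the `0-1200` and `1200-400` seen in the table output at the end.

```python
import numpy as np

# Hypothetical stand-ins for the three samples of one chunk.
rng = np.random.default_rng(0)
samples = [
    {
        "vertices_3d": rng.random((100, 3), dtype=np.float32),
        "intensities": rng.random(100, dtype=np.float32),
    }
    for _ in range(3)
]

chunk_bytes = bytearray()
ranges = []  # (offset, length, sample, attribute), in write order
for sample_idx, sample in enumerate(samples):
    # Same order as the loop above: vertices first, then intensities.
    for attribute in ("vertices_3d", "intensities"):
        raw = np.ascontiguousarray(sample[attribute]).tobytes()
        ranges.append((len(chunk_bytes), len(raw), sample_idx, attribute))
        chunk_bytes += raw

# Every "<offset>-<length>" slice decodes back to the original array.
for offset, length, sample_idx, attribute in ranges:
    original = samples[sample_idx][attribute]
    restored = np.frombuffer(chunk_bytes[offset:offset + length], dtype=original.dtype)
    assert np.array_equal(restored.reshape(original.shape), original)

print([f"{o}-{n}" for o, n, _, _ in ranges[:2]])  # ['0-1200', '1200-400']
```

Note that the raw bytes carry no dtype or shape; the reader has to know those from the schema, which is why the sketch decodes with the original array's dtype and shape.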

In [5]:
import json

lookup_table_path.parent.mkdir(parents=True, exist_ok=True)
with open(lookup_table_path, "w") as f:
    json.dump(lookup_table, f)
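Since the server will trust this file, it can be worth checking it before serving anything. The hypothetical `check_chunk_layout` below (not a 3LC function) assumes the `<offset>-<length>` keys written above and verifies that each chunk's ranges start at 0 and follow one another with no gaps or overlaps; as a side effect it reports how many bytes each chunk must contain.

```python
def check_chunk_layout(lookup_table: dict) -> dict:
    """Return {chunk id: total bytes}, raising ValueError on gaps or overlaps."""
    sizes = {}
    for chunk_id, entries in lookup_table.items():
        # Keys are "<offset>-<length>" strings; sort them numerically by offset.
        spans = sorted(tuple(int(v) for v in key.split("-")) for key in entries)
        expected_offset = 0
        for offset, length in spans:
            if offset != expected_offset:
                raise ValueError(
                    f"{chunk_id}: range starts at {offset}, expected {expected_offset}"
                )
            expected_offset += length
        sizes[chunk_id] = expected_offset
    return sizes


# Toy table; in the notebook you could pass json.loads(lookup_table_path.read_text()).
example = {
    "chunk-0": {
        "0-1200": {"sample": 0, "attribute": "vertices_3d"},
        "1200-400": {"sample": 0, "attribute": "intensities"},
    }
}
print(check_chunk_layout(example))  # {'chunk-0': 1600}
```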

## Write the Table!

In [6]:
schema = tlc.Geometry3DSchema(
    include_3d_vertices=True,
    per_vertex_schemas={
        "intensity": tlc.Float32ListSchema()
    },
    is_bulk_data=True,  # This is what sets up the "sibling" paths with the "_binary_property_url" suffix
)

table_writer = tlc.TableWriter(
    table_name="pre-externalized-table",
    dataset_name="pre-externalized-dataset",
    project_name="pre-externalized-project",
    description="Pre-externalized table",
    column_schemas={"vertices": schema},  # We use the same schema as before
    if_exists="rename",
)

In [7]:
for row in rows:
    table_writer.add_row({"vertices": row})

table = table_writer.finalize()

[90m3lc: [0mTableWriter: Added 1 examples to buffer
[90m3lc: [0mTableWriter: Added 1 examples to buffer
[90m3lc: [0mTableWriter: Added 1 examples to buffer
[90m3lc: [0mTableWriter: Added 1 examples to buffer
[90m3lc: [0mTableWriter: Added 1 examples to buffer
[90m3lc: [0mTableWriter: Added 1 examples to buffer
[90m3lc: [0mTableWriter: Added 1 examples to buffer
[90m3lc: [0mTableWriter: Added 1 examples to buffer
[90m3lc: [0mTableWriter: Added 1 examples to buffer
[90m3lc: [0mTableWriter: Added 1 examples to buffer
[90m3lc: [0mTableFromParquetWriter: Writing table to URL: C:/Users/gudbrand/AppData/Local/3LC/3LC/projects/pre-externalized-project/datasets/pre-externalized-dataset/tables/pre-externalized-table


[90m3lc: [0mOverwriting URL: C:/Users/gudbrand/AppData/Local/3LC/3LC/projects/index.3lc.json
[90m3lc: [0mOverwriting URL: C:/Users/gudbrand/AppData/Local/3LC/3LC/projects/pre-externalized-project/index.3lc.json
[90m3lc: [0mOverwriting URL: C:/Users/gudbrand/AppData/Local/3LC/3LC/projects/index.3lc.json


In [8]:
table[0]

[90m3lc: [0mOverwriting URL: C:/Users/gudbrand/AppData/Local/3LC/3LC/projects/pre-externalized-project/datasets/pre-externalized-dataset/tables/pre-externalized-table/object.3lc.json
[90m3lc: [0mReading parquet file from C:/Users/gudbrand/AppData/Local/3LC/3LC/projects/pre-externalized-project/datasets/pre-externalized-dataset/tables/pre-externalized-table/pre-externalized-table.parquet


{'vertices': {'x_min': 0.0,
  'y_min': 0.0,
  'z_min': 0.0,
  'x_max': 1.0,
  'y_max': 1.0,
  'z_max': 1.0,
  'instances': [{'vertices_3d': [],
    'vertices_3d_binary_property_url': 'http://localhost:2233/chunk-0:0-1200',
    'vertices_3d_additional_data': {'intensity': [],
     'intensity_binary_property_url': 'http://localhost:2233/chunk-0:1200-400'}}]}}
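The Table now stores only URLs (note the empty `vertices_3d` and `intensity` lists above), so the actual bytes have to come from whatever answers at `http://localhost:2233`. The notebook does not show that server; what follows is only a minimal sketch using Python's standard `http.server`. It assumes the `chunk-<n>:<offset>-<length>` paths and lookup-table shape used here, replaces the real dataset with a hypothetical in-memory `DATA` dict, and binds to a free port instead of 2233 so it can run next to anything else. `BulkDataHandler`, `DATA` and `LOOKUP_TABLE` are illustrative names, not part of 3LC.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.request import urlopen

import numpy as np

# Hypothetical stand-in for the data source: (sample, attribute) -> array.
DATA = {
    (0, "vertices_3d"): np.arange(300, dtype=np.float32).reshape(100, 3),
    (0, "intensities"): np.linspace(0, 1, 100, dtype=np.float32),
}
# Hypothetical lookup table in the same shape as lookup_table.json.
LOOKUP_TABLE = {
    "chunk-0": {
        "0-1200": {"sample": 0, "attribute": "vertices_3d"},
        "1200-400": {"sample": 0, "attribute": "intensities"},
    }
}


class BulkDataHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Request paths look like "/chunk-0:1200-400".
        chunk_id, _, byte_range = self.path.lstrip("/").rpartition(":")
        entry = LOOKUP_TABLE.get(chunk_id, {}).get(byte_range)
        if entry is None:
            self.send_error(404, "Unknown chunk or byte range")
            return
        # Produce the bytes on demand from the source array.
        body = DATA[(entry["sample"], entry["attribute"])].tobytes()
        self.send_response(200)
        self.send_header("Content-Type", "application/octet-stream")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the notebook output quiet


# Port 0 asks the OS for a free port; a real deployment would use 2233.
server = ThreadingHTTPServer(("127.0.0.1", 0), BulkDataHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

with urlopen(f"http://127.0.0.1:{port}/chunk-0:1200-400") as response:
    payload = response.read()
print(np.frombuffer(payload, dtype=np.float32)[:3])

server.shutdown()
server.server_close()
```

In this sketch the byte range in the URL serves only as a lookup key: the handler rebuilds the requested attribute from the source instead of slicing a stored blob, which is what lets the bulk data stay virtual. A real service would load `lookup_table.json` and read from the actual dataset rather than `DATA`.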