# Telecom RAN Data Generation: Scale Example

This notebook demonstrates generating large-scale synthetic timeseries data for a telecom Radio Access Network (RAN) using Rockfish's Entity Data Generator.

**What this notebook shows:**
- Scaling the schema in [01_basic_generation.ipynb](01_basic_generation.ipynb) to production-like cardinalities
- Generating millions of rows efficiently with Rockfish
- Working with larger time ranges and different intervals

**Generated data:**
- 100 transport links × 241 timestamps (5 days at 30-minute intervals, both endpoints included) = 24,100 rows
- 500 core nodes × 241 timestamps = 120,500 rows
- 10,000 cell sites × 241 timestamps = 2,410,000 rows
- **Total: 2,554,600 rows (~2.55M)**
- **Generation time:** ~2-5 minutes
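The row counts above follow from simple arithmetic: each entity emits one row per timestamp, and a 5-day range sampled every 30 minutes yields 241 timestamps when both endpoints are counted. The sketch below reproduces that calculation using the start and end times configured later in this notebook. The `count_timestamps` helper is illustrative and not part of Rockfish, and the inclusive-endpoint behavior is an assumption inferred from the row counts reported in the workflow logs.

```python
from datetime import datetime, timedelta


def count_timestamps(start: str, end: str, interval: timedelta) -> int:
    """Count timestamps from start to end, including both endpoints."""
    # Replace the trailing "Z" so fromisoformat also works before Python 3.11.
    t0 = datetime.fromisoformat(start.replace("Z", "+00:00"))
    t1 = datetime.fromisoformat(end.replace("Z", "+00:00"))
    return (t1 - t0) // interval + 1


n_timestamps = count_timestamps(
    "2025-01-10T00:00:00Z", "2025-01-15T00:00:00Z", timedelta(minutes=30)
)

# One row per entity per timestamp.
entity_counts = {"transport_link": 100, "core_node": 500, "cell_site": 10_000}
rows = {name: n * n_timestamps for name, n in entity_counts.items()}
total_rows = sum(rows.values())

print(n_timestamps)  # 241
print(rows)
print(total_rows)  # 2554600
```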

## Setup and Imports

In [1]:
import rockfish as rf
import rockfish.actions as ra
from dotenv import load_dotenv

from utils import create_telecom_ran_schema

In [2]:
# Connect to the Rockfish platform using your API Key
load_dotenv()
conn = rf.Connection.from_env()

## Update Base Schema

We'll reuse the `DataSchema` from [01_basic_generation.ipynb](01_basic_generation.ipynb).

We'll increase cardinalities and adjust the time range to generate a larger amount of data:
- **transport_link**: 6 -> 100 entities
- **core_node**: 16 -> 500 entities
- **cell_site**: 100 -> 10,000 entities
- **Time Interval**: 15min -> 30min
- **Duration**: 2 days -> 5 days
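
To put these changes in perspective, the sketch below compares the expected row counts of the baseline and scaled configurations. It assumes the baseline run also emits both endpoints of its time range, as the scaled run's row counts suggest. The `expected_rows` helper is illustrative only.

```python
def expected_rows(n_entities: int, days: int, interval_minutes: int) -> int:
    """Rows for one entity type, assuming one row per entity per timestamp."""
    # Assumption: both the start and the end of the range are emitted (+1).
    n_timestamps = days * 24 * 60 // interval_minutes + 1
    return n_entities * n_timestamps


# (transport_link, core_node, cell_site) cardinalities from the list above.
baseline = sum(expected_rows(n, days=2, interval_minutes=15) for n in (6, 16, 100))
scaled = sum(expected_rows(n, days=5, interval_minutes=30) for n in (100, 500, 10_000))

print(baseline)  # 23546
print(scaled)  # 2554600
print(f"growth factor: {scaled / baseline:.1f}x")
```

Under these assumptions the timestamps per entity rise only from 193 to 241, because the coarser interval offsets much of the longer duration. Most of the growth comes from the higher cardinalities.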

For an in-depth explanation of the Rockfish Entity Data Generator and the example schema used in this tutorial, please refer to [rf_telecom_ran_tutorial.md](rf_telecom_ran_tutorial.md).

In [3]:
# Update the schema instance to generate more data
telecom_ran_schema = create_telecom_ran_schema(
    n_transport_links=100,
    n_core_nodes=500,
    n_cell_sites=10000,
    global_start_time="2025-01-10T00:00:00Z",
    global_end_time="2025-01-15T00:00:00Z",
    global_time_interval="30min",
)

## Run Data Generation

We'll use a **Rockfish Workflow** to run a data generation job on the Rockfish platform.

See [01_basic_generation.ipynb](01_basic_generation.ipynb) for more information about Rockfish workflows, actions, and action configs.

In [4]:
config = ra.GenerateFromDataSchema.Config(
    schema=telecom_ran_schema,
    upload_datasets=True,
)
generate = ra.GenerateFromDataSchema(config)

In [5]:
builder = rf.WorkflowBuilder()
builder.add(generate)
workflow = await builder.start(conn)
print(f"Workflow ID: {workflow.id()}")

Workflow ID: 4ZPgcTIT4JCuSXHGr2WSiu


In [6]:
async for log in workflow.logs(level=rf.events.LogLevel.DEBUG):
    print(log)

2025-11-21T15:30:22.108026Z generate-from-data-schema: INFO Generating 3 entities: transport_link, core_node, cell_site
2025-11-21T15:30:22.116299Z generate-from-data-schema: INFO Starting data generation...
2025-11-21T15:31:49.549526Z generate-from-data-schema: INFO Generated 3 entity tables
2025-11-21T15:31:49.569194Z generate-from-data-schema: INFO Creating dataset for entity 'transport_link': 24100 rows
2025-11-21T15:31:49.814829Z generate-from-data-schema: INFO Uploaded dataset 'transport_link' (1Vng8n2g4jVHTVZGBUajZN): 24100 rows
2025-11-21T15:31:49.834951Z generate-from-data-schema: INFO Creating dataset for entity 'core_node': 120500 rows
2025-11-21T15:31:50.068104Z generate-from-data-schema: INFO Uploaded dataset 'core_node' (3YodwQdQn4VocAo57zARWP): 120500 rows
2025-11-21T15:31:50.087023Z generate-from-data-schema: INFO Creating dataset for entity 'cell_site': 2410000 rows
2025-11-21T15:31:52.635155Z generate-from-data-schema: INFO Uploaded dataset 'cell_site' (eJ1Pvr0cb8Y0tk
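
The timestamps in the log output give a rough measure of how long the generation action took. A small sketch using only the standard library and the first and last log lines shown above; `parse_log_time` is an illustrative helper, not a Rockfish API.

```python
from datetime import datetime


def parse_log_time(line: str) -> datetime:
    """Parse the leading ISO 8601 timestamp of a log line."""
    stamp = line.split(" ", 1)[0]
    # Replace the trailing "Z" so fromisoformat also works before Python 3.11.
    return datetime.fromisoformat(stamp.replace("Z", "+00:00"))


first = (
    "2025-11-21T15:30:22.108026Z generate-from-data-schema: "
    "INFO Generating 3 entities: transport_link, core_node, cell_site"
)
# The last line is truncated in the output above; only its timestamp is used.
last = (
    "2025-11-21T15:31:52.635155Z generate-from-data-schema: "
    "INFO Uploaded dataset 'cell_site' (eJ1Pvr0cb8Y0tk"
)

elapsed = (parse_log_time(last) - parse_log_time(first)).total_seconds()
print(f"{elapsed:.1f} s")  # 90.5 s
```

This measures only the span between the first and last log entries. It does not include any time spent before the first entry or downloading the datasets afterwards.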

## Retrieve Generated Datasets

We'll retrieve datasets for `transport_link`, `core_node`, and `cell_site` entities.

In [7]:
datasets = await workflow.datasets().collect()
print(f"Generated {len(datasets)} datasets")

transport_link_dataset = None
core_node_dataset = None
cell_site_dataset = None
for remote_ds in datasets:
    ds = await remote_ds.to_local(conn)
    if ds.name() == "transport_link":
        transport_link_dataset = ds
    elif ds.name() == "core_node":
        core_node_dataset = ds
    elif ds.name() == "cell_site":
        cell_site_dataset = ds

Generated 3 datasets
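
As the number of entities grows, a name-keyed dictionary can be easier to maintain than one variable per dataset. The sketch below shows the pattern on stand-in objects. `FakeDataset` is a hypothetical stub that only mimics the `name()` method used above, and `index_by_name` is an illustrative helper, not part of Rockfish.

```python
from dataclasses import dataclass


@dataclass
class FakeDataset:
    """Hypothetical stand-in exposing name() like the datasets above."""

    _name: str

    def name(self) -> str:
        return self._name


def index_by_name(datasets, expected):
    """Map dataset name -> dataset, failing loudly if an expected name is absent."""
    by_name = {ds.name(): ds for ds in datasets}
    missing = sorted(set(expected) - set(by_name))
    if missing:
        raise KeyError(f"missing datasets: {missing}")
    return by_name


local = [
    FakeDataset("transport_link"),
    FakeDataset("core_node"),
    FakeDataset("cell_site"),
]
by_name = index_by_name(local, expected=["transport_link", "core_node", "cell_site"])
print(sorted(by_name))  # ['cell_site', 'core_node', 'transport_link']
```

In the notebook, the list passed in would hold the local datasets returned by `to_local`. A missing entity then raises immediately instead of leaving a variable set to `None`.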


## Inspect Transport Link Data

In this tutorial, we only verify that the larger dataset was generated. For deeper analysis, feel free to reuse the data exploration code from [01_basic_generation.ipynb](01_basic_generation.ipynb).

In [8]:
transport_link_df = transport_link_dataset.to_pandas()
print(f"Transport Link dataset: {transport_link_dataset.table.num_rows} rows")
transport_link_df.head()

Transport Link dataset: 24100 rows


|   | Device_ID | Interface_ID | Bandwidth_Utilization_Out | Packet_Loss_Percent | Latency_ms | Jitter_ms | Timestamp |
|---|-----------|--------------|---------------------------|---------------------|------------|-----------|-----------|
| 0 | RTR_004 | eth1 | 57.627208 | 0.387565 | 15.986711 | 1.958601 | 2025-01-10T00:00:00+00:00 |
| 1 | RTR_004 | eth1 | 56.786161 | 0.443919 | 15.780268 | 1.888845 | 2025-01-10T00:30:00+00:00 |
| 2 | RTR_004 | eth1 | 55.739454 | 0.516126 | 15.45361 | 1.84966 | 2025-01-10T01:00:00+00:00 |
| 3 | RTR_004 | eth1 | 51.06026 | 0.471744 | 15.776317 | 1.964455 | 2025-01-10T01:30:00+00:00 |
| 4 | RTR_004 | eth1 | 51.271135 | 0.398359 | 15.292285 | 2.151212 | 2025-01-10T02:00:00+00:00 |
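
Beyond the total row count, a quick way to confirm the scale-up is to check that every entity has a complete series. A minimal sketch with pandas on a tiny synthetic frame. It assumes that `Device_ID` plus `Interface_ID` identifies one transport link, which the preview above suggests but does not confirm, and `check_series_shape` is an illustrative helper.

```python
import pandas as pd


def check_series_shape(df, key_cols, n_entities, n_timestamps):
    """Return True if df holds n_entities series of n_timestamps rows each."""
    sizes = df.groupby(key_cols).size()
    return len(sizes) == n_entities and bool((sizes == n_timestamps).all())


# Tiny synthetic stand-in: 2 links x 3 timestamps.
timestamps = [
    "2025-01-10T00:00:00+00:00",
    "2025-01-10T00:30:00+00:00",
    "2025-01-10T01:00:00+00:00",
]
demo = pd.DataFrame(
    {
        "Device_ID": ["RTR_001"] * 3 + ["RTR_002"] * 3,
        "Interface_ID": ["eth1"] * 6,
        "Timestamp": timestamps * 2,
    }
)

ok = check_series_shape(
    demo, ["Device_ID", "Interface_ID"], n_entities=2, n_timestamps=3
)
print(ok)  # True
```

For the data in this notebook, the equivalent call would pass `transport_link_df` with `n_entities=100` and `n_timestamps=241`.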


## Inspect Core Node Data

In [9]:
core_node_df = core_node_dataset.to_pandas()
print(f"Core Node dataset: {core_node_dataset.table.num_rows} rows")
core_node_df.head()

Core Node dataset: 120500 rows


|   | Core_Node_ID | MM_AttachedUEs | SM_ActivePDUSessions | CPU_Load | Timestamp |
|---|--------------|----------------|----------------------|----------|-----------|
| 0 | MME_001 | 5310 | 3068 | 62.167676 | 2025-01-10T00:00:00+00:00 |
| 1 | MME_001 | 5252 | 3145 | 59.791042 | 2025-01-10T00:30:00+00:00 |
| 2 | MME_001 | 5248 | 3153 | 58.520525 | 2025-01-10T01:00:00+00:00 |
| 3 | MME_001 | 4960 | 3037 | 58.282711 | 2025-01-10T01:30:00+00:00 |
| 4 | MME_001 | 5080 | 2863 | 53.845959 | 2025-01-10T02:00:00+00:00 |


## Inspect Cell Site Data

In [10]:
cell_site_df = cell_site_dataset.to_pandas()
print(f"Cell Site dataset: {cell_site_dataset.table.num_rows} rows")
cell_site_df.head()

Cell Site dataset: 2410000 rows


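|   | Cell_ID | Base_Station_ID | Location_Lat | Location_Lon | Transport_Device_ID | Transport_Interface_ID | RRC_ConnEstabFail | RRC_ConnEstabSucc | RRC_ConnEstabAtt | ERAB_EstabInitSuccNbr_QCI | DL_PRB_Utilization | Cell_Availability | Timestamp |
|---|---------|-----------------|--------------|--------------|---------------------|------------------------|-------------------|-------------------|------------------|---------------------------|--------------------|-------------------|-----------|
| 0 | CELL_8701 | eNB_002 | 40.345021 | -79.652112 | RTR_002 | eth1 | 1 | 65 | 66 | 53.979877 | 25.043652 | 99.525135 | 2025-01-14T09:30:00+00:00 |
| 1 | CELL_8701 | eNB_002 | 40.345021 | -79.652112 | RTR_002 | eth1 | 2 | 67 | 70 | 59.962766 | 29.735254 | 99.477863 | 2025-01-14T10:00:00+00:00 |
| 2 | CELL_8701 | eNB_002 | 40.345021 | -79.652112 | RTR_002 | eth1 | 3 | 74 | 78 | 70.503723 | 28.666601 | 99.491717 | 2025-01-14T10:30:00+00:00 |
| 3 | CELL_8701 | eNB_002 | 40.345021 | -79.652112 | RTR_002 | eth1 | 4 | 78 | 83 | 70.609874 | 32.166309 | 99.528757 | 2025-01-14T11:00:00+00:00 |
| 4 | CELL_8701 | eNB_002 | 40.345021 | -79.652112 | RTR_002 | eth1 | 5 | 85 | 91 | 80.117612 | 38.610725 | 99.479817 | 2025-01-14T11:30:00+00:00 |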


## Save Data to CSV

In [11]:
# Save each dataset to its own CSV file (index column omitted)
transport_link_df.to_csv("transport_link_data_scale.csv", index=False)
core_node_df.to_csv("core_node_data_scale.csv", index=False)
cell_site_df.to_csv("cell_site_data_scale.csv", index=False)
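
At this scale the uncompressed cell site CSV can become large. One option is to let pandas compress the output, which it infers from a `.gz` suffix. A small round-trip sketch on a synthetic frame; the file name and the sample values are illustrative.

```python
import tempfile
from pathlib import Path

import pandas as pd

demo = pd.DataFrame(
    {"Cell_ID": ["CELL_0001", "CELL_0002"], "RRC_ConnEstabAtt": [66, 70]}
)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "cell_site_demo.csv.gz"
    # compression defaults to "infer", so the ".gz" suffix selects gzip.
    demo.to_csv(path, index=False)
    restored = pd.read_csv(path)

print(restored["RRC_ConnEstabAtt"].tolist())  # [66, 70]
```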