This project uses uv for dependency management, which provides faster and more reliable package installation.
- Install uv:
  curl -LsSf https://astral.sh/uv/install.sh | sh
- Install dependencies:
  uv sync
- Activate the virtual environment:
  source .venv/bin/activate
If you prefer using traditional pip:
# Create a virtual environment
python -m venv .venv
# Activate the virtual environment
source .venv/bin/activate
# Install dependencies
pip install -e .

Note: After activating the virtual environment (either with uv or pip), you can use python directly in the following commands.
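
To confirm that the `python` on your path really is the one from the activated environment, a quick check along these lines may help. This is a minimal sketch using only the standard library; the helper name `in_virtual_env` is hypothetical and not part of this project.

```python
import sys


def in_virtual_env() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment directory, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("virtual environment active:", in_virtual_env())
```

If this prints `False` after activation, the shell is probably still resolving `python` to a system interpreter.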
First, download the dataset from Zenodo at https://doi.org/10.5281/zenodo.19637628 and place the archive in your workspace.
Then, extract the dataset archive:
tar -xzf gleaner-dataset.tar.gz

This will extract the dataset files into the data/ directory.
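
If `tar` is not available (for example on some Windows setups), the same extraction can be approximated with Python's standard `tarfile` module. This is a sketch, not part of the project; `extract_dataset` is a hypothetical helper, and it assumes the archive was downloaded as `gleaner-dataset.tar.gz` into the current directory.

```python
import tarfile
from pathlib import Path


def extract_dataset(archive: str, dest: str = ".") -> list[str]:
    """Extract a .tar.gz archive into dest and return the member names."""
    with tarfile.open(archive, "r:gz") as tar:
        names = tar.getnames()
        if hasattr(tarfile, "data_filter"):
            # Recent Python versions offer a safer extraction filter that
            # rejects absolute paths and members escaping the destination.
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return names


# Usage (assumes the archive is in the current directory):
# names = extract_dataset("gleaner-dataset.tar.gz")
# print(len(names), "members extracted;", "data/ exists:", Path("data").is_dir())
```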
After extracting the dataset, you can run the sampling command:
python ./main.py sample batch -s gleaner -d gleaner --mode online --rate 0.05 --clear

Command Parameters:
- -s gleaner: Source dataset name
- -d gleaner: Destination dataset name
- --mode online: Sampling mode (online)
- --rate 0.05: Sampling rate (5% of the data)
- --clear: Clear previous results
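
When scripting several runs, it can be convenient to assemble the command from its parameters. The sketch below builds the argument list using only the flags documented above; `build_sample_command` is a hypothetical helper, and values other than those shown in this README (for example other `--mode` settings) are not covered here.

```python
import shlex


def build_sample_command(source, dest, mode="online", rate=0.05, clear=True):
    """Return the argument list for the sampling command described above."""
    args = [
        "python", "./main.py", "sample", "batch",
        "-s", source,        # source dataset name
        "-d", dest,          # destination dataset name
        "--mode", mode,      # sampling mode
        "--rate", str(rate), # sampling rate, e.g. 0.05 for 5% of the data
    ]
    if clear:
        args.append("--clear")  # clear previous results
    return args


cmd = build_sample_command("gleaner", "gleaner")
print(shlex.join(cmd))
# To actually run it from the project root:
# import subprocess; subprocess.run(cmd, check=True)
```

Passing a list to `subprocess.run` avoids shell quoting issues if dataset names ever contain unusual characters.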
After sampling, you can view the performance metrics:
python main.py sample perf-report -d gleaner

This command will output summary statistics for key performance metrics. For detailed performance data, check the Parquet files in:
output/rcabench-platform-v2/sampler_reports/gleaner/
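
To inspect those Parquet files programmatically, something like the following may serve as a starting point. It is a sketch under assumptions: the file names inside the report directory are not documented here, so the code simply searches for `*.parquet` files recursively, and `load_report` assumes pandas with a Parquet engine such as pyarrow is installed. Both helper names are hypothetical.

```python
from pathlib import Path

REPORT_DIR = Path("output/rcabench-platform-v2/sampler_reports/gleaner")


def list_report_files(report_dir=REPORT_DIR):
    """Return all Parquet files under report_dir, sorted by path."""
    return sorted(Path(report_dir).rglob("*.parquet"))


def load_report(path):
    """Load one report into a DataFrame (assumes pandas and pyarrow are installed)."""
    import pandas as pd  # imported lazily so listing files works without pandas

    return pd.read_parquet(path)


for path in list_report_files():
    print(path)
    # df = load_report(path); print(df.head())
```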
The traces file contains a time series of spans.
| Column | Type | Description |
|---|---|---|
| time | datetime | start time of a span in UTC |
| trace_id | string | unique identifier of a trace (a trace groups many spans) |
| span_id | string | unique identifier of a span |
| parent_span_id | string | identifier of the parent span (for trace hierarchy) |
| service_name | string | name of the service that generated the span |
| span_name | string | name of the operation represented by the span |
| duration | uint64 | duration of a span in nanoseconds |
| attr.* | * | other attributes of a span |
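
The `parent_span_id` column makes it possible to rebuild each trace's call hierarchy. The sketch below does this on plain dictionaries with the column names from the table; the sample rows are made up for illustration, `build_trace_trees` is a hypothetical helper, and treating a span with an empty or unknown parent as a root is an assumption rather than documented behavior.

```python
from collections import defaultdict


def build_trace_trees(spans):
    """Map trace_id -> {"roots": [span_id, ...], "children": {parent_id: [span_id, ...]}}."""
    ids_by_trace = defaultdict(set)
    for span in spans:
        ids_by_trace[span["trace_id"]].add(span["span_id"])

    trees = defaultdict(lambda: {"roots": [], "children": defaultdict(list)})
    for span in spans:
        tree = trees[span["trace_id"]]
        parent = span.get("parent_span_id")
        # Assumption: a span is a root if its parent is empty or not in the data.
        if not parent or parent not in ids_by_trace[span["trace_id"]]:
            tree["roots"].append(span["span_id"])
        else:
            tree["children"][parent].append(span["span_id"])
    return trees


# Illustrative rows only; real data comes from the traces file.
sample_spans = [
    {"trace_id": "t1", "span_id": "a", "parent_span_id": "", "span_name": "GET /"},
    {"trace_id": "t1", "span_id": "b", "parent_span_id": "a", "span_name": "auth"},
    {"trace_id": "t1", "span_id": "c", "parent_span_id": "a", "span_name": "db.query"},
    {"trace_id": "t2", "span_id": "d", "parent_span_id": "missing", "span_name": "job"},
]
trees = build_trace_trees(sample_spans)
print(trees["t1"]["roots"], dict(trees["t1"]["children"]))
```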
The metrics file contains a time series of metric values.
| Column | Type | Description |
|---|---|---|
| time | datetime | UTC timestamp of a metric value |
| metric | string | name of the metric |
| value | float64 | value of the metric |
| service_name | string | name of the service that generated the metric value |
| attr.* | * | other attributes of a metric value |
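
Because each row carries a single metric value, a common first step is to group rows by service and metric name. The sketch below averages values per `(service_name, metric)` pair using only the standard library; the sample rows and metric names are invented for illustration, and `mean_by_service_and_metric` is a hypothetical helper.

```python
from collections import defaultdict
from statistics import mean


def mean_by_service_and_metric(rows):
    """Return {(service_name, metric): mean value} for the given metric rows."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row["service_name"], row["metric"])].append(row["value"])
    return {key: mean(values) for key, values in groups.items()}


# Illustrative rows only; real data comes from the metrics file.
sample_metrics = [
    {"service_name": "frontend", "metric": "cpu_usage", "value": 1.0},
    {"service_name": "frontend", "metric": "cpu_usage", "value": 3.0},
    {"service_name": "cart", "metric": "cpu_usage", "value": 5.0},
]
averages = mean_by_service_and_metric(sample_metrics)
print(averages)
```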
The logs file contains a time series of log events.
| Column | Type | Description |
|---|---|---|
| time | datetime | UTC timestamp of a log event |
| trace_id | string | unique identifier of a trace |
| span_id | string | unique identifier of a span |
| service_name | string | name of the service that generated the log event |
| level | string | log level (e.g., INFO, ERROR) |
| message | string | log message |
| attr.* | * | other attributes of a log event |
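
Since log events share `trace_id` and `span_id` with the traces file, logs can be correlated with the span that emitted them. The following sketch joins ERROR-level logs to their spans on plain dictionaries; the sample rows are invented, `error_logs_with_spans` is a hypothetical helper, and it assumes the pair `(trace_id, span_id)` identifies a span.

```python
def error_logs_with_spans(logs, spans):
    """Return (message, span_name) for each ERROR log; span_name is None if no span matches."""
    # Assumption: (trace_id, span_id) uniquely identifies a span.
    span_index = {(s["trace_id"], s["span_id"]): s for s in spans}
    result = []
    for log in logs:
        if log["level"] != "ERROR":
            continue
        span = span_index.get((log["trace_id"], log["span_id"]))
        result.append((log["message"], span["span_name"] if span else None))
    return result


# Illustrative rows only; real data comes from the logs and traces files.
demo_spans = [{"trace_id": "t1", "span_id": "a", "span_name": "db.query"}]
demo_logs = [
    {"trace_id": "t1", "span_id": "a", "level": "ERROR", "message": "timeout"},
    {"trace_id": "t1", "span_id": "a", "level": "INFO", "message": "retrying"},
    {"trace_id": "t9", "span_id": "z", "level": "ERROR", "message": "orphan"},
]
matches = error_logs_with_spans(demo_logs, demo_spans)
print(matches)
```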