Skip to content

Commit

Permalink
[BigQuery] Print out longest path through view graph (Recidiviz/recid…
Browse files Browse the repository at this point in the history
…iviz-data#29579)

## Description of the change

Updates the stats logging at the end of each view update / sandbox load
to print out the longest path through the graph so we can begin to
identify bottlenecks!

Tested with by running this command:
```
python -m recidiviz.tools.load_views_to_sandbox \
        --sandbox_dataset_prefix ageiduschek manual \
        --view_ids_to_load shared_metric_views.event_based_commitments_from_supervision_for_matrix \
        --update_ancestors False \
        --update_descendants True
```

... which produced this output:
```
INFO:root:### BQ DAG PROCESSING STATS ###
Total processing time: 7.64 sec
Nodes processed: 33
Average queue wait time: 0.0 seconds
Max queue wait time: 0.0 seconds
Top [25] most expensive nodes in DAG:
  1) 1.46 sec: ageiduschek_dashboard_views.revocations_matrix_distribution_by_violation
  2) 1.44 sec: ageiduschek_shared_metric_views.supervision_termination_matrix_by_person
  3) 1.38 sec: ageiduschek_shared_metric_views.event_based_commitments_from_supervision_for_matrix
  4) 1.38 sec: ageiduschek_shared_metric_views.supervision_matrix_by_person
  5) 1.35 sec: ageiduschek_dashboard_views.revocations_matrix_distribution_by_officer
  6) 1.31 sec: ageiduschek_shared_metric_views.admission_types_per_state_for_matrix
  7) 1.31 sec: ageiduschek_dashboard_views.revocations_matrix_distribution_by_race
  8) 1.30 sec: ageiduschek_dashboard_views.revocations_matrix_filtered_caseload
  9) 1.29 sec: ageiduschek_dashboard_views.revocations_matrix_distribution_by_gender
  10) 1.29 sec: ageiduschek_validation_views.revocation_matrix_distribution_by_gender_comparison_supervision_errors
  11) 1.25 sec: ageiduschek_dashboard_views.revocations_matrix_distribution_by_risk_level
  12) 1.23 sec: ageiduschek_validation_views.revocation_matrix_comparison_revocations_by_officer_errors
  13) 1.22 sec: ageiduschek_validation_views.revocation_matrix_distribution_by_race_comparison_revocation_errors
  14) 1.22 sec: ageiduschek_validation_views.revocation_matrix_distribution_by_race_comparison_recommendation_errors
  15) 1.22 sec: ageiduschek_validation_views.revocation_matrix_comparison_revocation_cell_vs_caseload
  16) 1.21 sec: ageiduschek_validation_views.revocation_matrix_distribution_by_gender_comparison
  17) 1.21 sec: ageiduschek_validation_views.revocation_matrix_distribution_by_race_comparison
  18) 1.21 sec: ageiduschek_validation_views.revocation_matrix_comparison_supervision_population
  19) 1.20 sec: ageiduschek_validation_views.revocation_matrix_comparison_by_month_errors
  20) 1.19 sec: ageiduschek_validation_views.revocation_matrix_comparison_by_month
  21) 1.19 sec: ageiduschek_validation_views.revocation_matrix_distribution_by_race_comparison_supervision_errors
  22) 1.19 sec: ageiduschek_dashboard_views.revocations_matrix_supervision_location_ids_to_names
  23) 1.18 sec: ageiduschek_validation_views.revocation_matrix_distribution_by_gender_comparison_recommendation_errors
  24) 1.18 sec: ageiduschek_validation_views.revocation_matrix_caseload_admission_history_errors
  25) 1.18 sec: ageiduschek_validation_views.revocation_matrix_caseload_admission_history
Most expensive path [7.63 seconds total]:
  * ageiduschek_shared_metric_views.event_based_commitments_from_supervision_for_matrix (1.38 sec)
  * ageiduschek_shared_metric_views.revocations_matrix_by_person (1.15 sec)
  * ageiduschek_shared_metric_views.admission_types_per_state_for_matrix (1.31 sec)
  * ageiduschek_dashboard_views.revocations_matrix_distribution_by_gender (1.29 sec)
  * ageiduschek_validation_views.revocation_matrix_distribution_by_gender_comparison (1.21 sec)
  * ageiduschek_validation_views.revocation_matrix_distribution_by_gender_comparison_supervision_errors (1.29 sec)
```
## Type of change

> All pull requests must have at least one of the following labels
applied (otherwise the PR will fail):

| Label | Description |
|-----------------------------
|-----------------------------------------------------------------------------------------------------------
|
| Type: Bug | non-breaking change that fixes an issue |
| Type: Feature | non-breaking change that adds functionality |
| Type: Breaking Change | fix or feature that would cause existing
functionality to not work as expected |
| Type: Non-breaking refactor | change addresses some tech debt item or
prepares for a later change, but does not change functionality |
| Type: Configuration Change | adjusts configuration to achieve some end
related to functionality, development, performance, or security |
| Type: Dependency Upgrade | upgrades a project dependency - these
changes are not included in release notes |

## Related issues

Related to Recidiviz/recidiviz-data#29291

## Checklists

### Development

**This box MUST be checked by the submitter prior to merging**:
- [x] **Double- and triple-checked that there is no Personally
Identifiable Information (PII) being mistakenly added in this pull
request**

These boxes should be checked by the submitter prior to merging:
- [x] Tests have been written to cover the code changed/added as part of
this pull request

### Code review

These boxes should be checked by reviewers prior to merging:

- [x] This pull request has a descriptive title and information useful
to a reviewer
- [x] Potential security implications or infrastructural changes have
been considered, if relevant

GitOrigin-RevId: 3ae10312ce236e16bc5ccc93321fdd9cd8b08850
  • Loading branch information
ageiduschek authored and Helper Bot committed May 25, 2024
1 parent 3a0ec33 commit 004cca5
Show file tree
Hide file tree
Showing 2 changed files with 85 additions and 9 deletions.
80 changes: 71 additions & 9 deletions recidiviz/big_query/big_query_view_dag_walker.py
Original file line number Diff line number Diff line change
Expand Up @@ -60,9 +60,22 @@ class TraversalDirection(Enum):

@attr.s(auto_attribs=True, kw_only=True)
class ViewProcessingMetadata:
node_processing_runtime_seconds: float
total_time_in_queue_seconds: float
# The time it took to actually run the view_process_fn for this view.
view_processing_runtime_sec: float
# The total time between when this view was added to the queue for processing and
# when it finishes processing. May include wait time, during which the job for this
# view wasn't actually started yet.
total_node_processing_time_sec: float
graph_depth: int
longest_path: List[BigQueryView]
longest_path_runtime_seconds: float

@property
def queue_wait_time_sec(self) -> float:
"""The number of seconds between when a node is queued and when it actually
starts view processing.
"""
return self.total_node_processing_time_sec - self.view_processing_runtime_sec


@attr.s(auto_attribs=True, kw_only=True)
Expand Down Expand Up @@ -98,6 +111,8 @@ class ProcessDagResult(Generic[ViewResultT]):
view_results: Dict[BigQueryView, ViewResultT]
view_processing_stats: Dict[BigQueryView, ViewProcessingMetadata]
total_runtime: float
# The set of nodes that were processed last
edge_nodes: Set["BigQueryViewDagNode"]

def log_processing_stats(self, n_slowest: int) -> None:
"""Logs various stats about a DAG processing run.
Expand All @@ -112,10 +127,10 @@ def log_processing_stats(self, n_slowest: int) -> None:
processing_runtimes = []
queued_wait_times = []
for view, metadata in self.view_processing_stats.items():
processing_time = metadata.node_processing_runtime_seconds
total_queue_time = metadata.total_time_in_queue_seconds
processing_runtimes.append((processing_time, view.address))
queued_wait_times.append(total_queue_time - processing_time)
processing_runtimes.append(
(metadata.view_processing_runtime_sec, view.address)
)
queued_wait_times.append(metadata.queue_wait_time_sec)

avg_wait_time = max(
0.0, round(sum(queued_wait_times) / len(queued_wait_times), 2)
Expand All @@ -130,19 +145,44 @@ def log_processing_stats(self, n_slowest: int) -> None:
]
)

longest_path_runtime_seconds, longest_path = self._get_longest_path()

path_strs = []
for v in longest_path:
runtime_seconds = self.view_processing_stats[v].view_processing_runtime_sec
path_strs.append(f" * {runtime_seconds:.2f} sec: {v.address.to_str()}")

longest_path_str = "\n".join(path_strs)

logging.info(
"### BQ DAG PROCESSING STATS ###\n"
"Total processing time: %s sec\n"
"Nodes processed: %s\n"
"Average queue wait time: %s seconds\n"
"Max queue wait time: %s seconds\n"
"Top [%s] most expensive nodes in DAG: \n%s",
"Top [%s] most expensive nodes in DAG: \n%s\n"
"Most expensive path [%s seconds total]: \n%s",
round(self.total_runtime, 2),
nodes_processed,
avg_wait_time,
max_wait_time,
n_slowest,
slowest_list,
round(longest_path_runtime_seconds, 2),
longest_path_str,
)

def _get_longest_path(self) -> Tuple[float, List[BigQueryView]]:
edge_node_stats = [
self.view_processing_stats[node.view] for node in self.edge_nodes
]

end_of_longest_path_stats = sorted(
edge_node_stats, key=lambda stats: stats.longest_path_runtime_seconds
)[-1]
return (
end_of_longest_path_stats.longest_path_runtime_seconds,
end_of_longest_path_stats.longest_path,
)


Expand Down Expand Up @@ -590,6 +630,7 @@ def _dag_view_process_fn_result(
@staticmethod
def _view_processing_statistics(
view_processing_stats: Dict[BigQueryView, ViewProcessingMetadata],
view: BigQueryView,
parent_results: Dict[BigQueryView, ViewResultT],
processing_time: float,
queue_time: float,
Expand All @@ -599,10 +640,28 @@ def _view_processing_statistics(
if not parent_results
else max({view_processing_stats[p].graph_depth for p in parent_results}) + 1
)

if parent_results:
parent_with_longest_path = max(
(view_processing_stats[parent_view] for parent_view in parent_results),
key=lambda parent_metadata: parent_metadata.longest_path_runtime_seconds,
)
parent_longest_path = parent_with_longest_path.longest_path
parent_longest_path_runtime_seconds = (
parent_with_longest_path.longest_path_runtime_seconds
)
else:
parent_longest_path = []
parent_longest_path_runtime_seconds = 0

return ViewProcessingMetadata(
node_processing_runtime_seconds=processing_time,
total_time_in_queue_seconds=queue_time,
view_processing_runtime_sec=processing_time,
total_node_processing_time_sec=queue_time,
graph_depth=graph_depth,
longest_path=[*parent_longest_path, view],
longest_path_runtime_seconds=(
parent_longest_path_runtime_seconds + processing_time
),
)

def process_dag(
Expand All @@ -629,6 +688,7 @@ def process_dag(
"""

top_level_set = set(self.leaves) if reverse else set(self.roots)
bottom_level_set = set(self.roots) if reverse else set(self.leaves)
processed: Set[BigQueryAddress] = set()
view_results: Dict[BigQueryView, ViewResultT] = {}
view_processing_stats: Dict[BigQueryView, ViewProcessingMetadata] = {}
Expand Down Expand Up @@ -667,6 +727,7 @@ def process_dag(
)
view_stats = self._view_processing_statistics(
view_processing_stats=view_processing_stats,
view=node.view,
parent_results=parent_results,
processing_time=execution_sec,
queue_time=end - entered_queue_time,
Expand Down Expand Up @@ -696,6 +757,7 @@ def process_dag(
view_results=view_results,
view_processing_stats=view_processing_stats,
total_runtime=(time.perf_counter() - dag_processing_start),
edge_nodes=bottom_level_set,
)

def _check_sub_dag_input_views(self, *, input_views: List[BigQueryView]) -> None:
Expand Down
14 changes: 14 additions & 0 deletions recidiviz/tests/big_query/big_query_view_dag_walker_test.py
Original file line number Diff line number Diff line change
Expand Up @@ -1072,6 +1072,20 @@ def process_simple(

self.assertEqual(set(walker.views), set(result.view_results))

def test_log_processing_stats(self) -> None:
walker = BigQueryViewDagWalker(self.diamond_shaped_dag_views_list)

def process_simple(
view: BigQueryView, _parent_results: Dict[BigQueryView, BigQueryAddress]
) -> BigQueryAddress:
time.sleep(MOCK_VIEW_PROCESS_TIME_SECONDS)
return view.address

result = walker.process_dag(process_simple, synchronous=self.synchronous)

# Logging stats shouldn't crash
result.log_processing_stats(n_slowest=3)

def assertIsValidEmptyParentsView(self, node: BigQueryViewDagNode) -> None:
"""Fails the test if a view that has no parents is an expected view with no
parents. Failures could be indicative of poorly formed view queries.
Expand Down

0 comments on commit 004cca5

Please sign in to comment.