
bug-1885978: remove raw crash from indexing code #6560

Merged: 1 commit merged into main on Mar 20, 2024

Conversation

willkg (Collaborator) commented on Mar 20, 2024

In bug 1763264, we finished all the work to stop using the raw crash when indexing crash report data. The only data we're indexing now comes from the processed crash after being normalized and validated by processor rules. This radically reduces problems with indexing from weird values.

This cleans up the remnants of those old days by removing raw_crash from building documents, indexing, and related documentation. This also removes a deepcopy call which will speed up indexing a little.
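A minimal sketch of the resulting shape (hypothetical helper and field names, not Socorro's actual code): the index document is built from the processed crash alone, filtered against the known-valid keys, with no `raw_crash` section and no copying.

```python
# Hypothetical sketch: build the Elasticsearch document from the
# processed crash only. No raw_crash section, and no deepcopy, since
# we only read from the processed crash.
def build_crash_document(processed_crash, valid_keys):
    return {
        "processed_crash": {
            key: value
            for key, value in processed_crash.items()
            if key in valid_keys
        }
    }


doc = build_crash_document(
    {"product": "Firefox", "version": "124.0", "weird_field": 1},
    valid_keys={"product", "version"},
)
# weird_field is dropped; only validated, known fields are indexed
```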

@willkg willkg requested a review from a team as a code owner March 20, 2024 13:44
```diff
@@ -498,14 +498,10 @@ def save_processed_crash(self, raw_crash, processed_crash):
         es_doctype = self.get_doctype()
         all_valid_keys = self.get_keys(index_name, es_doctype)

         src = {
-            "raw_crash": copy.deepcopy(raw_crash),
```
willkg (Collaborator, Author):
One less deepcopy should make it a little faster, though I'm not sure we'd notice.
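A toy illustration (not Socorro code) of what the removed call was paying for: `copy.deepcopy` walks every nested container in the crash and allocates new objects, while referencing the dict directly skips that work entirely.

```python
import copy

# Toy data; real processed crashes are much larger, so the copy cost
# scales with crash size.
processed_crash = {"product": "Firefox", "json_dump": {"threads": []}}

copied = copy.deepcopy(processed_crash)
assert copied == processed_crash  # equal values...
assert copied["json_dump"] is not processed_crash["json_dump"]  # ...new objects

# Referencing the dict directly does no copying, which is safe as long
# as the document-building code never mutates it in place.
src = {"processed_crash": processed_crash}
assert src["processed_crash"] is processed_crash
```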

```diff
-    # Verify keys that aren't in super_search_fields aren't in the raw or processed
-    # crash parts
-    raw_crash = doc["_source"]["raw_crash"]
-    assert list(sorted(raw_crash.keys())) == []
```
willkg (Collaborator, Author):

One less "sort an empty list" in the codebase. 🎉

```diff
 mm.assert_histogram("processor.es.processed_crash_size", value=96)
-mm.assert_histogram("processor.es.crash_document_size", value=186)
+mm.assert_histogram("processor.es.crash_document_size", value=169)
```
willkg (Collaborator, Author):

This value changes because the document no longer contains `"raw_crash": {},`.
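A toy illustration of the size change, assuming `json.dumps`-style serialization with default separators: dropping the empty mapping removes the `, "raw_crash": {}` fragment, a 17-byte delta that happens to match the 186 → 169 change in the test.

```python
import json

# Toy documents: identical except for the empty "raw_crash" mapping.
with_raw = {"processed_crash": {"product": "Firefox"}, "raw_crash": {}}
without_raw = {"processed_crash": {"product": "Firefox"}}

delta = len(json.dumps(with_raw)) - len(json.dumps(without_raw))
# ', "raw_crash": {}' is 17 characters with json.dumps's default separators
assert delta == 17
```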

willkg (Collaborator, Author):

Now that I'm thinking about this, we could stop emitting the processed crash size metric. The "Socorro prod app metrics" dashboard only has a panel for the crash document size.

I'll fix this now: one less metric.

```diff
@@ -218,8 +217,7 @@ def test_validate_super_search_fields(name, properties):
     if properties.get("destination_keys"):
         for key in properties["destination_keys"]:
             possible_keys = [
-                # Old keys we're probably migrating from
-                f"raw_crash.{properties['in_database_name']}",
```
willkg (Collaborator, Author):

None of the keys start with `raw_crash.`, and we want to make sure that stays true going forward. We only want to index values that have been normalized and validated, and those live in the processed crash.
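A sketch of the tightened invariant (hypothetical field values, not a real super_search_fields entry): every destination key must live under `processed_crash`, never `raw_crash`.

```python
# Hypothetical example entry; real fields live in super_search_fields.
properties = {
    "in_database_name": "build",
    "destination_keys": ["processed_crash.build"],
}

for key in properties.get("destination_keys", []):
    # raw_crash is no longer a valid destination...
    assert not key.startswith("raw_crash.")
    # ...only the processed crash is.
    assert key.startswith("processed_crash.")
```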

@willkg willkg merged commit 8c84ea2 into main Mar 20, 2024
1 check passed
willkg (Collaborator, Author) commented on Mar 20, 2024
Thank you!
