bug-1885978: remove raw crash from indexing code #6560
Conversation
@@ -498,14 +498,10 @@ def save_processed_crash(self, raw_crash, processed_crash):
        es_doctype = self.get_doctype()
        all_valid_keys = self.get_keys(index_name, es_doctype)

        src = {
            "raw_crash": copy.deepcopy(raw_crash),
One less deepcopy should make it a little faster, though I'm not sure we'd notice.
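The shape of the change can be sketched roughly like this (function and variable names here are illustrative, not the exact Socorro code):

```python
import copy

def build_crash_document(processed_crash):
    # Before this change, the document also carried a deep copy of the
    # raw crash: {"raw_crash": copy.deepcopy(raw_crash), ...}.
    # Now only the processed crash goes into the document, so one
    # deepcopy of a potentially large dict disappears per crash.
    return {"processed_crash": copy.deepcopy(processed_crash)}

doc = build_crash_document({"uuid": "abc123", "product": "Firefox"})
```

The remaining deepcopy keeps the document independent of the caller's dict, so later mutations of the processed crash don't leak into the indexed document.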
    # Verify keys that aren't in super_search_fields aren't in the raw or processed
    # crash parts
    raw_crash = doc["_source"]["raw_crash"]
    assert list(sorted(raw_crash.keys())) == []
One less "sort an empty list" in the codebase. 🎉
    mm.assert_histogram("processor.es.processed_crash_size", value=96)
    mm.assert_histogram("processor.es.crash_document_size", value=186)
    mm.assert_histogram("processor.es.crash_document_size", value=169)
This value changes because the document no longer contains an empty `"raw_crash": {}` member.
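The 17-byte drop (186 to 169) matches exactly what serializing that empty member costs. A small, self-contained illustration — the toy dicts below are made up; only the size delta matters:

```python
import json

# Toy documents; the real crash documents are much larger.
with_raw = {"raw_crash": {}, "processed_crash": {"uuid": "abc"}}
without_raw = {"processed_crash": {"uuid": "abc"}}

# json.dumps's default separators render the dropped member as
# '"raw_crash": {}, ': exactly 17 characters, matching 186 - 169.
delta = len(json.dumps(with_raw)) - len(json.dumps(without_raw))
print(delta)  # → 17
```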
Now that I'm thinking about this, we could stop emitting the processed crash size. The "Socorro prod app metrics" dashboard only has a panel for the crash document size.
I'll fix this now: one less metric.
@@ -218,8 +217,7 @@ def test_validate_super_search_fields(name, properties):
    if properties.get("destination_keys"):
        for key in properties["destination_keys"]:
            possible_keys = [
                # Old keys we're probably migrating from
                f"raw_crash.{properties['in_database_name']}",
None of the keys start with `raw_crash.`, and we want to make sure that stays true going forward. We only want to index values that have been normalized and validated, and those live in the processed crash.
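The invariant the test now enforces can be sketched like this (the field definitions below are invented; the real test iterates over Socorro's super_search_fields):

```python
# Invented field definitions; real ones come from super_search_fields.
FIELDS = {
    "product": {"destination_keys": ["processed_crash.product"]},
    "version": {"destination_keys": ["processed_crash.version"]},
}

def raw_crash_destination_keys(fields):
    # Collect any destination key that still points into the raw crash;
    # after this change the list must be empty.
    return [
        key
        for props in fields.values()
        for key in props.get("destination_keys", [])
        if key.startswith("raw_crash.")
    ]

print(raw_crash_destination_keys(FIELDS))  # → []
```

A field that still targeted the raw crash would show up in the returned list, which makes the regression obvious in a test failure.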
In bug 1763264, we finished all the work to stop using the raw crash when indexing crash report data. The only data we're indexing now comes from the processed crash, after it has been normalized and validated by processor rules. This radically reduces indexing problems caused by weird values.

This cleans up the remnants of those old days by removing `raw_crash` from document building, indexing, and related documentation. It also removes a deepcopy call, which will speed up indexing a little.
Force-pushed from 4998e65 to 72595f9.
Thank you!