
bug-1885978: remove raw crash from indexing code #6560

Merged: 1 commit merged into main on Mar 20, 2024

Conversation

willkg (Collaborator) commented on Mar 20, 2024

In bug 1763264, we finished all the work to stop using the raw crash when indexing crash report data. The only data we're indexing now comes from the processed crash after being normalized and validated by processor rules. This radically reduces problems with indexing from weird values.

This cleans up the remnants of those old days by removing raw_crash from building documents, indexing, and related documentation. This also removes a deepcopy call which will speed up indexing a little.
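A minimal sketch of the resulting shape (hypothetical helper and field names, not Socorro's actual code): the index document is built from the processed crash alone, filtered against the known-valid keys, with no `raw_crash` section and no copying.

```python
# Hypothetical sketch: build the Elasticsearch document from the
# processed crash only. No raw_crash section, and no deepcopy, since
# we only read from the processed crash.
def build_crash_document(processed_crash, valid_keys):
    return {
        "processed_crash": {
            key: value
            for key, value in processed_crash.items()
            if key in valid_keys
        }
    }


doc = build_crash_document(
    {"product": "Firefox", "version": "124.0", "weird_field": 1},
    valid_keys={"product", "version"},
)
# weird_field is dropped; only validated, known fields are indexed
```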

@willkg willkg requested a review from a team as a code owner March 20, 2024 13:44
```diff
@@ -498,14 +498,10 @@ def save_processed_crash(self, raw_crash, processed_crash):
         es_doctype = self.get_doctype()
         all_valid_keys = self.get_keys(index_name, es_doctype)

         src = {
-            "raw_crash": copy.deepcopy(raw_crash),
```
willkg (Collaborator, Author):
One less deepcopy should make it a little faster, though I'm not sure we'd notice.
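A toy illustration (not Socorro code) of what the removed call was paying for: `copy.deepcopy` walks every nested container in the crash and allocates new objects, while referencing the dict directly skips that work entirely.

```python
import copy

# Toy data; real processed crashes are much larger, so the copy cost
# scales with crash size.
processed_crash = {"product": "Firefox", "json_dump": {"threads": []}}

copied = copy.deepcopy(processed_crash)
assert copied == processed_crash  # equal values...
assert copied["json_dump"] is not processed_crash["json_dump"]  # ...new objects

# Referencing the dict directly does no copying, which is safe as long
# as the document-building code never mutates it in place.
src = {"processed_crash": processed_crash}
assert src["processed_crash"] is processed_crash
```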

```diff
-    # Verify keys that aren't in super_search_fields aren't in the raw or processed
-    # crash parts
-    raw_crash = doc["_source"]["raw_crash"]
-    assert list(sorted(raw_crash.keys())) == []
```
willkg (Collaborator, Author):

One less "sort an empty list" in the codebase. 🎉

```diff
 mm.assert_histogram("processor.es.processed_crash_size", value=96)
-mm.assert_histogram("processor.es.crash_document_size", value=186)
+mm.assert_histogram("processor.es.crash_document_size", value=169)
```
willkg (Collaborator, Author):

This value changes because the document no longer contains `"raw_crash": {},`.
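A toy illustration of the size change, assuming `json.dumps`-style serialization with default separators: dropping the empty mapping removes the `, "raw_crash": {}` fragment, a 17-byte delta that happens to match the 186 → 169 change in the test.

```python
import json

# Toy documents: identical except for the empty "raw_crash" mapping.
with_raw = {"processed_crash": {"product": "Firefox"}, "raw_crash": {}}
without_raw = {"processed_crash": {"product": "Firefox"}}

delta = len(json.dumps(with_raw)) - len(json.dumps(without_raw))
# ', "raw_crash": {}' is 17 characters with json.dumps's default separators
assert delta == 17
```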

willkg (Collaborator, Author):

Now that I'm thinking about this, we could stop emitting the processed crash size metric. The "Socorro prod app metrics" dashboard only has a panel for the crash document size.

I'll fix this now: one less metric.

```diff
@@ -218,8 +217,7 @@ def test_validate_super_search_fields(name, properties):
     if properties.get("destination_keys"):
         for key in properties["destination_keys"]:
             possible_keys = [
-                # Old keys we're probably migrating from
-                f"raw_crash.{properties['in_database_name']}",
```
willkg (Collaborator, Author):

None of the keys start with `raw_crash.`, and we want to make sure that stays true going forward. We only want to index values that have been normalized and validated, and those live in the processed crash.
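A sketch of the tightened invariant (hypothetical field values, not a real super_search_fields entry): every destination key must live under `processed_crash`, never `raw_crash`.

```python
# Hypothetical example entry; real fields live in super_search_fields.
properties = {
    "in_database_name": "build",
    "destination_keys": ["processed_crash.build"],
}

for key in properties.get("destination_keys", []):
    # raw_crash is no longer a valid destination...
    assert not key.startswith("raw_crash.")
    # ...only the processed crash is.
    assert key.startswith("processed_crash.")
```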

@willkg willkg merged commit 8c84ea2 into main Mar 20, 2024
1 check passed
willkg (Collaborator, Author) commented on Mar 20, 2024
Thank you!
