
bug-1886018: support pubsub crashqueue #6554

Merged: 7 commits merged into main from pubsub-queue on Apr 3, 2024

Conversation

relud (Member) commented Mar 11, 2024

No description provided.

@relud relud force-pushed the pubsub-queue branch 9 times, most recently from 7665840 to e7c4b80 on March 18, 2024 at 16:09
@relud relud requested a review from willkg March 18, 2024 16:10
@relud relud marked this pull request as ready for review March 18, 2024 20:14
@relud relud requested a review from a team as a code owner March 18, 2024 20:14
@relud relud changed the title from "bug-1878423: support pubsub crashqueue" to "bug-1886018: support pubsub crashqueue" on Mar 18, 2024
willkg: This comment was marked as resolved.

relud: This comment was marked as resolved.

willkg: This comment was marked as resolved.

relud (Member, Author) commented Apr 1, 2024

The make setup step pauses for about 6 minutes and then fails with a "504 Deadline exceeded". What's going on is that make runservices doesn't start pubsub.

pubsub definitely needs to be added to runservices, but I also feel like either make setup should use a different container for its shell, one that defines depends_on in docker-compose.yml, or app should define depends_on, given that we don't have any extends in docker-compose.yml.

I'm in favor of changing app, but I don't have the context to be sure that's the right choice.

Edit: I'm going to commit the change to app, but I'm happy to remove or change that solution.
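(For illustration only, a hypothetical sketch of what declaring that dependency on the app service could look like; the service names are assumed and this is not necessarily the change committed here:)

```yaml
# docker-compose.yml (hypothetical excerpt)
services:
  app:
    # ...existing build/image/env configuration...
    depends_on:
      # assumed names for the emulator services the app talks to
      - localstack
      - pubsub
```

Note that depends_on without a healthcheck-based condition only orders container startup; it does not wait for the emulator to actually be ready.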

relud (Member, Author) commented Apr 1, 2024

Sometimes the crash report will get processed twice. I saw this happen twice in the minimal testing I did. Also, it looks like two threads each get the crash id, so it's getting processed twice at the same time.

I don't remember having this problem years ago when I did pubsub the first time around. Are we missing a setting somewhere in either the topic or subscription creation? Maybe this is a problem with the emulator we're using?

Also, I took a break to get a cup of coffee, and after I came back I couldn't reproduce it anymore.

I'm pretty sure the issue isn't the emulator, it's the pubsub client SDK. The data platform's ingestion-edge service has a flaky test where sometimes messages are published twice, and that's using a custom emulator.

Last time I looked, the data platform had a duplicate rate of ~6% between the edge and BigQuery, which goes edge -> pubsub -> dataflow -> pubsub -> java app -> BigQuery, and the dataflow job uses Beam's built-in deduplication based on document_id over a 10-minute window.

Is this something that will break Socorro, or is it just wasted compute? If it doesn't break anything, I'd be inclined to let it slide.

@relud relud requested a review from willkg April 1, 2024 17:42
relud: This comment was marked as resolved.

willkg (Collaborator) commented Apr 3, 2024

Getting processed multiple times and even having two things processing at the same time where one stomps on the other's results--Socorro should be fine with both of those situations. It's just wasted compute and if we're looking at numbers of crash reports collected vs. processed, there will be some discrepancy.

willkg (Collaborator) left a comment

I pulled down the changes in this PR. I have a mostly default local dev env--very little is set in my my.env file--so it should be running in "AWS mode". When I run ./bin/process_crashes.sh, it fails with an "unbound variable" error. That should get fixed. I provided a diff.
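(For context, an "unbound variable" error is what bash's set -u option produces when a script expands a variable that isn't set; in a default "AWS mode" dev env, CLOUD_PROVIDER is typically unset. Below is a minimal sketch of that failure mode and one possible guard, assuming the script enables set -u; this is an illustration, not the diff provided in this review.)

```bash
#!/usr/bin/env bash
set -euo pipefail  # -u (nounset): expanding an unset variable is a fatal error

# With set -u, a bare expansion like this aborts with
# "CLOUD_PROVIDER: unbound variable" whenever CLOUD_PROVIDER is not set:
#   if [[ "${CLOUD_PROVIDER^^}" == "GCP" ]]; then ...

# One possible guard: fall back to a default value before uppercasing.
CLOUD_PROVIDER="${CLOUD_PROVIDER:-AWS}"
if [[ "${CLOUD_PROVIDER^^}" == "GCP" ]]; then
    echo "queueing crash ids via pubsub"
else
    echo "queueing crash ids via sqs"
fi
```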

There were a couple of other comments about minor things.

I went through and tested the CLOUD_PROVIDER=GCP mode; processing, reprocessing via the API, and reprocessing via the webapp all work fine.

Almost there!

Makefile (review comment resolved)
docker-compose.yml (review comment resolved)
bin/process_crashes.sh (review comment outdated, resolved)
@relud relud requested a review from willkg April 3, 2024 15:42
relud (Member, Author) commented Apr 3, 2024

Ready for another pass.

willkg (Collaborator) left a comment

Looks good--thank you!

I went through this test plan:

  1. with CLOUD_PROVIDER unset
    1. does it build? does it rebuild?
    2. can I verify it's using sqs and not pubsub?
    3. does processing work?
    4. does reprocessing via the api work?
    5. does reprocessing via the webapp work?
  2. with CLOUD_PROVIDER=GCP
    1. does it build? does it rebuild?
    2. can I verify it's using pubsub and not sqs?
    3. does processing work?
    4. does reprocessing via the api work?
    5. does reprocessing via the webapp work?

@@ -51,7 +52,12 @@ mkdir "${DATADIR}" || echo "${DATADIR} already exists."
./bin/socorro_aws_s3.sh ls --recursive "s3://${CRASHSTORAGE_S3_BUCKET}/"

# Add crash ids to queue
./socorro-cmd sqs publish "${SQS_STANDARD_QUEUE}" $@
# ^^ returns CLOUD_PROVIDER value as uppercase
if [[ "${CLOUD_PROVIDER^^}" == "GCP" ]]; then
willkg (Collaborator) commented:

Wow! That's bonkers! I had no idea that existed.
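(For reference, ${VAR^^} is bash's case-modification parameter expansion, available since bash 4. A small standalone illustration:)

```bash
#!/usr/bin/env bash
# Case-modification parameter expansion (bash 4+).
cloud_provider="gCp"

echo "${cloud_provider^^}"   # GCP - uppercase every character
echo "${cloud_provider,,}"   # gcp - lowercase every character
echo "${cloud_provider^}"    # GCp - uppercase only the first character
```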

@relud relud merged commit e03e1fe into main Apr 3, 2024
1 check passed
@relud relud deleted the pubsub-queue branch April 3, 2024 17:52