bug-1886018: support pubsub crashqueue #6554
Conversation

Commits 7665840 to e7c4b80
pubsub definitely needs to be added to runservices. i'm in favor of changing app, but i don't have the context to be sure that's the right choice. edit: i'm going to commit the change to app, but i'm happy to remove/change that solution
I'm pretty sure the issue isn't the emulator, it's the pubsub client sdk. the data platform's ingestion-edge service has a flaky test where sometimes messages are published twice, and that's using a custom emulator. last time I looked the data platform had a duplicate rate of ~6% between the edge and bigquery, which goes edge->pubsub->dataflow->pubsub->java app->bigquery, and the dataflow job uses beam's built-in deduplication based on document_id over a 10-minute window. is this something that will break socorro, or is it just wasted compute? if it doesn't break anything i'd be inclined to let it slide.
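For context: Pub/Sub delivery is at-least-once, so occasional duplicate deliveries like this are expected behavior; the usual mitigation is consumer-side deduplication on a stable key within a time window, which is what the Beam document_id dedup described above does. A minimal sketch of that idea, assuming an in-memory seen-map with a 10-minute TTL (the class and names are illustrative, not Socorro or Beam code):

```python
import time

class Deduper:
    """Drop messages whose key was already seen within the TTL window."""

    def __init__(self, ttl_seconds=600):  # 10-minute window, like the Beam job
        self.ttl = ttl_seconds
        self.seen = {}  # key -> timestamp when first seen

    def is_duplicate(self, key, now=None):
        now = time.time() if now is None else now
        # evict expired entries so memory stays bounded
        self.seen = {k: t for k, t in self.seen.items() if now - t < self.ttl}
        if key in self.seen:
            return True
        self.seen[key] = now
        return False

dedupe = Deduper()
print(dedupe.is_duplicate("crash-123"))  # False: first delivery
print(dedupe.is_duplicate("crash-123"))  # True: redelivered within the window
```

A real deployment would keep the seen-set in shared storage rather than process memory, since multiple processor instances can each receive a copy of the same message.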
Getting processed multiple times and even having two things processing at the same time where one stomps on the other's results--Socorro should be fine with both of those situations. It's just wasted compute and if we're looking at numbers of crash reports collected vs. processed, there will be some discrepancy.
I pulled down the changes in this PR. I have a mostly default local dev env--very little is set in my my.env file. So it should be running in "AWS mode". When I run ./bin/process_crash.sh, it fails with an "unbound variable" error. That should get fixed. I provided a diff.

There were a couple of other comments of minor things.

I went through and tested the CLOUD_PROVIDER=GCP mode and processing, reprocessing with webapp, and reprocessing with api all work fine.

Almost there!
ready for another pass
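The "unbound variable" failure mentioned in the review above is what bash's `set -u` (part of `set -euo pipefail`) produces when a script expands a variable, such as CLOUD_PROVIDER, that was never exported. Guarding the expansion with a `:-` default avoids it. A sketch of that pattern (an assumed fix for illustration, not the actual diff from the PR):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Simulate the reviewer's mostly-default env, where CLOUD_PROVIDER is not set.
unset CLOUD_PROVIDER || true

# Under `set -u`, a bare ${CLOUD_PROVIDER} here would abort with
# "CLOUD_PROVIDER: unbound variable". ${VAR:-default} substitutes a
# fallback value instead of erroring.
CLOUD_PROVIDER="${CLOUD_PROVIDER:-AWS}"

if [[ "${CLOUD_PROVIDER^^}" == "GCP" ]]; then
    echo "using pubsub"
else
    echo "using sqs"   # this branch runs: the default is AWS
fi
```

The same guard works anywhere the script reads CLOUD_PROVIDER, which keeps the script usable in a default local dev env running in "AWS mode".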
Looks good--thank you!
I went through this test plan:
- with CLOUD_PROVIDER unset
  - does it build? does it rebuild?
  - can i verify it's using sqs and not pubsub?
  - does processing work?
  - does reprocessing api work?
  - does reprocessing via webapp work?
- with CLOUD_PROVIDER=GCP
  - does it build? does it rebuild?
  - can i verify it's using pubsub and not sqs?
  - does processing work?
  - does reprocessing api work?
  - does reprocessing via webapp work?
@@ -51,7 +52,12 @@ mkdir "${DATADIR}" || echo "${DATADIR} already exists."
./bin/socorro_aws_s3.sh ls --recursive "s3://${CRASHSTORAGE_S3_BUCKET}/"

# Add crash ids to queue
./socorro-cmd sqs publish "${SQS_STANDARD_QUEUE}" $@
# ^^ returns CLOUD_PROVIDER value as uppercase
if [[ "${CLOUD_PROVIDER^^}" == "GCP" ]]; then
Wow! That's bonkers! I had no idea that existed.
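For reference, the feature being reacted to here is bash's case-modification parameter expansion, available since bash 4.0: `${var^^}` uppercases the whole value, `${var^}` uppercases only the first character, and `${var,,}` lowercases the whole value. A quick demo:

```shell
#!/usr/bin/env bash
provider="gcp"
echo "${provider^^}"   # prints: GCP
echo "${provider^}"    # prints: Gcp
echo "${provider,,}"   # prints: gcp (lowercases the whole string)
```

This is why the diff can compare `"${CLOUD_PROVIDER^^}"` against "GCP" without caring how the user capitalized the value in their env file. Note it is a bashism: it requires a `bash` shebang and is not available in POSIX `sh`.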