Skip to content
This repository has been archived by the owner on Nov 13, 2021. It is now read-only.

Support multiple pinpoint configs #25

Merged
merged 13 commits into from May 27, 2020

Conversation

zachmargolis
Copy link
Contributor

Goal: fail over to a backup pinpoint region if the first region errors

Approach: Set up the gem to accept an array of SMS and Voice configs. When making an SMS or Voice call, iterate over each of the configs, returning after the first successful one (otherwise fall back on the next one)

The gem was relying on a singleton config variable and TBH I'm not a fan (especially not a fan of all the stubbing in specs), but I tried to keep my changes to a minimum.

@@ -45,6 +67,11 @@ def handle_pinpoint_error(err)
)
end

def notify_pinpoint_failover(error)
# TODO: log some sort of message?
Telephony.config.logger.warn "error region: region"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @pauldoomgov can you help me figure out what kind of warning, maybe even what format, we should log a warning here when we fail over? This will end up in telephony.log

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thus far it seems we log one JSON line per-SMS. We don't have a region defined. So first I would say extend the normal message:

{
    "success": true,
    "errors": {},
    "request_id": "61526818-bd93-476b-bb6b-6c34efaf9655",
    "delivery_status": "SUCCESSFUL",
    "message_id": "s0f82mp1v3c9ultqqvjpv0gc6t7f4khsf4vqat80",
    "status_code": 200,
    "status_message": "MessageId: s0f82mp1v3c9ultqqvjpv0gc6t7f4khsf4vqat80",
    "adapter": "pinpoint",
    "region": "us-west-2",
    "channel": "sms",
    "context": "authentication"
}

And then two variations. The first would be if a failover occurred, but the message was delivered:

 {
    "success": true,
    "errors": {
        "region_failover": {
            "status_code": "500",
            "region": "us-west-1",
            "status_message": "Uhhhh.... SMS don't feel so goood...."
        }
    },
    "request_id": "61526818-bd93-476b-bb6b-6c34efaf9655",
    "delivery_status": "SUCCESSFUL",
    "message_id": "s0f82mp1v3c9ultqqvjpv0gc6t7f4khsf4vqat80",
    "status_code": 200,
    "status_message": "MessageId: s0f82mp1v3c9ultqqvjpv0gc6t7f4khsf4vqat80",
    "adapter": "pinpoint",
    "region": "us-east-1",
    "channel": "sms",
    "context": "authentication"
}

And another if both failed:

 {
    "success": false,
    "errors": {
        "region_failover": {
            "status_code": "500",
            "region": "us-west-1",
            "status_message": "Uhhhh.... SMS don't feel so goood...."
        },
       "other_stuff_that_normally_goes_here": "yep"
    },
    "request_id": "61526818-bd93-476b-bb6b-6c34efaf9655",
    "delivery_status": "FAILURE",
    "message_id": "s0f82mp1v3c9ultqqvjpv0gc6t7f4khsf4vqat80",
    "status_code": 500,
    "status_message": "Dunno...  Maybe try sending a telegram?",
    "adapter": "pinpoint",
    "region": "us-east-1",
    "channel": "sms",
    "context": "authentication"
}

This should prevent double-counting message failures or (maybe?) breaking any existing parsing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already log the message delivery status, so this is a separate message, if we want one.

c.twilio.voice_callback_encryption_key = Base64.strict_encode64('0' * 32)
c.twilio.voice_callback_base_url = 'https://example.com/api/voice'

c.pinpoint.add_sms_config do |sms|
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a prototype of the IDP code that will use this new config stuff. It will check for a newer variable that's an array, otherwise it will fall back and add a single config based on the old variables

@jmhooper
Copy link
Member

Heads up, there are some instructions about using this thing in the README. I think those are out of date after this change.

@jmhooper
Copy link
Member

Incredible timing

README.md Outdated Show resolved Hide resolved
def client_configs
@client_configs ||= Telephony.config.pinpoint.sms_configs.map do |sms_config|
credentials = AwsCredentialBuilder.new(sms_config).call
args = { region: sms_config.region, retry_limit: 1 }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seeing the retry_limit param here reminds me of a caveat.

IIRC the request here has a 5 second timeout. With 1 retry that means each config has 10 seconds total is can spend sending an SMS. If Pinpoint is timing out that means 2 configs will take 20 seconds. That will push the user's request to the IdP outside the 15 second timeout window.

Since we have a failover, does it makes sense to go to no retries?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah happy to remove that.

Also that reminded me I was talking to @pauldoomgov about setting a timeout that covers our 95th percentile, it I think it's much less than 5 seconds

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Funny thing - NewRelic seems to think all we need is an average :( So I am not sure what our p95 is on PinPoint calls. I would guess 1.5s would be a sane starting point. That would give us 12sec if we fail, retry, failover+fail, retry. I think @jmhooper has a point about not retrying on failover.

One other item: Might be good to add timing output to the telephony output.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For reference - 3hr window of PinPoint API calls in Prod

Screen Shot 2020-05-27 at 14 49 24

Copy link
Member

@jmhooper jmhooper left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like you still have 2 CC issues to cleanup but LGTM otherwise

@zachmargolis
Copy link
Contributor Author

Yup! Waiting on some feedback from Paul, so the TODOs are intentionally failing the build for now :) Thanks for the review!

@zachmargolis zachmargolis merged commit c983718 into master May 27, 2020
@zachmargolis zachmargolis deleted the margolis-multiple-pinpoint-configs branch May 27, 2020 19:36
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants