Plan the "production" deployment configuration for Soy #41

Closed
2 tasks
brian-lc opened this issue Jan 9, 2019 · 17 comments
@brian-lc

brian-lc commented Jan 9, 2019

Overview

Propose a plan to configure AWS accounts, users, and roles so that we feel comfortable using this setup to deploy "production" versions of our projects.

Reference

"The Plan" is maintained as a living post in the first comment on this issue: #41 (comment)

Questions

  • Do we need two accounts or is IAM good enough?

Assumptions

  • Developer accounts should not be allowed to change production assets
  • Developers should be able to change production assets in an emergency
  • Travis should be able to change production
  • Some kind of integration test suite should exist or be possible
  • Developer accounts should be able to see production logs
  • Uptime monitoring should be in place
    • Downtime alerts sent via Slack

Acceptance

  • Documented plan for soy in production

Tasks

  • A list of tasks that need
  • to be done to call this issue done
@brian-lc brian-lc self-assigned this Jan 9, 2019
@barlock barlock assigned barlock and unassigned brian-lc Jan 16, 2019
@barlock barlock added this to the Sprint - 1/25 milestone Jan 16, 2019
This was referenced Jan 16, 2019
@barlock
Contributor

barlock commented Jan 22, 2019

The Plan

This post will serve as a living document. It will evolve as needed based on discussion on and off this issue. Once the issue is closed it will stop evolving, and the plan should live on in its execution, ideally as permissioning and strategy defined as code.

Permissioning

We will have two accounts, Web3Studio - Dev and Web3Studio - Prod. The Dev account can be used by developers as a sandbox for testing and developing resources. The Prod account will be used for production resources.

Devs will be granted full access to both Dev and Prod. CI (Travis) will only deploy into Prod. In the future, CI could deploy into Dev first, run tests, and then deploy into Prod.

Devs who set up their local machines to access Prod need to make sure that the prod profile in their credentials file is not the default profile, to avoid accidental pushes.

https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-profiles.html

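# ~/.aws/credentials
# The default profile is used whenever no --profile flag or AWS_PROFILE is set.
[default]
aws_access_key_id=AKIAIOSFODNN7EXAMPLE
aws_secret_access_key=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY

# A named, non-default profile; it must be selected explicitly, e.g. --profile user1.
[user1]
aws_access_key_id=AKIAI44QH8DHBEXAMPLE
aws_secret_access_key=je7MtGbClwBF/2Zp9Utk/h3yCo8nvbEXAMPLEKEY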
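
A small pre-flight check could help catch the accidental-push risk described above. The following is only a sketch under assumptions: the profile name `prod` and the helper `default_is_prod` are hypothetical (neither is part of this plan or of the AWS CLI), and it assumes the prod keys are stored under a named profile in the same credentials file.

```python
import configparser


def default_is_prod(credentials_text: str, prod_profile: str = "prod") -> bool:
    """Return True if [default] carries the same access key as the prod profile.

    `prod_profile` is a hypothetical profile name chosen for illustration.
    """
    # Interpolation is disabled so that a '%' in a value is never treated specially.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(credentials_text)

    # If either profile is missing, the default cannot be pointing at prod.
    if not parser.has_section("default") or not parser.has_section(prod_profile):
        return False

    default_key = parser.get("default", "aws_access_key_id", fallback=None)
    prod_key = parser.get(prod_profile, "aws_access_key_id", fallback=None)
    return default_key is not None and default_key == prod_key
```

A developer could run this against the contents of `~/.aws/credentials` before deploying and abort if it returns `True`.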

Monitoring

Route 53 can monitor endpoints and post to SNS topics and CloudWatch.

As a first pass, we will monitor web3studio.eth.soy/web3studio/ and consensys.net/web3studio/
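
To make the monitoring targets concrete, here is a rough sketch of how each endpoint might be turned into a Route 53 health check definition. The dictionary keys follow the `HealthCheckConfig` structure of Route 53's `CreateHealthCheck` API; everything else is an assumption for illustration: the helper name `health_check_config` is hypothetical, HTTPS on port 443 is assumed, and the 30-second interval and failure threshold of 3 are the values mentioned later in this thread.

```python
from urllib.parse import urlsplit


def health_check_config(endpoint: str, interval_s: int = 30,
                        failure_threshold: int = 3) -> dict:
    """Build a HealthCheckConfig-shaped dict for one endpoint (illustrative only)."""
    # The endpoints in the plan are written without a scheme; HTTPS is assumed here.
    parts = urlsplit("https://" + endpoint)
    return {
        "Type": "HTTPS",
        "FullyQualifiedDomainName": parts.hostname,
        "ResourcePath": parts.path or "/",
        "Port": 443,
        "RequestInterval": interval_s,          # seconds between checks
        "FailureThreshold": failure_threshold,  # failed checks before unhealthy
        "MeasureLatency": True,                 # also record latency metrics
    }


# The two endpoints named in the plan.
ENDPOINTS = ["web3studio.eth.soy/web3studio/", "consensys.net/web3studio/"]
CONFIGS = [health_check_config(e) for e in ENDPOINTS]
```

Each resulting dictionary could then be passed to whatever tooling ends up creating the checks, which keeps the list of monitored endpoints in code.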

Deployments

We will create a functional user for Travis CI that can access Prod. This is the de facto primary way to create and update deployments.

While devs have full ability to control Prod as well, the standard is to do so only in emergencies and to have Travis do the pushing in the normal case.

@brian-lc
Author

We should also add third-party monitoring through Pingdom. I think they might still have a free version for monitoring a single site. Any downtime alerts from Pingdom should be sent to the web3studio gmail group.

@brian-lc
Author

For the question about how to isolate the production deployment: I've done this with multiple AWS accounts, which provides the most isolation. However, I'm not sure how much effort it takes to get a new production-only AWS account tied to the master Consensys billing org. It might be less of a hassle to deploy prod and dev/test in the same AWS account but use different user accounts for each one. Security- and configuration-wise, both options are the same (two accounts vs. one account). Having two separate accounts makes billing and cost accounting easier, which might be enough to justify the effort of setting up the prod-only AWS account.

If we go with prod IAM users inside a single prod/dev/test AWS account, we'd probably want to set up different users for 'prod'. This would not offer much in the way of security, because everyone on our team would have an account anyway, but it could help prevent accidental changes to production. I'd imagine we'd have users like 'username' for dev/test and 'username-prod' for production only. There would need to be a separate production role with full access to all production resources and nothing else. Likewise, the dev/test IAM users would not be allowed to add the production role/access.
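
As a rough illustration of the single-account option, the restriction on dev/test users could be expressed as an IAM policy that explicitly denies assuming the production role. This is a sketch, not a vetted policy: the helper name, the role name `soy-prod`, and the account ID `123456789012` are placeholders invented for the example.

```python
import json


def deny_assume_role_policy(prod_role_arn: str) -> dict:
    """Build an IAM policy document that blocks assuming the given role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyAssumeProdRole",
                "Effect": "Deny",  # an explicit Deny overrides any Allow
                "Action": "sts:AssumeRole",
                "Resource": prod_role_arn,
            }
        ],
    }


# Hypothetical ARN; the account ID and role name are placeholders.
PROD_ROLE_ARN = "arn:aws:iam::123456789012:role/soy-prod"
POLICY_JSON = json.dumps(deny_assume_role_policy(PROD_ROLE_ARN), indent=2)
```

Attaching a policy like this to the dev/test users (or their group) would leave the 'username-prod' users as the only ones able to take on the production role.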

@barlock
Contributor

barlock commented Jan 22, 2019

> We should also add 3rd party monitoring through Pingdom. I think they might still have a free version for monitoring a single site.

It doesn't look like they have a free version 😢.

Are you thinking that a third party is better so the infrastructure isn't tied to a single provider? What features are you looking for in a third party over just AWS? I've used Pingdom and it's great, with lots of awesome features that AWS doesn't have. I've also had really good experiences with New Relic.

My gut here is that the added hassle of figuring out how to pay for a third party that isn't AWS will be more pain than it's worth (yet).

@barlock
Contributor

barlock commented Jan 22, 2019

After some digging: it's absolutely possible to protect everything in a single account. For stack resources, which is what Soy is using now, there are some pretty good docs. However, it seems like it would be clumsy at best to make sure we were protecting the correct things.

It's not hard to get a second account. I've put in a request and we should have it tomorrow. After asking around the mesh, it seems like this is what people tend to do, as it's fairly frictionless (just log in) for a good amount of isolation.

It will be a bit of a pain to transfer the domain and cert, but worth it.

@brian-lc
Author

> Are you thinking that a third party is better so the infrastructure isn't tied to a single provider? What features are you looking for in a third party over just aws?

Yes, Pingdom simulates the traffic from users, so I trust it to be a more accurate reflection of the status of the site.

I'll just add it to my account. I'm paying for it already for my personal site.

@brian-lc
Author

Pingdom also has helpful performance metrics based on Lighthouse. Here's a manual run of the site: https://developers.google.com/speed/pagespeed/insights/?url=https%3A%2F%2Fconsensys.net%2Fweb3studio%2F

@barlock
Contributor

barlock commented Jan 23, 2019 via email

@barlock barlock mentioned this issue Jan 23, 2019
@barlock
Contributor

barlock commented Jan 23, 2019

@BreakPointer Looking at the Pingdom pricing, you need the Standard plan for alerting. Do you have a plan that enables that?

I'm thinking that for now we wire up both Route 53 health checks (for alerting) and Pingdom (for better information). If we can get a larger paid plan, we can switch to just Pingdom.

Ideally, I'd like to get our configuration as code. Pingdom does have an API, but it looks like it would require some scripting. I'm a bit surprised I'm not able to find any projects that do it for you in a CI environment! There is a pretty cool k8s ingress monitor that automates it, but that isn't helpful for us right now.

@humbitious

I'd advise going with Amazon for now and moving up to Pingdom after the experimental phase.

@brian-lc
Author

brian-lc commented Jan 24, 2019

Sounds good. I have added our site to my personal Pingdom (I have the "Starter" plan at $14/mo). This is enough CYA for me to sleep better at night (or not sleep... if the site goes down).

Also, I'm not sure about your comment about needing the 'standard' plan for alerting, @barlock. That is not my understanding, and I don't see anything on the subscription page to that effect.

@barlock
Contributor

barlock commented Jan 24, 2019

I could be misreading, it says Threshold alerting isn't included in standard.

Is it possible to get those notifications sent into the alarms channel? I'd be curious to see what it catches that Amazon doesn't.

@brian-lc
Author

Oh, perhaps that's for setting some custom threshold on something like response time. For what I have set up it really is just "uptime" alerting. I get emails/SMS about the site being down. Oh, and performance alerting for slow load times... But with my plan I only get performance monitoring on one site.

@brian-lc
Author

brian-lc commented Jan 24, 2019

The whole reason I wanted something like Pingdom is just uptime alerting (although their performance stuff is pretty neat and I've found it very useful). From my experience, I have learned that it's important to have something other than your own tooling/system/host telling you that everything is okay.

@brian-lc
Author

> Is it possible to get those notifications sent into the alarms channel? I'd be curious to see what it catches that amazon doesn't.

My options are SMS or email and I think with my account type they can only go to me.

@barlock
Contributor

barlock commented Jan 25, 2019

> The whole reason I wanted something like pingdom is just uptime alerting (although their performance stuff is pretty neat and I've found it very useful). From my experience I have learned that it's important to have something other than your own tooling/system/host telling you that everything is okay.

I couldn't agree more. Have you seen what Route53+CloudWatch gives us? It pings our endpoints every 30 seconds (from a bunch of regions), and a check fails if it doesn't get a success response 3 times. Upon failure, it notifies our Slack alarms channel.
It also measures TTFB and SSL handshake time.

Here are the charts: https://console.aws.amazon.com/cloudwatch/home?region=us-east-1#cw:dashboard=Route53

> My options are SMS or email and I think with my account type they can only go to me.

Email is fine; Slack has an integration that lets you forward notifications to a channel. That's what I'm using for the AWS checks as well. I can send you that address offline.

@barlock barlock closed this as completed Jan 25, 2019
@brian-lc
Author

brian-lc commented Jan 25, 2019 via email
