Commit
Merge pull request #7 from NoRedInk/multi-vpc-support
Add support for cross-vpc cloning
dgtized committed Apr 15, 2021
2 parents c204bff + 2e48dd6 commit 3c91007
Showing 17 changed files with 537 additions and 40 deletions.
3 changes: 3 additions & 0 deletions .gitignore
@@ -5,3 +5,6 @@ test-results/
/clj/
/.cpcache/
/resources/role.edn
.terraform/
*.tfstate
*.tfstate.backup
8 changes: 7 additions & 1 deletion CHANGELOG.md
@@ -4,8 +4,14 @@

### Added

## [0.6.0]

- Added the ability to clone when the source and target instances are in different VPCs, by automatically falling back to `RestoreDBInstanceFromDBSnapshot` instead of `CreateDBInstanceReadReplica`
- The restore is based on the latest available snapshot, which is usually at most 24h stale.
- Added `--restore-snapshot` to force even same-VPC clones to use `RestoreDBInstanceFromDBSnapshot` (it's currently faster)
- Added a Terraform environment for testing
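
As a rough illustration of the "latest snapshot available" rule above, here is a minimal sketch of picking the newest snapshot from a `DescribeDBSnapshots`-style response. The function name `latest-snapshot-id` and the sample data are assumptions made for this example; only the `:DBSnapshots`, `:SnapshotCreateTime`, and `:DBSnapshotIdentifier` keys come from the code in this commit.

```clojure
;; Sketch only: operates on an in-memory response map, no AWS call is made.
(defn latest-snapshot-id
  "Returns the identifier of the most recently created snapshot,
  or nil when the response contains no snapshots."
  [response]
  (->> (:DBSnapshots response)
       (sort-by :SnapshotCreateTime) ; oldest first
       last                          ; newest snapshot, or nil
       :DBSnapshotIdentifier))

;; Hypothetical response data, shaped like the keys used in this commit.
(def sample-response
  {:DBSnapshots
   [{:DBSnapshotIdentifier "rds:mitosis-prod-2021-04-13"
     :SnapshotCreateTime   #inst "2021-04-13T03:00:00Z"}
    {:DBSnapshotIdentifier "rds:mitosis-prod-2021-04-14"
     :SnapshotCreateTime   #inst "2021-04-14T03:00:00Z"}]})

(println (latest-snapshot-id sample-response))
(println (latest-snapshot-id {:DBSnapshots []}))
```

A `nil` result is the case that matters for cross-VPC clones, since there is then nothing to restore from.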

## [0.5.0]

- Added `--iam-policy` option for generating an IAM policy for a user or role to clone a replica with a minimal set of permissions.
- Updated dependencies

11 changes: 7 additions & 4 deletions README.md
@@ -8,7 +8,7 @@ data parity with production in a throw-away environment.

## Process

Suppose for testing or sales purposes it is necessary to maintain an independent application stack with its own database, which needs to periodically refresh data from the production database. To simplify the illustration, this will focus on the changes to the [AWS RDS](https://aws.amazon.com/rds/) database replication graphs, and omit the application and other services it may depend on.

Consider two independent application stacks, production and demo, with primary databases `mitosis-prod` and `mitosis-demo` respectively. Each stack has a replication graph where a primary database is followed by one replica, i.e. `mitosis-prod` replicates to `mitosis-prod-replica` and `mitosis-demo` replicates to `mitosis-demo-replica`.

@@ -34,12 +34,14 @@ Once that is complete, it's safe to rename the `temp-` prefixed clones back to `

![img](doc/img/rename-2.png)

However, as this is a DNS swap, the application is likely still connected to the original `old-mitosis-demo`. By specifying a restart script, stack-mitosis can force the demo application to restart, and connect to the newly created `mitosis-demo` with fresh data from production. Once it has restarted the application successfully, it deletes the `old-` prefixed database instances from the original demo replication graph.

![img](doc/img/final.png)

Note that this replication graph is a simple case, but stack-mitosis supports replacing arbitrarily complex replication graphs on RDS and has been verified with the MySQL and PostgreSQL database engines. The PostgreSQL engine on RDS only allows multiple replicas of a single primary, but the MySQL engine on RDS allows cascading replicas of replicas. See the AWS documentation for [working with RDS read replicas](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_ReadRepl.html) for more information on these limitations.

In the case where source and target instances live in different VPCs, or in case `--restore-snapshot` is used, the first instance (`temp-mitosis-demo` here) is created by restoring the latest available snapshot, instead of using replication. This also skips the promote replica step. All other steps remain the same.
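
The choice between replication and snapshot restore described above can be sketched roughly as follows. This is a simplified illustration, not the project's code: `clone-strategy` and the sample instance maps are hypothetical names invented for this example, while the `[:DBSubnetGroup :VpcId]` lookup mirrors the `same-vpc?` helper added in this commit.

```clojure
;; Sketch only: works on plain maps shaped like RDS instance descriptions.
(defn same-vpc?
  "True when both instance descriptions report the same VPC id."
  [db-a db-b]
  (= (get-in db-a [:DBSubnetGroup :VpcId])
     (get-in db-b [:DBSubnetGroup :VpcId])))

(defn clone-strategy
  "Hypothetical helper: returns :restore-snapshot when the flag is set or the
  instances live in different VPCs, otherwise :read-replica."
  [source target {:keys [restore-snapshot]}]
  (if (or restore-snapshot (not (same-vpc? source target)))
    :restore-snapshot
    :read-replica))

;; Hypothetical instance descriptions.
(def prod       {:DBInstanceIdentifier "mitosis-prod"
                 :DBSubnetGroup {:VpcId "vpc-aaa"}})
(def demo       {:DBInstanceIdentifier "mitosis-demo"
                 :DBSubnetGroup {:VpcId "vpc-aaa"}})
(def demo-other {:DBInstanceIdentifier "mitosis-demo"
                 :DBSubnetGroup {:VpcId "vpc-bbb"}})

(println (clone-strategy prod demo {}))
(println (clone-strategy prod demo {:restore-snapshot true}))
(println (clone-strategy prod demo-other {}))
```

Only the first instance of the tree is affected by this choice; the rest of the flight plan proceeds as in the same-VPC case, minus the promote step.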

# Install

After installing a JDK, follow the [clojure install
@@ -79,10 +81,11 @@ Hopefully in the future this can be parsed directly from the `AWS_CONFIG` file.
[--credentials resources/role.edn]
[--plan]
[--iam-policy]
[--restore-snapshot]

## Flight Plan

The `--plan` flag will give a flight plan showing the expected list of API calls it's planning on executing against the Amazon API.

```
$ clj -m stack-mitosis.cli --source mitosis-prod --target mitosis-demo --plan
@@ -179,7 +182,7 @@ This ensures that a continuous integration or cronjob server like Jenkins can cl

CloudFormation and Terraform are wonderful tools focused on declarative architecture transformation from one steady state to another. Stack-mitosis is focused on safely cloning the contents of a database in one environment to another without changing from one steady state to another. As an example, when production and demo environments both exist in the correct configuration before running stack-mitosis, the configuration remains the same after running it, but the demo environment has a fresh copy of the data from production.

I suspect this could also be accomplished using one of these declarative infrastructure tools by transitioning through multiple intervening states, but have not found any examples of anyone doing that.

# License

1 change: 1 addition & 0 deletions resources/log4j.properties
@@ -2,3 +2,4 @@ log4j.rootLogger=DEBUG, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss.SSS} | %-5p | %t | %m%n
log4j.appender.console.Target=System.err
40 changes: 26 additions & 14 deletions src/stack_mitosis/cli.clj
@@ -4,6 +4,7 @@
[clojure.tools.cli :as cli]
[clojure.tools.logging :as log]
[stack-mitosis.interpreter :as interpreter]
[stack-mitosis.lookup :as lookup]
[stack-mitosis.planner :as plan]
[stack-mitosis.policy :as policy]
[stack-mitosis.request :as r]
@@ -20,6 +21,7 @@
["-c" "--credentials FILENAME" "Credentials file in edn for iam assume-role"]
["-p" "--plan" "Display expected flightplan for operation."]
["-i" "--iam-policy" "Generate IAM policy for planned actions."]
[nil "--restore-snapshot" "Always clone using snapshot restore."]
["-h" "--help"]])

(defn parse-args [args]
@@ -52,26 +54,36 @@
(let [rds (interpreter/client)
instances (interpreter/databases rds)]
(when (interpreter/verify-databases-exist instances [source target])
(let [tags (interpreter/list-tags rds instances target)
plan (plan/replace-tree instances source target
:restart restart :tags tags)]
(cond (:plan options)
(do (println (flight-plan (interpreter/check-plan instances plan)))
true)
(:iam-policy options)
(do (json/pprint (policy/from-plan instances plan))
true)
:else
(let [last-action (interpreter/evaluate-plan rds plan)]
(not (contains? last-action :ErrorResponse))))))))
(let [same-vpc (lookup/same-vpc?
(lookup/by-id instances source)
(lookup/by-id instances target))
use-restore-snapshot (or (:restore-snapshot options) (not same-vpc))
source-snapshot (if use-restore-snapshot
(interpreter/latest-snapshot rds source)
nil)]

(when (or (not use-restore-snapshot)
(interpreter/verify-snapshot-exists instances [source target]
source-snapshot))
(let [tags (interpreter/list-tags rds instances target)
plan (plan/replace-tree instances source source-snapshot target
:restart restart :tags tags)]
(cond (:plan options)
(do (println (flight-plan (interpreter/check-plan instances plan)))
true)
(:iam-policy options)
(do (json/pprint (policy/from-plan instances plan))
true)
:else
(let [last-action (interpreter/evaluate-plan rds plan)]
(not (contains? last-action :ErrorResponse))))))))))

(defn -main [& args]
(let [{:keys [ok exit-msg] :as options} (parse-args args)]
(when exit-msg
(println exit-msg)
(System/exit (if ok 0 1)))
(System/exit (if (process options) 0 1))
))
(System/exit (if (process options) 0 1))))

(comment
(process (parse-args ["--source" "mitosis-prod" "--target" "mitosis-demo"
31 changes: 28 additions & 3 deletions src/stack_mitosis/interpreter.clj
@@ -51,6 +51,22 @@
false)
true)))

(defn verify-snapshot-exists
[instances identifiers snapshot]
(let [instances (map (partial lookup/by-id instances) identifiers)
vpcs (map #(get-in % [:DBSubnetGroup :VpcId]) instances)
cross-vpc-mitosis (-> vpcs distinct count (> 1))]
(if (and cross-vpc-mitosis (not snapshot))
(do
(log/error
(str/join "\n" ["Source database has no snapshots." ""
(str "Source and target databases are in different VPCs."
" When that happens, stack-mitosis uses "
"RestoreDBInstanceFromDBSnapshot to be able to"
" clone the source database to the target VPC.")]))
false)
true)))

(defn list-tags
"Mapping of db-id to tags list for each instance in a tree."
[rds instances target]
@@ -62,6 +78,15 @@
[db-id (:TagList (invoke-logged! rds (op/tags arn)))])))
(into {})))

(defn latest-snapshot
"Returns the latest snapshot for an instance"
[rds target]
(->> (invoke-logged! rds (op/list-snapshots target))
(:DBSnapshots)
(sort-by :SnapshotCreateTime)
(last)
(:DBSnapshotIdentifier)))

(defn describe
[rds id]
(invoke-logged! rds (op/describe id)))
@@ -75,7 +100,7 @@
(op/completed? (describe rds new-id)))]
[id #(op/completed? (describe rds id))])
started (. System (nanoTime))
ret (wait/poll-until completed-fn {:delay 60000 :max-attempts 120})
ret (wait/poll-until completed-fn {:delay 60000 :max-attempts 180})
msecs (/ (double (- (. System (nanoTime)) started)) 1000000.0)
status (-> (describe rds result-id) :DBInstances first :DBInstanceStatus)
msg (format "Completed after %.2fs with status %s" (/ msecs 1000) status)]
@@ -118,15 +143,15 @@
(sudo/sudo-provider (sudo/load-role "resources/role.edn"))
(def rds (client))
(-> (predict/state [] (example/create example/template))
(plan/replace-tree "mitosis-prod" "mitosis-demo"))
(plan/replace-tree "mitosis-prod" "mitosis-demo" nil))

(interpret rds (op/shell-command "echo restart"))
(evaluate-plan rds [(op/shell-command "true") (op/shell-command "false")
(op/shell-command "true")])

;; check plan
(let [state (databases rds)]
(check-plan state (plan/replace-tree state "mitosis-prod" "mitosis-demo")))
(check-plan state (plan/replace-tree state "mitosis-prod" "mitosis-demo" nil)))

;; create a copy of mitosis-prod tree
(let [state (databases rds)]
99 changes: 99 additions & 0 deletions src/stack_mitosis/lookup.clj
@@ -33,6 +33,64 @@
(and (or (seq? v) (vector? v))
(empty? v))))

(defn same-vpc? [db-a db-b]
(= (get-in db-a [:DBSubnetGroup :VpcId]) (get-in db-b [:DBSubnetGroup :VpcId])))

(defn restore-snapshot-attributes
"Creates a list of additional attributes to clone from original instance into
the newly created replica instance.
https://docs.aws.amazon.com/AmazonRDS/latest/APIReference/API_RestoreDBInstanceFromDBSnapshot.html
has more information on these attributes."
[original tags]
(let [attributes-to-clone
[:CopyTagsToSnapshot
:PubliclyAccessible
:AutoMinorVersionUpgrade
:DBInstanceClass
;; :DeletionProtection ; must be false for repeated invocation
;; :KmsKeyId ;; not supported by restore or modify
;; but is guaranteed to remain the same
;; it can only change when we copy a snapshot
;; :SourceRegion ; not applicable?
:ProcessorFeatures
;; :UseDefaultProcessorFeatures ; just copy features directly?
:Iops
:StorageType
:MultiAZ]

translated-attributes
{:Tags tags
:EnableIAMDatabaseAuthentication (:IAMDatabaseAuthenticationEnabled original)
:EnableCloudwatchLogsExports (:EnabledCloudwatchLogsExports original)
:Port (:Port (:Endpoint original))
:DBSubnetGroupName (:DBSubnetGroupName (:DBSubnetGroup original))

;; all active security groups ids
:VpcSecurityGroupIds
(->> original
:VpcSecurityGroups
(filter (fn [group] (= (:Status group) "active")))
(map :VpcSecurityGroupId))

;; first synchronized option group name
:OptionGroupName
(->> original
:OptionGroupMemberships
(some (fn [group]
(and (= (:Status group) "in-sync")
(:OptionGroupName group)))))
;; TODO map for names on original
;; :DomainMemberships -> :Domain, :DomainIAMRoleName
}]
(-> original
;; copy as-is with no translation
(select-keys attributes-to-clone)
;; Attributes requiring custom rules to extract from original and
;; translate to key for clone-replica request
(merge (into {} (remove (fn [[_ v]] (nil-or-empty? v))
translated-attributes))))))

(defn clone-replica-attributes
"Creates a list of additional attributes to clone from original instance into
the newly created replica instance.
@@ -122,3 +180,44 @@
;; translate to key for modify-db request
(merge (into {} (remove (fn [[_ v]] (nil-or-empty? v))
translated-attributes))))))

(defn post-restore-snapshot-attributes
"List of additional attributes to apply after creation.
Some parameters are not available or applicable at time of creation, so they
need to be applied after.
https://docs.aws.amazon.com/AmazonRDS/latest/APIReference/API_ModifyDBInstance.html
has more information on these attributes."
[original]
(let [attributes-to-clone ;; attributes not supported by restore-snapshot
[:MonitoringRoleArn
:MonitoringInterval
:PerformanceInsightsKMSKeyId
:PerformanceInsightsRetentionPeriod
:PreferredMaintenanceWindow
:PreferredBackupWindow
;; TODO ?
;; :AllocatedStorage ;; Tricky, only supports increase
;; :MaxAllocatedStorage
]

translated-attributes
{:EnablePerformanceInsights (:PerformanceInsightsEnabled original) ;; restore_not_supported

;; Triggers "The specified DB instance is already in the target DB subnet group"
;; probably need to detect if changing? disabling for now
;; :DBSubnetGroupName (:DBSubnetGroupName (:DBSubnetGroup original))
;; first synchronized db parameter group name
:DBParameterGroupName
(->> original
:DBParameterGroups
(some (fn [group]
(and (= (:ParameterApplyStatus group) "in-sync")
(:DBParameterGroupName group)))))}]
(-> original
(select-keys attributes-to-clone)
;; Attributes requiring custom rules to extract from original and
;; translate to key for modify-db request
(merge (into {} (remove (fn [[_ v]] (nil-or-empty? v))
translated-attributes))))))
16 changes: 15 additions & 1 deletion src/stack_mitosis/operations.clj
@@ -38,6 +38,20 @@
{:SourceDBInstanceIdentifier source
:DBInstanceIdentifier replica})}))

(defn restore-snapshot
([snapshot-id source target] (restore-snapshot snapshot-id source target {}))
([snapshot-id source target attributes]
{:op :RestoreDBInstanceFromDBSnapshot
:request (merge attributes
{:DBSnapshotIdentifier snapshot-id
:DBInstanceIdentifier target})
:meta {:SourceDBInstance source}}))

(defn list-snapshots
([target]
{:op :DescribeDBSnapshots
:request {:DBInstanceIdentifier target}}))

(defn promote
[id]
{:op :PromoteReadReplica
@@ -68,7 +82,7 @@
(defn blocking-operation?
[action]
(contains? #{:CreateDBInstance :CreateDBInstanceReadReplica
:PromoteReadReplica :ModifyDBInstance} (:op action)))
:PromoteReadReplica :ModifyDBInstance :RestoreDBInstanceFromDBSnapshot} (:op action)))

(defn transition-to
"Maps current rds status to in-progress, failed or done