This project simulates a database migration from an Oracle Database (version 12) to an Aurora cluster (Postgres engine). For simulation purposes, the source database (Oracle) is in one AWS Region (default: Oregon) and the Aurora cluster is in another (default: N. Virginia).
The following diagram illustrates the migration architecture.
The previous image illustrates the main components of the architecture, but be aware that terraform will create 36 resources in total. For example, the AWS DMS component will be composed of:
- 1 AWS DMS replication instance.
- 1 AWS DMS replication subnet group (with 2 subnets attached).
- 3 AWS DMS endpoints - one for Oracle (source), one for Aurora-Postgres (target) and one for S3 (target).
- 2 AWS DMS replication tasks - one for Aurora-Postgres and one for S3.
Important Note: Be aware that the AWS DMS replication tasks won't start automatically after their creation. Currently, you must start each task manually after the terraform apply command finishes. Another important aspect of this terraform code is that you must stop the AWS DMS replication tasks manually before executing terraform destroy; if you don't, terraform will hit a timeout while destroying some resources, because AWS can't release the resources attached to running replication tasks.
Create an EC2 instance with AWS Linux 2 AMI manually and connect to the instance using SSH.
To configure the EC2 instance, please download oracle-instantclient12.2-basic-12.2.0.1.0-1.x86_64.rpm
from this site. Once you have downloaded the file, proceed with the installation:
sudo yum update -y
sudo yum install oracle-instantclient12.2-basic-12.2.0.1.0-1.x86_64.rpm
sudo sh -c "echo /usr/lib/oracle/12.2/client64/lib > /etc/ld.so.conf.d/oracle-instantclient.conf"
sudo yum install git -y
sudo ldconfig
export LD_LIBRARY_PATH=/usr/lib/oracle/12.2/client64/lib:$LD_LIBRARY_PATH
python3 -m pip install cx_Oracle pandas --upgrade --user
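Once the Instant Client and cx_Oracle are installed, you can sanity-check connectivity from the instance. A minimal sketch; the endpoint, port, and SID below are placeholders for your own RDS values, and the actual connect call is left commented since it only works on the configured EC2 instance:

```python
# Placeholder RDS values -- replace with your own.
host = "oracledb-source.XXXXXXXXX.region.rds.amazonaws.com"
port, sid = 1521, "ORCL"  # 1521 is the RDS Oracle default port

# EZConnect-style string; note EZConnect treats the trailing part as a
# service name, so for SID-based connections prefer cx_Oracle.makedsn.
dsn = f"{host}:{port}/{sid}"

# On the configured EC2 instance, uncomment to test the connection:
# import cx_Oracle
# tns = cx_Oracle.makedsn(host, port, sid=sid)
# conn = cx_Oracle.connect("<username>", "<password>", tns)
# print(conn.version)
```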
Note: If you plan to use another Oracle version, you have to change the value of engine_version in the rds.tf file located inside the source module to match the version you need to use.
Download the dgen project from its repository. The steps shown in the dgen repository to install and configure this tool are the same ones we are executing here in 'Create and Configure an EC2 Instance'.
cd $HOME
git clone <this-repo>
echo 'export PATH="$PATH:<path-to-the-bin-folder-of-dgen>"' >> .bashrc
source .bashrc
To create the AMI, please execute the following command, using the right instance ID:
aws ec2 create-image --instance-id <instance-id> --name <ami-name>
Because the EC2 instance created through the terraform EC2 resource is configured to use a key pair, you have to create an asymmetric key. To do this, please execute:
ssh-keygen -t rsa
Note: Be aware of the destination of your key. By default, the private key will be located inside the ~/.ssh/ folder; if you have changed the location, please update the value of public_key in the ec2.tf file located inside the source module, or place the private key in the default location to avoid making adjustments to the terraform code.
Create a file named var.tfvars
at the root of the project.
nano var.tfvars
Paste the following text writing the right values:
ec2_ami = "<value>"
key_name = "<value>"
key_path = "<value>"
source_db_username = "<value>"
source_db_password = "<value>"
target_db_username = "<value>"
target_db_password = "<value>"
For the ec2_ami variable, use the ID of the AMI created in the previous steps. For the other variables, just write the appropriate values for your implementation.
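Before running terraform, you can check that var.tfvars actually assigns every variable the project expects. A small sketch with a naive `name = "value"` parser (real tfvars files can be more complex; full HCL parsing would need a dedicated library, but simple assignments are all this file uses):

```python
# The variables this project's var.tfvars must define.
REQUIRED = {"ec2_ami", "key_name", "key_path",
            "source_db_username", "source_db_password",
            "target_db_username", "target_db_password"}

def missing_vars(tfvars_text):
    """Return the required variables not assigned in the tfvars text.
    Only handles simple `name = "value"` lines."""
    defined = set()
    for line in tfvars_text.splitlines():
        if "=" in line:
            defined.add(line.split("=", 1)[0].strip())
    return sorted(REQUIRED - defined)

sample = 'ec2_ami = "ami-123"\nkey_name = "mykey"\n'
print(missing_vars(sample))  # lists the five still-missing variables
```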
For this step you don't have to do anything. It only checks the configuration established for this project. If you need to change something, go to the dms.tf file located in the target module.
- Table mappings:
{
rules = [
{
object-locator = {
schema-name = "SCHEMANAME"
table-name = "%"
}
rule-action = "include"
rule-id = "1"
rule-name = "1"
rule-type = "selection"
},
]
}
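The HCL mapping above is what terraform serializes into the JSON table-mapping document that DMS actually receives: select every table (the "%" wildcard) in the SCHEMANAME schema. For reference, the equivalent JSON built in Python:

```python
import json

# JSON form of the selection rule above: include all tables in SCHEMANAME.
table_mappings = {
    "rules": [
        {
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "1",
            "object-locator": {
                "schema-name": "SCHEMANAME",
                "table-name": "%",
            },
            "rule-action": "include",
        }
    ]
}

print(json.dumps(table_mappings, indent=2))
```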
Check the appendix section to identify the Replication task settings
for oracle-to-aurora
and oracle-to-s3
.
terraform apply -var-file=var.tfvars -auto-approve
In this step we define the credentials used to connect to the Oracle database using:
dgen config -l = <oracledb-source.XXXXXXXXX.region.rds.amazonaws.com> \
-s = <SID> \
-u = <username> \
-x = <userpassword>
Then we can initialize the schema implemented by dgen using:
dgen init
This action will create 3 tables with the following schema:
We can print the information related to each table using:
dgen data
And also we can populate tables using the following command (we will insert data only in table 1):
dgen insert -t table1 -n 10 -b 1,2,3
In this example, the argument -t defines the table, -n the number of new rows to create, and -b the blocks we need. Finally, we can print the information of each table again using dgen data.
Important: The dgen project uses a variable called block on each table. This variable is used to tag records created with dgen insert through the -b argument. Those tags can also be used to identify records and execute update or delete actions over them. We can use dgen delete -t table1 -b 1,2 to delete records tagged with block 1 or 2, and dgen update -t table1 -b 1 -m 'update-message' to update records with block equal to 1, changing the value of their cdc variable.
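To make the block-tag semantics concrete, here is a toy model of how those commands behave. This is illustrative Python, not dgen's actual implementation:

```python
# Toy table: each record carries the block tags it was inserted with.
table1 = [
    {"id": 1, "blocks": {1, 2, 3}, "cdc": None},
    {"id": 2, "blocks": {2},       "cdc": None},
    {"id": 3, "blocks": {4},       "cdc": None},
]

def delete(rows, blocks):
    """Like dgen delete -b: drop records tagged with ANY given block."""
    return [r for r in rows if not (r["blocks"] & set(blocks))]

def update(rows, blocks, message):
    """Like dgen update -b -m: set cdc on records tagged with any block."""
    for r in rows:
        if r["blocks"] & set(blocks):
            r["cdc"] = message
    return rows

remaining = delete(table1, [1, 2])   # only the block-4 record survives
updated = update(table1, [1], "update-message")
```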
To learn more about the dgen command please go to the documentation.
To test the available endpoints (connections), we have to start the testing process manually from the console, in the AWS DMS > Endpoints section, or with the following commands, replacing the values with the right outputs of the terraform apply execution:
- Testing the connection (Oracle | Aurora-Postgres | S3):
aws dms test-connection \
--replication-instance-arn <value> \
--endpoint-arn <value>
- Checking the connections:
After executing the following command, each endpoint must show "Status": "successful".
aws dms describe-connections
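If you prefer checking the statuses programmatically, a sketch that inspects the JSON returned by aws dms describe-connections; the sample payload and endpoint identifiers below are hypothetical:

```python
def all_successful(describe_connections_json):
    """True when every DMS endpoint connection reports 'successful'."""
    conns = describe_connections_json["Connections"]
    return bool(conns) and all(c["Status"] == "successful" for c in conns)

# Hypothetical output of `aws dms describe-connections`:
sample = {"Connections": [
    {"EndpointIdentifier": "oracle-source", "Status": "successful"},
    {"EndpointIdentifier": "aurora-target", "Status": "successful"},
    {"EndpointIdentifier": "s3-target",     "Status": "testing"},
]}
print(all_successful(sample))  # False: the S3 endpoint is still testing
```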
Like the previous step, to execute the migration tasks we have to start them manually from the console, in the AWS DMS > Replication tasks section, or with the following commands, replacing the values with the right outputs of the terraform apply execution:
- Run task (Aurora-Postgres | S3):
aws dms start-replication-task \
--replication-task-arn <value> \
--start-replication-task-type start-replication
Note that we are using start-replication as the replication task type, but you can also use resume-processing. Check this link to learn more about the task starting process.
- Checking the tasks:
aws dms describe-replication-tasks
- Stop task (Aurora-Postgres | S3):
aws dms stop-replication-task \
--replication-task-arn <value>
This process could take a couple of minutes; at the end you should see "Status": "stopped" in the output of the following command.
- Checking the tasks:
aws dms describe-replication-tasks
After checking the status (has to be stopped
), we can destroy all the resources we created with:
terraform destroy -var-file=var.tfvars -auto-approve
At this point you have learned how to migrate an Oracle Database with the Ongoing Changes (CDC) schema to an Aurora cluster and a S3 Bucket.
I hope you have enjoyed this material. Thanks 😄
Terraform:
- Multi-environment and multi-region infrastructure
- Multi-region deployment
- AWS resources across multiple regions
- Migrating Oracle to AWS Aurora PostgreSQL
Oracle SQL and Data Structure Example:
Python lib for Oracle connectivity:
- Oracle-to-Aurora Replication task settings:
{
BeforeImageSettings = null
ChangeProcessingDdlHandlingPolicy = {
HandleSourceTableAltered = true
HandleSourceTableDropped = true
HandleSourceTableTruncated = true
}
ChangeProcessingTuning = {
BatchApplyMemoryLimit = 500
BatchApplyPreserveTransaction = true
BatchApplyTimeoutMax = 30
BatchApplyTimeoutMin = 1
BatchSplitSize = 0
CommitTimeout = 1
MemoryKeepTime = 60
MemoryLimitTotal = 1024
MinTransactionSize = 1000
StatementCacheSize = 50
}
CharacterSetSettings = null
ControlTablesSettings = {
ControlSchema = ""
FullLoadExceptionTableEnabled = false
HistoryTableEnabled = false
HistoryTimeslotInMinutes = 5
StatusTableEnabled = false
SuspendedTablesTableEnabled = false
}
ErrorBehavior = {
ApplyErrorDeletePolicy = "IGNORE_RECORD"
ApplyErrorEscalationCount = 0
ApplyErrorEscalationPolicy = "LOG_ERROR"
ApplyErrorFailOnTruncationDdl = false
ApplyErrorInsertPolicy = "LOG_ERROR"
ApplyErrorUpdatePolicy = "LOG_ERROR"
DataErrorEscalationCount = 0
DataErrorEscalationPolicy = "SUSPEND_TABLE"
DataErrorPolicy = "LOG_ERROR"
DataTruncationErrorPolicy = "LOG_ERROR"
FailOnNoTablesCaptured = true
FailOnTransactionConsistencyBreached = false
FullLoadIgnoreConflicts = true
RecoverableErrorCount = -1
RecoverableErrorInterval = 5
RecoverableErrorStopRetryAfterThrottlingMax = true
RecoverableErrorThrottling = true
RecoverableErrorThrottlingMax = 1800
TableErrorEscalationCount = 0
TableErrorEscalationPolicy = "STOP_TASK"
TableErrorPolicy = "SUSPEND_TABLE"
}
FailTaskWhenCleanTaskResourceFailed = false
FullLoadSettings = {
CommitRate = 10000
CreatePkAfterFullLoad = false
MaxFullLoadSubTasks = 8
StopTaskCachedChangesApplied = false
StopTaskCachedChangesNotApplied = false
TargetTablePrepMode = "DROP_AND_CREATE"
TransactionConsistencyTimeout = 600
}
Logging = {
EnableLogging = true
LogComponents = [
{
Id = "TRANSFORMATION"
Severity = "LOGGER_SEVERITY_DEFAULT"
},
{
Id = "SOURCE_UNLOAD"
Severity = "LOGGER_SEVERITY_DEFAULT"
},
{
Id = "IO"
Severity = "LOGGER_SEVERITY_DEFAULT"
},
{
Id = "TARGET_LOAD"
Severity = "LOGGER_SEVERITY_DEFAULT"
},
{
Id = "PERFORMANCE"
Severity = "LOGGER_SEVERITY_DEFAULT"
},
{
Id = "SOURCE_CAPTURE"
Severity = "LOGGER_SEVERITY_DEFAULT"
},
{
Id = "SORTER"
Severity = "LOGGER_SEVERITY_DEFAULT"
},
{
Id = "REST_SERVER"
Severity = "LOGGER_SEVERITY_DEFAULT"
},
{
Id = "VALIDATOR_EXT"
Severity = "LOGGER_SEVERITY_DEFAULT"
},
{
Id = "TARGET_APPLY"
Severity = "LOGGER_SEVERITY_DEFAULT"
},
{
Id = "TASK_MANAGER"
Severity = "LOGGER_SEVERITY_DEFAULT"
},
{
Id = "TABLES_MANAGER"
Severity = "LOGGER_SEVERITY_DEFAULT"
},
{
Id = "METADATA_MANAGER"
Severity = "LOGGER_SEVERITY_DEFAULT"
},
{
Id = "FILE_FACTORY"
Severity = "LOGGER_SEVERITY_DEFAULT"
},
{
Id = "COMMON"
Severity = "LOGGER_SEVERITY_DEFAULT"
},
{
Id = "ADDONS"
Severity = "LOGGER_SEVERITY_DEFAULT"
},
{
Id = "DATA_STRUCTURE"
Severity = "LOGGER_SEVERITY_DEFAULT"
},
{
Id = "COMMUNICATION"
Severity = "LOGGER_SEVERITY_DEFAULT"
},
{
Id = "FILE_TRANSFER"
Severity = "LOGGER_SEVERITY_DEFAULT"
},
]
}
LoopbackPreventionSettings = null
PostProcessingRules = null
StreamBufferSettings = {
CtrlStreamBufferSizeInMB = 5
StreamBufferCount = 3
StreamBufferSizeInMB = 8
}
TargetMetadata = {
BatchApplyEnabled = false
FullLobMode = false
InlineLobMaxSize = 0
LimitedSizeLobMode = true
LoadMaxFileSize = 0
LobChunkSize = 0
LobMaxSize = 32
ParallelApplyBufferSize = 0
ParallelApplyQueuesPerThread = 0
ParallelApplyThreads = 0
ParallelLoadBufferSize = 0
ParallelLoadQueuesPerThread = 0
ParallelLoadThreads = 0
SupportLobs = true
TargetSchema = ""
TaskRecoveryTableEnabled = false
}
ValidationSettings = {
EnableValidation = true
FailureMaxCount = 10000
HandleCollationDiff = false
MaxKeyColumnSize = 8096
PartitionSize = 10000
RecordFailureDelayInMinutes = 5
RecordFailureDelayLimitInMinutes = 0
RecordSuspendDelayInMinutes = 30
SkipLobColumns = false
TableFailureMaxCount = 1000
ThreadCount = 5
ValidationMode = "ROW_LEVEL"
ValidationOnly = false
ValidationPartialLobSize = 0
ValidationQueryCdcDelaySeconds = 0
}
}
Be aware of TargetTablePrepMode
if you plan to use AWS SCT (Schema Conversion Tool).