# AWS Databases

## Overview

Standard Features

* VPC deployment
* Security groups to limit access
* Encryption at rest, in transit
* Replication for HA/Durability
* Automated backup
* Some services support multi-master replication across regions


Services

* RDS - Relational database service
    * Aurora, PostgreSQL, MySQL, Oracle, SQL Server
    * Used for traditional applications: ERP, CRM, e-commerce
* Redshift - data warehouse
    * Massively parallel columnar db
    * Integrates with S3
* NoSQL
    * DynamoDB, Cassandra, DocumentDB
    * Unlimited scale, single digit millisecond latency
* ElastiCache
    * In-memory data store
    * Memcached and Redis engines
* Neptune
    * Graph database, optimized for highly connected datasets, querying relationships
* Timestream
    * Time series database for storing and querying high-volume time series data at as little as 1/10 the cost of relational databases
    * IoT, industrial telemetry, DevOps
* Quantum Ledger Database
    * Ledger database (centrally managed, not a decentralized blockchain) providing a transparent, immutable, and cryptographically verifiable transaction log
* Elasticsearch
    * Search database: store, analyze, and correlate logs from disparate applications and services
    
AWS Database Migration Service

* One-time data replication
* Continuous data replication from on-premises to AWS cloud (and reverse)
* Homogeneous and heterogeneous replication

## Relational Database Service

Characteristics:

* General purpose - can model anything
* Rigid schema - difficult to change
* SQL - flexible querying method
* Complex, requiring specialized skills to administer
* Scaling challenges

RDS

* Automates time consuming admin tasks (hardware, installation, patching, backup)
* Production ready db in minutes
* Push button scaling (cpu, memory, storage)
* Six engines: Aurora, PostgreSQL, MySQL, MariaDB, Oracle, SQL Server

HA

* Primary in one AZ, Secondary in another AZ
* Writes are synchronously replicated to both primary and secondary (Multi-AZ configuration)
* If the primary goes down, the secondary is promoted to primary
* Can configure RDS to backup automatically to S3
    * Configurable for retention up to 35 days
    * Last restorable time - typically within last 5 minutes
    * Point in time restore up to specified second (in a new instance)
* Can take snapshot backups too
    * User initiated
    * Kept until explicitly deleted
    * Suitable for long term retention
    * Copy to another region
    
* Read replica
    * Supports read only traffic
    * Data replicated asynchronously, data can be stale
    * One or more replicas, depending on your engine type
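
The backup rules above (retention of up to 35 days, latest restorable time typically within the last 5 minutes) define a restore window. A minimal sketch of that window check follows; the function name `can_restore_to` and the fixed 5-minute lag are assumptions made for illustration, not part of any RDS API.

```python
from datetime import datetime, timedelta, timezone

MAX_RETENTION_DAYS = 35  # upper bound on automated backup retention, per the notes


def can_restore_to(target, now, retention_days,
                   latest_restorable_lag=timedelta(minutes=5)):
    """Return True if `target` falls inside the point-in-time restore window.

    The 5-minute lag is an assumed typical value; the real latest restorable
    time is reported by the service and varies.
    """
    if not 0 < retention_days <= MAX_RETENTION_DAYS:
        raise ValueError("retention_days must be between 1 and 35")
    earliest = now - timedelta(days=retention_days)
    latest = now - latest_restorable_lag
    return earliest <= target <= latest


now = datetime(2024, 1, 31, 12, 0, tzinfo=timezone.utc)
print(can_restore_to(now - timedelta(hours=1), now, retention_days=7))    # inside the window
print(can_restore_to(now - timedelta(minutes=1), now, retention_days=7))  # newer than latest restorable time
print(can_restore_to(now - timedelta(days=10), now, retention_days=7))    # older than the retention period
```

Anything older than the retention period would need a user-initiated snapshot instead, since those are kept until explicitly deleted.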
    
Scaling CPU and Memory

* Specify desired CPU and memory - RDS takes care of it
* Completes in a few minutes (there will be an interruption in service: RDS performs a failover during compute scaling)
* Scaling can be scheduled during next maintenance window or applied immediately

Scaling Storage

* Can be scaled without interruption
* SQL Server up to 16 TB; Aurora up to 64 TB; MySQL, MariaDB, PostgreSQL, Oracle up to 32 TB
* Can be immediate or scheduled for the maintenance window

Permissions and Encryption

* IAM for control plane access
* DB specific user for data plane access
* Optional encryption at rest using AWS Key Management Service (KMS)
* Optional encrypted connection support using TLS

Customization, Optimization

* Parameter groups
* Guidance via analyzing usage and config
* Reserved instances for long term use
* Use AWS Config to monitor config drift
* CloudWatch for monitoring

## Aurora

Improvements over traditional DBs

* Storage subsystem does replication: two copies in each of three AZs, six copies total
    * Writes are quorum based, acked when 4/6 copies are written
    * Read replicas can read directly from storage subsystem, low lag for read replicas
* Failover is fast
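
The 4-of-6 write quorum can be sketched as a simple count. This is a toy model of the rule stated above, not Aurora's actual storage protocol; the function and variable names are made up for illustration.

```python
COPIES_PER_AZ = 2
AZ_COUNT = 3
WRITE_QUORUM = 4  # a write is acknowledged once 4 of the 6 copies have it


def write_acknowledged(copy_ok):
    """copy_ok holds one boolean per storage copy: True if that copy persisted the write."""
    if len(copy_ok) != COPIES_PER_AZ * AZ_COUNT:
        raise ValueError("expected one result per storage copy")
    return sum(copy_ok) >= WRITE_QUORUM


all_healthy = [True] * 6
one_az_down = [True, True, True, True, False, False]        # both copies in one AZ lost
az_plus_one_down = [True, True, True, False, False, False]  # one AZ plus one more copy lost

print(write_acknowledged(all_healthy))
print(write_acknowledged(one_az_down))
print(write_acknowledged(az_plus_one_down))
```

Under this rule, losing an entire AZ (two copies) still leaves four copies, so writes keep being acknowledged; losing a third copy does not.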


Features

* MySQL and PostgreSQL compatibility modes
* Up to 5x faster than MySQL
* Up to 3x faster than PostgreSQL
* Security, availability, reliability of commercial databases at 1/10 the cost
* Up to 15 read replicas
* Global database option - multiregion replication

Endpoints

* Cluster endpoint - for writes and reads
* Reader Endpoint - points to read replicas, load balanced at connection level
* Instance - points to individual instances
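
One way an application might use the endpoints above is to send writes to the cluster endpoint and read-only work to the reader endpoint. The sketch below only picks a hostname; the `AuroraEndpoints` class and the `example.com` hostnames are placeholders, and no database connection is made.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuroraEndpoints:
    cluster: str  # accepts writes and reads
    reader: str   # spreads connections across read replicas


def endpoint_for(endpoints, read_only):
    """Choose the reader endpoint for read-only work, the cluster endpoint otherwise."""
    return endpoints.reader if read_only else endpoints.cluster


# Placeholder hostnames, not real Aurora endpoints.
endpoints = AuroraEndpoints(
    cluster="writer.db.example.com",
    reader="reader.db.example.com",
)
print(endpoint_for(endpoints, read_only=False))
print(endpoint_for(endpoints, read_only=True))
```

Because the reader endpoint balances at the connection level, a long-lived connection stays on whichever replica it first reached.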

Serverless

* Processing and storage are decoupled
* Can remove idling processing capacity
* Attach processing to storage on demand
* Good for intermittent or unpredictable workloads
    * Specify min and max Aurora capacity units; capacity can scale up and down within that range
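
The min/max bounds can be pictured as a clamp on whatever capacity the workload currently demands. This is only an illustration of the bounding behaviour; the notes do not describe how Aurora decides when to scale, and `target_capacity` is a hypothetical helper.

```python
def target_capacity(demand_units, min_units, max_units):
    """Bound the requested capacity to the configured [min_units, max_units] range."""
    if min_units > max_units:
        raise ValueError("min_units must not exceed max_units")
    return max(min_units, min(demand_units, max_units))


print(target_capacity(0.2, min_units=1, max_units=8))  # idle: held at the minimum
print(target_capacity(5, min_units=1, max_units=8))    # within range: follows demand
print(target_capacity(20, min_units=1, max_units=8))   # spike: capped at the maximum
```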
    
    
## NoSQL Databases

### DynamoDB

* Key/value store
* Flexible - only primary key needs to be defined, all columns/attributes flexible
* Consistent performance - single-digit millisecond latency when reading/writing with primary keys

Keys

* Simple - partition key only
* Composite - partition key, sort key
* Data is stored in partitions
    * Partition key -> hash -> server with data

Choose a partition key with a large number of unique values so data and traffic spread evenly across partitions
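
A rough sketch of why key cardinality matters: hash each partition key to a partition and count how many partitions actually receive data. SHA-256 and the fixed partition count of 8 are stand-ins chosen for this example; DynamoDB's internal hash function and partition layout are not described in these notes.

```python
import hashlib
from collections import Counter


def partition_for(partition_key, partition_count):
    """Map a partition key to a partition index via a hash (SHA-256 as a stand-in)."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


def distribution(keys, partition_count):
    """Count how many items land on each partition."""
    return Counter(partition_for(key, partition_count) for key in keys)


high_cardinality = [f"user-{i}" for i in range(10_000)]  # many distinct keys
low_cardinality = ["active", "inactive"] * 5_000          # only two distinct keys

print(len(distribution(high_cardinality, 8)))  # number of partitions in use
print(len(distribution(low_cardinality, 8)))   # at most 2, however many items there are
```

With only two distinct key values, all 10,000 items pile onto at most two partitions, which is the hot-partition problem a high-cardinality key avoids.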

DDB features

* Automated replication of data across AZs
* Global tables - multiregion, multimaster
* Transactions - coordinate actions across multiple items and tables
* Point in time recovery - automated continuous backup (35 day retention)
* On demand backup/snapshot for long term retention
* Automated deletion of expired items - TTL
* Limits - an item cannot exceed 400 KB
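
TTL works off a per-item numeric attribute holding an expiry time in epoch seconds. Since the service removes expired items in the background rather than at the exact expiry instant, readers sometimes filter them client-side. A minimal sketch of that filter follows; the attribute name `expires_at` and the sample items are hypothetical.

```python
import time

TTL_ATTRIBUTE = "expires_at"  # hypothetical attribute name chosen for this sketch


def is_expired(item, now=None):
    """True if the item carries a numeric TTL timestamp (epoch seconds) in the past."""
    now = time.time() if now is None else now
    expiry = item.get(TTL_ATTRIBUTE)
    return isinstance(expiry, (int, float)) and expiry < now


items = [
    {"session_id": "a", "expires_at": 1_000},
    {"session_id": "b", "expires_at": 5_000},
    {"session_id": "c"},  # no TTL attribute, so never treated as expired
]
live = [item["session_id"] for item in items if not is_expired(item, now=2_000)]
print(live)
```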

### Cassandra, DocumentDB

Amazon Managed Cassandra

* Move Cassandra workloads to the cloud
* Performance benefits similar to DDB
* Use for industrial equipment data collection, other use cases that require high performance and large number of columns


Cassandra vs DDB

* Can have multi-column partition and sort keys
* Cassandra can support up to 2 GB per column; general practice is not to exceed a few MB
* Unlimited number of columns without size limit for row

DocumentDB

* Offers API compatibility with MongoDB
* The managed service's API can drift from MongoDB's

## ElastiCache

* In-memory data store with sub-millisecond latency
* Ideal for frequently read data: reduces read traffic going to the database; can also buffer high-frequency writes and periodically reconcile with the backend database
* Usage
    * Product review and rating
    * Caching
    * Session management
    * Gaming leaderboards
    * Geospatial apps
* Deploy in your VPC
* Two engines - Memcached and Redis
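
Reducing read traffic to the database is commonly done with a cache-aside pattern: check the cache first and only hit the database on a miss. The sketch below uses an in-process dict in place of ElastiCache so it runs standalone; `CacheAside`, `load_review`, and the sample data are all made up for illustration.

```python
import time


class CacheAside:
    """Read-through helper: serve from cache when fresh, otherwise load and store."""

    def __init__(self, load_from_db, ttl_seconds, clock=time.monotonic):
        self._load = load_from_db
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache = {}  # key -> (value, expiry time); stands in for the cache cluster

    def get(self, key):
        now = self._clock()
        entry = self._cache.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]                      # cache hit: no database read
        value = self._load(key)                  # cache miss: go to the database
        self._cache[key] = (value, now + self._ttl)
        return value


db_reads = []  # records every simulated database read


def load_review(product_id):
    db_reads.append(product_id)
    return {"product_id": product_id, "rating": 4.5}


cache = CacheAside(load_review, ttl_seconds=60)
cache.get("p1")
cache.get("p1")
print(len(db_reads))  # the second get is served from the cache
```

The TTL bounds how stale a cached value can get; choosing it is a trade-off between freshness and database load.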

Memcached

* Key-value store
* Scales up to 20 nodes and 12.7 TB
* Sub millisecond latency

Redis

* Can store strings, lists, sorted sets, hashes, bit arrays
    * Sorted sets work great for leaderboards
* Built in commands for geospatial data
* Sub-millisecond latency
* Scales up to 250 nodes and 170 TB
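
To show why sorted sets suit leaderboards, here is a pure-Python imitation of the idea: members carry a score and can be read back in rank order. It does not talk to Redis; the `Leaderboard` class is hypothetical, and the tie-break by member name is a choice made for this sketch rather than a statement about Redis ordering.

```python
class Leaderboard:
    """In-memory imitation of a sorted set: one score per member, ranked reads."""

    def __init__(self):
        self._scores = {}

    def add(self, member, score):
        """Insert or update a member's score (similar in spirit to ZADD)."""
        self._scores[member] = score

    def top(self, n):
        """Return the n highest-scoring (member, score) pairs, best first."""
        ranked = sorted(self._scores.items(), key=lambda kv: (-kv[1], kv[0]))
        return ranked[:n]


board = Leaderboard()
board.add("alice", 120)
board.add("bob", 90)
board.add("carol", 300)
board.add("alice", 250)  # updating a member replaces the old score
print(board.top(2))
```

A real sorted set keeps members ordered as they are written, so ranked reads avoid the full sort this sketch performs on every call.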

Other features (Redis)

* Pub/sub and messaging
* Read replicas across multiple AZs
* Detects node failure, promotes read replica to primary
* Backup, restore
* Export to another region
* Lua scripting

## Redshift

* Petabyte-Scale Massively Parallel Relational Database
* Cluster consists of Leader Node and Multiple Compute Nodes
    * Available Storage = Storage per Compute Node × Number of Compute Nodes
* Columnar Storage
* Targeted Data Compression
* Powerful SQL based Analytics
* With Redshift Spectrum - query can span tables in Redshift and files stored in S3 Data Lake
