# Databases and AWS
> Introduction to AWS Databases

- toc: true 
- comments: true
- author: Ankush Agarwal
- categories: [aws, databases]

### Introduction
    Amazon RDS provides support for six popular relational database engines: 
        MySQL, Oracle, PostgreSQL, Microsoft SQL Server, MariaDB, and Amazon Aurora
    Amazon Redshift is a high-performance data warehouse designed specifically for OLAP use cases.
    Traditional relational databases are difficult to scale beyond a single server without significant 
        engineering and cost, but a NoSQL architecture allows for horizontal scalability on commodity hardware.
    Amazon RDS makes it easy to replicate your data to increase availability, improve durability, or scale 
        out beyond a single database instance for read-heavy database workloads.

### RDS
    Amazon RDS for MySQL, PostgreSQL, MariaDB, Oracle, and Microsoft SQL Server supports Multi-AZ deployments 
        for high availability; read replica support for horizontal scaling varies by engine.
    Amazon Aurora can deliver up to five times the performance of MySQL without requiring changes to most 
        of your existing web applications. 
    An Amazon Aurora DB cluster consists of two different types of instances:
        Primary Instance 
            This is the main instance, which supports both read and write workloads. 
            When you modify your data, you are modifying the primary instance. 
            Each Amazon Aurora DB cluster has one primary instance.
        Amazon Aurora Replica 
            This is a secondary instance that supports only read operations. 
            Each DB cluster can have up to 15 Amazon Aurora Replicas in addition to the primary instance.
            By using multiple Amazon Aurora Replicas, you can distribute the read workload among various 
                instances, increasing performance. 
            You can also locate your Amazon Aurora Replicas in multiple Availability Zones to increase 
                your database availability.
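
The split between the primary instance and the Aurora Replicas can be illustrated with a toy router. This is a minimal sketch, not how Aurora routes traffic internally: the class name `AuroraRouter` and the endpoint strings are hypothetical, and the round-robin policy is an assumption made for illustration.

```python
from itertools import cycle


class AuroraRouter:
    """Toy router: writes go to the primary, reads rotate across replicas."""

    MAX_REPLICAS = 15  # per-cluster replica limit stated in the notes above

    def __init__(self, primary, replicas):
        if len(replicas) > self.MAX_REPLICAS:
            raise ValueError("a DB cluster can have at most 15 Aurora Replicas")
        self.primary = primary
        # The primary also serves reads, so fall back to it when there are no replicas.
        self._readers = cycle(replicas or [primary])

    def endpoint_for(self, operation):
        """Return the endpoint that should serve a 'read' or 'write' operation."""
        if operation == "write":
            return self.primary
        if operation == "read":
            return next(self._readers)
        raise ValueError(f"unknown operation: {operation!r}")


# Hypothetical endpoint names, used only for illustration.
router = AuroraRouter("primary.example", ["replica-1.example", "replica-2.example"])
```

Calling `router.endpoint_for("read")` repeatedly alternates between the two replicas, while `router.endpoint_for("write")` always returns the primary.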

#### Storage Options
    Amazon RDS is built using Amazon Elastic Block Store (Amazon EBS) and allows you to select the right 
        storage option based on your performance and cost requirements. 
    Depending on the database engine and workload, you can scale up to 4 to 6TB in provisioned storage 
        and up to 30,000 IOPS. 
    Amazon RDS supports three storage types: Magnetic, General Purpose (Solid State Drive [SSD]), 
        and Provisioned IOPS (SSD)
    
    Magnetic: Magnetic storage, also called standard storage, offers cost-effective storage that is ideal 
        for applications with light I/O requirements.
    General Purpose (SSD): General Purpose (SSD)-backed storage, also called gp2, can provide faster access 
        than magnetic storage. This storage type can provide burst performance to meet spikes and is excellent 
        for small- to medium-sized databases.
    Provisioned IOPS (SSD): Provisioned IOPS (SSD) storage is designed to meet the needs of I/O-intensive 
        workloads, particularly database workloads, that are sensitive to storage performance and consistency 
        in random access I/O throughput.
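
As a rough sketch, the three storage options map onto the `StorageType` values `standard`, `gp2`, and `io1` accepted by the RDS `CreateDBInstance` API. The helper below only builds the storage-related keyword arguments and never calls AWS; the function name `storage_params` and the friendly keys in `STORAGE_TYPES` are made up for this example.

```python
# API storage-type names for the three options described above.
STORAGE_TYPES = {
    "magnetic": "standard",
    "general_purpose": "gp2",
    "provisioned_iops": "io1",
}


def storage_params(kind, allocated_gb, iops=None):
    """Build the storage-related keyword arguments for a create_db_instance call."""
    if kind not in STORAGE_TYPES:
        raise ValueError(f"unknown storage kind: {kind!r}")
    params = {"StorageType": STORAGE_TYPES[kind], "AllocatedStorage": allocated_gb}
    if kind == "provisioned_iops":
        # Provisioned IOPS storage needs an explicit IOPS figure.
        if iops is None:
            raise ValueError("provisioned_iops requires an iops value")
        params["Iops"] = iops
    elif iops is not None:
        raise ValueError("iops only applies to provisioned_iops storage")
    return params
```

The resulting dictionary could be merged into the arguments of a boto3 `create_db_instance` call alongside the engine, instance class, and credentials.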

#### Backup and Recovery
    RPO (Recovery Point Objective) is defined as the maximum period of data loss that is acceptable in the 
        event of a failure or incident.
    RTO (Recovery Time Objective) is defined as the maximum amount of downtime that is permitted to recover 
        from backup and to resume processing.
        
    Automated Backups
        An automated backup is an Amazon RDS feature that continuously tracks changes and backs up your database
        One day of backups will be retained by default, but you can modify the retention period up to a 
            maximum of 35 days
    Manual DB Snapshots
        In addition to automated backups, you can perform manual DB snapshots at any time. 
        Unlike automated snapshots that are deleted after the retention period, manual DB snapshots are 
            kept until you explicitly delete them with the Amazon RDS console or the DeleteDBSnapshot action.
        You cannot restore from a DB snapshot to an existing DB Instance; a new DB Instance is created when 
            you restore. 
        When you restore a DB Instance, only the default DB parameter and security groups are associated 
            with the restored instance. 
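
The two rules above (a retention period of at most 35 days, and restores that always create a new DB Instance) can be captured in a small sketch. The helpers only build keyword arguments shaped like those of the boto3 `modify_db_instance` and `restore_db_instance_from_db_snapshot` calls; the function names and identifiers are hypothetical, and a retention of 0, which disables automated backups, is deliberately rejected here.

```python
MAX_RETENTION_DAYS = 35  # upper bound on automated-backup retention


def retention_params(instance_id, days):
    """Keyword arguments for modify_db_instance that change backup retention."""
    if not 1 <= days <= MAX_RETENTION_DAYS:
        raise ValueError(f"retention must be between 1 and {MAX_RETENTION_DAYS} days")
    return {"DBInstanceIdentifier": instance_id, "BackupRetentionPeriod": days}


def restore_params(snapshot_id, new_instance_id, existing_instance_ids):
    """Keyword arguments for restore_db_instance_from_db_snapshot.

    A snapshot cannot be restored onto an existing DB Instance, so the
    target identifier must not already be in use.
    """
    if new_instance_id in existing_instance_ids:
        raise ValueError("restore always creates a new DB Instance; pick an unused identifier")
    return {"DBSnapshotIdentifier": snapshot_id, "DBInstanceIdentifier": new_instance_id}
```

After a restore, any custom DB parameter group or security group would still have to be attached to the new instance separately, as noted above.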

#### High Availability with Multi-AZ
    Multi-AZ allows you to place a secondary copy of your database in another Availability Zone for 
        disaster recovery purposes
    Amazon RDS automatically replicates the data from the master database or primary instance to the 
        slave database or secondary instance using synchronous replication.
    To improve database performance using multiple DB Instances, use read replicas or other DB caching 
        technologies such as Amazon ElastiCache.

#### Scaling Up and Out
    Vertical Scalability
        Storage expansion is supported for all of the database engines except for SQL Server.
    Horizontal Scalability with Partitioning
        A relational database can be scaled vertically only so much before you reach the maximum instance size.
        Partitioning a large relational database into multiple instances or shards is a common technique 
            for handling more requests beyond the capabilities of a single instance.
        The application needs to decide how to route database requests to the correct shard and becomes 
            limited in the types of queries that can be performed across server boundaries.
        NoSQL databases like Amazon DynamoDB or Cassandra are designed to scale horizontally.
    Horizontal Scalability with Read Replicas
        Another important scaling technique is to use read replicas to offload read transactions from the 
            primary database and increase the overall number of transactions
        Read replicas are currently supported in Amazon RDS for MySQL, PostgreSQL, MariaDB and Amazon Aurora
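
With partitioning, the routing logic lives in the application. One common approach, sketched here under assumptions, is to hash a key and take it modulo the number of shards. The shard names, the choice of SHA-256, and the function names are all illustrative and not prescribed by Amazon RDS.

```python
import hashlib

# Hypothetical shard endpoints; each would be a separate DB Instance.
SHARDS = ["orders-db-0", "orders-db-1", "orders-db-2", "orders-db-3"]


def shard_for(key, shard_count):
    """Map a key to a shard index using a stable hash of its string form."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def endpoint_for(customer_id):
    """Return the shard that owns all rows for the given customer."""
    return SHARDS[shard_for(customer_id, len(SHARDS))]
```

Every request for the same customer lands on the same shard, but a query that spans customers has to be sent to every shard and merged by the application, which is the limitation described above. Changing the number of shards also remaps most keys under this simple modulo scheme.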

#### Security
    Protect access to your infrastructure resources using AWS Identity and Access Management (IAM) policies 
        that limit which actions AWS administrators can perform
    Another security best practice is to deploy your Amazon RDS DB Instances into a private subnet within 
        an Amazon Virtual Private Cloud (Amazon VPC) that limits network access to the DB Instance. 
    Restrict network access using network Access Control Lists (ACLs) and security groups to limit inbound 
        traffic to a short list of source IP addresses.
    At the database level, you will also need to create users and grant them permissions to read and write 
        to your databases. 
    Access to the database is controlled using the database engine-specific access control and user 
        management mechanisms
    Finally, protect the confidentiality of your data in transit and at rest with multiple encryption 
        capabilities provided with Amazon RDS.
    You can securely connect a client to a running DB Instance using Secure Sockets Layer (SSL) to protect 
        data in transit. 
    Encryption at rest is possible for all engines using the AWS Key Management Service (AWS KMS); 
        Oracle and Microsoft SQL Server can alternatively use Transparent Data Encryption (TDE). 
    All logs, backups, and snapshots are encrypted for an encrypted Amazon RDS instance.
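
Restricting inbound traffic to a short list of source addresses usually comes down to a security group rule. The sketch below builds one rule in the shape used by the EC2 `AuthorizeSecurityGroupIngress` API (`IpPermissions` entries) without calling AWS. The function name `db_ingress_rule`, the default port 3306 (MySQL), and the refusal of a `/0` range are assumptions made for this example.

```python
import ipaddress


def db_ingress_rule(cidr_blocks, port=3306):
    """Build one IpPermissions entry allowing TCP traffic to the database port."""
    ranges = []
    for cidr in cidr_blocks:
        network = ipaddress.ip_network(cidr)  # raises ValueError on malformed input
        if network.prefixlen == 0:
            # A /0 range would open the database to every address.
            raise ValueError("refusing to open the database to all addresses")
        ranges.append({"CidrIp": cidr})
    return {"IpProtocol": "tcp", "FromPort": port, "ToPort": port, "IpRanges": ranges}


# Documentation-range addresses, used only for illustration.
rule = db_ingress_rule(["10.0.1.0/24", "203.0.113.10/32"])
```

The rule could then be passed as one element of `IpPermissions` when authorizing ingress on the security group attached to the DB Instance.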

### Amazon Redshift
    Amazon Redshift is a fast, powerful, fully managed, petabyte-scale data warehouse service in the cloud. 
    Amazon Redshift is a relational database designed for OLAP scenarios and optimized for high-performance
        analysis and reporting of very large datasets. 
    Amazon Redshift is based on industry-standard PostgreSQL, so most existing SQL client applications 
        will work with only minimal changes.
    The key component of an Amazon Redshift data warehouse is a cluster. 
    A cluster is composed of a leader node and one or more compute nodes. 
    The client application interacts directly only with the leader node, and the compute nodes are transparent 
        to external applications.
    The six node types are grouped into two categories: Dense Compute and Dense Storage. 
    The Dense Compute node types support clusters up to 326TB using fast SSDs, while the Dense Storage 
        nodes support clusters up to 2PB using large magnetic disks
    When you submit a query, Amazon Redshift distributes and executes the query in parallel across all of 
        a cluster’s compute nodes. 
    Amazon Redshift also spreads your table data across all compute nodes in a cluster based on a 
        distribution strategy that you specify
        
    Table Design
        Data Types
        Compression Encoding
        Distribution Strategy
            EVEN distribution  
                This is the default option and results in the data being distributed across the slices 
                    in a uniform fashion regardless of the data.
            KEY distribution  
                With KEY distribution, the rows are distributed according to the values in one column. 
                The leader node places rows with matching values on the same node slice, which improves 
                    query performance for joins on that column.
            ALL distribution  
                With ALL, a full copy of the entire table is distributed to every node. 
                This is useful for lookup tables and other large tables that are not updated frequently.
        Sort Keys

    Loading Data
        COPY
    Querying Data
    Snapshots
    Security
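
The three distribution strategies listed under Table Design can be compared with a small simulation. This is only a model: it treats each slice as an independent bucket, uses round-robin for EVEN, and uses CRC32 as a stand-in hash for KEY, none of which is claimed to match Amazon Redshift's internals. ALL is also simplified to one copy per slice, whereas the notes above describe one copy per node.

```python
from zlib import crc32


def distribute(rows, slices, style, key=None):
    """Return {slice_index: [rows]} for the EVEN, KEY, or ALL strategy."""
    placement = {i: [] for i in range(slices)}
    for n, row in enumerate(rows):
        if style == "EVEN":
            # Round-robin, regardless of the row's contents.
            placement[n % slices].append(row)
        elif style == "KEY":
            # Rows with the same key value always hash to the same slice.
            placement[crc32(str(row[key]).encode("utf-8")) % slices].append(row)
        elif style == "ALL":
            # Every slice receives a full copy of the table.
            for i in range(slices):
                placement[i].append(row)
        else:
            raise ValueError(f"unknown distribution style: {style!r}")
    return placement


rows = [{"id": i, "region": r} for i, r in enumerate(["eu", "us", "eu", "ap", "us", "eu"])]
by_key = distribute(rows, slices=3, style="KEY", key="region")
```

With KEY distribution on `region`, all `eu` rows sit on one slice, so a join on `region` needs no data movement between slices. A skewed key such as this one can also leave slices unevenly loaded, which is the trade-off against EVEN.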

### Amazon DynamoDB
    To help maintain consistent, fast performance levels, all table data is stored on high-performance 
        SSD disk drives. 
    Performance metrics, including transaction rates, can be monitored using Amazon CloudWatch.
    In addition to providing high performance levels, Amazon DynamoDB also provides automatic high availability 
        and durability protections by replicating data across multiple Availability Zones within an AWS Region.
    DynamoDB provides a web service API that accepts requests in JSON format.

#### Data Types
    Scalar Data Types 
        A scalar type represents exactly one value. 
        Amazon DynamoDB supports the following five scalar types:
            String: Text and variable length characters up to 400KB. Supports Unicode with UTF-8 encoding.
            Number: Positive or negative number with up to 38 digits of precision.
            Binary: Binary data, images, compressed objects up to 400KB in size.
            Boolean: Binary flag representing a true or false value.
            Null: Represents a blank, empty, or unknown state. String, Number, Binary, Boolean cannot be empty.

    Set Data Types 
        Sets are useful to represent a unique list of one or more scalar values. 
        Each value in a set needs to be unique and must be the same data type. 
        Sets do not guarantee order. 
        Amazon DynamoDB supports three set types: String Set, Number Set, and Binary Set.
            String Set: Unique list of String attributes
            Number Set: Unique list of Number attributes
            Binary Set: Unique list of Binary attributes

    Document Data Types 
        Document type is useful to represent multiple nested attributes, similar to the structure of a JSON file
        Amazon DynamoDB supports two document types: List and Map. 
        Multiple Lists and Maps can be combined and nested to create complex structures.
            List: Each List can be used to store an ordered list of attributes of different data types.
            Map: Each Map can be used to store an unordered list of key/value pairs. 
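
In the low-level JSON API, each attribute value is wrapped in a one-key object whose key names the type (`S`, `N`, `BOOL`, `NULL`, `SS`, `NS`, `L`, `M`). The item below is a sketch of that wire format; the table name `Products` and every attribute name are invented for illustration.

```python
import json

item = {
    "ProductId": {"S": "book-101"},                  # String
    "Price": {"N": "19.99"},                         # Number, sent as a string
    "InStock": {"BOOL": True},                       # Boolean
    "Subtitle": {"NULL": True},                      # Null: value is unknown or empty
    "Tags": {"SS": ["aws", "databases"]},            # String Set: unique, unordered
    "Ratings": {"NS": ["4", "5"]},                   # Number Set
    "Authors": {"L": [{"S": "A. Author"}, {"S": "B. Writer"}]},  # List: ordered
    "Dimensions": {                                  # Map: nested key/value pairs
        "M": {"HeightCm": {"N": "23"}, "Format": {"S": "paperback"}}
    },
}

# Body of a hypothetical PutItem request.
request_body = json.dumps({"TableName": "Products", "Item": item})
```

Binary values use the `B` and `BS` descriptors and are base64-encoded in JSON, so they are left out of this example. Higher-level SDK interfaces typically hide these type descriptors and accept native values instead.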

#### Primary Key
    When you create a table, you must specify the primary key of the table in addition to the table name. 
    Amazon DynamoDB supports two types of primary keys, and this configuration cannot be changed after a 
        table has been created:

        Partition Key 
            The primary key is made of one attribute, a partition (or hash) key. 
            Amazon DynamoDB builds an unordered hash index on this primary key attribute.
        Partition and Sort Key 
            The primary key is made of two attributes. 
            The first attribute is the partition key and the second one is the sort (or range) key. 
            Each item in the table is uniquely identified by the combination of its partition and sort key value
            It is possible for two items to have the same partition key value, but those two items must 
                have different sort key values.
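
Both primary key styles are expressed through the `KeySchema` and `AttributeDefinitions` parameters of `CreateTable`, where `HASH` marks the partition key and `RANGE` marks the sort key. The helper below is a minimal sketch that builds those two structures; the function name `key_definition` and the example attribute names are hypothetical.

```python
# Key attributes must be scalar String, Number, or Binary values.
KEY_ATTRIBUTE_TYPES = {"S", "N", "B"}


def key_definition(partition, sort=None):
    """Return (KeySchema, AttributeDefinitions) for a create_table call.

    `partition` and `sort` are (attribute_name, attribute_type) pairs.
    """
    parts = [(partition, "HASH")]
    if sort is not None:
        parts.append((sort, "RANGE"))
    key_schema, attribute_definitions = [], []
    for (name, attr_type), key_type in parts:
        if attr_type not in KEY_ATTRIBUTE_TYPES:
            raise ValueError(f"{name}: key attributes must be of type S, N, or B")
        key_schema.append({"AttributeName": name, "KeyType": key_type})
        attribute_definitions.append({"AttributeName": name, "AttributeType": attr_type})
    return key_schema, attribute_definitions


# Partition key only, then partition and sort key.
simple_key = key_definition(("UserId", "S"))
composite_key = key_definition(("UserId", "S"), ("OrderDate", "S"))
```

With the composite key, two orders for the same `UserId` can coexist as long as their `OrderDate` values differ.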

#### Provisioned Capacity
    When you create an Amazon DynamoDB table, you are required to provision a certain amount of read and 
        write capacity to handle your expected workloads. 
    Based on your configuration settings, DynamoDB will then provision the right amount of infrastructure
        capacity to meet your requirements with sustained, low-latency response times
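
Estimating how much capacity to provision is mostly arithmetic. The sketch below relies on the unit sizes from the DynamoDB documentation rather than from the notes above: one read capacity unit covers one strongly consistent read per second of an item up to 4 KB (or two eventually consistent reads), and one write capacity unit covers one write per second of an item up to 1 KB. The function names are made up for this example.

```python
import math


def read_capacity_units(item_size_kb, reads_per_second, strongly_consistent=True):
    """Estimate read capacity units for a steady read workload."""
    units_per_read = math.ceil(item_size_kb / 4)  # reads are billed in 4 KB steps
    total = units_per_read * reads_per_second
    # Eventually consistent reads cost half as much.
    return total if strongly_consistent else math.ceil(total / 2)


def write_capacity_units(item_size_kb, writes_per_second):
    """Estimate write capacity units for a steady write workload."""
    return math.ceil(item_size_kb) * writes_per_second  # writes are billed in 1 KB steps
```

For example, 100 strongly consistent reads per second of 3.5 KB items need 100 read capacity units, while 10 writes per second of the same items need 40 write capacity units because each write rounds up to 4 KB.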

#### Secondary Indexes
    When you create a table with a partition and sort key (formerly known as a hash and range key), you can
        optionally define one or more secondary indexes on that table. 
    A secondary index lets you query the data in the table using an alternate key, in addition to queries
        against the primary key. 
            
    Amazon DynamoDB supports two different kinds of indexes:
        Global Secondary Index 
            The global secondary index is an index with a partition and sort key that can be different 
                from those on the table. 
            You can create or delete a global secondary index on a table at any time.
        Local Secondary Index 
            The local secondary index is an index that has the same partition key attribute as the primary key 
                of the table, but a different sort key. 
            You can only create a local secondary index when you create a table.
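
The difference between the two index kinds shows up in their key schemas. The definitions below follow the shape of the `LocalSecondaryIndexes` and `GlobalSecondaryIndexes` parameters of `CreateTable`; the table key, index names, and attribute names are invented, and `is_valid_local_index` is a hypothetical check written for this sketch.

```python
table_key = [
    {"AttributeName": "CustomerId", "KeyType": "HASH"},
    {"AttributeName": "OrderDate", "KeyType": "RANGE"},
]

# Local secondary index: same partition key as the table, different sort key.
local_index = {
    "IndexName": "ByStatus",
    "KeySchema": [
        {"AttributeName": "CustomerId", "KeyType": "HASH"},
        {"AttributeName": "Status", "KeyType": "RANGE"},
    ],
    "Projection": {"ProjectionType": "ALL"},
}

# Global secondary index: partition and sort key may both differ from the table's.
global_index = {
    "IndexName": "ByProduct",
    "KeySchema": [
        {"AttributeName": "ProductId", "KeyType": "HASH"},
        {"AttributeName": "OrderDate", "KeyType": "RANGE"},
    ],
    "Projection": {"ProjectionType": "KEYS_ONLY"},
    "ProvisionedThroughput": {"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
}


def key_names(key_schema):
    """Map each KeyType ('HASH' or 'RANGE') to its attribute name."""
    return {k["KeyType"]: k["AttributeName"] for k in key_schema}


def is_valid_local_index(index, table_key_schema):
    """True if the index keeps the table's partition key and changes the sort key."""
    idx, tbl = key_names(index["KeySchema"]), key_names(table_key_schema)
    return idx["HASH"] == tbl["HASH"] and idx.get("RANGE") not in (None, tbl.get("RANGE"))
```

Here `local_index` passes the check, while `global_index` would not qualify as a local index because its partition key is `ProductId`.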

#### Writing and Reading Data
    Writing Items
        Amazon DynamoDB provides three primary API actions to create, update, and delete items: 
            PutItem, UpdateItem, and DeleteItem
    Reading Items
        After an item has been created, it can be retrieved through a direct lookup by calling the 
            GetItem action or through a search using the Query or Scan action
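
The behavior of these actions can be mimicked with a toy in-memory table. This sketch is not the DynamoDB API: the class `ToyTable`, the fixed key attribute names `pk` and `sk`, and the prefix-only sort key condition are all simplifications chosen for illustration.

```python
class ToyTable:
    """In-memory stand-in for a table with a partition key 'pk' and sort key 'sk'."""

    def __init__(self):
        self._items = {}  # (pk, sk) -> item

    def put_item(self, item):
        """Create the item, replacing any existing item with the same key."""
        self._items[(item["pk"], item["sk"])] = dict(item)

    def update_item(self, pk, sk, **changes):
        """Change selected attributes, creating the item if it does not exist."""
        self._items.setdefault((pk, sk), {"pk": pk, "sk": sk}).update(changes)

    def delete_item(self, pk, sk):
        """Remove the item; deleting a missing item is not an error."""
        self._items.pop((pk, sk), None)

    def get_item(self, pk, sk):
        """Direct lookup by the full primary key."""
        return self._items.get((pk, sk))

    def query(self, pk, sk_prefix=""):
        """Items in one partition, optionally narrowed by sort key prefix, in sort key order."""
        matches = [i for (p, s), i in self._items.items() if p == pk and s.startswith(sk_prefix)]
        return sorted(matches, key=lambda i: i["sk"])

    def scan(self, predicate=lambda item: True):
        """Examine every item in the table and keep those that match."""
        return [i for i in self._items.values() if predicate(i)]


table = ToyTable()
table.put_item({"pk": "user#1", "sk": "order#2024-02", "total": 30})
table.put_item({"pk": "user#1", "sk": "order#2024-01", "total": 10})
table.put_item({"pk": "user#2", "sk": "order#2024-01", "total": 5})
```

`table.query("user#1")` touches only one partition, whereas `table.scan(...)` walks the whole table, which is why a scan tends to cost more on large tables.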