# Database Administration

The DBA role:

* primarily concerned with the "maintenance"/"ops" phase
* but should be consulted during all phases of development
* "Database Administrator" or "DBA" is often framed as a "job" or a "person"
* large companies have many DBAs
* in small companies, the developer is often the DBA
* the DBA role can be made redundant by cloud-based DBMSs or "database as a service" (DBaaS)

Data Administrator (management role):

* data policies, procedures and standards
* planning
* data conflict resolution
* managing the information repository (data dictionary)
* internal marketing (using data to persuade leaders to take certain actions)
* similar to "Chief Data Officer"

Database Administrator (technical role):

* analyze and design the database
* select DBMS/tools/vendor
* install and upgrade DBMS
* tune DBMS performance
* manage security, privacy, integrity
* backup and recovery

## Architecture of a Database Management System

A DBMS exists as one entity in two places:
* In memory (when the database is running)
* Physically on disk

In both places it must:
* Manage data
* Manage performance (how it performs as it is used and grows)
* Manage concurrency (handle high volumes of users)
* Manage recoverability (assist in recovery and availability)

One place is persistent, the other transient:
* The disk representation is always present
* The memory representation is transient and only exists while the DBMS is running

<img src="img/img66.png" width="400">

### Query Processing

Parsing (compiling) checks:
* Syntax is correct & the statement can "compile"
* DBMS user permissions
* Resources are available (data, code, the ability to record changes/retrieve results)

Optimizing:
* Execution Plan & Execution Cost
* Evaluate indexes, table scans, hashing
* Eliminate worst, consider best options
* Lowest cost is theoretically "best"
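To make the optimizer's choice concrete, here is a small sketch using Python's standard-library `sqlite3` module rather than a server DBMS such as MySQL. The `employee` table, its columns and the index name are made up for illustration. `EXPLAIN QUERY PLAN` reports whether SQLite intends to scan the whole table or search an index. The exact wording of the plan text differs between SQLite versions.

```python
import sqlite3

def plan(conn, sql, params=()):
    """Return the human-readable 'detail' column of SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (employeeid INTEGER PRIMARY KEY, "
             "firstname TEXT, lastname TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employee (firstname, lastname, salary) VALUES (?, ?, ?)",
    [(f"first{i}", f"last{i}", 1000 + i) for i in range(1000)],
)

query = "SELECT * FROM employee WHERE lastname = ?"
before = plan(conn, query, ("last42",))   # no index yet: a full table scan
conn.execute("CREATE INDEX idx_employee_lastname ON employee (lastname)")
after = plan(conn, query, ("last42",))    # the optimizer can now search the index

print(before)  # plan text contains "SCAN"
print(after)   # plan text mentions idx_employee_lastname
```

Once the index exists, the optimizer estimates that searching it is cheaper than reading every row. This is the "lowest cost is best" rule in practice.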

Execution:
* Meet the ACID test
* Atomic: All rows succeed or all fail
* Ensure resources are available (data, log changes, memory, a cursor to do the work for the USER)
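Atomicity can be sketched the same way with Python's `sqlite3` module. The hypothetical `account` table uses a `CHECK` constraint so that the second `UPDATE` of a transfer fails. Because both statements run inside one transaction, the first `UPDATE` is rolled back as well. The example relies on the `sqlite3` connection context manager, which commits on success and rolls back when an exception escapes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, "
             "balance INTEGER NOT NULL CHECK (balance >= 0))")
conn.executemany("INSERT INTO account (id, balance) VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move money between accounts atomically: both UPDATEs commit or neither does."""
    try:
        with conn:  # commit on success, roll back if an exception escapes
            conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False  # CHECK failed; the credit to dst was rolled back too

ok = transfer(conn, 2, 1, 80)  # would drive account 2 negative
balances = dict(conn.execute("SELECT id, balance FROM account"))
print(ok, balances)  # False {1: 100, 2: 50}
```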

### Concurrency Control

* Coordinates the concurrent work of the DBMS
* The Transaction Manager handles all aspects of each SQL transaction: which DBMS user wants WHAT resource
* The Lock Manager keeps a list of which resources are locked, by which user and at what level
* Locks cover not only tables and indexes but also buffers, cursors and the memory addresses of resources
* Essential for managing a large, scalable DBMS
* Enables hundreds of thousands of concurrent users
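At its core, a lock manager is a table of who holds which lock on which resource. The toy `LockManager` class below is a hypothetical sketch with only shared (`S`, read) and exclusive (`X`, write) modes. A real DBMS adds wait queues, deadlock detection, lock escalation and many more lock levels.

```python
from collections import defaultdict

class LockManager:
    """Hypothetical lock table: resource -> {user: mode}, modes 'S' (shared) and 'X' (exclusive)."""

    def __init__(self):
        self.table = defaultdict(dict)

    def acquire(self, user, resource, mode):
        """Grant the lock and return True if it is compatible with other users' locks."""
        others = [m for u, m in self.table[resource].items() if u != user]
        # X conflicts with any other user's lock; S only conflicts with another user's X
        compatible = not others if mode == "X" else all(m == "S" for m in others)
        if compatible:
            current = self.table[resource].get(user)
            # Never downgrade an exclusive lock the user already holds
            self.table[resource][user] = "X" if "X" in (mode, current) else "S"
        return compatible

    def release_all(self, user):
        """Release every lock held by user, as happens at COMMIT or ROLLBACK."""
        for holders in self.table.values():
            holders.pop(user, None)

lm = LockManager()
print(lm.acquire("alice", "employee", "S"))  # True
print(lm.acquire("bob", "employee", "S"))    # True: shared locks are compatible
print(lm.acquire("carol", "employee", "X"))  # False: carol must wait for the readers
lm.release_all("alice")
lm.release_all("bob")
print(lm.acquire("carol", "employee", "X"))  # True
```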

### Storage

File & Access Methods:
* Disk to memory to disk
* read a buffer or a block of buffers

Buffer pool:
* data in memory (row data, index data)
* Organized

Disk Space Management:
* organizes the growth of data on disk so that data is written efficiently

<img src="img/img67.png" width="500">

Data is written to disk at the buffer level.

Buffer pool:

* Many object types(tables, indexes, undo)
* Each buffer contains rows, B-tree leaf nodes, etc.

Each buffer can have one of four status types:

* Current (in use; the current committed version of the data)
* Active (the most recent change, which may not be committed)
* Stale (an old version of the data)
* Aged (old and about to be removed from the buffer pool)
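The buffer statuses above are DBMS-specific. The general idea of keeping hot blocks in memory and evicting aged ones can be sketched with a least-recently-used (LRU) policy. LRU is an assumption here, chosen because it is a common replacement strategy, and is not necessarily what any particular DBMS uses. The `read_block` callback is a hypothetical stand-in for a real disk read.

```python
from collections import OrderedDict

class BufferPool:
    """Hypothetical fixed-size buffer pool with LRU replacement.
    The least recently used end of the OrderedDict holds the 'aged' buffers."""

    def __init__(self, capacity, read_block):
        self.capacity = capacity
        self.read_block = read_block  # function that fetches a block from "disk"
        self.buffers = OrderedDict()  # block_id -> block contents
        self.disk_reads = 0

    def get(self, block_id):
        if block_id in self.buffers:
            self.buffers.move_to_end(block_id)  # cache hit: mark as most recently used
            return self.buffers[block_id]
        self.disk_reads += 1                    # cache miss: an expensive read
        data = self.read_block(block_id)
        self.buffers[block_id] = data
        if len(self.buffers) > self.capacity:
            self.buffers.popitem(last=False)    # evict the least recently used buffer
        return data

pool = BufferPool(capacity=2, read_block=lambda block_id: f"block-{block_id}")
for block_id in [1, 2, 1, 3, 1, 2]:
    pool.get(block_id)
print(pool.disk_reads)      # 4 disk reads for 6 requests
print(list(pool.buffers))   # [1, 2]
```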

### Log Manager

Used for recovery.

The Log Manager records all changes:
* Statement
* Transaction
    * Statement
    * Rollback values
    * Before and After values
    * Begin timestamp
* Database
    * Data Dictionary Changes
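The before and after values are what make undo and redo possible. The sketch below is a hypothetical, heavily simplified log. Each change is appended to the log before the data is modified. Rollback applies before values in reverse order, and crash recovery replays only the after values of committed transactions. It omits timestamps, statements and data dictionary changes.

```python
from dataclasses import dataclass

@dataclass
class LogRecord:
    txn: int        # transaction that made the change
    key: str        # which "row" changed
    before: object  # value before the change, used to undo (None = row did not exist)
    after: object   # value after the change, used to redo

class LoggedStore:
    """Hypothetical store that writes each log record *before* changing the data."""

    def __init__(self):
        self.data = {}
        self.log = []

    def write(self, txn, key, value):
        self.log.append(LogRecord(txn, key, self.data.get(key), value))
        self.data[key] = value

    def rollback(self, txn):
        """Undo one transaction's changes, newest first, using the before values."""
        for rec in reversed([r for r in self.log if r.txn == txn]):
            if rec.before is None:
                self.data.pop(rec.key, None)
            else:
                self.data[rec.key] = rec.before

def redo(log, committed):
    """Crash recovery: rebuild the data by replaying after values of committed transactions."""
    data = {}
    for rec in log:
        if rec.txn in committed:
            data[rec.key] = rec.after
    return data

store = LoggedStore()
store.write(1, "x", 10)
store.write(1, "y", 20)   # transaction 1 commits
store.write(2, "x", 99)   # transaction 2 is rolled back
store.rollback(2)
print(store.data)                      # {'x': 10, 'y': 20}
print(redo(store.log, committed={1}))  # {'x': 10, 'y': 20}
```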
    
<img src="img/img68.png" width="500">

## Database Performance

What affects database performance?
* caching data in memory, e.g. data buffers
* placement of data files across disk drives
* use of fast storage such as SSDs
* database replication and server clustering
* use of indexes to speed up searches and joins
* good choice of data types (especially PKs)
* good program logic
* good query execution plans
* good code (no deadlocks)

Caching Data in Memory:
* data and code are found in memory
* this avoids a disk read
* reads are expensive
* the goal is to minimize reads (and writes), although some writes are necessary (recovery logs, changed data)
* "in-memory databases": all code and all data are loaded into memory when the database starts and stay there until shutdown

Data file location & fast disks (SSD):
* spread the files across the physical server
* writes can't be avoided, so:
    * spread files across many disks to avoid contention (many users competing for the same resource)
    * recovery logs are always being written, so put them on faster disks
* SSDs (solid-state drives):
    * no moving parts, so nothing mechanical to break down
    * faster I/O
    
Distribution & Replication

Distributed data:
* spreads the load
* data kept only where it is needed
* less work per physical server - faster response times

Replicated data:
* spreads load
* less work per physical server - faster response times

When to create indexes:
* a column is queried frequently (used in WHERE clauses)
* columns that are used for joins (PK to FK)
* primary keys (automatic in most DBMSs)
* foreign keys (automatic in MySQL)
* unique columns (automatic in most DBMSs)
* large tables only; small tables do not require indexes
* wide range of values (good for regular indexes)
* small range of values (good for bitmap/hash indexes)
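The "automatic" behaviour differs between products. As an illustration using SQLite through Python's `sqlite3` module (not MySQL), the sketch below lists the indexes that exist after creating two made-up tables. SQLite indexes a `UNIQUE` column and a non-integer primary key automatically, and an `INTEGER PRIMARY KEY` is the table's rowid, so it needs no separate index. Unlike MySQL's InnoDB, SQLite does not index foreign keys, so that index has to be created by hand.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE department (
    departmentid TEXT PRIMARY KEY,  -- non-INTEGER PK: SQLite builds an automatic index
    name         TEXT UNIQUE        -- UNIQUE constraint: automatic index
);
CREATE TABLE employee (
    employeeid   INTEGER PRIMARY KEY,                      -- the rowid itself, no extra index
    departmentid TEXT REFERENCES department (departmentid) -- FK: NOT indexed automatically
);
""")

def index_names(conn, table):
    """PRAGMA index_list returns one row per index; column 1 is the index name."""
    return [row[1] for row in conn.execute(f"PRAGMA index_list({table})")]

print(index_names(conn, "department"))  # two sqlite_autoindex_department_* entries
print(index_names(conn, "employee"))    # []
conn.execute("CREATE INDEX idx_employee_departmentid ON employee (departmentid)")
print(index_names(conn, "employee"))    # ['idx_employee_departmentid']
```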

Good choices:
* good data types (integers for PKs, FKs and PFKs)
* good program logic & code
    * Transaction design (BEGIN TRANSACTION, SELECT, UPDATE, COMMIT)
    * Avoid long, complex transactions that never commit or set a savepoint
    * Avoid coding deadlocks
    * Appropriate locking strategy; consider lock timeouts
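A common way to avoid coding deadlocks is to make every transaction acquire its locks in the same global order. Combining that with a lock timeout means a transaction gives up instead of waiting forever. The sketch below models row locks with Python `threading.Lock` objects. The row names and the `update_rows` function are hypothetical, and in a real DBMS the locks are taken by the database itself.

```python
import threading

# Hypothetical rows, each protected by its own lock
row_locks = {"A": threading.Lock(), "B": threading.Lock()}

def update_rows(keys, timeout=1.0):
    """Lock several rows without deadlocking: always acquire in sorted order,
    and give up after `timeout` seconds instead of waiting forever."""
    acquired = []
    try:
        for key in sorted(keys):  # same global order for every transaction
            if not row_locks[key].acquire(timeout=timeout):
                return False      # timed out: the caller should roll back and retry
            acquired.append(key)
        # ... the UPDATEs would happen here while all locks are held ...
        return True
    finally:
        for key in reversed(acquired):
            row_locks[key].release()

results = []
threads = [threading.Thread(target=lambda keys=keys: results.append(update_rows(keys)))
           for keys in (["A", "B"], ["B", "A"])]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)  # [True, True]: opposite request orders, but no deadlock
```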
    
## Security

Threats to databases
* Loss of integrity
    * keep data consistent
    * free of errors or anomalies
* Loss of availability
    * must be available to authorized users for authorized purposes
* Loss of confidentiality
    * must be protected against unauthorized access

To protect databases against these types of threats, different kinds of countermeasures can be implemented:
* access control
* encryption
    
### Access Control

The security mechanism of a DBMS must include provisions for restricting access to data.

Access control is handled by the DBA, who creates user accounts for those with a legitimate need to access the database.

The database keeps track of all operations performed on it by all users (the usage or audit log).

When tampering is suspected, perform an audit. A database audit consists of reviewing the log to examine all accesses and operations applied to the database during a certain time period.

Both online and physical access to the database need to be controlled.

Access control is based on granting and revoking privileges:
```sql
GRANT SELECT ON employee TO HR;
REVOKE UPDATE ON bonus FROM SCOTT;
```

Types of discretionary privileges:
* account level: the DBA specifies the particular privileges that each user holds regarding the database as a whole, i.e. the operations they can carry out on the database
* table level: the DBA controls a user's privilege to access particular tables or views
* schema level: the DBA controls a user's privilege to access a particular schema in the database
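Conceptually, table-level discretionary privileges form a set of (user, privilege, object) entries. `GRANT` adds to that set, `REVOKE` removes from it, and the DBMS checks it before each statement runs. The `PrivilegeTable` class below is a hypothetical sketch of that idea that mirrors the `GRANT`/`REVOKE` statements above. It ignores roles, `WITH GRANT OPTION` and account- or schema-level privileges.

```python
class PrivilegeTable:
    """Hypothetical model of table-level discretionary access control."""

    def __init__(self):
        self.grants = set()  # {(user, PRIVILEGE, object)}

    def grant(self, privilege, obj, user):
        self.grants.add((user, privilege.upper(), obj))

    def revoke(self, privilege, obj, user):
        self.grants.discard((user, privilege.upper(), obj))

    def check(self, user, privilege, obj):
        """Checked before a statement runs: may this user do this to this object?"""
        return (user, privilege.upper(), obj) in self.grants

privs = PrivilegeTable()
privs.grant("SELECT", "employee", "HR")     # GRANT SELECT ON employee TO HR;
privs.grant("UPDATE", "bonus", "SCOTT")
privs.revoke("UPDATE", "bonus", "SCOTT")    # REVOKE UPDATE ON bonus FROM SCOTT;
print(privs.check("HR", "SELECT", "employee"))  # True
print(privs.check("SCOTT", "UPDATE", "bonus"))  # False
```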

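Access can also be restricted using views, which expose only selected columns of a table:
```sql
-- the view exposes only these columns, not the whole employee table
CREATE VIEW vEmployee AS
SELECT employeeid, firstname, lastname, departmentid, bossid
FROM employee;

GRANT SELECT ON vEmployee TO SCOTT;
```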

### Encryption

Particular tables or columns may be encrypted to:
* protect sensitive data (e.g. passwords) while it is transmitted over a network, preventing interception by a third party
* protect data stored in the database (e.g. credit card numbers), providing some protection in case of unauthorized access

Data is encoded using an algorithm; authorized users are given keys to decipher it.

### Web security

Injection: Injection flaws, such as SQL, NoSQL, OS and LDAP injection, occur when untrusted data is sent to an interpreter as part of a command or query. The attacker's hostile data can trick the interpreter into executing unintended commands or accessing data without proper authorization.

Broken Authentication: Application functions related to authentication and session management are often implemented incorrectly, allowing attackers to compromise passwords, keys, or session tokens, or to exploit other implementation flaws to assume other users' identities temporarily or permanently.

How to prevent injection:
* sanitize user inputs
* pass inputs as parameters to a stored procedure, rather than directly building the SQL string in the code
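Here is the difference between building the SQL string and passing the input as a parameter, shown with Python's `sqlite3` module. The choice of driver is an assumption; the same principle applies to stored procedures and to other drivers. The hypothetical `users` table holds two rows, and the attacker-supplied value is the classic `' OR '1'='1` payload.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT)")
conn.executemany("INSERT INTO users VALUES (?)", [("alice",), ("bob",)])

malicious = "' OR '1'='1"

# UNSAFE: the input is pasted into the SQL text, so it can change the query itself
unsafe_sql = f"SELECT username FROM users WHERE username = '{malicious}'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# SAFE: the input is bound as a parameter and is only ever treated as a value
safe_rows = conn.execute("SELECT username FROM users WHERE username = ?",
                         (malicious,)).fetchall()

print(unsafe_rows)  # every user is returned
print(safe_rows)    # []
```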

<img src="img/img69.png" width="500">

<img src="img/img70.png" width="500">

## Backup and Recovery

A backup is a copy of your data.

If data becomes corrupted, is deleted or is held to ransom, it can be restored from the backup copy.

A backup and recovery strategy is needed to plan how data is backed up and how it will be recovered.

Backups protect data against human error, hardware or software malfunction, malicious activity and natural or man-made disasters, and help meet government regulations.

Failures can be divided into the following categories:
* Statement failure (a statement is syntactically incorrect)
* User process failure (the process doing the work fails)
* Network failure (the network between the user and the database fails)
* User error (a user accidentally drops rows, tables or a database)
* Memory failure (memory fails or becomes corrupt)
* Media failure (disk failure, corruption, deletion)

Types of backups:
* Physical vs Logical
* Online vs Offline
* Full vs Incremental
* Onsite vs Offsite

### Physical vs Logical backup

Physical backup:
* raw copies of files and directories
* suitable for large databases that need fast recovery
* the database is preferably offline ("cold" backup) while the backup runs
    * MySQL Enterprise handles file locking automatically, so the database is not wholly offline
* the backup should include logs
* the backup is only portable to machines with a similar configuration
* to restore:
    * shut down DBMS
    * copy backup over current structure on disk
    * restart DBMS
    
Logical backup:
* backup is produced through SQL queries
* slower than physical
    * uses SQL SELECTs rather than an OS file copy
* output is larger than physical
* doesn't include log or config files
* machine independent
* server is available during the backup
* in MySQL, create the backup using:
    * mysqldump
    * SELECT ... INTO OUTFILE
* to restore:
    * Use mysqlimport, or LOAD DATA INFILE within the mysql client
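SQLite offers rough analogues of both backup styles. They are shown here through Python's `sqlite3` module as a sketch rather than a MySQL procedure. `Connection.iterdump()` produces a logical backup as SQL text, much like `mysqldump`. `Connection.backup()` uses SQLite's online backup API to copy database pages directly, which is closer to a physical backup. The `employee` table is made up for the example, and in-memory databases stand in for real files.

```python
import sqlite3

src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE employee (employeeid INTEGER PRIMARY KEY, lastname TEXT)")
src.executemany("INSERT INTO employee (lastname) VALUES (?)", [("Smith",), ("Jones",)])
src.commit()

# Logical backup: the schema and data re-expressed as SQL statements
dump_sql = "\n".join(src.iterdump())

# Restore the logical backup by replaying the SQL into an empty database
logical_copy = sqlite3.connect(":memory:")
logical_copy.executescript(dump_sql)

# Physical-style backup: copy the database pages with SQLite's online backup API
physical_copy = sqlite3.connect(":memory:")
src.backup(physical_copy)

for db in (logical_copy, physical_copy):
    print(db.execute("SELECT lastname FROM employee ORDER BY employeeid").fetchall())
```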
    
### Online vs Offline backup

Online (or HOT) backup:
* backups occur while the database is "live"
* clients don't realize a backup is in progress
* appropriate locking is needed to ensure the integrity of the data

Offline (or COLD) backup:
* backups occur while the database is stopped
* to maximize availability to users, back up from a replication server rather than the live server
* simpler to perform
* a cold backup is preferable, but not possible in all situations, e.g. applications that cannot have downtime

### Full vs Incremental backup

Full:
* a full backup is where the complete database is backed up
* it includes everything you need to get the database operational in the event of a failure

Incremental:
* only the changes since last backup are backed up
* for most databases this means backing up only the log files
* to restore:
    * stop the database and copy the backed-up log files to disk
    * start the database and tell it to redo (replay) the log files
    
### Create a backup policy

Backup strategy is usually a combination of full and incremental backups. For example, weekly full backup and weekday incremental backup.
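The restore side of such a policy can be worked out mechanically. This sketch assumes a hypothetical schedule of a full backup every Sunday and an incremental backup every other day, where each incremental holds only the changes since the previous backup. Under those assumptions, restoring to a given day needs the most recent full backup plus every incremental taken after it, applied in order.

```python
from datetime import date, timedelta

def backup_type(day):
    """Hypothetical policy: full backup on Sundays, incremental on every other day."""
    return "full" if day.weekday() == 6 else "incremental"  # Monday=0 ... Sunday=6

def restore_chain(target):
    """Backups needed to restore to `target`: walk back to the last full backup,
    then return the chain oldest first (full, then each incremental in order)."""
    chain = []
    day = target
    while True:
        chain.append((day, backup_type(day)))
        if backup_type(day) == "full":
            break
        day -= timedelta(days=1)
    return list(reversed(chain))

for day, kind in restore_chain(date(2024, 1, 10)):  # a Wednesday
    print(day, kind)  # 2024-01-07 full, then incrementals for 8, 9 and 10 January
```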

Conduct backups when the database load is low.

If you replicate the database, use the mirror database for backups to avoid any performance impact on the main database.

TEST your backup before you NEED your backup!

### Offsite backup

Enables disaster recovery, because backup is not physically near the disaster site.

Example solutions:
* backup tapes transported to an underground vault
* remote mirror database maintained via replication
* backup to Cloud

### Other ways to reduce risk of data loss

#### Replication

MySQL Master-Slave replication
* one writer and many readers
* some protection against server failure
* multiple copies of data
* but accidental data deletion is replicated too

#### Clusters

Many Writers and Many Readers:
* usually Linux/Unix only
* automatic synchronous partition
* protection against server failure
* multiple copies of data
* but accidental data deletion is replicated too

#### RAID

RAID attributes:
* software or hardware RAID
* available on Windows, Linux, Unix
* some RAID levels protect against drive failure
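RAID 5 is one such level. It stripes data across drives and stores a parity block computed with XOR, so the contents of any single failed drive can be rebuilt from the survivors. The sketch below shows only that parity arithmetic on three made-up data blocks. It does not model striping, RAID 1 mirroring or other levels.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks: the parity calculation used by RAID 5."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

data = [b"DATA", b"BASE", b"RAID"]  # one block per data drive
parity = xor_blocks(data)           # stored on the parity drive

# Drive 1 fails: XOR the surviving data blocks with the parity block to rebuild it
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt)  # b'BASE'
```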