-
Notifications
You must be signed in to change notification settings - Fork 0
SQL Server : High Availability & Disaster Recovery
Sandesh Kota edited this page Mar 14, 2018
·
18 revisions
[HA] High Availability -> minimum downtime [DR] Disaster Recovery -> Quick time to recover
Purpose:
- Critical to the integrity of DB
- Critical for the mission of HA/DR
- Critical for the operation of HA/DR
Transaction: An individual unit of work (ACID - Atomic, Consistent, Isolated, Durable)
- Recovery is a process by which SQL Server ensures a DB is brought online in a consistent state after:
- Crash Recovery
- Restore from Backup
- Simple Recovery
- Log is mainly used for crash recovery
- Transaction logs are marked inactive after checkpoint
- Supports Full Backup & Differential Backup
- Full Backup :: All data pages copied + Tail of log backup (transaction log data - at the time of backup)
- Differential Backup :: Only Changed data pages are copied (after full data backup)
- Easy to Manage
- Loss of all transactions that have occurred after last Full/Diff backup
- Full Recovery
- Log supports crash recovery and log backups
- After Full Backup, transaction logs are backed up regularly (at a certain time). The transaction logs are marked active until it is backed up
- Allows us to restore DB to a specific point in time OR to a Specific LSN (Log Sequence Number)
- As backups are regular, smaller data loss windows
- Best: Full backups weekly, Differential backups nightly, Log backups every 15 minutes (backup files are moved off-site)
- Point in Time Restores
- BACKUP DATABASE [db] TO DISK = ['full recovery path']; GO
- BACKUP LOG [db] TO DISK = ['log backup path']; GO
- RESTORE DATABASE [db] FROM DISK = ['full recovery path'] WITH NORECOVERY; GO
- RESTORE LOG [db] FROM DISK = ['full recovery path'] WITH NORECOVERY [, STOPAT='08/02/2017 23:25:01']; GO
- Bulk-Logged Recovery
- Offers better performance and a smaller log footprint during bulk load processes
- Recovery to the point of each log backup is supported offering small recovery windows
- Point in time recovery is not supported
- Can be switched between Full & Bulk Recovery model
- File Based Backups
- Supports partial database restores
- Helpful in scenarios where only a single file or filegroup was damaged or lost as it avoids the time to do a complete db restore
- Managing file based backups and restore is a complex topic
- Helps to configure the Backups (scheduling)
- DB Server > Management > Maintenance Plans
- SQL Agent / Custom TSQL Scripts
- Ola Hallengren's Scripts *
- 3rd Party Management Tools Ex: RedGate, Idera, Dell and others
- Store backups off site
- Test your backups regularly. Assume that any backup that hasn't been tested is faulty
- Option :: Backup with Checksum > restore to DB > And run CheckDB?
- Simple Recovery : can take a very long time to restore large databases
- Full Recovery : can take even longer..unless you have a DR Server
- Full Backup > Apply to DR Server
- Log backup > Apply to DR Server
- Log backup > Apply to DR Server... (goes on at regular intervals)
- In case of Disaster, let the DR Server serve the application
- Can be automated easily: Log Shipping
- Windows Failover Clustering
- SQL Server Always ON = SQL Server on Windows Fail Over Clustering
- Failover Clustering = OS Redundancy
- Protection against:
- OS System Crashes
- Motherboard failures
- Single Server Power Failure
- Memory Errors
- Blue Screens
- Outages due to : Server Maintenance, Service Packs, Patching, Server Reboots
- Protection against:
- Failover will handle only the Server switch, Data storage should be handled by other means
- Same Architecture (32bit / 64bit)
- Same OS Version
- Same configuration, patch level, etc (RAM, Storage, etc..) (close to same . But Same is preferred)
- 2+ Servers
- SQL Server
- 2 nodes / shared network (standard)
- 2+ nodes / multi-subnet (enterprise)
- Storage - both nodes can talk to
- iSCSI SAN, Fiber SAN, Serial Attached SCSI
- Redundant protection - No single drive failure should bring down DB instance
- Redundant Network - no single netcard / switch failure should bring down SQL Server Instance
- 2 NICs per node on separate networks (separate switches)
- 2 NICs on logical network w/ NIC teaming
- Virtual vs Physical
- Virtual is supported but requires additional expertise : So far -> SysAdmin, ServerAdmin, SQLServerDBA
- Database Log shipping - Database level - DB corrupt - Disaster Recovery
- Failover Clustering - Server Level - Server failure - High Availability
*suggested