Skip to content

SQL Server : High Availability & Disaster Recovery

Sandesh Kota edited this page Mar 14, 2018 · 18 revisions

[HA] High Availability -> minimum downtime [DR] Disaster Recovery -> Quick time to recover

SQL Server Transaction Log:

Purpose:

  • Critical to the integrity of DB
  • Critical for the mission of HA/DR
  • Critical for the operation of HA/DR

Transaction: An individual unit of work (ACID - Atomic, Consistent, Isolated, Durable)

DB Recovery:

  • Recovery is a process by which SQL Server ensures a DB is brought online in a consistent state after:
    • Crash Recovery
    • Restore from Backup

Recovery Types: [Recovery Model] in SQL Server

  • Simple Recovery
    • Log is mainly used for crash recovery
    • Transaction logs are marked inactive after checkpoint
    • Supports Full Backup & Differential Backup
    • Full Backup :: All data pages copied + Tail of log backup (transaction log data - at the time of backup)
    • Differential Backup :: Only Changed data pages are copied (after full data backup)
    • Easy to Manage
    • Loss of all transactions that have occurred after last Full/Diff backup
  • Full Recovery
    • Log supports crash recovery and log backups
    • After Full Backup, transaction logs are backed up regularly (at a certain time). The transaction logs are marked active until it is backed up
    • Allows us to restore DB to a specific point in time OR to a Specific LSN (Log Sequence Number)
    • As backups are regular, smaller data loss windows
    • Best: Full backups weekly, Differential backups nightly, Log backups every 15 minutes (backup files are moved off-site)
    • Point in Time Restores
      • BACKUP DATABASE [db] TO DISK = ['full recovery path']; GO
      • BACKUP LOG [db] TO DISK = ['log backup path']; GO
      • RESTORE DATABASE [db] FROM DISK = ['full recovery path'] WITH NORECOVERY; GO
      • RESTORE LOG [db] FROM DISK = ['full recovery path'] WITH NORECOVERY [, STOPAT='08/02/2017 23:25:01']; GO
  • Bulk-Logged Recovery
    • Offers better performance and a smaller log footprint during bulk load processes
    • Recovery to the point of each log backup is supported offering small recovery windows
    • Point in time recovery is not supported
    • Can be switched between Full & Bulk Recovery model
  • File Based Backups
    • Supports partial database restores
    • Helpful in scenarios where only a single file or filegroup was damaged or lost as it avoids the time to do a complete db restore
    • Managing file based backups and restore is a complex topic

Maintenance Plans

  • Helps to configure the Backups (scheduling)
  • DB Server > Management > Maintenance Plans

Managing Backups

  • SQL Agent / Custom TSQL Scripts
  • Ola Hallengren's Scripts *
  • 3rd Party Management Tools Ex: RedGate, Idera, Dell and others
  • Store backups off site
  • Test your backups regularly. Assume that any backup that hasn't been tested is faulty
    • Option :: Backup with Checksum > restore to DB > And run CheckDB?

Disaster Recovery from Backups:

  • Simple Recovery : can take a very long time to restore large databases
  • Full Recovery : can take even longer..unless you have a DR Server
    • Full Backup > Apply to DR Server
    • Log backup > Apply to DR Server
    • Log backup > Apply to DR Server... (goes on at regular intervals)
    • In case of Disaster, let the DR Server serve the application
    • Can be automated easily: Log Shipping

Failover Clustering - High Availability

  • Windows Failover Clustering
  • SQL Server Always ON = SQL Server on Windows Fail Over Clustering
  • Failover Clustering = OS Redundancy
    • Protection against:
      • OS System Crashes
      • Motherboard failures
      • Single Server Power Failure
      • Memory Errors
      • Blue Screens
      • Outages due to : Server Maintenance, Service Packs, Patching, Server Reboots
  • Failover will handle only the Server switch, Data storage should be handled by other means

Pre-Requisites

  • Same Architecture (32bit / 64bit)
  • Same OS Version
  • Same configuration, patch level, etc (RAM, Storage, etc..) (close to same . But Same is preferred)
  • 2+ Servers
  • SQL Server
    • 2 nodes / shared network (standard)
    • 2+ nodes / multi-subnet (enterprise)
  • Storage - both nodes can talk to
    • iSCSI SAN, Fiber SAN, Serial Attached SCSI
    • Redundant protection - No single drive failure should bring down DB instance
  • Redundant Network - no single netcard / switch failure should bring down SQL Server Instance
    • 2 NICs per node on separate networks (separate switches)
    • 2 NICs on logical network w/ NIC teaming
  • Virtual vs Physical
    • Virtual is supported but requires additional expertise : So far -> SysAdmin, ServerAdmin, SQLServerDBA

Notes

  • Database Log shipping - Database level - DB corrupt - Disaster Recovery
  • Failover Clustering - Server Level - Server failure - High Availability

*suggested

Clone this wiki locally