![CH5-ADS.png](.\Media\CH5-ADS.png)

# <span style="color:#cc5500;">Torn Page Protection</span>

SQL Server has mechanisms to automatically detect and alert you when it finds corruption during I/O operations.  In this notebook, we’ll cover:

1. Torn-page detection
2. Page checksums
3. I/O errors
4. Monitoring for I/O errors

This notebook describes what a torn page is and how it can occur, followed by a step-by-step demo in which we create a test database, corrupt a page in it, and then show how the corruption is detected.

## <span style="color:#cc5500;">Page Protection Options</span>

SQL Server allows pages to be “protected” on disk.  This allows fast detection of corruption when a page is read into memory.  The feature is set with the ALTER DATABASE … SET PAGE\_VERIFY \<option\> statement.  There are three configurable options:

- NONE (don’t do this…)
- TORN\_PAGE\_DETECTION
- CHECKSUM

All page protection operations are performed by the buffer pool (also known as the buffer manager or buffer cache).
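
As a minimal sketch of the statement described above, the following sets the option and then confirms it. The database name `[Company]` is an assumption borrowed from the demo later in this notebook; substitute your own.

```sql
-- Assumption: a database named [Company] exists (the demo database created later in this notebook)
ALTER DATABASE [Company] SET PAGE_VERIFY CHECKSUM;
GO

-- Confirm the setting: 0 = NONE, 1 = TORN_PAGE_DETECTION, 2 = CHECKSUM
SELECT
	[name],
	[page_verify_option],
	[page_verify_option_desc]
FROM
	sys.databases
WHERE
	[name] = N'Company';
GO
```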

## <span style="color:#cc5500;">Torn Page Detection</span>

An 8 KB data file page is really 16 x 512-byte disk sectors.  A page can be partially written if power fails during a write operation: if the disk isn't backed by a battery, some of the 16 sectors may never reach the disk.  Torn-page detection allows SQL Server to detect these incomplete writes.  It takes two bits from each disk sector, stores them in the page header (a 96-byte metadata structure at the start of each data file page), and then writes an alternating bit pattern in each sector: 01 in the first, 10 in the second, 01 in the third, and so on.  The bit pattern flips each time the page is written to disk.  On a subsequent read, if the pattern is disrupted, the page is torn.  Torn-page detection does not, however, detect corruption within a disk sector.

<span style="display:block;text-align:center;"><img src="C:\Users\kewarren\Documents\Kevin\MyDocuments\ARMA 4200\ADS\Operations and Support\images\tornpage.jpg" alt="title"></span>

## <span style="color:#cc5500;">Page Checksums</span>

SQL Server 2005 introduced per-page checksums.  The setting is turned on by default for new databases from SQL Server 2005 onward, and it was added to tempdb from SQL Server 2008 onward.  The checksum is handled by the SQL Server buffer pool and is:

- Calculated as a four-byte value and stored in the page header as the very last thing SQL Server does on a physical write
- Recalculated and checked against the stored value as the very first thing SQL Server does on a physical read

If you upgraded an older database from SQL Server 2000, page checksums must be enabled manually.  Switching the option on doesn’t do anything until pages are written, and there is no easy method to force all pages to get a page checksum.  Switching it on does not erase existing torn-page detection.  There is negligible CPU overhead because it uses a very simple checksum algorithm.
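
To find databases that still need the option enabled manually, a query along these lines can help. It is a sketch that relies only on the documented `sys.databases` catalog view.

```sql
-- List databases that are not protected by page checksums
SELECT
	[name],
	[page_verify_option_desc]
FROM
	sys.databases
WHERE
	[page_verify_option_desc] <> N'CHECKSUM'
ORDER BY
	[name];
GO
```

Any database returned with NONE or TORN\_PAGE\_DETECTION is a candidate for ALTER DATABASE … SET PAGE\_VERIFY CHECKSUM.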

Page checksums are error detecting, not error correcting.

SQL Server evaluates the checksum when:

- Page is read normally
- Page is read during consistency checks
- Page is read during BACKUP … WITH CHECKSUM
- Page is read from within a checksum'd database backup
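
The last three cases can be exercised on demand. The sketch below assumes the `[Company]` demo database and a hypothetical backup folder `C:\Backups` that the SQL Server service account can write to; adjust both for your environment.

```sql
-- Assumption: C:\Backups is a hypothetical path; change it to a folder that exists on your server
BACKUP DATABASE [Company]
TO DISK = N'C:\Backups\Company.bak'
WITH CHECKSUM, INIT;
GO

-- Re-read the backup and re-verify the page checksums it contains, without restoring it
RESTORE VERIFYONLY
FROM DISK = N'C:\Backups\Company.bak'
WITH CHECKSUM;
GO

-- Consistency check: reads every allocated page and verifies its checksum along the way
DBCC CHECKDB (N'Company') WITH NO_INFOMSGS, ALL_ERRORMSGS;
GO
```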

## <span style="color:#cc5500;">I/O Errors</span>

There are generally three types of errors that SQL Server will throw when it detects corruption.  Again, page checksums are error detecting, not error correcting.

- 823: a hard I/O error -- SQL Server asks Windows to read something from disk and Windows comes back and says "no"
- 824: a soft I/O error -- SQL Server asks Windows to read something from disk and Windows gives the data back, but SQL Server detects that there is a problem with it
- 825: a read-retry error -- any I/O error detected by the buffer pool causes the read to be retried, up to four more times. If every retry fails, SQL Server throws an 823 or 824
    - If one of the retries succeeds, SQL Server writes an 825 to the error log
    - An 825 is only a Severity 10 informational message, so unless you are looking through the error log you will probably not see it.  We recommend creating a SQL Agent alert for 825 errors.
    - Treat 825s as an early warning of impending doom.  The I/O subsystem is returning incorrect data to SQL Server and this should be investigated as soon as possible

823 and 824 are Severity 24 errors.  When one is encountered:

- The connection that hit the error will be broken
- The damaged page will be logged in the msdb.dbo.suspect\_pages table
- The error will be written to the SQL Server error log and the Windows Application event log
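
The suspect\_pages table stores a numeric event type for each page. As a sketch, the query below decodes those values using the meanings given in the documentation for the table.

```sql
-- Decode the event_type column of msdb.dbo.suspect_pages
SELECT
	DB_NAME([database_id]) AS [database_name],
	[file_id],
	[page_id],
	CASE [event_type]
		WHEN 1 THEN N'823 or 824 error (not a bad checksum or torn page)'
		WHEN 2 THEN N'Bad checksum'
		WHEN 3 THEN N'Torn page'
		WHEN 4 THEN N'Restored'
		WHEN 5 THEN N'Repaired by DBCC'
		WHEN 7 THEN N'Deallocated by DBCC'
	END AS [event_description],
	[error_count],
	[last_update_date]
FROM
	[msdb].[dbo].[suspect_pages]
ORDER BY
	[last_update_date] DESC;
GO
```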

[MSSQLSERVER error 823 - SQL Server | Microsoft Docs](https://docs.microsoft.com/en-us/sql/relational-databases/errors-events/mssqlserver-823-database-engine-error?view=sql-server-ver15)

[MSSQLSERVER\_824 - SQL Server | Microsoft Docs](https://docs.microsoft.com/en-us/sql/relational-databases/errors-events/mssqlserver-824-database-engine-error?view=sql-server-ver15)

[MSSQLSERVER\_825 - SQL Server | Microsoft Docs](https://docs.microsoft.com/en-us/sql/relational-databases/errors-events/mssqlserver-825-database-engine-error?view=sql-server-ver15)

## <span style="color:#cc5500;">Automatic Page Repair</span>

Database mirroring and Always On Availability Groups allow some corruptions to be repaired automatically, where possible.

- It works for 824 “soft-I/O” errors, some 823 “hard-I/O” errors, 829 “in restore” errors
- Errors are hit by queries reading the page or by consistency checks
- Corrupt pages on the principal/mirror and primary/secondary replicas can be repaired
- Whichever copy of the database has the corrupt page, the primary or the secondary, asks its partner for a good copy of that page
- Repairs are asynchronous, and corrupt pages are unusable until fixed
- Subsequent reads of the page will return an 829 “in restore” error until the page is repaired, but will not trigger another repair attempt

NOTE: Automatic page repair is only meant to be a band-aid to prevent downtime. It is NOT a substitute for having alerts for high-severity errors and taking action to rectify/prevent them.

[Automatic page repair for availability groups & database mirroring - SQL Server Always On | Microsoft Docs](https://docs.microsoft.com/en-us/sql/sql-server/failover-clusters/automatic-page-repair-availability-groups-database-mirroring?view=sql-server-ver15)
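
Repair attempts are recorded in two dynamic management views, one for each technology. The queries below are a sketch for reviewing them; the numeric `error_type` and `page_status` codes are returned as-is, and their meanings are listed in the documentation for each view.

```sql
-- Automatic page repair attempts for availability group databases
SELECT
	DB_NAME([database_id]) AS [database_name],
	[file_id],
	[page_id],
	[error_type],
	[page_status],
	[modification_time]
FROM
	sys.dm_hadr_auto_page_repair;
GO

-- Automatic page repair attempts for mirrored databases
SELECT
	DB_NAME([database_id]) AS [database_name],
	[file_id],
	[page_id],
	[error_type],
	[page_status],
	[modification_time]
FROM
	sys.dm_db_mirroring_auto_page_repair;
GO
```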

## <span style="color:#cc5500;">Monitoring for I/O Errors</span>

Manual monitoring is time-consuming, prone to being forgotten, and very difficult to do in large SQL Server enterprises.  We recommend that you use an automated monitoring process.

- Create SQL Agent alerts
- Use Microsoft SCOM
- Use a 3rd-party monitoring tool

Create alerts for:

- Severity 19 errors and above (severity levels 19 through 25)
- Error 825
- Anything else you’re interested in
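
As a minimal sketch of the first two items, the script below creates one SQL Agent alert for error 825 and one for each severity from 19 upward (SQL Server severity levels run up to 25). The alert names and the 900-second delay are illustrative assumptions, not recommended values.

```sql
USE [msdb];
GO

-- Alert on error 825 (read-retry).  When @message_id is supplied, @severity must be 0.
EXEC dbo.sp_add_alert
	@name = N'Error 825 - Read-Retry Required',   -- hypothetical alert name
	@message_id = 825,
	@severity = 0,
	@enabled = 1,
	@delay_between_responses = 900,               -- seconds between repeat responses
	@include_event_description_in = 1;            -- include the error text in e-mail notifications
GO

-- One alert per severity level, 19 through 25
DECLARE @severity INT = 19, @name SYSNAME;

WHILE @severity <= 25
BEGIN
	SET @name = N'Severity ' + CAST(@severity AS NVARCHAR(2)) + N' Error';   -- hypothetical naming scheme

	EXEC dbo.sp_add_alert
		@name = @name,
		@message_id = 0,
		@severity = @severity,
		@enabled = 1,
		@delay_between_responses = 900,
		@include_event_description_in = 1;

	SET @severity += 1;
END
GO
```

An alert on its own only records that it fired. To be notified, attach an operator to each alert with msdb.dbo.sp\_add\_notification.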

Glenn Berry has a comprehensive blog post and detailed Transact-SQL script for creating SQL Agent alerts at [The Accidental DBA (Day 17 of 30): Configuring Alerts for High Severity Problems - Glenn Berry (sqlskills.com)](https://www.sqlskills.com/blogs/glenn/the-accidental-dba-day-17-of-30-configuring-alerts-for-high-severity-problems/)

# <span style="color:#cc5500;">Corruption Demo</span>

## <span style="color:#da2433;">DISCLAIMER: This demo purposefully corrupts a test database!&nbsp; The information in this section should not be used on a production SQL Server system. Any problem, corruption, damage, or loss you cause by using the information presented here is entirely your own responsibility. Use at your own risk.&nbsp; If DBCC WRITEPAGE is run against the master database, it can cause your SQL Server to shut down and not start again until master is repaired.&nbsp; It is highly recommended that you perform this on a test system.</span>

Use on a Test User Database on a Test Server, and not on a System database.

This setup script uses the undocumented DBCC WRITEPAGE command against a test database to cause corruption and the undocumented DBCC IND command to find pages to corrupt.  These two DBCC commands were created by the Microsoft product group to support the development of DBCC CHECKDB, which is used to check and repair databases.

### <span style="color:rgb(0, 204, 153);">Run the Code block below</span>

1. Click the run icon below
2. If ADS prompts you for a connection, enter the correct SQL Server and authentication account
3. View the results of the query by scrolling down to the results set

Create the test database

In [None]:
USE [master];
GO
--If exists, drop the Company database
IF DATABASEPROPERTYEX (N'Company', N'Version') > 0
BEGIN
	ALTER DATABASE [Company] SET SINGLE_USER
		WITH ROLLBACK IMMEDIATE;
	DROP DATABASE [Company];
END
GO
--Modify the file path for your environment and create the database
CREATE DATABASE [Company] ON PRIMARY (
    NAME = N'Company',
    FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL14.VIPER\MSSQL\DATA\Company.mdf')
LOG ON (
    NAME = N'Company_log',
    FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL14.VIPER\MSSQL\DATA\Company_log.ldf');
GO

Insert 10 rows of data into the new test database named Company

In [None]:
USE [Company];
GO

CREATE TABLE [RandomData] (
	[c1]  INT IDENTITY,
	[c2]  CHAR (8000) DEFAULT 'a');
GO

INSERT INTO [RandomData] DEFAULT VALUES;
GO 10

SELECT
	[c1],
	[c2]
FROM
	[RandomData];
GO

View the current page protection setting.  For a newly created database this is the default, CHECKSUM.

In [None]:
SELECT
	[page_verify_option],
	[page_verify_option_desc]
FROM
	sys.databases
WHERE
	[name] = N'Company';
GO

We will now list the pages that belong to the RandomData table in the Company database using the undocumented DBCC IND command.

In [None]:
DBCC IND (N'Company', N'RandomData', -1);
GO
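
As a cross-check on the DBCC IND output above, the undocumented `%%physloc%%` virtual column and `sys.fn_PhysLocFormatter` function can show which file and page holds each row. This is a sketch that assumes both are available on your build (they are undocumented and unsupported); the file and page numbers they return correspond to the PageFID and PagePID columns from DBCC IND.

```sql
USE [Company];
GO

-- Show the physical location of each row, formatted as (file:page:slot)
SELECT
	[c1],
	sys.fn_PhysLocFormatter(%%physloc%%) AS [file_page_slot]
FROM
	[RandomData];
GO
```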

You will now use the undocumented DBCC WRITEPAGE command to corrupt one of the data pages (the rows with PageType = 1) listed by the preceding query.

WARNING!  Make sure you only use this command on your test database.

dbcc WRITEPAGE ({'dbname' | dbid}, fileid, pageid, offset, length, data \[, directORbufferpool\])

The parameters mean:

- ‘dbname’ | dbid : self-explanatory
- fileid : file ID containing the page to change
- pageid : zero-based page number within that file
- offset : zero-based offset in bytes from the start of the page
- length : number of bytes to change, from 1 to 8
- data : the new data to insert (in hex, in the form ‘0xAABBCC’ – example three-byte string)
- directORbufferpool : whether to bypass the buffer pool or not (0/1)

In [None]:
-- We will now corrupt a page.  Replace the 2nd and 3rd input parameters (shown as B, B) with the file ID and page ID (PageFID and PagePID) returned by the DBCC IND command in the previous cell.
-- Leave the remaining values (0, 2, 0x0000, 1) as they are.
ALTER DATABASE [Company] SET SINGLE_USER;
GO
DBCC WRITEPAGE (N'Company', B, B, 0, 2, 0x0000, 1);
GO
ALTER DATABASE [Company] SET MULTI_USER;
GO

Before we demo the results of the corruption, we will first start with a clean slate by cycling the SQL Server error log and emptying the suspect\_pages table.

In [None]:
-- Clean the error log and suspect_pages
DELETE FROM [msdb].[dbo].[suspect_pages];
EXEC sp_cycle_errorlog;
GO

We will now trip the I/O Error

In [None]:
-- Trip the I/O error
USE Company
GO
SELECT
	*
FROM
	[Company].[dbo].[RandomData];
GO


If running the SELECT query killed your connection to SQL Server, reconnect this notebook to your test server.

We will now read the SQL Server Error Log

In [None]:
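-- Read the current SQL Server error log (lowercase name also works on case-sensitive instances)
EXEC sys.xp_readerrorlog;
GO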

Open your computer's Event Viewer and review the 824 error in the Windows Application log.
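
If the error log is long, the read can be narrowed to the lines that mention the error number. This is a sketch: xp\_readerrorlog is undocumented, and the positional parameters used here (log number, log type, search string) are the commonly used ones rather than a supported contract.

```sql
-- 0 = current log, 1 = SQL Server error log (not the Agent log), N'824' = text to search for
EXEC sys.xp_readerrorlog 0, 1, N'824';
GO
```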

Let's trigger the error one more time

In [None]:
-- Trigger the error again
SELECT
	*
FROM
	[Company].[dbo].[RandomData];
GO


Now, let's read the suspect\_pages table in the msdb database

In [None]:
SELECT
	*
FROM
	[msdb].[dbo].[suspect_pages];
GO

Now that the demo has concluded, let's drop the Company test database

In [None]:
USE [master];
GO

IF DATABASEPROPERTYEX (N'Company', N'Version') > 0
BEGIN
	ALTER DATABASE [Company] SET SINGLE_USER
		WITH ROLLBACK IMMEDIATE;
	DROP DATABASE [Company];
END
GO