![CH5-ADS.png](.\Media\CH5-ADS.png)
# <span style="color:#cc5500;">Simple Repair Techniques</span>

The purpose of repair is to make the database structurally consistent, and to do this as quickly and safely as possible.  Usually you’ll use repair because you do not have the backups necessary to restore your database with no data loss.  Be careful when using repair or manually fixing corruptions, as either may involve data loss. I recommend that you practice both before doing them for real in production.  

In this Notebook we’ll cover:

1. How repair works
2. Repair options
3. Manually fixing some corruptions

- REPAIR\_ALLOW\_DATA\_LOSS | REPAIR\_FAST | REPAIR\_REBUILD
    - Specifies that DBCC CHECKDB repair the found errors. Use the REPAIR options only as a last resort. The specified database must be in single-user mode to use one of the following repair options.
- REPAIR\_ALLOW\_DATA\_LOSS
    - Tries to repair all reported errors. These repairs can cause some data loss.
    - The REPAIR\_ALLOW\_DATA\_LOSS option is a supported feature but it may not always be the best option for bringing a database to a physically consistent state. If successful, the REPAIR\_ALLOW\_DATA\_LOSS option may result in some data loss. In fact, it may result in more data lost than if a user were to restore the database from the last known good backup.
- REPAIR\_FAST
    - Maintains syntax for backward compatibility only. No repair actions are performed.
- REPAIR\_REBUILD
    - Performs repairs that have no possibility of data loss. This can include quick repairs, such as repairing missing rows in nonclustered indexes, and more time-consuming repairs, such as rebuilding an index.
    - This argument does not repair errors involving FILESTREAM data.

Microsoft always recommends a user restore from the last known good backup as the primary method to recover from errors reported by DBCC CHECKDB. The REPAIR\_ALLOW\_DATA\_LOSS option is not an alternative for restoring from a known good backup. It is an emergency "last resort" option recommended for use only if restoring from a backup is not possible.

Certain errors, that can only be repaired using the REPAIR\_ALLOW\_DATA\_LOSS option, may involve deallocating a row, page, or series of pages to clear the errors. Any deallocated data is no longer accessible or recoverable for the user, and the exact contents of the deallocated data cannot be determined. Therefore, referential integrity may not be accurate after any rows or pages are deallocated because foreign key constraints are not checked or maintained as part of this repair operation. The user must inspect the referential integrity of their database (using DBCC CHECKCONSTRAINTS) after using the REPAIR\_ALLOW\_DATA\_LOSS option.
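
As a minimal sketch of that post-repair check, the block below runs DBCC CHECKCONSTRAINTS against every constraint in the current database. The database name [Company] is only an illustration borrowed from the demo later in this Notebook; substitute the database you repaired.

```sql
-- Sketch only: [Company] is an illustrative database name
USE [Company];
GO

-- Check all enabled and disabled constraints on every table in the database.
-- Any rows returned identify a table, a constraint, and the values that violate it.
DBCC CHECKCONSTRAINTS WITH ALL_CONSTRAINTS;
GO
```

An empty result set means no violations were found. It does not tell you what data repair removed.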

Before performing the repair, create physical copies of the files that belong to this database. This includes the primary data file (.mdf), any secondary data files (.ndf), all transaction log files (.ldf), and other containers that form the database including full text catalogs, file stream folders, memory optimized data, etc.
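
To find the files that need copying, one option is to ask the instance where they live. This is a sketch that assumes a database named [Company] (the demo database used later); full-text catalogs and other containers kept outside the database files may still need to be located separately.

```sql
-- Sketch only: list the physical files that make up the (illustrative) Company database
SELECT
	[name],
	[type_desc],      -- ROWS, LOG, FILESTREAM, ...
	[physical_name]   -- full path of the file or container to copy
FROM
	sys.master_files
WHERE
	[database_id] = DB_ID (N'Company');
GO
```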

Before performing the repair, consider changing the state of the database to EMERGENCY mode and trying to extract as much information possible from the critical tables and save that data.
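
A rough sketch of that extraction step follows. It assumes the damaged database is [Company] with a table [RandomData] (both from the demo later in this Notebook) and that a scratch database named [Salvage] already exists; [Salvage] and the target table name are hypothetical. Whether a given table can still be read depends on what is corrupt.

```sql
-- Sketch only: [Salvage] and [RandomData_Salvaged] are hypothetical names
USE [master];
GO

-- EMERGENCY mode marks the database read-only and limits access to sysadmin members
ALTER DATABASE [Company] SET EMERGENCY;
GO

-- Copy whatever rows are still readable out of a critical table
SELECT *
INTO [Salvage].[dbo].[RandomData_Salvaged]
FROM [Company].[dbo].[RandomData];
GO
```

When you have extracted what you can, `ALTER DATABASE [Company] SET ONLINE;` attempts to bring the database back to its normal state.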

## <span style="color:#cc5500;">How does Repair work?</span>

- What is the purpose of repair?
    - The purpose of repair is NOT to try and save all of your data.  It is to make the database structurally consistent so that the Storage Engine can process the database without running into other corruptions
    - It tries to be as fast as possible, which is why most of the repairs in the REPAIR\_ALLOW\_DATA\_LOSS option are 'delete what is broken and fix up all of the links'
- How does it know what to repair?
    - At each stage of its run, CHECKDB makes a list of the corruptions it has found, and then it processes those corruptions.
- How does it process those corruptions and how does it choose what to repair first?
    - It decides that based upon what corruptions there are
    - Every corruption has a ranking of how intrusive of a repair it will be.  It processes the most intrusive errors first, because many times, when the most intrusive error is processed, it results in being able to cross some of the more minor issues off the list
- Did it repair everything?
    - Check the output.  At the bottom of the output it will display the count of errors: found x number of errors and fixed y number of errors.  If the two numbers match, then it thinks it fixed everything
    - Be careful: some corruptions could be masked by others
- Why do we have to put the database in Single User Mode?  Why aren’t repairs done while online?
    - Repairs are hard enough to get right while offline.  Keeping up with and tracking all of the changes that occur in an online database would simply be too difficult
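
Before attempting a repair, it can help to confirm which access mode the database is actually in. This is a small sketch, again assuming the demo database name [Company]:

```sql
-- Sketch only: show the access mode and state of the (illustrative) Company database
SELECT
	[name],
	[user_access_desc],  -- SINGLE_USER, RESTRICTED_USER, or MULTI_USER
	[state_desc]         -- ONLINE, EMERGENCY, SUSPECT, ...
FROM
	sys.databases
WHERE
	[name] = N'Company';
GO
```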

## <span style="color:#cc5500;">Be aware of REPAIR_ALLOW_DATA_LOSS</span>

- It was very deliberately named
- It usually fixes structural inconsistencies by de-allocating
    - This is the fastest and most provably correct way
- It doesn’t take into account:
    - Foreign-key constraints
    - Inherent business logic and data relationships
    - Replication
- Before running repair, protect yourself
    - Take a backup and quiesce any replication topologies involved
- After running repair, check the data
    - Run DBCC CHECKDB again to make sure all corruptions were repaired
    - Run DBCC CHECKCONSTRAINTS if necessary
    - Reinitialize any replication topologies involved
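
The "take a backup" step could look something like the sketch below. The database name comes from the demo later in this Notebook and the backup path is a placeholder. CONTINUE_AFTER_ERROR is included on the assumption that the database is already corrupt, in which case a normal backup may stop when it hits a damaged page.

```sql
-- Sketch only: the backup path is a placeholder; change it for your environment
BACKUP DATABASE [Company]
TO DISK = N'C:\Temp\Company_PreRepair.bak'
WITH
	COPY_ONLY,             -- do not disturb the existing backup chain
	CONTINUE_AFTER_ERROR,  -- keep going if damaged pages are encountered
	INIT;                  -- overwrite any earlier file of the same name
GO
```

A backup taken this way still contains the corruption. Its value is as a way back to the pre-repair state if repair removes more than you expected.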

## <span style="color:#cc5500;">Examples of Repairs</span>

- What does repair do to fix:
    - A missing nonclustered index row
        - There is special code in DBCC that will just insert the missing record and not modify the entire table
    - A corrupt data record.  It depends on the level of corruption found
        - Maybe it deletes just the record, or maybe it deletes the whole page
    - An extent allocated to multiple objects, for example, what if the corruption impacts pages in more than one table?
        - Performs deeper examination of the pages in the extent and if it can't do the repair, it will delete the extent and then deallocate the associated pages
- Remember there are some un-repairable errors
    - System table clustered index data pages
    - PFS pages
    - Data purity errors

## <span style="color:#cc5500;">Misconceptions around Repair</span>

- Repair will not cause data loss (it depends)
- Repair should be run as the default (no)
- You can run repair without running DBCC CHECKDB (no)
- As soon as you’ve run repair, continue as normal (no)
- Repair can always fix everything (no)
- Repair is safe on system databases (no)
- You can run repairs online (no)
- REPAIR\_REBUILD will fix everything (no)
- Repair fixes up constraints (no)
- Repairs are propagated to replication subscribers (no)

# <span style="color:#cc5500;">Repair Demo</span>

## <span style="color:#da2433;">DISCLAIMER: This Demo purposefully corrupts a test database!!&nbsp; The information in this section should not be used on a production SQL Server system. Any problem, corruption, damage, or loss you cause by using the information presented here is entirely your own responsibility. Use at your own risk.&nbsp; If DBCC WRITEPAGE is run against the master database, it can cause your SQL Server to shut down and not start again until master is repaired.&nbsp; It is highly recommended that you perform this on a test system.</span>

Run this only against a test user database on a test server, never against a system database.

This setup script uses the undocumented DBCC WRITEPAGE command against a test database to cause corruption, and the undocumented DBCC IND command to find pages to corrupt.  The Microsoft Product Group created these two DBCC commands to help build and test DBCC CHECKDB, which is what performs the repair of a database.

### <span style="color:rgb(0, 204, 153);">Run the Code block below</span>

1. Click the run icon below
2. If ADS prompts you for a connection, enter the correct SQL Server and authentication account
3. View the results of the query by scrolling down to the results set

Create the test databases

In [None]:
-- If they exist, drop the 3 test databases for this demo
USE [master];
GO

IF DATABASEPROPERTYEX (N'Company', N'Version') > 0
BEGIN
	ALTER DATABASE [Company] SET SINGLE_USER
		WITH ROLLBACK IMMEDIATE;
	DROP DATABASE [Company];
END
GO

IF DATABASEPROPERTYEX (N'Company2', N'Version') > 0
BEGIN
	ALTER DATABASE [Company2] SET SINGLE_USER
		WITH ROLLBACK IMMEDIATE;
	DROP DATABASE [Company2];
END
GO

IF DATABASEPROPERTYEX (N'Company3', N'Version') > 0
BEGIN
	ALTER DATABASE [Company3] SET SINGLE_USER
		WITH ROLLBACK IMMEDIATE;
	DROP DATABASE [Company3];
END
GO

In [None]:
-- Create the first test database
USE master
GO
CREATE DATABASE [Company] ON PRIMARY (
    NAME = N'Company',
    FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL14.VIPER\MSSQL\DATA\Company.mdf')  -- Modify path for your environment
LOG ON (
    NAME = N'Company_log',
    FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL14.VIPER\MSSQL\DATA\Company_log.ldf');  -- Modify path for your environment
GO

-- Create a table and insert some records
USE [Company];
GO

CREATE TABLE [RandomData] (
	[c1]  INT IDENTITY,
	[c2]  CHAR (8000) DEFAULT 'a');
GO

INSERT INTO [RandomData] DEFAULT VALUES;
GO 10

-- List the pages in the table
DBCC IND (N'Company', N'RandomData', -1);
GO

We will now corrupt an IAM page (PageType 10) of the Company database.  From the DBCC IND output above, I see that my PageType 10 page is PagePID 332, so we will pick 332.  (Your PagePID may have a different value.)
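
If you would rather filter and sort the page list than read through DBCC IND output, the sketch below uses sys.dm_db_database_page_allocations instead. This function is also undocumented and, as far as I know, is only available from SQL Server 2012 onward, so treat it as an assumption about your environment and not as part of the original setup.

```sql
-- Sketch only: an undocumented alternative to DBCC IND (SQL Server 2012 onward)
SELECT
	[allocated_page_file_id] AS [PageFID],
	[allocated_page_page_id] AS [PagePID],
	[page_type],         -- 10 is the IAM page we want to corrupt
	[page_type_desc]
FROM
	sys.dm_db_database_page_allocations (
		DB_ID (N'Company'),
		OBJECT_ID (N'Company.dbo.RandomData'),
		NULL,            -- all indexes
		NULL,            -- all partitions
		N'DETAILED')     -- DETAILED mode is needed to populate page_type
WHERE
	[is_allocated] = 1
ORDER BY
	[page_type], [allocated_page_page_id];
GO
```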

In [None]:
-- Pick a page of type 10 to corrupt.  Replace 1 and 332 below with your own PageFID and PagePID values
ALTER DATABASE [Company] SET SINGLE_USER;
GO
DBCC WRITEPAGE (N'Company', 1, 332, 0, 2, 0x0000, 1);
GO
ALTER DATABASE [Company] SET MULTI_USER;
GO


In [None]:
-- We will now create the 2nd test database 
USE master
GO
CREATE DATABASE [Company2] ON PRIMARY (
    NAME = N'Company2',
    FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL14.VIPER\MSSQL\DATA\Company2.mdf')  -- Modify path for your environment
LOG ON (
    NAME = N'Company2_log',
    FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL14.VIPER\MSSQL\DATA\Company2_log.ldf');  -- Modify path for your environment
GO

USE [Company2];
GO

CREATE TABLE [RandomData] (
	[c1]  INT IDENTITY,
	[c2]  CHAR (8000) DEFAULT 'a');
GO

INSERT INTO [RandomData] DEFAULT VALUES;
GO 10
--  This time we will create a Clustered index on the table
CREATE CLUSTERED INDEX [RandomData_Clustered]
ON [RandomData] ([c1]);
GO

-- List the pages in the table
DBCC IND (N'Company2', N'RandomData', -1);
GO

We will now corrupt a Clustered Index page (PageType 2) of the Company2 database.  From the DBCC IND output above, I see that my PageType 2 page is PageFID 1, PagePID 408, so we will pick 408.  (Your PagePID may have a different value.)

In [None]:
-- We will now corrupt a page in the Clustered Index of Company2, using the page with PageType 2.  Replace B, B below with your PageFID and PagePID values
ALTER DATABASE [Company2] SET SINGLE_USER;
GO
DBCC WRITEPAGE (N'Company2', B, B, 0, 2, 0x0000, 1);
GO
ALTER DATABASE [Company2] SET MULTI_USER;
GO

In [None]:
-- We will now create our Company3 database and we will corrupt an off-row LOB
USE master
GO
CREATE DATABASE [Company3] ON PRIMARY (
    NAME = N'Company3',
    FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL14.VIPER\MSSQL\DATA\Company3.mdf')  -- Modify path for your environment
LOG ON (
    NAME = N'Company3_log',
    FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL14.VIPER\MSSQL\DATA\Company3_log.ldf');  -- Modify path for your environment
GO

USE [Company3];
GO
CREATE TABLE [RandomData] (
	[c1]  INT IDENTITY,
	[c2]  VARCHAR (5000) DEFAULT REPLICATE ('a', 5000),
	[c3]  VARCHAR (5000) DEFAULT REPLICATE ('a', 5000));
GO
-- Do an insert that pushes the data to Off-Row LOB storage.
INSERT INTO [RandomData] DEFAULT VALUES;
GO 10

-- List the pages in the table
DBCC IND (N'Company3', N'RandomData', -1);
GO


We will now corrupt a text (LOB) page (PageType 3) of the Company3 database.  From the DBCC IND output above, I see that one of my PageType 3 pages is PagePID 352, so we will pick 352.  (Your PagePID may have a different value.)

In [None]:
-- Pick a page of type 3 to corrupt.  Replace B, B below with your PageFID and PagePID values
ALTER DATABASE [Company3] SET SINGLE_USER;
GO
DBCC WRITEPAGE (N'Company3', B, B, 0, 2, 0x0000, 1);
GO
ALTER DATABASE [Company3] SET MULTI_USER;
GO

-- Clean the error log and suspect_pages to complete the setup portion for our demo coming next.
DELETE FROM [msdb].[dbo].[suspect_pages];
EXEC sp_cycle_errorlog;
GO

# <span style="color:#cc5500;">Let's start making some repairs</span>



In [None]:
-- Run DBCC CHECKDB against the Company database
-- The results will show you that you have a broken IAM page
-- The results will also show you that repair_allow_data_loss is the minimum repair level for the types of errors found.
DBCC CHECKDB (N'Company') WITH NO_INFOMSGS;
GO


In [None]:
--Run Repair
-- Take the database into SINGLE_USER mode
ALTER DATABASE [Company] SET SINGLE_USER;
GO

-- Run repair
-- The results should display that the corruption has been repaired.  It fixes the IAM pages, and reallocates them.  It recreates the broken and missing IAM page.
DBCC CHECKDB (N'Company', REPAIR_ALLOW_DATA_LOSS)
WITH NO_INFOMSGS;
GO

In [None]:
-- A good practice after running repair is to do another CHECKDB
-- Check all corruptions were fixed
DBCC CHECKDB (N'Company') WITH NO_INFOMSGS;
GO

In [None]:
-- The results should come back clean and we will now make the database available again
-- Make the database MULTI_USER again
ALTER DATABASE [Company] SET MULTI_USER;
GO

Let's now see if we can repair the broken Clustered Index in Company2

In [None]:
-- Run CHECKDB on Company2
-- It should show that all of your linkages in the B tree are broken now that a page in the Clustered Index was corrupted.
-- It should show that repair_allow_data_loss is the minimum repair level for the errors found by DBCC CHECKDB (Company2).
DBCC CHECKDB (N'Company2') WITH NO_INFOMSGS;
GO


In [None]:
-- Let's run repair on Company2
-- Take the database into SINGLE_USER mode
ALTER DATABASE [Company2] SET SINGLE_USER;
GO

-- Run repair
-- It should display that it will be able to deallocate the broken Index page, and then successfully rebuild the index.  It should show that all of the broken parent/child page links are repaired.
DBCC CHECKDB (N'Company2', REPAIR_ALLOW_DATA_LOSS)
WITH NO_INFOMSGS;
GO

In [None]:
-- Check all corruptions were fixed
-- A good practice after running repair is to do another CHECKDB
DBCC CHECKDB (N'Company2') WITH NO_INFOMSGS;
GO

In [None]:
-- Make the database MULTI_USER again
ALTER DATABASE [Company2] SET MULTI_USER;
GO


Let's now see if we can repair the broken off-row data in LOB storage

In [None]:
-- Company3 CHECKDB
-- The results should display that a text page cannot be processed
-- It should also tell you that there is some off-row data that it can't get to.
DBCC CHECKDB (N'Company3') WITH NO_INFOMSGS;
GO

In [None]:
-- Let's try and repair this
-- Take the database into SINGLE_USER mode
ALTER DATABASE [Company3] SET SINGLE_USER;
GO

-- Run repair
-- It should display that an in-row data record is deleted and that the database corruption is repaired and consistent.
DBCC CHECKDB (N'Company3', REPAIR_ALLOW_DATA_LOSS)
WITH NO_INFOMSGS;
GO

In [None]:
-- Check all corruptions were fixed
DBCC CHECKDB (N'Company3') WITH NO_INFOMSGS;
GO


In [None]:
-- But, did we lose any data?

SELECT COUNT (*) FROM [Company3].[dbo].[RandomData];

In [None]:
-- Yes, you can now see that using repair_allow_data_loss can result in true data loss.  You made the repair and the database is now healthy, but you are missing 1 of the 10 rows that were originally inserted

-- Make the database MULTI_USER again
ALTER DATABASE [Company3] SET MULTI_USER;
GO

## <span style="color:#cc5500;">Manually Fixing Nonclustered Indexes</span>

- It doesn’t make sense to put the database offline and run DBCC CHECKDB or DBCC CHECKTABLE with REPAIR\_REBUILD to fix corrupt nonclustered indexes
- All it will do is essentially disable and rebuild the index, so why not do it yourself, while keeping the database Online?
- You cannot just rebuild the index
    - An online rebuild reads the old index to build the new index
    - An offline rebuild does the same from SQL Server 2008 onward
- Steps to use, inside a transaction:
    - ALTER INDEX name ON tablename DISABLE
    - ALTER INDEX name ON tablename REBUILD
- Using a transaction is necessary to prevent any index-enforced constraints from being violated while the index is disabled.  If you disable or drop the index outside the bounds of a transaction, someone could theoretically insert a record that violates your defined constraint.
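
One way to make those two steps safer is to wrap them in TRY/CATCH so that a failed rebuild rolls back the disable as well. This is a sketch and not the demo's own code; it borrows the [SalesDB] database, [Customers] table, and [CustomerName] index names from the demo that follows, and THROW assumes SQL Server 2012 or later.

```sql
-- Sketch only: disable and rebuild inside one transaction, rolling back on any error
USE [SalesDB];
GO

BEGIN TRY
	BEGIN TRAN;

	-- Disabling the index discards its pages, so the rebuild has to read the table
	ALTER INDEX [CustomerName] ON [Customers] DISABLE;
	ALTER INDEX [CustomerName] ON [Customers] REBUILD;

	COMMIT TRAN;
END TRY
BEGIN CATCH
	-- Undo the disable if the rebuild (or anything else) failed
	IF @@TRANCOUNT > 0
		ROLLBACK TRAN;
	THROW;
END CATCH;
GO
```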

## <span style="color:#cc5500;">Demo: Fixing a broken NCL index without taking the database offline to use Repair</span>



Prerequisite:  Download the Sample Database named SalesDB from this link: https://www.sqlskills.com/resources/conferences/salesdb2014.zip 

### <span style="color:rgb(0, 204, 153);">Run the Code block below</span>

1. Click the run icon below
2. If ADS prompts you for a connection, enter the correct SQL Server and authentication account
3. View the results of the query by scrolling down to the results set

In [None]:
-- Setup script for DBCC CHECK Options demo.

-- Download the SalesDB Sample Database from the link above
-- and unzip into one of your drives

-- Restore as follows:
USE [master];
GO

IF DATABASEPROPERTYEX (N'SalesDB', N'Version') > 0
BEGIN
	ALTER DATABASE [SalesDB] SET SINGLE_USER
		WITH ROLLBACK IMMEDIATE;
	DROP DATABASE [SalesDB];
END
GO

RESTORE DATABASE [SalesDB]
FROM DISK = N'C:\Temp\SalesDB2014.bak'  --modify path for your environment
WITH
    MOVE N'SalesDBData' TO N'C:\Program Files\Microsoft SQL Server\MSSQL14.VIPER\MSSQL\DATA\SalesDB.mdf',  --modify path for your environment
	MOVE N'SalesDBLog' TO N'C:\Program Files\Microsoft SQL Server\MSSQL14.VIPER\MSSQL\DATA\SalesDB_log.ldf';  --modify path for your environment
GO

In [None]:
-- Create the index we're going to break
USE [SalesDB];
GO
CREATE NONCLUSTERED INDEX [CustomerName]
ON [Customers] ([LastName]);

-- What's the index ID?
SELECT
	[index_id]
FROM
	sys.indexes
WHERE
	[name] = N'CustomerName'
	AND [object_id] = OBJECT_ID (N'Customers');
GO


In [None]:
-- Pass the ID of your NCL Index (2 here) into DBCC IND
-- List the pages in the index
DBCC IND (N'SalesDB', N'Customers', 2);
GO

The query should show the ID of your new NCL Index is 2

In [None]:
-- Corrupt some records
-- Again, DBCC WRITEPAGE is an undocumented command, so I am not able to comment further about the statements below, but they will corrupt the NonClustered index in the SalesDB Test database
-- If you are not using SQL Server 2019, the page ID 24696 below may not be valid, and you will need to pick a slightly different one
ALTER DATABASE [SalesDB] SET SINGLE_USER;
GO
DBCC WRITEPAGE (N'SalesDB', 1, 24696, 134, 1, 0x64);
DBCC WRITEPAGE (N'SalesDB', 1, 24696, 196, 1, 0x70);
DBCC WRITEPAGE (N'SalesDB', 1, 24696, 2396, 1, 0x74);
DBCC WRITEPAGE (N'SalesDB', 1, 24696, 2698, 1, 0x74);
DBCC WRITEPAGE (N'SalesDB', 1, 24696, 2748, 1, 0x70);
GO
ALTER DATABASE [SalesDB] SET MULTI_USER;
GO

-- Clean the error log and suspect_pages
DELETE FROM [msdb].[dbo].[suspect_pages];
EXEC sp_cycle_errorlog;
GO

In [None]:
-- Run a CHECKDB
-- It should display that repair_rebuild is the minimum repair level for the errors found by DBCC CHECKDB (SalesDB).
DBCC CHECKDB (N'SalesDB') WITH NO_INFOMSGS;
GO


In [None]:
-- Is it just non-clustered indexes that are impacted?
-- We've only corrupted a few pages.  Imagine how difficult it would be if you had hundreds of errors
-- Scan through all the errors looking for index IDs
-- Maybe use WITH TABLERESULTS?
DBCC CHECKDB (N'SalesDB') WITH NO_INFOMSGS, TABLERESULTS;
GO

TABLERESULTS gives you nicely readable output, and you will see from the IndexID column that all of the errors relate to a NonClustered Index.

In [None]:
-- If you wanted to fix them with DBCC CHECKDB, it
-- may do single row repairs or rebuild the index,
-- depending on the error
DBCC CHECKDB (N'SalesDB', REPAIR_REBUILD) WITH NO_INFOMSGS;
GO


As you see, it is not going to let us run this, because the database needs to be in SINGLE User Mode to run repair.  We don't want to put the database into SINGLE user mode just to rebuild an index, so this is not the option we want.

In [None]:
-- You need to be in SINGLE_USER mode! Just to fix non-clustered indexes with CHECKDB
-- That doesn't make sense. Just rebuild them manually and keep the database online. 
-- Try an online rebuild...
USE [SalesDB];
GO
EXEC sp_HelpIndex N'Customers';
GO

ALTER INDEX [CustomerName] ON [Customers] REBUILD
WITH (ONLINE = ON);
GO

In [None]:
-- And check again, to see if that worked.
DBCC CHECKDB (N'SalesDB') WITH NO_INFOMSGS;
GO

That didn't work.  Why?  A REBUILD with ONLINE reads the old index, so the missing rows and the broken rows are carried straight into the new index.

  

So, what if we try an OFFLINE REBUILD?

In [None]:
ALTER INDEX [CustomerName] ON [Customers] REBUILD;
GO

DBCC CHECKDB (N'SalesDB') WITH NO_INFOMSGS;
GO

That didn't work.  Why?  An OFFLINE REBUILD reads the old index as well, so the missing rows and the broken rows are again carried into the new index.

In [None]:
-- But on 2008 onward you should disable and rebuild the index (or use repair to do it).
USE [SalesDB];
GO

-- Do it inside a transaction to prevent problems by forcing locks to be held which should prevent constraint violations
BEGIN TRAN;
GO
ALTER INDEX [CustomerName] ON [Customers] DISABLE;
ALTER INDEX [CustomerName] ON [Customers] REBUILD;
GO
COMMIT TRAN;
GO


In [None]:
-- Final check and we found that using a Transaction to disable, then rebuild worked, while keeping the database in multi-user mode
DBCC CHECKDB (N'SalesDB') WITH NO_INFOMSGS;
GO

## <span style="color:#cc5500;">If you are forced to use Repair</span>

- That implies that your backup strategy does not allow you to meet your downtime and data loss Service Level Agreements
- Update your backup strategy!
    - Figure out what restores you need to be able to perform
    - Change the backup strategy to perform the backups that will allow those restores to take place
    - Implement regular backup validation
- Also make sure that:
    - You check any constraints that may be affected based on which tables were repaired
    - You check to see what data was lost
    - You reinitialize any affected replication topologies
    - You perform root-cause analysis
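
As a starting point for regular backup validation, the sketch below takes a backup with checksums and then verifies it. The database name is borrowed from the demo above and the file path is a placeholder.

```sql
-- Sketch only: the backup path is a placeholder; change it for your environment
BACKUP DATABASE [SalesDB]
TO DISK = N'C:\Temp\SalesDB_Checksum.bak'
WITH
	CHECKSUM,  -- verify page checksums while reading and write a backup checksum
	INIT;
GO

-- Confirm the backup set is complete and readable, rechecking the checksums
RESTORE VERIFYONLY
FROM DISK = N'C:\Temp\SalesDB_Checksum.bak'
WITH CHECKSUM;
GO
```

RESTORE VERIFYONLY does not restore the data, so it is not a substitute for periodically restoring the backup onto another server and running DBCC CHECKDB against the restored copy.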