![CH6-ADS.png](.\Media\CH6-ADS.png)

# <span style="color:#cc5500;">Interpreting Performance Counters</span>

When running SQL Server, it is important to monitor both OS performance counters and SQL Server counters.  Establishing a baseline and then regularly monitoring the counters described in this notebook will help you assess the performance of your platforms, both now and in the future.

When I started my first Jr. DBA position I was hungry to learn everything I could about SQL Server, and that included performance tuning and monitoring performance counters.  At that time, performance counters were little more than numbers and graphs appearing on a screen.  I had no idea what I was looking at or what those numbers meant.

- If I see a page life expectancy of 600 is that a good number or a bad number?
- If I have an average disk sec / write of 58 milliseconds, is that a reasonable number?
- How do I know if I have a bottleneck and how do I find it?
- What do these performance counters mean?

These are very common questions asked by DBAs, and the remainder of this notebook is my attempt to demystify some of them for you.  The content below is not necessarily the perfect solution for your environment, and I don't cover all available counters; rather, I am sharing my experience of more than 15 years as a Microsoft Consultant working in our largest customer enterprises.

## <span style="color:#cc5500;">OS Performance Counters</span>

### <span style="color:#1e90ff;">Memory</span>

Memory and Disk I/O complement each other.   Memory issues on the system could affect disk I/O and vice versa.   It is important to carefully observe the trend of your performance counter data over a long period of time to identify the real problem.  In other words, analyzing only a few minutes of performance data may be misleading.

Memory\\Available MBytes

- <span style="color: rgb(255, 0, 0);">Defined:&nbsp;</span> Available MBytes is the amount of physical memory, in Megabytes, immediately available for allocation to a process or for system use.
- <span style="color:#ff0000;">Threshold:&nbsp;</span> A consistent value of less than 20 to 25 percent of installed RAM is an indication of insufficient memory.
- <span style="color:#ff0000;">Significance:&nbsp;</span> This indicates the amount of physical memory available to processes running on the computer. Note that this counter displays the last observed value only.  It is not an average.
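
The Available MBytes threshold is expressed as a share of installed RAM, so the raw counter value has to be compared against the size of the server. A minimal Python sketch of that check, assuming you have already collected several samples of the counter; the function name, the 20 percent default, and the sample values are illustrative and do not come from any tool:

```python
def low_memory(available_mb_samples, installed_ram_mb, threshold_pct=20.0):
    """Return True when every sample is below threshold_pct of installed RAM.

    "Consistent" is interpreted here as "all samples in the window", which is
    an assumption; adapt it to your own baseline.
    """
    if not available_mb_samples:
        raise ValueError("need at least one sample")
    limit_mb = installed_ram_mb * threshold_pct / 100.0
    return all(sample < limit_mb for sample in available_mb_samples)


# Hypothetical server with 64 GB of RAM (65,536 MB); 20 percent is about 13,107 MB.
print(low_memory([9000, 8500, 11000], 65536))   # every sample is under the limit
print(low_memory([9000, 20000, 11000], 65536))  # one sample is above the limit
```

Because the counter shows only the last observed value, evaluating a window of samples rather than a single reading is what makes the word "consistent" meaningful.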

Memory\\Page Reads/sec 

- <span style="color:#ff0000;">Defined:</span> Page Reads/sec is the rate at which the disk was read to resolve hard page faults. It shows the number of read operations, without regard to the number of pages retrieved in each operation. Hard page faults occur when a process references a page in virtual memory that is not in working set or elsewhere in physical memory, and must be retrieved from disk. This counter is a primary indicator of the kinds of faults that cause system-wide delays.
- <span style="color:#ff0000;">Threshold:&nbsp;</span> Sustained values of more than five indicate a large number of page faults for read requests.
- <span style="color:#ff0000;">Significance:&nbsp;</span> This counter indicates that the working set of your process is too large for the physical memory and that it is paging to disk. It shows the number of read operations, without regard to the number of pages retrieved in each operation.  Higher values indicate a memory bottleneck.

If a low rate of page-read operations coincides with high values for Physical Disk\\% Disk Time and Physical Disk\\Avg Disk Queue Length, there could be a disk bottleneck. If an increase in queue length is not accompanied by a decrease in the pages-read rate, a memory shortage exists. 

Memory\\Pages Input/sec

- <span style="color:#ff0000;">Defined:&nbsp;</span> Pages Input/sec is the rate at which pages are read from disk to resolve hard page faults. Hard page faults occur when a process refers to a page in virtual memory that is not in its working set or elsewhere in physical memory, and must be retrieved from disk. When a page is faulted, the system tries to read multiple contiguous pages into memory to maximize the benefit of the read operation. Compare the value of Memory\\Pages Input/sec to the value of Memory\\Page Reads/sec to determine the average number of pages read into memory during each read operation.
- <span style="color:#ff0000;">Threshold:&nbsp;</span> The value should not exceed 15.  The higher the value of this counter, the poorer the performance is likely to be.
- <span style="color:#ff0000;">Significance:&nbsp;</span> Hard page faults occur when a process refers to a page in virtual memory that is not in its working set or elsewhere in physical memory, and must be retrieved from disk.   This can create a larger amount of I/O than necessary.

Memory\\Pages/sec 

- <span style="color:#ff0000;">Defined:</span>  Pages/sec is the rate at which pages are read from or written to disk to resolve hard page faults. This counter is a primary indicator of the kinds of faults that cause system-wide delays.  It is the sum of Memory\\Pages Input/sec and Memory\\Pages Output/sec.  It is counted in numbers of pages, so it can be compared to other counts of pages, such as Memory\\Page Faults/sec, without conversion. It includes pages retrieved to satisfy faults in the file system cache (usually requested by applications) and non-cached mapped memory files.
- <span style="color:#ff0000;">Threshold:</span>  Sustained values higher than five indicate a bottleneck.
- <span style="color:#ff0000;">Significance:</span>   A high value of Pages/sec indicates that your application does not have sufficient memory.  If the product of Memory\\Pages/sec and Physical Disk\\Avg. Disk sec/Transfer exceeds 0.1, paging is taking more than 10 percent of disk access time; if this persists over a long period, you probably need more memory.  The average of Pages Input/sec divided by the average of Page Reads/sec gives the number of pages per disk read. This value should not generally exceed five. A value greater than five indicates that the system is spending too much time paging and requires more memory (assuming that the application has been optimized).
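
The ratio of Pages Input/sec to Page Reads/sec mentioned above is simple arithmetic. A small Python sketch, assuming both values are averages taken over the same sample window; the function name and the sample figures are hypothetical:

```python
def pages_per_read(pages_input_per_sec, page_reads_per_sec):
    """Average number of pages brought in by each hard-fault read operation."""
    if page_reads_per_sec == 0:
        # No read operations in the window, so nothing was paged in.
        return 0.0
    return pages_input_per_sec / page_reads_per_sec


# Hypothetical averages: 48 pages/sec read in by 12 read operations/sec.
print(pages_per_read(48, 12))  # 4.0
```

A result near one means each fault pulled in a single page, while larger values mean Windows is reading several contiguous pages per operation.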

PageFile\\% Usage

- <span style="color:#ff0000;">Defined:</span>  The percentage of the Page File instance that is in use.
- <span style="color:#ff0000;">Threshold:</span>  This should be no more than 70 percent.
- <span style="color:#ff0000;">Significance:</span>  Indicates the percentage of the page file in use.  A value above 70 percent indicates that the page file should be made larger.

### <span style="color:#1e90ff;">Processor</span>

Processor\\% Processor Time

- <span style="color:#ff0000;">Defined:</span>  % Processor Time is the percentage of elapsed time that the processor spends to execute a non-Idle thread. It is calculated by measuring the percentage of time that the processor spends executing the idle thread and then subtracting that value from 100%. (Each processor has an idle thread that consumes cycles when no other threads are ready to run). This counter is the primary indicator of processor activity, and displays the average percentage of busy time observed during the sample interval. It should be noted that the accounting calculation of whether the processor is idle is performed at an internal sampling interval of the system clock (10ms).
- <span style="color:#ff0000;">Threshold:</span>  The general figure for the threshold limit for processors is 65 percent.
- <span style="color:#ff0000;">Significance:</span>  This counter is the primary indicator of processor activity. High values may not necessarily be bad.  However, if other processor-related counters, such as % Privileged Time or Processor Queue Length, are increasing linearly alongside it, high CPU utilization may be worth investigating.

Processor\\% Privileged Time 

- <span style="color:#ff0000;">Defined:</span>  % Privileged Time is the percentage of elapsed time that the process threads spent executing code in privileged mode.  When a Windows system service is called, the service will often run in privileged mode to gain access to system-private data. Such data is protected from access by threads executing in user mode. Calls to the system can be explicit or implicit, such as page faults or interrupts. Unlike some early operating systems, Windows uses process boundaries for subsystem protection in addition to the traditional protection of user and privileged modes. Some work done by Windows on behalf of the application might appear in other subsystem processes in addition to the privileged time in the process.
- <span style="color:#ff0000;">Threshold:</span>  A figure that is consistently over 75 percent indicates a bottleneck.
- <span style="color:#ff0000;">Significance:</span>  This counter indicates the percentage of time a thread runs in privileged mode. When your application calls internal Operating System functions (for example to perform file I/O or network I/O, or to allocate memory to something), these operating system functions are executed in “privileged” mode, not “User” mode.

System\\Context Switches/sec 

- <span style="color:#ff0000;">Defined:</span>  Context Switches/sec is the combined rate at which all processors on the computer are switched from one thread to another.  Context switches occur when a running thread voluntarily relinquishes the processor, is preempted by a higher priority ready thread, or switches between user-mode and privileged (kernel) mode to use an Executive or subsystem service.  It is the sum of Thread\\Context Switches/sec for all threads running on all processors in the computer and is measured in numbers of switches.  There are context switch counters on the System and Thread objects. This counter displays the difference between the values observed in the last two samples, divided by the duration of the sample interval.
- <span style="color:#ff0000;">Threshold:</span>  As a general rule, context switching rates of less than 5,000 per second per processor are not worth worrying about.  If context switching rates exceed 15,000 per second per processor, then there may be a CPU constraint.
- <span style="color:#ff0000;">Significance:</span>  Context switching happens when a higher priority thread preempts a lower priority thread that is currently running or when a high priority thread blocks.  High levels of context switching can occur when many threads share the same priority level. This often indicates that there are too many threads competing for the processors on the system.
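
System\\Context Switches/sec is a combined rate for all processors, while the thresholds above are per processor, so the value has to be divided by the processor count first. A hedged Python sketch; the function name and the three labels are made up for illustration, and the middle band between 5,000 and 15,000 is simply treated as "watch":

```python
def context_switch_pressure(switches_per_sec, processor_count):
    """Classify the system-wide context switch rate on a per-processor basis."""
    if processor_count <= 0:
        raise ValueError("processor_count must be positive")
    per_processor = switches_per_sec / processor_count
    if per_processor < 5000:
        return "ok"
    if per_processor > 15000:
        return "possible CPU constraint"
    return "watch"


# Hypothetical 8-processor server.
print(context_switch_pressure(32000, 8))    # 4,000 per processor
print(context_switch_pressure(160000, 8))   # 20,000 per processor
```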

### <span style="color:#1e90ff;">Disk I/O</span>

To measure disk I/O activity, you can use the following counters.  Use the LogicalDisk counters for SANs and the PhysicalDisk counters for direct-attached storage.

LogicalDisk\\Avg. Disk Queue Length 

- <span style="color: rgb(255, 0, 0);">Defined:</span>  Avg. Disk Queue Length is the average number of both read and write requests that were queued for the selected disk during the sample interval.
- <span style="color: rgb(255, 0, 0);">Threshold:</span>  Should not be higher than the number of spindles plus two.
- <span style="color:#ff0000;">Significance:</span>  Tracks the number of requests that are queued and waiting for a disk during the sample interval, as well as requests in service.  As a result, this might overstate activity.

If more than two requests are continuously waiting on a single-disk system, the disk might be a bottleneck.  To analyze queue length data further, use Avg. Disk Read Queue Length and Avg. Disk Write Queue Length. 

LogicalDisk\\Avg. Disk Read Queue Length 

- <span style="color: rgb(255, 0, 0);">Defined:</span>  Avg. Disk Read Queue Length is the average number of read requests that were queued for the selected disk during the sample interval.
- <span style="color: rgb(255, 0, 0);">Threshold:</span> This should be less than two.

LogicalDisk\\Avg. Disk Write Queue Length 

- <span style="color: rgb(255, 0, 0);">Defined:</span>  Avg. Disk Write Queue Length is the average number of write requests that were queued for the selected disk during the sample interval.
- <span style="color: rgb(255, 0, 0);">Threshold:</span>  Should be less than two.

LogicalDisk\\Avg. Disk sec/Read 

- <span style="color: rgb(255, 0, 0);">Defined:</span>  Avg. Disk sec/Read is the average time, in seconds, of a read of data from the disk.
- <span style="color: rgb(255, 0, 0);">Threshold:</span>  No specific value, but here is our rule of thumb:
    - Less than 10 ms – very good
    - Between 10-20 ms – okay
    - Between 20-50 ms – slow, needs attention
    - Greater than 50 ms – Serious I/O bottleneck

LogicalDisk\\Avg. Disk sec/Write 

- <span style="color: rgb(255, 0, 0);">Defined:</span>  Avg. Disk sec/Write is the average time, in seconds, of a write of data to the disk.
- <span style="color: rgb(255, 0, 0);">Threshold:</span>  No specific value, based on the manufacturer, but here is our rule of thumb:
    - Less than 10 ms – very good
    - Between 10-20 ms – okay
    - Between 20-50 ms – slow, needs attention
    - Greater than 50 ms – Serious I/O bottleneck
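
Avg. Disk sec/Read and Avg. Disk sec/Write report seconds, while the rule of thumb is stated in milliseconds, so a conversion is needed before comparing. A minimal Python sketch of the rule of thumb; the function name is hypothetical, and assigning the exact boundary values (10, 20, 50 ms) to the lower band is an assumption, since the list does not say which side they belong to:

```python
def classify_latency(avg_disk_sec):
    """Map an Avg. Disk sec/Read or sec/Write value (in seconds) to a rating."""
    latency_ms = avg_disk_sec * 1000.0
    if latency_ms < 10:
        return "very good"
    if latency_ms <= 20:
        return "okay"
    if latency_ms <= 50:
        return "slow, needs attention"
    return "serious I/O bottleneck"


# The 58 ms write latency from the introduction is reported as 0.058 seconds.
print(classify_latency(0.058))
print(classify_latency(0.004))
```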

LogicalDisk\\Avg. Disk sec/Transfer 

- <span style="color: rgb(255, 0, 0);">Defined:</span>  Avg. Disk sec/Transfer is the time, in seconds, of the average disk transfer.
- <span style="color: rgb(255, 0, 0);">Threshold:</span>  This should not be more than 18 milliseconds.
- <span style="color: rgb(255, 0, 0);">Significance:</span>  This may indicate a large amount of disk fragmentation, slow disks, or disk failures. You can also multiply the values of the Physical Disk\\Avg. Disk sec/Transfer and Memory\\Pages/sec counters.  If the product of these counters exceeds 0.1, paging is taking more than 10 percent of disk access time, indicating that you need more RAM.
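
The product rule in the Significance note above can be sketched in a few lines of Python. The function names and sample values are illustrative assumptions; the 0.1 limit is the one quoted in the text:

```python
def paging_share(avg_disk_sec_per_transfer, pages_per_sec):
    """Approximate fraction of each second spent on disk transfers for paging."""
    return avg_disk_sec_per_transfer * pages_per_sec


def needs_more_ram(avg_disk_sec_per_transfer, pages_per_sec, limit=0.1):
    """True when paging takes more than `limit` (10 percent) of disk access time."""
    return paging_share(avg_disk_sec_per_transfer, pages_per_sec) > limit


# Hypothetical samples: 12 ms per transfer at 10 pages/sec, then 8 ms at 5 pages/sec.
print(needs_more_ram(0.012, 10))
print(needs_more_ram(0.008, 5))
```

Multiplying seconds per transfer by pages per second gives a dimensionless fraction of time, which is why the result can be read directly as a percentage.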

LogicalDisk\\Disk Writes/sec 

- <span style="color: rgb(255, 0, 0);">Defined:</span>  Disk Writes/sec is the rate of write operations on the disk.
- <span style="color: rgb(255, 0, 0);">Threshold:</span>  Basically depends on manufacturer's specification.

LogicalDisk\\% Disk Time

- <span style="color: rgb(255, 0, 0);">Defined:</span>  % Disk Time is the percentage of elapsed time that the selected disk drive was busy servicing read or write requests.
- <span style="color: rgb(255, 0, 0);">Threshold:</span>  A value greater than 50 percent represents an I/O bottleneck.
- <span style="color: rgb(255, 0, 0);">Significance:</span>  Represents the percentage of elapsed time that the selected disk drive was busy servicing read or write requests.

LogicalDisk\\Disk Reads/sec and LogicalDisk\\Disk Writes/sec

- <span style="color: rgb(255, 0, 0);">Defined:</span>  Disk Reads/sec and Disk Writes/sec are the rates of read and write operations on the disk.
- <span style="color: rgb(255, 0, 0);">Threshold:</span>  The rate should stay below 85 percent of the I/O capacity the disk is rated for.

When using above I/O counters, you will likely need to adjust the values for RAID configurations and avg disk queue length using the following formulas.

- RAID 0 -- I/Os per disk = (reads + writes) / number of disks 
- RAID 1 -- I/Os per disk = \[reads + (2 \* writes)\] / 2 
- RAID 5 -- I/Os per disk = \[reads + (4 \* writes)\] / number of disks 
- RAID 10 -- I/Os per disk = \[reads + (2 \* writes)\] / number of disks

For example:

You have a RAID-1 system with two physical disks with the following values of the counters.

Disk Reads/sec 90 

Disk Writes/sec 80 

Avg. Disk Queue Length 5

In this case, you are generating (90 + (2 \* 80)) / 2 = 125 I/Os per disk and your disk queue length per disk is 5 / 2 = 2.5, which indicates a borderline I/O bottleneck.
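
The RAID adjustments and the worked example above can be reproduced with a short Python sketch. The function names are hypothetical; the formulas are the four listed in this section:

```python
def ios_per_disk(raid_level, reads_per_sec, writes_per_sec, disk_count):
    """Apply the per-RAID-level formulas from this section."""
    if raid_level == 0:
        return (reads_per_sec + writes_per_sec) / disk_count
    if raid_level == 1:
        # The RAID 1 formula divides by 2 (one mirrored pair), not by disk_count.
        return (reads_per_sec + 2 * writes_per_sec) / 2
    if raid_level == 5:
        return (reads_per_sec + 4 * writes_per_sec) / disk_count
    if raid_level == 10:
        return (reads_per_sec + 2 * writes_per_sec) / disk_count
    raise ValueError(f"unsupported RAID level: {raid_level}")


def queue_length_per_disk(avg_disk_queue_length, disk_count):
    """Spread the reported queue length across the physical disks."""
    return avg_disk_queue_length / disk_count


# RAID 1, two disks, 90 reads/sec, 80 writes/sec, queue length 5.
print(ios_per_disk(1, 90, 80, 2))     # 125.0
print(queue_length_per_disk(5, 2))    # 2.5
```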

## <span style="color:#cc5500;">SQL Performance Counters</span>

### <span style="color:#1e90ff;">Memory</span>

SQL Server performs much faster, and consumes fewer resources, if it can retrieve data from the buffer cache instead of incurring an I/O to read it from disk.  In some cases, memory intensive operations can force data pages out of the buffer cache before they ideally should be flushed out.  This can occur if the buffer cache is not large enough and the memory intensive operation needs more buffer space to work with. When this happens, the data pages that were flushed out to make extra room must again be read from disk, hurting performance.

There are several different SQL Server counters that you can watch to help determine if your SQL Server is experiencing such a problem.

SQL Server Buffer Mgr: Page Life Expectancy

- <span style="color:#ff0000;">Defined:</span> This performance monitor counter tells you, on average, how long data pages are staying in the buffer.
- <span style="color:#ff0000;">Threshold:</span>  If this value gets below 300 seconds, this is a potential indication that your SQL Server could use more memory in order to boost performance.

SQL Server Buffer Mgr: Lazy Writes/Sec

- <span style="color:#ff0000;">Defined:</span> This counter tracks how many times a second that the Lazy Writer process is moving dirty pages from the buffer to disk in order to free up buffer space.
- <span style="color:#ff0000;">Threshold:</span>  Generally speaking, this should not be a high value, no more than 20 per second.
- <span style="color:#ff0000;">Significance:</span>  Ideally, it should be close to zero.  If it is zero, this indicates that your SQL Server's buffer cache is sized well and SQL Server doesn't have to free up dirty pages, instead waiting for this to occur during regular checkpoints.  If this value is high, then a need for more memory is indicated.

SQL Server Buffer Mgr: Checkpoint Pages/Sec

- <span style="color:#ff0000;">Defined:</span> When a checkpoint occurs, all dirty pages are written to disk. This is a normal procedure and will cause this counter to rise during the checkpoint process.
- <span style="color:#ff0000;">Threshold:</span>  What you don't want to see is a high value for this counter over time. This can indicate that the checkpoint process is running more often than it should, which can use up valuable server resources.
- <span style="color:#ff0000;">Significance:</span> If this has a high figure (and this will vary from server to server), consider adding more RAM to reduce how often the checkpoint occurs, or consider increasing the "recovery interval" SQL Server configuration setting.

SQLServer: SQL Statistics: SQL Compilations/Sec

SQLServer: SQL Statistics: SQL Re-Compilations/Sec 

- <span style="color:#ff0000;">Defined:</span> Monitoring the number of query compilations and recompilations and the number of batches received by an instance of SQL Server gives you an indication of how quickly SQL Server is processing user queries and how effectively the query optimizer is processing the queries.  Compilation is a significant part of a query's turnaround time. In order to save the compilation cost, the Database Engine saves the compiled query plan in a query cache. The objective of the cache is to reduce compilation by storing compiled queries for later reuse, therefore ending the requirement to recompile queries when later executed. However, each unique query must be compiled at least one time.  This counter measures how many compilations are performed by SQL Server per second.
- <span style="color:#ff0000;">Threshold:</span>  Generally speaking, if this figure is over 100 compilations per second, or greater than 5 re-compilations per second, then you may be experiencing unnecessary compilation overhead.
- <span style="color:#ff0000;">Significance:</span>  A high number such as this might indicate that your server is just very busy, or it could mean that unnecessary compilations are being performed.   For example, compilations can be forced by SQL Server if an object's schema changes, if previously parallelized execution plans have to run serially, or if statistics are recomputed.    Query recompilations can be caused by the following factors: schema changes, including base schema changes such as adding columns or indexes to a table, or statistics schema changes such as inserting or deleting a significant number of rows from a table; and environment (SET statement) changes, where changes in session settings such as ANSI\_PADDING or ANSI\_NULLS can cause a query to be recompiled.

SQL Server General Statistics Object: User Connections

SQLServer:GeneralStatistics Logins/sec

- <span style="color:#ff0000;">Defined:</span> The number of user connections to SQL Server can affect performance.  The SQL Server General Statistics Object: User Connections counter displays the number of user connections to the SQL Server, not the number of users.
- <span style="color:#ff0000;">Threshold:</span> When evaluating this number, it is important to note that a single user can have multiple connections open, and also that multiple people can share a single user connection.
- <span style="color:#ff0000;">Significance:</span> Don't make the assumption that this number represents actual users.  Each user connection will consume about 8K of RAM.

SQL Server Buffer Manager Object: Buffer Cache Hit Ratio

- <span style="color:#ff0000;">Defined:</span> A key counter to watch is the SQL Server Buffer Manager Object: Buffer Cache Hit Ratio. This indicates how often SQL Server is able to retrieve data that is in the memory buffer and not the hard disk, to get data.  The higher this ratio, the less often SQL Server has to go to the hard disk to fetch data, and performance overall is boosted.
- <span style="color:#ff0000;">Threshold:</span>  In OLTP applications, this ratio should exceed 95%.  If it doesn't, then you may need to add more RAM to your server to increase performance.
- <span style="color:#ff0000;">Significance:</span>  Contrary to a common belief, this counter is not an average since SQL Server was last restarted.  Microsoft's documentation for the Buffer Manager object describes it as cache hits divided by cache lookups over the last few thousand page accesses, so there is no need to restart the service to get a current reading.  Be aware that a ratio that looks good can still hide a problem, so interpret it alongside Page Life Expectancy and Lazy Writes/Sec rather than on its own.

SQLServer:Memory Manager: Total Server Memory (KB) 

SQLServer:Memory Manager: Target Server Memory (KB). 

- <span style="color:#ff0000;">Defined:</span> The first counter, SQLServer:Memory Manager: Total Server Memory (KB), tells you how much the SQL Server ‘service’ is currently using. This includes the total of the buffers committed to the SQL Server Buffer Pool and the OS buffers of the type "OS in Use".
- <span style="color:#ff0000;">Defined:</span> The second counter, SQLServer:Memory Manager: Target Server Memory (KB), tells you how much memory SQL Server is willing to consume or ‘wants’. This is based on the number of buffers reserved by SQL Server when it is first started up.
- <span style="color:#ff0000;">Threshold:</span>  If, over time, the SQLServer:Memory Manager: Total Server Memory (KB) counter is less than the SQLServer:Memory Manager: Target Server Memory (KB) counter, then this means that SQL Server has enough memory to run efficiently. On the other hand, if the SQLServer:Memory Manager: Total Server Memory (KB) counter is greater than or equal to the SQLServer:Memory Manager: Target Server Memory (KB) counter, this indicates that SQL Server may be under memory pressure and could use access to more physical memory.

Memory Manager: Memory Grants Pending

- <span style="color:#ff0000;">Defined:</span> This is the total number of requests that are waiting for a workspace memory grant.
- <span style="color: rgb(255, 0, 0);">Threshold:</span>  This should be zero or very close to zero
- <span style="color: rgb(255, 0, 0);">Significance:</span> If this is greater than zero, this is an indicator of memory pressure.

### <span style="color:#1e90ff;">I/O Counters</span>

Access Methods object: Page Splits/sec

- <span style="color:#ff0000;">Defined:</span> There are many reasons for I/O bottlenecks, and one cause of excess I/O on a SQL Server is page splitting.  Page splitting occurs when an index or data page becomes full, and then is split between the current page and a newly allocated page.  That is the default behavior.  While page splitting is a normal process, excess page splitting can cause excessive disk I/O and contribute to slow performance.  To find out if SQL Server is experiencing a large number of page splits, monitor the SQL Server Access Methods object: Page Splits/sec.
- <span style="color:#ff0000;">Threshold:</span>  What is a high Page Splits/sec? There is no simple answer, as it somewhat depends on your system's I/O subsystem. But if you are having disk I/O performance problems on a regular basis, and this counter is over 100 on a regular basis, then you might want to experiment with the index fill factor to see if it helps or not.
- <span style="color:#ff0000;">Significance:</span> If it is observed that the number of page splits is high, consider changing the fill factor of your indexes.  The amount of free space in a data page can be increased by selecting a lower fill factor percentage.  This will help reduce the frequency of page splits because there is more room in data pages before it fills up and a page split has to occur.   (There is more ‘room’ for inserts).  Exercise caution when changing fill factor to a lower number as it will cause your data files to consume more disk space.

SQLServer: Databases: Log Flushes/sec

- <span style="color:#ff0000;">Defined:</span> This counter measures the number of log flushes per second.  This can be measured on a per database level, or for all databases on a SQL Server.  SQL Server uses a combination of in-memory buffer cache, data files and transaction log files to guarantee the ACID properties of a transaction.   Depending on the size and type of transactions occurring on a server, there can be frequent small log flushes, or less frequent but larger log flushes.
- <span style="color:#ff0000;">Threshold:</span> Example:  Stored Procedure (A) inserts a single record into a table and the DBA executes Stored Procedure (A) 10,000 times to populate a table to a total of 10,000 records.  Stored Procedure (B) inserts 10,000 records into a table and it is executed one time.  Log flushes contribute to I/O.  Although a simple example, it demonstrates the behavior of two completely different stored procedures: the first would tend to generate many small log flushes, and the second would have fewer but larger log flushes.

SQLServer: SQL Statistics: Batch Requests/Sec

- <span style="color:#ff0000;">Defined:</span> To get a sense of how busy SQL Server is, monitor the SQLServer: SQL Statistics: Batch Requests/Sec counter.  This counter measures the number of batch requests that SQL Server receives per second, and generally follows in step to how busy your server's CPUs are.
- <span style="color:#ff0000;">Threshold:</span> Generally, over 1000 batch requests per second indicates a very busy SQL Server, and could mean that if the server is not already experiencing a CPU bottleneck, that you may very well soon. This is somewhat of a relative target, and the more robust the hardware, the more batch requests per second SQL Server can handle.
- <span style="color:#ff0000;">Significance:</span> Some DBAs like to use the SQLServer: Databases: Transactions/sec: \_Total counter to measure total SQL Server activity, but this may not be a good choice. Transactions/sec only measures the number of transactions started in a database, not all activity, thus producing skewed results.  Instead, use the SQLServer: SQL Statistics: Batch Requests/Sec counter, which measures all SQL Server activity.

SQL Server Backup Device Object: Device Throughput Bytes/sec

- <span style="color:#ff0000;">Defined:</span> If you suspect that your backup or restore operations are running at less than optimal speeds, you can help verify this by using the SQL Server Backup Device Object: Device Throughput Bytes/sec.  This counter will give you a good feel for how fast your backups are performing.  You will also want to use this in conjunction with the Physical Disk Object: Avg. Disk Queue Length counter.  Most likely, if you are having backup or restore performance issues, it is because of an I/O or network bottleneck.
- <span style="color:#ff0000;">Threshold:</span>  For example, the cause of slow backups or restores could be something as simple as noticing that a large index rebuild and an SSIS package are running at the same time, and could be fixed by simply rescheduling the job.  Look for correlations.

### <span style="color:#1e90ff;">Lock Counters</span>

SQL Server Locks Object: Number of Deadlocks/sec

- <span style="color:#ff0000;">Defined:</span> Deadlocks are usually caused by poorly written code and poorly performing disks.  If the databases are experiencing deadlocks, they can be tracked by using the SQL Server Locks Object: Number of Deadlocks/sec.  You can also use the Profiler to track deadlocks and capture the statements that caused them.
- <span style="color:#ff0000;">Threshold:</span> It is not possible to generate specific guidance around this counter as all systems are different and have different locking behaviors.  The best advice is to use profiler to baseline this counter and then periodically re-profile and compare to the baseline.  It is important to know what is normal for your system.

SQL Server Locks Object: Average Wait Time (ms)

- <span style="color:#ff0000;">Defined:</span> If users are complaining that they have to wait for their transactions to complete, you may want to find out if object locking on the server is contributing to this problem.  Microsoft SQL Server provides information about SQL Server locks on individual resource types.  Locks are held on SQL Server resources, such as rows read or modified during a transaction, to prevent concurrent use of resources by different transactions.
- <span style="color:#ff0000;">Threshold:</span> You can use the SQL Server Locks Object: Average Wait Time (ms) counter to measure the average wait time of a variety of locks, including: database, extent, Key, Page, RID, and table.  As the DBA, you have to decide what an acceptable average wait time is.  One way to do this is to watch this counter over time for each of the lock types, finding average values for each type of lock.  Then use these average values as a point of reference.  For example, if the average wait time in milliseconds of RID (row) locks is 500, then you might consider any value over 500 as potentially a problem, especially if the value is a lot higher than 500, and extends over long periods of time.
- If you can identify one or more types of locks causing transaction delays, then you will want to investigate further to see if you can identify what specific transactions are causing the locking. The Profiler is the best tool for this detailed analysis of locking issues.
- <span style="color:#ff0000;">Significance:</span> For example, if an exclusive (X) lock is held on a row within a table by a transaction, no other transaction can modify that row until the lock is released.  Minimizing locks increases concurrency, which can improve performance. Multiple instances of the Locks object can be monitored at the same time, with each instance representing a lock on a resource type.  To do this, use the SQL Server Locks Object: Average Wait Time (ms).
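
The baseline approach described above (average each lock type over time, then compare current readings with that average) can be sketched in Python. Everything here is illustrative: the function names, the `factor` parameter, and the sample wait times are assumptions, and collecting the counter values is left out:

```python
from statistics import mean


def build_baseline(history):
    """Average the historical Average Wait Time (ms) samples for each lock type."""
    return {lock_type: mean(samples)
            for lock_type, samples in history.items() if samples}


def flag_lock_waits(baseline, current, factor=1.0):
    """Return lock types whose current wait exceeds baseline * factor."""
    flagged = {}
    for lock_type, wait_ms in current.items():
        reference = baseline.get(lock_type)
        if reference is not None and wait_ms > reference * factor:
            flagged[lock_type] = (wait_ms, reference)
    return flagged


# Hypothetical samples of Average Wait Time (ms) per lock instance.
history = {"RID": [450, 500, 550], "Page": [100, 120, 80], "Key": [200, 220, 180]}
baseline = build_baseline(history)
current = {"RID": 900, "Page": 90, "Key": 205}
print(flag_lock_waits(baseline, current))
print(flag_lock_waits(baseline, current, factor=1.5))
```

Raising `factor` above 1.0 is one way to ignore small excursions and flag only values that are a lot higher than the baseline for that lock type.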

SQL Server Access Methods Object: Full Scans/sec

- <span style="color:#ff0000;">Defined:</span> While table scans are a fact of life, and sometimes faster than index seeks, generally it is better to have fewer table scans than more.  To find out how many table scans your server is performing, use the SQL Server Access Methods Object: Full Scans/sec counter.  Note that this counter covers the entire server, not just a single database.  One thing you will notice with this counter is that scans often appear to occur in a periodic pattern.  In many cases, these are table scans SQL Server is performing on a regular basis for internal use.
- <span style="color:#ff0000;">Threshold:</span> What you want to look for are the random table scans that represent your application.  If you see what you consider to be an inordinate number of table scans, then use Profiler and the Database Engine Tuning Advisor (the successor to the Index Tuning Wizard) to help you determine exactly what is causing them, and whether adding indexes can help reduce the table scans.  It could also be that SQL Server is performing table scans instead of using indexes because it is just plain more efficient.
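
As a rough aid for spotting the scans that stand out, the sketch below flags Full Scans/sec samples that sit well above the typical level.  The rule used here (median plus `k` times the median absolute deviation), the function name `scan_spikes`, and the sample values are my own assumptions for illustration, not something SQL Server or Performance Monitor provides.  It only highlights unusual samples; deciding whether a spike is a periodic internal scan or application activity still means looking at when the spikes occur.

```python
from statistics import median


def scan_spikes(samples, k=3.0):
    """Return (index, value) pairs well above the typical Full Scans/sec level.

    A sample is treated as a spike when it exceeds median + k * MAD, where
    MAD is the median absolute deviation. Both the rule and k are assumptions.
    Note: if MAD is 0 (very flat data), any value above the median is flagged.
    """
    typical = median(samples)
    mad = median(abs(s - typical) for s in samples)
    limit = typical + k * mad
    return [(i, s) for i, s in enumerate(samples) if s > limit]


# Illustrative Full Scans/sec readings taken at a fixed sampling interval.
samples = [4, 5, 4, 6, 5, 40, 5, 4, 6, 5, 38, 4]
print(scan_spikes(samples))
```

If the flagged indexes are evenly spaced, the scans are likely the periodic kind described above.  Irregular spikes are the ones worth tracing back to application queries.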

### <span style="color:#1e90ff;">Latch Counters</span>

A latch is in essence a lightweight lock.  From a technical perspective, a latch is a lightweight, short-term object used for synchronization. A latch acts like a lock, in that its purpose is to prevent data from changing unexpectedly. For example, when a row of data is being moved from the buffer to the SQL Server storage engine, a latch is used by SQL Server during this move to prevent the data in the row from being changed during this very short time period. This not only applies to rows of data, but to index information as well, as it is retrieved by SQL Server. Like a lock, a latch can prevent access to rows in a database, which can hurt performance.  SQL Server provides ways to measure latch activity. They include:

SQL Server: Latches: Average Latch Wait Time (ms): 

- <span style="color: rgb(255, 0, 0);">Defined:</span> The wait time (in milliseconds) for latch requests that have to wait. Note here that this is a measurement for only those latches whose requests had to wait. In many cases, there is no wait. So keep in mind that this figure only applies for those latches that had to wait, not all latches.

SQL Server: Latches: Latch Waits/sec: 

- <span style="color:#ff0000;">Defined:</span> This is the number of latch requests per second that could not be granted immediately. In other words, this is how many latch requests had to wait during a one-second period. These are the same latch requests measured by Average Latch Wait Time (ms).

SQL Server: Latches: Total Latch Wait Time (ms): 

- <span style="color:#ff0000;">Defined:</span> This is the total latch wait time (in milliseconds) for latch requests in the last second. In essence, it is roughly the two counters above multiplied together for the most recent second: Average Latch Wait Time (ms) times Latch Waits/sec. When reading these figures, be sure you have read the scale on Performance Monitor correctly. The scale can change from counter to counter, and this can be confusing if you don't compare apples to apples. Based on experience, the Average Latch Wait Time (ms) counter will remain fairly constant over time, while you may see huge fluctuations in the other two counters, depending on what SQL Server is doing. Because each server is somewhat different, latch activity is different on each server. It is a good practice to get baseline numbers for each of these counters for your typical workload. This will allow you to compare typical latch activity against what is happening right now, letting you know if latch activity is higher or lower than typical.
- <span style="color: rgb(255, 0, 0);">Threshold:</span> If latch activity is higher than expected, this often indicates one of a few potential problems:

1. It may mean your SQL Server could use more memory.
2. If latch activity is high, check to see what your buffer cache hit ratio is. If it is below 99%, your server could probably benefit from more RAM.
3. If the buffer cache hit ratio is above 99%, then it could be the I/O system that is contributing to the problem, and a faster I/O system might benefit your server's performance.
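
To make the relationship between the three latch counters and the checks above concrete, here is a minimal Python sketch.  The function names, parameters, and the returned messages are hypothetical, and the inputs are assumed to be values you have already read from Performance Monitor.  It is a way to reason about the numbers, not a diagnostic tool.

```python
def estimated_total_latch_wait_ms(avg_wait_ms, waits_per_sec):
    """Approximate Total Latch Wait Time (ms) for one second.

    Average Latch Wait Time (ms) covers only requests that waited, and
    Latch Waits/sec counts those requests, so their product approximates
    the total time spent waiting during that second.
    """
    return avg_wait_ms * waits_per_sec


def latch_triage(latch_waits_per_sec, baseline_waits_per_sec,
                 buffer_cache_hit_ratio, hit_ratio_threshold=99.0):
    """Suggest where to look next when latch activity is above baseline.

    The 99% threshold comes from the text above; everything else, including
    the wording of the suggestions, is an illustrative assumption.
    """
    if latch_waits_per_sec <= baseline_waits_per_sec:
        return "latch activity within baseline"
    if buffer_cache_hit_ratio < hit_ratio_threshold:
        return "consider more memory"
    # Exactly 99% is not covered by the text; this sketch treats it as I/O.
    return "investigate the I/O subsystem"


print(estimated_total_latch_wait_ms(2.5, 400))
print(latch_triage(900, 400, 97.5))
print(latch_triage(900, 400, 99.6))
```

Comparing the estimated total against the Total Latch Wait Time (ms) counter itself is also a quick sanity check that you are reading the Performance Monitor scales correctly.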