## StatsOptimize

StatsOptimize is a stored procedure for updating statistics in Synapse SQL Pools. This can be critical for SQL Pools because, while they support auto create of statistics, they do not support auto update of statistics. This procedure acts as a replacement for auto update and allows for more control and flexibility.

Key features include:

- Dynamically determine the modification level based on an improved algorithm.
- Dynamically determine the sampling level, or support setting the sample level.
- Support removal of duplicate statistics (covering the same column).
- [Ola Hallengren](https://ola.hallengren.com/) style features such as CommandLog, @TimeLimit, and the @Tables parameter to set scope.

All executed commands are logged to the [CommandLog](https://github.com/ProdataSQL/SynapseTools/blob/main/SqlPools/Maintenance/CommandLog.sql) table.

It is based on best practice guidance from the Microsoft sites below and the community.

- [https://docs.microsoft.com/en-us/azure/synapse-analytics/sql/develop-tables-statistics](https://docs.microsoft.com/en-us/azure/synapse-analytics/sql/develop-tables-statistics)
- [https://github.com/abrahams1/Azure\_Synapse\_Toolbox/tree/master/SQL\_Queries/Statistics](https://github.com/abrahams1/Azure_Synapse_Toolbox/tree/master/SQL_Queries/Statistics)
- [https://www.sqlskills.com/blogs/tim/when-updating-statistics-is-too-expensive/](https://www.sqlskills.com/blogs/tim/when-updating-statistics-is-too-expensive/)
- [https://www.sqlskills.com/blogs/erin/updating-statistics-with-ola-hallengrens-script/](https://www.sqlskills.com/blogs/erin/updating-statistics-with-ola-hallengrens-script/)
- [https://docs.microsoft.com/en-us/sql/relational-databases/statistics/statistics?view=sql-server-2017](https://docs.microsoft.com/en-us/sql/relational-databases/statistics/statistics?view=sql-server-2017)
- [https://ola.hallengren.com/](https://ola.hallengren.com/)

### Usage:

```
exec [dbo].[StatsOptimize] @Tables, @StatisticsModificationLevel, @StatisticsSample, @OnlyModifiedStatistics, @DeleteOverlappingStats, @TimeLimit, @Execute
```

#### Parameters

#### @Tables

Selects the tables, and optionally columns, to include. The minus character (-) excludes objects, and wildcards (%) are supported as in a SQL LIKE clause. Use this to exclude more complex tables, exclude staging tables, or include only relevant schemas and objects.

| Value | Description |
| --- | --- |
| Null | All Tables in Pool |
| dbo.% | Tables in dbo schema |
| %.Fact% | All Fact tables, regardless of Schema |
| %.Fact%,-FactBig | All Fact tables, Except one called FactBig |
| %.%.Date,%.%.AccountKey | ONLY do stats maintenance on Date and AccountKey columns |

Note that stats maintenance is usually done at the table level, but there is also support for specifying column(s). This is a special case to support columns that need frequent updates, such as low cardinality incremental values (e.g. Business Date), as recommended by Microsoft [here](https://docs.microsoft.com/en-us/azure/synapse-analytics/sql/develop-tables-statistics).

#### @StatisticsModificationLevel

By default the SProc only updates a statistic if the number of modified rows is greater than either:

- 20%, or the value specified in the parameter
- an adaptive threshold of SQRT(1000 \* \[row count\]), based on the improved stats algorithm that SQL Server uses by default from SQL Server 2016 (compatibility level 130)
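
As a rough illustration of how the two thresholds compare, the arithmetic can be sketched in T-SQL. This only evaluates the two expressions described above for a hypothetical table size; the variable names are illustrative and are not taken from the procedure itself.

```sql
-- Illustration only: evaluates both thresholds for a hypothetical table.
-- @TableRows and @ModificationLevel are made-up names, not StatsOptimize internals.
DECLARE @TableRows bigint = 10000000;     -- hypothetical table with 10 million rows
DECLARE @ModificationLevel float = 20;    -- percentage threshold (the default described above)

SELECT
    @TableRows * @ModificationLevel / 100 AS PercentThresholdRows,   -- 20% of the rows
    SQRT(1000.0 * @TableRows)             AS AdaptiveThresholdRows;  -- SQRT(1000 * row count)
```

For 10 million rows the percentage threshold is 2,000,000 modified rows, while the adaptive threshold is 100,000, so on large tables the adaptive threshold is reached much sooner.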

The SProc uses the view vTableStats to return metadata such as the number of rows per table and the number of rows in the statistics. The main difference between SQL Pools and a traditional SQL Server is that in a SQL Pool we can only track the estimated row count at the table/partition level and not per statistic. For example, we do not have [sys.dm\_db\_stats\_properties](https://docs.microsoft.com/en-us/sql/relational-databases/system-dynamic-management-views/sys-dm-db-stats-properties-transact-sql?view=sql-server-ver15) with per-stat data, but we do have [pdw\_table\_distribution\_properties](https://docs.microsoft.com/en-us/sql/relational-databases/system-catalog-views/sys-pdw-table-distribution-properties-transact-sql?view=aps-pdw-2016-au7).

This omission is largely due to the fact that statistics are per distribution, so it is much more complex to amalgamate and synchronise the set of 60 stat objects. This is also why we have no auto update of statistics for SQL Pools (as of 31/08/2021).

#### @StatisticsSample

This is the sample rate. If this is null, the SProc uses an adaptive sample rate:

- sqrt(\[row count\]) \* 1000 / \[row count\] \* 100

While Synapse uses a default of 20% for the sample rate, there is a recommendation to use about 2% when we reach a billion rows. The adaptive sample rate generates recommendations such as:

| Row Count | Sample % |
| --- | --- |
| 1000 | FULLSCAN |
| 100k | FULLSCAN |
| 1 million | FULLSCAN |
| 10 million | 31 |
| 100 million | 10 |
| 1 billion | 3 |
| 10 billion+ | 1 |
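
The formula can be checked against the table with a small T-SQL sketch. The variable names are illustrative, and the truncation to a whole number and the floor of 1 are assumptions inferred from the table values rather than confirmed behaviour of the procedure.

```sql
-- Illustration only: evaluates the adaptive sample rate for a hypothetical table.
DECLARE @TableRows bigint = 100000000;   -- hypothetical table with 100 million rows
DECLARE @SamplePercent float;

-- sqrt([row count]) * 1000 / [row count] * 100
SET @SamplePercent = SQRT(CAST(@TableRows AS float)) * 1000 / @TableRows * 100;

SELECT CASE
           WHEN @SamplePercent >= 100 THEN 'FULLSCAN'   -- small tables: scan every row
           WHEN @SamplePercent < 1    THEN '1'          -- assumed floor of 1 percent
           ELSE CAST(CAST(@SamplePercent AS int) AS varchar(3))  -- assumed truncation
       END AS SampleRecommendation;
```

For 100 million rows the expression evaluates to 10, which matches the table above.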

#### @OnlyModifiedStatistics

Default Y. Set this to N to update statistics regardless of whether any rows have been modified.

#### @DeleteOverlappingStats

Default N. Set to Y to delete any auto stats which overlap an existing statistic.
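
To get a feel for which statistics might count as overlapping before enabling this option, a query along the following lines lists columns that are the leading column of more than one statistic. This is a hedged sketch against the standard catalog views (sys.stats, sys.stats_columns, sys.columns, sys.tables, sys.schemas); it is not the logic StatsOptimize itself uses, which may define overlap differently.

```sql
-- Sketch: columns that are the leading column of more than one statistic.
SELECT
    sch.name  AS SchemaName,
    t.name    AS TableName,
    c.name    AS ColumnName,
    COUNT(*)  AS StatsOnColumn,
    SUM(CASE WHEN s.auto_created = 1 THEN 1 ELSE 0 END) AS AutoCreatedStats
FROM sys.stats AS s
JOIN sys.stats_columns AS sc
    ON sc.object_id = s.object_id
   AND sc.stats_id  = s.stats_id
JOIN sys.columns AS c
    ON c.object_id = sc.object_id
   AND c.column_id = sc.column_id
JOIN sys.tables AS t
    ON t.object_id = s.object_id
JOIN sys.schemas AS sch
    ON sch.schema_id = t.schema_id
WHERE sc.stats_column_id = 1          -- leading column of each statistic
GROUP BY sch.name, t.name, c.name
HAVING COUNT(*) > 1;
```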

#### @TimeLimit

Default Null (no limit). Sets a time limit in seconds for the job to run. No more commands will be started after the time limit is reached (but commands already running will finish). Use this if you have a short maintenance window and do not want to exceed it.
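
For example, a job with a one hour window could be sketched as follows. The value 3600 is only an illustration; the other parameters are passed as null so that the defaults apply.

```sql
-- Stop starting new commands after 3600 seconds (1 hour); running commands still finish.
exec dbo.[StatsOptimize] @Tables=null, @StatisticsModificationLevel=null, @StatisticsSample=null, @OnlyModifiedStatistics=null, @DeleteOverlappingStats=null, @TimeLimit=3600, @Execute=null
```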

#### @Execute

Default Y. Set to N to only show the commands without executing or logging them. Useful for reviewing experimental maintenance commands before actually executing them.
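
A dry run might look like the following. Passing the flag as the string 'N' is an assumption based on the Y/N values described above; check the procedure definition for the actual parameter type.

```sql
-- Show the commands for fact tables without executing or logging them.
-- Assumption: @Execute accepts the character values 'Y' and 'N'.
exec dbo.[StatsOptimize] @Tables='%.Fact%', @StatisticsModificationLevel=null, @StatisticsSample=null, @OnlyModifiedStatistics=null, @DeleteOverlappingStats=null, @TimeLimit=null, @Execute='N'
```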

### Example Usage

Default best practice with smart defaults.

```
exec dbo.[StatsOptimize] @Tables=null, @StatisticsModificationLevel=null, @StatisticsSample=null, @OnlyModifiedStatistics=null, @DeleteOverlappingStats=null, @TimeLimit=null, @Execute=null
```

Update stats in the dbo schema except FactBig, and use a FULLSCAN if any rows have changed.

```
exec dbo.[StatsOptimize] @Tables='dbo.%,-FactBig', @StatisticsModificationLevel=0, @StatisticsSample=100, @OnlyModifiedStatistics=null, @DeleteOverlappingStats=null, @TimeLimit=null, @Execute=null
```

Update all column stats on the columns DateKey or AccountKey.

```
exec dbo.[StatsOptimize] @Tables='%.%.DateKey,%.%.AccountKey', @StatisticsModificationLevel=0, @StatisticsSample=null, @OnlyModifiedStatistics=null, @DeleteOverlappingStats=null, @TimeLimit=null, @Execute=null
```