# StatsOptimize

## Intro: Why Update Stats?
While a SqlPool can automatically create statistics when a table is created or when a query first operates on a column, it will never update those statistics. This may be OK if the table is only ever loaded once, but if the table has a lot of subsequent data movement, the original statistics will become out of date. We call this an out-of-date or **_stale statistic_**.
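
As a minimal sketch, a stale statistic can be refreshed by hand with `UPDATE STATISTICS`. The table is the demo table built later in this notebook, and the statistic name is taken from the demo output further down; both will differ in your own database. `WITH FULLSCAN` reads every row rather than a sample, which is more accurate but slower on very large tables.

```sql
-- Refresh every statistic on the table after a significant data change
UPDATE STATISTICS [dbo].[FactFinance1m];

-- Refresh one named statistic, reading all rows rather than a sample
-- (statistic name taken from the demo output; yours will differ)
UPDATE STATISTICS [dbo].[FactFinance1m] (_WA_Sys_00000004_746F28F1) WITH FULLSCAN;
```

Updating a single named statistic is cheaper than refreshing the whole table when only one column's distribution has drifted.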

If a table is empty or has very limited data, SQL Server may also fall back on “generic” statistics that assume 1,000 rows, regardless of the actual row count. We call this a **_missing statistic_**.
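
One way to avoid relying on generic estimates is to create (or update) statistics once the table actually holds data, and then confirm the row count the statistic recorded. This is a sketch only: the statistic name `stats_OrganizationKey` and the choice of column are illustrative assumptions, not part of the original notebook.

```sql
-- Hypothetical statistic: create a single-column statistic after the table has been loaded
CREATE STATISTICS stats_OrganizationKey
ON [dbo].[FactFinance1m] (OrganizationKey)
WITH FULLSCAN;

-- Row count stored in the statistic header...
DBCC SHOW_STATISTICS ('dbo.FactFinance1m', stats_OrganizationKey) WITH STAT_HEADER;

-- ...compared with the real number of rows in the table
SELECT COUNT_BIG(*) AS actual_row_count FROM [dbo].[FactFinance1m];
```

If the header's `Rows` value is far below the actual count, the optimizer is working from statistics that no longer describe the table.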

Lastly, we may have an auto-created statistic on a single column but ALSO another statistic covering that column. This is a waste of resources, as the query plan can only use one of the statistics. We call this **_overlapping statistics_**; the impact is wasted disk space and wasted time maintaining them.
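
A sketch of how overlapping statistics might be found using only the standard catalog views `sys.stats`, `sys.stats_columns` and `sys.columns` (it does not depend on the Prodata views). It lists auto-created single-column statistics whose column is also the leading, histogram-bearing column of another statistic on the same table. Treat the output as candidates to review, not as a drop list.

```sql
-- Auto-created single-column statistics whose column is also the
-- leading (histogram) column of another statistic on the same table
SELECT OBJECT_NAME(s.object_id) AS table_name,
       c.name                   AS column_name,
       s.name                   AS auto_stat_name,
       s2.name                  AS overlapping_stat_name
FROM sys.stats AS s
JOIN sys.stats_columns AS sc
  ON sc.object_id = s.object_id AND sc.stats_id = s.stats_id
JOIN sys.columns AS c
  ON c.object_id = sc.object_id AND c.column_id = sc.column_id
JOIN sys.stats AS s2
  ON s2.object_id = s.object_id AND s2.stats_id <> s.stats_id
JOIN sys.stats_columns AS sc2
  ON sc2.object_id = s2.object_id
 AND sc2.stats_id = s2.stats_id
 AND sc2.column_id = sc.column_id
 AND sc2.stats_column_id = 1            -- the other statistic leads with this column
WHERE s.auto_created = 1
  AND NOT EXISTS (                       -- keep only single-column auto stats
      SELECT 1
      FROM sys.stats_columns AS x
      WHERE x.object_id = s.object_id
        AND x.stats_id = s.stats_id
        AND x.stats_column_id > 1);
```

Only the leading column is checked because a multi-column statistic builds its histogram on its first column alone.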


## 2.1 First Things First: Check Auto Stats

You can confirm whether auto-create statistics is enabled with the query below.

This query also shows a few other settings:

- Whether Query Store is on (it probably should be)
- Whether Result Set Caching is enabled (it probably should be)
- That Auto Update Stats is not enabled (it is not currently supported on SqlPools)

If auto-create statistics is not enabled, you can enable it with the following command (recommended).

```sql
ALTER DATABASE <yourdatawarehousename>
SET AUTO_CREATE_STATISTICS ON;
```

More details are in the Microsoft article below:

<https://docs.microsoft.com/en-us/azure/synapse-analytics/sql/develop-tables-statistics>

```sql
-- Query Store, result set caching and auto stats settings for the current database
SELECT d.name, d.is_query_store_on, d.is_result_set_caching_on, d.is_auto_update_stats_on, d.is_auto_create_stats_on
FROM sys.databases d WHERE d.name = DB_NAME();
```

| name | is_query_store_on | is_result_set_caching_on | is_auto_update_stats_on | is_auto_create_stats_on |
| --- | --- | --- | --- | --- |
| AdventureWorksDW | 1 | 1 | 0 | 1 |


```sql
/*
    We can see auto stats in action by creating a table. At first it will only
    have statistics on the clustered index (CX).

    We have views that show stats at several levels, including:
    - dbo.vStats
    - dbo.vPartitionStats (you won't need this unless using partitioning at, say, > 300 billion rows)
*/

-- Drop the demo table if it already exists
IF OBJECT_ID('[dbo].[FactFinance1m]') IS NOT NULL
    DROP TABLE [dbo].[FactFinance1m];
GO

-- Copy 1 million rows from the larger fact table into a round-robin CCI table
CREATE TABLE [dbo].[FactFinance1m] WITH (
    DISTRIBUTION = ROUND_ROBIN, CLUSTERED COLUMNSTORE INDEX
) AS
SELECT TOP 1000000 * FROM [dbo].[FactFinance1b];
GO

-- Show the statistics on the new table
SELECT * FROM dbo.vStats WHERE table_name = 'FactFinance1m';
```



| object_id | stat_name | table_name | schema_name | stats_id | auto_created | filter_definition | last_updated_date | stat_columns | stats_row_count | actual_row_count | stats_difference_percent | sqlCommand |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1921441919 | ClusteredIndex_9c7f77bcccaf4aeb94d2b77e8bacc0f2 | FactFinance1m | dbo | 1 | 0 | | | AccountKey,ScenarioKey,DepartmentGroupKey,DateKey,OrganizationKey,Amount,Date,LineageKey,ID | 1000000 | 1000000 | 0.0 | `UPDATE STATISTICS [dbo].[FactFinance1m] (ClusteredIndex_9c7f77bcccaf4aeb94d2b77e8bacc0f2) WITH FULLSCAN` |


```sql
-- Aggregate queries on AccountKey and DateKey trigger auto-created statistics
SELECT COUNT(DISTINCT AccountKey) AS accountKeys, COUNT(DISTINCT DateKey) AS Dates
FROM [dbo].[FactFinance1m];

/* Prodata view to display statistics information */
SELECT * FROM dbo.vStats WHERE table_name = 'FactFinance1m';
```

| accountKeys | Dates |
| --- | --- |
| 68 | 23 |


| object_id | stat_name | table_name | schema_name | stats_id | auto_created | filter_definition | last_updated_date | stat_columns | stats_row_count | actual_row_count | stats_difference_percent | dynamic_threshold_rows | stats_sample_rate | recommend_update | sqlCommand |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1953442033 | ClusteredIndex_ecc8da729e134fe99547b46e1a9d1beb | FactFinance1m | dbo | 1 | 0 | | | AccountKey,ScenarioKey,DepartmentGroupKey,DateKey,OrganizationKey,Amount,Date,LineageKey,ID | 1000000 | 1000000 | 0.0 | 31622 | 100 | 0 | `UPDATE STATISTICS [dbo].[FactFinance1m] (ClusteredIndex_ecc8da729e134fe99547b46e1a9d1beb) WITH FULLSCAN` |
| 1953442033 | _WA_Sys_00000001_746F28F1 | FactFinance1m | dbo | 2 | 1 | | 2021-08-31 12:08:15.003 | AccountKey | 1000000 | 1000000 | 0.0 | 31622 | 100 | 0 | `UPDATE STATISTICS [dbo].[FactFinance1m] (_WA_Sys_00000001_746F28F1) WITH FULLSCAN` |
| 1953442033 | _WA_Sys_00000004_746F28F1 | FactFinance1m | dbo | 3 | 1 | | 2021-08-23 15:17:28.400 | DateKey | 1000000 | 1000000 | 0.0 | 31622 | 100 | 0 | `UPDATE STATISTICS [dbo].[FactFinance1m] (_WA_Sys_00000004_746F28F1) WITH FULLSCAN` |
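
The `dynamic_threshold_rows` value of 31622 is consistent with the dynamic auto-update threshold that SQL Server 2016 and later use at database compatibility level 130+, which is the smaller of 500 + 20% of the rows and $\sqrt{1000\,n}$. This is an inference from the numbers; the definition of `dbo.vStats` is not shown in this notebook. For the 1,000,000-row demo table:

$$
\text{threshold} = \min\left(500 + 0.20\,n,\ \sqrt{1000\,n}\right) = \min\left(200{,}500,\ \sqrt{10^{9}}\right) \approx \min\left(200{,}500,\ 31{,}622.8\right) = 31{,}622.8
$$

Truncated to a whole number this gives 31,622 rows. On large tables the square-root term is the smaller one, so the threshold grows much more slowly than the table itself.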


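If we run a query that aggregates on a column, we can see auto stats kick in: new auto-created statistics (auto_created = 1) appear on AccountKey and DateKey.

We can also see the potential for overlapping statistics on AccountKey above: AccountKey is the leading column of the clustered index statistic and also has its own auto-created statistic.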
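
If you decide the auto-created statistic on AccountKey is redundant, a sketch of removing it follows. The statistic name comes from the demo output above and will differ in your database. It is also possible that auto-create builds a new statistic the next time a query touches the column, so re-check `dbo.vStats` afterwards.

```sql
-- Drop the single-column auto statistic that overlaps the clustered index statistic
DROP STATISTICS [dbo].[FactFinance1m].[_WA_Sys_00000001_746F28F1];

-- Confirm which statistics remain on the table
SELECT * FROM dbo.vStats WHERE table_name = 'FactFinance1m';
```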

```sql
/* We can see what a statistics object looks like with the query below */
DBCC SHOW_STATISTICS ('dbo.FactFinance1m', _WA_Sys_00000004_746F28F1);
```



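| Name | Updated | Rows | Rows Sampled | Steps | Density | Average key length | String Index | Filter Expression | Unfiltered Rows | Persisted Sample Percent |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| _WA_Sys_00000004_746F28F1 | Aug 23 2021 3:17PM | 1000000 | 1000000 | 23 | 0.04347826 | 4 | NO | | 1000000 | 0 |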
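
The header's Density value is consistent with one over the number of distinct DateKey values: the earlier `COUNT(DISTINCT DateKey)` returned 23, which also matches the 23 histogram steps.

$$
\text{Density} = \frac{1}{23} \approx 0.04347826
$$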


| All density | Average Length | Columns |
| --- | --- | --- |
| 0.05928854 | 4 | DateKey |


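| RANGE_HI_KEY | RANGE_ROWS | EQ_ROWS | DISTINCT_RANGE_ROWS | AVG_RANGE_ROWS |
| --- | --- | --- | --- | --- |
| 20110501 | 0 | 19224 | 0 | 1 |
| 20110531 | 0 | 29336 | 0 | 1 |
| 20110701 | 0 | 34396 | 0 | 1 |
| 20110801 | 0 | 6204 | 0 | 1 |
| 20110829 | 0 | 17116 | 0 | 1 |
| 20110929 | 0 | 60188 | 0 | 1 |
| 20111129 | 0 | 70544 | 0 | 1 |
| 20111228 | 0 | 2256 | 0 | 1 |
| 20120229 | 0 | 31768 | 0 | 1 |
| 20120330 | 0 | 20000 | 0 | 1 |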
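
To read the histogram: when a query filters on a value that is exactly a RANGE_HI_KEY, the optimizer's row estimate is that step's EQ_ROWS, so the step for 20110929 predicts 60,188 rows. A quick way to compare that estimate with reality is sketched below. Because this statistic was built with a full scan (Rows Sampled equals Rows), the two should agree as long as the table has not changed since the statistic was last updated.

```sql
-- Histogram step RANGE_HI_KEY = 20110929 has EQ_ROWS = 60188
SELECT COUNT_BIG(*) AS actual_rows
FROM [dbo].[FactFinance1m]
WHERE DateKey = 20110929;
```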
