## 2.2 Locating Stale Stats

SQL pools have auto create stats, but unlike traditional SQL Server they do not have auto update stats, so we need to maintain statistics ourselves.

How can stats get out of date, and how can we decide which stats are up to date and which need updating?
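A rough first check is to look at when each statistic was last updated. The query below is a minimal sketch, assuming the standard catalog views sys.stats, sys.tables and sys.schemas and the STATS_DATE function are available in your pool. It only reports dates; it does not tell you how many rows have changed since the last update.

```sql
/* Sketch: list every statistic with the date it was last updated.
   A NULL date means the statistic has never been updated. */
SELECT  sm.[name]  AS [schema_name],
        tb.[name]  AS [table_name],
        st.[name]  AS [stat_name],
        st.[auto_created],
        STATS_DATE(st.[object_id], st.[stats_id]) AS [last_updated_date]
FROM    sys.stats   st
JOIN    sys.tables  tb ON tb.[object_id] = st.[object_id]
JOIN    sys.schemas sm ON sm.[schema_id] = tb.[schema_id]
ORDER BY [last_updated_date];
```

A date alone is a weak signal: an old statistic on a static table is fine, while a recent one on a heavily loaded table may already be stale.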

### 2.2.1 Example of Stale Stats

If a table is created with fewer than 1,000 rows, the SQL pool may create stats, but it will "assume" 1,000 rows. As new data is loaded, this stat is not refreshed and becomes increasingly stale.

It's especially a problem if we create empty tables or very small tables before adding more data. For example:

In [3]:
IF OBJECT_ID('[dbo].[FactFinance100m_nostats]') is not null 
	DROP TABLE [dbo].[FactFinance100m_nostats]
GO
CREATE TABLE [dbo].[FactFinance100m_nostats] WITH (
	DISTRIBUTION = ROUND_ROBIN, CLUSTERED COLUMNSTORE INDEX 
) AS
SELECT  TOP 100 *  FROM [dbo].[FactFinance1b] 
GO

SELECT count(distinct DateKey) as DateKey, count(distinct AccountKey)  as AccountKey from FactFinance100m_nostats

/* Prodata sample view to analyse Stats on Tables */
SELECT * FROM dbo.vStats WHERE table_name='FactFinance100m_nostats'

DateKey,AccountKey
1,1


object_id,stat_name,table_name,schema_name,stats_id,auto_created,filter_definition,last_updated_date,stat_columns,stats_row_count,actual_row_count,stats_difference_percent,stats_sample_rate,sqlCommand
354868381,ClusteredIndex_790bfcda58a449a7a8517a169f0e7e0a,FactFinance100m_nostats,dbo,1,0,,,"DateKey,DepartmentGroupKey,ScenarioKey,OrganizationKey,AccountKey,Amount,Date,LineageKey",100,100,0.0,100,UPDATE STATISTICS [dbo].[FactFinance100m_nostats] (ClusteredIndex_790bfcda58a449a7a8517a169f0e7e0a) WITH FULLSCAN
354868381,_WA_Sys_00000001_1526DC9D,FactFinance100m_nostats,dbo,2,1,,2022-07-08 15:44:33.740,DateKey,100,100,0.0,100,UPDATE STATISTICS [dbo].[FactFinance100m_nostats] (_WA_Sys_00000001_1526DC9D) WITH FULLSCAN
354868381,_WA_Sys_00000005_1526DC9D,FactFinance100m_nostats,dbo,3,1,,2022-07-08 15:44:34.317,AccountKey,100,100,0.0,100,UPDATE STATISTICS [dbo].[FactFinance100m_nostats] (_WA_Sys_00000005_1526DC9D) WITH FULLSCAN


In [4]:
/* Let's now insert 100 million rows */
INSERT INTO FactFinance100m_nostats ([AccountKey], [ScenarioKey], [DepartmentGroupKey], [DateKey], [OrganizationKey], [Amount], [Date], [LineageKey])
SELECT  TOP 100000000 [AccountKey], [ScenarioKey], [DepartmentGroupKey], [DateKey], [OrganizationKey], [Amount], [Date], [LineageKey]  
FROM [dbo].[FactFinance1b]  

In [6]:
/* 
    We can see that the stats are now very out of date:
    stats_row_count is still 100, while actual_row_count is 100,000,100
*/
SELECT * FROM dbo.vStats WHERE table_name='FactFinance100m_nostats'

object_id,stat_name,table_name,schema_name,stats_id,auto_created,filter_definition,last_updated_date,stat_columns,stats_row_count,actual_row_count,stats_difference_percent,stats_sample_rate,sqlCommand
354868381,ClusteredIndex_790bfcda58a449a7a8517a169f0e7e0a,FactFinance100m_nostats,dbo,1,0,,,"DateKey,DepartmentGroupKey,ScenarioKey,OrganizationKey,AccountKey,Amount,Date,LineageKey",100,100000100,100.0,9,UPDATE STATISTICS [dbo].[FactFinance100m_nostats] (ClusteredIndex_790bfcda58a449a7a8517a169f0e7e0a) WITH SAMPLE 9 PERCENT
354868381,_WA_Sys_00000001_1526DC9D,FactFinance100m_nostats,dbo,2,1,,2022-07-08 15:44:33.740,DateKey,100,100000100,100.0,9,UPDATE STATISTICS [dbo].[FactFinance100m_nostats] (_WA_Sys_00000001_1526DC9D) WITH SAMPLE 9 PERCENT
354868381,_WA_Sys_00000005_1526DC9D,FactFinance100m_nostats,dbo,3,1,,2022-07-08 15:44:34.317,AccountKey,100,100000100,100.0,9,UPDATE STATISTICS [dbo].[FactFinance100m_nostats] (_WA_Sys_00000005_1526DC9D) WITH SAMPLE 9 PERCENT


Rather than "per stats", we track metadata per table due to the limitations of the DMVs in the SQL pool. A sample view provided by Prodata is shown below. It provides some critical information:

- stats\_row\_count. The estimated number of rows when the stats were last rebuilt, taken from the DMV pdw\_table\_distribution\_properties.
- actual\_row\_count. The actual number of rows, taken from the DMV sys.dm\_pdw\_nodes\_db\_partition\_stats.
- dynamic\_threshold\_row. The recommended threshold for updating stats, using the same algorithm(s) as traditional SQL Server: the lower of a 20% difference or SQRT(1000 \* \[row count\]).
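The threshold rule in the last bullet can be worked through for a single row count. This is a minimal sketch of the arithmetic as described above, not the definition used inside the Prodata view; the variable @row_count is a hypothetical input, set here to the 100,000,100 rows from the example.

```sql
/* Sketch: the lower of a 20% difference or SQRT(1000 * row count).
   @row_count is an illustrative input, not a column of the view. */
DECLARE @row_count bigint = 100000100;

SELECT  @row_count                AS row_count,
        0.20 * @row_count         AS threshold_20_percent,
        SQRT(1000.0 * @row_count) AS threshold_sqrt,
        CASE WHEN 0.20 * @row_count < SQRT(1000.0 * @row_count)
             THEN 0.20 * @row_count
             ELSE SQRT(1000.0 * @row_count)
        END                       AS dynamic_threshold_row;
```

For a table of this size the square-root term is by far the lower of the two (roughly 316,000 rows against about 20 million), so on large tables the threshold is reached long before 20% of the rows have changed.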