## 2.2 Locating Stale Stats

While SQL pools do have auto create stats, they do not have auto update stats like traditional SQL Server, so we need to maintain statistics ourselves.

How can stats get out of date, and how can we decide which stats are up to date and which need updating?
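
As a rough first look, independent of any helper view, the catalog can be queried directly for the date each statistics object was last updated. This is a minimal sketch that assumes the standard `sys.stats`, `sys.tables` and `sys.schemas` catalog views and the `STATS_DATE` function are available in the pool. It shows only the date, not how far the row counts have drifted since.

```sql
/* Sketch: last update date for every statistics object on user tables.
   The date may come back NULL for some statistics objects. */
SELECT  sm.[name]  AS [schema_name],
        tb.[name]  AS [table_name],
        st.[name]  AS [stat_name],
        st.[auto_created],
        STATS_DATE(st.[object_id], st.[stats_id]) AS [last_updated_date]
FROM    sys.stats   st
JOIN    sys.tables  tb ON tb.[object_id] = st.[object_id]
JOIN    sys.schemas sm ON sm.[schema_id] = tb.[schema_id]
ORDER BY [last_updated_date];
```

A date alone does not say whether a stat is stale, since a table that has not changed does not need its stats rebuilt. Comparing row counts is the more useful signal.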

### 2.2.1 Example of Stale Stats

If a table is created with under 1,000 rows then the SQL pool may create stats, but it will "assume" 1,000 rows. As new data is loaded, this stat becomes increasingly stale.

It's especially a problem if we create empty tables or very small tables before adding more data. For example:

In [4]:
IF OBJECT_ID('[dbo].[FactFinance100m_nostats]') is not null 
	DROP TABLE [dbo].[FactFinance100m_nostats]
GO
CREATE TABLE [dbo].[FactFinance100m_nostats] WITH (
	DISTRIBUTION = ROUND_ROBIN, CLUSTERED COLUMNSTORE INDEX 
) AS
SELECT  TOP 100 *  FROM [dbo].[FactFinance1b] 
GO

SELECT count(distinct DateKey) as DateKey, count(distinct AccountKey)  as AccountKey from FactFinance100m_nostats

/* Prodata sample view to analyse Stats on Tables */
SELECT * FROM dbo.vStats WHERE table_name='FactFinance100m_nostats'

DateKey,AccountKey
1,1


object_id,stat_name,table_name,schema_name,stats_id,auto_created,filter_definition,last_updated_date,stat_columns,stats_row_count,actual_row_count,stats_difference_percent,dynamic_threshold_rows,stats_sample_rate,recommend_update,sqlCommand
1655012977,ClusteredIndex_3acd8b271a154db3883ebb27ae616b9d,FactFinance100m_nostats,dbo,1,0,,,"AccountKey,ScenarioKey,DepartmentGroupKey,DateKey,OrganizationKey,Amount,Date,LineageKey,ID",100,100,0.0,316,100,0,UPDATE STATISTICS [dbo].[FactFinance100m_nostats] (ClusteredIndex_3acd8b271a154db3883ebb27ae616b9d) WITH FULLSCAN
1655012977,_WA_Sys_00000001_62A57E71,FactFinance100m_nostats,dbo,2,1,,2021-08-31 12:25:20.087,AccountKey,100,100,0.0,316,100,0,UPDATE STATISTICS [dbo].[FactFinance100m_nostats] (_WA_Sys_00000001_62A57E71) WITH FULLSCAN
1655012977,_WA_Sys_00000004_62A57E71,FactFinance100m_nostats,dbo,3,1,,2021-08-31 12:25:20.663,DateKey,100,100,0.0,316,100,0,UPDATE STATISTICS [dbo].[FactFinance100m_nostats] (_WA_Sys_00000004_62A57E71) WITH FULLSCAN


In [8]:
/* Let's now insert 100 million rows */
INSERT INTO FactFinance100m_nostats ([AccountKey], [ScenarioKey], [DepartmentGroupKey], [DateKey], [OrganizationKey], [Amount], [Date], [LineageKey])
SELECT  TOP 100000000 [AccountKey], [ScenarioKey], [DepartmentGroupKey], [DateKey], [OrganizationKey], [Amount], [Date], [LineageKey]  
FROM [dbo].[FactFinance1b]  

In [9]:
/* 
    We can see that the stats are now very out of date:
    stats_row_count is still 100 while actual_row_count is 100,000,100
*/
SELECT * FROM dbo.vStats WHERE table_name='FactFinance100m_nostats'

object_id,stat_name,table_name,schema_name,stats_id,auto_created,filter_definition,last_updated_date,stat_columns,stats_row_count,actual_row_count,stats_difference_percent,dynamic_threshold_rows,stats_sample_rate,recommend_update,sqlCommand
1655012977,ClusteredIndex_3acd8b271a154db3883ebb27ae616b9d,FactFinance100m_nostats,dbo,1,0,,,"AccountKey,ScenarioKey,DepartmentGroupKey,DateKey,OrganizationKey,Amount,Date,LineageKey,ID",100,100000100,100.0,316227,9,1,UPDATE STATISTICS [dbo].[FactFinance100m_nostats] (ClusteredIndex_3acd8b271a154db3883ebb27ae616b9d) WITH SAMPLE 9 PERCENT
1655012977,_WA_Sys_00000001_62A57E71,FactFinance100m_nostats,dbo,2,1,,2021-08-31 12:25:20.087,AccountKey,100,100000100,100.0,316227,9,1,UPDATE STATISTICS [dbo].[FactFinance100m_nostats] (_WA_Sys_00000001_62A57E71) WITH SAMPLE 9 PERCENT
1655012977,_WA_Sys_00000004_62A57E71,FactFinance100m_nostats,dbo,3,1,,2021-08-31 12:25:20.663,DateKey,100,100000100,100.0,316227,9,1,UPDATE STATISTICS [dbo].[FactFinance100m_nostats] (_WA_Sys_00000004_62A57E71) WITH SAMPLE 9 PERCENT


Rather than "per stats" we track metadata per table, due to the limitations of the DMVs in the SQL pool. A sample view provided by Prodata is shown below. This provides some critical information:

- stats\_row\_count. The estimated number of rows when the stats were last rebuilt, using the DMV pdw\_table\_distribution\_properties.
- actual\_row\_count. The actual number of rows, using the DMV sys.dm\_pdw\_nodes\_db\_partition\_stats.
- dynamic\_threshold\_rows. The recommended threshold for updating stats, using the same algorithm(s) as traditional SQL Server: the lower of 20% difference or SQRT(1000 \* \[row count\]).
- stats\_sample\_rate (recommended sample rate). We use an adaptive formula for this, so that 1 billion rows is about 3% and less than 1 million rows is a FULLSCAN. The default otherwise is 20%.
- recommend\_update. 1 = we should consider updating stats on the table.
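
As a worked check of the threshold rule, the query below computes SQRT(1000 \* rows) for the two row counts used in this example (100 and 100,000,100). It also computes 500 + 20% of rows, which is how the percentage half of the rule is documented for SQL Server. Whether the view uses exactly that form is an assumption, since its definition is not shown here.

```sql
/* Sketch: dynamic threshold arithmetic for the two row counts in this example */
SELECT  r.row_count,
        CAST(SQRT(1000.0 * r.row_count) AS bigint) AS sqrt_rule_rows,
        /* Assumption: SQL Server style "500 + 20%" rule */
        CAST(500 + 0.20 * r.row_count AS bigint)   AS percent_rule_rows
FROM (
        SELECT CAST(100 AS bigint) AS row_count
        UNION ALL
        SELECT CAST(100000100 AS bigint)
     ) AS r;
```

The square-root rule should work out to 316 and 316227 rows, in line with the dynamic\_threshold\_rows values in the view output for this table. On large tables the square-root rule is by far the lower of the two, so it is the one that triggers the recommendation.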

In [11]:
select * from vTableStats WHERE table_name='FactFinance100m_nostats'

object_id,schema_name,table_name,table_type,distribution_type,stats_row_count,actual_row_count,stats_difference_percent,dynamic_threshold_rows,stats_sample_rate,recommend_update,sqlCommand
1655012977,dbo,FactFinance100m_nostats,CLUSTERED COLUMNSTORE,ROUND_ROBIN,100,100000100,100.0,316227,9,1,UPDATE STATISTICS [dbo].[FactFinance100m_nostats] WITH SAMPLE 9 PERCENT
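
To act on the recommendation, the statement in the sqlCommand column can be run as is and the view queried again. This is a sketch only: the 9 percent sample rate is simply what the view suggested for this table in the output above, and the run time will depend on the pool size.

```sql
/* Run the command suggested by the sqlCommand column for this table */
UPDATE STATISTICS [dbo].[FactFinance100m_nostats] WITH SAMPLE 9 PERCENT;

/* Re-check the view to compare stats_row_count with actual_row_count */
SELECT  table_name, stats_row_count, actual_row_count, recommend_update
FROM    vTableStats
WHERE   table_name = 'FactFinance100m_nostats';
```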
