# When Do We Optimize Column Stores?

## There are three scenarios that may mark a column store as needing maintenance

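- The average number of rows per compressed rowgroup (segment) is below the optimal maximum of 1,048,576 (1024\*1024). We call this density and express it as a percentage. For example, 10% density **fragmentation** means each segment is, on average, only 90% full.
- The table contains a large number of soft-**deleted** rows. These rows are flagged as deleted but are still physically stored in the compressed rowgroups.
- The table contains a large number of **inserted** rows that the [tuple mover](https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-index) has not yet compressed. Rows wait in an OPEN delta rowgroup until it reaches 1,048,576 rows, and a dedicated SQL pool spreads data over 60 distributions. It can therefore take more than 60 million rows (one full rowgroup per distribution) before the tuple mover is engaged.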
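You can check the arithmetic behind the first and third scenarios directly in T-SQL. This is a minimal sketch. The 60 is the fixed number of distributions in a dedicated SQL pool. The density formula `100 * (1 - average rows per compressed rowgroup / 1,048,576)` is an assumption. It is inferred from the `fragmentation_density` values that `dbo.vColumnstoreStats` reports later in this notebook (it reproduces 68.21 and 48.72), not taken from the view's definition.

```sql
-- Rows in one full rowgroup, and rows needed before each of the 60 distributions can fill one
SELECT 1024 * 1024      AS max_rows_per_rowgroup,             -- 1,048,576
       60 * 1024 * 1024 AS rows_to_fill_one_per_distribution; -- 62,914,560

-- Density fragmentation, ASSUMED formula: 100 * (1 - avg rows per compressed rowgroup / 1,048,576)
-- Rowgroups averaging 943,718 rows are about 90% full, i.e. about 10% fragmented
SELECT CAST(100.0 * (1 - 943718 / 1048576.0) AS DECIMAL(5, 2)) AS fragmentation_density; -- 10.00
```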


### Demo - Fragmentation and Density

```sql
/* Demo: examine column store density after an initial load of 130 million rows into FactFinance100m */
IF OBJECT_ID('[dbo].[FactFinance100m]') IS NOT NULL
	DROP TABLE [dbo].[FactFinance100m]
GO
CREATE TABLE [dbo].[FactFinance100m] WITH (
	DISTRIBUTION = ROUND_ROBIN, CLUSTERED COLUMNSTORE INDEX
) AS
SELECT TOP 130000000 * FROM [dbo].[FactFinance1b]
```



```sql
/* The DMV sys.dm_pdw_nodes_db_column_store_row_group_physical_stats tracks the size of each rowgroup (segment)
   and WHY it was closed (trim_reason_desc).
   The view [dbo].[vCS_rg_physical_stats] from https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-memory-optimizations-for-columnstore-compression
   makes it much easier to see why segments aren't full.

   BULK INSERT, over-partitioning and over-enthusiastic REORGANIZE are the common causes of undersized rowgroups.

   In this case we see only OPEN rowgroups and compressed rowgroups with trim reason NO_TRIM.
*/
SELECT * FROM [dbo].[vCS_rg_physical_stats] WHERE logical_table_name = 'FactFinance100m'
```

| logical_table_name | row_group_id | partition_number | state | state_desc | total_rows | trim_reason_desc | physical_name | created_time |
|---|---|---|---|---|---|---|---|---|
| FactFinance100m | 2 | 1 | 1 | OPEN | 69172 |  | Table_5e36fad5620b4f1c8f3eb0d9fc992bec_1 | 2022-07-08 14:20:27.257 |
| FactFinance100m | 2 | 1 | 1 | OPEN | 69172 |  | Table_5e36fad5620b4f1c8f3eb0d9fc992bec_2 | 2022-07-08 14:20:27.217 |
| FactFinance100m | 2 | 1 | 1 | OPEN | 69172 |  | Table_5e36fad5620b4f1c8f3eb0d9fc992bec_3 | 2022-07-08 14:20:27.210 |
| FactFinance100m | 2 | 1 | 1 | OPEN | 69172 |  | Table_5e36fad5620b4f1c8f3eb0d9fc992bec_4 | 2022-07-08 14:20:27.003 |
| FactFinance100m | 2 | 1 | 1 | OPEN | 69172 |  | Table_5e36fad5620b4f1c8f3eb0d9fc992bec_5 | 2022-07-08 14:20:27.113 |
| FactFinance100m | 2 | 1 | 1 | OPEN | 69172 |  | Table_5e36fad5620b4f1c8f3eb0d9fc992bec_6 | 2022-07-08 14:20:27.247 |
| FactFinance100m | 2 | 1 | 1 | OPEN | 69172 |  | Table_5e36fad5620b4f1c8f3eb0d9fc992bec_7 | 2022-07-08 14:20:27.123 |
| FactFinance100m | 2 | 1 | 1 | OPEN | 69172 |  | Table_5e36fad5620b4f1c8f3eb0d9fc992bec_8 | 2022-07-08 14:20:27.193 |
| FactFinance100m | 2 | 1 | 1 | OPEN | 69736 |  | Table_5e36fad5620b4f1c8f3eb0d9fc992bec_9 | 2022-07-08 14:20:26.993 |
| FactFinance100m | 2 | 1 | 1 | OPEN | 69736 |  | Table_5e36fad5620b4f1c8f3eb0d9fc992bec_10 | 2022-07-08 14:20:27.173 |
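Listing every rowgroup quickly gets long, because this table has 180 of them. The minimal sketch below summarizes the same view by state and trim reason instead. It uses only the columns visible in the output above and assumes `[dbo].[vCS_rg_physical_stats]` from the linked Microsoft article already exists in the database.

```sql
-- Summarize rowgroups by state and trim reason instead of listing each one.
-- Assumes [dbo].[vCS_rg_physical_stats] exists with the columns shown in the output above.
SELECT state_desc,
       trim_reason_desc,
       COUNT(*)                        AS row_groups,
       MIN(total_rows)                 AS min_rows,
       AVG(CAST(total_rows AS BIGINT)) AS avg_rows,  -- cast avoids INT overflow on large sums
       MAX(total_rows)                 AS max_rows
FROM [dbo].[vCS_rg_physical_stats]
WHERE logical_table_name = 'FactFinance100m'
GROUP BY state_desc, trim_reason_desc
ORDER BY state_desc, trim_reason_desc;
```

For the 130 million row table, the statistics in the next cell (120 compressed rowgroups with max and average 1,048,576, plus 60 open rowgroups) suggest this should collapse to two rows: the full compressed rowgroups and the OPEN overspill.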



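```sql
/* fragmentation_density is 0.0 (perfect): every compressed rowgroup holds 1,048,576 rows.
   The 60 OPEN delta rowgroups (one per distribution, about 69.5k rows each) hold the overspill.
   This is not unusual: 130 million rows over 60 distributions is only about 2 complete rowgroups per distribution. */
SELECT * FROM dbo.vColumnstoreStats WHERE table_name = 'FactFinance100m'
```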



| execution_date | database_name | schema_name | table_name | partition_number | partition_scheme | object_id | index_name | row_count | deleted_row_count | row_group_count | compressed_row_count | compressed_rowgroup_count | open_rowgroup_count | open_row_count | compressed_row_max | compressed_row_avg | fragmentation_density | fragmentation_deletes | fragmentation_open |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2022-07-08 14:21:21.870 | AdventureWorksDW | dbo | FactFinance100m |  |  | 290868153 | ClusteredIndex_e2f1410ade274620bfb52e91dcee6ddb | 130000000 | 0 | 180 | 125829120 | 120 | 60 | 4170880 | 1048576 | 1048576 | 0.0 | 0.0 | 6.63 |
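Judging by their names, the three maintenance scenarios correspond to the `fragmentation_density`, `fragmentation_deletes` and `fragmentation_open` columns of `dbo.vColumnstoreStats`. That mapping is an inference, because the view's definition is not shown here. The sketch below checks all tables at once. The thresholds (20, 10, 10) are hypothetical placeholders to tune for your own workload.

```sql
-- Flag tables (and partitions) that may need maintenance.
-- Column-to-scenario mapping is inferred from column names; thresholds are HYPOTHETICAL placeholders.
SELECT schema_name,
       table_name,
       partition_number,
       row_count,
       fragmentation_density,   -- scenario 1: compressed rowgroups below 1,048,576 rows
       fragmentation_deletes,   -- scenario 2: soft-deleted rows
       fragmentation_open       -- scenario 3: rows not yet compressed (OPEN rowgroups)
FROM dbo.vColumnstoreStats
WHERE fragmentation_density > 20
   OR fragmentation_deletes > 10
   OR fragmentation_open > 10
ORDER BY fragmentation_density DESC;
```

With these placeholder thresholds, the 130 million row table above (0.0, 0.0, 6.63) would not be flagged. The 20 million row table in the next demo (fragmentation_density 68.21) would be.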


```sql
/*
    Let's create the same table with just 20 million rows (not even one complete rowgroup per distribution),
    then insert in batches to reach about 120 million rows.

    Here we can see every rowgroup was closed prematurely with trim reason BULKLOAD. When a bulk operation
    delivers at least 102,400 rows to a distribution, those rows are compressed directly into a rowgroup,
    even if it holds far fewer than the maximum 1,048,576 rows.

    The result is 68.21% density fragmentation: 60 small rowgroups of about 333k rows each.
*/

IF OBJECT_ID('[dbo].[FactFinance100m]') IS NOT NULL
	DROP TABLE [dbo].[FactFinance100m]
GO
CREATE TABLE [dbo].[FactFinance100m] WITH (
	DISTRIBUTION = ROUND_ROBIN, CLUSTERED COLUMNSTORE INDEX
) AS
SELECT TOP 20000000 * FROM [dbo].[FactFinance1b]
GO
SELECT * FROM dbo.vColumnstoreStats WHERE table_name = 'FactFinance100m'
SELECT * FROM [dbo].[vCS_rg_physical_stats] WHERE logical_table_name = 'FactFinance100m'
```


| execution_date | database_name | schema_name | table_name | partition_number | partition_scheme | object_id | index_name | row_count | deleted_row_count | row_group_count | compressed_row_count | compressed_rowgroup_count | open_rowgroup_count | open_row_count | compressed_row_max | compressed_row_avg | fragmentation_density | fragmentation_deletes | fragmentation_open |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2022-07-08 13:48:55.610 | AdventureWorksDW | dbo | FactFinance100m |  |  | 274868096 | ClusteredIndex_f0d644e25550454eb739471226eee313 | 20000000 | 0 | 60 | 20000000 | 60 | 0 | 0 | 333884 | 333333 | 68.21 | 0.0 | 0.0 |


| logical_table_name | row_group_id | partition_number | state | state_desc | total_rows | trim_reason_desc | physical_name | created_time |
|---|---|---|---|---|---|---|---|---|
| FactFinance100m | 0 | 1 | 3 | COMPRESSED | 333324 | BULKLOAD | Table_fc3e1376c2bf43ea9af3735d63f84159_1 | 2022-07-08 13:48:54.293 |
| FactFinance100m | 0 | 1 | 3 | COMPRESSED | 333324 | BULKLOAD | Table_fc3e1376c2bf43ea9af3735d63f84159_2 | 2022-07-08 13:48:54.433 |
| FactFinance100m | 0 | 1 | 3 | COMPRESSED | 333324 | BULKLOAD | Table_fc3e1376c2bf43ea9af3735d63f84159_3 | 2022-07-08 13:48:54.277 |
| FactFinance100m | 0 | 1 | 3 | COMPRESSED | 333324 | BULKLOAD | Table_fc3e1376c2bf43ea9af3735d63f84159_4 | 2022-07-08 13:48:54.337 |
| FactFinance100m | 0 | 1 | 3 | COMPRESSED | 333324 | BULKLOAD | Table_fc3e1376c2bf43ea9af3735d63f84159_5 | 2022-07-08 13:48:54.320 |
| FactFinance100m | 0 | 1 | 3 | COMPRESSED | 333324 | BULKLOAD | Table_fc3e1376c2bf43ea9af3735d63f84159_6 | 2022-07-08 13:48:54.380 |
| FactFinance100m | 0 | 1 | 3 | COMPRESSED | 333324 | BULKLOAD | Table_fc3e1376c2bf43ea9af3735d63f84159_7 | 2022-07-08 13:48:54.313 |
| FactFinance100m | 0 | 1 | 3 | COMPRESSED | 333324 | BULKLOAD | Table_fc3e1376c2bf43ea9af3735d63f84159_8 | 2022-07-08 13:48:54.300 |
| FactFinance100m | 0 | 1 | 3 | COMPRESSED | 333324 | BULKLOAD | Table_fc3e1376c2bf43ea9af3735d63f84159_9 | 2022-07-08 13:48:54.320 |
| FactFinance100m | 0 | 1 | 3 | COMPRESSED | 333324 | BULKLOAD | Table_fc3e1376c2bf43ea9af3735d63f84159_10 | 2022-07-08 13:48:54.453 |
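Both loads so far fit a simplified model, sketched below. It makes four assumptions:

- Round robin spreads rows evenly over the 60 distributions.
- Each distribution fills complete 1,048,576-row rowgroups.
- Leftover rows are compressed straight away (trim reason BULKLOAD) if there are at least 102,400 of them.
- Otherwise the leftover rows stay in an OPEN delta rowgroup.

The model ignores memory pressure, dictionary size limits and the small imbalance round robin leaves between distributions (note the 69,172 vs 69,736 row OPEN rowgroups earlier). Treat it as an estimate, not the engine's exact algorithm.

```sql
-- SIMPLIFIED MODEL (assumption, not the engine's exact algorithm):
-- rows split evenly over 60 distributions; full 1,048,576-row rowgroups first;
-- a remainder of >= 102,400 rows is compressed (BULKLOAD trim), otherwise it stays OPEN.
SELECT load_rows,
       load_rows / 60                AS rows_per_distribution,
       (load_rows / 60) / 1048576    AS full_rowgroups_per_distribution,
       (load_rows / 60) % 1048576    AS remainder_rows,
       CASE WHEN (load_rows / 60) % 1048576 = 0       THEN 'none'
            WHEN (load_rows / 60) % 1048576 >= 102400 THEN 'COMPRESSED (BULKLOAD trim)'
            ELSE 'OPEN delta rowgroup'
       END                           AS remainder_lands_in
FROM (SELECT CAST(130000000 AS BIGINT) AS load_rows
      UNION ALL
      SELECT 20000000) AS loads;
```

For 130 million rows, the model gives 2,166,666 rows per distribution: 2 full rowgroups plus 69,514 rows left OPEN. That agrees with the 120 compressed and 60 OPEN rowgroups (about 69k rows each) reported earlier. For 20 million rows, it gives 333,333 rows per distribution, none of them in a full rowgroup, each compressed as one BULKLOAD rowgroup. By this model, a single round-robin load needs roughly 60 x 1,048,576 = 62,914,560 rows before any distribution gets a full rowgroup.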


```sql
/*
    Insert another 100 million rows in five 20 million row batches (GO 5 runs the batch five times).
    This brings the 20 million row table to 120 million rows, similar in size to the initial 130 million row test case.

    Each batch should behave like the 20 million row load above: 60 BULKLOAD-trimmed rowgroups of about 333k rows
    instead of 1,048,576. That gives 360 rowgroups, where the single 130 million row load produced 180 (120 full and 60 OPEN).
*/

INSERT INTO [dbo].[FactFinance100m] ([AccountKey], [ScenarioKey], [DepartmentGroupKey], [DateKey], [OrganizationKey], [Amount], [Date], [LineageKey])
SELECT TOP 20000000 [AccountKey], [ScenarioKey], [DepartmentGroupKey], [DateKey], [OrganizationKey], [Amount], [Date], [LineageKey] FROM [dbo].[FactFinance1b]
GO 5
```


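```sql
/* Now let's look at the fragmentation.

   The output below shows 230,000,000 rows and object_id 290868153, the same object as the 130 million row table.
   The five batches were therefore appended to that table rather than to the 20 million row table.
   Its 420 compressed rowgroups are the original 120 full rowgroups plus 300 (5 x 60) BULKLOAD rowgroups of about 333k rows.
   This pulls compressed_row_avg down to 537,688 and raises fragmentation_density to 48.72% (rowgroups about 51% full on average).
   Had the batches gone into the 20 million row table, every rowgroup would hold about 333k rows:
   roughly 68% fragmentation, i.e. only about 32% full.
*/

SELECT * FROM dbo.vColumnstoreStats WHERE table_name = 'FactFinance100m'
SELECT * FROM [dbo].[vCS_rg_physical_stats] WHERE logical_table_name = 'FactFinance100m'
```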

| execution_date | database_name | schema_name | table_name | partition_number | partition_scheme | object_id | index_name | row_count | deleted_row_count | row_group_count | compressed_row_count | compressed_rowgroup_count | open_rowgroup_count | open_row_count | compressed_row_max | compressed_row_avg | fragmentation_density | fragmentation_deletes | fragmentation_open |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2022-07-08 14:32:24.287 | AdventureWorksDW | dbo | FactFinance100m |  |  | 290868153 | ClusteredIndex_e2f1410ade274620bfb52e91dcee6ddb | 230000000 | 0 | 480 | 225829120 | 420 | 60 | 4170880 | 1048576 | 537688 | 48.72 | 0.0 | 6.63 |


| logical_table_name | row_group_id | partition_number | state | state_desc | total_rows | trim_reason_desc | physical_name | created_time |
|---|---|---|---|---|---|---|---|---|
| FactFinance100m | 7 | 1 | 3 | COMPRESSED | 333324 | BULKLOAD | Table_5e36fad5620b4f1c8f3eb0d9fc992bec_1 | 2022-07-08 14:30:44.753 |
| FactFinance100m | 7 | 1 | 3 | COMPRESSED | 333324 | BULKLOAD | Table_5e36fad5620b4f1c8f3eb0d9fc992bec_2 | 2022-07-08 14:30:44.740 |
| FactFinance100m | 7 | 1 | 3 | COMPRESSED | 333324 | BULKLOAD | Table_5e36fad5620b4f1c8f3eb0d9fc992bec_3 | 2022-07-08 14:30:44.840 |
| FactFinance100m | 7 | 1 | 3 | COMPRESSED | 333324 | BULKLOAD | Table_5e36fad5620b4f1c8f3eb0d9fc992bec_4 | 2022-07-08 14:30:44.840 |
| FactFinance100m | 7 | 1 | 3 | COMPRESSED | 333324 | BULKLOAD | Table_5e36fad5620b4f1c8f3eb0d9fc992bec_5 | 2022-07-08 14:30:44.797 |
| FactFinance100m | 7 | 1 | 3 | COMPRESSED | 333324 | BULKLOAD | Table_5e36fad5620b4f1c8f3eb0d9fc992bec_6 | 2022-07-08 14:30:44.837 |
| FactFinance100m | 7 | 1 | 3 | COMPRESSED | 333324 | BULKLOAD | Table_5e36fad5620b4f1c8f3eb0d9fc992bec_7 | 2022-07-08 14:30:45.490 |
| FactFinance100m | 7 | 1 | 3 | COMPRESSED | 333324 | BULKLOAD | Table_5e36fad5620b4f1c8f3eb0d9fc992bec_8 | 2022-07-08 14:30:44.847 |
| FactFinance100m | 7 | 1 | 3 | COMPRESSED | 333324 | BULKLOAD | Table_5e36fad5620b4f1c8f3eb0d9fc992bec_9 | 2022-07-08 14:30:45.540 |
| FactFinance100m | 7 | 1 | 3 | COMPRESSED | 333324 | BULKLOAD | Table_5e36fad5620b4f1c8f3eb0d9fc992bec_10 | 2022-07-08 14:30:44.767 |
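Two figures follow from the rowgroup counts alone: the 48.72% density fragmentation reported above, and the roughly 68% expected if the batches had gone into the 20 million row table. The sketch below recomputes both. It again assumes `fragmentation_density = 100 * (1 - compressed_row_avg / 1,048,576)`, inferred from this notebook's outputs. The second row is a hypothetical scenario (the 20 million row table plus the five batches), not a measured result.

```sql
-- Recompute compressed_row_avg and fragmentation_density from rowgroup counts.
-- ASSUMED formula: fragmentation_density = 100 * (1 - compressed_row_avg / 1,048,576)
SELECT scenario,
       compressed_rows / compressed_rowgroups AS compressed_row_avg,  -- integer division
       CAST(100.0 * (1 - (compressed_rows / compressed_rowgroups) / 1048576.0) AS DECIMAL(5, 2)) AS fragmentation_density
FROM (
    -- Measured: 130 million row load (120 full rowgroups) plus 5 x 20 million rows in 300 trimmed rowgroups
    SELECT '130m + 5 x 20m' AS scenario,
           CAST(120 * 1048576 + 5 * 20000000 AS BIGINT) AS compressed_rows,
           420 AS compressed_rowgroups
    UNION ALL
    -- Hypothetical: 20 million row load plus 5 x 20 million rows, 60 rowgroups per load
    SELECT '20m + 5 x 20m', 6 * 20000000, 6 * 60
) AS s;
```

The first row returns 537,688 and 48.72, matching the output above. The second returns 333,333 and 68.21, the same density as the single 20 million row load.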
