# V-Order

V-Order is a write-time optimization to the parquet file format that enables lightning-fast reads under the Microsoft Fabric compute engines, such as Power BI, SQL, Spark, and others.

Power BI and SQL engines make use of Microsoft Verti-Scan technology and V-Ordered parquet files to achieve in-memory-like data access times. Spark and other non-Verti-Scan compute engines also benefit from V-Ordered files, with read times that are 10% faster on average and up to 50% faster in some scenarios.

V-Order works by applying special sorting, row group distribution, dictionary encoding, and compression on parquet files. Compute engines therefore need less network, disk, and CPU resources to read them, which improves cost efficiency and performance. V-Order sorting has a 15% impact on average write times but provides up to 50% more compression.

It's 100% compliant with the open-source parquet format; all parquet engines can read V-Ordered files as regular parquet files. Delta tables are more efficient than ever; features such as Z-Order, compaction, vacuum, and time travel are compatible with V-Order and may be used together for extra benefits.

Table properties and optimization commands can be used to control V-Order on a table's partitions.

## Enabled by default

V-Order is enabled by default in Microsoft Fabric and in Apache Spark it's controlled by the following configurations:

|Configuration | Default value| Description|
|--|--|--|
|spark.sql.parquet.vorder.enabled| true |Controls session level V-Order writing|
|TBLPROPERTIES("delta.parquet.vorder.enabled")| false |Default V-Order mode on tables|
|DataFrame writer option: parquet.vorder.enabled| unset |Controls V-Order writes using the DataFrame writer|

In [None]:
spark.conf.get('spark.sql.parquet.vorder.enabled')

## Control V-Order in Apache Spark session

> To enable it

In [None]:
spark.conf.set('spark.sql.parquet.vorder.enabled', 'true')

> To disable it

In [None]:
spark.conf.set('spark.sql.parquet.vorder.enabled', 'false')
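
When the session flag only needs to change for a few writes, it can help to restore the previous value afterwards. The following is a minimal sketch, not part of the original notebook: `vorder_session` is a hypothetical helper name, and it relies only on the standard `spark.conf.get`, `spark.conf.set`, and `spark.conf.unset` calls.

```python
from contextlib import contextmanager

VORDER_KEY = "spark.sql.parquet.vorder.enabled"

@contextmanager
def vorder_session(spark, enabled):
    """Temporarily set the session-level V-Order flag, then restore it.

    `vorder_session` is a hypothetical helper, not a Fabric or Spark API.
    """
    # Remember the current value; None means the key was not set in this session
    previous = spark.conf.get(VORDER_KEY, None)
    spark.conf.set(VORDER_KEY, "true" if enabled else "false")
    try:
        yield
    finally:
        # Put the session back the way it was found
        if previous is None:
            spark.conf.unset(VORDER_KEY)
        else:
            spark.conf.set(VORDER_KEY, previous)

# Usage (inside a notebook where `spark` exists):
# with vorder_session(spark, False):
#     spark.range(5).write.format("delta").saveAsTable("some_table")
```

The `try`/`finally` ensures the original setting comes back even if a write inside the block fails.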

In [None]:
spark.sql("DROP TABLE IF EXISTS demo.vorder_demo")
spark.sql("DROP TABLE IF EXISTS demo.not_vorder_demo")

## Control V-Order using Delta table properties

When the table property is set to true, **INSERT, UPDATE and MERGE** commands behave as expected and perform the V-Order write-time optimization.

If the V-Order session configuration is set to true or the DataFrame writer option enables it, the writes are V-Ordered even if the table property is set to false.
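
The rule above can be read as a simple "any of the three switches" check. The function below is only a mental model of that statement, written as plain Python: `vorder_applies` is a hypothetical name, and the case where a writer explicitly sets the option to false is not covered by the text above, so it is not modelled here.

```python
def vorder_applies(session_enabled, writer_enabled=False, table_property=False):
    """Simplified model: a write is V-Ordered if any of the three switches is on.

    Assumption: this only mirrors the rule stated in the text; it does not
    describe what happens when a writer explicitly disables V-Order.
    """
    return bool(session_enabled or writer_enabled or table_property)

# Session on, table property off: the session setting wins
print(vorder_applies(session_enabled=True, table_property=False))
# Session off, table property on: the table property enables V-Order
print(vorder_applies(session_enabled=False, table_property=True))
# Everything off: a regular parquet write
print(vorder_applies(session_enabled=False))
```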

In [None]:
%%sql 
CREATE TABLE demo.vorder_demo (id BIGINT) 
USING DELTA 
TBLPROPERTIES("delta.parquet.vorder.enabled" = "true");

In [None]:
%%sql

INSERT INTO demo.vorder_demo VALUES(1)

> Disable and Unset V-Order setting

In [None]:
%%sql 

ALTER TABLE demo.vorder_demo SET TBLPROPERTIES("delta.parquet.vorder.enabled"="false");

ALTER TABLE demo.vorder_demo UNSET TBLPROPERTIES("delta.parquet.vorder.enabled");
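
To confirm what the `ALTER TABLE` statements did, the table property can be read back with `SHOW TBLPROPERTIES`, which returns `key` and `value` columns. This is a small sketch rather than part of the original notebook; `get_vorder_table_property` is a hypothetical helper name.

```python
VORDER_TABLE_PROPERTY = "delta.parquet.vorder.enabled"

def get_vorder_table_property(spark, table_name):
    """Return the table's V-Order property as a string, or None if it is unset.

    Hypothetical helper built on the standard SHOW TBLPROPERTIES statement.
    """
    rows = spark.sql(f"SHOW TBLPROPERTIES {table_name}").collect()
    for row in rows:
        if row["key"] == VORDER_TABLE_PROPERTY:
            return row["value"]
    # The property does not appear in the listing (for example after UNSET)
    return None

# Usage (inside a notebook where `spark` exists):
# print(get_vorder_table_property(spark, "demo.vorder_demo"))
```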

## Controlling V-Order directly on write operations

All Apache Spark write commands inherit the session setting unless V-Order is set explicitly on the write.

The following command writes using V-Order by implicitly inheriting the session configuration.

In [None]:
spark.range(5).write.format("delta").mode("append").saveAsTable("demo.vorder_demo")

> Disabling V-Order when writing.

In [None]:
spark.conf.set('spark.sql.parquet.vorder.enabled', 'false')

In [None]:
spark.conf.get('spark.sql.parquet.vorder.enabled')

In [None]:
spark.range(5).write.format("delta").saveAsTable("not_vorder_demo")
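
The configuration table also lists a DataFrame writer option, `parquet.vorder.enabled`, which controls a single write instead of the whole session. The sketch below assumes the option accepts the strings `"true"` and `"false"`, like the session configuration; `append_with_vorder_option` is a hypothetical helper name.

```python
def append_with_vorder_option(df, table_name, enabled):
    """Append `df` to a Delta table, setting V-Order explicitly on this write.

    Assumption: the writer option takes "true"/"false" strings, mirroring
    the session configuration shown earlier.
    """
    (
        df.write
        .format("delta")
        .mode("append")
        .option("parquet.vorder.enabled", "true" if enabled else "false")
        .saveAsTable(table_name)
    )

# Usage (inside a notebook where `spark` exists):
# append_with_vorder_option(spark.range(5), "demo.vorder_demo", enabled=True)
```

Setting the option per write leaves the session configuration untouched for other writes.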

## Checking whether V-Order is enabled on tables

In [None]:
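import pyarrow.dataset as ds

def get_schema_metadata(file_api_path):
    """Print the dataset's schema metadata and report whether any key mentions V-Order.

    Returns True or False when metadata is present, None when there is none.
    """
    schema_metadata = ds.dataset(file_api_path).schema.metadata
    if schema_metadata:
        # Metadata keys and values are bytes; decode them for display
        for key, value in schema_metadata.items():
            print(f"{key.decode('utf-8')}: {value.decode('utf-8')}")
        # Treat the table as V-Ordered if any metadata key contains "vorder"
        is_vorder = any(b'vorder' in key for key in schema_metadata.keys())
    else:
        print("No schema metadata found.")
        is_vorder = None
    return is_vorder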
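
The detection rule used by the function above (any metadata key containing `vorder`) can be separated from the file access so it can be tried without a lakehouse. This is a sketch under that assumption; `has_vorder_key` and the sample key in the usage lines are hypothetical and only illustrate the matching logic.

```python
from typing import Mapping, Optional

def has_vorder_key(schema_metadata: Optional[Mapping[bytes, bytes]]) -> Optional[bool]:
    """Return None when there is no metadata, else whether any key contains b'vorder'."""
    if not schema_metadata:
        return None
    return any(b"vorder" in key for key in schema_metadata)

# Hypothetical metadata, for illustration only (not a real key name)
sample = {b"example.vorder.flag": b"1", b"other": b"x"}
print(has_vorder_key(sample))
print(has_vorder_key({b"other": b"x"}))
print(has_vorder_key(None))
```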

In [None]:
# Retrieve the list of Delta tables
delta_tables = spark.sql("SHOW TABLES").toPandas()

# Iterate over the Delta tables and check the property
for _, row in delta_tables.iterrows():
    table_name = row['tableName']
    table_name_path = "/lakehouse/default/Tables/" + table_name

    print("\nTable: " + table_name)
    print("\nSchema metadata: ")
    print("--------------------")
    is_vorder = get_schema_metadata(table_name_path)

    if is_vorder:
        print('V-order is enabled')
    else:
        print('V-order is NOT enabled')



## Clean up

In [None]:
spark.sql("DROP TABLE IF EXISTS demo.vorder_demo")
spark.sql("DROP TABLE IF EXISTS not_vorder_demo")