<header>
   <p  style='font-size:36px;font-family:Arial; color:#F0F0F0; background-color: #00233c; padding-left: 20pt; padding-top: 20pt;padding-bottom: 10pt; padding-right: 20pt;'>
       Open Table Format - Getting Started
  <br>
       <img id="teradata-logo" src="https://storage.googleapis.com/clearscape_analytics_demo_data/DEMO_Logo/teradata.svg" alt="Teradata" style="width: 125px; height: auto; margin-top: 20pt;">
    </p>
</header>

<p style="font-size:20px;font-family:Arial"><b>Introduction</b></p>

<p style="font-size:16px;font-family:Arial">
    An <b>Open Table Format</b> (OTF) is a standardized, engine-agnostic, and vendor-neutral way of accessing, storing, and managing large analytic datasets in data lakes. It defines how data files (such as Parquet, ORC, or Avro) and their metadata are structured so that multiple compute engines can reliably read, write, and modify the same data without transformations or data duplication.</p><br>
<div style="text-align: center;">
  <img src="./images/otf.png" 
       alt="otf" 
       style="width:40%; border: 4px solid #404040; border-radius: 10px;" />
</div><br>
<p style="font-size:16px;font-family:Arial">
Teradata VantageCloud supports reading from and writing to data stored in Apache Iceberg and Delta Lake. Customers can run analytical workloads on Apache Iceberg and Delta Lake Open Table Format (OTF) tables directly within Teradata VantageCloud. <br>Users can query and write Iceberg and Delta Lake OTF datasets registered in popular catalogs such as Unity Catalog, AWS Glue Data Catalog, or Apache Hive using standard SQL syntax. The offering is compliant with <a href = 'https://iceberg.apache.org/spec/'>Apache Iceberg Specification Version 2</a> and the <a href = 'https://docs.delta.io/latest/index.html'>Delta Lake documentation</a>. Cross reads are supported across all cloud object stores, and users or applications can manipulate data in Iceberg and Delta Lake OTF tables through ACID (Atomicity, Consistency, Isolation, Durability)-compliant services. <br>
<ul style="font-size:16px;font-family:Arial"> <b>Benefits of OTF</b>
    <li>Separation of data analytics and data management</li>
    <li>Reduced data integration complexity and replication costs</li>
    <li>Vendor-agnostic, with interoperability across multiple analytics engines, and powered by the open-source community</li>
    <li>Access and manipulate the data using the same syntax as any other table in a Teradata database.</li>
    </ul>
</p>

<p style = 'font-size:20px;font-family:Arial'><b>What we will do in this Notebook</b></p>

<p style = 'font-size:16px;font-family:Arial'>
This notebook guides us through the steps required to work with OTF tables. Here's what we'll learn:
</p>
<div style = 'font-size:16px;font-family:Arial'>
<ol>
    <li>Installation and prerequisites</li>
    <li>Creating authorization objects to access storage and catalogs</li>
    <li>Creating a DATALAKE object to query an existing table</li>
    <li>HELP SQL commands</li>
    <li>Querying OTF tables</li>
    <li>Joining OTF tables with other database tables</li>
    <li>Creating new OTF tables and inserting data</li>
    <li>Snapshots and time travel</li>
</ol>
</div>

<hr style="height:2px;border:none;">
<p style = 'font-size:20px;font-family:Arial'><b>1. Connect to Vantage</b></p>
<p style = 'font-size:16px;font-family:Arial'>First, we create the connection profile that the magic commands in the SQL kernel use to connect to the environment. To start clean, we remove any existing profile with the same name. If no such profile exists, the command below returns the message "Profile does not exist"; this message can be ignored.</p>

In [None]:
%rmconnect otf

<p style = 'font-size:16px;font-family:Arial'>In the command below, enter the username given in the .env file we have provided, without quotes. To find it, open a new terminal by selecting File > New Launcher > Terminal.<br>
<img src="./images/terminal.png" alt="terminal" style="width: 50%; border: 4px solid #404040; border-radius: 10px;"/><br>
<br>
At the command prompt in the terminal, execute this command: <code>cat ~/JupyterLabRoot/VantageCloud_Lake/.config/.env</code><br>
The output from the command will be similar to this:<br>
<img src="./images/env_file.png" alt="env file" style="width: 60%; border: 4px solid #404040; border-radius: 10px;"/></p>

In [None]:
%addconnect name=otf, host=54.156.178.22, user=dallas38_jn6hvh5icpnu97gw

<p style = 'font-size:16px;font-family:Arial'>When prompted for the password, enter the <code>my_variable</code> value (without quotes) and press Enter.</p>

In [None]:
%connect otf, hidewarnings=true

<p style = 'font-size:18px;font-family:Arial'> <b>Getting Data for This Demo</b></p>
<p style = 'font-size:16px;font-family:Arial'>For this demo, we are using Open Table Format (OTF) data stored in a cloud data lake on AWS S3. Specifically, the dataset is stored as Iceberg tables that are registered in the AWS Glue catalog.</p>

<hr style="height:2px;border:none;">
<p style = 'font-size:20px;font-family:Arial'><b>2. Installation and Setup</b></p>    
<p style = 'font-size:16px;font-family:Arial;'>The Apache Iceberg and Delta Lake read and write capabilities are already enabled within this Teradata VantageCloud environment and no additional installations or tasks are required to enable this feature.<br><br>
Sections 3 and 4 are provided here as an example of the requirements within the Teradata database.

<hr style="height:2px;border:none;">
<p style = 'font-size:20px;font-family:Arial'><b>3. Configuration Prerequisites (Already implemented)</b></p>
<p style = 'font-size:16px;font-family:Arial;'>The following privileges must be granted to users who will use the DATALAKE object.<br> <i>Note: these have already been granted to your user.</i></p>
<p style="line-height: 1.1; font-size:14px; font-family:Arial; padding-left: 2em;">
<code style="padding:0; line-height:1.1;">GRANT ALL ON &lt;username&gt; TO &lt;username&gt;;
GRANT ALL ON TD_SERVER_DB TO &lt;username&gt; WITH GRANT OPTION;  
GRANT EXECUTE FUNCTION ON SYSLIB TO &lt;username&gt; WITH GRANT OPTION; 
GRANT EXECUTE FUNCTION ON td_sysfnlib TO &lt;username&gt;;
/* Depending on how the authorization object is set up, use one of the two commands below */
GRANT CREATE AUTHORIZATION ON &lt;username&gt; TO &lt;username&gt; WITH GRANT OPTION;
/* or */
GRANT EXECUTE ON &lt;authorization_object&gt; TO &lt;username&gt; WITH GRANT OPTION;
</code></p>


<hr style="height:2px;border:none;">
<p style = 'font-size:20px;font-family:Arial'><b>4. Creating Authorization Objects (Already implemented)</b></p>
<p style = 'font-size:16px;font-family:Arial;'>The DATALAKE object requires two authorization objects to be specified in the <code>&lt;auth-list&gt;</code> of the CREATE DATALAKE DDL: one for the catalog connection and one for the storage connection. The CREATE AUTHORIZATION DDL is used to create these objects. An authorization object holds the AWS credentials, that is, the access key ID (user) and secret access key (password) of the service principal that accesses services and resources in AWS. The SQL below shows the form of the statement used to create an authorization object.</p>
<p style="line-height: 1.1; font-size:14px; font-family:Arial; padding-left: 2em;">
<code style="padding:0; line-height:1.1;">CREATE AUTHORIZATION &lt;Databasename.AuthorizationObject&gt;
USER '&lt;User Name&gt;' 
PASSWORD '&lt;Password&gt;'; </code></p>
<p style = 'font-size:16px;font-family:Arial;'><code>CREATE AUTHORIZATION</code> defines the credentials used to access the catalog and the storage. The same credentials can be used for both.
<ul style="font-size:16px;font-family:Arial"> 
    <li>In AWS, credentials are given using the IAM ASSUMEROLE policy</li>
    <li>In Azure, credentials are given using an Azure AD Service Principal, where the user is the Azure AD Service Principal client ID and the password is the Azure AD Service Principal client secret key</li>
</ul>
<p style = 'font-size:16px;font-family:Arial;'><i>The authorization objects have already been created for the ClearScape Experience environment.</i></p>

<hr style="height:2px;border:none;">
<p style = 'font-size:20px;font-family:Arial'><b>5. Creating Datalake Objects </b></p>
<p style = 'font-size:16px;font-family:Arial;'>
     The newly introduced DATALAKE object encapsulates the information needed to connect to an external data lake using OTF:
    <ul style="font-size:16px;font-family:Arial"> <li><code>CREATE DATALAKE</code> creates connections to the customer’s choice of catalog and object store</li>
        <li>AWS Glue, Unity, and Apache Hive catalogs are supported in this release</li>
        <li>Users can use <code>ALTER DATALAKE</code> to alter the properties of the DATALAKE object, such as specifying new authorization objects</li>
        <li>Users can ADD/DROP catalog and storage locations</li>
        <li>Users cannot change the TABLE FORMAT option using <code>ALTER DATALAKE</code>; drop the DATALAKE object and create it again instead</li>
        </ul>

<p style = 'font-size:16px;font-family:Arial;'>The SQL below creates a DATALAKE object in the Teradata database (TD_SERVER_DB). It configures the object to use the Iceberg table format and points it to an S3 bucket location, with the authorization objects for the catalog and the storage supplied through the two EXTERNAL SECURITY clauses.</p>
<p style="line-height: 1.1; font-size:14px; font-family:Arial; padding-left: 2em;">
<code style="padding:0; line-height:1.1;">CREATE DATALAKE &lt;Datalake name&gt;
    EXTERNAL SECURITY CATALOG &lt;Authorization_object name&gt;,
    EXTERNAL SECURITY STORAGE &lt;Authorization_object name&gt;
USING
    storage_location('&lt;Location&gt;')
    catalog_type('glue')
    storage_region ('&lt;Region name&gt;')
TABLE FORMAT iceberg;
</code>
</p>
<p style = 'font-size:16px;font-family:Arial;'>&nbsp;&nbsp;&nbsp;&nbsp;<i>Note: this has already been executed for the ClearScape Experience environment, and you can see the definition by running the command below.</i></p>

In [None]:
SHOW DATALAKE iceberg_glue;

<hr style="height:2px;border:none;">
<p style = 'font-size:20px;font-family:Arial'><b>6. Help on the Datalake Objects </b></p>
<p style = 'font-size:16px;font-family:Arial;'>The sections above explained how the connection to Open Table Format tables is established. Once the connection exists, we can query or update the OTF tables as needed.<br>
    <b>HELP</b> commands give us information about the datalake and the databases and tables in it.<br>
    <code>HELP DATALAKE</code> queries the Iceberg or Delta catalog and lists the databases present in the <code>DATALAKE</code> object.</p>

In [None]:
HELP DATALAKE iceberg_glue;

<p style = 'font-size:16px;font-family:Arial;'><code>HELP DATABASE</code> lists the tables that are present in a database of the data lake.</p>

In [None]:
HELP DATABASE iceberg_glue.demo_glue_db;

<p style = 'font-size:16px;font-family:Arial;'><code>HELP TABLE</code> lists the schema of a table in a data lake database.</p>

In [None]:
HELP TABLE iceberg_glue.demo_glue_db.trip_detail;

<p style = 'font-size:16px;font-family:Arial;'>The <code>HELP TABLE</code> command above returns information about the columns of a table in a data lake database. Note that if the database contains multiple tables, all of them are listed by the <code>HELP DATABASE</code> command.</p>

<hr style="height:2px;border:none;">
<p style = 'font-size:20px;font-family:Arial;'><b>7. Querying the OTF Data</b></p>
<p style = 'font-size:16px;font-family:Arial;'>We have made our connection to the Vantage system, so let's start exploring the data. Vantage can analyze large datasets, but we begin with a small sample of the data.<br>Querying an OTF table requires three-level dot notation to refer to the table: <br><code>&lt;datalakename&gt;.&lt;databasename&gt;.&lt;tablename&gt;</code></p>

In [None]:
SELECT TOP 10 * FROM iceberg_glue.demo_glue_db.trip_detail;
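
<p style = 'font-size:16px;font-family:Arial;'>Because an OTF table is referenced like any other table, ordinary SQL clauses such as <code>GROUP BY</code> should apply to it as well. The following is a minimal sketch, not a cell from the original demo. It assumes that <code>trip_detail</code> exposes the <code>subscriber_type</code> and <code>duration_minutes</code> columns that appear in the join in section 7.1; the aliases <code>trip_count</code> and <code>avg_duration_minutes</code> are illustrative names.</p>

```sql
-- Sketch: summarize trips per subscriber type directly on the Iceberg table.
-- Assumes the columns subscriber_type and duration_minutes exist in trip_detail.
SELECT
    subscriber_type,
    COUNT(*) AS trip_count,
    AVG(duration_minutes) AS avg_duration_minutes
FROM iceberg_glue.demo_glue_db.trip_detail
GROUP BY subscriber_type
ORDER BY trip_count DESC;
```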

<hr style="height:1px;border:none;">
<p style = 'font-size:18px;font-family:Arial;'><b>7.1 Query the Iceberg table with other tables in Teradata</b></p>
<p style = 'font-size:16px;font-family:Arial;'>OTF tables can be combined with other tables in the database. The query below joins the Iceberg table with the DEMO_AustinBikeShare.stations table, once for the start station and once for the end station.</p>

In [None]:
SELECT
    t.bikeid,
    t.trip_ID,
    t.subscriber_type,
    t.start_station_id,
    COALESCE(t.start_station_name, st.NAME) AS start_station_name,
    t.start_time,
    st.status AS starting_station_status,
    t.end_station_id,
    COALESCE(t.end_station_name, ed.NAME) AS end_station_name,
    t.start_time 
        + CAST(t.duration_minutes/60 AS INTERVAL HOUR(4)) 
        + CAST(t.duration_minutes MOD 60 AS INTERVAL MINUTE(4)) AS end_time,
    ed.status AS end_station_status,
    t.duration_minutes
FROM
    iceberg_glue.demo_glue_db.trip_detail AS t
    LEFT JOIN DEMO_AustinBikeShare.stations AS st ON t.start_station_id = st.station_id
    LEFT JOIN DEMO_AustinBikeShare.stations AS ed ON t.end_station_id = ed.station_id;
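
<p style = 'font-size:16px;font-family:Arial;'>The <code>end_time</code> expression in the query above splits <code>duration_minutes</code> into whole hours and remaining minutes before adding both to <code>start_time</code>. For a 135-minute trip, integer division gives 135/60 = 2 hours and 135 MOD 60 = 15 minutes. The sketch below applies the same expression to literal values so it can be checked in isolation. The timestamp and the duration are made-up sample values, not rows from the dataset, and the sketch assumes that both operands of the division are integers, so the quotient is truncated.</p>

```sql
-- Sketch: the end_time arithmetic from the join, applied to sample literals.
-- 135 minutes is split into 2 hours (135/60) and 15 minutes (135 MOD 60).
SELECT
    TIMESTAMP '2024-01-01 08:00:00'
        + CAST(135/60 AS INTERVAL HOUR(4))
        + CAST(135 MOD 60 AS INTERVAL MINUTE(4)) AS end_time;
```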

<br>
<p style = 'font-size:16px;font-family:Arial;'>We can also join multiple OTF tables residing in different data lakes and/or with different cloud providers.</p>

<hr style="height:2px;border:none;">
<p style = 'font-size:20px;font-family:Arial;'><b>8. Writing to OTF tables</b></p>
<p style = 'font-size:16px;font-family:Arial;'>We can also create OTF tables and insert data into them. We're doing this in a shared catalog, which means your table name must be unique. We request that you use your database username as a prefix for your table name. Replace <code>{username}</code> with that value in all the remaining cells.</p>

In [None]:
CREATE TABLE iceberg_glue.demo_glue_db.{username}_customers (
    customerid INT,
    firstname VARCHAR(50),
    lastname VARCHAR(50),
    gender CHAR(1),
    city VARCHAR(50),
    no_trips INT
);

In [None]:
SELECT * FROM iceberg_glue.demo_glue_db.{username}_customers;

<p style = 'font-size:16px;font-family:Arial;'>Insert sample data in the table.</p>

In [None]:
INSERT INTO iceberg_glue.demo_glue_db.{username}_customers VALUES (19310, 'Flavio','DeCosta','M','New York',12);

In [None]:
INSERT INTO iceberg_glue.demo_glue_db.{username}_customers VALUES (19311, 'Isabella','Mayer','F','LA',5);

In [None]:
INSERT INTO iceberg_glue.demo_glue_db.{username}_customers VALUES (19312, 'Dai','Sun','M','Tokyo',10);

<p style = 'font-size:16px;font-family:Arial;'>Querying the inserted data.</p>

In [None]:
SELECT * FROM iceberg_glue.demo_glue_db.{username}_customers;

<hr style="height:1px;border:none;">
<p style = 'font-size:18px;font-family:Arial;'><b>8.1 Schema Evolution</b></p>
<p style = 'font-size:16px;font-family:Arial;'>Over time, the structure of an Iceberg table can change. Such changes can be accommodated without rewriting the existing data.</p>

In [None]:
ALTER TABLE iceberg_glue.demo_glue_db.{username}_customers ADD Phone VARCHAR(50);

In [None]:
HELP TABLE iceberg_glue.demo_glue_db.{username}_customers;
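
<p style = 'font-size:16px;font-family:Arial;'>Rows written before the column was added contain no value for it. Under Version 2 of the Iceberg specification, a reader returns NULL for a column that is missing from older data files, which is why no rewrite is needed. The sketch below is one way to observe this. It assumes the three sample rows from section 8 were inserted before the <code>ALTER TABLE</code> ran, in which case <code>Phone</code> is expected to be NULL for all of them.</p>

```sql
-- Sketch: read the new column alongside existing ones.
-- Rows inserted before the ALTER TABLE are expected to show NULL for Phone.
SELECT customerid, firstname, lastname, Phone
FROM iceberg_glue.demo_glue_db.{username}_customers
ORDER BY customerid;
```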

<hr style="height:2px;border:none;">
<p style = 'font-size:20px;font-family:Arial;'><b>9. Time Travel and Snapshots</b></p>
<p style = 'font-size:16px;font-family:Arial;'>Table metadata for OTF tables, such as table history, snapshots, manifests, and partition information, can be retrieved by invoking new system table operators:
     <ul style="font-size:16px;font-family:Arial">
        <li>Use <code>TD_SNAPSHOTS</code> to get snapshot information for an OTF table</li>
        <li>Use <code>TD_MANIFESTS</code> to get manifest information for an OTF table</li>
        <li>Use <code>TD_PARTITIONS</code> to retrieve partition information for an OTF table</li>
     </ul>
</p>

In [None]:
SELECT * FROM TD_SNAPSHOTS (ON (iceberg_glue.demo_glue_db.{username}_customers)) C; 

<p style = 'font-size:16px;font-family:Arial;'>The above query returns the following columns:
 <ul style="font-size:16px;font-family:Arial"> <li><b>snapshotId</b>: a unique identifier for the snapshot</li>
        <li><b>snapshotTimestamp</b>: the timestamp of when the snapshot was taken</li>
        <li><b>timestampMSecs</b>: the timestamp expressed in milliseconds</li>
        <li><b>manifestList</b>: the pointer to the manifest file</li>
        <li><b>summary</b>: a summary of what was changed in this snapshot</li>
        </ul>
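
<p style = 'font-size:16px;font-family:Arial;'>When picking a snapshot for time travel, it can help to list only the identifying columns, newest first. The following is a minimal sketch. It assumes that the output column names match those described above (<code>snapshotId</code>, <code>snapshotTimestamp</code>) and that the result of the table operator can be projected and ordered like any derived table.</p>

```sql
-- Sketch: list snapshot ids with their timestamps, most recent first.
-- Column names are taken from the TD_SNAPSHOTS output described above.
SELECT C.snapshotId, C.snapshotTimestamp
FROM TD_SNAPSHOTS (ON (iceberg_glue.demo_glue_db.{username}_customers)) C
ORDER BY C.snapshotTimestamp DESC;
```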

In [None]:
SELECT * FROM TD_MANIFESTS (ON (iceberg_glue.demo_glue_db.{username}_customers)) D;

<p style = 'font-size:16px;font-family:Arial;'>The above query returns the following columns:
<ul style="font-size:16px;font-family:Arial">
     <li><b>snapshotId</b>: a unique identifier for the snapshot</li>
     <li><b>snapshotTimestamp</b>: the timestamp of when the snapshot was taken</li>
     <li><b>timestampMSecs</b>: the timestamp expressed in milliseconds</li>
    <li><b>manifestList</b>: the pointer to the manifest file</li>
    <li><b>manifestFileLength</b>: the length of the manifest file</li>
    <li><b>datafilecount</b>: the number of data files that this manifest file points to</li>
    <li><b>totalrowcount</b>: the total number of rows in the data files</li>
</ul>
</ul>

<hr style="height:1px;border:none;">
<p style = 'font-size:18px;font-family:Arial;'><b>9.1 Time Travel</b></p>

 <ul style="font-size:16px;font-family:Arial"> <li>A new snapshot of an Iceberg table is taken whenever a change is made</li>
        <li>These immutable snapshots are created by copying manifest lists and files</li>
        <li>Immutable snapshots ensure that historical data is preserved and available for audit and debugging purposes</li>
        <li>Snapshots can be queried by specifying a timestamp or a snapshot id</li>
    <li>Snapshots have retention periods, typically 30 or 60 days</li>
    <li>Storage costs go up as retention periods are increased</li>
        </ul>

In [None]:
SELECT * FROM iceberg_glue.demo_glue_db.{username}_customers FOR SNAPSHOT AS OF '<snapshot_id>';
-- Replace <snapshot_id> with a snapshotId value returned by the TD_SNAPSHOTS query above
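
<p style = 'font-size:16px;font-family:Arial;'>One way to see the effect of time travel is to compare row counts. The sketch below reuses the <code>FOR SNAPSHOT AS OF</code> clause from the cell above; <code>&lt;snapshot_id&gt;</code> remains a placeholder for a <code>snapshotId</code> value returned by <code>TD_SNAPSHOTS</code>, and the aliases <code>rows_at_snapshot</code> and <code>rows_now</code> are illustrative names. If the chosen snapshot predates some of the inserts from section 8, the first count is expected to be lower than the second.</p>

```sql
-- Sketch: row count as of the chosen snapshot
SELECT COUNT(*) AS rows_at_snapshot
FROM iceberg_glue.demo_glue_db.{username}_customers FOR SNAPSHOT AS OF '<snapshot_id>';

-- Sketch: row count of the current table state
SELECT COUNT(*) AS rows_now
FROM iceberg_glue.demo_glue_db.{username}_customers;
```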

<hr style="height:2px;border:none;">
<p style = 'font-size:20px;font-family:Arial'><b>10. Clean up</b></p>


In [None]:
DROP TABLE iceberg_glue.demo_glue_db.{username}_customers PURGE ALL;

<hr style="height:2px;border:none;">
<p style = 'font-size:20px;font-family:Arial'> <b> 11. Conclusion </b> </p>
<p style = 'font-size:16px;font-family:Arial'>This notebook shows how Open Table Formats (OTF) enable scalable and flexible data exploration across cloud data lakes. By leveraging Vantage’s support for OTFs such as Apache Iceberg and Delta Lake, we can run SQL queries directly on data stored in different catalogs and storage systems without worrying about data movement or format compatibility.<br> Throughout this demo, we’ve seen how Vantage’s integration with OTFs simplifies querying and managing evolving datasets while preserving data consistency and enabling features such as schema evolution, snapshot metadata, and time travel. You can explore this feature further by creating datalakes on different cloud service providers (CSPs) and catalogs.</p>

<p style = 'font-size:20px;font-family:Arial'><b>Reference Links:</b></p>
<ul style = 'font-size:16px;font-family:Arial'>
        <li>Open Table Format Reference:
        <a href = 'https://docs.teradata.com/search/all?query=Open+table+format&content-lang=en-US'>
        Open Table Format Documentation</a></li>
  
</ul>

<footer style="padding-bottom:35px; border-bottom:3px solid #91A0Ab">
    <div style="float:left;margin-top:14px">ClearScape Analytics™</div>
    <div style="float:right;">
        <div style="float:left; margin-top:14px">
            Copyright © Teradata Corporation - 2026. All Rights Reserved
        </div>
    </div>
</footer>