<header style="padding:1px;background:#f9f9f9;border-top:3px solid #00b2b1"><img id="Teradata-logo" src="https://www.teradata.com/Teradata/Images/Rebrand/Teradata_logo-two_color.png" alt="Teradata" width="220" align="right" />

<b style = 'font-size:28px;font-family:Arial;color:#E37C4D'>Vantage SQL Plugin Basics</b>
</header>

<p style = 'font-size:16px;font-family:Arial'>Welcome to this introductory guide, which walks you through your first SQL queries. A more extensive notebook covering more commands can be found online in the Teradata Vantage Modules for Jupyter <a href = 'https://teradata.github.io/jupyterextensions/#/'>here</a>.</p>

<hr>
<p style = 'font-size:16px;font-family:Arial'>For help on available "magic" commands in the Teradata SQL plugin, run the following cell:</p>

In [None]:
%help

<hr>
<b style = 'font-size:18px;font-family:Arial;color:#E37C4D'>Object Browser</b>

Included with the plugin is a browser that allows you to view the databases and tables on your ClearScape Analytics Experience platform. Note that if you haven't run any demos or used the data dictionary, you will only see system tables.

<ol style = 'font-size:16px;font-family:Arial'>
    <li>Open a launcher (new or existing): select <b>File/New Launcher</b>, or switch to a launcher tab that is already open</li>
<li>Click Navigator</li>
<li>Select database profile</li>
</ol>


<hr>
<b style = 'font-size:18px;font-family:Arial;color:#E37C4D'>Introduction to SQL</b>

<p style = 'font-size:16px;font-family:Arial'>This page is a Jupyter Notebook. Looking at the top-right corner, you can see that it is running a <i>Teradata SQL</i> kernel. Other kernels include R and Python. This means that every command in this notebook has to be a SQL statement or one of the plugin's magic commands, such as <b>%help</b>.</p>

<p style = 'font-size:16px;font-family:Arial'>The first step with any SQL editor or Jupyter Notebook is to connect to a Teradata system.</p>

<p style = 'font-size:16px;font-family:Arial'>If this notebook is being run from the Vantage LIVE image, three connections should already be set up, representing connections to Transcend Production, Transcend AWS, and Vantage LIVE (Azure). The first command below lists the configured connections; the second connects to the one named <b>local</b>.</p>

In [None]:
%lsconnect

In [None]:
%connect local

<p style = 'font-size:16px;font-family:Arial'>You are connected if the last command returned <code>Success:... connection established</code>.</p>

<p style = 'font-size:16px;font-family:Arial'>The simplest SQL you can run calculates a formula, for example 2+5. Click on the cell below and press Shift+Return to execute its content. The result is also saved in the Teradata directory, at the exact path shown below the result, and can be accessed or downloaded from the file browser on the left.</p>

In [None]:
SELECT 2+5

<p style = 'font-size:16px;font-family:Arial'>Now, let's run your first real SQL query. Click in the cell below and press Shift+Return to execute it. It will retrieve ten rows from a table named DBC.TablesV.</p>

In [None]:
SELECT top 10 * FROM DBC.TablesV;

<p style = 'font-size:16px;font-family:Arial'>
    This is about the simplest SQL query there is. <b>SELECT</b> is the SQL keyword that begins the list of columns we want to extract; here <b>*</b> means all columns, and <b>top 10</b> limits the result to ten rows. <b>FROM</b> is the SQL keyword that introduces the source of the data. <b>DBC.TablesV</b> is the table from which we want to extract the columns.</p>

<p style = 'font-size:16px;font-family:Arial'>
    Instead of pressing Shift+Return, you can also click the <b>Execute</b> button (the triangular Play button in the top toolbar) to run the SQL query on our Vantage environment.</p>

<p style = 'font-size:16px;font-family:Arial'>After a few seconds you will find the results in the result window. You can select any cell, press Ctrl-C to copy the data, and then press Ctrl-V in Excel to paste the data for further analysis (filtering, pivoting, etc.).</p>

<p style = 'font-size:16px;font-family:Arial'>The results include the following columns:</p>

<ul style = 'font-size:16px;font-family:Arial'>
    <li>DatabaseName, name of the database a table is in</li>
<li>TableName, name of the table</li>
<li>Version</li>
<li>TableKind</li>
<li>ProtectionType</li>
<li>JournalFlag</li>
<li>CreatorName</li>
<li>RequestText</li>
<li>CommentString</li>
<li>ParentCount</li>
<li>ChildCount</li>
<li>...</li>
</ul>

<p style = 'font-size:16px;font-family:Arial'>To select specific columns, we list them after the <b>SELECT</b> keyword, separated by commas. For readability, we enter each one on a new line:</p>

In [None]:
SELECT top 5
    TableName,
    DatabaseName
FROM
    DBC.TablesV;

<hr>
<b style = 'font-size:18px;font-family:Arial;color:#E37C4D'>Ordering</b>

<p style = 'font-size:16px;font-family:Arial'>We can order the data by table name:</p>

In [None]:
SELECT top 5 TableName, DatabaseName FROM DBC.TablesV ORDER BY TableName;

<p style = 'font-size:16px;font-family:Arial'>Here we added the SQL keyword <b>ORDER BY</b>, followed by the column by which we want to order the data.</p>

<p style = 'font-size:16px;font-family:Arial'>To order in descending order, we add the SQL keyword <b>DESC</b> after the column name:</p>

In [None]:
SELECT top 5 TableName, DatabaseName FROM DBC.TablesV ORDER BY TableName DESC;

<p style = 'font-size:16px;font-family:Arial'>We can order using multiple columns, for example:</p>

In [None]:
SELECT
    top 50 *
FROM
    DBC.TablesV
ORDER BY DatabaseName DESC, TableName 

<hr>
<b style = 'font-size:18px;font-family:Arial;color:#E37C4D'>Filtering</b>

<p style = 'font-size:16px;font-family:Arial'>We can filter for specific values with the SQL keyword <b>WHERE</b>. For example:</p>

In [None]:
SELECT top 50 * FROM DBC.TablesV
WHERE TableKind = 'T'
ORDER BY DatabaseName DESC, TableName 
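
<p style = 'font-size:16px;font-family:Arial'>The value <b>'T'</b> in the filter above keeps only tables. As a minimal variation, assuming <b>'V'</b> is the TableKind code for views on your system, the same kind of filter could list views instead, returning only a few columns so the result is easier to read:</p>

```sql
-- Sketch: list views rather than tables.
-- Assumption: TableKind = 'V' identifies views in DBC.TablesV.
SELECT TOP 10
    DatabaseName,
    TableName,
    TableKind
FROM DBC.TablesV
WHERE TableKind = 'V'
ORDER BY DatabaseName, TableName;
```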

<p style = 'font-size:16px;font-family:Arial'>We use the SQL keywords <b>AND</b> and <b>OR</b> with parentheses to define any logic. For example, to select only the tables whose CreatorName is DBC or SYSADM:</p>

In [None]:
SELECT top 50 * FROM DBC.TablesV
WHERE (TableKind = 'T' AND CreatorName = 'DBC')
  OR (TableKind = 'T' AND CreatorName = 'SYSADM')
ORDER BY DatabaseName DESC, TableName 
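
<p style = 'font-size:16px;font-family:Arial'>Both branches of the <b>OR</b> above repeat the condition on TableKind. As a sketch of an equivalent filter, the shared condition can be written once and the two creator names tested with the standard SQL <b>IN</b> operator (an operator this guide does not otherwise cover):</p>

```sql
-- Sketch: same logic as the AND/OR query above.
-- The shared TableKind condition is factored out, and IN replaces
-- the two separate CreatorName comparisons.
SELECT TOP 50 *
FROM DBC.TablesV
WHERE TableKind = 'T'
  AND CreatorName IN ('DBC', 'SYSADM')
ORDER BY DatabaseName DESC, TableName;
```

<p style = 'font-size:16px;font-family:Arial'>This form should return the same rows and tends to be easier to extend when more creator names are added.</p>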

<hr>
<b style = 'font-size:18px;font-family:Arial;color:#E37C4D'>Aggregation</b>

<p style = 'font-size:16px;font-family:Arial'>We can aggregate multiple rows together, much as pivot tables do in Excel. For example, to count how many objects (tables, views, macros, stored procedures, functions, etc.) each database in Vantage contains, we would do:</p>


In [None]:
SELECT top 20 DatabaseName, count(*) as ObjectCount FROM DBC.TablesV
GROUP BY DatabaseName  
ORDER BY DatabaseName
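
<p style = 'font-size:16px;font-family:Arial'><b>GROUP BY</b> also accepts more than one column. As a sketch, the query below breaks the same count down by object kind within each database, using the TableKind column seen in the filtering examples. The alias ObjectCount is only illustrative:</p>

```sql
-- Sketch: count objects per database and per object kind.
-- Every non-aggregated column in the SELECT list must also
-- appear in the GROUP BY clause.
SELECT TOP 20
    DatabaseName,
    TableKind,
    COUNT(*) AS ObjectCount
FROM DBC.TablesV
GROUP BY DatabaseName, TableKind
ORDER BY DatabaseName, TableKind;
```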


<p style = 'font-size:16px;font-family:Arial'>In the <b>SELECT</b> example below, we use a subquery to total the space used by each object within a database, then we use aggregation functions to get the minimum and maximum object size and the object count for each database. (The counts differ from those above because views and macros take no space.)</p>

<p style = 'font-size:16px;font-family:Arial'>There are multiple aggregation functions:</p>

<ul style = 'font-size:16px;font-family:Arial'>
    <li>SUM(X) to sum all the X values</li>
<li>AVG(X) for the average of X</li>
<li>MIN(X) for the smallest X value</li>
<li>MAX(X) for the largest X value</li>
</ul>

<p style = 'font-size:16px;font-family:Arial'>For example:</p>

In [None]:
SELECT
    TOP 20 DatabaseName,
    MIN(TableBytes) AS MinSize,
    MAX(TableBytes) AS MaxSize,
    COUNT(*) AS #Objects_Using_Space
FROM (
    SELECT
        DatabaseName,
        TableName,
        SUM(CurrentPerm) AS TableBytes
    FROM DBC.TableSizeV
    GROUP BY DatabaseName, TableName
    ) AS Subtotals
GROUP BY DatabaseName
ORDER BY DatabaseName;


<p style = 'font-size:16px;font-family:Arial'>For courageous readers, you can also try the following aggregation functions:</p>

<ul style = 'font-size:16px;font-family:Arial'>
<li>STDDEV_POP(X) and STDDEV_SAMP(X) to measure the standard deviation of the X distribution (for the full population, and for a sampled population)</li>
<li>VAR_POP(X) and VAR_SAMP(X) to measure the variance of the X distribution (for the full population, and for a sampled population)</li>
<li>COVAR_POP(X,Y) and COVAR_SAMP(X,Y) to measure the covariance between X and Y (for the full population, and for a sampled population)</li>
<li>CORR(X,Y) to measure the correlation between the columns X and Y</li>
<li>KURTOSIS(X) to measure the kurtosis of the X distribution <b>minus</b> 3 (the tailedness of the distribution; it should be around 0 for a Normal/Gaussian distribution)</li>
<li>SKEW(X) to measure the skewness of the X distribution (the asymmetry of the distribution; it should be around 0 for a Normal/Gaussian distribution)</li>
</ul>
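
<p style = 'font-size:16px;font-family:Arial'>As a sketch of how some of these functions might be applied, the query below reuses the per-table size subquery from the previous example and describes the distribution of table sizes across the whole system. The alias names are illustrative, and the values returned depend entirely on what is stored on your system:</p>

```sql
-- Sketch: summary statistics of table sizes (in bytes).
-- The subquery totals CurrentPerm per table, as in the earlier example;
-- the outer query has no GROUP BY, so it returns a single row.
SELECT
    COUNT(*)                AS TableCount,
    AVG(TableBytes)         AS AvgSize,
    STDDEV_SAMP(TableBytes) AS StdDevSize,
    SKEW(TableBytes)        AS SizeSkew,
    KURTOSIS(TableBytes)    AS SizeKurtosis
FROM (
    SELECT
        DatabaseName,
        TableName,
        SUM(CurrentPerm) AS TableBytes
    FROM DBC.TableSizeV
    GROUP BY DatabaseName, TableName
    ) AS Subtotals;
```

<p style = 'font-size:16px;font-family:Arial'>Table sizes are rarely distributed like a Normal/Gaussian curve, so expect the skewness and kurtosis values to be far from 0.</p>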


<footer style="padding:10px;background:#f9f9f9;border-bottom:3px solid #394851">©2022 Teradata. All Rights Reserved</footer>