<header style="padding:1px;background:#f9f9f9;border-top:3px solid #00b2b1"><img id="Teradata-logo" src="https://www.teradata.com/Teradata/Images/Rebrand/Teradata_logo-two_color.png" alt="Teradata" width="220" align="right" />

<b style = 'font-size:28px;font-family:Arial;color:#E37C4D'>Teradataml Python Basics</b>
</header>

<table>
    <tr>
        <td width = '50%' style = 'vertical-align:top;font-size:16px;font-family:Arial'><br><p>Teradataml is a library that gives Python programmers and developers access to the power of Vantage.</p>
            <p>Developers can use common data management functions and methods modeled on pandas. Teradataml translates Python operations into the underlying SQL so that processing runs directly on the Vantage system, without costly data movement.</p>
        <p>Additionally, teradataml provides simple functions for accessing machine learning, open analytics frameworks, and powerful advanced analytical capabilities.</p></td>
        <td width = '50%'><img src = 'images/Functional_Diagram.png'></td>
    </tr>
    </table>




<p style = 'font-size:16px;font-family:Arial'>This notebook covers the basics of the teradataml package. It is a technical demonstration of teradataml functionality, not a business-outcome demo. For more detail, see the online Getting Started Guide <a href = 'https://docs.teradata.com/r/Teradata-Package-for-Python-User-Guide/November-2021/Introduction-to-Teradata-Package-for-Python'>here</a>.</p>

<p style = 'font-size:18px;font-family:Arial;color:#E37C4D'><b>Contents</b></p>
<ol style = 'font-size:16px;font-family:Arial'>
<li>Connecting to Vantage
    <ul>
        <li>Initiate a connection to Vantage</li>
    </ul>
</li>    
<li>Teradataml Basics
    <ul>
        <li>Create a Teradata DataFrame (virtual DataFrame)</li>
        <li>Aggregations</li>
        <li>Transformations</li>
        <li>SQL Functions</li>
        <li>Joins</li>
        <li>Bring the data to the client - Pandas</li>
        <li>Cleanup</li>
    </ul>
</li>
</ol>
<hr>

<p style = 'font-size:18px;font-family:Arial;color:#E37C4D'><b>Install and Import the necessary libraries</b></p>



<p style = 'font-size:16px;font-family:Arial'>Upgrade to the latest version of teradataml before importing it.</p>

<p style = 'font-size:14px;font-family:Arial'>Note: '%%capture' suppresses the installation output of the following packages.</p>

In [None]:
%%capture
!pip install --upgrade teradataml --user

In [None]:
# getpass prompts for the password so it is not stored in plain text in the notebook
import getpass
import pandas as pd

# import all Teradataml functions and supporting libraries
from teradataml import *
from teradataml import create_context, DataFrame, get_context, copy_to_sql, in_schema, remove_context
from teradataml.table_operators.Script import Script
from sqlalchemy import func

<b style = 'font-size:20px;font-family:Arial;color:#E37C4D'>Accessing the Data</b>
<p style = 'font-size:16px;font-family:Arial'>These demos work either with foreign tables accessed from cloud storage via NOS or with tables imported to your machine. If you import data for multiple demos, you may need to use the Data Dictionary "Manage Your Space" routine to clean up tables you no longer need.</p>
    
<p style = 'font-size:16px;font-family:Arial'>Use the links below to access the two options for using data from the Data Dictionary notebook:</p>

[Click Here to get data for this notebook](../Data_Dictionary/Data_Dictionary.ipynb#TRNG_DataScienceExploration)

[Click Here to Manage Your Space](../Data_Dictionary/Data_Dictionary.ipynb#Manage_Your_Space)

<p style = 'font-size:20px;font-family:Arial;color:#E37C4D'><b>Initiate a connection to Vantage.</b></p>
<p style = 'font-size:16px;font-family:Arial'>You will be prompted to enter the password. Enter the password to connect to Vantage, then click on the next cell to continue.</p>

In [None]:
eng = create_context(host = 'host.docker.internal', username='demo_user', password = getpass.getpass())
print(eng)

<hr>
<b style = 'font-size:18px;font-family:Arial;color:#E37C4D'>Create a Teradata DataFrame (virtual DataFrame)</b>
<p style = 'font-size:16px;font-family:Arial'>A teradataml DataFrame is a reference to a table or a SQL statement in the target Vantage system. No data is copied back to the client; all operations on the data are translated to SQL and executed in Vantage.</p>

In [None]:
tdf = DataFrame('"TRNG_DataScienceExploration"."HOUSE_PRICES"')

<p style = 'font-size:16px;font-family:Arial'>Extract a few rows. Note that only the rows needed to satisfy the head() method are returned to the client.</p>

In [None]:
tdf.head()

<p style = 'font-size:16px;font-family:Arial'>Look at the underlying query by using the show_query() method.</p>

In [None]:
tdf.head().show_query()

<hr>
<b style = 'font-size:18px;font-family:Arial;color:#E37C4D'>Aggregations</b>
<p style = 'font-size:16px;font-family:Arial'>Various aggregations are available for grouping, windowing, time series, etc.</p>

In [None]:
tdf_group =  tdf.select(["bedrooms", "price"])

In [None]:
# simple groupby:
tdf_group.groupby('bedrooms').sum()

<p style = 'font-size:16px;font-family:Arial'>The above output shows the sum of the price grouped by the number of bedrooms.</p>
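<p style = 'font-size:16px;font-family:Arial'>To make the translation to SQL more concrete, here is a minimal sketch of a query with the same general shape as the groupby above. It uses Python's built-in sqlite3 module as a stand-in for Vantage; the sample rows and the lowercase table name are hypothetical, and the SQL that teradataml actually generates may differ (show_query() displays the real statement).</p>

```python
import sqlite3

# Hypothetical sample rows (bedrooms, price); not taken from HOUSE_PRICES
rows = [(3, 300000.0), (3, 350000.0), (2, 200000.0), (4, 500000.0)]

# An in-memory SQLite database stands in for Vantage in this sketch
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE house_prices (bedrooms INTEGER, price REAL)")
conn.executemany("INSERT INTO house_prices VALUES (?, ?)", rows)

# One output row per distinct bedrooms value, with the prices summed
query = """
SELECT bedrooms, SUM(price) AS sum_price
FROM house_prices
GROUP BY bedrooms
ORDER BY bedrooms
"""
totals = conn.execute(query).fetchall()
conn.close()

for bedrooms, sum_price in totals:
    print(bedrooms, sum_price)
```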

In [None]:
tdf_count = tdf.select(["bedrooms", "id","bathrooms"])

In [None]:
# Group by bedrooms and aggregate with the agg() method
# Valid aggregation values are 'count', 'sum', 'min', 'max', 'mean', 'std', 'percentile', 'unique', 'median', 'var'
tdf_count.groupby('bedrooms').agg(['count','min'])

<p style = 'font-size:16px;font-family:Arial'>The output is grouped by bedrooms. For each number of bedrooms, it shows the count and the minimum of the id and bathrooms columns; the count of id is the number of properties with that many bedrooms.</p>

<hr>
<b style = 'font-size:18px;font-family:Arial;color:#E37C4D'>Transformations</b>
<p style = 'font-size:16px;font-family:Arial'>The assign() method creates new columns from the result of an expression.</p>

In [None]:
# As with pandas, assign() returns a new teradataml DataFrame
# To keep the new column, assign the result back to the same variable, e.g. tdf = tdf.assign(...)

tdf.assign(price_per_bed = tdf['price'] / tdf['bedrooms'])

<p style = 'font-size:16px;font-family:Arial'>Scroll to the rightmost column to see the calculated column.</p>

<hr>
<b style = 'font-size:18px;font-family:Arial;color:#E37C4D'>SQL Functions</b>
<p style = 'font-size:16px;font-family:Arial'>Through its SQLAlchemy extension, teradataml supports the following categories of SQL functions:</p>
<ul style = 'font-size:16px;font-family:Arial'>
    <li>Aggregate Functions: AVG, CORR, COUNT, COVAR_POP, COVAR_SAMP, etc. (23 functions in total)</li>
    <li>Arithmetic, Hyperbolic and Trigonometric Functions: ABS, CASE_N, CEIL, DEGREES, RADIANS, etc. (32 functions in total)</li>
    <li>Bit Byte Manipulation Functions: BITAND/OR/NOT/XOR, COUNTSET, GETBIT, ROTATELEFT, SETBIT, etc. (13 functions in total)</li>
    <li>Built-In Functions: CURRENT_DATE, CURRENT_TIME, CURRENT_TIMESTAMP</li>
    <li>Hash Related Functions: HASHAMP, HASHBAKAMP, HASHBUCKET, HASHROW</li>
    <li>Regular Expression Functions: REGEXP_SUBSTR, REGEXP_REPLACE, REGEXP_INSTR, REGEXP_SIMILAR</li>
    <li>String Functions: ASCII, CHAR2HEXINT, CHR, CONCAT, EDITDISTANCE, etc. (27 functions in total)</li>
    <li>Window Aggregate Functions: CSUM, CUME_DIST, DENSE_RANK, FIRST_VALUE, LAST_VALUE, etc. (18 functions in total)</li>
</ul>

<p style = 'font-size:16px;font-family:Arial'><a href = 'https://docs.teradata.com/r/Teradata-Package-for-Python-User-Guide/November-2021/teradataml-Extension-with-SQLAlchemy/Accessing-Vantage-SQL-Functions/Supported-SQL-Functions' >SQL Functions</a></p>


In [None]:
# Pearson Correlation Coefficient - in this example, for our data set,
# what's the correlation between price and square footage?

corr_func = func.corr(tdf['price'].expression, tdf['sqft_living'].expression)


# drop_columns = True keeps only the newly assigned column in the result
df_corr = tdf.assign(drop_columns = True, corr_ = corr_func)

print(df_corr)

<p style = 'font-size:16px;font-family:Arial'>The corr function returns the sample Pearson product-moment correlation coefficient of its arguments over all data point pairs in which neither value is null. Here it gives the correlation coefficient between price and square footage.</p>
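<p style = 'font-size:16px;font-family:Arial'>As a rough illustration of what this coefficient measures, the sketch below computes the sample Pearson correlation in plain Python, skipping any pair with a missing value. The helper name pearson_corr and the sample values are hypothetical; this is not how Vantage implements the function, only the textbook formula applied on the client.</p>

```python
import math


def pearson_corr(xs, ys):
    """Sample Pearson correlation over pairs where both values are present."""
    # Drop pairs with a missing value, mirroring the non-null pair rule
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two complete pairs")

    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n

    # Sums of cross-products and squared deviations; the 1/(n-1) factors cancel
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant column")

    return cov / math.sqrt(var_x * var_y)


# Hypothetical values, not taken from HOUSE_PRICES; the None pair is skipped
sqft = [1000, 1500, None, 2000, 2500]
price = [200000, 290000, 310000, 410000, 500000]
r = pearson_corr(sqft, price)
print(round(r, 4))
```

<p style = 'font-size:16px;font-family:Arial'>A value close to 1 indicates a strong positive linear relationship, a value close to -1 a strong negative one, and a value near 0 little linear relationship.</p>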

<hr>
<b style = 'font-size:18px;font-family:Arial;color:#E37C4D'>Joins</b>
<p style = 'font-size:16px;font-family:Arial'>pandas-style joins can be used to create SQL joins in Vantage.</p>

In [None]:
# Create a new dataframe using a SQL statement

qry = '''
SELECT id,
CASE 
    WHEN waterfront = 0 THEN 'no'
    ELSE 'yes'
END str_waterfront
FROM "TRNG_DataScienceExploration"."HOUSE_PRICES"
'''
tdf_waterfront = DataFrame.from_query(qry)
tdf_waterfront.head()

<p style = 'font-size:16px;font-family:Arial'>The original DataFrame (tdf) and the DataFrame created from the query above (tdf_waterfront) are joined on the 'id' column. The result is stored in tdf_joined_data.</p>

In [None]:
# Join the two DataFrames on 'id'; lsuffix and rsuffix disambiguate the duplicated 'id' column

tdf_joined_data = tdf.join(tdf_waterfront, on = 'id', how = 'left', rsuffix = '_r', lsuffix = '_l')
tdf_joined_data.head()
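<p style = 'font-size:16px;font-family:Arial'>For readers who think in SQL, here is a minimal sketch of a left join with the same general shape as the two cells above, using Python's built-in sqlite3 module as a stand-in for Vantage. The sample rows, the lowercase table name, and the l/r aliases are hypothetical, and the SQL that teradataml actually generates may differ.</p>

```python
import sqlite3

# An in-memory SQLite database stands in for Vantage in this sketch
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE house_prices (id INTEGER, price REAL, waterfront INTEGER)"
)
# Hypothetical sample rows; not taken from HOUSE_PRICES
conn.executemany(
    "INSERT INTO house_prices VALUES (?, ?, ?)",
    [(1, 250000.0, 0), (2, 900000.0, 1), (3, 410000.0, 0)],
)

# The right side derives a yes/no flag with CASE; the left join keeps
# every row of the left table and matches rows on id
query = """
SELECT l.id AS l_id, l.price, r.str_waterfront
FROM house_prices AS l
LEFT JOIN (
    SELECT id,
           CASE WHEN waterfront = 0 THEN 'no' ELSE 'yes' END AS str_waterfront
    FROM house_prices
) AS r
ON l.id = r.id
ORDER BY l.id
"""
joined = conn.execute(query).fetchall()
conn.close()

for row in joined:
    print(row)
```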

<hr>
<b style = 'font-size:18px;font-family:Arial;color:#E37C4D'>Bring the data to the client - Pandas</b>
<p style = 'font-size:16px;font-family:Arial'>The to_pandas() method creates a pandas DataFrame from a teradataml DataFrame, copying the data from Vantage to the client. Passing all_rows = True retrieves every row, so use it only when the result fits comfortably in client memory.</p>

In [None]:
df = tdf_joined_data.to_pandas(all_rows = True)

In [None]:
tdf_joined_data.head()

In [None]:
tdf_joined_data.groupby('bedrooms').agg({'_l_id': 'count', 'price': 'sum'})

In [None]:
df['price'].plot(kind = 'hist', bins = 10)

<hr>
<b style = 'font-size:18px;font-family:Arial;color:#E37C4D'>Cleanup</b>
<p style = 'font-size:16px;font-family:Arial'>It is good practice to remove the context that we created to connect to Vantage. The remove_context() function removes the current context associated with the Vantage connection. It not only closes the connection but also garbage collects the intermediate views and tables created by teradataml, which is why Teradata recommends calling it to end a session.</p>

In [None]:
remove_context()

<footer style="padding:10px;background:#f9f9f9;border-bottom:3px solid #394851">©2023 Teradata. All Rights Reserved</footer>