<header>
   <p  style='font-size:36px;font-family:Arial; color:#F0F0F0; background-color: #00233c; padding-left: 20pt; padding-top: 20pt;padding-bottom: 10pt; padding-right: 20pt;'>
      tdplyr R Basics
  <br>
       <img id="teradata-logo" src="https://storage.googleapis.com/clearscape_analytics_demo_data/DEMO_Logo/teradata.svg" alt="Teradata" style="width: 125px; height: auto; margin-top: 20pt;">
    </p>
</header>
<p style = 'font-size:16px;font-family:Arial'>The Teradata Package for R (tdplyr) is an open-source R package that combines the benefits of the open-source R language environment with the massively parallel processing capabilities of Teradata Vantage, including the Machine Learning Engine analytic functions and the Analytics Database in-database analytic functions. It allows users to develop and run R programs that take advantage of the Big Data and Machine Learning analytics capabilities of Vantage, and it can be used alongside other open-source R packages. The tdplyr package also works with the functions of the dbplyr package and most of the verbs of the dplyr package.</p>
  
<p style = 'font-size:16px;font-family:Arial'>This notebook covers the basics of the tdplyr package. It is a technical demonstration of tdplyr functionality, not a business-outcome demo. Please see the Getting Started Guide online <a href = 'https://docs.teradata.com/search/all?query=Introduction+to+Teradata+Package+for+R&filters=prodname~%2522Teradata+Package+for+R%2522&content-lang=en-US'>here.</a></p>

<hr style="height:2px;border:none;">
<p style = 'font-size:20px;font-family:Arial'><b>1. Install and load the necessary Packages and Libraries</b></p>
<p style = 'font-size:16px;font-family:Arial'>
Installing the package only needs to be done once for this environment.
<br><br>
To install the package, copy this line into the clipboard:
<blockquote><i>
Rscript -e "install.packages('tdplyr',repos=c('https://r-repo.teradata.com','https://cloud.r-project.org'))"
</i></blockquote> </p>
<p style = 'font-size:16px;font-family:Arial'>Open a terminal window by selecting File --> New --> Terminal. Then paste the line at the command prompt and press Enter. Installation takes a few minutes and is finished when you are returned to a command prompt similar to <b>(base) jovyan@de76f2e68a54:~/JupyterLabRoot$ </b>.
</p>

    

In [None]:
# Load tdplyr and the packages it works with; suppress load-time warnings.
suppressWarnings({
  library(tdplyr)
  library(dbplyr)
  library(dplyr)
  library(DBI)
})
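
<p style = 'font-size:16px;font-family:Arial'>If one of the library() calls fails, it helps to know which package is missing. The following is a minimal sketch, using only base R, that checks whether each package is installed without attaching it. The variable names are illustrative and not part of tdplyr.</p>

```r
# Packages this notebook loads; the list mirrors the library() calls above.
required <- c("tdplyr", "dbplyr", "dplyr", "DBI")

# requireNamespace() returns FALSE instead of raising an error
# when a package is not installed.
installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)

missing_pkgs <- required[!installed]
if (length(missing_pkgs) > 0) {
  message("Not installed: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All required packages are installed.")
}
```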

<p style = 'font-size:16px;font-family:Arial'>The help() function lists the documentation available for the tdplyr package.</p>

In [None]:
help(package='tdplyr')

<p style = 'font-size:16px;font-family:Arial'>The following code reports the installed version of the tdplyr package.</p>

In [None]:
packageVersion("tdplyr")

<hr style="height:2px;border:none;">
<p style = 'font-size:20px;font-family:Arial'><b>2. Connect to Vantage</b></p>
<p style = 'font-size:16px;font-family:Arial'>You will be prompted to provide the password. Enter your password, press the Enter key, then use the down arrow to move to the next cell.</p>

In [None]:
con <- td_create_context(host = 'host.docker.internal', uid = "demo_user", pwd = getPass("Enter your password: "), dType = "NATIVE", logmech = "TD2")

<p style = 'font-size:16px;font-family:Arial'>The following code sets the query band for the session.</p>

In [None]:
dbExecute(con,"SET query_band='DEMO=tdplyr_R_Basics.ipynb;' UPDATE FOR SESSION;") 
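
<p style = 'font-size:16px;font-family:Arial'>A query band is a string of name=value pairs, each terminated by a semicolon. The sketch below builds the same statement from a named list, which may be convenient when more than one pair is needed. The helper make_query_band_sql() is hypothetical and not part of tdplyr or DBI, and it assumes the names and values contain no single quotes or semicolons.</p>

```r
# Hypothetical helper: builds a SET query_band statement from a named list.
make_query_band_sql <- function(pairs) {
  stopifnot(is.list(pairs), !is.null(names(pairs)), all(nzchar(names(pairs))))
  # Each pair becomes "name=value;" and the pairs are concatenated.
  band <- paste0(names(pairs), "=", unlist(pairs), ";", collapse = "")
  sprintf("SET query_band='%s' UPDATE FOR SESSION;", band)
}

sql <- make_query_band_sql(list(DEMO = "tdplyr_R_Basics.ipynb"))
print(sql)

# With a live connection `con`, the statement would be run as:
# dbExecute(con, sql)
```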

<hr style="height:2px;border:none;">
<p style = 'font-size:20px;font-family:Arial'><b>3. Getting Data for This Demo</b></p>
<p style = 'font-size:16px;font-family:Arial'>We have provided data for this demo on cloud storage. You can either run the demo using foreign tables, which access the data without using any storage in your environment, or download the data to local storage, which may yield somewhat faster execution but requires available storage. The following cell loads the data into local storage.</p>

In [None]:
dbExecute(con,"call get_data('DEMO_DataScienceExploration_local');") 
# Loading the data takes about 2 minutes.

<hr style="height:2px;border:none;">
<p style = 'font-size:20px;font-family:Arial'><b>4. Creating a tibble from a Table in Vantage</b></p>
<p style = 'font-size:16px;font-family:Arial'>The tbl() function of the "dplyr" package creates a tdplyr tibble from an existing table in a database in Vantage.<br>
A tdplyr tibble resembles an R tibble, that is, a form of data frame. The difference is that a tdplyr tibble refers to a remote source: its data stays in a database in Vantage, and the printed output of tbl() identifies that source.</p>

In [None]:
tdf <- tbl(con, in_schema("DEMO_DataScienceExploration", "House_Prices"))
tdf
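
<p style = 'font-size:16px;font-family:Arial'>To see what "remote source" means without a Vantage connection, the sketch below uses lazy_frame() and simulate_teradata() from dbplyr to build a stand-in for a remote table. The column names and values are made up for illustration, and dbplyr's simulated Teradata backend is an assumption here; an object created by tdplyr may carry additional classes.</p>

```r
library(dplyr)
library(dbplyr)

# lazy_frame() creates a remote-style tibble with no database behind it;
# simulate_teradata() selects dbplyr's Teradata SQL dialect.
houses <- lazy_frame(
  bedrooms = c(2L, 3L),
  price = c(100000, 150000),
  con = simulate_teradata()
)

# A lazy tibble is a description of a query, not an in-memory data frame.
is_lazy <- inherits(houses, "tbl_lazy")
is_local <- is.data.frame(houses)
print(c(is_lazy = is_lazy, is_local = is_local))
```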

<p style = 'font-size:16px;font-family:Arial'>The glimpse() function lists each column of the tibble with its type and its first few values.</p>

In [None]:
glimpse(tdf)

<p style = 'font-size:16px;font-family:Arial'>The following code returns the total row count of the tibble.</p>

In [None]:
td_nrow(tdf)

<p style = 'font-size:16px;font-family:Arial'>The head() function returns the first few rows of the tibble.</p>

In [None]:
tdf %>% head()

<hr style="height:2px;border:none;">
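<p style = 'font-size:20px;font-family:Arial'><b>5. Aggregations</b></p>
<p style = 'font-size:16px;font-family:Arial'>Various aggregations are available for grouping, windowing, time series, etc. The following code groups the rows by number of bedrooms and computes the mean price and the row count for each group.</p>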

In [None]:
tdf %>% group_by(bedrooms) %>% summarize(mean = mean(price), n = n())
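
<p style = 'font-size:16px;font-family:Arial'>Because the tibble is remote, a pipeline like this is translated to SQL and executed in the database. The sketch below shows the translation step in isolation, using dbplyr's simulated Teradata backend and a made-up stand-in for the table, so it runs without a connection. The SQL that tdplyr generates against a live connection may differ in detail.</p>

```r
library(dplyr)
library(dbplyr)

# Illustrative stand-in for the House_Prices table; values are made up.
houses <- lazy_frame(bedrooms = 3L, price = 250000, con = simulate_teradata())

agg <- houses %>%
  group_by(bedrooms) %>%
  summarize(mean = mean(price, na.rm = TRUE), n = n())

# Print the SQL that dbplyr would send instead of running it.
show_query(agg)
```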

<p style = 'font-size:16px;font-family:Arial'>
<i>mutate(price_per_bed = price / bedrooms) </i><br>
adds a new column, price_per_bed, to the tdf tibble, containing the price column divided by the bedrooms column. The underlying table in Vantage is not modified; the column is computed when the tibble is queried.</p>

In [None]:
tdf <- tdf %>% mutate(price_per_bed = price / bedrooms)
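
<p style = 'font-size:16px;font-family:Arial'>If any row had zero bedrooms, a division like this would typically fail in the database with a division-by-zero error. One possible guard, sketched below on a made-up stand-in table with dbplyr's simulated Teradata backend, is to turn zeros into NULL with na_if(), which dbplyr translates to NULLIF. Whether tdplyr translates na_if() the same way is an assumption to verify against a live connection.</p>

```r
library(dplyr)
library(dbplyr)

# Illustrative stand-in table that includes a zero-bedroom row.
houses <- lazy_frame(
  bedrooms = c(0L, 3L),
  price = c(90000, 240000),
  con = simulate_teradata()
)

# na_if(bedrooms, 0L) yields NULL for zero, so the division returns NULL
# for that row instead of raising an error.
safe <- houses %>%
  mutate(price_per_bed = price / na_if(bedrooms, 0L))

show_query(safe)
```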

<p style = 'font-size:16px;font-family:Arial'>
Print all columns and a specified number of rows (for example, 10).</p>

In [None]:
print(tdf, n = 10, width = Inf)

<hr style="height:2px;border:none;">
<p style = 'font-size:20px;font-family:Arial'><b>6. Cleanup</b></p>
<p style = 'font-size:16px;font-family:Arial'>The following code will clean up tables and databases created above.</p>

In [None]:
dbExecute(con,"call remove_data('DEMO_DataScienceExploration');") 

<hr style="height:1px;border:none;">
<p style = 'font-size:16px;font-family:Arial'>It is good practice to remove the context created to connect to Vantage. The td_remove_context() function removes the current context associated with the Vantage connection: it closes the connection and garbage collects the intermediate views and tables created by tdplyr. Teradata recommends calling td_remove_context() to end a session.</p>

In [None]:
td_remove_context()

<p style = 'font-size:20px;font-family:Arial'><b>Downloading and Installing the Teradata R package</b></p>
<p style = 'font-size:16px;font-family:Arial'>The Teradata Package for R, tdplyr, is available from the Teradata GitHub site <a href = 'https://github.com/Teradata/tdplyr'>here.</a> <br>
Run the following command from a Terminal (Linux and Mac) or Command Prompt (Windows). It downloads and installs the tdplyr package and its dependencies from the Teradata R repository and the CRAN repository.
<blockquote><i>
Rscript -e "install.packages('tdplyr',repos=c('https://r-repo.teradata.com','https://cloud.r-project.org'))"
</i></blockquote> </p>

    

<footer style="padding-bottom:35px; border-bottom:3px solid #91A0Ab">
    <div style="float:left;margin-top:14px">ClearScape Analytics™</div>
    <div style="float:right;">
        <div style="float:left; margin-top:14px">
            Copyright © Teradata Corporation - 2024. All Rights Reserved
        </div>
    </div>
</footer>