<header>
   <p  style='font-size:36px;font-family:Arial; color:#F0F0F0; background-color: #00233c; padding-left: 20pt; padding-top: 20pt;padding-bottom: 10pt; padding-right: 20pt;'>
       InDB Visualizations using teradataml
  <br>
       <img id="teradata-logo" src="https://storage.googleapis.com/clearscape_analytics_demo_data/DEMO_Logo/teradata.svg" alt="Teradata" style="width: 125px; height: auto; margin-top: 20pt;">
    </p>
</header>

<p style = 'font-size:20px;font-family:Arial'><b>Introduction</b></p>

<p style = 'font-size:16px;font-family:Arial'>
Visualization is the process of representing data or information in graphical or visual formats such as charts, graphs, maps, and dashboards. The goal of visualization is to present complex data in a way that is easy to understand, allowing viewers to quickly grasp insights, patterns, and trends.<br>Visualizations leverage the human brain's ability to process visual information more efficiently than textual or numerical data alone. By encoding data into visual elements such as points, lines, bars, colors, and shapes, visualizations enable users to explore and interpret data intuitively.</p>
<p style = 'font-size:18px;font-family:Arial'><b>Business Values</b></p>
<ul style = 'font-size:16px;font-family:Arial'>
    <li>Explore data patterns intuitively.</li>
    <li>Communicate complex information effectively.</li>
    <li>Facilitate decision-making processes.</li>
    <li>Provide insights into trends and outliers.</li>
    <li>Engage stakeholders in data-driven discussions.</li>
 </ul>
<p style = 'font-size:18px;font-family:Arial'><b>Why Vantage?</b></p>  
<p style = 'font-size:16px;font-family:Arial'>
Enterprises often grapple with vast volumes of data, which makes it hard to scale data visualization efforts. The usual symptoms are slow performance and difficulty processing and interpreting large datasets. The ClearScape Analytics "td_plot" method is designed to address these specific challenges.
<br>The "td_plot" method streamlines large-scale visualization by letting users create visualizations directly within the Vantage platform. Because no data has to be moved out of the database, "td_plot" avoids the overhead of handling extensive datasets outside Vantage, so insights can be obtained quickly without compromising performance.</p>
<p style = 'font-size:16px;font-family:Arial'>
In this functional demonstration, we will see the visualizations available in ClearScape Analytics.</p>
<p style = 'font-size:16px;font-family:Arial'><b>Simple Plot</b>
<ul style = 'font-size:16px;font-family:Arial'>    
    <li>line plot</li>
    <li>bar plot</li>
    <li>scatter plot</li>
    <li>geometry plot</li>
    <li>correlation plot</li>
    <li>wiggle plot</li>
    <li>mesh plot</li>
    </ul>
<p style = 'font-size:16px;font-family:Arial'><b>Combine multiple plots</b>    
<ul style = 'font-size:16px;font-family:Arial'> 
    <li>Composite plot</li>
    <li>Subplot</li>
    </ul>
</p>    

<hr style="height:2px;border:none;">
<p style = 'font-size:20px;font-family:Arial'><b>1. Connect to Vantage</b></p>


<p style = 'font-size:16px;font-family:Arial'>In this section, we import the required libraries.</p> 

In [None]:
# Import.
import os
import getpass

import warnings
warnings.filterwarnings('ignore')
warnings.simplefilter(action='ignore', category=DeprecationWarning)
warnings.simplefilter(action='ignore', category=RuntimeWarning)
warnings.simplefilter(action='ignore', category=FutureWarning)

from teradataml import *

<p style = 'font-size:16px;font-family:Arial'>We will be prompted to provide the password. We will enter the password, press the Enter key, and then use the down arrow to go to the next cell.</p>

In [None]:
%run -i ~/JupyterLabRoot/UseCases/startup.ipynb
eng = create_context(host = 'host.docker.internal', username = 'demo_user', password = password)
print(eng)

In [None]:
%%capture
execute_sql('''SET query_band='DEMO=InDB_Visualizations_using_teradataml.ipynb;' UPDATE FOR SESSION; ''')

<p style = 'font-size:18px;font-family:Arial'> <b>Getting Data for This Demo</b></p>
<p style = 'font-size:16px;font-family:Arial'>We have provided data for this demo on cloud storage. We have the option of either running the demo using foreign tables to access the data without using any storage on our environment or downloading the data to local storage, which may yield somewhat faster execution. However, we need to consider available storage. There are two statements in the following cell, and one is commented out. We may switch which mode we choose by changing the comment string.</p>

In [None]:
%run -i ~/JupyterLabRoot/UseCases/run_procedure.py "call get_data('DEMO_Plot_local');"
# takes about 50 seconds, estimated space: 2 MB
# %run -i ~/JupyterLabRoot/UseCases/run_procedure.py "call get_data('DEMO_Plot_cloud');"
# takes about 30 seconds, estimated space: 0 MB

<p style = 'font-size:16px;font-family:Arial'>Optional step – We should execute the below step only if we want to see the status of databases/tables created and space used.</p>

In [None]:
%run -i ~/JupyterLabRoot/UseCases/run_procedure.py "call space_report();"

<hr style="height:2px;border:none;">
<p style = 'font-size:20px;font-family:Arial'><b>2. Line Plot</b></p>
<p style = 'font-size:16px;font-family:Arial'>A line plot is a visualization that connects data points with straight lines, commonly used to depict trends or relationships between two continuous variables. It is particularly useful for showing changes over time or across ordered categories, providing a clear and intuitive representation of the data's behavior.<br>Let us take a look at how we can create a line plot in Vantage. We will visualize a company's stock price over a series of periods.</p>

In [None]:
df1 = DataFrame(in_schema("DEMO_Plot", "Stock_data"))
df1

In [None]:
plot = df1.plot(x=df1.period,
                y=df1.stockprice,
                title="Stock price over the period",)
plot.show()

<hr style="height:2px;border:none;">
<p style = 'font-size:20px;font-family:Arial'><b>2. Bar Plot</b></p>
<p style = 'font-size:16px;font-family:Arial'>A bar plot is a visualization that represents categorical data with rectangular bars, where the length or height of each bar corresponds to the frequency, count, or proportion of the category it represents. It's commonly used to compare values between different categories, making it easy to identify patterns, trends, and differences within the data.<br>Let us create a bar plot to visualize the rate of change in inflation for a country over 10 years.</p>

In [None]:
df2 = DataFrame(in_schema("DEMO_Plot", "Inflation"))
df2

In [None]:
df2 = df2[df2.countryid==1]
df2.plot(x=df2.year_recorded, 
        y=df2.inflation_rate, 
        kind="bar",
        title="Change in inflation over 10 years",
        color = "orange",
        xlabel="Year",
        ylabel="Inflation Rate",
        grid_linestyle="-",
        grid_linewidth= 0.5 
        )

<hr style="height:2px;border:none;">
<p style = 'font-size:20px;font-family:Arial'><b>3. Scatter Plot</b></p>
<p style = 'font-size:16px;font-family:Arial'>A scatter plot is a visualization that displays individual data points as dots on a two-dimensional coordinate system, with one variable plotted on the x-axis and another variable plotted on the y-axis. It is commonly used to show the relationship or correlation between two continuous variables, allowing for the identification of patterns, trends, clusters, or outliers within the data. Scatter plots are valuable for visualizing the distribution and association between variables.<br>Let us create a scatter plot to visualize the Blood Pressure for different ages.</p>

In [None]:
df3 = DataFrame(in_schema("DEMO_Plot", "AgeandPressure"))
df3

In [None]:
df3.plot(x=df3.age, 
        y=df3.blood_pressure, 
        kind="scatter",
        color="red", 
        grid_color='grey',
        xlabel='Age', 
        ylabel='Blood Pressure',
        grid_linestyle="-",
        grid_linewidth= 0.5, 
        marker="o",
        markersize=7,
        title="Blood Pressure for different Ages")

<hr style="height:2px;border:none;">
<p style = 'font-size:20px;font-family:Arial'><b>4. Geometry Plot</b></p>
<p style = 'font-size:16px;font-family:Arial'>A geometry plot is generated from geospatial (geometry) data, that is, from the geometry column of a teradataml GeoDataFrame. Only columns of type ST_GEOMETRY can be used to generate a geometry plot.<br>Let us create a geometry plot to visualize the population density for all the states across the US in the year 1990.<br>
<ul style = 'font-size:16px;font-family:Arial'>Data source:
    <li>Shapes of US states are generated from Free Blank United States Map in SVG - Resources | Simplemaps.com</li>
    <li>Population data is accessed from Historical Population Change Data (1910-2020) (census.gov)</li>
    </ul></p>

In [None]:
us_population = DataFrame(in_schema("DEMO_Plot", "US_Population"))
us_population

In [None]:
us_states_shapes = GeoDataFrame(in_schema("DEMO_Plot", "US_States_Shapes"))
us_states_shapes

In [None]:
us_states_shapes.tdtypes

In [None]:
# Join shapes with population and filter only 1990 data.
population_data = us_states_shapes.join(us_population,
                                        on=us_population.state_name == us_states_shapes.state_name,
                                        lprefix="us",
                                        rprefix="t2")
population_data = population_data.select(["us_state_name", "state_shape", "population_year", "population"])
df4 = population_data[population_data.population_year == 1990]
df4

In [None]:
figure = Figure(width=1550, height=860)
# Set heading for Figure.
figure.heading = "Geometry Plot"

In [None]:
plot_1990 = df4.plot(y=(df4.population, df4.state_shape),
                       cmap='rainbow',
                       figure=figure,
                       reverse_yaxis=True,
                       title="Population Density in US for the year 1990",
                       xlabel="",
                       ylabel="")
plot_1990.show()

<hr style="height:2px;border:none;">
<p style = 'font-size:20px;font-family:Arial'><b>5. Correlation Plot</b></p>
<p style = 'font-size:16px;font-family:Arial'>A correlation plot visualizes the pairwise relationships between variables in a dataset, typically displayed as a matrix of correlation coefficients. It helps identify patterns of association, showing the strength and direction of linear relationships between variables at a glance.<br>Let us create a correlation plot that shows autocorrelation (ACF) values together with their confidence bands.</p>

In [None]:
df5 = DataFrame(in_schema("DEMO_Plot", "ACF"))
df5

In [None]:
df5.plot(x=df5.ROW_I, 
        y=(df5.OUT_v, df5.CONF_OFF_v),
        kind='corr', 
        color="orange",
        xlabel="Row_Id", 
        ylabel="computed_autocorrelation_confidence_bands",
        title="ACF Plot"
        )
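
<p style = 'font-size:16px;font-family:Arial'>To make the ACF plot easier to read, the following is a minimal sketch in plain Python of how autocorrelation coefficients and an approximate 95% confidence band can be computed. It is an illustration only: the function names and the sample series are hypothetical, and we merely assume that the <code>OUT_v</code> and <code>CONF_OFF_v</code> columns hold values of this kind. It is not how the demo table was produced.</p>

```python
import math

def autocorrelation(series, max_lag):
    """Return sample autocorrelation coefficients for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    # Sum of squared deviations; dividing by it scales every lag to [-1, 1].
    denom = sum((v - mean) ** 2 for v in series)
    coeffs = []
    for lag in range(max_lag + 1):
        num = sum((series[t] - mean) * (series[t + lag] - mean)
                  for t in range(n - lag))
        coeffs.append(num / denom)
    return coeffs

def confidence_band(n, z=1.96):
    """Approximate 95% band for a white-noise series: +/- z / sqrt(n)."""
    return z / math.sqrt(n)

# Hypothetical sample series, not taken from the demo data.
series = [1.0, 3.0, 2.0, 5.0, 4.0, 7.0, 6.0, 9.0, 8.0, 11.0]
acf = autocorrelation(series, max_lag=3)
band = confidence_band(len(series))
for lag, value in enumerate(acf):
    flag = "outside band" if abs(value) > band else "within band"
    print(f"lag={lag} acf={value:.3f} band=+/-{band:.3f} ({flag})")
```

<p style = 'font-size:16px;font-family:Arial'>Coefficients that fall outside the band are the ones usually read as statistically significant in a plot of this kind.</p>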

<hr style="height:2px;border:none;">
<p style = 'font-size:20px;font-family:Arial'><b>6. Wiggle Plot</b></p>
<p style = 'font-size:16px;font-family:Arial'>A wiggle plot is a type of visualization used in geology and seismic exploration to represent subsurface structures and seismic data. It displays seismic waveforms vertically, with each trace shifted horizontally relative to the previous one, giving a "wiggling" appearance. This technique helps geologists interpret subsurface features and identify geological formations, faults, and other structural elements.<br>Let us create a wiggle plot on sample wavelet data.</p>

In [None]:
df6 = DataFrame(in_schema("DEMO_Plot", "Wavelet"))
df6

In [None]:
df6.plot(x=df6.x, y=df6.y, scale=df6.c, kind='wiggle')
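
<p style = 'font-size:16px;font-family:Arial'>The "shifted traces" idea behind a wiggle plot can be sketched in plain Python. The example below assumes a hypothetical input layout (one list of amplitudes per trace) and a hypothetical <code>gain</code> parameter. It illustrates the general technique and is not a description of how "td_plot" renders the chart internally.</p>

```python
def wiggle_coordinates(traces, gain=0.4):
    """Map each trace's amplitudes to positions around its own baseline.

    traces: list of amplitude lists, one per trace (hypothetical layout).
    Returns a list of (baseline, positions) pairs.
    """
    # Scale by the largest absolute amplitude so neighbouring traces
    # stay close to their own baselines.
    peak = max(abs(a) for trace in traces for a in trace) or 1.0
    result = []
    for index, trace in enumerate(traces):
        baseline = float(index)  # trace k is centred on position k
        positions = [baseline + gain * a / peak for a in trace]
        result.append((baseline, positions))
    return result

# Hypothetical amplitudes, not taken from the Wavelet table.
traces = [
    [0.0, 1.0, 0.0, -1.0, 0.0],
    [0.0, 0.5, 1.0, 0.5, 0.0],
    [0.0, -2.0, 0.0, 2.0, 0.0],
]
coords = wiggle_coordinates(traces)
for baseline, positions in coords:
    print(baseline, [round(p, 2) for p in positions])
```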

<hr style="height:2px;border:none;">
<p style = 'font-size:20px;font-family:Arial'><b>7. Mesh Plot</b></p>
<p style = 'font-size:16px;font-family:Arial'>A mesh plot in Vantage displays a matrix as an image.<br>Let us create a mesh plot on sample wavelet data.</p>

In [None]:
df6

In [None]:
plot = df6.plot(x=df6.x,
               y=df6.y,
               scale=df6.c,
               kind='mesh',
               cmap='matter',
               vmin=-0.5,
               vmax=0.5)
plot.show()
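
<p style = 'font-size:16px;font-family:Arial'>The <code>vmin</code> and <code>vmax</code> arguments set the range of values covered by the colormap. As a rough sketch, and assuming they behave like the color limits in common plotting libraries (values outside the range are clamped to the nearest limit), the mapping from a matrix value to a position on the colormap can be written as follows. The <code>normalise</code> helper and the sample matrix are hypothetical.</p>

```python
def normalise(value, vmin, vmax):
    """Clamp value to [vmin, vmax] and rescale it to the range [0, 1]."""
    if vmax <= vmin:
        raise ValueError("vmax must be greater than vmin")
    clamped = min(max(value, vmin), vmax)
    return (clamped - vmin) / (vmax - vmin)

# Hypothetical matrix values, not taken from the Wavelet table.
matrix = [
    [-0.9, -0.25, 0.0],
    [0.25, 0.5, 1.2],
]
# 0.0 maps to the low end of the colormap and 1.0 to the high end.
scaled = [[normalise(v, vmin=-0.5, vmax=0.5) for v in row] for row in matrix]
for row in scaled:
    print([round(v, 2) for v in row])
```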

<hr style="height:2px;border:none;">
<p style = 'font-size:20px;font-family:Arial'><b>8. Combining multiple Plots - Composite plot</b></p>
<p style = 'font-size:16px;font-family:Arial'>With the ClearScape Analytics plot method, we can combine multiple plots in a single image. A composite plot is a visualization that combines multiple individual plots or charts into a single cohesive display. It allows for the simultaneous presentation of different types of data or multiple perspectives on the same dataset.<br>Let us create a composite plot for comparing domestic passengers vs international passengers for an airline company.</p>

In [None]:
df7 = DataFrame(in_schema("DEMO_Plot", "US_Air_Pass"))
df7

In [None]:
df7.plot(x=df7.TD_TIMECODE, 
         y=[df7.international, df7.domestic],
         title="Domestic passengers vs International passengers",
         xlabel="year-month",
         ylabel="passenger count in million"         
        )

<hr style="height:2px;border:none;">
<p style = 'font-size:20px;font-family:Arial'><b>9. Combining multiple Plots - Subplot</b></p>
<p style = 'font-size:16px;font-family:Arial'>A subplot is a smaller plot or chart that is embedded within a larger plot or visualization, typically organized in a grid-like layout. Subplots allow for the simultaneous display of multiple views or aspects of the data within the same figure. They are commonly used to compare different datasets, highlight specific features, or present related information side by side, enhancing the overall clarity and comprehensiveness of the visualization.<br>Let us create subplots for changes in population density in the US across four decades.</p>

In [None]:
from teradataml import subplots
fig, axis = subplots(2, 2)
fig.height = 1200
fig.heading = "Change in population density in US across four decades."
axis

In [None]:
us_population

In [None]:
us_states_shapes

In [None]:
# Join shapes with population data for all years (filtering by year happens later).
population_data = us_states_shapes.join(us_population,
                                        on=["state_name"],
                                        lprefix="us",
                                        rprefix="t2")
population_data = population_data.select(["us_state_name", "state_shape", "population_year", "population"])

In [None]:
population_data

In [None]:
# Find out the minimum and maximum population. This helps in coloring the plot.
population_data.assign(min_population=population_data.population.min(), max_population=population_data.population.max(), drop_columns=True)
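
<p style = 'font-size:16px;font-family:Arial'>The reason for finding one overall minimum and maximum is that the same <code>vmin</code> and <code>vmax</code> are then passed to every subplot, so a given color means the same population in each panel. The plain-Python sketch below illustrates the idea with made-up numbers (not the census figures used in this demo). The helper name is hypothetical.</p>

```python
# Illustrative numbers only; these are not the census figures from the demo.
records = [
    {"state": "A", "year": 1990, "population": 100},
    {"state": "B", "year": 1990, "population": 400},
    {"state": "A", "year": 2020, "population": 150},
    {"state": "B", "year": 2020, "population": 900},
]

# Shared limits across every year: use the same pair for all panels.
populations = [r["population"] for r in records]
vmin, vmax = min(populations), max(populations)

def per_year_limits(rows):
    """Return {year: (min, max)} to show how per-panel limits would differ."""
    limits = {}
    for r in rows:
        lo, hi = limits.get(r["year"], (r["population"], r["population"]))
        limits[r["year"]] = (min(lo, r["population"]), max(hi, r["population"]))
    return limits

print("shared limits:", (vmin, vmax))
print("per-year limits:", per_year_limits(records))
```

<p style = 'font-size:16px;font-family:Arial'>With per-year limits, the most populous state in 1990 and in 2020 would receive the same color even though the values differ. Shared limits avoid that.</p>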

In [None]:
population_data_2020 = population_data[population_data.population_year == 2020]
population_data_2010 = population_data[population_data.population_year == 2010]
population_data_2000 = population_data[population_data.population_year == 2000]
population_data_1990 = population_data[population_data.population_year == 1990]

# Generate subplot.
# Plot population_data_1990 on first axis.
plot_1990 = population_data_1990.plot(y=(population_data_1990.population, population_data_1990.state_shape),
                                      cmap='rainbow',
                                      figure=fig,
                                      ax=axis[0],
                                      reverse_yaxis=True,
                                      vmin=55036.0,
                                      vmax=39538223.0,
                                      title="US 1990 Population",
                                      xlabel="",
                                      ylabel="")

# Plot population_data_2000 on second axis.
plot_2000 = population_data_2000.plot(y=(population_data_2000.population, population_data_2000.state_shape),
                                      cmap='rainbow',
                                      figure=fig,
                                      ax=axis[1],
                                      reverse_yaxis=True,
                                      vmin=55036.0,
                                      vmax=39538223.0,
                                      title="US 2000 Population",
                                      xlabel="",
                                      ylabel="")

# Plot population_data_2010 on third axis.
plot_2010 = population_data_2010.plot(x=population_data_2010.population_year,
                                      y=(population_data_2010.population, population_data_2010.state_shape),
                                      cmap='rainbow',
                                      figure=fig,
                                      ax=axis[2],
                                      reverse_yaxis=True,
                                      vmin=55036.0,
                                      vmax=39538223.0,
                                      title="US 2010 Population",
                                      xlabel="",
                                      ylabel="",
                                      xtick_values_format="")

# Plot population_data_2020 on fourth axis.
plot = population_data_2020.plot(x=population_data_2020.population_year,
                                 y=(population_data_2020.population, population_data_2020.state_shape),
                                 cmap='rainbow',
                                 figure=fig,
                                 ax=axis[3],
                                 reverse_yaxis=True,
                                 vmin=55036.0,
                                 vmax=39538223.0,
                                 title="US 2020 Population",
                                 xlabel="",
                                 ylabel="",
                                 xtick_values_format="")

In [None]:
plot.show()

<p style = 'font-size:16px;font-family:Arial'>Let us look at one more example, creating subplots that showcase the company's performance across different quarters and years.</p>

In [None]:
df8 = DataFrame(in_schema("DEMO_Plot", "Finance_Data"))
df8

In [None]:
from teradatasqlalchemy import DATE
from teradataml.dataframe.sql import case_when
c = case_when((df8.period.right(2).expression == 'q1', df8.period.left(4).expression+"-01-01"), 
              (df8.period.right(2).expression == 'q2', df8.period.left(4).expression+"-04-01"),
              (df8.period.right(2).expression == 'q3', df8.period.left(4).expression+"-07-01"),
              (df8.period.right(2).expression == 'q4', df8.period.left(4).expression+"-10-01"))
df8.assign(investment_date=c.cast(DATE()))
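
<p style = 'font-size:16px;font-family:Arial'>The <code>case_when</code> expression above maps each quarter to the first day of that quarter. The same logic can be sketched in plain Python, which may make the four branches easier to follow. This assumes period strings of the form <code>1980q3</code> (inferred from the <code>left(4)</code> and <code>right(2)</code> calls). The helper name is hypothetical.</p>

```python
from datetime import date

# First month of each quarter, mirroring the four case_when branches.
QUARTER_START_MONTH = {"q1": 1, "q2": 4, "q3": 7, "q4": 10}

def period_to_date(period):
    """Convert a period such as '1980q3' to the first day of that quarter."""
    year, quarter = period[:4], period[-2:]
    if quarter not in QUARTER_START_MONTH:
        raise ValueError(f"unrecognised quarter in period: {period!r}")
    return date(int(year), QUARTER_START_MONTH[quarter], 1)

for p in ["1980q1", "1980q2", "1981q3", "1982q4"]:
    print(p, "->", period_to_date(p).isoformat())
```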

In [None]:
from sqlalchemy import func
df8=df8.assign(investment_date=c.cast(DATE()))

In [None]:
df_1980 = df8[(df8.id==3) & (func.to_char(df8.investment_date.expression, 'YYYY') == '1980')].select(["investment_date", "investment", "expenditure", "income"])
df_1981 = df8[(df8.id==3) & (func.to_char(df8.investment_date.expression, 'YYYY') == '1981')].select(["investment_date", "investment", "expenditure", "income"])
df_1982 = df8[(df8.id==3) & (func.to_char(df8.investment_date.expression, 'YYYY') == '1982')].select(["investment_date", "investment", "expenditure", "income"])
df_all = df8[(df8.id==3) & ((func.to_char(df8.investment_date.expression, 'YYYY') == '1980') | 
                         (func.to_char(df8.investment_date.expression, 'YYYY') == '1981') | 
                         (func.to_char(df8.investment_date.expression, 'YYYY') == '1982'))].select(["investment_date", "investment", "expenditure", "income"])

In [None]:
fig, axes = subplots(grid={(1, 1): (1,1), (1, 2): (1, 1), (1, 3): (1, 1), (2, 1): (1, 3)})

In [None]:
axes
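
<p style = 'font-size:16px;font-family:Arial'>The <code>grid</code> dictionary can be read as "start position: span". As a hedged sketch, assuming each key is a 1-based (row, column) position, each value is a (rowspan, colspan) pair, and the axes are indexed in the order the entries are listed, the layout it describes can be visualized in plain Python as follows. The helper name is hypothetical and is not part of teradataml.</p>

```python
def layout_from_grid(grid):
    """Return a 2D list showing which axis index occupies each grid cell.

    grid maps 1-based (row, column) start positions to (rowspan, colspan).
    """
    n_rows = max(r + rs - 1 for (r, _), (rs, _) in grid.items())
    n_cols = max(c + cs - 1 for (_, c), (_, cs) in grid.items())
    cells = [[None] * n_cols for _ in range(n_rows)]
    for axis_index, ((row, col), (rowspan, colspan)) in enumerate(grid.items()):
        for r in range(row - 1, row - 1 + rowspan):
            for c in range(col - 1, col - 1 + colspan):
                if cells[r][c] is not None:
                    raise ValueError(f"cell ({r + 1}, {c + 1}) is claimed twice")
                cells[r][c] = axis_index
    return cells

grid = {(1, 1): (1, 1), (1, 2): (1, 1), (1, 3): (1, 1), (2, 1): (1, 3)}
for row in layout_from_grid(grid):
    print(row)
```

<p style = 'font-size:16px;font-family:Arial'>Under these assumptions, the first row holds three single-cell axes and the second row holds one axis spanning all three columns, which matches how the four axes are used in the cells that follow.</p>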

In [None]:
# Plot 1980 data at first Axis.
plot = df_1980.plot(x=df_1980.investment_date, 
                    y=[df_1980.investment, df_1980.expenditure, df_1980.income],
                    kind="bar",
                    title="Financial overview of the company for all quarters in year 1980",
                    legend=["Investment", "Expenditure", "Income"],
                    xlabel="investment_month",
                    xtick_format='MM',
                    figure=fig,
                    ax=axes[0])

# Plot 1981 data at second Axis.
plot = df_1981.plot(x=df_1981.investment_date, 
                    y=[df_1981.investment, df_1981.expenditure, df_1981.income],
                    kind="bar",
                    title="Financial overview of the company for all quarters in year 1981",
                    legend=["Investment", "Expenditure", "Income"],
                    xlabel="investment_month",
                    xtick_format='MM',
                    figure=fig,
                    ax=axes[1])

# Plot 1982 data at third Axis.
plot = df_1982.plot(x=df_1982.investment_date, 
                    y=[df_1982.investment, df_1982.expenditure, df_1982.income],
                    kind="bar",
                    title="Financial overview of the company for all quarters in year 1982",
                    legend=["Investment", "Expenditure", "Income"],
                    xlabel="investment_month",
                    xtick_format='MM',
                    figure=fig,
                    ax=axes[2])

# Plot all 3 years of data at fourth Axis.
plot = df_all.plot(x=df_all.investment_date, 
                   y=[df_all.investment, df_all.expenditure, df_all.income],
                   kind="line",
                   title="Financial overview of the company for all 3 years",
                   legend=["Investment", "Expenditure", "Income"],
                   figure=fig,
                   ax=axes[3])

In [None]:
plot.figure.heading = "Financial overview of a company"

In [None]:
plot.show()

<p style = 'font-size:20px;font-family:Arial'><b>Conclusion</b></p>
<p style = 'font-size:16px;font-family:Arial'>In this functional demo, we have seen how to create various plots in-database in Vantage using ClearScape Analytics, without moving the data out of the database.</p>

<hr style="height:2px;border:none;">
<p style = 'font-size:20px;font-family:Arial'><b>10. Cleanup</b></p>

<p style = 'font-size:18px;font-family:Arial'><b>Databases and Tables</b></p>
<p style = 'font-size:16px;font-family:Arial'>We will use the following code to clean up tables and databases created for this demonstration.</p>

In [None]:
%run -i ~/JupyterLabRoot/UseCases/run_procedure.py "call remove_data('DEMO_Plot');" 
# Takes 10 seconds

In [None]:
remove_context()

<footer style="padding-bottom:35px; border-bottom:3px solid #91A0Ab">
    <div style="float:left;margin-top:14px">ClearScape Analytics™</div>
    <div style="float:right;">
        <div style="float:left; margin-top:14px">
            Copyright © Teradata Corporation - 2024. All Rights Reserved
        </div>
    </div>
</footer>