<header>
   <p  style='font-size:36px;font-family:Arial; color:#F0F0F0; background-color: #00233c; padding-left: 20pt; padding-top: 20pt;padding-bottom: 10pt; padding-right: 20pt;'>
       NPath function in Vantage
  <br>
       <img id="teradata-logo" src="https://storage.googleapis.com/clearscape_analytics_demo_data/DEMO_Logo/teradata.svg" alt="Teradata" style="width: 125px; height: auto; margin-top: 20pt;">
    </p>
</header>

<p style = 'font-size:20px;font-family:Arial'><b>Introduction</b></p>
<p style = 'font-size:16px;font-family:Arial'>The nPath function scans a set of rows, looking for patterns that we specify. For each set of input rows that matches the pattern, nPath produces a single output row. nPath is useful when our goal is to identify the paths that lead to an outcome. For example, we can use nPath to analyze:<ul style = 'font-size:16px;font-family:Arial'><li>
    Website click data, to identify paths that lead to sales over a specified amount</li><li>
    Sensor data from industrial processes, to identify paths to poor product quality</li><li>
Healthcare records of individual patients, to identify paths that indicate that patients are at risk of developing conditions such as heart disease or diabetes</li><li>
Financial data for individuals, to identify paths that provide information about credit or fraud risks.</li></ul><p style = 'font-size:16px;font-family:Arial'>In this notebook, we will see how to use the nPath function available in Vantage.</p>

<hr style="height:2px;border:none;">
<b style = 'font-size:20px;font-family:Arial'>1. Initiate a connection to Vantage</b>

<p style = 'font-size:16px;font-family:Arial'>In this section, we import the required libraries and set environment variables and environment paths (if required).</p>

In [None]:
from teradataml import *

# Modify the following to match the specific client environment settings
display.max_rows = 5

<hr style="height:1px;border:none;">
<p style = 'font-size:18px;font-family:Arial'><b>1.1 Connect to Vantage</b></p>
<p style = 'font-size:16px;font-family:Arial'>You will be prompted to provide the password. Enter your password, press the Enter key, and then use the down arrow to go to the next cell.</p>

In [None]:
%run -i ../../UseCases/startup.ipynb
eng = create_context(host = 'host.docker.internal', username='demo_user', password = password)
print(eng)

In [None]:
%%capture
execute_sql('''SET query_band='DEMO=PP_NPath_Python.ipynb;' UPDATE FOR SESSION; ''')

<p style = 'font-size:16px;font-family:Arial'>Run each cell in turn by pressing Shift + Enter.</p>

<hr style='height:1px;border:none;'>

<p style = 'font-size:18px;font-family:Arial'><b>1.2 Getting Data for This Demo</b></p>

<p style = 'font-size:16px;font-family:Arial'>We have provided data for this demo on cloud storage. You can either run the demo using foreign tables, which access the data without using any storage in your environment, or download the data to local storage, which may yield faster execution but requires enough available storage. The following cell contains two statements, one of which is commented out. To switch modes, move the comment marker (#) to the other statement.</p>

In [None]:
%run -i ../../UseCases/run_procedure.py "call get_data('DEMO_Financial_cloud');"        # Takes 30 seconds
#%run -i ../../UseCases/run_procedure.py "call get_data('DEMO_Financial_local');" 

<p style = 'font-size:16px;font-family:Arial'>The next step is optional: it shows the status of the databases and tables created and the space used.</p>

In [None]:
%run -i ../../UseCases/run_procedure.py "call space_report();"        # Takes 10 seconds

<hr style="height:2px;border:none;">
<b style = 'font-size:20px;font-family:Arial'>2. Data Exploration</b>
<p style = 'font-size:16px;font-family:Arial'>Create a "Virtual DataFrame" that points to the data set in Vantage. Check the shape of the DataFrame and the data types of all its columns.</p>

In [None]:
tdf = DataFrame(in_schema('DEMO_Financial', 'Customer_journey'))
print("Shape of the data: ", tdf.shape)
tdf

In [None]:
tdf.tdtypes

<p style = 'font-size:16px;font-family:Arial'>Detailed help is available by passing the function name to Python's built-in help function.</p>

In [None]:
help(NPath)

<p style = 'font-size:16px;font-family:Arial'>Let us check which values of 'interaction_type' occur in our data.</p>

In [None]:
events = tdf.groupby(['interaction_type']).count()
events

In [None]:
events.shape

<p style = 'font-size:16px;font-family:Arial'>The results above show that we are tracking 18 types of interactions.</p>
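<p style = 'font-size:16px;font-family:Arial'>The grouped count above runs in Vantage; conceptually, it counts rows per interaction_type. The minimal local sketch below shows that idea with Python's collections.Counter on a few hypothetical rows (the event names other than ACCOUNT_CLOSED are made up and are not taken from the demo table).</p>

```python
from collections import Counter

# Hypothetical (customer_identifier, interaction_type) rows; not taken from the demo table
rows = [
    (1, "WEB_VISIT"), (1, "BRANCH_VISIT"), (1, "ACCOUNT_CLOSED"),
    (2, "WEB_VISIT"), (2, "WEB_VISIT"),
]

# Rows per interaction_type, similar in spirit to groupby(['interaction_type']).count()
events = Counter(interaction_type for _, interaction_type in rows)
print(events)

# Number of distinct interaction types, like events.shape[0] in the notebook
print(len(events))
```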

<p style = 'font-size:16px;font-family:Arial'>
    <b>Paths leading to Account Closed for 'Gold' customers</b></p>
<ol style = 'font-size:16px;font-family:Arial'>
    <li>Pass the filtered dataset (Gold customers only).</li>
    <li>Specify the partitioning column (customer_identifier) and the ordering column (interaction_timestamp).</li>
    <li>Mode: <b>OVERLAPPING</b> vs. <b>NONOVERLAPPING</b>
        <ul style = 'font-size:16px;font-family:Arial'>
            <li><b>OVERLAPPING</b> finds every occurrence of the pattern, regardless of whether a row is already part of a previous match.</li>
            <li><b>NONOVERLAPPING</b> resumes matching at the row that follows the previous match.</li>
        </ul>
    </li>
    <li>Symbols: a set of column-expression aliases that can be assembled into a pattern to match.
        <ul style = 'font-size:16px;font-family:Arial'>
            <li>Example: "interaction_type = 'ACCOUNT_CLOSED' AS B" defines the symbol B, which matches a row when its interaction_type column equals 'ACCOUNT_CLOSED'.</li>
        </ul>
    </li>
    <li>Pattern: a combination of symbols and operators to search for across the ordered rows of each partition.
        <ul style = 'font-size:16px;font-family:Arial'>
            <li>Example: '^B' uses the ^ operator to require that the B symbol occur at the beginning of the partition's rows.</li>
            <li>Example: 'A{2,5}.B' matches two to five A rows followed by one B row; the . operator means "is followed by".</li>
        </ul>
    </li>
    <li>Result: because nPath emits a single row for each match, Result specifies which columns make up that row and how the values of the matched rows are aggregated.</li>
    </ol>
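<p style = 'font-size:16px;font-family:Arial'>The steps above can be illustrated with a small pure-Python sketch. It is not the Vantage implementation: it assumes greedy, regex-style backtracking, supports only patterns made of symbols with {min,max} repeats joined by the . (followed-by) operator plus an optional ^ anchor, and runs on hypothetical rows. It partitions by customer_identifier, orders by interaction_timestamp, defines A and B as in the notebook, and contrasts OVERLAPPING with NONOVERLAPPING for the pattern A{2,5}.B. The helper names match_at and find_matches are hypothetical.</p>

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Row = Dict[str, object]
Step = Tuple[str, int, int]  # (symbol, min repeats, max repeats)


def match_at(rows: Sequence[Row], start: int, steps: List[Step],
             symbols: Dict[str, Callable[[Row], bool]]) -> Optional[int]:
    """End index (exclusive) of a greedy match of `steps` beginning at `start`, else None."""
    def rec(pos: int, k: int) -> Optional[int]:
        if k == len(steps):
            return pos
        sym, lo, hi = steps[k]
        n = 0  # consecutive rows from pos that satisfy this symbol, capped at hi
        while n < hi and pos + n < len(rows) and symbols[sym](rows[pos + n]):
            n += 1
        for reps in range(n, lo - 1, -1):  # try the longest run first, then backtrack
            end = rec(pos + reps, k + 1)
            if end is not None:
                return end
        return None
    return rec(start, 0)


def find_matches(rows, steps, symbols, mode="NONOVERLAPPING", anchored=False):
    """(start, end) row spans of matches within one ordered partition."""
    matches: List[Tuple[int, int]] = []
    next_free = 0  # NONOVERLAPPING: first row allowed to start a new match
    for i in ([0] if anchored else range(len(rows))):
        if mode == "NONOVERLAPPING" and i < next_free:
            continue
        end = match_at(rows, i, steps, symbols)
        if end is not None and end > i:
            matches.append((i, end))
            next_free = end
    return matches


# Hypothetical input rows; event names other than ACCOUNT_CLOSED are made up
data = [
    {"customer_identifier": 7, "interaction_timestamp": 2, "interaction_type": "LOGIN"},
    {"customer_identifier": 7, "interaction_timestamp": 1, "interaction_type": "LOGIN"},
    {"customer_identifier": 7, "interaction_timestamp": 4, "interaction_type": "ACCOUNT_CLOSED"},
    {"customer_identifier": 7, "interaction_timestamp": 3, "interaction_type": "BRANCH_VISIT"},
    {"customer_identifier": 9, "interaction_timestamp": 1, "interaction_type": "ACCOUNT_CLOSED"},
]

# Partition by customer_identifier, then order each partition by interaction_timestamp
partitions: Dict[object, List[Row]] = defaultdict(list)
for row in data:
    partitions[row["customer_identifier"]].append(row)
for rows in partitions.values():
    rows.sort(key=lambda r: r["interaction_timestamp"])

# Symbols: A matches every row, B matches account closures (mirrors the notebook)
symbols = {"A": lambda r: True,
           "B": lambda r: r["interaction_type"] == "ACCOUNT_CLOSED"}
steps = [("A", 2, 5), ("B", 1, 1)]  # pattern A{2,5}.B

for mode in ("NONOVERLAPPING", "OVERLAPPING"):
    for cust in sorted(partitions):
        print(mode, cust, find_matches(partitions[cust], steps, symbols, mode))
```

<p style = 'font-size:16px;font-family:Arial'>For customer 7, NONOVERLAPPING reports one match covering rows 0 to 3, while OVERLAPPING also reports the shorter match that starts at row 1, because that row may belong to more than one match. Customer 9 has too few rows to match A{2,5}.B.</p>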

<p style = 'font-size:16px;font-family:Arial'>First, let us convert the interaction_type column from the UNICODE to the LATIN character set, because the nPath function works on the LATIN character set.</p>
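<p style = 'font-size:16px;font-family:Arial'>Before converting, it can help to know whether any values contain characters with no LATIN equivalent, since such values may not convert cleanly. The sketch below approximates the LATIN server character set with Python's latin-1 (ISO 8859-1) codec; that approximation is an assumption, not Vantage's exact translation rules. The function name latin1_safe and the sample values are hypothetical.</p>

```python
def latin1_safe(text: str) -> bool:
    """Return True if every character of text can be encoded with Python's latin-1 codec."""
    try:
        text.encode("latin-1")
    except UnicodeEncodeError:
        return False
    return True


# Hypothetical interaction_type values; the real ones live in DEMO_Financial.Customer_journey
values = ["ACCOUNT_CLOSED", "CAFÉ_VISIT", "ONLINE_✓"]

# Values that would probably not convert cleanly to a LATIN column
print([v for v in values if not latin1_safe(v)])
```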

In [None]:
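from teradataml import ConvertTo

# Convert interaction_type to a case-insensitive LATIN VARCHAR column
converted_data = ConvertTo(data = tdf,
                           target_columns = ['interaction_type'],
                           target_datatype = ["VARCHAR(charlen=100,charset=LATIN,casespecific=NO)"])
convert_tdf = converted_data.result

# Persist the converted data as a table, replacing it if a previous run created it
convert_tdf.to_sql('convert_tdf', if_exists = 'replace')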

In [None]:
# Read back the converted table and keep only 'Gold' customers
tdf_1 = DataFrame('convert_tdf')
tdf_gold = tdf_1[tdf_1.customer_type == 'Gold']

# For each customer, find 2 to 5 events of any kind (A) followed by an account closure (B)
npath_sessions = NPath(data1 = tdf_gold,
                       data1_partition_column = ['customer_identifier'],
                       data1_order_column = 'interaction_timestamp',
                       mode = 'NONOVERLAPPING',
                       symbols = ['True as A',
                                  "interaction_type in ('ACCOUNT_CLOSED') as B"],
                       pattern = 'A{2,5}.B',
                       # One output row per match: the event path, its length, and details of the B row
                       result = ['ACCUMULATE(interaction_type OF ANY(A,B)) AS interaction_type_list',
                                 'COUNT (* OF ANY (A,B)) AS click_depth',
                                 'FIRST(customer_identifier OF B) AS customer_identifier',
                                 'FIRST(product_category OF B) AS product_category'])

npath_df = npath_sessions.result
npath_df
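<p style = 'font-size:16px;font-family:Arial'>To make the Result clause concrete, the sketch below applies the same four aggregates to one hypothetical matched path in plain Python. It is an illustration under assumptions, not nPath's implementation: each matched row is paired with the symbol it was assigned, and ACCUMULATE is modeled as an ordered Python list, whereas in Vantage the accumulated values come back in a single column. The helper names of_any and first_of are hypothetical.</p>

```python
from typing import Dict, List, Tuple

# One hypothetical match of A{2,5}.B: (symbol, row) pairs in interaction_timestamp order
path: List[Tuple[str, Dict[str, str]]] = [
    ("A", {"customer_identifier": "C7", "interaction_type": "WEB_VISIT", "product_category": "savings"}),
    ("A", {"customer_identifier": "C7", "interaction_type": "BRANCH_VISIT", "product_category": "savings"}),
    ("B", {"customer_identifier": "C7", "interaction_type": "ACCOUNT_CLOSED", "product_category": "checking"}),
]


def of_any(path, *symbols):
    """Rows whose assigned symbol is one of `symbols`, like OF ANY(A,B)."""
    return [row for sym, row in path if sym in symbols]


def first_of(path, column, symbol):
    """Value of `column` in the first row assigned to `symbol`, like FIRST(column OF symbol)."""
    return next(row[column] for sym, row in path if sym == symbol)


result = {
    "interaction_type_list": [r["interaction_type"] for r in of_any(path, "A", "B")],  # ACCUMULATE
    "click_depth": len(of_any(path, "A", "B")),                                        # COUNT(*)
    "customer_identifier": first_of(path, "customer_identifier", "B"),                # FIRST ... OF B
    "product_category": first_of(path, "product_category", "B"),                      # FIRST ... OF B
}
print(result)
```

<p style = 'font-size:16px;font-family:Arial'>Because FIRST is taken OF B, product_category reflects the row where the account was closed rather than the earlier A rows.</p>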

In [None]:
print(npath_sessions.show_query())

<hr style="height:2px;border:none;">
<b style = 'font-size:20px;font-family:Arial'>3. Cleanup</b>

<hr style="height:1px;border:none;">
<p style = 'font-size:18px;font-family:Arial'> <b>Databases and Tables </b></p>
<p style = 'font-size:16px;font-family:Arial'>The following code will clean up tables and databases created above.</p>

In [None]:
db_drop_table(table_name = 'convert_tdf', schema_name = 'demo_user')

In [None]:
%run -i ../../UseCases/run_procedure.py "call remove_data('DEMO_Financial');"        # Takes 10 seconds

In [None]:
remove_context()

<hr style="height:1px;border:none;">
<b style = 'font-size:18px;font-family:Arial'>Dataset:</b>

`Customer_Journey`

- `customer_skey`: customer key
- `customer_identifier`: unique customer identifier
- `customer_cookie`: cookie placed on the customer's device
- `customer_online_id`: boolean - does the customer have an online account
- `customer_offline_id`: customer account number
- `customer_type`: customer segment, e.g. a high-value customer or a visitor browsing the website
- `customer_days_active`: how long the customer has been active, in days
- `interaction_session_number`: session identifier
- `interaction_timestamp`: timestamp for this event
- `interaction_source`: channel this event is from (online, offline, in branch, etc.)
- `interaction_type`: type of event
- `sales_channel`: channel a sales event was in
- `conversion_id`: sales conversion identifier
- `product_category`: what type of product the event concerned (checking, savings, CD, etc.)
- `product_type`: unused
- `conversion_sales`: unused
- `conversion_cost`: unused
- `conversion_margin`: unused
- `conversion_units`: unused
- `marketing_code`: marketing identifier
- `marketing_category`: marketing channel (in branch, website, email, etc.)
- `marketing_description`: marketing campaign name
- `marketing_placement`: specific marketing outlet (Google, Bloomberg.com, etc.)
- `mobile_flag`: boolean - whether the event occurred on a mobile device
- `updt`: unused

<p style = 'font-size:16px;font-family:Arial'><b>Links:</b></p>
<ul style = 'font-size:16px;font-family:Arial'>
    <li>teradataml Python package reference: <a href = 'https://docs.teradata.com/search/all?query=Python+Package+User+Guide&content-lang=en-US'>here</a></li>
    <li>NPath function reference: <a href = 'https://docs.teradata.com/search/all?query=npath&content-lang=en-US'>here</a></li>
</ul>

<footer style="padding-bottom:35px; border-bottom:3px solid #91A0Ab">
    <div style="float:left;margin-top:14px">ClearScape Analytics™</div>
    <div style="float:right;">
        <div style="float:left; margin-top:14px">
            Copyright © Teradata Corporation - 2025. All Rights Reserved
        </div>
    </div>
</footer>