<header>
   <p  style='font-size:36px;font-family:Arial; color:#F0F0F0; background-color: #00233c; padding-left: 20pt; padding-top: 20pt;padding-bottom: 10pt; padding-right: 20pt;'>
       Behavioral Analysis and Visualization using Vantage
  <br>
       <img id="teradata-logo" src="https://storage.googleapis.com/clearscape_analytics_demo_data/DEMO_Logo/teradata.svg" alt="Teradata" style="width: 125px; height: auto; margin-top: 20pt;">
    </p>
</header>

<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>Introduction</b></p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Customer behavior varies from industry to industry and from company to company, so there is no single easy solution for analyzing it. Although the problem is complex, Teradata Vantage and ClearScape Analytics can bring clarity to customers’ behavior. Customer activity can include website click data, healthcare records, or financial data, and users need to combine all of this data to get a full picture of the customers’ experience. Using pathing analytics, businesses can understand the common paths that customers take toward a variety of outcomes, such as sales conversion, cart abandonment, or product searches. When businesses use Vantage to analyze all their data at scale, they have the chance to increase customer satisfaction and conversion rates.</p>
<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>Business Value</b></p>
<ul style = 'font-size:16px;font-family:Arial;color:#00233C'>
<li style = 'font-size:16px;font-family:Arial;color:#00233C'>Using website click data, users can identify the customer actions that lead to sales over a specified amount.</li>
<li style = 'font-size:16px;font-family:Arial;color:#00233C'>Using sensor data from industrial processes, users can identify the process paths that lead to poor product quality.</li>
<li style = 'font-size:16px;font-family:Arial;color:#00233C'>Using healthcare records of individual patients, hospitals can identify paths that indicate patients are at risk of developing conditions such as heart disease or diabetes.</li>
<li style = 'font-size:16px;font-family:Arial;color:#00233C'>Using financial data from individual accounts, organizations can identify patterns of fraud or credit risk.</li>
</ul>
<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>Why Vantage?</b></p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Vantage has unique analytic capabilities for understanding customer and user behavior over time. These analytic techniques can be applied at massive scale to derive more accurate results, which can then be combined with other analytics to create more advanced and accurate prediction models. Vantage also allows organizations to scale these models horizontally, by training segmented models per region, user type, etc., or vertically, by combining data from millions or billions of interactions. All of this can be deployed operationally to understand and predict actions in real time.</p> 
    
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>
nPath® is useful when your goal is to identify the paths that lead to an outcome. The nPath function scans a set of rows, looking for patterns that you specify. For each set of input rows that matches the pattern, nPath produces a single output row. The function provides a flexible pattern-matching capability that lets you specify complex patterns in the input data and define the values that are output for each matched input set. </p>    


<hr style="height:2px;border:none;background-color:#00233C;">

<p style = 'font-size:20px;font-family:Arial;color:#00233C'><b>1. Connect to Vantage, import Python packages and explore the dataset</b></p>


<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Let us start by importing the required libraries, setting environment variables, and connecting to Vantage.</p>

In [None]:
# Import libraries
import getpass
import warnings

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from teradataml import *

# Show at most 5 rows when a teradataml DataFrame is displayed
display.max_rows = 5

warnings.filterwarnings('ignore')

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>We will be prompted to provide the password. We enter the password, press the Enter key, and then use the down arrow to go to the next cell. We run each step with the Shift + Enter keys.</p>

In [None]:
%run -i ../startup.ipynb
eng = create_context(host = 'host.docker.internal', username='demo_user', password = password)
print(eng)

In [None]:
%%capture
execute_sql('''SET query_band='DEMO=BehavioralAnalysis_Python.ipynb;' UPDATE FOR SESSION; ''')


<hr style="height:2px;border:none;background-color:#00233C;">

<p style = 'font-size:20px;font-family:Arial;color:#00233C'><b>2. Getting Data for This Demo</b></p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>We have provided data for this demo on cloud storage. We can either run the demo using foreign tables, which access the data without using any storage in our environment, or download the data to local storage, which may yield somewhat faster execution but consumes space, so we need to consider the storage available. The following cell contains two statements, one of which is commented out; we can switch modes by moving the comment marker (#) from one statement to the other.
</p>

In [None]:
%run -i ../run_procedure.py "call get_data('DEMO_Retail_cloud');"
# takes about 1 minute, estimated space: 0 MB
# %run -i ../run_procedure.py "call get_data('DEMO_Retail_local');"
# takes about 2 minutes, estimated space: 23 MB

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Optional step: execute the following cell only if we want to see the status of the databases and tables created and the space used.</p>

In [None]:
%run -i ../run_procedure.py "call space_report();"

<hr style="height:2px;border:none;background-color:#00233C;">
<p style = 'font-size:20px;font-family:Arial;color:#00233C'><b>3. Analyze the raw data set</b></p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>
Source event data may come from other source systems, log files, object storage, and so on. Let us start by analyzing the customer event data.</p>

In [None]:
tdf_retail_events = DataFrame(in_schema('DEMO_Retail', 'Retail_Events'))
tdf_retail_events

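<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The sample above shows rows of the Retail_Events table that the DataFrame references. The columns used in this notebook are entity_id (the customer), datestamp (when the event occurred), and event (what the customer did).</p>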

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>
In this notebook we will use two powerful behavioral analysis functions available in Vantage:</p>
<ol style = 'font-size:16px;font-family:Arial;color:#00233C'>    
 <li style = 'font-size:16px;font-family:Arial;color:#00233C'><b>Sessionize</b>: groups a series of events into sessions, each identified by a numeric key.</li>
<li style = 'font-size:16px;font-family:Arial;color:#00233C'><b>nPath</b>: a sophisticated pattern-matching function that analyzes and collects data across rows.</li>
</ol>    

<hr style="height:2px;border:none;background-color:#00233C;">

<p style = 'font-size:20px;font-family:Arial;color:#00233C'><b>4. Sessionize Function</b></p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The Sessionize function maps each click in a session to a unique session identifier. A session is a sequence of clicks by one user that are separated by at most n seconds. The function is useful for both sessionization and detecting web crawler (bot) activity. A typical use is to understand user browsing behavior on a website.</p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Sessionization is the process of identifying and grouping together all the interactions a user has with a website or application during a single visit, typically based on the user's activity within a certain timeframe; for example, a session might end after 30 minutes of inactivity.
</p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Clickstream data refers to the sequence of clicks and other user interactions with a website or application. Sessionization of clickstream data involves analyzing this data to identify each user's session, so you can better understand how users are interacting with the site and make improvements to user experience and engagement.</p>
    
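<p style = 'font-size:16px;font-family:Arial;color:#00233C'>In our case we use a timeout of 24 hours (86,400 seconds): a user's events stay in the same session as long as no more than 24 hours pass between consecutive events, and we observe user behavior within each session.</p>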
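<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The gap-based rule described above can be sketched locally in pandas. This is a minimal illustration under stated assumptions, not Teradata's Sessionize implementation: the helper name sessionize, the sample events, and the per-user numbering of sessions starting at 0 are assumptions made for the example. A new session starts at each user's first event and whenever the gap since that user's previous event exceeds the timeout.</p>

```python
import pandas as pd

def sessionize(df, user_col="entity_id", time_col="datestamp", timeout_s=86400.0):
    """Return a copy of df with a per-user 'sessionid' column (0, 1, 2, ...).

    Hypothetical helper: a new session starts at each user's first event and
    whenever the gap since that user's previous event is longer than timeout_s.
    """
    df = df.sort_values([user_col, time_col]).copy()
    gap = df.groupby(user_col)[time_col].diff()  # NaT for each user's first event
    starts = gap.isna() | (gap > pd.Timedelta(seconds=timeout_s))
    # Running count of session starts within each user, shifted to start at 0
    df["sessionid"] = starts.astype(int).groupby(df[user_col]).cumsum() - 1
    return df

# Hypothetical events for two users
events = pd.DataFrame({
    "entity_id": [1, 1, 1, 2, 2],
    "datestamp": pd.to_datetime([
        "2023-01-01 08:00", "2023-01-01 20:00",  # 12 h apart: same session
        "2023-01-03 09:00",                      # 37 h later: new session
        "2023-01-01 08:00", "2023-01-02 09:00",  # 25 h apart: new session
    ]),
})
sessions = sessionize(events)  # 24-hour timeout, as in the notebook
```

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>With a 24-hour timeout the sample yields session numbers 0, 0, 1 for user 1 and 0, 1 for user 2; shrinking timeout_s splits the events into more sessions.</p>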

In [None]:
# Call the Sessionize function. Its main parameters are:
# data_partition_column - unique identifier of the user or entity whose events are consolidated into sessions.
# data_order_column - the column or list of columns used to order each entity's events.
# time_column - the timestamp column around which the time boundary of a "session" is applied.
# time_out - maximum gap in seconds (float) between consecutive events of the same session; 86400.0 (24 hours) below.
# The function returns a Sessionize object. Its "result" attribute is a teradataml DataFrame (a virtual DataFrame whose data stays in Vantage).

sessionized_events = Sessionize(data = tdf_retail_events, 
                               data_partition_column = ['entity_id'], 
                               data_order_column = ['datestamp'], 
                               time_column = 'datestamp', 
                               time_out = 86400.00)

sessionized_events.result

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>In the data returned above, we can see that the function has assigned a SESSIONID to each event based on the time_out value we specified.</p>

In [None]:
#commit our sessionized results to a permanent table:
tdf_sessionized_events = sessionized_events.result
tdf_sessionized_events.to_sql(table_name = 'demo_sessionized_events', if_exists = 'replace')

<hr style="height:2px;border:none;background-color:#00233C;">

<p style = 'font-size:20px;font-family:Arial;color:#00233C'><b>5. nPath Function</b></p>


<p style = 'font-size:16px;font-family:Arial;color:#00233C'>nPath scans the rows of each partition in the order we specify, looking for a pattern of symbols that works much like a regular expression applied to a sequence of rows. For each set of input rows that matches the pattern, nPath produces a single output row whose values we define.</p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>
    <b>Paths leading to Cancellation</b></p>
<ol style = 'font-size:16px;font-family:Arial'>
    <li>Pass the sessionized data by reference.</li>
    <li>Provide partitioning (session key) and ordering columns.</li>
    <li>Mode <b>OVERLAPPING</b> vs. <b>NONOVERLAPPING</b>
        <ul style = 'font-size:16px;font-family:Arial'>
            <li><b>OVERLAPPING</b> finds every occurrence of the match, regardless of whether the current row is already part of a previous match.</li>
            <li><b>NONOVERLAPPING</b> starts matching again at the row that follows the previous match.</li>
        </ul>
    </li>
    <li>Symbols.  Create a set of column expression aliases that can be assembled into a pattern to match.
        <ul style = 'font-size:16px;font-family:Arial'>
            <li>Example: "EVENT = 'Mem Purchase' as P" will alias a match on the EVENT column when the content equals 'Mem Purchase'.</li>
        </ul>
    </li>
      <li>Pattern.  Compose a pattern to search for across the rows of events.  This pattern is composed of Symbols and directives.
        <ul style = 'font-size:16px;font-family:Arial'>
            <li>Example: '^P' uses a directive ^ to indicate the P Symbol must occur at the beginning of the group of rows</li>
        </ul>
    </li>
    <li>Result. Because nPath emits a single row per match, Result specifies which columns make up that row and how the data of the matched rows is aggregated (for example with FIRST, COUNT, or ACCUMULATE).</li>
    </ol>
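<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The difference between the two modes can be illustrated with Python's re module. In this minimal sketch, the helpers encode and find_paths and the sample events are hypothetical. Each row is mapped to a single-letter symbol, and nPath's concatenation operator '.' is dropped because concatenation is implicit in Python regular expressions (so A{2,5}.B becomes A{2,5}B). NONOVERLAPPING uses ordinary left-to-right matching, and OVERLAPPING uses a zero-width lookahead so that a match may start at every row. Unlike nPath, where a row can satisfy several symbols at once, the sketch gives each row only the first symbol whose predicate it satisfies, so it approximates rather than reproduces nPath's semantics.</p>

```python
import re

def encode(events, symbols):
    """Map each event to the letter of the first (letter, predicate) pair it satisfies."""
    letters = []
    for event in events:
        for letter, predicate in symbols:
            if predicate(event):
                letters.append(letter)
                break
        else:
            letters.append("_")  # the row matches no symbol
    return "".join(letters)

def find_paths(events, symbols, pattern, mode="NONOVERLAPPING"):
    """Return one list of matched events per match of pattern."""
    encoded = encode(events, symbols)
    if mode == "NONOVERLAPPING":
        # Matching resumes at the row after the previous match
        spans = [m.span() for m in re.finditer(pattern, encoded)]
    elif mode == "OVERLAPPING":
        # A zero-width lookahead lets a new match start at every row
        spans = [(m.start(), m.start() + len(m.group(1)))
                 for m in re.finditer(f"(?=({pattern}))", encoded)]
    else:
        raise ValueError(f"unknown mode: {mode}")
    return [events[start:end] for start, end in spans]

# B: 'Mem Cancel' rows; A: any other row (checked after B, unlike the notebook's 'True as A')
symbols = [("B", lambda e: e == "Mem Cancel"), ("A", lambda e: True)]
events = ["Browse", "Search", "Browse", "Mem Cancel"]

print(find_paths(events, symbols, "A{2,5}B", "NONOVERLAPPING"))
# [['Browse', 'Search', 'Browse', 'Mem Cancel']]
print(find_paths(events, symbols, "A{2,5}B", "OVERLAPPING"))
# [['Browse', 'Search', 'Browse', 'Mem Cancel'], ['Search', 'Browse', 'Mem Cancel']]
```

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>OVERLAPPING reports a second, shorter path that reuses rows of the first match, which is why it can return many more rows than NONOVERLAPPING on the same data.</p>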
    

In [None]:
# Create two symbols and assemble them into a pattern:
# 1. True as A - matches any row
# 2. EVENT in ('Mem Cancel') as B - matches rows whose EVENT column equals 'Mem Cancel'
# Pattern A{2,5}.B matches between 2 and 5 rows of any kind (A) followed by a 'Mem Cancel' row (B)

npath_mem_cancel = NPath(data1 = tdf_sessionized_events, 
                      data1_partition_column = ['entity_id', 'SESSIONID'],  # one entity's session per partition
                      data1_order_column = 'datestamp', 
                      mode = 'NONOVERLAPPING', 
                      symbols = ['True as A', 'EVENT in (\'Mem Cancel\') as B'], 
                      pattern = 'A{2,5}.B', 
                      result = ['FIRST (entity_id OF A) AS entity_id', 
                               'FIRST (sessionid OF A) AS sessionid', 
                               'ACCUMULATE (cast(event as VARCHAR(50) CHARACTER SET UNICODE NOT CASESPECIFIC) OF ANY(A,B)) AS path', 
                               'COUNT (* OF ANY (A,B)) AS event_cnt'])

npath_mem_cancel.result

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>
The output contains one row per match: for each matching session, nPath reports the path the customer took to the final event, the cancellation of the membership (Mem Cancel), as specified in the pattern.</p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>
    <b>Paths leading to Purchase</b></p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Since nPath emits a single row per match, it greatly reduces the number of rows returned from the function call. Below we construct the statement to match sessions where either a membership purchase ('Mem Purchase') or a product purchase ('Purchase') occurred after a series of at least one and at most five prior actions:</p>
<ol style = 'font-size:16px;font-family:Arial;color:#00233C'>    
   <li style = 'font-size:16px;font-family:Arial;color:#00233C'>Create three symbols:
       <ul style = 'font-size:16px;font-family:Arial;color:#00233C'>
           <li style = 'font-size:16px;font-family:Arial;color:#00233C'>MP: Membership Purchase ('Mem Purchase')</li>
           <li style = 'font-size:16px;font-family:Arial;color:#00233C'>PP: Product Purchase ('Purchase')</li>
           <li style = 'font-size:16px;font-family:Arial;color:#00233C'>A: any row that is neither a Membership Purchase nor a Product Purchase</li>
       </ul>
    </li>
    <li style = 'font-size:16px;font-family:Arial;color:#00233C'>Assemble the symbols into a pattern, using directives to match between one and five A events followed by either PP or MP:
        <ul style = 'font-size:16px;font-family:Arial;color:#00233C'>
            <li style = 'font-size:16px;font-family:Arial;color:#00233C'> A{1,5}.(PP|MP)</li>
        </ul>
    </li>
    <li style = 'font-size:16px;font-family:Arial;color:#00233C'>Return the start time, entity_id, sessionid, path, and number of events.</li>
    </ol>
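<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The purchase pattern and its Result expressions can be approximated on a single hypothetical session in plain Python. In this sketch, the helper purchase_paths, the one-letter codes (P for PP, M for MP), and the sample rows are illustrative assumptions, and the '[a, b, c]' string used to mimic ACCUMULATE is an assumed format. Because the three symbols are mutually exclusive, mapping each row to one letter and using re.finditer (which resumes after each match, like NONOVERLAPPING) is a reasonable stand-in for the matching step.</p>

```python
import re

# Hypothetical one-letter codes for the notebook's symbols: PP -> "P", MP -> "M", A -> "A"
CODES = {"Purchase": "P", "Mem Purchase": "M"}

def purchase_paths(rows):
    """rows: one session's events as dicts, already ordered by datestamp."""
    encoded = "".join(CODES.get(r["event"], "A") for r in rows)
    results = []
    # nPath pattern A{1,5}.(PP|MP) written as a Python regex
    for m in re.finditer(r"A{1,5}[PM]", encoded):
        matched = rows[m.start():m.end()]
        results.append({
            "start_time": matched[0]["datestamp"],  # FIRST(datestamp OF A): a match starts with an A row
            "entity_id": matched[0]["entity_id"],   # FIRST(entity_id OF A)
            "sessionid": matched[0]["sessionid"],   # FIRST(sessionid OF ANY(MP, A, PP))
            "event_cnt": len(matched),              # COUNT(* OF ANY(MP, A, PP))
            # ACCUMULATE(event OF ANY(MP, A, PP)); the bracketed format is an assumption
            "path": "[" + ", ".join(r["event"] for r in matched) + "]",
        })
    return results

# Hypothetical session of one customer, one minute between events
session = [
    {"datestamp": f"2023-01-01 10:0{i}", "entity_id": 7, "sessionid": 0, "event": e}
    for i, e in enumerate(["Browse", "Search", "Purchase", "Browse", "Mem Purchase", "Purchase"])
]
for row in purchase_paths(session):
    print(row)
```

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The sample session yields two rows, [Browse, Search, Purchase] and [Browse, Mem Purchase]. The final Purchase is not matched because no A event precedes it after the previous match ends.</p>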

In [None]:
demo_sessionized_events = DataFrame('demo_sessionized_events')

npath_purchase = NPath(data1 = demo_sessionized_events, 
                      data1_partition_column = ['entity_id', 'SESSIONID'],  # one entity's session per partition
                      data1_order_column = 'datestamp', 
                      mode = 'NONOVERLAPPING', 
                      symbols = ['EVENT = \'Purchase\' AS PP', 'EVENT in (\'Mem Purchase\') as MP', 'EVENT not in (\'Purchase\', \'Mem Purchase\') AS A'], 
                      pattern = 'A{1,5}.(PP|MP)', 
                      result = ['FIRST (datestamp of A) AS start_time', 
                                'FIRST (entity_id of A) as entity_id',
                                'FIRST (sessionid of ANY(MP, A, PP)) as sessionid', 
                                'COUNT (* of ANY(MP, A, PP)) as event_cnt', 
                                'ACCUMULATE (cast(event as VARCHAR(50) CHARACTER SET UNICODE NOT CASESPECIFIC) OF ANY(MP, A, PP)) AS path', 
                               ])

npath_purchase.result

<p style = 'font-size:16px;font-family:Arial;color:#00233C'> Here, we can see that nPath returns the path each customer took to the final event (Purchase or Mem Purchase) specified in the pattern parameter of the function.</p>

<hr style="height:2px;border:none;background-color:#00233C;">

<p style = 'font-size:20px;font-family:Arial;color:#00233C'><b>6. Analysis and Visualization</b></p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'> We can analyze the data in-database and visualize the results by plotting graphs and paths.</p>

In [None]:
#Operate on the data as it lies in the database, and only retrieve the result of the aggregation

npath_mem_cancel.result.groupby(['path']).count().sort(['count_sessionid'], ascending = False)

In [None]:
npath_mem_cancel_plot = npath_mem_cancel.result.groupby(['event_cnt']).count().sort(['count_sessionid'], ascending = False)
npath_mem_cancel_plot

In [None]:
plot = npath_mem_cancel_plot.plot(x=npath_mem_cancel_plot.event_cnt, y=npath_mem_cancel_plot.count_path,
                                  kind='bar', xlabel='Event count in path', ylabel='Number of paths',
                                  heading="Number of events in a path", figsize=(600, 400))

# Display the plot.
plot.show()

<p style = 'font-size:16px;font-family:Arial;color:#00233C'> In our nPath call, the final event of the pattern is 'Mem Cancel'. The bar chart above shows how many of these paths contain each number of events; because the pattern is A{2,5}.B, every matched path contains between 3 and 6 events.</p>
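<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The aggregations above run inside Vantage. Because their results are small, an nPath result could instead be pulled to the client, for example with teradataml's to_pandas(), and summarized in pandas, which is also a handy way to check what the in-database aggregations compute. The sketch below mirrors both aggregations on a small hypothetical pandas DataFrame with the same path and event_cnt columns; summarize_paths is an illustrative helper, not part of teradataml.</p>

```python
import pandas as pd

def summarize_paths(paths):
    """Local equivalents of the two in-database aggregations on an nPath result."""
    # How many matches share each distinct path (cf. groupby(['path']).count())
    path_freq = paths["path"].value_counts()
    # How many paths contain each number of events (cf. groupby(['event_cnt']).count())
    length_dist = paths["event_cnt"].value_counts().sort_index()
    return path_freq, length_dist

# Hypothetical nPath output for the 'Mem Cancel' pattern
paths = pd.DataFrame({
    "sessionid": [1, 2, 3],
    "path": ["[Browse, Browse, Mem Cancel]",
             "[Browse, Search, Browse, Mem Cancel]",
             "[Browse, Browse, Mem Cancel]"],
    "event_cnt": [3, 4, 3],
})
path_freq, length_dist = summarize_paths(paths)
print(path_freq)
print(length_dist)
```

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>On real data the in-database version is preferable, since only the aggregated counts, not every matched path, leave Vantage.</p>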

<hr style="height:1px;border:none;background-color:#00233C;">
<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>6.1  Sankey Charts</b></p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'> To visualize the distribution of the different event paths, we typically use a Sankey diagram aggregated over the paths reported by the nPath function.</p>
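<p style = 'font-size:16px;font-family:Arial;color:#00233C'>One common way to prepare path data for a Sankey diagram is to split each accumulated path into its events and count the transitions between consecutive steps; those counts become the link widths. The sketch below illustrates that idea and is not how tdnpathviz's plot_first_main_paths is implemented. It assumes the '[a, b, c]' path format and event names without commas, and keys nodes by (step, event) so that the same event at different steps becomes a separate node. The helpers parse_path and sankey_links are hypothetical.</p>

```python
from collections import Counter

def parse_path(path):
    """Split an accumulated path such as '[Browse, Search, Purchase]' into events."""
    return [event.strip() for event in path.strip("[]").split(",") if event.strip()]

def sankey_links(paths):
    """Count transitions between consecutive events across all paths.

    Nodes are (step, event) pairs, so the same event at different steps
    becomes a separate node and the diagram flows left to right.
    """
    links = Counter()
    for path in paths:
        events = parse_path(path)
        for step, (source, target) in enumerate(zip(events, events[1:])):
            links[((step, source), (step + 1, target))] += 1
    return links

# Hypothetical accumulated paths from an nPath result
paths = ["[Browse, Search, Purchase]", "[Browse, Purchase]", "[Browse, Search, Purchase]"]
for (source, target), count in sankey_links(paths).items():
    print(source, "->", target, count)
```

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The (source, target, count) triples are the kind of input a Sankey plotting library expects, once each node is given a numeric index and a label.</p>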


In [None]:
from tdnpathviz.visualizations import plot_first_main_paths

In [None]:
plot_first_main_paths(npath_purchase.result,path_column='path',id_column='entity_id',width=1100)

<p style = 'font-size:16px;font-family:Arial;color:#00233C'> To check the details of any path or node, we can move the mouse pointer over it. For example, moving the pointer over the widest dark green path leading to the rightmost node (Purchase) shows the number of entities that followed that path, from Product Return to Purchase.<br>
When the pointer is moved over a node, for example the long purple Product Return node at the top right, it shows the incoming flow count and the outgoing flow count: the number of different flows from preceding events into this node and the number of different flows from this node to the events that follow it. Other nodes and paths can be analyzed in the same way.</p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'> This visualization takes the nPath output as its input, so here too we can see the events customers took before their final event of 'Purchase' or 'Mem Purchase' (membership purchase).</p>


<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>Conclusion</b></p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Thus, with Teradata Vantage and ClearScape Analytics we can bring some clarity to the complex analysis of customers’ behavior. Using pathing analytics, we can understand the common paths that customers take that lead to a variety of outcomes, such as sales conversion, cart abandonment, or product searches. Using Vantage to analyze all our data at scale, we have the chance to increase customer satisfaction and conversion rates.</p>

<hr style="height:2px;border:none;background-color:#00233C;">

<p style = 'font-size:20px;font-family:Arial;color:#00233C'><b>7. Cleanup</b></p>
<p style = 'font-size:18px;font-family:Arial;color:#00233C'> <b>Work tables </b></p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>We need to clean up our work tables to prevent errors the next time the notebook is run.</p>

In [None]:
db_drop_table(table_name='demo_sessionized_events') 

<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>Database and Tables </b></p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>We will use the following code to clean up tables and databases created for this demonstration.</p>

In [None]:
%run -i ../run_procedure.py "call remove_data('DEMO_Retail');" 
#Takes 10 seconds
#Please note that the same data is used in UseCases/TextProcessing_TF_IDF notebooks

In [None]:
remove_context()

<hr style="height:2px;border:none;background-color:#00233C;">
<b style = 'font-size:20px;font-family:Arial;color:#00233C'>Required Materials</b>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Let’s look at the elements we have available for reference for this notebook:</p>

<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>Links:</b></p>
<ul style = 'font-size:16px;font-family:Arial;color:#00233C'>
    <li style = 'font-size:16px;font-family:Arial;color:#00233C'><a href = 'https://docs.teradata.com/reader/eteIDCTX4O4IMvazRMypxQ/uDjppX7PJInABCckgu~KFg'>Teradata Python Package User Guide</a></li>
    <li style = 'font-size:16px;font-family:Arial;color:#00233C'><a href = 'https://docs.teradata.com/reader/GsM0pYRZl5Plqjdf9ixmdA/MzdO1q_t80M47qY5lyImOA'>Teradataml Python Reference</a></li>
    <li style = 'font-size:16px;font-family:Arial;color:#00233C'><a href = 'https://docs.teradata.com/reader/CWVY0AJy8wyyf7Sm0EsK~w/wjkE42ypEfeMkRFOIqVXfQ'>Teradata nPath Function Reference</a></li>
    <li style = 'font-size:16px;font-family:Arial;color:#00233C'><a href = 'https://docs.teradata.com/reader/CWVY0AJy8wyyf7Sm0EsK~w/RNbOiUg9~r~cxSZHrR~sFQ'>Teradata Sessionize Function Reference</a></li>
        <li style = 'font-size:16px;font-family:Arial;color:#00233C'><a href = 'https://pandas.pydata.org/docs/user_guide/index.html'>Python Pandas Reference</a></li>
        <li style = 'font-size:16px;font-family:Arial;color:#00233C'><a href = 'https://plotly.com/'>Plotly Reference</a></li>
</ul>


<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>Filters: </b></p>
<ul style = 'font-size:16px;font-family:Arial;color:#00233C'>
    <li style = 'font-size:16px;font-family:Arial;color:#00233C'><b>Industry:</b> Retail</li>
    <li style = 'font-size:16px;font-family:Arial;color:#00233C'><b>Functionality:</b> Path Analytics</li>
    <li style = 'font-size:16px;font-family:Arial;color:#00233C'><b>Use Case:</b> Digital Customer Conversion</li>
</ul>
    <p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>Related Resources:</b></p>
<ul style = 'font-size:16px;font-family:Arial;color:#00233C'>
<li style = 'font-size:16px;font-family:Arial;color:#00233C'><a href = 'https://teradata.seismic.com/Link/Content/DCGBP9J9gjD288TPcG3HFgXDHDW8'>Broken Digital Journeys CX Solution Accelerator Demo via Python Video - External - SP004183</a></li>
<li style = 'font-size:16px;font-family:Arial;color:#00233C'><a href = 'https://www.teradata.com/Blogs/Customer-360-Analytics-What-Lies-Ahead'>Customer 360 Analytics, What Lies Ahead?</a></li>
<li style = 'font-size:16px;font-family:Arial;color:#00233C'><a href = 'https://www.teradata.com/Trends/Data-Analytics#:~:text=Data%20Analytics-,Royal%20Bank%20of%20Canada%20Deepens%20the%20Customer%20Experience,-Data%20Analytics'>Royal Bank of Canada Deepens the Customer Experience</a></li>
</ul>


<footer style="padding-bottom:35px; background:#f9f9f9; border-bottom:3px solid #00233C">
    <div style="float:left;margin-top:14px">ClearScape Analytics™</div>
    <div style="float:right;">
        <div style="float:left; margin-top:14px">
            © 2023, 2024 Teradata. All rights reserved.
        </div>
    </div>
</footer>