<header>
   <p  style='font-size:36px;font-family:Arial; color:#F0F0F0; background-color: #00233c; padding-left: 20pt; padding-top: 20pt;padding-bottom: 10pt; padding-right: 20pt;'>
       Sessionize Function in Vantage
  <br>
       <img id="teradata-logo" src="https://storage.googleapis.com/clearscape_analytics_demo_data/DEMO_Logo/teradata.svg" alt="Teradata" style="width: 125px; height: auto; margin-top: 20pt;">
    </p>
</header>

<p style = 'font-size:20px;font-family:Arial'><b>Introduction</b></p>
<p style = 'font-size:16px;font-family:Arial'>A session is a group of records with a common session identifier. The Sessionize function groups records that share a similar identifier or features and assigns each group a new, unique session identifier. Here are some examples of where Sessionize can be used:<br>

<ul style='font-size:16px;font-family:Arial'>
  <li><strong>Video Streaming Sessions:</strong> Tracking a user’s watch history as a single session, from the start of a video to the end or a break in activity, to analyze content consumption and user retention.</li>
 
<li><strong>Mobile App Interaction Sessions:</strong> Aggregating user interactions within a mobile app (e.g., screen views, button clicks) within a session to measure engagement and app usage over time.</li>
 
<li><strong>IoT Device Sessions:</strong> Monitoring data from IoT devices (e.g., smart home thermostats) within defined time frames to analyze patterns in usage, efficiency, or device performance.</li>
 
<li><strong>Customer Support Sessions:</strong> Grouping interactions between a customer and support agents within a single session, from the first contact to resolution, to evaluate response times, satisfaction, and service quality.</li>
</ul></p>

<hr style="height:2px;border:none;">
<b style = 'font-size:20px;font-family:Arial'>1. Initiate a connection to Vantage</b>

<p style = 'font-size:16px;font-family:Arial'>In this section, we import the required libraries and set environment variables and environment paths (if required).</p>

In [1]:
from teradataml import *

# Modify the following to match the specific client environment settings
display.max_rows = 5

<hr style="height:1px;border:none;">
<p style = 'font-size:18px;font-family:Arial'><b>1.1 Connect to Vantage</b></p>
<p style = 'font-size:16px;font-family:Arial'>You will be prompted to provide the password. Enter your password, press the Enter key, and then use the down arrow to go to the next cell.</p>

In [2]:
%run -i ../../UseCases/startup.ipynb
eng = create_context(host = 'host.docker.internal', username='demo_user', password = password)
print(eng)

Performing setup ...
Setup complete



Enter password:  ·········


... Logon successful
Connected as: teradatasql://demo_user:xxxxx@host.docker.internal/dbc
Engine(teradatasql://demo_user:***@host.docker.internal)


In [3]:
%%capture
execute_sql('''SET query_band='DEMO=PP_Sessionize_Python.ipynb;' UPDATE FOR SESSION; ''')

<p style = 'font-size:16px;font-family:Arial'>Begin running steps with Shift + Enter keys. </p>

<hr style='height:1px;border:none;'>

<p style = 'font-size:18px;font-family:Arial'><b>1.2 Getting Data for This Demo</b></p>

<p style = 'font-size:16px;font-family:Arial'>We have provided data for this demo on cloud storage. You can either run the demo using foreign tables, which access the data without using any storage in your environment, or download the data to local storage, which may yield faster execution but consumes available storage. The following cell contains two statements, one of which is commented out. To switch modes, move the comment character to the other statement.</p>

In [4]:
%run -i ../../UseCases/run_procedure.py "call get_data('DEMO_Retail_cloud');"        # Takes 30 seconds
#%run -i ../../UseCases/run_procedure.py "call get_data('DEMO_Retail_local');" 

Database DEMO_Retail_cloud exists


<p style = 'font-size:16px;font-family:Arial'>Next is an optional step – if you want to see the status of databases/tables created and space used.</p>

In [5]:
%run -i ../../UseCases/run_procedure.py "call space_report();"        # Takes 10 seconds

You have:  #databases=1 #tables=0 #views=4  You have used 0.8 MB of 30,679.6 MB available - 0.0%  ... Space Usage OK
 
   Database Name                  #tables  #views     Avail MB      Used MB
   demo_user                            0       0  30,679.6 MB       0.8 MB 
   DEMO_Retail                          0       4       0.0 MB       0.0 MB 


<hr style="height:2px;border:none;">
<b style = 'font-size:20px;font-family:Arial'>2. Data Exploration</b>
<p style = 'font-size:16px;font-family:Arial'>Create a "Virtual DataFrame" that points to the data set in Vantage. Check the shape of the DataFrame as well as the data type of each of its columns.</p>

In [6]:
tdf = DataFrame(in_schema('DEMO_Retail', 'Retail_Events'))
print("Shape of the data: ", tdf.shape)
tdf

Shape of the data:  (35866, 3)


entity_id,datestamp,event
33646.0,2018-04-09 21:35:00.000000,Store Visit
24409.0,2018-05-09 04:53:00.000000,Purchase
24409.0,2018-05-09 04:42:00.000000,Web Chat
33299.0,2018-03-30 03:37:00.000000,Mem Purchase
33299.0,2018-03-22 08:30:00.000000,Neutral Call


In [7]:
tdf.tdtypes

COLUMN NAME,TYPE
entity_id,"DECIMAL(precision=18, scale=0)"
datestamp,TIMESTAMP()
event,"VARCHAR(length=50, charset='UNICODE')"


<p style = 'font-size:16px;font-family:Arial'>Clickstream data contains the clicks and other details that can be gathered from user interactions on a website or in an application. Sessionization of this data involves grouping records that share a common identifier into sessions. This lets us better analyze what users are doing and interacting with on the site, which can help improve the user experience or customer journey.</p>
    
<p style = 'font-size:16px;font-family:Arial'>For this example, we are taking a duration of 24 hours for our session and observing the user behavior over this timeframe.</p>
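<p style = 'font-size:16px;font-family:Arial'>The Sessionize help text states that <code>time_out</code> is given in seconds as a float, so the 24-hour window has to be converted. The following is a minimal sketch of that conversion using only the Python standard library; the variable names are illustrative and are not part of teradataml.</p>

```python
from datetime import timedelta

# Session window used in this demo: 24 hours.
session_window = timedelta(hours=24)

# Convert the window to seconds, the unit that time_out is specified in.
time_out = session_window.total_seconds()
print(time_out)  # 86400.0
```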

<p style = 'font-size:16px;font-family:Arial'>Detailed help can be found by passing the function name to the included help function. </p>

In [8]:
help(Sessionize)

Help on class Sessionize in module teradataml.analytics.sqle:

class Sessionize(teradataml.analytics.meta_class._AnalyticFunction)
 |  Sessionize(**kwargs)
 |  
 |  DESCRIPTION:
 |      Sessionize() function maps each click in a session to a unique session identifier.
 |  
 |  
 |  PARAMETERS:
 |      data:
 |          Required Argument.
 |          Specifies the input teradataml DataFrame.
 |          Types: teradataml DataFrame
 |  
 |      time_column:
 |          Required Argument.
 |          Specifies the name of the input column that contains the click
 |          times.
 |          Note: The "time_column" must also be an "order_column".
 |          Types: str
 |  
 |      time_out:
 |          Required Argument.
 |          Specifies the number of seconds at which the session times out. If
 |          "time_out" seconds elapse after a click, then the next click
 |          starts a new session.
 |          Types: float
 |  
 |      click_lag:
 |          Optional Argument.
 |  

In [9]:
# Call the Sessionize function with the following arguments:
# data - the input teradataml DataFrame.
# data_partition_column - unique identifier of the user or entity whose events we consolidate.
# data_order_column - the column or list of columns used to order the events within each partition.
# time_column - the timestamp column that the session time boundary is applied to.
# time_out - float, number of seconds after which a session times out (86400 seconds = 24 hours here).
# The function returns an instance of the "Sessionize" object. Its "result" property is a teradataml DataFrame (virtual DataFrame).

sessionized_events = Sessionize(data = tdf, 
                               data_partition_column = ['entity_id'], 
                               data_order_column = ['datestamp'], 
                               time_column = 'datestamp', 
                               time_out = 86400.00)

sessionized_events.result

entity_id,datestamp,event,SESSIONID
1578.0,2018-03-23 09:30:00.000000,Return Policy Inquiry,0
1455.0,2018-04-23 00:09:00.000000,Neutral Call,0
1455.0,2018-04-24 06:09:00.000000,Store Visit,1
1484.0,2018-04-12 09:09:00.000000,Web Chat,0
1484.0,2018-04-13 16:32:00.000000,Product Browsing,1


<p style = 'font-size:16px;font-family:Arial'>In the data returned above, we can see that the function has assigned a SESSIONID to each event based on the <code>time_out</code> value we provided.</p>
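<p style = 'font-size:16px;font-family:Arial'>To make the timeout rule concrete, here is a rough pure-Python sketch of the same idea, applied only to the five rows displayed in the sample output above. It is an illustration, not the in-database implementation: the helper <code>assign_sessions</code> is hypothetical, and treating a gap of exactly <code>time_out</code> seconds as the start of a new session is an assumption.</p>

```python
from collections import defaultdict
from datetime import datetime

TIME_OUT_SECONDS = 86400.0  # same 24-hour window as the Sessionize call

# Rows copied from the sample output above: (entity_id, datestamp, event).
rows = [
    (1578, "2018-03-23 09:30:00", "Return Policy Inquiry"),
    (1455, "2018-04-23 00:09:00", "Neutral Call"),
    (1455, "2018-04-24 06:09:00", "Store Visit"),
    (1484, "2018-04-12 09:09:00", "Web Chat"),
    (1484, "2018-04-13 16:32:00", "Product Browsing"),
]


def assign_sessions(rows, time_out):
    """Hypothetical helper: number sessions per entity, starting at 0."""
    # Partition the events by entity, like data_partition_column.
    by_entity = defaultdict(list)
    for entity_id, stamp, event in rows:
        parsed = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        by_entity[entity_id].append((parsed, event))

    result = []
    for entity_id, events in by_entity.items():
        # Order each partition by time, like data_order_column.
        events.sort(key=lambda item: item[0])
        session_id = 0
        previous = None
        for stamp, event in events:
            # Assumption: a gap of time_out seconds or more starts a new session.
            if previous is not None:
                gap = (stamp - previous).total_seconds()
                if gap >= time_out:
                    session_id += 1
            result.append((entity_id, stamp, event, session_id))
            previous = stamp
    return result


sessions = assign_sessions(rows, TIME_OUT_SECONDS)
for row in sessions:
    print(row)
```

<p style = 'font-size:16px;font-family:Arial'>For entity 1455, the two displayed events are 30 hours apart, and for entity 1484 they are 31 hours and 23 minutes apart. Both gaps exceed the 24-hour timeout, so the second event of each entity falls into session 1, matching the SESSIONID values shown above.</p>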

<hr style="height:2px;border:none;">
<b style = 'font-size:20px;font-family:Arial'>3. Cleanup</b>

<hr style="height:1px;border:none;">
<p style = 'font-size:18px;font-family:Arial'> <b>Databases and Tables </b></p>
<p style = 'font-size:16px;font-family:Arial'>The following code will clean up tables and databases created above.</p>

In [None]:
%run -i ../../UseCases/run_procedure.py "call remove_data('DEMO_Retail');"        # Takes 10 seconds

In [None]:
remove_context()

<hr style="height:1px;border:none;">
<b style = 'font-size:18px;font-family:Arial'>Dataset:</b>

`Retail_Events`

- `entity_id`: customer key
- `datestamp`: timestamp of the event tracked
- `event`: the type of event tracked (e.g., a website click)

<p style = 'font-size:16px;font-family:Arial'><b>Links:</b></p>
<ul style = 'font-size:16px;font-family:Arial'>
    <li>Teradataml Python reference: <a href = 'https://docs.teradata.com/search/all?query=Python+Package+User+Guide&content-lang=en-US'>here</a></li>
    <li>Sessionize function reference: <a href = 'https://docs.teradata.com/search/all?query=Sessionize&content-lang=en-US'>here</a></li>
</ul>

<footer style="padding-bottom:35px; border-bottom:3px solid #91A0Ab">
    <div style="float:left;margin-top:14px">ClearScape Analytics™</div>
    <div style="float:right;">
        <div style="float:left; margin-top:14px">
            Copyright © Teradata Corporation - 2025. All Rights Reserved
        </div>
    </div>
</footer>