<header>
   <p  style='font-size:36px;font-family:Arial; color:#F0F0F0; background-color: #00233c; padding-left: 20pt; padding-top: 20pt;padding-bottom: 10pt; padding-right: 20pt;'>
       Multi Touch Attribution using Vantage
  <br>
       <img id="teradata-logo" src="https://storage.googleapis.com/clearscape_analytics_demo_data/DEMO_Logo/teradata.svg" alt="Teradata" style="width: 125px; height: auto; margin-top: 20pt;">
    </p>
</header>
<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>Target Audience</b></p>
 
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>This notebook is a simplified version of the MultiTouch_Attribution_PY_SQL notebook as it is targeted for the Business Analyst persona rather than the Data Scientist persona.</p>  
    
<p style = 'font-size:20px;font-family:Arial;color:#00233C'><b>Introduction</b></p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Marketing attribution modelling techniques aim to determine the contribution of each marketing touchpoint or channel in influencing customer behavior and driving conversions. These models provide valuable insights into the effectiveness of marketing efforts, helping businesses make informed decisions regarding resource allocation and optimization.</p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'><a href='#rule'>Rule-based</a> attribution modelling relies on predetermined rules or heuristics to assign credit to various touchpoints along the customer journey. Common rule-based models include the First Touch, Last Touch, Uniform (linear) and Exponential (time decay) models. The First Touch model attributes all credit to the first touchpoint a customer interacts with, while the Last Touch model assigns all credit to the final touchpoint before conversion. The Uniform model evenly distributes credit across all touchpoints in the customer journey. The Exponential model assigns more credit to touchpoints closer to the conversion event.</p>
    
<p style = 'font-size:16px;font-family:Arial;color:#00233C'><a href='#stat'>Statistical</a> and <a href='#ml'>Algorithmic-based</a> attribution modelling, on the other hand, utilizes advanced statistical and machine learning techniques to determine the contribution of each touchpoint. These models take into account various factors such as the order, timing, and interaction patterns of touchpoints.</p>
   
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>All approaches have their strengths and limitations. <a href='#rule'>Rule-based</a> models are relatively straightforward to implement and interpret, but they may oversimplify the complexity of customer journeys. <a href='#ml'>Algorithmic-based</a> models offer more sophisticated and granular insights but may require advanced analytics expertise and extensive data sets to achieve accurate results.
It is important for businesses to select the most suitable attribution modelling approach based on their specific goals, available data, and resources. Implementing an effective marketing attribution model can significantly enhance decision-making and optimize marketing strategies.</p>
    
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Determining the importance of each interaction helps businesses influence customer behavior and drive conversions, and models built on these touchpoints show how effective marketing efforts are, which in turn supports informed decisions about resource allocation and optimization. With Teradata Vantage and ClearScape Analytics, users can get a full picture of their customers' digital actions. Using pathing analytics, businesses can understand the common paths that customers take that lead to a variety of outcomes, such as sales conversion, cart abandonment, or product searches. When businesses use Vantage to analyze all their data at scale, they have the chance to increase customer satisfaction and conversion rates.</p>
<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>Business Value</b></p>
<li style = 'font-size:16px;font-family:Arial;color:#00233C'>Increased customer conversion and attribution rates</li>
<li style = 'font-size:16px;font-family:Arial;color:#00233C'>Decreased customer churn and broken journeys</li>
<li style = 'font-size:16px;font-family:Arial;color:#00233C'>Provides an understanding of customer activity and touchpoints</li>
<li style = 'font-size:16px;font-family:Arial;color:#00233C'>Improve customer satisfaction by optimizing processes related to the touchpoints</li><p></p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'><b>Why Vantage? </b></p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Teradata Vantage provides a variety of attribution modeling approaches, including rule-based, statistical, and algorithmic-based attribution. Each has its own strengths and limitations, and each is useful to a variety of teams across an organization. Vantage has unique analytic capabilities for understanding customer and user behavior over time. Thus, implementing an effective marketing attribution model using Teradata Vantage can significantly enhance decision-making and optimize marketing strategies.</p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Also, ClearScape Analytics provides powerful, flexible attribution analysis, text processing, and statistical analytic techniques that can be applied to millions or billions of customer touchpoints. These results can be combined with other analytics to create more accurate models. Plus, Vantage allows organizations to scale these models horizontally (train segmented models per region, user type, etc.) or vertically (combine data from millions or billions of interactions). These models can be deployed operationally to understand and predict actions in real time.</p>
    
    
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>In this use case we will show several different analytic techniques to perform Multi Touch Attribution modelling and analysis using Vantage.</p>
<img src="images/Attribution.png">    
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Our innovative approach includes the use of <a href='#path'>Path Analysis</a> not only to identify and visualize customer conversion journeys but also to prepare data for advanced and sometimes creative techniques.</p>


<hr style="height:2px;border:none;background-color:#00233C;">
<p style = 'font-size:20px;font-family:Arial;color:#00233C'><b>1. Connect to Vantage</b></p>


<p style = 'font-size:16px;font-family:Arial;color:#00233C'>In this section, we import the required libraries and set environment variables and environment paths (if required).</p>

In [None]:
import teradataml as tdml
import getpass
import pandas as pd
import plotly.express as px
from sklearn.ensemble import RandomForestClassifier
import seaborn as sns
import matplotlib.pyplot as plt
import tdnpathviz
from teradataml import *
import warnings
warnings.filterwarnings('ignore')
display.max_rows = 5
from teradataml import configure
configure.val_install_location = "val"

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>We will be prompted to provide the password. We will enter the password, press the Enter key, and then use the down arrow to go to the next cell. Begin running steps with Shift + Enter keys.</p>

In [None]:
%run -i ../startup.ipynb
eng = create_context(host = 'host.docker.internal', username = 'demo_user', password = password)
print(eng)

In [None]:
%%capture
execute_sql('''SET query_band='DEMO=Analyst_MultiTouch_Attribution_PY_SQL.ipynb;' UPDATE FOR SESSION; ''')

<hr style="height:2px;border:none;background-color:#00233C;">
<p style = 'font-size:20px;font-family:Arial;color:#00233C'><b>2. Getting Data for This Demo</b></p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>We have provided data for this demo on cloud storage. We can either run the demo using foreign tables, which access the data without using any storage in our environment, or download the data to local storage, which may yield somewhat faster execution but requires us to consider available storage. There are two statements in the following cell, and one is commented out. We may switch which mode we choose by changing the comment string. In this demo, because we are using the nPath function, which needs all character data in the LATIN character set, we will only use the local option of creating tables and DDL.</p>   


In [None]:
%run -i ../run_procedure.py "call get_data('DEMO_MultiTouchAttribution_local');"
 # Takes about 1 minute 30 secs

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Next is an optional step – if you want to see status of databases/tables created and space used.</p>

In [None]:
%run -i ../run_procedure.py "call space_report();"

<hr style="height:2px;border:none;background-color:#00233C;">
<p style = 'font-size:20px;font-family:Arial;color:#00233C'><b>3. Analyze the raw data set</b></p>

<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>Data</b></p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The dataset is digital marketing data containing 586,000 marketing touchpoints from July 2018, covering 240,000 unique customers who generated ~18,000 conversions. A more detailed description of the features is shown below:

<li style = 'font-size:16px;font-family:Arial;color:#00233C'>Cookie: Anonymous customer id enabling us to track the progression of a given customer</li>
<li style = 'font-size:16px;font-family:Arial;color:#00233C'>Timestamp: Date and time when the visit took place</li>
<li style = 'font-size:16px;font-family:Arial;color:#00233C'>Interaction: Categorical variable indicating the type of interaction that took place</li>
<li style = 'font-size:16px;font-family:Arial;color:#00233C'>Conversion: Boolean variable indicating whether a conversion took place</li>
<li style = 'font-size:16px;font-family:Arial;color:#00233C'>Conversion Value: Value of the potential conversion event (revenue)</li>
<li style = 'font-size:16px;font-family:Arial;color:#00233C'>Channel: The marketing channel that brought the customer to our site</li>
</p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Let us start by creating a teradataml DataFrame, a "virtual DataFrame" that points directly to the dataset in Vantage.</p>



In [None]:
attr_df = DataFrame(in_schema('DEMO_MultiTouchAttribution', 'Attribution_Data'))
attr_df

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The Attribution data contains the channel details along with the timestamp of each touchpoint, its conversion flag, its conversion value and its cost. We select the required data and aggregate by channel to compare conversions across the types of channels.</p>

In [None]:
from sqlalchemy import literal_column
column2 = literal_column("cast('2018-07-30' as Date)")
# Keep only the touchpoints that resulted in a conversion
conversions_df = attr_df.loc[attr_df['conversion'] == 1]
# Derive a DATE column from the timestamp
conversions_df = conversions_df.assign(time = conversions_df.tmstp.cast(type_=DATE))
# Keep conversions dated before 2018-07-30
conversions_df = conversions_df[conversions_df['time'] < column2]
conversions_df = conversions_df.drop(['cookie', 'interaction'], axis=1)
# Sum conversions, conversion value and cost per channel
conversions_df = conversions_df.select(['conversion', 'conversion_value',
                           'cost', 'channel']).groupby('channel').sum()
conversions_df

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>We can see that the aggregated data is available to us in a teradataml DataFrame. Let's visualize this data to better understand the conversions by type of channel. ClearScape Analytics integrates easily with third-party visualization tools such as Tableau and PowerBI, and with Python modules such as plotly and seaborn. We can do all the calculations and pre-processing in Vantage and pass only the necessary information to the visualization tools. This not only makes the calculation faster but also reduces the overall time, because less data moves between tools. For this and the subsequent visualizations, we transfer data only where necessary.</p>

In [None]:
conversions = conversions_df.to_pandas()
fig = px.bar(data_frame = conversions, x = 'channel', y = 'sum_conversion', color = 'channel')

fig.update_layout(title = 'Channel Conversions',
                   xaxis_title = 'Channel',
                   yaxis_title = 'Conversions')
fig.show()

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The above chart shows the number of conversions by each Channel.</p>

In [None]:
channel_df = DataFrame(in_schema('DEMO_MultiTouchAttribution', 'Channel_Cost'))

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The Channel data contains the channels and cost.</p>

In [None]:
df_plot = channel_df.to_pandas().reset_index()
fig, ax = plt.subplots(figsize=(8, 5))
sns.barplot(x = 'channel',y = 'cost',data = df_plot)
plt.xlabel('Channel')
plt.ylabel('Cost of Conversion')
plt.title('Channel Cost')

plt.show()

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The cost of Online Video is the highest and Instagram is the lowest.</p>

<a id="path"></a>
<hr style="height:2px;border:none;background-color:#00233C;">
<p style = 'font-size:20px;font-family:Arial;color:#00233C'><b>4. PATH ANALYSIS</b></p>



<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>4.1. Use nPath® to visualize conversion journeys</b></p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>We want to see how our customers are converting.</p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The nPath function scans a set of rows, looking for patterns that you specify. For each set of input rows that matches the pattern, nPath produces a single output row. The function provides a flexible pattern-matching capability that lets you specify complex patterns in the input data and define the values that are output for each matched input set.</p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>nPath® is useful when your goal is to identify the paths that lead to an outcome. For example, you can use nPath to analyze:

<li style = 'font-size:16px;font-family:Arial;color:#00233C'>Web site click data, to identify paths that lead to sales over a specified amount</li>
<li style = 'font-size:16px;font-family:Arial;color:#00233C'>Sensor data from industrial processes, to identify paths to poor product quality</li>
<li style = 'font-size:16px;font-family:Arial;color:#00233C'>Healthcare records of individual patients, to identify paths that indicate that patients are at risk of developing conditions such as heart disease or diabetes</li>
<li style = 'font-size:16px;font-family:Arial;color:#00233C'>Financial data for individuals, to identify paths that provide information about credit or fraud risks.</li></p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>In the code here we can see a few key points:</p>
<li style = 'font-size:16px;font-family:Arial;color:#00233C'>The 'Pattern' we are searching for is up to 8 non-converting events (zero to eight, written EVENT{0,8}) followed by a conversion (conversion = 1).</li>
<li style = 'font-size:16px;font-family:Arial;color:#00233C'>The 'Symbols' we define are 'EVENT' for any row that is not a conversion (conversion = 0) and 'CONVERSION' for any row where the conversion column = 1.</li>
<li style = 'font-size:16px;font-family:Arial;color:#00233C'>We create a dummy 'Conversion' event to enable its visualization.</li>
</p>

In [None]:
npath_sessions = NPath(data1 = attr_df, 
                      data1_partition_column = ['cookie'], 
                      data1_order_column = ['tmstp'], 
                      mode = 'NONOVERLAPPING', 
                      symbols = ['conversion=\'1\' as CONVERSION, conversion=\'0\' as EVENT'], 
                      pattern = 'EVENT{0,8}.CONVERSION', 
                      result = ['ACCUMULATE (case when conversion=\'1\' then \'Conversion\' else channel end OF ANY(CONVERSION,EVENT)) AS path',
                                  'COUNT (* of ANY(CONVERSION,EVENT)) as event_cnt',
                                  'FIRST (cookie OF ANY(CONVERSION,EVENT)) AS cookie'])


convcntpath = npath_sessions.result
convcntpath
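
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>To make the pattern easier to reason about, here is a minimal plain-Python sketch of what EVENT{0,8}.CONVERSION in NONOVERLAPPING mode selects for a single cookie. It is only an illustration under simplifying assumptions, not the nPath implementation: the helper name <b>conversion_paths</b> and the tuple layout are hypothetical, and the touchpoints are assumed to be already ordered by timestamp.</p>

```python
def conversion_paths(touchpoints, max_events=8):
    """Emulate the pattern EVENT{0,max_events}.CONVERSION for ONE cookie.

    touchpoints: list of (channel, converted) tuples ordered by time,
    where `converted` is a bool. Hypothetical layout for this sketch.
    """
    paths = []
    pending = []  # non-converting events seen since the previous match
    for channel, converted in touchpoints:
        if converted:
            # Keep at most `max_events` events directly before the conversion
            window = pending[-max_events:] if max_events > 0 else []
            # Like the CASE expression in the notebook, the converting row
            # is labelled 'Conversion' instead of its channel
            paths.append(window + ["Conversion"])
            pending = []  # NONOVERLAPPING: start fresh after a match
        else:
            pending.append(channel)
    return paths  # trailing events without a conversion produce no path


journey = [("Facebook", False)] * 3 + [("Facebook", True)]
print(conversion_paths(journey))
```

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>If more than 8 non-converting events precede a conversion, only the 8 most recent ones are kept in the path, which is how we read the {0,8} bound.</p>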

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>A visualization of this gives us a lot of insight into the most common paths (the top 100) that users take before converting. A Sankey diagram can be created from the path output of the nPath function used in the query above.</p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'><i>**The cell below defines the sankeyPlot function, which is used later to visualize the Paths to Conversion and the Paths to Conversion by cost.</i></p>

In [None]:
#Convert Teradata nPath output to plotly Sankey
#can handle paths up to 999 links in length
import pandas as pd
import plotly.graph_objects as go
from collections import defaultdict
import random

def sankeyPlot(res, direction, title_text="Sankey nPath", topN=15):
    npath_pandas = res.copy()

    if topN:
        npath_pandas = npath_pandas.sort_values(by='count_event_cnt', ascending=False).head(topN)

    if direction == "from":
        dataDict = defaultdict(int)

        for index, row in npath_pandas.iterrows():
            pathCnt = row['count_event_cnt']
            rowList = [item.strip() for item in row['path'].replace('[','').replace(']','').split(',')]
            for i in range(len(rowList)-1):
                leftValue = rowList[i] + str(i)
                rightValue = rowList[i+1] + str(i+1)
                valuePair = leftValue + '+' + rightValue
                dataDict[valuePair] += pathCnt

        eventList = []
        for key in dataDict.keys():
            leftValue, rightValue = key.split('+')
            if leftValue not in eventList:
                eventList.append(leftValue)
            if rightValue not in eventList:
                eventList.append(rightValue)

        sankeyLabel = [s[:-1] for s in eventList]
        
        sankeySource = []
        sankeyTarget = []
        sankeyValue = []

        for key,val in dataDict.items():
            sankeySource.append(eventList.index(key.split('+')[0]))
            sankeyTarget.append(eventList.index(key.split('+')[1]))
            sankeyValue.append(val)

        sankeyColor = []
        for i in sankeyLabel:
            sankeyColor.append('#'+''.join([random.choice('0123456789ABCDEF') for _ in range(6)]))

        link = dict(source = sankeySource, target = sankeyTarget, value = sankeyValue, color='light grey')
        node=dict(label=sankeyLabel, color=sankeyColor)
        data=go.Sankey(link=link, node=node)

        fig=go.Figure(data)

        fig.update_layout(
            hovermode ='closest',
            title = title_text,
            title_font_size=20,
            plot_bgcolor='white',
            paper_bgcolor='white'
        )

        fig.show()

    elif direction == "to":
        
        dataDict = defaultdict(int)
        eventDict = defaultdict(int)
        maxPath = max((len(p.split(',')) for p in npath_pandas['path']), default=0)  # longest path length, used to right-align paths
    
        for index, row in npath_pandas.iterrows():
            rowList = row['path'].replace('[','').replace(']','').split(',')
            pathCnt = row['count_event_cnt']
            pathLen = len(rowList)
            for i in range(len(rowList)-1):
                leftValue = str(1000 + i + maxPath - pathLen) + rowList[i].strip()
                rightValue = str(1000 + i + 1 + maxPath - pathLen) + rowList[i+1].strip()
                valuePair = leftValue + '+' + rightValue
                dataDict[valuePair] += pathCnt
                eventDict[leftValue] += 1
                eventDict[rightValue] += 1
    
        eventList = []
        for key,val in eventDict.items():
            eventList.append(key)
    
        sortedEventList = sorted(eventList)
        sankeyLabel = []
        for event in sortedEventList:
            sankeyLabel.append(event[4:])
    
        sankeySource = []
        sankeyTarget = []
        sankeyValue = []

        for key,val in dataDict.items():
            sankeySource.append(sortedEventList.index(key.split('+')[0]))
            sankeyTarget.append(sortedEventList.index(key.split('+')[1]))
            sankeyValue.append(val)
    
        sankeyColor = []
        for i in sankeyLabel:
            sankeyColor.append('#'+''.join([random.choice('0123456789ABCDEF') for _ in range(6)]))
    
        link = dict(source = sankeySource, target = sankeyTarget, value = sankeyValue, color='light grey')
        data=go.Sankey(link=link, node=dict(label=sankeyLabel))
    
        fig=go.Figure(data)
        fig.update_layout(
                hovermode ='closest',
                title = title_text,
                title_font_size=20,
                plot_bgcolor='white',
                paper_bgcolor='white'
                )
    
        fig.show()

    else:
        print("Invalid direction.")
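
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The "to" branch of sankeyPlot shifts shorter paths to the right so that every path ends in the same column, then sums the path counts per pair of neighbouring steps. The sketch below shows that idea on two toy paths. The helper name <b>right_aligned_links</b> and the toy counts are made up for illustration, and the function is a simplified stand-in rather than a replacement for sankeyPlot.</p>

```python
from collections import Counter


def right_aligned_links(path_counts):
    """Aggregate (source, target) link weights for a 'to' style Sankey.

    path_counts: dict mapping a tuple of steps to the number of journeys
    that followed exactly that path (hypothetical input layout).
    """
    longest = max(len(path) for path in path_counts)
    links = Counter()
    for path, count in path_counts.items():
        offset = longest - len(path)  # shift short paths to the right
        for i in range(len(path) - 1):
            # A node is identified by its column and its label
            source = (offset + i, path[i])
            target = (offset + i + 1, path[i + 1])
            links[(source, target)] += count
    return links


toy_paths = {
    ("Facebook", "Conversion"): 5,
    ("Paid Search", "Facebook", "Conversion"): 2,
}
for (source, target), value in sorted(right_aligned_links(toy_paths).items()):
    print(source, "->", target, value)
```

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Because both toy paths end with Facebook followed by Conversion in the same columns, their counts are merged into a single link, which is why the links into the Conversion node of the diagram are the widest.</p>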

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>We will consider an example where we create a path for a cookie which leads to conversion.</p>

In [None]:
attr_df[attr_df['cookie'] == 'FFfBikCE3onF3hACFCCE9iDf3'].sort('tmstp')

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The above table shows the touchpoints of one cookie ordered by timestamp (tmstp). We can see that there were 3 touchpoints on the Facebook channel where no conversion happened. Finally, on the 4th touchpoint on the Facebook channel, the conversion takes place. So the path is:</p>
<p style = 'font-size:14px;font-family:Arial;color:#00233C'><b>Facebook</b><b style = 'font-size:12px;font-family:Arial;color:#00233C'>(2018-07-02 16:08:02)--></b><b style = 'font-size:14px;font-family:Arial;color:#00233C'>Facebook</b><b style = 'font-size:12px;font-family:Arial;color:#00233C'>(2018-07-08 18:38:32)--></b><b style = 'font-size:14px;font-family:Arial;color:#00233C'>Facebook</b><b style = 'font-size:12px;font-family:Arial;color:#00233C'>(2018-07-10 12:30:15)--></b><b style = 'font-size:14px;font-family:Arial;color:#00233C'>Facebook</b><b style = 'font-size:12px;font-family:Arial;color:#00233C'>(2018-07-14 10:33:31)--></b><b style = 'font-size:14px;font-family:Arial;color:#00233C'>Conversion</b></p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Below we plot the top 100 paths that led to conversion, ranked by the number of journeys that followed each path.</p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'><i>**The visualization takes around 1 minute 30 seconds to execute</i></p>

In [None]:
res = convcntpath\
                    .groupby(['path'])\
                    .count()\
                    .sort('count_event_cnt',ascending=False)\
                    .to_pandas()\
                    .head(100)

In [None]:
sankeyPlot(res, "to", "Path to Conversion", 100)

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The above Sankey Diagram shows the paths that led to Conversion.</p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>We can check the details of any path or node by moving the mouse pointer over it. For example, if we move the pointer over the widest link at the top, which goes to the rightmost node (Conversion), it shows <b>2.30k, source: Facebook, target: Conversion.</b> This means there were 2.30k touchpoints where, after going to Facebook, the next event was Conversion. Similarly, 1.92k Online Video touchpoints, 1.98k Paid Search touchpoints, 873 Instagram touchpoints and 816 Online Display touchpoints led to Conversion.</p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>When we move the pointer over a node, for example the largest node at the top just before Conversion, which is <b>Facebook</b>, it shows <b>incoming flow count: 5 and outgoing flow count: 1</b>. This means that 5 different paths lead to this Facebook node and that its 1 outgoing link leads to Conversion. Similarly, other nodes and paths can be analyzed.</p>



<hr style="height:1px;border:none;background-color:#00233C;">
<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>4.2 Use nPath as a data preparation function and input to additional analytics techniques</b></p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>In this step we use the nPath function to create input tables for the statistical and machine learning based approaches. These tables are used in the analysis below, for example in the Term Frequency - Inverse Document Frequency (TF-IDF) analysis, where we score the converting and non-converting journeys.</p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'><b> Create a table with all converting journeys</b></p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>We are creating a table with all kinds of paths that lead to Conversion.  To achieve this, we look at any sequence of events ending with a conversion.</p>

In [None]:
npath_ConvJour = NPath(data1 = attr_df, 
                      data1_partition_column = ['cookie'], 
                      data1_order_column = ['tmstp'], 
                      mode = 'NONOVERLAPPING', 
                      symbols = ['conversion=\'1\' as C, conversion=\'0\' as E'], 
                      pattern = 'E*.C', 
                      result = ['ACCUMULATE (channel OF ANY(C,E)) AS path'
                                ,'COUNT (* of ANY(C,E)) as event_cnt'
                                ,'FIRST (cookie OF ANY(C,E)) AS cookie'])


npath_ConvJour_df = npath_ConvJour.result
npath_ConvJour_df = npath_ConvJour_df[npath_ConvJour_df['event_cnt']>1]
npath_ConvJour_df

<p style = 'font-size:16px;font-family:Arial;color:#00233C'><b>Create a table with all non-converting journeys (leaving out potential converting journeys)</b></p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>We are creating a table with all kinds of paths that do not lead to any Conversion. To achieve this, we look for all paths whose cookies are not part of any converting journey (as defined in the previous step), and we leave out any potential converting journeys.</p>

In [None]:
# Collect the distinct cookies that appear in at least one converting journey
dist = npath_ConvJour_df.get('cookie')
dist_val = dist.get_values()
list_val = [dist_val[i][0] for i in range(len(dist_val))]
list_val = list(set(list_val))

In [None]:
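# Latest timestamp at which any conversion was observed
max_tmstp = attr_df[attr_df['conversion'] == '1'].select('tmstp').max()
# Keep only the touchpoints that occurred before that timestamp
Journey_data = attr_df.merge(right = max_tmstp, how = "inner", on = ["tmstp < max_tmstp"])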
Journey_data

In [None]:
copy_to_sql(df = Journey_data, table_name='journey_data', if_exists='replace')
Journey_data = DataFrame('journey_data')

In [None]:
npath_NConvJour = NPath(data1 = Journey_data, 
                      data1_partition_column = ['cookie'], 
                      data1_order_column = ['tmstp'], 
                      mode = 'NONOVERLAPPING', 
                      symbols = ['TRUE as A'], 
                      pattern = 'A*', 
                      result = ['ACCUMULATE (channel of ANY(A)) as path'
                                ,'ACCUMULATE (conversion of ANY(A)) as conv'
                                ,'COUNT (* of ANY(A)) as event_cnt'
                                ,'FIRST (cookie OF ANY(A)) AS cookie'])

npath_NConvJour_df = npath_NConvJour.result
npath_NConvJour_df

In [None]:
npath_NConvJour_df = npath_NConvJour_df[npath_NConvJour_df['event_cnt']>1]
npath_NConvJour_df = npath_NConvJour_df[npath_NConvJour_df['conv'].str.contains('1') == False]
npath_NConvJour_df = npath_NConvJour_df[~npath_NConvJour_df.cookie.isin(list_val)]
npath_NConvJour_df
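
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The three filters above can be read as one rule: keep a journey only if it has more than one event, contains no conversion flag, and belongs to a cookie that never converted. The plain-Python sketch below applies the same rule to a few hand-made rows. The function name <b>non_converting_journeys</b>, the dictionary layout and the cookie ids are hypothetical and serve only to illustrate the logic.</p>

```python
def non_converting_journeys(journeys, converting_cookies):
    """Keep journeys that match the three filters used in the notebook.

    journeys: list of dicts with keys 'cookie', 'path', 'conv', 'event_cnt'
    (hypothetical layout mirroring the nPath output columns).
    converting_cookies: set of cookies seen in any converting journey.
    """
    return [
        j for j in journeys
        if j["event_cnt"] > 1                    # more than one touchpoint
        and "1" not in j["conv"]                 # no conversion in the journey
        and j["cookie"] not in converting_cookies  # cookie never converted
    ]


rows = [
    {"cookie": "c1", "path": "[Facebook, Instagram]", "conv": "[0, 0]", "event_cnt": 2},
    {"cookie": "c2", "path": "[Facebook]", "conv": "[0]", "event_cnt": 1},
    {"cookie": "c3", "path": "[Paid Search, Facebook]", "conv": "[0, 1]", "event_cnt": 2},
    {"cookie": "c4", "path": "[Online Video, Facebook]", "conv": "[0, 0]", "event_cnt": 2},
]
kept = non_converting_journeys(rows, converting_cookies={"c4"})
print([j["cookie"] for j in kept])
```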

<a id="rule"></a>
<hr style="height:2px;border:none;background-color:#00233C;">
<p style = 'font-size:20px;font-family:Arial;color:#00233C'><b>5. RULE BASED MODELS</b></p>


<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Rule Based attribution models assign conversion credits (weights) to touchpoints in a conversion path according to certain predefined rules.
</p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>These rules are used to identify the position of an interaction on the conversion path and then assign conversion credit solely on the basis of its position.
</p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>To execute rule based models we can leverage the Vantage native Attribution function and easily consider the following methods:
<ul style = 'font-size:16px;font-family:Arial;color:#00233C'>
    <li>Uniform: Conversion event is attributed uniformly to preceding attributable events.</li>
    <li>First Click: Conversion event is attributed entirely to first attributable event.</li>
    <li>Last Click: Conversion event is attributed entirely to most recent attributable event.</li>
    <li>Exponential:  Conversion event is attributed exponentially to preceding attributable events (the more recent the event, the higher the attribution).</li>
 </ul>
</p>
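<p style = 'font-size:16px;font-family:Arial;color:#00233C'>As a rough illustration of the four rules, the sketch below computes the credit each touchpoint of a single journey would receive. It is a simplified plain-Python approximation and not the Vantage Attribution function: the function name <b>attribution_weights</b> and the <b>decay</b> parameter are assumptions made for this example, and the exact exponential weighting used by Vantage may differ.</p>

```python
def attribution_weights(n_touchpoints, model, decay=0.5):
    """Return one credit weight per touchpoint, ordered oldest to newest.

    `decay` is an illustrative parameter used by the exponential rule only.
    """
    if n_touchpoints < 1:
        raise ValueError("a journey needs at least one touchpoint")
    if model == "first_click":
        return [1.0] + [0.0] * (n_touchpoints - 1)
    if model == "last_click":
        return [0.0] * (n_touchpoints - 1) + [1.0]
    if model == "uniform":
        return [1.0 / n_touchpoints] * n_touchpoints
    if model == "exponential":
        # The most recent touchpoint gets raw weight 1, the one before it
        # `decay`, then decay**2, and so on; normalise so credits sum to 1
        raw = [decay ** (n_touchpoints - 1 - i) for i in range(n_touchpoints)]
        total = sum(raw)
        return [w / total for w in raw]
    raise ValueError(f"unknown model: {model}")


journey = ["Facebook", "Paid Search", "Online Video", "Facebook"]
for model in ("first_click", "last_click", "uniform", "exponential"):
    credits = attribution_weights(len(journey), model)
    print(model, [round(c, 3) for c in credits])
```

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>In every rule the credits of one journey sum to 1, so summing them per channel across all converting journeys gives each channel's share of the conversions.</p>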

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The function takes data and parameters from multiple tables and outputs attributions. Please refer to Teradata Vantage™ - Analytics Database Analytic Functions documentation for more on Attribution function.</p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Attribution Input :
<ol style = 'font-size:14px;font-family:Arial;color:#00233C'>
<li style = 'font-size:14px;font-family:Arial;color:#00233C'>Input tables (maximum of five) (Contain data for computing attributions).</li>
<li style = 'font-size:14px;font-family:Arial;color:#00233C'>ConversionEventTable (Contains conversion events).</li>
<li style = 'font-size:14px;font-family:Arial;color:#00233C'>FirstModelTable (Defines type and distributions of model - we'll create one table per model)</li></ol>
</p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Attribution Syntax Elements:
<ol style = 'font-size:14px;font-family:Arial;color:#00233C'>
<li style = 'font-size:14px;font-family:Arial;color:#00233C'>EventColumn specifies the name of the input column that contains the events.</li>
<li style = 'font-size:14px;font-family:Arial;color:#00233C'>TimeColumn specifies the name of the input column that contains the timestamps of the  events.</li>
<li style = 'font-size:14px;font-family:Arial;color:#00233C'>WindowSize specifies how to determine the maximum window size for the attribution calculation.</li></ol>
    </p>


<hr style="height:1px;border:none;background-color:#00233C;">
<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>5.1. Create Conversion Event Table.</b></p> 
<p style = 'font-size:16px;font-family:Arial;color:#00233C'> Since we are focusing on the events that led to Conversion our ATTRIBUTION CONVERSION Table will have only one value <b>'conversion'</b>.</p>     
    

In [None]:
# Drop the table if it already exists
qry = 'DROP TABLE ATTRIBUTION_CONVERSION;'
try:
    execute_sql(qry)
except Exception as e:
    if '3807' in str(e.args):  # 3807: object does not exist
        pass
    else:
        raise

# Create the table
qry = '''
CREATE MULTISET TABLE ATTRIBUTION_CONVERSION
(
    CONVERSION VARCHAR(100)
);
'''

# Execute the query
execute_sql(qry)

# Insert the name of the conversion event
qry = '''
INSERT INTO ATTRIBUTION_CONVERSION VALUES ('conversion');
'''

# Execute the query
execute_sql(qry)

<hr style="height:1px;border:none;background-color:#00233C;">
<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>5.2 Create model specifications tables</b></p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>We need to create one model table for each type of attribution: First Click, Last Click, Uniform and Exponential. We therefore create four model tables below and insert the specification rows for each model type.</p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'><b>Uniform Model (applies equal weighting to all contributing touchpoints in the customer journey)</b></p>

In [None]:
# Drop the table if it already exists
qry = 'DROP TABLE ATTRIBUTION_MODEL_UNIFORM;'
try:
    execute_sql(qry)
except Exception as e:
    if str(e.args).find('3807') >= 1:
        pass
    else:
        raise

# Create the table
qry = '''
CREATE MULTISET TABLE ATTRIBUTION_MODEL_UNIFORM
(
    ID   INT,
    MODEL VARCHAR(100)
);
'''

# Execute the query
execute_sql(qry)

#Insert model specification values (line1)
qry = '''
INSERT INTO ATTRIBUTION_MODEL_UNIFORM VALUES (0,'EVENT_REGULAR');
'''

# Execute the query
execute_sql(qry)

#Insert model specification values (line2)
qry = '''
INSERT INTO ATTRIBUTION_MODEL_UNIFORM VALUES (1,'ALL:1.0:UNIFORM:NA');
'''

# Execute the query
execute_sql(qry)

<p style = 'font-size:16px;font-family:Arial;color:#00233C'><b> First Click Model (100% of the credit is directly attributed to the first interaction in the customer journey)</b></p>


In [None]:
# Drop the table if it already exists
qry = 'DROP TABLE ATTRIBUTION_MODEL_FIRSTCLICK;'
try:
    execute_sql(qry)
except Exception as e:
    if str(e.args).find('3807') >= 1:
        pass
    else:
        raise

# Create the table
qry = '''
CREATE MULTISET TABLE ATTRIBUTION_MODEL_FIRSTCLICK
(
    ID   INT,
    MODEL VARCHAR(100)
);
'''

# Execute the query
execute_sql(qry)

#Insert model specification values (line1)
qry = '''
INSERT INTO ATTRIBUTION_MODEL_FIRSTCLICK VALUES (0,'EVENT_REGULAR');
'''

# Execute the query
execute_sql(qry)

#Insert model specification values (line2)
qry = '''
INSERT INTO ATTRIBUTION_MODEL_FIRSTCLICK VALUES (1,'ALL:1.0:FIRST_CLICK:NA');
'''

# Execute the query
execute_sql(qry)

<p style = 'font-size:16px;font-family:Arial;color:#00233C'><b> Last Click Model (100% of the credit is directly attributed to the last interaction in the customer journey)</b></p>

In [None]:
# Drop the table if it already exists
qry = 'DROP TABLE ATTRIBUTION_MODEL_LASTCLICK;'
try:
    execute_sql(qry)
except Exception as e:
    if str(e.args).find('3807') >= 1:
        pass
    else:
        raise

# Create the table
qry = '''
CREATE MULTISET TABLE ATTRIBUTION_MODEL_LASTCLICK
(
    ID   INT,
    MODEL VARCHAR(100)
);
'''

# Execute the query
execute_sql(qry)

#Insert model specification values (line1)
qry = '''
INSERT INTO ATTRIBUTION_MODEL_LASTCLICK VALUES (0,'EVENT_REGULAR');
'''

# Execute the query
execute_sql(qry)

#Insert model specification values (line2)
qry = '''
INSERT INTO ATTRIBUTION_MODEL_LASTCLICK VALUES (1,'ALL:1.0:LAST_CLICK:NA');
'''

# Execute the query
execute_sql(qry)

<p style = 'font-size:16px;font-family:Arial;color:#00233C'><b> Exponential Model (assigns exponentially more weight to the interactions which are closest in time to conversion)</b></p>

In [None]:
# Drop the table if it already exists
qry = 'DROP TABLE ATTRIBUTION_MODEL_EXPONENTIAL;'
try:
    execute_sql(qry)
except Exception as e:
    if str(e.args).find('3807') >= 1:
        pass
    else:
        raise

# Create the table
qry = '''
CREATE MULTISET TABLE ATTRIBUTION_MODEL_EXPONENTIAL
(
    ID   INT,
    MODEL VARCHAR(100)
);
'''

# Execute the query
execute_sql(qry)

#Insert model specification values (line1)
qry = '''
INSERT INTO ATTRIBUTION_MODEL_EXPONENTIAL VALUES (0,'EVENT_REGULAR');
'''

# Execute the query
execute_sql(qry)

#Insert model specification values (line2)
qry = '''
INSERT INTO ATTRIBUTION_MODEL_EXPONENTIAL VALUES (1,'ALL:1.0:EXPONENTIAL:0.5,ROW');
'''

# Execute the query
execute_sql(qry)
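
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The four cells above repeat the same drop, create and insert steps; only the table name and the model string change. As a minimal sketch, the statements could be generated by one helper. The function name <code>model_table_statements</code> is hypothetical and not part of teradataml. It only builds strings, so each statement would still be passed to <code>execute_sql</code>, and the DROP would still need the error 3807 guard used above.</p>

```python
def model_table_statements(table_name, model_spec, model_type="EVENT_REGULAR"):
    """Return the SQL statements that (re)build one attribution model table.

    Hypothetical helper: it only builds strings and does not connect to Vantage.
    """
    return [
        f"DROP TABLE {table_name};",
        f"CREATE MULTISET TABLE {table_name} (ID INT, MODEL VARCHAR(100));",
        f"INSERT INTO {table_name} VALUES (0,'{model_type}');",
        f"INSERT INTO {table_name} VALUES (1,'{model_spec}');",
    ]


# Model strings copied from the cells above.
MODEL_SPECS = {
    "ATTRIBUTION_MODEL_UNIFORM": "ALL:1.0:UNIFORM:NA",
    "ATTRIBUTION_MODEL_FIRSTCLICK": "ALL:1.0:FIRST_CLICK:NA",
    "ATTRIBUTION_MODEL_LASTCLICK": "ALL:1.0:LAST_CLICK:NA",
    "ATTRIBUTION_MODEL_EXPONENTIAL": "ALL:1.0:EXPONENTIAL:0.5,ROW",
}

all_statements = {
    name: model_table_statements(name, spec) for name, spec in MODEL_SPECS.items()
}

# Show the statements for one of the four tables.
for statement in all_statements["ATTRIBUTION_MODEL_EXPONENTIAL"]:
    print(statement)
```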

<hr style="height:1px;border:none;background-color:#00233C;">
<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>5.3. Compute all four models and store outputs in a table</b></p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>With the four model tables in place, we use them to compute the attribution of each channel under each model, as in the cells below.</p> 

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The WindowSize argument of the Attribution function limits how far back from a conversion the rule-based models look. We use the "rows:K" option, which assigns attribution to at most K events before the conversion event, counted from the most recent to the least recent. In our case K=20.</p>
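
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>To make the four rules concrete, here is a plain-Python sketch that splits one unit of conversion credit across a single journey. It is an illustration only and not how the Attribution function is implemented. The function name <code>rule_based_weights</code> and the sample journey are hypothetical, and the exponential rule assumes weights proportional to alpha raised to the number of rows back from the conversion, normalized to sum to 1.</p>

```python
def rule_based_weights(touchpoints, model, window=20, alpha=0.5):
    """Split one unit of conversion credit across the last `window` touchpoints.

    touchpoints: channels ordered from oldest to most recent, conversion excluded.
    alpha: decay factor, used only by the EXPONENTIAL rule (assumed row-based decay).
    """
    considered = touchpoints[-window:]  # keep at most `window` rows before the conversion
    n = len(considered)
    if n == 0:
        return []
    if model == "UNIFORM":
        raw = [1.0] * n
    elif model == "FIRST_CLICK":
        raw = [1.0] + [0.0] * (n - 1)
    elif model == "LAST_CLICK":
        raw = [0.0] * (n - 1) + [1.0]
    elif model == "EXPONENTIAL":
        # Most recent touchpoint gets alpha**0, the one before it alpha**1, and so on.
        raw = [alpha ** (n - 1 - i) for i in range(n)]
    else:
        raise ValueError(f"unknown model: {model}")
    total = sum(raw)
    return [(channel, weight / total) for channel, weight in zip(considered, raw)]


# Hypothetical journey, oldest touchpoint first.
journey = ["Facebook", "Paid Search", "Instagram"]
for model in ("UNIFORM", "FIRST_CLICK", "LAST_CLICK", "EXPONENTIAL"):
    print(model, [(c, round(w, 3)) for c, w in rule_based_weights(journey, model)])
```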



In [None]:
attr_conversion =  DataFrame('ATTRIBUTION_CONVERSION')
attr_uniform =  DataFrame('ATTRIBUTION_MODEL_UNIFORM')
attr_fc =  DataFrame('ATTRIBUTION_MODEL_FIRSTCLICK')
attr_lc =  DataFrame('ATTRIBUTION_MODEL_LASTCLICK')
attr_exp =  DataFrame('ATTRIBUTION_MODEL_EXPONENTIAL')

In [None]:
attribution_uniform = Attribution(data=attr_df,
                                             data_partition_column="cookie",
                                             data_order_column="tmstp",
                                             event_column="interaction",
                                             conversion_data=attr_conversion,
                                             timestamp_column = "tmstp",
                                             window_size = "rows:20",
                                             model1_type=attr_uniform)

AttrUNI_df = attribution_uniform.result
attribution_FC = Attribution(data=attr_df,
                                             data_partition_column="cookie",
                                             data_order_column="tmstp",
                                             event_column="interaction",
                                             conversion_data=attr_conversion,
                                             timestamp_column = "tmstp",
                                             window_size = "rows:20",
                                             model1_type=attr_fc)

AttrFC_df = attribution_FC.result
attribution_LC = Attribution(data=attr_df,
                                             data_partition_column="cookie",
                                             data_order_column="tmstp",
                                             event_column="interaction",
                                             conversion_data=attr_conversion,
                                             timestamp_column = "tmstp",
                                             window_size = "rows:20",
                                             model1_type=attr_lc)

AttrLC_df = attribution_LC.result
attribution_EXP = Attribution(data=attr_df,
                                             data_partition_column="cookie",
                                             data_order_column="tmstp",
                                             event_column="interaction",
                                             conversion_data=attr_conversion,
                                             timestamp_column = "tmstp",
                                             window_size = "rows:20",
                                             model1_type=attr_exp)

AttrEXP_df = attribution_EXP.result


In [None]:
attr_12_df = AttrUNI_df.merge(right = AttrFC_df, how = "inner" , on = ["cookie","tmstp", "channel"], lsuffix = "t1", rsuffix = "t2")
attr_34_df = AttrLC_df.merge(right = AttrEXP_df, how = "inner" , on = ["cookie","tmstp", "channel"], lsuffix = "t3", rsuffix = "t4")
attr_all_df = attr_12_df.merge(right = attr_34_df, how = "inner" , on = ["cookie_t1 = cookie_t3","tmstp_t1 = tmstp_t3"
                                                                            , "channel_t1 = channel_t3"], lsuffix = "t5", rsuffix = "t6")


In [None]:
attr_4model_df = attr_all_df.select(['COOKIE_t1','TMSTP_t1','CHANNEL_t1', 
                                     'attribution_t1','attribution_t2','attribution_t3','attribution_t4',
                                     'time_to_conversion_t1', 'time_to_conversion_t2', 'time_to_conversion_t3',
                                     'time_to_conversion_t4'])

attr_4model_df = attr_4model_df.assign(drop_columns = True, 
                                       cookie = attr_4model_df.COOKIE_t1,
                                       tmstp = attr_4model_df.TMSTP_t1,
                                       channel = attr_4model_df.CHANNEL_t1,
                                       uni_attr = attr_4model_df.attribution_t1,
                                       uni_ttc = attr_4model_df.time_to_conversion_t1,
                                       fc_attr = attr_4model_df.attribution_t2,
                                       fc_ttc = attr_4model_df.time_to_conversion_t2,
                                       lc_attr = attr_4model_df.attribution_t3,
                                       lc_ttc = attr_4model_df.time_to_conversion_t3,
                                       exp_attr = attr_4model_df.attribution_t4,
                                       exp_ttc = attr_4model_df.time_to_conversion_t4)
attr_4model_df

<hr style="height:1px;border:none;background-color:#00233C;">
<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>5.4. Calculate attribution weights by channel and rule based model</b></p>

In [None]:
attr_tot = attr_4model_df.select(['exp_attr','fc_attr','lc_attr','uni_attr']).sum()
attr_channel_tot = attr_4model_df.select(['channel','exp_attr','fc_attr','lc_attr','uni_attr']).groupby('channel').sum()

In [None]:
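# attr_tot holds a single row of grand totals. The inequality join condition pairs that row
# with every channel row, which works as long as each channel's sum is below the grand total.
attr_channel_tot = attr_channel_tot.merge(right=attr_tot, how="inner", 
                                          on=["sum_uni_attr < sum_uni_attr"]
                                         , lsuffix = "t1",rsuffix="t2")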
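
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The join above attaches the grand totals to every channel row so that each channel sum can be divided by the matching total. The same normalization is sketched below in plain Python on made-up numbers. The rows and variable names are hypothetical and only one model column is shown.</p>

```python
from collections import defaultdict

# Hypothetical per-touchpoint attribution rows: (channel, uniform-model weight).
rows = [
    ("Facebook", 0.5), ("Instagram", 0.5),
    ("Facebook", 1.0),
    ("Paid Search", 0.25), ("Facebook", 0.25), ("Instagram", 0.25), ("Online Video", 0.25),
]

channel_sum = defaultdict(float)
for channel, weight in rows:
    channel_sum[channel] += weight  # per-channel sum, like groupby('channel').sum()

grand_total = sum(channel_sum.values())  # the single-row total
channel_share = {channel: value / grand_total for channel, value in channel_sum.items()}

for channel, share in channel_share.items():
    print(channel, round(share, 3))
```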


In [None]:
attr_channel_total = attr_channel_tot.assign(drop_columns = True ,
                                           channel = attr_channel_tot.channel,
                                           tot_uni_attr = attr_channel_tot.sum_uni_attr_t1 / attr_channel_tot.sum_uni_attr_t2,
                                           tot_fc_attr = attr_channel_tot.sum_fc_attr_t1 / attr_channel_tot.sum_fc_attr_t2,
                                           tot_lc_attr = attr_channel_tot.sum_lc_attr_t1 / attr_channel_tot.sum_lc_attr_t2,
                                           tot_exp_attr = attr_channel_tot.sum_exp_attr_t1 / attr_channel_tot.sum_exp_attr_t2)
attr_channel_total

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The output above shows each channel's share of total attribution under each of the four models.</p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The aggregated data is now available in a teradataml DataFrame. Let's visualize it to better understand attribution by channel. ClearScape Analytics integrates with third-party visualization tools such as Tableau and Power BI, and with Python libraries such as plotly and seaborn. Doing the calculations and pre-processing in Vantage and passing only the aggregated results to the visualization tool keeps the computation close to the data and reduces the time spent moving data between tools.</p>

In [None]:
attr_channel_plot = attr_channel_total.to_pandas()
import plotly.graph_objects as go

fig = go.Figure(
    data = [
        go.Bar(name='Uniform', x=attr_channel_plot["channel"], y=attr_channel_plot["tot_uni_attr"], yaxis='y', offsetgroup=1,marker_color='#76B7B2'),
        go.Bar(name='First Click', x=attr_channel_plot["channel"], y=attr_channel_plot["tot_fc_attr"], yaxis='y', offsetgroup=2, marker_color='#F28E2B'),
        go.Bar(name='Last Click', x=attr_channel_plot["channel"], y=attr_channel_plot["tot_lc_attr"], yaxis='y', offsetgroup=3,marker_color='#E15759'),
        go.Bar(name='Exponential', x=attr_channel_plot["channel"], y=attr_channel_plot["tot_exp_attr"], yaxis='y', offsetgroup=4,marker_color='#4E79A7')
    ],
    layout = {
        'yaxis': {'title': 'Attribution '},

    }
)
 
# Change the bar mode
fig.update_layout(barmode = 'group')
fig.show()

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The chart above shows the attribution values for each channel under the four models. The Facebook channel has the highest attribution value in all four models and Online Display has the lowest.</p>

<hr style="height:1px;border:none;background-color:#00233C;">
<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>5.5. Exploring the Uniform Model in more detail</b></p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Whatever the model, the Attribution function outputs a score (or attribution weight) and computes the time to conversion.</p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>We can put this information in perspective with channel cost to measure and visualize channel effectiveness.</p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The uniform model can serve as a starting point or baseline for attribution analysis. It provides a benchmark against which more advanced attribution models can be compared. By evaluating other models relative to the uniform model, marketers can see what additional value more sophisticated approaches, such as statistical or machine learning models, actually offer. Some of these models are used later in this notebook.</p>

In [None]:
attr_uni = attr_4model_df.assign(uni_ttc_rev = attr_4model_df.uni_ttc * -1)
channel_attr_cost = attr_uni.merge(right=channel_df, how="inner", on = ["channel"],lsuffix="t1", rsuffix = "t2")


In [None]:
channel_attr_cost = channel_attr_cost.select(['channel_t1','uni_attr','uni_ttc_rev','cost']).groupby('channel_t1').agg({'uni_attr' : ['sum'], 'uni_ttc_rev' : ['mean'],'cost' : ['sum']})
channel_attr_cost = channel_attr_cost.assign(drop_columns=True,
                                             channel = channel_attr_cost.channel_t1,
                                             total_attribution = channel_attr_cost.sum_uni_attr,
                                             total_cost = channel_attr_cost.sum_cost,
                                             time_to_conversion = channel_attr_cost.mean_uni_ttc_rev/86400)
channel_attr_cost

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Total attribution and time to conversion come from the output of the Attribution function used above, taking only the scores from the UNIFORM model (the sum of uni_attr). Cost comes from the channel data joined in the previous cell, and time to conversion is divided by 86400 to express it in days.</p> 
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>All three dimensions - cost, attribution and time to conversion - can be plotted on a bubble chart, with the size of each bubble showing the cost.</p>

In [None]:
AttribUni_plot = channel_attr_cost.to_pandas()
import plotly.express as px
ax = px.scatter(AttribUni_plot, x="total_attribution", y="time_to_conversion",
              size="total_cost",size_max = 70,color="channel",hover_data=['channel'],
              width=900, height=400, 
              color_discrete_map = {'Online Display': '#E15759','Online Video': '#76B7B2','Facebook': '#4E79A7','Instagram': '#F28E2B' ,'Paid Search': '#59A14F'},
             labels={
                     "total_attribution": "Total  Attribution",
                     "time_to_conversion": "Time to Conversion (Days)"
        }
             )
ax.update_layout(showlegend=False)
ax.update_layout(title_text='Channel Performance - Uniform Model', title_x=0.5)
ax.show()

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The above graph shows channel performance under the UNIFORM model. The size of each circle depends on the total cost of the channel. Hovering over a circle shows the channel, its attribution value, its time to conversion and its cost. The largest circle is Online Video, followed by Facebook, which indicates that the Online Video channel performs worse than Facebook (higher cost, lower attribution).</p>

<a id="stat"></a>
<hr style="height:2px;border:none;background-color:#00233C;">
<p style = 'font-size:20px;font-family:Arial;color:#00233C'><b>6. STATISTICAL BASED MODELS</b></p>


<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>6.1 SIMPLE FREQUENCY ANALYSIS</b></p>



<p style = 'font-size:16px;font-family:Arial;color:#00233C'>A simple frequency analysis, which counts the occurrences of each channel in the journeys leading to conversion, can be used as a basic approach to compute marketing attribution.</p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>NGramSplitter considers each input row to be one document and returns a row for each unique n-gram in each document. For each document it also returns the count of each n-gram and the total number of n-grams.</p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>NGramSplitter comes from natural language processing, where it divides text into smaller units known as n-grams. An n-gram is a sequence of n items, such as words, letters or characters, taken from a given sample of text or speech. The function takes a string of text as input and returns the n-grams for a specified value of n.</p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Here we only need to tokenize the paths of converting journeys and calculate the frequency of each channel.</p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>For path tokenization we use the NGramSplitter function, which splits the input text (here, paths) into "terms" (here, channels) of the selected size (1, meaning each single event) and counts them.</p>
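
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The idea can be sketched in plain Python before running it in Vantage. The snippet below splits a few made-up, comma-delimited paths into 1-grams and computes each channel's share of all tokens. The paths are hypothetical, and the square brackets that wrap real paths are left out for brevity.</p>

```python
from collections import Counter

# Hypothetical converting paths, comma-delimited like the `path` column.
paths = [
    "Facebook,Instagram,Facebook",
    "Paid Search,Facebook",
    "Online Video,Instagram",
]

counts = Counter()
for path in paths:
    # 1-grams: every event in the path is one token.
    counts.update(token.strip() for token in path.split(","))

total = sum(counts.values())
share = {channel: n / total for channel, n in counts.items()}

for channel, n in counts.most_common():
    print(channel, n, round(share[channel], 3))
```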



In [None]:
ngram_df = npath_ConvJour_df[npath_ConvJour_df['event_cnt']<=20]
tdf_grams = NGramSplitter(
               data             = ngram_df
              ,text_column      = 'path'
              #,accumulate       = 'comment_id'
              ,grams            = "1"
              ,overlapping      = True
              ,to_lower_case    = False
              ,delimiter        = ","
              #,punctuation      = '[`~#^&*()-]'
              ,reset = "[]"
              ,total_gram_count = False
            ).result
tdf_grams

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>In the output we can see the n-grams, which here are the channels, and the frequency of each channel in the paths.</p>

In [None]:
freq_df = tdf_grams.select(['ngram','frequency']).groupby('ngram').sum()
tot_freq = freq_df.select(['sum_frequency']).sum()
freq_tot_df = freq_df.merge(right=tot_freq,how="cross" ,on = ["sum_frequency < sum_sum_frequency"])
ngram_freq_df = freq_tot_df.assign(drop_columns=True,
                               channel = freq_tot_df.ngram,
                               frequency = freq_tot_df.sum_frequency,
                               # tot = tot_freq,
                               tp = (1.000 * freq_tot_df.sum_frequency/freq_tot_df.sum_sum_frequency))
ngram_freq_df

<p style = 'font-size:16px;font-family:Arial;'>The table above contains the channel (the n-gram), the frequency of the channel, and the ratio of the channel frequency to the total frequency (tp).</p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Visualizing the results in a vertical bar chart.</p>

In [None]:
import plotly.express as px
freq = ngram_freq_df.to_pandas().reset_index()
fig = px.bar(freq, y="tp", x="channel", 
             color='channel', orientation='v',
             height=600,width=900,
             color_discrete_map = {'Online Display': '#E15759','Online Video': '#76B7B2','Facebook': '#4E79A7','Instagram': '#F28E2B' ,'Paid Search': '#59A14F'},
             title='Attribution Summary')
fig.update_layout(title_text='Frequency Based Attribution Summary', title_x=0.5)
fig.update_xaxes(title='Channel',tickangle=-45)
fig.update_yaxes(title='Attribution Weight')
fig.update_traces(width=0.5)
fig.show()

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The above graph shows the Frequency based Attribution value for each channel using the ngrams. We can see that the Attribution Value for Facebook channel is highest and that for Online Display is lowest.</p>

<hr style="height:2px;border:none;background-color:#00233C;">
<p style = 'font-size:20px;font-family:Arial;color:#00233C'><b>7. ASSOCIATION ANALYSIS (looking for association of channels driving conversion)</b></p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Association analysis can help identify channels that are frequently used in combination within converting journeys.  This information can guide resource allocation and enable marketers to focus on the most effective channel combinations to lift conversion.</p>

<hr style="height:1px;border:none;background-color:#00233C;">
<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>7.1. Prepare data</b></p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Association analysis can help identify channels that are frequently used in combination with other successful channels. This information can guide resource allocation and enable marketers to focus on the most effective channels.</p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>We use the nPath function to identify all cookies whose journeys lead to a conversion, and use this list of cookies as a filter on the original dataset.</p>
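
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The pattern used in the next cell, E*.C, reads as "zero or more non-conversion events followed by a conversion". As a rough analogy only, and not a description of how nPath works internally, the same idea can be tried with a regular expression over a string of symbols. The event flags below are made up.</p>

```python
import re

# Hypothetical ordered events for one cookie: 0 = no conversion, 1 = conversion.
conversion_flags = [0, 0, 1, 0, 1, 0, 0]

# Map each row to a symbol: C for a conversion row, E for any other row.
symbols = "".join("C" if flag == 1 else "E" for flag in conversion_flags)

# Regex analogue of the nPath pattern E*.C, matched without overlaps.
matches = [m.group(0) for m in re.finditer(r"E*C", symbols)]

print(symbols)
print(matches)
```

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The trailing events that never reach a conversion produce no match, which is why only converting journeys are returned.</p>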

In [None]:
npathsession = NPath(data1 = attr_df, 
                      data1_partition_column = ['cookie'], 
                      data1_order_column = ['tmstp'], 
                      mode = 'NONOVERLAPPING', 
                      symbols = ['conversion=\'1\' as C, conversion=\'0\' as E'], 
                      pattern = 'E*.C', 
                      result = ['ACCUMULATE (case when conversion=\'1\' then \'converted\' else channel end OF ANY(C,E)) AS path',
                                  'COUNT (* of ANY(C,E)) as event_cnt',
                                  'FIRST (cookie OF ANY(C,E)) AS cookie'])


convdf = npathsession.result
convdf = convdf[convdf.event_cnt > 1]
convdf

In [None]:
asso2_df = attr_df.merge(right=convdf, how="inner", on = ['cookie'], lsuffix = "t1", rsuffix = "t2")
asso2_df = asso2_df.assign(drop_columns=True
                          ,channel = asso2_df.channel
                          ,cookie = asso2_df.cookie_t1)
asso2_df

<hr style="height:1px;border:none;background-color:#00233C;">
<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>7.2. Compute Association Analysis</b></p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>We calculate the associations with the Association function from the Vantage Analytic Library (VAL). The source data is the original dataset restricted to the converting cookies identified by the nPath function.</p>

In [None]:
convobj = valib.Association(data=asso2_df, group_column="cookie", item_column="channel")

# Print the affinity result. Only affinity result for default combination 11 is produced.
asso_df = convobj.result_11
asso_df

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The output of the association function has the above columns.</p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Item1of2 and item2of2 are the channel for which the association is calculated. The measures are defined as follows:</p>

<ul>
<li style = 'font-size:16px;font-family:Arial;color:#00233C'>Support is the percentage of groups containing the items on the left (left side support), on the right (right side support) or on both sides of a rule (rule support).</li>
<li style = 'font-size:16px;font-family:Arial;color:#00233C'>Confidence is the percentage of groups containing the left side items that also contain the right side items.</li>
<li style = 'font-size:16px;font-family:Arial;color:#00233C'>Lift is a measure of how much the probability that the right side items occur in a group is raised, given that the left side items occur in the group.</li>
<li style = 'font-size:16px;font-family:Arial;color:#00233C'>Z Score is a statistical measure of how much the expected and actual number of groups containing all the items in the rule differ. (Zero means expected and actual are the same.)</li>
</ul>
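
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The first three measures can be reproduced by hand on a toy example. The sketch below uses made-up groups (one set of channels per cookie) and a hypothetical helper, <code>association_measures</code>, with lift taken as confidence divided by right side support. It leaves out the Z Score and is not the VAL implementation.</p>

```python
# Hypothetical converting journeys: the set of channels seen for each cookie.
groups = [
    {"Facebook", "Instagram"},
    {"Facebook", "Instagram", "Paid Search"},
    {"Paid Search", "Online Display"},
    {"Facebook"},
]


def association_measures(left, right, groups):
    """Support, confidence and lift for the rule left -> right.

    Assumes `left` and `right` each occur in at least one group.
    """
    n = len(groups)
    left_support = sum(left in g for g in groups) / n
    right_support = sum(right in g for g in groups) / n
    rule_support = sum(left in g and right in g for g in groups) / n
    confidence = rule_support / left_support  # share of left-side groups that also hold right
    lift = confidence / right_support         # above 1 means the pair co-occurs more than chance
    return {"support": rule_support, "confidence": confidence, "lift": lift}


measures = association_measures("Instagram", "Facebook", groups)
print(measures)
```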

In [None]:
import plotly.graph_objects as go
ConvAsso = asso_df.to_pandas().reset_index()

marker_text = [f"{size}" for size in round(ConvAsso['CONFIDENCE'],2)]
hover_text = [f"Lift: {value}" for value in round(ConvAsso['LIFT'],2)]

fig = go.Figure(data=go.Scatter(
    x=ConvAsso['ITEM1OF2'],
    y=ConvAsso['ITEM2OF2'],
    mode='markers+text',
    text=marker_text,        # Label each marker with its confidence value
    hovertext=hover_text,    # Set the hovertext values (lift)
    hoverinfo='text',        # Only show hovertext on hover
    marker=dict(
        size=ConvAsso['CONFIDENCE'],
        sizemode='area',
        sizeref=0.0004,
        symbol='square',
        color=ConvAsso['LIFT'],
        colorscale='GnBu'
    )))

fig.update_layout(title='Channel Associations in Converting Journeys', title_x=0.5)
fig.show()

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The strongest channel associations within conversion journeys are <b>Instagram</b> + <b>Facebook</b> and <b>Paid Search</b> + <b>Online Display</b>.</p>


<hr style="height:2px;border:none;background-color:#00233C;">
<p style = 'font-size:20px;font-family:Arial;color:#00233C'><b>8. TERM FREQUENCY - INVERSE DOCUMENT FREQUENCY (TF-IDF)</b></p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>TF-IDF is a technique commonly used in natural language processing and text mining tasks to determine the importance of a term within a document or corpus.</p> 

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>TF-IDF measures how relevant a word is to a document within a collection of documents (a corpus).</p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The score increases proportionally to the number of times a word appears in the document, and is offset by how frequently the word appears across the corpus.</p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>It is commonly used to rank word relevance and then to compare text documents.</p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Considering paths (sequences of events) as text, we compute and compare the TF-IDF scores between the two sets of event paths (converting and non-converting). We can then examine the top-ranked terms - in our case, channels - with high TF-IDF scores in each set to identify the channels that are most distinctive or important within each set. This lets us compare channel contribution across Converted and Non-Converted journeys and put the calculated attribution weights in perspective.</p>




<hr style="height:1px;border:none;background-color:#00233C;">
<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>8.1. Prepare Data</b></p>

 <p style = 'font-size:16px;font-family:Arial;color:#00233C'>We will tokenize the paths of both converting and non-converting journeys and keep the output for the next step. As before, we use the NGramSplitter function for path tokenization, which splits the input text (here, paths) into "terms" (grams) of the selected size (1, meaning each single event) and counts them.
</p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'><b> Converting journeys.</b></p>

In [None]:
npath_ConvJour_df = npath_ConvJour_df[npath_ConvJour_df['event_cnt']<=20]
conv_ngrams = NGramSplitter(
               data             = npath_ConvJour_df
              ,text_column      = 'path'
              #,accumulate       = 'comment_id'
              ,grams            = "1"
              ,overlapping      = True
              ,to_lower_case    = False
              ,delimiter        = ","
              #,punctuation      = '[`~#^&*()-]'
              ,reset = "[]"
              ,total_gram_count = False
            ).result
conv_ngrams

<p style = 'font-size:16px;font-family:Arial;color:#00233C'><b> Non-Converting journeys.</b></p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>As with the converting journeys, we also use NGramSplitter on the non-converting journeys.</p>

In [None]:
nconv_ngrams = NGramSplitter(
               data             = npath_NConvJour_df
              ,text_column      = 'path'
              #,accumulate       = 'comment_id'
              ,grams            = "1"
              ,overlapping      = True
              ,to_lower_case    = False
              ,delimiter        = ","
              #,punctuation      = '[`~#^&*()-]'
              ,reset = "[]"
              ,total_gram_count = False
            ).result
nconv_ngrams

<hr style="height:1px;border:none;background-color:#00233C;">
<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>8.2. Compute TF-IDF scores</b></p>

 <p style = 'font-size:16px;font-family:Arial;color:#00233C'>We will calculate the TF-IDF scores for each term in the two term-document sets (Converting and Non-Converting). TF-IDF is computed by multiplying the term frequency (TF) of a term in a document by the logarithm of the inverse document frequency (IDF) across the collection of documents; the cells below use the base-10 logarithm. The TF component measures the importance of a term within an individual event path, while the IDF component captures the rarity or distinctiveness of a term across the entire set of event paths.
</p>
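
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>As a small worked example of the formula, the sketch below computes TF-IDF per channel on a handful of made-up journeys and sums the scores by channel. It uses the textbook definition of document frequency (the number of journeys containing the channel) and a base-10 logarithm, so its counts may differ in detail from the teradataml cells in this section.</p>

```python
import math
from collections import Counter

# Hypothetical journeys, each already tokenized into channels.
documents = [
    ["Facebook", "Instagram", "Facebook"],
    ["Paid Search", "Facebook"],
    ["Online Video", "Instagram"],
    ["Online Video"],
]

n_docs = len(documents)
# Document frequency: number of journeys in which each channel appears at least once.
doc_freq = Counter(channel for doc in documents for channel in set(doc))

tfidf_total = Counter()
for doc in documents:
    for channel, count in Counter(doc).items():
        tf = count / len(doc)                         # share of the journey's events
        idf = math.log10(n_docs / doc_freq[channel])  # rarer channels score higher
        tfidf_total[channel] += tf * idf              # summed per channel across journeys

for channel, score in sorted(tfidf_total.items(), key=lambda kv: kv[1], reverse=True):
    print(channel, round(score, 4))
```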

<p style = 'font-size:16px;font-family:Arial;color:#00233C'><b> Converting journeys.</b></p>

In [None]:
tfconv = conv_ngrams.assign(drop_columns = True 
                            ,ngram = conv_ngrams.ngram
                            ,cookie = conv_ngrams.cookie
                            ,tf = 1.00000 * conv_ngrams.frequency / conv_ngrams.event_cnt)
tfconv

In [None]:
cnt_cookie = conv_ngrams.select(['cookie']).count()
ngram_cnt = conv_ngrams.select(['ngram','cookie']).groupby('ngram').count()
idfconv = ngram_cnt.merge(right=cnt_cookie, how = "inner" , on = [ngram_cnt.count_cookie < cnt_cookie.count_cookie], lsuffix = "t1", rsuffix = "t2")

In [None]:
from sqlalchemy import func
idfconv = idfconv.assign(drop_columns = True
                         , ngram = idfconv.ngram
                         ,idf = (idfconv.count_cookie_t2/idfconv.count_cookie_t1).log10())
idfconv

In [None]:
tfidfconv = idfconv.merge(right=tfconv , how = "inner" , on = "ngram" , lsuffix = "t1", rsuffix = "t2")
tfidfconv = tfidfconv.assign(drop_columns = True
                            ,ngram = tfidfconv.ngram_t1
                            ,tfidf = tfidfconv.tf* tfidfconv.idf)
tfidfconv = tfidfconv.groupby('ngram').sum()
tfidfconv

<p style = 'font-size:16px;font-family:Arial;color:#00233C'><b> Non-Converting journeys.</b></p>

In [None]:
tfnconv = nconv_ngrams.assign(drop_columns = True 
                            ,ngram = nconv_ngrams.ngram
                            ,cookie = nconv_ngrams.cookie
                            ,tf = 1.00000 * nconv_ngrams.frequency / nconv_ngrams.event_cnt)
tfnconv

In [None]:
cnt_ncookie = nconv_ngrams.select(['cookie']).count()
ngram_ncnt = nconv_ngrams.select(['ngram','cookie']).groupby('ngram').count()
idfnconv = ngram_ncnt.merge(right=cnt_ncookie, how = "inner" , on = [ngram_ncnt.count_cookie < cnt_ncookie.count_cookie], lsuffix = "t1", rsuffix = "t2")

In [None]:
from sqlalchemy import func
idfnconv = idfnconv.assign(drop_columns = True
                         , ngram = idfnconv.ngram
                         ,idf = (idfnconv.count_cookie_t2/idfnconv.count_cookie_t1).log10())
idfnconv

In [None]:
tfidfnconv = idfnconv.merge(right=tfnconv , how = "inner" , on = "ngram" , lsuffix = "t1", rsuffix = "t2")
tfidfnconv = tfidfnconv.assign(drop_columns = True
                            ,ngram = tfidfnconv.ngram_t1
                            ,tfidf = tfidfnconv.tf* tfidfnconv.idf)
tfidfnconv = tfidfnconv.groupby('ngram').sum()
tfidfnconv

<hr style="height:1px;border:none;background-color:#00233C;">
<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>8.3. Rank and Compare</b></p>

 <p style = 'font-size:16px;font-family:Arial;color:#00233C'>We will rank the TF-IDF scores of the channels in both Converting and Non-Converting journeys, then combine the two rankings into one table.
</p>

In [None]:
window_func_conv= tfidfconv.sum_tfidf.window(order_columns="sum_tfidf")
window_func_nonconv= tfidfnconv.sum_tfidf.window(order_columns="sum_tfidf")

In [None]:
rank_conv = tfidfconv.assign(rank_conv=window_func_conv.rank())
rank_nonconv = tfidfnconv.assign(rank_nonconv=window_func_nonconv.rank())
rank_com = rank_conv.merge(right=rank_nonconv ,how = "inner", on = "ngram", lsuffix = "t1", rsuffix="t2")


In [None]:
rank_com = rank_com.assign(drop_columns = True
                           ,channel = rank_com.ngram_t1
                           ,converted_rank = rank_com.rank_conv
                           ,nonconverted_rank = rank_com.rank_nonconv)
rank_com

 <p style = 'font-size:16px;font-family:Arial;color:#00233C'>We will create a Slope Chart to compare the channel significance ranking in both Converting and Non-Converting journeys.
</p>

In [None]:
import matplotlib.pyplot as plt
df = rank_com.to_pandas()
# Sort DataFrame by channel
df.sort_values(by='channel', inplace=True)

# Create figure and axis
fig, ax = plt.subplots()

# Set x and y values for the slope chart
x = [0, 1]
channels = df['channel']
y_conv = df['converted_rank']
y_nconv = df['nonconverted_rank']

# Define custom colors for each channel
color_mapping = {
    'Instagram': '#F28E2B',
    'Facebook': '#4E79A7',
    'Online Display': '#E15759',
    'Online Video': '#76B7B2',
    'Paid Search': '#59A14F',
    # Add more channels and corresponding colors as needed
}

# Plot the slope chart with assigned colors
for channel, conv, nconv in zip(channels, y_conv, y_nconv):
    color = color_mapping.get(channel, 'black')  # Default color if channel not found in the mapping
    ax.plot(x, [conv, nconv], marker='o', markersize=10, color=color, label='_nolegend_')
    ax.text(-0.1, conv, channel, ha='right', va='center', fontsize=8, color='black')
    ax.text(1.05, nconv, channel, ha='left', va='center', fontsize=8, color='black')

# Set x-axis ticks and labels
ax.set_xticks(x)
ax.set_xticklabels(['CONVERTING', 'NON CONVERTING'])

# Set y-axis label
ax.set_ylabel('Rank')

# Set title
ax.set_title('Comparing Channels in Converting and Non-Converting Paths',loc='center', pad=30)

# Remove spines (borders) of the plot
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
ax.spines['left'].set_visible(False)
ax.spines['bottom'].set_visible(False)

# Hide y-axis tick marks and keep x-axis ticks at the bottom
ax.yaxis.set_ticks_position('none')
ax.xaxis.set_ticks_position('bottom')

# Set the limits of the x-axis
ax.set_xlim(-0.4, 1.2)

# Format y-axis tick labels to remove decimal values with .5 and invert the scale
ax.yaxis.set_major_locator(plt.MaxNLocator(integer=True))
ax.invert_yaxis()

# Display the plot
plt.show()

<p style = 'font-size:16px;font-family:Arial;color:#00233C'><b>Online Video</b> and <b>Facebook</b> are slightly more significant in Converting journeys, while <b>Paid Search</b> is clearly more distinctive of Non-Converting journeys.
</p>

<hr style="height:2px;border:none;background-color:#00233C;">
<a id="ml"></a>
<p style = 'font-size:20px;font-family:Arial;color:#00233C'><b>9. MACHINE LEARNING BASED MODELS</b></p>


<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Machine Learning based models allow us to switch from rule-based/heuristic methods to probabilistic ones, moving further up the maturity scale. With a data-driven algorithmic approach, attribution outputs are based on the data and on how that data is modelled.</p>

<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>NAIVE BAYES</b></p>


<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Naive Bayes is a machine learning algorithm commonly used for classification tasks, including text classification, spam filtering, and sentiment analysis. While it is not typically used to directly compute marketing attribution, it can be employed as part of a broader marketing attribution framework.</p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>We will use Naive Bayes for binary text classification of paths into two categories: converted and non-converted. Once the Naive Bayes classifier is trained, it can be used to estimate the probability that a specific marketing touchpoint contributed to an outcome.</p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>By evaluating the likelihood of the observed features associated with conversion, the algorithm can provide a probability score representing the attribution weight.</p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>To run a Naive Bayes classification model, we can use Vantage's native Naive Bayes text classifier trainer function, together with some in-database data preparation.</p>


<p style = 'font-size:16px;font-family:Arial;color:#00233C'><b>Prepare Data</b></p>

 <p style = 'font-size:16px;font-family:Arial;color:#00233C'>Tokenize the paths for both converting and non-converting journeys and save the output into a table. We use the NGramSplitter function for path tokenization. It splits the input text (here, the paths) into "terms" (grams) of the selected size and counts them. We use a size of 1, so each gram is a single event.</p>
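
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>As a rough illustration of what single-event tokenization produces, the sketch below splits one comma-delimited path in plain Python and counts each channel. The <code>split_unigrams</code> helper and the example path are hypothetical. This is not how NGramSplitter is implemented, and the real function returns additional columns.</p>

```python
from collections import Counter

def split_unigrams(cookie, path, delimiter=","):
    """Split one journey path into single-event tokens and count each token."""
    tokens = [t.strip() for t in path.split(delimiter) if t.strip()]
    counts = Counter(tokens)
    # One row per (cookie, ngram), loosely mirroring the columns kept in the next cells
    return [{"cookie": cookie, "ngram": ngram, "frequency": freq}
            for ngram, freq in counts.items()]

# Hypothetical journey with a repeated channel
rows = split_unigrams("c1", "Facebook,Paid Search,Facebook")
```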
 

In [None]:
conv_ngrams_df = NGramSplitter(
               data             = npath_ConvJour_df
              ,text_column      = 'path'
              #,accumulate       = 'comment_id'
              ,grams            = "1"
              ,overlapping      = False
              ,to_lower_case    = False
              ,delimiter        = ","
              #,punctuation      = '[`~#^&*()-]'
              ,reset = "[]"
              ,total_gram_count = False
              ,accumulate = 'cookie'
            ).result


In [None]:
conv_ngrams_df = conv_ngrams_df.assign(drop_columns=True
                                       ,cookie = conv_ngrams_df.cookie
                                       ,ngram = conv_ngrams_df.ngram
                                       ,distcnt = '1' 
                                       ,totcnt = conv_ngrams_df.frequency
                                       ,conv = '1')
conv_ngrams_df

In [None]:
nonconv_ngrams_df = NGramSplitter(
               data             = npath_NConvJour_df
              ,text_column      = 'path'
              #,accumulate       = 'comment_id'
              ,grams            = "1"
              ,overlapping      = True
              ,to_lower_case    = False
              ,delimiter        = ","
              #,punctuation      = '[`~#^&*()-]'
              ,reset = "[]"
              ,total_gram_count = False
              ,accumulate = 'cookie'
            ).result


In [None]:
nonconv_ngrams_df = nonconv_ngrams_df.assign(drop_columns=True
                                       ,cookie = nonconv_ngrams_df.cookie
                                       ,ngram = nonconv_ngrams_df.ngram
                                       ,distcnt = '1' 
                                       ,totcnt = nonconv_ngrams_df.frequency
                                       ,conv = '0')
nonconv_ngrams_df

In [None]:
allngrams = conv_ngrams_df.concat(nonconv_ngrams_df)
allngrams

<p style = 'font-size:16px;font-family:Arial;color:#00233C'><b>Run Naive Bayes Text Classifier model</b></p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The TD_NaiveBayesTextClassifierTrainer function calculates the conditional probabilities for token-category pairs, the prior probabilities, and the missing token probabilities for all categories. The trainer function builds the model from these probability values. The predict function, which is not used here, would use them to classify paths into categories.</p>
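
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>To make the idea of token-category conditional probabilities concrete, here is a minimal textbook sketch of a Bernoulli Naive Bayes estimate in plain Python. The toy rows, the <code>bernoulli_token_probs</code> helper, and the additive smoothing constant <code>alpha</code> are assumptions made for illustration. The exact formulas and smoothing used inside the Vantage function are not stated in this notebook and may differ.</p>

```python
from collections import defaultdict

# Hypothetical training rows: (cookie, channel token, category), '1' = converted
rows = [
    ("c1", "Facebook", "1"), ("c1", "Paid Search", "1"),
    ("c2", "Facebook", "1"),
    ("c3", "Paid Search", "0"),
    ("c4", "Paid Search", "0"), ("c4", "Instagram", "0"),
]

def bernoulli_token_probs(rows, alpha=1.0):
    """Estimate P(token present | category) with additive smoothing, plus category priors."""
    docs_in_cat = defaultdict(set)      # category -> set of document ids
    docs_with_token = defaultdict(set)  # (token, category) -> set of document ids
    vocab = set()
    for doc_id, token, cat in rows:
        docs_in_cat[cat].add(doc_id)
        docs_with_token[(token, cat)].add(doc_id)
        vocab.add(token)
    probs = {}
    for cat, docs in docs_in_cat.items():
        for token in vocab:
            present = len(docs_with_token.get((token, cat), ()))
            # Smoothed share of the category's documents that contain the token
            probs[(token, cat)] = (present + alpha) / (len(docs) + 2 * alpha)
    total_docs = sum(len(d) for d in docs_in_cat.values())
    priors = {cat: len(d) / total_docs for cat, d in docs_in_cat.items()}
    return probs, priors

probs, priors = bernoulli_token_probs(rows)
```

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>In this toy data Facebook appears in both converted journeys and in no non-converted journey, so its smoothed probability is higher in the converted category.</p>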

In [None]:
copy_to_sql(allngrams, table_name ='allngrams', if_exists = 'replace')

In [None]:
#Drop table if exists
qry = 'DROP TABLE NBOUTPUT;'
try:
    execute_sql(qry)
except Exception as e:
    # Error 3807 means the table does not exist, which is fine here; re-raise anything else
    if '3807' not in str(e.args):
        raise

# Run Naive Bayes Text Classifier and output the result in a table  
qry = '''
CREATE MULTISET TABLE NBOUTPUT AS
(
  SELECT token,category, prob as channel_prob FROM TD_NaiveBayesTextClassifierTrainer (
   ON allngrams AS InputTable
   USING
   TokenColumn ('ngram')
   DocCategoryColumn ('conv')
   DocIDColumn ('cookie')
   ModelType ('Bernoulli')
) AS dt)
WITH DATA;
'''

# Execute the query
execute_sql(qry)

In [None]:
df1= DataFrame('NBOUTPUT')
df1

<p style = 'font-size:16px;font-family:Arial;color:#00233C'><b>Derive Attribution Weights and visualize</b></p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The output of the Naive Bayes Text Classifier contains: 
    <li style = 'font-size:16px;font-family:Arial;color:#00233C'>token: The classified training tokens (channels from tokenized paths).</li>
    <li style = 'font-size:16px;font-family:Arial;color:#00233C'>category: The category of the token (converted, non-converted).</li>
    <li style = 'font-size:16px;font-family:Arial;color:#00233C'>prob: The probability of the token in the category (selected as channel_prob in the query above).</li>
</p>    
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>This output probability is used to calculate the attribution of the channels.</p>
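
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The calculation amounts to a normalization: each channel's probability in the converted category is divided by the sum over all channels, so that the weights add up to 1. A minimal plain-Python sketch follows. The <code>normalize_attribution</code> helper and the example values are hypothetical and are not results from this notebook.</p>

```python
def normalize_attribution(channel_probs):
    """Scale per-channel probabilities so that the attribution weights sum to 1."""
    total = sum(channel_probs.values())
    if total == 0:
        raise ValueError("channel probabilities sum to zero")
    return {channel: p / total for channel, p in channel_probs.items()}

# Hypothetical converted-category probabilities for three channels
example_probs = {"Facebook": 0.1, "Paid Search": 0.3, "Instagram": 0.4}
weights = normalize_attribution(example_probs)
```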

In [None]:
nboutput_df = df1[df1.category == '1'] 
nboutput_df =  nboutput_df[nboutput_df.token.isin(['Online Display', 'Online Video', 'Facebook','Instagram','Paid Search'])]
tot_attr = nboutput_df.select('channel_prob').sum()


In [None]:
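# Attach the total probability to every channel row; the inequality condition stands in for a cross join
nboutput_df = nboutput_df.merge(right=tot_attr, how = "inner", on = [nboutput_df.channel_prob < tot_attr.sum_channel_prob], lsuffix = "t1", rsuffix = "t2")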
nboutput_df = nboutput_df.assign(drop_columns=True
                                ,channel = nboutput_df.token
                                ,nb_attribution=nboutput_df.channel_prob/nboutput_df.sum_channel_prob)
nboutput_df

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Visualizing the results in a vertical bar chart.</p>

In [None]:
import plotly.express as px
nbattribution = nboutput_df.to_pandas()
fig = px.bar(nbattribution, y="nb_attribution", x="channel", 
             color='channel', orientation='v',
             height=600,width=900,
             color_discrete_map = {'Online Display': '#E15759','Online Video': '#76B7B2','Facebook': '#4E79A7','Instagram': '#F28E2B' ,'Paid Search': '#59A14F'},
             title='Attribution Summary')
fig.update_layout(title_text='Naive Bayes Model Attribution Summary', title_x=0.5)
fig.update_xaxes(title='Channel',tickangle=-45)
fig.update_yaxes(title='Attribution Weight')
fig.update_traces(width=0.5)
fig.show()

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The graph above shows the attribution weights from the Naive Bayes model. The weight for the Facebook channel is the highest and the weight for Online Display is the lowest.</p>

<hr style="height:2px;border:none;background-color:#00233C;">
<p style = 'font-size:20px;font-family:Arial;color:#00233C'><b>10. MULTITOUCH ATTRIBUTION MODELS SUMMARY</b></p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>To compare the attribution results of all models in a single chart, we first join the per-model results into one DataFrame keyed by channel.</p>

In [None]:
summ1_df = nboutput_df.merge(right=attr_channel_total , how = "inner" , on = ["Channel = channel"], lsuffix = "t1", rsuffix = "t2")
allsumm_df = summ1_df.merge(right=ngram_freq_df , how = "inner" , on = ["channel_t1=channel"], lsuffix = "t3", rsuffix = "t4")
# summ4_df = summ1_df.merge(right=summ2_df , how = "inner" , on = ["channel = t1_channel"], lsuffix = "s1", rsuffix = "s2")


In [None]:
allsumm_df = allsumm_df.assign(drop_columns=True,
                        CHANNEL = allsumm_df.channel,
                        Uniform = allsumm_df.tot_uni_attr,
                        FirstClick = allsumm_df.tot_fc_attr,
                        LastClick = allsumm_df.tot_lc_attr,
                        Exponential = allsumm_df.tot_exp_attr,
                        NaiveBayes = allsumm_df.nb_attribution,
                        frequency = allsumm_df.tp)
allsumm_df
                        
                        

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>With the results combined, we plot a grouped bar chart that shows the attribution from each model side by side for every channel.</p>

In [None]:
import plotly.graph_objects as go
summary_plot = allsumm_df.to_pandas()
fig = go.Figure(
    data=[
        go.Bar(name='Uniform', x=summary_plot["CHANNEL"], y=summary_plot["Uniform"], yaxis='y', offsetgroup=1,marker_color='#76B7B2'),
        go.Bar(name='First Click', x=summary_plot["CHANNEL"], y=summary_plot["FirstClick"], yaxis='y', offsetgroup=2, marker_color='#F28E2B'),
        go.Bar(name='Last Click', x=summary_plot["CHANNEL"], y=summary_plot["LastClick"], yaxis='y', offsetgroup=3,marker_color='#E15759'),
        go.Bar(name='Exponential', x=summary_plot["CHANNEL"], y=summary_plot["Exponential"], yaxis='y', offsetgroup=4,marker_color='#4E79A7'),
        go.Bar(name='Naive Bayes', x=summary_plot["CHANNEL"], y=summary_plot["NaiveBayes"], yaxis='y', offsetgroup=7,marker_color='#EDC948'),
        go.Bar(name='Frequency', x=summary_plot["CHANNEL"], y=summary_plot["frequency"], yaxis='y', offsetgroup=9,marker_color='#B07AA1')
    ],
    layout={
        'yaxis': {'title': 'Attribution '},

    }
)
 
# Change the bar mode
fig.update_layout(barmode='group')
fig.show()

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Statistical (Simple Frequency, Association and Term Frequency) and algorithmic (such as Naive Bayes) models tend to produce slightly different attribution scores from the rule-based models.</p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The bar chart above shows how many conversions were attributed to each channel for each model. By comparing the statistical/ML-based models with the other methods in the graph, you can gain insight into the relative importance of different marketing channels. For the first touch, last touch and linear touch models, Facebook and Paid Search are the most important channels driving conversions, while Instagram and Online Display are the least important. However, according to the statistical/ML-based models, Instagram is far more important to our conversions than our simple attribution models suggest. Indeed, according to the probabilistic model, it is in fact our third most important channel. Also, according to the Associations and Naive Bayes models, Online Video appears less important than the other models suggest.</p>

<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>Conclusion</b></p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>We have seen that Teradata Vantage supports a variety of attribution modeling approaches, including rule-based, statistical, and algorithmic attribution. Vantage has unique analytic capabilities for understanding customer and user behavior over time. Thus, implementing an effective marketing attribution model using Teradata Vantage can significantly enhance decision-making and optimize marketing strategies.</p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Also, with the help of ClearScape Analytics, we can use powerful, flexible attribution analysis, text processing, and statistical analytic techniques that can be applied to millions or billions of customer touchpoints. These results can be combined with other analytics to create more accurate models. These models can be deployed operationally to understand and predict actions in real time.</p>


<hr style="height:2px;border:none;background-color:#00233C;">
<p style = 'font-size:20px;font-family:Arial;color:#00233C'><b>12. Cleanup</b></p>
<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>Work Tables</b></p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>We need to clean up our work tables to prevent errors next time.</p>

In [None]:
tables = ['ALLNGRAMS','ATTRIBUTION_CONVERSION','ATTRIBUTION_MODEL_UNIFORM','ATTRIBUTION_MODEL_FIRSTCLICK',
          'ATTRIBUTION_MODEL_LASTCLICK','ATTRIBUTION_MODEL_EXPONENTIAL','NBOUTPUT','journey_data']

# Loop through the list of tables and execute the drop table command for each table
for table in tables:
    try:
        db_drop_table(table_name = table)
    except Exception:
        # Ignore tables that cannot be dropped, for example because they were never created
        pass


<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>Databases and Tables</b></p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>We will use the following code to clean up tables and databases created for this demonstration.</p>

In [None]:
%run -i ../run_procedure.py "call remove_data('DEMO_MultiTouchAttribution');" 
#Takes 40 seconds

In [None]:
remove_context()

<hr style="height:2px;border:none;background-color:#00233C;">

<b style = 'font-size:20px;font-family:Arial;color:#00233C'>Required Materials</b>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Let’s look at the elements we have available for reference for this notebook:</p>
<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>Filters:</b></p> 
<li style = 'font-size:16px;font-family:Arial;color:#00233C'><b>Industry:</b> Retail</li>
<li style = 'font-size:16px;font-family:Arial;color:#00233C'><b>Functionality:</b> Path Analytics</li>
<li style = 'font-size:16px;font-family:Arial;color:#00233C'><b>Use Case:</b> Digital Customer Conversion</li>
<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>Related Resources:</b></p>
<li style = 'font-size:16px;font-family:Arial;color:#00233C'><a href = 'https://teradata.seismic.com/Link/Content/DCGBP9J9gjD288TPcG3HFgXDHDW8'>Broken Digital Journeys CX Solution Accelerator Demo via Python Video - External - SP004183</a></li>
<li style = 'font-size:16px;font-family:Arial;color:#00233C'><a href = 'https://www.teradata.com/Blogs/Customer-360-Analytics-What-Lies-Ahead'>Customer 360 Analytics, What Lies Ahead?</a></li>
<li style = 'font-size:16px;font-family:Arial;color:#00233C'><a href = 'https://www.teradata.com/Trends/Data-Analytics#:~:text=Data%20Analytics-,Royal%20Bank%20of%20Canada%20Deepens%20the%20Customer%20Experience,-Data%20Analytics'>Royal Bank of Canada Deepens the Customer Experience</a></li>


<footer style="padding-bottom:35px; background:#f9f9f9; border-bottom:3px solid #00233C">
    <div style="float:left;margin-top:14px">ClearScape Analytics™</div>
    <div style="float:right;">
        <div style="float:left; margin-top:14px">
            © 2023, 2024 Teradata. All rights reserved.
        </div>
    </div>
</footer>