<header>
   <p  style='font-size:36px;font-family:Arial; color:#F0F0F0; background-color: #00233c; padding-left: 20pt; padding-top: 20pt;padding-bottom: 10pt; padding-right: 20pt;'>
       Hyper Personalization
  <br>
       <img id="teradata-logo" src="https://storage.googleapis.com/clearscape_analytics_demo_data/DEMO_Logo/teradata.svg" alt="Teradata" style="width: 125px; height: auto; margin-top: 20pt;">
    </p>
</header>

<p style = 'font-size:20px;font-family:Arial;color:#00233C'><b>Introduction</b></p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>
Customer experience (CX) refers to the way in which customers perceive an organization's ability to meet their expectations during every interaction. In today's rapidly evolving technological landscape, companies such as Amazon and Uber have revolutionized CX and raised the standards significantly. As a result, customers now have heightened expectations, seeking quicker, smoother, and highly personalized experiences. These factors play a crucial role in determining whether a consumer will remain loyal to a brand or opt to switch to a competitor.<br>
As an increasing number of individuals transition to digital channels, the significance of providing outstanding customer experiences has grown immensely. In a crowded digital landscape, businesses must strive to deliver exceptional CX in order to distinguish themselves. This necessitates a fundamental reassessment of how organizations can utilize data-driven approaches to create hyper-personalized experiences for individuals in digital channels. Placing the customer at the forefront of every decision is crucial in this endeavor.<br>
Delivering exceptional customer experiences necessitates a significant shift in capabilities, moving away from a focus on product offers and tailored offers for customer segments. Instead, the goal is to provide highly personalized experiences to individual customers based on the specific context of their current interaction, in real time. This shift emphasizes the importance of understanding each customer's unique needs and preferences, enabling organizations to deliver tailored experiences that truly resonate with individuals on a one-to-one basis.<br>To achieve hyper-personalization, a thorough comprehension of customers and anticipation of their needs are essential. This necessitates the use of advanced analytics to scientifically optimize offers in real time. Hyper-personalization entails a paradigm shift from managing a limited number of campaigns per year to tailoring millions, or even billions, of interactions. Accomplishing this feat will rely on effectively implementing AI/ML models on a large scale.<br>
</p>
<blockquote style='font-size:20px;'>It is easy to reach 1 billion decisions from just 1 million customers when there are 200 possible messages across 5 different channels.</blockquote>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Customers are dynamic, with needs and interests that change with factors such as time of day, location, and weather. Hyper-personalization means responding swiftly to these changing needs by delivering personalized content in real time. Organizations that excel at it take each individual's behavior and real-time context into account to anticipate and meet their specific requirements.<br>This requires a shift from a single generic model to a personalized model for each individual, which creates challenges in data management, scalability, and deployment. To tackle these challenges and achieve hyper-personalization, businesses must establish three key capabilities:</p>
<ol style = 'font-size:16px;font-family:Arial;color:#00233C'>
<li>Listen to customers with continuous capture of data signals for a comprehensive customer 360.</li> 
<li>Decide the Next Best Offer (NBO) for each individual in real time by developing and deploying millions of personalized models.</li> 
<li>Activate these NBOs by integrating with the Martech stack to deliver personalized interactions. With Vantage and ClearScape Analytics, we enable our customers to Listen, Decide, and Activate seamlessly.</li>
        </ol>
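<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The 1 billion figure quoted above is simple multiplication. The short check below uses only the three numbers from the quote and assumes one decision per customer, message, and channel combination.</p>

```python
# Illustrative arithmetic only; the three figures come from the quote above.
customers = 1_000_000
messages = 200
channels = 5

# One decision per (customer, message, channel) combination
decisions = customers * messages * channels
print(f"{decisions:,} possible decisions")
```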
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>        
Let's demo this use case with sample telco data for customer churn prediction.
</p>

<hr style="height:2px;border:none;background-color:#00233C;">
<b style = 'font-size:20px;font-family:Arial;color:#00233C'>1. Connect to Vantage</b>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>We import the required libraries, set environment variables and environment paths (if required).</p>

In [None]:
# Standard Libraries
import matplotlib.pyplot as plt
import matplotlib.ticker as mtick
import pandas as pd
import numpy as np
import getpass

# Teradata Libraries
from teradataml import *
import teradataml as tdml
from teradataml.dataframe.sql_functions import case

# Data Visualization Libraries
import plotly.express as px
from plotly.subplots import make_subplots
import plotly.graph_objects as go
import seaborn as sns

# Miscellaneous Libraries
from IPython.display import display, Markdown
from sqlalchemy import func

# Machine Learning Metrics
from sklearn.metrics import mean_absolute_error, roc_auc_score, roc_curve

# Configuration
tdml.display.max_rows = 5
configure.val_install_location = 'val'

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>You will be prompted to provide the password. Enter your password, press the Enter key, and then use the down arrow to go to the next cell.</p>

In [None]:
%run -i ../startup.ipynb
eng = create_context(host = 'host.docker.internal', username = 'demo_user', password = password)
print(eng)

In [None]:
%%capture
execute_sql('''SET query_band='DEMO=Hyper_Personalization_PY_SQL.ipynb;' UPDATE FOR SESSION;''')

<p style = 'font-size:18px;font-family:Arial;color:#00233C'> <b>Getting Data for This Demo</b></p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>We have provided data for this demo on cloud storage. You have the option of either running the demo using foreign tables to access the data without using any storage on your environment or downloading the data to local storage which may yield somewhat faster execution, but there could be considerations of available storage. There are two statements in the following cell, and one is commented out. You may switch which mode you choose by changing the comment string.</p>

In [None]:
# %run -i ../run_procedure.py "call get_data('DEMO_Telco_cloud');"    # Takes 1 minute
%run -i ../run_procedure.py "call get_data('DEMO_Telco_local');"    # Takes 2 minutes

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Next is an optional step: run it if you want to see the status of the databases and tables created and the space used.</p>

In [None]:
%run -i ../run_procedure.py "call space_report();"

<hr style="height:2px;border:none;background-color:#00233C;">
<p style = 'font-size:20px;font-family:Arial;color:#00233C'><b>2. Data Exploration</b></p>

<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>Customer Churn</b></p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Create a "Virtual DataFrame" that points to the data set in Vantage. We will then check the shape of the DataFrame as well as the data types of all its columns.</p>

In [None]:
tdf = DataFrame(in_schema("DEMO_Telco", "Customer_Churn"))
tdf

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Check the shape and column data types of the data.</p>

In [None]:
print("Shape of the data: ", tdf.shape)
tdf.info()

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>As the shape and info methods show, our dataset has 7043 rows and 21 columns.</p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'><b>Summary of columns</b></p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The <b>ColumnSummary</b> function can be used to take a quick look at the columns, their data types, and a summary of NULLs/non-NULLs for a given table.</p>

In [None]:
obj = ColumnSummary(
                    data = tdf,
                    target_columns = [':']
)

obj.result

<hr style="height:2px;border:none;background-color:#00233C;">
<p style = 'font-size:20px;font-family:Arial;color:#00233C'><b>3. Exploratory Data Analysis</b></p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>
Exploratory Data Analysis (EDA) is the process of visually and statistically examining and summarizing data to understand its characteristics, patterns, and relationships.<br>As Vantage is not a visualization tool, we will use Python libraries for visualization. This also shows how seamlessly Teradata Vantage works with Python.</p>

In [None]:
# Converting teradata dataframe to pandas dataframe for visualization
df = tdf.to_pandas().reset_index()
df.head()

In [None]:
# CustomerID is unique for each row, so it carries no predictive signal
df = df.drop(['CustomerID'], axis = 1 )
# New column: band customers by monthly charges ('High' if MonthlyCharges >= 60, else 'Low')
df['MonthlyIncomeBand'] = np.where(df['MonthlyCharges'] >= 60, 'High', 'Low')
df.head()

In [None]:
# Gender and Churn percentage distribution
# Take the labels from the value_counts index so labels and values always line up
gender_counts = df['Gender'].value_counts()
churn_counts = df['Churn'].value_counts()
g_labels = gender_counts.index.tolist()
c_labels = churn_counts.index.tolist()
# Create subplots: use 'domain' type for Pie subplot
fig = make_subplots(rows = 1, cols = 2, specs = [[{'type':'domain'}, {'type':'domain'}]])
fig.add_trace(go.Pie(labels = g_labels, values = gender_counts.values, name = "Gender"),
              1, 1)
fig.add_trace(go.Pie(labels = c_labels, values = churn_counts.values, name = "Churn"),
              1, 2)

# Use `hole` to create a donut-like pie chart
fig.update_traces(hole = .4, hoverinfo = "label+percent+name", textfont_size = 16)

fig.update_layout(
    title_text = "Gender and Churn Distributions",
    # Add annotations in the center of the donut pies.
    annotations = [dict(text = 'Gender', x = 0.16, y = 0.5, font_size = 20, showarrow = False),
                 dict(text = 'Churn', x = 0.84, y = 0.5, font_size = 20, showarrow = False)])
fig.show()

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>26.6 % of customers switched to another firm.<br>
Customers are 49.5 % female and 50.5 % male.</p>

In [None]:
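# Churn per Gender
plt.figure(figsize = (6, 6))
# The counts below are hard-coded rather than computed from df
labels = ["Churn: Yes", "Churn: No"]
values = [1869, 5163]
# Inner ring: female and male counts within each churn group, in the same order as `values`
labels_gender = ["F", "M", "F", "M"]
sizes_gender = [939, 930, 2544, 2619]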
colors = ['#ff6666', '#66b3ff']
colors_gender = ['#c2c2f0', '#ffb3e6', '#c2c2f0', '#ffb3e6']
explode = (0.3, 0.3) 
explode_gender = (0.1, 0.1, 0.1, 0.1)
textprops = {"fontsize": 15}
# Plot
plt.pie(values, labels = labels, autopct = '%1.1f%%', pctdistance = 1.08,
        labeldistance = 0.8, colors = colors, startangle = 90,frame = True,
        explode = explode, radius = 10, textprops = textprops, counterclock = True)
plt.pie(sizes_gender, labels = labels_gender, colors = colors_gender, startangle = 90,
        explode = explode_gender, radius = 7, textprops = textprops, counterclock = True)

# Draw circle
centre_circle = plt.Circle((0, 0), 5, color = 'black', fc = 'white', linewidth = 0)
fig = plt.gcf()
fig.gca().add_artist(centre_circle)

plt.title('Churn Distribution w.r.t Gender: Male(M), Female(F)', fontsize = 15, y = 1.1)

# Show plot

plt.axis('equal')
plt.tight_layout()
plt.show()

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The difference between genders in the share and count of customers who changed service provider is negligible. Both genders behaved similarly when it comes to migrating to another provider.</p>

In [None]:
fig = px.histogram(df, x = "Churn", color = "Contract", barmode = "group", title = "<b>Customer contract distribution</b>")
fig.update_layout(width = 700, height = 500, bargap = 0.1)
fig.show()

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>About 75% of customers with a month-to-month contract churned, compared with 13% of customers with a one-year contract and 3% with a two-year contract.</p>
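<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Percentages like these are churn rates within each contract type: churned customers divided by all customers holding that contract. Here is a minimal plain-Python sketch of that calculation. The records are made-up toy data, not the demo table, and <code>churn_rate_by_group</code> is a hypothetical helper introduced only for illustration. On the notebook's pandas <code>df</code>, <code>pd.crosstab(df['Contract'], df['Churn'], normalize = 'index')</code> produces the same kind of row-wise shares.</p>

```python
from collections import Counter

# Hypothetical toy records of (contract, churn); not the demo data
records = [
    ("Month-to-month", "Yes"), ("Month-to-month", "No"),
    ("Month-to-month", "Yes"), ("One year", "No"),
    ("One year", "No"), ("One year", "Yes"),
    ("Two year", "No"), ("Two year", "No"),
]

def churn_rate_by_group(rows):
    """Return {group: share of rows in that group with churn == 'Yes'}."""
    totals = Counter(group for group, _ in rows)
    churned = Counter(group for group, churn in rows if churn == "Yes")
    # Counter returns 0 for groups with no churned rows
    return {group: churned[group] / totals[group] for group in totals}

rates = churn_rate_by_group(records)
for contract, rate in rates.items():
    print(f"{contract}: {rate:.0%} churned")
```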

In [None]:
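# Take the labels from the value_counts index so each label matches its count
# (unique() and value_counts() do not return categories in the same order)
counts = df['PaymentMethod'].value_counts()
labels = counts.index
values = counts.values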

fig = go.Figure(data = [go.Pie(labels = labels, values = values, hole = .3)])
fig.update_layout(title_text = "<b>Payment Method Distribution</b>")
fig.show()

In [None]:
fig = px.histogram(df, x = "Churn", color = "PaymentMethod", title = "<b>Customer Payment Method distribution w.r.t. Churn</b>")
fig.update_layout(width = 700, height = 500, bargap = 0.1)
fig.show()

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Most customers who churned used Electronic Check as their payment method.
<br>Customers who paid by automatic credit card transfer, automatic bank transfer, or mailed check were less likely to churn.</p>

In [None]:
fig = go.Figure()

fig.add_trace(go.Bar(
  x = [['Churn: No', 'Churn: No', 'Churn: Yes', 'Churn: Yes'],
       ["Female", "Male", "Female", "Male"]],
  y = [965, 992, 219, 240],
  name = 'DSL',
))

fig.add_trace(go.Bar(
  x = [['Churn: No', 'Churn: No', 'Churn: Yes', 'Churn: Yes'],
       ["Female", "Male", "Female", "Male"]],
  y = [889, 910, 664, 633],
  name = 'Fiber optic',
))

fig.add_trace(go.Bar(
  x = [['Churn: No', 'Churn: No', 'Churn: Yes', 'Churn: Yes'],
       ["Female", "Male", "Female", "Male"]],
  y = [690, 717, 56, 57],
  name = 'No Internet',
))

fig.update_layout(title_text = "<b>Churn Distribution w.r.t. Internet Service and Gender</b>")

fig.show()

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Fiber optic is the most common internet service, and its customers also have the highest churn rate (about 42% in the counts plotted above), which might suggest dissatisfaction with this type of internet service.
<br>DSL customers are fewer in number and have a lower churn rate (about 19%) than Fiber optic customers.</p>

In [None]:
color_map = {"Yes": "#FF97FF", "No": "#AB63FA"}
fig = px.histogram(df, x = "Churn", color = "Dependents", barmode = "group",
                   title = "<b>Dependents distribution</b>", color_discrete_map = color_map)
fig.update_layout(width = 700, height = 500, bargap = 0.1)
fig.show()

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Customers without dependents are more likely to churn</p>

In [None]:
color_map = {"Yes": '#FFA15A', "No": '#00CC96'}
fig = px.histogram(df, x = "Churn", color = "Partner", barmode = "group",
                   title = "<b>Churn distribution w.r.t. Partners</b>", color_discrete_map = color_map)
fig.update_layout(width = 700, height = 500, bargap = 0.1)
fig.show()

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Customers that don't have partners are more likely to churn</p>

In [None]:
color_map = {"Yes": '#FFA15A', "No": '#00CC96'}
fig = px.histogram(df, x = "Churn", color = "PaperlessBilling",
                   title = "<b>Churn distribution w.r.t. Paperless Billing</b>", color_discrete_map = color_map)
fig.update_layout(width = 700, height = 500, bargap = 0.1)
fig.show()

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Customers with Paperless Billing are more likely to churn.</p>

In [None]:
sns.set_context("paper", font_scale = 1.1)
# `fill` replaces the deprecated `shade` argument in recent seaborn versions
ax = sns.kdeplot(df.MonthlyCharges[(df["Churn"] == 'No') ],
                color = "Red", fill = True);
ax = sns.kdeplot(df.MonthlyCharges[(df["Churn"] == 'Yes') ],
                ax = ax, color = "Blue", fill = True);
ax.legend(["Not Churn", "Churn"], loc = 'upper right');
ax.set_ylabel('Density');
ax.set_xlabel('Monthly Charges');
ax.set_title('Distribution of monthly charges by churn');

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Customers with higher Monthly Charges are also more likely to churn</p>

In [None]:
# `fill` replaces the deprecated `shade` argument in recent seaborn versions
ax = sns.kdeplot(df.TotalCharges[(df["Churn"] == 'No') ],
                color = "Gold", fill = True);
ax = sns.kdeplot(df.TotalCharges[(df["Churn"] == 'Yes') ],
                ax = ax, color = "Green", fill = True);
ax.legend(["Not Churn", "Churn"],loc = 'upper right');
ax.set_ylabel('Density');
ax.set_xlabel('Total Charges');
ax.set_title('Distribution of total charges by churn');

In [None]:
fig = px.box(df, x = 'Churn', y = 'Tenure')

# Update yaxis properties
fig.update_yaxes(title_text = 'Tenure (Months)', row = 1, col = 1)
# Update xaxis properties
fig.update_xaxes(title_text = 'Churn', row = 1, col = 1)

# Update size and title
fig.update_layout(autosize = True, width = 750, height = 600,
    title_font = dict(size = 25, family = 'Courier'),
    title = '<b>Tenure vs Churn</b>',
)

fig.show()

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>New customers are more likely to churn</p>

In [None]:
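# MonthlyIncomeBand holds 'High'/'Low' (derived from MonthlyCharges), so the color map is keyed on those values
color_map = {"High": '#FFA15A', "Low": '#00CC96'}
fig = px.histogram(df, x = "Churn", color = "MonthlyIncomeBand",
                   title = "<b>Churn distribution w.r.t. Monthly Charge Band</b>", color_discrete_map = color_map)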
fig.update_layout(width = 700, height = 500, bargap = 0.1)
fig.show()

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Customers in the high monthly charge band (MonthlyCharges of 60 or more) are more likely to churn.</p>

In [None]:
plt.figure(figsize = (25, 10))

corr = df.apply(lambda x: pd.factorize(x)[0]).corr()

mask = np.triu(np.ones_like(corr, dtype = bool))

ax = sns.heatmap(corr, mask = mask, xticklabels = corr.columns, yticklabels = corr.columns,
                 annot = True, linewidths = .2, cmap = 'coolwarm', vmin = -1, vmax = 1)

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The chart above is a heatmap of the pairwise correlations between all attributes. Each variable is represented by a row and a column, and each cell shows the correlation between the corresponding pair. Correlation values range from -1 to 1.<br>With the exploratory data analysis done, let us now move on to building our prediction model.</p>
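<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The heatmap cell applies <code>pd.factorize</code> to every column before calling <code>corr()</code>, so each category is replaced by an integer code in order of first appearance. The plain-Python sketch below mimics that step on made-up toy columns; <code>factorize</code> and <code>pearson</code> here are simplified stand-ins written for illustration, not the pandas implementations. Because the integer codes assigned to nominal categories are arbitrary, correlations computed this way are best read as a rough screening signal.</p>

```python
from math import sqrt

def factorize(values):
    """Map each distinct value to an integer code in order of first appearance."""
    codes = {}
    return [codes.setdefault(v, len(codes)) for v in values]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical toy columns, not the demo data
contract = ["Month-to-month", "Two year", "Month-to-month", "One year", "Two year"]
churn = ["Yes", "No", "Yes", "No", "No"]

contract_codes = factorize(contract)   # [0, 1, 0, 2, 1]
churn_codes = factorize(churn)         # [0, 1, 0, 1, 1]
print(contract_codes, churn_codes)
print(round(pearson(contract_codes, churn_codes), 3))
```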

<hr style="height:2px;border:none;background-color:#00233C;">
<p style = 'font-size:20px;font-family:Arial;color:#00233C'><b>4. Data Preprocessing</b></p>

 <p style = 'font-size:16px;font-family:Arial;color:#00233C'>Before the data can be used for model creation, we need to do some data cleansing and transformation. We can do this in-database with Teradata Vantage's built-in functions.<br>The <b>CategoricalSummary</b> function displays the distinct values and their counts for each specified input DataFrame column.</p>

In [None]:
CatSum = CategoricalSummary(data = tdf, target_columns = ["MultipleLines", "InternetService", "OnlineSecurity", "OnlineBackup",
                                                          "DeviceProtection", "TechSupport","StreamingTV", "StreamingMovies"])
CatSum.result.sort("ColumnName")

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>
As we can see from the sample data above and the categorical summary values, the columns </p>
<ul style = 'font-size:16px;font-family:Arial;color:#00233C'><li>OnlineSecurity </li>  
<li>OnlineBackup</li>     
<li>DeviceProtection</li> 
<li>TechSupport</li>      
<li>StreamingTV</li>      
<li>StreamingMovies</li>
</ul><p style = 'font-size:16px;font-family:Arial;color:#00233C'>are related to InternetService: wherever InternetService is "No", these columns have the value "No internet service". For our model, let us replace "No internet service" with "No" in these columns. We will do the same for MultipleLines, replacing "No phone service" with "No".</p>

In [None]:
tdf = tdf.assign(oreplace_MultipleLines = func.oreplace(tdf.MultipleLines.expression, "No phone service","No"),
                oreplace_OnlineSecurity = func.oreplace(tdf.OnlineSecurity.expression, "No internet service","No"),
                oreplace_OnlineBackup = func.oreplace(tdf.OnlineBackup.expression, "No internet service","No"),
                oreplace_DeviceProtection = func.oreplace(tdf.DeviceProtection.expression, "No internet service","No"),
                oreplace_TechSupport = func.oreplace(tdf.TechSupport.expression, "No internet service","No"),
                oreplace_StreamingTV = func.oreplace(tdf.StreamingTV.expression, "No internet service","No"),
                oreplace_StreamingMovies = func.oreplace(tdf.StreamingMovies.expression, "No internet service","No"))
tdf

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Let's add an additional column, MonthlyChargeBand, based on MonthlyCharges: 1 ("High") where the value is &gt;= 60, otherwise 0 ("Low").</p>

In [None]:
# Keep only the columns we need, rename the oreplace_* columns back to their
# original names, and encode Partner and Churn as 0/1

tdf2 = tdf.assign(
    drop_columns = True,
    CustomerID = tdf.CustomerID,
    Gender = tdf.Gender,
    SeniorCitizen = tdf.SeniorCitizen,
    Partner = case({"Yes": 1, "No": 0}, value = tdf.Partner, else_ = 0),
    Dependents = tdf.Dependents,
    MonthlyChargeBand = case([(tdf.MonthlyCharges >= 60.0, 1)], else_ = 0),
    Tenure = tdf.Tenure,
    PhoneService = tdf.PhoneService,
    MultipleLines = tdf.oreplace_MultipleLines,
    InternetService = tdf.InternetService,
    OnlineSecurity = tdf.oreplace_OnlineSecurity,
    OnlineBackup = tdf.oreplace_OnlineBackup,
    DeviceProtection = tdf.oreplace_DeviceProtection,
    TechSupport = tdf.oreplace_TechSupport,
    StreamingTV = tdf.oreplace_StreamingTV,
    StreamingMovies = tdf.oreplace_StreamingMovies,
    Contract = tdf.Contract,
    PaperlessBilling = tdf.PaperlessBilling,
    PaymentMethod = tdf.PaymentMethod,
    MonthlyCharges = tdf.MonthlyCharges,
    TotalCharges = tdf.TotalCharges,
    Churn = case({"Yes": 1, "No": 0}, value = tdf.Churn, else_ = 0)
)

In [None]:
tdf2

In [None]:
# Copying the intermediate table to database
tdf2.to_sql("churn", if_exists = "replace")

<p style = 'font-size:16px;font-family:Arial;color:#00233C'><b>One-hot encoding and ordinal encoding</b></p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'><b>TD_OneHotEncodingFit and TD_OneHotEncodingTransform</b> and <b>TD_OrdinalEncodingFit and TD_OrdinalEncodingTransform</b> are Teradata functions that convert categorical attributes to numerical values.</p>

In [None]:
# Create fit table for onehotencoding
query = '''
CREATE MULTISET VOLATILE TABLE onehotencodingfittable AS (
    SELECT *
    FROM TD_OneHotEncodingFit (
        ON Churn AS InputTable
        USING
            TargetColumn (
                'Gender',
                'Dependents',
                'PhoneService',
                'MultipleLines',
                'OnlineSecurity',
                'OnlineBackup',
                'DeviceProtection',
                'TechSupport',
                'StreamingTV',
                'StreamingMovies',
                'PaperlessBilling'
            )
            IsInputDense ('true')
            CategoryCounts (2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2)
            Approach ('Auto')
    ) AS dt
) WITH DATA
ON COMMIT PRESERVE ROWS;
'''

# If the table already exists from a previous run, drop it and retry
try:
    execute_sql(query)
except Exception:
    db_drop_table('onehotencodingfittable')
    execute_sql(query)

<p style = 'font-size:16px;font-family:Arial;color:#00233C'><i>*Note: the output of the above command looks like "sqlalchemy.engine.cursor.LegacyCursorResult at xxxxxxxx", which indicates that the command was executed; there is no other output to display.</i> </p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>
The other categorical columns </p> 
<ul style = 'font-size:16px;font-family:Arial;color:#00233C'>
<li>InternetService </li>  
<li>Contract</li>     
<li>PaperlessBilling</li> 
<li>PaymentMethod</li>      
</ul><p style = 'font-size:16px;font-family:Arial;color:#00233C'>mostly have more than two distinct values, so we apply ordinal encoding to them.</p>

In [None]:
#create fit table for ordinalencoding
query = '''
SELECT *
FROM TD_OrdinalEncodingFit (
    ON Churn AS InputTable
    OUT volatile table outputtable (ordinalencodingfittable)
    USING
        TargetColumn ('InternetService', 'Contract', 'PaperlessBilling', 'PaymentMethod')
        DefaultValue (-1)
) AS dt;
'''

# If the table already exists from a previous run, drop it and retry
try:
    execute_sql(query)
except Exception:
    db_drop_table('ordinalencodingfittable')
    execute_sql(query)
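<p style = 'font-size:16px;font-family:Arial;color:#00233C'>To make the difference between the two encodings concrete, here is a minimal plain-Python sketch. It is not how the Teradata functions are implemented: the helper names are hypothetical, the sorted category order is an illustrative choice, and the actual codes assigned in-database may differ. The <code>default</code> argument plays the role that DefaultValue (-1) appears to play in the query above, as a fallback code for a category not seen during fitting.</p>

```python
def fit_categories(values):
    """Collect the distinct categories in sorted order (an illustrative choice)."""
    return sorted(set(values))

def one_hot(value, categories):
    """One 0/1 indicator per known category."""
    return [1 if value == c else 0 for c in categories]

def ordinal(value, categories, default=-1):
    """Single integer code; unseen values fall back to `default`."""
    return categories.index(value) if value in categories else default

# Hypothetical toy column, not the demo data
contracts = ["Month-to-month", "One year", "Two year", "One year"]
cats = fit_categories(contracts)

print(one_hot("One year", cats))      # [0, 1, 0]
print(ordinal("Two year", cats))      # 2
print(ordinal("Three year", cats))    # -1
```

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>One-hot encoding adds one column per category, while ordinal encoding keeps a single column, which is why the latter is convenient for columns with more categories.</p>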

<p style = 'font-size:16px;font-family:Arial;color:#00233C'><b>Scale the numerical values</b></p><p style = 'font-size:16px;font-family:Arial;color:#00233C'><b>TD_ScaleFit and TD_ScaleTransform</b> scale the specified input table columns using a chosen scale method, such as standard deviation, mean, or range. Here we apply the range method to MonthlyCharges and TotalCharges.</p>

In [None]:
#create fit table for scale function
query = '''
SELECT *
FROM TD_ScaleFit (
    ON Churn AS InputTable
    OUT VOLATILE TABLE OutputTable(scaleFitOut)
    USING
        TargetColumns ('MonthlyCharges', 'TotalCharges')
        MissValue ('Keep')
        ScaleMethod ('range')
        GlobalScale ('f')
) AS dt;
'''

# If the table already exists from a previous run, drop it and retry
try:
    execute_sql(query)
except Exception:
    db_drop_table('scaleFitOut')
    execute_sql(query)
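<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The 'range' scale method used above is commonly known as min-max scaling: each value becomes (x - min) / (max - min), so the column ends up between 0 and 1. The sketch below illustrates the fit and transform steps in plain Python under that assumption. The helper names and sample values are hypothetical, and keeping missing values as <code>None</code> is our reading of MissValue ('Keep'), not a statement of how the Teradata function is implemented.</p>

```python
def fit_range(values):
    """Learn the minimum and the span (max - min), ignoring missing values."""
    present = [v for v in values if v is not None]
    low, high = min(present), max(present)
    return low, high - low

def transform_range(values, low, span):
    """Apply (x - min) / (max - min); missing values are kept as None.

    Assumes span > 0, i.e. the column is not constant.
    """
    return [None if v is None else (v - low) / span for v in values]

monthly_charges = [20.0, 60.0, None, 100.0]  # hypothetical values
low, span = fit_range(monthly_charges)
scaled = transform_range(monthly_charges, low, span)
print(scaled)  # [0.0, 0.5, None, 1.0]
```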

<p style = 'font-size:16px;font-family:Arial;color:#00233C'><b>Putting it all together</b></p><p style = 'font-size:16px;font-family:Arial;color:#00233C'>We will use the <b>TD_ColumnTransformer</b> function to apply all the transformations from the fit tables created above in one go.</p>

In [None]:
query = '''
CREATE MULTISET TABLE Transformed_data AS (
    SELECT 
        CustomerId,
        Churn,
        SeniorCitizen,
        Partner,
        MonthlyChargeBand,
        Tenure,
        InternetService,
        Contract,
        PaperlessBilling,
        PaymentMethod,
        MonthlyCharges,
        TotalCharges,
        Gender_0,
        Gender_1,
        Dependents_0,
        Dependents_1,
        PhoneService_0,
        PhoneService_1,
        MultipleLines_0,
        MultipleLines_1,
        OnlineSecurity_0,
        OnlineSecurity_1,
        OnlineBackup_0,
        OnlineBackup_1,
        DeviceProtection_0,
        DeviceProtection_1,
        TechSupport_0,
        TechSupport_1,
        StreamingTV_0,
        StreamingTV_1,
        StreamingMovies_0,
        StreamingMovies_1,
        PaperlessBilling_0,
        PaperlessBilling_1
    FROM TD_ColumnTransformer (
        ON Churn AS inputtable
        ON onehotencodingfittable AS Onehotencodingfittable DIMENSION
        ON ordinalencodingfittable AS OrdinalEncodingFitTable DIMENSION
        ON scaleFitOut AS ScaleFitTable DIMENSION
    ) AS dt
) WITH DATA;
'''

# If the table already exists from a previous run, drop it and retry
try:
    execute_sql(query)
except Exception:
    db_drop_table('Transformed_data')
    execute_sql(query)

<hr style="height:2px;border:none;background-color:#00233C;">
<p style = 'font-size:20px;font-family:Arial;color:#00233C'><b>5. Segment Creation</b></p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Segmentation is a fundamental step in hyper-personalization strategies as it allows businesses to divide their target audience into smaller, more manageable groups. By creating segments, businesses can tailor their marketing messages, offers, and experiences to resonate with each segment's unique needs and preferences.<br>
Based on the profiling and data analysis (steps 2 and 3), the key segments <b>Senior Citizen status, Partner status and Monthly Charge Band</b> look highly significant, so we will use them to create segments and build a predictive model for each one.<br>Let's create views for each segment on our transformed and scaled dataset.</p>

In [None]:
#Segment for SeniorCitizen (where SeniorCitizen equals 1)
query_sr = '''
REPLACE VIEW SeniorCitizen AS
SELECT
    CustomerId,
    Churn,
    Partner,
    MonthlyChargeBand,
    Tenure,
    InternetService,
    Contract,
    PaperlessBilling,
    PaymentMethod,
    MonthlyCharges,
    TotalCharges,
    Gender_0,
    Gender_1,
    Dependents_0,
    Dependents_1,
    PhoneService_0,
    PhoneService_1,
    MultipleLines_0,
    MultipleLines_1,
    OnlineSecurity_0,
    OnlineSecurity_1,
    OnlineBackup_0,
    OnlineBackup_1,
    DeviceProtection_0,
    DeviceProtection_1,
    TechSupport_0,
    TechSupport_1,
    StreamingTV_0,
    StreamingTV_1,
    StreamingMovies_0,
    StreamingMovies_1,
    PaperlessBilling_0,
    PaperlessBilling_1
FROM
    Transformed_data
WHERE
    SeniorCitizen = 1;
'''

#Segment of NotSeniorCitizens (where SeniorCitizen equals 0)
query_nsr = '''
REPLACE VIEW NotSeniorCitizen AS
SELECT
    CustomerId,
    Churn,
    Partner,
    MonthlyChargeBand,
    Tenure,
    InternetService,
    Contract,
    PaperlessBilling,
    PaymentMethod,
    MonthlyCharges,
    TotalCharges,
    Gender_0,
    Gender_1,
    Dependents_0,
    Dependents_1,
    PhoneService_0,
    PhoneService_1,
    MultipleLines_0,
    MultipleLines_1,
    OnlineSecurity_0,
    OnlineSecurity_1,
    OnlineBackup_0,
    OnlineBackup_1,
    DeviceProtection_0,
    DeviceProtection_1,
    TechSupport_0,
    TechSupport_1,
    StreamingTV_0,
    StreamingTV_1,
    StreamingMovies_0,
    StreamingMovies_1,
    PaperlessBilling_0,
    PaperlessBilling_1
FROM
    Transformed_data
WHERE
    SeniorCitizen = 0;
'''

#Segment of HavePartner (where Partner equals 1, i.e. "Yes")
query_ptr = '''
REPLACE VIEW HavePartner AS
SELECT
    CustomerId,
    Churn,
    SeniorCitizen,
    MonthlyChargeBand,
    Tenure,
    InternetService,
    Contract,
    PaperlessBilling,
    PaymentMethod,
    MonthlyCharges,
    TotalCharges,
    Gender_0,
    Gender_1,
    Dependents_0,
    Dependents_1,
    PhoneService_0,
    PhoneService_1,
    MultipleLines_0,
    MultipleLines_1,
    OnlineSecurity_0,
    OnlineSecurity_1,
    OnlineBackup_0,
    OnlineBackup_1,
    DeviceProtection_0,
    DeviceProtection_1,
    TechSupport_0,
    TechSupport_1,
    StreamingTV_0,
    StreamingTV_1,
    StreamingMovies_0,
    StreamingMovies_1,
    PaperlessBilling_0,
    PaperlessBilling_1
FROM
    Transformed_data
WHERE
    Partner = 1;
'''

#Segment of NoPartner (where Partner equals 0, i.e. "No")
query_nptr = '''
REPLACE VIEW NoPartner AS
SELECT
    CustomerId,
    Churn,
    SeniorCitizen,
    MonthlyChargeBand,
    Tenure,
    InternetService,
    Contract,
    PaperlessBilling,
    PaymentMethod,
    MonthlyCharges,
    TotalCharges,
    Gender_0,
    Gender_1,
    Dependents_0,
    Dependents_1,
    PhoneService_0,
    PhoneService_1,
    MultipleLines_0,
    MultipleLines_1,
    OnlineSecurity_0,
    OnlineSecurity_1,
    OnlineBackup_0,
    OnlineBackup_1,
    DeviceProtection_0,
    DeviceProtection_1,
    TechSupport_0,
    TechSupport_1,
    StreamingTV_0,
    StreamingTV_1,
    StreamingMovies_0,
    StreamingMovies_1,
    PaperlessBilling_0,
    PaperlessBilling_1
FROM
    Transformed_data
WHERE
    Partner = 0;
'''

#Segment of LowMonthlyChargeBand (where MonthlyChargeBand equals 0, i.e. "Low")
query_low = '''
REPLACE VIEW LowMonthlyChargeBand AS
SELECT
    CustomerId,
    Churn,
    SeniorCitizen,
    Partner,
    Tenure,
    InternetService,
    Contract,
    PaperlessBilling,
    PaymentMethod,
    MonthlyCharges,
    TotalCharges,
    Gender_0,
    Gender_1,
    Dependents_0,
    Dependents_1,
    PhoneService_0,
    PhoneService_1,
    MultipleLines_0,
    MultipleLines_1,
    OnlineSecurity_0,
    OnlineSecurity_1,
    OnlineBackup_0,
    OnlineBackup_1,
    DeviceProtection_0,
    DeviceProtection_1,
    TechSupport_0,
    TechSupport_1,
    StreamingTV_0,
    StreamingTV_1,
    StreamingMovies_0,
    StreamingMovies_1,
    PaperlessBilling_0,
    PaperlessBilling_1
FROM
    Transformed_data
WHERE
    MonthlyChargeBand = 0;
'''

#Segment of HighMonthlyChargeBand (where MonthlyChargeBand equals 1, i.e. "High")
query_high = '''
REPLACE VIEW HighMonthlyChargeBand AS
SELECT
    CustomerId,
    Churn,
    SeniorCitizen,
    Partner,
    Tenure,
    InternetService,
    Contract,
    PaperlessBilling,
    PaymentMethod,
    MonthlyCharges,
    TotalCharges,
    Gender_0,
    Gender_1,
    Dependents_0,
    Dependents_1,
    PhoneService_0,
    PhoneService_1,
    MultipleLines_0,
    MultipleLines_1,
    OnlineSecurity_0,
    OnlineSecurity_1,
    OnlineBackup_0,
    OnlineBackup_1,
    DeviceProtection_0,
    DeviceProtection_1,
    TechSupport_0,
    TechSupport_1,
    StreamingTV_0,
    StreamingTV_1,
    StreamingMovies_0,
    StreamingMovies_1,
    PaperlessBilling_0,
    PaperlessBilling_1
FROM
    Transformed_data
WHERE
    MonthlyChargeBand = 1;
'''

execute_sql(query_sr)
execute_sql(query_nsr)
execute_sql(query_ptr)
execute_sql(query_nptr)
execute_sql(query_low)
execute_sql(query_high)

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>In the code block above, we created six segments:</p><ul style = 'font-size:16px;font-family:Arial;color:#00233C'><li>SeniorCitizen</li><li>NotSeniorCitizen</li><li>HavePartner</li><li>NoPartner</li><li>LowMonthlyChargeBand</li><li>HighMonthlyChargeBand</li></ul>

<hr style="height:2px;border:none;background-color:#00233C;">
<p style = 'font-size:20px;font-family:Arial;color:#00233C'><b>6. Function definitions</b></p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The same code needs to run for every segment, so we wrap the processing steps in functions. We will call these functions for each segment to create models and predictions.</p>

<hr style="height:1px;border:none;background-color:#00233C;">
<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>6.1 Function to create train and test data</b></p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'><b>TD_TrainTestSplit</b> divides the data into train and test sets for model training and scoring.<br>
We create a function that splits each segment into a train and a test dataset.</p>
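<p style = 'font-size:16px;font-family:Arial;color:#00233C'>To illustrate the idea of a seeded split outside the database, here is a minimal pure-Python sketch. The helper <code>flag_train_rows</code>, the 80/20 ratio, and the sample IDs are assumptions made for illustration; this is not how TD_TrainTestSplit is implemented, and it will not reproduce the same row assignment.</p>

```python
import random

def flag_train_rows(ids, train_size=0.8, seed=21):
    """Return {id: flag}, where flag 1 marks a training row and 0 a test row.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)      # fixed seed, so the split is repeatable
    shuffled = sorted(ids)         # sort first so input order does not matter
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_size)
    train_ids = set(shuffled[:n_train])
    return {i: int(i in train_ids) for i in ids}

# Made-up customer IDs
customer_ids = [f"C{n:03d}" for n in range(10)]
flags = flag_train_rows(customer_ids)

train = [i for i, flag in flags.items() if flag == 1]
test = [i for i, flag in flags.items() if flag == 0]
print(len(train), len(test))
```

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The flag plays the same role as the <code>TD_IsTrainRow</code> column that the function below filters on. Because the seed is fixed, rerunning the sketch yields the same split.</p>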

In [None]:
def hyper_dataset(segment):
    query = '''
                CREATE MULTISET TABLE TrainTestSplit_output AS (
                    SELECT * FROM TD_TrainTestSplit(
                        ON {} AS InputTable
                        USING
                        IDColumn('CustomerId')
                        Seed(21)
                    ) AS dt
                ) WITH DATA;
            '''.format(segment)
    try:
        execute_sql(query)
    except Exception:
        # The table is left over from a previous segment or run: drop it and retry
        db_drop_table('TrainTestSplit_output')
        execute_sql(query)
    
    df = DataFrame("TrainTestSplit_output")
    # Split into 2 virtual dataframes
    df_train = df[df.TD_IsTrainRow == 1].drop(["TD_IsTrainRow"], axis = 1)
    df_test = df[df.TD_IsTrainRow == 0].drop(["TD_IsTrainRow"], axis = 1)
    
    return df_train, df_test

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>With preprocessing done and a function in place to build the training and test datasets, let's now create some predictive models.</p>

<hr style="height:1px;border:none;background-color:#00233C;">
<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>6.2 Function for InDb Model Training and Scoring</b></p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>We will create functions for model creation, prediction, and the ROC curve and AUC, so that we can run the same code for each segment in a loop later.</p>

<hr style="height:1px;border:none;background-color:#00233C;">
<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>6.3 Function for Logistic Regression Model and Prediction</b></p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>
    <b>Logistic regression</b> is a statistical algorithm used for binary classification problems. It is a supervised learning algorithm that predicts the probability of an input belonging to a certain class (e.g., positive or negative) based on its features.<br>Logistic regression models the relationship between the input features and the probability of belonging to a class using a logistic function. The logistic function maps a weighted combination of the input feature values onto a scale between 0 and 1, which represents the probability of belonging to the positive class.<br>
    The <b>TD_GLM </b>function is a generalized linear model (GLM) that performs regression and classification analysis on data sets.
<br>Please refer to <a href ='https://docs.teradata.com/r/Enterprise/Teradata-Package-for-Python-Function-Reference-17.20/teradataml-Analytic-Database-SQL-Engine-Analytic-Functions/Supported-on-Database-Version-17.20.xx/MODEL-TRAINING-functions/GLM'>TD_GLM</a> for function elements and output.</p>
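<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The mapping from feature values to a probability described above can be sketched in a few lines of plain Python. The weights, intercept, and feature values are hypothetical numbers chosen for illustration; they are not coefficients produced by TD_GLM.</p>

```python
import math

def sigmoid(z):
    """Logistic function: maps any real number onto the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(features, weights, intercept):
    """Probability of the positive class for one row of features."""
    z = intercept + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)

# Hypothetical coefficients for two already-scaled features
weights = [0.8, -1.2]
intercept = -0.5

p_churn = predict_proba([1.0, 0.5], weights, intercept)
label = 1 if p_churn >= 0.5 else 0   # a 0.5 cut-off is assumed here

print(round(p_churn, 3), label)
```

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Training a logistic regression model amounts to finding the weights and intercept that make these probabilities agree with the observed labels as closely as possible.</p>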

In [None]:
def hyper_GLM(df_train, df_test):
    from teradataml import GLM, TDGLMPredict

    glm_model = GLM(data = df_train,
                input_columns = ['3:'], 
                response_column = 'Churn',
                family = 'Binomial')
    
    glm_prediction = TDGLMPredict(newdata = df_test,
                           id_column = 'CustomerID',
                           object = glm_model.result,
                           accumulate = 'Churn',
                           output_prob = True,
                           output_responses = ['0', '1'])
    
    return glm_prediction.result

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>In the code block above, we created a function that trains a GLM model and generates its predictions using Teradata Vantage's InDb functions.</p>

<hr style="height:1px;border:none;background-color:#00233C;">
<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>6.4 Function to visualize the results (ROC curve and AUC) for Logistic Regression Model</b></p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Calculate the mean absolute error and the AUC (Area Under the Curve) of the Receiver Operating Characteristic (ROC) curve.</p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Mean absolute error is the average, over all observations, of the absolute difference between the actual and predicted values.</p>
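<p style = 'font-size:16px;font-family:Arial;color:#00233C'>As a small worked example of the definition above, the sketch below computes the mean absolute error by hand. The labels and probabilities are made-up values, and <code>mean_absolute_error_manual</code> is a hypothetical helper, not the library function used in the notebook.</p>

```python
def mean_absolute_error_manual(actual, predicted):
    """Average of |actual - predicted| over all observations."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Made-up example: true churn labels and predicted churn probabilities
churn = [1, 0, 1, 0]
prob_1 = [0.9, 0.2, 0.6, 0.4]

# Absolute errors are 0.1, 0.2, 0.4 and 0.4, so the average is 1.1 / 4
mae = mean_absolute_error_manual(churn, prob_1)
print(round(mae, 3))
```

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Because the error is computed against the predicted probability rather than a class label, a lower value means the probabilities sit closer to the true 0/1 outcomes.</p>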

In [None]:
def hyper_GLM_AUC(result_glm,segment):
    glm_pred = result_glm.to_pandas()
    ME = mean_absolute_error(glm_pred['Churn'], glm_pred['prob_1'])
    AUC = roc_auc_score(glm_pred['Churn'], glm_pred['prob_1'])
    fpr, tpr, thresholds = roc_curve(glm_pred['Churn'], glm_pred['prob_1'])
    plt.plot(fpr, tpr, color = 'orange', label = 'ROC. AUC = {}'.format(str(AUC)))
    plt.plot([0, 1], [0, 1], color = 'darkblue', linestyle = '--')
    plt.xlabel('False Positive Rate')
    plt.ylabel('True Positive Rate')
    plt.title('Receiver Operating Characteristic (ROC) Curve for {} GLM'.format(segment))
    plt.legend()
    return plt, ME,AUC

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>In the code block above, we created a function to visualize the ROC curve and calculate the AUC for the GLM model.</p>

<hr style="height:1px;border:none;background-color:#00233C;">
<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>6.5 Function for XGB Model and Prediction</b></p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>
    <b>XGBoost (eXtreme Gradient Boosting) </b> is based on the gradient boosting framework, which is an ensemble learning method that combines multiple weak or base models (typically decision trees) to create a more accurate and robust predictive model. XGBoost improves upon traditional gradient boosting by using a number of optimization techniques, including parallelization, regularization, and efficient handling of missing values, to achieve faster training times and better model performance.<br>
    <b>TD_XGBoost </b>is an implementation of the gradient boosted decision tree designed for speed and performance. In gradient boosting, each iteration fits a model to the residuals (errors) of the previous iteration to correct the errors made by existing models. The predicted residual is multiplied by a learning rate and then added to the previous prediction. Models are added sequentially until no further improvements can be made. It is called gradient boosting because it uses a gradient descent algorithm to minimize the loss when adding new models.
<br>Please refer to <a href ='https://docs.teradata.com/r/Enterprise_IntelliFlex_VMware/Teradata-VantageTM-Analytics-Database-Analytic-Functions-17.20/Model-Training-Functions/TD_XGBoost'>TD_XGBoost</a> for function elements and output.</p>
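<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The residual-fitting loop described above can be sketched in plain Python. This is a simplified illustration that assumes a squared-error loss, a single numeric feature, and one-split decision stumps as weak learners; the data, the helper names, and the parameter values are hypothetical. TD_XGBoost itself adds regularization, parallelization, and a classification loss that are not shown here.</p>

```python
def fit_stump(x, residuals):
    """Find the single split on x that best fits the residuals (least squares)."""
    best = None
    for threshold in sorted(set(x))[:-1]:   # skip the largest value so both sides are non-empty
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda xi: left_mean if xi <= threshold else right_mean

def boost(x, y, n_rounds=20, learning_rate=0.3):
    """Add stumps one at a time, each fitted to the current residuals."""
    base = sum(y) / len(y)                  # start from the mean of the target
    prediction = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, prediction)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        # The predicted residual is scaled by the learning rate before it is added
        prediction = [pi + learning_rate * stump(xi) for pi, xi in zip(prediction, x)]
    return base, stumps, prediction

def mse(actual, predicted):
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

# Made-up data with a jump between x = 3 and x = 4
x = [1, 2, 3, 4, 5, 6]
y = [1.0, 1.2, 0.9, 3.1, 2.9, 3.0]

base, stumps, fitted = boost(x, y)
mse_before = mse(y, [base] * len(y))
mse_after = mse(y, fitted)
print(mse_before > mse_after)
```

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>A smaller learning rate makes each correction more cautious, which usually requires more rounds but reduces the risk of overfitting.</p>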

In [None]:
def hyper_XGB(segment):
    query_model = '''
                    CREATE MULTISET TABLE xgb_model AS (
                        SELECT * FROM TD_XGBoost (
                            ON (SELECT * FROM TrainTestSplit_output WHERE TD_IstrainRow = 1) PARTITION BY ANY
                            OUT TABLE MetaInformationTable(xgb_out)
                            USING
                            InputColumns ('[4:]')  -- Adjust this column selection as needed
                            ResponseColumn('Churn')
                            ModelType ('classification') 
                        ) AS dt
                    ) WITH DATA;
                '''
                
    query_pred = '''
                    CREATE TABLE xgb_predict_out AS (
                        SELECT * FROM TD_XGBoostPredict(
                            ON (SELECT * FROM TrainTestSplit_output WHERE TD_IstrainRow = 0) AS inputtable PARTITION BY ANY
                            ON xgb_model AS modeltable DIMENSION ORDER BY task_index, tree_num, iter, class_num, tree_order
                            USING
                            IdColumn('CustomerId')
                            ModelType('Classification')
                            OutputProb('t')
                            Responses('0', '1')
                            Accumulate('Churn')
                        ) AS dt
                    ) WITH DATA;
             '''
    # Drop tables left over from a previous segment or run; a missing table is not an error here
    for table in ('xgb_model', 'xgb_out', 'xgb_predict_out'):
        try:
            db_drop_table(table)
        except Exception:
            pass

    # Train the model, then score the test rows
    execute_sql(query_model)
    execute_sql(query_pred)

    result_xgb = DataFrame('xgb_predict_out')
    
    return result_xgb

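<p style = 'font-size:16px;font-family:Arial;color:#00233C'>In the code block above, we created a function that trains an XGB model and generates its predictions using Teradata Vantage's InDb functions. The function reads its train and test rows from TrainTestSplit_output, so the split function from section 6.1 must be run for the segment first.</p>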

<hr style="height:1px;border:none;background-color:#00233C;">
<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>6.6 Function to visualize the results (ROC curve and AUC) for XGB Model</b></p>

In [None]:
def hyper_XGB_AUC(result_xgb,segment):
    xgb_pred = result_xgb.to_pandas().reset_index().sort_values("CustomerID")
    ME = mean_absolute_error(xgb_pred['Churn'], xgb_pred['Prob_1'])
    #print("Mean absolute error for XGB prediction is",mean_absolute_error(xgb_pred['Churn'], xgb_pred['Prob_1']))
    AUC = roc_auc_score(xgb_pred['Churn'], xgb_pred['Prob_1'])
    #print("AUC for XGB output for ", segment," is ",AUC)
    fpr, tpr, thresholds = roc_curve(xgb_pred['Churn'], xgb_pred['Prob_1'])
    plt.plot(fpr, tpr, color = 'orange', label = 'ROC. AUC = {}'.format(str(AUC)))
    plt.plot([0, 1], [0, 1], color='darkblue', linestyle='--')
    plt.xlabel('False Positive Rate')
    plt.ylabel('True Positive Rate')
    plt.title('Receiver Operating Characteristic (ROC) Curve for {} XGB'.format(segment))
    plt.legend()
    return plt, ME,AUC

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>In the code block above, we created a function to visualize the ROC curve and calculate the AUC for the XGB model.</p>

<hr style="height:1px;border:none;background-color:#00233C;">
<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>6.7 Function for displaying values</b></p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The function below displays the mean absolute error and AUC values.</p>

In [None]:
def display_template(a, b):
    # Build the HTML snippet that reports the two metrics (a = mean absolute error, b = AUC)
    return (
        "<p style = 'font-size:16px;font-family:Arial;color:#00233C'>"
        f"The <b>Mean absolute error</b> of model is <b>{a} </b>\n"
        f"<br><b> AUC </b>of model is <b>{b}</b>"
        "</p>"
    )

<hr style="height:2px;border:none;background-color:#00233C;">
<p style = 'font-size:20px;font-family:Arial;color:#00233C'><b>7. Model Training and Scoring for each segment</b></p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>In the steps above, we created functions for model creation, prediction, and visualization of model performance. Let's combine them and generate predictions for each segment.<br>We first go step by step through one segment, <b>SeniorCitizen</b>, for understanding, and then run all the segments in a loop.</p>

In [None]:
#this command will create train and test datasets for the SeniorCitizen segment
df_train, df_test = hyper_dataset('SeniorCitizen')

In [None]:
#this command will create GLM model and predictions
globals()["result_glm_{}".format('SeniorCitizen')] = hyper_GLM(df_train, df_test)

In [None]:
#this command will create XGB model and predictions
globals()["result_xgb_{}".format('SeniorCitizen')] = hyper_XGB('SeniorCitizen')

In [None]:
#printing the output for GLM for SeniorCitizen segment
print("GLM output for ", 'SeniorCitizen')
print(globals()["result_glm_{}".format('SeniorCitizen')])

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The output above shows prob_1, the predicted probability that the customer will churn, and prob_0, the predicted probability that the customer will not churn. The prediction column holds the class label derived from these probabilities.</p>

In [None]:
#printing the output for XGB for SeniorCitizen segment
print("XGB output for ", 'SeniorCitizen')
print(globals()["result_xgb_{}".format('SeniorCitizen')])

In [None]:
#visualize the ROC curve and AUC calculation for GLM model for Senior Citizen segment 
roc_glm, me_glm, auc_glm = hyper_GLM_AUC(globals()["result_glm_{}".format('SeniorCitizen')],'SeniorCitizen')
roc_glm.show()
display(Markdown(display_template(me_glm, auc_glm)))

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The ROC curve plots the TPR (True Positive Rate) against the FPR (False Positive Rate) across classification thresholds. The area under the ROC curve (AUC) measures how well the model can distinguish between the positive and negative classes: the higher the AUC, the better the model separates them.</p>
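<p style = 'font-size:16px;font-family:Arial;color:#00233C'>To make the relationship between thresholds, TPR, FPR, and AUC concrete, here is a minimal sketch that builds the ROC points by hand and integrates them with the trapezoid rule. The labels and scores are made-up values and the helper names are hypothetical; the notebook itself relies on library functions for these calculations.</p>

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) pairs, one per threshold, from strictest to loosest."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]                  # a threshold above every score flags nothing
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

def auc_trapezoid(points):
    """Area under the piecewise-linear curve through the ROC points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

# Made-up example: 3 churners (1) and 2 non-churners (0) with predicted probabilities
labels = [1, 0, 1, 0, 1]
scores = [0.9, 0.8, 0.7, 0.3, 0.2]

points = roc_points(labels, scores)
auc = auc_trapezoid(points)
print(points[-1], round(auc, 3))
```

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>In this example the scores rank a churner above a non-churner in 3 of the 6 possible pairs, which matches the resulting AUC of 0.5, the same value as the diagonal reference line in the plots.</p>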

In [None]:
#visualize the ROC curve and AUC calculation for XGB Model for Senior Citizen segment 
roc_xgb, me_xgb, auc_xgb = hyper_XGB_AUC(globals()["result_xgb_{}".format('SeniorCitizen')],'SeniorCitizen')
roc_xgb.show()
display(Markdown(display_template(me_xgb, auc_xgb)))

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>In the steps above, we have seen how the functions can be used to create a model and evaluate its performance. Now, let's run the functions together for all the segments we created in section 5.<br><i>* The step below will take some time as we are creating two prediction models for each of the six segments.</i></p>

In [None]:
#segment list
segments = ['SeniorCitizen', 'NotSeniorCitizen', 'HavePartner','NoPartner','LowMonthlyChargeBand','HighMonthlyChargeBand']
for segment in segments:
    df_train, df_test = hyper_dataset(segment)
    globals()["result_glm_{}".format(segment)] = hyper_GLM(df_train, df_test)
    globals()["result_xgb_{}".format(segment)] = hyper_XGB(segment)
    print("\033[1m ============= Model & Predictions for segment", segment,"================ \033[0m")
    print("\033[1m GLM model output for segment \033[0m", segment)
    print(globals()["result_glm_{}".format(segment)])
    print("\033[1m XGB model output for segment \033[0m", segment)
    print(globals()["result_xgb_{}".format(segment)])
    roc_glm, me_glm, auc_glm = hyper_GLM_AUC(globals()["result_glm_{}".format(segment)], segment)
    roc_glm.show()
    display(Markdown(display_template(me_glm, auc_glm)))
    roc_xgb, me_xgb, auc_xgb = hyper_XGB_AUC(globals()["result_xgb_{}".format(segment)], segment)
    roc_xgb.show()
    display(Markdown(display_template(me_xgb, auc_xgb)))
    print("\033[1m =============================================================================== \033[0m")

<hr style="height:2px;border:none;background-color:#00233C;">
<p style = 'font-size:20px;font-family:Arial;color:#00233C'><b>9. Conclusion</b></p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>In this demo, we walked through a simple example of hyper-personalization predictions for the basic segments we created based on our data exploration. Segments can be dynamic and evolve over time as new data becomes available or as customer preferences change. It is essential to regularly analyze and update segments to ensure the hyper-personalization efforts remain effective and relevant.</p>

<hr style="height:2px;border:none;background-color:#00233C;">
<p style = 'font-size:20px;font-family:Arial;color:#00233C'><b>10. Cleanup</b></p>

<p style = 'font-size:18px;font-family:Arial;color:#00233C;color:#00233C'><b>Work Tables</b></p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C;'>
Clean up the work tables to prevent errors the next time the notebook is run.</p>

In [None]:
tables = ['churn', 'onehotencodingfittable', 'ordinalencodingfittable',
         'scaleFitOut', 'Transformed_data', 'TrainTestSplit_output',
         'xgb_model', 'xgb_out', 'xgb_predict_out']
views = ['SeniorCitizen', 'NotSeniorCitizen', 'HavePartner',
         'NoPartner', 'LowMonthlyChargeBand', 'HighMonthlyChargeBand']

# Loop through the list of tables and execute the drop table command for each table
for table in tables:
    try:
        db_drop_table(table_name = table)
    except:
        pass
    
# Loop through the list of views and execute the drop view command for each view
for view in views:
    try:
        db_drop_view(view_name = view)
    except:
        pass

<p style = 'font-size:18px;font-family:Arial;color:#00233C;color:#00233C'><b>Databases and Tables</b></p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The following code will clean up tables and databases created above.</p>

In [None]:
%run -i ../run_procedure.py "call remove_data('DEMO_Telco');"    # Takes 5 seconds

In [None]:
remove_context()

<b style = 'font-size:20px;font-family:Arial;color:#00233C'>Dataset:</b>

- `CustomerID`: Unique ID of the customer
- `Gender`: Whether the customer is a male or a female
- `SeniorCitizen`: Whether the customer is a senior citizen or not (1, 0)
- `Partner`: Whether the customer has a partner or not (Yes, No)
- `Dependents`: Whether the customer has dependents or not (Yes, No)
- `Tenure`: Number of months the customer has stayed with the company
- `PhoneService`: Whether the customer has a phone service or not (Yes, No)
- `MultipleLines`: Whether the customer has multiple lines or not (Yes, No, No phone service)
- `InternetService`: Customer’s internet service provider (DSL, Fiber optic, No)
- `OnlineSecurity`: Whether the customer has online security or not (Yes, No, No internet service)
- `OnlineBackup`: Whether the customer has online backup or not (Yes, No, No internet service)
- `DeviceProtection`: Whether the customer has device protection or not (Yes, No, No internet service)
- `TechSupport`: Whether the customer has tech support or not (Yes, No, No internet service)
- `StreamingTV`: Whether the customer has streaming TV or not (Yes, No, No internet service)
- `StreamingMovies`: Whether the customer has streaming movies or not (Yes, No, No internet service)
- `Contract`: The contract term of the customer (Month-to-month, One year, Two year)
- `PaperlessBilling`: Whether the customer has paperless billing or not (Yes, No)
- `PaymentMethod`: The customer’s payment method (Electronic check, Mailed check, Bank transfer (automatic), Credit card (automatic))
- `MonthlyCharges`: The amount charged to the customer monthly
- `TotalCharges`: The total amount charged to the customer
- `Churn`: Whether the customer churned or not (Yes or No)

<p style = 'font-size:20px;font-family:Arial;color:#00233C'><b>Reference Links:</b></p>
<ul style = 'font-size:16px;font-family:Arial;color:#00233C'> 
  <li>Teradata Vantage™ - Analytics Database Analytic Functions - 17.20: <a href = 'https://docs.teradata.com/r/Enterprise_IntelliFlex_VMware/Teradata-VantageTM-Analytics-Database-Analytic-Functions-17.20/Introduction-to-Analytics-Database-Analytic-Functions'>https://docs.teradata.com/r/Enterprise_IntelliFlex_VMware/Teradata-VantageTM-Analytics-Database-Analytic-Functions-17.20/Introduction-to-Analytics-Database-Analytic-Functions</a></li>
  <li>Teradata® Package for Python User Guide - 17.20: <a href = 'https://docs.teradata.com/r/Enterprise_IntelliFlex_VMware/Teradata-Package-for-Python-User-Guide-17.20/Introduction-to-Teradata-Package-for-Python'>https://docs.teradata.com/r/Enterprise_IntelliFlex_VMware/Teradata-Package-for-Python-User-Guide-17.20/Introduction-to-Teradata-Package-for-Python</a></li>
  <li>Teradata® Package for Python Function Reference - 17.20: <a href = 'https://docs.teradata.com/r/Enterprise/Teradata-Package-for-Python-Function-Reference-17.20/Teradata-Package-for-Python-Function-Reference'>https://docs.teradata.com/r/Enterprise/Teradata-Package-for-Python-Function-Reference-17.20/Teradata-Package-for-Python-Function-Reference</a></li>      
</ul>


<footer style="padding-bottom:35px; background:#f9f9f9; border-bottom:3px solid #00233C">
    <div style="float:left;margin-top:14px">ClearScape Analytics™</div>
    <div style="float:right;">
        <div style="float:left; margin-top:14px">
            Copyright © Teradata Corporation - 2023. All Rights Reserved
        </div>
    </div>
</footer>