<header>
   <p  style='font-size:36px;font-family:Arial; color:#F0F0F0; background-color: #00233c; padding-left: 20pt; padding-top: 20pt;padding-bottom: 10pt; padding-right: 20pt;'>
       Telco Customer Churn using Traditional Approach
  <br>
       <img id="teradata-logo" src="https://storage.googleapis.com/clearscape_analytics_demo_data/DEMO_Logo/teradata.svg" alt="Teradata" style="width: 125px; height: auto; margin-top: 20pt;">
    </p>
</header>

<p style = 'font-size:20px;font-family:Arial'><b>Traditional Approach</b></p>
<p style = 'font-size:16px;font-family:Arial'>
    ClearScape Analytics provides powerful, flexible end-to-end data connectivity, feature engineering, model training, evaluation, and operational functions that can be deployed at scale as enterprise data assets, treating the products of ML and AI as first-class analytic processes in the enterprise. With ClearScape Analytics, data scientists can use their preferred language, tools, and platform to develop models that predict customer churn. Even in large-scale operations, Vantage can scale to meet users' needs and help reduce churn.</p>
<p style = 'font-size:16px;font-family:Arial'>Below are the steps involved in the traditional approach:</p>
<ul style = 'font-size:16px;font-family:Arial'>
    <li><b>Prepare data: </b>ClearScape Analytics offers highly optimized in-database functions for data preparation, minimizing data movement and enabling the enterprise feature store.</li>
    <li><b>Train models: </b>ClearScape Analytics provides vertical and horizontal scaling capabilities that make it possible to efficiently train any number of models — from a few to a few million.</li>
    <li><b>Deploy models: </b>ClearScape Analytics integrates model scoring with business data, both in real time and batch scoring, for effective operationalization and automated monitoring of AI models.</li>
    
</ul>

<p style = 'font-size:18px;font-family:Arial'>In this notebook, we will follow these steps:</p>
<ul style = 'font-size:16px;font-family:Arial'>
    <li>Data Collection</li>
    <li>Data Exploration</li>
    <li>Enterprise feature store</li>
    <li>Data Preparation using widgets</li>
    <li>Model Training (2-3 different models)</li>
    <li>Model Evaluation using ROC and Confusion Matrix</li>
    <li>Best performing model</li>
    <li>Model Scoring using best model</li>
    <li>Operationalize Model using ModelOps</li>
</ul>

<p style = 'font-size:18px;font-family:Arial'><b>Why Vantage?</b></p>
<p style = 'font-size:16px;font-family:Arial'>
Traditional ML and AI development and deployment pipelines require users to manually combine various tools and techniques across the lifecycle.  This leads to lengthy, fragile, manual, error-prone processes that are, in many cases, impossible to migrate out of the lab and into production in order to realize business value.<br>ClearScape Analytics helps to solve this “development to deployment gap” by providing highly scalable, performant, and easy-to-use analytic capabilities that address all aspects of the development lifecycle.  The same tools and techniques that data scientists use in development can be seamlessly deployed into production using the same code, platform, and operational pipeline.</p>

<p style = 'font-size:16px;font-family:Arial'>
Managing telco churn is complex and requires continuous monitoring, analysis, and proactive customer engagement strategies. By using data and advanced analytics, telecom companies can better understand customer behavior and preferences, and take proactive measures to retain customers and maintain profitability.</p>

<p style = 'font-size:16px;font-family:Arial'>
Let's demonstrate this use case with sample data using in-database (InDb) analytics in Vantage, which can pre-process and analyze huge amounts of data at scale.
</p>

<hr style="height:2px;border:none">
<p style = 'font-size:20px;font-family:Arial'><b>1. Connect to Vantage, Import Python Packages and Explore the Dataset</b></p>


<p style = 'font-size:16px;font-family:Arial'>In this section, we import the required libraries and set environment variables and environment paths (if required).</p>

In [None]:
%%capture
# '%%capture' suppresses the display of installation steps of the following packages
!pip install --upgrade teradataml

<div class="alert alert-block alert-info">
<p style = 'font-size:16px;font-family:Arial'><b>Note: </b><i>Please execute the above pip install to get the latest version of the required library. Be sure to restart the kernel after executing those lines to bring the installed libraries into memory. The simplest way to restart the Kernel is by typing zero zero: <b> 0 0</b></i></p>
</div>

In [None]:
#import libraries
import matplotlib.pyplot as plt 
import getpass
import warnings
warnings.filterwarnings('ignore')
warnings.simplefilter(action='ignore', category=DeprecationWarning)
warnings.simplefilter(action='ignore', category=RuntimeWarning)
warnings.simplefilter(action='ignore', category=FutureWarning)

from teradataml import *

import plotly.express as px
from plotly.subplots import make_subplots
import plotly.graph_objects as go


display.max_rows=5

<p style = 'font-size:16px;font-family:Arial'>We will be prompted to provide the password. We will enter the password, press the Enter key, and then use the down arrow to go to the next cell.</p>

In [None]:
%run -i ../../startup.ipynb
eng = create_context(host = 'host.docker.internal', username='demo_user', password = password)
print(eng)

In [None]:
%%capture
execute_sql('''SET query_band='DEMO=EE_Telco_Customer_Churn_Traditional_Approach.ipynb;' UPDATE FOR SESSION; ''')

<hr style="height:2px;border:none">
<p style = 'font-size:20px;font-family:Arial'><b>2. Getting Data for This Demo</b></p>
<p style = 'font-size:16px;font-family:Arial'>We have provided data for this demo on cloud storage. We have the option of either running the demo using foreign tables to access the data without using any storage on our environment or downloading the data to local storage, which may yield somewhat faster execution. However, we need to consider available storage. There are two statements in the following cell, and one is commented out. We may switch which mode we choose by changing the comment string.</p>

In [None]:
# %run -i ../../run_procedure.py "call get_data('DEMO_Telco_cloud');"
 # takes about 30 seconds, estimated space: 0 MB
%run -i ../../run_procedure.py "call get_data('DEMO_Telco_local');" 
# takes about 1 minute 30 seconds, estimated space: 4 MB

<p style = 'font-size:16px;font-family:Arial'>Optional step – We should execute the below step only if we want to see the status of databases/tables created and space used.</p>

In [None]:
%run -i ../../run_procedure.py "call space_report();"

<hr style="height:2px;border:none">
<p style = 'font-size:20px;font-family:Arial'><b>3. Data Exploration</b></p>

<p style = 'font-size:16px;font-family:Arial'>Let us start by creating a "Virtual DataFrame" that points directly to the dataset in Vantage. We then begin our analysis by checking the shape of the DataFrame and examining the data types of all its columns.</p>

In [None]:
tdf = DataFrame(in_schema("DEMO_Telco", "Customer_Churn"))
tdf

In [None]:
print("Shape of the data: ", tdf.shape)


<p style = 'font-size:16px;font-family:Arial'>As we can see from the result above, our dataset has 7043 rows and 21 columns.</p>

<p style = 'font-size:16px;font-family:Arial'><b>Summary of Columns</b></p>
<p style = 'font-size:16px;font-family:Arial;'>We can use the <b>ColumnSummary</b> function for quickly examining the columns, their datatypes, and summary of NULLs/non-NULLs for a given table. </p>  

In [None]:
from teradataml import ColumnSummary
obj = ColumnSummary(data=tdf,
                        target_columns=[':']
                       )

In [None]:
obj.result.head(21)
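
<p style = 'font-size:16px;font-family:Arial'>To make the idea of a NULL/non-NULL summary concrete, here is a minimal plain-Python sketch. It is not how <b>ColumnSummary</b> is implemented in the database; the helper name <code>column_summary</code>, the output keys, and the sample rows are all hypothetical and serve only as an illustration.</p>

```python
def column_summary(rows, columns):
    """Count NULL (None) and non-NULL values for each column."""
    summary = {}
    total = len(rows)
    for col in columns:
        nulls = sum(1 for row in rows if row.get(col) is None)
        summary[col] = {
            "non_null_count": total - nulls,
            "null_count": nulls,
            "null_percentage": 100.0 * nulls / total if total else 0.0,
        }
    return summary


# Made-up rows for illustration only; not taken from the demo table.
rows = [
    {"CustomerID": "A1", "TotalCharges": 29.85},
    {"CustomerID": "A2", "TotalCharges": None},
    {"CustomerID": "A3", "TotalCharges": 108.15},
    {"CustomerID": "A4", "TotalCharges": 1840.75},
]

summary = column_summary(rows, ["CustomerID", "TotalCharges"])
for name, stats in summary.items():
    print(name, stats)
```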

<hr style="height:2px;border:none">
<p style = 'font-size:20px;font-family:Arial'><b>4. Exploratory Data Analysis</b></p>

<p style = 'font-size:16px;font-family:Arial'>
Exploratory Data Analysis (EDA) is a process where we visually and statistically examine, analyze, and summarize data to comprehend its characteristics, patterns, and relationships. This approach is crucial for gaining insights and a deeper understanding of the dataset at hand.<br>First let us analyse the Gender and Churn distributions in our data.</p>

In [None]:
d1=tdf.select(['Gender','CustomerID']).groupby('Gender').count()
d1 = d1.assign(drop_columns=True,
          Gender=d1.Gender,
          Count=d1.count_CustomerID)
d1

In [None]:
d2=tdf.select(['Churn','CustomerID']).groupby('Churn').count()
d2 = d2.assign(drop_columns=True,
          Churn=d2.Churn,
          Count=d2.count_CustomerID)
d2

<p style = 'font-size:16px;font-family:Arial'>
We can see that the aggregated data is available to us in a teradataml DataFrame. Let's visualize this data to better understand the churn and gender distributions. ClearScape Analytics can easily integrate with third-party visualization tools like Tableau and Power BI, or with Python modules such as plotly and seaborn. We can do all the calculations and pre-processing in Vantage and pass only the necessary information to the visualization tools. This not only makes the calculation faster but also reduces the overall time, because less data moves between tools.</p>

In [None]:
d1=d1.to_pandas().reset_index()
d2=d2.to_pandas().reset_index()
#Gender and Churn percentage distribution
# Create subplots: use 'domain' type for Pie subplot
fig = make_subplots(rows=1, cols=2, specs=[[{'type':'domain'}, {'type':'domain'}]])
fig.add_trace(go.Pie(labels=d1['Gender'], values=d1['Count'], name="Gender"),
              1, 1)
fig.add_trace(go.Pie(labels=d2['Churn'], values=d2['Count'], name="Churn"),
              1, 2)

# Use `hole` to create a donut-like pie chart
fig.update_traces(hole=.4, hoverinfo="label+percent+name", textfont_size=16)

fig.update_layout(
    title_text="Gender and Churn Distributions",
    # Add annotations in the center of the donut pies.
    annotations=[dict(text='Gender', x=0.16, y=0.5, font_size=20, showarrow=False),
                 dict(text='Churn', x=0.84, y=0.5, font_size=20, showarrow=False)])
fig.show()

<p style = 'font-size:16px;font-family:Arial'>From the above plot we can see that 26.6 % of customers switched to another firm.<br>And of total customers 49.5 % are female and 50.5 % are male.</p>

<p style = 'font-size:16px;font-family:Arial'>Now, let us look at churn with respect to gender.</p>

In [None]:
d3=tdf.select(['Churn','Gender','CustomerID']).groupby(['Churn','Gender']).count()
d3 = d3.assign(drop_columns=True,
          Churn=d3.Churn,
          Gender=d3.Gender,     
          Count=d3.count_CustomerID)
d3

In [None]:
d3=d3.to_pandas().reset_index()
fig2=px.sunburst(d3,path=['Churn','Gender'],values='Count')
fig2.update_layout(
    title_text="Churn Distribution w.r.t Gender")
fig2.show()

<p style = 'font-size:16px;font-family:Arial'>We can see that there is a negligible difference between genders in the number of customers who changed service provider. Both genders behaved similarly when it comes to migrating to another service provider.</p>

In [None]:
d4=tdf.select(['Churn','Contract','CustomerID']).groupby(['Churn','Contract']).count()
d4 = d4.assign(drop_columns=True,
          Churn=d4.Churn,
          Contract=d4.Contract,     
          Count=d4.count_CustomerID)
d4

In [None]:
d4=d4.to_pandas().reset_index()
fig4 = px.bar(d4,x="Churn",y="Count", color="Contract", barmode="group", title="<b>Customer contract distribution</b>")
fig4.update_layout(width=700, height=500, bargap=0.1)
fig4.show()

<p style = 'font-size:16px;font-family:Arial'>We can see that about 75% of customers with a Month-to-Month contract opted to move out, compared to 13% of customers with a One Year contract and 3% with a Two Year contract.</p>
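
<p style = 'font-size:16px;font-family:Arial'>When reading percentages off a grouped chart like this one, it helps to distinguish the churn rate <i>within</i> a contract type from the share of <i>all churners</i> that a contract type represents. The sketch below shows both calculations in plain Python. The counts are made up for illustration (they are not the demo data), and the helper names are hypothetical.</p>

```python
# Hypothetical counts per (Contract, Churn) pair; NOT taken from the demo data.
counts = {
    ("Month-to-month", "Yes"): 40,
    ("Month-to-month", "No"): 60,
    ("One year", "Yes"): 5,
    ("One year", "No"): 45,
    ("Two year", "Yes"): 5,
    ("Two year", "No"): 95,
}


def churn_rate_by_contract(counts):
    """Fraction of customers within each contract type who churned."""
    totals = {}
    for (contract, _churn), n in counts.items():
        totals[contract] = totals.get(contract, 0) + n
    return {c: counts.get((c, "Yes"), 0) / totals[c] for c in totals}


def share_of_churners(counts):
    """Fraction of all churned customers that falls in each contract type."""
    churned = {c: n for (c, churn), n in counts.items() if churn == "Yes"}
    total = sum(churned.values())
    return {c: n / total for c, n in churned.items()}


rates = churn_rate_by_contract(counts)
shares = share_of_churners(counts)
print("Churn rate within contract type:", rates)
print("Share of all churners:", shares)
```

<p style = 'font-size:16px;font-family:Arial'>With these made-up counts, the two measures differ substantially for the same contract type, which is why it is worth stating which one a quoted percentage refers to.</p>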

In [None]:
d5=tdf.select(['PaymentMethod','CustomerID']).groupby('PaymentMethod').count()
d5 = d5.assign(drop_columns=True,
          PaymentMethod=d5.PaymentMethod,
          Count=d5.count_CustomerID)
d5

In [None]:
d5=d5.to_pandas().reset_index()
fig5 = go.Figure(data=[go.Pie(labels=d5['PaymentMethod'], values=d5['Count'], hole=.3)])
fig5.update_layout(title_text="<b>Payment Method Distribution</b>")
fig5.show()

In [None]:
d6=tdf.select(['Churn','PaymentMethod','CustomerID']).groupby(['Churn','PaymentMethod']).count()
d6 = d6.assign(drop_columns=True,
          Churn=d6.Churn,
          PaymentMethod=d6.PaymentMethod,     
          Count=d6.count_CustomerID)
d6

In [None]:
d6=d6.to_pandas().reset_index()
fig6 = px.bar(d6,x="Churn",y="Count", color="PaymentMethod", barmode="stack", title="<b>Customer Payment Method distribution w.r.t. Churn</b>")
fig6.update_layout(width=700, height=500, bargap=0.1)
fig6.show()

<p style = 'font-size:16px;font-family:Arial'>Most of the customers who moved out used Electronic Check as their payment method.
<br>Customers who chose Credit Card (automatic), Bank Transfer (automatic), or Mailed Check as their payment method were less likely to move out.</p>

In [None]:
d7=tdf.select(['Churn','InternetService','Gender','CustomerID']).groupby(['Churn','InternetService','Gender']).count()
d7 = d7.assign(drop_columns=True,
          Churn=d7.Churn,
          InternetService=d7.InternetService, 
          Gender=d7.Gender,
          Count=d7.count_CustomerID)
d7

In [None]:
d7.sort(["InternetService"]).head(21)

In [None]:
d7=d7.to_pandas().reset_index()
fig7 = go.Figure()

for t in d7['Churn'].unique():
    dfp = d7[d7['Churn']==t]
    fig7.add_traces(go.Bar(x=[dfp['InternetService'], dfp['Gender']],
                          y=dfp['Count'],
                          width=0.75,
                          customdata=dfp['Churn'],  # use the filtered rows so customdata lines up with x and y
                          name='Churn :' +str(dfp['Churn'].values[0]) 
                         )
                  )

fig7.update_layout(barmode='stack',
                  title_text="<b>Churn Distribution w.r.t. Internet Service and Gender</b>")
fig7.show()

<p style = 'font-size:16px;font-family:Arial'>We can see that more customers choose Fiber optic service than DSL, but it is also evident that customers who use Fiber optic have a high churn rate. This might suggest dissatisfaction with this type of internet service.
<br>Customers with DSL service have a lower churn rate than those with Fiber optic service.</p>

In [None]:
d8=tdf.select(['Churn','Dependents','CustomerID']).groupby(['Churn','Dependents']).count()
d8 = d8.assign(drop_columns=True,
          Churn=d8.Churn,
          Dependents=d8.Dependents,
          Count=d8.count_CustomerID)
d8

In [None]:
d8=d8.to_pandas().reset_index()
color_map = {"Yes": "#FF97FF", "No": "#AB63FA"}
fig8 = px.bar(d8, x="Churn",y="Count", color="Dependents", barmode="group", title="<b>Dependents distribution</b>", color_discrete_map=color_map)
fig8.update_layout(width=700, height=500, bargap=0.1)
fig8.show()

<p style = 'font-size:16px;font-family:Arial'>Customers without dependents are more likely to churn.</p>

In [None]:
d9=tdf.select(['Churn','Partner','CustomerID']).groupby(['Churn','Partner']).count()
d9 = d9.assign(drop_columns=True,
          Churn=d9.Churn,
          Partner=d9.Partner,
          Count=d9.count_CustomerID)
d9

In [None]:
d9=d9.to_pandas().reset_index()
color_map = {"Yes": '#FFA15A', "No": '#00CC96'}
fig9 = px.bar(d9, x="Churn",y="Count", color="Partner", barmode="group", title="<b>Churn distribution w.r.t. Partners</b>", color_discrete_map=color_map)
fig9.update_layout(width=700, height=500, bargap=0.1)
fig9.show()

<p style = 'font-size:16px;font-family:Arial'>Customers that don't have partners are more likely to churn.</p>

In [None]:
d10=tdf.select(['Churn','PaperlessBilling','CustomerID']).groupby(['Churn','PaperlessBilling']).count()
d10 = d10.assign(drop_columns=True,
          Churn=d10.Churn,
          PaperlessBilling=d10.PaperlessBilling,
          Count=d10.count_CustomerID)
d10

In [None]:
d10=d10.to_pandas().reset_index()
color_map = {"Yes": '#FFA15A', "No": '#00CC96'}
fig10 = px.bar(d10, x="Churn",y="Count", color="PaperlessBilling",  title="<b>Churn distribution w.r.t. Paperless Billing</b>", color_discrete_map=color_map)
fig10.update_layout(width=700, height=500, bargap=0.1)
fig10.show()

<p style = 'font-size:16px;font-family:Arial'>Customers with Paperless Billing are more likely to churn.</p>

<hr style="height:2px;border:none">
<b style = 'font-size:20px;font-family:Arial'>5. Feature Engineering</b>

<p style='font-size:16px;font-family:Arial'>Teradata Enterprise Feature Store (EFS) functions are designed to handle feature management within the Vantage environment. While inspired by the syntax of Feast, the EFS functions are tailored specifically to Teradata Vantage, offering efficiency and robustness in data management and feature handling. Unlike Feast, which works with pandas DataFrames, the EFS functions use Teradata DataFrames for feature management, so we avoid extracting data to create or use features from the EFS. The EFS functions are designed to give data science teams effective, streamlined feature management. This notebook walks through the capabilities of the EFS functions and demonstrates how they integrate with our data models and processes.</p>

<hr style="height:1px;border:none">
<b style = 'font-size:18px;font-family:Arial'>5.1 Setup a Feature Store Repository</b>

<p style='font-size:16px;font-family:Arial'>The Enterprise Feature Store (EFS) SDK is designed with a totally object-oriented approach, focusing on intuitive interaction with feature stores. Central to this design are several core objects: Feature, Entity, DataSource, FeatureGroup. Together, these objects facilitate the efficient management and utilization of features within your data ecosystem, leveraging Teradata Vantage for metadata storage.</p>
<p style='font-size:16px;font-family:Arial'>A feature store repository serves as the foundational environment for storing and managing your data features. The owner of the FeatureStore can grant or revoke read-only, write-only, or read-and-write authorization for other users.</p>

In [None]:
telco_fs = FeatureStore(repo='TelcoFS')
telco_fs.setup(perm_size='10e8')

In [None]:
# List whether FeatureStore is setup or not.
telco_fs.list_repos()

<hr style="height:1px;border:none">
<p style = 'font-size:18px;font-family:Arial'><b>5.2 Create and Register Entity </b></p>

<p style = 'font-size:16px;font-family:Arial'>Let us now start with feature engineering. We will create the required columns in the DataFrame and then register those columns as features in a feature group of the feature store created in the step above.</p>

In [None]:
tdf = DataFrame(in_schema("DEMO_Telco", "Customer_Churn"))
tdf

<p style = 'font-size:16px;font-family:Arial'>This code performs the following operations:</p>
    <ol style = 'font-size:16px;font-family:Arial'>
        <li><strong>Assigning New Values:</strong> The <code>tdf.assign()</code> function is used to create new columns or modify existing ones; the result is stored in the DataFrame <code>df</code>.</li>
        <li><strong>Replacing Values:</strong>
            <ul>
                <li><span class="highlight">MultipleLines</span>: Replaces "No phone service" with "No".</li>
                <li><span class="highlight">OnlineSecurity, OnlineBackup, DeviceProtection, TechSupport, StreamingTV, StreamingMovies</span>: Replaces "No internet service" with "No" for each of these columns.</li>
            </ul>
        </li>
        <li><strong>Converting Churn Values:</strong>
            <ul>
                <li><span class="highlight">Churn</span>: Uses the <code>case</code> function to convert "Yes" to 1 and "No" to 0. If the value is neither "Yes" nor "No", it defaults to 0.</li>
            </ul>
        </li>
        <li><strong>Displaying the DataFrame:</strong> The final <code>df</code> statement displays the modified DataFrame.</li>
    </ol>
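
<p style = 'font-size:16px;font-family:Arial'>The replacement and <code>case</code> logic listed above can be pictured on a single record in plain Python. This is only an illustrative sketch, assuming exact-match replacement; the helper name <code>clean_record</code> and the sample record are hypothetical, and the actual processing runs in-database through teradataml.</p>

```python
# Columns whose "No internet service" placeholder is collapsed to "No".
INTERNET_COLUMNS = [
    "OnlineSecurity", "OnlineBackup", "DeviceProtection",
    "TechSupport", "StreamingTV", "StreamingMovies",
]


def clean_record(record):
    """Return a copy with service placeholders set to 'No' and Churn mapped to 0/1."""
    cleaned = dict(record)
    if cleaned.get("MultipleLines") == "No phone service":
        cleaned["MultipleLines"] = "No"
    for col in INTERNET_COLUMNS:
        if cleaned.get(col) == "No internet service":
            cleaned[col] = "No"
    # "Yes" maps to 1; "No" and any unexpected value map to 0 (the else_ default).
    cleaned["Churn"] = 1 if cleaned.get("Churn") == "Yes" else 0
    return cleaned


# Made-up record for illustration only.
record = {
    "MultipleLines": "No phone service",
    "OnlineSecurity": "No internet service",
    "StreamingTV": "Yes",
    "Churn": "Yes",
}
cleaned = clean_record(record)
print(cleaned)
```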

In [None]:
df = tdf.assign(
    MultipleLines = tdf.MultipleLines.replace("No phone service","No"),
    OnlineSecurity = tdf.OnlineSecurity.replace("No internet service","No"),
    OnlineBackup = tdf.OnlineBackup.replace("No internet service","No"),
    DeviceProtection = tdf.DeviceProtection.replace("No internet service","No"),
    TechSupport = tdf.TechSupport.replace("No internet service","No"),
    StreamingTV = tdf.StreamingTV.replace("No internet service","No"),
    StreamingMovies = tdf.StreamingMovies.replace("No internet service","No"),
    Churn = case({ "Yes" : 1, "No" : 0}, value=tdf.Churn,else_=0)
)

df

In [None]:
df = ConvertTo(
    data=df,
    target_columns=['CustomerID', 'Gender', 'Partner', 'Dependents', 'PhoneService',
                    'MultipleLines', 'InternetService','OnlineSecurity', 'OnlineBackup',
                    'DeviceProtection', 'TechSupport', 'StreamingTV', 'StreamingMovies',
                    'Contract', 'PaperlessBilling', 'PaymentMethod'],
    target_datatype=["VARCHAR(charlen=10,charset=UNICODE,casespecific=NO)"]
).result

<p style = 'font-size:16px;font-family:Arial'>Let's store the transformed data in a table.</p>

In [None]:
copy_to_sql(
    df=df,
    table_name='transformed_data',
    if_exists='replace'
)

<p style = 'font-size:16px;font-family:Arial'>Now we will save the features, as well as the feature processing logic, in the feature store.</p>
<p style = 'font-size:16px;font-family:Arial'>This will allow us to reuse the features and processing later on, so we do not have to rewrite the processing logic.</p>

In [None]:
df = DataFrame('transformed_data')

In [None]:
# Create entity for DataFrame 'df' (table 'transformed_data')
entity=Entity(name='CustId', columns=df.CustomerID)

In [None]:
# Register the Entity.
telco_fs.apply(entity)

In [None]:
# Look at existing Entities after registering the Entity.
telco_fs.list_entities()

<hr style="height:1px;border:none">
<p style = 'font-size:18px;font-family:Arial'><b>5.3 Create and Register FeatureGroup </b></p>
<ul style = 'font-size:16px;font-family:Arial'>
<li>A FeatureGroup can be created using a Teradata DataFrame.</li>
<li>A FeatureGroup can be created using a SQL query.</li>
<li>A FeatureGroup can be created using objects of Feature, Entity, and DataSource.</li>
</ul>


<p style = 'font-size:16px;font-family:Arial'><b>Creating a FeatureGroup from Teradata DataFrame
</b></p>

In [None]:
telco_fg = FeatureGroup.from_DataFrame(
    name='TelcoFG', 
    entity_columns='CustomerID', 
    df=df
)

In [None]:
# Let's look at Properties.
telco_fg.features, telco_fg.entity, telco_fg.data_source, telco_fg.description

In [None]:
telco_fs.apply(telco_fg)

In [None]:
telco_fs.list_features()

<hr style="height:1px;border:none">
<b style = 'font-size:18px;font-family:Arial'>5.4 Reuse features from Enterprise Feature Store with teradataml analytic functions for AutoML processing.</b>


<p style = 'font-size:16px;font-family:Arial'>Since the FeatureStore also stores the DataSource, we can retrieve a Teradata DataFrame from the FeatureStore. <br> <code>FeatureStore.get_dataset()</code> gets a Teradata DataFrame from a FeatureGroup.</p>

In [None]:
# Get DataSet for FeatureGroup TelcoFG. 
tdf = telco_fs.get_dataset('TelcoFG')
tdf

<hr style="height:2px;border:none">
<p style = 'font-size:20px;font-family:Arial'><b>6. Data Preprocessing</b></p>

 <p style = 'font-size:16px;font-family:Arial'>Before the data can be used for model creation, we need to do some data cleansing and transformation on it. We can do this in-database with Teradata Vantage's built-in functions.<br>We will use the <b>CategoricalSummary</b> function to show the distinct values and their corresponding counts for each specified column in the input DataFrame. This function provides a concise summary of categorical data, which helps us quickly understand the distribution of values within the specified columns.</p>

In [None]:
from teradataml import CategoricalSummary
CatSum = CategoricalSummary(data=tdf,target_columns=["MultipleLines","InternetService","OnlineSecurity","OnlineBackup","DeviceProtection","TechSupport","StreamingTV","StreamingMovies"])
CatSum.result.sort("ColumnName")
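
<p style = 'font-size:16px;font-family:Arial'>Conceptually, a categorical summary is a count of each distinct value per column. The plain-Python sketch below illustrates that idea with <code>collections.Counter</code>; the helper name <code>categorical_summary</code> and the sample rows are hypothetical, and this is not the in-database implementation.</p>

```python
from collections import Counter


def categorical_summary(rows, target_columns):
    """Return (column, distinct value, count) tuples, sorted by column then value."""
    summary = []
    for col in sorted(target_columns):
        value_counts = Counter(row[col] for row in rows)
        for value, n in sorted(value_counts.items()):
            summary.append((col, value, n))
    return summary


# Made-up rows for illustration only.
rows = [
    {"InternetService": "DSL", "OnlineSecurity": "Yes"},
    {"InternetService": "No", "OnlineSecurity": "No internet service"},
    {"InternetService": "Fiber optic", "OnlineSecurity": "No"},
    {"InternetService": "DSL", "OnlineSecurity": "No"},
]

result = categorical_summary(rows, ["OnlineSecurity", "InternetService"])
for column_name, value, count in result:
    print(column_name, value, count)
```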

<p style = 'font-size:16px;font-family:Arial'>
As we can see from the sample data above and the categorical summary values, the columns </p>
<ul style = 'font-size:16px;font-family:Arial'><li>OnlineSecurity </li>  
<li>OnlineBackup</li>     
<li>DeviceProtection</li> 
<li>TechSupport</li>      
<li>StreamingTV</li>      
<li>StreamingMovies</li>
</ul><p style = 'font-size:16px;font-family:Arial'>are related to InternetService: wherever the InternetService value is "No", these columns have the value "No internet service". For our model, let us replace "No internet service" with "No" in these columns. We will do a similar operation to replace "No phone service" with "No" in MultipleLines.<br>We will use SQLAlchemy's <code>func</code> construct to call the database's <code>OREPLACE</code> function, which replaces the respective strings with the desired values.</p>

In [None]:
from sqlalchemy import func


tdf = tdf.assign(oreplace_MultipleLines=func.oreplace(tdf.MultipleLines.expression, "No phone service","No"),
                oreplace_OnlineSecurity=func.oreplace(tdf.OnlineSecurity.expression, "No internet service","No"),
                oreplace_OnlineBackup=func.oreplace(tdf.OnlineBackup.expression, "No internet service","No"),
                oreplace_DeviceProtection=func.oreplace(tdf.DeviceProtection.expression, "No internet service","No"),
                oreplace_TechSupport=func.oreplace(tdf.TechSupport.expression, "No internet service","No"),
                oreplace_StreamingTV=func.oreplace(tdf.StreamingTV.expression, "No internet service","No"),
                oreplace_StreamingMovies=func.oreplace(tdf.StreamingMovies.expression, "No internet service","No"))
tdf

In [None]:
# now lets drop the extra columns, rename the columns in dataframe
from teradataml.dataframe.sql_functions import case

tdf2 = tdf.assign(drop_columns=True
                ,CustomerID=tdf.CustomerID  
                ,Gender=tdf.Gender 
                ,SeniorCitizen=tdf.SeniorCitizen
                ,Partner=case([(tdf.Partner == 'Yes', 1)], else_ = 0)
                ,Dependents=case([(tdf.Dependents == 'Yes', 1)], else_ = 0)
                ,Tenure=tdf.Tenure
                ,PhoneService=case([(tdf.PhoneService == 'Yes', 1)], else_ = 0)    
                ,MultipleLines=case([(tdf.oreplace_MultipleLines == 'Yes', 1)], else_ = 0)     
                ,InternetService=tdf.InternetService     
                ,OnlineSecurity=case([(tdf.oreplace_OnlineSecurity == 'Yes', 1)], else_ = 0)      
                ,OnlineBackup=case([(tdf.oreplace_OnlineBackup == 'Yes', 1)], else_ = 0)        
                ,DeviceProtection=case([(tdf.oreplace_DeviceProtection == 'Yes', 1)], else_ = 0)    
                ,TechSupport=case([(tdf.oreplace_TechSupport == 'Yes', 1)], else_ = 0)         
                ,StreamingTV=case([(tdf.oreplace_StreamingTV == 'Yes', 1)], else_ = 0)         
                ,StreamingMovies=case([(tdf.oreplace_StreamingMovies == 'Yes', 1)], else_ = 0)     
                ,Contract=tdf.Contract            
                ,PaperlessBilling=case([(tdf.PaperlessBilling == 'Yes', 1)], else_ = 0)    
                ,PaymentMethod=tdf.PaymentMethod       
                ,MonthlyCharges=tdf.MonthlyCharges      
                ,TotalCharges=tdf.TotalCharges        
                ,Churn = tdf.Churn 
                 ) 

In [None]:
tdf2

In [None]:
copy_to_sql(tdf2, table_name = 'transform_data2', if_exists='replace')

<hr style="height:1px;border:none">
<p style = 'font-size:16px;font-family:Arial'><b>6.1 Ordinal encoding</b></p> 
<p style = 'font-size:16px;font-family:Arial'>From our categorical attributes we can see that there are limited distinct values in each of these columns. We will use Teradata's <b>OrdinalEncodingFit and Transform</b> functions to convert the categorical attributes to numerical.</p>

<p style = 'font-size:16px;font-family:Arial'>
The categorical columns  </p>
<ul style = 'font-size:16px;font-family:Arial'>
<li>InternetService </li>  
<li>Contract</li>
<li>Gender</li>
<li>PaymentMethod</li>      
</ul><p style = 'font-size:16px;font-family:Arial'>still hold text values, so we apply ordinal encoding to them.</p>
      
         

In [None]:
tdf2 = DataFrame('transform_data2')

In [None]:
ordinalfit_df = OrdinalEncodingFit(target_column=['InternetService','Contract','PaymentMethod','Gender'],
                                   default_value=-1,
                                   data=tdf2)

In [None]:
ordinalfit_df.result
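
<p style = 'font-size:16px;font-family:Arial'>As an illustration of what a fit/transform pair for ordinal encoding does, here is a minimal plain-Python sketch. It assumes categories are numbered in sorted order starting at 0, which may differ from the ordering the in-database function produces; unseen categories fall back to a default of -1, mirroring the <code>default_value</code> argument above. The helper names are hypothetical.</p>

```python
def ordinal_fit(values, start=0):
    """Map each distinct category to an integer, assigned in sorted order."""
    return {cat: i for i, cat in enumerate(sorted(set(values)), start=start)}


def ordinal_transform(values, mapping, default_value=-1):
    """Encode values with a fitted mapping; unseen categories get default_value."""
    return [mapping.get(v, default_value) for v in values]


# Made-up sample of the Contract column.
contracts = ["Month-to-month", "Two year", "One year", "Month-to-month"]
mapping = ordinal_fit(contracts)

# "Weekly" was not seen during fit, so it is encoded as the default value.
encoded = ordinal_transform(contracts + ["Weekly"], mapping)
print(mapping)
print(encoded)
```

<p style = 'font-size:16px;font-family:Arial'>Keeping fit and transform separate means the mapping learned on one dataset can be applied unchanged to new data at scoring time.</p>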

<hr style="height:1px;border:none">
<p style = 'font-size:16px;font-family:Arial'><b>6.2 Scale the numerical values using widgets</b></p>
<p style = 'font-size:16px;font-family:Arial'>For the numerical attributes we will use the <b>ScaleFit and ScaleTransform</b> functions to scale the specified input columns using a chosen scale method, such as standard deviation, mean, or range.</p>

In [None]:
scalefit_df = ScaleFit(data=tdf2,
                       target_columns=['MonthlyCharges','TotalCharges'],
                       scale_method="RANGE",
                       miss_value="KEEP",
                       global_scale=False)
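
<p style = 'font-size:16px;font-family:Arial'>As a rough illustration of range scaling, the sketch below rescales values to the interval [0, 1] using (x - min) / (max - min) and leaves missing values untouched, similar in spirit to <code>miss_value="KEEP"</code>. It is a plain-Python sketch under those assumptions, not the in-database implementation; the helper names, the sample values, and the handling of a constant column are hypothetical choices.</p>

```python
def range_fit(values):
    """Return (min, max) of the non-missing values."""
    present = [v for v in values if v is not None]
    return min(present), max(present)


def range_transform(values, lo, hi):
    """Scale values to [0, 1]; missing values (None) are kept as None."""
    span = hi - lo
    scaled = []
    for v in values:
        if v is None:
            scaled.append(None)
        elif span == 0:
            # Assumption for this sketch: a constant column scales to 0.0.
            scaled.append(0.0)
        else:
            scaled.append((v - lo) / span)
    return scaled


# Made-up monthly charges for illustration only.
charges = [20.0, 70.0, None, 120.0]
lo, hi = range_fit(charges)
scaled = range_transform(charges, lo, hi)
print(scaled)
```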

<p style = 'font-size:16px;font-family:Arial'><b>Putting it all together</b></p><p style = 'font-size:16px;font-family:Arial'>We will use the <b>ColumnTransformer</b> function to apply all the transformations from the fit tables created above in one go.</p>

In [None]:
ColumnTransformer_out = ColumnTransformer(fillrowid_column_name="output_value",
                                              input_data=tdf2,
                                              # onehotencoding_fit_data=onehotfit_df.result,
                                              ordinalencoding_fit_data=ordinalfit_df.result,
                                              scale_fit_data=scalefit_df.output)
                                              

In [None]:
ColumnTransformer_out.result

In [None]:
Transformed_data= ColumnTransformer_out.result.assign(drop_columns=True,
                   CustomerID=ColumnTransformer_out.result.CustomerID,
                   SeniorCitizen=ColumnTransformer_out.result.SeniorCitizen,
                   Tenure=ColumnTransformer_out.result.Tenure,
                   InternetService=ColumnTransformer_out.result.InternetService,
                   Contract=ColumnTransformer_out.result.Contract,
                   PaperlessBilling=ColumnTransformer_out.result.PaperlessBilling,
                   PaymentMethod=ColumnTransformer_out.result.PaymentMethod,
                   MonthlyCharges=ColumnTransformer_out.result.MonthlyCharges,
                   TotalCharges=ColumnTransformer_out.result.TotalCharges,
                   Gender=ColumnTransformer_out.result.Gender,
                   Partner=ColumnTransformer_out.result.Partner,
                   Dependents=ColumnTransformer_out.result.Dependents,
                   PhoneService=ColumnTransformer_out.result.PhoneService,
                   MultipleLines=ColumnTransformer_out.result.MultipleLines,
                   OnlineSecurity=ColumnTransformer_out.result.OnlineSecurity,
                   OnlineBackup=ColumnTransformer_out.result.OnlineBackup,
                   DeviceProtection=ColumnTransformer_out.result.DeviceProtection,
                   TechSupport=ColumnTransformer_out.result.TechSupport,
                   StreamingTV=ColumnTransformer_out.result.StreamingTV,
                   StreamingMovies=ColumnTransformer_out.result.StreamingMovies,
                   Churn=ColumnTransformer_out.result.Churn)                                         
                                                      

In [None]:
Transformed_data

In [None]:
Transformed_data.shape

<p style = 'font-size:16px;font-family:Arial'>We can see from above how our data is transformed from the original values.</p>

In [None]:
# Copying the intermediate table to database
Transformed_data.to_sql("Transformed_data",primary_index = "CustomerID", if_exists = "replace")

<hr style="height:1px;border:none">
<p style = 'font-size:16px;font-family:Arial'><b>6.3 Create train and test data</b></p>
<p style = 'font-size:16px;font-family:Arial'>Now that the data is transformed and ready for use in machine learning models, let us split the whole dataset into train and test sets for model training and scoring. We will use the <b>TrainTestSplit</b> function for this task.</p>
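<p style = 'font-size:16px;font-family:Arial'>To make the idea of a seeded split concrete, here is a minimal plain-Python sketch. It only illustrates the concept and is not how <b>TrainTestSplit</b> works inside the database; the helper name <code>split_ids</code> and the sample IDs are hypothetical.</p>

```python
import random


def split_ids(ids, train_size=0.75, seed=21):
    """Shuffle IDs reproducibly and cut them into train and test lists."""
    rng = random.Random(seed)   # a fixed seed gives the same split on every run
    shuffled = sorted(ids)      # sort first so the input order does not matter
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_size)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical customer IDs 1..100
train_ids, test_ids = split_ids(range(1, 101))
print(len(train_ids), len(test_ids))
```

<p style = 'font-size:16px;font-family:Arial'>Because the seed is fixed, rerunning the split returns the same rows in each set, which keeps model comparisons repeatable.</p>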

In [None]:
TrainTestSplit_out = TrainTestSplit(
                                    data = DataFrame('Transformed_data'),
                                    id_column = "CustomerID",
                                    train_size = 0.75,
                                    test_size = 0.25,
                                    seed = 21
)

In [None]:
# Split into 2 virtual dataframes
df_train = TrainTestSplit_out.result[TrainTestSplit_out.result['TD_IsTrainRow'] == 1].drop(['TD_IsTrainRow'], axis = 1)
df_test = TrainTestSplit_out.result[TrainTestSplit_out.result['TD_IsTrainRow'] == 0].drop(['TD_IsTrainRow'], axis = 1)

<p style = 'font-size:16px;font-family:Arial'>With preprocessing complete and the training and test datasets created, let's now build some predictive models.</p>

<hr style="height:2px;border:none">
<p style = 'font-size:20px;font-family:Arial'><b>7. InDb Model Training and Scoring</b></p>

<p style = 'font-size:18px;font-family:Arial'><b>7.1 Logistic Regression</b></p>

<p style = 'font-size:16px;font-family:Arial'>For our first model we will use logistic regression.<br>
  <b>Logistic regression</b> is a statistical algorithm used for binary classification problems. It is a supervised learning algorithm that predicts the probability of an input belonging to a certain class (e.g., positive or negative) based on its features.<br>Logistic regression models the relationship between the input features and the class probability using a logistic function. The logistic function maps a weighted combination of the input feature values onto a scale between 0 and 1, which represents the probability of belonging to the positive class.<br>
    The <b>GLM</b> function fits a generalized linear model (GLM) and performs regression and classification analysis on data sets.
<br>Please refer to <a href ='https://docs.teradata.com/r/Enterprise/Teradata-Package-for-Python-Function-Reference-17.20/teradataml-Analytic-Database-SQL-Engine-Analytic-Functions/Supported-on-Database-Version-17.20.xx/MODEL-TRAINING-functions/GLM'>GLM</a> for function elements and output.</p>
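<p style = 'font-size:16px;font-family:Arial'>The mapping from features to a probability can be sketched in a few lines of plain Python. This is only an illustration of the logistic function; the weights, bias, and feature values below are hypothetical and are not taken from the trained GLM model.</p>

```python
import math


def sigmoid(z):
    """Logistic function: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(features, weights, bias):
    """Probability of the positive class for one row of numeric features."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


# Hypothetical coefficients for two scaled features
weights = [0.8, -1.2]
bias = -0.5
p_churn = predict_proba([1.0, 0.5], weights, bias)
print(round(p_churn, 4))
```

<p style = 'font-size:16px;font-family:Arial'>Training a logistic regression amounts to finding the weights and bias that make these probabilities agree with the observed labels as closely as possible.</p>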

In [None]:
df_train.shape

In [None]:
from teradataml import GLM, TDGLMPredict

glm_model = GLM(data = df_train,
                input_columns = ['1:15','17:20'], 
                response_column = 'Churn',
                family = 'Binomial')

In [None]:
glm_model.result

<p style = 'font-size:16px;font-family:Arial'>We have created our model; let's now make predictions on the test dataset.</p>

In [None]:
glm_prediction = TDGLMPredict(newdata = df_test,
                           id_column = 'CustomerID',
                           object = glm_model.result,
                           accumulate = 'Churn',
                           output_prob=True,
                           output_responses = ['0', '1'])

In [None]:
out_glm = glm_prediction.result.assign(prediction = glm_prediction.result.prediction.cast(type_ = BYTEINT))
out_glm = out_glm.assign(prediction = out_glm.prediction.cast(type_ = VARCHAR(2)))
out_glm = out_glm.assign(Churn = out_glm.Churn.cast(type_ = VARCHAR(2)))
out_glm

<p style = 'font-size:16px;font-family:Arial'>The output above shows prob_1, the probability that the customer will churn, and prob_0, the probability that the customer will not churn. The prediction column gives the class label derived from these probabilities.</p>

<hr style="height:1px;border:none">
<p style = 'font-size:18px;font-family:Arial'><b>7.2 Evaluation of Logistic Regression Model</b></p>
<p style = 'font-size:16px;font-family:Arial'>We will use the <b>ClassificationEvaluator</b> function to evaluate the trained GLM model on the test data. This tells us how well the model performs on unseen data.</p>

In [None]:
ClassificationEvaluator_glm = ClassificationEvaluator(
                                                        data = out_glm,
                                                        observation_column = 'Churn',
                                                        prediction_column = 'prediction',
                                                        labels = ['0', '1']
)
ClassificationEvaluator_glm.output_data.head(10)

<p style = 'font-size:16px;font-family:Arial'>The above output shows the precision, recall, and F1-score values computed from the confusion matrix.</p>
<table style = 'font-size:16px;font-family:Arial'>
  <tr>
    <th>Column</th>
    <th>Description</th>
  </tr>
  <tr>
    <td>Precision</td>
    <td>The positive predictive value. Refers to the fraction of relevant instances among
the total retrieved instances.
        Precision answers the following question: what proportion of predicted Positives is truly Positive? 
        Precision = (TP)/(TP+FP)</td>
  </tr>
  <tr>
    <td>Recall</td>
    <td>Refers to the fraction of relevant instances retrieved over the total amount of
relevant instances. Recall answers a different question: what proportion of actual Positives is correctly classified?
Recall = (TP)/(TP+FN)</td>
  </tr>
  <tr>
    <td>F1</td>
    <td>F1 score, defined as the harmonic mean of precision and recall; it is a number between 0 and 1. The F1 score balances the precision and recall of your classifier.
        F1 = 2*(precision*recall)/(precision+recall)</td>
  </tr>
  <tr>
    <td>Support</td>
    <td>The number of times a label appears in the Observation Column.</td>
  </tr>
</table>
<p style = 'font-size:16px;font-family:Arial'>**TP: True Positive, FP: False Positive, TN: True Negative, FN: False Negative</p>
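<p style = 'font-size:16px;font-family:Arial'>The definitions in the table can be checked by hand on a small example. The sketch below is plain Python and independent of <b>ClassificationEvaluator</b>; the function name and the label lists are made up for illustration.</p>

```python
def classification_metrics(actual, predicted, positive="1"):
    """Return (precision, recall, f1) for the given positive label."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical labels: 3 true positives, 1 false positive, 1 false negative
actual = ["1", "1", "1", "0", "0", "0", "0", "1"]
predicted = ["1", "0", "1", "0", "1", "0", "0", "1"]
precision, recall, f1 = classification_metrics(actual, predicted)
print(precision, recall, f1)
```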

<p style = 'font-size:16px;font-family:Arial'>We can also calculate the mean absolute error and the AUC (Area Under the Curve) of the Receiver Operating Characteristic (ROC) curve.<br>Mean absolute error is the sum of the absolute differences between actual and predicted values, divided by the number of observations.</p>
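<p style = 'font-size:16px;font-family:Arial'>As a quick illustration of the mean absolute error definition, here is a small plain-Python sketch. The function name and the sample values are hypothetical and are not output from the models in this notebook.</p>

```python
def mean_absolute_error(actual, predicted):
    """Average of the absolute differences between actual and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical churn labels and predicted churn probabilities
mae = mean_absolute_error([1, 0, 1, 0], [0.9, 0.2, 0.4, 0.1])
print(round(mae, 4))
```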

In [None]:
from teradataml import ROC

glm_roc_out = ROC(
    probability_column = '"prob_1"',
    observation_column = "Churn",
    positive_class = "1",
    data = out_glm,
    num_thresholds=300
)

glm_roc_df = glm_roc_out.output_data
glm_roc_df

<p style = 'font-size:16px;font-family:Arial'>The ROC curve plots the TPR (True Positive Rate) against the FPR (False Positive Rate) across classification thresholds. The area under the ROC curve measures how well the model can distinguish between the positive and negative classes: the higher the AUC, the better the model separates them.</p>
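<p style = 'font-size:16px;font-family:Arial'>To show where the curve and its area come from, the sketch below sweeps a set of thresholds over a handful of scores and integrates the resulting points with the trapezoidal rule. It is a plain-Python illustration under simple assumptions (evenly spaced thresholds, at least one row of each class), not a description of how the <b>ROC</b> function computes its result; all names and values are hypothetical.</p>

```python
def roc_points(labels, scores, num_thresholds=10):
    """Return sorted (fpr, tpr) points for evenly spaced thresholds in [0, 1]."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]  # anchor the curve at the origin
    for i in range(num_thresholds + 1):
        t = i / num_thresholds
        # A row is predicted positive when its score reaches the threshold
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= t)
        fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= t)
        points.append((fp / neg, tp / pos))
    return sorted(points)


def auc(points):
    """Area under the curve using the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))


# Hypothetical labels (1 = churn) and predicted churn probabilities
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]
area = auc(roc_points(labels, scores))
print(round(area, 4))
```

<p style = 'font-size:16px;font-family:Arial'>An AUC of 0.5 corresponds to the diagonal baseline of a random classifier, and 1.0 corresponds to perfect separation.</p>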

In [None]:
glm_auc = glm_roc_out.result.get_values()[0][0]
glm_auc

In [None]:
plot = glm_roc_df.plot(x=glm_roc_df.fpr,
    y=[glm_roc_df.tpr, glm_roc_df.fpr],
    xlabel='False Positive Rate',
    ylabel='True Positive Rate',
    # figure=fig,
    # ax=axis[0],
    color='carolina blue',
    legend=[f'GLM AUC = {round(glm_auc, 4)}', 'AUC Baseline'],
    legend_style='lower right',
    grid_linestyle='--',
    grid_linewidth=0.5,
    linestyle = ['-', '--'])

plot.show()

<hr style="height:1px;border:none">
<p style = 'font-size:18px;font-family:Arial'><b>7.3 DecisionForest</b></p>
 
<p style = 'font-size:16px;font-family:Arial'>The <a href = 'https://docs.teradata.com/search/all?query=TD_DecisionForest&content-lang=en-US'>DecisionForest</a> function is an ensemble algorithm used for classification and regression predictive modelling problems. It is an extension of bootstrap aggregation (bagging) of decision trees. The function supports regression, binary, and multiclass classification.</p>

<p style = 'font-size:16px;font-family:Arial'>Constructing a decision tree involves evaluating the value for each input variable in the data to select a split point. The function reduces the variables to a random subset that can be considered at each split point. The algorithm can force each decision tree in the forest to be different to improve prediction accuracy.</p>

<p style = 'font-size:16px;font-family:Arial'>Each node in the tree represents a decision based on the value of a single variable, and the tree is grown by iteratively splitting the data into smaller and smaller subsets based on these decisions. At each level of the tree, the algorithm searches for the best variable on which to split the data, and it repeats this at every level until the stopping criterion is met.</p>

<p style = 'font-size:16px;font-family:Arial'>This function takes the training data as input, as well as the following function parameters:</p>
    <ul style = 'font-size:16px;font-family:Arial'>
        <li style = 'font-size:16px;font-family:Arial'>InputColumns: list or range of columns used as features (we pass the feature columns by name)</li>
        <li style = 'font-size:16px;font-family:Arial'>ResponseColumn: the dependent or target value (we used "Churn")</li>
        <li style = 'font-size:16px;font-family:Arial'>TreeType: either CLASSIFICATION or REGRESSION</li>
    <li style = 'font-size:16px;font-family:Arial'>Other hyperparameter values detailed in the documentation</li>
        </ul>
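<p style = 'font-size:16px;font-family:Arial'>The split-point selection described above can be sketched for a single numeric variable. This example assumes Gini impurity as the quality measure, which is a common choice but is an assumption here; the notebook does not state which measure <b>DecisionForest</b> uses internally. The function names and sample values are hypothetical.</p>

```python
def gini(labels):
    """Gini impurity of a list of binary (0/1) labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p1 = sum(labels) / n
    return 1.0 - p1 ** 2 - (1.0 - p1) ** 2


def best_split(values, labels):
    """Return (threshold, weighted impurity) of the best 'value <= threshold' split."""
    best = (None, float("inf"))
    # Every distinct value except the largest is a candidate threshold
    for threshold in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical tenure values (months) and churn labels
tenure = [1, 2, 3, 20, 30, 40]
churn = [1, 1, 1, 0, 0, 0]
print(best_split(tenure, churn))
```

<p style = 'font-size:16px;font-family:Arial'>A forest repeats this search on bootstrap samples of the rows and on random subsets of the variables, then combines the predictions of the individual trees.</p>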

In [None]:
copy_to_sql(df_train, table_name='train_data', if_exists='replace')
copy_to_sql(df_test, table_name='test_data', if_exists='replace')

In [None]:
train_df = DataFrame('train_data')
test_df = DataFrame('test_data')

In [None]:
df_model = DecisionForest(data = train_df,
                input_columns = ["Tenure", "InternetService", "OnlineSecurity", "SeniorCitizen",
                                    "PaymentMethod", "OnlineBackup", "Dependents", "Partner", "MultipleLines", 
                                    "StreamingMovies", "Gender", "PhoneService", "TotalCharges", "Contract", 
                                    "MonthlyCharges", "DeviceProtection", "PaperlessBilling", "StreamingTV", 
                                    "TechSupport"],
                response_column = 'Churn',
                family = 'Binomial',
                min_impurity= 0.0,
                max_depth= 5,
                min_node_size= 1,
                num_trees= -1,
                seed= 42,
                tree_type = 'CLASSIFICATION')
    

<p style = 'font-size:16px;font-family:Arial'>We have created our model; let's now make predictions on the test dataset.</p>

In [None]:
df_out = TDDecisionForestPredict(object = df_model,
                                        newdata = test_df,
                                        id_column = "CustomerID",
                                        detailed = False,
                                        output_prob = True,
                                        output_responses = ['0','1'],
                                        accumulate="Churn")

In [None]:
out_df = df_out.result.assign(prediction = df_out.result.prediction.cast(type_ = BYTEINT))
out_df = out_df.assign(prediction = out_df.prediction.cast(type_ = VARCHAR(2)))
out_df = out_df.assign(Churn = out_df.Churn.cast(type_ = VARCHAR(2)))
out_df

<hr style="height:1px;border:none">
<p style = 'font-size:18px;font-family:Arial'><b>7.4 Evaluation of DecisionForest Model</b></p>

In [None]:
ClassificationEvaluator_df = ClassificationEvaluator(
                                                        data = out_df,
                                                        observation_column = 'Churn',
                                                        prediction_column = 'prediction',
                                                        labels = ['0', '1']
)

In [None]:
ClassificationEvaluator_df.output_data.head(10)

In [None]:
from teradataml import ROC

df_roc_out = ROC(
    probability_column = '"prob_1"',
    observation_column = "Churn",
    positive_class = "1",
    data = out_df,
    num_thresholds=300
)

df_roc_df = df_roc_out.output_data
df_roc_df

<p style = 'font-size:16px;font-family:Arial'>The ROC curve plots the TPR (True Positive Rate) against the FPR (False Positive Rate) across classification thresholds. The area under the ROC curve measures how well the model can distinguish between the positive and negative classes: the higher the AUC, the better the model separates them.</p>

In [None]:
df_auc = df_roc_out.result.get_values()[0][0]
df_auc

In [None]:
from teradataml import subplots
plot = df_roc_df.plot(x=df_roc_df.fpr,
    y=[df_roc_df.tpr, df_roc_df.fpr],
    xlabel='False Positive Rate',
    ylabel='True Positive Rate',
    # figure=fig,
    # ax=axis[0],
    color='carolina blue',
    legend=[f'DecisionForest AUC = {round(df_auc, 4)}', 'AUC Baseline'],
    legend_style='lower right',
    grid_linestyle='--',
    grid_linewidth=0.5,
    linestyle = ['-', '--'])

plot.show()

<p style = 'font-size:20px;font-family:Arial'><b>Conclusion</b></p>

<p style = 'font-size:16px;font-family:Arial'>In this demo we have seen how to analyze and pre-process data in Vantage using InDb functions. We have also created two commonly used predictive models for classification and predicted which customers are likely to churn.</p>

<hr style="height:2px;border:none">
<p style = 'font-size:20px;font-family:Arial'><b>8. ModelOps for Telco Customer Churn</b></p>

<p style = 'font-size:16px;font-family:Arial'>We used the feature store to store features as well as their processing, and we re-used them in model training. The features and processing can be re-used across multiple machine learning models and use cases, helping to improve data science productivity.</p>

<p style = 'font-size:16px;font-family:Arial'>Teradata's traditional approach using ClearScape Analytics functions plays a crucial role in this context by automating the complex process of building and deploying machine learning models. Various ClearScape Analytics functions are used for optimal preparation and training of models, delivering high-quality machine learning models in minutes. With ClearScape Analytics ModelOps, analytics-driven organizations can follow a mature methodology and use automated capabilities to operationalize models efficiently at scale in Vantage.</p>

<p style = 'font-size:16px;font-family:Arial'>ClearScape Analytics ModelOps manages the operationalization of advanced analytics in Teradata Vantage, providing deployment, governance, and monitoring of your AI/ML models at scale. ModelOps provides an easy-to-use web-based user interface (UI), a command line interface (CLI), and a Python/R Software Development Kit (SDK).</p>

<p style = 'font-size:16px;font-family:Arial'>As part of this End-to-End demo for Telco Customer Churn prediction, we will implement the ModelOps cycle using Vantage In-Db ClearScape Analytics functions. Click the button below to load the next notebook, "<i>Telco_EndtoEnd_ModelOps_GIT_Python_indb_DF.ipynb</i>", which will showcase the steps required in the ModelOps portion of the workflow to operationalize the model.</p>

<a href="Telco_EndtoEnd_ModelOps_GIT_Python_indb_DF.ipynb" style="display: inline-flex; align-items: center; justify-content: center; background-color: #007373; color: #FFFFFF; font-family: Arial, sans-serif; font-size: 16px; font-weight: bold; text-decoration: none; padding: 12px 24px; border: none; border-radius: 8px; box-shadow: 0 4px 6px rgba(0, 0, 0, 0.1); cursor: pointer; transition: all 0.3s ease;">
  LAUNCH the next notebook to Continue
  <img src="https://img.icons8.com/ios-filled/50/ffffff/external-link.png" alt="External Link Icon" style="margin-left: 8px; width: 20px; height: 20px;">
</a>


<hr style="height:2px;border:none">
<p style = 'font-size:20px;font-family:Arial'><b>9. Cleanup</b></p>

<div class="alert alert-block alert-warning">
<p style = 'font-size:16px;font-family:Arial'><b>Note: </b>The tables created in this demo are used in the ModelOps notebook, which can be launched by clicking the button above. Uncomment the lines of code below only if you do not want to run the ModelOps notebook and want to delete the tables created for this demo.</p>
    
</div>

<p style = 'font-size:18px;font-family:Arial'><b>Work Tables</b></p>
<p style = 'font-size:16px;font-family:Arial;'>
We need to clean up our work tables to prevent errors next time.</p>

In [None]:
# tables = ['Transformed_data','transform_data2']

# # Loop through the list of tables and execute the drop table command for each table
# for table in tables:
#     try:
#         db_drop_table(table_name = table)
#     except:
#         pass

In [None]:
# telco_fs.archive_feature_group(feature_group='TelcoFG')

In [None]:
# telco_fs.delete_feature_group(feature_group='TelcoFG')

<p style = 'font-size:18px;font-family:Arial'><b>Databases and Tables</b></p>
<p style = 'font-size:16px;font-family:Arial'>We will use the following code to clean up tables and databases created for this demonstration.</p>

In [None]:
# %run -i ../../run_procedure.py "call remove_data('DEMO_Telco');" 
#Takes 10 seconds

In [None]:
# remove_context()

<hr style="height:1px;border:none">

<b style = 'font-size:20px;font-family:Arial'>Required Materials</b>
<p style = 'font-size:16px;font-family:Arial'>Let’s look at the elements we have available for reference for this notebook:</p>

<p style = 'font-size:18px;font-family:Arial'><b>Filters:</b></p>
    <ul style = 'font-size:16px;font-family:Arial'>
    <li><b>Industry:</b> Telco</li>
    <li><b>Functionality:</b> Machine Learning</li>
    <li><b>Use Case:</b> Customer Retention</li>
    </ul>
    <p style = 'font-size:18px;font-family:Arial'><b>Related Resources:</b></p>
    <ul style = 'font-size:16px;font-family:Arial'>
    <li><a href = 'https://www.teradata.com/Blogs/NPS-is-a-metric-not-the-goal'>In the fight to improve customer experience, NPS is a metric, not the goal</a></li>
    <li><a href = 'https://www.teradata.com/Blogs/Hyper-scale-time-series-forecasting-done-right'>Hyper-scale time series forecasting done right</a></li>
    <li><a href = 'https://www.teradata.com/Resources/Datasheets/Digital-Identity-Management-and-Great-CX?utm_campaign=i_coremedia-AMS&utm_source=google&utm_medium=paidsearch&utm_content=GS_CoreMedia_NA-US_BKW&utm_creative=Brand-Vantage&utm_term=teradata%20analytic%20platform&gclid=Cj0KCQjwnMWkBhDLARIsAHBOftrWZxDktHkKMsaWjMmNRnQ6Ys-bZBAUhXjWTo1Xa02fsci-IHWBV_waAppkEALw_wcB'>Close the Gap Between Digital Identity Management and Great Customer Experiences</a></li>
        </ul>

<p style = 'font-size:18px;font-family:Arial'><b>Reference Links:</b></p>
<ul style = 'font-size:16px;font-family:Arial'> 
       <li><a href = 'https://docs.teradata.com/search/all?query=Database+Analytic+Functions&content-lang=en-US'>Teradata Vantage™ - Analytics Database Analytic Functions </a></li>    
  <li><a href = 'https://docs.teradata.com/search/all?query=Teradata+package+for+python+user+guide&content-lang=en-US'>Teradata® Package for Python User Guide</a></li>
  <li><a href = 'https://docs.teradata.com/search/all?query=Using+Teradata+Vantage+Analytic+Functions+with+Teradata+Package+for+Python&content-lang=en-US'>Teradata® Package for Python Function Reference</a></li>      
</ul>

<b style = 'font-size:18px;font-family:Arial'>Dataset:</b>

- `CustomerID`: unique id of customer
- `Gender`: Whether the customer is a male or a female
- `SeniorCitizen`: Whether the customer is a senior citizen or not (1, 0)
- `Partner`: Whether the customer has a partner or not (Yes, No)
- `Dependents`: Whether the customer has dependents or not (Yes, No)
- `Tenure`: Number of months the customer has stayed with the company
- `PhoneService`: Whether the customer has a phone service or not (Yes, No)
- `MultipleLines`: Whether the customer has multiple lines or not (Yes, No, No phone service)
- `InternetService`: Customer’s internet service provider (DSL, Fiber optic, No)
- `OnlineSecurity`: Whether the customer has online security or not (Yes, No, No internet service)
- `OnlineBackup`: Whether the customer has online backup or not (Yes, No, No internet service)
- `DeviceProtection`: Whether the customer has device protection or not (Yes, No, No internet service)
- `TechSupport`: Whether the customer has tech support or not (Yes, No, No internet service)
- `StreamingTV`: Whether the customer has streaming TV or not (Yes, No, No internet service)
- `StreamingMovies`: Whether the customer has streaming movies or not (Yes, No, No internet service)
- `Contract`: The contract term of the customer (Month-to-month, One year, Two year)
- `PaperlessBilling`: Whether the customer has paperless billing or not (Yes, No)
- `PaymentMethod`: The customer’s payment method (Electronic check, Mailed check, Bank transfer (automatic), Credit card (automatic))
- `MonthlyCharges`: The amount charged to the customer monthly
- `TotalCharges`: The total amount charged to the customer
- `Churn`: Whether the customer churned or not (Yes or No)

<footer style="padding-bottom:35px; border-bottom:3px solid #91A0Ab">
    <div style="float:left;margin-top:14px">ClearScape Analytics™</div>
    <div style="float:right;">
        <div style="float:left; margin-top:14px">
            Copyright © Teradata Corporation - 2025. All Rights Reserved
        </div>
    </div>
</footer>