<header style="padding:10px;background:#f9f9f9;border-top:3px solid #00b2b1"><img id="Teradata-logo" src="https://www.teradata.com/Teradata/Images/Rebrand/Teradata_logo-two_color.png" alt="Teradata" width="220" align="right" />

# Vantage Analytic Library Demo Notebook 3 

</header>

## Analytic Algorithms and Scoring

The following functions are currently available in the Vantage Analytic Library XSP release 2.0.

### Matrix Building

Matrix Building builds a sum-of-squares-and-cross-products (SSCP) matrix or other derived matrix type from a table in a Teradata Database. Matrix Building does this by generating and running the SQL to call the Teradata CALCMATRIX table operator. The resulting matrix can be (re)used by Linear Regression or Factor Analysis.

### Linear Regression

Linear Regression is one of the fundamental types of predictive modeling algorithms. In linear regression, a dependent numeric variable is expressed as the sum of one or more independent numeric variables, each multiplied by a numeric coefficient, usually with a constant term added. A linear regression model consists of the coefficients of the independent variables together with the constant term. Applying these coefficients to the variables (columns) of each observation (row) in a data set (table) is known as scoring.
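As a minimal sketch of what scoring means here, the Python snippet below applies a set of coefficients and a constant term to one row. The coefficient values and the helper name `score_linear` are made up for illustration; they are not produced by VAL and do not reflect how the library is implemented.

```python
# Illustrative only: these coefficients are hypothetical, not fitted by VAL.
def score_linear(row, coefficients, constant):
    """Return constant + sum(coefficient * column value) for one observation."""
    return constant + sum(coefficients[name] * row[name] for name in coefficients)

coefficients = {"age": 2.0, "years_with_bank": 10.0}  # hypothetical model
constant = 5.0                                         # hypothetical constant term
row = {"age": 40, "years_with_bank": 6}                # one observation (row)

predicted = score_linear(row, coefficients, constant)
print(predicted)  # 5 + 2*40 + 10*6 = 145.0
```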

### Factor Analysis - PCA

Factor Analysis is one of the most fundamental types of statistical analysis, and Principal Components Analysis (PCA) is arguably the most common variety of Factor Analysis. In Factor Analysis, a set of variables (in this case columns) is reduced to a smaller number of factors that account for most of the variance in the variables. This can be useful in reducing the number of variables by converting them to factors, or in gaining insight into the nature of the variables when they are used for further data analysis. Additionally, the Factor Analysis scoring process expresses each factor as a linear combination of the input columns. The score output table contains one or more index (key) columns and factor score columns, one for each factor.

### Logistic Regression

Logistic Regression is one of the most widely used types of statistical analysis. In Logistic Regression, a set of independent variables (in this case columns) is processed to predict the value of a dependent variable (column) that assumes two values referred to as response (1) and non-response (0). The user specifies which value of the dependent variable to treat as the response, and all other values assumed by the dependent variable are treated as non-response. The result, however, is not a continuous numeric variable as seen in Linear Regression, but rather a probability between 0 and 1 that the response value is assumed by the dependent variable.
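To illustrate how a logistic model turns a linear combination of columns into a probability, here is a small Python sketch using the standard logistic function. The coefficients and the helper name `score_logistic` are assumptions chosen for the example, not VAL output.

```python
import math

# Illustrative only: hypothetical coefficients, not a model fitted by VAL.
def score_logistic(row, coefficients, constant):
    """Return the probability of the response value for one observation."""
    z = constant + sum(coefficients[name] * row[name] for name in coefficients)
    return 1.0 / (1.0 + math.exp(-z))  # logistic function maps z to (0, 1)

coefficients = {"age": 0.05, "nbr_children": -0.5}  # hypothetical model
constant = -2.0                                      # hypothetical constant term

p_neutral = score_logistic({"age": 40, "nbr_children": 0}, coefficients, constant)
p_lower = score_logistic({"age": 40, "nbr_children": 3}, coefficients, constant)
print(p_neutral, p_lower)
```

In this made-up model the first row has a linear term of zero, so its probability is 0.5; the negative coefficient on `nbr_children` pushes the second row's probability lower.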

### Decision Trees

Currently, the Teradata Warehouse Miner External Stored Procedure provides decision trees for classification models. They are built largely on the techniques described in [Quinlan] and, as such, splits using information gain ratio are provided. Pruning is also provided, also using the gain ratio technique. The concept behind information gain ratio is simple: the more you know about a topic, the less new information you are apt to get about it. More precisely, if you know an event is very probable, it is no surprise when it happens; that is, learning that it actually happened gives you little information. Taking this a bit further, the amount of information gained decreases as the probability of the event increases. Entropy measures the average uncertainty over the outcomes in a set of rows, so a split is informative to the extent that it reduces entropy; the gain ratio divides that reduction by the entropy of the split itself. A decision tree scoring function is provided to score and/or evaluate a decision tree model.
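The following Python sketch shows one common way to compute entropy and gain ratio for a candidate split, in the spirit of Quinlan's approach. It is a textbook-style illustration with hypothetical function names, not the code VAL runs internally.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total) for n in Counter(values).values())

def gain_ratio(labels, split_values):
    """Information gain of the split divided by the entropy of the split itself."""
    total = len(labels)
    groups = {}
    for label, value in zip(labels, split_values):
        groups.setdefault(value, []).append(label)
    # Weighted entropy that remains after partitioning rows by the split column.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    split_info = entropy(split_values)
    return gain / split_info if split_info > 0 else 0.0

labels = [1, 1, 0, 0]
perfect = gain_ratio(labels, ["a", "a", "b", "b"])   # split separates the classes
useless = gain_ratio(labels, ["a", "b", "a", "b"])   # split tells us nothing
print(perfect, useless)
```

A split that separates the classes completely scores 1.0 in this toy data, while a split that leaves each branch as mixed as the parent scores 0.0.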

### K-Means Clustering

The task of modeling multidimensional data sets encompasses a variety of statistical techniques, including that of ‘cluster analysis’. Cluster analysis is a statistical process for identifying homogeneous groups of data objects. K-Means clustering is one of the simplest and most popular unsupervised machine learning algorithms. Unsupervised algorithms make inferences from datasets using only input data, without known, or labelled, outcomes. The objective of K-means is simple: group similar data points together and discover underlying patterns. To achieve this objective, K-means looks for a fixed number (k) of clusters in a dataset. A cluster refers to a collection of data points aggregated together because of certain similarities. The algorithm requires as input a target number k, which refers to the number of centroids to identify in the dataset, where a centroid is the location representing the center of the cluster. Every data point is allocated to one of the clusters so as to reduce the within-cluster sum of squares. In other words, the K-means algorithm identifies k centroids, and then allocates every data point to the nearest centroid, while keeping the clusters as compact as possible.
The ‘means’ in K-means refers to averaging of the data; that is, finding the centroid.
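The assign-then-average loop described above can be sketched in a few lines of Python. This is a generic illustration of the standard (Lloyd's) K-means iteration with caller-supplied starting centroids; the function name and the toy points are assumptions, and VAL's own implementation may differ in initialization and stopping rules.

```python
def kmeans(points, centroids, iterations=10):
    """Repeat: assign each point to its nearest centroid, then re-average."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            # Index of the centroid with the smallest squared distance to p.
            nearest = min(
                range(len(centroids)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
            )
            clusters[nearest].append(p)
        # New centroid = mean of each dimension; keep the old one if a cluster is empty.
        centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters

points = [(1.0, 1.0), (1.0, 2.0), (9.0, 9.0), (10.0, 9.0)]
centroids, clusters = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(centroids)
```

With these four points and k=2, the two low points end up in one cluster and the two high points in the other.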

### Association Rules/Sequence Analysis

Association Rules provide various measures concerning items residing in groups.  The measures, support, confidence, lift and Z Score, help to determine the likelihood that one or more items exist in a group, given that another one or more items exist in the same group.  The classic example of this type of study is market basket analysis, in which the groups are shopping carts and the items are the products purchased in the shopping carts.  A sequence analysis may be optionally requested, wherein the sequence of items matters, ordering the items on each side of each rule, with left side items preceding the right side items.  
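As a rough illustration of three of these measures, the Python sketch below computes support, confidence and lift for a single rule over a handful of made-up shopping baskets, using the usual textbook definitions. The Z Score is omitted because its exact definition in VAL is not given here; the function name and data are hypothetical.

```python
def rule_measures(baskets, left, right):
    """Support, confidence and lift for the rule left -> right."""
    n = len(baskets)
    left, right = set(left), set(right)
    support_left = sum(left <= b for b in baskets) / n            # groups containing left
    support_right = sum(right <= b for b in baskets) / n          # groups containing right
    support_both = sum((left | right) <= b for b in baskets) / n  # groups containing both
    confidence = support_both / support_left  # assumes left occurs at least once
    lift = confidence / support_right         # assumes right occurs at least once
    return support_both, confidence, lift

baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
support, confidence, lift = rule_measures(baskets, left={"bread"}, right={"milk"})
print(support, confidence, lift)
```

Here bread and milk appear together in half of the baskets, and a lift below 1 indicates that milk is slightly less likely in baskets containing bread than in baskets overall.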

## For access to the Vantage Analytic Library XSP on Transcend, please open a Service Hub incident @

>- https://teradataservicehub.service-now.com/sp?id=index
>- and begin the incident description with “This Incident is directed to the IDW DBA Admin team.”
>- Rest of the description: “Please grant the following users the role Training_TD_Warehouse_Miner_Exec_Role in tdprd2.
>- Qlid_1
>- Qlid_2
>- Qlid_n”

## Vantage Analytic Library - Call Structure

call ${XSPDB}.td_analyze('\<function name\>','database=\<database name\>;tablename=\<table name\>;columns=\<column1\>,\<column2\>;\<param1\>=\<value1\>;\<param2\>=\<value2\>...');
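To make the shape of the second argument concrete, the Python sketch below assembles a call string from keyword parameters: `name=value` pairs joined with semicolons, and multi-valued parameters such as `columns` joined with commas. `build_td_analyze_call` is a hypothetical helper written for this illustration only; it is not part of VAL, it only builds text, and it does not escape single quotes inside values.

```python
def build_td_analyze_call(procedure_db, function_name, **params):
    """Build the text of a td_analyze CALL statement (hypothetical helper)."""
    parts = []
    for name, value in params.items():
        if isinstance(value, (list, tuple)):
            value = ",".join(value)       # e.g. columns=col1,col2
        parts.append(f"{name}={value}")
    param_string = ";".join(parts)        # parameters are separated by semicolons
    return f"call {procedure_db}.td_analyze('{function_name}','{param_string}');"

statement = build_td_analyze_call(
    "val",
    "matrix",
    database="TRNG_XSP",
    tablename="Customer",
    columns=["age", "years_with_bank", "nbr_children"],
)
print(statement)
```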

---

### Accessing the Data in DemoNow
<p style = 'font-size:16px;font-family:Arial'>The Vantage Analytic Library demos will work either with foreign tables accessed from Cloud Storage via NOS or you may import the tables to your machine. You only need to import them once for all of this series of notebooks. If you import data for multiple demos, you may need to use the Data Dictionary "Manage Your Space" routine to cleanup tables you no longer need. 
    
<p style = 'font-size:16px;font-family:Arial'>Use the link below to access the 2 options for using data from the data dictionary notebook:

[Click Here to get data for this notebook](../Data_Dictionary/Data_Dictionary.ipynb#TRNG_XSP)

[Click Here to Manage Your Space](../Data_Dictionary/Data_Dictionary.ipynb#Manage_Your_Space)
    
In the instructions below, use "SystemName=local" and "QLID=demo_user"

### Accessing the Data in Transcend AppCenter
These notebooks can run from within the Transcend AppCenter Jupyter instance. You will sign on to Transcend Vantage systems (tdprd, tdprd2 or tdprd3 (AKA Vantage LIVE)) using your credentials. First we set up variables that you will need to change as follows.

1) First, set up the variable SystemName for the system to connect to.  In the Transcend environment, this will be one of the following:

* NAME=Transcend-Production, USER=, HOST=tdprd.td.teradata.com, PROPS="logMech=LDAP,logmech=LDAP"
* NAME=Transcend-Production-AWS, USER=, HOST=tdprd2.td.teradata.com, PROPS="logMech=LDAP,logmech=LDAP"
* NAME=Vantage-LIVE, USER=, HOST=tdprd3.td.teradata.com, PROPS="logMech=LDAP,logmech=LDAP"


In [2]:
%var SystemName=local

2) Next change QLID below to be equal to your QuickLook ID for Transcend or demo_user for DemoNow.  Your data lab will be the result database for all VAL calls where any output tables or views are created.  

In [3]:
%var QLID=demo_user

3) This next variable is preset; change it ONLY if you have installed the Vantage Analytic Library outside of the Transcend environment. If you are using this on Transcend, keep it set to TRNG_XSP. This is where the software, statistical test tables and demo data are installed on both tdprd and tdprd2.

In [4]:
%var VALDB=val

In [5]:
%var XSPDB=TRNG_XSP

---

Now, connect to the Transcend system you have specified in the variable "SystemName" above.

In [6]:
%connect ${SystemName}

Password: ···


Success: 'local' connection established and activated for user 'demo_user'


Change focus to the database specified by the variable "XSPDB" above.

In [7]:
DATABASE ${XSPDB};

Success: 1 rows affected

### Demo data - Financial Customers/Accounts/Transactions

The following data has been put into the ${XSPDB} database on Transcend for the examples in the three different Jupyter Notebooks. It's a simplistic fictitious dataset of banking Customers (10K-ish rows), Accounts (20K-ish rows) and Transactions (1M-ish rows). They are related to each other in the following ways:

![DemoDataModel](./img/DemoData.png)

In [8]:
SELECT * FROM ${XSPDB}.Customer SAMPLE 10;

Unnamed: 0,cust_id,income,age,years_with_bank,nbr_children,gender,marital_status,postal_code,state_code
1,23175896,29155.4,60,6,1,F,2,93769,CA
2,20448795,58936.5,63,10,3,F,4,60666,IL
3,19076820,22209.6,67,7,1,F,1,90009,CA
4,23179143,23236.9,58,4,0,M,3,10091,NY
5,20446200,0.0,18,7,2,F,1,96822,HI
6,13627960,18663.0,40,10,4,M,2,98165,WA
7,28614516,0.0,16,1,1,F,1,77083,TX
8,28615377,5034.3,46,6,1,F,1,96839,HI
9,24529644,21567.6,35,3,2,F,1,38177,TN
10,27266860,0.0,15,2,0,M,1,77061,TX


In [9]:
%meta

Result Set ID: /home/jovyan/JupyterLabRoot/Teradata/Resultsets/2022.09.16_20.18.16.474_UTC
History ID:    214
Rows:          10 of 10
Parts:         2
Column Definitions:
    cust_id: INTEGER
    income: DECIMAL(15, 1)
    age: INTEGER
    years_with_bank: INTEGER
    nbr_children: INTEGER
    gender: VARCHAR(1)
    marital_status: VARCHAR(1)
    postal_code: VARCHAR(5)
    state_code: VARCHAR(2)


In [10]:
SELECT * FROM ${XSPDB}.Accounts SAMPLE 10;

Unnamed: 0,acct_nbr,cust_id,acct_type,account_active,acct_start_date,starting_balance,ending_balance
1,1362584220,27251680,CK,Y,1995-05-01,821.169,88.75
2,1363237314,19085318,SV,Y,1995-08-09,1043.686,3102.634
3,456114321362499417,23162483,CC,Y,1990-03-15,1065.3,5890.0
4,1363361217,23177137,CK,N,1994-02-24,3669.35,66.774
5,1362905318,24532290,SV,Y,1991-05-30,551.29,775.98
6,1363183213,17721379,CK,Y,1993-08-09,37582.344,623.776
7,456114321363435413,17724655,CC,Y,1991-02-15,3267.831,1434.648
8,1363370318,24540660,SV,Y,1993-10-05,175.95,236.17
9,1362666313,17714658,SV,Y,1991-02-23,3503.962,669.937
10,1363284311,14996124,SV,Y,1992-01-26,2762.904,4218.557


In [11]:
%meta

Result Set ID: /home/jovyan/JupyterLabRoot/Teradata/Resultsets/2022.09.16_20.18.31.778_UTC
History ID:    215
Rows:          10 of 10
Parts:         2
Column Definitions:
    acct_nbr: VARCHAR(18)
    cust_id: INTEGER
    acct_type: VARCHAR(2)
    account_active: VARCHAR(1)
    acct_start_date: DATE
    starting_balance: DECIMAL(11, 3)
    ending_balance: DECIMAL(11, 3)


In [12]:
SELECT * FROM ${XSPDB}.Transactions SAMPLE 10;

Unnamed: 0,tran_id,acct_nbr,tran_amt,principal_amt,interest_amt,new_balance,tran_date,tran_time,channel,tran_code
1,672,1363175216,-0.15,9.85,0.0,118.74,1995-01-27,0,,FK
2,594,1362641218,1.81,0.0,4.072,1446.15,1995-07-31,235959,,IN
3,429,1362562313,71.35,116.35,0.0,4274.78,1995-11-24,190604,A,DP
4,357,1363200317,1.82,76.0,3.877,1458.34,1995-07-31,235959,,IN
5,901,1362591217,-258.25,-182.25,0.0,410.07,1995-02-08,0,P,WD
6,276,1362508212,-172.71,-388.6,1.0,304.93,1995-01-12,0,P,WD
7,462,1362772221,0.0,9.0,0.0,3974.0,1995-01-16,144624,A,IQ
8,340,1363447220,0.0,0.0,4.0,2045.14,1995-01-20,190508,V,IQ
9,528,1363413216,158.7,168.7,0.0,179.98,1995-07-29,112301,A,DP
10,990,456114321362591410,405.92,4036.4,2.508,-958.48,1995-12-02,0,M,PM


In [13]:
%meta

Result Set ID: /home/jovyan/JupyterLabRoot/Teradata/Resultsets/2022.09.16_20.18.46.187_UTC
History ID:    216
Rows:          10 of 10
Parts:         2
Column Definitions:
    tran_id: INTEGER
    acct_nbr: VARCHAR(18)
    tran_amt: DECIMAL(9, 2)
    principal_amt: DECIMAL(15, 2)
    interest_amt: DECIMAL(11, 3)
    new_balance: DECIMAL(9, 2)
    tran_date: DATE
    tran_time: INTEGER
    channel: VARCHAR(1)
    tran_code: VARCHAR(2)


The following Analytic Data Set (ADS) was created by joining all three tables above:

In [14]:
CREATE TABLE ${QLID}.VAL_ADS AS (
    SELECT 
        T1.cust_id  AS cust_id
       ,MIN(T1.income) AS tot_income
       ,MIN(T1.age) AS tot_age
       ,MIN(T1.years_with_bank) AS tot_cust_years
       ,MIN(T1.nbr_children) AS tot_children
       ,CASE WHEN MIN(T1.marital_status) = 1 THEN 1 ELSE 0 END AS single_ind
       ,CASE WHEN MIN(T1.gender) = 'F' THEN 1 ELSE 0 END AS female_ind
       ,CASE WHEN MIN(T1.marital_status) = 2 THEN 1 ELSE 0 END AS married_ind
       ,CASE WHEN MIN(T1.marital_status) = 3 THEN 1 ELSE 0 END AS separated_ind
       ,MAX(CASE WHEN T1.state_code = 'CA' THEN 1 ELSE 0 END) AS ca_resident_ind
       ,MAX(CASE WHEN T1.state_code = 'NY' THEN 1 ELSE 0 END) AS ny_resident_ind
       ,MAX(CASE WHEN T1.state_code = 'TX' THEN 1 ELSE 0 END) AS tx_resident_ind
       ,MAX(CASE WHEN T1.state_code = 'IL' THEN 1 ELSE 0 END) AS il_resident_ind
       ,MAX(CASE WHEN T1.state_code = 'AZ' THEN 1 ELSE 0 END) AS az_resident_ind
       ,MAX(CASE WHEN T1.state_code = 'OH' THEN 1 ELSE 0 END) AS oh_resident_ind
       ,MAX(CASE WHEN T2.acct_type = 'CK' THEN 1 ELSE 0 END) AS ck_acct_ind
       ,MAX(CASE WHEN T2.acct_type = 'SV' THEN 1 ELSE 0 END) AS sv_acct_ind
       ,MAX(CASE WHEN T2.acct_type = 'CC' THEN 1 ELSE 0 END) AS cc_acct_ind
       ,AVG(CASE WHEN T2.acct_type = 'CK' THEN T2.starting_balance+T2.ending_balance ELSE 0 END) AS ck_avg_bal
       ,AVG(CASE WHEN T2.acct_type = 'SV' THEN T2.starting_balance+T2.ending_balance ELSE 0 END) AS sv_avg_bal
       ,AVG(CASE WHEN T2.acct_type = 'CC' THEN T2.starting_balance+T2.ending_balance ELSE 0 END) AS cc_avg_bal
       ,AVG(CASE WHEN T2.acct_type = 'CK' THEN T3.principal_amt+T3.interest_amt ELSE 0 END) AS ck_avg_tran_amt
       ,AVG(CASE WHEN T2.acct_type = 'SV' THEN T3.principal_amt+T3.interest_amt ELSE 0 END) AS sv_avg_tran_amt
       ,AVG(CASE WHEN T2.acct_type = 'CC' THEN T3.principal_amt+T3.interest_amt ELSE 0 END) AS cc_avg_tran_amt
       ,COUNT(CASE WHEN ((EXTRACT(MONTH FROM T3.tran_date) + 2) / 3) = 1 THEN T3.tran_id ELSE NULL END) AS q1_trans_cnt
       ,COUNT(CASE WHEN ((EXTRACT(MONTH FROM T3.tran_date) + 2) / 3) = 2 THEN T3.tran_id ELSE NULL END) AS q2_trans_cnt
       ,COUNT(CASE WHEN ((EXTRACT(MONTH FROM T3.tran_date) + 2) / 3) = 3 THEN T3.tran_id ELSE NULL END) AS q3_trans_cnt
       ,COUNT(CASE WHEN ((EXTRACT(MONTH FROM T3.tran_date) + 2) / 3) = 4 THEN T3.tran_id ELSE NULL END) AS q4_trans_cnt
    FROM ${XSPDB}.Customer AS T1
        LEFT OUTER JOIN ${XSPDB}.Accounts AS T2
            ON T1.cust_id = T2.cust_id
        LEFT OUTER JOIN ${XSPDB}.Transactions AS T3
            ON T2.acct_nbr = T3.acct_nbr
GROUP BY T1.cust_id) WITH DATA UNIQUE PRIMARY INDEX (cust_id);

Success: 0 rows affected
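The quarter counters in the query above rely on the expression `(EXTRACT(MONTH FROM tran_date) + 2) / 3`. Assuming the division is integer division (both operands are integers), it maps months 1-3 to quarter 1, 4-6 to quarter 2, and so on. The short Python sketch below reproduces that mapping with `//`; the function name is hypothetical.

```python
def quarter_of(month):
    """Mirror (month + 2) / 3 with integer division: months 1..12 -> quarters 1..4."""
    return (month + 2) // 3

quarters = {month: quarter_of(month) for month in range(1, 13)}
print(quarters)
```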

In [15]:
SELECT * FROM ${QLID}.VAL_ADS SAMPLE 10;

Unnamed: 0,cust_id,tot_income,tot_age,tot_cust_years,tot_children,single_ind,female_ind,married_ind,separated_ind,ca_resident_ind,ny_resident_ind,tx_resident_ind,il_resident_ind,az_resident_ind,oh_resident_ind,ck_acct_ind,sv_acct_ind,cc_acct_ind,ck_avg_bal,sv_avg_bal,cc_avg_bal,ck_avg_tran_amt,sv_avg_tran_amt,cc_avg_tran_amt,q1_trans_cnt,q2_trans_cnt,q3_trans_cnt,q4_trans_cnt
1,27269060,25456.0,35,4,0,1,0,0,0,0,0,0,0,0,0,1,0,0,1497.093,0.0,0.0,-53.379375,0.0,0.0,11,1,4,0
2,14995101,28723.2,38,8,2,0,0,1,0,0,0,0,0,0,0,1,0,0,2243.653,0.0,0.0,7.984109589041096,0.0,0.0,27,17,11,18
3,17720898,10895.3,40,7,2,0,1,1,0,0,0,0,0,0,0,0,0,1,0.0,0.0,16972.518,0.0,0.0,2875.725,1,0,0,0
4,13630180,46884.0,37,5,3,0,0,1,0,0,0,0,0,1,0,1,1,1,4505.675686567164,15439.507343283582,3500.0322985074627,58.52885572139304,-132.48133333333334,-42.89574129353234,83,70,24,24
5,23166648,2568.8,94,10,0,0,1,1,0,0,0,0,0,0,0,1,1,1,385.70092,1245.2363466666666,842.7454933333333,33.390662222222225,24.050853333333333,24.480364444444444,39,32,122,32
6,24537168,4989.6,24,1,3,0,1,0,0,0,0,0,0,0,0,0,1,0,0.0,419.36,0.0,0.0,-7.608538461538462,0.0,8,8,4,6
7,14995519,8471.1,27,6,3,0,0,1,0,0,0,0,0,0,0,0,1,1,0.0,1134.786724137931,5159.511724137931,0.0,-2.938,-20.668310344827585,18,16,17,36
8,28631253,0.0,17,2,1,1,1,0,0,0,1,0,0,0,0,1,0,1,188.26324468085107,0.0,463.93617021276594,4.582925531914894,0.0,-1.0585106382978724,99,9,42,38
9,24535242,20880.0,20,3,2,1,0,0,0,0,0,0,1,0,0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0
10,14996344,16258.0,90,10,2,0,1,0,0,1,0,0,0,0,0,1,0,0,761.338,0.0,0.0,-26.52521212121212,0.0,0.0,99,0,0,0


In [16]:
%meta

Result Set ID: /home/jovyan/JupyterLabRoot/Teradata/Resultsets/2022.09.16_20.19.28.325_UTC
History ID:    218
Rows:          10 of 10
Parts:         2
Column Definitions:
    cust_id: INTEGER
    tot_income: DECIMAL(15, 1)
    tot_age: INTEGER
    tot_cust_years: INTEGER
    tot_children: INTEGER
    single_ind: BYTEINT
    female_ind: BYTEINT
    married_ind: BYTEINT
    separated_ind: BYTEINT
    ca_resident_ind: BYTEINT
    ny_resident_ind: BYTEINT
    tx_resident_ind: BYTEINT
    il_resident_ind: BYTEINT
    az_resident_ind: BYTEINT
    oh_resident_ind: BYTEINT
    ck_acct_ind: BYTEINT
    sv_acct_ind: BYTEINT
    cc_acct_ind: BYTEINT
    ck_avg_bal: FLOAT(0, 0)
    sv_avg_bal: FLOAT(0, 0)
    cc_avg_bal: FLOAT(0, 0)
    ck_avg_tran_amt: FLOAT(0, 0)
    sv_avg_tran_amt: FLOAT(0, 0)
    cc_avg_tran_amt: FLOAT(0, 0)
    q1_trans_cnt: INTEGER
    q2_trans_cnt: INTEGER
    q3_trans_cnt: INTEGER
    q4_trans_cnt: INTEGER


---

## Matrix Building

### Purpose

Matrix Building builds a sum-of-squares-and-cross-products (SSCP) matrix or other derived matrix type from a table in a Teradata Database. Matrix Building does this by generating and running the SQL to call the Teradata CALCMATRIX table operator provided in Teradata beginning with the 14.10 release. The results are stored either in a table or as a result set returned to the user. The purpose in building a matrix depends on the type of matrix built. For example, when a correlation matrix is built, view it to determine the correlations or relationships between the various columns in the matrix. For more information about the CALCMATRIX table operator, see Teradata® Database SQL Functions, Operators, Expressions, and Predicates, B035-1145.
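For intuition about what an SSCP matrix contains, the Python sketch below computes one for a tiny in-memory table: entry (i, j) is the sum over all rows of column i times column j. It also returns the row count and per-column sums. This is a plain-Python illustration with a hypothetical function name, not the CALCMATRIX implementation.

```python
def sscp(rows):
    """Return (row count, column sums, SSCP matrix) for equal-length numeric rows."""
    width = len(rows[0])
    count = len(rows)
    sums = [sum(r[i] for r in rows) for i in range(width)]
    # Entry [i][j] is the sum over rows of column i times column j.
    matrix = [
        [sum(r[i] * r[j] for r in rows) for j in range(width)]
        for i in range(width)
    ]
    return count, sums, matrix

count, sums, matrix = sscp([(1, 2), (3, 4)])
print(count, sums, matrix)
```

The diagonal holds each column's sum of squares, and the matrix is symmetric because column i times column j equals column j times column i.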

### Required Parameters

- **columns**

    The input columns comprising the created matrix or matrices. The columns must reside in the table named with the tablename parameter, residing in the database named with the database parameter.  For example: columns=column1,column2,column3.  When columns=all is entered, all columns in the input table are analyzed.  Other options include allnumeric.  Do not use the following column names, as these are reserved for use by the CALCMATRIX table  operator: rownum, rowname, c, or s.

- **database**

    The database containing the input table.

- **tablename**

    The input table to build a matrix from.
    
- **Matrix**

    The Matrix parameter:
    - Is required
    - Must be the first parameter
    - Is always enclosed in single quotes


### Optional Parameters

- **groupby**

    If specified, group by columns divide the input table into parts, one for each combination of values in the group by columns. For each combination of values a separate matrix is built, though they are all stored in the same output table or result data set. The group by columns must reside in the table named with the tablename parameter. The default case is no group by columns.  For example:  groupby=column1,column2,column3.  Do not use the column names rownum, rowname, c, or s, as these are reserved for use by the CALCMATRIX table operator.

- **matrixoutput {COLUMNS|VARBYTE}**

    Matrix output can either be returned as COLUMNS in a table or as VARBYTE values, one per column, in a reduced output table. The default is output returned as COLUMNS.

- **matrixtype**

    The following types of matrix can be built with this function. If not specified, a sum-of-squares-and-cross-products (SSCP) matrix is built.

    - SSCP = sum-of-squares-and-cross-products matrix 
    - ESSCP = Extended-sum-of-squares-and-cross-products matrix (the default)
    - CSSCP = Corrected-sum-of-squares-and-cross-products matrix
    - COV = Covariance matrix
    - COR = Correlation matrix


- **nullhandling {ZERO|IGNORE}**

    If a value in a selected column is NULL, the row that contains the NULL value is by default omitted from processing (nullhandling=IGNORE), or the value may be replaced in calculations with zero through the use of this parameter (nullhandling=ZERO).
    
- **outputdatabase**

    The database that contains the resulting matrix output table.  If outputdatabase and outputtablename are not both specified, a volatile output table with a randomly generated name is created in the logon user database and the results are returned to the user in a result data set.

- **outputtablename**

    The name of the output table representing one or more matrices. If group by columns are specified, there is a matrix for each combination of group by column values.  Note that the output table must first be dropped by the user before executing the function if outputdatabase and outputtablename are both specified. If outputdatabase and outputtablename are not both specified, a volatile output table with a randomly generated name is created in the logon user database, and the result set is returned to the user instead.

- **overwrite**

    When overwrite is set to true (default), the output tables are dropped before creating new ones.

- **where**

    An optional conditional expression may be specified with this parameter, limiting the amount of data used to build the matrix.  For example: where=income>0
    
- **columnstoexclude**

    If a column specifier such as all is used in the columns parameter, the columnstoexclude parameter may be used to exclude specific columns from the analysis.

---

1.  In this example, input columns age, years_with_bank, and nbr_children are used to build a 3-by-3 SSCP matrix. No permanent output table is created, just a result data set that is returned to the user.

In [17]:
call ${VALDB}.td_analyze('matrix',
                         'database=${XSPDB};
                          tablename=Customer;
                          columns=age,years_with_bank,nbr_children');

Success: 0 rows affected

WARNING: [Teradata Database] [Warning 3212] The stored procedure returned one or more result sets.

Unnamed: 0,rownum,rowname,c,s,age,years_with_bank,nbr_children
1,1,age,10458,463670,24438992.0,2774158.0,811274
2,2,years_with_bank,10458,60288,2774158.0,440030.0,107472
3,3,nbr_children,10458,18681,811274.0,107472.0,55743
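In the result above, the `c` and `s` columns appear to hold the row count and the per-column sum (an interpretation based on the output, not on a documented definition). Under that assumption, a Pearson correlation can be derived from these values alone. The Python sketch below does this for age and years_with_bank; the function name is hypothetical. The result, roughly 0.1689, agrees with the tot_age / tot_cust_years entry of the COR matrix built on VAL_ADS later in this notebook, where those two columns are derived from the same Customer columns.

```python
import math

def correlation_from_sums(n, sum_x, sum_y, sum_xx, sum_yy, sum_xy):
    """Pearson correlation from a count, two sums, and three cross-product sums."""
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_xx - sum_x ** 2) * (n * sum_yy - sum_y ** 2))
    return numerator / denominator

# Values copied from the age and years_with_bank rows of the matrix output above.
r = correlation_from_sums(
    n=10458,
    sum_x=463670,      # s for age
    sum_y=60288,       # s for years_with_bank
    sum_xx=24438992,   # age * age
    sum_yy=440030,     # years_with_bank * years_with_bank
    sum_xy=2774158,    # age * years_with_bank
)
print(round(r, 4))
```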


2.  In this example, we create a matrix on all columns in the VAL_ADS table and include a null handling parameter so that NULL values are replaced with zeros.

In [18]:
call ${VALDB}.td_analyze('matrix',
                         'database=${QLID};
                          tablename=VAL_ADS;
                          columns=all;
                          nullhandling=zero'); 

Success: 0 rows affected

WARNING: [Teradata Database] [Warning 3212] The stored procedure returned one or more result sets.

Unnamed: 0,rownum,rowname,c,s,cust_id,tot_income,tot_age,tot_cust_years,tot_children,single_ind,female_ind,married_ind,separated_ind,ca_resident_ind,ny_resident_ind,tx_resident_ind,il_resident_ind,az_resident_ind,oh_resident_ind,ck_acct_ind,sv_acct_ind,cc_acct_ind,ck_avg_bal,sv_avg_bal,cc_avg_bal,ck_avg_tran_amt,sv_avg_tran_amt,cc_avg_tran_amt,q1_trans_cnt,q2_trans_cnt,q3_trans_cnt,q4_trans_cnt
1,1,cust_id,10458,235193613039.0,5.605061202908375e+18,7382153006076687.0,10451007810693.0,1351764716889.0,382960438683.0,86898533337.0,131607318612.0,111142766427.0,15112424019.0,55728229095.0,33688165896.0,26132220114.0,17630902905.0,7556915982.0,7242449907.0,163722659100.0,132551873223.0,147349652373.0,843027488644901.0,283634891165244.0,392413000169559.2,-3221620783647.088,2262976268540.6445,3354134585820.8647,10212553346688.0,4962079304028.0,4366944378027.0,4878307683480.0
2,2,tot_income,10458,314433040.3999998,7382153006076687.0,24775516769360.93,15248945669.400003,1877134999.0000029,613402801.7999995,72416071.19999999,153541041.2,173816976.4000001,33307997.999999996,74546685.6,46867582.40000001,40318373.29999999,28935443.899999995,10319492.000000002,6967483.2,233869745.39999995,163889090.8,218096408.90000004,1966116548512.8877,488900028633.303,876408829020.602,-11105742226.438196,3175147576.273457,3492741598.1472096,14693312716.700008,6946693479.700003,6458329732.600001,6479122380.600003
3,3,tot_age,10458,463670.0,10451007810693.0,15248945669.400003,24438992.0,2774158.0,811274.0,122816.0,258612.0,260212.0,32440.0,100992.0,69394.0,52404.0,32690.0,16710.0,14584.0,334190.0,258130.0,303732.0,1758862996.3775003,507144912.0931873,838774672.6489916,-9012663.491866011,4402968.974371112,7701722.276337635,21311448.0,10171400.0,8928718.0,9606920.0
4,4,tot_cust_years,10458,60288.0,1351764716889.0,1877134999.0000029,2774158.0,440030.0,107472.0,21904.0,33674.0,29268.0,3698.0,14710.0,8592.0,6386.0,4242.0,1730.0,1760.0,40974.0,33640.0,37886.0,228281238.82380375,82470831.70812306,111965132.74941888,-1038973.3802908204,655462.4233111559,1247012.16200086,2817784.0,1252140.0,1059990.0,1101962.0
5,5,tot_children,10458,18681.0,382960438683.0,613402801.7999995,811274.0,107472.0,55743.0,4140.0,10400.0,10307.0,1630.0,4853.0,2571.0,1917.0,1540.0,500.0,457.0,13134.0,9899.0,11948.0,69265929.08656433,20499934.17181709,31344443.27460774,-507216.32276717183,210894.80505212292,261348.0939134376,800210.0,392206.0,363674.0,392234.0
6,6,single_ind,10458,3864.0,86898533337.0,72416071.19999999,122816.0,21904.0,4140.0,3864.0,2226.0,0.0,0.0,1022.0,518.0,420.0,280.0,98.0,126.0,2646.0,2436.0,2310.0,12013647.058261778,5557314.893578707,4890016.223760187,-58997.67888544466,50747.775700842016,13440.530920341414,167328.0,83440.0,65366.0,78162.0
7,7,female_ind,10458,5852.0,131607318612.0,153541041.2,258612.0,33674.0,10400.0,2226.0,5852.0,2646.0,420.0,1498.0,826.0,560.0,406.0,224.0,182.0,4242.0,3514.0,3892.0,18928529.443326376,7688963.487020183,9840089.08488188,-90329.28202659098,77021.57195308401,144430.21829351323,259462.0,135436.0,120638.0,128800.0
8,8,married_ind,10458,4942.0,111142766427.0,173816976.4000001,260212.0,29268.0,10307.0,0.0,2646.0,4942.0,0.0,1008.0,784.0,588.0,350.0,168.0,154.0,3178.0,2352.0,2940.0,16962212.090544082,5009031.801771553,8743716.29845681,-96543.2818100779,43377.1646382011,128074.57518885972,203980.0,95144.0,85106.0,93268.0
9,9,separated_ind,10458,672.0,15112424019.0,33307997.999999996,32440.0,3698.0,1630.0,0.0,420.0,0.0,672.0,126.0,56.0,70.0,70.0,42.0,0.0,574.0,504.0,574.0,4450835.512659114,1258277.5356145615,2104194.7355719022,-15557.673690589769,7454.756559167712,13452.780683728124,35434.0,16492.0,17542.0,18676.0
10,10,ca_resident_ind,10458,2478.0,55728229095.0,74546685.6,100992.0,14710.0,4853.0,1022.0,1498.0,1008.0,126.0,2478.0,0.0,0.0,0.0,0.0,0.0,1652.0,1288.0,1428.0,8752358.991255801,3729855.503148928,4198524.73389717,-46027.59031224319,22240.09220457638,-13616.637203621543,99288.0,53970.0,46382.0,51282.0


3.  In this example, an output table is requested to persist a copy of the created matrix.  Note that you must select the data out of the created matrix table in order to view it, since a result set is not returned to the user.

In [19]:
call ${VALDB}.td_analyze('matrix', 
                         'database=${XSPDB};
                          tablename=Customer;
                          columns=age,years_with_bank,nbr_children;
                          outputdatabase=${QLID};
                          outputtablename=_matrix1b');

Success: 0 rows affected

In [20]:
SELECT * FROM ${QLID}._matrix1b ORDER BY 1;

Unnamed: 0,rownum,rowname,c,s,age,years_with_bank,nbr_children
1,1,age,10458,463670,24438992.0,2774158.0,811274
2,2,years_with_bank,10458,60288,2774158.0,440030.0,107472
3,3,nbr_children,10458,18681,811274.0,107472.0,55743


4.  In this example, an SQL WHERE clause is requested to limit the amount of data passed to the CALCMATRIX table operator.

In [21]:
call ${VALDB}.td_analyze('matrix',
                         'database=${XSPDB};
                          tablename=Customer;
                          columns=age,years_with_bank,nbr_children;
                          where=nbr_children > 1');

Success: 0 rows affected

WARNING: [Teradata Database] [Warning 3212] The stored procedure returned one or more result sets.

Unnamed: 0,rownum,rowname,c,s,age,years_with_bank,nbr_children
1,1,age,5694,248004,12400826.0,1476989.0,702483
2,2,years_with_bank,5694,32858,1476989.0,237892.0,93807
3,3,nbr_children,5694,16247,702483.0,93807.0,53309


5.  In this example, an SQL GROUP BY clause is added to the requested matrix so that two matrices are built and returned in the same result set (for gender=’F’ and gender=’M’).

In [22]:
call ${VALDB}.td_analyze('matrix',
                         'database=${XSPDB};
                          tablename=Customer;
                          columns=age,years_with_bank,nbr_children;
                          groupby=gender');

Success: 0 rows affected

WARNING: [Teradata Database] [Warning 3212] The stored procedure returned one or more result sets.

Unnamed: 0,gender,rownum,rowname,c,s,age,years_with_bank,nbr_children
1,F,1,age,5852,258612,13543076.0,1537978.0,449522
2,F,2,years_with_bank,5852,33674,1537978.0,246558.0,59235
3,F,3,nbr_children,5852,10400,449522.0,59235.0,30618
4,M,1,age,4606,205058,10895916.0,1236180.0,361752
5,M,2,years_with_bank,4606,26614,1236180.0,193472.0,48237
6,M,3,nbr_children,4606,8281,361752.0,48237.0,25125


6.  In each of the following examples, one matrix of each type is created on the entire VAL_ADS table and returned to the user as a result set.

In [24]:
call ${VALDB}.td_analyze('matrix',
                         'database=${QLID};
                          tablename=VAL_ADS;
                          columns=all;
                          matrixtype=COR');

Success: 0 rows affected

WARNING: [Teradata Database] [Warning 3212] The stored procedure returned one or more result sets.

Unnamed: 0,rownum,rowname,cust_id,tot_income,tot_age,tot_cust_years,tot_children,single_ind,female_ind,married_ind,separated_ind,ca_resident_ind,ny_resident_ind,tx_resident_ind,il_resident_ind,az_resident_ind,oh_resident_ind,ck_acct_ind,sv_acct_ind,cc_acct_ind,ck_avg_bal,sv_avg_bal,cc_avg_bal,ck_avg_tran_amt,sv_avg_tran_amt,cc_avg_tran_amt,q1_trans_cnt,q2_trans_cnt,q3_trans_cnt,q4_trans_cnt
1,1,cust_id,1.0,0.1412946456708041,0.0211123722413809,-0.0238372339924585,-0.4421831374999713,-1.1236786421808636e-05,-1.2354043575014405e-05,1.4051837064770115e-05,-2.97082840062444e-05,-1.5471488896554525e-05,-4.367143171818019e-05,-2.2347758073979595e-05,-4.935417102885042e-05,4.882040107001244e-05,8.860350985050037e-05,7.474935410600362e-06,-1.2298094183544178e-05,-2.025604987040682e-05,-0.0061466034246248,0.0029444241013742,0.0024441569692593,0.2046041312307351,-0.0789635098700659,-0.0126513028841951,1.91884961628778e-05,2.208561582870242e-05,-2.9683514759868893e-05,5.798001414825925e-06
2,2,tot_income,0.1412946456708041,1.0,0.1696258430209875,0.054184260301849,0.0883617251086536,-0.2264932212412574,-0.1127550574229519,0.1262454475436492,0.1334965142083685,0.0002495246889912,0.0130380096341067,0.0427772282864815,0.0508811984625131,0.0030773669791132,-0.0392460968770752,0.0814051707336478,-0.0671036577139375,0.1089797942222013,0.2807896874292352,0.0747860360667138,0.24879358886445,-0.1568481738555244,-0.0077929036187127,-0.0074420600063776,0.0573476276662624,0.0338353641410542,0.0705807295328443,-0.0044285198239375
3,3,tot_age,0.0211123722413809,0.1696258430209875,1.0,0.1689159390707854,-0.0575997949536377,-0.4987347205842626,-0.0084439474964538,0.4086176520959074,0.0535569512769904,-0.1035784395840155,0.0421938702923144,0.0139787760999015,-0.0390099662166678,0.0510282579659698,0.0088402077837453,0.1232497826416744,-0.0319123257461797,0.1358488036432587,0.0612309569290736,-0.0682732190226801,0.0927531828355919,-0.0374894067778946,-0.0500955030267798,0.0038317577628572,0.1290563705593974,0.083584488291551,0.0722507873259208,-0.0021316782434362
4,4,tot_cust_years,-0.0238372339924585,0.054184260301849,0.1689159390707854,1.0,-0.0048305325438105,-0.0247213483218938,-0.0039804052003674,0.0501397398799925,-0.0230695737080027,0.0321305806557784,-0.0040047283139217,-0.0319906468862426,-0.0338940372887251,-0.0377384675571068,-0.0179169372269459,-0.0694607933546959,-0.0218868896610625,0.0076579079746313,0.0497944590225521,0.0864347379099397,0.104521985220472,0.0177582951090139,0.0102976689531109,0.0192157334559999,0.1419197488272439,-0.0275601911072393,-0.0870481204850979,-0.1980967023196854
5,5,tot_children,-0.4421831374999713,0.0883617251086536,-0.0575997949536377,-0.0048305325438105,1.0,-0.374130847042647,-0.0070265080797411,0.1936920505952978,0.1145382176110784,0.0655851400804262,-0.0195684482339289,-0.0330060346280507,0.0346440109659725,-0.0371443153371272,-0.0447258845070455,0.0184530755544946,-0.0829649932218186,0.0330083804162256,0.0184435707020491,-0.0350382987688359,0.003976623838364,-0.1286712526342487,0.0165098598896443,-0.0044450605442513,-0.0158002291217842,-0.0054353297233926,0.0500810917054593,0.0129078792993875
6,6,single_ind,-1.1236786421808636e-05,-0.2264932212412574,-0.4987347205842626,-0.0247213483218938,-0.374130847042647,1.0,0.0254663106082173,-0.7245749728853184,-0.200597778094832,0.0495887102696642,-0.0200633635893703,-0.0058835745713736,-0.0072753127901688,-0.0293720624898218,0.0080599634432258,-0.0188659690389971,0.1031804465321708,-0.045385576487929,-0.0500907301264278,0.0487016031124478,-0.0866988206470783,0.0243806160861626,0.0381262854482425,-0.0190763655040368,-0.0019824593049144,0.016452299305296,-0.0575842248196356,-0.0163025443873298
7,7,female_ind,-1.2354043575014405e-05,-0.1127550574229519,-0.0084439474964538,-0.0039804052003674,-0.0070265080797411,0.0254663106082173,1.0,-0.0460665885225788,0.0345368489649397,0.0504539972716346,-0.0067289176846431,-0.0552964066406474,-0.0239208332830169,0.0393043274523894,0.0020269939072792,0.0704894311969009,0.083845322681454,0.0898643639636585,-0.0547403093135827,0.0337604146798975,0.0048211043770158,0.0337251927859849,0.0567629525855943,0.0202921674164958,0.0227787623920972,0.0998248754472341,0.1051442401579383,0.0592879498151235
8,8,married_ind,1.4051837064770115e-05,0.1262454475436492,0.4086176520959074,0.0501397398799925,0.1936920505952978,-0.7245749728853184,-0.0460665885225788,1.0,-0.2480397429451644,-0.0734192502891822,0.0416117183266771,0.0237006115094521,-0.0148988127235249,0.0100150903055388,0.0020363733789961,-0.1091939791794409,-0.1673193471771213,-0.0618433496357785,-0.0207932470103656,-0.0486387972892151,0.0275247046452559,-0.0164859947790472,-0.0339887861939007,0.0194069558500965,-0.0448506769014703,-0.0756238261356849,-0.0580790966601478,-0.0733927909899261
9,9,separated_ind,-2.97082840062444e-05,0.1334965142083685,0.0535569512769904,-0.0230695737080027,0.1145382176110784,-0.200597778094832,0.0345368489649397,-0.2480397429451644,1.0,-0.0304737196629524,-0.0448118831275773,-0.0057905131826509,0.0290572451812167,0.0451329689860786,-0.0467064291839411,0.0900491060268042,0.0984979942081849,0.1233290998688263,0.1067284850148672,0.0474863768672899,0.1084684182315341,-0.0155055140867258,0.0025325614506757,0.0022446397870841,0.0538309745618518,0.0390678791464207,0.0899879327472179,0.0766393929800714
10,10,ca_resident_ind,-1.5471488896554525e-05,0.0002495246889912,-0.1035784395840155,0.0321305806557784,0.0655851400804262,0.0495887102696642,0.0504539972716346,-0.0734192502891822,-0.0304737196629524,1.0,-0.2278510882595311,-0.1970172314131031,-0.1586368929275272,-0.101527964103917,-0.0993216304491623,-0.0356826652518508,-0.049229975487547,-0.0578695069359484,-0.0047626890993329,0.0455576449595504,0.0044033674634921,-0.003523675019524,-0.0178414926354156,-0.0240099428860668,-0.0412492580540407,0.0164503538871208,0.003806359587659,-0.0010799702331046


In [25]:
call ${VALDB}.td_analyze('matrix',
                         'database=${QLID};
                          tablename=VAL_ADS;
                          columns=all;
                          matrixtype=COV');

Success: 0 rows affected

WARNING: [Teradata Database] [Warning 3212] The stored procedure returned one or more result sets.

Unnamed: 0,rownum,rowname,cust_id,tot_income,tot_age,tot_cust_years,tot_children,single_ind,female_ind,married_ind,separated_ind,ca_resident_ind,ny_resident_ind,tx_resident_ind,il_resident_ind,az_resident_ind,oh_resident_ind,ck_acct_ind,sv_acct_ind,cc_acct_ind,ck_avg_bal,sv_avg_bal,cc_avg_bal,ck_avg_tran_amt,sv_avg_tran_amt,cc_avg_tran_amt,q1_trans_cnt,q2_trans_cnt,q3_trans_cnt,q4_trans_cnt
1,1,cust_id,30191252728660.5,29717757234.452354,2234991.5647256942,-389515.3986261581,-3553896.157998159,-29.802167453403555,-33.70046927693561,38.54866112628769,-40.0292546296883,-36.14905101903262,-84.0659326605456,-38.5920436071531,-71.41634646072096,47.30606695693552,84.10558903875999,18.89134044065715,-33.51416414438475,-53.84191715700902,-251079874.0261969,59742715.22801361,47530614.03233938,97368406.16897756,-22051883.90021593,-34338865.248516224,4777.318183200614,2803.4248851579214,-3579.70913473061,767.9992602732866
2,2,tot_income,29717757234.452354,1465208799.6863174,125095.03277496425,6168.095835454952,4947.387591431844,-4184.755849639347,-2142.7526574757067,2412.687709762913,1253.0807385994208,4.061509805123158,174.84092501465295,514.6188018658436,512.9089666370063,20.773227672093245,-259.5252360434688,1433.2320587440513,-1273.931168977963,2017.998939892702,79903730.1940098,10570956.844271008,33704874.02017164,-519986.8030175577,-15160.99660433262,-140719.09908892377,99464.44807767175,29919.853156773617,59296.32141830574,-4086.488397169633
3,3,tot_age,2234991.5647256942,125095.03277496425,371.19026190601465,9.678256407369666,-1.6232343135023186,-4.638024604874504,-0.0807662769915529,3.930529295787594,0.2530308668930287,-0.8485782087900229,0.2847932484136283,0.0846429292446314,-0.1979279202814253,0.1733739970880941,0.0294234859171472,1.0921925565255508,-0.3049346344608295,1.2661329068785423,8770.11878155771,-4857.27500901944,6324.568091551428,-62.5561411866763,-49.05414553887208,36.46753868853356,112.66276236244586,37.20170279793107,30.551512863477758,-0.9900607818414648
4,4,tot_cust_years,-389515.3986261581,6168.095835454952,9.678256407369666,8.84415630801461,-0.0210128619506784,-0.0354866919144494,-0.0058768112518929,0.0744467782193131,-0.0168239180303503,0.0406322622420445,-0.0041723746856989,-0.0299002263236747,-0.0265451209063086,-0.0197918958995588,-0.009205032811748,-0.0950129292151872,-0.0322821360991445,0.0110170048079858,1100.8936454454647,949.206165039006,1100.1188819563397,4.573962181241336,1.5564881450232009,28.22898676514929,19.123778272696796,-1.893430084495964,-5.681715097935973,-14.20193003053622
5,5,tot_children,-3553896.157998159,4947.387591431844,-1.6232343135023186,-0.0210128619506784,2.1395575882677966,-0.2641500303595562,-0.0051025561555776,0.1414521046795962,0.0410839110482284,0.0407935653871102,-0.010027678851583,-0.0151732491791782,0.0133451468684338,-0.0095814068169013,-0.0113019736976019,0.0124149653985551,-0.060187580195507,0.0233566954055103,200.5594531510124,-189.2557258570697,20.58639331784357,-16.300713521013677,1.2273941929499597,-3.2118095885327334,-1.0471961480808958,-0.1836651377432845,1.607783721670655,0.4551544612033291
6,6,single_ind,-29.802167453403555,-4184.755849639347,-4.638024604874504,-0.0354866919144494,-0.2641500303595562,0.232986262732867,0.0061026356549848,-0.1746160313050999,-0.0237438229536679,0.0101782284536443,-0.0033927428178814,-0.0008925440693634,-0.0009248046983765,-0.002500198748518,0.0006720964377736,-0.0041885050002054,0.0247008882810576,-0.0105976166308151,-179.74576272624049,86.80653891748118,-148.1092257267799,1.0192317154522694,0.9353374834228912,-4.54852841234908,-0.0433582853936545,0.1834554436546991,-0.6100431178668965,-0.1896978753687409
7,7,female_ind,-33.70046927693561,-2142.7526574757067,-0.0807662769915529,-0.0058768112518929,-0.0051025561555776,0.0061026356549848,0.2464747901746926,-0.0114184704134826,0.004204635314712,0.010651384345837,-0.0011703439303098,-0.0086279260038464,-0.0031274887571067,0.0034411337614011,0.0001738489452374,0.0160962616203873,0.0206450103112395,0.0215823608097878,-202.03640937475228,61.892503696987326,8.47103646126265,1.4501194896987506,1.4322865603219983,4.976509172896594,0.5124117009301431,1.1448902940185082,1.14568067942933,0.7095689506295879
8,8,married_ind,38.54866112628769,2412.687709762913,3.930529295787594,0.0744467782193131,0.1414521046795962,-0.1746160313050999,-0.0114184704134826,0.249270711355831,-0.0303680054443652,-0.0155872605848467,0.0072783563567969,0.0037189336223476,-0.0019589370839643,0.000881790526359,0.0001756412024048,-0.0250754700290435,-0.0414316089387033,-0.0149366712330818,-77.17809968103673,-89.67316120708242,48.63647129366612,-0.7128758136857697,-0.8624818572427241,4.786335791171002,-1.0146290430921352,-0.8722342111424884,-0.636425143370972,-0.8833462055803463
9,9,separated_ind,-40.0292546296883,1253.0807385994208,0.2530308668930287,-0.0168239180303503,0.0410839110482284,-0.0237438229536679,0.004204635314712,-0.0303680054443652,0.0601338124804852,-0.0031776719577938,-0.0038497683955675,-0.0004462720346817,0.001876493254264,0.0019517680552947,-0.0019786519128056,0.0101567213676356,0.0119794469068777,0.0146301952574571,194.569989708576,43.00039355647138,94.1383193675685,-0.3293128179079062,0.0315644370282525,0.2719043437208439,0.598128192218045,0.2213186685731162,0.4843234466027061,0.4530575203174753
10,10,ca_resident_ind,-36.14905101903262,4.061509805123158,-0.8485782087900229,0.0406322622420445,0.0407935653871102,0.0101782284536443,0.010651384345837,-0.0155872605848467,-0.0031776719577938,0.1808208256186263,-0.033943558493321,-0.026330050046221,-0.0177648530432334,-0.0076135084471,-0.0072962789284708,-0.0069790494098417,-0.0103825457707275,-0.0119041721058471,-15.056103723718223,71.53680105972653,6.626938149000817,-0.1297726482619922,-0.3855973986050686,-5.043423878639075,-0.7947728563676145,0.1615988674982996,0.0355243293149647,-0.0110707725230077


In [26]:
call ${VALDB}.td_analyze('matrix',
                         'database=${QLID};
                          tablename=VAL_ADS;
                          columns=all;
                          matrixtype=CSSCP');

Success: 0 rows affected

WARNING: [Teradata Database] [Warning 3212] The stored procedure returned one or more result sets.

Unnamed: 0,rownum,rowname,cust_id,tot_income,tot_age,tot_cust_years,tot_children,single_ind,female_ind,married_ind,separated_ind,ca_resident_ind,ny_resident_ind,tx_resident_ind,il_resident_ind,az_resident_ind,oh_resident_ind,ck_acct_ind,sv_acct_ind,cc_acct_ind,ck_avg_bal,sv_avg_bal,cc_avg_bal,ck_avg_tran_amt,sv_avg_tran_amt,cc_avg_tran_amt,q1_trans_cnt,q2_trans_cnt,q3_trans_cnt,q4_trans_cnt
1,1,cust_id,3.157099297836029e+17,310758587400668.25,23371306792.336582,-4073162523.433735,-37163092124.186745,-311641.265060241,-352405.8072289157,403103.3493975904,-418585.9156626506,-378010.6265060241,-879077.4578313254,-403557.0,-746800.734939759,494679.5421686747,879492.1445783132,197546.7469879518,-350457.6144578313,-563024.9277108434,-2625542242691.941,624729573139.3383,497027630936.17285,1018181423308.998,-230596549944.55804,-359081513903.7342,49956416.24172882,29315414.024096385,-37433018.42187799,8030968.264677758
2,2,tot_income,310758587400668.25,15321688418319.822,1308118757.727801,64499778.15135244,51734832.0436028,-43759991.91967864,-22406764.539223462,25229475.380990807,13103465.283534143,42471.20803217287,1828311.5528782203,5381368.811111126,5363489.064123169,217225.64176707616,-2713855.3933065534,14987307.638286544,-13321498.234002557,21102214.914457984,835553306638.7604,110540495720.54192,352451867628.9347,-5437501999.154601,-158538541.4915062,-1471499619.172876,1040099733.5482109,312871904.4603817,620061633.0712231,-42732409.16920286
3,3,tot_age,23371306792.336582,1308118757.727801,3881536.568751195,101205.5272518646,-16974.161216293745,-48499.82329317269,-844.5729585006693,41101.54484605087,2645.9437751004016,-8873.582329317269,2978.082998661312,885.1111111111111,-2069.732262382865,1812.9718875502008,307.6813922356091,11421.057563587685,-3188.7014725568943,13239.951807228916,91709132.09874935,-50792524.76931629,66136008.53335328,-654149.5683890741,-512959.1998999839,381341.0520659954,1178114.5060240964,389018.2061579652,319477.1700133869,-10353.065595716198
4,4,tot_cust_years,-4073162523.433735,64499778.15135244,101205.5272518646,92483.34251290878,-219.7314974182444,-371.0843373493976,-61.45381526104418,778.4899598393574,-175.9277108433735,424.89156626506025,-43.630522088353416,-312.6666666666667,-277.58232931726906,-206.9638554216868,-96.2570281124498,-993.5502008032128,-337.574297188755,115.20481927710844,11512044.850423224,9925848.867812883,11503943.148617445,47829.92252924064,16276.196532507613,295190.5146031663,199977.34939759035,-19799.5983935743,-59413.694779116464,-148509.58232931726
5,5,tot_children,-37163092124.186745,51734832.0436028,-16974.161216293745,-219.7314974182444,22373.353700516356,-2762.21686746988,-53.3574297188755,1479.1646586345382,429.6144578313253,426.578313253012,-104.859437751004,-158.66666666666666,139.55020080321285,-100.19277108433737,-118.18473895582328,129.82329317269077,-629.3815261044176,244.24096385542168,2097250.201600125,-1979047.1252873784,215271.914924696,-170456.5612892399,12834.861075677729,-33585.89286728684,-10950.530120481928,-1920.586345381526,16812.59437751004,4759.550200803213
6,6,single_ind,-311641.265060241,-43759991.91967864,-48499.82329317269,-371.0843373493976,-2762.21686746988,2436.3373493975905,63.81526104417671,-1825.9598393574297,-248.289156626506,106.43373493975903,-35.47791164658634,-9.333333333333334,-9.670682730923694,-26.14457831325301,7.028112449799197,-43.7991967871486,258.2971887550201,-110.81927710843374,-1879601.4408282968,907735.9774601014,-1548778.1734249373,10658.106048484393,9780.824064153174,-47563.96160793432,-453.3975903614458,1918.3935742971887,-6379.220883534136,-1983.670682730924
7,7,female_ind,-352405.8072289157,-22406764.539223462,-844.5729585006693,-61.45381526104418,-53.3574297188755,63.81526104417671,2577.3868808567604,-119.40294511378848,43.967871485943775,111.38152610441767,-12.238286479250334,-90.22222222222224,-32.704149933065594,35.98393574297189,1.8179384203480589,168.3186077643909,215.88487282463183,225.6867469879518,-2112694.7328317817,647209.9111593965,88581.62827542353,15163.899503779836,14977.42056128715,52039.35642097968,5358.289156626506,11972.11780455154,11980.382864792504,7419.962516733601
8,8,married_ind,403103.3493975904,25229475.380990807,41101.54484605087,778.4899598393574,1479.1646586345382,-1825.9598393574297,-119.40294511378848,2606.623828647925,-317.55823293172693,-162.99598393574297,76.10977242302543,38.888888888888886,-20.484605087014724,9.220883534136544,1.8366800535475236,-262.21419009370817,-433.25033467202144,-156.19277108433735,-807051.3883646011,-937712.2467424608,508591.5803178651,-7454.542383712116,-9018.972781187154,50050.71336827517,-10609.97590361446,-9120.953145917,-6655.097724230254,-9237.15127175368
9,9,separated_ind,-418585.9156626506,13103465.283534143,2645.9437751004016,-175.9277108433735,429.6144578313253,-248.289156626506,43.967871485943775,-317.55823293172693,628.8192771084338,-33.2289156626506,-40.2570281124498,-4.666666666666667,19.62248995983936,20.40963855421687,-20.69076305220884,106.20883534136546,125.26907630522088,152.9879518072289,2034618.3823825791,449655.1154200214,984404.4056266634,-3443.624136862975,330.0693180044355,2843.3037222888647,6254.626506024097,2314.3293172690765,5064.570281124498,4737.622489959839
10,10,ca_resident_ind,-378010.6265060241,42471.20803217287,-8873.582329317269,424.89156626506025,426.578313253012,106.43373493975903,111.38152610441767,-162.99598393574297,-33.2289156626506,1890.843373493976,-354.9477911646586,-275.3333333333333,-185.7670682730924,-79.6144578313253,-76.29718875502007,-72.97991967871486,-108.570281124498,-124.48192771084338,-157441.67663892146,748060.3286815602,69297.89222410154,-1357.0325828756468,-4032.1919972132023,-52739.083498928805,-8310.939759036144,1689.8393574297188,371.4779116465864,-115.76706827309236


In [25]:
call ${VALDB}.td_analyze('matrix',
                         'database=${QLID};
                          tablename=VAL_ADS;
                          columns=all;
                          matrixtype=ESSCP');

Success: 0 rows affected

WARNING: [Teradata Database] [Warning 3212] The stored procedure returned one or more result sets.

Unnamed: 0,rownum,rowname,c,s,cust_id,tot_income,tot_age,tot_cust_years,tot_children,single_ind,female_ind,married_ind,separated_ind,ca_resident_ind,ny_resident_ind,tx_resident_ind,il_resident_ind,az_resident_ind,oh_resident_ind,ck_acct_ind,sv_acct_ind,cc_acct_ind,ck_avg_bal,sv_avg_bal,cc_avg_bal,ck_avg_tran_amt,sv_avg_tran_amt,cc_avg_tran_amt,q1_trans_cnt,q2_trans_cnt,q3_trans_cnt,q4_trans_cnt
1,1,cust_id,10458,235193613039.0,5.605061202908375e+18,7382153006076687.0,10451007810693.0,1351764716889.0,382960438683.0,86898533337.0,131607318612.0,111142766427.0,15112424019.0,55728229095.0,33688165896.0,26132220114.0,17630902905.0,7556915982.0,7242449907.0,163722659100.0,132551873223.0,147349652373.0,843027488644901.0,283634891165244.0,392413000169559.2,-3221620783647.088,2262976268540.6445,3354134585820.8647,10212553346688.0,4962079304028.0,4366944378027.0,4878307683480.0
2,2,tot_income,10458,314433040.3999998,7382153006076687.0,24775516769360.93,15248945669.400003,1877134999.0000029,613402801.7999995,72416071.19999999,153541041.2,173816976.40000007,33307997.999999996,74546685.59999998,46867582.40000001,40318373.29999999,28935443.899999995,10319492.000000002,6967483.2,233869745.39999995,163889090.8,218096408.90000004,1966116548512.8877,488900028633.30304,876408829020.602,-11105742226.438196,3175147576.273457,3492741598.1472096,14693312716.700008,6946693479.700003,6458329732.6,6479122380.600003
3,3,tot_age,10458,463670.0,10451007810693.0,15248945669.400003,24438992.0,2774158.0,811274.0,122816.0,258612.0,260212.0,32440.0,100992.0,69394.0,52404.0,32690.0,16710.0,14584.0,334190.0,258130.0,303732.0,1758862996.3775,507144912.0931873,838774672.6489916,-9012663.491866011,4402968.974371112,7701722.276337633,21311448.0,10171400.0,8928718.0,9606920.0
4,4,tot_cust_years,10458,60288.0,1351764716889.0,1877134999.0000029,2774158.0,440030.0,107472.0,21904.0,33674.0,29268.0,3698.0,14710.0,8592.0,6386.0,4242.0,1730.0,1760.0,40974.0,33640.0,37886.0,228281238.8238038,82470831.70812306,111965132.74941888,-1038973.3802908204,655462.4233111559,1247012.16200086,2817784.0,1252140.0,1059990.0,1101962.0
5,5,tot_children,10458,18681.0,382960438683.0,613402801.7999995,811274.0,107472.0,55743.0,4140.0,10400.0,10307.0,1630.0,4853.0,2571.0,1917.0,1540.0,500.0,457.0,13134.0,9899.0,11948.0,69265929.08656433,20499934.17181709,31344443.274607737,-507216.32276717183,210894.80505212292,261348.0939134376,800210.0,392206.0,363674.0,392234.0
6,6,single_ind,10458,3864.0,86898533337.0,72416071.19999999,122816.0,21904.0,4140.0,3864.0,2226.0,0.0,0.0,1022.0,518.0,420.0,280.0,98.0,126.0,2646.0,2436.0,2310.0,12013647.058261778,5557314.893578706,4890016.223760187,-58997.67888544468,50747.77570084202,13440.530920341414,167328.0,83440.0,65366.0,78162.0
7,7,female_ind,10458,5852.0,131607318612.0,153541041.2,258612.0,33674.0,10400.0,2226.0,5852.0,2646.0,420.0,1498.0,826.0,560.0,406.0,224.0,182.0,4242.0,3514.0,3892.0,18928529.443326376,7688963.487020183,9840089.08488188,-90329.28202659098,77021.57195308401,144430.21829351323,259462.0,135436.0,120638.0,128800.0
8,8,married_ind,10458,4942.0,111142766427.0,173816976.40000007,260212.0,29268.0,10307.0,0.0,2646.0,4942.0,0.0,1008.0,784.0,588.0,350.0,168.0,154.0,3178.0,2352.0,2940.0,16962212.090544082,5009031.801771553,8743716.298456812,-96543.2818100779,43377.1646382011,128074.57518885972,203980.0,95144.0,85106.0,93268.0
9,9,separated_ind,10458,672.0,15112424019.0,33307997.999999996,32440.0,3698.0,1630.0,0.0,420.0,0.0,672.0,126.0,56.0,70.0,70.0,42.0,0.0,574.0,504.0,574.0,4450835.512659114,1258277.5356145615,2104194.7355719027,-15557.673690589769,7454.756559167714,13452.780683728124,35434.0,16492.0,17542.0,18676.0
10,10,ca_resident_ind,10458,2478.0,55728229095.0,74546685.59999998,100992.0,14710.0,4853.0,1022.0,1498.0,1008.0,126.0,2478.0,0.0,0.0,0.0,0.0,0.0,1652.0,1288.0,1428.0,8752358.991255801,3729855.503148928,4198524.733897169,-46027.5903122432,22240.09220457638,-13616.637203621543,99288.0,53970.0,46382.0,51282.0


In [27]:
call ${VALDB}.td_analyze('matrix',
                         'database=${QLID};
                          tablename=VAL_ADS;
                          columns=all;
                          matrixtype=SSCP');

Success: 0 rows affected

WARNING: [Teradata Database] [Warning 3212] The stored procedure returned one or more result sets.

Unnamed: 0,rownum,rowname,cust_id,tot_income,tot_age,tot_cust_years,tot_children,single_ind,female_ind,married_ind,separated_ind,ca_resident_ind,ny_resident_ind,tx_resident_ind,il_resident_ind,az_resident_ind,oh_resident_ind,ck_acct_ind,sv_acct_ind,cc_acct_ind,ck_avg_bal,sv_avg_bal,cc_avg_bal,ck_avg_tran_amt,sv_avg_tran_amt,cc_avg_tran_amt,q1_trans_cnt,q2_trans_cnt,q3_trans_cnt,q4_trans_cnt
1,1,cust_id,5.605061202908376e+18,7382153006076688.0,10451007810693.0,1351764716889.0,382960438683.0,86898533337.0,131607318612.0,111142766427.0,15112424019.0,55728229095.0,33688165896.0,26132220114.0,17630902905.0,7556915982.0,7242449907.0,163722659100.0,132551873223.0,147349652373.0,843027488644901.1,283634891165244.0,392413000169559.1,-3221620783647.088,2262976268540.6445,3354134585820.8647,10212553346688.0,4962079304028.0,4366944378027.0,4878307683480.0
2,2,tot_income,7382153006076688.0,24775516769360.934,15248945669.400003,1877134999.0000024,613402801.7999995,72416071.19999999,153541041.2,173816976.40000007,33307997.999999996,74546685.59999998,46867582.40000002,40318373.29999999,28935443.9,10319492.000000002,6967483.2,233869745.39999992,163889090.8,218096408.90000004,1966116548512.8877,488900028633.303,876408829020.602,-11105742226.438196,3175147576.273457,3492741598.14721,14693312716.700008,6946693479.700003,6458329732.600001,6479122380.600003
3,3,tot_age,10451007810693.0,15248945669.400003,24438992.0,2774158.0,811274.0,122816.0,258612.0,260212.0,32440.0,100992.0,69394.0,52404.0,32690.0,16710.0,14584.0,334190.0,258130.0,303732.0,1758862996.3775,507144912.0931872,838774672.6489916,-9012663.491866011,4402968.974371112,7701722.276337634,21311448.0,10171400.0,8928718.0,9606920.0
4,4,tot_cust_years,1351764716889.0,1877134999.0000024,2774158.0,440030.0,107472.0,21904.0,33674.0,29268.0,3698.0,14710.0,8592.0,6386.0,4242.0,1730.0,1760.0,40974.0,33640.0,37886.0,228281238.82380375,82470831.70812306,111965132.74941888,-1038973.3802908204,655462.4233111559,1247012.16200086,2817784.0,1252140.0,1059990.0,1101962.0
5,5,tot_children,382960438683.0,613402801.7999995,811274.0,107472.0,55743.0,4140.0,10400.0,10307.0,1630.0,4853.0,2571.0,1917.0,1540.0,500.0,457.0,13134.0,9899.0,11948.0,69265929.08656435,20499934.17181709,31344443.27460774,-507216.32276717183,210894.80505212292,261348.0939134376,800210.0,392206.0,363674.0,392234.0
6,6,single_ind,86898533337.0,72416071.19999999,122816.0,21904.0,4140.0,3864.0,2226.0,0.0,0.0,1022.0,518.0,420.0,280.0,98.0,126.0,2646.0,2436.0,2310.0,12013647.058261778,5557314.893578707,4890016.223760187,-58997.67888544468,50747.77570084202,13440.530920341414,167328.0,83440.0,65366.0,78162.0
7,7,female_ind,131607318612.0,153541041.2,258612.0,33674.0,10400.0,2226.0,5852.0,2646.0,420.0,1498.0,826.0,560.0,406.0,224.0,182.0,4242.0,3514.0,3892.0,18928529.443326376,7688963.487020183,9840089.08488188,-90329.28202659098,77021.57195308401,144430.21829351323,259462.0,135436.0,120638.0,128800.0
8,8,married_ind,111142766427.0,173816976.40000007,260212.0,29268.0,10307.0,0.0,2646.0,4942.0,0.0,1008.0,784.0,588.0,350.0,168.0,154.0,3178.0,2352.0,2940.0,16962212.090544082,5009031.801771553,8743716.298456812,-96543.28181007788,43377.1646382011,128074.57518885972,203980.0,95144.0,85106.0,93268.0
9,9,separated_ind,15112424019.0,33307997.999999996,32440.0,3698.0,1630.0,0.0,420.0,0.0,672.0,126.0,56.0,70.0,70.0,42.0,0.0,574.0,504.0,574.0,4450835.512659114,1258277.5356145613,2104194.7355719027,-15557.673690589769,7454.756559167713,13452.780683728124,35434.0,16492.0,17542.0,18676.0
10,10,ca_resident_ind,55728229095.0,74546685.59999998,100992.0,14710.0,4853.0,1022.0,1498.0,1008.0,126.0,2478.0,0.0,0.0,0.0,0.0,0.0,1652.0,1288.0,1428.0,8752358.9912558,3729855.503148928,4198524.733897169,-46027.59031224319,22240.09220457638,-13616.637203621543,99288.0,53970.0,46382.0,51282.0
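
The five matrix types above are related by simple arithmetic. The following is a minimal sketch of those relationships on hypothetical toy data, not the CALCMATRIX implementation; the variable names are illustrative. It assumes the standard definitions, which appear consistent with the outputs shown: for tot_age, the SSCP diagonal minus s²/c from the ESSCP row gives the CSSCP diagonal, and the CSSCP diagonal divided by the COV diagonal is about 10457, that is, c - 1 with c = 10458.

```python
from math import sqrt

# Hypothetical toy data: three rows, two columns (x, y).
rows = [(1.0, 2.0), (2.0, 4.0), (3.0, 9.0)]
n = len(rows)          # the "c" (count) column of an ESSCP matrix
k = len(rows[0])

# Column sums: the "s" column of an ESSCP matrix.
sums = [sum(r[i] for r in rows) for i in range(k)]

# SSCP: raw sums of squares and cross-products.
sscp = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]

# CSSCP: cross-products of deviations from the column means.
csscp = [[sscp[i][j] - sums[i] * sums[j] / n for j in range(k)] for i in range(k)]

# COV: sample covariance, assuming an n - 1 divisor.
cov = [[csscp[i][j] / (n - 1) for j in range(k)] for i in range(k)]

# COR: covariance scaled by the two standard deviations.
cor = [[cov[i][j] / sqrt(cov[i][i] * cov[j][j]) for j in range(k)] for i in range(k)]

for name, matrix in (("SSCP", sscp), ("CSSCP", csscp), ("COV", cov), ("COR", cor)):
    print(name, matrix)
```

Because every derived matrix can be computed from the count, the sums, and the raw cross-products, an ESSCP matrix carries enough information to produce the other four types.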


---

## Linear Regression

### Purpose

Linear Regression is one of the fundamental types of predictive modeling algorithms. In linear regression, a dependent numeric variable is expressed as the sum of one or more independent numeric variables, each multiplied by a numeric coefficient, usually with a constant term added to the sum. A linear regression model consists of the coefficients of the independent variables together with the constant term. Applying these coefficients to the variables (columns) of each observation (row) in a data set (table) is known as scoring, as described in Linear Regression Scoring.
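
As a minimal sketch of what scoring means, the snippet below applies a set of coefficients and a constant term to each row. The coefficient values, the rows, and the `score` helper are hypothetical and chosen only for illustration; they are not taken from a VAL model.

```python
def score(row, coefficients, constant=0.0):
    """Return constant + sum(coefficient * column value) for one observation."""
    return constant + sum(b * row[name] for name, b in coefficients.items())

# Hypothetical model: one coefficient per independent column plus a constant.
coefficients = {"tot_age": 2.0, "tot_children": -10.0}
constant = 100.0

# Hypothetical observations keyed by column name.
rows = [
    {"cust_id": 1, "tot_age": 30, "tot_children": 2},
    {"cust_id": 2, "tot_age": 50, "tot_children": 0},
]

# One predicted value per row, keyed by the index column.
scores = {r["cust_id"]: score(r, coefficients, constant) for r in rows}
print(scores)
```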

Some key features of the VAL version of linear regression are outlined below.

- The Teradata-supplied table operator CALCMATRIX is used to build a table that represents an extended cross-products matrix, which is the input to the algorithm.
- One or more group by columns may optionally be specified so that an input matrix is built for each combination of group by column values, and subsequently a separate linear model is built for each matrix. To achieve this, the names of the group by columns are passed to CALCMATRIX as parameters, so it includes them as columns in the matrix table it creates.
- The algorithm is partially scalable because the size of each input matrix depends only on the number of independent variables (columns) and not on the size of the input table. The calculations performed on the client workstation, however, are not scalable when group by columns are used, because each model is built serially from each matrix in the matrix table.
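
The scalability point above can be illustrated with a small sketch: once an extended cross-products matrix exists, the coefficients can be obtained from that matrix alone, without another pass over the rows. This example uses the textbook normal equations with a hand-written solver on hypothetical data; it is an assumption about the general technique, not a description of how VAL computes its models, and all function names are illustrative.

```python
def extended_sscp(rows):
    """Cross-products of [1, v1, ..., vk]: holds the count, sums, and SSCP."""
    ext = [[1.0, *r] for r in rows]
    m = len(ext[0])
    return [[sum(e[i] * e[j] for e in ext) for j in range(m)] for i in range(m)]

def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def fit_from_matrix(m, predictors, dependent):
    """Return [constant, b1, ...] from the normal equations, using only m."""
    idx = [0] + predictors              # index 0 is the constant term
    xtx = [[m[i][j] for j in idx] for i in idx]
    xty = [m[i][dependent] for i in idx]
    return solve(xtx, xty)

# Hypothetical rows (x1, x2, y) generated from y = 1 + 2*x1 + 3*x2.
rows = [(0.0, 0.0, 1.0), (1.0, 0.0, 3.0), (0.0, 1.0, 4.0),
        (1.0, 1.0, 6.0), (2.0, 1.0, 8.0)]

matrix = extended_sscp(rows)            # 4 x 4, however many rows there are
coefs = fit_from_matrix(matrix, predictors=[1, 2], dependent=3)
print(coefs)
```

The matrix here is 4 x 4 whether the input has five rows or five million, which is why the matrix-building step can be pushed into the database while the model fit stays small.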

### Required Parameters

- **columns**

    The input columns representing the independent variables used in building a linear regression model.  The columns must reside in the table named with the tablename parameter, residing in the database named with the database parameter.  For example, columns=c1,c2,c3. When columns=all is entered, all columns in the input table are analyzed.  Other options include allnumeric.

- **database**

    The database containing the input table.

- **dependent** 

    The name of the column that represents the dependent variable.

- **tablename**

    The input table to build a predictive model from.

- **Linear**

    The Linear parameter:
    - Is required
    - Must be the first parameter
    - Is always enclosed in single quotes


### Optional Parameters

- **constant**

    Set to true if the linear model includes a constant term or false otherwise. The default value is true.

- **groupby**

    The input columns dividing the input table into partitions, one for each combination of values in the group by columns. For each partition or combination of values a separate linear model is built.  The columns must reside in the table named with the tablename parameter. The default case is no group by columns.  For example: groupby=column1,column2,column3

- **matrixdatabase**

    The database where the matrix table named by the matrixtablename parameter resides, if a matrix table is specified.

- **matrixtablename**

    Instead of internally building a matrix with the Matrix function each time this analysis is performed, the user may build an ESSCP matrix once with the Matrix analysis and save it to a table with this name in matrixdatabase.  The matrix can subsequently be read from this table instead of being rebuilt each time.  If the matrix table is specified, the columns specified with the columns parameter may be a subset of the columns in this matrix and may be specified in any order.  All of the columns must, however, be present in the matrix.  Further, if group by columns are specified in the matrix, these same group by columns must be specified in this analysis.

- **outputdatabase**

    The database that contains the resulting output table that represents one or more linear models.  If outputdatabase and outputtablename are not both specified, a volatile output table with a randomly generated name is created in the logon user database.
    
- **outputtablename**

    The name of the output table representing one or more linear models. A second output table reporting statistical measures is automatically named on the user’s behalf by appending _rpt to the end of this name. In the absence of group by columns, these two output tables represent a single linear model with coefficients and statistical measures; if group by columns are specified, the tables contain a model for each combination of group by column values.  Note that both of the output tables must first be dropped by the user if outputdatabase and outputtablename are both specified. If outputdatabase and outputtablename are not both specified, volatile output tables with randomly generated names are created in the logon user database, and the two output result sets are returned to the user instead.

- **overwrite**

    When overwrite is set to true (default), the output tables are dropped before creating new ones.
    
- **neardependencyreport**

    If neardependencyreport=true, an XML report showing columns that may be collinear is produced and stored in the output table if specified.  Two threshold parameters are available for this report: conditionindexthreshold (default 30) and varianceproportionthreshold (default 0.5).  The report is included in the XML output only if collinearity is detected.

- **conditionindexthreshold**

    The condition index threshold used by the near dependency report that is produced when neardependencyreport=true.  The default value is 30.

- **varianceproportionthreshold**

    The variance proportion threshold used by the near dependency report that is produced when neardependencyreport=true.  The default value is 0.5.
    
- **columnstoexclude**

    If a column specifier such as all is used in the columns parameter, the columnstoexclude parameter may be used to exclude specific columns from the analysis.  For convenience, when the columnstoexclude parameter is used, dependent variable and group by columns, if any, are automatically excluded as input columns and do not need to be included as columnstoexclude.

---

1.  Using the VAL_ADS table, build a linear model to estimate the average monthly balance (cc_avg_bal) that a banking customer carries on their credit card, based on the non-credit-card variables listed in the columns parameter.  Coefficients and model variable statistics are created in the LinearRegressionDemo1 table, as specified by the outputtablename argument.  Note that model statistics are created in the LinearRegressionDemo1_rpt table.

In [28]:
call ${VALDB}.td_analyze('linear',
                         'database=${QLID};
                          tablename=VAL_ADS;
                          columns=tot_age,tot_income,tot_cust_years,tot_children,single_ind,female_ind,married_ind,separated_ind,ck_acct_ind,sv_acct_ind,sv_avg_bal,ck_avg_bal,ca_resident_ind,ny_resident_ind,tx_resident_ind,il_resident_ind,az_resident_ind,oh_resident_ind;
                          dependent=cc_avg_bal;
                          neardependencyreport=true;
                          outputdatabase=${QLID};
                          outputtablename=LinearRegressionDemo1');

Success: 0 rows affected

In [29]:
SELECT * FROM ${QLID}.LinearRegressionDemo1 order by 2 DESC;

Unnamed: 0,Column Name,B Coefficient,Standard Error,T Statistic,P-Value,Lower,Upper,Standard Coefficient,Incremental R-Squared,Squared Multiple Correlation Coefficient (1-Tolerance)
1,separated_ind,1042.63279236674,171.56621549145,6.07714513827867,1.26614430051575e-09,706.330196115911,1378.93538861756,0.0722415383177219,0.0790668145987975,0.383309900589348
2,(Constant),768.003873280991,203.608286980909,3.77196765745101,0.0001628600491014,368.892688519353,1167.11505804263,0.0,0.0,0.0
3,female_ind,264.93590247466,67.7629345826051,3.90974659091387,9.29775251463649e-05,132.107590301272,397.764214648048,0.0371640729405268,0.0733944614184985,0.0355219653108901
4,oh_resident_ind,143.338235100739,196.826407221627,0.728246972162328,0.466478751654328,-242.479168225117,529.155638426595,0.0069966742183706,0.0902949047278045,0.0559047780695759
5,tot_cust_years,87.8241823128999,11.4801283097871,7.65010459317143,2.17603712826531e-14,65.320935127463,110.327429498337,0.0737969742281909,0.0715791600301097,0.0635181507440152
6,ny_resident_ind,16.0525708652234,103.207273743472,0.155537204723797,0.876400832999393,-186.253425175292,218.358566905739,0.0015889962430425,0.0880342926919872,0.165042119215222
7,tot_age,7.83627163006634,2.16098941419085,3.62624248809683,0.0002889452410728,3.60031906573905,12.0722241943936,0.0426583452138762,0.0086031529261327,0.370280118550217
8,ck_avg_bal,0.0279729509394047,0.0049312307387892,5.67261043361213,1.44380833955182e-08,0.0183067955403579,0.0376391063384515,0.0587584946742787,0.0878589163496794,0.187793751664804
9,sv_avg_bal,0.0269067316652558,0.0097613717234493,2.75644985433953,0.0058534022730103,0.0077725761197602,0.0460408872107513,0.0280738005083615,0.0850681541973243,0.159883522197768
10,tot_income,0.0203204625702955,0.0009404299424358,21.607630354328,0.0,0.0184770400157603,0.0221638851248307,0.219775655902042,0.064529395682861,0.157641731303966
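
The columns of this result set are related by simple arithmetic, which the short Python check below illustrates using the separated_ind row. The Lower and Upper columns are not described in this notebook; the arithmetic suggests a 95% confidence interval around the coefficient, but that reading is an inference, not a documented fact.

```python
# Values copied from the separated_ind row of the result set above.
b_coefficient = 1042.63279236674
standard_error = 171.56621549145
lower, upper = 706.330196115911, 1378.93538861756

# T Statistic = B Coefficient / Standard Error
t_statistic = b_coefficient / standard_error

# The interval is symmetric around the coefficient; dividing its half-width
# by the standard error recovers the multiplier that was used.
midpoint = (lower + upper) / 2
multiplier = (upper - lower) / 2 / standard_error

print(round(t_statistic, 4))  # compare with the T Statistic column
print(round(midpoint, 4))     # compare with the B Coefficient column
print(round(multiplier, 3))   # close to 1.96, consistent with a 95% interval
```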


In [30]:
SELECT * FROM ${QLID}.LinearRegressionDemo1_rpt order by 2 DESC;

Unnamed: 0,rid,Total Observations,Total Sum of Squares,Multiple Correlation Coefficient (R):,Squared Multiple Correlation Coefficient (1-Tolerance),Adjusted R-Squared,Standard Error of Estimate,Regression Sum of Squares,Regression Degrees of Freedom,Regression Mean-Square,Regression F Ratio,Regression P-Value,Residual Sum of Squares,Residual Degrees of Freedom,Residual Mean-Square,Output Database,Output Tablename,Dependent
1,1,10458,130982910494.896,0.300491105904658,0.0902949047278044,0.088726297417248,3378.53313798446,11827089424.1072,18,657060523.561511,57.5637408547898,0,119155821070.789,10439,11414486.1644592,demo_user,VAL_ADS,cc_avg_bal
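
As a sanity check on how the summary statistics fit together, the following Python sketch recomputes several of them from the sums of squares and degrees of freedom in the row above. The formulas are the standard textbook ones for ordinary least squares; that VAL uses exactly these formulas is an assumption, although the recomputed values agree with the reported ones.

```python
import math

# Values copied from the LinearRegressionDemo1_rpt row above.
total_ss = 130982910494.896
regression_ss = 11827089424.1072
residual_ss = 119155821070.789
regression_df = 18
residual_df = 10439
n_obs = 10458

# R-squared: share of the total variation explained by the regression.
r_squared = regression_ss / total_ss

# Adjusted R-squared penalizes the number of independent variables.
adjusted_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / residual_df

# F ratio: regression mean square over residual mean square.
residual_mean_square = residual_ss / residual_df
f_ratio = (regression_ss / regression_df) / residual_mean_square

# Standard error of estimate: square root of the residual mean square.
std_error_of_estimate = math.sqrt(residual_mean_square)

print(r_squared, adjusted_r_squared, f_ratio, std_error_of_estimate)
```

An R-squared of about 0.09 means the model explains roughly 9% of the variation in cc_avg_bal, even though the overall regression is statistically significant.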


Note - To view the HTML report, double-click the contents of the 'html' column returned by calling 'report'; alternatively, right-click the contents of the 'html' column and select 'Show Cell as Text...', or copy the contents of the cell and create an HTML report.

In [31]:
call ${VALDB}.td_analyze ('report',
                          'database=${QLID};
                           tablename=LinearRegressionDemo1;
                           analysistype=linear');

Success: 0 rows affected

WARNING: [Teradata Database] [Warning 3212] The stored procedure returned one or more result sets.

Unnamed: 0,id,html
1,1,Linear Regression SummaryDatabasedemo_userTablenameVAL_ADSIndependentVariables18DependentVariablecc_avg_balConstantIncludeConstantStepwisenone

0,1
Database,demo_user
Tablename,VAL_ADS
IndependentVariables,18
DependentVariable,cc_avg_bal
Constant,IncludeConstant
Stepwise,none


2)  To showcase the "group by" feature of Linear Regression, let's build a model for each state_code.  To do this, the original ADS is first modified to include state_code instead of the state indicator variables:

In [32]:
CREATE TABLE ${QLID}.VAL_ADS2 AS (
    SELECT 
        T1.cust_id  AS cust_id
       ,MIN(T1.income) AS tot_income
       ,MIN(T1.age) AS tot_age
       ,MIN(T1.years_with_bank) AS tot_cust_years
       ,MIN(T1.nbr_children) AS tot_children
       ,CASE WHEN MIN(T1.marital_status) = 1 THEN 1 ELSE 0 END AS single_ind
       ,CASE WHEN MIN(T1.gender) = 'F' THEN 1 ELSE 0 END AS female_ind
       ,CASE WHEN MIN(T1.marital_status) = 2 THEN 1 ELSE 0 END AS married_ind
       ,CASE WHEN MIN(T1.marital_status) = 3 THEN 1 ELSE 0 END AS separated_ind
       ,MAX(CASE WHEN T1.state_code = 'CA' THEN 'CA'
                 WHEN T1.state_code = 'NY' THEN 'NY'
                 WHEN T1.state_code = 'TX' THEN 'TX'
                 WHEN T1.state_code = 'IL' THEN 'IL'
                 WHEN T1.state_code = 'AZ' THEN 'AZ'
                 WHEN T1.state_code = 'OH' THEN 'OH' ELSE 'OTHER' END) AS state_code
       ,MAX(CASE WHEN T2.acct_type = 'CK' THEN 1 ELSE 0 END) AS ck_acct_ind
       ,MAX(CASE WHEN T2.acct_type = 'SV' THEN 1 ELSE 0 END) AS sv_acct_ind
       ,MAX(CASE WHEN T2.acct_type = 'CC' THEN 1 ELSE 0 END) AS cc_acct_ind
       ,AVG(CASE WHEN T2.acct_type = 'CK' THEN T2.starting_balance+T2.ending_balance ELSE 0 END) AS ck_avg_bal
       ,AVG(CASE WHEN T2.acct_type = 'SV' THEN T2.starting_balance+T2.ending_balance ELSE 0 END) AS sv_avg_bal
       ,AVG(CASE WHEN T2.acct_type = 'CC' THEN T2.starting_balance+T2.ending_balance ELSE 0 END) AS cc_avg_bal
       ,AVG(CASE WHEN T2.acct_type = 'CK' THEN T3.principal_amt+T3.interest_amt ELSE 0 END) AS ck_avg_tran_amt
       ,AVG(CASE WHEN T2.acct_type = 'SV' THEN T3.principal_amt+T3.interest_amt ELSE 0 END) AS sv_avg_tran_amt
       ,AVG(CASE WHEN T2.acct_type = 'CC' THEN T3.principal_amt+T3.interest_amt ELSE 0 END) AS cc_avg_tran_amt
       ,COUNT(CASE WHEN ((EXTRACT(MONTH FROM T3.tran_date) + 2) / 3) = 1 THEN T3.tran_id ELSE NULL END) AS q1_trans_cnt
       ,COUNT(CASE WHEN ((EXTRACT(MONTH FROM T3.tran_date) + 2) / 3) = 2 THEN T3.tran_id ELSE NULL END) AS q2_trans_cnt
       ,COUNT(CASE WHEN ((EXTRACT(MONTH FROM T3.tran_date) + 2) / 3) = 3 THEN T3.tran_id ELSE NULL END) AS q3_trans_cnt
       ,COUNT(CASE WHEN ((EXTRACT(MONTH FROM T3.tran_date) + 2) / 3) = 4 THEN T3.tran_id ELSE NULL END) AS q4_trans_cnt
    FROM ${XSPDB}.Customer AS T1
        LEFT OUTER JOIN ${XSPDB}.Accounts AS T2
            ON T1.cust_id = T2.cust_id
        LEFT OUTER JOIN ${XSPDB}.Transactions AS T3
            ON T2.acct_nbr = T3.acct_nbr
GROUP BY T1.cust_id) WITH DATA UNIQUE PRIMARY INDEX (cust_id);

Success: 0 rows affected

In [33]:
SELECT * FROM ${QLID}.VAL_ADS2 SAMPLE 10;

Unnamed: 0,cust_id,tot_income,tot_age,tot_cust_years,tot_children,single_ind,female_ind,married_ind,separated_ind,state_code,ck_acct_ind,sv_acct_ind,cc_acct_ind,ck_avg_bal,sv_avg_bal,cc_avg_bal,ck_avg_tran_amt,sv_avg_tran_amt,cc_avg_tran_amt,q1_trans_cnt,q2_trans_cnt,q3_trans_cnt,q4_trans_cnt
1,28627683,5117.3,34,4,4,0,1,1,0,OTHER,1,1,1,4217.9878125,709.456125,305.96,-14.1744875,1.83841875,2.92235,110,21,20,9
2,25900686,4612.2,60,7,1,0,1,0,1,OTHER,0,1,1,0.0,1022.5917,7989.512666666666,0.0,42.07663333333333,152.79356666666666,21,7,2,0
3,29987254,14284.6,79,9,0,0,1,1,0,NY,0,1,1,0.0,5524.053752941177,6131.313741176471,0.0,-7.690752941176471,7.265223529411765,19,19,15,32
4,21813232,32006.0,40,3,1,1,1,0,0,CA,1,1,1,4666.748694915254,224.77364406779665,801.9209039548023,-7.795949152542373,2.4558474576271188,-13.409418079096046,9,116,27,25
5,16354932,9184.8,69,9,3,0,0,1,0,CA,1,0,1,238.65336,0.0,590.43618,-2.077,0.0,20.899933333333333,116,12,9,13
6,17717973,25983.1,41,8,4,0,1,0,1,OTHER,1,0,1,1244.1804,0.0,3278.022066666667,31.55038888888889,0.0,36.05322222222223,26,22,19,23
7,24539418,71983.8,42,0,2,1,1,0,0,AZ,1,1,0,4513.0275,206.4525,0.0,-162.29179545454545,3.690310606060606,0.0,6,11,105,10
8,25894302,11446.0,46,3,4,0,1,1,0,CA,1,1,1,2374.594824324324,1597.143963963964,820.8101801801802,33.9775990990991,9.862436936936938,25.972072072072077,33,66,96,27
9,23171187,64076.1,60,6,0,0,0,0,0,NY,1,1,1,9061.22913402062,1164.25212371134,1030.3105154639177,-18.32652577319588,14.986458762886596,42.76696391752577,71,71,32,20
10,16359924,2585.1,43,3,3,1,1,0,0,CA,1,1,1,2027.921033898305,206.08222033898303,1672.8813559322034,-29.69248587570621,1.712768361581921,-37.18988700564972,9,116,27,25


In [34]:
%meta

Result Set ID: /home/jovyan/JupyterLabRoot/Teradata/Resultsets/2022.09.16_20.25.03.629_UTC
History ID:    235
Rows:          10 of 10
Parts:         2
Column Definitions:
    cust_id: INTEGER
    tot_income: DECIMAL(15, 1)
    tot_age: INTEGER
    tot_cust_years: INTEGER
    tot_children: INTEGER
    single_ind: BYTEINT
    female_ind: BYTEINT
    married_ind: BYTEINT
    separated_ind: BYTEINT
    state_code: VARCHAR(5)
    ck_acct_ind: BYTEINT
    sv_acct_ind: BYTEINT
    cc_acct_ind: BYTEINT
    ck_avg_bal: FLOAT(0, 0)
    sv_avg_bal: FLOAT(0, 0)
    cc_avg_bal: FLOAT(0, 0)
    ck_avg_tran_amt: FLOAT(0, 0)
    sv_avg_tran_amt: FLOAT(0, 0)
    cc_avg_tran_amt: FLOAT(0, 0)
    q1_trans_cnt: INTEGER
    q2_trans_cnt: INTEGER
    q3_trans_cnt: INTEGER
    q4_trans_cnt: INTEGER


Now, build a linear regression model for each state_code.  Coefficients and variable statistics are created in the LinearRegressionDemo2 table, as specified by the outputtablename argument.  Note that model statistics are created in the LinearRegressionDemo2_rpt table, one row per state_code.

In [35]:
call ${VALDB}.td_analyze('linear',
                         'database=${QLID};
                          tablename=VAL_ADS2;
                          columns=tot_age,tot_income,tot_cust_years,tot_children,single_ind,married_ind,separated_ind,female_ind,ck_acct_ind,sv_acct_ind,sv_avg_bal,ck_avg_bal;
                          dependent=cc_avg_bal;
                          outputdatabase=${QLID};
                          outputtablename=LinearRegressionDemo2;
                          groupby=state_code');

Success: 0 rows affected

In [36]:
SELECT * FROM ${QLID}.LinearRegressionDemo2 ORDER BY 1, 2;

Unnamed: 0,state_code,Column Name,B Coefficient,Standard Error,T Statistic,P-Value,Lower,Upper,Standard Coefficient,Incremental R-Squared,Squared Multiple Correlation Coefficient (1-Tolerance)
1,AZ,(Constant),-1339.44052399292,564.73808730409,-2.37179066562883,0.0182879617308801,-2450.46987174179,-228.41117624405,0.0,0.0,0.0
2,AZ,ck_acct_ind,234.859324924772,201.018461158362,1.16834704420381,0.24352841926255,-160.611451364482,630.330101214026,0.0579428825488167,0.185267905301922,0.219846574990083
3,AZ,ck_avg_bal,0.0647434003368303,0.015188141305836,4.26275994100429,2.65352177768463e-05,0.0348632291426137,0.094623571531047,0.214774368705887,0.380217931774224,0.244119165299696
4,AZ,female_ind,990.447417255364,197.347515906873,5.0187985022459,8.60611593100913e-07,602.198622270494,1378.69621224023,0.28363961978334,0.178829453480332,0.399238331296591
5,AZ,married_ind,588.229130076109,315.600341938741,1.86384186551448,0.0632510701894486,-32.6626601826073,1209.12092033483,0.178672724410335,0.142027945314128,0.791196347840124
6,AZ,separated_ind,217.174789339178,380.620551427962,0.570580827872827,0.568680530960531,-531.633574218421,965.983152896777,0.043632507249396,0.144437531868086,0.67186655722361
7,AZ,single_ind,-389.390794324363,347.600299156543,-1.12022571692034,0.263449839705402,-1073.23723983824,294.455651189512,-0.107520096778798,0.134090014286211,0.791710072718693
8,AZ,sv_acct_ind,-685.898591471242,201.861765073255,-3.39786284550882,0.0007641198677847,-1083.02842957974,-288.768753362748,-0.189392980035364,0.185307898165738,0.382381325956212
9,AZ,sv_avg_bal,0.206400977536822,0.0224512726032062,9.19328633100939,0.0,0.162231789523404,0.250570165550239,0.450400845961503,0.345350638368201,0.200571433910011
10,AZ,tot_age,17.7377185163389,6.89921789235563,2.57097526025267,0.0105886115100067,4.16464149666593,31.3107955360119,0.160235262476375,0.0599020635636228,0.506012354444638


In [37]:
SELECT * FROM ${QLID}.LinearRegressionDemo2_rpt ORDER BY 1, 2;

Unnamed: 0,rid,state_code,Total Observations,Total Sum of Squares,Multiple Correlation Coefficient (R):,Squared Multiple Correlation Coefficient (1-Tolerance),Adjusted R-Squared,Standard Error of Estimate,Regression Sum of Squares,Regression Degrees of Freedom,Regression Mean-Square,Regression F Ratio,Regression P-Value,Residual Sum of Squares,Residual Degrees of Freedom,Residual Mean-Square,Output Database,Output Tablename,Dependent
1,1,AZ,336,910449417.162019,0.61661814097075,0.380217931774224,0.357191972583173,1321.74036548687,346169194.378391,12.0,28847432.8648659,16.5125773315017,0.0,564280222.783629,323.0,1746997.59375736,demo_user,VAL_ADS2,cc_avg_bal
2,2,CA,2478,44113786169.4622,0.371007294884161,0.137646412857263,0.133448342656162,3928.45146734839,6072104423.7788,12.0,506008701.981567,32.7880207484806,0.0,38041681745.6834,2465.0,15432730.9313117,demo_user,VAL_ADS2,cc_avg_bal
3,3,IL,784,6715073765.02844,0.528971708848751,0.279811068762367,0.268601902517424,2504.50224893863,1878951967.01074,12.0,156579330.584228,24.9627012971269,0.0,4836121798.0177,771.0,6272531.51493865,demo_user,VAL_ADS2,cc_avg_bal
4,4,NY,1498,12610643233.2239,0.364545658972794,0.132893537475908,0.125886616566623,2713.5735932287,1675872989.10975,12.0,139656082.425813,18.9660393197528,0.0,10934770244.1142,1485.0,7363481.64586814,demo_user,VAL_ADS2,cc_avg_bal
5,5,OH,322,1963052345.80181,,,,,,,,,,,,,demo_user,VAL_ADS2,cc_avg_bal
6,6,OTHER,3878,57026027903.0422,0.36748558029803,0.13504565172698,0.132360153103623,3572.38621732915,7701117103.56728,12.0,641759758.630607,50.2869934664676,0.0,49324910799.475,3865.0,12761943.2857632,demo_user,VAL_ADS2,cc_avg_bal
7,7,TX,1162,7382858427.86142,0.470017274356145,0.22091623819318,0.212779593161255,2237.40359845336,1630993310.99596,12.0,135916109.249663,27.1507774182591,0.0,5751865116.86546,1149.0,5005974.86237203,demo_user,VAL_ADS2,cc_avg_bal
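
The OH row above has no statistics, and the report for this model (shown further down) carries the message "Constant columns detected...run terminated." for state_code OH. One way to spot such a group before modeling is to check, per group, whether any independent column takes only a single value. The Python sketch below does this on a few made-up rows; the helper name `constant_columns_by_group` and the sample data are hypothetical, and this does not describe how VAL performs its own check.

```python
from collections import defaultdict

def constant_columns_by_group(rows, group_col, columns):
    """Return {group value: [columns with a single distinct value]}."""
    seen = defaultdict(lambda: defaultdict(set))
    for row in rows:
        for col in columns:
            seen[row[group_col]][col].add(row[col])
    return {
        group: [col for col in columns if len(values[col]) == 1]
        for group, values in seen.items()
    }

# Made-up rows for illustration only; not taken from VAL_ADS2.
rows = [
    {"state_code": "AZ", "ck_acct_ind": 1, "tot_age": 34},
    {"state_code": "AZ", "ck_acct_ind": 0, "tot_age": 60},
    {"state_code": "OH", "ck_acct_ind": 1, "tot_age": 41},
    {"state_code": "OH", "ck_acct_ind": 1, "tot_age": 52},
]

result = constant_columns_by_group(rows, "state_code", ["ck_acct_ind", "tot_age"])
print(result)
```

In this toy data, ck_acct_ind never varies within the OH group, so a regression for that group could not estimate a coefficient for it separately from the constant term.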


Note - To view the HTML report, double-click the contents of the 'html' column returned by calling 'report'; alternatively, right-click the contents of the 'html' column and select 'Show Cell as Text...', or copy the contents of the cell and create an HTML report.

In [38]:
call ${VALDB}.td_analyze ('report',
                          'database=${QLID};
                           tablename=LinearRegressionDemo2;
                           analysistype=linear');

Success: 0 rows affected

WARNING: [Teradata Database] [Warning 3212] The stored procedure returned one or more result sets.

Unnamed: 0_level_0,state_code,html
1,OTHER,Linear Regression SummaryDatabasedemo_userTablenameVAL_ADS2IndependentVariables12DependentVariablecc_avg_balConstantIncludeConstantStepwisenoneGroup By ColumnsColumn NameValuestate_codeOTHER
2,NY,Linear Regression SummaryDatabasedemo_userTablenameVAL_ADS2IndependentVariables12DependentVariablecc_avg_balConstantIncludeConstantStepwisenoneGroup By ColumnsColumn NameValuestate_codeNY

0,1
Database,demo_user
Tablename,VAL_ADS2
IndependentVariables,12
DependentVariable,cc_avg_bal
Constant,IncludeConstant
Stepwise,none

Column Name,Value
state_code,OTHER

0,1
Database,demo_user
Tablename,VAL_ADS2
IndependentVariables,12
DependentVariable,cc_avg_bal
Constant,IncludeConstant
Stepwise,none

Column Name,Value
state_code,NY

0,1
Database,demo_user
Tablename,VAL_ADS2
IndependentVariables,12
DependentVariable,cc_avg_bal
Constant,IncludeConstant
Stepwise,none

Column Name,Value
state_code,TX

0,1
Database,demo_user
Tablename,VAL_ADS2
IndependentVariables,12
DependentVariable,cc_avg_bal
Constant,IncludeConstant
Stepwise,none
ErrorMessage,Constant columns detected...run terminated.

Column Name,Value
state_code,OH

0,1
Database,demo_user
Tablename,VAL_ADS2
IndependentVariables,12
DependentVariable,cc_avg_bal
Constant,IncludeConstant
Stepwise,none

Column Name,Value
state_code,IL

0,1
Database,demo_user
Tablename,VAL_ADS2
IndependentVariables,12
DependentVariable,cc_avg_bal
Constant,IncludeConstant
Stepwise,none

Column Name,Value
state_code,CA

0,1
Database,demo_user
Tablename,VAL_ADS2
IndependentVariables,12
DependentVariable,cc_avg_bal
Constant,IncludeConstant
Stepwise,none

Column Name,Value
state_code,AZ


---

## Linear Regression Scoring

### Purpose

Linear Regression Scoring is the application of a Linear Regression model to an input table that contains the same independent variable columns contained in the model. The result is an output score table that minimally contains one or more key columns and an estimate of the dependent variable in the model. The user may also choose to perform model evaluation, either separately or in combination with scoring. When requested, a report is produced as a result data set containing the standard error of estimate as well as the minimum, maximum, and average absolute error. When model evaluation is requested, the input table must contain a column representing the dependent variable in the model. When both scoring and evaluation are requested, the output table automatically includes the residual value, calculated as the difference between the original value and the predicted value of the dependent variable. The residual value can also be requested when only scoring is performed.
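
The scoring step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the SQL that VAL generates: the helper `score_row`, the two-variable model, and the input row are all hypothetical.

```python
def score_row(row, constant, coefficients):
    """Return the linear-model estimate for one observation."""
    return constant + sum(b * row[name] for name, b in coefficients.items())

# Hypothetical two-variable model and input row (not taken from the demo tables).
constant = 100.0
coefficients = {"tot_income": 0.02, "tot_age": 8.0}
row = {"cust_id": 1, "tot_income": 5000.0, "tot_age": 30, "cc_avg_bal": 500.0}

estimate = score_row(row, constant, coefficients)
residual = row["cc_avg_bal"] - estimate  # original value minus predicted value

print(row["cust_id"], estimate, residual)
```

The key column (cust_id) is carried through so each score can be tied back to its observation, which mirrors the role of the index columns in the score output table.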

### Required Parameters

- **database**

    The database containing the input table.

- **modeldatabase**

    The database containing the model input table.

- **modeltablename**

    The input table containing the linear model to use in scoring. This table must have been created by the linear function, where it is named with the outputtablename parameter.

- **tablename**

    The input table to score.

- **Linearscore**

    The Linearscore parameter:
    - Is required
    - Must be the first parameter
    - Is always enclosed in single quotes


### Optional Parameters

- **index**

    By default, the primary index columns of the score output table are the primary index columns of the input table. This parameter allows the user to specify one or more different columns for the primary index of the score output table. Regardless of whether the user uses the default setting or specifies different columns, the index columns are included both in the Primary Index clause and the select list. In addition, the index columns must form a unique key for the score output table; otherwise, there may be more than one score for a given observation.

- **outputdatabase**

    The database that contains the output score table. If outputdatabase and outputtablename are not both specified, a volatile output table with a randomly generated name is created in the logon user database.

- **outputtablename**

    The name of the score output table containing key columns and predicted values of the dependent variable in the linear model. The output table may also contain retained columns passed through from the input to the output table unchanged, as well as a residual value containing the difference between the actual and predicted values of the dependent variable column. The output table may also contain group by columns if these are present in the model table.  If the output table exists, it must first be dropped by the user if outputdatabase and outputtablename are both specified. If outputdatabase and outputtablename are not both specified because only model evaluation is being performed, a volatile output table with a randomly generated name is created in the logon user database, and the output result set is returned to the user instead.

- **overwrite**

    When overwrite is set to true (default), the output tables are dropped before creating new ones.

- **predicted**

    If the score method is score or score and evaluate, the name of the predicted value column is entered here. If not entered here, the name of the dependent column in the input table is used.

- **residual**

    If the score method is score and evaluate, the name of a column that contains the residual value (the difference between the predicted and actual value of the dependent variable) is given here.  By default, this column is named “Residual”.

- **retain**

    One or more columns from the input table can optionally be specified here to be passed along to the score output table.
    
- **gensqlonly**

    When true, the SQL for the requested function is returned as a result set but not run. When not specified or set to false, the SQL is run but not returned.

- **samplescoresize**

    When a scoring function produces a score table, the user has the option to view a sample of the rows using the "samplescoresize=n" parameter, where n is an integer number of rows to view in a result set.  Cases where a sample is not returned include when you are only generating SQL and when you are only evaluating (i.e. not scoring).  By default, a sample of output score rows is not returned.

- **scoringmethod**

    Three scoring methods are available as outlined below. By default, the model is scored but not evaluated.
    - Score
    - Evaluate
    - Score and Evaluate

---

1) First, let's score the single Linear Regression model created above; for demonstration purposes, we'll use the same VAL_ADS table to score.  The minimum, maximum and average absolute error, and the standard error of estimate are returned as a result set.  Because we are evaluating as well as scoring, the scored data set includes the actual value along with the prediction and the residual.

In [39]:
call ${VALDB}.td_analyze('linearscore',
                         'database=${QLID};
                          tablename=VAL_ADS;
                          modeldatabase=${QLID};
                          modeltablename=LinearRegressionDemo1;
                          outputdatabase=${QLID};
                          outputtablename=LinearRegressionScore1;
                          predicted=estimate;
                          retain=cc_avg_bal;
                          scoringmethod=scoreandevaluate;');

Success: 0 rows affected

In [40]:
SELECT * FROM ${QLID}.LinearRegressionScore1 SAMPLE 25;

Unnamed: 0,cust_id,cc_avg_bal,estimate,Residual
1,16350696,0.0,1299.177337026138,-1299.177337026138
2,29992248,2150.454622754491,2945.1147324105395,-794.6601096560483
3,13629520,3652.0726785714287,962.2402228213996,2689.832455750029
4,25900116,2206.005,1096.9488321358497,1109.0561678641504
5,25895689,0.0,1910.2424909120896,-1910.2424909120896
6,16351824,0.0,120.20264492450517,-120.20264492450517
7,25888906,4030.0,1364.6985014784896,2665.3014985215104
8,24541668,1420.84,2124.1750105705373,-703.3350105705375
9,28633122,2004.8605813953488,1483.702742457408,521.1578389379409
10,14987533,0.0,662.7602224787081,-662.7602224787081
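
The relationship between the columns above can be verified directly: Residual is the actual value minus the estimate. The Python sketch below checks this for the ten rows shown and computes absolute-error figures over that sample only; these will not match the evaluation result set returned by the procedure, which covers the whole input table.

```python
# (cc_avg_bal, estimate, Residual) copied from the ten sample rows above.
sample = [
    (0.0, 1299.177337026138, -1299.177337026138),
    (2150.454622754491, 2945.1147324105395, -794.6601096560483),
    (3652.0726785714287, 962.2402228213996, 2689.832455750029),
    (2206.005, 1096.9488321358497, 1109.0561678641504),
    (0.0, 1910.2424909120896, -1910.2424909120896),
    (0.0, 120.20264492450517, -120.20264492450517),
    (4030.0, 1364.6985014784896, 2665.3014985215104),
    (1420.84, 2124.1750105705373, -703.3350105705375),
    (2004.8605813953488, 1483.702742457408, 521.1578389379409),
    (0.0, 662.7602224787081, -662.7602224787081),
]

for actual, estimate, reported_residual in sample:
    # Residual = actual value - predicted value
    assert abs((actual - estimate) - reported_residual) < 1e-6

abs_errors = [abs(actual - estimate) for actual, estimate, _ in sample]
print("min abs error:", min(abs_errors))
print("max abs error:", max(abs_errors))
print("avg abs error:", sum(abs_errors) / len(abs_errors))
```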


2) Next, let's score the multiple state_code Linear Regression models, again using the VAL_ADS2 table for demonstration purposes.  For each model, the minimum, maximum and average absolute error, and the standard error of estimate are returned as a result set.  Because we are evaluating as well as scoring, the scored data set includes the actual value along with the state_code, the prediction and the residual.

In [41]:
call ${VALDB}.td_analyze('linearscore',
                         'database=${QLID};
                          tablename=VAL_ADS2;
                          modeldatabase=${QLID};
                          modeltablename=LinearRegressionDemo2;
                          outputdatabase=${QLID};
                          outputtablename=LinearRegressionScore2;
                          predicted=estimate;
                          retain=cc_avg_bal;
                          scoringmethod=scoreandevaluate;');

Success: 0 rows affected

In [42]:
SELECT * FROM ${QLID}.LinearRegressionScore2 SAMPLE 25;

Unnamed: 0,cust_id,state_code,cc_avg_bal,estimate,Residual
1,27253240,NY,0.0,3069.2240937674646,-3069.2240937674646
2,20440620,NY,1202.241188571429,710.6651076142168,491.57608095721184
3,25893352,CA,575.7142857142857,1839.5512267163783,-1263.836941002093
4,14993242,OTHER,1651.3883606557376,1241.5839738543796,409.8043868013581
5,16355256,TX,1021.0324417177914,248.31777854708105,772.7146631707103
6,20439150,OTHER,4728.702,2269.84727300038,2458.85472699962
7,31338650,NY,0.0,169.88044361030845,-169.88044361030845
8,28620375,OTHER,1392.750506329114,687.0608330700238,705.6896732590901
9,23162364,TX,1482.9923076923078,1510.92876351749,-27.936455825182364
10,13631790,IL,0.0,1812.538743091116,-1812.538743091116


---

## Factor Analysis - PCA

### Purpose

Factor Analysis is one of the most fundamental types of statistical analysis, and Principal Components Analysis (PCA) is arguably the most common variety of Factor Analysis.  In Factor Analysis, a set of variables (in this case columns) is reduced to a smaller number of factors that account for most of the variance in the variables.  This can be useful in reducing the number of variables by converting them to factors, or in gaining insight into the nature of the variables when they are used for further data analysis. Additionally, the Factor Analysis scoring process expresses each factor as a linear combination of the input columns.   The score output table contains one or more index (key) columns and factor score columns, one for each factor.  

Some of the key features of this version of Factor Analysis are outlined below:

- The Teradata supplied table operator CALCMATRIX is used to build a table that represents a correlation matrix for input to Factor Analysis.  To avoid rebuilding this matrix every time the algorithm is run, the user may run the Matrix Analysis separately, saving an ESSCP matrix in a table that can then be input to Factor Analysis.  Refer to the matrixdatabase and matrixtablename parameters.

- One or more group by columns may optionally be specified so that an input matrix is built for each combination of group by column values, and subsequently a separate Factor Analysis model is built for each matrix.  To achieve this, the names of the group by columns are passed to CALCMATRIX as parameters, so it includes them as columns in the matrix table it creates.  Refer to the groupby parameter.

- A Near Dependency Report is available to identify two or more columns that may be collinear.  This report can be requested by setting parameter neardependencyreport=true and if desired, conditionindexthreshold (default 30) and varianceproportionthreshold (default 0.5).

- Both orthogonal and oblique factor rotations are available.  Refer to the rotationtype parameter.

- There are three Prime Factor reports available.  Refer to parameters factorloadingsreport, factorvariablesreport and factorvariablesloadingsreport.

- The algorithm is partially scalable because the size of each input matrix depends only on the number of independent variables (columns) and not on the size of the input table. The calculations performed on the client workstation however are not scalable when group by columns are used, because each model is built serially based on each matrix in the matrix table.
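
To illustrate why the matrix size depends only on the number of columns, and how eigenvalues determine which factors are kept, here is a minimal Python sketch of PCA on a two-column correlation matrix. The data are made up, the closed-form eigenvalues apply only to the 2x2 case, and treating the eigenmin cut-off as "greater than or equal to" is an assumption rather than documented behavior.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up data: two columns, five rows. More rows would not enlarge the matrix.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

r = pearson_r(x, y)
# The correlation matrix [[1, r], [r, 1]] has eigenvalues 1 + |r| and 1 - |r|.
eigenvalues = [1 + abs(r), 1 - abs(r)]

EIGENMIN = 1.0  # documented default of the eigenmin parameter
kept = [ev for ev in eigenvalues if ev >= EIGENMIN]  # ">=" is an assumption
explained = sum(kept) / sum(eigenvalues)
print(f"r={r:.4f}, kept {len(kept)} of {len(eigenvalues)} factors, "
      f"{explained:.1%} of total variance")
```

With a correlation matrix, the eigenvalues sum to the number of variables, so a cut-off of 1.0 keeps only factors that account for more variance than a single original variable.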

### Required Parameters

- **columns**

    The input columns representing the variables used in building a factor analysis model.  The columns must reside in the table named with the tablename parameter, residing in the database named with the database parameter.  For example, columns=c1,c2,c3. When columns=all is entered, all columns in the input table are analyzed.  Other options include allnumeric.

- **database**

    The database containing the input table.

- **tablename**

    The input table to build a factor model from.

- **Factor**

    The Factor parameter:
    - Is required
    - Must be the first parameter
    - Is always enclosed in single quotes


### Optional Parameters

- **conditionindexthreshold**

    If neardependencyreport=true, an XML report showing columns that may be collinear is produced and stored in the output table if specified.  One of the threshold parameters for that report is conditionindexthreshold with a default value of 30.

- **eigenmin**

    The minimum eigenvalue a factor must have to be included.  The default is 1.0.

- **flr or factorloadingsreport**

    The Prime Factor Loadings Report, in which rows are variables and columns are factors, matches each variable with the factor on which it has the largest absolute loading value.  To request it, set factorloadingsreport=true, or flr=true.

- **fvlr or factorvariablesloadingsreport**

    The Prime Factor Variables with Loadings report is equivalent to the Prime Factor Variables report with the addition of the loading values that determined the relationship between factors and variables. The absolute size of a loading value indicates the strength of the relationship, and its sign indicates the direction, i.e. either a positive or negative correlation. To request it, set factorvariablesloadingsreport=true, or fvlr=true.

- **fvr or factorvariablesreport**

    The Prime Factor Variables Report, in which rows are variables and columns are factors, matches variables with their prime factors and, if a threshold is used, possibly with factors other than their prime factors.  (Either a threshold percent may be specified with the thresholdpercent parameter, or a threshold loading may be specified with the thresholdloading parameter.) To request it, set factorvariablesreport=true, or fvr=true.

- **gamma**

    If a factor rotation is requested of the type orthomax or orthomin, a parameter in the rotation equation called gamma must be set by the user using this parameter.

- **groupby**

    The input columns dividing the input table into partitions, one for each combination of values in the group by columns. For each partition or combination of values a separate factor model is built.  The columns must reside in the table named with the tablename parameter. The default case is no group by columns.  For example:   groupby=column1,column2

- **matrixdatabase**

    The database where the matrix table resides if specified, as indicated by the matrixtablename parameter.

- **matrixtablename**

    Instead of internally building a matrix with the Matrix function each time a Factor Analysis is performed, the user may build an ESSCP Matrix once with the Matrix Analysis and save it to a table with this name in matrixdatabase.  The matrix can subsequently be read from this table instead of re-building it each time.  If the matrix table is specified, the columns specified with the columns parameter may be a subset of the columns in this matrix and may be specified in any order.  The columns must however all be present in the matrix.  Further, if group by columns are specified in the matrix, these same group by columns must be specified in the Factor Analysis.

- **matrixtype**

    The type of matrix for processing can be either correlation or covariance, affecting measure and score scaling. The default is correlation.

- **neardependencyreport**

    If neardependencyreport=true, an XML report showing columns that may be collinear is produced and stored in the output table if specified.  Two threshold parameters are available for this report, conditionindexthreshold (default 30) and varianceproportionthreshold (default 0.5).

- **outputdatabase**

    The database that contains the resulting output table that represents one or more factor models.  If outputdatabase and outputtablename are not both specified, a volatile output table with a randomly generated name is created in the logon user database.

- **outputtablename**

    The name of the output table representing one or more factor models. A second output table reporting statistical measures is automatically named on the user’s behalf by appending "_rpt" to the end of this name.  If outputdatabase and outputtablename are not both specified, volatile output tables with randomly generated names are created in the logon user database, and the two output result sets are returned to the user instead.

- **overwrite**

    When overwrite is set to true (default), the output tables are dropped before creating new ones.

- **rotationtype**

    Several schemes are provided for rotating factors, which may give better results.  Both orthogonal and oblique rotations are provided.  The default is rotationtype=none.  Each rotation type uses a different value of gamma, a parameter in the rotation equation; in the table below, f is the number of factors and v the number of variables.  For orthomax and orthomin, the user sets the value of gamma with the td_analyze parameter of the same name.


| rotationtype | gamma value | orthogonal / oblique | notes|
|---|---|---|---|
| equamax | f / 2 | orthogonal | |
| orthomax | set by user | orthogonal | |
| parsimax | v ( f – 1 )  /  ( v + f + 2) | orthogonal | |
| quartimax | 0.0 | orthogonal |  |
| varimax | 1.0 | orthogonal | |
| biquartimin | 0.5 | oblique | |
| covarimin | 1.0 | oblique | least oblique rotation |
| orthomin | set by user | oblique | |
| quartimin | 0.0 | oblique | most oblique rotation |


- **thresholdloading**

    When the Prime Factor Variables Report is selected, each variable is matched with its prime factor.  If a threshold factor loading is specified with this parameter, a variable may also be associated with a factor that is not its prime factor.  For example, thresholdloading=0.5.  Note that thresholdloading and thresholdpercent may not both be specified.

- **thresholdpercent**

    When the Prime Factor Variables Report is selected, each variable is matched with its prime factor.  If a threshold percent less than 1.0 is specified with this parameter, a variable may also be associated with a factor that is not its prime factor.  For example, thresholdpercent=0.9.  Note that thresholdloading and thresholdpercent may not both be specified.

- **varianceproportionthreshold**

    If neardependencyreport=true, an XML report showing columns that may be collinear is produced and stored in the output table, if specified.  One of the threshold parameters available for this report is varianceproportionthreshold with a default value of 0.5.
    
- **columnstoexclude**

    If a column specifier such as all is used in the columns parameter, the columnstoexclude parameter may be used to exclude specific columns from the analysis.  For convenience, when the columnstoexclude parameter is used, dependent variable and group by columns, if any, are automatically excluded as input columns and do not need to be included as columnstoexclude.
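
As a sketch of how several of the parameters above combine, the call below requests an orthomax rotation with a user-supplied gamma, together with a Prime Factor Variables Report that uses a threshold loading. It assumes the same ${VALDB} and ${QLID} notebook variables and the VAL_ADS table used in the numbered examples; the output table name FactorAnalysisRotated and the shortened column list are illustrative choices, not part of the original notebook.

```sql
-- Illustrative only: the output table name and the column subset are assumptions.
-- rotationtype=orthomax requires gamma to be set by the user.
call ${VALDB}.td_analyze('factor',
                         'database=${QLID};
                          tablename=VAL_ADS;
                          columns=tot_age,tot_income,tot_cust_years,tot_children,single_ind,married_ind;
                          rotationtype=orthomax;
                          gamma=1.0;
                          fvr=true;
                          thresholdloading=0.5;
                          outputdatabase=${QLID};
                          outputtablename=FactorAnalysisRotated');
```

The value gamma=1.0 is the one the rotation table lists for varimax. Only thresholdloading is given, since thresholdloading and thresholdpercent may not both be specified.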

---

1.  Using the VAL_ADS table, perform a factor analysis with all reporting options.  In this case, the model is stored in an XML string (xmlmodel) within the outputtablename specified.

In [43]:
call ${VALDB}.td_analyze('factor',
                         'database=${QLID};
                          tablename=VAL_ADS;
                          columns=tot_age,tot_income,tot_cust_years,tot_children,single_ind,married_ind,separated_ind,female_ind,ck_acct_ind,sv_acct_ind,sv_avg_bal,ck_avg_bal,ca_resident_ind,ny_resident_ind,tx_resident_ind,il_resident_ind,az_resident_ind,oh_resident_ind;
                          outputdatabase=${QLID};
                          outputtablename=FactorAnalysisOut1;
                          flr=true;fvr=true;fvlr=true');

Success: 0 rows affected

In [44]:
SELECT * FROM ${QLID}.FactorAnalysisOut1

Unnamed: 0,partid,mfactors,modelstatus,xmlmodel
1,1,9,SUCCEEDED,(XML model string; summarized below)

The notebook display strips the XML tags from xmlmodel and runs the values together. In order, the string holds the model settings (database demo_user, table VAL_ADS, 18 variables, minimum eigenvalue 1, 9 factors, correlation matrix, no rotation), the mean and standard deviation of each column, two 18 x 9 coefficient matrices in scientific notation (the second matches the principal component loadings), the 18 eigenvalues, the loadings, the variance explained by each factor, and the prime factor reports requested with flr, fvr and fvlr. The prime factor variables with their loadings are:

Factor,Variable,Loading
Factor 1,single_ind,-0.895374
Factor 1,married_ind,0.799857
Factor 1,tot_age,0.663174
Factor 2,ck_avg_bal,0.642863
Factor 2,ck_acct_ind,0.609365
Factor 2,separated_ind,0.539998
Factor 2,tot_income,0.424866
Factor 3,ca_resident_ind,-0.701848
Factor 3,tot_children,-0.449195
Factor 4,sv_avg_bal,0.728994
Factor 4,sv_acct_ind,0.47069
Factor 5,female_ind,0.569101
Factor 6,tot_cust_years,0.520557
Factor 7,tx_resident_ind,-0.7626
Factor 7,ny_resident_ind,0.670813
Factor 8,il_resident_ind,0.611191
Factor 9,az_resident_ind,-0.726389
Factor 9,oh_resident_ind,0.46489
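
Because xmlmodel holds the whole model as one long string, a narrower query can be easier to read. The sketch below selects only the non-XML columns named in the result header, then reads the companion table that, per the outputtablename description, is created by appending "_rpt" to the output table name. The name FactorAnalysisOut1_rpt follows from that naming rule; its columns are not shown in this notebook, so the second query selects everything.

```sql
-- Status columns only, skipping the large xmlmodel string.
SELECT partid, mfactors, modelstatus
FROM ${QLID}.FactorAnalysisOut1;

-- Companion table of statistical measures; the name is assumed from the "_rpt" naming rule.
SELECT * FROM ${QLID}.FactorAnalysisOut1_rpt;
```

The two statements can be run as separate cells.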


Note - To view the HTML report, double-click the contents of the 'html' column returned by calling 'report'. Alternatively, right-click the contents of the 'html' column and select 'Show Cell as Text...', or copy the contents of the cell and create an HTML report.

In [45]:
call ${VALDB}.td_analyze ('report',
                          'database=${QLID};
                           tablename=FactorAnalysisOut1;
                           analysistype=factor');

Success: 0 rows affected

WARNING: [Teradata Database] [Warning 3212] The stored procedure returned one or more result sets.

Unnamed: 0,id,html
1,1,(HTML report; its sections are reproduced as tables below)

Factor Analysis Summary
Database,demo_user
Tablename,VAL_ADS
NumberOfVariables,18
MinimumEigenvalue,1
NumberOfFactors,9
MatrixType,Correlation
Rotation,None

Variable Statistics
Column Name,Mean,Standard Deviation
tot_age,44.336393,19.266299
tot_income,30066.268923,38278.045923
tot_cust_years,5.764773,2.973913
tot_children,1.786288,1.462723
single_ind,0.369478,0.482687
married_ind,0.472557,0.49927
separated_ind,0.064257,0.245222
female_ind,0.559572,0.496462
ck_acct_ind,0.696118,0.459954
sv_acct_ind,0.563588,0.495964
sv_avg_bal,1203.307173,3692.699997
ck_avg_bal,3595.561206,7434.232542
ca_resident_ind,0.236948,0.42523
ny_resident_ind,0.14324,0.350334
tx_resident_ind,0.111111,0.314285
il_resident_ind,0.074967,0.26335
az_resident_ind,0.032129,0.17635
oh_resident_ind,0.03079,0.172756

Eigenvalues
Factor 1,2.357372
Factor 2,1.61461
Factor 3,1.346774
Factor 4,1.33421
Factor 5,1.215955
Factor 6,1.17429
Factor 7,1.141157
Factor 8,1.078779
Factor 9,1.005316
(Factor 10),0.989306
(Factor 11),0.940484
(Factor 12),0.844232
(Factor 13),0.825491
(Factor 14),0.636242
(Factor 15),0.517018
(Factor 16),0.427191
(Factor 17),0.414034
(Factor 18),0.137537

Principal Component Loadings
Column Name,Factor 1,Factor 2,Factor 3,Factor 4,Factor 5,Factor 6,Factor 7,Factor 8,Factor 9
tot_age,0.663174,0.048076,0.304086,0.024479,0.089574,0.334336,-0.01619,0.098084,-0.015967
tot_income,0.400984,0.424866,-0.118186,0.056616,-0.369666,0.068485,0.034991,-0.073956,-0.145569
tot_cust_years,0.135221,-0.032866,0.014065,0.279197,-0.302324,0.520557,0.050042,0.113666,0.033437
tot_children,0.379494,0.015574,-0.449195,0.144929,0.169705,-0.424236,0.030479,-0.185154,0.170118
single_ind,-0.895374,0.026967,0.032617,-0.15356,-0.176416,0.132872,0.043385,0.001212,-0.070012
married_ind,0.799857,-0.358965,0.076614,0.048828,0.002101,0.005735,-0.078217,0.087541,0.052655
separated_ind,0.077071,0.539998,-0.10118,0.225939,0.22647,-0.261777,0.029288,-0.119014,-0.048924
female_ind,-0.109584,0.028475,0.037157,0.197211,0.569101,0.135161,0.093186,0.110344,0.312485
ck_acct_ind,0.067605,0.609365,0.052683,-0.370539,0.296717,0.202384,0.054602,-0.084389,0.158656
sv_acct_ind,-0.245344,0.325291,0.399817,0.47069,0.172452,-0.024124,-0.112106,-0.025532,0.200936
sv_avg_bal,-0.13368,0.139016,0.128482,0.728994,-0.272069,-0.024187,-0.055304,0.179046,0.117211
ck_avg_bal,0.17357,0.642863,-0.125835,-0.204989,-0.237972,0.225469,0.125426,-0.00447,0.026107
ca_resident_ind,-0.127188,-0.081124,-0.701848,0.254855,0.12692,0.400923,-0.089917,-0.125716,-0.039018
ny_resident_ind,0.058637,-0.135817,0.424422,0.047056,-0.095171,-0.07585,0.670813,-0.471232,0.023967
tx_resident_ind,0.033257,0.104395,0.293766,-0.152387,-0.18723,-0.146482,-0.7626,-0.274764,0.120798
il_resident_ind,0.020334,0.204054,-0.103729,-0.061089,-0.271332,-0.448778,0.201904,0.611191,0.075347
az_resident_ind,0.031603,0.107362,0.219257,0.050254,0.392989,0.004322,-0.060623,0.306035,-0.726389
oh_resident_ind,-0.013255,-0.098204,0.083576,-0.294277,0.08985,0.141593,0.016787,0.402567,0.46489

Variance
Factor Variance to Total Variance Ratio,0.681581

Variance Explained By Factors
Factor,Variance,Percent of Total,Cumulative Percent,Condition Indices
Factor 1,2.357372,13.096511,13.096511,1.0
Factor 2,1.61461,8.970054,22.066565,1.208315
Factor 3,1.346774,7.48208,29.548645,1.323021
Factor 4,1.33421,7.412279,36.960924,1.329235
Factor 5,1.215955,6.755307,43.716232,1.392372
Factor 6,1.17429,6.523835,50.240066,1.416858
Factor 7,1.141157,6.339764,56.57983,1.43728
Factor 8,1.078779,5.993218,62.573048,1.47825
Factor 9,1.005316,5.585088,68.158136,1.531309

Absolute Difference
Mean,Standard Deviation,Minimum,Maximum
0.079018,0.084005,6.342751e-06,0.523565
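
The Percent of Total and Condition Indices columns in the variance table can be checked by hand. The relationships below are inferred from the numbers in this output rather than stated by the notebook. With a correlation matrix the total variance equals the number of variables ($v = 18$), so each factor's percent of total is its eigenvalue $\lambda_k$ divided by $v$, and each condition index matches $\sqrt{\lambda_1 / \lambda_k}$.

$$
\text{Percent of Total}_k = 100 \cdot \frac{\lambda_k}{v}, \qquad 100 \cdot \frac{2.357372}{18} \approx 13.0965
$$

$$
\text{Condition Index}_k = \sqrt{\frac{\lambda_1}{\lambda_k}}, \qquad \sqrt{\frac{2.357372}{1.61461}} \approx 1.2083
$$

$$
\frac{1}{v} \sum_{k=1}^{9} \lambda_k = \frac{12.268463}{18} \approx 0.6816
$$

The last value is the cumulative percent reported for Factor 9 (68.158136), expressed as a fraction. The nine retained factors are exactly those whose eigenvalue is at least the minimum eigenvalue of 1.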


2.  Using the VAL_ADS2 table, perform a factor analysis with all reporting options, grouping by state_code, creating a factor model for each state.  In this case, the model is stored in an XML string (xmlmodel) within the outputtablename specified for each state_code.

In [46]:
call ${VALDB}.td_analyze('factor',
                         'database=${QLID};
                          tablename=VAL_ADS2;
                          columns=tot_age,tot_income,tot_cust_years,tot_children,single_ind,married_ind,separated_ind,female_ind,ck_acct_ind,sv_acct_ind,sv_avg_bal,ck_avg_bal;
                          outputdatabase=${QLID};
                          outputtablename=FactorAnalysisOut2;
                          flr=true;fvr=true;fvlr=true;
                          groupby=state_code');

Success: 0 rows affected
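
With groupby, the output table carries one model row per state_code, each with its own large xmlmodel string. As a minimal sketch, the query below lists the partitions and their status without the XML. It uses only the column names that appear in the result header of the full SELECT; the ORDER BY is an illustrative addition.

```sql
-- One row per state_code partition, omitting the xmlmodel string.
SELECT state_code, partid, mfactors, modelstatus
FROM ${QLID}.FactorAnalysisOut2
ORDER BY state_code;
```

This gives a quick check that every partition reports a modelstatus of SUCCEEDED before the individual models are inspected.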

In [47]:
SELECT * FROM ${QLID}.FactorAnalysisOut2

Unnamed: 0,state_code,partid,mfactors,modelstatus,xmlmodel
1,IL,1,6,SUCCEEDED,demo_userVAL_ADS21214CorrelationNonetot_age41.69642919.67234tot_income36907.45395450113.243726tot_cust_years5.4107142.955988tot_children1.9642861.409238single_ind0.3571430.479463married_ind0.4464290.497439separated_ind0.0892860.285338female_ind0.5178570.5ck_acct_ind0.7321430.443125sv_acct_ind0.5178570.5sv_avg_bal1844.8063515658.67812ck_avg_bal5803.12473913942.1898542.682555E-1 -7.474626E-2 1.383107E-1 -3.414532E-22.01812E-1 2.904799E-1 -2.158214E-2 -1.019604E-18.61163E-2 4.733857E-2 3.366346E-1 -4.316687E-11.720667E-1 4.606839E-3 -1.748118E-1 3.079142E-1-3.312219E-1 3.522033E-2 4.45492E-2 -1.741037E-12.487636E-1 -2.515315E-1 -1.602894E-2 9.761367E-28.297364E-2 3.642135E-1 -6.768218E-2 4.401047E-22.374743E-2 -1.841171E-2 2.922158E-1 4.604946E-1-7.345618E-2 1.189676E-1 -3.515608E-1 2.673119E-1-1.254946E-1 2.129389E-1 2.194361E-1 3.483551E-14.387665E-2 2.656537E-1 3.507654E-1 8.55762E-29.797488E-2 2.533278E-1 -2.849164E-1 -2.348655E-17.422249E-1 -1.54444E-1 2.189669E-1 -4.258394E-25.583851E-1 6.002025E-1 -3.416782E-2 -1.271588E-12.382716E-1 9.781305E-2 5.329438E-1 -5.383506E-14.76084E-1 9.518855E-3 -2.767538E-1 3.840116E-1-9.164439E-1 7.27738E-2 7.052818E-2 -2.171315E-16.882934E-1 -5.197254E-1 -2.537626E-2 1.217378E-12.295763E-1 7.52554E-1 -1.071512E-1 5.488716E-26.570578E-2 -3.80431E-2 4.626222E-1 5.743005E-1-2.032428E-1 2.45816E-1 -5.565744E-1 3.33375E-1-3.472256E-1 4.399838E-1 3.474009E-1 4.344471E-11.214004E-1 5.489054E-1 5.553152E-1 1.067254E-12.710826E-1 5.234371E-1 -4.510661E-1 -2.929098E-1Factor 12.766858Factor 22.066244Factor 31.583153Factor 41.247139(Factor 5)0.952025(Factor 6)0.840843(Factor 7)0.680523(Factor 8)0.623055(Factor 9)0.476341(Factor 10)0.39306(Factor 11)0.250047(Factor 
12)0.120713tot_age0.742225-0.1544440.218967-0.042584tot_income0.5583850.600203-0.034168-0.127159tot_cust_years0.2382720.0978130.532944-0.538351tot_children0.4760840.009519-0.2767540.384012single_ind-0.9164440.0727740.070528-0.217131married_ind0.688293-0.519725-0.0253760.121738separated_ind0.2295760.752554-0.1071510.054887female_ind0.065706-0.0380430.4626220.574301ck_acct_ind-0.2032430.245816-0.5565740.333375sv_acct_ind-0.3472260.4399840.3474010.434447sv_avg_bal0.12140.5489050.5553150.106725ck_avg_bal0.2710830.523437-0.451066-0.292910.638616Factor 12.76685823.05714923.0571491Factor 22.06624417.21870240.2758511.157184Factor 31.58315313.19293853.4687891.322002Factor 41.24713910.39282163.861611.4894850.0975280.1042280.0003280.549196single_indFactor 1-0.91644388344900750.072773802849497070.07052817663065586-0.21713145427214553tot_ageFactor 10.7422249300531345-0.154444030217602030.2189669028071773-0.04258394467630219married_indFactor 10.6882934468222643-0.5197254395046935-0.0253762644029929260.12173776945491029tot_childrenFactor 10.4760840027333990.009518855354033415-0.276753750890863850.3840116198094494separated_indFactor 20.22957626932544310.7525540425999883-0.107151219299888160.05488715551814397tot_incomeFactor 20.55838506267893760.6002025248989412-0.034167819585959935-0.12715878162325134ck_avg_balFactor 20.271082557254400950.5234371125033671-0.45106605489841156-0.2929098076956528sv_acct_indFactor 2-0.34722559888803480.43998376891813180.34740088806083480.43444710684056437ck_acct_indFactor 3-0.203242813353150920.24581603024548263-0.5565743720308470.33337500399723274sv_avg_balFactor 30.121400445090083960.5489054230982710.55531515069400560.10672537429932735female_indFactor 40.06570577768498007-0.038043100475823890.462622232846685630.5743005235807346tot_cust_yearsFactor 
40.238271573168411580.09781304620038990.5329438477902895-0.5383506242152701Factor1single_indtot_agemarried_indtot_childrenFactor2separated_indtot_incomeck_avg_balsv_acct_indFactor3ck_acct_indsv_avg_balFactor4female_indtot_cust_yearsFactor 1single_ind-0.916444Factor 1tot_age0.742225Factor 1married_ind0.688293Factor 1tot_children0.476084Factor 2separated_ind0.752554Factor 2tot_income0.600203Factor 2ck_avg_bal0.523437Factor 2sv_acct_ind0.439984Factor 3ck_acct_ind-0.556574Factor 3sv_avg_bal0.555315Factor 4female_ind0.574301Factor 4tot_cust_years-0.538351
2,TX,1,6,SUCCEEDED,demo_userVAL_ADS21215CorrelationNonetot_age45.09810720.010781tot_income34697.39526746675.590885tot_cust_years5.4956972.77749tot_children1.6497421.330891single_ind0.3614460.480626married_ind0.5060240.500179separated_ind0.0602410.238035female_ind0.4819280.499888ck_acct_ind0.734940.441555sv_acct_ind0.6385540.480626sv_avg_bal1125.7490242755.719646ck_avg_bal3898.3208037510.4190142.302462E-1 1.397216E-1 3.031944E-1 -2.530186E-1 7.695219E-26.115477E-2 -2.328798E-1 4.103962E-1 -4.028429E-2 -2.462449E-11.044464E-1 2.605289E-2 2.382843E-1 -4.697865E-1 1.368625E-11.574858E-1 2.748557E-2 -1.430412E-1 4.519188E-1 -1.110241E-1-3.409297E-1 -1.451997E-1 -2.714634E-2 -1.057375E-1 1.541842E-13.173391E-1 4.732275E-2 -1.944712E-1 -1.348061E-1 -7.532014E-21.124724E-2 1.821333E-1 3.879555E-1 4.294483E-1 -2.206147E-18.74378E-2 3.768144E-1 1.698885E-2 2.110742E-1 3.999756E-11.012263E-1 -3.452062E-1 7.157972E-2 2.470271E-1 5.262388E-1-1.556199E-1 2.123413E-1 2.269953E-1 3.421464E-2 4.517896E-1-2.032389E-1 2.051655E-1 1.545299E-1 2.743676E-2 -2.99371E-14.145617E-2 -3.572174E-1 2.129268E-1 1.919181E-1 1.692803E-25.83401E-1 2.299464E-1 4.644423E-1 -2.981316E-1 8.487401E-21.549548E-1 -3.832614E-1 6.286572E-1 -4.746694E-2 -2.715945E-12.646478E-1 4.287647E-2 3.65011E-1 -5.535491E-1 1.509518E-13.990396E-1 4.523431E-2 -2.191147E-1 5.324956E-1 -1.224534E-1-8.638523E-1 -2.389621E-1 -4.158358E-2 -1.245904E-1 1.700566E-18.040781E-1 7.788128E-2 -2.978969E-1 -1.58842E-1 -8.307396E-22.849842E-2 2.997454E-1 5.942819E-1 5.060186E-1 -2.433258E-12.215511E-1 6.201412E-1 2.602403E-2 2.487086E-1 4.41151E-12.564885E-1 -5.681222E-1 1.09648E-1 2.910718E-1 5.804123E-1-3.943119E-1 3.494602E-1 3.477182E-1 4.031508E-2 4.982989E-1-5.149694E-1 3.376505E-1 2.367136E-1 3.232872E-2 -3.301897E-11.050422E-1 -5.878896E-1 3.261676E-1 2.261369E-1 1.867068E-2Factor 12.533814Factor 21.645747Factor 31.53183Factor 41.178299Factor 51.102945(Factor 6)0.919173(Factor 7)0.851168(Factor 8)0.697682(Factor 
9)0.59323(Factor 10)0.43637(Factor 11)0.39806(Factor 12)0.111682tot_age0.5834010.2299460.464442-0.2981320.084874tot_income0.154955-0.3832610.628657-0.047467-0.271595tot_cust_years0.2646480.0428760.365011-0.5535490.150952tot_children0.399040.045234-0.2191150.532496-0.122453single_ind-0.863852-0.238962-0.041584-0.124590.170057married_ind0.8040780.077881-0.297897-0.158842-0.083074separated_ind0.0284980.2997450.5942820.506019-0.243326female_ind0.2215510.6201410.0260240.2487090.441151ck_acct_ind0.256488-0.5681220.1096480.2910720.580412sv_acct_ind-0.3943120.349460.3477180.0403150.498299sv_avg_bal-0.5149690.3376510.2367140.032329-0.33019ck_avg_bal0.105042-0.587890.3261680.2261370.0186710.666053Factor 12.53381421.11511521.1151151Factor 21.64574713.71456134.8296761.240811Factor 31.5318312.7652547.5949271.286122Factor 41.1782999.8191657.4140871.466424Factor 51.1029459.19120666.6052931.515690.0982630.09820.0011680.492164single_indFactor 1-0.8638523039626244-0.2389620710254086-0.041583583281766846-0.124590397524565850.1700566163957775married_indFactor 10.80407811590252810.07788128412279978-0.2978968733685234-0.15884198012131331-0.0830739552875026tot_ageFactor 10.5834009801152610.229946420063334380.4644422741632928-0.298131605487212460.08487401404858437sv_avg_balFactor 1-0.51496942566931060.337650537145534670.236713550453562350.032328715098950314-0.33018966142387146female_indFactor 20.221551113087741720.62014124249610660.026024030426996210.248708563735593220.44115100763192094ck_avg_balFactor 20.10504222287903588-0.58788958254615020.326167637090950370.226136930160710070.01867068183780302tot_incomeFactor 30.1549548102134998-0.383261381944517750.6286571899403389-0.04746694227292905-0.27159450425198145separated_indFactor 30.0284984184658620030.29974542956842690.59428191439564510.5060186425345855-0.24332581468646816tot_cust_yearsFactor 40.26464777018699480.042876470998645950.3650109700064246-0.55354911119543690.1509517762196776tot_childrenFactor 
40.399039637845885250.04523430852754618-0.21911474582117780.5324955642483213-0.12245341191209808ck_acct_indFactor 50.2564884802781979-0.56812217242929450.109647966067654230.291071795233237340.5804123177217596sv_acct_indFactor 5-0.394311921815696630.349460185530272750.34771822571072510.040315082503812290.49829891816595634Factor1single_indmarried_indtot_agesv_avg_balFactor2female_indck_avg_balFactor3tot_incomeseparated_indFactor4tot_cust_yearstot_childrenFactor5ck_acct_indsv_acct_indFactor 1single_ind-0.863852Factor 1married_ind0.804078Factor 1tot_age0.583401Factor 1sv_avg_bal-0.514969Factor 2female_ind0.620141Factor 2ck_avg_bal-0.58789Factor 3tot_income0.628657Factor 3separated_ind0.594282Factor 4tot_cust_years-0.553549Factor 4tot_children0.532496Factor 5ck_acct_ind0.580412Factor 5sv_acct_ind0.498299
3,CA,1,6,SUCCEEDED,demo_userVAL_ADS21215CorrelationNonetot_age40.75544817.959176tot_income30083.40823237812.995502tot_cust_years5.9362392.985223tot_children1.9584341.600454single_ind0.4124290.492371married_ind0.406780.491332separated_ind0.0508470.21973female_ind0.604520.489052ck_acct_ind0.6666670.4715sv_acct_ind0.5197740.49971sv_avg_bal1505.1878544721.802353ck_avg_bal3532.025426644.7660492.598874E-1 5.936628E-2 1.426845E-1 1.951325E-1 1.046632E-12.006993E-1 8.656857E-2 6.848764E-2 1.606263E-1 -2.964576E-29.648131E-3 -1.269552E-1 3.97686E-2 3.545159E-1 -6.174366E-11.733167E-1 -1.947646E-1 1.178034E-2 -3.658275E-1 -2.173399E-1-3.288007E-1 1.405216E-1 -1.391823E-1 1.645164E-1 2.6068E-22.763515E-1 -2.772223E-1 6.42391E-2 8.902218E-2 2.820987E-12.590387E-2 1.570152E-1 2.427738E-1 -4.392339E-1 -5.331281E-1-1.039766E-1 1.264342E-3 2.601414E-1 -2.621893E-1 2.469784E-11.200716E-1 4.974157E-1 1.050835E-1 -9.555322E-2 1.069452E-1-1.113166E-1 1.82284E-2 5.810764E-1 -4.331922E-2 1.841858E-1-1.022145E-1 -2.094217E-1 5.033037E-1 3.079786E-1 -9.108982E-29.564962E-2 4.386248E-1 7.843311E-2 3.008113E-1 -9.588129E-26.833371E-1 8.856316E-2 1.817546E-1 2.390175E-1 1.127067E-15.277105E-1 1.291438E-1 8.724102E-2 1.967509E-1 -3.192409E-22.53684E-2 -1.893929E-1 5.065809E-2 4.342459E-1 -6.648875E-14.557118E-1 -2.905515E-1 1.500605E-2 -4.481015E-1 -2.340428E-1-8.64535E-1 2.096315E-1 -1.772934E-1 2.015159E-1 2.807137E-27.266271E-1 -4.135628E-1 8.182914E-2 1.090431E-1 3.037784E-16.811056E-2 2.342366E-1 3.092504E-1 -5.380169E-1 -5.740997E-1-2.733919E-1 1.886157E-3 3.313737E-1 -3.211552E-1 2.65959E-13.157114E-1 7.420492E-1 1.338576E-1 -1.17043E-1 1.151641E-1-2.926912E-1 2.719329E-2 7.401875E-1 -5.306164E-2 1.983408E-1-2.687585E-1 -3.124171E-1 6.411189E-1 3.772425E-1 -9.809021E-22.514972E-1 6.543445E-1 9.990977E-2 3.684633E-1 -1.032499E-1Factor 12.629359Factor 21.491809Factor 31.273821Factor 41.224898Factor 51.076851(Factor 6)0.944085(Factor 7)0.851958(Factor 8)0.847434(Factor 9)0.584738(Factor 
10)0.5244(Factor 11)0.382266(Factor 12)0.168381tot_age0.6833370.0885630.1817550.2390180.112707tot_income0.527710.1291440.0872410.196751-0.031924tot_cust_years0.025368-0.1893930.0506580.434246-0.664887tot_children0.455712-0.2905520.015006-0.448102-0.234043single_ind-0.8645350.209631-0.1772930.2015160.028071married_ind0.726627-0.4135630.0818290.1090430.303778separated_ind0.0681110.2342370.30925-0.538017-0.5741female_ind-0.2733920.0018860.331374-0.3211550.265959ck_acct_ind0.3157110.7420490.133858-0.1170430.115164sv_acct_ind-0.2926910.0271930.740188-0.0530620.198341sv_avg_bal-0.268759-0.3124170.6411190.377243-0.09809ck_avg_bal0.2514970.6543440.099910.368463-0.103250.641395Factor 12.62935921.91132221.9113221Factor 21.49180912.43174234.3430631.327603Factor 31.27382110.61517844.9582411.436715Factor 41.22489810.20748755.1657281.465126Factor 51.0768518.97376264.139491.5625970.0977090.1106840.0008350.657502single_indFactor 1-0.86453504254553960.20963145387355653-0.17729343531818490.201515911621043960.028071368174628318married_indFactor 10.7266271453822494-0.41356277151569510.081829135910616360.109043130517580470.3037784172338022tot_ageFactor 10.68333712285931480.088563155484353690.181754584022570440.239017505852365750.11270673239368838tot_incomeFactor 10.52771049145534090.129143777926890520.087241017089049050.19675092540845976-0.03192408547386531tot_childrenFactor 10.4557117886291535-0.29055153275551560.015006051913173894-0.4481015079943265-0.23404280598828267ck_acct_indFactor 20.31571136638628070.74204924685205090.1338575613198906-0.117042987126700690.11516406417095512ck_avg_balFactor 20.25149715389808930.65434447406001250.099909766656880280.3684633419564779-0.10324990617712718sv_acct_indFactor 3-0.292691239935405360.027193291695122180.7401875112660511-0.053061641096122250.1983408032374635sv_avg_balFactor 3-0.26875850412707264-0.312417147707025170.64111893712579280.3772425475869448-0.09809020551477483female_indFactor 
3-0.27339189401451780.0018861565191646490.331373721116808-0.321155235562213540.26595902058321297tot_cust_yearsFactor 50.025368396033606206-0.18939291821789120.050658085069836780.43424592149159175-0.664887489805541separated_indFactor 50.0681105586771280.234236622126643340.30925040629492634-0.5380168735804952-0.5740997483927295Factor1single_indmarried_indtot_agetot_incometot_childrenFactor2ck_acct_indck_avg_balFactor3sv_acct_indsv_avg_balfemale_indFactor4tot_cust_yearsseparated_indFactor 1single_ind-0.864535Factor 1married_ind0.726627Factor 1tot_age0.683337Factor 1tot_income0.52771Factor 1tot_children0.455712Factor 2ck_acct_ind0.742049Factor 2ck_avg_bal0.654344Factor 3sv_acct_ind0.740188Factor 3sv_avg_bal0.641119Factor 3female_ind0.331374Factor 5tot_cust_years-0.664887Factor 5separated_ind-0.5741
4,OH,1,6,FAILED,Constant columns detected...run terminated.demo_userVAL_ADS21210CorrelationNone
5,AZ,1,6,SUCCEEDED,demo_userVAL_ADS21215CorrelationNonetot_age49.73214314.89244tot_income30712.7738134430.471988tot_cust_years5.148813.268374tot_children1.4880951.226514single_ind0.2916670.455208married_ind0.50.500746separated_ind0.1250.331212female_ind0.6666670.472108ck_acct_ind0.7916670.406722sv_acct_ind0.7083330.455208sv_avg_bal1278.2487563597.434997ck_avg_bal3300.3064855468.8051988.184103E-3 -2.504379E-1 4.315343E-1 1.404039E-1 1.773155E-1-2.140654E-1 5.42834E-3 -1.942564E-1 4.290169E-1 1.524005E-1-3.012274E-2 4.800003E-3 2.382089E-1 3.467282E-1 -2.501445E-1-9.545586E-2 3.032264E-1 -1.25975E-2 4.259323E-2 -4.987826E-12.944437E-1 -1.095059E-1 -3.521524E-1 -5.945076E-2 1.113209E-1-3.862319E-1 -1.106151E-1 9.081824E-2 1.01918E-2 6.790068E-21.116638E-1 3.847385E-1 1.30467E-1 1.231619E-2 -2.317736E-12.573668E-1 -2.042211E-1 1.549996E-1 1.082321E-1 -2.266159E-11.97777E-1 1.778672E-1 1.221066E-1 2.519967E-1 3.146722E-14.500453E-2 1.76539E-1 3.518108E-1 -3.626136E-1 2.74055E-1-1.711264E-1 1.428153E-1 -4.260976E-2 -2.874583E-1 2.364585E-14.548414E-2 2.800088E-1 -4.641509E-2 3.231425E-1 4.30331E-11.847498E-2 -4.722757E-1 7.230606E-1 1.941018E-1 1.962927E-1-4.832361E-1 1.023676E-2 -3.254878E-1 5.930958E-1 1.687112E-1-6.799976E-2 9.051844E-3 3.991328E-1 4.793355E-1 -2.769163E-1-2.154842E-1 5.718243E-1 -2.110785E-2 5.888315E-2 -5.521649E-16.646839E-1 -2.065062E-1 -5.900517E-1 -8.21879E-2 1.23235E-1-8.718887E-1 -2.085978E-1 1.521712E-1 1.408968E-2 7.516777E-22.520724E-1 7.255397E-1 2.186051E-1 1.702656E-2 -2.565792E-15.809857E-1 -3.851201E-1 2.597108E-1 1.496258E-1 -2.508696E-14.464663E-1 3.35422E-1 2.045966E-1 3.483737E-1 3.4835E-11.015943E-1 3.329171E-1 5.894794E-1 -5.012963E-1 3.033858E-1-3.863046E-1 2.69321E-1 -7.139511E-2 -3.973976E-1 2.617656E-11.026769E-1 5.280405E-1 -7.777117E-2 4.467294E-1 4.763872E-1Factor 12.257423Factor 21.8858Factor 31.675558Factor 41.382453Factor 51.107025(Factor 6)0.943799(Factor 7)0.746528(Factor 8)0.685115(Factor 9)0.533775(Factor 
10)0.406981(Factor 11)0.283933(Factor 12)0.091611tot_age0.018475-0.4722760.7230610.1941020.196293tot_income-0.4832360.010237-0.3254880.5930960.168711tot_cust_years-0.0680.0090520.3991330.479335-0.276916tot_children-0.2154840.571824-0.0211080.058883-0.552165single_ind0.664684-0.206506-0.590052-0.0821880.123235married_ind-0.871889-0.2085980.1521710.014090.075168separated_ind0.2520720.725540.2186050.017027-0.256579female_ind0.580986-0.385120.2597110.149626-0.25087ck_acct_ind0.4464660.3354220.2045970.3483740.34835sv_acct_ind0.1015940.3329170.589479-0.5012960.303386sv_avg_bal-0.3863050.269321-0.071395-0.3973980.261766ck_avg_bal0.1026770.52804-0.0777710.4467290.4763870.692355Factor 12.25742318.81185518.8118551Factor 21.885815.71499734.5268521.094104Factor 31.67555813.96298248.4898341.160718Factor 41.38245311.52044460.0102781.277854Factor 51.1070259.2252169.2354881.4279980.0943350.0906670.0003540.546692married_indFactor 1-0.8718887200581422-0.20859782491110260.152171220475469020.0140896820439664790.07516776774047933single_indFactor 10.6646838602170163-0.20650622292797882-0.5900516785085562-0.0821879022073470.12323500269348327female_indFactor 10.580985657198905-0.385120107211039340.259710829703782060.14962579809210705-0.25086956620739154ck_acct_indFactor 10.4464662673855830.335421958079675950.204596629764939470.34837368226088560.3483500115361219separated_indFactor 20.25207237206095890.72553974655042560.21860508167744170.017026560731294178-0.25657924573448265tot_childrenFactor 2-0.21548421963661840.5718242601082107-0.0211078468604547170.05888315163891341-0.552164936950599ck_avg_balFactor 20.102676924722142490.528040476118763-0.0777711735508380.44672940000857810.4763872145466331tot_ageFactor 30.018474977816724567-0.47227574608924310.72306062715279870.194101786271336120.19629269521221712sv_acct_indFactor 30.101594252204379430.33291712399236290.5894794065145859-0.50129632975386760.30338584889280334tot_incomeFactor 
4-0.48323609827921910.010236761940101383-0.325487804108379960.59309581438172180.16871115848198545tot_cust_yearsFactor 4-0.067999759915501030.009051843938831850.39913278127841510.4793354884121912-0.276916271141553sv_avg_balFactor 4-0.38630460163706640.2693210141819922-0.0713951142417322-0.397397635111363830.2617655676986764Factor1married_indsingle_indfemale_indck_acct_indFactor2separated_indtot_childrenck_avg_balFactor3tot_agesv_acct_indFactor4tot_incometot_cust_yearssv_avg_balFactor 1married_ind-0.871889Factor 1single_ind0.664684Factor 1female_ind0.580986Factor 1ck_acct_ind0.446466Factor 2separated_ind0.72554Factor 2tot_children0.571824Factor 2ck_avg_bal0.52804Factor 3tot_age0.723061Factor 3sv_acct_ind0.589479Factor 4tot_income0.593096Factor 4tot_cust_years0.479335Factor 4sv_avg_bal-0.397398
6,OTHER,1,6,SUCCEEDED,demo_userVAL_ADS21215CorrelationNonetot_age45.61526619.667444tot_income27456.93140834407.071396tot_cust_years5.8968543.036734tot_children1.7645691.462986single_ind0.3610110.480356married_ind0.4873650.499905separated_ind0.0794220.270432female_ind0.5559570.496923ck_acct_ind0.7003610.458159sv_acct_ind0.5631770.496057sv_avg_bal953.6300622483.810147ck_avg_bal3338.0306976855.7248542.866634E-1 3.134161E-2 -1.342043E-1 2.922964E-1 1.947391E-11.658208E-1 2.373267E-1 8.045005E-2 1.271882E-2 -4.800259E-14.053498E-2 2.528835E-2 -3.424779E-1 2.783469E-1 -2.662151E-11.567881E-1 4.521722E-2 1.375924E-1 -5.405643E-1 -2.455063E-2-3.853254E-1 -4.412655E-3 6.934289E-2 1.827715E-1 -1.666037E-13.533846E-1 -2.222789E-1 -1.180309E-1 8.435323E-2 6.027495E-23.196501E-2 3.73619E-1 2.692329E-2 -3.654822E-1 1.1392E-1-3.486328E-2 3.177749E-2 7.4766E-3 1.141604E-1 5.391063E-11.648735E-2 3.839173E-1 1.869294E-1 2.61537E-1 2.643647E-1-9.170213E-2 2.464688E-1 -4.393388E-1 -8.258159E-2 2.842287E-1-5.246552E-2 1.220672E-1 -5.480306E-1 -1.628277E-1 -1.772378E-18.799448E-2 3.663667E-1 1.494143E-1 2.661958E-1 -2.218607E-16.656244E-1 5.160957E-2 -1.836278E-1 3.586411E-1 2.149119E-13.850312E-1 3.908008E-1 1.100774E-1 1.560571E-2 -5.297512E-19.412109E-2 4.164178E-2 -4.686024E-1 3.415254E-1 -2.93792E-13.640576E-1 7.445824E-2 1.882637E-1 -6.632603E-1 -2.709379E-2-8.947149E-1 -7.266225E-3 9.487984E-2 2.242565E-1 -1.838619E-18.205493E-1 -3.660219E-1 -1.614983E-1 1.034995E-1 6.651875E-27.422186E-2 6.152304E-1 3.683835E-2 -4.484384E-1 1.257208E-1-8.095155E-2 5.232732E-2 1.023001E-2 1.400723E-1 5.949516E-13.828316E-2 6.321885E-1 2.5577E-1 3.209E-1 2.917499E-1-2.129298E-1 4.058549E-1 -6.011344E-1 -1.013258E-1 3.136716E-1-1.218235E-1 2.010055E-1 -7.498541E-1 -1.99786E-1 -1.955977E-12.043207E-1 6.032882E-1 2.044391E-1 3.266162E-1 -2.44843E-1Factor 12.321972Factor 21.646679Factor 31.368271Factor 41.226978Factor 51.103589(Factor 6)0.956726(Factor 7)0.875701(Factor 8)0.852817(Factor 
9)0.624229(Factor 10)0.502812(Factor 11)0.410686(Factor 12)0.109541tot_age0.6656240.05161-0.1836280.3586410.214912tot_income0.3850310.3908010.1100770.015606-0.529751tot_cust_years0.0941210.041642-0.4686020.341525-0.293792tot_children0.3640580.0744580.188264-0.66326-0.027094single_ind-0.894715-0.0072660.094880.224257-0.183862married_ind0.820549-0.366022-0.1614980.10350.066519separated_ind0.0742220.615230.036838-0.4484380.125721female_ind-0.0809520.0523270.010230.1400720.594952ck_acct_ind0.0382830.6321880.255770.32090.29175sv_acct_ind-0.212930.405855-0.601134-0.1013260.313672sv_avg_bal-0.1218230.201006-0.749854-0.199786-0.195598ck_avg_bal0.2043210.6032880.2044390.326616-0.2448430.638957Factor 12.32197219.34976719.3497671Factor 21.64667913.72232233.0720891.187474Factor 31.36827111.40225544.4743451.302694Factor 41.22697810.22481354.6991581.375657Factor 51.1035899.19657263.895731.4505240.1055660.1066940.0006430.617016single_indFactor 1-0.8947148781979084-0.0072662246554773180.094879841183983490.22425650027828115-0.18386193322458907married_indFactor 10.820549260970491-0.3660219021358508-0.161498269475626840.103499519893330070.06651874912006173tot_ageFactor 10.66562442688376660.05160956621514899-0.18362780871164070.358641093247408060.21491186112245017ck_acct_indFactor 20.038283164552105740.63218845798054080.25576995685980590.32090000870237560.2917498697329492separated_indFactor 20.07422186167739110.61523040364834980.0368383495605857-0.44843842432095250.12572079007523682ck_avg_balFactor 20.204320725829522020.60328819867615860.204439140166319230.32661624390929506-0.2448430125153925sv_avg_balFactor 3-0.121823471929553780.20100550134750741-0.7498541280761918-0.19978597802091166-0.19559767899642874sv_acct_indFactor 3-0.212929781681293170.4058549197840245-0.6011343705419785-0.101325761172806960.3136715591181914tot_cust_yearsFactor 30.094121092954031140.04164177995569978-0.46860241413018640.34152544283790626-0.29379199381174076tot_childrenFactor 
40.364057598695494870.074458239591350640.18826365675015153-0.6632603143321822-0.027093793374735456female_indFactor 5-0.080951551564361470.052327319265053960.0102300117237002290.14007228981542210.5949516024200745tot_incomeFactor 50.38503120239065790.390800842235701050.110077433843197750.015605707590075164-0.5297511723565124Factor1single_indmarried_indtot_ageFactor2ck_acct_indseparated_indck_avg_balFactor3sv_avg_balsv_acct_indtot_cust_yearsFactor4tot_childrenFactor5female_indtot_incomeFactor 1single_ind-0.894715Factor 1married_ind0.820549Factor 1tot_age0.665624Factor 2ck_acct_ind0.632188Factor 2separated_ind0.61523Factor 2ck_avg_bal0.603288Factor 3sv_avg_bal-0.749854Factor 3sv_acct_ind-0.601134Factor 3tot_cust_years-0.468602Factor 4tot_children-0.66326Factor 5female_ind0.594952Factor 5tot_income-0.529751
7,NY,1,6,SUCCEEDED,demo_userVAL_ADS21216CorrelationNonetot_age46.32443319.678415tot_income31286.77062836231.317043tot_cust_years5.7356482.843657tot_children1.7162881.405786single_ind0.3457940.475785married_ind0.5233640.499621separated_ind0.0373830.189762female_ind0.5514020.497517ck_acct_ind0.6542060.475785sv_acct_ind0.5981310.49044sv_avg_bal1238.8841094006.906895ck_avg_bal3117.4134595610.958643-2.482647E-1 1.138369E-1 3.6391E-1 9.66579E-2 -3.985575E-1 -1.434571E-1-9.799241E-2 2.93334E-1 -1.591857E-1 2.344769E-1 9.159755E-2 4.394886E-1-1.279416E-1 5.772031E-2 2.3234E-1 3.233445E-1 2.986456E-2 2.496208E-1-1.419908E-1 7.103862E-2 -2.938522E-1 -3.785803E-1 3.730119E-1 2.438183E-13.8513E-1 -8.129111E-2 1.019003E-2 2.300049E-1 8.238122E-2 5.230694E-2-3.688789E-1 -1.90913E-2 -1.065908E-1 -4.930721E-2 8.384996E-2 -3.295937E-11.037065E-3 -1.28714E-2 2.292304E-1 -4.023375E-1 -3.474747E-1 6.252934E-11.097691E-1 1.093447E-1 1.229175E-1 -4.582459E-1 8.180611E-2 -2.390766E-11.033701E-1 4.324752E-1 2.442498E-1 -1.714206E-1 1.208234E-1 -1.500473E-11.328645E-1 2.647211E-1 -2.192548E-1 -4.797242E-2 -5.081759E-1 -1.943746E-1-1.776848E-2 1.410027E-1 -4.861025E-1 4.067589E-2 -3.823388E-1 4.032824E-22.086835E-2 4.495875E-1 4.890483E-2 1.606586E-1 2.698973E-1 -3.346802E-3-5.72956E-1 1.87974E-1 5.02888E-1 1.210422E-1 -4.238238E-1 -1.499125E-1-2.261511E-1 4.843699E-1 -2.19979E-1 2.936294E-1 9.740432E-2 4.592651E-1-2.952692E-1 9.531106E-2 3.210711E-1 4.04916E-1 3.175781E-2 2.608535E-1-3.276926E-1 1.17303E-1 -4.06075E-1 -4.740863E-1 3.966588E-1 2.547898E-18.888196E-1 -1.342325E-1 1.408162E-2 2.880292E-1 8.760373E-2 5.466069E-2-8.513147E-1 -3.152464E-2 -1.47298E-1 -6.174614E-2 8.916558E-2 -3.444251E-12.393385E-3 -2.1254E-2 3.16774E-1 -5.038369E-1 -3.695026E-1 6.534309E-12.5333E-1 1.805561E-1 1.698599E-1 -5.738495E-1 8.699216E-2 -2.498348E-12.38562E-1 7.141278E-1 3.375293E-1 -2.146655E-1 1.28483E-1 -1.567993E-13.066305E-1 4.371225E-1 -3.029887E-1 -6.007462E-2 -5.403914E-1 
-2.031213E-1-4.100688E-2 2.328317E-1 -6.717461E-1 5.093737E-2 -4.065769E-1 4.214297E-24.816087E-2 7.423846E-1 6.758169E-2 2.011887E-1 2.870073E-1 -3.497404E-3Factor 12.307843Factor 21.651257Factor 31.381902Factor 41.252274Factor 51.063394Factor 61.044999(Factor 7)0.928781(Factor 8)0.743875(Factor 9)0.653628(Factor 10)0.485212(Factor 11)0.352499(Factor 12)0.134335tot_age-0.5729560.1879740.5028880.121042-0.423824-0.149913tot_income-0.2261510.48437-0.2199790.2936290.0974040.459265tot_cust_years-0.2952690.0953110.3210710.4049160.0317580.260853tot_children-0.3276930.117303-0.406075-0.4740860.3966590.25479single_ind0.88882-0.1342330.0140820.2880290.0876040.054661married_ind-0.851315-0.031525-0.147298-0.0617460.089166-0.344425separated_ind0.002393-0.0212540.316774-0.503837-0.3695030.653431female_ind0.253330.1805560.16986-0.5738490.086992-0.249835ck_acct_ind0.2385620.7141280.337529-0.2146660.128483-0.156799sv_acct_ind0.3066310.437123-0.302989-0.060075-0.540391-0.203121sv_avg_bal-0.0410070.232832-0.6717460.050937-0.4065770.042143ck_avg_bal0.0481610.7423850.0675820.2011890.287007-0.0034970.725139Factor 12.30784319.23202719.2320271Factor 21.65125713.76047532.9925021.182213Factor 31.38190211.51585144.5083531.292303Factor 41.25227410.43561854.9439721.357543Factor 51.0633948.8616263.8055921.473181Factor 61.0449998.70832472.5139161.486090.0846810.091160.0003470.567635single_indFactor 10.8888195980640217-0.13423251204000050.0140816176492921490.288029182926472670.08760372672890670.05466069150165706married_indFactor 1-0.8513147317276212-0.0315246384149965-0.14729798550427817-0.061746141707377170.08916557898748323-0.344425069380928tot_ageFactor 1-0.57295600717965920.18797399291911150.50288797940991870.12104219664676981-0.4238238179355546-0.1499125243559574ck_avg_balFactor 20.048160873273597080.74238458080569520.06758168627436260.201188677087064030.2870072707344466-0.0034974038356666965ck_acct_indFactor 
20.238562020929957650.7141277706148710.33752933253438033-0.214665533833797880.12848295592252906-0.15679925948388765tot_incomeFactor 2-0.226151116716785760.48436987471513665-0.219979034906492180.29362939754278570.097404324527425370.4592651306993795sv_avg_balFactor 3-0.041006875576248620.2328316629813404-0.67174610355920420.05093736647612456-0.406576929082191950.04214296570722035female_indFactor 40.25332996828282250.18055612887684310.16985993145350564-0.57384946868191880.08699215919710845-0.249834812099001tot_childrenFactor 4-0.32769257835088720.11730302011639254-0.40607498374112794-0.47408631826228740.39665880226225130.25478981702748565tot_cust_yearsFactor 4-0.295269227059160870.09531106325229080.321071063481540850.404915959107435160.031757806388049260.260853495806432sv_acct_indFactor 50.30663050713264650.43712253191630435-0.30298872595380044-0.060074622284371985-0.5403913651511572-0.20312126058014948separated_indFactor 60.0023933845567745103-0.0212539964732122340.3167739785808465-0.5038369178016211-0.36950263036141120.65343086012834Factor1single_indmarried_indtot_ageFactor2ck_avg_balck_acct_indtot_incomeFactor3sv_avg_balFactor4female_indtot_childrentot_cust_yearsFactor5sv_acct_indFactor6separated_indFactor 1single_ind0.88882Factor 1married_ind-0.851315Factor 1tot_age-0.572956Factor 2ck_avg_bal0.742385Factor 2ck_acct_ind0.714128Factor 2tot_income0.48437Factor 3sv_avg_bal-0.671746Factor 4female_ind-0.573849Factor 4tot_children-0.474086Factor 4tot_cust_years0.404916Factor 5sv_acct_ind-0.540391Factor 6separated_ind0.653431


Note - To view the HTML report, double-click the contents of the 'html' column returned by calling 'report'. Alternatively, right-click the contents of the 'html' column and select 'Show Cell as Text...', or copy the contents of the cell and save them as an HTML file.
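
The last option in the note (copying the cell contents into an HTML file) could be scripted roughly as follows. This is a minimal sketch in Python, not part of the Vantage Analytic Library: the helper `save_report` and the file name `factor_report.html` are hypothetical, and the report text is assumed to have already been copied out of the 'html' column into a string.

```python
from pathlib import Path


def save_report(html_text: str, path: str = "factor_report.html") -> Path:
    """Write report markup copied from the 'html' column to a file.

    Hypothetical helper for illustration only; returns the path written
    so it can be opened in a browser afterwards.
    """
    out = Path(path)
    out.write_text(html_text, encoding="utf-8")
    return out


# Example usage (assumes `cell_text` holds the copied cell contents):
# report_path = save_report(cell_text)
# import webbrowser; webbrowser.open(report_path.resolve().as_uri())
```

Keeping the write step separate from the browser call makes it easy to save one file per `state_code` value when the report returns several rows.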

In [48]:
call ${VALDB}.td_analyze ('report',
                          'database=${QLID};
                           tablename=FactorAnalysisOut2;
                           analysistype=factor');

Success: 0 rows affected

WARNING: [Teradata Database] [Warning 3212] The stored procedure returned one or more result sets.

,state_code,html
1,IL,Factor Analysis SummaryDatabasedemo_userTablenameVAL_ADS2NumberOfVariables12MinimumEigenvalue1NumberOfFactors4MatrixTypeCorrelationRotationNoneGroup By ColumnsColumn NameValuestate_codeILVariable StatisticsColumn NameMeanStandard Deviationtot_age41.69642919.67234tot_income36907.45395450113.243726tot_cust_years5.4107142.955988tot_children1.9642861.409238single_ind0.3571430.479463married_ind0.4464290.497439separated_ind0.0892860.285338female_ind0.5178570.5ck_acct_ind0.7321430.443125sv_acct_ind0.5178570.5sv_avg_bal1844.8063515658.67812ck_avg_bal5803.12473913942.189854EigenvaluesFactor 12.766858Factor 22.066244Factor 31.583153Factor 41.247139(Factor 5)0.952025(Factor 6)0.840843(Factor 7)0.680523(Factor 8)0.623055(Factor 9)0.476341(Factor 10)0.39306(Factor 11)0.250047(Factor 12)0.120713Principal Component LoadingsColumn NameFactor 1Factor 2Factor 3Factor 4tot_age0.742225-0.1544440.218967-0.042584tot_income0.5583850.600203-0.034168-0.127159tot_cust_years0.2382720.0978130.532944-0.538351tot_children0.4760840.009519-0.2767540.384012single_ind-0.9164440.0727740.070528-0.217131married_ind0.688293-0.519725-0.0253760.121738separated_ind0.2295760.752554-0.1071510.054887female_ind0.065706-0.0380430.4626220.574301ck_acct_ind-0.2032430.245816-0.5565740.333375sv_acct_ind-0.3472260.4399840.3474010.434447sv_avg_bal0.12140.5489050.5553150.106725ck_avg_bal0.2710830.523437-0.451066-0.29291VarianceFactor Variance to Total Variance Ratio0.638616Variance Explained By FactorsFactorVariancePercent of TotalCumulative PercentCondition IndicesFactor 12.76685823.05714923.0571491Factor 22.06624417.21870240.2758511.157184Factor 31.58315313.19293853.4687891.322002Factor 41.24713910.39282163.861611.489485DifferenceAbsolute DifferenceMeanStandard DeviationMinimumMaximum0.0975280.1042280.0003280.549196,,,,

Factor Analysis Summary,Value
Database,demo_user
Tablename,VAL_ADS2
NumberOfVariables,12
MinimumEigenvalue,1
NumberOfFactors,4
MatrixType,Correlation
Rotation,

Column Name,Value
state_code,IL

Column Name,Mean,Standard Deviation
tot_age,41.696429,19.67234
tot_income,36907.453954,50113.243726
tot_cust_years,5.410714,2.955988
tot_children,1.964286,1.409238
single_ind,0.357143,0.479463
married_ind,0.446429,0.497439
separated_ind,0.089286,0.285338
female_ind,0.517857,0.5
ck_acct_ind,0.732143,0.443125
sv_acct_ind,0.517857,0.5

Factor,Eigenvalue
Factor 1,2.766858
Factor 2,2.066244
Factor 3,1.583153
Factor 4,1.247139
(Factor 5),0.952025
(Factor 6),0.840843
(Factor 7),0.680523
(Factor 8),0.623055
(Factor 9),0.476341
(Factor 10),0.39306

Column Name,Factor 1,Factor 2,Factor 3,Factor 4
tot_age,0.742225,-0.154444,0.218967,-0.042584
tot_income,0.558385,0.600203,-0.034168,-0.127159
tot_cust_years,0.238272,0.097813,0.532944,-0.538351
tot_children,0.476084,0.009519,-0.276754,0.384012
single_ind,-0.916444,0.072774,0.070528,-0.217131
married_ind,0.688293,-0.519725,-0.025376,0.121738
separated_ind,0.229576,0.752554,-0.107151,0.054887
female_ind,0.065706,-0.038043,0.462622,0.574301
ck_acct_ind,-0.203243,0.245816,-0.556574,0.333375
sv_acct_ind,-0.347226,0.439984,0.347401,0.434447
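
For principal component loadings, the squared loadings of one factor summed over all input columns should reproduce that factor's eigenvalue. The sketch below checks this for Factor 1 of the `state_code = IL` group. The table above lists only the first ten columns, so the `sv_avg_bal` and `ck_avg_bal` loadings are taken from the raw report row for IL earlier in this output. The variable names are illustrative.

```python
# Factor 1 loadings for state_code = IL, copied from the report output.
# The last two entries come from the raw report row, because the
# rendered table shows only the first ten columns.
factor1_loadings = {
    "tot_age": 0.742225,
    "tot_income": 0.558385,
    "tot_cust_years": 0.238272,
    "tot_children": 0.476084,
    "single_ind": -0.916444,
    "married_ind": 0.688293,
    "separated_ind": 0.229576,
    "female_ind": 0.065706,
    "ck_acct_ind": -0.203243,
    "sv_acct_ind": -0.347226,
    "sv_avg_bal": 0.1214,
    "ck_avg_bal": 0.271083,
}

reported_eigenvalue = 2.766858  # "Factor 1" in the eigenvalue table

# Sum of squared loadings over all twelve columns.
sum_of_squares = sum(v * v for v in factor1_loadings.values())

print(f"sum of squared loadings: {sum_of_squares:.6f}")
print(f"reported eigenvalue:     {reported_eigenvalue:.6f}")
```

The two numbers agree up to the rounding of the printed loadings, which is a quick sanity check when copying values out of the report.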

Factor,Variance,Percent of Total,Cumulative Percent,Condition Indices
Factor 1,2.766858,23.057149,23.057149,1.0
Factor 2,2.066244,17.218702,40.275851,1.157184
Factor 3,1.583153,13.192938,53.468789,1.322002
Factor 4,1.247139,10.392821,63.86161,1.489485

Mean,Standard Deviation,Minimum,Maximum
0.097528,0.104228,0.000328,0.549196
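
The "Variance Explained By Factors" columns can be reproduced from the eigenvalues alone. Because the matrix type is Correlation, the total variance equals the number of variables (12). The reported "Condition Indices" appear consistent with the usual definition, the square root of the largest eigenvalue divided by each eigenvalue. The sketch below is a check under those assumptions for the IL group, not a description of how the library computes the values internally.

```python
import math
from itertools import accumulate

n_variables = 12  # NumberOfVariables in the summary table
retained_eigenvalues = [2.766858, 2.066244, 1.583153, 1.247139]  # Factors 1-4, IL

# Percent of total variance explained by each retained factor.
percent_of_total = [100.0 * ev / n_variables for ev in retained_eigenvalues]

# Running total of the percentages.
cumulative_percent = list(accumulate(percent_of_total))

# Condition index: sqrt(largest eigenvalue / this eigenvalue).
condition_indices = [
    math.sqrt(retained_eigenvalues[0] / ev) for ev in retained_eigenvalues
]

# Share of total variance captured by the retained factors; the raw
# report row labels this "Factor Variance to Total Variance Ratio".
variance_ratio = sum(retained_eigenvalues) / n_variables

for i, (p, c, k) in enumerate(
    zip(percent_of_total, cumulative_percent, condition_indices), start=1
):
    print(f"Factor {i}: {p:.6f}  {c:.6f}  {k:.6f}")
print(f"ratio: {variance_ratio:.6f}")
```

The results match the table rows above to within the rounding of the printed eigenvalues.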

Factor Analysis Summary,Value
Database,demo_user
Tablename,VAL_ADS2
NumberOfVariables,12
MinimumEigenvalue,1
NumberOfFactors,5
MatrixType,Correlation
Rotation,

Column Name,Value
state_code,TX

Column Name,Mean,Standard Deviation
tot_age,45.098107,20.010781
tot_income,34697.395267,46675.590885
tot_cust_years,5.495697,2.77749
tot_children,1.649742,1.330891
single_ind,0.361446,0.480626
married_ind,0.506024,0.500179
separated_ind,0.060241,0.238035
female_ind,0.481928,0.499888
ck_acct_ind,0.73494,0.441555
sv_acct_ind,0.638554,0.480626

Factor,Eigenvalue
Factor 1,2.533814
Factor 2,1.645747
Factor 3,1.53183
Factor 4,1.178299
Factor 5,1.102945
(Factor 6),0.919173
(Factor 7),0.851168
(Factor 8),0.697682
(Factor 9),0.59323
(Factor 10),0.43637
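
The eigenvalue table shows how `NumberOfFactors` follows from `MinimumEigenvalue`: factors whose eigenvalue falls below the threshold are listed in parentheses and are not retained. The sketch below illustrates that reading of the output for the TX group. The retention rule (eigenvalue at or above the threshold) is inferred from the numbers shown here rather than taken from the library documentation.

```python
# Eigenvalues for state_code = TX as listed in the table above
# (the rendered table shows the first ten of twelve).
eigenvalues = [
    2.533814, 1.645747, 1.53183, 1.178299, 1.102945,
    0.919173, 0.851168, 0.697682, 0.59323, 0.43637,
]

minimum_eigenvalue = 1.0  # MinimumEigenvalue in the summary table

# Keep only factors whose eigenvalue meets the threshold.
retained = [ev for ev in eigenvalues if ev >= minimum_eigenvalue]
number_of_factors = len(retained)

print(number_of_factors)
```

The count agrees with the `NumberOfFactors` value of 5 reported for TX. The two eigenvalues not shown in the table are smaller still, so they do not change the result.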

Column Name,Factor 1,Factor 2,Factor 3,Factor 4,Factor 5
tot_age,0.583401,0.229946,0.464442,-0.298132,0.084874
tot_income,0.154955,-0.383261,0.628657,-0.047467,-0.271595
tot_cust_years,0.264648,0.042876,0.365011,-0.553549,0.150952
tot_children,0.39904,0.045234,-0.219115,0.532496,-0.122453
single_ind,-0.863852,-0.238962,-0.041584,-0.12459,0.170057
married_ind,0.804078,0.077881,-0.297897,-0.158842,-0.083074
separated_ind,0.028498,0.299745,0.594282,0.506019,-0.243326
female_ind,0.221551,0.620141,0.026024,0.248709,0.441151
ck_acct_ind,0.256488,-0.568122,0.109648,0.291072,0.580412
sv_acct_ind,-0.394312,0.34946,0.347718,0.040315,0.498299

Factor,Variance,Percent of Total,Cumulative Percent,Condition Indices
Factor 1,2.533814,21.115115,21.115115,1.0
Factor 2,1.645747,13.714561,34.829676,1.240811
Factor 3,1.53183,12.76525,47.594927,1.286122
Factor 4,1.178299,9.81916,57.414087,1.466424
Factor 5,1.102945,9.191206,66.605293,1.51569

Mean,Standard Deviation,Minimum,Maximum
0.098263,0.0982,0.001168,0.492164

Factor Analysis Summary,Value
Database,demo_user
Tablename,VAL_ADS2
NumberOfVariables,12
MinimumEigenvalue,1
NumberOfFactors,5
MatrixType,Correlation
Rotation,

Column Name,Value
state_code,CA

Column Name,Mean,Standard Deviation
tot_age,40.755448,17.959176
tot_income,30083.408232,37812.995502
tot_cust_years,5.936239,2.985223
tot_children,1.958434,1.600454
single_ind,0.412429,0.492371
married_ind,0.40678,0.491332
separated_ind,0.050847,0.21973
female_ind,0.60452,0.489052
ck_acct_ind,0.666667,0.4715
sv_acct_ind,0.519774,0.49971

Factor,Eigenvalue
Factor 1,2.629359
Factor 2,1.491809
Factor 3,1.273821
Factor 4,1.224898
Factor 5,1.076851
(Factor 6),0.944085
(Factor 7),0.851958
(Factor 8),0.847434
(Factor 9),0.584738
(Factor 10),0.5244

Column Name,Factor 1,Factor 2,Factor 3,Factor 4,Factor 5
tot_age,0.683337,0.088563,0.181755,0.239018,0.112707
tot_income,0.52771,0.129144,0.087241,0.196751,-0.031924
tot_cust_years,0.025368,-0.189393,0.050658,0.434246,-0.664887
tot_children,0.455712,-0.290552,0.015006,-0.448102,-0.234043
single_ind,-0.864535,0.209631,-0.177293,0.201516,0.028071
married_ind,0.726627,-0.413563,0.081829,0.109043,0.303778
separated_ind,0.068111,0.234237,0.30925,-0.538017,-0.5741
female_ind,-0.273392,0.001886,0.331374,-0.321155,0.265959
ck_acct_ind,0.315711,0.742049,0.133858,-0.117043,0.115164
sv_acct_ind,-0.292691,0.027193,0.740188,-0.053062,0.198341

Factor,Variance,Percent of Total,Cumulative Percent,Condition Indices
Factor 1,2.629359,21.911322,21.911322,1.0
Factor 2,1.491809,12.431742,34.343063,1.327603
Factor 3,1.273821,10.615178,44.958241,1.436715
Factor 4,1.224898,10.207487,55.165728,1.465126
Factor 5,1.076851,8.973762,64.13949,1.562597

Mean,Standard Deviation,Minimum,Maximum
0.097709,0.110684,0.000835,0.657502

Factor Analysis Summary,Value
ErrorMessage,Constant columns detected...run terminated.

Factor,Variance,Percent of Total,Cumulative Percent,Condition Indices

Mean,Standard Deviation,Minimum,Maximum

Column Name,Value
state_code,OH
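
The OH group fails because at least one input column is constant within that group. A constant column has a standard deviation of zero, so its correlation with any other column is undefined (the denominator of the correlation coefficient is zero) and a correlation matrix cannot be built. The sketch below shows one way to spot such columns before running the analysis. The function `constant_columns` and the sample values are hypothetical and are not taken from `VAL_ADS2`.

```python
from statistics import pstdev


def constant_columns(table: dict[str, list[float]]) -> list[str]:
    """Return the names of columns whose values are all identical."""
    return [name for name, values in table.items() if min(values) == max(values)]


# Hypothetical rows for a single group; NOT the actual OH data.
sample_group = {
    "tot_age": [34, 51, 46, 29],
    "separated_ind": [0, 0, 0, 0],  # same value in every row
    "female_ind": [1, 0, 1, 1],
}

flagged = constant_columns(sample_group)
print(flagged)

# A constant column has zero standard deviation, which is what makes
# its correlation with other columns undefined.
print(pstdev(sample_group["separated_ind"]))
```

Small groups are especially prone to this with indicator columns such as `separated_ind`, where every row in the group can easily share the same value.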

Factor Analysis Summary,Value
Database,demo_user
Tablename,VAL_ADS2
NumberOfVariables,12
MinimumEigenvalue,1
NumberOfFactors,5
MatrixType,Correlation
Rotation,

Column Name,Value
state_code,AZ

Column Name,Mean,Standard Deviation
tot_age,49.732143,14.89244
tot_income,30712.77381,34430.471988
tot_cust_years,5.14881,3.268374
tot_children,1.488095,1.226514
single_ind,0.291667,0.455208
married_ind,0.5,0.500746
separated_ind,0.125,0.331212
female_ind,0.666667,0.472108
ck_acct_ind,0.791667,0.406722
sv_acct_ind,0.708333,0.455208

Factor,Eigenvalue
Factor 1,2.257423
Factor 2,1.8858
Factor 3,1.675558
Factor 4,1.382453
Factor 5,1.107025
(Factor 6),0.943799
(Factor 7),0.746528
(Factor 8),0.685115
(Factor 9),0.533775
(Factor 10),0.406981

Column Name,Factor 1,Factor 2,Factor 3,Factor 4,Factor 5
tot_age,0.018475,-0.472276,0.723061,0.194102,0.196293
tot_income,-0.483236,0.010237,-0.325488,0.593096,0.168711
tot_cust_years,-0.068,0.009052,0.399133,0.479335,-0.276916
tot_children,-0.215484,0.571824,-0.021108,0.058883,-0.552165
single_ind,0.664684,-0.206506,-0.590052,-0.082188,0.123235
married_ind,-0.871889,-0.208598,0.152171,0.01409,0.075168
separated_ind,0.252072,0.72554,0.218605,0.017027,-0.256579
female_ind,0.580986,-0.38512,0.259711,0.149626,-0.25087
ck_acct_ind,0.446466,0.335422,0.204597,0.348374,0.34835
sv_acct_ind,0.101594,0.332917,0.589479,-0.501296,0.303386

Factor,Variance,Percent of Total,Cumulative Percent,Condition Indices
Factor 1,2.257423,18.811855,18.811855,1.0
Factor 2,1.8858,15.714997,34.526852,1.094104
Factor 3,1.675558,13.962982,48.489834,1.160718
Factor 4,1.382453,11.520444,60.010278,1.277854
Factor 5,1.107025,9.22521,69.235488,1.427998

Mean,Standard Deviation,Minimum,Maximum
0.094335,0.090667,0.000354,0.546692

0,1
Database,demo_user
Tablename,VAL_ADS2
NumberOfVariables,12
MinimumEigenvalue,1
NumberOfFactors,5
MatrixType,Correlation
Rotation,

Column Name,Value
state_code,OTHER

Column Name,Mean,Standard Deviation
tot_age,45.615266,19.667444
tot_income,27456.931408,34407.071396
tot_cust_years,5.896854,3.036734
tot_children,1.764569,1.462986
single_ind,0.361011,0.480356
married_ind,0.487365,0.499905
separated_ind,0.079422,0.270432
female_ind,0.555957,0.496923
ck_acct_ind,0.700361,0.458159
sv_acct_ind,0.563177,0.496057

0,1
Factor 1,2.321972
Factor 2,1.646679
Factor 3,1.368271
Factor 4,1.226978
Factor 5,1.103589
(Factor 6),0.956726
(Factor 7),0.875701
(Factor 8),0.852817
(Factor 9),0.624229
(Factor 10),0.502812

Column Name,Factor 1,Factor 2,Factor 3,Factor 4,Factor 5
tot_age,0.665624,0.05161,-0.183628,0.358641,0.214912
tot_income,0.385031,0.390801,0.110077,0.015606,-0.529751
tot_cust_years,0.094121,0.041642,-0.468602,0.341525,-0.293792
tot_children,0.364058,0.074458,0.188264,-0.66326,-0.027094
single_ind,-0.894715,-0.007266,0.09488,0.224257,-0.183862
married_ind,0.820549,-0.366022,-0.161498,0.1035,0.066519
separated_ind,0.074222,0.61523,0.036838,-0.448438,0.125721
female_ind,-0.080952,0.052327,0.01023,0.140072,0.594952
ck_acct_ind,0.038283,0.632188,0.25577,0.3209,0.29175
sv_acct_ind,-0.21293,0.405855,-0.601134,-0.101326,0.313672

Factor,Variance,Percent of Total,Cumulative Percent,Condition Indices
Factor 1,2.321972,19.349767,19.349767,1.0
Factor 2,1.646679,13.722322,33.072089,1.187474
Factor 3,1.368271,11.402255,44.474345,1.302694
Factor 4,1.226978,10.224813,54.699158,1.375657
Factor 5,1.103589,9.196572,63.89573,1.450524

Mean,Standard Deviation,Minimum,Maximum
0.105566,0.106694,0.000643,0.617016

0,1
Database,demo_user
Tablename,VAL_ADS2
NumberOfVariables,12
MinimumEigenvalue,1
NumberOfFactors,6
MatrixType,Correlation
Rotation,

Column Name,Value
state_code,NY

Column Name,Mean,Standard Deviation
tot_age,46.324433,19.678415
tot_income,31286.770628,36231.317043
tot_cust_years,5.735648,2.843657
tot_children,1.716288,1.405786
single_ind,0.345794,0.475785
married_ind,0.523364,0.499621
separated_ind,0.037383,0.189762
female_ind,0.551402,0.497517
ck_acct_ind,0.654206,0.475785
sv_acct_ind,0.598131,0.49044

0,1
Factor 1,2.307843
Factor 2,1.651257
Factor 3,1.381902
Factor 4,1.252274
Factor 5,1.063394
Factor 6,1.044999
(Factor 7),0.928781
(Factor 8),0.743875
(Factor 9),0.653628
(Factor 10),0.485212

Column Name,Factor 1,Factor 2,Factor 3,Factor 4,Factor 5,Factor 6
tot_age,-0.572956,0.187974,0.502888,0.121042,-0.423824,-0.149913
tot_income,-0.226151,0.48437,-0.219979,0.293629,0.097404,0.459265
tot_cust_years,-0.295269,0.095311,0.321071,0.404916,0.031758,0.260853
tot_children,-0.327693,0.117303,-0.406075,-0.474086,0.396659,0.25479
single_ind,0.88882,-0.134233,0.014082,0.288029,0.087604,0.054661
married_ind,-0.851315,-0.031525,-0.147298,-0.061746,0.089166,-0.344425
separated_ind,0.002393,-0.021254,0.316774,-0.503837,-0.369503,0.653431
female_ind,0.25333,0.180556,0.16986,-0.573849,0.086992,-0.249835
ck_acct_ind,0.238562,0.714128,0.337529,-0.214666,0.128483,-0.156799
sv_acct_ind,0.306631,0.437123,-0.302989,-0.060075,-0.540391,-0.203121

Factor,Variance,Percent of Total,Cumulative Percent,Condition Indices
Factor 1,2.307843,19.232027,19.232027,1.0
Factor 2,1.651257,13.760475,32.992502,1.182213
Factor 3,1.381902,11.515851,44.508353,1.292303
Factor 4,1.252274,10.435618,54.943972,1.357543
Factor 5,1.063394,8.86162,63.805592,1.473181
Factor 6,1.044999,8.708324,72.513916,1.48609

Mean,Standard Deviation,Minimum,Maximum
0.084681,0.09116,0.000347,0.567635
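
The Percent of Total, Cumulative Percent and Condition Indices columns in the tables above are not defined in this notebook. The printed values are consistent with the usual conventions for a correlation-matrix analysis: each factor's variance (eigenvalue) divided by NumberOfVariables (12 here), a running sum of those percentages, and the square root of the ratio of the largest eigenvalue to each eigenvalue. The Python sketch below assumes those conventions, which are inferred from the output rather than stated by the library, and uses a hypothetical helper name `eigen_summary`. Under those assumptions it reproduces the AZ rows to within rounding.

```python
import math

def eigen_summary(eigenvalues, n_variables):
    """Return (percent, cumulative percent, condition index) per retained factor.

    Assumes a correlation matrix, so the total variance equals n_variables.
    """
    rows = []
    cumulative = 0.0
    largest = eigenvalues[0]
    for ev in eigenvalues:
        percent = 100.0 * ev / n_variables
        cumulative += percent
        condition_index = math.sqrt(largest / ev)
        rows.append((percent, cumulative, condition_index))
    return rows

# Retained eigenvalues for state_code = AZ, copied from the output above.
az = [2.257423, 1.8858, 1.675558, 1.382453, 1.107025]
for i, (pct, cum, ci) in enumerate(eigen_summary(az, 12), start=1):
    print(f"Factor {i}: {pct:.4f}% of total, {cum:.4f}% cumulative, condition index {ci:.4f}")
```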


---

## Factor Analysis Scoring

### Purpose

Once Factor Analysis has been performed and a factor model built from the columns of an input table, any table containing the same columns can be scored to produce a table of factor scores.  The scoring process expresses each factor as a linear combination of the input columns.  The score output table contains one or more index (key) columns and one factor score column per factor.  It may also contain columns that have been “retained” from the score input table.  When the factor analysis was based on a correlation matrix, the scoring input data is normalized by subtracting the mean and dividing by the standard deviation.
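
As a rough illustration of the two steps described above (normalize, then form a linear combination), here is a minimal Python sketch. The means and standard deviations are the tot_age and tot_income values from the AZ model report earlier in this notebook. The `weights` matrix and the function names are hypothetical, since the scoring coefficients that the library stores in the model table are not shown here.

```python
def standardize(values, means, stds):
    """Subtract the mean and divide by the standard deviation, column by column."""
    return [(v - m) / s for v, m, s in zip(values, means, stds)]

def factor_scores(values, means, stds, weights):
    """Score one row: each factor is a weighted sum of the standardized columns."""
    z = standardize(values, means, stds)
    return [sum(w * zi for w, zi in zip(factor_weights, z)) for factor_weights in weights]

# Means and standard deviations for tot_age and tot_income (AZ group).
means = [49.732143, 30712.77381]
stds = [14.89244, 34430.471988]

# Hypothetical scoring weights: one list per factor, one weight per input column.
weights = [[0.5, 0.25], [-0.1, 0.4]]

# A row whose tot_age is one standard deviation above the mean and whose
# tot_income equals the mean, so each score is close to that factor's tot_age weight.
row = [49.732143 + 14.89244, 30712.77381]
print(factor_scores(row, means, stds, weights))
```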

Some of the key features of Factor Scoring are outlined below.

- Factor Scoring is performed entirely by generating and executing SQL.  The scoring SQL can also be generated without being executed; refer to the gensqlonly parameter.

- A random sample of the score output table may be requested and returned in a result set.

- If multiple factor models were built by means of one or more group by columns, the resulting score table will include these columns and score the grouped input columns accordingly.

### Required Parameters

- **database**

    The database containing the input table.

- **tablename**

    The name of the input table to score.

- **modeldatabase**

    The database containing the factor model table.

- **modeltablename**

    The name of the table containing the factor model to use in scoring, built by the Factor Analysis function.

- **FactorScoring**

    The FactorScoring parameter (written as 'factorscore' in the calls in this notebook):
    - Is required
    - Must be the first parameter
    - Is always enclosed in single quotes


### Optional Parameters

- **index**

    By default, the primary index columns of the score output table are the primary index columns of the input table. This parameter allows the user to specify one or more columns for the primary index of the score output table. Regardless of whether the user uses the default setting or specifies different columns, the index columns are included both in the Primary Index clause and the select list. In addition, the index columns should form a unique key for the score output table, or there will be more than one score for a given observation.

- **gensqlonly**

    When true, the SQL for the requested function is returned as a result set but not run. When not specified or set to false, the SQL is run but not returned.

- **outputdatabase**

    The database where the output score table will be built.  (If the scoring method is not evaluate,  this parameter is required.)

- **outputtablename**

    The name of the output score table to be built.  (If the scoring method is not evaluate,  this parameter is required.)

- **overwrite**

    When overwrite is set to true (default), the output tables are dropped before creating new ones.

- **retain**

    One or more columns from the input table can optionally be specified here to be passed along to the score output table.

- **samplescoresize**

    When a scoring function produces a score table, the user has the option to view a sample of the rows using the "samplescoresize=n" parameter, where n is an integer number of rows to view in a result set.  A sample is not returned when you are only generating SQL or only evaluating (i.e. not scoring).  By default, a sample of output score rows is not returned.

- **scoringmethod**

    Three scoring methods are available as outlined below. By default, the model is scored but not evaluated, as requested in this manner:  scoringmethod=score.
    - score
    - evaluate
    - scoreandevaluate
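
The parameter rules above can be checked before a call is submitted. The Python sketch below is only an illustration: `build_factorscore_call` is a hypothetical helper, not part of the Vantage Analytic Library, and it encodes just two rules stated in this section (the four required parameters, and the output table being required unless scoringmethod=evaluate). It assembles the same semicolon-separated parameter string used in the calls in this notebook.

```python
def build_factorscore_call(val_db, params):
    """Assemble a td_analyze('factorscore', ...) call from a dict of parameters."""
    required = ["database", "tablename", "modeldatabase", "modeltablename"]
    missing = [name for name in required if name not in params]
    # The output location is required unless the model is only being evaluated.
    if params.get("scoringmethod", "score") != "evaluate":
        missing += [n for n in ("outputdatabase", "outputtablename") if n not in params]
    if missing:
        raise ValueError("missing parameters: " + ", ".join(missing))
    body = ";".join(f"{name}={value}" for name, value in params.items())
    return f"call {val_db}.td_analyze('factorscore','{body}');"

# Placeholders such as ${VALDB} and ${QLID} are passed through unchanged.
print(build_factorscore_call("${VALDB}", {
    "database": "${QLID}",
    "tablename": "VAL_ADS",
    "modeldatabase": "${QLID}",
    "modeltablename": "FactorAnalysisOut1",
    "outputdatabase": "${QLID}",
    "outputtablename": "FactorAnalysisScore1",
    "samplescoresize": 10,
    "scoringmethod": "scoreandevaluate",
}))
```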

---

1.  Let's score the single Factor Analysis model created above.  The scores for each factor are written to the output table for every customer identifier, and a 10-row sample (samplescoresize=10) is returned as a result set.  An XML report is also returned for the factor model with an "_rpt" extension.

In [49]:
call ${VALDB}.td_analyze('factorscore',
                         'database=${QLID};
                          tablename=VAL_ADS;
                          modeldatabase=${QLID};
                          modeltablename=FactorAnalysisOut1;
                          outputdatabase=${QLID};
                          outputtablename=FactorAnalysisScore1;
                          samplescoresize=10;
                          scoringmethod=scoreandevaluate');

Success: 0 rows affected

WARNING: [Teradata Database] [Warning 3212] The stored procedure returned one or more result sets.

Unnamed: 0,cust_id,Factor 1,Factor 2,Factor 3,Factor 4,Factor 5,Factor 6,Factor 7,Factor 8,Factor 9
1,27265360,0.920364175627314,0.707435926338177,1.3232244681732364,-0.0093186150585445,-0.2834463298734341,0.5684294249513366,-2.138460795767878,-0.4904482601803492,0.8217440270497527
2,14989854,0.8269805651807115,-1.4595948474049154,-0.1508871064915715,0.4691065391633455,-0.1432152030224197,0.5268018517344455,0.0274123494165285,0.5690525595078287,0.0092549564254909
3,13629740,-1.0471236994059,-0.1784845425046314,-1.470941015919394,-0.6046325172536758,-0.6851703704840792,1.4189647142778423,0.0133141253152358,-0.2896375954140978,-0.7098368507054906
4,20443905,1.0551782962848115,-1.7263054160119151,0.3130585787042284,-1.5027212737550544,-0.1534519160912197,0.3616457971674455,-0.1466711893108514,2.376811179159229,1.949371063559291
5,19077996,0.7023090760689115,-1.4310728452189154,0.0519605717400284,0.3283351265065455,-0.1932983264000197,0.6165556910004455,-0.0013817966622314,0.6384730615306288,-0.133199745066909
6,17714333,-1.174191304439682,0.3899500952665865,0.3960710498724131,-1.3335391681042212,0.5623010834724947,1.5083142492524613,0.39579422350118,2.208803948969771,3.281300164750752
7,27269860,1.43319695304053,2.8389893285096197,1.1540993599780975,-1.2787082623216095,-2.2031913002946544,1.6781464373229864,-1.7447067612245322,-0.5482478904544607,0.2498349624960451
8,19082490,0.6011501074340164,-0.150974659607474,0.794652184418822,-0.4518193189103948,0.4877437266582125,-0.4479508719772356,-0.3970316151010923,-0.0175033538336545,-0.0722289237795867
9,19079648,0.8403285717259608,-0.1079034280422927,-0.7066946605379278,-0.6635858254619036,0.6315233129445513,-0.8438504303187857,0.09743744820463,-0.2409133393844815,0.1472063621612959
10,20445690,-1.0467073013568438,0.7837115238433147,-0.9658528163682216,-0.0016381828160403,0.945162370333801,0.9077691090366616,0.008051526114726,-0.3660148715132715,0.3953896609297004


Note - To view the HTML report, double-click the contents of the 'html' column returned by calling 'report'; alternatively, right-click the contents of the 'html' column and select 'Show Cell as Text...', or copy the contents of the cell to create an HTML report.

In [50]:
call ${VALDB}.td_analyze ('report',
                          'database=${QLID};
                           tablename=FactorAnalysisScore1;
                           analysistype=factorscore');

Success: 0 rows affected

WARNING: [Teradata Database] [Warning 3212] The stored procedure returned one or more result sets.

Factor Analysis Scoring Summary

Variable Name,Standard Error of Estimate
tot_age,0.5787157430400637
tot_income,0.6872743661722412
tot_cust_years,0.7236125249097635
tot_children,0.6000574547510598
single_ind,0.3426100825538552
married_ind,0.4544751782745972
separated_ind,0.7099246691059673
female_ind,0.6973546379206703
ck_acct_ind,0.5654959176282385
sv_acct_ind,0.6072129466263573
sv_avg_bal,0.5398433720454824
ck_avg_bal,0.6122756726452664
ca_resident_ind,0.46632704540973097
ny_resident_ind,0.32915308880423666
tx_resident_ind,0.38772578195820684
il_resident_ind,0.4984593037162908
az_resident_ind,0.3968093433709576
oh_resident_ind,0.70004081925574


2.  Next, let's score the grouped Factor Analysis models created above, which were built by state_code.  The scores for each factor are written to the output table for every customer identifier by state_code, and a 10-row sample (samplescoresize=10) is returned as a result set.  An XML report is also returned for the factor model with an "_rpt" extension.

In [51]:
call ${VALDB}.td_analyze('factorscore',
                         'database=${QLID};
                          tablename=VAL_ADS2;
                          modeldatabase=${QLID};
                          modeltablename=FactorAnalysisOut2;
                          outputdatabase=${QLID};
                          outputtablename=FactorAnalysisScore2;
                          samplescoresize=10;
                          scoringmethod=scoreandevaluate');

Success: 0 rows affected

WARNING: [Teradata Database] [Warning 3212] The stored procedure returned one or more result sets.

Unnamed: 0,cust_id,state_code,Factor 1,Factor 2,Factor 3,Factor 4,Factor 5,Factor 6
1,23171748,OTHER,0.6009743277376071,-1.5153009156705937,-0.0945539230224775,0.2430208869930242,0.2796228669798371,
2,29978608,CA,1.341250397118669,1.251637693384589,-0.5852720886311874,0.8273590026153393,-1.793989742546112,
3,21806480,CA,0.6497963127988153,-0.133495071357383,0.9099431093836244,-0.2286986123352029,0.3037143355243105,
4,27268440,OTHER,0.800665948179427,-0.4659839048473532,-1.3781690213337237,-0.3903759800606103,-1.8384932633917703,
5,13630850,OTHER,1.0423034988612072,-1.4085776955721936,0.0712158205679224,-0.6312082358186558,-0.9802572529921628,
6,25898691,TX,1.0871747123953328,1.1967423894176872,-0.5816329457320366,-1.1735403881727788,-0.7216681256997599,
7,29979180,OTHER,0.7814785062164841,0.3652522767719626,-0.993220448247541,1.2767307032580182,0.2821752912877676,
8,17720417,TX,0.8179530971981408,-1.3302287824266927,-1.214589329640937,1.215487522362391,-0.8589119254269879,
9,31343066,OTHER,-1.2542169602992563,0.4750898443898486,-0.0917152856399536,0.7323891368534118,-0.8717188691332672,
10,14989887,NY,1.7334887496991866,-0.1000951352211321,-0.3589286652504138,-0.9214134212397844,0.4019149588064442,-0.5641195866102243


Note - To view the HTML report, double-click the contents of the 'html' column returned by calling 'report'; alternatively, right-click the contents of the 'html' column and select 'Show Cell as Text...', or copy the contents of the cell to create an HTML report.

In [52]:
call ${VALDB}.td_analyze ('report',
                          'database=${QLID};
                           tablename=FactorAnalysisScore2;
                           analysistype=factorscore');

Success: 0 rows affected

WARNING: [Teradata Database] [Warning 3212] The stored procedure returned one or more result sets.

Factor Analysis Scoring Summary

Column Name,Value
state_code,AZ

Variable Name,Standard Error of Estimate
tot_age,0.0
tot_income,0.0
tot_cust_years,0.0
tot_children,0.0
single_ind,0.0
married_ind,0.0
separated_ind,0.0
female_ind,0.0
ck_acct_ind,0.0
sv_acct_ind,0.0
sv_avg_bal,0.0
ck_avg_bal,0.0

Column Name,Value
state_code,CA

Variable Name,Standard Error of Estimate
tot_age,0.0
tot_income,0.0
tot_cust_years,0.0
tot_children,0.0
single_ind,0.0
married_ind,0.0
separated_ind,0.0
female_ind,0.0
ck_acct_ind,0.0
sv_acct_ind,0.0
sv_avg_bal,0.0
ck_avg_bal,0.0

Column Name,Value
state_code,IL

Variable Name,Standard Error of Estimate
tot_age,0.0
tot_income,0.0
tot_cust_years,0.0
tot_children,0.0
single_ind,0.0
married_ind,0.0
separated_ind,0.0
female_ind,0.0
ck_acct_ind,0.0
sv_acct_ind,0.0
sv_avg_bal,0.0
ck_avg_bal,0.0

Column Name,Value
state_code,NY

Variable Name,Standard Error of Estimate
tot_age,0.6556437347930635
tot_income,0.6325176658639898
tot_cust_years,0.6852513237914408
tot_children,0.6334448644926364
single_ind,0.3511532450439314
married_ind,0.3519919623077654
separated_ind,0.5143215823432199
female_ind,0.7675675485916016
ck_acct_ind,0.6739732072642204
sv_acct_ind,0.62882022245207
sv_avg_bal,0.6403998925050896
ck_avg_bal,0.5811424533236349

Column Name,Value
state_code,OTHER

Variable Name,Standard Error of Estimate
tot_age,0.0
tot_income,0.0
tot_cust_years,0.0
tot_children,0.0
single_ind,0.0
married_ind,0.0
separated_ind,0.0
female_ind,0.0
ck_acct_ind,0.0
sv_acct_ind,0.0
sv_avg_bal,0.0
ck_avg_bal,0.0

Column Name,Value
state_code,TX

Variable Name,Standard Error of Estimate
tot_age,0.0
tot_income,0.0
tot_cust_years,0.0
tot_children,0.0
single_ind,0.0
married_ind,0.0
separated_ind,0.0
female_ind,0.0
ck_acct_ind,0.0
sv_acct_ind,0.0
sv_avg_bal,0.0
ck_avg_bal,0.0


---

## Logistic Regression

### Purpose

Logistic Regression is one of the most widely used types of statistical analysis.  In Logistic Regression, a set of independent variables (in this case columns) is processed to predict the value of a dependent variable (column) that assumes two values referred to as response (1) and non-response (0).  In practice, the user specifies which value of the dependent variable to treat as the response, and all other values assumed by the dependent variable are treated as non-response.  The result, however, is not a continuous numeric value as in Linear Regression, but rather a probability between 0 and 1 that the dependent variable assumes the response value.  Many types of analysis lend themselves to Logistic Regression and, when scoring a model, benefit from the estimation of a probability rather than a fixed value.  For example, when predicting who should be targeted for a marketing campaign, the scored customers can be ordered by predicted probability from most to least likely to respond, and the top n taken from the customer list.
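
To make the marketing example concrete, the Python sketch below applies the standard logistic function to a linear combination of columns and then ranks customers by the resulting probability. The coefficients, constant and customer rows are illustrative values only, not the fitted model from this notebook, and the function names are hypothetical.

```python
import math

def response_probability(row, coefficients, constant):
    """Logistic model: probability that the dependent variable takes the response value."""
    linear = constant + sum(coefficients[name] * row[name] for name in coefficients)
    return 1.0 / (1.0 + math.exp(-linear))

def top_n(customers, coefficients, constant, n):
    """Rank customers from most to least likely to respond and keep the first n."""
    scored = [(response_probability(c, coefficients, constant), c["cust_id"]) for c in customers]
    scored.sort(reverse=True)
    return [cust_id for _, cust_id in scored[:n]]

# Illustrative coefficients and constant (assumptions, not model output).
coefficients = {"ck_acct_ind": 1.4, "sv_acct_ind": 0.9}
constant = -2.0

customers = [
    {"cust_id": "A", "ck_acct_ind": 1, "sv_acct_ind": 1},
    {"cust_id": "B", "ck_acct_ind": 0, "sv_acct_ind": 0},
    {"cust_id": "C", "ck_acct_ind": 1, "sv_acct_ind": 0},
]
print(top_n(customers, coefficients, constant, 2))
```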

Some of the key features of this version of Logistic Regression are outlined below:

- The Teradata supplied table operator CALCMATRIX is used to build an ESSCP matrix for purposes of validating the input data, such as by checking for constant values.  Also, to avoid rebuilding this matrix every time the algorithm is run, the user may run the Matrix Analysis separately, saving an ESSCP matrix in a table that can then be input to Logistic Regression.  Refer to the matrixdatabase and matrixtablename parameters.

- One or more group by columns may optionally be specified so that an input matrix is built for each combination of group by column values, and subsequently a separate Logistic Regression model is built for each matrix.  To achieve this, the names of the group by columns are passed to CALCMATRIX as parameters, so it includes them as columns in the matrix table it creates.  Refer to the groupby parameter.

- The stepwise feature for Logistic Regression is a technique for selecting the independent variables in a logistic model.  It consists of different methods of “trying” variables and adding or removing them from a model through a series of forward and backward steps.  The following parameters are used with this feature, as described in the Optional Parameters section.
    - stepwise
    - forward
    - forwardonly
    - backward
    - backwardonly
    - enter
    - remove
    

- A Statistics Table is available, displaying the mean and standard deviation of each model variable.  Refer to the statstable parameter.

- A Success Table is available, displaying counts of predicted versus actual values of the dependent variable in the logistic model.  

- A Multi-Threshold Success Table is available.  Refer to the thresholdtable parameter.

- A Lift Table, such as would be used to build a Lift Chart, is available.  Refer to the lifttable parameter.

- A Near Dependency Report is available to identify two or more columns that may be collinear.  This report can be requested by setting parameter neardependencyreport=true and if desired, conditionindexthreshold (default 30) and varianceproportionthreshold (default 0.5).

- The algorithm is partially scalable because the size of each input matrix depends only on the number of independent variables (columns) and not on the size of the input table. The calculations performed on the client workstation however are not scalable when group by columns are used, because each model is built serially based on each matrix in the matrix table.
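
The Success Table and Multi-Threshold Success Table listed above can be pictured with a small Python sketch. It assumes, following the thresholdtable description, that a predicted probability above the threshold counts as a predicted response. Treating the comparison as strict and the end threshold as inclusive are assumptions, as are the function names and the sample data.

```python
def success_table(actual, probabilities, threshold):
    """Count (actual, predicted) pairs; a probability above the threshold predicts a response."""
    counts = {(a, p): 0 for a in (0, 1) for p in (0, 1)}
    for a, prob in zip(actual, probabilities):
        predicted = 1 if prob > threshold else 0
        counts[(a, predicted)] += 1
    return counts

def multi_threshold_table(actual, probabilities, begin, end, increment):
    """One success table per threshold, from begin to end in steps of increment."""
    steps = int(round((end - begin) / increment))
    rows = []
    for i in range(steps + 1):
        threshold = round(begin + i * increment, 10)
        rows.append((threshold, success_table(actual, probabilities, threshold)))
    return rows

# Hypothetical actual responses and predicted probabilities.
actual = [1, 0, 1, 0, 1]
probabilities = [0.9, 0.4, 0.6, 0.7, 0.2]
for threshold, counts in multi_threshold_table(actual, probabilities, 0.25, 0.75, 0.25):
    print(threshold, counts)
```

Each key in `counts` is an (actual, predicted) pair, so `(1, 1)` holds correctly predicted responses and `(0, 1)` holds non-responses predicted as responses.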

### Required Parameters

- **columns**

    The input columns representing the independent variables used in building a logistic regression model.  The columns must reside in the table named with the tablename parameter, and within the database specified by the database parameter. For example, columns=c1,c2,c3. When columns=all is entered, all columns in the input table are analyzed.  Other options include allnumeric.

- **database**

    The database containing the input table.

- **dependent**

    The name of the column acting as the dependent variable that is being predicted.

- **Logistic**

    The logistic parameter:
    - Is required
    - Must be the first parameter
    - Is always enclosed in single quotes


- **tablename**

    The input table to build a logistic regression model from.
    
### Optional Parameters

- **backward**

    Backward steps, i.e. removing variables from a model, use the P-value of the T-statistic, i.e. the ratio of a B-coefficient to its standard error.  The variable (column) with the largest P-value is removed if the P-value exceeds the criterion to remove.

- **backwardonly**

    This technique is similar to the backward technique except that a forward step is not performed.  It starts with all independent variables in the model.  Backward steps are executed until no more are possible.

- **conditionindexthreshold**

    If neardependencyreport=true, an XML report showing columns that may be collinear is produced and stored in the output table that ends in “_txt”.  One of the threshold parameters for that report is conditionindexthreshold with a default value of 30.

- **constant**

    By default, constant=true so that a constant term is included in the logistic model.   

- **convergence**

    The convergence criterion: the algorithm stops iterating when the change in the log likelihood function falls below this value.  By default, this value is 0.001.

- **enter**

    The criterion to enter a variable into the model.  The W-statistic chi-square P-value must be less than this value for a variable to be added.  The default value is 0.05.

- **forward**

    In this technique, starting with no independent variables in the model, a forward step is performed, adding the “best” choice, followed by a backward step, removing the “worst” choice.  Refer to the stepwise parameter for a description of the steps in this technique.

- **forwardonly**

    This technique is similar to the forward technique except that a backward step is not performed.

- **groupby**

    The input columns dividing the input table into partitions, one for each combination of values in the group by columns.  For each partition or combination of values a separate logistic model and XML report object is built.  The group by columns must reside in the table named with the tablename parameter. The default case is no group by columns.  For example:   groupby=column1,column2

- **lifttable**

    A table of information, such as would be required to build a lift chart, is available.  It splits the computed probability values into deciles, with the usual counts and percentages, to show what happens as more and more rows of ordered probabilities are accumulated.  It is delivered in the function’s XML output string.

- **matrixdatabase**

    The database where the matrix table resides if specified, as indicated by the matrixtablename parameter.

- **matrixtablename**

    Instead of internally building a matrix with the Matrix function each time this analysis is performed, the user may build an ESSCP Matrix once with the Matrix Analysis and save it to a table with this name in matrixdatabase.  The matrix can subsequently be read from this table instead of re-building it each time.   If the matrix table is specified, the columns specified with the columns parameter may be a subset of the columns in this matrix and may be specified in any order.  The columns must however all be present in the matrix.  Further, if group by columns are specified in the matrix, these same group by columns must be specified as such in this analysis.

- **maxiterations**

    The maximum number of attempts to converge on a solution.  By default, this value is 100.   

- **neardependencyreport**

    If neardependencyreport=true, an XML report showing columns that may be collinear is produced and stored in the output table if specified.  Two threshold parameters are available for this report, conditionindexthreshold (default 30) and varianceproportionthreshold (default 0.5).

- **outputdatabase**

    The database that contains the resulting output table that represents one or more logistic models.  

- **outputtablename**

    The name of the output table representing one or more logistic models. A second output table reporting statistical measures is automatically named on the user’s behalf by appending “_rpt” to the end of this name.  A third XML output table containing requested reports is automatically named on the user’s behalf by appending “_txt” to the end of this name.  If outputdatabase and outputtablename are not both specified, volatile output tables with randomly generated names are created in the logon user database, and the two output result sets are returned to the user instead.

- **overwrite**

    When overwrite is set to true (default), the output tables are dropped before creating new ones.

- **remove**

    The criterion to remove a variable from the model.  The T-Statistic P-value must be greater than this value for a variable to be removed.  The default value is 0.05.

- **response**

    The value assumed by the dependent column that is to be treated as the response value.  For example, if the dependent column is gender, the response value might be set to ‘F’ using response=F.

- **statstable**

    An optional data quality report, delivered in the function’s XML output string, including the mean and standard deviation of each model variable, derived from an ESSCP matrix.

- **stepwise**

    When stepwise=true the stepwise procedure is performed.  By default, stepwise=false. 

    - Forward steps, i.e. adding variables to a model, add the variable with the smallest chi-square P-value connected to its special W-statistic, provided the P-value is less than the criterion to enter.
    - Backward steps, i.e. removing variables from a model, use the P-value of the T-statistic, i.e. the ratio of a B-coefficient to its standard error.  The variable (column) with the largest P-value is removed if the P-value exceeds the criterion to remove.
    

- **successtable**

    A table delivered in the function’s XML output string, displaying counts of predicted versus actual values of the dependent variable of the logistic regression model.  The default is to not produce a successtable.  (This report is similar to the Decision Tree Confusion Matrix, but the successtable only includes two values of the dependent variable, namely response versus non-response.)

- **thresholdbegin**

    The beginning threshold value utilized in the Multi-Threshold Success Table.

- **thresholdend**

    The ending threshold value utilized in the Multi-Threshold Success Table.

- **thresholdincrement**

    The difference in threshold values between adjacent rows in the Multi-Threshold Success Table.

- **thresholdtable**

    When this parameter is set to true, the Multi-Threshold Success Table is produced and included in the XML output string in the result table.  The default is thresholdtable=false.  This report can be thought of as a table where each row is a Prediction Success Table, and each row has a different threshold value as generated by the thresholdbegin, thresholdend and thresholdincrement parameters.  What is meant by a threshold here is the value above which the predicted probability indicates a response. 

- **varianceproportionthreshold**

    If neardependencyreport=true, an XML report showing columns that may be collinear is produced and stored in the output table that ends in “_txt”.  One of the threshold parameters for that report is varianceproportionthreshold with a default value of 0.5.
    
- **columnstoexclude**

    If a column specifier such as all is used in the columns parameter, the columnstoexclude parameter may be used to exclude specific columns from the analysis.  For convenience, when the columnstoexclude parameter is used, dependent variable and group by columns, if any, are automatically excluded as input columns and do not need to be included as columnstoexclude.
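
The forward and backward step rules described under the stepwise, enter and remove parameters reduce to two simple selections once the P-values are known. The Python sketch below shows only that selection logic. The P-values are hypothetical inputs (the library derives them from the W-statistic and T-statistic), and the function names are illustrative.

```python
def forward_step(candidate_p_values, enter=0.05):
    """Return the variable to add: smallest P-value, if below the enter criterion."""
    if not candidate_p_values:
        return None
    name = min(candidate_p_values, key=candidate_p_values.get)
    return name if candidate_p_values[name] < enter else None

def backward_step(model_p_values, remove=0.05):
    """Return the variable to drop: largest P-value, if above the remove criterion."""
    if not model_p_values:
        return None
    name = max(model_p_values, key=model_p_values.get)
    return name if model_p_values[name] > remove else None

# Hypothetical P-values for variables not yet in the model (forward)
# and for variables currently in the model (backward).
print(forward_step({"tot_age": 0.20, "ck_acct_ind": 0.001}))
print(backward_step({"tot_children": 0.30, "sv_acct_ind": 0.002}))
```

With the default criteria of 0.05, a forward step adds a variable only when its P-value is below 0.05, and a backward step removes one only when its P-value is above 0.05; otherwise the helper returns `None` and the model is left unchanged.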

---

1) Once again using the VAL_ADS table, let's build a logistic regression model to predict the customer base's propensity to open a credit card account (cc_acct_ind) based upon all non-credit card variables in the analytic data set.  The model coefficients and variable statistics are created within the outputtablename specified.  Additionally, model statistics are created within a table with a "_rpt" extension on the outputtablename.  The reports for the successtable, thresholdtable and lifttable are returned in an XML string within a table with a "_txt" extension on the outputtablename.

In [53]:
call ${VALDB}.td_analyze('logistic',
                         'database=${QLID};
                          tablename=VAL_ADS;
                          columns=tot_age,tot_income,tot_cust_years,tot_children,single_ind,married_ind,separated_ind,female_ind,ck_acct_ind,sv_acct_ind,sv_avg_bal,ck_avg_bal,ca_resident_ind,ny_resident_ind,tx_resident_ind,il_resident_ind,az_resident_ind,oh_resident_ind;
                          dependent=cc_acct_ind;
                          outputdatabase=${QLID};
                          outputtablename=LogisticOut1;
                          statstable=true;
                          successtable=true;
                          thresholdtable=true;
                          lifttable=true;');

Success: 0 rows affected

In [54]:
SELECT * FROM ${QLID}.LogisticOut1 ORDER BY 2 DESC;

Unnamed: 0,Column Name,B Coefficient,Standard Error,Wald Statistic,T Statistic,P-Value,Odds Ratio,Lower,Upper,Partial R,Standardized Coefficient
1,ck_acct_ind,1.41694157310132,0.0529027726520376,717.3762873144,26.783881109996,0.0,4.12448668936362,3.7182534078909,4.57510249695084,0.227507922436023,0.359316822965475
2,sv_acct_ind,0.892435142171466,0.0500383063441665,318.088614013848,17.8350389406317,0.0,2.44106676411975,2.21302833282337,2.69260309888932,0.15122851250749,0.24402679691192
3,separated_ind,0.827057542654958,0.140802414908,34.5025541755162,5.87388748407017,4.38690350712534e-09,2.28658066580227,1.7351430863187,3.01326800218734,0.0484939990664374,0.111816512113528
4,oh_resident_ind,0.429352619416101,0.140300942826966,9.36498432067098,3.06022618782844,0.002217312793556,1.53626265493914,1.16692002960001,2.02250615731537,0.0230842180346325,0.0408938807162516
5,ny_resident_ind,0.361872823811637,0.0729659240839164,24.5964123220205,4.9594770210195,7.17962736151989e-07,1.43601630326707,1.24465980117528,1.65679233900029,0.0404342134842815,0.0698954416062522
6,female_ind,0.31232521735501,0.0459546159404579,46.1908357272112,6.79638401852126,1.1313172620930298e-11,1.3665990624997,1.248890834733,1.49540131585986,0.0565451501720267,0.0854878015521268
7,tot_children,0.0748175492063465,0.0177737799603883,17.7193324842777,4.20943374865049,2.58156184260727e-05,1.07768750800521,1.04079160327442,1.11589136697161,0.0337245609407021,0.0603359719520308
8,single_ind,0.0721097490918346,0.0969888978803446,0.552769306694193,0.743484570582465,0.457205091991668,1.07477329298413,0.88870938524041,1.29979231737209,0.0,0.0191897796083525
9,tot_cust_years,0.0259334164666221,0.0079680312417378,10.5929619538741,3.25468308040493,0.0011388045787583,1.02627261334166,1.0103697593331,1.04242577251156,0.0249345020017582,0.0425205412656912
10,tot_age,0.01293176059438,0.0014963745825018,74.6852203323379,8.64206111598025,0.0,1.01301573740934,1.01004907653894,1.01599111179369,0.0725190840262232,0.137362057698198
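
The statistic columns in this output can be cross-checked against each other. The Python sketch below uses the standard logistic regression relations (T Statistic = B / Standard Error, Wald Statistic = T squared, Odds Ratio = exp(B)), which reproduce the ck_acct_ind row above. The Lower and Upper columns are assumed here to be a 95% interval on the odds ratio using z = 1.96; that choice is an assumption that happens to match the printed values closely, not a documented VAL formula, and the function name is hypothetical.

```python
import math


def coefficient_stats(b, se, z=1.96):
    """Derive per-variable statistics from a coefficient and its standard error."""
    t_stat = b / se
    return {
        "t_statistic": t_stat,
        "wald_statistic": t_stat ** 2,
        "odds_ratio": math.exp(b),
        "lower": math.exp(b - z * se),  # assumption: 95% interval, z = 1.96
        "upper": math.exp(b + z * se),
    }


# B Coefficient and Standard Error of the ck_acct_ind row above.
stats = coefficient_stats(1.41694157310132, 0.0529027726520376)
for name, value in stats.items():
    print(f"{name}: {value:.6f}")
```

Read this way, an odds ratio of about 4.12 for ck_acct_ind means that, with the other variables held fixed, the estimated odds of holding a credit card account are roughly four times higher for customers with a checking account.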


In [55]:
SELECT * FROM ${QLID}.LogisticOut1_rpt ORDER BY 1;

Unnamed: 0,rid,Total Observations,Total Iterations,Initial Log Likelihood,Final Log Likelihood,Likelihood Ratio Test G Statistic,Chi-Square Degrees of Freedom,Chi-Square Value,Chi-Square Probability,McFaddens Pseudo R-Squared,Dependent Variable,Dependent Response Value,Total Distinct Values
1,1,10458,5,-6910.53221147233,-5828.50963922532,2164.04514449402,18,28.8692994303784,0,0.156575866971682,cc_acct_ind,1,2
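
Two of the model statistics in this row follow directly from the initial and final log likelihoods. The short Python sketch below applies the usual definitions (G = 2 × (final − initial) and McFadden's pseudo R-squared = 1 − final / initial) to the values above; the function names are hypothetical, and the formulas are the textbook ones rather than something confirmed from the VAL source.

```python
def likelihood_ratio_g(initial_ll, final_ll):
    """Likelihood ratio test statistic G = 2 * (final LL - initial LL)."""
    return 2.0 * (final_ll - initial_ll)


def mcfadden_pseudo_r2(initial_ll, final_ll):
    """McFadden's pseudo R-squared = 1 - final LL / initial LL."""
    return 1.0 - final_ll / initial_ll


# Log likelihoods from the LogisticOut1_rpt row above.
initial_ll = -6910.53221147233
final_ll = -5828.50963922532
print(likelihood_ratio_g(initial_ll, final_ll))
print(mcfadden_pseudo_r2(initial_ll, final_ll))
```

Both results agree with the "Likelihood Ratio Test G Statistic" and "McFaddens Pseudo R-Squared" columns to the precision shown.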


Note - To view the HTML report, double-click the contents of the 'html' column returned by calling 'report'; alternatively, right-click the contents of the 'html' column and select 'Show Cell as Text...', or copy the contents of the cell and create an HTML report.

In [56]:
call ${VALDB}.td_analyze ('report',
                          'database=${QLID};
                           tablename=LogisticOut1;
                           analysistype=logistic');

Success: 0 rows affected

WARNING: [Teradata Database] [Warning 3212] The stored procedure returned one or more result sets.

The 'html' cell of the single report row (id 1) contains, in order, the Logistic Regression Summary, Variable Statistics, Prediction Success Table, Multi-Threshold Success Table and Cumulative Lift Table.  The rendered tables below show up to the first ten rows of each section.

0,1
Database,demo_user
Tablename,VAL_ADS
IndependentVariables,18
DependentVariable,cc_acct_ind
ResponseValue,1
Stepwise,none
MemorySize,0
Sample,false
Constant,IncludeConstant

Column Name,Mean,Standard Deviation
tot_age,44.33639319181488,19.26629860419521
tot_income,30066.26892331228,38278.04592303946
tot_cust_years,5.764773379231211,2.9739126261567628
tot_children,1.7862880091795754,1.4627226627996837
single_ind,0.3694779116465863,0.4826865056461253
married_ind,0.4725568942436412,0.4992701787167255
separated_ind,0.0642570281124498,0.2452219657381557
female_ind,0.5595716198125836,0.4964622746742119
ck_acct_ind,0.6961178045515395,0.4599543847377425
sv_acct_ind,0.5635876840696118,0.4959638365813491

-,Estimate Response,Estimate Non-Response,Actual Total
Actual Response,4593.80/43.93%,1958.20/18.72%,6552.00/62.65%
Actual Non-Response,1958.20/18.72%,1947.80/18.62%,3906.00/37.35%
Actual Total,6552/62.65060240963856%,3895.6/37.34939759036144%,10458.00/100.0%
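
The fractional cell values in this table (for example 4593.80) are consistent with cells built by summing predicted probabilities rather than by counting thresholded estimates: each row adds up to its actual total (4593.80 + 1958.20 = 6552.00 and 1958.20 + 1947.80 = 3906.00). Assuming that reading, which is an interpretation of the output and not something stated in this notebook, the table could be sketched in Python as follows; the function name and the toy data are hypothetical.

```python
def prediction_success_table(actual, prob):
    """Sum predicted probabilities into a 2x2 table of expected counts."""
    table = {
        ("actual_response", "estimate_response"): 0.0,
        ("actual_response", "estimate_non_response"): 0.0,
        ("actual_non_response", "estimate_response"): 0.0,
        ("actual_non_response", "estimate_non_response"): 0.0,
    }
    for a, p in zip(actual, prob):
        row = "actual_response" if a == 1 else "actual_non_response"
        # Each observation contributes p to "response" and 1 - p to "non-response".
        table[(row, "estimate_response")] += p
        table[(row, "estimate_non_response")] += 1.0 - p
    return table


# Hypothetical actual responses and predicted probabilities.
table = prediction_success_table([1, 1, 0, 0], [0.75, 0.25, 0.5, 0.25])
for cell, value in table.items():
    print(cell, value)
```

Under this interpretation every row sums to the number of observations with that actual outcome, which is why no threshold parameter is needed for this report.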

Threshold Probability,"Actual Response, Estimate Response","Actual Response, Estimate Non-Response","Actual Non-Response, Estimate Response","Actual Non-Response, Estimate Non-Response"
0.0,6552,0,3906,0
0.05,6552,0,3887,19
0.1,6550,2,3863,43
0.15,6548,4,3819,87
0.2,6526,26,3650,256
0.25,6458,94,3363,543
0.3,6355,197,3026,880
0.35,6194,358,2680,1226
0.4,6018,534,2290,1616
0.45,5857,695,2043,1863

Decile,Count,Response,Response (%),Captured Response (%),Lift,Cumulative Response,Cumulative Response (%),Cumulative Captured Response (%),Cumulative Lift
1,1045.0,903.0,86.41,13.78,1.38,903.0,86.41,13.78,1.38
2,1046.0,876.0,83.75,13.37,1.34,1779.0,85.08,27.15,1.36
3,1046.0,858.0,82.03,13.1,1.31,2637.0,84.06,40.25,1.34
4,1046.0,774.0,74.0,11.81,1.18,3411.0,81.54,52.06,1.3
5,1046.0,755.0,72.18,11.52,1.15,4166.0,79.67,63.58,1.27
6,1045.0,727.0,69.57,11.1,1.11,4893.0,77.99,74.68,1.24
7,1046.0,620.0,59.27,9.46,0.95,5513.0,75.31,84.14,1.2
8,1046.0,521.0,49.81,7.95,0.8,6034.0,72.13,92.09,1.15
9,1046.0,327.0,31.26,4.99,0.5,6361.0,67.58,97.08,1.08
10,1046.0,191.0,18.26,2.92,0.29,6552.0,62.65,100.0,1.0
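
The derived columns of the Cumulative Lift Table can be recomputed from the Count and Response columns alone. The Python sketch below does so under the usual definitions (lift = decile response rate divided by the overall response rate); the function and key names are hypothetical, and rounding to two decimals is assumed only to match the display above.

```python
def lift_table(counts, responses):
    """Recompute lift-table columns from per-decile counts and responses."""
    total_response = sum(responses)
    overall_rate = total_response / sum(counts)
    rows = []
    cum_count = cum_response = 0
    for decile, (n, r) in enumerate(zip(counts, responses), start=1):
        cum_count += n
        cum_response += r
        rows.append({
            "decile": decile,
            "response_pct": round(100.0 * r / n, 2),
            "captured_pct": round(100.0 * r / total_response, 2),
            "lift": round((r / n) / overall_rate, 2),
            "cum_response": cum_response,
            "cum_response_pct": round(100.0 * cum_response / cum_count, 2),
            "cum_captured_pct": round(100.0 * cum_response / total_response, 2),
            "cum_lift": round((cum_response / cum_count) / overall_rate, 2),
        })
    return rows


# Count and Response columns from the table above.
counts = [1045, 1046, 1046, 1046, 1046, 1045, 1046, 1046, 1046, 1046]
responses = [903, 876, 858, 774, 755, 727, 620, 521, 327, 191]
for row in lift_table(counts, responses):
    print(row)
```

A lift above 1 in the top deciles indicates that the customers the model scores highest respond more often than the overall 62.65% rate, and the cumulative lift necessarily falls to 1.0 once all ten deciles are included.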


2) Now using the VAL_ADS2 table, let's build a logistic regression model to predict the customer base's propensity to open a credit card account (cc_acct_ind) based upon all non-credit card variables in the analytic data set, creating a model for each state_code using the groupby=state_code option.   The model coefficients and variable statistics are created within the outputtablename specified for each state_code model.  Additionally, model statistics are created within a table with a "_rpt" extension on the outputtablename.  The reports for the successtable, thresholdtable and lifttable are returned in an XML string within a table with a "_txt" extension on the outputtablename.

In [57]:
call ${VALDB}.td_analyze('logistic',
                         'database=${QLID};
                          tablename=VAL_ADS2;
                          columns=tot_age,tot_income,tot_cust_years,tot_children,single_ind,married_ind,separated_ind,female_ind,ck_acct_ind,sv_acct_ind,sv_avg_bal,ck_avg_bal;
                          dependent=cc_acct_ind;
                          outputdatabase=${QLID};
                          outputtablename=LogisticOut2;
                          statstable=true;
                          successtable=true;
                          thresholdtable=true;
                          lifttable=true;
                          groupby=state_code');

Success: 0 rows affected

In [58]:
SELECT * FROM ${QLID}.LogisticOut2 ORDER BY 1;

Unnamed: 0,state_code,Column Name,B Coefficient,Standard Error,Wald Statistic,T Statistic,P-Value,Odds Ratio,Lower,Upper,Partial R,Standardized Coefficient
1,AZ,tot_age,0.117174549594408,0.0233447831133687,25.1934067679785,5.01930341461626,8.58516454593783e-07,1.12431566140664,1.07403181168377,1.17695369237018,0.228408338008192,0.962077333542935
2,AZ,tot_cust_years,0.334494945779435,0.0700972709854063,22.7707271334944,4.77186830638634,2.76934367970938e-06,1.39723452779395,1.21787412613862,1.60300993654363,0.216150145166126,0.602742928950799
3,AZ,tot_children,-0.229863167142019,0.173565949488023,1.75391973863825,-1.32435634881185,0.186321048759985,0.794642328245384,0.565499324988455,1.11663516106254,0.0,-0.155436346060081
4,AZ,single_ind,-11.163254890029,225.004068664933,0.0024615062306996,-0.0496135690179578,0.960460998381238,1.41860014917343e-05,4.24667987414101e-197,4.73882289901095e+186,0.0,-2.80163245812041
5,AZ,separated_ind,18.6147301550942,281.777564979305,0.0043641600186298,0.0660617893992426,0.947369501777696,121415627.432199,1.7170242733414e-232,8.58564134103153e+247,0.0,3.39917761548026
6,AZ,female_ind,13.0745094058481,84.517311436573,0.0239309234183428,0.154696229489742,0.877157421038286,476636.490942025,5.4565905352168e-67,4.1634486412586e+77,0.0,3.40311970057719
7,AZ,ck_acct_ind,16.0631648911163,84.5188483381545,0.0361206114639711,0.190054232954626,0.84938601789024,9465506.850035,1.08036308898837e-65,8.293121159105651e+78,0.0,3.60196724029967
8,AZ,sv_acct_ind,-1.77789992760098,0.580643792995016,9.37551318262348,-3.06194597970367,0.002383730433228,0.168992671720307,0.0541530571687977,0.527366774624546,-0.128802947999384,-0.446198012454761
9,AZ,sv_avg_bal,0.0003054667798734,0.0001009466793455,9.15680358966283,3.0260210821577,0.0026770295600513,1.0003055134396,1.00010762071469,1.00050344532183,0.126878849128253,0.605853731504418
10,AZ,ck_avg_bal,-5.24924917628914e-05,3.64711750510335e-05,2.0715473554448,-1.43928709972847,0.151037576034178,0.999947508885944,0.999876033003206,1.00001898987812,-0.0126860516212762,-0.158270687053858


In [59]:
SELECT * FROM ${QLID}.LogisticOut2_rpt ORDER BY 1;

Unnamed: 0,rid,state_code,Total Observations,Total Iterations,Initial Log Likelihood,Final Log Likelihood,Likelihood Ratio Test G Statistic,Chi-Square Degrees of Freedom,Chi-Square Value,Chi-Square Probability,McFaddens Pseudo R-Squared,Dependent Variable,Dependent Response Value,Total Distinct Values
1,1,AZ,336,13.0,-222.285248021082,-92.4573993509798,259.655697340204,12.0,21.0260698174829,0.0,0.584059670292601,cc_acct_ind,1.0,2.0
2,2,CA,2478,6.0,-1688.67534073488,-1349.65181819766,678.047045074445,12.0,21.0260698174829,0.0,0.200762997101435,cc_acct_ind,1.0,2.0
3,3,IL,784,13.0,-518.665578715858,-412.128587203756,213.073983024204,12.0,21.0260698174829,0.0,0.20540594148521,cc_acct_ind,1.0,2.0
4,4,NY,1498,6.0,-925.499664066902,-782.236052499109,286.527223135585,12.0,21.0260698174829,0.0,0.154795962797277,cc_acct_ind,1.0,2.0
5,5,OH,322,,,,,,,,,,,
6,6,OTHER,3878,5.0,-2528.09916938008,-2036.03370158989,984.130935580389,12.0,21.0260698174829,0.0,0.19463851487711,cc_acct_ind,1.0,2.0
7,7,TX,1162,5.0,-791.125156799731,-650.825318136501,280.599677326462,12.0,21.0260698174829,0.0,0.177342153080776,cc_acct_ind,1.0,2.0


Note - To view the HTML report, double-click the contents of the 'html' column returned by calling 'report'; alternatively, right-click the contents of the 'html' column and select 'Show Cell as Text...', or copy the contents of the cell and create an HTML report.

In [60]:
call ${VALDB}.td_analyze ('report',
                          'database=${QLID};
                           tablename=LogisticOut2;
                           analysistype=logistic');

Success: 0 rows affected

WARNING: [Teradata Database] [Warning 3212] The stored procedure returned one or more result sets.

The report returns one row per state_code value, each with an 'html' cell; for a modeled group such as OTHER (the first row) the cell contains the Logistic Regression Summary, Group By Columns, Variable Statistics, Prediction Success Table, Multi-Threshold Success Table and Cumulative Lift Table.  The rendered tables below show up to the first ten rows of each section, group by group.

0,1
Database,demo_user
Tablename,VAL_ADS2
IndependentVariables,12
DependentVariable,cc_acct_ind
ResponseValue,1
Stepwise,none
MemorySize,0
Sample,false
Constant,IncludeConstant

Column Name,Value
state_code,OTHER

Column Name,Mean,Standard Deviation
tot_age,45.61526560082517,19.66744393300744
tot_income,27456.931407942237,34407.07139596879
tot_cust_years,5.896854048478597,3.036734158787296
tot_children,1.7645693656523982,1.462986003042028
single_ind,0.3610108303249097,0.4803556087366503
married_ind,0.4873646209386281,0.4999047797971812
separated_ind,0.0794223826714801,0.270431740590103
female_ind,0.555956678700361,0.4969230577610004
ck_acct_ind,0.7003610108303249,0.4581589174631188
sv_acct_ind,0.5631768953068592,0.4960565827203649

-,Estimate Response,Estimate Non-Response,Actual Total
Actual Response,4673.21/44.69%,1878.79/17.97%,6552.00/62.65%
Actual Non-Response,1973.52/18.87%,1932.48/18.48%,3906.00/37.35%
Actual Total,6646.73/63.55641614075349%,3864.96/36.44358385924651%,10458.00/100.0%

Threshold Probability,"Actual Response, Estimate Response","Actual Response, Estimate Non-Response","Actual Non-Response, Estimate Response","Actual Non-Response, Estimate Non-Response"
0.0,6552,0,3906,0
0.05,6551,1,3887,19
0.1,6549,3,3835,71
0.15,6532,20,3671,235
0.2,6477,75,3440,466
0.25,6368,184,3187,719
0.3,6196,356,2828,1078
0.35,6019,533,2446,1460
0.4,5861,691,2182,1724
0.45,5708,844,2023,1883

Decile,Count,Response,Response (%),Captured Response (%),Lift,Cumulative Response,Cumulative Response (%),Cumulative Captured Response (%),Cumulative Lift
1,1045.0,899.0,86.03,13.72,1.37,899.0,86.03,13.72,1.37
2,1046.0,832.0,79.54,12.7,1.27,1731.0,82.78,26.42,1.32
3,1046.0,786.0,75.14,12.0,1.2,2517.0,80.24,38.42,1.28
4,1046.0,793.0,75.81,12.1,1.21,3310.0,79.13,50.52,1.26
5,1046.0,765.0,73.14,11.68,1.17,4075.0,77.93,62.19,1.24
6,1045.0,731.0,69.95,11.16,1.12,4806.0,76.6,73.35,1.22
7,1046.0,659.0,63.0,10.06,1.01,5465.0,74.66,83.41,1.19
8,1046.0,526.0,50.29,8.03,0.8,5991.0,71.61,91.44,1.14
9,1046.0,335.0,32.03,5.11,0.51,6326.0,67.21,96.55,1.07
10,1046.0,226.0,21.61,3.45,0.34,6552.0,62.65,100.0,1.0

0,1
Database,demo_user
Tablename,VAL_ADS2
IndependentVariables,12
DependentVariable,cc_acct_ind
ResponseValue,1
Stepwise,none
MemorySize,0
Sample,false
Constant,IncludeConstant

Column Name,Value
state_code,NY

Column Name,Mean,Standard Deviation
tot_age,46.32443257676903,19.678415347098948
tot_income,31286.77062750336,36231.31704329927
tot_cust_years,5.735647530040054,2.843656750622567
tot_children,1.7162883845126835,1.4057859642562447
single_ind,0.3457943925233644,0.4757853997428691
married_ind,0.5233644859813084,0.4996205928509195
separated_ind,0.0373831775700934,0.1897622568697855
female_ind,0.5514018691588785,0.497516917853331
ck_acct_ind,0.6542056074766355,0.4757853997428691
sv_acct_ind,0.5981308411214953,0.4904395029449176

-,Estimate Response,Estimate Non-Response,Actual Total
Actual Response,4926.09/47.10%,1625.91/15.55%,6552.00/62.65%
Actual Non-Response,2366.93/22.63%,1539.07/14.72%,3906.00/37.35%
Actual Total,7293.02/69.73627844712182%,3078.14/30.26372155287818%,10458.00/100.0%

Threshold Probability,"Actual Response, Estimate Response","Actual Response, Estimate Non-Response","Actual Non-Response, Estimate Response","Actual Non-Response, Estimate Non-Response"
0.0,6552,0,3906,0
0.05,6520,32,3816,90
0.1,6507,45,3777,129
0.15,6496,56,3756,150
0.2,6481,71,3740,166
0.25,6455,97,3707,199
0.3,6428,124,3641,265
0.35,6372,180,3513,393
0.4,6300,252,3352,554
0.45,6193,359,3087,819

Decile,Count,Response,Response (%),Captured Response (%),Lift,Cumulative Response,Cumulative Response (%),Cumulative Captured Response (%),Cumulative Lift
1,1045.0,904.0,86.51,13.8,1.38,904.0,86.51,13.8,1.38
2,1046.0,860.0,82.22,13.13,1.31,1764.0,84.36,26.92,1.35
3,1046.0,836.0,79.92,12.76,1.28,2600.0,82.88,39.68,1.32
4,1046.0,794.0,75.91,12.12,1.21,3394.0,81.14,51.8,1.3
5,1046.0,731.0,69.89,11.16,1.12,4125.0,78.89,62.96,1.26
6,1045.0,627.0,60.0,9.57,0.96,4752.0,75.74,72.53,1.21
7,1046.0,573.0,54.78,8.75,0.87,5325.0,72.75,81.27,1.16
8,1046.0,520.0,49.71,7.94,0.79,5845.0,69.87,89.21,1.12
9,1046.0,382.0,36.52,5.83,0.58,6227.0,66.16,95.04,1.06
10,1046.0,325.0,31.07,4.96,0.5,6552.0,62.65,100.0,1.0

0,1
Database,demo_user
Tablename,VAL_ADS2
IndependentVariables,12
DependentVariable,cc_acct_ind
ResponseValue,1
Stepwise,none
MemorySize,0
Sample,false
Constant,IncludeConstant

Column Name,Value
state_code,TX

Column Name,Mean,Standard Deviation
tot_age,45.098106712564544,20.01078124215592
tot_income,34697.3952667814,46675.59088479758
tot_cust_years,5.495697074010327,2.777490189562881
tot_children,1.6497418244406197,1.3308908256468612
single_ind,0.3614457831325301,0.4806261806037744
married_ind,0.5060240963855421,0.5001789785827747
separated_ind,0.0602409638554216,0.2380351896952735
female_ind,0.4819277108433735,0.4998884302117093
ck_acct_ind,0.7349397590361446,0.4415553180132103
sv_acct_ind,0.6385542168674698,0.4806261806037744

-,Estimate Response,Estimate Non-Response,Actual Total
Actual Response,4141.58/39.60%,2410.42/23.05%,6552.00/62.65%
Actual Non-Response,1697.40/16.23%,2208.60/21.12%,3906.00/37.35%
Actual Total,5838.98/55.83266398929049%,4417.2/44.167336010709505%,10458.00/100.0%

Threshold Probability,"Actual Response, Estimate Response","Actual Response, Estimate Non-Response","Actual Non-Response, Estimate Response","Actual Non-Response, Estimate Non-Response"
0.0,6552,0,3906,0
0.05,6552,0,3906,0
0.1,6530,22,3690,216
0.15,6294,258,3169,737
0.2,6098,454,2849,1057
0.25,6044,508,2694,1212
0.3,5934,618,2533,1373
0.35,5729,823,2295,1611
0.4,5476,1076,2006,1900
0.45,5158,1394,1715,2191

Decile,Count,Response,Response (%),Captured Response (%),Lift,Cumulative Response,Cumulative Response (%),Cumulative Captured Response (%),Cumulative Lift
1,1045.0,864.0,82.68,13.19,1.32,864.0,82.68,13.19,1.32
2,1046.0,820.0,78.39,12.52,1.25,1684.0,80.54,25.7,1.29
3,1046.0,781.0,74.67,11.92,1.19,2465.0,78.58,37.62,1.25
4,1046.0,826.0,78.97,12.61,1.26,3291.0,78.68,50.23,1.26
5,1046.0,801.0,76.58,12.23,1.22,4092.0,78.26,62.45,1.25
6,1045.0,723.0,69.19,11.03,1.1,4815.0,76.75,73.49,1.22
7,1046.0,573.0,54.78,8.75,0.87,5388.0,73.61,82.23,1.17
8,1046.0,492.0,47.04,7.51,0.75,5880.0,70.28,89.74,1.12
9,1046.0,395.0,37.76,6.03,0.6,6275.0,66.67,95.77,1.06
10,1046.0,277.0,26.48,4.23,0.42,6552.0,62.65,100.0,1.0

0,1
Database,demo_user
Tablename,VAL_ADS2
IndependentVariables,12
DependentVariable,cc_acct_ind
ResponseValue,1
Stepwise,none
MemorySize,0
Sample,false
Constant,IncludeConstant
ErrorMessage,Constant columns detected...run terminated.

Column Name,Value
state_code,OH
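
The "Constant columns detected...run terminated." message for the OH group suggests that at least one input column takes a single value within that group, which would also explain the empty OH row in the "_rpt" table. The exact check VAL performs is not shown in this notebook, so the Python sketch below is only a plausible way to look for such columns before running a grouped model; the function name and the sample rows are hypothetical.

```python
from collections import defaultdict


def constant_columns_by_group(rows, group_col, columns):
    """Return, for each group, the columns that hold at most one distinct value."""
    seen = defaultdict(lambda: {c: set() for c in columns})
    for row in rows:
        for c in columns:
            seen[row[group_col]][c].add(row[c])
    return {
        group: [c for c in columns if len(values[c]) <= 1]
        for group, values in seen.items()
    }


# Hypothetical rows: separated_ind never varies within OH.
rows = [
    {"state_code": "OH", "separated_ind": 0, "female_ind": 1},
    {"state_code": "OH", "separated_ind": 0, "female_ind": 0},
    {"state_code": "TX", "separated_ind": 0, "female_ind": 1},
    {"state_code": "TX", "separated_ind": 1, "female_ind": 1},
]
print(constant_columns_by_group(rows, "state_code", ["separated_ind", "female_ind"]))
```

Groups that come back with a non-empty list are candidates for dropping those columns, or for being merged with another group, before the model is rebuilt.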

0,1
Database,demo_user
Tablename,VAL_ADS2
IndependentVariables,12
DependentVariable,cc_acct_ind
ResponseValue,1
Stepwise,none
MemorySize,0
Sample,false
Constant,IncludeConstant

Column Name,Value
state_code,IL

Column Name,Mean,Standard Deviation
tot_age,41.69642857142857,19.67233991812886
tot_income,36907.45395408163,50113.243725770255
tot_cust_years,5.410714285714286,2.9559883588272724
tot_children,1.9642857142857144,1.4092379035173903
single_ind,0.3571428571428571,0.4794633014853841
married_ind,0.4464285714285714,0.4974391637011028
separated_ind,0.0892857142857142,0.2853377376392277
female_ind,0.5178571428571429,0.5
ck_acct_ind,0.7321428571428571,0.4431254373839364
sv_acct_ind,0.5178571428571429,0.5

-,Estimate Response,Estimate Non-Response,Actual Total
Actual Response,4511.95/43.14%,2040.05/19.51%,6552.00/62.65%
Actual Non-Response,2189.46/20.94%,1716.54/16.41%,3906.00/37.35%
Actual Total,6701.41/64.07926945878752%,3433.08/35.92073054121247%,10458.00/100.0%

Threshold Probability,"Actual Response, Estimate Response","Actual Response, Estimate Non-Response","Actual Non-Response, Estimate Response","Actual Non-Response, Estimate Non-Response"
0.0,6552,0,3906,0
0.05,6547,5,3861,45
0.1,6536,16,3841,65
0.15,6529,23,3818,88
0.2,6510,42,3781,125
0.25,6468,84,3725,181
0.3,6364,188,3583,323
0.35,6156,396,3285,621
0.4,5880,672,2910,996
0.45,5577,975,2575,1331

Decile,Count,Response,Response (%),Captured Response (%),Lift,Cumulative Response,Cumulative Response (%),Cumulative Captured Response (%),Cumulative Lift
1,1045.0,845.0,80.86,12.9,1.29,845.0,80.86,12.9,1.29
2,1046.0,839.0,80.21,12.81,1.28,1684.0,80.54,25.7,1.29
3,1046.0,783.0,74.86,11.95,1.19,2467.0,78.64,37.65,1.26
4,1046.0,720.0,68.83,10.99,1.1,3187.0,76.19,48.64,1.22
5,1046.0,738.0,70.55,11.26,1.13,3925.0,75.06,59.91,1.2
6,1045.0,675.0,64.59,10.3,1.03,4600.0,73.32,70.21,1.17
7,1046.0,542.0,51.82,8.27,0.83,5142.0,70.25,78.48,1.12
8,1046.0,549.0,52.49,8.38,0.84,5691.0,68.03,86.86,1.09
9,1046.0,449.0,42.93,6.85,0.69,6140.0,65.24,93.71,1.04
10,1046.0,412.0,39.39,6.29,0.63,6552.0,62.65,100.0,1.0

0,1
Database,demo_user
Tablename,VAL_ADS2
IndependentVariables,12
DependentVariable,cc_acct_ind
ResponseValue,1
Stepwise,none
MemorySize,0
Sample,false
Constant,IncludeConstant

Column Name,Value
state_code,CA

Column Name,Mean,Standard Deviation
tot_age,40.75544794188862,17.959176348602956
tot_income,30083.408232445512,37812.99550168868
tot_cust_years,5.936238902340597,2.985223043514099
tot_children,1.9584342211460852,1.6004539531608368
single_ind,0.4124293785310734,0.4923710174960566
married_ind,0.4067796610169492,0.4913322589181165
separated_ind,0.0508474576271186,0.2197304660958871
female_ind,0.6045197740112994,0.4890522825372915
ck_acct_ind,0.6666666666666666,0.4714996675314881
sv_acct_ind,0.519774011299435,0.4997096748741504

-,Estimate Response,Estimate Non-Response,Actual Total
Actual Response,4446.18/42.51%,2105.82/20.14%,6552.00/62.65%
Actual Non-Response,1881.59/17.99%,2024.41/19.36%,3906.00/37.35%
Actual Total,6327.77/60.50650219927329%,4048.82/39.49349780072672%,10458.00/100.0%

Threshold Probability,"Actual Response, Estimate Response","Actual Response, Estimate Non-Response","Actual Non-Response, Estimate Response","Actual Non-Response, Estimate Non-Response"
0.0,6552,0,3906,0
0.05,6544,8,3819,87
0.1,6524,28,3763,143
0.15,6490,62,3634,272
0.2,6446,106,3374,532
0.25,6362,190,3077,829
0.3,6226,326,2752,1154
0.35,6061,491,2430,1476
0.4,5835,717,2212,1694
0.45,5595,957,2007,1899

Decile,Count,Response,Response (%),Captured Response (%),Lift,Cumulative Response,Cumulative Response (%),Cumulative Captured Response (%),Cumulative Lift
1,1045.0,821.0,78.56,12.53,1.25,821.0,78.56,12.53,1.25
2,1046.0,849.0,81.17,12.96,1.3,1670.0,79.87,25.49,1.27
3,1046.0,828.0,79.16,12.64,1.26,2498.0,79.63,38.13,1.27
4,1046.0,828.0,79.16,12.64,1.26,3326.0,79.51,50.76,1.27
5,1046.0,753.0,71.99,11.49,1.15,4079.0,78.01,62.26,1.25
6,1045.0,708.0,67.75,10.81,1.08,4787.0,76.3,73.06,1.22
7,1046.0,655.0,62.62,10.0,1.0,5442.0,74.34,83.06,1.19
8,1046.0,559.0,53.44,8.53,0.85,6001.0,71.73,91.59,1.14
9,1046.0,353.0,33.75,5.39,0.54,6354.0,67.51,96.98,1.08
10,1046.0,198.0,18.93,3.02,0.3,6552.0,62.65,100.0,1.0

0,1
Database,demo_user
Tablename,VAL_ADS2
IndependentVariables,12
DependentVariable,cc_acct_ind
ResponseValue,1
Stepwise,none
MemorySize,0
Sample,false
Constant,IncludeConstant

Column Name,Value
state_code,AZ

Column Name,Mean,Standard Deviation
tot_age,49.73214285714285,14.892439518005144
tot_income,30712.77380952381,34430.47198848196
tot_cust_years,5.148809523809524,3.268373790626205
tot_children,1.488095238095238,1.2265135407036871
single_ind,0.2916666666666667,0.4552075684215484
married_ind,0.5,0.5007457125694801
separated_ind,0.125,0.3312121563851686
female_ind,0.6666666666666666,0.4721075853439589
ck_acct_ind,0.7916666666666666,0.4067221232882952
sv_acct_ind,0.7083333333333334,0.4552075684215484

-,Estimate Response,Estimate Non-Response,Actual Total
Actual Response,3535.33/33.81%,3016.67/28.85%,6552.00/62.65%
Actual Non-Response,1332.35/12.74%,2573.65/24.61%,3906.00/37.35%
Actual Total,4867.68/46.54503729202525%,5147.3/53.45496270797475%,10458.00/100.0%

Threshold Probability,"Actual Response, Estimate Response","Actual Response, Estimate Non-Response","Actual Non-Response, Estimate Response","Actual Non-Response, Estimate Non-Response"
0.0,6552,0,3906,0
0.05,4440,2112,1811,2095
0.1,4224,2328,1674,2232
0.15,4085,2467,1602,2304
0.2,3985,2567,1545,2361
0.25,3883,2669,1492,2414
0.3,3805,2747,1443,2463
0.35,3721,2831,1405,2501
0.4,3658,2894,1370,2536
0.45,3585,2967,1335,2571

Decile,Count,Response,Response (%),Captured Response (%),Lift,Cumulative Response,Cumulative Response (%),Cumulative Captured Response (%),Cumulative Lift
1,1045.0,814.0,77.89,12.42,1.24,814.0,77.89,12.42,1.24
2,1046.0,841.0,80.4,12.84,1.28,1655.0,79.15,25.26,1.26
3,1046.0,709.0,67.78,10.82,1.08,2364.0,75.36,36.08,1.2
4,1046.0,699.0,66.83,10.67,1.07,3063.0,73.22,46.75,1.17
5,1046.0,732.0,69.98,11.17,1.12,3795.0,72.58,57.92,1.16
6,1045.0,661.0,63.25,10.09,1.01,4456.0,71.02,68.01,1.13
7,1046.0,697.0,66.63,10.64,1.06,5153.0,70.4,78.65,1.12
8,1046.0,513.0,49.04,7.83,0.78,5666.0,67.73,86.48,1.08
9,1046.0,569.0,54.4,8.68,0.87,6235.0,66.25,95.16,1.06
10,1046.0,317.0,30.31,4.84,0.48,6552.0,62.65,100.0,1.0


---

## Logistic Regression Scoring

### Purpose

After building a predictive model using the Logistic Regression function, the model can be passed to a Logistic Regression Scoring function for creation of a score table containing predicted values of the dependent variable.  This is done by reading the outputdatabase and outputtablename created by the Logistic Regression function, referring to them here as modeldatabase and modeltablename respectively.

In addition to a score table, this function can optionally produce the following results.

- Success Table
- Multi-Threshold Success Table
- Lift Table
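
As a rough illustration of what the probability and estimate columns in a score table represent, the sketch below applies a logistic model to one row. The coefficient values, the helper names, and the 0.5 cut-off are assumptions made for this example; they are not taken from the model table or from the td_analyze implementation.

```python
import math

def logistic_probability(intercept, coefficients, row):
    """Return the modeled probability that the dependent value equals the response value."""
    z = intercept + sum(coefficients[name] * row[name] for name in coefficients)
    return 1.0 / (1.0 + math.exp(-z))

def estimate_from_probability(probability, threshold=0.5):
    """Map a probability to a 0/1 estimate (the 0.5 cut-off is an assumption)."""
    return 1 if probability > threshold else 0

# Hypothetical coefficients, for illustration only.
intercept = -1.0
coefficients = {"tot_age": 0.02, "ck_acct_ind": 0.8}
row = {"tot_age": 40, "ck_acct_ind": 1}

p = logistic_probability(intercept, coefficients, row)
print(p, estimate_from_probability(p))
```
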

### Required Parameters

- **database**

    The database containing the table to analyze.

- **logisticscore**

    The logisticscore parameter:
    - Is required
    - Must be the first parameter
    - Is always enclosed in single quotes


- **modeldatabase**

    The database containing the table representing the logistic regression model input to the analysis.

- **modeltablename**

    The table containing the logistic regression model that is used to score the data.  It must reside in the database indicated by the modeldatabase parameter.

- **outputdatabase**

    The database containing the output table.

- **outputtablename**

    The output table containing the predicted values of the dependent variable.  It must reside in the database indicated by the outputdatabase parameter.

- **tablename**

    The table containing the columns to analyze, representing the dependent and independent variables in the analysis.  It must reside in the database indicated by the database parameter.

    
### Optional Parameters

- **estimate**

    The name of a column in the score output table containing the estimated value of the dependent variable (column).  Note that either estimate or probability must be requested.  Also note that if the estimate column is not unique in the score output table, ‘_tm_’ is automatically placed in front of the name.

- **gensqlonly**

    When true, the SQL for the requested function is returned as a result set but not run. When not specified or set to false, the SQL is run but not returned.

- **index**

    By default, the primary index columns of the score output table are the primary index columns of the input table. This parameter allows the user to specify one or more columns for the primary index of the score output table. Regardless of whether the user uses the default setting or specifies different columns, the index columns are included both in the Primary Index clause and the select list. In addition, the index columns should form a unique key for the score output table, or there will be more than one score for a given observation.

- **lifttable**

    A table of the information required to build a lift chart.  It splits the computed probability values into deciles, with the usual counts and percentages, to show what happens as more and more rows of ordered probabilities are accumulated.  It is delivered in the function’s XML output string.

- **overwrite**

    When overwrite is set to true or not set, the output table is dropped before creating a new one.

- **probability**

    The name of a column in the score output table containing the probability that the dependent value is equal to the response value.  Note that either estimate or probability must be requested.  Also note that if the probability column is not unique in the score output table, ‘_tm_’ is automatically placed in front of the name.

- **retain**

    One or more columns from the input table can optionally be specified here to be passed along to the score output table.

- **samplescoresize**

    When a scoring function produces a score table, the user has the option to view a sample of the rows using the "samplescoresize=n" parameter, where n is an integer number of rows to view in a result set.  Cases where a sample is not returned include when you are only generating SQL and when you are only evaluating (i.e. not scoring).  By default, a sample of output score rows is not returned.

- **scoringmethod**

    Three scoring methods are available as outlined below. By default, the model is scored but not evaluated, as requested in this manner:  scoringmethod=score.
    - score
    - evaluate
    - scoreandevaluate


- **successtable**

    A table delivered in the function’s XML output string, displaying counts of predicted versus actual values of the dependent variable of the model.  The default is to not produce a success table.  (This report is like the Decision Tree Confusion Matrix, but the Success Table only includes two values of the dependent variable, response versus non-response.)

- **thresholdbegin**

    The beginning threshold value utilized in the Multi-Threshold Success Table.

- **thresholdend**

    The ending threshold value utilized in the Multi-Threshold Success Table.

- **thresholdincrement**

    The difference in threshold values between adjacent rows in the Multi-Threshold Success Table.

- **thresholdtable**

    When this parameter is set to true, the Multi-Threshold Success Table is produced and included in the XML output string in the result table.  The default is thresholdtable=false.  This report can be thought of as a table where each row is a Prediction Success Table, and each row has a different threshold value as generated by the thresholdbegin, thresholdend and thresholdincrement parameters.  What is meant by a threshold here is the value above which the predicted probability indicates a response. 
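
The way thresholdbegin, thresholdend and thresholdincrement generate the rows of a Multi-Threshold Success Table can be sketched as follows. This is only an illustration in Python with made-up data; the function name is hypothetical, and treating a probability strictly above the threshold as a response is an assumption about boundary handling.

```python
def multi_threshold_rows(actuals, probabilities, begin=0.0, end=1.0, increment=0.05):
    """Return one (threshold, TP, FN, FP, TN) tuple per threshold step."""
    rows = []
    steps = int(round((end - begin) / increment))
    for i in range(steps + 1):
        threshold = round(begin + i * increment, 10)
        tp = fn = fp = tn = 0
        for actual, p in zip(actuals, probabilities):
            predicted = 1 if p > threshold else 0  # assumption: strictly above
            if actual == 1 and predicted == 1:
                tp += 1      # Actual Response, Estimate Response
            elif actual == 1:
                fn += 1      # Actual Response, Estimate Non-Response
            elif predicted == 1:
                fp += 1      # Actual Non-Response, Estimate Response
            else:
                tn += 1      # Actual Non-Response, Estimate Non-Response
        rows.append((threshold, tp, fn, fp, tn))
    return rows

# Made-up actual values and probabilities, for illustration only.
actuals = [0, 1, 1, 0, 1]
probabilities = [0.27, 0.32, 0.84, 0.44, 0.71]
table = multi_threshold_rows(actuals, probabilities, begin=0.0, end=0.5, increment=0.25)
for row in table:
    print(row)
```

Each tuple corresponds to one row of the report, with the four counts in the same order as the report's column headings.
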

---

1.  First, let's score the single Logistic Regression model created above.  Let's also create Lift and Success Tables, populated within a table with _score_txt appended to the model table name.

In [61]:
call ${VALDB}.td_analyze('logisticscore',
                         'database=${QLID};
                          tablename=VAL_ADS;
                          modeldatabase=${QLID};
                          modeltablename=LogisticOut1;
                          outputdatabase=${QLID};
                          outputtablename=LogisticScore1;
                          estimate=Estimate;
                          probability=Probability;
                          retain=cc_acct_ind;
                          samplescoresize=25;
                          lifttable=true;
                          successtable=true;  
                          scoringmethod=scoreandevaluate');

Success: 0 rows affected

WARNING: [Teradata Database] [Warning 3212] The stored procedure returned one or more result sets.

Unnamed: 0,cust_id,cc_acct_ind,Probability,Estimate
1,31349437,0,0.2676586511497557,0
2,20449680,1,0.3210610863825694,0
3,23168875,1,0.8390932129597739,1
4,31343664,0,0.436780737667565,0
5,17715204,1,0.7052088397906262,1
6,25895784,1,0.7142502996597978,1
7,29991786,1,0.273400574671594,0
8,29995834,1,0.8850289371223888,1
9,23163928,1,0.7954776606858144,1
10,31356429,1,0.820597876340539,1


Note - To view the HTML report, double-click the contents of the 'html' column returned by calling 'report'; alternatively, right-click the contents of the 'html' column and select 'Show Cell as Text...', or copy the contents of the cell and create an HTML report.

In [62]:
call ${VALDB}.td_analyze ('report',
                          'database=${QLID};
                           tablename=LogisticScore1;
                           analysistype=logisticscore');

Success: 0 rows affected

WARNING: [Teradata Database] [Warning 3212] The stored procedure returned one or more result sets.

-,Estimate Response,Estimate Non-Response,Actual Total
Actual Response,4593.80/43.93%,1958.20/18.72%,6552.00/62.65%
Actual Non-Response,1958.20/18.72%,1947.80/18.62%,3906.00/37.35%
Actual Total,6552/62.65060240963856%,3895.6/37.34939759036144%,10458.00/100.0%

Threshold Probability,"Actual Response, Estimate Response","Actual Response, Estimate Non-Response","Actual Non-Response, Estimate Response","Actual Non-Response, Estimate Non-Response"
0.0,6552,0,3906,0
0.05,6552,0,3887,19
0.1,6550,2,3863,43
0.15,6548,4,3819,87
0.2,6526,26,3650,256
0.25,6458,94,3363,543
0.3,6355,197,3026,880
0.35,6194,358,2680,1226
0.4,6018,534,2290,1616
0.45,5857,695,2043,1863
0.5,5630,922,1885,2021
0.55,5331,1221,1707,2199
0.6,4959,1593,1425,2481
0.65,4467,2085,1180,2726
0.7,3929,2623,981,2925
0.75,3196,3356,702,3204
0.8,2279,4273,424,3482
0.85,1250,5302,209,3697
0.9,429,6123,78,3828
0.95,71,6481,7,3899

Decile,Count,Response,Response (%),Captured Response (%),Lift,Cumulative Response,Cumulative Response (%),Cumulative Captured Response (%),Cumulative Lift
1,1045.0,903.0,86.41,13.78,1.38,903.0,86.41,13.78,1.38
2,1046.0,876.0,83.75,13.37,1.34,1779.0,85.08,27.15,1.36
3,1046.0,858.0,82.03,13.1,1.31,2637.0,84.06,40.25,1.34
4,1046.0,774.0,74.0,11.81,1.18,3411.0,81.54,52.06,1.3
5,1046.0,755.0,72.18,11.52,1.15,4166.0,79.67,63.58,1.27
6,1045.0,727.0,69.57,11.1,1.11,4893.0,77.99,74.68,1.24
7,1046.0,620.0,59.27,9.46,0.95,5513.0,75.31,84.14,1.2
8,1046.0,521.0,49.81,7.95,0.8,6034.0,72.13,92.09,1.15
9,1046.0,327.0,31.26,4.99,0.5,6361.0,67.58,97.08,1.08
10,1046.0,191.0,18.26,2.92,0.29,6552.0,62.65,100.0,1.0
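
The columns of the lift table above follow from the per-decile counts alone. The sketch below recomputes them in Python from the Count and Response columns; the function and key names are chosen for this example and are not part of the td_analyze output.

```python
def lift_table(deciles):
    """deciles: list of (count, response), ordered from highest to lowest probability."""
    total_count = sum(count for count, _ in deciles)
    total_response = sum(response for _, response in deciles)
    base_rate = total_response / total_count  # overall response rate
    rows = []
    cum_count = cum_response = 0
    for decile, (count, response) in enumerate(deciles, start=1):
        cum_count += count
        cum_response += response
        rows.append({
            "decile": decile,
            "response_pct": 100 * response / count,
            "captured_pct": 100 * response / total_response,
            "lift": (response / count) / base_rate,
            "cum_response_pct": 100 * cum_response / cum_count,
            "cum_captured_pct": 100 * cum_response / total_response,
            "cum_lift": (cum_response / cum_count) / base_rate,
        })
    return rows

# Count and Response values copied from the lift table above.
deciles = [(1045, 903), (1046, 876), (1046, 858), (1046, 774), (1046, 755),
           (1045, 727), (1046, 620), (1046, 521), (1046, 327), (1046, 191)]
rows = lift_table(deciles)
for r in rows:
    print(r["decile"], round(r["response_pct"], 2), round(r["lift"], 2), round(r["cum_lift"], 2))
```

Lift is the decile's response rate divided by the overall response rate, so a lift above 1 means the decile contains proportionally more responders than the table as a whole.
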


2.  Next, let's score the multiple Logistic Regression models created above, one for each state.  In this case, we cannot evaluate the models by creating Lift and Success Tables.

---

In [63]:
call ${VALDB}.td_analyze('logisticscore',
                         'database=${QLID};
                          tablename=VAL_ADS2;
                          modeldatabase=${QLID};
                          modeltablename=LogisticOut2;
                          outputdatabase=${QLID};
                          outputtablename=LogisticScore2;
                          estimate=Estimate;
                          probability=Probability;
                          retain=cc_acct_ind;
                          samplescoresize=25;
                          scoringmethod=score');

Success: 0 rows affected

WARNING: [Teradata Database] [Warning 3212] The stored procedure returned one or more result sets.

Unnamed: 0,cust_id,state_code,cc_acct_ind,Probability,Estimate
1,25895043,NY,0,0.5725676399304991,1
2,13630660,CA,0,0.8646288877341187,1
3,25901731,CA,1,0.6358147302590171,1
4,24537042,IL,1,0.901044481405204,1
5,29988090,NY,0,0.7553782487129894,1
6,24532830,OTHER,0,0.0979158794013908,0
7,19081006,TX,1,0.9411363559631272,1
8,24526674,NY,1,0.5764282388186137,1
9,17722523,TX,1,0.7165840983854547,1
10,20440515,CA,1,0.799806427380621,1


Note - No reports for Logistic Scoring with GROUPBY option...

---

## Gain Ratio Decision Trees

### Purpose

Currently, the Teradata Warehouse Miner External Stored Procedure provides decision trees for classification models. They are built largely on the techniques described in [Quinlan], so splits are chosen using the information gain ratio. Pruning is also provided, again using the gain ratio technique. The concept behind information gain is simple: the more you know about a topic, the less new information you are apt to get about it. More precisely, if you know an event is very probable, it is no surprise when it happens; that is, learning that it happened gives you little information.  Taking this a bit further, the amount of information gained from an event is inversely related to the probability of that event happening. Entropy is the expected information over all outcomes of a variable, so it is highest when the outcomes are equally likely and zero when one outcome is certain; a split is useful to the extent that it reduces the entropy of the dependent variable. A decision tree scoring function is provided to score and/or evaluate a decision tree model.
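
To make the gain ratio idea concrete, here is a minimal Python sketch that follows the textbook definition (information gain divided by the split information). It is an illustration under that assumption and is not necessarily the exact computation td_analyze performs; the function names are chosen for this example.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def gain_ratio(labels, branches):
    """labels: class labels at a node; branches: lists of labels that partition them."""
    total = len(labels)
    # Weighted entropy remaining after the split.
    remainder = sum(len(b) / total * entropy(b) for b in branches if b)
    gain = entropy(labels) - remainder
    # Entropy of the split itself, which penalizes many small branches.
    split_info = -sum((len(b) / total) * math.log2(len(b) / total) for b in branches if b)
    return gain / split_info if split_info > 0 else 0.0

labels = [1, 1, 0, 0]
perfect_split = gain_ratio(labels, [[1, 1], [0, 0]])
useless_split = gain_ratio(labels, [[1, 0], [1, 0]])
print(perfect_split, useless_split)
```

A split that separates the classes completely removes all of the entropy, while a split whose branches have the same class mix as the parent removes none.
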

### Required Parameters

- **columns**

    The independent input columns used in decision tree building. These columns must reside in the table named with the tablename parameter, residing in the database named with the database parameter.  For example: columns=column1,column2,column3.  When columns=all is entered, all columns in the input table are analyzed.  Other options include allnumeric.

- **database**

    The database containing the input table.

- **decisiontree**

    Identifies the type of function being performed. The decisiontree parameter:
    - Is required
    - Must be the first parameter
    - Is always enclosed in single quotes


- **dependent**

    The dependent parameter is the name of a column whose values are being predicted. The dependent column is selected from the available columns that reside in the table specified by the database and tablename parameters.

- **tablename**

    The input table to build a predictive model from.

### Optional Parameters

- **algorithm**

    The algorithm the decision tree uses during building. Currently this option only allows gainratio.

- **binning**

    Option to automatically Bincode the continuous independent variables. Continuous data is separated into one hundred bins when this option is selected. If the variable has fewer than one hundred distinct values, this option is ignored. Default is false.

- **max_depth**

    Specifies the maximum number of levels the tree can grow. The default is 100.

- **min_records**

    Specifies how far the decision tree can split. Unless a node is pure (meaning it has only observations with the same dependent value) it splits if each branch that can come off this node contains at least this many observations. The default is a minimum of two cases for each branch.

- **operatordatabase**

    The database where the table operators called by td_analyze reside. If not specified, the database software searches the standard search path for table operators, including the current user database.  For example: operatordatabase=val

- **outputdatabase**

    The database containing the resulting output table when outputstyle=table or view.

- **outputtablename**

    The name of the output table representing the decision tree model.

- **overwrite**

    When overwrite is set to true (default), the output tables are dropped before creating new ones.

- **pruning**

    Determines the style of pruning to use after the tree is fully built. The default option is gainratio. The only other option at this time is none which does no pruning of the tree.  
    
- **columnstoexclude**

    If a column specifier such as all is used in the columns parameter, the columnstoexclude parameter may be used to exclude specific columns from the analysis.  For convenience, when the columnstoexclude parameter is used, dependent variable and group by columns, if any, are automatically excluded as input columns and do not need to be included as columnstoexclude.

---

1) Using the VAL_ADS analytic dataset, build a decision tree model to predict propensity of a customer to open a credit card based upon non-credit card related variables.

In [64]:
call ${VALDB}.td_analyze('decisiontree',
                         'database=${QLID};
                          tablename=VAL_ADS;
                          columns=tot_age,tot_income,tot_cust_years,tot_children,single_ind,married_ind,separated_ind,female_ind,ck_acct_ind,sv_acct_ind,sv_avg_bal,ck_avg_bal,ca_resident_ind,ny_resident_ind,tx_resident_ind,il_resident_ind,az_resident_ind,oh_resident_ind;
                          dependent=cc_acct_ind;
                          min_records=2;
                          max_depth=5;
                          binning=false;
                          algorithm=gainratio;
                          pruning=gainratio;
                          outputdatabase=${QLID};
                          outputtablename=DecisionTree1;
                          operatordatabase=${VALDB};');

Success: 0 rows affected

Note - To view the HTML report, double-click the contents of the 'html' column returned by calling 'report'; alternatively, right-click the contents of the 'html' column and select 'Show Cell as Text...', or copy the contents of the cell and create an HTML report.

In [65]:
call ${VALDB}.td_analyze ('report',
                          'database=${QLID};
                           tablename=DecisionTree1;
                           analysistype=decisiontree');

Success: 0 rows affected

WARNING: [Teradata Database] [Warning 3212] The stored procedure returned one or more result sets.

0,1
Total observations,10458
Nodes before pruning,25
Nodes after pruning,9
Model accuracy,67.45075540256263%

Dependent Variable
cc_acct_ind

Independent Variables
tot_age
tot_income
tot_cust_years
tot_children
single_ind
married_ind
separated_ind
female_ind
ck_acct_ind
sv_acct_ind
sv_avg_bal
ck_avg_bal
ca_resident_ind
ny_resident_ind
tx_resident_ind
il_resident_ind
az_resident_ind
oh_resident_ind

Text Tree

- tot_age <= 16.0
    - ck_acct_ind <= 0.0 ---> 0
    - ck_acct_ind > 0.0
        - ck_avg_bal <= 17747.767398648648 ---> 1
        - ck_avg_bal > 17747.767398648648
            - sv_avg_bal <= 365.06358 ---> 0
            - sv_avg_bal > 365.06358 ---> 1
- tot_age > 16.0 ---> 1
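
The Text Tree in the report above can be read as a set of nested if/else rules. The Python sketch below transcribes those rules by hand to show how a single row would be classified; the function name is hypothetical, and actual scoring is performed in the database by the decisiontreescore function rather than by code like this.

```python
def predict_cc_acct_ind(row):
    """Apply the pruned tree's split rules to one row (a dict of column values)."""
    if row["tot_age"] <= 16.0:
        if row["ck_acct_ind"] <= 0.0:
            return 0
        if row["ck_avg_bal"] <= 17747.767398648648:
            return 1
        if row["sv_avg_bal"] <= 365.06358:
            return 0
        return 1
    # tot_age > 16.0
    return 1

# Made-up rows, for illustration only.
adult = {"tot_age": 40, "ck_acct_ind": 1, "ck_avg_bal": 500.0, "sv_avg_bal": 0.0}
minor_no_checking = {"tot_age": 15, "ck_acct_ind": 0, "ck_avg_bal": 0.0, "sv_avg_bal": 0.0}
print(predict_cc_acct_ind(adult), predict_cc_acct_ind(minor_no_checking))
```
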


---

## Decision Tree Scoring

### Purpose

In order to deploy the gain ratio decision tree model created above, a companion decision tree scoring function is provided to score and/or evaluate a decision tree model.  In addition to a score table, this function can optionally produce the following results.

- Confidence Factors
- Targeted Binary Confidence
- Confusion Matrix
- Profile Tables


### Required Parameters

- **database**

    The database containing the table to analyze.

- **decisiontreescore**

    The decisiontreescore parameter:
    - Is required
    - Must be the first parameter
    - Is always enclosed in single quotes


- **modeldatabase**

    The database containing the table representing the decision tree model input to the analysis.

- **modeltablename**

    The table containing the decision tree model in PMML format that is used to score the data.  It must reside in the database indicated by the modeldatabase parameter.

- **outputdatabase**

    The database containing the output table.

- **outputtablename**

    The output table containing the predicted values of the dependent variable.  It must reside in the database indicated by the outputdatabase parameter.

- **tablename**

    The table containing the columns to analyze, representing the dependent and independent variables in the analysis.  It must reside in the database indicated by the database parameter.

### Optional Parameters

- **confusionmatrix**

    A table delivered in the function’s XML output string, displaying counts of predicted versus actual values of the dependent variable of the decision tree model.   It also contains counts of correct and incorrect predictions.  The default is to not produce a confusion matrix.  (This report is similar to the Logistic Regression Success Table, but that table only includes two values of the dependent variable, or response versus non-response.)

- **gensqlonly**

    When true, the SQL for the requested function is returned as a result set but not run. When not specified or set to false, the SQL is run but not returned.

- **includeconfidence**

    If selected, the output table will contain a column indicating how likely it is, for a particular leaf node on the tree, that the prediction is correct.  (This option may not be combined with the targetedvalue parameter.)

- **index**

    By default, the primary index columns of the score output table are the primary index columns of the input table. This parameter allows the user to specify one or more columns for the primary index of the score output table. Regardless of whether the user uses the default setting or specifies different columns, the index columns are included both in the Primary Index clause and the select list. In addition, the index columns should form a unique key for the score output table, or there will be more than one score for a given observation.

- **overwrite**

    When overwrite is set to true or not set, the output table is dropped before creating a new one.

- **predicted**

    If the ‘scoringmethod’ parameter is set to ‘score’ or ‘scoreandevaluate’, the name of the predicted value column is entered here. If not entered here, the name of the dependent column in the input table is used.

- **profiletables**

    When selected, a set of two tables is created that can capture decision rules for a given customer or prediction.

- **retain**

    One or more columns from the input table can optionally be specified here to be passed along to the score output table.

- **samplescoresize**

    When a scoring function produces a score table, the user has the option to view a sample of the rows using the "samplescoresize=n" parameter, where n is an integer number of rows to view in a result set.  Cases where a sample is not returned include when you are only generating SQL and when you are only evaluating (i.e. not scoring).  By default, a sample of output score rows is not returned.

- **scoringmethod**

    Three scoring methods are available as outlined below. By default, the model is scored but not evaluated, as requested in this manner:  scoringmethod=score.
    - score
    - evaluate
    - scoreandevaluate


- **targetedvalue**

    If selected, the output table will contain a column indicating how likely it is, for a particular leaf node and targeted value of a predicted result with only two values, that the prediction is correct. (This option may not be combined with the includeconfidence parameter.)

---

1) Using the same VAL_ADS analytic dataset for demonstration purposes, score the decision tree model built above that predicts the propensity of a customer to open a credit card based upon non-credit card related variables.

In [66]:
call ${VALDB}.td_analyze('decisiontreescore',
                         'database=${QLID};
                          tablename=VAL_ADS;
                          modeldatabase=${QLID};
                          modeltablename=DecisionTree1;
                          outputdatabase=${QLID};
                          predicted=Predicted;
                          retain=cc_acct_ind;
                          outputtablename=DecisionTreeScore1;
                          scoringmethod=scoreandevaluate;
                          includeconfidence=true;');                          

Success: 0 rows affected

In [67]:
SELECT * FROM ${QLID}.DecisionTreeScore1 SAMPLE 25;

Unnamed: 0,cust_id,cc_acct_ind,Predicted,_tm_confidence
1,20440380,0,1,0.6573262685958834
2,28633332,1,1,0.6573262685958834
3,23176236,1,1,0.6573262685958834
4,16353756,0,1,0.6573262685958834
5,20446170,1,1,0.6573262685958834
6,16361832,1,1,0.6573262685958834
7,24538086,1,1,0.6573262685958834
8,29976914,1,1,0.6573262685958834
9,27265640,1,1,0.6573262685958834
10,19074734,0,1,0.6573262685958834


Note - To view the HTML report, double-click the contents of the 'html' column returned by calling 'report'; alternatively, right-click the contents of the 'html' column and select 'Show Cell as Text...', or copy the contents of the cell and create an HTML report.

In [68]:
call ${VALDB}.td_analyze ('report',
                          'database=${QLID};
                           tablename=DecisionTreeScore1;
                           analysistype=decisiontreescore');

Success: 0 rows affected

WARNING: [Teradata Database] [Warning 3212] The stored procedure returned one or more result sets.

-,Actual 0,Actual 1,Correct,Incorrect
Predicted 0,515.00/4.92%,13.00/0.12%,515.00/4.92%,13.00/0.12%
Predicted 1,3391.00/32.42%,6539.00/62.53%,6539.00/62.53%,3391.00/32.42%
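
The overall accuracy implied by the confusion matrix above can be checked with a few lines of Python. The counts are copied from the report; the dictionary layout is just a convenience for this example.

```python
# Counts copied from the Confusion Matrix above, keyed by (predicted, actual).
confusion = {
    (0, 0): 515,
    (0, 1): 13,
    (1, 0): 3391,
    (1, 1): 6539,
}

total = sum(confusion.values())
correct = sum(n for (predicted, actual), n in confusion.items() if predicted == actual)
accuracy = correct / total
print(total, correct, round(100 * accuracy, 2))
```

The result, about 67.45%, agrees with the Model accuracy figure in the decision tree report, which is expected because the model is scored here on the same VAL_ADS table it was built from.
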


---

## K-Means Clustering

### Purpose

The task of modeling multidimensional data sets encompasses a variety of statistical techniques, including ‘cluster analysis’. Cluster analysis is a statistical process for identifying homogeneous groups of data objects.  K-Means clustering is one of the simplest and most popular unsupervised machine learning algorithms.  Unsupervised algorithms make inferences from datasets using only the input data, without known (labelled) outcomes.  The objective of K-means is simple: group similar data points together and discover underlying patterns. To achieve this objective, K-means looks for a fixed number (k) of clusters in a dataset.  A cluster is a collection of data points aggregated together because of certain similarities.  The algorithm requires as input a target number k, which is the number of centroids to identify in the dataset, where a centroid is the location representing the center of a cluster.  Every data point is allocated to exactly one cluster so as to reduce the within-cluster sum of squares.  In other words, the K-means algorithm identifies k centroids and then allocates every data point to the nearest one, keeping the distances between the points and their centroid as small as possible.
The ‘means’ in K-means refers to averaging of the data; that is, finding the centroid.
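
To make the loop described above concrete, here is a minimal pure-Python sketch of textbook K-means. It is not the VAL implementation: the function name `kmeans`, the random initial assignment, the empty-cluster handling, and the convergence measure (the sum of absolute centroid coordinate changes) are all assumptions chosen for illustration.

```python
import random

def kmeans(points, k, iterations=50, threshold=0.001, seed=0):
    """Textbook K-means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    dims = len(points[0])
    # Start from a random assignment of points to clusters.
    labels = [rng.randrange(k) for _ in points]
    centroids = None
    for _ in range(iterations):
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if not members:
                # Re-seed an empty cluster with a random point (an assumption).
                members = [rng.choice(points)]
            new_centroids.append(
                tuple(sum(p[d] for p in members) / len(members) for d in range(dims))
            )
        # Assignment step: each point moves to its nearest centroid.
        labels = [
            min(range(k),
                key=lambda c: sum((p[d] - new_centroids[c][d]) ** 2 for d in range(dims)))
            for p in points
        ]
        # Convergence test: total centroid movement since the previous pass.
        if centroids is not None:
            shift = sum(abs(a - b)
                        for old, new in zip(centroids, new_centroids)
                        for a, b in zip(old, new))
            if shift < threshold:
                centroids = new_centroids
                break
        centroids = new_centroids
    return centroids, labels

# Two well-separated groups of made-up points.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, labels = kmeans(points, k=2)
print(centroids)
print(labels)
```

The `iterations` and `threshold` arguments play the same role as the optional parameters of the same names described below: one caps the number of passes, the other stops the loop once the centroids barely move.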

The first parameter for clustering is the Kmeans function name, followed by clustering parameters.  Fast K-Means Clustering returns two data sets that are viewed as result sets.  The first result set is a progress report with two columns, a timestamp and a progress message. Use the result to see how the algorithm converged and what made it stop processing.  The second result set contains cluster means and variances. Specifically, the rows associated with positive cluster IDs contain the average values of each of the clustered columns along with the count for each cluster ID. The rows associated with negative cluster IDs contain the variance of each of the clustered columns for each cluster ID.

### Required Parameters

- **columns**

    The input columns used in clustering. The columns must reside in the table named with the tablename parameter, residing in the database named with the database parameter. For example, columns=c1,c2,c3. When columns=all is entered, all columns in the input table are analyzed.  Other options include allnumeric.

- **database**

    The database containing the input table.

- **Kmeans**

    Identifies the type of function being performed.

- **kvalue**

    The number of clusters to be contained in the cluster model.

- **outputdatabase**

    The database to contain the resulting output table that represents a cluster model.

- **outputtablename**

    The name of the output table representing the cluster model.

- **tablename**

    The name of the table containing the data to cluster.
    
### Optional Parameters

- **continuation**

    If true, clustering begins with values determined by pre-existing result tables rather than random values. The default is false.

- **iterations**

    The maximum number of iterations to perform during modeling. The default is 50.

- **operatordatabase**

    The database where the table operators called by td_analyze reside. If not specified, the database software searches the standard search path for table operators, including the current user database.  For example: operatordatabase=val

- **overwrite**

    When overwrite is set to true (default), the output tables are dropped before creating new ones.

- **threshold**

    A decimal value used to determine if the algorithm has converged based on how much the cluster centroids change from one iteration to the next. The default is 0.001.
    
- **columnstoexclude**

    If a column specifier such as all is used in the columns parameter, the columnstoexclude parameter may be used to exclude specific columns from the analysis.  For convenience, when the columnstoexclude parameter is used, dependent variable and group by columns, if any, are automatically excluded as input columns and do not need to be included as columnstoexclude.

---

1.  For this example, we use the same ADS but first apply a Z-Score transformation to the columns we are going to cluster.  A Z-Score expresses each data point as the number of standard deviations it lies from the column mean.  The transformed data set is used below to perform K-Means clustering.

In [69]:
CALL ${VALDB}.td_analyze('vartran',
                         'database=${QLID};
                          tablename=VAL_ADS;
                          outputstyle=view;
                          outputdatabase=${QLID};
                          outputtablename=ClusterInput;
                          keycolumns=cust_id;
                          zscore=columns(tot_income,tot_age,tot_cust_years);
                          retain=columns(cust_id);');

Success: 0 rows affected

In [70]:
SELECT * FROM ${QLID}.ClusterInput SAMPLE 25;

Unnamed: 0,cust_id,tot_income,tot_age,tot_cust_years
1,20444790,0.8467067708673567,-0.4846203098633057,-1.2659932756358734
2,23172292,0.0878517117446942,0.190165327243985,-0.5934464059518358
3,28620753,-0.7855078747714153,-1.4708454717893462,0.7516473334162392
4,29996450,0.6407825477873279,-0.3808071349237225,0.4153738985742204
5,29994536,3.6043513762197823,0.8649509643512758,1.7604676379422957
6,19086424,-0.1218686023799334,-0.7960598346820553,0.0791004637322017
7,17718948,0.007577318464347,0.1382587397741934,-0.5934464059518358
8,29979884,1.7547432441246642,-0.4846203098633057,-0.2571729711098171
9,31358407,-0.2219281281315449,1.2802036641096086,1.087920768258258
10,28618485,-0.5463985529656503,-0.0693676101049729,-0.9297198407938546
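
The Z-Score transformation applied by the vartran call can be illustrated outside the database. This Python sketch uses made-up income values rather than rows from VAL_ADS, and it assumes the population standard deviation; whether VAL uses the population or the sample form is not stated here.

```python
from statistics import mean, pstdev

def zscore(values):
    """Return (x - mean) / standard deviation for each x in values."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (an assumption)
    return [(x - mu) / sigma for x in values]

incomes = [20000, 30000, 40000, 50000, 60000]  # made-up values, not from VAL_ADS
scaled = zscore(incomes)
print([round(z, 3) for z in scaled])
```

After the transformation every column has mean 0 and standard deviation 1, so columns measured on very different scales (income, age, years as a customer) contribute comparably to the distances used in clustering.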


2.  Now perform K-Means Clustering on the transformed dataset (a view).  The result sets returned are the status messages for each iteration, along with the cluster centroids and variances.

In [71]:
call ${VALDB}.td_analyze('Kmeans',
                         'database=${QLID};
                          tablename=ClusterInput;
                          columns=tot_income,tot_age,tot_cust_years;
                          outputdatabase=${QLID};
                          outputtablename=ClusterModel;
                          operatordatabase=${VALDB};
                          kvalue=3;
                          iterations=5;
                          threshold=0.1;');

Success: 0 rows affected

WARNING: [Teradata Database] [Warning 3212] The stored procedure returned one or more result sets.

Unnamed: 0,ctime,v
1,2022-09-16 16:36:47.790000,Starting Procedure
2,2022-09-16 16:36:48.270000,Data randomly assigned to clusters initially
3,2022-09-16 16:36:48.420000,Starting K-means loop 1
4,2022-09-16 16:36:49.020000,Completed K-means loop 1
5,2022-09-16 16:36:49.490000,"Comparison complete, cluster aggregate difference is 7.80243333417115E-001"
6,2022-09-16 16:36:50.490000,Starting K-means loop 2
7,2022-09-16 16:36:51.120000,Completed K-means loop 2
8,2022-09-16 16:36:51.490000,"Comparison complete, cluster aggregate difference is 1.24589485192181E-001"
9,2022-09-16 16:36:52.780000,Starting K-means loop 3
10,2022-09-16 16:36:53.470000,Completed K-means loop 3
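
The progress messages above can be checked against the threshold passed in the call. The Python sketch below is only an illustration: it copies four messages from the result set, extracts the reported differences, and compares them with threshold=0.1, on the assumption that the loop continues while the difference is still above the threshold.

```python
import re

THRESHOLD = 0.1  # the threshold value passed to td_analyze in the call above

# Progress messages copied from the result set above.
messages = [
    "Completed K-means loop 1",
    "Comparison complete, cluster aggregate difference is 7.80243333417115E-001",
    "Completed K-means loop 2",
    "Comparison complete, cluster aggregate difference is 1.24589485192181E-001",
]

pattern = re.compile(r"cluster aggregate difference is ([0-9.]+E[+-]?[0-9]+)")
differences = [float(m.group(1)) for msg in messages if (m := pattern.search(msg))]

for loop, diff in enumerate(differences, start=1):
    status = "below threshold" if diff < THRESHOLD else "above threshold, keep iterating"
    print(f"loop {loop}: difference {diff:.4f} is {status}")
```

Both differences are above 0.1, which is consistent with the log showing a third loop being started.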


Unnamed: 0,clusterid,cnt,tot_income,tot_age,tot_cust_years
1,-3,1300,5.83574658111737,0.2961678128099027,1.0033755405838327
2,-2,5172,0.3591560874488461,0.9649001177332373,0.8718166196622293
3,-1,3986,0.2837289678222069,1.269581869164626,1.164009523877999
4,1,3986,-0.1939085628754265,0.7873239641472004,0.6441102404694997
5,2,5172,-0.3711200760489611,-0.6679860813140749,-0.5174968997663075
6,3,1300,2.1075360337773144,0.2288822032769648,0.0701612168325702
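
Because means and variances share one result set, it can help to pair them up by cluster. The Python sketch below copies the six rows shown above and matches each positive cluster ID (means) with its negative counterpart (variances); the variable names are made up for illustration.

```python
# Rows copied from the result set above:
# (clusterid, cnt, tot_income, tot_age, tot_cust_years)
rows = [
    (-3, 1300, 5.83574658111737, 0.2961678128099027, 1.0033755405838327),
    (-2, 5172, 0.3591560874488461, 0.9649001177332373, 0.8718166196622293),
    (-1, 3986, 0.2837289678222069, 1.269581869164626, 1.164009523877999),
    (1, 3986, -0.1939085628754265, 0.7873239641472004, 0.6441102404694997),
    (2, 5172, -0.3711200760489611, -0.6679860813140749, -0.5174968997663075),
    (3, 1300, 2.1075360337773144, 0.2288822032769648, 0.0701612168325702),
]
columns = ("tot_income", "tot_age", "tot_cust_years")

# Positive IDs carry the means; the matching negative ID carries the variances.
means = {cid: vals for cid, _cnt, *vals in rows if cid > 0}
variances = {-cid: vals for cid, _cnt, *vals in rows if cid < 0}
counts = {cid: cnt for cid, cnt, *_ in rows if cid > 0}

for cid in sorted(means):
    print(f"cluster {cid} ({counts[cid]} rows)")
    for name, m, v in zip(columns, means[cid], variances[cid]):
        print(f"  {name}: mean={m:.3f}, variance={v:.3f}")
```

Read this way, cluster 3 is the small high-income group (mean tot_income about 2.1 standard deviations above average), and it is also the cluster with the widest spread on that column.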


---

## K-Means Cluster Scoring

### Purpose

After building a model using the Fast K-Means Clustering algorithm, new data is scored using Fast K-Means Cluster Scoring. The first parameter for Fast K-Means Cluster Scoring is the KmeansScore function name, followed by cluster scoring parameters. Fast K-Means Cluster Scoring returns one or two data sets that are viewed as result sets. One result set is a progress report with two columns, a timestamp and a progress message. The other result set is only returned if the samplescoresize parameter is set. It contains a sample of the rows in the output score table; the number of rows is determined by the value of the samplescoresize parameter.
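
Conceptually, scoring assigns each row to the cluster whose centroid is closest. The Python sketch below illustrates this with the three centroids from the clustering result set shown earlier and three rows from the ClusterInput sample. It assumes plain squared Euclidean distance, which may not be the exact measure VAL uses, so the ClusterScore table is the authoritative result.

```python
# Centroids: the positive-clusterid rows of the model result set shown earlier.
centroids = {
    1: (-0.1939085628754265, 0.7873239641472004, 0.6441102404694997),
    2: (-0.3711200760489611, -0.6679860813140749, -0.5174968997663075),
    3: (2.1075360337773144, 0.2288822032769648, 0.0701612168325702),
}

# Three rows from the ClusterInput sample:
# cust_id -> (tot_income, tot_age, tot_cust_years)
customers = {
    20444790: (0.8467067708673567, -0.4846203098633057, -1.2659932756358734),
    29994536: (3.6043513762197823, 0.8649509643512758, 1.7604676379422957),
    31358407: (-0.2219281281315449, 1.2802036641096086, 1.087920768258258),
}

def nearest_cluster(point, centroids):
    """Return the id of the centroid with the smallest squared Euclidean distance."""
    return min(
        centroids,
        key=lambda cid: sum((x - c) ** 2 for x, c in zip(point, centroids[cid])),
    )

scores = {cust_id: nearest_cluster(p, centroids) for cust_id, p in customers.items()}
print(scores)
```

Under this assumption the three customers land in clusters 2, 3 and 1 respectively.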

### Required Parameters


- **database**

    The database containing the input table to be scored.

- **modeldatabase**

    The database containing the table that represents the cluster model to score.

- **modeltablename**

    The name of the input table containing the cluster model to score.

- **index**

    The names of one or more columns in the input table to use as the primary index of the scored output table.

- **KmeansScore**

    Identifies the type of function being performed.

- **outputdatabase**

    The database containing the resulting scored output table.

- **outputtablename**

    The name of the scored output table to build.

- **tablename**

    The name of the table containing the data to score.
    
### Optional Parameters

- **clustername**

    The name of the column representing the cluster identifier. The default is clusterid.

- **fallback**

    An optional flag. When set to true, the scored output table is created with the fallback attribute (that is, it has a mirrored copy).

- **operatordatabase**

    The database where the table operators called by td_analyze reside. If not specified, the database software searches the standard search path for table operators, including the current user database. For example: operatordatabase=val

- **overwrite**

    When overwrite is set to true (default), the output tables are dropped before creating new ones.

- **retain**

    A comma-separated list naming columns to include in the scored output table unchanged from their names and values in the input table to be scored.

- **samplescoresize**

    When a scoring function produces a score table, the user has the option to view a sample of the rows using the "samplescoresize=n" parameter, where n is an integer number of rows to view in a result set.  Cases where a sample is not returned include when you are only generating SQL and when you are only evaluating (i.e. not scoring).  By default, a sample of output score rows is not returned.

- **columnstoexclude**

    If a column specifier such as all is used in the columns parameter, the columnstoexclude parameter may be used to exclude specific columns from the analysis.  For convenience, when the columnstoexclude parameter is used, dependent variable and group by columns, if any, are automatically excluded as input columns and do not need to be included as columnstoexclude.

---

1.  Cluster Scoring can now be performed using the cluster model just built.  A sample of 10 scored rows is returned as a result set, which includes the cluster assignment and the customer identifier.  Status messages are also returned in a result set.

In [72]:
call ${VALDB}.td_analyze('KmeansScore',
                         'database=${QLID};
                          tablename=ClusterInput;
                          outputdatabase=${QLID};
                          outputtablename=ClusterScore;
                          index=cust_id;
                          modeldatabase=${QLID};
                          modeltablename=ClusterModel;
                          samplescoresize=10;
                          operatordatabase=${VALDB};');

Success: 0 rows affected

WARNING: [Teradata Database] [Warning 3212] The stored procedure returned one or more result sets.

Unnamed: 0,clusterid,cust_id
1,1,27254180
2,2,16360056
3,1,20441745
4,2,31338627
5,2,31342353
6,2,24533460
7,3,19086354
8,3,21805424
9,1,23172649
10,3,17713020


Unnamed: 0,ctime,v
1,2022-09-16 16:37:29.790000,Starting Procedure
2,2022-09-16 16:37:29.830000,"About to assign clusters in ""demo_user"".""ClusterModel"" to input data."
3,2022-09-16 16:37:30.530000,"Assignment complete creating Score Table ""demo_user"".""ClusterScore""."
4,2022-09-16 16:37:30.570000,Procedure Complete


---

## Association Rules & Sequence Analysis

### Purpose

Association Rules provide various measures concerning items residing in groups.  The measures, support, confidence, lift and Z Score, help to determine the likelihood that one or more items exist in a group, given that another one or more items exist in the same group.  The classic example of this type of study is market basket analysis, in which the groups are shopping carts and the items are the products purchased in the shopping carts.  An association rule might indicate the likelihood that a given shopping cart contains oranges, given that it also contains apples.

Association rules consist of a left part and a right part.  The left part consists of one or more items that are given to reside in a group, and the right part is the consequence that one or more items also reside in the given group.  The measures are defined as follows:

- Support
    Percentage of groups containing the items on the left (left side support), on the right (right side support) or on both sides of a rule (rule support).

- Confidence
    Percentage of groups containing the left side items that also contain the right side items.

- Lift
    A measure of how much the probability is raised that the right side items occur in a group given that the left side items occur in the group.

- Z Score
    A statistical measure of how much the actual number of groups containing all the items in the rule differs from the expected number.  (Zero means expected and actual are the same.)
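
The measures above can be written out as a short Python sketch. The support, confidence and lift formulas follow the definitions given here. The Z Score formula (a binomial approximation under independence) and the group counts 457, 392, 227 and 200 are not taken from the VAL documentation; they are inferred from the A-to-B row in the output of the first Association example in this notebook, which they reproduce.

```python
from math import sqrt

def rule_measures(n_groups, n_left, n_right, n_both):
    """Support, confidence, lift and Z Score for a rule left -> right.

    n_left, n_right and n_both are the numbers of groups containing the
    left-side items, the right-side items, and all items of the rule.
    """
    lsupport = n_left / n_groups
    rsupport = n_right / n_groups
    support = n_both / n_groups
    confidence = n_both / n_left
    lift = confidence / rsupport
    # Expected share of groups containing both sides if they were independent.
    p = lsupport * rsupport
    # Z Score via a binomial approximation (an inferred formula, see above).
    zscore = (n_both - n_groups * p) / sqrt(n_groups * p * (1 - p))
    return {
        "lsupport": lsupport,
        "rsupport": rsupport,
        "support": support,
        "confidence": confidence,
        "lift": lift,
        "zscore": zscore,
    }

# Inferred counts for the rule A -> B in the twm_credit_tran example.
measures = rule_measures(n_groups=457, n_left=392, n_right=227, n_both=200)
for name, value in measures.items():
    print(f"{name}: {value:.6f}")
```

A lift above 1 and a positive Z Score both say the same thing in different units: the two sides occur together more often than independence would predict.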

A sequence analysis may be optionally requested, wherein there is a sequence of items defined by a sequencecolumn parameter, ordering the items on each side of each rule, with left side items preceding the right side items.  An option is provided called relaxedordering that may be set to true so that items on the left side and the right side may be in any order provided that all left side items precede all right side items.
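
The difference between strict and relaxed ordering can be sketched in Python. This is an illustration of the rule as described above, not VAL's generated SQL. The function name `rule_holds` is hypothetical, and the sketch assumes each item occurs at most once per group, which real transaction data does not guarantee.

```python
def rule_holds(sequence, left, right, relaxed=False):
    """Check whether an ordered group satisfies the rule left -> right.

    sequence: the group's items sorted by the sequence column (each item once).
    left, right: lists of items on the two sides of the rule.
    """
    position = {item: i for i, item in enumerate(sequence)}
    if any(item not in position for item in left + right):
        return False
    left_pos = [position[item] for item in left]
    right_pos = [position[item] for item in right]
    if relaxed:
        # Any order within a side, but every left item precedes every right item.
        return max(left_pos) < min(right_pos)
    # Strict: the items must occur in exactly the order the rule lists them.
    ordered = left_pos + right_pos
    return all(a < b for a, b in zip(ordered, ordered[1:]))

group = ["B", "A", "C"]  # made-up group, already sorted by its sequence column
print(rule_holds(group, ["A", "B"], ["C"]))                # False
print(rule_holds(group, ["A", "B"], ["C"], relaxed=True))  # True
```

In the made-up group, B occurs before A, so the 2-to-1 rule with left side A, B only holds once relaxed ordering is allowed.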

Processing is performed in generated SQL based on user input parameters.  Each SQL statement is generated and then executed as processing proceeds.  An output table is created for each requested rule combination (1-to-1, 2-to-1, etc.).  As the last processing step, data is selected from these output tables and returned as a result set for up to three output tables.  

An option, gensqlonly, is available to bypass SQL execution and simply return the generated SQL.  When computing Association Rules for very large input tables, it may be worthwhile to use this parameter to examine the SQL for performance characteristics.

### Required Parameters

- **Association**

    The Association parameter:
    - Is required
    - Must be the first parameter
    - Is always enclosed in single quotes


- **database**

    The database containing the table to analyze.

- **groupcolumn**

    The column representing groups in the association rules.

- **itemcolumn**

    The column representing items in the association rules.

- **outputdatabase**

    Specifies the name of the database to contain the analysis results table(s).

- **tablename**

    The table containing the columns to analyze.
    
### Optional Parameters

- **combinations**

    Using the combinations parameter, you may designate how many items are on the left and right side of requested association rules.  You may also request more than one combination.  For example, combinations=11,21 produces an analysis of 1-to-1 and 2-to-1 rules.  If this parameter is not specified, a 1-to-1 analysis is performed by default.

- **dropsupporttables**

    This parameter requests that all intermediate support tables be dropped at the end of processing.  By default, if this parameter is not specified, or if it is set to true, support tables are dropped.  To retain support tables, set dropsupporttables=false.

- **gensqlonly**

    When true, the SQL for the requested function is returned as a result set but not run. When not specified or set to false, the SQL is run but not returned.

- **groupcount**

    The count of the number of groups in the input data is by default calculated by the generated SQL.  Optionally, you may set the group count to a fixed value using this parameter.  This is useful when you are processing a reduced input set saved in a previous run, so that calculations can be based on the number of groups in the original input set and not the reduced set.

- **minimumconfidence**

    The minimum value that the confidence measure of an association rule must have before it is included in a result table.

- **minimumlift**

    The minimum value that the lift measure of an association rule must have before it is included in a result table.

- **minimumsupport**

    The minimum value that the support measure of an association rule must have before it is included in a result table.  When this parameter is utilized, the size of the input data is reduced, potentially impacting the use of the reducedinputtable parameter.  Use of this parameter also causes listwise deletion to be performed, skipping any input rows that have a null group, item or sequence column value.

- **minimumzscore**

    The minimum value that the Z Score measure of an association rule must have before it is included in a result table.

- **orderingprobability**

    When sequence analysis is being performed, the user may either allow the algorithm to determine ordering probabilities (the default) or set the “probability of correct ordering” to a non-zero value between 0 and 1. (Setting it to 1 effectively ignores this principle in lift and Z Score calculations.)  For more details, please refer to Association Rules in Chapter 1 in Teradata Warehouse Miner User Guide: Volume 3, Analytic Functions (B035-2302).

- **outputtablename**

    Specifies the name(s) of the table(s) to store the analysis results. If there is more than one combination requested, a list of names may be supplied here to be matched one-to-one with the combinations.  For example, if combinations=11,21, outputtablename=out11,out21 might be used.  If this parameter is not specified, a default name based on the requested or defaulted combination is used.  For example, if the default combination of 1-to-1 is in effect, the outputtablename is _TWM_1_TO_1_AFFINITY.  Note that the leading “_TWM” will be equal to the resulttableprefix if it is specified.

- **overwrite**

    When overwrite is set to true (default), the output tables are dropped before creating new ones.

- **processtype**

    This parameter may be set to all (default), support or recalculate.

    - all  (default)

        All processing is performed, from building support tables to calculating final affinities.
    
    - support

        The single item support table is built and then processing is halted.  This allows you to view the support table and decide what the minimum support value should be, thus reducing the amount of processing performed.  The single item support table is named _TWM_1_ITEM_SUPPORT and is created in outputdatabase, while the parameter outputtablename is ignored.  (If resulttableprefix is specified, it replaces _TWM in the support table name.)

    - recalculate
    
        The final affinity tables are calculated based on support tables already present.  This requires that the dropsupporttables parameter was set to false in a previous run so that the support tables are available for recalculating the final affinities.

- **relaxedordering**

    This option may be used in conjunction with sequence analysis, that is when a sequence column is specified.  Relaxed ordering occurs when the items on the left side of an association rule may occur in any order (via the sequence column), and the same is the case with the right side items, provided that all left side items precede all right side items.

- **resulttableprefix**

    The resulttableprefix is used in naming the intermediate support tables created during processing, as well as in creating default output table names.  The default prefix is "_TWM".  If the outputtablename parameter is not specified, this prefix is used to create a name that indicates the combination being calculated, so that, for example,  _TWM_1_TO_1_AFFINITY is the default outputtablename for a 1-to-1 combination.

- **sequencecolumn**

    The column providing sequencing of input items if sequence analysis is desired.  This might typically be a column of type date or timestamp.  By default, sequence analysis is not performed.

- **where**

    Specifies the SQL WHERE clause generated within the Association Rules SQL to filter rows selected for analysis.  For example: where=cust_id > 0.  Note that single quotes within the parameter value must be doubled, such as in where=channel <> '' ''.  (Ordinarily, the expression would be where channel <> ' '.  Instead, the expression ends with quote-quote-blank-quote-quote).
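
The quote-doubling rule for the where parameter is easy to get wrong by hand. If the td_analyze call is assembled in client code, a small helper can apply it. The Python function below is a hypothetical convenience, not part of VAL, and it does not handle semicolons inside the condition.

```python
def td_analyze_where(condition):
    """Build a where= parameter, doubling single quotes so the clause
    survives inside the quoted td_analyze parameter string."""
    return "where=" + condition.replace("'", "''") + ";"

print(td_analyze_where("channel <> ' '"))  # prints: where=channel <> '' '';
```

The printed text matches the form used in the second and third examples later in this section.
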
___

#### Note on the following "description table options"

A description table may be joined with results tables for purposes of making the results easier to understand.  The item descriptions are however only included in the returned result sets and are not included in the output table(s).  If any of these parameters is specified, all of them must be specified.

- **descriptiondatabase**

    The database containing the description table.

- **descriptiontable**

    The description table to join with the output data.

- **descriptionidentifier**

    The column in the description table that is joined with the result table.

- **descriptioncolumn**

    The column in the description table that contains descriptive item names.
___

#### Note on the following "hierarchy table options"

A hierarchy table may be joined with the input table in order to reduce the amount of input data and compute association rules at a different hierarchical level. Use of this option has an impact on the data saved in the Reduced Input Table when that parameter (reducedinputtable) is requested. If any of these parameters is specified, all of them must be specified.  When this option is utilized, listwise deletion is automatically performed, ignoring rows that contain a null group, item or sequence column value.  

- **hierarchydatabase**

    The database containing the hierarchy table.

- **hierarchytable**

    The hierarchy table to join with the input data.

- **hierarchyitemcolumn**

    The lowest level item column in the hierarchy table to be matched with the item column in the input data.

- **hierarchycolumn**

    The higher level item in the hierarchy table.
___

#### Note on the following "left-side-lookup table options"

A left-side lookup table may be specified to reduce the rules reported to only those with left-side items that appear in the lookup table. If any of these parameters is specified, all of them must be specified.    

- **leftlookupdatabase**

    The database containing the left-side lookup table.

- **leftlookuptable**

    The left-side lookup table.

- **leftlookupcolumn**

    The column to match with left-side items in rules.
___

#### Note on the following "reduced input options"

If input to the analysis is reduced by using the minimum support option, a hierarchy table or a WHERE clause, the resulting reduced input table may be saved for further analysis.  Note that this option is not affected by the use of a left or right-side lookup parameter or a minimum confidence, lift or Z Score parameter.  Note also that if further analysis is performed on this table, it may be appropriate to use the groupcount parameter.  If either of the following two parameters is specified, both must be specified.

- **reducedinputdatabase**

    The database containing the reduced input table.

- **reducedinputtable**

    The table in which to store reduced input rows.
___

#### Note on the following "right-side-lookup table options"

A right-side lookup table may be specified to reduce the rules reported to those with right-side items that appear in the lookup table. If any of these parameters is specified, all of them must be specified.

- **rightlookupdatabase**

    The database containing the right-side lookup table.

- **rightlookuptable**

    The right-side lookup table.

- **rightlookupcolumn**

    The column to match with right-side items in rules.

---

1.  The first example uses a minimal number of parameters and produces an output table with the default name _TWM_1_TO_1_AFFINITY.  By default, rows from this table are also returned as a result set.

In [73]:
call ${VALDB}.td_analyze('Association',
                         'database=${XSPDB};
                          tablename=twm_credit_tran;
                          groupcolumn=cust_id;
                          itemcolumn=channel;
                          outputdatabase=${QLID};');

Success: 0 rows affected

WARNING: [Teradata Database] [Warning 3212] The stored procedure returned one or more result sets.

Unnamed: 0,ITEM1OF2,ITEM2OF2,LSUPPORT,RSUPPORT,SUPPORT,CONFIDENCE,LIFT,ZSCORE
1,,A,0.0722100656455142,0.8577680525164114,0.0503282275711159,0.6969696969696969,0.812538651824366,-1.0297652021021615
2,,B,0.0722100656455142,0.4967177242888402,0.0481400437636761,0.6666666666666666,1.342143906020558,1.4107574852051723
3,,C,0.0722100656455142,0.6717724288840262,0.0415754923413566,0.5757575757575757,0.8570723521863586,-0.6898932574083162
4,,E,0.0722100656455142,0.9168490153172868,0.0634573304157549,0.8787878787878787,0.9584870181528892,-0.236300264659495
5,,H,0.0722100656455142,0.0087527352297593,0.0021881838074398,0.0303030303030303,3.462121212121212,1.3236573743085145
6,,K,0.0722100656455142,0.3566739606126914,0.0262582056892779,0.3636363636363636,1.0195203569436695,0.0678494660731004
7,,M,0.0722100656455142,0.9693654266958424,0.0722100656455142,1.0,1.0316027088036115,0.1853459443538201
8,,V,0.0722100656455142,0.3632385120350109,0.037199124726477,0.5151515151515151,1.4182183278568818,1.467329548262911
9,A,,0.8577680525164114,0.0722100656455142,0.0503282275711159,0.0586734693877551,0.812538651824366,-1.0297652021021615
10,A,B,0.8577680525164114,0.4967177242888402,0.437636761487965,0.5102040816326531,1.027150948485121,0.5000954151966431


2.  The second example requests a 1-to-1 and a 2-to-1 analysis, while also requesting a minimum support of 0.1. A WHERE clause eliminating rows with a blank channel column is also specified. (Note that the blank channel value requires double single quotes, that is quote-quote-blank-quote-quote.)

In [74]:
call ${VALDB}.td_analyze('Association',
                         'database=${XSPDB};
                          tablename=twm_credit_tran;
                          groupcolumn=cust_id;
                          itemcolumn=channel;
                          minimumsupport=0.1;
                          where=channel <> '' '';
                          combinations=11,21;
                          outputdatabase=${QLID};
                          outputtablename=val_example2_11,val_example2_21;');

Success: 0 rows affected

WARNING: [Teradata Database] [Warning 3212] The stored procedure returned one or more result sets.

Unnamed: 0,ITEM1OF2,ITEM2OF2,LSUPPORT,RSUPPORT,SUPPORT,CONFIDENCE,LIFT,ZSCORE
1,A,B,0.8577680525164114,0.4967177242888402,0.437636761487965,0.5102040816326531,1.027150948485121,0.5000954151966431
2,A,C,0.8577680525164114,0.6717724288840262,0.5973741794310722,0.6964285714285714,1.0367031177291763,0.914933174293392
3,A,E,0.8577680525164114,0.9168490153172868,0.8074398249452954,0.9413265306122448,1.0266974331498708,1.0952300590135025
4,A,K,0.8577680525164114,0.3566739606126914,0.3063457330415755,0.3571428571428571,1.0013146362839611,0.0186589390615769
5,A,M,0.8577680525164114,0.9693654266958424,0.838074398249453,0.9770408163265306,1.007917952734141,0.3759999237228786
6,A,V,0.8577680525164114,0.3632385120350109,0.3019693654266958,0.3520408163265306,0.9691726088025572,-0.4433505406642729
7,B,A,0.4967177242888402,0.8577680525164114,0.437636761487965,0.8810572687224669,1.027150948485121,0.5000954151966431
8,B,C,0.4967177242888402,0.6717724288840262,0.3479212253829322,0.7004405286343612,1.0426753146120622,0.645594375122405
9,B,E,0.4967177242888402,0.9168490153172868,0.4529540481400437,0.9118942731277532,0.9945959017169052,-0.1056458164445539
10,B,K,0.4967177242888402,0.3566739606126914,0.2100656455142232,0.4229074889867841,1.1856976838463824,1.842040084979296


Unnamed: 0,ITEM1OF3,ITEM2OF3,ITEM3OF3,LSUPPORT,RSUPPORT,SUPPORT,CONFIDENCE,LIFT,ZSCORE
1,A,B,C,0.437636761487965,0.6717724288840262,0.3172866520787746,0.725,1.0792345276872963,1.0930394664012584
2,A,B,E,0.437636761487965,0.9168490153172868,0.4048140043763676,0.925,1.008890214797136,0.1555794571914785
3,A,B,K,0.437636761487965,0.3566739606126914,0.1772428884026258,0.4049999999999999,1.1354907975460122,1.2456997247084796
4,A,B,M,0.437636761487965,0.9693654266958424,0.4310722100656455,0.985,1.0161286681715576,0.2959598040871559
5,A,B,V,0.437636761487965,0.3632385120350109,0.1925601750547046,0.44,1.2113253012048193,1.9640647342034827
6,A,C,B,0.5973741794310722,0.4967177242888402,0.3172866520787746,0.5311355311355311,1.069290474576818,0.9621604762024332
7,A,C,E,0.5973741794310722,0.9168490153172868,0.562363238512035,0.9413919413919414,1.0267687761721174,0.6297183997841872
8,A,C,K,0.5973741794310722,0.3566739606126914,0.2297592997811816,0.3846153846153846,1.0783388390750357,0.8714159461418225
9,A,C,M,0.5973741794310722,0.9693654266958424,0.5886214442013129,0.9853479853479854,1.0164876507991631,0.413409952173191
10,A,C,V,0.5973741794310722,0.3632385120350109,0.2363238512035011,0.3956043956043956,1.0891036674169206,1.0027415267825377


3.  The third example demonstrates a request for sequence analysis by specifying a sequencecolumn parameter.  It also includes the optional minimumsupport, where and outputtablename parameters.

In [75]:
call ${VALDB}.td_analyze('Association',
                         'database=${XSPDB};
                          tablename=twm_credit_tran;
                          groupcolumn=cust_id;
                          itemcolumn=channel;
                          sequencecolumn=tran_date;
                          minimumsupport=0.1;
                          where=channel <> '' '';
                          outputdatabase=${QLID};
                          outputtablename=val_example3;');

Success: 0 rows affected

WARNING: [Teradata Database] [Warning 3212] The stored procedure returned one or more result sets.

Unnamed: 0,ITEM1OF2,ITEM2OF2,LSUPPORT,RSUPPORT,SUPPORT,CONFIDENCE,LIFT,ZSCORE
1,A,A,0.8577680525164114,0.8577680525164114,0.7811816192560175,0.9107142857142856,2.123451166180758,18.32181268950856
2,A,B,0.8577680525164114,0.4967177242888402,0.3457330415754923,0.4030612244897959,1.622898498606491,6.928229184679938
3,A,C,0.8577680525164114,0.6717724288840262,0.4770240700218818,0.5561224489795918,1.6556870305125309,8.917231103695832
4,A,E,0.8577680525164114,0.9168490153172868,0.7921225382932167,0.923469387755102,2.0144415761531342,17.45778084859275
5,A,K,0.8577680525164114,0.3566739606126914,0.2319474835886214,0.2704081632653061,1.5162764492299987,4.690262055639944
6,A,M,0.8577680525164114,0.9693654266958424,0.8008752735229759,0.9336734693877552,1.9263601603169485,16.705157634533087
7,A,V,0.8577680525164114,0.3632385120350109,0.2407002188183807,0.2806122448979591,1.545057782149004,5.005420372684139
8,B,A,0.4967177242888402,0.8577680525164114,0.3829321663019693,0.7709251101321586,1.7975141598489617,8.870403267662489
9,B,B,0.4967177242888402,0.4967177242888402,0.1575492341356674,0.3171806167400881,1.2771060955966544,2.222231244365158
10,B,C,0.4967177242888402,0.6717724288840262,0.2538293216630197,0.5110132158590308,1.5213878804402416,4.987764162461534


---

Since we are in a beta phase, the following command can be used to determine the specific release you are running:

In [76]:
call ${VALDB}.td_analyze('version','');

Success: 0 rows affected

WARNING: [Teradata Database] [Warning 3212] The stored procedure returned one or more result sets.

Unnamed: 0,version
1,VAL In-DB 2.1.0.0


In [77]:
DROP TABLE "${QLID}"."VAL_ADS";

Success: 42 rows affected

In [78]:
DROP TABLE "${QLID}"."VAL_ADS2";

Success: 37 rows affected

<footer style="padding:10px;background:#f9f9f9;border-bottom:3px solid #394851">©2021 Teradata. All Rights Reserved</footer>