# Demo: Using Additional CAS Actions

### 1. Import Packages and Connect to the CAS Server

For details, see the documentation for the [SWAT (SAS Scripting Wrapper for Analytics Transfer)](https://sassoftware.github.io/python-swat/index.html) package.

In [None]:
## Import packages
import swat
import pandas as pd
import matplotlib.pyplot as plt
plt.style.use('seaborn')  ## In Matplotlib 3.6 and later, this style is named 'seaborn-v0_8'
import seaborn as sns

## Set options
pd.set_option('display.max_columns', None)

## Connect to CAS
conn = swat.CAS('server.demo.sas.com', 30571, 'student', 'Metadata0', name = 'py04d02')

## Function to load the loans_raw.sashdat file into memory if necessary
def loadLoans():
    conn.loadTable(path = 'loans_raw.sashdat', caslib = 'PIVY',
                   casOut = {'name' : 'loans_raw',
                            'caslib' : 'casuser',
                            'promote' : True})

### 2. Explore Available CAS Tables

a. Use the tableInfo action to view all available in-memory tables in the **Casuser** caslib. If the **LOANS_RAW** CAS table is not listed, uncomment the loadLoans() call and execute the cell again.

In [None]:
#loadLoans()
conn.tableInfo(caslib = 'casuser')
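
The check in step a can also be automated rather than done by eye. The following is a minimal sketch, assuming that the tableInfo result contains a TableInfo DataFrame with a Name column. The helper name table_is_loaded and the stand-in DataFrame are hypothetical and exist only for illustration, so the sketch runs without a CAS connection.

```python
import pandas as pd

def table_is_loaded(table_info, table_name):
    """Return True if table_name is listed in the Name column (case-insensitive).

    table_info is assumed to be the TableInfo DataFrame from tableInfo,
    or None when the caslib has no in-memory tables.
    """
    if table_info is None or 'Name' not in table_info.columns:
        return False
    return table_name.upper() in table_info['Name'].str.upper().tolist()

# Stand-in for the TableInfo DataFrame (invented values)
fake_info = pd.DataFrame({'Name': ['LOANS_RAW', 'CARS']})

print(table_is_loaded(fake_info, 'loans_raw'))
print(table_is_loaded(fake_info, 'customers_raw'))

# With a live session (not executed here, shown as an assumption):
# info = conn.tableInfo(caslib='casuser').get('TableInfo')
# if not table_is_loaded(info, 'loans_raw'):
#     loadLoans()
```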

b. Reference the **LOANS_RAW** CAS table where **Category** equals *Mortgage*. Then preview the table.

In [None]:
mTbl = conn.CASTable('loans_raw', 
                     caslib = 'casuser', 
                     where = "Category = 'Mortgage'")
mTbl.head()

### 3. Simple Action Set


a. You can use the [simple.correlation](https://go.documentation.sas.com/doc/en/pgmsascdc/v_018/casanpg/n11jidodvxk3tkn1sy8636iha1po.htm#p0jytlrw1nn19tn1782gu5c2cv43) action to generate a matrix of Pearson correlation coefficients for two or more input columns. By default, univariate descriptive statistics are also generated for the analysis variables. You can disable the univariate descriptive statistics if you do not need them.

In [None]:
colNames = ['Age', 'EmpLength', 'Amount', 'InterestRate', 'LoanLength']
mTbl.correlation(inputs = colNames)

## Alternate version
#mTbl.vars = ['Age', 'EmpLength', 'Amount', 'InterestRate', 'LoanLength']
#mTbl.correlation()

b. The SWAT package also provides the pandas-style corr method on CAS tables, which returns similar results.

In [None]:
mTbl[colNames].corr()
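
To make the contents of a Pearson correlation matrix concrete, the following client-side sketch uses plain pandas on a tiny invented DataFrame. The values are made up for illustration and are not taken from **LOANS_RAW**; no CAS connection is needed.

```python
import pandas as pd

# Hypothetical sample data (invented values, not from loans_raw)
sample = pd.DataFrame({
    'Amount':       [100.0, 200.0, 300.0, 400.0],
    'InterestRate': [2.0, 4.0, 6.0, 8.0],      # rises linearly with Amount
    'LoanLength':   [40.0, 30.0, 20.0, 10.0],  # falls linearly with Amount
})

# Pairwise Pearson correlation coefficients for all numeric columns
corr_matrix = sample.corr(method='pearson')
print(corr_matrix)
```

Because the invented columns are exact linear functions of one another, the coefficients are 1 or -1 (up to floating-point rounding), which makes the matrix easy to read: the diagonal is always 1, and the matrix is symmetric.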

c. Use the [simple.topk](https://go.documentation.sas.com/doc/en/pgmsascdc/v_018/casanpg/cas-simple-topk.htm) action to return the top-K and bottom-K distinct values of each column in the inputs list. In this example, the top five and bottom five mortgage interest rates are returned. The inputs parameter specifies the **InterestRate** column, and the raw parameter requests the raw values so that SAS formats do not mask them. The action returns two **SASDataFrames**. The first shows the top five and bottom five interest rates, and the second shows the number of unique values in the column.

In [None]:
mTbl.topk(inputs = 'InterestRate',
          topk = 5,
          bottomk = 5,
          raw = True) # <-avoid masking by autoformatting values
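
As a rough client-side analogue of the two results described above, this sketch finds the largest and smallest distinct values and the number of unique values with pandas. The data is invented, and three values are requested instead of five to keep the example small.

```python
import pandas as pd

# Hypothetical interest rates with some repeated values (invented data)
rates = pd.Series([3.5, 4.0, 4.0, 5.25, 6.0, 6.0, 7.1], name='InterestRate')

# Work on distinct values only, as a top-K of distinct values would
distinct = pd.Series(rates.unique())

top3 = distinct.nlargest(3).tolist()      # largest distinct values
bottom3 = distinct.nsmallest(3).tolist()  # smallest distinct values
n_unique = rates.nunique()                # number of distinct values

print(top3, bottom3, n_unique)
```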

d. You can also use an aggregator within the topk action to find the top n and bottom n values based on an aggregation. Here, we specify the mean aggregator to aggregate **Amount** by each value of **LoanGrade**, and then return the loan grades with the highest and lowest mean values.

In [None]:
mTbl.topk(inputs = 'LoanGrade',  ## <--group for the aggregator
          weight = 'Amount',     ## <--specify the column to aggregate
          aggregator = 'MEAN',   ## <--how to aggregate
          raw = True)
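
The aggregation idea can be sketched on the client with a pandas group-by: compute the mean of **Amount** for each **LoanGrade**, and then pick the groups with the highest and lowest means. The data below is invented, and this is only an analogue of the idea, not a description of how the CAS action is implemented.

```python
import pandas as pd

# Hypothetical loans (invented values)
loans = pd.DataFrame({
    'LoanGrade': ['A', 'A', 'B', 'B', 'C', 'C'],
    'Amount':    [100.0, 300.0, 500.0, 700.0, 50.0, 150.0],
})

# Mean Amount per LoanGrade: A = 200, B = 600, C = 100
mean_by_grade = loans.groupby('LoanGrade')['Amount'].mean()

highest = mean_by_grade.nlargest(1)   # grade with the highest mean Amount
lowest = mean_by_grade.nsmallest(1)   # grade with the lowest mean Amount

print(highest)
print(lowest)
```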

### 4. Percentile Action Set

a. Load and view the percentile action set. Notice that the percentile action set contains three actions: assess, boxplot, and percentile.

In [None]:
conn.loadActionSet('percentile')
conn.percentile?

b. Group the mortgage loans by **LoanGrade**, and then execute the [percentile.boxplot](https://go.documentation.sas.com/doc/en/pgmsascdc/v_018/casanpg/cas-percentile-boxplot.htm) action to calculate quantiles, high and low whiskers, and outliers of **InterestRate** for each value of **LoanGrade**. Because the table is grouped, the action returns separate results for each BY group, so you must use the concat_bygroups method to concatenate them. Then use the *BoxPlot* key to return the **SASDataFrame**.

**Note**: If you omit the percentile action set prefix and call mTbl.boxplot directly, the boxplot plotting method is called instead of the CAS action.

In [None]:
mTbl.groupBy = ['LoanGrade']

(mTbl
 .percentile
 .boxplot(inputs = 'InterestRate')
 .concat_bygroups()
 ['BoxPlot'])
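
For intuition about the kind of statistics a box plot summarizes, here is a client-side sketch that computes quartiles, whiskers, and an outlier count per group with pandas. It assumes the common 1.5 × IQR whisker rule and the default linear quantile interpolation in pandas; the CAS action may define whiskers and quantiles differently. The data and the output column names are invented for illustration.

```python
import pandas as pd

# Hypothetical loans (invented values); grade B contains one extreme rate
loans = pd.DataFrame({
    'LoanGrade':    ['A'] * 5 + ['B'] * 5,
    'InterestRate': [1.0, 2.0, 3.0, 4.0, 5.0, 2.0, 4.0, 6.0, 8.0, 30.0],
})

def box_stats(values):
    """Quartiles, whiskers, and outlier count using the 1.5 * IQR rule."""
    q1, med, q3 = values.quantile([0.25, 0.5, 0.75])
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = values[(values >= lo_fence) & (values <= hi_fence)]
    return pd.Series({'Q1': q1, 'Median': med, 'Q3': q3,
                      'LowWhisker': inside.min(),
                      'HighWhisker': inside.max(),
                      'NOutliers': len(values) - len(inside)})

# One row of statistics per LoanGrade
rows = {grade: box_stats(grp)
        for grade, grp in loans.groupby('LoanGrade')['InterestRate']}
box_df = pd.DataFrame(rows).T
print(box_df)
```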

c. The [percentile.percentile](https://go.documentation.sas.com/doc/en/pgmsascdc/v_018/casanpg/cas-percentile-percentile.htm) action calculates quantiles and percentiles. Here, the groupby parameter is deleted from the **CASTable** object, and then the percentile action is executed on the **Amount** and **InterestRate** columns. Notice that the action returns a **CASResults** object with a single **SASDataFrame** with percentiles for each column.

In [None]:
del mTbl.groupby
display(mTbl)

mTbl.percentile(inputs = ['Amount', 'InterestRate'])
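
The shape of that result, one row per column and percentile, can be imitated on the client with pandas. This is a sketch on invented data. The Variable, Pctl, and Value column names are illustrative assumptions, and pandas uses linear interpolation by default, so values from the CAS action can differ.

```python
import pandas as pd

# Hypothetical sample (invented values)
sample = pd.DataFrame({
    'Amount':       [10.0, 20.0, 30.0, 40.0, 50.0],
    'InterestRate': [2.0, 4.0, 6.0, 8.0, 10.0],
})

# Quartiles for every column; rows are percentiles, columns are variables
q = sample.quantile([0.25, 0.50, 0.75])
q.index = (q.index * 100).astype(int)
q.index.name = 'Pctl'

# Reshape to one row per (Variable, Pctl) pair
pct_long = (q.reset_index()
             .melt(id_vars='Pctl', var_name='Variable', value_name='Value')
             [['Variable', 'Pctl', 'Value']])
print(pct_long)
```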

### 5. dataPreprocess Action Set

a. Load and view the dataPreprocess action set. Notice that the dataPreprocess action set contains a variety of actions.

In [None]:
conn.loadActionSet('dataPreprocess')
conn.dataPreprocess?

b. Execute the [dataPreprocess.histogram](https://go.documentation.sas.com/doc/en/pgmsascdc/v_018/casanpg/cas-datapreprocess-histogram.htm) action to generate histogram bins and simple bin-based statistics for the **InterestRate** column. Store the **SASDataFrame** from the results of the histogram action by using the *BinDetails* key of the **CASResults** object.

In [None]:
histDf = mTbl.histogram(inputs = ['InterestRate'],
                        requestPackages = [{'nbins':10, 
                                            'binStart':0,
                                            'niceBinning':False}])['BinDetails']

histDf
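
To illustrate what equal-width binning with 10 bins starting at 0 produces, the following sketch builds a similar summary on the client with NumPy. The data is invented. Only the MidPoint and Percent column names come from this demo; BinLower, BinUpper, and Count are placeholder names chosen for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical interest rates (invented values)
rates = np.array([0.5, 1.5, 1.7, 2.2, 2.4, 2.6, 3.1, 3.3, 4.8, 9.9])

# Ten equal-width bins covering 0 through 10
counts, edges = np.histogram(rates, bins=10, range=(0, 10))

bins_df = pd.DataFrame({
    'BinLower': edges[:-1],
    'BinUpper': edges[1:],
    'MidPoint': (edges[:-1] + edges[1:]) / 2,  # center of each bin
    'Count':    counts,
    'Percent':  100 * counts / counts.sum(),   # share of rows in each bin
})
print(bins_df)
```

A summary like this has one row per bin regardless of how many rows the source table has, which is why it is cheap to plot.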

c. Use the **MidPoint** and **Percent** columns of the **histDf** variable to plot a bar chart that visualizes the histogram results.

**Note**: The histogram action enables you to summarize extremely large tables in CAS. You can then use the small summarized result of the action to create the visualization on the client.

In [None]:
fig = plt.figure(figsize = (10,5))
ax = sns.barplot(data = histDf, x = 'MidPoint', y = 'Percent', color = 'blue')
ax.set(title = 'Histogram of Mortgage Interest Rates');

### 6. dataShaping Action Set

a. Load and view the [dataShaping](https://go.documentation.sas.com/doc/en/pgmsascdc/v_017/casanpg/cas-datashaping-TblOfActions.htm?homeOnFail) action set. Notice that it contains two actions for transposing data, longToWide and wideToLong.

**Note:** The dataShaping action set was added in SAS Viya 2021.1.2. Prior versions should use the [transpose.transpose](https://go.documentation.sas.com/doc/en/pgmsascdc/v_018/caspg/cas-transpose-transpose.htm) action.

In [None]:
conn.loadActionSet('dataShaping')
conn.dataShaping?

b. To demonstrate the dataShaping action set, create a small, five-row wide CAS table from the **customers_raw.csv** data source file.

In [None]:
## Load the customers_raw.csv file into memory with the specified parameters
conn.loadTable(path = 'customers_raw.csv', caslib = 'PIVY', 
               vars = ['ID','LoanCreditCard', 'SavingsAcct', 'CheckingAcct'],
               casOut = {'replace' : True})

## Reference the CAS table
customers = conn.CASTable('customers_raw', caslib = 'casuser')

## Create a five-row DataFrame from the wide table as a sample
df = (customers
      .sort_values('ID')
      .head())

## Upload the DataFrame to CAS
conn.upload(df, 
            casOut = {'name' : 'wideTest', 
                      'replace': True})

## Preview the new table
custTbl = conn.CASTable('wideTest', caslib = 'casuser')
custTbl.head()

c. To transform the wide CAS table to a long table, use the [dataShaping.wideToLong](https://go.documentation.sas.com/doc/en/pgmsascdc/v_018/casanpg/cas-datashaping-widetolong.htm) action. Start by creating two variables to specify the input CAS table and the output CAS table. Then, in the wideToLong action, use the following:

- the table parameter with the **inputTbl** variable to specify the input table
- the id parameter to specify the customer **ID** column
- the inputs parameter to specify the columns to transpose
- the variableName parameter to specify the name of the column in the output table that holds the column names from the input table
- the valueName parameter to specify the name of the column in the output table that holds the values from the input table
- the casOut parameter with the **outputTbl** variable to specify the new CAS table

The action returns a **CASResults** object with information about the new CAS table.

In [None]:
## Specify the input and output table information
inputTbl = {'name' : 'wideTest', 
            'caslib' : 'casuser'}

outputTbl = dict(name = 'LongTest', 
                 caslib = 'casuser', 
                 replace = True)

## Transpose the table
conn.wideToLong(table = inputTbl,
                id = 'ID', 
                inputs = ['LoanCreditCard','SavingsAcct','CheckingAcct'],
                variableName = 'AccountType',
                valueName = 'AccountExists',
                casOut = outputTbl)

d. Preview the new CAS table. Notice that the wide table was transposed to a long table.

In [None]:
longTbl = conn.CASTable('LongTest', caslib = 'casuser')
longTbl.head(15)
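
The same wide-to-long reshaping can be sketched on the client with the pandas melt method, which may help clarify the roles of the id, inputs, variableName, and valueName parameters. The two-row table below is invented. The mapping of melt arguments to the action parameters is an analogy, not a statement about how the CAS action works internally.

```python
import pandas as pd

# Hypothetical wide table: one row per customer, one column per account type
wide = pd.DataFrame({
    'ID':             [1, 2],
    'LoanCreditCard': [1, 0],
    'SavingsAcct':    [0, 1],
    'CheckingAcct':   [1, 1],
})

long_df = (wide
           .melt(id_vars='ID',                      # plays the role of id
                 value_vars=['LoanCreditCard',      # plays the role of inputs
                             'SavingsAcct',
                             'CheckingAcct'],
                 var_name='AccountType',            # plays the role of variableName
                 value_name='AccountExists')        # plays the role of valueName
           .sort_values(['ID', 'AccountType'])
           .reset_index(drop=True))
print(long_df)
```

Each customer now occupies three rows, one per transposed column, so the two-row wide table becomes a six-row long table.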

### 7. Terminate the CAS Session

It is a best practice to terminate the CAS session when you are done.

In [None]:
conn.terminate()