# Demo: Analyzing Continuous Columns

### 1. Import Packages and Connect to the CAS Server

Visit the documentation for the [SWAT (SAS Scripting Wrapper for Analytics Transfer)](https://sassoftware.github.io/python-swat/index.html) package.

```python
## Import packages
import swat
import pandas as pd

## Set options
pd.set_option('display.max_columns', None)

## Connect to CAS
conn = swat.CAS('server.demo.sas.com', 30571, 'student', 'Metadata0', name = 'py03d04')

## Function to load the loans_raw.sashdat file into memory if necessary
def loadLoans():
    conn.loadTable(path = 'loans_raw.sashdat', caslib = 'PIVY',
                   casOut = {'name' : 'loans_raw',
                             'caslib' : 'casuser',
                             'promote' : True})
```

### 2. Explore Available CAS Tables and Data Source Files


a. Use the tableInfo action to view all available in-memory tables in the **Casuser** caslib. If the **LOANS_RAW** CAS table is not available, uncomment the loadLoans function call and execute the cell again.

```python
#loadLoans()
conn.tableInfo(caslib = 'casuser')
```

b. Reference the **loans_raw** CAS table using the CASTable method and preview the table using the head method.

```python
tbl = conn.CASTable('loans_raw', caslib = 'casuser')
tbl.head()
```

### 3. Using Familiar Pandas Methods in the SWAT Package

a. Use the describe method to view descriptive statistics for each continuous column of a CAS table. Begin the cell with the Jupyter cell magic %%time, which reports how long the cell takes to execute. Note how long the describe SWAT method takes to run.

**Note**: Time will vary based on your environment.

```python
%%time
tbl.describe()
```
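
For comparison, the sketch below runs the pandas describe method on a small in-memory DataFrame. It uses time.perf_counter for timing instead of the %%time magic. The DataFrame and its values are hypothetical stand-ins for a few **loans_raw** columns, not the real data. The sketch only shows which statistics describe reports for numeric columns.

```python
import time

import pandas as pd

# Hypothetical sample rows standing in for a few loans_raw columns (values are made up)
df = pd.DataFrame({
    'Category': ['Credit Card', 'Auto', 'Credit Card', 'Mortgage'],
    'Amount': [1200.0, 15000.0, 800.0, 250000.0],
    'InterestRate': [0.19, 0.05, 0.22, 0.04],
    'LoanLength': [12, 60, 12, 360],
})

# Time the call with a wall-clock timer, similar in spirit to %%time
start = time.perf_counter()
stats = df.describe()  # numeric columns only by default, so Category is skipped
elapsed = time.perf_counter() - start

print(stats)
print(f'describe took {elapsed:.4f} seconds')
```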

b. You can apply summary methods to a CAS table as you would to a pandas DataFrame. Here, we find the maximum value of the **Amount** column.

```python
(tbl
 .Amount
 .max())
```

c. You can also filter a CAS table with the query method. Here, we query the CAS table for all rows where **Category** equals *Credit Card*, and then find the mean **Amount** of those credit card rows.

```python
(tbl
 .query('Category = "Credit Card"')
 .Amount
 .mean())
```
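
Here is a local pandas sketch of the same two steps (maximum, then filtered mean), using hypothetical sample rows. One difference is worth noting: pandas query expressions use Python's == for equality, whereas the CAS expression in the cell above uses the SAS-style =.

```python
import pandas as pd

# Hypothetical sample rows (made-up values)
df = pd.DataFrame({
    'Category': ['Credit Card', 'Auto', 'Credit Card', 'Mortgage'],
    'Amount': [1200.0, 15000.0, 800.0, 250000.0],
})

# Maximum of one column, written as a method chain like the CAS example
max_amount = (df
              .Amount
              .max())

# pandas query uses '==' for equality (the CAS expression uses '=')
cc_mean = (df
           .query('Category == "Credit Card"')
           .Amount
           .mean())

print(max_amount, cc_mean)
```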

### 4. Summary Action

a. Instead of using the describe method, you can use the [simple.summary](https://go.documentation.sas.com/doc/en/pgmsascdc/v_017/casanpg/cas-simple-summary.htm?homeOnFail) action. Like the describe method, the summary action returns a variety of descriptive statistics. Notice that the summary action typically executes faster than the describe method.

**Note:** CAS actions are sent directly to the CAS server for processing. The pandas-style methods in the SWAT package are translated through the CAS API into one or more CAS actions, which produce results similar to the corresponding pandas methods.

```python
%%time
tbl.summary()
```
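
To clarify what some of the summary output columns mean, the sketch below computes a few of the statistics the summary action reports (N, NMiss, Min, Max, Mean, Std, and StdErr) with pandas on made-up data. It illustrates the definitions only and does not reproduce the action. Values returned by CAS may differ in details.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data (made-up values); one Amount value is missing
df = pd.DataFrame({
    'Amount': [1200.0, 15000.0, np.nan, 250000.0],
    'LoanLength': [12, 60, 12, 360],
})

n = df.count()  # non-missing values per column
summary = pd.DataFrame({
    'N': n,
    'NMiss': df.isna().sum(),         # missing values per column
    'Min': df.min(),
    'Max': df.max(),
    'Mean': df.mean(),
    'Std': df.std(),                  # sample standard deviation (ddof=1)
    'StdErr': df.std() / np.sqrt(n),  # standard error of the mean
})
print(summary)
```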

b. You can use the vars parameter to specify which columns of the CAS table to analyze. Here, set the vars parameter of the CAS table object to the **Amount**, **InterestRate**, and **LoanLength** columns. Then execute the summary action on the specified columns.

```python
tbl.vars = ['Amount', 'InterestRate', 'LoanLength']
display(tbl)

tbl.summary()
```
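
In pandas, the closest analogue to the vars parameter is selecting columns before computing statistics. The sketch below uses hypothetical data. One design difference: tbl.vars stays attached to the **CASTable** object for later actions, whereas the pandas selection applies only to this one expression.

```python
import pandas as pd

# Hypothetical sample rows (made-up values)
df = pd.DataFrame({
    'Category': ['Credit Card', 'Auto', 'Credit Card', 'Mortgage'],
    'Amount': [1200.0, 15000.0, 800.0, 250000.0],
    'InterestRate': [0.19, 0.05, 0.22, 0.04],
    'LoanLength': [12, 60, 12, 360],
    'Salary': [48000.0, 62000.0, 39000.0, 95000.0],
})

# Selecting columns up front plays the role of the vars parameter
analysis_cols = ['Amount', 'InterestRate', 'LoanLength']
stats = df[analysis_cols].describe()
print(stats)
```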

c. Add the subSet parameter to specify which summary statistics to compute. Here, the Mean, Max, and Min statistics are requested for the columns listed in the vars parameter.

```python
tbl.summary(subSet = ['Mean','Max','Min'])
```
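
The pandas counterpart to subSet is passing a list of function names to agg. The sketch below runs on made-up data, and the column list is an assumption that mirrors the vars setting above.

```python
import pandas as pd

# Hypothetical sample rows (made-up values)
df = pd.DataFrame({
    'Category': ['Credit Card', 'Auto', 'Credit Card', 'Mortgage'],
    'Amount': [1200.0, 15000.0, 800.0, 250000.0],
    'InterestRate': [0.19, 0.05, 0.22, 0.04],
    'LoanLength': [12, 60, 12, 360],
})

analysis_cols = ['Amount', 'InterestRate', 'LoanLength']

# A list of function names plays the role of the subSet parameter
subset_stats = df[analysis_cols].agg(['mean', 'max', 'min'])
print(subset_stats)
```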

d. Add the where parameter to the **CASTable** object to filter for rows where **Category** equals *Credit Card*. Then execute the summary action. Notice that the **CASTable** object now contains both the vars and where parameters.

```python
tbl.where = 'Category = "Credit Card"'
display(tbl)

tbl.summary(subSet = ['Mean','Max','Min'])
```
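
Filtering with where before summarizing corresponds to query followed by agg in pandas. Setting tbl.where keeps the filter attached to the **CASTable** object for later actions. The pandas chain instead returns a new object and leaves df unchanged, as this sketch on hypothetical data shows.

```python
import pandas as pd

# Hypothetical sample rows (made-up values)
df = pd.DataFrame({
    'Category': ['Credit Card', 'Auto', 'Credit Card', 'Mortgage'],
    'Amount': [1200.0, 15000.0, 800.0, 250000.0],
    'InterestRate': [0.19, 0.05, 0.22, 0.04],
    'LoanLength': [12, 60, 12, 360],
})

analysis_cols = ['Amount', 'InterestRate', 'LoanLength']

# Filter rows (like where), keep the analysis columns (like vars),
# then request specific statistics (like subSet)
cc_stats = (df
            .query('Category == "Credit Card"')[analysis_cols]
            .agg(['mean', 'max', 'min']))
print(cc_stats)
```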

e. Use the computedVarsProgram parameter to create a calculated column named **MonthlySalary**. Then add the new column to the vars parameter with the list append method. Lastly, display the **CASTable** object and summarize the CAS table.

```python
## Create a calculated column
tbl.computedVarsProgram = 'MonthlySalary = round(Salary/12);'

## Append the column name to the list of column inputs
tbl.vars.append('MonthlySalary')
display(tbl)

## Analyze the CAS table
tbl.summary(subSet = ['MIN','MEAN','MAX'])
```
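
A pandas sketch of the same calculated column uses assign. There is one caveat, as we understand it. pandas Series.round rounds exact halves to the nearest even number, whereas the SAS ROUND function used in computedVarsProgram rounds exact halves away from zero (ROUNDE is the SAS half-to-even variant). The helper round_half_away below is a hypothetical approximation of the SAS behavior and is not guaranteed to match SAS in every floating-point edge case. The salary values are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical salaries (made-up values); 30006 / 12 is exactly 2500.5
df = pd.DataFrame({'Salary': [48000.0, 62000.0, 30006.0]})

# pandas rounding: exact halves go to the nearest even number
df = df.assign(MonthlySalary=(df['Salary'] / 12).round())

def round_half_away(x: pd.Series) -> pd.Series:
    """Hypothetical helper: round to the nearest integer, halves away from zero."""
    return np.sign(x) * np.floor(np.abs(x) + 0.5)

# Approximates the SAS ROUND behavior for comparison
df['MonthlySalarySAS'] = round_half_away(df['Salary'] / 12)
print(df)
```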

### 5. Aggregate Action

a. The [aggregation.aggregate](https://go.documentation.sas.com/doc/en/pgmsascdc/v_017/casanpg/cas-aggregation-aggregate.htm) action provides much more functionality than the summary action. Before you can use it, load the aggregation action set with the loadActionSet action.

```python
conn.loadactionset('aggregation')
```

b. View the CAS table reference **tbl**. Notice that it is filtering for rows where **Category** equals *Credit Card*, creating the column **MonthlySalary**, and selecting only the **Amount**, **InterestRate**, **LoanLength**, and **MonthlySalary** columns.

```python
tbl
```

c. Execute the aggregate action on the **tbl** variable. By default, the aggregate action computes only the number of distinct values in each column.

```python
tbl.aggregate()
```
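
The pandas method that counts distinct values per column is nunique. The sketch below runs it on hypothetical data for comparison with the distinct counts described above.

```python
import pandas as pd

# Hypothetical sample data (made-up values) with repeated entries
df = pd.DataFrame({
    'Amount': [1200.0, 15000.0, 1200.0, 250000.0],
    'LoanLength': [12, 60, 12, 360],
})

# Number of distinct non-missing values in each column
distinct_counts = df.nunique()
print(distinct_counts)
```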

d. To use the aggregate action, specify the columns to analyze with the varSpecs parameter. The varSpecs parameter takes a list of dictionaries. Each dictionary specifies a column, the aggregation, and any additional options. This is similar to the pandas agg method.

Within the dictionary you can use the following parameters to request summary statistics:
- The [subSet](https://go.documentation.sas.com/doc/en/pgmsascdc/v_017/casanpg/cas-aggregation-aggregate.htm#SAS.cas-aggregation-aggregate-varspecs-summarysubset) parameter requests statistics that the summary action can compute. You can pass them as a list of summary statistics.

- The [agg](https://go.documentation.sas.com/doc/en/pgmsascdc/v_017/casanpg/cas-aggregation-aggregate.htm#SAS.cas-aggregation-aggregate-varspecs-agg) parameter specifies the aggregator to apply to the analysis variable. It can compute statistics that the subSet parameter cannot, such as the median and percentiles. Each dictionary accepts only a single aggregator.
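
Each dictionary accepts only a single aggregator, so a column that needs several aggregators must appear in several dictionaries. The helper build_var_specs below is hypothetical and not part of SWAT. It sketches one way to generate such a list from plain Python mappings, and it produces the same four dictionaries used in the next example.

```python
from typing import Dict, List, Sequence

def build_var_specs(
    aggs: Dict[str, Sequence[str]],
    subsets: Dict[str, Sequence[str]],
) -> List[dict]:
    """Hypothetical helper that builds a varSpecs-style list of dictionaries.

    'agg' holds one aggregator per dictionary, so each aggregator for a column
    gets its own dictionary. Summary statistics for a column share a single
    dictionary through a 'subSet' list.
    """
    specs: List[dict] = []
    for column, aggregators in aggs.items():
        for aggregator in aggregators:
            specs.append({'name': column, 'agg': aggregator})
    for column, stats in subsets.items():
        specs.append({'name': column, 'subSet': list(stats)})
    return specs

var_specs = build_var_specs(
    aggs={'Amount': ['MEDIAN', 'PERCENTILE']},
    subsets={'InterestRate': ['MIN', 'MAX'],
             'MonthlySalary': ['MEAN', 'MIN', 'MAX']},
)
print(var_specs)
# The list could then be passed as tbl.aggregate(varSpecs = var_specs)
```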

In this example, the aggregate action calculates the median and percentiles of the **Amount** column, the minimum and maximum of the **InterestRate** column, and the mean, maximum, and minimum of the **MonthlySalary** column. Notice that the action returns a **CASResults** object with four keys, one for each dictionary.

```python
tbl.aggregate(varSpecs = [
    {'name' : 'Amount', 'agg' : 'MEDIAN'},
    {'name' : 'Amount', 'agg' : 'PERCENTILE'},
    {'name' : 'InterestRate', 'subSet' : ['MIN','MAX']},
    {'name' : 'MonthlySalary', 'subSet' : ['MEAN','MIN','MAX']}
])
```
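
For readers coming from pandas, the sketch below expresses a similar request with DataFrame.agg on made-up data. pandas accepts several functions per column in one dictionary entry, so there is no need to repeat a column. The quartiles requested with quantile are an arbitrary choice for illustration, and the percentiles the CAS PERCENTILE aggregator computes by default may differ.

```python
import pandas as pd

# Hypothetical sample rows (made-up values)
df = pd.DataFrame({
    'Amount': [1200.0, 15000.0, 800.0, 2500.0],
    'InterestRate': [0.19, 0.21, 0.22, 0.18],
    'Salary': [48000.0, 62000.0, 39000.0, 54000.0],
})
df = df.assign(MonthlySalary=(df['Salary'] / 12).round())

# One dictionary entry per column; statistics not requested for a column show as NaN
result = df.agg({
    'Amount': ['median'],
    'InterestRate': ['min', 'max'],
    'MonthlySalary': ['mean', 'min', 'max'],
})
print(result)

# Percentiles requested separately; these quantiles are an arbitrary choice
amount_quartiles = df['Amount'].quantile([0.25, 0.5, 0.75])
print(amount_quartiles)
```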

### 6. Terminate the CAS Session

It's best practice to always terminate the CAS session when you are done.

```python
conn.terminate()
```