# Demo: Exploring a CAS Table

### 1. Import Packages and Connect to the CAS Server

See the documentation for the [SWAT (SAS Scripting Wrapper for Analytics Transfer)](https://sassoftware.github.io/python-swat/index.html) package.

In [None]:
## Import packages
import swat
import pandas as pd
import matplotlib.pyplot as plt
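## The 'seaborn' style was renamed 'seaborn-v0_8' in Matplotlib 3.6; use whichever name is available
plt.style.use('seaborn-v0_8' if 'seaborn-v0_8' in plt.style.available else 'seaborn')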

## Set options
pd.set_option('display.max_columns', None)

## Connect to CAS
conn = swat.CAS('server.demo.sas.com', 30571, 'student', 'Metadata0', name = 'py03d01')

## Function to load the loans_raw.sashdat file into memory if necessary
def loadLoans():
    conn.loadTable(path = 'loans_raw.sashdat', caslib = 'PIVY',
                   casOut = {'name' : 'loans_raw',
                            'caslib' : 'casuser',
                            'promote' : True})

### 2. Explore Available CAS Tables


a. Use the tableInfo action to view all available in-memory tables in the **Casuser** caslib. If the **LOANS_RAW** CAS table is not listed, uncomment the loadLoans() call in the cell below and execute the cell again.

In [None]:
#loadLoans()
conn.tableInfo(caslib = 'casuser')
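
The check for the table can also be done in code rather than by reading the output. The sketch below assumes that the tableInfo result exposes a `TableInfo` DataFrame with a `Name` column of table names. `table_is_loaded` is a hypothetical helper, and the small DataFrame is a stand-in for a live server response.

```python
import pandas as pd

def table_is_loaded(table_info, name):
    """Return True if `name` appears in the Name column (case-insensitive)."""
    # A caslib with no in-memory tables may return no table at all
    if table_info is None or 'Name' not in table_info.columns:
        return False
    names = table_info['Name'].astype(str).str.upper()
    return name.upper() in set(names)

# Stand-in for the TableInfo result; the values are illustrative only
sample_info = pd.DataFrame({'Name': ['LOANS_RAW', 'CUSTOMERS']})

print(table_is_loaded(sample_info, 'loans_raw'))
print(table_is_loaded(sample_info, 'missing_table'))
```

With a live session, and assuming the result behaves like a dictionary, the first argument could be `conn.tableInfo(caslib = 'casuser').get('TableInfo')`, with loadLoans() called only when the helper returns False.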

b. Reference the **LOANS_RAW** CAS table from the **Casuser** caslib in the variable **tbl** and view the output. Notice that the **tbl** variable references a CAS table.

In [None]:
tbl = conn.CASTable('loans_raw', caslib = 'casuser')
tbl

### 3. Preview the Table

a. Preview the CAS table using the head method. The head method processes the data on the CAS server and returns the results to the client as a **SASDataFrame**.

In [None]:
tbl.head()

b. Use the sort_values method to sort the CAS table on the CAS server, and then use the head method to return the first five rows of the sorted table. The head method returns a **SASDataFrame** to the client. With the **SASDataFrame**, you can use the Pandas loc indexer to select the columns **ID**, **Year**, **Age**, and **Amount** as you would with a **pandas.DataFrame**.

**Note:** Because the CAS server distributes data blocks among the workers, row order is not guaranteed unless you sort the data.

In [None]:
(tbl
 .sort_values(by = ['Year','Age'], 
              ascending = [True,False])
 .head()
 .loc[:,['ID','Year','Age','Amount']])               
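
Because the object returned by head behaves like a **pandas.DataFrame**, the same chain can be tried locally without a server. The sketch below is a client-side analogue only: it uses a small made-up DataFrame (the values are illustrative and not taken from **LOANS_RAW**), and here the sort runs in pandas rather than in CAS.

```python
import pandas as pd

# Made-up rows standing in for the loans data
local = pd.DataFrame({
    'ID': [1, 2, 3, 4],
    'Year': [2020, 2019, 2019, 2020],
    'Age': [30, 45, 52, 41],
    'Amount': [1000.0, 2500.0, 1800.0, 3200.0],
    'Other': ['a', 'b', 'c', 'd'],
})

# Sort by ascending Year and descending Age, keep five rows, then select columns
preview = (local
           .sort_values(by = ['Year', 'Age'], ascending = [True, False])
           .head()
           .loc[:, ['ID', 'Year', 'Age', 'Amount']])
print(preview)
```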

c. Preview the CAS table using the fetch CAS action. The fetch action retrieves the rows in CAS and returns them to the client in a **CASResults** object.

In [None]:
tbl.fetch(to = 5)

d. Sort the results of the fetch action by a single column using the sortBy parameter. The default sort order is ascending.

In [None]:
tbl.fetch(to = 5, 
          sortBy = 'Year')

e. Sort the results of the fetch action by multiple columns by passing a list to the sortBy parameter.

In [None]:
tbl.fetch(to = 5, 
          sortBy = ['Year', 'Age'])

f. The fetch action lets you set the sort order for each column individually. Here, sort the CAS table by ascending **Year** and descending **Age** by passing a list to the sortBy parameter. Within the list, give a column name to accept the default (ascending) order, or a dictionary to set the order explicitly.

In [None]:
tbl.fetch(to = 5,
          sortBy = ['Year', 
                    {'name':'Age', 'order':'descending'}
                   ])

g. You can also select specific columns within the action. Here, use the fetchVars parameter and specify the following list of columns: **ID**, **Year**, **Age**, and **Amount**.

In [None]:
tbl.fetch(to = 5,
          sortBy = ['Year', 
                    {'name':'Age', 'order':'descending'}
                   ],
          fetchVars = ['ID','Year','Age','Amount'])
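
When the sort columns are chosen at run time, the sortBy list can be generated instead of typed by hand. The following is a minimal sketch, assuming the dictionary form with `name` and `order` keys shown in the cells above. `build_sort_by` is a hypothetical helper and not part of SWAT.

```python
def build_sort_by(columns, ascending):
    """Build a sortBy list of dictionaries from column names and ascending flags."""
    if len(columns) != len(ascending):
        raise ValueError('columns and ascending must have the same length')
    return [{'name': col, 'order': 'ascending' if asc else 'descending'}
            for col, asc in zip(columns, ascending)]

# Ascending Year, descending Age
sort_spec = build_sort_by(['Year', 'Age'], [True, False])
print(sort_spec)
```

The resulting list could then be passed to the action, for example as `tbl.fetch(to = 5, sortBy = sort_spec)`.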

### 4. Explore a CAS Table

a. View the CAS table dimensions using the shape attribute. The shape attribute returns a **tuple** to the client.

In [None]:
tbl.shape

b. Use the [simple.numRows](https://go.documentation.sas.com/doc/en/pgmsascdc/v_016/casanpg/cas-simple-numrows.htm?homeOnFail) action to display the number of rows in a CAS table.

In [None]:
tbl.numRows()

c. View the column names of a CAS table using the columns attribute.

In [None]:
tbl.columns

d. View the data types of the CAS table columns using the familiar dtypes attribute.

In [None]:
tbl.dtypes

e. Use the [table.columnInfo](https://go.documentation.sas.com/doc/en/pgmsascdc/v_016/caspg/cas-table-columninfo.htm) action to show a CAS table's column information. In addition to data types, the columnInfo action shows column labels and formats if they exist. Because of this extra detail, the columnInfo action is recommended over dtypes.

In [None]:
tbl.columnInfo()

### 5. View Distinct and Missing Values

a. Use the info method to print a summary of CAS table information. The info method returns information like the number of nonmissing values, column type, data size, and more.

In [None]:
tbl.info()

b. Use the nmiss method to view the number of missing values in each column of a CAS table. The result is returned to the client as a Pandas **Series**.

In [None]:
tbl.nmiss()
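
For comparison, a similar per-column count can be produced for a local DataFrame with plain Pandas. This is only a client-side analogue on made-up data, not a description of how CAS computes the result.

```python
import numpy as np
import pandas as pd

# Made-up data with a few missing values
local = pd.DataFrame({
    'Age': [30, np.nan, 52],
    'Category': ['a', None, None],
})

# Count missing values per column; the result is a pandas Series
missing_counts = local.isna().sum()
print(missing_counts)
```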

c. Use the [simple.distinct](https://documentation.sas.com/doc/en/pgmsascdc/v_018/casanpg/cas-simple-distinct.htm?homeOnFail) action to get the number of distinct and missing values for every column. Store the **SASDataFrame** from within the **CASResults** object in the variable **df** by calling the *Distinct* key after the action.

**Note:** The distinct action is resource intensive and can take some time to complete, depending on the environment.

In [None]:
## Execute the distinct action and store the SASDataFrame from the CASResults object
df = tbl.distinct()['Distinct']
df

### 6. Calculate the Percentage of Distinct Values for Each Column

a. Confirm that the variable **df** contains a **SASDataFrame** and then display the results. Remember, the **SASDataFrame** resides on the client.

In [None]:
display(type(df), df)

b. Use the numRows action to get the number of rows in the **LOANS_RAW** CAS table. Use the *numrows* key to extract the count from the result and store it in the variable **n**.

In [None]:
n = tbl.numRows()['numrows']
n

c. Because **df** is a **SASDataFrame**, you can use traditional Pandas functionality on the client. Add a new column named **pctDistinct** that divides the number of distinct values in each column by the number of rows in the table, rounded to six decimal places. Then sort the **SASDataFrame** in descending order by the new **pctDistinct** column.

In [None]:
## Create a new column named pctDistinct
df['pctDistinct'] = round(df.NDistinct/n, ndigits = 6)

## Sort the DataFrame by the pctDistinct column
df.sort_values(by = 'pctDistinct', 
               ascending = False, 
               inplace = True)

## View the SASDataFrame
df
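
The same calculation can be checked by hand on a toy table. The sketch below uses made-up counts and an assumed row total. The **Column** and **NDistinct** names mirror those used in the cells above, and the Series round method is used in place of the built-in round function.

```python
import pandas as pd

# Made-up distinct counts for three columns
toy = pd.DataFrame({
    'Column': ['ID', 'Year', 'Category'],
    'NDistinct': [8, 2, 4],
})
n_rows = 8  # assumed total number of rows in the toy table

# Share of distinct values per column, rounded to six decimal places
toy['pctDistinct'] = (toy['NDistinct'] / n_rows).round(6)

# Sort so that the columns with the most distinct values come first
toy = toy.sort_values(by = 'pctDistinct', ascending = False).reset_index(drop = True)
print(toy)
```

A value of 1.0 means that every row holds a different value, which is typical of an identifier column.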

d. Use the Pandas plot method to plot the **pctDistinct** column in the **SASDataFrame**. The visualization displays the share of distinct values for each column, expressed as a fraction of the total row count.

In [None]:
df.plot(kind = 'bar', x = 'Column', y = 'pctDistinct', 
        figsize=(10,6));

### 7. Terminate the CAS Session

It's best practice to always terminate the CAS session when you are done.

In [None]:
conn.terminate()
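
One way to make sure the session is terminated even when an earlier cell raises an error is to wrap the work in a context manager. The following is a minimal sketch of that pattern: `cas_session` is a hypothetical helper, and `FakeConnection` is a stand-in so the example runs without a server. The only assumption about the real connection object is the terminate method used above.

```python
from contextlib import contextmanager

@contextmanager
def cas_session(connect):
    """Yield the connection returned by connect() and always terminate it afterwards."""
    conn = connect()
    try:
        yield conn
    finally:
        conn.terminate()

class FakeConnection:
    """Stand-in used only to demonstrate the pattern without a server."""
    def __init__(self):
        self.terminated = False

    def terminate(self):
        self.terminated = True

fake = FakeConnection()
try:
    with cas_session(lambda: fake) as session:
        raise RuntimeError('simulated failure during analysis')
except RuntimeError:
    pass

# terminate() was called even though the body raised an error
print(fake.terminated)
```

With a real server, `connect` could be a function that returns the result of the `swat.CAS(...)` call shown in step 1.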