# Demo: Grouping and Aggregating

### 1. Import Packages and Connect to the CAS Server

Visit the documentation for the SWAT [(SAS Scripting Wrapper for Analytics Transfer)](https://sassoftware.github.io/python-swat/index.html) package.

In [None]:
## Import packages
import swat
import pandas as pd
import matplotlib.pyplot as plt
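## Matplotlib 3.6 renamed the 'seaborn' style to 'seaborn-v0_8'; use whichever name is available
plt.style.use('seaborn-v0_8' if 'seaborn-v0_8' in plt.style.available else 'seaborn')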

## Set options
pd.set_option('display.max_columns', None)

## Connect to CAS
conn = swat.CAS('server.demo.sas.com', 30571, 'student', 'Metadata0', name = 'py03d05')

## Function to load the loans_raw.sashdat file into memory if necessary
def loadLoans():
    conn.loadTable(path = 'loans_raw.sashdat', caslib = 'PIVY',
                   casOut = {'name' : 'loans_raw',
                            'caslib' : 'casuser',
                            'promote' : True})
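
The "if necessary" check can also be done in code rather than by reading the tableInfo output. The sketch below is one possible approach, not part of the original demo: `load_if_missing` is a hypothetical helper name, and it assumes the tableExists action returns a result whose `exists` value is 0 when the table is not in memory.

```python
def load_if_missing(conn, name='loans_raw', caslib='casuser',
                    path='loans_raw.sashdat', source_caslib='PIVY'):
    """Load and promote a table only when it is not already in memory.

    Hypothetical helper. Returns True if the table was loaded,
    False if it was already available.
    """
    # Assumption: tableExists reports exists == 0 when the table is not found
    result = conn.tableExists(name=name, caslib=caslib)
    if result['exists']:
        return False

    # Same loadTable call as in loadLoans above
    conn.loadTable(path=path, caslib=source_caslib,
                   casOut={'name': name,
                           'caslib': caslib,
                           'promote': True})
    return True

# Usage with a live connection: load_if_missing(conn)
```

Checking first avoids the error that a second promote of an already promoted table would raise.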

### 2. Explore Available CAS Tables and Data Source Files


a. Use the tableInfo action to view all available in-memory tables in the **Casuser** caslib. If the **LOANS_RAW** CAS table is not available, uncomment the call to the loadLoans function and execute the cell.

In [None]:
#loadLoans()
conn.tableInfo(caslib = 'casuser')

b. Reference the **LOANS_RAW** CAS table using the CASTable method. Within the CASTable method, add the where parameter to filter for rows where **Category** equals *Credit Card*. This is an alternative way to add the where parameter to a **CASTable** object. Then preview the CAS table using the head method.

In [None]:
## All rows where Category equals Credit Card
ccTbl = conn.CASTable('loans_raw', 
                      caslib = 'casuser', 
                      where = 'Category = "Credit Card"')
display(ccTbl)

## Preview the table
ccTbl.head()

### 3. Pandas groupby Method

a. Use the [groupby](https://sassoftware.github.io/python-swat/generated/swat.cas.table.CASTable.groupby.html#swat.cas.table.CASTable.groupby) method to group the CAS table by **LoanGrade** and store the results in the variable **loan_grp**. This works similarly to the Pandas groupby method. View the object and notice that it creates a **CASTableGroupBy** object. This is similar to the **DataFrameGroupBy** object created in Pandas.

In [None]:
loan_grp = ccTbl.groupby('LoanGrade')
loan_grp

b. Once the CAS table is grouped in a **CASTableGroupBy** object, you can execute summary methods or actions on the **loan_grp** variable as you would a group in Pandas. Here, use the mean method to view the mean of **Amount** for each value of **LoanGrade** for rows where **Category** is *Credit Card*.

In [None]:
(loan_grp
 .Amount
 .mean())
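
For comparison, here is the same pattern in plain Pandas. This is a minimal sketch on a small, made-up DataFrame (the values are hypothetical and are not taken from **LOANS_RAW**); it runs entirely on the client and does not need a CAS connection.

```python
import pandas as pd

# Hypothetical toy data standing in for the credit card rows
loans = pd.DataFrame({
    'LoanGrade': ['A', 'A', 'B', 'B'],
    'Amount': [1000.0, 3000.0, 2000.0, 6000.0],
})

# groupby returns a DataFrameGroupBy object, the Pandas counterpart of CASTableGroupBy
loan_grp_pd = loans.groupby('LoanGrade')
print(type(loan_grp_pd).__name__)

# Mean of Amount for each LoanGrade value, returned as a Series
mean_by_grade = loan_grp_pd.Amount.mean()
print(mean_by_grade)
```

The syntax is the same as in the SWAT cell above; the difference is that the SWAT version computes the means on the CAS server and returns only the summarized result to the client.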

c. You can also execute actions on a **CASTableGroupBy** object. Execute the summary action on **loan_grp**, and store the results of the action in the variable **cr**. Notice that an action on a **CASTableGroupBy** object returns a **CASResults** object with each distinct group as its own **SASDataFrame**. 

In [None]:
cr = loan_grp.summary(inputs = 'Amount', 
                      subSet = ['MEAN'])
cr

d. A **CASResults** object has additional methods and attributes that can be used. The [concat_bygroups](https://sassoftware.github.io/python-swat/generated/swat.cas.results.CASResults.concat_bygroups.html#swat.cas.results.CASResults.concat_bygroups) method concatenates the individual By groups into a single **SASDataFrame**. The concat_bygroups method returns a **CASResults** object with a key named *Summary*. Access the *Summary* key to retrieve the **SASDataFrame** and store it in the variable **loan_df**. View the type and value of **loan_df**.

In [None]:
loan_df = cr.concat_bygroups()['Summary']

display(type(loan_df), loan_df)
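
Conceptually, concat_bygroups stacks one small table per group into a single table indexed by the group value. The sketch below imitates that idea in plain Pandas with `pd.concat`; it is an analogy only, not how SWAT implements the method, and the per-group values are hypothetical.

```python
import pandas as pd

# Hypothetical per-group results: one small DataFrame per LoanGrade value
by_group = {
    'A': pd.DataFrame({'Column': ['Amount'], 'Mean': [2000.0]}),
    'B': pd.DataFrame({'Column': ['Amount'], 'Mean': [4000.0]}),
}

# Stack the groups; the dictionary keys become the outer index level,
# then drop the inner positional level so only LoanGrade remains
combined = pd.concat(by_group, names=['LoanGrade']).droplevel(1)
print(combined)
```

The result has **LoanGrade** as its index and a **Mean** column, which is the shape that the plotting step relies on when it reads `loan_df.index` and `loan_df.Mean`.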

e. Once you have the **SASDataFrame**, you can work with it as you would a **pandas.DataFrame**. Here, we plot the mean of **Amount** for each value of **LoanGrade** using Matplotlib. Notice that the mean amount of credit card debt is similar across loan grades.

In [None]:
fig, ax = plt.subplots(figsize = (8,6))
ax.bar(loan_df.index, loan_df.Mean, color = 'blue')
ax.set_title('Average Amount of Credit Card Debt by Loan Grade',
             fontdict = {'fontsize' : 14, 
                         'color' : 'gray'}, 
             loc = 'left');

### 4. Using the groupBy Parameter

a. You can also add the groupBy parameter to a **CASTable** object to achieve similar results. Here, we specify **LoanGrade** as the groupBy parameter value and add the **Amount** column to the vars parameter. View the **CASTable** object created earlier before adding the parameters, and then again after. Notice that the where parameter still exists, and the vars and groupBy parameters were added.

In [None]:
display(ccTbl)

ccTbl.groupBy = 'LoanGrade'
ccTbl.vars = ['Amount']
ccTbl

b. Once the groupBy parameter is added to the **CASTable** object, you can execute methods on the **CASTable** object, which now contains the groupBy, vars, and where parameters. Here, the mean method calculates the mean of **Amount** for each value of **LoanGrade** across all *Credit Card* rows. Notice that with the groupBy parameter the mean method returns a **SASDataFrame** instead of the **Series** returned earlier.

In [None]:
ccTbl.mean()
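
Plain Pandas has a comparable Series-versus-DataFrame distinction: selecting one column by name yields a **Series**, while selecting a list of columns yields a **DataFrame**. The sketch below uses hypothetical toy data and is offered only as a comparison, not as an explanation of SWAT's internals.

```python
import pandas as pd

# Hypothetical toy data
loans = pd.DataFrame({
    'LoanGrade': ['A', 'A', 'B', 'B'],
    'Amount': [1000.0, 3000.0, 2000.0, 6000.0],
})

# Single column selected by name: the result is a Series
series_result = loans.groupby('LoanGrade')['Amount'].mean()

# List of columns (like vars = ['Amount']): the result is a DataFrame
frame_result = loans.groupby('LoanGrade')[['Amount']].mean()

print(type(series_result).__name__, type(frame_result).__name__)
```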

c. You can also execute actions on the **CASTable** object. Here, the summary action is executed to achieve similar results. Remember, executing an action on a group returns a **CASResults** object with a **SASDataFrame** for each unique groupBy value.

To combine all **SASDataFrame** objects in a **CASResults** object, you can use the concat_bygroups method. Then access the *Summary* key to retrieve the combined **SASDataFrame**.

In [None]:
cr = ccTbl.summary(subSet = ['MEAN'])

loanGrade_df = cr.concat_bygroups()['Summary']
loanGrade_df

### 5. Terminate the CAS Session

It's best practice to always terminate the CAS session when you are done.

In [None]:
conn.terminate()
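
In a script, as opposed to a notebook, an error raised partway through can skip the terminate call. One way to guard against that is a small context manager built on try/finally. This is a sketch: `cas_session` is a hypothetical helper name, not part of SWAT, and it assumes only that the connection object has a terminate method.

```python
from contextlib import contextmanager

@contextmanager
def cas_session(connect):
    """Open a session with the supplied callable and always terminate it.

    Hypothetical helper: `connect` is any zero-argument function that
    returns an object with a terminate() method.
    """
    conn = connect()
    try:
        yield conn
    finally:
        # Runs whether the body finished normally or raised an error
        conn.terminate()

# Usage with the connection from step 1 (requires a reachable CAS server):
# with cas_session(lambda: swat.CAS('server.demo.sas.com', 30571,
#                                   'student', 'Metadata0')) as conn:
#     conn.tableInfo(caslib = 'casuser')
```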