# Week 14 (Part 1 - SASPy)
### SASPy is a module that provides Python Application Programming Interfaces (APIs) to the SAS system.
#### How to:
* Load a SAS data set into a pandas dataframe by using the SASPy module
* Load a pandas dataframe into a SAS data set
* Generate SAS code by using the SASPy module
* Generate a profile of the Python data object created from the SAS data set



### Loading a SAS data set into a Python Object using the sasdata method
* Import saspy
* Create a connection to SAS, which authenticates and starts a SAS session; `winlocal` is the configuration name defined in the SASPy setup
* Create a Python object by using the sasdata method
* Run descriptive statistics
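
The `winlocal` name in the steps above refers to an entry in a SASPy configuration file. A minimal sketch of such a file (commonly named `sascfg_personal.py`) for a local Windows SAS installation might look like the following. The encoding value and the assumption that `java` is on the system PATH are illustrative and depend on the installation; some older SASPy releases also required a `classpath` entry.

```python
# sascfg_personal.py (sketch; values are assumptions for a local Windows install)

# Names of the configurations that SASsession(cfgname=...) may select
SAS_config_names = ['winlocal']

# Local Windows connection
winlocal = {
    'java': 'java',              # assumes the Java executable is on PATH
    'encoding': 'windows-1252',  # assumed session encoding; adjust to match SAS
}
```

With only one name in `SAS_config_names`, calling `saspy.SASsession()` without `cfgname` selects that configuration.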

In [2]:
import saspy
sas = saspy.SASsession(cfgname='winlocal')
iris = sas.sasdata("iris","SASHELP")
iris.describe()

SAS Connection established. Subprocess id is 14128



Unnamed: 0,Variable,Label,N,NMiss,Median,Mean,StdDev,Min,P25,P50,P75,Max
0,SepalLength,Sepal Length (mm),150,0,58.0,58.433333,8.280661,43,51,58.0,64,79
1,SepalWidth,Sepal Width (mm),150,0,30.0,30.573333,4.358663,20,28,30.0,33,44
2,PetalLength,Petal Length (mm),150,0,43.5,37.58,17.652982,10,16,43.5,51,69
3,PetalWidth,Petal Width (mm),150,0,13.0,11.993333,7.622377,1,3,13.0,18,25


If you are using "JupyterLab in SAS University Edition", use the following code block. Note that `SASsession()` is called with no arguments in the second line of the code block below.
``` Python
import saspy
sas = saspy.SASsession()
iris = sas.sasdata("iris","SASHELP")
iris.describe()
```

### The type() function returns the class of the object passed to it; print() displays that class.

In [7]:
print(type(iris))

<class 'saspy.sasdata.SASdata'>


In [8]:
type(iris)

saspy.sasdata.SASdata

### to_df() loads the SAS data set into a pandas dataframe.

In [None]:
import saspy
import pandas as pd
# Convert the SASdata object to a pandas dataframe using to_df()
pd_iris = iris.to_df()
print(type(pd_iris))

### Running SAS programs in Python notebooks by using a JupyterLab magic command (%%SAS)

In [None]:
import saspy

In [None]:
%%SAS
proc means data=sashelp.iris; run;
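
Where the `%%SAS` magic is unavailable, the same SAS code can be sent through the session object. A minimal sketch, assuming an open session such as the `sas` object created earlier. `run_sas` is a hypothetical helper name; `submit()` returns a dictionary with the SAS log under `'LOG'` and the listing output under `'LST'`.

```python
def run_sas(session, code):
    """Submit SAS code through a SASPy session and return (log, listing).

    `session` is assumed to be an open saspy.SASsession; `run_sas` is a
    hypothetical helper, not part of SASPy.
    """
    result = session.submit(code)        # dict with 'LOG' and 'LST' keys
    return result['LOG'], result['LST']


# Example call (requires a live SAS session, so it is left commented out):
# log, listing = run_sas(sas, "proc means data=sashelp.iris; run;")
# print(log)
```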

"The paired methods df2sd and sd2df are the principle means by which a data set may be transferred
between SAS and Python. On each destination, the data types, column names, and other basic
elements will be retained; some additional metadata unique to SAS data sets may however be
dropped on the Python side." [A Basic Introduction to SASPy and Jupyter Notebooks By Jason Philips. 2018](https://support.sas.com/content/dam/SAS/support/en/sas-global-forum-proceedings/2018/2822-2018.pdf)
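
The quoted pairing can be sketched as a round trip: `df2sd` copies a pandas dataframe into a SAS data set, and `sd2df` reads it back. The helper name `round_trip`, the table name `scores`, and the sample values are all made up for illustration, and the call itself needs a live SAS session.

```python
import pandas as pd


def round_trip(session, df, table="scores", libref="work"):
    """Copy a DataFrame to SAS with df2sd, then read it back with sd2df.

    `session` is assumed to be an open saspy.SASsession; the helper name and
    the default table name are illustrative only.
    """
    # df2sd writes the DataFrame to libref.table and returns a SASdata object
    sas_data = session.df2sd(df, table=table, libref=libref)
    # sd2df reads the same table back into a new pandas DataFrame
    df_back = session.sd2df(table=table, libref=libref)
    return sas_data, df_back


# A small DataFrame to transfer (made-up values)
scores = pd.DataFrame({"name": ["Ann", "Bob"], "score": [91.5, 78.0]})

# Requires a live SAS session, so the call is left commented out:
# sas_scores, scores_back = round_trip(sas, scores)
# print(scores_back.dtypes)
```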

### Loading a SAS data set into a dataframe by using the SASPy module

In [1]:
import saspy
sas = saspy.SASsession(cfgname='winlocal')
class_sds = sas.sd2df(table='class', libref='sashelp')
class_sds.describe()  

SAS Connection established. Subprocess id is 3536



Unnamed: 0,Age,Height,Weight
count,19.0,19.0,19.0
mean,13.315789,62.336842,100.026316
std,1.492672,5.127075,22.773933
min,11.0,51.3,50.5
25%,12.0,58.25,84.25
50%,13.0,62.8,99.5
75%,14.5,65.9,112.25
max,16.0,72.0,150.0


In [None]:
print(type(class_sds))

### Generating SAS code by using the SASPy module

In [None]:
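import saspy
import pandas as pd
sas = saspy.SASsession(cfgname='winlocal')
w_class = sas.sasdata("CARS","SASHELP")
# Turn on teaching mode: SASPy methods now print the SAS code they would submit instead of running it
sas.teach_me_SAS(True)
w_class.columnInfo()
# Turn teaching mode off again so that later calls execute normally
sas.teach_me_SAS(False)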


In [2]:
import saspy
import pandas as pd
sas = saspy.SASsession(cfgname='winlocal')
%cd C:\Data
p_cars = pd.read_sas('cars.sas7bdat', format='sas7bdat', encoding="utf-8")
p_cars.describe()

SAS Connection established. Subprocess id is 10256

C:\Data


Unnamed: 0,MSRP,Invoice,EngineSize,Cylinders,Horsepower,MPG_City,MPG_Highway,Weight,Wheelbase,Length
count,428.0,428.0,428.0,426.0,428.0,428.0,428.0,428.0,428.0,428.0
mean,32774.85514,30014.700935,3.196729,5.807512,215.885514,20.060748,26.843458,3577.953271,108.154206,186.36215
std,19431.716674,17642.11775,1.108595,1.558443,71.836032,5.238218,5.741201,758.983215,8.311813,14.357991
min,10280.0,9875.0,1.3,3.0,73.0,10.0,12.0,1850.0,89.0,143.0
25%,20334.25,18866.0,2.375,4.0,165.0,17.0,24.0,3104.0,103.0,178.0
50%,27635.0,25294.5,3.0,6.0,210.0,19.0,26.0,3474.5,107.0,187.0
75%,39205.0,35710.25,3.9,6.0,255.0,21.25,29.0,3977.75,112.0,194.0
max,192465.0,173560.0,8.3,12.0,500.0,60.0,66.0,7190.0,144.0,238.0


In [None]:
print(type(p_cars))

In [None]:
%%SAS
proc print data=sashelp.class (obs=5); 
run;

### Generating a profile of the Python data object created from the SAS data set

In [None]:
import saspy
import pandas
import pandas_profiling
sas = saspy.SASsession(cfgname='winlocal')
df_heart = sas.sasdata2dataframe(table='heart', libref='sashelp')
pandas_profiling.ProfileReport(df_heart)
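
If `pandas_profiling` is not installed, a rough profile can be assembled from plain pandas. The sketch below uses a small made-up dataframe in place of `df_heart`; the values are illustrative only, and `quick_profile` is a hypothetical helper.

```python
import pandas as pd


def quick_profile(df):
    """Return one row per column: dtype, missing count, distinct count."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "n_missing": df.isna().sum(),
        "n_unique": df.nunique(),   # nunique() ignores missing values
    })


# Made-up stand-in for a dataframe loaded from SAS
toy = pd.DataFrame({
    "Sex": ["Female", "Male", "Female", None],
    "Weight": [140.0, 194.0, None, 157.0],
})

print(quick_profile(toy))
print(toy.describe(include="all"))
```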

In [1]:
import saspy
import pandas as pd
sas = saspy.SASsession()
# Assign the libref NEW to the folder that holds the SAS data sets
sas.saslib(libref='new', path="C:\\Data")


Using SAS Config named: winlocal
SAS Connection established. Subprocess id is 12212

3                                                          The SAS System                            23:32 Friday, November 29, 2019

21         
22         libname new    'C:\Data'  ;
NOTE: Libref NEW was successfully assigned as follows: 
      Engine:        V9 
      Physical Name: C:\Data
23         
24         


In [3]:
py_obvisits17 = sas.sd2df(table='h197g', libref='new', dsopts={"keep": "dupersid var: VSTCTGRY"})
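
The `dsopts` dictionary passes SAS data set options to `sd2df`. In the `keep` list, `var:` is SAS name-prefix notation that selects every variable whose name begins with `var`, which is why both `VARSTR` and `VARPSU` are returned. The pure-Python sketch below only emulates that selection on a list of column names to make the rule concrete. `expand_keep` is a hypothetical helper, the `SEX` and `AGE` column names are made up, and SAS itself performs the real selection.

```python
def expand_keep(keep, columns):
    """Emulate a SAS KEEP= list with `prefix:` notation (case-insensitive)."""
    selected = []
    tokens = keep.split()
    for col in columns:
        for token in tokens:
            if token.endswith(":"):
                # prefix: matches any name starting with the prefix
                hit = col.lower().startswith(token[:-1].lower())
            else:
                hit = col.lower() == token.lower()
            if hit:
                selected.append(col)
                break
    return selected


# DUPERSID, VSTCTGRY, VARSTR and VARPSU appear in this notebook; SEX and AGE are made up
columns = ["DUPERSID", "SEX", "VSTCTGRY", "VARSTR", "VARPSU", "AGE"]
print(expand_keep("dupersid var: VSTCTGRY", columns))
```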

In [51]:
py_obvisits17 = py_obvisits17.astype({"VARSTR":'object', "VARPSU":'object'})

In [52]:
py_obvisits17.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 170491 entries, 0 to 170490
Data columns (total 4 columns):
DUPERSID    170491 non-null object
VSTCTGRY    170491 non-null int64
VARSTR      170491 non-null object
VARPSU      170491 non-null object
dtypes: int64(1), object(3)
memory usage: 5.2+ MB
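
`astype` accepts a dictionary that maps column names to target types, so several columns can be converted in one call. A small sketch on made-up values (only the column names come from the notebook):

```python
import pandas as pd

# Made-up values; numeric codes that are really category labels
toy = pd.DataFrame({"VARSTR": [1021, 1021, 1007], "VARPSU": [1, 2, 1]})

# Convert both columns in one call; any other columns would be left unchanged
toy = toy.astype({"VARSTR": "object", "VARPSU": "object"})

print(toy.dtypes)
```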


In [23]:
type(py_obvisits17)

pandas.core.frame.DataFrame

In [44]:
# Build a cluster identifier by concatenating the stratum and PSU codes as strings
py_obvisits17['CLUSTER'] = py_obvisits17['VARSTR'].astype(str) + py_obvisits17['VARPSU'].astype(str)

In [45]:
py_obvisits17.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 170491 entries, 0 to 170490
Data columns (total 4 columns):
DUPERSID    170491 non-null object
VARSTR      170491 non-null object
VARPSU      170491 non-null object
CLUSTER     170491 non-null object
dtypes: object(4)
memory usage: 5.2+ MB


In [46]:
py_obvisits17.head()

Unnamed: 0,DUPERSID,VARSTR,VARPSU,CLUSTER
0,10001101,1021,1,10211
1,10001101,1021,1,10211
2,10001101,1021,1,10211
3,10001101,1021,1,10211
4,10001101,1021,1,10211
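
Concatenating two codes without a separator can map different (stratum, PSU) pairs to the same string when the codes vary in width. The sketch below, on made-up values, shows such a collision and one way to avoid it. The `-` separator is an assumption for illustration, not part of the original analysis.

```python
import pandas as pd

# Made-up codes chosen so that the plain concatenations collide
toy = pd.DataFrame({"VARSTR": [102, 1021], "VARPSU": [11, 1]})

# Plain concatenation: "102" + "11" and "1021" + "1" both give "10211"
plain = toy["VARSTR"].astype(str) + toy["VARPSU"].astype(str)

# With a separator the two pairs stay distinct
safe = toy["VARSTR"].astype(str) + "-" + toy["VARPSU"].astype(str)

print(plain.nunique(), safe.nunique())
```

When every code in a column has the same width, the plain concatenation is unambiguous and the separator is optional.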


In [16]:
import saspy
import pandas as pd
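from IPython.display import display
# Count the distinct values in each column.
# CLUSTER must already exist (it is created in a separate cell); otherwise this cell raises KeyError: 'CLUSTER'.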
display(len(py_obvisits17['CLUSTER'].unique().tolist()))
display(len(py_obvisits17['VARSTR'].unique().tolist()))
display(len(py_obvisits17['VARPSU'].unique().tolist()))
display(len(py_obvisits17['DUPERSID'].unique().tolist()))

KeyError: 'CLUSTER'

282

In [14]:
len(py_obvisits17['VARPSU'].unique().tolist())

3

In [15]:
len(py_obvisits17['DUPERSID'].unique().tolist())

22352
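
`Series.nunique()` returns the same count as `len(series.unique().tolist())` when the column has no missing values, and `DataFrame.nunique()` reports every column at once. A small sketch on made-up values:

```python
import pandas as pd

# Made-up rows; only the column names come from the notebook
toy = pd.DataFrame({
    "DUPERSID": ["10001101", "10001101", "10001102"],
    "VARPSU": ["1", "2", "1"],
})

# Distinct values per column in a single call
counts = toy.nunique()
print(counts)

# Same result as the longer form, given no missing values
assert counts["DUPERSID"] == len(toy["DUPERSID"].unique().tolist())
```

By default `nunique()` skips missing values, whereas `unique()` includes them, so the two counts can differ by one on columns that contain missing data.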