In [36]:
import saspy
import pandas as pd
from IPython.display import HTML
sas_session = saspy.SASsession()

### Helpful Resources and References
<div style="font-size:14px">
<br>https://www.sas.com/content/dam/SAS/support/en/sas-global-forum-proceedings/2019/3189-2019.pdf
<br>https://www.sas.com/content/dam/SAS/support/en/sas-global-forum-proceedings/2019/3238-2019.pdf
<br>https://wrds-www.wharton.upenn.edu/documents/1441/SASPy_demo_basic_functions.html
<br>https://www.lexjansen.com/pharmasug/2009/sp/SP10.pdf
<br>https://www.lexjansen.com/phuse/2013/sp/SP05.pdf 
<br>https://www.pharmasug.org/proceedings/2015/SP/PharmaSUG-2015-SP06.pdf
</div>    

In [27]:
libpath = "libname dstore '/home/sp16670/datasets';"

In [28]:
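# Create the example dataset in the dstore library:
# Group A has 4 subjects with response=1 and 6 with response=0; Group B has 6 with response=0.
query = sas_session.submit(
libpath + """

data dstore.occurance;
     do i = 1 to 4; group='A'; response=1; output; end;
     do i = 1 to 6; group='A'; response=0; output; end;
     do i = 1 to 6; group='B'; response=0; output; end;
run;
"""
)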

<br>
<br>
<div style="font-size:18px">Suppose there are 16 subjects in the trial: 10 in Group A and 6 in Group B. In Group A, 4 subjects have Response = 1 (Yes) and 6 have Response = 0 (No); in Group B, all 6 subjects have Response = 0. PROC FREQ in SAS can be used to compute the number of subjects in each response category, their proportions, and exact confidence intervals.</div>
<br>
<br>
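To make the data layout concrete, here is a plain-Python sketch (not part of the SAS workflow; `tabulate_by_group` is a hypothetical helper) that rebuilds the same 16 records and reproduces the one-way frequency table that PROC FREQ prints for each BY group. Levels are listed in sorted order, matching PROC FREQ's default ordering by unformatted value.

```python
from collections import Counter

# Rebuild the 16 records produced by the DATA step for dstore.occurance:
# Group A: 4 x response=1 and 6 x response=0; Group B: 6 x response=0.
records = [("A", 1)] * 4 + [("A", 0)] * 6 + [("B", 0)] * 6


def tabulate_by_group(rows):
    """Return {group: {level: (frequency, percent)}} with levels in sorted order."""
    counts = {}
    for group, response in rows:
        counts.setdefault(group, Counter())[response] += 1
    table = {}
    for group, counter in sorted(counts.items()):
        n = sum(counter.values())
        table[group] = {
            level: (freq, round(100 * freq / n, 1))
            for level, freq in sorted(counter.items())
        }
    return table


table = tabulate_by_group(records)
for group, levels in table.items():
    print(group, levels)
# A {0: (6, 60.0), 1: (4, 40.0)}
# B {0: (6, 100.0)}
```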

In [42]:
sas_session.assigned_librefs()

['WORK',
 'DSTORE',
 'SASDATA',
 'STPSAMP',
 'SASHELP',
 'MAPS',
 'MAPSSAS',
 'MAPSGFK',
 'SASUSER']

In [56]:
sas_session.sasdata("occurance", libref="dstore").to_frame()

,i,group,response
0,1.0,A,1.0
1,2.0,A,1.0
2,3.0,A,1.0
3,4.0,A,1.0
4,1.0,A,0.0
5,2.0,A,0.0
6,3.0,A,0.0
7,4.0,A,0.0
8,5.0,A,0.0
9,6.0,A,0.0


In [57]:
query = sas_session.submit(
    libpath +
 """
 /* One-way table of response within each group, with asymptotic and exact binomial CIs */
 proc freq data=dstore.occurance;
  by group;
  tables response / binomial nocum norow;
  exact binomial;
 run;
 """
)

In [60]:
HTML(query['LST'])

group=A

response,Frequency,Percent
0,6,60.0
1,4,40.0

Binomial Proportion for response = 0
Proportion (P),0.6
ASE,0.1549
95% Lower Conf Limit,0.2964
95% Upper Conf Limit,0.9036
Exact Conf Limits
95% Lower Conf Limit,0.2624
95% Upper Conf Limit,0.8784

Test of H0: Proportion = 0.5
ASE under H0,0.1581
Z,0.6325
One-sided Pr > Z,0.2635
Two-sided Pr > |Z|,0.5271
Exact Test
One-sided Pr >= P,0.377
Two-sided = 2 * One-sided,0.7539

group=B

response,Frequency,Percent
0,6,100.0

Binomial Proportion for response = 0
Proportion (P),1.0
ASE,0.0
95% Lower Conf Limit,1.0
95% Upper Conf Limit,1.0
Exact Conf Limits
95% Lower Conf Limit,0.5407
95% Upper Conf Limit,1.0

Test of H0: Proportion = 0.5
ASE under H0,0.2041
Z,2.4495
One-sided Pr > Z,0.0072
Two-sided Pr > |Z|,0.0143
Exact Test
One-sided Pr >= P,0.0156
Two-sided = 2 * One-sided,0.0312


<br>
<br>
<div style="font-size:18px">In Group A, 4 of the 10 subjects have Response = 1 (Yes), so the proportion of interest is 0.40 and the CIs should be computed for that proportion. The PROC FREQ output, however, shows that the exact 95% CI (0.2624, 0.8784) is for the proportion of subjects with Response = 0 (p = 0.6). Hence this is not the interval we need.
<br>
<br>    
In Group B, no subject has Response = 1 (Yes), so the proportion of interest is zero. Here too, the exact 95% CI (0.5407, 1.0000) is computed for the proportion of subjects with Response = 0 (p = 1).
<br>
<br>    
The reason is that, by default, PROC FREQ computes the binomial proportion for the first level of the response variable, which here is its lowest value. When both levels are present (e.g., Group A, with 6 subjects at 0 (No) and 4 subjects at 1 (Yes)), the lowest level is 0, so the CI is computed for the proportion of subjects with Response = 0 (No).
<br>
<br>
To overcome this, it is advised to recode the variable so that the response of interest has the lowest level. To make 1 (Yes) the lowest level, we can recode 0 (No) as 2 (No). The following code recodes 0 to 2 and then computes the CIs using PROC FREQ.
</div>
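As a cross-check outside SAS, the sketch below computes exact (Clopper-Pearson) limits, which is what the "Exact Conf Limits" rows of the output correspond to, together with the asymptotic (Wald) limits. `clopper_pearson` and `wald` are illustrative helpers written here, not SAS or saspy functions. The exact interval for 6/10 reproduces the (0.2624, 0.8784) that PROC FREQ reported, while the interval we actually want is the one for 4/10.

```python
import math
from statistics import NormalDist


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve_increasing(h, target, tol=1e-12):
    """Bisection on [0, 1] for h(p) = target, assuming h increases with p."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if h(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x, n, alpha=0.05):
    """Exact (Clopper-Pearson) two-sided CI for x successes out of n."""
    # Lower limit solves P(X >= x | p) = alpha/2 (it is 0 when x = 0)
    lower = 0.0 if x == 0 else _solve_increasing(
        lambda p: 1 - binom_cdf(x - 1, n, p), alpha / 2)
    # Upper limit solves P(X <= x | p) = alpha/2, i.e. P(X > x | p) = 1 - alpha/2
    # (it is 1 when x = n)
    upper = 1.0 if x == n else _solve_increasing(
        lambda p: 1 - binom_cdf(x, n, p), 1 - alpha / 2)
    return lower, upper


def wald(x, n, alpha=0.05):
    """Asymptotic (Wald) CI: p +/- z * sqrt(p(1-p)/n)."""
    p = x / n
    z = NormalDist().inv_cdf(1 - alpha / 2)
    ase = math.sqrt(p * (1 - p) / n)
    return p - z * ase, p + z * ase


# Group A: 6 of 10 subjects have Response = 0, 4 of 10 have Response = 1
print([round(v, 4) for v in clopper_pearson(6, 10)])  # [0.2624, 0.8784] (what PROC FREQ reported)
print([round(v, 4) for v in clopper_pearson(4, 10)])  # [0.1216, 0.7376] (the interval we want)
print([round(v, 4) for v in wald(4, 10)])             # [0.0964, 0.7036]
```

Bisection on the binomial tail is used instead of beta quantiles so that the sketch needs only the standard library.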
<br>
<br>
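The level-selection rule and the recoding fix can be modeled in a few lines of Python. This is only a toy model of the default "first level" behavior described above, not PROC FREQ itself; `first_level_proportion` and `recode` are hypothetical helpers. The last line previews the remaining problem: recoding cannot help a group in which Response = 1 never occurs.

```python
from collections import Counter


def first_level_proportion(responses):
    """Toy model of the default BINOMIAL behavior: report the proportion for the
    first (lowest sorted) level that actually occurs in the data."""
    counts = Counter(responses)
    first = min(counts)
    return first, counts[first] / len(responses)


def recode(responses):
    # Same recode as the DATA step: 0 (No) -> 2 (No), so 1 (Yes) sorts first
    return [2 if r == 0 else r for r in responses]


group_a = [1] * 4 + [0] * 6
group_b = [0] * 6

print(first_level_proportion(group_a))          # (0, 0.6)
print(first_level_proportion(recode(group_a)))  # (1, 0.4)
print(first_level_proportion(recode(group_b)))  # (2, 1.0)
```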

In [61]:
query = sas_session.submit(
    libpath +
 """
/* Recode 0 (No) as 2 (No) so that 1 (Yes) becomes the lowest, i.e. first, level */
data dstore.occurance_2;
 set dstore.occurance;
 if response = 0 then response = 2;
run;

proc freq data=dstore.occurance_2;
 by group;
 tables response / binomial nocum norow;
 exact binomial;
run;
""")

In [62]:
HTML(query['LST'])

group=A

response,Frequency,Percent
1,4,40.0
2,6,60.0

Binomial Proportion for response = 1
Proportion (P),0.4
ASE,0.1549
95% Lower Conf Limit,0.0964
95% Upper Conf Limit,0.7036
Exact Conf Limits
95% Lower Conf Limit,0.1216
95% Upper Conf Limit,0.7376

Test of H0: Proportion = 0.5
ASE under H0,0.1581
Z,-0.6325
One-sided Pr < Z,0.2635
Two-sided Pr > |Z|,0.5271
Exact Test
One-sided Pr <= P,0.377
Two-sided = 2 * One-sided,0.7539

group=B

response,Frequency,Percent
2,6,100.0

Binomial Proportion for response = 2
Proportion (P),1.0
ASE,0.0
95% Lower Conf Limit,1.0
95% Upper Conf Limit,1.0
Exact Conf Limits
95% Lower Conf Limit,0.5407
95% Upper Conf Limit,1.0

Test of H0: Proportion = 0.5
ASE under H0,0.2041
Z,2.4495
One-sided Pr > Z,0.0072
Two-sided Pr > |Z|,0.0143
Exact Test
One-sided Pr >= P,0.0156
Two-sided = 2 * One-sided,0.0312


<br>
<br>
<div style="font-size:18px">The SAS output above indicates that the CI for Group A is now computed correctly for the proportion of subjects with Response = 1 (p = 0.4), and the exact 95% CI changes to (0.1216, 0.7376).
<br>
<br>    
For Group B, no subject has Response = 1, so the proportion of interest is zero, but the level 1 (Yes) does not appear in that group's data at all. In the absence of the lower level 1 (Yes), PROC FREQ treats 2 (No) as the first level and computes the CIs for the proportion of subjects with Response = 2 (p = 1).
<br>
<br>
So even after recoding 0 to 2, the 95% CI (0.5407, 1.0000) that PROC FREQ reports for Group B is not the one we need.
<br>
<br>
When a required level is missing, we need to add records to the dataset and then use the WEIGHT statement in PROC FREQ so that only the original records contribute to the CI computations.
<br>
<br>
To add records with the lowest level of the target variable, we can create a dataset that contains the lowest level for every treatment group. For the above example, the following dataset (dstore.occurance_wt) can be used to add records to the existing ones.
<br>
<br>    
If this dataset is merged with the original one using Group and Response as BY variables, a new record with Response = 1 is added for every group that has no subject with Response = 1 (here, Group B). If every group already contains at least one subject with Response = 1, no records are added.
<br>
<br>    
A new variable (here, wgt) also needs to be added so that the newly added records get a value of 0 and the existing records get a value of 1. This variable can then be used in the WEIGHT statement in PROC FREQ to compute the CIs correctly for Response = 1, with the proportion taken as zero. The SAS code that adds the new records and creates the weight variable is as follows.
</div>
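The merge-and-flag logic can be mirrored in pandas (already imported in this notebook as `pd`) as a sketch. The two DataFrames below are hand-built stand-ins for dstore.occurance_2 and dstore.occurance_wt (the `i` column is omitted), and `indicator=True` plays the role of the SAS IN= flags. One difference worth noting: a SAS DATA step MERGE with a BY statement requires both inputs to be sorted by the BY variables (here they already are), whereas pandas does not.

```python
import pandas as pd

# Hand-built stand-in for dstore.occurance_2 (after recoding 0 -> 2)
occurance_2 = pd.DataFrame({
    "group": ["A"] * 10 + ["B"] * 6,
    "response": [1] * 4 + [2] * 6 + [2] * 6,
})

# Stand-in for dstore.occurance_wt: one response=1 record per group
occurance_wt = pd.DataFrame({"group": ["A", "B"], "response": [1, 1]})

# Outer merge on the BY variables. indicator=True adds a "_merge" column with
# "left_only", "right_only" or "both", which plays the role of IN=a / IN=b.
merged = occurance_2.merge(
    occurance_wt, on=["group", "response"], how="outer", indicator=True
)

# Rows found only in occurance_wt ("b and not a") get weight 0, all others weight 1
merged["wgt"] = (merged["_merge"] != "right_only").astype(int)
merged = (
    merged.drop(columns="_merge")
    .sort_values(["group", "response"])
    .reset_index(drop=True)
)

# Number of rows and total weight for each group/response combination
print(merged.groupby(["group", "response"])["wgt"].agg(["count", "sum"]))
```

Group A gains no rows because its response=1 key already exists; Group B gains exactly one response=1 row with wgt = 0.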
<br>
<br>

In [63]:
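query = sas_session.submit(
    libpath +
 """
/* One record with response=1 (the level of interest) for every group */
data dstore.occurance_wt;
     group='A'; response=1; output;
     group='B'; response=1; output;
run;

/* MERGE with BY requires both inputs sorted by group and response (they already are). */
/* Records present only in occurance_wt get wgt=0; all original records get wgt=1.    */
data dstore.occurance_3;
 merge dstore.occurance_2(in=a) dstore.occurance_wt(in=b);
 by group response;
 if b and not a then wgt=0;
 else wgt=1;
run;
""")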

In [65]:
sas_session.sasdata("occurance_3", libref="dstore").to_frame()

,i,group,response,wgt
0,1.0,A,1.0,1.0
1,2.0,A,1.0,1.0
2,3.0,A,1.0,1.0
3,4.0,A,1.0,1.0
4,1.0,A,2.0,1.0
5,2.0,A,2.0,1.0
6,3.0,A,2.0,1.0
7,4.0,A,2.0,1.0
8,5.0,A,2.0,1.0
9,6.0,A,2.0,1.0


<br>
<br>
<div style="font-size:18px">
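Then the WEIGHT statement with the ZEROES option can be used to compute the correct proportions and CIs. By default, PROC FREQ ignores observations with a zero weight; with this option the zero-weight records are kept, so the level Response = 1 appears in Group B's table with a frequency of 0. The PROC FREQ code with the WEIGHT statement is as follows.
<br>
<br>    
In the output, the binomial proportion for Group B is now computed for Response = 1, and the computed proportion is 0. This also gives the correct exact 95% CI (0.0000, 0.4593) for the zero proportion.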
</div>
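The effect of keeping zero-weight records can be illustrated with a toy weighted tabulation in Python. `weighted_table` is a hypothetical helper that models the two behaviors described above (zero-weight records dropped vs. kept); it is not PROC FREQ. For x = 0 successes the exact upper limit has the closed form 1 - (alpha/2)^(1/n), which reproduces the 0.4593 in the output.

```python
def weighted_table(rows, keep_zero_weight):
    """Sum weights per response level, levels in sorted order.
    keep_zero_weight=False drops zero-weight records (the default behavior);
    keep_zero_weight=True keeps them, so their level shows a frequency of 0."""
    table = {}
    for response, wgt in rows:
        if wgt == 0 and not keep_zero_weight:
            continue  # zero-weight record ignored
        table[response] = table.get(response, 0) + wgt
    return dict(sorted(table.items()))


# Group B of dstore.occurance_3: six real records plus the added record with wgt = 0
group_b = [(2, 1)] * 6 + [(1, 0)]

print(weighted_table(group_b, keep_zero_weight=False))  # {2: 6}
print(weighted_table(group_b, keep_zero_weight=True))   # {1: 0, 2: 6}

table = weighted_table(group_b, keep_zero_weight=True)
first = next(iter(table))                 # first (lowest) level: 1
x, n = table[first], sum(table.values())  # 0 subjects with Response = 1, out of 6
alpha = 0.05
# Exact (Clopper-Pearson) limits for x = 0: [0, 1 - (alpha/2)**(1/n)]
upper = 1 - (alpha / 2) ** (1 / n)
print(first, x / n, round(upper, 4))      # 1 0.0 0.4593
```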
<br>
<br>

In [66]:
query = sas_session.submit(
    libpath +
 """
proc freq data=dstore.occurance_3;
 by group;
 tables response / binomial nocum norow;
 exact binomial;
 /* ZEROES keeps the zero-weight records, so response=1 stays in Group B's table */
 weight wgt/zeroes;
run;
""")

In [67]:
HTML(query['LST'])

group=A

response,Frequency,Percent
1,4,40.0
2,6,60.0

Binomial Proportion for response = 1
Proportion (P),0.4
ASE,0.1549
95% Lower Conf Limit,0.0964
95% Upper Conf Limit,0.7036
Exact Conf Limits
95% Lower Conf Limit,0.1216
95% Upper Conf Limit,0.7376

Test of H0: Proportion = 0.5
ASE under H0,0.1581
Z,-0.6325
One-sided Pr < Z,0.2635
Two-sided Pr > |Z|,0.5271
Exact Test
One-sided Pr <= P,0.377
Two-sided = 2 * One-sided,0.7539

group=B

response,Frequency,Percent
1,0,0.0
2,6,100.0

Binomial Proportion for response = 1
Proportion (P),0.0
ASE,0.0
95% Lower Conf Limit,0.0
95% Upper Conf Limit,0.0
Exact Conf Limits
95% Lower Conf Limit,0.0
95% Upper Conf Limit,0.4593

Test of H0: Proportion = 0.5
ASE under H0,0.2041
Z,-2.4495
One-sided Pr < Z,0.0072
Two-sided Pr > |Z|,0.0143
Exact Test
One-sided Pr <= P,0.0156
Two-sided = 2 * One-sided,0.0313


In [13]:
sas_session.disconnect()

'Succesfully disconnected. Be sure to have a valid network connection before submitting anything else.'