**Alexandre HUANG  
Kevin KAMALANATHAN**

<div align="center"><span style="font-family:Lucida Calligraphy;font-size:32px;color:#FA8072">Examples with DATA & PROC steps</span></div><br><br>

# *What is this notebook for?*

In this notebook you will find examples of SAS DATA and PROC steps translated into Python (pandas).
For now, the `translator` function we wrote handles the DROP, KEEP, WHERE and RENAME statements of the DATA step and the SAS functions SUBSTR, LOWCASE, UPCASE, INPUT and PUT. It also handles two procedures: PROC MEANS and PROC FREQ.

# *Importing the Analyzer program, which translates SAS code*

In [1]:
import sys

#Add the folder that contains Analyzer.py to the search path (adapt it to your own location)
sys.path.append("I_Love_SAS_in_Python")

from Analyzer import translator

# *Examples with the DATA step*

 - **DATA step with a SET statement only**

In [2]:
print(translator("""
                  DATA table; set NBA;
                  RUN;"""))

table=NBA.copy()
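In pandas, `table = NBA` would only give the same DataFrame a second name, so any later change to `table` would also change `NBA`. `.copy()` gives the new table its own data, much as a SAS DATA step writes a new data set. Here is a small sketch on a hypothetical toy table:

```python
import pandas as pd

# Hypothetical toy data standing in for the NBA table
NBA = pd.DataFrame({"PLAYER": ["Player A", "Player B"], "PTS": [27.5, 25.1]})

alias = NBA          # a second name for the SAME DataFrame object
table = NBA.copy()   # an independent copy, like a new SAS data set

table["PTS"] = 0     # changes the copy only
alias["REB"] = 7.0   # adds a column to NBA itself, through the alias

print(NBA.columns.tolist())  # ['PLAYER', 'PTS', 'REB']
print(NBA["PTS"].tolist())   # [27.5, 25.1]
```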



 - **Examples with the DROP statement**

In [3]:
print(translator("""
                 DATA table; set NBA;
                 drop COLLEGE NETRTG;
                 RUN;"""))

table=NBA.copy()
table=table.drop(columns=['COLLEGE','NETRTG'])



In [4]:
print(translator("""
                 DATA NBA; set NBA;
                 drop COUNTRY;
                 RUN;"""))

NBA=NBA.drop(columns=['COUNTRY'])
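When the output data set has the same name as the input, the generated code simply reassigns `NBA`. There is one behavioural difference worth knowing, shown here on hypothetical toy data. SAS only writes a warning when a dropped variable does not exist. `DataFrame.drop`, by contrast, raises a `KeyError` unless `errors="ignore"` is passed:

```python
import pandas as pd

# Hypothetical toy data standing in for the NBA table
NBA = pd.DataFrame({"PLAYER": ["Player A"], "TEAM": ["OKC"], "PTS": [20.0]})

# Default behaviour: dropping a column that does not exist raises KeyError
try:
    NBA.drop(columns=["COUNTRY"])
except KeyError as err:
    print("KeyError:", err)

# errors="ignore" skips unknown labels, closer to SAS, which only logs a warning
NBA = NBA.drop(columns=["COUNTRY", "PTS"], errors="ignore")
print(NBA.columns.tolist())  # ['PLAYER', 'TEAM']
```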



 - **Examples with the KEEP statement**

In [5]:
print(translator("""
                 DATA table; set NBA;
                 keep PLAYER TEAM COUNTRY PTS REB;
                 RUN;"""))

table=NBA.copy()
table=table[['PLAYER','TEAM','COUNTRY','PTS','REB']]



In [6]:
print(translator("""
                 DATA NBA; set NBA;
                 keep PLAYER TEAM COUNTRY PTS REB;
                 RUN;"""))

NBA=NBA[['PLAYER','TEAM','COUNTRY','PTS','REB']]
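Selecting columns with a list returns them in the order of the list. The SAS KEEP statement, however, does not reorder variables; they keep their original order in the data set. If that order matters, a sketch like the following preserves it (toy data with a hypothetical column layout):

```python
import pandas as pd

# Hypothetical toy data standing in for the NBA table
NBA = pd.DataFrame({"PLAYER": ["Player A"], "AGE": [25], "PTS": [20.0],
                    "TEAM": ["OKC"], "REB": [5.0]})
keep = ["TEAM", "PLAYER", "REB"]

# Selecting with the list puts the columns in list order
by_list = NBA[keep]
# Filtering NBA.columns keeps the original order, as SAS KEEP does
by_original = NBA[[c for c in NBA.columns if c in keep]]

print(by_list.columns.tolist())      # ['TEAM', 'PLAYER', 'REB']
print(by_original.columns.tolist())  # ['PLAYER', 'TEAM', 'REB']
```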



 - **Examples with the WHERE statement**

In [7]:
print(translator("""
                 DATA table; set NBA;
                 where age > 20;
                 RUN;"""))

table=NBA.copy()
table=table[table['age']>20]
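For `age > 20`, the translation above behaves like SAS. The two languages do treat missing values differently in comparisons, though. SAS considers a missing numeric value smaller than any number, whereas pandas' `NaN` compares `False` with everything. The sketch below uses hypothetical data to show where this matters (`age < 20`):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing age
NBA = pd.DataFrame({"PLAYER": ["A", "B", "C"], "age": [19, np.nan, 31]})

# NaN compares False with every number, so pandas drops the missing age here...
pandas_result = NBA[NBA["age"] < 20]
print(pandas_result["PLAYER"].tolist())  # ['A']

# ...whereas SAS treats missing as smaller than any number, so
# `where age < 20;` would also keep player B. To mimic SAS:
sas_like = NBA[NBA["age"].isna() | (NBA["age"] < 20)]
print(sas_like["PLAYER"].tolist())  # ['A', 'B']
```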



In [8]:
print(translator("""
                 DATA table; set NBA;
                 where TEAM in ('CLE', 'GSW', 'OKC');
                 RUN;"""))

table=NBA.copy()
table=table[table['TEAM'].isin('CLE','GSW','OKC')]
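The printed line passes the team codes to `Series.isin` as three separate arguments. `isin` takes a single list-like of values, so that line raises a `TypeError`. Wrapping the values in a list gives the intended filter, as this sketch on hypothetical toy data shows:

```python
import pandas as pd

# Hypothetical toy data standing in for the NBA table
NBA = pd.DataFrame({"PLAYER": ["A", "B", "C"], "TEAM": ["CLE", "LAL", "OKC"]})

table = NBA.copy()
# Series.isin expects ONE list-like argument holding all the values
table = table[table["TEAM"].isin(["CLE", "GSW", "OKC"])]
print(table["PLAYER"].tolist())  # ['A', 'C']
```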



In [9]:
print(translator("""
                 DATA table; set NBA;
                 where (TEAM eq "OKC" and COUNTRY = 'USA') or TEAM = "OKC";
                 RUN;"""))                

table=NBA.copy()
table=table[((table['TEAM']=="OKC") & (table['COUNTRY']=='USA')) | (table['TEAM']=="OKC")]
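Two remarks on this translation. First, each comparison must be wrapped in parentheses because `&` and `|` bind more tightly than `==` in Python. Second, `(A and B) or A` simplifies to `A`, so this WHERE clause keeps exactly the OKC rows. The sketch below checks both points on toy data. It also shows the same filter written with `DataFrame.query`, which accepts `and`/`or` much as SAS does:

```python
import pandas as pd

# Hypothetical toy data standing in for the NBA table
NBA = pd.DataFrame({"PLAYER": ["A", "B", "C"],
                    "TEAM": ["OKC", "OKC", "CLE"],
                    "COUNTRY": ["USA", "France", "USA"]})

# Boolean-mask form, as generated above
mask_form = NBA[((NBA["TEAM"] == "OKC") & (NBA["COUNTRY"] == "USA")) | (NBA["TEAM"] == "OKC")]

# Same condition written as a query string
query_form = NBA.query("(TEAM == 'OKC' and COUNTRY == 'USA') or TEAM == 'OKC'")

# (A and B) or A reduces to A: a plain TEAM filter gives the same rows
simple = NBA[NBA["TEAM"] == "OKC"]
print(mask_form.equals(query_form), mask_form.equals(simple))  # True True
```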



In [10]:
print(translator("""
                 DATA table; set NBA;
                 WHERE PTS ge 20 or GP le 50;
                 RUN;"""))    

table=NBA.copy()
table=table[(table['PTS']>=20) | (table['GP']<=50)]
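The SAS comparison mnemonics (`eq`, `ne`, `gt`, `lt`, `ge`, `le`) map one-to-one onto Python operators. The lookup table below is a hypothetical illustration of that mapping, not the Analyzer's own code, and `SAS_COMPARISONS` is a name chosen only for this sketch:

```python
import operator
import pandas as pd

# Hypothetical mapping from SAS comparison mnemonics to Python operators
SAS_COMPARISONS = {
    "eq": operator.eq, "ne": operator.ne,
    "gt": operator.gt, "lt": operator.lt,
    "ge": operator.ge, "le": operator.le,
}

# Hypothetical toy data standing in for the NBA table
NBA = pd.DataFrame({"PLAYER": ["A", "B", "C"],
                    "PTS": [25.0, 12.0, 8.0],
                    "GP": [70, 45, 80]})

# where PTS ge 20 or GP le 50;
mask = SAS_COMPARISONS["ge"](NBA["PTS"], 20) | SAS_COMPARISONS["le"](NBA["GP"], 50)
print(NBA[mask]["PLAYER"].tolist())  # ['A', 'B']
```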



 - **Examples with the RENAME statement**

In [11]:
print(translator("""
                 DATA table; set NBA;
                 rename COUNTRY=PAYS PTS=POINTS GP=MATCH_JOUE;
                 RUN;
                 """))

table=NBA.copy()
table=table.rename(columns={'COUNTRY':'PAYS','PTS':'POINTS','GP':'MATCH_JOUE'})
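The SAS rename list `OLD=NEW ...` becomes the dictionary passed to `DataFrame.rename(columns=...)`. Below is a hypothetical helper, `parse_rename`, that sketches how such a list could be parsed. SAS also accepts blanks around `=`, so the helper removes them first:

```python
import re
import pandas as pd

def parse_rename(spec):
    """Hypothetical helper: turn a SAS rename list 'OLD=NEW OLD2=NEW2'
    into the dict expected by DataFrame.rename(columns=...)."""
    # Remove blanks around '=' so "PTS = POINTS" and "PTS=POINTS" match
    spec = re.sub(r"\s*=\s*", "=", spec.strip())
    return dict(item.split("=", 1) for item in spec.split())

# Hypothetical toy data standing in for the NBA table
NBA = pd.DataFrame({"COUNTRY": ["USA"], "PTS": [20.0], "GP": [70]})
table = NBA.rename(columns=parse_rename("COUNTRY=PAYS PTS = POINTS GP=MATCH_JOUE"))
print(table.columns.tolist())  # ['PAYS', 'POINTS', 'MATCH_JOUE']
```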



 - **Examples with the SUBSTR function**

In [12]:
print(translator("""
                 DATA table;set NBA;
                 new_name = substr(PLAYER,6);
                 RUN;
                 """))

table=NBA.copy()
table['new_name']=table['PLAYER'].str[5:]



In [13]:
print(translator("""
                 DATA NBA;set NBA;
                 new_name = substr(PLAYER,1,5);
                 RUN;
                 """))

NBA['new_name']=NBA['PLAYER'].str[:5]



In [14]:
print(translator("""
                 DATA NBA;set NBA;
                 new_name = substr(PLAYER,3,8);
                 RUN;
                 """))

NBA['new_name']=NBA['PLAYER'].str[2:8]
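SAS `SUBSTR(string, position, length)` counts positions from 1, and its third argument is a length, not an end position. `substr(PLAYER,3,8)` therefore takes 8 characters starting at character 3. In pandas that is `str[2:10]`, not the `str[2:8]` printed above. The `substr(PLAYER,1,5)` case only coincides because it starts at the first character. A hypothetical helper, `sas_substr`, that applies the conversion:

```python
import pandas as pd

def sas_substr(s, position, length=None):
    """Hypothetical helper mimicking SAS SUBSTR(string, position <, length>).
    SAS positions are 1-based and the third argument is a length."""
    start = position - 1
    stop = None if length is None else start + length
    return s.str[start:stop]

players = pd.Series(["Stephen Curry"])
print(sas_substr(players, 6).tolist())     # ['en Curry']
print(sas_substr(players, 1, 5).tolist())  # ['Steph']
print(sas_substr(players, 3, 8).tolist())  # ['ephen Cu']
```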



 - **Examples with the LOWCASE, UPCASE, INPUT and PUT functions**

In [15]:
print(translator("""
                 DATA NBA;set NBA;
                 new_name = lowcase(PLAYER);
                 RUN;
                 """))

NBA['new_name']=NBA['PLAYER'].str.lower()



In [16]:
print(translator("""
                 DATA NBA;set NBA;
                 new_name = upcase(PLAYER);
                 RUN;
                 """))

NBA['new_name']=NBA['PLAYER'].str.upper()



In [17]:
print(translator("""
                 DATA NBA;set NBA;
                 POINTS = input(PTS,5.);
                 RUN;
                 """))

NBA['POINTS']=NBA['PTS'].astype(int)
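`astype(int)` only works if every value of `PTS` is a whole number written as text. A value such as `'27.5'` or `'n/a'` raises a `ValueError`. Assuming `PTS` is stored as text, `pd.to_numeric` with `errors="coerce"` is closer to SAS INPUT, which sets invalid values to missing. A sketch on hypothetical values:

```python
import pandas as pd

# Hypothetical text values for PTS
NBA = pd.DataFrame({"PTS": ["25", "27.5", "n/a"]})

# Unreadable values become NaN instead of raising an error,
# similar to SAS INPUT assigning a missing value to invalid data
NBA["POINTS"] = pd.to_numeric(NBA["PTS"], errors="coerce")
print(NBA["POINTS"].tolist())  # [25.0, 27.5, nan]
```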



In [18]:
print(translator("""
                 DATA NBA;set NBA;
                 MATCH_JOUE = put(GP,5.);
                 RUN;
                 """))

NBA['MATCH_JOUE']=NBA['GP'].astype(str)
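`astype(str)` gives the plain digits. The SAS `5.` format produces a 5-character, right-aligned value rounded to a whole number, and a Python format specification can reproduce that. A sketch, assuming `GP` is numeric:

```python
import pandas as pd

# Hypothetical toy values for GP
NBA = pd.DataFrame({"GP": [82, 7]})

# Width 5, no decimals: right-aligned and rounded, like the SAS 5. format
NBA["MATCH_JOUE"] = NBA["GP"].map(lambda v: f"{v:5.0f}")
print(NBA["MATCH_JOUE"].tolist())  # ['   82', '    7']
```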



## *You can also combine the different statements*

In [19]:
print(translator("""
                DATA table; set NBA ;
                rename COUNTRY=PAYS;
                where TEAM="OKC" or TEAM="CLE";
                JOUEUR = substr(PLAYER,6);
                keep JOUEUR TEAM;                
                RUN;
                """))

table=NBA.copy()
table=table.rename(columns={'COUNTRY':'PAYS'})
table=table[(table['TEAM']=="OKC") | (table['TEAM']=="CLE")]
table['JOUEUR']=table['PLAYER'].str[5:]
table=table[['JOUEUR','TEAM']]
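The same pipeline can also be written as a single method chain. In this sketch on hypothetical toy data, `query` and `assign` stand in for the boolean mask and the column assignment:

```python
import pandas as pd

# Hypothetical toy data standing in for the NBA table
NBA = pd.DataFrame({"PLAYER": ["Kevin Durant", "LeBron James", "Player Three"],
                    "TEAM": ["OKC", "CLE", "LAL"],
                    "COUNTRY": ["USA", "USA", "USA"]})

table = (
    NBA.rename(columns={"COUNTRY": "PAYS"})
       .query("TEAM == 'OKC' or TEAM == 'CLE'")
       .assign(JOUEUR=lambda d: d["PLAYER"].str[5:])  # substr(PLAYER,6)
       [["JOUEUR", "TEAM"]]
)
print(table["JOUEUR"].tolist())  # [' Durant', 'n James']
```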



# *Examples with PROC steps*

 - **Examples with PROC FREQ**

In [20]:
print(translator("""
                PROC FREQ  data = NBA ;
                table PTS;
                RUN;
                """))

datax =NBA['PTS'].value_counts(dropna = False)
datay = pandas.DataFrame({'PTS': datax.index, 'Frequency': datax.values,'Percent': ((datax.values/datax.values.sum())*100).round(2),'Cumulative Frequency': datax.values.cumsum(),'Cumulative Percent': ((datax.values.cumsum()/datax.values.sum())*100).round(2)})
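This output assumes `import pandas` has been run; the contingency-table output below uses the `pd` alias instead. Note also that `value_counts` sorts by decreasing frequency. PROC FREQ, by default, lists values in sorted order (ORDER=INTERNAL) and leaves missing values out of the percentages. A sketch closer to the SAS layout, on hypothetical values:

```python
import pandas as pd

# Hypothetical toy values for PTS, including one missing value
NBA = pd.DataFrame({"PTS": [20.0, 10.0, 20.0, 15.0, None]})

# Drop missing values and order by value rather than by count
counts = NBA["PTS"].value_counts(dropna=True).sort_index()
total = counts.values.sum()
freq = pd.DataFrame({
    "PTS": counts.index,
    "Frequency": counts.values,
    "Percent": (counts.values / total * 100).round(2),
    "Cumulative Frequency": counts.values.cumsum(),
    "Cumulative Percent": (counts.values.cumsum() / total * 100).round(2),
})
print(freq)
```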



> Contingency table

In [21]:
print(translator("""
                PROC FREQ  data = NBA ;
                table PTS*TEAM;
                RUN;
                """))


def frequency(ds, vars):
    if len(vars) > 1:
        c1 = ds[vars[0]]
        c2 = []
        for i in range(1,len(vars)):
            c2.append(ds[vars[i]])
        dfs = []
        dfs.append(pd.crosstab(c1,c2).unstack().reset_index().rename(columns={0:'Count'}))
        dfs.append(pd.crosstab(c1,c2, normalize='all').unstack().reset_index().rename(columns={0:'Percent'}))
        dfs.append(pd.crosstab(c1,c2, normalize='columns').unstack().reset_index().rename(columns={0:'Column Percent'}))
        dfs.append(pd.crosstab(c1,c2, normalize='index').unstack().reset_index().rename(columns={0:'Row Percent'}))
        dfs = [df.set_index(vars) for df in dfs]
        df = dfs[0].join(dfs[1:]).reset_index()
        return df
            
frequency(NBA,['PTS','TEAM'])
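The generated `frequency` helper returns a long table and needs `import pandas as pd`. It also only returns a result when at least two variables are passed. For a wide two-way table with row and column totals, closer to the PROC FREQ crosstabulation layout, one option is `pd.crosstab` with `margins=True` (toy data):

```python
import pandas as pd

# Hypothetical toy data standing in for the NBA table
NBA = pd.DataFrame({"PTS": [10, 20, 20, 10],
                    "TEAM": ["OKC", "OKC", "CLE", "CLE"]})

# Counts with row and column totals; margins_name labels the totals
wide = pd.crosstab(NBA["PTS"], NBA["TEAM"], margins=True, margins_name="Total")
print(wide)
```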



 - **Examples with PROC MEANS**

In [22]:
print(translator("""
                 PROC MEANS  data = NBA  ;
                 var PTS REB ;
                 RUN;
                 """))

NBA[['PTS','REB']].describe()
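`describe()` also reports quartiles, whereas PROC MEANS prints N, Mean, Std Dev, Minimum and Maximum by default. Both pandas and SAS (with its default VARDEF=DF) divide by n - 1 for the standard deviation. A sketch that reproduces the default SAS columns on hypothetical data:

```python
import pandas as pd

# Hypothetical toy data with one missing PTS value
NBA = pd.DataFrame({"PTS": [10.0, 20.0, 30.0, None],
                    "REB": [4.0, 6.0, 8.0, 2.0]})

# Same statistics as the PROC MEANS defaults, one row per variable
stats = NBA[["PTS", "REB"]].agg(["count", "mean", "std", "min", "max"]).T
stats.columns = ["N", "Mean", "Std Dev", "Minimum", "Maximum"]
print(stats)
```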



In [23]:
print(translator("""
                PROC MEANS  data = NBA  ;
                class TEAM;
                var PTS REB ;
                RUN;
                """))

NBA.groupby('TEAM')[['PTS','REB']].describe()



> PROC MEANS with the NMISS option

In [24]:
print(translator("""
                PROC MEANS  data = NBA  nmiss;
                class TEAM;
                var PTS REB ;
                RUN;
                 """))

NBA.groupby('TEAM')[['PTS','REB']].describe()
nmiss=NBA[['PTS','REB']].isnull().sum()
print(nmiss)
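The generated code counts missing values over the whole table rather than within each TEAM. It also computes the full `describe()` statistics, whereas in SAS, once a statistic keyword such as NMISS is given, only the requested statistics are displayed. A sketch of per-class missing counts on hypothetical data:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the NBA table
NBA = pd.DataFrame({"TEAM": ["OKC", "OKC", "CLE", "CLE"],
                    "PTS": [20.0, np.nan, 15.0, np.nan],
                    "REB": [5.0, 7.0, np.nan, 4.0]})

# Flag missing cells, then count them within each TEAM (like CLASS TEAM + NMISS)
nmiss_by_team = NBA[["PTS", "REB"]].isna().groupby(NBA["TEAM"]).sum()
print(nmiss_by_team)
```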

