# Main commands:

1. libname yourlib "/path/yourpath" access=readonly;
2. data newdataname; set yourlib.selecteddatasetname;
3. keep uniqueidentifier columnname1 columnname2;
4. label columnname1="descriptive column label";
5. proc sort; by uniqueidentifier;
6. proc contents data=newdataname; run;
7. proc print data=newdataname; run;
8. proc freq; tables columnname1 columnname2;
9. run;
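If you also work in Python (later in this notebook the same data is loaded with pandas), the keep, sort, contents and print steps above map roughly onto pandas operations. This is a minimal sketch, assuming a tiny in-memory DataFrame whose column names (uniqueidentifier, columnname1, columnname2, extra) are hypothetical placeholders for the template above, not real data.

```python
import pandas as pd

# Hypothetical stand-in for yourlib.selecteddatasetname
source = pd.DataFrame({
    "uniqueidentifier": [3, 1, 2],
    "columnname1": [10, 20, 10],
    "columnname2": ["b", "a", "b"],
    "extra": [0.5, 0.1, 0.9],
})

# data newdataname; set ...; keep ...;  -> a copy holding only the needed columns
newdataname = source[["uniqueidentifier", "columnname1", "columnname2"]].copy()

# proc sort; by uniqueidentifier;  -> reorder rows by the identifier
newdataname = newdataname.sort_values("uniqueidentifier").reset_index(drop=True)

# proc contents  -> column names and their types
print(newdataname.dtypes)

# proc print  -> the rows themselves
print(newdataname)
```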

# Exploratory data analysis:
1. organizing and summarizing raw data
2. looking for important features and patterns
3. looking for any striking deviations from those patterns
4. interpreting your findings in the context of the problem or research question

## Looking at one variable at a time (Univariate/Descriptive Analysis):

### The distribution of a variable:
1. what values the variable takes
2. how often the variable takes those values (counts and percentages)
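In pandas, Series.value_counts gives both parts of a distribution: the values a variable takes and how often each occurs. The responses below are made up for illustration, not taken from any survey.

```python
import pandas as pd

# Made-up responses for one categorical variable
responses = pd.Series([1, 2, 2, 3, 3, 3, 2, 1, 3, 3])

# 1. which values the variable takes and 2. how often (counts), ordered by value
counts = responses.value_counts().sort_index()

# the same distribution expressed as proportions of all observations
proportions = responses.value_counts(normalize=True).sort_index()

print(counts)
print(proportions)
```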

# SAS Programming
## SAS keywords and variable names are not case sensitive (text inside quotation marks, such as character values, is)

## First step: point SAS to the library that holds the dataset you will be working with:

##### libname mydata "/courses/d1406ae5ba27fe300" access=readonly;
1. libname ## assigns a library reference (libref), giving SAS access to a folder of datasets
2. mydata ## the libref: a short name (at most 8 characters) you choose for the library; datasets stored in it are then referred to as mydata.datasetname
3. "/courses/d1406ae5ba27fe300" ## the path to the folder on the SAS cloud server where the data is stored
4. access=readonly ## opens the library read-only, so the raw data cannot be modified
5. ; ## every SAS statement ends with a semicolon (the statement terminator)

##### comment:
/* this is a comment */

A statement that begins with an asterisk and ends with a semicolon is also treated as a comment.

## Second step: the data step reads in the specific dataset and prepares it for use
##### data new; set mydata.addhealth_pds;

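1. data new ## following the libname statement, begins a data step that creates a temporary dataset named new (stored in the WORK library, which is cleared when the SAS session ends)
2. set mydata.addhealth_pds ## tells SAS which dataset to read in: addhealth_pds from the mydata library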
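In pandas terms, data new; set mydata.addhealth_pds; is roughly like making a working copy of a loaded table: changes to the copy leave the source untouched, much as the read-only library protects the raw data. This is only an analogy; the tiny DataFrame below is a hypothetical stand-in for addhealth_pds with made-up values.

```python
import pandas as pd

# Hypothetical stand-in for the read-only source dataset mydata.addhealth_pds
addhealth_pds = pd.DataFrame({"AID": [1, 2], "H1GH2": [1, 0]})

# data new; set mydata.addhealth_pds;  -> an independent working copy
new = addhealth_pds.copy()

# edits to the working copy do not change the source
new["H1GH2"] = new["H1GH2"] + 1
print(addhealth_pds["H1GH2"].tolist())  # [1, 0]
print(new["H1GH2"].tolist())            # [2, 1]
```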

## Third step: add statements to the data step and end it (there are many options)
### Example: label statement
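The variable name, followed by an equals sign and then a descriptive label in quotation marks (the variable name itself does not change); one LABEL statement can label several variables and ends with a single semicolon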
##### LABEL TAB12MDX="Tobacco Dependence Past 12 Months" 
            CHECK321="Smoked Cigarettes in Past 12 Months"
            S3AQ3B1="Usual Smoking Frequency"
            S3AQ3C1="Usual Smoking Quantity";
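pandas DataFrames have no built-in equivalent of SAS variable labels, so one workable approach (an assumption, not a pandas feature) is to keep the labels in a plain dict and rename the columns only when displaying results. The variable names and labels come from the LABEL statement above; the data values are made up.

```python
import pandas as pd

# A plain dict plays the role of the LABEL statement
labels = {
    "TAB12MDX": "Tobacco Dependence Past 12 Months",
    "CHECK321": "Smoked Cigarettes in Past 12 Months",
    "S3AQ3B1": "Usual Smoking Frequency",
    "S3AQ3C1": "Usual Smoking Quantity",
}

# Made-up values, for illustration only
df = pd.DataFrame({"TAB12MDX": [0, 1], "CHECK321": [1, 1],
                   "S3AQ3B1": [1, 2], "S3AQ3C1": [10, 5]})

# Show descriptive labels without changing the underlying variable names
display_df = df.rename(columns=labels)
print(display_df.columns.tolist())
```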


### Example: end the data step with a PROC (procedure) statement
A PROC statement tells SAS to run one of its built-in procedures on the data; starting a PROC step also ends the data step before it.

##### proc sort; by aid;
Sorts the data by one or more variables; you normally sort by the unique identifier (here aid).
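Sorting by more than one variable follows the same idea. As a hypothetical example, proc sort; by SCID descending AGE; sorts by school ID and then by age from oldest to youngest within each school (AGE is an assumed variable here). A pandas sketch of the same ordering, on made-up data, passes a list of columns and a matching list of ascending flags to DataFrame.sort_values:

```python
import pandas as pd

# Hypothetical data: school ID, respondent ID and age
df = pd.DataFrame({
    "SCID": [2, 1, 2, 1],
    "AID": [101, 102, 103, 104],
    "AGE": [15, 17, 16, 14],
})

# Analogue of: proc sort; by SCID descending AGE;
# school ID ascending, then age descending within each school
sorted_df = df.sort_values(["SCID", "AGE"], ascending=[True, False])
print(sorted_df["AID"].tolist())  # [102, 104, 103, 101]
```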

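##### proc freq; tables H1GH2 H1DA1 H1DA2 H1DA3 H1DA4 H1DA5 H1DA6 H1DA7 H1ED1 H1ED2;
Produces a one-way frequency table (frequency, percent, cumulative frequency and cumulative percent) for each variable listed after tables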
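To see what proc freq reports, here is a rough pandas analogue of a one-way frequency table. The function name freq_table and the sample values are hypothetical; like proc freq's default, it leaves missing values out of both the table and the percentages.

```python
import pandas as pd

def freq_table(series: pd.Series) -> pd.DataFrame:
    """Rough analogue of one PROC FREQ one-way table (missing values excluded)."""
    counts = series.value_counts().sort_index()  # dropna=True by default
    table = pd.DataFrame({"Frequency": counts})
    table["Percent"] = 100 * table["Frequency"] / table["Frequency"].sum()
    table["Cumulative Frequency"] = table["Frequency"].cumsum()
    table["Cumulative Percent"] = table["Percent"].cumsum()
    return table

# Made-up responses for one variable; None marks a missing answer
h1gh2 = pd.Series([0, 1, 1, 2, 0, 1, None, 3])
print(freq_table(h1gh2))
```

To mimic a tables statement with several variables, call freq_table on each column in turn, for example with for col in cols: print(freq_table(df[col])).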

## Fourth step: run the program:
##### run;
Executes all the preceding SAS statements in the program


## SAS programs are made up of two kinds of steps:
1. data steps:
you write code that tells SAS how to manage and manipulate your data
2. proc steps:
enable you to analyze and present your data. Proc steps are pre-written procedures, so the code in a proc step does not give SAS step-by-step instructions the way the code you write in a data step does.
You just write code that controls how these pre-written procedures run.

# Subset the raw data
## Add LOGIC statements: tell your program to include only those observations that will help answer your research question
![image.png](attachment:image.png)

**if CHECK321=1;**

**if AGE le 25;**
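The two subsetting IF statements can be sketched in pandas as a boolean filter (le means less than or equal to); the data below is made up. One caveat: SAS treats a missing numeric value as smaller than any number, so if AGE le 25; also keeps observations with a missing AGE, whereas the pandas comparison is False for NaN and drops them.

```python
import pandas as pd

# Made-up data with the two variables used in the IF statements
df = pd.DataFrame({
    "CHECK321": [1, 2, 1, 1],
    "AGE": [22, 30, 27, 25],
})

# if CHECK321=1;  and  if AGE le 25;  -> keep rows meeting both conditions
subset = df[(df["CHECK321"] == 1) & (df["AGE"] <= 25)]
print(subset)
```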

# Example SAS code:

In [None]:
/* access the data from the cloud data library */
libname mylib "/courses/d1406ae5ba27fe300" access=readonly;

/* read the specific dataset */
data new; set mylib.addhealth_pds;

/* add labels to variables */
label H1GH2="how often have you had a headache"
    H1DA1="times walk around the house"
    H1DA2="times did you do your hobbies"
    H1DA4="times did you go roller-blading/bicycling and so on"
    H1DA5="times did you play an active sport"
    H1DA6="times did you do exercise"
    H1DA7="times did you hang out with friends"
    H1ED1="times absent from school with an excuse"
    H1ED2="times skipped school without an excuse";

/* subset the data: not necessary for this analysis */

/* sort the data by the unique identifier, then produce frequency tables */
proc sort; by aid;
proc freq; tables H1GH2 H1DA1 H1DA2 H1DA4 H1DA5 H1DA6 H1DA7 H1ED1 H1ED2;
run;


In [2]:
import pandas as pd
data = pd.read_csv("../data/addhealth.csv")



In [3]:
list(data.keys())

['AID',
 'IMONTH',
 'IDAY',
 'IYEAR',
 'SCID',
 'SSCID',
 'COMMID',
 'MACNO',
 'INTID',
 'SCH_YR',
 'BIO_SEX',
 'VERSION',
 'CORE1',
 'CORE2',
 'DISABLE',
 'HIEDBLK',
 'CUBAN',
 'PRICAN',
 'CHINESE',
 'H1GI1M',
 'H1GI1Y',
 'H1GI2',
 'H1GI3',
 'H1GI4',
 'H1GI5A',
 'H1GI5B',
 'H1GI5C',
 'H1GI5D',
 'H1GI5E',
 'H1GI5F',
 'H1GI6A',
 'H1GI6B',
 'H1GI6C',
 'H1GI6D',
 'H1GI6E',
 'H1GI7A',
 'H1GI7B',
 'H1GI7C',
 'H1GI7D',
 'H1GI7E',
 'H1GI7F',
 'H1GI7G',
 'H1GI8',
 'H1GI9',
 'H1GI10',
 'H1GI11',
 'H1GI12',
 'H1GI13M',
 'H1GI13Y',
 'H1GI14',
 'H1GI15',
 'H1GI16M',
 'H1GI16Y',
 'H1GI18',
 'H1GI19',
 'H1GI20',
 'H1GI21',
 'H1DA1',
 'H1DA2',
 'H1DA3',
 'H1DA4',
 'H1DA5',
 'H1DA6',
 'H1DA7',
 'H1DA8',
 'H1DA9',
 'H1DA10',
 'H1DA11',
 'H1GH1',
 'H1GH1A',
 'H1GH2',
 'H1GH3',
 'H1GH4',
 'H1GH5',
 'H1GH6',
 'H1GH7',
 'H1GH8',
 'H1GH9',
 'H1GH10',
 'H1GH11',
 'H1GH12',
 'H1GH13',
 'H1GH14',
 'H1GH15',
 'H1GH16',
 'H1GH17',
 'H1GH18',
 'H1GH19',
 'H1GH20',
 'H1GH21',
 'H1GH22',
 'H1GH23A',
 'H1GH23B',
 ...

In [8]:
sub = data[['AID', 'IMONTH', 'IYEAR', 'H1GI1M', 'H1GI1Y']]

In [11]:
sub.head()

|   | AID | IMONTH | IYEAR | H1GI1M | H1GI1Y |
|---|---|---|---|---|---|
| 0 | 57100270 | 6 | 95 | 10 | 77 |
| 1 | 57101310 | 5 | 95 | 11 | 76 |
| 2 | 57103171 | 6 | 95 | 10 | 79 |
| 3 | 57103869 | 7 | 95 | 1 | 77 |
| 4 | 57104553 | 7 | 95 | 6 | 76 |
