# Accountability Cohort Information
_David Kay, dkay@mdek12.org_

---

### Purpose:
Create a data file for the Accountability Cohort, 2011-2014.

### Environment:
* OS X 10.11.3
  * RStudio 0.99.892
* 'R' Stats software
  * 'dplyr/stringr' libraries for data munging
  * 'readr' library for advanced data file importing (speed/column class assignment)
  * 'lubridate' library for date/time analysis
  * 'foreign' library for binary file import/export
* Input Data
  * exported per month and school year by PL/SQL in pipe-delimited ("|") format, one file per month:
```
SY2011-2012/Month1_SY2011-12.txt
SY2012-2013/Month1_SY2012-13.txt
```

## Functions
##### calcPeerGrade.R
```R
calcPeerGrade <- function(bDate, curYear) {
        require(lubridate)                                     # load lubridate library
        # the school year starts on 01 SEP of curYear
        curYearStart <- as.Date(paste(curYear, "09", "01", sep = "-"), format = "%Y-%m-%d")
        ageDays <- interval(bDate, curYearStart) / ddays(1)    # age in days, as a plain number
        ageYears <- ageDays %/% 365.25                         # whole years of age, IAW standards
        pGrade <- ageYears - 5                                 # peer grade = age in years minus 5
        pGrade                                                 # return peer grade
}
```
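
The arithmetic in `calcPeerGrade` can be checked by hand with base R alone. The sketch below uses a hypothetical student (the birth date and school year are made up for illustration) and plain `Date` subtraction instead of 'lubridate'; it is a cross-check of the formula, not a replacement for the function above.

```r
# Hypothetical student born 2005-03-15, school year starting in 2012
bDate <- as.Date("2005-03-15")
curYearStart <- as.Date("2012-09-01")          # 01 SEP of the current year

ageDays <- as.numeric(curYearStart - bDate)    # interval in days: 2727
ageYears <- ageDays %/% 365.25                 # whole years of age: 7
pGrade <- ageYears - 5                         # peer grade: 2
pGrade
```

A student who is 7 on 01 SEP is therefore placed in peer grade 2.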
##### readFunc.R
```R
readFunc <- function(filename) {
        require(readr)                                     # read_delim comes from readr
        temp <- read_delim(filename,                       # read one pipe-delimited file
                     delim = "|",
                     progress = interactive(),             # show a progress bar in interactive sessions
                     quote = "",                           # the export uses no quoting character
                     na = "",                              # treat empty fields as NA
                     col_names = FALSE,                    # files have no header row
                     col_types = "iiciiccccccicccccccccc") # assign classes to the 22 column vectors
        temp$fileSource <- unlist(strsplit(filename, split = "/"))[2]   # second path component goes into a new column
        temp                                               # return dataframe
}
```
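
`readFunc` fills `fileSource` with the second "/"-separated component of the path, so the result depends on how the path is written. The sketch below shows this with base R only. `secondPart` is a hypothetical helper that mirrors the expression in `readFunc`, and the `data/` prefix is an assumed example directory; `basename()` is shown as one possible alternative if the file name itself is what is wanted.

```r
# Relative path in the layout shown under "Input Data"
relPath <- "SY2011-2012/Month1_SY2011-12.txt"
# Hypothetical path with one extra leading directory
fullPath <- "data/SY2011-2012/Month1_SY2011-12.txt"

# Same expression readFunc uses for the fileSource column
secondPart <- function(p) unlist(strsplit(p, split = "/"))[2]

secondPart(relPath)    # "Month1_SY2011-12.txt"
secondPart(fullPath)   # "SY2011-2012"
basename(fullPath)     # "Month1_SY2011-12.txt"
```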

## Pseudo Code for Main File Load (17.7 million observations)

1. Recursively get list of files
2. Read the first file into a dataframe
3. For each subsequent file, read it the same way and append its dataframe to the original dataframe
4. Write the combined dataframe to a binary file

## Pseudo Code for Data Munging
1. Get current year and convert date to 01SEP of that year for pGrade calculation
2. Get student birth date
3. Perform pGrade calculation for grade codes 56,58,78
```
Current Year Start Date (minus) student birth date (equals) interval in days
(Interval in days (divided by) 365.25) (minus) 5
```
4. Search and replace for peer grade codes 52, 54, 62, 64
   1. 52, 62: replace pGrade with -1
   2. 54, 64: replace pGrade with 0
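
The munging steps above can be sketched in base R as follows. The column names `gradeCode`, `birthDate`, and `schoolYear` and the six sample rows are hypothetical, and rows with any other grade code are left as `NA` here because the steps do not say how they are handled.

```r
# Hypothetical sample data; column names are assumptions, not the real layout
students <- data.frame(
  gradeCode  = c(56, 52, 54, 62, 64, 78),
  birthDate  = as.Date(c("2005-03-15", "2007-10-01", "2006-12-20",
                         "2008-01-05", "2006-09-02", "2000-08-31")),
  schoolYear = rep(2012, 6)
)

# Steps 1-2: 01 SEP of the current year and age in days on that date
yearStart <- as.Date(paste(students$schoolYear, "09", "01", sep = "-"))
ageDays <- as.numeric(yearStart - students$birthDate)

# Step 3: calculate pGrade only for grade codes 56, 58, 78
students$pGrade <- NA_real_
calc <- students$gradeCode %in% c(56, 58, 78)
students$pGrade[calc] <- ageDays[calc] %/% 365.25 - 5

# Step 4: fixed values for grade codes 52/62 and 54/64
students$pGrade[students$gradeCode %in% c(52, 62)] <- -1
students$pGrade[students$gradeCode %in% c(54, 64)] <- 0

students
```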

## Load Main File and Write Binary

_The `%>%` operator (from 'magrittr', re-exported by 'dplyr') passes the output of its left side as the first argument of its right side, which keeps the code clean._
```R
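library(dplyr)   # %>% and bind_rows
library(readr)   # read_delim (used by readFunc) and write_rds
dir(path = "_INSERT_FILE_DIRECTORY_HERE/", pattern = "txt$", recursive = TRUE, full.names = TRUE) %>% # list of files
    lapply(FUN = readFunc) %>%            # read each file into a dataframe with readFunc
    bind_rows() %>%                       # append all dataframes (replaces the deprecated rbind_all)
    write_rds("INSERT_FILENAME_HERE.rds") # write binary file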
```

## Calculate Peer Grade

```R
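# stBdayColumn and curYearColumn are placeholders for the birth-date and school-year columns
dataFrame %>%
    mutate(pGrade = calcPeerGrade(stBdayColumn, curYearColumn))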
```

## Output
- 20160315_fullAcctData.rds          #Full concatenated acct data in R binary format
- 20160315_fullAcctData.txt          #Full concatenated acct data in pipe ("|") delimited format
- 20160315_sampleAcctData_10k.rds    #Random sample (n=10,000) in R binary format
- 20160315_sampleAcctData_10k.txt    #Random sample (n=10,000) in pipe ("|") delimited format
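
One way the sample files could be produced is sketched below in base R. This is an illustration, not the procedure actually used: `fullData` is stand-in data, the seed is arbitrary, the text file is assumed to be pipe-delimited without a header row like the input, and the files are written to a temporary directory.

```r
set.seed(20160315)                               # arbitrary seed so the sample is reproducible

# Stand-in for the full accountability dataframe (hypothetical columns)
fullData <- data.frame(id = 1:50000, value = rnorm(50000))

# Draw 10,000 rows at random, without replacement
sampleData <- fullData[sample(nrow(fullData), 10000), ]

rdsPath <- file.path(tempdir(), "sampleAcctData_10k.rds")
txtPath <- file.path(tempdir(), "sampleAcctData_10k.txt")

saveRDS(sampleData, rdsPath)                     # R binary format
write.table(sampleData, txtPath, sep = "|",      # pipe-delimited text
            quote = FALSE, row.names = FALSE, col.names = FALSE)
```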
