---
title: "Graduates"
vignette: >
  %\VignetteIndexEntry{Graduates}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
output: rmarkdown::html_vignette
bibliography: ../inst/REFERENCES.bib
link-citations: yes
always_allow_html: true
---
```{r child = "../man/rmd/common-setup.Rmd"}
```
```{r}
#| include: false
knitr::opts_chunk$set(fig.path = "../man/figures/art-080-")
```
An undergraduate student who completes their program and earns their first degree is a *completer*. Whether a completer is also counted among their program's *graduates*, however, usually depends on whether they satisfy the criterion for *timely completion*. We derive a *completion status* variable and use it to filter student-level records to obtain a bloc of graduates.
This vignette's place in the MIDFIELD workflow:
1. Planning
1. Initial processing
1. `r accent("Blocs")`
    - Ever-enrolled
    - FYE proxies
    - Starters
    - `r accent("Graduates")`
1. Groupings
1. Metrics
1. Displays
## Definitions
```{r child = "../man/rmd/define-bloc.Rmd"}
```
completers
: Bloc of students who complete their baccalaureate programs, earning their first degrees.
```{r child = "../man/rmd/define-timely-completion-criterion.Rmd"}
```
completion status
: A derived midfieldr variable indicating whether a student completes a degree, and if so, whether their completion was timely. Possible values are "timely", "late", and NA (missing, indicating non-completion). Late completers are often excluded from a count of "graduates."
```{r child = "../man/rmd/define-graduates.Rmd"}
```
## Method
A bloc of graduates (timely completers) can be determined independently of other blocs.
1. Filter source student-level records for data sufficiency and degree-seeking.
1. Determine completion status.
1. Filter graduates (timely completers).
1. Filter by degree program.
The next step might be to subset the graduates if necessary to meet the needs of the metric. For example, the graduation rate metric requires graduates to be a subset of starters in the same program. We postpone this step until treating the metrics.
```{r child = "../man/rmd/note-for-practice-only-2.Rmd"}
```
## Load data
*Start.* If you are writing your own script to follow along, we use these packages in this article:
```{r}
library(midfieldr)
library(midfielddata)
library(data.table)
```
*Load.* Practice datasets. View data dictionaries via `?student`, `?term`, `?degree`.
```{r}
# Load practice data
data(student, term, degree)
```
*Loads with midfieldr.* Prepared data. View data dictionaries via `?study_programs`, `?baseline_mcid`.
- `study_programs` (derived in [Programs](art-040-programs.html#assigning-program-names)).
- `baseline_mcid` (derived in [Blocs](art-050-blocs.html#initial-processing)).
## Initial processing
*Select (optional).* Reduce the number of columns. Code reproduced from [Getting started](art-000-getting-started.html#reusable-code).
```{r}
# Optional. Copy of source files with all variables
source_student <- copy(student)
source_term <- copy(term)
source_degree <- copy(degree)
# Optional. Select variables required by midfieldr functions
student <- select_required(source_student)
term <- select_required(source_term)
degree <- select_required(source_degree)
```
```{r child = "../man/rmd/note-baseline-mcid.Rmd"}
```
```{r}
# Working data frame
DT <- copy(baseline_mcid)
```
## `add_completion_status()`
Add columns to a data frame of student-level records that indicate whether a student completed a degree, and if so, whether their completion was timely.
The input data frame must have columns for ID and the timely completion term. We use `add_timely_term()` again, though alternatively we could have retained the `timely_term` variable from previous code chunks.
```{r}
# Timely term required before completion status
DT <- add_timely_term(DT, term)
# Drop unnecessary columns
DT[, c("term_i", "level_i", "adj_span") := NULL]
```
*Arguments.*
- **`dframe`** Data frame of student-level records keyed by student ID. Required variables are `mcid` and `timely_term`.
- **`midfield_degree`** Data frame of student-level degree observations keyed by student ID. Default is `degree`. Required variables are `mcid` and `term_degree`.
*Equivalent usage.* The following implementations yield identical results:
```{r}
# Required arguments in order and explicitly named
x <- add_completion_status(dframe = DT, midfield_degree = degree)
# Required arguments in order, but not named
y <- add_completion_status(DT, degree)
# Using the implicit default for the midfield_degree argument
z <- add_completion_status(DT)
# Demonstrate equivalence
same_content(x, y)
same_content(x, z)
```
*Output.* Adds the following columns to the data frame.
- **`term_degree`** Character. Term in which a program is completed. Encoded YYYYT. NA indicates non-completion.
- **`completion_status`** Character. Completion status: "timely", indicating degree completion no later than the timely completion term; "late", indicating completion after the timely completion term; and NA, indicating non-completion.
```{r}
# Add completion status and supporting variables
DT <- add_completion_status(DT, degree)
DT
```
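The rule behind `completion_status` can be illustrated on toy data. This is only a sketch of the comparison described above, not midfieldr's internal implementation. The table `toy_status` and its IDs and terms are invented for illustration, and the comparison assumes terms encoded YYYYT can be converted to integers.

```{r}
library(data.table)

# Hypothetical records (not from the practice data)
toy_status <- data.table(
  mcid        = c("A", "B", "C"),
  timely_term = c("19973", "20153", "20053"),
  term_degree = c("19963", "20163", NA)
)

# Compare the degree term to the timely completion term
toy_status[, completion_status := fcase(
  is.na(term_degree), NA_character_,
  as.integer(term_degree) <= as.integer(timely_term), "timely",
  as.integer(term_degree) > as.integer(timely_term), "late"
)]
toy_status
```

Student A completes before their timely term, student B after it, and student C has no degree term, so the derived values are "timely", "late", and NA respectively.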
As described in the data sufficiency vignette, `add_completion_status()` accepts [Alternate source names](art-020-data-sufficiency.html#alternate-source-names) and uses [Silent overwriting](art-020-data-sufficiency.html#silent-overwriting) when existing columns have the same name as one of the added columns.
### Closer look
We examine the records of selected students in detail.
*Example 1.* The student has a degree term, Spring 1997 (encoded `19963`), indicating successful completion. Their degree term does not exceed their timely completion term, Spring 1998 (encoded `19973`), so their completion status is "timely".
```{r}
# Display one student by ID
DT[mcid == "MID25783162"]
```
*Example 2.* This student also has a degree term, Spring 2017 (encoded `20163`), indicating successful completion. Their degree term exceeds their timely completion term, Spring 2016 (encoded `20153`), so their completion status is "late".
```{r}
# Display one student by ID
DT[mcid == "MID26696871"]
```
*Example 3.* The student's degree term is NA, indicating they did not complete their program. Thus completion status is NA as well.
```{r}
# Display one student by ID
DT[mcid == "MID26697615"]
```
## Graduates
Defining "graduates" as timely completers, we filter to retain records for which completion status is "timely". The result comprises all timely completers among the baseline IDs in the practice data.
```{r}
# Graduates
DT <- DT[completion_status == "timely"]
# Filter for unique IDs
DT <- DT[, .(mcid)]
DT <- unique(DT)
DT
```
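A point worth noting about the filter above: in data.table, an NA in a logical row filter is treated as FALSE, so non-completers (completion status NA) are dropped along with late completers. A minimal sketch on invented data, where `toy_grad` is a hypothetical table:

```{r}
library(data.table)

# Hypothetical records (not from the practice data)
toy_grad <- data.table(
  mcid = c("A", "B", "C"),
  completion_status = c("timely", "late", NA)
)

# Rows with "late" and NA are both excluded
toy_timely <- toy_grad[completion_status == "timely"]
toy_timely
```

No separate `!is.na()` condition is needed for this filter.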
## Filter by program
In this case, "filter by program" means to filter by *degree program.* Hence we join variables from `degree` to obtain the CIP codes of the graduates.
*Add variables.* Left join with selected variables from `degree`.
```{r}
# Add degree CIP codes and terms
cols_we_want <- degree[, .(mcid, term_degree, cip6)]
DT <- cols_we_want[DT, on = c("mcid")]
DT
```
If our IDs are all graduates, there should be no NA values in the CIP column after joining. We check as follows, where the result is the integer number of rows with an incomplete case, i.e., an NA in any column. The result is zero.
```{r}
# Verify no NAs in CIP column
sum(!complete.cases(DT))
```
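To see why this check is informative, consider how the left join `X[Y, on = ...]` behaves: every row of `Y` is retained, and columns from `X` are filled with NA where an ID has no match. The sketch below uses invented tables (`toy_ids`, `toy_degree`) and made-up codes, in which ID "C" deliberately has no degree record.

```{r}
library(data.table)

# Hypothetical IDs and degree records (not from the practice data)
toy_ids <- data.table(mcid = c("A", "B", "C"))
toy_degree <- data.table(
  mcid        = c("A", "B"),
  term_degree = c("19963", "20163"),
  cip6        = c("140801", "141001")
)

# Left join: all rows of toy_ids are kept
toy_joined <- toy_degree[toy_ids, on = "mcid"]
toy_joined

# Count rows with an NA in any column
n_incomplete <- sum(!complete.cases(toy_joined))
n_incomplete
```

Here the count is one, flagging the unmatched ID. A count of zero, as in the practice data above, indicates that every ID matched a degree record.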
*Filter.* Join program labels and filter to retain the desired programs.
```{r}
# Filter by program
DT <- study_programs[DT, on = c("cip6"), nomatch = NULL]
DT
```
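The join above acts as a filter because of `nomatch = NULL`, which turns the join into an inner join: rows with a CIP code absent from the program table are dropped rather than filled with NA. A small sketch, assuming a hypothetical two-column program table `toy_programs` with invented codes and labels:

```{r}
library(data.table)

# Hypothetical program labels and student records
toy_programs <- data.table(cip6 = "140801", program = "CE")
toy_records <- data.table(
  mcid = c("A", "B"),
  cip6 = c("140801", "141001")
)

# Inner join: only records whose cip6 appears in toy_programs survive
toy_filtered <- toy_programs[toy_records, on = "cip6", nomatch = NULL]
toy_filtered
```

Only student "A" remains, now carrying the program label.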
*Filter.* Filter to retain observations in the first degree term only. Multiple degrees are allowed but only if they occur in the first degree term.
```{r}
# Filter by first degree term
DT <- DT[, .SD[which.min(term_degree)], by = "mcid"]
DT
```
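Note that `which.min()` returns the index of the first minimum only, so the filter above keeps one row per ID. If a student could have more than one degree record in their first degree term and all of them should be kept, a comparison against `min()` is one alternative. The sketch below contrasts the two on invented data, where `toy_terms` is hypothetical. Whether ties occur in your data is something to check rather than assume.

```{r}
library(data.table)

# Hypothetical student with two degrees in the first degree term
toy_terms <- data.table(
  mcid        = c("A", "A", "A"),
  term_degree = c("20003", "20003", "20013"),
  program     = c("CE", "ME", "EE")
)

# First row of the earliest term only
one_row <- toy_terms[, .SD[which.min(term_degree)], by = "mcid"]

# All rows that share the earliest term
all_first <- toy_terms[, .SD[term_degree == min(term_degree)], by = "mcid"]

one_row
all_first
```

The first result has a single row (CE) and the second has two (CE and ME).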
*Filter.* Drop unnecessary variables and filter for unique observations of ID and degree program label.
```{r}
DT[, c("cip6", "term_degree") := NULL]
DT <- unique(DT)
DT
```
## Reusable code
*Preparation.* The data frame of baseline IDs is the intake for this section.
```{r}
DT <- copy(baseline_mcid)
```
*Graduates.* A summary code chunk for ready reference.
```{r}
# Gather graduates, degree CIPs and terms
DT <- add_timely_term(DT, term)
DT <- add_completion_status(DT, degree)
DT <- DT[completion_status == "timely"]
DT <- degree[DT, .(mcid, term_degree, cip6), on = c("mcid")]
# Filter by programs and first degree terms
DT <- study_programs[DT, on = c("cip6"), nomatch = NULL]
DT <- DT[, .SD[which.min(term_degree)], by = "mcid"]
DT[, c("cip6", "term_degree") := NULL]
DT <- unique(DT)
```
## References
<div id="refs"></div>
```{r child = "../man/rmd/common-closing.Rmd"}
```