SGP Data Preparation
This wiki page provides SGP data formatting/preparation instructions for running SGP analyses. To help illustrate these formatting specifications there are exemplar data sets, sgpData and sgpData_LONG, embedded within the SGPdata Package. Ensuring your data is set up in the proper format will minimize later problems often encountered in running SGP analyses.
There are two formats for representing longitudinal (time dependent) student assessment data: WIDE and LONG. In WIDE format data, each case/row represents a unique student and the columns represent variables associated with the student at different times. In LONG format data, the time dependent data for a student is spread across multiple rows of the data set. The SGPdata Package, installed when one installs the SGP Package, includes exemplar WIDE and LONG data sets (sgpData and sgpData_LONG, respectively) to assist in setting up your data.
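To make the two layouts concrete, here is a minimal sketch that uses base R's reshape() on a made-up data frame. The toy values and the resulting column names (SCALE_SCORE.2012 and so on) are illustrative assumptions only; they are not meant to reproduce the column layout of sgpData.

```r
# Hypothetical LONG data: one row per student per year (values are made up).
long_toy <- data.frame(
  ID = c("1", "1", "2", "2"),
  YEAR = c("2012", "2013", "2012", "2013"),
  SCALE_SCORE = c(435, 461, 502, 517),
  stringsAsFactors = FALSE
)

# Convert to WIDE: one row per student, one score column per year.
wide_toy <- reshape(long_toy, idvar = "ID", timevar = "YEAR", direction = "wide")

print(wide_toy)
```

The LONG version has four rows (two students by two years), while the WIDE version has two rows, one per student, with the yearly scores moved into separate columns.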
In general, the lower level functions in the SGP package that do the calculations, studentGrowthPercentiles and studentGrowthProjections, require WIDE formatted data, whereas the higher level functions that are built toward operational analyses require LONG data. If running anything but the basic analyses, we recommend setting up your data in the LONG format, as much of the capability of the package is built around the user supplying data in that way.
For our purposes, LONG format means that each row represents a unique student by content area by year combination. Thus, in the final LONG file, each combination of student, content area, and year must be unique.
By contrast, in WIDE formatted data a row represents a unique student and contains all available information for that student. For example, here are the first four rows (and only the first 7 columns) of the sample LONG data, sgpData_LONG:
> library(SGPdata)
> sgpData_LONG[1:4,1:7]
       ID LAST_NAME FIRST_NAME CONTENT_AREA      YEAR GRADE SCALE_SCORE
1 1000372   Daniels      Corey  MATHEMATICS 2011_2012     3         435
2 1000372   Daniels      Corey  MATHEMATICS 2012_2013     4         461
3 1000372   Daniels      Corey  MATHEMATICS 2013_2014     5         444
4 1000372   Daniels      Corey      READING 2011_2012     3         523
Notice that the same student is in each row, but that the rows represent different year and content area combinations. This is what is meant by long formatted data.
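The uniqueness requirement for LONG data can be checked before running any analyses. The following is a minimal base R sketch, assuming a small hand-built data frame that mirrors the four rows shown above; the object names (long_toy, key_columns, with_repeat) are hypothetical and not part of the SGP package.

```r
# Hypothetical toy data mirroring the four rows displayed above.
long_toy <- data.frame(
  ID = rep("1000372", 4),
  CONTENT_AREA = c("MATHEMATICS", "MATHEMATICS", "MATHEMATICS", "READING"),
  YEAR = c("2011_2012", "2012_2013", "2013_2014", "2011_2012"),
  SCALE_SCORE = c(435, 461, 444, 523),
  stringsAsFactors = FALSE
)

# Columns that together should identify a row uniquely.
key_columns <- c("ID", "CONTENT_AREA", "YEAR")

# anyDuplicated() returns 0 when no row repeats an earlier key combination.
has_unique_keys <- anyDuplicated(long_toy[, key_columns]) == 0L

# Appending a copy of the first row breaks uniqueness.
with_repeat <- rbind(long_toy, long_toy[1, ])
repeat_detected <- anyDuplicated(with_repeat[, key_columns]) > 0L
```

In real data, VALID_CASE would normally be part of this check as well, since duplicate records are allowed to remain in the file when they are flagged as INVALID_CASE.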
The following table gives the variables that are required for the calculation of Student Growth Percentiles and how they should be formatted (if applicable).
- ID: This column contains the unique student identifiers. This variable is of class character.
- CONTENT_AREA: This column describes the content area for a given row. Most data sets would presumably contain MATHEMATICS and READING, but other values are possible. These values must be capitalized. If analyses utilize embedded meta-data contained in SGPstateData, then these names must match the state's assessment information contained in the SGPstateData object that is embedded within the SGP Package. Please contact @dbetebenner to have meta-data added to this object.
- YEAR: This column gives either the academic year (e.g., 2011_2012 as in the sample data) or the year in which the assessment took place (e.g., 2011). This variable is of class character.
- GRADE: The grade in which the assessment was administered. The class of this column should be set to character.
- SCALE_SCORE: The assessment scale score for each observation. This column's class should be set to integer or numeric.
- VALID_CASE: This column identifies those students who should be included in subsequent analyses (value set to VALID_CASE) and those who should not be included (value set to INVALID_CASE). Duplicate cases are often left in the data and flagged as INVALID_CASE. If your data contains only valid cases, then this variable can be set to VALID_CASE for all cases.
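As a rough illustration of the formatting rules above, the sketch below coerces a made-up raw extract to the listed classes and adds VALID_CASE. The data values and object names are hypothetical. Treating the first record of each student by content area by year combination as the valid one is an assumption made only for this example; which duplicate to keep is a decision for the analyst.

```r
# Hypothetical raw extract; values are made up for illustration.
raw <- data.frame(
  ID = c(1000372L, 1000372L, 1000372L, 1000593L),
  CONTENT_AREA = c("Mathematics", "Mathematics", "Reading", "Mathematics"),
  YEAR = c(2012L, 2012L, 2012L, 2012L),
  GRADE = c(3L, 3L, 3L, 4L),
  SCALE_SCORE = c("435", "440", "523", "410"),
  stringsAsFactors = FALSE
)

prepared <- raw
prepared$ID <- as.character(prepared$ID)                 # class character
prepared$CONTENT_AREA <- toupper(prepared$CONTENT_AREA)  # capitalized values
prepared$YEAR <- as.character(prepared$YEAR)             # class character
prepared$GRADE <- as.character(prepared$GRADE)           # class character
prepared$SCALE_SCORE <- as.numeric(prepared$SCALE_SCORE) # class numeric

# Assumption for this sketch: keep the first record of each key combination
# and flag any later repeats as INVALID_CASE.
is_repeat <- duplicated(prepared[, c("ID", "CONTENT_AREA", "YEAR")])
prepared$VALID_CASE <- ifelse(is_repeat, "INVALID_CASE", "VALID_CASE")

print(sapply(prepared, class))
```

Here the second row repeats the first row's student, content area, and year, so it is the only row flagged as INVALID_CASE.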
Although the following variables are not required for Student Growth Percentile analyses, they are required for Student Growth Projection (i.e., Growth to Standard) analyses and/or the visualization and reporting functionality:
- ACHIEVEMENT_LEVEL: The achievement or proficiency category associated with each observed scale score. Values in this column should match the assessment program information included in the SGPstateData object.
- FIRST_NAME: Student first name. A character or a factor. (Only required for individual student reports)
- LAST_NAME: Student last name. A character or a factor. (Only required for individual student reports)
- SCHOOL_NUMBER: Unique identifier for the school/institution in which a student is enrolled for the given year and content area. Either an integer or character. (Only required for aggregations and bubble plots)
- SCHOOL_NAME: Name of the school/institution in which a student is enrolled in a given year. Either a factor or character. (Only required for aggregations and bubble plots)
- DISTRICT_NUMBER: A unique identifier for the district/educational authority in which a student is enrolled in a given year. Either an integer or factor. (Only required for aggregations and bubble plots)
- DISTRICT_NAME: District/educational authority name in which a student is enrolled in a given year. Either a factor or character. (Only required for aggregations and bubble plots)
- STATE_ENROLLMENT_STATUS: Binary indicator of whether the student was continuously enrolled in the state and should be included in state summary statistics. The indicator must be a factor, preferably with informative labels such as Enrolled State: Yes and Enrolled State: No. (Only required for aggregations and bubble plots)
- DISTRICT_ENROLLMENT_STATUS: Binary indicator of whether the student was continuously enrolled and should be included in district summary statistics. The indicator must be a factor, preferably with informative labels such as Enrolled District: Yes and Enrolled District: No. (Only required for aggregations and bubble plots)
- SCHOOL_ENROLLMENT_STATUS: Binary indicator of whether the student was continuously enrolled and should be included in school summary statistics. The indicator must be a factor, preferably with informative labels such as Enrolled School: Yes and Enrolled School: No. (Only required for aggregations and bubble plots)
- ETHNICITY: Ethnicity and other demographic variables, if summarization by those groups is desired via summarizeSGP. (Only required for aggregations and bubble plots)
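The enrollment status variables are often delivered as TRUE/FALSE or 0/1 flags rather than labeled factors. The sketch below shows one way to build such a factor in base R. The input vector enrolled_in_state is a hypothetical stand-in for whatever flag your source data provides; the labels follow the ones named in the list above.

```r
# Hypothetical logical flag from a source system (values are made up).
enrolled_in_state <- c(TRUE, FALSE, TRUE)

# Convert the flag to a factor with informative labels.
STATE_ENROLLMENT_STATUS <- factor(
  enrolled_in_state,
  levels = c(TRUE, FALSE),
  labels = c("Enrolled State: Yes", "Enrolled State: No")
)

print(table(STATE_ENROLLMENT_STATUS))
```

The same pattern would apply to DISTRICT_ENROLLMENT_STATUS and SCHOOL_ENROLLMENT_STATUS, with the labels changed accordingly.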