This repository has been archived by the owner on Jan 25, 2024. It is now read-only.

Stata program to make replacements specified in an external dataset


PovertyAction/readreplace


Important Notice: This repository is no longer under active development. We have developed a new version of the program called ipareadreplace. The new program is part of the ipacheck package and is housed here. Thank you for your support and understanding.

readreplace

readreplace modifies the dataset currently in memory by making replacements that are specified in an external dataset, the replacements file.

The list of differences saved by the SSC program cfout is designed for later use by readreplace. After the addition of a new variable to the cfout differences file that holds the new (correct) values, the file can be used as the readreplace replacements file.

readreplace is available through SSC: type ssc install readreplace in Stata to install.

Certification script

The certification script of readreplace is cscript/readreplace.do. If you are new to certification scripts, you may find this Stata Journal article helpful. See this guide for more on readreplace testing.

Stata help file

Converted automatically from SMCL:

log html readreplace.sthlp readreplace.md

The help file looks best when viewed in Stata as SMCL.

Title

readreplace -- Make replacements that are specified in an external dataset

Syntax

readreplace using filename, id(varlist) variable(varname) value(varname) [options]

    options              Description
    -------------------------------------------------------------------------
    Main
    * id(varlist)        variables for matching observations with the
                         replacements specified in the using dataset
    * variable(varname)  variable in the using dataset that indicates the
                         variables to replace
    * value(varname)     variable in the using dataset that stores the new
                         values

    Import
      insheet            use insheet to import filename; the default
      use                use use to load filename
      excel              use import excel to import filename
      import(options)    options to specify to the import command
    -------------------------------------------------------------------------
    * id(), variable(), and value() are required.

Description

readreplace modifies the dataset currently in memory by making replacements that are specified in an external dataset, the replacements file.

The list of differences saved by the SSC program cfout is designed for later use by readreplace. After the addition of a new variable to the cfout differences file that holds the new (correct) values, the file can be used as the readreplace replacements file.

Remarks

readreplace changes the contents of existing variables by making replacements that are specified in a separate dataset, the replacements file. The replacements file should be long by replacement such that each observation is a replacement to complete. Replacements are described by a variable that contains the name of the variable to change, specified to option variable(), and a variable that stores the new value for the variable, specified to option value(). The replacements file should also hold variables shared by the dataset in memory that indicate the subset of the data for which each change is intended; these are specified to option id(), and are used to match observations in memory to their replacements in the replacements file.

Below, an example replacements file is shown with three variables: uniqueid, to be specified to id(), Question, to be specified to variable(), and CorrectValue, to be specified to value().

    +--------------------------------------+
    | uniqueid     Question   CorrectValue |
    |--------------------------------------|
    |      105     district             13 |
    |      125          age              2 |
    |      138       gender              1 |
    |      199     district             34 |
    |        2   am_failure              3 |
    +--------------------------------------+

For each observation of the replacements file, readreplace essentially runs the following replace command:

replace Question_value = CorrectValue_value if uniqueid == uniqueid_value

That is, the effect of readreplace here is the same as these five replace commands:

    replace district = 13 if uniqueid == 105
    replace age = 2 if uniqueid == 125
    replace gender = 1 if uniqueid == 138
    replace district = 34 if uniqueid == 199
    replace am_failure = 3 if uniqueid == 2
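The per-row logic described above can be sketched in base Stata. This is only a rough emulation for illustration, not readreplace's actual implementation: the data values are invented, and the loop assumes a numeric value variable and a single numeric ID variable.

```stata
* Sketch only: emulates the per-row replace logic with base Stata.
* Not readreplace's implementation; all data values are invented.
clear
input uniqueid district age gender am_failure
  2  1 30 0 0
105 10 41 1 0
125 11 20 0 1
138 12 55 0 0
199 30 37 1 0
end
tempfile master
save `master'

* The replacements file: one observation per replacement.
clear
input uniqueid str10 Question CorrectValue
105 "district"   13
125 "age"         2
138 "gender"      1
199 "district"   34
  2 "am_failure"  3
end

* Copy each replacement into local macros, because loading the
* master dataset clears the replacements from memory.
local n = _N
forvalues i = 1/`n' {
    local id`i'  = uniqueid[`i']
    local var`i' = Question[`i']
    local val`i' = CorrectValue[`i']
}

* Run one replace command per observation of the replacements file.
use `master', clear
forvalues i = 1/`n' {
    replace `var`i'' = `val`i'' if uniqueid == `id`i''
}
list, noobs
```

In practice readreplace handles details this sketch ignores, such as string values, multiple ID variables, and storage type promotion.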

The variable specified to value() may be numeric or string; either is accepted.

The replacements file may be one of the following formats:

    o Comma-separated data. This is the default format, but you may specify option insheet; either way, readreplace will use insheet to import the replacements file. You can also specify any options for insheet to option import().
    o Stata dataset. Specify option use to readreplace, passing any options for use to import().
    o Excel file. Specify option excel to readreplace, passing any options for import excel to import().

readreplace may be employed for a variety of purposes, but it was designed to be used as part of a data entry process in which data is entered two times for accuracy. After the second entry, the two separate entry datasets need to be reconciled. cfout can compare the first and second entries, saving the list of differences in a format that is useful for data entry teams. Data entry operators can then add a new variable to the differences file for the correct value. Once this variable has been entered, load either of the two entry datasets, then run readreplace with the new replacements file.

The GitHub repository for readreplace is here. Previous versions may be found there: see the tags.

Remarks for promoting storage types

readreplace will change variables' storage types in much the same way as replace, promoting storage types according to these rules:

    1. Storage types are only promoted; they are never compressed.
    2. The storage type of float variables is never changed.
    3. If a variable of integer type (byte, int, or long) is replaced with a noninteger value, its storage type is changed to float or double according to the current set type setting.
    4. If a variable of integer type is replaced with an integer value that is too large or too small for its current storage type, it is promoted to a longer type (int, long, or double).
    5. When needed, str# variables are promoted to a longer str# type or to strL.
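Since readreplace is described as promoting storage types in much the same way as replace, the built-in replace command can illustrate rules 3 to 5. The sketch below uses invented variable names and assumes the default set type setting (float); it shows replace's behavior, not readreplace's own output.

```stata
* Sketch using the built-in replace command; variable names are invented.
clear
set obs 1
generate byte small = 1
generate byte whole = 1
generate str3 name = "abc"

* An integer too large for byte: the variable is promoted to int.
replace small = 1000

* A noninteger value: the variable is promoted according to set type.
replace whole = 1.5

* A longer string: the variable is promoted to a longer str# type.
replace name = "abcdef"

describe
```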

Examples

Make the changes specified in correctedValues.csv
    . use firstEntry
    . readreplace using correctedValues.csv, id(uniqueid) variable(question) value(correctvalue)

Same as the previous readreplace command, but specifies option case to insheet to import the replacements file
    . use firstEntry
    . readreplace using correctedValues.csv, id(uniqueid) variable(Question) value(CorrectValue) import(case)

Same as the previous readreplace command, but loads the replacements file as a Stata dataset
    . use firstEntry
    . readreplace using correctedValues.dta, id(uniqueid) variable(Question) value(CorrectValue) use
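The examples above do not cover the Excel format. A self-contained sketch might look like the following, assuming readreplace is installed; the file name correctedValues.xlsx and all data values are invented, and the file is written to the current working directory. The firstrow option is passed through import() so that import excel reads variable names from the first row.

```stata
* Sketch only: assumes readreplace is installed (ssc install readreplace).
* The file name and data values are invented for illustration.
clear
input uniqueid str10 Question CorrectValue
105 "district" 13
125 "age"       2
end
export excel using "correctedValues.xlsx", firstrow(variables) replace

* A small dataset standing in for the first entry.
clear
input uniqueid district age
105 10 41
125 11 20
end

* Option excel selects import excel; import() passes options through to it.
readreplace using "correctedValues.xlsx", id(uniqueid) variable(Question) value(CorrectValue) excel import(firstrow)
list, noobs
```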

Stored results

readreplace stores the following in r():

    Scalars
        r(N)          number of real changes

    Macros
        r(varlist)    variables replaced

    Matrices
        r(changes)    number of real changes by variable
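As a sketch of how the stored results might be inspected after a run, assuming readreplace is installed and that option use accepts a temporary file path (data values are invented):

```stata
* Sketch only: assumes readreplace is installed; data are invented.
clear
input uniqueid str10 Question CorrectValue
105 "district" 13
125 "age"       2
end
tempfile replacements
save `replacements'

clear
input uniqueid district age
105 10 41
125 11 20
end
readreplace using `replacements', id(uniqueid) variable(Question) value(CorrectValue) use

* Inspect everything readreplace left in r().
return list
display "Real changes made: " r(N)
display "Variables replaced: `r(varlist)'"
matrix list r(changes)
```

Checking r(N) against the number of corrections you expected is one simple way to confirm that every replacement matched an observation.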

Authors

Ryan Knight
Matthew White

For questions or suggestions, submit a GitHub issue or e-mail researchsupport@poverty-action.org.

Also see

Help: [D] generate

User-written: cfout, bcstats, mergeall
