Dictionary: Stata to R

Adam Ross Nelson edited this page Aug 29, 2017 · 17 revisions

The following table provides quick translations of common Stata commands into R. Because R can hold several data sets in memory at once, commands that access or modify data must name the data set they act on. The examples use mydata as the default data set.
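
As a minimal sketch of what naming the data set looks like in practice, the snippet below builds a small data frame and refers to its variables in three common ways. The values in mydata and the object names mean_dollar, mean_with and fit are made up for illustration only.

```r
# A small illustrative data frame; the values are invented for this example.
mydata <- data.frame(x = c(1, 2, 3, 4), y = c(2.1, 3.9, 6.2, 7.8))

# Three common ways to tell R which data set a variable lives in:
mean_dollar <- mean(mydata$x)            # 1. prefix the variable with mydata$
mean_with   <- with(mydata, mean(x))     # 2. evaluate an expression inside mydata
fit         <- lm(y ~ x, data = mydata)  # 3. pass data= to a modelling function

print(mean_dollar)
print(coef(fit))
```

The first two forms return the same value; which one to use is mostly a matter of readability when an expression mentions many variables.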

| Stata | R | Description |
|---|---|---|
| `cls` | `cat("\014")` or `cat(rep("\n",50))` | Clears Stata output / R console |
| `clear all` | `rm(list=ls())` | Clears data, value labels, etc. from memory |
| `insheet using "foo.csv", comma names` | `mydata <- read.csv("foo.csv", header=TRUE)` | Read a CSV file |
| `cd "mydirectory"` | `setwd("mydirectory")` | Change the working directory |
| `pwd` | `getwd()` | Display the working directory |
| `reg y x1 x2` | `summary(lm(y~x1+x2, data=mydata))` | Ordinary least squares with constant |
| `reg y x1 x2, nocon` | `summary(lm(y~x1+x2-1, data=mydata))` | Ordinary least squares without constant |
| `if (x==y) {...}` | `if (x==y) {...}` | Condition that determines whether the enclosed command(s) are executed |
| `reg y x if (x>0)` | `lm(y~x, data=subset(mydata,x>0))` | Estimate on a conditional subset of the data |
| `forvalues i=1/100 {...}` | `for (i in 1:100) {...}` | Loop through integer values of i from 1 to 100 |
| `foreach i in "a" "b" "c" {...}` | `for (i in c("a","b","c")) {...}` | Loop through a list of items |
| `di "Hello World"` | `print("Hello World")` | Prints "Hello World" on screen |
| `do "mydofile.do"` | `source("myRscript.R")` | Call and run a code file |
| `use "mydata.dta", clear` | `load("mydata.Rdata")` | Load saved workspace/data |
| `save "mydata.dta", replace` | `save.image("mydata.Rdata")` | Save current workspace/data |
| `di 2345^2` | `2345^2` | Calculate 2345 squared |
| `logit y x` | `summary(glm(y~x,data=mydata,family="binomial"))` | Perform logit maximum likelihood estimation |
| `probit y x` | `summary(glm(y~x,data=mydata,family=binomial(link = "probit")))` | Perform probit maximum likelihood estimation |
| `sort x y` | `mydata <- mydata[order(mydata$x, mydata$y),]` | Sort the data frame by x, then by y |
| `cor x y` | `cor(mydata$x, mydata$y)` | Correlation between x and y |
| `help command` | 1. `?command`<br>2. `help(command)` | Load the help file on a command |
| `edit` | `edit(mydata)` | Open data editor window (not recommended) |
| `summarize` | `summary(mydata)` | Provide summary values for data |
| `table x y` | 1. `table(mydata$x,mydata$y)`<br>2. `ftable(y~x,data=mydata)` | Two-way table |
| `hist x` | `hist(mydata$x)` | Histogram of variable x |
| `scatter x y` | `plot(x~y, data=mydata)` | Scatter plot of x (vertical axis) against y (horizontal axis) |
| `list` | `mydata` | Print to screen all of the values of the data frame |
| `list in 1/5` | 1. `head(mydata, 5)`<br>2. `mydata[1:5,]` | Print to screen the first 5 rows of data |
| `generate x2=x^2` | `mydata$x2 <- mydata$x^2` | Create a new variable x2 which is the square of x |
| `replace x=y1+y2` | 1. `mydata$x <- mydata$y1 + mydata$y2`<br>2. `mydata$x <- with(mydata, y1 + y2)` | Set x equal to y1+y2 |
| `forvalues i=1/10 {`<br>``di `i'``<br>`}` | `for (i in 1:10) print(i)` | Print count from 1 to 10 |
| `replace x=0 if x<0` | `mydata$x[mydata$x<0] <- 0` | Replace all values of x less than 0 with zero |
| `drop if x>100` | `mydata <- subset(mydata, !(x>100))` | Drop observations with x greater than 100 |
| `keep if x<100` | `mydata <- subset(mydata,x<100)` | Keep observations with x less than 100 |
| `drop x` | `mydata$x <- NULL` | Drop variable x from the data |
| `keep x` | `mydata <- mydata["x"]` | Keep only x in the data (as a one-column data frame) |
| `append using "mydata2.dta"` | `mydata <- rbind(mydata, mydata2)` | Append mydata2 to mydata |
| `merge 1:1 index using "mydata2.dta"` | `mydata <- merge(mydata, mydata2, by="index", all=TRUE)` | Merge two data sets by index variable(s); `all=TRUE` keeps unmatched rows, as Stata does by default |
| `set obs 1000`<br>`gen x=rnormal()` | `mydata$x <- rnorm(1000)` | Generate 1000 random normal draws |
| `set obs 1000`<br>`gen x=runiform()` | `mydata$x <- runif(1000)` | Generate 1000 random uniform draws |
| `set obs 1000`<br>`gen x=rbinomial(10,.1)` | `mydata$x <- rbinom(1000, 10, .1)` | Generate 1000 random binomial (10,.1) draws |
| `count` | `nrow(mydata)` | Count the number of observations in the data |
| `foreach v of varlist * {`<br>``rename `v' `v'old``<br>`}` | `names(mydata) <- paste0(names(mydata),"old")` | Append "old" to the name of every variable in the data |
| `rename oldvar newvar` | `colnames(mydata)[colnames(mydata)=="oldvar"] <- "newvar"` | Rename a variable |
| `set obs 100`<br>`gen x=rnormal()`<br>`gen y=x*2 + rnormal()*5` | `x <- rnorm(100)`<br>`mydata <- data.frame(x=x, y=x*2 + rnorm(100)*5)` | Simulate a new data set with y dependent upon x |
| `egen id = group(x y)` | 1. `mydata$id <- with(mydata, as.integer(interaction(x, y, drop=TRUE, lex.order=TRUE)))`<br>2. `mydata <- within(mydata, id <- as.integer(interaction(x, y, drop=TRUE, lex.order=TRUE)))` | Create a group identifier id from variables x and y |
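
Several of these translations can be chained into one short R session. The sketch below is only an illustration: the seed, the cut-off values, and the helper data frame grp with its columns a and b are assumptions made for this example, and each step carries the Stata command it mirrors as a comment.

```r
set.seed(42)  # arbitrary seed so the random draws are reproducible

# set obs 100 / gen x = rnormal() / gen y = x*2 + rnormal()*5
x <- rnorm(100)
mydata <- data.frame(x = x, y = x * 2 + rnorm(100) * 5)

mydata$x2 <- mydata$x^2              # generate x2 = x^2
mydata$y[mydata$y < 0] <- 0          # replace y = 0 if y < 0
mydata <- subset(mydata, x < 2)      # keep if x < 2
mydata <- mydata[order(mydata$x), ]  # sort x
n_obs <- nrow(mydata)                # count

fit <- lm(y ~ x + x2, data = mydata) # reg y x x2
print(summary(fit))

# egen id = group(a b): one integer per distinct (a, b) combination.
# grp is a hypothetical data frame used only to show the idea.
grp <- data.frame(a = c(1, 1, 2, 2, 1), b = c("u", "v", "u", "u", "u"))
grp$id <- as.integer(interaction(grp$a, grp$b, drop = TRUE, lex.order = TRUE))
print(grp)
```

Note that data-modifying steps assign their result back to mydata. Unlike Stata, R leaves the data frame unchanged unless the result is stored.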

Thanks to Sebastian Kranz I have been made aware of a document RStata.pdf by Oscar Torres-Reyna which provides a similar translation.

It is also worth considering Bob Muenchen's 542-page book "R for Stata Users".
