
Data from Bowen, Fresard, and Hoberg (2021) and code to reproduce results


RETech is "Rapidly Evolving Technology"

This folder produces a replication of the analysis in Rapidly Evolving Technologies and Startup Exits (SSRN link) by Donald Bowen, Gerard Hoberg, and Laurent Fresard, which is forthcoming in Management Science. Please cite that study when using or referring to any data or code in this repository.

The "front door" to this repo is my website, which contains more background information on the measures.

⭐ ⭐ ⭐ ⭐ ⭐ ⭐ ⭐ ⭐ ⭐ ⭐ ⭐ ⭐ ⭐ ⭐ ⭐ ⭐ ⭐ ⭐ ⭐ ⭐ ⭐ ⭐ ⭐

Most visitors to this page just want a patent-level dataset with RETech.

If so, follow this link to download the latest data!

⭐ ⭐ ⭐ ⭐ ⭐ ⭐ ⭐ ⭐ ⭐ ⭐ ⭐ ⭐ ⭐ ⭐ ⭐ ⭐ ⭐ ⭐ ⭐ ⭐ ⭐ ⭐ ⭐

Please see the paper for details on the construction of the samples and measures. Questions can be directed to Donald Bowen. Pointers to errors or omissions, corrections, and suggestions are welcome.

Replication, plus data on patents and startups

Replication requires three principal files:

  1. Stata code to reproduce all tables and figures in the paper.
    • This uses the two key datasets described next plus some less important datasets in the "auxilliary" data subfolder and code in the subroutines folder. Results are stored in the output folder. Click me to download everything you need to replicate the paper!
    • All estimation results are stored in one excel file (output/bfh-tables.xlsx).
  2. Patent-level data with patents applied for between 1930 and 2010 and granted by 2013 with many variables of interest, including a link to the startup.
    • This is not the raw data: all patent-level variables are winsorized at the 1st and 99th percentiles within each year. The citation and KPSS variables are winsorized by grant year, and the remaining variables are winsorized by application year. If you are interested in raw data, please follow the big link above to the updated patent data files.
    • Because pat_lv.dta is 1.3GB, it's not stored here. You can download it by (A) Clicking this link or (B) Downloading this folder to your computer and running, which starts by downloading what you need.
  3. A startup-quarter panel (startup_qtr_panel.dta) for 1980-2010 with time-varying information on startups that receive at least one patent during the sample period. Please note that observations up through 2017 are in the dataset because our dependent variables are forward-looking relative to our independent variables and were available after 2010. This file is not included here, as it contains licensed data. Please email us if your institution has a license for VenturExpert, SDC, and Dealscan.
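The per-year winsorization described for the patent-level data can be illustrated with a short sketch. It is written in Python for illustration only and is not the repository's Stata code. The function names, the row layout (a list of dicts), the `cites` and `year` keys, and the linear-interpolation percentile rule are all assumptions; Stata's percentile conventions may differ slightly at the cutoffs.

```python
from collections import defaultdict


def percentile(sorted_vals, q):
    """Percentile q (0-100) of an already sorted list, by linear interpolation.

    Assumption: this interpolation rule is for illustration and may not
    match the rule used by the paper's Stata code.
    """
    if not sorted_vals:
        raise ValueError("cannot take a percentile of an empty list")
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def winsorize_by_group(rows, value_key, group_key, lower=1.0, upper=99.0):
    """Clip value_key to the [lower, upper] percentiles within each group.

    rows is a list of dicts (hypothetical layout). Returns new dicts and
    leaves the input rows unchanged.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row[value_key])

    # Compute the clipping bounds separately for every group (e.g. year).
    bounds = {}
    for key, vals in groups.items():
        vals.sort()
        bounds[key] = (percentile(vals, lower), percentile(vals, upper))

    out = []
    for row in rows:
        lo, hi = bounds[row[group_key]]
        clipped = dict(row)
        clipped[value_key] = min(max(row[value_key], lo), hi)
        out.append(clipped)
    return out


# Hypothetical example: winsorize a citation count by grant year.
example = [{"year": 2000, "cites": float(v)} for v in range(101)]
print(winsorize_by_group(example, "cites", "year")[0])
```

Because the bounds are computed per group, passing the grant year as `group_key` for citation-type variables and the application year for the others would mirror the split described above.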

Using patent measures in your studies

  1. This repository contains a Stata function to convert patent-level variables into group-time variables (e.g. firm-year, state-year, MSA-quarter). We include the stocking function from our paper, which gets the group's average patent stats over the prior five years, after applying a 20% rate of depreciation.
  2. A companion repository, Patent-Text-Variables (or here), is available. It contains EIGHT MORE YEARS of patent-level RETech and Tech Breadth, covering patents granted through last year, and will be updated annually. The companion repo also includes code to
    • Download all google patent pages
    • Parse the patent text in those webpages into (cleaned) "bags of words"
    • Construct textual variables at the patent-level from word bags
    • Convert patent level variables into group-time variables (e.g. firm-year, state-year, MSA-quarter)
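To make the stocking idea in item 1 concrete, here is a minimal Python sketch of one possible reading: a depreciation-weighted average of a group's patent values over the prior five years. It is not the repository's Stata function. The name `depreciated_average`, the tuple layout, and the weighting convention (weight 1 for the prior year, multiplied by 0.8 for each additional year of age) are assumptions; the paper and the Stata code are the authority on the exact formula.

```python
def depreciated_average(patents, group, year, window=5, rate=0.20):
    """Depreciation-weighted average of patent values for one group-year.

    patents: iterable of (group, patent_year, value) tuples (hypothetical layout).
    Only patents from the `window` years before `year` are used.
    Assumed weights: (1 - rate) ** (age - 1), where age = year - patent_year.
    Returns None when the group has no patents in the window.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for g, patent_year, value in patents:
        age = year - patent_year
        if g == group and 1 <= age <= window:
            weight = (1.0 - rate) ** (age - 1)
            weighted_sum += weight * value
            weight_total += weight
    if weight_total == 0.0:
        return None
    return weighted_sum / weight_total


# Hypothetical example: firm "A" has patents in 2003 and 2004.
patents = [("A", 2004, 1.0), ("A", 2003, 2.0), ("A", 1998, 100.0)]
print(depreciated_average(patents, "A", 2005))
```

The same function works for any grouping (firm, state, MSA) and any time unit, as long as `year` and `patent_year` are measured in the same units and `window` is set accordingly.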

