npdoty/rfc-analysis

Analyzing RFCs and I-Ds

This project develops code for the automated analysis of the text of Requests for Comments (RFCs) published by the Internet Engineering Task Force, as part of a larger research project studying privacy in technical standard-setting.

For more information, if you want to use these tools or collaborate on their development, please contact Nick Doty.

Some basic graphs produced with this code are available online.

Usage

The scripts are not fully parameterized or user-friendly. Current usage pattern:

  • clone the repository
  • download all RFCs (see "Getting the documents" below) as .txt into an RFC-all directory within the main directory of the repository
  • configure by copying config.ini.example to config.ini and pointing it to your downloaded RFCs
  • python search.py --rfc will create a file rfc-search.json with section titles and lengths and word search counts for every available RFC
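As a rough illustration of the configuration step, the sketch below reads a directory path from an INI file using Python's standard configparser module. The section name `paths` and the key `rfc_dir` are hypothetical placeholders, not the names used in this repository; check config.ini.example for the real ones.

```python
import configparser
from pathlib import Path


def load_rfc_dir(config_path, section="paths", key="rfc_dir"):
    """Return the RFC directory named in an INI file.

    The section and key names are hypothetical defaults chosen for
    this sketch; they are not taken from config.ini.example.
    """
    # Disable interpolation so a literal '%' in a path is not misread.
    parser = configparser.ConfigParser(interpolation=None)
    if not parser.read(config_path):
        raise FileNotFoundError(f"could not read config file: {config_path}")

    rfc_dir = Path(parser[section][key]).expanduser()
    if not rfc_dir.is_dir():
        raise NotADirectoryError(f"RFC directory does not exist: {rfc_dir}")
    return rfc_dir
```

Calling `load_rfc_dir("config.ini")` would then return the configured directory as a `Path`, or fail early with a clear error if the file or directory is missing.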

Other functionality:

  • search.py can do basic string matching against all RFCs (or similar code for all W3C TRs)
  • search.py --id does the same parsing for Internet-Drafts if you've rsynced them (and added that directory to your config.ini)
  • the graphs/ directory contains d3.js visualizations of some of the measurements
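To give a sense of what basic string matching across a directory of RFC text files can look like, here is a minimal sketch. It is not the code in search.py; the function name `count_term` and its parameters are assumptions made for illustration, and it simply counts substring occurrences in every .txt file.

```python
from pathlib import Path


def count_term(rfc_dir, term, ignore_case=True):
    """Return {filename: occurrences} for .txt files that contain term.

    Hypothetical helper for illustration; not part of this repository.
    """
    if not term:
        raise ValueError("term must be a non-empty string")

    needle = term.lower() if ignore_case else term
    counts = {}
    for path in sorted(Path(rfc_dir).glob("*.txt")):
        # Older RFCs are not always valid UTF-8, so replace bad bytes
        # instead of failing on them.
        text = path.read_text(encoding="utf-8", errors="replace")
        if ignore_case:
            text = text.lower()
        hits = text.count(needle)
        if hits:
            counts[path.name] = hits
    return counts
```

For example, `count_term("RFC-all", "privacy")` would return a dictionary mapping each matching file name to the number of times the term appears in it. Plain substring counting also matches inside longer words, so a word-boundary regular expression may be preferable for some analyses.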

Getting the documents

There are several thousand RFCs and many more drafts and other IETF docs. You can download some or all of those documents for easier local analysis.

Rsync all the documents via ietf-cli

Clone ietf-cli, put its config file in an appropriate location (specifying where you want the documents synced), and run ./ietf mirror to download all RFCs, drafts, and some minutes and other documents. It's more than 2 GB of data and takes at least a few minutes to download.

Just download the RFCs

The RFC Editor maintains zip and tar files of all the RFCs, in TXT and PDF formats, for download with your browser. The compressed RFC-all.zip file is a couple hundred megabytes.
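Once the archive has been downloaded, the .txt files need to end up in the RFC-all directory. The sketch below uses Python's standard zipfile module and assumes the zip was saved locally; the function name `extract_txt` and the flat output layout are choices made for this example, not something the repository prescribes.

```python
import shutil
import zipfile
from pathlib import Path


def extract_txt(zip_path, dest_dir):
    """Extract only the .txt members of a zip archive into dest_dir.

    Hypothetical helper for illustration. Files are written flat into
    dest_dir (any directory structure inside the archive is dropped),
    which also keeps member names from escaping the destination.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)

    extracted = []
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            name = Path(info.filename).name
            if info.is_dir() or not name.lower().endswith(".txt"):
                continue  # skip directories, PDFs, and other formats
            with archive.open(info) as src, open(dest / name, "wb") as out:
                shutil.copyfileobj(src, out)
            extracted.append(name)
    return sorted(extracted)
```

For example, `extract_txt("RFC-all.zip", "RFC-all")` would populate the RFC-all directory and return the sorted list of extracted file names.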
