Web scraping from the US General Services Administration website
.gitignore
1983-2013.csv
README.md
Script.R

Web-Scraping-from-USGSA

Purpose:

Scrape rule data from the US General Services Administration website: http://www.reginfo.gov

The US General Services Administration website contains detailed information on the regulations and rules issued by different government branches over the years.

A sample rule page: http://www.reginfo.gov/public/do/eAgendaViewRule?pubId=200210&RIN=1125-AA38

Title: Protective Orders in Immigration Administration Proceedings
Abstract: This rule amends regulations governing the Executive Office for Immigration Review (EOIR) by authorizing immigration judges to issue protective orders to limit public disclosure of sensitive law enforcement ...
Timetable: ...
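
The sample link above carries two query parameters, pubId and RIN. As a minimal sketch (the helper name build_rule_url is hypothetical and is not claimed to be part of Script.R), a rule page URL could be assembled like this:

```r
# Base URL taken from the sample rule link above.
base_url <- "http://www.reginfo.gov/public/do/eAgendaViewRule"

# Hypothetical helper (an assumption, not from Script.R):
# build the URL of one rule page from its publication ID and RIN.
build_rule_url <- function(pub_id, rin) {
  sprintf("%s?pubId=%s&RIN=%s", base_url, pub_id, rin)
}

# Reproduces the sample link shown above.
sample_url <- build_rule_url("200210", "1125-AA38")
print(sample_url)
```

Because sprintf is vectorized, the same helper also accepts vectors of publication IDs and RINs.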

The timetable information is valuable to many researchers in public administration, but the website offers no convenient way to view this data (each action taken on a rule and its corresponding date) for all rules on a single page.

This project scrapes that data into the following format:

Column names: Action, Date, RIN
Row 1: ...
Row 2: ...
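
To make the layout above concrete, here is a small sketch of how per-rule timetables might be stacked into one table with the columns Action, Date and RIN. The helper timetable_rows is hypothetical, and the action and date values are placeholders, not real timetable entries; only the RIN 1125-AA38 comes from the sample rule.

```r
# Hypothetical helper (an assumption, not from Script.R):
# turn one rule's actions and dates into rows tagged with its RIN.
timetable_rows <- function(actions, dates, rin) {
  stopifnot(length(actions) == length(dates))
  data.frame(
    Action = actions,
    Date = dates,
    RIN = rep(rin, length(actions)),
    stringsAsFactors = FALSE
  )
}

# Placeholder values only; they are not real timetable entries.
rule_a <- timetable_rows(c("<action 1>", "<action 2>"),
                         c("<date 1>", "<date 2>"),
                         "1125-AA38")
rule_b <- timetable_rows("<action 1>", "<date 1>", "<another RIN>")

# Stack the per-rule tables into a single table.
all_rows <- do.call(rbind, list(rule_a, rule_b))
print(all_rows)
```

Keeping the RIN on every row means the combined table can later be filtered or grouped by rule without a separate lookup.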

Usage:

To use the script, make sure R (http://www.r-project.org/) is installed on your PC or Mac, then follow these steps:

  1. Download all the files and put them in R's working directory
  2. Make sure all files are in the same folder
  3. Run the whole script first
  4. Run the function totalData(start, end); start and end must be integers in the range 1 to 43150.
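
Since the full range covers 43150 entries, it may be more practical to work through it in smaller pieces. The sketch below is only an illustration: make_batches is a hypothetical helper, the requirement that start not exceed end is an assumption, and the return value of totalData is not documented here, so the call is left commented out.

```r
max_index <- 43150L  # upper bound stated in step 4 above

# Hypothetical helper (an assumption, not from Script.R):
# validate the range and split it into consecutive batches.
make_batches <- function(start, end, size = 500L) {
  stopifnot(
    length(start) == 1, length(end) == 1,
    start == as.integer(start), end == as.integer(end),
    start >= 1, end <= max_index, start <= end, size >= 1
  )
  starts <- seq(start, end, by = size)
  ends <- pmin(starts + size - 1, end)
  data.frame(start = starts, end = ends)
}

batches <- make_batches(1, 43150, size = 10000)
print(batches)

# After running Script.R, each batch could then be passed to totalData():
# source("Script.R")
# for (i in seq_len(nrow(batches))) {
#   part <- totalData(batches$start[i], batches$end[i])
# }
```

Working in batches also makes it easier to pause between requests and to resume after a failure without starting again from 1.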

If you are curious where the RIN data came from, you can check this page. Click "Download All RIN Data in XML" to get an XML file; all the RINs were taken from that file.
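
As a rough sketch of that last step, RINs could be pulled out of the XML text by pattern. The layout of the downloaded file is not described here, so the sample string below is hypothetical, and the pattern (four digits, a hyphen, four letters or digits, as in 1125-AA38) is an assumption based on the sample rule. This is not necessarily how Script.R does it.

```r
# Hypothetical helper (an assumption, not from Script.R):
# collect every distinct RIN-shaped string found in the text.
extract_rins <- function(xml_text) {
  pattern <- "\\b[0-9]{4}-[A-Z0-9]{4}\\b"
  matches <- regmatches(xml_text, gregexpr(pattern, xml_text, perl = TRUE))
  unique(unlist(matches))
}

# Hypothetical XML snippet; the real file's element names may differ.
sample_xml <- paste0(
  "<RULES>",
  "<RULE><RIN>1125-AA38</RIN></RULE>",
  "<RULE><RIN>1125-AA38</RIN></RULE>",
  "</RULES>"
)

rins <- extract_rins(sample_xml)
print(rins)

# For a downloaded file, the text could be read first, for example:
# xml_text <- paste(readLines("<path to the XML file>"), collapse = "\n")
# rins <- extract_rins(xml_text)
```

Matching on the RIN pattern rather than on a tag name avoids guessing the file's schema, at the cost of possibly picking up other strings of the same shape.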

Warning:

Please BE SURE to check the US General Services Administration's policy on web scraping before you use this script.