Web scraping from the US General Services Administration website: http://www.reginfo.gov
The website of the US General Services Administration contains rich, detailed information on the regulations and rules issued by different government branches over the years.
A sample rule looks like this: http://www.reginfo.gov/public/do/eAgendaViewRule?pubId=200210&RIN=1125-AA38
Title: Protective Orders in Immigration Administration Proceedings
Abstract: This rule amends regulations governing the Executive Office for Immigration Review (EOIR) by authorizing immigration judges to issue protective orders to limit public disclosure of sensitive law enforcement ...
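The sample URL above suggests that a rule page is addressed by two query parameters, pubId and RIN. As a minimal sketch (written in Python for illustration, not taken from the project's R script), such a URL could be assembled like this; the function name rule_url is hypothetical, and the assumption that every rule follows the same URL pattern is inferred from the single sample only.

```python
from urllib.parse import urlencode

# Base endpoint copied from the sample rule URL.
BASE_URL = "http://www.reginfo.gov/public/do/eAgendaViewRule"


def rule_url(pub_id: str, rin: str) -> str:
    """Build the URL of one rule page from its publication ID and RIN.

    Assumption: all rule pages use the same pubId/RIN query pattern
    as the sample URL.
    """
    query = urlencode({"pubId": pub_id, "RIN": rin})
    return f"{BASE_URL}?{query}"


print(rule_url("200210", "1125-AA38"))
```

With the values from the sample, this reproduces the sample URL exactly.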
The timetable information for each rule is important and valuable to many researchers working in public administration, but the website does not provide a convenient way to view this data, namely each action taken on a rule and its corresponding date, for all rules on a single page.
This project scrapes that data into the following format:
Column names: Action, Date, RIN
Row 1: ...
Row 2: ...
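To make the target layout concrete, here is a small Python sketch that flattens per-rule timetable entries into rows with the three columns above and writes them as CSV. It is not the project's R code. The helper names to_rows and to_csv are hypothetical, and the action and date strings are placeholders rather than real timetable data.

```python
import csv
import io

# Hypothetical scraped records: one list of (action, date) pairs per RIN.
# The action and date strings are placeholders, not real timetable data.
timetables = {
    "1125-AA38": [("<action 1>", "<date 1>"), ("<action 2>", "<date 2>")],
}


def to_rows(timetables):
    """Flatten {RIN: [(action, date), ...]} into Action/Date/RIN rows."""
    return [
        {"Action": action, "Date": date, "RIN": rin}
        for rin, entries in timetables.items()
        for action, date in entries
    ]


def to_csv(rows) -> str:
    """Serialize the rows as CSV text with the column names as header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["Action", "Date", "RIN"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


print(to_csv(to_rows(timetables)))
```

Repeating the RIN on every row keeps the table flat, so actions from many rules can be stacked in one file and still be traced back to their rule.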
To use the script, make sure you have R (http://www.r-project.org/) installed on your PC or Mac, then follow these steps:
- Download all the files and put them in the working directory of R
- Make sure all files are in the same folder
- Run the whole script first
- Run the function totalData(start, end); start and end must be integers in the range 1 to 43150.
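The constraint on start and end in the last step can be expressed as a short check. The sketch below is in Python purely for illustration, and the same test could be written in R before calling totalData. The name check_range is hypothetical, and the requirement that start not exceed end is an assumption, since the steps only state the 1 to 43150 range.

```python
MAX_INDEX = 43150  # upper bound stated in the usage steps


def check_range(start, end):
    """Return (start, end) if both are valid, otherwise raise ValueError."""
    for name, value in (("start", start), ("end", end)):
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            raise ValueError(f"{name} must be an integer, got {value!r}")
        if not 1 <= value <= MAX_INDEX:
            raise ValueError(
                f"{name} must be between 1 and {MAX_INDEX}, got {value}"
            )
    if start > end:  # assumption: the script expects start <= end
        raise ValueError("start must not be greater than end")
    return start, end


print(check_range(1, 100))
```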
If you are curious where the RIN data came from, you can check this page. Click "Download All RIN Data in XML" to get an XML file. All the RIN numbers were extracted from that XML file.
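As a rough illustration of that extraction step, the Python sketch below collects the text of every RIN element from an XML string. It is not the project's R code. The element names (REGINFO_RIN_DATA, RIN_INFO, RIN) are assumptions about the file's layout and should be checked against the downloaded file, and the second RIN in the sample is a made-up placeholder.

```python
import xml.etree.ElementTree as ET

# Tiny stand-in document. The element names are assumptions; inspect the
# downloaded file and adjust the tag argument to its actual structure.
# "0000-XX00" is a made-up placeholder, not a real RIN.
SAMPLE_XML = """<REGINFO_RIN_DATA>
  <RIN_INFO><RIN>1125-AA38</RIN></RIN_INFO>
  <RIN_INFO><RIN>0000-XX00</RIN></RIN_INFO>
  <RIN_INFO><RIN>1125-AA38</RIN></RIN_INFO>
</REGINFO_RIN_DATA>"""


def extract_rins(xml_text: str, tag: str = "RIN") -> list:
    """Return the unique, non-empty texts of all <tag> elements, in order."""
    root = ET.fromstring(xml_text)
    rins = [(element.text or "").strip() for element in root.iter(tag)]
    # dict.fromkeys removes duplicates while preserving first-seen order.
    return list(dict.fromkeys(rin for rin in rins if rin))


print(extract_rins(SAMPLE_XML))
```

For the real file, ET.parse(path).getroot() could replace ET.fromstring so the file is read from disk instead of from a string.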
Please BE SURE to check the US General Services Administration's policy on web scraping before you use this script.
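One mechanical signal that can be checked is a site's robots.txt file, although it does not replace reading the site's terms or asking the operator. The Python sketch below uses the standard library's urllib.robotparser to test a URL against robots.txt rules supplied as text. The helper name is_allowed is hypothetical, and the example rules are made up for illustration; they are not the actual robots.txt of reginfo.gov.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Check whether robots.txt rules (given as text) permit fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Made-up rules for illustration only, NOT the site's real robots.txt.
EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

print(is_allowed(EXAMPLE_ROBOTS, "http://www.reginfo.gov/public/do/eAgendaViewRule"))
print(is_allowed(EXAMPLE_ROBOTS, "http://www.reginfo.gov/private/page"))
```

To check the live rules, the text of the site's robots.txt would have to be downloaded first and passed in as robots_txt.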