
Python and R scripts that read the bound Congressional Record and process it into a workable CSV. Provided is a sample CSV of the first three hundred pages from the 83rd Congress.


C-SPAN/crParse

IndexReader

The Python files process the bound record into a .txt file, and the R script then parses the text file into a CSV. The final output should be a CSV with the columns Name, Topic, Page (the page of the index), and NoAmends (the number of pages minus amendments).
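
As a rough illustration of the text-to-CSV step, here is a minimal Python sketch (the repository does this step in R). It assumes a hypothetical layout for the intermediate text file: a `=== INDEX PAGE n ===` marker at the start of each index page, an all-caps line for each member name, and topic lines of the form `Topic, page, page, ...`. The names `parse_index` and `write_csv` are illustrative only, and NoAmends is computed under the assumption that rows whose topic mentions an amendment contribute zero pages; the R script's actual rules may differ.

```python
import csv
import io
import re

# ASSUMPTION: hypothetical layout of the intermediate .txt file, not taken
# from the repository.
PAGE_MARKER = re.compile(r"^=== INDEX PAGE (\d+) ===$")  # start of an index page
NAME_LINE = re.compile(r"^[A-Z][A-Z .,'-]+$")             # e.g. "SMITH, JOHN"
ENTRY_LINE = re.compile(r"^(?P<topic>.+?),\s*(?P<pages>\d+(?:,\s*\d+)*)$")

FIELDS = ["Name", "Topic", "Page", "NoAmends"]


def parse_index(text):
    """Turn the intermediate text into rows with the Name, Topic, Page, NoAmends columns."""
    rows = []
    index_page = None  # index page currently being read
    name = None        # member name that the following topic lines belong to
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        marker = PAGE_MARKER.match(line)
        if marker:
            index_page = int(marker.group(1))
            continue
        if NAME_LINE.match(line):
            name = line
            continue
        entry = ENTRY_LINE.match(line)
        if entry and name is not None:
            topic = entry.group("topic")
            pages = re.findall(r"\d+", entry.group("pages"))
            # ASSUMPTION: page references under an amendment topic are not
            # counted, one reading of "number of pages minus amendments".
            no_amends = 0 if "amendment" in topic.lower() else len(pages)
            rows.append({"Name": name, "Topic": topic,
                         "Page": index_page, "NoAmends": no_amends})
        # Lines matching none of the patterns (e.g. OCR noise) are skipped.
    return rows


def write_csv(rows, out):
    """Write parsed rows to a file-like object as CSV with a header row."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


sample = """=== INDEX PAGE 1 ===
SMITH, JOHN
Agriculture, 101, 205, 310
Amendment offered, 412
"""

buf = io.StringIO()
write_csv(parse_index(sample), buf)
print(buf.getvalue())
```

Lines that match none of the assumed patterns are dropped silently, which keeps the sketch short but would hide OCR errors; a real parser would probably log them for review.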

Future improvements would include making the regex commands more robust so that they handle incorrect OCR readings. There may be an upper limit to the progress that can be made on that front, since the scan quality of the bound record is low. Reading the document in perfectly may require redoing the scans at a higher resolution.
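
One possible way to make the regexes more tolerant of OCR errors is to accept page-number tokens that contain letters OCR commonly confuses with digits, then map those letters back to digits. The Python sketch below is illustrative only: the confusion set (`l`/`I` for 1, `O`/`o` for 0, `S` for 5, `B` for 8) and the names `OCR_DIGIT_FIXES`, `PAGE_TOKEN`, and `extract_pages` are assumptions, not taken from the repository, and the real set would need to be tuned against the actual scans.

```python
import re

# ASSUMPTION: letters that OCR commonly confuses with digits; tune on real scans.
OCR_DIGIT_FIXES = str.maketrans({"l": "1", "I": "1", "O": "0", "o": "0",
                                 "S": "5", "B": "8"})

# A standalone token made of digits and confusable letters that contains at
# least one real digit, so ordinary words such as "Solo" are not matched.
PAGE_TOKEN = re.compile(r"\b(?=[0-9lIOoSB]*\d)[0-9lIOoSB]+\b")


def extract_pages(line):
    """Return the page numbers in a line, repairing letter-for-digit OCR swaps."""
    return [int(tok.translate(OCR_DIGIT_FIXES)) for tok in PAGE_TOKEN.findall(line)]


print(extract_pages("Agriculture, 1O1, 2O5, 3l0"))
```

Requiring at least one real digit is a trade-off: it avoids turning words into numbers, but a token in which every digit was misread (for example `lOl`) will still be missed.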

Bound records can be found at https://www.govinfo.gov/app/collection/crecb

Languages

  • R 54.1%
  • Python 45.9%