IndexReader

Python and R scripts that read the bound Congressional Record and process it into a workable CSV. A sample CSV of the first three hundred pages from the 83rd Congress is provided.

The Python files process the bound record into a txt file, and the R script then parses the text file into a CSV. The final output should be a CSV with the columns Name, Topic, Page (page of the index), and NoAmends (number of pages minus amendments).
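In this repository the CSV stage is handled by the R script. The Python sketch below only illustrates the target schema. It assumes the index entries have already been parsed into dicts keyed by the four column names. The function name `write_index_csv` and the placeholder record are hypothetical, and the sketch does not reproduce how NoAmends is computed.

```python
import csv
import io
from typing import Iterable, Mapping, TextIO

# Column order described in this README for the final CSV.
FIELDNAMES = ["Name", "Topic", "Page", "NoAmends"]


def write_index_csv(records: Iterable[Mapping[str, object]], out: TextIO) -> int:
    """Write parsed index records to `out` as CSV and return the row count.

    `records` is a hypothetical iterable of dicts keyed by FIELDNAMES; the
    parsing step that produces them is not shown here.
    """
    writer = csv.DictWriter(out, fieldnames=FIELDNAMES)
    writer.writeheader()
    count = 0
    for record in records:
        # Values containing commas (e.g. "Last, First") are quoted automatically.
        writer.writerow(record)
        count += 1
    return count


if __name__ == "__main__":
    # Placeholder values for illustration only, not real index data.
    sample = [
        {"Name": "Example, Member", "Topic": "Example topic", "Page": 1, "NoAmends": 3},
    ]
    buf = io.StringIO()
    print(write_index_csv(sample, buf))
    print(buf.getvalue())
```

When writing to a real file, open it with `newline=""` (for example `open("out.csv", "w", newline="")`, where `out.csv` is a hypothetical name) so the csv module controls line endings.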

Future improvements would include making the regex commands more robust so that they handle incorrect OCR readings. There may be an upper limit to how much progress can be made on that front, since the scan quality of the bound record is low. Reading the document perfectly may require rescanning it at a higher resolution.
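One common way to make page-number regexes more tolerant of OCR errors is to accept look-alike characters (O for 0, l or I for 1, S for 5) and normalize them before converting. The sketch below is in Python rather than R and is not the repository's method. The character map and the `extract_pages` helper are hypothetical and would need tuning against real scans.

```python
import re

# Hypothetical map of characters that OCR often produces in place of digits.
OCR_DIGIT_FIXES = str.maketrans({"O": "0", "o": "0", "l": "1", "I": "1", "S": "5"})

# A standalone token of 1-5 characters drawn from digits and their look-alikes.
PAGE_TOKEN = re.compile(r"\b[0-9OolIS]{1,5}\b")


def extract_pages(line: str) -> list[int]:
    """Return page numbers found in `line`, correcting common OCR confusions."""
    pages = []
    for token in PAGE_TOKEN.findall(line):
        # Require at least one genuine digit so words like "SO" are not
        # mistaken for page numbers.
        if not any(ch.isdigit() for ch in token):
            continue
        fixed = token.translate(OCR_DIGIT_FIXES)
        if fixed.isdigit():
            pages.append(int(fixed))
    return pages


if __name__ == "__main__":
    print(extract_pages("Smith, John, l23, 4O5, 67"))
```

Requiring a real digit in each token is the main guard against false positives. It also means a page number misread entirely as letters is still missed, which is one reason rescanning at higher resolution may be the only complete fix.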

Bound records can be found at https://www.govinfo.gov/app/collection/crecb

Files in this repository:

.Rhistory
.gitattributes
Bound.csv
BoundRecordProject.R
Index_HALF.txt
README.md
readindex.py

C-SPAN/crParse

Folders and files

Latest commit

History

Repository files navigation

IndexReader

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages