sunlightlabs/tcorps-earmarks

Basic Preparation

These steps assume you have already prepared individual documents, with one document per task. To digitize 130 pieces of data, prepare 130 documents.

  • Add all documents to the data/docs folder. Files in this folder are ignored by git.
  • Run "rake data:docs:load_into_db". This creates a Document for each file whose filename does not already appear in the Document table as a Document's source_file. Do not delete the documents from data/docs yet.
  • Run "rake data:scribd:populate" to upload to Scribd every Document that has not yet been sent there. You can now delete the documents in data/docs.
  • Run "rake data:scribd:update_plain_text" to fetch the plain text for each document through the Scribd API. You may want to wait a little first, so that Scribd has time to finish processing the documents you uploaded in the previous step. If Scribd hasn't processed a document yet, the rake task reports that document's status, and you can run the task again later.
  • Run "rake data:backup:all" to back up the Legislator and Document tables to YAML. The backup directories are ignored by the repository. You can transfer these files to another machine so that (for example) your staging and production machines can use the same Scribd documents you've already created.
    • If you add columns to either model and want them backed up, edit lib/data_backup_helper.rb and update the arrays of stored fields.
  • To restore the data you've backed up (on a staging or production machine, for example), run "rake data:restore:all".
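
To illustrate two of the behaviors described above, here is a minimal Ruby sketch. It is not the project's actual code: the method names, the STORED_FIELDS constant, and the plain_text field are assumptions made for illustration, and records are plain hashes rather than ActiveRecord models. It shows how the load step can skip files that already have a Document (matching on source_file), and how a backup can write only a listed set of fields to YAML, which is roughly what the arrays of stored fields in lib/data_backup_helper.rb are described as controlling.

```ruby
require "yaml"

# Hypothetical stand-in for the arrays of stored fields that the README says
# live in lib/data_backup_helper.rb. Only "source_file" is named in the
# README; "plain_text" is an assumed field name.
STORED_FIELDS = {
  "Document" => %w[source_file plain_text]
}.freeze

# Return the filenames that do not yet appear as any Document's source_file.
# Running this twice with the same inputs yields the same result, so the
# load step can be repeated without creating duplicates.
def files_needing_documents(filenames, existing_source_files)
  filenames - existing_source_files
end

# Serialize only the listed fields of each record to a YAML string.
# Records are plain hashes with string keys in this sketch.
def dump_records(model_name, records)
  fields = STORED_FIELDS.fetch(model_name)
  records.map { |record| record.slice(*fields) }.to_yaml
end

# Parse a YAML backup back into an array of attribute hashes.
def load_records(yaml_text)
  YAML.load(yaml_text)
end
```

In this sketch, a column that is missing from STORED_FIELDS is silently dropped from the backup, which is why a newly added column would need to be added to the field list before it survives a backup and restore.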
