This repository has been archived by the owner on Nov 7, 2018. It is now read-only.

Downloader for FOIAonline #23

Merged 32 commits on Aug 28, 2014
cd93d78
renaming data/ to documents/
konklone Aug 13, 2014
592b617
shell
konklone Aug 13, 2014
87f573a
allow import if needed
konklone Aug 13, 2014
9d5746d
ignore documents/ now
konklone Aug 13, 2014
c23c4cc
3.4.1
konklone Aug 13, 2014
5e5d285
Merge branch 'master' into foiaonline
konklone Aug 19, 2014
b375414
just to suppress a warning
konklone Aug 19, 2014
8853e39
moving stuff around
konklone Aug 19, 2014
f4cd20b
use urlretrieve, update scrapelib
konklone Aug 19, 2014
1c81de2
one more shuffle
konklone Aug 19, 2014
6394447
ignoring things
konklone Aug 19, 2014
05beedd
session preservation and posting works
konklone Aug 19, 2014
13be975
okay we are saving metadata to disk now this is good
konklone Aug 20, 2014
108574e
now it can download all the pages
konklone Aug 20, 2014
e5a6001
add appeals/referrals, document agency slugs, don't overwrite meta
konklone Aug 20, 2014
fcf44d7
dang - assumption that tracking number was unique per-document very w…
konklone Aug 21, 2014
9d6c1e3
helper script to archive metadata for foiaonline
konklone Aug 21, 2014
cfe4ca6
downloading record PDFs and extracting text
konklone Aug 21, 2014
fe20c71
some basic dir iteration over downloaded metadata
konklone Aug 21, 2014
f8d5833
scraping more metadata, handling other file types, not caching landin…
konklone Aug 22, 2014
b416eaf
ignore exemption 5 subtypes explicitly for now
konklone Aug 22, 2014
252ed8c
allow scoping of data fetching
konklone Aug 22, 2014
4f96a67
added resume mode to data download for records
konklone Aug 25, 2014
a9a116e
comment more
konklone Aug 25, 2014
e44bd00
Handle unreleased docs
konklone Aug 25, 2014
ded048f
comments on file types
konklone Aug 25, 2014
2e43ab8
update resume language
konklone Aug 25, 2014
4f2abd5
always download docs as binary from foiaonline
konklone Aug 26, 2014
eb1aaac
okay ignore misc/
konklone Aug 27, 2014
5434f14
capture original release date, handle unexpected dates gracefully
konklone Aug 27, 2014
b9b4c43
not in use yet, but wrote a function to get header details for binaries
konklone Aug 27, 2014
43af6a7
Merge branch 'master' into foiaonline
konklone Aug 28, 2014
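
The commit messages above mention `urlretrieve`, always downloading documents as binary, and a resume mode for record downloads. The PR's `tasks/foiaonline.py` is not shown on this page, so what follows is only a minimal standard-library sketch of how those three ideas could fit together. The function name `download_binary`, its parameters, and the `.part` suffix are hypothetical and are not taken from the PR.

```python
import os
import urllib.request


def download_binary(url, dest, resume=True):
    """Fetch `url` to `dest` as raw bytes (hypothetical helper, not from the PR).

    Returns True if a download happened, False if it was skipped because
    `resume` is set and `dest` already exists.
    """
    if resume and os.path.exists(dest):
        # Resume mode: a file that is already on disk is not fetched again.
        return False

    dest_dir = os.path.dirname(dest)
    if dest_dir:
        os.makedirs(dest_dir, exist_ok=True)

    # urlretrieve copies the response body byte-for-byte, so PDFs and other
    # non-text files are not mangled by any text decoding.
    partial = dest + ".part"
    urllib.request.urlretrieve(url, partial)

    # Rename only after the transfer finishes, so an interrupted run does not
    # leave a truncated file that resume mode would later treat as complete.
    os.replace(partial, dest)
    return True
```

Under these assumptions, calling `download_binary(url, path)` a second time with the same `path` returns `False` without touching the network, which is one plausible reading of what "resume mode" means here.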
8 changes: 3 additions & 5 deletions .gitignore
@@ -1,11 +1,9 @@
/contacts/html
<<<<<<< HEAD
/documents/data
.DS_Store
=======
/data/pages
/data/data
__pycache__
*.pyc
node_modules
todo.txt
>>>>>>> 56a18c8a16cd008e8892b96076ca250d25c951e8
/documents/*.html
/misc
1 change: 0 additions & 1 deletion data/.python-version

This file was deleted.

3 changes: 0 additions & 3 deletions data/requirements.txt

This file was deleted.

1 change: 1 addition & 0 deletions documents/.python-version
@@ -0,0 +1 @@
3.4.1
File renamed without changes.
10 changes: 10 additions & 0 deletions documents/foiaonline-meta.sh
@@ -0,0 +1,10 @@
#!/bin/sh

python tasks/foiaonline.py --meta --term=epa
python tasks/foiaonline.py --meta --term=cbp
python tasks/foiaonline.py --meta --term=doc
python tasks/foiaonline.py --meta --term=mspb
python tasks/foiaonline.py --meta --term=flra
python tasks/foiaonline.py --meta --term=nara
python tasks/foiaonline.py --meta --term=pbgc
python tasks/foiaonline.py --meta --term=don
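
The helper script above repeats one command per agency term. As an illustration only, the same sequence could be driven from a list in Python. The names `AGENCY_TERMS`, `build_commands`, and `run_all` are hypothetical; only the `--meta` and `--term` flags and the eight terms come from the script. The use of `sys.executable` in place of `python` is also an assumption.

```python
import subprocess
import sys

# Agency terms copied from foiaonline-meta.sh, in the same order.
AGENCY_TERMS = ["epa", "cbp", "doc", "mspb", "flra", "nara", "pbgc", "don"]


def build_commands(terms, script="tasks/foiaonline.py"):
    """Return one argv list per term, mirroring the lines of the shell script."""
    return [
        [sys.executable, script, "--meta", "--term={}".format(term)]
        for term in terms
    ]


def run_all(terms, dry_run=True):
    """Run (or, with dry_run, just print) each command.

    Returns the list of terms whose command exited with a non-zero status.
    """
    failed = []
    for term, cmd in zip(terms, build_commands(terms)):
        if dry_run:
            print(" ".join(cmd))
            continue
        if subprocess.call(cmd) != 0:
            # Keep going so one failing agency does not block the rest.
            failed.append(term)
    return failed


if __name__ == "__main__":
    # Dry run by default: prints the eight commands without executing them.
    run_all(AGENCY_TERMS, dry_run=True)
```

Collecting failures instead of stopping at the first one is a design choice of this sketch; the shell script as written also continues past a failing command, since it does not set `-e`.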
1 change: 1 addition & 0 deletions data/package.json → documents/package.json
@@ -2,6 +2,7 @@
  "name": "state",
  "version": "0.0.0",
  "main": "state_pages.js",
  "repository": "https://github.com/18f/foia",
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  },
4 changes: 4 additions & 0 deletions documents/requirements.txt
@@ -0,0 +1,4 @@
ipython
scrapelib>=0.10
BeautifulSoup4
python-dateutil