forked from typecode/wikileaks
-
Notifications
You must be signed in to change notification settings - Fork 1
anarchivist/cablegate
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
Wikileaks CableGate Processing mark@matienzo.org (based on work by andrew@typeslashcode.com) used andrew's existing code to start coming up with a generic parser that will scrape data from the cables in HTML form and create python objects. for an example, please see https://gist.github.com/722901 data from typecode's original readme follows the dashed line below. --- Prerequisites: HTTrack (http://www.httrack.com/httrack-3.43-9C.tar.gz) *to update mirror of cablegate site MongoDB (www.mongodb.org) *database to store parsed cables *ouputs json dump Functionality: Will update Cablegate Web Mirror, and pull all existing cables into a MongoDB Collection where their 'Reference ID' is their '_id', and they contain the following properties: 'reference_id','date_time','classification','origin','header','body' Usage: Make sure that mongod is running. The software is configured to access Mongod at it's default location, so change that if necessary. Confirm that httrack and mongoexport are accesable in your PATH. run 'python process_into_mongo.py' after that run 'mongoexport -d wikileaks -c cables -o dump/cables.json' Notes: The Tornado app that is there right now serves no function. To come..?
About
Some Python to process the Wikileaks Cablegate data.
Resources
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published
Languages
- Python 85.3%
- C 14.4%
- JavaScript 0.3%