Some Python to process the Wikileaks Cablegate data.

Wikileaks CableGate Processing
mark@matienzo.org (based on work by andrew@typeslashcode.com)

This project uses andrew's existing code as the starting point for a generic
parser that scrapes data from the cables in HTML form and creates Python objects.

For an example, please see https://gist.github.com/722901

The text from typecode's original README follows the dashed line below.

--- 

Prerequisites:
  HTTrack (http://www.httrack.com/httrack-3.43-9C.tar.gz)
    *to update the mirror of the Cablegate site
  MongoDB (www.mongodb.org)
    *database to store parsed cables
    *outputs JSON dump

Functionality:
  Will update Cablegate Web Mirror, and pull all existing cables into a 
    MongoDB Collection where their 'Reference ID' is their '_id', and they
    contain the following properties:
      'reference_id','date_time','classification','origin','header','body'
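
A minimal sketch of the document shape described above, assuming each parsed
cable is available as a plain dict. The helper name cable_to_document and the
placeholder values are hypothetical, for illustration only; this is not the
code used in process_into_mongo.py.

```python
# Property names as listed in this README.
CABLE_FIELDS = ("reference_id", "date_time", "classification",
                "origin", "header", "body")


def cable_to_document(cable):
    """Build a MongoDB-style document keyed by the cable's Reference ID.

    Hypothetical helper: it only shapes the dict, it does not touch MongoDB.
    """
    missing = [name for name in CABLE_FIELDS if name not in cable]
    if missing:
        raise ValueError("cable is missing fields: " + ", ".join(missing))
    doc = {name: cable[name] for name in CABLE_FIELDS}
    # The Reference ID doubles as the primary key.
    doc["_id"] = cable["reference_id"]
    return doc


# Placeholder values, not real cable data.
example = cable_to_document({
    "reference_id": "EXAMPLE-0001",
    "date_time": "2010-01-01 00:00",
    "classification": "UNCLASSIFIED",
    "origin": "Example Embassy",
    "header": "example header text",
    "body": "example body text",
})
```

Using the Reference ID as '_id' means that re-processing the same cable
addresses the same document rather than creating a duplicate.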

Usage:
  Make sure that mongod is running. The software is configured to access
    mongod at its default location, so change that if necessary.
  Confirm that httrack and mongoexport are accessible in your PATH.
  run 'python process_into_mongo.py'
  after that
  run 'mongoexport -d wikileaks -c cables -o dump/cables.json'
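
A short sketch of reading the exported dump back into Python, assuming
mongoexport was run as above with its default output of one JSON document per
line. The function names iter_cables and load_cables are hypothetical and not
part of this repository.

```python
import json


def iter_cables(lines):
    """Yield one cable dict per non-empty line of mongoexport output."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def load_cables(path="dump/cables.json"):
    """Read the whole dump into a list (assumes it fits in memory)."""
    with open(path, encoding="utf-8") as handle:
        return list(iter_cables(handle))
```

For a large dump, iterating with iter_cables over the open file avoids holding
every cable in memory at once.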
  
Notes:
  The Tornado app that is there right now does nothing yet. Functionality may come later.