base fork: Capncavedan/wikileaks
base: fe7dbfd414
...
head fork: Capncavedan/wikileaks
compare: cd258716e3
  • 4 commits
  • 4 files changed
  • 0 commit comments
  • 1 contributor
11 README.markdown
@@ -7,7 +7,16 @@ http://www.solrtutorial.com
http://sunspot.github.com
-Synonyms do not work out of the box! Edit `solr/conf/schema.xml` as reflected below:
+Wikileaks cables:
+
+Use torrent file WikiLeaks_uncensored_US_diplomatic_cables_(cables.csv).6644050.TPB.torrent
+
+Note that it is over 1.6 GB, so it will take a while to download, and also to import if you choose to import all of it.
+
+It can be parsed with the db/import_cables.rb script; adjust the path within it to your cables.csv file, then open a rails console and use `load "db/import_cables.rb"`
+
+
+Synonyms are not enabled out of the box! Edit `solr/conf/schema.xml` as reflected below:
````
<fieldType name="text" class="solr.TextField" omitNorms="false">
BIN  Solr.key
Binary file not shown
BIN  WikiLeaks_uncensored_US_diplomatic_cables_(cables.csv).6644050.TPB.torrent
Binary file not shown
8 db/leaks_importer.rb → db/import_cables.rb
@@ -2,12 +2,14 @@
require 'american_date'
require 'csv'
+# adjust size of data read to fit your needs -
+# 10 MB is about 1370 cables; 100 MB is about 16,500 cables
data = File.read("/Users/danb/Downloads/cables.csv", 100 * 1024 *1024)
Cable.delete_all
i = 1
-while data.sub!(/("\d+","\d{1,2}\/\d{1,2}\/\d{4} \d{1,2}:\d\d",.*?)\n("\d+","\d{1,2}\/\d{1,2}\/\d{4} \d{1,2}:\d\d.*?")/m, '\2')
- cable = String.new($1).force_encoding('utf-8')
+data.scan(/("\d+","\d{1,2}\/\d{1,2}\/\d{4} \d{1,2}:\d\d",.*?"\n)/m) do |record|
+ cable = String.new(record.first.chomp).force_encoding('utf-8')
cable.gsub!(/\\"/, '""') # CSV does not like double quotes escaped like so: \"
cable.gsub!(/\\'/, "'") # CSV does not like single quotes escaped like so: \'
@@ -15,7 +17,7 @@
# "2","2/25/1972 9:30","72TEHRAN1164","Embassy Tehran","UNCLASSIFIED","72MOSCOW1603|72TEHRAN1091|72TEHRAN263","R 250930Z FEB 72...
CSV.parse(cable) do |row|
numeric_id, cable_date, origin_id, origin_description, classification, destination_id, header, body = row
- puts origin_id
+ puts "#{numeric_id} :: #{origin_id}"
i += 1
Cable.create(
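
The record-splitting approach in the new version of the script can be tried in isolation, without Rails or the full cables.csv. The following is a minimal sketch, not the script itself: it reuses the regular expression and the two `gsub!` calls from the diff, but `SAMPLE`, `RECORD_PATTERN`, and `parse_cables` are hypothetical names, and the two sample records are invented data shaped like the example row in the script's comment. It collects parsed rows in an array instead of calling `Cable.create`.

```ruby
require 'csv'

# Invented sample data (not real cables). The second record has a body that
# spans two lines and uses backslash-escaped quotes, which CSV rejects as-is.
SAMPLE = <<~'DATA'
  "1","1/2/2000 10:00","00EXAMPLE1","Example Post","UNCLASSIFIED","","HEADER ONE","Body one"
  "2","3/4/2001 9:30","00EXAMPLE2","Example Post","UNCLASSIFIED","00EXAMPLE1","HEADER TWO","Line one
  he said \"hello\" and it\'s fine"
DATA

# Same pattern as in the diff: a record starts with a numeric id and a date,
# and runs (lazily, across newlines) to the first double quote followed by a newline.
RECORD_PATTERN = /("\d+","\d{1,2}\/\d{1,2}\/\d{4} \d{1,2}:\d\d",.*?"\n)/m

# Hypothetical helper: returns an array of parsed CSV rows.
def parse_cables(data)
  rows = []
  data.scan(RECORD_PATTERN) do |record|
    cable = record.first.chomp
    cable.gsub!(/\\"/, '""') # rewrite \" as the doubled quote CSV expects
    cable.gsub!(/\\'/, "'")  # rewrite \' as a plain single quote
    CSV.parse(cable) { |row| rows << row }
  end
  rows
end

rows = parse_cables(SAMPLE)
rows.each { |row| puts "#{row[0]} :: #{row[2]}" }
```

Because the pattern stops at the first double quote that is followed by a newline, a body containing a line that happens to end in a double quote would presumably be split early, and a read that is truncated by a byte limit will likely drop the final partial record since it has no closing quote and newline.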
