Comparing changes
Commits on Dec 10, 2010
elishowk cleaned files e23cc08
elishowk rewritten README and added GNU GPL LICENSE TXt ee9215c
elishowk overwrite cable by default, removed lib, added with dependen…
elishowk fixed attributes bugs bf52bbb
elishowk rewritten README 0688c60
elishowk cleand ignore adc8e2e
elishowk removed data d55a52b
elishowk cleaned data 3073855
elishowk removed unused countries handler 67dd001
Commits on Dec 17, 2010
elishowk starting keyphrases extraction 3ab7dbb
Commits on Dec 20, 2010
elishowk refactoring app structure 18e91d2
elishowk restored cableimporter 1f74514
elishowk updated README 846d9b2
elishowk cleaned prints ed5bc81
elishowk reformated 8057ef6
elishowk reformated 7459809
elishowk reformated README.MD 1c35fa4
Commits on Dec 22, 2010
elishowk cleand cableimporter e69521a
elishowk simplified datamodel and introduced extractor and ngramizer 4b34b13
elishowk pickled tagger fcbe875
elishowk starting to produce ngrams 146453e
elishowk starting to produce ngrams fb9dbca
Commits on Dec 23, 2010
elishowk better sanitizer, but slow c534129
elishowk still extracting 2b155ab
elishowk debugging ngram extraction a853f36
elishowk debugging ngram insertion c39ad96
@jbilcke jbilcke fixed typos found by automatic correction, added some infos for the Gephi part
elishowk Merge branch 'master' of 8203201
@jbilcke jbilcke cleaned some parts talking about gephi, fixed links acf3983
@jbilcke jbilcke small bug in the readme (orphan sentence) 5c66176
elishowk checking ngrams values e15c579
elishowk better ngram extraction, with word size filter 6ca8ae7
elishowk starting cable network producer 1c080ac
elishowk Merge branch 'master' of 0359ac4
Commits on Dec 24, 2010
elishowk more intro 66bc1f4
elishowk correcting bullets errors a6f69c1
elishowk finalizing READM 4618f0a
elishowk moved usage to 7c27487
elishowk fixed conflicts 991abcc
elishowk finishing readme 0c73cc7
elishowk changing date 5416e40
elishowk indexer cleans edges before starting e15b9fe
elishowk some things b46682a
root last changes a95d567
elishowk last changes d3bbc31
Commits on Dec 26, 2010
elishowk beautifulsoup optimized memory usage with soupstrainer ea99f51
elishowk improved beautiful soup parsing with soupstrainer 0cbc138
elishowk corrected content import 24666ed
elishowk indexer reinit edges 3af2f2f
elishowk resolved data corruption on indexing 7ae8e0c
Commits on Jan 04, 2011
elishowk moved cable existing test 1dc124e
Commits on Jan 07, 2011
elishowk debugging cablenetwork de22ca6
elishowk added cooc mapreducer js files b06aefd
Commits on Jan 15, 2011
elishowk adding network 2380745
Commits on Jan 17, 2011
elishowk cooccurrences fbe2a91
Commits on Jan 23, 2011
elishowk starting Document edges processing e44f660
elishowk added timeout to mongo find cb9ae3a
Commits on Jan 24, 2011
elishowk first export version c2c5bb1
Commits on Jan 25, 2011
elishowk corrected cooc graph export a146366
elishowk removed tinasoft dependency, using trained tagger from jperkins b649939
elishowk some little changes 8ef4741
elishowk Merge branch 'master' of 0ce0e78
Commits on Jan 26, 2011
elishowk indentation 3a1ba47
elishowk datetime imported 53728ca
elishowk import more metadata to cables and inserts only half of cooccurrences on indexing
elishowk separating index from network e7d8aef
elishowk first try inserting into neo4j eafe631
Commits on Jan 27, 2011
elishowk adding network export to neo4j 83446d9
elishowk implementing neo4j handler f3961bb
Commits on Jan 28, 2011
elishowk implementing neo4j index e24b187
Commits on Jan 30, 2011
elishowk integrated neo4j with mongodb dd596ce
elishowk separated extract and network to reduce neo4j calls 92ba479
elishowk added support for ngram total occs in mongodb 34fb4e0
elishowk bugs corrected 72fcb8b
elishowk removed limit 37a110b
Commits on Jan 31, 2011
elishowk separated ngrams and cooc into mongodb 943fb55
elishowk separated ngrams and cooc into mongodb 9c98d45
elishowk update_document_cooc debugged 68ca9f8
elishowk corrected update_cooc 54b19bf
elishowk network export without fatal errors ebecf41
Commits on Feb 01, 2011
elishowk next gen cooc graph 33a4461
elishowk lower text for better tagging 945ec3c
elishowk removed js file added a new tagger 21e1498
elishowk Merge branch 'master' of a8436a4
elishowk BeautifulSoup.NavigableString conversion to string 2647e17
elishowk try/except on get nodes 1a1f777
elishowk do not update document node in ngramize 0600a6b
elishowk changed default tagger and tag validation RegExp 7559d18
elishowk removed print b72e559
Commits on Feb 02, 2011
elishowk decomposed and added overwrite to recreate nodes on the fly 7d59312
elishowk add nodes into CableNetwork, removed from CableExtractor ee8ae0a
elishowk testing date query 48c7355
elishowk switched to neo4j with kpype b21eeb8
elishowk 1.0 fd78e41
Commits on Feb 03, 2011
elishowk added mongo queries in network and updates in extract 2252c07
elishowk new tfidf script d3725f9
Commits on Feb 04, 2011
elishowk corrected 9d57f8c
elishowk Merge branch 'master' of 1e19da4
elishowk optimizing cablenetwork c575a62
elishowk cooccurrences again c9fcbbf
@heuer heuer Untested usage of Cablemap's cable extractor 6f2c3dc
@heuer heuer Fixed typo 307bbfd
@heuer heuer Minor code tweaks 3a21f0d
elishowk corrected ngram selection fa5945c
elishowk corrected ngram selection 7d94340
@heuer heuer Handle cables where the subject is None, updated dependencies 2048ae4
@heuer heuer Simplification e6a246e
Commits on Feb 05, 2011
@heuer heuer Fixed info msgs, a counter would be more efficient, though 6b5ba47
elishowk changed dependencies and added links f535af1
Commits on Feb 06, 2011
elishowk merged conflicts 3eb68bb
elishowk great jop 1039412
Commits on Feb 07, 2011
@heuer heuer Better titlecase'ing 382bbf3
Commits on Feb 09, 2011
elishowk Merge branch 'master' of into titlefy
@heuer heuer Cahlemap's cable.subject returns never None but an empty string iff the WikiLeaks cable has no subject
Commits on Feb 11, 2011
elishowk updated presentation text 0e74686
elishowk removed tfidf 614e785
Commits on Feb 14, 2011
elishowk Merge branch 'master' of into heuer
Commits on Mar 16, 2011
elishowk deleted stopwords 47e3e8c
elishowk extraction using multiprocessing c7f069f
elishowk removed old tagger pickle 35771e2
Commits on Mar 17, 2011
elishowk replaced nltk-trainer c8446be
elishowk buildtagger ff8bab1