Utility to extract interesting words from documents, and store their counts and co-occurring documents and sentences.

cjmcmurtrie/word-store

WordStore

Store interesting words to a table in Python.

Setup

Run the following from the command line in the repository root:

$ pip install -r requirements.txt 
$ python setup.py

Example

A single document:

>>> from word_store.store import WordStore
>>> word_store = WordStore()
>>> word_store.add_document(
...         document_id='1234',
...         document_string='This is a document. Another document may follow, with more interesting words.'
...     )
>>> word_store.get_word('document')
{'count': 2, 'documents': ['1234'],
'sentences': ['This is a document.', 'Another document may follow, with more interesting words.']}
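To make the shape of that record concrete, here is a minimal standalone sketch that builds the same kind of mapping (word to count, documents, and sentences) using only the standard library. It is not WordStore's implementation: the `build_toy_store` function, the `STOPWORDS` set, and the regex-based sentence splitter are all assumptions made for illustration, and the library's own rule for what counts as an "interesting" word may differ.

```python
import re
from collections import defaultdict

# Hypothetical stopword list, for illustration only; the library's own
# notion of "interesting" words is not documented here.
STOPWORDS = {"this", "is", "a", "may", "with", "more", "another"}


def build_toy_store(document_id, document_string):
    """Map each non-stopword to its count, documents, and sentences."""
    store = defaultdict(lambda: {"count": 0, "documents": [], "sentences": []})
    # Naive sentence split: break on whitespace that follows ., ! or ?
    sentences = re.split(r"(?<=[.!?])\s+", document_string.strip())
    for sentence in sentences:
        for word in re.findall(r"[a-z']+", sentence.lower()):
            if word in STOPWORDS:
                continue
            entry = store[word]
            entry["count"] += 1
            # Record each document and sentence only once per word.
            if document_id not in entry["documents"]:
                entry["documents"].append(document_id)
            if sentence not in entry["sentences"]:
                entry["sentences"].append(sentence)
    return dict(store)


toy = build_toy_store(
    "1234",
    "This is a document. Another document may follow, with more interesting words.",
)
print(toy["document"])
```

For this input the toy record for 'document' has the same count, document list, and sentences as the `get_word` output above.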

Batch update:

>>> from word_store.store import WordStore, batch_update_word_store
>>> word_store = WordStore()
>>> word_store = batch_update_word_store(
...     word_store,
...     documents_path='tests/fixtures/documents/'
... )

To get a Pandas DataFrame:

>>> word_store.to_pandas().head()
       word count                                          documents                                          sentences
0      good    13  [tests/fixtures/documents/doc6.txt, tests/fixt...  [ Good morning., Fortunately, however, we've m...
1   morning     2                [tests/fixtures/documents/doc6.txt]  [ Good morning., Outstanding career officials ...
2   senator    12  [tests/fixtures/documents/doc6.txt, tests/fixt...  [As some of you know, Senator Lugar and I rece...
3     lugar     5  [tests/fixtures/documents/doc6.txt, tests/fixt...  [As some of you know, Senator Lugar and I rece...
4  recently     2  [tests/fixtures/documents/doc6.txt, tests/fixt...  [As some of you know, Senator Lugar and I rece...

To save to file:

>>> word_store.save('word_store.json')

A previously saved instance can then be loaded:

>>> word_store = WordStore('word_store.json')
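The on-disk layout written by `save` is not shown here. As a rough sketch of the save-and-reload idea, the following assumes the file is a plain JSON object keyed by word, with one record per word shaped like the `get_word` output; the `records` dictionary and the file name `toy_word_store.json` are hypothetical and only the standard `json` module is used.

```python
import json
import os
import tempfile

# Assumed layout: one record per word, mirroring the output of get_word.
records = {
    "document": {
        "count": 2,
        "documents": ["1234"],
        "sentences": [
            "This is a document.",
            "Another document may follow, with more interesting words.",
        ],
    }
}

# Write the records to a temporary file, then read them back.
path = os.path.join(tempfile.mkdtemp(), "toy_word_store.json")
with open(path, "w", encoding="utf-8") as handle:
    json.dump(records, handle)

with open(path, encoding="utf-8") as handle:
    restored = json.load(handle)

# Counts, document ids, and sentences survive the round trip unchanged.
print(restored == records)
```

Strings, integers, and lists all map directly onto JSON types, which is why a record of this shape can be reloaded without any custom decoding.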

To run tests:

$ py.test -s tests/
