UP: Perfected readme
langmore committed Nov 8, 2013
1 parent 360680b commit b13b5e0
Showing 1 changed file with 25 additions and 13 deletions.
38 changes: 25 additions & 13 deletions README.md
@@ -1,26 +1,34 @@
DSpy
====

Tools, wrappers, etc. for data science with a concentration on text processing

* Utilities to move data from one Data Structure to another
* Strong focus on stream processing of text
* Utilities to use packages outside the normal Python ecosystem
* Command line utilities
* Focus on "medium data", i.e. data too big to fit into memory but too small to necessitate the use of a cluster.
* The *DS* in DSpy clearly relates to *Data Science*. However, it came first from *Data Structure* and the *Dead Sea*. The tools concentrate on streaming text, and the Dead Sea Scrolls are the most famous version of text in a stream (a lake, actually... but just pretend and it's really cool).
Tools for data science with a focus on text processing.

* Focuses on "medium data", i.e. data too big to fit into memory but too small to necessitate the use of a cluster.
* Integrates with the existing scientific Python stack as well as select outside tools.

Packages
--------

See the `examples/` directory for more details.

* `cmd` Command line utilities
* `modeling` Utilities to help common modeling tasks
* `parallel` Wrappers for Python multiprocessing that add much-needed usability and allow for stream processing
* `text` Text processing
* `workflow` High-level wrappers that have helped with our workflow and provide additional examples of code use
### `cmd`
* Unix-like command-line utilities: filters for files that read from stdin and write to stdout
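
As a rough illustration of the filter pattern described above, here is a minimal sketch. It is not one of DSpy's actual utilities; the names `strip_blank_lines` and `run_filter` are hypothetical. The point is that a filter handles one line at a time, so the input never has to fit in memory.

```python
import io
import sys


def strip_blank_lines(lines):
    """Yield only the lines that contain non-whitespace characters."""
    for line in lines:
        if line.strip():
            yield line


def run_filter(infile, outfile):
    """Stream infile to outfile line by line, never loading the whole input."""
    for line in strip_blank_lines(infile):
        outfile.write(line)


# In a real script the call would be run_filter(sys.stdin, sys.stdout).
# In-memory streams stand in here so the sketch runs anywhere.
demo_in = io.StringIO("first\n\n   \nsecond\n")
demo_out = io.StringIO()
run_filter(demo_in, demo_out)
print(demo_out.getvalue(), end="")
```

Wired to `sys.stdin` and `sys.stdout`, a script like this composes with other tools through ordinary shell pipes.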

### `parallel`
* Wrappers for Python multiprocessing that add ease of use
* Memory-friendly multiprocessing
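
One way to make multiprocessing memory-friendly is to feed the pool bounded batches from a lazy stream. The sketch below assumes that approach; it is not DSpy's implementation, and `batched`, `stream_map`, and `count_tokens` are hypothetical names used only for illustration.

```python
import itertools
import multiprocessing


def batched(iterable, batch_size):
    """Yield successive lists of at most batch_size items from iterable."""
    iterator = iter(iterable)
    while True:
        batch = list(itertools.islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def stream_map(func, iterable, n_jobs=1, batch_size=1000):
    """Lazily yield func(item) for each item, in input order.

    Hypothetical helper, not DSpy's API. With n_jobs=1 everything runs
    serially in this process, which keeps debugging simple.
    """
    if n_jobs == 1:
        for item in iterable:
            yield func(item)
        return
    with multiprocessing.Pool(n_jobs) as pool:
        for batch in batched(iterable, batch_size):
            # pool.map blocks until the batch is finished, so memory use is
            # bounded by batch_size rather than by the length of the stream.
            for result in pool.map(func, batch):
                yield result


def count_tokens(line):
    """Toy per-record task: count whitespace-separated tokens."""
    return len(line.split())


if __name__ == "__main__":
    lines = ("word " * n for n in range(1, 6))  # a lazy stream of records
    print(list(stream_map(count_tokens, lines, n_jobs=2, batch_size=2)))
```

The trade-off in this design is that workers sit idle while the slowest item in each batch finishes. A larger `batch_size` reduces that cost at the price of more memory.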

### `text`
* Stream text from disk to formats used in common ML processes
* Write processed text to sparse formats
* Helpers for ML tools (e.g. Vowpal Wabbit, Gensim)
* Other general utilities
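
To make "stream text to a sparse format" concrete, here is a hedged sketch that turns documents into bag-of-words lines in a simplified form of Vowpal Wabbit's text input format (`label | feature:value ...`). It is not DSpy's writer; `to_vw_line` and `stream_vw` are hypothetical names, and the tokenizer is deliberately crude.

```python
import re
from collections import Counter

# Keep only lowercase alphanumeric runs, so tokens can never contain the
# characters that are special in the VW format (space, ':' and '|').
TOKEN_RE = re.compile(r"[a-z0-9]+")


def to_vw_line(text, label=None):
    """Format one document as a sparse 'label | token:count ...' line."""
    counts = Counter(TOKEN_RE.findall(text.lower()))
    features = " ".join(f"{tok}:{n}" for tok, n in sorted(counts.items()))
    prefix = "" if label is None else f"{label} "
    return f"{prefix}| {features}"


def stream_vw(docs):
    """Lazily convert (text, label) pairs, one document at a time."""
    for text, label in docs:
        yield to_vw_line(text, label)


docs = [("The cat saw the dog", 1), ("A dog barked", -1)]
for line in stream_vw(docs):
    print(line)
```

Only the tokens that actually occur in a document are written, which is what keeps the output sparse.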

### `workflow`
* High-level wrappers that have helped with our workflow and provide additional examples of code use

### `modeling`
* General ML modeling utilities

Install
-------
@@ -53,4 +61,8 @@ From the base repo directory, `dspy/`, you can run all tests with

make test

History
-------
The *DS* in DSpy clearly relates to *Data Science*. However, it came first from *Data Structure* and the *Dead Sea*. The tools concentrate on streaming text, and the Dead Sea Scrolls are the most famous version of text in a stream (a lake, actually... but just pretend and it's really cool).

[dspyrepo]: https://github.com/columbia-applied-data-science/dspy
