Update doc
kayibal committed Aug 24, 2018
1 parent 05ca71e commit e655e88
Showing 1 changed file with 15 additions and 4 deletions.
19 changes: 15 additions & 4 deletions docs/index.rst
@@ -7,10 +7,21 @@ Sparsity
 ========

 Sparse data processing toolbox. It builds on top of pandas and scipy to
-provide DataFrame like API to work with sparse categorical data.
+provide a DataFrame-like API to work with homogeneous numerical sparse data.

-In combination with dask it provides support to execute complex operations on
-a concurrent/distributed level.
+Sparsity provides pandas-like indexing capabilities and group transformations
+on SciPy CSR matrices. This has proven to be extremely efficient, as will be
+shown below.
+
+Furthermore, we provide a distributed implementation of this data structure by
+relying on the Dask_ framework. This includes distributed sorting, partitioning,
+grouping and much more.
+
+Although we try to mimic the pandas API, much as the Dask DataFrame API does, some
+operations and parameters don't make sense on sparse or homogeneous data. Thus
+some interfaces might differ slightly in their semantics and/or inputs.
+
+.. _Dask: https://dask.pydata.org/

 Install
 -------
@@ -24,7 +35,7 @@ Motivation
 Many tasks, especially in the data analytics and machine learning domains, make use of
 sparse data structures to support the input of high-dimensional data.

-This project was started to build an efficient homogenuos sparse data
+This project was started to build an efficient homogeneous sparse data
 processing pipeline. As of today dask has no support for anything like a sparse
 dataframe. We process large amounts of high-dimensional data on a daily basis at
 Datarevenue_ and our favourite language and ETL
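
The updated docs describe pandas-like indexing and group transformations on SciPy CSR matrices. The sketch below illustrates that general idea with plain pandas and SciPy only. It is not the Sparsity API: `loc_rows` and `group_sum` are hypothetical helper names introduced here, and the sample data is made up for illustration.

```python
import numpy as np
import pandas as pd
from scipy import sparse


def loc_rows(matrix, index, labels):
    """Select rows of a CSR matrix by label (hypothetical helper).

    Assumes `index` is a unique pandas Index with one label per matrix row.
    """
    positions = index.get_indexer(labels)
    if (positions < 0).any():
        raise KeyError("some labels are missing from the index")
    return matrix[positions], index[positions]


def group_sum(matrix, keys):
    """Sum the rows of a CSR matrix per group key (hypothetical helper).

    Builds a sparse (n_groups x n_rows) indicator matrix and multiplies it
    with the data, so the result stays sparse throughout.
    """
    codes, uniques = pd.factorize(keys)
    n_rows = len(codes)
    indicator = sparse.csr_matrix(
        (np.ones(n_rows), (codes, np.arange(n_rows))),
        shape=(len(uniques), n_rows),
    )
    return indicator @ matrix, pd.Index(uniques)


# Illustrative data: four rows, three columns, mostly zeros.
data = sparse.csr_matrix(
    np.array(
        [[1.0, 0.0, 0.0],
         [0.0, 2.0, 0.0],
         [0.0, 0.0, 3.0],
         [4.0, 0.0, 0.0]]
    )
)
row_index = pd.Index(["r0", "r1", "r2", "r3"])
group_keys = pd.Index(["x", "y", "x", "y"])

# Label-based row selection, in the requested order.
sub, sub_index = loc_rows(data, row_index, ["r2", "r0"])

# Group-wise sum of rows sharing the same key.
summed, summed_keys = group_sum(data, group_keys)
```

Expressing the grouping as a sparse matrix product avoids converting the data to a dense array, which is the main reason to keep homogeneous numerical data in CSR form in the first place.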
