Commit a3db472

small edits to pydata berlin

mrocklin committed May 29, 2015
1 parent 954c3db commit a3db472
Showing 6 changed files with 29 additions and 17 deletions.
14 changes: 7 additions & 7 deletions docs/source/_static/presentations/markdown/dask-array.md
@@ -20,19 +20,19 @@ Continuum Analytics

### Related work

-* Parallel BLAS implementations - ScaLAPACK, Plasma, ...
-* Distributed arrays - PETSc/Trillinos, Elemental, HPF
-* Parallel collections - Hadoop/Spark (Dryad, Disco, ...)
-* Task scheduling frameworks - Luigi, swift-lang, ...
-* Python big-numpy projects: Distarray, Spartan, Biggus
+* Parallel BLAS implementations -- ScaLAPACK, Plasma, ...
+* Distributed arrays -- PETSc/Trillinos, Elemental, HPF
+* Parallel collections -- Hadoop/Spark (Dryad, Disco, ...)
+* Task scheduling frameworks -- Luigi, swift-lang, ...
+* Python big-numpy projects -- Distarray, Spartan, Biggus
* Custom solutions with MPI, ZMQ, ...

<hr>

### Distinguishing features of `dask.array`

-* Full ndarray support, no serious linear algebra
-* Shared memory parallelism, not distributed
+* Full ndarray support, instead of serious linear algebra
+* Focus on shared memory parallelism (workstation, not cluster)
* Immediately usable - `conda/pip` installable
* Dask includes other non-array collections
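The "immediately usable" bullet above can be sketched with a toy computation. This is a minimal illustration, assuming `dask` is installed via `conda/pip`; the array size and chunk shape are arbitrary choices for the example:

```python
import dask.array as da

# Build one large logical array out of many small NumPy chunks;
# nothing is computed until .compute() is called.
x = da.ones((4000, 4000), chunks=(1000, 1000))

# Each chunk's partial mean is computed in parallel threads,
# then the partial results are combined.
result = x.mean().compute()
print(result)  # an array of ones has mean 1.0
```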

2 changes: 1 addition & 1 deletion docs/source/_static/presentations/markdown/dask-core.md
@@ -27,7 +27,7 @@ Dead simple task scheduling
![](images/embarrassing.gif)


-## Useful for more than just arrays
+## Dask works for more than just arrays


## `dask.bag`
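As a small sketch of the non-array collections mentioned here, `dask.bag` applies the same task-graph machinery to unordered sequences; the data below is purely illustrative:

```python
import dask.bag as db

# A bag is an unordered collection split across partitions.
b = db.from_sequence(range(5), npartitions=2)

# map/sum only build a task graph; the synchronous scheduler
# then runs it in the current process, keeping the example
# deterministic and dependency-free.
total = b.map(lambda x: x ** 2).sum().compute(scheduler="synchronous")
print(total)  # 0 + 1 + 4 + 9 + 16 = 30
```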
27 changes: 19 additions & 8 deletions docs/source/_static/presentations/markdown/foundations.md
@@ -1,3 +1,6 @@
+### PyData builds off of NumPy and Pandas


### NumPy and Pandas provide foundational data structures

<img src="images/jenga.png" width="100%">
@@ -37,7 +40,7 @@ Date: Thu Feb 1 08:32:30 2001 +0000
### These limitations affect the PyData ecosystem


-### Hardware has changed since 1999
+### Hardware has changed since 2001

![](images/multicore-cpu.png)

@@ -48,7 +51,18 @@ Date: Thu Feb 1 08:32:30 2001 +0000
* Fast Solid State Drives (disk is now extended memory)


-### Problems have changed since 1999
+### Hardware has changed since 2001
+
+![](images/xeon-phi.jpg)
+
+* Multiple cores
+* 4 cores -- cheap laptop
+* 32 cores -- workstation
+* Distributed memory clusters in big data warehousing
+* Fast Solid State Drives (disk is now extended memory)
+
+
+### Problems have changed since 2001

* Larger datasets
* Messier data
@@ -67,12 +81,9 @@ Date: Thu Feb 1 08:32:30 2001 +0000

* The Global Interpreter Lock (GIL) stops two Python threads from
manipulating Python objects simultaneously
-* Can use multiple processes in simple cases
-* PyData could cheat the GIL
-
-because we rely on C/Fortran code
-
-but we don't take advantage of this
+* Solutions:
+  * Compute in separate processes (hard to share data)
+  * Release the GIL and use C/Fortran code
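The thread-based workaround can be sketched with the standard library (chunk boundaries here are arbitrary). Note the caveat in the comments: pure-Python `sum` does not release the GIL, so this shows the sharing pattern rather than a real speedup, which requires GIL-releasing C/Fortran kernels such as NumPy reductions:

```python
from concurrent.futures import ThreadPoolExecutor

# Split the work into chunks, one per thread.
chunks = [range(i * 1000, (i + 1) * 1000) for i in range(4)]

# Threads share memory, so no data is copied between workers.
# True parallelism, however, needs kernels that release the GIL
# (e.g. NumPy); pure-Python sums still execute one at a time.
with ThreadPoolExecutor(max_workers=4) as pool:
    partial_sums = list(pool.map(sum, chunks))

total = sum(partial_sums)
print(total)  # same answer as sum(range(4000))
```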


### PyData rests on single-threaded foundations
@@ -26,6 +26,7 @@

* [Bottleneck issue](https://github.com/kwgoodman/bottleneck)


### Final thoughts

[http://dask.pydata.org](http://dask.pydata.org)
@@ -34,4 +34,4 @@ Continuum Analytics

* Gigabyte - Fits in memory, need one core (laptop)
* Terabyte - Fits on disk, need ten cores (workstation)
-* Petabyte - Fits on many disks, need 1000 cores (distributed cluster)
+* Petabyte - Fits on many disks, need 1000 cores (cluster)
