Update slides - June 2024 (#489)
tomwhite committed Jun 25, 2024
1 parent 7f3f390 commit 7e460a2
Showing 6 changed files with 209 additions and 96 deletions.
236 changes: 167 additions & 69 deletions docs/slides/intro/cubed-intro.ipynb
@@ -11,7 +11,7 @@
"source": [
"# Cubed: an introduction\n",
"\n",
"Tom White, November 2023"
"Tom White, June 2024"
]
},
{
@@ -207,9 +207,9 @@
"source": [
"# Example: `reduction`\n",
"\n",
"![`reduction`](../../images/reduction.svg)\n",
"![`reduction`](../../images/reduction_new.svg)\n",
"\n",
"Implemented using multiple rounds of calls to `blockwise` and `rechunk`."
"Implemented using multiple rounds of a tree reduce operation followed by a final aggregation."
]
},
{
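The `reduction` slide above describes rounds of a tree reduce followed by a final aggregation. A minimal pure-Python sketch of that pattern follows. It is not Cubed's implementation: `tree_reduce`, `split_every`, and the `(sum, count)` partial results are names assumed here purely for illustration.

```python
from functools import reduce


def tree_reduce(partials, combine, aggregate, split_every=2):
    """Combine partial results in rounds, then apply a final aggregation.

    Hypothetical helper for illustration; not part of Cubed's API.
    """
    if split_every < 2:
        raise ValueError("split_every must be at least 2")
    partials = list(partials)
    if not partials:
        raise ValueError("need at least one partial result")
    rounds = 0
    # Each round merges groups of `split_every` neighbours, shrinking the list.
    while len(partials) > 1:
        partials = [
            reduce(combine, partials[i:i + split_every])
            for i in range(0, len(partials), split_every)
        ]
        rounds += 1
    return aggregate(partials[0]), rounds


# Mean of 1..9 stored as five chunks; each chunk contributes (sum, count).
chunks = [[1, 2], [3, 4], [5, 6], [7, 8], [9]]
partials = [(sum(c), len(c)) for c in chunks]

mean, rounds = tree_reduce(
    partials,
    combine=lambda x, y: (x[0] + y[0], x[1] + y[1]),
    aggregate=lambda p: p[0] / p[1],
)
```

Keeping the partial result as `(sum, count)` rather than a running mean is what lets each round combine neighbours without seeing the original chunks again.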
@@ -239,72 +239,159 @@
{
"data": {
"image/svg+xml": [
"<svg xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\" width=\"163pt\" height=\"146pt\" viewBox=\"0.00 0.00 163.00 146.00\">\n",
"<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 142)\">\n",
"<polygon fill=\"white\" stroke=\"transparent\" points=\"-4,4 -4,-142 159,-142 159,4 -4,4\"/>\n",
"<text text-anchor=\"middle\" x=\"77.5\" y=\"-18\" font-family=\"Times,serif\" font-size=\"10.00\">num tasks: 4</text>\n",
"<text text-anchor=\"middle\" x=\"77.5\" y=\"-7\" font-family=\"Times,serif\" font-size=\"10.00\">max projected memory: 100.0 MB</text>\n",
"<!-- array&#45;001 -->\n",
"<svg xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\" width=\"230pt\" height=\"319pt\" viewBox=\"0.00 0.00 229.75 318.75\">\n",
"<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 314.75)\">\n",
"<polygon fill=\"white\" stroke=\"none\" points=\"-4,4 -4,-314.75 225.75,-314.75 225.75,4 -4,4\"/>\n",
"<text text-anchor=\"start\" x=\"8\" y=\"-39.5\" font-family=\"Times,serif\" font-size=\"10.00\">num tasks: 5</text>\n",
"<text text-anchor=\"start\" x=\"8\" y=\"-28.25\" font-family=\"Times,serif\" font-size=\"10.00\">max projected memory: 100.0 MB</text>\n",
"<text text-anchor=\"start\" x=\"8\" y=\"-17\" font-family=\"Times,serif\" font-size=\"10.00\">total nbytes written: 72 bytes</text>\n",
"<text text-anchor=\"start\" x=\"8\" y=\"-5.75\" font-family=\"Times,serif\" font-size=\"10.00\">optimized: True</text>\n",
"<!-- op&#45;001 -->\n",
"<g id=\"node1\" class=\"node\">\n",
"<title>array-001</title>\n",
"<g id=\"a_node1\"><a xlink:title=\"shape: (3, 3)\n",
"chunks: (2, 2)\n",
"dtype: int64\n",
"chunk memory: 32 bytes\n",
"\n",
"<title>op-001</title>\n",
"<g id=\"a_node1\"><a xlink:title=\"name: op-001\n",
"op: asarray\n",
"calls: &lt;module&gt; -&gt; asarray\n",
"line: 2 in &lt;module&gt;\">\n",
"<polygon fill=\"none\" stroke=\"black\" points=\"68.5,-138 9.5,-138 9.5,-102 68.5,-102 68.5,-138\"/>\n",
"<text text-anchor=\"middle\" x=\"39\" y=\"-123\" font-family=\"Helvetica,sans-Serif\" font-size=\"10.00\">array-001</text>\n",
"<text text-anchor=\"middle\" x=\"39\" y=\"-112\" font-family=\"Helvetica,sans-Serif\" font-size=\"10.00\">asarray </text>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M43.25,-310.75C43.25,-310.75 13.25,-310.75 13.25,-310.75 7.25,-310.75 1.25,-304.75 1.25,-298.75 1.25,-298.75 1.25,-286.75 1.25,-286.75 1.25,-280.75 7.25,-274.75 13.25,-274.75 13.25,-274.75 43.25,-274.75 43.25,-274.75 49.25,-274.75 55.25,-280.75 55.25,-286.75 55.25,-286.75 55.25,-298.75 55.25,-298.75 55.25,-304.75 49.25,-310.75 43.25,-310.75\"/>\n",
"<text text-anchor=\"middle\" x=\"28.25\" y=\"-294.5\" font-family=\"Helvetica,sans-Serif\" font-size=\"10.00\">op-001</text>\n",
"<text text-anchor=\"middle\" x=\"28.25\" y=\"-283.25\" font-family=\"Helvetica,sans-Serif\" font-size=\"10.00\">asarray</text>\n",
"</a>\n",
"</g>\n",
"</g>\n",
"<!-- array&#45;004 -->\n",
"<g id=\"node3\" class=\"node\">\n",
"<title>array-004</title>\n",
"<g id=\"a_node3\"><a xlink:title=\"shape: (3, 3)\n",
"<!-- array&#45;001 -->\n",
"<g id=\"node2\" class=\"node\">\n",
"<title>array-001</title>\n",
"<g id=\"a_node2\"><a xlink:title=\"name: array-001\n",
"variable: a\n",
"shape: (3, 3)\n",
"chunks: (2, 2)\n",
"dtype: int64\n",
"chunk memory: 32 bytes\n",
"\n",
"chunk memory: 32 bytes\">\n",
"<polygon fill=\"none\" stroke=\"black\" points=\"56.5,-238.75 0,-238.75 0,-202.75 56.5,-202.75 56.5,-238.75\"/>\n",
"<text text-anchor=\"middle\" x=\"28.25\" y=\"-222.5\" font-family=\"Helvetica,sans-Serif\" font-size=\"10.00\">array-001</text>\n",
"<text text-anchor=\"middle\" x=\"28.25\" y=\"-211.25\" font-family=\"Helvetica,sans-Serif\" font-size=\"10.00\">a</text>\n",
"</a>\n",
"</g>\n",
"</g>\n",
"<!-- op&#45;001&#45;&gt;array&#45;001 -->\n",
"<g id=\"edge1\" class=\"edge\">\n",
"<title>op-001-&gt;array-001</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M28.25,-274.45C28.25,-267.16 28.25,-258.48 28.25,-250.29\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"31.75,-250.37 28.25,-240.37 24.75,-250.37 31.75,-250.37\"/>\n",
"</g>\n",
"<!-- op&#45;004 -->\n",
"<g id=\"node5\" class=\"node\">\n",
"<title>op-004</title>\n",
"<g id=\"a_node5\"><a xlink:title=\"name: op-004\n",
"op: blockwise\n",
"projected memory: 100.0 MB\n",
"tasks: 4\n",
"num input blocks: (1, 1)\n",
"calls: &lt;module&gt; -&gt; add -&gt; elemwise -&gt; blockwise\n",
"line: 1 in &lt;module&gt;\">\n",
"<polygon fill=\"#dcbeff\" stroke=\"black\" points=\"106.5,-66 47.5,-66 47.5,-30 106.5,-30 106.5,-66\"/>\n",
"<text text-anchor=\"middle\" x=\"77\" y=\"-51\" font-family=\"Helvetica,sans-Serif\" font-size=\"10.00\">array-004</text>\n",
"<text text-anchor=\"middle\" x=\"77\" y=\"-40\" font-family=\"Helvetica,sans-Serif\" font-size=\"10.00\">add (bw)</text>\n",
"<path fill=\"#dcbeff\" stroke=\"black\" d=\"M80.25,-166.75C80.25,-166.75 50.25,-166.75 50.25,-166.75 44.25,-166.75 38.25,-160.75 38.25,-154.75 38.25,-154.75 38.25,-137 38.25,-137 38.25,-131 44.25,-125 50.25,-125 50.25,-125 80.25,-125 80.25,-125 86.25,-125 92.25,-131 92.25,-137 92.25,-137 92.25,-154.75 92.25,-154.75 92.25,-160.75 86.25,-166.75 80.25,-166.75\"/>\n",
"<text text-anchor=\"middle\" x=\"65.25\" y=\"-153.25\" font-family=\"Helvetica,sans-Serif\" font-size=\"10.00\">op-004</text>\n",
"<text text-anchor=\"middle\" x=\"65.25\" y=\"-142\" font-family=\"Helvetica,sans-Serif\" font-size=\"10.00\">add</text>\n",
"<text text-anchor=\"middle\" x=\"65.25\" y=\"-130.75\" font-family=\"Helvetica,sans-Serif\" font-size=\"10.00\">tasks: 4</text>\n",
"</a>\n",
"</g>\n",
"</g>\n",
"<!-- array&#45;001&#45;&gt;array&#45;004 -->\n",
"<g id=\"edge1\" class=\"edge\">\n",
"<title>array-001-&gt;array-004</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M48.39,-101.7C52.76,-93.64 58.06,-83.89 62.9,-74.98\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"66.02,-76.56 67.71,-66.1 59.87,-73.22 66.02,-76.56\"/>\n",
"<!-- array&#45;001&#45;&gt;op&#45;004 -->\n",
"<g id=\"edge2\" class=\"edge\">\n",
"<title>array-001-&gt;op-004</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M37.02,-202.48C40.85,-194.94 45.45,-185.87 49.82,-177.26\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"52.88,-178.97 54.28,-168.47 46.64,-175.8 52.88,-178.97\"/>\n",
"</g>\n",
"<!-- op&#45;002 -->\n",
"<g id=\"node3\" class=\"node\">\n",
"<title>op-002</title>\n",
"<g id=\"a_node3\"><a xlink:title=\"name: op-002\n",
"op: asarray\n",
"calls: &lt;module&gt; -&gt; asarray\n",
"line: 1 in &lt;module&gt;\">\n",
"<path fill=\"none\" stroke=\"black\" d=\"M118.25,-310.75C118.25,-310.75 88.25,-310.75 88.25,-310.75 82.25,-310.75 76.25,-304.75 76.25,-298.75 76.25,-298.75 76.25,-286.75 76.25,-286.75 76.25,-280.75 82.25,-274.75 88.25,-274.75 88.25,-274.75 118.25,-274.75 118.25,-274.75 124.25,-274.75 130.25,-280.75 130.25,-286.75 130.25,-286.75 130.25,-298.75 130.25,-298.75 130.25,-304.75 124.25,-310.75 118.25,-310.75\"/>\n",
"<text text-anchor=\"middle\" x=\"103.25\" y=\"-294.5\" font-family=\"Helvetica,sans-Serif\" font-size=\"10.00\">op-002</text>\n",
"<text text-anchor=\"middle\" x=\"103.25\" y=\"-283.25\" font-family=\"Helvetica,sans-Serif\" font-size=\"10.00\">asarray</text>\n",
"</a>\n",
"</g>\n",
"</g>\n",
"<!-- array&#45;002 -->\n",
"<g id=\"node2\" class=\"node\">\n",
"<g id=\"node4\" class=\"node\">\n",
"<title>array-002</title>\n",
"<g id=\"a_node2\"><a xlink:title=\"shape: (3, 3)\n",
"<g id=\"a_node4\"><a xlink:title=\"name: array-002\n",
"variable: b\n",
"shape: (3, 3)\n",
"chunks: (2, 2)\n",
"dtype: int64\n",
"chunk memory: 32 bytes\">\n",
"<polygon fill=\"none\" stroke=\"black\" points=\"131.5,-238.75 75,-238.75 75,-202.75 131.5,-202.75 131.5,-238.75\"/>\n",
"<text text-anchor=\"middle\" x=\"103.25\" y=\"-222.5\" font-family=\"Helvetica,sans-Serif\" font-size=\"10.00\">array-002</text>\n",
"<text text-anchor=\"middle\" x=\"103.25\" y=\"-211.25\" font-family=\"Helvetica,sans-Serif\" font-size=\"10.00\">b</text>\n",
"</a>\n",
"</g>\n",
"</g>\n",
"<!-- op&#45;002&#45;&gt;array&#45;002 -->\n",
"<g id=\"edge3\" class=\"edge\">\n",
"<title>op-002-&gt;array-002</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M103.25,-274.45C103.25,-267.16 103.25,-258.48 103.25,-250.29\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"106.75,-250.37 103.25,-240.37 99.75,-250.37 106.75,-250.37\"/>\n",
"</g>\n",
"<!-- array&#45;002&#45;&gt;op&#45;004 -->\n",
"<g id=\"edge4\" class=\"edge\">\n",
"<title>array-002-&gt;op-004</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M94.24,-202.48C90.31,-194.94 85.58,-185.87 81.09,-177.26\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"84.24,-175.71 76.51,-168.47 78.03,-178.95 84.24,-175.71\"/>\n",
"</g>\n",
"<!-- array&#45;004 -->\n",
"<g id=\"node6\" class=\"node\">\n",
"<title>array-004</title>\n",
"<g id=\"a_node6\"><a xlink:title=\"name: array-004\n",
"variable: c\n",
"shape: (3, 3)\n",
"chunks: (2, 2)\n",
"dtype: int64\n",
"chunk memory: 32 bytes\n",
"\n",
"calls: &lt;module&gt; -&gt; asarray\n",
"line: 1 in &lt;module&gt;\">\n",
"<polygon fill=\"none\" stroke=\"black\" points=\"145.5,-138 86.5,-138 86.5,-102 145.5,-102 145.5,-138\"/>\n",
"<text text-anchor=\"middle\" x=\"116\" y=\"-123\" font-family=\"Helvetica,sans-Serif\" font-size=\"10.00\">array-002</text>\n",
"<text text-anchor=\"middle\" x=\"116\" y=\"-112\" font-family=\"Helvetica,sans-Serif\" font-size=\"10.00\">asarray </text>\n",
"nbytes: 72 bytes\">\n",
"<polygon fill=\"#ffd8b1\" stroke=\"black\" points=\"93.5,-89 37,-89 37,-53 93.5,-53 93.5,-89\"/>\n",
"<text text-anchor=\"middle\" x=\"65.25\" y=\"-72.75\" font-family=\"Helvetica,sans-Serif\" font-size=\"10.00\">array-004</text>\n",
"<text text-anchor=\"middle\" x=\"65.25\" y=\"-61.5\" font-family=\"Helvetica,sans-Serif\" font-size=\"10.00\">c</text>\n",
"</a>\n",
"</g>\n",
"</g>\n",
"<!-- array&#45;002&#45;&gt;array&#45;004 -->\n",
"<g id=\"edge2\" class=\"edge\">\n",
"<title>array-002-&gt;array-004</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M106.36,-101.7C101.87,-93.64 96.44,-83.89 91.48,-74.98\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"94.45,-73.14 86.53,-66.1 88.34,-76.54 94.45,-73.14\"/>\n",
"<!-- op&#45;004&#45;&gt;array&#45;004 -->\n",
"<g id=\"edge5\" class=\"edge\">\n",
"<title>op-004-&gt;array-004</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M65.25,-124.58C65.25,-117.19 65.25,-108.7 65.25,-100.73\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"68.75,-100.74 65.25,-90.74 61.75,-100.74 68.75,-100.74\"/>\n",
"</g>\n",
"<!-- create&#45;arrays -->\n",
"<g id=\"node7\" class=\"node\">\n",
"<title>create-arrays</title>\n",
"<g id=\"a_node7\"><a xlink:title=\"name: create-arrays\n",
"op: create-arrays\n",
"projected memory: 100.0 MB\n",
"tasks: 1\">\n",
"<path fill=\"none\" stroke=\"black\" d=\"M209.75,-310.75C209.75,-310.75 160.75,-310.75 160.75,-310.75 154.75,-310.75 148.75,-304.75 148.75,-298.75 148.75,-298.75 148.75,-286.75 148.75,-286.75 148.75,-280.75 154.75,-274.75 160.75,-274.75 160.75,-274.75 209.75,-274.75 209.75,-274.75 215.75,-274.75 221.75,-280.75 221.75,-286.75 221.75,-286.75 221.75,-298.75 221.75,-298.75 221.75,-304.75 215.75,-310.75 209.75,-310.75\"/>\n",
"<text text-anchor=\"middle\" x=\"185.25\" y=\"-294.5\" font-family=\"Helvetica,sans-Serif\" font-size=\"10.00\">create-arrays</text>\n",
"<text text-anchor=\"middle\" x=\"185.25\" y=\"-283.25\" font-family=\"Helvetica,sans-Serif\" font-size=\"10.00\">tasks: 1</text>\n",
"</a>\n",
"</g>\n",
"</g>\n",
"<!-- arrays -->\n",
"<g id=\"node8\" class=\"node\">\n",
"<title>arrays</title>\n",
"<g id=\"a_node8\"><a xlink:title=\"name: arrays\" target=\"None\">\n",
"<polygon fill=\"none\" stroke=\"black\" points=\"212.25,-238.75 158.25,-238.75 158.25,-202.75 212.25,-202.75 212.25,-238.75\"/>\n",
"<text text-anchor=\"middle\" x=\"185.25\" y=\"-216.88\" font-family=\"Helvetica,sans-Serif\" font-size=\"10.00\">arrays</text>\n",
"</a>\n",
"</g>\n",
"</g>\n",
"<!-- create&#45;arrays&#45;&gt;arrays -->\n",
"<g id=\"edge6\" class=\"edge\">\n",
"<title>create-arrays-&gt;arrays</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M185.25,-274.45C185.25,-267.16 185.25,-258.48 185.25,-250.29\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"188.75,-250.37 185.25,-240.37 181.75,-250.37 188.75,-250.37\"/>\n",
"</g>\n",
"</g>\n",
"</svg>"
@@ -346,14 +433,26 @@
"source": [
"# Optimization\n",
"\n",
"Cubed will optimize the graph before computing it - by fusing blockwise (map) operations.\n",
"Cubed will automatically optimize the graph before computing it, for example by fusing blockwise (map) operations:\n",
"\n",
"<p float=\"left\">\n",
" <img src=\"fusion-unoptimized.png\" />\n",
" <img src=\"fusion.png\" />\n",
"</p>\n",
" <img src=\"toy-unoptimized.png\" height=\"600\" />\n",
" <img src=\"toy-optimized.png\" height=\"600\"/>\n",
"</p>"
]
},
{
"cell_type": "markdown",
"id": "925fff3c-5531-4953-891e-b382583de56b",
"metadata": {},
"source": [
"# Optimization: an advanced example\n",
"\n",
"In early 2024 we implemented more optimizations, giving a **4.8x** performance improvement on the \"Quadratic Means\" climate workload running on Lithops with AWS Lambda; a **1.5 TB** workload completed in around **100 seconds**.\n",
"\n",
"This is a simple case - still lots of optimizations left to do."
"<img src=\"benchmarks-aws.png\" width=\"600\">\n",
"\n",
"More details in [Optimizing Cubed](https://medium.com/pangeo/optimizing-cubed-7a0b8f65f5b7)\n"
]
},
{
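The Optimization slide mentions fusing blockwise (map) operations. The sketch below shows the general idea in pure Python, under the assumption that each operation is a per-chunk function: composing the functions lets one pass over the chunks replace several, so the intermediate array is never materialized. The helpers `run_unfused`, `fuse`, and `run_fused` are hypothetical and do not reflect Cubed's optimizer.

```python
def run_unfused(chunks, funcs):
    """Apply each function as its own pass, materializing every intermediate."""
    stored = []  # stands in for arrays written to storage after each pass
    current = list(chunks)
    for f in funcs:
        current = [f(chunk) for chunk in current]
        stored.append(current)
    return current, len(stored)


def fuse(funcs):
    """Compose per-chunk functions into a single per-chunk function."""
    def fused(chunk):
        for f in funcs:
            chunk = f(chunk)
        return chunk
    return fused


def run_fused(chunks, funcs):
    """Apply the composed function in one pass; only the result is stored."""
    return run_unfused(chunks, [fuse(funcs)])


chunks = [[1, 2], [3, 4]]
funcs = [
    lambda chunk: [x + 1 for x in chunk],  # elementwise add
    lambda chunk: [x * 2 for x in chunk],  # elementwise multiply
]

unfused_result, unfused_writes = run_unfused(chunks, funcs)
fused_result, fused_writes = run_fused(chunks, funcs)
```

Both paths produce the same chunks; the fused path simply records one write instead of two.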
@@ -452,30 +551,30 @@
},
{
"cell_type": "markdown",
"id": "b1fb4379",
"id": "d5a1fddd",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"* __Modal__: new serverless platform\n",
" * Very easy to set up since it builds the runtime automatically\n",
" * Tested with ~300 workers"
"* __Lithops__: multi-cloud serverless computing framework\n",
" * Slightly more work to get started since you have to build a runtime environment first\n",
" * Tested on AWS Lambda and Google Cloud Functions with ~1000 workers"
]
},
{
"cell_type": "markdown",
"id": "d5a1fddd",
"id": "b1fb4379",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"* __Lithops__: multi-cloud serverless computing framework\n",
" * Slightly more work to get started since you have to build a runtime environment first\n",
" * Tested on AWS Lambda and Google Cloud Functions with ~1000 workers"
"* __Modal__: new serverless platform\n",
" * Very easy to set up since it builds the runtime automatically\n",
" * Tested with ~300 workers"
]
},
{
@@ -525,7 +624,7 @@
"* Retries\n",
" * Each task is tried three times before failing\n",
"* Stragglers\n",
" * A backup task will be launched if a task is taking significantly longer than average (off by default)"
" * A backup task will be launched if a task is taking significantly longer than average"
]
},
{
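The retries and stragglers bullets above can be sketched in a few lines of Python. This is an illustration only, not Cubed's executor code: `run_with_retries` and `is_straggler` are hypothetical names, and the `factor=3.0` threshold is an assumed stand-in for "significantly longer than average".

```python
from statistics import mean


def run_with_retries(task, max_attempts=3):
    """Call `task` until it succeeds, giving up after `max_attempts` tries."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return task()
        except Exception as exc:  # a real executor would catch narrower errors
            last_error = exc
    raise RuntimeError(f"task failed after {max_attempts} attempts") from last_error


def is_straggler(elapsed, completed_durations, factor=3.0):
    """Flag a running task whose elapsed time far exceeds the average so far."""
    if not completed_durations:
        return False
    return elapsed > factor * mean(completed_durations)


# A task that fails twice, then succeeds on the third attempt.
attempts = {"count": 0}


def flaky_task():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise IOError("transient failure")
    return "done"


result = run_with_retries(flaky_task)
```

A scheduler could call `is_straggler` periodically on running tasks and launch a backup copy when it returns `True`, keeping whichever copy finishes first.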
@@ -539,10 +638,10 @@
"source": [
"# Xarray integration\n",
"\n",
"* Tom Nicholas added [Generalize handling of chunked array types](https://github.com/pydata/xarray/pull/7019) to Xarray\n",
" * Xarray can use Cubed as its computation engine instead of Dask\n",
" * Also needs [cubed-xarray](https://github.com/xarray-contrib/cubed-xarray) integration package\n",
"* Examples at https://github.com/pangeo-data/distributed-array-examples"
"* Xarray can use Cubed as its computation engine instead of Dask\n",
" * Just install the [cubed-xarray](https://github.com/xarray-contrib/cubed-xarray) integration package\n",
"* Cubed can use [Flox](https://flox.readthedocs.io/en/latest/) for `groupby` operations\n",
" * Examples at https://flox.readthedocs.io/en/latest/user-stories/climatology-hourly-cubed.html"
]
},
{
@@ -554,13 +653,12 @@
}
},
"source": [
"# Next steps\n",
"# Try out Cubed!\n",
"\n",
"* Community\n",
"* Examples and use cases\n",
" * Pangeo\n",
" * sgkit\n",
"* [Optimizations](https://github.com/tomwhite/cubed/issues?q=is%3Aissue+is%3Aopen+label%3Aoptimization)"
"* Try it out on your use case\n",
" * Get started at https://cubed-dev.github.io/cubed/\n",
"* Some examples from the Pangeo community:\n",
" * https://github.com/pangeo-data/distributed-array-examples"
]
}
],
69 changes: 42 additions & 27 deletions docs/slides/intro/cubed-intro.slides.html

Large diffs are not rendered by default.

Binary file removed docs/slides/intro/fusion-unoptimized.png
Binary file not shown.
Binary file removed docs/slides/intro/fusion.png
Binary file not shown.
Binary file added docs/slides/intro/toy-optimized.png
Binary file added docs/slides/intro/toy-unoptimized.png
