From 516288be3f9ee853a74e67622a9968352ef069a5 Mon Sep 17 00:00:00 2001 From: Kjartan Thor Wikfeldt Date: Mon, 12 Nov 2018 22:25:17 +0100 Subject: [PATCH] add exercise on R+python --- exercises.ipynb | 40 ++- solutions.ipynb | 780 +++++++++++++++++++----------------------------- 2 files changed, 351 insertions(+), 469 deletions(-) diff --git a/exercises.ipynb b/exercises.ipynb index 8ef6436..13bb64a 100644 --- a/exercises.ipynb +++ b/exercises.ipynb @@ -141,7 +141,7 @@ "4. The \"share\" column of the dataframe contains the number of Nobel recipients that shared the prize. Have a look at the statistics of this column using \n", "\n", "```python\n", - "nobels[\"share\"].describe()\n", + "nobel[\"share\"].describe()\n", "```\n", "\n", "5. The `describe()` method is smart about data types. Try this: \n", @@ -161,15 +161,14 @@ "7. Next subtract the birth date from the year of receiving the prize and insert it into a new column \"age\":\n", "```python\n", "nobel[\"age\"] = nobel[\"year\"] - nobel[\"born\"].dt.year\n", - "nobel[[\"surname\",\"age\"]].head(10)\n", "```\n", " - Now print the \"surname\" and \"age\" of first 10 entries using the `head()` method.\n", "\n", "8. Now plot results in two different ways:\n", "\n", "```python\n", - "nobel[\"age\"].plot.hist(bins=[20,30,40,50,60,70,80,\n", - " 90,100],alpha=0.6);\n", + "nobel[\"age\"].plot.hist(bins=[20,30,40,50,60,70,80, 90,100],alpha=0.6);\n", + "\n", "nobel.boxplot(column=\"age\", by=\"category\")\n", "```\n", "\n", @@ -340,6 +339,39 @@ "Final note: While parallelizing Python code is often worth it, there are other ways to get higher performance out of Python code. 
In particular, fast numerical packages like [Numpy](http://www.numpy.org/) should be used, and significant speedup can be obtained with just-in-time compilation with [Numba](https://numba.pydata.org/) and/or C-extensions from [Cython](http://cython.org/).\n" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### *Exercise 6:* Mixing Python and R " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Your goal now is to define a pandas dataframe, pass it into an R cell, and plot it with an R plotting library.\n", + "\n", + "1. Run the following code in a code cell and plot the resulting dataframe with the basic `plot()` method of pandas dataframes:\n", + "\n", + "```python\n", + "import pandas as pd\n", + "df = pd.DataFrame({\n", + " 'cups_of_coffee': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],\n", + " 'productivity': [2, 5, 6, 8, 9, 8, 0, 1, 0, -1]\n", + "})\n", + "```\n", + " \n", + "2. Now take the following R code and use the `%%R` magic command to pass in and plot the pandas dataframe defined above:\n", + "\n", + "```R\n", + "library(ggplot2)\n", + "ggplot(df, aes(x=cups_of_coffee, y=productivity)) + geom_line()\n", + "```\n", + "\n", + "3. Play around with the flags for height, width, units and resolution to get a good-looking graph." 
+ ] + }, { "cell_type": "code", "execution_count": null, diff --git a/solutions.ipynb b/solutions.ipynb index 8b4c779..3a3a326 100644 --- a/solutions.ipynb +++ b/solutions.ipynb @@ -22,7 +22,7 @@ { "data": { "application/vnd.jupyter.widget-view+json": { - "model_id": "507fc8f4df9f4346b07ffe2cfbe9d384", + "model_id": "ef39542623514e52ad0ea5011b911823", "version_major": 2, "version_minor": 0 }, @@ -83,7 +83,7 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 2, "metadata": {}, "outputs": [], "source": [ @@ -99,7 +99,7 @@ }, { "cell_type": "code", - "execution_count": 7, + "execution_count": 3, "metadata": {}, "outputs": [], "source": [ @@ -108,7 +108,7 @@ }, { "cell_type": "code", - "execution_count": 8, + "execution_count": 4, "metadata": {}, "outputs": [], "source": [ @@ -119,7 +119,7 @@ }, { "cell_type": "code", - "execution_count": 9, + "execution_count": 5, "metadata": {}, "outputs": [], "source": [ @@ -144,7 +144,7 @@ }, { "cell_type": "code", - "execution_count": 10, + "execution_count": 6, "metadata": {}, "outputs": [], "source": [ @@ -161,17 +161,19 @@ }, { "cell_type": "code", - "execution_count": 11, + "execution_count": 7, "metadata": {}, "outputs": [ { "data": { - "image/png": "\n", + "image/png": "\n", "text/plain": [ "
" ] }, - "metadata": {}, + "metadata": { + "needs_background": "light" + }, "output_type": "display_data" } ], @@ -189,14 +191,14 @@ }, { "cell_type": "code", - "execution_count": 12, + "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ - "82.3 ms ± 1.64 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)\n" + "83.9 ms ± 1.11 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)\n" ] } ], @@ -213,16 +215,16 @@ }, { "cell_type": "code", - "execution_count": 13, + "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ - "0.07941211669822223" + "0.08231310210030643" ] }, - "execution_count": 13, + "execution_count": 9, "metadata": {}, "output_type": "execute_result" } @@ -240,7 +242,7 @@ }, { "cell_type": "code", - "execution_count": 14, + "execution_count": 10, "metadata": {}, "outputs": [ { @@ -253,17 +255,17 @@ { "data": { "text/plain": [ - " 200003 function calls in 0.118 seconds\n", + " 200003 function calls in 0.138 seconds\n", "\n", " Ordered by: internal time\n", "\n", " ncalls tottime percall cumtime percall filename:lineno(function)\n", - " 1 0.070 0.070 0.118 0.118 :1(walk)\n", - " 99999 0.039 0.000 0.048 0.000 :1(step)\n", - " 99999 0.008 0.000 0.008 0.000 {method 'random' of '_random.Random' objects}\n", + " 1 0.081 0.081 0.138 0.138 :1(walk)\n", + " 99999 0.047 0.000 0.056 0.000 :1(step)\n", + " 99999 0.009 0.000 0.009 0.000 {method 'random' of '_random.Random' objects}\n", " 1 0.001 0.001 0.001 0.001 {built-in method numpy.core.multiarray.zeros}\n", - " 1 0.000 0.000 0.118 0.118 {built-in method builtins.exec}\n", - " 1 0.000 0.000 0.118 0.118 :2()\n", + " 1 0.000 0.000 0.138 0.138 {built-in method builtins.exec}\n", + " 1 0.000 0.000 0.138 0.138 :2()\n", " 1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}" ] }, @@ -293,11 +295,11 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 11, "metadata": {}, "outputs": [], 
"source": [ - "!pip install line_profiler" + "#!pip install line_profiler" ] }, { @@ -309,7 +311,7 @@ }, { "cell_type": "code", - "execution_count": 16, + "execution_count": 12, "metadata": {}, "outputs": [], "source": [ @@ -325,11 +327,34 @@ }, { "cell_type": "code", - "execution_count": 22, + "execution_count": 13, "metadata": {}, "outputs": [], "source": [ - "%load random_walk.py" + "# %load random_walk.py\n", + "import numpy as np\n", + "\n", + "def main():\n", + " n = 100000\n", + " x = walk(n)\n", + "\n", + "def step():\n", + " import random\n", + " return 1. if random.random() > .5 else -1.\n", + "\n", + "def walk(n):\n", + " x = np.zeros(n)\n", + " dx = 1. / n\n", + " for i in range(n - 1):\n", + " x_new = x[i] + dx * step()\n", + " if x_new > 5e-3:\n", + " x[i + 1] = 0.\n", + " else:\n", + " x[i + 1] = x_new\n", + " return x\n", + "\n", + "if __name__==\"__main__\":\n", + " main()\n" ] }, { @@ -342,7 +367,7 @@ }, { "cell_type": "code", - "execution_count": 23, + "execution_count": 14, "metadata": {}, "outputs": [ { @@ -350,41 +375,41 @@ "text/plain": [ "Timer unit: 1e-06 s\n", "\n", - "Total time: 0.98881 s\n", - "File: \n", + "Total time: 0.760819 s\n", + "File: \n", "Function: main at line 4\n", "\n", "Line # Hits Time Per Hit % Time Line Contents\n", "==============================================================\n", " 4 def main():\n", - " 5 1 205.0 205.0 0.0 n = 100000\n", - " 6 1 988605.0 988605.0 100.0 x = walk(n)\n", + " 5 1 6.0 6.0 0.0 n = 100000\n", + " 6 1 760813.0 760813.0 100.0 x = walk(n)\n", "\n", - "Total time: 0.169478 s\n", - "File: \n", + "Total time: 0.133749 s\n", + "File: \n", "Function: step at line 8\n", "\n", "Line # Hits Time Per Hit % Time Line Contents\n", "==============================================================\n", " 8 def step():\n", - " 9 99999 88161.0 0.9 52.0 import random\n", - " 10 99999 81317.0 0.8 48.0 return 1. 
if random.random() > .5 else -1.\n", + " 9 99999 68856.0 0.7 51.5 import random\n", + " 10 99999 64893.0 0.6 48.5 return 1. if random.random() > .5 else -1.\n", "\n", - "Total time: 0.738541 s\n", - "File: \n", + "Total time: 0.568581 s\n", + "File: \n", "Function: walk at line 12\n", "\n", "Line # Hits Time Per Hit % Time Line Contents\n", "==============================================================\n", " 12 def walk(n):\n", - " 13 1 249.0 249.0 0.0 x = np.zeros(n)\n", - " 14 1 13.0 13.0 0.0 dx = 1. / n\n", - " 15 100000 68894.0 0.7 9.3 for i in range(n - 1):\n", - " 16 99999 494694.0 4.9 67.0 x_new = x[i] + dx * step()\n", - " 17 99999 84339.0 0.8 11.4 if x_new > 5e-3:\n", - " 18 x[i + 1] = 0.\n", + " 13 1 973.0 973.0 0.2 x = np.zeros(n)\n", + " 14 1 3.0 3.0 0.0 dx = 1. / n\n", + " 15 100000 53394.0 0.5 9.4 for i in range(n - 1):\n", + " 16 99999 381885.0 3.8 67.2 x_new = x[i] + dx * step()\n", + " 17 99999 65376.0 0.7 11.5 if x_new > 5e-3:\n", + " 18 1 1.0 1.0 0.0 x[i + 1] = 0.\n", " 19 else:\n", - " 20 99999 90352.0 0.9 12.2 x[i + 1] = x_new\n", + " 20 99998 66949.0 0.7 11.8 x[i + 1] = x_new\n", " 21 1 0.0 0.0 0.0 return x" ] }, @@ -412,7 +437,7 @@ }, { "cell_type": "code", - "execution_count": 20, + "execution_count": 15, "metadata": {}, "outputs": [], "source": [ @@ -423,7 +448,7 @@ }, { "cell_type": "code", - "execution_count": 9, + "execution_count": 16, "metadata": {}, "outputs": [ { @@ -440,7 +465,7 @@ "Name: share, dtype: float64" ] }, - "execution_count": 9, + "execution_count": 16, "metadata": {}, "output_type": "execute_result" } @@ -451,7 +476,7 @@ }, { "cell_type": "code", - "execution_count": 10, + "execution_count": 17, "metadata": {}, "outputs": [ { @@ -464,7 +489,7 @@ "Name: bornCountryCode, dtype: object" ] }, - "execution_count": 10, + "execution_count": 17, "metadata": {}, "output_type": "execute_result" } @@ -475,7 +500,7 @@ }, { "cell_type": "code", - "execution_count": 11, + "execution_count": 18, "metadata": {}, "outputs": [], 
"source": [ @@ -491,7 +516,7 @@ }, { "cell_type": "code", - "execution_count": 12, + "execution_count": 19, "metadata": {}, "outputs": [], "source": [ @@ -507,7 +532,7 @@ }, { "cell_type": "code", - "execution_count": 14, + "execution_count": 20, "metadata": {}, "outputs": [ { @@ -604,7 +629,7 @@ "9 Thomson 50.0" ] }, - "execution_count": 14, + "execution_count": 20, "metadata": {}, "output_type": "execute_result" } @@ -615,7 +640,7 @@ }, { "cell_type": "code", - "execution_count": 17, + "execution_count": 21, "metadata": {}, "outputs": [ { @@ -639,16 +664,16 @@ }, { "cell_type": "code", - "execution_count": 18, + "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/plain": [ - "" + "" ] }, - "execution_count": 18, + "execution_count": 22, "metadata": {}, "output_type": "execute_result" }, @@ -678,7 +703,7 @@ }, { "cell_type": "code", - "execution_count": 30, + "execution_count": 23, "metadata": {}, "outputs": [ { @@ -722,7 +747,7 @@ " name\n", " city\n", " country\n", - " number\n", + " age\n", " \n", " \n", " \n", @@ -748,7 +773,7 @@ " Swedish Gas-Accumulator Co.\n", " Lidingö-Stockholm\n", " Sweden\n", - " 1.0\n", + " 43.0\n", " \n", " \n", " 28\n", @@ -772,7 +797,7 @@ " Uppsala University\n", " Uppsala\n", " Sweden\n", - " 1.0\n", + " 38.0\n", " \n", " \n", " 95\n", @@ -796,7 +821,7 @@ " Royal Institute of Technology\n", " Stockholm\n", " Sweden\n", - " 1.0\n", + " 62.0\n", " \n", " \n", " 124\n", @@ -820,7 +845,7 @@ " Uppsala University\n", " Uppsala\n", " Sweden\n", - " 1.0\n", + " 63.0\n", " \n", " \n", " 168\n", @@ -844,7 +869,7 @@ " Stockholm University\n", " Stockholm\n", " Sweden\n", - " 1.0\n", + " 44.0\n", " \n", " \n", " 187\n", @@ -868,7 +893,7 @@ " Uppsala University\n", " Uppsala\n", " Sweden\n", - " 1.0\n", + " 42.0\n", " \n", " \n", " 217\n", @@ -892,7 +917,7 @@ " Uppsala University\n", " Uppsala\n", " Sweden\n", - " 1.0\n", + " 46.0\n", " \n", " \n", " 320\n", @@ -916,7 +941,7 @@ " Uppsala University\n", " Uppsala\n", " 
Sweden\n", - " 1.0\n", + " 49.0\n", " \n", " \n", " 378\n", @@ -940,7 +965,7 @@ " Karolinska Institutet, Nobel Medical Institute\n", " Stockholm\n", " Sweden\n", - " 1.0\n", + " 52.0\n", " \n", " \n", " 415\n", @@ -964,7 +989,7 @@ " Karolinska Institutet\n", " Stockholm\n", " Sweden\n", - " 1.0\n", + " 65.0\n", " \n", " \n", " 445\n", @@ -988,7 +1013,7 @@ " Harvard Medical School\n", " Boston, MA\n", " USA\n", - " 1.0\n", + " 57.0\n", " \n", " \n", " 446\n", @@ -1012,7 +1037,7 @@ " Karolinska Institutet\n", " Stockholm\n", " Sweden\n", - " 1.0\n", + " 66.0\n", " \n", " \n", " 447\n", @@ -1036,7 +1061,7 @@ " Karolinska Institutet\n", " Stockholm\n", " Sweden\n", - " 1.0\n", + " 48.0\n", " \n", " \n", " 494\n", @@ -1060,7 +1085,7 @@ " NaN\n", " NaN\n", " NaN\n", - " 1.0\n", + " 64.0\n", " \n", " \n", " 508\n", @@ -1084,7 +1109,7 @@ " NaN\n", " NaN\n", " NaN\n", - " 1.0\n", + " 61.0\n", " \n", " \n", " 518\n", @@ -1108,7 +1133,7 @@ " NaN\n", " NaN\n", " NaN\n", - " 1.0\n", + " 64.0\n", " \n", " \n", " 543\n", @@ -1132,7 +1157,7 @@ " NaN\n", " NaN\n", " NaN\n", - " 1.0\n", + " 56.0\n", " \n", " \n", " 563\n", @@ -1156,7 +1181,7 @@ " NaN\n", " NaN\n", " NaN\n", - " 1.0\n", + " 80.0\n", " \n", " \n", " 598\n", @@ -1180,7 +1205,7 @@ " NaN\n", " NaN\n", " NaN\n", - " 1.0\n", + " 51.0\n", " \n", " \n", " 604\n", @@ -1204,7 +1229,7 @@ " NaN\n", " NaN\n", " NaN\n", - " 1.0\n", + " 57.0\n", " \n", " \n", " 619\n", @@ -1228,7 +1253,7 @@ " NaN\n", " NaN\n", " NaN\n", - " 1.0\n", + " 67.0\n", " \n", " \n", " 634\n", @@ -1252,7 +1277,7 @@ " NaN\n", " NaN\n", " NaN\n", - " 1.0\n", + " 60.0\n", " \n", " \n", " 658\n", @@ -1276,7 +1301,7 @@ " NaN\n", " NaN\n", " NaN\n", - " 1.0\n", + " 74.0\n", " \n", " \n", " 659\n", @@ -1300,7 +1325,7 @@ " NaN\n", " NaN\n", " NaN\n", - " 1.0\n", + " 70.0\n", " \n", " \n", " 692\n", @@ -1324,7 +1349,7 @@ " NaN\n", " NaN\n", " NaN\n", - " 1.0\n", + " 76.0\n", " \n", " \n", " 697\n", @@ -1348,7 +1373,7 @@ " Stockholm School of Economics\n", " 
Stockholm\n", " Sweden\n", - " 1.0\n", + " 78.0\n", " \n", " \n", " 731\n", @@ -1372,7 +1397,7 @@ " Göteborg University\n", " Gothenburg\n", " Sweden\n", - " 1.0\n", + " 77.0\n", " \n", " \n", " 893\n", @@ -1396,7 +1421,7 @@ " NaN\n", " NaN\n", " NaN\n", - " 1.0\n", + " 80.0\n", " \n", " \n", " 956\n", @@ -1420,7 +1445,7 @@ " Francis Crick Institute\n", " Hertfordshire\n", " United Kingdom\n", - " 1.0\n", + " 77.0\n", " \n", " \n", "\n", @@ -1428,36 +1453,36 @@ "" ], "text/plain": [ - " id firstname surname born died \\\n", - "16 17 Nils Gustaf Dalén 1869-11-30 1937-12-09 \n", - "28 29 Karl Manne Georg Siegbahn 1886-12-03 1978-09-26 \n", - "95 91 Hannes Olof Gösta Alfvén 1908-05-30 1995-04-02 \n", - "124 120 Kai M. Siegbahn 1918-04-20 2007-07-20 \n", - "168 162 Svante August Arrhenius 1859-02-19 1927-10-02 \n", - "187 183 The (Theodor) Svedberg 1884-08-30 1971-02-25 \n", - "217 208 Arne Wilhelm Kaurin Tiselius 1902-08-10 1971-10-29 \n", - "320 305 Allvar Gullstrand 1862-06-05 1930-07-28 \n", - "378 359 Axel Hugo Theodor Theorell 1903-07-06 1982-08-15 \n", - "415 395 Ulf von Euler 1905-02-07 1983-03-09 \n", - "445 424 Torsten N. Wiesel 1924-06-03 0000-00-00 \n", - "446 425 Sune K. Bergström 1916-01-10 2004-08-15 \n", - "447 426 Bengt I. 
Samuelsson 1934-05-21 0000-00-00 \n", - "494 473 Klas Pontus Arnoldson 1844-10-27 1916-02-20 \n", - "508 485 Karl Hjalmar Branting 1860-11-23 1925-02-24 \n", - "518 495 Lars Olof Jonathan (Nathan) Söderblom 1866-01-15 1931-07-12 \n", - "543 520 Dag Hjalmar Agne Carl Hammarskjöld 1905-07-29 1961-09-18 \n", - "563 543 Alva Myrdal 1902-01-31 1986-02-01 \n", - "598 579 Selma Ottilia Lovisa Lagerlöf 1858-11-20 1940-03-16 \n", - "604 585 Carl Gustaf Verner von Heidenstam 1859-07-06 1940-05-20 \n", - "619 604 Erik Axel Karlfeldt 1864-07-20 1931-04-08 \n", - "634 622 Pär Fabian Lagerkvist 1891-05-23 1974-07-11 \n", - "658 649 Eyvind Johnson 1900-07-29 1976-08-25 \n", - "659 650 Harry Martinson 1904-05-06 1978-02-11 \n", - "692 684 Gunnar Myrdal 1898-12-06 1987-05-17 \n", - "697 689 Bertil Ohlin 1899-04-23 1979-08-03 \n", - "731 722 Arvid Carlsson 1923-01-25 0000-00-00 \n", - "893 868 Tomas Tranströmer 1931-04-15 2015-03-26 \n", - "956 921 Tomas Lindahl 1938-01-28 0000-00-00 \n", + " id firstname surname born died \\\n", + "16 17 Nils Gustaf Dalén 1869-11-30 1937-12-09 \n", + "28 29 Karl Manne Georg Siegbahn 1886-12-03 1978-09-26 \n", + "95 91 Hannes Olof Gösta Alfvén 1908-05-30 1995-04-02 \n", + "124 120 Kai M. Siegbahn 1918-04-20 2007-07-20 \n", + "168 162 Svante August Arrhenius 1859-02-19 1927-10-02 \n", + "187 183 The (Theodor) Svedberg 1884-08-30 1971-02-25 \n", + "217 208 Arne Wilhelm Kaurin Tiselius 1902-08-10 1971-10-29 \n", + "320 305 Allvar Gullstrand 1862-06-05 1930-07-28 \n", + "378 359 Axel Hugo Theodor Theorell 1903-07-06 1982-08-15 \n", + "415 395 Ulf von Euler 1905-02-07 1983-03-09 \n", + "445 424 Torsten N. Wiesel 1924-06-03 0000-00-00 \n", + "446 425 Sune K. Bergström 1916-01-10 2004-08-15 \n", + "447 426 Bengt I. 
Samuelsson 1934-05-21 0000-00-00 \n", + "494 473 Klas Pontus Arnoldson 1844-10-27 1916-02-20 \n", + "508 485 Karl Hjalmar Branting 1860-11-23 1925-02-24 \n", + "518 495 Lars Olof Jonathan (Nathan) Söderblom 1866-01-15 1931-07-12 \n", + "543 520 Dag Hjalmar Agne Carl Hammarskjöld 1905-07-29 1961-09-18 \n", + "563 543 Alva Myrdal 1902-01-31 1986-02-01 \n", + "598 579 Selma Ottilia Lovisa Lagerlöf 1858-11-20 1940-03-16 \n", + "604 585 Carl Gustaf Verner von Heidenstam 1859-07-06 1940-05-20 \n", + "619 604 Erik Axel Karlfeldt 1864-07-20 1931-04-08 \n", + "634 622 Pär Fabian Lagerkvist 1891-05-23 1974-07-11 \n", + "658 649 Eyvind Johnson 1900-07-29 1976-08-25 \n", + "659 650 Harry Martinson 1904-05-06 1978-02-11 \n", + "692 684 Gunnar Myrdal 1898-12-06 1987-05-17 \n", + "697 689 Bertil Ohlin 1899-04-23 1979-08-03 \n", + "731 722 Arvid Carlsson 1923-01-25 0000-00-00 \n", + "893 868 Tomas Tranströmer 1931-04-15 2015-03-26 \n", + "956 921 Tomas Lindahl 1938-01-28 0000-00-00 \n", "\n", " bornCountry bornCountryCode bornCity \\\n", "16 Sweden SE Stenstorp \n", @@ -1490,36 +1515,36 @@ "893 Sweden SE Stockholm \n", "956 Sweden SE Stockholm \n", "\n", - " diedCountry diedCountryCode ... gender year \\\n", - "16 Sweden SE ... male 1912.0 \n", - "28 Sweden SE ... male 1924.0 \n", - "95 Sweden SE ... male 1970.0 \n", - "124 Sweden SE ... male 1981.0 \n", - "168 Sweden SE ... male 1903.0 \n", - "187 Sweden SE ... male 1926.0 \n", - "217 Sweden SE ... male 1948.0 \n", - "320 Sweden SE ... male 1911.0 \n", - "378 Sweden SE ... male 1955.0 \n", - "415 Sweden SE ... male 1970.0 \n", - "445 NaN NaN ... male 1981.0 \n", - "446 Sweden SE ... male 1982.0 \n", - "447 NaN NaN ... male 1982.0 \n", - "494 Sweden SE ... male 1908.0 \n", - "508 Sweden SE ... male 1921.0 \n", - "518 Sweden SE ... male 1930.0 \n", - "543 Northern Rhodesia (now Zambia) ZM ... male 1961.0 \n", - "563 Sweden SE ... female 1982.0 \n", - "598 Sweden SE ... female 1909.0 \n", - "604 Sweden SE ... 
male 1916.0 \n", - "619 Sweden SE ... male 1931.0 \n", - "634 Sweden SE ... male 1951.0 \n", - "658 Sweden SE ... male 1974.0 \n", - "659 Sweden SE ... male 1974.0 \n", - "692 Sweden SE ... male 1974.0 \n", - "697 Sweden SE ... male 1977.0 \n", - "731 NaN NaN ... male 2000.0 \n", - "893 Sweden SE ... male 2011.0 \n", - "956 NaN NaN ... male 2015.0 \n", + " diedCountry diedCountryCode ... gender year \\\n", + "16 Sweden SE ... male 1912.0 \n", + "28 Sweden SE ... male 1924.0 \n", + "95 Sweden SE ... male 1970.0 \n", + "124 Sweden SE ... male 1981.0 \n", + "168 Sweden SE ... male 1903.0 \n", + "187 Sweden SE ... male 1926.0 \n", + "217 Sweden SE ... male 1948.0 \n", + "320 Sweden SE ... male 1911.0 \n", + "378 Sweden SE ... male 1955.0 \n", + "415 Sweden SE ... male 1970.0 \n", + "445 NaN NaN ... male 1981.0 \n", + "446 Sweden SE ... male 1982.0 \n", + "447 NaN NaN ... male 1982.0 \n", + "494 Sweden SE ... male 1908.0 \n", + "508 Sweden SE ... male 1921.0 \n", + "518 Sweden SE ... male 1930.0 \n", + "543 Northern Rhodesia (now Zambia) ZM ... male 1961.0 \n", + "563 Sweden SE ... female 1982.0 \n", + "598 Sweden SE ... female 1909.0 \n", + "604 Sweden SE ... male 1916.0 \n", + "619 Sweden SE ... male 1931.0 \n", + "634 Sweden SE ... male 1951.0 \n", + "658 Sweden SE ... male 1974.0 \n", + "659 Sweden SE ... male 1974.0 \n", + "692 Sweden SE ... male 1974.0 \n", + "697 Sweden SE ... male 1977.0 \n", + "731 NaN NaN ... male 2000.0 \n", + "893 Sweden SE ... male 2011.0 \n", + "956 NaN NaN ... 
male 2015.0 \n", "\n", " category overallMotivation share \\\n", "16 physics NaN 1.0 \n", @@ -1614,41 +1639,41 @@ "893 NaN NaN \n", "956 Francis Crick Institute Hertfordshire \n", "\n", - " country number \n", - "16 Sweden 1.0 \n", - "28 Sweden 1.0 \n", - "95 Sweden 1.0 \n", - "124 Sweden 1.0 \n", - "168 Sweden 1.0 \n", - "187 Sweden 1.0 \n", - "217 Sweden 1.0 \n", - "320 Sweden 1.0 \n", - "378 Sweden 1.0 \n", - "415 Sweden 1.0 \n", - "445 USA 1.0 \n", - "446 Sweden 1.0 \n", - "447 Sweden 1.0 \n", - "494 NaN 1.0 \n", - "508 NaN 1.0 \n", - "518 NaN 1.0 \n", - "543 NaN 1.0 \n", - "563 NaN 1.0 \n", - "598 NaN 1.0 \n", - "604 NaN 1.0 \n", - "619 NaN 1.0 \n", - "634 NaN 1.0 \n", - "658 NaN 1.0 \n", - "659 NaN 1.0 \n", - "692 NaN 1.0 \n", - "697 Sweden 1.0 \n", - "731 Sweden 1.0 \n", - "893 NaN 1.0 \n", - "956 United Kingdom 1.0 \n", + " country age \n", + "16 Sweden 43.0 \n", + "28 Sweden 38.0 \n", + "95 Sweden 62.0 \n", + "124 Sweden 63.0 \n", + "168 Sweden 44.0 \n", + "187 Sweden 42.0 \n", + "217 Sweden 46.0 \n", + "320 Sweden 49.0 \n", + "378 Sweden 52.0 \n", + "415 Sweden 65.0 \n", + "445 USA 57.0 \n", + "446 Sweden 66.0 \n", + "447 Sweden 48.0 \n", + "494 NaN 64.0 \n", + "508 NaN 61.0 \n", + "518 NaN 64.0 \n", + "543 NaN 56.0 \n", + "563 NaN 80.0 \n", + "598 NaN 51.0 \n", + "604 NaN 57.0 \n", + "619 NaN 67.0 \n", + "634 NaN 60.0 \n", + "658 NaN 74.0 \n", + "659 NaN 70.0 \n", + "692 NaN 76.0 \n", + "697 Sweden 78.0 \n", + "731 Sweden 77.0 \n", + "893 NaN 80.0 \n", + "956 United Kingdom 77.0 \n", "\n", "[29 rows x 21 columns]" ] }, - "execution_count": 30, + "execution_count": 23, "metadata": {}, "output_type": "execute_result" } @@ -1667,16 +1692,23 @@ }, { "cell_type": "code", - "execution_count": 25, + "execution_count": 24, "metadata": {}, "outputs": [], "source": [ "nobel[\"number\"] = 1.0" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Pick a few countries to analyze further" + ] + }, { "cell_type": "code", - "execution_count": 36, + 
"execution_count": 25, "metadata": {}, "outputs": [], "source": [ @@ -1693,7 +1725,7 @@ }, { "cell_type": "code", - "execution_count": 37, + "execution_count": 26, "metadata": {}, "outputs": [ { @@ -1784,7 +1816,7 @@ "United Kingdom 22.0 7.0 6.0 26.0 5.0 22.0" ] }, - "execution_count": 37, + "execution_count": 26, "metadata": {}, "output_type": "execute_result" } @@ -1804,7 +1836,7 @@ }, { "cell_type": "code", - "execution_count": 38, + "execution_count": 27, "metadata": {}, "outputs": [ { @@ -1834,7 +1866,7 @@ }, { "cell_type": "code", - "execution_count": 2, + "execution_count": 28, "metadata": {}, "outputs": [], "source": [ @@ -1843,7 +1875,7 @@ }, { "cell_type": "code", - "execution_count": 3, + "execution_count": 29, "metadata": {}, "outputs": [], "source": [ @@ -1871,7 +1903,7 @@ }, { "cell_type": "code", - "execution_count": 5, + "execution_count": 30, "metadata": {}, "outputs": [], "source": [ @@ -1880,7 +1912,32 @@ }, { "cell_type": "code", - "execution_count": 6, + "execution_count": 31, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Object `%cpp` not found.\n" + ] + } + ], + "source": [ + "# get help on the cpp magic:\n", + "%cpp?" 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Hello World program in C++" + ] + }, + { + "cell_type": "code", + "execution_count": 32, "metadata": {}, "outputs": [ { @@ -1910,40 +1967,34 @@ "### *Exercise 5:* Parallel Python with ipyparallel " ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Import module, create client and DirectView object:" + ] + }, { "cell_type": "code", - "execution_count": 1, + "execution_count": 34, "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Waiting for connection file: ~/.ipython/profile_default/security/ipcontroller-client.json\n" - ] - }, - { - "ename": "OSError", - "evalue": "Connection file '~/.ipython/profile_default/security/ipcontroller-client.json' not found.\nYou have attempted to connect to an IPython Cluster but no Controller could be found.\nPlease double-check your configuration and ensure that a cluster is running.", - "output_type": "error", - "traceback": [ - "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", - "\u001b[0;31mOSError\u001b[0m Traceback (most recent call last)", - "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mipyparallel\u001b[0m \u001b[0;32mas\u001b[0m \u001b[0mipp\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0mclient\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mipp\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mClient\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 3\u001b[0m \u001b[0mdview\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mclient\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", - "\u001b[0;32m~/anaconda2/envs/juplab/lib/python3.6/site-packages/ipyparallel/client/client.py\u001b[0m in \u001b[0;36m__init__\u001b[0;34m(self, url_file, 
profile, profile_dir, ipython_dir, context, debug, sshserver, sshkey, password, paramiko, timeout, cluster_id, **extra_args)\u001b[0m\n\u001b[1;32m 411\u001b[0m \u001b[0mno_file_msg\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 412\u001b[0m ])\n\u001b[0;32m--> 413\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mIOError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mmsg\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 414\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0murl_file\u001b[0m \u001b[0;32mis\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 415\u001b[0m \u001b[0;32mraise\u001b[0m \u001b[0mIOError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mno_file_msg\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", - "\u001b[0;31mOSError\u001b[0m: Connection file '~/.ipython/profile_default/security/ipcontroller-client.json' not found.\nYou have attempted to connect to an IPython Cluster but no Controller could be found.\nPlease double-check your configuration and ensure that a cluster is running." - ] - } - ], + "outputs": [], "source": [ "import ipyparallel as ipp\n", "client = ipp.Client()\n", "dview = client[:]" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Import modules, export `random` module to engines:" + ] + }, { "cell_type": "code", - "execution_count": null, + "execution_count": 35, "metadata": {}, "outputs": [], "source": [ @@ -1954,7 +2005,7 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 36, "metadata": {}, "outputs": [], "source": [ @@ -1970,17 +2021,32 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 37, "metadata": {}, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "3.05 s ± 97.1 ms per loop (mean ± std. dev. 
of 7 runs, 1 loop each)\n" + ] + } + ], "source": [ "%%timeit -n 1\n", "mcpi(int(1e7))" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Function for splitting up the samples and dispatching the chunks to the engines" + ] + }, { "cell_type": "code", - "execution_count": null, + "execution_count": 38, "metadata": {}, "outputs": [], "source": [ @@ -1998,9 +2064,17 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 39, "metadata": {}, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "1.71 s ± 30.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n" + ] + } + ], "source": [ "%%timeit -n 1\n", "multi_mcpi(dview, int(1e7))" @@ -2010,274 +2084,50 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Lesson 3 - accelerating python" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import math\n", - "import random\n", - "import numpy as np\n", - "import matplotlib.pyplot as plt\n", - "import seaborn\n", - "%matplotlib inline" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Exercise 3.2" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "def primes(kmax): \n", - " p = []\n", - " result = [] \n", - " if kmax > 1000:\n", - " kmax = 1000\n", - " k = 0\n", - " n = 2\n", - " while k < kmax:\n", - " i = 0\n", - " while i < k and n % p[i] != 0:\n", - " i = i + 1\n", - " if i == k:\n", - " p.append(n)\n", - " k = k + 1\n", - " result.append(n)\n", - " n = n + 1\n", - " return result" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "t_py = %timeit -o p = primes(100)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%load_ext Cython" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, 
- "source": [ - "Start with simplest cython" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%cython -a\n", - "def primes_simplecython(kmax): \n", - " p = []\n", - " result = [] \n", - " if kmax > 1000:\n", - " kmax = 1000\n", - " k = 0\n", - " n = 2\n", - " while k < kmax:\n", - " i = 0\n", - " while i < k and n % p[i] != 0:\n", - " i = i + 1\n", - " if i == k:\n", - " p.append(n)\n", - " k = k + 1\n", - " result.append(n)\n", - " n = n + 1\n", - " return result" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "t_cy0 = %timeit -o p = primes_simplecython(100)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Now for proper cythonization. Add annotation (`-a`) if you want!" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%cython -a\n", - "def primes_cython(int kmax): # The argument will be converted to int or raise a TypeError.\n", - " cdef int n, k, i # These variables are declared with C types.\n", - " cdef int p[1000] # Another C type\n", - " result = [] # A Python type\n", - " if kmax > 1000:\n", - " kmax = 1000\n", - " k = 0\n", - " n = 2\n", - " while k < kmax:\n", - " i = 0\n", - " while i < k and n % p[i] != 0:\n", - " i = i + 1\n", - " if i == k:\n", - " p[k] = n\n", - " k = k + 1\n", - " result.append(n)\n", - " n = n + 1\n", - " return result" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "t_cy = %timeit -o p = primes_cython(100)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Factor 20 in speedup" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Finally, let's compare with just-in-time compilation with numba" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - 
"source": [ - "from numba import jit, vectorize, float64" + "### *Exercise 6:* Mixing Python and R " ] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 40, "metadata": {}, "outputs": [], "source": [ - "@jit\n", - "def primes_jit(kmax): \n", - " p = []\n", - " result = [] \n", - " if kmax > 1000:\n", - " kmax = 1000\n", - " k = 0\n", - " n = 2\n", - " while k < kmax:\n", - " i = 0\n", - " while i < k and n % p[i] != 0:\n", - " i = i + 1\n", - " if i == k:\n", - " p.append(n)\n", - " k = k + 1\n", - " result.append(n)\n", - " n = n + 1\n", - " return result" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "t_jit = %timeit -o p = primes_jit(100)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "print(\" Python: %.3E\\n Simply Cython: %.3E\\n Proper Cython: %.3E\\n Numba-jit: %.3E\"%(t_py.best,t_cy0.best,t_cy.best,t_jit.best))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Just-in-time compilation comes close to Cython" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Notice that the slowest run took much much longer than the fastest. 
\n", - "`cache=True` stores the compiled function in file-based cache and avoids re-compilation on re-running\n" + "import pandas as pd\n", + "df = pd.DataFrame({\n", + " 'cups_of_coffee': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],\n", + " 'productivity': [2, 5, 6, 8, 9, 8, 0, 1, 0, -1]\n", + "})" ] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 41, "metadata": {}, "outputs": [], "source": [ - "@jit(cache=True)\n", - "def primes_jit2(kmax): \n", - " p = []\n", - " result = [] \n", - " if kmax > 1000:\n", - " kmax = 1000\n", - " k = 0\n", - " n = 2\n", - " while k < kmax:\n", - " i = 0\n", - " while i < k and n % p[i] != 0:\n", - " i = i + 1\n", - " if i == k:\n", - " p.append(n)\n", - " k = k + 1\n", - " result.append(n)\n", - " n = n + 1\n", - " return result" + "%load_ext rpy2.ipython" ] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 43, "metadata": {}, - "outputs": [], + "outputs": [ + { + "data": { + "image/png": "\n" + }, + "metadata": {}, + "output_type": "display_data" + } + ], "source": [ - "%%timeit\n", - "p = primes_jit2(100)" + "%%R -i df -w 6 -h 4 --units cm -r 200\n", + "# The first line says: import df and make the default figure size 6 by 4 cm\n", + "# with resolution 200. You can change the units to px, in, etc. as you wish.\n", + "library(ggplot2)\n", + "ggplot(df, aes(x=cups_of_coffee, y=productivity)) + geom_line();" ] }, {