add exercise on R+python
wikfeldt committed Nov 12, 2018
1 parent 4391472 commit 516288b
Showing 2 changed files with 351 additions and 469 deletions.
40 changes: 36 additions & 4 deletions exercises.ipynb
@@ -141,7 +141,7 @@
"4. The \"share\" column of the dataframe contains the number of Nobel recipients that shared the prize. Have a look at the statistics of this column using \n",
"\n",
"```python\n",
"nobels[\"share\"].describe()\n",
"nobel[\"share\"].describe()\n",
"```\n",
"\n",
"5. The `describe()` method is smart about data types. Try this: \n",
@@ -161,15 +161,14 @@
"7. Next subtract the birth year from the year the prize was awarded and insert the result into a new column \"age\":\n",
"```python\n",
"nobel[\"age\"] = nobel[\"year\"] - nobel[\"born\"].dt.year\n",
"nobel[[\"surname\",\"age\"]].head(10)\n",
"```\n",
" - Now print the \"surname\" and \"age\" of the first 10 entries using the `head()` method.\n",
"\n",
"8. Now plot results in two different ways:\n",
"\n",
"```python\n",
"nobel[\"age\"].plot.hist(bins=[20,30,40,50,60,70,80,\n",
" 90,100],alpha=0.6);\n",
"nobel[\"age\"].plot.hist(bins=[20,30,40,50,60,70,80, 90,100],alpha=0.6);\n",
"\n",
"nobel.boxplot(column=\"age\", by=\"category\")\n",
"```\n",
"\n",
@@ -340,6 +339,39 @@
"Final note: While parallelizing Python code is often worth it, there are other ways to get higher performance out of Python code. In particular, fast numerical packages like [Numpy](http://www.numpy.org/) should be used, and significant speedup can be obtained with just-in-time compilation with [Numba](https://numba.pydata.org/) and/or C-extensions from [Cython](http://cython.org/).\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### <font color=\"red\"> *Exercise 6:* Mixing Python and R </font>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Your goal now is to define a pandas dataframe, and pass it into an R cell and plot it with an R plotting library.\n",
"\n",
"1. Run the following code in a code cell and plot the resulting dataframe with the basic `plot()` method of pandas dataframes:\n",
"\n",
"```python\n",
"import pandas as pd\n",
"df = pd.DataFrame({\n",
" 'cups_of_coffee': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],\n",
" 'productivity': [2, 5, 6, 8, 9, 8, 0, 1, 0, -1]\n",
"})\n",
"```\n",
" \n",
"2. Now take the following R code, and use the `%%R` magic command to pass in and plot the pandas dataframe defined above:\n",
"\n",
"```R\n",
"library(ggplot2)\n",
"ggplot(df, aes(x=cups_of_coffee, y=productivity)) + geom_line()\n",
"```\n",
"\n",
"3. Play around with the flags for height, width, units and resolution to get a good-looking graph."
]
},
{
"cell_type": "code",
"execution_count": null,
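
Note on steps 2 and 3 of Exercise 6: the `%%R` magic becomes available after running `%load_ext rpy2.ipython` in the notebook, which assumes that the rpy2 package and an R installation with ggplot2 are present. As a minimal sketch (the helper `r_cell` and its default values are hypothetical, introduced only for illustration), the Python snippet below assembles the text of such a cell. It uses `-i` to pass in the dataframe and `-w`, `-h`, `-u`, `-r` for width, height, units and resolution; it is worth checking these flag names against the rpy2 version you have installed.

```python
# Hypothetical helper, not part of the exercise: builds the text of a %%R cell.
def r_cell(code, inputs=("df",), width=6, height=4, units="in", res=150):
    """Return the text of a %%R cell that imports `inputs` and sets the plot size."""
    # One -i flag per Python variable that should be visible inside R.
    flags = [f"-i {name}" for name in inputs]
    # Size of the plotting device: width, height, units and resolution.
    flags += [f"-w {width}", f"-h {height}", f"-u {units}", f"-r {res}"]
    return "%%R " + " ".join(flags) + "\n" + code.strip() + "\n"


# The R code given in step 2 of the exercise.
R_CODE = """
library(ggplot2)
ggplot(df, aes(x=cups_of_coffee, y=productivity)) + geom_line()
"""

cell = r_cell(R_CODE)
print(cell)
```

Pasting the printed text into a new notebook cell and running it should draw the ggplot figure, provided `df` has been defined in an earlier cell. Changing the arguments of `r_cell` is a quick way to experiment with plot size and resolution.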