Merge pull request #3 from juanshishido/master

Chapter 3: dictionaries
AllenDowney · Jan 22, 2017 · b71e738 · b71e738
2 parents f7998f2 + a3099a5
commit b71e738

Showing 2 changed files with 367 additions and 0 deletions.
diff --git a/03-dict-set/dialcodes.ipynb b/03-dict-set/dialcodes.ipynb
@@ -0,0 +1,166 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Dial Codes and Dictionaries\n",
+    "\n",
+    "This notebook contains example code from [*Fluent Python*](http://shop.oreilly.com/product/0636920032519.do), by Luciano Ramalho.\n",
+    "\n",
+    "Code by Luciano Ramalho, modified by Allen Downey.\n",
+    "\n",
+    "MIT License: https://opensource.org/licenses/MIT"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Below, we'll show what happens when we create dictionaries from the data in `DIAL_CODES`, sorted in different ways.\n",
+    "\n",
+    "We'll start by creating the data: a list of tuples with country codes and country names."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "# dial codes of the top 10 most populous countries\n",
+    "DIAL_CODES = [\n",
+    "        (86, 'China'),\n",
+    "        (91, 'India'),\n",
+    "        (1, 'United States'),\n",
+    "        (62, 'Indonesia'),\n",
+    "        (55, 'Brazil'),\n",
+    "        (92, 'Pakistan'),\n",
+    "        (880, 'Bangladesh'),\n",
+    "        (234, 'Nigeria'),\n",
+    "        (7, 'Russia'),\n",
+    "        (81, 'Japan'),\n",
+    "    ]"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "We can create a Python dictionary by calling `dict()` with `DIAL_CODES` as the argument. Calling `.keys()` on the result, `d1`, returns a view of its keys (a `dict_keys` object in Python 3, not a list)."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "d1: dict_keys([880, 1, 86, 55, 7, 234, 91, 92, 62, 81])\n"
+     ]
+    }
+   ],
+   "source": [
+    "d1 = dict(DIAL_CODES)\n",
+    "print('d1:', d1.keys())"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
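+    "Notice that the keys are *not* in the same order as in `DIAL_CODES`. (This output comes from Python 3.5; since Python 3.7, dictionaries preserve insertion order.)\n",
+    "\n",
+    "Let's create two more dictionaries, this time sorting the input data first."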
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "d2: dict_keys([880, 1, 91, 86, 81, 55, 234, 7, 92, 62])\n"
+     ]
+    }
+   ],
+   "source": [
+    "d2 = dict(sorted(DIAL_CODES))\n",
+    "print('d2:', d2.keys())"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "d3: dict_keys([880, 81, 1, 86, 55, 7, 234, 91, 92, 62])\n"
+     ]
+    }
+   ],
+   "source": [
+    "d3 = dict(sorted(DIAL_CODES, key=lambda x: x[1]))\n",
+    "print('d3:', d3.keys())"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Again, we see the keys are not in the same order as in `DIAL_CODES` or as in `d1`.\n",
+    "\n",
+    "However, the three dictionaries compare equal, because dictionary equality depends only on the key-value pairs, not on their order."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [],
+   "source": [
+    "assert d1 == d2 and d2 == d3"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.5.1"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 0
+}
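
The behaviour the dialcodes notebook demonstrates can be sketched in a few lines. This is a minimal illustration, not part of the committed notebook: it uses a shortened three-entry sample of `DIAL_CODES`, and the printed orders assume Python 3.7 or later, where dictionaries keep insertion order (the notebook's own output was produced under Python 3.5.1, where they did not).

```python
# Shortened, illustrative sample of the DIAL_CODES data (assumption: three entries only)
DIAL_CODES = [(86, 'China'), (91, 'India'), (1, 'United States')]

d1 = dict(DIAL_CODES)                              # input order
d2 = dict(sorted(DIAL_CODES))                      # sorted by dial code
d3 = dict(sorted(DIAL_CODES, key=lambda x: x[1]))  # sorted by country name

# On Python 3.7+, each key list follows the order in which items were inserted
for name, d in [('d1', d1), ('d2', d2), ('d3', d3)]:
    print(name, list(d))

# Equality ignores ordering: only the key-value pairs matter
assert d1 == d2 == d3
```

Whatever the interpreter version, the final assertion holds, which is the point the notebook makes with `assert d1 == d2 and d2 == d3`.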
diff --git a/03-dict-set/index.ipynb b/03-dict-set/index.ipynb
@@ -0,0 +1,201 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Handling Missing Keys with `setdefault`\n",
+    "\n",
+    "This notebook contains example code from [*Fluent Python*](http://shop.oreilly.com/product/0636920032519.do), by Luciano Ramalho.\n",
+    "\n",
+    "Code by Luciano Ramalho, modified by Allen Downey.\n",
+    "\n",
+    "MIT License: https://opensource.org/licenses/MIT"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "In this notebook, we show two ways to create a mapping of words and their occurrences. For each word in a text file, we create a list of tuples, one for each occurrence. The tuple values represent the position (line and column) of the word in the file. (Note that the line and column positions are indexed starting at one.)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "import os\n",
+    "import sys\n",
+    "import re"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "file_name = 'text.txt'"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "# temporary text file\n",
+    "with open(file_name, 'w') as f:\n",
+    "    f.write('Fluent Python notebooks\\nJupyter notebooks')"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "WORD_RE = re.compile(r'\\w+')"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [],
+   "source": [
+    "# using `.get()` method\n",
+    "index = {}\n",
+    "with open(file_name) as fp:\n",
+    "    for line_no, line in enumerate(fp, 1):\n",
+    "        for match in WORD_RE.finditer(line):\n",
+    "            word = match.group()\n",
+    "            column_no = match.start()+1\n",
+    "            location = (line_no, column_no)\n",
+    "            # this is ugly; coded like this to make a point\n",
+    "            occurrences = index.get(word, [])  # <1>\n",
+    "            occurrences.append(location)       # <2>\n",
+    "            index[word] = occurrences          # <3>"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Fluent [(1, 1)]\n",
+      "Jupyter [(2, 1)]\n",
+      "notebooks [(1, 15), (2, 9)]\n",
+      "Python [(1, 8)]\n"
+     ]
+    }
+   ],
+   "source": [
+    "# print in alphabetical order\n",
+    "for word in sorted(index, key=str.upper):\n",
+    "    print(word, index[word])"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 7,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "# better solution\n",
+    "# using `.setdefault()` method\n",
+    "index = {}\n",
+    "with open(file_name) as fp:\n",
+    "    for line_no, line in enumerate(fp, 1):\n",
+    "        for match in WORD_RE.finditer(line):\n",
+    "            word = match.group()\n",
+    "            column_no = match.start()+1\n",
+    "            location = (line_no, column_no)\n",
+    "            index.setdefault(word, []).append(location)  # <1>"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 8,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Fluent [(1, 1)]\n",
+      "Jupyter [(2, 1)]\n",
+      "notebooks [(1, 15), (2, 9)]\n",
+      "Python [(1, 8)]\n"
+     ]
+    }
+   ],
+   "source": [
+    "# print in alphabetical order\n",
+    "for word in sorted(index, key=str.upper):\n",
+    "    print(word, index[word])"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "This shows that both blocks of code produce the same result. However, the `.setdefault()` version needs fewer lines of code and is also more efficient: it performs a single lookup per word, whereas the `.get()` version performs two (one for `.get()` and one for the assignment)."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 9,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "os.remove(file_name)"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.5.1"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 0
+}
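
The equivalence of the two indexing approaches in the `setdefault` notebook can also be checked without a temporary file. The following is a rough sketch under stated assumptions: `build_index` and its `use_setdefault` flag are hypothetical names introduced here for illustration, and the text is read from an in-memory `io.StringIO` rather than from `text.txt`.

```python
import io
import re

WORD_RE = re.compile(r'\w+')
TEXT = 'Fluent Python notebooks\nJupyter notebooks'  # same text the notebook writes to disk


def build_index(use_setdefault):
    """Map each word to a list of (line, column) positions, both counted from 1."""
    index = {}
    for line_no, line in enumerate(io.StringIO(TEXT), 1):
        for match in WORD_RE.finditer(line):
            word = match.group()
            location = (line_no, match.start() + 1)
            if use_setdefault:
                # one call: fetch the list, creating it if the key is missing
                index.setdefault(word, []).append(location)
            else:
                # two dictionary operations: a get, then an assignment
                occurrences = index.get(word, [])
                occurrences.append(location)
                index[word] = occurrences
    return index


index_get = build_index(use_setdefault=False)
index_setdefault = build_index(use_setdefault=True)

assert index_get == index_setdefault
print(index_setdefault['notebooks'])  # [(1, 15), (2, 9)]
```

Keeping both variants behind one function makes the comparison explicit; in real code only the `setdefault` branch would be needed.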