# Regular expressions workshop - material borrowed from Sweigart, *Automate the Boring Stuff with Python*, chapter 7

<table summary="Shorthand Codes for Common Character Classes" class="calibre9">
<colgroup class="calibre10">
<col class="calibre11"></col>
<col class="calibre11"></col>
</colgroup>
<thead class="calibre12">
<tr class="calibre13">
<th valign="top" class="calibre14">
<p class="calibre4">Shorthand character class</p>
</th>
<th valign="top" class="calibre15">
<p class="calibre4">Represents</p>
</th>
</tr>
</thead>
<tbody class="calibre16">
<tr class="calibre13">
<td valign="top" class="calibre17">
<p class="calibre4"><code class="literal2">\d</code></p>
</td>
<td valign="top" class="calibre18">
<p class="calibre4">Any numeric digit from 0 to 9.</p>
</td>
</tr>
<tr class="calibre19">
<td valign="top" class="calibre17">
<p class="calibre4"><code class="literal2">\D</code></p>
</td>
<td valign="top" class="calibre18">
<p class="calibre4">Any character that is <span class="calibre1"><em class="calibre5">not</em></span> a numeric digit from 0 to 9.</p>
</td>
</tr>
<tr class="calibre13">
<td valign="top" class="calibre17">
<p class="calibre4"><code class="literal2">\w</code></p>
</td>
<td valign="top" class="calibre18">
<p class="calibre4">Any letter, numeric digit, or the underscore character. (Think of this as matching &ldquo;word&rdquo; characters.)</p>
</td>
</tr>
<tr class="calibre19">
<td valign="top" class="calibre17">
<p class="calibre4"><code class="literal2">\W</code></p>
</td>
<td valign="top" class="calibre18">
<p class="calibre4">Any character that is <span class="calibre1"><em class="calibre5">not</em></span> a letter, numeric digit, or the underscore character.</p>
</td>
</tr>
<tr class="calibre13">
<td valign="top" class="calibre17">
<p class="calibre4"><code class="literal2">\s</code></p>
</td>
<td valign="top" class="calibre18">
<p class="calibre4">Any space, tab, or newline character. (Think of this as matching &ldquo;space&rdquo; characters.)</p>
</td>
</tr>
<tr class="calibre19">
<td valign="top" class="calibre20">
<p class="calibre4"><code class="literal2">\S</code></p>
</td>
<td valign="top" class="calibre21">
<p class="calibre4">Any character that is <span class="calibre1"><em class="calibre5">not</em></span> a space, tab, or newline.</p>
</td>
</tr>
</tbody>
</table>
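
To see the shorthand classes side by side, here is a small sketch using Python's `re.findall()` and `re.sub()` on a made-up sample string. It is illustrative only; the sample text and variable names are invented for this example.

```python
import re

sample = 'Agent_007 reports:\t3 cats'

digits = re.findall(r'\d', sample)        # each digit character
words = re.findall(r'\w+', sample)        # runs of letters, digits, underscores
spaces = re.findall(r'\s', sample)        # each space or tab character
non_space = re.findall(r'\S+', sample)    # runs of anything that is not whitespace
non_word = re.findall(r'\W', sample)      # each character that is not a "word" character
only_digits = re.sub(r'\D', '', sample)   # delete every non-digit character

print(digits)       # ['0', '0', '7', '3']
print(words)        # ['Agent_007', 'reports', '3', 'cats']
print(spaces)       # [' ', '\t', ' ']
print(non_space)    # ['Agent_007', 'reports:', '3', 'cats']
print(non_word)     # [' ', ':', '\t', ' ']
print(only_digits)  # 0073
```

In Python 3, `\d`, `\w`, and `\s` also match non-ASCII Unicode digits, letters, and whitespace in `str` patterns unless the `re.ASCII` flag is passed.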

<div class="book" title="Review of Regex Symbols">
<div class="titlepage">
<div class="book">
<div class="book">
<h1 class="title2"><a id="calibre_link-2603" class="firstname"></a>Review of Regex Symbols</h1>
</div>
</div>
</div>
<p class="calibre4">This chapter covered a lot of notation, so here&rsquo;s a quick review of what you learned:</p>
<div class="book">
<ul class="itemizedlist">
<li class="listitem">
<p class="calibre4">The <code class="literal1">?</code> matches zero or one of the preceding group.</p>
</li>
<li class="listitem">
<p class="calibre4">The <code class="literal1">*</code> matches zero or more of the preceding group.</p>
</li>
<li class="listitem">
<p class="calibre4">The <code class="literal1">+</code> matches one or more of the preceding group.</p>
</li>
<li class="listitem">
<p class="calibre4">The <code class="literal1">{n}</code> matches exactly <span class="calibre1"><em class="calibre5">n</em></span> of the preceding group.</p>
</li>
<li class="listitem">
<p class="calibre4">The <code class="literal1">{n,}</code> matches <span class="calibre1"><em class="calibre5">n</em></span> or more of the preceding group.</p>
</li>
<li class="listitem">
<p class="calibre4">The <code class="literal1">{,m}</code> matches 0 to <span class="calibre1"><em class="calibre5">m</em></span> of the preceding group.</p>
</li>
<li class="listitem">
<p class="calibre4">The <code class="literal1">{n,m}</code> matches at least <span class="calibre1"><em class="calibre5">n</em></span> and at most <span class="calibre1"><em class="calibre5">m</em></span> of the preceding group.</p>
</li>
<li class="listitem">
<p class="calibre4"><code class="literal1">{n,m}?</code> or <code class="literal1">*?</code> or <code class="literal1">+?</code> performs a nongreedy match of the preceding group.</p>
</li>
<li class="listitem">
<p class="calibre4"><code class="literal1">^spam</code> means the string must begin with <span class="calibre1"><em class="calibre5">spam</em></span>.</p>
</li>
<li class="listitem">
<p class="calibre4"><code class="literal1">spam$</code> means the string must end with <span class="calibre1"><em class="calibre5">spam</em></span>.</p>
</li>
<li class="listitem">
<p class="calibre4">The <code class="literal1">.</code> matches any character, except newline characters.</p>
</li>
<li class="listitem">
<p class="calibre4"><code class="literal1">\d</code>, <code class="literal1">\w</code>, and <code class="literal1">\s</code> match a digit, word, or space character, respectively.</p>
</li>
<li class="listitem">
<p class="calibre4"><code class="literal1">\D</code>, <code class="literal1">\W</code>, and <code class="literal1">\S</code> match anything except a digit, word, or space character, respectively.</p>
</li>
<li class="listitem">
<p class="calibre4"><code class="literal1">[abc]</code> matches any character between the brackets (such as <span class="calibre1"><em class="calibre5">a</em></span>, <span class="calibre1"><em class="calibre5">b</em></span>, or <span class="calibre1"><em class="calibre5">c</em></span>).</p>
</li>
<li class="listitem">
<p class="calibre4"><code class="literal1">[^abc]</code> matches any character that isn&rsquo;t between the brackets.</p>
</li>
</ul>
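
A minimal sketch of the quantifier entries in the list. It uses `re.fullmatch()` so that each pattern must account for the whole string; the patterns and sample strings are invented for illustration.

```python
import re

cases = [
    (r'colou?r', 'color'),       # ?     : zero or one 'u'
    (r'colou?r', 'colour'),
    (r'ha*', 'h'),               # *     : zero or more 'a'
    (r'ha+', 'h'),               # +     : one or more 'a' (fails here)
    (r'(ha){3}', 'hahaha'),      # {n}   : exactly 3 'ha' groups
    (r'(ha){2,}', 'hahahaha'),   # {n,}  : 2 or more
    (r'(ha){,2}', ''),           # {,m}  : 0 to 2
    (r'(ha){3,5}', 'haha'),      # {n,m} : 3 to 5 (fails, only 2)
]

results = []
for pattern, text in cases:
    matched = re.fullmatch(pattern, text) is not None
    results.append(matched)
    print(f'{pattern!r:14} {text!r:12} -> {matched}')
```

Note that a quantifier applies to whatever immediately precedes it: a single character in `colou?r`, or a whole parenthesized group in `(ha){3}`.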
</div>
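
The remaining entries (nongreedy matching, `^` and `$`, the dot, and bracketed character classes) can be tried out with a short sketch like the following. The sample strings are invented for illustration.

```python
import re

text = '<b>bold</b> and <i>italic</i>'

# Greedy .* grabs as much as it can; nongreedy .*? stops at the first '>'.
greedy = re.search(r'<.*>', text).group()       # '<b>bold</b> and <i>italic</i>'
nongreedy = re.search(r'<.*?>', text).group()   # '<b>'

# {n,m}? likewise prefers the fewest repetitions.
most = re.search(r'\d{2,4}', '123456').group()     # '1234'
fewest = re.search(r'\d{2,4}?', '123456').group()  # '12'

# ^ and $ tie the pattern to the start and end of the string.
starts = re.search(r'^spam', 'spam and eggs') is not None  # True
ends = re.search(r'spam$', 'spam and eggs') is not None    # False

# . matches any character except a newline.
dots = re.findall(r'.', 'a\nb')                 # ['a', 'b']

# [aeiou] matches one listed character; [^aeiou] matches one character not listed.
vowels = re.findall(r'[aeiou]', 'regex')        # ['e', 'e']
others = re.findall(r'[^aeiou]', 'regex')       # ['r', 'g', 'x']

print(greedy, nongreedy, most, fewest, starts, ends, dots, vowels, others, sep='\n')
```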
</div>
<div class="book" title="Case-Insensitive Matching">
<div class="titlepage">
<div class="book">
<div class="book">
<h1 class="title2"><a id="calibre_link-2604" class="firstname"></a>Case-Insensitive Matching</h1>
</div>
</div>
</div>
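
A minimal sketch of case-insensitive matching: passing `re.IGNORECASE` (or its short alias `re.I`) as the second argument to `re.compile()` lets letters match in either case. The sample sentences are invented for illustration.

```python
import re

robocop = re.compile(r'robocop', re.IGNORECASE)

first = robocop.search('RoboCop is part man, part machine, all cop.').group()
every = robocop.findall('ROBOCOP, robocop, or Robocop?')

# Without the flag, only the exact lowercase spelling matches.
exact = re.findall(r'robocop', 'ROBOCOP, robocop, or Robocop?')

print(first)  # RoboCop
print(every)  # ['ROBOCOP', 'robocop', 'Robocop']
print(exact)  # ['robocop']
```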

<table border="0" width="100%" summary="Q and A Set" class="calibre22">
<col width="1%" class="calibre23"></col>
<col class="calibre11"></col>
<tbody class="calibre16">
<tr class="calibre13" title="Q:">
<td valign="top" class="calibre21"><a id="calibre_link-2631" class="calibre1"></a><a id="calibre_link-2632" class="calibre1"></a>
<p class="calibre4">Q:</p>
</td>
<td valign="top" class="calibre21">
<p class="calibre4">1. What is the function that creates <code class="literal2">Regex</code> objects?</p>
</td>
</tr>
<tr class="calibre19" title="Q:">
<td valign="top" class="calibre21"><a id="calibre_link-2633" class="calibre1"></a><a id="calibre_link-2634" class="calibre1"></a>
<p class="calibre4">Q:</p>
</td>
<td valign="top" class="calibre21">
<p class="calibre4">2. Why are raw strings often used when creating <code class="literal2">Regex</code> objects?</p>
</td>
</tr>
<tr class="calibre13" title="Q:">
<td valign="top" class="calibre21"><a id="calibre_link-2635" class="calibre1"></a><a id="calibre_link-2636" class="calibre1"></a>
<p class="calibre4">Q:</p>
</td>
<td valign="top" class="calibre21">
<p class="calibre4">3. What does the <code class="literal2">search()</code> method return?</p>
</td>
</tr>
<tr class="calibre19" title="Q:">
<td valign="top" class="calibre21"><a id="calibre_link-2637" class="calibre1"></a><a id="calibre_link-2638" class="calibre1"></a>
<p class="calibre4">Q:</p>
</td>
<td valign="top" class="calibre21">
<p class="calibre4">4. How do you get the actual strings that match the pattern from a <code class="literal2">Match</code> object?</p>
</td>
</tr>
<tr class="calibre13" title="Q:">
<td valign="top" class="calibre21"><a id="calibre_link-2639" class="calibre1"></a><a id="calibre_link-2640" class="calibre1"></a>
<p class="calibre4">Q:</p>
</td>
<td valign="top" class="calibre21">
<p class="calibre4">5. In the regex created from <code class="literal2">r'(\d\d\d)-(\d\d\d-\d\d\d\d)'</code>, what does group <code class="literal2">0</code> cover? Group <code class="literal2">1</code>? Group <code class="literal2">2</code>?</p>
</td>
</tr>
<tr class="calibre19" title="Q:">
<td valign="top" class="calibre21"><a id="calibre_link-2641" class="calibre1"></a><a id="calibre_link-2642" class="calibre1"></a>
<p class="calibre4">Q:</p>
</td>
<td valign="top" class="calibre21">
<p class="calibre4">6. Parentheses and periods have specific meanings in regular expression syntax. How would you specify that you want a regex to match actual parentheses and period characters?</p>
</td>
</tr>
<tr class="calibre13" title="Q:">
<td valign="top" class="calibre21"><a id="calibre_link-2643" class="calibre1"></a><a id="calibre_link-2644" class="calibre1"></a>
<p class="calibre4">Q:</p>
</td>
<td valign="top" class="calibre21">
<p class="calibre4">7. The <code class="literal2">findall()</code> method returns a list of strings or a list of tuples of strings. What makes it return one or the other?</p>
</td>
</tr>
<tr class="calibre19" title="Q:">
<td valign="top" class="calibre21"><a id="calibre_link-2645" class="calibre1"></a><a id="calibre_link-2646" class="calibre1"></a>
<p class="calibre4">Q:</p>
</td>
<td valign="top" class="calibre21">
<p class="calibre4">8. What does the <code class="literal2">|</code> character signify in regular expressions?</p>
</td>
</tr>
<tr class="calibre13" title="Q:">
<td valign="top" class="calibre21"><a id="calibre_link-2647" class="calibre1"></a><a id="calibre_link-2648" class="calibre1"></a>
<p class="calibre4">Q:</p>
</td>
<td valign="top" class="calibre21">
<p class="calibre4">9. What two things does the <code class="literal2">?</code> character signify in regular expressions?</p>
</td>
</tr>
<tr class="calibre19" title="Q:">
<td valign="top" class="calibre21"><a id="calibre_link-2649" class="calibre1"></a><a id="calibre_link-2650" class="calibre1"></a>
<p class="calibre4">Q:</p>
</td>
<td valign="top" class="calibre21">
<p class="calibre4">10. What is the difference between the <code class="literal2">+</code> and <code class="literal2">*</code> characters in regular expressions?</p>
</td>
</tr>
<tr class="calibre13" title="Q:">
<td valign="top" class="calibre21"><a id="calibre_link-2651" class="calibre1"></a><a id="calibre_link-2652" class="calibre1"></a>
<p class="calibre4">Q:</p>
</td>
<td valign="top" class="calibre21">
<p class="calibre4">11. What is the difference between <code class="literal2">{3}</code> and <code class="literal2">{3,5}</code> in regular expressions?</p>
</td>
</tr>
<tr class="calibre19" title="Q:">
<td valign="top" class="calibre21"><a id="calibre_link-2653" class="calibre1"></a><a id="calibre_link-2654" class="calibre1"></a>
<p class="calibre4">Q:</p>
</td>
<td valign="top" class="calibre21">
<p class="calibre4">12. What do the <code class="literal2">\d</code>, <code class="literal2">\w</code>, and <code class="literal2">\s</code> shorthand character classes signify in regular expressions?</p>
</td>
</tr>
<tr class="calibre13" title="Q:">
<td valign="top" class="calibre21"><a id="calibre_link-2655" class="calibre1"></a><a id="calibre_link-2656" class="calibre1"></a>
<p class="calibre4">Q:</p>
</td>
<td valign="top" class="calibre21">
<p class="calibre4">13. What do the <code class="literal2">\D</code>, <code class="literal2">\W</code>, and <code class="literal2">\S</code> shorthand character classes signify in regular expressions?</p>
</td>
</tr>
<tr class="calibre19" title="Q:">
<td valign="top" class="calibre21"><a id="calibre_link-2657" class="calibre1"></a><a id="calibre_link-2658" class="calibre1"></a>
<p class="calibre4">Q:</p>
</td>
<td valign="top" class="calibre21">
<p class="calibre4">14. How do you make a regular expression case-insensitive?</p>
</td>
</tr>
<tr class="calibre13" title="Q:">
<td valign="top" class="calibre21"><a id="calibre_link-2659" class="calibre1"></a><a id="calibre_link-2660" class="calibre1"></a>
<p class="calibre4">Q:</p>
</td>
<td valign="top" class="calibre21">
<p class="calibre4">15. What does the <code class="literal2">.</code> character normally match? What does it match if <code class="literal2">re.DOTALL</code> is passed as the second argument to <code class="literal2">re.compile()</code>?</p>
</td>
</tr>
<tr class="calibre19" title="Q:">
<td valign="top" class="calibre21"><a id="calibre_link-2661" class="calibre1"></a><a id="calibre_link-2662" class="calibre1"></a>
<p class="calibre4">Q:</p>
</td>
<td valign="top" class="calibre21">
<p class="calibre4">16. What is the difference between these two: <code class="literal2">.*</code> and <code class="literal2">.*?</code></p>
</td>
</tr>
<tr class="calibre13" title="Q:">
<td valign="top" class="calibre21"><a id="calibre_link-2663" class="calibre1"></a><a id="calibre_link-2664" class="calibre1"></a>
<p class="calibre4">Q:</p>
</td>
<td valign="top" class="calibre21">
<p class="calibre4">17. What is the character class syntax to match all numbers and lowercase letters?</p>
</td>
</tr>
<tr class="calibre19" title="Q:">
<td valign="top" class="calibre21"><a id="calibre_link-2665" class="calibre1"></a><a id="calibre_link-2666" class="calibre1"></a>
<p class="calibre4">Q:</p>
</td>
<td valign="top" class="calibre21">
<p class="calibre4">18. If <code class="literal2">numRegex = re.compile(r'\d+')</code>, what will <code class="literal2">numRegex.sub('X', '12 drummers, 11 pipers, five rings, 3 hens')</code> return?</p>
</td>
</tr>
<tr class="calibre13" title="Q:">
<td valign="top" class="calibre21"><a id="calibre_link-2667" class="calibre1"></a><a id="calibre_link-2668" class="calibre1"></a>
<p class="calibre4">Q:</p>
</td>
<td valign="top" class="calibre21">
<p class="calibre4">19. What does passing <code class="literal2">re.VERBOSE</code> as the second argument to <code class="literal2">re.compile()</code> allow you to do?</p>
</td>
</tr>
<tr class="calibre19" title="Q:">
<td valign="top" class="calibre21"><a id="calibre_link-2669" class="calibre1"></a><a id="calibre_link-2670" class="calibre1"></a>
<p class="calibre4">Q:</p>
</td>
<td valign="top" class="calibre21">
<p class="calibre4">20. How would you write a regex that matches a number with commas for every three digits? It must match the following:</p>
<div class="book">
<ul class="itemizedlist">
<li class="listitem">
<p class="calibre4"><code class="literal2">'42'</code></p>
</li>
<li class="listitem">
<p class="calibre4"><code class="literal2">'1,234'</code></p>
</li>
<li class="listitem">
<p class="calibre4"><code class="literal2">'6,368,745'</code></p>
</li>
</ul>
</div>
<p class="calibre4">but not the following:</p>
<div class="book">
<ul class="itemizedlist">
<li class="listitem">
<p class="calibre4"><code class="literal2">'12,34,567'</code> (which has only two digits between the commas)</p>
</li>
<li class="listitem">
<p class="calibre4"><code class="literal2">'1234'</code> (which lacks commas)</p>
</li>
</ul>
</div>
</td>
</tr>
<tr class="calibre13" title="Q:">
<td valign="top" class="calibre21"><a id="calibre_link-2671" class="calibre1"></a><a id="calibre_link-2672" class="calibre1"></a>
<p class="calibre4">Q:</p>
</td>
<td valign="top" class="calibre21">
<p class="calibre4">21. How would you write a regex that matches the full name of someone whose last name is Nakamoto? You can assume that the first name that comes before it will always be one word that begins with a capital letter. The regex must match the following:</p>
<div class="book">
<ul class="itemizedlist">
<li class="listitem">
<p class="calibre4"><code class="literal2">'Satoshi Nakamoto'</code></p>
</li>
<li class="listitem">
<p class="calibre4"><code class="literal2">'Alice Nakamoto'</code></p>
</li>
<li class="listitem">
<p class="calibre4"><code class="literal2">'Robocop Nakamoto'</code></p>
</li>
</ul>
</div>
<p class="calibre4">but not the following:</p>
<div class="book">
<ul class="itemizedlist">
<li class="listitem">
<p class="calibre4"><code class="literal2">'satoshi Nakamoto'</code> (where the first name is not capitalized)</p>
</li>
<li class="listitem">
<p class="calibre4"><code class="literal2">'Mr. Nakamoto'</code> (where the preceding word has a nonletter character)</p>
</li>
<li class="listitem">
<p class="calibre4"><code class="literal2">'Nakamoto'</code> (which has no first name)</p>
</li>
<li class="listitem">
<p class="calibre4"><code class="literal2">'Satoshi nakamoto'</code> (where Nakamoto is not capitalized)</p>
</li>
</ul>
</div>
</td>
</tr>
<tr class="calibre19" title="Q:">
<td valign="top" class="calibre21"><a id="calibre_link-2673" class="calibre1"></a><a id="calibre_link-2674" class="calibre1"></a>
<p class="calibre4">Q:</p>
</td>
<td valign="top" class="calibre21">
<p class="calibre4">22. How would you write a regex that matches a sentence where the first word is either <span class="calibre1"><em class="calibre5">Alice</em></span>, <span class="calibre1"><em class="calibre5">Bob</em></span>, or <span class="calibre1"><em class="calibre5">Carol</em></span>; the second word is either <span class="calibre1"><em class="calibre5">eats</em></span>, <span class="calibre1"><em class="calibre5">pets</em></span>, or <span class="calibre1"><em class="calibre5">throws</em></span>; the third word is <span class="calibre1"><em class="calibre5">apples</em></span>, <span class="calibre1"><em class="calibre5">cats</em></span>, or <span class="calibre1"><em class="calibre5">baseballs</em></span>; and the sentence ends with a period? This regex should be case-insensitive. It must match the following:</p>
<div class="book">
<ul class="itemizedlist">
<li class="listitem">
<p class="calibre4"><code class="literal2">'Alice eats apples.'</code></p>
</li>
<li class="listitem">
<p class="calibre4"><code class="literal2">'Bob pets cats.'</code></p>
</li>
<li class="listitem">
<p class="calibre4"><code class="literal2">'Carol throws baseballs.'</code></p>
</li>
<li class="listitem">
<p class="calibre4"><code class="literal2">'Alice throws Apples.'</code></p>
</li>
<li class="listitem">
<p class="calibre4"><code class="literal2">'BOB EATS CATS.'</code></p>
</li>
</ul>
</div>
<p class="calibre4">but not the following:</p>
<div class="book">
<ul class="itemizedlist">
<li class="listitem">
<p class="calibre4"><code class="literal2">'Robocop eats apples.'</code></p>
</li>
<li class="listitem">
<p class="calibre4"><code class="literal2">'ALICE THROWS FOOTBALLS.'</code></p>
</li>
<li class="listitem">
<p class="calibre4"><code class="literal2">'Carol eats 7 cats.'</code></p>
</li>
</ul>
</div>
</td>
</tr>
</tbody>
</table>
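
While working through questions 20 to 22, it can help to test a candidate pattern against both example lists at once. The sketch below uses a hypothetical helper, `check_regex()` (not part of the `re` module), which compiles the pattern with any flags you pass and reports every example string it handles incorrectly. The demonstration patterns are deliberately unrelated to the questions.

```python
import re


def check_regex(pattern, should_match, should_not_match, flags=0):
    """Hypothetical helper: return a list describing every example the pattern gets wrong.

    Uses search(), which looks for a match anywhere in the string, so add
    ^ and $ to the pattern if the whole string must match.
    """
    regex = re.compile(pattern, flags)
    failures = []
    for text in should_match:
        if regex.search(text) is None:
            failures.append(f'should match but did not: {text!r}')
    for text in should_not_match:
        if regex.search(text) is not None:
            failures.append(f'should not match but did: {text!r}')
    return failures


# Demonstration with an unrelated pattern: strings made up entirely of digits.
print(check_regex(r'^\d+$', ['42', '1234'], ['12a', '']))  # []
print(check_regex(r'\d+', ['42'], ['12a']))                # ["should not match but did: '12a'"]
```

An empty list means the pattern behaves as required on every example. For question 22, pass `re.IGNORECASE` as the `flags` argument.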