## 👸🤴 Digression, Part I: Extracting Quotations with Regex

We probably won't have time to cover this in lecture, but it's potentially fun for those of you who are interested.

### Note: None of this Digression (Part I or Part II) will be on the midterm or the exam. It is not something we require you to know; it's just for fun.

For this task, we will need some cool new Regex characters:
* `.`: "any character except a newline"

And this "quantifier," which you add to a regular expression to signal how many times something may repeat:
* `*`: "zero or more occurrences of the thing immediately to my left"

So that the expression
* `.*` means "zero or more characters other than a newline"
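Here's a quick way to see `.` and `*` in action. The toy strings and variable names below are made up just for this illustration; they aren't from the novel.

```python
import re

# "." stands for exactly one character, but never a newline,
# so "b.t" finds "bat" and "bit" but skips "b\nt" and "bt".
dot_hits = re.findall("b.t", "bat bit b\nt bt")
print(dot_hits)    # ['bat', 'bit']

# "*" applies to the thing immediately to its left (here, the letter "o"),
# and "zero or more" means that "bt" counts too.
star_hits = re.findall("bo*t", "bt bot boot")
print(star_hits)   # ['bt', 'bot', 'boot']

# ".*" runs until it hits a newline, then stops.
first_line = re.search(".*", "first line\nsecond line").group()
print(first_line)  # first line
```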

To catch one-line quotations, one could try...
* `".*"`: a `"` character, followed by zero or more occurrences of any character except a newline, followed by a `"` character

This actually won't work on *The Sign of the Four*, because we got it from Project Gutenberg, and PG files use “curly quotes.” **Yes: `"` and `“` and `”` are all actually different characters**!

So for Project Gutenberg files we need:
* `“.*”`: a `“` character, followed by zero or more occurrences of any character except a newline, followed by a `”` character
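If you'd like to convince yourself that these really are three different characters, here is a small sketch. The sample sentence is invented for the demonstration, not taken from either novel.

```python
import re

straight = '"'
opening = "“"
closing = "”"

# Each character has its own Unicode code point.
codes = [hex(ord(ch)) for ch in (straight, opening, closing)]
print(codes)          # ['0x22', '0x201c', '0x201d']

# A made-up sentence that uses curly quotes, the way PG files do.
sample = "“Good morning,” said the detective."

# The straight-quote pattern finds nothing in it...
straight_hits = re.findall('".*"', sample)
print(straight_hits)  # []

# ...but the curly-quote pattern does.
curly_hits = re.findall("“.*”", sample)
print(curly_hits)     # ['“Good morning,”']
```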

We need one added complexity: `*` is "greedy" (it grabs as much as it can), so when a line contains more than one quotation, the match runs from the first open quote all the way to the last close quote and swallows the narration in between. We can fix this by specifying that the match should stop at the first close-quote character it sees. The regular expression we need is:

* `“[^”]*”`, where the `[^”]` means "any character except a close quote"
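To see the difference between the two patterns, try them on a line with two quotations in it. This is only a sketch: the sample strings and variable names are invented for illustration.

```python
import re

# A made-up line with narration sandwiched between two quotations.
sample = "“Hello,” she said. “Goodbye.”"

# Greedy version: one big match from the first “ to the last ”.
greedy = re.findall("“.*”", sample)
print(greedy)    # ['“Hello,” she said. “Goodbye.”']

# Negated-class version: stops at the first ” it meets.
careful = re.findall("“[^”]*”", sample)
print(careful)   # ['“Hello,”', '“Goodbye.”']

# Side effect: unlike ".", the class [^”] also matches a newline,
# so quotations that wrap onto a second line get caught as well.
wrapped = "“one\ntwo”"
greedy_wrapped = re.findall("“.*”", wrapped)
careful_wrapped = re.findall("“[^”]*”", wrapped)
print(greedy_wrapped)    # []
print(careful_wrapped)   # ['“one\ntwo”']
```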

## 👸🤴 Digression, Part II: Creating a Literary Mashup

### Note: Like Part I of this Digression, this will not be on the midterm or the exam.

Now, let's say you want to do something really fun with these quotations you just extracted... like, say, stick them into *Pride and Prejudice* so that all of Austen's dialogue is replaced with Conan Doyle's!

Here's how I'm going to do it:
* Load up P&P
* Replace all the dialogue in P&P with the phrase "QUOTE_HERE" so that I know where to stick my replacement quotations.
* Then iterate through my list of SOT4 quotations, popping them into P&P one by one
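Before trying this on two whole novels, it can help to watch the plan work on something tiny. In the sketch below, `host_text` and `donor_quotes` are made-up stand-ins for P&P and the SOT4 quotations, and I use the plain string method `str.replace` for the final step just to keep the toy version short.

```python
import re

# Hypothetical stand-ins for the real texts.
host_text = "“Hi,” said Ann. “Oh,” said Bob."
donor_quotes = ["“Elementary.”", "“Indeed!”"]

# Step 1: swap every quotation in the host text for a placeholder.
with_placeholders = re.sub("“[^”]*”", "QUOTE_HERE", host_text)
print(with_placeholders)   # QUOTE_HERE said Ann. QUOTE_HERE said Bob.

# Step 2: fill the placeholders one at a time, in order.
mashup = with_placeholders
for quote in donor_quotes:
    # The final 1 means "replace only the first placeholder you find".
    mashup = mashup.replace("QUOTE_HERE", quote, 1)

print(mashup)              # “Elementary.” said Ann. “Indeed!” said Bob.
```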

In [None]:
# This loads P&P, a copy of which I have conveniently placed in the same folder as this notebook.

with open("pride_prejudice.txt", encoding="utf-8") as f:
    pandp = f.read()

print(pandp[470:773])

In [None]:
# This replaces all dialogue in P&P with the phrase "QUOTE_HERE", 
# creating targets I can then replace one-by-one with the SOT4 quotations

import re  # harmless if re was already imported earlier in the notebook

pride_of_the_four = re.sub("“[^”]*”", "QUOTE_HERE", pandp)

In [None]:
print(pride_of_the_four[470:650])

In [None]:
## Now I will iterate through the list of SOT4 quotations, using each one to replace one "QUOTE_HERE" in pride_of_the_four

for quotation in sot4_quotations:
    # count=1 specifies to only make one replacement for each item in sot4_quotations
    pride_of_the_four = re.sub("QUOTE_HERE", quotation, pride_of_the_four, count=1)
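
The loop above rescans the whole text once for every quotation, and `re.sub` gives backslashes in a replacement string special meaning. One possible alternative (a sketch, not what this notebook actually does) is to hand `re.sub` a function and fill every placeholder in a single pass. The names `fill_placeholders` and `next_quote` are hypothetical helpers invented for this example.

```python
import re

def fill_placeholders(text, quotations, placeholder="QUOTE_HERE"):
    """Replace each placeholder with the next quotation, in one pass."""
    supply = iter(quotations)

    def next_quote(match):
        # When the quotations run out, leave the placeholder untouched.
        return next(supply, match.group())

    # A function's return value is inserted literally, so backslashes
    # in a quotation are not treated as escape sequences.
    return re.sub(re.escape(placeholder), next_quote, text)

demo = fill_placeholders(
    "QUOTE_HERE said Ann. QUOTE_HERE said Bob. QUOTE_HERE",
    ["“Elementary.”", "“Indeed!”"],
)
print(demo)   # “Elementary.” said Ann. “Indeed!” said Bob. QUOTE_HERE
```

Any placeholder left over at the end tells you that the host text had more dialogue than you had quotations to give it.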

In [None]:
# This is the amusing result...

print(pride_of_the_four[470:778])

In [None]:
# Below we will learn how to write files. But here's a little preview! This line saves our amazing new mashup novel as "pride-of-the-four.txt"

with open("pride-of-the-four.txt", mode="w", encoding="utf-8") as f:
    f.write(pride_of_the_four)