Please read the following instructions thoroughly. Neglecting to do so may result in missed points.

### Preamble
**Reminder**: Homeworks are due by 7:00PM ET on Sundays.

Before you turn this problem set in, make sure everything runs as expected. First, **restart the kernel** (in the menubar, select Kernel$\rightarrow$Restart) and then **run all cells** (in the menubar, select Cell$\rightarrow$Run All).

### Naming conventions
Be sure the filename of your notebook is in the following form:

    <uni>_<assignment>_<details [optional]>.<extension>
    
For example:

    lr3086_hw01.ipynb
    lr3086_hw01_complete.ipynb
    LR3086_HW01.ipynb
    
To rename a notebook, in the menubar, select File$\rightarrow$Rename. The extension for notebook files, `.ipynb`, will already be appended to the filename, but will be hidden from view within the notebook.
    
This naming format allows for autograding of all assignments. If your files are not named with this format, you should expect a grade of zero for the assignment.

Courseworks may rename your file to something like `lr3086_hw0-1.ipynb` if you resubmit your assignment. This is perfectly fine.
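As a rough self-check before submitting, the naming format above can be approximated with a regular expression. This is only a sketch: the autograder's actual rule is not stated here, and both the pattern and the helper name `looks_well_named` are assumptions based on the three example filenames (letters then digits for the UNI, letters then digits for the assignment).

```python
import re

# Assumed shape (not the autograder's real rule): letters+digits for the UNI,
# letters+digits for the assignment, an optional "_details" part, then .ipynb.
FILENAME_SHAPE = r"^[A-Za-z]+\d+_[A-Za-z]+\d+(_\w+)?\.ipynb$"


def looks_well_named(filename):
    """Hypothetical helper: True if the filename fits the assumed shape."""
    return re.search(FILENAME_SHAPE, filename) is not None


for name in ["lr3086_hw01.ipynb", "lr3086_hw01_complete.ipynb", "homework.ipynb"]:
    print(name, looks_well_named(name))
```

This sketch deliberately does not cover the names Courseworks produces on resubmission; it is meant only for checking a file before the first upload.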

### What Format To Submit In

Most homeworks are in Jupyter notebooks. Once you've finished your homework, unless specified otherwise, please download your work as an `.ipynb` file to your local machine, then upload it to Courseworks when complete (in the menubar, select File$\rightarrow$Download as$\rightarrow$Notebook).

**Failure to submit a Jupyter notebook will result in a grade of zero for the assignment.**

### Grading

The points available on a late assignment are reduced by 50% for each day it is late. For example, if you earn 80% of the total possible credit on a homework but hand it in one day late, you would receive 40%. Assignments two days late receive zero points.

Once solutions are posted and graded assignments are handed back, students have 1 week to bring their grading discrepancies to a CA for consideration of possible grading errors.

Because grading is automated, please delete (or comment out) the `raise NotImplementedError` code before attempting a problem.

Empty un-editable cells in an assignment are there for a reason. They will be filled with tests by the automatic grader. Please do not attempt to remove them.

### Getting Help

Asking for help is a great way to increase your chance of success. However, there are some rules. When asking for help (especially from a fellow student), *you can show your helper your code but you cannot view theirs*. Your work needs to be your own. You cannot post screenshots of your current work to Ed Discussions or other tools used for getting help.

If you need to reach out to a CA for help, please do so via Ed Discussions and not via email. Answers given via Ed Discussions will help you as well as other students. Thus, emails will always have a lower priority for response than Ed Discussions questions. If you do email the CA, please make a note of what section you are in. This helps us identify you in Courseworks faster.

Finally, if you do not get a response from a CA within 48 hours, you may email the professor.

---

# Homework 6: Regular Expressions

Total questions: 6 <br/>
Total points: 7

## How to complete this homework

**For the first 5 questions**: In each answer, you'll be asked to define a single variable called `pattern` which should match the given prompt. **Note**: we're _not_ defining functions, just the variable `pattern`.

As an example, if asked for a regular expression that matches the letter a contained anywhere within a string, you should write:

```py
pattern = "a"
```

as the sole contents of the cell. Recall that if you wish to use character classes like `\w`, you will likely want to use a raw string like `r"foo\w\wbar"` so that you don't need to escape the backslashes.
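A minimal sketch of why raw strings matter, using `\b` (word boundary) because it shows the difference most clearly. The sample text is made up for illustration and is not part of the assignment.

```python
import re

# In a regular string literal, "\b" is the backspace character, so the regex
# engine never sees the word-boundary escape.
plain = "\bcat\b"
# In a raw string the backslashes are kept, so the regex engine sees \b.
raw = r"\bcat\b"

sample = "the cat sat"  # hypothetical sample text

plain_match = re.search(plain, sample)  # None: it looks for backspace characters
raw_match = re.search(raw, sample)      # matches the word "cat"

print(len(plain), len(raw))  # 5 7
print(plain_match)           # None
print(raw_match.group(0))    # cat
```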

In [1]:
# Execute this cell before completing the questions

import re

## Question 1

Write a regular expression pattern which matches the substring "cat" anywhere within a string. Your pattern should also successfully match the word "caterpillar" as well as "scathing". Note that casing is important.

[1 point]

In [2]:
# YOUR CODE HERE
pattern = r".*cat.*"

In [3]:
### BEGIN TESTS
# Ensure that `pattern` exists
try:
    pattern
except NameError:
    raise AssertionError("The variable 'pattern' is not defined") from None
else:
    assert True
### END TESTS

In [4]:
### BEGIN TESTS
# Ensure `pattern` fully matches
assert re.search(pattern, "cat")
### END TESTS

In [5]:
### BEGIN TESTS
# Ensure `pattern` partially matches
assert re.search(pattern, "caterpillar")
### END TESTS

In [6]:
### BEGIN TESTS
# Ensure `pattern` matches strings that have `cat` but do not start with `cat`
assert re.search(pattern, "scathing")
assert re.search(pattern, "The cat jumped on the bed.")
### END TESTS

In [7]:
### BEGIN TESTS
# Ensure `pattern` does not match strings as expected
assert not re.search(pattern, "dog")
assert not re.search(pattern, "aca")
assert not re.search(pattern, "ca")
assert not re.search(pattern, "at")
assert not re.search(pattern, "Cat")
### END TESTS

## Question 2

Write a regular expression which matches the substring "cat" at the beginning of a string, and not elsewhere. Your pattern should also successfully match the word "caterpillar" at the beginning of a string.
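To illustrate how anchoring changes what `re.search` finds, here is a small sketch with an unrelated word. The patterns and sample strings are hypothetical and chosen only for the demonstration.

```python
import re

anchored = r"^dog"   # "dog" only at the very start of the string
unanchored = r"dog"  # "dog" anywhere in the string

results = {}
for text in ["dogged", "hotdog"]:
    # Record whether each pattern finds a match in this text.
    results[text] = (
        bool(re.search(anchored, text)),
        bool(re.search(unanchored, text)),
    )

print(results["dogged"])  # (True, True)
print(results["hotdog"])  # (False, True)
```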

[1 point]

In [8]:
pattern = r"^cat"

In [9]:
### BEGIN TESTS
# Ensure `pattern` matches beginning of string
assert re.search(pattern, "cat")
assert re.search(pattern, "caterpillar")
### END TESTS

In [10]:
### BEGIN TESTS
# Ensure `pattern` does not match where `cat` does not start a string
assert not re.search(pattern, "scathing")
assert not re.search(pattern, "The cat jumped on the bed.")
### END TESTS

## Question 3

In class, we mostly used the `re.search` function. The `re` module has an additional function called `re.findall` which, rather than finding only one place in a string that matches a given pattern, finds all of the places which match.
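The difference between the two functions can be sketched as follows, using a made-up sentence and pattern rather than the one from this question.

```python
import re

text = "Rooms 101, 202 and 303 are booked."  # hypothetical sample text
three_digits = r"\d{3}"

first = re.search(three_digits, text)   # a match object for the first hit only
every = re.findall(three_digits, text)  # a list of all matching substrings

print(first.group(0))  # 101
print(every)           # ['101', '202', '303']
```

Note that when a pattern contains a capture group, `re.findall` returns the captured groups rather than the whole matches.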

Given a string like:

```py
example = """
2010-04-29 1984 ABC-123!
Feb 18th, 2018. 
888-27-8949 2048-07-07
"""
```

write a pattern which matches dates within the string which look like YYYY-MM-DD.

Your answer should again be a single variable `pattern`, but such that `re.findall(pattern, example)` returns `["2010-04-29", "2048-07-07"]` which are the two matching parts of the string.

[1 point]

In [11]:
pattern = r"\d{4}-\d{2}-\d{2}"

In [12]:
### BEGIN TESTS
# Ensure `pattern` finds 2 dates in string
string = """
2010-04-29 1984 ABC-123!
Feb 18th, 2018. 
888-27-8949 2048-07-07
"""
expected = ['2010-04-29', '2048-07-07']
actual = re.findall(pattern, string)
assert expected == actual, f"Expected '{expected}', got '{actual}'"
### END TESTS

## Question 4

A building manager has textual logs of problems occurring in the apartment units they manage.

An individual message looks like:

    Problem reported in apartment 3F with plumbing.

or

    Problem in apartment 12a with sink.

or

    Noise reported from apartment 8g.

or

    An apartment owner reports that apartment 4A has been broken into.

Use a pattern with a capture group to extract the apartment information from the string -- include both the floor and unit, so for the first example, `re.search(pattern, message).group(1)` (where `message` is the log message) should produce `"3F"`, and for the last one, `"4A"`.

If the message does not include an apartment unit, you should not match anything.
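As a reminder of how capture groups behave, here is a short sketch on an unrelated, hypothetical example (extracting an order number). The pattern and strings are assumptions for illustration only.

```python
import re

# The parentheses mark the part of the match we want to get back.
order_pattern = r"order #(\d+)"

hit = re.search(order_pattern, "Your order #4521 has shipped.")
miss = re.search(order_pattern, "Thanks for visiting.")

print(hit.group(0))  # order #4521  (the whole match)
print(hit.group(1))  # 4521         (only the captured part)
print(miss)          # None         (no match, so there are no groups to read)
```

Because `re.search` returns `None` when nothing matches, calling `.group(1)` on the result is only safe after checking that a match was found.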

[1 point]

In [13]:
pattern = r"apartment (\d+[A-Za-z]?)"

In [14]:
### BEGIN TESTS
# Ensure `pattern` returns a result
string = "Problem reported in apartment 3F with plumbing."
result = re.search(pattern, string)
assert result is not None, "Expected a result, got None"

# Ensure `pattern` returns the desired result
expected = "3F"
actual = result.group(1)
assert expected == actual, f"Expected '{expected}', got '{actual}'"
### END TESTS

In [15]:
### BEGIN TESTS
# Ensure `pattern` returns a result
string = "Problem in apartment 12a with sink."
result = re.search(pattern, string)
assert result is not None, "Expected a result, got None"

# Ensure `pattern` returns the desired result
expected = "12a"
actual = result.group(1)
assert expected == actual, f"Expected '{expected}', got '{actual}'"
### END TESTS

In [16]:
### BEGIN TESTS
# Ensure `pattern` returns a result
string = "An apartment owner reports that apartment 4A has been broken into."
result = re.search(pattern, string)
assert result is not None, "Expected a result, got None"

# Ensure `pattern` returns the desired result
expected = "4A"
actual = result.group(1)
assert expected == actual, f"Expected '{expected}', got '{actual}'"
### END TESTS

In [17]:
### BEGIN TESTS
# Ensure the negative case
assert re.search(pattern, "No incidents.") is None
### END TESTS

## Question 5

Write a pattern which matches strings containing an odd number anywhere within them, but not strings containing only even numbers.

[1 point]

In [18]:
pattern = r'.*[13579].*'

In [19]:
### BEGIN TESTS
# Ensure single digit odd number matches
assert re.search(pattern, "The number 1 matches")
### END TESTS

In [20]:
### BEGIN TESTS
# Ensure multi-digit with an odd number matches
assert re.search(pattern, "25 does match")
### END TESTS

In [21]:
### BEGIN TESTS
# Ensure single digit even number does not match
assert not re.search(pattern, "2 does not match")
### END TESTS

In [22]:
### BEGIN TESTS
# Ensure multi-digit with no odd numbers does not match
assert not re.search(pattern, "44 does not match")
### END TESTS

## Question 6

Pretend you are an accountant who needs to process earnings reports for three months at a time for a given year. These reports are CSV files on a server that you will have to download. But first, you need to find the particular CSV files that you care about.

You are given a `list` of URLs, some of which point to CSV files and some of which do not. The filenames of the CSV files contain the date. An example input list of URLs:

```python
[
    "https://example.com/dashboard/reports_2021_10.html",
    "https://example.com/reports/cash_flow_2021_10.csv",
    "https://example.com/reports/earnings_2021_10.csv",
    "https://example.com/dashboard/reports_2021_11.html",
    "https://example.com/reports/cash_flow_2021_11.csv",
    "https://example.com/reports/earnings_2021_11.csv",
    "https://example.com/dashboard/reports_2021_12.html",
    "https://example.com/reports/cash_flow_2021_12.csv",
    "https://example.com/reports/earnings_2021_12.csv",
    "https://example.com/dashboard/reports_2022_01.html",
    "https://example.com/reports/cash_flow_2022_01.csv",
    "https://example.com/reports/earnings_2022_01.csv",
    "https://example.com/dashboard/reports_2022_02.html",
    "https://example.com/reports/cash_flow_2022_02.csv",
    "https://example.com/reports/earnings_2022_02.csv",
    "https://example.com/dashboard/reports_2022_03.html",
    "https://example.com/reports/cash_flow_2022_03.csv",
    "https://example.com/reports/earnings_2022_03.csv",
    "https://example.com/dashboard/reports_2022_04.html",
    "https://example.com/reports/cash_flow_2022_04.csv",
    "https://example.com/reports/earnings_2022_04.csv",
    "https://example.com/dashboard/reports_2022_05.html",
    "https://example.com/reports/cash_flow_2022_05.csv",
    "https://example.com/reports/earnings_2022_05.csv",
    "https://example.com/dashboard/reports_2022_06.html",
    "https://example.com/reports/cash_flow_2022_06.csv",
    "https://example.com/reports/earnings_2022_06.csv",
    "https://example.com/dashboard/reports_2022_07.html",
    "https://example.com/reports/cash_flow_2022_07.csv",
    "https://example.com/reports/earnings_2022_07.csv",
    "https://example.com/dashboard/reports_2022_08.html",
    "https://example.com/reports/cash_flow_2022_08.csv",
    "https://example.com/reports/earnings_2022_08.csv",
    "https://example.com/dashboard/reports_2022_09.html",
    "https://example.com/reports/cash_flow_2022_09.csv",
    "https://example.com/reports/earnings_2022_09.csv",
    "https://example.com/dashboard/reports_2022_10.html",
    "https://example.com/reports/cash_flow_2022_10.csv",
    "https://example.com/reports/earnings_2022_10.csv",
    "https://example.com/dashboard/reports_2022_11.html",
    "https://example.com/reports/cash_flow_2022_11.csv",
    "https://example.com/reports/earnings_2022_11.csv",
    "https://example.com/dashboard/reports_2022_12.html",
    "https://example.com/reports/cash_flow_2022_12.csv",
    "https://example.com/reports/earnings_2022_12.csv",
]    
```

Implement the function called `get_q4_earnings` that takes in three arguments: a `list` of URLs as `str`s, an `int` representing the year of **earnings** we care about (not cash flow, or dashboard reports), and an `int` of the quarter we care about. The function should return a `list` of `str` of the URLs that are the earnings for the year and quarter we care about.

For the above list of URLs, here's the expected output:

```python
>>> get_q4_earnings(url_list=url_list, year=2022, quarter=4)
[
    "https://example.com/reports/earnings_2022_10.csv",
    "https://example.com/reports/earnings_2022_11.csv",
    "https://example.com/reports/earnings_2022_12.csv",
]
```

Your implementation **must** use `re` to find the relevant URLs. Order of the returned value does not matter. 

You may assume all inputs given will be valid. That is, `url_list` will be a list of URL `str`s; values of `quarter` will only be an integer from 1-4 (inclusive); and the values of `year` will be a valid 4-digit year. Therefore, you do not need to do any validation with the inputs given.

You may also assume that each URL in `url_list` will either start with `https://example.com/reports/` or `https://example.com/dashboard/`, and that only the filename changes.
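One building block for this question is turning a quarter number into the two-digit month strings that appear in the filenames. The sketch below shows one possible way to do that; the helper name `months_in_quarter` is hypothetical and not required by the assignment.

```python
def months_in_quarter(quarter):
    """Return the zero-padded month strings for a quarter (1-4)."""
    first_month = (quarter - 1) * 3 + 1
    return [f"{month:02d}" for month in range(first_month, first_month + 3)]


print(months_in_quarter(1))  # ['01', '02', '03']
print(months_in_quarter(4))  # ['10', '11', '12']

# Joined with "|", the months form a regex alternation such as "10|11|12".
month_alternation = "|".join(months_in_quarter(4))
print(month_alternation)  # 10|11|12
```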

[2 points]

In [23]:
def get_q4_earnings(url_list, year, quarter):
    # Zero-padded months in the quarter joined as alternatives, e.g. quarter 4 -> "10|11|12"
    months = "|".join(f"{(quarter - 1) * 3 + i:02d}" for i in range(1, 4))
    pattern = rf"earnings_{year}_({months})\.csv"
    return [url for url in url_list if re.search(pattern, url)]
 

In [24]:
### BEGIN TESTS
url_list = [
    "https://example.com/dashboard/reports_2021_10.html",
    "https://example.com/reports/cash_flow_2021_10.csv",
    "https://example.com/reports/earnings_2021_10.csv",
    "https://example.com/dashboard/reports_2021_11.html",
    "https://example.com/reports/cash_flow_2021_11.csv",
    "https://example.com/reports/earnings_2021_11.csv",
    "https://example.com/dashboard/reports_2021_12.html",
    "https://example.com/reports/cash_flow_2021_12.csv",
    "https://example.com/reports/earnings_2021_12.csv",
    "https://example.com/dashboard/reports_2022_01.html",
    "https://example.com/reports/cash_flow_2022_01.csv",
    "https://example.com/reports/earnings_2022_01.csv",
    "https://example.com/dashboard/reports_2022_02.html",
    "https://example.com/reports/cash_flow_2022_02.csv",
    "https://example.com/reports/earnings_2022_02.csv",
    "https://example.com/dashboard/reports_2022_03.html",
    "https://example.com/reports/cash_flow_2022_03.csv",
    "https://example.com/reports/earnings_2022_03.csv",
    "https://example.com/dashboard/reports_2022_04.html",
    "https://example.com/reports/cash_flow_2022_04.csv",
    "https://example.com/reports/earnings_2022_04.csv",
    "https://example.com/dashboard/reports_2022_05.html",
    "https://example.com/reports/cash_flow_2022_05.csv",
    "https://example.com/reports/earnings_2022_05.csv",
    "https://example.com/dashboard/reports_2022_06.html",
    "https://example.com/reports/cash_flow_2022_06.csv",
    "https://example.com/reports/earnings_2022_06.csv",
    "https://example.com/dashboard/reports_2022_07.html",
    "https://example.com/reports/cash_flow_2022_07.csv",
    "https://example.com/reports/earnings_2022_07.csv",
    "https://example.com/dashboard/reports_2022_08.html",
    "https://example.com/reports/cash_flow_2022_08.csv",
    "https://example.com/reports/earnings_2022_08.csv",
    "https://example.com/dashboard/reports_2022_09.html",
    "https://example.com/reports/cash_flow_2022_09.csv",
    "https://example.com/reports/earnings_2022_09.csv",
    "https://example.com/dashboard/reports_2022_10.html",
    "https://example.com/reports/cash_flow_2022_10.csv",
    "https://example.com/reports/earnings_2022_10.csv",
    "https://example.com/dashboard/reports_2022_11.html",
    "https://example.com/reports/cash_flow_2022_11.csv",
    "https://example.com/reports/earnings_2022_11.csv",
    "https://example.com/dashboard/reports_2022_12.html",
    "https://example.com/reports/cash_flow_2022_12.csv",
    "https://example.com/reports/earnings_2022_12.csv",
]

expected = [
    "https://example.com/reports/earnings_2022_10.csv",
    "https://example.com/reports/earnings_2022_11.csv",
    "https://example.com/reports/earnings_2022_12.csv",   
]

actual = sorted(get_q4_earnings(url_list, 2022, 4))
assert expected == actual, f"\nExpected: '{expected}'\nGot: '{actual}'"
### END TESTS

In [25]:
### BEGIN TESTS
url_list = [
    "https://example.com/dashboard/reports_2021_10.html",
    "https://example.com/reports/cash_flow_2021_10.csv",
    "https://example.com/reports/earnings_2021_10.csv",
    "https://example.com/dashboard/reports_2021_11.html",
    "https://example.com/reports/cash_flow_2021_11.csv",
    "https://example.com/reports/earnings_2021_11.csv",
    "https://example.com/dashboard/reports_2021_12.html",
    "https://example.com/reports/cash_flow_2021_12.csv",
    "https://example.com/reports/earnings_2021_12.csv",
    "https://example.com/dashboard/reports_2022_01.html",
    "https://example.com/reports/cash_flow_2022_01.csv",
    "https://example.com/reports/earnings_2022_01.csv",
    "https://example.com/dashboard/reports_2022_02.html",
    "https://example.com/reports/cash_flow_2022_02.csv",
    "https://example.com/reports/earnings_2022_02.csv",
    "https://example.com/dashboard/reports_2022_03.html",
    "https://example.com/reports/cash_flow_2022_03.csv",
    "https://example.com/reports/earnings_2022_03.csv",
    "https://example.com/dashboard/reports_2022_04.html",
    "https://example.com/reports/cash_flow_2022_04.csv",
    "https://example.com/reports/earnings_2022_04.csv",
    "https://example.com/dashboard/reports_2022_05.html",
    "https://example.com/reports/cash_flow_2022_05.csv",
    "https://example.com/reports/earnings_2022_05.csv",
]

expected = [
    "https://example.com/reports/earnings_2022_04.csv",
    "https://example.com/reports/earnings_2022_05.csv",
]

actual = sorted(get_q4_earnings(url_list, 2022, 2))
assert expected == actual, f"\nExpected: '{expected}'\nGot: '{actual}'"
### END TESTS

In [26]:
### BEGIN TESTS
url_list = [
    "https://example.com/dashboard/reports_2021_10.html",
    "https://example.com/reports/cash_flow_2021_10.csv",
    "https://example.com/reports/earnings_2021_10.csv",
    "https://example.com/dashboard/reports_2021_11.html",
    "https://example.com/reports/cash_flow_2021_11.csv",
    "https://example.com/reports/earnings_2021_11.csv",
    "https://example.com/dashboard/reports_2021_12.html",
    "https://example.com/reports/cash_flow_2021_12.csv",
    "https://example.com/reports/earnings_2021_12.csv",
    "https://example.com/dashboard/reports_2022_01.html",
    "https://example.com/reports/cash_flow_2022_01.csv",
    "https://example.com/reports/earnings_2022_01.csv",
    "https://example.com/dashboard/reports_2022_02.html",
    "https://example.com/reports/cash_flow_2022_02.csv",
    "https://example.com/reports/earnings_2022_02.csv",
    "https://example.com/dashboard/reports_2022_03.html",
    "https://example.com/reports/cash_flow_2022_03.csv",
    "https://example.com/reports/earnings_2022_03.csv",
    "https://example.com/dashboard/reports_2022_04.html",
    "https://example.com/reports/cash_flow_2022_04.csv",
    "https://example.com/reports/earnings_2022_04.csv",
    "https://example.com/dashboard/reports_2022_05.html",
    "https://example.com/reports/cash_flow_2022_05.csv",
    "https://example.com/reports/earnings_2022_05.csv",
    "https://example.com/dashboard/reports_2022_06.html",
    "https://example.com/reports/cash_flow_2022_06.csv",
    "https://example.com/reports/earnings_2022_06.csv",
    "https://example.com/dashboard/reports_2022_07.html",
    "https://example.com/reports/cash_flow_2022_07.csv",
    "https://example.com/reports/earnings_2022_07.csv",
    "https://example.com/dashboard/reports_2022_08.html",
    "https://example.com/reports/cash_flow_2022_08.csv",
    "https://example.com/reports/earnings_2022_08.csv",
    "https://example.com/dashboard/reports_2022_09.html",
    "https://example.com/reports/cash_flow_2022_09.csv",
    "https://example.com/reports/earnings_2022_09.csv",
    "https://example.com/dashboard/reports_2022_10.html",
    "https://example.com/reports/cash_flow_2022_10.csv",
    "https://example.com/reports/earnings_2022_10.csv",
    "https://example.com/dashboard/reports_2022_11.html",
    "https://example.com/reports/cash_flow_2022_11.csv",
    "https://example.com/reports/earnings_2022_11.csv",
    "https://example.com/dashboard/reports_2022_12.html",
    "https://example.com/reports/cash_flow_2022_12.csv",
    "https://example.com/reports/earnings_2022_12.csv",
]

expected = []

actual = get_q4_earnings(url_list, 2021, 2)
assert expected == actual, f"\nExpected: '{expected}'\nGot: '{actual}'"
### END TESTS

In [27]:
### BEGIN TESTS
url_list = []
expected = []

actual = get_q4_earnings(url_list, 2021, 2)
assert expected == actual, f"\nExpected: '{expected}'\nGot: '{actual}'"
### END TESTS

In [28]:
# CELL INTENTIONALLY LEFT BLANK - DO NOT ALTER OR DELETE